AI-Powered Scrapers & Alternative POI Sources

ScrapeGraphAI, Crawl4AI, Overture Maps, AllThePlaces, OSM

AI/LLM-Powered Scrapers

ScrapeGraphAI (MIT / Free)

Describe what you want in plain English and it extracts structured JSON without CSS/XPath selectors. It auto-adapts to layout changes and supports GPT, Gemini, Groq, Azure, Hugging Face, and local Ollama models.

Crawl4AI (58K Stars, Apache 2.0)

Open-source LLM-friendly web crawler. Outputs clean Markdown for RAG/agents. Heuristic noise filtering, CSS/XPath/LLM extraction. Local-first (no API costs).

Firecrawl

Developer API that outputs clean Markdown from any URL, optimized for LLM ingestion. Used in enrichment pipelines alongside Maps scrapers.

firecrawl.dev

LLM Query Expansion

Not a scraper per se, but a technique: use an LLM to generate category synonyms and related search terms, multiplying coverage per geographic area. "dentist" becomes "dental clinic", "oral surgeon", "dental practice", etc. Combine it with any scraper for roughly 3-5x more results.
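A minimal sketch of the expansion step, assuming the LLM call is wrapped in a function that takes a category and returns suggested terms. The names expand_queries, search_all, and fake_llm_suggest are hypothetical; fake_llm_suggest is a canned stand-in for a real model call, and the search callable stands for whichever scraper is in use.

```python
from typing import Callable, Iterable


def expand_queries(category: str, suggest: Callable[[str], Iterable[str]]) -> list[str]:
    """Return the original category plus de-duplicated synonyms, original first."""
    seen: set[str] = set()
    expanded: list[str] = []
    for term in [category, *suggest(category)]:
        key = term.strip().lower()  # case-insensitive de-duplication
        if key and key not in seen:
            seen.add(key)
            expanded.append(term.strip())
    return expanded


def fake_llm_suggest(category: str) -> list[str]:
    # Hypothetical stand-in for an LLM call; a real version would prompt a model.
    canned = {
        "dentist": ["dental clinic", "oral surgeon", "dental practice", "Dentist"],
    }
    return canned.get(category.lower(), [])


def search_all(category: str, area: str, suggest, search) -> list[dict]:
    """Run `search(term, area)` for every expanded term and merge results by place id."""
    merged: dict[str, dict] = {}
    for term in expand_queries(category, suggest):
        for place in search(term, area):
            # Keep the first record seen for each id; later duplicates are dropped.
            merged.setdefault(place["id"], place)
    return list(merged.values())
```

Merging by place id matters because synonym queries overlap heavily: the gain comes from the places that only one of the terms surfaces.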

Alternative POI Data Sources (No Scraping)

Free or commercial POI databases that don't require scraping Google Maps.

Overture Maps Foundation (64.8M POIs Free)

Backed by Meta, Microsoft, Amazon, TomTom, Foursquare. The strongest free alternative to Google Maps data.

Size: 64.8M places. By source: Meta (~59.2M), Foursquare (~6.7M), Microsoft (~7.4M), AllThePlaces (~1.7M)
Fields: Names, categories (64+), phones, emails, websites, socials, addresses, brand, operating_status, confidence score, coordinates
Format: GeoParquet on S3 and Azure Blob. Python CLI, DuckDB SQL, browser Explorer
License: CDLA-Permissive-2.0 / ODbL (commercial use OK)
Limits: Monthly updates (not real-time). No reviews/photos. Thinner outside Western countries
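Because the data ships as GeoParquet, it can be queried in place with DuckDB SQL rather than downloaded whole. The sketch below only builds the query string for a bounding box. The function name overture_places_sql is hypothetical, the bucket path layout and column names (names.primary, categories.primary, bbox, confidence) are assumptions to check against the current Overture documentation, and the release string is a value the caller must supply.

```python
import re


def overture_places_sql(release: str, west: float, south: float,
                        east: float, north: float,
                        min_confidence: float = 0.5) -> str:
    """Build a DuckDB query selecting Overture places inside a bounding box."""
    if not re.fullmatch(r"[0-9A-Za-z.\-]+", release):
        raise ValueError("release must be a plain release identifier")
    if not (west < east and south < north):
        raise ValueError("bounding box must satisfy west < east and south < north")

    # Assumed path layout for the public S3 bucket; verify before use.
    path = f"s3://overturemaps-us-west-2/release/{release}/theme=places/type=place/*"
    return (
        "SELECT id,\n"
        "       names.primary AS name,\n"
        "       categories.primary AS category,\n"
        "       confidence,\n"
        "       bbox.xmin AS lon,\n"
        "       bbox.ymin AS lat\n"
        f"FROM read_parquet('{path}', hive_partitioning=1)\n"
        f"WHERE bbox.xmin BETWEEN {west} AND {east}\n"
        f"  AND bbox.ymin BETWEEN {south} AND {north}\n"
        f"  AND confidence >= {min_confidence}"
    )
```

To run the result, one would typically pass it to a DuckDB connection with the httpfs extension loaded and the S3 region set. Filtering on the confidence score is one way to trade coverage for precision.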

Warning: Reddit users report that the Places layer stopped updating as of the Sept 2024 release. Verify the current state before relying on it.

Overture-Based API (Community-Built, 200x Cheaper)

A developer built a Places API using Overture data + Rust/Axum + PostGIS. Pricing: free up to 5K/mo, then $10/100K, $30/500K, $80/2M, versus Google's ~$1,700 for 100K.
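The price gap can be worked out from the figures above. This is only arithmetic on the quoted tiers: it assumes Google's ~$1,700 per 100K scales linearly to larger volumes, which real volume pricing may not do.

```python
# Paid tiers quoted above: (requests per month, price in USD).
TIERS = [(100_000, 10.0), (500_000, 30.0), (2_000_000, 80.0)]
GOOGLE_PRICE_PER_100K = 1700.0  # approximate figure quoted above


def cost_per_1k(requests: int, price: float) -> float:
    """Price in USD per 1,000 requests."""
    return price * 1000 / requests


google_per_1k = cost_per_1k(100_000, GOOGLE_PRICE_PER_100K)

for requests, price in TIERS:
    per_1k = cost_per_1k(requests, price)
    ratio = google_per_1k / per_1k
    print(f"{requests:>9,} req: ${per_1k:.2f} per 1K, about {ratio:.0f}x cheaper")
```

At the 100K tier the ratio is about 170x, and under the linear assumption it grows to about 425x at the 2M tier, so "200x" is a round figure rather than an exact one.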

Other Alternative Sources

Source | POIs | Free Tier | Key Strength | Reviews?
OpenStreetMap (Overpass API) | Varies | Unlimited, no key | Free, query any tag combo. Overpass Turbo | No
Foursquare Places | 100M+ | Commercial | Richest venue data, behavioral insights, check-ins | Tips only
HERE Technologies | Global, 400+ cats | 250K tx/mo | TripAdvisor ratings, EV/fuel data, chain ID | Via TripAdvisor
TomTom | ~100M, 180+ countries | 50K daily tx | Navigation-optimized, relevance scoring | No
Mapbox | Global | 100K req/mo | Polished SDKs | No
Geoapify | OSM-based, 400+ cats | 3K credits/day | Transparent pricing, can cache/store | No
Yelp Fusion API | Millions | 5K calls/day | 3 reviews max via API. Open Dataset: 8.6M reviews (academic) | 3 (API), 8.6M (dataset)
AllThePlaces | 20M+ | Unlimited (CC-0) | 4,100+ Scrapy spiders, weekly updates | No
MapQuest | Various | Limited | Search API v5: radius/rect/polygon/corridor | No
Nominatim | OSM geocoding | Free | Forward/reverse geocoding | No
Photon (Komoot) | OSM geocoding | Free | Typo-tolerant, multilingual | No
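To make "query any tag combo" concrete for the Overpass API entry, here is a minimal sketch that builds an Overpass QL query for one tag inside a bounding box. The function names are hypothetical, and the endpoint shown is one commonly used public instance, treated here as an assumption. Public instances are shared resources, so keep queries small.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint; other Overpass instances accept the same query format.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_overpass_query(key: str, value: str, south: float, west: float,
                         north: float, east: float, timeout: int = 25) -> str:
    """Build an Overpass QL query for nodes and ways tagged key=value in a bbox."""
    bbox = f"{south},{west},{north},{east}"  # Overpass order: south, west, north, east
    tag = f'["{key}"="{value}"]'
    return (
        f"[out:json][timeout:{timeout}];"
        f"(node{tag}({bbox});way{tag}({bbox}););"
        "out center;"  # 'center' adds a representative point for ways
    )


def fetch_pois(query: str, url: str = OVERPASS_URL) -> list[dict]:
    """POST the query and return the 'elements' list from the JSON response."""
    data = urllib.parse.urlencode({"data": query}).encode()
    with urllib.request.urlopen(url, data=data, timeout=60) as resp:
        return json.load(resp)["elements"]
```

For example, build_overpass_query("amenity", "dentist", 52.5, 13.3, 52.6, 13.5) produces a query for dentists in a small box, which can then be passed to fetch_pois.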

OSM Reality Check (from Reddit): OpenStreetMap is poor for business/POI data. Business coverage is ~75% at best, biased toward bigger/popular locations.

Data Providers & Marketplaces

Provider | Dataset | Pricing | Google Maps?
Bright Data | 200.7M+ records | $0.0025/record | Yes
Datarade | 60M+ US (varies) | By provider | Yes
Veridion | 134M+ businesses | ~$99/user/mo | Partial
SafeGraph | 52M+ POIs | Commercial | Partial (foot traffic)
Dataplor | Various | Commercial | Partial (LatAm strong)
Xtract.io | 6M+ locations | Commercial | Partial
Coresignal | N/A | N/A | No (employee data only)
Data.world | N/A | N/A | No (data governance)

Export & Data Liberation Tools