# AI-Powered Scrapers & Alternative POI Sources

ScrapeGraphAI, Crawl4AI, Overture Maps, AllThePlaces, OSM

# AI/LLM-Powered Scrapers

### ScrapeGraphAI (MIT / Free)

Describe what you want in plain English and it extracts structured JSON, with no CSS/XPath selectors to maintain. It adapts automatically to layout changes and supports GPT, Gemini, Groq, Azure, Hugging Face, and local Ollama models.

[GitHub](https://github.com/ScrapeGraphAI/Scrapegraph-ai)
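As a rough illustration of the prompt-driven workflow, here is a minimal sketch assuming the `SmartScraperGraph` entry point shown in the project's README and a local Ollama model. The model name, the config keys, and the helper names `build_graph_config` and `extract` are assumptions for illustration; the config schema varies between releases, so check the version you install.

```python
from typing import Any, Dict


def build_graph_config(model: str = "ollama/llama3.2") -> Dict[str, Any]:
    """Return a config dict for a local model (keys are assumptions, check your version)."""
    return {
        "llm": {"model": model, "format": "json"},
        "verbose": False,
        "headless": True,
    }


def extract(prompt: str, url: str, config: Dict[str, Any]) -> Any:
    """Run a plain-English extraction prompt against one page."""
    # Imported lazily so the rest of the module works without the package installed.
    from scrapegraphai.graphs import SmartScraperGraph

    graph = SmartScraperGraph(prompt=prompt, source=url, config=config)
    return graph.run()


# Example call (needs scrapegraphai installed and an Ollama server running):
# extract("List each business name, phone and address", "https://example.com", build_graph_config())
```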

### Crawl4AI (58K Stars, Apache 2.0)

Open-source LLM-friendly web crawler. Outputs clean Markdown for RAG/agents. Heuristic noise filtering, CSS/XPath/LLM extraction. Local-first (no API costs).

[GitHub](https://github.com/unclecode/crawl4ai)

### Firecrawl

Developer API that outputs clean Markdown from any URL, optimized for LLM ingestion. Used in enrichment pipelines alongside Maps scrapers.

[firecrawl.dev](https://www.firecrawl.dev/)

### LLM Query Expansion

Not a scraper per se, but a technique: use an LLM to generate category synonyms and related search terms, multiplying coverage per geographic area. For example, "dentist" expands to "dental clinic", "oral surgeon", "dental practice", and so on. Combined with any scraper, this can yield 3-5x more results.
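The expansion step can be sketched independently of any particular model. In the sketch below, `suggest` is a hypothetical callable standing in for the LLM call, and `stub_suggest` returns canned synonyms so the example runs offline; the function names are illustrative, not part of any library.

```python
from typing import Callable, Iterable, List


def expand_query(term: str, suggest: Callable[[str], Iterable[str]], limit: int = 5) -> List[str]:
    """Return the original term plus de-duplicated, normalized suggestions."""
    seen = set()
    expanded: List[str] = []
    for candidate in [term, *suggest(term)]:
        cleaned = " ".join(candidate.split()).lower()  # collapse whitespace, lowercase
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            expanded.append(cleaned)
        if len(expanded) >= limit:
            break
    return expanded


def stub_suggest(term: str) -> List[str]:
    """Stand-in for an LLM call; replace with a real model request."""
    canned = {"dentist": ["dental clinic", "Oral Surgeon", "dental practice", "dentist"]}
    return canned.get(term.lower(), [])


def build_searches(terms: Iterable[str], area: str) -> List[str]:
    """Combine each expanded term with one geographic area."""
    return [f"{t} in {area}" for t in terms]


terms = expand_query("dentist", stub_suggest)
# terms == ["dentist", "dental clinic", "oral surgeon", "dental practice"]
searches = build_searches(terms, "Austin, TX")
```

Each string in `searches` is then fed to whichever scraper is in use, and results are de-duplicated downstream, since synonyms tend to return overlapping listings.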

# Alternative POI Data Sources (No Scraping)

Free or commercial POI databases that don't require scraping Google Maps.

### Overture Maps Foundation (64.8M POIs Free)

Backed by Meta, Microsoft, Amazon, TomTom, Foursquare. The strongest free alternative to Google Maps data.

<table id="bkmrk-size64.8m-places-%E2%80%94-m"><tr><td>Size</td><td>**64.8M places**: Meta (~59.2M), Foursquare (~6.7M), Microsoft (~7.4M), AllThePlaces (~1.7M)</td></tr><tr><td>Fields</td><td>Names, categories (64+), phones, emails, websites, socials, addresses, brand, operating_status, confidence score, coordinates</td></tr><tr><td>Format</td><td>GeoParquet on S3 and Azure Blob. Python CLI, DuckDB SQL, browser Explorer</td></tr><tr><td>License</td><td>CDLA-Permissive-2.0 / ODbL (commercial use OK)</td></tr><tr><td>Limits</td><td>Monthly updates (not real-time). No reviews/photos. Thinner outside Western countries</td></tr></table>

**Warning:** Reddit users report that the Places layer stopped updating as of the September 2024 release. Verify the current release state before depending on it.

[Overture Places Guide](https://docs.overturemaps.org/guides/places/) | [Downloads](https://overturemaps.org/download/)
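For the DuckDB route, a bounding-box query against the GeoParquet files might look like the sketch below. The bucket layout and the column names (`names.primary`, `categories.primary`, `bbox.xmin`, and so on) follow the Overture schema as I understand it and should be checked against the Places guide; the release identifier is deliberately left as a parameter, and `overture_places_sql` is a hypothetical helper, not part of any Overture tooling.

```python
from typing import Optional, Tuple

# Statements to run once per DuckDB connection before querying S3.
DUCKDB_SETUP = ("INSTALL httpfs", "LOAD httpfs", "SET s3_region='us-west-2'")


def overture_places_sql(
    release: str,
    bbox: Tuple[float, float, float, float],
    category: Optional[str] = None,
    limit: int = 1000,
) -> str:
    """Build a SELECT over the places files for (west, south, east, north)."""
    west, south, east, north = bbox
    # Assumed path layout; confirm against the current download instructions.
    path = f"s3://overturemaps-us-west-2/release/{release}/theme=places/type=place/*"
    where = [
        f"bbox.xmin >= {west}",
        f"bbox.xmax <= {east}",
        f"bbox.ymin >= {south}",
        f"bbox.ymax <= {north}",
    ]
    if category:
        safe = category.replace("'", "''")  # escape single quotes for SQL
        where.append(f"categories.primary = '{safe}'")
    return (
        "SELECT id, names.primary AS name, categories.primary AS category, "
        "confidence, bbox.xmin AS lon, bbox.ymin AS lat "  # a point's bbox is its own coordinate
        f"FROM read_parquet('{path}', hive_partitioning=1) "
        f"WHERE {' AND '.join(where)} "
        f"LIMIT {int(limit)}"
    )


# Usage (requires `pip install duckdb` and network access):
# import duckdb
# con = duckdb.connect()
# for stmt in DUCKDB_SETUP:
#     con.execute(stmt)
# rows = con.execute(overture_places_sql("<release>", (-122.5, 37.7, -122.3, 37.8))).fetchall()
```

Filtering on the `bbox` columns first lets DuckDB skip most row groups, which matters because the full Places layer is tens of millions of rows.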

#### Overture-Based API (Community-Built, 200x Cheaper)

A developer built a Places API on Overture data with Rust/Axum and PostGIS. Pricing: free for 5K/mo, then $10 for 100K, $30 for 500K, and $80 for 2M, versus Google's ~$1,700 for 100K.
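To sanity-check the headline ratio, the quoted prices can be normalized to a per-1,000 cost. The sketch below uses only the figures above; on those numbers the 100K tier works out to about 170x, so the "200x" in the heading is presumably a rounded figure or based on a different tier.

```python
# Prices quoted above, in USD, keyed by request volume.
COMMUNITY_TIERS = {100_000: 10, 500_000: 30, 2_000_000: 80}
GOOGLE_100K_USD = 1_700  # approximate figure quoted above


def usd_per_thousand(total_usd: float, requests: int) -> float:
    """Normalize a tier price to cost per 1,000 requests."""
    return total_usd * 1000 / requests


per_thousand = {n: usd_per_thousand(usd, n) for n, usd in COMMUNITY_TIERS.items()}
google_per_thousand = usd_per_thousand(GOOGLE_100K_USD, 100_000)
ratio_at_100k = GOOGLE_100K_USD / COMMUNITY_TIERS[100_000]

print(per_thousand)         # community cost per 1K requests at each tier
print(google_per_thousand)  # Google cost per 1K requests at 100K
print(ratio_at_100k)        # price ratio at the 100K tier
```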

### Other Alternative Sources

<table id="bkmrk-sourcepoisfree-tierk"><tr><th>Source</th><th>POIs</th><th>Free Tier</th><th>Key Strength</th><th>Reviews?</th></tr><tr><td>**OpenStreetMap (Overpass API)**</td><td>Varies</td><td>Unlimited, no key</td><td>Free, query any tag combo. [Overpass Turbo](https://overpass-turbo.eu/)</td><td>No</td></tr><tr><td>**Foursquare Places**</td><td>100M+</td><td>Commercial</td><td>Richest venue data, behavioral insights, check-ins</td><td>Tips only</td></tr><tr><td>**HERE Technologies**</td><td>Global, 400+ cats</td><td>250K tx/mo</td><td>TripAdvisor ratings, EV/fuel data, chain ID</td><td>Via TripAdvisor</td></tr><tr><td>**TomTom**</td><td>~100M, 180+ countries</td><td>50K daily tx</td><td>Navigation-optimized, relevance scoring</td><td>No</td></tr><tr><td>**Mapbox**</td><td>Global</td><td>100K req/mo</td><td>Polished SDKs</td><td>No</td></tr><tr><td>**Geoapify**</td><td>OSM-based, 400+ cats</td><td>3K credits/day</td><td>Transparent pricing, can cache/store</td><td>No</td></tr><tr><td>**Yelp Fusion API**</td><td>Millions</td><td>5K calls/day</td><td>3 reviews max via API. Open Dataset: 8.6M reviews (academic)</td><td>3 (API), 8.6M (dataset)</td></tr><tr><td>**AllThePlaces**</td><td>20M+</td><td>Unlimited (CC-0)</td><td>4,100+ Scrapy spiders, weekly updates</td><td>No</td></tr><tr><td>**MapQuest**</td><td>Various</td><td>Limited</td><td>Search API v5: radius/rect/polygon/corridor</td><td>No</td></tr><tr><td>**Nominatim**</td><td>OSM geocoding</td><td>Free</td><td>Forward/reverse geocoding</td><td>No</td></tr><tr><td>**Photon (Komoot)**</td><td>OSM geocoding</td><td>Free</td><td>Typo-tolerant, multilingual</td><td>No</td></tr></table>

**OSM Reality Check (from Reddit):** OpenStreetMap is poor for business/POI data. Business coverage is ~75% at best, biased toward bigger/popular locations.
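For the Overpass route listed in the table, a tag-plus-bounding-box query can be assembled as in this sketch. The helper names are hypothetical; the endpoint is the public overpass-api.de instance, which is rate-limited, so heavy use should go to a self-hosted or alternative instance.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Tuple

OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def overpass_query(tags: Dict[str, str], bbox: Tuple[float, float, float, float], timeout: int = 25) -> str:
    """Build Overpass QL for nodes/ways/relations matching all tags.

    bbox order is (south, west, north, east), as Overpass expects.
    """
    south, west, north, east = bbox
    filters = "".join(f'["{key}"="{value}"]' for key, value in tags.items())
    return f"[out:json][timeout:{timeout}];nwr{filters}({south},{west},{north},{east});out center;"


def fetch(query: str) -> Dict[str, Any]:
    """POST the query and return the decoded JSON (performs a network call)."""
    data = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(OVERPASS_URL, data=data, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


query = overpass_query({"amenity": "dentist"}, (37.7, -122.5, 37.8, -122.3))
# elements = fetch(query)["elements"]  # uncomment to run against the live API
```

`out center;` asks Overpass to return a single representative coordinate for ways and relations, which is convenient when buildings are mapped as polygons rather than points.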

### Data Providers & Marketplaces

<table id="bkmrk-providerdatasetprici"><tr><th>Provider</th><th>Dataset</th><th>Pricing</th><th>Google Maps?</th></tr><tr><td>Bright Data</td><td>200.7M+ records</td><td>$0.0025/record</td><td>Yes</td></tr><tr><td>Datarade</td><td>60M+ US (varies)</td><td>By provider</td><td>Yes</td></tr><tr><td>Veridion</td><td>134M+ businesses</td><td>~$99/user/mo</td><td>Partial</td></tr><tr><td>SafeGraph</td><td>52M+ POIs</td><td>Commercial</td><td>Partial (foot traffic)</td></tr><tr><td>Dataplor</td><td>Various</td><td>Commercial</td><td>Partial (LatAm strong)</td></tr><tr><td>Xtract.io</td><td>6M+ locations</td><td>Commercial</td><td>Partial</td></tr><tr><td>Coresignal</td><td>N/A</td><td>N/A</td><td>No (employee data only)</td></tr><tr><td>Data.world</td><td>N/A</td><td>N/A</td><td>No (data governance)</td></tr></table>

### Export & Data Liberation Tools

- **Google Takeout**: Export saved/starred places as CSV/JSON. The export has no coordinates, so entries need geocoding.
- [Takeout Tools](https://www.takeout-tools.com/): Adds coordinates, converts to GeoJSON/KML/GPX
- [json2kml](https://github.com/dmsza/json2kml): Python converter for saved places to KML
- [Export-Google-Maps-Saved-Places](https://github.com/Yeshey/Export-Google-Maps-Saved-Places)
- **Firefox GPX Exporter**: Export saved lists as GPX with lat/lon
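Where an export lacks coordinates, one option is to geocode each place name or address with Nominatim (listed in the sources table above). This is a minimal sketch, not a Takeout parser: the input is assumed to be a free-text query string you have already pulled from the export, and the helper names are hypothetical. Nominatim's public instance expects an identifying User-Agent and at most about one request per second.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional, Tuple

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def nominatim_url(query: str) -> str:
    """Build a search URL that returns at most one JSON result."""
    params = urllib.parse.urlencode({"q": query, "format": "jsonv2", "limit": 1})
    return f"{NOMINATIM_SEARCH}?{params}"


def first_coordinates(payload: str) -> Optional[Tuple[float, float]]:
    """Parse a Nominatim JSON response into (lat, lon), or None if no match."""
    results = json.loads(payload)
    if not results:
        return None
    return float(results[0]["lat"]), float(results[0]["lon"])


def geocode(query: str, user_agent: str) -> Optional[Tuple[float, float]]:
    """Look up one query (network call); sleep about 1 s between calls when looping."""
    request = urllib.request.Request(nominatim_url(query), headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as resp:
        return first_coordinates(resp.read().decode("utf-8"))
```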