Headless Browser Automation (Playwright, Puppeteer, Selenium)
Browser Automation Frameworks
The foundation technique used by most scrapers. A browser renders Maps, executes JS, and extracts from the DOM.
Playwright (Most Popular)
Microsoft's library. Used by gosom/google-maps-scraper, HasData, many commercial tools. Supports Chromium, Firefox, WebKit.
- Languages: Python, Node.js, Java, .NET
- Anti-detect: stealth plugins available
- Parallel tabs: 10+ tabs in one browser, ~1.7s/URL
- Codegen: record browser interactions, auto-generate scraper code
Puppeteer (Node.js)
Google's Node.js browser automation. puppeteer-extra-plugin-stealth provides 17 evasion modules.
- Best stealth ecosystem (17 modules)
- XHR interception via
Network.requestWillBeSent - Caveat: anti-bot companies study the stealth package
Selenium (Legacy)
Original framework. undetected-chromedriver patches for detection evasion.
- Languages: Python, Java, C#, Ruby, JS
- Larger fingerprint, easier to detect
- Still used by HasData and Zubdata scrapers
Extraction Strategies
| Strategy | Token Cost | When to Use |
|---|---|---|
CSS selectors (querySelectorAll) | ~52/item | Known structure — default choice |
aria-label attributes | ~52/item | More resilient — accessibility attrs are stabler than CSS classes |
body.innerText | ~5K/page | Discovery — learn structure once, then switch |
| Network/XHR interception | Minimal | Capture protobuf responses directly — best approach |
| Accessibility tree (filtered) | ~28K/page | Find buttons, forms, interactive elements |
| Screenshot | ~132K | CAPTCHA solving, visual debugging only |