# Headless Browser Automation (Playwright, Puppeteer, Selenium)

## Browser Automation Frameworks

Browser automation is the foundational technique behind most scrapers: a headless browser renders Google Maps and executes its JavaScript, and the scraper then extracts data from the resulting DOM.

### Playwright (Most Popular)

Microsoft's browser automation library, used by gosom/google-maps-scraper, HasData, and many commercial tools. It supports Chromium, Firefox, and WebKit.

- Languages: Python, Node.js, Java, .NET
- Anti-detect: stealth plugins available
- Parallel tabs: 10+ tabs in one browser, ~1.7s/URL
- Codegen: record browser interactions, auto-generate scraper code
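
The parallel-tabs point can be sketched in Python as follows. This is a minimal sketch rather than a tuned scraper: `run_bounded` and `scrape_titles` are hypothetical helper names, the default of 10 tabs simply mirrors the figure above, and the Playwright import is deferred so the concurrency helper can be exercised without a browser installed.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


async def run_bounded(
    urls: Sequence[str],
    worker: Callable[[str], Awaitable[str]],
    max_tabs: int = 10,
) -> List[str]:
    """Run worker(url) for every URL with at most max_tabs in flight."""
    sem = asyncio.Semaphore(max_tabs)

    async def guarded(url: str) -> str:
        async with sem:
            return await worker(url)

    # gather() returns results in the same order as the input URLs.
    return list(await asyncio.gather(*(guarded(u) for u in urls)))


async def scrape_titles(urls: Sequence[str], max_tabs: int = 10) -> List[str]:
    """Open each URL in its own tab of a single headless Chromium instance."""
    from playwright.async_api import async_playwright  # deferred import

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context()

        async def worker(url: str) -> str:
            page = await context.new_page()  # one tab per URL
            try:
                await page.goto(url, wait_until="domcontentloaded")
                return await page.title()
            finally:
                await page.close()

        try:
            return await run_bounded(urls, worker, max_tabs)
        finally:
            await browser.close()
```

Calling `asyncio.run(scrape_titles(urls))` would return one page title per URL. Sharing a single browser and capping the number of open tabs keeps memory bounded; the actual per-URL time depends on the target pages and the network.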

### Puppeteer (Node.js)

Google's Node.js browser automation library. The `puppeteer-extra-plugin-stealth` plugin provides 17 evasion modules.

- Best stealth ecosystem (17 modules)
- XHR interception via the Chrome DevTools Protocol event `Network.requestWillBeSent`
- Caveat: anti-bot companies study the stealth package, so its evasions may not stay effective
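
Puppeteer itself is Node.js only, so the sketch below swaps in Playwright's Python API, which can subscribe to the same Chrome DevTools Protocol event through a CDP session (Chromium only). `xhr_url` and `log_xhr_urls` are hypothetical helpers; the fields read from the event (`type`, `request.url`) follow the CDP `Network.requestWillBeSent` payload.

```python
from typing import List, Optional

# CDP resource types that correspond to XMLHttpRequest and fetch() calls.
XHR_TYPES = {"XHR", "Fetch"}


def xhr_url(event: dict) -> Optional[str]:
    """Return the request URL if the CDP event describes an XHR/fetch call."""
    if event.get("type") in XHR_TYPES:
        return event.get("request", {}).get("url")
    return None


def log_xhr_urls(url: str) -> List[str]:
    """Load url in headless Chromium and record the URLs of XHR/fetch requests."""
    from playwright.sync_api import sync_playwright  # deferred import

    seen: List[str] = []

    def on_request(event: dict) -> None:
        found = xhr_url(event)
        if found is not None:
            seen.append(found)

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Attach a raw CDP session to the page and enable the Network domain.
        client = page.context.new_cdp_session(page)
        client.send("Network.enable")
        client.on("Network.requestWillBeSent", on_request)
        page.goto(url, wait_until="networkidle")
        browser.close()
    return seen
```

Note that this event only reports outgoing requests. Reading response bodies needs a further step, such as the CDP `Network.getResponseBody` command.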

### Selenium (Legacy)

The original browser automation framework. `undetected-chromedriver` patches ChromeDriver to evade detection.

- Languages: Python, Java, C#, Ruby, JS
- Larger fingerprint, easier to detect
- Still used by HasData and Zubdata scrapers
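
A minimal Python sketch of the Selenium route, assuming the `undetected-chromedriver` package is installed and a local Chrome is available. `open_patched_chrome` and `texts_by_css` are hypothetical helpers, and any selector passed in is a placeholder. The locator strategy is given as the string `"css selector"`, which is the value of Selenium's `By.CSS_SELECTOR`, so the extraction helper itself needs no Selenium import.

```python
from typing import Any, List


def open_patched_chrome() -> Any:
    """Start Chrome through undetected-chromedriver instead of stock ChromeDriver."""
    import undetected_chromedriver as uc  # deferred import

    return uc.Chrome()


def texts_by_css(driver: Any, css: str) -> List[str]:
    """Collect the non-empty visible text of every element matching css."""
    # "css selector" is the value of selenium's By.CSS_SELECTOR constant.
    elements = driver.find_elements("css selector", css)
    texts = (el.text.strip() for el in elements)
    return [t for t in texts if t]
```

A typical session would call `driver = open_patched_chrome()`, then `driver.get(url)`, then `texts_by_css(driver, "h1")`, and finally `driver.quit()`.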

### Extraction Strategies

<table id="bkmrk-strategytoken-costwh">
<tr><th>Strategy</th><th>Token Cost</th><th>When to Use</th></tr>
<tr><td>CSS selectors (<code>querySelectorAll</code>)</td><td>~52/item</td><td>Known structure; the default choice</td></tr>
<tr><td><code>aria-label</code> attributes</td><td>~52/item</td><td>More resilient: accessibility attributes are more stable than CSS classes</td></tr>
<tr><td><code>body.innerText</code></td><td>~5K/page</td><td>Discovery: learn the structure once, then switch to selectors</td></tr>
<tr><td>Network/XHR interception</td><td>Minimal</td><td>Captures protobuf responses directly; the best approach</td></tr>
<tr><td>Accessibility tree (filtered)</td><td>~28K/page</td><td>Finding buttons, forms, and other interactive elements</td></tr>
<tr><td>Screenshot</td><td>~132K</td><td>CAPTCHA solving and visual debugging only</td></tr>
</table>
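
As an illustration of the `aria-label` strategy, here is a Python sketch against Playwright's sync API. The selector `a[aria-label]` is a placeholder, not a known Google Maps selector, and `clean_labels` and `aria_labels` are hypothetical helpers.

```python
from typing import Any, Iterable, List, Optional

# Runs inside the page: read the aria-label of every matched element.
_READ_LABELS_JS = "els => els.map(el => el.getAttribute('aria-label'))"


def clean_labels(raw: Iterable[Optional[str]]) -> List[str]:
    """Drop missing or blank labels and remove duplicates, keeping first-seen order."""
    seen = set()
    out: List[str] = []
    for label in raw:
        text = (label or "").strip()
        if text and text not in seen:
            seen.add(text)
            out.append(text)
    return out


def aria_labels(page: Any, css: str = "a[aria-label]") -> List[str]:
    """Return cleaned aria-label values for elements matching css on a Playwright page."""
    return clean_labels(page.locator(css).evaluate_all(_READ_LABELS_JS))
```

Selecting on an attribute rather than a generated class name means the extraction depends only on markup the page exposes for accessibility, which is the resilience argument made in the table.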