Headless Browser Automation (Playwright, Puppeteer, Selenium)

Browser Automation Frameworks

Browser automation is the foundation technique used by most scrapers: a headless browser renders Google Maps, executes the page's JavaScript, and the scraper extracts data from the resulting DOM.

Playwright (Most Popular)

Microsoft's browser automation library. It is used by gosom/google-maps-scraper, HasData, and many commercial tools, and it supports Chromium, Firefox, and WebKit.

  • Languages: Python, Node.js, Java, .NET
  • Anti-detect: stealth plugins available
  • Parallel tabs: 10+ tabs in one browser instance, ~1.7 s per URL
  • Codegen: record browser interactions, auto-generate scraper code
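
The parallel-tabs pattern can be sketched with Playwright's Python async API. This is a minimal sketch under stated assumptions, not how any of the tools named above implement it: `run_bounded`, `scrape_titles`, and the `max_tabs` default of 10 are hypothetical names and values chosen for illustration, and the code does not guarantee the ~1.7 s per URL figure.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, TypeVar

T = TypeVar("T")


async def run_bounded(
    urls: Iterable[str],
    worker: Callable[[str], Awaitable[T]],
    limit: int = 10,
) -> list[T]:
    """Run worker(url) for every URL with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url: str) -> T:
        async with semaphore:
            return await worker(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(guarded(u) for u in urls))


async def scrape_titles(urls: list[str], max_tabs: int = 10) -> list[str]:
    """Open each URL in its own tab of one shared Chromium instance."""
    # Imported lazily so run_bounded() stays usable without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context()

        async def fetch_title(url: str) -> str:
            page = await context.new_page()  # one tab per URL
            try:
                await page.goto(url, wait_until="domcontentloaded")
                return await page.title()
            finally:
                await page.close()

        try:
            return await run_bounded(urls, fetch_title, limit=max_tabs)
        finally:
            await browser.close()
```

Calling `asyncio.run(scrape_titles(["https://example.com"]))` requires Playwright and its browser binaries to be installed (`playwright install chromium`). The semaphore caps the number of open tabs, so memory use stays bounded even for long URL lists.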

Puppeteer (Node.js)

Google's Node.js browser automation library. The puppeteer-extra-plugin-stealth package provides 17 evasion modules.

  • Best stealth ecosystem (17 modules)
  • XHR interception via the Chrome DevTools Protocol event Network.requestWillBeSent
  • Caveat: anti-bot companies study the stealth package
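
The interception idea can be illustrated in Python as well. The sketch below does not use Puppeteer or the raw DevTools event; it swaps in Playwright's `page.expect_response`, which waits for the first response that satisfies a predicate. The helper names `make_matcher` and `capture_response_body`, and the URL fragment passed in, are hypothetical; the endpoint a real target uses has to be discovered by inspecting its network traffic.

```python
from typing import Any, Callable


def make_matcher(url_fragment: str) -> Callable[[Any], bool]:
    """Build a predicate that accepts successful responses whose URL
    contains `url_fragment` (a hypothetical marker for the data endpoint)."""

    def matches(response: Any) -> bool:
        return url_fragment in response.url and response.status == 200

    return matches


async def capture_response_body(page_url: str, url_fragment: str) -> bytes:
    """Load `page_url` and return the raw body of the first matching response."""
    # Imported lazily so make_matcher() works without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        try:
            page = await browser.new_page()
            # Start listening before navigation so the response is not missed.
            async with page.expect_response(make_matcher(url_fragment)) as info:
                await page.goto(page_url)
            response = await info.value
            # Raw bytes; decoding (JSON, protobuf, ...) is left to the caller.
            return await response.body()
        finally:
            await browser.close()
```

Returning raw bytes keeps the sketch neutral about the payload format, since the right decoder depends on what the target endpoint actually sends.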

Selenium (Legacy)

The original browser automation framework. The undetected-chromedriver package patches ChromeDriver to evade detection.

  • Languages: Python, Java, C#, Ruby, JS
  • Larger fingerprint, easier to detect
  • Still used by HasData and Zubdata scrapers

Extraction Strategies

| Strategy | Token cost | When to use |
| --- | --- | --- |
| CSS selectors (querySelectorAll) | ~52/item | Known structure: the default choice |
| aria-label attributes | ~52/item | More resilient: accessibility attributes are more stable than CSS classes |
| body.innerText | ~5K/page | Discovery: learn the structure once, then switch |
| Network/XHR interception | Minimal | Capture protobuf responses directly: the best approach |
| Accessibility tree (filtered) | ~28K/page | Find buttons, forms, and interactive elements |
| Screenshot | ~132K | CAPTCHA solving and visual debugging only |
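
To make the aria-label strategy concrete, here is a small sketch that runs on static HTML with Python's standard-library parser, so no browser is needed. The markup, the class names, and the `role="article"` convention are all invented for illustration and are not taken from any real page; in a live browser the same idea would be expressed as an attribute selector such as `[role="article"][aria-label]` passed to `querySelectorAll`.

```python
from html.parser import HTMLParser


class AriaLabelCollector(HTMLParser):
    """Collect aria-label values from elements that carry a given role."""

    def __init__(self, role: str) -> None:
        super().__init__()
        self.role = role
        self.labels: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Match on role and aria-label only; class names are ignored entirely.
        if attributes.get("role") == self.role and attributes.get("aria-label"):
            self.labels.append(attributes["aria-label"])


def extract_labels(html: str, role: str = "article") -> list[str]:
    collector = AriaLabelCollector(role)
    collector.feed(html)
    collector.close()
    return collector.labels


# Hypothetical markup: the class names differ between the two result cards,
# as they might after a site redeploy, but the accessibility attributes agree.
SAMPLE = """
<div class="x9f2a" role="article" aria-label="Cafe A"></div>
<div class="q71zz" role="article" aria-label="Cafe B"></div>
<div class="x9f2a" role="button" aria-label="Next page"></div>
"""

print(extract_labels(SAMPLE))  # ['Cafe A', 'Cafe B']
```

A selector keyed on `class="x9f2a"` would miss the second card and wrongly include the button, which is the resilience argument made in the table.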