Search Results Data Collection with CAPTCHA Handling

Google uses reCAPTCHA to protect its search results and other services from automated access. When triggered, Google blocks further requests behind a reCAPTCHA challenge, usually a visible v2 checkbox (v3 runs in the background and scores requests without showing a challenge). CaptchaAI solves these challenges so your scraper can continue.

How Google Detects Scrapers

Signal	Description
Query rate	Too many searches from one IP
IP reputation	Datacenter or flagged proxy IPs
Cookie absence	No Google session cookies
Behavioral patterns	Identical query patterns, no dwell time
JavaScript fingerprint	Missing browser environment indicators

Google typically serves a 429 Too Many Requests response or redirects to a reCAPTCHA challenge page at google.com/sorry/.

Requirements

Requirement	Details
CaptchaAI API key	From captchaai.com
Python 3.7+	With requests and beautifulsoup4
Residential proxies	Strongly recommended

Solving Google's reCAPTCHA

Step 1: Detect the CAPTCHA

import requests

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9"
})

resp = session.get("https://www.google.com/search?q=example")

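# Match the /sorry/ path rather than the bare word, so a query that
# contains "sorry" is not mistaken for a block page
captcha_url = None
if "/sorry/" in resp.url or resp.status_code == 429:
    print("CAPTCHA triggered!")
    captcha_url = resp.url
else:
    print("Results loaded")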
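The detection rule can also be factored into a small helper that inspects only the URL path, which avoids false positives from the query string. This is a minimal sketch; the name is_captcha_response is illustrative and not part of requests or CaptchaAI.

```python
from urllib.parse import urlparse


def is_captcha_response(url: str, status_code: int) -> bool:
    """Return True when a response looks like Google's block page.

    Hypothetical helper: checks for HTTP 429 or a final URL whose
    path starts with /sorry/, the two signals described above.
    """
    path = urlparse(url).path
    return status_code == 429 or path.startswith("/sorry/")


# A redirect to the challenge page is flagged; a search for the word
# "sorry" is not, because only the path is examined.
print(is_captcha_response("https://www.google.com/sorry/index?continue=x", 200))
print(is_captcha_response("https://www.google.com/search?q=sorry", 200))
```

With requests, you would call it as is_captcha_response(resp.url, resp.status_code).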

Step 2: Extract the Site Key

from bs4 import BeautifulSoup

soup = BeautifulSoup(resp.text, "html.parser")

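# Google uses data-sitekey on the reCAPTCHA div
recaptcha = soup.find("div", {"data-sitekey": True})
if recaptcha is None:
    # Fail early so later steps never run with an undefined site key
    raise RuntimeError("No data-sitekey found; the challenge page layout may have changed")
site_key = recaptcha["data-sitekey"]
print(f"Site key: {site_key}")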
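If the div lookup fails because the markup has shifted, a regular-expression search over the raw HTML is one possible fallback. The sketch below assumes the key still appears as a quoted data-sitekey attribute; find_site_key and the sample HTML are made up for illustration.

```python
import re
from typing import Optional

# Matches data-sitekey="..." or data-sitekey='...' on any element.
SITEKEY_RE = re.compile(r"""data-sitekey=["']([^"']+)["']""")


def find_site_key(html: str) -> Optional[str]:
    """Return the first data-sitekey value in the HTML, or None."""
    match = SITEKEY_RE.search(html)
    return match.group(1) if match else None


# Hypothetical markup, not copied from a real Google page.
sample = '<div class="g-recaptcha" data-sitekey="SAMPLE_KEY"></div>'
print(find_site_key(sample))
print(find_site_key("<html><body>No challenge here</body></html>"))
```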

Step 3: Solve with CaptchaAI

import time

API_KEY = "YOUR_API_KEY"

def solve_google_recaptcha(site_key, page_url):
    resp = requests.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY,
        "method": "userrecaptcha",
        "googlekey": site_key,
        "pageurl": page_url
    })
    if not resp.text.startswith("OK|"):
        raise Exception(f"Submit error: {resp.text}")

    task_id = resp.text.split("|")[1]

    for _ in range(60):
        time.sleep(5)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id
        })
        if result.text == "CAPCHA_NOT_READY":
            continue
        if result.text.startswith("OK|"):
            return result.text.split("|")[1]
        raise Exception(f"Error: {result.text}")

    raise TimeoutError("Timed out")

token = solve_google_recaptcha(site_key, captcha_url)
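The polling loop above distinguishes three kinds of plain-text reply: a success prefixed with OK|, the CAPCHA_NOT_READY marker, and anything else, which it treats as an error. Pulling that classification into a pure function makes it easy to test without network access. A sketch, with parse_solver_response as a hypothetical name and ERROR_EXAMPLE as a placeholder rather than a documented error code:

```python
from typing import Tuple


def parse_solver_response(text: str) -> Tuple[str, str]:
    """Classify a reply as ('ok', value), ('pending', ''), or ('error', raw)."""
    text = text.strip()
    if text.startswith("OK|"):
        # Split once so a value containing "|" is kept intact.
        return "ok", text.split("|", 1)[1]
    if text == "CAPCHA_NOT_READY":
        return "pending", ""
    return "error", text


print(parse_solver_response("OK|12345"))
print(parse_solver_response("CAPCHA_NOT_READY"))
print(parse_solver_response("ERROR_EXAMPLE"))  # placeholder error string
```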

Step 4: Submit the Token

# Find the form action URL and hidden fields
form = soup.find("form")
form_data = {}
for inp in form.find_all("input", {"name": True}):
    form_data[inp["name"]] = inp.get("value", "")

form_data["g-recaptcha-response"] = token

# Fall back to the challenge URL when the form has no action attribute
action = form.get("action") or captcha_url
if action.startswith("/"):
    action = f"https://www.google.com{action}"

result = session.post(action, data=form_data)
print(f"Redirected to: {result.url}")
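The prefix check handles root-relative actions only. For a more general approach, urllib.parse.urljoin from the standard library resolves relative, absolute, and empty actions against the challenge page URL. The sketch below uses a hypothetical helper name and an illustrative challenge URL:

```python
from urllib.parse import urljoin


def resolve_form_action(page_url: str, action: str) -> str:
    """Resolve a form's action attribute against the page it came from."""
    # urljoin returns page_url unchanged when action is empty.
    return urljoin(page_url, action or "")


# Illustrative URL only; real challenge URLs carry more parameters.
page = "https://www.google.com/sorry/index?continue=https://www.google.com/search%3Fq%3Dexample"
print(resolve_form_action(page, "/sorry/index"))  # root-relative action
print(resolve_form_action(page, "index"))         # relative action
print(resolve_form_action(page, ""))              # missing action
```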

Complete Scraper with CAPTCHA Handling

import requests
import time
from bs4 import BeautifulSoup

API_KEY = "YOUR_API_KEY"

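def solve_captcha(site_key, page_url):
    resp = requests.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY, "method": "userrecaptcha",
        "googlekey": site_key, "pageurl": page_url
    })
    # Surface submit errors instead of failing with an IndexError below
    if not resp.text.startswith("OK|"):
        raise RuntimeError(f"Submit error: {resp.text}")
    task_id = resp.text.split("|")[1]
    for _ in range(60):  # poll every 5 seconds, up to 5 minutes
        time.sleep(5)
        r = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id
        })
        if r.text == "CAPCHA_NOT_READY":
            continue
        if r.text.startswith("OK|"):
            return r.text.split("|")[1]
        # Any other reply is an error; stop polling immediately
        raise RuntimeError(f"Solve error: {r.text}")
    raise TimeoutError("Timed out waiting for CaptchaAI")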

def google_search(query, num_results=10):
    session = requests.Session()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept-Language": "en-US,en;q=0.9"
    })

    resp = session.get("https://www.google.com/search", params={
        "q": query, "num": num_results
    })

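    # Handle CAPTCHA: match the /sorry/ path so a query containing
    # the word "sorry" is not mistaken for a block page
    if "/sorry/" in resp.url or resp.status_code == 429: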
        soup = BeautifulSoup(resp.text, "html.parser")
        rc = soup.find("div", {"data-sitekey": True})
        if rc:
            token = solve_captcha(rc["data-sitekey"], resp.url)
            form = soup.find("form")
            data = {i["name"]: i.get("value", "")
                    for i in form.find_all("input", {"name": True})}
            data["g-recaptcha-response"] = token
            action = form.get("action", resp.url)
            if action.startswith("/"):
                action = f"https://www.google.com{action}"
            resp = session.post(action, data=data)

    # Parse results
    soup = BeautifulSoup(resp.text, "html.parser")
    results = []
    for div in soup.find_all("div", class_="g"):
        link = div.find("a")
        title = div.find("h3")
        if link and title:
            results.append({
                "title": title.text,
                "url": link.get("href")
            })

    return results

results = google_search("best captcha solving api")
for r in results:
    print(f"{r['title']}: {r['url']}")

Best Practices

Use residential proxies: Google flags datacenter IPs quickly
Randomize query timing: Wait 5-15 seconds between searches
Vary User-Agents: Rotate through realistic browser User-Agent strings
Limit volume: Keep queries under 100/hour per IP
Use localized domains: Match your proxy region to the Google domain
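The timing and volume guidelines above can be combined in a small pacing helper. This is a sketch under stated assumptions: QueryPacer is a hypothetical class, and its defaults reuse the 5-15 second delay and 100 queries/hour figures from the list rather than limits published by Google.

```python
import random
import time
from collections import deque


class QueryPacer:
    """Random delay between searches plus a rolling hourly cap for one IP."""

    def __init__(self, min_delay=5.0, max_delay=15.0, max_per_hour=100):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.max_per_hour = max_per_hour
        self.sent = deque()  # timestamps of requests in the last hour

    def next_delay(self, now=None):
        """Return how many seconds to wait before the next search."""
        now = time.monotonic() if now is None else now
        # Forget requests that have left the one-hour window.
        while self.sent and now - self.sent[0] >= 3600:
            self.sent.popleft()
        delay = random.uniform(self.min_delay, self.max_delay)
        if len(self.sent) >= self.max_per_hour:
            # Budget exhausted: wait until the oldest request expires.
            delay = max(delay, 3600 - (now - self.sent[0]))
        return delay

    def record(self, now=None):
        """Note that a search was just sent."""
        self.sent.append(time.monotonic() if now is None else now)
```

In a scraper loop you would call time.sleep(pacer.next_delay()) before each search and pacer.record() right after it, keeping one pacer per proxy IP.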

Troubleshooting

Issue	Fix
CAPTCHA on every request	Switch to residential proxies; reduce rate
reCAPTCHA site key not found	Google may have changed the challenge page layout
Token accepted but still blocked	Google may require additional verification; try different proxy
Results page is empty	Check if Google served an alternate layout

FAQ

Does Google always use reCAPTCHA?

Google primarily uses reCAPTCHA v2 on its challenge pages. Some Google services may use reCAPTCHA v3 in the background. CaptchaAI handles both versions.

How many searches can I make before hitting a CAPTCHA?

It depends on your IP quality and request pattern. With residential proxies and delays, you can often make 50-100 searches before triggering one. Without proxies, expect CAPTCHAs after 5-10 searches.

Should I use Google's API instead?

Google's Custom Search JSON API allows 100 free queries per day; beyond that, queries cost $5 per 1,000, up to 10,000 per day. If your volume is low and you only need search results, the official API may be simpler. Scraping is necessary for data Google doesn't expose via API.
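To compare the two options, the API cost can be worked out from the figures quoted above. The sketch assumes the 10,000 figure is a cap on total daily queries, including the free 100, and that billing is linear per query; check current pricing before relying on it.

```python
FREE_PER_DAY = 100       # free queries per day (figure quoted above)
PRICE_PER_1000 = 5.00    # USD per 1,000 additional queries
MAX_PER_DAY = 10_000     # assumed cap on total daily queries


def daily_api_cost(queries: int) -> float:
    """Estimated daily cost in USD for a given number of queries."""
    if queries < 0 or queries > MAX_PER_DAY:
        raise ValueError("queries must be between 0 and MAX_PER_DAY")
    billable = max(0, queries - FREE_PER_DAY)
    return billable * PRICE_PER_1000 / 1000


for n in (100, 1_100, 10_000):
    print(n, daily_api_cost(n))
```

Under these assumptions, 1,100 queries in a day cost $5.00 and the full 10,000 cost $49.50.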

Full Working Code

Complete runnable examples for this article in Python, Node.js, PHP, Go, Java, C#, Ruby, Rust, Kotlin & Bash.

