How to Handle CAPTCHA Challenges in Web Scraping Workflows

CAPTCHAs are one of the most common blockers in web scraping workflows. When a target site serves a reCAPTCHA, Cloudflare Turnstile, or image CAPTCHA, your scraper stops until the challenge is solved. CaptchaAI's API solves these challenges programmatically so your scraper can keep running.

How CAPTCHA Blocking Works in Scraping

Websites trigger CAPTCHAs based on signals from the request and the client:

| Signal | Trigger |
|---|---|
| Request rate | Too many requests from one IP |
| Missing cookies | No session or preference cookies |
| Bot-like headers | Missing Accept-Language, Referer |
| JavaScript fingerprint | No JS execution or headless browser detected |
| IP reputation | Datacenter or proxy IP flagged |

When triggered, the site returns a CAPTCHA challenge instead of the page content. Your scraper needs to solve it and submit the token to proceed.
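
The "bot-like headers" signal is the easiest one to address on the client side. As a minimal sketch, a request can carry the headers a browser would normally send. The helper name `browser_like_headers` and the specific Accept and Accept-Language values are illustrative assumptions, not values required by any particular site:

```python
# User-Agent string taken from the examples in this article
DEFAULT_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

def browser_like_headers(referer=None, user_agent=DEFAULT_UA):
    """Build a header set that includes the fields scrapers often omit."""
    headers = {
        "User-Agent": user_agent,
        # Illustrative values; match them to the browser you are imitating
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    }
    if referer:
        # Only send a Referer when there is a plausible previous page
        headers["Referer"] = referer
    return headers
```

The returned dictionary can be passed as `headers=` to `requests.get` or set once on a `requests.Session`. Headers alone will not help if the site is reacting to request rate or IP reputation.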

Requirements

| Requirement | Details |
|---|---|
| CaptchaAI API key | From captchaai.com |
| Python 3.7+ or Node.js 16+ | For code examples |
| requests / axios | HTTP client library |
| beautifulsoup4 / cheerio | HTML parsing in Step 2 |
| Target site URL | The page serving the CAPTCHA |
| CAPTCHA site key | Extracted from the page source |

Step 1: Identify the CAPTCHA Type

Before solving, identify what CAPTCHA the site uses. Check the page source:

reCAPTCHA v2:

<div class="g-recaptcha" data-sitekey="6Le-wvkS..."></div>

reCAPTCHA v3:

<script src="https://www.google.com/recaptcha/api.js?render=6Le-wvkS..."></script>

Cloudflare Turnstile:

<div class="cf-turnstile" data-sitekey="0x4AAAAA..."></div>

Each type requires a different method parameter when submitting to CaptchaAI.
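
The three markers above can be checked programmatically. The following is a rough sketch using only the standard library. The function name `detect_captcha` and the type labels are hypothetical, and the patterns are modeled on the snippets in this section, so pages that order their attributes differently or render the widget from JavaScript will not match:

```python
import re

# Each pattern captures the site key. The two widget patterns assume the class
# attribute appears before data-sitekey, as in the snippets above.
_PATTERNS = [
    ("recaptcha_v2", re.compile(
        r'class="[^"]*\bg-recaptcha\b[^"]*"[^>]*\bdata-sitekey="([^"]+)"')),
    ("turnstile", re.compile(
        r'class="[^"]*\bcf-turnstile\b[^"]*"[^>]*\bdata-sitekey="([^"]+)"')),
    ("recaptcha_v3", re.compile(
        r'recaptcha/api\.js\?[^"\']*\brender=([^"&\']+)')),
]

def detect_captcha(html):
    """Return (captcha_type, site_key) for the first marker found, else None."""
    for name, pattern in _PATTERNS:
        match = pattern.search(html)
        # render=explicit is a reCAPTCHA v2 loading option, not a v3 site key
        if match and match.group(1) != "explicit":
            return name, match.group(1)
    return None
```

A `None` result does not prove the page is CAPTCHA-free; it only means none of these three static markers was present in the HTML that was fetched.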

Step 2: Extract the Site Key

Python (with requests + BeautifulSoup)

from bs4 import BeautifulSoup
import requests

# A timeout keeps the scraper from hanging on an unresponsive server
page = requests.get("https://example.com/protected-page", headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
}, timeout=30)
soup = BeautifulSoup(page.text, "html.parser")

# reCAPTCHA v2: the site key is the data-sitekey attribute of the widget div
recaptcha_div = soup.find("div", class_="g-recaptcha")
if recaptcha_div:
    site_key = recaptcha_div["data-sitekey"]
    print(f"reCAPTCHA v2 site key: {site_key}")
else:
    print("No reCAPTCHA v2 widget found in the page source")

Node.js (with cheerio)

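const axios = require("axios");
const cheerio = require("cheerio");

// await is not allowed at the top level of a CommonJS script, so wrap the calls
async function main() {
  const { data } = await axios.get("https://example.com/protected-page");
  const $ = cheerio.load(data);

  // reCAPTCHA v2: read the data-sitekey attribute of the widget div
  const siteKey = $(".g-recaptcha").attr("data-sitekey");
  console.log("Site key:", siteKey);
}

main().catch(console.error);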

Step 3: Submit the CAPTCHA to CaptchaAI

Python

import requests
import time

API_KEY = "YOUR_API_KEY"
SITE_KEY = "6Le-wvkS..."
PAGE_URL = "https://example.com/protected-page"

# Submit
resp = requests.get("https://ocr.captchaai.com/in.php", params={
    "key": API_KEY,
    "method": "userrecaptcha",
    "googlekey": SITE_KEY,
    "pageurl": PAGE_URL
})

if not resp.text.startswith("OK|"):
    raise Exception(f"Submit error: {resp.text}")

task_id = resp.text.split("|")[1]
print(f"Task submitted: {task_id}")

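# Poll for the result every 5 seconds, giving up after 60 attempts (about 5 minutes)
for _ in range(60):
    time.sleep(5)
    result = requests.get("https://ocr.captchaai.com/res.php", params={
        "key": API_KEY,
        "action": "get",
        "id": task_id
    }, timeout=30)
    if result.text == "CAPCHA_NOT_READY":
        continue
    if result.text.startswith("OK|"):
        token = result.text.split("|", 1)[1]
        print(f"Solved! Token: {token[:50]}...")
        break
    raise Exception(f"Solve error: {result.text}")
else:
    # The loop ended without a break, so no token was ever returned
    raise TimeoutError("CAPTCHA solve timed out")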

Node.js

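const axios = require("axios");

const API_KEY = "YOUR_API_KEY";
const SITE_KEY = "6Le-wvkS...";
const PAGE_URL = "https://example.com/protected-page";

const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

// await is not allowed at the top level of a CommonJS script, so wrap the calls
async function main() {
  // Submit
  const submitResp = await axios.get("https://ocr.captchaai.com/in.php", {
    params: {
      key: API_KEY,
      method: "userrecaptcha",
      googlekey: SITE_KEY,
      pageurl: PAGE_URL,
    },
  });

  // Anything other than "OK|<id>" is an error code
  if (!String(submitResp.data).startsWith("OK|")) {
    throw new Error(`Submit error: ${submitResp.data}`);
  }
  const taskId = submitResp.data.split("|")[1];

  // Poll every 5 seconds, giving up after 60 attempts (about 5 minutes)
  for (let attempt = 0; attempt < 60; attempt++) {
    await sleep(5000);
    const result = await axios.get("https://ocr.captchaai.com/res.php", {
      params: { key: API_KEY, action: "get", id: taskId },
    });
    if (result.data === "CAPCHA_NOT_READY") continue;
    if (String(result.data).startsWith("OK|")) {
      const token = result.data.split("|")[1];
      console.log("Token:", token.substring(0, 50));
      return token;
    }
    throw new Error(`Error: ${result.data}`);
  }
  throw new Error("CAPTCHA solve timed out");
}

main().catch(console.error);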
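
Both examples branch on the same three response shapes: `OK|<value>`, `CAPCHA_NOT_READY`, and anything else, which is treated as an error code. As a small sketch, that branching can be isolated in one function so it can be tested without network access. The name `parse_response` and the status labels are illustrative, not part of the CaptchaAI API:

```python
def parse_response(text):
    """Classify a plain-text reply from in.php or res.php.

    Returns ("ok", value), ("pending", None) or ("error", code).
    """
    text = text.strip()
    if text == "CAPCHA_NOT_READY":
        return "pending", None
    if text.startswith("OK|"):
        # Split once so a value that itself contains "|" stays intact
        return "ok", text.split("|", 1)[1]
    return "error", text
```

For `in.php` the value after `OK|` is the task ID; for `res.php` it is the solved token.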

Step 4: Submit the Token to the Target Site

Once you have the token, submit it with the form data the site expects:

Python

# Submit the solved token with the form
form_data = {
    "g-recaptcha-response": token,
    "username": "user@example.com",
    "password": "password123"
}

response = requests.post(PAGE_URL, data=form_data, headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
})

print(f"Status: {response.status_code}")
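
What "the form data the site expects" contains varies by site. Many forms include hidden inputs, such as CSRF tokens, that have to be posted back alongside the CAPTCHA token. The sketch below collects them with the standard library's HTML parser. The helper names and the `csrf_token` field in the usage note are hypothetical, and the field name `g-recaptcha-response` applies to reCAPTCHA only:

```python
from html.parser import HTMLParser

class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""

def build_form_data(form_html, token, extra=None):
    """Merge hidden fields, caller-supplied fields and the solved token."""
    collector = HiddenInputCollector()
    collector.feed(form_html)
    data = dict(collector.fields)
    data.update(extra or {})
    data["g-recaptcha-response"] = token
    return data
```

For a page whose form contains `<input type="hidden" name="csrf_token" value="abc123">`, calling `build_form_data(page.text, token, {"username": "user@example.com"})` yields a dictionary that includes `csrf_token`, `username`, and the token, ready to pass as `data=` to `requests.post`.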

Step 5: Build a Reusable Scraper Function

Wrap the solve logic into a reusable function:

import requests
import time

API_KEY = "YOUR_API_KEY"

def solve_captcha(site_key, page_url, method="userrecaptcha"):
    """Submit a CAPTCHA to CaptchaAI and poll until a token is returned."""
    # Note: googlekey is the reCAPTCHA site key parameter. Other methods may
    # expect a different parameter name; check the CaptchaAI docs for them.
    resp = requests.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY,
        "method": method,
        "googlekey": site_key,
        "pageurl": page_url
    }, timeout=30)
    if not resp.text.startswith("OK|"):
        raise Exception(resp.text)
    task_id = resp.text.split("|", 1)[1]

    # Poll every 5 seconds, up to 60 times (about 5 minutes)
    for _ in range(60):
        time.sleep(5)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id
        }, timeout=30)
        if result.text == "CAPCHA_NOT_READY":
            continue
        if result.text.startswith("OK|"):
            return result.text.split("|", 1)[1]
        raise Exception(result.text)
    raise TimeoutError("CAPTCHA solve timed out")

# Use in your scraper
def scrape_page(url, site_key):
    token = solve_captcha(site_key, url)
    response = requests.post(url, data={"g-recaptcha-response": token}, timeout=30)
    return response.text

Troubleshooting

| Error | Cause | Fix |
|---|---|---|
| ERROR_WRONG_USER_KEY | Invalid API key | Check your key at captchaai.com dashboard |
| ERROR_ZERO_BALANCE | No funds | Add balance to your account |
| ERROR_CAPTCHA_UNSOLVABLE | Challenge couldn't be solved | Verify the site key and URL are correct |
| CAPCHA_NOT_READY (loops forever) | Slow solve or wrong parameters | Increase timeout; verify site key matches the page |
| Token rejected by site | Token expired or wrong site key | Use token within 120 seconds; confirm site key |
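
The errors in this table call for different handling in code. A bad key or an empty balance will not fix itself, so retrying only wastes requests, while an unsolvable challenge or a timeout may succeed on a fresh attempt. The wrapper below is a sketch of that policy. The choice of which errors to retry is an assumption rather than CaptchaAI guidance, and it relies on the solver raising an exception whose message is the error code, as the Step 5 function does:

```python
import time

# Assumption: these outcomes are worth a fresh attempt; everything else fails fast
RETRYABLE = {"ERROR_CAPTCHA_UNSOLVABLE"}

def solve_with_retry(solve, attempts=3, delay=0.0):
    """Call solve() until it returns a token or the attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return solve()
        except TimeoutError as exc:
            last_error = exc
        except Exception as exc:
            if str(exc) not in RETRYABLE:
                raise  # e.g. ERROR_WRONG_USER_KEY, ERROR_ZERO_BALANCE
            last_error = exc
        if attempt < attempts - 1:
            time.sleep(delay)
    raise last_error
```

With the Step 5 function this could be called as `solve_with_retry(lambda: solve_captcha(site_key, url))`.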

Best Practices

  1. Rotate user agents: Use realistic browser User-Agent strings
  2. Add delays: Space requests 2-5 seconds apart to avoid rate limits
  3. Use proxies: Rotate residential proxies to distribute requests
  4. Handle cookies: Maintain session cookies across requests
  5. Reuse sessions, not tokens: reCAPTCHA and Turnstile tokens can be verified only once, so request a fresh token for each submission and, where the site keeps you verified through cookies, reuse that session for follow-up requests
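
The first two practices can be combined in a small helper that picks a User-Agent and a randomized delay for each request. This is a sketch under stated assumptions: the function name is hypothetical, the pool of User-Agent strings is supplied by the caller, and the 2-5 second default comes from the list above:

```python
import random

def next_request_settings(user_agents, min_delay=2.0, max_delay=5.0, rng=random):
    """Pick a User-Agent and a jittered delay for the next request."""
    if not user_agents:
        raise ValueError("user_agents must not be empty")
    return {
        "headers": {"User-Agent": rng.choice(user_agents)},
        # A random delay avoids the fixed rhythm of a constant sleep
        "delay": rng.uniform(min_delay, max_delay),
    }
```

In a scraper loop, sleep for `settings["delay"]` and then send the request with `settings["headers"]`. Sending it through a single `requests.Session` also covers the cookie practice, since a session keeps cookies across requests.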

FAQ

Does this work with Cloudflare-protected sites?

Yes. Use method=turnstile for Turnstile CAPTCHAs or method=cloudflare_challenge for full Cloudflare challenge pages. See How to Bypass Cloudflare Turnstile.
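
Switching methods changes the parameters sent to `in.php`. The sketch below builds that parameter dictionary with the site key parameter name left configurable. `googlekey` is the name used for reCAPTCHA in Step 3; the `sitekey` name shown for Turnstile in the usage note is an assumption based on similar solver APIs and should be confirmed against the CaptchaAI documentation:

```python
def build_submit_params(api_key, method, site_key, page_url,
                        key_param="googlekey", **extra):
    """Build the query parameters for a submit request to in.php."""
    params = {
        "key": api_key,
        "method": method,
        key_param: site_key,   # parameter name depends on the CAPTCHA type
        "pageurl": page_url,
    }
    # Any method-specific extras are passed through unchanged
    params.update(extra)
    return params
```

A Turnstile submission would then look like `build_submit_params(API_KEY, "turnstile", "0x4AAAAA...", PAGE_URL, key_param="sitekey")`, with the result passed as `params=` to `requests.get`.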

Do I need a headless browser?

Not always. For simple form submissions with reCAPTCHA, plain HTTP requests work. For JavaScript-heavy sites, combine CaptchaAI with Selenium or Puppeteer.

How much does it cost to scrape 10,000 pages?

At CaptchaAI's rates, solving 10,000 reCAPTCHA v2 challenges costs approximately $10. Image CAPTCHAs are even cheaper.
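
The figure above implies roughly $1 per 1,000 reCAPTCHA v2 solves ($10 / 10,000). A scraper usually does not hit a CAPTCHA on every page, so a rough estimate can scale by the fraction of pages that trigger one. In this sketch the default price is derived from this article's figure rather than taken from a price list, and `captcha_rate` is a value you would measure for your own target:

```python
def estimate_cost(pages, captcha_rate=1.0, price_per_1000=1.00):
    """Estimate solving cost in dollars.

    pages: number of pages to scrape
    captcha_rate: fraction of pages that serve a CAPTCHA (0.0 to 1.0)
    price_per_1000: dollars per 1,000 solves (implied by the article's figure)
    """
    solves = pages * captcha_rate
    return solves / 1000 * price_per_1000

print(estimate_cost(10_000))                              # every page challenged
print(round(estimate_cost(10_000, captcha_rate=0.2), 2))  # one page in five
```

The first call reproduces the $10 estimate; the second shows how the cost drops when only some pages are challenged.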

Can I solve CAPTCHAs in parallel?

Yes. Submit multiple tasks simultaneously and poll for each result. See Solving Multiple CAPTCHAs in Parallel.
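
One way to do this in Python is a thread pool, since each solve spends most of its time waiting on the network. The sketch below takes the solver as an argument, for example the `solve_captcha` function from Step 5. The name `solve_many` and the default worker count are illustrative assumptions, not a CaptchaAI limit:

```python
from concurrent.futures import ThreadPoolExecutor

def solve_many(solve, jobs, max_workers=5):
    """Run solve(site_key, page_url) for each job concurrently.

    jobs is a list of (site_key, page_url) pairs. Results come back in the
    same order as ("ok", token) or ("error", message), so one failed solve
    does not discard the others.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(solve, site_key, page_url)
                   for site_key, page_url in jobs]
        results = []
        for future in futures:
            try:
                results.append(("ok", future.result()))
            except Exception as exc:
                results.append(("error", str(exc)))
        return results
```

Keep `max_workers` modest until you know how many concurrent tasks your account allows.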

Full Working Code

Complete runnable examples for this article in Python, Node.js, PHP, Go, Java, C#, Ruby, Rust, Kotlin & Bash.

View on GitHub →
