How to Handle CAPTCHA Challenges in Web Scraping Workflows

CAPTCHAs are one of the most common blockers in web scraping workflows. When a target site serves a reCAPTCHA, Cloudflare Turnstile, or image CAPTCHA, your scraper receives a challenge instead of the page and cannot continue. CaptchaAI's API solves these challenges and returns a token, so your scraper can keep running.

How CAPTCHA Blocking Works in Scraping

Websites trigger CAPTCHAs based on behavioral signals:

Signal	Trigger
Request rate	Too many requests from one IP
Missing cookies	No session or preference cookies
Bot-like headers	Missing `Accept-Language`, `Referer`
JavaScript fingerprint	No JS execution or headless browser detected
IP reputation	Datacenter or proxy IP flagged

When triggered, the site returns a CAPTCHA challenge instead of the page content. Your scraper needs to solve it and submit the token to proceed.

Requirements

Requirement	Details
CaptchaAI API key	From captchaai.com
Python 3.7+ or Node.js 16+	For code examples
`requests` / `axios`	HTTP client library
Target site URL	The page serving the CAPTCHA
CAPTCHA site key	Extracted from the page source

Step 1: Identify the CAPTCHA Type

Before solving, identify which CAPTCHA the site uses. Check the page source for one of these markers:

reCAPTCHA v2:

<div class="g-recaptcha" data-sitekey="6Le-wvkS..."></div>

reCAPTCHA v3:

<script src="https://www.google.com/recaptcha/api.js?render=6Le-wvkS..."></script>

Cloudflare Turnstile:

<div class="cf-turnstile" data-sitekey="0x4AAAAA..."></div>

Each type requires a different method parameter when submitting to CaptchaAI.
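The lookup below is a minimal sketch of that mapping. The `method` values are the ones this article itself uses (`userrecaptcha` in the submit examples, `turnstile` and `cloudflare_challenge` in the FAQ). The type labels such as `recaptcha_v2` and the helper name `method_for` are hypothetical. reCAPTCHA v3 is left unmapped because the article does not name its method, so check the CaptchaAI documentation before adding it.

```python
# Hypothetical type labels mapped to the method values named in this article.
METHOD_BY_TYPE = {
    "recaptcha_v2": "userrecaptcha",
    "turnstile": "turnstile",
    "cloudflare_challenge": "cloudflare_challenge",
}


def method_for(captcha_type: str) -> str:
    """Return the CaptchaAI method value for a detected CAPTCHA type."""
    try:
        return METHOD_BY_TYPE[captcha_type]
    except KeyError:
        # Fail loudly rather than guess a method for an unmapped type.
        raise ValueError(f"No method mapped for CAPTCHA type: {captcha_type!r}") from None
```

Raising on an unknown type keeps a misidentified CAPTCHA from being submitted with the wrong method.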

Step 2: Extract the Site Key

Python (with requests + BeautifulSoup)

from bs4 import BeautifulSoup
import requests

page = requests.get("https://example.com/protected-page", headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
}, timeout=30)
soup = BeautifulSoup(page.text, "html.parser")

# reCAPTCHA v2: the site key is the data-sitekey attribute of the widget div
recaptcha_div = soup.find("div", class_="g-recaptcha")
if recaptcha_div:
    site_key = recaptcha_div["data-sitekey"]
    print(f"reCAPTCHA v2 site key: {site_key}")
else:
    print("No reCAPTCHA v2 widget found in the page source")

Node.js (with cheerio)

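const axios = require("axios");
const cheerio = require("cheerio");

// `await` is not valid at the top level of a CommonJS module,
// so the requests run inside an async function.
async function main() {
  const { data } = await axios.get("https://example.com/protected-page");
  const $ = cheerio.load(data);

  // reCAPTCHA v2: read data-sitekey from the widget div
  const siteKey = $(".g-recaptcha").attr("data-sitekey");
  console.log("Site key:", siteKey);
}

main().catch(console.error);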
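The examples above only cover reCAPTCHA v2. As a rough sketch, the same idea can be extended to the three markers shown in Step 1 using only the Python standard library. The names `SiteKeyFinder` and `find_site_key` and the returned type labels are hypothetical, and the sketch assumes the widget markup is present in the static HTML rather than injected later by JavaScript.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse


class SiteKeyFinder(HTMLParser):
    """Record the first CAPTCHA widget or script tag found in the page."""

    def __init__(self) -> None:
        super().__init__()
        self.found: Optional[Tuple[str, str]] = None  # (type label, site key)

    def handle_starttag(self, tag, attrs):
        if self.found:
            return
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        src = attr.get("src") or ""
        if "cf-turnstile" in classes and attr.get("data-sitekey"):
            self.found = ("turnstile", attr["data-sitekey"])
        elif "g-recaptcha" in classes and attr.get("data-sitekey"):
            self.found = ("recaptcha_v2", attr["data-sitekey"])
        elif tag == "script" and "recaptcha/api.js" in src:
            # reCAPTCHA v3 passes the site key as ?render=<key>;
            # render=explicit belongs to v2 explicit rendering, so skip it.
            render = parse_qs(urlparse(src).query).get("render", [""])[0]
            if render and render != "explicit":
                self.found = ("recaptcha_v3", render)


def find_site_key(html: str) -> Optional[Tuple[str, str]]:
    """Return (type label, site key), or None if no known marker is present."""
    finder = SiteKeyFinder()
    finder.feed(html)
    return finder.found
```

A `None` result does not prove the page is CAPTCHA-free; it may mean the widget is rendered client-side and needs a browser to appear.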

Step 3: Submit the CAPTCHA to CaptchaAI

Python

import requests
import time

API_KEY = "YOUR_API_KEY"
SITE_KEY = "6Le-wvkS..."
PAGE_URL = "https://example.com/protected-page"

# Submit
resp = requests.get("https://ocr.captchaai.com/in.php", params={
    "key": API_KEY,
    "method": "userrecaptcha",
    "googlekey": SITE_KEY,
    "pageurl": PAGE_URL
})

if not resp.text.startswith("OK|"):
    raise Exception(f"Submit error: {resp.text}")

task_id = resp.text.split("|")[1]
print(f"Task submitted: {task_id}")

# Poll every 5 seconds; give up after 60 attempts (about 5 minutes)
for _ in range(60):
    time.sleep(5)
    result = requests.get("https://ocr.captchaai.com/res.php", params={
        "key": API_KEY,
        "action": "get",
        "id": task_id
    })
    if result.text == "CAPCHA_NOT_READY":
        continue  # still being solved
    if result.text.startswith("OK|"):
        token = result.text.split("|", 1)[1]
        print(f"Solved! Token: {token[:50]}...")
        break
    raise Exception(f"Solve error: {result.text}")
else:
    raise TimeoutError("CAPTCHA solve timed out")

Node.js

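const axios = require("axios");

const API_KEY = "YOUR_API_KEY";
const SITE_KEY = "6Le-wvkS...";
const PAGE_URL = "https://example.com/protected-page";

const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

// `await` is not valid at the top level of a CommonJS module,
// so the submit-and-poll flow runs inside an async function.
async function solveCaptcha() {
  // Submit
  const submitResp = await axios.get("https://ocr.captchaai.com/in.php", {
    params: {
      key: API_KEY,
      method: "userrecaptcha",
      googlekey: SITE_KEY,
      pageurl: PAGE_URL,
    },
  });
  if (!submitResp.data.startsWith("OK|")) {
    throw new Error(`Submit error: ${submitResp.data}`);
  }
  const taskId = submitResp.data.split("|")[1];

  // Poll every 5 seconds; give up after 60 attempts (about 5 minutes)
  for (let attempt = 0; attempt < 60; attempt++) {
    await sleep(5000);
    const result = await axios.get("https://ocr.captchaai.com/res.php", {
      params: { key: API_KEY, action: "get", id: taskId },
    });
    if (result.data === "CAPCHA_NOT_READY") continue; // still being solved
    if (result.data.startsWith("OK|")) {
      return result.data.split("|")[1];
    }
    throw new Error(`Solve error: ${result.data}`);
  }
  throw new Error("CAPTCHA solve timed out");
}

solveCaptcha()
  .then((token) => console.log("Token:", token.substring(0, 50)))
  .catch(console.error);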
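Both examples branch on the same three plain-text reply shapes: `OK|<value>`, `CAPCHA_NOT_READY`, and anything else as an error. The helper below is a small sketch that isolates that parsing so it can be unit-tested without network access. The names `ApiReply` and `parse_reply` are hypothetical, and only the reply shapes that appear in this article are assumed.

```python
from typing import NamedTuple, Optional


class ApiReply(NamedTuple):
    status: str            # "ok", "pending", or "error"
    value: Optional[str]   # task ID or token on "ok", error text on "error"


def parse_reply(text: str) -> ApiReply:
    """Classify a plain-text reply from in.php or res.php."""
    text = text.strip()
    if text == "CAPCHA_NOT_READY":
        return ApiReply("pending", None)
    if text.startswith("OK|"):
        # Split once so any later "|" stays part of the value.
        return ApiReply("ok", text.split("|", 1)[1])
    return ApiReply("error", text)
```

With this in place, the submit and poll code can switch on `status` instead of repeating string checks.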

Step 4: Submit the Token to the Target Site

Once you have the token, submit it with the form data the site expects:

Python

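# Continues from Step 3: reuses `token`, `PAGE_URL`, and the `requests` import.
# Only g-recaptcha-response carries the token; the other field names are
# examples, so use the ones the target form defines.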
form_data = {
    "g-recaptcha-response": token,
    "username": "user@example.com",
    "password": "password123"
}

response = requests.post(PAGE_URL, data=form_data, headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
})

print(f"Status: {response.status_code}")
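What "the form data the site expects" means varies per site. Many forms also carry hidden inputs that have to be echoed back. The sketch below, which assumes such hidden inputs exist in the page HTML, collects them with the standard library and merges in the token. `HiddenInputCollector`, `build_form_payload`, and the `csrf_token` field used in the usage note are hypothetical names for illustration.

```python
from html.parser import HTMLParser
from typing import Dict


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_form_payload(page_html: str, token: str, visible_fields: Dict[str, str]) -> Dict[str, str]:
    """Merge hidden inputs, caller-supplied fields, and the solved token."""
    collector = HiddenInputCollector()
    collector.feed(page_html)
    payload = dict(collector.fields)
    payload.update(visible_fields)
    # Set the token last so an empty placeholder in the page cannot override it.
    payload["g-recaptcha-response"] = token
    return payload
```

For a page containing `<input type="hidden" name="csrf_token" value="abc">`, the payload would include `csrf_token` alongside the token and whatever fields the caller passes in.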

Step 5: Build a Reusable Scraper Function

Wrap the solve logic into a reusable function:

import requests
import time

API_KEY = "YOUR_API_KEY"

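def solve_captcha(site_key, page_url, method="userrecaptcha"):
    """Submit a CAPTCHA to CaptchaAI and return the solved token.

    The site key is sent as `googlekey`, matching the reCAPTCHA examples
    above; other CAPTCHA types may expect a different parameter name.
    """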
    resp = requests.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY,
        "method": method,
        "googlekey": site_key,
        "pageurl": page_url
    })
    if not resp.text.startswith("OK|"):
        raise Exception(resp.text)
    task_id = resp.text.split("|")[1]

    for _ in range(60):
        time.sleep(5)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id
        })
        if result.text == "CAPCHA_NOT_READY":
            continue
        if result.text.startswith("OK|"):
            return result.text.split("|")[1]
        raise Exception(result.text)
    raise TimeoutError("CAPTCHA solve timed out")

# Use in your scraper
def scrape_page(url, site_key):
    token = solve_captcha(site_key, url)
    response = requests.post(url, data={"g-recaptcha-response": token})
    return response.text
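`scrape_page` above solves a CAPTCHA on every call. A possible refinement, sketched here under the assumption that a challenge page can be recognised by the markers from Step 1, is to solve only when the response actually contains one. The names `needs_captcha` and `scrape_with_fallback` are hypothetical, and the fetch and solve steps are passed in as callables so the control flow can be tested without network access.

```python
from typing import Callable

# Substrings from the Step 1 snippets; a heuristic, not a complete detector.
CAPTCHA_MARKERS = ("g-recaptcha", "cf-turnstile", "recaptcha/api.js")


def needs_captcha(html: str) -> bool:
    """Return True if the page looks like it contains a CAPTCHA challenge."""
    return any(marker in html for marker in CAPTCHA_MARKERS)


def scrape_with_fallback(
    url: str,
    fetch: Callable[[str], str],
    solve_and_submit: Callable[[str, str], str],
) -> str:
    """Fetch a page; run the solve flow only if a challenge is detected."""
    html = fetch(url)
    if needs_captcha(html):
        # solve_and_submit receives the URL and the challenge HTML and
        # returns the page content obtained after submitting the token.
        html = solve_and_submit(url, html)
    return html
```

In practice `fetch` could wrap `requests.get(...).text` and `solve_and_submit` could wrap the solve-then-POST logic from this step.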

Troubleshooting

Error	Cause	Fix
`ERROR_WRONG_USER_KEY`	Invalid API key	Check your key in the captchaai.com dashboard
`ERROR_ZERO_BALANCE`	No funds	Add balance to your account
`ERROR_CAPTCHA_UNSOLVABLE`	Challenge couldn't be solved	Verify the site key and URL are correct
`CAPCHA_NOT_READY` (loops forever)	Slow solve or wrong parameters	Increase timeout; verify site key matches the page
Token rejected by site	Token expired or wrong site key	Use token within 120 seconds; confirm site key
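The 120-second window in the last row can be enforced in code. The sketch below wraps a token with the time it was received and refuses to treat it as fresh near the end of that window. `TimedToken` and the 10-second safety margin are assumptions for illustration. The clock starts when your code receives the token, which is slightly later than when it was issued, hence the margin.

```python
import time
from typing import Callable

TOKEN_TTL_SECONDS = 120  # validity window from the troubleshooting table


class TimedToken:
    """A solved token plus the moment it was received."""

    def __init__(self, value: str, clock: Callable[[], float] = time.monotonic) -> None:
        self.value = value
        self._clock = clock
        self._received = clock()

    def age(self) -> float:
        """Seconds elapsed since the token was received."""
        return self._clock() - self._received

    def is_fresh(self, margin: float = 10.0) -> bool:
        """True while the token is safely inside its validity window."""
        return self.age() < TOKEN_TTL_SECONDS - margin
```

Checking `is_fresh()` just before the POST lets the scraper request a new token instead of submitting one that is likely to be rejected.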

Best Practices

Rotate user agents: use realistic browser User-Agent strings
Add delays: space requests 2-5 seconds apart to avoid rate limits
Use proxies: rotate residential proxies to distribute requests
Handle cookies: maintain session cookies across requests
Cache tokens: some token types may be accepted for multiple requests within their validity window, but a reCAPTCHA token can be verified only once, so request a fresh one for each submission
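The first two practices can be combined in a small helper. This is a sketch only: `PoliteRequester` is a hypothetical name, the `Accept-Language` value is an example, and the caller supplies the User-Agent strings. The sleep function is injectable so the pacing can be tested without waiting.

```python
import random
import time
from itertools import cycle
from typing import Callable, Dict, Sequence


class PoliteRequester:
    """Rotate User-Agent strings and pause 2-5 seconds between requests."""

    def __init__(
        self,
        user_agents: Sequence[str],
        min_delay: float = 2.0,
        max_delay: float = 5.0,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._agents = cycle(user_agents)
        self._min_delay = min_delay
        self._max_delay = max_delay
        self._sleep = sleep
        self._first = True

    def next_headers(self) -> Dict[str, str]:
        """Wait (except before the first request) and return fresh headers."""
        if not self._first:
            self._sleep(random.uniform(self._min_delay, self._max_delay))
        self._first = False
        return {
            "User-Agent": next(self._agents),
            "Accept-Language": "en-US,en;q=0.9",  # example value
        }
```

Call `next_headers()` before each request and pass the result as `headers=` to a shared `requests.Session`, which also takes care of keeping cookies across requests.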

FAQ

Does this work with Cloudflare-protected sites?

Yes. Use method=turnstile for Turnstile CAPTCHAs or method=cloudflare_challenge for full Cloudflare challenge pages. See How to Bypass Cloudflare Turnstile.

Do I need a headless browser?

Not always. For simple form submissions with reCAPTCHA, plain HTTP requests work. For JavaScript-heavy sites, combine CaptchaAI with Selenium or Puppeteer.

How much does it cost to scrape 10,000 pages?

At CaptchaAI's rates, solving 10,000 reCAPTCHA v2 challenges costs approximately $10. Image CAPTCHAs are even cheaper.
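The arithmetic behind that figure is simple enough to sketch. The default rate of $1.00 per 1,000 solves is only what the $10 estimate above implies, not a published price, so substitute the current rate from your account. `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(solves: int, price_per_1000: float = 1.00) -> float:
    """Estimate solve cost in dollars.

    The default price is an assumption derived from the article's
    "10,000 solves for about $10" figure.
    """
    return solves / 1000 * price_per_1000
```

The estimate counts solves, not pages: if only some pages trigger a CAPTCHA, pass the number of challenges you expect rather than the page count.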

Can I solve CAPTCHAs in parallel?

Yes. Submit multiple tasks simultaneously and poll for each result. See Solving Multiple CAPTCHAs in Parallel.
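One way to do this in Python is a thread pool, since each solve spends most of its time waiting on HTTP polling. The sketch below is an assumption about how you might structure it, not CaptchaAI's recommended approach. `solve_many` is a hypothetical helper, and `solve` can be any callable with the same `(site_key, page_url)` signature as `solve_captcha` from Step 5.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple, Union

Job = Tuple[str, str]  # (site_key, page_url)


def solve_many(
    jobs: Iterable[Job],
    solve: Callable[[str, str], str],
    max_workers: int = 5,
) -> Dict[Job, Union[str, Exception]]:
    """Solve several CAPTCHAs concurrently.

    Returns a dict mapping each job to its token, or to the exception
    raised while solving it, so one failure does not discard the rest.
    """
    results: Dict[Job, Union[str, Exception]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(solve, key, url): (key, url) for key, url in jobs}
        for future in as_completed(futures):
            job = futures[future]
            try:
                results[job] = future.result()
            except Exception as exc:  # keep the error next to its job
                results[job] = exc
    return results
```

Keep `max_workers` modest; the right value depends on your account limits and on how quickly the resulting tokens can be used before they expire.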

Full Working Code

Complete runnable examples for this article in Python, Node.js, PHP, Go, Java, C#, Ruby, Rust, Kotlin & Bash.

