Proxy Rotation for CAPTCHA Scraping

Proxy rotation reduces CAPTCHA frequency by distributing requests across multiple IPs. Combined with CaptchaAI for solving the CAPTCHAs that still appear, it gives you a scraping pipeline that copes with most anti-bot systems.

Why Proxy Rotation Reduces CAPTCHAs

Sites trigger CAPTCHAs based on per-IP request patterns:

Factor	Single IP	Rotating Proxies
Requests per minute	10+ triggers CAPTCHA	Distributed across IPs
IP reputation	Degrades over time	Fresh IPs from pool
Session patterns	Suspicious patterns visible	Patterns spread across IPs
Geographic consistency	Single location	Natural geographic diversity
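
As a rough sizing aid, a per-IP request budget translates directly into a minimum pool size. The sketch below assumes every IP gets the same budget; the 10 requests per minute from the table is only an example, real thresholds vary by site, and `required_pool_size` is a hypothetical helper name.

```python
import math


def required_pool_size(target_rpm: float, per_ip_rpm: float) -> int:
    """Smallest number of proxies that keeps each IP at or below per_ip_rpm.

    target_rpm: total requests per minute the scraper should sustain.
    per_ip_rpm: assumed per-IP budget before a site starts showing CAPTCHAs.
    """
    if target_rpm < 0 or per_ip_rpm <= 0:
        raise ValueError("target_rpm must be >= 0 and per_ip_rpm must be > 0")
    return max(1, math.ceil(target_rpm / per_ip_rpm))


if __name__ == "__main__":
    # 120 requests/minute overall with a budget of 10 per IP
    print(required_pool_size(120, 10))
```

In practice it is safer to size the pool well above this minimum, since some proxies will be slow or already flagged.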

Proxy Types for Scraping

Type	Best For	CAPTCHA Rate	Cost
Residential	High-value targets (Google, Amazon)	Lowest	$$$
Mobile	Ultra-low detection	Lowest	$$$$
ISP/Static	Sustained sessions	Low	$$
Datacenter	High-volume, lenient sites	Higher	$

Recommendation: Use residential proxies for sites with aggressive CAPTCHA triggers. Datacenter proxies work for less protected sites.

Basic Proxy Rotation (Python)

import requests
import random
import time

PROXIES = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]

API_KEY = "YOUR_API_KEY"

def get_random_proxy():
    proxy = random.choice(PROXIES)
    return {"http": proxy, "https": proxy}

def scrape_with_rotation(url):
    # One session per call, bound to a randomly chosen proxy
    proxy = get_random_proxy()
    session = requests.Session()
    session.proxies = proxy
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    })

    resp = session.get(url, timeout=30)

    # If a reCAPTCHA widget appears, solve it and resubmit through the same session
    if "g-recaptcha" in resp.text or "captcha" in resp.text.lower():
        from bs4 import BeautifulSoup  # requires the beautifulsoup4 package
        soup = BeautifulSoup(resp.text, "html.parser")
        rc = soup.find("div", class_="g-recaptcha")
        if rc and rc.get("data-sitekey"):
            site_key = rc["data-sitekey"]
            token = solve_captcha(site_key, url)
            resp = session.post(url, data={"g-recaptcha-response": token}, timeout=30)

    return resp.text

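def solve_captcha(site_key, page_url):
    # Submit the reCAPTCHA task; a successful reply looks like "OK|<task id>"
    resp = requests.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY, "method": "userrecaptcha",
        "googlekey": site_key, "pageurl": page_url
    }, timeout=30)
    if not resp.text.startswith("OK|"):
        raise RuntimeError(f"CAPTCHA submit failed: {resp.text}")
    task_id = resp.text.split("|", 1)[1]

    # Poll every 5 seconds, for up to 5 minutes
    for _ in range(60):
        time.sleep(5)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id
        }, timeout=30)
        if result.text == "CAPCHA_NOT_READY":
            continue
        if result.text.startswith("OK|"):
            return result.text.split("|", 1)[1]
        raise RuntimeError(f"CAPTCHA solve failed: {result.text}")
    raise TimeoutError("CAPTCHA was not solved within 5 minutes")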
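
The substring test `"captcha" in resp.text.lower()` also matches pages that merely mention the word. A stricter check could look for the widget element itself and read its `data-sitekey`. The sketch below does this with the standard library's `html.parser` only; `SiteKeyParser` and `extract_site_key` are hypothetical names, not part of CaptchaAI or BeautifulSoup, and it assumes the reCAPTCHA markup is present in the static HTML.

```python
from html.parser import HTMLParser
from typing import Optional


class SiteKeyParser(HTMLParser):
    """Records the data-sitekey of the first element whose class list has g-recaptcha."""

    def __init__(self) -> None:
        super().__init__()
        self.site_key: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.site_key is not None:
            return  # already found one, ignore the rest of the page
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "g-recaptcha" in classes and attr_map.get("data-sitekey"):
            self.site_key = attr_map["data-sitekey"]


def extract_site_key(html: str) -> Optional[str]:
    """Return the reCAPTCHA site key found in html, or None if there is no widget."""
    parser = SiteKeyParser()
    parser.feed(html)
    return parser.site_key
```

A caller would treat `None` as "no CAPTCHA to solve" and skip the solver call, which avoids paying for solves on false positives.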

Smart Proxy Rotation

Track which proxies trigger CAPTCHAs and avoid them:

from collections import defaultdict
import random

class SmartProxyRotator:
    def __init__(self, proxies):
        self.proxies = proxies
        self.captcha_count = defaultdict(int)
        self.success_count = defaultdict(int)

    def get_proxy(self):
        # Prefer proxies with lower CAPTCHA rates
        scored = []
        for proxy in self.proxies:
            total = self.captcha_count[proxy] + self.success_count[proxy]
            if total == 0:
                score = 0.5  # Unknown proxy, neutral score
            else:
                score = self.success_count[proxy] / total
            scored.append((proxy, score))

        # Pick at random from the better-scoring half of the pool
        scored.sort(key=lambda x: x[1], reverse=True)
        top_proxies = scored[:max(len(scored) // 2, 1)]
        return random.choice(top_proxies)[0]

    def report_success(self, proxy):
        self.success_count[proxy] += 1

    def report_captcha(self, proxy):
        self.captcha_count[proxy] += 1

# Usage
rotator = SmartProxyRotator(PROXIES)

def scrape(url):
    proxy = rotator.get_proxy()
    resp = requests.get(url, proxies={"http": proxy, "https": proxy})

    if "captcha" in resp.text.lower():
        rotator.report_captcha(proxy)
        # Solve CAPTCHA...
    else:
        rotator.report_success(proxy)

    return resp.text
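
The rotator above picks uniformly from the better-scoring half of the pool. A variant that weights every proxy by its success rate could look like the sketch below. It is an alternative under stated assumptions, not the method used above: `success_weights`, `pick_weighted`, and the `floor` value are hypothetical, and the floor exists so that a proxy with a bad run still gets an occasional retry.

```python
import random
from typing import Dict, Tuple


def success_weights(stats: Dict[str, Tuple[int, int]], floor: float = 0.05) -> Dict[str, float]:
    """Map each proxy to a selection weight from its (successes, captchas) counts."""
    weights = {}
    for proxy, (successes, captchas) in stats.items():
        total = successes + captchas
        # Unknown proxies get the same neutral 0.5 score used by the rotator above
        rate = 0.5 if total == 0 else successes / total
        weights[proxy] = max(rate, floor)
    return weights


def pick_weighted(stats: Dict[str, Tuple[int, int]], rng=random) -> str:
    """Choose one proxy with probability proportional to its weight."""
    weights = success_weights(stats)
    proxies = list(weights)
    return rng.choices(proxies, weights=[weights[p] for p in proxies], k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible in tests.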

Proxy Rotation with Selenium

import random

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def create_driver_with_proxy(proxy_url):
    options = Options()
    options.add_argument(f"--proxy-server={proxy_url}")
    options.add_argument("--disable-blink-features=AutomationControlled")
    return webdriver.Chrome(options=options)

# Rotate proxy per browser session (PROXIES is the list defined earlier)
proxy = random.choice(PROXIES)
driver = create_driver_with_proxy(proxy)
try:
    driver.get("https://example.com")
finally:
    driver.quit()  # Always release the browser, even if the page load fails
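
One caveat with the `--proxy-server` flag: Chrome typically ignores a username and password embedded in the proxy URL, so entries like those in `PROXIES` may need their credentials stripped and supplied another way (for example, IP allowlisting at the proxy provider). The sketch below only builds the flag value from a proxy URL using the standard library; `chrome_proxy_argument` is a hypothetical helper, and how you authenticate is left to your provider's options.

```python
from urllib.parse import urlsplit


def chrome_proxy_argument(proxy_url: str) -> str:
    """Build a --proxy-server flag from a proxy URL, dropping any user:pass part."""
    parts = urlsplit(proxy_url)
    if not parts.scheme or not parts.hostname or not parts.port:
        raise ValueError(f"expected scheme://host:port, got {proxy_url!r}")
    return f"--proxy-server={parts.scheme}://{parts.hostname}:{parts.port}"


if __name__ == "__main__":
    print(chrome_proxy_argument("http://user:pass@proxy1.example.com:8080"))
```

The returned string can be passed to `options.add_argument(...)` in place of the f-string used in `create_driver_with_proxy`.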

Proxy + CAPTCHA Solving for Cloudflare

Cloudflare Challenge solving requires passing a proxy to CaptchaAI:

proxy = "http://user:pass@proxy.example.com:8080"

resp = requests.get("https://ocr.captchaai.com/in.php", params={
    "key": API_KEY,
    "method": "cloudflare_challenge",
    "pageurl": "https://example.com",
    "proxy": proxy,
    "proxytype": "HTTP"
})
if not resp.text.startswith("OK|"):
    raise RuntimeError(f"Task submit failed: {resp.text}")
task_id = resp.text.split("|", 1)[1]

# Poll for cf_clearance cookie
# Use the same proxy for subsequent requests
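
The polling step can be sketched as a small loop that is independent of the HTTP client. It assumes the same reply conventions as `solve_captcha` (`CAPCHA_NOT_READY` while pending, `OK|...` when done); the exact shape of the Cloudflare payload after `OK|` is not shown in this article, so the sketch returns it unparsed. `poll_for_result` and its parameters are hypothetical names.

```python
import time
from typing import Callable


def poll_for_result(
    fetch_status: Callable[[], str],
    attempts: int = 60,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status() until it reports a result, an error, or attempts run out.

    fetch_status is caller-supplied and should return the raw text of one
    res.php reply. Returns whatever follows "OK|" without interpreting it.
    """
    for _ in range(attempts):
        sleep(interval)
        text = fetch_status().strip()
        if text == "CAPCHA_NOT_READY":
            continue
        if text.startswith("OK|"):
            return text.split("|", 1)[1]
        raise RuntimeError(f"solver returned an error: {text}")
    raise TimeoutError("solver did not finish in time")
```

With `requests`, `fetch_status` could be a lambda that calls `res.php` with `task_id` and returns `.text`. Injecting `sleep` keeps the loop testable without real waiting.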

Best Practices

Match proxy geo to target — Use US proxies for US sites
One session per proxy — Don't reuse sessions across different proxies
Rate limit per proxy — Max 5-10 requests/minute per IP
Monitor CAPTCHA rates — Track which proxies trigger more CAPTCHAs
Use sticky sessions — Keep the same proxy for multi-step workflows
Handle proxy failures — Retry with a different proxy on connection errors
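
Two of these practices, per-proxy rate limiting and retrying on a different proxy, can be combined in one small pool object. The following is a minimal sketch under stated assumptions: `ThrottledProxyPool` and `fetch_with_failover` are hypothetical names, the default of 6 requests per minute is simply a value inside the 5-10 range above, and `fetch(url, proxy)` is a caller-supplied function (for example a thin wrapper around `requests.get`).

```python
import time
from typing import Callable, Dict, Iterable, List, Optional


class ThrottledProxyPool:
    """Hands out proxies while enforcing a minimum gap between uses of each one."""

    def __init__(self, proxies: List[str], max_per_minute: int = 6,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.proxies = list(proxies)
        self.min_interval = 60.0 / max_per_minute
        self.clock = clock
        self.last_used: Dict[str, float] = {}

    def acquire(self, exclude: Iterable[str] = ()) -> Optional[str]:
        """Return the longest-idle proxy that is past its cooldown, or None."""
        now = self.clock()
        never = float("-inf")  # proxies that were never used are always ready
        ready = [p for p in self.proxies
                 if p not in exclude
                 and now - self.last_used.get(p, never) >= self.min_interval]
        if not ready:
            return None
        proxy = min(ready, key=lambda p: self.last_used.get(p, never))
        self.last_used[proxy] = now
        return proxy


def fetch_with_failover(url: str, pool: ThrottledProxyPool,
                        fetch: Callable[[str, str], str], max_attempts: int = 3) -> str:
    """Try up to max_attempts different proxies, moving on after connection errors."""
    tried = set()
    last_error = None
    for _ in range(max_attempts):
        proxy = pool.acquire(exclude=tried)
        if proxy is None:
            break  # every remaining proxy is cooling down or already tried
        tried.add(proxy)
        try:
            return fetch(url, proxy)
        except OSError as exc:
            # Built-in ConnectionError and TimeoutError are OSError subclasses;
            # requests' exceptions also derive from OSError (via IOError).
            last_error = exc
    raise RuntimeError(f"no proxy succeeded for {url}") from last_error
```

When `acquire` returns `None`, the caller can sleep briefly and try again rather than exceed the per-IP budget.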

Troubleshooting

Issue	Fix
All proxies trigger CAPTCHAs	Switch to residential proxies; reduce rate
Proxy timeout errors	Remove slow proxies from pool; increase timeout
Different content per proxy	Some sites serve geo-specific content; pin proxies to one region or normalize results before comparing
CAPTCHA tokens don't work with proxy	Ensure token is used from the same session/IP

FAQ

Do I need proxies if I use CaptchaAI?

Not strictly. CaptchaAI can solve CAPTCHAs whether or not you use proxies, but proxies reduce how often CAPTCHAs appear, which saves time and API costs.

Should I use the same proxy for CAPTCHA solving and scraping?

For most CAPTCHA types, the token is valid regardless of IP. For Cloudflare Challenge, you must use the same proxy since the cf_clearance cookie is IP-bound.
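
One way to keep a workflow on the same proxy, as the Cloudflare case requires, is to derive the proxy from a stable workflow identifier instead of choosing at random. The sketch below is an assumption-laden illustration: `sticky_proxy` and the idea of a `workflow_id` string (a job name, account, or target domain) are hypothetical, and the mapping changes if the proxy list itself changes.

```python
import hashlib
from typing import Sequence


def sticky_proxy(workflow_id: str, proxies: Sequence[str]) -> str:
    """Deterministically map a workflow identifier to one proxy from the pool."""
    if not proxies:
        raise ValueError("proxy pool is empty")
    digest = hashlib.sha256(workflow_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(proxies)
    return proxies[index]
```

Calling `sticky_proxy("checkout-job-17", PROXIES)` at the solving step and again at the scraping step returns the same entry, so the solver and the scraper share one IP.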