
Proxy Rotation for CAPTCHA Scraping

Proxy rotation reduces CAPTCHA frequency by distributing requests across multiple IPs. Combined with CaptchaAI to solve the CAPTCHAs that still appear, it gives you a scraping pipeline that keeps working against most common anti-bot checks.

Why Proxy Rotation Reduces CAPTCHAs

Sites trigger CAPTCHAs based on per-IP request patterns:

| Factor | Single IP | Rotating Proxies |
|---|---|---|
| Requests per minute | 10+ triggers CAPTCHA | Distributed across IPs |
| IP reputation | Degrades over time | Fresh IPs from pool |
| Session patterns | Suspicious patterns visible | Patterns spread across IPs |
| Geographic consistency | Single location | Natural geographic diversity |

Proxy Types for Scraping

| Type | Best For | CAPTCHA Rate | Cost |
|---|---|---|---|
| Residential | High-value targets (Google, Amazon) | Lowest | $$$ |
| Mobile | Ultra-low detection | Lowest | $$$$ |
| ISP/Static | Sustained sessions | Low | $$ |
| Datacenter | High-volume, lenient sites | Higher | $ |

Recommendation: Use residential proxies for sites with aggressive CAPTCHA triggers. Datacenter proxies work for less protected sites.

Basic Proxy Rotation (Python)

import random
import time

import requests
from bs4 import BeautifulSoup

PROXIES = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]

API_KEY = "YOUR_API_KEY"
REQUEST_TIMEOUT = 30  # seconds; avoids hanging forever on a dead proxy

def get_random_proxy():
    """Return a requests-style proxies dict for a randomly chosen proxy."""
    proxy = random.choice(PROXIES)
    return {"http": proxy, "https": proxy}

def scrape_with_rotation(url):
    # New session per call, bound to one proxy, so cookies are not shared across IPs
    session = requests.Session()
    session.proxies = get_random_proxy()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    })

    resp = session.get(url, timeout=REQUEST_TIMEOUT)

    # If a reCAPTCHA widget appears, solve it and resubmit with the token
    if "g-recaptcha" in resp.text or "captcha" in resp.text.lower():
        soup = BeautifulSoup(resp.text, "html.parser")
        rc = soup.find("div", class_="g-recaptcha")
        if rc and rc.get("data-sitekey"):
            token = solve_captcha(rc["data-sitekey"], url)
            resp = session.post(
                url,
                data={"g-recaptcha-response": token},
                timeout=REQUEST_TIMEOUT,
            )

    return resp.text

def solve_captcha(site_key, page_url):
    # Submit the task; a successful response has the form "OK|<task_id>"
    resp = requests.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY, "method": "userrecaptcha",
        "googlekey": site_key, "pageurl": page_url
    }, timeout=REQUEST_TIMEOUT)
    if not resp.text.startswith("OK|"):
        raise RuntimeError(f"CaptchaAI submit failed: {resp.text}")
    task_id = resp.text.split("|", 1)[1]

    # Poll every 5 seconds, up to 60 times (about 5 minutes)
    for _ in range(60):
        time.sleep(5)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id
        }, timeout=REQUEST_TIMEOUT)
        if result.text == "CAPCHA_NOT_READY":  # not solved yet; keep polling
            continue
        if result.text.startswith("OK|"):
            return result.text.split("|", 1)[1]
        raise RuntimeError(f"CaptchaAI error: {result.text}")
    raise TimeoutError("CAPTCHA was not solved within the polling window")
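
Random selection can pick the same proxy several times in a row, which concentrates requests on one IP. One possible alternative is round-robin rotation. The sketch below assumes the same proxy URL format as the PROXIES list above; the RoundRobinProxies class and its next_proxy method are illustrative names, not part of requests or CaptchaAI.

```python
import itertools


class RoundRobinProxies:
    """Cycle through proxies in order so consecutive requests use different IPs."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("proxy list must not be empty")
        self._cycle = itertools.cycle(list(proxies))

    def next_proxy(self):
        # Return a requests-style proxies dict for the next proxy in the cycle.
        proxy = next(self._cycle)
        return {"http": proxy, "https": proxy}


pool = RoundRobinProxies([
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
])
first = pool.next_proxy()
second = pool.next_proxy()
third = pool.next_proxy()  # the cycle wraps around to proxy1 again
```

The returned dict can be assigned to session.proxies in place of get_random_proxy().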

Smart Proxy Rotation

Track which proxies trigger CAPTCHAs and avoid them:

from collections import defaultdict
import random

class SmartProxyRotator:
    def __init__(self, proxies):
        self.proxies = proxies
        self.captcha_count = defaultdict(int)
        self.success_count = defaultdict(int)

    def get_proxy(self):
        # Prefer proxies with lower CAPTCHA rates
        scored = []
        for proxy in self.proxies:
            total = self.captcha_count[proxy] + self.success_count[proxy]
            if total == 0:
                score = 0.5  # Unknown proxy, neutral score
            else:
                score = self.success_count[proxy] / total
            scored.append((proxy, score))

        # Pick uniformly at random from the better-scoring half of the pool
        scored.sort(key=lambda x: x[1], reverse=True)
        top_proxies = scored[:max(len(scored) // 2, 1)]
        proxy = random.choice(top_proxies)[0]
        return proxy

    def report_success(self, proxy):
        self.success_count[proxy] += 1

    def report_captcha(self, proxy):
        self.captcha_count[proxy] += 1

# Usage
rotator = SmartProxyRotator(PROXIES)

def scrape(url):
    proxy = rotator.get_proxy()
    resp = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=30)

    if "captcha" in resp.text.lower():
        rotator.report_captcha(proxy)
        # Solve CAPTCHA...
    else:
        rotator.report_success(proxy)

    return resp.text
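
The rotator above chooses uniformly from the top half of the pool, so the bottom half is never retried even if those proxies recover. A weighted variant is one way to soften that. The sketch below is an assumption about how you might do it, not part of the original class: pick_weighted is a hypothetical helper that reads the same success and CAPTCHA counters and uses Laplace smoothing so an unseen proxy starts at 0.5 and no proxy ever has zero weight.

```python
import random


def pick_weighted(proxies, success_count, captcha_count, rng=random):
    """Pick a proxy with probability proportional to its smoothed success rate."""
    weights = []
    for proxy in proxies:
        ok = success_count.get(proxy, 0)
        bad = captcha_count.get(proxy, 0)
        # Laplace smoothing: (ok + 1) / (total + 2) is 0.5 for an unseen proxy.
        weights.append((ok + 1) / (ok + bad + 2))
    return rng.choices(proxies, weights=weights, k=1)[0]


proxies = ["p1", "p2", "p3"]
success = {"p1": 98, "p2": 0}   # p1 almost always succeeds
captcha = {"p1": 0, "p2": 98}   # p2 almost always hits a CAPTCHA; p3 is unseen
rng = random.Random(0)          # seeded only to make the demo repeatable
picks = [pick_weighted(proxies, success, captcha, rng) for _ in range(1000)]
```

To use it inside SmartProxyRotator, get_proxy could return pick_weighted(self.proxies, self.success_count, self.captcha_count).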

Proxy Rotation with Selenium

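import random

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def create_driver_with_proxy(proxy_url):
    # Chrome does not accept credentials embedded in --proxy-server, so pass a
    # proxy URL without user:pass (for example, one that allowlists your IP).
    options = Options()
    options.add_argument(f"--proxy-server={proxy_url}")
    options.add_argument("--disable-blink-features=AutomationControlled")
    return webdriver.Chrome(options=options)

# Rotate proxy per session
proxy = random.choice(PROXIES)
driver = create_driver_with_proxy(proxy)
try:
    driver.get("https://example.com")
finally:
    driver.quit()  # always release the browser process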

Proxy + CAPTCHA Solving for Cloudflare

Cloudflare Challenge solving requires passing a proxy to CaptchaAI:

proxy = "http://user:pass@proxy.example.com:8080"

resp = requests.get("https://ocr.captchaai.com/in.php", params={
    "key": API_KEY,
    "method": "cloudflare_challenge",
    "pageurl": "https://example.com",
    "proxy": proxy,
    "proxytype": "HTTP"
})
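# A successful submit returns "OK|<task_id>"; anything else is an error code
if not resp.text.startswith("OK|"):
    raise RuntimeError(f"CaptchaAI submit failed: {resp.text}")
task_id = resp.text.split("|", 1)[1]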

# Poll for cf_clearance cookie
# Use the same proxy for subsequent requests

Best Practices

  1. Match proxy geo to target — Use US proxies for US sites
  2. One session per proxy — Don't reuse sessions across different proxies
  3. Rate limit per proxy — Max 5-10 requests/minute per IP
  4. Monitor CAPTCHA rates — Track which proxies trigger more CAPTCHAs
  5. Use sticky sessions — Keep the same proxy for multi-step workflows
  6. Handle proxy failures — Retry with a different proxy on connection errors
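
Practice 3 can be enforced in code rather than left to discipline. The following is a minimal sketch, assuming a single-threaded scraper: PerProxyRateLimiter is a hypothetical helper that spaces out requests sent through the same proxy. The clock and sleep parameters exist only so the behaviour can be tested without real waiting.

```python
import time


class PerProxyRateLimiter:
    """Enforce a minimum interval between requests sent through the same proxy."""

    def __init__(self, max_per_minute=5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last_used = {}  # proxy -> time its most recent request was released

    def wait(self, proxy):
        """Block until this proxy may be used again; return the seconds waited."""
        now = self._clock()
        last = self._last_used.get(proxy)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record when the request actually goes out (after any sleep).
        self._last_used[proxy] = now + delay
        return delay
```

Call limiter.wait(proxy) immediately before each request through that proxy. Other proxies are unaffected, so the scraper can keep working through the rest of the pool while one IP cools down.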

Troubleshooting

| Issue | Fix |
|---|---|
| All proxies trigger CAPTCHAs | Switch to residential proxies; reduce the request rate |
| Proxy timeout errors | Remove slow proxies from the pool; increase the timeout |
| Different content per proxy | Some sites serve geo-specific content; pin the pool to one region or normalize the parsed output |
| CAPTCHA tokens don't work with proxy | Ensure the token is submitted from the same session and IP that loaded the page |
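
Proxy timeouts and connection errors are usually handled by retrying through a different proxy. The sketch below shows one way to do that. fetch_with_failover is a hypothetical helper, and fetch is an assumed callable of the form fetch(url, proxy) that you supply, for example a thin wrapper around requests.get with a timeout.

```python
import random


def fetch_with_failover(url, proxies, fetch, max_attempts=3, rng=random):
    """Try up to max_attempts distinct proxies; return (proxy, result) on first success."""
    candidates = list(proxies)
    rng.shuffle(candidates)
    errors = []
    for proxy in candidates[:max_attempts]:
        try:
            return proxy, fetch(url, proxy)
        except OSError as exc:
            # Built-in ConnectionError and TimeoutError are OSError subclasses,
            # and so is requests.exceptions.RequestException.
            errors.append((proxy, exc))
    raise RuntimeError(f"all {len(errors)} proxy attempts failed: {errors}")
```

With requests, fetch could be lambda url, proxy: requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=15).text. Proxies that appear repeatedly in the error list are candidates for removal from the pool.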

FAQ

Do I need proxies if I use CaptchaAI?

Not strictly — CaptchaAI can solve CAPTCHAs regardless. But proxies reduce how often CAPTCHAs appear, saving time and API costs.

Should I use the same proxy for CAPTCHA solving and scraping?

For most CAPTCHA types, the token is valid regardless of IP. For Cloudflare Challenge, you must use the same proxy since the cf_clearance cookie is IP-bound.
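
When the solve and the follow-up requests must share an IP, it helps to make the proxy choice deterministic per workflow. A minimal sketch, assuming each workflow has a stable string key such as a job ID: sticky_proxy is a hypothetical helper that hashes the key to pick a proxy, so the same key always maps to the same proxy as long as the pool is unchanged.

```python
import hashlib


def sticky_proxy(session_key, proxies):
    """Map a workflow/session key to the same proxy on every call."""
    if not proxies:
        raise ValueError("proxy list must not be empty")
    digest = hashlib.sha256(session_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as a stable index into the pool.
    index = int.from_bytes(digest[:8], "big") % len(proxies)
    return proxies[index]


pool = [
    "http://user:pass@proxy1.example.com:8080",
    "http://user:pass@proxy2.example.com:8080",
    "http://user:pass@proxy3.example.com:8080",
]
solve_proxy = sticky_proxy("job-42", pool)   # pass this proxy to the solver
scrape_proxy = sticky_proxy("job-42", pool)  # and reuse it for the scrape
```

Note that adding or removing proxies changes the mapping, so resize the pool between workflows rather than in the middle of one.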

How many proxies do I need?

For moderate scraping (1,000 pages/day), 10-20 rotating residential proxies suffice. For high volume, use a proxy provider with automatic rotation.
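
Pool size depends on how tightly the requests are packed in time, not only on the daily total. The arithmetic can be sketched as follows. proxies_needed is a hypothetical helper, and the 5 requests/minute default is taken from the lower end of the per-IP limit suggested under Best Practices; the 20-minute window is an illustrative assumption.

```python
import math


def proxies_needed(pages, window_minutes, max_rpm_per_proxy=5):
    """Lower bound on pool size so that no proxy exceeds max_rpm_per_proxy."""
    requests_per_minute = pages / window_minutes
    return max(1, math.ceil(requests_per_minute / max_rpm_per_proxy))


# 1,000 pages fetched in a 20-minute burst: 50 req/min across the pool.
burst = proxies_needed(1000, window_minutes=20)
# The same 1,000 pages spread evenly over 24 hours: under 1 req/min.
spread = proxies_needed(1000, window_minutes=24 * 60)
```

Here burst is 10 and spread is 1. The result is only a rate-based floor; IP reputation and per-proxy CAPTCHA rates can justify a larger pool.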

