
CAPTCHA Solving Throughput: How to Process 10,000 Tasks per Hour

Processing 10,000 CAPTCHAs per hour means ~2.8 solves per second sustained. That's achievable with the right architecture. This guide walks through the math, the code, and the tuning required to reach this throughput using CaptchaAI.

The Math

If a single reCAPTCHA v2 solve takes 15 seconds (median):

  • Sequential: 3,600s / 15s = 240 solves/hour
  • To reach 10,000/hour: 10,000 / 240 ≈ 42, so you need about 42 solves in flight at all times

The key insight: CaptchaAI doesn't have to get faster. You overlap enough requests that roughly 42 solves finish in every 15-second window.
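The arithmetic above is Little's law: solves in flight = arrival rate × time per solve. The sketch below reproduces the 42 figure. It also estimates the latency the client actually observes, assuming the poll schedule used by the pipelines in this guide: a solve that finishes at 15 s is not seen until the next poll. With INITIAL_WAIT = 12 and POLL_INTERVAL = 5, that poll happens at 17 s. HTTP round-trip time is ignored.

```python
import math


def required_concurrency(target_per_hour: float, latency_s: float) -> int:
    """Little's law: solves in flight = arrival rate (per second) x time per solve."""
    return math.ceil(target_per_hour / 3600 * latency_s)


def observed_latency(solve_s: float, initial_wait_s: float, poll_interval_s: float) -> float:
    """Time at which a solve finishing at solve_s is first seen by the poll loop.

    Polls happen at initial_wait_s, initial_wait_s + poll_interval_s, ...
    Network round-trip time is ignored.
    """
    if solve_s <= initial_wait_s:
        return initial_wait_s
    extra_polls = math.ceil((solve_s - initial_wait_s) / poll_interval_s)
    return initial_wait_s + extra_polls * poll_interval_s


if __name__ == "__main__":
    print(required_concurrency(10_000, 15))          # 42
    seen = observed_latency(15, 12, 5)               # 17
    print(seen, required_concurrency(10_000, seen))  # 17 48
```

Once polling overhead is included, the target needs about 48 solves in flight. That sits just under the Balanced profile's MAX_CONCURRENT = 50.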

Architecture

┌──────────┐     ┌────────────┐     ┌─────────────┐     ┌──────────┐
│  Task    │────▶│  Submit    │────▶│  CaptchaAI  │────▶│  Result  │
│  Queue   │     │  Workers   │     │  API        │     │  Store   │
│  (Redis) │     │  (async)   │     │             │     │  (DB)    │
└──────────┘     └────────────┘     └─────────────┘     └──────────┘
                       │                    ▲
                       │    ┌──────────┐    │
                       └───▶│  Poll    │────┘
                            │  Workers │
                            └──────────┘

Components:

  1. Task queue — Holds pending CAPTCHA tasks with sitekeys and URLs
  2. Submit workers — Send tasks to CaptchaAI API concurrently
  3. Poll workers — Check for results on a fixed schedule: an initial wait, then a steady poll interval
  4. Result store — Saves tokens as they arrive
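The pipelines below keep things compact. Each task gets one coroutine that both submits and polls, instead of separate worker pools reading from a queue. For the queue-driven shape in the diagram, here is a minimal in-process sketch using asyncio.Queue. A fixed number of workers pull tasks, each worker has one solve in flight at a time, and results land in a dict standing in for the result store. `fake_solve` is a hypothetical placeholder for a real submit-and-poll call such as `solve_one`. Replacing asyncio.Queue with Redis or another shared queue is what would let this span machines.

```python
import asyncio
import random


async def fake_solve(task):
    """Hypothetical stand-in for a real submit + poll round trip."""
    await asyncio.sleep(random.uniform(0.01, 0.05))  # simulated solve latency
    return f"token-for-{task['id']}"


async def worker(queue, results):
    """Pull tasks until cancelled; one solve in flight per worker."""
    while True:
        task = await queue.get()
        try:
            results[task["id"]] = await fake_solve(task)
        except Exception:
            results[task["id"]] = None  # record the failure, keep the worker alive
        finally:
            queue.task_done()


async def run_pipeline(tasks, num_workers=50):
    queue = asyncio.Queue()
    results = {}  # stands in for the result store (DB)
    for task in tasks:
        queue.put_nowait(task)
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(num_workers)]
    await queue.join()  # wait until every task has been processed
    for w in workers:
        w.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    out = asyncio.run(run_pipeline([{"id": i} for i in range(200)], num_workers=20))
    print(f"{sum(v is not None for v in out.values())} of {len(out)} tasks solved")
```

Here `num_workers` plays the same role as MAX_CONCURRENT: it caps how many solves are in flight at once.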

Python: Async Pipeline

# high_throughput_solver.py
import os
import asyncio
import time
import aiohttp

API_KEY = os.environ.get("CAPTCHAAI_KEY", "YOUR_API_KEY")
BASE_URL = "https://ocr.captchaai.com"
MAX_CONCURRENT = 50  # Max simultaneous solves
POLL_INTERVAL = 5    # Seconds between polls
INITIAL_WAIT = 12    # Seconds before first poll

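# Assumes Python 3.10+, where asyncio primitives can be created before
# asyncio.run() starts the event loop.
semaphore = asyncio.Semaphore(MAX_CONCURRENT)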
stats = {"submitted": 0, "solved": 0, "failed": 0, "start": 0}

async def solve_one(session, sitekey, pageurl, task_num):
    """Submit and poll a single CAPTCHA."""
    async with semaphore:
        try:
            # Submit
            async with session.get(f"{BASE_URL}/in.php", params={
                "key": API_KEY, "method": "userrecaptcha",
                "googlekey": sitekey, "pageurl": pageurl, "json": "1",
            }) as resp:
                result = await resp.json(content_type=None)

            if result.get("status") != 1:
                stats["failed"] += 1
                return None

            stats["submitted"] += 1
            task_id = result["request"]

            # Wait before first poll
            await asyncio.sleep(INITIAL_WAIT)

            # Poll
            for _ in range(25):
                async with session.get(f"{BASE_URL}/res.php", params={
                    "key": API_KEY, "action": "get",
                    "id": task_id, "json": "1",
                }) as resp:
                    poll_result = await resp.json(content_type=None)

                if poll_result.get("status") == 1:
                    stats["solved"] += 1
                    return poll_result["request"]

                if poll_result.get("request") != "CAPCHA_NOT_READY":
                    stats["failed"] += 1
                    return None

                await asyncio.sleep(POLL_INTERVAL)

            stats["failed"] += 1
            return None

        except Exception:
            # Network errors, request timeouts, and malformed JSON all land here
            stats["failed"] += 1
            return None

async def run_batch(tasks):
    """Process a batch of CAPTCHA tasks concurrently."""
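    connector = aiohttp.TCPConnector(
        limit=MAX_CONCURRENT,  # match the socket pool size to the semaphore
        keepalive_timeout=60,  # reuse connections across submit and poll calls
    )
    # Per-request ceiling so a hung call can't hold a semaphore slot indefinitely
    timeout = aiohttp.ClientTimeout(total=30)
    async with aiohttp.ClientSession(connector=connector, timeout=timeout) as session: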
        coros = [
            solve_one(session, task["sitekey"], task["pageurl"], i)
            for i, task in enumerate(tasks)
        ]
        results = await asyncio.gather(*coros)
    return results

async def main():
    # Generate test tasks (replace with your task source)
    tasks = [
        {
            "sitekey": "6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-",
            "pageurl": "https://www.google.com/recaptcha/api2/demo",
        }
        for _ in range(100)  # Start with 100 tasks
    ]

    stats["start"] = time.time()
    print(f"Processing {len(tasks)} tasks with {MAX_CONCURRENT} concurrent workers")

    results = await run_batch(tasks)
    elapsed = time.time() - stats["start"]

    print(f"\nCompleted in {elapsed:.0f}s")
    print(f"Submitted: {stats['submitted']}")
    print(f"Solved: {stats['solved']}")
    print(f"Failed: {stats['failed']}")
    print(f"Throughput: {stats['solved'] / (elapsed / 3600):.0f} solves/hour")

asyncio.run(main())

JavaScript: Concurrent Pipeline

// high_throughput_solver.js
const axios = require('axios');
const https = require('https');

const API_KEY = process.env.CAPTCHAAI_KEY || 'YOUR_API_KEY';
const BASE = 'https://ocr.captchaai.com';
const MAX_CONCURRENT = 50;

const agent = new https.Agent({ keepAlive: true, maxSockets: MAX_CONCURRENT });
const api = axios.create({ baseURL: BASE, httpsAgent: agent, timeout: 30000 });

const stats = { submitted: 0, solved: 0, failed: 0 };

async function solveOne(sitekey, pageurl) {
  try {
    const submit = await api.get('/in.php', {
      params: { key: API_KEY, method: 'userrecaptcha', googlekey: sitekey, pageurl, json: '1' },
    });
    if (submit.data.status !== 1) { stats.failed++; return null; }
    stats.submitted++;

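    // Give the solver a head start before the first poll (INITIAL_WAIT = 12s)
    await new Promise(r => setTimeout(r, 12000));

    // Poll up to 25 times (MAX_POLL_ATTEMPTS), 5s apart (POLL_INTERVAL)
    for (let i = 0; i < 25; i++) {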
      const poll = await api.get('/res.php', {
        params: { key: API_KEY, action: 'get', id: submit.data.request, json: '1' },
      });
      if (poll.data.status === 1) { stats.solved++; return poll.data.request; }
      if (poll.data.request !== 'CAPCHA_NOT_READY') { stats.failed++; return null; }
      await new Promise(r => setTimeout(r, 5000));
    }
    stats.failed++;
    return null;
  } catch { stats.failed++; return null; }
}

async function runWithConcurrency(tasks, limit) {
  const results = [];
  const executing = new Set();

  for (const task of tasks) {
    const p = solveOne(task.sitekey, task.pageurl).then(r => {
      executing.delete(p);
      return r;
    });
    executing.add(p);
    results.push(p);

    if (executing.size >= limit) {
      await Promise.race(executing);
    }
  }
  return Promise.all(results);
}

(async () => {
  const tasks = Array.from({ length: 100 }, () => ({
    sitekey: '6Le-wvkSAAAAAPBMRTvw0Q4Muexq9bi0DJwx_mJ-',
    pageurl: 'https://www.google.com/recaptcha/api2/demo',
  }));

  const start = Date.now();
  console.log(`Processing ${tasks.length} tasks, ${MAX_CONCURRENT} concurrent`);

  await runWithConcurrency(tasks, MAX_CONCURRENT);
  const elapsed = (Date.now() - start) / 1000;

  console.log(`\nDone in ${elapsed.toFixed(0)}s`);
  console.log(`Solved: ${stats.solved}, Failed: ${stats.failed}`);
  console.log(`Throughput: ${(stats.solved / (elapsed / 3600)).toFixed(0)} solves/hour`);

  agent.destroy();
})();

Tuning Parameters

Parameter            Conservative  Balanced     Aggressive
MAX_CONCURRENT       20            50           100
INITIAL_WAIT         15s           12s          10s
POLL_INTERVAL        7s            5s           3s
MAX_POLL_ATTEMPTS    30            25           20
Expected throughput  ~4,800/hr     ~10,000/hr   ~18,000/hr

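Start conservative and raise MAX_CONCURRENT step by step until throughput gains flatten out or the error rate starts climbing. MAX_POLL_ATTEMPTS corresponds to the hard-coded 25 in the poll loops above (range(25) in Python, i < 25 in JavaScript).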
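The ramp-up rule can be made mechanical. This sketch assumes you have run short trial batches at increasing MAX_CONCURRENT values and recorded throughput and error rate for each one. The 10% minimum-gain threshold is an arbitrary placeholder. The 5% error ceiling is taken from the monitoring guidance in this guide.

```python
from typing import List, Optional, Tuple

# (concurrency, solves_per_hour, error_rate) measured from one trial batch
Trial = Tuple[int, float, float]


def choose_concurrency(trials: List[Trial], min_gain: float = 0.10,
                       max_error_rate: float = 0.05) -> Optional[int]:
    """Return the highest concurrency worth running, or None if none qualify.

    Walk trials in ascending concurrency order and stop at the first level that
    either breaches the error ceiling or improves throughput by less than
    min_gain (0.10 = 10%) over the previous accepted level.
    """
    best: Optional[Tuple[int, float]] = None
    for concurrency, throughput, error_rate in sorted(trials):
        if error_rate > max_error_rate:
            break  # errors climbing: stay at the last good level
        if best is not None and throughput < best[1] * (1 + min_gain):
            break  # diminishing returns
        best = (concurrency, throughput)
    return best[0] if best else None
```

For example, `choose_concurrency([(20, 4800, 0.01), (50, 10500, 0.02), (80, 11000, 0.03)])` returns 50, because moving to 80 adds less than 10% throughput. These trial numbers are illustrative, not measurements.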

Monitoring Throughput

Track these metrics in real-time:

  • Solves per minute — Should stay at ~167 for 10K/hour target
  • Error rate — Keep below 5%. If it spikes, reduce concurrency
  • Queue depth — If it keeps growing, add workers. If it stays empty, workers are idle and you may be over-provisioned
  • P90 solve time — If increasing, CaptchaAI may be rate-limiting
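Here is a minimal in-process tracker for solves per minute, error rate, and P90 solve time. Queue depth comes from whatever queue you use; with asyncio.Queue it is `queue.qsize()`. The tracker keeps a sliding 60-second window of finished tasks. Solves per minute only becomes meaningful once that window has filled. Nearest-rank is just one of several P90 conventions. The class and method names are hypothetical, not part of any CaptchaAI SDK.

```python
import math
import time
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class ThroughputMonitor:
    """Sliding-window solve metrics (a sketch, not a production metrics system)."""

    def __init__(self, window_s: float = 60.0) -> None:
        self.window_s = window_s
        # One entry per finished task: (finish_time, succeeded, solve_latency_s)
        self.events: Deque[Tuple[float, bool, float]] = deque()

    def record(self, succeeded: bool, latency_s: float, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self.events.append((now, succeeded, latency_s))
        self._evict(now)

    def _evict(self, now: float) -> None:
        # Drop events that have aged out of the window
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()

    def snapshot(self, now: Optional[float] = None) -> Dict[str, Optional[float]]:
        now = time.monotonic() if now is None else now
        self._evict(now)
        total = len(self.events)
        latencies = sorted(lat for _, ok, lat in self.events if ok)
        solved = len(latencies)
        # Nearest-rank P90: smallest sample with at least 90% of samples at or below it
        p90 = latencies[math.ceil(0.9 * solved) - 1] if solved else None
        return {
            "solves_per_minute": solved * 60.0 / self.window_s,
            "error_rate": (total - solved) / total if total else 0.0,
            "p90_solve_s": p90,
        }
```

To wire it in, time each task in `solve_one` with `time.monotonic()` and call `record()` next to the existing `stats` updates. Then log `snapshot()` every few seconds from a separate task.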

Troubleshooting

Issue                          Cause                           Fix
Throughput plateaus at ~5K/hr  Insufficient concurrency        Increase MAX_CONCURRENT to 80–100
Error rate > 10%               Overloading API or bad proxies  Reduce concurrency, check proxy health
Memory usage growing           Unbounded task accumulation     Process results as they arrive; don't buffer them
ERROR_NO_SLOT_AVAILABLE        CaptchaAI queue full            Back off and retry after 5 seconds
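You can wrap the ERROR_NO_SLOT_AVAILABLE fix around the submit step. In this sketch, `submit` is any coroutine function that calls in.php and returns its parsed JSON. The code assumes the error code arrives in that JSON's `request` field, the same field the pipelines above read for poll errors. It uses the fixed 5-second delay from the table; `max_retries` is a hypothetical cap.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


async def submit_with_backoff(
    submit: Callable[[], Awaitable[Dict[str, Any]]],
    max_retries: int = 5,
    delay_s: float = 5.0,
) -> Dict[str, Any]:
    """Call submit(), retrying while the API answers ERROR_NO_SLOT_AVAILABLE."""
    result = await submit()
    for _ in range(max_retries):
        if result.get("request") != "ERROR_NO_SLOT_AVAILABLE":
            break  # success, or a different error the caller should handle
        await asyncio.sleep(delay_s)  # back off before trying again
        result = await submit()
    return result
```

In `solve_one`, the in.php request would move into a small inner `async def` that returns `await resp.json(content_type=None)`. The semaphore slot stays held while the wrapper sleeps, so backoff counts against MAX_CONCURRENT. That is one way to throttle automatically when the service is saturated.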

FAQ

What's the CaptchaAI concurrency limit?

There's no hard limit on concurrent requests, but extremely high concurrency (500+) may trigger rate limiting. Start at 50 and scale up.

Can I run this across multiple machines?

Yes. Use a shared queue (Redis, RabbitMQ) and run the worker script on multiple servers. Each worker pulls tasks independently.

What about balance consumption at this rate?

At 10,000 solves/hour, monitor your balance closely. Use the balance check endpoint (res.php?action=getbalance) and set up alerts.
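Here is a sketch of a balance check you could run on a timer, using only the standard library. It assumes getbalance accepts json=1 and replies in the same {"status": ..., "request": ...} shape that in.php and res.php use elsewhere in this guide, with the balance as a numeric string in `request`. `min_balance` is a hypothetical threshold, and the print is a placeholder for your real alerting hook.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

BASE_URL = "https://ocr.captchaai.com"


def parse_balance(data: Dict[str, Any]) -> float:
    """Pull the balance out of a getbalance JSON reply (reply shape is assumed)."""
    if data.get("status") != 1:
        raise RuntimeError(f"balance check failed: {data.get('request')}")
    return float(data["request"])


def fetch_balance(api_key: str) -> float:
    """Call res.php?action=getbalance with json=1 and return the balance."""
    query = urllib.parse.urlencode({"key": api_key, "action": "getbalance", "json": "1"})
    with urllib.request.urlopen(f"{BASE_URL}/res.php?{query}", timeout=10) as resp:
        return parse_balance(json.loads(resp.read().decode("utf-8")))


def balance_is_low(api_key: str, min_balance: float) -> bool:
    """Return True, and print an alert, when the balance is below min_balance."""
    balance = fetch_balance(api_key)
    if balance < min_balance:
        # Replace this print with your alerting hook (email, chat, pager)
        print(f"ALERT: balance {balance:.2f} is below {min_balance:.2f}")
        return True
    return False
```

For example, call `balance_is_low(os.environ["CAPTCHAAI_KEY"], min_balance=10.0)` every few minutes from cron or a background task. At roughly 167 solves per minute, a short check interval matters.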

Next Steps

Build your high-throughput CAPTCHA pipeline — get your CaptchaAI API key.


Full Working Code

Complete runnable examples for this article in Python, Node.js, PHP, Go, Java, C#, Ruby, Rust, Kotlin & Bash.

View on GitHub →
