Memory and CPU Optimization for CAPTCHA Solving Workers

CAPTCHA solving workers are I/O-bound — they spend most of their time waiting for API responses. But poor resource management can still cause memory leaks, high CPU usage, and process crashes. This guide covers practical optimizations for CaptchaAI workers.

Common Resource Bottlenecks

Bottleneck | Cause | Impact
Memory growth | Unbounded response buffering | OOM kills, swap thrashing
High CPU | Busy-wait polling loops | Waste compute, block other tasks
Connection leaks | Unclosed HTTP sessions | File descriptor exhaustion
Large payloads | Base64 image bodies in memory | 2–5 MB per image CAPTCHA
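
Part of the large-payload cost comes from base64 itself, which turns every 3 input bytes into 4 output characters. The sketch below checks that arithmetic against the standard library; `base64_size` is a hypothetical helper name, and the zero-filled buffer is only a placeholder for a real image.

```python
import base64
import math

def base64_size(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of `raw_bytes` bytes."""
    return 4 * math.ceil(raw_bytes / 3)

# Placeholder 1.5 MB "image" made of zero bytes (an assumption for the demo)
raw = bytes(1_500_000)
encoded = base64.b64encode(raw)

# The formula matches what the encoder actually produces
assert len(encoded) == base64_size(len(raw))
print(f"{len(raw)} raw bytes -> {len(encoded)} base64 bytes")
```

Here a 1.5 MB file becomes 2 MB of base64 text, and both copies are alive at the same time until the raw bytes are released.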

Python: Lean Worker Patterns

Use Connection Pooling with Limits

# lean_worker.py
import os
import asyncio
import aiohttp

API_KEY = os.environ.get("CAPTCHAAI_KEY", "YOUR_API_KEY")

async def create_lean_session():
    """Create a memory-efficient aiohttp session."""
    connector = aiohttp.TCPConnector(
        limit=20,            # Max connections
        limit_per_host=20,   # All go to same host
        keepalive_timeout=30,
        enable_cleanup_closed=True,
    )
    return aiohttp.ClientSession(
        connector=connector,
        timeout=aiohttp.ClientTimeout(total=30),
    )

async def solve_captcha(session, sitekey, pageurl):
    """Solve with minimal memory footprint."""
    # Submit
    async with session.get("https://ocr.captchaai.com/in.php", params={
        "key": API_KEY,
        "method": "userrecaptcha",
        "googlekey": sitekey,
        "pageurl": pageurl,
        "json": "1",
    }) as resp:
        # Read and release response immediately
        result = await resp.json(content_type=None)

    if result.get("status") != 1:
        return None

    task_id = result["request"]
    del result  # Free memory

    # Poll with sleep (not busy-wait)
    await asyncio.sleep(15)
    for _ in range(25):
        async with session.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get",
            "id": task_id, "json": "1",
        }) as resp:
            poll_result = await resp.json(content_type=None)

        if poll_result.get("status") == 1:
            token = poll_result["request"]
            del poll_result
            return token

        if poll_result.get("request") != "CAPCHA_NOT_READY":
            return None

        del poll_result
        await asyncio.sleep(5)  # Async sleep — zero CPU

    return None

async def main():
    session = await create_lean_session()
    try:
        tasks = [
            solve_captcha(session, "SITEKEY", "https://example.com")
            for _ in range(50)
        ]
        results = await asyncio.gather(*tasks)
        solved = sum(1 for r in results if r)
        print(f"Solved: {solved}/{len(tasks)}")
    finally:
        await session.close()

asyncio.run(main())
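
The `main()` above starts all 50 coroutines at once and relies on the connector limit to throttle requests. If you also want to cap how many solves are in flight, one possible approach is an `asyncio.Semaphore`. This is a minimal sketch: `run_limited`, `InFlightCounter`, and `fake_solve` are hypothetical names, and `fake_solve` sleeps instead of calling the API so the example runs without a key.

```python
import asyncio

async def run_limited(factories, concurrency):
    """Await zero-argument coroutine factories, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(factory):
        async with semaphore:          # Waits here while all slots are taken
            return await factory()

    return await asyncio.gather(*(guarded(f) for f in factories))

class InFlightCounter:
    """Hypothetical helper that records the peak number of simultaneous solves."""
    def __init__(self):
        self.current = 0
        self.peak = 0

async def fake_solve(i, counter):
    """Stand-in for solve_captcha(); sleeps instead of calling the API."""
    counter.current += 1
    counter.peak = max(counter.peak, counter.current)
    await asyncio.sleep(0.01)
    counter.current -= 1
    return f"token-{i}"

def demo(total=20, concurrency=5):
    counter = InFlightCounter()
    factories = [lambda i=i: fake_solve(i, counter) for i in range(total)]
    results = asyncio.run(run_limited(factories, concurrency))
    return results, counter.peak
```

In a real worker, each factory would be something like `lambda: solve_captcha(session, sitekey, pageurl)`. Passing factories rather than coroutine objects means no coroutine is created until a slot is free.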

Stream Large Image CAPTCHAs

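For Image/OCR CAPTCHAs, keep each image in memory only as long as needed. The base64 method sends the whole encoded image in the request body, so the goal is a short lifetime rather than a chunked upload: read the file off the event loop, drop the raw bytes as soon as they are encoded, and release the base64 string right after the request.

import asyncio
import base64
import os

API_KEY = os.environ.get("CAPTCHAAI_KEY", "YOUR_API_KEY")

def _read_as_base64(image_path):
    """Read and encode one image; the raw bytes are freed when this returns."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

async def submit_image_streaming(session, image_path):
    """Submit an image CAPTCHA while holding its data as briefly as possible."""
    # Blocking file I/O runs in a worker thread (asyncio.to_thread, Python 3.9+)
    image_data = await asyncio.to_thread(_read_as_base64, image_path)

    async with session.post("https://ocr.captchaai.com/in.php", data={
        "key": API_KEY,
        "method": "base64",
        "body": image_data,
        "json": "1",
    }) as resp:
        result = await resp.json(content_type=None)

    del image_data  # Drop the base64 string once the request has completed
    return result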

Monitor Memory Usage

import tracemalloc

tracemalloc.start()

# ... run your solver ...

current, peak = tracemalloc.get_traced_memory()
print(f"Current: {current / 1024 / 1024:.1f} MB")
print(f"Peak: {peak / 1024 / 1024:.1f} MB")
tracemalloc.stop()

JavaScript: Resource-Efficient Patterns

Proper Agent Configuration

// lean_worker.js
const axios = require('axios');
const https = require('https');

const API_KEY = process.env.CAPTCHAAI_KEY || 'YOUR_API_KEY';

// Configure agent for minimal resource usage
const agent = new https.Agent({
  keepAlive: true,
  maxSockets: 20,        // Limit concurrent connections
  maxFreeSockets: 5,     // Keep 5 idle for reuse
  timeout: 30000,        // Close idle connections after 30s
});

const api = axios.create({
  baseURL: 'https://ocr.captchaai.com',
  httpsAgent: agent,
  timeout: 30000,
  maxContentLength: 50000, // Limit response size (50 KB)
  maxBodyLength: 5000000,  // Limit request body (5 MB for images)
});

async function solveCaptcha(sitekey, pageurl) {
  const submit = await api.get('/in.php', {
    params: {
      key: API_KEY, method: 'userrecaptcha',
      googlekey: sitekey, pageurl, json: '1',
    },
  });

  if (submit.data.status !== 1) return null;
  const taskId = submit.data.request;

  await new Promise(r => setTimeout(r, 15000));

  for (let i = 0; i < 25; i++) {
    const poll = await api.get('/res.php', {
      params: { key: API_KEY, action: 'get', id: taskId, json: '1' },
    });

    if (poll.data.status === 1) return poll.data.request;
    if (poll.data.request !== 'CAPCHA_NOT_READY') return null;

    await new Promise(r => setTimeout(r, 5000));
  }
  return null;
}

// Process with concurrency control
async function processWithLimit(tasks, concurrency) {
  const results = [];
  const active = new Set();

  for (const task of tasks) {
    const p = solveCaptcha(task.sitekey, task.pageurl)
      .catch(() => null)                 // Treat request errors as unsolved
      .finally(() => active.delete(p));  // Always free the slot
    active.add(p);
    results.push(p);

    // Wait for a slot to free up before starting the next task
    if (active.size >= concurrency) await Promise.race(active);
  }
  return Promise.all(results);
}

// Monitor memory
function logMemory() {
  const usage = process.memoryUsage();
  console.log(`RSS: ${(usage.rss / 1024 / 1024).toFixed(1)} MB`);
  console.log(`Heap: ${(usage.heapUsed / 1024 / 1024).toFixed(1)} MB`);
}

Resource Budgets

Target resource usage per concurrency level:

Concurrent solves | Expected memory | Expected CPU | Connections
10 | 30–50 MB | < 5% | 10
50 | 60–100 MB | < 10% | 20
100 | 100–200 MB | < 15% | 50
500 | 300–500 MB | < 25% | 100

If your worker exceeds these targets, look for:

  • Unbounded buffers (accumulating results without processing)
  • Connection leaks (sessions not closed on error)
  • Synchronous file I/O blocking the event loop
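
For the connection-leak item, one way to make cleanup hard to forget is to own the session through an async context manager, so it is closed on the error path as well. The sketch below is illustrative only: `FakeSession` and `managed_session` are hypothetical names, and `FakeSession` stands in for `aiohttp.ClientSession` so the example runs without network access.

```python
import asyncio
from contextlib import asynccontextmanager

class FakeSession:
    """Hypothetical stand-in for aiohttp.ClientSession; only tracks closed state."""
    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True

@asynccontextmanager
async def managed_session(factory=FakeSession):
    """Yield a session and close it on success, error, or cancellation."""
    session = factory()
    try:
        yield session
    finally:
        await session.close()

async def demo():
    seen = None
    try:
        async with managed_session() as session:
            seen = session
            raise RuntimeError("simulated failure mid-solve")
    except RuntimeError:
        pass
    return seen.closed  # True if the session was closed despite the error

if __name__ == "__main__":
    print("closed after error:", asyncio.run(demo()))
```

`aiohttp.ClientSession` already supports `async with` directly; the wrapper is only useful when the session comes from a factory such as `create_lean_session()`.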

Anti-Patterns to Avoid

Anti-Pattern | Problem | Fix
while True polling without sleep | 100% CPU usage | Use asyncio.sleep() or setTimeout()
Storing all tokens in memory | Unbounded growth | Write to database or file as they arrive
Creating new HTTP client per request | Connection churn, memory waste | Reuse a single session/client
Loading all images at once | Memory spike | Process images one at a time or in small batches
Not closing sessions on shutdown | Connection leaks | Use try/finally or process signal handlers
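
The "write as they arrive" fix can be sketched with a bounded queue and a few workers that persist each token immediately, so no result list ever builds up. This is one possible shape under stated assumptions, not a prescribed design: `fake_solve`, `worker`, and `solve_to_file` are hypothetical names, and `fake_solve` replaces the real API call.

```python
import asyncio
import os
import tempfile

async def fake_solve(job_id):
    """Hypothetical stand-in for solve_captcha(session, sitekey, pageurl)."""
    await asyncio.sleep(0.001)
    return f"token-{job_id}"

async def worker(queue, out, counter):
    """Pull jobs forever; write each token immediately instead of storing it."""
    while True:
        job_id = await queue.get()
        try:
            token = await fake_solve(job_id)
            if token:
                out.write(token + "\n")  # Small buffered append
                counter["written"] += 1
        finally:
            queue.task_done()

async def solve_to_file(job_ids, out_path, concurrency=5):
    # Bounded queue: the producer waits when workers fall behind
    queue = asyncio.Queue(maxsize=concurrency * 2)
    counter = {"written": 0}
    with open(out_path, "a", encoding="utf-8") as out:
        workers = [asyncio.create_task(worker(queue, out, counter))
                   for _ in range(concurrency)]
        for job_id in job_ids:
            await queue.put(job_id)
        await queue.join()               # Every queued job has been handled
        for w in workers:
            w.cancel()                   # Workers are idle in queue.get() here
        await asyncio.gather(*workers, return_exceptions=True)
    return counter["written"]
```

Memory stays roughly proportional to `concurrency` rather than to the total number of jobs. For a slow disk or a database sink, the write could be moved off the event loop.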

Troubleshooting

Issue | Cause | Fix
Memory climbs over time | Result accumulation or connection leak | Process results immediately; close sessions on error
CPU spikes during polling | Busy-wait loop or JSON parsing overhead | Use async sleep; limit response parsing
Process killed by OS (OOM) | Memory exceeds system limit | Set maxSockets, process images in batches
File descriptor limit hit | Too many open connections | Set ulimit -n 65536 (Linux) or reduce pool size
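
Before raising `ulimit`, it can help to check whether the configured pool size fits under the current limit. The sketch below uses the standard `resource` module, which is available on Unix-like systems only. `fd_headroom` is a hypothetical helper, and the `reserve` of 64 descriptors for log files, pipes, and similar is an assumed margin, not a measured figure.

```python
import resource

def fd_headroom(pool_size, reserve=64):
    """Return (soft_limit, fits).

    `fits` is True when pool_size connections plus `reserve` other
    descriptors stay within the soft RLIMIT_NOFILE of this process.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, True
    return soft, pool_size + reserve <= soft

if __name__ == "__main__":
    soft, fits = fd_headroom(pool_size=20)
    print(f"soft fd limit: {soft}, pool of 20 fits: {fits}")
```

If `fits` is False, reducing the connector limit is usually the smaller change compared with raising the system limit.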

FAQ

Does CaptchaAI solving use local CPU for computation?

No. The actual CAPTCHA solving happens on CaptchaAI's servers. Your worker only performs HTTP requests and JSON parsing, which are lightweight operations.

Should I use processes or threads for parallelism?

Use async I/O (asyncio for Python, native Promises for Node.js). Threads add memory overhead without benefit for I/O-bound work. Use multiple processes only if you need more than 500 concurrent solves.

How do I detect a memory leak in my worker?

Track RSS and heap used over time. If either grows linearly without plateau, you have a leak. Use tracemalloc (Python) or --inspect (Node.js) to identify the source.
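
On the Python side, comparing two `tracemalloc` snapshots is one way to see which source lines account for the growth. The sketch below simulates a leak with a list that is never drained. `top_growth` and `leaky_results` are hypothetical names used for illustration; in a real worker you would take the snapshots some minutes apart while it is running.

```python
import tracemalloc

def top_growth(before, after, limit=3):
    """Return the source lines whose allocations grew most between snapshots."""
    stats = after.compare_to(before, "lineno")  # Sorted by largest change first
    return [s for s in stats[:limit] if s.size_diff > 0]

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Simulated leak: results accumulate and are never written out or cleared
leaky_results = []
for i in range(50_000):
    leaky_results.append(f"token-{i}")

after = tracemalloc.take_snapshot()
tracemalloc.stop()

for stat in top_growth(before, after):
    print(stat)  # Each entry names a file and line with its size and count delta
```

The line that appends to `leaky_results` should dominate the report, which is the kind of signal to look for in a real worker.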

Next Steps

Build resource-efficient CAPTCHA solving workers — get your CaptchaAI API key.
