
Monitoring CAPTCHA Solve Rates with Prometheus and Grafana

Production CAPTCHA solving needs observability. Prometheus collects metrics, Grafana visualizes them, and alerting catches issues before they impact your pipeline.


Metrics to Track

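Metric | Type | Purpose
captcha_solves_total | Counter | Total solve attempts (by method)
captcha_solves_success_total | Counter | Successful solves (by method)
captcha_solves_errors_total | Counter | Failed solves (by method and error type)
captcha_solve_duration_seconds | Histogram | Solve time distribution
captcha_balance_usd | Gauge | Current account balance
captcha_queue_length | Gauge | Pending tasks in queue

These are the names as exposed at /metrics. The Python client appends _total to counter names, so a counter declared as captcha_solves_success is queried as captcha_solves_success_total.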

Python Metrics Exporter

# metrics.py
import time
import requests
from prometheus_client import (
    Counter, Histogram, Gauge, start_http_server,
)


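# Define metrics. prometheus_client appends "_total" to counter names on
# exposition, so query them as captcha_solves_success_total and
# captcha_solves_errors_total (captcha_solves_total keeps its name).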
SOLVES_TOTAL = Counter(
    "captcha_solves_total",
    "Total CAPTCHA solve attempts",
    ["method"],
)

SOLVES_SUCCESS = Counter(
    "captcha_solves_success",
    "Successful CAPTCHA solves",
    ["method"],
)

SOLVES_ERRORS = Counter(
    "captcha_solves_errors",
    "Failed CAPTCHA solves",
    ["method", "error_code"],
)

SOLVE_DURATION = Histogram(
    "captcha_solve_duration_seconds",
    "CAPTCHA solve duration in seconds",
    ["method"],
    buckets=[5, 10, 15, 20, 30, 45, 60, 90, 120],
)

BALANCE = Gauge(
    "captcha_balance_usd",
    "Current CaptchaAI account balance in USD",
)

QUEUE_LENGTH = Gauge(
    "captcha_queue_length",
    "Number of pending CAPTCHA tasks",
)


class InstrumentedSolver:
    """Solver with Prometheus metric instrumentation."""

    def __init__(self, api_key):
        self.api_key = api_key
        self.base = "https://ocr.captchaai.com"

    def solve(self, method, **params):
        """Solve CAPTCHA with metric collection."""
        SOLVES_TOTAL.labels(method=method).inc()
        start = time.time()

        try:
            token = self._do_solve(method, params)
            duration = time.time() - start

            SOLVES_SUCCESS.labels(method=method).inc()
            SOLVE_DURATION.labels(method=method).observe(duration)

            return token

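        except Exception as e:
            # API failures raise RuntimeError carrying the service's error
            # code. Label anything else by exception class so arbitrary
            # message text does not create unbounded label values.
            if isinstance(e, RuntimeError):
                error_code = str(e)[:30]
            else:
                error_code = type(e).__name__
            SOLVES_ERRORS.labels(
                method=method, error_code=error_code,
            ).inc()
            raise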

    def update_balance(self):
        """Fetch and update balance metric."""
        resp = requests.get(f"{self.base}/res.php", params={
            "key": self.api_key,
            "action": "getbalance",
            "json": 1,
        }, timeout=15)
        balance = float(resp.json()["request"])
        BALANCE.set(balance)
        return balance

    def _do_solve(self, method, params, timeout=120):
        data = {"key": self.api_key, "method": method, "json": 1}
        data.update(params)

        resp = requests.post(
            f"{self.base}/in.php", data=data, timeout=30,
        )
        result = resp.json()

        if result.get("status") != 1:
            raise RuntimeError(result.get("request"))

        task_id = result["request"]
        start = time.time()

        while time.time() - start < timeout:
            time.sleep(5)
            resp = requests.get(f"{self.base}/res.php", params={
                "key": self.api_key,
                "action": "get",
                "id": task_id,
                "json": 1,
            }, timeout=15)
            result = resp.json()
            if result["request"] != "CAPCHA_NOT_READY":
                if result.get("status") == 1:
                    return result["request"]
                raise RuntimeError(result["request"])

        raise TimeoutError("Solve timeout")


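# Start the metrics HTTP server on port 8000. It runs in a daemon thread,
# so the process that imports this module must stay alive to be scraped.
start_http_server(8000)
print("Metrics server running on :8000/metrics")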
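
The exporter defines update_balance() and a queue-length gauge, but nothing calls them on a schedule. One way to keep such gauges fresh is a small daemon thread that invokes a refresh callable at a fixed interval. The sketch below uses only the standard library; run_periodically, start_refresher, and the refresh callable are hypothetical names, and in a real worker the callable might be something like solver.update_balance.

```python
import logging
import threading


def run_periodically(refresh, interval_s, stop_event):
    """Call refresh() every interval_s seconds until stop_event is set.

    Exceptions are logged and swallowed so that one failed refresh
    (for example a network error) does not end the loop.
    """
    while not stop_event.is_set():
        try:
            refresh()
        except Exception:
            logging.exception("periodic refresh failed")
        # wait() returns early if stop_event is set during the pause
        stop_event.wait(interval_s)


def start_refresher(refresh, interval_s=60.0):
    """Start the loop in a daemon thread; returns (thread, stop_event)."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=run_periodically,
        args=(refresh, interval_s, stop_event),
        daemon=True,
    )
    thread.start()
    return thread, stop_event
```

Calling stop_event.set() ends the loop cleanly at shutdown. A 60-second default is an assumption; balance changes slowly, so there is little benefit in polling it faster than Prometheus scrapes.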

Prometheus Configuration

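# prometheus.yml
global:
  scrape_interval: 15s

# Load the alerting rules defined in alert_rules.yml
rule_files:
  - /etc/prometheus/alert_rules.yml

scrape_configs:
  - job_name: "captcha-solver"
    scrape_interval: 10s
    static_configs:
      # Must match the Compose service name and the metrics port
      - targets: ["solver:8000"]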

Docker Compose Stack

# docker-compose.yml
version: "3.8"

services:
  solver:
    build: .
    environment:
      - CAPTCHAAI_KEY=${CAPTCHAAI_KEY}
    ports:
      - "8000:8000"

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./alert_rules.yml:/etc/prometheus/alert_rules.yml
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      # Default password for local testing only; change it before exposing Grafana
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  grafana-data:

Grafana Dashboard Queries

Success Rate (PromQL)

sum by (method) (rate(captcha_solves_success_total[5m]))
/ sum by (method) (rate(captcha_solves_total[5m])) * 100
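
rate() works on the increase of each counter over the window, and it compensates for counters that drop back to zero when a worker restarts. The following is a simplified Python sketch of that idea, not Prometheus's actual implementation (which also extrapolates to the window edges). counter_increase and success_rate_percent are hypothetical helper names, and the sample values are made up.

```python
def counter_increase(samples):
    """Total increase across ordered counter samples, tolerating resets.

    A drop between consecutive samples is treated as a restart: the
    counter began again from zero, so the new value is all increase.
    """
    increase = 0.0
    for prev, cur in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    return increase


def success_rate_percent(success_samples, total_samples):
    """Ratio of the two increases over the same window, as a percentage."""
    total = counter_increase(total_samples)
    if total == 0:
        return None  # no attempts in the window, so the ratio is undefined
    return counter_increase(success_samples) / total * 100


# Hypothetical scrapes; the worker restarts after the third one.
total = [100, 110, 120, 4, 14]
success = [90, 99, 108, 3, 12]
print(success_rate_percent(success, total))
```

Dividing the raw counter values instead would give a lifetime average that jumps at every restart, which is why the panel divides one rate by the other.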

Average Solve Time

rate(captcha_solve_duration_seconds_sum[5m])
/ rate(captcha_solve_duration_seconds_count[5m])

Error Rate by Type

sum by (error_code) (
  rate(captcha_solves_errors_total[5m])
)

Balance Over Time

captcha_balance_usd

P95 Solve Duration

histogram_quantile(0.95,
  sum by (le) (rate(captcha_solve_duration_seconds_bucket[5m]))
)
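
histogram_quantile() never sees individual solve times. For classic histograms like this one, it estimates the quantile by linear interpolation inside the bucket that contains the target rank. A minimal Python sketch of that estimate follows, using the exporter's bucket bounds with made-up cumulative counts. estimate_quantile is a hypothetical name, and the sketch skips edge cases that Prometheus handles.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate quantile q from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by bound,
    ending with the +Inf bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the rank
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[idx]
    if math.isinf(upper):
        # Rank falls in the +Inf bucket: report the highest finite bound
        return buckets[-2][0]
    lower = buckets[idx - 1][0] if idx > 0 else 0.0
    prev_count = buckets[idx - 1][1] if idx > 0 else 0.0
    in_bucket = count - prev_count
    if in_bucket == 0:
        return lower
    return lower + (upper - lower) * (rank - prev_count) / in_bucket


# Hypothetical cumulative counts for the exporter's bucket bounds
observed = [
    (5, 0), (10, 10), (15, 40), (20, 70), (30, 90),
    (45, 96), (60, 99), (90, 100), (120, 100), (math.inf, 100),
]
p95 = estimate_quantile(0.95, observed)
```

With these counts the 95th-percentile rank lands in the 30-45 s bucket and the estimate is about 42.5 s. Because the result is interpolated, it is only as precise as the bucket boundaries around it, so it helps to have a boundary at any threshold you alert on; the exporter's buckets include 60, which matches the SlowSolveTime rule.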

Alerting Rules

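# alert_rules.yml
# Prometheus only evaluates this file if prometheus.yml lists it under
# rule_files and the file is mounted into the Prometheus container.
groups:
  - name: captcha-alerts
    rules:
      - alert: LowBalance
        expr: captcha_balance_usd < 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CaptchaAI balance below $5"

      - alert: HighErrorRate
        # Sum both sides first: the error counter carries an extra
        # error_code label, so an unaggregated division matches no series.
        expr: |
          sum(rate(captcha_solves_errors_total[5m]))
          / sum(rate(captcha_solves_total[5m])) > 0.1
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "CAPTCHA error rate above 10%"

      - alert: SlowSolveTime
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(captcha_solve_duration_seconds_bucket[5m]))
          ) > 60
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "P95 solve time exceeds 60s"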
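
The for field means the expression must stay true across consecutive rule evaluations for that long before the alert fires. Until then the alert is pending, and a single evaluation where the condition is false resets it. The sketch below models that behavior in Python as an illustration of the semantics only; alert_states is a hypothetical helper and the one-minute evaluation interval is an assumption.

```python
def alert_states(results, eval_interval_s, for_s):
    """Map per-evaluation booleans to alert states.

    results[i] is whether the alert expression held at evaluation i.
    Returns "inactive", "pending", or "firing" for each evaluation.
    """
    states = []
    active_since = None  # time at which the condition first became true
    for i, is_true in enumerate(results):
        now = i * eval_interval_s
        if not is_true:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held = now - active_since
        states.append("firing" if held >= for_s else "pending")
    return states


# HighErrorRate uses "for: 10m"; assume rules are evaluated every minute.
sustained = alert_states([True] * 12, eval_interval_s=60, for_s=600)
blip = alert_states([True] * 6 + [False] + [True] * 6, 60, 600)
```

In the sustained case the alert is pending for the first ten evaluations and fires from the eleventh. In the blip case it never fires, because the single false evaluation restarts the clock. Longer for values trade detection speed for fewer false alarms.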

Troubleshooting

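Issue | Cause | Fix
No metrics at /metrics | Server not started | Call start_http_server(8000) and keep the process running
Prometheus shows "down" | Wrong target address | Check that the target matches the Compose service name and port
Grafana shows no data | Prometheus not added as a data source, or the query uses the wrong metric name | Add the Prometheus data source in Grafana; query counters with their _total suffix
Metrics reset on restart | Counters start from zero when the process restarts (expected) | Query with rate() or increase() rather than raw counter values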
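
When a panel shows no data, it helps to confirm which series names the exporter actually exposes, since the Python client's _total suffix on counters is an easy thing to miss in a query. The sketch below extracts metric names from exposition text using only the standard library. The sample text and its method label value are made up, and exposed_metric_names is a hypothetical helper; in practice the text would come from the worker's /metrics endpoint.

```python
import re

# A sample line: metric name, optional {labels}, then the value
_SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{.*\})?\s+\S+")


def exposed_metric_names(exposition_text):
    """Return the set of sample names found in Prometheus text format."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


# Made-up excerpt of what /metrics might return
SAMPLE = """\
# HELP captcha_solves_success_total Successful CAPTCHA solves
# TYPE captcha_solves_success_total counter
captcha_solves_success_total{method="userrecaptcha"} 12.0
# HELP captcha_balance_usd Current CaptchaAI account balance in USD
# TYPE captcha_balance_usd gauge
captcha_balance_usd 4.2
"""

expected = {
    "captcha_solves_success_total",
    "captcha_balance_usd",
    "captcha_queue_length",
}
missing = expected - exposed_metric_names(SAMPLE)
print(sorted(missing))
```

Any name left in missing is either never set by the worker or spelled differently in the dashboard query.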

FAQ

How much overhead does Prometheus add?

Negligible. Incrementing a counter or observing a histogram value is an in-memory operation that typically takes well under a millisecond, and a scrape every 10-15 seconds has no meaningful impact.

Can I use this with multiple worker instances?

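Yes. Each worker exposes its own /metrics endpoint, and Prometheus scrapes every target you list, tagging each series with an instance label. Aggregation is not automatic: combine workers in your queries with sum() or sum by (...), for example sum(rate(captcha_solves_total[5m])).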

What dashboard should I start with?

Start with four panels: success rate, average solve time, error breakdown, and balance. Add queue depth and throughput as you scale.



