CAPTCHA Solve Rate SLI/SLO: How to Define and Monitor

"Our CAPTCHA solving works most of the time" isn't a reliability target. SLIs (Service Level Indicators) and SLOs (Service Level Objectives) give you measurable thresholds, error budgets, and actionable alerts for your CAPTCHA pipeline.

Definitions

Term	Meaning	CAPTCHA Example
SLI	A metric that measures service quality	Solve success rate: 94.2%
SLO	A target value for an SLI	Solve success rate ≥ 92% over 30 days
Error Budget	Allowed failures before SLO breach	8% failure budget = 800 failures per 10,000 tasks
Burn Rate	How fast you're consuming error budget	2x burn rate = 30-day budget exhausted in 15 days
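
The arithmetic in the last two rows can be sketched in a few lines of Python. This is only an illustration: the function names `error_budget` and `days_until_exhausted` are hypothetical, and the 30-day window is the one assumed throughout this guide.

```python
def error_budget(total_tasks: int, slo_target: float) -> int:
    """Failures allowed before the SLO is breached, to the nearest whole task."""
    # round() rather than int(): 1 - 0.92 is 0.0799999... in floating point.
    return round(total_tasks * (1 - slo_target))


def days_until_exhausted(burn_rate: float, window_days: float = 30) -> float:
    """Days until the budget runs out if the burn rate stays constant."""
    if burn_rate <= 0:
        return float("inf")  # no failures, so the budget never runs out
    return window_days / burn_rate


print(error_budget(10_000, 0.92))  # 800
print(days_until_exhausted(2.0))   # 15.0
```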

Recommended SLIs for CAPTCHA Solving

SLI 1: Solve Success Rate

Success Rate = Successful Solves / Total Solve Attempts

CAPTCHA Type	Typical Rate	SLO Target
reCAPTCHA v2	95–99%	≥ 92%
reCAPTCHA v3	90–97%	≥ 88%
Cloudflare Turnstile	95–99%	≥ 92%
hCaptcha	90–97%	≥ 88%
Image/OCR	85–95%	≥ 82%
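
One way to apply these per-type targets is a lookup table checked against per-type counters. The sketch below assumes you already count successes and attempts per CAPTCHA type; the dictionary keys and the `check_by_type` helper are hypothetical names, and the targets are copied from the table above.

```python
# SLO targets from the table above; keys are illustrative labels.
SLO_TARGETS = {
    "recaptcha_v2": 0.92,
    "recaptcha_v3": 0.88,
    "turnstile": 0.92,
    "hcaptcha": 0.88,
    "image_ocr": 0.82,
}


def check_by_type(counts):
    """counts maps a CAPTCHA type to a (successes, attempts) tuple."""
    report = {}
    for captcha_type, (successes, attempts) in counts.items():
        # With no attempts there is nothing to breach, so treat the rate as 1.0.
        rate = successes / attempts if attempts else 1.0
        target = SLO_TARGETS[captcha_type]
        report[captcha_type] = {
            "rate": round(rate, 4),
            "target": target,
            "met": rate >= target,
        }
    return report


report = check_by_type({"recaptcha_v2": (942, 1000), "image_ocr": (80, 100)})
print(report["recaptcha_v2"]["met"])  # True
print(report["image_ocr"]["met"])     # False
```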

SLI 2: Solve Latency

Latency = Time from task submission to solution received

Percentile	Target	Alert Threshold
p50	< 25s	—
p95	< 90s	> 120s
p99	< 180s	> 300s
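
As a rough sketch of how the table could be evaluated, the snippet below computes nearest-rank percentiles from a list of latencies and compares them with the targets and alert thresholds above. The names `nearest_rank` and `latency_status` are hypothetical, and nearest-rank is one of several reasonable percentile definitions.

```python
import math

# Targets and alert thresholds from the table above, in seconds.
TARGETS = {"p50": 25.0, "p95": 90.0, "p99": 180.0}
ALERTS = {"p95": 120.0, "p99": 300.0}  # p50 has no alert threshold


def nearest_rank(sorted_values, p):
    """Smallest value with at least p% of the samples at or below it."""
    rank = math.ceil(len(sorted_values) * p / 100)
    return sorted_values[max(rank - 1, 0)]


def latency_status(latencies):
    data = sorted(latencies)
    if not data:
        return {}
    status = {}
    for name, target in TARGETS.items():
        value = nearest_rank(data, int(name[1:]))  # "p95" -> 95
        alert_at = ALERTS.get(name)
        status[name] = {
            "value": value,
            "within_target": value < target,
            "alert": alert_at is not None and value > alert_at,
        }
    return status


# 100 samples of 1..100 seconds: p50 = 50, p95 = 95, p99 = 99.
print(latency_status(range(1, 101)))
```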

SLI 3: Pipeline Availability

Availability = Time pipeline is accepting and solving tasks / Total time

Target: ≥ 99.5% (allows about 3.6 hours of downtime in a 30-day month)
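
The downtime figure follows directly from the target. A minimal sketch, assuming a 30-day month and hypothetical helper names:

```python
def allowed_downtime_hours(target: float, window_days: int = 30) -> float:
    """Hours of downtime the availability target permits in the window."""
    return window_days * 24 * (1 - target)


def availability(up_seconds: float, total_seconds: float) -> float:
    """Fraction of the window in which the pipeline was accepting and solving tasks."""
    return up_seconds / total_seconds if total_seconds else 1.0


print(round(allowed_downtime_hours(0.995), 2))  # 3.6

# Example: 3 hours of downtime in a 30-day window still meets 99.5%.
window = 30 * 86400
print(availability(window - 3 * 3600, window) >= 0.995)  # True
```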

Python — SLI/SLO Tracker

import os
import time
from collections import deque
from dataclasses import dataclass, field

# The tracker itself never uses the key; it is read here for your solver client.
API_KEY = os.environ.get("CAPTCHAAI_API_KEY")


@dataclass
class SLITracker:
    """Track CAPTCHA solving SLIs over a sliding window."""

    window_seconds: int = 86400 * 30  # 30 days default
    events: deque = field(default_factory=deque)

    def record_success(self, latency_seconds):
        self.events.append({
            "time": time.time(),
            "success": True,
            "latency": latency_seconds
        })
        self._prune()

    def record_failure(self, error_code):
        self.events.append({
            "time": time.time(),
            "success": False,
            "error": error_code
        })
        self._prune()

    def _prune(self):
        cutoff = time.time() - self.window_seconds
        while self.events and self.events[0]["time"] < cutoff:
            self.events.popleft()

    @property
    def success_rate(self):
        if not self.events:
            return 1.0
        successes = sum(1 for e in self.events if e["success"])
        return successes / len(self.events)

    @property
    def latency_percentiles(self):
        # Only successful solves carry a latency; keep 0.0 values too.
        latencies = sorted(
            e["latency"] for e in self.events
            if e.get("latency") is not None
        )
        if not latencies:
            return {"p50": 0, "p95": 0, "p99": 0}

        def percentile(data, p):
            # Nearest-rank method: the smallest value with at least
            # p% of the samples at or below it.
            rank = -(-len(data) * p // 100)  # ceil(len * p / 100)
            return data[max(rank - 1, 0)]

        return {
            "p50": round(percentile(latencies, 50), 2),
            "p95": round(percentile(latencies, 95), 2),
            "p99": round(percentile(latencies, 99), 2),
        }

    @property
    def error_breakdown(self):
        errors = {}
        for e in self.events:
            if not e["success"]:
                code = e.get("error", "unknown")
                errors[code] = errors.get(code, 0) + 1
        return errors


class SLOChecker:
    """Check SLIs against SLO targets."""

    def __init__(self, tracker):
        self.tracker = tracker
        self.slos = {
            "success_rate": 0.92,    # ≥ 92%
            "latency_p95": 90.0,     # < 90 seconds
            "latency_p99": 180.0,    # < 180 seconds
        }

    @property
    def error_budget_total(self):
        """Total allowed failures in the window."""
        total = len(self.tracker.events)
        # Round before truncating: 1 - 0.92 is 0.0799999... in floating
        # point, so int(10000 * (1 - 0.92)) would give 799 instead of 800.
        return int(round(total * (1 - self.slos["success_rate"]), 6))

    @property
    def error_budget_remaining(self):
        """How many more failures before SLO breach."""
        failures = sum(1 for e in self.tracker.events if not e["success"])
        return max(0, self.error_budget_total - failures)

    @property
    def error_budget_pct(self):
        """Percentage of error budget remaining."""
        total = self.error_budget_total
        if total == 0:
            return 100.0
        return round(self.error_budget_remaining / total * 100, 1)

    @property
    def burn_rate(self):
        """How fast error budget is being consumed.
        1.0 = on track, 2.0 = will exhaust in half the window.
        """
        total = len(self.tracker.events)
        if total == 0:
            return 0.0
        failures = sum(1 for e in self.tracker.events if not e["success"])
        expected_failures = total * (1 - self.slos["success_rate"])
        if expected_failures == 0:
            return 0.0
        return round(failures / expected_failures, 2)

    def check_all(self):
        """Check all SLOs and return status."""
        rate = self.tracker.success_rate
        latencies = self.tracker.latency_percentiles

        return {
            "success_rate": {
                "current": round(rate, 4),
                "target": self.slos["success_rate"],
                "met": rate >= self.slos["success_rate"]
            },
            "latency_p95": {
                "current": latencies["p95"],
                "target": self.slos["latency_p95"],
                "met": latencies["p95"] <= self.slos["latency_p95"]
            },
            "latency_p99": {
                "current": latencies["p99"],
                "target": self.slos["latency_p99"],
                "met": latencies["p99"] <= self.slos["latency_p99"]
            },
            "error_budget": {
                "remaining_pct": self.error_budget_pct,
                "remaining_count": self.error_budget_remaining,
                "burn_rate": self.burn_rate,
            },
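            # Overall status requires every SLO, including p99 latency.
            "overall": rate >= self.slos["success_rate"]
                       and latencies["p95"] <= self.slos["latency_p95"]
                       and latencies["p99"] <= self.slos["latency_p99"]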
        }


# Usage
tracker = SLITracker(window_seconds=86400 * 30)
slo = SLOChecker(tracker)

# After each solve:
# tracker.record_success(latency_seconds=24.5)
# tracker.record_failure("ERROR_CAPTCHA_UNSOLVABLE")

# Check SLOs:
# print(slo.check_all())

JavaScript — SLO Dashboard

class SLODashboard {
  constructor(windowMs = 30 * 24 * 60 * 60 * 1000) {
    this.windowMs = windowMs;
    this.events = [];
    this.slos = {
      successRate: 0.92,
      latencyP95: 90,
      latencyP99: 180,
    };
  }

  recordSuccess(latencySeconds) {
    this.events.push({ time: Date.now(), success: true, latency: latencySeconds });
    this._prune();
  }

  recordFailure(errorCode) {
    this.events.push({ time: Date.now(), success: false, error: errorCode });
    this._prune();
  }

  _prune() {
    const cutoff = Date.now() - this.windowMs;
    this.events = this.events.filter((e) => e.time > cutoff);
  }

  get successRate() {
    if (this.events.length === 0) return 1;
    const successes = this.events.filter((e) => e.success).length;
    return successes / this.events.length;
  }

  get errorBudget() {
    const total = this.events.length;
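    // Small epsilon guards against float error: 1 - 0.92 is 0.0799999...,
    // so Math.floor(10000 * (1 - 0.92)) would otherwise give 799, not 800.
    const allowedFailures = Math.floor(total * (1 - this.slos.successRate) + 1e-9);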
    const actualFailures = this.events.filter((e) => !e.success).length;
    const remaining = Math.max(0, allowedFailures - actualFailures);

    return {
      total: allowedFailures,
      consumed: actualFailures,
      remaining,
      remainingPct: allowedFailures > 0
        ? ((remaining / allowedFailures) * 100).toFixed(1)
        : "100.0",
      burnRate: allowedFailures > 0
        ? (actualFailures / allowedFailures).toFixed(2)
        : "0.00",
    };
  }

  get report() {
    const latencies = this.events
      .filter((e) => e.success && e.latency)
      .map((e) => e.latency)
      .sort((a, b) => a - b);

    // Nearest-rank p95: the value at 1-indexed rank ceil(0.95 * n).
    const p95 = latencies.length > 0
      ? latencies[Math.ceil((latencies.length * 95) / 100) - 1]
      : 0;

    return {
      sliSuccessRate: (this.successRate * 100).toFixed(2) + "%",
      sloSuccessRate: (this.slos.successRate * 100).toFixed(0) + "%",
      sloMet: this.successRate >= this.slos.successRate,
      latencyP95: p95.toFixed(1) + "s",
      errorBudget: this.errorBudget,
      totalEvents: this.events.length,
    };
  }
}

const dashboard = new SLODashboard();
// dashboard.recordSuccess(24.5);
// console.log(dashboard.report);

Burn Rate Alert Thresholds

Burn Rate	Meaning	Alert
1.0	On track — budget lasts the full window	None
2.0	Budget exhausted in half the window	Warning
6.0	Budget exhausted in 5 days	Page on-call
14.0	Budget exhausted in ~2 days	Critical — immediate action
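
The table maps naturally onto a small lookup. The sketch below treats each burn rate in the table as a lower bound for its alert level, which is an assumption on top of the table; the level names and the `alert_level` function are hypothetical.

```python
# Thresholds from the table above, highest first. Each is treated as a
# lower bound: a burn rate of 6.5 pages on-call, 5.9 is only a warning.
BURN_RATE_ALERTS = [
    (14.0, "critical"),
    (6.0, "page"),
    (2.0, "warning"),
]


def alert_level(burn_rate: float) -> str:
    """Return the most severe alert level the burn rate qualifies for."""
    for threshold, level in BURN_RATE_ALERTS:
        if burn_rate >= threshold:
            return level
    return "none"


print(alert_level(1.0))   # none
print(alert_level(2.0))   # warning
print(alert_level(6.5))   # page
print(alert_level(14.0))  # critical
```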

Troubleshooting

Issue	Cause	Fix
SLO always breached	Target too aggressive	Start with current performance minus 3 percentage points as the SLO
Error budget always full	SLO too loose	Tighten SLO to drive improvements
Burn rate spikes	Burst of failures	Check if transient (retry storm) or systemic
Budget consumed by one error type	Single root cause	Fix that error type; see error breakdown
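
To see whether one error type dominates the budget, you can compute its share of all failures. A minimal sketch, assuming you have the error codes of failed solves as a list; `top_error_share` is a hypothetical helper and `"TIMEOUT"` is a placeholder code, not a documented one.

```python
from collections import Counter


def top_error_share(error_codes):
    """Return (most common error code, its share of all failures), or None."""
    counts = Counter(error_codes)
    if not counts:
        return None
    code, occurrences = counts.most_common(1)[0]
    return code, occurrences / sum(counts.values())


failures = ["ERROR_CAPTCHA_UNSOLVABLE"] * 8 + ["TIMEOUT"] * 2
print(top_error_share(failures))  # ('ERROR_CAPTCHA_UNSOLVABLE', 0.8)
```

A share close to 1.0 points to a single root cause; a flat distribution suggests a broader problem.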

FAQ

What SLO should I start with?

Measure your current success rate over 7 days. Subtract 3 percentage points — that's your starting SLO. Tighten it as you improve reliability.
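
As a worked example of that rule, the sketch below derives a starting SLO from 7 days of counts. The function name `starting_slo` and the sample numbers are illustrative only.

```python
def starting_slo(successes: int, attempts: int, margin_points: float = 3.0) -> float:
    """Starting SLO in percent: measured success rate minus a safety margin."""
    if attempts <= 0:
        raise ValueError("need at least one solve attempt to set a baseline")
    measured_pct = successes / attempts * 100
    return round(max(0.0, measured_pct - margin_points), 1)


# 9,420 successful solves out of 10,000 attempts over 7 days: 94.2% measured.
print(starting_slo(9_420, 10_000))  # 91.2
```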

Who owns the CAPTCHA SLO?

The team that operates the CAPTCHA solving pipeline. If scraping and CAPTCHA solving are separate teams, the CAPTCHA team owns solve rate SLOs while the scraping team owns end-to-end SLOs.

Should I set different SLOs per CAPTCHA type?

Yes. Image/OCR CAPTCHAs have fundamentally different success rates than reCAPTCHA v2. Setting per-type SLOs prevents one type from masking another's issues.

Next Steps

Set measurable reliability targets — get your CaptchaAI API key and define SLOs for your pipeline.

Related guides:

Discord Webhook Alerts for CAPTCHA Pipeline Status

Grafana Dashboard Templates for CaptchaAI Metrics

Webhook Endpoint Monitoring for CAPTCHA Solve Callbacks

Monitoring CAPTCHA Solve Rates with Prometheus and Grafana

Building Custom CaptchaAI Alerts with PagerDuty

Building a CaptchaAI Usage Dashboard and Monitoring

Batch CAPTCHA Solving Cost Estimation and Budget Alerts

CaptchaAI Monitoring with Datadog: Metrics and Alerts

Structured Logging for CAPTCHA Operations

CaptchaAI Monitoring with New Relic: APM Integration
