Webhook Endpoint Monitoring for CAPTCHA Solve Callbacks

Your CaptchaAI callback endpoint is a critical dependency — if it goes down, solved CAPTCHAs don't reach your application. Built-in monitoring catches problems before they cascade.

What to Monitor

Metric	Why It Matters	Healthy Range
Endpoint uptime	Callbacks fail during downtime	> 99.5%
Response latency	Slow responses may time out	< 500 ms
Error rate (4xx/5xx)	Indicates handler bugs	< 1%
Callback delivery rate	Ratio of callbacks received vs tasks submitted	> 95%
Time between callbacks	Detects sudden stops	< 5× average interval
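
The "time between callbacks" check in the table can be sketched as follows. This is a minimal illustration, not part of any CaptchaAI SDK: the `CallbackGapDetector` class, its `multiplier` of 5 (taken from the healthy range above), and the 100-sample window are all assumptions chosen for the example.

```python
from collections import deque


class CallbackGapDetector:
    """Flags a stall when the current gap exceeds a multiple of the average gap."""

    def __init__(self, multiplier=5.0, window=100):
        self.multiplier = multiplier
        self.intervals = deque(maxlen=window)  # recent gaps between callbacks, in seconds
        self.last_seen = None

    def record(self, now):
        """Call once per callback with its arrival timestamp in seconds."""
        if self.last_seen is not None:
            self.intervals.append(now - self.last_seen)
        self.last_seen = now

    def is_stalled(self, now):
        """True when the time since the last callback exceeds multiplier x average gap."""
        if not self.intervals:
            return False  # not enough history to judge
        average_gap = sum(self.intervals) / len(self.intervals)
        return (now - self.last_seen) > self.multiplier * average_gap


detector = CallbackGapDetector()
for arrival in (0, 10, 20, 30):  # callbacks arriving every 10 seconds
    detector.record(arrival)

print(detector.is_stalled(45))  # 15 s since last callback, threshold is 50 s
print(detector.is_stalled(90))  # 60 s since last callback, threshold is 50 s
```

A relative threshold like this adapts to traffic volume, whereas a fixed cutoff (such as the 300 seconds used later in this article) is simpler but can misfire on low-traffic systems.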

Self-Monitoring Middleware

Add monitoring directly to your callback handler.

Python (Flask)

import time
import threading
from collections import deque
from flask import Flask, request, jsonify

app = Flask(__name__)

# Metrics: lifetime counters plus a rolling window of the last 1000 latencies
metrics = {
    "total_received": 0,
    "total_errors": 0,
    "latencies": deque(maxlen=1000),
    "last_callback_at": 0,
    "error_counts": {}
}
metrics_lock = threading.Lock()


@app.route("/callback")
def captcha_callback():
    start = time.perf_counter()  # monotonic clock, suited to measuring durations

    task_id = request.args.get("id")
    solution = request.args.get("code")

    try:
        # store_result() is your own persistence function (not shown here)
        store_result(task_id, solution)
    except Exception as e:
        # Count the failure, but still ACK to CaptchaAI with a 200 below
        error_type = type(e).__name__
        with metrics_lock:
            metrics["total_errors"] += 1
            metrics["error_counts"][error_type] = \
                metrics["error_counts"].get(error_type, 0) + 1

    # Record metrics
    latency_ms = (time.perf_counter() - start) * 1000
    with metrics_lock:
        metrics["total_received"] += 1
        metrics["latencies"].append(latency_ms)
        metrics["last_callback_at"] = time.time()

    return "OK", 200


@app.route("/health/callbacks")
def callback_health():
    """Health endpoint for monitoring."""
    with metrics_lock:
        # Snapshot everything under the lock so the numbers are mutually consistent
        latencies = list(metrics["latencies"])
        last_at = metrics["last_callback_at"]
        total_received = metrics["total_received"]
        total_errors = metrics["total_errors"]
        error_counts = dict(metrics["error_counts"])

    now = time.time()
    avg_latency = sum(latencies) / len(latencies) if latencies else 0
    p95_latency = sorted(latencies)[int(len(latencies) * 0.95)] if latencies else 0
    # -1 means no callback has arrived since startup; this is reported as healthy
    seconds_since_last = now - last_at if last_at > 0 else -1

    health = {
        "status": "healthy" if seconds_since_last < 300 else "stale",
        "total_received": total_received,
        "total_errors": total_errors,
        "error_rate": total_errors / max(total_received, 1),
        "avg_latency_ms": round(avg_latency, 2),
        "p95_latency_ms": round(p95_latency, 2),
        "seconds_since_last_callback": round(seconds_since_last, 1),
        "error_breakdown": error_counts
    }

    status_code = 200 if health["status"] == "healthy" else 503
    return jsonify(health), status_code
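
The `int(len(latencies) * 0.95)` index used for p95 above is a quick approximation, and with small samples it tends to return the largest value. As one alternative, here is a minimal sketch of the nearest-rank method. The `percentile` helper is a hypothetical name for this example, not something provided by Flask or CaptchaAI.

```python
def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not values:
        return 0.0
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


samples = [12.0, 15.0, 11.0, 240.0, 14.0, 13.0, 16.0, 18.0, 17.0, 19.0]
print(percentile(samples, 50))
print(percentile(samples, 95))
```

With only ten samples, the 95th percentile is still the slowest request, so treat percentile alerts with caution until the rolling window has filled.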

JavaScript (Express)

const express = require("express");
const app = express();

const metrics = {
  totalReceived: 0,
  totalErrors: 0,
  latencies: [],
  lastCallbackAt: 0,
  errorCounts: {},
};

const MAX_LATENCIES = 1000;

app.get("/callback", (req, res) => {
  const start = Date.now();
  const taskId = req.query.id;
  const solution = req.query.code;

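  try {
    // storeResult() is your own persistence function (not shown here)
    storeResult(taskId, solution);
  } catch (err) {
    // Count the failure, but still ACK with a 200 below
    metrics.totalErrors++;
    // Guard against thrown values that are not Error objects (e.g. null)
    const errType = err?.constructor?.name ?? "UnknownError";
    metrics.errorCounts[errType] = (metrics.errorCounts[errType] || 0) + 1;
  }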

  const latencyMs = Date.now() - start;
  metrics.totalReceived++;
  metrics.latencies.push(latencyMs);
  if (metrics.latencies.length > MAX_LATENCIES) metrics.latencies.shift();
  metrics.lastCallbackAt = Date.now();

  res.sendStatus(200);
});

app.get("/health/callbacks", (req, res) => {
  const latencies = [...metrics.latencies].sort((a, b) => a - b);
  const avgLatency =
    latencies.length > 0
      ? latencies.reduce((a, b) => a + b, 0) / latencies.length
      : 0;
  const p95Latency =
    latencies.length > 0
      ? latencies[Math.floor(latencies.length * 0.95)]
      : 0;
  const secondsSinceLast =
    metrics.lastCallbackAt > 0
      ? (Date.now() - metrics.lastCallbackAt) / 1000
      : -1;

  const health = {
    status: secondsSinceLast < 300 ? "healthy" : "stale",
    totalReceived: metrics.totalReceived,
    totalErrors: metrics.totalErrors,
    errorRate: metrics.totalErrors / Math.max(metrics.totalReceived, 1),
    avgLatencyMs: Math.round(avgLatency * 100) / 100,
    p95LatencyMs: Math.round(p95Latency * 100) / 100,
    secondsSinceLastCallback: Math.round(secondsSinceLast * 10) / 10,
    errorBreakdown: metrics.errorCounts,
  };

  res.status(health.status === "healthy" ? 200 : 503).json(health);
});

app.listen(3000);

Delivery Rate Tracking

Compare tasks submitted with callbacks received to measure delivery success:

Python

import time

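# In-memory tracking: contents are lost when the process restarts
submitted_tasks = {}  # task_id -> submitted_at
delivered_tasks = set()  # grows without bound; prune or persist in long-running processes
delivery_timeout = 300  # 5 minutes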


def on_submit(task_id):
    """Call after submitting to CaptchaAI with pingback."""
    submitted_tasks[task_id] = time.time()


def on_callback(task_id):
    """Call when callback is received."""
    delivered_tasks.add(task_id)
    submitted_tasks.pop(task_id, None)


def get_delivery_stats():
    """Calculate delivery metrics."""
    now = time.time()

    # Expired tasks = submitted > 5 min ago, never received callback
    expired = [
        tid for tid, ts in submitted_tasks.items()
        if now - ts > delivery_timeout
    ]

    total = len(delivered_tasks) + len(expired)
    rate = len(delivered_tasks) / max(total, 1)

    return {
        "delivered": len(delivered_tasks),
        "missed": len(expired),
        "pending": len(submitted_tasks) - len(expired),
        "delivery_rate": round(rate, 4),
        "missed_task_ids": expired[:10]  # Sample for debugging
    }
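
The in-memory structures above disappear on restart, which skews the delivery rate. A minimal sketch of the same bookkeeping backed by SQLite from the standard library is shown here. The `DeliveryTracker` class and the `tasks` table schema are assumptions for illustration; Redis or your main database would serve the same purpose. Note that a `sqlite3` connection is bound to the thread that created it by default, so a multi-threaded server would need one connection per thread.

```python
import sqlite3
import time


class DeliveryTracker:
    """Stores submit and callback timestamps in SQLite so they survive restarts."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to keep data across restarts
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS tasks ("
            "task_id TEXT PRIMARY KEY, submitted_at REAL NOT NULL, delivered_at REAL)"
        )

    def on_submit(self, task_id, now=None):
        now = time.time() if now is None else now
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR IGNORE INTO tasks (task_id, submitted_at) VALUES (?, ?)",
                (task_id, now),
            )

    def on_callback(self, task_id, now=None):
        now = time.time() if now is None else now
        with self.db:
            self.db.execute(
                "UPDATE tasks SET delivered_at = ? "
                "WHERE task_id = ? AND delivered_at IS NULL",
                (now, task_id),
            )

    def _count(self, where, params=()):
        query = "SELECT COUNT(*) FROM tasks WHERE " + where
        return self.db.execute(query, params).fetchone()[0]

    def stats(self, timeout=300, now=None):
        now = time.time() if now is None else now
        cutoff = now - timeout
        delivered = self._count("delivered_at IS NOT NULL")
        missed = self._count("delivered_at IS NULL AND submitted_at < ?", (cutoff,))
        pending = self._count("delivered_at IS NULL AND submitted_at >= ?", (cutoff,))
        return {
            "delivered": delivered,
            "missed": missed,
            "pending": pending,
            "delivery_rate": round(delivered / max(delivered + missed, 1), 4),
        }


tracker = DeliveryTracker()
tracker.on_submit("a", now=0)
tracker.on_submit("b", now=0)
tracker.on_submit("c", now=400)
tracker.on_callback("a", now=10)
print(tracker.stats(now=500))
```

The explicit `now` parameter exists only to make the example deterministic; in production you would omit it and let the methods use the current time.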

Alert Conditions

Set up alerts for these conditions:

Alert	Trigger	Severity
Stale endpoint	No callback received in 5+ minutes	Warning
High error rate	> 5% error rate over 100 requests	Critical
Slow responses	p95 latency > 1000 ms	Warning
Low delivery rate	< 90% delivery rate	Critical
Endpoint down	Health check returns 503 or timeout	Critical
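
The "over 100 requests" condition implies a windowed error rate, whereas the health endpoint shown earlier reports a lifetime rate. A minimal sketch of a rolling calculation follows. The `RollingErrorRate` class is a hypothetical helper, with the window size and threshold taken from the table above.

```python
from collections import deque


class RollingErrorRate:
    """Tracks the error rate over the most recent `window` requests."""

    def __init__(self, window=100, threshold=0.05):
        self.outcomes = deque(maxlen=window)  # True = error, False = success
        self.threshold = threshold

    def record(self, is_error):
        self.outcomes.append(bool(is_error))

    def rate(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 0.0

    def should_alert(self):
        # Wait for a full window so one early failure does not trigger an alert
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.rate() > self.threshold


tracker = RollingErrorRate()
for _ in range(94):
    tracker.record(False)
for _ in range(6):
    tracker.record(True)

print(tracker.rate())
print(tracker.should_alert())
```

A lifetime rate reacts slowly once the counters are large: after a million successful callbacks, a complete outage takes a long time to move the average past 5%. A window keeps the signal current.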

Simple Alert Script

import requests
import time


def check_callback_health(health_url, alert_callback):
    """Periodic health checker."""
    while True:
        try:
            resp = requests.get(health_url, timeout=5)
            health = resp.json()

            # The health endpoint also returns 503 when stale, so a stale
            # endpoint triggers this alert in addition to the warning below
            if resp.status_code != 200:
                alert_callback("CRITICAL", f"Callback endpoint unhealthy: {health.get('status', 'unknown')}")

            if health.get("error_rate", 0) > 0.05:
                alert_callback("CRITICAL", f"High error rate: {health['error_rate']:.1%}")

            if health.get("p95_latency_ms", 0) > 1000:
                alert_callback("WARNING", f"Slow callbacks: p95={health['p95_latency_ms']}ms")

            if health.get("seconds_since_last_callback", -1) > 300:
                alert_callback("WARNING", f"No callbacks for {health['seconds_since_last_callback']:.0f}s")

        except requests.RequestException as e:
            alert_callback("CRITICAL", f"Health check failed: {e}")
        except ValueError as e:
            # Non-JSON body, e.g. an HTML error page returned by a proxy
            alert_callback("CRITICAL", f"Health check returned invalid JSON: {e}")

        time.sleep(60)  # Check every minute

External Monitoring Integration

For production systems, pair self-monitoring with external uptime checks:

Tool	Integration
UptimeRobot	Monitor `/health/callbacks` endpoint
Pingdom	HTTP check with response body validation
AWS CloudWatch	Synthetic canary on health endpoint
Self-hosted	Cron job calling health check script
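
For the self-hosted cron option, a one-shot check that exits with a status code is usually easier to schedule than a long-running loop. The sketch below uses only the standard library. The function names and the 0/1/2 exit-code convention (ok, warning, critical) are assumptions for this example, and the thresholds mirror the alert table above.

```python
import json
import urllib.error
import urllib.request


def fetch(url, timeout=5):
    """Return (status_code, body), including for non-2xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()  # e.g. the 503 returned when stale


def evaluate(status_code, body):
    """Map a health response to (exit_code, message): 0 ok, 1 warning, 2 critical."""
    try:
        health = json.loads(body)
    except (TypeError, ValueError):
        return 2, f"non-JSON response (HTTP {status_code})"
    if not isinstance(health, dict):
        return 2, "unexpected health payload"
    if health.get("status") == "stale":
        return 1, "no recent callbacks"
    if status_code != 200:
        return 2, f"endpoint unhealthy (HTTP {status_code})"
    if health.get("error_rate", 0) > 0.05:
        return 2, f"high error rate: {health['error_rate']:.1%}"
    if health.get("p95_latency_ms", 0) > 1000:
        return 1, f"slow callbacks: p95={health['p95_latency_ms']}ms"
    return 0, "ok"


def run_check(url):
    """Fetch and evaluate; connection failures and timeouts count as critical."""
    try:
        status_code, body = fetch(url)
    except OSError as err:  # URLError and socket timeouts are OSError subclasses
        return 2, f"health check failed: {err}"
    return evaluate(status_code, body)


print(evaluate(200, '{"status": "healthy", "error_rate": 0.0, "p95_latency_ms": 12}'))
print(evaluate(503, '{"status": "stale"}'))
```

A cron wrapper would call `run_check` with your health URL, print the message, and pass the code to `sys.exit` so the scheduler or a notification hook can react to non-zero exits.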

Troubleshooting

Issue	Cause	Fix
Health endpoint shows "stale" with no callbacks	No tasks submitted recently, or callbacks not reaching server	Check if tasks are being submitted with `pingback`; verify firewall rules
High latency on callback handler	Slow database writes in handler	Process async — accept callback, queue for background processing
Delivery rate dropping	Server restarts clearing in-memory task tracking	Use Redis or database to persist submitted task IDs
Error rate spikes	Downstream service (database) failing	Check error breakdown; fix underlying service
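
The "process async" fix for high handler latency can be sketched with a standard-library queue and a worker thread. This is an illustration under simple assumptions: `accept_callback`, the `pending` queue, and the dictionary-backed `store_result` are hypothetical stand-ins, and an in-memory queue loses its contents on restart, so a durable queue may be preferable in production.

```python
import queue
import threading

pending = queue.Queue(maxsize=10000)  # bounded so a stuck worker cannot exhaust memory
results = {}


def store_result(task_id, solution):
    """Stand-in for a slow database write."""
    results[task_id] = solution


def worker():
    """Drain the queue in the background, one callback at a time."""
    while True:
        task_id, solution = pending.get()
        try:
            store_result(task_id, solution)
        finally:
            pending.task_done()


def accept_callback(task_id, solution):
    """Call from the request handler; returns immediately without blocking."""
    try:
        pending.put_nowait((task_id, solution))
        return True
    except queue.Full:
        return False  # caller can count this as an error in its metrics


threading.Thread(target=worker, daemon=True).start()

accept_callback("task-1", "token-abc")
pending.join()  # only needed in this demo, to wait for the worker
print(results)
```

The handler's latency then reflects only the enqueue step, so slow storage shows up as queue depth rather than as callback timeouts.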

FAQ

Should I use a separate service for monitoring?

For small setups, self-monitoring middleware is sufficient. For production systems with SLAs, add external monitoring (UptimeRobot, Pingdom) that checks from outside your infrastructure.

How long should I keep metrics in memory?

A rolling window of the last 1,000 events is usually enough for real-time dashboards. For historical analysis, export metrics to Prometheus, Datadog, or a time-series database.
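
As one way to export for historical analysis, the health payload can be rendered in the Prometheus text exposition format and served from a scrape endpoint. The sketch below builds the text by hand with no client library. The metric names are hypothetical choices for this example, and the input is assumed to have the same keys as the `/health/callbacks` response shown earlier.

```python
def render_prometheus(health):
    """Render a health payload as Prometheus text exposition lines."""
    samples = [
        ("captcha_callbacks_received_total", "counter", health["total_received"]),
        ("captcha_callback_errors_total", "counter", health["total_errors"]),
        ("captcha_callback_latency_p95_ms", "gauge", health["p95_latency_ms"]),
        ("captcha_callback_seconds_since_last", "gauge",
         health["seconds_since_last_callback"]),
    ]
    lines = []
    for name, metric_type, value in samples:
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


example = {
    "total_received": 42,
    "total_errors": 1,
    "p95_latency_ms": 87.5,
    "seconds_since_last_callback": 3.2,
}
print(render_prometheus(example))
```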

What if my callback endpoint is behind a load balancer?

Each instance tracks its own metrics. Aggregate across instances in your monitoring platform, or expose a shared metrics store (Redis) that all instances write to.
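
A minimal sketch of the aggregation step, assuming you have already collected one health payload per instance (for example by polling each instance's `/health/callbacks`). The `aggregate_health` function is a hypothetical helper. Counters can be summed, but percentiles cannot be averaged, so the sketch reports the worst per-instance p95 as a conservative upper bound.

```python
def aggregate_health(snapshots):
    """Combine per-instance health payloads into fleet-wide figures."""
    total_received = sum(s["total_received"] for s in snapshots)
    total_errors = sum(s["total_errors"] for s in snapshots)
    # Ignore instances that have not received any callback yet (-1)
    ages = [
        s["seconds_since_last_callback"]
        for s in snapshots
        if s["seconds_since_last_callback"] >= 0
    ]
    return {
        "instances": len(snapshots),
        "total_received": total_received,
        "total_errors": total_errors,
        "error_rate": total_errors / max(total_received, 1),
        "worst_p95_latency_ms": max((s["p95_latency_ms"] for s in snapshots), default=0),
        # The fleet is fresh if any single instance saw a callback recently
        "seconds_since_last_callback": min(ages) if ages else -1,
    }


fleet = aggregate_health([
    {"total_received": 900, "total_errors": 9, "p95_latency_ms": 120.0,
     "seconds_since_last_callback": 4.0},
    {"total_received": 100, "total_errors": 11, "p95_latency_ms": 480.0,
     "seconds_since_last_callback": -1},
])
print(fleet)
```

Taking the minimum age matters behind a load balancer: an individual instance may look stale simply because traffic was routed elsewhere.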

Next Steps

Monitor your callback endpoints — get your CaptchaAI API key and add health checks from day one.

Related guides:



Discord Webhook Alerts for CAPTCHA Pipeline Status

CaptchaAI Webhook Security: Validating Callback Signatures

Grafana Dashboard Templates for CaptchaAI Metrics

Monitoring CAPTCHA Solve Rates with Prometheus and Grafana

Building Custom CaptchaAI Alerts with PagerDuty

CAPTCHA Solve Rate SLI/SLO: How to Define and Monitor

Building a CaptchaAI Usage Dashboard and Monitoring

Batch CAPTCHA Solving Cost Estimation and Budget Alerts

CaptchaAI Monitoring with Datadog: Metrics and Alerts

Structured Logging for CAPTCHA Operations
