DevOps & Scaling

Building Custom CaptchaAI Alerts with PagerDuty

A CAPTCHA solving outage at 3 AM costs hours of missed data. PagerDuty integration ensures the right person gets notified immediately — with enough context to diagnose and fix the issue without digging through logs.

Alert Strategy

Severity Condition PagerDuty Action
Critical Balance < $2 Page on-call engineer
Critical All workers down Page on-call engineer
High Error rate > 20% for 5 min Create urgent incident
Warning Balance < $10 Create low-urgency incident
Warning Queue depth > 100 for 10 min Create low-urgency incident
Info Solve latency p95 > 120s Add to existing incident or log

Python — PagerDuty Events API v2

import os
import time
import hashlib
import requests
from datetime import datetime

API_KEY = os.environ["CAPTCHAAI_API_KEY"]
PAGERDUTY_ROUTING_KEY = os.environ["PAGERDUTY_ROUTING_KEY"]

session = requests.Session()


class CaptchaPagerDuty:
    EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

    def __init__(self, routing_key):
        self.routing_key = routing_key

    def trigger(self, summary, severity="error", source="captcha-pipeline",
                details=None, dedup_key=None):
        """Trigger a new PagerDuty incident."""
        payload = {
            "routing_key": self.routing_key,
            "event_action": "trigger",
            "payload": {
                "summary": summary,
                "severity": severity,  # critical, error, warning, info
                "source": source,
                "timestamp": datetime.utcnow().isoformat() + "Z",
                "custom_details": details or {}
            }
        }

        if dedup_key:
            payload["dedup_key"] = dedup_key

        resp = requests.post(self.EVENTS_URL, json=payload, timeout=10)
        resp.raise_for_status()
        return resp.json()

    def resolve(self, dedup_key):
        """Resolve an existing incident."""
        payload = {
            "routing_key": self.routing_key,
            "event_action": "resolve",
            "dedup_key": dedup_key
        }
        resp = requests.post(self.EVENTS_URL, json=payload, timeout=10)
        resp.raise_for_status()
        return resp.json()

    def acknowledge(self, dedup_key):
        """Acknowledge an existing incident."""
        payload = {
            "routing_key": self.routing_key,
            "event_action": "acknowledge",
            "dedup_key": dedup_key
        }
        resp = requests.post(self.EVENTS_URL, json=payload, timeout=10)
        resp.raise_for_status()
        return resp.json()


pagerduty = CaptchaPagerDuty(PAGERDUTY_ROUTING_KEY)


class CaptchaMonitor:
    def __init__(self):
        self.error_window = []  # (timestamp, is_error)
        self.window_size = 300  # 5 minutes in seconds

    def record_solve(self, success):
        now = time.time()
        self.error_window.append((now, not success))
        # Prune old entries
        self.error_window = [
            (t, e) for t, e in self.error_window
            if now - t < self.window_size
        ]

    @property
    def error_rate(self):
        if not self.error_window:
            return 0.0
        errors = sum(1 for _, e in self.error_window if e)
        return errors / len(self.error_window)

    def check_balance(self):
        resp = session.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "getbalance", "json": 1
        })
        data = resp.json()
        if data.get("status") != 1:
            return None
        return float(data["request"])

    def run_checks(self):
        """Run all monitoring checks and trigger alerts."""
        # Check balance
        balance = self.check_balance()
        if balance is not None:
            if balance < 2:
                pagerduty.trigger(
                    summary=f"CaptchaAI balance critically low: ${balance:.2f}",
                    severity="critical",
                    dedup_key="captcha-balance-critical",
                    details={"balance": balance, "threshold": 2}
                )
            elif balance < 10:
                pagerduty.trigger(
                    summary=f"CaptchaAI balance low: ${balance:.2f}",
                    severity="warning",
                    dedup_key="captcha-balance-warning",
                    details={"balance": balance, "threshold": 10}
                )
            else:
                # Resolve if balance recovered
                try:
                    pagerduty.resolve("captcha-balance-critical")
                    pagerduty.resolve("captcha-balance-warning")
                except Exception:
                    pass  # No incident to resolve

        # Check error rate
        rate = self.error_rate
        if rate > 0.20:
            total = len(self.error_window)
            errors = sum(1 for _, e in self.error_window if e)
            pagerduty.trigger(
                summary=f"CaptchaAI error rate {rate:.0%} "
                        f"({errors}/{total} in 5 min)",
                severity="error",
                dedup_key="captcha-error-rate-high",
                details={
                    "error_rate": round(rate, 3),
                    "total_tasks": total,
                    "failed_tasks": errors,
                    "window_seconds": self.window_size
                }
            )
        elif rate < 0.05 and len(self.error_window) > 10:
            try:
                pagerduty.resolve("captcha-error-rate-high")
            except Exception:
                pass


monitor = CaptchaMonitor()

# After each solve:
# monitor.record_solve(success=True)

# Run checks every 60 seconds:
# while True:
#     monitor.run_checks()
#     time.sleep(60)

JavaScript — PagerDuty Integration

const axios = require("axios");

const API_KEY = process.env.CAPTCHAAI_API_KEY;
const PD_ROUTING_KEY = process.env.PAGERDUTY_ROUTING_KEY;
const PD_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue";

class PagerDutyAlerter {
  constructor(routingKey) {
    this.routingKey = routingKey;
  }

  async trigger(summary, severity = "error", details = {}, dedupKey = null) {
    const payload = {
      routing_key: this.routingKey,
      event_action: "trigger",
      payload: {
        summary,
        severity,
        source: "captcha-pipeline",
        timestamp: new Date().toISOString(),
        custom_details: details,
      },
    };
    if (dedupKey) payload.dedup_key = dedupKey;

    const resp = await axios.post(PD_EVENTS_URL, payload, { timeout: 10000 });
    return resp.data;
  }

  async resolve(dedupKey) {
    await axios.post(PD_EVENTS_URL, {
      routing_key: this.routingKey,
      event_action: "resolve",
      dedup_key: dedupKey,
    }, { timeout: 10000 });
  }
}

const alerter = new PagerDutyAlerter(PD_ROUTING_KEY);

class CaptchaHealthMonitor {
  constructor(windowMs = 300000) {
    this.results = [];
    this.windowMs = windowMs;
  }

  record(success) {
    this.results.push({ time: Date.now(), success });
    const cutoff = Date.now() - this.windowMs;
    this.results = this.results.filter((r) => r.time > cutoff);
  }

  get errorRate() {
    if (this.results.length === 0) return 0;
    const errors = this.results.filter((r) => !r.success).length;
    return errors / this.results.length;
  }

  async checkAndAlert() {
    // Balance check
    try {
      const resp = await axios.get("https://ocr.captchaai.com/res.php", {
        params: { key: API_KEY, action: "getbalance", json: 1 },
      });
      if (resp.data.status === 1) {
        const balance = parseFloat(resp.data.request);
        if (balance < 2) {
          await alerter.trigger(
            `CaptchaAI balance critically low: $${balance.toFixed(2)}`,
            "critical",
            { balance },
            "captcha-balance-critical"
          );
        } else if (balance < 10) {
          await alerter.trigger(
            `CaptchaAI balance low: $${balance.toFixed(2)}`,
            "warning",
            { balance },
            "captcha-balance-warning"
          );
        } else {
          await alerter.resolve("captcha-balance-critical").catch(() => {});
          await alerter.resolve("captcha-balance-warning").catch(() => {});
        }
      }
    } catch (err) {
      console.error("Balance check failed:", err.message);
    }

    // Error rate check
    const rate = this.errorRate;
    if (rate > 0.2 && this.results.length > 10) {
      await alerter.trigger(
        `CaptchaAI error rate: ${(rate * 100).toFixed(1)}%`,
        "error",
        { errorRate: rate, totalTasks: this.results.length },
        "captcha-error-rate"
      );
    } else if (rate < 0.05 && this.results.length > 10) {
      await alerter.resolve("captcha-error-rate").catch(() => {});
    }
  }
}

const monitor = new CaptchaHealthMonitor();

// Run checks every 60 seconds
setInterval(() => monitor.checkAndAlert(), 60000);

module.exports = { monitor, alerter };

PagerDuty Setup Checklist

Step Action
1 Create a service in PagerDuty for "CaptchaAI Pipeline"
2 Add Events API v2 integration to the service
3 Copy the routing key to PAGERDUTY_ROUTING_KEY env var
4 Set up escalation policy (on-call → team lead → manager)
5 Configure notification rules (push, SMS, phone)
6 Add maintenance windows for planned downtime

Troubleshooting

Issue Cause Fix
Alert not triggering Wrong routing key Verify key matches the service's Events API integration
Duplicate incidents Missing dedup_key Always set a consistent dedup key per alert type
Alert flood No cooldown between triggers PagerDuty dedup key suppresses duplicates; ensure you use them
Auto-resolve not working Dedup key mismatch Ensure resolve uses the exact same dedup key as trigger

FAQ

How do I avoid alert fatigue?

Use deduplication keys to group related alerts into a single incident. Set warning alerts as low-urgency (no page). Reserve critical/high-urgency for balance < $2 or all workers down.

Can I integrate PagerDuty with Datadog/New Relic instead?

Yes. Both Datadog and New Relic have native PagerDuty integrations. Use those if you already send metrics to an observability platform. Direct API integration (this guide) is best when you want custom control.

What's the difference between trigger, acknowledge, and resolve?

Trigger creates a new incident. Acknowledge stops notifications but keeps the incident open (someone is working on it). Resolve closes the incident completely.

Next Steps

Get alerted the moment your CAPTCHA pipeline has issues — start with a CaptchaAI API key and connect PagerDuty.

Related guides:

Full Working Code

Complete runnable examples for this article in Python, Node.js, PHP, Go, Java, C#, Ruby, Rust, Kotlin & Bash.

View on GitHub →

Discussions (0)

No comments yet.