How Image CAPTCHA Solving Works (OCR)

Image CAPTCHAs render distorted text in an image and ask users to type what they see. Automated solving relies on Optical Character Recognition (OCR), a form of pattern recognition that converts image pixels into text characters.


How text CAPTCHAs create difficulty

CAPTCHA images use multiple distortion techniques to defeat OCR:

| Technique | How it works | Effect on OCR |
|---|---|---|
| Character warping | Letters bent, stretched, rotated | Breaks template matching |
| Overlapping characters | Letters overlap each other | Makes segmentation difficult |
| Background noise | Random dots, lines, patterns | Confuses edge detection |
| Color variation | Different colors for text and noise | Breaks single-threshold binarization |
| Variable fonts | Mixed font families and sizes | Prevents consistent template matching |
| Line strikes | Lines drawn through text | Fragments character shapes |
| JPEG artifacts | Low-quality compression | Blurs character edges |

How OCR solves CAPTCHAs

Traditional OCR pipeline

Image → Preprocessing → Binarization → Segmentation → Recognition → Post-processing → Text
  1. Preprocessing: convert to grayscale, remove noise, enhance contrast
  2. Binarization: convert to black/white using adaptive thresholding
  3. Segmentation: isolate individual characters
  4. Recognition: match each character against a trained model
  5. Post-processing: apply dictionary/grammar corrections
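
The binarization and segmentation stages can be illustrated on a tiny grayscale image held in plain Python lists. This is a minimal sketch, not a production pipeline: it assumes a fixed global threshold instead of the adaptive thresholding named in step 2, and `binarize` and `segment_columns` are hypothetical helper names introduced here for illustration.

```python
def binarize(gray, threshold=128):
    """Map grayscale pixels (0-255) to 1 for ink (dark) and 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_columns(binary):
    """Return (start, end) column ranges that contain ink (end is exclusive).

    A column with no ink pixels is treated as a gap between characters.
    """
    width = len(binary[0])
    # Vertical projection: count ink pixels in each column.
    profile = [sum(row[x] for row in binary) for x in range(width)]

    segments, start = [], None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                    # a character begins
        elif count == 0 and start is not None:
            segments.append((start, x))  # a character ends
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


# Toy 3x7 image: two dark blobs separated by one light column.
gray = [
    [250, 10, 20, 240, 30, 15, 250],
    [245, 15, 240, 250, 25, 240, 245],
    [250, 12, 18, 245, 20, 10, 250],
]
binary = binarize(gray)
segments = segment_columns(binary)
print(segments)
```

Projection-based splitting like this depends on an empty column between characters. Overlapping characters or a strike-through line leave no such gap, which is one reason those distortions make segmentation difficult.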

Neural network approach (modern)

Image → CNN Feature Extraction → RNN Sequence Modeling → CTC Decoding → Text

Modern solvers skip segmentation entirely. A Convolutional Neural Network (CNN) extracts visual features, a Recurrent Neural Network (RNN) models those features as a left-to-right sequence, and CTC (Connectionist Temporal Classification) decoding turns the per-step predictions into the final character string.
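
The CTC decoding step can be sketched without a neural network by feeding it made-up per-frame probabilities. The example below shows greedy (best-path) decoding only: pick the most likely symbol per frame, merge consecutive repeats, then drop the blank symbol. The probabilities, the alphabet, and the function name `ctc_greedy_decode` are assumptions for illustration; real solvers may use beam search instead.

```python
BLANK = "-"  # stand-in for the CTC blank symbol in this sketch


def ctc_greedy_decode(frame_probs, alphabet):
    """Pick the most likely symbol per frame, merge repeats, drop blanks."""
    # Step 1: best symbol for each time step (frame).
    best_path = [
        alphabet[max(range(len(probs)), key=probs.__getitem__)]
        for probs in frame_probs
    ]

    # Step 2: keep a symbol only when it differs from the previous frame
    # and is not the blank.
    decoded, previous = [], None
    for symbol in best_path:
        if symbol != previous and symbol != BLANK:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)


alphabet = [BLANK, "a", "K", "3"]
frame_probs = [
    [0.10, 0.80, 0.05, 0.05],  # a
    [0.10, 0.70, 0.10, 0.10],  # a (repeat, merged)
    [0.90, 0.03, 0.03, 0.04],  # blank
    [0.20, 0.10, 0.60, 0.10],  # K
    [0.70, 0.10, 0.10, 0.10],  # blank
    [0.10, 0.10, 0.10, 0.70],  # 3
    [0.10, 0.10, 0.10, 0.70],  # 3 (repeat, merged)
]
print(ctc_greedy_decode(frame_probs, alphabet))
```

The blank is what lets the model emit a genuinely doubled letter: the frame sequence `a, blank, a` decodes to "aa", while `a, a` decodes to a single "a".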


Why local OCR fails on CAPTCHAs

General-purpose OCR tools such as Tesseract and EasyOCR are built for ordinary printed and photographed text, not for text that has been distorted on purpose. CAPTCHAs are designed specifically to defeat them:

| Feature | Document OCR | CAPTCHA |
|---|---|---|
| Text clarity | Clean, high-contrast | Distorted, noisy |
| Character spacing | Regular | Overlapping |
| Background | White/plain | Complex patterns |
| Font consistency | Uniform | Variable |
| Accuracy needed | ~95% usable | ~100% required |

If any character is wrong, the entire CAPTCHA solution fails. Assuming errors on each character are independent, a 95% per-character accuracy gives a 6-character CAPTCHA a success rate of only about 73.5% (0.95^6 ≈ 0.735).
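
The arithmetic generalizes to any length and accuracy. The short sketch below assumes, as a simplification, that character errors are independent; the function names are hypothetical.

```python
def captcha_success_rate(per_char_accuracy, length):
    """Probability that every character is right, assuming independent errors."""
    return per_char_accuracy ** length


def required_char_accuracy(target_success, length):
    """Per-character accuracy needed to reach a whole-CAPTCHA success target."""
    return target_success ** (1 / length)


for acc in (0.90, 0.95, 0.99):
    rate = captcha_success_rate(acc, 6)
    print(f"{acc:.2f} per character -> {rate:.1%} per 6-character CAPTCHA")

needed = required_char_accuracy(0.95, 6)
print(f"A 95% solve rate on 6 characters needs {needed:.2%} per character")
```

Running the inverse calculation shows why the comparison table lists CAPTCHA accuracy as "~100% required": hitting a high whole-image solve rate demands per-character accuracy well above what is usable for document OCR.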


API-based solving vs local OCR

| Approach | Accuracy | Speed | Cost |
|---|---|---|---|
| Tesseract on CAPTCHA | 10–40% | Instant | Free |
| Custom CNN model | 60–85% | Instant | Training cost |
| CaptchaAI API | ~95%+ | 5–15 sec | Per-solve |
| Manual human solving | ~99% | 10–30 sec | Per-solve |

CaptchaAI achieves high accuracy by combining specialized ML models trained specifically on CAPTCHA images with human verification for difficult cases.
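
One way to compare these accuracy figures is the average number of attempts needed per solved CAPTCHA. The sketch below assumes every attempt is independent and that a failed attempt can simply be retried on a fresh image, which gives a mean of 1/p attempts. Using the midpoint of each accuracy range is this sketch's choice, not a measured value.

```python
def expected_attempts(success_rate):
    """Mean number of tries until the first success (geometric distribution)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1 / success_rate


# Accuracy values from the comparison table; ranges are reduced to midpoints.
approaches = {
    "Tesseract on CAPTCHA": 0.25,
    "Custom CNN model": 0.725,
    "CaptchaAI API": 0.95,
    "Manual human solving": 0.99,
}
for name, rate in approaches.items():
    print(f"{name}: about {expected_attempts(rate):.1f} attempts per solved CAPTCHA")
```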


Common CAPTCHA image types

Simple text CAPTCHA

┌──────────────────┐
│   a K 3 m P 7    │  ← Letters and numbers, minimal distortion
└──────────────────┘

Heavily distorted

┌──────────────────┐
│ ▓░▒a░K▒3▓m░P▒7░ │  ← Noise, warping, overlapping
└──────────────────┘

Math CAPTCHA

┌──────────────────┐
│    3 + 7 = ?     │  ← User must compute: answer is 10
└──────────────────┘
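
For a math CAPTCHA, recognizing the text is only half the job: the recognized expression still has to be evaluated. A minimal sketch, assuming the OCR output has the shape "number operator number = ?" and that only the operators listed in the code appear. `solve_math_captcha` is a hypothetical helper, and it deliberately avoids `eval()` on untrusted text.

```python
import operator
import re

# Only these operators are accepted; anything else is rejected.
_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "x": operator.mul}
_PATTERN = re.compile(r"^\s*(\d+)\s*([+\-*x])\s*(\d+)\s*=\s*\?\s*$")


def solve_math_captcha(text):
    """Evaluate OCR output such as '3 + 7 = ?' without using eval()."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"Unrecognized math CAPTCHA: {text!r}")
    left, op, right = match.groups()
    return _OPS[op](int(left), int(right))


print(solve_math_captcha("3 + 7 = ?"))  # prints 10
```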

Multiline CAPTCHA

┌──────────────────┐
│   Select the     │
│   red word:      │
│  apple  DOG      │  ← User types "DOG" (shown in red)
└──────────────────┘

Solving with CaptchaAI

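import base64
import time

import requests

API_KEY = "YOUR_API_KEY"

# Read the CAPTCHA image and base64-encode it for upload
with open("captcha.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

# Submit the image; on success the task ID is returned in "request"
resp = requests.post("https://ocr.captchaai.com/in.php", data={
    "key": API_KEY,
    "method": "base64",
    "body": img_b64,
    "json": 1
}, timeout=30).json()

# Stop early if the submission was rejected (bad key, bad image, ...)
if resp.get("status") != 1:
    raise RuntimeError(f"Submit failed: {resp}")

task_id = resp["request"]

# Poll every 5 seconds, for up to 30 attempts (150 seconds in total)
for _ in range(30):
    time.sleep(5)
    result = requests.get("https://ocr.captchaai.com/res.php", params={
        "key": API_KEY, "action": "get", "id": task_id, "json": 1
    }, timeout=30).json()
    if result.get("status") == 1:
        print(f"Text: {result['request']}")
        break
else:
    # The loop finished without ever receiving a solved result
    raise TimeoutError(f"No result after 30 polls; last response: {result}")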

FAQ

Is OCR CAPTCHA still used?

Yes. Many legacy systems, government websites, and smaller sites still use text-based image CAPTCHAs because they are simple to implement.

Can I train my own model instead of using an API?

Yes, but you need thousands of labeled CAPTCHA examples from the specific site. The model must be retrained if the site changes its CAPTCHA style. API services handle this maintenance for you.

What about CAPTCHAs with colored text?

CaptchaAI handles colored text CAPTCHAs. For local OCR, preprocessing must preserve the text while removing colored noise, which is significantly harder than cleaning up a grayscale CAPTCHA.
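
One common local approach is to filter by color before binarizing. The sketch below assumes the text color is already known (here a hypothetical red) and keeps only pixels within a tolerance of it. The pixel values, the tolerance, and the function names are illustrative assumptions.

```python
def color_distance_sq(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def isolate_text_color(pixels, text_color, tolerance=60):
    """Return a binary mask: 1 where a pixel is close to text_color, else 0."""
    limit = tolerance ** 2
    return [
        [1 if color_distance_sq(px, text_color) <= limit else 0 for px in row]
        for row in pixels
    ]


RED, BLUE, WHITE = (200, 30, 30), (30, 30, 200), (255, 255, 255)
pixels = [
    [WHITE, RED, BLUE, WHITE],  # blue stands in for colored noise
    [BLUE, RED, RED, WHITE],
]
mask = isolate_text_color(pixels, text_color=(210, 20, 25))
print(mask)
```

In this toy image the red text and the blue noise are both dark, so a single grayscale threshold would mark both as ink. Comparing in RGB space separates them, but only if the text color can be estimated reliably for each image.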

How do I report wrong answers?

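Send a request to https://ocr.captchaai.com/res.php?key=KEY&action=reportbad&id=TASK_ID with KEY replaced by your API key and TASK_ID by the ID returned when you submitted the image. Reporting improves solver accuracy and may refund the solve cost.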
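
A small helper can assemble that URL so the key and task ID are always encoded correctly. This sketch only builds the string; `build_report_url` is a hypothetical name, and the task ID shown is a placeholder.

```python
from urllib.parse import urlencode

REPORT_ENDPOINT = "https://ocr.captchaai.com/res.php"  # endpoint from the URL above


def build_report_url(api_key, task_id):
    """Build the reportbad URL for a task whose answer was rejected."""
    query = urlencode({"key": api_key, "action": "reportbad", "id": task_id})
    return f"{REPORT_ENDPOINT}?{query}"


# "123456" is a placeholder task ID, not a real one.
print(build_report_url("YOUR_API_KEY", "123456"))
```

The resulting URL can then be fetched with an HTTP GET, for example using `requests.get(url, timeout=30)`.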

