How Grid Image CAPTCHA Challenges Work

Grid image CAPTCHAs display an image divided into cells and ask users to select specific cells based on visual content. This format is used by reCAPTCHA, custom CAPTCHA systems, and various website protection services.


The core mechanism

  1. A large image is split into a grid (3×3, 4×4, or custom)
  2. Text instructions describe what to find ("Select all squares with traffic lights")
  3. The user clicks cells containing the target object
  4. The server verifies the selection against known correct answers
  5. If correct, access is granted; if wrong, a new challenge appears
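
Step 4 can be sketched as a set comparison. This is a minimal illustration, not any provider's actual verification logic: the function name `verify_selection` and the `tolerance` parameter are hypothetical, and production systems typically weigh behavioral signals alongside the answer check.

```python
def verify_selection(selected, correct, tolerance=0):
    """Return True when the clicked cells match the answer key.

    selected / correct: iterables of 1-based cell numbers.
    tolerance: hypothetical allowance for missed or extra cells.
    """
    selected, correct = set(selected), set(correct)
    missed = correct - selected   # target cells the user did not click
    extra = selected - correct    # clicked cells without the target
    return len(missed) + len(extra) <= tolerance


# Traffic lights appear in cells 1, 4, and 7 of a 3x3 grid.
answer_key = {1, 4, 7}
print(verify_selection([7, 1, 4], answer_key))        # click order does not matter
print(verify_selection([1, 4], answer_key))           # one cell missed
print(verify_selection([1, 4], answer_key, tolerance=1))
```

Using sets makes the check independent of click order, which matches the "select all squares" style of instruction.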

Grid formats

Standard 3×3 grid (9 cells)

The most common format. One image is divided into 9 equal sections:

┌─────┬─────┬─────┐
│  1  │  2  │  3  │
├─────┼─────┼─────┤
│  4  │  5  │  6  │
├─────┼─────┼─────┤
│  7  │  8  │  9  │
└─────┴─────┴─────┘

4×4 grid (16 cells)

Used for higher security. Smaller cells make object identification harder:

┌────┬────┬────┬────┐
│ 1  │ 2  │ 3  │ 4  │
├────┼────┼────┼────┤
│ 5  │ 6  │ 7  │ 8  │
├────┼────┼────┼────┤
│ 9  │ 10 │ 11 │ 12 │
├────┼────┼────┼────┤
│ 13 │ 14 │ 15 │ 16 │
└────┴────┴────┴────┘
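
Both diagrams number cells row by row, starting at 1 in the top-left corner. Assuming equal-sized cells, a cell number can be converted to a row/column pair and a pixel bounding box as sketched below. The function names and the example image dimensions are illustrative only.

```python
def cell_to_row_col(cell, cols):
    """Convert a 1-based, row-major cell number to zero-based (row, col)."""
    return divmod(cell - 1, cols)


def cell_bounds(cell, rows, cols, width, height):
    """Return (left, top, right, bottom) pixel bounds of a cell.

    Assumes the image is split into rows x cols equal cells.
    """
    if not 1 <= cell <= rows * cols:
        raise ValueError(f"cell must be between 1 and {rows * cols}")
    row, col = cell_to_row_col(cell, cols)
    left = col * width // cols
    top = row * height // rows
    right = (col + 1) * width // cols
    bottom = (row + 1) * height // rows
    return left, top, right, bottom


# Center cell of a 3x3 grid on a 300x300 image.
print(cell_bounds(5, rows=3, cols=3, width=300, height=300))
# Bottom-right cell of a 4x4 grid on a 400x400 image.
print(cell_bounds(16, rows=4, cols=4, width=400, height=400))
```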

Custom grids

Some systems use irregular layouts — different-sized cells, non-square grids, or overlapping images.


Types of grid challenges

Type | Description | Example
--- | --- | ---
Single object | Select all cells containing one object type | "Select all buses"
Multi-round | New tiles replace selected ones | reCAPTCHA dynamic grids
Ordered selection | Click items in a specific sequence | "Click the cars from left to right"
Negative selection | Identify cells that do NOT contain the object | "Select cells without text"
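
To make the difference between these types concrete, the sketch below computes the expected answer for three of them from a list of object detections. The `Detection` record and all function names are hypothetical; how a real system labels its images is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    cell: int        # 1-based cell number containing the object
    label: str       # object class, e.g. "car"
    x_center: float  # horizontal position in the full image, in pixels


def single_object(detections, target):
    """Cells that contain the target object."""
    return {d.cell for d in detections if d.label == target}


def negative_selection(detections, target, total_cells):
    """Cells that do NOT contain the target object."""
    all_cells = set(range(1, total_cells + 1))
    return all_cells - single_object(detections, target)


def ordered_selection(detections, target):
    """Cells containing the target, ordered left to right."""
    hits = [d for d in detections if d.label == target]
    return [d.cell for d in sorted(hits, key=lambda d: d.x_center)]


detections = [
    Detection(cell=2, label="car", x_center=150.0),
    Detection(cell=7, label="car", x_center=40.0),
    Detection(cell=5, label="bus", x_center=120.0),
]
print(single_object(detections, "car"))
print(negative_selection(detections, "car", total_cells=9))
print(ordered_selection(detections, "car"))
```

Single-object and negative answers are sets because click order is irrelevant, while an ordered challenge needs a list.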

Who uses grid image CAPTCHAs

Provider | Grid format | Key characteristics
--- | --- | ---
Google reCAPTCHA v2 | 3×3 and 4×4 | Dynamic tiles, behavioral analysis
BLS CAPTCHA | Variable (3-9 separate images) | Custom instructions, visa systems
hCaptcha | 3×3 and 4×4 | Similar to reCAPTCHA, privacy-focused
Custom implementations | Variable | Site-specific, no standardized API

How grid CAPTCHAs detect bots

Signal | What it reveals
--- | ---
Click accuracy | Simple bots tend to click exact cell centers; human clicks land off-center
Click timing | Bots often click too fast or at perfectly regular intervals
Mouse trajectory | Scripted movement tends to follow straight lines; human movement curves
Selection correctness | Human mistakes cluster on ambiguous edge cells, while bot mistakes tend to be all-or-nothing
Challenge completion time | Implausibly fast completion suggests automation; very slow completion can indicate an automated attempt that stalled or gave up
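
Three of these signals can be expressed as simple numeric features. The sketch below is illustrative only: the function names are made up, it requires Python 3.8+ for `math.dist`, and no thresholds are given because real systems feed many such signals into models whose details are not public.

```python
import math
from statistics import pstdev


def path_straightness(points):
    """Straight-line distance divided by distance travelled (1.0 = straight)."""
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return 1.0
    return math.dist(points[0], points[-1]) / travelled


def interval_spread(click_times):
    """Standard deviation of the gaps between clicks, in seconds."""
    gaps = [b - a for a, b in zip(click_times, click_times[1:])]
    return pstdev(gaps) if len(gaps) > 1 else 0.0


def center_offset(click, bounds):
    """Distance in pixels from a click to the center of its cell."""
    left, top, right, bottom = bounds
    center = ((left + right) / 2, (top + bottom) / 2)
    return math.dist(click, center)


# A scripted-looking session: straight path, regular timing, dead-center click.
print(path_straightness([(0, 0), (30, 40)]))
print(interval_spread([0.0, 0.5, 1.0, 1.5]))
print(center_offset((150, 150), (100, 100, 200, 200)))

# A more human-looking path that bends on the way to the target.
print(path_straightness([(0, 0), (30, 0), (30, 40)]))
```

A straightness near 1.0, a spread near 0, and an offset near 0 together resemble the bot-like patterns in the table; any single value on its own is weak evidence.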

Solving grid CAPTCHAs with CaptchaAI

For reCAPTCHA grids, use the token method for better reliability:

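import time

import requests

# Token method: CaptchaAI handles the grid internally and returns a token.
resp = requests.get("https://ocr.captchaai.com/in.php", params={
    "key": "YOUR_API_KEY",
    "method": "userrecaptcha",
    "googlekey": "SITE_KEY",
    "pageurl": "https://example.com",
    "json": 1
}, timeout=30).json()
if resp.get("status") != 1:
    raise RuntimeError(f"Submission failed: {resp.get('request')}")
task_id = resp["request"]

# Poll every 5 seconds, for up to 30 attempts, until the token is ready.
for _ in range(30):
    time.sleep(5)
    result = requests.get("https://ocr.captchaai.com/res.php", params={
        "key": "YOUR_API_KEY", "action": "get", "id": task_id, "json": 1
    }, timeout=30).json()
    if result.get("status") == 1:
        print(f"Token: {result['request'][:50]}...")
        break
else:
    raise TimeoutError("No token received after 30 polling attempts")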

For non-reCAPTCHA grids, use the image method:

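import base64

import requests

# Read the grid screenshot and base64-encode it for upload.
with open("grid.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

# Submit the encoded image as a grid task.
resp = requests.post("https://ocr.captchaai.com/in.php", data={
    "key": "YOUR_API_KEY",
    "method": "post",
    "body": img_b64,
    "recaptcha": 1,
    "json": 1
}, timeout=30).json()
if resp.get("status") != 1:
    raise RuntimeError(f"Submission failed: {resp.get('request')}")
task_id = resp["request"]  # poll res.php with this ID, as in the token example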

FAQ

What is the difference between grid CAPTCHA and image CAPTCHA?

Grid CAPTCHA divides an image into cells for selection. Image CAPTCHA (OCR) shows distorted text that the user types. Grid challenges require object recognition; image CAPTCHAs require text recognition.

Can AI solve grid CAPTCHAs without human help?

Modern image classification models can identify objects in grid cells, but success rates vary. CaptchaAI combines AI models with human verification for high accuracy.

Why do some grid CAPTCHAs show new images after clicking?

Dynamic grids (used by reCAPTCHA) replace clicked tiles to prevent screenshot-based solving and to require sustained attention. This increases the difficulty for automated systems.

Are 4×4 grids harder to solve than 3×3?

Yes. Smaller cells contain less visual information, making object identification harder. 4×4 grids also require selecting more cells correctly.
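
As a rough worked comparison, the sketch below computes how much of the image each cell covers and how many distinct selections are possible. It is a simplification: it assumes equal-sized cells and treats every subset of cells as an equally likely answer, which real challenges do not guarantee. The function names are illustrative.

```python
def cell_area_fraction(rows, cols):
    """Fraction of the full image covered by one equal-sized cell."""
    return 1 / (rows * cols)


def possible_selections(rows, cols):
    """Number of distinct subsets of cells a user could submit."""
    return 2 ** (rows * cols)


for rows, cols in [(3, 3), (4, 4)]:
    print(
        f"{rows}x{cols}: each cell covers "
        f"{cell_area_fraction(rows, cols):.2%} of the image, "
        f"{possible_selections(rows, cols)} possible selections"
    )
```

Under these assumptions a 3×3 cell covers about 11.11% of the image with 512 possible selections, while a 4×4 cell covers 6.25% with 65,536, so blind guessing becomes 128 times less likely to succeed.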

