How Grid Image CAPTCHAs Work

Grid image CAPTCHAs present a photo divided into tiles and ask users to select the tiles containing a specific object — traffic lights, crosswalks, buses. This is the visual challenge behind reCAPTCHA v2's image grid. Understanding how these grids work is essential for building reliable solving workflows.


The grid structure

A single image is divided into a grid of equal-sized tiles:

3×3 Grid (9 tiles):          4×4 Grid (16 tiles):
┌───┬───┬───┐                ┌───┬───┬───┬───┐
│ 1 │ 2 │ 3 │                │ 1 │ 2 │ 3 │ 4 │
├───┼───┼───┤                ├───┼───┼───┼───┤
│ 4 │ 5 │ 6 │                │ 5 │ 6 │ 7 │ 8 │
├───┼───┼───┤                ├───┼───┼───┼───┤
│ 7 │ 8 │ 9 │                │ 9 │10 │11 │12 │
└───┴───┴───┘                ├───┼───┼───┼───┤
                             │13 │14 │15 │16 │
                             └───┴───┴───┴───┘

Tiles are numbered left-to-right, top-to-bottom. The user selects tiles that contain the target object.
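
The numbering scheme maps directly to rows, columns, and pixel regions. The following is a minimal sketch, assuming a square grid whose tiles divide the image evenly; the function names `tile_position` and `tile_box` are illustrative and not part of any CAPTCHA API.

```python
def tile_position(index: int, grid_size: int) -> tuple:
    """Return the 0-based (row, col) of a 1-based tile index."""
    if not 1 <= index <= grid_size * grid_size:
        raise ValueError(f"tile {index} is outside a {grid_size}x{grid_size} grid")
    # Tiles run left-to-right, then top-to-bottom, so divmod gives (row, col).
    return divmod(index - 1, grid_size)


def tile_box(index: int, grid_size: int, width: int, height: int) -> tuple:
    """Return the (left, top, right, bottom) pixel box of a tile."""
    row, col = tile_position(index, grid_size)
    tile_w, tile_h = width // grid_size, height // grid_size
    left, top = col * tile_w, row * tile_h
    return (left, top, left + tile_w, top + tile_h)


print(tile_position(5, 3))       # (1, 1): the centre tile of a 3x3 grid
print(tile_box(5, 3, 300, 300))  # (100, 100, 200, 200)
print(tile_position(10, 4))      # (2, 1): third row, second column of a 4x4 grid
```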


How the challenge works

  1. Image generated — The server selects a photo containing identifiable objects
  2. Grid applied — The photo is divided into a 3×3 or 4×4 grid
  3. Instruction shown — "Select all squares with traffic lights"
  4. User selects tiles — Clicking tiles that contain the target object
  5. Verification — The server checks whether the correct tiles were selected

The instruction always names a single object category. Common targets include: crosswalks, traffic lights, cars, buses, motorcycles, bicycles, fire hydrants, stairs, bridges, boats, and parking meters.


Single-step vs multi-step challenges

Single-step

One image, one instruction, one selection. Select the matching tiles and submit. This is the simpler format.

Multi-step (dynamic grids)

After selecting tiles and submitting, new tiles appear in the selected positions. The user must select again if the new tiles also match the instruction. This continues until no matching tiles remain.

Step 1: Select tiles 2, 5, 8 (traffic lights)
Step 2: Tiles 2, 5, 8 refresh with new images
        → Select tile 5 (still has traffic light)
Step 3: Tile 5 refreshes again
        → No traffic lights → Done

Multi-step challenges are harder to automate because each step requires a new image analysis.
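
The select-and-refresh loop can be sketched as follows. This is an illustration only: `matches` stands in for whatever image classifier is used and `refresh` for whatever fetches the replacement tile, both hypothetical callbacks, and tile images are represented as plain strings so the example runs on its own.

```python
from typing import Callable, Dict, List


def solve_dynamic_grid(
    tiles: Dict[int, str],
    matches: Callable[[str], bool],
    refresh: Callable[[int], str],
    max_steps: int = 10,
) -> List[List[int]]:
    """Repeat select-and-refresh until no tile matches.

    tiles   : tile index -> current tile image (any opaque handle)
    matches : hypothetical classifier, True if the image shows the target
    refresh : hypothetical callback returning the new image for a tile
    Returns the tile indices selected at each step.
    """
    history = []
    pending = sorted(tiles)  # the first pass looks at every tile
    for _ in range(max_steps):
        selected = [i for i in pending if matches(tiles[i])]
        if not selected:
            break
        history.append(selected)
        for i in selected:
            tiles[i] = refresh(i)  # a new image appears in the same position
        pending = selected  # only refreshed tiles need another look
    return history


# Reproduce the walkthrough above with stand-in "images".
tiles = {i: "road" for i in range(1, 10)}
for i in (2, 5, 8):
    tiles[i] = "traffic light"
replacements = {2: ["tree"], 5: ["traffic light", "road"], 8: ["sky"]}

steps = solve_dynamic_grid(
    tiles,
    matches=lambda img: img == "traffic light",
    refresh=lambda i: replacements[i].pop(0),
)
print(steps)  # [[2, 5, 8], [5]]
```

The `max_steps` cap is a defensive choice so that a misbehaving classifier cannot loop forever.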


How reCAPTCHA v2 uses grids

reCAPTCHA v2 uses grid challenges as a fallback when the checkbox risk score is too high. The flow:

  1. User clicks "I'm not a robot" checkbox
  2. reCAPTCHA evaluates browser behavior and risk score
  3. Low risk → Checkbox passes immediately (no grid)
  4. High risk → Grid image challenge appears
  5. User solves the grid → reCAPTCHA generates a token

The grid difficulty scales with risk. Higher-risk sessions get:

  • 4×4 grids instead of 3×3
  • Multi-step challenges instead of single-step
  • Harder-to-distinguish objects (crosswalks in shadows, partial traffic lights)

Grid vs individual images

Grid CAPTCHAs split a single photo into tiles. This is different from CAPTCHAs that show multiple distinct images (like BLS CAPTCHA).

Feature         | Grid (reCAPTCHA)                  | Individual images (BLS)
----------------|-----------------------------------|--------------------------
Source          | One photo, divided                | Separate distinct images
Context         | Objects span multiple tiles       | Each image is independent
Partial objects | Yes (corner of a car in one tile) | No
Multi-step      | Yes (tiles refresh)               | No

The key challenge with grid CAPTCHAs is that objects can span tile boundaries. A traffic light might appear across tiles 2 and 5, requiring both to be selected even though neither shows the complete object.
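
To make the boundary-spanning case concrete, here is a small sketch that lists every tile a pixel bounding box touches. It assumes the object's box is already known (for example from an object detector), which the article does not specify; `tiles_for_box` is an illustrative name.

```python
def tiles_for_box(box, grid_size, width, height):
    """Return 1-based indices of tiles overlapped by a pixel box.

    box is (left, top, right, bottom), with right and bottom exclusive.
    """
    left, top, right, bottom = box
    tile_w, tile_h = width / grid_size, height / grid_size
    # Clamp to the grid so boxes touching the image edge stay in range.
    first_col = max(0, int(left // tile_w))
    last_col = min(grid_size - 1, int((right - 1) // tile_w))
    first_row = max(0, int(top // tile_h))
    last_row = min(grid_size - 1, int((bottom - 1) // tile_h))
    return [
        row * grid_size + col + 1
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# A tall traffic light in the middle column of a 300x300 image, 3x3 grid.
print(tiles_for_box((130, 60, 170, 150), 3, 300, 300))  # [2, 5]
```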


What makes grid CAPTCHAs difficult

Challenge            | Description
---------------------|------------------------------------------------------
Partial visibility   | Object appears in only a corner of a tile
Ambiguous boundaries | Object is partially in the tile: include it or not?
Similar objects      | Street lights vs traffic lights, vans vs buses
Perspective          | Objects at unusual angles or distances
Occlusion            | Objects partially hidden behind others
Multi-step dynamics  | New tiles introduce new classification per step
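
One way to reason about the "include it or not?" question is to measure how much of each tile an object covers and apply a cutoff. The sketch below assumes a known bounding box in pixels and a hypothetical 5% coverage threshold; the article does not describe how reCAPTCHA grades borderline tiles, so the threshold is purely illustrative.

```python
def tile_coverage(box, index, grid_size, width, height):
    """Fraction of one tile's area covered by a (left, top, right, bottom) box."""
    row, col = divmod(index - 1, grid_size)
    tile_w, tile_h = width / grid_size, height / grid_size
    t_left, t_top = col * tile_w, row * tile_h
    overlap_w = min(box[2], t_left + tile_w) - max(box[0], t_left)
    overlap_h = min(box[3], t_top + tile_h) - max(box[1], t_top)
    if overlap_w <= 0 or overlap_h <= 0:
        return 0.0  # the box does not touch this tile
    return (overlap_w * overlap_h) / (tile_w * tile_h)


def select_tiles(box, grid_size, width, height, min_coverage=0.05):
    """Select tiles whose covered fraction meets the (assumed) threshold."""
    return [
        i
        for i in range(1, grid_size * grid_size + 1)
        if tile_coverage(box, i, grid_size, width, height) >= min_coverage
    ]


# A bus spanning the middle row, with only a sliver in the left-hand tile.
bus = (90, 120, 260, 190)
print(select_tiles(bus, 3, 300, 300, min_coverage=0.05))  # [4, 5, 6]
print(select_tiles(bus, 3, 300, 300, min_coverage=0.10))  # [5, 6]
```

Tile 4 holds only a thin strip of the bus, so it is kept or dropped depending on the threshold, which is exactly the ambiguity described in the table.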

How CaptchaAI solves grid CAPTCHAs

CaptchaAI processes grid CAPTCHAs using these parameters:

Parameter    | Value
-------------|----------------------------------------
method       | post (file upload)
grid_size    | 3x3 or 4x4
img_type     | recaptcha
instructions | The target object (e.g., "crosswalks")

CaptchaAI analyzes the full image in context, accounting for objects that span tiles and using the surrounding tiles to decide each selection. The response is an array of tile indices, for example [1, 3, 6, 9].
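
The parameters above can be assembled and the response validated along these lines. This is a sketch under stated assumptions: it makes no network call because the endpoint, authentication, and image upload field are not covered here, it assumes the response arrives as JSON-style array text, and it assumes indices are 1-based to match the tile numbering used in this article. `build_grid_fields` and `parse_tile_indices` are hypothetical helper names, not part of a CaptchaAI client library.

```python
import json

# Number of tiles for each grid_size value listed in the table.
ALLOWED_GRID_SIZES = {"3x3": 9, "4x4": 16}


def build_grid_fields(instructions: str, grid_size: str) -> dict:
    """Collect the four listed parameters into a dict of request fields."""
    if grid_size not in ALLOWED_GRID_SIZES:
        raise ValueError(f"grid_size must be one of {sorted(ALLOWED_GRID_SIZES)}")
    return {
        "method": "post",
        "grid_size": grid_size,
        "img_type": "recaptcha",
        "instructions": instructions,
    }


def parse_tile_indices(raw: str, grid_size: str) -> list:
    """Parse text such as "[1, 3, 6, 9]" and check every index is in range."""
    indices = json.loads(raw)
    limit = ALLOWED_GRID_SIZES[grid_size]
    if not isinstance(indices, list):
        raise ValueError("expected an array of tile indices")
    for i in indices:
        # Assumption: indices are 1-based, as in the grid diagrams.
        if type(i) is not int or not 1 <= i <= limit:
            raise ValueError(f"tile index {i!r} is outside a {grid_size} grid")
    return sorted(set(indices))


fields = build_grid_fields("crosswalks", "3x3")
print(fields["grid_size"], fields["instructions"])  # 3x3 crosswalks
print(parse_tile_indices("[1, 3, 6, 9]", "3x3"))    # [1, 3, 6, 9]
```

Validating the indices before clicking anything guards against acting on a malformed or mismatched response.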


Accuracy factors

Factor                  | Higher accuracy               | Lower accuracy
------------------------|-------------------------------|-----------------------------------
Image quality           | Clear, well-lit photos        | Dark, blurry, compressed
Object clarity          | Obvious objects (large car)   | Partial/distant objects
Grid size               | 3×3 (larger tiles)            | 4×4 (smaller tiles, less context)
Instruction specificity | Common objects (cars, lights) | Ambiguous objects

FAQ

Are 4×4 grids harder than 3×3?

Yes. Smaller tiles provide less visual context per cell, and objects are more likely to be partially visible. 4×4 grids are used for higher-risk sessions.

How many tiles are usually correct?

Typically 2–5 tiles out of 9 (3×3) or 3–6 tiles out of 16 (4×4). Selecting all tiles or no tiles is almost never correct.
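
These typical ranges can serve as a cheap sanity check before submitting a selection. The sketch below only flags unusual counts; it does not reject them, since the ranges are rules of thumb rather than guarantees. `selection_warning` is an illustrative name.

```python
# Typical number of correct tiles per grid size, taken from the ranges above.
TYPICAL_COUNTS = {3: (2, 5), 4: (3, 6)}


def selection_warning(selected, grid_size):
    """Return a warning string for an unusual selection, or None if typical."""
    total = grid_size * grid_size
    count = len(set(selected))
    low, high = TYPICAL_COUNTS[grid_size]
    if count == 0:
        return "no tiles selected"
    if count == total:
        return "every tile selected"
    if not low <= count <= high:
        return f"{count} tiles selected; {low}-{high} is typical for {grid_size}x{grid_size}"
    return None


print(selection_warning([1, 3, 6, 9], 3))  # None
print(selection_warning([], 3))            # no tiles selected
print(selection_warning([7], 4))           # 1 tiles selected; 3-6 is typical for 4x4
```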

Can I solve multi-step grid CAPTCHAs with CaptchaAI?

Yes, one step at a time. For single-step grids, submit the image with its instructions and use the returned tile indices. For multi-step grids, submit each new image state separately as tiles refresh, and repeat until no matching tiles remain.


Solve grid image CAPTCHAs with CaptchaAI

Get accurate grid solutions at captchaai.com.

