CAPTCHA Appears After Login: Mid-Session CAPTCHA Handling

You logged in successfully and scraped 50 pages. Then a CAPTCHA appears and blocks your session without warning. Mid-session CAPTCHAs are triggered by behavior during the session, not just by the initial visit. This guide covers why they appear, how to detect them, and how to solve them without losing your session.


Why CAPTCHAs appear mid-session

| Trigger | Description |
| --- | --- |
| Request rate | Too many requests in a short time |
| Navigation pattern | Non-human browsing patterns (no pauses, no scrolling) |
| Session age | Cookie or token expiry after a set duration |
| IP reputation change | Proxy IP gets flagged during the session |
| Action-based triggers | Specific actions (checkout, form submit) trigger verification |
| JavaScript fingerprint | Missing or inconsistent browser fingerprint |

Detecting mid-session CAPTCHAs

Your scraper must check every response for CAPTCHA indicators:

import requests

session = requests.Session()


def has_captcha(response):
    """Check if a response contains a CAPTCHA challenge."""
    html = response.text.lower()

    # reCAPTCHA
    if "g-recaptcha" in html or "www.google.com/recaptcha" in html:
        return "recaptcha"

    # Cloudflare Turnstile
    if "cf-turnstile" in html or "challenges.cloudflare.com/turnstile" in html:
        return "turnstile"

    # Cloudflare Challenge (full-page)
    if response.status_code in [403, 503] and "just a moment" in html:
        return "cloudflare_challenge"

    # Generic CAPTCHA indicators
    if "captcha" in html and ("verify" in html or "robot" in html):
        return "unknown"

    return None


def safe_get(url):
    """GET with automatic CAPTCHA detection."""
    resp = session.get(url)
    captcha_type = has_captcha(resp)

    if captcha_type:
        print(f"CAPTCHA detected ({captcha_type}) on {url}")
        resp = handle_captcha(resp, url, captcha_type)

    return resp
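
One gap in safe_get is that the response returned after solving is not checked again, so a second challenge would pass through unnoticed. A minimal sketch of a bounded retry loop follows. The names get_with_captcha_limit and CaptchaLoopError are hypothetical, the fetch, detect, and solve arguments are stand-ins for session.get, has_captcha, and handle_captcha, and the default of two solves is an arbitrary assumption.

```python
from typing import Callable, Optional, TypeVar

R = TypeVar("R")


class CaptchaLoopError(RuntimeError):
    """Raised when a challenge is still present after the allowed solves."""


def get_with_captcha_limit(
    url: str,
    fetch: Callable[[str], R],
    detect: Callable[[R], Optional[str]],
    solve: Callable[[R, str, str], R],
    max_solves: int = 2,
) -> R:
    """Fetch url, solving at most max_solves challenges before giving up."""
    resp = fetch(url)
    for _ in range(max_solves):
        captcha_type = detect(resp)
        if captcha_type is None:
            return resp
        # The response returned by solve() is re-checked on the next pass.
        resp = solve(resp, url, captcha_type)
    if detect(resp) is not None:
        raise CaptchaLoopError(f"Still challenged after {max_solves} solves: {url}")
    return resp
```

With the functions from this guide, a call would look like get_with_captcha_limit(url, session.get, has_captcha, handle_captcha). Raising instead of looping forever keeps a misbehaving page from burning solver credits.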

Solving mid-session CAPTCHAs

When a CAPTCHA is detected, solve it and continue without losing the session:

import time

API_KEY = "YOUR_API_KEY"


def solve_recaptcha(sitekey, pageurl):
    """Submit a reCAPTCHA task to CaptchaAI and poll until a token is returned."""
    # Submit the task; on success, "request" holds the task ID
    submit = requests.post("https://ocr.captchaai.com/in.php", data={
        "key": API_KEY, "method": "userrecaptcha",
        "googlekey": sitekey, "pageurl": pageurl, "json": 1
    }).json()
    if submit.get("status") != 1:
        raise RuntimeError(submit.get("request"))
    task_id = submit["request"]
    # Wait before the first poll, then check every 5 seconds (24 attempts)
    time.sleep(15)
    for _ in range(24):
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id, "json": 1
        }).json()
        if result.get("status") == 1:
            return result["request"]  # the solved token
        time.sleep(5)
    raise TimeoutError("Solve timed out")


def handle_captcha(response, url, captcha_type):
    """Solve the detected CAPTCHA and retry the request."""
    html = response.text

    if captcha_type == "recaptcha":
        if 'data-sitekey="' in html:
            start = html.index('data-sitekey="') + 14
            end = html.index('"', start)
            sitekey = html[start:end]

            token = solve_recaptcha(sitekey, url)

            # Submit the token using the SAME session (preserves cookies)
            return session.post(url, data={
                "g-recaptcha-response": token
            })

    if captcha_type == "turnstile":
        if 'data-sitekey="' in html:
            start = html.index('data-sitekey="') + 14
            end = html.index('"', start)
            sitekey = html[start:end]

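            submit = requests.post("https://ocr.captchaai.com/in.php", data={
                "key": API_KEY, "method": "turnstile",
                "sitekey": sitekey, "pageurl": url, "json": 1
            }).json()
            # Stop early if the task was rejected instead of polling a bad ID
            if submit.get("status") != 1:
                raise RuntimeError(submit.get("request"))
            task_id = submit["request"]
            time.sleep(10)
            for _ in range(24):
                result = requests.get("https://ocr.captchaai.com/res.php", params={
                    "key": API_KEY, "action": "get", "id": task_id, "json": 1
                }).json()
                if result.get("status") == 1:
                    # Submit the token using the SAME session (preserves cookies)
                    return session.post(url, data={
                        "cf-turnstile-response": result["request"]
                    })
                time.sleep(5)
            # Fail loudly rather than silently returning the challenge page
            raise TimeoutError("Turnstile solve timed out")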

    # Fallback: unsupported type or no sitekey found; return the original (still challenged) response
    return response
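
The string-index lookup in handle_captcha only matches data-sitekey="..." written with double quotes and no spaces. A slightly more tolerant sketch using a regular expression is shown below; extract_sitekey is a hypothetical helper, not part of any library, and it still will not find keys that are injected by JavaScript rather than present in the HTML.

```python
import re
from typing import Optional

# Matches data-sitekey="KEY" or data-sitekey='KEY', allowing spaces around "=".
_SITEKEY_RE = re.compile(
    r"""data-sitekey\s*=\s*(["'])(?P<key>[^"']+)\1""",
    re.IGNORECASE,
)


def extract_sitekey(html: str) -> Optional[str]:
    """Return the first data-sitekey value in html, or None if absent."""
    match = _SITEKEY_RE.search(html)
    return match.group("key") if match else None
```

Returning None when nothing matches lets the caller fall back explicitly instead of catching a ValueError from str.index.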

Preventing mid-session CAPTCHAs

| Strategy | How |
| --- | --- |
| Slow down requests | Add 2–5 second delays between pages |
| Randomize timing | Use random.uniform(2, 5) for natural pauses |
| Rotate User-Agent | Change User-Agent periodically |
| Preserve cookies | Use session persistence across all requests |
| Use residential proxies | Lower CAPTCHA trigger rate |
| Mimic human patterns | Vary request order, skip some pages |

import random
import time

def controlled_scrape(urls):
    for url in urls:
        resp = safe_get(url)
        # Process response...
        delay = random.uniform(2, 5)
        time.sleep(delay)
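
The "vary request order, skip some pages" strategy can be sketched as a small planning step that runs before the scrape. This is an illustration only: plan_visits and its skip_prob parameter are hypothetical names, the 10% default is an arbitrary assumption, and skipping only makes sense when the job does not need every page.

```python
import random
from typing import List, Optional, Sequence


def plan_visits(
    urls: Sequence[str],
    skip_prob: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return urls in shuffled order, dropping each with probability skip_prob."""
    if rng is None:
        rng = random.Random()
    # Keep a URL when the random draw in [0, 1) is at or above skip_prob.
    kept = [u for u in urls if rng.random() >= skip_prob]
    rng.shuffle(kept)
    return kept
```

The result can be passed straight to the loop above, for example controlled_scrape(plan_visits(urls)). Passing a seeded random.Random makes a run reproducible while debugging.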

Maintaining session after solving

The key to mid-session CAPTCHA handling is session persistence. Never create a new session after solving.

# WRONG — new session loses auth cookies
new_session = requests.Session()
new_session.post(url, data={"g-recaptcha-response": token})

# CORRECT — same session preserves auth cookies
session.post(url, data={"g-recaptcha-response": token})
# Continue scraping with the same session
next_page = session.get("https://example.com/page/2")

FAQ

Why does a CAPTCHA appear after I am already logged in?

Sites use CAPTCHAs to gate suspicious actions, not just initial access. Fast navigation, bulk downloads, or repeated form submissions can each trigger mid-session verification.

Does solving the mid-session CAPTCHA log me out?

Not if you use the same session object. The authentication cookies remain intact. Only create a new session if you need to re-authenticate.

How do I handle CAPTCHAs that appear on AJAX/API calls?

Inspect the API response for CAPTCHA indicators (HTML fragments, specific JSON error codes, or 403 status). Solve the CAPTCHA and replay the failed API call.
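
A rough sketch of that inspection step is shown below. The function name api_captcha_indicator, the "error" and "captcha" JSON fields, and the codes in CAPTCHA_ERROR_CODES are all assumptions made for illustration; real APIs name these differently, so adjust them to what the target site actually returns.

```python
import json
from typing import Optional

# Hypothetical markers: real APIs name these fields and codes differently.
CAPTCHA_ERROR_CODES = {"captcha_required", "challenge_required"}


def api_captcha_indicator(status_code: int, content_type: str, body: str) -> Optional[str]:
    """Return 'json_error', 'html_fragment', 'status_403', or None."""
    text = body.lower()
    if "json" in content_type.lower():
        try:
            payload = json.loads(body)
        except ValueError:
            payload = None  # body claimed to be JSON but did not parse
        if isinstance(payload, dict):
            code = str(payload.get("error", "")).lower()
            if code in CAPTCHA_ERROR_CODES or "captcha" in payload:
                return "json_error"
    # An HTML challenge served where API data was expected.
    if "g-recaptcha" in text or "cf-turnstile" in text:
        return "html_fragment"
    if status_code == 403:
        return "status_403"
    return None
```

With requests, the call would be api_captcha_indicator(resp.status_code, resp.headers.get("Content-Type", ""), resp.text). A bare 403 is the weakest signal of the three, since it can also mean an ordinary permission error.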

Should I re-solve the login CAPTCHA or the new one?

Solve the new CAPTCHA. The login CAPTCHA was already passed. The mid-session CAPTCHA is a separate challenge.


Handle mid-session CAPTCHAs with CaptchaAI

Keep your sessions running at captchaai.com.

