Amazon uses image CAPTCHAs to block automated access. When you hit its anti-bot threshold, you'll see a page asking you to type the characters shown in a distorted image. CaptchaAI's image (OCR) solver reads that image and returns the text, so your scraper can submit the answer and continue.
How Amazon's CAPTCHA Works
Amazon triggers CAPTCHAs based on:
| Signal | Description |
|---|---|
| Request volume | Too many requests from one IP in a short window |
| Missing cookies | No Amazon session cookies |
| Suspicious headers | Bot-like User-Agent or missing headers |
| IP reputation | Known datacenter or proxy IP ranges |
When triggered, Amazon redirects to a page with a distorted text image and an input field. You must solve the image and submit the text to continue.
Requirements
| Requirement | Details |
|---|---|
| CaptchaAI API key | From captchaai.com |
| Python 3.7+ | With requests and beautifulsoup4 |
| Residential proxies | Recommended for sustained scraping |
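If you route traffic through residential proxies, requests accepts a proxies mapping from URL scheme to proxy URL, either on a Session or per request. The sketch below assumes a provider that gives you a host, port, username, and password. The hostnames and credentials are placeholders, and round-robin rotation is just one simple option.

```python
from itertools import cycle
from urllib.parse import quote


def proxy_url(host, port, username=None, password=None, scheme="http"):
    """Build a proxy URL, percent-encoding credentials so characters like @ or : are safe."""
    auth = ""
    if username is not None:
        auth = f"{quote(username, safe='')}:{quote(password or '', safe='')}@"
    return f"{scheme}://{auth}{host}:{port}"


def proxies_for(url):
    """Route both http and https traffic through one proxy (the dict shape requests expects)."""
    return {"http": url, "https": url}


# Placeholder endpoints and credentials: substitute your provider's gateway details.
PROXY_POOL = cycle([
    proxy_url("proxy1.example.net", 8000, "user", "p@ss"),
    proxy_url("proxy2.example.net", 8000, "user", "p@ss"),
])


def next_proxies():
    """Return the proxies mapping for the next proxy in round-robin order."""
    return proxies_for(next(PROXY_POOL))
```

Attach one with session.proxies.update(next_proxies()), or pass proxies=next_proxies() to a single session.get call.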
Solving Amazon's Image CAPTCHA
Step 1: Detect the CAPTCHA Page
```python
import requests
from bs4 import BeautifulSoup

session = requests.Session()
session.headers.update({
    # Prefer a complete, current browser User-Agent string in practice
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
})


def is_captcha_page(html):
    """Heuristic check for Amazon's text CAPTCHA page."""
    # The prompt text is the specific signal; the bare "captcha" substring
    # is a broad fallback and can also match ordinary pages.
    return ("Type the characters you see in this image" in html
            or "captcha" in html.lower())


url = "https://www.amazon.com/dp/B0EXAMPLE"
resp = session.get(url, timeout=30)

if is_captcha_page(resp.text):
    print("CAPTCHA detected!")
else:
    print("Page loaded successfully")
```
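The bare "captcha" substring check is cheap but can match ordinary pages, for example a script that mentions a CAPTCHA. A stricter variant accepts the prompt text, or a CAPTCHA image together with a form whose action contains validateCaptcha. That form path is an assumption about Amazon's markup, so confirm it against the CAPTCHA pages you actually receive. This sketch uses only the standard library's html.parser.

```python
from html.parser import HTMLParser

CAPTCHA_PROMPT = "Type the characters you see in this image"


class _CaptchaFormFinder(HTMLParser):
    """Collects form actions and notes whether any <img> src mentions 'captcha'."""

    def __init__(self):
        super().__init__()
        self.form_actions = []
        self.has_captcha_img = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.form_actions.append(attrs.get("action") or "")
        elif tag == "img" and "captcha" in (attrs.get("src") or "").lower():
            self.has_captcha_img = True


def is_captcha_page_strict(html):
    """Stricter check than a bare 'captcha' substring match.

    Assumption: Amazon's CAPTCHA form submits to a path containing
    'validateCaptcha'. Verify this against real responses.
    """
    if CAPTCHA_PROMPT in html:
        return True
    finder = _CaptchaFormFinder()
    finder.feed(html)
    has_captcha_form = any("validatecaptcha" in a.lower() for a in finder.form_actions)
    return has_captcha_form and finder.has_captcha_img
```

Requiring both the form and the image trades a little recall for far fewer false positives on normal product pages.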
Step 2: Extract and Solve the Image
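```python
import base64
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

API_KEY = "YOUR_API_KEY"


def solve_amazon_captcha(session, captcha_page_html, captcha_page_url):
    soup = BeautifulSoup(captcha_page_html, "html.parser")

    # Find the CAPTCHA image (its src contains "captcha")
    img_tag = soup.find("img", src=lambda s: s and "captcha" in s.lower())
    if not img_tag:
        raise RuntimeError("CAPTCHA image not found")
    # Resolve a relative image URL against the CAPTCHA page URL
    img_url = urljoin(captcha_page_url, img_tag["src"])

    # Download the image with the same session (keeps cookies) and encode it
    img_resp = session.get(img_url, timeout=30)
    img_resp.raise_for_status()
    img_base64 = base64.b64encode(img_resp.content).decode()

    # Submit to CaptchaAI as POST form data (base64 images are too long for a
    # query string); success looks like "OK|<task_id>"
    submit_resp = requests.post("https://ocr.captchaai.com/in.php", data={
        "key": API_KEY,
        "method": "base64",
        "body": img_base64,
    }, timeout=30)
    if not submit_resp.text.startswith("OK|"):
        raise RuntimeError(f"Submit error: {submit_resp.text}")
    task_id = submit_resp.text.split("|", 1)[1]

    # Poll for the result: up to 30 attempts, 5 seconds apart (about 2.5 minutes)
    for _ in range(30):
        time.sleep(5)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id,
        }, timeout=30)
        if result.text == "CAPCHA_NOT_READY":
            continue
        if result.text.startswith("OK|"):
            return result.text.split("|", 1)[1]
        raise RuntimeError(f"Solve error: {result.text}")

    raise TimeoutError("Solve timed out")
```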
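Both CaptchaAI calls above reply in plain text: OK|<value> on success, CAPCHA_NOT_READY while the image is still being solved, and anything else is treated as an error. Pulling that into two small helpers keeps the polling loop testable without network calls. This is a minimal sketch; fetch_reply is a hypothetical callable you would wire to the res.php request.

```python
import time


def parse_captchaai_reply(text):
    """Classify a plain-text reply from in.php or res.php.

    Returns ("ok", value) or ("pending", None); raises RuntimeError otherwise.
    """
    text = text.strip()
    if text == "CAPCHA_NOT_READY":
        return "pending", None
    if text.startswith("OK|"):
        return "ok", text.split("|", 1)[1]
    raise RuntimeError(f"CaptchaAI error: {text}")


def poll_until_solved(fetch_reply, attempts=30, interval=5.0, sleep=time.sleep):
    """Call fetch_reply() until it yields OK|<text>, waiting `interval` seconds before each try."""
    for _ in range(attempts):
        sleep(interval)
        status, value = parse_captchaai_reply(fetch_reply())
        if status == "ok":
            return value
    raise TimeoutError(f"Not solved after {attempts} attempts")
```

For example, poll_until_solved(lambda: requests.get("https://ocr.captchaai.com/res.php", params={"key": API_KEY, "action": "get", "id": task_id}, timeout=30).text) replaces the inline loop.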
Step 3: Submit the Solution
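```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def submit_captcha_solution(session, captcha_page_html, solution, captcha_page_url):
    soup = BeautifulSoup(captcha_page_html, "html.parser")
    form = soup.find("form")
    if form is None:
        raise RuntimeError("CAPTCHA form not found")

    # Copy every named input (hidden fields included) with its current value
    form_data = {}
    for inp in form.find_all("input"):
        name = inp.get("name")
        if name:
            form_data[name] = inp.get("value", "")

    # Set the CAPTCHA answer
    form_data["field-keywords"] = solution

    # Resolve the form action (relative or absolute) against the page URL
    action = urljoin(captcha_page_url, form.get("action") or captcha_page_url)

    # Submit with the form's declared method; HTML forms default to GET
    if form.get("method", "get").lower() == "post":
        return session.post(action, data=form_data, timeout=30)
    return session.get(action, params=form_data, timeout=30)
```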
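The two details that most often break submission are a relative form action and the form's method. Here is a sketch of building the request by hand with only the standard library; build_captcha_submission is a hypothetical helper name. It follows the HTML rules that a form without a method attribute submits with GET, and that a GET submission replaces the action URL's query string with the form data.

```python
from urllib.parse import urlencode, urljoin, urlsplit, urlunsplit


def build_captcha_submission(page_url, form_action, form_method, fields, solution,
                             answer_field="field-keywords"):
    """Return (method, url, body) for submitting a CAPTCHA form.

    fields: the form's named inputs (hidden ones included) as a dict.
    body is an application/x-www-form-urlencoded string for POST, None for GET.
    """
    data = dict(fields)
    data[answer_field] = solution
    # A missing or empty action submits back to the page itself
    url = urljoin(page_url, form_action or page_url)
    # A form with no method attribute submits with GET
    method = (form_method or "get").upper()
    encoded = urlencode(data)
    if method == "POST":
        return method, url, encoded
    # GET: the form data replaces any query string already on the action URL
    parts = urlsplit(url)
    return method, urlunsplit(parts._replace(query=encoded)), None
```

Because it returns plain values, the helper can be checked without network access; submit_captcha_solution above performs the same resolution and lets requests do the encoding.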
Full Working Example
```python
import base64
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

API_KEY = "YOUR_API_KEY"


def solve_image(img_bytes):
    """Send an image to CaptchaAI and poll until the text is ready."""
    img_b64 = base64.b64encode(img_bytes).decode()
    submit = requests.post("https://ocr.captchaai.com/in.php", data={
        "key": API_KEY, "method": "base64", "body": img_b64,
    }, timeout=30)
    if not submit.text.startswith("OK|"):
        raise RuntimeError(f"Submit error: {submit.text}")
    task_id = submit.text.split("|", 1)[1]

    for _ in range(30):
        time.sleep(5)
        result = requests.get("https://ocr.captchaai.com/res.php", params={
            "key": API_KEY, "action": "get", "id": task_id,
        }, timeout=30)
        if result.text == "CAPCHA_NOT_READY":
            continue
        if result.text.startswith("OK|"):
            return result.text.split("|", 1)[1]
        raise RuntimeError(f"Solve error: {result.text}")
    raise TimeoutError("Solve timed out")


def scrape_amazon_product(url):
    session = requests.Session()
    session.headers.update({
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept-Language": "en-US,en;q=0.9",
    })

    resp = session.get(url, timeout=30)

    # Handle CAPTCHA if present
    if "captcha" in resp.text.lower():
        soup = BeautifulSoup(resp.text, "html.parser")
        img = soup.find("img", src=lambda s: s and "captcha" in s.lower())
        form = soup.find("form")
        if img is not None and form is not None:
            # Download the image with the same session (keeps cookies) and solve it
            img_data = session.get(urljoin(resp.url, img["src"]), timeout=30).content
            solution = solve_image(img_data)

            # Resend every named form field, with the answer in "field-keywords"
            form_data = {inp.get("name"): inp.get("value", "")
                         for inp in form.find_all("input") if inp.get("name")}
            form_data["field-keywords"] = solution

            # Resolve a relative action and honor the form's method (HTML default: GET)
            action = urljoin(resp.url, form.get("action") or resp.url)
            if form.get("method", "get").lower() == "post":
                resp = session.post(action, data=form_data, timeout=30)
            else:
                resp = session.get(action, params=form_data, timeout=30)

    # Parse product data
    soup = BeautifulSoup(resp.text, "html.parser")
    title = soup.find("span", {"id": "productTitle"})
    price = soup.find("span", class_="a-price-whole")
    return {
        "title": title.text.strip() if title else None,
        "price": price.text.strip() if price else None,
    }


product = scrape_amazon_product("https://www.amazon.com/dp/B0EXAMPLE")
print(product)
```
Best Practices for Amazon Scraping
- Use residential proxies — Amazon blocks datacenter IPs aggressively
- Rotate User-Agents — Use a pool of realistic browser strings
- Maintain sessions — Keep cookies across requests
- Add delays — 3-10 seconds between requests
- Set Accept-Language — Always include locale headers
- Don't scrape logged-in pages — Product pages are accessible without login
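Delays and User-Agent rotation can be combined in a small wrapper around whatever function performs the request. This is a sketch with hypothetical names (PoliteFetcher, USER_AGENTS). It picks one User-Agent per fetcher rather than per request, on the assumption that a User-Agent changing mid-session while the cookies stay the same looks less like a real browser; create a new fetcher and Session to rotate.

```python
import random
import time

# Hypothetical pool: replace with complete, current browser User-Agent strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
]


class PoliteFetcher:
    """Pauses a random 3-10 s between requests and keeps one User-Agent per session.

    fetch: any callable(url, headers) -> response, e.g.
    lambda url, headers: session.get(url, headers=headers, timeout=30)
    """

    def __init__(self, fetch, min_delay=3.0, max_delay=10.0, sleep=time.sleep, rng=None):
        self.fetch = fetch
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.sleep = sleep
        self.rng = rng or random.Random()
        # Chosen once, so the User-Agent stays consistent with the session's cookies
        self.headers = {
            "User-Agent": self.rng.choice(USER_AGENTS),
            "Accept-Language": "en-US,en;q=0.9",
        }
        self._first = True

    def get(self, url):
        # No pause before the very first request
        if not self._first:
            self.sleep(self.rng.uniform(self.min_delay, self.max_delay))
        self._first = False
        return self.fetch(url, self.headers)
```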
Troubleshooting
| Issue | Fix |
|---|---|
| CAPTCHA on every request | Use residential proxies; slow down request rate |
| CAPTCHA solution rejected | Verify image was downloaded correctly; retry |
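| Redirect loops | Reuse one Session so cookies persist between requests; requests already follows redirects by default (allow_redirects=True) and raises TooManyRedirects after 30 redirects |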
| Empty product data | Amazon may serve different layouts; check selectors |
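For rejected solutions, retrying means solving whatever CAPTCHA page comes back next, with a cap so a bad image or proxy cannot loop forever. This is a minimal sketch; fetch_page, is_captcha, and solve_and_submit are hypothetical callables you would build from the steps above (for example, solve_amazon_captcha followed by submit_captcha_solution, returning the response HTML).

```python
def fetch_with_captcha_retry(fetch_page, is_captcha, solve_and_submit, max_attempts=3):
    """Fetch a page, solving CAPTCHAs until a non-CAPTCHA page comes back.

    fetch_page(): returns the page HTML.
    is_captcha(html): True if html is a CAPTCHA page.
    solve_and_submit(html): solves the CAPTCHA in html, returns the next page's HTML.
    """
    html = fetch_page()
    attempts = 0
    while is_captcha(html):
        if attempts >= max_attempts:
            raise RuntimeError(f"Still on a CAPTCHA page after {max_attempts} attempts")
        attempts += 1
        html = solve_and_submit(html)
    return html
```

If the cap is hit repeatedly, that usually points back to the first row of the table: slow down or change proxies rather than raising max_attempts.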
FAQ
Does Amazon use reCAPTCHA?
Amazon primarily uses its own image-based CAPTCHA (distorted text) rather than reCAPTCHA. CaptchaAI solves these through its image/OCR endpoint with method=base64.
How many requests before Amazon shows a CAPTCHA?
It varies. With good proxies and realistic headers, you may scrape hundreds of pages. Without proxies, CAPTCHAs can appear after 10-20 requests.
Is scraping Amazon legal?
Scraping publicly available product data is generally legal, but check Amazon's terms of service and applicable laws in your jurisdiction.
Related Guides
- Search Results Data Collection with CAPTCHA Handling
- Proxy Rotation for CAPTCHA Scraping
- Image CAPTCHA Solving Using API