A powerful, feature-rich Python library for bypassing Cloudflare's anti-bot protection, with 10+ production-ready bypass strategies, advanced stealth capabilities, async support, and comprehensive monitoring. This Hybrid Edition introduces the Hybrid Engine, which integrates TLS-Chameleon and Py-Parkour for maximum bypass capability and is now powered by Google Gemini AI.
The scraper now deeply integrates Google Gemini 1.5 Flash to solve complex visual challenges like reCAPTCHA v2:
- Visual Understanding: Analyzes instruction images (e.g., "Select all traffic lights") and identifies target objects.
- Intelligent Solving: Visually inspects every tile, matches objects, and solves the puzzle just like a human.
- Fast & Cheap: Uses Gemini 1.5 Flash for low-latency, low-cost solving.
| Feature | Status |
|---|---|
| reCAPTCHA v2 Solving | ✅ Tested |
| Text Captcha (Generic) | ✅ Tested |
| Hybrid Engine | ✅ Tested |
| Cloudflare Bypass | ✅ Tested |
```python
import cloudscraper

# Pass your Google API key to enable AI solving
scraper = cloudscraper.create_scraper(
    interpreter='hybrid',
    google_api_key='YOUR_GEMINI_API_KEY',
    # Proxies are automatically used for AI requests too!
    rotating_proxies=['http://user:pass@proxy:port']
)
```
```python
# For complicated, non-standard text captchas
scraper = cloudscraper.create_scraper(
    interpreter='hybrid',
    google_api_key='YOUR_GEMINI_API_KEY',
    captcha={
        'text_captcha': {
            'selector': '#captcha-image',        # CSS selector for the image
            'input_selector': '#captcha-input',  # CSS selector for the input
            'submit_selector': '#submit-btn'     # Optional: submit button
        }
    }
)
```

The Hybrid Engine is a game-changer that combines two powerful technologies:
- TLS-Chameleon (`curl_cffi`): Provides perfect TLS fingerprinting (JA3/JA4) to mimic real browsers at the network layer.
- Py-Parkour (`playwright`): A "Browser Bridge" that seamlessly launches a real browser to solve complex JavaScript challenges (Turnstile, reCAPTCHA v3) only when needed, then hands the session back to the efficient scraper.
Why use Hybrid?
- Speed: Uses lightweight HTTP requests for 99% of the work.
- Power: Falls back to a real browser only for seconds to solve a challenge.
- Stealth: Perfect TLS fingerprints + real browser interactions.
- Simplicity: No complex setup; just `interpreter='hybrid'` (see the minimal sketch below).
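As a minimal sketch (the target URL is a placeholder), enabling the engine is a single argument:

```python
import cloudscraper

# Hybrid Engine: lightweight HTTP requests with real-browser fallback
scraper = cloudscraper.create_scraper(interpreter='hybrid')

# Requests look and behave like a normal requests session
response = scraper.get("https://protected-site.com")  # placeholder URL
print(response.status_code)
```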
- 🛡️ Hybrid Engine: Automatically switches between lightweight requests and real-browser solving
- 🤖 AI Captcha Solver: Solves reCAPTCHA v2 using Google Gemini Vision
- 🔐 TLS Fingerprinting: JA3 fingerprint rotation with real browser signatures (Chrome, Firefox, Safari) via `tls-chameleon`
- 🕵️ Traffic Pattern Obfuscation: Intelligent request spacing and behavioral consistency
- 🛡️ Advanced Automation Bypass: Cutting-edge techniques to mask Playwright/Chromium indicators (`navigator.webdriver`, chrome runtime, etc.)
- 🖱️ Human-like Behavioral Patterns: Integrated mouse movements, scrolling, and interaction simulation for browser-based challenges
- 🧠 Intelligent Challenge Detection: AI-powered challenge recognition
- ⚡ Async Support: Check `async_cloudscraper` for non-blocking operations
The Hybrid Engine and AI capabilities are built upon these cutting-edge libraries:
- ai-urllib4: The next-generation HTTP client for Python, featuring HTTP/2 support, advanced compression (Brotli/Zstd), and AI-optimized connection handling.
- TLS-Chameleon: Advanced TLS fingerprinting library that perfectly mimics real browser TLS handshakes (JA3/JA4) to evade detection.
- Py-Parkour: The "Browser Bridge" that seamlessly orchestrates real browser interactions (via Playwright) for solving complex challenges and enhancing stealth.
For the most challenging sites that use Cloudflare Turnstile Managed Challenges or Interactive Browser Verification, use the dedicated `create_high_security_scraper` factory:
```python
import cloudscraper

# Method 1: With a Captcha Solving Service (Recommended)
scraper = cloudscraper.create_high_security_scraper(
    captcha_provider='2captcha',  # or 'anticaptcha'
    captcha_api_key='YOUR_2CAPTCHA_API_KEY',
    debug=True
)

response = scraper.get("https://example-protected-site.com")
print(response.status_code)
scraper.close()

# Method 2: With Residential Proxy (Improved Success Rate)
scraper = cloudscraper.create_high_security_scraper(
    captcha_api_key='YOUR_API_KEY',
    proxy='http://user:pass@host:port',
    debug=True
)

# Method 3: Full Power (Captcha Solver + AI + Proxy)
scraper = cloudscraper.create_high_security_scraper(
    captcha_api_key='YOUR_2CAPTCHA_KEY',
    google_api_key='YOUR_GEMINI_KEY',  # AI fallback for visual CAPTCHAs
    proxy='http://user:pass@host:port',
    debug=True
)
```

| Feature | Setting |
|---|---|
| Interpreter | `hybrid` (Playwright + TLS-Chameleon) |
| Turnstile Handling | ✅ Enabled |
| Intelligent Challenges | ✅ Enabled |
| External Captcha Solver | Configured (2captcha, anticaptcha, etc.) |
| Stealth Mode | Maximum (human-like delays, randomized headers) |
| Advanced Stealth | ✅ Enabled (`navigator.webdriver` masking) |
| Behavioral Simulation | ✅ Enabled (mouse/scroll interaction) |
| Solve Depth | 5 (allows more retries) |
Note: External captcha solvers like 2captcha charge per solve (~$2-3 per 1000 Turnstile solves). Residential proxies are often necessary for geofenced or IP-blacklisted sites.
For sites with aggressive browser-engine profiling, standard methods will fail. Use the Trust Builder "Boss Mode" combo:
- Clean IP: Use a high-quality residential proxy or VPN.
- Identity Masking: Use the `disguise=True` parameter to swap browser hardware DNA.
- AI Vision: Enabled by default in `warm_get`; it "sees" the challenge visually.
```python
from cloudscraper.trust_builder import warm_get

# The boss-level bypass
response = warm_get(
    "https://high-security-site.com/protected/",
    disguise=True,  # Swaps hardware signatures
    depth=5,        # High warmth (5 pages visited first)
    debug=True      # See the AI at work
)

if response.status_code == 200:
    print("Boss defeated!")
    print(f"Extracted {len(response.cookies)} clearance cookies.")
```

| Parameter | Default | Description |
|---|---|---|
| `url` | Required | The target "Boss" website URL. |
| `proxy` | `None` | New: SOCKS/HTTP proxy URL (e.g., `http://user:pass@host:port`). |
| `disguise` | `False` | CRITICAL for boss sites. Generates a unique hardware/software identity. |
| `depth` | `3` | Number of "organic" pages to visit before the target to build trust. |
| `headless` | `True` | Set to `False` to watch the AI Vision solve the challenge in real time. |
| `debug` | `False` | Detailed logging of Ghost Cursor and AI Vision actions. |
Pro Tip: If a site is still blocking you with a 403, your IP is likely flagged. Change your VPN server and try again with `disguise=True`.
This version includes 10 production-ready bypass strategies:
The most powerful mode available. Requires `cloudscraper[hybrid]`.
```python
# Install with: pip install cloudscraper[hybrid]
scraper = cloudscraper.create_scraper(
    interpreter='hybrid',
    impersonate='chrome120',       # Optional: force a specific fingerprint
    google_api_key='YOUR_API_KEY'  # Optional: for AI captcha solving
)
scraper.get("https://high-security-site.com")
```

- Auto-saves `cf_clearance` cookies after successful bypasses
- Reuses cookies for 30-60 minutes (configurable TTL)
- 70-90% reduction in repeat challenge encounters
- Storage: `~/.cloudscraper/cookies/`
```python
# Enabled by default!
scraper = cloudscraper.create_scraper(
    enable_cookie_persistence=True,
    cookie_ttl=1800  # 30 minutes
)
```

- Tries AI OCR → AI Object Detection → 2Captcha in sequence
- Automatic fallback on failure
- 3-5x higher solve rate vs single solver
```python
scraper = cloudscraper.create_scraper(
    captcha={
        'provider': 'hybrid',
        'primary': 'ai_ocr',
        'fallbacks': ['ai_obj_det', '2captcha'],
        '2captcha': {'api_key': 'YOUR_KEY'}
    }
)
```

If you need the raw speed of version 3.1.0 without the overhead of 3.6.0's advanced stealth features:
```python
# Disables adaptive timing, metrics, and background monitors for maximum speed
scraper = cloudscraper.create_scraper(compatibility_mode=True)
```

- Uses Playwright to launch a real browser when all else fails
- Ultimate fallback with 99% success rate
```python
from cloudscraper.browser_helper import create_browser_helper

browser = create_browser_helper(headless=False)
cookies = browser.solve_challenge_and_get_cookies(url)
scraper.cookies.update(cookies)
```

- Content-aware delays (text vs. images vs. API)
- Mouse movement simulation
- Fingerprint resistance
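These behaviors are driven by the stealth options documented in the configuration reference below; a minimal sketch of enabling them explicitly (the values shown are illustrative):

```python
import cloudscraper

# Stealth mode with explicit human-like timing (illustrative values)
scraper = cloudscraper.create_scraper(
    enable_stealth=True,
    stealth_options={
        'min_delay': 1.0,           # minimum delay between requests (seconds)
        'max_delay': 4.0,           # maximum delay between requests (seconds)
        'human_like_delays': True,  # human-like delay patterns
    }
)
```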
- Prevents infinite retry loops
- Opens after 3 consecutive failures (configurable)
- Auto-retry after timeout
```python
# Enabled by default!
scraper = cloudscraper.create_scraper(
    enable_circuit_breaker=True,
    circuit_failure_threshold=3,
    circuit_timeout=60
)
```

- Maintains pool of 3-10 scraper instances
- Each with unique browser fingerprint
- Round-robin / random / least-used rotation
```python
from cloudscraper.session_pool import SessionPool

pool = SessionPool(pool_size=5, rotation_strategy='round_robin')
resp = pool.get('https://protected-site.com')
```

- Adaptive per-domain delays
- Learns from 429/503 responses
- Burst prevention
```python
from cloudscraper.rate_limiter import SmartRateLimiter

limiter = SmartRateLimiter(default_delay=1.0, burst_limit=10)
limiter.wait_if_needed(domain)
```

- 6+ real browser JA3 signatures (Chrome, Firefox, Safari, Edge)
- Auto-rotation every N requests
```python
from cloudscraper.tls_rotator import TLSFingerprintRotator

rotator = TLSFingerprintRotator(rotation_interval=10)
fp = rotator.get_fingerprint()  # chrome_120, firefox_122, etc.
```

- Learns which domains use which challenges
- Auto-configuration based on history
- SQLite storage: `~/.cloudscraper/challenges.db`
```python
from cloudscraper.challenge_predictor import ChallengePredictor

predictor = ChallengePredictor()
predicted = predictor.predict_challenge('example.com')
config = predictor.get_recommended_config('example.com')
scraper = cloudscraper.create_scraper(**config)
```

- Content-type aware delays
- Adaptive reading time calculation
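A short sketch of enabling adaptive timing explicitly; both parameters come from the configuration reference below, and `'research'` is the slowest documented profile:

```python
import cloudscraper

# Adaptive, content-aware timing with a documented behavior profile
scraper = cloudscraper.create_scraper(
    enable_adaptive_timing=True,  # content-type aware, human-like delays
    behavior_profile='research'   # slow, methodical access pattern
)
```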
- Solves Cloudflare's specific "JavaScript Detection" (JSD) challenge.
- Uses a custom LZString implementation to handle dynamic alphabets.
- Essential for sites like Crypto.com and others using this specific protection.
```python
from cloudscraper.jsd_solver import JSDSolver

# If you encounter a JSD challenge (raw script content):
solver = JSDSolver(user_agent="Mozilla/5.0...")
solution = solver.solve(script_content)

# Returns the 'wp' (window properties) payload and 's' (secret) key:
# {
#     "wp": "compressed_payload...",
#     "s": "secret_key"
# }
```

| Configuration | Success Rate | Speed | Use Case |
|---|---|---|---|
| Default (V1 + Cookies + Circuit Breaker) | 70-80% | Fast | Most sites |
| + Hybrid Solver | 85-95% | Medium | Sites with captchas |
| + Session Pool | 90-95% | Medium | Pattern detection |
| + Browser Fallback | 99%+ | Slow | Hardest sites |
See ENHANCED_FEATURES.md for detailed documentation on all bypass strategies.
If you find this library useful, consider supporting its development:
> [!NOTE]
> This is a maintained fork of the original cloudscraper library.
> You can use this version (ai-cloudscraper) as a drop-in replacement while waiting for updates to the original library, or keep it as your primary driver; we will update it regularly with the latest anti-detection technologies.
> [!IMPORTANT]
> Import Note: Even though you install the package as `ai-cloudscraper`, you still import it as `cloudscraper` in your Python code.
> This package is designed as a drop-in replacement.
```python
# Correct usage
import cloudscraper
```

```bash
# Install maintained version (Recommended)
pip install ai-cloudscraper

# Install with AI solvers (Phase 1)
pip install ai-cloudscraper[ai]

# Install with browser automation (Phase 1)
pip install ai-cloudscraper[browser]

# Or install from source (Development)
pip install -e .
```

```python
import cloudscraper

# Create a CloudScraper instance (cookie persistence + circuit breaker enabled by default)
scraper = cloudscraper.create_scraper()

# Use it like a regular requests session
response = scraper.get("https://protected-site.com")
print(response.text)
```

```python
import cloudscraper
from cloudscraper.session_pool import SessionPool
from cloudscraper.challenge_predictor import ChallengePredictor

# Option 1: Default (recommended for most sites)
scraper = cloudscraper.create_scraper()
resp = scraper.get('https://protected-site.com')

# Option 2: With hybrid solver
scraper = cloudscraper.create_scraper(
    captcha={
        'provider': 'hybrid',
        'fallbacks': ['ai_ocr', '2captcha'],
        '2captcha': {'api_key': 'YOUR_KEY'}
    }
)

# Option 3: Session pool for maximum stealth
pool = SessionPool(pool_size=5, rotation_strategy='round_robin')
resp = pool.get('https://protected-site.com')

# Option 4: Challenge predictor for smart configuration
predictor = ChallengePredictor()
config = predictor.get_recommended_config('target-domain.com')
scraper = cloudscraper.create_scraper(**config)
```

Cloudflare's anti-bot protection works by presenting JavaScript challenges that must be solved before accessing the protected content. cloudscraper:
- Detects Cloudflare challenges automatically
- Solves JavaScript challenges using embedded interpreters
- Maintains session state and cookies
- Returns the protected content seamlessly
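For illustration, a minimal sketch showing that clearance cookies obtained on the first request are reused by the same session (URLs are placeholders):

```python
import cloudscraper

scraper = cloudscraper.create_scraper()

# The first request detects and solves the challenge, storing clearance cookies
first = scraper.get("https://protected-site.com")         # placeholder URL

# Later requests on the same session reuse those cookies without re-solving
second = scraper.get("https://protected-site.com/page")   # placeholder URL
print(first.status_code, second.status_code)
```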
- Python 3.8+
- requests >= 2.32.0
- requests_toolbelt >= 1.0.0
- js2py >= 0.74 (default JavaScript interpreter)
- Additional dependencies listed in requirements.txt
Phase 1 AI Solvers:

```bash
pip install ddddocr ultralytics pillow
```

Phase 1 Browser Automation:

```bash
pip install playwright
playwright install chromium
```

Phase 2 features require NO additional dependencies - everything is included!
cloudscraper supports multiple JavaScript interpreters:
- js2py (default) - Pure Python implementation
- nodejs - Requires Node.js installation
- native - Built-in Python solver
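Selecting an interpreter is a single keyword argument to `create_scraper`; all three values below appear elsewhere in this README:

```python
import cloudscraper

# Pure-Python interpreter (default)
scraper = cloudscraper.create_scraper(interpreter='js2py')

# Node.js-backed interpreter (requires a Node.js installation)
scraper = cloudscraper.create_scraper(interpreter='nodejs')

# Built-in native Python solver (fastest)
scraper = cloudscraper.create_scraper(interpreter='native')
```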
```python
# Use Chrome fingerprint
scraper = cloudscraper.create_scraper(browser='chrome')

# Use Firefox fingerprint
scraper = cloudscraper.create_scraper(browser='firefox')
```

```python
# Single proxy
scraper = cloudscraper.create_scraper()
scraper.proxies = {
    'http': 'http://proxy:8080',
    'https': 'http://proxy:8080'
}
```

```python
scraper = cloudscraper.create_scraper(
    captcha={
        'provider': '2captcha',
        'api_key': 'your_api_key'
    }
)
```

Supported CAPTCHA providers:
- 2captcha
- anticaptcha
- CapSolver
- CapMonster Cloud
```python
scraper = cloudscraper.create_scraper(
    enable_tls_fingerprinting=True,
    enable_anti_detection=True,
    enable_enhanced_spoofing=True,
    spoofing_consistency_level='high',
    enable_adaptive_timing=True,
    behavior_profile='research',  # Slowest, most careful
    stealth_options={
        'min_delay': 3.0,
        'max_delay': 10.0,
        'human_like_delays': True
    }
)
```
```python
scraper.enable_maximum_stealth()
```
**Challenge detection not working?**
```python
# Add custom challenge patterns
scraper.intelligent_challenge_system.add_custom_pattern(
    domain='problem-site.com',
    pattern_name='Custom Challenge',
    patterns=[r'custom.+challenge.+text'],
    challenge_type='custom',
    response_strategy='delay_retry'
)
```
**Want to optimize for specific domains?**

```python
# Make several learning requests first
for i in range(5):
    try:
        response = scraper.get('https://target-site.com/test')
    except Exception:
        pass

# Then optimize for the domain
scraper.optimize_for_domain('target-site.com')
```

**Check enhanced system status:**
```python
stats = scraper.get_enhanced_statistics()
for system, status in stats.items():
    print(f"{system}: {status}")

# Get ML optimization report
if hasattr(scraper, 'ml_optimizer'):
    report = scraper.ml_optimizer.get_optimization_report()
    print(f"Success rate: {report.get('global_success_rate', 0):.2%}")
```

**Challenge solving fails:**
```python
# Try a different interpreter
scraper = cloudscraper.create_scraper(interpreter='nodejs')

# Increase the challenge delay
scraper = cloudscraper.create_scraper(delay=10)

# Enable debug mode
scraper = cloudscraper.create_scraper(debug=True)
```

**403 Forbidden errors:**
```python
# Enable stealth mode with automatic session refresh
scraper = cloudscraper.create_scraper(
    enable_stealth=True,
    auto_refresh_on_403=True
)
```

**Slow performance:**
```python
# Use the faster native interpreter
scraper = cloudscraper.create_scraper(interpreter='native')
```

Enable debug mode to see what's happening:
```python
scraper = cloudscraper.create_scraper(debug=True)
response = scraper.get("https://example.com")

# Debug output shows:
# - Challenge type detected
# - JavaScript interpreter used
# - Challenge solving process
# - Final response status
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `enable_tls_fingerprinting` | boolean | True | Enable advanced TLS fingerprinting |
| `enable_tls_rotation` | boolean | True | Rotate TLS fingerprints automatically |
| `enable_anti_detection` | boolean | True | Enable traffic pattern obfuscation |
| `enable_enhanced_spoofing` | boolean | True | Enable Canvas/WebGL spoofing |
| `spoofing_consistency_level` | string | 'medium' | Spoofing consistency ('low', 'medium', 'high') |
| `enable_intelligent_challenges` | boolean | True | Enable AI challenge detection |
| `enable_adaptive_timing` | boolean | True | Enable human behavior simulation |
| `behavior_profile` | string | 'casual' | Timing profile ('casual', 'focused', 'research', 'mobile') |
| `enable_ml_optimization` | boolean | True | Enable ML-based bypass optimization |
| `enable_enhanced_error_handling` | boolean | True | Enable intelligent error recovery |
| `advanced_stealth` | boolean | True | Enable deep automation bypass (Playwright only) |
| `behavioral_patterns` | boolean | True | Enable human interaction simulation (Playwright only) |
| `compatibility_mode` | boolean | False | Disable all 3.6.x overhead for 3.1.x performance |
```python
stealth_options = {
    'min_delay': 1.0,            # Minimum delay between requests
    'max_delay': 4.0,            # Maximum delay between requests
    'human_like_delays': True,   # Use human-like delay patterns
    'randomize_headers': True,   # Randomize request headers
    'browser_quirks': True,      # Enable browser-specific quirks
    'simulate_viewport': True,   # Simulate viewport changes
    'behavioral_patterns': True  # Use behavioral pattern simulation
}
```

```python
import cloudscraper

# Ultimate bypass configuration
scraper = cloudscraper.create_scraper(
    # Basic settings
    debug=True,
    browser='chrome',
    interpreter='js2py',

    # Enhanced bypass features
    enable_tls_fingerprinting=True,
    enable_tls_rotation=True,
    enable_anti_detection=True,
    enable_enhanced_spoofing=True,
    spoofing_consistency_level='medium',
    enable_intelligent_challenges=True,
    enable_adaptive_timing=True,
    behavior_profile='focused',
    enable_ml_optimization=True,
    enable_enhanced_error_handling=True,

    # Stealth mode
    enable_stealth=True,
    stealth_options={
        'min_delay': 1.5,
        'max_delay': 4.0,
        'human_like_delays': True,
        'randomize_headers': True,
        'browser_quirks': True,
        'simulate_viewport': True,
        'behavioral_patterns': True
    },

    # Session management
    session_refresh_interval=3600,
    auto_refresh_on_403=True,
    max_403_retries=3,

    # Proxy rotation
    rotating_proxies=[
        'http://proxy1:8080',
        'http://proxy2:8080',
        'http://proxy3:8080'
    ],
    proxy_options={
        'rotation_strategy': 'smart',
        'ban_time': 600
    },

    # CAPTCHA solving
    captcha={
        'provider': '2captcha',
        'api_key': 'your_api_key'
    }
)

# Monitor bypass performance
stats = scraper.get_enhanced_statistics()
print(f"Active bypass systems: {len(stats)}")
```

| Profile | Description | Use Case |
|---|---|---|
| `casual` | Relaxed browsing patterns | General web scraping |
| `focused` | Efficient but careful | Targeted data collection |
| `research` | Slow, methodical access | Academic or detailed research |
| `mobile` | Mobile device simulation | Mobile-optimized sites |
| Level | Fingerprint Stability | Detection Resistance | Performance |
|---|---|---|---|
| `low` | Minimal changes | Good | Fastest |
| `medium` | Moderate variations | Excellent | Balanced |
| `high` | Significant obfuscation | Maximum | Slower |
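A hedged sketch combining the two settings above; both parameters and values come from the configuration reference:

```python
import cloudscraper

# Slow, careful pacing with maximum fingerprint obfuscation
scraper = cloudscraper.create_scraper(
    behavior_profile='research',       # slow, methodical access
    spoofing_consistency_level='high'  # maximum detection resistance, slower
)
```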
| Parameter | Type | Default | Description |
|---|---|---|---|
| `debug` | boolean | False | Enable debug output |
| `delay` | float | auto | Override challenge delay |
| `interpreter` | string | 'js2py' | JavaScript interpreter |
| `browser` | string/dict | None | Browser fingerprint |
| `enable_stealth` | boolean | True | Enable stealth mode |
| `allow_brotli` | boolean | True | Enable Brotli compression |
| Parameter | Type | Default | Description |
|---|---|---|---|
| `disableCloudflareV1` | boolean | False | Disable v1 challenges |
| `disableCloudflareV2` | boolean | False | Disable v2 challenges |
| `disableCloudflareV3` | boolean | False | Disable v3 challenges |
| `disableTurnstile` | boolean | False | Disable Turnstile |
| Parameter | Type | Default | Description |
|---|---|---|---|
| `session_refresh_interval` | int | 3600 | Session refresh time (seconds) |
| `auto_refresh_on_403` | boolean | True | Auto-refresh on 403 errors |
| `max_403_retries` | int | 3 | Max 403 retry attempts |
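A short sketch combining the session-management parameters above (the values shown are the documented defaults):

```python
import cloudscraper

# Session management tuning (values are the documented defaults)
scraper = cloudscraper.create_scraper(
    session_refresh_interval=3600,  # refresh the session every hour
    auto_refresh_on_403=True,       # re-establish clearance on 403 responses
    max_403_retries=3               # give up after three 403 retries
)
```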
```python
scraper = cloudscraper.create_scraper(
    debug=True,
    delay=5,
    interpreter='js2py',
    browser='chrome',
    enable_stealth=True,
    stealth_options={
        'min_delay': 2.0,
        'max_delay': 5.0,
        'human_like_delays': True,
        'randomize_headers': True,
        'browser_quirks': True
    }
)
```

Extract Cloudflare cookies for use in other applications:
```python
import cloudscraper

# Get cookies as a dictionary
tokens, user_agent = cloudscraper.get_tokens("https://example.com")
print(tokens)
# {'cf_clearance': '...', '__cfduid': '...'}

# Get cookies as a string
cookie_string, user_agent = cloudscraper.get_cookie_string("https://example.com")
print(cookie_string)
# "cf_clearance=...; __cfduid=..."
```

Use cloudscraper tokens with curl or other HTTP clients:
```python
import subprocess
import cloudscraper

cookie_string, user_agent = cloudscraper.get_cookie_string('https://example.com')

result = subprocess.check_output([
    'curl',
    '--cookie', cookie_string,
    '-A', user_agent,
    'https://example.com'
])
```

MIT License. See LICENSE file for details.
For detailed documentation about the enhanced bypass capabilities, see:
- ENHANCED_FEATURES.md - Complete technical documentation
- examples/enhanced_bypass_demo.py - Comprehensive usage examples
- tests/test_enhanced_features.py - Feature validation tests
| Feature | Module | Description |
|---|---|---|
| TLS Fingerprinting | `tls_fingerprinting.py` | JA3 fingerprint rotation |
| Anti-Detection | `anti_detection.py` | Traffic pattern obfuscation |
| Enhanced Spoofing | `enhanced_spoofing.py` | Canvas/WebGL fingerprint spoofing |
| Challenge Detection | `intelligent_challenge_system.py` | AI-powered challenge recognition |
| Adaptive Timing | `adaptive_timing.py` | Human behavior simulation |
| ML Optimization | `ml_optimization.py` | Machine learning bypass optimization |
| Error Handling | `enhanced_error_handling.py` | Intelligent error recovery |
🚀 Enhanced CloudScraper - Bypass the majority of Cloudflare protections with cutting-edge anti-detection technology!
Contributions are welcome! Please feel free to submit a Pull Request.
This tool is for educational and testing purposes only. Always respect website terms of service and use responsibly.
