Set up Django project with Celery and gevent #2
Open
snopoke wants to merge 14 commits into main from claude/django-celery-gevent-setup-011CUz8zxDfaHiAKaQ9QBFZf
Create a complete Django project that reproduces a bug affecting SSL verification in PostgreSQL connections when using:
- Celery with gevent pool
- Langfuse (>3.0) with OpenTelemetry instrumentation
- psycopg3 with SSL connections

Project includes:
- Django 4.2 with PostgreSQL SSL configuration
- Celery 5.3+ configured with gevent pool
- Langfuse 3.0+ integration with @observe decorators
- Docker Compose setup for PostgreSQL (with SSL) and Redis
- Multiple test scripts to isolate and reproduce the bug
- Comprehensive documentation

Test scripts:
- reproduce_bug.py: Standalone test with 4 scenarios
- trigger_tasks.py: Trigger Celery tasks to test in worker context
- manage.py test_bug: Django management command for testing

The bug manifests when gevent's monkey patching interacts with OpenTelemetry's instrumentation, affecting SSL context handling in psycopg3 database connections.
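
As a point of reference for the "PostgreSQL SSL configuration" item, here is a minimal sketch of how a Django settings module might require a verified SSL connection. The database name, user, environment variable names, and certificate path are assumptions made for illustration and are not taken from this PR; `sslmode` and `sslrootcert` are standard libpq connection parameters, which Django forwards to the driver through `OPTIONS`.

```python
import os


def build_databases(env=None):
    """Return a Django DATABASES dict that asks for a verified SSL connection.

    All defaults below (names, paths, variable names) are hypothetical.
    """
    env = os.environ if env is None else env
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env.get("POSTGRES_DB", "bugrepro"),
            "USER": env.get("POSTGRES_USER", "postgres"),
            "PASSWORD": env.get("POSTGRES_PASSWORD", ""),
            "HOST": env.get("POSTGRES_HOST", "localhost"),
            "PORT": env.get("POSTGRES_PORT", "5432"),
            "OPTIONS": {
                # verify-full checks both the certificate chain and the host name
                "sslmode": env.get("POSTGRES_SSLMODE", "verify-full"),
                # CA certificate used to verify the server certificate
                "sslrootcert": env.get("POSTGRES_SSLROOTCERT", "certs/ca.crt"),
            },
        }
    }


# In settings.py this would be assigned at module level.
DATABASES = build_databases()
```

Reading the values from the environment keeps the same settings usable both against the Docker Compose database and against any other SSL-enabled server.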
Replace pip-based workflow with uv for significantly faster dependency installation and environment management.

Changes:
- Add pyproject.toml for modern Python project configuration
- Add .python-version file for consistent Python version (3.11)
- Generate uv.lock for deterministic dependency resolution
- Update setup.sh to install and use uv instead of pip
- Update run_celery_gevent.sh to use 'uv run'
- Update README.md with uv-first instructions and examples
- Update .gitignore to include .venv directory

Benefits:
- 10-100x faster dependency installation with uv
- Deterministic builds with uv.lock
- Automatic virtual environment management
- Better dependency resolution
- Still supports traditional pip workflow via requirements.txt

All scripts now use 'uv run' for command execution. Users can still use traditional pip/venv workflow if preferred.
Significantly expand the bug reproduction capabilities to increase the likelihood of triggering the gevent + langfuse + psycopg3 SSL issue.

New Features:
1. RequestLog Model
   - Track HTTP requests made during task execution
   - Log URL, method, status code, response time, and errors
   - Stores in PostgreSQL with SSL connection
2. New Celery Tasks
   - test_internal_observe: Uses @observe internally instead of as decorator
   - test_http_with_db_logging: Makes HTTP requests and logs to DB
   - test_multiple_http_requests: Multiple HTTP calls per task
   - test_mixed_operations: Combines DB queries, HTTP, and ORM operations
3. Enhanced trigger_tasks.py Script
   - Multiple test modes: simple, http, multiple, mixed, stress, all
   - Command-line arguments to control test parameters
   - Stress test mode with configurable concurrency
   - Default: 20 concurrent tasks + 10 HTTP tasks
   - Progress indicators and detailed result summaries
4. HTTP Integration
   - Uses httpbin.org for realistic HTTP requests
   - Adds I/O delays to increase concurrency pressure
   - Combines HTTP and database operations in single tasks

Benefits:
- Higher likelihood of reproducing the SSL verification bug
- More realistic workload scenarios
- Stress testing capabilities
- Multiple test vectors for different bug scenarios
- Better observability with request logging

Usage Examples:
  uv run python trigger_tasks.py                    # All tests
  uv run python trigger_tasks.py --mode stress      # Stress test only
  uv run python trigger_tasks.py --concurrency 50   # High concurrency

Files Modified:
- testapp/models.py: Add RequestLog model
- testapp/tasks.py: Add 4 new comprehensive tasks
- trigger_tasks.py: Complete rewrite with multiple test modes
- pyproject.toml/requirements.txt: Add requests library
- README.md: Document new tasks and test modes
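
The command-line interface described for trigger_tasks.py could be sketched roughly as follows. The `--mode` and `--concurrency` flags and the mode names come from the commit message above; the function name, the help texts, and the assumption that 20 is the flag's default value are illustrative guesses, not the script's actual code.

```python
import argparse

# Mode names as listed in the commit message.
MODES = ["simple", "http", "multiple", "mixed", "stress", "all"]


def parse_args(argv=None):
    """Parse the options of a trigger_tasks.py-style script (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Trigger Celery tasks to reproduce the SSL verification bug"
    )
    parser.add_argument(
        "--mode",
        choices=MODES,
        default="all",  # running without arguments is documented as "All tests"
        help="which group of tests to run",
    )
    parser.add_argument(
        "--concurrency",
        type=int,
        default=20,  # assumed to match the documented 20 concurrent tasks
        help="number of tasks to queue concurrently in stress mode",
    )
    return parser.parse_args(argv)
```

With this sketch, `parse_args(["--mode", "stress", "--concurrency", "50"])` yields a namespace with `mode == "stress"` and `concurrency == 50`, and an unknown mode is rejected by argparse before any task is queued.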
Create specialized testing tools to increase likelihood of reproducing the intermittent SSL verification bug that occurs with gevent + langfuse + psycopg3.

New Tools:
1. test_monkey_patching.py
   - Test 6 different monkey patching strategies
   - Early vs late patching relative to Django/OTEL imports
   - Aggressive vs minimal module patching
   - SSL-only and no-SSL variants
   - Helps identify which patching order triggers the bug
2. test_connection_pool.py
   - Connection cycling: rapidly open/close connections
   - Concurrent connections: many greenlets simultaneously
   - Mixed operations: varying timing patterns
   - Rapid context switches: continuous operations with greenlet switching
   - Exposes race conditions in connection pool and SSL context
3. inspect_ssl_context.py
   - Diagnostic tool showing SSL module state
   - Thread-local storage behavior in greenlets
   - psycopg3 internals and connection details
   - SSL parameters and certificate info
   - Helps understand current SSL context state
4. celery_worker_early_patch.py
   - Worker entry point with monkey patching before ALL imports
   - Tests whether early patching affects SSL initialization
   - Alternative to standard celery worker command
5. run_celery_multiworker.sh
   - Helper script for running multiple worker processes
   - Higher total concurrency to expose race conditions
6. REPRODUCING_THE_BUG.md
   - Comprehensive guide for bug reproduction strategies
   - Explains theory behind the bug
   - Step-by-step reproduction phases
   - What to look for and how to report results

Key Strategies:
- Vary monkey patching order (before/after Django imports)
- Aggressive connection pool stress testing
- High concurrency with many greenlets
- Long-running continuous operations
- Multiple worker processes
- Rapid greenlet context switching
- SSL context inspection and debugging

Theory:
The bug likely involves a race condition where:
1. Gevent's monkey patching affects SSL context initialization
2. OTEL's context propagation interferes with greenlet switching
3. Thread-local storage accessed from wrong greenlet
4. Connection pool state during specific timing windows

These tools provide multiple attack vectors to trigger the bug by stressing different aspects of the system.

Usage Examples:
  uv run python test_monkey_patching.py --strategy early_aggressive
  uv run python test_connection_pool.py --cycles 200 --greenlets 30
  uv run python inspect_ssl_context.py
  uv run python celery_worker_early_patch.py --pool=gevent --concurrency=20

See REPRODUCING_THE_BUG.md for detailed reproduction strategies.
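
Since patch order relative to the Django/OTEL imports is the variable under test, it can help to record which libraries were already imported at the moment patching happens. The following is a small standard-library sketch of such a check; the helper name and the list of watched packages are assumptions for illustration and do not come from the PR's scripts.

```python
import sys

# Third-party packages whose import before monkey patching is worth flagging.
# This list is a guess made for illustration only.
WATCHED = ("django", "opentelemetry", "langfuse", "psycopg", "requests")


def imported_before_patch(watched=WATCHED, modules=None):
    """Return the watched top-level packages that are already imported.

    `modules` defaults to sys.modules; passing a dict makes the check testable.
    """
    modules = sys.modules if modules is None else modules
    return [name for name in watched if name in modules]
```

In an entry point such as celery_worker_early_patch.py, a call to this helper placed immediately before the monkey patching call would be expected to return an empty list, while a "late" strategy would report the packages that were loaded first.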
Problem:
Monkey patching is persistent within a Python process. Once modules are patched, they cannot be unpatched, causing interference between different patching strategy tests.

Solution:
Refactor test_monkey_patching.py to spawn a subprocess for each strategy when running with --strategy=all (default behavior).

Changes:
- Add subprocess-based execution for each strategy
- Add --internal-run flag for subprocess execution
- Each strategy now runs in completely isolated Python process
- No interference between different patching configurations
- Simplified usage: just run "uv run python test_monkey_patching.py"
- Can still test single strategy without subprocess overhead

Benefits:
- Reliable, reproducible results for each strategy
- Clean Python environment for each test
- Can test all 6 strategies in one command
- Better isolation reveals true patching order effects

Usage:
  # Test all strategies (automatic subprocesses)
  uv run python test_monkey_patching.py
  # Test single strategy (no subprocess needed)
  uv run python test_monkey_patching.py --strategy early_aggressive

This addresses the concern that testing multiple strategies in the same process would produce unreliable results due to persistent monkey patching state. Updated documentation to reflect simpler usage pattern.
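
The subprocess-per-strategy approach described above can be sketched with the standard library alone. The `--strategy` and `--internal-run` flags are taken from the commit message; the function names, the timeout, and the convention that exit code 0 means success are assumptions, and the real test_monkey_patching.py may be organized differently.

```python
import subprocess
import sys


def build_command(script, strategy):
    """Command line that runs one strategy in a fresh Python interpreter."""
    return [sys.executable, script, "--strategy", strategy, "--internal-run"]


def run_isolated(command, timeout=300):
    """Run a command in a child process and return (exit code, captured stdout)."""
    completed = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout
    )
    return completed.returncode, completed.stdout


def run_all(script, strategies):
    """Run every strategy in its own process; map strategy name to success flag."""
    results = {}
    for strategy in strategies:
        code, _ = run_isolated(build_command(script, strategy))
        results[strategy] = code == 0  # assumes exit code 0 signals success
    return results
```

Because each child starts from an unpatched interpreter, the patching done for one strategy cannot leak into the next, which is the isolation property the commit is after.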
Updated inspect_ssl_context.py and test_connection_pool.py to use the pg_stat_ssl system view instead of the PostgreSQL SSL functions (ssl_is_used(), ssl_version(), ssl_cipher()), which are not available in all PostgreSQL Docker images.

Changes:
- Replace direct SSL function calls with subqueries from pg_stat_ssl
- Query format: SELECT ssl, version, cipher FROM pg_stat_ssl WHERE pid = pg_backend_pid()
- Add null handling for SSL information (N/A when not available)
- Maintain backward compatibility with existing variable names
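
A minimal sketch of the query and null handling described above follows. The SQL is the one quoted in the commit message; the helper name, the returned dictionary shape, and the `StubCursor` class (a stand-in so the sketch runs without a database) are assumptions. Any DB-API style cursor, such as one obtained from a psycopg connection, should work in its place.

```python
SSL_QUERY = (
    "SELECT ssl, version, cipher FROM pg_stat_ssl WHERE pid = pg_backend_pid()"
)


def fetch_ssl_info(cursor):
    """Return SSL details for the current backend, with "N/A" for missing values."""
    cursor.execute(SSL_QUERY)
    row = cursor.fetchone()
    if row is None:  # no row visible for this backend
        return {"ssl": False, "version": "N/A", "cipher": "N/A"}
    ssl_used, version, cipher = row
    return {
        "ssl": bool(ssl_used),
        "version": version if version is not None else "N/A",
        "cipher": cipher if cipher is not None else "N/A",
    }


class StubCursor:
    """Hypothetical stand-in for a real cursor; returns one canned row."""

    def __init__(self, row):
        self.row = row
        self.executed = None

    def execute(self, sql):
        self.executed = sql

    def fetchone(self):
        return self.row


# A connection without SSL reports NULL version and cipher columns.
plain = fetch_ssl_info(StubCursor((False, None, None)))
```

Here `plain` ends up as `{"ssl": False, "version": "N/A", "cipher": "N/A"}`, which matches the "N/A when not available" behavior the commit describes.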
Enhanced bug reproduction strategy to test thread interference:
1. Updated test_monkey_patching.py:
- Now spawns concurrent greenlets instead of sequential operations
- Default increased to 50 concurrent greenlets per strategy
- Forces context switches and connection cycling during tests
- Much more likely to expose race conditions
2. Added langgraph to dependencies:
- Langgraph uses real OS threads internally
- Critical for testing thread interference with gevent
3. Created test_langgraph_gevent.py:
- Tests interaction between langgraph threads and gevent greenlets
- Key insight: langgraph's threads + gevent's monkey patching + SSL
context in thread-local storage can cause SSL verification failures
- Two test modes: basic (multiple rounds) and concurrent (maximum stress)
- Includes HTTP requests to add I/O delays and timing variations
4. Updated REPRODUCING_THE_BUG.md:
- Added langgraph test as PRIORITY test strategy
- Explained why thread interference is critical to reproduce bug
- Updated recommended reproduction strategy
The langgraph test is most likely to reproduce the bug because it creates
the exact conditions that cause SSL context to be accessed from wrong
thread-local storage during greenlet context switches.
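
To make the thread-local part of this reasoning concrete, the sketch below shows plain standard-library behavior: a value stored in `threading.local()` by one thread is not visible from another. It does not reproduce the bug and involves neither gevent nor langgraph; the attribute name and the placeholder string are assumptions for illustration.

```python
import threading

# Per-thread storage; each OS thread sees its own set of attributes.
state = threading.local()
state.ssl_context = "context created in main thread"  # placeholder, not a real SSLContext


def read_from_worker():
    """Read state.ssl_context from a new thread and return what that thread sees."""
    seen = []

    def worker():
        # The worker thread has its own empty view of `state`.
        seen.append(getattr(state, "ssl_context", None))

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return seen[0]
```

Calling `read_from_worker()` returns `None` while the main thread still sees its own value, so code that expects a context stored by one execution unit and then runs in another finds nothing there. How this storage behaves once real threads and monkey-patched greenlets are mixed is what test_langgraph_gevent.py is meant to probe.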
Environment for reproducing the errors described in https://docs.google.com/document/d/1VMiPFP1qL17TyG2huzQxLXH49IasPCvaQRRfG32SP0I/edit?tab=t.0