A high-precision market data capture and analysis tool for cryptocurrency exchanges, designed for accurate replay and timing analysis. Perfect for quantitative trading strategy development, backtesting, and performance validation.
- High-precision timing: Nanosecond-resolution monotonic timestamps for reliable replay
- Real-time data capture: WebSocket-based streaming from Binance Spot
- Comprehensive metrics: Timing analysis, data integrity checks, and market compliance validation
- CLI interface: Easy-to-use command-line tools for data capture and analysis
- Quant Strategy Integration: Direct integration with trading strategies for backtesting and live execution
- Multi-speed replay: Test strategies at 1x real-time or 100x+ for rapid iteration
pip install -r requirements.txt
Capture real-time market data from Binance:
python3 main.py datacapture --ticker btcusdt --time 10
Parameters:
- --ticker: Trading pair symbol (e.g., btcusdt, ethusdt)
- --time: Capture duration in seconds
Example Output:
Starting collector for btcusdt for 10 seconds
Connection opened
Subscribed to btcusdt@bookTicker
First message received, starting 10s timer
Normalized data: {'venue': 'binance-spot', 'symbol': 'BTCUSDT', 'recv_wall_ms': 1759295523244, 'recv_mono_ns': 8046576053666, 'update_id': 77130729886, 'best_bid_px': 114707.37, 'best_bid_qty': 13.74864, 'best_ask_px': 114707.38, 'best_ask_qty': 3.6567}
...
Data capture completed for btcusdt
Output Files:
- Data is saved to data/<SYMBOL>/<SYMBOL>_YYYYMMDD_HHMM.jsonl
- Symbol-specific directories (e.g., data/btcusdt/)
- Deterministic naming with UTC timestamps (minute precision)
- Each line contains a JSON object with normalized market data
- Automatic file rotation every 10 minutes for long-running captures
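The naming scheme above can be sketched in a few lines. This is an illustration only: `capture_path` is a hypothetical helper, not a function from this repository, and lowercasing the symbol is an assumption based on the `data/btcusdt/` example.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def capture_path(symbol: str, opened_at: datetime, root: str = "data") -> str:
    """Build data/<SYMBOL>/<SYMBOL>_YYYYMMDD_HHMM.jsonl for a given open time.

    Hypothetical helper: pass a timezone-aware datetime. A naive datetime
    would be interpreted as local time by astimezone().
    """
    sym = symbol.lower()  # assumption: directories and files use lowercase symbols
    stamp = opened_at.astimezone(timezone.utc).strftime("%Y%m%d_%H%M")
    return str(PurePosixPath(root) / sym / f"{sym}_{stamp}.jsonl")


if __name__ == "__main__":
    opened = datetime(2025, 10, 12, 9, 52, tzinfo=timezone.utc)
    print(capture_path("btcusdt", opened))
```

Because the name depends only on the symbol and the UTC minute at which the file was opened, two runs that open a file in the same minute resolve to the same path.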
Replay captured market data with precise timing for backtesting and analysis:
python3 main.py replay --symbol btcusdt --speed 1.0 --handler print
Parameters:
- --symbol: Symbol to replay (e.g., btcusdt)
- --speed: Replay speed multiplier (default: 1.0 for real time; 10.0 for 10x speed)
- --handler: Event handler, print (default) or stats
- --start-time: Optional start timestamp in milliseconds
- --duration: Optional duration limit in seconds
- --no-progress: Disable progress updates
Example Output (print handler):
=== Market Data Replay ===
Symbol: btcusdt
Total Events: 299
Time Range: 2025-10-12 10:03:33 to 10:03:36 UTC (3.0 seconds)
Replay Speed: 1.0x
Starting replay...
[10:03:47.811] BTCUSDT bid=111875.52 ask=111875.53 spread=0.01
[10:03:47.811] BTCUSDT bid=111875.52 ask=111875.53 spread=0.01
...
[Progress: 140/299 events, 47% complete, 1.4s elapsed]
...
=== Replay Complete ===
Events processed: 299
Actual duration: 3.02 seconds
Average Δt: 10.113ms
Stats Handler Example:
python3 main.py replay --symbol btcusdt --speed 10.0 --handler stats
The stats handler aggregates statistics during replay:
=== Replay Statistics ===
Events processed: 299
Duration: 3.02 seconds
Average rate: 98.9 events/second
Spread Statistics:
Mean: 0.0100
Min: 0.0100
Max: 0.0100
Price Range:
Min: 111875.52
Max: 111875.52
Key Features:
- Nanosecond Precision: Uses recv_mono_ns for exact timing reproduction
- Speed Control: Replay at any speed (0.1x slow motion to 100x fast forward)
- Multi-File Support: Seamlessly replays across rotated files
- Event Handlers: Extensible handler system for custom backtesting logic
- Progress Tracking: Real-time progress updates during replay
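One way to reproduce timing from recv_mono_ns under a speed multiplier is to schedule each event against an absolute deadline instead of sleeping for each gap, so that handler time does not accumulate as lag. The sketch below assumes that approach. `replay` and its `clock` and `sleep` parameters are hypothetical names for illustration, not the project's actual engine.

```python
import time


def replay(events, handler, speed=1.0, clock=time.monotonic_ns, sleep=time.sleep):
    """Call handler(event) for each event, spacing calls by recv_mono_ns / speed.

    Hypothetical sketch: clock returns nanoseconds, sleep takes seconds.
    Both are injectable so the pacing can be tested without real waiting.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    start_ns = clock()
    first_ns = None
    count = 0
    for event in events:
        if first_ns is None:
            first_ns = event["recv_mono_ns"]
        # Deadline relative to replay start, compressed or stretched by speed.
        due_ns = start_ns + (event["recv_mono_ns"] - first_ns) / speed
        wait_ns = due_ns - clock()
        if wait_ns > 0:
            sleep(wait_ns / 1e9)
        handler(event)
        count += 1
    return count


if __name__ == "__main__":
    sample = [
        {"recv_mono_ns": 8046576053666, "symbol": "BTCUSDT"},
        {"recv_mono_ns": 8046586053666, "symbol": "BTCUSDT"},  # 10 ms later
    ]
    replay(sample, print, speed=10.0)  # the 10 ms gap is replayed as about 1 ms
```

If the handler runs longer than the gap to the next event, the computed wait is negative and the next event is delivered immediately, so the replay catches up instead of drifting.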
Visualize captured market data with comprehensive bid/ask price and spread analysis:
python3 main.py plot --symbol btcusdt
Parameters:
- --symbol: Trading pair symbol to plot (e.g., btcusdt)
- --output: Optional output file path (e.g., plot.png)
- --show: Display the plot interactively
Example Output:
=== Plot Summary ===
Symbol: BTCUSDT
Data Points: 299
Time Range: 2025-10-12 10:03:33 to 10:03:36 UTC
Duration: 3.0 seconds
Price Range: $111875.52 - $111875.53
Spread Range: $0.0100 - $0.0100
Average Spread: $0.0100
Plot saved to: data/plots/btcusdt/btcusdt_plot.png
Usage Examples:
# Basic plot generation (saves to data/plots/<symbol>/<symbol>_plot.png)
python3 main.py plot --symbol btcusdt
# Save to specific file
python3 main.py plot --symbol btcusdt --output myplot.png
# Display plot interactively
python3 main.py plot --symbol btcusdt --show
# Both save and show
python3 main.py plot --symbol btcusdt --output myplot.png --show
Plot Features:
- Dual Subplots:
- Top plot: Bid and Ask prices over time (green/red lines)
- Bottom plot: Bid-Ask spread over time (blue line)
- High Resolution: 300 DPI output for publication-quality plots
- Time Formatting: Automatic time axis formatting with appropriate intervals
- Summary Statistics: Price ranges, spread analysis, and data point counts
- Flexible Output: Save to file, display interactively, or both
Analyze captured data for timing precision, data integrity, and market compliance:
python3 main.py metrics --filename data/btcusdt/btcusdt_20251012_0952.jsonl
Parameters:
--filename: Path to the captured data file
Example Output:
Analyzing metrics for data/btcusdt/btcusdt_20251012_0952.jsonl
Symbol: BTCUSDT, Venue: binance-spot, Epsilon: 1e-08
Rows: 4092
Duration (mono): 10.097 seconds
Duration (wall): 10.097 seconds
Clock drift: 0.000 seconds
Update ID regressions: 0
Delta T mean: 2.468 ms
Delta T median: 0.069 ms
Delta T 95th percentile: 13.298 ms
Ask >= Bid - ε compliance: 4091/4091 (100.00%)
Metrics Explained:
| Metric | Description | Expected Values |
|---|---|---|
| Rows | Total number of data points captured | Varies by capture duration |
| Duration (mono) | Monotonic clock duration (authoritative for replay) | Should match capture time |
| Duration (wall) | Wall clock duration (for sanity check) | Should be close to mono duration |
| Clock drift | Difference between wall and mono clocks | Should be < 1 second |
| Update ID regressions | Count of times update_id decreased (should be 0) | 0 (indicates proper ordering) |
| Delta T mean | Average time between consecutive updates | ~2-10 ms for active markets |
| Delta T median | Median time between updates | > 0 (indicates nanosecond precision) |
| Delta T 95th percentile | 95th percentile of update intervals | < 100 ms for good performance |
| Ask >= Bid - ε compliance | Percentage of valid ask/bid relationships | 100% (market integrity) |
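As a rough illustration of how the Delta T, update ID, and compliance rows in the table could be computed from normalized rows, here is a minimal sketch. `timing_metrics` is a hypothetical function, and the nearest-rank percentile is an assumption; metrics.py may use a different percentile method.

```python
import math
import statistics


def timing_metrics(rows, epsilon=1e-8):
    """Summarize inter-arrival times, ordering, and bid/ask sanity for rows.

    Hypothetical sketch: rows are dicts following the normalized schema,
    in capture order. Requires at least two rows.
    """
    if len(rows) < 2:
        raise ValueError("need at least two rows to compute Delta T")
    mono = [r["recv_mono_ns"] for r in rows]
    deltas_ms = [(b - a) / 1e6 for a, b in zip(mono, mono[1:])]
    ordered = sorted(deltas_ms)
    # Nearest-rank 95th percentile (an assumption, not necessarily the tool's method).
    rank = max(1, math.ceil(0.95 * len(ordered)))
    ids = [r["update_id"] for r in rows]
    return {
        "rows": len(rows),
        "delta_t_mean_ms": statistics.fmean(deltas_ms),
        "delta_t_median_ms": statistics.median(deltas_ms),
        "delta_t_p95_ms": ordered[rank - 1],
        # A regression is any update_id strictly lower than its predecessor.
        "update_id_regressions": sum(1 for a, b in zip(ids, ids[1:]) if b < a),
        "compliant": sum(
            1 for r in rows if r["best_ask_px"] >= r["best_bid_px"] - epsilon
        ),
    }
```

A median Delta T of exactly 0 would suggest that timestamps are being truncated somewhere before they reach the file, which is why the table expects a value greater than 0.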
The collected market data follows this normalized schema:
| Field | Type | Description |
|---|---|---|
| venue | string | Exchange/venue identifier |
| symbol | string | Trading pair symbol |
| recv_wall_ms | int | Wall clock timestamp in milliseconds (for plotting & human-time) |
| recv_mono_ns | int | Monotonic timestamp in nanoseconds (authoritative for replay/Δt) |
| update_id | int | Payload update ID (for gap checks) |
| best_bid_px | float | Best bid price |
| best_bid_qty | float | Best bid quantity |
| best_ask_px | float | Best ask price |
| best_ask_qty | float | Best ask quantity |
This schema ensures consistent data structure across different venues and enables reliable replay functionality.
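To make the mapping concrete, the sketch below converts a raw Binance Spot bookTicker message into this schema. The short keys (u, s, b, B, a, A) follow Binance's documented bookTicker payload. The function name `normalize_book_ticker` and its optional clock arguments are assumptions for illustration and may not match normalizer.py.

```python
import json
import time


def normalize_book_ticker(raw, wall_ms=None, mono_ns=None):
    """Map a Binance Spot bookTicker message onto the normalized schema.

    Hypothetical sketch: raw is a JSON string/bytes or an already-parsed dict.
    wall_ms and mono_ns default to the current clocks; passing them explicitly
    keeps the function deterministic in tests.
    """
    msg = json.loads(raw) if isinstance(raw, (str, bytes)) else raw
    return {
        "venue": "binance-spot",
        "symbol": msg["s"],
        "recv_wall_ms": time.time_ns() // 1_000_000 if wall_ms is None else wall_ms,
        "recv_mono_ns": time.monotonic_ns() if mono_ns is None else mono_ns,
        "update_id": int(msg["u"]),
        # Binance sends prices and quantities as decimal strings.
        "best_bid_px": float(msg["b"]),
        "best_bid_qty": float(msg["B"]),
        "best_ask_px": float(msg["a"]),
        "best_ask_qty": float(msg["A"]),
    }


if __name__ == "__main__":
    raw = '{"u":400900217,"s":"BNBUSDT","b":"25.35190000","B":"31.21000000","a":"25.36520000","A":"40.66000000"}'
    print(normalize_book_ticker(raw))
```

Taking both timestamps at the moment the message is received, before any parsing, keeps the recorded arrival time as close as possible to the actual arrival.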
The system uses two timing mechanisms:
- Monotonic Clock (recv_mono_ns): Nanosecond-precision, monotonic timestamps that are immune to system clock adjustments. This is the authoritative timing source for replay and Δt calculations.
- Wall Clock (recv_wall_ms): Millisecond-precision wall clock timestamps for human-readable time and plotting.
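The two clocks can be compared by measuring the same capture window with each of them. The sketch below assumes drift is defined as wall duration minus monotonic duration. The sign convention and the name `clock_drift_seconds` are assumptions, not taken from metrics.py.

```python
def clock_drift_seconds(rows):
    """Return wall-clock duration minus monotonic duration, in seconds.

    Hypothetical sketch: rows are normalized events in capture order.
    A value near 0 means the wall clock was not adjusted during capture.
    """
    if not rows:
        raise ValueError("no rows to analyze")
    wall_s = (rows[-1]["recv_wall_ms"] - rows[0]["recv_wall_ms"]) / 1e3
    mono_s = (rows[-1]["recv_mono_ns"] - rows[0]["recv_mono_ns"]) / 1e9
    return wall_s - mono_s
```

An NTP step or a manual clock change during capture shows up as a drift of roughly the size of the adjustment, while the monotonic duration stays unaffected.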
The metrics analysis performs several quality checks:
- Timing Precision: Ensures nanosecond precision is working (median Δt > 0)
- Data Ordering: Verifies update_id sequence integrity (regressions = 0)
- Clock Consistency: Compares wall vs monotonic clock drift
- Market Integrity: Validates ask >= bid - ε constraint (100% compliance expected)
market-replay/
├── main.py # CLI entry point
├── collector.py # WebSocket data capture
├── normalizer.py # Data normalization
├── metrics.py # Analysis and metrics
├── plotter.py # Plot visualization utilities
├── logger.py # Data logging utilities with rotation
├── requirements.txt # Python dependencies
├── data/ # Captured data files
│ ├── btcusdt/ # Symbol-specific directories
│ │ └── btcusdt_*.jsonl # Rotated data files
│ ├── ethusdt/ # Multiple symbols supported
│ │ └── ethusdt_*.jsonl
│ └── plots/ # Generated plot files
│ ├── btcusdt/ # Symbol-specific plot directories
│ │ └── btcusdt_plot.png
│ └── ethusdt/
│ └── ethusdt_plot.png
└── README.md # This file
The data capture system includes robust file management:
- Automatic Rotation: Files rotate every 10 minutes to prevent oversized files
- Deterministic Naming: data/<SYMBOL>/<SYMBOL>_YYYYMMDD_HHMM.jsonl format
- Crash Safety: Validates last line on startup, handles torn writes gracefully
- Flush Policy: Flushes every 200 lines or 1 second (whichever first)
- Durability: fsync only on rotation and shutdown for optimal performance
- Real-time Metrics: Tracks rows written, file size, rotations, and flush age
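The crash-safety check can be illustrated as follows: on startup, inspect the final line of an existing file and truncate it if it is incomplete or not valid JSON. This is a sketch of the idea under those assumptions. `repair_torn_tail` is a hypothetical name, and logger.py may handle torn writes differently.

```python
import json


def repair_torn_tail(path):
    """Drop a torn final line from a JSONL file; return the bytes removed.

    Hypothetical sketch: the last line is kept only if it ends with a newline
    and parses as JSON. The whole file is read for simplicity; a production
    version would seek backwards from the end instead.
    """
    with open(path, "rb+") as f:
        data = f.read()
        if not data:
            return 0
        body = data[:-1] if data.endswith(b"\n") else data
        last_start = body.rfind(b"\n") + 1  # 0 when the file has a single line
        last_line = data[last_start:]
        intact = last_line.endswith(b"\n")
        if intact:
            try:
                json.loads(last_line)
            except ValueError:  # covers JSONDecodeError and invalid UTF-8
                intact = False
        if intact:
            return 0
        f.truncate(last_start)
        return len(data) - last_start
```

Only the last line is examined, matching the "validates last line on startup" behavior described above. Earlier lines are assumed to be intact because they were written before the interrupted write.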
This tool aims to provide historical L1 market data for quantitative trading strategy development and backtesting. Here's how to integrate it with your trading strategies:
1. Capture Data → 2. Replay Engine → 3. Strategy Handler → 4. Performance Analysis
Step 1: Capture Historical Data
python3 main.py datacapture --ticker btcusdt --time 3600 # Capture 1 hour of data
Step 2: Feed Data to Your Strategy Engine
from market_replay import replay_data

# Option 1: Direct integration (if you can modify your strategy)
class MyTradingStrategy:
    def __init__(self):
        # Placeholder bookkeeping so the Step 3 summary has something to report
        self.positions = []
        self.pnl = 0.0

    def __call__(self, market_event):
        # Your existing strategy logic here
        spread = market_event['best_ask_px'] - market_event['best_bid_px']
        if spread > 0.01:
            self.place_order(market_event)

    def place_order(self, market_event):
        # Placeholder: record the triggering event; replace with real order logic
        self.positions.append(market_event)

strategy = MyTradingStrategy()
replay_data('btcusdt', handler=strategy, speed=10.0)

# Option 2: Feed to existing strategy engine (more common)
class DataFeedAdapter:
    def __init__(self, your_strategy_engine):
        self.strategy = your_strategy_engine

    def __call__(self, market_event):
        # Convert and feed to your existing strategy
        formatted_data = self.format_data(market_event)
        self.strategy.process_market_data(formatted_data)

    def format_data(self, event):
        # Convert to your strategy's expected format
        return {
            'symbol': event['symbol'],
            'bid': event['best_bid_px'],
            'ask': event['best_ask_px'],
            'timestamp': event['recv_wall_ms']
        }

# Feed historical data to your existing strategy
adapter = DataFeedAdapter(your_existing_strategy_engine)
replay_data('btcusdt', handler=adapter, speed=10.0)
Step 3: Analyze Results
print(f"Final P&L: ${strategy.pnl:.2f}")
print(f"Trades executed: {len(strategy.positions)}")
- Precise Timing: Nanosecond timestamps for accurate replay
- Fast Testing: Replay at different speeds for quick iteration
- Simple Integration: Just implement a __call__ method in your strategy class
- Multi-Symbol: Test strategies across different trading pairs
- Feed to Existing Strategy: Create an adapter that converts replayed data to your strategy's expected format
- Queue-Based: Put replayed data into a queue that your strategy engine consumes
- File-Based: Write replayed data to files that your strategy can read
- WebSocket Simulation: Format replayed data to simulate live WebSocket messages
Key Point: Your replay system becomes a data source that feeds into existing strategy infrastructure, not a replacement for it.
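As an illustration of the queue-based pattern, the handler below only enqueues each event, and a separate consumer thread passes events to the strategy. `QueueHandler`, `consume`, and the `_STOP` sentinel are hypothetical names for this sketch. Events are fed by hand here so the example runs on its own; in practice the handler would be passed to replay_data as in the earlier examples.

```python
import queue
import threading

_STOP = object()  # sentinel telling the consumer that the replay has finished


class QueueHandler:
    """Replay handler that hands each event to a queue and returns at once."""

    def __init__(self, q):
        self.q = q

    def __call__(self, market_event):
        self.q.put(market_event)


def consume(q, on_event):
    """Pull events off the queue until the sentinel arrives."""
    while True:
        event = q.get()
        if event is _STOP:
            break
        on_event(event)


if __name__ == "__main__":
    q = queue.Queue()
    worker = threading.Thread(target=consume, args=(q, print))
    worker.start()
    handler = QueueHandler(q)
    # Stand-in for: replay_data('btcusdt', handler=handler, speed=10.0)
    for update_id in range(3):
        handler({"symbol": "BTCUSDT", "update_id": update_id})
    q.put(_STOP)
    worker.join()
```

Keeping the handler down to a single put call means slow strategy code delays only the consumer thread, not the replay timing.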
The replay_data() function processes historical data and calls your handler for each market event:
# Your handler gets called for EVERY market data event in the historical file
def my_handler(market_event):
    # This function is called once per market update
    # market_event contains: symbol, bid/ask prices, quantities, timestamps, etc.
    print(f"Processing {market_event['symbol']} at {market_event['recv_wall_ms']}")

# The replay engine:
# 1. Loads all historical data from files
# 2. Calculates precise timing between events
# 3. Sleeps for the exact time interval
# 4. Calls your handler with the next market event
# 5. Repeats until all data is processed
replay_data('btcusdt', handler=my_handler, speed=1.0)
Handler Requirements:
- Must be a callable (function or class with __call__ method)
- Receives one parameter: the market event dictionary
- Should be fast (called frequently during replay)
- Can maintain state (use classes for complex logic)
This allows you to test your existing strategies with historical data without rewriting your entire system.
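A stateful handler that meets these requirements might look like the sketch below, which tracks running spread statistics in constant time per event. `SpreadStats` is a hypothetical example class and is not the built-in stats handler.

```python
class SpreadStats:
    """Callable handler that accumulates bid/ask spread statistics."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.min = None
        self.max = None

    def __call__(self, market_event):
        # One cheap arithmetic update per event keeps the handler fast.
        spread = market_event["best_ask_px"] - market_event["best_bid_px"]
        self.count += 1
        self.total += spread
        self.min = spread if self.min is None else min(self.min, spread)
        self.max = spread if self.max is None else max(self.max, spread)

    @property
    def mean(self):
        return self.total / self.count if self.count else None


if __name__ == "__main__":
    stats = SpreadStats()
    # Stand-in for: replay_data('btcusdt', handler=stats, speed=100.0)
    stats({"best_bid_px": 100.0, "best_ask_px": 100.5})
    stats({"best_bid_px": 100.0, "best_ask_px": 100.25})
    print(stats.count, stats.mean, stats.min, stats.max)
```

Because the state lives on the instance, the results remain available after the replay finishes, which is the main reason to prefer a class over a bare function for anything beyond printing.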