Market Replay CLI

A high-precision market data capture and analysis tool for cryptocurrency exchanges, designed for accurate replay and timing analysis. Perfect for quantitative trading strategy development, backtesting, and performance validation.

Features

  • High-precision timing: Nanosecond-accurate monotonic timestamps for reliable replay
  • Real-time data capture: WebSocket-based streaming from Binance Spot
  • Comprehensive metrics: Timing analysis, data integrity checks, and market compliance validation
  • CLI interface: Easy-to-use command-line tools for data capture and analysis
  • Quant strategy integration: Direct integration with trading strategies for backtesting and live execution
  • Multi-speed replay: Test strategies at 1x real-time or 100x+ for rapid iteration

Installation

pip install -r requirements.txt

Usage

Data Capture

Capture real-time market data from Binance:

python3 main.py datacapture --ticker btcusdt --time 10

Parameters:

  • --ticker: Trading pair symbol (e.g., btcusdt, ethusdt)
  • --time: Capture duration in seconds

Example Output:

Starting collector for btcusdt for 10 seconds
Connection opened
Subscribed to btcusdt@bookTicker
First message received, starting 10s timer
Normalized data: {'venue': 'binance-spot', 'symbol': 'BTCUSDT', 'recv_wall_ms': 1759295523244, 'recv_mono_ns': 8046576053666, 'update_id': 77130729886, 'best_bid_px': 114707.37, 'best_bid_qty': 13.74864, 'best_ask_px': 114707.38, 'best_ask_qty': 3.6567}
...
Data capture completed for btcusdt

Output Files:

  • Data is saved to data/<SYMBOL>/<SYMBOL>_YYYYMMDD_HHMM.jsonl
  • Symbol-specific directories (e.g., data/btcusdt/)
  • Deterministic naming with UTC timestamps (minute precision)
  • Each line contains a JSON object with normalized market data
  • Automatic file rotation every 10 minutes for long-running captures
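
Because each line is a standalone JSON object, captured files are easy to load outside the CLI. A minimal loading sketch, assuming the default data/ layout above (load_capture is illustrative, not part of this project):

import glob
import json

def load_capture(symbol: str):
    """Load all rotated capture files for a symbol, ordered by monotonic timestamp."""
    records = []
    for path in sorted(glob.glob(f"data/{symbol}/{symbol}_*.jsonl")):
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    records.append(json.loads(line))
                except json.JSONDecodeError:
                    continue  # tolerate a torn final line from an interrupted capture
    # recv_mono_ns is the authoritative ordering key for replay
    records.sort(key=lambda r: r["recv_mono_ns"])
    return records

events = load_capture("btcusdt")
print(f"Loaded {len(events)} events")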

Data Replay

Replay captured market data with precise timing for backtesting and analysis:

python3 main.py replay --symbol btcusdt --speed 1.0 --handler print

Parameters:

  • --symbol: Symbol to replay (e.g., btcusdt)
  • --speed: Replay speed multiplier (default: 1.0 for real-time; e.g., 10.0 replays at 10x speed)
  • --handler: Event handler - print (default) or stats
  • --start-time: Optional start timestamp in milliseconds
  • --duration: Optional duration limit in seconds
  • --no-progress: Disable progress updates
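
The flags can be combined; for example, a hypothetical invocation that replays 60 seconds of data at 10x speed through the stats handler with progress updates disabled:

python3 main.py replay --symbol btcusdt --speed 10.0 --handler stats --duration 60 --no-progress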

Example Output (print handler):

=== Market Data Replay ===
Symbol: btcusdt
Total Events: 299
Time Range: 2025-10-12 10:03:33 to 10:03:36 UTC (3.0 seconds)
Replay Speed: 1.0x

Starting replay...

[10:03:47.811] BTCUSDT bid=111875.52 ask=111875.53 spread=0.01
[10:03:47.811] BTCUSDT bid=111875.52 ask=111875.53 spread=0.01
...
[Progress: 140/299 events, 47% complete, 1.4s elapsed]
...

=== Replay Complete ===
Events processed: 299
Actual duration: 3.02 seconds
Average Δt: 10.113ms

Stats Handler Example:

python3 main.py replay --symbol btcusdt --speed 10.0 --handler stats

The stats handler aggregates statistics during replay:

=== Replay Statistics ===
Events processed: 299
Duration: 3.02 seconds
Average rate: 98.9 events/second

Spread Statistics:
  Mean: 0.0100
  Min: 0.0100
  Max: 0.0100

Price Range:
  Min: 111875.52
  Max: 111875.52

Key Features:

  • Nanosecond Precision: Uses recv_mono_ns for exact timing reproduction
  • Speed Control: Replay at any speed (0.1x slow motion to 100x fast forward)
  • Multi-File Support: Seamlessly replays across rotated files
  • Event Handlers: Extensible handler system for custom backtesting logic
  • Progress Tracking: Real-time progress updates during replay

Plot Visualization

Visualize captured market data with comprehensive bid/ask price and spread analysis:

python3 main.py plot --symbol btcusdt

Parameters:

  • --symbol: Trading pair symbol to plot (e.g., btcusdt)
  • --output: Optional output file path (e.g., plot.png)
  • --show: Display the plot interactively

Example Output:

=== Plot Summary ===
Symbol: BTCUSDT
Data Points: 299
Time Range: 2025-10-12 10:03:33 to 10:03:36 UTC
Duration: 3.0 seconds
Price Range: $111875.52 - $111875.53
Spread Range: $0.0100 - $0.0100
Average Spread: $0.0100
Plot saved to: data/plots/btcusdt/btcusdt_plot.png

Usage Examples:

# Basic plot generation (saves to data/plots/<symbol>/<symbol>_plot.png)
python3 main.py plot --symbol btcusdt

# Save to specific file
python3 main.py plot --symbol btcusdt --output myplot.png

# Display plot interactively
python3 main.py plot --symbol btcusdt --show

# Both save and show
python3 main.py plot --symbol btcusdt --output myplot.png --show

Plot Features:

  • Dual Subplots:
    • Top plot: Bid and Ask prices over time (green/red lines)
    • Bottom plot: Bid-Ask spread over time (blue line)
  • High Resolution: 300 DPI output for publication-quality plots
  • Time Formatting: Automatic time axis formatting with appropriate intervals
  • Summary Statistics: Price ranges, spread analysis, and data point counts
  • Flexible Output: Save to file, display interactively, or both
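
If you need a custom chart rather than the built-in plotter, the same dual-subplot layout can be sketched with matplotlib. This is an illustration only (plot_top_of_book is not part of this project), assuming records loaded from the JSONL files with the schema fields used throughout this README:

import matplotlib.pyplot as plt
from datetime import datetime, timezone

def plot_top_of_book(records, output_path="custom_plot.png"):
    # Wall-clock timestamps provide the human-readable time axis
    times = [datetime.fromtimestamp(r["recv_wall_ms"] / 1000, tz=timezone.utc) for r in records]
    bids = [r["best_bid_px"] for r in records]
    asks = [r["best_ask_px"] for r in records]
    spreads = [a - b for a, b in zip(asks, bids)]

    fig, (ax_px, ax_spread) = plt.subplots(2, 1, sharex=True, figsize=(10, 6))
    ax_px.plot(times, bids, color="green", label="Best bid")
    ax_px.plot(times, asks, color="red", label="Best ask")
    ax_px.set_ylabel("Price")
    ax_px.legend()

    ax_spread.plot(times, spreads, color="blue", label="Spread")
    ax_spread.set_ylabel("Spread")
    ax_spread.set_xlabel("Time (UTC)")
    ax_spread.legend()

    fig.savefig(output_path, dpi=300)  # 300 DPI, matching the built-in plotter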

Metrics Analysis

Analyze captured data for timing precision, data integrity, and market compliance:

python3 main.py metrics --filename data/btcusdt/btcusdt_20251012_0952.jsonl

Parameters:

  • --filename: Path to the captured data file

Example Output:

Analyzing metrics for data/btcusdt/btcusdt_20251012_0952.jsonl
Symbol: BTCUSDT, Venue: binance-spot, Epsilon: 1e-08
Rows: 4092
Duration (mono): 10.097 seconds
Duration (wall): 10.097 seconds
Clock drift: 0.000 seconds
Update ID regressions: 0
Delta T mean: 2.468 ms
Delta T median: 0.069 ms
Delta T 95th percentile: 13.298 ms
Ask >= Bid - ε compliance: 4091/4091 (100.00%)

Metrics Explained:

Metric | Description | Expected Values
Rows | Total number of data points captured | Varies by capture duration
Duration (mono) | Monotonic clock duration (authoritative for replay) | Should match capture time
Duration (wall) | Wall clock duration (for sanity check) | Should be close to mono duration
Clock drift | Difference between wall and mono clocks | Should be < 1 second
Update ID regressions | Count of times update_id decreased | 0 (indicates proper ordering)
Delta T mean | Average time between consecutive updates | ~2-10 ms for active markets
Delta T median | Median time between updates | > 0 (indicates nanosecond precision)
Delta T 95th percentile | 95th percentile of update intervals | < 100 ms for good performance
Ask >= Bid - ε compliance | Percentage of valid ask/bid relationships | 100% (market integrity)

Data Schema

@bookTicker Normalized Schema

The collected market data follows this normalized schema:

Field | Type | Description
venue | string | Exchange/venue identifier
symbol | string | Trading pair symbol
recv_wall_ms | int | Wall clock timestamp in milliseconds (for plotting & human-time)
recv_mono_ns | int | Monotonic timestamp in nanoseconds (authoritative for replay/Δt)
update_id | int | Payload update ID (for gap checks)
best_bid_px | float | Best bid price
best_bid_qty | float | Best bid quantity
best_ask_px | float | Best ask price
best_ask_qty | float | Best ask quantity

This schema ensures consistent data structure across different venues and enables reliable replay functionality.
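
For type-checked downstream code, the schema can be mirrored as a TypedDict. A sketch (the NormalizedTick name is illustrative, not part of this project):

from typing import TypedDict

class NormalizedTick(TypedDict):
    venue: str          # e.g., "binance-spot"
    symbol: str         # e.g., "BTCUSDT"
    recv_wall_ms: int   # wall clock, milliseconds (plotting / human time)
    recv_mono_ns: int   # monotonic clock, nanoseconds (authoritative for replay)
    update_id: int      # payload update ID (gap/ordering checks)
    best_bid_px: float
    best_bid_qty: float
    best_ask_px: float
    best_ask_qty: float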

Timing Precision

The system uses two timing mechanisms:

  1. Monotonic Clock (recv_mono_ns): Nanosecond-precision, monotonic timestamps that are immune to system clock adjustments. This is the authoritative timing source for replay and Δt calculations.

  2. Wall Clock (recv_wall_ms): Millisecond-precision wall clock timestamps for human-readable time and plotting.
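
During replay, the monotonic timestamps drive pacing: the gap between consecutive recv_mono_ns values, divided by the speed multiplier, is how long the engine waits before delivering the next event. A simplified illustration of that idea, not the project's actual replay loop:

import time

def paced_replay(events, handler, speed=1.0):
    """Deliver events to handler, sleeping the inter-event gap scaled by speed."""
    prev_ns = None
    for event in events:
        if prev_ns is not None:
            delta_s = (event["recv_mono_ns"] - prev_ns) / 1e9  # Δt in seconds
            if delta_s > 0:
                time.sleep(delta_s / speed)  # 10x speed -> sleep a tenth as long
        handler(event)
        prev_ns = event["recv_mono_ns"]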

Data Quality Validation

The metrics analysis performs several quality checks:

  • Timing Precision: Ensures nanosecond precision is working (median Δt > 0)
  • Data Ordering: Verifies update_id sequence integrity (regressions = 0)
  • Clock Consistency: Compares wall vs monotonic clock drift
  • Market Integrity: Validates ask >= bid - ε constraint (100% compliance expected)
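
These checks are straightforward to reproduce on a captured file. A minimal sketch covering the ordering, timing-precision, and market-integrity checks, assuming the JSONL schema above and the 1e-08 epsilon shown in the example metrics output (quality_checks is illustrative, not the project's metrics module):

import json
import statistics

EPSILON = 1e-8

def quality_checks(path):
    with open(path) as f:
        rows = [json.loads(line) for line in f if line.strip()]
    mono = [r["recv_mono_ns"] for r in rows]
    deltas_ms = [(b - a) / 1e6 for a, b in zip(mono, mono[1:])]
    regressions = sum(1 for a, b in zip(rows, rows[1:]) if b["update_id"] < a["update_id"])
    compliant = sum(1 for r in rows if r["best_ask_px"] >= r["best_bid_px"] - EPSILON)

    print(f"Rows: {len(rows)}")
    print(f"Delta T median: {statistics.median(deltas_ms):.3f} ms")
    print(f"Update ID regressions: {regressions}")
    print(f"Ask >= Bid - ε compliance: {compliant}/{len(rows)}")

quality_checks("data/btcusdt/btcusdt_20251012_0952.jsonl")  # same file as the metrics example above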

File Structure

market-replay/
├── main.py              # CLI entry point
├── collector.py         # WebSocket data capture
├── normalizer.py        # Data normalization
├── metrics.py           # Analysis and metrics
├── plotter.py           # Plot visualization utilities
├── logger.py            # Data logging utilities with rotation
├── requirements.txt     # Python dependencies
├── data/               # Captured data files
│   ├── btcusdt/       # Symbol-specific directories
│   │   └── btcusdt_*.jsonl  # Rotated data files
│   ├── ethusdt/       # Multiple symbols supported
│   │   └── ethusdt_*.jsonl
│   └── plots/         # Generated plot files
│       ├── btcusdt/   # Symbol-specific plot directories
│       │   └── btcusdt_plot.png
│       └── ethusdt/
│           └── ethusdt_plot.png
└── README.md           # This file

File Rotation & Durability

The data capture system includes robust file management:

  • Automatic Rotation: Files rotate every 10 minutes to prevent oversized files
  • Deterministic Naming: data/<SYMBOL>/<SYMBOL>_YYYYMMDD_HHMM.jsonl format
  • Crash Safety: Validates last line on startup, handles torn writes gracefully
  • Flush Policy: Flushes every 200 lines or 1 second (whichever comes first)
  • Durability: fsync only on rotation and shutdown for optimal performance
  • Real-time Metrics: Tracks rows written, file size, rotations, and flush age
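
The deterministic names make it easy to predict which file a given capture minute lands in. A sketch of the naming convention, assuming UTC minute precision as described above (capture_path is illustrative, not part of this project):

from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

def capture_path(symbol: str, when: Optional[datetime] = None) -> Path:
    """Build the data/<SYMBOL>/<SYMBOL>_YYYYMMDD_HHMM.jsonl path for a capture window."""
    when = when or datetime.now(timezone.utc)
    stamp = when.strftime("%Y%m%d_%H%M")  # UTC timestamp, minute precision
    return Path("data") / symbol / f"{symbol}_{stamp}.jsonl"

print(capture_path("btcusdt"))  # e.g. data/btcusdt/btcusdt_20251012_0952.jsonl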

Quantitative Trading Strategy Integration

High-Level Integration Approach

This tool aims to provide historical L1 market data for quantitative trading strategy development and backtesting. Here's how to integrate it with your trading strategies:

Data Flow to Strategy Engine

1. Capture Data → 2. Replay Engine → 3. Strategy Handler → 4. Performance Analysis

Step 1: Capture Historical Data

python3 main.py datacapture --ticker btcusdt --time 3600  # Capture 1 hour of data

Step 2: Feed Data to Your Strategy Engine

from market_replay import replay_data

# Option 1: Direct integration (if you can modify your strategy)
class MyTradingStrategy:
    def __call__(self, market_event):
        # Your existing strategy logic here
        spread = market_event['best_ask_px'] - market_event['best_bid_px']
        if spread > 0.01:
            self.place_order(market_event)

strategy = MyTradingStrategy()
replay_data('btcusdt', handler=strategy, speed=10.0)

# Option 2: Feed to existing strategy engine (more common)
class DataFeedAdapter:
    def __init__(self, your_strategy_engine):
        self.strategy = your_strategy_engine
    
    def __call__(self, market_event):
        # Convert and feed to your existing strategy
        formatted_data = self.format_data(market_event)
        self.strategy.process_market_data(formatted_data)
    
    def format_data(self, event):
        # Convert to your strategy's expected format
        return {
            'symbol': event['symbol'],
            'bid': event['best_bid_px'],
            'ask': event['best_ask_px'],
            'timestamp': event['recv_wall_ms']
        }

# Feed historical data to your existing strategy
adapter = DataFeedAdapter(your_existing_strategy_engine)
replay_data('btcusdt', handler=adapter, speed=10.0)

Step 3: Analyze Results

print(f"Final P&L: ${strategy.pnl:.2f}")
print(f"Trades executed: {len(strategy.positions)}")

Key Benefits

  • Precise Timing: Nanosecond timestamps for accurate replay
  • Fast Testing: Replay at different speeds for quick iteration
  • Simple Integration: Just implement a __call__ method in your strategy class
  • Multi-Symbol: Test strategies across different trading pairs

Common Integration Patterns

  • Feed to Existing Strategy: Create an adapter that converts replayed data to your strategy's expected format
  • Queue-Based: Put replayed data into a queue that your strategy engine consumes (see the sketch below)
  • File-Based: Write replayed data to files that your strategy can read
  • WebSocket Simulation: Format replayed data to simulate live WebSocket messages

Key Point: Your replay system becomes a data source that feeds into existing strategy infrastructure, not a replacement for it.
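
As an illustration of the queue-based pattern, a handler can simply enqueue events while a consumer thread feeds your engine. A rough sketch (the engine object and its process_market_data method are placeholders, as in the adapter example above):

import queue
import threading

event_queue = queue.Queue()

def queue_handler(market_event):
    # Called by the replay engine for every event; just hand it off
    event_queue.put(market_event)

def consume(engine):
    # Runs in your strategy's thread, decoupled from replay pacing
    while True:
        event = event_queue.get()
        if event is None:  # sentinel: replay finished
            break
        engine.process_market_data(event)

# threading.Thread(target=consume, args=(your_engine,), daemon=True).start()
# replay_data('btcusdt', handler=queue_handler, speed=10.0)
# event_queue.put(None)  # signal the consumer to stop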

How the Handler Mechanism Works

The replay_data() function processes historical data and calls your handler for each market event:

# Your handler gets called for EVERY market data event in the historical file
def my_handler(market_event):
    # This function is called once per market update
    # market_event contains: symbol, bid/ask prices, quantities, timestamps, etc.
    print(f"Processing {market_event['symbol']} at {market_event['recv_wall_ms']}")

# The replay engine:
# 1. Loads all historical data from files
# 2. Calculates precise timing between events
# 3. Sleeps for the exact time interval
# 4. Calls your handler with the next market event
# 5. Repeats until all data is processed
replay_data('btcusdt', handler=my_handler, speed=1.0)

Handler Requirements:

  • Must be a callable (function or class with __call__ method)
  • Receives one parameter: the market event dictionary
  • Should be fast (called frequently during replay)
  • Can maintain state (use classes for complex logic)

This allows you to test your existing strategies with historical data without rewriting your entire system.
