Market Replay CLI

A high-precision market data capture and analysis tool for cryptocurrency exchanges, designed for accurate replay and timing analysis. Perfect for quantitative trading strategy development, backtesting, and performance validation.

Features

  • High-precision timing: Nanosecond-accurate monotonic timestamps for reliable replay
  • Real-time data capture: WebSocket-based streaming from Binance Spot
  • Comprehensive metrics: Timing analysis, data integrity checks, and market compliance validation
  • CLI interface: Easy-to-use command-line tools for data capture and analysis
  • Quant strategy integration: Direct integration with trading strategies for backtesting and live execution
  • Multi-speed replay: Test strategies at 1x real-time or 100x+ for rapid iteration

Installation

pip install -r requirements.txt

Usage

Data Capture

Capture real-time market data from Binance:

python3 main.py datacapture --ticker btcusdt --time 10

Parameters:

  • --ticker: Trading pair symbol (e.g., btcusdt, ethusdt)
  • --time: Capture duration in seconds

Example Output:

Starting collector for btcusdt for 10 seconds
Connection opened
Subscribed to btcusdt@bookTicker
First message received, starting 10s timer
Normalized data: {'venue': 'binance-spot', 'symbol': 'BTCUSDT', 'recv_wall_ms': 1759295523244, 'recv_mono_ns': 8046576053666, 'update_id': 77130729886, 'best_bid_px': 114707.37, 'best_bid_qty': 13.74864, 'best_ask_px': 114707.38, 'best_ask_qty': 3.6567}
...
Data capture completed for btcusdt

Output Files:

  • Data is saved to data/<SYMBOL>/<SYMBOL>_YYYYMMDD_HHMM.jsonl
  • Symbol-specific directories (e.g., data/btcusdt/)
  • Deterministic naming with UTC timestamps (minute precision)
  • Each line contains a JSON object with normalized market data
  • Automatic file rotation every 10 minutes for long-running captures
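
Because each line is a standalone JSON object, captured files are easy to load outside the CLI. A minimal loading sketch, assuming the default data/ layout above (load_capture is illustrative, not part of this project):

import glob
import json

def load_capture(symbol: str):
    """Load all rotated capture files for a symbol, ordered by monotonic timestamp."""
    records = []
    for path in sorted(glob.glob(f"data/{symbol}/{symbol}_*.jsonl")):
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    records.append(json.loads(line))
                except json.JSONDecodeError:
                    continue  # tolerate a torn final line from an interrupted capture
    # recv_mono_ns is the authoritative ordering key for replay
    records.sort(key=lambda r: r["recv_mono_ns"])
    return records

events = load_capture("btcusdt")
print(f"Loaded {len(events)} events")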

Data Replay

Replay captured market data with precise timing for backtesting and analysis:

python3 main.py replay --symbol btcusdt --speed 1.0 --handler print

Parameters:

  • --symbol: Symbol to replay (e.g., btcusdt)
  • --speed: Replay speed multiplier (default: 1.0 for real-time; e.g., 10.0 replays at 10x speed)
  • --handler: Event handler - print (default) or stats
  • --start-time: Optional start timestamp in milliseconds
  • --duration: Optional duration limit in seconds
  • --no-progress: Disable progress updates
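
The flags can be combined; for example, a hypothetical invocation that replays 60 seconds of data at 10x speed through the stats handler with progress updates disabled:

python3 main.py replay --symbol btcusdt --speed 10.0 --handler stats --duration 60 --no-progress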

Example Output (print handler):

=== Market Data Replay ===
Symbol: btcusdt
Total Events: 299
Time Range: 2025-10-12 10:03:33 to 10:03:36 UTC (3.0 seconds)
Replay Speed: 1.0x

Starting replay...

[10:03:47.811] BTCUSDT bid=111875.52 ask=111875.53 spread=0.01
[10:03:47.811] BTCUSDT bid=111875.52 ask=111875.53 spread=0.01
...
[Progress: 140/299 events, 47% complete, 1.4s elapsed]
...

=== Replay Complete ===
Events processed: 299
Actual duration: 3.02 seconds
Average Δt: 10.113ms

Stats Handler Example:

python3 main.py replay --symbol btcusdt --speed 10.0 --handler stats

The stats handler aggregates statistics during replay:

=== Replay Statistics ===
Events processed: 299
Duration: 3.02 seconds
Average rate: 98.9 events/second

Spread Statistics:
  Mean: 0.0100
  Min: 0.0100
  Max: 0.0100

Price Range:
  Min: 111875.52
  Max: 111875.52

Key Features:

  • Nanosecond Precision: Uses recv_mono_ns for exact timing reproduction
  • Speed Control: Replay at any speed (0.1x slow motion to 100x fast forward)
  • Multi-File Support: Seamlessly replays across rotated files
  • Event Handlers: Extensible handler system for custom backtesting logic
  • Progress Tracking: Real-time progress updates during replay

Plot Visualization

Visualize captured market data with comprehensive bid/ask price and spread analysis:

python3 main.py plot --symbol btcusdt

Parameters:

  • --symbol: Trading pair symbol to plot (e.g., btcusdt)
  • --output: Optional output file path (e.g., plot.png)
  • --show: Display the plot interactively

Example Output:

=== Plot Summary ===
Symbol: BTCUSDT
Data Points: 299
Time Range: 2025-10-12 10:03:33 to 10:03:36 UTC
Duration: 3.0 seconds
Price Range: $111875.52 - $111875.53
Spread Range: $0.0100 - $0.0100
Average Spread: $0.0100
Plot saved to: data/plots/btcusdt/btcusdt_plot.png

Usage Examples:

# Basic plot generation (saves to data/plots/<symbol>/<symbol>_plot.png)
python3 main.py plot --symbol btcusdt

# Save to specific file
python3 main.py plot --symbol btcusdt --output myplot.png

# Display plot interactively
python3 main.py plot --symbol btcusdt --show

# Both save and show
python3 main.py plot --symbol btcusdt --output myplot.png --show

Plot Features:

  • Dual Subplots:
    • Top plot: Bid and Ask prices over time (green/red lines)
    • Bottom plot: Bid-Ask spread over time (blue line)
  • High Resolution: 300 DPI output for publication-quality plots
  • Time Formatting: Automatic time axis formatting with appropriate intervals
  • Summary Statistics: Price ranges, spread analysis, and data point counts
  • Flexible Output: Save to file, display interactively, or both
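
If you need a custom chart rather than the built-in plotter, the same dual-subplot layout can be sketched with matplotlib. This is an illustration only (plot_top_of_book is not part of this project), assuming records loaded from the JSONL files with the schema fields used throughout this README:

import matplotlib.pyplot as plt
from datetime import datetime, timezone

def plot_top_of_book(records, output_path="custom_plot.png"):
    # Wall-clock timestamps provide the human-readable time axis
    times = [datetime.fromtimestamp(r["recv_wall_ms"] / 1000, tz=timezone.utc) for r in records]
    bids = [r["best_bid_px"] for r in records]
    asks = [r["best_ask_px"] for r in records]
    spreads = [a - b for a, b in zip(asks, bids)]

    fig, (ax_px, ax_spread) = plt.subplots(2, 1, sharex=True, figsize=(10, 6))
    ax_px.plot(times, bids, color="green", label="Best bid")
    ax_px.plot(times, asks, color="red", label="Best ask")
    ax_px.set_ylabel("Price")
    ax_px.legend()

    ax_spread.plot(times, spreads, color="blue", label="Spread")
    ax_spread.set_ylabel("Spread")
    ax_spread.set_xlabel("Time (UTC)")
    ax_spread.legend()

    fig.savefig(output_path, dpi=300)  # 300 DPI, matching the built-in plotter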

Metrics Analysis

Analyze captured data for timing precision, data integrity, and market compliance:

python3 main.py metrics --filename data/btcusdt/btcusdt_20251012_0952.jsonl

Parameters:

  • --filename: Path to the captured data file

Example Output:

Analyzing metrics for data/btcusdt/btcusdt_20251012_0952.jsonl
Symbol: BTCUSDT, Venue: binance-spot, Epsilon: 1e-08
Rows: 4092
Duration (mono): 10.097 seconds
Duration (wall): 10.097 seconds
Clock drift: 0.000 seconds
Update ID regressions: 0
Delta T mean: 2.468 ms
Delta T median: 0.069 ms
Delta T 95th percentile: 13.298 ms
Ask >= Bid - ε compliance: 4091/4091 (100.00%)

Metrics Explained:

Metric | Description | Expected Values
Rows | Total number of data points captured | Varies by capture duration
Duration (mono) | Monotonic clock duration (authoritative for replay) | Should match capture time
Duration (wall) | Wall clock duration (for sanity check) | Should be close to mono duration
Clock drift | Difference between wall and mono clocks | Should be < 1 second
Update ID regressions | Count of times update_id decreased | 0 (indicates proper ordering)
Delta T mean | Average time between consecutive updates | ~2-10 ms for active markets
Delta T median | Median time between updates | > 0 (indicates nanosecond precision)
Delta T 95th percentile | 95th percentile of update intervals | < 100 ms for good performance
Ask >= Bid - ε compliance | Percentage of valid ask/bid relationships | 100% (market integrity)

Data Schema

@bookTicker Normalized Schema

The collected market data follows this normalized schema:

Field | Type | Description
venue | string | Exchange/venue identifier
symbol | string | Trading pair symbol
recv_wall_ms | int | Wall clock timestamp in milliseconds (for plotting & human-time)
recv_mono_ns | int | Monotonic timestamp in nanoseconds (authoritative for replay/Δt)
update_id | int | Payload update ID (for gap checks)
best_bid_px | float | Best bid price
best_bid_qty | float | Best bid quantity
best_ask_px | float | Best ask price
best_ask_qty | float | Best ask quantity

This schema ensures consistent data structure across different venues and enables reliable replay functionality.
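
For type-checked downstream code, the schema can be mirrored as a TypedDict. A sketch (the NormalizedTick name is illustrative, not part of this project):

from typing import TypedDict

class NormalizedTick(TypedDict):
    venue: str          # e.g., "binance-spot"
    symbol: str         # e.g., "BTCUSDT"
    recv_wall_ms: int   # wall clock, milliseconds (plotting / human time)
    recv_mono_ns: int   # monotonic clock, nanoseconds (authoritative for replay)
    update_id: int      # payload update ID (gap/ordering checks)
    best_bid_px: float
    best_bid_qty: float
    best_ask_px: float
    best_ask_qty: float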

Timing Precision

The system uses two timing mechanisms:

  1. Monotonic Clock (recv_mono_ns): Nanosecond-precision, monotonic timestamps that are immune to system clock adjustments. This is the authoritative timing source for replay and Δt calculations.

  2. Wall Clock (recv_wall_ms): Millisecond-precision wall clock timestamps for human-readable time and plotting.
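
During replay, the monotonic timestamps drive pacing: the gap between consecutive recv_mono_ns values, divided by the speed multiplier, is how long the engine waits before delivering the next event. A simplified illustration of that idea, not the project's actual replay loop:

import time

def paced_replay(events, handler, speed=1.0):
    """Deliver events to handler, sleeping the inter-event gap scaled by speed."""
    prev_ns = None
    for event in events:
        if prev_ns is not None:
            delta_s = (event["recv_mono_ns"] - prev_ns) / 1e9  # Δt in seconds
            if delta_s > 0:
                time.sleep(delta_s / speed)  # 10x speed -> sleep a tenth as long
        handler(event)
        prev_ns = event["recv_mono_ns"]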

Data Quality Validation

The metrics analysis performs several quality checks:

  • Timing Precision: Ensures nanosecond precision is working (median Δt > 0)
  • Data Ordering: Verifies update_id sequence integrity (regressions = 0)
  • Clock Consistency: Compares wall vs monotonic clock drift
  • Market Integrity: Validates ask >= bid - ε constraint (100% compliance expected)
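
These checks are straightforward to reproduce on a captured file. A minimal sketch covering the ordering, timing-precision, and market-integrity checks, assuming the JSONL schema above and the 1e-08 epsilon shown in the example metrics output (quality_checks is illustrative, not the project's metrics module):

import json
import statistics

EPSILON = 1e-8

def quality_checks(path):
    with open(path) as f:
        rows = [json.loads(line) for line in f if line.strip()]
    mono = [r["recv_mono_ns"] for r in rows]
    deltas_ms = [(b - a) / 1e6 for a, b in zip(mono, mono[1:])]
    regressions = sum(1 for a, b in zip(rows, rows[1:]) if b["update_id"] < a["update_id"])
    compliant = sum(1 for r in rows if r["best_ask_px"] >= r["best_bid_px"] - EPSILON)

    print(f"Rows: {len(rows)}")
    print(f"Delta T median: {statistics.median(deltas_ms):.3f} ms")
    print(f"Update ID regressions: {regressions}")
    print(f"Ask >= Bid - ε compliance: {compliant}/{len(rows)}")

quality_checks("data/btcusdt/btcusdt_20251012_0952.jsonl")  # same file as the metrics example above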

File Structure

market-replay/
├── main.py              # CLI entry point
├── collector.py         # WebSocket data capture
├── normalizer.py        # Data normalization
├── metrics.py           # Analysis and metrics
├── plotter.py           # Plot visualization utilities
├── logger.py            # Data logging utilities with rotation
├── requirements.txt     # Python dependencies
├── data/               # Captured data files
│   ├── btcusdt/       # Symbol-specific directories
│   │   └── btcusdt_*.jsonl  # Rotated data files
│   ├── ethusdt/       # Multiple symbols supported
│   │   └── ethusdt_*.jsonl
│   └── plots/         # Generated plot files
│       ├── btcusdt/   # Symbol-specific plot directories
│       │   └── btcusdt_plot.png
│       └── ethusdt/
│           └── ethusdt_plot.png
└── README.md           # This file

File Rotation & Durability

The data capture system includes robust file management:

  • Automatic Rotation: Files rotate every 10 minutes to prevent oversized files
  • Deterministic Naming: data/<SYMBOL>/<SYMBOL>_YYYYMMDD_HHMM.jsonl format
  • Crash Safety: Validates last line on startup, handles torn writes gracefully
  • Flush Policy: Flushes every 200 lines or 1 second (whichever comes first)
  • Durability: fsync only on rotation and shutdown for optimal performance
  • Real-time Metrics: Tracks rows written, file size, rotations, and flush age
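
The deterministic names make it easy to predict which file a given capture minute lands in. A sketch of the naming convention, assuming UTC minute precision as described above (capture_path is illustrative, not part of this project):

from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

def capture_path(symbol: str, when: Optional[datetime] = None) -> Path:
    """Build the data/<SYMBOL>/<SYMBOL>_YYYYMMDD_HHMM.jsonl path for a capture window."""
    when = when or datetime.now(timezone.utc)
    stamp = when.strftime("%Y%m%d_%H%M")  # UTC timestamp, minute precision
    return Path("data") / symbol / f"{symbol}_{stamp}.jsonl"

print(capture_path("btcusdt"))  # e.g. data/btcusdt/btcusdt_20251012_0952.jsonl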

Quantitative Trading Strategy Integration

High-Level Integration Approach

This tool aims to provide historical L1 market data for quantitative trading strategy development and backtesting. Here's how to integrate it with your trading strategies:

Data Flow to Strategy Engine

1. Capture Data → 2. Replay Engine → 3. Strategy Handler → 4. Performance Analysis

Step 1: Capture Historical Data

python3 main.py datacapture --ticker btcusdt --time 3600  # Capture 1 hour of data

Step 2: Feed Data to Your Strategy Engine

from market_replay import replay_data

# Option 1: Direct integration (if you can modify your strategy)
class MyTradingStrategy:
    def __call__(self, market_event):
        # Your existing strategy logic here
        spread = market_event['best_ask_px'] - market_event['best_bid_px']
        if spread > 0.01:
            self.place_order(market_event)

strategy = MyTradingStrategy()
replay_data('btcusdt', handler=strategy, speed=10.0)

# Option 2: Feed to existing strategy engine (more common)
class DataFeedAdapter:
    def __init__(self, your_strategy_engine):
        self.strategy = your_strategy_engine
    
    def __call__(self, market_event):
        # Convert and feed to your existing strategy
        formatted_data = self.format_data(market_event)
        self.strategy.process_market_data(formatted_data)
    
    def format_data(self, event):
        # Convert to your strategy's expected format
        return {
            'symbol': event['symbol'],
            'bid': event['best_bid_px'],
            'ask': event['best_ask_px'],
            'timestamp': event['recv_wall_ms']
        }

# Feed historical data to your existing strategy
adapter = DataFeedAdapter(your_existing_strategy_engine)
replay_data('btcusdt', handler=adapter, speed=10.0)

Step 3: Analyze Results

print(f"Final P&L: ${strategy.pnl:.2f}")
print(f"Trades executed: {len(strategy.positions)}")

Key Benefits

  • Precise Timing: Nanosecond timestamps for accurate replay
  • Fast Testing: Replay at different speeds for quick iteration
  • Simple Integration: Just implement a __call__ method in your strategy class
  • Multi-Symbol: Test strategies across different trading pairs

Common Integration Patterns

  • Feed to Existing Strategy: Create an adapter that converts replayed data to your strategy's expected format
  • Queue-Based: Put replayed data into a queue that your strategy engine consumes (see the sketch below)
  • File-Based: Write replayed data to files that your strategy can read
  • WebSocket Simulation: Format replayed data to simulate live WebSocket messages

Key Point: Your replay system becomes a data source that feeds into existing strategy infrastructure, not a replacement for it.
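
As an illustration of the queue-based pattern, a handler can simply enqueue events while a consumer thread feeds your engine. A rough sketch (the engine object and its process_market_data method are placeholders, as in the adapter example above):

import queue
import threading

event_queue = queue.Queue()

def queue_handler(market_event):
    # Called by the replay engine for every event; just hand it off
    event_queue.put(market_event)

def consume(engine):
    # Runs in your strategy's thread, decoupled from replay pacing
    while True:
        event = event_queue.get()
        if event is None:  # sentinel: replay finished
            break
        engine.process_market_data(event)

# threading.Thread(target=consume, args=(your_engine,), daemon=True).start()
# replay_data('btcusdt', handler=queue_handler, speed=10.0)
# event_queue.put(None)  # signal the consumer to stop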

How the Handler Mechanism Works

The replay_data() function processes historical data and calls your handler for each market event:

# Your handler gets called for EVERY market data event in the historical file
def my_handler(market_event):
    # This function is called once per market update
    # market_event contains: symbol, bid/ask prices, quantities, timestamps, etc.
    print(f"Processing {market_event['symbol']} at {market_event['recv_wall_ms']}")

# The replay engine:
# 1. Loads all historical data from files
# 2. Calculates precise timing between events
# 3. Sleeps for the exact time interval
# 4. Calls your handler with the next market event
# 5. Repeats until all data is processed
replay_data('btcusdt', handler=my_handler, speed=1.0)

Handler Requirements:

  • Must be a callable (function or class with __call__ method)
  • Receives one parameter: the market event dictionary
  • Should be fast (called frequently during replay)
  • Can maintain state (use classes for complex logic)

This allows you to test your existing strategies with historical data without rewriting your entire system.
