This Python script parses pool metrics from DSE system logs and outputs them to CSV format.
- Parses the DSE pool statistics pattern from system logs
- Extracts timestamps from log entries
- Filters pools by name (case-insensitive)
- Outputs to CSV with specified columns: timestamp, pool_name, shared, stolen, completed, blocked, all_time_blocked
- Supports multiple input files
- Handles both stdout and file output
- Generates interactive HTML visualizations of completed metrics over time
- Compares pool metrics across multiple nodes on the same graph
python3 parse_pool_metrics.py --pools "CompactionExecutor,GossipStage" system.logpython3 parse_pool_metrics.py --pools "TPC" system.logpython3 parse_pool_metrics.py --metrics "Active,Completed,Blocked" system.logpython3 parse_pool_metrics.py --pools "CompactionExecutor" --metrics "Active,Completed" system.logpython3 parse_pool_metrics.py system1.log system2.log system3.logpython3 parse_pool_metrics.py --output custom_name.csv system.logBy default, the script creates a CSV file with a timestamp in the filename:
python3 parse_pool_metrics.py system.log
# Creates: pool_metrics_20251007_134328.csv

Command-line arguments:

- files: One or more log files to parse (required)
- --pools: Comma-separated list of pool names to filter (optional)
- --metrics: Comma-separated list of metrics to include (optional)
- --output, -o: Output CSV file (optional, defaults to pool_metrics_<timestamp>.csv)
- --visualize, --graph: Generate an interactive HTML visualization of completed metrics over time
- --html-output: Output HTML file for the visualization (optional, defaults to pool_metrics_<timestamp>.html)
- --nodes: Comma-separated list of node names/labels matching file order (optional, inferred from filenames if not provided)
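As a rough illustration, the options above could be declared with argparse along these lines (a minimal sketch of an assumed structure, not the script's actual code):

```python
import argparse

def build_parser():
    # Hypothetical declarations mirroring the options documented above.
    parser = argparse.ArgumentParser(
        description="Parse DSE pool metrics from system logs")
    parser.add_argument("files", nargs="+", help="One or more log files to parse")
    parser.add_argument("--pools", help="Comma-separated list of pool names to filter")
    parser.add_argument("--metrics", help="Comma-separated list of metrics to include")
    parser.add_argument("--output", "-o", help="Output CSV file")
    parser.add_argument("--visualize", "--graph", action="store_true",
                        help="Generate an interactive HTML visualization")
    parser.add_argument("--html-output", help="Output HTML file for the visualization")
    parser.add_argument("--nodes", help="Comma-separated node labels matching file order")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args()
    pools = [p.strip() for p in args.pools.split(",")] if args.pools else None
```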
The script supports two output formats:
When you don't specify --pools, each pool gets its own row:
timestamp,pool_name,shared,stolen,completed,blocked,all_time_blocked
"2025-10-03 10:08:32,368",CompactionExecutor,N/A,N/A,311,0,0
"2025-10-03 10:08:32,368",GossipStage,N/A,N/A,665,0,0
"2025-10-03 10:13:32,520",CompactionExecutor,N/A,N/A,636,0,0When you specify --pools, each pool gets its own set of columns:
timestamp,CacheCleanupExecutor-Active,CacheCleanupExecutor-Pending,CacheCleanupExecutor-Backpressure,CacheCleanupExecutor-Delayed,CacheCleanupExecutor-Shared,CacheCleanupExecutor-Stolen,CacheCleanupExecutor-Completed,CacheCleanupExecutor-Blocked,CacheCleanupExecutor-All_Time_Blocked
"2025-10-03 10:08:32,368",0,0,N/A,N/A,N/A,N/A,0,0,0
"2025-10-03 10:13:32,520",0,0,N/A,N/A,N/A,N/A,0,0,0Each selected pool gets columns for each selected statistic:
- {PoolName}-Active
- {PoolName}-Pending
- {PoolName}-Backpressure
- {PoolName}-Delayed
- {PoolName}-Shared
- {PoolName}-Stolen
- {PoolName}-Completed
- {PoolName}-Blocked
- {PoolName}-All_Time_Blocked
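For downstream analysis, these column names can be rebuilt the same way. A small sketch using only the standard library (the helper names here are illustrative, not part of the script):

```python
import csv

# Illustrative helper: the wide-format column name for a pool/metric pair.
def column_name(pool, metric):
    return f"{pool}-{metric}"

# Read a wide-format CSV and yield one pool's Completed counts over time.
def completed_series(csv_path, pool):
    col = column_name(pool, "Completed")
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            value = row.get(col, "N/A")
            if value != "N/A":
                yield row["timestamp"], int(value)

# Example:
# for ts, done in completed_series("pool_metrics_20251007_134328.csv", "CompactionExecutor"):
#     print(ts, done)
```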
You can select which metrics to include in the output using the --metrics parameter:
Available metrics:
- Active - Active tasks
- Pending - Pending tasks
- Backpressure - Backpressure status
- Delayed - Delayed tasks
- Shared - Shared tasks
- Stolen - Stolen tasks
- Completed - Completed tasks
- Blocked - Blocked tasks
- All_Time_Blocked - All time blocked tasks
Examples:
# Only include Active and Completed metrics
python3 parse_pool_metrics.py --metrics "Active,Completed" system.log
# Include all metrics (default behavior)
python3 parse_pool_metrics.py system.log

The script can generate interactive HTML visualizations that track the "completed" metric over time. This is particularly useful for comparing pool metrics across multiple nodes.
python3 parse_pool_metrics.py --visualize system.log

This creates an HTML file (e.g., pool_metrics_20251007_134328.html) that you can open in any web browser.
When analyzing logs from multiple nodes, you can compare pool metrics across nodes:
# With explicit node labels
python3 parse_pool_metrics.py --visualize --nodes "node1,node2,node3" node1.log node2.log node3.log
# Node names inferred from filenames
python3 parse_pool_metrics.py --visualize node1-system.log node2-system.log node3-system.log
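When --nodes is omitted, node labels are inferred from the filenames. A plausible sketch of that inference (the exact rule the script applies may differ):

```python
import os

# Illustrative guess at filename-based node labels: drop the directory,
# the extension, and a trailing "-system" suffix if present.
def infer_node_name(path):
    name = os.path.splitext(os.path.basename(path))[0]
    if name.endswith("-system"):
        name = name[: -len("-system")]
    return name

# infer_node_name("node1-system.log") -> "node1"
# infer_node_name("logs/node2.log")   -> "node2"
```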
You can combine pool filtering with visualization:

python3 parse_pool_metrics.py --visualize --pools "CompactionExecutor" node1.log node2.log

The interactive HTML dashboard includes:
- Pool Selector: A dropdown menu to select which pool to view
- Node Checkboxes: When a pool is selected, checkboxes appear for each node that has data for that pool
- Default Behavior: All nodes are selected by default when a pool is chosen
- Interactive Chart: Plotly.js-powered chart with zoom, pan, and hover tooltips
- Time Series Display: Shows completed metrics over time with timestamps on the x-axis
To use the dashboard:

- Select a pool from the dropdown menu
- Node checkboxes appear for all nodes that have data for the selected pool
- All nodes are checked by default, showing all node traces on the graph
- Uncheck nodes to hide their traces and focus on specific nodes
- The graph updates in real-time as you toggle nodes
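Because the script has no external dependencies, the dashboard is presumably built by writing a standalone HTML file that loads Plotly.js from a CDN. A simplified sketch of that approach (not the script's actual template; the function name and data layout are illustrative):

```python
import json

# Simplified sketch: embed per-node "completed" time series into a standalone
# HTML page that loads Plotly.js from a CDN.
def write_pool_chart(html_path, pool_name, series_by_node):
    traces = [
        {"name": node,
         "x": [t for t, _ in points],
         "y": [v for _, v in points],
         "mode": "lines+markers"}
        for node, points in series_by_node.items()
    ]
    html = f"""<!DOCTYPE html>
<html>
<head><script src="https://cdn.plot.ly/plotly-2.27.0.min.js"></script></head>
<body>
  <div id="chart"></div>
  <script>
    Plotly.newPlot("chart", {json.dumps(traces)},
                   {{"title": "{pool_name} - Completed over time"}});
  </script>
</body>
</html>"""
    with open(html_path, "w") as f:
        f.write(html)

# write_pool_chart("pool_metrics.html", "CompactionExecutor",
#                  {"node1": [("2025-10-03 10:08:32", 311), ("2025-10-03 10:13:32", 636)]})
```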
When processing multiple log files, the CSV output includes a node_name column (when multiple nodes are detected):
timestamp,node_name,pool_name,active,pending,completed,blocked,all_time_blocked
"2025-10-03 10:08:32,368",node1,CompactionExecutor,0,0,311,0,0
"2025-10-03 10:08:32,368",node2,CompactionExecutor,0,0,298,0,0The script looks for the following pattern in log files:
The script looks for the following pattern in log files:

Pool Name Active Pending Backpressure Delayed Shared Stolen Completed Blocked All Time Blocked
It extracts the timestamp from the line immediately preceding this header and parses all subsequent pool data lines until it encounters an empty line or a new log entry.
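A rough sketch of that parsing logic using only the standard library (the regexes and field layout here are assumptions, not the script's exact implementation):

```python
import re

# Assumed formats: a log-line timestamp like "2025-10-03 10:08:32,368" and
# pool data lines of the form "<PoolName> <value> <value> ...".
TIMESTAMP_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")
HEADER_RE = re.compile(r"^Pool Name\s+Active\s+Pending")
POOL_LINE_RE = re.compile(r"^(\S+)\s+(.*\S)\s*$")

def parse_pool_blocks(lines):
    """Yield (timestamp, pool_name, values) tuples from raw log lines."""
    timestamp = None
    in_block = False
    for line in lines:
        if in_block:
            if not line.strip() or TIMESTAMP_RE.search(line):
                in_block = False  # an empty line or a new log entry ends the block
            else:
                m = POOL_LINE_RE.match(line)
                if m:
                    yield timestamp, m.group(1), m.group(2).split()
        if not in_block:
            if HEADER_RE.match(line.strip()):
                in_block = True
            else:
                m = TIMESTAMP_RE.search(line)
                if m:
                    timestamp = m.group(1)  # remember the line preceding the header
```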
- Python 3.6+
- No external dependencies (uses only standard library)
See example_usage.py for programmatic usage examples.