
Add Python scripts for analyzing memory debug logs #3490

@andygrove

Background

PR #2521 added memory reservation debug logging (spark.comet.debug.memory config and LoggingPool wrapper). That PR also contained Python scripts for parsing and visualizing the memory debug logs, but those scripts were not merged. This issue tracks adding analysis/visualization scripts as a follow-up.

Log Format

When spark.comet.debug.memory=true is set, the LoggingPool produces log lines like:

[Task 486] MemoryPool[ExternalSorter[6]].try_grow(256232960) returning Ok
[Task 486] MemoryPool[ExternalSorter[6]].try_grow(257820416) returning Err
[Task 486] MemoryPool[ExternalSorterMerge[6]].shrink(10485760)
[Task 486] MemoryPool[ExternalSorterMerge[6]].try_grow(68928) returning Ok
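
Every field (task id, consumer name, method, size, and result) is recoverable with a single regular expression. The pattern below is inferred from these example lines, not taken from the PR:

import re

line = "[Task 486] MemoryPool[ExternalSorter[6]].try_grow(256232960) returning Ok"
m = re.search(r"\[Task (\d+)\] MemoryPool\[(.+)\]\.(\w+)\((\d+)\)\s*(.*)", line)
print(m.groups())
# ('486', 'ExternalSorter[6]', 'try_grow', '256232960', 'returning Ok')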

Proposed Scripts

1. dev/scripts/mem_debug_to_csv.py — Parse logs to CSV

Parses the Spark executor/worker log file, filters by task ID, and tracks cumulative memory allocation per consumer (operator).

Key details from the #2521 implementation:

  • Uses regex to parse lines matching [Task <id>] MemoryPool[<consumer>].<method>(<size>)
  • Tracks running total per consumer: grow/try_grow add to allocation, shrink subtracts
  • For try_grow failures (line contains "Err"), the allocation is not updated but the row is annotated with an ERR label
  • Outputs CSV with columns: name, size, label
  • Accepts --task <id> to filter to a specific Spark task and --file <path> for the log file
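
A minimal sketch of what this script might look like, folding in the review fixes listed further below. The argument handling, CSV header row, and regex are assumptions, not the merged implementation:

#!/usr/bin/env python3
# Sketch of dev/scripts/mem_debug_to_csv.py -- illustrative only.
import argparse
import re

# Matches e.g. "[Task 486] MemoryPool[ExternalSorter[6]].try_grow(256232960) returning Ok"
PATTERN = re.compile(r"\[Task (\d+)\] MemoryPool\[(.+)\]\.(\w+)\((\d+)\)\s*(.*)")

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("file", help="path to the Spark executor/worker log")
    parser.add_argument("--task", help="filter to a single Spark task id")
    args = parser.parse_args()

    alloc = {}  # running total of reserved bytes per consumer
    print("name,size,label")
    with open(args.file) as f:
        for line in f:
            m = PATTERN.search(line)
            if m is None:
                continue
            task, consumer, method, size, tail = m.groups()
            if args.task is not None and task != args.task:
                continue
            label = ""
            if "Err" in tail:
                label = "ERR"  # failed try_grow: annotate the row, keep the total
            elif method in ("grow", "try_grow"):
                alloc[consumer] = alloc.get(consumer, 0) + int(size)
            elif method == "shrink":
                # .get() handles a consumer whose first event is a shrink
                alloc[consumer] = alloc.get(consumer, 0) - int(size)
            print(f"{consumer},{alloc.get(consumer, 0)},{label}")

if __name__ == "__main__":
    main()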

2. dev/scripts/plot_memory_usage.py — Visualize memory usage

Reads the CSV output and produces a stacked area chart showing memory usage over time by consumer (operator).

Key details from the #2521 implementation:

  • Uses pandas and matplotlib
  • Creates a time index from row order (each row = sequential event)
  • Pivots data so each consumer is a column, forward-fills missing values
  • Renders a stacked area chart (plt.stackplot)
  • Annotates try_grow failures with red vertical dashed lines labeled "ERR"
  • Saves chart as PNG (same path as CSV but with _chart.png suffix)
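
A matching sketch for the plotting side, assuming the CSV header from the sketch above; the pandas and matplotlib calls are standard, everything else is illustrative:

#!/usr/bin/env python3
# Sketch of dev/scripts/plot_memory_usage.py -- illustrative only.
import sys

import matplotlib.pyplot as plt
import pandas as pd

def main(csv_path):
    df = pd.read_csv(csv_path)
    df["event"] = range(len(df))  # row order doubles as the time axis

    # one column per consumer, forward-filled so totals persist between events
    pivot = df.pivot(index="event", columns="name", values="size")
    pivot = pivot.ffill().fillna(0)  # ffill(), not the deprecated fillna(method=...)

    fig, ax = plt.subplots(figsize=(12, 6))
    ax.stackplot(pivot.index, pivot.T.values, labels=pivot.columns)

    # red dashed vertical line at each failed try_grow
    err_events = df.loc[df["label"] == "ERR", "event"]
    for i, event in enumerate(err_events):
        ax.axvline(event, color="red", linestyle="--",
                   label="ERR" if i == 0 else None)

    ax.set_xlabel("event")
    ax.set_ylabel("bytes reserved")
    ax.legend(loc="upper left")
    fig.savefig(csv_path.replace(".csv", "_chart.png"))

if __name__ == "__main__":
    main(sys.argv[1])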

Suggestions from PR #2521 Code Review

The following review feedback should be incorporated:

  1. Use #!/usr/bin/env python3 shebang and make scripts executable (chmod +x)
  2. Fix CSV formatting — use f-strings (f"{consumer},{alloc[consumer]}") instead of print(consumer, ",", alloc[consumer]) to avoid extra spaces around values
  3. Fix ERR label handling — the original implementation printed two rows for the same event on try_grow failure (one with ERR label, one without). Use a label variable so only one row is printed per event
  4. Handle first occurrence being shrink — the original code assumed the first event for a consumer is always grow/try_grow, but the first event could be a shrink
  5. Fix --task argument — int(None) fails with TypeError when --task is not provided; make it optional or a positional arg
  6. Consider making --file a positional argument for simpler CLI usage
  7. Use pandas.DataFrame.ffill() instead of fillna(method='ffill'), which has been deprecated since pandas 2.1.0
  8. Consider logging backtraces — when the backtrace feature is enabled, it could be useful to log backtraces on every call (not just errors) to trace precise allocation origins. This was suggested as an optional trace!-level enhancement to the Rust LoggingPool
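
Items 2 and 3 are easiest to see in a tiny standalone demo (all values below are made up):

# Demo of review items 2 and 3 (hypothetical values)
consumer = "ExternalSorter[6]"
alloc = {consumer: 256232960}

# Item 2: comma-separated print() inserts spaces around each argument...
print(consumer, ",", alloc[consumer])   # ExternalSorter[6] , 256232960
# ...while an f-string yields a clean CSV row:
print(f"{consumer},{alloc[consumer]}")  # ExternalSorter[6],256232960

# Item 3: compute the label once so each event emits exactly one row
failed = True  # e.g. a try_grow that returned Err
label = "ERR" if failed else ""
print(f"{consumer},{alloc[consumer]},{label}")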

Example Workflow

# Step 1: Run Spark with memory debug logging enabled
spark-submit --conf spark.comet.debug.memory=true ...

# Step 2: Parse the log and generate CSV for a specific task
python3 dev/scripts/mem_debug_to_csv.py --task 486 /path/to/executor/log > /tmp/mem.csv

# Step 3: Generate a chart
python3 dev/scripts/plot_memory_usage.py /tmp/mem.csv
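
With the suffix convention described earlier, step 3 writes the chart to /tmp/mem_chart.png alongside the CSV.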

Reference

  • PR #2521 — memory reservation debug logging (spark.comet.debug.memory and the LoggingPool wrapper)
