@andygrove andygrove commented Feb 11, 2026

Which issue does this PR close?

Part of #3445

Rationale for this change

These changes were originally part of #2521, which also included Python scripts for analyzing the logged data. This PR contains only the logging changes, which I hope makes review easier. Analysis scripts can be discussed in a separate issue or PR.

When debugging memory reservation issues in Comet, it is helpful to see all memory pool interactions (grow, shrink, try_grow) with task IDs and consumer names. This PR adds a lightweight opt-in logging decorator for the native memory pool.

What changes are included in this PR?

  • Add new config spark.comet.debug.memory (default: false)
  • Add new LoggingPool that wraps the existing memory pool when the config is enabled
  • Log all grow, shrink, and try_grow calls with task attempt ID, consumer name, and result (Ok/Err)
  • Add documentation for the new debugging feature
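
To illustrate the decorator idea, here is a minimal, self-contained sketch. It is not the code in this PR: the `Pool` trait, `BoundedPool`, and the in-memory `lines` buffer are hypothetical simplifications so that the example builds without any dependencies. The real memory pool interface differs, and the format of the `grow` line is an assumption modeled on the `shrink` line in the example output.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

/// Hypothetical, simplified stand-in for the native memory pool interface.
trait Pool {
    fn grow(&self, consumer: &str, bytes: usize);
    fn shrink(&self, consumer: &str, bytes: usize);
    fn try_grow(&self, consumer: &str, bytes: usize) -> Result<(), String>;
}

/// Toy pool with a fixed byte limit, used only to drive the example.
struct BoundedPool {
    limit: usize,
    used: AtomicUsize,
}

impl Pool for BoundedPool {
    fn grow(&self, _consumer: &str, bytes: usize) {
        self.used.fetch_add(bytes, Ordering::SeqCst);
    }

    fn shrink(&self, _consumer: &str, bytes: usize) {
        self.used.fetch_sub(bytes, Ordering::SeqCst);
    }

    fn try_grow(&self, _consumer: &str, bytes: usize) -> Result<(), String> {
        // Atomically reserve the bytes only if the new total stays within the limit.
        self.used
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |used| {
                used.checked_add(bytes).filter(|total| *total <= self.limit)
            })
            .map(|_| ())
            .map_err(|used| format!("cannot reserve {bytes} bytes ({used} already in use)"))
    }
}

/// Decorator: records every call, then delegates to the wrapped pool unchanged.
struct LoggingPool<P: Pool> {
    task_attempt_id: u64,
    inner: P,
    /// Collected in memory for the example; a real implementation would
    /// presumably write to the application log instead.
    lines: Mutex<Vec<String>>,
}

impl<P: Pool> LoggingPool<P> {
    fn new(task_attempt_id: u64, inner: P) -> Self {
        Self { task_attempt_id, inner, lines: Mutex::new(Vec::new()) }
    }

    fn log(&self, line: String) {
        println!("{line}");
        self.lines.lock().unwrap().push(line);
    }
}

impl<P: Pool> Pool for LoggingPool<P> {
    fn grow(&self, consumer: &str, bytes: usize) {
        // Assumed format: the PR's example output does not show a grow line.
        self.log(format!("[Task {}] MemoryPool[{}].grow({})", self.task_attempt_id, consumer, bytes));
        self.inner.grow(consumer, bytes);
    }

    fn shrink(&self, consumer: &str, bytes: usize) {
        self.log(format!("[Task {}] MemoryPool[{}].shrink({})", self.task_attempt_id, consumer, bytes));
        self.inner.shrink(consumer, bytes);
    }

    fn try_grow(&self, consumer: &str, bytes: usize) -> Result<(), String> {
        // Delegate first so the outcome can be included in the log line.
        let result = self.inner.try_grow(consumer, bytes);
        let outcome = if result.is_ok() { "Ok" } else { "Err" };
        self.log(format!(
            "[Task {}] MemoryPool[{}].try_grow({}) returning {}",
            self.task_attempt_id, consumer, bytes, outcome
        ));
        result
    }
}

/// Exercises the decorator and returns the lines it recorded.
fn demo() -> Vec<String> {
    let inner = BoundedPool { limit: 300_000_000, used: AtomicUsize::new(0) };
    let pool = LoggingPool::new(486, inner);
    let _ = pool.try_grow("ExternalSorter[6]", 256_232_960); // fits within the limit
    let _ = pool.try_grow("ExternalSorter[6]", 257_820_416); // would exceed the limit
    pool.shrink("ExternalSorter[6]", 10_485_760);
    let lines = pool.lines.lock().unwrap().clone();
    lines
}
```

Because the wrapper only observes each call and forwards it, the result returned to the caller is exactly what the inner pool produced.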

Example output:

[Task 486] MemoryPool[ExternalSorter[6]].try_grow(256232960) returning Ok
[Task 486] MemoryPool[ExternalSorter[6]].try_grow(257820416) returning Err
[Task 486] MemoryPool[ExternalSorterMerge[6]].shrink(10485760)
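
For anyone who wants to post-process this output, the line layout can be read back with plain string splitting. The sketch below is only an illustration based on the three lines above. `PoolEvent` and `parse_line` are hypothetical names and are not part of this PR. It assumes the consumer name is everything between `MemoryPool[` and the last `].` on the line, which is what allows names such as `ExternalSorter[6]` that contain brackets themselves.

```rust
/// Hypothetical record for one logged pool call (not part of this PR).
#[derive(Debug, PartialEq)]
struct PoolEvent {
    task_id: u64,
    consumer: String,
    op: String,
    bytes: u64,
    /// Some(true) for "returning Ok", Some(false) for "returning Err",
    /// None when the line carries no result (as in the shrink example).
    ok: Option<bool>,
}

/// Parses one line in the format shown in the example output.
/// Returns None for anything that does not match that layout.
fn parse_line(line: &str) -> Option<PoolEvent> {
    let rest = line.trim().strip_prefix("[Task ")?;
    let (task, rest) = rest.split_once("] MemoryPool[")?;
    // Split on the last "]." so brackets inside the consumer name are preserved.
    let (consumer, rest) = rest.rsplit_once("].")?;
    let (op, rest) = rest.split_once('(')?;
    let (bytes, tail) = rest.split_once(')')?;
    let ok = match tail.trim() {
        "" => None,
        "returning Ok" => Some(true),
        "returning Err" => Some(false),
        _ => return None,
    };
    Some(PoolEvent {
        task_id: task.parse().ok()?,
        consumer: consumer.to_string(),
        op: op.to_string(),
        bytes: bytes.parse().ok()?,
        ok,
    })
}
```

Grouping the parsed events by `task_id` and `consumer` is one way to see which operator was holding memory when a `try_grow` failed.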

How are these changes tested?

Tested manually with TPC-DS and TPC-H benchmarks. The logging pool is a thin decorator that delegates every operation to the underlying pool, so it does not change memory accounting behavior; the only addition is the log output.

🤖 Generated with Claude Code

Add a new `spark.comet.debug.memory` config that wraps the native memory
pool in a LoggingPool decorator, logging all grow/shrink/try_grow calls
with task ID and consumer name. This helps diagnose memory reservation
issues in production environments.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@andygrove andygrove marked this pull request as draft February 11, 2026 15:29