-
Notifications
You must be signed in to change notification settings - Fork 2k
Parallel merge for SortPreservingMergeExec #21381
Description
Is your feature request related to a problem or challenge?
When SortExec is eliminated via sort pushdown (statistics-based file reordering), SortPreservingMergeExec reads directly from I/O-bound sources instead of from SortExec's in-memory buffer. Currently SPM does a single K-way merge of all input streams, which can become a bottleneck when there are many partitions.
Describe the solution you'd like
Implement parallel merge for SortPreservingMergeExec: split the N input streams into groups, merge each group in parallel, then merge the intermediate results. This creates a tree of merges instead of a single flat K-way merge.
For example, with 8 input streams:
Level 1 (parallel): merge(s1,s2), merge(s3,s4), merge(s5,s6), merge(s7,s8)
Level 2 (parallel): merge(m1,m2), merge(m3,m4)
Level 3: merge(m5,m6) → final output
This would be especially beneficial when:
- Sort elimination removes the buffering
SortExec, making SPM I/O-bound - Many partitions with I/O-bound sources
- Large datasets where merge computation itself becomes a bottleneck
Additional context
Suggested by @Dandandan in #21182 (comment):
Also slightly looking forward: I think we could benefit from parallel merge (e.g. finding some split in the n streams and merging in parallel) in these situations where sorting becomes mostly about merging.
Related: DuckDB implements a similar parallel merge sort strategy.
Parent issue: #17348