Skip to content

Parallel merge for SortPreservingMergeExec #21381

@zhuqi-lucas

Description

@zhuqi-lucas

Is your feature request related to a problem or challenge?

When SortExec is eliminated via sort pushdown (statistics-based file reordering), SortPreservingMergeExec reads directly from I/O-bound sources instead of from SortExec's in-memory buffer. Currently SPM does a single K-way merge of all input streams, which can become a bottleneck when there are many partitions.

Describe the solution you'd like

Implement parallel merge for SortPreservingMergeExec: split the N input streams into groups, merge each group in parallel, then merge the intermediate results. This creates a tree of merges instead of a single flat K-way merge.

For example, with 8 input streams:

Level 1 (parallel):  merge(s1,s2), merge(s3,s4), merge(s5,s6), merge(s7,s8)
Level 2 (parallel):  merge(m1,m2), merge(m3,m4)
Level 3:             merge(m5,m6) → final output

This would be especially beneficial when:

  • Sort elimination removes the buffering SortExec, making SPM I/O-bound
  • Many partitions with I/O-bound sources
  • Large datasets where merge computation itself becomes a bottleneck

Additional context

Suggested by @Dandandan in #21182 (comment):

Also slightly looking forward: I think we could benefit from parallel merge (e.g. finding some split in the n streams and merging in parallel) in these situations where sorting becomes mostly about merging.

Related: DuckDB implements a similar parallel merge sort strategy.

Parent issue: #17348

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions