Parallel merge for SortPreservingMergeExec

**Is your feature request related to a problem or challenge?**

When `SortExec` is eliminated via sort pushdown (statistics-based file reordering), `SortPreservingMergeExec` (SPM) reads directly from I/O-bound sources instead of from `SortExec`'s in-memory buffer. Currently, SPM performs a single K-way merge over all input streams, which can become a bottleneck when there are many partitions.
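For reference, a flat K-way merge can be sketched as follows. This is a minimal illustration, not DataFusion's implementation: in-memory `Vec<i64>` runs stand in for sorted record-batch streams, a `BinaryHeap` min-heap stands in for whatever merge structure SPM uses internally, and `kway_merge` is a hypothetical name. The point it shows is that every output row passes through one loop, no matter how many inputs there are.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Flat K-way merge of already-sorted inputs (hypothetical helper).
/// `Vec<i64>` runs stand in for sorted record-batch streams.
fn kway_merge(inputs: &[Vec<i64>]) -> Vec<i64> {
    let total: usize = inputs.iter().map(|v| v.len()).sum();
    let mut out = Vec::with_capacity(total);
    // Heap entries: (value, input index, position within that input).
    // `Reverse` turns Rust's max-heap into a min-heap.
    let mut heap = BinaryHeap::new();
    for (i, input) in inputs.iter().enumerate() {
        if let Some(&v) = input.first() {
            heap.push(Reverse((v, i, 0usize)));
        }
    }
    // Every output row passes through this single loop on one thread.
    while let Some(Reverse((v, i, pos))) = heap.pop() {
        out.push(v);
        // Refill from the input the popped value came from.
        if let Some(&next) = inputs[i].get(pos + 1) {
            heap.push(Reverse((next, i, pos + 1)));
        }
    }
    out
}

fn main() {
    let inputs = vec![vec![1, 4, 7], vec![2, 5, 8], vec![3, 6, 9]];
    assert_eq!(kway_merge(&inputs), (1..=9).collect::<Vec<i64>>());
}
```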

**Describe the solution you'd like**

Implement parallel merge for `SortPreservingMergeExec`: split the N input streams into groups, merge each group in parallel, then merge the intermediate results. This creates a tree of merges instead of a single flat K-way merge.

For example, with 8 input streams:
```
Level 1 (parallel):  merge(s1,s2) → m1, merge(s3,s4) → m2, merge(s5,s6) → m3, merge(s7,s8) → m4
Level 2 (parallel):  merge(m1,m2) → m5, merge(m3,m4) → m6
Level 3:             merge(m5,m6) → final output
```
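A minimal sketch of the merge tree above, assuming in-memory `Vec<i64>` runs in place of sorted streams. `merge_two` and `tree_merge` are hypothetical helpers, and `std::thread::scope` threads stand in for DataFusion's async tasks. Each level merges adjacent pairs concurrently; an odd run is carried to the next level unchanged.

```rust
use std::thread;

/// Two-way merge of sorted slices (hypothetical helper).
fn merge_two(a: &[i64], b: &[i64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(a.len() + b.len());
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        if a[i] <= b[j] {
            out.push(a[i]);
            i += 1;
        } else {
            out.push(b[j]);
            j += 1;
        }
    }
    // At most one of these tails is non-empty.
    out.extend_from_slice(&a[i..]);
    out.extend_from_slice(&b[j..]);
    out
}

/// Tree of merges: each level merges adjacent pairs on separate threads
/// and waits for all of them before starting the next level.
fn tree_merge(mut runs: Vec<Vec<i64>>) -> Vec<i64> {
    if runs.is_empty() {
        return Vec::new();
    }
    while runs.len() > 1 {
        let next: Vec<Vec<i64>> = thread::scope(|s| {
            let handles: Vec<_> = runs
                .chunks(2)
                .map(|pair| {
                    s.spawn(move || match pair {
                        [a, b] => merge_two(a, b),
                        // Odd run out: carry it to the next level as-is.
                        [a] => a.clone(),
                        _ => unreachable!("chunks(2) yields 1 or 2 runs"),
                    })
                })
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        });
        runs = next;
    }
    runs.pop().unwrap()
}

fn main() {
    // Eight sorted inputs, as in the example above (s1..s8).
    let runs: Vec<Vec<i64>> = (0..8).map(|i| vec![i, i + 8, i + 16]).collect();
    assert_eq!(tree_merge(runs), (0..24).collect::<Vec<i64>>());
}
```

In this sketch each level waits for the previous one to finish; a streaming operator would presumably connect levels through channels so they run concurrently. Note also that the top level is still a single two-way merge through which every row passes, so the tree parallelizes the lower levels but not the final one.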

This would be especially beneficial when:
- Sort elimination removes the buffering `SortExec`, making SPM I/O-bound
- There are many partitions with I/O-bound sources
- Datasets are large enough that the merge computation itself becomes a bottleneck

**Additional context**

Suggested by @Dandandan in https://github.com/apache/datafusion/pull/21182#discussion_r3036542606:

> Also slightly looking forward: I think we could benefit from _parallel merge_ (e.g. finding some split in the n streams and merging in parallel) in these situations where sorting becomes mostly about merging.
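One possible reading of "finding some split in the n streams" is range partitioning: pick sorted splitter keys, binary-search each input for its cut points, and merge each key range independently. Because the ranges are disjoint and ordered, concatenating the per-range results yields the fully sorted output, and no single merge sees every row. This is a hedged sketch, not a proposed design: `split_merge`, `kway_merge`, and the splitter values are hypothetical, inputs are in-memory `Vec<i64>` runs, and how to choose good splitters (for example by sampling the inputs) is left open.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::thread;

/// K-way merge of sorted slices using a min-heap (hypothetical helper).
fn kway_merge(inputs: &[&[i64]]) -> Vec<i64> {
    let mut out = Vec::with_capacity(inputs.iter().map(|s| s.len()).sum());
    let mut heap = BinaryHeap::new();
    for (i, s) in inputs.iter().enumerate() {
        if let Some(&v) = s.first() {
            heap.push(Reverse((v, i, 0usize)));
        }
    }
    while let Some(Reverse((v, i, pos))) = heap.pop() {
        out.push(v);
        if let Some(&next) = inputs[i].get(pos + 1) {
            heap.push(Reverse((next, i, pos + 1)));
        }
    }
    out
}

/// Split-based parallel merge (hypothetical). `splitters` must be sorted
/// ascending; range r holds values in [splitters[r-1], splitters[r]).
fn split_merge(inputs: &[Vec<i64>], splitters: &[i64]) -> Vec<i64> {
    // Per input: cut points [0, first index >= each splitter..., len].
    let cuts: Vec<Vec<usize>> = inputs
        .iter()
        .map(|run| {
            let mut c = vec![0];
            c.extend(splitters.iter().map(|&sp| run.partition_point(|&x| x < sp)));
            c.push(run.len());
            c
        })
        .collect();
    let num_ranges = splitters.len() + 1;
    // Each key range is merged on its own thread.
    let parts: Vec<Vec<i64>> = thread::scope(|s| {
        let handles: Vec<_> = (0..num_ranges)
            .map(|r| {
                let slices: Vec<&[i64]> = inputs
                    .iter()
                    .zip(&cuts)
                    .map(|(run, c)| &run[c[r]..c[r + 1]])
                    .collect();
                s.spawn(move || kway_merge(&slices))
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // Ranges are disjoint and ordered, so concatenation is sorted.
    parts.concat()
}

fn main() {
    let inputs = vec![vec![1, 5, 9, 13], vec![2, 6, 10, 14], vec![3, 7, 11, 15]];
    // Hypothetical splitters producing three ranges: < 6, [6, 11), >= 11.
    let merged = split_merge(&inputs, &[6, 11]);
    assert_eq!(merged, vec![1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15]);
}
```

The quality of the split matters: if one key range holds most of the rows, its merge dominates, so in practice splitters would need to balance the ranges.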

Related: DuckDB implements a similar [parallel merge sort](https://duckdb.org/2021/08/27/external-sorting.html) strategy.

Parent issue: https://github.com/apache/datafusion/issues/17348

Parallel merge for SortPreservingMergeExec #21381