Unnecessary RepartitionExec + SortPreservingMergeExec on single-partition sorted output #21349

@neilconway

Describe the bug

From the plan for TPC-H Q22, in the #21240 branch:

  ScalarSubqueryExec: subqueries=1
    RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1, maintains_sort_order=true
      SortPreservingMergeExec: [cntrycode@0 ASC NULLS LAST]
        SortExec: expr=[cntrycode@0 ASC NULLS LAST], preserve_partitioning=[true]
          ...

EnforceDistribution inserts a RepartitionExec: partitioning=RoundRobinBatch(N) on an already-sorted single-partition output, followed immediately by a SortPreservingMergeExec to merge the partitions back. This is wasted work: the data is split and immediately re-merged.
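
To illustrate why the split-then-merge pair is a no-op on the data, here is a minimal toy sketch in plain Rust. It does not use DataFusion's APIs; `round_robin_split` and `sort_preserving_merge` are hypothetical stand-ins for the behavior of RepartitionExec with RoundRobinBatch and SortPreservingMergeExec, and the batch size and partition count are arbitrary assumptions.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Toy stand-in (hypothetical, not DataFusion code) for a round-robin
/// repartition: deal fixed-size batches of rows from one sorted input
/// across `n` output partitions. Requires n > 0 and batch_size > 0.
fn round_robin_split(sorted: &[i64], n: usize, batch_size: usize) -> Vec<Vec<i64>> {
    let mut parts = vec![Vec::new(); n];
    for (i, batch) in sorted.chunks(batch_size).enumerate() {
        // Batches arrive in ascending order, so each partition stays sorted.
        parts[i % n].extend_from_slice(batch);
    }
    parts
}

/// Toy stand-in (hypothetical) for a sort-preserving merge: a k-way merge
/// of already-sorted partitions using a min-heap.
fn sort_preserving_merge(parts: &[Vec<i64>]) -> Vec<i64> {
    let mut heap = BinaryHeap::new();
    // Seed the heap with the first row of every non-empty partition.
    for (p, part) in parts.iter().enumerate() {
        if let Some(&v) = part.first() {
            heap.push(Reverse((v, p, 0usize)));
        }
    }
    let mut out = Vec::new();
    // Repeatedly emit the smallest row and advance within its partition.
    while let Some(Reverse((v, p, idx))) = heap.pop() {
        out.push(v);
        if let Some(&next) = parts[p].get(idx + 1) {
            heap.push(Reverse((next, p, idx + 1)));
        }
    }
    out
}

fn main() {
    // One sorted, single-partition input.
    let input: Vec<i64> = (0..20).collect();
    let parts = round_robin_split(&input, 4, 3);
    let merged = sort_preserving_merge(&parts);
    // The merged stream is identical to the input the split started from.
    assert_eq!(merged, input);
    println!("merged output equals input: {}", merged == input);
}
```

In this toy model, the split and merge cost extra copies and comparisons, yet the merged stream is row-for-row the same as the original single sorted partition.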

This might be related to, or a subset of, #4368; I'm not sure exactly.

To Reproduce

No response

Expected behavior

No response

Additional context

No response

Labels

bug (Something isn't working)
