Skip to content

Optimize ORDER BY by Pruning Functionally Redundant Sort Keys #21361

@xiedeyantu

Description

@xiedeyantu

Is your feature request related to a problem or challenge?

DataFusion currently propagates functional dependencies through aggregates and joins, but it does not use them to simplify ORDER BY clauses. As a result, queries like:

SELECT deptno, SUM(sal) AS total_sal
FROM emp
GROUP BY deptno
ORDER BY deptno, total_sal

still keep both sort keys in the physical plan, even though deptno -> total_sal already holds after the aggregation. This leads to extra sort work, unnecessary comparison cost, and avoidable memory usage.

The same problem appears in cases where a later sort key is functionally determined by earlier keys. Because Sort is order-sensitive, only suffix keys that are provably redundant should be removed, and the current optimizer does not do that.

Describe the solution you'd like

I would like DataFusion to prune functionally redundant ORDER BY expressions during logical optimization.

For example, when the input schema or an intermediate plan establishes that an earlier ordering key functionally determines a later one, the optimizer should drop the redundant suffix key while preserving sort semantics. In the example above, ORDER BY deptno, total_sal should be simplified to ORDER BY deptno.

This should be implemented in a way that respects sort order, so only trailing expressions that are fully determined by the already-kept prefix may be removed.

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions