-
Notifications
You must be signed in to change notification settings - Fork 2k
Optimize ORDER BY by Pruning Functionally Redundant Sort Keys #21361
Description
Is your feature request related to a problem or challenge?
DataFusion currently propagates functional dependencies through aggregates and joins, but it does not use them to simplify ORDER BY clauses. As a result, queries like:
SELECT deptno, SUM(sal) AS total_sal
FROM emp
GROUP BY deptno
ORDER BY deptno, total_sal
still keep both sort keys in the physical plan, even though deptno -> total_sal already holds after the aggregation. This leads to extra sort work, unnecessary comparison cost, and avoidable memory usage.
The same problem appears in cases where a later sort key is functionally determined by earlier keys. Because Sort is order-sensitive, only suffix keys that are provably redundant should be removed, and the current optimizer does not do that.
Describe the solution you'd like
I would like DataFusion to prune functionally redundant ORDER BY expressions during logical optimization.
For example, when the input schema or an intermediate plan establishes that an earlier ordering key functionally determines a later one, the optimizer should drop the redundant suffix key while preserving sort semantics. In the example above, ORDER BY deptno, total_sal should be simplified to ORDER BY deptno.
This should be implemented in a way that respects sort order, so only trailing expressions that are fully determined by the already-kept prefix may be removed.
Describe alternatives you've considered
No response
Additional context
No response