@adriangb
Contributor

Which issue does this PR close?

Closes #20213

Rationale for this change

When both sides of a join have columns with the same name (e.g. k), the dynamic filter from an enclosing (outer) join in a nested-join plan was incorrectly pushed to both children instead of only the correct one. With small row groups this caused wrong results (0 rows instead of the expected result set).

The root cause was that FilterColumnChecker in filter_pushdown.rs matched columns by name only. When the parent pushed a filter referencing column k at index 2 (the right child's k), the name-based check found k in the left child's schema too, and incorrectly pushed the filter to both sides.
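
To illustrate the difference, here is a minimal, self-contained sketch of the two matching strategies. It does not use DataFusion's types: `ColumnRef`, `allowed_by_name`, and `allowed_by_name_and_index` are hypothetical names invented for this example, and the real `FilterColumnChecker` may be structured differently.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a physical column reference: a name plus its
/// position in the schema the filter was written against.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ColumnRef {
    pub name: String,
    pub index: usize,
}

impl ColumnRef {
    pub fn new(name: &str, index: usize) -> Self {
        Self { name: name.to_string(), index }
    }
}

/// Name-only check (the buggy behaviour described above): the filter is
/// accepted if every referenced column *name* exists on the child side.
pub fn allowed_by_name(filter_cols: &[ColumnRef], child_cols: &[ColumnRef]) -> bool {
    let names: HashSet<&str> = child_cols.iter().map(|c| c.name.as_str()).collect();
    filter_cols.iter().all(|c| names.contains(c.name.as_str()))
}

/// (name, index) check: the filter is accepted only if every referenced
/// column matches a child column on both name and index.
pub fn allowed_by_name_and_index(filter_cols: &[ColumnRef], child_cols: &[ColumnRef]) -> bool {
    let allowed: HashSet<(&str, usize)> = child_cols
        .iter()
        .map(|c| (c.name.as_str(), c.index))
        .collect();
    filter_cols
        .iter()
        .all(|c| allowed.contains(&(c.name.as_str(), c.index)))
}
```

With a left side exposing `k` at index 0 and `v` at index 1, a right side exposing `k` at index 2, and a filter on `k` at index 2, the name-only check accepts the filter for the left side, while the `(name, index)` check accepts it only for the right side.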

What changes are included in this PR?

Approach adopted from #20192:

  1. FilterColumnChecker now matches on (name, index) pairs instead of just names, preventing incorrect cross-side pushdown when columns share names
  2. ChildFilterDescription::from_child_with_allowed_columns — new method that restricts pushdown to an explicit set of allowed (name, index) pairs
  3. ChildFilterDescription::all_unsupported — helper to mark all filters unsupported for a child
  4. HashJoinExec::gather_filters_for_pushdown — builds per-side allowed-column sets from column_indices (+ optional projection), uses lr_is_preserved to gate pushdown eligibility per join type
  5. lr_is_preserved — mirrors the logical optimizer's preserved-side logic, enabling parent filter pushdown for non-inner join types (Left, Right, Semi, Anti, Mark)
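
As a rough sketch of item 4, the per-side allowed-column sets could be derived as shown below. This is an assumption about the shape of the logic, not the PR's code: `Side`, `ColumnIndex`, and `allowed_columns` are simplified stand-ins defined here, and the sketch assumes that a projection lists, for each output position, the position it reads from in `column_indices`.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the side of the join a column comes from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Side {
    Left,
    Right,
}

/// Hypothetical stand-in: for one column of the join output, which side it
/// comes from and its index within that side's schema.
pub struct ColumnIndex {
    pub index: usize,
    pub side: Side,
}

/// Collect the (name, output index) pairs that belong to `side`.
/// `output_names[i]` is the name of output column `i` (after projection).
pub fn allowed_columns(
    output_names: &[&str],
    column_indices: &[ColumnIndex],
    projection: Option<&[usize]>,
    side: Side,
) -> HashSet<(String, usize)> {
    // Without a projection, output position i reads from column_indices[i].
    let positions: Vec<usize> = match projection {
        Some(p) => p.to_vec(),
        None => (0..column_indices.len()).collect(),
    };
    let mut allowed = HashSet::new();
    for (out_idx, &src) in positions.iter().enumerate() {
        if column_indices[src].side == side {
            allowed.insert((output_names[out_idx].to_string(), out_idx));
        }
    }
    allowed
}
```

For an output schema `[k, v, k]` whose first two columns come from the left side and whose third comes from the right, the left set is `{(k, 0), (v, 1)}` and the right set is `{(k, 2)}`, so a filter on `k` at index 2 is only eligible for the right child.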

Are these changes tested?

Yes:

  • Unit test for lr_is_preserved covering all join types
  • SLT regression test reproducing the exact scenario from #20213 ("Dynamic filter applied to the wrong table when using subqueries"): a subquery join with same-named columns and small row groups, verifying both COUNT(*) and SELECT * correctness
  • Updated snapshot for existing Left join filter pushdown test (preserved-side filter now correctly pushes down)
  • All existing hash join tests (368), filter pushdown tests (47), and SLT tests pass

Are there any user-facing changes?

  • Bug fix: Queries with nested joins where both sides have same-named columns now return correct results with dynamic filter pushdown enabled
  • Improvement: Parent filters on preserved join sides can now push through non-inner joins (Left, Right, Semi, Anti, Mark)
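
For readers unfamiliar with the preserved-side idea, the following is a sketch of what a function like `lr_is_preserved` might return per join type. The `JoinType` enum is a local stand-in defined for this example, and the mapping is an assumption modelled on the usual preserved-side rules; the function in this PR may differ in detail.

```rust
/// Local stand-in for the join types discussed in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinType {
    Inner,
    Left,
    Right,
    Full,
    LeftSemi,
    LeftAnti,
    LeftMark,
    RightSemi,
    RightAnti,
    RightMark,
}

/// Returns (left_preserved, right_preserved). A side is "preserved" when a
/// parent filter that only references that side's columns can be applied to
/// the side's input without changing the join result.
pub fn lr_is_preserved(join_type: JoinType) -> (bool, bool) {
    match join_type {
        JoinType::Inner => (true, true),
        // The non-preserved side of an outer join can be null-extended, so
        // filtering it below the join could change the result.
        JoinType::Left => (true, false),
        JoinType::Right => (false, true),
        JoinType::Full => (false, false),
        // Semi, anti and mark joins only expose one side's columns in the output.
        JoinType::LeftSemi | JoinType::LeftAnti | JoinType::LeftMark => (true, false),
        JoinType::RightSemi | JoinType::RightAnti | JoinType::RightMark => (false, true),
    }
}
```

Under this mapping, a parent filter on the left input of a Left join is eligible for pushdown, while a filter on its right input is marked unsupported.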

🤖 Generated with Claude Code

When both sides of a join have columns with the same name (e.g. `k`),
the dynamic filter from an outer join was incorrectly pushed to both
children instead of only the correct one. This happened because
`FilterColumnChecker` matched columns by name only.

Fix `FilterColumnChecker` to match on `(name, index)` pairs, and use
`column_indices` in `HashJoinExec::gather_filters_for_pushdown` to
build per-side allowed-column sets via `from_child_with_allowed_columns`.
Also adds `lr_is_preserved` to correctly gate pushdown for non-inner
join types (Left, Right, Semi, Anti, Mark).

Closes apache#20213

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@github-actions bot added labels on Feb 12, 2026: core (Core DataFusion crate), sqllogictest (SQL Logic Tests (.slt)), physical-plan (Changes to the physical-plan crate)
@adriangb
Contributor Author

@jackkleeman it looks like we basically ended up with the same approach as #20192. Maybe just copy over the tests from here, confirm your PR fixes #20213, and then we can discard this one?
