
Null aware hash mark joins#21585

Draft
AdamGS wants to merge 3 commits into apache:main from AdamGS:adamg/mark-join-null-marking


@AdamGS AdamGS commented Apr 13, 2026

This is still a draft. I'm putting it up because others might want to weigh in, and I find it useful to be able to see the diff clearly.

Which issue does this PR close?

Rationale for this change

This change is about correctness and SQL completeness, but it is also a step towards better subquery decorrelation.

What changes are included in this PR?

  1. Adds support for null-aware mark joins.
  2. Makes sure that joins requiring null awareness go through a join implementation that supports it.
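
For context, here is a minimal sketch of the three-valued result that a null-aware mark column has to reproduce for a predicate like `x IN (subquery)`, following standard SQL semantics. It is standalone Rust, not code from this PR: the function name `null_aware_mark` and the `Option<bool>` encoding of SQL NULL are assumptions made purely for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not a DataFusion API): computes the mark value for
/// `probe IN (build)` under SQL three-valued logic.
/// `Some(true)` / `Some(false)` are TRUE / FALSE, `None` stands for SQL NULL.
fn null_aware_mark(probe: Option<i64>, build: &[Option<i64>]) -> Option<bool> {
    // `x IN (empty set)` is FALSE, even when x itself is NULL.
    if build.is_empty() {
        return Some(false);
    }

    let build_has_null = build.iter().any(|v| v.is_none());
    // Only non-null build keys can ever compare equal to a probe key.
    let keys: HashSet<i64> = build.iter().flatten().copied().collect();

    match probe {
        // A NULL probe key compared against a non-empty set is unknown.
        None => None,
        // A definite match always yields TRUE.
        Some(p) if keys.contains(&p) => Some(true),
        // No match, but a NULL on the build side might have matched: unknown.
        Some(_) if build_has_null => None,
        // No match and no NULLs on the build side: definitely FALSE.
        Some(_) => Some(false),
    }
}
```

A mark join that is not null-aware would collapse the two `None` cases into FALSE, which changes results as soon as the mark is negated, as in `NOT IN`.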

Are these changes tested?

  1. Existing SLT tests that explicitly showed bad results.
  2. New dedicated SLT tests.
  3. New unit tests.

Are there any user-facing changes?

This PR changes planning behavior and introduces more public API around hash joins. I'll finalize this section as it gets closer to a reviewable state.

AI Usage

AI was used in the process of developing this PR, mostly around testing and planning.

@AdamGS changed the title from "Adamg/mark join null marking" to "Null aware hash mark joins" on Apr 13, 2026
@github-actions bot added the logical-expr (Logical plan and expressions), optimizer (Optimizer rules), core (Core DataFusion crate), sqllogictest (SQL Logic Tests (.slt)), and physical-plan (Changes to the physical-plan crate) labels on Apr 13, 2026


Development

Successfully merging this pull request may close these issues.

Mark joins don't support null mark columns
