[SPARK-55854][SQL] Tag pass-through duplicate attributes in Expand output to prevent AMBIGUOUS_REFERENCE by mihailotim-db · Pull Request #54641 · apache/spark

mihailotim-db · 2026-03-05T14:09:28Z

What changes were proposed in this pull request?

When Expand is created for ROLLUP/CUBE/GROUPING SETS, its output contains duplicate-named attributes: the original pass-through child attribute (e.g., a#0) and a new grouping instance created via newInstance() (e.g., a#5). Both share the same name, which causes AMBIGUOUS_REFERENCE errors when any operator performs name-based resolution against the Expand output.
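As an illustration only, the toy Python model below mimics what newInstance() does: it keeps the name but assigns a fresh ExprId. The `Attr` class, `new_instance`, and the ExprId counter are hypothetical stand-ins, not Catalyst APIs. In the model, a by-name lookup over the Expand-like output finds two matches for "a", while a lookup by ExprId still finds exactly one.

```python
from dataclasses import dataclass, replace
from itertools import count

# Hypothetical ExprId source; starts at 5 so the output matches the a#5 example.
_fresh_ids = count(5)

@dataclass(frozen=True)
class Attr:
    """Toy stand-in for a resolved attribute: a name plus a unique ExprId."""
    name: str
    expr_id: int

    def new_instance(self) -> "Attr":
        # Keeps the name, takes a fresh ExprId (the idea behind newInstance()).
        return replace(self, expr_id=next(_fresh_ids))

    def __repr__(self) -> str:
        return f"{self.name}#{self.expr_id}"

a, b, c, gid = Attr("a", 0), Attr("b", 1), Attr("c", 2), Attr("gid", 3)
a_grouping = a.new_instance()

# Pass-through child columns, the grouping copy of `a`, then the grouping id.
expand_output = [a, b, c, a_grouping, gid]

by_name = [x for x in expand_output if x.name == "a"]          # name-based lookup
by_id = [x for x in expand_output if x.expr_id == a.expr_id]   # ExprId-based lookup

print(expand_output)  # [a#0, b#1, c#2, a#5, gid#3]
print(by_name)        # [a#0, a#5]
print(by_id)          # [a#0]
```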

This PR tags pass-through child attributes with __is_duplicate metadata in Expand.apply(), so that AttributeSeq.getCandidatesForResolution deprioritizes them when multiple candidates match by name. This is the same mechanism already used by DeduplicateUnionChildOutput for Union operators.
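A minimal sketch of the deprioritization idea, assuming a simplified resolver: when several attributes match a name, candidates carrying the `__is_duplicate` metadata key are dropped as long as at least one untagged candidate remains. Only the metadata key comes from this PR; `Attr`, `candidates_for_resolution`, `resolve`, and `AmbiguousReference` are hypothetical names, and the real AttributeSeq logic is more involved (qualifiers, nested fields, case sensitivity).

```python
from dataclasses import dataclass

DUPLICATE_KEY = "__is_duplicate"  # metadata key named in this PR

@dataclass(frozen=True)
class Attr:
    name: str
    expr_id: int
    metadata: frozenset = frozenset()

    def as_duplicate(self) -> "Attr":
        # Same name and ExprId; only the metadata changes.
        return Attr(self.name, self.expr_id, self.metadata | {DUPLICATE_KEY})

class AmbiguousReference(Exception):
    pass

def candidates_for_resolution(name, output):
    """Name matches, with duplicate-tagged ones dropped when others remain."""
    matches = [x for x in output if x.name == name]  # case-sensitive for brevity
    if len(matches) > 1:
        preferred = [x for x in matches if DUPLICATE_KEY not in x.metadata]
        if preferred:
            return preferred
    return matches

def resolve(name, output):
    candidates = candidates_for_resolution(name, output)
    if len(candidates) > 1:
        raise AmbiguousReference(f"[AMBIGUOUS_REFERENCE] {name} -> {candidates}")
    return candidates[0] if candidates else None

a_passthrough, a_grouping = Attr("a", 0), Attr("a", 5)
print(resolve("a", [a_passthrough.as_duplicate(), a_grouping]).expr_id)  # 5
```

Falling back to all matches when every candidate is tagged keeps the sketch from silently hiding a genuine ambiguity.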

Only attributes whose ExprId matches a simple Attribute child of a groupByAlias are tagged. Complex grouping expressions (e.g., c1 + 1) produce aliases whose names differ from every child column name, so no name conflict arises. ExprId-based resolution (used for already-resolved expressions such as aggregate functions) is unaffected.
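The tagging rule can be sketched as follows, assuming grouping aliases are modeled as plain records whose child is either an attribute or an opaque complex expression. `Attr`, `Complex`, `Alias`, and `tag_passthrough` are hypothetical; the `enabled` parameter is only a stand-in for the config flag described in this PR and does not read any Spark configuration.

```python
from dataclasses import dataclass
from typing import Union

DUPLICATE_KEY = "__is_duplicate"

@dataclass(frozen=True)
class Attr:
    name: str
    expr_id: int
    metadata: frozenset = frozenset()

@dataclass(frozen=True)
class Complex:
    """Any non-attribute grouping expression, e.g. `a + 1` (opaque here)."""
    sql: str

@dataclass(frozen=True)
class Alias:
    child: Union[Attr, Complex]
    name: str

def tag_passthrough(child_output, group_by_aliases, enabled=True):
    """Tag child attributes that some grouping alias wraps directly."""
    if not enabled:
        return list(child_output)
    # Only aliases over a bare attribute contribute an ExprId to match against.
    simple_ids = {al.child.expr_id for al in group_by_aliases if isinstance(al.child, Attr)}
    return [
        Attr(x.name, x.expr_id, x.metadata | {DUPLICATE_KEY}) if x.expr_id in simple_ids else x
        for x in child_output
    ]

a, b = Attr("a", 0), Attr("b", 1)
rollup_a = tag_passthrough([a, b], [Alias(a, "a")])                              # ROLLUP(a)
rollup_a_plus_1 = tag_passthrough([a, b], [Alias(Complex("a + 1"), "(a + 1)")])  # ROLLUP(a + 1)
print([DUPLICATE_KEY in x.metadata for x in rollup_a])         # [True, False]
print([DUPLICATE_KEY in x.metadata for x in rollup_a_plus_1])  # [False, False]
```

Matching on ExprId rather than on name means a complex expression that happens to mention `a` never causes `a` itself to be tagged.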

The fix is guarded behind a new internal config spark.sql.analyzer.expandTagPassthroughDuplicates (default true).

Why are the changes needed?

The Expand operator for ROLLUP/CUBE/GROUPING SETS produces an output like [a#0, b#1, c#2, a#5, gid#3] where a#0 is the pass-through child attribute and a#5 is the new grouping attribute. Both have the name "a". When any operator above the Expand resolves the reference "a" by name (e.g., a Filter or Project inserted by a custom analysis rule, or a correlated subquery whose outer reference resolves against the Expand's output), getCandidatesForResolution returns two candidates, and resolve() throws an AMBIGUOUS_REFERENCE error.

Does this PR introduce any user-facing change?

No. The fix prevents a latent AMBIGUOUS_REFERENCE error in name-based resolution against Expand output. Standard SQL queries are not affected because the Aggregate above the Expand already shields upper operators from seeing the duplicate names. The fix is defensive and makes the Expand output safe for any future feature or custom rule that may resolve names against it.

How was this patch tested?

7 new unit tests in ResolveGroupingAnalyticsSuite:

Tagging behavior (flag enabled, default):
- Tags pass-through attribute for simple single-column grouping (ROLLUP(a))
- Does not tag for complex grouping expressions (ROLLUP(a + 1))
- Tags multiple pass-through attributes for multi-column grouping (ROLLUP(a, b))
- Preserves ExprId and name on tagged attributes
- Demonstrates that resolve("a") succeeds with tagging and throws AMBIGUOUS_REFERENCE without tagging
Flag disabled behavior:
- No tagging for single-column grouping; resolve("a") throws AMBIGUOUS_REFERENCE
- No tagging for multi-column grouping; resolve("a") and resolve("b") both throw AMBIGUOUS_REFERENCE

All 9 pre-existing tests in ResolveGroupingAnalyticsSuite continue to pass.
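The multi-column cases above can be mirrored in a small self-contained toy, assuming a simplified Expand output and resolver. All names here (`toy_expand_output`, `resolve`, `AmbiguousReference`) are hypothetical; this is not the ResolveGroupingAnalyticsSuite code and does not use Spark. It models ROLLUP(a, b) over child columns a#0, b#1, c#2, with the flag on and off.

```python
from dataclasses import dataclass

DUPLICATE_KEY = "__is_duplicate"

@dataclass(frozen=True)
class Attr:
    name: str
    expr_id: int
    metadata: frozenset = frozenset()

class AmbiguousReference(Exception):
    pass

def toy_expand_output(child, grouping, fresh_ids, gid, tag_enabled):
    """Child columns (tagged if grouped and enabled), fresh grouping copies, gid."""
    grouped = {g.expr_id for g in grouping}
    passthrough = [
        Attr(x.name, x.expr_id, x.metadata | {DUPLICATE_KEY})
        if tag_enabled and x.expr_id in grouped else x
        for x in child
    ]
    copies = [Attr(g.name, i) for g, i in zip(grouping, fresh_ids)]
    return passthrough + copies + [gid]

def resolve(name, output):
    matches = [x for x in output if x.name == name]
    if len(matches) > 1:
        # Prefer untagged candidates; fall back to all matches if none are untagged.
        matches = [x for x in matches if DUPLICATE_KEY not in x.metadata] or matches
    if not matches:
        raise KeyError(name)
    if len(matches) > 1:
        raise AmbiguousReference(name)
    return matches[0]

a, b, c, gid = Attr("a", 0), Attr("b", 1), Attr("c", 2), Attr("gid", 3)
results = {}
for enabled in (True, False):
    out = toy_expand_output([a, b, c], [a, b], [5, 6], gid, tag_enabled=enabled)  # ROLLUP(a, b)
    for name in ("a", "b"):
        try:
            r = resolve(name, out)
            results[(enabled, name)] = f"{r.name}#{r.expr_id}"
        except AmbiguousReference:
            results[(enabled, name)] = "AMBIGUOUS_REFERENCE"
print(results)
```

With tagging enabled, "a" and "b" resolve to the grouping copies a#5 and b#6; with it disabled, both names are ambiguous, matching the flag-disabled expectations listed above.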

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude claude-4.6-opus-high-thinking (Cursor)

Commit 9bd1ade ("fix")

mihailotim-db changed the title from "fix" to "[SPARK-55854][SQL] Tag pass-through duplicate attributes in Expand output to prevent AMBIGUOUS_REFERENCE" on Mar 5, 2026.

mihailotim-db wants to merge 1 commit into apache:master from mihailotim-db:mihailo-timotic_data/expand_qualify
