[SPARK-55854][SQL] Tag pass-through duplicate attributes in Expand output to prevent AMBIGUOUS_REFERENCE #54641
Open
mihailotim-db wants to merge 1 commit into apache:master
What changes were proposed in this pull request?
When `Expand` is created for `ROLLUP`/`CUBE`/`GROUPING SETS`, its output contains duplicate-named attributes: the original pass-through child attribute (e.g., `a#0`) and a new grouping instance created via `newInstance()` (e.g., `a#5`). Both share the same name, which causes `AMBIGUOUS_REFERENCE` errors when any operator performs name-based resolution against the `Expand` output.

This PR tags pass-through child attributes with `__is_duplicate` metadata in `Expand.apply()`, so that `AttributeSeq.getCandidatesForResolution` deprioritizes them when multiple candidates match by name. This is the same mechanism already used by `DeduplicateUnionChildOutput` for Union operators.

Only attributes whose `ExprId` matches a simple `Attribute` child of a `groupByAlias` are tagged. Complex grouping expressions (e.g., `c1 + 1`) produce aliases with different names than any child column, so no name conflict arises. ExprId-based resolution (used for already-resolved expressions like aggregate functions) is unaffected.

The fix is guarded behind a new internal config `spark.sql.analyzer.expandTagPassthroughDuplicates` (default `true`).

Why are the changes needed?
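The name-based ambiguity and the effect of deprioritizing tagged attributes can be pictured with a small toy model. This is a sketch in plain Python, not Catalyst code: `Attr`, `AmbiguousReference`, and `resolve` are hypothetical stand-ins, and the rule "drop tagged candidates only when an untagged one remains" is an assumption about how deprioritization might behave, not a statement of what `AttributeSeq.getCandidatesForResolution` actually does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for an attribute: name, expression id, duplicate tag."""
    name: str
    expr_id: int
    is_duplicate: bool = False  # plays the role of the `__is_duplicate` metadata


class AmbiguousReference(Exception):
    """Toy counterpart of the AMBIGUOUS_REFERENCE error."""


def resolve(name, output):
    """Resolve `name` against `output`, preferring untagged candidates (assumed rule)."""
    candidates = [a for a in output if a.name == name]
    if len(candidates) > 1:
        preferred = [a for a in candidates if not a.is_duplicate]
        # Narrow the set only if at least one untagged candidate remains.
        if preferred:
            candidates = preferred
    if not candidates:
        raise KeyError(name)
    if len(candidates) > 1:
        raise AmbiguousReference(name)
    return candidates[0]


# Output shaped like the example in the description: [a#0, b#1, c#2, a#5, gid#3]
untagged = [Attr("a", 0), Attr("b", 1), Attr("c", 2), Attr("a", 5), Attr("gid", 3)]
# Same output with the pass-through a#0 tagged as a duplicate.
tagged = [Attr("a", 0, is_duplicate=True)] + untagged[1:]
```

In this model, `resolve("a", tagged)` picks the grouping attribute with id 5, while `resolve("a", untagged)` raises `AmbiguousReference`. Lookups by expression id would never consult the tag, which mirrors the claim that ExprId-based resolution is unaffected.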
The `Expand` operator for `ROLLUP`/`CUBE`/`GROUPING SETS` produces an output like `[a#0, b#1, c#2, a#5, gid#3]`, where `a#0` is the pass-through child attribute and `a#5` is the new grouping attribute. Both have the name `"a"`. When any operator above the `Expand` resolves the reference `"a"` by name (e.g., a `Filter` or `Project` inserted by a custom analysis rule, or a correlated subquery whose outer reference resolves against the `Expand`'s output), `getCandidatesForResolution` returns two candidates, and `resolve()` throws an `AMBIGUOUS_REFERENCE` error.

Does this PR introduce any user-facing change?
No. The fix prevents a latent `AMBIGUOUS_REFERENCE` error in name-based resolution against `Expand` output. Standard SQL queries are not affected because the `Aggregate` above the `Expand` already shields upper operators from seeing the duplicate names. The fix is defensive and makes the `Expand` output safe for any future feature or custom rule that may resolve names against it.

How was this patch tested?
7 new unit tests in `ResolveGroupingAnalyticsSuite`:

Tagging behavior (flag enabled, default):
- `ROLLUP(a)`
- `ROLLUP(a + 1)`
- `ROLLUP(a, b)`
- `ExprId` and name on tagged attributes
- `resolve("a")` succeeds with tagging and throws `AMBIGUOUS_REFERENCE` without tagging

Flag disabled behavior:
- `resolve("a")` throws `AMBIGUOUS_REFERENCE`
- `resolve("a")` and `resolve("b")` both throw `AMBIGUOUS_REFERENCE`

All 9 pre-existing tests in `ResolveGroupingAnalyticsSuite` continue to pass.

Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude claude-4.6-opus-high-thinking (Cursor)
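
The tagging rule from the description (tag only pass-through attributes whose `ExprId` matches a simple attribute under a group-by alias, and do nothing when the flag is off) can be sketched as a toy model. This is plain Python rather than the Scala in the patch; `Attr`, `GroupByAlias`, and `tag_pass_through` are hypothetical names introduced for illustration, and the `enabled` parameter merely stands in for `spark.sql.analyzer.expandTagPassthroughDuplicates`.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Attr:
    """Hypothetical stand-in for a child output attribute."""
    name: str
    expr_id: int
    is_duplicate: bool = False  # plays the role of the `__is_duplicate` metadata


@dataclass(frozen=True)
class GroupByAlias:
    """Hypothetical grouping alias.

    child_attr_id is the expression id of the aliased attribute, or None when
    the grouping expression is complex (for example a + 1).
    """
    name: str
    child_attr_id: Optional[int]


def tag_pass_through(child_output, group_by_aliases, enabled=True):
    """Return child_output with pass-through duplicates tagged (toy version)."""
    if not enabled:
        return list(child_output)
    simple_ids = {
        g.child_attr_id for g in group_by_aliases if g.child_attr_id is not None
    }
    # Tagging copies the attribute, so name and expression id are preserved.
    return [
        replace(a, is_duplicate=True) if a.expr_id in simple_ids else a
        for a in child_output
    ]


child = [Attr("a", 0), Attr("b", 1), Attr("c", 2)]
rollup_a = tag_pass_through(child, [GroupByAlias("a", 0)])
rollup_a_plus_1 = tag_pass_through(child, [GroupByAlias("(a + 1)", None)])
rollup_a_b = tag_pass_through(child, [GroupByAlias("a", 0), GroupByAlias("b", 1)])
flag_off = tag_pass_through(child, [GroupByAlias("a", 0)], enabled=False)
```

The four calls loosely correspond to the `ROLLUP(a)`, `ROLLUP(a + 1)`, `ROLLUP(a, b)`, and flag-disabled cases listed under the tests; only the first and third tag anything in this model.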