Skip to content
Open
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -223,6 +223,73 @@ private static class AliasReplacementShuttle extends SqlShuttle {
}
}

/**
* Wraps a nested join into a subquery with explicit column aliases.
*
* <p>This is specifically required for dialects like ClickHouse where the
* identifier resolver cannot resolve columns with internal table qualifiers
* (e.g., 'd1.loc') when they are wrapped in a subquery alias. By forcing
* an explicit 'AS' projection for every column, we ensure the identifiers
* are flattened and visible to the outer query block.
*
* @param input The Result of the nested join branch
* @param inputRel The RelNode representing the join branch to extract row types
* @param outerAlias The alias to be assigned to the wrapped subquery
* @return A new Result containing the wrapped SQL with explicit aliases
*/
protected Result wrapNestedJoin(Result input, RelNode inputRel, String outerAlias) {
final SqlParserPos pos = SqlParserPos.ZERO;

// Extract the underlying SqlSelect from the input Result.
// We manipulate the Select node directly to avoid redundant nesting
// often introduced by Result.asSelect().
final SqlSelect innerSelect = input.asSelect();
final SqlNodeList originalSelectList = innerSelect.getSelectList();
final List<String> fieldNames = inputRel.getRowType().getFieldNames();

final List<SqlNode> newSelectList = new ArrayList<>();

// Iterate through the fields to build explicit projections.
// Example: transforms 'd1.deptno' into 'd1.deptno AS deptno'.
for (int i = 0; i < fieldNames.size(); i++) {
SqlNode expr = originalSelectList.get(i);
String targetName = fieldNames.get(i);

// If the expression is already aliased, strip the AS to get the raw expression.
if (expr.getKind() == SqlKind.AS) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is quite strange. What if there are ambiguous fields?
What will happen if I want to have something like t1.id as customer_id, t2.id as product_id?
It will become t1.id as id, t2.id as id

Also, can you point me to ClickHouse documentation that shows this limitation?
Have you tried creating an issue in the ClickHouse repository as well ?

Copy link
Author

@Dwrite Dwrite Dec 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the PR description with the specific error log from ClickHouse 25.12.1.

To answer your concern about ambiguity: Calcite's RelDataType already ensures that all field names in the row type are unique. My implementation uses these unique names for explicit aliasing (e.g., t1.id AS id, t2.id AS id_0). This guarantees that the subquery 'exports' a clean and deterministic schema to the outer query, resolving the ClickHouse scoping limitation without risking field collisions.

I also renamed the methods to shouldWrapNestedJoin and wrapNestedJoinWithAliases to better reflect their purpose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test might be a good way to illustrate this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried creating an issue in the ClickHouse repository as well ?

Created an issue in clickhouse repo by myself ClickHouse/ClickHouse#93112. I hope they also can improve it.

expr = ((SqlCall) expr).operand(0);
}

// Force an explicit alias. This breaks the link to the internal table
// qualifiers (like 'd1' or 'd2') and exports the column as a clean
// identifier for the outer join scope.
newSelectList.add(
SqlStdOperatorTable.AS.createCall(
pos,
expr,
new SqlIdentifier(targetName, pos)));
}

// Update the select list of the inner query with flattened aliases.
innerSelect.setSelectList(new SqlNodeList(newSelectList, pos));

// Wrap the modified Select node with the outer alias (e.g., AS j).
SqlNode wrappedNode =
SqlStdOperatorTable.AS.createCall(pos,
innerSelect,
new SqlIdentifier(outerAlias, pos));

// Return the new Result, resetting the internal clauses to prevent
// the Implementor from adding additional SELECT wrappers.
return new Result(
wrappedNode,
Collections.emptyList(),
outerAlias,
inputRel.getRowType(),
ImmutableMap.of(outerAlias, inputRel.getRowType()));
}


/** Visits a Join; called by {@link #dispatch} via reflection. */
public Result visit(Join e) {
switch (e.getJoinType()) {
Expand All @@ -233,7 +300,13 @@ public Result visit(Join e) {
break;
}
final Result leftResult = visitInput(e, 0).resetAlias();
final Result rightResult = visitInput(e, 1).resetAlias();
Result rightResult = visitInput(e, 1).resetAlias();

if (dialect.shouldWrapNestedJoin(e)) {
String newAlias = "t" + e.getId(); // alias
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain why this identifier is safe and won't for example hide another identifier with the same name that happens to exist?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the safety of the identifier t + e.getId():

In the RelToSqlConverter phase, original SQL aliases (like 'j') are often not preserved or can become ambiguous after multiple logical transformations. Using rel.getId() is a standard practice in Calcite for generating synthetic aliases because:

Global Uniqueness: RelNode IDs are globally unique within a single planning session. This ensures that the generated alias t{id} will not collide with any other synthetic aliases generated by the converter for different nodes in the same query tree.

Deterministic Scoping: Since we are creating a new subquery scope to resolve the ClickHouse scoping issue, this unique identifier provides a clean boundary. Even if a user-defined table named t123 exists, the risk of collision is significantly lower than attempting to reuse or guess original aliases which might have been pruned or renamed by optimization rules.

Avoids Identifier Shadowing: By forcing this unique alias at the join boundary, we ensure that the outer scope has a deterministic way to reference the flattened columns without worrying about shadowing identifiers from deeper nested structures.

While using SqlValidatorUtil.uniquify is another option, it requires maintaining a usedNames set across the entire implementation. Given Calcite's internal conventions, t + id provides the best balance between implementation robustness and identifier safety for this dialect-specific fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Significantly lower != 0.
A compiler has to work for all user programs.
The worst case is that some programs are compiled to produce incorrect results. These kinds of bugs are also very hard to diagnose when they happen. So why not avoid them?
I can easily see a program that has many tables named T0, T1, etc. where such a collision can happen.
For column names Calcite automatically generates fresh names if there is a collision.
If that's not true for relation names (I don't know myself), I think the implementation has to make the extra effort to generate correct code, for example by computing the used names before generating a fresh identifier. If uniquify does what you need, why not use it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are absolutely right. I agree that even a statistically low probability of collision is unacceptable for a SQL generator, and a compiler must be robust for all possible user programs.

I have updated the implementation to replace the t + id logic with a more rigorous alias generation approach:

Unique Alias Generation: I now use SqlValidatorUtil.uniquify to generate the subquery alias. To ensure 100% safety, I collect all field names from the Join's rowType and the neededAlias into a usedNames set. This ensures the generated identifier (e.g., t, t0, t1...) will never collide with existing table or column names in the current scope.

In-place Aliasing: The wrapNestedJoin method now manipulates the SqlSelect directly. It forces an explicit AS projection for every column using the standardized field names from RelDataType. This "flattens" the identifiers and makes them resolvable by the outer query block in ClickHouse, while avoiding redundant subquery nesting.

This change ensures that the generated SQL is both ClickHouse-compatible and formally correct, preventing any potential identifier shadowing as you pointed out. Thank you for the guidance on maintaining compiler-level robustness!

rightResult = wrapNestedJoin(rightResult, e.getRight(), newAlias);
}

final Context leftContext = leftResult.qualifiedContext();
final Context rightContext = rightResult.qualifiedContext();
final SqlNode sqlCondition;
Expand Down
23 changes: 23 additions & 0 deletions core/src/main/java/org/apache/calcite/sql/SqlDialect.java
Original file line number Diff line number Diff line change
Expand Up @@ -1825,4 +1825,27 @@ private ContextImpl(DatabaseProduct databaseProduct,
conformance, nullCollation, dataTypeSystem, jethroInfo);
}
}
/**
* Whether this dialect requires wrapping a nested JOIN in an extra subquery.
*
* <p>Some SQL dialects (for example, ClickHouse) do not support nested JOIN
* syntax directly. In such cases, a JOIN whose input is itself another JOIN
* must be wrapped as:
*
* <pre>
* SELECT * FROM ( ... ) AS t
* </pre>
*
* <p>The default implementation returns {@code false}, meaning that nested
* JOINs can be emitted directly without additional wrapping. Dialects with
* stricter JOIN syntax rules should override this method and return
* {@code true} when wrapping is required for the given {@link RelNode}.
*
* @param relNode Relational expression representing a JOIN input
* @return whether an extra subquery wrapper is required for nested JOINs
*/
public boolean shouldWrapNestedJoin(RelNode relNode) {
return false;
}

}
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,13 @@

import org.apache.calcite.avatica.util.TimeUnitRange;
import org.apache.calcite.config.NullCollation;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.core.Aggregate;
import org.apache.calcite.rel.core.Correlate;
import org.apache.calcite.rel.core.Filter;
import org.apache.calcite.rel.core.Join;
import org.apache.calcite.rel.core.Project;
import org.apache.calcite.rel.core.Sort;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeSystem;
import org.apache.calcite.rel.type.RelDataTypeSystemImpl;
Expand Down Expand Up @@ -404,4 +411,45 @@ private static void unparseFloor(SqlWriter writer, SqlCall call) {
call.operand(0).unparse(writer, 0, 0);
writer.endList(frame);
}

@Override public boolean shouldWrapNestedJoin(RelNode rel) {
if (!(rel instanceof Join)) {
return false;
}
Join join = (Join) rel;

// ClickHouse primarily requires wrapping the right side of a JOIN
// when it contains a nested JOIN
return containsJoinRecursive(join.getRight());
}

/**
* Checks whether the given RelNode contains a JOIN that is directly exposed
* at the JOIN boundary, possibly wrapped by a small number of transparent
* single-input operators (e.g. Project, Filter, Sort).
*
* <p>This method intentionally does NOT perform a full tree traversal.
* ClickHouse only requires wrapping when a JOIN appears directly as a JOIN input;
* JOINs deeper in the subtree do not trigger the restriction.
*
* <p>Therefore, a full RelVisitor is avoided here to prevent over-detection
* and unnecessary wrapping.
*/
private static boolean containsJoinRecursive(RelNode rel) {
if (rel instanceof Join || rel instanceof Correlate) {
return true;
}

// Look through transparent single-input operators
// These don't create subquery boundaries in the SQL
if (rel instanceof Project
|| rel instanceof Filter
|| rel instanceof Sort
|| rel instanceof Aggregate) { // Add Aggregate here
return containsJoinRecursive(rel.getInput(0));
}

return false;
}

}
Original file line number Diff line number Diff line change
Expand Up @@ -123,6 +123,7 @@
import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.CoreMatchers.notNullValue;
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.containsString;
import static org.hamcrest.Matchers.hasToString;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;
Expand Down Expand Up @@ -11462,4 +11463,118 @@ public Sql schema(CalciteAssert.SchemaSpec schemaSpec) {
relFn, transforms);
}
}

/** Test case for
* <a href="https://issues.apache.org/jira/browse/CALCITE-7279">[CALCITE-7279]
* ClickHouse dialect should wrap nested JOINs with explicit aliasing</a>. */
@Test void testClickHouseNestedJoin() {
final String query = "SELECT e.empno, j.dname, j.loc\n"
+ "FROM emp e\n"
+ "LEFT JOIN (\n"
+ " SELECT d1.deptno, d1.dname, d2.loc\n"
+ " FROM dept d1\n"
+ " INNER JOIN dept d2 ON d1.deptno = d2.deptno\n"
+ ") AS j ON e.deptno = j.deptno";

final String sql = sql(query)
.schema(CalciteAssert.SchemaSpec.JDBC_SCOTT)
.withClickHouse()
.exec();

// 1. Verify that the inner join is wrapped in a subquery
assertThat(sql, containsString("LEFT JOIN (SELECT"));

// 2. Verify explicit aliasing which is crucial for ClickHouse scoping.
// We must see "source AS alias" to ensure the identifier is resolvable outer scope.
assertThat(sql, containsString("`DEPTNO` AS `DEPTNO`"));
assertThat(sql, containsString("`DNAME` AS `DNAME`"));
assertThat(sql, containsString("`LOC` AS `LOC`"));

// 3. Ensure we don't have redundant double-wrapping like (SELECT * FROM (SELECT...))
// The structure should be: LEFT JOIN (SELECT expr AS col FROM ...) AS j
assertFalse(sql.contains("SELECT *"),
"ClickHouse dialect should avoid SELECT * in nested joins");
}

/** Test that simple JOINs without nesting are not wrapped unnecessarily. */
@Test void testClickHouseSimpleJoinNotWrapped() {
final String query = "SELECT e.empno, d.dname\n"
+ "FROM emp e\n"
+ "LEFT JOIN dept d ON e.deptno = d.deptno";

// Simple joins should remain flat as standard SQL
final String expected = "SELECT `EMP`.`EMPNO`, `DEPT`.`DNAME`\n"
+ "FROM `SCOTT`.`EMP`\n"
+ "LEFT JOIN `SCOTT`.`DEPT` ON `EMP`.`DEPTNO` = `DEPT`.`DEPTNO`";

sql(query)
.schema(CalciteAssert.SchemaSpec.JDBC_SCOTT)
.withClickHouse()
.ok(expected);
}

/** Test nested JOIN with aggregation to ensure expressions like COUNT(*) are aliased. */
@Test void testClickHouseNestedJoinWithAggregation() {
final String query = "SELECT e.empno, j.cnt\n"
+ "FROM emp e\n"
+ "LEFT JOIN (\n"
+ " SELECT d1.deptno, COUNT(*) as cnt\n"
+ " FROM dept d1\n"
+ " INNER JOIN dept d2 ON d1.deptno = d2.deptno\n"
+ " GROUP BY d1.deptno\n"
+ ") AS j ON e.deptno = j.deptno";

final String sql = sql(query)
.schema(CalciteAssert.SchemaSpec.JDBC_SCOTT)
.withClickHouse()
.exec();

// Aggregation results must be explicitly aliased for ClickHouse visibility
assertThat(sql, containsString("COUNT(*) AS `CNT`"));
assertThat(sql, containsString("GROUP BY `DEPT`.`DEPTNO`"));
}

/** Test three-way JOIN to ensure the converter handles linear joins normally. */
@Test void testClickHouseThreeWayJoin() {
final String query = "SELECT e.empno, d1.dname, d2.loc\n"
+ "FROM emp e\n"
+ "INNER JOIN dept d1 ON e.deptno = d1.deptno\n"
+ "INNER JOIN dept d2 ON d1.deptno = d2.deptno";

// Standard multi-way joins shouldn't be forced into subqueries unless nested on the right
final String expected = "SELECT `EMP`.`EMPNO`, `DEPT`.`DNAME`, `DEPT0`.`LOC`\n"
+ "FROM `SCOTT`.`EMP`\n"
+ "INNER JOIN `SCOTT`.`DEPT` ON `EMP`.`DEPTNO` = `DEPT`.`DEPTNO`\n"
+ "INNER JOIN `SCOTT`.`DEPT` AS `DEPT0` ON `DEPT`.`DEPTNO` = `DEPT0`.`DEPTNO`";

sql(query)
.schema(CalciteAssert.SchemaSpec.JDBC_SCOTT)
.withClickHouse()
.ok(expected);
}

/** Regression test: Ensure MySQL dialect remains unaffected by ClickHouse-specific fix. */
@Test void testMysqlNestedJoinNotWrapped() {
final String query = "SELECT e.empno, j.dname\n"
+ "FROM emp e\n"
+ "LEFT JOIN (\n"
+ " SELECT d1.deptno, d1.dname\n"
+ " FROM dept d1\n"
+ " INNER JOIN dept d2 ON d1.deptno = d2.deptno\n"
+ ") AS j ON e.deptno = j.deptno";

// MySQL supports nested join syntax; no additional wrapping select should be added by our fix.
final String expected = "SELECT `EMP`.`EMPNO`, `t`.`DNAME`\n"
+ "FROM `SCOTT`.`EMP`\n"
+ "LEFT JOIN (SELECT `DEPT`.`DEPTNO`, `DEPT`.`DNAME`\n"
+ "FROM `SCOTT`.`DEPT`\n"
+ "INNER JOIN `SCOTT`.`DEPT` AS `DEPT0` ON `DEPT`.`DEPTNO` = `DEPT0`.`DEPTNO`) AS `t` "
+ "ON `EMP`.`DEPTNO` = `t`.`DEPTNO`";

sql(query)
.schema(CalciteAssert.SchemaSpec.JDBC_SCOTT)
.withMysql()
.ok(expected);
}

}
Loading