Description
Describe the bug
Comet returns NaN from corr(c20, c6) for at least one group where Spark returns NULL.
SQL
SELECT c3, c42, corr(c20, c6) FROM test0 GROUP BY c3,c42 ORDER BY c3, c42;
Spark Plan
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(3) Sort [c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST], true, 0
   +- AQEShuffleRead coalesced
      +- ShuffleQueryStage 1
         +- Exchange rangepartitioning(c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=91129]
            +- *(2) HashAggregate(keys=[c3#3, c42#42], functions=[corr(c20#20, c6#6)], output=[c3#3, c42#42, corr(c20, c6)#28057])
               +- AQEShuffleRead coalesced
                  +- ShuffleQueryStage 0
                     +- Exchange hashpartitioning(c3#3, c42#42, 200), ENSURE_REQUIREMENTS, [plan_id=91101]
                        +- *(1) HashAggregate(keys=[c3#3, c42#42], functions=[partial_corr(c20#20, c6#6)], output=[c3#3, c42#42, n#28038, xAvg#28039, yAvg#28040, ck#28041, xMk#28042, yMk#28043])
                           +- *(1) ColumnarToRow
                              +- FileScan parquet [c3#3,c6#6,c20#20,c42#42] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c3:int,c6:double,c20:double,c42:array<timestamp_ntz>>
+- == Initial Plan ==
   Sort [c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=91083]
      +- HashAggregate(keys=[c3#3, c42#42], functions=[corr(c20#20, c6#6)], output=[c3#3, c42#42, corr(c20, c6)#28057])
         +- Exchange hashpartitioning(c3#3, c42#42, 200), ENSURE_REQUIREMENTS, [plan_id=91080]
            +- HashAggregate(keys=[c3#3, c42#42], functions=[partial_corr(c20#20, c6#6)], output=[c3#3, c42#42, n#28038, xAvg#28039, yAvg#28040, ck#28041, xMk#28042, yMk#28043])
               +- FileScan parquet [c3#3,c6#6,c20#20,c42#42] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c3:int,c6:double,c20:double,c42:array<timestamp_ntz>>
Comet Plan
AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(1) CometColumnarToRow
   +- CometSort [c3#3, c42#42, corr(c20, c6)#28174], [c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST]
      +- AQEShuffleRead coalesced
         +- ShuffleQueryStage 1
            +- CometColumnarExchange rangepartitioning(c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=91263]
               +- CometHashAggregate [c3#3, c42#42, n#28155, xAvg#28156, yAvg#28157, ck#28158, xMk#28159, yMk#28160], Final, [c3#3, c42#42], [corr(c20#20, c6#6)]
                  +- AQEShuffleRead coalesced
                     +- ShuffleQueryStage 0
                        +- CometColumnarExchange hashpartitioning(c3#3, c42#42, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=91217]
                           +- CometHashAggregate [c3#3, c6#6, c20#20, c42#42], Partial, [c3#3, c42#42], [partial_corr(c20#20, c6#6)]
                              +- CometScan [native_iceberg_compat] parquet [c3#3,c6#6,c20#20,c42#42] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c3:int,c6:double,c20:double,c42:array<timestamp_ntz>>
+- == Initial Plan ==
   CometSort [c3#3, c42#42, corr(c20, c6)#28174], [c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST]
   +- CometColumnarExchange rangepartitioning(c3#3 ASC NULLS FIRST, c42#42 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=91198]
      +- CometHashAggregate [c3#3, c42#42, n#28155, xAvg#28156, yAvg#28157, ck#28158, xMk#28159, yMk#28160], Final, [c3#3, c42#42], [corr(c20#20, c6#6)]
         +- CometColumnarExchange hashpartitioning(c3#3, c42#42, 200), ENSURE_REQUIREMENTS, CometColumnarShuffle, [plan_id=91196]
            +- CometHashAggregate [c3#3, c6#6, c20#20, c42#42], Partial, [c3#3, c42#42], [partial_corr(c20#20, c6#6)]
               +- CometScan [native_iceberg_compat] parquet [c3#3,c6#6,c20#20,c42#42] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c3:int,c6:double,c20:double,c42:array<timestamp_ntz>>
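Both plans split corr into a Partial step that emits the buffer fields n, xAvg, yAvg, ck, xMk and yMk, and a Final step that runs after the hash shuffle. The Python sketch below models that two-phase state, assuming the buffer is maintained with the standard online co-moment update (Welford style) and the pairwise merge of Chan et al. The field meanings are inferred from their names, and `CorrState`, `update`, `merge` and `fold` are hypothetical names, not Spark or Comet code.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class CorrState:
    """Aggregation buffer named after the n/xAvg/yAvg/ck/xMk/yMk fields in the plans."""
    n: float = 0.0      # number of rows where both inputs are non-NULL
    x_avg: float = 0.0  # running mean of x
    y_avg: float = 0.0  # running mean of y
    ck: float = 0.0     # co-moment: sum of (x - x_avg) * (y - y_avg)
    x_mk: float = 0.0   # sum of squared deviations of x from its mean
    y_mk: float = 0.0   # sum of squared deviations of y from its mean


def update(s: CorrState, x: Optional[float], y: Optional[float]) -> CorrState:
    """Partial step: fold one input row into the buffer, skipping rows with a NULL input."""
    if x is None or y is None:
        return s
    n = s.n + 1.0
    dx = x - s.x_avg  # deviation from the old mean
    dy = y - s.y_avg
    x_avg = s.x_avg + dx / n
    y_avg = s.y_avg + dy / n
    return CorrState(
        n,
        x_avg,
        y_avg,
        s.ck + dx * (y - y_avg),  # old-mean deviation times new-mean deviation
        s.x_mk + dx * (x - x_avg),
        s.y_mk + dy * (y - y_avg),
    )


def merge(a: CorrState, b: CorrState) -> CorrState:
    """Final step: combine two partial buffers that arrived through the shuffle."""
    n = a.n + b.n
    if n == 0.0:
        return CorrState()
    dx = b.x_avg - a.x_avg
    dy = b.y_avg - a.y_avg
    w = a.n * b.n / n  # weight of the cross term between the two partitions
    return CorrState(
        n,
        a.x_avg + dx * b.n / n,
        a.y_avg + dy * b.n / n,
        a.ck + b.ck + dx * dy * w,
        a.x_mk + b.x_mk + dx * dx * w,
        a.y_mk + b.y_mk + dy * dy * w,
    )


def fold(rows: Iterable[Tuple[Optional[float], Optional[float]]]) -> CorrState:
    """Run the partial step over one partition's rows, starting from an empty buffer."""
    return reduce(lambda s, r: update(s, r[0], r[1]), rows, CorrState())
```

With this model, folding rows split across two partitions and merging the results gives the same n, ck, xMk and yMk as a single pass, up to floating-point rounding. That is why the Partial and Final aggregates can sit on either side of the shuffle.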
First difference at row 150:
Spark: 1190973260,[3333-01-21T01:11:48.781],NULL
Comet: 1190973260,[3333-01-21T01:11:48.781],NaN
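One possible cause, not confirmed by the output above, is a group whose final buffer makes the last step compute 0 / 0. This happens, for example, when only one row has both c20 and c6 non-NULL (n == 1), or when one input has zero variance. As far as I know, Spark's corr returns NULL for n == 1 unless `spark.sql.legacy.statisticalAggregate` is enabled, in which case it returns NaN. A plain IEEE-754 division yields NaN instead. The Python sketch below contrasts the two rules on the final buffer values. `corr_spark` and `corr_ieee` are hypothetical names. The zero-variance branch of `corr_spark` is an assumption that Spark's non-ANSI division returns NULL for a zero divisor.

```python
import math
from typing import Optional


def corr_spark(n: float, ck: float, x_mk: float, y_mk: float) -> Optional[float]:
    """Model of Spark's final corr step with spark.sql.legacy.statisticalAggregate=false."""
    if n == 0.0:
        return None  # no row with both inputs non-NULL
    if n == 1.0:
        return None  # divide-by-zero case; legacy mode would return NaN instead
    denom = math.sqrt(x_mk * y_mk)
    if denom == 0.0:
        return None  # assumption: non-ANSI division by zero yields NULL
    return ck / denom


def corr_ieee(n: float, ck: float, x_mk: float, y_mk: float) -> Optional[float]:
    """Hypothetical evaluation that special-cases only n == 0 and otherwise follows IEEE-754."""
    if n == 0.0:
        return None
    denom = math.sqrt(x_mk * y_mk)
    if denom == 0.0:
        # Python raises ZeroDivisionError on float division by zero, so emulate IEEE-754:
        # 0/0 is NaN, and a nonzero numerator over 0 is a signed infinity.
        return math.nan if ck == 0.0 else math.copysign(math.inf, ck)
    return ck / denom


# A single-row group: every central moment is 0, so the final step is 0 / 0.
print(corr_spark(1.0, 0.0, 0.0, 0.0))  # None (NULL)
print(corr_ieee(1.0, 0.0, 0.0, 0.0))   # nan
```

The two rules diverge only in these degenerate branches. For a group with n >= 2 and nonzero variance in both inputs, they return the same value.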
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response