@comphead
Contributor

  • DataFusion 52 migration

Which issue does this PR close?

Closes #3046.

This PR is on a shared branch and replaces #3052.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

* DataFusion 52 migration
@comphead
Contributor Author

@andygrove @mbutrovich cc

@comphead comphead changed the title DataFusion 52 migration chore: DataFusion 52 migration Feb 10, 2026
andygrove and others added 2 commits February 10, 2026 12:21
…3471)

DataFusion 52's arrow-arith kernels only support Date32 +/- Interval
types, not raw integers. When Spark sends Date32 + Int8/Int16/Int32
arithmetic, the planner now routes these operations to the Spark
date_add/date_sub UDFs which handle integer types directly.
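
The routing rule described above can be sketched roughly as follows. This is a simplified, self-contained model and not the actual Comet planner code: the `DataType`, `Op`, `Plan` enums and the `plan_date_arithmetic` function are hypothetical names introduced for illustration, and only the `date_add`/`date_sub` UDF names come from the commit message.

```rust
// Hypothetical, simplified model of the planner decision (not Comet's real types).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Date32,
    Int8,
    Int16,
    Int32,
    Interval,
    Other,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Plus,
    Minus,
}

#[derive(Debug, PartialEq)]
enum Plan {
    /// Call a Spark-compatible UDF by name.
    SparkUdf(&'static str),
    /// Leave the operation to the default arithmetic kernels.
    ArrowKernel,
}

/// Route `Date32 +/- small integer` to the Spark UDFs; everything else
/// (including `Date32 +/- Interval`) stays on the default path.
/// The commuted form `Int + Date32` is not modeled in this sketch.
fn plan_date_arithmetic(left: DataType, op: Op, right: DataType) -> Plan {
    use DataType::*;
    match (left, op, right) {
        (Date32, Op::Plus, Int8 | Int16 | Int32) => Plan::SparkUdf("date_add"),
        (Date32, Op::Minus, Int8 | Int16 | Int32) => Plan::SparkUdf("date_sub"),
        _ => Plan::ArrowKernel,
    }
}
```

Under these assumptions, `Date32 + Int32` resolves to `date_add`, while `Date32 + Interval` is left to the arithmetic kernel.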

Co-authored-by: Claude Opus 4.6 <[email protected]>
@comphead
Contributor Author

Some array function tests fail with: Cause: org.apache.comet.CometNativeException: index out of bounds: the len is 3 but the index is 3. Possibly related issue: #3338

comphead and others added 3 commits February 11, 2026 09:21
DataFusion 52's default PhysicalExprAdapter can fail when casting
complex nested types (List<Struct>, Map) between physical and logical
schemas. This adds a fallback path in SparkPhysicalExprAdapter that
wraps type-mismatched columns with CometCastColumnExpr using
spark_parquet_convert for the actual conversion.

Changes to CometCastColumnExpr:
- Add optional SparkParquetOptions for complex nested type conversions
- Use == instead of equals_datatype to detect field name differences
  in nested types (Struct, List, Map)
- Add relabel_array for types that differ only in field names (e.g.,
  List element "item" vs "element", Map "key_value" vs "entries")
- Fallback to spark_parquet_convert for structural nested type changes

Changes to SparkPhysicalExprAdapter:
- Try default adapter first, fall back to wrap_all_type_mismatches
  when it fails on complex nested types
- Route Struct/List/Map casts to CometCastColumnExpr instead of
  Spark Cast, which doesn't handle nested type rewriting
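
The try-then-fall-back flow described in this commit can be sketched as below. This is a toy model under stated assumptions, not the real `SparkPhysicalExprAdapter`: the `Ty` and `Rewritten` enums and the `default_adapter` and `adapt` functions are hypothetical, and the stand-in default adapter is simply assumed to fail on any nested-type mismatch. Only the name `wrap_all_type_mismatches` and the routing of Struct/List/Map casts to `CometCastColumnExpr` are taken from the commit message.

```rust
// Toy schema types; each pair below is (physical type, logical type).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int32,
    Utf8,
    List(Box<Ty>),
    Struct(Vec<(String, Ty)>),
    Map(Box<Ty>, Box<Ty>),
}

#[derive(Debug, PartialEq)]
enum Rewritten {
    Unchanged,       // types already match
    DefaultCast,     // handled by the default adapter
    SparkCast,       // fallback path, non-nested mismatch
    CometCastColumn, // fallback path, Struct/List/Map mismatch
}

fn is_nested(t: &Ty) -> bool {
    matches!(t, Ty::List(_) | Ty::Struct(_) | Ty::Map(_, _))
}

/// Stand-in for the default adapter. ASSUMPTION: it fails as a whole
/// when any mismatched column involves a nested type.
fn default_adapter(cols: &[(Ty, Ty)]) -> Result<Vec<Rewritten>, String> {
    cols.iter()
        .map(|(physical, logical)| {
            if physical == logical {
                Ok(Rewritten::Unchanged)
            } else if is_nested(physical) || is_nested(logical) {
                Err(format!("cannot cast {:?} to {:?}", physical, logical))
            } else {
                Ok(Rewritten::DefaultCast)
            }
        })
        .collect()
}

/// Fallback: wrap every mismatched column, routing nested types to the
/// Comet cast expression and everything else to a Spark cast.
fn wrap_all_type_mismatches(cols: &[(Ty, Ty)]) -> Vec<Rewritten> {
    cols.iter()
        .map(|(physical, logical)| {
            if physical == logical {
                Rewritten::Unchanged
            } else if is_nested(physical) || is_nested(logical) {
                Rewritten::CometCastColumn
            } else {
                Rewritten::SparkCast
            }
        })
        .collect()
}

/// Try the default adapter first; fall back only when it fails.
fn adapt(cols: &[(Ty, Ty)]) -> Vec<Rewritten> {
    default_adapter(cols).unwrap_or_else(|_| wrap_all_type_mismatches(cols))
}
```

In this model a schema with only flat mismatches never reaches the fallback, so the extra wrapping is applied only to schemas where the default path fails.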

Co-authored-by: Claude Opus 4.6 <[email protected]>
@andygrove
Member

@sqlbenchmark run tpch --iterations 3

…3493)

* fix: make relabel_array recursive for nested type mismatches

The shallow ArrayData type swap in relabel_array caused panics when
Arrow's ArrayData::build() validated child types recursively. This
rebuilds arrays from typed constructors (ListArray, LargeListArray,
MapArray, StructArray) so nested field name and metadata differences
are handled correctly.
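
The difference between a shallow type swap and a recursive rebuild can be illustrated with a small model. This is a sketch only: the `Array` and `Ty` enums and this `relabel` function are hypothetical stand-ins, not Arrow's `ArrayData` or the real `relabel_array`, and matching struct fields by position is an assumption of the sketch.

```rust
// Toy array model: each nested array carries its own field names,
// so relabeling only the outermost level leaves children inconsistent.
#[derive(Debug, Clone, PartialEq)]
enum Array {
    Int32(Vec<i32>),
    List { field: String, offsets: Vec<usize>, values: Box<Array> },
    Struct { fields: Vec<(String, Array)> },
}

// Toy target type carrying the desired field names.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int32,
    List(String, Box<Ty>),
    Struct(Vec<(String, Ty)>),
}

/// Rebuild `array` with the field names of `target`, recursing into
/// children. The data (values, offsets) is moved through unchanged.
fn relabel(array: Array, target: &Ty) -> Result<Array, String> {
    match (array, target) {
        (Array::Int32(v), Ty::Int32) => Ok(Array::Int32(v)),
        (Array::List { offsets, values, .. }, Ty::List(name, child)) => Ok(Array::List {
            field: name.clone(),
            offsets,
            values: Box::new(relabel(*values, child)?),
        }),
        (Array::Struct { fields }, Ty::Struct(target_fields))
            if fields.len() == target_fields.len() =>
        {
            // ASSUMPTION: fields are matched by position, not by name.
            let fields = fields
                .into_iter()
                .zip(target_fields)
                .map(|((_, col), (name, ty))| relabel(col, ty).map(|c| (name.clone(), c)))
                .collect::<Result<Vec<_>, String>>()?;
            Ok(Array::Struct { fields })
        }
        // Structural differences are out of scope for relabeling.
        (other, ty) => Err(format!("cannot relabel {:?} as {:?}", other, ty)),
    }
}
```

Because every level is rebuilt against the target, a list whose element is a struct ends up with consistent names at both levels, while a structural mismatch is reported as an error instead of producing an invalid array.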

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* style: run cargo fmt

Co-Authored-By: Claude Opus 4.6 <[email protected]>

---------

Co-authored-by: Claude Opus 4.6 <[email protected]>
@andygrove
Member

@sqlbenchmark run tpch --iterations 3

Benchmarks failed with OOM on q19
