Add native_datafusion V2 DataSource API reader #3481

@mbutrovich

Description

I will do this after #3446 merges.

What is the problem the feature request solves?

While working on #3446, I tested implementing a DataSource V2-compatible native_datafusion scan. I got the tests passing, but then realized that Spark's DataSource V2 Parquet scan supports fewer features than V1; for example, it does not support dynamic partition pruning (DPP). Perhaps Spark implemented the V2 Parquet reader to dogfood the V2 DataSource API without external dependencies, since that API was really designed for sources like Iceberg and Delta.

However, I think newer catalog implementations might return V2 DataSource API references for Parquet tables, so we should probably still support this path.

Describe the potential solution

Implement a CometNativeBatchScanExec operator that converts a Spark BatchScanExec (wrapping a ParquetScan).

This should hopefully serialize to the same proto as CometNativeScanExec and be handled transparently on the native side in planner.rs.
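To make the idea concrete, here is a toy Rust model of the intended shape. Every type and function in it (NativeScanProto, V2Scan, SparkScan, to_native_scan, plan_native_scan) is hypothetical and heavily simplified; it is not Comet's actual proto, the Spark operators, or the planner.rs code. It only illustrates the assumption that a V1 scan and a V2 BatchScanExec wrapping a ParquetScan could lower to one identical message, so the native planner needs no V2-specific branch, while other V2 sources fall back to Spark. Features such as DPP are not modeled.

```rust
// Toy model only: hypothetical, simplified stand-ins. These are NOT the real
// Spark, Comet, or planner.rs types; all names here are illustrative.

/// Stand-in for the serialized message a native Parquet scan would produce.
#[derive(Debug, Clone, PartialEq)]
pub struct NativeScanProto {
    pub file_paths: Vec<String>,
    pub required_columns: Vec<String>,
    pub pushed_filters: Vec<String>,
}

/// Stand-in for the V2 `Scan` that a `BatchScanExec` can wrap.
pub enum V2Scan {
    Parquet {
        file_paths: Vec<String>,
        read_columns: Vec<String>,
        pushed_filters: Vec<String>,
    },
    /// Any other V2 source (e.g. a table-format connector).
    Other(String),
}

/// Stand-in for the two Spark scan operators Comet would see.
pub enum SparkScan {
    /// V1 path, converted today by CometNativeScanExec.
    FileSourceScan {
        file_paths: Vec<String>,
        output_columns: Vec<String>,
        data_filters: Vec<String>,
    },
    /// V2 path, the input the proposed CometNativeBatchScanExec would convert.
    BatchScan(V2Scan),
}

/// Lower either scan flavor to the same message.
/// `None` means "not supported natively; keep the Spark operator".
pub fn to_native_scan(scan: &SparkScan) -> Option<NativeScanProto> {
    match scan {
        SparkScan::FileSourceScan { file_paths, output_columns, data_filters } => {
            Some(NativeScanProto {
                file_paths: file_paths.clone(),
                required_columns: output_columns.clone(),
                pushed_filters: data_filters.clone(),
            })
        }
        SparkScan::BatchScan(V2Scan::Parquet { file_paths, read_columns, pushed_filters }) => {
            Some(NativeScanProto {
                file_paths: file_paths.clone(),
                required_columns: read_columns.clone(),
                pushed_filters: pushed_filters.clone(),
            })
        }
        // Only V2 Parquet scans are in scope; everything else falls back.
        SparkScan::BatchScan(V2Scan::Other(_)) => None,
    }
}

/// Native-side stand-in: it only ever receives `NativeScanProto`,
/// so it has no way (and no need) to tell V1 from V2.
pub fn plan_native_scan(proto: &NativeScanProto) -> String {
    format!(
        "ParquetScan files={} columns={:?} filters={:?}",
        proto.file_paths.len(),
        proto.required_columns,
        proto.pushed_filters
    )
}
```

Given the same files, columns, and filters, to_native_scan returns equal messages for both variants, which is the property that would let planner.rs stay unchanged.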

Additional context

No response

Metadata

Assignees

Labels

enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests