Skip to content

Conversation

@JoshElkind
Copy link

@JoshElkind JoshElkind commented Feb 10, 2026

Which issue does this PR close?

Rationale for this change

As a follow-on to #15646 (Extension Type / Metadata support for Scalar UDFs), the narrow helpers on ExprSchema and ExprSchemable can be deprecated in favor of the single “get the field” API. Callers should use field_from_column / to_field and then read type, nullability, and metadata from the returned field, giving one consistent way to resolve expression/column schema info and allowing the deprecated methods to be removed in a later release.

What changes are included in this PR?

ExprSchema (datafusion_common): Deprecate nullable(col), data_type(col), metadata(col), and data_type_and_nullable(col) with #[deprecated(since = "53.0.0", note = "use field_from_column")]. Default implementations delegate to field_from_column(col). ExprSchemable (datafusion_expr): Deprecate get_type(schema), nullable(schema), and metadata(schema) with #[deprecated(since = "53.0.0", note = "use to_field")]; data_type_and_nullable was already deprecated (since 51.0.0). Default implementations delegate to to_field(schema). All internal uses of these methods are updated to to_field / field_from_column across expr, optimizer, sql, spark, substrait, examples, and docs. Test fixes (no production logic changed): test_inlist_nullability “long list” case now uses 7 elements so it hits the list.len() > 6 path; test_like_nullability expects nullable for Like/SimilarTo (current implementation always returns nullable); rewrite_sort_cols_by_agg_alias expected values use Column::new_unqualified so they match the rewriter’s unqualified output column names. Clippy: addressed collapsible_if and added #[expect(deprecated)] for the Expr::Wildcard match arm in expr_schema.rs.

Are these changes tested?

Yes. Existing tests cover the behavior; no new tests were added. The three test updates only align expectations with current implementation. cargo test -p datafusion-expr --lib and the broader test suite pass. Deprecated methods keep default implementations so behavior is unchanged for any remaining external callers until they migrate.

Are there any user-facing changes?

Yes. ExprSchema methods data_type, nullable, metadata, data_type_and_nullable are deprecated in favor of field_from_column(col) then the field’s accessors. ExprSchemable methods get_type, nullable, metadata, data_type_and_nullable are deprecated in favor of to_field(schema) then the field’s .data_type(), .is_nullable(), .metadata(). This is deprecation only; no APIs are removed and behavior is unchanged. The user guide (working-with-exprs.md) already documents using to_field for expression type/nullability.

@github-actions github-actions bot added documentation Improvements or additions to documentation sql SQL Planner development-process Related to development process of DataFusion logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate catalog Related to the catalog crate common Related to common crate execution Related to the execution crate proto Related to proto crate functions Changes to functions implementation datasource Changes to the datasource crate ffi Changes to the ffi crate physical-plan Changes to the physical-plan crate spark labels Feb 10, 2026
@Jefffrey
Copy link
Contributor

There seem to be unrelated changes bleeding into this PR

@JoshElkind JoshElkind force-pushed the deprecate-exprschema-funcs branch from bc1538d to b27d2b6 Compare February 11, 2026 03:50
@github-actions github-actions bot removed development-process Related to development process of DataFusion physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) catalog Related to the catalog crate execution Related to the execution crate proto Related to proto crate functions Changes to functions implementation datasource Changes to the datasource crate ffi Changes to the ffi crate physical-plan Changes to the physical-plan crate labels Feb 11, 2026
@JoshElkind
Copy link
Author

Thanks for flagging this, @Jefffrey, you were right. The branch had picked up an unrelated commit (clone_on_ref_ptr) and was behind main, so the diff showed extra workflow and sqllogictest changes.
I’ve rebased so the PR now only contains the ExprSchema/ExprSchemable deprecation: a single commit on top of current main, touching 25 files (all deprecation-related). The diff should be clean now; please take another look when you have a chance.

@github-actions github-actions bot added development-process Related to development process of DataFusion physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) catalog Related to the catalog crate execution Related to the execution crate proto Related to proto crate functions Changes to functions implementation datasource Changes to the datasource crate ffi Changes to the ffi crate physical-plan Changes to the physical-plan crate labels Feb 11, 2026
…che#15847)

- ExprSchema (datafusion_common): deprecate nullable, data_type, metadata,
  data_type_and_nullable with note to use field_from_column
- ExprSchemable (datafusion_expr): deprecate get_type, nullable, metadata,
  data_type_and_nullable with note to use to_field
- Migrate internal call sites to to_field / field_from_column
- Fix test_inlist_nullability (long list threshold), test_like_nullability
  (match current Like nullable), rewrite_sort_cols_by_agg_alias (unqualified cols)
- Address clippy: collapsible_if, expect(deprecated) for Wildcard
@JoshElkind JoshElkind force-pushed the deprecate-exprschema-funcs branch from 0873a4a to 16d3c54 Compare February 11, 2026 07:43
@github-actions github-actions bot removed development-process Related to development process of DataFusion physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) catalog Related to the catalog crate execution Related to the execution crate proto Related to proto crate functions Changes to functions implementation datasource Changes to the datasource crate ffi Changes to the ffi crate physical-plan Changes to the physical-plan crate labels Feb 11, 2026
Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting this PR. While I do agree with these changes in theory, in practice it seems to make the calling code much less ergonomic 🤔

Also I'm having a bit of trouble understanding some seemingly unrelated changes made here; could we narrow the PR to only what is strictly necessary to make reviewing easier (considering the PR is already big as it is)

I see mention in the PR body (which I'll add doesn't seem properly formatted so it is a bit hard to read):

Test fixes (no production logic changed): test_inlist_nullability “long list” case now uses 7 elements so it hits the list.len() > 6 path; test_like_nullability expects nullable for Like/SimilarTo (current implementation always returns nullable); rewrite_sort_cols_by_agg_alias expected values use Column::new_unqualified so they match the rewriter’s unqualified output column names. Clippy: addressed collapsible_if and added #[expect(deprecated)] for the Expr::Wildcard match arm in expr_schema.rs.

If we intend to improve the tests perhaps we should do that in a separate PR to cleanly separate what is required for the original issue

Comment on lines +5175 to +5178
// ScalarValue Struct may have 0 rows (e.g. empty array not foldable) or 1 row
if struct_arr.is_empty() {
return write!(f, "Struct({{}})");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem correct; a scalar should always have length 1? Also this fix seems unrelated to the issue at hand

desc: r#"min(c2) --> "min(c2)" -- (column *named* "min(t.c2)"!)"#,
input: sort(min(col("c2"))),
expected: sort(col("min(t.c2)")),
expected: sort(Expr::Column(Column::new_unqualified("min(t.c2)"))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this being changed?

#[test]
fn test_simplify_composed_bitwise_and() {
// ((c2 > 5) & (c1 < 6)) & (c2 > 5) --> (c2 > 5) & (c1 < 6)
// ((c3 & 1) & (c3 & 2)) & (c3 & 1) --> simplified (duplicate folded)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these tests being changed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate core Core DataFusion crate documentation Improvements or additions to documentation logical-expr Logical plan and expressions optimizer Optimizer rules spark sql SQL Planner substrait Changes to the substrait crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deprecate ExprSchema functions

2 participants