Deprecate ExprSchema and ExprSchemable helpers (#15847) #20281
There seem to be unrelated changes bleeding into this PR.
Force-pushed from bc1538d to b27d2b6.
Thanks for flagging this, @Jefffrey, you were right. The branch had picked up an unrelated commit (clone_on_ref_ptr) and was behind main, so the diff showed extra workflow and sqllogictest changes.
…che#15847)
- ExprSchema (datafusion_common): deprecate nullable, data_type, metadata, data_type_and_nullable with note to use field_from_column
- ExprSchemable (datafusion_expr): deprecate get_type, nullable, metadata, data_type_and_nullable with note to use to_field
- Migrate internal call sites to to_field / field_from_column
- Fix test_inlist_nullability (long list threshold), test_like_nullability (match current Like nullable), rewrite_sort_cols_by_agg_alias (unqualified cols)
- Address clippy: collapsible_if, expect(deprecated) for Wildcard
…llow 0-row in Display
Force-pushed from 0873a4a to 16d3c54.
Jefffrey left a comment
Thanks for submitting this PR. While I do agree with these changes in theory, in practice it seems to make the calling code much less ergonomic 🤔
Also, I'm having a bit of trouble understanding some seemingly unrelated changes made here; could we narrow the PR to only what is strictly necessary, to make reviewing easier (considering the PR is already big as it is)?
I see mention in the PR body (which I'll add doesn't seem properly formatted so it is a bit hard to read):
Test fixes (no production logic changed): test_inlist_nullability “long list” case now uses 7 elements so it hits the list.len() > 6 path; test_like_nullability expects nullable for Like/SimilarTo (current implementation always returns nullable); rewrite_sort_cols_by_agg_alias expected values use Column::new_unqualified so they match the rewriter’s unqualified output column names. Clippy: addressed collapsible_if and added #[expect(deprecated)] for the Expr::Wildcard match arm in expr_schema.rs.
If we intend to improve the tests, perhaps we should do that in a separate PR, to cleanly separate it from what is required for the original issue.
// ScalarValue Struct may have 0 rows (e.g. empty array not foldable) or 1 row
if struct_arr.is_empty() {
    return write!(f, "Struct({{}})");
}
This doesn't seem correct; shouldn't a scalar always have length 1? Also, this fix seems unrelated to the issue at hand.
  desc: r#"min(c2) --> "min(c2)" -- (column *named* "min(t.c2)"!)"#,
  input: sort(min(col("c2"))),
- expected: sort(col("min(t.c2)")),
+ expected: sort(Expr::Column(Column::new_unqualified("min(t.c2)"))),
Why is this being changed?
  #[test]
  fn test_simplify_composed_bitwise_and() {
-     // ((c2 > 5) & (c1 < 6)) & (c2 > 5) --> (c2 > 5) & (c1 < 6)
+     // ((c3 & 1) & (c3 & 2)) & (c3 & 1) --> simplified (duplicate folded)
Why are these tests being changed?
Which issue does this PR close?
Rationale for this change
As a follow-on to #15646 (Extension Type / Metadata support for Scalar UDFs), the narrow helpers on ExprSchema and ExprSchemable can be deprecated in favor of the single “get the field” API. Callers should use field_from_column / to_field and then read type, nullability, and metadata from the returned field, giving one consistent way to resolve expression/column schema info and allowing the deprecated methods to be removed in a later release.
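The "get the field once, then read everything from it" pattern described above might look roughly like the sketch below from a caller's point of view. It is deliberately self-contained: DataType, Field, Schema, and describe are hypothetical stand-ins written for this illustration, not the real arrow or DataFusion types, and the real field_from_column / to_field signatures may differ.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for illustration only; NOT the real arrow/DataFusion types.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Utf8,
}

#[derive(Debug, Clone)]
pub struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
    metadata: HashMap<String, String>,
}

impl Field {
    pub fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field {
            name: name.to_string(),
            data_type,
            nullable,
            metadata: HashMap::new(),
        }
    }

    pub fn with_metadata(mut self, metadata: HashMap<String, String>) -> Self {
        self.metadata = metadata;
        self
    }

    pub fn data_type(&self) -> &DataType {
        &self.data_type
    }

    pub fn is_nullable(&self) -> bool {
        self.nullable
    }

    pub fn metadata(&self) -> &HashMap<String, String> {
        &self.metadata
    }
}

pub struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    pub fn new(fields: Vec<Field>) -> Self {
        Schema { fields }
    }

    // Assumed shape of the single "get the field" entry point.
    pub fn field_from_column(&self, col: &str) -> Result<&Field, String> {
        self.fields
            .iter()
            .find(|f| f.name == col)
            .ok_or_else(|| format!("no field named {col}"))
    }
}

// One lookup; type, nullability, and metadata are all read from the same field.
pub fn describe(schema: &Schema, col: &str) -> Result<String, String> {
    let field = schema.field_from_column(col)?;
    Ok(format!(
        "{}: {:?}, nullable={}, metadata_keys={}",
        col,
        field.data_type(),
        field.is_nullable(),
        field.metadata().len()
    ))
}
```

The point of the sketch is only that a caller performs a single schema lookup instead of three separate helper calls, which is the consistency argument the rationale makes.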
What changes are included in this PR?
- ExprSchema (datafusion_common): Deprecate nullable(col), data_type(col), metadata(col), and data_type_and_nullable(col) with #[deprecated(since = "53.0.0", note = "use field_from_column")]. Default implementations delegate to field_from_column(col).
- ExprSchemable (datafusion_expr): Deprecate get_type(schema), nullable(schema), and metadata(schema) with #[deprecated(since = "53.0.0", note = "use to_field")]; data_type_and_nullable was already deprecated (since 51.0.0). Default implementations delegate to to_field(schema).
- All internal uses of these methods are updated to to_field / field_from_column across expr, optimizer, sql, spark, substrait, examples, and docs.
- Test fixes (no production logic changed): test_inlist_nullability “long list” case now uses 7 elements so it hits the list.len() > 6 path; test_like_nullability expects nullable for Like/SimilarTo (current implementation always returns nullable); rewrite_sort_cols_by_agg_alias expected values use Column::new_unqualified so they match the rewriter’s unqualified output column names.
- Clippy: addressed collapsible_if and added #[expect(deprecated)] for the Expr::Wildcard match arm in expr_schema.rs.
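As an illustration of the deprecation approach described here (deprecated helpers kept as default trait methods that delegate to the single field accessor), a minimal self-contained sketch follows. ExprSchemaSketch, VecSchema, Field, DataType, legacy_nullable, and migrated_nullable are hypothetical names invented for this example; the actual DataFusion traits have different signatures and error types.

```rust
// Hypothetical stand-ins for illustration only; NOT the real DataFusion definitions.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
}

#[derive(Debug, Clone)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

pub trait ExprSchemaSketch {
    // The one method implementors must provide.
    fn field_from_column(&self, col: &str) -> Result<&Field, String>;

    // Deprecated helper: kept so existing callers still compile,
    // but it simply delegates to field_from_column.
    #[deprecated(since = "53.0.0", note = "use field_from_column")]
    fn nullable(&self, col: &str) -> Result<bool, String> {
        Ok(self.field_from_column(col)?.nullable)
    }

    #[deprecated(since = "53.0.0", note = "use field_from_column")]
    fn data_type(&self, col: &str) -> Result<&DataType, String> {
        Ok(&self.field_from_column(col)?.data_type)
    }
}

pub struct VecSchema(pub Vec<Field>);

impl ExprSchemaSketch for VecSchema {
    fn field_from_column(&self, col: &str) -> Result<&Field, String> {
        self.0
            .iter()
            .find(|f| f.name == col)
            .ok_or_else(|| format!("no field named {col}"))
    }
}

// A legacy call site: still works, but warns unless the lint is allowed.
#[allow(deprecated)]
pub fn legacy_nullable(schema: &impl ExprSchemaSketch, col: &str) -> Result<bool, String> {
    schema.nullable(col)
}

// The migrated call site: fetch the field, then read the property from it.
pub fn migrated_nullable(schema: &impl ExprSchemaSketch, col: &str) -> Result<bool, String> {
    Ok(schema.field_from_column(col)?.nullable)
}
```

Because the deprecated methods are default implementations that delegate, an implementor such as VecSchema only has to supply field_from_column, and legacy and migrated call sites return the same result.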
Are these changes tested?
Yes. Existing tests cover the behavior; no new tests were added. The three test updates only align expectations with current implementation. cargo test -p datafusion-expr --lib and the broader test suite pass. Deprecated methods keep default implementations so behavior is unchanged for any remaining external callers until they migrate.
Are there any user-facing changes?
Yes. ExprSchema methods data_type, nullable, metadata, data_type_and_nullable are deprecated in favor of field_from_column(col) then the field’s accessors. ExprSchemable methods get_type, nullable, metadata, data_type_and_nullable are deprecated in favor of to_field(schema) then the field’s .data_type(), .is_nullable(), .metadata(). This is deprecation only; no APIs are removed and behavior is unchanged. The user guide (working-with-exprs.md) already documents using to_field for expression type/nullability.