-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Validate coerce int96 config 17498 #20253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Validate coerce int96 config 17498 #20253
Conversation
- Introduce DFTimeUnit enum for INT96 timestamp coercion - Implement FromStr, Display, and ConfigField for validation and SET support - Add conversions to/from arrow::TimeUnit for engine integration - Replace old parse_coerce_int96_string usage with DFTimeUnit - Ensures invalid config values are rejected early
| let time_units_and_expected = vec![ | ||
| ( | ||
| None, // Same as "ns" time_unit | ||
| None, // default: None = "ns" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we treat None as defaulting to ns then perhaps we should remove the Option and just make it directly a DFTimeUnit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I will do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should I change CoerceInt96Opt too ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm perhaps I was mislead by these test comments; I'm not certain that None does default to ns, especially considering theres some test failures now 🤔
Might need more investigation before proceeding with this change (of removing the Option)
|
@mbutrovich FYI |
The `datafusion.execution.parquet.coerce_int96` configuration option is now a `DFTimeUnit` directly, instead of `Option<DFTimeUnit>`. Its default value is `DFTimeUnit::NS` (Nanosecond). The `DFTimeUnit` enum was simplified by removing redundant variants (e.g., `Nanosecond`, `Microsecond`), retaining only the abbreviated forms (`NS`, `US`, `MS`, `S`). This standardizes `coerce_int96` to always have a value and cleans up the underlying `DFTimeUnit` enum. Related code, including protobuf serialization/deserialization and accessor methods, has been updated to reflect this non-optional behavior and simplified enum structure.
Which issue does this PR close?
Rationale for this change
Currently, invalid
coerce_int96values (e.g., "invalid") are accepted atSETtime and only fail later when reading Parquet files. This PR adds early validation so invalid values are rejected immediately with clear error messages, following the same pattern asDFWriterVersion.What changes are included in this PR?
DFTimeUnitenum withFromStr,Display,ConfigFieldimplementationsParquetOptions.coerce_int96fromString/Option<String>toOption<DFTimeUnit>parse_coerce_int96_stringas a separate runtime parser; validation now happens at config timeParquetFormat::with_coerce_int96and related usage to acceptOption<DFTimeUnit>Are these changes tested?
Yes. Added
test_parquet_coerce_int96_validationthat verifies:Are there any user-facing changes?
Not exactly. Invalid
coerce_int96values now error immediately when set viaSETcommand or proto deserialization, instead of failing later during Parquet file reading. This provides better error messages and earlier feedback.