Skip to content

Conversation

@AlyAbdelmoneim
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

Currently, invalid coerce_int96 values (e.g., "invalid") are accepted at SET time and only fail later when reading Parquet files. This PR adds early validation so invalid values are rejected immediately with clear error messages, following the same pattern as DFWriterVersion.

What changes are included in this PR?

  • Add DFTimeUnit enum with FromStr, Display, ConfigField implementations
  • Change ParquetOptions.coerce_int96 from String/Option<String> to Option<DFTimeUnit>
  • Remove parse_coerce_int96_string as a separate runtime parser; validation now happens at config time
  • Update proto conversions to validate during deserialization
  • Update ParquetFormat::with_coerce_int96 and related usage to accept Option<DFTimeUnit>
  • Add test for early validation

Are these changes tested?

Yes. Added test_parquet_coerce_int96_validation that verifies:

  • Valid values ("ns", "us", "ms", "s") are accepted
  • Invalid values ("invalid") are rejected immediately at SET time
  • Error messages are clear and helpful

Are there any user-facing changes?

Not exactly. Invalid coerce_int96 values now error immediately when set via SET command or proto deserialization, instead of failing later during Parquet file reading. This provides better error messages and earlier feedback.

- Introduce DFTimeUnit enum for INT96 timestamp coercion
- Implement FromStr, Display, and ConfigField for validation and SET support
- Add conversions to/from arrow::TimeUnit for engine integration
- Replace old parse_coerce_int96_string usage with DFTimeUnit
- Ensures invalid config values are rejected early
@github-actions github-actions bot added core Core DataFusion crate common Related to common crate proto Related to proto crate datasource Changes to the datasource crate labels Feb 10, 2026
@AlyAbdelmoneim
Copy link
Contributor Author

Hi @Jefffrey, I’ve been tied up with final exams, so I haven’t made progress on issue #17498 over the past few weeks. I’ll be resuming work on it in the coming days. Please review when you have a chance.

let time_units_and_expected = vec![
(
None, // Same as "ns" time_unit
None, // default: None = "ns"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we treat None as defaulting to ns then perhaps we should remove the Option and just make it directly a DFTimeUnit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I will do that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should I change CoerceInt96Opt too ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm perhaps I was mislead by these test comments; I'm not certain that None does default to ns, especially considering theres some test failures now 🤔

Might need more investigation before proceeding with this change (of removing the Option)

@comphead
Copy link
Contributor

@mbutrovich FYI

The `datafusion.execution.parquet.coerce_int96` configuration option is now a `DFTimeUnit` directly, instead of `Option<DFTimeUnit>`. Its default value is `DFTimeUnit::NS` (Nanosecond).

The `DFTimeUnit` enum was simplified by removing redundant variants (e.g., `Nanosecond`, `Microsecond`), retaining only the abbreviated forms (`NS`, `US`, `MS`, `S`). This standardizes `coerce_int96` to always have a value and cleans up the underlying `DFTimeUnit` enum.

Related code, including protobuf serialization/deserialization and accessor methods, has been updated to reflect this non-optional behavior and simplified enum structure.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate core Core DataFusion crate datasource Changes to the datasource crate proto Related to proto crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants