-
Notifications
You must be signed in to change notification settings - Fork 284
Description
Description
Timestamp truncation (date_trunc / TruncTimestamp) for sub-day formats (HOUR, MINUTE, SECOND, MILLISECOND, MICROSECOND) can be optimized using simple modular arithmetic instead of the current approach that converts to/from DateTime<Tz> via chrono.
Since timestamps are stored internally as microseconds since Unix epoch (UTC), truncation to e.g. SECOND can be computed as:
micros - (micros % MICROS_PER_SECOND)This avoids the overhead of timezone-aware DateTime construction and is significantly faster.
Important: timezone offset considerations
As noted by @parthchandra in #2996 (comment), the arithmetic fast path is not timezone-independent for HOUR (and theoretically MINUTE).
The raw arithmetic truncates the UTC hour, not the local hour. For timezones with non-whole-hour offsets, this produces incorrect results. For example:
Asia/Kathmandu(UTC+5:45): truncating14:00 UTCto HOUR gives14:00 UTC(= 19:45 local), but the correct local-hour truncation would be19:00 local=13:15 UTCAsia/Kolkata(UTC+5:30),Australia/Eucla(UTC+8:45), and others with :30/:45 offsets have the same issue
Safe fast-path formats
- MICROSECOND — always safe (no-op for microsecond timestamps)
- MILLISECOND — always safe (sub-second, no timezone has sub-second offset)
- SECOND — always safe (no timezone has sub-second offset)
- MINUTE — safe in practice (no modern timezone has sub-minute offset, though some historical ones did)
- HOUR — NOT safe for non-whole-hour timezone offsets
Suggested approach
One of:
- Only apply arithmetic fast path for SECOND, MILLISECOND, MICROSECOND — safest, still a meaningful optimization
- Apply fast path for HOUR/MINUTE only when the timezone offset is a whole-hour/whole-minute multiple — check the offset at runtime and fall back to the slow path otherwise
- Adjust arithmetic to account for timezone offset — add offset before truncation, subtract after. This requires resolving the timezone offset (which may vary due to DST), so it may negate the performance benefit for HOUR
The prior PR (#2996) also included optimizations for date_trunc (Date32 truncation) that work directly with days-since-epoch instead of converting through NaiveDateTime. Those optimizations are not affected by this timezone issue and could be submitted separately.
Benchmark results (from prior PR #2996)
The arithmetic approach showed significant speedup for sub-day formats:
Timestamp Truncate - HOUR: ~2-3x faster
Timestamp Truncate - MINUTE: ~2-3x faster
Timestamp Truncate - SECOND: ~2-3x faster
Timestamp Truncate - MILLISECOND: ~2-3x faster
Timestamp Truncate - MICROSECOND: ~10x faster (no-op)