Skip to content

perf: optimize timestamp truncation for sub-day formats #3477

@andygrove

Description

@andygrove

Description

Timestamp truncation (date_trunc / TruncTimestamp) for sub-day formats (HOUR, MINUTE, SECOND, MILLISECOND, MICROSECOND) can be optimized using simple modular arithmetic instead of the current approach that converts to/from DateTime<Tz> via chrono.

Since timestamps are stored internally as microseconds since Unix epoch (UTC), truncation to e.g. SECOND can be computed as:

micros - (micros % MICROS_PER_SECOND)

This avoids the overhead of timezone-aware DateTime construction and is significantly faster.

Important: timezone offset considerations

As noted by @parthchandra in #2996 (comment), the arithmetic fast path is not timezone-independent for HOUR (and theoretically MINUTE).

The raw arithmetic truncates the UTC hour, not the local hour. For timezones with non-whole-hour offsets, this produces incorrect results. For example:

  • Asia/Kathmandu (UTC+5:45): truncating 14:00 UTC to HOUR gives 14:00 UTC (= 19:45 local), but the correct local-hour truncation would be 19:00 local = 13:15 UTC
  • Asia/Kolkata (UTC+5:30), Australia/Eucla (UTC+8:45), and others with :30/:45 offsets have the same issue

Safe fast-path formats

  • MICROSECOND — always safe (no-op for microsecond timestamps)
  • MILLISECOND — always safe (sub-second, no timezone has sub-second offset)
  • SECOND — always safe (no timezone has sub-second offset)
  • MINUTE — safe in practice (no modern timezone has sub-minute offset, though some historical ones did)
  • HOURNOT safe for non-whole-hour timezone offsets

Suggested approach

One of:

  1. Only apply arithmetic fast path for SECOND, MILLISECOND, MICROSECOND — safest, still a meaningful optimization
  2. Apply fast path for HOUR/MINUTE only when the timezone offset is a whole-hour/whole-minute multiple — check the offset at runtime and fall back to the slow path otherwise
  3. Adjust arithmetic to account for timezone offset — add offset before truncation, subtract after. This requires resolving the timezone offset (which may vary due to DST), so it may negate the performance benefit for HOUR

The prior PR (#2996) also included optimizations for date_trunc (Date32 truncation) that work directly with days-since-epoch instead of converting through NaiveDateTime. Those optimizations are not affected by this timezone issue and could be submitted separately.

Benchmark results (from prior PR #2996)

The arithmetic approach showed significant speedup for sub-day formats:

Timestamp Truncate - HOUR:        ~2-3x faster
Timestamp Truncate - MINUTE:      ~2-3x faster  
Timestamp Truncate - SECOND:      ~2-3x faster
Timestamp Truncate - MILLISECOND: ~2-3x faster
Timestamp Truncate - MICROSECOND: ~10x faster (no-op)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions