
Timestamp get() throws for valid data outside Number.MAX_SAFE_INTEGER range #421

@rustyconover


Summary

Vector.get(i) on Timestamp columns throws TypeError for valid timestamp values whose microsecond representation exceeds Number.MAX_SAFE_INTEGER. This makes large portions of the valid timestamp range inaccessible through Arrow JS's public API.

Root cause

The timestamp get visitors in visitor/get.mjs convert BigInt64Array values to Number via bigIntToNumber() / divideBigInts():

const getTimestampSecond = ({ values }, index) => 1000 * bigIntToNumber(values[index]);
const getTimestampMillisecond = ({ values }, index) => bigIntToNumber(values[index]);
const getTimestampMicrosecond = ({ values }, index) => divideBigInts(values[index], BigInt(1000));
const getTimestampNanosecond = ({ values }, index) => divideBigInts(values[index], BigInt(1000000));

bigIntToNumber() throws if the value is outside [-MAX_SAFE_INTEGER, MAX_SAFE_INTEGER]:

export function bigIntToNumber(number) {
    if (typeof number === 'bigint' && (number < Number.MIN_SAFE_INTEGER || number > Number.MAX_SAFE_INTEGER)) {
        throw new TypeError(`${number} is not safe to convert to a number.`);
    }
    return Number(number);
}
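The boundary is easy to see in isolation (re-declaring `bigIntToNumber` from above so this snippet runs standalone):

```javascript
// Same guard as the bigIntToNumber shown above, re-declared for a standalone demo.
function bigIntToNumber(number) {
    if (typeof number === 'bigint' && (number < Number.MIN_SAFE_INTEGER || number > Number.MAX_SAFE_INTEGER)) {
        throw new TypeError(`${number} is not safe to convert to a number.`);
    }
    return Number(number);
}

const max = BigInt(Number.MAX_SAFE_INTEGER); // 9007199254740991n

bigIntToNumber(max); // ok: 9007199254740991
try {
    bigIntToNumber(max + 1n); // one past the safe range
} catch (e) {
    console.log(e.message); // "9007199254740992 is not safe to convert to a number."
}
```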

Impact

For Timestamp<MICROSECOND> (DuckDB's default, and common in Parquet files):

  • Accessible range: roughly 1684-07-28 to 2255-06-05 (±MAX_SAFE_INTEGER microseconds, ~571 years)
  • Valid DuckDB range: 290,309 BC to 294,247 AD (~584,000 years)
  • Infinity sentinels (INT64_MAX, INT64_MIN+1): completely inaccessible

This means get() throws — not returns null, not returns a sentinel, but throws an exception — for valid data that was correctly serialized and deserialized through IPC.
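The endpoints fall straight out of MAX_SAFE_INTEGER; since MIN_SAFE_INTEGER is its negation, the negative bound mirrors the positive one. A quick sketch:

```javascript
// The largest microsecond timestamp get() can return, viewed as a Date.
const maxSafeUs = BigInt(Number.MAX_SAFE_INTEGER); // 9007199254740991n

const upper = new Date(Number(maxSafeUs / 1000n));  // mid-2255
const lower = new Date(-Number(maxSafeUs / 1000n)); // mid-1684

console.log(upper.toISOString(), lower.toISOString());
```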

For Timestamp<NANOSECOND>, the accessible range is even narrower: ±MAX_SAFE_INTEGER nanoseconds is only about ±104 days around the epoch.

The conversion is also lossy within range. divideBigInts(value, 1000n) divides BigInt microseconds into a float, losing sub-millisecond precision. A microsecond timestamp of 1652397825000001n returns 1652397825000.001 — correct here, but precision degrades as values grow.
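The degradation is easy to reproduce. Assuming divideBigInts ultimately performs a float64 division (as the behavior described above implies), a round-trip through get() silently alters large in-range values:

```javascript
// A microsecond value well inside the safe-integer range, but with more
// significant digits than a float64 can preserve after division by 1000:
const us = 9000000000000001n;

const ms = Number(us) / 1000; // float64 division, as in the conversion above
const roundTripped = BigInt(Math.round(ms * 1000));

// The final microsecond digit is lost in the float64 representation.
console.log(us, roundTripped); // 9000000000000001n vs a nearby, different value
```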

Comparison with other Arrow implementations

  • Arrow C++: Returns int64_t. Caller interprets.
  • Arrow Python: Returns raw int via .as_py() on scalar, or numpy int64 in arrays. to_pylist() preserves int64.
  • Arrow Rust: Returns i64.
  • Arrow JS: Converts to Number, throws if out of range.

Arrow JS is the only implementation where get() can throw on valid data.

Note that visitInt64 and visitUint64 already return raw BigInts:

const getBigInts = ({ values }, index) => values[index];

The timestamp visitors are inconsistent with this — they convert instead of returning the raw value.

Proposal

Return raw BigInt from timestamp visitors, matching Int64/Uint64 behavior:

const getTimestampSecond = ({ values }, index) => values[index];
const getTimestampMillisecond = ({ values }, index) => values[index];
const getTimestampMicrosecond = ({ values }, index) => values[index];
const getTimestampNanosecond = ({ values }, index) => values[index];

The raw BigInt preserves full precision and never throws. Callers that want a JS Date or millisecond Number can convert explicitly.

For convenience, a toDate(index) method or utility function could preserve the current behavior for callers in the common case (recent dates, where millisecond precision is fine):

function timestampToDate(vector, index) {
  const raw = vector.get(index);
  if (raw === null) return null;
  const ms = Number(raw / 1000n); // microseconds to ms, lossy but fine for Date
  return new Date(ms);
}

Similarly, getDateDay should return raw int32 days rather than converting to epoch milliseconds, since the millisecond conversion loses precision for dates far from epoch.
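The day-to-Date conversion is trivial to do lazily at the call site. A sketch (dateDayToDate is a hypothetical helper, not an existing API):

```javascript
const MS_PER_DAY = 86400000;

// Convert a raw Date32 day count to a JS Date on demand. For extreme day
// counts the product can exceed 2^53 ms and lose precision — the same reason
// the eager conversion inside getDateDay is lossy — so callers that need the
// extremes can keep the raw int32 instead.
function dateDayToDate(days) {
    return new Date(days * MS_PER_DAY);
}

console.log(dateDayToDate(0).toISOString()); // 1970-01-01T00:00:00.000Z
```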

Breaking change

This is a breaking change for code that assumes get() on a timestamp column returns a Number. The options are:

  1. Breaking change in a major version — cleanest, aligns with C++/Python/Rust
  2. Opt-in flag (e.g., { useBigInt: true } on reader options) — backwards-compatible but adds API surface
  3. Return BigInt only when the value is outside safe range — surprising behavior difference based on data values

I'd advocate for option 1. Returning BigInt is consistent with how Int64/Uint64 already behave, and the current behavior (throwing) is arguably already broken — callers can't rely on get() succeeding, which defeats the purpose of a typed API.

Context

We hit this building a DuckDB WASM shell that displays test_all_types() output. Lists containing timestamps with infinity sentinels, extreme-range timestamps, and nanosecond-precision timestamps all trigger this. Our workaround reads raw BigInt64Array values from column.data[0].values, bypassing Arrow's getter entirely — which works but defeats the purpose of the Vector abstraction.
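For completeness, the workaround looks roughly like this (a sketch based on Arrow JS internals; note it ignores the validity bitmap, so nulls come back as raw slot values):

```javascript
// Read raw BigInt64 timestamp values chunk-by-chunk, bypassing get().
// vector.data is the array of Data chunks backing an Arrow JS Vector;
// each chunk's .values is the underlying BigInt64Array.
function rawTimestampValues(vector) {
    const out = [];
    for (const chunk of vector.data) {
        for (let i = 0; i < chunk.length; i++) {
            out.push(chunk.values[i]); // BigInt; null slots not masked here
        }
    }
    return out;
}
```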

Related issues: #91 (timestamp types all return same values), #83 (pre-epoch timestamp offset errors), #77 (decimal conversion errors). These are all symptoms of the same underlying design: eagerly converting int64 → Number in get visitors.
