Summary
Vector.get(i) on Timestamp columns throws TypeError for valid timestamp values whose microsecond representation exceeds Number.MAX_SAFE_INTEGER. This makes large portions of the valid timestamp range inaccessible through Arrow JS's public API.
Root cause
The timestamp get visitors in visitor/get.mjs convert BigInt64Array values to Number via bigIntToNumber() / divideBigInts():
const getTimestampSecond = ({ values }, index) => 1000 * bigIntToNumber(values[index]);
const getTimestampMillisecond = ({ values }, index) => bigIntToNumber(values[index]);
const getTimestampMicrosecond = ({ values }, index) => divideBigInts(values[index], BigInt(1000));
const getTimestampNanosecond = ({ values }, index) => divideBigInts(values[index], BigInt(1000000));
bigIntToNumber() throws if the value is outside [-MAX_SAFE_INTEGER, MAX_SAFE_INTEGER]:
export function bigIntToNumber(number) {
    if (typeof number === 'bigint' && (number < Number.MIN_SAFE_INTEGER || number > Number.MAX_SAFE_INTEGER)) {
        throw new TypeError(`${number} is not safe to convert to a number.`);
    }
    return Number(number);
}
Impact
For Timestamp<MICROSECOND> (DuckDB's default, and common in Parquet files):
- Accessible range: roughly 1684 to 2255-06-05 (~570 years, symmetric about the epoch, bounded by ±MAX_SAFE_INTEGER microseconds)
- Valid DuckDB range: 290,309 BC to 294,247 AD (~584,000 years)
- Infinity sentinels (INT64_MAX, INT64_MIN+1): completely inaccessible
This means get() throws — not returns null, not returns a sentinel, but throws an exception — for valid data that was correctly serialized and deserialized through IPC.
For Timestamp<NANOSECOND>, the accessible range is even narrower.
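These bounds fall straight out of Number.MAX_SAFE_INTEGER (a back-of-the-envelope check, assuming the safety guard applies to the raw int64 value, which the upper bound above suggests):

```javascript
// The last instant whose raw int64 representation still fits in a safe Number, per unit.
const MAX_SAFE = BigInt(Number.MAX_SAFE_INTEGER);        // 9007199254740991n
const microCutoff = new Date(Number(MAX_SAFE / 1000n));     // µs → ms
const nanoCutoff = new Date(Number(MAX_SAFE / 1000000n));   // ns → ms
console.log(microCutoff.toISOString()); // 2255-06-05T...
console.log(nanoCutoff.toISOString());  // 1970-04-15T... (only ~104 days of range)
```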
The conversion is also lossy within range. divideBigInts(value, 1000n) divides BigInt microseconds into a float, losing sub-millisecond precision. A microsecond timestamp of 1652397825000001n returns 1652397825000.001 — correct here, but precision degrades as values grow.
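A sketch of why (modeling divideBigInts as plain float division; its exact internals may differ):

```javascript
// Two microsecond timestamps 1 µs apart, near the safe-integer limit.
// Both int64 values are exactly representable as doubles.
const a = Number(9007199254740991n) / 1000;
const b = Number(9007199254740990n) / 1000;
console.log(a === b); // true: the 1 µs difference falls below the float's
                      // resolution (~2 µs per ulp) at this magnitude
```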
Comparison with other Arrow implementations
- Arrow C++: returns int64_t; the caller interprets.
- Arrow Python: returns a raw int via .as_py() on scalars, or numpy int64 in arrays; to_pylist() preserves int64.
- Arrow Rust: returns i64.
- Arrow JS: converts to Number and throws if out of range.
Arrow JS is the only implementation where get() can throw on valid data.
Note that visitInt64 and visitUint64 already return raw BigInts:
const getBigInts = ({ values }, index) => values[index];
The timestamp visitors are inconsistent with this — they convert instead of returning the raw value.
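A raw BigInt64Array, by contrast, holds the full int64 domain, sentinels included, with nothing to throw:

```javascript
// INT64_MAX and INT64_MIN+1 (DuckDB's ±infinity sentinels) stored and read back exactly.
const values = new BigInt64Array([9223372036854775807n, -9223372036854775807n]);
console.log(values[0]); // 9223372036854775807n
console.log(values[1]); // -9223372036854775807n
```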
Proposal
Return raw BigInt from timestamp visitors, matching Int64/Uint64 behavior:
const getTimestampSecond = ({ values }, index) => values[index];
const getTimestampMillisecond = ({ values }, index) => values[index];
const getTimestampMicrosecond = ({ values }, index) => values[index];
const getTimestampNanosecond = ({ values }, index) => values[index];
The raw BigInt preserves full precision and never throws. Callers that want a JS Date or millisecond Number can convert explicitly.
For convenience, a toDate(index) method or a utility function could cover the common case (recent dates, where millisecond precision is fine):
function timestampToDate(vector, index) {
    const raw = vector.get(index);
    if (raw === null) return null;
    const ms = Number(raw / 1000n); // microseconds to ms, lossy but fine for Date
    return new Date(ms);
}
Similarly, getDateDay should return raw int32 days rather than converting to epoch milliseconds, since the millisecond conversion loses precision for dates far from epoch.
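The caller-side conversion for day counts is similarly small (dateDayToDate is a hypothetical helper, not existing API):

```javascript
// Date32 stores days since the Unix epoch as int32. Note that Date itself only
// covers ±100,000,000 days, so extreme Date32 values still cannot become Dates;
// returning the raw day count leaves that decision to the caller.
function dateDayToDate(days) {
    return new Date(days * 86_400_000); // days → milliseconds
}

console.log(dateDayToDate(0).toISOString());  // 1970-01-01T00:00:00.000Z
console.log(dateDayToDate(-1).toISOString()); // 1969-12-31T00:00:00.000Z
```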
Breaking change
This is a breaking change for code that assumes get() on a timestamp column returns a Number. The alternatives are:
- Breaking change in a major version — cleanest, aligns with C++/Python/Rust
- Opt-in flag (e.g., { useBigInt: true } on reader options) — backwards-compatible but adds API surface
- Return BigInt only when the value is outside safe range — surprising behavior difference based on data values
I'd advocate for option 1. Returning BigInt is consistent with how Int64/Uint64 already behave, and the current behavior (throwing) is arguably already broken — callers can't rely on get() succeeding, which defeats the purpose of a typed API.
Context
We hit this building a DuckDB WASM shell that displays test_all_types() output. Lists containing timestamps with infinity sentinels, extreme-range timestamps, and nanosecond-precision timestamps all trigger this. Our workaround reads raw BigInt64Array values from column.data[0].values, bypassing Arrow's getter entirely — which works but defeats the purpose of the Vector abstraction.
Related issues: #91 (timestamp types all return same values), #83 (pre-epoch timestamp offset errors), #77 (decimal conversion errors). These are all symptoms of the same underlying design: eagerly converting int64 → Number in get visitors.