
Timestamp get() throws for valid data outside Number.MAX_SAFE_INTEGER range #421

@rustyconover


Summary

Vector.get(i) on Timestamp columns throws TypeError for valid timestamp values whose microsecond representation exceeds Number.MAX_SAFE_INTEGER. This makes large portions of the valid timestamp range inaccessible through Arrow JS's public API.

Root cause

The timestamp get visitors in visitor/get.mjs convert BigInt64Array values to Number via bigIntToNumber() / divideBigInts():

const getTimestampSecond = ({ values }, index) => 1000 * bigIntToNumber(values[index]);
const getTimestampMillisecond = ({ values }, index) => bigIntToNumber(values[index]);
const getTimestampMicrosecond = ({ values }, index) => divideBigInts(values[index], BigInt(1000));
const getTimestampNanosecond = ({ values }, index) => divideBigInts(values[index], BigInt(1000000));

bigIntToNumber() throws if the value is outside [-MAX_SAFE_INTEGER, MAX_SAFE_INTEGER]:

export function bigIntToNumber(number) {
    if (typeof number === 'bigint' && (number < Number.MIN_SAFE_INTEGER || number > Number.MAX_SAFE_INTEGER)) {
        throw new TypeError(`${number} is not safe to convert to a number.`);
    }
    return Number(number);
}
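The boundary is easy to see in isolation (re-declaring `bigIntToNumber` from above so this snippet runs standalone):

```javascript
// Same guard as the bigIntToNumber shown above, re-declared for a standalone demo.
function bigIntToNumber(number) {
    if (typeof number === 'bigint' && (number < Number.MIN_SAFE_INTEGER || number > Number.MAX_SAFE_INTEGER)) {
        throw new TypeError(`${number} is not safe to convert to a number.`);
    }
    return Number(number);
}

const max = BigInt(Number.MAX_SAFE_INTEGER); // 9007199254740991n

bigIntToNumber(max); // ok: 9007199254740991
try {
    bigIntToNumber(max + 1n); // one past the safe range
} catch (e) {
    console.log(e.message); // "9007199254740992 is not safe to convert to a number."
}
```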

Impact

For Timestamp<MICROSECOND> (DuckDB's default, and common in Parquet files):

  • Accessible range: roughly 1684-07-28 to 2255-06-05 (±MAX_SAFE_INTEGER microseconds, ~571 years)
  • Valid DuckDB range: 290,309 BC to 294,247 AD (~584,000 years)
  • Infinity sentinels (INT64_MAX, INT64_MIN+1): completely inaccessible

This means get() throws — not returns null, not returns a sentinel, but throws an exception — for valid data that was correctly serialized and deserialized through IPC.
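The endpoints fall straight out of MAX_SAFE_INTEGER; since MIN_SAFE_INTEGER is its negation, the negative bound mirrors the positive one. A quick sketch:

```javascript
// The largest microsecond timestamp get() can return, viewed as a Date.
const maxSafeUs = BigInt(Number.MAX_SAFE_INTEGER); // 9007199254740991n

const upper = new Date(Number(maxSafeUs / 1000n));  // mid-2255
const lower = new Date(-Number(maxSafeUs / 1000n)); // mid-1684

console.log(upper.toISOString(), lower.toISOString());
```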

For Timestamp<NANOSECOND>, the accessible range is even narrower: ±MAX_SAFE_INTEGER nanoseconds is only about ±104 days around the epoch.

The conversion is also lossy within range. divideBigInts(value, 1000n) divides BigInt microseconds into a float, losing sub-millisecond precision. A microsecond timestamp of 1652397825000001n returns 1652397825000.001 — correct here, but precision degrades as values grow.
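The degradation is easy to reproduce. Assuming divideBigInts ultimately performs a float64 division (as the behavior described above implies), a round-trip through get() silently alters large in-range values:

```javascript
// A microsecond value well inside the safe-integer range, but with more
// significant digits than a float64 can preserve after division by 1000:
const us = 9000000000000001n;

const ms = Number(us) / 1000; // float64 division, as in the conversion above
const roundTripped = BigInt(Math.round(ms * 1000));

// The final microsecond digit is lost in the float64 representation.
console.log(us, roundTripped); // 9000000000000001n vs a nearby, different value
```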

Comparison with other Arrow implementations

  • Arrow C++: Returns int64_t. Caller interprets.
  • Arrow Python: Returns raw int via .as_py() on scalar, or numpy int64 in arrays. to_pylist() preserves int64.
  • Arrow Rust: Returns i64.
  • Arrow JS: Converts to Number, throws if out of range.

Arrow JS is the only implementation where get() can throw on valid data.

Note that visitInt64 and visitUint64 already return raw BigInts:

const getBigInts = ({ values }, index) => values[index];

The timestamp visitors are inconsistent with this — they convert instead of returning the raw value.

Proposal

Return raw BigInt from timestamp visitors, matching Int64/Uint64 behavior:

const getTimestampSecond = ({ values }, index) => values[index];
const getTimestampMillisecond = ({ values }, index) => values[index];
const getTimestampMicrosecond = ({ values }, index) => values[index];
const getTimestampNanosecond = ({ values }, index) => values[index];

The raw BigInt preserves full precision and never throws. Callers that want a JS Date or millisecond Number can convert explicitly.

For convenience, a toDate(index) method or utility function could preserve the current behavior for callers in the common case (recent dates, where millisecond precision is fine):

function timestampToDate(vector, index) {
  const raw = vector.get(index);
  if (raw === null) return null;
  const ms = Number(raw / 1000n); // microseconds to ms, lossy but fine for Date
  return new Date(ms);
}

Similarly, getDateDay should return raw int32 days rather than converting to epoch milliseconds, since the millisecond conversion loses precision for dates far from epoch.
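The day-to-Date conversion is trivial to do lazily at the call site. A sketch (dateDayToDate is a hypothetical helper, not an existing API):

```javascript
const MS_PER_DAY = 86400000;

// Convert a raw Date32 day count to a JS Date on demand. For extreme day
// counts the product can exceed 2^53 ms and lose precision — the same reason
// the eager conversion inside getDateDay is lossy — so callers that need the
// extremes can keep the raw int32 instead.
function dateDayToDate(days) {
    return new Date(days * MS_PER_DAY);
}

console.log(dateDayToDate(0).toISOString()); // 1970-01-01T00:00:00.000Z
```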

Breaking change

This is a breaking change for code that assumes get() on a timestamp column returns a Number. The options are:

  1. Breaking change in a major version — cleanest, aligns with C++/Python/Rust
  2. Opt-in flag (e.g., { useBigInt: true } on reader options) — backwards-compatible but adds API surface
  3. Return BigInt only when the value is outside safe range — surprising behavior difference based on data values

I'd advocate for option 1. Returning BigInt is consistent with how Int64/Uint64 already behave, and the current behavior (throwing) is arguably already broken — callers can't rely on get() succeeding, which defeats the purpose of a typed API.

Context

We hit this building a DuckDB WASM shell that displays test_all_types() output. Lists containing timestamps with infinity sentinels, extreme-range timestamps, and nanosecond-precision timestamps all trigger this. Our workaround reads raw BigInt64Array values from column.data[0].values, bypassing Arrow's getter entirely — which works but defeats the purpose of the Vector abstraction.
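For completeness, the workaround looks roughly like this (a sketch based on Arrow JS internals; note it ignores the validity bitmap, so nulls come back as raw slot values):

```javascript
// Read raw BigInt64 timestamp values chunk-by-chunk, bypassing get().
// vector.data is the array of Data chunks backing an Arrow JS Vector;
// each chunk's .values is the underlying BigInt64Array.
function rawTimestampValues(vector) {
    const out = [];
    for (const chunk of vector.data) {
        for (let i = 0; i < chunk.length; i++) {
            out.push(chunk.values[i]); // BigInt; null slots not masked here
        }
    }
    return out;
}
```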

Related issues: #91 (timestamp types all return same values), #83 (pre-epoch timestamp offset errors), #77 (decimal conversion errors). These are all symptoms of the same underlying design: eagerly converting int64 → Number in get visitors.
