No extension type registry — consumers must manually decode FixedSizeBinary #423

@rustyconover

Description

Summary

Arrow JS has no mechanism to register custom getters for Arrow extension types. Columns with ARROW:extension:name and ARROW:extension:metadata field metadata always return raw bytes from get(). Every consumer must independently check metadata and decode values.

Background

The Arrow Extension Type spec (format docs) allows producers to annotate fields with semantic type information via metadata:

  • ARROW:extension:name — type identifier (e.g., "arrow.uuid", "arrow.opaque")
  • ARROW:extension:metadata — serialized type parameters (e.g., {"type_name": "hugeint", "vendor_name": "DuckDB"})
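
In Arrow JS a field's metadata is exposed as a `Map<string, string>`, so checking for these keys looks roughly like the sketch below (`getExtensionInfo` is a hypothetical helper for illustration, not part of any existing Arrow API):

```javascript
// Inspect a field's extension annotations. Arrow JS exposes `field.metadata`
// as a Map<string, string>; the Map is passed in directly so the helper is
// library-agnostic.
function getExtensionInfo(metadata) {
  const name = metadata.get('ARROW:extension:name');
  if (name === undefined) return null; // not an extension type
  const raw = metadata.get('ARROW:extension:metadata');
  let params = null;
  if (raw) {
    // The spec does not require JSON here, so fall back to the raw string.
    try { params = JSON.parse(raw); } catch { params = raw; }
  }
  return { name, params };
}

// Example: the metadata DuckDB attaches to a HUGEINT column
const metadata = new Map([
  ['ARROW:extension:name', 'arrow.opaque'],
  ['ARROW:extension:metadata', '{"type_name": "hugeint", "vendor_name": "DuckDB"}'],
]);
const info = getExtensionInfo(metadata);
// info.name === 'arrow.opaque', info.params.type_name === 'hugeint'
```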

Other Arrow implementations provide extension type registration:

  • Arrow C++: arrow::ExtensionType — register a subclass with RegisterExtensionType(), and IPC deserialization automatically produces typed arrays with custom accessors
  • Arrow Python: pyarrow.ExtensionType — register with register_extension_type(), custom __arrow_ext_deserialize__ decodes IPC data into Python objects
  • Arrow Rust: arrow::datatypes::ExtensionType trait

Arrow JS has no equivalent. Extension types are preserved in field metadata but get() returns the raw storage value (e.g., Uint8Array for FixedSizeBinary).

Impact

DuckDB with arrow_lossless_conversion=true serializes several types as Arrow extension types:

| DuckDB Type | Arrow Storage | Extension Name | Bytes |
| --- | --- | --- | --- |
| HUGEINT | FixedSizeBinary[16] | arrow.opaque | 16-byte two's complement signed int |
| UHUGEINT | FixedSizeBinary[16] | arrow.opaque | 16-byte unsigned int |
| TIME WITH TIME ZONE | FixedSizeBinary[8] | arrow.opaque | packed micros + offset |
| UUID | FixedSizeBinary[16] | arrow.uuid | 16 raw bytes |
| BIGNUM | Binary | arrow.opaque | 3-byte header + big-endian magnitude |
| VARINT | Binary | arrow.opaque | same as BIGNUM |
| BIT | Binary | arrow.opaque | padding byte + bit data |

For each of these, consumers must:

  1. Check field.metadata.get("ARROW:extension:metadata")
  2. Parse the JSON to get type_name
  3. Read raw bytes from column.data[0].values at the correct offset
  4. Interpret the binary encoding (two's complement, packed bitfields, etc.)

This is ~100 lines of manual decoding in our codebase, repeated by every consumer that reads DuckDB Arrow output.
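
For HUGEINT, steps 3 and 4 come down to something like the following sketch (`decodeHugeint` is our own helper, and the `values` array stands in for `column.data[0].values`):

```javascript
// Manual decode of one HUGEINT value: read 16 bytes from the storage buffer
// at the row's offset and interpret them as a little-endian two's-complement
// 128-bit integer. `values` is the Uint8Array backing a FixedSizeBinary[16]
// column in Arrow JS.
function decodeHugeint(values, index) {
  const dv = new DataView(values.buffer, values.byteOffset + index * 16, 16);
  const lo = dv.getBigUint64(0, true);  // low 64 bits, little-endian
  const hi = dv.getBigInt64(8, true);   // high 64 bits carry the sign
  return (hi << 64n) | lo;
}

// Example: -1 is sixteen 0xff bytes
const bytes = new Uint8Array(16).fill(0xff);
decodeHugeint(bytes, 0); // → -1n
```

Multiply this by every encoding in the table above and the ~100 lines add up quickly.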

Proposal

Add an extension type registry, similar to C++/Python:

import { registerExtensionType } from 'apache-arrow';

registerExtensionType({
  name: 'arrow.opaque',         // matches ARROW:extension:name
  match: (metadata) => {        // optional: filter by extension metadata
    const parsed = JSON.parse(metadata);
    return parsed.type_name === 'hugeint';
  },
  get: (data, index) => {       // custom getter, replaces default
    const dv = new DataView(data.values.buffer, data.values.byteOffset + index * 16, 16);
    const lo = dv.getBigUint64(0, true);
    const hi = dv.getBigUint64(8, true);
    const raw = lo | (hi << 64n);
    if (raw & (1n << 127n)) {
      const mask = (1n << 128n) - 1n;
      return -(((raw ^ mask) + 1n) & mask);
    }
    return raw;
  },
});

After registration, vector.get(i) on a HUGEINT column would return a BigInt directly instead of a Uint8Array.

This could also support a serialize method for the write path, making round-trip extension types fully supported.
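
For the HUGEINT example above, a hypothetical `serialize` hook would simply invert the proposed getter (a sketch only — `registerExtensionType` and its `serialize` option do not exist in Arrow JS today):

```javascript
// Hypothetical serialize counterpart for the 'arrow.opaque' hugeint getter:
// encode a BigInt back into 16 little-endian two's-complement bytes.
function serializeHugeint(value) {
  const mask = (1n << 128n) - 1n;
  const raw = value & mask;          // two's-complement wrap into 128 bits
  const out = new Uint8Array(16);
  const dv = new DataView(out.buffer);
  dv.setBigUint64(0, raw & 0xffffffffffffffffn, true); // low 64 bits
  dv.setBigUint64(8, raw >> 64n, true);                // high 64 bits
  return out;
}

serializeHugeint(-1n); // → sixteen 0xff bytes
```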

Alternatives

  • Do nothing: consumers continue to manually decode. Works, but fragile and duplicated.
  • Vendor-specific packages: e.g., @duckdb/arrow-extensions that monkey-patches Arrow's visitor. Feasible but hacky.
  • Local fork of get.mjs: what we currently do via Vite alias. Maintenance burden.

Context

We maintain a DuckDB WASM frontend that displays query results through Arrow IPC. Every DuckDB extension type requires custom byte-level decoding because Arrow JS can't be taught about them. The same decoding logic would need to be written by anyone consuming DuckDB, Spark, or other engines that use Arrow extension types in JS.
