FSST and general compression #5248

niyue · 2025-11-17T03:02:17Z

description

Hi there. In the current Lance v2.1 implementation, when build_variable_width_compressor is used to compress variable-width data, specifying a general compression algorithm (e.g., ZSTD) in the field metadata can still result in fsst being applied internally because of the existing logic (see the code section below). I'm not sure whether this behavior is intentional. Technically it works, but conceptually it is a bit confusing: none / fsst / zstd / lz4 appear to be parallel choices, yet fsst can be nested under a general compression algorithm. This nesting is not obvious from the API.

Additionally, when an external general compressor such as zstd is applied on top, the random-access benefit of fsst is lost. The nested fsst + zstd approach may yield better compression ratios, but it also adds decompression overhead, so there is a trade-off. I'm not sure this behavior is always desirable. Do you think we need a configuration option to control it?

relevant code

Here is the relevant code. If the data qualifies for fsst, step 3 will select the fsst encoder, and in step 4 a general compression encoder may also be applied. In other words, fsst can be chosen first, and then an additional external compressor may be layered on top of it.

https://github.com/lancedb/lance/blob/254a8217ac26666585983aa7ec8c4234f4c3f99f/rust/lance-encoding/src/compression.rs#L378-L405
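To make the layering concrete, here is a minimal sketch of the selection order as I understand it. This is not the actual Lance code: `General`, `Encoder`, `choose_encoder`, and the `qualifies_for_fsst` flag are all hypothetical names made up for illustration, and `allow_nesting` is the kind of configuration switch I am asking about, which does not exist today as far as I know.

```rust
// Hypothetical, simplified model of the selection order described above.
// None of these names are Lance's real types or functions.

#[derive(Debug, Clone, Copy, PartialEq)]
enum General {
    Zstd,
    Lz4,
}

#[derive(Debug, PartialEq)]
enum Encoder {
    /// Plain variable-width encoding, no compression.
    Plain,
    /// FSST symbol-table encoding.
    Fsst,
    /// A general-purpose compressor wrapped around an inner encoder.
    Wrapped(General, Box<Encoder>),
}

/// `qualifies_for_fsst` stands in for whatever heuristics the real code applies.
/// `requested` is the general compression named in the field metadata, if any.
/// `allow_nesting` is an assumed opt-out switch; passing `true` models the
/// behavior described in this post.
fn choose_encoder(
    qualifies_for_fsst: bool,
    requested: Option<General>,
    allow_nesting: bool,
) -> Encoder {
    // FSST is skipped only when a general compressor is requested
    // and nesting has been explicitly disabled.
    let fsst_permitted = allow_nesting || requested.is_none();

    // "Step 3": pick fsst when the data qualifies.
    let inner = if qualifies_for_fsst && fsst_permitted {
        Encoder::Fsst
    } else {
        Encoder::Plain
    };

    // "Step 4": layer the requested general compressor on top, if any.
    match requested {
        Some(general) => Encoder::Wrapped(general, Box::new(inner)),
        None => inner,
    }
}
```

With `allow_nesting = true`, requesting zstd on fsst-eligible data yields `Wrapped(Zstd, Fsst)`, which is the nesting I found surprising. With `allow_nesting = false`, the same request yields `Wrapped(Zstd, Plain)`, so the four options would behave as parallel choices.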

Replies: 0 comments