
Conversation

@Xuanwo (Collaborator) commented Dec 29, 2025

This PR adds classic ALP encoding support.


Parts of this PR were drafted with assistance from Codex (with gpt-5.2) and fully reviewed and edited by me. I take full responsibility for all changes.

@github-actions bot added the enhancement label Dec 29, 2025
@github-actions (Contributor) commented

Code Review for Classic ALP Encoding Support

Summary

This PR adds classic ALP (Adaptive Lossless floating-Point) encoding for f32/f64 data as a miniblock compressor. The implementation includes exponent selection, delta-based bitpacking, and exception handling for non-encodable values.
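For readers unfamiliar with the scheme, here is a minimal sketch of the classic ALP round-trip for a single f64, assuming the exponent pair (e, f) has already been chosen by sampling. All names are illustrative, not taken from the PR; the real implementation also bitpacks the resulting integers as deltas.

```rust
// Minimal sketch of the classic ALP round-trip for one f64 value, assuming
// the exponent pair (e, f) was already selected by sampling. Names here are
// illustrative and not from the PR.
const POW10: [f64; 4] = [1.0, 10.0, 100.0, 1000.0];

fn alp_encode_f64(value: f64, e: usize, f: usize) -> Option<i64> {
    // Scale by 10^e / 10^f and round to the nearest integer.
    let scaled = value * POW10[e] / POW10[f];
    if !scaled.is_finite() || scaled.abs() >= i64::MAX as f64 {
        return None; // handled as an exception value
    }
    let enc = scaled.round() as i64;
    // Keep the encoding only if decoding reproduces the exact bit pattern;
    // otherwise the value becomes an exception (bitwise-lossless guarantee).
    let decoded = (enc as f64) * POW10[f] / POW10[e];
    (decoded.to_bits() == value.to_bits()).then_some(enc)
}
```

Values that fail either check are stored verbatim in the exception buffers, which is what makes the encoding lossless for arbitrary inputs.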

P1 Issues

1. Potential integer overflow in fast_round_f32/fast_round_f64 transformations

The "sweet spot" rounding technique can produce incorrect results for values outside the representable integer range:

let rounded = fast_round_f32(scaled);
let enc = rounded as i32;  // Can overflow if scaled is outside i32 range

While you check is_finite() and the scaling factor validity, values like f32::MAX * 10^e can be finite after scaling yet fall outside the i32::MIN..=i32::MAX range. In Rust the out-of-range `as` cast then saturates silently, so the encoding would be wrong without any error. Consider adding explicit range checks before the cast, or using try_from semantics.
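One way to make the range check explicit, sketched with a hypothetical helper name (`scaled` mirrors the intermediate from the review comment, not the PR's actual code):

```rust
// Sketch of an explicit range check before the i32 cast. 2^31 is exactly
// representable in f32 while i32::MAX (2^31 - 1) is not, so the upper bound
// is exclusive at 2^31; i32::MIN (-2^31) is exactly representable, so the
// lower bound is inclusive.
fn checked_round_to_i32(scaled: f32) -> Option<i32> {
    if !scaled.is_finite() || scaled >= 2_147_483_648.0 || scaled < -2_147_483_648.0 {
        return None;
    }
    Some(scaled.round() as i32)
}
```

Returning None lets the caller route the value through the existing exception path instead of silently encoding a wrong integer.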

2. Missing feature gate consistency

The AlpMiniBlockEncoder struct and its MiniBlockCompressor impl are not gated by #[cfg(feature = "bitpacking")], but they call encode_chunk_f32/encode_chunk_f64 which return errors when bitpacking is disabled. This creates a runtime error rather than a compile-time check. Consider either:

  • Gating the entire encoder behind the feature flag, or
  • Documenting this behavior explicitly

3. Exception position overflow risk

Exception positions are stored as u16, but chunk sizes are validated only by max_chunk_size() (1024 for f32, 512 for f64). While this fits in u16, there's no explicit assertion that chunk_values <= u16::MAX at the position recording site:

exception_positions.push(i as u16);  // Safe only if i < 65536

Consider adding a debug_assert or making the assumption explicit in the code.
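Making the assumption explicit could look like the following; `record_exception` is a hypothetical helper for illustration, not code from the PR:

```rust
// Sketch of making the "fits in u16" assumption explicit at the
// position-recording site. `record_exception` is a hypothetical helper.
fn record_exception(exception_positions: &mut Vec<u16>, i: usize) {
    debug_assert!(
        i <= u16::MAX as usize,
        "chunk index {i} exceeds u16; max_chunk_size() must stay <= 65536"
    );
    exception_positions.push(i as u16);
}
```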

Suggestions (Non-blocking)

  • The duplicated logic between the f32 and f64 encoding paths is significant (roughly 200+ lines). A generic implementation with trait bounds could reduce the maintenance burden, though the duplication may be intentional for performance.

  • Consider adding a benchmark test for ALP encoding to validate compression ratios and performance characteristics on representative datasets.
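The generics suggestion above might be sketched roughly as follows; all names are illustrative and the rounding details are simplified relative to the PR:

```rust
// Hypothetical sketch of collapsing the f32/f64 paths behind a trait.
trait AlpFloat: Copy {
    type Int;
    /// Scale by 10^e / 10^f and round, or None if the value is not encodable.
    fn to_scaled_int(self, e: i32, f: i32) -> Option<Self::Int>;
}

impl AlpFloat for f64 {
    type Int = i64;
    fn to_scaled_int(self, e: i32, f: i32) -> Option<i64> {
        let scaled = self * 10f64.powi(e) / 10f64.powi(f);
        if !scaled.is_finite() || scaled.abs() >= i64::MAX as f64 {
            return None;
        }
        Some(scaled.round() as i64)
    }
}

impl AlpFloat for f32 {
    type Int = i32;
    fn to_scaled_int(self, e: i32, f: i32) -> Option<i32> {
        let scaled = self * 10f32.powi(e) / 10f32.powi(f);
        if !scaled.is_finite() || scaled >= 2_147_483_648.0 || scaled < -2_147_483_648.0 {
            return None;
        }
        Some(scaled.round() as i32)
    }
}

// A single generic loop then serves both widths, e.g. counting exceptions:
fn count_exceptions<T: AlpFloat>(values: &[T], e: i32, f: i32) -> usize {
    values.iter().filter(|v| v.to_scaled_int(e, f).is_none()).count()
}
```

Whether the compiler monomorphizes this to the same code as the hand-duplicated paths would need to be confirmed by benchmarking.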

Overall this is a well-documented implementation with proper fallback handling and bitwise-lossless guarantees.

@wjones127 (Contributor) left a comment


This is exciting!

Have you benchmarked the throughput of this and done any profiling? The sampling of e and f seems pretty heavy, so maybe worth spending some time optimizing.

Also, could you run the fuzz_test.rs on this, and do it for a longer period to check correctness? I think that's just running against 2.1 right now.

Also, I'm not sure it's from this PR, but cargo check -p lance-encoding --no-default-features does currently fail due to some bitpacking code not being gated properly.

Comment on lines +119 to +121
if num_values <= sample_size {
    return (0..num_values).collect();
}

I wonder if you should have some value like None or an empty Vec that is interpreted as "sample all rows"; that way you can avoid the allocation in this common case.
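The suggestion could look roughly like this; the function name and the even-spacing strategy are illustrative, not from the PR:

```rust
// Sketch: return None to mean "sample every row" so the common small-input
// case skips allocating the 0..num_values index vector.
fn sample_indices(num_values: usize, sample_size: usize) -> Option<Vec<usize>> {
    if num_values <= sample_size {
        // Callers treat None as "use all rows" without materializing indices.
        return None;
    }
    // One simple strategy: evenly spaced positions across the input.
    let step = num_values / sample_size;
    Some((0..sample_size).map(|k| k * step).collect())
}
```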

Comment on lines +921 to +935
if data.len() != 4 {
    return Err(Error::invalid_input(
        format!("ALP decompression expects 4 buffers, got {}", data.len()),
        location!(),
    ));
}

let n = usize::try_from(num_values)
    .map_err(|_| Error::invalid_input("ALP chunk too large for usize", location!()))?;

let mut iter = data.into_iter();
let packed_deltas = iter.next().unwrap();
let header = iter.next().unwrap();
let exception_positions = iter.next().unwrap();
let exception_bits = iter.next().unwrap();

nit: you should be able to do the length check and unpack the array in one go. The upshot of this is that the compiler doesn't need to put a panic message in the binary for each call to .next().unwrap().

Suggested change

Replace:

    if data.len() != 4 {
        return Err(Error::invalid_input(
            format!("ALP decompression expects 4 buffers, got {}", data.len()),
            location!(),
        ));
    }
    let n = usize::try_from(num_values)
        .map_err(|_| Error::invalid_input("ALP chunk too large for usize", location!()))?;
    let mut iter = data.into_iter();
    let packed_deltas = iter.next().unwrap();
    let header = iter.next().unwrap();
    let exception_positions = iter.next().unwrap();
    let exception_bits = iter.next().unwrap();

with:

    let [packed_deltas, header, exception_positions, exception_bits] =
        data.try_into().map_err(|bufs: Vec<_>| {
            Error::invalid_input(
                format!("ALP decompression expects 4 buffers, got {}", bufs.len()),
                location!(),
            )
        })?;
    let n = usize::try_from(num_values)
        .map_err(|_| Error::invalid_input("ALP chunk too large for usize", location!()))?;
