SAFE Optimizations: Hybrid stack/heap allocator, BitSlice instead of Vec<bool>, and Allocation combination #78

fereidani · 2025-12-18T20:33:13Z

Hi, this is a really nicely optimized project!
I really enjoyed it. To be honest, I worked and tested for 8 hours and couldn't find anything to optimize.

I was close to giving up in disappointment, but something clicked at the last moment.

I also migrated the benchmarks to Criterion.

I hope you like it. Let me know if you want to discuss anything!
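
For readers who have not seen the pattern named in the title, here is a minimal sketch of a hybrid stack/heap buffer. It is not the code from this PR: the name `HybridBuffer` is taken from the commit message, but the layout, the inline capacity of 64, and the helper `levenshtein_bytes` are assumptions made purely for illustration. The idea is that short inputs use a fixed-size inline array and never touch the allocator, while longer inputs fall back to a single heap allocation.

```rust
/// Hypothetical hybrid buffer (illustration only, not the PR's implementation).
/// Holds up to N elements inline; larger requests use one heap allocation.
enum HybridBuffer<const N: usize> {
    Stack { data: [usize; N], len: usize },
    Heap(Vec<usize>),
}

impl<const N: usize> HybridBuffer<N> {
    /// Creates a zero-initialised buffer of `len` elements.
    fn new(len: usize) -> Self {
        if len <= N {
            HybridBuffer::Stack { data: [0; N], len }
        } else {
            HybridBuffer::Heap(vec![0; len])
        }
    }

    /// Returns the usable part of the buffer as a mutable slice.
    fn as_mut_slice(&mut self) -> &mut [usize] {
        match self {
            HybridBuffer::Stack { data, len } => &mut data[..*len],
            HybridBuffer::Heap(v) => v.as_mut_slice(),
        }
    }

    /// True when the buffer had to fall back to the heap.
    fn is_heap(&self) -> bool {
        matches!(self, HybridBuffer::Heap(_))
    }
}

/// Single-row Levenshtein distance over bytes, using the hybrid buffer
/// for the row. The inline capacity of 64 is an arbitrary assumption.
fn levenshtein_bytes(a: &[u8], b: &[u8]) -> usize {
    let mut buf = HybridBuffer::<64>::new(b.len() + 1);
    let row = buf.as_mut_slice();
    for (j, cell) in row.iter_mut().enumerate() {
        *cell = j; // distance from the empty prefix of `a`
    }
    for (i, &ca) in a.iter().enumerate() {
        let mut prev_diag = row[0]; // value diagonally up-left of the next cell
        row[0] = i + 1;
        for (j, &cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let above = row[j + 1];
            row[j + 1] = (prev_diag + cost).min(above + 1).min(row[j] + 1);
            prev_diag = above;
        }
    }
    row[b.len()]
}
```

With this sketch, `levenshtein_bytes(b"kitten", b"sitting")` stays entirely on the stack, and only inputs whose second argument exceeds 63 bytes trigger an allocation. The trade-off is a larger stack frame, which matters for a library that cares about stack usage.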

maxbachmann · 2025-12-19T00:06:05Z

What is the actual performance improvement you are seeing with this change for different string lengths?
What is the impact on binary size for the different functions? This is especially interesting because this library is fairly focused on stack usage, while https://github.com/rapidfuzz/rapidfuzz-rs is more focused on performance. The Damerau-Levenshtein implementation is fairly similar in both; the other metrics use more efficient implementations in rapidfuzz-rs.

fereidani · 2025-12-19T00:31:38Z

Hey, here are my results on a 9950X.
The best improvements are for normalized_levenshtein and levenshtein, at roughly 18-19%, and for osa_distance, at about 15%.
The benchmark code is included, so I recommend testing it yourself too.

     Running benches/benches.rs (target/release/deps/benches-de929fd6e345bf4e)
hamming                 time:   [23.611 ns 23.726 ns 23.847 ns]
                        change: [−2.9541% −1.4701% −0.0719%] (p = 0.05 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

jaro                    time:   [252.87 ns 255.47 ns 257.71 ns]
                        change: [−1.3861% −0.3061% +0.8445%] (p = 0.60 > 0.05)
                        No change in performance detected.
Found 21 outliers among 100 measurements (21.00%)
  21 (21.00%) high mild

jaro_winkler            time:   [250.41 ns 251.35 ns 252.36 ns]
                        change: [+3.9810% +4.9065% +5.8499%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe

Benchmarking jaro_longstring: Warming up for 500.00 ms
Warning: Unable to complete 100 samples in 3.0s. You may wish to increase target time to 3.0s, or reduce sample count to 90.
jaro_longstring         time:   [29.871 ms 29.999 ms 30.133 ms]
                        change: [+3.1316% +3.9735% +4.7743%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

levenshtein             time:   [531.67 ns 531.99 ns 532.28 ns]
                        change: [−17.907% −17.838% −17.771%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

levenshtein_u8          time:   [358.26 ns 359.07 ns 360.01 ns]
                        change: [+1.5142% +1.8168% +2.1244%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

normalized_levenshtein  time:   [534.28 ns 534.58 ns 534.93 ns]
                        change: [−18.588% −18.492% −18.393%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

osa_distance            time:   [553.30 ns 553.88 ns 554.66 ns]
                        change: [−15.436% −15.293% −15.142%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

damerau_levenshtein     time:   [1.1171 µs 1.1184 µs 1.1198 µs]
                        change: [−10.019% −9.9091% −9.8048%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

normalized_damerau_levenshtein
                        time:   [1.1048 µs 1.1055 µs 1.1064 µs]
                        change: [−10.349% −10.036% −9.5561%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe

sorensen_dice           time:   [661.48 ns 661.89 ns 662.29 ns]
                        change: [−1.5827% −1.4893% −1.3807%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

sorensen_dice_long_0    time:   [126.42 ns 126.61 ns 126.79 ns]
                        change: [−0.9868% −0.7670% −0.5411%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

sorensen_dice_long_1    time:   [586.27 ns 586.87 ns 587.62 ns]
                        change: [−0.9852% −0.7824% −0.5318%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe

sorensen_dice_long_2    time:   [1.7806 µs 1.7821 µs 1.7834 µs]
                        change: [−1.8718% −1.4158% −0.8131%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

sorensen_dice_long_3    time:   [1.3045 µs 1.3049 µs 1.3053 µs]
                        change: [+0.1160% +0.2055% +0.2960%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

sorensen_dice_long_4    time:   [125.89 µs 125.95 µs 126.01 µs]
                        change: [−2.5226% −2.4407% −2.3577%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

sorensen_dice_long_5    time:   [273.25 ns 274.04 ns 274.80 ns]
                        change: [+1.1451% +1.5181% +2.0031%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

sorensen_dice_long_6    time:   [187.44 ns 187.53 ns 187.65 ns]
                        change: [−0.0200% +0.2188% +0.5952%] (p = 0.18 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

fereidani · 2025-12-19T00:57:38Z

You can use the same optimizations in rapidfuzz-rs too. Let me know if you are interested and I can send you a PR.

fereidani added 3 commits December 18, 2025 23:58

add criterion for benchmarks

fcbac66

add HybridBuffer and BitSlice, combine all allocations

d7b2749

remove a useless line

9608075

init register as zero

4c10850
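
The commit "add HybridBuffer and BitSlice, combine all allocations" names two further techniques. The sketch below illustrates them under stated assumptions: the type `BitFlags`, its methods, and the `demo` function are hypothetical and are not the PR's `BitSlice`. A `Vec<bool>` spends one byte per flag, whereas packing flags into `u64` words spends one bit per flag. Combining allocations here means allocating one backing buffer and splitting it into the regions each flag set needs.

```rust
/// Hypothetical bit-packed flag view over caller-provided words
/// (illustration only, not the PR's BitSlice).
struct BitFlags<'a> {
    words: &'a mut [u64],
}

impl<'a> BitFlags<'a> {
    /// Number of u64 words needed to hold `bits` flags.
    fn words_for(bits: usize) -> usize {
        bits.div_ceil(64)
    }

    /// Wraps the given words and clears every flag.
    fn new(words: &'a mut [u64]) -> Self {
        words.fill(0);
        BitFlags { words }
    }

    /// Sets flag `i`.
    fn set(&mut self, i: usize) {
        self.words[i / 64] |= 1u64 << (i % 64);
    }

    /// Reads flag `i`.
    fn get(&self, i: usize) -> bool {
        (self.words[i / 64] >> (i % 64)) & 1 == 1
    }
}

/// Carves two independent flag sets out of a single allocation.
fn demo() -> (bool, bool, bool) {
    let (a_len, b_len) = (70, 10);
    let a_words = BitFlags::words_for(a_len);
    let b_words = BitFlags::words_for(b_len);

    // One allocation backs both flag sets.
    let mut storage = vec![0u64; a_words + b_words];
    let (a_part, b_part) = storage.split_at_mut(a_words);

    let mut a_flags = BitFlags::new(a_part);
    let mut b_flags = BitFlags::new(b_part);
    a_flags.set(69);
    b_flags.set(3);

    (a_flags.get(69), a_flags.get(68), b_flags.get(3))
}
```

The backing `Vec<u64>` in `demo` could just as well be a stack array or a hybrid buffer; `split_at_mut` is what lets two mutable views share one buffer without unsafe code.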
