@fereidani fereidani commented Dec 18, 2025

Hi, this is a really nicely optimized project!
I enjoyed working on it. To be honest, I worked and tested for 8 hours and couldn't find anything to optimize.

I was close to giving up in disappointment, but something clicked at the last moment.

I also moved the benchmarks to Criterion.

I hope you like it. Let me know if you want to discuss anything!

@maxbachmann
Member

What performance improvement do you actually see with this change at different string lengths?
What is the impact on binary size for the individual functions? This matters here because this library is fairly focused on stack usage, while https://github.com/rapidfuzz/rapidfuzz-rs is more focused on performance. The Damerau-Levenshtein implementation is fairly similar in both; for the other metrics, rapidfuzz-rs uses more efficient implementations.
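
One way to answer the string-length question without extra dependencies is a small std-only timing loop. The sketch below is only an illustration: `levenshtein_bytes` is a plain two-row stand-in, not the implementation from this PR, and `make_input` and `time_per_call_ns` are hypothetical helpers. A Criterion benchmark would give more reliable statistics.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Plain two-row Levenshtein distance over bytes.
/// Stand-in for illustration only; not the implementation from this PR.
fn levenshtein_bytes(a: &[u8], b: &[u8]) -> usize {
    if a.is_empty() {
        return b.len();
    }
    if b.is_empty() {
        return a.len();
    }
    // prev[j] is the distance between a[..i] and b[..j] for the previous row.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0usize; b.len() + 1];
    for (i, &ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, &cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr[j + 1] = substitution.min(deletion).min(insertion);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Deterministic pseudo-random lowercase input (hypothetical helper).
fn make_input(len: usize, seed: u64) -> Vec<u8> {
    let mut state = seed;
    (0..len)
        .map(|_| {
            // Simple LCG; good enough for generating benchmark inputs.
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            b'a' + ((state >> 33) % 26) as u8
        })
        .collect()
}

/// Average nanoseconds per call for two inputs of the given length.
fn time_per_call_ns(len: usize, iterations: u32) -> f64 {
    let a = make_input(len, 1);
    let b = make_input(len, 2);
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box keeps the optimizer from removing the call.
        black_box(levenshtein_bytes(black_box(&a), black_box(&b)));
    }
    start.elapsed().as_nanos() as f64 / f64::from(iterations)
}

fn main() {
    for len in [8, 32, 128, 512] {
        println!("len {:>4}: {:.1} ns/call", len, time_per_call_ns(len, 1_000));
    }
}
```

Running this once on the base branch and once on the PR branch, with the real function swapped in for the stand-in, would show whether the gain holds for short and long inputs alike.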

@fereidani
Author

Hey, here are my results on a 9950X.
The biggest improvements are for levenshtein and normalized_levenshtein, at roughly 18%, and osa_distance, at about 15%. damerau_levenshtein improves by about 10%.
The benchmark code is included, so I recommend running it yourself too.

     Running benches/benches.rs (target/release/deps/benches-de929fd6e345bf4e)
hamming                 time:   [23.611 ns 23.726 ns 23.847 ns]
                        change: [−2.9541% −1.4701% −0.0719%] (p = 0.05 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

jaro                    time:   [252.87 ns 255.47 ns 257.71 ns]
                        change: [−1.3861% −0.3061% +0.8445%] (p = 0.60 > 0.05)
                        No change in performance detected.
Found 21 outliers among 100 measurements (21.00%)
  21 (21.00%) high mild

jaro_winkler            time:   [250.41 ns 251.35 ns 252.36 ns]
                        change: [+3.9810% +4.9065% +5.8499%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe

Benchmarking jaro_longstring: Warming up for 500.00 ms
Warning: Unable to complete 100 samples in 3.0s. You may wish to increase target time to 3.0s, or reduce sample count to 90.
jaro_longstring         time:   [29.871 ms 29.999 ms 30.133 ms]
                        change: [+3.1316% +3.9735% +4.7743%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

levenshtein             time:   [531.67 ns 531.99 ns 532.28 ns]
                        change: [−17.907% −17.838% −17.771%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

levenshtein_u8          time:   [358.26 ns 359.07 ns 360.01 ns]
                        change: [+1.5142% +1.8168% +2.1244%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

normalized_levenshtein  time:   [534.28 ns 534.58 ns 534.93 ns]
                        change: [−18.588% −18.492% −18.393%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

osa_distance            time:   [553.30 ns 553.88 ns 554.66 ns]
                        change: [−15.436% −15.293% −15.142%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

damerau_levenshtein     time:   [1.1171 µs 1.1184 µs 1.1198 µs]
                        change: [−10.019% −9.9091% −9.8048%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

normalized_damerau_levenshtein
                        time:   [1.1048 µs 1.1055 µs 1.1064 µs]
                        change: [−10.349% −10.036% −9.5561%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe

sorensen_dice           time:   [661.48 ns 661.89 ns 662.29 ns]
                        change: [−1.5827% −1.4893% −1.3807%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

sorensen_dice_long_0    time:   [126.42 ns 126.61 ns 126.79 ns]
                        change: [−0.9868% −0.7670% −0.5411%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

sorensen_dice_long_1    time:   [586.27 ns 586.87 ns 587.62 ns]
                        change: [−0.9852% −0.7824% −0.5318%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe

sorensen_dice_long_2    time:   [1.7806 µs 1.7821 µs 1.7834 µs]
                        change: [−1.8718% −1.4158% −0.8131%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe

sorensen_dice_long_3    time:   [1.3045 µs 1.3049 µs 1.3053 µs]
                        change: [+0.1160% +0.2055% +0.2960%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

sorensen_dice_long_4    time:   [125.89 µs 125.95 µs 126.01 µs]
                        change: [−2.5226% −2.4407% −2.3577%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

sorensen_dice_long_5    time:   [273.25 ns 274.04 ns 274.80 ns]
                        change: [+1.1451% +1.5181% +2.0031%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

sorensen_dice_long_6    time:   [187.44 ns 187.53 ns 187.65 ns]
                        change: [−0.0200% +0.2188% +0.5952%] (p = 0.18 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
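
As a reading aid for the `change` lines above: assuming Criterion reports the relative change as `new / old - 1` (an assumption worth checking against the Criterion documentation), the pre-change time can be estimated from the reported point estimates. The sketch below does that arithmetic; `baseline_from_change` is a hypothetical helper, and because both inputs are separate statistical estimates, the result is only approximate.

```rust
/// Estimate the baseline time from the new time and the relative change in percent,
/// assuming change = new / old - 1 (hypothetical helper, not part of Criterion).
fn baseline_from_change(new_time: f64, change_percent: f64) -> f64 {
    new_time / (1.0 + change_percent / 100.0)
}

fn main() {
    // Middle (point) estimates copied from the benchmark output above, in nanoseconds.
    let rows = [
        ("levenshtein", 531.99, -17.838),
        ("normalized_levenshtein", 534.58, -18.492),
        ("osa_distance", 553.88, -15.293),
    ];
    for (name, new_ns, change_percent) in rows {
        let old_ns = baseline_from_change(new_ns, change_percent);
        println!(
            "{name}: approx. {old_ns:.1} ns before, {new_ns:.1} ns after ({:.2}x speedup)",
            old_ns / new_ns
        );
    }
}
```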

@fereidani
Author

The same optimizations can be applied to rapidfuzz-rs too. Let me know if you are interested and I can send a PR.
