Conversation

@maxbachmann
Member

Since we only need to iterate over the bigrams of each string once, we can create them lazily instead of collecting them into a string. This reduces the binary size by around 7%. In addition, it reduces the runtime in our current benchmark by around 11%.
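The idea can be sketched as follows. Note this is an illustrative sketch, not strsim's actual implementation: `bigrams` is a hypothetical helper name, and the real code may differ in how it handles Unicode and edge cases.

```rust
// Lazily yield overlapping character pairs (bigrams) from a string,
// instead of allocating an intermediate collection up front.
// Each bigram is produced on demand as the caller iterates.
fn bigrams(s: &str) -> impl Iterator<Item = (char, char)> + '_ {
    s.chars().zip(s.chars().skip(1))
}

fn main() {
    // The iterator is consumed once; nothing is collected until we ask for it.
    let pairs: Vec<(char, char)> = bigrams("night").collect();
    assert_eq!(pairs, [('n', 'i'), ('i', 'g'), ('g', 'h'), ('h', 't')]);
}
```

Since the similarity metric only needs a single pass over each string's bigrams, the lazy iterator avoids both the allocation and the code that would otherwise be generated for building and dropping the temporary collection, which is where the binary-size win comes from.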

For reference in my example binary this gives:

File  .text     Size Crate
6.0%  96.7% 275.8KiB std
0.1%   1.5%   4.3KiB strsim
0.0%   0.0%     124B rf_test
0.0%   0.0%     102B [Unknown]
6.3% 100.0% 285.3KiB .text section size, the file size is 4.5MiB

while previously it was:

File  .text     Size Crate
6.1%  96.6% 276.5KiB std
0.1%   1.6%   4.6KiB strsim
0.0%   0.0%     124B rf_test
0.0%   0.0%     102B [Unknown]
6.3% 100.0% 286.3KiB .text section size, the file size is 4.5MiB

@maxbachmann marked this pull request as draft January 4, 2024 11:57
@maxbachmann
Member Author

It should be possible to improve this quite a bit further by using the same hashmap used for the Damerau-Levenshtein implementation. I made a quick experiment which reduced runtime by another 64% while reducing binary size by another 38%. However, this version was just a quick experiment and doesn't calculate the correct score yet. So it could just be faster + smaller because it's broken 🤷‍♂️
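A hashmap-based approach along those lines might look like the following. This is a speculative sketch of the general technique, not the experiment described above; `bigram_intersection` is a hypothetical name, and the real implementation would need to share the map type with the Damerau-Levenshtein code and compute the full Sørensen-Dice score.

```rust
use std::collections::HashMap;

// Count the bigrams of `a` in a hashmap, then walk the bigrams of `b`
// and consume matching entries to count the multiset intersection.
fn bigram_intersection(a: &str, b: &str) -> usize {
    let mut counts: HashMap<(char, char), usize> = HashMap::new();
    for pair in a.chars().zip(a.chars().skip(1)) {
        *counts.entry(pair).or_insert(0) += 1;
    }

    let mut matches = 0;
    for pair in b.chars().zip(b.chars().skip(1)) {
        if let Some(c) = counts.get_mut(&pair) {
            if *c > 0 {
                *c -= 1; // consume this occurrence so duplicates aren't double-counted
                matches += 1;
            }
        }
    }
    matches
}

fn main() {
    // "night" and "nacht" share exactly one bigram: ('h', 't').
    assert_eq!(bigram_intersection("night", "nacht"), 1);
}
```

Reusing one map type for both metrics would let the compiler share the monomorphized hashmap code between them, which plausibly explains the additional binary-size reduction.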

