Conversation

@maxbachmann
Member

Since we only need to iterate over the bigrams of each string once, we can create them lazily instead of collecting them into a string. This reduces the binary size by around 7%. In addition, it reduces the runtime in our current benchmark by around 11%.
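The idea can be sketched as follows. Note this is an illustrative sketch, not strsim's actual implementation: `bigrams` is a hypothetical helper name, and the real code may differ in how it handles Unicode and edge cases.

```rust
// Lazily yield overlapping character pairs (bigrams) from a string,
// instead of allocating an intermediate collection up front.
// Each bigram is produced on demand as the caller iterates.
fn bigrams(s: &str) -> impl Iterator<Item = (char, char)> + '_ {
    s.chars().zip(s.chars().skip(1))
}

fn main() {
    // The iterator is consumed once; nothing is collected until we ask for it.
    let pairs: Vec<(char, char)> = bigrams("night").collect();
    assert_eq!(pairs, [('n', 'i'), ('i', 'g'), ('g', 'h'), ('h', 't')]);
}
```

Since the similarity metric only needs a single pass over each string's bigrams, the lazy iterator avoids both the allocation and the code that would otherwise be generated for building and dropping the temporary collection, which is where the binary-size win comes from.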

For reference in my example binary this gives:

File  .text     Size Crate
6.0%  96.7% 275.8KiB std
0.1%   1.5%   4.3KiB strsim
0.0%   0.0%     124B rf_test
0.0%   0.0%     102B [Unknown]
6.3% 100.0% 285.3KiB .text section size, the file size is 4.5MiB

while previously it was:

File  .text     Size Crate
6.1%  96.6% 276.5KiB std
0.1%   1.6%   4.6KiB strsim
0.0%   0.0%     124B rf_test
0.0%   0.0%     102B [Unknown]
6.3% 100.0% 286.3KiB .text section size, the file size is 4.5MiB

@maxbachmann marked this pull request as draft January 4, 2024 11:57
@maxbachmann
Member Author

It should be possible to improve this quite a bit further by using the same hashmap used for the Damerau-Levenshtein implementation. I made a quick experiment which reduced runtime by another 64% while reducing binary size by another 38%. However, this version was just a quick experiment and doesn't calculate the correct score yet. So it could just be faster + smaller because it's broken 🤷‍♂️
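A hashmap-based approach along those lines might look like the following. This is a speculative sketch of the general technique, not the experiment described above; `bigram_intersection` is a hypothetical name, and the real implementation would need to share the map type with the Damerau-Levenshtein code and compute the full Sørensen-Dice score.

```rust
use std::collections::HashMap;

// Count the bigrams of `a` in a hashmap, then walk the bigrams of `b`
// and consume matching entries to count the multiset intersection.
fn bigram_intersection(a: &str, b: &str) -> usize {
    let mut counts: HashMap<(char, char), usize> = HashMap::new();
    for pair in a.chars().zip(a.chars().skip(1)) {
        *counts.entry(pair).or_insert(0) += 1;
    }

    let mut matches = 0;
    for pair in b.chars().zip(b.chars().skip(1)) {
        if let Some(c) = counts.get_mut(&pair) {
            if *c > 0 {
                *c -= 1; // consume this occurrence so duplicates aren't double-counted
                matches += 1;
            }
        }
    }
    matches
}

fn main() {
    // "night" and "nacht" share exactly one bigram: ('h', 't').
    assert_eq!(bigram_intersection("night", "nacht"), 1);
}
```

Reusing one map type for both metrics would let the compiler share the monomorphized hashmap code between them, which plausibly explains the additional binary-size reduction.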

