-
Notifications
You must be signed in to change notification settings - Fork 1.9k
perf: Optimize strpos() for ASCII-only inputs #20295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Benchmark results: |
|
I added some notes on the approach and some future possible improvements to #20294 20294 |
|
It would be great if you could use the PR template with relevant details, it maintains consistency. |
Sure. I've been removing sections that would otherwise have been left empty, but I can leave the full template in if you'd prefer. |
The previous implementation had a fast path for ASCII-only inputs, but it was still relatively slow. Switch to using memchr::memchr() to find the first matching byte and then check the rest of the bytes by hand. This improves performance for ASCII inputs by 2x-4x on the built-in strpos benchmarks.
1cb12b8 to
7e1ef9f
Compare
2010YOUY01
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice optimization!
Left an idea for potential simplification, if it's slower, we can proceed with the current implementation.
| /// `memchr` does not, and strpos is often invoked many times on short inputs. | ||
| /// Returns a 1-based position, or 0 if not found. | ||
| /// Both inputs must be ASCII-only. | ||
| fn find_ascii_substring(haystack: &[u8], needle: &[u8]) -> usize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to use memchr::memmem::find() directly? Based on the Complexity section, it seems has implemented the same algorithm.
https://docs.rs/memchr/latest/memchr/memmem/fn.find.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! When I tried using memmem::find(), it was substantially slower -- presumably because it incurs some per-call overhead (I'd imagine setting up lookup tables etc.) that memchr does not.
I'd like to explore optimizing the (common) case where strpos() is invoked with a constant substring; in that case we could construct a memmove::Finder once, and use it for the entire input batch. But this PR is already a significant win so my thought was to defer that to a subsequent PR.
The previous implementation had a fast path for ASCII-only inputs, but it was still relatively slow. Switch to using memchr::memchr() to find the first matching byte and then check the rest of the bytes by hand. This improves performance for ASCII inputs by 2x-4x on the built-in strpos benchmarks.
Which issue does this PR close?
Are these changes tested?
Yes, passes unit tests and SLT.
Are there any user-facing changes?
No.