perf: Optimize strpos() for ASCII-only inputs #20295

neilconway · 2026-02-11T17:50:22Z

The previous implementation had a fast path for ASCII-only inputs, but it was still relatively slow. Switch to using memchr::memchr() to find the first matching byte and then check the rest of the bytes by hand. This improves performance for ASCII inputs by 2x-4x on the built-in strpos benchmarks.
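The approach described above can be sketched as follows. This is a minimal, dependency-free sketch, not the code in this PR. It reuses the `find_ascii_substring` signature and contract shown later in the review thread (1-based result, 0 if not found, ASCII-only inputs). `find_byte` is a hypothetical stand-in for `memchr::memchr()`, which uses SIMD for the same scan. Returning 1 for an empty needle is an assumption modeled on PostgreSQL's `strpos`.

```rust
/// Hypothetical stand-in for `memchr::memchr(byte, haystack)` so the sketch
/// has no external dependencies; the real crate vectorizes this scan.
fn find_byte(byte: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == byte)
}

/// Sketch: jump to each candidate position of the needle's first byte, then
/// compare the remaining needle bytes by hand.
/// Returns a 1-based position, or 0 if not found. Both inputs are assumed
/// to be ASCII-only, so byte offsets equal character positions.
fn find_ascii_substring(haystack: &[u8], needle: &[u8]) -> usize {
    let n = needle.len();
    if n == 0 {
        return 1; // assumption: an empty needle matches at position 1
    }
    if n > haystack.len() {
        return 0;
    }
    let first = needle[0];
    let rest = &needle[1..];
    // A match can only start at or before this index.
    let last_start = haystack.len() - n;
    let mut start = 0;
    while start <= last_start {
        // Scan only the region where a match could still begin.
        match find_byte(first, &haystack[start..=last_start]) {
            Some(offset) => {
                let pos = start + offset;
                if &haystack[pos + 1..pos + n] == rest {
                    return pos + 1; // convert 0-based index to 1-based
                }
                start = pos + 1; // resume after this candidate
            }
            None => return 0,
        }
    }
    0
}
```

For example, `find_ascii_substring(b"hello world", b"world")` returns 7. The first-byte scan skips over non-candidate positions quickly, and the by-hand comparison of the remainder is cheap for the short needles typical of `strpos()` calls.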

Which issue does this PR close?

Closes #20294 (Optimize strpos() for ASCII inputs).

Are these changes tested?

Yes, passes unit tests and SLT.

Are there any user-facing changes?

No.

neilconway · 2026-02-11T17:51:07Z

Benchmark results:

$ cargo bench --bench strpos -- --baseline strpos-vanilla
   Compiling datafusion-functions v52.1.0 (/Users/neilconway/datafusion/datafusion/functions)
    Finished `bench` profile [optimized] target(s) in 49.54s
     Running benches/strpos.rs (target/release/deps/strpos-276a7f6d948782b8)
Gnuplot not found, using plotters backend
strpos_StringArray_ascii_str_len_8
                        time:   [70.568 µs 70.979 µs 71.399 µs]
                        change: [−42.408% −42.154% −41.895%] (p = 0.00 < 0.05)
                        Performance has improved.

strpos_StringArray_utf8_str_len_8
                        time:   [139.70 µs 139.98 µs 140.24 µs]
                        change: [−2.8251% −2.5080% −2.2091%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild

strpos_StringViewArray_ascii_str_len_8
                        time:   [84.823 µs 85.501 µs 86.164 µs]
                        change: [−36.379% −35.942% −35.475%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) high mild
  9 (9.00%) high severe

strpos_StringViewArray_utf8_str_len_8
                        time:   [149.49 µs 149.70 µs 149.91 µs]
                        change: [−1.3145% −1.0960% −0.8604%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

strpos_StringArray_ascii_str_len_32
                        time:   [88.618 µs 88.681 µs 88.746 µs]
                        change: [−59.156% −59.095% −59.039%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low severe
  2 (2.00%) high mild

strpos_StringArray_utf8_str_len_32
                        time:   [288.50 µs 288.98 µs 289.65 µs]
                        change: [−0.7910% −0.6439% −0.4836%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  5 (5.00%) high mild
  3 (3.00%) high severe

strpos_StringViewArray_ascii_str_len_32
                        time:   [103.70 µs 103.83 µs 103.98 µs]
                        change: [−55.373% −55.209% −55.040%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  1 (1.00%) low severe
  11 (11.00%) high mild
  7 (7.00%) high severe

strpos_StringViewArray_utf8_str_len_32
                        time:   [311.17 µs 311.76 µs 312.27 µs]
                        change: [+0.9177% +1.1383% +1.3431%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

strpos_StringArray_ascii_str_len_128
                        time:   [135.59 µs 136.00 µs 136.40 µs]
                        change: [−79.902% −79.847% −79.794%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking strpos_StringArray_utf8_str_len_128: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, enable flat sampling, or reduce sample count to 60.
strpos_StringArray_utf8_str_len_128
                        time:   [1.2347 ms 1.2360 ms 1.2373 ms]
                        change: [−1.2792% −1.1145% −0.9587%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) low mild

strpos_StringViewArray_ascii_str_len_128
                        time:   [173.34 µs 177.51 µs 181.56 µs]
                        change: [−74.843% −74.464% −74.023%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking strpos_StringViewArray_utf8_str_len_128: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, enable flat sampling, or reduce sample count to 60.
strpos_StringViewArray_utf8_str_len_128
                        time:   [1.2400 ms 1.2414 ms 1.2428 ms]
                        change: [−1.4076% −1.2513% −1.0985%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) low mild

strpos_StringArray_ascii_str_len_4096
                        time:   [4.4126 ms 4.4207 ms 4.4292 ms]
                        change: [−76.979% −76.930% −76.887%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe

strpos_StringArray_utf8_str_len_4096
                        time:   [36.033 ms 36.097 ms 36.179 ms]
                        change: [−1.2339% −1.0242% −0.7534%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

strpos_StringViewArray_ascii_str_len_4096
                        time:   [4.6480 ms 4.6559 ms 4.6643 ms]
                        change: [−75.980% −75.938% −75.899%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe

strpos_StringViewArray_utf8_str_len_4096
                        time:   [36.095 ms 36.134 ms 36.173 ms]
                        change: [−1.0341% −0.9052% −0.7789%] (p = 0.00 < 0.05)
                        Change within noise threshold.

neilconway · 2026-02-11T18:06:56Z

I added some notes on the approach and some possible future improvements to #20294.

kumarUjjawal · 2026-02-11T18:21:29Z

It would be great if you could use the PR template with relevant details, it maintains consistency.

neilconway · 2026-02-11T18:24:05Z

> It would be great if you could use the PR template with relevant details, it maintains consistency.

Sure. I've been removing sections that would otherwise have been left empty, but I can leave the full template in if you'd prefer.

2010YOUY01

Nice optimization!

I left an idea for a potential simplification; if it turns out to be slower, we can proceed with the current implementation.

2010YOUY01 · 2026-02-12T03:11:45Z

datafusion/functions/src/unicode/strpos.rs

+/// `memchr` does not, and strpos is often invoked many times on short inputs.
+/// Returns a 1-based position, or 0 if not found.
+/// Both inputs must be ASCII-only.
+fn find_ascii_substring(haystack: &[u8], needle: &[u8]) -> usize {


Is it possible to use memchr::memmem::find() directly? Based on its Complexity section, it seems to implement the same algorithm.
https://docs.rs/memchr/latest/memchr/memmem/fn.find.html

neilconway · 2026-02-12

Thanks for the suggestion! When I tried using memmem::find(), it was substantially slower, presumably because it incurs some per-call overhead (I'd imagine setting up lookup tables, etc.) that memchr does not.

I'd like to explore optimizing the (common) case where strpos() is invoked with a constant substring; in that case we could construct a memmem::Finder once and use it for the entire input batch. But this PR is already a significant win, so my thought was to defer that to a subsequent PR.
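The reuse idea mentioned above (build a searcher once per batch when the substring is constant) can be sketched without external dependencies. With the real crate this would presumably be `let finder = memchr::memmem::Finder::new(needle);` followed by `finder.find(row.as_bytes())` for each row. Below, a small Boyer-Moore-Horspool searcher stands in for it, since memchr's `Finder` uses different algorithms internally; the point is only that the lookup-table setup is paid once rather than per row. `HorspoolFinder` and `strpos_constant_needle` are hypothetical names, and rows are assumed to be ASCII so byte offsets equal character positions.

```rust
/// Hypothetical stand-in for `memchr::memmem::Finder`: a Boyer-Moore-Horspool
/// searcher whose 256-entry skip table is built once in `new` and then reused.
struct HorspoolFinder<'n> {
    needle: &'n [u8],
    skip: [usize; 256],
}

impl<'n> HorspoolFinder<'n> {
    fn new(needle: &'n [u8]) -> Self {
        let n = needle.len();
        // Bytes absent from the needle allow a full-length shift.
        let mut skip = [n; 256];
        if n > 0 {
            // For each byte except the last, record its distance from the end;
            // later occurrences overwrite earlier ones (smallest safe shift).
            for (i, &b) in needle[..n - 1].iter().enumerate() {
                skip[b as usize] = n - 1 - i;
            }
        }
        HorspoolFinder { needle, skip }
    }

    /// Returns the 0-based index of the first match, if any.
    fn find(&self, haystack: &[u8]) -> Option<usize> {
        let n = self.needle.len();
        if n == 0 {
            return Some(0);
        }
        let mut pos = 0;
        while pos + n <= haystack.len() {
            if &haystack[pos..pos + n] == self.needle {
                return Some(pos);
            }
            // Shift by the skip value of the byte under the needle's last byte.
            pos += self.skip[haystack[pos + n - 1] as usize];
        }
        None
    }
}

/// Hypothetical batch kernel for a constant substring: the finder is built
/// once and reused for every row. Returns 1-based positions (0 = not found).
fn strpos_constant_needle(rows: &[&str], needle: &str) -> Vec<usize> {
    let finder = HorspoolFinder::new(needle.as_bytes());
    rows.iter()
        .map(|row| finder.find(row.as_bytes()).map_or(0, |i| i + 1))
        .collect()
}
```

For example, `strpos_constant_needle(&["abcabc", "xyz", "zzabc"], "abc")` returns `vec![1, 0, 3]`. Whether this beats the memchr-plus-manual-check path in the PR would depend on needle length and batch size, and would need to be measured with the existing strpos benchmarks.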

github-actions bot added the functions label (Changes to functions implementation) on Feb 11, 2026

neilconway force-pushed the neilc/optimize-strpos branch from 1cb12b8 to 7e1ef9f on February 11, 2026 19:00

2010YOUY01 approved these changes Feb 12, 2026
