@neilconway
Contributor

The previous implementation incurred the overhead of Unicode machinery, even for the common case where both the input string and the fill string consist only of ASCII characters. For the ASCII-only case, we can assume that the length in bytes equals the length in characters, and avoid expensive grapheme-based segmentation. This follows similar optimizations applied elsewhere in the codebase.
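
To illustrate the idea, here is a minimal standalone sketch of an ASCII fast path for left-padding. It is not the code in this PR: the function name `lpad_sketch` is hypothetical, it works on plain `&str` rather than Arrow arrays, and the slow path counts `char`s as a stand-in for the grapheme segmentation that the real implementation performs.

```rust
/// Left-pad `s` with `fill` up to `target` characters.
/// If `s` is longer than `target`, it is truncated instead.
/// Hypothetical helper for illustration only.
fn lpad_sketch(s: &str, target: usize, fill: &str) -> String {
    if s.is_ascii() && fill.is_ascii() {
        // Fast path: for ASCII, one byte is one character, so byte
        // lengths and byte slicing are safe and no segmentation is needed.
        let len = s.len();
        if target <= len {
            return s[..target].to_string();
        }
        if fill.is_empty() {
            return s.to_string();
        }
        let pad_len = target - len;
        let mut out = String::with_capacity(target);
        // Whole repetitions of the fill string, then a partial one.
        for _ in 0..pad_len / fill.len() {
            out.push_str(fill);
        }
        out.push_str(&fill[..pad_len % fill.len()]);
        out.push_str(s);
        out
    } else {
        // Slow path: count characters explicitly. Assumption: `char`s
        // are used here in place of grapheme clusters.
        let chars: Vec<char> = s.chars().collect();
        if target <= chars.len() {
            return chars[..target].iter().collect();
        }
        if fill.is_empty() {
            return s.to_string();
        }
        let pad_len = target - chars.len();
        let mut out = String::new();
        out.extend(fill.chars().cycle().take(pad_len));
        out.push_str(s);
        out
    }
}
```

The `is_ascii()` checks are a linear scan over bytes, which is cheap compared with segmenting every string; that extra scan is also a plausible source of the small cost on the Unicode path.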

Benchmarks indicate this is a significant performance win for ASCII-only input (4x-10x faster) but only a mild regression for Unicode input (2-5% slower).

Along the way:

  • Replace a few instances of write_str(str)? + append_value("") with a single append_value(str), which saves a few cycles
  • Add a missing test case for truncating the input string
  • Add benchmarks for Unicode input

Which issue does this PR close?

Are these changes tested?

Covered by existing tests. Added new benchmarks for Unicode inputs.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the functions Changes to functions implementation label Feb 10, 2026
@neilconway
Contributor Author

Benchmark results:

     Running benches/pad.rs (target/release/deps/pad-b74c12aa445bf68e)
Gnuplot not found, using plotters backend
lpad size=1024/lpad utf8 [size=1024, str_len=5, target=20]
                        time:   [13.059 µs 13.073 µs 13.086 µs]
                        change: [−75.614% −75.346% −75.137%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild
lpad size=1024/lpad stringview [size=1024, str_len=5, target=20]
                        time:   [11.552 µs 11.560 µs 11.569 µs]
                        change: [−78.830% −78.528% −78.298%] (p = 0.00 < 0.05)
                        Performance has improved.
lpad size=1024/lpad utf8 [size=1024, str_len=20, target=50]
                        time:   [11.373 µs 11.420 µs 11.458 µs]
                        change: [−93.139% −92.998% −92.888%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild
lpad size=1024/lpad stringview [size=1024, str_len=20, target=50]
                        time:   [11.857 µs 11.871 µs 11.887 µs]
                        change: [−92.972% −92.825% −92.700%] (p = 0.00 < 0.05)
                        Performance has improved.
lpad size=1024/lpad utf8 unicode [size=1024, target=20]
                        time:   [92.289 µs 93.798 µs 95.872 µs]
                        change: [−4.0607% −2.2791% −0.0744%] (p = 0.05 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe
lpad size=1024/lpad stringview unicode [size=1024, target=20]
                        time:   [95.919 µs 96.579 µs 97.458 µs]
                        change: [+3.0933% +4.1235% +5.2351%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe

lpad size=4096/lpad utf8 [size=4096, str_len=5, target=20]
                        time:   [55.219 µs 55.744 µs 56.437 µs]
                        change: [−74.845% −74.463% −74.067%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
lpad size=4096/lpad stringview [size=4096, str_len=5, target=20]
                        time:   [47.605 µs 47.737 µs 47.887 µs]
                        change: [−78.282% −78.097% −77.945%] (p = 0.00 < 0.05)
                        Performance has improved.
lpad size=4096/lpad utf8 [size=4096, str_len=20, target=50]
                        time:   [46.430 µs 47.324 µs 48.286 µs]
                        change: [−93.049% −92.852% −92.662%] (p = 0.00 < 0.05)
                        Performance has improved.
lpad size=4096/lpad stringview [size=4096, str_len=20, target=50]
                        time:   [47.352 µs 48.110 µs 49.133 µs]
                        change: [−92.810% −92.629% −92.423%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
lpad size=4096/lpad utf8 unicode [size=4096, target=20]
                        time:   [376.29 µs 378.75 µs 381.86 µs]
                        change: [+1.7985% +2.5712% +3.4954%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
lpad size=4096/lpad stringview unicode [size=4096, target=20]
                        time:   [380.75 µs 383.62 µs 387.43 µs]
                        change: [−1.2725% −0.2318% +0.8972%] (p = 0.70 > 0.05)
                        No change in performance detected.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high severe

rpad size=1024/rpad utf8 [size=1024, str_len=5, target=20]
                        time:   [13.597 µs 13.665 µs 13.748 µs]
                        change: [−77.429% −77.259% −77.102%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
rpad size=1024/rpad stringview [size=1024, str_len=5, target=20]
                        time:   [13.854 µs 13.908 µs 13.970 µs]
                        change: [−76.702% −76.483% −76.293%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high mild
rpad size=1024/rpad utf8 [size=1024, str_len=20, target=50]
                        time:   [12.804 µs 12.850 µs 12.903 µs]
                        change: [−92.564% −92.437% −92.325%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
rpad size=1024/rpad stringview [size=1024, str_len=20, target=50]
                        time:   [13.173 µs 13.204 µs 13.238 µs]
                        change: [−92.356% −92.207% −92.115%] (p = 0.00 < 0.05)
                        Performance has improved.
rpad size=1024/rpad utf8 unicode [size=1024, target=20]
                        time:   [98.236 µs 98.714 µs 99.357 µs]
                        change: [+2.2886% +3.1339% +3.9890%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
rpad size=1024/rpad stringview unicode [size=1024, target=20]
                        time:   [97.562 µs 103.38 µs 113.92 µs]
                        change: [−0.4527% +5.6605% +16.577%] (p = 0.30 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe

rpad size=4096/rpad utf8 [size=4096, str_len=5, target=20]
                        time:   [57.742 µs 58.722 µs 59.893 µs]
                        change: [−76.131% −75.713% −75.202%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high mild
rpad size=4096/rpad stringview [size=4096, str_len=5, target=20]
                        time:   [57.256 µs 58.176 µs 59.196 µs]
                        change: [−75.652% −75.151% −74.661%] (p = 0.00 < 0.05)
                        Performance has improved.
rpad size=4096/rpad utf8 [size=4096, str_len=20, target=50]
                        time:   [52.659 µs 55.964 µs 61.240 µs]
                        change: [−92.224% −91.701% −90.893%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
rpad size=4096/rpad stringview [size=4096, str_len=20, target=50]
                        time:   [50.029 µs 50.950 µs 51.995 µs]
                        change: [−92.638% −92.455% −92.266%] (p = 0.00 < 0.05)
                        Performance has improved.
rpad size=4096/rpad utf8 unicode [size=4096, target=20]
                        time:   [368.78 µs 370.27 µs 371.98 µs]
                        change: [−6.9765% −5.8825% −4.9873%] (p = 0.00 < 0.05)
                        Performance has improved.
rpad size=4096/rpad stringview unicode [size=4096, target=20]
                        time:   [372.78 µs 374.26 µs 376.59 µs]
                        change: [−6.8942% −6.1219% −5.3899%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
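
For reading the `change` figures above as speedup factors: assuming Criterion reports the relative change in time, (new − old) / old, a change of c percent corresponds to a speedup of 1 / (1 + c/100). The helper below is a hypothetical convenience for that arithmetic, not part of the benchmark suite.

```rust
/// Convert a relative change in running time (in percent, negative
/// meaning faster) into a speedup factor old_time / new_time.
/// Assumes change = (new - old) / old, expressed as a percentage.
fn speedup_from_change(change_pct: f64) -> f64 {
    1.0 / (1.0 + change_pct / 100.0)
}
```

Under that assumption, the roughly −75% results correspond to about 4x, and the roughly −93% results correspond to about 14x.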

@neilconway neilconway force-pushed the neilc/optimize-lpad-rpad branch 2 times, most recently from 690f6b4 to f3b449e Compare February 10, 2026 19:52
@neilconway neilconway force-pushed the neilc/optimize-lpad-rpad branch from f3b449e to 53b7236 Compare February 10, 2026 21:32
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Feb 10, 2026
argument(name = "n", description = "String length to pad to."),
argument(
name = "n",
description = "String length to pad to. If the input string is longer than this length, it is truncated."
Member

Suggested change
description = "String length to pad to. If the input string is longer than this length, it is truncated."
description = "String length to pad to. If the input string is longer than this length, it is truncated (on the left)."

to be explicit, as in lpad.rs

Successfully merging this pull request may close these issues.

Optimize lpad, rpad for ASCII-only strings
