
@neilconway
Contributor

Which issue does this PR close?

Rationale for this change

translate() is commonly invoked with constant values for its second and third arguments. We can take advantage of that to significantly speed it up by precomputing the translation lookup table once, rather than rebuilding it for every row. For scalar ASCII inputs, this yields roughly a 10x performance improvement. For scalar UTF8 inputs, the improvement is closer to 50%, and smaller still for long strings.
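For reference, here is a minimal sketch of the per-row behavior being optimized. The function name translate_row is hypothetical. It assumes the usual PostgreSQL-style semantics: each character found in from is replaced by the character at the same position in to, characters in from with no counterpart in to are deleted, and (an assumption of this sketch) the first occurrence of a repeated character in from wins. It iterates over chars for simplicity; the actual DataFusion implementation may segment strings differently. Note that the lookup map depends only on from and to, yet it is rebuilt on every call.

```rust
use std::collections::HashMap;

/// Hypothetical baseline: translates one row, rebuilding the lookup map on
/// every call even when `from` and `to` are the same for all rows.
fn translate_row(s: &str, from: &str, to: &str) -> String {
    // Map each character of `from` to its position; first occurrence wins.
    let mut map: HashMap<char, usize> = HashMap::new();
    for (i, c) in from.chars().enumerate() {
        map.entry(c).or_insert(i);
    }
    let to: Vec<char> = to.chars().collect();

    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match map.get(&c) {
            // Character in `from` with a counterpart in `to`: replace it.
            Some(&i) if i < to.len() => out.push(to[i]),
            // Character in `from` beyond the end of `to`: delete it.
            Some(_) => {}
            // Character not in `from`: copy it through unchanged.
            None => out.push(c),
        }
    }
    out
}

fn main() {
    let rows = ["12345", "abc", "3x1"];
    // The map is rebuilt once per row although `from`/`to` never change.
    for row in rows {
        println!("{}", translate_row(row, "143", "ax"));
    }
}
```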

What changes are included in this PR?

  • Add a benchmark for scalar/constant input to translate
  • Add a missing test case
  • Improve translate() docs
  • Support translate() on LargeUtf8 input (AFAICS there is no reason this shouldn't be supported?)
  • Optimize translate() for constant (scalar) second and third arguments by precomputing the lookup hash map once
  • Optimize translate() for ASCII inputs by precomputing a byte-wise ASCII lookup table

Are these changes tested?

Yes. Added an extra test case and did a bunch of benchmarking.

Are there any user-facing changes?

No.

AFAIK there is no reason not to support LargeUtf8 input.
When the second and third arguments are constants (which is common), we
can build the lookup table once, rather than rebuilding it for every
input row.
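A minimal sketch of that idea, with the table built once outside the per-row loop. The Translator type and its methods are hypothetical, not DataFusion's API; the sketch works on chars and assumes the first occurrence of a repeated character in from wins. Storing Option<char> per key folds the "replace or delete" decision into the precomputed table, so each row only does one lookup per character.

```rust
use std::collections::HashMap;

/// Hypothetical precomputed translator: built once from constant `from`/`to`
/// arguments, then reused for every input row.
struct Translator {
    // Character in `from` -> Some(replacement), or None meaning "delete".
    map: HashMap<char, Option<char>>,
}

impl Translator {
    fn new(from: &str, to: &str) -> Self {
        let to: Vec<char> = to.chars().collect();
        let mut map = HashMap::new();
        for (i, c) in from.chars().enumerate() {
            // First occurrence of a character in `from` wins.
            map.entry(c).or_insert(to.get(i).copied());
        }
        Translator { map }
    }

    fn translate(&self, s: &str) -> String {
        let mut out = String::with_capacity(s.len());
        for c in s.chars() {
            match self.map.get(&c) {
                Some(Some(r)) => out.push(*r), // replace
                Some(None) => {}               // no counterpart in `to`: delete
                None => out.push(c),           // not in `from`: keep
            }
        }
        out
    }
}

fn main() {
    // Build the lookup table once, outside the per-row loop.
    let t = Translator::new("143", "ax");
    let rows = ["12345", "abc", "3x1"];
    let translated: Vec<String> = rows.iter().map(|r| t.translate(r)).collect();
    println!("{:?}", translated);
}
```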

When all of the arguments are ASCII-only, we can do lookups via a
fixed-size lookup table that directly maps ASCII byte values, rather
than a hash table.
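One way such a fixed-size table could look, as a hedged sketch: the AsciiTable type is hypothetical, and it assumes a 128-entry array indexed directly by byte value, where each entry holds the replacement byte or None for "delete". Construction refuses non-ASCII from/to arguments, and translate returns None for a non-ASCII input row so a caller can fall back to the general (hash map) path.

```rust
/// Hypothetical ASCII fast path: a 128-entry table indexed by byte value.
/// Each entry is the replacement byte, or None to delete the byte.
struct AsciiTable([Option<u8>; 128]);

impl AsciiTable {
    /// Returns None unless `from` and `to` are both ASCII-only.
    fn new(from: &str, to: &str) -> Option<Self> {
        if !from.is_ascii() || !to.is_ascii() {
            return None;
        }
        // Start from the identity mapping: every byte maps to itself.
        let mut table: [Option<u8>; 128] = std::array::from_fn(|i| Some(i as u8));
        let to = to.as_bytes();
        let mut seen = [false; 128];
        for (i, &b) in from.as_bytes().iter().enumerate() {
            // First occurrence wins; bytes past the end of `to` map to None.
            if !seen[b as usize] {
                seen[b as usize] = true;
                table[b as usize] = to.get(i).copied();
            }
        }
        Some(AsciiTable(table))
    }

    /// Returns None for non-ASCII input; the caller falls back to the general path.
    fn translate(&self, s: &str) -> Option<String> {
        if !s.is_ascii() {
            return None;
        }
        // One array index per byte, no hashing.
        let bytes: Vec<u8> = s.bytes().filter_map(|b| self.0[b as usize]).collect();
        // All output bytes are ASCII, hence valid UTF-8.
        Some(String::from_utf8(bytes).expect("ASCII bytes are valid UTF-8"))
    }
}

fn main() {
    let t = AsciiTable::new("143", "ax").expect("ASCII arguments");
    println!("{:?}", t.translate("12345")); // Some("a2x5")
}
```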
@github-actions github-actions bot added documentation Improvements or additions to documentation functions Changes to functions implementation labels Feb 12, 2026
@neilconway
Contributor Author

Benchmark results below. There is a small regression (roughly 4-5%) for array (non-constant) inputs. That's a bit surprising to me; I suspect that outside of microbenchmarks the performance difference wouldn't be measurable.

$ cargo bench --bench translate -- --baseline translate-vanilla
   Compiling datafusion-functions v52.1.0 (/Users/neilconway/datafusion/datafusion/functions)
    Finished `bench` profile [optimized] target(s) in 48.08s
     Running benches/translate.rs (target/release/deps/translate-0ad4e0fbe704471b)
Gnuplot not found, using plotters backend
translate size=1024/array_from_to [str_len=8]
                        time:   [142.15 µs 142.80 µs 143.65 µs]
                        change: [+3.5257% +4.1316% +4.7931%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high severe
translate size=1024/scalar_from_to [str_len=8]
                        time:   [23.681 µs 23.884 µs 24.073 µs]
                        change: [−88.308% −88.195% −88.095%] (p = 0.00 < 0.05)
                        Performance has improved.
translate size=1024/array_from_to [str_len=32]
                        time:   [382.29 µs 383.81 µs 385.73 µs]
                        change: [+4.2362% +5.2992% +6.1855%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe
translate size=1024/scalar_from_to [str_len=32]
                        time:   [38.447 µs 38.574 µs 38.738 µs]
                        change: [−92.135% −92.066% −92.014%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high severe
translate size=1024/array_from_to [str_len=128]
                        time:   [1.2771 ms 1.2866 ms 1.2982 ms]
                        change: [+4.4088% +5.5487% +6.6548%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high mild
translate size=1024/scalar_from_to [str_len=128]
                        time:   [90.708 µs 91.473 µs 92.386 µs]
                        change: [−94.261% −94.186% −94.113%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high mild
translate size=1024/array_from_to [str_len=1024]
                        time:   [9.5563 ms 9.6087 ms 9.6860 ms]
                        change: [+3.6639% +4.8067% +5.8998%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 10 measurements (30.00%)
  1 (10.00%) low mild
  2 (20.00%) high severe
translate size=1024/scalar_from_to [str_len=1024]
                        time:   [574.30 µs 576.41 µs 578.83 µs]
                        change: [−95.134% −95.094% −95.059%] (p = 0.00 < 0.05)
                        Performance has improved.

translate size=4096/array_from_to [str_len=8]
                        time:   [554.04 µs 555.40 µs 557.34 µs]
                        change: [+4.2651% +4.7723% +5.2386%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) low mild
  1 (10.00%) high severe
translate size=4096/scalar_from_to [str_len=8]
                        time:   [89.457 µs 89.865 µs 90.271 µs]
                        change: [−88.765% −88.690% −88.618%] (p = 0.00 < 0.05)
                        Performance has improved.
translate size=4096/array_from_to [str_len=32]
                        time:   [1.4962 ms 1.5035 ms 1.5160 ms]
                        change: [+4.3863% +5.3105% +6.5114%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
translate size=4096/scalar_from_to [str_len=32]
                        time:   [149.42 µs 149.90 µs 150.43 µs]
                        change: [−92.350% −92.289% −92.229%] (p = 0.00 < 0.05)
                        Performance has improved.
translate size=4096/array_from_to [str_len=128]
                        time:   [5.0164 ms 5.0436 ms 5.0850 ms]
                        change: [+3.4791% +4.7191% +5.8389%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
translate size=4096/scalar_from_to [str_len=128]
                        time:   [351.02 µs 353.78 µs 357.13 µs]
                        change: [−94.417% −94.367% −94.310%] (p = 0.00 < 0.05)
                        Performance has improved.
translate size=4096/array_from_to [str_len=1024]
                        time:   [38.090 ms 38.370 ms 38.712 ms]
                        change: [+3.4318% +4.6043% +5.7667%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high mild
translate size=4096/scalar_from_to [str_len=1024]
                        time:   [2.3102 ms 2.3175 ms 2.3268 ms]
                        change: [−95.111% −95.067% −95.024%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe
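To read the scalar results as speedup factors: assuming Criterion's change line is the relative change in run time, (new - old) / old, a change of c% corresponds to a speedup of old / new = 1 / (1 + c/100). The hypothetical helper below applies that to the size=1024 scalar_from_to point estimates above, which work out to roughly 8.5x at str_len=8 rising to roughly 20x at str_len=1024. The same formula puts the array_from_to regressions at about 4% slower.

```rust
/// Converts a Criterion-style relative change in time (e.g. -88.195 for
/// "-88.195%") into a speedup factor old_time / new_time.
fn speedup(change_pct: f64) -> f64 {
    1.0 / (1.0 + change_pct / 100.0)
}

fn main() {
    // Point estimates from the size=1024 scalar_from_to results above.
    let results = [(8, -88.195), (32, -92.066), (128, -94.186), (1024, -95.094)];
    for (str_len, change) in results {
        println!("str_len={:>4}: {:.1}x faster", str_len, speedup(change));
    }
}
```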



Development

Successfully merging this pull request may close these issues.

Optimize translate()
