perf: Optimize translate() UDF for scalar inputs #20305

neilconway · 2026-02-12T01:36:00Z

Which issue does this PR close?

Closes #20302 (Optimize translate()).

Rationale for this change

translate() is commonly invoked with constant values for its second and third arguments. We can take advantage of that to significantly improve performance by precomputing the translation lookup table once, rather than recomputing it for every row. For scalar ASCII inputs, this yields roughly a 10x performance improvement. For scalar UTF8 inputs, the improvement is closer to 50%, although it is smaller for long strings.

What changes are included in this PR?

Add a benchmark for scalar/constant input to translate
Add a missing test case
Improve translate() docs
Support translate() on LargeUtf8 input (AFAICS there is no reason this shouldn't be supported?)
Optimize translate() for scalar inputs by precomputing lookup hashmap
Optimize translate() for ASCII inputs by precomputing ASCII byte-wise lookup table
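
To make the scalar optimization concrete, here is a minimal sketch of the idea, not the code in this PR. It assumes PostgreSQL-style translate() semantics (characters in `from` with no counterpart in `to` are deleted, and the first mapping wins when `from` repeats a character), and it works on `char`s using only the standard library. The helper names `build_translate_map` and `translate_with_map` are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical helper: built once when `from` and `to` are constants.
/// Maps each `from` character to `Some(replacement)`, or to `None` when
/// the character has no counterpart in `to` and should be deleted.
fn build_translate_map(from: &str, to: &str) -> HashMap<char, Option<char>> {
    let mut to_chars = to.chars();
    let mut map = HashMap::new();
    for c in from.chars() {
        // Advance `to` for every position in `from`, even for repeats.
        let replacement = to_chars.next();
        // Assumed semantics: keep the first mapping for a repeated character.
        map.entry(c).or_insert(replacement);
    }
    map
}

/// Hypothetical helper: applied per row, reusing the precomputed map.
fn translate_with_map(input: &str, map: &HashMap<char, Option<char>>) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match map.get(&c) {
            Some(Some(r)) => out.push(*r), // mapped: emit the replacement
            Some(None) => {}               // mapped to nothing: delete
            None => out.push(c),           // unmapped: copy through
        }
    }
    out
}
```

With this split, `build_translate_map("143", "ax")` runs once per batch and `translate_with_map` runs once per row, so the per-row cost no longer includes rebuilding the map.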

Are these changes tested?

Yes. Added an extra test case and did a bunch of benchmarking.

Are there any user-facing changes?

No.

When the second and third arguments are constants (which is common), we can build the lookup table once, rather than rebuilding it for every input row. When all of the arguments are ASCII-only, we can do lookups via a fixed-size lookup table that directly maps ASCII byte values, rather than a hash table.
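
The ASCII fast path described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the names `build_ascii_table` and `translate_ascii` and the `DELETE` sentinel are hypothetical, and the deletion and first-mapping-wins semantics are assumed to match PostgreSQL's translate().

```rust
/// Hypothetical sentinel meaning "delete this byte". ASCII values are
/// always below 0x80, so 0xFF can never be a real replacement.
const DELETE: u8 = 0xFF;

/// Hypothetical helper: builds a 128-entry table indexed by ASCII byte.
/// Returns `None` if either argument contains non-ASCII data, in which
/// case a caller would fall back to a slower general-purpose path.
fn build_ascii_table(from: &str, to: &str) -> Option<[u8; 128]> {
    if !from.is_ascii() || !to.is_ascii() {
        return None;
    }
    // Start with the identity mapping: every byte maps to itself.
    let mut table: [u8; 128] = std::array::from_fn(|i| i as u8);
    let mut seen = [false; 128];
    let to = to.as_bytes();
    for (i, &b) in from.as_bytes().iter().enumerate() {
        let idx = b as usize;
        if seen[idx] {
            continue; // assumed semantics: first mapping wins
        }
        seen[idx] = true;
        // Bytes beyond the end of `to` are marked for deletion.
        table[idx] = to.get(i).copied().unwrap_or(DELETE);
    }
    Some(table)
}

/// Hypothetical helper: translates one ASCII row with plain array indexing.
/// Returns `None` for non-ASCII input so the caller can fall back.
fn translate_ascii(input: &str, table: &[u8; 128]) -> Option<String> {
    if !input.is_ascii() {
        return None;
    }
    let mut out = Vec::with_capacity(input.len());
    for &b in input.as_bytes() {
        let mapped = table[b as usize];
        if mapped != DELETE {
            out.push(mapped);
        }
    }
    // Every pushed byte is ASCII, so the buffer is valid UTF-8.
    String::from_utf8(out).ok()
}
```

Each input byte costs one array index and one comparison, with no hashing and no UTF-8 decoding, which is why a table of this kind would be expected to beat a hash map on ASCII data.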

neilconway · 2026-02-12T01:37:13Z

Benchmark results below. Scalar (constant) inputs improve by roughly 88-95%. There is a small regression of roughly 4-5% for array (non-constant) inputs. That's a bit surprising to me; I'd suspect that for non-microbenchmark scenarios, the performance difference shouldn't be measurable.

$ cargo bench --bench translate -- --baseline translate-vanilla
   Compiling datafusion-functions v52.1.0 (/Users/neilconway/datafusion/datafusion/functions)
    Finished `bench` profile [optimized] target(s) in 48.08s
     Running benches/translate.rs (target/release/deps/translate-0ad4e0fbe704471b)
Gnuplot not found, using plotters backend
translate size=1024/array_from_to [str_len=8]
                        time:   [142.15 µs 142.80 µs 143.65 µs]
                        change: [+3.5257% +4.1316% +4.7931%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high severe
translate size=1024/scalar_from_to [str_len=8]
                        time:   [23.681 µs 23.884 µs 24.073 µs]
                        change: [−88.308% −88.195% −88.095%] (p = 0.00 < 0.05)
                        Performance has improved.
translate size=1024/array_from_to [str_len=32]
                        time:   [382.29 µs 383.81 µs 385.73 µs]
                        change: [+4.2362% +5.2992% +6.1855%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe
translate size=1024/scalar_from_to [str_len=32]
                        time:   [38.447 µs 38.574 µs 38.738 µs]
                        change: [−92.135% −92.066% −92.014%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high severe
translate size=1024/array_from_to [str_len=128]
                        time:   [1.2771 ms 1.2866 ms 1.2982 ms]
                        change: [+4.4088% +5.5487% +6.6548%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high mild
translate size=1024/scalar_from_to [str_len=128]
                        time:   [90.708 µs 91.473 µs 92.386 µs]
                        change: [−94.261% −94.186% −94.113%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high mild
translate size=1024/array_from_to [str_len=1024]
                        time:   [9.5563 ms 9.6087 ms 9.6860 ms]
                        change: [+3.6639% +4.8067% +5.8998%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 10 measurements (30.00%)
  1 (10.00%) low mild
  2 (20.00%) high severe
translate size=1024/scalar_from_to [str_len=1024]
                        time:   [574.30 µs 576.41 µs 578.83 µs]
                        change: [−95.134% −95.094% −95.059%] (p = 0.00 < 0.05)
                        Performance has improved.

translate size=4096/array_from_to [str_len=8]
                        time:   [554.04 µs 555.40 µs 557.34 µs]
                        change: [+4.2651% +4.7723% +5.2386%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) low mild
  1 (10.00%) high severe
translate size=4096/scalar_from_to [str_len=8]
                        time:   [89.457 µs 89.865 µs 90.271 µs]
                        change: [−88.765% −88.690% −88.618%] (p = 0.00 < 0.05)
                        Performance has improved.
translate size=4096/array_from_to [str_len=32]
                        time:   [1.4962 ms 1.5035 ms 1.5160 ms]
                        change: [+4.3863% +5.3105% +6.5114%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
translate size=4096/scalar_from_to [str_len=32]
                        time:   [149.42 µs 149.90 µs 150.43 µs]
                        change: [−92.350% −92.289% −92.229%] (p = 0.00 < 0.05)
                        Performance has improved.
translate size=4096/array_from_to [str_len=128]
                        time:   [5.0164 ms 5.0436 ms 5.0850 ms]
                        change: [+3.4791% +4.7191% +5.8389%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
translate size=4096/scalar_from_to [str_len=128]
                        time:   [351.02 µs 353.78 µs 357.13 µs]
                        change: [−94.417% −94.367% −94.310%] (p = 0.00 < 0.05)
                        Performance has improved.
translate size=4096/array_from_to [str_len=1024]
                        time:   [38.090 ms 38.370 ms 38.712 ms]
                        change: [+3.4318% +4.6043% +5.7667%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high mild
translate size=4096/scalar_from_to [str_len=1024]
                        time:   [2.3102 ms 2.3175 ms 2.3268 ms]
                        change: [−95.111% −95.067% −95.024%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe

neilconway added 5 commits February 11, 2026 20:30

Add benchmark code for translate w/ scalar args

7a2b9f2

Add test case

d693af3

Tweaks for docs

60d896d

Support translate() in LargeUtf8 input

d5c184e

AFAIK there is no reason not to support this.

github-actions bot added the documentation (Improvements or additions to documentation) and functions (Changes to functions implementation) labels Feb 12, 2026
