perf: small-buffer inline cache for replace_rec_fn #13378
Draft
Kha wants to merge 1 commit into leanprover:master
Conversation

Author
!bench

Benchmark results for 9398a0b against 82bb27f are in. Significant changes detected! @Kha
Large changes (11✅, 5🟥)
Medium changes (28✅, 3🟥): too many entries to display here. View the full report on radar instead.
Small changes (895✅, 29🟥): too many entries to display here. View the full report on radar instead.

Mathlib CI status (docs):

Collaborator
Reference manual CI status:
This PR replaces the `std::unordered_map`-based cache in `replace_rec_fn` with a small-buffer cache that stores its first 16 entries inline in uninitialized stack storage and only allocates a real hash map for the rare large traversal. Instrumentation across a full `leanchecker --fresh Init.Data.List.Lemmas` run shows that 87% of `replace_rec_fn` instances hold at most 15 entries and only 0.21% exceed 128, with a mean cache size of just 9 entries spread across ~950k instances. At that scale a hash map is the wrong data structure: its per-instance bucket-array allocation and per-entry node allocation dwarf the cost of a linear scan over a handful of entries.

The new structure pays no initialization cost for entries that are never inserted, performs no allocation at all on the common path, and falls back to the original `unordered_map` once the inline buffer fills. Combined with the existing `is_likely_unshared` filter, lookups on the common path are just a tight scan over a stack-resident array.

On `leanchecker --fresh Init.Data.List.Lemmas` this shaves `17.10 G -> 16.18 G` instructions (~5.4%) and `1.74s -> 1.62s` wall-clock (~6.7%) compared to the previous baseline. It supersedes the prior `try_emplace` and `reserve(128)` micro-optimizations on the same cache, both of which are no longer needed since the hash map is no longer on the hot path.