perf: de-virtualize `replace_rec_fn` dispatch via templated functor #13380
Draft
Kha wants to merge 1 commit into leanprover:master
This PR turns `replace_rec_fn` into a class template parameterized on the functor type and exposes it from `kernel/replace_fn.h`, alongside templated `replace<F>(...)` overloads that monomorphize on the functor at the call site. Hot callers like `instantiate`, `instantiate_rev`, and the C-exported `lean_expr_instantiate*_core` variants in `kernel/instantiate.cpp` all pass their substitution closure as a stateless lambda; previously each call into the closure went through `std::function`'s indirect dispatch on every recursion step, blocking inlining of the loose-bvar-range early return. With this change the closure body inlines into `replace_rec_fn::apply` and the `std::function` indirection is gone.

The legacy `replace(expr, std::function<...>, bool)` overload is preserved as a thin out-of-line wrapper that instantiates the template once with `std::function` as the functor type, so external callers continue to work unchanged.

On `leanchecker --fresh Init.Data.List.Lemmas` this shaves `17.10 G -> 16.50 G` instructions (~3.5%) and `1.74s -> 1.69s` wall-clock (~3%). All existing callers benefit automatically since they were already passing lambdas directly; no caller-side changes were required.
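For readers unfamiliar with the pattern, here is a minimal sketch of the idea on a toy tree. It is not the code from this PR: `Node`, `NodePtr`, `leaf`, `pair_of`, `sum`, `replace_rec`, and `replace_legacy` are hypothetical stand-ins, and Lean's `expr`, caching, and the `bool` flag are deliberately left out. It only illustrates how a traversal class templated on the functor type lets the compiler see the lambda body, while a `std::function` wrapper reuses the same template with a type-erased callback.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <optional>
#include <type_traits>
#include <utility>

namespace sketch {

// Toy immutable binary tree; a stand-in for illustration, not Lean's `expr`.
struct Node;
using NodePtr = std::shared_ptr<const Node>;
struct Node {
    int value;            // payload, only meaningful on leaves
    NodePtr left, right;  // both null for a leaf
};

inline NodePtr leaf(int v) { return std::make_shared<Node>(Node{v, nullptr, nullptr}); }
inline NodePtr pair_of(NodePtr l, NodePtr r) {
    return std::make_shared<Node>(Node{0, std::move(l), std::move(r)});
}
inline int sum(NodePtr const & n) {
    return n->left ? sum(n->left) + sum(n->right) : n->value;
}

// Traversal templated on the functor type F. With F a lambda type, the call
// `m_f(n, depth)` is a direct call that the compiler is free to inline.
template <typename F>
class replace_rec {
    F m_f;
public:
    explicit replace_rec(F f) : m_f(std::move(f)) {}

    NodePtr apply(NodePtr const & n, unsigned depth) {
        assert(n);  // precondition: non-null node
        // Functor gets first say: a value means "replace this subtree".
        if (std::optional<NodePtr> r = m_f(n, depth)) return *r;
        if (!n->left) return n;  // untouched leaf
        NodePtr l = apply(n->left, depth + 1);
        NodePtr r = apply(n->right, depth + 1);
        // Preserve sharing when nothing below changed.
        if (l == n->left && r == n->right) return n;
        return pair_of(std::move(l), std::move(r));
    }
};

// Templated entry point: monomorphizes on the functor at the call site.
template <typename F>
NodePtr replace(NodePtr const & n, F && f) {
    return replace_rec<std::decay_t<F>>(std::forward<F>(f)).apply(n, 0);
}

// Type-erased entry point: instantiates the template once with std::function,
// so every callback invocation still goes through indirect dispatch.
using replace_callback = std::function<std::optional<NodePtr>(NodePtr const &, unsigned)>;
inline NodePtr replace_legacy(NodePtr const & n, replace_callback const & f) {
    return replace_rec<replace_callback>(f).apply(n, 0);
}

}  // namespace sketch
```

Both entry points compute the same result; the difference is only in how the callback is invoked. Whether the templated path actually inlines depends on the compiler and optimization level, so any speedup should be confirmed by measurement, as the benchmark numbers above do for the real change.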
Member
Author
!bench
Benchmark results for ea73513 against 82bb27f are in. Significant changes detected! @Kha
Large changes (8✅)
Medium changes (28✅): Too many entries to display here. View the full report on radar instead.
Small changes (604✅, 2🟥): Too many entries to display here. View the full report on radar instead.