perf: pre-reserve replace_rec_fn cache to avoid early rehashing #13376
Draft
Kha wants to merge 1 commit into leanprover:master
This PR pre-reserves 128 buckets in the `replace_rec_fn` cache when caching is enabled. The default `std::unordered_map` starts very small and rehashes several times during a typical traversal, and the resulting bucket-array churn dominated cache-miss stalls in the hot kernel traversal. Reserving up front shaves another ~4% off the wall-clock of `leanchecker --fresh Init.Data.List.Lemmas` on top of the previous `try_emplace` change, for a combined ~8% wall-clock improvement (with a small ~1.5% increase in retired instructions that is more than offset by the improved IPC).
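The rehashing behaviour described above can be observed on a plain `std::unordered_map`. The following is only an illustrative sketch, not the code from this PR: the helper name `count_rehashes` is hypothetical, and it assumes the pre-sizing is done with `rehash(n)`, which sets the bucket count to at least `n` (the PR's diff is not shown here, so the actual call may differ, e.g. `reserve`).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical helper: counts how often the bucket array is reallocated
// while inserting `inserts` distinct keys. Passing `initial_buckets == 0`
// leaves the map at its implementation-defined default size.
std::size_t count_rehashes(std::size_t initial_buckets, std::size_t inserts) {
    std::unordered_map<std::size_t, int> cache;
    if (initial_buckets != 0) {
        cache.rehash(initial_buckets);  // bucket_count() becomes >= initial_buckets
        assert(cache.bucket_count() >= initial_buckets);
    }
    std::size_t rehashes = 0;
    std::size_t buckets = cache.bucket_count();
    for (std::size_t key = 0; key < inserts; ++key) {
        cache.emplace(key, 0);
        if (cache.bucket_count() != buckets) {  // bucket array was rebuilt
            ++rehashes;
            buckets = cache.bucket_count();
        }
    }
    return rehashes;
}
```

With the default maximum load factor of 1.0, `count_rehashes(128, 100)` returns 0, because 100 elements never exceed the capacity of at least 128 buckets. `count_rehashes(0, 100)` returns a positive count whose exact value depends on the standard library's growth policy.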
Member
Author
!bench
Benchmark results for 1f4b906 against 82bb27f are in. Significant changes detected! @Kha
Medium changes (3✅, 3🟥)
Small changes (7✅, 69🟥): too many entries to display here. View the full report on radar instead.
Mathlib CI status (docs):
Collaborator
Reference manual CI status: