Fix SMT2 output determinism by using ordered maps#8830
Fix SMT2 output determinism by using ordered maps#8830tautschnig wants to merge 8 commits intodiffblue:developfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Improves determinism of generated SMT2 output (and thus solver performance stability) by making key generation steps order-stable and adding a regression test.
Changes:
- Replaced several
std::unordered_mapmembers in the SMT2 converter with ordered maps and added determinism-oriented comments. - Made
goto_symext::phi_functionprocess SSA delta merges in a deterministic order. - Added a regression test asserting stable ordering of SMT2 declarations.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/solvers/smt2/smt2_conv.h | Switches multiple unordered_map members to map to attempt stable ordering. |
| src/solvers/smt2/smt2_conv.cpp | Adds determinism-related commentary around footer emission and symbol discovery. |
| src/goto-symex/symex_goto.cpp | Sorts phi-merge processing by identifier to stabilize downstream equation / SMT2 output order. |
| regression/cbmc/deterministic-smt-output/test.desc | New regression asserting consistent declaration ordering (x before y). |
| regression/cbmc/deterministic-smt-output/main.c | Minimal program to trigger and validate deterministic SMT2 output ordering. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Test results on mlkem-native, running on M1 macOS, with MLKEM_K=4 and -j8 Before (release 6.8.0): 139 proofs in 17m13s real, 49m 12s user |
|
Test results on mlkem-native, running on r7g.16xlarge/Ubuntu, with MLKEM_K=4 and -j64 Before (release 6.8.0): 139 proofs in 6m20s real, 47m37s user |
|
Test results on mldsa-native, running on r7g.16xlarge/Ubuntu, with MLD_CONFIG_PARAMETER_SET=87 and -j64 Before (release 6.8.0): 173 proofs in 4m22s real, 51m38s user so a little slower. For example, the proofs of sign_verify_internal slows from 143s to 307s. With stability gained, we can experiment with that to see which ordering of SMT terms is the best. I also hope that PR#8705 will result in a noticeable improvement in performance. |
|
Rod: can you tell what makes that run slower? Is it the time to generate the SMT-LIB instance, or the time to solve it? |
|
For polyvec_add() which slowed a little after this change, the time to generate the SMT is 5 seconds. Time to prove it with Z3 is 520s. Once we have stabilized SMT-generation in CBMC, we've got a rational basis to proceed - we can experiment with Z3 options and tactics, work out what the best ordering of terms is (for Z3 at least), and possible open bug reports on Z3. Onwards! |
|
That Z3 time of 520s is with the option With default options, proof time is 44s on my laptop. There might be a few other cases where we can adjust Z3 options to get the overall performance back to what it was. |
|
Do we know why CI is failing? |
64ab6c2 to
a627bf8
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #8830 +/- ##
========================================
Coverage 80.00% 80.01%
========================================
Files 1700 1700
Lines 188248 188296 +48
Branches 73 73
========================================
+ Hits 150613 150669 +56
+ Misses 37635 37627 -8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
With the latest commit: result for mlkem-native proofs on r7g, 64 cores, with MLKEM_K=4 CBMC 6.8.0 - 151 proofs OK in 6m20s real, 48m2s user |
|
With the latest commit: result for mldsa-native (main branch) proofs on r7g, 64 cores, with MLDSA_CONFIG_PARAMETER_SET=87 CBMC 6.8.0 - 174 proofs OK in 3m56s real, 52m24s user The most obvious drop in performance is in sign_verify_internal() function, where proof time slows from 120s to 260s with the new branch. |
36ba44b to
8436275
Compare
|
I need to fix up some regression tests that expect a particular instruction numbering. |
|
mlkem-native,K=4,r7g,AArch64/Ubuntu,64 cores CBMC 6.8.0, 151 proofs in 6m29s real, 49m40s user. Longest are indcpa_keypair_derand@348s, indcpa_enc@250s Odd... indcpa_enc has slowed from 250s to 986s |
| } | ||
| } | ||
|
|
||
| // Push a pointer typecast into pointer arithmetic |
There was a problem hiding this comment.
See the commit message: sort-of in that it avoids producing different-looking formulae that are still semantically equivalent. But it's indeed not a problem caused by iteration over hash maps. I can move this to a different PR if preferred.
This commit addresses the issue where SMT2 formula output varied even when semantically equivalent C code was provided with different source formatting (e.g., adding empty lines). Root Cause: ----------- Phi nodes were generated by iterating over a hash map, implying iteration in hash-dependent order. When source code formatting changes: 1. Line numbers change 2. Expression hashes change 3. Symbols are stored/iterated in different orders 4. SMT2 declarations appear in different orders The ensuing changes to the SMT formula content -- even though the formulae were semantically equivalent -- caused variability in SMT-solver performance. Producing stable output should ensure stability in solver performance. For the test case provided in diffblue#8820 this now yields: `SMT files: 0 lines differ` (where previously it was 8 lines differing). Fixes: diffblue#8820
We otherwise set the prefix in `goto_convert_functionst::convert_function`, which is not used when directly converting individual instructions. This makes sure we have deterministic names irrespective of the order in which we iterate over functions.
Avoid hash value changes resulting in different sequences of constraints being generated, which can affect SMT solver performance.
Explain why iteration is actually deterministic.
Iterate over symbols in lexicographic order to ensure that we consistently produce the same instrumented program for a given input.
b51252d to
ec20240
Compare
Changes in function-identifier hashes should not result in varying location numbers. This requires updating several test descriptions that rely on specific location numbers. Also, some location numbers can vary across platforms when, e.g., system headers drag in additional functions, or 32/64-bit differences in constraining argc. To avoid such problems, several tests were adjusted to either not use unnecessary system headers or not use argc/argv.
Use lexicographic ordering when iterating over goto functions to avoid property numbers (or the order in which they printed) to depend on hash values.
This permits simplifying, e.g., `(char*)(ptr + 1) + -1` to `(char*)ptr` when `ptr` is `unsigned char*`. This, in turn, avoids different formulae produced on AArch64 (where `char` is unsigned) and Apple Silicon/macOS (where `char` is signed, as is the case on x86).
ec20240 to
37f2da4
Compare
|
With the latest commit: result for mlkem-native proofs on r7g, 64 cores, with MLKEM_K=4 Last test run - 151 proofs OK in 7m2s real, 50m35s user The tall poppy is indcpa_enc() at 982s. The next longest is 300s, so I will look at that one to see if it can brought back under control. |
|
With the latest commit: result for mldsa-native (main branch) proofs on r7g, 64 cores, with MLDSA_CONFIG_PARAMETER_SET=87 Last test run - 174 proofs OK in 4m55s real, 55m54s user so a bit faster. |
|
I can confirm that using z3 tactic.default_tactic=smt reduces the proof time of indcpa_enc() to 300s with MLKEM_K=4 on r7g/Ubuntu, which I hope is acceptable. |
See diffblue/cbmc#8830 Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
See diffblue/cbmc#8830 Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
See diffblue/cbmc#8830 Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
See diffblue/cbmc#8830 Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
See diffblue/cbmc#8830 Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
See diffblue/cbmc#8830 Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
See diffblue/cbmc#8830 Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
This commit addresses the issue where SMT2 formula output varied even when semantically equivalent C code was provided with different source formatting (e.g., adding empty lines).
Root Cause:
Phi nodes were generated by iterating over a hash map, implying iteration in hash-dependent order. When source code formatting changes:
The ensuing changes to the SMT formula content -- even though the formulae were semantically equivalent -- caused variability in SMT-solver performance. Producing stable output should ensure stability in solver performance. For the test case provided in #8820 this now yields:
SMT files: 0 lines differ(where previously it was 8 lines differing).Fixes: #8820