Skip to content

Fix SMT2 output determinism by using ordered maps#8830

Open
tautschnig wants to merge 8 commits intodiffblue:developfrom
tautschnig:smt2-output-stability
Open

Fix SMT2 output determinism by using ordered maps#8830
tautschnig wants to merge 8 commits intodiffblue:developfrom
tautschnig:smt2-output-stability

Conversation

@tautschnig
Copy link
Collaborator

@tautschnig tautschnig commented Jan 23, 2026

This commit addresses the issue where SMT2 formula output varied even when semantically equivalent C code was provided with different source formatting (e.g., adding empty lines).

Root Cause:

Phi nodes were generated by iterating over a hash map, implying iteration in hash-dependent order. When source code formatting changes:

  1. Line numbers change
  2. Expression hashes change
  3. Symbols are stored/iterated in different orders
  4. SMT2 declarations appear in different orders

The ensuing changes to the SMT formula content -- even though the formulae were semantically equivalent -- caused variability in SMT-solver performance. Producing stable output should ensure stability in solver performance. For the test case provided in #8820 this now yields: SMT files: 0 lines differ (where previously it was 8 lines differing).

Fixes: #8820

  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Copilot AI review requested due to automatic review settings January 23, 2026 12:57
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves determinism of generated SMT2 output (and thus solver performance stability) by making key generation steps order-stable and adding a regression test.

Changes:

  • Replaced several std::unordered_map members in the SMT2 converter with ordered maps and added determinism-oriented comments.
  • Made goto_symext::phi_function process SSA delta merges in a deterministic order.
  • Added a regression test asserting stable ordering of SMT2 declarations.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/solvers/smt2/smt2_conv.h Switches multiple unordered_map members to map to attempt stable ordering.
src/solvers/smt2/smt2_conv.cpp Adds determinism-related commentary around footer emission and symbol discovery.
src/goto-symex/symex_goto.cpp Sorts phi-merge processing by identifier to stabilize downstream equation / SMT2 output order.
regression/cbmc/deterministic-smt-output/test.desc New regression asserting consistent declaration ordering (x before y).
regression/cbmc/deterministic-smt-output/main.c Minimal program to trigger and validate deterministic SMT2 output ordering.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@tautschnig tautschnig self-assigned this Jan 23, 2026
@rod-chapman
Copy link
Collaborator

rod-chapman commented Jan 23, 2026

Test results on mlkem-native, running on M1 macOS, with MLKEM_K=4 and -j8

Before (release 6.8.0): 139 proofs in 17m13s real, 49m 12s user
After (this branch): 139 proofs in 18m39s real, 51m users.

@rod-chapman
Copy link
Collaborator

rod-chapman commented Jan 23, 2026

Test results on mlkem-native, running on r7g.16xlarge/Ubuntu, with MLKEM_K=4 and -j64

Before (release 6.8.0): 139 proofs in 6m20s real, 47m37s user
After (this branch): 139 proofs in 6m53s real, 49m33s user.

@rod-chapman
Copy link
Collaborator

Test results on mldsa-native, running on r7g.16xlarge/Ubuntu, with MLD_CONFIG_PARAMETER_SET=87 and -j64

Before (release 6.8.0): 173 proofs in 4m22s real, 51m38s user
After (this branch): 173 proofs in 5m49s real, 57m28s

so a little slower. For example, the proofs of sign_verify_internal slows from 143s to 307s.

With stability gained, we can experiment with that to see which ordering of SMT terms is the best.

I also hope that PR#8705 will result in a noticeable improvement in performance.

@kroening
Copy link
Collaborator

Rod: can you tell what makes that run slower? Is it the time to generate the SMT-LIB instance, or the time to solve it?

@rod-chapman
Copy link
Collaborator

For polyvec_add() which slowed a little after this change, the time to generate the SMT is 5 seconds. Time to prove it with Z3 is 520s.

Once we have stabilized SMT-generation in CBMC, we've got a rational basis to proceed - we can experiment with Z3 options and tactics, work out what the best ordering of terms is (for Z3 at least), and possible open bug reports on Z3. Onwards!

@rod-chapman
Copy link
Collaborator

That Z3 time of 520s is with the option rewrite.sort_bv_ac=true.

With default options, proof time is 44s on my laptop. There might be a few other cases where we can adjust Z3 options to get the overall performance back to what it was.

@rod-chapman
Copy link
Collaborator

Do we know why CI is failing?

@tautschnig tautschnig force-pushed the smt2-output-stability branch 2 times, most recently from 64ab6c2 to a627bf8 Compare January 27, 2026 10:38
@codecov
Copy link

codecov bot commented Jan 27, 2026

Codecov Report

❌ Patch coverage is 95.31250% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.01%. Comparing base (74fd5bd) to head (37f2da4).

Files with missing lines Patch % Lines
src/goto-programs/show_properties.cpp 50.00% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #8830   +/-   ##
========================================
  Coverage    80.00%   80.01%           
========================================
  Files         1700     1700           
  Lines       188248   188296   +48     
  Branches        73       73           
========================================
+ Hits        150613   150669   +56     
+ Misses       37635    37627    -8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@rod-chapman
Copy link
Collaborator

With the latest commit: result for mlkem-native proofs on r7g, 64 cores, with MLKEM_K=4

CBMC 6.8.0 - 151 proofs OK in 6m20s real, 48m2s user
This branch - 151 proofs OK in 7m2s real, 50m35s user

@rod-chapman
Copy link
Collaborator

With the latest commit: result for mldsa-native (main branch) proofs on r7g, 64 cores, with MLDSA_CONFIG_PARAMETER_SET=87

CBMC 6.8.0 - 174 proofs OK in 3m56s real, 52m24s user
This branch - 174 proofs OK in 4m55s real, 55m54s user

The most obvious drop in performance is in sign_verify_internal() function, where proof time slows from 120s to 260s with the new branch.

@tautschnig
Copy link
Collaborator Author

I need to fix up some regression tests that expect a particular instruction numbering.

@rod-chapman
Copy link
Collaborator

mlkem-native,K=4,r7g,AArch64/Ubuntu,64 cores

CBMC 6.8.0, 151 proofs in 6m29s real, 49m40s user. Longest are indcpa_keypair_derand@348s, indcpa_enc@250s
CBMC commit c596, 151 proofs in 17m3s real, 65m45s user. Longest are indcpa_enc@986s, indcpa_keypair_derand@297s

Odd...

indcpa_enc has slowed from 250s to 986s
indcpa_keypair_derand got faster from 348s to 297s

}
}

// Push a pointer typecast into pointer arithmetic
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that even related?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the commit message: sort-of in that it avoids producing different-looking formulae that are still semantically equivalent. But it's indeed not a problem caused by iteration over hash maps. I can move this to a different PR if preferred.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please!

This commit addresses the issue where SMT2 formula output varied even
when semantically equivalent C code was provided with different source
formatting (e.g., adding empty lines).

Root Cause:
-----------
Phi nodes were generated by iterating over a hash map, implying
iteration in hash-dependent order. When source code formatting changes:
1. Line numbers change
2. Expression hashes change
3. Symbols are stored/iterated in different orders
4. SMT2 declarations appear in different orders

The ensuing changes to the SMT formula content -- even though the
formulae were semantically equivalent -- caused variability in
SMT-solver performance. Producing stable output should ensure stability
in solver performance. For the test case provided in diffblue#8820 this now
yields: `SMT files: 0 lines differ` (where previously it was 8 lines
differing).

Fixes: diffblue#8820
We otherwise set the prefix in
`goto_convert_functionst::convert_function`, which is not used when
directly converting individual instructions. This makes sure we have
deterministic names irrespective of the order in which we iterate over
functions.
Avoid hash value changes resulting in different sequences of constraints
being generated, which can affect SMT solver performance.
Explain why iteration is actually deterministic.
Iterate over symbols in lexicographic order to ensure that we
consistently produce the same instrumented program for a given input.
@tautschnig tautschnig force-pushed the smt2-output-stability branch 2 times, most recently from b51252d to ec20240 Compare February 7, 2026 19:01
Changes in function-identifier hashes should not result in varying
location numbers. This requires updating several test descriptions that
rely on specific location numbers. Also, some location numbers can vary
across platforms when, e.g., system headers drag in additional
functions, or 32/64-bit differences in constraining argc. To avoid such
problems, several tests were adjusted to either not use unnecessary
system headers or not use argc/argv.
Use lexicographic ordering when iterating over goto functions to avoid
property numbers (or the order in which they printed) to depend on hash
values.
This permits simplifying, e.g., `(char*)(ptr + 1) + -1` to `(char*)ptr`
when `ptr` is `unsigned char*`. This, in turn, avoids different formulae
produced on AArch64 (where `char` is unsigned) and Apple Silicon/macOS
(where `char` is signed, as is the case on x86).
@tautschnig tautschnig force-pushed the smt2-output-stability branch from ec20240 to 37f2da4 Compare February 7, 2026 19:25
@rod-chapman
Copy link
Collaborator

With the latest commit: result for mlkem-native proofs on r7g, 64 cores, with MLKEM_K=4

Last test run - 151 proofs OK in 7m2s real, 50m35s user
This branch @ 37f2... - 151 proofs OK in 16m55s real, 63m29s

The tall poppy is indcpa_enc() at 982s. The next longest is 300s, so I will look at that one to see if it can brought back under control.

@rod-chapman
Copy link
Collaborator

With the latest commit: result for mldsa-native (main branch) proofs on r7g, 64 cores, with MLDSA_CONFIG_PARAMETER_SET=87

Last test run - 174 proofs OK in 4m55s real, 55m54s user
Latest commit on this branch: 174 proofs OK in 4m31s 54m44s

so a bit faster.

@rod-chapman
Copy link
Collaborator

rod-chapman commented Feb 9, 2026

I can confirm that using z3 tactic.default_tactic=smt reduces the proof time of indcpa_enc() to 300s with MLKEM_K=4 on r7g/Ubuntu, which I hope is acceptable.

mkannwischer added a commit to pq-code-package/mlkem-native that referenced this pull request Feb 12, 2026
See diffblue/cbmc#8830

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
mkannwischer added a commit to pq-code-package/mldsa-native that referenced this pull request Feb 12, 2026
See diffblue/cbmc#8830

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
mkannwischer added a commit to pq-code-package/mldsa-native that referenced this pull request Feb 12, 2026
See diffblue/cbmc#8830

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
mkannwischer added a commit to pq-code-package/mldsa-native that referenced this pull request Feb 12, 2026
See diffblue/cbmc#8830

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
mkannwischer added a commit to pq-code-package/mldsa-native that referenced this pull request Feb 12, 2026
See diffblue/cbmc#8830

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
rod-chapman pushed a commit to pq-code-package/mlkem-native that referenced this pull request Feb 13, 2026
See diffblue/cbmc#8830

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
rod-chapman pushed a commit to pq-code-package/mldsa-native that referenced this pull request Feb 13, 2026
See diffblue/cbmc#8830

Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Proof instability resulting from insertion of blank line in sources

3 participants