Fix SMT2 output determinism by using ordered maps by tautschnig · Pull Request #8830 · diffblue/cbmc

tautschnig · 2026-01-23T12:57:41Z

This commit addresses the issue where SMT2 formula output varied even when semantically equivalent C code was provided with different source formatting (e.g., adding empty lines).

Root Cause:

Phi nodes were generated by iterating over a hash map, implying iteration in hash-dependent order. When source code formatting changes:

1. Line numbers change
2. Expression hashes change
3. Symbols are stored/iterated in different orders
4. SMT2 declarations appear in different orders

The ensuing changes to the SMT formula content -- even though the formulae were semantically equivalent -- caused variability in SMT-solver performance. Producing stable output should ensure stability in solver performance. For the test case provided in #8820 this now yields: `SMT files: 0 lines differ` (where previously 8 lines differed).

Fixes: #8820

Each commit message has a non-empty body, explaining why the change was made.
n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
n/a My commit message includes data points confirming performance improvements (if claimed).
My PR is restricted to a single feature or bugfix.
n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Copilot

Pull request overview

Improves determinism of generated SMT2 output (and thus solver performance stability) by making key generation steps order-stable and adding a regression test.

Changes:

Replaced several std::unordered_map members in the SMT2 converter with ordered maps and added determinism-oriented comments.
Made goto_symext::phi_function process SSA delta merges in a deterministic order.
Added a regression test asserting stable ordering of SMT2 declarations.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| src/solvers/smt2/smt2_conv.h | Switches multiple `unordered_map` members to `map` to attempt stable ordering. |
| src/solvers/smt2/smt2_conv.cpp | Adds determinism-related commentary around footer emission and symbol discovery. |
| src/goto-symex/symex_goto.cpp | Sorts phi-merge processing by identifier to stabilize downstream equation / SMT2 output order. |
| regression/cbmc/deterministic-smt-output/test.desc | New regression asserting consistent declaration ordering (x before y). |
| regression/cbmc/deterministic-smt-output/main.c | Minimal program to trigger and validate deterministic SMT2 output ordering. |

rod-chapman · 2026-01-23T14:50:48Z

Test results on mlkem-native, running on M1 macOS, with MLKEM_K=4 and -j8

Before (release 6.8.0): 139 proofs in 17m13s real, 49m 12s user
After (this branch): 139 proofs in 18m39s real, 51m user.

rod-chapman · 2026-01-23T14:51:52Z

Test results on mlkem-native, running on r7g.16xlarge/Ubuntu, with MLKEM_K=4 and -j64

Before (release 6.8.0): 139 proofs in 6m20s real, 47m37s user
After (this branch): 139 proofs in 6m53s real, 49m33s user.

rod-chapman · 2026-01-23T18:46:49Z

Test results on mldsa-native, running on r7g.16xlarge/Ubuntu, with MLD_CONFIG_PARAMETER_SET=87 and -j64

Before (release 6.8.0): 173 proofs in 4m22s real, 51m38s user
After (this branch): 173 proofs in 5m49s real, 57m28s user

So a little slower. For example, the proof of sign_verify_internal slows from 143s to 307s.

With stability gained, we can experiment with that to see which ordering of SMT terms is the best.

I also hope that PR#8705 will result in a noticeable improvement in performance.

kroening · 2026-01-23T23:15:29Z

Rod: can you tell what makes that run slower? Is it the time to generate the SMT-LIB instance, or the time to solve it?

rod-chapman · 2026-01-25T09:21:24Z

For polyvec_add() which slowed a little after this change, the time to generate the SMT is 5 seconds. Time to prove it with Z3 is 520s.

Once we have stabilized SMT generation in CBMC, we've got a rational basis to proceed - we can experiment with Z3 options and tactics, work out what the best ordering of terms is (for Z3 at least), and possibly open bug reports on Z3. Onwards!

rod-chapman · 2026-01-25T09:34:16Z

That Z3 time of 520s is with the option rewrite.sort_bv_ac=true.

With default options, proof time is 44s on my laptop. There might be a few other cases where we can adjust Z3 options to get the overall performance back to what it was.

rod-chapman · 2026-01-26T08:30:58Z

Do we know why CI is failing?

codecov · 2026-01-27T11:05:08Z

Codecov Report

❌ Patch coverage is 95.31250% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.01%. Comparing base (74fd5bd) to head (37f2da4).

Files with missing lines	Patch %	Lines
src/goto-programs/show_properties.cpp	50.00%	3 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #8830   +/-   ##
========================================
  Coverage    80.00%   80.01%           
========================================
  Files         1700     1700           
  Lines       188248   188296   +48     
  Branches        73       73           
========================================
+ Hits        150613   150669   +56     
+ Misses       37635    37627    -8

rod-chapman · 2026-01-27T13:30:41Z

With the latest commit: result for mlkem-native proofs on r7g, 64 cores, with MLKEM_K=4

CBMC 6.8.0 - 151 proofs OK in 6m20s real, 48m2s user
This branch - 151 proofs OK in 7m2s real, 50m35s user

rod-chapman · 2026-01-27T13:53:15Z

With the latest commit: result for mldsa-native (main branch) proofs on r7g, 64 cores, with MLDSA_CONFIG_PARAMETER_SET=87

CBMC 6.8.0 - 174 proofs OK in 3m56s real, 52m24s user
This branch - 174 proofs OK in 4m55s real, 55m54s user

The most obvious drop in performance is in the sign_verify_internal() function, where proof time slows from 120s to 260s with the new branch.

tautschnig · 2026-01-27T20:23:39Z

I need to fix up some regression tests that expect a particular instruction numbering.

rod-chapman · 2026-01-28T13:22:42Z

mlkem-native,K=4,r7g,AArch64/Ubuntu,64 cores

CBMC 6.8.0, 151 proofs in 6m29s real, 49m40s user. Longest are indcpa_keypair_derand@348s, indcpa_enc@250s
CBMC commit c596, 151 proofs in 17m3s real, 65m45s user. Longest are indcpa_enc@986s, indcpa_keypair_derand@297s

Odd...

indcpa_enc has slowed from 250s to 986s
indcpa_keypair_derand got faster from 348s to 297s

kroening · 2026-02-05T21:41:49Z

src/util/simplify_expr.cpp

    }
  }

+  // Push a pointer typecast into pointer arithmetic


Is that even related?

tautschnig · 2026-02-05

See the commit message: it is related, sort of, in that it avoids producing different-looking formulae that are still semantically equivalent. But it's indeed not a problem caused by iteration over hash maps. I can move this to a different PR if preferred.

kroening · 2026-02-13

Yes, please!

We otherwise set the prefix in `goto_convert_functionst::convert_function`, which is not used when directly converting individual instructions. This makes sure we have deterministic names irrespective of the order in which we iterate over functions.

Changes in function-identifier hashes should not result in varying location numbers. This requires updating several test descriptions that rely on specific location numbers. Also, some location numbers can vary across platforms when, e.g., system headers drag in additional functions, or 32/64-bit differences in constraining argc. To avoid such problems, several tests were adjusted to either not use unnecessary system headers or not use argc/argv.

This permits simplifying, e.g., `(char*)(ptr + 1) + -1` to `(char*)ptr` when `ptr` is `unsigned char*`. This, in turn, avoids different formulae being produced on AArch64 Linux (where `char` is unsigned) and Apple Silicon/macOS (where `char` is signed, as is the case on x86).

rod-chapman · 2026-02-08T16:07:35Z

With the latest commit: result for mlkem-native proofs on r7g, 64 cores, with MLKEM_K=4

Last test run - 151 proofs OK in 7m2s real, 50m35s user
This branch @ 37f2... - 151 proofs OK in 16m55s real, 63m29s user

The tall poppy is indcpa_enc() at 982s. The next longest is 300s, so I will look at that one to see if it can be brought back under control.

rod-chapman · 2026-02-09T07:35:25Z

With the latest commit: result for mldsa-native (main branch) proofs on r7g, 64 cores, with MLDSA_CONFIG_PARAMETER_SET=87

Last test run - 174 proofs OK in 4m55s real, 55m54s user
Latest commit on this branch - 174 proofs OK in 4m31s real, 54m44s user

So a bit faster.

rod-chapman · 2026-02-09T12:08:47Z

I can confirm that using z3 tactic.default_tactic=smt reduces the proof time of indcpa_enc() to 300s with MLKEM_K=4 on r7g/Ubuntu, which I hope is acceptable.

tautschnig requested review from kroening and martin-cs as code owners January 23, 2026 12:57

Copilot AI review requested due to automatic review settings January 23, 2026 12:57

tautschnig requested review from TGWDB and peterschrammel as code owners January 23, 2026 12:57

Copilot started reviewing on behalf of tautschnig January 23, 2026 12:58

Copilot AI reviewed Jan 23, 2026

tautschnig self-assigned this Jan 23, 2026

tautschnig force-pushed the smt2-output-stability branch 2 times, most recently from 64ab6c2 to a627bf8 Compare January 27, 2026 10:38

tautschnig requested review from feliperodri and remi-delmas-3000 as code owners January 27, 2026 18:57

tautschnig force-pushed the smt2-output-stability branch from 36ba44b to 8436275 Compare January 27, 2026 19:47

kroening reviewed Feb 5, 2026

tautschnig added 4 commits February 6, 2026 09:22

Make goal-to-constraint conversion fully deterministic

48d263c

Avoid hash value changes resulting in different sequences of constraints being generated, which can affect SMT solver performance.

Add determinism comments to SMT formula printing

600f4cb

Explain why iteration is actually deterministic.

DFCC instrumentation: deterministic initializer order

c35d36d

Iterate over symbols in lexicographic order to ensure that we consistently produce the same instrumented program for a given input.

tautschnig force-pushed the smt2-output-stability branch 2 times, most recently from b51252d to ec20240 Compare February 7, 2026 19:01

tautschnig added 3 commits February 7, 2026 19:25

Make property numbering and show-properties fully deterministic

5072fbe

Use lexicographic ordering when iterating over goto functions to prevent property numbers (or the order in which they are printed) from depending on hash values.

tautschnig force-pushed the smt2-output-stability branch from ec20240 to 37f2da4 Compare February 7, 2026 19:25

mkannwischer added a commit to pq-code-package/mlkem-native that referenced this pull request Feb 12, 2026

Nix: Update CBCM to experimental branch

58ce882

See diffblue/cbmc#8830 Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>

mkannwischer mentioned this pull request Feb 12, 2026

Nix: Update CBMC to experimental branch pq-code-package/mlkem-native#1562

Draft

mkannwischer added a commit to pq-code-package/mldsa-native that referenced this pull request Feb 12, 2026

Nix: Update CBCM to experimental branch

d6546df

See diffblue/cbmc#8830 Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>

mkannwischer mentioned this pull request Feb 12, 2026

Nix: Update CBCM to experimental branch pq-code-package/mldsa-native#964

Draft

mkannwischer added a commit to pq-code-package/mldsa-native that referenced this pull request Feb 12, 2026

Nix: Update CBCM to experimental branch

f85f14f

See diffblue/cbmc#8830 Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>

mkannwischer added a commit to pq-code-package/mldsa-native that referenced this pull request Feb 12, 2026

Nix: Update CBCM to experimental branch

4183c1f

See diffblue/cbmc#8830 Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>

mkannwischer added a commit to pq-code-package/mldsa-native that referenced this pull request Feb 12, 2026

Nix: Update CBCM to experimental branch

7085048

See diffblue/cbmc#8830 Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>

rod-chapman pushed a commit to pq-code-package/mlkem-native that referenced this pull request Feb 13, 2026

Nix: Update CBCM to experimental branch

e731fa5

See diffblue/cbmc#8830 Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>

rod-chapman pushed a commit to pq-code-package/mldsa-native that referenced this pull request Feb 13, 2026

Nix: Update CBCM to experimental branch

f12b973

See diffblue/cbmc#8830 Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>

kroening approved these changes Feb 13, 2026
