Skip to content

Conversation

@kaimast
Copy link
Collaborator

@kaimast kaimast commented Jan 5, 2026

This will avoid certain race conditions when preparing a block template.

Concretely, in the current design the following is possible:

  • There is a thread t that calls Ledger::prepare_next_beacon_block which reads the value of current_block
  • There is another thread t' that calls Ledger::advance_to_next_block, which updates current_block` and the state root.
  • If t now continues execution and reads the state root, it will return a block and not an error, but the block will be invalid, i.e. the state root is incorrect.

This is problematic for two reasons:

  • Ledger implements Sync, which implies concurrent calls to its functions should be executed atomically and not read partial/inconsistent data.
  • Such invalid blocks are harder to catch on the snarkOS side, as they appear to have the correct height and inconsistencies in the state root will be detected only later when the block is executed.

While this PR will ensure calls to the Ledger are atomic (formally, the Ledger being linearizable), it does not ensure that multiple calls to these functions do not interleave in unintended ways (formally, serializable). The latter is implemented in snarkOS and out of the scope of snarkVM, while the former makes the implementation for in snarkOS a lot easier.

P.S.: There is this great blog post by Peter Bailis describing the two formal guarantees mentioned in the previous paragraph, in case you want to learn more about the formalism behind it.

Other minor changes

The PR also reduces log verbosity for sequential operations, as they add a decent amount of noise otherwise.

@kaimast kaimast force-pushed the fix/atomic_prepare_next_block branch from f2c2142 to bbb39e1 Compare January 5, 2026 22:42
@kaimast kaimast force-pushed the fix/atomic_prepare_next_block branch from bbb39e1 to 2a42f7e Compare January 6, 2026 00:41
@kaimast kaimast marked this pull request as ready for review January 6, 2026 00:42
@kaimast kaimast requested review from ljedrz and vicsn January 6, 2026 03:55
@vicsn
Copy link
Collaborator

vicsn commented Jan 6, 2026

Can you explain in the PR README with which test setup you saw what issue come up and how you validated it is resolved?

Can you try to run a network while calling GET /<network>/block/latest on repeat?

All existing calls to latest_block() in snarkVM and snarkOS seem to happen outside of the block generation path, so may incur some delay but no obvious deadlock.

@kaimast kaimast force-pushed the fix/atomic_prepare_next_block branch from 2a42f7e to 93e9807 Compare January 6, 2026 19:27
Copy link
Collaborator

@ljedrz ljedrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me how this fixes things, as prepare_advance_to_next_quorum_block appears to only reuse the block obtained at the beginning.

If I'm not mistaken, the specific intent is to make it impossible for concurrent calls to try_advance_to_next_block to have a different previous_block; however, I'd argue that those calls should not happen concurrently at all, which this change doesn't prevent, as multiple concurrent readers are permitted.

@kaimast
Copy link
Collaborator Author

kaimast commented Jan 9, 2026

If I'm not mistaken, the specific intent is to make it impossible for concurrent calls to try_advance_to_next_block to have a different previous_block; however, I'd argue that those calls should not happen concurrently at all, which this change doesn't prevent, as multiple concurrent readers are permitted.

That is actually not the goal. I simply do not want any concurrent updates or reads to the ledger while the function is executing. I added some more comments to the code and will update the PR description in a second.
At a high-level my reasoning is that Ledger is Sync, which implies that it can be called to from concurrent threads, without ending up in a corrupted state, but it is possible to create invalid block templates if there are certain concurrent invocations in the current design.

Copy link
Collaborator

@ljedrz ljedrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I initially missed the broader context.

LGTM; a read guard should suffice to provide the desired guarantees, while not being too restrictive for any concurrent work. In the future we might want to reorganize the related logic so that mistakes are less likely to happen due to the structure itself, but it's a step in the right direction, and it's well-documented.

@vicsn
Copy link
Collaborator

vicsn commented Jan 21, 2026

Let's first finish ProvableHQ/snarkOS#4069 before merging

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a deadlock condition here? latest_timestamp also tries to get the read lock. It should be handled the same way latest_epoch_hash is fetched.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

latest_timestamp() is also used in Ledger::num_remaining_solutions, which may have a conflicting .read() as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants