Skip to content

Conversation

@kaimast
Copy link
Collaborator

@kaimast kaimast commented Dec 4, 2025

Addition of CheckBlockError

This PR enables detecting race conditions when verifying a block. In particular, it returns a machine-readable CheckBlockError that contains an error type for the case when a block already exists in the ledger.
On the snarkOS side (see ProvableHQ/snarkOS#4039), block sync will remove the affected pending blocks and retry when it encounters this error. This allows us to cut down on errors like Unable to advance to the next block - Failed to speculate on transactions - Failed to post-ratify - Next round 2532 must be greater than current round 2532.

One alternative to this approach would be to return anyhow::Result<Option<PendingBlock>>. I do not like this approach, as it is not really clear what Ok(None) means without reading the function's doc comment.
Another alternative would be to match the error message against a string, but that can easily break in the future and is generally considered bad practice.

Finally, it would be possible to use locks on the snarkOS side instead. I try to avoid locking there whenever possible to reduce the potential for deadlocks and to maintain good performance. The linked snarkOS PR still uses locks, but only when it is actually advancing the ledger.

The change is only made to Ledger::check_block_subdag, to ensure the existing Ledger::check_next_block still works as before. Maybe at some point we could consider cleaning up some of the block checking and advancement API, but now is not that time.

Atomic block checks

Additionally, the PR changesLedger::check_next_block and Ledger::check_block_subdag so that it is guaranteed no concurrent block advancement happens.

The following is a new error message that appeared after the above changes, and is addressed by this:

ERROR snarkos_node_bft::sync: Block synchronization failed — Failed to check contents of pending block ab1y3l48teq38zm9kftznt60vpgl4ydq09f6qsx6mr269rw5qef7uqsjzd7l5 at height 342 — Failed to speculate over unconfirmed transactions — Invalid transaction found in the transactions list: Transaction 'at1zuc4zprjff66q8ftc8jtt6e9xa7ltym5ttg348jz37j6fnel3yyseqrhsc' already exists in the ledger

check_next_block and check_block_subdag now acquire a read-lock on current_block, which prevents the ledger advancing while the check is performed. In other words, they are no atomic operations.
On the snarkOS side, we still need to account for concurrent updates between the block check and the block advancement, but this atomicity guarantee makes it a lot easier and avoids messy errors, such as the one above.

@kaimast kaimast marked this pull request as ready for review December 5, 2025 18:36
@kaimast kaimast marked this pull request as draft December 5, 2025 21:54
@kaimast kaimast force-pushed the feat/check-block-error branch from 97f7b1c to 77801ef Compare December 6, 2025 22:41
@kaimast kaimast marked this pull request as ready for review December 7, 2025 23:19
@kaimast kaimast changed the title [Draft] Add CheckBlockError [Feature] Add CheckBlockError Dec 7, 2025
@vicsn vicsn merged commit cf10c9d into staging Dec 8, 2025
7 checks passed
@vicsn vicsn deleted the feat/check-block-error branch December 8, 2025 10:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants