[Feature] Add CheckBlockError
#3050
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Addition of
CheckBlockErrorThis PR enables detecting race conditions when verifying a block. In particular, it returns a machine-readable
CheckBlockErrorthat contains an error type for the case when a block already exists in the ledger.On the snarkOS side (see ProvableHQ/snarkOS#4039), block sync will remove the affected pending blocks and retry when it encounters this error. This allows us to cut down on errors like
Unable to advance to the next block - Failed to speculate on transactions - Failed to post-ratify - Next round 2532 must be greater than current round 2532.One alternative to this approach would be to return
anyhow::Result<Option<PendingBlock>>. I do not like this approach, as it is not really clear whatOk(None)means without reading the function's doc comment.Another alternative would be to match the error message against a string, but that can easily break in the future and is generally considered bad practice.
Finally, it would be possible to use locks on the snarkOS side instead. I try to avoid locking there whenever possible to reduce the potential for deadlocks and to maintain good performance. The linked snarkOS PR still uses locks, but only when it is actually advancing the ledger.
The change is only made to
Ledger::check_block_subdag, to ensure the existingLedger::check_next_blockstill works as before. Maybe at some point we could consider cleaning up some of the block checking and advancement API, but now is not that time.Atomic block checks
Additionally, the PR changes
Ledger::check_next_blockandLedger::check_block_subdagso that it is guaranteed no concurrent block advancement happens.The following is a new error message that appeared after the above changes, and is addressed by this:
check_next_blockandcheck_block_subdagnow acquire a read-lock oncurrent_block, which prevents the ledger advancing while the check is performed. In other words, they are no atomic operations.On the snarkOS side, we still need to account for concurrent updates between the block check and the block advancement, but this atomicity guarantee makes it a lot easier and avoids messy errors, such as the one above.