-
Notifications
You must be signed in to change notification settings - Fork 2.7k
[Fix] Pause BFT block advancement during sync #4039
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
kaimast
wants to merge
17
commits into
staging
Choose a base branch
from
fix/sync-race-conditions
base: staging
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
e8a483a to
c32278a
Compare
648a2f7 to
fe03df4
Compare
fe03df4 to
bd73f5f
Compare
b6b59ba to
31fbc06
Compare
c4f62ba to
1e37a31
Compare
Collaborator
|
Pausing this for now, the latest stress-test run |
435cce4 to
da65f53
Compare
Contributor
Author
|
ProvableHQ/snarkVM#3070 needs to be merged first. |
da65f53 to
1c6e520
Compare
1c6e520 to
40d5a10
Compare
40d5a10 to
39ad9cd
Compare
Collaborator
39ad9cd to
0ec6c22
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR aims to fix #3911 and #3636.
Recently, we added
CheckBlockErroras a machine-readable response to failures in block checks. The proposed changes use this new error type to differentiate between invalid blocks, and cases where the ledger has already advanced, and the proposed block is contained in the ledger already.The snarkVM changes also ensure that calls to the ledger are atomic, but provides no transactional guarantees across multiple invocations of
Ledgerfunctions.BFThas alockfield that aims to protect, block advancement. This lock already existed before this change, but was not accessible from outside the struct. A problem with this setup is thatcheck_block_contentandadvance_to_next_blockare dependent operations. Generally, one assumes that if the former succeeds, the latter will also. However, without additional locking,BFTcan calladvance_to_blockbetweenSync's calls ofcheck_block_contentandadvance_to_next_block. In this case,Syncsecond call will fail.The implementation implements an approached based on optimistic concurrency control to address this. Sync builds a chain of
PendingBlocks without grabbing theBFTlock (otherwise, Sync would hold the lock for long periods of time).Once a chain of pending blocks has reached the availability threshold, it will grab the lock, check their contents for correctness, and advance the ledger. In this step,
check_block_contentwill verify the height of the block again and, if the height is below the current ledger height,Syncwill abort and retry.The main change for
BFTis thatupdate_dagdoes not grab thelockanymore, but callers are required to grab the lock before callingupdate_dag. There is a new methodBFT::pause_for_block_syncthatSynccalls before trying to commit blocks.I removed the
ALLOW_LEDGER_ACCESSparameter fromupdate_dagto remove complexity from this function. This flag was only set tofalsefor some tests, and does not seem to be needed any more even in this case, as theLedgerServiceabstraction allows to simply ignore writes. I would like to eventually have separateupdate_dagfunctions for sync and BFT, but this PR already reduces the complexity of the function noticeably.