[Consensus] test to confirm correct threshold clock advancement after GC #20492
consensus/core/src/core.rs
Outdated
/// commits can trigger an advancement in gc round. Suspended blocks that have dependency in their causal history to any gc'ed blocks, will get unsuspended
/// and accepted.
#[instrument(level = "debug", skip_all)]
fn try_commit(&mut self) -> ConsensusResult<(Vec<CommittedSubDag>, Vec<VerifiedBlock>)> {
The alternative I considered was having `try_commit` call `add_accepted_blocks` itself, for stronger guarantees if we refactor the code later. I chose to return the accepted blocks and handle them independently, as I didn't want `try_commit` to do too much on its own.
I think this issue is a consequence of the threshold clock implementation, which is essentially a separate data structure that needs to be kept in sync with DagState. Keeping track of the flow of accepted blocks seems a bit messy. What about revisiting a previous discussion / PR, where during proposing, the quorum round is read from DagState?
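To make the suggestion above concrete, here is a minimal sketch of deriving the quorum round from the accepted blocks themselves rather than from a separately maintained clock. `ToyDagState`, `accept` and `quorum_round` are hypothetical names for illustration only; they are not the actual `DagState` API in consensus/core, and stake is simplified to one vote per authority.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified stand-in for a DAG state: it only records which
/// authorities have an accepted block at each round.
pub struct ToyDagState {
    /// Number of distinct authorities needed for a quorum (assumption: equal stake).
    quorum: usize,
    /// round -> authorities with an accepted block at that round.
    accepted: BTreeMap<u32, BTreeSet<u32>>,
}

impl ToyDagState {
    pub fn new(quorum: usize) -> Self {
        Self { quorum, accepted: BTreeMap::new() }
    }

    /// Record an accepted block. It does not matter which code path accepted it.
    pub fn accept(&mut self, round: u32, authority: u32) {
        self.accepted.entry(round).or_default().insert(authority);
    }

    /// Highest round that has a quorum of accepted blocks, if any.
    pub fn quorum_round(&self) -> Option<u32> {
        self.accepted
            .iter()
            .rev()
            .find(|(_, authorities)| authorities.len() >= self.quorum)
            .map(|(round, _)| *round)
    }
}
```

Because the value is computed from the accepted set on demand, blocks accepted in any order and through any path are reflected in it, so there is no second structure to keep in sync.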
c3c0502 to a9b2f85
a9b2f85 to 1424e0c
d4024c5 to 545b664
@mwtian @arun-koshy could you please re-review? I will try to refactor this in a separate PR and find a better fit/solution, probably using DagState, but I would like to merge this so I can at least enable GC for simtests/devnet, start getting feedback earlier, and then refine further. Let me know if that's ok.
…old clock under GC conditions.
545b664 to 64b2bf6
Description
This is a fix for a behaviour observed during the private-testnet test, where some nodes started crashing when setting `gc_depth = 5`. The goal was to test GC with very low gc values to reveal possible issues, such as this one.

This PR fixes the block acceptance path after commits happen. As we try to unsuspend blocks after commits have happened, which naturally moves the `gc_round`, we should make sure that those blocks are processed via Core as well and move the threshold clock. Otherwise the scenario below becomes possible, and it is mostly visible during bulk processing of blocks.

This PR now includes only the test that confirms the issue, as described below, when we bulk process blocks under GC conditions. #20906 has moved the threshold clock into DagState, eliminating the issue:

- The threshold clock is at round `R` according to the blocks accepted so far via the block manager. Some blocks, though, are not accepted yet as they have some early dependencies higher than the current `gc_round`, so they get suspended.
- `gc_round` has advanced so much that our threshold clock is well behind. Now our node tries to create a new block, but the threshold clock is at round `R` although `gc_round` has advanced to `R + X`. Our own block then does not get accepted, as we produce a block for a quite old round. We see panics from sui/consensus/core/src/core.rs, line 515 in 5b9269e.
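The scenario can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyThresholdClock`, `clock_round_after_commit` and `proposal_is_stale` are hypothetical names, not the real types or functions in consensus/core, quorum is counted per authority rather than by stake, and the post-commit `gc_round` is supplied by hand instead of being computed by a committer.

```rust
use std::collections::BTreeSet;

/// Toy threshold clock: moves to `r + 1` once a quorum of distinct
/// authorities has been seen at the current round `r`.
pub struct ToyThresholdClock {
    round: u32,
    quorum: usize,
    /// Authorities seen so far at `round`.
    seen: BTreeSet<u32>,
}

impl ToyThresholdClock {
    pub fn new(quorum: usize) -> Self {
        Self { round: 0, quorum, seen: BTreeSet::new() }
    }

    pub fn add_block(&mut self, round: u32, authority: u32) {
        if round < self.round {
            return; // blocks from older rounds never move the clock
        }
        if round > self.round {
            // jump to the newer round and start counting from scratch
            self.round = round;
            self.seen.clear();
        }
        self.seen.insert(authority);
        if self.seen.len() >= self.quorum {
            self.round += 1;
            self.seen.clear();
        }
    }

    pub fn round(&self) -> u32 {
        self.round
    }
}

/// Four authorities, quorum of three. Rounds 1..=2 are accepted on the normal
/// path; rounds 3..=10 are accepted only when unsuspended after a commit.
/// `feed_unsuspended` controls whether those blocks also reach the clock.
pub fn clock_round_after_commit(feed_unsuspended: bool) -> u32 {
    let mut clock = ToyThresholdClock::new(3);
    for round in 1..=2 {
        for authority in 0..4 {
            clock.add_block(round, authority);
        }
    }
    if feed_unsuspended {
        for round in 3..=10 {
            for authority in 0..4 {
                clock.add_block(round, authority);
            }
        }
    }
    clock.round()
}

/// Assumption for this sketch: a proposal at or below `gc_round` is too old
/// to be accepted.
pub fn proposal_is_stale(clock_round: u32, gc_round: u32) -> bool {
    clock_round <= gc_round
}
```

With a hand-picked post-commit `gc_round` of 5, the clock that never saw the unsuspended blocks stays at round 3 and its proposal is stale, while the clock that is fed those blocks reaches round 11.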
Test plan
CI/PT
Release notes
Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.