Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raft: plumbing crdb version handle into raft #122165

Closed
wants to merge 2 commits into from

Conversation

lyang24
Copy link
Contributor

@lyang24 lyang24 commented Apr 10, 2024

We need the ability to safely ship backward incompatible changes in the raft module. This commit plumbs version handle in to raft package to enable version gate infrastructure in raft.

Epic: None

Release note: None

@lyang24 lyang24 requested a review from a team April 10, 2024 23:33
@lyang24 lyang24 requested a review from a team as a code owner April 10, 2024 23:33
Copy link

blathers-crl bot commented Apr 10, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added the O-community Originated from the community label Apr 10, 2024
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@lyang24 lyang24 requested a review from pav-kv April 10, 2024 23:33
pkg/kv/kvserver/store.go Outdated Show resolved Hide resolved
pkg/raft/raft.go Outdated Show resolved Hide resolved
@pav-kv
Copy link
Collaborator

pav-kv commented Apr 10, 2024

Probably need to run dev generate bazel to update the BUILD files.

@lyang24 lyang24 force-pushed the raft_crdb_version branch from 15996ba to b26577a Compare April 11, 2024 06:28
Copy link

blathers-crl bot commented Apr 11, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

Thank you for updating your pull request.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@pav-kv
Copy link
Collaborator

pav-kv commented Apr 11, 2024

LGTM, this will do. Let's keep this uncommitted for now, it's better to include this commit into a PR that would use the gate.

@lyang24 lyang24 marked this pull request as draft April 11, 2024 17:35
We need the ability to safely ship backward incompatible changes in
the raft module. This commit plumbs version handle in to raft package
to enable version gate infrastructure in raft. Related to issue

Epic: None

Release note: None
@lyang24 lyang24 force-pushed the raft_crdb_version branch from b26577a to ee25da1 Compare April 11, 2024 20:42
Copy link

blathers-crl bot commented Apr 12, 2024

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@lyang24 lyang24 force-pushed the raft_crdb_version branch from cdd52ec to bb863de Compare April 12, 2024 18:12
Copy link

blathers-crl bot commented Apr 12, 2024

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Copy link
Contributor Author

@lyang24 lyang24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @pav-kv)


pkg/raft/raft_test.go line 1149 at r3 (raw file):

		// Increase the commit index only if the log is in sync with the leader.
		{pb.Message{From: 2, To: 1, Type: pb.MsgHeartbeat, Term: 2, Commit: commit + 1}, false, commit},

I copied code from this commit etcd-io/raft@eb6bfc6
This test case will fail I see both leader's commit index and follower's last index is at 3 and we are expecting 2
cc @pav-kv

Copy link
Collaborator

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lyang24)


pkg/raft/raft_test.go line 1149 at r3 (raw file):

Previously, lyang24 (Lanqing Yang) wrote…

I copied code from this commit etcd-io/raft@eb6bfc6
This test case will fail I see both leader's commit index and follower's last index is at 3 and we are expecting 2
cc @pav-kv

This test was buggy initially, let's clean it up first (see the other comment).


pkg/raft/raft_test.go line 1158 at r3 (raw file):

		require.NoError(t, storage.Append(index(1).terms(1, 2, 3)))
		sm := newTestRaft(1, 5, 1, storage)
		sm.becomeFollower(2, 2)

This test was incorrect. The sm.becomeFollower(2, 2) call makes the node a follower at term 2 with the leader node ID = 2. But the log in the Storage, initialized 2 lines above, ends with an entry at term 3. This can't be the case: the invariant is that Term >= log.Entry(lastIndex).Term.

We should probably:

  • Call storage.SetHardState() with the HardState at term 3, and commit index at commit.
  • Then the newTestRaft, while creating the raft node, will read the hard state from storage and init it correctly.
  • Then we can remove the commitTo invocation in the next line.
  • And we probably can remove the logSynced initialization as well, because it should be initialized when the raft node is created in newTestRaft.

This change makes the commit index advancement in handleHeartbeat safe.
Previously, a follower would attempt to update the commit index to
whichever was sent in the MsgHeartbeat message. Out-of-bound indices
would crash the node.

It is always safe to advance a commit index if the follower's log is in

Epic: None

Relase note: None
@lyang24 lyang24 force-pushed the raft_crdb_version branch from bb863de to 24cdedd Compare April 17, 2024 08:29
Copy link

blathers-crl bot commented Apr 17, 2024

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Copy link
Collaborator

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lyang24)


pkg/raft/raft.go line 358 at r5 (raw file):

	// candidate. For a follower, becomes true the first time a MsgApp append to
	// the log succeeds.
	logSynced bool

Let's replace this field with accTerm uint64 (the latest "accepted term").

The code for tracking this field will be simpler because accTerm will be update only in 2 places: when accepting a log append, or when installing a snapshot (which wipes the log, and is equivalent to simultaneously accepting and committing the state represented by this snapshot).

In contrast, the code for updating logSynced is a bit less explicit, and there are more conditions under which it has to be updated (for example, we need to update it also when r.Term is bumped).

Having the accTerm field will be aligned with #122446, and eventually this field will become persistent.

Copy link
Collaborator

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @lyang24)


pkg/raft/raft.go line 358 at r5 (raw file):

Previously, pav-kv (Pavel Kalinnikov) wrote…

Let's replace this field with accTerm uint64 (the latest "accepted term").

The code for tracking this field will be simpler because accTerm will be update only in 2 places: when accepting a log append, or when installing a snapshot (which wipes the log, and is equivalent to simultaneously accepting and committing the state represented by this snapshot).

In contrast, the code for updating logSynced is a bit less explicit, and there are more conditions under which it has to be updated (for example, we need to update it also when r.Term is bumped).

Having the accTerm field will be aligned with #122446, and eventually this field will become persistent.

You could draw some inspiration from the earlier version of the upstream PR: etcd-io/raft@87ac09e.

@lyang24 lyang24 closed this Jun 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-community Originated from the community
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants