Initial implementation for the Reconcile Logic #17

Open
wants to merge 5 commits into
base: main

gdasson

@gdasson gdasson commented Jan 4, 2025

This PR is an initial implementation of the design mentioned here

The code is now in a working state and able to create an etcd cluster. However, the code will continue to be refined and made more production-ready as we review and incorporate feedback.

Signed-off-by: Gaurav Dasson <[email protected]>
@ahrtr
Member

ahrtr commented Jan 4, 2025

@gdasson Thanks for the PR.

Could you please resolve the comments and make sure the PR is in the best shape you can get it before marking it as ready for review? Please also remove "draft" from the title. I will do a second round of review once you mark it as ready for review.

@ahrtr ahrtr marked this pull request as draft January 4, 2025 14:50
@gdasson gdasson changed the title Initial Draft for the Reconcile Logic Initial implementation for the Reconcile Logic Jan 5, 2025
@gdasson gdasson marked this pull request as ready for review January 6, 2025 03:27
@ahrtr
Member

ahrtr commented Jan 6, 2025

@jmhbnz @ivanvc Do you know why workflow checks are not triggered in this PR? cc @hakman

Also, it'd be great if we could add more e2e test cases as mentioned in #19. Of course, in follow-up PRs.

@ahrtr ahrtr requested review from hakman and justinsb January 6, 2025 12:46
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gdasson
Once this PR has been reviewed and has the lgtm label, please assign jmhbnz for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


// Check if the size of the stateful set is less than expected size
// Or if there is a pending learner to be promoted
if *sts.Spec.Replicas < int32(etcdCluster.Spec.Size) && memberCnt > 0 {
Member

@ahrtr ahrtr Jan 7, 2025


Suggested change
- if *sts.Spec.Replicas < int32(etcdCluster.Spec.Size) && memberCnt > 0 {
+ if *sts.Spec.Replicas < int32(etcdCluster.Spec.Size) {

We should guarantee the following two conditions before performing the scale in & out:

Author


@ahrtr: I am a little confused by this one. The reference code you have given uses the inverse condition, i.e. replica != memberCnt, but in the comment you are suggesting that it should be true? I am also weighing the pros and cons of validating sts.Spec.Replicas == memberCnt before scaling the StatefulSet in or out. What additional assurance would this check provide that the reconcile logic does not already guarantee?

Member


Normally, sts.Spec.Replicas == memberCnt (as returned from the etcd cluster) should always be true after etcd-operator finishes each round of Reconcile.

If it isn't true (not equal), something went wrong in the previous Reconcile round. Possible reasons:

  1. etcd-operator crashes right after adding a new member (calling MemberAdd) and before reconciling the StatefulSet (updating the replicas). In this case, sts.Spec.Replicas < memberCnt is true.
  2. etcd-operator crashes right after removing a member (calling MemberRemove) and before reconciling the StatefulSet (updating the replicas). In this case, sts.Spec.Replicas > memberCnt is true.

We should always fix any problems left over from previous reconcile rounds before doing anything new in the current round. So once we reach the current round's scale in & out steps, we can assume that sts.Spec.Replicas == memberCnt (as returned from the etcd cluster) is true.

You can add a TODO item for now and do it in a follow-up PR, similar to what I did in https://github.com/ahrtr/etcd-operator/blob/f3b16f5e0b8e2751a1ddc35e0c6a8ccaf4afe568/controller.go#L302-L315.

@ahrtr
Member

ahrtr commented Jan 7, 2025

@hakman @justinsb @jmhbnz @ArkaSaha30 Please take a look at this PR. It will be the base for the following PRs. Thanks.

cc @ivanvc

Also I am most concerned about the e2e test as mentioned in #17 (comment)

@jmhbnz
Member

jmhbnz commented Jan 7, 2025

@jmhbnz @ivanvc Do you know why workflow checks are not triggered in this PR? cc @hakman

Probably the GitHub Actions bug we see from time to time; we may need to close and re-open this PR for it to trigger. I will migrate the existing workflows to prow longer term.

/ok-to-test
/retest

@ahrtr
Member

ahrtr commented Jan 8, 2025

@gdasson please rebase this PR, thx

}

if etcdCluster.Spec.Size == 0 {
logger.Info("EtcdCluster size is 0..Skipping next steps")
Member


nit: a warning message might be a little better. It is unusual to create an EtcdCluster with 0 members.

@ahrtr
Member

ahrtr commented Jan 10, 2025

@gdasson please rebase this PR and let's merge this PR. Also squash the commits (I can also do it for you if you have any difficulties). I will create more following tasks.
