*: use two releases to deploy virtual nodes #64

Merged
merged 2 commits into from
Jan 31, 2024

fuweid
Collaborator

@fuweid fuweid commented Jan 24, 2024

Some cloud providers delete unknown or not-ready nodes. If we render both the nodes and the controllers in one helm release, helm won't wait for the controllers to be ready before creating the nodes, so the cloud provider deletes them.

$ kperf vc nodepool add xxx --nodes=200

...

6s          Normal   RegisteredNode   node/xxx-26               Node xxx-26 event: Registered Node xxx-26 in Controller
0s          Normal   DeletingNode     node/xxx-110              Deleting node xxx-110 because it does not exist in the cloud provider
0s          Normal   DeletingNode     node/xxx-139              Deleting node xxx-139 because it does not exist in the cloud provider
0s          Normal   DeletingNode     node/xxx-175              Deleting node xxx-175 because it does not exist in the cloud provider
0s          Normal   DeletingNode     node/xxx-162              Deleting node xxx-162 because it does not exist in the cloud provider

A helm post-install or post-upgrade hook can ensure that nodes aren't deployed until the controllers are ready. However, resources created by a helm hook aren't part of the helm release, so we would need an extra step to clean up the node resources when we delete the nodepool's helm release.

For this reason, we split the single helm release into two: one for the controllers and the other for the nodes.
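The ordering this relies on can be sketched roughly as below. This is only an illustration and not the code in this PR: the release-name suffixes (`-controller`, `-nodes`), the chart paths, and the `replicas` value key are hypothetical. The only helm features assumed are `helm install NAME CHART --wait`, `--set`, and `helm uninstall NAME`. The sketch builds the command lists without running them.

```python
from typing import List


def nodepool_install_plan(pool: str, nodes: int) -> List[List[str]]:
    """Return the helm commands, in order, for a two-release node pool.

    Release names and chart paths are hypothetical placeholders.
    """
    controller_release = f"{pool}-controller"
    nodes_release = f"{pool}-nodes"
    return [
        # Release 1: controllers. --wait blocks until the release's
        # resources are ready, so nodes are not created too early.
        ["helm", "install", controller_release, "./charts/controller", "--wait"],
        # Release 2: nodes. They stay ordinary release resources (not hook
        # resources), so deleting this release also removes them.
        ["helm", "install", nodes_release, "./charts/nodes",
         "--set", f"replicas={nodes}"],
    ]


def nodepool_uninstall_plan(pool: str) -> List[List[str]]:
    """Return the helm commands to remove both releases.

    Removing nodes before controllers is an assumption of this sketch.
    """
    return [
        ["helm", "uninstall", f"{pool}-nodes"],
        ["helm", "uninstall", f"{pool}-controller"],
    ]


if __name__ == "__main__":
    for cmd in nodepool_install_plan("xxx", 200):
        print(" ".join(cmd))
```

Because the nodes live in their own release, no separate cleanup step is needed for them, which is the drawback of the hook-based approach described above.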

@fuweid fuweid force-pushed the weifu/render-node-after-kwok branch 2 times, most recently from cd5c30e to 676e1ba Compare January 31, 2024 07:18
@fuweid fuweid marked this pull request as ready for review January 31, 2024 07:36
Some cloud providers delete unknown or not-ready nodes. If we render both the
nodes and the controllers in one helm release, helm won't wait for the
controllers to be ready before creating the nodes, so the cloud provider
deletes them.

```bash
$ kperf vc nodepool add xxx --nodes=200

...

6s          Normal   RegisteredNode   node/xxx-26               Node xxx-26 event: Registered Node xxx-26 in Controller
0s          Normal   DeletingNode     node/xxx-110              Deleting node xxx-110 because it does not exist in the cloud provider
0s          Normal   DeletingNode     node/xxx-139              Deleting node xxx-139 because it does not exist in the cloud provider
0s          Normal   DeletingNode     node/xxx-175              Deleting node xxx-175 because it does not exist in the cloud provider
0s          Normal   DeletingNode     node/xxx-162              Deleting node xxx-162 because it does not exist in the cloud provider
```

A helm post-install or post-upgrade hook can ensure that nodes aren't deployed
until the controllers are ready. However, resources created by a helm hook
aren't part of the helm release, so we would need an extra step to clean up the
node resources when we delete the nodepool's helm release.

For this reason, we split the single helm release into two: one for the
controllers and the other for the nodes.

Signed-off-by: Wei Fu <[email protected]>
@fuweid fuweid force-pushed the weifu/render-node-after-kwok branch from 4637ce7 to 3720317 Compare January 31, 2024 07:44
@fuweid fuweid merged commit fb2e27c into Azure:main Jan 31, 2024
4 checks passed
@fuweid fuweid deleted the weifu/render-node-after-kwok branch January 31, 2024 07:47