This repository has been archived by the owner on Feb 21, 2024. It is now read-only.

Possible to first wait for pods to gracefully terminate then terminate the node #39

Open
alnhk opened this issue Feb 25, 2021 · 3 comments

Comments

alnhk commented Feb 25, 2021

Hello,

We raised a case with the AWS support team. The problem statement is below:

We have an AWS EKS production environment with ~500 EKS worker nodes (1.15). Most of the nodes are more than 80 days old, and on these older nodes we have noticed degraded performance for pod deployments. So we want to do an instance refresh on the EKS nodes that first cordons the node, waits for the pods to terminate gracefully, and then terminates the node.

In response, the AWS support team referred us to this GitHub repository, "amazon-k8s-node-drainer". We are running a POC with "amazon-k8s-node-drainer" on our "DEV" EKS environment, and we observe that it behaves the same way as "instance refresh", without following the standard process:

  1. cordon the node
  2. wait for all deployed pods to terminate
  3. destroy the node
  4. according to ASG, new node is added
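The ordering in the list above can be sketched as follows. This is only an illustration of the sequence, not code from amazon-k8s-node-drainer: the `Cluster` interface and every method on it are hypothetical stand-ins for whatever Kubernetes and EC2 clients are actually used, and the explicit eviction call is an assumption (pods on a cordoned node keep running until something evicts them).

```python
import time
from typing import Callable, Protocol


class Cluster(Protocol):
    """Hypothetical interface; not part of any real library."""

    def cordon(self, node: str) -> None: ...
    def evict_pods(self, node: str) -> None: ...
    def pods_remaining(self, node: str) -> int: ...
    def terminate(self, node: str) -> None: ...


def refresh_node(
    cluster: Cluster,
    node: str,
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Cordon, evict, wait for pods to exit, then terminate the node.

    Returns False (leaving the node cordoned but running) if pods are
    still present when the timeout expires.
    """
    cluster.cordon(node)        # step 1: stop new pods being scheduled
    cluster.evict_pods(node)    # assumption: eviction starts graceful shutdown
    deadline = clock() + timeout_s
    while cluster.pods_remaining(node) > 0:   # step 2: wait for pods to go
        if clock() >= deadline:
            return False
        sleep(poll_s)
    cluster.terminate(node)     # step 3: the ASG then adds a replacement (step 4)
    return True
```

Step 4 needs no code here because the Auto Scaling group replaces the terminated instance on its own.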

So we wanted to check whether there is a way to do it this way, in particular to follow the "standard process" before terminating the EKS nodes.

Thanks
HK

Contributor

svozza commented Feb 25, 2021

I'm not quite sure I follow what you're saying. Is it that your POC doesn't do any of the steps you've mentioned? What is instance refresh? Are you referring to this? https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-max-instance-lifetime.html

Author

alnhk commented Feb 25, 2021

> I'm not quite sure I follow what you're saying. Is it that your POC doesn't do any of the steps you've mentioned? What is instance refresh? Are you referring to this? https://docs.aws.amazon.com/autoscaling/ec2/userguide/asg-max-instance-lifetime.html

We already looked into that. The AWS support team told us that "ASG MAX INSTANCE LIFETIME" is only for EC2 and is not intended for Kubernetes worker nodes. We are looking for a similar node drainer for EKS: cordon the node if it is older than 15 days -> wait until all deployed pods are terminated -> terminate the node -> the ASG for the EKS nodes adds a new node.
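As a rough illustration of the "older than 15 days" condition, the sketch below picks out stale nodes from a mapping of node name to launch time. The function name and the input mapping are hypothetical; in practice the timestamps would come from the EC2 or Kubernetes API. The last line only shows the same 15-day limit expressed in seconds, the unit the ASG maximum instance lifetime setting uses.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

MAX_AGE = timedelta(days=15)  # threshold taken from the comment above


def nodes_older_than(
    launch_times: Dict[str, datetime],
    max_age: timedelta = MAX_AGE,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return the names of nodes older than max_age, oldest first.

    launch_times maps a node name to its timezone-aware launch time.
    """
    now = now or datetime.now(timezone.utc)
    stale = [name for name, started in launch_times.items() if now - started > max_age]
    return sorted(stale, key=lambda name: launch_times[name])


# The same limit in seconds: 15 days * 86400 s/day.
MAX_AGE_SECONDS = int(MAX_AGE.total_seconds())  # 1296000
```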

If this repo doesn't support what we are looking for, let me know. I will get back to the AWS support team and confirm that it doesn't follow this process.

Contributor

svozza commented Feb 26, 2021

The node drainer should still work if you set the max age: when the node is terminated, it will generate the EC2 Instance-terminate Lifecycle Action event (you can see here in the CloudFormation where we subscribe to the event) and the Lambda will be triggered. The only way it wouldn't work is if for some reason the Max Age parameter generates a different type of event, but I see no reason why that would be the case, and even if it were, you could just change the CloudFormation I've linked to there.
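To make the event flow above concrete, here is a minimal sketch of a handler reacting to that event. It is not the repository's actual Lambda code. The `drain` callable is a hypothetical placeholder for the cordon-and-evict logic, and `autoscaling` is assumed to be an object exposing `complete_lifecycle_action` the way a boto3 Auto Scaling client does; passing it in keeps the sketch runnable without AWS access.

```python
from typing import Any, Callable, Dict, Optional

TERMINATE_DETAIL_TYPE = "EC2 Instance-terminate Lifecycle Action"


def parse_terminate_event(event: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Extract the fields needed to drain a node, or None for other events."""
    if event.get("detail-type") != TERMINATE_DETAIL_TYPE:
        return None
    detail = event["detail"]
    return {
        "instance_id": detail["EC2InstanceId"],
        "asg_name": detail["AutoScalingGroupName"],
        "hook_name": detail["LifecycleHookName"],
    }


def handle(event: Dict[str, Any], drain: Callable[[str], None], autoscaling: Any) -> bool:
    """Drain the terminating instance, then let the ASG proceed.

    drain is a hypothetical callable (cordon + evict + wait);
    autoscaling is assumed to behave like boto3.client("autoscaling").
    """
    info = parse_terminate_event(event)
    if info is None:
        return False  # not a terminate lifecycle event; ignore it
    drain(info["instance_id"])
    # Tell the ASG the hook is done so the instance can actually terminate.
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=info["hook_name"],
        AutoScalingGroupName=info["asg_name"],
        LifecycleActionResult="CONTINUE",
        InstanceId=info["instance_id"],
    )
    return True
```

Until the lifecycle action is completed or the hook times out, the instance stays in a waiting state, which is what gives the pods time to terminate gracefully.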
