Completed Job pods cause an error #33
Hi there, apologies for the delay in responding. It sounds like you have an idea of how to fix this; would you like to open a pull request that makes the changes you think are required?
I would love to. Unfortunately, I am unable to do so in a reasonable time due to company policies.
No problem, is it really just as simple as looking for the value in that ...
What I settled on was:

```python
if ref.kind == CONTROLLER_KIND_DAEMON_SET:
    logger.info("Skipping DaemonSet {}/{}".format(pod.metadata.namespace, pod.metadata.name))
    return False
elif ref.kind == CONTROLLER_KIND_JOB:
    if pod.status and pod.status.phase:
        if pod.status.phase == "Failed":
            logger.info("Skipping failed Job pod {}/{}".format(pod.metadata.namespace, pod.metadata.name))
            return False
        elif pod.status.phase == "Succeeded":
            logger.info("Skipping succeeded Job pod {}/{}".format(pod.metadata.namespace, pod.metadata.name))
            return False
```

`CONTROLLER_KIND_JOB` was set to `"Job"`. The thought was that if the Job has failed or succeeded, its pod can be ignored; otherwise it is still running and could be evicted. I am not sure whether everyone would want to evict running Job pods, however.
Yeah, I see what you mean about people not wanting to evict running Jobs; leave it with me.
If the node being drained has any pods that are not ready, such as a pod created by a Job that has already completed, there will be an error.

The completed Job pod will not be removed from the list of evictable pods, and therefore the code will loop forever (or until the Lambda times out) waiting for it to be evicted.

The `pod_is_evictable` method should ignore any pods that are not in a ready state, as well as DaemonSet pods. An alternative would be to ignore a pod whose `owner_reference` is a Job.
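
For illustration, here is a minimal sketch of what such a check could look like, assuming the pod objects are `V1Pod` instances from the official Python `kubernetes` client; the constant names and the `pod_is_evictable` signature are illustrative assumptions, not the project's actual code:

```python
import logging

logger = logging.getLogger(__name__)

# Assumed constants holding the owner-reference kind strings.
CONTROLLER_KIND_DAEMON_SET = "DaemonSet"
CONTROLLER_KIND_JOB = "Job"


def pod_is_evictable(pod):
    """Return False for pods that the drain loop should not wait on."""
    refs = pod.metadata.owner_references or []
    for ref in refs:
        if ref.kind == CONTROLLER_KIND_DAEMON_SET:
            logger.info("Skipping DaemonSet %s/%s",
                        pod.metadata.namespace, pod.metadata.name)
            return False
        if ref.kind == CONTROLLER_KIND_JOB:
            # A Job pod that has already finished (Succeeded or Failed)
            # will never be evicted, so waiting for it would loop forever.
            phase = pod.status.phase if pod.status else None
            if phase in ("Succeeded", "Failed"):
                logger.info("Skipping completed Job pod %s/%s",
                            pod.metadata.namespace, pod.metadata.name)
                return False
    return True
```

With a check along these lines, DaemonSet pods and completed Job pods are excluded from the eviction list up front, so the drain loop never waits on a pod that can never be evicted.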