This repository has been archived by the owner on Feb 21, 2024. It is now read-only.

Completed Job pods cause an error #33

Open
dlaidlaw opened this issue Aug 28, 2020 · 5 comments

@dlaidlaw

If the node being drained has any pods that are not ready, such as a pod created by a Job that has already completed, then the drain fails with an error.

The completed Job's pod is never removed from the evictable pods list, so the code loops forever (or until the lambda times out) waiting for it to be evicted.

The pod_is_evictable method should ignore any pods that are not in a ready state, as well as DaemonSet pods. An alternative would be to ignore the pod if its owner_reference is a Job.
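A minimal sketch of the check described above, assuming the shape of the kubernetes Python client's `V1Pod` objects (`SimpleNamespace` stands in for them here so the sketch is self-contained; this is not the project's actual code):

```python
from types import SimpleNamespace

CONTROLLER_KIND_DAEMON_SET = "DaemonSet"
CONTROLLER_KIND_JOB = "Job"

def pod_is_evictable(pod):
    """Return False for pods that a node drain should skip."""
    for ref in (pod.metadata.owner_references or []):
        if ref.kind == CONTROLLER_KIND_DAEMON_SET:
            # DaemonSet pods are rescheduled onto the same node; evicting
            # them just loops.
            return False
        if ref.kind == CONTROLLER_KIND_JOB:
            phase = pod.status.phase if pod.status else None
            if phase in ("Succeeded", "Failed"):
                # A completed Job pod can never become ready, so waiting
                # for its eviction hangs the drain.
                return False
    return True
```

With this check, a completed Job pod is dropped from the evictable list up front instead of blocking the drain loop.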

@svozza
Contributor

svozza commented Sep 1, 2020

Hi there, apologies for the delay in responding. It sounds like you have an idea of how to fix this; would you like to open a pull request with the changes you think are required?

@dlaidlaw
Author

dlaidlaw commented Sep 3, 2020

I would love to. Unfortunately, I am unable to do so in a reasonable time frame due to company policies.

@svozza
Contributor

svozza commented Sep 3, 2020

No problem. Is it really just as simple as looking for the value in that owner_reference field? If I get time next week I can give that a go.

@dlaidlaw
Author

dlaidlaw commented Sep 3, 2020

What I settled on was:

            if ref.kind == CONTROLLER_KIND_DAEMON_SET:
                logger.info("Skipping DaemonSet {}/{}".format(pod.metadata.namespace, pod.metadata.name))
                return False
            elif ref.kind == CONTROLLER_KIND_JOB:
                if pod.status and pod.status.phase:
                    if pod.status.phase == "Failed":
                        logger.info("Skipping failed Job pod {}/{}".format(pod.metadata.namespace, pod.metadata.name))
                        return False
                    elif pod.status.phase == "Succeeded":
                        logger.info("Skipping succeeded Job pod {}/{}".format(pod.metadata.namespace, pod.metadata.name))
                        return False

CONTROLLER_KIND_JOB was set to "Job".

The thought being that if the Job has failed or succeeded it can be ignored; otherwise it is still running and could be evicted. I am not sure whether everyone would want to evict running Job pods, however.
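One way to address that concern would be to make eviction of still-running Job pods opt-in. This is a hypothetical variant, not anything the project ships; the `evict_running_jobs` flag is an invented name for illustration:

```python
def job_pod_is_evictable(phase, evict_running_jobs=False):
    """Decide whether a Job-owned pod in the given phase should be evicted.

    Completed pods (Succeeded/Failed) are always skipped, since waiting for
    their eviction hangs the drain loop; running Job pods are evicted only
    when the operator explicitly opts in.
    """
    if phase in ("Succeeded", "Failed"):
        return False
    return evict_running_jobs
```

Operators who are happy to have a node drain interrupt in-flight Jobs would pass `evict_running_jobs=True`; the default leaves Jobs alone.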

@svozza
Contributor

svozza commented Sep 3, 2020

Yeah, I see what you mean about people not wanting to evict running jobs, leave it with me.
