Backport of fix: handling non reschedule disconnecting and reconnecting allocs into release/1.5.x #18885
Backport
This PR is auto-generated from #18701 to be assessed for backporting due to the inclusion of the label backport/1.5.x.
🚨 The person who merged in the original PR is:
@Juanadelacuesta
This person should manually cherry-pick the original PR into a new backport PR,
and close this one when the manual backport PR is merged in.
The below text is copied from the body of the original PR.
This PR fixes "Max_client_disconnect ignores no-reschedule policy".
The bug had two root causes, sketched in simplified form below:
`nomad/scheduler/reconcile.go`, line 461 in cecd9b0:
Any disconnecting allocation that should not be rescheduled now was ignored for the rest of the reconcile process and would always end up as a new placement, because it was not taken into consideration for the deployment's final count and was never set to be explicitly replaced: according to the `shouldFilter` logic, these allocations are `untainted` and are ignored on line 396, so they never make it to the `eligibleNow` selection.
Also, in `updateByReschedulable`, disconnecting allocs were never set to reschedule now. These allocations would once again end up being replaced because the deployment count would be one short, not because they should be replaced, and most definitely not taking their reschedule policy into account.
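Both root causes can be read off a heavily simplified sketch like the one below. The types, function names, and signatures are illustrative assumptions, not Nomad's actual `shouldFilter` / `updateByReschedulable` code: the disconnecting alloc is dropped before the eligibility check ever runs, and even if it got there, only failed allocs could be marked for an immediate reschedule, so the replacement only ever appears because the deployment count comes up short.

```go
package main

import "fmt"

// Illustrative stand-ins for the reconciler's inputs; not Nomad's real types.
type alloc struct {
	ID              string
	Disconnecting   bool
	Failed          bool
	PolicyAllowsNow bool // what the job's reschedule policy would permit
}

// buggyFilter mirrors root cause 1: a disconnecting alloc is treated as
// untainted and ignored, so it is neither counted toward the deployment nor
// passed to the reschedule decision.
func buggyFilter(allocs []alloc) (counted, rescheduleNow, ignored []alloc) {
	for _, a := range allocs {
		if a.Disconnecting {
			ignored = append(ignored, a) // never reaches buggyUpdateByReschedulable
			continue
		}
		if buggyUpdateByReschedulable(a) {
			rescheduleNow = append(rescheduleNow, a)
			continue
		}
		counted = append(counted, a)
	}
	return counted, rescheduleNow, ignored
}

// buggyUpdateByReschedulable mirrors root cause 2: only failed allocs could
// ever be marked for an immediate reschedule, never disconnecting ones.
func buggyUpdateByReschedulable(a alloc) bool {
	return a.Failed && a.PolicyAllowsNow
}

func main() {
	desired := 1
	// A disconnecting alloc whose policy forbids rescheduling.
	counted, now, ignored := buggyFilter([]alloc{{ID: "a1", Disconnecting: true}})

	// The deployment count comes up one short, so a replacement is placed
	// even though the reschedule policy was never consulted.
	missing := desired - len(counted) - len(now)
	fmt.Printf("missing=%d ignored=%d\n", missing, len(ignored)) // missing=1 ignored=1
}
```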
Once the previous problem was fixed and the correct reschedule policy was being taken into account, a new bug became visible: disconnecting allocations don't have a fail time, so the alloc reconciler's `now` was used to calculate the next reschedule time. The minimum delay is 5 seconds for batch jobs and 30 seconds for services, so following the logic in the code, the next reschedule time always ends up being at least 5 seconds from now, and the disconnecting allocations were always set to `rescheduleLater`. But for disconnecting allocations, the `rescheduleLater` group was ignored, as shown previously on line 461 of the reconciler.
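The timing effect can be reproduced with a small, self-contained calculation. The helper below is an assumption about how the anchoring works, not Nomad's actual reschedule-time code; only the 5-second and 30-second minimum delays come from the description above.

```go
package main

import (
	"fmt"
	"time"
)

// nextRescheduleTime is an illustrative helper: it anchors the reschedule
// delay on the alloc's fail time when one exists and on "now" otherwise,
// which is the situation described for disconnecting allocs, since they
// have not failed.
func nextRescheduleTime(failTime, now time.Time, minDelay time.Duration) time.Time {
	anchor := failTime
	if failTime.IsZero() {
		anchor = now // disconnecting allocs: no fail time recorded
	}
	return anchor.Add(minDelay)
}

func main() {
	now := time.Now()

	// Minimum delays quoted above: 5s for batch jobs, 30s for services.
	next := nextRescheduleTime(time.Time{}, now, 5*time.Second)

	// The next reschedule time is always at least 5s in the future, so the
	// alloc lands in the "reschedule later" bucket, which before the fix was
	// ignored for disconnecting allocations.
	fmt.Println("reschedule later:", next.After(now)) // true
}
```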
The function `filterByRescheduleable` was modified to correctly process disconnecting allocations, and the disconnecting untainted and reconnect-later allocations were added to the result, to avoid unnecessary placements and to correctly replace the disconnecting allocs using the reschedule policy.
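A rough sketch of the shape of such a fix, under the same simplifying assumptions as before (the real change is in `filterByRescheduleable` and works on Nomad's own allocation sets): disconnecting allocations are routed through the reschedule decision, and those that stay untainted or are only due later remain part of the returned result, so no spurious placement is created for them.

```go
package sketch

// Hypothetical, simplified types; not Nomad's scheduler structures.
type alloc struct {
	ID            string
	Disconnecting bool
}

type buckets struct {
	untainted, rescheduleNow, rescheduleLater []alloc
}

// filterSketch routes disconnecting allocs through the reschedule decision
// instead of skipping them, and keeps the untainted / later ones in the
// result so the reconciler counts them toward the deployment.
func filterSketch(allocs []alloc, allowNow, allowLater func(alloc) bool) buckets {
	var b buckets
	for _, a := range allocs {
		if !a.Disconnecting {
			b.untainted = append(b.untainted, a)
			continue
		}
		switch {
		case allowNow(a):
			b.rescheduleNow = append(b.rescheduleNow, a)
		case allowLater(a):
			// Counted as untainted for the deployment and also queued for a
			// replacement at the computed reschedule time.
			b.untainted = append(b.untainted, a)
			b.rescheduleLater = append(b.rescheduleLater, a)
		default:
			// No-reschedule policy: keep the alloc in the result so no
			// replacement is placed for it.
			b.untainted = append(b.untainted, a)
		}
	}
	return b
}
```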
Once the previous change was done, new evaluations were created with the new reschedule time to replace the disconnecting allocs, but by the time the evaluation was executed, the allocs no longer qualified as disconnecting, so a new restriction needed to be added in the `updateByReschedulable` function for allocs that were not disconnecting and had not failed yet: the `unknown` allocs.
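A hedged sketch of that restriction, using made-up parameters rather than the real `updateByReschedulable` arguments:

```go
package sketch

// shouldRescheduleSketch illustrates, in hypothetical simplified form, the
// extra case described above: by the time the follow-up evaluation runs, the
// alloc is no longer "disconnecting" but "unknown", so the decision has to
// also accept allocs that have neither failed nor are currently
// disconnecting, as long as their client status is unknown.
func shouldRescheduleSketch(clientStatus string, failed, disconnecting, policyAllows bool) bool {
	if failed || disconnecting {
		return policyAllows
	}
	// New case (sketched): an unknown alloc that disconnected earlier and is
	// now being handled by the evaluation created for it.
	if clientStatus == "unknown" {
		return policyAllows
	}
	return false
}
```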
In order to avoid adding replacements for these allocs every time the reconciler runs, a validation using the `followupEvalID` was added.
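A minimal sketch of such a guard, assuming the follow-up evaluation's ID is recorded on the allocation (Nomad's allocation struct does carry a `FollowupEvalID` field, but everything else here is illustrative):

```go
package sketch

// unknownAlloc is an illustrative stand-in for an allocation that went
// unknown after disconnecting; not Nomad's real Allocation struct.
type unknownAlloc struct {
	ID             string
	FollowupEvalID string
}

// shouldReplaceNow only acts when the current reconciler run is executing the
// follow-up evaluation created at the alloc's reschedule time, so repeated
// reconciler runs do not keep adding replacements for the same alloc.
func shouldReplaceNow(a unknownAlloc, currentEvalID string) bool {
	return a.FollowupEvalID != "" && a.FollowupEvalID == currentEvalID
}
```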
When taking into account the reschedule policy, every disconnecting alloc generates two new evaluations: one for the next reschedule time, in order to place a replacement, and a second one to set the alloc as lost once the `max_client_disconnect` times out.
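Sketched with hypothetical types (Nomad's real `Evaluation` struct and trigger reasons are different), the two evaluations described above look roughly like this:

```go
package sketch

import "time"

// evalSketch is an illustrative stand-in for an evaluation.
type evalSketch struct {
	Reason    string
	WaitUntil time.Time
}

// evalsForDisconnectingAlloc sketches the behavior described above: each
// disconnecting alloc yields one evaluation at its next reschedule time (to
// place the replacement) and one at the max_client_disconnect deadline (to
// mark the alloc lost if it never reconnects).
func evalsForDisconnectingAlloc(now time.Time, rescheduleDelay, maxClientDisconnect time.Duration) []evalSketch {
	return []evalSketch{
		{Reason: "replace at next reschedule time", WaitUntil: now.Add(rescheduleDelay)},
		{Reason: "mark lost after max_client_disconnect", WaitUntil: now.Add(maxClientDisconnect)},
	}
}
```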
The way the code is written, the disconnecting alloc would get updated with the evaluation ID of the second generated evaluation, making the `nextEvalID` useless for verifying whether the `unknown` alloc being rescheduled was in fact a replacement for an old one. A change was put in place to overwrite the `nextEvalID` with the ID of the evaluation corresponding to the replacement. This change also made it much easier to determine, on reconnect, whether an alloc was a replacement for an older one.
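A hypothetical illustration of that overwrite, not Nomad's actual data model:

```go
package sketch

// disconnectingAlloc is illustrative; the field name NextEvalID mirrors the
// wording above, not a real Nomad struct field.
type disconnectingAlloc struct {
	ID         string
	NextEvalID string
}

// recordEvals keeps the replacement evaluation's ID on the alloc, even though
// the timeout/lost evaluation is created afterwards, so the link "this
// unknown alloc was replaced by that eval" stays verifiable on reconnect.
func recordEvals(a *disconnectingAlloc, replacementEvalID, timeoutEvalID string) {
	_ = timeoutEvalID // the second eval still exists but is not recorded here
	a.NextEvalID = replacementEvalID
}
```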
Overview of commits