Backport of fix: handling non reschedule disconnecting and reconnecting allocs into release/1.5.x #18885

Merged

Conversation

hc-github-team-nomad-core
Contributor

Backport

This PR is auto-generated from #18701 to be assessed for backporting due to the inclusion of the label backport/1.5.x.

🚨 Warning: automatic cherry-pick of commits failed. If the first commit failed, you will see a blank no-op commit below. If at least one commit succeeded, you will see the cherry-picked commits up to, not including, the commit where the merge conflict occurred.

The person who merged in the original PR is:
@Juanadelacuesta
This person should manually cherry-pick the original PR into a new backport PR,
and close this one when the manual backport PR is merged in.

merge conflict error: POST https://api.github.com/repos/hashicorp/nomad/merges: 409 Merge conflict []

The below text is copied from the body of the original PR.


This PR fixes the bug where max_client_disconnect ignores the no-reschedule policy.

The bug had two root causes:

  • The code was written under the assumption that disconnecting allocs should always be rescheduled:
    _, rescheduleDisconnecting, _ := disconnecting.filterByRescheduleable(a.batch, true, a.now, a.evalID, a.deployment)

Any disconnecting allocation that should not be rescheduled now was ignored for the rest of the reconcile process and would always end up with a new placement, because it was not taken into account in the deployment's final count.

  • The logic that decides whether an allocation should be rescheduled didn't take disconnecting allocations into account: they were never explicitly marked for replacement because, according to the shouldFilter logic, they are untainted and are ignored on line 396, so they never make it to the eligibleNow selection.
	isUntainted, ignore := shouldFilter(alloc, isBatch)
	if isUntainted && !isDisconnecting {
		untainted[alloc.ID] = alloc
	}
	if isUntainted || ignore {
		continue
	}

	// Only failed allocs with desired state run get to this point
	// If the failed alloc is not eligible for rescheduling now we
	// add it to the untainted set. Disconnecting delay evals are
	// handled by allocReconciler.createTimeoutLaterEvals
	eligibleNow, eligibleLater, rescheduleTime = updateByReschedulable(alloc, now, evalID, deployment, isDisconnecting)
	if !isDisconnecting && !eligibleNow {

Also, in updateByReschedulable, disconnecting allocs were never set to reschedule now:

	if eligible && (alloc.FollowupEvalID == evalID || rescheduleTime.Sub(now) <= rescheduleWindowSize) {
		rescheduleNow = true
		return
	}

These allocations would once again end up being replaced because the deployment count would be one short, not because they should be replaced, and most definitely without taking their reschedule policy into account.
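
To make the under-counting concrete, here is a minimal, self-contained sketch in Go; the set names and the accounting are simplified assumptions for illustration, not Nomad's reconciler code:

	package main

	import "fmt"

	// Illustration only: filterByRescheduleable conceptually splits a set of
	// allocations into untainted, reschedule-now and reschedule-later groups.
	// Dropping the untainted and reschedule-later groups for disconnecting
	// allocs means they never count toward the deployment, so the reconciler
	// believes it is short and adds extra placements.
	func main() {
		untainted := []string{"alloc-no-reschedule"}       // must be kept, not replaced
		rescheduleNow := []string{}                        // never populated for disconnecting allocs (see below)
		rescheduleLater := []string{"alloc-reschedulable"}

		desired := 2

		// Buggy accounting: only the reschedule-now group was considered.
		accountedBuggy := len(rescheduleNow)
		// Fixed accounting: untainted and reschedule-later allocs count too.
		accountedFixed := len(untainted) + len(rescheduleNow) + len(rescheduleLater)

		fmt.Println("extra placements (buggy):", desired-accountedBuggy) // 2
		fmt.Println("extra placements (fixed):", desired-accountedFixed) // 0
	}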

Once the previous problem was fixed and the correct reschedule policy was taken into account, a new bug became visible: disconnecting allocations don't have a fail time, so the alloc reconciler's now (a.now) was used to calculate the next reschedule time. The minimum delay is 5 seconds for batch jobs and 30 seconds for services, so following the logic in the code, the next reschedule time always ends up being at least 5 seconds in the future and the disconnecting allocations were always set to rescheduleLater. But for disconnecting allocations the rescheduleLater group was ignored, as shown previously on line 461 of the rescheduler.
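
The timing problem can be reproduced with a few lines of Go; the window size and delay values below are assumptions used only to illustrate the arithmetic described above:

	package main

	import (
		"fmt"
		"time"
	)

	// A disconnecting alloc has no fail time, so the reconciler's "now" is
	// used as the base for the next reschedule time. With a minimum delay of
	// 5s (batch) or 30s (service), that time is always outside a small
	// reschedule window, so the alloc lands in rescheduleLater, the group
	// that used to be dropped for disconnecting allocs.
	func main() {
		const rescheduleWindow = 1 * time.Second // assumed window size for the example
		minDelayService := 30 * time.Second

		now := time.Now()
		rescheduleTime := now.Add(minDelayService) // no fail time, so the base is `now`

		rescheduleNow := rescheduleTime.Sub(now) <= rescheduleWindow
		fmt.Println("reschedule now?", rescheduleNow)    // always false here
		fmt.Println("reschedule later?", !rescheduleNow) // true, but this group was ignored
	}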

The function filterByRescheduleable was modified to correctly process disconnecting allocations, and the disconnecting untainted and reschedule-later allocations were added to the result, to avoid unnecessary placements and to correctly replace the disconnecting allocs according to their reschedule policy.
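
A minimal sketch of that behaviour, with a toy stand-in for filterByRescheduleable (the real helper takes more parameters and operates on Nomad's allocation sets):

	package main

	import "fmt"

	type allocSet map[string]string // alloc ID -> reschedule policy; simplified for the example

	// filterByRescheduleable is a toy stand-in for the reconciler helper: it
	// splits disconnecting allocs into untainted (must not be replaced) and
	// reschedule-later (replaced according to policy at the next reschedule
	// time), and both sets are kept in the result.
	func filterByRescheduleable(disconnecting allocSet) (untainted, later allocSet) {
		untainted, later = allocSet{}, allocSet{}
		for id, policy := range disconnecting {
			if policy == "never" {
				untainted[id] = policy
				continue
			}
			later[id] = policy
		}
		return untainted, later
	}

	func main() {
		disconnecting := allocSet{"a1": "never", "a2": "default"}

		// Keeping both sets means a1 is not replaced at all and a2 is
		// replaced according to its reschedule policy, with no extra
		// placements created as a side effect of a short deployment count.
		untainted, later := filterByRescheduleable(disconnecting)
		fmt.Println("untainted:", len(untainted), "reschedule later:", len(later))
	}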

Once the previous change was done, new evaluations were created with the new reschedule time to replace the disconnecting allocs. But by the time such an evaluation was executed, the allocs no longer qualified as disconnecting, so a new restriction had to be added in the updateByReschedulable function for allocs that were not disconnecting and had not failed yet: the unknown allocs.
To avoid adding replacements for these allocs on every reconciler run, a validation using the followupEvalID was added.
When the reschedule policy is taken into account, every disconnecting alloc generates two new evaluations: one for the next reschedule time, in order to place a replacement, and a second one to mark the alloc as lost once max_client_disconnect times out.
The way the code was written, the disconnecting alloc would be updated with the evaluation ID of the second generated evaluation, making the followupEvalID useless for verifying whether the unknown alloc being rescheduled was in fact a replacement for an old one. A change was put in place to overwrite the followupEvalID with the ID of the evaluation corresponding to the replacement. This change also made it much easier, on reconnect, to determine whether an alloc was a replacement for an older one.
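
A toy model of that validation follows; the struct and field names are simplified assumptions, not Nomad's structs:

	package main

	import "fmt"

	// The disconnecting alloc's follow-up eval ID is overwritten with the ID
	// of the replacement evaluation, so when that evaluation runs, the (by
	// then unknown) alloc can be matched to it and replaced exactly once.
	type alloc struct {
		ClientStatus   string // "unknown" once the client has disconnected
		FollowupEvalID string // set to the replacement evaluation's ID
	}

	// needsReplacement reports whether this unknown alloc should be replaced
	// by the evaluation currently being processed.
	func needsReplacement(a alloc, currentEvalID string) bool {
		return a.ClientStatus == "unknown" && a.FollowupEvalID == currentEvalID
	}

	func main() {
		a := alloc{ClientStatus: "unknown", FollowupEvalID: "eval-replacement"}
		fmt.Println(needsReplacement(a, "eval-replacement")) // true: this eval places the replacement
		fmt.Println(needsReplacement(a, "eval-other"))       // false: no new replacement on every reconciler run
	}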


Overview of commits

@Juanadelacuesta Juanadelacuesta marked this pull request as ready for review October 27, 2023 15:20
@Juanadelacuesta Juanadelacuesta merged commit 2781ee8 into release/1.5.x Oct 27, 2023
1 check passed
@Juanadelacuesta Juanadelacuesta deleted the backport/b-gh-16515/deadly-trusting-squid branch October 27, 2023 15:20
@Juanadelacuesta Juanadelacuesta restored the backport/b-gh-16515/deadly-trusting-squid branch October 27, 2023 15:46