-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability for parallel Jenkins multi-arch builds #99
Conversation
@@ -1,15 +1,18 @@ | |||
// one job per arch (for now) that just builds "the top thing" (triggered by the meta-update job) | |||
properties([ | |||
disableConcurrentBuilds(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof, hmm, something to think about here:
I wish Jenkins had something like GitHub's concurrency groups -- this prevents the same buildId
from even attempting to build twice at the same time, and with this PR the only thing that prevents that is that these are canonically triggered by the "trigger" job, which is fine for the normal case, but when things go wrong and we're running "build" by hand to debug, nothing will stop "trigger" from firing and potentially clobbering the thing we're testing. 🤔
I think the closest thing Jenkins has is "Lockable Resources" (https://plugins.jenkins.io/lockable-resources/), but they're a really awful experience and IIRC nothing cleans them up, so we can't reasonably put an arbitrary number of those into the system. 😭
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the "Throttle Concurrent Builds" plugin can at least get things with the same parameters from running at the same time: https://www.gusi.me/2022/05/06/Disable-concurrent-builds-based-on-parameters.html. It will still be quasi-queued though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
throttleJobProperty(
limitOneJobWithMatchingParams: true,
paramsToUseForLimit: 'identifier, buildId',
throttleEnabled: true,
throttleOption: 'project',
),
Ok, I did a local test using the "Throttle Builds" plugin with this job config and found that it can stop concurrent jobs from adding to the Jenkins queue when specific parameters match already queued/running jobs.
If the build job activation comes from something like the trigger job that uses waitForStart :true
, then the trigger job will be stuck waiting to add it to the queue and not start anything else. This should be fine since trigger is the only one that should be starting them and so would only happen if we manually had started a build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly cosmetics! 😄 ❤️
Jenkinsfile.build
Outdated
string(name: 'identifier', trim: true), | ||
string(name: 'buildId', trim: true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's put the "required" parameter first and give the identifier
a description so we remember what it is while triggering the job manually (also doubles as a "load-bearing" code comment 😄):
string(name: 'identifier', trim: true), | |
string(name: 'buildId', trim: true), | |
string(name: 'buildId', trim: true), | |
string(name: 'identifier', trim: true, description: '(optional) used to set <code>currentBuild.displayName</code> to a meaningful value earlier'), |
Jenkinsfile.trigger
Outdated
//echo(json) // for debugging/data purposes | ||
// list of closures that we can use to wait for the jobs on. | ||
def waitQueue = [:] | ||
def addToWait(identifier, buildId, externalizableId) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall your original version of this function actually was adding the given parameters to the queue, but given the complications with that IMO we should rename this to make it more clear it only returns the closure and adding it to the queue is still something the caller is responsible for; maybe something like this?
def addToWait(identifier, buildId, externalizableId) { | |
def waitQueueClosure(identifier, buildId, externalizableId) { |
Jenkinsfile.trigger
Outdated
// "catchError" to set "stageResult" :( | ||
catchError(message: 'Build of "' + identifier + '" failed', buildResult: 'UNSTABLE', stageResult: 'FAILURE') { | ||
stage(identifier) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know you tested and it should work in this order, but I think it's slightly clearer if we swap these two (both so we don't catch stage()
itself somehow failing and so it's clear our stage happens and then we set the state of it from within not without):
// "catchError" to set "stageResult" :( | |
catchError(message: 'Build of "' + identifier + '" failed', buildResult: 'UNSTABLE', stageResult: 'FAILURE') { | |
stage(identifier) { | |
stage(identifier) { | |
// "catchError" to set "stageResult" :( | |
catchError(message: 'Build of "' + identifier + '" failed', buildResult: 'UNSTABLE', stageResult: 'FAILURE') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems sane. I should also move the other one to inside the stage(buildObj.identifier)
on https://github.com/docker-library/meta-scripts/pull/99/files#diff-a1eafb3a911bf01d23222ee7808d97d5f227a9608c46843ae4908cdca3408234R139, but that means I'll need to duplicate it for the else
block but that seems fine.
Jenkinsfile.trigger
Outdated
|
||
// stage to wrap up all the build job triggers that get waited on later | ||
stage('trigger') { | ||
for (int i = 0; i < queue.size(); i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you remember what the error is when you left this with the previous syntax so we could note it in a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got it from reading https://www.jenkins.io/doc/pipeline/examples/#jobs-in-parallel and https://www.jenkins.io/doc/pipeline/examples/#parallel-from-grep, but it seems like the function is now enough to scope it correctly (tested locally), so I'll switch back to "for-in"
Jenkinsfile.trigger
Outdated
string(name: 'identifier', value: buildObj.identifier), | ||
string(name: 'buildId', value: buildObj.buildId), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(same order swap)
string(name: 'identifier', value: buildObj.identifier), | |
string(name: 'buildId', value: buildObj.buildId), | |
string(name: 'buildId', value: buildObj.buildId), | |
string(name: 'identifier', value: buildObj.identifier), |
Jenkinsfile.trigger
Outdated
// trigger these quickly so they all get added to Jenkins queue in "queue" order | ||
quietPeriod: 0, // seconds | ||
// we'll wait on them after they are all queued | ||
waitForStart: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't have a quietPeriod
on these (even though that would be nice) because it messes with waitForStart
(and we would then be "captive" for the entire quiet period) -- I think we should probably note that here. 🤔
Maybe something like this?
// trigger these quickly so they all get added to Jenkins queue in "queue" order | |
quietPeriod: 0, // seconds | |
// we'll wait on them after they are all queued | |
waitForStart: true, | |
// trigger these quickly so they all get added to Jenkins queue in "queue" order (also using "waitForStart" means we have to wait for the entire "quietPeriod" before we get to move on and schedule more) | |
quietPeriod: 0, // seconds | |
// we'll wait on the builds in parallel after they are all queued (so our sorted order is the queue order) | |
waitForStart: true, |
Trigger all jenkins builds (adding them to jenkins queue) and then parallel wait on them. We control the concurrency by the number of executors per arch (or if we add multiple machines that match the build label).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is great 👍
I do think we should wait to merge it until after Thanksgiving at this point, though 👀 🦃
} else { | ||
// "catchError" to set "stageResult" :( | ||
catchError(message: 'Build of "' + buildObj.identifier + '" failed', buildResult: 'UNSTABLE', stageResult: 'FAILURE') { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This blank line is extraneous, but it doesn't bother me to leave it in. 👍
} | ||
} | ||
} | ||
} | ||
|
||
// wait on all the 'build' jobs that were queued | ||
if (waitQueue.size() > 0) { | ||
parallel waitQueue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually prefer to be explicit about the ()
in single-argument invocations like this (although I don't usually go as far as "naming" the parameter), but I don't think it really matters much either way. 👍
parallel waitQueue | |
parallel(waitQueue) |
Trigger all jenkins builds (adding them to jenkins queue) and then parallel wait on them. We control the concurrency by the number of executors per arch (or if we add multiple machines that match the build label).