Remove ami pinning from scale-config.yml files #6163
Merged
On New Year's Eve, we had a small CI outage. The queue started to grow because we lacked the capacity to create new Linux instances. After investigating, we found that the cause was the pinning of labels in the `ami` tag of `scale-config.yml` and its variants. For security reasons, Amazon removed the tag from its services, so we could no longer resolve the AMI ID from the tag search. The purpose of this tag was to allow migrating to a newer AMI type runner by runner, so that we could troubleshoot problems and avoid getting stuck in the migration because a particular job runs on a particular instance. It was introduced together with the concept of variants.
Now that the migration is complete, the correct approach is to REMOVE these labels and rely on the labels that are pinned at release/deploy time. They are safer for many reasons, some of which are:
To avoid outages similar to the one we had, we should take this action.
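A minimal sketch of what removing the pinning amounts to, assuming `scale-config.yml` has already been loaded (for example with a YAML parser) into a dict with a top-level `runner_types` mapping, where each runner type and each entry under its `variants` may carry an `ami` key. The key layout and the `remove_ami_pinning` function name are assumptions for illustration, not the actual test-infra schema or tooling.

```python
from typing import Any


def remove_ami_pinning(config: dict[str, Any]) -> list[str]:
    """Strip `ami` keys from every runner type and its variants, in place.

    Assumed (hypothetical) layout:
        runner_types:
          <runner name>:
            ami: ...            # pinned AMI, removed here
            variants:
              <variant name>:
                ami: ...        # pinned AMI per variant, removed here

    Returns a list of human-readable locations of the removed keys.
    """
    removed: list[str] = []
    # `or {}` guards against empty YAML mappings that load as None.
    for runner_name, runner in (config.get("runner_types") or {}).items():
        if runner and "ami" in runner:
            del runner["ami"]
            removed.append(f"runner_types[{runner_name}].ami")
        variants = (runner or {}).get("variants") or {}
        for variant_name, variant in variants.items():
            if variant and "ami" in variant:
                del variant["ami"]
                removed.append(
                    f"runner_types[{runner_name}].variants[{variant_name}].ami"
                )
    return removed


if __name__ == "__main__":
    # Hypothetical config; the AMI names are placeholders, not real values.
    example = {
        "runner_types": {
            "linux.2xlarge": {
                "instance_type": "c5.2xlarge",
                "os": "linux",
                "ami": "hypothetical-ami-name-*",
                "variants": {"amz2023": {"ami": "hypothetical-ami-2023-*"}},
            },
            "linux.4xlarge": {"instance_type": "c5.4xlarge", "os": "linux"},
        }
    }
    for location in remove_ami_pinning(example):
        print("removed", location)
```

Once the `ami` keys are gone, the AMI is chosen by whatever is pinned at release/deploy time, so a tag that disappears upstream can no longer block every runner type at once.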
This builds on the following changes, which correctly reflected the pinning we use in the release:
cc @zxiiro @malfet @atalman @seemethere @ZainRizvi