-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cache: don’t link blobonly based on chainid #3566
Conversation
Signed-off-by: Tonis Tiigi <[email protected]>
@sipsma @imeoer Looks like testing was successful in #2631 (comment) . PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving because I see the issue here and can see how this fixes it, but have some concerns.
It seems like with this change it's possible for there to be a snapshot associated with the record but blobonly
to still be true, which seems to contradict what I assume the purpose of blobonly
was originally. If it works, it works, it just seems confusing.
Double checking my understanding: the fundamental problem is that the ref can be marked as not blobonly
due to the existence of an equivalent snapshot but then when we later try to export such a ref the code presumes blobs for it must exist even though they don't due to different compression from the snapshot it dedupes with. Right?
In that case the patch from @imeoer makes more sense to me. That fixes the logic to determine whether a blob exists or not based on just checking the content store.
I actually feel like ideally it should be taken further; blobonly
doesn't even need to exist in metadata and instead we could just check the content store everytime to determine whether it exists locally or not. It's the ultimate source of truth anyways.
That's a larger change of course, hard to do w/ back-compat and I'm probably not remembering details here so LGTM on this change for the short term either way.
@sipsma Update this in follow-up if you have ideas. |
I have applied this change against Result differs from the tests I have done (described in #2631 (comment)). In the test bed, I have a references to public images only (https://github.com/sharesight/buildkit-debug/blob/pr-2/docker/Dockerfile). In our private repo, I have 2 branches where I see different results:
For both these branches, I am using the patched version linked above (same change as here). Result after 24h:
I store cache on S3 and call build-push action twice. First invocation: restore from S3, save to local directory. Second invocation: restore from local directory, save to S3. Workflow fails at second invocation (https://github.com/sharesight/buildkit-debug/blob/pr-2/.github/workflows/docker.yml#L96) with same error ( @tonistiigi would it be possible that this patch does not address both cases (private image vs no private images)? |
Debugging #2631 and trying to understand how #3447 might fix this I found this case where we link to another snapshot that has the same chainID(uncompressed digest) but different blobID(compressed digest). In that case, we can mark the blob as
!blobOnly
while the ref is actually lazy. That does not look correct.This case should be quite hard to hit, but maybe there are some layers with very common files that could make it more likely. @ohmer is testing if this is enough to fix the issue they are seeing.
Looking at @imeoer patch #3447 again it doesn't look very scary, so if this patch doesn't fully fix this, we could pick that as well, but I'm worried that we don't understand the actual issue, then and it will return in some other form.
PTAL @imeoer @sipsma
Signed-off-by: Tonis Tiigi [email protected]