Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cache: don’t link blobonly based on chainid #3566

Merged
merged 1 commit into from
Feb 9, 2023

Conversation

tonistiigi
Copy link
Member

Debugging #2631 and trying to understand how #3447 might fix this I found this case where we link to another snapshot that has the same chainID(uncompressed digest) but different blobID(compressed digest). In that case, we can mark the blob as !blobOnly while the ref is actually lazy. That does not look correct.

This case should be quite hard to hit, but maybe there are some layers with very common files that could make it more likely. @ohmer is testing if this is enough to fix the issue they are seeing.

Looking at @imeoer patch #3447 again it doesn't look very scary, so if this patch doesn't fully fix this, we could pick that as well, but I'm worried that we don't understand the actual issue, then and it will return in some other form.

PTAL @imeoer @sipsma

Signed-off-by: Tonis Tiigi [email protected]

@crazy-max crazy-max added this to the v0.11.3 milestone Feb 6, 2023
@tonistiigi tonistiigi marked this pull request as ready for review February 8, 2023 20:59
@tonistiigi
Copy link
Member Author

@sipsma @imeoer Looks like testing was successful in #2631 (comment) . PTAL

Copy link
Collaborator

@sipsma sipsma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving because I see the issue here and can see how this fixes it, but have some concerns.

It seems like with this change it's possible for there to be a snapshot associated with the record but blobonly to still be true, which seems to contradict what I assume the purpose of blobonly was originally. If it works, it works, it just seems confusing.

Double checking my understanding: the fundamental problem is that the ref can be marked as not blobonly due to the existence of an equivalent snapshot but then when we later try to export such a ref the code presumes blobs for it must exist even though they don't due to different compression from the snapshot it dedupes with. Right?

In that case the patch from @imeoer makes more sense to me. That fixes the logic to determine whether a blob exists or not based on just checking the content store.

I actually feel like ideally it should be taken further; blobonly doesn't even need to exist in metadata and instead we could just check the content store everytime to determine whether it exists locally or not. It's the ultimate source of truth anyways.

That's a larger change of course, hard to do w/ back-compat and I'm probably not remembering details here so LGTM on this change for the short term either way.

@tonistiigi tonistiigi merged commit a196d7b into moby:master Feb 9, 2023
@tonistiigi
Copy link
Member Author

@sipsma Update this in follow-up if you have ideas.

@ohmer
Copy link

ohmer commented Feb 13, 2023

I have applied this change against v0.11.2 (https://github.com/moby/buildkit/compare/v0.11.2...sharesight:buildkit:fix-blobonly?expand=1) and started using it against our private repo. Issue has reappeared unfortunately.

Result differs from the tests I have done (described in #2631 (comment)). In the test bed, I have a references to public images only (https://github.com/sharesight/buildkit-debug/blob/pr-2/docker/Dockerfile). In our private repo, I have 2 branches where I see different results:

  • on our main branch, I have a Dockerfile with reference to private and public images (stored in ECR, if that matters).
  • on our test branch, I have a Dockerfile with reference only to public images (Docker Hub).

For both these branches, I am using the patched version linked above (same change as here). Result after 24h:

  • main branch: fails 10% of the time, 1 or 2 retry unblocked our pipelines
  • test branch: success rate is 100%

I store cache on S3 and call build-push action twice. First invocation: restore from S3, save to local directory. Second invocation: restore from local directory, save to S3. Workflow fails at second invocation (https://github.com/sharesight/buildkit-debug/blob/pr-2/.github/workflows/docker.yml#L96) with same error (Get error: content digest sha256: ***: not found when exporting image).

@tonistiigi would it be possible that this patch does not address both cases (private image vs no private images)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants