Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harbor Proxy should serve manifests from local repository if the remote manifest digest matches a local manifest #21122

Open
raphaelzoellner opened this issue Oct 31, 2024 · 8 comments · May be fixed by #21141
Assignees
Labels
area/proxy-cache kind/requirement New feature or idea on top of harbor

Comments

@raphaelzoellner
Copy link

raphaelzoellner commented Oct 31, 2024

Is your feature request related to a problem? Please describe.
DockerHub Proxy GET /manifest requests referencing a tag are not served from cache even when the remote manifest matches a manifest in the local repository.

This leads to requests reducing the users remainder of the rate limit.

This can be observed by running harbor with log level debug and fetching a manifest multiple times.

GET https://harbor.example.com/v2/docker-hub/goharbor/harbor-exporter/manifests/v2.10.2

[DEBUG] [/controller/proxy/controller.go:196]: Digest is not found in manifest list cache, key=cache:manifestlist:docker-hub/goharbor/harbor-exporter:sha256:b6bb051a967de0948992f4a44ed00369adcc04ba88cdc76ed8ddeb4326ccf8be
[DEBUG] [/server/middleware/repoproxy/proxy.go:203]: the tag is v2.10.2, digest is 
[WARNING] [/server/middleware/repoproxy/proxy.go:207]: Artifact: docker-hub/goharbor/harbor-exporter:v2.10.2, digest: is not found in proxy cache, fetch it from remote repo

https://github.com/goharbor/harbor/blob/v2.10.2/src/controller/proxy/controller.go#L193-L201C1
https://github.com/goharbor/harbor/blob/v2.10.2/src/server/middleware/repoproxy/proxy.go#L203-L209

Describe the solution you'd like
Harbor should serve the manifest from the local repository if the remote manifest digest matches the digest of the manifest in the local repository.

Describe the main design/architecture of your solution
Since Harbor already attempts to use a cache for ManifestLists I suggest to extend this behavior to Manifests by attempting to pull the manifest from the local repository.

https://github.com/goharbor/harbor/blob/v2.10.2/src/controller/proxy/controller.go#L159

Update:
Since middlewares would be skipped by the extended behavior described above, I suggest to push the manifest to the local repository with reference to the digest and tag if both are known instead.

err := m.local.PushManifest(art.Repository, getReference(art), man)

@stonezdj
Copy link
Contributor

stonezdj commented Nov 5, 2024

Could you please double check the artifact with the same tag and same digest exist in the proxy cache project? It seems that the artifact didn't cached yet from the log.

There are two caches, one is the registry, another is the redis cache, the following log is refer the redis cache.
[/controller/proxy/controller.go:196]: Digest is not found in manifest list cache, key=cache:manifestlist:docker-hub/goharbor/harbor-exporter:sha256:b6bb051a967de0948992f4a44ed00369adcc04ba88cdc76ed8ddeb4326ccf8be

@raphaelzoellner
Copy link
Author

raphaelzoellner commented Nov 5, 2024

Sure,
I've reproduced the behavior.

I've performed multiple GET manifest requests referencing the tag.
https://harbor.example.com/v2/docker-hub/goharbor/harbor-exporter/manifests/v2.10.2

2024-11-05T09:26:28Z [WARNING] [/server/middleware/repoproxy/proxy.go:207]: Artifact: docker-hub/goharbor/harbor-exporter:v2.10.2, digest: is not found in proxy cache, fetch it from remote repo
2024-11-05T09:26:52Z [WARNING] [/server/middleware/repoproxy/proxy.go:207]: Artifact: docker-hub/goharbor/harbor-exporter:v2.10.2, digest: is not found in proxy cache, fetch it from remote repo
2024-11-05T09:27:47Z [WARNING] [/server/middleware/repoproxy/proxy.go:207]: Artifact: docker-hub/goharbor/harbor-exporter:v2.10.2, digest: is not found in proxy cache, fetch it from remote repo
2024-11-05T09:31:22Z [WARNING] [/server/middleware/repoproxy/proxy.go:207]: Artifact: docker-hub/goharbor/harbor-exporter:v2.10.2, digest: is not found in proxy cache, fetch it from remote repo

The corresponding artifact in the Harbor portal does not seem to have a tag associated with it, but was already created before the last GET request.
grafik

On the other hand when performing a GET manifest request referencing the digest, logs of the form is not found in proxy cache, fetch it from remote repo seem not to be created, I assume the manifest is served from the local repository in this case.
https://harbor.example.com/v2/docker-hub/goharbor/harbor-exporter/manifests/sha256:b6bb051a967de0948992f4a44ed00369adcc04ba88cdc76ed8ddeb4326ccf8be

@stonezdj
Copy link
Contributor

stonezdj commented Nov 7, 2024

So the problem is the tag is not created, not proxied image is not served?

@raphaelzoellner
Copy link
Author

raphaelzoellner commented Nov 7, 2024

Hello yes, the images are currently proxied in this case,
but not served from the local repository cache even though the manifest and layers exist in the local repository already.

The cause for this behavior seems to be that the manifest was initially only pushed with a reference to the digest, but not the tag.

@stonezdj
Copy link
Contributor

The root cause is the tag is not created in proxy cache project, and the request is pull image by tag, then it cause the image is not served with cached content. is that correct?

@raphaelzoellner
Copy link
Author

raphaelzoellner commented Nov 11, 2024

Yes I think so,
even if the tag exists in the proxy cache project, the digest should be compared (which should already be the case) since tags might be moved in the upstream registry to a new manifest referencing other layers.
https://github.com/goharbor/harbor/blob/1f75b7aaef21a1d87da8dbbff20022f58d82f9d4/src/controller/proxy/controller.go#L194C1-L194C63

I've created the following PR to push not only the manifest referenced by digest but also push the manifest referenced by tag if it is known, which aims to solve this.
#21141

@stonezdj was this PR intentionally closed?
#21141 (comment)

@raphaelzoellner
Copy link
Author

raphaelzoellner commented Nov 18, 2024

Hello @stonezdj,

why was #21141 closed?
Could this have been a mixup with #21123, which was already closed?

If this was intentionally closed, could you describe the concerns, so one can think about another solution to the problem?

@stonezdj
Copy link
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/proxy-cache kind/requirement New feature or idea on top of harbor
Projects
None yet
2 participants