Harbor proxy cache fails pull instead of returning locally cached copy when upstream returns 429 #19899

Open
ricardojdsilva87 opened this issue Jan 23, 2024 · 4 comments


Hello,
This is a reopening of an earlier ticket.

Recently we have been hit with several 429 Too Many Requests errors coming from Harbor. After digging in, we saw that all of these errors came from the proxy registry that we have set up against docker.io to serve as a cache for its container images.

All of these errors happen only for images whose tags are updated frequently, such as latest or alpine3.15. In this case it was amazoncorretto.

The error is caused by a DockerHub protection mechanism that, unlike the pull limit, does not depend on whether the user is authenticated.
This blocking can apparently happen at any time and is keyed only to the source IP; the DockerHub infrastructure blocks these calls whenever it sees fit.

We use Datadog to monitor our infrastructure, and we can see some of these errors happening from time to time:
[screenshot: Datadog graph of intermittent 429 errors]

And the error logs from Harbor:
[screenshot: Harbor error logs]

This is an issue because Harbor apparently does not serve the cached image/tag/layer when the check against DockerHub fails, so the container cannot start because the image cannot be downloaded.
We suspect this can also happen for tags other than latest, since Harbor needs to check whether the cached layer is still current on DockerHub.

My question is whether there is some kind of protection in Harbor that we could enable, for example:

  • Serve the cached image if the request to DockerHub fails (see the sketch after this list)
  • Only request layer/tag information from DockerHub once per configurable timeframe per image layer/tag. That is, if a tag was already checked against DockerHub in the last 6 hours, skip the check; otherwise the API rate limit might be reached. In reality most images, even latest, are not updated that often; it can take days until a new image is rebuilt.

If these options cannot be configured via the Helm chart or the UI, this would be a nice feature to implement in the core itself. We might see the same behaviour with other registries such as quay.io or gcr.io in the future if they implement the same per-IP protection.
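To make the first suggestion concrete, here is a minimal sketch in Go (Harbor's own language) of the intended control flow. All names here (`Cache`, `Upstream`, `ErrRateLimited`, `GetManifest`) are hypothetical and are not Harbor's actual API; the point is only the fallback: try the upstream, and on a 429 serve the cached copy instead of failing the pull.

```go
// Hypothetical sketch, not Harbor's actual code: on a 429 from the
// upstream registry, fall back to the locally cached manifest.
package proxy

import (
	"context"
	"errors"
	"time"
)

// ErrRateLimited stands in for an upstream 429 Too Many Requests reply.
var ErrRateLimited = errors.New("upstream returned 429 Too Many Requests")

// Manifest is a stand-in for a cached OCI manifest record.
type Manifest struct {
	Digest    string
	FetchedAt time.Time
}

// Cache and Upstream are illustrative interfaces only.
type Cache interface {
	Get(ctx context.Context, repo, tag string) (*Manifest, bool)
}

type Upstream interface {
	Head(ctx context.Context, repo, tag string) (*Manifest, error)
}

// GetManifest asks the upstream first; if the upstream is rate limited and
// a cached copy exists, it serves the (possibly stale) cached copy instead
// of propagating the 429 to the client.
func GetManifest(ctx context.Context, c Cache, u Upstream, repo, tag string) (*Manifest, error) {
	m, err := u.Head(ctx, repo, tag)
	if err == nil {
		return m, nil
	}
	if errors.Is(err, ErrRateLimited) {
		if cached, ok := c.Get(ctx, repo, tag); ok {
			return cached, nil // stale-but-available beats a failed pull
		}
	}
	return nil, err
}
```

The trade-off is that a pull served this way may return a slightly outdated tag, which for frequently moving tags like latest seems preferable to a failed pull.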

Thanks for the support

stonezdj added the kind/requirement label Jan 29, 2024
stonezdj (Contributor) commented Jan 29, 2024

In the current implementation, Harbor returns 429 to the client when the upstream registry responds with 429.
For the sake of stability, we need an enhancement to the fix in #18750 to allow the user to set up a timeframe during which the manifest check against the upstream registry is skipped.
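A minimal sketch of that timeframe idea, assuming a hypothetical cached-manifest record with a last-checked timestamp (names are illustrative, not Harbor's actual code):

```go
package proxy

import "time"

// cachedManifest is a hypothetical record of a locally cached manifest;
// lastChecked is the time of the most recent validation against upstream.
type cachedManifest struct {
	digest      string
	lastChecked time.Time
}

// shouldCheckUpstream returns true only when the cached entry is missing
// or older than the configured window, so repeated pulls within the window
// never touch the upstream registry and cannot trip its rate limit.
func shouldCheckUpstream(m *cachedManifest, window time.Duration) bool {
	if m == nil {
		return true // nothing cached yet; must consult the upstream
	}
	return time.Since(m.lastChecked) > window
}
```

With a window of, say, 6 hours, a tag pulled every few minutes would hit the upstream at most four times a day instead of on every pull.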

ricardojdsilva87 (Author) commented

@stonezdj thanks for the update.
We are still seeing some 429 responses, but the container images can still be pulled and the containers no longer crash. We'll keep monitoring.
Adding a cache timeout parameter, as you mentioned, would also be nice.
Thanks!

strowi commented Sep 25, 2024

Just ran into a similar issue when proxying the trivy-db from ghcr.io.
It would be really helpful if this could be circumvented somehow on the Harbor side.

tuunit commented Nov 27, 2024

Related to #21122 and maybe fixed by #21141
