ServiceMonitor contains a hard-coded serverName that assumes the operator namespace is cert-utils-operator #138
Comments
Only 12 issues listed and yet no updates?
It seems like the certificate being issued looks properly configured if the operator was installed to the This shouldn't happen, because the template for the certificate resources takes into account the target namespace: In this case, it seems that the A few assumptions to validate:
And one (speculative) thing to try: Thanks for your patience.
Hi @davgordo -
Ah okay, thanks for the clarification, @cigna-asoria. I'm going to see if I can recreate the issue; it sounds like it should be pretty easy to recreate. The only things that might be helpful for me to reference are:
I might discover that the problem is not challenging to recreate, in which case I'll be able to reference these things in my own environment. But if you have time, it couldn't hurt to have more info.
@davgordo - Let me get the data you requested.
Yes, so for context: when installing via Helm, we provide cert-manager support because we're making an assumption (sometimes it's a bad assumption) that users using Helm are probably targeting plain k8s. When the target platform is OpenShift, on the other hand, there are some built-in certificate capabilities that we can leverage instead. Specifically, you'll see this config in the annotations of the So with that background, I just used OLM to deploy this operator, and the result looked okay to me so far. If I decode the certificate, I see the following SANs:
Those look good because they reflect the
Here is the service YAML. For DNS, how do I pull that information? I can't provide the secret since it contains certificates.
Here is the DNS output.
Downloads % openssl x509 -in cert.crt -text -noout | grep DNS
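For readers following along: the `grep DNS` line printed by `openssl x509 -text` has the form `DNS:name1, DNS:name2`. Below is a minimal Python sketch for turning such a line into a list of names. The helper name `parse_san_line` is hypothetical, and the sample line is not the actual command output (which is not shown in this thread); it is assembled from the two SANs quoted in the x509 error message in this issue.

```python
def parse_san_line(line: str) -> list[str]:
    """Extract DNS names from an openssl 'Subject Alternative Name' text line."""
    names = []
    for entry in line.split(","):
        entry = entry.strip()
        # Non-DNS entries (for example "IP Address:...") are skipped.
        if entry.startswith("DNS:"):
            names.append(entry[len("DNS:"):])
    return names


# Sample line assembled from the SANs in the error message (an assumption,
# not a copy of the real openssl output).
sample = (
    "DNS:cert-utils-operator-controller-manager-metrics-service.openshift-operators.svc, "
    "DNS:cert-utils-operator-controller-manager-metrics-service.openshift-operators.svc.cluster.local"
)

san_names = parse_san_line(sample)
for name in san_names:
    print(name)
```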
@davgordo - I provided the information above. Everything seems right, so why did Prometheus use the wrong server_name?
So I think it doesn't look right to me, because I thought this operator is installed in the The operator is deployed to the
@davgordo - cert-utils is installed under
Ah hah! My apologies for misunderstanding. So Prometheus is going to search for services usually by label. We can tell it what labels to search for with My cluster spun down, but as soon as I spin back up, I will try to specify the Wild guess, but you don't happen to have a namespace called
@davgordo No, we don't have a namespace called
@davgordo Found it, and I think this might be the problem. I bolded it below.
Downloads># oc get ServiceMonitor cert-utils-operator-controller-manager-metrics-monitor -n openshift-operators -o yaml
Now we're cookin'. Server name is wrong there. Thanks for all your help with the extra info. The problem is clear now. We'll have to do some brainstorming for a fix.
@davgordo - Yeah! Please do keep me informed. I have many clusters with this issue that I definitely want to fix.
@cigna-asoria actually, I don't know for sure whether OLM creates that service monitor automatically... Did you all configure that, or was that provided by the operator provisioning?
@davgordo - No, we did not configure that. We only upgraded/installed cert-utils instances through OperatorHub UI via the OpenShift Console. My take is that OpenShift deployed it.
Ah I see it in my environment too. Thanks again.
@cigna-asoria FYI, I know it's not an ideal fix, but I am able to modify the serverName manually and this change does not get overwritten by the operator. This might help you temporarily until we make the next release.
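One way the manual change described above could be scripted is as a JSON Patch against the ServiceMonitor. The Python sketch below only builds the patch document; it does not talk to the cluster. It assumes the metrics endpoint is the first entry under `spec.endpoints` (index 0), which should be checked against the actual resource, and `build_server_name_patch` is a hypothetical helper name.

```python
import json


def build_server_name_patch(service: str, namespace: str, endpoint_index: int = 0) -> list[dict]:
    """Build a JSON Patch that points tlsConfig.serverName at the real namespace."""
    return [
        {
            "op": "replace",
            # Assumption: the metrics endpoint sits at this index in spec.endpoints.
            "path": f"/spec/endpoints/{endpoint_index}/tlsConfig/serverName",
            "value": f"{service}.{namespace}.svc",
        }
    ]


patch = build_server_name_patch(
    "cert-utils-operator-controller-manager-metrics-service",
    "openshift-operators",
)
print(json.dumps(patch))
```

The printed JSON could then be passed to something like `oc patch servicemonitor cert-utils-operator-controller-manager-metrics-monitor -n openshift-operators --type=json -p '<printed JSON>'`. This is an untested sketch of the workaround, not the fix planned for the operator.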
@davgordo - Thanks, I will go that route until a fix is in place. Thanks again!
This issue seems to persist, as the fix linked above apparently hasn't been merged. Could it be reopened?
Hi -
We are on OpenShift 4.8.35 and updated our cert-utils to 1.3.10 in all our environments.
But we are getting an alert saying that the cert-utils metrics are down.
cert-utils is installed in namespace openshift-operators and not cert-utils-operator.
The endpoint is the IP and I can get those metrics per the commands you specify in the wiki, even using the service name.
But I'm getting this error:
Get "https://x.x.x.x:8443/metrics": x509: certificate is valid for cert-utils-operator-controller-manager-metrics-service.openshift-operators.svc, cert-utils-operator-controller-manager-metrics-service.openshift-operators.svc.cluster.local, not cert-utils-operator-controller-manager-metrics-service.cert-utils-operator.svc
So I'm wondering whether the problem is in the Prometheus config for server_name:
tls_config:
  ca_file: /etc/prometheus/certs/secret_openshift-operators_cert-utils-operator-certs_tls.crt
  server_name: cert-utils-operator-controller-manager-metrics-service.cert-utils-operator.svc
  insecure_skip_verify: false
The server_name in the Prometheus config is not one of the names the certificate is valid for, per the error message.
Can this be the problem when trying to pull metrics?
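To make the mismatch concrete, here is a minimal Python sketch that compares the configured server_name against the names listed in the x509 error. It uses a simple case-insensitive exact match and ignores wildcard SANs, so it is a simplification of real TLS hostname verification. The helper names are hypothetical, and the `<service>.<namespace>.svc` pattern is inferred from the names in the error message.

```python
def expected_server_name(service: str, namespace: str) -> str:
    """In-cluster DNS name of a service, following the pattern seen in the error."""
    return f"{service}.{namespace}.svc"


def server_name_matches(server_name: str, san_names: list[str]) -> bool:
    """Exact, case-insensitive comparison; wildcard SANs are not handled here."""
    wanted = server_name.lower().rstrip(".")
    return any(wanted == san.lower().rstrip(".") for san in san_names)


SERVICE = "cert-utils-operator-controller-manager-metrics-service"

# SANs the certificate is valid for, taken from the error message.
san_names = [
    f"{SERVICE}.openshift-operators.svc",
    f"{SERVICE}.openshift-operators.svc.cluster.local",
]

# server_name from the Prometheus tls_config.
configured = f"{SERVICE}.cert-utils-operator.svc"

print(server_name_matches(configured, san_names))  # False
print(server_name_matches(expected_server_name(SERVICE, "openshift-operators"), san_names))  # True
```

The configured name fails only because it embeds the `cert-utils-operator` namespace, while the certificate was issued for the `openshift-operators` namespace where the operator actually runs.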