Debugging external-apps-ingress-controller in Prod cluster #29

Closed
dystewart opened this issue Oct 31, 2022 · 8 comments
Labels: added_post_planning, openshift (This issue pertains to NERC OpenShift)

Comments

@dystewart

I've looked around and found what may be a typo in the domain line of the ingressController YAML file.

I'm assuming it's meant to be:

spec:
  domain: apps.openshift.nerc.mghpcc.org

As opposed to:

spec:
  domain: apps.shift.nerc.mghpcc.org

This may not be the only change needed, since the ingressController's conditions also show a problem with node scheduling, but it's a start.

@larsks
Contributor

larsks commented Nov 1, 2022

See my comment on the PR: the original value was correct (and you can verify that with a simple DNS query; compare the result of looking up foo.apps.shift.nerc.mghpcc.org with foo.apps.openshift.mghpcc.org).
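The DNS comparison suggested above could be sketched in Python roughly as follows. This is a minimal illustration, not something from the thread: the helper names `lookup` and `compare` are made up here, and the `resolver` parameter exists only so the function can be exercised without network access.

```python
import socket


def lookup(hostname, resolver=socket.getaddrinfo):
    """Return the sorted IP addresses for hostname, or [] if it does not resolve."""
    try:
        results = resolver(hostname, None)
    except socket.gaierror:
        # Name resolution failed (for example, no matching DNS record).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in results})


def compare(hostnames, resolver=socket.getaddrinfo):
    """Map each hostname to the list of addresses it resolves to."""
    return {name: lookup(name, resolver) for name in hostnames}
```

Calling `compare(["foo.apps.shift.nerc.mghpcc.org", "foo.apps.openshift.mghpcc.org"])` from a machine with working DNS would show which of the two names resolves; a name that maps to an empty list did not resolve.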

@larsks
Contributor

larsks commented Nov 1, 2022

And to avoid some additional confusion: there is some overlap between this issue and #16. This issue is supposed to be "Why isn't the external ingress controller running?"

@dystewart
Author

It looks like the external ingressController is not being created because its pods cannot be scheduled (see the PodsScheduled status in its conditions).

I quickly looked through the nodes available to the prod cluster and I don't see any with the label zone: external, which is what the nodeSelector in the ingressController YAML is looking for:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: external-apps-ingress-controller
  namespace: openshift-ingress-operator
spec:
  domain: apps.shift.nerc.mghpcc.org
  defaultCertificate:
    name: external-apps-ingress-certificate
  endpointPublishingStrategy:
    type: LoadBalancerService
    loadBalancer:
      scope: External
  nodePlacement:
    nodeSelector:
      matchLabels:
        zone: external
  namespaceSelector:
    matchLabels:
      type: external

Several of the prod cluster's worker nodes do, however, carry the label nerc.mghpcc.org/external-ingress: 'true', which appears to be the label we are actually looking for. I think the namespace label may become an issue as well, but the main error is related to the nodeSelector, so I'm going to open a PR to make this change and will link it below.
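To illustrate why the pods could not be scheduled, here is a small sketch of how a matchLabels selector is evaluated against node labels: a node qualifies only if it carries every key/value pair in the selector. The node names and the `nodes` data below are hypothetical stand-ins, not taken from the prod cluster.

```python
def matches(node_labels, match_labels):
    """True if the node carries every key/value pair required by the selector."""
    return all(node_labels.get(key) == value for key, value in match_labels.items())


def schedulable_nodes(nodes, match_labels):
    """Return the names of nodes whose labels satisfy the selector."""
    return [name for name, labels in nodes.items() if matches(labels, match_labels)]


# Hypothetical node inventory, for illustration only.
nodes = {
    "worker-0": {"nerc.mghpcc.org/external-ingress": "true"},
    "worker-1": {"nerc.mghpcc.org/external-ingress": "true"},
    "worker-2": {},
}

# The selector currently in the ingressController YAML matches no node.
print(schedulable_nodes(nodes, {"zone": "external"}))  # []

# The label that is actually present on the nodes matches two of them.
print(schedulable_nodes(nodes, {"nerc.mghpcc.org/external-ingress": "true"}))
# ['worker-0', 'worker-1']
```

With no node satisfying zone: external, the scheduler has nowhere to place the router pods, which is consistent with the PodsScheduled status described above.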

@dystewart
Author

The patch in OCP-on-NERC/nerc-ocp-config#42 did not result in any change to the error status of the ingressController at: https://console-openshift-console.apps.nerc-ocp-prod.rc.fas.harvard.edu/k8s/ns/openshift-ingress-operator/operator.openshift.iov1IngressController/external-apps-ingress-controller/.

I have also determined that the namespaceSelector field shouldn't be the root of the problem, since:

namespaceSelector: This selects particular namespaces for which all Pods should be allowed as ingress sources or egress destinations

and so this should have no impact on the ingressController pods being scheduled.
Continuing to look into the issue...

@dystewart
Author

After deleting the external-apps-ingress-controller ingressController in the prod cluster and recreating it via an Argo CD sync, the PR to update the nodeSelector fields does appear to have worked, as the ingressController is reporting that pods are now scheduled. There is still something going on here, though, because we're still seeing 0/2 replicas available.
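One way to narrow down what is still wrong is to list the status conditions that are not yet "True". The sketch below assumes the usual Kubernetes condition shape (a list of entries with type, status, and reason); the `unmet_conditions` helper and the sample status, including its reason strings, are hypothetical and not copied from the cluster.

```python
def unmet_conditions(status, expected_true):
    """Return (type, status, reason) for each expected condition that is not 'True'."""
    by_type = {cond["type"]: cond for cond in status.get("conditions", [])}
    unmet = []
    for cond_type in expected_true:
        cond = by_type.get(cond_type, {})
        if cond.get("status") != "True":
            # A condition that is absent entirely is reported as "Missing".
            unmet.append((cond_type, cond.get("status", "Missing"), cond.get("reason", "")))
    return unmet


# Hypothetical status: pods are scheduled, but no replicas are available yet.
sample_status = {
    "availableReplicas": 0,
    "conditions": [
        {"type": "PodsScheduled", "status": "True", "reason": "HypotheticalReason"},
        {"type": "Available", "status": "False", "reason": "HypotheticalReason"},
    ],
}

print(unmet_conditions(sample_status, ["PodsScheduled", "Available"]))
# [('Available', 'False', 'HypotheticalReason')]
```

In practice the status would come from the ingressController object itself rather than a hand-written dictionary, and the reason and message fields of the unmet conditions are where to look next.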

@dystewart
Author

See also: #41

@larsks
Copy link
Contributor

larsks commented Nov 30, 2022

This has been closed by recent pull requests.

@larsks closed this as completed Nov 30, 2022
@joachimweyl added the openshift label Aug 16, 2023