Deploying a MVP BinderHub #1404

Merged: 24 commits merged into 2i2c-org:master on Jun 22, 2022


@sgibson91 (Member) commented Jun 8, 2022

Related to #1280

This PR deploys a minimal working BinderHub to the 2i2c cluster. It can be accessed at https://binder-staging.2i2c.cloud

  • BinderHub's values are based on the original Pangeo Binder's values: https://github.com/pangeo-data/pangeo-binder/blob/staging/pangeo-binder/values.yaml
  • Config for the BinderHub has been added
    • For now, the BinderHub pushes images to my personal Docker Hub account
  • The deployer has been updated to handle a BinderHub deployment
  • All extra config entries have been disabled in the BinderHub Helm chart for now, until we migrate the templates over from basehub
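A minimal sketch of the registry part of that config, assuming the `registry.username`/`registry.password` and `config.BinderHub.use_registry`/`config.BinderHub.image_prefix` keys documented for the upstream BinderHub chart. Where these keys nest inside this repo's `helm-charts/binderhub` wrapper is not shown, and the helper name `dockerhub_registry_values` and the example Docker ID are hypothetical:

```python
def dockerhub_registry_values(username: str, password: str, image_prefix: str) -> dict:
    """Build BinderHub-style values that push built images to one Docker Hub account."""
    # Hypothetical guard: only push under the Docker ID we hold credentials for.
    if not image_prefix.startswith(f"{username}/"):
        raise ValueError(f"image_prefix {image_prefix!r} is not under Docker ID {username!r}")
    # Docker repository names must be lowercase.
    if image_prefix != image_prefix.lower():
        raise ValueError(f"image_prefix {image_prefix!r} must be lowercase")
    return {
        "registry": {"username": username, "password": password},
        "config": {
            "BinderHub": {"use_registry": True, "image_prefix": image_prefix},
        },
    }


values = dockerhub_registry_values("example-user", "not-a-real-password", "example-user/binder-staging-")
```

The prefix checks are illustrative only; the chart itself does not enforce them. Swapping the personal account for an organisation account later would only change `username`, `password` and `image_prefix`.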

@sgibson91 (Member Author)

This config is currently failing with the following messages:

  • From the Helm CLI:
I0608 16:18:36.603860    2220 request.go:665] Waited for 1.098502108s due to client-side throttling, not priority and fairness, request: GET:https://104.198.26.247/apis/cert-manager.io/v1alpha2?timeout=32s
  • Events from the hub pod, which is in CrashLoopBackOff:
Events:
  Type     Reason     Age                   From                                   Message
  ----     ------     ----                  ----                                   -------
  Normal   Scheduled  2m23s                 gke.io/optimize-utilization-scheduler  Successfully assigned binder-staging/hub-66958dc964-mtd6s to gke-pilot-hubs-cluster-core-pool-d184c825-vhck
  Warning  Unhealthy  107s (x3 over 2m13s)  kubelet                                Readiness probe failed: Get "http://10.0.1.33:8081/hub/health": dial tcp 10.0.1.33:8081: connect: connection refused
  Normal   Created    75s (x4 over 2m13s)   kubelet                                Created container hub
  Normal   Started    74s (x4 over 2m13s)   kubelet                                Started container hub
  Warning  BackOff    48s (x9 over 2m8s)    kubelet                                Back-off restarting failed container
  Normal   Pulled     33s (x5 over 2m14s)   kubelet                                Container image "jupyterhub/k8s-hub:1.1.2" already present on machine
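The readiness failure above is the kubelet getting "connection refused" on `/hub/health` at port 8081, meaning nothing was listening yet. A rough Python equivalent of that HTTP probe, handy for checking a hub by hand (for example through `kubectl port-forward`). The name `hub_is_ready` is hypothetical, and it only approximates the kubelet, which counts any 200-399 status as success:

```python
import urllib.request


def hub_is_ready(host: str, port: int = 8081, timeout: float = 2.0) -> bool:
    """Roughly mimic the hub pod's HTTP readiness probe (GET /hub/health)."""
    url = f"http://{host}:{port}/hub/health"
    # The kubelet connects directly, so ignore any proxy environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            # Kubernetes httpGet probes treat 200-399 as success.
            return 200 <= response.status < 400
    except OSError:
        # URLError (including connection refused), HTTPError for 4xx/5xx and
        # timeouts are all OSError subclasses: report "not ready".
        return False
```

While the hub process is crash-looping, as here, this returns `False` for the same reason the probe failed.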

@sgibson91 (Member Author)

The above looks like the following PR, but we have not enabled the prePuller or userPlaceholders.

@sgibson91 (Member Author)

From the hub logs:

k logs -c hub hub-66958dc964-mtd6s --previous
Loading /usr/local/etc/jupyterhub/secret/values.yaml
No config at /usr/local/etc/jupyterhub/existing-secret/values.yaml
Loading extra config: 0-binderspawnermixin
Loading extra config: 00-binder
Loading extra config: 01-custom-theme
Loading extra config: 02-custom-admin
[E 2022-06-08 16:01:34.455 JupyterHub app:2969]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/app.py", line 2966, in launch_instance_async
        await self.initialize(argv)
      File "/usr/local/lib/python3.8/dist-packages/jupyterhub/app.py", line 2457, in initialize
        self.load_config_file(self.config_file)
      File "/usr/local/lib/python3.8/dist-packages/traitlets/config/application.py", line 87, in inner
        return method(app, *args, **kwargs)
      File "/usr/local/lib/python3.8/dist-packages/traitlets/config/application.py", line 775, in load_config_file
        for (config, filename) in self._load_config_files(filename, path=path, log=self.log,
      File "/usr/local/lib/python3.8/dist-packages/traitlets/config/application.py", line 737, in _load_config_files
        config = loader.load_config()
      File "/usr/local/lib/python3.8/dist-packages/traitlets/config/loader.py", line 616, in load_config
        self._read_file_as_dict()
      File "/usr/local/lib/python3.8/dist-packages/traitlets/config/loader.py", line 648, in _read_file_as_dict
        exec(compile(f.read(), conf_filename, 'exec'), namespace, namespace)
      File "/usr/local/etc/jupyterhub/jupyterhub_config.py", line 446, in <module>
        exec(config_py)
      File "<string>", line 3, in <module>
    ModuleNotFoundError: No module named 'jupyterhub_configurator'

@sgibson91 (Member Author)

Commenting out all the custom extraConfig scripts got the hub pod out of CrashLoopBackOff.
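The hub log explains the crash loop: the chart's `jupyterhub_config.py` prints `Loading extra config: <key>` and then runs `exec(config_py)` for each `hub.extraConfig` snippet, and one custom snippet imports `jupyterhub_configurator`, which is not importable in the `jupyterhub/k8s-hub:1.1.2` image, so the whole config load fails. A minimal sketch of that loop, assuming snippets run in sorted key order (consistent with the log). The `skip_missing` switch is a hypothetical illustration, not a Z2JH option:

```python
def load_extra_config(extra_config: dict, skip_missing: bool = False) -> list:
    """Exec hub.extraConfig-style snippets one by one and return the keys loaded."""
    namespace: dict = {}
    loaded = []
    for key, config_py in sorted(extra_config.items()):
        print(f"Loading extra config: {key}")
        try:
            exec(config_py, namespace)
        except ModuleNotFoundError as err:
            if not skip_missing:
                raise  # what happened on the hub: startup aborts and the pod restarts
            print(f"Skipping {key}: {err}")
            continue
        loaded.append(key)
    return loaded
```

With the real behaviour (no skipping), a single missing import is fatal, which is why disabling the custom scripts was enough to get the hub running.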

@sgibson91 (Member Author)

The Binder is up at https://binder-staging.2i2c.cloud/. The image build and push succeeded, but the spawn failed with:

Spawn failed: (403)
Reason: error
HTTP response headers: HTTPHeaderDict({'Audit-Id': '9b2b8bc3-ab2b-4fed-8d75-0fcacf2b86a4', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '6076e3d8-fb9c-4af6-bc16-1828eab8bf98', 'X-Kubernetes-Pf-Prioritylevel-Uid': '3b5efc35-8cfb-4eeb-8300-d32a394811b8', 'Date': 'Wed, 08 Jun 2022 16:33:06 GMT', 'Content-Length': '369'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"jupyter-binder-2dexamples-2drequirements-2dat9cvbbg\" is forbidden: error looking up service account binder-staging/user-sa: serviceaccount \"user-sa\" not found","reason":"Forbidden","details":{"name":"jupyter-binder-2dexamples-2drequirements-2dat9cvbbg","kind":"pods"},"code":403}
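The response body is a Kubernetes API `Status` object: the pod was rejected because its spec names a service account, `user-sa`, that does not exist in the `binder-staging` namespace. A small sketch that pulls this out of the body. The helper name is hypothetical, and the regular expression is tuned to this exact message wording, which is an assumption rather than a stable API contract:

```python
import json
import re


def explain_k8s_status(body: str) -> str:
    """Summarise a Kubernetes API Status body such as the one logged above."""
    status = json.loads(body)
    message = status.get("message", "")
    # Matches e.g. "error looking up service account binder-staging/user-sa:"
    match = re.search(r"error looking up service account ([^/\s]+)/([^:\s]+):", message)
    if status.get("code") == 403 and match:
        namespace, name = match.groups()
        return f"missing service account {namespace}/{name}"
    return f"{status.get('code')} {status.get('reason')}: {message}"
```

Either creating `user-sa` in that namespace or not referencing it at all avoids the 403.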

@sgibson91 (Member Author) commented Jun 8, 2022

Removed singleuser.serviceAccountName to circumvent the above issue

@sgibson91 force-pushed the deploy-test-binder branch from b384c5c to 6543a9c on June 15, 2022 08:49
@sgibson91 changed the title from "Deploying a test BinderHub" to "Deploying a MVP BinderHub" on Jun 15, 2022
@sgibson91 (Member Author) commented Jun 15, 2022

Current problems

@GeorgianaElena (Member)

> However, a dask cluster is never successfully created. I can't remember if we need a service account or not for that to work - can anyone remind me?

hmm, I just tried and it seems like it's starting a cluster for me?
[Screenshot, 2022-06-15 17:12:25]

@GeorgianaElena (Member)

> Error: execution error at (binderhub/charts/dask-gateway/templates/gateway/secret.yaml:10:27): gateway.auth.jupyterhub.apiToken must be defined when using jupyterhub auth

I believe that until dask/dask-gateway#473 is solved, a token must be manually generated and explicitly set for the jupyterhub gateway service. The deployer code below, which does this for daskhubs, probably makes more sense than what I'm saying:

# FIXME: This section can be removed upon resolution of the below linked issue, where we would
# instead just define a JupyterHub service under hub.services and
# rely on the JupyterHub Helm chart to generate an api token if
# needed.
#
# Blocked by https://github.com/dask/dask-gateway/issues/473 and a
# release including it.
#
if hub_helm_chart == "daskhub":
    gateway_token = hmac.new(
        secret_key, b"gateway-" + self.spec["name"].encode(), hashlib.sha256
    ).hexdigest()
    generated_config["dask-gateway"] = {
        "gateway": {"auth": {"jupyterhub": {"apiToken": gateway_token}}}
    }
    generated_config["basehub"].setdefault("jupyterhub", {}).setdefault(
        "hub", {}
    ).setdefault("services", {})["dask-gateway"] = {"apiToken": gateway_token}

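A self-contained sketch of what the deployer snippet does, so it can be tried outside the deployer. The token is an HMAC-SHA256 of `b"gateway-" + <hub name>` keyed with a secret, so it stays stable across deploys, and the same value is written to both the dask-gateway values and the JupyterHub `hub.services` entry. The function names and the `jupyterhub_path` parameter are hypothetical; `("basehub", "jupyterhub")` mirrors the daskhub case, and the right path for the BinderHub chart is an assumption to check:

```python
import hashlib
import hmac


def gateway_api_token(secret_key: bytes, hub_name: str) -> str:
    """Derive a stable per-hub token: HMAC-SHA256 over b"gateway-<hub name>"."""
    return hmac.new(secret_key, b"gateway-" + hub_name.encode(), hashlib.sha256).hexdigest()


def inject_gateway_token(generated_config: dict, token: str, jupyterhub_path: tuple) -> dict:
    """Put the same token on the dask-gateway side and on the JupyterHub service."""
    generated_config["dask-gateway"] = {
        "gateway": {"auth": {"jupyterhub": {"apiToken": token}}}
    }
    node = generated_config
    for key in jupyterhub_path:  # walk (and create) the path to the JupyterHub values
        node = node.setdefault(key, {})
    node.setdefault("hub", {}).setdefault("services", {})["dask-gateway"] = {
        "apiToken": token
    }
    return generated_config
```

Because the token is derived rather than random, redeploying does not rotate it, and both charts always agree on its value.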

@sgibson91 (Member Author) commented Jun 15, 2022

> hmm, I just tried and it seems like it's starting a cluster for me?

Ah, that's because I pasted the wrong link! The original (now updated) link pointed to dask-staging instead of binder-staging. Try this one?

https://binder-staging.2i2c.cloud/v2/gh/pangeo-gallery/default-binder/master?urlpath=git-pull%3Frepo%3Dhttps%253A%252F%252Fgithub.com%252FTomAugspurger%252Fpangeo-dask-gateway%26urlpath%3Dtree%252Fpangeo-dask-gateway%252F%26branch%3Dmaster
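For reference, this link nests one URL inside another: BinderHub launches the `pangeo-gallery/default-binder` environment at `master`, and the `urlpath` sends the user to nbgitpuller's `git-pull` endpoint, which pulls `TomAugspurger/pangeo-dask-gateway` and opens its tree. A small sketch that decodes such a link with the standard library; `decode_binder_link` is a hypothetical helper that assumes the `/v2/<provider>/<org>/<repo>/<ref>` path form:

```python
from urllib.parse import parse_qs, urlsplit


def decode_binder_link(url: str) -> dict:
    """Split a BinderHub launch link with an nbgitpuller urlpath into its parts."""
    parts = urlsplit(url)
    # Path looks like /v2/<provider>/<org>/<repo>/<ref>; maxsplit keeps slashes in <ref>.
    _, _version, provider, org, repo, ref = parts.path.split("/", 5)
    # parse_qs undoes one level of percent-encoding, exposing "git-pull?repo=...".
    urlpath = parse_qs(parts.query)["urlpath"][0]
    inner = urlsplit(urlpath)
    # A second parse_qs decodes nbgitpuller's own parameters (repo, urlpath, branch).
    pull_args = {key: values[0] for key, values in parse_qs(inner.query).items()}
    return {"provider": provider, "spec": f"{org}/{repo}/{ref}", "handler": inner.path, **pull_args}
```

Applied to the link above, this yields `repo` `https://github.com/TomAugspurger/pangeo-dask-gateway`, `urlpath` `tree/pangeo-dask-gateway/` and `branch` `master`, which makes it easy to spot a link pointing at the wrong hub or repository.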

@sgibson91 (Member Author) commented Jun 15, 2022

> I believe that until dask/dask-gateway#473 is solved, a token must be manually generated and explicitly set for the jupyterhub gateway service.

So do we actually set this manually for other dask hubs in their values files? I don't think we do; we let the deployer generate one. But if the deployer generates the token, how do those hubs pass the validation check, given that validation runs before the deployer generates and inserts the token?

Am I misunderstanding something?

I added code to generate and inject the API token for BinderHub here:

And I can deploy the hub fine as long as I skip the validation check. See 69f712d.

@GeorgianaElena (Member)

@sgibson91, validation works for other daskhubs because of this line (I had no idea about it and had never noticed it):

cmd.append("--set=dask-gateway.gateway.auth.jupyterhub.apiToken=dummy")

@sgibson91 (Member Author)

Ahhhh, so I need a conditional there for BinderHub too - thank you @GeorgianaElena! <3
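A sketch of the conditional being discussed, assuming the validation step shells out to `helm template` (the `--values` and `--set` flags are standard Helm; the function name, the `DASK_GATEWAY_CHARTS` set and the exact command layout are hypothetical). Charts that bundle dask-gateway get the same placeholder token as daskhubs, because the real token is only generated at deploy time, after validation:

```python
# Hypothetical: charts whose templates require dask-gateway's apiToken.
DASK_GATEWAY_CHARTS = {"daskhub", "binderhub"}


def build_validation_cmd(chart_dir: str, values_files: list, hub_helm_chart: str) -> list:
    """Build a `helm template` command that renders a hub's values for validation."""
    cmd = ["helm", "template", chart_dir]
    cmd.extend(f"--values={path}" for path in values_files)
    if hub_helm_chart in DASK_GATEWAY_CHARTS:
        # The deployer injects the real token later, so a dummy value is enough
        # to get past the chart's "apiToken must be defined" check.
        cmd.append("--set=dask-gateway.gateway.auth.jupyterhub.apiToken=dummy")
    return cmd
```

Keeping the chart names in one set means a future chart that bundles dask-gateway needs a one-word change rather than another `if` branch.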

@yuvipanda (Member) left a comment

This looks great to me!

Review threads (outdated, resolved): helm-charts/binderhub/values.yaml, deployer/config_validation.py
sgibson91 and others added 7 commits June 16, 2022 10:40
@sgibson91 (Member Author)

Got a warning in the logs about Calico:

2022-06-22T09:11:06Z [Warning] Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "1b7ca330c53fd745eeae4e2c6d6145e4ec90293d784b615366dfa70466ece06b": stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/

@sgibson91 marked this pull request as ready for review June 22, 2022 09:18
@sgibson91 requested a review from yuvipanda June 22, 2022 09:18
@sgibson91 (Member Author) commented Jun 22, 2022

Other than the above comment about Calico, this is deployed and ready for review! 🎉

@yuvipanda (Member) left a comment

\o/ Let's go! THANKS @sgibson91!

@yuvipanda (Member)

The Calico warning is mostly harmless: it means the pod got scheduled on the node before the network policy component (calico/node) had a chance to start up.

@sgibson91 merged commit 1f76a48 into 2i2c-org:master Jun 22, 2022
@sgibson91 deleted the deploy-test-binder branch June 22, 2022 16:34
@github-actions

🎉🎉🎉🎉

Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/workflows/deploy-hubs.yaml?query=branch%3Amaster
