# Zero-to-Jupyterhub 13/05/22
**Aim**: Deploy a prototype JupyterHub on our `turingmybinder` subscription. We will keep it completely separate from the Binder `turing` cluster for now, and tear it down afterwards. We will follow [these docs](https://zero-to-jupyterhub.readthedocs.io/en/latest/).
### Co-working session 13/05:
We followed the docs and deployed a Kubernetes cluster. We deviated from the instructions as follows:
- In step 6 of [this guide](https://zero-to-jupyterhub.readthedocs.io/en/latest/kubernetes/microsoft/step-zero-azure.html) we are supposed to create a service principal associated (via `--scope`) with a `$VNET_ID`. We did not do this, because we are not Owners on the Azure subscription. Instead, we used an existing service principal (see software notes below). Presumably, when this was created it was also associated with a specific `$VNET_ID`. In the following steps we refer to the subnet, but not the vnet.
- We borrowed the following steps from deploying the staging cluster:
  - The service principal is encrypted in the repo under `mybinder.org-deploy/secrets/turing-auth-key-prod.json`.
  - `cat turing-auth-key-prod.json`
  - Logged in to the service principal using `az login --service-principal --username <sp-app-id> --password <sp-app-key> --tenant <tenant-id>`
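For reference, the step 6 we skipped would have looked roughly like this (a sketch only: the vnet name, resource group, and service-principal name below are placeholders, not what IT used for our actual service principal):

```shell
# Sketch of the skipped step: create a service principal scoped to the vnet.
# All names here are hypothetical placeholders.
VNET_ID=$(az network vnet show \
  --resource-group hub24 \
  --name hub24-vnet \
  --query id \
  --output tsv)

az ad sp create-for-rbac \
  --name hub24-sp \
  --role Contributor \
  --scopes $VNET_ID
```

Creating a service principal requires Owner (or equivalent role-assignment) rights on the scope, which is why we could not do this ourselves.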
### Co-working session 20/05/22
Aim: Resume working through the [docs](https://zero-to-jupyterhub.readthedocs.io/en/latest/), do:
- [x] setup helm
- as simple as `brew install helm`
- [x] install jupyterhub
```
helm upgrade --cleanup-on-fail \
  --install jupyterhub jupyterhub/jupyterhub \
  --namespace hub24 \
  --create-namespace \
  --version=1.2.0 \
  --values config.yaml
```
output:

Verifying (notes: the Helm chart needs a namespace to install into; `kubens` sets your active namespace for `kubectl`; `kgp` is an alias for `kubectl get pods`, provided by the zsh kubectl plugin):
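Concretely, verification looks something like this (assuming `kubens` and the zsh kubectl plugin are installed; the plain `kubectl` form works without any plugins):

```shell
# With kubens and the zsh kubectl plugin:
kubens hub24   # set the active namespace for kubectl
kgp            # alias for `kubectl get pods`

# Equivalent without plugins:
kubectl get pods --namespace hub24
```

The hub and proxy pods should reach `Running` status once the release is up.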

We can now modify our deployment by making a change to `config.yaml` and passing it to `helm upgrade` again (we can drop `--create-namespace` now that the namespace exists).
The [customization guide](https://zero-to-jupyterhub.readthedocs.io/en/latest/jupyterhub/customization.html#customization-guide) is useful to flick through.
Things to customize:
- **User Environment**: software & files available after log-in.
Next session aims:
- [ ] Configure login to use accounts.
- [ ] host it on a domain
---
### Software notes
- [more involved docs about software used](https://zero-to-jupyterhub.readthedocs.io/en/latest/resources/tools.html#tools)
- may be worth reading this note on [networking](https://docs.microsoft.com/en-us/azure/aks/use-network-policies#create-an-aks-cluster-and-enable-network-policy)
- Helm - tool for automating software deployment on kubernetes (analogous to a package manager, with packages being charts)
- Chart – Pre-configured template of Kubernetes resources.
- Release – A chart deployed to a Kubernetes cluster using Helm.
- Repository – Publicly available charts.
- Kubernetes - container-based scalable system for cloud applications.
**Service Principals**
- Essentially they are 'deployment accounts' that have been given the authority to deploy resources in Azure (that's the dummy version, probably all we need to know). Having a service principal avoids needing to authenticate individual users every time you want to deploy something: you do not need to transfer privileges to each user wanting to deploy resources, you can just give them the keys to the service principal. Most cloud providers have a similar concept. IT set up our service principal when Sarah G originally deployed the cluster.
**Helm**
- Package manager for Kubernetes
- Helm installs **charts** into Kubernetes, creating a new **release** for each installation. And to find new charts, you can search Helm chart **repositories**.
- A Helm chart is a bundle of YAML files which can be reused on other clusters, e.g. dev, staging & prod
- The [helm docs](https://helm.sh/docs/) are good and accessible.
Some useful Helm commands:
- `helm get manifest <release>` — print the Kubernetes manifests generated for a release.
- `helm install --debug --dry-run <release> <chart-dir>` — render the chart's templates without installing anything, useful for debugging.
Helm charts are structured like:

    wordpress/
      Chart.yaml          # A YAML file containing information about the chart
      LICENSE             # OPTIONAL: A plain text file containing the license for the chart
      README.md           # OPTIONAL: A human-readable README file
      values.yaml         # The default configuration values for this chart
      values.schema.json  # OPTIONAL: A JSON Schema for imposing a structure on the values.yaml file
      charts/             # A directory containing any charts upon which this chart depends
      crds/               # Custom Resource Definitions
      templates/          # A directory of templates that, when combined with values,
                          # will generate valid Kubernetes manifest files
      templates/NOTES.txt # OPTIONAL: A plain text file containing short usage notes
Helm charts have a default `templates/` folder which defines Kubernetes objects, a `Chart.yaml` file which describes the chart, and a `values.yaml` file which provides values for the templates.
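As a toy illustration of how values feed templates (a hypothetical minimal chart, not the JupyterHub one), a `values.yaml` like:

```yaml
# values.yaml (hypothetical example chart)
image:
  repository: nginx
  tag: "1.25"
replicaCount: 2
```

would be referenced from a template as `{{ .Values.image.repository }}:{{ .Values.image.tag }}`, and can be overridden at install time with `--values myconfig.yaml` or `--set image.tag=1.26`. This override mechanism is exactly what our `config.yaml` does to the JupyterHub chart's defaults.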
**Notes from [this intro video](https://www.youtube.com/watch?v=Zzwq9FmZdsU)**
- I liked the idea that a package manager is a way to share expertise.
### To connect to a running cluster:
```
# start it up on azure
az login
az aks get-credentials \
  --name hub24 \
  --resource-group hub24
kubectl get nodes
```
**Tips**
- if you are working with `zsh` use the kubectl plugin, it exposes a bunch of short-hand aliases.
### How to use Hub23:
https://github.com/alan-turing-institute/Hut23/wiki/How-to-use-Hub23:-the-Turing-BinderHub
### Co-working session 27/05/22
### Setting up DNS zone
- We want to use the hub23 DNS zone.
- So we created an A record called 'hub24' in the hub23 DNS zone and pointed it at the corresponding Kubernetes public IP resource in `MC_hub24_hub24_...` (the Azure-created node resource group).
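- A sketch of creating that A record with the CLI (the resource-group holding the DNS zone is a guess here; `<EXTERNAL-IP>` is the external IP of the `proxy-public` service, visible via `kubectl get svc proxy-public --namespace hub24`):
```shell
# Hypothetical resource-group name: check which group actually
# holds the hub23 DNS zone before running.
az network dns record-set a add-record \
  --resource-group hub23 \
  --zone-name hub23.turing.ac.uk \
  --record-set-name hub24 \
  --ipv4-address <EXTERNAL-IP>
```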
- Then in `config.yaml` we set the following as per the Security section in Administrator guide [here](https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/security.html#https):
```
proxy:
  https:
    enabled: true
    hosts:
      - hub24.hub23.turing.ac.uk
    letsencrypt:
      contactEmail: cmole@turing.ac.uk
```
- We then upgraded the helm chart using:
```
helm upgrade --cleanup-on-fail \
  --install jupyterhub jupyterhub/jupyterhub \
  --namespace hub24 \
  --create-namespace \
  --version=1.2.0 \
  --values config.yaml
```
- After waiting a few minutes we can access our JupyterHub at https://hub24.hub23.turing.ac.uk.
- We can log in with any username. The username/password combination is linked to a persistent volume, so the notebooks get saved.
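- The per-user storage can be inspected with `kubectl` (the chart creates one `claim-<username>` PVC per user by default):
```shell
# List the persistent volume claims backing user home directories.
kubectl get pvc --namespace hub24
```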
- We can explore this with Lens:
- if you have previously connected Lens to your local kubeconfig (`~/.kube/config`), then the cluster context will be added automatically and you can view hub24 in Lens.
### Setting up Authentication.
- :tada: we have set up authentication using a GitHub OAuth app, currently linked to `callummole`.
- We managed this by adding the following to `config.yaml`:
```
proxy:
  https:
    enabled: true
    hosts:
      - hub24.hub23.turing.ac.uk
    letsencrypt:
      contactEmail: cmole@turing.ac.uk
  # service:
  #   loadBalancerIP: hub24.hub23.turing.ac.uk
hub:
  config:
    GitHubOAuthenticator:
      client_id: <>
      client_secret: <>
      oauth_callback_url: https://hub24.hub23.turing.ac.uk/hub/oauth_callback
      allowed_organizations:
        - binderhub-test-org
      scope:
        - read:user
    JupyterHub:
      authenticator_class: github
```
- We have tested that I (Callum) can log in but Luke cannot (I am a member of binderhub-test-org).
- You need to make sure the client_id matches the OAuth app, and the client_secret matches the secret generated on GitHub.
- `helm get values jupyterhub` lists the user-supplied values.
- See authentication docs: https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/authentication.html
- These docs are also useful for: 1) [using oath apps for organisations](https://docs.github.com/en/account-and-profile/setting-up-and-managing-your-personal-account-on-github/managing-your-membership-in-organizations/requesting-organization-approval-for-oauth-apps), and 2) [oauth scopes](https://docs.github.com/en/developers/apps/building-oauth-apps/scopes-for-oauth-apps)
- Note that I needed to change scope to `read:org` for the `alan-turing-institute` authentication to work.
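- i.e. the relevant fragment of `config.yaml` becomes:
```yaml
hub:
  config:
    GitHubOAuthenticator:
      scope:
        - read:org
```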
- Currently, for some reason, authentication is restricted to public membership of ATI.
TODO:
- [x] Make presentation.
- [ ] Get an alan-turing-institute GitHub org admin to set up an OAuth app for us (or make us admins so we can do it ourselves — probably the best option so we can update the app later)
- [ ] migrate the authentication to alan-turing-institute organisation
- [ ] figure out the benefits of having this on the binder cluster (beyond cost).
- [ ] some sort of analysis of "best" configuration (storage size, cpu, etc)
- [ ] think about user environments (read config pages).
- [ ] setup github repository (or start using github repository) for up-to-date charts, issues, etc
- [ ] Revisit Sarah's ROADMAP to see if we understand it
----
### Upgrade the existing hub23 helm chart.
Using [hub23 upgrade](https://alan-turing-institute.github.io/hub23-deploy/binderhub/installing-binderhub-local-helm-chart.html#upgrading-hub23-chart).