# Technical draft for dynamic image building

This document includes an overview of what binderhub and tljh-repo2docker do, and a draft idea of what may be reasonable to develop sustainably.

---

The idea of this document was outlined in https://github.com/2i2c-org/infrastructure/issues/1598.

> The goal we have with the notes is to provide an overview of the UX and tech available via tljh-repo2docker / binderhub, and how they couple to various core jupyterhub functionalities and specific implementations of Spawners for example.
>
> The value we see for such an overview is that we think it would be helpful when trying to scope what we are to do going onwards more clearly. We hope that such an overview can for example help us realize better what can be re-used to meet goals etc.

## Overview of existing code

### [jupyterhub/binderhub](https://github.com/jupyterhub/binderhub)

- [Helm chart](https://github.com/jupyterhub/binderhub/tree/master/helm-chart)
  - ([link](https://github.com/jupyterhub/binderhub/blob/b78cac34ee6a2324822d9596137462b022d7dec9/helm-chart/binderhub/Chart.yaml#L14-L16)) Installs the JupyterHub Helm chart as a dependency.
  - ([link](https://github.com/jupyterhub/binderhub/blob/b78cac34ee6a2324822d9596137462b022d7dec9/helm-chart/binderhub/values.yaml#L52-L226)) Sets the Authenticator to NullAuthenticator and the Spawner to BinderSpawner (derived from KubeSpawner), and overrides some default configuration.
  - ([link](https://github.com/jupyterhub/binderhub/blob/b78cac34ee6a2324822d9596137462b022d7dec9/helm-chart/binderhub/values.yaml#L260-L277)) Configures and deploys [jupyterhub/docker-image-cleaner](https://github.com/jupyterhub/docker-image-cleaner) to run as a privileged pod, once per node, to clean up the docker runtime controlled by the k8s node.
  - ([link to config](https://github.com/jupyterhub/binderhub/blob/b78cac34ee6a2324822d9596137462b022d7dec9/helm-chart/binderhub/values.yaml#L243-L258), [link to docs](https://github.com/jupyterhub/binderhub/blob/master/doc/zero-to-binderhub/setup-binderhub.rst#use-docker-inside-docker-dind)) Optionally configures `dind` (docker-in-docker) to expose a custom versioned docker runtime that the build pods can make use of.
- [Python package](https://github.com/jupyterhub/binderhub/tree/master/binderhub)
  - ([link](https://github.com/jupyterhub/binderhub/blob/master/binderhub/app.py#L71)) The BinderHub tornado application allows you to start a repo2docker build followed by a JupyterHub server launch.
  - ([link](https://github.com/jupyterhub/binderhub/blob/b78cac34ee6a2324822d9596137462b022d7dec9/binderhub/build.py#L52-L576)) `Build` is a class that creates a monitored (status, logs) k8s Pod that in turn builds and pushes an image using repo2docker.

### [plasmabio/tljh-repo2docker](https://github.com/plasmabio/tljh-repo2docker)

The single computer distribution of JupyterHub called The Littlest JupyterHub [supports plugins](https://tljh.jupyter.org/en/latest/contributing/plugins.html), and tljh-repo2docker is such a plugin. As seen [in the plugin](https://github.com/plasmabio/tljh-repo2docker/blob/d711bda82942d55eb14395993c1d63043294fae8/tljh_repo2docker/__init__.py#L169-L208), it adds JupyterHub configuration to:

- Use a DockerSpawner class instead of the TLJH default [jupyterhub/systemdspawner](https://github.com/jupyterhub/systemdspawner)
- [Register](https://github.com/plasmabio/tljh-repo2docker/blob/d711bda82942d55eb14395993c1d63043294fae8/tljh_repo2docker/__init__.py#L196-L208) additional webpage handlers such as /environments where users can pre-build environments and declare cpu/memory resources for them.
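
To make the two configuration changes listed above concrete, here is a minimal sketch of what such plugin configuration could look like. It is not the plugin's actual code: `EnvironmentsHandler` and `apply_plugin_config` are hypothetical names, plain `dockerspawner.DockerSpawner` stands in for whatever spawner class the plugin really configures, and a `SimpleNamespace` stands in for the traitlets config object that JupyterHub normally provides. Only `c.JupyterHub.spawner_class` and `c.JupyterHub.extra_handlers` are real JupyterHub options.

```python
from types import SimpleNamespace


class EnvironmentsHandler:
    """Hypothetical placeholder for a tornado handler serving /environments."""


def apply_plugin_config(c):
    """Apply plugin-like settings to a JupyterHub config object `c` (sketch)."""
    # Replace the TLJH default spawner (systemdspawner) with a Docker based one.
    # Assumption: the plain DockerSpawner import string is used as a stand-in.
    c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
    # Register extra pages served by the hub, as (url pattern, handler) pairs.
    c.JupyterHub.extra_handlers = [(r"environments", EnvironmentsHandler)]
    return c


# Stand-in for the config object; in a TLJH plugin this would be passed in
# by the hub rather than constructed by hand.
config = SimpleNamespace(JupyterHub=SimpleNamespace())
apply_plugin_config(config)
```

The point of the sketch is the coupling it shows: both settings live on the hub's own configuration, so this approach ties the build UI to the hub process and to a specific Spawner choice.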
### [BinderHub configured with authentication and persistent storage](https://discourse.jupyter.org/t/a-persistent-binderhub-deployment/2865)

## Overview of related core JupyterHub functionality

- JupyterHub's `/hub/spawn` page
  - ([link](https://github.com/jupyterhub/jupyterhub/blob/3b59c4861f155f868bcf29c00dfa78034d289950/jupyterhub/handlers/pages.py#L84-L90)) The SpawnHandler class manages GET and POST requests to /spawn, where GET requests render a form and POST requests spawn a server.
  - ([link](https://github.com/jupyterhub/jupyterhub/blob/3b59c4861f155f868bcf29c00dfa78034d289950/jupyterhub/handlers/pages.py#L203-L211)) If the Spawner class returns something truthy from the `get_options_form()` method, options are presented.
- The Spawner class is tightly coupled with the `/hub/spawn` page.
  - ([link](https://github.com/jupyterhub/jupyterhub/blob/3b59c4861f155f868bcf29c00dfa78034d289950/jupyterhub/spawner.py#L400-L444)) The Spawner base class provides a default implementation of `get_options_form()` that returns the `options_form` configuration.
  - ([link](https://github.com/jupyterhub/kubespawner/blob/564111c09a093005394e517fd6b786d29a53f4f2/kubespawner/spawner.py#L2862-L2876)) KubeSpawner provides `profile_list` configuration that manipulates `options_form` to present more choices, and `options_from_form` to parse data selected in those choices.
- [JupyterHub Services ("externally managed")](https://jupyterhub.readthedocs.io/en/stable/reference/services.html#properties-of-a-service) (at `/services/<something>`)
  - Via `c.JupyterHub.services`, one can make JupyterHub proxy traffic to `/services/<some-service>` to some other webserver and also expose it via a top menu seen in `/hub` pages after logging in.
  - If the service requires information about the user, the user will need to authorize the service first via a JupyterHub page.
    Possibly this can be bypassed by declaring the service as trusted [in the future](https://github.com/jupyterhub/jupyterhub/issues/4007).

## Overview of goals

- Enable a UX for existing JupyterHub deployments (at least z2jh based) to build images using repo2docker.
- Avoid locking in more than needed into how JupyterHub is deployed by avoiding coupling with JupyterHub distribution and Spawner choice.
- Avoid forking and creating a new project too similar to other projects.
- Enable a UX to make use of user-built images.
- Enable a UX to configure more spawn option details.

## Takeaway points

1. **Multi-node image building adds lots of complexity:** There is significant complexity stemming from having a service that doesn't build images itself but distributes the work to workers in k8s Pods, like BinderHub does: the service must create pods, access logs from them, etc.
2. **Multi-node use of images adds some complexity:** If you need to consume images on separate machines, you need to have them accessible from some central location - a container registry. If that is needed, we also need to manage access to write and read from it.
3. **tljh-repo2docker could avoid some complexity we can't avoid:** we may not need to build the images on different computers, but we must at least support accessing them from different computers.

## Idea draft 2022-08-09 - a repo2docker service pushing to a container registry

A standalone service (webserver application) registered and accessed as a jupyterhub service under `/services/repo2docker`, that would allow users to build images via repo2docker and push them to a container registry. The service would also be able to describe which images are available via an API. This is meant to enable a JupyterHub admin to populate image choice options in a KubeSpawner `profile_list` or more generally via the Spawner's `get_options_form()` method.
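
As an illustration of how an admin could populate `profile_list` from such a service, the sketch below converts a list of image records into KubeSpawner profile entries. The record shape (`name` and `image` keys), the registry hostname, and the helper name `images_to_profile_list` are assumptions made for this example, since the service and its API do not exist yet. The profile keys (`display_name`, `slug`, `description`, `default`, `kubespawner_override`) are the ones KubeSpawner's `profile_list` accepts.

```python
from typing import Dict, List


def images_to_profile_list(images: List[Dict[str, str]]) -> List[dict]:
    """Turn image records into KubeSpawner `profile_list` entries (sketch).

    Each record is assumed to look like:
    {"name": "alice/my-env", "image": "registry.example.org/alice/my-env:abc123"}
    """
    profiles = []
    for index, record in enumerate(images):
        profiles.append(
            {
                "display_name": record["name"],
                # Slugs identify the choice in the spawn form; avoid slashes.
                "slug": record["name"].replace("/", "-"),
                "description": f"Built with repo2docker: {record['image']}",
                # Mark the first image as the preselected choice.
                "default": index == 0,
                # The only spawner setting overridden per profile is the image.
                "kubespawner_override": {"image": record["image"]},
            }
        )
    return profiles


# Hypothetical payload; a real deployment would fetch this from the service.
example_images = [
    {"name": "alice/my-env", "image": "registry.example.org/alice/my-env:abc123"},
    {"name": "bob/r-env", "image": "registry.example.org/bob/r-env:def456"},
]
profile_list = images_to_profile_list(example_images)
# In jupyterhub_config.py this could then be assigned as:
# c.KubeSpawner.profile_list = profile_list
```

Keeping the conversion as a plain function means the service only needs to expose image metadata, and the coupling to KubeSpawner stays in the admin's own configuration.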

This idea is meant to be composable with other ideas to meet more goals, but very strictly scoped to not try to meet the goal of allowing users to configure spawn options directly.

In scope for this:

- create a python based web-application enabling users to run repo2docker, and optionally also upload the result to a container registry
- create a helm chart that deploys this web-application
- write documentation in general and provide an example on how it can be used with z2jh

### Further work and feedback

There are plenty of things to discuss, but here are some questions to get started.

#### General input

- Scope: too large, about right, or too narrow?
- Implementation: technically reasonable and open source sustainable?
- Value: is it sufficiently valuable to invest time in?

#### Technical clarifications

- What needs to be configurable?
- Does the service need to manage user identity?
  - Yes, because otherwise everyone can visit `/services/<our-service>`
- If we build and upload the images, should they be named in a way associated with the username?
- Should the user be allowed to freely choose the name of the uploaded image, or should it be namespaced based on the user to avoid conflicts, or should it be configurable like `pod_name_template` is?

#### Level of ambition: semver2 releases and changelog management

If we go for developing this, I think we should do it with semver2 versioned releases of both the Python package and the Helm chart that deploys it. Keeping everything in the same github repository is reasonable in my mind, but we could also split the helm chart from the python package.