# Notes from speaking with Arnim

Find a way to summarize interactions in relevant upstream GitHub repos. This way, we can help anyone funding us understand what they get for their funding.

Authored PRs:

- binderhub: https://github.com/jupyterhub/binderhub/pulls?q=is%3Apr+author%3Aconsideratio+created%3A%3E2022-10-08+
- repo2docker: https://github.com/jupyterhub/repo2docker/pulls?q=is%3Apr+author%3Aconsideratio+created%3A%3E2022-10-08+

Make this a GitHub issue visible from https://github.com/orgs/2i2c-org/projects/33/views/1.

# Upstreaming and deploying a persistent binderhub-like experience

In this document, I describe my thoughts on how to meet two goals. As I understand it, 2i2c and GESIS have agreed to collaborate towards these goals. Because I joined 2i2c less than a month ago and haven't worn a 2i2c hat for long, I'm not fully confident about them.

1. To contribute to or establish open-source projects that have a good chance of becoming (or remaining) adopted in a way that motivates a sustainable maintenance effort by the surrounding community.
2. For 2i2c to deploy a [persistent_binderhub](https://github.com/gesiscss/persistent_binderhub) chart or something similar.

To meet these goals, I've outlined what I think is crucial while wearing both a JupyterHub team member hat and a 2i2c hat.

<details>
<summary>1. Minimize use of chart dependencies (click for details)</summary>

## Minimize use of chart dependencies

### How would it look if we had minimized chart dependencies?

Instead of a chart composition like:

```yaml
# grandparent -> parent -> child
persistent_binderhub:
  # ...
  binderhub:
    # ...
    jupyterhub:
      # ...
```

We would work with a chart composition like:

```yaml
# brother | sister
new_persistent_binderhub:
  # ...
jupyterhub:
  # ...
```

### Why?

My 2i2c hat says it makes it far easier for us to deploy this next to other things. My JupyterHub team hat says it makes the project far more attractive to contribute to long term.

- The `binderhub` chart is a complicated project by itself, so building further on top of it makes the result very complicated. I consider this complexity a big deal because it makes things harder to use, document, maintain, talk about, and so on.
- The `binderhub` chart removes the `jupyterhub` chart's user authentication and the users' persistent home directory storage, and the `persistent_binderhub` chart then reverts that logic. Bypassing the `binderhub` chart, while still using the binderhub software, can avoid this complexity entirely.
- As an opt-in binderhub service chart that can be deployed next to a JupyterHub, rather than stacked on an existing binderhub->jupyterhub chart bundle, it could become far easier for users to adopt.

### Reference: challenges of chart dependencies

Below, I'd like to detail two kinds of complexity that arise. First, here is some background on how things work, using `persistent_binderhub` as the example.

- [`persistent_binderhub` depends on `binderhub`](https://github.com/gesiscss/persistent_binderhub/blob/6fca456/persistent_binderhub/requirements.yaml#L1-L5)
- [`binderhub` depends on `jupyterhub`](https://github.com/jupyterhub/binderhub/blob/a3df72f/helm-chart/binderhub/Chart.yaml#L5-L16)
- [`jupyterhub` has no chart dependencies](https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/2.0.0/jupyterhub/Chart.yaml)

Helm charts can be configured via values when used, and they come with default values. For example, `persistent_binderhub` provides default values to [configure its direct dependency on `binderhub`](https://github.com/gesiscss/persistent_binderhub/blob/6fca456f0cb2711ead60d1dd583c43997c8ab66f/persistent_binderhub/values.yaml#L1-L23) and [also its indirect dependency on `jupyterhub`](https://github.com/gesiscss/persistent_binderhub/blob/6fca456f0cb2711ead60d1dd583c43997c8ab66f/persistent_binderhub/values.yaml#L52-L90). When installing a Helm chart, one typically provides configured values that override the default values.
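
The following hypothetical illustration shows one jupyterhub setting provided at each level of this layering. Each YAML document, separated by `---`, represents one values file, ordered from lowest to highest precedence. `singleuser.defaultUrl` is a real jupyterhub chart key, but the values below are made up and are not what these charts actually set.

```yaml
# Hypothetical values, ordered from lowest to highest precedence.
# The values are made up and are NOT the charts' actual defaults.

# 1. jupyterhub chart defaults (jupyterhub/values.yaml)
singleuser:
  defaultUrl: /tree
---
# 2. binderhub chart defaults, nested under its `jupyterhub` dependency
jupyterhub:
  singleuser:
    defaultUrl: /lab
---
# 3. persistent_binderhub chart defaults, nested one level deeper
binderhub:
  jupyterhub:
    singleuser:
      defaultUrl: /lab/tree/work
---
# 4. user-provided values (for example passed via `helm upgrade --install -f`)
binderhub:
  jupyterhub:
    singleuser:
      defaultUrl: /lab/tree/README.md
# Result: the jupyterhub chart renders with singleuser.defaultUrl set to
# /lab/tree/README.md, the value from the highest-precedence layer (4).
```

Helm coalesces these layers so that the highest-precedence value wins. Each added level of chart dependencies adds one more layer, and one more level of nesting, to reason about.
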
In the case of `persistent_binderhub`, this means there are four sources of values to consider: the user-provided values, and the three sets of default values from the linear chart dependency tree.

#### Complexity: configuration nesting

> Note that this can be managed, but it adds complexity as chart dependency trees grow.

If the `persistent_binderhub` chart configures the `jupyterhub` chart that the `binderhub` chart depends on, it will look like this:

```yaml
binderhub:
  jupyterhub:
    <config key>: <config value>
```

Handling chart nesting like this takes some additional work, because any jupyterhub config example needs to be adjusted so that it is nested under `binderhub.jupyterhub`.

#### Complexity: combining values means lists are overridden and dictionaries merged

> Note that this can be worked around, but it adds significant complexity as chart dependency trees grow.

Working with list values in charts can be troublesome. For example, suppose a user wants to add an entry to the jupyterhub chart's `extraVolumes` configuration in a `persistent_binderhub` deployment. The user's list would replace [the `extraVolumes` defined by `persistent_binderhub`](https://github.com/gesiscss/persistent_binderhub/blob/6fca456f0cb2711ead60d1dd583c43997c8ab66f/persistent_binderhub/values.yaml#L70-L79) instead of extending it. In effect, `persistent_binderhub` has made `extraVolumes` something the end user can't configure without interfering with the chart's own configuration.

Working with dictionaries in charts can also be troublesome. For example, if a Helm chart defines `nodeSelector` with a few keys and values, how would a chart depending on it remove one of those keys if needed? It can't.

</details>

<details>
<summary>2. Enabling opt-in adoption (click for details)</summary>

## Enabling opt-in adoption

The hope is to let users who have deployed the jupyterhub chart opt in to this functionality. It would provide a _binderhub service_ next to, and in addition to, a typical JupyterHub with authentication and persistent user home folders.

For this to work, the various parts need to be decoupled from each other so they are easier to compose:

1. Separate the charts by avoiding chart dependencies, so that Helm charts are deployed as siblings instead of as parent/child.
2. Avoid adding software that runs directly in the jupyterhub chart's hub pod, where the JupyterHub software runs. Instead, run it as a standalone external service registered with JupyterHub.

Splitting things into standalone parts doesn't always make things easier. There is often a need for configuration glue between the parts, and applying this glue needs to be documented well.

</details>
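
The following rough sketch shows the opt-in approach described above. A user of the jupyterhub chart could register a standalone binderhub-like service through the chart's `hub.services` and `hub.loadRoles` configuration, while a sibling Helm release deploys the service itself. The sketch assumes a z2jh 2.x release. The service name `binder`, its URL, the token placeholder, and the chosen scopes are all hypothetical and would depend on what the service actually needs.

```yaml
# jupyterhub chart (z2jh) values: register an external service that a
# separate, sibling Helm release deploys. The name, URL, token, and scopes
# below are hypothetical placeholders.
hub:
  services:
    binder:
      # In-cluster address of the sibling release's service (hypothetical)
      url: http://binder.example.svc.cluster.local
      # Token shared with the external service so it can call the JupyterHub API
      apiToken: "<replace-with-a-secret-token>"
  loadRoles:
    binder:
      # Let the service manage users and their servers
      # (assumed scopes; adjust to what the service actually requires)
      services:
        - binder
      scopes:
        - admin:users
        - servers
```

In this sketch, the configuration glue is the service URL and the shared token. The sibling release would also need to be configured with both of them.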