# Unify binderhub helm chart with zero to jupyterhub helm chart
Maintaining user images is the most time-consuming, 'toil'-like part of
operating a JupyterHub with z2jh. Users need to wait on hub operators,
and hub operators have to spend time doing package management work -
in many cases, this is a lose-lose situation. Giving control of the environment
to users is one of the most wished-for features among zero to jupyterhub operators. Users
should be able to 'bring their own environments', and share them among themselves
if necessary. Hub operators should be able to focus on running clusters,
security, performance and tuning - not on building user images!
BinderHub offers an amazingly user-friendly, very widely adopted solution to
this problem of letting users 'bring their own environment'. The current prevalent view of the difference
between JupyterHub and BinderHub is informed by the most popular
BinderHub deployment - at mybinder.org. Users see binders as *transient*,
*non-authenticated* and with *limited resources*. BinderHub is also
seen as something *distinctly separate* from a JupyterHub - partially
because no parts of JupyterHub are user visible on mybinder.org!
The amazing work done to make BinderHub work with authenticated JupyterHubs
(MAKE SURE TO LINK TO APPROPRIATE CREDIT HERE. Arnim and the GESIS folks, I think? Find out!)
solves so many of these problems, and I (among others; credit and reference folks, particularly Ryan) believe this is
a big step forward in JupyterHub deployments. Doing the social and
technical work required to make it easy for hub operators to *enable
dynamic image building functionality* with just config on any JupyterHub
deployed with z2jh will help drive adoption, and make JupyterHub users happy.
From a technical perspective, the binderhub helm chart should be
merged with the z2jh helm chart. BinderHub, the software project, is not affected.
The hope is for BinderHub to become a piece of software (like JupyterHub) with
multiple distributions (like TLJH, z2jh).
This *must* be done in a way that does not increase the maintenance burden for
current maintainers of either z2jh or binderhub.
At the end of this migration, this repository should contain *just*
the binderhub source code, and *no* helm charts.
- The image-cleaner doesn't change too often, and should move to its own
repository. That repository can publish a Docker image that we can refer
to from our helm chart.
- The primary BinderHub specific customization we have is the ability to
run user servers without the `jupyterhub-singleuser` command present,
by starting `jupyter-notebook` instead. This could be absorbed into
kubespawner, or we can start requiring `jupyterhub-singleuser` to be
present in the user image. A more ambitious approach is to support
running with a sidecar doing auth instead of requiring `jupyterhub-singleuser`
or `jupyter-notebook` - although that will be tied to Kubernetes.
This is also not a blocker - we can continue to keep the small
amount of code required right now in the upstream `jupyterhub_config.py` behind
a flag as long as required.
- The code and helm-chart co-existing has helped make it easier to do
changes that require co-ordination between the helm chart and the python
code. Moving the helm-chart away will make development mimic JupyterHub
development. In the long run, I think this is a good thing - it'll
give binderhub more structured releases, and make it easier to do work
that lets it run without requiring Kubernetes. In the short run, we
can prevent extra maintenance work by keeping the Dockerfile for
BinderHub here, auto-publishing it on merge, and configuring
Dependabot to bump the image tag version in z2jh with each merge. We
should not add extra maintenance burden in terms of release management
as a pre-requisite for this work. The number of changes that
require co-ordination between the python and helm code is already
pretty low, and hopefully any extra co-ordination work required here
is not too much.
- This is a *breaking change* for current users of the binderhub chart, as they
will need to install a *different* chart. We must have a good migration
guide, and the migration should be as seamless as possible. Extra care must be taken
with those using authenticated binderhubs right now. However, there are a lot
more z2jh installations than binderhub installations, and hopefully those
running binderhub will be able to benefit from the broader community
support with z2jh.
- mybinder.org is the largest installation of BinderHub, and explicit
effort must be put into helping it migrate without a lot of burden on
the existing maintainers.
- People are already confused about the differences between 'BinderHub',
'JupyterHub' and 'mybinder.org' (among others), and this has the potential
to either make the situation clearer or murkier. Effort must be put into
communicating this change properly, so operators know what they are getting!
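
The `jupyterhub-singleuser` customization mentioned in the list above could stay behind a flag, roughly as in the minimal sketch below. The function name `select_cmd` and its `require_singleuser` flag are hypothetical and are not existing z2jh or BinderHub options. The sketch only illustrates the choice between the two start commands, not how the current chart implements it.

```python
# Hypothetical sketch: pick the user-server start command behind a flag.
# `select_cmd` and `require_singleuser` are illustrative names only.

def select_cmd(require_singleuser: bool) -> list[str]:
    """Return the command used to start the user server."""
    if require_singleuser:
        # Standard JupyterHub behaviour: the user image must ship
        # the `jupyterhub-singleuser` command.
        return ["jupyterhub-singleuser"]
    # BinderHub-style behaviour: start the plain notebook server instead,
    # so images without `jupyterhub-singleuser` still work.
    return ["jupyter-notebook"]


# In `jupyterhub_config.py`, where `c` is the config object that JupyterHub
# provides, this might be wired up along the lines of:
#     c.KubeSpawner.cmd = select_cmd(require_singleuser=False)
```

Keeping the choice in one small function would make it easy to flip the default later, or to drop the `jupyter-notebook` branch entirely if `jupyterhub-singleuser` becomes a requirement for user images.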
This process should help increase the number of binderhub installs in the
wild, and drive more development and community engagement forward. For example,
better customization of the home page, deeper integration with named servers,
finer-grained authorization on which users can launch which repositories,
etc. This merging work is the *beginning*, not the end!
# THOUGHTS FROM EARLIER DRAFTS
## Thoughts from Chris
- I think that "enable dynamic image building on a JupyterHub" is a great user-facing feature
- We should loop in Arnim from GESIS on this and see what he thinks (they're the ones that drove authenticated BinderHub, I believe)
- Julie from OpenScapes also really wants this
- I think it could be useful if you had a separate section about "conditions that need to be met after combining the charts". There are a few places where you make statements like "this needs to reduce maintenance burden, not grow it", etc. It could be helpful if that was a short bullet list somewhere
- It would also be useful if we had some ideas of what "reduced maintenance burden" would look like. E.g., is it some kind of "helm chart complexity"? (we don't want the combined helm chart to be a huge frankenstein!)
- More generally I feel like a JEP template could be a nice way to structure this information for productive feedback (https://github.com/jupyter/enhancement-proposals/blob/master/jupyter-enhancement-proposal-guidelines/JEP-TEMPLATE.md)
- Would this mean that BinderHub effectively ceases to exist as a standalone entity? What would "deploy a BinderHub" mean anymore? Is a BinderHub now just "A JupyterHub with transient user servers and image auto-building flags turned on"?
- This would likely be a significant effort to migrate mybinder.org over (and that service has almost no support at all), I think it's worth recognizing this. (e.g., our binderbot would now become I guess a "jupyterbot" since we'd need to upgrade every time a new JupyterHub helm chart was released)
- I wonder if we could also de-couple the *files* from the *environment* in a Binder context.
- People have been asking for this for a long time (especially because people often update files, but don't often update the environment, and this harms build time substantially)
- Moreover, if you're dynamically building images on a JupyterHub, you probably are more likely to want to bring *your* files to that image, rather than use whatever files are baked into the image.
- I also want to keep in mind Ryan's vision of "JupyterHubs don't have their own filesystems anymore". Could this fit into that vision as well? It would certainly simplify hub deployments if everybody's work was stored in a repository that they pulled/pushed from, rather than a filesystem that was hub-specific (or cloud storage or something).
- I don't want to scope-creep this idea, but just a thought in case this would be a natural time to explore this as well.
RESPONSE FROM YUVI:
I was trying to make the document be explicit that it's not a radical change of what binderhub is, as much as maybe integrating the binderhub helm chart into the jupyterhub helm chart. BinderHub will continue to exist as is, and hopefully expand - and run on systems that don't require kubernetes. This document thus needs more work.
## Thoughts from Erik (2021-09-12)
> [name=Erik Sundell] Kenan (@bitnik) and Arnim (@arnim) have put in a lot of effort into the auth logic I think.
> [name=Erik Sundell] a "should be merged" statement about z2jh/binderhub helm charts makes me a bit hesitant. The gist is that I think of it as a technical statement rather than a goal focused statement, and that I haven't explored the technical options to accomplish the goal well enough yet.
> [name=Erik Sundell] The [image-cleaner](https://github.com/jupyterhub/binderhub/tree/master/helm-chart/images/image-cleaner): my understanding made explicit.
>- It is a Python script to monitor and clean the host computer's Docker build cache so the builders won't run out of space.
>- It is a docker image to run the image-cleaner Python script
>- It is a component of the BinderHub Helm chart, to run the docker image as a k8s DaemonSet - as one instance on each node
>
> It sounds very reasonable to make this a standalone repo, like jupyterhub/configurable-http-proxy is to Z2JH atm. Artifacts of the repo would then be a PyPI package and an image published to quay.io or some container registry.
> [name=Erik Sundell]
> About the following quote
>
> > [name=Yuvi Panda]
> > [...] the ability to run user servers without the `jupyterhub-singleuser` command present by starting `jupyter-notebook` instead.
>
> I lack understanding to reason clearly about this. A description of why this has been done historically and with references to the customization would be useful to me.
> [name=Erik Sundell]
> About a clarification of scope and mode of operations etc.
>
> I'm a bit confused about what it would mean to have a BinderHub as something you can opt-in to in a Z2JH deployment. What does it mean to do so? If you deploy a BinderHub, you get some defaults set automatically, and I don't think an opt-in to enable BinderHub should start overriding related Z2JH settings that BinderHub the Helm chart has defaulted to use, for example.
>
> Anyhow, below are some statements of what I think can be reasonable. By making these statements, I hope to clarify the open questions and the related technical complexities.
> 1. The BinderHub web UI exists alongside the JupyterHub UI still.
> 2. The BinderHub web UI uses the same authentication/authorization as the JupyterHub UI. So either both JupyterHub and BinderHub are anonymous for example, or neither is.
> 3. The BinderHub web UI is to be exposed via a /services/binder route and z2jh/tljh should not support having a separate domain for the BinderHub web UI.
> 4. No statement about user storage / git repo - just confusion about the situation and what to do.
>
> I'm very uncertain about user storage for a BinderHub. Typically the home folder is a git repo for BinderHub, but for JupyterHub the home folder is the user's home folder. What does the authenticated BinderHub currently do, and how is it done?
> [name=Erik Sundell]
> > [name=Chris Holdgraf]
> > (we don't want the combined helm chart to be a huge frankenstein!)
>
> Haha no I fully agree!
>
> I think it would be reasonable to maintain Helm templates for k8s resources to run binderhub-image-cleaner pods, binderhub-builder pods, and binderhub-web-ui - within the z2jh project. But, I think it's important that the actual code for all of these is part of Docker images, and that when these images get new versions, there is automation to update the version of the images in z2jh.
>
> Technically, I'd also like to consider some code hygiene by sectioning the Helm templates associated with the BinderHub feature as a sub-chart to the JupyterHub Helm chart under the optional `charts` folder. This is for example what the truly enormous GitLab Helm chart does to remain somewhat sane. The GitLab Helm chart has both internally defined Helm charts nested under a `charts` folder next to the `templates` folder, and externally defined charts it depends on listed in the `Chart.yaml` file. See the [GitLab Helm chart](https://gitlab.com/gitlab-org/charts/gitlab) and its `charts` folder with internally defined Helm charts and its `Chart.yaml` referencing externally defined Helm charts, and the [gitlab-runner Helm chart](https://gitlab.com/gitlab-org/charts/gitlab-runner) as an example of an externally defined Helm chart.
## From Erik & Yuvi conversations
- What does this mean for auth?
- What does this mean for persistent storage?