---
tags: quansight, jupyterhub, plan
---
# The Unopinionatedest JupyterHub
(also the scratchiest JupyterHub, the minimalest JupyterHub,
the minimalist JupyterHub)
A minimal JupyterHub distribution that is unopinionated about
spawners and authenticators.
If you think of [TLJH](https://tljh.jupyter.org) as 'Ubuntu',
this distribution will be like [Gentoo](https://gentoo.org):
small enough that other distributions can build opinionated
things on top (like ChromeOS, CoreOS, etc.). Or maybe it's an
'undistribution'? Unclear, but you get the point.
## First use case
The first use case is a JupyterHub with batchspawner on
a Slurm cluster running on OpenHPC. We will also look for an
easy second use case to make sure we don't overfit to
the first one.
## Why?
TLJH is an opinionated JupyterHub distribution to run on
one VM with minimal dependencies. While it has a *lot* of
opinions, the plugin system is flexible enough to support
many other use cases. The base setup that is useful to
a lot of folks includes:
1. JupyterHub and proxy processes run independently to make
sure JupyterHub restarts don't impact running users.
2. All processes are supervised robustly (with systemd). They
start on system startup, restart when they fail, and send logs
to the standard system log (`journalctl`).
3. Minimal permissions for all hub processes, with a combination
of user accounts and systemd-based isolation.
4. Automated HTTPS with Let's Encrypt support.
5. Opinions on directory structures - where environments are set
up, where packages are installed, structure of config files,
etc.
6. Idle Culler support is built in, since almost all installations
want this.
However, a few other things are tightly coupled to TLJH's design and
are unlikely to change in its base, in order to keep it easy to maintain:
1. Tied to Ubuntu / Debian-based operating systems.
2. Has strong opinions on how user environments are set up: there
   is only one, and it is set up in a very specific way. For
   many custom use cases, this is entirely useless.
3. Users are created and used in specific, inflexible ways. If you
   have another system for creating and managing users on your systems,
   you are out of luck.
4. In general, where there is a choice to be made between security
   and usability/maintainability, we always pick usability. TLJH is
   as secure as possible given its usability goals, and users cannot
   really tweak this.
It will be very useful to build a very minimal JupyterHub 'distribution'
with the basic features from TLJH but otherwise as unopinionated as
possible.
## Features
### Directory structure
A *base directory* will contain the JupyterHub install. The layout
will be vaguely like TLJH's, but without user environments. The location
of this directory can be chosen during install.
```
base/
- jupyterhub_config.d/
- 01-some-config.py
- 02-some-other-config.py
- hub/ # conda environment
- state/
- cookieSecret
- database.sqlite
- pid files
```
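The layout above could be created with a few lines of standard-library Python. This is a minimal sketch, assuming the installer creates only the directories and that the hub config points JupyterHub's cookie secret and database into `state/`, so JupyterHub creates those files itself on first start. `create_layout` is a hypothetical helper name.

```python
from pathlib import Path


def create_layout(base_dir):
    """Create the skeleton of the base directory (hypothetical helper).

    Only directories are created; the hub/ conda environment comes from
    the installer, and the files in state/ are written by JupyterHub.
    """
    base = Path(base_dir)
    # Admin-editable config snippets, loaded in lexicographic order.
    (base / "jupyterhub_config.d").mkdir(parents=True, exist_ok=True)
    # state/ will hold secrets (cookie secret, database), so restrict it
    # to the user the hub runs as.
    state = base / "state"
    state.mkdir(parents=True, exist_ok=True)
    state.chmod(0o700)
    return base
```

Because every step uses `exist_ok=True`, re-running it (for example during an upgrade) is harmless.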
### Installer
A `bootstrapper.py`, possibly as a conda-installer, will do the following:
1. Create base directory `$BASE_DIR`
2. Set up a `conda` environment under `$BASE_DIR/conda`
3. Install an installer package inside the conda environment
4. Run the installer
This makes sure the bootstrapper has no non-stdlib dependencies,
while allowing the installer to use other libraries and packages.
This is extremely similar to how TLJH does things, except that TLJH
uses a virtual environment for the hub rather than a conda
environment.
Miniforge is preferred over Miniconda for this.
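The four installer steps could look roughly like this in a stdlib-only bootstrapper. This is a sketch, not a published tool: the installer package name `unopinionated-hub-installer`, its module `unopinionated_hub_installer` and its `--base-dir` flag are all hypothetical. The Miniforge download URL pattern and the installer's `-b` (batch) and `-p` (prefix) options are Miniforge's own. It follows step 2 in installing into `$BASE_DIR/conda`.

```python
import subprocess
import urllib.request
from pathlib import Path

# Miniforge's "latest release" download URL on GitHub.
MINIFORGE_URL = (
    "https://github.com/conda-forge/miniforge/releases/latest/download/"
    "Miniforge3-Linux-{arch}.sh"
)
# Hypothetical names: no such installer package exists yet.
INSTALLER_PACKAGE = "unopinionated-hub-installer"
INSTALLER_MODULE = "unopinionated_hub_installer"


def bootstrap_commands(base_dir):
    """Commands for steps 2-4, once the Miniforge script is downloaded."""
    base = Path(base_dir)
    prefix = base / "conda"
    python = prefix / "bin" / "python"
    return [
        # Step 2: install Miniforge non-interactively (-b) into the prefix (-p).
        ["bash", str(base / "miniforge.sh"), "-b", "-p", str(prefix)],
        # Step 3: install the installer package into the new environment.
        [str(python), "-m", "pip", "install", INSTALLER_PACKAGE],
        # Step 4: hand over to the installer, which may use any library.
        # The --base-dir flag is hypothetical.
        [str(python), "-m", INSTALLER_MODULE, "--base-dir", str(base)],
    ]


def bootstrap(base_dir, arch="x86_64", dry_run=True):
    """Run the bootstrap with only the standard library; return the commands."""
    commands = bootstrap_commands(base_dir)
    if not dry_run:
        # Step 1: create $BASE_DIR, then fetch the Miniforge installer.
        Path(base_dir).mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(
            MINIFORGE_URL.format(arch=arch), str(Path(base_dir) / "miniforge.sh")
        )
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Keeping the command list separate from execution makes a dry run (`bootstrap("/opt/hub")`) print exactly what would happen before anything touches the system.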
### Hub and Proxy processes
Using `jupyterhub-traefik-proxy` lets us avoid installing Node.js in the
hub environment. We will use `systemd` to start, supervise and secure
the hub and proxy processes. systemd security directives will be
used where needed to restrict the processes' privileges. The proxy
process will run as a `www-data` (or equivalent) user. The hub will
run as `root` by default, but this can be changed by the installer.
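A sketch of how the installer might render the two systemd units. The directives used (`Restart=always`, `AmbientCapabilities=`, `NoNewPrivileges=`, `ProtectSystem=strict`, `ReadWritePaths=`, ...) are standard systemd options; the paths are assumptions: a `traefik` binary in the hub environment's `bin/`, a hidden base config at `hub/etc/jupyterhub_config.py`, and a `traefik/` directory for the proxy's files are all hypothetical. Services log to the journal by default, so no logging directives are needed.

```python
from pathlib import Path

UNIT_TEMPLATE = """\
[Unit]
Description={description}
After=network.target

[Service]
User={user}
ExecStart={exec_start}
Restart=always
{extra}
[Install]
WantedBy=multi-user.target
"""


def render_unit(description, user, exec_start, extra=()):
    """Render one systemd service unit as a string."""
    extra_lines = "".join(line + "\n" for line in extra)
    return UNIT_TEMPLATE.format(
        description=description, user=user, exec_start=exec_start, extra=extra_lines
    )


def render_units(base_dir, hub_user="root", proxy_user="www-data"):
    base = Path(base_dir)
    bin_dir = base / "hub" / "bin"  # hub conda environment from the layout
    # Hypothetical locations for the hidden base config and Traefik's files.
    hub_config = base / "hub" / "etc" / "jupyterhub_config.py"
    traefik_dir = base / "traefik"
    hub = render_unit(
        "JupyterHub",
        hub_user,
        f"{bin_dir / 'jupyterhub'} -f {hub_config}",
        [
            f"WorkingDirectory={base / 'state'}",
            "PrivateTmp=true",
            # Tighter directives depend on the spawner: with NoNewPrivileges,
            # a non-root hub could no longer switch users through sudo.
        ],
    )
    proxy = render_unit(
        "Traefik proxy for JupyterHub",
        proxy_user,
        f"{bin_dir / 'traefik'} --configFile={traefik_dir / 'traefik.toml'}",
        [
            # Bind ports 80/443 without running as root.
            "AmbientCapabilities=CAP_NET_BIND_SERVICE",
            "NoNewPrivileges=true",
            "PrivateTmp=true",
            "ProtectSystem=strict",
            "ProtectHome=true",
            # Only Traefik's own directory (config, certificates) stays writable.
            f"ReadWritePaths={traefik_dir}",
        ],
    )
    return {"jupyterhub.service": hub, "traefik.service": proxy}
```

Running hub and proxy as separate units is what lets `systemctl restart jupyterhub` leave the proxy, and therefore running user sessions, untouched.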
### HTTPS
`traefik` supports automatic HTTPS certificate acquisition with Let's Encrypt,
and we will make it extremely easy to use that. Manual TLS certificates
will also be supported.
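To make the Let's Encrypt path concrete, this sketch renders only the HTTPS-related part of a Traefik v2 static configuration file. The key names are Traefik v2's; the entry point names `http`/`https`, the resolver name `letsencrypt`, and the assumption that the proxy's routes are attached to the `https` entry point are choices of this sketch. Sections that `jupyterhub-traefik-proxy` itself needs (providers, API) are omitted. Manual certificates would instead go in Traefik's dynamic `tls.certificates` configuration.

```python
import json

LE_STAGING = "https://acme-staging-v02.api.letsencrypt.org/directory"


def traefik_https_toml(domain, email, storage, staging=False):
    """Render the HTTPS part of a Traefik v2 static config file (TOML)."""
    # json.dumps quoting is valid TOML basic-string quoting for ASCII values.
    q = json.dumps
    lines = [
        "[entryPoints.http]",
        'address = ":80"',
        # Redirect plain HTTP to the https entry point.
        "[entryPoints.http.http.redirections.entryPoint]",
        'to = "https"',
        'scheme = "https"',
        "",
        "[entryPoints.https]",
        'address = ":443"',
        # Routers on this entry point get certificates from the resolver.
        "[entryPoints.https.http.tls]",
        'certResolver = "letsencrypt"',
        "[[entryPoints.https.http.tls.domains]]",
        f"main = {q(domain)}",
        "",
        "[certificatesResolvers.letsencrypt.acme]",
        f"email = {q(email)}",
        f"storage = {q(storage)}",
    ]
    if staging:
        # Staging certificates are untrusted but avoid production rate limits.
        lines.append(f"caServer = {q(LE_STAGING)}")
    # Answer the TLS-ALPN-01 challenge on port 443.
    lines.append("[certificatesResolvers.letsencrypt.acme.tlsChallenge]")
    return "\n".join(lines) + "\n"
```

Trying `staging=True` first is a cheap way to check DNS and firewall setup before requesting real certificates.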
### Hub Config
Unlike TLJH, there will be no special command or config file for
hub-specific configuration. Admins can put bits of config in a
`$BASE_DIR/jupyterhub_config.d` directory, and it will be loaded
in lexicographical order. Reloads of hub config will be done
via standard systemctl commands. The base `jupyterhub_config.py`
setting up the defaults and loading config will be kept out of
sight, however.
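The hidden base `jupyterhub_config.py` could be as small as this sketch. `get_config` and `load_subconfig` are provided by traitlets when JupyterHub loads a Python config file; the `HUB_BASE_DIR` environment variable is a hypothetical way for the systemd unit to pass in `$BASE_DIR`. The `JupyterHub.cookie_secret_file`, `db_url` and `pid_file` options point the hub's state into `state/`.

```python
import os
from pathlib import Path

# Assumption: the systemd unit exports the base directory in this
# (hypothetical) environment variable.
BASE_DIR = Path(os.environ.get("HUB_BASE_DIR", "/opt/hub"))


def config_snippets(config_dir):
    """Return the *.py files in config_dir, sorted lexicographically by name."""
    return sorted(Path(config_dir).glob("*.py"), key=lambda p: p.name)


# get_config and load_subconfig are injected by traitlets when JupyterHub
# loads this file; the guard keeps the helper above importable on its own.
if "get_config" in globals():
    c = get_config()  # noqa: F821
    state = BASE_DIR / "state"
    c.JupyterHub.cookie_secret_file = str(state / "cookieSecret")
    c.JupyterHub.db_url = f"sqlite:///{state / 'database.sqlite'}"
    c.JupyterHub.pid_file = str(state / "jupyterhub.pid")
    # Admin snippets load last, so they can override any default above.
    for snippet in config_snippets(BASE_DIR / "jupyterhub_config.d"):
        load_subconfig(str(snippet))  # noqa: F821
```

Ordering is plain string comparison, so `9-extra.py` sorts after `10-auth.py`; zero-padded prefixes such as `01-`, `02-` avoid surprises.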
### Idle Culler
As it has been duplicated too many times already, the idle culler will be
moved to its own Python package and provided by default. Configuring it will,
however, require creating a file in `jupyterhub_config.d`.
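Such a file might look like the sketch below, which follows the configuration documented in the `jupyterhub-idle-culler` package's README for JupyterHub 2 and later (a hub-managed service plus a role granting it scopes). The file name `50-idle-culler.py` and the helper `add_idle_culler` are hypothetical.

```python
# jupyterhub_config.d/50-idle-culler.py (hypothetical file name)
import sys


def add_idle_culler(c, timeout=3600):
    """Run jupyterhub-idle-culler as a hub-managed service (hypothetical helper)."""
    c.JupyterHub.services = [
        {
            "name": "jupyterhub-idle-culler-service",
            # sys.executable is the hub environment's Python here.
            "command": [
                sys.executable,
                "-m",
                "jupyterhub_idle_culler",
                f"--timeout={timeout}",
            ],
        }
    ]
    # Grant the service only the scopes it needs to find and stop idle servers.
    c.JupyterHub.load_roles = [
        {
            "name": "jupyterhub-idle-culler-role",
            "scopes": ["list:users", "read:users:activity", "read:servers", "delete:servers"],
            "services": ["jupyterhub-idle-culler-service"],
        }
    ]


if "get_config" in globals():
    add_idle_culler(get_config(), timeout=3600)  # noqa: F821
```

Note that these assignments replace any `services` or `load_roles` set by earlier snippets, so an install with several services would need to define them together.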
### Uninstall
Stopping the systemd services and running `rm -rf $BASE_DIR` should
uninstall everything set up by the base. However, the authenticator
and spawner you set up might create resources (such as users, Slurm
jobs, etc.) outside `$BASE_DIR` that you might have to clean up manually.
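The uninstall steps can be sketched as below. The unit names are hypothetical. One assumption makes `rm -rf $BASE_DIR` truly complete: if the installer keeps the unit files inside `$BASE_DIR` and registers them with `systemctl link`, then `systemctl disable` removes those links along with the enablement symlinks.

```python
import subprocess
from pathlib import Path

# Hypothetical unit names; they must match whatever the installer registered.
UNITS = ["jupyterhub.service", "traefik.service"]


def uninstall(base_dir, dry_run=True):
    """Stop and disable the units, delete $BASE_DIR, then reload systemd.

    Users, Slurm jobs, etc. created by the authenticator or spawner live
    outside base_dir and are NOT cleaned up here.
    """
    base = Path(base_dir).resolve()
    if base == Path("/"):
        raise ValueError("refusing to delete /")
    steps = [
        # --now stops the units as well as disabling them.
        ["systemctl", "disable", "--now", *UNITS],
        ["rm", "-rf", str(base)],
        ["systemctl", "daemon-reload"],
    ]
    if not dry_run:
        for cmd in steps:
            subprocess.run(cmd, check=True)
    return steps
```

The dry-run default returns the plan without executing it, which is a sensible default for a destructive command.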
### Upgrades
Unlike TLJH, this distribution will mostly be used as a base for a variety
of other installs, which makes upgrades critical. It will be released as a
Python package that depends on very specific versions of JupyterHub, the
proxy, etc., so upgrades can happen by upgrading that package, although
manual commands might need to be run for some versions. This is a different
strategy from TLJH: we will have versioned releases, rather than asking
people to install from `master` all the time.
We should have a fixed, date-based release cadence, to make sure
releases don't lag too much.
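Because each release pins exact versions, an upgrade could end with a check that the hub environment actually matches those pins. A minimal sketch using `importlib.metadata`; the `PINS` versions below are placeholders, not real release pins, and `version_drift` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Placeholder pins for illustration only; a real release would derive
# these from its own package metadata.
PINS = {
    "jupyterhub": "1.3.0",
    "jupyterhub-traefik-proxy": "0.2.0",
    "jupyterhub-idle-culler": "1.0",
}


def version_drift(pins=PINS, get_version=version):
    """Map each pinned package whose installed version differs to (pinned, installed).

    installed is None when the package is not installed at all.
    """
    drift = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            drift[name] = (pinned, installed)
    return drift
```

Run inside the hub environment after upgrading the package and before restarting the hub; an empty result means the environment matches the release.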
### Versioning
There are two aspects of versioning:
1. Versioning your hub conda environment, which is set up by the
installer. This could be done via traditional conda means (an
environment.yml file) checked into a repository. This should
capture any authenticators, etc. installed there.
2. Versioning your config files. This could be a git repo containing
the contents of `jupyterhub_config.d`, pinned with an environment.yml
file.
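Both kinds of versioning could be fed from a running install with a small helper like this sketch. It assumes the hub environment lives at `$BASE_DIR/hub`, as in the directory layout, and uses `conda env export --prefix ... --no-builds`, which also lists pip-installed packages such as authenticators. `snapshot_config` and `write_environment_yml` are hypothetical helper names; committing to Git is left to the admin.

```python
import shutil
import subprocess
from pathlib import Path


def export_env_command(base_dir, conda="conda"):
    """conda command that prints the hub environment's spec as YAML."""
    # --no-builds drops platform-specific build strings from the pins.
    return [conda, "env", "export", "--prefix", str(Path(base_dir) / "hub"), "--no-builds"]


def write_environment_yml(base_dir, repo_dir, conda="conda"):
    """Write repo_dir/environment.yml from the live hub environment."""
    result = subprocess.run(
        export_env_command(base_dir, conda), check=True, capture_output=True, text=True
    )
    Path(repo_dir, "environment.yml").write_text(result.stdout)


def snapshot_config(base_dir, repo_dir):
    """Copy jupyterhub_config.d/*.py into repo_dir/jupyterhub_config.d."""
    src = Path(base_dir) / "jupyterhub_config.d"
    dest = Path(repo_dir) / "jupyterhub_config.d"
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for snippet in sorted(src.glob("*.py")):
        shutil.copy2(snippet, dest / snippet.name)
        copied.append(snippet.name)
    return copied
```

Restoring goes the other way: create the environment from `environment.yml` and copy the snippets back into `$BASE_DIR/jupyterhub_config.d`.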
The harder problem is versioning *user* environments, which is
out of scope here.