# What could a Turing JupyterHub look like?

A document to discuss the potential benefits and pitfalls of a Turing-hosted JupyterHub.

## Uses within Turing

- [name=Callum] classes:
    - could come with preset environments, notebooks, and datasets for students to work on.
- active research development
    - [name=Evelina] shared datasets + consistent dev environment across all collaborators sound great
    - [name=JamesR] feels like the main use case is people who can't/don't have time to run jupyter themselves, i.e. mainly useful for teaching
        - [name=Sarah] There's a big need for JupyterHubs (particularly cloud-based ones) in research too. 2i2c targets _both_ research groups and educators as part of its service model.
- Connextions workshop
- [name=Evelina] Make it easier than managing a VM in the cloud and deploying your own JupyterHub

## Desired Features

- real-time collaboration? fast-switching?
    - [name=Sarah] JupyterLab RTC is not yet ready for production deployment. There was an incident in UC Berkeley's Data8 deployment (a large network of JupyterHubs serving the whole campus) where someone overwrote someone else's data. The JupyterLab team are continuing to work to improve this. On the JupyterHub side of things, the most recent releases bring in roles and permission scopes so it is possible to "grant my colleague read-only access to this one notebook on my server" for example.
- repo2docker.
    - [name=Sarah] 2i2c are actively working on bringing repo2docker capabilities into JupyterHub https://github.com/2i2c-org/infrastructure/issues/1382
- version control?
    - could ensure some good practices e.g. pre-commit hooks
    - [name=Sarah] JupyterLab git extension: https://github.com/jupyterlab/jupyterlab-git
      Safe pushing back to GitHub with appropriately scoped credentials: https://github.com/yuvipanda/gh-scoped-creds/
- bind into Azure Active Directory?
    - Pangeo's JupyterHub does this?
    - [name=Sarah] Has been supported for a long time: https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/authentication.html#azure-active-directory
      This hub uses the Turing's Azure AD (set up by Anna before she left): https://github.com/alan-turing-institute/bridge-data-platform
- can you change the compute depending on user group membership?
    - [name=Sarah] Yes https://discourse.jupyter.org/t/tailoring-spawn-options-and-server-configuration-to-certain-users/8449
- shared directories within sub-groups of users
    - [name=Sarah] Not confirmed, but maybe possible with the roles and permissions work ongoing
- installable packages
    - [name=Sarah] Users can install any package they like; the problem is that it is not persistent if the hub restarts. 2i2c currently have a workflow to make adding images to a docker image for a hub pretty well automated using the repo2docker-action. However, with the repo2docker in JupyterHub work we are pushing forward, users will be able to define their own environment on the fly.
- ability to request additional compute
- Not just limited to the `alan-turing-inst` GitHub org. Some projects set up their own GH orgs and it would be useful to include them too.
    - [name=Evelina] +1
    - [name=Sarah] This is a matter of listing the approved orgs in the config, e.g., https://github.com/2i2c-org/infrastructure/blob/fe3375645d958085e8a927dbfcd9acedc97e750d/config/clusters/linked-earth/common.values.yaml#L45-L50
- [name=JamesR] CoCalc already does collaborative notebook editing
    - [name=Sarah] Do you have a Right To Participate with CoCalc? If you ask CoCalc for a feature, are they obligated to provide it? What if CoCalc disappears?

## Pitfalls / Blockers

- needs maintainers
- unclear where funding comes from (if every Turing researcher uses it that's a lot of compute!)
    - Plus, Azure is expensive.
      +1
- Possibly competing with existing services _e.g._ [Baskerville Portal](https://docs.baskerville.ac.uk/portal/portal/), [RCSIS](https://docs.hpc.cam.ac.uk/cloud/index.html)
- [name=JR] Are packages global or per user? I just ran `!pip uninstall -y jupyterhub` to see what would happen
    - [name=Sarah] Global, currently. Installing as user works until the hub restarts. The repo2docker work will make environment definition and building more binder-like and this can be considered "per user"

## Questions

- what is the added value vs Azure Labs?
- [name=Achintya] Can you spin out a JupyterHub instance from a GH/GL repo?
    - [name=Evelina] This sounds like Codespaces though
        - [name=Achintya] Probably! My thinking was for the education use-case: a repo that you can “fork” into JH and run the exercises etc.…
    - [name=Sarah] Yes. Right now, you use [nbgitpuller](https://jupyterhub.github.io/nbgitpuller/). In the future, we are building repo2docker into JupyterHub
- [name=Achintya] I can’t remember if JH has multi-user instances. Any idea?
    - [CoCalc](https://cocalc.com/) does, not certain if that is built on Jupyter.
        - [name=Achintya] Ah, thanks!
    - [name=Sarah] JupyterHub will do with the right roles and permissions scoped.
- how can we provide access to different compute as part of group membership?
    - [name=Sarah] Like this https://discourse.jupyter.org/t/tailoring-spawn-options-and-server-configuration-to-certain-users/8449
- can we link compute to project budgets?
    - [name=Sarah] Yes probably. 2i2c are working towards a solution for this.
- how can we make it accessible to deploy your own JupyterHub?
    - [name=Sarah] It already is, just most people are not interested in the maintenance aspect, and are more interested in doing super-cool science on the hub. This is the need 2i2c are filling.
- do we want a Turing-wide JupyterHub or do we want a use-case-specific JupyterHub?
    - [name=Sarah] Probably easier to maintain a single hub.
      2i2c manages hubs for whole universities (e.g., the University of Toronto); we also do research-group-specific hubs and short-lived event hubs (e.g., ICE-SAT hackathons)
- Is there integration with Azure storage? So we can persist info beyond a JupyterHub cluster.
    - [name=Sarah] Yes, 2i2c use Azure File Storage as an NFS drive for all users' home directories. We also create scratch buckets for temporary data storage and have also been known to mount specific storage buckets for data and/or provide MySQL databases
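
On the question of changing compute by group membership, here is a minimal sketch of one way it could work. It is not necessarily the approach in the linked Discourse thread. It assumes the hub runs on Kubernetes with KubeSpawner, whose `profile_list` option can be set to a callable that receives the spawner. The group names (`gpu-project`, `large-memory`), the resource sizes, and the helper names are all hypothetical.

```python
from types import SimpleNamespace

# Hypothetical default offered to every user; sizes are illustrative only.
BASE_PROFILE = {
    "display_name": "Small (2 CPU, 4 GB)",
    "slug": "small",
    "default": True,
    "kubespawner_override": {"cpu_limit": 2, "mem_limit": "4G"},
}

# Hypothetical mapping from hub group name to an extra server option.
GROUP_PROFILES = {
    "gpu-project": {
        "display_name": "GPU (8 CPU, 32 GB, 1 GPU)",
        "slug": "gpu",
        "kubespawner_override": {
            "cpu_limit": 8,
            "mem_limit": "32G",
            "extra_resource_limits": {"nvidia.com/gpu": "1"},
        },
    },
    "large-memory": {
        "display_name": "High memory (4 CPU, 64 GB)",
        "slug": "highmem",
        "kubespawner_override": {"cpu_limit": 4, "mem_limit": "64G"},
    },
}


def profiles_for_groups(group_names):
    """Return the base profile plus one profile per recognised group."""
    profiles = [dict(BASE_PROFILE)]
    for name in sorted(set(group_names)):
        if name in GROUP_PROFILES:
            profiles.append(dict(GROUP_PROFILES[name]))
    return profiles


def profile_list_for_spawner(spawner):
    """Callable intended for KubeSpawner's profile_list option.

    Assumes spawner.user.groups yields objects with a .name attribute.
    """
    return profiles_for_groups(group.name for group in spawner.user.groups)


# In jupyterhub_config.py one would then set (not run here, because `c`
# only exists inside the hub's config loader):
# c.KubeSpawner.profile_list = profile_list_for_spawner

if __name__ == "__main__":
    # Stand-in for a real spawner, used only to exercise the function.
    fake_spawner = SimpleNamespace(
        user=SimpleNamespace(groups=[SimpleNamespace(name="gpu-project")])
    )
    for profile in profile_list_for_spawner(fake_spawner):
        print(profile["slug"], profile["kubespawner_override"])
```

Keeping the group-to-profile mapping as plain data means adding a new project group is a config change rather than a code change. How groups get populated (e.g. from Azure AD or GitHub teams) depends on the authenticator and is not covered here.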
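
For the education use case raised above (a repo that students can "fork" into the hub), nbgitpuller works through specially formed links that pull a repo into the user's home directory on click. The sketch below builds such a link. The hub URL and repository are placeholders, the function name is hypothetical, and the query layout (`repo`, `branch`, `urlpath`) follows the nbgitpuller documentation as best I understand it, so it is worth checking against the official link generator before relying on it.

```python
from urllib.parse import urlencode, urlsplit


def nbgitpuller_link(hub_url, repo_url, branch="main", open_path="", app="lab"):
    """Build a link that pulls repo_url into the user's server and opens open_path.

    hub_url and repo_url are supplied by the caller; nothing here contacts the hub.
    """
    # nbgitpuller clones into a directory named after the repository.
    repo_dir = urlsplit(repo_url).path.rstrip("/").split("/")[-1]
    if repo_dir.endswith(".git"):
        repo_dir = repo_dir[: -len(".git")]

    # Path to open after the pull, relative to the user's home directory.
    target = "/".join(part for part in (repo_dir, open_path.strip("/")) if part)
    prefix = "lab/tree" if app == "lab" else "tree"

    query = urlencode(
        {"repo": repo_url, "branch": branch, "urlpath": f"{prefix}/{target}"}
    )
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"


if __name__ == "__main__":
    # Placeholder hub and repository, for illustration only.
    print(
        nbgitpuller_link(
            "https://hub.example.org",
            "https://github.com/example-org/course-materials",
            open_path="week1/intro.ipynb",
        )
    )
```

An instructor could put a link like this in course notes: each student who clicks it gets their own copy of the exercises on the hub, with later updates merged in on subsequent clicks.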