---
tags: ubc, jupyter
---

# Jupyterhub discussion topics

Link to [github markdown source](https://github.com/eoas-ubc/ubc_hubs/blob/master/docs/initial_topics.md)

Link to [OCESE website](https://eoas-ubc.github.io/)

## Jupyterhub/kubernetes configuration and operation

### Issues for next year (Sep. 2021)

1. High performance computing for teaching
    * cloud based infrastructure
    * new personnel

    $\alpha = \beta$

    $\int_\alpha^\beta \Gamma d\alpha$

1. What do we need?
    * Education is not necessarily HPC
    * Grading, notebook testing, notebook building etc.
    * Pair programming/teamwork (IOOXA)
    * Personnel capable of maintenance and support within the units supplying the service (department, faculty, LT-Hub, whatever). Both technical and pedagogical aspects need to be supported.

1. Followup
    * Funding -- UBC (ITAC) and external
    * Building collaborations -- what's the plan?
    * How to find the right people -- to help build workflows at user/pedagogy level and keep up with (and contribute towards) open source collaborators.
    * Timeline?
    * Sara to raise this with the Dean
    * Make education/science links external to UBC

1. Nuts and bolts
    * Ansible and terraform
    * Jupyterhub (tbd -- accounting)
    * Done: Terraform files for deployment with authentication
    * Different strategies for high vs. low resource courses?
    * Need bespoke containers for a range of software requirements.
    * Billing/quotas -- hidden inside the application
    * UBC shopping for a cloud management solution
    * Tag based billing
    * funding approved for year 1: 18K students
    * Which courses need, or will need, access? (2,550 fall term)
    * how/when to separate grading from student computing
    * assessment is as challenging/important as computing
    * supercluster https://www.digitalsupercluster.ca/ -- microsoft/other digital companies
    * Jupyterhub is an umbrella
    * Track many collaborations already underway
    * Bring in compute canada ([NDRIO](https://engagedri.ca/)) (& maybe [digital supercluster](https://www.digitalsupercluster.ca/)?).
    * Cloudbank - UCSD, UW, CERN
    * 2I2C -- how to link up?
1. CTLT has lots of experience with open source
1. How do we share tools/techniques?
1. How many jupyterhubs should UBC run?
    * How should DSCI and EOAS coordinate with syzygy and [2I2C](https://2i2c.org/)?
    * do we need low, medium and high performance configurations?
    * what about binderhubs and voila dashboards?
1. How do we associate multiple course containers with each hub?
1. Should we maintain a repository of current helm/ansible/docker-compose/terraform config files (e.g. like [helm.dask.org](https://helm.dask.org/))?
1. Should UBC maintain a jupyterhub container registry?
1. Should we keep a database of per-capita costs for current hubs?
1. How do we load-test new deployments to set pod specs for resource intensive courses?
1. How do we cost-recover for upper year courses with extra resource requirements?

### Relevant jupyterhub projects

* [2I2C](https://cfp.jupytercon.com/2020/schedule/presentation/209/2i2c-sustaining-open-source-through-hosted-jupyter-infrastructure-for-research-and-education/)
* [QHUB from Quansight](https://cfp.jupytercon.com/2020/schedule/presentation/185/introducing-qhub-how-to-get-your-own-cloud-data-science-platform-on-the-cheap/)
* [UMich infrastructure](https://cfp.jupytercon.com/2020/schedule/presentation/230/using-the-jupyterverse-to-power-mads/)

### OCESE current todo list (Jan. 2021)

1. How do we set disk quotas/resource limits on spawned containers?
1. How do we back up and restore student volumes?

### Possible collaborations: jupyterhub containers

1. OCESE is developing github actions to build, deploy and test containers and notebooks, following the approach of the [pangeo docker stacks](https://github.com/pangeo-data/pangeo-docker-images)
    * We use [conda-lock](https://github.com/brl0/conda-lock) for exact pinning
    * Should we use a common base image?
1.
   OCESE and DSCI are developing [ansible playbooks](https://github.com/jupyterhub/jupyterhub-deploy-teaching) for jupyterhub configuration on single nodes

### Pitfalls/issues to resolve

1. Economies of scale:
    * The per-student costs of holding the minimal infrastructure in AWS (and any other cloud) depend on the size of the course. E.g. for our EOSC 350 (approx. 50 students) it is around $20/day. This cost will vary depending on the needs of the courses. Nevertheless, larger courses, or deployments serving more than one course, will benefit from distributing the minimal cost across a larger user group.
2. Accounting issue:
    * Precise cost accounting in AWS, whether per student or per course (in multi-course deployments), is a difficult problem because the information needed to separate costs at the hub's user level is not available.
3. Support and sustainability:
    * Who can "take charge" and coordinate activities? This feels like at least a half time job over the next 2-3 years. Is there anyone in this group or elsewhere at UBC with that kind of "spare" time?
    * Is UBC committing to having persistent expertise to support users (instructors, TAs and students)?
    * At what institutional levels can we expect there to be support & expertise? (e.g., UBC IT, Learning Technologies, Faculty, Department ...)
    * Who will commit to 'wrangling' the documentation (and maybe training sessions) so that users can learn what they need to know when they need to know it? At some point, resources and strategies need to become "routine" rather than the special domain of experts in cloud computing. Getting there will not be trivial; i.e. it will need concrete support.

## Jupyter notebooks/student programming tools

1.
   Possible collaborations: Jupyter notebooks
    * OCESE uses [Git Pre-commit hooks](https://github.com/executablebooks/jupyter-book/blob/master/.pre-commit-config.yaml) for linting and formatting
    * We use the jupyter notebook [jupytext](https://github.com/mwouts/jupytext) extension so that saving a notebook generates both a [myst-markdown](https://jupytext.readthedocs.io/en/latest/formats.html#myst-markdown) file for text formatting (whitespace, newlines etc.), and a [py:percent](https://jupytext.readthedocs.io/en/latest/formats.html#the-percent-format) file for code formatting (using [black](https://black.readthedocs.io/en/stable/) and [flake8](https://flake8.pycqa.org/en/latest/))
1. Possible collaborations: Team/pair programming
    * [iooxa jupytercon poster](https://docs.google.com/presentation/d/12y8nxthKjUguDmVbGHj89J4knbiFjNCk4vM1PkEg0sg/edit#slide=id.p) and [video](https://iooxa.com/for/jupyter/)
    * OCESE is following the progress of [jupyterlab rtc](https://github.com/jupyterlab/rtc)
    * We are experimenting with [codefreak](https://github.com/codefreak/codefreak)

# December 7 minutes

* Jim, Henryk, Phil, Tiffany, Stephen, Pan, Sara, Trevor

1) nbgrader
2) entry point
3) binderhub
4) white paper

still hiring

offers out -- 2 senior analysts, cloud computing/jupyter mixed.

circulate job ad

what's the boundary between instructors and deployed

give controls to instructors for container build

who actually sets up the kubernetes cluster?

should students see different jupyterbook -- examples -- phil will provide

* nbgrader

can ctlt play a role?

needs to be enterprise

trevor -- work is frozen at deadline

need to separate generic from course specific

need to build privacy in from the beginning

cpsc 103 -- common set of features

what is the base set of services

1) schedule, collect and pass off to grader
2) build infrastructure autograde

centralized grading service to coordinate TA grading -- lifesaving -- Jim
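
Returning to the "Economies of scale" item under "Pitfalls/issues to resolve": the arithmetic behind that point can be sketched in a few lines. This is only an illustration, and it rests on two assumptions that the notes do not confirm: that the $20/day figure for EOSC 350 is the total cost of the minimal infrastructure (not a per-student figure), and that this fixed cost stays the same as enrolment grows. The function name and the 200- and 1000-student course sizes are hypothetical.

```python
# Sketch: per-student share of a fixed daily hub cost.
# Only the $20/day and ~50-student figures come from the notes (EOSC 350).
# ASSUMPTIONS: $20/day is the total fixed cost, and it does not grow with
# enrolment. The other course sizes below are hypothetical.


def per_student_daily_cost(fixed_daily_cost: float, n_students: int) -> float:
    """Spread a fixed daily infrastructure cost evenly across students."""
    if n_students <= 0:
        raise ValueError("n_students must be positive")
    return fixed_daily_cost / n_students


FIXED_DAILY_COST = 20.0  # dollars per day, read as a total (assumption)

for n in (50, 200, 1000):
    share = per_student_daily_cost(FIXED_DAILY_COST, n)
    print(f"{n:5d} students: ${share:.2f} per student per day")
```

Under these assumptions the per-student share falls in inverse proportion to enrolment, which is why a hub shared by several courses looks cheaper per student than a dedicated one. Real deployments also carry usage-dependent costs (compute and storage per active user) that this sketch leaves out.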