# Carbon Plan collaboration plan
## Background
Carbon Plan wishes to run a variety of JupyterHubs on commercial cloud providers, and to provide the same experience regardless of which cloud is being used at a particular moment.
They currently run a hub with 2i2c on AWS, and wish to deploy a hub on Microsoft Azure as well. Doing this will require some new development in addition to the deployment and operations of a new hub.
Because this collaboration entails a combination of hub deployment and operations, as well as *new* development, we'll give this partnership a more bespoke description than 2i2c's other hub service tiers.
## Goals of this document
This document describes a few ways in which 2i2c and Carbon Plan can collaborate together to make this happen. We have the following goals:
- Describe the new development needed to meet the needs of Carbon Plan's Azure deployments
- Describe the ongoing operations that are needed as a part of the partnership
- Define the resources that both organizations can provide
- Define a time-frame for the collaboration, and potential next steps once it is finished.
Our goal is to define some specific deliverables to work towards together.
## Scoping the environment management stuff
### Background
It sounds like Conda Store is definitely worth exploring; we'd also like to explore ideas more along the lines of "Binder-like functionality in JupyterHub". It'd be good to hear your thoughts on the "end user problem" you'd like to solve, and perhaps we can brainstorm ideas for specific development to try.
### Options
- Conda Store
- Built by QuanSight
- Seems complex, and it's unclear how to participate in the project
- Maintaining Docker images
- Improving the UX for integrating a new image into the JupyterHub environment
- notebooks.gesis.org is a good test case
- Configurator API?
- If the configurator had an endpoint you could control from a GitHub Action, then you could update the configurator via the same process that publishes an image (see the sketch below)
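As an entirely hypothetical sketch of that idea: the configurator exposes no such endpoint today, and the hub URL, endpoint path, token, and config key below are all assumptions, but a CI job that just published an image could then do something like this:

```python
# Hypothetical sketch -- the configurator exposes no such API today.
# The hub URL, endpoint path, token, and config key are all made up.
import os

import requests

HUB_URL = "https://hub.carbonplan.org"  # hypothetical hub URL
image = os.environ["IMAGE_TAG"]  # e.g. set by the image build step in CI

resp = requests.post(
    f"{HUB_URL}/services/configurator/config",  # hypothetical endpoint
    headers={"Authorization": f"token {os.environ['CONFIGURATOR_TOKEN']}"},
    json={"KubeSpawner.image": image},
)
resp.raise_for_status()
print(f"Hub default image updated to {image}")
```

If that endpoint existed, the same GitHub Action that builds and pushes the image could run this as its final step.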
### Short term
- Building images via the repo2docker GitHub Action, and updating the hub via the configurator, would be an OK short-term solution
## Scoping Prefect stuff
### Background
Pangeo (and especially Pangeo Forge) is leaning hard into Prefect, so this is something we should have a story around.
- Prefect is pretty user-friendly; it doesn't take much time to understand
- They have a weird license:
- Prefect core is open source
- Cloud service is proprietary
### Options
- Deploy Prefect agents (not Prefect Server)
- Carbon Plan is using Prefect Cloud
- Register agent w/ Prefect Cloud and then it "just works"
- This is pretty straightforward to accomplish
- Related Pangeo discussion: https://discourse.pangeo.io/t/pangeo-batch-workflows/804/11
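For reference, the user-facing side of this is quite small. Here's a minimal sketch using the Prefect (1.x-era) Python API, where the flow and project names are made up: once a flow is registered with Prefect Cloud, any agent we run in the cluster polls Cloud and executes the work.

```python
# Minimal sketch with the Prefect 1.x-era API; flow/project names are made up.
# Assumes Prefect Cloud credentials are already configured locally
# (e.g. via `prefect auth login`) and that a "carbonplan" project exists.
from prefect import Flow, task


@task
def say_hello():
    print("Hello from an agent running next to the hub!")


with Flow("carbonplan-test-flow") as flow:
    say_hello()

# Register with Prefect Cloud; an in-cluster agent picks up scheduled runs.
flow.register(project_name="carbonplan")
```

The operational piece on our side is mostly keeping the agent itself running and authenticated.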
### Conclusion
- It's fine for us to run Prefect agents; it shouldn't be a ton of work
- But we shouldn't put a ton of time into automating this
- Maybe a small bit of documentation
- Longer term we'll need a clearer story for "batch-style workflows"
## Scoping the Azure Carbon Plan deployment needs
Is there anything fancy needed there? Would a "minimum viable collaboration" be for us to just deploy an Azure hub for Carbon Plan so that we can start to play around there?
## Multi-cloud spawning
Could you run a single hub on one cloud provider that could **spawn** user sessions on other cloud providers?
- Right now you can choose spawner variables like RAM, CPU, environment, etc.
- What if you could also choose the cloud provider to spawn into? (see the sketch below)
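To ground this: KubeSpawner's `profile_list` is how users choose those variables today, within a single cluster. A minimal sketch (profile names and resource limits are illustrative) shows the shape of the option a "cloud provider" choice would extend:

```python
# jupyterhub_config.py -- KubeSpawner profiles as they work today, within a
# single cluster. Display names and resource limits here are illustrative.
c.KubeSpawner.profile_list = [
    {
        "display_name": "Small (2 CPU, 4 GB RAM)",
        "default": True,
        "kubespawner_override": {"cpu_limit": 2, "mem_limit": "4G"},
    },
    {
        "display_name": "Large (8 CPU, 16 GB RAM)",
        "kubespawner_override": {"cpu_limit": 8, "mem_limit": "16G"},
    },
    # Hypothetical third axis: selecting *which cloud* to spawn into. That
    # can't be expressed as a kubespawner_override today -- it would need a
    # spawner that knows how to talk to more than one cluster.
]
```

Everything above the hypothetical comment is standard KubeSpawner configuration; the multi-cloud part is the new development.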
## Alignment with pre-existing workstreams
We should figure out whether some of this work is aligned enough with people's pre-existing workstreams, or whether it'd be a new development path (e.g., Erik is already quite interested in both Conda Store and "batch computation" style work, Sarah is more familiar with Azure in general, Yuvi really wants the "binder in JupyterHub" functionality to happen, etc.).
## Next steps
- Chris writes up these notes in something like a "Statement of Work"
- Short-term deliverable that Joe needs: a hub running on Azure
- Medium-term, they'll need GPU support
- Long-term deliverable: still to be defined