Some technical notes about the deployment of OGGM on the cloud.
Context:
Here are some notes I gathered after playing around with JupyterHub a little bit myself. They will be refined as I/we learn more about how to do it properly. Comments welcome!
From the proposal:
Set-up and deploy the Open Global Glacier Model in a scalable cloud environment, and make this computing environment available for everyone.
…
We envision an online platform where people can log-in and get access to a fully functional computing environment where they can run the OGGM model. This environment will scale according to resources demand. It will be personalized and persistent, so that a user can prepare some computations, start them, log out, then log back in and still find the computing environment he or she left earlier. The advantages for the user will be important: scalable computing resources, no installation burden, no data download, no version issues, user-friendly development environment, all in a web browser.
More concretely: we wish to have a JupyterHub installed on our own cloud resources with a working Python environment where OGGM can run. User authentication will be handled the same way as Pangeo does, i.e. via a GitHub organisation. According to Ryan, the sessions can be persistent, i.e. users will develop a script or a notebook, let it run, log out, come back and see the results.
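To make the GitHub-organisation login a bit more concrete, here is a minimal sketch of what the relevant part of a `jupyterhub_config.py` might look like with the `oauthenticator` package. The helper names, the organisation name, the callback URL and the environment variable names are all placeholders I made up for illustration, and the exact trait names (e.g. `allowed_organizations`) differ between `oauthenticator` versions, so this needs checking against the version actually installed.

```python
import os


def github_auth_settings(organisation, callback_url):
    """Collect GitHubOAuthenticator settings that restrict login to one organisation.

    The trait names follow recent oauthenticator releases; older releases
    use different names, so treat them as an assumption to verify.
    """
    return {
        "oauth_callback_url": callback_url,
        # Credentials of the GitHub OAuth app, read from (hypothetical) env variables.
        "client_id": os.environ.get("GITHUB_CLIENT_ID", ""),
        "client_secret": os.environ.get("GITHUB_CLIENT_SECRET", ""),
        # Only members of this organisation may log in.
        "allowed_organizations": {organisation},
        # Needed so that the hub can read organisation membership.
        "scope": ["read:org"],
    }


def apply_github_auth(c, organisation, callback_url):
    """Apply the settings to a JupyterHub config object `c`."""
    c.JupyterHub.authenticator_class = "github"
    for name, value in github_auth_settings(organisation, callback_url).items():
        setattr(c.GitHubOAuthenticator, name, value)


# Inside jupyterhub_config.py one would then write something like:
#   c = get_config()
#   apply_github_auth(c, "my-org", "https://hub.example.org/hub/oauth_callback")
```

Keeping the settings in a plain dictionary makes it easy to inspect them without a running hub. On a Kubernetes deployment the same values would more likely go into the Helm chart configuration than into a hand-written config file.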
I need to have a working prototype by mid-July 2019 because I promised this for an invited talk I will give in Montreal (this is usually how I get things done: promising things before they actually exist). Details (see below) can be dealt with later, in the longer run.
Before we start, here are some definitions for the newcomers:
I need to get access to free cloud resources. This should not be an issue, and I will write a proposal ASAP. I have no preference for any cloud provider, but my first experience with Azure was not very good. I've heard good things about Digital Ocean (see this) and Pangeo uses Google.
Which providers should we choose? Here's a wishlist:
Can someone help me to choose here?
This is going to be quite easy I think. We need to provide JupyterHub with Docker containers where OGGM can run. We have good installation instructions, and I have been able to use repo2docker to build a working environment for OGGM-Edu: see this.
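A quick way to sanity-check such a container image is to run a small script inside it that verifies the packages OGGM relies on can be found. This is only a sketch: the helper name `missing_packages` and the package list are my own choices for illustration, and the list is not the complete set of OGGM dependencies.

```python
import importlib.util

# Illustrative subset of packages expected in the image (not exhaustive).
EXPECTED_PACKAGES = ["numpy", "scipy", "pandas", "xarray", "oggm"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    # find_spec only locates the package; it does not import it, so this stays fast.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(EXPECTED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All expected packages were found.")
```

Running this as the last step of the image build, or as the first cell of a test notebook, would catch a broken environment before any user logs in.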
Here, we will have to see how we can use the Pangeo way of doing things to be more efficient and follow their protocol. These links provide more context about things I don't fully understand yet:
This is where I don't know yet what this is going to look like. We will have to follow the instructions on Pangeo (http://pangeo.io/setup_guides/index.html) and maybe start smaller with Zero2JupyterHub. This is where Kubernetes and Helm come into play.
This is where I'll need most help!
As far as I understand, most of the things I have described so far are similar to a "standard JupyterHub set-up with a Pangeo flavor".
In practice, there will be interesting questions related to OGGM itself: