---
tags: 2i2c, SOLyR
---
# Critical Mass for SOLyR Ignition
> The notes below synthesize ideas from conversations, discussion threads, Slack exchanges,..., with many people. I was activated to start this document after Fernando Perez shared an Apple Notes file with me earlier today. I invite others to correct and improve these notes. --J. Colliander, 2022-07-08
Over the past two years, members of the 2i2c team, together with our collaborators, have envisioned improved systems to support collaborative data-intensive research, learning, and knowledge sharing. The time is right to move forward: let's start a process to design and build a prototype.
## Problems
>"Wait, where's my #2 Phillips, the one with the blue handle...not in this drawer...WTF!?" Imagine the frustration an auto mechanic would feel if she had to change her tool collection every time she started work on another car.
1. **One person, too many hubs.** A single person can belong to multiple communities, with each community collaborating on that community's JupyterHub. The person may develop a nice customization (e.g. dotfiles) inside the `Hub A` they use with `Community A` and then encounter frustration when that customization is not available inside the `Hub B` they use with `Community B`. Personalized tools should follow the person.
2. **One person, too many projects.** A single person can work on multiple projects. Consider an atmospheric chemist who works on sun-climate interactions in the thermosphere. One of her projects may focus on coronal mass ejections and use tools from heliophysics. Another may focus on chemical reactions in the upper atmosphere and use tools from chemistry and climate science. She wants to work on both projects in the same JupyterHub and switch tools easily when she switches projects, but the software environment is controlled at the hub level. Any project folder should be interactively openable with the curated tools for that project, à la Binder. Can Binder's flexibility to support multiple software environments on the fly be brought into JupyterHub at the folder/project level?
3. **Bikes are sometimes better than jets.** Cloud-hosted, Kubernetes-backed JupyterHubs typically deliver a nearly uniform hardware resource to all users. Small tasks (making a few edits) that could be done on small hardware must instead be done on that uniform resource. Offering a palette of hardware profiles would let users select resources appropriate to the task at hand.
4. **Binder is slow.** Binder launches involve long waits while Docker images are built. Imagine a "Pareto improved" Binder service where pre-built images serve 80% of the usage requests instantly and the remaining 20% merely require some small customization added to a pre-built image. Reproducibility can be enhanced using a registry of versioned pre-built images (e.g. `pangeo.io::ocean-base::2021.0.2`) with nearly instant launch times.
5. **Version control sometimes sucks.** Technically capable [naysayers dismissed Dropbox](https://news.ycombinator.com/item?id=6625306) as unnecessary. Who needs Dropbox when we already have `svn` and `mercurial`? The seamlessness created by Dropbox consumerized version control for many users of file systems. The current Jupyter ecosystem expects savvy users to know and use `git`. We can do better! Innovations like [nbgitpuller](https://jupyterhub.github.io/nbgitpuller/) are awesome but should be considered first steps rather than final ones. Traditional usage of `git` on GitHub involves persistent passwords, which are less secure than ephemeral token-based auth backed up by 2FA. Yuvi Panda's innovative [gh-scoped-creds](https://blog.jupyter.org/securely-pushing-to-github-from-a-jupyterhub-3ee42dfdc54f) provides a simple-to-use notebook that improves security and makes it easier to use `git` from a cloud-hosted Jupyter session. How should `git` be integrated into JupyterLab?
6. **Publishing should be more automatic.** A reasonably structured project folder with some config files, directories and notebook files should be easily rendered as a Jupyter Book. People with the right chops can convert a slick visualization obtained in a notebook into a stand-alone web app using Voila. We should make it super easy for users to export from their interactive collaborative hub view into stand-alone sites, either static or, more ambitiously, interactive via Thebe/Binder.
7. What are the other problems we should try to solve?
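
The palette of hardware profiles imagined in problem 3 can be sketched with the `profile_list` option that KubeSpawner already provides. The sketch below is only an illustration under stated assumptions: the profile names, slugs, and resource sizes are made up for this note, and `default_profile` is a hypothetical helper, not part of any library. In a real `jupyterhub_config.py`, the list would be assigned to `c.KubeSpawner.profile_list`.

```python
# Illustrative hardware profiles. The names, slugs, and sizes are assumptions
# made up for this sketch; only the dictionary keys follow KubeSpawner's
# profile_list format.
profile_list = [
    {
        "display_name": "Bike (small)",
        "slug": "bike",
        "description": "Quick edits and light browsing",
        "default": True,
        "kubespawner_override": {"cpu_limit": 0.5, "mem_limit": "1G"},
    },
    {
        "display_name": "Car (medium)",
        "slug": "car",
        "description": "Everyday analysis",
        "kubespawner_override": {"cpu_limit": 2, "mem_limit": "8G"},
    },
    {
        "display_name": "Jet (large)",
        "slug": "jet",
        "description": "Heavy, data-intensive computation",
        "kubespawner_override": {"cpu_limit": 8, "mem_limit": "32G"},
    },
]


def default_profile(profiles):
    """Hypothetical helper: return the profile marked default, else the first."""
    for profile in profiles:
        if profile.get("default"):
            return profile
    return profiles[0]


# In jupyterhub_config.py (where `c` is the config object) one would write:
#     c.KubeSpawner.profile_list = profile_list
```

With a list like this, users would see a selection page at spawn time and could pick the bike for small edits and the jet only when a task needs it.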
## Toward Solutions?
> "Premature optimization is the root of all evil." -- [Sir Tony Hoare, popularized by Donald Knuth](https://ubiquity.acm.org/article.cfm?id=1513451).
The design space for finding solutions to problems like those listed above is big. I reckon that space is bigger than big when imagined by the creative pioneers on our team who have built the foundations of the Jupyter ecosystem. The ultimate hotness might involve next version react.js for this and rust-lang for that. It could be super awesome! I _strongly_ suggest we take a different approach. What we have right now is already awesome. I suggest we can make it a lot better with clever opinionated integrations and proof-of-concept almost-viable-products (AVP) developed first as notebooks.
### Managing role conflicts
Our team includes engineers who serve on the upstream Jupyter core team. There are possible conflicts between the questions "What's best for Jupyter?" and "What's best for 2i2c?" Similar collisions may occur between Binder and 2i2c, Jupyter Book and 2i2c, etc. Real and potential opponents to 2i2c (Saturn Cloud, Quansight, Colab, GitHub/Microsoft, CoCalc, IBL Education, Coiled,...) may simultaneously be natural allies to Jupyter core. Do others on our team perceive this tension? Does this potential conflict between roles affect our capacity to create?
I hope that these conflicts can be resolved by replacing "What's best for 2i2c?" with deeper, more mission-focused questions like "What's best for science?", "What's best for the user?", "What advances equity, diversity, and inclusivity in STEM?". 2i2c has a fabulous team of excellent engineers and contributors to this ecosystem. That said, there is a lot of talent at Google, at Coiled, at Quansight,..., so that's not the main differentiator. I view 2i2c as fundamentally different from other contributors to this space in mission, vision, values and our focus on (academic) research and education usage scenarios. The conflicts I've experienced so far have all been resolved by harkening back to the mission and finding a new perspective that resolves the conflict.
### User/community focused design
2i2c should catalyze a global open source collaborative effort to build a **S**ystem for **O**pen **L**earning and **R**esearch (SOLyR). This thing that we imagine, SOLyR, could be an incredibly valuable piece of intellectual property, but developing it that way goes against our mission. Instead, we aim to build a collective good consistent with the right to replicate and designed to render the commercial cloud vendors as essentially equal utility providers of compute/data/network.
The idea to christen this thing as `SOLyR` is due to Fernando Perez. The name pays homage to Sun Solaris ("the network is the computer"). An organized collection of Jupyter planets orbiting together around a common purpose is a SOLyR system. I made up the acronym above and like the way the `y` echoes the leitmotif in Jupyter and evokes "and" in Spanish as a celebratory 👏 at Fernando.
SOLyR has sometimes been described as "the cloud operating system for data-intensive computational investigation". Traditional operating systems come in two main flavors: desktop (e.g. Windows, macOS, Linux, ...) and mobile (iOS, Android). Mobile operating systems tend to support applications that kinda have their own file systems (Spotify files can't really be accessed by Google Maps). Desktop operating systems have applications that interoperate on top of a shared file system. These traditional operating systems are typically deployed on a single computer in order to help a single user get work done. The cloud operating system uses multiple computers to help users, working alone or in teams, get work done. The files-attached-to-apps idea in the mobile OS is kinda similar to the imagined repo-with-associated-software-environment idea explored in problem 2 above.
The experience of the Eddy Symposium (in harmony with many other recent examples, perhaps most strikingly Pangeo) shows that scientific collaboration is already transformed. Groups of people are getting collaborative work done using multiple computers stitched together with the Jupyter ecosystem. SOLyR, as a name and concept, gives 2i2c a blank canvas to imagine a future free from core team obligations, NumPy governance, etc.
### Almost Viable Products $\rightarrow$ Already Visionary Products
How far can we go toward finding solutions without building anything new in JupyterLab? I think we can do a lot with the paints already available on our design palette. I am inspired by gh-scoped-creds and the way the `shared` and `shared-readwrite` directories were used at the Eddy Symposium. Notebooks are proto-webapps via Voila. The flexibility to deliver a full-blown Linux desktop via JupyterHub, launch another IDE (VS Code) from within JupyterLab, host video conferencing via Jitsi, ..., is already available. This [pull request](https://github.com/jack-eddy-symposium/exoplanetary-impact/pull/2) made by Fernando showed the Eddy Symposium participants that it is possible to automate Jupyter Book creation using GitHub Actions. Imagine what we might do as a team and as a catalyzing force for a global open source community. Let's build a new sun that all the Jupyters can orbit around.