# Collaboration in Jupyter Initiative
The goal is to rally core developer/design efforts around a common vision: **enhance the "collaborating on Jupyter notebooks" experience**. This involves three high-level deliverables (expanded below):
1. Commenting in Notebook (and other docs)
2. Shared/Multi-tenant Jupyter Server
3. Real-Time Collaboration
To achieve these deliverables, we'll need to coordinate efforts across multiple subprojects.
*How will we achieve this practically?*
Meet regularly (bi-weekly) to coordinate development happening in multiple subprojects. This initiative requires changes across the entire Jupyter stack (nbformat, kernels, Jupyter Server, JupyterLab, nteract, Voilà, JupyterHub). Right now, we have meetings for subprojects, but we don't have a place to discuss broad, cross-project initiatives like this.
## Meeting Purpose
These meetings are a place to connect people working on different aspects/pieces of the same core deliverables (see deliverables below).
We will keep discussions and decisions high-level. The purpose is to report progress made by various subprojects towards achieving the core deliverable. If deeper discussion about low-level technical decisions is needed, interested people should form separate submeetings as needed.
Let's always be asking "**What’s next?**". We should finish a discussion by evaluating together what our next step is. Let's keep the momentum going! Smaller iterations are big wins!
## Core deliverables
### 1. Commenting in Notebook
#### What is this?
* Commenting on notebooks
    * Notebook level (annotations)
    * Cell and output level
    * Inside cells (i.e. character-level comments)
* Comments are persistent and travel with the notebook
* Commenting on other document types (CSV, txt, etc.)
* Who commented? (i.e. identity)
#### What subprojects?
<details>
<summary>
nbformat, jupyter_server, jupyterlab, nteract, jupyterhub. <i>(expand for details)</i>
</summary>

* nbformat:
    * Possibly standardize a comments field in nbformat
* jupyter server:
    * Provides identity
    * Stores comments, either through a new endpoint or the existing Contents API
* jupyterlab and nteract:
    * UI/UX for comments
    * UI/UX for commenting on other document types
* jupyterhub:
    * Implementation of an identity provider for Jupyter
</details>
#### Known questions:
<details>
<summary>
Expand for details...
</summary>

* Where are comments stored (driven by user stories)?
    * How do we unify comment storage for notebooks AND other document formats (e.g. CSV, txt)? How does adding more notebook metadata affect version control?
* How are comments stored?
    * Inside the notebook JSON? What about other document types? What happens when comments are more granular than cells (i.e. character level)?
    * Datastore for RTC?
* Does this require changes to nbformat?
    * Ideally, the notebook comment JSON would be standardized so that multiple frontends can leverage this piece of notebook data.
</details>
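
One possible answer to the "inside the notebook JSON?" question, sketched under loud assumptions: comments live in each cell's `metadata` under a hypothetical `comments` key (nbformat does not define such a field today), each comment records the commenter's identity, and an optional character range covers inside-cell comments. The `add_comment` helper and the comment schema are illustrative only; the notebook is a hand-built dict shaped like nbformat v4.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional, Tuple

# Hypothetical key: nbformat does NOT define a "comments" field today.
COMMENTS_KEY = "comments"


def add_comment(
    notebook: dict,
    cell_index: int,
    author_id: str,
    text: str,
    span: Optional[Tuple[int, int]] = None,
) -> dict:
    """Attach a comment to one cell (hypothetical schema).

    `span` is an optional (start, end) character range within the cell
    source, for comments more granular than a whole cell.
    """
    cell = notebook["cells"][cell_index]
    comment = {
        "id": uuid.uuid4().hex,
        "author": author_id,  # assumed to be the server-provided identity
        "created": datetime.now(timezone.utc).isoformat(),
        "text": text,
    }
    if span is not None:
        # nbformat allows `source` to be a string or a list of strings.
        raw = cell["source"]
        source = "".join(raw) if isinstance(raw, list) else raw
        start, end = span
        if not 0 <= start <= end <= len(source):
            raise ValueError("span is outside the cell source")
        comment["span"] = [start, end]
    cell.setdefault("metadata", {}).setdefault(COMMENTS_KEY, []).append(comment)
    return comment


# A minimal nbformat-v4-shaped notebook, built by hand for illustration.
nb = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "id": "a1",
            "metadata": {},
            "source": "x = 1\nprint(x)",
            "outputs": [],
            "execution_count": None,
        }
    ],
}

# Comment on the characters "print(x)" (offsets 6..14) in the first cell.
add_comment(nb, 0, "user-123", "Why print here?", span=(6, 14))
print(json.dumps(nb["cells"][0]["metadata"], indent=1))
```

Storing comments in cell metadata makes them travel with the `.ipynb` file, but every new comment also changes the file's diff, and character spans go stale as soon as the source is edited. Those trade-offs are exactly what the storage and RTC questions above need to settle.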
#### Current Technical work:
* [JupyterLab commenting extension](https://github.com/jupyterlab/jupyterlab-commenting).
* [JupyterLab RTC plans](https://github.com/jupyterlab/rtc).
### 2. Shared/Multi-tenant Jupyter Workspace
#### What is this?
* Think “Jupyter Drives”
* Shared and remote services/resources—i.e. shared kernels, contents, and terminals.
* Access controlled/provided by authorization providers (e.g. JupyterHub, GitHub, Google, etc.).
#### What subprojects?
<details>
<summary>
jupyter_server, jupyterlab, nteract, jupyterhub, jupyter_kernel_mgmt, jupyter_protocol, jupyter-fs <i>(expand for details)</i>
</summary>

* jupyter server:
    * Provides identity
    * Plumbing for authentication and authorization
* jupyterlab and nteract:
    * UI/UX to surface the identity of current users
    * Login/logout UI
* jupyterhub:
    * Default implementation of an identity provider and project dashboard for shared servers
    * Implement a user/organizational data model for mapping permissions to servers
</details>
#### Current Technical work:
<details>
<summary>
<a href="https://github.com/jupyter/jupyter_server/pull/165">Identity and Authorization in Jupyter Server</a><i> (expand for details)</i>
</summary>

* Read/Write/Execute control wrapping Jupyter's REST API
* Operators and extensions can easily plug in their own auth provider.
* Clients send along a special token; the server calls configurable functions that check whether an action is allowed, and provides an endpoint to get a persistent identifier for the user:
    * `def allowed(token, action, resource) -> bool`
    * `def persistent_id(token) -> str`
* The auth backend should say whether the token and ID are exposed to the kernel (as environment variables):
    * `JUPYTER_AUTH_TOKEN`
    * `JUPYTER_AUTH_ID`
</details>
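
The pluggable auth interface described above can be sketched in Python. This is a hedged illustration, not the API from the linked PR: the `AuthProvider` protocol, the `StaticAuthProvider` toy backend, the action names (`read`, `write`, `execute`), the prefix-based grants, and the `kernel_env` helper are all hypothetical. Only the two function shapes and the two environment-variable names come from this note.

```python
from typing import Dict, Protocol, Set, Tuple


class AuthProvider(Protocol):
    """Hypothetical shape of a pluggable auth backend."""

    expose_to_kernel: bool

    def allowed(self, token: str, action: str, resource: str) -> bool: ...

    def persistent_id(self, token: str) -> str: ...


class StaticAuthProvider:
    """Toy backend using in-memory tables (illustration only)."""

    def __init__(
        self,
        users: Dict[str, str],
        grants: Dict[str, Set[Tuple[str, str]]],
        expose_to_kernel: bool = False,
    ):
        self._users = users    # token -> persistent user id
        self._grants = grants  # user id -> {(action, resource prefix)}
        self.expose_to_kernel = expose_to_kernel

    def persistent_id(self, token: str) -> str:
        try:
            return self._users[token]
        except KeyError:
            raise PermissionError("unknown token") from None

    def allowed(self, token: str, action: str, resource: str) -> bool:
        user = self._users.get(token)
        if user is None:
            return False
        # Allowed if any grant matches the action and prefixes the resource.
        return any(
            a == action and resource.startswith(prefix)
            for a, prefix in self._grants.get(user, set())
        )


def kernel_env(provider: AuthProvider, token: str) -> Dict[str, str]:
    """Environment variables handed to a kernel, only if the backend opts in."""
    if not provider.expose_to_kernel:
        return {}
    return {
        "JUPYTER_AUTH_TOKEN": token,
        "JUPYTER_AUTH_ID": provider.persistent_id(token),
    }


auth = StaticAuthProvider(
    users={"tok-alice": "alice"},
    grants={"alice": {("read", "/api/contents/shared/"), ("execute", "/api/kernels/")}},
    expose_to_kernel=True,
)
print(auth.allowed("tok-alice", "read", "/api/contents/shared/report.ipynb"))   # True
print(auth.allowed("tok-alice", "write", "/api/contents/shared/report.ipynb"))  # False
print(kernel_env(auth, "tok-alice"))
```

Funneling every check through a single `allowed(token, action, resource)` call is what would let operators swap in their own backend (for example one backed by JupyterHub or an OAuth provider) without touching the REST handlers themselves.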
<details>
<summary>
<a href="">Remote asynchronous kernel lifecycles and management</a><i> (expand for details)</i>
</summary>

* In a collaborative Jupyter world, kernels are shared resources.
* In these cases, it makes sense for kernels to live separately from the server, i.e. remote kernels.
* These kernels should also manage their own lifecycle (start, restart, cull, configure, etc.) independently of the server.
* Current work
    * [Async Kernel Management in Jupyter Server](https://github.com/jupyter/jupyter_server/pull/191)
    * [Kernel Providers](https://github.com/jupyter/jupyter_server/pull/112)
</details>
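
To make "kernels manage themselves" concrete, here is a minimal asyncio sketch of a kernel registry that lives apart from the server: clients ask it to start kernels and report activity, and the registry itself culls kernels that have been idle longer than a timeout. `RemoteKernel`, `KernelRegistry`, and the idle-timeout policy are hypothetical; the kernels are simulated rather than launched through a real kernel provider.

```python
import asyncio
import time
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RemoteKernel:
    """Simulated stand-in for a kernel running outside the server."""

    name: str
    kernel_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    last_activity: float = field(default_factory=time.monotonic)
    alive: bool = False

    async def start(self) -> None:
        await asyncio.sleep(0)  # placeholder for an async remote launch
        self.alive = True

    async def shutdown(self) -> None:
        await asyncio.sleep(0)  # placeholder for an async shutdown request
        self.alive = False


class KernelRegistry:
    """Owns kernel lifecycles so several clients can share the kernels."""

    def __init__(self, idle_timeout: float):
        self.idle_timeout = idle_timeout
        self.kernels: Dict[str, RemoteKernel] = {}

    async def start_kernel(self, name: str) -> str:
        kernel = RemoteKernel(name)
        await kernel.start()
        self.kernels[kernel.kernel_id] = kernel
        return kernel.kernel_id

    def touch(self, kernel_id: str) -> None:
        """Record activity, e.g. an execute request from any collaborator."""
        self.kernels[kernel_id].last_activity = time.monotonic()

    async def cull_idle(self) -> List[str]:
        """Shut down kernels idle longer than the timeout; return their ids."""
        now = time.monotonic()
        idle = [k for k in self.kernels.values() if now - k.last_activity > self.idle_timeout]
        await asyncio.gather(*(k.shutdown() for k in idle))
        for k in idle:
            del self.kernels[k.kernel_id]
        return [k.kernel_id for k in idle]


async def demo() -> tuple:
    registry = KernelRegistry(idle_timeout=0.05)
    busy = await registry.start_kernel("python3")
    idle = await registry.start_kernel("python3")
    await asyncio.sleep(0.1)
    registry.touch(busy)  # only one kernel saw recent activity
    culled = await registry.cull_idle()
    return culled == [idle], list(registry.kernels) == [busy]


print(asyncio.run(demo()))  # (True, True)
```

In a shared deployment, `cull_idle` would presumably run periodically inside the kernel service itself, so that one user's server shutting down does not take a kernel that others are still using with it.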
<details>
<summary>
<a href="">Multiple, remote Contents Providers</a><i> (expand for details)</i>
</summary>

* Directories of notebooks and other files live elsewhere (e.g. AWS S3 buckets, Google Drive) but are accessed through Jupyter Server.
* The UI/UX for browsing these directories looks like a local filesystem.
* Kernels should be able to talk to these directories.
* Current work
    * [jupyter-fs](https://github.com/jpmorganchase/jupyter-fs)
    * [nteract/commuter](https://github.com/nteract/commuter)
</details>
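
A minimal sketch of how a single server could present several remote stores as one local-looking tree: each provider is mounted under a top-level name, and paths are routed by their first segment. This is loosely in the spirit of jupyter-fs but does not reproduce its API; `ContentsProvider`, `InMemoryProvider`, and `ContentsRouter` are hypothetical, and the in-memory provider stands in for a store such as S3 or Google Drive.

```python
from typing import Dict, List, Protocol, Tuple


class ContentsProvider(Protocol):
    """Hypothetical minimal interface for a remote contents store."""

    def list_dir(self, path: str) -> List[str]: ...

    def read(self, path: str) -> str: ...


class InMemoryProvider:
    """Stand-in for a remote store such as an S3 bucket (illustrative)."""

    def __init__(self, files: Dict[str, str]):
        self.files = files  # relative path -> text content

    def list_dir(self, path: str) -> List[str]:
        rel = path.strip("/")
        prefix = f"{rel}/" if rel else ""
        # Immediate children: first path segment after the prefix.
        names = {p[len(prefix):].split("/", 1)[0] for p in self.files if p.startswith(prefix)}
        return sorted(names)

    def read(self, path: str) -> str:
        return self.files[path.strip("/")]


class ContentsRouter:
    """Routes "<mount>/<rest>" paths to the provider mounted at <mount>."""

    def __init__(self, mounts: Dict[str, ContentsProvider]):
        self.mounts = mounts

    def _resolve(self, path: str) -> Tuple[ContentsProvider, str]:
        mount, _, rest = path.strip("/").partition("/")
        if mount not in self.mounts:
            raise FileNotFoundError(path)
        return self.mounts[mount], rest

    def list_dir(self, path: str = "") -> List[str]:
        if not path.strip("/"):
            return sorted(self.mounts)  # root shows one "folder" per mount
        provider, rest = self._resolve(path)
        return provider.list_dir(rest)

    def read(self, path: str) -> str:
        provider, rest = self._resolve(path)
        return provider.read(rest)


router = ContentsRouter({
    "s3-team": InMemoryProvider({"reports/q1.ipynb": "{}", "data.csv": "a,b\n1,2"}),
    "gdrive": InMemoryProvider({"notes.txt": "hello"}),
})
print(router.list_dir())                # ['gdrive', 's3-team']
print(router.list_dir("s3-team"))       # ['data.csv', 'reports']
print(router.read("gdrive/notes.txt"))  # hello
```

Because the router only needs a path, the same mount table could in principle be reused on the kernel side, which is one way to approach "kernels should be able to talk to these directories".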
### 3. Real-Time Collaboration
#### What is this?
* Real-time editing of Jupyter notebooks by multiple users
* A shared Jupyter server that receives changes from multiple clients
#### What subprojects?
* jupyter_server, jupyterlab, nteract, jupyterhub
#### Current technical work:
* [jupyterlab-rtc](https://github.com/jupyterlab/rtc)
## People
This is a tentative list of people I (Zach) thought might be interested in joining these meetings. Please feel free to add yourself to this list.
* Zach Sailer (Jupyter Server | Jupyter Cal Poly)
* Saul Shanabrook (JupyterLab | Quansight)
* Sylvain Corlay (Jupyter | QuantStack)
* Eric Charles (Jupyter Server + JupyterLab | Datalayer)
* Kevin Bates (Jupyter Server + Kernel Management | IBM)
* Luciano Resende (JupyterLab + Jupyter Server + Kernel Management | IBM)
* Chris Holdgraf (JupyterHub | UC Berkeley)
* Lindsey Heagy (JupyterHub | UC Berkeley)
* Min Ragan-Kelley (JupyterHub | Simula)
* Steve Silvester (JupyterLab | AWS)
* Safia Abdalla (nteract | Microsoft)
* Matthew Seal (nteract, jupyter_core, nbconvert, papermill | Netflix)
* Max Klein (Jupyter Server | J.P. Morgan Chase)
* Vidar Fauske (JupyterLab | J.P. Morgan Chase)
* Brian Granger (Jupyter | AWS)
* Fernando Perez (Jupyter | UC Berkeley)
* Jason Grout (JupyterLab | Bloomberg)
* A. T. Darian (JupyterLab | Two Sigma)
* Tim George (JupyterLab | Cal Poly Jupyter)
* Erik Sundell (JupyterHub | Sundell Open Source Consulting AB)
* Shiti Saxena
## Relevant subprojects
* [nbformat](https://github.com/jupyter/nbformat)
    * The notebook JSON format.
* [jupyter kernel management](https://github.com/takluyver/jupyter_kernel_mgmt)
    * The future of kernel management, built around kernel providers.
* [jupyter_protocol](https://github.com/takluyver/jupyter_protocol)
    * A new implementation of the Jupyter protocol.
* [Jupyter Server](https://github.com/jupyter/jupyter_server)
    * The default implementation of the web server and Jupyter's REST API.
* [JupyterHub](https://github.com/jupyterhub/jupyterhub)
    * Identity provider for Jupyter Server.
    * Proxies requests to Jupyter Server and patches in identity.
    * Also provides spawners and authenticators for single-user servers.
* [jupyter remote file system](https://github.com/jpmorganchase/jupyter-fs)
    * Filesystem contents manager for remote contents providers.
* [nteract/commuter](https://github.com/nteract/commuter)
    * AWS S3 remote contents provider.
* JupyterLab and nteract
    * Clients that will surface the new collaborative features of the underlying protocols.
## Funding sources
Below is a list of grants whose deliverables overlap with this initiative.
* Enabling Safe Access to Sensitive Data (Sloan | Jupyter Cal Poly)
    * Shared Project Workspace
        * Will use the shared server, commenting, and remote kernels pieces.
    * Data models for teams/organizations
        * Uses the shared server piece.
* Collaborating in Jupyter (Schmidt | Jupyter Cal Poly)
    * Commenting on notebooks and datasets
    * Data registry, i.e. shared datasets between multiple users.
* Jupyter Meets the Earth (UCB)
* Chan Zuckerberg Initiative Grant on Real-time Collaboration in Jupyter.
    * https://chanzuckerberg.com/eoss/proposals/real-time-collaboration-in-jupyter/