---
tags: OpenDreamKit, Jupyter, funding
---
# Brainstorm on an e-infrastructure grant for Jupyter
## About
**Context:** The [OpenDreamKit](https://www.opendreamkit.org) project will be winding down in 2019 ([original proposal](https://github.com/OpenDreamKit/OpenDreamKit/blob/master/Proposal/proposal-www.pdf)), and we are looking for future funding opportunities. This page hosts a collective brainstorm toward a proposal centered on the Jupyter ecosystem for the following call.
**Call:** [INFRAEOSC-02-2019](http://ec.europa.eu/research/participants/portal/desktop/en/opportunities/h2020/topics/infraeosc-02-2019.html), which is about proposing services for the [European Open Science Cloud](http://eosc.eu) (EOSC).
**Call coordinator:** [OpenDreamKit](https://www.opendreamkit.org)'s former project officer Georgia (she sounded enthusiastic about us joining an application)
**Leader:** Min Ragan-Kelley
**Deadline:** Late January 2019!!!
**What for:** supporting the Jupyter ecosystem at large for science
**Story:** offering Jupyter-based services for the EOSC, with demonstrators in various areas of science
**Some of the key call criteria:**
- [x] Participation of small companies; possibly including offering of commercial services
- [x] Multidisciplinary
- [x] [Technology Readiness Level (TRL)](https://en.wikipedia.org/wiki/Technology_readiness_level): the tools should be at TRL 6 (prototypes) at the beginning of the project and TRL 8 (production) by the end. Arguably, most Jupyter tools are already at TRL 6, with many pieces (e.g. Binder) already at TRL 8.
### Proposal Name
We will need to find a name for this proposal; we may -- or not -- want to build on ODK's reputation, in addition to that of Jupyter.
Keywords: services, Jupyter, VRE/DRE, toolkit, open science, flexible, ...
Suggestions:
- Jovyan's Dream (Digital Research Environment for All M?????)
- Jovyan's OSS (Open Science Services)
- BOSSEE (Building Open Science Services on European E-infrastructure) ?
## Building a shared vision
TODO: write a few paragraphs for each item below (possibly merge with the above section)
### Who are we?
### What is our spirit?
### What is our goal?
### What is our strategy?
### From where do we start?
### How do we connect to or differ from other projects?
### Why are we excellent?
## Narrowing the proposal
Thanks everyone for the ideas! I think we have some really good ideas here,
and we just need to narrow it down for a final proposal.
This is a smaller-scoped project than OpenDreamKit,
so it cannot cover as many broad categories of work.
I think we have three broad categories:
1. Improving and developing Jupyter-based services for open science. This includes the following broad tasks:
    - Maintaining and developing Jupyter kernels and infrastructure (e.g. xeus, widgets, visualization)
    - Developing the JupyterHub and Binder software for operating these services
    - Developing repo2docker for automating reproducible environments
    - Federating Binder across multiple deployments
    - Deploying a BinderHub on EOSC
    - Collaboration and advanced interfaces in Jupyter
2. Applications of these services (Demonstrators/motivators)
    - Enabling citizens to work with personal data (PersonalData)
    - Photon Science (XFEL)
    - Life Sciences application (INSERM?)
    - Math visualization demonstrator (UPSud)
    - More here (ESA?)?
3. Education and Dissemination (helping scientists to do open science)
    - Running workshops and training
    - Developing online materials about best practices
    - Applications of Jupyter in university education (UPSud)
In particular, ideas from the brainstorm that I think don't fit into this structure are:
1. simulagora on EOSC
2. data storage and semantic search
3. Open Text Mining
because I think they spread the proposal in too many directions.
I'll be putting together a repo for the proposal shortly at https://github.com/bossee-project/proposal
> [name=Min RK]
## Potential partners
Just brainstorming here; Min, feel free to jump in the discussion
- LRI, Université Paris Sud? In addition to the usual suspects, Nicolas contacted the Human-Computer Interaction team (that e.g. runs Wilder's huge screen and already uses Jupyter technology)
- Gent?
- [Simula](https://www.simula.no/)
- [EGI](https://www.egi.eu/)
- [CERN?](https://home.cern/) -- see e.g. [SWAN](https://swan.web.cern.ch/); ongoing discussions between Min, Nicolas, Sylvain Corlay and Vassil Vassilev.
- [INSERM](https://www.inserm.fr/) -- French national institute for medical research: Nicolas contacted Isabelle Perseil
- [QuantStack](http://quantstack.net/) -- company behind Xeus-Cling, led by Sylvain Corlay (one of the recipients of the ACM award): ongoing discussions
- [Logilab](https://www.logilab.fr/)
- [FAU Erlangen--Nürnberg](https://www.fau.eu/); see task descriptions on [data storage](/vU666B6hQOOdlhF0176Lug) and [search](/is4LnLIzTlON4Pe-5Q6l4w). @Min, feel free to ask or strike this item.
- [ESA](https://www.esa.int/ESA) -- European Space Agency: @embray has contacts
- [ESO](https://www.eso.org/public/) -- European Southern Observatory: @embray has contacts
- [University of Silesia](http://us.edu.pl)
- PersonalData.io (Paul-Olivier Dehaye, nonprofit)
- [SWITCH Engines: NO](https://www.switch.ch/engines/) a bit like EGI but for Switzerland. Paul is working on this. Answer: NO
- [EPFL HPC lab: NO](@hpcepfl) Paul is working on this. response: NO "I would, however, be open to help as a third party if you need some support for privacy technologies."
- [U Geneva ?](https://iss.unige.ch/staff/morin-jean-henry/) for security issues, ongoing
- [ITEMM](http://itemm.fr/itemm/) -- Olivier (from Logilab) talked with someone from this French laboratory; they might be interested in porting an application dedicated to the design and engineering of musical instruments to the EOSC cloud. Logilab and ITEMM have worked together to deploy such an application (which is already used by craftsmen), but it lacks an ergonomic GUI and would benefit from being ported to the cloud.
- Suggestion by our reviewer Maria: [OpenAIRE](https://www.openaire.eu/), "an EU organisation to facilitate openness in scholarly communication". Do we have contacts?
> [name=Florian] At the review meeting, I got the advice from reviewer Maria that we might want to include "Open Text Mining"
>
> [name=Olivier (Logilab)] Text mining? I.e. extracting "entities" such as persons or places from a text?
>
> [name=Samuel]
> - Quoting the [Wikipedia entry for Text mining](https://en.wikipedia.org/wiki/Text_mining#Scientific_literature_mining_and_academic_applications) (emphasis mine): "initiatives have been taken such as Nature's proposal for an **Open Text Mining Interface (OTMI)** and the National Institutes of Health's common Journal Publishing Document Type Definition (DTD) that would provide semantic cues to machines to answer specific queries contained within text without removing publisher barriers to public access".
## Open Proposal Writing: get involved!
Following the [tradition of OpenDreamKit](https://opendreamkit.org/2015/01/31/open-proposal-writing/), we are building this proposal bottom up, reaching for potential partners in the community.
In practice, we have set up this collaborative pad where everyone is
welcome to propose tentative tasks (see below).
Stand by your dream: assuming you had funding and manpower, what would be -- in your view -- its best use for the community? What key hurdles would you tackle? What striking demonstrator would you develop?
Once we have a collection of such tentative tasks, the next step will be to structure them and refine the scope and consortium to make a coherent story. This may involve dropping certain tasks or even participants. As much as possible the decision process will be collective, but ultimately this will be the project coordinator's call.
## Tentative tasks
For each task: create a separate pad, with
- task leader, participants (sites, people)
- estimate of resources needed: personnel (in Person Months), ...
- 3--4 paragraphs of description:
    - what to achieve
    - story of how it would contribute to the general project
    - where would funding be useful
The structuring by theme/work package below is completely tentative.
### Dissemination
- [Training and dissemination](https://hackmd.io/1s_3i3XjRsqTWbMGGFi_CA)
- [Teaching](https://hackmd.io/3iSfgzKURK-cfNZCVuCTIg)
- [Develop materials for best practices](https://hackmd.io/bEwTyNcoR8qZ-IxbnWVLGQ)
- [Run workshops to train scientists to do more reproducible research](https://hackmd.io/LwM6EsBDThePJqOJepVdsw)
### Infrastructure and deployment
- [Develop Binder, repo2docker facilities for reproducible publications](https://hackmd.io/G9RJcE7dRUm2zMUHJr1Tyg)
- [Binder Federation to distribute hosting across providers/Launch and maintain at least one European Binder instance](https://hackmd.io/Xtwd3GtUSBuJA2XHR_xcgA)
- [Binder/JupyterNB/JupyterHub as an "app store"](/Efij3rs8SBCJzfmTEgZWCg)
- Further work on EGI's deployment of BinderHub/JupyterHub
### Jupyter / User Interface / Visualization
- [Maintenance of math docker containers and kernels](https://hackmd.io/GSQ5HBgPSWuXWflIPXQQQw)
- [Visualization for mathematics: needs nice demonstrators!](https://hackmd.io/7x2pVrqHScyf6oxXsLihwg)
- [Jupyter 3-D visualization](https://hackmd.io/D4lGn-lnQfOHkBpgjnvpwA)
- [Further development of C++ kernel xeus-cling](https://hackmd.io/ZE_xks08TW2pqN_rbYPjBQ)
- [Binder-backed dashboards with Project Jupyter](https://hackmd.io/hO-63M0RRCmQwvY0przEqw)
- [Advanced interaction and collaboration for Jupyter](https://hackmd.io/In55QL9bQQqX3ZPqcAzHKA?both)
- Improvements to Jupyter's cell full screen / editor?
> [name=Sylvain Corlay] I added some things in the data section with respect to data decimation that may be relevant to this section.
>
> [name=Sylvain Corlay] Any interest in me adding general Jupyter Widgets roadmap items here?
### Data
> [name=nthiery] Data is an important aspect of EOSC in general (less so in this call?), so we definitely should say something about it in the proposal. However, other groups will have decade-long expertise in providing general-purpose data management infrastructure. I believe we should focus on showcasing that Jupyter services play well with existing data infrastructure, and maybe develop infrastructure in some specific areas of science where we have something very special to say.
>
- [Multi-Modal Semantic Search](/is4LnLIzTlON4Pe-5Q6l4w)
- [Unified Interoperable Storage for highly structured Data](/vU666B6hQOOdlhF0176Lug)
> [name=Sylvain Corlay] Another project from CERN that could be relevant for this section is [EOS](http://information-technology.web.cern.ch/services/eos-service), a distributed filesystem which has been used for other EU-level projects such as [JEODPP](https://cidportal.jrc.ec.europa.eu/home/).
>
> [name=Sylvain Corlay] GIS: PanGeo, JEODPP?
>
> Data: on-the-fly tile services for leaflet.
> Vector data: GeoPandas?
> NetCDF: Iris / xframe
> Distributed computing: Dask
> Data-decimation: Vaex
> ML: usecase machine learning on geographical data
> - satellite images: detect buildings, ships
>
> [name=Sylvain Corlay] Vaex - Data decimation
>
> - Dask-distributed support for vaex.
> Vaex core, file system: ~stable.
> Vaex server currently at prototype level.
> Remote data access in vaex.
>
> Dask-ml, Vaex-ml: infrastructure to perform machine learning with scikit-learn on large data sets.
>
### Demonstrators
- [Demonstrate use cases in research applications in widely different domains](https://hackmd.io/5_egUIDGRAe1HyyF2Hnl5g)
- [Port Simulagora to EOSC and implement use cases](https://hackmd.io/l2FCRsXVSCyiQRZ_LgrAcg)
- [Personal data processing pipelines](https://hackmd.io/GuO3cdfERa6qPNivynrudA)
- Something around nteract and [Netflix's data science pipelines](https://medium.com/netflix-techblog/notebook-innovation-591ee3221233)? But who should lead?
- Interactive data analysis and visualization for High Energy Physics: demonstrating ROOT + xeus-cling + Jupyter + Widgets + data
- security review?