# Build A Binder - EPFL
Shared notes from the workshop day.
## Introductions
Introduce yourself and add a short sentence about what you want to get out of this day.
Tim - spread knowledge about what mybinder.org can do and how to do it.
Bastian – help Tim in spreading knowledge and learn more about Binder himself.
Leopold, Giovanni, Sasha - working on the [Materials Cloud](https://www.materialscloud.org) web platform - looking to get to know the performance limits of mybinder / self-hosting / flexibility beyond jupyter
Vladimir - EPFL PhD student. Computer Science. Data Visualization course TA. Want to know how to embed interactive visualizations in mybinder.
Paolo - I teach DSP on campus and online and use notebooks for that. Always happy to learn more
Antoine - I work at the EPFL Library in the Research Data Management Team. We help researchers manage/store/publish data (and yes, code is data ;)
Elisa - PhD at King's College London, digital humanities. I would like to share interactive jupyter notebooks.
Maryam - I graduated as a microbial ecologist, and identify myself as part of the bioinformatics middle class. I want to share my jupyter notebooks and R markdown files with collaborators
Paul - I lead a nonprofit called PersonalData.IO that aims to help individuals and groups take back control over their personal data. People often ask me "why would they want to do that? What for?". I believe mybinder.org and similar systems can help decouple "taking back control of data" from "making something useful from that".
Cristina - I am a computer scientist at UZH. I use Jupyter notebooks in my daily work (to analyse data sets and for teaching). MyBinder looks very useful to share runnable code. I think scientific conferences should encourage paper submissions to have a repository that can launch binder -- it would also make the life of reviewers so much easier. I just sent an email to the [ISWC Resources](http://iswc2018.semanticweb.org/call-for-resources-track-papers/) program chairs to propose to include this in the list of recommendations for authors.
## Questions about Binder
Collect questions and answers about Binder here.
Q: ...
A: ...
Questions on the service:
- Q: will you stop the free service at some point/what are the plans for exiting the beta phase?
- the free service will remain as long as there's money, but the beta label could disappear at some time ;)
- currently funded by grant from the Moore foundation
- talking to Google wrt waiving fees, applying to other grants, perhaps under the umbrella of Software Carpentry
- looking for people to help run the service (PRs, ...)
- possible business models: become member of mybinder org, people pay for increased resources for repos
- Q: what about demoing web apps that aren't jupyter notebooks (plotly, flask, ...)?
- example: [Openrefine](https://github.com/OpenRefine/OpenRefine) see https://github.com/betatim/openrefineder
- unclear whether you can use entry points in a Dockerfile to point directly at a running web server
- but can construct URLs to link people to a different endpoint
- Q: where can you get in terms of startup times?
- launching it a bunch of times will distribute images over the cluster
- about 400 repos launched every week
- wrt adding "hot" containers: not currently a feature - open an issue to
- Q: Is Binder open source?
- yes: https://github.com/jupyterhub/binderhub/blob/master/LICENSE
- Q: timeout to "kill" binders/timeout policy? how do you detect "inactivity"?
- remove after 10-20 minutes of inactivity (notebook kernel inactive/no network activity)
- inactivity: no network traffic or no jupyter notebook kernel activity (not executing any cells, ...)
- Q: quick comparison with other "hosting" solutions (such as Azure for notebooks)?
- no login
- Q: has someone built a binder for Sandstorm?
- Q: what kind of partnerships are you considering?
- service provider
- head of federation
- similarity with Software Carpentry (sustained training)
- Q: where is mybinder.org deployed?
- Google Cloud Platform
- Q: I saw data2binder, what are all the XXX2binder or binder2XXX tools?
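
The answer above about linking people to a different endpoint can be sketched as a small URL builder. This is a minimal sketch, not an official API: the function name `binder_launch_url` is made up for illustration, and the `urlpath` value `openrefine` is an assumption about where that repository serves its app. The `v2/gh/<org>/<repo>/<ref>` pattern and the `filepath` parameter follow the mybinder.org links listed elsewhere in these notes.

```python
from urllib.parse import quote, urlencode


def binder_launch_url(org, repo, ref="master", urlpath=None, filepath=None,
                      host="https://mybinder.org"):
    """Build a launch link for a GitHub repository (hypothetical helper)."""
    url = f"{host}/v2/gh/{quote(org)}/{quote(repo)}/{quote(ref, safe='')}"
    params = {}
    if urlpath is not None:
        # Send the visitor to a different endpoint inside the running container.
        params["urlpath"] = urlpath
    if filepath is not None:
        # Open a specific notebook instead of the file browser.
        params["filepath"] = filepath
    if params:
        url += "?" + urlencode(params)
    return url


# "openrefine" is an assumed endpoint name, used here only as an example.
print(binder_launch_url("betatim", "openrefineder", urlpath="openrefine"))
print(binder_launch_url("AllenDowney", "ThinkDSP", filepath="code/cacophony.ipynb"))
```
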
Questions on deploying BinderHub:
- Q: Could we have significant speedups by running our own binderhub?
- Yes
- Q: How do you handle memory requirements - memory has to be replicated for each user, right? What are disk and memory available to each binder?
- They use highmem-8 Google Compute Engine machines (52GB RAM, 8 CPUs; on average 3-5 of them, autoscaling), with up to 100 pods on each. Every pod has a max memory of 2GB as well as a min guaranteed memory (Kubernetes takes care of this).
- On average a pod that just starts jupyter takes ~100MB RAM, and even with many users, if they don't have long-running simulations, the average CPU usage is close to zero, as most time is spent "reading/thinking" more than executing cells.
- Q: How complex is it to host a docker registry? Is it just a file repository (so we can run it on the object store) or is extra software needed?
- they use the registry provided by Google
- Q: Is it easy/scripted to deploy BinderHub on AWS / OpenStack?
- on kubernetes it's the click of a button
- on OpenStack it may be more fiddly
- need some features in a kubernetes cluster that are fiddly to get
- each openstack deployment is a bit different, makes it harder
- Q: Security issues: what about people starting the binder and then using it on the network for malicious usage?
- network connections are limited to http & https (no ftp/smtp/ssh)
- never received any angry letters from people being spammed by binderhub
- bandwidth throttled
- block some IPs due to bitcoin mining on binderhub (private blacklist)
- Q: Can we do Git push in Binder?
- No, unless you're ready to get your GitHub password stolen. myBinder policy: do not share on binder anything that you wouldn't share in a tweet
- Recommended way to get back changes: Download the notebook
- idea: let a "binder-bot" make a PR
- Q: could you add some kind of GitHub tipping system to your repositories to incentivize development?
- Q: usage limits?
- cannot have more than 100 live instances of the same repo
- some limits on the compute resources (2 cores, 2GB RAM)
- hard shutdown after ~8h
- if your needs exceed the limits, stop by the Gitter channel
- Q: mybinder resources?
- resources: 4x highmem8 nodes (goes up and down depending on usage, max seen: 5-6)
- highmem8: 8cores + 52GB of mem, 100 pods per node
- note: over 90% of binders can run on under 100MB of RAM
- Q: what about authentication?
- One of the Leibniz institutes is working on adding it
- might allow for user directories in the long run
- Q: How to work simultaneously in the same binder?
- just sharing the URL won't work
- need to get token from session and append to URL "token=..."
- Leibniz institute made a button for this (copy session link)
- Q: How can I start a service next to my notebook?
- A: there is a `start` script that is executed before your notebook, see https://github.com/jupyter/repo2docker/blob/master/tests/venv/start/start-script/start
- Q: What do I need to use my own `Dockerfile`?
- A: see https://mybinder.readthedocs.io/en/latest/dockerfile.html and look at the output of `repo2docker --debug --no-build ` for a repository without any special files.
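
A back-of-the-envelope check of the memory figures quoted in the answers above (52GB per node, up to 100 pods per node, 2GB limit per pod, ~100MB for a pod that just starts Jupyter). This is only a sketch: the variable names are made up for illustration, and 1GB is taken as 1024MB.

```python
# Figures quoted in the notes above; variable names are illustrative only.
NODE_RAM_MB = 52 * 1024   # highmem-8 node: 52GB RAM
PODS_PER_NODE = 100       # up to 100 pods per node
POD_LIMIT_MB = 2 * 1024   # max memory per pod: 2GB
IDLE_POD_MB = 100         # a pod that just starts Jupyter: ~100MB

# Memory needed if every pod used its full 2GB limit at once.
worst_case_mb = PODS_PER_NODE * POD_LIMIT_MB
# Memory needed if every pod only idles with a freshly started Jupyter.
typical_mb = PODS_PER_NODE * IDLE_POD_MB
# How far the node is overcommitted against the per-pod limit.
overcommit = worst_case_mb / NODE_RAM_MB
# How many pods fit if each one really used its full limit.
pods_at_limit = NODE_RAM_MB // POD_LIMIT_MB

print(f"worst case: {worst_case_mb / 1024:.0f}GB on a 52GB node")
print(f"typical:    {typical_mb / 1024:.1f}GB on a 52GB node")
print(f"overcommit factor against the 2GB limit: {overcommit:.2f}")
print(f"pods that fit at full limit: {pods_at_limit}")
```

Against the 2GB limit the node is overcommitted by a factor of roughly 3.8, which only works because most pods sit near the ~100MB idle figure.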
## Zero to Binder
A guide with words on how to set up a simple binder (with pointers to ideas):
https://github.com/Build-a-binder/build-a-binder.github.io/tree/master/workshop
## Links to cool things
Links to cool example projects:
Meta-list of cool examples: https://github.com/binder-examples/
RStudio https://github.com/binder-examples/r
ThinkDSP https://mybinder.org/v2/gh/AllenDowney/ThinkDSP/master?filepath=code%2Fcacophony.ipynb
Interactive documentation https://spacy.io/usage/linguistic-features
Textbooks with jupyter https://github.com/choldgraf/textbooks-with-jupyter
LibreTexts https://eng.libretexts.org/Textbook_Maps/Computer_Science/Map%3A_Python_for_Everybody_(Severance)/4%3A_Functions/4.04%3A_Random_numbers
appmode, dashboard like apps https://github.com/binder-examples/appmode
Electromagnetism with Jupyter https://em.geosci.xyz/index.html
openrefineder, completely different apps work as well https://github.com/betatim/openrefineder
Game theory in Binder https://github.com/drvinceknight/gt
OpenDreamKit example https://opendreamkit.org/2018/07/23/live-online-slides-with-sagemath-jupyter-rise-binder/
RISE demo with mybinder.org https://github.com/binder-examples/jupyter-rise
One Bit Music: https://mybinder.org/v2/gh/prandoni/COM303/master?filepath=OneBitMusic
I Feel Fine: https://mybinder.org/v2/gh/prandoni/COM303/master?filepath=I_feel_fine
## Break out session
Collecting ideas (for after lunch) for possible break out sessions you want to help run or take part in. Note that you can run a breakout on something you want to learn about, you don't need to know the answer!
Tim & Bastian are interested in how R users specify dependencies and what formats we should support.
- how to install specific versions of an R package: https://stackoverflow.com/a/29840882 (tldr: use `devtools`).
Tim: Check on status of the SPARQL endpoint for city of Zurich
Tim: find users from EPFL in the GA data, tell Antoine Masson
* https://github.com/mdeff/ntds_2017
Paul wants to do a practical use case: help Uber drivers analyse their own data, retrieved from Uber, by decoupling the analysis/Jupyter part from the data part
Vladimir:
* Interactive data visualization in a notebook (embed HTML & JavaScript code).
* Using external data sources in a notebook (databases, APIs). <-- and packaging for this to pass on environment variables
* https://github.com/binder-examples/getting-data
Maryam: Open science in practice -- sharing complete workflows / online collaboration through interactive notebooks.
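
For Vladimir's topic of external data sources and passing settings through environment variables, one possible shape is sketched below. It is a sketch under assumptions: the helper `database_settings` and the variable names `DATA_DB_HOST`, `DATA_DB_PORT` and `DATA_API_KEY` are hypothetical, and how the variables get set in a binder is left open. Following the "wouldn't share in a tweet" policy from the Q&A, real secrets do not belong in a public binder.

```python
import os


def database_settings(environ=os.environ):
    """Read connection settings from environment variables (hypothetical names)."""
    return {
        # Fall back to local defaults so the notebook also runs without setup.
        "host": environ.get("DATA_DB_HOST", "localhost"),
        "port": int(environ.get("DATA_DB_PORT", "5432")),
        # None when no key is provided; the notebook can then use public data only.
        "api_key": environ.get("DATA_API_KEY"),
    }


settings = database_settings()
print(settings["host"], settings["port"], settings["api_key"] is not None)
```
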
## Other cool stuff
At gesis.org, they have built three buttons:
share repo/build/session link
https://notebooks.gesis.org/binder/jupyter/user/binder-examples-requirements-bptc62oj/tree?token=7sZaiGYzT16qkjtVX85WDQ
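
The session link above has the shape described in the Q&A: the URL of the running session with `token=...` appended. A minimal sketch of building such a link is shown here; the helper name `session_share_link` and the example URL and token are made up for illustration, and how you obtain the token from your session is not covered.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def session_share_link(session_url, token):
    """Append (or replace) the token query parameter on a session URL."""
    parts = urlsplit(session_url)
    # Keep any other query parameters, drop an old token if one is present.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "token"]
    query.append(("token", token))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL and token, not a real session.
print(session_share_link("https://example.org/user/demo-abc123/tree", "s3cr3t"))
```

Anyone who receives the resulting link gets full access to the running session, so share it only with collaborators.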