# binder @ neurips
---
tags: binder, jupyterhub, blog
---
## Info
**Post issue**: https://github.com/jupyterhub/team-compass/issues/96
## Post text
The [NeurIPS](https://neurips.cc/) conference is one of the primary places for the
Machine Learning community to share ideas. Reproducibility is a hot topic
in all of the areas discussed at the conference and there is still much work
to be done to improve the
reproducibility of Machine Learning methods (see [this keynote talk for example](https://videos.videoken.com/index.php/videos/neurips-2018-invited-talk-on-reproducible-reusable-and-robust-reinforcement-learning/)).
The Binder Project is creating open, community-driven, and modular tools to facilitate
interactive and reproducible research in the cloud. This year, we sent a team to NeurIPS
to explore ways that Binder can serve the Machine Learning community. As a part of this,
we deployed a BinderHub running with considerably more resources (in particular, GPU access for
users) than our popular https://mybinder.org service. This post describes the challenges
and solutions we encountered in setting this up.
## The problem we wanted to solve
Reproducibility is challenging for researchers in a field like Machine Learning. New research often
relies on extremely large datasets with heavy computational demand. Combined with the increased
use of GPU processing, it is increasingly difficult to produce and reproduce machine learning results
on your own laptop. On the one hand there are proprietary solutions for doing ML work in the cloud, but these tools tend to lock customers into one particular cloud offering. Project Binder, on the other
hand, allows users to run many different kinds of workflows, with many different kinds of interfaces,
with whatever packages users want, on a deployment that can live on any cloud architecture that runs Kubernetes.
BinderHub (the main software component of Project Binder) makes it possible to deploy a web app (like the one at mybinder.org)
that can facilitate users sharing reproducible, interactive versions of their repositories. While
mybinder.org runs with limited resources, a BinderHub can be deployed by anyone on any cloud or on-premise hardware that runs Kubernetes. To show off this flexibility, the Binder team decided to create a demo for NeurIPS 2018 that shows how open tools can be used
to reproduce research-grade machine learning work in the cloud.
## The NeurIPS BinderHub deployment
Our goal was to create a BinderHub deployment with the following technical specifications:
* Users had access to eight CPU cores and 20GB of memory
* Users had access to a GPU
* Users could use [repo2docker](https://blog.jupyter.org/introducing-repo2docker-61a593c0752d) to specify the software that should be installed
We accomplished this by following a combination of these guides:
* The [Zero to JupyterHub guide](https://z2jh.jupyter.org) to set up a JupyterHub in the cloud
* Erik Sundell's [guide to GPU-enabled JupyterHub](https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/994#issue-373992464) for modifying the JupyterHub to work with K80s.
* The [Zero to BinderHub guide](https://binderhub.readthedocs.io/en/latest/) to enable a BinderHub to flexibly build repository images for use with the JupyterHub.
[Here's the BinderHub@NeurIPS helm chart](https://github.com/consideRatio/neurips.mybinder.org-deploy/blob/master/values.yaml) that we used to configure the
deployment on Kubernetes. Below is a description of some of the main
points we followed to get it working.
### Setting up a JupyterHub with GPU+CUDA support
We first set up a JupyterHub in the cloud on nodes that had GPUs
(K80s) attached to them. We did so following [this excellent guide](https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/994#issue-373992464) that covers all of the basics for getting this deployment working.
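
As a rough sketch of what the per-user resource settings can look like, the fragment below uses the `singleuser` options from the Zero to JupyterHub helm chart, nested under `binderhub.jupyterhub` in the same way as the pre-puller configuration later in this post. The values mirror the specifications listed above; this is an illustration under those assumptions, not a copy of the file we deployed, and the linked guide remains the reference for the full GPU setup (drivers, node pools, and so on).

```yaml
# Hypothetical values.yaml fragment (an assumption for illustration,
# not the exact configuration used for the NeurIPS demo).
binderhub:
  jupyterhub:
    singleuser:
      cpu:
        limit: 8            # eight CPU cores per user
      memory:
        limit: 20G          # 20GB of memory per user
      extraResource:
        limits:
          nvidia.com/gpu: "1"   # request one GPU per user pod
```

Because a GPU is requested per user pod, Kubernetes should only schedule user sessions onto nodes that advertise the `nvidia.com/gpu` resource.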
### Identifying machine learning repositories
We then identified several GitHub repositories that replicated
Machine Learning results submitted to NeurIPS. These generally required
hardware that is not standard on laptops (e.g., a GPU
to speed up computation). A list of example repositories is below:
- https://github.com/jzf2101/shap
- https://github.com/jzf2101/tutorial
- https://github.com/jzf2101/gan_tutorial
- https://github.com/jzf2101/mxnet-the-straight-dope
### Using repo2docker to build reproducible images that run CUDA
We needed to modify these repositories to ensure that they had
the proper CUDA environment installed. We did so by adding common text
files such as `environment.yml` (or modifying pre-existing ones) to include
the relevant packages needed for GPU support. We also added a `start`
script to each repository to set some environment variables for GPU
support. If a `start` file exists in a repository, then
`repo2docker` treats it as
the entrypoint of the running container, meaning it'll be run each time
someone starts a session.
Here's an example of the change that needed to be made to a repository
in order to run with GPU support on the BinderHub: https://github.com/jzf2101/mxnet-the-straight-dope/commit/338f3365b17487e50e090ec24bd58d0b9c32d721
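
To give a feel for the kind of change involved, here is a minimal sketch of an `environment.yml` that pulls in a CUDA toolkit alongside a GPU build of a framework. The package names and versions are assumptions chosen for illustration and are not taken from the commit linked above; the right choices depend on the framework and on the driver version installed on the cluster's nodes.

```yaml
# Hypothetical environment.yml for a GPU-enabled repository.
# Package names and versions are illustrative assumptions only.
channels:
  - defaults
dependencies:
  - python=3.6
  - cudatoolkit=9.0     # CUDA runtime libraries installed via conda
  - pip
  - pip:
      - mxnet-cu90      # GPU build of MXNet matching CUDA 9.0
```

The accompanying `start` script is a small shell script that exports any GPU-related environment variables and then hands control to the command `repo2docker` passes in.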
### Speeding up our BinderHub deployment by ensuring these images were on all nodes
Next, we used repo2docker (via BinderHub) to build Docker images for these repositories.
To speed up the deployment, we used a public image registry, and configured
the JupyterHub to automatically load these images onto every node. This meant
that users never had to wait for a `docker pull` operation to finish before
a new session started. The configuration for this can be found below:
```yaml
# from https://github.com/consideRatio/neurips.mybinder.org-deploy/blob/5aad94a99afed0f13fce35c9524ea5d5c57d3084/values.yaml#L78-L89
binderhub:
  jupyterhub:
    prePuller:
      extraImages:
        init-image:
          name: minrk/tc-init
          tag: 0.0.4
        demo1:
          name: gcr.io/binder-prod/neurips-jzf2101-2dshap-8561f0
          tag: 4572eb360937c90a3fe92649f89ec030ca293205
        demo3:
          name: gcr.io/binder-prod/neurips-jzf2101-2dgan-5ftutorial-54d291
          tag: a8f7b8e357e8bbd4b7bdad7a23b9691689c18c3c
        demo4:
          name: gcr.io/binder-prod/neurips-jzf2101-2dmxnet-2dthe-2dstraight-2ddope-b7541c
          tag: 338f3365b17487e50e090ec24bd58d0b9c32d721
```
## Reproducing ML research
Once we had the BinderHub@NeurIPS deployment ready, we next identified a few interesting research
papers that had provided their code and analyses. We added the necessary configuration files to these
repositories in order to ensure that they built on BinderHub. Often this was simply a matter of adding
one or two text files to the repository.
* TODO: photo of us at the poster
## Wrapping up
The NeurIPS BinderHub deployment showed that it is possible to deploy a BinderHub with enough
computational resources to run modern-day machine learning workflows. We hope it serves as a
proof of concept for those who are interested in using a BinderHub as part of a reproducible
publishing platform (we're looking at you, community journals). We're excited to see others push
the envelope of what can be done with JupyterHub and BinderHub, and would love for you to join
the community. If you'd like to do so, check out the following links:
* Links to BinderHub repos/community pages/etc