Boost your research reproducibility with Binder
===
###### tags: `turing-way` `Workshop` `External`
:::info
- **Event:** Boost your research reproducibility with Binder
- **Date:** 11 June, 2020 13:00 - 17:00 (GMT)
- **Instructors:** Kirstie Whitaker, Sarah Gibson, Malvika Sharan
- **Contact:** msharan@turing.ac.uk
:::
### Shared notes:
# https://hackmd.io/@malvikasharan/BinderJune2020
### Zoom
Non-verbal communication using Zoom buttons:
- Options that you see:

- Click on "Participants"

:dart: Agenda
---
| Time | Activity |
| ---- | -------- |
| 13:00 - 13:10 | Introductions |
| 13:10 - 13:20 | Introduction to the workshop and The Turing Way |
| 13:20 - 14:30 | Why you need a reproducible computing environment and how Binder can help |
| 14:30 - 15:00 | Break |
| 15:00 - 16:00 | Zero to Binder, a guided tour of building a Binder resource |
| 16:00 - 16:30 | Build your own Binder |
| 16:30 - 17:00 | Feedback, demo and closing |
:desktop_computer: Introductions
---
### Roll call:
**Name / Pronouns / Affiliation / GitHub:**
- Malvika Sharan / she/her / The Alan Turing Institute / malvikasharan
- Sarah Gibson / she/her / The Alan Turing Institute / sgibson91
- Kirstie Whitaker / she/her / The Alan Turing Institute / KirstieJane
- Nabila Rahman / - / Cardiff University / NabilaRahman
- Ali Seyhun Saral / he/him / Max Planck Institute for Research on Coll. Goods / seyhunsaral
- Catherine Sutherland / she/her / University of Edinburgh / catsutherland
- Owen Dando / he/him / University of Edinburgh / lweasel
- Zrinko Kozic / he/him / University of Edinburgh / zkozic
- Fiona Grimm / she/her / The Health Foundation / fiona-grimm
- Alex Handy / he/him / King's College London / AlexHandy1
- Festus Nyasimi / he/him / ICIPE / Fnyasimi
- Sarah Marzi / she/her / Imperial College London / SarahMarzi
- Katie Emelianova / she/her / University of Edinburgh / katieemelianova
- Delwen Franzen / she/her / QUEST center (BIH) Charite Universitätsmedizin Berlin / delwen
- Andrea Pierré / he/him / Brown University / kir0ul
- Jobin John / he/him / Chalmers University / jobindj
- Xin He / he/him / University of Edinburgh / hxin
- Dmitrijs Celinskis / he/him / Brown University / dcelinsk
- Kristina Salontaji / she/her / Imperial College London / KristinaSalontaji
- Dervis Salih / he/him / UCL / DSalih20
- Nathan Skene / - / Imperial / nathanskene
**Icebreaker:**
Name / One fun app/software you have been using a lot during the lockdown (Zoom is not the right answer!)
- Malvika / slack & netflix
- Sarah / [Elevate brain training](https://www.elevateapp.com/)
- Kirstie / Signal and Whatsapp - I'm in better contact in lockdown with my friends than ever before!
- Owen / https://en.boardgamearena.com
- Nabila / [edx.org](https://www.edx.org) & amazon prime & Witcher 1 (game)
- Alex / Splitwise for shared house meals!
- Festus / edx.org & Codewars
- Sarah M / garageband
- Ali / [TypeRacer](https://play.typeracer.com/)
- Delwen / learning R!
- Xin / [BBC iplayer kid](https://www.bbc.co.uk/iplayer/features/iplayer-kids)
- Jobin / Zulip & Youtube+netflix
- Dervis / Twitter
- Emilia / Animal Crossing
**Introduction to _The Turing Way_**
* Link to the slides: [https://doi.org/10.5281/zenodo.3974948](https://doi.org/10.5281/zenodo.3974948)
* The Turing Way GitHub repository: https://github.com/alan-turing-institute/the-turing-way
* [Online Collaboration Cafes](https://github.com/alan-turing-institute/the-turing-way/blob/master/project_management/online-collaboration-cafe.md)
* Chat on Gitter: https://gitter.im/alan-turing-institute/the-turing-way
* Join the mailing list: https://tinyletter.com/TuringWay
:ballot_box_with_check: GitHub, Markdown - HackMD
---
* Create a GitHub account if you don't have one: https://github.com/join
* Tutorials and resources:
    * GitHub for collaboration: https://malvikasharan.github.io/developing_collaborative_document/
    * Markdown cheatsheet: https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet
    * HackMD: https://hackmd.io/ ([Lessons on using HackMD editor](https://hackmd.io/c/tutorials/%2Fs%2Ftutorials))
* Explore Turing Way GitHub:
    * Main repository: https://github.com/alan-turing-institute/the-turing-way
    * [Slides explaining how to open a new issue or pull request](https://zenodo.org/record/3676449#.Xmey7UOnzOQ) (starts at slide 45)
:hammer_and_wrench: Introduction to the tools and methods for this workshop
---
## Talk by Kirstie Whitaker
- Slides: https://doi.org/10.5281/zenodo.2598529
- Why you need a reproducible computing environment and how Binder can help?
**Small Group Exercises:**
https://github.com/alan-turing-institute/the-turing-way/blob/master/workshops/boost-research-reproducibility-binder/paired_examples.md
- https://github.com/alan-turing-institute/CompEnv-Ex1
- https://github.com/alan-turing-institute/CompEnv-Ex2
- https://github.com/alan-turing-institute/CompEnv-Ex3
- https://github.com/alan-turing-institute/CompEnv-Ex4
### Take shared notes here:
- Ex1: Binder does not take the environment file as default for running Python.
- Ex2: Different matplotlib versions in the two branches. Exactly the same code, but different environment files.
- Ex3: Different sklearn requirements (requirements.txt) => a big difference in end result.
- Ex4:
### Q&A section:
- Thanks! You mentioned -freeze to get the configuration information. How can you get that same information for past projects? (you likely have updated information in the meantime)
- Short answer: you can't :cry: Reproducibility as archeology is really hard! The best practice that _The Turing Way_ advocates for is to start version controlling your environment as early as possible because once you update your packages, it's very hard to get back to that state.
- Can Binder cope with more complicated situations, where (particular versions of) pre-processing software needs to be run on raw data before an R or Python script can be run (e.g. converting raw data into CSV files that can then be analysed with R or Python)?
- Yes! https://repo2docker.readthedocs.io/en/latest/config_files.html#postbuild-run-code-after-installing-the-environment
- Thanks! So the configuration files would need to specify how to install that extra software in the environment too?
- Yes they would, or you could install it in postBuild too
- https://github.com/alan-turing-institute/das-public
- parallel jobs?
- e.g. https://binder.pangeo.io/
- MATLAB?
- mybinder.org supports Octave, which is a very similar open-source alternative: https://mybinder.readthedocs.io/en/latest/sample_repos.html?highlight=octave#octave-on-mybinder-org
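
To make the "start version controlling your environment as early as possible" advice from the first answer concrete, here is a minimal sketch, assuming a Python project. It lists every installed package with its exact version, roughly what `pip freeze` reports, using only the standard library. The function name `pinned_requirements` is hypothetical and not part of any Binder tooling.

```python
"""Sketch: list the exact versions of the Python packages installed right now."""
from importlib.metadata import distributions


def pinned_requirements() -> list[str]:
    """Return sorted 'name==version' lines for every installed distribution."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no name in their metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Print the pins so they can be redirected into a file and committed.
    print("\n".join(pinned_requirements()))
```

Saved under an assumed name such as `freeze_env.py`, it could be run as `python freeze_env.py > requirements.txt`, and the resulting file committed alongside the analysis code so that the environment is recorded before any package is updated.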
## Minutes from the first half
### Up
- The examples were very insightful (+2)
- EEEE
- Friendly, supportive learning environment! (+1)
- :)
- The runtime environment in R example was amaaazing!!!!
- Clear explanation of the Turing Way - and lots to think about! Much more to reproducible science than I'd thought about
- Have got a much better idea what the purpose and uses of Binder are now.
- I really liked how the groupwork is handled using rooms
- Good! (I'm here instead of another seminar, because it's been interesting). Looking forward to learning more.
- Very valuable to see a 'good practice' example of what a good, reproducible project is
- I actually didn't care much about the package versions that I have been using as long as it works, and now I understand how important it is to report them. +1000
- The `sk-learn` example was very enlightening :grimacing:
- Really great instruction and insights, great to have so much experience to draw from and great for answering all the different questions - definitely feel like I've got a really great overview of binder :)
### Down
- Zoom buttons make me feel dizzy :D it's always confusing
- Maybe more time for those four examples. (I also agree here) +3
- It would be good to have a real-world example showing the usage of Binder, such as the publication example Kirstie showed during the coffee break +1
## Talk by Sarah Gibson
- Zero to Binder, a guided tour of building a Binder resource
- Link to the Tutorial: http://bit.ly/zero-to-binder-tutorial
### Please write down your name under the programming language the content of your GitHub repo contains or you are interested in:
- Python: Delwen, Festus, Andrea, Dmitrijs, Jobin
- R: Nabila Rahman, Fiona Grimm, Ali Seyhun Saral, Kristina Salontaji, Zrinko Kozic, Sarah Marzi, Nathan, Catherine Sutherland, Dervis
- Julia:
- Unix scripts: Katie Emelianova, Xin He, Owen Dando
- No specific language:
- Other:
Breakout groups:
- R Group 1: Ali Seyhun, Nabila, Nathan, Fiona
- R Group 2: Catherine, Kristina, Sarah, Dervis
- UNIX Group: Katie, Owen, Xin, Zrinko
- Python Group: Andrea, Delwen, Dmitrijs, Festus
### Take shared notes here:
- What are we supposed to do now (R group 2 asking)? (Also R group 1 asking)
- Please try to binderise your code now :)
- [name=Kirstie] Sorry folks! You're hopefully experimenting together with your own code!
- [name=Kirstie] I'm in the main area so shout if you'd like me to come to your breakout room!
### Q&A section:
- Is 2048 MB a memory limit on mybinder.org?
- Hard limit is 2 GB, better performance for 1 GB
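
One way to judge whether an analysis is likely to fit within the limits quoted above is to measure its peak memory locally before binderising it. The following is a minimal sketch, assuming a Unix-like system (the `resource` module is not available on Windows). The two thresholds are copied from the answer above, and the function names are hypothetical. Call `peak_memory_mb()` at the end of the analysis script to get a meaningful reading.

```python
import resource
import sys

MYBINDER_HARD_LIMIT_MB = 2048  # hard limit quoted in the answer above
COMFORTABLE_LIMIT_MB = 1024    # "better performance" figure from the same answer


def peak_memory_mb() -> float:
    """Peak resident memory used by this process so far, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def memory_verdict(used_mb: float) -> str:
    """Compare a memory figure against the two thresholds from the notes."""
    if used_mb >= MYBINDER_HARD_LIMIT_MB:
        return "over the hard limit"
    if used_mb >= COMFORTABLE_LIMIT_MB:
        return "under the hard limit, but may be slow"
    return "comfortable"


if __name__ == "__main__":
    used = peak_memory_mb()
    print(f"Peak memory: {used:.0f} MB ({memory_verdict(used)})")
```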
:busts_in_silhouette: :speech_balloon: Build your own Binder: Breakout discussion and hands-on session
---
### Take shared notes here:
* https://elifesciences.org/labs/d42fe2b9/integrating-binder-and-stencila-the-building-blocks-to-increased-open-communication-and-transparency
* https://github.com/binder-examples
### Q&A section:
*
#### Report out: Shared insights
*
:clock2: Final structuring and writing
---
:pencil: **Post links to your Binder-ised GitHub repositories here:**
* https://github.com/NabilaRahman/my-first-binder/blob/master/README.md (tutorial example for R)
* https://github.com/katieemelianova/binder
### Q&A section:
* Building RStudio takes a long time. Can I set the build going from the command line (e.g. from a remote cluster), so I can go away and shut down my PC in the meantime?
* Not from the command line, no :slightly_frowning_face: Using conda to build R takes a bit less time because the packages are installed as pre-built binaries rather than compiled from source: https://github.com/binder-examples/r-conda
* Still struggling to get RStudio running on Binder (tried to create the URL) -> tried the URL path and doesn't work :(
* All you need is to add `?urlpath=rstudio` to the end
* Can you paste your link please?
https://mybinder.org/v2/gh/SarahMarzi/R_tutorials/master?urlpath=rstudio
* I have been working with RMarkdown for reproducible analysis tutorials - how does that integrate with Binder?
* I think they should run in RStudio, but Binder wouldn't be able to generate the PDF in a pop-out window as it's serverless
* A little off topic - but any suggestions for intro to git/github tutorials to get started with version control?
* Software Carpentry has a great lesson
* One example: https://github.com/ImperialCollegeLondon/grad_school_git_course/
* Can postBuild be used to get a number of data files in different formats from a Zenodo link?
* I think so? I don't think it's been tried before, so please tell us what you find!
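
For the Zenodo question above, a `postBuild` script along the following lines might be a starting point. It is an untested sketch under several assumptions: that `postBuild` is run as an executable (so a Python shebang line is acceptable), that network access is available during the build, and that the direct download links have been copied from the Zenodo record page into a file called `data_urls.txt`. That file name, the `data` folder, and the function names are all hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical postBuild sketch: download the data files listed in data_urls.txt."""
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlparse

URL_LIST = Path("data_urls.txt")  # assumed file: one direct download URL per line
DATA_DIR = Path("data")           # assumed destination folder inside the repository


def fetch(url: str, dest_dir: Path = DATA_DIR) -> Path:
    """Download one file, whatever its format, and return its local path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Use the last part of the URL path as the file name; the query string is ignored.
    target = dest_dir / Path(unquote(urlparse(url).path)).name
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk instead of loading into memory
    return target


def fetch_all(url_list: Path = URL_LIST, dest_dir: Path = DATA_DIR) -> list[Path]:
    """Download every URL in the list file; blank lines and # comments are skipped."""
    if not url_list.exists():
        return []
    urls = [line.strip() for line in url_list.read_text().splitlines()]
    return [fetch(url, dest_dir) for url in urls if url and not url.startswith("#")]


if __name__ == "__main__":
    for path in fetch_all():
        print(f"Downloaded {path}")
```

Because the file name is taken from the URL rather than from the file type, CSV, image, and archive files would all be handled the same way. Any unpacking or conversion would need to be added as a further step.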
#### Report out: Shared insights
*
:closed_book: Feedback, demo and closing
---
#### Key take away
* `runtime.txt` (for R) is needed to load packages from CRAN as they were available on THAT specific day. A limitation of R on Binder is that it can't load arbitrary versions of packages: it takes a snapshot of CRAN from that date.
* Binder-izing my project asap
* That Binder is all about **communication of analysis of results**, rather than encapsulating the whole of an extended analysis.
#### Pluses
* Fantastic tutorial, really engaging sessions. Loved the first examples in the breakout rooms. Soooo much material!
* Really helpful, everything clearly explained. Examples of "failures" of reproducibility very eye-opening
* Really impressed by how you managed to help participants that were stuck by quickly channeling them into breakout rooms! The exercises at the beginning really drive home the point on why we need to learn about these tools.
* I enjoyed this tutorial. Lots to take away. I will be sharing what I learned today with my lab.
* Useful tutorial, I liked the practical elements. Great tutors.
* Thought it was awesome. It can be daunting to be introduced to this first time and you made that a really nice experience. I'm excited to apply it to my work!
#### Deltas
* Maybe have a more complex shared example to work through, or suggest to people in advance that they bring code they want to "binderise". Maybe also go through more complex scenarios like using conda envs.
* Have a more complete R example, using the NAMESPACE/DEPENDENCY structure which is standard for R. I'm a bit unclear still on what the actual required files are? Is there a document that spells out that it's OK to use environment.yml, runtime.txt etc?
* [name=Sarah] Would posting this link have helped? https://mybinder.readthedocs.io/en/latest/config_files.html
* I would have benefited from examples drawing on more complex datasets (not necessarily huge files but a variety of different types of files)
* Was a little lost with the first exercise. A little more time to get acquainted with people, understand the instructions and start working would have helped.
* Would actually be great if the session were a bit longer, with a little more time for each section.
* The hands-on tutorial was awesome and well explained, but I found it difficult to properly listen and implement it at the same time (maybe a quick run through first followed by implementation, if there is sufficient time?)
* It would be great to see examples of code/data relevant to us biologists.
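
On the question above about which configuration files are actually required, the following sketch may help when checking a repository before launching it. It assumes the file names described in the configuration-file documentation linked above (only a subset is listed here), and that a `binder` or `.binder` folder, when present, is used in place of the repository root. The function name and the short descriptions are illustrative.

```python
from pathlib import Path

# A subset of the configuration file names described in the Binder documentation.
KNOWN_CONFIG_FILES = {
    "environment.yml": "conda environment",
    "requirements.txt": "Python packages installed with pip",
    "runtime.txt": "language runtime and snapshot date",
    "install.R": "R packages to install",
    "DESCRIPTION": "R package-style dependency list",
    "apt.txt": "system packages installed with apt",
    "postBuild": "script run after the environment is built",
    "Dockerfile": "full manual control (advanced)",
}


def find_config_files(repo: str = ".") -> dict[str, str]:
    """Return the known configuration files that Binder would find in this repo."""
    root = Path(repo)
    # If a binder/ or .binder/ folder exists, only files inside it are considered.
    for folder in ("binder", ".binder"):
        if (root / folder).is_dir():
            root = root / folder
            break
    return {
        name: purpose
        for name, purpose in KNOWN_CONFIG_FILES.items()
        if (root / name).is_file()
    }


if __name__ == "__main__":
    found = find_config_files()
    for name, purpose in found.items():
        print(f"{name}: {purpose}")
    if not found:
        print("No configuration files found.")
```

None of these files is required on its own; which ones are needed depends on the language and packages the project uses.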
### Next steps:
- check with Jobin for feedback
#### Connect with us!
We love hearing about how you're using _The Turing Way_.
Stay in touch through one of the many different pathways below!
- [About the project](https://www.turing.ac.uk/research/research-projects/turing-way-handbook-reproducible-data-science)
- [_The Turing Way_ book](https://the-turing-way.netlify.com)
- [GitHub repository](https://github.com/alan-turing-institute/the-turing-way)
- [Gitter chat room](https://gitter.im/alan-turing-institute/the-turing-way)
- [YouTube Videos](https://www.youtube.com/channel/UCPDxZv5BMzAw0mPobCbMNuA)
- [Twitter account](https://twitter.com/turingway) and [#TuringWay Hashtag](https://twitter.com/hashtag/TuringWay?f=live)
- Get in touch with _The Turing Way_ community manager [Malvika Sharan](mailto:msharan@turing.ac.uk)