Boost your research reproducibility with Binder
- Event: Boost your research reproducibility with Binder
- Date: 11 June, 2020 13:00 - 17:00 (GMT)
- Instructors: Kirstie Whitaker, Sarah Gibson, Malvika Sharan
- Contact: msharan@turing.ac.uk
Shared notes:
Zoom
Non-verbal communication using Zoom buttons:
Agenda
Time | Activity |
13:00 - 13:10 | Introductions |
13:10 - 13:20 | Introduction to the workshop and The Turing Way |
13:20 - 14:30 | Why you need a reproducible computing environment and how Binder can help |
14:30 - 15:00 | Break |
15:00 - 16:00 | Zero to Binder, a guided tour of building a Binder resource |
16:00 - 16:30 | Build your own Binder |
16:30 - 17:00 | Feedback, demo and closing |
Introductions
Roll call:
Name / Pronouns / Affiliation / GitHub:
- Malvika Sharan / she/her / The Alan Turing Institute / malvikasharan
- Sarah Gibson / she/her / The Alan Turing Institute / sgibson91
- Kirstie Whitaker / she/her / The Alan Turing Institute / KirstieJane
- Nabila Rahman / - / Cardiff University / NabilaRahman
- Ali Seyhun Saral / he/him / Max Planck Institute for Research on Coll. Goods / seyhunsaral
- Catherine Sutherland / she/her / University of Edinburgh / catsutherland
- Owen Dando / he/him / University of Edinburgh / lweasel
- Zrinko Kozic / he/him / University of Edinburgh / zkozic
- Fiona Grimm / she/her / The Health Foundation / fiona-grimm
- Alex Handy / he/him / King's College London / AlexHandy1
- Festus Nyasimi / he/him / ICIPE / Fnyasimi
- Sarah Marzi / she/her / Imperial College London / SarahMarzi
- Katie Emelianova / she/her / University of Edinburgh / katieemelianova
- Delwen Franzen / she/her / QUEST Center (BIH), Charité Universitätsmedizin Berlin / delwen
- Andrea Pierré / he/him / Brown University / kir0ul
- Jobin John / he/him / Chalmers University / jobindj
- Xin He / he/him / University of Edinburgh / hxin
- Dmitrijs Celinskis / he/him / Brown University / dcelinsk
- Kristina Salontaji / she/her / Imperial College London / KristinaSalontaji
- Dervis Salih / he/him / UCL / DSalih20
- Nathan Skene / - / Imperial / nathanskene
Icebreaker:
Name / One fun app/software you have been using a lot during the lockdown (Zoom is not the right answer!)
- Malvika / slack & netflix
- Sarah / Elevate brain training
- Kirstie / Signal and Whatsapp - I'm in better contact in lockdown with my friends than ever before!
- Owen / https://en.boardgamearena.com
- Nabila / edx.org & amazon prime & Witcher 1 (game)
- Alex / Splitwise for shared house meals!
- Festus / edx.org & Codewars
- Sarah M / garageband
- Ali / TypeRacer
- Delwen / learning R!
- Xin / BBC iPlayer Kids
- Jobin / Zulip & Youtube+netflix
- Dervis / Twitter
- Emilia / Animal Crossing
Introduction to The Turing Way
GitHub, Markdown - HackMD
Talk by Kirstie Whitaker
Small Group Exercises:
https://github.com/alan-turing-institute/the-turing-way/blob/master/workshops/boost-research-reproducibility-binder/paired_examples.md
Take shared notes here:
- Ex1: Binder does not take the environment file as default for running python.
- Ex2: Different matplotlib versions in the two branches. Exact same code, but different environment files.
- Ex3: Different sklearn requirements (requirements.txt) => a big difference in end result.
- Ex4:
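The Ex2 and Ex3 notes both come down to the same code running against different pinned versions. A minimal sketch of how such a difference could be spotted, assuming both branches pin packages with `name==version` lines in a requirements file. The file contents and version numbers below are made up for illustration and are not taken from the workshop exercises.

```python
def parse_pins(text):
    """Return {package: version} for lines of the form 'name==version'."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # unpinned or empty lines are ignored in this sketch
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def diff_pins(a, b):
    """Return {package: (version_in_a, version_in_b)} where the pins differ."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# Hypothetical contents of requirements.txt on two branches.
branch_one = "numpy==1.18.1\nscikit-learn==0.20.0\n"
branch_two = "numpy==1.18.1\nscikit-learn==0.22.1\n"

changes = diff_pins(parse_pins(branch_one), parse_pins(branch_two))
for name, (old, new) in changes.items():
    print(f"{name}: {old} -> {new}")
```

Only exact `==` pins are compared here; version ranges and conda `environment.yml` files would need their own parsing.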
Q&A section:
- Thanks! You mentioned -freeze to get the configuration information. How can you get that same information for past projects? (you likely have updated information in the meantime)
- Short answer: you can't
Reproducibility as archeology is really hard! The best practice that The Turing Way advocates for is to start version controlling your environment as early as possible because once you update your packages, it's very hard to get back to that state.
- Can binder cope with complicated situations where (particular versions of) pre-processing software needs to be run on raw data before getting to the stage of running an R or python script? (i.e. converting raw data into CSV files that can then be analysed with R or python)?
- parallel jobs?
- MATLAB?
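On the first question above about recovering configuration information: the state of a past project is gone, but a snapshot of current work can be committed alongside the code today. A minimal sketch, assuming a Python project and using only the standard library's `importlib.metadata` (Python 3.8+) to list what is installed. It approximates a freeze-style listing and does not replace `pip freeze` or `conda env export`; the output file name is a choice made for this example.

```python
from importlib import metadata


def freeze_lines():
    """Return sorted 'name==version' lines for every installed distribution."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


def write_snapshot(path="requirements.txt"):
    """Write the snapshot to `path` (a file name chosen for this sketch)."""
    lines = freeze_lines()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines
```

Calling `write_snapshot()` at the start of a project, and again whenever packages are updated, gives version control something to track.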
Minutes from the first half
Up
- The examples were very insightful (+2)
- EEEE
- Friendly, supportive learning environment! (+1)
- :)
- The runtime environment in R example was amaaazing!!!
- Clear explanation of the Turing Way - and lots to think about! Much more to reproducible science than I'd thought about
- Have got a much better idea what the purpose and uses of Binder are now.
- I really liked how the groupwork is handled using rooms
- Good! (I'm here instead of another seminar, because it's been interesting). Looking forward to knowing more.
- Very valuable to see a 'good practice' example of what a good, reproducible project is
- I actually didn't care much about the package versions I was using as long as things worked, and now I understand how important it is to report them. +1000
- The sk-learn example was very enlightening
- Really great instruction and insights, great to have so much experience to draw from and great for answering all the different questions - definitely feel like I've got a really great overview of binder :)
Down
- Zoom buttons make me feel dizzy :D it's always confusing
- Maybe more time for those four examples. (I also agree here) +3
- It would be good to have a real-world example showing the usage of Binder, such as the publication example Kirstie showed during the coffee break +1
Talk by Sarah Gibson
Please write down your name under the programming language the content of your GitHub repo contains or you are interested in:
- Python: Delwen, Festus, Andrea, Dmitrijs, Jobin
- R: Nabila Rahman, Fiona Grimm, Ali Seyhun Saral, Kristina Salontaji, Zrinko Kozic, Sarah Marzi, Nathan, Catherine Sutherland, Dervis
- Julia:
- Unix scripts: Katie Emelianova, Xin He, Owen Dando
- No specific language:
- Other:
Break out Groups:
R Group 1: Ali Seyhun, Nabila, Nathan, Fiona
R Group 2: Catherine, Kristina, Sarah, Dervis
UNIX Group: Katie, Owen, Xin, Zrinko
Python Group: Andrea, Delwen, Dmitrijs, Festus
Take shared notes here:
- What are we supposed to do now (R group 2 asking)? (Also R group 1 asking)
- Please try to binderise your code now :)
- Kirstie: Sorry folks! You're hopefully experimenting together with your own code!
- Kirstie: I'm in the main area so shout if you'd like me to come to your breakout room!
Q&A section:
- Is 2048 MB a memory limit on mybinder.org?
- Hard limit is 2 GB, better performance for 1 GB
Build your own Binder: Breakout discussion and hands-on session
Take shared notes here:
Q&A section:
Report out: Shared insights
Final structuring and writing
Post links to your Binder-ised GitHub repositories here:
Q&A section:
- Building RStudio takes a long time. Can I set the build going from the command line (e.g. from a remote cluster), so I can go away and shut down my PC in the meantime?
- Not from the command line, no
Using conda to build R takes a bit less time because conda installs pre-built binaries, so packages don't have to be built from source: https://github.com/binder-examples/r-conda
- Still struggling to get RStudio running on Binder (tried to create the URL). I tried the URL path and it doesn't work :(
- I have been working with RMarkdown for reproducible analysis tutorials - how does that integrate with Binder?
- I think they should run in RStudio, but Binder wouldn't be able to generate the PDF in a pop-out window as it's serverless
- A little off topic - but any suggestions for intro to git/GitHub tutorials to get started with version control?
- Can postBuild be used to get a number of data files in different formats from a Zenodo link?
- I think so? I don't think it's been tried before, so please tell us what you find!
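As the answer says, the Zenodo idea was untested at the time, so the following is only a sketch of what such a `postBuild` file might look like. It assumes `postBuild` can be an executable Python script with a shebang line, that the build has network access, and that you know the direct download link for each file. The record number and file names are placeholders, not a real Zenodo record.

```python
#!/usr/bin/env python3
import os
import posixpath
import urllib.request
from urllib.parse import urlsplit

# Hypothetical file links; replace with the real links from your Zenodo record.
DATA_URLS = [
    "https://zenodo.org/record/0000000/files/measurements.csv?download=1",
    "https://zenodo.org/record/0000000/files/metadata.json?download=1",
]


def plan_downloads(urls, dest_dir="data"):
    """Pair each URL with a local path named after the file in the URL."""
    plan = []
    for url in urls:
        filename = posixpath.basename(urlsplit(url).path)
        plan.append((url, os.path.join(dest_dir, filename)))
    return plan


def fetch_all(urls, dest_dir="data"):
    """Download every URL into dest_dir; the file format does not matter."""
    os.makedirs(dest_dir, exist_ok=True)
    for url, path in plan_downloads(urls, dest_dir):
        urllib.request.urlretrieve(url, path)


# In a real postBuild, finish with the call below (left commented out here
# so the sketch can be read and imported without touching the network):
# fetch_all(DATA_URLS)
```

Because the files are saved byte for byte, a mix of formats (CSV, JSON, images) is handled the same way; anything that needs unpacking or converting would be an extra step.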
Report out: Shared insights
Feedback, demo and closing
Key take away
- runtime.txt (for R) is needed to load packages from CRAN as they were available on THAT specific day. The problem with R on Binder is that it can't load different versions of individual packages; it gets a snapshot from CRAN
- Binder-izing my project asap
- That Binder is all about communication of analysis of results, rather than encapsulating the whole of an extended analysis.
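The runtime file in the first takeaway (named `runtime.txt` in the Binder R examples) holds a single line of the form `r-YYYY-MM-DD`, which selects the CRAN snapshot date. A minimal sketch of generating that line from a date; the helper names are hypothetical and the workshop date is used only as an example value.

```python
from datetime import date


def runtime_line(snapshot_date):
    """Format a date as the 'r-YYYY-MM-DD' line used in runtime.txt."""
    return f"r-{snapshot_date.isoformat()}"


def write_runtime(snapshot_date, path="runtime.txt"):
    """Write the single-line runtime file and return the line written."""
    line = runtime_line(snapshot_date)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return line


print(runtime_line(date(2020, 6, 11)))  # r-2020-06-11
```

Fixing the date when the analysis is first run, instead of using today's date on every build, is what keeps the package versions stable.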
Pluses
- Fantastic tutorial, really engaging sessions. Loved the first examples in the breakout rooms. Soooo much material!
- Really helpful, everything clearly explained. Examples of "failures" of reproducibility very eye-opening
- Really impressed by how you managed to help participants that were stuck by quickly channeling them into breakout rooms! The exercises at the beginning really drive home the point on why we need to learn about these tools.
- I enjoyed this tutorial. Lots to take away. I will be sharing what I learned today with my lab
- Useful tutorial, I liked the practical elements. Great tutors.
- Thought it was awesome. It can be daunting to be introduced to this first time and you made that a really nice experience. I'm excited to apply it to my work!
Deltas
- Maybe have a more complex shared example to work through or suggest to people in advance to bring code that they want to "binderise"/maybe go through more complex scenarios like using conda envs
- Have a more complete R example, using the NAMESPACE/DEPENDENCY structure which is standard for R. I'm still a bit unclear on what the actual required files are. Is there a document that spells out that it's OK to use environment.yml, runtime.txt etc?
- I would have benefited from examples drawing on more complex datasets (not necessarily huge files but a variety of different types of files)
- Was a little lost with the first exercise. A little more time to get acquainted with people, understand the instructions and start working would help.
- Would actually be great if the session were a bit longer, with a little more time for each section.
- The hands-on tutorial was awesome and well explained, but I found it difficult to properly listen and implement it at the same time (maybe a quick run through first followed by implementation, if there is sufficient time?)
- It would be great to see examples of code/data relevant to us biologists.
Next steps:
- check with Jobin for feedback
Connect with us!
We love hearing about how you're using The Turing Way.
Stay in touch through one of the many different pathways below!