## IT Conference - Winter 2021
### Research reproducibility and what it means for IT
Martin Callaghan
slides: https://bit.ly/mtc-it-repro
---
## Who am I?
- Martin Callaghan
- Research Software Engineer
- I (we) work with researchers to bring together code, (big) hardware and Cloud, (big) data and (big) AI
- Likes **Linux**, **Python** and **R**
---
## Agenda
- What's reproducibility?
- The crisis in reproducibility
- Fixing the problem
- Training, tools and techniques
- Challenges for IT
- Thinking about the future
---
## What's reproducibility?
Imagine giving one recipe to 10 different chefs and getting 10 completely different results.
This inconsistency could be due to any number of factors: variables that cannot be controlled, missing details, or shortcomings in design and execution.
The same challenges apply to scientific experiments.
---
## Why is it important?
* If other people can reproduce your methods and results, your findings become credible.
* Code and data are now valid research outputs in their own right, so it is just as important to ensure that code produces reproducible results.
* Funders want reproducibility.
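As a small, hedged illustration of what "code that produces reproducible results" can mean in practice: seeding the random number generator makes a stochastic calculation return the same answer on every run. This is a minimal Python sketch using only the standard library; the function `simulate_mean` and the seed `42` are hypothetical choices for illustration, not taken from any particular project.

```python
import random


def simulate_mean(n_samples: int, seed: int) -> float:
    """Average of n_samples uniform draws from a privately seeded generator."""
    # A dedicated Random instance means the result depends only on the
    # arguments, not on whatever else has touched the global random state.
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n_samples)) / n_samples


# Same seed, same answer: anyone re-running this gets identical numbers.
first = simulate_mean(1_000, seed=42)
second = simulate_mean(1_000, seed=42)
assert first == second
print(f"mean = {first:.6f}")
```

Python's `random` documentation promises that `random()` keeps producing the same sequence for the same seed, but a seed only pins down the randomness. The language and library versions still need capturing, which is what the reproducibility toolkit is for.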
---
## The crisis

---
## The 'crisis'

---
## The crisis? What crisis?
- Just how easy is it to reproduce some computational research?
- Let us have a look at a couple of papers (links in the chat):
    - https://www.reprohack.org/paper/51/
    - https://www.reprohack.org/paper/49/
---
## Activity
(8 mins)
Put yourself in the shoes of a new PhD student:
- Could you reproduce these papers?
- Would you know where to start?
- What information would help?
Put your comments in the chat and we'll explore a few shortly.
---
## Fixing the problem
- Accept there is a problem and that fixing it is important
- Training and researcher education!
- Ensure the right tools are available at our end
- Admin rights are the enemy of reproducibility!
- Researchers like admin rights!
---
## Training, tools and techniques
- Researcher training
- Reprohacks
- Conda & Containers: the reproducibility toolkit
- Literate programming tools
- Documentation (and lots of it!)
---
## Researcher training
**Probably the most important thing we do in RSE**
We run courses and workshops on (amongst others):
- Coding skills
- Version control
- Reproducible practices
- Testing
- Documentation
---
## Reproducibility best practices
**Activity:**
(5 mins)
Many of you are developers of one form or another.
What do **you** regard as best practice (either in your team or personally)?
Answers in the chat!
---
## The reproducibility toolkit
We don’t hand out local admin rights, and I don’t (always) disagree with that.
Used carelessly, local admin rights can break reproducibility.
We can offer a cascading set of solutions:
1. Try Conda package management
2. Try Docker or Singularity containers
3. Try Virtual Machines (with Vagrant)
Still not working? Only then do we look at admin rights.
Options 1, 2 and 3 can all provide **scriptable**, **reproducible** and **shareable** environments.
---
## Conda
Many researchers-who-code use Python and/or R.
Conda is a package management tool that can be:
* user installed and managed
* easily updated
* used to manage R/Python versions, packages and (most) dependencies
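Conda can export an environment for others to recreate (for example with `conda env export`). The underlying idea, recording exactly which interpreter and package versions produced a result, can be sketched in standard-library Python. This is an illustrative sketch only, not how Conda works internally; the function `snapshot_environment` is hypothetical, and unlike Conda it only sees Python packages, not R or system libraries.

```python
import platform
from importlib import metadata


def snapshot_environment() -> list[str]:
    """Return header lines plus sorted 'name==version' lines for installed packages."""
    lines = [
        f"# python=={platform.python_version()}",
        f"# platform: {platform.platform()}",
    ]
    packages = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no recorded name
            packages.add(f"{name}=={dist.version}")
    # Sort case-insensitively so the snapshot diffs cleanly between runs.
    lines.extend(sorted(packages, key=str.lower))
    return lines


if __name__ == "__main__":
    # Print to stdout so the caller decides where the snapshot is saved.
    print("\n".join(snapshot_environment()))
```

Redirecting the output (e.g. `python snapshot.py > environment-snapshot.txt`, with both file names hypothetical) gives a plain-text record that can be committed alongside the code and compared between runs.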
---
## Containers
You might have heard of tools like:
* Docker
* Singularity
Containers can be built to bundle all the necessary ingredients (data, code, environment).
A container provides operating-system-level virtualisation, sharing the host system’s kernel with other containers. That makes it an easy way to run, e.g., Ubuntu software on a CentOS host, or Linux applications on a Mac or Windows machine (where the tooling runs them inside a lightweight Linux VM).
---
## Virtual machines
The next-to-last resort!
A VM lets us reproduce even more of the user's environment, right down to the OS kernel.
A tool like Vagrant allows us to script the creation of the VM.
---
## Admin rights
For me, the last resort!
---
## Activity
(5 mins)
What are your thoughts and experiences of these tools?
Answers in the chat!
---
## Challenges and the future
This is our opportunity to think about how we can support our researchers to:
* be more independent
* be safer
* work reproducibly
---
## Final activity
Using [this hackpad](https://hackmd.io/@callaghanmt/rJg4va_qK):
You'll find three sections, one for each point on the previous slide.
Share a few of your thoughts and ideas!
---
## Final thoughts and questions