## IT Conference - Winter 2021
### Research reproducibility and what it means to IT

Martin Callaghan
slides: https://bit.ly/mtc-it-repro

---

![](https://i.imgur.com/PSDmXj6.png)

---

## Who am I?

- Martin Callaghan
- Research Software Engineer
- I (we) work with researchers to integrate code with (big) hardware & Cloud, (big) data and (big) AI
- Likes **Linux**, **Python** and **R**

---

## Agenda

- What's reproducibility?
- The crisis in reproducibility
- Fixing the problem
- Training, tools and techniques
- Challenges for IT
- Thinking about the future

---

## What's reproducibility?

Imagine giving one recipe to 10 different chefs and getting 10 completely different results.

This inconsistency could be due to any number of factors: variables that cannot be controlled, omitted details, or shortcomings in design and execution.

The same challenges apply to scientific experiments.

---

## Why is it important?

* If other people can reproduce your methods and results, your findings become more credible.
* Code and data are now valid research outputs, so it is just as important that code produces reproducible results.
* Funders want reproducibility

---

## The crisis

![](https://i.imgur.com/BKqUXuI.jpg)

---

## The 'crisis'

![](https://i.imgur.com/lCqwJPa.jpg)

---

## The crisis? What crisis?

- Just how easy is it to reproduce some computational research?
- Let us have a look at a couple of papers (links in the chat)

https://www.reprohack.org/paper/51/
https://www.reprohack.org/paper/49/

---

## Activity (8 mins)

Put yourself in the shoes of a new PhD student:

- Could you reproduce these papers?
- Would you know where to start?
- What information would help?

Put your comments in the chat and we'll explore a few shortly.

---

## Fixing the problem

- Accept there is a problem and that fixing it is important
- Training and Researcher Education!
- Ensuring the right tools are there at our end
- Admin rights are the enemy of reproducibility!
- Researchers like admin rights!

---

## Training, tools and techniques

- Researcher training
- Reprohacks
- Conda & Containers: the reproducibility toolkit
- Literate programming tools
- Documentation (and lots of it!)

---

## Researcher training

**Probably the most important thing we do in RSE**

We run courses and workshops on (amongst others):

- Coding skills
- Version Control
- Reproducible practices
- Testing
- Documentation

---

![](https://i.imgur.com/FwIAiTH.png)

---

## Reproducibility best practices

**Activity:** (5 mins)

Many of you are developers of one form or another.

What do **you** regard as best practice (either in your team or personally)?

Answers in the chat!

---

## The reproducibility toolkit

"We don’t hand out local admin rights."

I don’t disagree (always). Poorly used, local admin rights can break reproducibility.

We can offer a cascading set of solutions:

1. Try Conda package management
2. Try Docker or Singularity containers
3. Try Virtual Machines (with Vagrant)

Not worked? Now let us look at admin rights.

Options 1, 2 and 3 can all provide **scriptable**, **reproducible** and **shareable** environments.

---

## Conda

Many researchers-who-code use Python and/or R.

Conda is a package management tool that can be:

* user installed and managed
* easily updated
* used to manage R/Python versions, packages and (most) dependencies

---

## Containers

You might have heard of tools like

* Docker
* Singularity

Containers can be built to bundle all the necessary ingredients (data, code, environment).

A container provides operating-system-level virtualisation, sharing the host system’s kernel with other containers.

It’s an easy way to run (e.g.) Linux applications on a Mac or Windows, Ubuntu on CentOS, etc.

---

## Virtual machines

The next-to-last resort!

A VM allows us to capture even more of the user environment, even down to the OS kernel.
A tool like Vagrant allows us to script the creation of the VM.

---

## Admin rights

For me, the last resort!

---

## Activity (5 mins)

What are your thoughts and experiences of these tools?

Answers in the chat!

---

## Challenges and the future

This is our opportunity to think about how we can support our researchers to be:

* more independent
* safer
* able to work reproducibly

---

## Final activity

Using [this hackpad](https://hackmd.io/@callaghanmt/rJg4va_qK):

You'll find three sections as per the previous slide.

Share a few of your thoughts and ideas!

---

## Final thoughts and questions

---
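
## Appendix: recording an environment (sketch)

The Conda and container slides rest on being able to say exactly which Python version and packages an analysis used. As a minimal sketch (this is not Conda itself, and not a tool shown in the talk), the standard-library script below records that information from inside a running environment. The function names `snapshot_environment` and `format_pins` are made up for illustration.

```python
import json
import platform
from importlib import metadata


def snapshot_environment():
    """Return the Python version, platform and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata is missing a name
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": packages,
    }


def format_pins(packages):
    """Format a {name: version} mapping as sorted 'name==version' lines."""
    return [f"{name}=={packages[name]}" for name in sorted(packages, key=str.lower)]


if __name__ == "__main__":
    snap = snapshot_environment()
    # Interpreter and platform first, then one pinned line per package.
    print(json.dumps({"python": snap["python"], "platform": snap["platform"]}, indent=2))
    print("\n".join(format_pins(snap["packages"])))
```

This only sees Python packages. In a Conda environment, `conda env export > environment.yml` gives a fuller record that also covers non-Python dependencies, and the resulting file can be shared alongside the code.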