Foundational Open Science Skills (FOSS) Lesson 6: Reproducibility I

Date: 10/10/2024
Today's Lead Instructors: Jeff, Michele
Today's Helpers: Tina
Course Website: https://foss.cyverse.org/04_talk_to_computer
Hack(pad)-of-Hack(pads): https://hackmd.io/-4TgToyFRU2eX7lmDZLRnQ

Instant Feedback: (please complete before you leave class) Complete Form

Agenda

  • About your Capstone Project (5 minutes)
  • Scripting Languages (15 minutes)
  • Computing Environment(s) and Path(s) (15 minutes)
  • Environment Managers (15 minutes)
  • Break (10 minutes)
  • Reproducibility tutorial (30 minutes)

Definition

In the computational sciences, reproducibility has been defined (Wikipedia) as follows: study results should be documented by making all data and code available in such a way that the computations can be executed again with identical results.

yes!!

Check in Questions

Have you encountered hurdles in reproducing your work?

  • Making sure the packages are the same or else there could be errors
  • Yes, I previously had problems redoing the same analysis after a few months
  • Sure, I had some serious issues with this early in my PhD research (chemical kinetic model gave slightly different results after a few months)
  • Yes, I always make sure to remember how I coded or created certain variables
  • Sometimes when coming back to a project or switching computers I need to spend some time installing dependencies
  • Yes!
  • Yes, especially when the creation of variables was not well documented

Have you ever run into a problem that prevented you from generating the same results, figures, analyses as before?

  • Yes!! +++++

Have you ever lost time trying to figure out how a collaborator got a particular result?

  • Yes++++++
  • I often encounter software packages from older papers that are not maintained and are very difficult to get to work
  • Reproducing work from papers that predate any Open Science considerations, where the equations but not the code are described and all implementation details are left to the reader

What were the issues you ran into, and how might you have solved them?

  • Floating point problems; forgetting the very specific order in which to do things
  • A colleague who ran the analysis left the team, and we basically needed to run it again from scratch
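The floating-point and order-of-operations problem mentioned above can be shown in a few lines. This is a minimal sketch, not taken from the class materials: the three values are arbitrary, and the point is only that adding the same numbers in a different order can change the last digits of the result.

```python
import math

# Arbitrary example values (an assumption for illustration only).
values = [0.1, 0.2, 0.3]

# The same three numbers, grouped in two different orders.
left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

# Floating-point addition is not associative, so exact equality fails.
print(left_to_right == right_to_left)              # False
# Comparing within a tolerance is the safer check between runs.
print(math.isclose(left_to_right, right_to_left))  # True

# math.fsum tracks intermediate rounding error, so its result does not
# depend on the order of the inputs.
stable_total = math.fsum(values)
print(stable_total)
```

When results from two runs are compared, a tolerance-based check such as `math.isclose` is usually more meaningful than `==`, and recording the exact order of steps in a script removes the need to remember it.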

Discussion Set

Q1: What are some tasks you have automated or want to automate?

  • building a coarse-grain protein model
  • labeling variables, adding new columns of data based on specific identifiers

Q1A: Have you ever successfully automated a task?

  • setting up many simulations with different parameters
  • Does programming your coffee maker to make coffee in the morning at the same time count?
  • No
  • Yes, writing a MATLAB script to generate plots
  • Yes, tried to make

Q1B: Found a way to make something scale or take less time?

  • Sure, for example using a script to generate and run Slurm submissions

Q1C: What was the task, and how did you do it?

  • Python code that takes a .cntrl file to define parameter combinations
  • Stata do-file, Python code to left join the data
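As a rough illustration of the parameter-combination idea above, the sketch below expands a small grid into one dictionary per run. The `.cntrl` format is not described in these notes, so the grid is written as a plain Python dictionary; the parameter names, the values, and the `expand_grid` helper are all hypothetical.

```python
from itertools import product

# Hypothetical parameter grid; in practice this might be parsed from a
# control file whose format is not described in these notes.
parameter_grid = {
    "temperature": [300, 350],
    "pressure": [1.0, 2.0, 5.0],
}


def expand_grid(grid):
    """Return one dict per combination of the listed parameter values."""
    names = list(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


runs = expand_grid(parameter_grid)

# Each entry could then be used to set up one simulation.
for index, run in enumerate(runs):
    print(index, run)
```

Keeping the grid in one place means the full set of runs can be regenerated later from that single definition instead of from memory.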

Q1D: Are there any things you wish you could automate?

  • Responding to emails +
  • table generation

Q1E: What are some barriers to automating them?

  • I just don't know how yet

Tutorial

In the second part of the class, we will work through a reproducibility tutorial that uses Conda as a proof of concept and as an example of how to approach reproducible science.
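Several answers above mention mismatched package versions. The sketch below is not part of the tutorial itself; it assumes only the Python standard library and records which package versions are installed in the current environment. Conda has its own export command (`conda env export`), so treat this as an illustration of the idea and not as a replacement for it. The function name `snapshot_packages` is made up for this example.

```python
from importlib import metadata


def snapshot_packages():
    """Return sorted 'name==version' strings for the installed packages."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip distributions with missing or broken metadata.
        if name:
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    # Print one pin per line, similar in spirit to a requirements file.
    for pin in snapshot_packages():
        print(pin)
```

Saving this output next to the analysis results gives a record of the environment that produced them, which can be compared against a later environment when results differ.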

