
Reproducible workflows in R

(A new draft started on 4.5.2023 after the old one in Hedgedoc disappeared.)

Things that can affect reproducibility of R workflows:

  • data management
    • R scripts
    • data storage and accessibility
  • R version
  • R package versions
    • tip: define the R version when loading the r-env module on Puhti: r-env/432 etc.
  • operating system, its version and other underlying parts
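The version-related items above can be inspected from within R itself. The following is a minimal sketch using base R only; `collect_versions` is a hypothetical helper name, and the packages queried are placeholders to be replaced with the ones a workflow actually uses.

```r
# Collect the details that most often affect reproducibility.
# `collect_versions` is a hypothetical helper, not part of any package.
collect_versions <- function(packages = c("stats", "utils")) {
  list(
    r_version  = R.version.string,
    os         = Sys.info()[["sysname"]],
    os_release = Sys.info()[["release"]],
    # Named character vector: package name -> installed version
    packages   = vapply(
      packages,
      function(p) as.character(utils::packageVersion(p)),
      character(1)
    )
  )
}

versions <- collect_versions()
str(versions)
```

Returning a plain list keeps the result easy to print, compare between machines, or save next to the analysis output.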

Topics to cover:

  • R Markdown and Quarto
  • projects in R
  • don't save workspace (save objects and scripts instead)
  • keep original data untouched - modified data should be a separate copy
  • R script reproducibility
    • commenting
    • file paths
    • general readability
    • functions for repeating sections of code
    • set.seed()
    • aim for scripts that can produce the output again at any time (instead of relying on storage of output)
  • version control
  • renv
    • the ultimate tool for reproducibility
    • BUT be careful when using on Puhti
  • containers
  • package versions
    • packages on Puhti tied to a specific date
    • sessionInfo()
    • Posit Public Package Manager snapshots
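Several of the script-level topics above (file paths, functions for repeated code, set.seed()) can be illustrated together. This is only a sketch under assumptions: the seed value, the path components and the helper `summarise_sample` are made up for illustration.

```r
# Fix the random seed so that random draws are identical on every run.
set.seed(2023)

# Build paths from parts, relative to the project root, instead of
# hard-coding absolute paths. "data" and "raw.csv" are placeholder names.
raw_path <- file.path("data", "raw.csv")

# Wrap a repeated step in a function instead of copy-pasting it.
# `summarise_sample` is a hypothetical helper.
summarise_sample <- function(n) {
  x <- rnorm(n)
  c(mean = mean(x), sd = sd(x))
}

first_run <- summarise_sample(100)

# Resetting the seed and rerunning reproduces the same numbers.
set.seed(2023)
second_run <- summarise_sample(100)
identical(first_run, second_run)
```

Because the script regenerates its result from the seed and the code alone, there is no need to rely on a stored copy of the output.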

Structure draft

  • R versions on Puhti
  • R package versions on Puhti
  • minimum information to record for reproducibility
  • light-weight tools and tips for R reproducibility
    • stand-alone script principle
    • version control
    • general script tips
  • heavy-weight tools for R reproducibility
    • renv
    • containers
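For the "minimum information to record" item, one light-weight option is to store the output of sessionInfo() with the results. A minimal sketch, assuming a plain text file is sufficient; the file name is a placeholder, and a temporary directory is used here only so that the example can be run anywhere.

```r
# Write the full session description to a text file kept with the results.
# "session_info.txt" is a placeholder name; in a real project the file
# would go in the project's output directory instead of tempdir().
info_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), info_file)

# Check that the record was written.
file.exists(info_file)
```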