How do you review code that accompanies a research project or paper?

Weecology Lab Meeting
2018 October 10

Agenda

  • intro (5 min)
  • whole lab discussion on the goals for accompanying code (15 min)
  • breakout discussions on code criteria (15 min)
  • breakout discussions on practices/guidelines (15 min)

Introduction

(Hao will be talking about this topic next week at the rOpenSci Community Call)

(to simplify the text, we use this definition)
research code := code that accompanies a research project or manuscript

Meeting Objectives

  • identify the different kinds of code that accompany research projects
  • describe the criteria for acceptable research code
  • discuss practices/guidelines that you use or would be willing to use

What are the goals for research code (whole lab)

  • appropriate and consistent variable names (guided by a coding style guide)
    • is the code understandable to a proficient non-expert?
  • code produces all the analyses in the paper
  • code produces all the figures in the paper
  • code is commented to describe what each part does
  • documentation of environment, package dependencies, and version numbers (see the sketch after this list)
  • how-to guide for running the code
    • documentation of manual steps, where they occur in the pipeline (e.g. data cleaning)
    • describing what different components of the code do (could be literate programming, or comments)
  • the code reliably runs into the future (computational reproducibility)
  • flexibility to modify inputs/parameters (ideally)
  • what is appropriate to share? (e.g., should old, commented-out code be cleaned up?)
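
As a minimal sketch of the environment-documentation goal above: a script can record the R version, platform, and attached package versions at the end of a run (the output file name here is just an illustrative choice):

  # record the R version, platform, and attached package versions
  # so future readers can reconstruct the environment
  writeLines(capture.output(sessionInfo()), "session_info.txt")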

Breakout Group 1 (remote + Ethan + Ellen)

How do you tell if research code meets acceptable standards or is "good enough"?

  • Can you read the code itself?
  • Does it take only a small amount of work to get from reading the code to running it?
  • Does it run? Does it produce the output from the manuscript?
  • Are there runnable examples or something else to demonstrate that the code does what you think it does? (see the sketch after this list)
  • Is it platform-independent?
  • Documentation: can you understand how the code relates to the goal of the analysis?
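
One possible form for the "runnable examples" idea above is a small demo script a reviewer can execute directly. This is only a sketch; the file and function names (examples/demo.R, R/clean_counts.R, examples/small_counts.csv, clean_counts) are hypothetical:

  # examples/demo.R -- a tiny runnable example a reviewer can try
  source("R/clean_counts.R")                       # load the analysis function
  counts <- read.csv("examples/small_counts.csv")  # small bundled dataset
  result <- clean_counts(counts)
  stopifnot(all(result$count >= 0))  # fails loudly if the code misbehaves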

What practices or guidelines do you currently use to check research code? What practices or guidelines would you be willing to adopt?

  • adding tests to code (to make sure functions do what you think they do) and to the whole analysis (does the input data still produce the same results and figures?) - see the sketch after this list
  • asking other people in the lab to read your code and vice versa (pair programming review?)
  • going over a style guide as a group? (some way to devote time to it, look at examples)
  • use tools like roxygen or consistent formatting to produce documentation (consistent both with internal lab practices and with the external scientific community)
  • guidelines for how to structure workflows (file/folder organization, saving intermediate objects, naming, etc.) - provide examples for other members of the lab
  • follow structure of an R package (if working in R)
  • more lab demos and training on practices
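
As a hedged illustration of the testing and roxygen points above (the function clean_counts, its behavior, and the file paths are all hypothetical):

  # R/clean_counts.R -- roxygen comments like these can be turned
  # into consistent, rendered documentation
  #' Remove rows with negative or missing counts
  #'
  #' @param df A data frame with a numeric `count` column.
  #' @return The data frame with invalid rows dropped.
  clean_counts <- function(df) {
    df[!is.na(df$count) & df$count >= 0, ]
  }

  # tests/testthat/test-clean_counts.R -- check that the function
  # does what we think it does
  library(testthat)
  test_that("clean_counts drops negative and missing counts", {
    df <- data.frame(site = c("a", "b", "c", "d"),
                     count = c(3, -1, NA, 5))
    expect_equal(clean_counts(df)$count, c(3, 5))
  })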

Breakout Group 2 (unconference room)

How do you tell if research code meets acceptable standards or is "good enough"?

  • must run on anyone's computer (provide everything in code, or document external packages and data)
  • documentation (examples of how to run the code, and different pieces)
    • lots of possibilities here
    • describe the main flow: what portions of the code to run, in what order, and what they do
  • tests would be nice (along with code written as functions so they can be tested)
  • how many functions per file? (could be one function per file or several, whatever is most sensible for users trying to find the code)
  • listing dependencies among functions in the research code (what order are things run in? see the sketch after this list)
  • style guide (use consistent names, don't reuse names if representing different things)
  • self-check "if I come back in a year, will I still be able to understand and use the code?"
  • clean up previous drafts that may be commented out (for the publication stage)
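
As an illustrative sketch of the "main flow" and run-order points above (all file names are hypothetical), a single top-level script can make the pipeline order explicit:

  # main.R -- run the full analysis in order; each step writes its
  # output to disk so later steps can be inspected or rerun
  source("R/01-clean-data.R")    # raw data -> data/clean.csv
  source("R/02-fit-models.R")    # data/clean.csv -> output/models.rds
  source("R/03-make-figures.R")  # output/models.rds -> figures/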

What practices or guidelines do you currently use to check research code? What practices or guidelines would you be willing to adopt?

  • Current:
    • version control
    • testing
    • documentation ('self-check')
    • core code as packages
    • documenting dependencies
  • Adopt:
    • testing
    • peer review (of code)
    • runnable examples (formal)
    • containers
    • documenting dependencies
    • dependency management tool - specify package versions (see the sketch after this list)
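
One way to specify package versions, as a sketch (the versions shown are illustrative, not a real project's; packrat, or the containers mentioned above, are alternative approaches):

  # install-deps.R -- pin the package versions used in the analysis
  # (versions shown here are illustrative only)
  install.packages("remotes")
  remotes::install_version("dplyr", version = "0.7.6")
  remotes::install_version("ggplot2", version = "3.0.0")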