How do you review code that accompanies a research project or paper?

Weecology Lab Meeting
2018 October 10

Agenda

  • intro (5 min)
  • whole lab discussion on the goals for accompanying code (15 min)
  • breakout discussions on code criteria (15 min)
  • breakout discussions on practices/guidelines (15 min)

Introduction

(Hao will be talking about this topic next week at the rOpenSci Community Call)

(to simplify the text, we use this definition)
research code := code that accompanies a research project or manuscript

Meeting Objectives

  • identify the different kinds of code that accompany research projects
  • describe the criteria for acceptable research code
  • discuss practices/guidelines that you use or would be willing to use

What are the goals for research code (whole lab)

  • appropriate and consistent variable names (guided by a coding style guide)
    • is the code understandable to a proficient non-expert?
  • code produces all the analyses in the paper
  • code produces all the figures in the paper
  • code is commented to describe what each part does
  • documentation of environment, package dependencies, and version numbers (see the sketch after this list)
  • how-to guide for running the code
    • documentation of manual steps, where they occur in the pipeline (e.g. data cleaning)
    • describing what different components of the code do (could be literate programming, or comments)
  • the code reliably runs into the future (computational reproducibility)
  • flexibility to modify inputs/parameters (ideally)
  • what is appropriate to share? (e.g., should old, commented-out code be cleaned up?)
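
As a minimal sketch of the environment-documentation goal above: a script can record the R version, platform, and attached package versions at the end of a run (the output file name here is just an illustrative choice):

  # record the R version, platform, and attached package versions
  # so future readers can reconstruct the environment
  writeLines(capture.output(sessionInfo()), "session_info.txt")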

Breakout Group 1 (remote + Ethan + Ellen)

How do you tell if research code meets acceptable standards or is "good enough"?

  • Can you read the code itself?
  • Does it take only a small amount of work to get from reading the code to running it?
  • Does it run? Does it produce the output from the manuscript?
  • Are there runnable examples or something else to demonstrate that the code does what you think it does? (see the sketch after this list)
  • Is it platform-independent?
  • Documentation: can you understand how the code relates to the goal of the analysis?
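
One possible form for the "runnable examples" idea above is a small demo script a reviewer can execute directly. This is only a sketch; the file and function names (examples/demo.R, R/clean_counts.R, examples/small_counts.csv, clean_counts) are hypothetical:

  # examples/demo.R -- a tiny runnable example a reviewer can try
  source("R/clean_counts.R")                       # load the analysis function
  counts <- read.csv("examples/small_counts.csv")  # small bundled dataset
  result <- clean_counts(counts)
  stopifnot(all(result$count >= 0))  # fails loudly if the code misbehaves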

What practices or guidelines do you currently use to check research code? What practices or guidelines would you be willing to adopt?

  • adding tests to code (to make sure functions do what you think they do) and to the whole analysis (does the input data still produce the same results and figures?) - see the sketch after this list
  • asking other people in the lab to read your code and vice versa (pair programming review?)
  • going over a style guide as a group? (some way to devote time to it, look at examples)
  • use tools like roxygen or consistent formatting to produce documentation (consistent both with internal lab practices and with the external scientific community)
  • guidelines for how to structure workflows (file/folder organization, saving intermediate objects, naming, etc.) - provide examples for other members of the lab
  • follow structure of an R package (if working in R)
  • more lab demos and training on practices
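
As a hedged illustration of the testing and roxygen points above (the function clean_counts, its behavior, and the file paths are all hypothetical):

  # R/clean_counts.R -- roxygen comments like these can be turned
  # into consistent, rendered documentation
  #' Remove rows with negative or missing counts
  #'
  #' @param df A data frame with a numeric `count` column.
  #' @return The data frame with invalid rows dropped.
  clean_counts <- function(df) {
    df[!is.na(df$count) & df$count >= 0, ]
  }

  # tests/testthat/test-clean_counts.R -- check that the function
  # does what we think it does
  library(testthat)
  test_that("clean_counts drops negative and missing counts", {
    df <- data.frame(site = c("a", "b", "c", "d"),
                     count = c(3, -1, NA, 5))
    expect_equal(clean_counts(df)$count, c(3, 5))
  })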

Breakout Group 2 (unconference room)

How do you tell if research code meets acceptable standards or is "good enough"?

  • must run on anyone's computer (provide everything in code, or document external packages and data)
  • documentation (examples of how to run the code, and different pieces)
    • lots of possibilities here
    • describe the main flow: what portions of the code to run, in what order, and what they do
  • tests would be nice (along with code written as functions so they can be tested)
  • how many functions per file? (could be one function per file or several, whatever is most sensible for users trying to find the code)
  • listing dependencies among functions in the research code (what order are things run in? see the sketch after this list)
  • style guide (use consistent names, don't reuse names if representing different things)
  • self-check "if I come back in a year, will I still be able to understand and use the code?"
  • clean up previous drafts that may be commented out (for the publication stage)
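
As an illustrative sketch of the "main flow" and run-order points above (all file names are hypothetical), a single top-level script can make the pipeline order explicit:

  # main.R -- run the full analysis in order; each step writes its
  # output to disk so later steps can be inspected or rerun
  source("R/01-clean-data.R")    # raw data -> data/clean.csv
  source("R/02-fit-models.R")    # data/clean.csv -> output/models.rds
  source("R/03-make-figures.R")  # output/models.rds -> figures/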

What practices or guidelines do you currently use to check research code? What practices or guidelines would you be willing to adopt?

  • Current:
    • version control
    • testing
    • documentation ('self-check')
    • core code as packages
    • documenting dependencies
  • Adopt:
    • testing
    • peer review (of code)
    • runnable examples (formal)
    • containers
    • documenting dependencies
    • dependency management tool - specify package versions (see the sketch after this list)
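
One way to specify package versions, as a sketch (the versions shown are illustrative, not a real project's; packrat, or the containers mentioned above, are alternative approaches):

  # install-deps.R -- pin the package versions used in the analysis
  # (versions shown here are illustrative only)
  install.packages("remotes")
  remotes::install_version("dplyr", version = "0.7.6")
  remotes::install_version("ggplot2", version = "3.0.0")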