# Analysis Code Review Checklist
Copied from: https://raw.githubusercontent.com/sachsmc/stats-code-review/master/Checklists/analysis-checklist.md
Checklist for Data Analysis Scripts
===================================
## Preparation for review
- [ ] Store intermediate objects that take >10 minutes to run
- [ ] Remove the following things
  - [ ] `View(data)`
  - [ ] `?functionname`
- [ ] Check for magic numbers (unexplained hard-coded values)
- [ ] Go over this code review list and check that your code satisfies every item
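
Two of the preparation items, storing slow intermediate objects and avoiding magic numbers, can be sketched in base R as follows. The helper `cache_rds()`, the constant `n_bootstrap_draws`, and the file path are hypothetical names chosen for illustration; the bootstrap is only a stand-in for a genuinely slow step.

```r
# Hypothetical helper: run `compute` only when no cached copy exists at `path`.
cache_rds <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))      # reuse the stored intermediate object
  }
  result <- compute()          # the slow step runs only on a cache miss
  saveRDS(result, path)
  result
}

# Named constant instead of a bare 10000 buried in the code.
n_bootstrap_draws <- 10000

# Placeholder location; a real project would use a folder inside the project.
cache_file <- file.path(tempdir(), "bootstrap_means.rds")

set.seed(2024)
boot_means <- cache_rds(cache_file, function() {
  replicate(n_bootstrap_draws, mean(sample(mtcars$mpg, replace = TRUE)))
})
```

A second call with the same `path` reads the stored object and skips the computation. Deleting the file forces a rerun, which is needed whenever the inputs or the code of the slow step change.
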
## What can go in a README?
- Names of code authors
- Code structure
- Order of files to run
- What the code does
- Intended output
- Run time of long code chunks
- Tree structure of files
- Dependencies
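
One possible way to record the package dependencies for a README is to save the output of `sessionInfo()` after a successful run. This is a minimal sketch; the file name `session-info.txt` and its location are assumptions, not part of the original checklist.

```r
# Hypothetical file name; in a real project this would sit next to the README.
deps_file <- file.path(tempdir(), "session-info.txt")

# sessionInfo() reports the R version and the attached and loaded packages
# with their versions; capture.output() turns the printed report into lines.
writeLines(capture.output(sessionInfo()), deps_file)
```
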
Documentation and Organization
------------------------------
- [ ] Is there a readme?
- [ ] Are the scripts and data dependencies organized sensibly in a folder? (e.g. in an RStudio project)
- [ ] Is the code written in small, manageable chunks?
- [ ] Is the code executable from top to bottom?
- [ ] Are the dependencies (data, packages, other scripts) clearly documented?
- [ ] Is the intent of the analysis clear from the documentation?
- [ ] Does the project keep raw data files (which remain untouched) in a separate folder from processed data files? (Storing processed data files makes it fast and easy to rerun parts of the analysis when the code changes.)
- [ ] Does the code contain control output like `print(object)`? Such output is better written as a log message.
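
As an illustration of the last item, control output can be routed through `message()` rather than `print()`. The sketch below assumes a small hypothetical helper, `log_info()`, and uses the built-in `airquality` data in place of a real raw file. Messages go to stderr, so they stay separate from results and can be silenced with `suppressMessages()`.

```r
# Hypothetical logging helper: prefixes each message with a timestamp.
log_info <- function(...) {
  message(format(Sys.time(), "%Y-%m-%d %H:%M:%S"), " | ", ...)
}

raw <- airquality                      # built-in data standing in for a raw file
log_info("Read ", nrow(raw), " rows")

complete <- raw[complete.cases(raw), ]
log_info("Dropped ", nrow(raw) - nrow(complete), " rows with missing values")
```
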
Code Structure
--------------
**Comments within the code**
- [ ] Is the purpose of the script stated?
- [ ] Do comments explain why the code does something? (Rather than describing what the code does, as that can be read from the code. Example: comment 'This chi-square test is done to evaluate whether X and Z are independent' on a chi-square test, rather than 'Run the chi-square test'.)
- [ ] For statistical tests, is the null hypothesis described in a comment?
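
A short sketch of what such comments might look like, using the built-in `UCBAdmissions` table. The analysis question in the comments is invented for illustration.

```r
# Purpose: check whether admission and gender are associated in the pooled
# data, before looking at the departments separately.
admit_by_gender <- margin.table(UCBAdmissions, c(1, 2))

# H0: admission status and gender are independent.
# H1: the admission rate differs between genders.
# The continuity correction is requested explicitly because the table is 2x2.
independence_test <- chisq.test(admit_by_gender, correct = TRUE)
independence_test$p.value
```

The comments state the reason for the test and its null hypothesis; none of them repeats what `chisq.test()` already says.
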
**Advanced code structure**
- [ ] Is the code properly indented, using a reasonable line length, and is an editor with syntax highlighting used?
> [name=Anna] I think the editor thing is weird. I can use whatever editor I want to read it, right?
- [ ] Does the code follow a consistent style and formatting?
- [ ] Does the code follow the DRY principle and use functions?
- [ ] Are file names, functions and variable names informative? Can you guess what a file/function does by its name and arguments?
- [ ] Are there redundant or unused variables?
- [ ] Are there repeated blocks of code that can be made into a function?
- [ ] Can the code easily handle changes/updates to the inputs (e.g. data updates)?
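
The DRY items can be illustrated with a small example: a copy-pasted transformation is replaced by one function applied to a vector of column names. The function `standardize()` and the choice of columns are hypothetical. Because the columns are listed in one place, adding a variable means changing a single line.

```r
# Repeated pattern this replaces (copy-pasted once per variable):
#   mpg_z <- (mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg)
#   hp_z  <- (mtcars$hp  - mean(mtcars$hp))  / sd(mtcars$hp)

# Hypothetical helper: centers a numeric vector and scales it to unit SD.
standardize <- function(x, na.rm = TRUE) {
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}

# The variables to transform are declared once, as data rather than code.
vars_to_scale <- c("mpg", "hp", "wt")
scaled <- as.data.frame(lapply(mtcars[vars_to_scale], standardize))
```
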
Data Reading and Cleaning
-------------------------
- [ ] Are the data inspected? (expected ranges or counts)
- [ ] Is there any data manipulation done outside of the script, e.g. in Excel or SPSS?
- [ ] Are missing values identified and handled correctly?
- [ ] Are merge operations done correctly using an appropriate key? Is there any chance of scrambling the data?
- [ ] Are transformations to variables done correctly?
- [ ] Are there any magic numbers?
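
Several of these checks can be written as assertions so that the script stops when an expectation fails. The sketch below uses toy data frames; all names and the plausibility limit are assumptions made for illustration.

```r
# Toy data standing in for two input files (all names are illustrative).
patients <- data.frame(id = c(1, 2, 3, 4), age = c(34, 51, NA, 47))
visits   <- data.frame(id = c(1, 2, 2, 4), sbp = c(120, 135, 140, 128))

# Inspect: expected ranges and missing-value counts.
max_plausible_age <- 120               # named rather than a magic number
stopifnot(all(patients$age >= 0 & patients$age <= max_plausible_age,
              na.rm = TRUE))
n_missing_age <- sum(is.na(patients$age))

# Merge: the key must be unique on the "one" side, otherwise rows multiply.
stopifnot(!anyDuplicated(patients$id))
merged <- merge(visits, patients, by = "id", all.x = TRUE)

# A left join on a unique key keeps exactly one row per visit.
stopifnot(nrow(merged) == nrow(visits))

# Patients without any visit are reported rather than silently dropped.
unmatched_patients <- setdiff(patients$id, visits$id)
```
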
Analysis Methods
----------------
- [ ] Does the code reinvent methods that are already implemented in an existing, high-quality package?
- [ ] Are the algorithm settings/defaults known and correct?
> [name=Anna] I think it would be best to advise against the use of hidden defaults and insist that all (relevant) parameters are specified even if the specification is equivalent to the default
- [ ] Does the analysis do what is intended?
- [ ] Are there any issues with convergence or other numerical problems?
- [ ] Do the results rely on random number generation? If so, is `set.seed()` used?
- [ ] Are there any slow running blocks that need to be optimized? (The need may depend on how frequently the script will be run)
- [ ] Does the analysis successfully address the needs of the project?
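
The items on settings, convergence, and random number generation can be sketched together with `kmeans()`, which relies on random starting points. The simulated data and the seed value are arbitrary choices for illustration. The algorithm settings are spelled out in the call so that a reader does not have to look up the defaults.

```r
set.seed(20240101)                     # arbitrary seed, fixed for reproducibility

# Simulated two-column data with two well-separated groups.
groups <- rbind(
  matrix(rnorm(100, mean = 0), ncol = 2),
  matrix(rnorm(100, mean = 4), ncol = 2)
)

# All relevant settings are stated explicitly instead of left to defaults.
fit <- kmeans(groups, centers = 2, nstart = 10, iter.max = 50,
              algorithm = "Hartigan-Wong")

# kmeans() reports possible algorithm problems through `ifault` (0 = none).
stopifnot(fit$ifault == 0)
```
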
Output
------
- [ ] Do the results rely on any external tools for output?
- [ ] Do downstream modifications need to be made to the output? (e.g. changing labels or colors in a graphics editor)
- [ ] What is the intended product of the analysis (report, paper, presentation, etc.) and how does the script address it?
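
One way to avoid downstream edits in a graphics editor is to set labels, colors, and figure size in the script itself, so that rerunning the script reproduces the final figure. A minimal base R sketch follows; the output path and styling are placeholders.

```r
# Placeholder path; a real project would write to its output folder.
output_file <- file.path(tempdir(), "mpg_by_weight.pdf")

pdf(output_file, width = 6, height = 4)    # size in inches, fixed in code
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Fuel economy (miles per gallon)",
     main = "Fuel economy by vehicle weight",
     pch = 19, col = "steelblue")
invisible(dev.off())                       # close the device to finish the file
```
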
Overall comments:
------
- [ ] Did evaluating the code raise questions about the analysis (in a scientific sense)? If so, report them here:
## New ideas (to sort)
- [ ] Specify how long the code takes to run
- [ ] Raw data transformations (preprocessing steps)
- [ ] Are intermediate objects stored?
- [ ] Where does descriptive data exploration go (e.g. missing data, which data points are excluded)?