---
title: "CODE CHECK"
author:
- affiliation: Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, Cambridgeshire, GB
corresponding: sje30@cam.ac.uk
email: sje30@cam.ac.uk
name: Stephen Eglen
orcid: 0000-0001-8607-8025
- affiliation: Institute for Geoinformatics, University of Münster, Münster, Germany
email: daniel.nuest@uni-muenster.de
name: Daniel Nüst
orcid: 0000-0002-0024-5046
date: "`r format(Sys.time(), '%d %B, %Y')`"
abstract: |
| ...
#author_summary: |
# TBD
bibliography: bibliography.bib
#output: rticles::plos_article
#csl: plos.csl
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
<!-- Let's stick to one sentence per line! -->
<!-- Citations: @Figueredo:2009dg, citep: [@Figueredo:2009dg], multiple citep: [@Figueredo:2009dg; @other_2018] -->
<!-- add " {-}" after headings to remove numbering -->
# Background
- role of data and code, data science
- reproducible research
- data enclaves
- CASCAID
- ROpenSci/PyOpenSci
- time capsules > https://twitter.com/DougBlank/status/1135904909663068165?s=09
**main contributions**
- a concept for integrating a minimal code review into common scholarly communication processes around peer review
- a set of shared principles (recognition value) that allows researchers to quickly understand the level of evaluation across journals/publishers/conferences, and helps these stakeholders establish workflow checking in their domain/solution/product
- a common language and rubric for classifying different levels of code and data evaluation as part of peer review
# What is CODE CHECK?
- **Principles**: https://codecheck.org.uk/
- relation to "proper" citation of data and software, and depositing of data and software in suitable repos _besides_ the bundle for CODE CHECK: do it, only provide the concrete analysis script for the check
- **Why is it useful?**
- reduce the barrier to evaluating non-text parts of research
- acknowledges different skill sets and existing high load on reviewers
- CODE CHECK addresses the problem that publishers' in-house expertise today is unlikely to suffice
- engaging ECRs is a perfect fit (they are up to date on methods and get introduced to peer review)
- relation to research compendia? point it out for authors unsure about how to structure their workflow
- CrossRef and Publons and ORCID have features around reviews
- shifting burden from reviewer to author (but not too much)
- keynote https://doi.org/10.7557/5.4963: in discussion, the speaker mentioned value of peer review for "baseline reassurance", i.e. a paper has at least been checked by someone with an understanding, to increase trust especially if looking at papers from other disciplines
- the **report** must fulfil some requirements to make clear what was really checked and to not mislead: readers will have very different understandings of a green checkmark! (see the sketch after this list)
- reproducibility is hard https://twitter.com/annakrystalli/status/1144176149859377152?ref_src=twsrc%5Etfw > CODE CHECK provides at least some benefit
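
To make that concrete: a minimal sketch of such a scoped summary, assuming a report carries at least the codechecker's name, the date, the regenerated outputs, and an explicit scope statement (the function name, fields, and example values below are illustrative assumptions, not a required format):

```{r report-summary-sketch, eval=FALSE}
# Illustrative sketch of a minimal check summary that states the scope of the
# check, so a green checkmark cannot be over-interpreted.
render_check_summary <- function(codechecker, check_date, outputs) {
  paste0(
    "Codechecker: ", codechecker, "\n",
    "Date of check: ", check_date, "\n",
    "Outputs regenerated: ", paste(outputs, collapse = ", "), "\n",
    "Scope: the listed outputs were reproduced from the submitted code; ",
    "no claim is made about the scientific correctness of the results.\n"
  )
}

cat(render_check_summary(
  codechecker = "Jane Doe",
  check_date  = "2020-01-15",
  outputs     = c("figures/figure1.pdf", "results/table2.csv")
))
```
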
# Related work
- https://paperpile.com/shared/rVNwBS
- https://medium.com/bits-and-behavior/a-modern-vision-for-peer-review-d5f73f0fae07 - work by Amy J. Ko: https://faculty.washington.edu/ajko/publications
- is CODE CHECK's process too small, too incremental a change?
- _Successes and struggles with computational reproducibility: Lessons from the Fragile Families Challenge_ > https://osf.io/preprints/socarxiv/g3pdb/
- community standards for software, cf. slide 6 in https://figshare.com/articles/_/5675104
- "good enough practice" is "get a colleague to try using it", slide 19 in https://figshare.com/articles/_/5675104
- journals doing code reviews
- _"We know of 16 journals that take this valuable step (ie TOP Level 3): AEA as already mentioned, plus others in polisci, psychology, biostats, and org chem: see Data Rep Policies here https://t.co/plOF8j6ADU "_ (https://twitter.com/EvoMellor/status/1202692360456589339?s=09); see also whole thread!
- https://medium.com/@NeurIPSConf/call-for-papers-689294418f43
- https://www.journals.elsevier.com/information-systems/editorial-board/
- SIGMOD Reproducibility Review?
- Biostatistics "AER" (https://academic.oup.com/biostatistics/article/10/3/405/293660)
- open science platforms/cloud services > https://www.nature.com/articles/d41586-019-03366-x
- live code in articles > o2r, eLife RDS https://www.nature.com/articles/d41586-019-00724-7, ...
- Artefact Evaluation - https://www.artifact-eval.org/about.html
- continuous analysis - https://www.nature.com/articles/nbt.3780
- "A Universal Identifier for Computational Results" > https://www.sciencedirect.com/science/article/pii/S1877050911001256
- Discussion about using Docker containers during JOSS reviews > https://github.com/openjournals/joss/issues/498#issuecomment-462046912
- https://scigen.report, an independent site for registering reproductions, via https://annakrystalli.me/talks/ro-reprohack.html#32
- example https://scigen.report/reviews/get?doi=10.1063/1.1823034
- seems to be closed infrastructure
# CODE CHECK implementation concepts
- need to get wording straight: process/implementation/workflow (DN: suggests using "workflow" for the research-analysis workflow)
- https://codecheck.org.uk/process/ (a sketch of machine-readable check metadata follows this list)
- _dimensions_ along which a check can vary
- independent/pre-submission (Peer_Coding) by author + colleagues
- AGILE
- premier (OA) publisher
- community OA / scholarly society with public peer review
- invited reproducibility paper track, cf. https://www.elsevier.com/journals/information-systems/0306-4379/guide-for-authors
- on giving credit: public databases
- https://www.reviewercredits.com/
- Publons
- ORCID
- compare with existing processes that are publicly documented
- AJPS Verification Policy: https://ajps.org/ajps-verification-policy/
- ...
- flexible level of detail? a check can be enhanced to (potentially semi-automatically) check for good practices
- "A group of us in the @force11rescomm Software Citation Implementation Working Group produced a "Software Citation Checklist for Authors" https://t.co/4Noe66FsoX providing guidelines on how to cite software and describing why it is important and what software should be cited" via https://twitter.com/alegonbel/status/1207664922932453376?s=09)
# Annotated Examples
<!-- see examples.md -->
- **the default implementation**
- critically discuss our/the default implementation vs. the principles
- HPC example
- if we see the corpus of examples as a scientific dataset, it should be published in a citable way (maybe even submitted to a data journal)
# Open problems
- do publishers need to build up data science experts who can conduct checks?
- is integrating software/data citation checks beneficial?
- bot support and tools (linters); see the sketch after this list
- Does nudging towards better practices work (highlight the successes) or is any check with opt-in doomed to fail?
- Do we need more than one codechecker? Is there a similar diversity as in a peer review where more opinions are needed?
- can CODE CHECK work for preprints? how? is a re-check required?
- risk: is the hurdle too low? are "levels" of checks a solution to increase the transparency of a check's level of detail?
- code check in the context of innovations and disruptions in scientific publishing
- https://doi.org/10.1042/ETLS20180172
- https://doi.org/10.3390/publications7020034
- see https://github.com/codecheckers/discussion
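
Regarding bot support above, a hedged sketch of the kind of helper a check bot could run: after re-executing the author's workflow, verify that every file listed in the manifest was actually regenerated (the function name, manifest structure, and file names are assumptions for illustration only):

```{r manifest-check-sketch, eval=FALSE}
# Hypothetical helper: after re-running the author's workflow, confirm that
# every file listed in the manifest exists and was modified after the run
# started, i.e. was actually regenerated.
check_manifest <- function(manifest_files, run_started_at) {
  exists <- file.exists(manifest_files)
  mtimes <- file.mtime(manifest_files)
  data.frame(
    file = manifest_files,
    exists = exists,
    regenerated = exists & !is.na(mtimes) & mtimes > run_started_at,
    stringsAsFactors = FALSE
  )
}

run_started_at <- Sys.time()
# source("workflow.R")  # re-run the author's analysis script here
check_manifest(c("figures/figure1.pdf", "results/table2.csv"), run_started_at)
```
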
# Acknowledgements
Collaborators and potential collaborators:
- eLife
- GigaScience
- AH, SciData
- Mozilla
- PLOS?
# Contribution
..