---
title: General Software Standards
tags: statistical-software
robots: noindex, nofollow
---
<!-- Edit the .Rmd not the .md file -->
## General Standards for Statistical Software
These general standards, and all category-specific standards that
follow, are intended to serve as *recommendations* for best practices.
Note in particular that many standards are written using the word
“*should*” in explicit acknowledgement that adhering to such standards
may not always be possible. All standards phrased in these terms are
intended to be interpreted as applicable under such conditions as
“*Where possible*”, or “*Where applicable*”. Developers are requested to
note any standards which they deem not applicable to their software via
the [`srr` package](https://ropenscilabs.github.io/srr/), as described
in [Chapter 3](#pkgdev).
<details>
<summary>
These standards refer to <b>Data Types</b> as the fundamental types
defined by the
<a href="https://cran.r-project.org/doc/manuals/R-lang.html">R
language</a> itself. Information on these types can be seen by clicking
here.
</summary>
<p>
The [R language](https://cran.r-project.org/doc/manuals/R-lang.html)
defines the following data types:
- Logical
- Integer
- Continuous (`class = "numeric"` / `typeof = "double"`)
- Complex
- String / character
The base R system also includes what are considered here to be direct
extensions of fundamental types to include:
- Factor
- Ordered Factor
- Date/Time
The continuous type has a `typeof` of “double” because that represents
the storage mode in the C representation of such objects, while the
`class` as defined within R is referred to as “numeric”. While `typeof`
is not the same as `class`, with reference to continuous variables,
“numeric” may be considered identical to “double” throughout.
The term “character” is interpreted here to refer to a vector each
element of which is an individual “character” object. The term “string”
does not relate to any official R nomenclature, but is used here to
refer for convenience to a character vector of length one; in other
words, a “string” is the sole element of a single-length “character”
vector.
------------------------------------------------------------------------
</p>
</details>
<br>
### 1 Documentation
- **G1.0** *Statistical Software should list at least one primary
reference from published academic literature.*
We consider that statistical software submitted under our system will
either (i) implement or extend prior methods, in which case the *primary
reference* will be to the most relevant published version(s) of prior
methods; or (ii) be an implementation of some new method. In the second
case, it will be expected that the software will eventually form the
basis of an academic publication. Until that time, the most suitable
reference for equivalent algorithms or implementations should be
provided.
- **G1.1** *Statistical Software should document whether the
algorithm(s) it implements are:*
- *The first implementation of a novel algorithm*; or
- *The first implementation within **R** of an algorithm which has
previously been implemented in other languages or contexts*; or
- *An improvement on other implementations of similar algorithms
in **R***.
The second and third options additionally require references to
comparable algorithms or implementations to be documented somewhere
within the software, including references to all known implementations
in other computer languages. (A common location for such is a statement
of “*Prior Art*” or similar at the end of the main `README` document.)
#### 1.1 Statistical Terminology
- **G1.2** *All statistical terminology should be clarified and
unambiguously defined.*
Developers should not presume anywhere in the documentation of software
that specific statistical terminology may be “generally understood”, and
therefore not need explicit clarification. Even terms which many may
consider sufficiently generic as to not require such clarification, such
as “null hypotheses” or “confidence intervals”, will generally need
explicit clarification. For example, both the estimation and
interpretation of confidence intervals are dependent on distributional
properties and associated assumptions. Any particular implementation of
procedures to estimate or report on confidence intervals will
accordingly reflect assumptions on distributional properties (among
other aspects), both the nature and implications of which must be
explicitly clarified.
#### 1.2 Function-level Documentation
- **G1.3** *Software should use
[`roxygen2`](https://roxygen2.r-lib.org/) to document all
functions.*
- **G1.3a** *All internal (non-exported) functions should also be
documented in standard [`roxygen2`](https://roxygen2.r-lib.org/)
format, along with a final `@noRd` tag to suppress automatic
generation of `.Rd` files.*
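As a minimal sketch of the preceding point on internal functions, the block below documents a non-exported helper in `roxygen2` format and closes with `@noRd`. The function name `rescale_unit` and its behaviour are hypothetical, chosen for illustration only.

``` r
#' Rescale a numeric vector to the unit interval
#'
#' Hypothetical internal helper, shown only to illustrate the `@noRd` tag.
#'
#' @param x A numeric vector with at least two distinct finite values.
#' @return A numeric vector of the same length as `x`, with values
#' between 0 and 1.
#' @noRd
rescale_unit <- function (x) {
    rng <- range (x, na.rm = TRUE)
    (x - rng [1]) / (rng [2] - rng [1])
}
```

The documentation remains visible to anyone reading the source, while no `.Rd` file is generated for the function.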
#### 1.3 Supplementary Documentation
The following standards describe several forms of what might be
considered “Supplementary Material”. While there are many places within
an R package where such material may be included, common locations
include vignettes, or in additional directories (such as `data-raw`)
listed in `.Rbuildignore` to prevent inclusion within installed
packages.
Where software supports a publication, the following standard applies to
all claims made in the publication with regard to software performance
(for example, claims of algorithmic scaling or efficiency; or claims of
accuracy):
- **G1.4** *Software should include all code necessary to reproduce
results which form the basis of performance claims made in
associated publications.*
Where claims regarding aspects of software performance are made with
respect to other extant R packages, the following standard applies:
- **G1.5** *Software should include code necessary to compare
performance claims with alternative implementations in other R
packages.*
### 2 Input Structures
This section considers general standards for *Input Structures*. These
standards may often effectively be addressed through implementing class
structures, although this is not a general requirement. Developers are
nevertheless encouraged to examine the guide to [S3
vectors](https://vctrs.r-lib.org/articles/s3-vector.html#casting-and-coercion)
in the [`vctrs` package](https://vctrs.r-lib.org) as an example of the
kind of assurances and validation checks that are possible with regard
to input data. Systems like those demonstrated in that vignette provide
a very effective way to ensure that software remains robust to diverse
and unexpected classes and types of input data. Packages such as
[`checkmate`](https://mllg.github.io/checkmate/index.html) enable direct
and simple ways to check and assert input structures.
#### 2.1 Uni-variate (Vector) Input
It is important to note for univariate data that single values in R are
vectors with a length of one, and that `1L` is of exactly the same *data
type* as `1:n`. Given this, inputs expected to be univariate should:
- **G2.0** *Implement assertions on lengths of inputs, particularly
through asserting that inputs expected to be single- or multi-valued
are indeed so.*
- **G2.0a** *Provide explicit secondary documentation of any
expectations on lengths of inputs.*
- **G2.1** *Implement assertions on types of inputs (see the initial
point on nomenclature above).*
- **G2.1a** *Provide explicit secondary documentation of
expectations on data types of all vector inputs.*
- **G2.2** *Appropriately prohibit or restrict submission of
multivariate input to parameters expected to be univariate.*
- **G2.3** *For univariate character input:*
- **G2.3a** *Use `match.arg()` or equivalent where applicable to
only permit expected values.*
- **G2.3b** *Either: use `tolower()` or equivalent to ensure input
of character parameters is not case dependent; or explicitly
document that parameters are strictly case-sensitive.*
- **G2.4** *Provide appropriate mechanisms to convert between
different data types, potentially including:*
- **G2.4a** *explicit conversion to `integer` via `as.integer()`*
- **G2.4b** *explicit conversion to continuous via `as.numeric()`*
- **G2.4c** *explicit conversion to character via `as.character()`
(and not `paste` or `paste0`)*
- **G2.4d** *explicit conversion to factor via `as.factor()`*
- **G2.4e** *explicit conversion from factor via `as...()`
functions*
- **G2.5** *Where inputs are expected to be of `factor` type,
secondary documentation should explicitly state whether these should
be `ordered` or not, and those inputs should provide appropriate
error or other routines to ensure inputs follow these expectations.*
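Several of the preceding standards can be sketched in base R as follows. This is one possible approach under stated assumptions, not a required implementation: the function `check_inputs` and its parameters `n_groups` and `method` are hypothetical.

``` r
# Hypothetical validation routine for univariate inputs
check_inputs <- function (x, n_groups = 1L, method = c ("mean", "median")) {

    # Assert type of input, and prohibit multivariate input
    if (!is.numeric (x)) {
        stop ("'x' must be a numeric vector")
    }
    if (!is.null (dim (x))) {
        stop ("'x' must be univariate, not a matrix or array")
    }

    # Assert that an input expected to be single-valued is indeed so
    if (!is.numeric (n_groups) || length (n_groups) != 1L) {
        stop ("'n_groups' must be a single numeric value")
    }
    # Explicit conversion to integer
    n_groups <- as.integer (n_groups)

    # Case-insensitive character input, restricted to expected values
    if (!is.character (method)) {
        stop ("'method' must be a character value")
    }
    method <- match.arg (tolower (method), c ("mean", "median"))

    list (x = x, n_groups = n_groups, method = method)
}

res <- check_inputs (c (1, 2, 3), n_groups = 2, method = "MEDIAN")
res$method   # "median"
```

Packages such as `checkmate` offer more concise equivalents of these checks; base R is used here only to keep the sketch free of dependencies.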
A few packages implement R versions of “static type” forms common in
other languages, whereby the type of a variable must be explicitly
specified prior to assignment. Use of such approaches is encouraged,
including but not restricted to approaches documented in packages such
as [`vctrs`](https://vctrs.r-lib.org), or the experimental package
[`typed`](https://github.com/moodymudskipper/typed). One additional
standard for vector input is:
- **G2.6** *Software which accepts one-dimensional input should ensure
values are appropriately pre-processed regardless of class
structures.*
The [`units` package](https://github.com/r-quantities/units/) provides a
good example, in creating objects that may be treated as vectors, yet
which have a class structure that does not inherit from the `vector`
class. Using these objects as input often causes software to fail. The
`storage.mode` of the underlying objects may nevertheless be examined,
and the objects transformed or processed accordingly to ensure such
inputs do not lead to errors.
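The following sketch illustrates this kind of pre-processing. To avoid a dependency on `units`, it uses a hypothetical class `my_units` as a stand-in for any classed object which behaves like a vector; the function `preprocess_vector` is likewise an assumption for illustration.

``` r
# Hypothetical classed object standing in for, e.g., a `units` vector
x <- structure (c (1.5, 2.5, 3.5), class = "my_units", unit = "m")
is.vector (x)      # FALSE, because of the additional attributes
storage.mode (x)   # "double"

# Hypothetical pre-processing routine: examine the storage mode, then
# reduce the object to a plain vector
preprocess_vector <- function (x) {
    if (is.factor (x) || !storage.mode (x) %in% c ("integer", "double")) {
        stop ("'x' must have a numeric storage mode")
    }
    if (!is.null (attributes (x))) {
        # Removes all attributes, including class and names
        message ("Attributes of 'x' have been removed")
        attributes (x) <- NULL
    }
    x
}

y <- preprocess_vector (x)
is.vector (y)      # TRUE
```

Removing attributes discards information such as the unit itself, so a diagnostic message is issued; whether that loss is acceptable depends on the software in question.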
#### 2.2 Tabular Input
This sub-section concerns input in “tabular data” forms, meaning the
base R forms `array`, `matrix`, and `data.frame`, and other forms and
classes derived from these. Tabular data generally have two dimensions,
although may have more (such as for `array` objects). There is a primary
distinction within R itself between `array` or `matrix` representations,
and `data.frame` and associated representations. The former are
restricted to storing data of a single uniform type (for example, all
`integer` or all `character` values), whereas `data.frame` and associated
representations (generally) store each column as a list item, allowing
different columns to hold values of different types. Further noting that
a `matrix` may, [as of R version
4.0](https://developer.r-project.org/Blog/public/2019/11/09/when-you-think-class.-think-again/index.html),
be considered as a strictly two-dimensional array, tabular inputs for
the purposes of these standards are considered to imply data represented
in one or more of the following forms:
- `matrix` form when referring to specifically two-dimensional data of
one uniform type
- `array` form as a more general expression, or when referring to data
that are not necessarily or strictly two-dimensional
- `data.frame`
- Extensions such as
- [`tibble`](https://tibble.tidyverse.org)
- [`data.table`](https://rdatatable.gitlab.io/data.table)
- domain-specific classes such as
[`tsibble`](https://tsibble.tidyverts.org) for time series, or
[`sf`](https://r-spatial.github.io/sf/) for spatial data.
Both `matrix` and `array` forms are actually stored as vectors with a
single `storage.mode`, and so all of the preceding standards
**G2.0**–**G2.5** apply. The other rectangular forms are not stored as
vectors, and do not necessarily have a single `storage.mode` for all
columns. These forms are referred to throughout these standards as
“`data.frame`-type tabular forms”, which may be assumed to refer to data
represented in either the `base::data.frame` format, and/or any of the
classes listed in the final of the above points.
General Standards applicable to software which is intended to accept any
one or more of these `data.frame`-type tabular inputs are then that:
- **G2.7** *Software should accept as input as many of the above
standard tabular forms as possible, including extension to
domain-specific forms.*
Software need not necessarily test abilities to accept different types
of inputs, because that may require adding packages to the `Suggests`
field of a package for that purpose alone. Nevertheless, software which
somehow uses (through `Depends` or `Suggests`) any packages for
representing tabular data should confirm in tests the ability to accept
these types of input.
- **G2.8** *Software should provide appropriate conversion or dispatch
routines as part of initial pre-processing to ensure that all other
sub-functions of a package receive inputs of a single defined class
or type.*
- **G2.9** *Software should issue diagnostic messages for type
conversion in which information is lost (such as conversion of
variables from factor to character; standardisation of variable
names; or removal of meta-data such as those associated with
[`sf`-format](https://r-spatial.github.io/sf/) data) or added (such
as insertion of variable or column names where none were provided).*
Note, for example, that an `array` may have column names which start
with numeric values, but that a `data.frame` may not.
``` r
x <- array (1, dim = c(1, 1), dimnames = list("1", "2")) # okay
print (x)
#>   2
#> 1 1
data.frame (x)
#>   X2
#> 1  1
```
If `array` or `matrix` class objects are accepted as input, then
**G2.8** implies that routines should be implemented to check for such
conversion of column names.
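A conversion routine which checks for such changes might look like the following sketch. The function `to_data_frame` is hypothetical, and handles only two-dimensional input.

``` r
# Hypothetical routine converting matrix input to a data.frame, with
# diagnostic messages where column names are added or changed
to_data_frame <- function (x) {
    if (is.data.frame (x)) {
        return (as.data.frame (x)) # reduce derived classes to data.frame
    }
    if (!is.matrix (x)) {
        stop ("'x' must be a matrix or data.frame")
    }
    nms_in <- colnames (x)
    out <- data.frame (x)
    if (is.null (nms_in)) {
        message (
            "Column names were added: ",
            paste (names (out), collapse = ", ")
        )
    } else if (!identical (nms_in, names (out))) {
        message (
            "Column names were changed from [",
            paste (nms_in, collapse = ", "), "] to [",
            paste (names (out), collapse = ", "), "]"
        )
    }
    out
}

x <- array (1, dim = c (1, 1), dimnames = list ("1", "2"))
out <- to_data_frame (x) # issues a message that "2" became "X2"
```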
The next standard concerns the following inconsistencies between three
common tabular classes in regard to the column extraction operator, `[`.
``` r
x <- iris # data.frame from the datasets package
class (x)
#> [1] "data.frame"
class (x [, 1])
#> [1] "numeric"
class (x [, 1, drop = TRUE]) # default
#> [1] "numeric"
class (x [, 1, drop = FALSE])
#> [1] "data.frame"
x <- tibble::tibble (x)
class (x [, 1])
#> [1] "tbl_df" "tbl" "data.frame"
class (x [, 1, drop = TRUE])
#> [1] "numeric"
class (x [, 1, drop = FALSE]) # default
#> [1] "tbl_df" "tbl" "data.frame"
x <- data.table::data.table (x)
class (x [, 1])
#> [1] "data.table" "data.frame"
class (x [, 1, drop = TRUE]) # no effect
#> [1] "data.table" "data.frame"
class (x [, 1, drop = FALSE]) # default
#> [1] "data.table" "data.frame"
```
- Extracting a single column from a `data.frame` returns a `vector` by
default, and a `data.frame` if `drop = FALSE`.
- Extracting a single column from a `tibble` returns a single-column
`tibble` by default, and a `vector` if `drop = TRUE`.
- Extracting a single column from a `data.table` always returns a
`data.table`, and the `drop` argument has no effect.
Given such inconsistencies,
- **G2.10** *Software should ensure that extraction or filtering of
single columns from tabular inputs does not presume any particular
default behaviour, and should ensure all column-extraction
operations behave consistently regardless of the class of tabular
data used as input.*
Adherence to the above standard **G2.8** will ensure that any implicitly
or explicitly assumed default behaviour will yield consistent results
regardless of input classes.
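One way to avoid reliance on the default behaviour of `[` is to extract single columns with `[[`, which returns the column itself for all three of the classes shown above. The helper `extract_column` below is a hypothetical sketch of that idea, demonstrated here on a `data.frame` only.

``` r
# Hypothetical helper: extract one column as a vector, regardless of
# the class of data.frame-type input
extract_column <- function (x, i) {
    if (!is.data.frame (x)) {
        stop ("'x' must be a data.frame or derived class")
    }
    if (length (i) != 1L) {
        stop ("'i' must specify a single column")
    }
    x [[i]]
}

class (extract_column (iris, 1))
#> [1] "numeric"
class (extract_column (iris, "Species"))
#> [1] "factor"
```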
**Columns of tabular inputs**
The following standards apply to `data.frame`-like tabular objects
(including all derived and otherwise compatible classes), and so do not
apply to `matrix` or `array` objects.
- **G2.11** *Software should ensure that `data.frame`-like tabular
objects which have columns which do not themselves have standard
class attributes (typically, `vector`) are appropriately processed,
and do not error without reason. This behaviour should be tested.
Again, columns created by the [`units`
package](https://github.com/r-quantities/units/) provide a good test
case.*
- **G2.12** *Software should ensure that list columns of
`data.frame`-like tabular objects are appropriately pre-processed,
either through being removed, converted to equivalent vector columns
where appropriate, or some other appropriate treatment such as an
informative error. This behaviour should be tested.*
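A minimal sketch of one such treatment of list columns follows. The function `process_list_cols` is hypothetical: it converts list columns to vector columns where every element has length one and all elements share a type, and otherwise issues an informative error.

``` r
# Hypothetical pre-processing of list columns
process_list_cols <- function (x) {
    is_list <- vapply (x, is.list, logical (1))
    for (n in names (x) [is_list]) {
        col <- x [[n]]
        types <- unique (vapply (col, typeof, character (1)))
        if (all (lengths (col) == 1L) && length (types) == 1L) {
            x [[n]] <- unlist (col)
            message ("List column '", n, "' converted to a vector column")
        } else {
            stop ("List column '", n, "' cannot be converted to a vector")
        }
    }
    x
}

x <- data.frame (a = 1:2)
x$b <- list (1, 2) # list column with elements of length one
y <- process_list_cols (x)
is.list (y$b) # FALSE
```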
#### 2.3 Missing or Undefined Values
- **G2.13** *Statistical Software should implement appropriate checks
for missing data as part of initial pre-processing prior to passing
data to analytic algorithms.*
- **G2.14** *Where possible, all functions should provide options for
users to specify how to handle missing (`NA`) data, with options
minimally including:*
- **G2.14a** *error on missing data*
- **G2.14b** *ignore missing data with default warnings or
messages issued*
- **G2.14c** *replace missing data with appropriately imputed
values*
- **G2.15** *Functions should never assume non-missingness, and should
never pass data with potential missing values to any base routines
with default `na.rm = FALSE`-type parameters (such as
[`mean()`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/mean.html),
[`sd()`](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/sd.html)
or
[`cor()`](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/cor.html)).*
- **G2.16** *All functions should also provide options to handle
undefined values (e.g., `NaN`, `Inf` and `-Inf`), including
potentially ignoring or removing such values.*
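The standards on missing and undefined values can be sketched as follows. The function `safe_mean`, its parameters `na_action` and `rm_nonfinite`, and the use of median imputation are all assumptions made for illustration; appropriate imputation in particular depends strongly on context.

``` r
# Hypothetical function with explicit handling of missing (NA) and
# undefined (NaN, Inf, -Inf) values
safe_mean <- function (x,
                       na_action = c ("error", "ignore", "impute"),
                       rm_nonfinite = FALSE) {

    na_action <- match.arg (na_action)
    if (!is.numeric (x)) {
        stop ("'x' must be numeric")
    }

    # Undefined values: error by default, optionally remove
    undefined <- is.nan (x) | is.infinite (x)
    if (any (undefined)) {
        if (!rm_nonfinite) {
            stop ("'x' contains undefined values (NaN, Inf, or -Inf)")
        }
        x <- x [!undefined]
    }

    # Missing values: check before passing data to mean()
    if (anyNA (x)) {
        if (na_action == "error") {
            stop ("'x' contains missing values")
        } else if (na_action == "ignore") {
            warning (sum (is.na (x)), " missing value(s) ignored")
            x <- x [!is.na (x)]
        } else {
            # Median imputation, for illustration only
            x [is.na (x)] <- stats::median (x, na.rm = TRUE)
        }
    }

    mean (x)
}

safe_mean (c (1, NA, 3), na_action = "impute")
#> [1] 2
```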
### 3 Algorithms
- **G3.0** *Statistical software should never compare floating point
numbers for equality. All numeric equality comparisons should either
ensure that they are made between integers, or use appropriate
tolerances for approximate equality.*
This standard applies to all computer languages included in any package.
In R, values can be affirmed to be integers through `is.integer()`, or
asserting that the `storage.mode()` of an object is “integer”. One way
to compare numeric values with tolerance is with the [`all.equal()`
function](https://stat.ethz.ch/R-manual/R-devel/library/base/html/all.equal.html),
which accepts an additional `tolerance` parameter with a default for
`numeric` comparison of `sqrt(.Machine$double.eps)`, which is typically
around 1.5e-8. In other languages, including C and C++, comparisons of
floating point numbers are commonly implemented by conditions such as
`if (fabs(a - b) < tol)`, where `tol` specifies the tolerance for
equality.
Importantly, R functions such as
[`duplicated()`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/duplicated.html)
and
[`unique()`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/unique.html)
rely on equality comparisons, and this standard extends to require that
software should not apply any functions which themselves rely on
equality comparisons to floating point numbers.
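The following short example illustrates why these comparisons matter, using two values which are mathematically identical but differ in their floating point representations.

``` r
a <- 0.1 + 0.2
b <- 0.3

a == b # exact comparison
#> [1] FALSE
isTRUE (all.equal (a, b)) # comparison within default tolerance
#> [1] TRUE
abs (a - b) < sqrt (.Machine$double.eps) # explicit tolerance
#> [1] TRUE

# unique() relies on exact equality, and so treats the two values as
# distinct
length (unique (c (a, b)))
#> [1] 2
```

Note that `all.equal()` returns a character description of differences rather than `FALSE` when values differ, which is why it is wrapped in `isTRUE()` here.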
- **G3.1** *Statistical software which relies on covariance
calculations should enable users to choose between different
algorithms for calculating covariances, and should not rely solely
on covariances from the `stats::cov` function.*
- **G3.1a** *The ability to use arbitrarily specified covariance
methods should be documented (typically in examples or
vignettes).*
Estimates of covariance can be very sensitive to outliers, and a variety
of methods have been developed for “robust” estimates of covariance,
implemented in such packages as
[`rms`](https://cran.r-project.org/package=rms),
[`robust`](https://cran.r-project.org/package=robust), and
[`sandwich`](https://cran.r-project.org/package=sandwich). Adhering to
this standard merely requires an ability for a user to specify a
particular covariance function, such as through an additional parameter.
The `stats::cov` function can be used as a default, and additional
packages such as the three listed here need not necessarily be listed as
`Imports` to a package.
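A minimal sketch of such an additional parameter follows. The function `mahal_dist` and its parameter `cov_fn` are hypothetical; the only assumption on a user-supplied function is that it accepts a numeric matrix and returns a covariance matrix.

``` r
# Hypothetical function which accepts a user-specified covariance function
mahal_dist <- function (x, cov_fn = stats::cov, ...) {
    x <- as.matrix (x)
    if (!is.function (cov_fn)) {
        stop ("'cov_fn' must be a function")
    }
    s <- cov_fn (x, ...)
    if (!is.matrix (s) || !all (dim (s) == ncol (x))) {
        stop ("'cov_fn' must return a square matrix matching columns of 'x'")
    }
    stats::mahalanobis (x, center = colMeans (x), cov = s)
}

# Default covariance from stats::cov
d1 <- mahal_dist (iris [, 1:4])

# Any function returning a covariance matrix may be substituted; here a
# rank-based alternative, purely as an example
rank_cov <- function (x) stats::cov (x, method = "spearman")
d2 <- mahal_dist (iris [, 1:4], cov_fn = rank_cov)
```

Functions from other packages which return richer objects than a plain matrix would need a small wrapper to extract the covariance matrix before being passed as `cov_fn`.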
### 4 Output Structures
- **G4.0** *Statistical Software which enables outputs to be written
to local files should parse parameters specifying file names to
ensure appropriate file suffixes are automatically generated where
not provided.*
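One possible sketch of such parsing uses `tools::file_ext()` to detect whether a suffix was provided. The helper `add_file_suffix` and its default suffix are hypothetical.

``` r
# Hypothetical helper: append a default suffix where none is provided
add_file_suffix <- function (filename, suffix = "csv") {
    if (!is.character (filename) || length (filename) != 1L) {
        stop ("'filename' must be a single character value")
    }
    if (!nzchar (tools::file_ext (filename))) {
        filename <- paste0 (filename, ".", suffix)
    }
    filename
}

add_file_suffix ("results")
#> [1] "results.csv"
add_file_suffix ("results.txt")
#> [1] "results.txt"
```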
### 5 Testing
All packages should follow rOpenSci standards on
[testing](https://devguide.ropensci.org/building.html#testing) and
[continuous integration](https://devguide.ropensci.org/ci.html),
including aiming for high test coverage. Extant R packages which may be
useful for testing include [`testthat`](https://testthat.r-lib.org),
[`tinytest`](https://github.com/markvanderloo/tinytest),
[`roxytest`](https://github.com/mikldk/roxytest), and
[`xpectr`](https://github.com/LudvigOlsen/xpectr).
#### 5.1 Test Data Sets
- **G5.0** *Where applicable or practicable, tests should use standard
data sets with known properties (for example, the [NIST Standard
Reference Datasets](https://www.itl.nist.gov/div898/strd/), or data
sets provided by other widely-used R packages).*
- **G5.1** *Data sets created within, and used to test, a package
should be exported (or otherwise made generally available) so that
users can confirm tests and run examples.*
#### 5.2 Responses to Unexpected Input
- **G5.2** *Appropriate error and warning behaviour of all functions
should be explicitly demonstrated through tests. In particular,*
- **G5.2a** *Every message produced within R code by `stop()`,
`warning()`, `message()`, or equivalent should be unique*
- **G5.2b** *Explicit tests should demonstrate conditions which
trigger every one of those messages, and should compare the
result with expected values.*
- **G5.3** *For functions which are expected to return objects
containing no missing (`NA`) or undefined (`NaN`, `Inf`) values, the
absence of any such values in return objects should be explicitly
tested.*
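The following sketch shows tests of this kind in base R, so that it runs without additional packages; in a package using `testthat`, functions such as `expect_error()` and `expect_warning()` would serve the same purpose. The function `check_positive` is hypothetical.

``` r
# Hypothetical function with one unique error and one unique warning
check_positive <- function (x) {
    if (!is.numeric (x)) {
        stop ("'x' must be numeric")
    }
    if (any (x <= 0)) {
        warning ("'x' contains non-positive values")
    }
    invisible (x)
}

# Trigger each condition, and compare messages with expected values
msg <- tryCatch (
    check_positive ("a"),
    error = function (e) conditionMessage (e)
)
stopifnot (identical (msg, "'x' must be numeric"))

msg <- tryCatch (
    check_positive (-1),
    warning = function (w) conditionMessage (w)
)
stopifnot (identical (msg, "'x' contains non-positive values"))

# Explicitly test absence of missing or undefined values in a result
res <- sqrt (c (1, 4, 9))
stopifnot (!anyNA (res), all (is.finite (res)))
```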
#### 5.3 Algorithm Tests
For testing *statistical algorithms*, tests should include tests of the
following types:
- **G5.4** **Correctness tests** *to test that statistical algorithms
produce expected results to some fixed test data sets (potentially
through comparisons using binding frameworks such as
[RStata](https://github.com/lbraglia/RStata)).*
- **G5.4a** *For new methods, it can be difficult to separate out
correctness of the method from the correctness of the
implementation, as there may be no reference for comparison. In
this case, testing may be implemented against simple, trivial
cases or against multiple implementations such as an initial R
implementation compared with results from a C/C++
implementation.*
- **G5.4b** *For new implementations of existing methods,
correctness tests should include tests against previous
implementations. Such testing may explicitly call those
implementations in testing, preferably from fixed versions of
other software, or use stored outputs from those where that is
not possible.*
- **G5.4c** *Where applicable, stored values may be drawn from
published paper outputs where code from original implementations
is not available.*
- **G5.5** *Correctness tests should be run with a fixed random seed.*
- **G5.6** **Parameter recovery tests** *to test that the
implementation produces expected results given data with known
properties. For instance, a linear regression algorithm should
return expected coefficient values for a simulated data set
generated from a linear model.*
- **G5.6a** *Parameter recovery tests should generally be expected
to succeed within a defined tolerance rather than recovering
exact values.*
- **G5.6b** *Parameter recovery tests should be run with multiple
random seeds when either data simulation or the algorithm
contains a random component. (When long-running, such tests may
be part of an extended, rather than regular, test suite; see
**G5.10**–**G5.12**, below).*
- **G5.7** **Algorithm performance tests** *to test that
the implementation performs as expected as properties of data change.
For instance, a test may show that parameters approach correct
estimates within tolerance as data size increases, or that
convergence times decrease for higher convergence thresholds.*
- **G5.8** **Edge condition tests** *to test that these conditions
produce expected behaviour such as clear warnings or errors when
confronted with data with extreme properties including but not
limited to:*
- **G5.8a** *Zero-length data*
- **G5.8b** *Data of unsupported types (e.g., character or complex
numbers for functions designed only for numeric data)*
- **G5.8c** *Data with all-`NA` fields or columns or all identical
fields or columns*
- **G5.8d** *Data outside the scope of the algorithm (for example,
data with more fields (columns) than observations (rows) for
some regression algorithms)*
- **G5.9** **Noise susceptibility tests** *Packages should test for
expected stochastic behaviour, such as through the following
conditions:*
- **G5.9a** *Adding trivial noise (for example, at the scale of
`.Machine$double.eps`) to data does not meaningfully change
results*
- **G5.9b** *Running under different random seeds or initial
conditions does not meaningfully change results*
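Two of the above test types, parameter recovery (**G5.6**) and noise susceptibility (**G5.9a**), can be sketched using `stats::lm()` as a stand-in for the algorithm under test. The simulated model, sample sizes, and tolerances below are assumptions chosen for illustration, not recommended values.

``` r
# Parameter recovery: simulate data from a known linear model
beta0 <- 1 # true intercept
beta1 <- 2 # true slope

recover <- function (seed, n = 1000) {
    set.seed (seed)
    x <- stats::runif (n)
    y <- beta0 + beta1 * x + stats::rnorm (n, sd = 0.1)
    unname (stats::coef (stats::lm (y ~ x)))
}

# Multiple random seeds, with success defined within a tolerance
for (seed in 1:5) {
    est <- recover (seed)
    stopifnot (all (abs (est - c (beta0, beta1)) < 0.1))
}

# Noise susceptibility: trivial noise should not meaningfully change
# results
set.seed (1)
x <- stats::runif (100)
y <- beta0 + beta1 * x + stats::rnorm (100, sd = 0.1)
coefs0 <- unname (stats::coef (stats::lm (y ~ x)))

x_noisy <- x + stats::runif (100, -1, 1) * .Machine$double.eps
coefs1 <- unname (stats::coef (stats::lm (y ~ x_noisy)))

stopifnot (isTRUE (all.equal (coefs0, coefs1)))
```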
#### 5.4 Extended tests
Thorough testing of statistical software may require tests on large data
sets, tests with many permutations, or other conditions leading to
long-running tests. In such cases it may be neither possible nor
advisable to execute tests continuously, or with every code change.
Software should nevertheless test any and all conditions regardless of
how long tests may take, and in doing so should adhere to the following
standards:
- **G5.10** *Extended tests should be included and run under a common
framework with other tests but be switched on by flags such as a
`<MYPKG>_EXTENDED_TESTS=1` environment variable.*
- **G5.11** *Where extended tests require large data sets or other
assets, these should be provided for downloading and fetched as part
of the testing workflow.*
- **G5.11a** *When any downloads of additional data necessary for
extended tests fail, the tests themselves should not fail,
rather be skipped and implicitly succeed with an appropriate
diagnostic message.*
- **G5.12** *Any conditions necessary to run extended tests such as
platform requirements, memory, expected runtime, and artefacts
produced that may need manual inspection, should be described in
developer documentation such as a `CONTRIBUTING.md` or
`tests/README.md` file.*
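A minimal sketch of switching extended tests on through an environment variable follows. The package name `mypkg`, and hence the variable name `MYPKG_EXTENDED_TESTS`, are hypothetical; in a `testthat` suite, the same condition could be passed to a function such as `skip_if_not()`.

``` r
# Hypothetical flag for a package named "mypkg"
run_extended <- function () {
    identical (Sys.getenv ("MYPKG_EXTENDED_TESTS"), "1")
}

if (run_extended ()) {
    # Long-running tests would be placed here
    message ("Running extended tests")
} else {
    # Skipped tests succeed implicitly, with a diagnostic message
    message ("Extended tests skipped; set MYPKG_EXTENDED_TESTS=1 to run")
}
```

Because `Sys.getenv()` returns an empty string for unset variables, extended tests remain switched off by default.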