---
title: Bayesian Software Standards
tags: statistical-software
robots: noindex, nofollow
---
<!-- Edit the .Rmd not the .md file -->
## Bayesian and Monte Carlo Software
Bayesian and Monte Carlo software centres on quantitative estimation of
components of [Baye’s
theorem](https://en.wikipedia.org/wiki/Bayes%27_theorem), particularly
on estimation or application of prior and/or posterior probability
distributions. The procedures implemented to estimate the properties of
such distributions are commonly based on random sampling procedures,
hence referred to as “*Monte Carlo*” routines in reference to the random
yet quantifiable nature of casino games. The scope of this category also
includes algorithms which focus on sampling routines only, such as
Markov-Chain Monte Carlo (MCMC) procedures, independent of application
in Bayesian analyses.
The term “model” is understood with reference here to Bayesian software
to refer to an encoded description of how parameters specifying aspects
of one or more prior distributions are transformed into (properties of)
one or more posterior distributions.
Some examples of Bayesian and Monte Carlo software include:
1. The [`bayestestR`
package](https://joss.theoj.org/papers/10.21105/joss.01541) which
“provides tools to describe … posterior distributions”
2. The [`ArviZ`
package](https://joss.theoj.org/papers/10.21105/joss.01143) python
package for exploratory analyses of Bayesian models, particularly
posterior distributions.
3. The [`GammaGompertzCR`
package](https://joss.theoj.org/papers/10.21105/joss.00216), which
features explicit diagnostics of MCMC convergence statistics.
4. The [`BayesianNetwork`
package](https://joss.theoj.org/papers/10.21105/joss.00425), which
is in many ways a wrapper package primarily serving a `shiny` app,
and is also accordingly a package in the EDA category.
5. The [`fmcmc`
package](https://joss.theoj.org/papers/10.21105/joss.01427), which
is a “classic” MCMC package which directly provides its own
implementation, and generates its own convergence statistics.
6. The [`rsimsum`
package](https://joss.theoj.org/papers/10.21105/joss.00739) which
“summarise\[s\] results from Monte Carlo simulation studies”. Many
of the statistics generated by this package are useful for assessing
and comparing Bayesian and Monte Carlo software in general. (See
also the [`MCMCvis`
package](https://joss.theoj.org/papers/10.21105/joss.00640), with
more of a focus on visualisation.)
7. The [`walkr`
package](https://joss.theoj.org/papers/10.21105/joss.00061) for
“MCMC Sampling from Non-Negative Convex Polytopes”. This package is
also indicative of the difficulties of deriving generally applicable
assessments of software in this category, because MCMC *sampling*
relies on fundamentally different inputs and outputs than many other
MCMC routines.
Click on the following link to view a demonstration [Application of
Bayesian and Monte Carlo
Standards](https://hackmd.io/zVWTAl9ZQeCcj_bvMGcmMQ).
Bayesian and Monte Carlo Software (hereafter referred to for simplicity
as “Bayesian Software”) is presumed to perform one or more of the
following steps:
1. Document how to specify inputs including:
- 1.1 Data
- 1.2 Parameters determining prior distributions
- 1.3 Parameters determining the computational processes
2. Accept and validate all of forms of input
3. Apply data transformation and pre-processing steps
4. Apply one or more analytic algorithms, generally sampling algorithms
used to generate estimates of posterior distributions
5. Return the result of that algorithmic application
6. Offer additional functionality such as printing or summarising
return results
This chapter details standards for each of these steps, each prefixed
with “BS”.
### 1 Documentation of Inputs
Prior to actual standards for documentation of inputs, we note one
terminological standard for Bayesian software which uses the term
“hyperparameter”:
- **BS1.0** *Bayesian software which uses the term “hyperparameter”
should explicitly clarify the meaning of that term in the context of
that software.*
This standard reflects the dual facts that this term is frequently used
in Bayesian software, yet has no unambiguous definition or
interpretation. The term “hyperparameter” is also used in other
statistical contexts in ways that are often distinctly different from
its common use in Bayesian analyses. Examples of the kinds of
clarifications required to adhere to this standard include,
> Hyperparameters refer here to parameters determining the form of prior
> distributions that conditionally depend on other parameters.
Such a clarification would then require further explicit distinction
between “parameters” and “hyperparameters”. The remainder of these
standards does not refer to “hyperparameters”, rather attempts to make
explicit distinctions between different kinds of parameters, such as
distributional or algorithmic control parameters. Beyond this standard,
Bayesian Software should provide the following documentation of how to
specify inputs:
- **BS1.1** *Descriptions of how to enter data, both in textual form
and via code examples. Both of these should consider the simplest
cases of single objects representing independent and dependent data,
and potentially more complicated cases of multiple independent data
inputs.*
- **BS1.2** *Description of how to specify prior distributions, both
in textual form describing the general principles of specifying
prior distributions, along with more applied descriptions and
examples, within:*
- **B31.2a** *The main package `README`, either as textual
description or example code*
- **B31.2b** *At least one package vignette, both as general and
applied textual descriptions, and example code*
- **B31.2c** *Function-level documentation, preferably with code
included in examples*
- **BS1.3** *Description of all parameters which control the
computational process (typically those determining aspects such as
numbers and lengths of sampling processes, seeds used to start them,
thinning parameters determining post-hoc sampling from simulated
values, and convergence criteria). In particular:*
- **BS1.3a** *Bayesian Software should document, both in text and
examples, how to use the output of previous simulations as
starting points of subsequent simulations.*
- **BS1.3b** *Where applicable, Bayesian software should document,
both in text and examples, how to use different sampling
algorithms for a given model.*
- **BS1.4** *For Bayesian Software which implements or otherwise
enables convergence checkers, documentation should explicitly
describe and provide examples of use with and without convergence
checkers.*
- **BS1.5** *For Bayesian Software which implements or otherwise
enables multiple convergence checkers, differences between these
should be explicitly tested.*
### 2 Input Data Structures and Validation
This section contains standards primarily intended to ensure that input
data, including model specifications, are validated prior to passing
through to the main computational algorithms.
#### 2.1 Input Data
Bayesian Software is commonly designed to accept generic one- or
two-dimensional forms such as vector, matrix, or `data.frame` objects,
for which the following standard applies.
- **BS2.1** *Bayesian Software should implement pre-processing
routines to ensure all input data is dimensionally commensurate, for
example by ensuring commensurate lengths of vectors or numbers of
rows of tabular inputs.*
- **BS2.1a** *The effects of such routines should be tested.*
#### 2.2 Prior Distributions, Model Specifications, and Distributional Parameters
The second set of standards in this section concern specification of
prior distributions, model structures, or other equivalent ways of
specifying hypothesised relationships among input data structures. R
already has a diverse range of Bayesian Software with distinct
approaches to this task, commonly either through specifying a model as a
character vector representing an R function, or an external file either
as R code, or encoded according to some alternative system (such as for
[`rstan`](https://mc-stan.org/rstan/)).
Bayesian Software should:
- **BS2.2** *Ensure that all appropriate validation and pre-processing
of distributional parameters are implemented as distinct
pre-processing steps prior to submitting to analytic routines, and
especially prior to submitting to multiple parallel computational
chains.*
- **BS2.3** *Ensure that lengths of vectors of distributional
parameters are checked, with no excess values silently discarded
(unless such output is explicitly suppressed, as detailed below).*
- **BS2.4** *Ensure that lengths of vectors of distributional
parameters are commensurate with expected model input (see example
immediately below)*
- **BS2.5** *Where possible, implement pre-processing checks to
validate appropriateness of numeric values submitted for
distributional parameters; for example, by ensuring that
distributional parameters defining second-order moments such as
distributional variance or shape parameters, or any parameters which
are logarithmically transformed, are non-negative.*
The following example demonstrates how standards like the above
(BS2.4-2.5) might be addressed. Consider the following function which
defines a log-likelihood estimator for a linear regression, controlled
via a vector of three distributional parameters, `p`:
``` r
ll <- function (x, y, p) dnorm (y - (p[1] + x * p[2]), sd = p[3], log = TRUE)
```
Pre-processing stages should be used to determine:
1. That the dimensions of the input data, `x` and `y`, are commensurate
(BS2.1); non-commensurate inputs should error by default.
2. The length of the vector `p` (BS2.3)
The latter task is not necessarily straightforward, because the
definition of the function, `ll()`, will itself generally be part of the
input to an actual Bayesian Software function. This functional input
thus needs to be examined to determine expected lengths of
hyperparameter vectors. The following code illustrates one way to
achieve this, relying on utilities for parsing function calls in R,
primarily through the
[`getParseData`](https://stat.ethz.ch/R-manual/R-devel/library/utils/html/getParseData.html)
function from the `utils` package. The parse data for a function can be
extracted with the following line:
``` r
x <- getParseData (parse (text = deparse (ll)))
```
The object `x` is a `data.frame` of every R token (such as an
expression, symbol, or operator) parsed from the function `ll`. The
following section illustrates how this data can be used to determine the
expected lengths of vector inputs to the function, `ll()`.
<details>
<summary>
click to see details
</summary>
<p>
Input arguments used to define parameter vectors in any R software are
accessed through R’s standard vector access syntax of `vec[i]`, for some
element `i` of a vector `vec`. The parse data for such begins with the
`SYMBOL` of `vec`, the `[`, a `NUM_CONST` for the value of `i`, and a
closing `]`. The following code can be used to extract elements of the
parse data which match this pattern, and ultimately to extract the
various values of `i` used to access members of `vec`.
``` r
vector_length <- function (x, i) {
xn <- x [which (x$token %in% c ("SYMBOL", "NUM_CONST", "'['", "']'")), ]
# split resultant data.frame at first "SYMBOL" entry
xn <- split (xn, cumsum (xn$token == "SYMBOL"))
# reduce to only those matching the above pattern
xn <- xn [which (vapply (xn, function (j)
j$text [1] == i & nrow (j) > 3,
logical (1)))]
ret <- NA_integer_ # default return value
if (length (xn) > 0) {
# get all values of NUM_CONST as integers
n <- vapply (xn, function (j)
as.integer (j$text [j$token == "NUM_CONST"] [1]),
integer (1), USE.NAMES = FALSE)
# and return max of these
ret <- max (n)
}
return (ret)
}
```
That function can then be used to determine the length of any inputs
which are used as hyperparameter vectors:
``` r
ll <- function (p, x, y) dnorm (y - (p[1] + x * p[2]), sd = p[3], log = TRUE)
p <- parse (text = deparse (ll))
x <- utils::getParseData (p)
# extract the names of the parameters:
params <- unique (x$text [x$token == "SYMBOL"])
lens <- vapply (params, function (i) vector_length (x, i), integer (1))
lens
#> y p x
#> NA 3 NA
```
And the vector `p` is used as a hyperparameter vector containing three
parameters. Any initial value vectors can then be examined to ensure
that they have this same length.
------------------------------------------------------------------------
</p>
</details>
<br>
Not all Bayesian Software is designed to accept model inputs expressed
as R code. The [`rstan` package](https://github.com/stan-dev/rstan), for
example, implements its own model specification language, and only
allows distributional parameters to be named, and not addressed by
index. While this largely avoids problems of mismatched lengths of
parameter vectors, the software (at v2.21.1) does not ensure the
existence of named parameters prior to starting the computational
chains. This ultimately results in each chain generating an error when a
model specification refers to a non-existent or undefined distributional
parameter. Such controls should be part of a single pre-processing
stage, and so should only generate a single error.
#### 2.3 Computational Parameters
Computational parameters are considered here distinct from
distributional parameters, and commonly passed to Bayesian functions to
directly control computational processes. They typically include
parameters controlling lengths of runs, lengths of burn-in periods,
numbers of parallel computations, other parameters controlling how
samples are to be generated, or convergence criteria. All Computational
Parameters should be checked for general “sanity” prior to calling
primary computational algorithms. The standards for such sanity checks
include that Bayesian Software should:
- **BS2.6** *Check that values for computational parameters lie within
plausible ranges.*
While admittedly not always possible to define, plausible ranges may be
as simple as ensuring values are greater than zero. Where possible,
checks should nevertheless ensure appropriate responses to extremely
large values, for example by issuing diagnostic messages about likely
long computational times. The following two sub-sections consider
particular cases of computational parameters.
#### 2.4 Parameters Controlling Start Values
Bayesian software generally relies on sequential random sampling
procedures, with each sequence uniquely determined by (among other
aspects) the value at which it is started. Given that, Bayesian software
should:
- **BS2.7** *Enable starting values to be explicitly controlled via
one or more input parameters, including multiple values for software
which implements or enables multiple computational “chains.”*
- **BS2.8** *Enable results of previous runs to be used as starting
points for subsequent runs.*
Bayesian Software which implements or enables multiple computational
chains should:
- **BS2.9** *Ensure each chain is started with a different seed by
default.*
- **BS2.10** *Issue diagnostic messages when identical seeds are
passed to distinct computational chains.*
- **BS2.12** *Software which accepts starting values as a vector
should provide the parameter with a plural name: for example,
“starting\_values” and not “starting\_value”.*
To avoid potential confusion between separate parameters to control
random seeds and starting values, we recommended a single “starting
values” rather than “seeds” argument, with appropriate translation of
these parameters into seeds where necessary.
#### 2.5 Output Verbosity
All Bayesian Software should implement computational parameters to
control output verbosity. Bayesian computations are often
time-consuming, and often performed as batch computations. The following
standards should be adhered to in regard to output verbosity:
- **BS2.13** *Bayesian Software should implement at least one
parameter controlling the verbosity of output, defaulting to verbose
output of all appropriate messages, warnings, errors, and progress
indicators.*
- **BS2.14** *Bayesian Software should enable suppression of messages
and progress indicators, while retaining verbosity of warnings and
errors. This should be tested.*
- **BS2.15** *Bayesian Software should enable suppression of warnings
where appropriate. This should be tested.*
- **BS2.16** *Bayesian Software should explicitly enable errors to be
caught, and appropriately processed either through conversion to
warnings, or otherwise captured in return values. This should be
tested.*
### 3 Pre-processing and Data Transformation
#### 3.1 Missing Values
In additional to the *General Standards* for missing values
(**G2.13**–**2.16**), and in particular **G2.13**, Bayesian Software
should:
- **BS3.0** *Explicitly document assumptions made in regard to missing
values; for example that data is assumed to contain no missing
(`NA`, `Inf`) values, and that such values, or entire rows including
any such values, will be automatically removed from input data.*
#### 3.2 Perfect Collinearity
Where appropriate, Bayesian Software should:
- **BS3.1** *Implement pre-processing routines to diagnose perfect
collinearity, and provide appropriate diagnostic messages or
warnings*
- **BS3.2** *Provide distinct routines for processing perfectly
collinear data, potentially bypassing sampling algorithms*
An appropriate test for **BS3.2** would confirm that `system.time()` or
equivalent timing expressions for perfectly collinear data should be
*less* than equivalent routines called with non-collinear data.
Alternatively, a test could ensure that perfectly collinear data passed
to a function with a stopping criteria generated no results, while
specifying a fixed number of iterations may generate results.
### 4 Analytic Algorithms
As mentioned, analytic algorithms for Bayesian Software are commonly
algorithms to simulate posterior distributions, and to draw samples from
those simulations. Numerous extant R packages implement and offer
sampling algorithms, and not all Bayesian Software will internally
implement sampling algorithms. The following standards apply to packages
which do implement internal sampling algorithms:
- **BS4.0** *Packages should document sampling algorithms (generally
via literary citation, or reference to other software)*
- **BS4.1** *Packages should provide explicit comparisons with
external samplers which demonstrate intended advantage of
implementation (generally via tests, vignettes, or both).*
Regardless of whether or not Bayesian Software implements internal
sampling algorithms, it should:
- **BS4.2** *Implement at least one means to validate posterior
estimates.*
An example of posterior validation is the [Simulation Based
Calibration](https://arxiv.org/abs/1804.06788) approach implemented in
the [`rstan`](https://mc-stan.org/rstan) function
[`sbc`](https://mc-stan.org/rstan/reference/sbc.html)). (Note also that
the [`BayesValidate`
package](https://cran.r-project.org/package=BayesValidate) has not been
updated for almost 15 years, so should not be directly used, although
ideas from that package may be adapted for validation purposes.) Beyond
this, where possible or applicable, Bayesian Software should:
- **BS4.3** *Implement or otherwise offer at least one type of
convergence checker, and provide a documented reference for that
implementation.*
- **BS4.3** *Enable computations to be stopped on convergence
(although not necessarily by default).*
- **BS4.5** *Ensure that appropriate mechanisms are provided for
models which do not converge.*
This is often achieved by having default behaviour to stop after
specified numbers of iterations regardless of convergence.
- **BS4.6** *Implement tests to confirm that results with convergence
checker are statistically equivalent to results from equivalent
fixed number of samples without convergence checking.*
- **BS4.7** *Where convergence checkers are themselves parametrised,
the effects of such parameters should also be tested. For threshold
parameters, for example, lower values should result in longer
sequence lengths.*
### 5 Return Values
Unlike software in many other categories, Bayesian Software should
generally return several kinds of distinct data, both the raw data
derived from statistical algorithms, and associated metadata. Such
distinct and generally disparate forms of data will be generally best
combined into a single object through implementing a defined class
structure, although other options are possible, including (re-)using
extant class structures (see the CRAN Task view on [Bayesian
Inference](https://cran.r-project.org/web/views/Bayesian.html) for
reference to other packages and class systems). Regardless of the
precise form of return object, and whether or not defined class
structures are used or implemented, the following standards apply:
- **BS5.0** *Return values should include starting value(s) or
seed(s), including values for each sequence where multiple sequences
are included*
- **BS5.1** *Return values should include appropriate metadata on
types (or classes) and dimensions of input data*
The latter standard may also include returning a unique hash computed
from the input data, to enable results to be uniquely associated with
that input data. With regard to the input function, or alternative means
of specifying prior distributions:
- **BS5.2** *Bayesian Software should either return the input function
or prior distributional specification in the return object; or
enable direct access to such via additional functions which accept
the return object as single argument.*
Where convergence checkers are implemented or provided:
- **BS5.3** *Bayesian Software should return convergence statistics or
equivalent*
- **BS5.4** *Where multiple checkers are enabled, Bayesian Software
should return details of convergence checker used*
- **BS5.5** *Appropriate diagnostic statistics to indicate absence of
convergence should either be returned or immediately able to be
accessed.*
### 6 Additional Functionality
With regard to additional methods implemented for, or dispatched on,
return objects:
- **BS6.0** *Software should implement a default `print` method for
return objects*
- **BS6.1** *Software should implement a default `plot` method for
return objects*
- **BS6.2** *Software should provide and document straightforward
abilities to plot sequences of posterior samples, with burn-in
periods clearly distinguished*
- **BS6.3** *Software should provide and document straightforward
abilities to plot posterior distributional estimates*
Beyond these points:
- **BS6.4** *Software may provide `summary` methods for return
objects*
- **BS6.5** *Software may provide abilities to plot both sequences of
posterior samples and distributional estimates together in single
graphic*
### 7 Tests
#### 7.1 Parameter Recovery Tests
Bayesian software should implement the following parameter recovery
tests:
- **BS7.0** *Software should demonstrate and confirm recovery of
parametric estimates of a prior distribution*
- **BS7.1** *Software should demonstrate and confirm recovery of a
prior distribution in the absence of any additional data or
information*
- **BS7.2** *Software should demonstrate and confirm recovery of a
expected posterior distribution given a specified prior and some
input data*
#### 7.2 Algorithmic Scaling Tests
- **BS7.3** *Bayesian software should include tests which demonstrate
and confirm the scaling of algorithmic efficiency with sizes of
input data.*
An example of adhering to this standard would be documentation or tests
which demonstrate or confirm that computation times increase
approximately logarithmically with increasing sizes of input data.
#### 7.3 Scaling of Input to Output Data
- **BS7.4** *Bayesian software should implement tests which confirm
that predicted or fitted values are on (approximately) the same
scale as input values.*
- **BS7.4a** *The implications of any assumptions on scales on
input objects should be explicitly tested in this context; for
example that the scales of inputs which do not have means of
zero will not be able to be recovered.*