Bayesian Software Standards

--- title: Bayesian Software Standards tags: statistical-software robots: noindex, nofollow ---  ## Bayesian and Monte Carlo Software Bayesian and Monte Carlo software centres on quantitative estimation of components of [Baye’s theorem](https://en.wikipedia.org/wiki/Bayes%27_theorem), particularly on estimation or application of prior and/or posterior probability distributions. The procedures implemented to estimate the properties of such distributions are commonly based on random sampling procedures, hence referred to as “*Monte Carlo*” routines in reference to the random yet quantifiable nature of casino games. The scope of this category also includes algorithms which focus on sampling routines only, such as Markov-Chain Monte Carlo (MCMC) procedures, independent of application in Bayesian analyses. The term “model” is understood with reference here to Bayesian software to refer to an encoded description of how parameters specifying aspects of one or more prior distributions are transformed into (properties of) one or more posterior distributions. Some examples of Bayesian and Monte Carlo software include: 1. The [`bayestestR` package](https://joss.theoj.org/papers/10.21105/joss.01541) which “provides tools to describe … posterior distributions” 2. The [`ArviZ` package](https://joss.theoj.org/papers/10.21105/joss.01143) python package for exploratory analyses of Bayesian models, particularly posterior distributions. 3. The [`GammaGompertzCR` package](https://joss.theoj.org/papers/10.21105/joss.00216), which features explicit diagnostics of MCMC convergence statistics. 4. The [`BayesianNetwork` package](https://joss.theoj.org/papers/10.21105/joss.00425), which is in many ways a wrapper package primarily serving a `shiny` app, and is also accordingly a package in the EDA category. 5. The [`fmcmc` package](https://joss.theoj.org/papers/10.21105/joss.01427), which is a “classic” MCMC package which directly provides its own implementation, and generates its own convergence statistics. 6. The [`rsimsum` package](https://joss.theoj.org/papers/10.21105/joss.00739) which “summarise\[s\] results from Monte Carlo simulation studies”. Many of the statistics generated by this package are useful for assessing and comparing Bayesian and Monte Carlo software in general. (See also the [`MCMCvis` package](https://joss.theoj.org/papers/10.21105/joss.00640), with more of a focus on visualisation.) 7. The [`walkr` package](https://joss.theoj.org/papers/10.21105/joss.00061) for “MCMC Sampling from Non-Negative Convex Polytopes”. This package is also indicative of the difficulties of deriving generally applicable assessments of software in this category, because MCMC *sampling* relies on fundamentally different inputs and outputs than many other MCMC routines. Click on the following link to view a demonstration [Application of Bayesian and Monte Carlo Standards](https://hackmd.io/zVWTAl9ZQeCcj_bvMGcmMQ). Bayesian and Monte Carlo Software (hereafter referred to for simplicity as “Bayesian Software”) is presumed to perform one or more of the following steps: 1. Document how to specify inputs including: - 1.1 Data - 1.2 Parameters determining prior distributions - 1.3 Parameters determining the computational processes 2. Accept and validate all of forms of input 3. Apply data transformation and pre-processing steps 4. Apply one or more analytic algorithms, generally sampling algorithms used to generate estimates of posterior distributions 5. Return the result of that algorithmic application 6. Offer additional functionality such as printing or summarising return results This chapter details standards for each of these steps, each prefixed with “BS”. ### 1 Documentation of Inputs Prior to actual standards for documentation of inputs, we note one terminological standard for Bayesian software which uses the term “hyperparameter”: - **BS1.0** *Bayesian software which uses the term “hyperparameter” should explicitly clarify the meaning of that term in the context of that software.* This standard reflects the dual facts that this term is frequently used in Bayesian software, yet has no unambiguous definition or interpretation. The term “hyperparameter” is also used in other statistical contexts in ways that are often distinctly different from its common use in Bayesian analyses. Examples of the kinds of clarifications required to adhere to this standard include, > Hyperparameters refer here to parameters determining the form of prior > distributions that conditionally depend on other parameters. Such a clarification would then require further explicit distinction between “parameters” and “hyperparameters”. The remainder of these standards does not refer to “hyperparameters”, rather attempts to make explicit distinctions between different kinds of parameters, such as distributional or algorithmic control parameters. Beyond this standard, Bayesian Software should provide the following documentation of how to specify inputs: - **BS1.1** *Descriptions of how to enter data, both in textual form and via code examples. Both of these should consider the simplest cases of single objects representing independent and dependent data, and potentially more complicated cases of multiple independent data inputs.* - **BS1.2** *Description of how to specify prior distributions, both in textual form describing the general principles of specifying prior distributions, along with more applied descriptions and examples, within:* - **B31.2a** *The main package `README`, either as textual description or example code* - **B31.2b** *At least one package vignette, both as general and applied textual descriptions, and example code* - **B31.2c** *Function-level documentation, preferably with code included in examples* - **BS1.3** *Description of all parameters which control the computational process (typically those determining aspects such as numbers and lengths of sampling processes, seeds used to start them, thinning parameters determining post-hoc sampling from simulated values, and convergence criteria). In particular:* - **BS1.3a** *Bayesian Software should document, both in text and examples, how to use the output of previous simulations as starting points of subsequent simulations.* - **BS1.3b** *Where applicable, Bayesian software should document, both in text and examples, how to use different sampling algorithms for a given model.* - **BS1.4** *For Bayesian Software which implements or otherwise enables convergence checkers, documentation should explicitly describe and provide examples of use with and without convergence checkers.* - **BS1.5** *For Bayesian Software which implements or otherwise enables multiple convergence checkers, differences between these should be explicitly tested.* ### 2 Input Data Structures and Validation This section contains standards primarily intended to ensure that input data, including model specifications, are validated prior to passing through to the main computational algorithms. #### 2.1 Input Data Bayesian Software is commonly designed to accept generic one- or two-dimensional forms such as vector, matrix, or `data.frame` objects, for which the following standard applies. - **BS2.1** *Bayesian Software should implement pre-processing routines to ensure all input data is dimensionally commensurate, for example by ensuring commensurate lengths of vectors or numbers of rows of tabular inputs.* - **BS2.1a** *The effects of such routines should be tested.* #### 2.2 Prior Distributions, Model Specifications, and Distributional Parameters The second set of standards in this section concern specification of prior distributions, model structures, or other equivalent ways of specifying hypothesised relationships among input data structures. R already has a diverse range of Bayesian Software with distinct approaches to this task, commonly either through specifying a model as a character vector representing an R function, or an external file either as R code, or encoded according to some alternative system (such as for [`rstan`](https://mc-stan.org/rstan/)). Bayesian Software should: - **BS2.2** *Ensure that all appropriate validation and pre-processing of distributional parameters are implemented as distinct pre-processing steps prior to submitting to analytic routines, and especially prior to submitting to multiple parallel computational chains.* - **BS2.3** *Ensure that lengths of vectors of distributional parameters are checked, with no excess values silently discarded (unless such output is explicitly suppressed, as detailed below).* - **BS2.4** *Ensure that lengths of vectors of distributional parameters are commensurate with expected model input (see example immediately below)* - **BS2.5** *Where possible, implement pre-processing checks to validate appropriateness of numeric values submitted for distributional parameters; for example, by ensuring that distributional parameters defining second-order moments such as distributional variance or shape parameters, or any parameters which are logarithmically transformed, are non-negative.* The following example demonstrates how standards like the above (BS2.4-2.5) might be addressed. Consider the following function which defines a log-likelihood estimator for a linear regression, controlled via a vector of three distributional parameters, `p`: ``` r ll <- function (x, y, p) dnorm (y - (p[1] + x * p[2]), sd = p[3], log = TRUE) ``` Pre-processing stages should be used to determine: 1. That the dimensions of the input data, `x` and `y`, are commensurate (BS2.1); non-commensurate inputs should error by default. 2. The length of the vector `p` (BS2.3) The latter task is not necessarily straightforward, because the definition of the function, `ll()`, will itself generally be part of the input to an actual Bayesian Software function. This functional input thus needs to be examined to determine expected lengths of hyperparameter vectors. The following code illustrates one way to achieve this, relying on utilities for parsing function calls in R, primarily through the [`getParseData`](https://stat.ethz.ch/R-manual/R-devel/library/utils/html/getParseData.html) function from the `utils` package. The parse data for a function can be extracted with the following line: ``` r x <- getParseData (parse (text = deparse (ll))) ``` The object `x` is a `data.frame` of every R token (such as an expression, symbol, or operator) parsed from the function `ll`. The following section illustrates how this data can be used to determine the expected lengths of vector inputs to the function, `ll()`. <details> <summary> click to see details </summary> <p> Input arguments used to define parameter vectors in any R software are accessed through R’s standard vector access syntax of `vec[i]`, for some element `i` of a vector `vec`. The parse data for such begins with the `SYMBOL` of `vec`, the `[`, a `NUM_CONST` for the value of `i`, and a closing `]`. The following code can be used to extract elements of the parse data which match this pattern, and ultimately to extract the various values of `i` used to access members of `vec`. ``` r vector_length <- function (x, i) { xn <- x [which (x$token %in% c ("SYMBOL", "NUM_CONST", "'['", "']'")), ] # split resultant data.frame at first "SYMBOL" entry xn <- split (xn, cumsum (xn$token == "SYMBOL")) # reduce to only those matching the above pattern xn <- xn [which (vapply (xn, function (j) j$text [1] == i & nrow (j) > 3, logical (1)))] ret <- NA_integer_ # default return value if (length (xn) > 0) { # get all values of NUM_CONST as integers n <- vapply (xn, function (j) as.integer (j$text [j$token == "NUM_CONST"] [1]), integer (1), USE.NAMES = FALSE) # and return max of these ret <- max (n) } return (ret) } ``` That function can then be used to determine the length of any inputs which are used as hyperparameter vectors: ``` r ll <- function (p, x, y) dnorm (y - (p[1] + x * p[2]), sd = p[3], log = TRUE) p <- parse (text = deparse (ll)) x <- utils::getParseData (p) # extract the names of the parameters: params <- unique (x$text [x$token == "SYMBOL"]) lens <- vapply (params, function (i) vector_length (x, i), integer (1)) lens #> y p x #> NA 3 NA ``` And the vector `p` is used as a hyperparameter vector containing three parameters. Any initial value vectors can then be examined to ensure that they have this same length. ------------------------------------------------------------------------ </p> </details> <br> Not all Bayesian Software is designed to accept model inputs expressed as R code. The [`rstan` package](https://github.com/stan-dev/rstan), for example, implements its own model specification language, and only allows distributional parameters to be named, and not addressed by index. While this largely avoids problems of mismatched lengths of parameter vectors, the software (at v2.21.1) does not ensure the existence of named parameters prior to starting the computational chains. This ultimately results in each chain generating an error when a model specification refers to a non-existent or undefined distributional parameter. Such controls should be part of a single pre-processing stage, and so should only generate a single error. #### 2.3 Computational Parameters Computational parameters are considered here distinct from distributional parameters, and commonly passed to Bayesian functions to directly control computational processes. They typically include parameters controlling lengths of runs, lengths of burn-in periods, numbers of parallel computations, other parameters controlling how samples are to be generated, or convergence criteria. All Computational Parameters should be checked for general “sanity” prior to calling primary computational algorithms. The standards for such sanity checks include that Bayesian Software should: - **BS2.6** *Check that values for computational parameters lie within plausible ranges.* While admittedly not always possible to define, plausible ranges may be as simple as ensuring values are greater than zero. Where possible, checks should nevertheless ensure appropriate responses to extremely large values, for example by issuing diagnostic messages about likely long computational times. The following two sub-sections consider particular cases of computational parameters. #### 2.4 Parameters Controlling Start Values Bayesian software generally relies on sequential random sampling procedures, with each sequence uniquely determined by (among other aspects) the value at which it is started. Given that, Bayesian software should: - **BS2.7** *Enable starting values to be explicitly controlled via one or more input parameters, including multiple values for software which implements or enables multiple computational “chains.”* - **BS2.8** *Enable results of previous runs to be used as starting points for subsequent runs.* Bayesian Software which implements or enables multiple computational chains should: - **BS2.9** *Ensure each chain is started with a different seed by default.* - **BS2.10** *Issue diagnostic messages when identical seeds are passed to distinct computational chains.* - **BS2.12** *Software which accepts starting values as a vector should provide the parameter with a plural name: for example, “starting\_values” and not “starting\_value”.* To avoid potential confusion between separate parameters to control random seeds and starting values, we recommended a single “starting values” rather than “seeds” argument, with appropriate translation of these parameters into seeds where necessary. #### 2.5 Output Verbosity All Bayesian Software should implement computational parameters to control output verbosity. Bayesian computations are often time-consuming, and often performed as batch computations. The following standards should be adhered to in regard to output verbosity: - **BS2.13** *Bayesian Software should implement at least one parameter controlling the verbosity of output, defaulting to verbose output of all appropriate messages, warnings, errors, and progress indicators.* - **BS2.14** *Bayesian Software should enable suppression of messages and progress indicators, while retaining verbosity of warnings and errors. This should be tested.* - **BS2.15** *Bayesian Software should enable suppression of warnings where appropriate. This should be tested.* - **BS2.16** *Bayesian Software should explicitly enable errors to be caught, and appropriately processed either through conversion to warnings, or otherwise captured in return values. This should be tested.* ### 3 Pre-processing and Data Transformation #### 3.1 Missing Values In additional to the *General Standards* for missing values (**G2.13**–**2.16**), and in particular **G2.13**, Bayesian Software should: - **BS3.0** *Explicitly document assumptions made in regard to missing values; for example that data is assumed to contain no missing (`NA`, `Inf`) values, and that such values, or entire rows including any such values, will be automatically removed from input data.* #### 3.2 Perfect Collinearity Where appropriate, Bayesian Software should: - **BS3.1** *Implement pre-processing routines to diagnose perfect collinearity, and provide appropriate diagnostic messages or warnings* - **BS3.2** *Provide distinct routines for processing perfectly collinear data, potentially bypassing sampling algorithms* An appropriate test for **BS3.2** would confirm that `system.time()` or equivalent timing expressions for perfectly collinear data should be *less* than equivalent routines called with non-collinear data. Alternatively, a test could ensure that perfectly collinear data passed to a function with a stopping criteria generated no results, while specifying a fixed number of iterations may generate results. ### 4 Analytic Algorithms As mentioned, analytic algorithms for Bayesian Software are commonly algorithms to simulate posterior distributions, and to draw samples from those simulations. Numerous extant R packages implement and offer sampling algorithms, and not all Bayesian Software will internally implement sampling algorithms. The following standards apply to packages which do implement internal sampling algorithms: - **BS4.0** *Packages should document sampling algorithms (generally via literary citation, or reference to other software)* - **BS4.1** *Packages should provide explicit comparisons with external samplers which demonstrate intended advantage of implementation (generally via tests, vignettes, or both).* Regardless of whether or not Bayesian Software implements internal sampling algorithms, it should: - **BS4.2** *Implement at least one means to validate posterior estimates.* An example of posterior validation is the [Simulation Based Calibration](https://arxiv.org/abs/1804.06788) approach implemented in the [`rstan`](https://mc-stan.org/rstan) function [`sbc`](https://mc-stan.org/rstan/reference/sbc.html)). (Note also that the [`BayesValidate` package](https://cran.r-project.org/package=BayesValidate) has not been updated for almost 15 years, so should not be directly used, although ideas from that package may be adapted for validation purposes.) Beyond this, where possible or applicable, Bayesian Software should: - **BS4.3** *Implement or otherwise offer at least one type of convergence checker, and provide a documented reference for that implementation.* - **BS4.3** *Enable computations to be stopped on convergence (although not necessarily by default).* - **BS4.5** *Ensure that appropriate mechanisms are provided for models which do not converge.* This is often achieved by having default behaviour to stop after specified numbers of iterations regardless of convergence. - **BS4.6** *Implement tests to confirm that results with convergence checker are statistically equivalent to results from equivalent fixed number of samples without convergence checking.* - **BS4.7** *Where convergence checkers are themselves parametrised, the effects of such parameters should also be tested. For threshold parameters, for example, lower values should result in longer sequence lengths.* ### 5 Return Values Unlike software in many other categories, Bayesian Software should generally return several kinds of distinct data, both the raw data derived from statistical algorithms, and associated metadata. Such distinct and generally disparate forms of data will be generally best combined into a single object through implementing a defined class structure, although other options are possible, including (re-)using extant class structures (see the CRAN Task view on [Bayesian Inference](https://cran.r-project.org/web/views/Bayesian.html) for reference to other packages and class systems). Regardless of the precise form of return object, and whether or not defined class structures are used or implemented, the following standards apply: - **BS5.0** *Return values should include starting value(s) or seed(s), including values for each sequence where multiple sequences are included* - **BS5.1** *Return values should include appropriate metadata on types (or classes) and dimensions of input data* The latter standard may also include returning a unique hash computed from the input data, to enable results to be uniquely associated with that input data. With regard to the input function, or alternative means of specifying prior distributions: - **BS5.2** *Bayesian Software should either return the input function or prior distributional specification in the return object; or enable direct access to such via additional functions which accept the return object as single argument.* Where convergence checkers are implemented or provided: - **BS5.3** *Bayesian Software should return convergence statistics or equivalent* - **BS5.4** *Where multiple checkers are enabled, Bayesian Software should return details of convergence checker used* - **BS5.5** *Appropriate diagnostic statistics to indicate absence of convergence should either be returned or immediately able to be accessed.* ### 6 Additional Functionality With regard to additional methods implemented for, or dispatched on, return objects: - **BS6.0** *Software should implement a default `print` method for return objects* - **BS6.1** *Software should implement a default `plot` method for return objects* - **BS6.2** *Software should provide and document straightforward abilities to plot sequences of posterior samples, with burn-in periods clearly distinguished* - **BS6.3** *Software should provide and document straightforward abilities to plot posterior distributional estimates* Beyond these points: - **BS6.4** *Software may provide `summary` methods for return objects* - **BS6.5** *Software may provide abilities to plot both sequences of posterior samples and distributional estimates together in single graphic* ### 7 Tests #### 7.1 Parameter Recovery Tests Bayesian software should implement the following parameter recovery tests: - **BS7.0** *Software should demonstrate and confirm recovery of parametric estimates of a prior distribution* - **BS7.1** *Software should demonstrate and confirm recovery of a prior distribution in the absence of any additional data or information* - **BS7.2** *Software should demonstrate and confirm recovery of a expected posterior distribution given a specified prior and some input data* #### 7.2 Algorithmic Scaling Tests - **BS7.3** *Bayesian software should include tests which demonstrate and confirm the scaling of algorithmic efficiency with sizes of input data.* An example of adhering to this standard would be documentation or tests which demonstrate or confirm that computation times increase approximately logarithmically with increasing sizes of input data. #### 7.3 Scaling of Input to Output Data - **BS7.4** *Bayesian software should implement tests which confirm that predicted or fitted values are on (approximately) the same scale as input values.* - **BS7.4a** *The implications of any assumptions on scales on input objects should be explicitly tested in this context; for example that the scales of inputs which do not have means of zero will not be able to be recovered.*