---
title: Time Series Software Standards
tags: statistical-software
robots: noindex, nofollow
---
<!-- Edit the .Rmd not the .md file -->
## Time Series Software
The category of Time Series software is arguably easier to define than
the preceding categories, and represents any software whose primary
input is intended to be temporally structured data. Importantly,
while “*temporally structured*” may often imply temporally ordered, this
need not necessarily be the case. The primary definition of temporally
structured data is that they possess some kind of index which can be
used to extract temporal relationships.
Time series software is presumed to perform one or more of the following
steps:
1. Accept and validate input data
2. Apply data transformation and pre-processing steps
3. Apply one or more analytic algorithms
4. Return the result of that algorithmic application
5. Offer additional functionality such as printing or summarising
return results
This document details standards for each of these steps, with each
standard identified by the prefix “TS”.
### 1 Input data structures and validation
Input validation is an important software task, and an important part of
our standards. While there are many ways to approach validation, the
class systems of R offer a particularly convenient and effective means.
For Time Series Software in particular, a range of class systems has
been developed, for which we refer to the section “Time Series Classes”
in the CRAN Task View on [Time Series
Analysis](https://cran.r-project.org/web/views/TimeSeries.html), and
the class-conversion package [`tsbox`](https://www.tsbox.help/).
Software which uses and relies on defined classes can often validate
input through affirming appropriate class(es). Software which does not
use or rely on class systems will generally need specific routines to
validate input data structures. In particular, because of the long
history of time series software in R, and the variety of class systems
for representing time series data, new time series packages should
accept as many different classes of input as possible by adhering to
the following standards:
- **TS1.0** *Time Series Software should use and rely on explicit
class systems developed for representing time series data, and
should not permit generic, non-time-series input*
The core algorithms of time-series software are often ultimately applied
to simple vector objects, and some time series software accepts simple
vector inputs, assuming these to represent temporally sequential data.
Permitting such generic inputs nevertheless prevents any such
assumptions from being asserted or tested. Missing values pose
particular problems in this regard. A simple `na.omit()` call or similar
will shorten the length of the vector by removing any `NA` values, and
will change the explicit temporal relationship between elements. The use
of explicit classes for time series generally ensures an ability to
explicitly assert properties such as strict temporal regularity, and to
control for any deviation from expected properties.
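As a minimal illustration of the point about `na.omit()`, the following base-R sketch contrasts a plain numeric vector with the same (made-up) values stored as a regular `ts` object. Only functions from the `stats` package are used; the variable names are arbitrary.

```r
# A plain numeric vector: time is encoded only by position.
x <- c(0.71, NA, 0.28, 0.34, 0.07)

# na.omit() shortens the vector; the value observed at position 3 now sits at
# position 2, and nothing in the result records that shift.
x_clean <- as.vector(stats::na.omit(x))
length(x_clean)   # 4
x_clean[2]        # 0.28

# The same values as a regular `ts` object carry an explicit time index ...
x_ts <- stats::ts(x, start = 1, frequency = 1)
stats::time(x_ts)

# ... and na.omit() refuses to drop the interior NA, because doing so would
# break the regular spacing of that index.
res <- tryCatch(stats::na.omit(x_ts), error = function(e) e)
inherits(res, "error")   # TRUE
```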
- **TS1.1** *Time Series Software should explicitly document the types
and classes of input data able to be passed to each function.*
- **TS1.2** *Time Series Software should accept input data in as many
time series specific classes as possible.*
- **TS1.3** *Time Series Software should implement validation routines
to confirm that inputs are of acceptable classes (or represented in
otherwise appropriate ways for software which does not use class
systems).*
- **TS1.4** *Time Series Software should implement a single
pre-processing routine to validate input data, and to appropriately
transform it to a single uniform type to be passed to all subsequent
data-processing functions (the [`tsbox`
package](https://www.tsbox.help/) provides one convenient approach
for this).*
- **TS1.5** *The pre-processing function described above should
maintain all time- or date-based components or attributes of input
data.*
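One way **TS1.3** to **TS1.5** might be realised is sketched below. `preprocess_ts()` is a hypothetical function, and its internal format (a `data.frame` with `time` and `value` columns) is an assumption made for illustration; packages following **TS1.4** might instead rely on `tsbox` for the conversion step. The sketch uses base R only so that it runs without additional dependencies.

```r
# Hypothetical single pre-processing routine (TS1.3-TS1.5). The internal format
# chosen here, a data.frame with `time` and `value` columns, is an assumption.
preprocess_ts <- function(x) {
  if (stats::is.ts(x)) {
    if (NCOL(x) != 1L) stop("this sketch only handles univariate 'ts' input")
    out <- data.frame(time = as.numeric(stats::time(x)), value = as.numeric(x))
    # Keep start, end and frequency so results can later be re-converted.
    attr(out, "input_tsp") <- stats::tsp(x)
  } else if (is.data.frame(x) && all(c("time", "value") %in% names(x))) {
    if (!inherits(x$time, c("Date", "POSIXct")))
      stop("'time' must be of class Date or POSIXct")
    # Date/POSIXct times are carried over unchanged (TS1.5).
    out <- data.frame(time = x$time, value = as.numeric(x$value))
  } else {
    # TS1.0: reject generic input such as bare numeric vectors.
    stop("input must be a 'ts' object or a data.frame with 'time' and 'value' columns")
  }
  attr(out, "input_class") <- class(x)
  out
}

# Usage: a monthly series keeps its calendar information.
y <- preprocess_ts(stats::ts(c(3, 1, 4, 1, 5), start = c(2020, 1), frequency = 12))
attr(y, "input_tsp")
```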
For Time Series Software which relies on or implements custom classes or
types for representing time-series data, the following standards should
be adhered to:
- **TS1.6** *The software should ensure strict ordering of the time,
frequency, or equivalent ordering index variable.*
- **TS1.7** *Any violations of ordering should be caught in the
pre-processing stages of all functions.*
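A possible check for **TS1.6** and **TS1.7**, written as a hypothetical helper `assert_strictly_ordered()` that a pre-processing routine could call before any analysis. It treats duplicated times as violations, which is an assumption about what “strict” ordering should mean.

```r
# Hypothetical validator for TS1.6 and TS1.7. Works for numeric, Date and
# POSIXct indices, since all can be converted to numbers preserving order.
assert_strictly_ordered <- function(index) {
  if (anyNA(index)) stop("time index contains missing values")
  steps <- diff(as.numeric(index))
  if (any(steps <= 0)) {
    # Report the first offending position: a zero step is a duplicated time,
    # a negative step is an out-of-order time.
    pos <- which(steps <= 0)[1] + 1L
    stop(sprintf("time index is not strictly increasing at position %d", pos))
  }
  invisible(TRUE)
}

assert_strictly_ordered(as.Date("2024-01-01") + 0:3)   # passes silently
```

Silently sorting the input would be an alternative, but **TS1.7** asks for violations to be caught, so the sketch stops with an error naming the first offending position.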
#### 1.1 Time Intervals and Relative Time
While most common packages and classes for time series data assume
*absolute* temporal scales such as those represented in [`POSIX`
classes](https://stat.ethz.ch/R-manual/R-devel/library/base/html/as.POSIXlt.html)
for dates or times, time series may also be quantified on *relative*
scales where the temporal index variable quantifies intervals rather
than absolute times or dates. Many analytic routines which accept time
series inputs in absolute form are also appropriately applied to
analogous data in relative form, and thus many packages should accept
time series inputs both in absolute and relative forms. Software which
can or should accept time series inputs in relative form should:
- **TS1.8** *Accept inputs defined via the [`units`
package](https://github.com/r-quantities/units/) for attributing SI
units to R vectors.*
- **TS1.9** *Where time intervals or periods may be days or months, be
explicit about the system used to represent such, particularly
regarding whether a calendar system is used, or whether a year is
presumed to have 365 days, 365.2422 days, or some other value.*
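The sensitivity that **TS1.9** guards against can be shown with base R alone. The sketch below uses `difftime` rather than the `units` package recommended in **TS1.8**, purely to avoid a dependency; the three year lengths are common conventions (a fixed 365-day year, the Julian year of 365.25 days, and the approximate tropical year of 365.2422 days).

```r
# An interval of 1000 days, stored with explicit units as a base-R difftime.
interval <- as.difftime(1000, units = "days")
n_days <- as.numeric(interval, units = "days")

# Expressing that interval in years depends on the assumed length of a year,
# which is exactly what TS1.9 asks software to document.
days_per_year <- c(fixed = 365, julian = 365.25, tropical = 365.2422)
years <- n_days / days_per_year
round(years, 4)

# Calendar months are not of fixed length: January 2020 has 31 days and
# February 2020 (a leap year) has 29.
month_starts <- seq(as.Date("2020-01-01"), by = "month", length.out = 3)
as.numeric(diff(month_starts))   # 31 29
```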
### 2 Pre-processing and Variable Transformation
#### 2.1 Missing Data
One critical pre-processing step for Time Series Software is the
appropriate handling of missing data. It is convenient to distinguish
between *implicit* and *explicit* missing data. For regular time series,
explicit missing data may be represented by `NA` values, while for
irregular time series, implicit missing data may be represented by
missing rows. The difference is demonstrated in the following table.
<table>
<caption>
Missing Values
</caption>
<thead>
<tr class="header">
<th style="text-align: left;">
Time
</th>
<th style="text-align: left;">
Value
</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: left;">
08:43
</td>
<td style="text-align: left;">
0.71
</td>
</tr>
<tr class="even">
<td style="text-align: left;">
08:44
</td>
<td style="text-align: left;">
NA
</td>
</tr>
<tr class="odd">
<td style="text-align: left;">
08:45
</td>
<td style="text-align: left;">
0.28
</td>
</tr>
<tr class="even">
<td style="text-align: left;">
08:47
</td>
<td style="text-align: left;">
0.34
</td>
</tr>
<tr class="odd">
<td style="text-align: left;">
08:48
</td>
<td style="text-align: left;">
0.07
</td>
</tr>
</tbody>
</table>
The value for 08:46 is *implicitly missing*, while the value for 08:44
is *explicitly missing*. These two forms of missingness may connote
different things, and may require different forms of pre-processing.
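Implicit gaps can be found by comparing the observed times against the regular grid the data are expected to follow. The following base-R sketch does this for the table above, assuming a one-minute spacing and attaching an arbitrary date to the times, and then converts the implicit gap into an explicit `NA` row.

```r
# The values from the table above, placed on an (assumed) arbitrary date.
obs <- data.frame(
  time  = as.POSIXct(paste("2024-01-01", c("08:43", "08:44", "08:45", "08:47", "08:48")),
                     format = "%Y-%m-%d %H:%M", tz = "UTC"),
  value = c(0.71, NA, 0.28, 0.34, 0.07)
)

# Explicit missingness: the row exists but its value is NA.
explicit <- obs$time[is.na(obs$value)]

# Implicit missingness: a time on the expected one-minute grid has no row.
grid <- seq(min(obs$time), max(obs$time), by = "1 min")
implicit <- grid[!(as.numeric(grid) %in% as.numeric(obs$time))]

format(explicit, "%H:%M")   # "08:44"
format(implicit, "%H:%M")   # "08:46"

# Making implicit gaps explicit: expand onto the full grid, so that every
# missing time becomes a row holding NA.
full <- data.frame(time = grid, value = NA_real_)
full$value[match(as.numeric(obs$time), as.numeric(grid))] <- obs$value
```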
With this in mind, and beyond the [*General
Standards*](#general-standards) for missing data (**G2.13**–**G2.16**),
the following standards apply:
- **TS2.0** *Time Series Software which presumes or requires regular
data should only allow **explicit** missing values, and should issue
appropriate diagnostic messages, potentially including errors, in
response to any **implicit** missing values.*
- **TS2.1** *Where possible, all functions should provide options for
users to specify how to handle missing data, with options minimally
including:*
    - **TS2.1a** *error on missing data; or*
    - **TS2.1b** *warn or ignore missing data, and proceed to analyse
      irregular data, ensuring that results from function calls with
      regular yet missing data return identical values to submitting
      equivalent irregular data with no missing values; or*
    - **TS2.1c** *replace missing data with appropriately imputed
      values.*
This latter standard is a modified version of *General Standard*
**G2.14**, with additional requirements via **TS2.1b**.
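A sketch of how a single argument might expose the three options of **TS2.1**. `handle_missing()` and its `na_action` argument are hypothetical, the input is assumed to be a `data.frame` with numeric `time` and `value` columns, and linear interpolation via `stats::approx()` is just one possible imputation method. The final lines check the **TS2.1b** requirement using a simple regression slope as the downstream analysis.

```r
# Hypothetical helper: one `na_action` argument selects the TS2.1 behaviour.
handle_missing <- function(d, na_action = c("error", "omit", "impute")) {
  na_action <- match.arg(na_action)
  miss <- is.na(d$value)
  if (!any(miss)) return(d)
  switch(na_action,
    # TS2.1a: stop with an informative message.
    error = stop(sprintf("%d missing value(s) found", sum(miss))),
    # TS2.1b: warn, then continue with the remaining (now irregular) data.
    omit = {
      warning(sprintf("dropping %d missing value(s)", sum(miss)))
      d[!miss, , drop = FALSE]
    },
    # TS2.1c: linear interpolation over time.
    impute = {
      tm <- as.numeric(d$time)
      d$value <- stats::approx(tm[!miss], d$value[!miss], xout = tm)$y
      d
    }
  )
}

# TS2.1b check: an analysis of the "omit" result should match the same
# analysis applied to equivalent irregular data without missing values.
slope <- function(d) unname(stats::coef(stats::lm(value ~ time, data = d))[2])
d <- data.frame(time = c(1, 2, 3, 5, 6), value = c(0.71, NA, 0.28, 0.34, 0.07))
irregular <- data.frame(time = c(1, 3, 5, 6), value = c(0.71, 0.28, 0.34, 0.07))
all.equal(slope(suppressWarnings(handle_missing(d, "omit"))), slope(irregular))  # TRUE
```

Interpolation leaves any leading or trailing `NA` values in place, since `approx()` does not extrapolate by default.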
#### 2.2 Stationarity
Time Series Software should explicitly document assumptions or
requirements made with respect to the stationarity or otherwise of all
input data. In particular, any (sub-)functions which assume or rely on
stationarity should:
- **TS2.2** *Consider stationarity of all relevant moments, typically
first (mean) and second (variance) order, or otherwise document why
such consideration may be restricted to lower orders only.*
- **TS2.3** *Explicitly document all assumptions and/or requirements
of stationarity.*
- **TS2.4** *Implement appropriate checks for all relevant forms of
stationarity, and either:*
    - **TS2.4a** *issue diagnostic messages or warnings; or*
    - **TS2.4b** *enable or advise on appropriate transformations to
      ensure stationarity.*
The two options in the last point (TS2.4b) respectively translate to
*enabling* transformations to ensure stationarity by providing
appropriate routines, generally triggered by some function parameter, or
*advising* on appropriate transformations, for example by directing
users to additional functions able to implement appropriate
transformations.
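The following sketch illustrates **TS2.2** to **TS2.4** with a deliberately crude check that compares the mean and variance of the two halves of a series. `check_stationarity()`, `prepare_series()`, the `difference` argument and both thresholds are hypothetical placeholders, not recommended values; the warning message covers the *advising* option, while `difference = TRUE` covers the *enabling* option.

```r
# Hypothetical, deliberately crude check (not a formal test): compare the mean
# (first order) and variance (second order) of the two halves of a series.
check_stationarity <- function(x, mean_tol = 0.5, var_ratio_max = 2) {
  x <- as.numeric(x)
  first <- seq_len(length(x) %/% 2)
  a <- x[first]
  b <- x[-first]
  mean_shift <- abs(mean(a) - mean(b)) / stats::sd(x)
  var_ratio <- stats::var(b) / stats::var(a)
  c(mean = mean_shift < mean_tol,
    variance = var_ratio < var_ratio_max && var_ratio > 1 / var_ratio_max)
}

# TS2.4a: warn and advise; TS2.4b: enable differencing via an argument.
prepare_series <- function(x, difference = FALSE) {
  if (difference) x <- diff(x)
  ok <- check_stationarity(x)
  if (!all(ok)) {
    warning("possible non-stationarity in: ", paste(names(ok)[!ok], collapse = ", "),
            "; consider `difference = TRUE`")
  }
  x
}

x <- 1:100 + sin(1:100)       # trending series
check_stationarity(x)         # mean FALSE, variance TRUE
check_stationarity(diff(x))   # both TRUE
```

A real implementation would more likely rely on a formal test, for example the Phillips-Perron unit-root test available in base R as `stats::PP.test()`.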
#### 2.3 Covariance Matrices
Where covariance matrices are constructed or otherwise used within or as
input to functions, they should:
- **TS2.5** *Incorporate a system to ensure that both row and column
orders follow the same ordering as the underlying time series data.
This may, for example, be done by including the `index` attribute of
the time series data as an attribute of the covariance matrix.*
- **TS2.6** *Where applicable, covariance matrices should also include
specification of appropriate units.*
*General Standard* **G3.1** also applies to all Time Series Software
which constructs or uses covariance matrices.
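A minimal sketch of **TS2.5** for a temporal covariance matrix, here the autocovariance of a stationary AR(1) process evaluated at five daily observation times. The parameter values are assumptions; the point is that the time index travels with the matrix, both as the `index` attribute suggested above and as `dimnames`.

```r
# Observation times of a short regular series (assumed for illustration).
times <- as.Date("2024-01-01") + 0:4
phi <- 0.6      # AR(1) coefficient (assumed)
sigma2 <- 1     # innovation variance (assumed)

# Stationary AR(1) autocovariance: Cov(x_i, x_j) = sigma2 * phi^|i - j| / (1 - phi^2).
lags <- abs(outer(seq_along(times), seq_along(times), "-"))
Sigma <- sigma2 * phi^lags / (1 - phi^2)

# TS2.5: tie row and column order to the time index in two ways.
attr(Sigma, "index") <- times
dimnames(Sigma) <- list(format(times), format(times))

# Subsetting with `[` drops custom attributes such as "index" but keeps
# dimnames, so the ordering stays traceable after reordering.
sub <- Sigma[3:1, 3:1]
rownames(sub)   # "2024-01-03" "2024-01-02" "2024-01-01"
```

Storing the index in `dimnames` as well is a design choice made because, unlike custom attributes, `dimnames` survive ordinary subsetting.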
### 3 Analytic Algorithms
Analytic algorithms are considered here to reflect the core analytic
components of Time Series Software. These may be many and varied, and we
explicitly consider only a small subset here.
#### 3.1 Forecasting
Statistical software which implements forecasting routines should:
- **TS3.0** *Provide tests to demonstrate at least one case in which
errors widen appropriately with forecast horizon.*
- **TS3.1** *If possible, provide at least one test which violates
TS3.0.*
- **TS3.2** *Document the general drivers of forecast errors or
horizons, as demonstrated via the particular cases of TS3.0 and
TS3.1.*
- **TS3.3** *Either:*
    - **TS3.3a** *Document, preferably via an example, how to trim
      forecast values based on a specified error margin or equivalent;
      or*
    - **TS3.3b** *Provide an explicit mechanism to trim forecast
      values to a specified error margin, either via an explicit
      post-processing function, or via an input parameter to a primary
      analytic function.*
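Tests for **TS3.0** and **TS3.1**, together with a trimming helper for **TS3.3b**, might look like the sketch below, which fits models with `stats::arima()` to the built-in `lh` series. For a stationary AR(1) model the forecast standard errors grow with horizon towards the marginal standard deviation, whereas for a white-noise model they stay constant. `trim_forecast()` and the `max_se` threshold are hypothetical.

```r
# AR(1) model for the built-in `lh` series (datasets package).
fit <- stats::arima(datasets::lh, order = c(1, 0, 0))
fc <- stats::predict(fit, n.ahead = 12)   # list with `pred` and `se`
se <- as.numeric(fc$se)

# TS3.0: forecast standard errors should not shrink as the horizon grows.
stopifnot(all(diff(se) >= 0))

# TS3.1: a white-noise model is a case where they do not widen at all.
se0 <- as.numeric(stats::predict(stats::arima(datasets::lh, order = c(0, 0, 0)),
                                 n.ahead = 12)$se)
stopifnot(max(abs(diff(se0))) < 1e-8)

# TS3.3b: hypothetical post-processing helper that keeps only horizons whose
# standard error is within `max_se`.
trim_forecast <- function(fc, max_se) {
  keep <- as.numeric(fc$se) <= max_se
  data.frame(horizon = which(keep),
             pred = as.numeric(fc$pred)[keep],
             se = as.numeric(fc$se)[keep])
}
trimmed <- trim_forecast(fc, max_se = 1.2 * se[1])
```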
### 4 Return Results
For (functions within) Time Series Software which return time series
data:
- **TS4.0** *Return values should either:*
    - **TS4.0a** *Be in the same class as input data, for example by
      using the [`tsbox` package](https://www.tsbox.help/) to re-convert
      from the standard internal format (see **TS1.4**, above); or*
    - **TS4.0b** *Be in a unique, preferably class-defined, format.*
- **TS4.1** *Any units included as attributes of input data should
also be included within return values.*
- **TS4.2** *The type and class of all return values should be
explicitly documented.*
For (functions within) Time Series Software which return data other than
direct series:
- **TS4.3** *Return values should explicitly include all appropriate
units and/or time scales.*
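A sketch of **TS4.0a** and **TS4.1** for univariate `ts` input: the hypothetical `smooth_series()` works on a bare numeric vector internally and then copies all attributes of the input back onto the result. A plain `"units"` attribute stands in here for a proper `units` object, an assumption made to keep the example dependency-free.

```r
# Hypothetical analytic function: a centred three-point moving average.
# It computes on a plain numeric vector, then restores every attribute of the
# input (class, `tsp`, and the illustrative "units" attribute) on the way out.
smooth_series <- function(x) {
  stopifnot(stats::is.ts(x), NCOL(x) == 1)
  values <- as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 2))
  attributes(values) <- attributes(x)   # TS4.0a and TS4.1
  values
}

x <- stats::ts(c(2, 4, 6, 8, 10), start = c(2024, 1), frequency = 4)
attr(x, "units") <- "degC"   # plain attribute standing in for real units
y <- smooth_series(x)
class(y)           # "ts"
attr(y, "units")   # "degC"
```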
#### 4.1 Data Transformation
Time Series Software which internally implements routines for
transforming data to achieve stationarity and which returns forecast
values should:
- **TS4.4** *Document the effect of any such transformations on
forecast data, including potential effects on both first- and
second-order estimates.*
- **TS4.5** *In decreasing order of preference, do one of the following:*
    - **TS4.5a** *Provide explicit routines or options to
      back-transform data commensurate with original, non-stationary
      input data.*
    - **TS4.5b** *Demonstrate how data may be back-transformed to a
      form commensurate with original, non-stationary input data.*
    - **TS4.5c** *Document associated limitations on forecast values.*
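The sketch below illustrates **TS4.5a** for one transformation, first differencing, which `stats::diffinv()` inverts exactly given the initial value. The mean-difference forecast is only a placeholder to show the back-transformation step. The last lines relate to **TS4.4**: after a log transformation, exponentiating a forecast mean yields the median rather than the mean on the original scale, assuming normally distributed log values. `back_transform_log()` is a hypothetical helper.

```r
x <- c(10, 12, 15, 19, 24)              # observed, trending series (made up)

# Transformation towards stationarity: first differences.
d <- diff(x)                            # 2 3 4 5

# TS4.5a: differencing is exactly invertible given the first observation.
x_back <- stats::diffinv(d, xi = x[1])
all.equal(as.numeric(x_back), x)        # TRUE

# Forecast on the differenced scale (here, naively, the mean difference) and
# back-transform by cumulating from the last observed value.
h <- 3
d_fc <- rep(mean(d), h)                 # 3.5 3.5 3.5
x_fc <- x[length(x)] + cumsum(d_fc)     # 27.5 31.0 34.5

# TS4.4: if log(x) ~ Normal(mu, s^2), then exp(mu) is the median of x,
# while its mean is exp(mu + s^2 / 2).
back_transform_log <- function(mu, s) c(median = exp(mu), mean = exp(mu + s^2 / 2))
back_transform_log(mu = log(100), s = 0.5)
```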
#### 4.2 Forecasting
Where Time Series Software implements or otherwise enables forecasting
abilities, it should return one of the following three kinds of
information. These are presented in decreasing order of preference, such
that software should strive to return the first kind of object, failing
that the second, and only the third as a last resort.
- **TS4.6** *Time Series Software which implements or otherwise
enables forecasting should return one of:*
    - **TS4.6a** *A distribution object, for example via one of the
      many packages described in the CRAN Task View on [Probability
      Distributions](https://cran.r-project.org/web/views/Distributions.html)
      (or the new [`distributional`
      package](https://pkg.mitchelloharawild.com/distributional/) as
      used in the [`fable` package](https://fable.tidyverts.org) for
      time-series forecasting).*
    - **TS4.6b** *For each variable to be forecast, predicted values
      equivalent to first- and second-order moments (for example, mean
      and standard error values).*
    - **TS4.6c** *Some more general indication of error associated
      with forecast estimates.*
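A sketch of **TS4.6b** output built from `stats::predict()` on an `arima` fit to the built-in `lh` series, with one row per horizon holding mean and standard error. Converting these to quantiles assumes Gaussian forecast errors, an assumption that software would need to document; `fc_quantile()` is hypothetical and is not a substitute for a proper distribution object.

```r
fit <- stats::arima(datasets::lh, order = c(1, 0, 0))
p <- stats::predict(fit, n.ahead = 6)

# TS4.6b: first- and second-order moments for each forecast horizon.
fc <- data.frame(horizon = 1:6,
                 mean    = as.numeric(p$pred),
                 se      = as.numeric(p$se))

# Under an explicitly stated Gaussian assumption, the same two moments give
# quantiles, a rough stand-in for the distribution objects of TS4.6a.
fc_quantile <- function(fc, prob) stats::qnorm(prob, mean = fc$mean, sd = fc$se)
fc$lower80 <- fc_quantile(fc, 0.10)
fc$upper80 <- fc_quantile(fc, 0.90)
fc
```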
Beyond these particular standards for return objects, Time Series
Software which implements or otherwise enables forecasting should:
- **TS4.7** *Ensure that forecast (modelled) values are clearly
distinguished from observed (model or input) values by one of the
following (listed in no order of preference):*
    - **TS4.7a** *Returning forecast values alone;*
    - **TS4.7b** *Returning distinct list items for model and forecast
      values; or*
    - **TS4.7c** *Combining model and forecast values into a single
      return object with an appropriate additional column clearly
      distinguishing the two kinds of data.*
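Option **TS4.7c** might be realised as below: observed and forecast values stacked in one `data.frame`, with a `type` column distinguishing them and a standard error column that is `NA` for observed rows. The model and the column names are illustrative only.

```r
fit <- stats::arima(datasets::lh, order = c(1, 0, 0))
p <- stats::predict(fit, n.ahead = 6)

# TS4.7c: one object, with a `type` column separating observed from forecast rows.
combined <- rbind(
  data.frame(time  = as.numeric(stats::time(datasets::lh)),
             value = as.numeric(datasets::lh),
             se    = NA_real_,
             type  = "observed"),
  data.frame(time  = as.numeric(stats::time(p$pred)),
             value = as.numeric(p$pred),
             se    = as.numeric(p$se),
             type  = "forecast")
)
tail(combined)
```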
### 5 Visualization
Time Series Software should:
- **TS5.0** *Implement default `plot` methods for any implemented
class system.*
- **TS5.1** *When representing results in temporal domain(s), ensure
that one axis is clearly labelled “time” (or equivalent), with
continuous units.*
- **TS5.2** *Default to placing the “time” (or equivalent) variable on
the horizontal axis.*
- **TS5.3** *Ensure that units of the time, frequency, or index
variable are printed by default on the axis.*
- **TS5.4** *For frequency visualization, an abscissa spanning
$[-\pi, \pi]$ should be avoided in favour of positive units of
$[0, 2\pi]$ or $[0, 0.5]$, in all cases with appropriate additional
explanation of units.*
- **TS5.5** *Provide options to determine whether plots of data with
missing values should generate continuous or broken lines.*
For the results of forecast operations, Time Series Software should:
- **TS5.6** *By default indicate distributional limits of forecast on
plot.*
- **TS5.7** *By default include model (input) values in plot, as well
as forecast (output) values.*
- **TS5.8** *By default provide clear visual distinction between model
(input) values and forecast (output) values.*
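A minimal base-graphics sketch of a plotting helper following **TS5.1** to **TS5.3** and **TS5.5** to **TS5.8**. `plot_forecast()`, its arguments and its styling choices are hypothetical; a package with its own class system would normally expose this as a `plot` method (**TS5.0**). Base `lines()` breaks a line at `NA` values, so the `gaps` argument only needs to drop missing values to draw a continuous line.

```r
# Hypothetical base-graphics plot helper illustrating TS5.1-TS5.3 and TS5.5-TS5.8.
plot_forecast <- function(obs_time, obs, fc_time, fc_mean, fc_lower, fc_upper,
                          time_units = "time steps",
                          gaps = c("broken", "continuous")) {
  gaps <- match.arg(gaps)
  if (gaps == "continuous") {            # TS5.5: join the line across NA values
    keep <- !is.na(obs)
    obs_time <- obs_time[keep]
    obs <- obs[keep]
  }
  # Time on the horizontal axis, labelled with its units (TS5.1-TS5.3).
  plot(range(obs_time, fc_time), range(obs, fc_lower, fc_upper, na.rm = TRUE),
       type = "n", xlab = paste0("Time (", time_units, ")"), ylab = "Value")
  # Forecast interval as a shaded band (TS5.6).
  graphics::polygon(c(fc_time, rev(fc_time)), c(fc_lower, rev(fc_upper)),
                    col = "grey85", border = NA)
  # Observed (input) values; with gaps = "broken", NA values split the line (TS5.7).
  graphics::lines(obs_time, obs, lty = 1)
  # Forecast values drawn in a distinct style (TS5.8).
  graphics::lines(fc_time, fc_mean, lty = 2, col = "blue")
  graphics::legend("topleft", legend = c("observed", "forecast"),
                   lty = c(1, 2), col = c("black", "blue"), bty = "n")
  invisible(NULL)
}
```

For example, `plot_forecast(1:10, c(1, 3, 2, 4, NA, 5, 4, 6, 5, 7), 11:13, c(7, 7.5, 8), c(6, 6, 6), c(8, 9, 10), time_units = "minutes")` draws the observed series with a break at the missing value, followed by a dashed forecast inside a shaded band.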