---
date updated: 2021-01-28 (Thu) | 12:37
tags:
- '#floor-effect'
- '#manuscript'
---
Created: **[[2021-01-23 (Sat)]] | 20:41**
Last edit: **[[2021-01-28 (Thu)]] | 12:37**
tags: #floor-effect #manuscript
---
### Gist
Are higher levels of (cross-)correlations and (cross-lagged) autoregression effects (network edges) merely statistical artifacts of skewness in the data (caused by the floor effect)? To what extent?
---
# Introduction
Autoregressive (AR) models have been used in psychological research.
The autoregression effect ($\phi_j$) has been interpreted as (affective) *inertia*, pertaining to the ability of individuals to suppress random shocks to the system. If the AR effect (inertia) is small, random shocks to the system vanish quickly. (==#is-that-correct?==)
==[[Mulder (2018)]] cites some studies ( #is-citation-needed more?)==
In the context of network psychometrics, linear associations between and within symptoms (regressions, autoregressions, and partial correlations) are of great interest.
These associations are conceptualized as edges in a network of interconnected nodes or elements (i.e., symptoms). #citation-needed
The strength of such associations (i.e., edge weights) is hypothesized to be indicative of the "stages" of severity of psychopathology [[(Wigman et al., 2013)]].
However, [[Terluin et al. (2016)]] have suggested that such a staging effect can be merely a statistical artifact: that (high positive) skewness in the data can account for differences in the (co)variances between the symptoms.
==#is-that-correct? I mean, do I understand it correctly? Terluin et al. (2016) talk about differences in skewness leading to differences in variance. But isn't i all about correlations/covariances? What am I missing? #question==
More specifically, if the data exhibit a #floor-effect (i.e., they are highly positively skewed, with a substantial number of responses being zero), one can imagine that ... #to-be-completed
In this paper, by simulating data resembling empirical measurements, we investigate whether individuals sharing the same dynamics can have different AR coefficients when they differ in the skewness of their responses. To make the simulated data as similar as possible to real-world empirical data, we use a class of integer-valued autoregressive models, namely INAR(1) and its extensions.
# Intro to autoregressive models
## Single level, univariate AR model
An autoregressive model of order $p$, denoted as $AR(p)$, is defined as
$$
Y_t = c + \sum_{i=1}^{p} \phi_i Y_{t-i} + \epsilon_t
$$
where $c$ is a constant and $\phi_1, \ldots, \phi_p$ are model parameters (the regression coefficients of $Y_t$ on $Y_{t-1}, \ldots, Y_{t-p}$). For the process to remain [[wide-sense stationary]], the roots of the characteristic polynomial $1 - \phi_1 z - \cdots - \phi_p z^p$ must lie outside the unit circle; for $AR(1)$ this reduces to $|\phi_1| \lt 1$. The residuals $\epsilon_t$ are i.i.d. random variables drawn from $N(0, \sigma_\epsilon^2)$ and are called white noise or *innovations*.
The mean and variance of $Y_t$ can be derived as follows:
$$
\begin{align}
E(Y_t) &= E(c) + \sum_{i=1}^{p} \phi_i E(Y_{t-i}) + E(\epsilon_t) \\
\mu &= c + \mu \sum_{i=1}^{p}\phi_i + 0
\end{align}
$$
Thus
$$ \mu = \frac{c}{1-\sum_{i=1}^{p}\phi_i} $$
The variance does not have a neat closed-form formula in general. However, for $AR(1)$ (with $p=1$) we have
$$
Var(Y_t) = E(Y_t^2) - \mu^2 = \phi^2 Var(Y_{t-1}) + Var(\epsilon_t)
$$
Given that, by stationarity, $Var(Y_{t-1}) = Var(Y_t)$,
$$
Var(Y_t) = \frac{\sigma_\epsilon^2}{1-\phi^2}
$$
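As a sanity check, a minimal simulation sketch (all parameter values illustrative) recovers both moments empirically:
```python
# Minimal AR(1) simulation; checks the stationary mean and variance derived above.
import numpy as np

rng = np.random.default_rng(1)

def simulate_ar1(c, phi, sigma_eps, T=100_000, burn_in=500):
    """Simulate Y_t = c + phi * Y_{t-1} + eps_t with Gaussian innovations."""
    y = np.empty(T + burn_in)
    y[0] = c / (1 - phi)  # start at the stationary mean
    eps = rng.normal(0.0, sigma_eps, size=T + burn_in)
    for t in range(1, T + burn_in):
        y[t] = c + phi * y[t - 1] + eps[t]
    return y[burn_in:]  # discard the burn-in

y = simulate_ar1(c=1.0, phi=0.6, sigma_eps=1.0)
print(y.mean())  # approx. c / (1 - phi)              = 2.5
print(y.var())   # approx. sigma_eps^2 / (1 - phi^2)  = 1.5625
```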
## Multivariate VAR model
If the outcome variable is a vector $Y_t = [y_{1,t}, y_{2,t}, \ldots, y_{n,t}]^T$, we have a vector autoregressive (VAR) model. Since the general $n$-dimensional matrix notation of the VAR model is tedious to derive (cf. [its dedicated Wikipedia page](https://en.wikipedia.org/wiki/General_matrix_notation_of_a_VAR(p))), a two-dimensional VAR(1) model is presented here.
$$
\begin{bmatrix}
y_{1,t} \\ y_{2,t}
\end{bmatrix} =
\begin{bmatrix}
c_{1} \\ c_{2}
\end{bmatrix}
+
\begin{bmatrix}
a_{1,1} & a_{1,2}
\\
a_{2,1} & a_{2,2}
\end{bmatrix}
\begin{bmatrix}
y_{1,t-1} \\ y_{2,t-1}
\end{bmatrix}
+
\begin{bmatrix}
\epsilon_{1,t} \\ \epsilon_{2,t}
\end{bmatrix}
$$
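A minimal sketch of this 2D VAR(1) recursion (coefficient values illustrative; unit-variance Gaussian innovations assumed); the off-diagonal entries $a_{1,2}$ and $a_{2,1}$ are the cross-lagged effects:
```python
# Minimal 2D VAR(1) simulation following the recursion above.
import numpy as np

rng = np.random.default_rng(2)

c = np.array([0.5, 1.0])    # intercepts c_1, c_2
A = np.array([[0.5, 0.1],   # lag-1 coefficient matrix (a_ij)
              [0.2, 0.4]])
T = 10_000

y = np.zeros((T, 2))
for t in range(1, T):
    eps = rng.normal(0.0, 1.0, size=2)  # white-noise innovations
    y[t] = c + A @ y[t - 1] + eps
```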
## Multilevel AR(1) model
Following [[Schuurman et al. (2016)]], the score of person $j$ on the outcome variable at time $t$, denoted by $y_{tj}$, can be decomposed into the person's mean $\mu_j$ and a time-varying score $z_{tj}$.
In an AR(1) model, $z_{tj}$ can be regressed on its previous value ($z_{t-1j}$) and on an exogenous, time-dependent variable $x_{t-1j}$.
==#question : what does it mean to have person-specific error variances?==
More formally, as a multilevel model, at level 1 we have:
$$
\begin{align}
& y_{tj} = \mu_j + z_{tj} \\
& z_{tj} = \phi_j z_{t-1j} + \beta_j x_{t-1j} + e_{tj} \\
& e_{tj} \sim N(0, \sigma_\epsilon^2)
\end{align}
$$
In which $\mu_j$, $\phi_j$, and $\beta_j$ (respectively person $j$'s mean, autoregression coefficient, and cross-regression parameter) can vary over persons, and are called random effects.
At level 2, these random effects come from a multivariate normal distribution with mean vector $[\gamma_{\mu}, \gamma_{\phi}, \gamma_{\beta}]^T$ (often called the fixed effects) and covariance matrix $\Psi$, which specifies the variances of and covariances among the random effects:
$$
\begin{bmatrix}
\mu_{j}
\\
\phi_{j}
\\
\beta_{j}
\end{bmatrix}
\sim
MvN
\left\{
\begin{bmatrix}
\gamma_{\mu}
\\
\gamma_{\phi}
\\
\gamma_{\beta}
\end{bmatrix}
,
\begin{bmatrix}
\psi_{\mu}^2 & &
\\
\psi_{\mu\phi} & \psi_{\phi}^2 &
\\
\psi_{\mu\beta} & \psi_{\phi \beta} & \psi_{\beta}^2
\end{bmatrix}
\right\}$$
Represented visually, we have: ==(figure: path diagram of the multilevel AR(1) model; #to-be-completed)==
The MvN distribution of the random effects is problematic, since it entails that the $\phi_j$s can take values outside the $(-1,1)$ range, violating the stationarity of the AR model. Cf. footnotes 1 (and 2) of [[Schuurman et al. (2016)]].
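A minimal sketch of this problem (the values of the fixed effects and of $\Psi$ are made up for illustration): drawing person-specific effects from the level-2 MvN distribution yields a nonnegligible share of $\phi_j$ outside $(-1, 1)$:
```python
# Draw random effects (mu_j, phi_j, beta_j) from the level-2 MvN distribution
# and count how often phi_j violates the stationarity range (-1, 1).
import numpy as np

rng = np.random.default_rng(3)

gamma = np.array([3.0, 0.4, 0.2])    # fixed effects (gamma_mu, gamma_phi, gamma_beta)
Psi = np.array([[1.00, 0.05, 0.00],  # covariance matrix of the random effects
                [0.05, 0.09, 0.01],
                [0.00, 0.01, 0.04]])

effects = rng.multivariate_normal(gamma, Psi, size=100_000)
phi_j = effects[:, 1]
print(np.mean(np.abs(phi_j) >= 1))   # about 2% of persons here violate |phi_j| < 1
```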
## AR Assumptions and restrictions
(V)AR models are, in essence, linear regression models. Thus, on top of the $|\phi_1| \lt 1$ restriction for stationarity of $AR(1)$, all the assumptions of such (nongeneralized) models apply (explanation required? #doubt)
Most importantly:
1. The outcome variable must be able to take any real value ($y_t \in \mathbb{R}$), and thus has to be continuous; and, often overlooked,
2. The error terms must be normally distributed, entailing continuous values and an infinite range.
Both of these assumptions are violated if the range of possible values of $y_{tj}$ is bounded, and even more so if the values are discrete; both conditions are prevalent in ESM measurements.
A viable alternative to continuous-valued AR models is the class of [[Discrete-Valued Time Series|discrete-valued time series]] models, which remedies both problems. However, some concerns remain (hierarchical modeling, multivariate time series, etc.).
# Discrete valued time series
## INAR(1) model
### Preliminaries
The conventional AR(1) recursion, $X_t = \alpha \cdot X_{t-1} + \epsilon_t$, does not apply to count time series;
not only because of the non-normality of the $\epsilon_t$ residuals, but also due to the *multiplication problem*: the multiplication "$\alpha \ \cdot$" does not preserve the integer range of the observations when updating the recursion [[Weiss (2018, Chapter 1)]].
^[This issue cannot be remedied by integer-valued error terms, so generalized linear models still fail. Though, it is worth looking into [[Fokianos (2015)]], as it discusses how GLMs can be used for DVTS.]
For this reason, a class of integer-valued autoregressive moving-average (INARMA) models was proposed by [[McKenzie (1985)]] and [[Al-Osh and Alzaid (1987)]] in the 1980s.
The INAR(1) model tackles the multiplication problem by using the probabilistic operation of *binomial sampling*, or *[[Binomial thinning|binomial thinning]]*, defined below.
If $X$ is a r.v. with range $\mathbb{N}_0$ and $\alpha \in (0;1)$, then, for an i.i.d. *counting series* $Z_i$ (with $P(Z_i=1) = \alpha$, independent of $X$), the r.v. $\alpha \circ X := \sum_{i=1}^{X} Z_i$
is said to arise from $X$ by *binomial thinning*.
^[For a more general form of thinning (denoted by "$\ast$"), see [[(Grunwald et al., 2000, p. 482)]].]
In binomial sampling, the $Z_i$ are i.i.d. binary r.v.'s satisfying $Z_i \sim Bin(1,\alpha)$, i.e., $P(Z_i=1) = \alpha$; thus $\alpha \circ X$ is, by construction, integer-valued between $0$ and $X$. The edge cases are defined as $0 \circ X := 0$ and $1 \circ X := X$.
As the binomial distribution is additive, $\alpha \circ X$ is [[Binomial distribution|binomially distributed]] given the value of $X$; i.e., $\alpha \circ X \mid X \sim Bin(X,\alpha)$.
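Computationally, this fact makes thinning cheap to realize: instead of summing $X$ Bernoulli draws, a single binomial draw suffices. A minimal sketch:
```python
# Binomial thinning alpha ∘ X, realized as a single draw from Bin(X, alpha).
import numpy as np

rng = np.random.default_rng(4)

def thin(alpha, x):
    """One realization of alpha ∘ x (equivalent to summing x i.i.d. Bin(1, alpha) draws)."""
    return rng.binomial(x, alpha)

print(thin(0.5, 10))  # an integer between 0 and 10
```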
Using the [[Law of total expectation|law of total expectation]], one can show that the binomial thinning $\alpha \circ X$ and the multiplication $\alpha \cdot X$ have the same mean:
$$
E[\alpha \circ X] =
E \Big[\
\underbrace{ \vphantom{\Big|} E[\alpha \circ X|X]}_{\substack{\text{Mean of} \\ \text{binom. distr.}}}
\ \Big] =
E[\alpha \cdot X] =
\alpha \cdot \mu
$$
This motivates using binomial thinning in the AR(1) recursion. However, since multiplication is not a random operation, it follows via [[Law of total variance|variance decomposition]] that $Var[\alpha \circ X] \neq Var[\alpha \cdot X]$.
More specifically,
$$
\begin{align}
Var(\alpha \circ X) &= Var( \ E[\alpha \circ X|X] \ ) + E[ \ Var(\alpha \circ X|X) \ ] \\
&= Var(\alpha \cdot X) + E[\alpha(1-\alpha) \cdot X] \\
&= \alpha^2 \sigma^2 + \alpha(1-\alpha) \cdot \mu
\end{align}
$$
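A quick numerical check of both formulas, thinning a Poisson r.v. (an arbitrary illustrative choice, convenient because then $\mu = \sigma^2 = \lambda$):
```python
# Compare the mean and variance of alpha ∘ X with those of alpha * X.
import numpy as np

rng = np.random.default_rng(5)
alpha, lam, n = 0.7, 4.0, 1_000_000

X = rng.poisson(lam, size=n)
thinned = rng.binomial(X, alpha)  # alpha ∘ X, thinning realized anew per draw

print(thinned.mean())     # approx. alpha * mu                          = 2.8
print(thinned.var())      # approx. alpha^2 sigma^2 + alpha(1-alpha) mu = 2.8
print((alpha * X).var())  # alpha^2 sigma^2 only                        = 1.96
```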
### INAR(1) definition and interpretation
Let $\alpha \in (0;1)$, and let the innovations $(\epsilon_t)_{\mathbb{N}}$ be i.i.d. r.v.'s with range $\mathbb{N}_0$, expected value $E[\epsilon_t] = \mu_\epsilon$, and variance $Var(\epsilon_t) = \sigma_\epsilon^2$. A process $(X_t)_{\mathbb{N}_0}$ following the recursion
$$
X_t = \alpha \circ X_{t-1} + \epsilon_t
$$
is said to be an INAR(1) process if
**(a)** all thinning operations are performed independently of each other
^[Note that it would be more correct to write "$\circ~_{t}$" in the above recursion to emphasize the fact that the thinning is realized at each time $t$ anew [[(Weiss, 2018)]].]
and of $(\epsilon_t)_{\mathbb{N}}$; and
**(b)** the thinning operations at each time $t$, as well as $\epsilon_t$, are independent of $(X_s)_{s \lt t}$ [[(Weiss, 2018)]].
One can, following [[Al-Osh and Alzaid (1987)]], interpret the recursion in terms of two components: shrinkage of the population by a constant individual "death rate"
^[Perhaps another term than "rate" would better fit here.]
of $(1-\alpha)$ (that is, every individual has probability $\alpha$ of surviving the time step), and an influx of new immigrants, whose count follows the distribution of $\epsilon_t$:
$$
\underbrace{\vphantom{\Big|} \; \; X_t \; \; }_\text{Population at time $t$} =
\underbrace{\vphantom{\Big|} \ \alpha \circ X_{t-1} \ }_\text{Survivors from time $t-1$} +
\underbrace{\vphantom{\Big|} \; \; \ \epsilon_t \; \; \ }_\text{Immigration}
$$
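A minimal simulation sketch of this recursion, assuming Poisson-distributed innovations (a common choice in the INAR literature; the recursion itself does not require it):
```python
# Poisson INAR(1): X_t = alpha ∘ X_{t-1} + eps_t, thinning realized anew at each t.
import numpy as np

rng = np.random.default_rng(6)

def simulate_inar1(alpha, mu_eps, T=1_000, x0=0):
    """Simulate an INAR(1) series with Poisson(mu_eps) innovations."""
    x = np.empty(T, dtype=int)
    x[0] = x0
    for t in range(1, T):
        survivors = rng.binomial(x[t - 1], alpha)  # alpha ∘ X_{t-1} ("survivors")
        immigrants = rng.poisson(mu_eps)           # eps_t ("immigration")
        x[t] = survivors + immigrants
    return x

x = simulate_inar1(alpha=0.6, mu_eps=2.0)
```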
### INAR(1) properties
The INAR(1) process is a homogeneous Markov chain with 1-step transition probabilities given by
([[McKenzie, 1985]];
[[Al-Osh & Alzaid, 1987]]):
$$
\begin{align}
p_{k|l} :=& P(X_t=k | X_{t-1} = l) \\
=& \sum_{j=0}^{min\{k,l\}} {l \choose j} \alpha^j (1-\alpha)^{l-j} \cdot P(\epsilon_t = k-j)
\end{align}
$$
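For concreteness, a minimal sketch evaluating these transition probabilities, again assuming Poisson innovations so that $P(\epsilon_t = k - j)$ has a closed form:
```python
# 1-step transition probability p_{k|l} of a Poisson INAR(1).
from math import comb, exp, factorial

def poisson_pmf(m, mu):
    """P(eps_t = m) for Poisson(mu) innovations."""
    return exp(-mu) * mu**m / factorial(m)

def p_trans(k, l, alpha, mu_eps):
    """P(X_t = k | X_{t-1} = l), summing over j surviving individuals."""
    return sum(
        comb(l, j) * alpha**j * (1 - alpha)**(l - j) * poisson_pmf(k - j, mu_eps)
        for j in range(min(k, l) + 1)
    )

print(p_trans(k=3, l=2, alpha=0.6, mu_eps=2.0))
```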
Given the independence assumption of the survival and immigration terms, one can derive the conditional mean and variance [[(Al-Osh & Alzaid, 1988)]] as follows, both of which are linear in $X_{t-1}$:
#question `what does that mean?`
#question `I did not look into the latter paper from 1988, but just cited since Weiss (2018) has done so. Is it okay?`
$$
\begin{align}
E[X_t|X_{t-1}] &= \alpha \cdot X_{t-1} + \mu_\epsilon \\
Var(X_t|X_{t-1}) &= \alpha (1-\alpha) \cdot X_{t-1} + \sigma^2_\epsilon
\end{align}
$$
Because the conditional mean is linear in $X_{t-1}$, the INAR(1) model belongs to [[conditional linear AR(1) models]] (or CLAR(1)), as discussed by [[Grunwald et al. (2000)]].
INAR(1)'s conditional mean is akin to that of AR(1).
However, unlike AR(p)'s conditionally homoscedastic variance ($\sigma^2_\epsilon$),
^[Which is a shortcoming of AR models; financial time series are often characterized by clusters of high and low volatility.]
the conditional variance of INAR(1) varies with $X_{t-1}$ (thus is *conditionally heteroscedastic*).
^[#idea💡 Can INAR(1) heteroscedasticity benefit #change-point-detection ?]
#question `These definitions and explanations are almost carbon-copy of the references (e.g., here Weiss 2018). It does not make sense to block-quote them, nor paraphrasing is easy. How to avoid plagiarism in such cases?`
## Binomial AR model
In many applications, the observed count series has a natural upper bound $n \in \mathbb{N}$, which the INAR(1) model and its extensions fail to respect;
to guarantee the restricted support $\{0, \ldots, n\}$ for $X_t = \alpha \circ X_{t-1} + \epsilon_t$, the innovation $\epsilon_t$ would have to be restricted to $\{0, \ldots, n - X_{t-1}\}$ at each time $t$, contradicting the i.i.d. assumption of the innovations.
[[McKenzie (1985)]] proposed, together with INAR(1), the binomial AR(1) model (henceforth, *BiNAR(1)*) in which the innovation is replaced by $\beta \circ (n - X_{t-1})$.
### BiNAR(1) definition and properties
Per [[(Weiss, 2018)]], let $\pi \in (0; 1)$ and
$$
\rho \in
\Big(
\max \Big\{ -\frac{\pi}{1-\pi}, -\frac{1-\pi}{\pi} \Big\};
1 \Big)
$$
Define $\beta := \pi (1-\rho)$ and $\alpha := \beta + \rho$ and fix $n \in \mathbb{N}$. Then, the process $(X_t)_\mathbb{Z}$, defined by the recursion
$$
X_t =
\alpha \circ X_{t-1} +
\beta \circ (n - X_{t-1})
$$
is a *binomial AR(1) process* if **(a)** all thinnings are performed independently of each other, and **(b)** the thinnings at time $t$ are independent of $(X_s)_{s \lt t}$.
The BiNAR(1) model with support $\{0, \ldots, n\}$ is fully characterized by $\rho$ and $\pi$, since substituting $\alpha$ and $\beta$ yields
$$
\begin{align}
X_t = \Big[ \rho + \pi(1-\rho) \Big] &\circ X_{t-1} \\ +
\Big[ \pi(1-\rho) \Big] &\circ (n - X_{t-1})
\end{align}
$$
To interpret this formula, assume we have a set of $n$ independent units that can take binary states $\{0, 1\}$.
Let $X_{t-1}$ be the number of units with state "1" at time $t-1$.
Then $\alpha \circ X_{t-1}$ is the number of units that will remain at state "1" at time $t$, with individual transition probability $\alpha$ ("survival probability").
Similarly, $\beta \circ (n - X_{t-1})$ is the number of units that switch from "0" to "1" at time $t$, with individual transition probability $\beta$ ("revival probability")
[[(Weiss, 2018)]].
$$
\underbrace{ \; \; X_t \; \; }_\text{Population at time $t$} =
\underbrace{ \ \alpha \circ X_{t-1} \ }_\text{Survivors from time $t-1$} +
\underbrace{\ \beta \circ (n - X_{t-1}) \;}_\text{Revived units from time $t-1$}
$$
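A minimal simulation sketch of this recursion (the values of $n$, $\pi$, and $\rho$ are illustrative; $\pi \lt 0.5$ gives a positively skewed, floor-effect-like marginal):
```python
# Binomial AR(1): X_t = alpha ∘ X_{t-1} + beta ∘ (n - X_{t-1}), support {0, ..., n}.
import numpy as np

rng = np.random.default_rng(7)

def simulate_binar1(n, pi, rho, T=1_000):
    """Simulate a BiNAR(1) series parameterized by (n, pi, rho)."""
    beta = pi * (1 - rho)
    alpha = beta + rho
    x = np.empty(T, dtype=int)
    x[0] = rng.binomial(n, pi)                       # draw from the Bin(n, pi) marginal
    for t in range(1, T):
        survivors = rng.binomial(x[t - 1], alpha)    # units staying at "1"
        revivals = rng.binomial(n - x[t - 1], beta)  # units switching from "0" to "1"
        x[t] = survivors + revivals
    return x

x = simulate_binar1(n=10, pi=0.2, rho=0.5)
```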
### BiNAR(1) properties
It is known that $(X_t)_\mathbb{Z}$ is a [[stationary]], [[ergodic]] and $\phi$-[[mixing]] finite Markov chain.
^[Every uniformly ergodic, stationary Markov chain is $\phi$-mixing [[(Geyer, 2012)]]. #what-is-that🤨]
Its marginal distribution is $Bin(n,\pi)$, and the (strictly positive) 1-step-ahead transition probabilities are given by
$$
\begin{align}
p_{k|l} :=& P(X_t=k | X_{t-1} = l) \\
=& \sum_{m=max\{0, k+l-n\}}^{min\{k,l\}}
{l \choose m}
{n-l \choose k-m}
\alpha^{m}
(1-\alpha)^{l-m}
\beta^{k-m}
(1-\beta)^{n-l+m-k}
\end{align}
$$
The conditional mean and variance, both linear in $X_{t-1}$, are
$$
\begin{align}
E[X_t|X_{t-1}] &= \rho \cdot X_{t-1} + n\beta \\
Var(X_t|X_{t-1}) &= \rho (1-\rho) (1-2\pi)
\cdot X_{t-1} +
n\beta(1-\beta)
\end{align}
$$
The [[ACF]] of the binomial AR(1) is of AR(1)-type, given by $\rho(k) = \rho^k$ for $k \gt 0$, where $\rho$ can also be negative.
For further properties, cf. [[Weiss (2018, p. 60)]].
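As a quick numerical check, the geometric decay of the ACF can be verified empirically, reusing the `simulate_binar1` sketch above:
```python
# Empirical ACF of a simulated BiNAR(1) series; should decay roughly as rho^k.
import numpy as np

def sample_acf(x, k):
    """Lag-k sample autocorrelation."""
    x = x - x.mean()
    return float((x[:-k] * x[k:]).sum() / (x * x).sum())

x = simulate_binar1(n=10, pi=0.2, rho=0.5, T=100_000).astype(float)
print([round(sample_acf(x, k), 3) for k in (1, 2, 3)])  # approx. [0.5, 0.25, 0.125]
```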
# Simulation strategy
cf. [[Simulating Floor Effect]] #to-be-completed
1. BiNAR(1) --> simulate data for a given $\alpha$ and various binomial skewness --> see if AR(1) coefficients differ --> repeat for different $\alpha$s (see the sketch after this list).
2. Bivariate BiNAR(1) --> ... AND cross-lagged relationships
3. Multinomial INAR(1) --> ...
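A rough sketch of step 1 (function names and values are placeholders, reusing the `simulate_binar1` sketch above): persons share the same dependence parameter $\rho$ but differ in $\pi$, and hence in skewness; an ordinary AR(1) is fit to each simulated series, and the estimated coefficients are compared across skewness levels.
```python
# Step 1 sketch: same rho, varying pi (skewness); compare fitted AR(1) coefficients.
import numpy as np

rng = np.random.default_rng(9)

def fit_ar1(x):
    """OLS estimate of phi in x_t = c + phi * x_{t-1} + e_t."""
    X = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    coef, *_ = np.linalg.lstsq(X, x[1:], rcond=None)
    return coef[1]

rho = 0.5
for pi in (0.5, 0.2, 0.05):  # decreasing pi: stronger floor effect
    phi_hats = [fit_ar1(simulate_binar1(n=10, pi=pi, rho=rho, T=200).astype(float))
                for _ in range(500)]
    print(pi, round(float(np.mean(phi_hats)), 3))
```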