# NDARMA(1,1) formulation
## As a choice model
For an observed time series $X_t$ with marginal distribution $X_t \sim \Pi$, in which $\Pi$ can be anything, say $Categorical(\lambda)$ or $Poisson(\lambda)$, the NDARMA(1,1) model is
$$
X_t =
\begin{cases}
X_{t-1} &\quad \text{with probability } \alpha_1 \\
U_{t} &\quad \text{with probability } \beta_0 \\
U_{t-1} &\quad \text{with probability } \beta_1
\end{cases} \quad,
$$
in which $U_t$ is a latent process with $U_t \sim \Pi$ and $\alpha_1 + \beta_0 + \beta_1 = 1$.
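A quick worked consequence of $\alpha_1 + \beta_0 + \beta_1 = 1$: since $X_{t-1}$, $U_t$ and $U_{t-1}$ each have marginal distribution $\Pi$, the mixture preserves the marginal. Writing $\Pi(x)$ for the pmf of $\Pi$ (e.g. $\lambda_x$ in the categorical case),
$$
\Pr(X_t = x) = \alpha_1 \Pr(X_{t-1} = x) + \beta_0 \Pr(U_t = x) + \beta_1 \Pr(U_{t-1} = x) = (\alpha_1 + \beta_0 + \beta_1)\,\Pi(x) = \Pi(x),
$$
so $X_t \sim \Pi$ for all $t$ by induction, as stated above.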
## As an ARMA(1,1)-type formulation
Equivalently, an observed time series $X_t$ with marginal distribution $X_t \sim \Pi$ is an NDARMA(1,1) process if it is given by
$$
X_t = a_{1,t} \cdot X_{t-1} + b_{0,t} \cdot U_t + b_{1,t} \cdot U_{t-1}
$$
in which $U_t$ is a latent innovation process (dice) with $U_t \sim \Pi$, and $D_t = [a_{1,t}, b_{0,t}, b_{1,t}]$ is a latent decision variable (dice) with $D_t \sim Multinomial(1; ab)$, where $ab = [\alpha_1, \beta_0, \beta_1]$ are the decision probabilities.
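To make the generative story concrete, here is a minimal forward-simulation sketch (not a fit), under the assumption that $\Pi$ is $Categorical(\lambda)$. It is a standalone Stan program meant to be run with the fixed-parameter algorithm; the names `N`, `k`, `lambda`, `ab`, `U`, `X` are my own choices.
``` stan
// Forward simulation of the NDARMA(1,1) dice formulation above,
// assuming Pi = Categorical(lambda); run with Stan's fixed_param algorithm.
data {
  int<lower=1> N;        // length of the simulated series
  int<lower=1> k;        // number of categories
  simplex[k] lambda;     // pmf of Pi
  simplex[3] ab;         // decision probabilities (alpha1, beta0, beta1)
}
generated quantities {
  array[N] int U;        // latent innovations U_t ~ Pi
  array[N] int X;        // simulated observations X_t
  for (t in 1:N)
    U[t] = categorical_rng(lambda);
  X[1] = U[1];           // start the chain from the marginal Pi
  for (t in 2:N) {
    int d = categorical_rng(ab);   // the dice D_t
    if (d == 1)
      X[t] = X[t - 1];             // with probability alpha1
    else if (d == 2)
      X[t] = U[t];                 // with probability beta0
    else
      X[t] = U[t - 1];             // with probability beta1
  }
}
```
The same program also illustrates the choice-model reading above: the dice $D_t$ simply selects which of the three cases fires at time $t$.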
The log-likelihood of this process (if I'm not mistaken) may be written as
$$
\begin{align}
\log\mathcal{L} &= \sum_{t = 1}^N \log \Pr(X_t = x_t | \lambda, \alpha_1, \beta_0, \beta_1; X_{t-1}, \dots) \\
&= \sum_{t = 1}^N \log \Big[ \alpha_1 \Pr(X_{t-1} = x_t|\dots) + \beta_0 \Pr(U_{t} = x_t|\dots) + \beta_1 \Pr(U_{t-1} = x_t|\dots) \Big]
\end{align}
$$
## Some Stan attempts
``` stan
data {
  int<lower=1> N;                        // number of observations
  int<lower=1> k;                        // number of categories in X and U
  array[N] int<lower=1, upper=k> X_t;    // observed data X_t
}
parameters {
  simplex[k] lambda;   // parameters of the categorical distribution Pi
  simplex[3] ab;       // probabilities for the dice (alpha1, beta0, beta1)
}
model {
  // Latent discrete variables: these are never assigned, and Stan cannot
  // sample integer parameters, so this attempt does not work as written.
  array[N] int U_t;
  array[3, N] int D_t;
  // Priors
  lambda ~ dirichlet(rep_vector(2.0, k));
  ab ~ dirichlet(rep_vector(2.0, 3));
  for (t in 2:N) {
    vector[3] contributions;
    // D_t ~ multinomial(ab);  // not possible for a latent integer array
    // These sums of indicators and category indices are not log-probabilities:
    contributions[1] = D_t[1, t] + X_t[t-1];
    contributions[2] = D_t[2, t] + U_t[t-1];
    contributions[3] = D_t[3, t] + U_t[t];
    target += log_sum_exp(contributions);
  }
}
```
---
**:rage: NO**
The conditional probability of the mixture should be
$$
\begin{align}
& \Pr(X_t = x | \lambda, \alpha_1, \beta_0, \beta_1; X_{t-1}, \dots) = \\
& \alpha_1 \Pr(X_{t-1} = x|\dots) + \beta_0 \Pr(U_{t} = x|\dots) + \beta_1 \Pr(U_{t-1} = x|\dots)
\end{align}
$$
So the log-likelihood would be
$$
\begin{align}
\log\mathcal{L} &= \sum_{t = 1}^N \log \Pr(X_t = x_t | \lambda, \alpha_1, \beta_0, \beta_1; X_{t-1}, \dots) \\
&= \sum_{t = 1}^N \log \Big[ \alpha_1 \Pr(X_{t-1} = x_t|\dots) + \beta_0 \underbrace{\Pr(U_{t} = x_t|\dots)}_{\lambda_{x_t}} + \beta_1 \underbrace{\Pr(U_{t-1} = x_t|\dots)}_{\lambda_{x_t}} \Big] \\
&= \sum_{t = 1}^N \log \Big[ \alpha_1 \Pr(X_{t-1} = x_t|\dots) + \lambda_{x_t} (\beta_0 + \beta_1) \Big] \\
&= \sum_{t = 1}^N \log \Big[ \alpha_1 \Pr(X_{t-1} = x_t|\dots) + \lambda_{x_t} (1 - \alpha_1) \Big]
\end{align}
$$
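A minimal Stan sketch of this simplified likelihood (my own translation: it conditions on $X_1$, and uses that, given the observed series, $\Pr(X_{t-1} = x_t|\dots)$ is just the indicator $[x_t = x_{t-1}]$) could look like this:
``` stan
data {
  int<lower=1> N;                        // number of observations
  int<lower=1> k;                        // number of categories
  array[N] int<lower=1, upper=k> X_t;    // observed series
}
parameters {
  simplex[k] lambda;   // pmf of Pi
  simplex[3] ab;       // decision probabilities (alpha1, beta0, beta1)
}
model {
  // Priors
  lambda ~ dirichlet(rep_vector(2.0, k));
  ab ~ dirichlet(rep_vector(2.0, 3));
  // Simplified per-step likelihood, conditioning on X_1:
  // Pr(X_t = x_t | x_{t-1}) = alpha1 * [x_t == x_{t-1}] + (1 - alpha1) * lambda[x_t]
  for (t in 2:N) {
    real p = (1 - ab[1]) * lambda[X_t[t]];
    if (X_t[t] == X_t[t - 1])
      p += ab[1];
    target += log(p);
  }
}
```
Note that under this simplification the data inform `ab` only through `ab[1]` $= \alpha_1$; $\beta_0$ and $\beta_1$ enter only as $1 - \alpha_1$ and are held apart solely by the prior.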
Being more careful about the dependence between $X_{t-1}$ and $U_{t-1}$ (which the simplification above ignores), the exact joint probability of consecutive observations and innovations factorises as $P(X_t, X_{t-1}, U_t, U_{t-1}) = P(X_t, U_t | X_{t-1}, U_{t-1}) \, P(X_{t-1}, U_{t-1})$. Writing $p_j = \Pr(U_t = j)$ (i.e. $\lambda_j$ in the categorical case) and $\delta_{ij}$ for the Kronecker delta,
$$
\begin{align}
P(X_t = i_0, X_{t-1} = i_1, U_t = j_0, U_{t-1} = j_1)
&= p_{j_0} \cdot \big( \beta_0 \delta_{i_0 j_0} + \beta_1 \delta_{i_0 j_1} + \alpha_1 \delta_{i_0 i_1} \big) \cdot P(X_{t-1} = i_1, U_{t-1} = j_1) \\
&= p_{j_0} \cdot \big( \beta_0 \delta_{i_0 j_0} + \beta_1 \delta_{i_0 j_1} + \alpha_1 \delta_{i_0 i_1} \big) \cdot p_{j_1} \cdot \big( (1 - \beta_0)\, p_{i_1} + \beta_0 \delta_{i_1 j_1} \big) \\
&= p_{j_0} p_{j_1} \, \big( \beta_0 \delta_{i_0 j_0} + \beta_1 \delta_{i_0 j_1} + \alpha_1 \delta_{i_0 i_1} \big) \big( (1 - \beta_0)\, p_{i_1} + \beta_0 \delta_{i_1 j_1} \big),
\end{align}
$$
where the factor $P(X_{t-1} = i_1, U_{t-1} = j_1) = p_{j_1} \big( (1 - \beta_0)\, p_{i_1} + \beta_0 \delta_{i_1 j_1} \big)$ comes from conditioning $X_{t-1}$ on $U_{t-1}$: the $X_{t-2}$ and $U_{t-2}$ branches are independent of $U_{t-1}$ and each have marginal $\Pi$, contributing $(\alpha_1 + \beta_1)\, p_{i_1}$, while the $U_{t-1}$ branch contributes $\beta_0 \delta_{i_1 j_1}$.
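As a quick consistency check on that last expression: summing first over $i_0$ collapses the first bracket to $\beta_0 + \beta_1 + \alpha_1 = 1$, summing over $i_1$ then collapses the second bracket to $(1 - \beta_0) + \beta_0 = 1$, and what remains is $\sum_{j_0, j_1} p_{j_0} p_{j_1} = 1$, so
$$
\sum_{i_0, i_1, j_0, j_1} P(X_t = i_0, X_{t-1} = i_1, U_t = j_0, U_{t-1} = j_1) = 1,
$$
as it should.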