# Time Series
:::success
[Back to notes index](https://hackmd.io/68GUUX4MQXGXroA5TlsZ-Q)
Edited: 2023/04/19, Chapter 1 under construction
[github](https://github.com/Iofting1023/2024-Spring--Introduction-to-stochastic-calculus)
:::
[TOC]
# 1. Characteristics of Time Series
:::info
Time series analysis deals with serially (time-)correlated data.
:::
- Time Domain: lagged relationships.
- Frequency Domain: Investigation of cycles.
- Importing the textbook's data in Python:
```python=+
import astsadata
data = astsadata.jj  # replace jj with the desired dataset name
```
## 1.1 The Nature of Time Series Data
- Time series with a trend: temperature

- Time series with periodicity: audio data

- Series with changing volatility: Dow Jones returns

## 1.2 Time Series Statistical Models
- Time series can be defined as a collection of random variables indexed according to the order they are obtained in time.
- For discrete time we write $$\{x_t\},\quad t=0,\pm1,\pm2,\ldots$$
called a realization of the stochastic process, with $t$ an integer.
:::info
Time indices are continuous in principle; we discretize them for convenience of analysis.
:::
### White noise
- Uncorrelated random variables, with or without a distributional assumption:
$$w_t \sim wn(0,\sigma_w^2)$$
$$w_t \sim N(0,\sigma_w^2)$$
### Moving Averages
- Smooth a series by averaging neighboring samples:
$$v_t=\sum_{j= -h}^h a_jw_{t-j}$$
- Smoothing in this way does not change the mean (a simulation sketch follows).
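A minimal simulation sketch (assuming only `numpy`; the series length is illustrative) of white noise and its three-point moving average:
```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0, 1, 500)                 # white noise w_t ~ N(0, 1)

# three-point symmetric moving average: v_t = (w_{t-1} + w_t + w_{t+1}) / 3
v = np.convolve(w, np.ones(3) / 3, mode="valid")

print(w.mean().round(3), v.mean().round(3))  # the mean (~0) is unchanged by smoothing
```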
### Autoregressions
- The current value is correlated with past values, e.g.
$$x_t = x_{t-1}-0.9x_{t-2}+w_t$$
- A model with $p$ lags requires $p$ initial values to construct the series (see the sketch below).
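A quick sketch (assuming `numpy`) of the AR(2) example above, with the two required initial values set to zero for illustration:
```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
w = rng.normal(0, 1, n)

# x_t = x_{t-1} - 0.9 x_{t-2} + w_t: two lags, so two initial values x_0 = x_1 = 0
x = np.zeros(n)
for t in range(2, n):
    x[t] = x[t - 1] - 0.9 * x[t - 2] + w[t]
```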
### Random Walk with Drift
$$x_t = \delta +x_{t-1}+w_t=\delta t +\sum_{j=1}^t w_j,\quad x_0=0$$
- $\delta$: drift term.
- $E(x_t)=\delta t$
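A sketch (assuming `numpy`; the drift value is an arbitrary illustration) using the cumulative-sum form $x_t=\delta t+\sum_{j=1}^t w_j$:
```python
import numpy as np

rng = np.random.default_rng(2)
n, delta = 200, 0.2                       # delta is an illustrative drift value
w = rng.normal(0, 1, n)

t = np.arange(1, n + 1)
x = delta * t + np.cumsum(w)              # x_t = delta * t + w_1 + ... + w_t
print(x[-1] / n, delta)                   # E(x_t) = delta * t, so x_n / n ~ delta
```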
### Signal in Noise
$$x_t = 2\cos(2\pi\frac{t+15}{50})+w_t$$
- A periodic signal can be written as $A\cos(2\pi\omega t+\phi)$
- $A$: the amplitude
- $\omega$: the frequency of oscillation
- $\phi$: the phase shift
- $\frac{A}{\sigma_w}$: signal-to-noise ratio
- $E(x_t)=2 \cos(2\pi \frac{t+15}{50})$
Later chapters cover detecting regular or periodic signals in models of the form
$$x_t =s_t +v_t$$
- $s_t$: some unknown signal
- $v_t$: time series that may be white or correlated over time
## 1.3 Measures of Dependence
- The collection $\{x_t\}$ can be described by a joint probability distribution.
:::info
Although the joint distribution function describes the data completely, it is an unwieldy tool for displaying and analyzing time series data.
:::
- The most useful descriptive measures are those expressed in terms of covariance and correlation functions.
### Define autocovariance function
$$\gamma_x(s,t)=cov(x_s,x_t)=E[(x_s-\mu_s)(x_t-\mu_t)]$$
- measures the linear dependence between two points on the same series observed at different times
- When there is no ambiguity we simply write $\gamma_x(s,t)=\gamma(s,t)$.
- $\gamma(t,t)=E[(x_t-\mu_t)^2]=var(x_t)$
#### Example 1.17 Autocovariance of a Moving Average (p. 17)
\begin{align}
\gamma_V(s,t)&=cov(v_s,v_t)=cov\{\frac{1}{3}(w_{s-1}+w_s+w_{s+1}),\frac{1}{3}(w_{t-1}+w_t+w_{t+1})\}\\
&=\begin{cases}\frac{3}{9}\sigma_w^2,\quad &s=t,\\
\frac{2}{9}\sigma_w^2,\quad &|s-t|=1,\\
\frac{1}{9}\sigma_w^2,\quad &|s-t|=2,\\
0,\quad &|s-t|>2 .
\end{cases}
\end{align}
- It only depends on the time separation or lag and not on the absolute location of the points along the series
- The value depends on how many white-noise terms the two averaging windows share.
#### Example 1.18 Autocovariance of a Random Walk
$$\gamma_x(s,t)=cov\big(\sum_{j=1}^s w_j,\sum_{k=1}^t w_k\big)=\min\{s,t\}\sigma_w^2$$
- autocovariance function of a random walk depends on the particular time values $s$ and $t$, and not on the time separation or lag.
### Define autocorrelation function (ACF)
$$\rho(s,t)=\frac{\gamma(s,t)}{\sqrt{\gamma(s,s)\gamma(t,t)}}$$
- The ACF measures the linear predictability of the series at time $t$, say $x_t$, using only the value $x_s$.
- $-1\leq\rho(s,t)\leq1$
### Define cross-covariance function and **CCF**
Consider two time series $\{x_t\}$ and $\{y_t\}$.
- cross-covariance function
$$\gamma_{xy}(s,t)=cov(x_s,y_t)=E[(x_s-\mu_{xs})(y_t-\mu_{yt})]$$
- cross-correlation function (CCF)
$$\rho_{xy}(s,t)=\frac{\gamma_{xy}(s,t)}{\sqrt{\gamma_x(s,s)\gamma_y(t,t)}}$$
## 1.4 Stationary Time Series
### Define strictly stationary
Consider a time series $\{x_t\}$, with
- a collection of values:
$$\{x_{t_1},x_{t_2},\ldots, x_{t_k}\}$$
- and the time-shifted set
$$\{x_{t_1+h},x_{t_2+h},\ldots, x_{t_k+h}\}.$$
The series is strictly stationary if
\begin{align}
&\mathcal{P}\{x_{t_1}\leq c_1,\ldots, x_{t_k}\leq c_k\}=\mathcal{P}\{x_{t_1+h}\leq c_1,\ldots, x_{t_k+h}\leq c_k\},\\
&\forall \ k=1,2,\ldots,\ \text{all time points}\ t_1,\ldots,t_k,\ \text{all constants}\ c_1,\ldots, c_k,\ \text{and all shifts}\ h=0,\pm1,\pm2,\ldots.
\end{align}
- Then all of the multivariate distribution functions for subsets of variables must agree with their counterparts in the shifted set for all values of the shift parameter $h$.
- Probability distributions over equal time separations coincide: they depend only on the lag, not on the absolute time points.
- This implies $\gamma(s,t)=\gamma(s+h,t+h)$.
- The autocovariance function of the process depends only on the time difference between $s$ and $t$, not on the actual times.
- This condition is too strong and hard to verify in practice, so a weaker notion is defined.
### Define weak stationary
Consider a time series $\{x_t\}$ satisfying conditions 1 and 2:
1. the mean value function $\mu_t$ is constant and does not depend on time $t$;
2. the autocovariance function, $\gamma(s, t)$ depends on $s$ and $t$ only through their difference $|s − t|$.
- The unqualified term ***stationary*** usually means ***weakly stationary***.
- Strict stationarity implies weak stationarity; the converse does not hold in general (for Gaussian processes it does).
- Since $\mu_t$ is constant under stationarity, we denote it by $\mu$.
- Since $\gamma(t+h,t)=cov(x_{t+h},x_t)=cov(x_h,x_0)=\gamma(h,0)$, we denote it by $\gamma(h)$.
- Define $\rho(h)=\frac{\gamma(h)}{\gamma(0)}.$
#### Example1.19 Stationarity of White Noise
The mean and autocovariance functions of the white noise series are $\mu_{wt}=0$ and
$$
\gamma_w(h)= cov(w_{t+h}, w_t)=\left\{
\begin{aligned}
\sigma_w^2 \ ; & \ h=0.\\
0 \ ; & \ h \neq 0. \\
\end{aligned}
\right.
$$
- White noise is weakly stationary or stationary
- If the white noise variates are also normally distributed, the series is also strictly stationary.
#### Example 1.20 Stationarity of a Moving Average
- The three-point moving average process is stationary (independent of time t)
- $\mu_{vt} = 0$, and
$$
\gamma_v(h)=\left\{
\begin{aligned}
\frac{3}{9}\sigma_w^2 \ ; & \ h=0.\\
\frac{2}{9}\sigma_w^2 \ ; & \ h=\pm1.\\
\frac{1}{9}\sigma_w^2 \ ; & \ h=\pm2.\\
0 \ ; & \ |h| > 2. \\
\end{aligned}
\right.
$$
- autocorrelation function
$$
\rho_v(h)=\left\{
\begin{aligned}
1 \ ; &\ h=0 \\
\frac{2}{3} \ ; &\ h=\pm1\\
\frac{1}{3} \ ; &\ h=\pm2\\
0 \ ; &\ |h|>2\\
\end{aligned}
\right.
$$

#### Example 1.21 A Random Walk is Not Stationary
- A random walk is not stationary because
- its autocovariance function $\gamma_x(s,t)=\min\{s,t\}\sigma_w^2$
- and the mean of a random walk with drift, $\mu_{xt}=\delta t$,
- are both functions of time $t$
#### Example 1.22 Trend Stationarity
model: $x_t=\alpha+\beta t+y_t$ where $y_t$ is white noise (stationary)
- Mean function: $\mu_{x, t}=E(x_t)=\alpha +\beta t+\mu_y$ (not free of time $t$, so the mean is not stationary)
- Autocovariance function: $\gamma_x(h)=cov(x_{t+h}, x_t)=E[(x_{t+h}-\mu_{x, t+h})(x_{t}-\mu_{x, t})]=E[(y_{t+h}-\mu_{y})(y_{t}-\mu_{y})]=\gamma_y(h)$
- the model may be considered as having stationary behavior around a linear trend (called **trend stationarity**)

### Special properties (ACF of a stationary process)
- $\gamma(h)$ is non-negative definite ( Problem 1.25 [(a)](https://stats.stackexchange.com/questions/431429/show-that-the-autocovariance-function-of-stationary-process-x-t-is-positiv))
- This guarantees that the variance of any linear combination of the $x_t$ is non-negative, i.e.:
$$
0 \leq var(a_1x_1+\dots+a_nx_n)=\sum_{j=1}^{n}\sum_{k=1}^n a_ja_{k}\gamma(j-k)
$$
- Cauchy-Schwarz inequality: $|\gamma(h)| \leq \gamma(0)$
- $\gamma(h)=\gamma(-h)$
proof: $\gamma((t+h)-t)=cov(x_{t+h}, x_t)=cov(x_t, x_{t+h})=\gamma(t-(t+h))$
### Definition 1.10: cross-covariance function
Two time series $x_t$ and $y_t$
$$
\gamma_{xy}(h)=cov(x_{t+h}, y_t)=E[(x_{t+h}-\mu_x)(y_t-\mu_y)]
$$
### Definition 1.11: cross-correlation function
Two time series $x_t$ and $y_t$
$$
\rho_{xy}(h)=\frac{\gamma_{xy}(h)}{\sqrt{\gamma_x(0)\gamma_y(0)}}
$$
- properties:
- $\gamma_{xy}((t+h)-t)=cov(x_{t+h}, y_t)=cov(y_{t}, x_{t+h})=\gamma_{yx}(-h)$
- $\gamma_{xy}(h)=\gamma_{yx}(-h)$ (symmetric across the two series, not in $h$ alone)
- $\rho_{xy}(h)=\rho_{yx}(-h)$
#### Example 1.24 Prediction Using Cross-Correlation
model: $y_t=Ax_{t-l}+w_t$
property: the series $x_t$ is said to lead $y_t$ for $l>0$ and to lag $y_t$ for $l<0$
- $\gamma_{yx}(h)=cov(y_{t+h}, x_t)=cov(Ax_{t+h-l}+w_{t+h},\ x_t)=cov(Ax_{t+h-l},\ x_t)=A\gamma_x(h-l)$
- By the Cauchy–Schwarz inequality, $|\gamma_x(h-l)| \leq \gamma_x(0)$, so $\gamma_{yx}(h)$ attains its maximum at $h=l$.
- The figure below shows a simulated cross-covariance with $l=5$; the largest value indeed occurs at lag 5.

### Definition 1.12 linear process:
A linear process $x_t$ is defined to be a linear combination of white noise variates $w_t$, given by
$$
x_t=\mu+\sum_{j=-\infty}^{\infty}\psi_jw_{t-j}, \ \sum_{j=-\infty}^{\infty}|\psi_j|<\infty
$$
- Problem 1.11 $\gamma_x(h)=\sigma_w^2\sum_{j=-\infty}^{\infty}\psi_{j+h}\psi_j$
- The condition for the linear process to have finite variance is $\sum_{j=-\infty}^{\infty}\psi_j^2<\infty$.
### Definition 1.13 Gaussian process
$\{x_t\}$ is a Gaussian process if, for every collection of distinct time points $t_1, t_2, \ldots, t_n$ and every positive integer $n$, the $n$-dimensional vector $x=(x_{t_1}, \ldots, x_{t_n})'$ has a multivariate normal distribution.
- mean vector: $E(x)=\mu=(\mu_{t_1}, \mu_{t_2}, ..., \mu_{t_n})'$
- covariance matrix: $var(x)=\Gamma=\{\gamma(t_i, t_j); i, j=1, ..., n\}$, which is assumed to be positive definite
- density function:
$f(x)=\frac{1}{(2\pi)^{n/2}}|\Gamma|^{-1/2}\exp\{-\frac{1}{2}(x-\mu)'\Gamma^{-1}(x-\mu)\}$
Important property:
- If $\{x_t\}$ is weakly stationary, then $\mu_t$ is constant and $\gamma(t_i, t_j)=\gamma(|t_i-t_j|)$, so a weakly stationary Gaussian process is also strictly stationary.
## 1.5 Estimation of Correlation
If the time series is stationary, then:
- The mean is a constant $\mu_t=\mu$ and can be estimated by the sample mean $\bar{x}$:
$$
\bar{x}=\frac{1}{n}\sum_{t=1}^{n} x_t
$$
- The variance of the sample mean involves all the autocovariances:
$$
var(\bar{x})=var\Big(\frac{1}{n}\sum_{t=1}^n x_t\Big)=\frac{1}{n^2}cov\Big(\sum_{t=1}^{n} x_t, \sum_{s=1}^{n} x_s\Big)=\frac{1}{n^2}\big(n\gamma_x(0)+2(n-1)\gamma_x(1)+\cdots+2\gamma_x(n-1)\big)=\frac{1}{n}\sum_{h=-n}^{n}\Big(1-\frac{|h|}{n}\Big)\gamma_x(h)
$$
### Definition 1.14 The sample autocovariance function
$$
\hat{\gamma}(h)=\frac{1}{n}\sum_{t=1}^{n-h}(x_{t+h}-\bar{x})(x_t-\bar{x})
$$
with $\hat{\gamma}(-h)=\hat{\gamma}(h)$ for $h=0, 1, \ldots, n-1$.
[Problem 1.25 (b)](https://www.stat.berkeley.edu/~bartlett/courses/153-fall2010/lectures/4.pdf)
### Definition 1.15 The sample autocorrelation function
$$
\hat{\rho}(h)=\frac{\hat{\gamma}(h)}{\hat\gamma(0)}
$$
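A minimal implementation sketch of Definitions 1.14 and 1.15 (assuming `numpy`; the function name `sample_acf` is ours):
```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocovariance gamma_hat(h) and autocorrelation rho_hat(h)."""
    x = np.asarray(x, dtype=float)
    n, xd = len(x), x - x.mean()
    gamma = np.array([np.sum(xd[h:] * xd[:n - h]) / n for h in range(max_lag + 1)])
    return gamma, gamma / gamma[0]

gamma_hat, rho_hat = sample_acf(np.random.default_rng(0).normal(size=500), 10)
print(rho_hat)   # for white noise, rho_hat(h) ~ 0 for all h >= 1
```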
## 1.6 Vector-Valued and Multidimensional Series
- Consider $x_t =(x_{t1},\ldots,x_{tp})^\top$, a vector time series that contains as its components $p$ univariate time series.
**For the stationary case,**
- $p\times1$ mean vector:
$$\mu = E(x_t)=(\mu_{t1},\ldots,\mu_{tp})^\top$$
- $p\times p$ autocovariance matrix:
$$\Gamma(h)=E[(x_{t+h}-\mu)(x_t-\mu)^\top]$$
# 2. Time Series Regression and Exploratory Data Analysis
- Multiple linear regression in a time series context
- Model selection
- Exploratory data analysis
- Preprocessing nonstationary time series
## 2.1 Classical Regression in the Time Series Context
Model a time series using linear regression:
$$x_t =\beta_0+\beta_1z_{t1}+\ldots +\beta_qz_{tq}+w_t$$
- $\{w_t\}$ is a random error, often assumed to be iid $N(0,\sigma_w^2)$.
- For time series regression, it is rarely the case that the noise is white, and we will need to eventually relax that assumption.
- $\beta = (\beta_0,\beta_1,\ldots,\beta_q)'$
- The OLS estimate $\hat{\beta}$ is obtained by minimizing $\text{SSE}=\sum_{t=1}^n(x_t-\beta'z_t)^2$ over $\beta$, where $z_t=(1,z_{t1},\ldots,z_{tq})'$.
### Model Selection
- Select the best subset of independent variables.
Consider a subset of the variables, $z_{t,1:r}=\{z_{t1},\ldots ,z_{tr}\},\ r<q$; the reduced model is:
$$x_t =\beta_0+\beta_1z_{t1}+\ldots +\beta_rz_{tr}+w_t$$
Compare the reduced model with the full model by testing $H_0:\beta_{r+1}=\ldots= \beta_q=0$.
- Test statistic:
$$F=\frac{(\text{SSE}_r-\text{SSE})/(q-r)}{\text{SSE}/(n-q-1)}$$
If $H_0$ holds, $\text{SSE}_r\approx \text{SSE}$, because $\hat\beta_{r+1},\ldots, \hat\beta_q$ will be close to 0.
- Variables can also be selected one at a time, a procedure called *stepwise multiple regression*.
Sometimes, instead of a stepwise search, we compare several specific models directly using the criteria below.
- Consider a model with $k$ regressors and the MLE of the variance:
$$\hat{\sigma}_k^2 = \frac{\text{SSE}(k)}{n}$$
### AIC
$$\text{AIC}=\log\hat{\sigma}^2_k+\frac{n+2k}{n}$$
- The $k$ minimizing AIC indicates the best model.
### AICc
$$\text{AICc}=\log\hat{\sigma}_k^2+\frac{n+k}{n-k-2}$$
- Preferred over AIC in small samples.
### BIC
$$\text{BIC}=\log\hat{\sigma}_k^2+\frac{k\log n }{n}$$
- BIC penalizes parameters much more heavily than AIC, so it tends to select smaller models (see the sketch below).
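A small sketch (assuming `numpy`; the function name `info_criteria` is ours) computing the three criteria exactly as defined above, from a model's SSE, sample size $n$, and number of regressors $k$:
```python
import numpy as np

def info_criteria(sse, n, k):
    """AIC, AICc, and BIC from the formulas above, with sigma_hat_k^2 = SSE(k)/n."""
    log_s2 = np.log(sse / n)
    aic = log_s2 + (n + 2 * k) / n
    aicc = log_s2 + (n + k) / (n - k - 2)
    bic = log_s2 + k * np.log(n) / n
    return aic, aicc, bic
```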
## 2.2 Exploratory Data Analysis
The simplest model for a trend stationary series is:
\begin{align}
x_t=\mu_t+y_t
\end{align}
where $x_t$: observations, $\mu_t$: trend, and $y_t$: stationary process.
A strong trend often obscures the stationary process $y_t$, so the first step of exploratory analysis is to remove the trend: form a reasonable estimate $\hat \mu_t$ of the trend component and take the detrended series
$$
\hat y_t=x_t-\hat \mu_t
$$
#### Example 2.4: Detrending Chicken Prices
This example assumes the model $x_t=\mu_t+y_t$ for the data.
**OLS:**
Remove the trend using the regression fitted to the chicken data in Example 2.1; the trend model is:
$$
\mu_t = \beta_0+\beta_1t
$$
Fitting by OLS gives $\hat \mu_t=-7131.02+3.59t$, so the detrended series is
$$
\hat y_t=x_t+7131.02-3.59t
$$
**Differencing:**
From Chapter 1, if the trend $\mu_t$ is a random walk with drift, then $x_t-x_{t-1}$ is stationary, so differencing can also render the data stationary:
\begin{aligned}
x_t-x_{t-1}&=(\mu_t+y_t)-(\mu_{t-1}+y_{t-1})\\
&=\delta+w_t+y_t-y_{t-1}
\end{aligned}
The figure below shows the series detrended by each of these two methods:

### difference
1. An advantage of differencing is that no parameters need be estimated; the drawback is that it provides no estimate of $y_t$. If the goal is only to make the series stationary, differencing is appropriate (see the sketch after the operator definitions below).
2. first difference: $\nabla x_t=x_t-x_{t-1}$
3. the first difference removes a linear trend; a second difference removes a quadratic trend.
### Definition: backshift operator
* backshift operator: $Bx_t=x_{t-1}$ (extended: $B^kx_t=x_{t-k}$)
* forward-shift operator: $B^{-1}x_t=x_{t+1}$, so that $x_t=B^{-1}Bx_t=B^{-1}x_{t-1}$
* $\nabla x_t=(1-B)x_t$
* $\nabla^d=(1-B)^d$
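In code, differencing needs no operator machinery; a sketch with `numpy` (the simulated random walk is illustrative):
```python
import numpy as np

x = np.cumsum(np.random.default_rng(4).normal(0, 1, 100))  # a random walk

dx = np.diff(x)        # first difference:  (1 - B) x_t = x_t - x_{t-1}
d2x = np.diff(x, n=2)  # second difference: (1 - B)^2 x_t, removes a quadratic trend
```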
#### Example 2.5: Differencing Chicken Prices
The figure above shows that after first differencing, the trend visible in the detrended series is gone; the ACF below shows an annual cycle in the first-differenced series.

#### Example 2.6: Differencing Global Temperature
The top panel of the figure below suggests the data behave more like a random walk than a trend stationary series, so differencing is the more appropriate choice.

### fractional differencing
Handles difference orders $-0.5<d<0.5$; long-memory series (e.g., hydrological data) often exhibit $0<d<0.5$.
* Log transform: $y_t=\log x_t$
* Box-Cox family:
$$
y_t=\left\{
\begin{aligned}
(x_t^\lambda-1)/\lambda \ ; & \ \lambda \neq 0 \\
\log x_t \ ; & \ \lambda=0 \\
\end{aligned}
\right.
$$
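A direct sketch of the Box-Cox family (assuming `numpy`; the function name is ours — `scipy.stats.boxcox` offers the same transform with automatic $\lambda$ selection):
```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform: (x^lam - 1)/lam for lam != 0, log(x) for lam == 0."""
    x = np.asarray(x, dtype=float)            # requires x > 0
    return np.log(x) if lam == 0 else (x ** lam - 1) / lam
```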
### Scatterplot Matrices
The ACF tells us about the linear relationship between $x_t$ and $x_{t-h}$ but can miss nonlinear relationships, so scatterplots are also needed.
#### Example 2.8: Scatterplot Matrices, SOI and Recruitment
The red lines are locally weighted scatterplot smoothing (lowess) lines, which help reveal nonlinear relationships.
- $S_t$ has strong positive linear relationships with $S_{t-1}, S_{t-2}, S_{t-11}, S_{t-12}$

- $R_t$ has strong nonlinear relationships with $S_{t-5}, S_{t-6},S_{t-7},S_{t-8}$

#### Example 2.9: Regression with Lagged Variables
Consider adding a dummy variable:
$$
R_t=\beta_0+\beta_1S_{t-6}+\beta_2D_{t-6}+\beta_3D_{t-6}S_{t-6}+w_t
$$
where $D_t=0$ if $S_t<0$ and $D_t=1$ otherwise.
The figures below show that the regression fit is close to the lowess fit, but the bottom two panels show the residuals are still not white noise.


### periodic behavior
#### Example 2.10: Using Regression to Discover a Signal in Noise
Generate 500 observations from the model:
$$
x_t=A\cos(2\pi \omega t+\phi)+w_t
$$
where $\omega=1/50,\ A=2,\ \phi=0.6 \pi,\ \sigma_w=5$.
Assume $\omega=1/50$ is known while $A$ and $\phi$ are unknown. The identity $A\cos(2\pi\omega t+\phi)=A\cos(\phi)\cos(2\pi\omega t)-A\sin(\phi)\sin(2\pi\omega t)$, with $\beta_1=A\cos\phi$ and $\beta_2=-A\sin\phi$, converts the right-hand side to:
$$
x_t=\beta_1\cos(2\pi t/50)+\beta_2\sin(2\pi t/50)+w_t
$$
The regression estimates $\hat \beta_1$ and $\hat \beta_2$ can then be used to detect cyclic or periodic signals (see the sketch below).
The fit gives $\hat \beta_1=-0.74$ and $\hat \beta_2=-1.99$, versus the true values $\beta_1=-0.62$ and $\beta_2=-1.9$; related discussion continues in Chapter 4.

## 2.3 Smoothing in the Time Series Context
Smoothing a noisy series with a moving average can reveal its long-term trend and seasonal components.
$$
m_t=\sum_{j=-k}^k a_jx_{t-j}
$$
where $a_j=a_{-j}\geq 0$ and $\sum_{j=-k}^k a_j=1$ (a symmetric moving average)
#### Example 2.11 Moving Average Smoother
The curve produced by this method is still relatively rough.

#### Example 2.12 Kernel Smoothing
This is also a moving average smoother; the weights for averaging the observations come from a kernel function.
$$
m_t=\sum_{i=1}^n w_i(t)x_i
$$
where $w_i(t)=K(\frac{t-i}{b})/\sum_{j=1}^n K(\frac{t-j}{b})$ and $K(\cdot)$ is a kernel function, typically the normal density; a larger bandwidth $b$ yields a smoother curve (see the sketch below).

#### Example 2.13 Lowess
- Another method is nearest neighbor regression, based on $k$-nearest-neighbors regression: the basic idea is to predict $x_t$ by a regression using only $\{x_{t-k/2}, \dots, x_t, \dots, x_{t+k/2}\}$ and to set $m_t=\hat x_t$; lowess as actually implemented is more elaborate.
- One must therefore choose how much nearby data to use in predicting $m_t$: in the figure below, the El Niño cycle is captured using 5% of the data (blue line), while the overall trend uses the default 2/3 (red line).

#### Example 2.14 Smoothing Splines
- An intuitive approach is to fit a polynomial regression in time $t$, for example a cubic polynomial:
$$
x_t=\beta_0+\beta_1t+\beta_2t^2+\beta_3t^3+e_t=m_t+e_t
$$
- A more refined method divides time $t=1, \dots, n$ into $k$ intervals $[t_0=1, t_1], [t_1+1, t_2], \dots, [t_{k-1}+1, t_k=n]$; the endpoints $t_0, t_1, \dots, t_k$ are called knots. A polynomial in $t$ is then fit within each interval; with cubic polynomials, the fit is called a cubic spline.
- This approach, called **smoothing splines**, minimizes the expression below; the choice of $\lambda$ trades off the regression fit (completely smooth) against the data themselves (no smoothness): larger $\lambda$ yields a smoother fit (a `scipy` sketch follows the figure).
$$
\sum_{t=1}^n [x_t-m_t]^2+\lambda\int(m_t'')^2dt
$$

#### Example 2.15 Smoothing One Series as a Function of Another
The lowess line suggests mortality is lowest when the temperature is around 83°F.
