---
title: "119020033_Project"
author: "Puyuan Liu"
date: "2022/5/10"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r}
app <- get(load("AppData.RData"))
```
#1. Overview of the time series pattern
```{r}
ts_a = ts(app$A)
ts_b = ts(app$B)
ts_c = ts(app$C)
ts_d = ts(app$D)
ts_e = ts(app$E)
ts_f = ts(app$F)
ts_g = ts(app$G)
ts_h = ts(app$H)
ts_o = ts(app$O)
ts_list = list(ts_a, ts_b, ts_c, ts_d, ts_e, ts_f, ts_g, ts_h, ts_o)
par(mar = c(2, 2, 1, 1), mfrow = c(3,1))
plot(ts_a[1:91], type="l")
plot(ts_b[1:91], type="l")
plot(ts_c[1:91], type="l")
```
```{r}
par(mar = c(2, 2, 1, 1), mfrow = c(3,1))
plot(ts_d[1:91], type="l")
plot(ts_e[1:91], type="l")
plot(ts_f[1:91], type="l")
```
```{r}
par(mar = c(2, 2, 1, 1), mfrow = c(3,1))
plot(ts_g[1:91], type="l")
plot(ts_h[1:91], type="l")
plot(ts_o[1:91], type="l")
```
We can infer from the graphs that the data points follows an overall downward trend from around day 40 to day 75, and goes upward and remain stable afterwards.
```{r}
par(mar = c(2, 2, 1, 1), mfrow = c(3,1))
plot(acf(ts_a[1:91]))
plot(acf(ts_c[1:91]))
plot(acf(ts_f[1:91]))
```
```{r}
par(mar = c(2, 2, 1, 1), mfrow = c(3,1))
plot(pacf(ts_a[1:91]))
plot(pacf(ts_c[1:91]))
plot(pacf(ts_f[1:91]))
```
**From the above plot, we can see that the ACFs are tails off, and the PACF sems to cut off after lag 1 or 2 or 3. Which implies that it can be a AR(p) model.**
We also consider in another way:
The pattern of data appears to behave more as a random walk than a trend stationary series. Hence, rather than detrend the data, it would be more appropriate to use differencing to coerce it into stationarity.
The transformed data of App A:
```{r}
par(mar = c(3, 3, 1, 1), mfrow = c(2, 1))
diff_ts_a = diff(ts_a)
plot(diff_ts_a)
acf(diff_ts_a)
```
The transformed data of App A is shown above along with the corresponding sample ACF.
In this case it appears that the differenced process shows minimal autocorrelation, which may imply the time series is nearly a random walk with drift.
The mean of drift for the 9 categories of mobile app:
```{r}
diff_ts_list <- list(diff(ts_a), diff(ts_b), diff(ts_c), diff(ts_d), diff(ts_e), diff(ts_f), diff(ts_g), diff(ts_h), diff(ts_o))
lapply(diff_ts_list, mean)
```
Hence the preliminary values of d (d = 1) have been settled, the next step is to look at the sample ACF and PACF of $\nabla x_t$.
ACF and PACF for APP A:
```{r}
library(astsa)
acf2(diff_ts_a[1:91])
```
ACF and PACF for APP F:
```{r}
acf2(diff_ts_list[[6]][1:91])
```
ACF and PACF for APP C:
```{r}
acf2(diff_ts_list[[3]][1:91])
```
After plotting the ACF and PACF for the data of 9 categories of mobile apps, by inspecting the sample ACF and PACF, it seems that the ACF is tails off and the PACF is cutting off at lag 2.
This would suggest the difference of App data usage follows an AR(2) process, or the App data usage follows an ARIMA(2, 1, 0) model.
Rather than focus on one model, it also appears that the ACF is cutting off at lag 1 and the PACF is tails off. This suggests an MR(1) model for the difference of App data usage,
or ARIMA(0, 1, 1) for the App data usage.
As a preliminary analysis, we will fit both models.
## Fit ARIMA(2, 1, 0) to the App data usage. In this model, I continuously refit the ARIMA(2, 1, 0) model using the information up to day T and predict day T+1.
```{r}
library(forecast)
pred_list = c()
for (app in 1:9){
pred = c()
for (i in 1:60){
(reg = arima(ts_list[[app]][1:(91+i-1)], order = c(2, 1, 0)))
pred[i] = forecast(reg, h=1)$mean[1]
}
# print(pred)
pred_list = cbind(pred_list, pred)
}
```
### Calculate the MSE for the ARIMA(2, 1, 0) model Aaccording to the formula
$$MSE = \frac{1}{9\times60}\sum_{i=1}^9\sum_{s=1}^{60}(\hat{x}_{s,i}-x_{s,i})^2$$
```{r}
Cal_MSE <- function(pred){
sum_error_list = c()
for (i in 1:9){
sq_error_list = (ts_list[[i]][92:151]- pred[,i])^2
sum_error_list[i]= sum(sq_error_list)
}
MSE <- 1/(9*60)*sum(sum_error_list)
return(MSE)
}
Cal_MSE(pred_list)
```
## Fit ARIMA(0, 1, 1) to the App data usage. In this model, I continuously refit the ARIMA(0, 1, 1) model using the information up to day T and predict day T+1.
```{r}
library(forecast)
pred_list2 = c()
for (app in 1:9){
pred2 = c()
for (i in 1:60){
(reg2 = arima(ts_list[[app]][1:(91+i-1)], order = c(0, 1, 1)))
pred2[i] = forecast(reg2, h=1)$mean[1]
}
# print(pred)
pred_list2 = cbind(pred_list2, pred2)
}
```
### Calculate the MSE for the ARIMA(0, 1, 1) model
```{r}
Cal_MSE(pred_list2)
```
### However, the above MSE is not satisfying.
Actually, the original app data series tend to be stationary when t is large. So the following trials consider not to apply the difference step. We directly use ARMA model instead.
### fit the ARMA to the App data usage using to function $auto.arima$.
Use the information before March 1 to fit the model while using the **information up to day T** to predict day T+1 by the **moving windows**. (stationary = TRUE implies that we assume the time series is stationary and will not differentiate.)
```{r}
pred_list3 = c()
for (app in 1:9){
fit <- auto.arima(window(ts_list[[app]], end = 91),
ic = "aic",
stationary = T,
seasonal = F)
refit <- Arima(ts_list[[app]], model = fit)
fc <- window(fitted(refit), start = 92)
pred_list3 = cbind(pred_list3, fc)
}
cat("The MSE for auto.arima model with moving window is:", Cal_MSE(pred_list3))
```
### fit the AR(2) to the App data usage.
Use the information before March 1 to fit the model while using the **information up to day T** to predict day T+1 by the **moving windows**.
```{r}
pred_list5 = c()
for (app in 1:9){
fit <- Arima(window(ts_list[[app]], end = 91), order = c(2, 0, 0))
refit <- Arima(ts_list[[app]], model = fit)
fc <- window(fitted(refit), start = 92)
pred_list5 = cbind(pred_list5, fc)
}
cat("The MSE for AR(2) model with moving window is:", Cal_MSE(pred_list5))
```
After several trials, I found that instead of using the moving window, directly forecast the day T+1 values by the information up to day T based on the fitted model generate slightly better results:
```{r}
mse = rep(0, 9)
for (app in 1:9){
fit <- auto.arima(ts_list[[app]][1:91],
ic = "aic",
stationary = T,
seasonal = F)
refit <- Arima(ts_list[[app]][92:151], model = fit)
res = accuracy(refit)
mse[app] = as.data.frame((res))$RMSE^2
}
cat("The MSE for auto.arima model is:", mean(mse))
```
```{r}
mse = rep(0, 9)
for (app in 1:9){
fit <- Arima(ts_list[[app]][1:91], order = c(2, 0, 0))
refit <- Arima(ts_list[[app]][92:151], model = fit)
res = accuracy(refit)
mse[app] = as.data.frame((res))$RMSE^2
}
cat("The MSE for AR(2) model is:", mean(mse))
```
### From the above results, we found that the AR(2) model has the lowest MSE. Hence the AR(2) model can be the best ARIMA model.
# 2. GARCH
Since the App F has the largest value of data usage, we analyse the ACF and PACF of the squared residuals of App data:
```{r}
library(astsa)
u = sarima(ts_f[1:91], 2, 0, 0)
acf2(resid(u$fit)^2)
```
The above plot shows the ACF and PACF of the squared residuals. It appears that there may be some dependence, albeit small, left in the residuals.
We used the R package fGarch to fit an AR(2)-ARCH(1) model to the app data with the following results.
```{r}
library(fGarch)
garch_mdl <- garchFit(~arma(2,0)+garch(1,0), ts_f[1:91])
summary(garch_mdl)
```
```{r}
pred_list_garch = c()
for (app in 1:9){
reg_garch = garchFit(~arma(2,0)+garch(1,0), ts_list[[app]][1:91])
pred_garch = predict(reg_garch, n.ahead = 60)[, 1]
pred_list_garch = cbind(pred_list_garch, pred_garch)
}
cat("The MSE for AR(2)-ARCH(1) model is:", Cal_MSE(pred_list_garch))
```
```{r}
print(pred_list_garch)
```
```{r}
pred_list_garch = c()
for (app in 1:9){
pred_garch = c()
for (i in 1:60){
reg_garch = garchFit(~arma(2,0)+garch(1,0), ts_list[[app]][1:91])
pred_garch[i] = predict(reg_garch, n.ahead = 1)[1]
}
pred_list_garch = cbind(pred_list_garch, pred_garch)
}
# cat("The MSE for garch model is:", Cal_MSE(pred_list_garch))
```
```{r}
library(rugarch)
garchspec <- ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 0)), mean.model = list(armaOrder = c(2, 0)), distribution.model = "norm")
pred_garch = c()
for (app in (1:9)){
garch_fit <- ugarchfit(data = ts_list[[app]][1:91], spec = garchspec)
garch_forecast <- ugarchforecast(fitORspec = garch_fit, n.ahead = 60)
print(fitted(garch_forecast))
pred_garch = cbind(pred_garch, fitted(garch_forecast))
}
cat("The MSE for for AR(2)-ARCH(1) model with moving window is:", Cal_MSE(pred_garch))
```