---
title: Time Series Part 2 - Forecasting
tags: Machine Learning, CoderSchool
---
# The ARIMA Time Series Model
One of the most common methods used in time series forecasting is known as the ARIMA model, which stands for **A**uto**R**egressive **I**ntegrated **M**oving **A**verage. ARIMA is a model that can be fitted to time series data in order to better understand the data or to predict future points in the series.
There are three distinct integers (`p`, `d`, `q`) that are used to parametrize ARIMA models. Because of that, ARIMA models are denoted with the notation `ARIMA(p, d, q)`. Together these three parameters account for autocorrelation, trend, and noise in the data (seasonality is handled by the seasonal extension described below):
- `p` is the _auto-regressive_ part of the model. It allows us to incorporate the effect of past values into our model. Intuitively, this would be similar to stating that it is likely to be warm tomorrow if it has been warm the past 3 days.
- `d` is the _integrated_ part of the model. It sets the amount of differencing to apply to the time series, i.e. the number of times past values are subtracted from the current values. Intuitively, this would be similar to stating that tomorrow is likely to be the same temperature as today if the difference in temperature over the last three days has been very small.
- `q` is the _moving average_ part of the model. This allows us to set the error of our model as a linear combination of the error values observed at previous time points in the past.
When dealing with seasonal effects, we make use of the _seasonal_ ARIMA, which is denoted as `ARIMA(p,d,q)(P,D,Q)s`. Here, `(p, d, q)` are the non-seasonal parameters described above, while `(P, D, Q)` follow the same definition but are applied to the seasonal component of the time series. The term `s` is the periodicity of the time series (`4` for quarterly data, `12` for monthly data with a yearly cycle, etc.).
The seasonal ARIMA method can appear daunting because of the multiple tuning parameters involved. In the next section, we will describe how to automate the process of identifying the optimal set of parameters for the seasonal ARIMA time series model.
## Autoregressive (AR) Models
$X_t=δ+ϕ_1X_{t−1}+ϕ_2X_{t−2}+⋯+ϕ_pX_{t−p}+A_t$
where $X_t$ is the time series, $A_t$ is white noise, and $δ=(1−\sum_{i=1}^pϕ_i)μ$
with $μ$ denoting the process mean.
An autoregressive model is simply a linear regression of the current value of the series against one or more prior values of the series. **$p$ is the order of the AR model.**
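Since an AR model is just a linear regression on lagged values, an ordinary least-squares fit on a simulated series recovers the coefficients. A minimal sketch with `numpy` (the AR(2) coefficients 0.6 and -0.3 are illustrative):

```python
# Sketch: an AR(2) process is a linear regression of X_t on X_{t-1}, X_{t-2}.
import numpy as np

rng = np.random.default_rng(1)
phi1, phi2, n = 0.6, -0.3, 5000
x = np.zeros(n)
for t in range(2, n):
    x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + rng.normal()

# ordinary least squares of x_t on its two lags recovers phi1, phi2
X = np.column_stack([x[1:-1], x[:-2]])
coef, *_ = np.linalg.lstsq(X, x[2:], rcond=None)
print(coef)  # close to [0.6, -0.3]
```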
## Moving Average (MA) Models
$X_t=μ+A_t−θ_1A_{t−1}−θ_2A_{t−2}−⋯−θ_qA_{t−q}$
where $X_t$ is the time series, $μ$ is the mean of the series, $A_{t−i}$ are white noise terms, and $θ_1,…,θ_q$ are the parameters of the model. **$q$ is the order of the MA model.**
That is, a moving average model is conceptually a linear regression of the current value of the series against the white noise or random shocks of one or more prior values of the series. The random shocks at each point are assumed to come from the same distribution, typically a normal distribution, with location at zero and constant scale. The distinction in this model is that these random shocks are propagated to future values of the time series. Fitting the MA estimates is more complicated than with AR models because the error terms are not observable. This means that iterative non-linear fitting procedures need to be used in place of linear least squares. MA models also have a less obvious interpretation than AR models.
## ARMA
When no differencing is involved, the abbreviation ARMA may be used. ARMA is a combination of the AR and MA models:
$X_t=δ+ϕ_1X_{t−1}+ϕ_2X_{t−2}+⋯+ϕ_pX_{t−p}+A_t−θ_1A_{t−1}−θ_2A_{t−2}−⋯−θ_qA_{t−q}$
where the terms in the equation have the same meaning as given for the AR and MA model.
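`statsmodels` can also simulate an ARMA process directly; a short sketch (coefficients are illustrative). Note its lag-polynomial convention $\phi(B)X_t = \theta(B)A_t$: the AR coefficients are passed with flipped signs, and the MA coefficients use a plus sign rather than the minus signs in the equation above.

```python
# Sketch: simulate 500 points of an ARMA(1, 1) process.
import numpy as np
from statsmodels.tsa.arima_process import arma_generate_sample

np.random.seed(6)
ar = [1, -0.7]   # phi_1 = 0.7 in the notation above
ma = [1, 0.4]    # MA term in statsmodels' plus-sign convention
x = arma_generate_sample(ar, ma, nsample=500)
print(x.shape)   # (500,)
```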
## Specify the Elements of the Model
In most software programs, the elements in the model are specified in the order **(AR order, differencing, MA order)**. As examples,
- A model with (only) two AR terms would be specified as an ARIMA of order (2,0,0).
- An MA(2) model would be specified as an ARIMA of order (0,0,2).
- A model with one AR term, a first difference, and one MA term would have order (1,1,1).
- For the last model, ARIMA(1,1,1), a model with one AR term and one MA term is being applied to the variable $z_t=x_t-x_{t-1}$. A first difference might be used to account for a linear trend in the data.
The differencing order refers to successive first differences. For example, for a difference order = 2 the variable analyzed is $z_t = (x_t-x_{t-1}) - (x_{t-1}-x_{t-2})$, the first difference of first differences. This type of difference might account for a quadratic trend in the data.
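The two differencing schemes are easy to check numerically; with `numpy`, a noise-free quadratic trend becomes constant after two differences (the trend coefficients below are illustrative):

```python
# Sketch: first and second differences; a first difference removes a linear
# trend, a second difference removes a quadratic trend.
import numpy as np

t = np.arange(10, dtype=float)
quad = 2.0 + 3.0 * t + 0.5 * t**2   # quadratic trend, no noise

d1 = np.diff(quad)                   # z_t = x_t - x_{t-1}
d2 = np.diff(quad, n=2)              # first difference of first differences
print(d1)  # still trending (linear)
print(d2)  # constant: the quadratic trend is gone
```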
## Identify a Possible Model
Three items should be considered to determine a first guess at an ARIMA model: **a time series plot of the data, the ACF, and the PACF.**
### Time series plot of the observed series
What to look for: possible trend, seasonality, outliers, constant variance or nonconstant variance.
- You won’t be able to spot any particular model by looking at this plot, but you will be able to see the need for various possible actions.
- If there’s an obvious upward or downward linear trend, a first difference may be needed. A quadratic trend might need a 2nd order difference (as described above). We rarely want to go much beyond two; in those cases, we might want to think about things like smoothing. Over-differencing can introduce unnecessary levels of dependency (difference white noise and you obtain an MA(1); difference again and you obtain an MA(2), etc.).
- For data with a curved upward trend accompanied by increasing variance, you should consider transforming the series with either a logarithm or a square root.
**_Note_**: Nonconstant variance in a series with no trend may have to be addressed with something like an ARCH model which includes a model for changing variation over time.
### ACF and PACF
The ACF and PACF should be considered together. It can sometimes be tricky going, but a few combined patterns do stand out.
- AR models have theoretical PACFs with non-zero values at the AR terms in the model and zero values elsewhere. The ACF will taper to zero in some fashion:
  - An AR(1) model has an ACF with the pattern $ρ_k=ρ^k_1$.
  - An AR(2) model has a sinusoidal ACF that converges to 0.
- MA models have theoretical ACFs with non-zero values at the MA terms in the model and zero values elsewhere, while the PACF tapers to zero in some fashion.
- ARMA models (including both AR and MA terms) have ACFs and PACFs that both tail off to 0. These are the trickiest because the order will not be particularly obvious. Basically you just have to guess that one or two terms of each type may be needed and then see what happens when you estimate the model.
- If the ACF and PACF do not tail off, but instead have values that stay close to 1 over many lags, the series is non-stationary and differencing will be needed. Try a first difference and then look at the ACF and PACF of the differenced data.
- If all autocorrelations are non-significant, then the series is random (white noise): the data are independent and identically distributed, so there is no time structure left to model. You’re done at that point.
- If you have taken first differences and all autocorrelations are non-significant, then the series is called a random walk and you are done. (A possible model for a random walk is $x_t = δ + x_{t-1} + w_t$. The data are dependent and are not identically distributed; in fact both the mean and variance are increasing through time.)
**_Note_**: You might also consider examining plots of $x_t$ versus various lags of $x_t$.
## Estimate and Diagnose a Possible Model
After you’ve made a guess (or two) at a possible model, use software such as R, Minitab, or SAS to estimate the coefficients. Most software will use maximum likelihood estimation methods to make the estimates. Once the model has been estimated, do the following.
- Look at the significance of the coefficients. In R, `sarima` provides p-values, so you may simply compare each p-value to the standard 0.05 cut-off. The `arima` command does not provide p-values, so you can calculate a t-statistic yourself: t = estimated coefficient / standard error of the coefficient. Recall that $t_{α,df}$ is the Student $t$-value with area "$α$" to the right of $t_{α,df}$ on $df$ degrees of freedom. If $|t| > t_{.025,(n-p-q-1)}$, then the estimated coefficient is significantly different from 0. When $n$ is large, you may simply compare the ratio to 1.96.
- Look at the ACF of the residuals. For a good model, all autocorrelations for the residual series should be non-significant. If this isn’t the case, you need to try a different model.
- Look at Box-Pierce (Ljung) tests for possible residual autocorrelation at various lags (see [Lesson 3.2](https://newonlinecourses.science.psu.edu/stat510/lesson/3/3.2) for a description of this test).
- If non-constant variance is a concern, look at a plot of residuals versus fits and/or a time series plot of the residuals.
If something looks wrong, you’ll have to revise your guess at what the model might be. This might involve adding parameters or re-interpreting the original ACF and PACF to possibly move in a different direction.
## What If More Than One Model Looks Okay?
Sometimes more than one model can seem to work for the same dataset. When that’s the case, some things you can do to decide between the models are:
- Possibly choose the model with the fewest parameters.
- Examine standard errors of forecast values. Pick the model with the generally lowest standard errors for predictions of the future.
- Compare models with regard to statistics such as the MSE (the estimate of the variance of the $w_t$), AIC, AICc, and SIC (also called BIC). Lower values of these statistics are desirable.
One reason that two models may seem to give about the same results is that, with certain coefficient values, two different models can sometimes be nearly equivalent when they are each converted to an infinite order MA model. (Every ARIMA model can be converted to an infinite order MA; this is useful for some theoretical work, including the determination of standard errors for forecast errors.) More about this in [Lesson 3.2](https://newonlinecourses.science.psu.edu/stat510/lesson/3/3.2).
---
# Sources
1. https://www.machinelearningplus.com/time-series/arima-model-time-series-forecasting-python/
2. https://www.digitalocean.com/community/tutorials/a-guide-to-time-series-forecasting-with-prophet-in-python-3
3. https://www.digitalocean.com/community/tutorials/a-guide-to-time-series-forecasting-with-arima-in-python-3
4. https://github.com/susanli2016/Machine-Learning-with-Python/blob/master/Time%20Series%20Forecastings.ipynb
5. https://newonlinecourses.science.psu.edu/stat510/lesson/3/3.1
6. https://newonlinecourses.science.psu.edu/stat510/lesson/4/4.1