---
title: Time Series Part 1 - Analysis
tags: Machine Learning, CoderSchool
---

This article mathematically explains the characteristics of time series accompanied by sample codes in Python. Please refer to [Time Series Part 2 - Forecasting](https://hackmd.io/jhj6qyxISoOe-vg92BE1TQ?view) for practical uses of time series analyzing and forecasting techniques.

# Overview

A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data.

One difference from standard linear regression is that the data are not necessarily independent and not necessarily identically distributed. One defining characteristic of time series is that this is a list of observations where the ordering matters. Ordering is very important because there is dependency and changing the order could change the meaning of the data.

## Univariate Time Series

A univariate time series is a series with a single time-dependent variable. For example, the dataset below consists of the temperature values (each hour) for the past 2 years. Temperature is dependent on Time.

![](https://i.imgur.com/s22SS4F.png)

To predict the temperature for the next few days, we will look at the past values of the variable Temperature and try to gauge and extract a pattern.

## Multivariate Time Series

A multivariate time series has more than one time-dependent variable. Each variable depends not only on its past values but also has some dependency on other variables. This dependency is used for forecasting future values.

Now suppose our dataset above also includes perspiration percent, dew point, wind speed, cloud cover percentage, etc. along with the temperature value for the past two years. In this case, there are multiple variables to be considered to optimally predict temperature. A series like this would fall under the category of multivariate time series.

![](https://i.imgur.com/Ep8dqgw.png)

## Basic Objectives of the Analysis

The basic objective usually is to determine a model that describes the pattern of the time series. Uses for such a model are:

1. To describe the important features of the time series pattern.
1. To explain how the past affects the future or how two time series can “interact”.
1. To forecast future values of the series.
1. To possibly serve as a control standard for a variable that measures the quality of product in some manufacturing situations.

## Important Characteristics to Consider First

Some important questions to consider when first looking at a time series are:

- Is there a **trend**, meaning that, on average, the measurements tend to increase (or decrease) over time?
- Is there **seasonality**, meaning that there is a regularly repeating pattern of highs and lows related to calendar time such as seasons, quarters, months, days of the week, and so on?
- Are there **outliers**? In regression, outliers are far away from your line. With time series data, your outliers are far away from your other data.
- Is there a **long-run cycle** or **period** unrelated to seasonality factors?
- Is there **constant variance** over time, or is the variance non-constant? (A quick rolling-statistics check for this and for the trend question is sketched right after this list.)
- Are there any **abrupt changes** to either the level of the series or the variance?
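Several of these questions — trend and constant variance in particular — can be probed informally before any modeling. The snippet below is a minimal added sketch (not part of the original examples) that plots a 12-period rolling mean and rolling standard deviation of the drug sales dataset used later in this article: a drifting rolling mean hints at a trend, and a growing rolling standard deviation hints at non-constant variance.

```python=
import pandas as pd
import matplotlib.pyplot as plt

# Drug sales dataset used throughout this article
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'], index_col='date')

# 12-month rolling statistics; the window length is an assumption (monthly data, yearly season)
rolling_mean = df['value'].rolling(window=12).mean()
rolling_std = df['value'].rolling(window=12).std()

plt.plot(df['value'], label='Original series')
plt.plot(rolling_mean, label='12-month rolling mean')
plt.plot(rolling_std, label='12-month rolling std')
plt.legend()
plt.show()
```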
### Example 1

The following plot is a time series plot of the annual number of earthquakes in the world with seismic magnitude over 7.0, for 99 consecutive years. By a time series plot, we simply mean that the variable is plotted against time.

![](https://i.imgur.com/lppxZOt.gif)

Some features of the plot:

- There is **no consistent trend** (upward or downward) over the entire time span. The series appears to slowly wander up and down. The horizontal line drawn at quakes = 20.2 indicates the mean of the series. Notice that the series tends to stay on the same side of the mean (above or below) for a while and then wanders to the other side.
- Almost by definition, there is **no seasonality** as the data are annual data.
- There are **no obvious outliers**.
- It’s difficult to judge whether the **variance** is constant or not.

### Example 2

The plot below shows a time series of quarterly production of beer in Australia for 18 years.

![](https://i.imgur.com/IC4I9Z4.gif)

Some important features are:

- There is an **upward trend**, possibly a curved one.
- There is **seasonality** – a regularly repeating pattern of highs and lows related to quarters of the year.
- There are **no obvious outliers**.
- There might be **increasing variation** as we move across time, although that’s uncertain.

# Data Preparation & Visualization

## Import Time Series

Let’s use ``read_csv()`` from the pandas package to read the time series dataset (a csv file on Australian Drug Sales) as a pandas dataframe. Adding the ``parse_dates=['date']`` argument will parse the date column as a date field.

In:
```python=
from dateutil.parser import parse
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

plt.rcParams.update({'figure.figsize': (10, 7), 'figure.dpi': 120})

# Import as Dataframe
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'])
df.head()
```

Out:

![](https://i.imgur.com/bP9AU9q.png)

Alternately, you can import it as a pandas Series with the date as index. You just need to specify the ``index_col`` argument in the ``pd.read_csv()`` to do this.

In:
```python=
ser = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'], index_col='date')
ser.head()
```

Out:

![](https://i.imgur.com/1rkvWeA.png)

## Treat Missing Values in a Time Series

When it comes to time series, you should typically **NOT replace** missing values with the mean of the series, especially if the series is not stationary. What you could do instead for a quick and dirty workaround is to forward-fill the previous value.

However, depending on the nature of the series, you want to try out multiple approaches before concluding. Some effective alternatives to imputation are:

1. Backward Fill
1. Linear Interpolation
1. Quadratic interpolation
1. Mean of nearest neighbors
1. Mean of seasonal counterparts

To measure the imputation performance, I manually introduce missing values to the time series, impute it with the above approaches and then measure the mean squared error of the imputed against the actual values.

In:
```python=
## Generate dataset
from scipy.interpolate import interp1d
from sklearn.metrics import mean_squared_error

df_orig = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'], index_col='date').head(100)
df = pd.read_csv('datasets/a10_missings.csv', parse_dates=['date'], index_col='date')

fig, axes = plt.subplots(7, 1, sharex=True, figsize=(10, 12))
plt.rcParams.update({'xtick.bottom' : False})

## 1. Actual -------------------------------
df_orig.plot(title='Actual', ax=axes[0], label='Actual', color='red', style=".-")
df.plot(title='Actual', ax=axes[0], label='Actual', color='green', style=".-")
axes[0].legend(["Missing Data", "Available Data"])

## 2. Forward Fill --------------------------
df_ffill = df.ffill()
error = np.round(mean_squared_error(df_orig['value'], df_ffill['value']), 2)
df_ffill['value'].plot(title='Forward Fill (MSE: ' + str(error) +")", ax=axes[1], label='Forward Fill', style=".-")

## 3. Backward Fill -------------------------
df_bfill = df.bfill()
error = np.round(mean_squared_error(df_orig['value'], df_bfill['value']), 2)
df_bfill['value'].plot(title="Backward Fill (MSE: " + str(error) +")", ax=axes[2], label='Back Fill', color='firebrick', style=".-")

## 4. Linear Interpolation ------------------
df['rownum'] = np.arange(df.shape[0])
df_nona = df.dropna(subset = ['value'])
f = interp1d(df_nona['rownum'], df_nona['value'])
df['linear_fill'] = f(df['rownum'])
error = np.round(mean_squared_error(df_orig['value'], df['linear_fill']), 2)
df['linear_fill'].plot(title="Linear Fill (MSE: " + str(error) +")", ax=axes[3], label='Linear Fill', color='brown', style=".-")

## 5. Cubic Interpolation --------------------
f2 = interp1d(df_nona['rownum'], df_nona['value'], kind='cubic')
df['cubic_fill'] = f2(df['rownum'])
error = np.round(mean_squared_error(df_orig['value'], df['cubic_fill']), 2)
df['cubic_fill'].plot(title="Cubic Fill (MSE: " + str(error) +")", ax=axes[4], label='Cubic Fill', color='red', style=".-")

# Interpolation References:
# https://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
# https://docs.scipy.org/doc/scipy/reference/interpolate.html

## 6. Mean of 'n' Nearest Past Neighbors ------
def knn_mean(ts, n):
    out = np.copy(ts)
    for i, val in enumerate(ts):
        if np.isnan(val):
            n_by_2 = np.ceil(n/2)
            lower = np.max([0, int(i-n_by_2)])
            upper = np.min([len(ts)+1, int(i+n_by_2)])
            ts_near = np.concatenate([ts[lower:i], ts[i:upper]])
            out[i] = np.nanmean(ts_near)
    return out

df['knn_mean'] = knn_mean(df.value.values, 8)
error = np.round(mean_squared_error(df_orig['value'], df['knn_mean']), 2)
df['knn_mean'].plot(title="KNN Mean (MSE: " + str(error) +")", ax=axes[5], label='KNN Mean', color='tomato', alpha=0.5, style=".-")

## 7. Seasonal Mean ----------------------------
def seasonal_mean(ts, n, lr=0.7):
    """
    Compute the mean of corresponding seasonal periods
    ts: 1D array-like of the time series
    n: Seasonal window length of the time series
    """
    out = np.copy(ts)
    for i, val in enumerate(ts):
        if np.isnan(val):
            ts_seas = ts[i-1::-n]  # previous seasons only
            if np.isnan(np.nanmean(ts_seas)):
                ts_seas = np.concatenate([ts[i-1::-n], ts[i::n]])  # previous and forward
            out[i] = np.nanmean(ts_seas) * lr
    return out

df['seasonal_mean'] = seasonal_mean(df.value, n=12, lr=1.25)
error = np.round(mean_squared_error(df_orig['value'], df['seasonal_mean']), 2)
df['seasonal_mean'].plot(title="Seasonal Mean (MSE: " + str(error) +")", ax=axes[6], label='Seasonal Mean', color='blue', alpha=0.5, style=".-")
```

Out:

![](https://i.imgur.com/RljoH6V.png)

You could also consider the following approaches depending on how accurate you want the imputations to be.

1. If you have explanatory variables, use a prediction model like random forest or k-Nearest Neighbors to predict the missing values.
1. If you have enough past observations, forecast the missing values.
1. If you have enough future observations, backcast the missing values.
1. Forecast of counterparts from previous cycles.
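As an aside, the simpler fills and interpolations above are also available as pandas one-liners, which is handy for quick experiments before writing custom imputation code. A minimal sketch, assuming the same dataframe ``df`` with gaps in its ``value`` column:

```python=
# Quick-and-dirty imputations with built-in pandas methods
ffilled = df['value'].ffill()                            # forward fill
bfilled = df['value'].bfill()                            # backward fill
lin_interp = df['value'].interpolate(method='linear')    # linear interpolation
time_interp = df['value'].interpolate(method='time')     # time-aware interpolation for a DatetimeIndex
```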
## Visualize a Time Series

In:
```python=
# Time series data source: fpp package in R.
import matplotlib.pyplot as plt
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'], index_col='date')

# Draw Plot
def plot_df(df, x, y, title="", xlabel='Date', ylabel='Value', dpi=100):
    plt.figure(figsize=(16,5), dpi=dpi)
    plt.plot(x, y, color='tab:red')
    plt.gca().set(title=title, xlabel=xlabel, ylabel=ylabel)
    plt.show()

plot_df(df, x=df.index, y=df.value, title='Monthly anti-diabetic drug sales in Australia from 1992 to 2008.')
```

Out:

![](https://i.imgur.com/g34lu5L.png)

Since all values are positive, you can show this on both sides of the Y axis to emphasize the growth.

In:
```python=
# Import data
df = pd.read_csv('datasets/AirPassengers.csv', parse_dates=['date'])
x = df['date'].values
y1 = df['value'].values

# Plot
fig, ax = plt.subplots(1, 1, figsize=(16,5), dpi= 120)
plt.fill_between(x, y1=y1, y2=-y1, alpha=0.5, linewidth=2, color='seagreen')
plt.ylim(-800, 800)
plt.title('Air Passengers (Two Side View)', fontsize=16)
plt.hlines(y=0, xmin=np.min(df.date), xmax=np.max(df.date), linewidth=.5)
plt.show()
```

Out:

![](https://i.imgur.com/ka9OsD0.png)

Since it’s a monthly time series and follows a certain repetitive pattern every year, you can plot each year as a separate line in the same plot. This lets you compare the year-wise patterns side-by-side.

### Seasonal Plot of a Time Series

```python=
# Import Data
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'], index_col='date')
df.reset_index(inplace=True)

# Prepare data
df['year'] = [d.year for d in df.date]
df['month'] = [d.strftime('%b') for d in df.date]
years = df['year'].unique()

# Prep Colors
np.random.seed(100)
mycolors = np.random.choice(list(mpl.colors.XKCD_COLORS.keys()), len(years), replace=False)

# Draw Plot
plt.figure(figsize=(16,12), dpi= 80)
for i, y in enumerate(years):
    if i > 0:
        plt.plot('month', 'value', data=df.loc[df.year==y, :], color=mycolors[i], label=y)
        plt.text(df.loc[df.year==y, :].shape[0]-.9, df.loc[df.year==y, 'value'][-1:].values[0], y, fontsize=12, color=mycolors[i])

# Decoration
plt.gca().set(xlim=(-0.3, 11), ylim=(2, 30), ylabel='$Drug Sales$', xlabel='$Month$')
plt.yticks(fontsize=12, alpha=.7)
plt.title("Seasonal Plot of Drug Sales Time Series", fontsize=20)
plt.show()
```

Out:

![](https://i.imgur.com/21lAlzw.png)

There is a steep fall in drug sales every February, rising again in March, falling again in April and so on. Clearly, the pattern repeats within a given year, every year. However, as years progress, the drug sales increase overall.

You can visualize this trend and how it varies each year in a year-wise boxplot. Likewise, you can do a month-wise boxplot to visualize the monthly distributions.

### Boxplot of Month-wise (Seasonal) and Year-wise (trend) Distribution

You can group the data at seasonal intervals and see how the values are distributed within a given year or month and how it compares over time.
In:
```python=
# Import Data
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'], index_col='date')
df.reset_index(inplace=True)

# Prepare data
df['year'] = [d.year for d in df.date]
df['month'] = [d.strftime('%b') for d in df.date]
years = df['year'].unique()

# Draw Plot
fig, axes = plt.subplots(1, 2, figsize=(20,7), dpi= 80)
sns.boxplot(x='year', y='value', data=df, ax=axes[0])
sns.boxplot(x='month', y='value', data=df.loc[~df.year.isin([1991, 2008]), :], ax=axes[1])

# Set Title
axes[0].set_title('Year-wise Box Plot\n(The Trend)', fontsize=18); axes[1].set_title('Month-wise Box Plot\n(The Seasonality)', fontsize=18)
plt.show()
```

Out:

![](https://i.imgur.com/tsjo6n8.png)

The boxplots make the year-wise and month-wise distributions evident. Also, in the month-wise boxplot, the months of December and January clearly have higher drug sales, which can be attributed to the holiday discount season.

## Patterns in a Time Series

Any time series may be split into the following components: **Base Level, Trend, Seasonality, Error**

In:
```python=
fig, axes = plt.subplots(1,3, figsize=(20,4), dpi=100)
pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/guinearice.csv', parse_dates=['date'], index_col='date').plot(title='Trend Only', legend=False, ax=axes[0])
pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/sunspotarea.csv', parse_dates=['date'], index_col='date').plot(title='Seasonality Only', legend=False, ax=axes[1])
pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/AirPassengers.csv', parse_dates=['date'], index_col='date').plot(title='Trend and Seasonality', legend=False, ax=axes[2])
```

Out:

![](https://i.imgur.com/gCjeZ7E.png)

Another aspect to consider is the cyclic behaviour. It happens when the rise and fall pattern in the series does **not** happen in fixed calendar-based intervals. Care should be taken to not confuse the ‘cyclic’ effect with the ‘seasonal’ effect. Unlike seasonality, cyclic effects are typically influenced by the business and other socio-economic factors.

# Decomposition

Decomposition procedures are used to describe the trend and seasonal factors in a time series. More extensive decompositions might also include long-run cycles, holiday effects, day of week effects and so on.

One of the main objectives for a decomposition is to estimate seasonal effects that can be used to create and present seasonally adjusted values. A seasonally adjusted value removes the seasonal effect from a value so that trends can be seen more clearly. For instance, in many regions of the U.S. unemployment tends to decrease in the summer due to increased employment in agricultural areas. Thus a drop in the unemployment rate in June compared to May doesn’t necessarily indicate that there’s a trend toward lower unemployment in the country. To see whether there is a real trend, we should adjust for the fact that unemployment is always lower in June than in May.

## Basic Structures

The following two structures are considered for basic decomposition models:

1. Additive: $x_t$ = Trend + Seasonal + Random (or Irregular)
2. Multiplicative: $x_t$ = Trend * Seasonal * Random (or Irregular)

## How to Choose Between Additive and Multiplicative Decompositions

The additive model is useful when the seasonal variation is relatively constant over time. The multiplicative model is useful when the seasonal variation increases over time.

![](https://i.imgur.com/T9Wx4d5.png)
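A useful corollary: taking the logarithm of a multiplicative series turns it into an (approximately) additive one, since $\log(x_t) = \log(\text{Trend}) + \log(\text{Seasonal}) + \log(\text{Random})$. The sketch below is an added aside illustrating this idea on the drug sales series, using the ``seasonal_decompose`` function that is introduced in the next section:

```python=
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                 parse_dates=['date'], index_col='date')

# An additive decomposition of the log-transformed series is roughly equivalent
# to a multiplicative decomposition of the original series.
result_log_add = seasonal_decompose(np.log(df['value']), model='additive', extrapolate_trend='freq')
result_log_add.plot()
plt.show()
```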
## Basic Steps in Decomposition

### Step 1: To estimate the trend.

* One approach is to estimate the trend with a smoothing procedure such as moving averages. With this approach no equation is used to describe trend.
* The second approach is to model the trend with a regression equation.

### Step 2: To “de-trend” the series.

For an additive decomposition, this is done by subtracting the trend estimates from the series. For a multiplicative decomposition, this is done by dividing the series by the trend values.

### Step 3: Seasonal factors are estimated using the de-trended series.

For monthly data, this entails estimating an effect for each month of the year. For quarterly data, this entails estimating an effect for each quarter. The simplest method for estimating these effects is to average the de-trended values for a specific season. For instance, to get a seasonal effect for January, we average the de-trended values for all Januarys in the series, and so on. The seasonal effects are usually adjusted so that they average to 0 for an additive decomposition or they average to 1 for a multiplicative decomposition.

### Step 4: To determine the random (irregular) component.

For the additive model, random = series – trend – seasonal. For the multiplicative model, random = series / (trend*seasonal).

The random component could be analyzed for such things as the mean location, or mean squared size (variance), or possibly even for whether the component is actually random or might be modeled with an ARIMA model.

A few programs iterate through the steps 1 to 3. For example, after step 3 we could use the seasonal factors to de-seasonalize the series and then return to step 1 to estimate the trend based on the de-seasonalized series.

## Decompose a Time Series into Its Components

The ``seasonal_decompose`` in ``statsmodels`` implements this conveniently.

In:
```python=
from statsmodels.tsa.seasonal import seasonal_decompose
from dateutil.parser import parse

# Import Data
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'], index_col='date')

# Multiplicative Decomposition
result_mul = seasonal_decompose(df['value'], model='multiplicative', extrapolate_trend='freq')

# Additive Decomposition
result_add = seasonal_decompose(df['value'], model='additive', extrapolate_trend='freq')

# Plot
plt.rcParams.update({'figure.figsize': (10,10)})
result_mul.plot().suptitle('Multiplicative Decompose', fontsize=22)
result_add.plot().suptitle('Additive Decompose', fontsize=22)
plt.show()
```

Out:

![](https://i.imgur.com/Aq6EhQQ.png)

Setting ``extrapolate_trend='freq'`` takes care of any missing values in the trend and residuals at the beginning of the series.

If you look at the residuals of the additive decomposition closely, they have some pattern left over. The multiplicative decomposition, however, looks quite random, which is good. So ideally, multiplicative decomposition should be preferred for this particular series.

The numerical output of the trend, seasonal and residual components is stored in the ``result_mul`` output itself. Let’s extract them and put them in a dataframe.
In:
```python=
# Extract the Components ----
# Actual Values = Product of (Seasonal * Trend * Resid)
df_reconstructed = pd.concat([result_mul.seasonal, result_mul.trend, result_mul.resid, result_mul.observed], axis=1)
df_reconstructed.columns = ['seas', 'trend', 'resid', 'actual_values']
df_reconstructed.head()
```

If you check, the product of the ``seas``, ``trend`` and ``resid`` columns should exactly equal the ``actual_values``.

## Detrend a Time Series

There are multiple approaches to extract the trend.

1. Subtract the line of best fit from the time series. The line of best fit may be obtained from a linear regression model with the time steps as the predictor. For more complex trends, you may want to use quadratic terms ($x^2$) in the model.
1. Subtract the trend component obtained from the time series decomposition we saw earlier.
1. Subtract the mean.
1. Apply a filter like the Baxter-King filter (``statsmodels.tsa.filters.bkfilter``) or the Hodrick-Prescott filter (``statsmodels.tsa.filters.hpfilter``) to remove the moving average trend lines or the cyclical components.

Let’s implement the first two methods.

In:
```python=
# Using scipy: Subtract the line of best fit
from scipy import signal
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'])
detrended = signal.detrend(df.value.values)
plt.plot(detrended)
plt.title('Drug Sales detrended by subtracting the least squares fit', fontsize=16)
```

Out:

![](https://i.imgur.com/oEzeH6T.png)

In:
```python=
# Using statsmodels: Subtracting the Trend Component.
from statsmodels.tsa.seasonal import seasonal_decompose
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'], index_col='date')
result_mul = seasonal_decompose(df['value'], model='multiplicative', extrapolate_trend='freq')
detrended = df.value.values - result_mul.trend
plt.plot(detrended)
plt.title('Drug Sales detrended by subtracting the trend component', fontsize=16)
```

Out:

![](https://i.imgur.com/LoxH9Pd.png)

## Deseasonalize a Time Series

There are multiple approaches to deseasonalize a time series.

1. Take a moving average with length as the seasonal window. This will smoothen the series in the process.
2. Seasonal difference the series (subtract the value of the previous season from the current value).
3. Divide the series by the seasonal index obtained from STL decomposition.

If dividing by the seasonal index does not work well, try taking a log of the series and then do the deseasonalizing. You can later restore the original scale by taking an exponential.

In:
```python=
# Deseasonalize by dividing by the seasonal index
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'], index_col='date')

# Time Series Decomposition
result_mul = seasonal_decompose(df['value'], model='multiplicative', extrapolate_trend='freq')

# Deseasonalize
deseasonalized = df.value.values / result_mul.seasonal

# Plot
plt.plot(deseasonalized)
plt.title('Drug Sales Deseasonalized', fontsize=16)
plt.plot()
```

Out:

![](https://i.imgur.com/H4wI82i.png)

## Test for Seasonality

The common way is to plot the series and check for repeatable patterns in fixed time intervals. So, the types of seasonality are determined by the clock or the calendar: hour of day, day of month, weekly, monthly, yearly. However, if you want a more definitive inspection of the seasonality, use the Autocorrelation Function (ACF) plot.
When there is a strong seasonal pattern, the ACF plot usually reveals definitive repeated spikes at the multiples of the seasonal window. For example, the drug sales time series is a monthly series with patterns repeating every year. So, you can see spikes at the 12th, 24th, 36th.. lines.

However, in real-world datasets, such strong patterns are hardly noticed and can get distorted by any noise, so you need a careful eye to capture these patterns.

In:
```python=
from pandas.plotting import autocorrelation_plot
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv')

# Draw Plot
plt.rcParams.update({'figure.figsize':(9,5), 'figure.dpi':120})
autocorrelation_plot(df.value.tolist())
```

Out:

![](https://i.imgur.com/ufH9Yo4.png)

Alternately, if you want a statistical test, the CHTest can determine if seasonal differencing is required to stationarize the series.

# Stationarity

## Definition

A stationary series is one where the values of the series are not a function of time. That is, the statistical properties of the series like mean, variance and autocorrelation are constant over time. A stationary time series is devoid of seasonal effects as well.

So how to identify if a series is stationary or not? Let’s plot some examples to make it clear:

![](https://i.imgur.com/OwLZ1XK.png)

It is possible to make nearly any time series stationary by applying a suitable transformation. Most statistical forecasting methods are designed to work on a stationary time series. The first step in the forecasting process is typically to do some transformation to convert a non-stationary series to stationary.

## Why Stationarizing before Forecasting?

Forecasting a stationary series is relatively easy and the forecasts are more reliable.

Autoregressive forecasting models are essentially linear regression models that utilize the lag(s) of the series itself as predictors. We know that linear regression works best if the predictors (X variables) are not correlated with each other. So, stationarizing the series solves this problem since it removes any persistent autocorrelation, thereby making the predictors (lags of the series) in the forecasting models nearly independent.

## Test for Stationarity

By looking at the plot of the series like we did earlier. **OR** By splitting the series into 2 or more contiguous parts and computing the summary statistics like the mean, variance and the autocorrelation. If the stats are quite different, then the series is not likely to be stationary.

Nevertheless, you need a method to quantitatively determine if a given series is stationary or not. This can be done using statistical tests called ‘Unit Root Tests’. There are multiple variations of this, where the tests check if a time series is non-stationary and possesses a unit root.

### 1. Augmented Dickey Fuller test (ADF Test)

The most commonly used is the ADF test, where the null hypothesis is that the time series possesses a unit root and is non-stationary. So, if the **p-value** in the ADF test is less than the **significance level (0.05)**, you reject the null hypothesis and infer that the time series is indeed stationary.

### 2. Kwiatkowski-Phillips-Schmidt-Shin – KPSS test (trend stationary)

The KPSS test, on the other hand, is used to test for trend stationarity. The null hypothesis and the p-value interpretation are just the opposite of the ADF test.

### 3. Phillips-Perron test (PP Test)

The below code implements the first 2 tests using the ``statsmodels`` package in Python.
In:
```python=
from statsmodels.tsa.stattools import adfuller, kpss
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'])

# ADF Test
result = adfuller(df.value.values, autolag='AIC')
print(f'ADF Statistic: {result[0]}')
print(f'p-value: {result[1]}')
for key, value in result[4].items():
    print('Critical Values:')
    print(f'   {key}, {value}')

# KPSS Test
result = kpss(df.value.values, regression='c')
print('\nKPSS Statistic: %f' % result[0])
print('p-value: %f' % result[1])
for key, value in result[3].items():
    print('Critical Values:')
    print(f'   {key}, {value}')
```

Out:
```python=
ADF Statistic: 3.14518568930674
p-value: 1.0
Critical Values:
   1%, -3.465620397124192
Critical Values:
   5%, -2.8770397560752436
Critical Values:
   10%, -2.5750324547306476

KPSS Statistic: 1.313675
p-value: 0.010000
Critical Values:
   10%, 0.347
Critical Values:
   5%, 0.463
Critical Values:
   2.5%, 0.574
Critical Values:
   1%, 0.739
```

## How to Stationarize a Time Series

### 1. Take the log of the series
### 2. Take the nth root of the series
### 3. Combination of all these
### 4. Difference the series (once or more): most common and convenient method

Differencing the series means subtracting the previous value from the current value. If $Y_t$ is the value at time $t$, then the first difference of $Y$ is $Y_t – Y_{t-1}$. If the first difference doesn’t make a series stationary, you can go for the second differencing. And so on.

The value of d is the minimum number of differencing needed to make the series stationary. And if the time series is already stationary, then d = 0.

For example, consider the following series: $[1, 5, 2, 12, 20]$

First differencing gives: $[5-1, 2-5, 12-2, 20-12] = [4, -3, 10, 8]$

Second differencing gives: $[-3-4, 10-(-3), 8-10] = [-7, 13, -2]$
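These hand-computed differences are easy to verify with NumPy's ``diff`` (on a pandas Series, the equivalent ``.diff()`` method is what the next example uses):

```python=
import numpy as np

x = np.array([1, 5, 2, 12, 20])

print(np.diff(x))       # [ 4 -3 10  8]  -> first difference
print(np.diff(x, n=2))  # [-7 13 -2]     -> second difference
```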
### Example

If the p-value in the ADF test is > 0.05, we go ahead with finding the order of differencing.

In:
```python=
import numpy as np, pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt
plt.rcParams.update({'figure.figsize':(9,7), 'figure.dpi':120})

# Import data
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/wwwusage.csv', names=['value'], header=0)

# Original Series
fig, axes = plt.subplots(3, 2, sharex=True)
axes[0, 0].plot(df.value); axes[0, 0].set_title('Original Series')
plot_acf(df.value, ax=axes[0, 1])

# 1st Differencing
axes[1, 0].plot(df.value.diff()); axes[1, 0].set_title('1st Order Differencing')
plot_acf(df.value.diff().dropna(), ax=axes[1, 1])

# 2nd Differencing
axes[2, 0].plot(df.value.diff().diff()); axes[2, 0].set_title('2nd Order Differencing')
plot_acf(df.value.diff().diff().dropna(), ax=axes[2, 1])
plt.show()
```

Out:

![](https://i.imgur.com/10c5ltg.png)

For the above series, the time series reaches stationarity with 2 orders of differencing. But looking at the autocorrelation plot for the 2nd differencing, the lag goes into the far negative zone fairly quickly, which indicates the series might have been over-differenced. So, I am going to tentatively fix the order of differencing as 1 even though the series is not perfectly stationary (weak stationarity).

The ``ndiffs`` function from the ``pmdarima`` package runs these stationarity tests and returns the suggested order of differencing.

In:
```python=
from pmdarima.arima.utils import ndiffs
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/wwwusage.csv', names=['value'], header=0)
y = df.value

## Adf Test
ndiffs(y, test='adf')  # 2

# KPSS test
ndiffs(y, test='kpss')  # 0

# PP test:
ndiffs(y, test='pp')  # 2
```

Out:
```python=
2
0
2
```

## White Noise vs Stationary Series

Like a stationary series, white noise is also **not** a function of time, that is, its mean and variance do **not** change over time. But the difference is, white noise is completely **random** with a **mean of 0**. In white noise there is no pattern whatsoever.

If you consider the sound signals in an FM radio as a time series, the blank sound you hear between the channels is white noise. Mathematically, a sequence of completely random numbers with mean zero is a white noise.

In:
```python=
randvals = np.random.randn(1000)
pd.Series(randvals).plot(title='Random White Noise', color='k')
```

Out:

![](https://i.imgur.com/AutKFm1.png)

# Autocorrelation & Partial Autocorrelation

## Definition

The coefficient of correlation between two values in a time series is called the **autocorrelation function (ACF)**. For example, the ACF for a time series $y_t$ is given by:

$$Corr(y_t,y_{t−k}), \quad k=1,2,...$$

This value of $k$ is the time gap being considered and is called the **lag**. A lag 1 autocorrelation (i.e., $k = 1$ in the above) is the correlation between values that are one time period apart. More generally, a lag $k$ autocorrelation is the correlation between values that are $k$ time periods apart.

The ACF is a way to measure the linear relationship between an observation at time $t$ and the observations at previous times. If we assume an AR($k$) model, then we may wish to only measure the association between $y_t$ and $y_{t−k}$ and filter out the linear influence of the random variables that lie in between (i.e., $y_{t−1},y_{t−2},…,y_{t−(k−1)}$), which requires a transformation on the time series. Then by calculating the correlation of the transformed time series we obtain the **partial autocorrelation function (PACF)**.

The PACF is most useful for identifying the order of an autoregressive model. Specifically, sample partial autocorrelations that are significantly different from 0 indicate lagged terms of $y$ that are useful predictors of $y_t$. It is important that the choice of the order makes sense.

In:
```python=
from statsmodels.tsa.stattools import acf, pacf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv')

# Calculate ACF and PACF up to 50 lags
# acf_50 = acf(df.value, nlags=50)
# pacf_50 = pacf(df.value, nlags=50)

# Draw Plot
fig, axes = plt.subplots(1,2,figsize=(16,3), dpi= 100)
plot_acf(df.value.tolist(), lags=50, ax=axes[0])
plot_pacf(df.value.tolist(), lags=50, ax=axes[1])
```

Out:

![](https://i.imgur.com/pVN5Lxe.png)

## Compute Partial Autocorrelation Function

The partial autocorrelation of lag $k$ of a series is the coefficient of that lag in the autoregression equation of Y (the linear regression of Y with its own lags as predictors). For example, if $Y_t$ is the current series and $Y_{t-1}$ is the lag 1 of $Y$, then the partial autocorrelation of lag 3 ($Y_{t-3}$) is the coefficient $\alpha_3$ of $Y_{t-3}$ in the following equation:

$$ Y_t = \alpha_0 + \alpha_1Y_{t-1} + \alpha_2Y_{t-2} + \alpha_3Y_{t-3} $$
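This “coefficient in an autoregression” reading can be checked numerically. The sketch below is an added illustration (assuming the same drug sales data): it fits the regression above with OLS and compares the coefficient on $Y_{t-3}$ against the lag-3 value returned by statsmodels’ ``pacf`` with ``method='ols'``; the two numbers should agree closely, up to minor sample-handling differences.

```python=
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.stattools import pacf

df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv')
y = df['value']

# Build a lagged design matrix: regress Y_t on Y_{t-1}, Y_{t-2}, Y_{t-3}
lags = pd.concat({f'lag{k}': y.shift(k) for k in range(1, 4)}, axis=1).dropna()
X = sm.add_constant(lags)
ols_fit = sm.OLS(y.loc[lags.index], X).fit()

print(ols_fit.params['lag3'])             # alpha_3, the coefficient on Y_{t-3}
print(pacf(y, nlags=3, method='ols')[3])  # lag-3 partial autocorrelation (OLS-based)
```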
## Lag Plots

A lag plot is a scatter plot of a time series against a lag of itself. It is normally used to check for autocorrelation. If there is any pattern existing in the series like the one you see below, the series is autocorrelated. If there is no such pattern, the series is likely to be random white noise.

In the example below on the Sunspots Area time series, the plots get more and more scattered as the n_lag increases.

In:
```python=
from pandas.plotting import lag_plot
plt.rcParams.update({'ytick.left' : False, 'axes.titlepad':10})

# Import
ss = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/sunspotarea.csv')
a10 = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv')

# Plot
fig, axes = plt.subplots(1, 4, figsize=(10,3), sharex=True, sharey=True, dpi=100)
for i, ax in enumerate(axes.flatten()[:4]):
    lag_plot(ss.value, lag=i+1, ax=ax, c='firebrick')
    ax.set_title('Lag ' + str(i+1))

fig.suptitle('Lag Plots of Sun Spots Area \n(Points get wide and scattered with increasing lag -> lesser correlation)\n', y=1.15)

fig, axes = plt.subplots(1, 4, figsize=(10,3), sharex=True, sharey=True, dpi=100)
for i, ax in enumerate(axes.flatten()[:4]):
    lag_plot(a10.value, lag=i+1, ax=ax, c='firebrick')
    ax.set_title('Lag ' + str(i+1))

fig.suptitle('Lag Plots of Drug Sales', y=1.05)
plt.show()
```

Out:

![](https://i.imgur.com/xB4PJ56.png)

# Forecastability of a Time Series

## Methods

**The more regular and repeatable patterns a time series has, the easier it is to forecast.** The ‘Approximate Entropy’ can be used to quantify the regularity and unpredictability of fluctuations in a time series. **The higher the approximate entropy, the more difficult it is to forecast it.**

A better alternative is the ‘Sample Entropy’. Sample Entropy is similar to Approximate Entropy but is more consistent in estimating the complexity even for smaller time series. For example, a random time series with fewer data points can have a lower ‘Approximate Entropy’ than a more ‘regular’ time series, whereas a longer random time series will have a higher ‘Approximate Entropy’. Sample Entropy handles this problem nicely. See the demonstration below.
In:
```python=
# https://en.wikipedia.org/wiki/Approximate_entropy
ss = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/sunspotarea.csv')
a10 = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv')
rand_small = np.random.randint(0, 100, size=36)
rand_big = np.random.randint(0, 100, size=136)

def ApEn(U, m, r):
    """Compute Approximate entropy"""
    def _maxdist(x_i, x_j):
        return max([abs(ua - va) for ua, va in zip(x_i, x_j)])

    def _phi(m):
        x = [[U[j] for j in range(i, i + m - 1 + 1)] for i in range(N - m + 1)]
        C = [len([1 for x_j in x if _maxdist(x_i, x_j) <= r]) / (N - m + 1.0) for x_i in x]
        return (N - m + 1.0)**(-1) * sum(np.log(C))

    N = len(U)
    return abs(_phi(m+1) - _phi(m))

print(ApEn(ss.value, m=2, r=0.2*np.std(ss.value)))      # 0.651
print(ApEn(a10.value, m=2, r=0.2*np.std(a10.value)))    # 0.537
print(ApEn(rand_small, m=2, r=0.2*np.std(rand_small)))  # 0.143
print(ApEn(rand_big, m=2, r=0.2*np.std(rand_big)))      # 0.716
```

Out:
```python=
0.6514704970333534
0.5374775224973489
0.0898376940798844
0.7369242960384561
```

In:
```python=
# https://en.wikipedia.org/wiki/Sample_entropy
def SampEn(U, m, r):
    """Compute Sample entropy"""
    def _maxdist(x_i, x_j):
        return max([abs(ua - va) for ua, va in zip(x_i, x_j)])

    def _phi(m):
        x = [[U[j] for j in range(i, i + m - 1 + 1)] for i in range(N - m + 1)]
        C = [len([1 for j in range(len(x)) if i != j and _maxdist(x[i], x[j]) <= r]) for i in range(len(x))]
        return sum(C)

    N = len(U)
    return -np.log(_phi(m+1) / _phi(m))

print(SampEn(ss.value, m=2, r=0.2*np.std(ss.value)))      # 0.78
print(SampEn(a10.value, m=2, r=0.2*np.std(a10.value)))    # 0.41
print(SampEn(rand_small, m=2, r=0.2*np.std(rand_small)))  # 1.79
print(SampEn(rand_big, m=2, r=0.2*np.std(rand_big)))      # 2.42
```

Out:
```python=
0.7853311366380039
0.41887013457621214
inf
2.181224235989778
```

## Granger Causality Test

The Granger causality test is used to determine if one time series will be useful to forecast another. It is based on the idea that if X causes Y, then the forecast of Y based on previous values of Y AND the previous values of X should outperform the forecast of Y based on previous values of Y alone.

So, understand that Granger causality should not be used to test if a lag of Y causes Y. Instead, it is generally used on exogenous (not Y lag) variables only. It is nicely implemented in the ``statsmodels`` package. It accepts a 2D array with 2 columns as the main argument. The values are in the first column and the predictor (X) is in the second column.

The null hypothesis is: the series in the second column does not Granger cause the series in the first. If the p-values are less than a significance level (0.05) then you reject the null hypothesis and conclude that the said lag of X is indeed useful. The second argument ``maxlag`` says up to how many lags of Y should be included in the test.
In:
```python=
from statsmodels.tsa.stattools import grangercausalitytests
df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv', parse_dates=['date'])
df['month'] = df.date.dt.month
grangercausalitytests(df[['value', 'month']], maxlag=2)
```

Out:
```python=
Granger Causality
number of lags (no zero) 1
ssr based F test:         F=54.7797 , p=0.0000  , df_denom=200, df_num=1
ssr based chi2 test:   chi2=55.6014 , p=0.0000  , df=1
likelihood ratio test: chi2=49.1426 , p=0.0000  , df=1
parameter F test:         F=54.7797 , p=0.0000  , df_denom=200, df_num=1

Granger Causality
number of lags (no zero) 2
ssr based F test:         F=162.6989, p=0.0000  , df_denom=197, df_num=2
ssr based chi2 test:   chi2=333.6567, p=0.0000  , df=2
likelihood ratio test: chi2=196.9956, p=0.0000  , df=2
parameter F test:         F=162.6989, p=0.0000  , df_denom=197, df_num=2
```

In the above case, the p-values are zero for all tests. So the ‘month’ indeed can be used to forecast the drug sales.

# Smoothing

Smoothing is usually done to help us better see patterns, trends for example, in time series. Generally, smoothing irons out the irregular roughness to reveal a clearer signal. For seasonal data, we might smooth out the seasonality so that we can identify the trend. Smoothing doesn’t provide us with a model, but it can be a good first step in describing various components of the series.

The term filter is sometimes used to describe a smoothing procedure. For instance, if the smoothed value for a particular time is calculated as a linear combination of observations for surrounding times, it might be said that we’ve applied a linear filter to the data (not the same as saying the result is a straight line, by the way).

## Some Methods in Python

**1. Take a moving average**
**2. Do a LOESS smoothing (Localized Regression)**
**3. Do a LOWESS smoothing (Locally Weighted Regression)**

A moving average is the average of a rolling window of defined width. But you must choose the window-width wisely, because a large window-size will over-smooth the series. For example, a window-size equal to the seasonal duration (ex: 12 for a month-wise series) will effectively nullify the seasonal effect.

LOESS, short for ‘LOcalized regrESSion’, fits multiple regressions in the local neighborhood of each point. It is implemented in the ``statsmodels`` package, where you can control the degree of smoothing using the ``frac`` argument which specifies the percentage of data points nearby that should be considered to fit a regression model.

In:
```python=
from statsmodels.nonparametric.smoothers_lowess import lowess
plt.rcParams.update({'xtick.bottom' : False, 'axes.titlepad':5})

# Import
df_orig = pd.read_csv('datasets/elecequip.csv', parse_dates=['date'], index_col='date')

# 1. Moving Average
df_ma = df_orig.value.rolling(3, center=True, closed='both').mean()

# 2. Loess Smoothing (5% and 15%)
df_loess_5 = pd.DataFrame(lowess(df_orig.value, np.arange(len(df_orig.value)), frac=0.05)[:, 1], index=df_orig.index, columns=['value'])
df_loess_15 = pd.DataFrame(lowess(df_orig.value, np.arange(len(df_orig.value)), frac=0.15)[:, 1], index=df_orig.index, columns=['value'])

# Plot
fig, axes = plt.subplots(4,1, figsize=(7, 7), sharex=True, dpi=120)
df_orig['value'].plot(ax=axes[0], color='k', title='Original Series')
df_loess_5['value'].plot(ax=axes[1], title='Loess Smoothed 5%')
df_loess_15['value'].plot(ax=axes[2], title='Loess Smoothed 15%')
df_ma.plot(ax=axes[3], title='Moving Average (3)')
fig.suptitle('How to Smoothen a Time Series', y=0.95, fontsize=14)
plt.show()
```

Out:

![](https://i.imgur.com/rnnpLqC.png)

## Some Other Methods Mathematically Explained

## Moving Average

Smoothing data removes random variation and shows trends and cyclic components. Taking averages is the simplest way to smooth data.

### Example

A manager of a warehouse wants to know how much a typical supplier delivers in 1000 dollar units. He/she takes a sample of 12 suppliers, at random, obtaining the following results:

|Supplier|Amount|Supplier|Amount|
|:-:|:-:|:-:|:-:|
| 1 | 9 | 7 | 11 |
| 2 | 8 | 8 | 7 |
| 3 | 9 | 9 | 13 |
| 4 | 12 | 10 | 9 |
| 5 | 9 | 11 | 11 |
| 6 | 12 | 12 | 10 |

The computed mean or average of the data = 10. The manager decides to use this as the estimate for expenditure of a typical supplier.

**Is this a good or bad estimate?**

Mean squared error is a way to judge how good a model is. We shall compute it:

* The "error" = true amount spent minus the estimated amount.
* The "error squared" (ES) is the error above, squared.
* The "SSE" is the sum of the squared errors.
* The "MSE" is the mean of the squared errors.

**The estimate = 10**

|Supplier|Amount|Error|ES|
|:-:|:-:|:-:|:-:|
|1 |9 |-1 |1|
|2 |8 |-2 |4|
|3 |9 |-1 |1|
|4 |12 |2 |4|
|5 |9 |-1 |1|
|6 |12 |2 |4|
|7 |11 |1 |1|
|8 |7 |-3 |9|
|9 |13 |3 |9|
|10 |9 |-1 |1|
|11 |11 |1 |1|
|12 |10 |0 |0|

The SSE = 36 and the MSE = 36/12 = 3.

So how good was the estimator for the amount spent for each supplier? Let us compare the estimate (10) with the following estimates: 7, 9, and 12. That is, we estimate that each supplier will spend $7, or $9 or $12. Performing the same calculations we arrive at:

|Estimator|7|9|10|12|
|:-:|:-:|:-:|:-:|:-:|
|SSE|144|48|36|84|
|MSE|12|4|3|7|

The estimator with the smallest MSE is the best. It can be shown mathematically that the estimator that minimizes the MSE for a set of random data is the mean.
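Those SSE/MSE figures are easy to reproduce in a few lines; the sketch below is a small added check using nothing beyond the numbers in the tables above:

```python=
import numpy as np

amounts = np.array([9, 8, 9, 12, 9, 12, 11, 7, 13, 9, 11, 10])

for estimate in [7, 9, 10, 12]:
    errors = amounts - estimate
    sse = np.sum(errors ** 2)
    mse = sse / len(amounts)
    print(f'estimate={estimate}: SSE={sse}, MSE={mse:.2f}')
# estimate=7 : SSE=144, MSE=12.00
# estimate=9 : SSE=48,  MSE=4.00
# estimate=10: SSE=36,  MSE=3.00
# estimate=12: SSE=84,  MSE=7.00
```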
Next we will examine the mean to see how well it predicts net income over time. The next table gives the income before taxes of a PC manufacturer between 1985 and 1994.

|Year|$ (millions)|Mean|Error|Squared Error|
|:-:|:-:|:-:|:-:|:-:|
|1985| 46.163| 48.676| -2.513| 6.313|
|1986| 46.998| 48.676| -1.678| 2.814|
|1987| 47.816| 48.676| -0.860| 0.739|
|1988| 48.311| 48.676| -0.365| 0.133|
|1989| 48.758| 48.676| 0.082 | 0.007|
|1990| 49.164| 48.676| 0.488 | 0.239|
|1991| 49.548| 48.676| 0.872 | 0.761|
|1992| 48.915| 48.676| 0.239 | 0.057|
|1993| 50.315| 48.676| 1.639 | 2.688|
|1994| 50.768| 48.676| 2.092 | 4.378|

The MSE = 1.8129. However, the mean is **not** a good estimator to forecast **when there are trends**.

![](https://i.imgur.com/ZxFaW8v.gif)

## Exponential Smoothing

Exponential smoothing schemes weight past observations using exponentially decreasing weights. In other words, recent observations are given relatively more weight in forecasting than the older observations. In the case of moving averages, the weights assigned to the observations are the same and are equal to 1/N. In exponential smoothing, however, there are one or more smoothing parameters to be determined (or estimated) and these choices determine the weights assigned to the observations.

## Single Exponential Smoothing

### Formula

For any time period $t$, the smoothed value $S_t$ is found by computing:

$$S_t=αy_{t−1}+(1−α)S_{t−1}$$

where $0<α≤1$ and $t≥3$; $α$ is called the smoothing constant; $y$ stands for the original observation.

This can be written as: $S_{t+1}=S_t+αϵ_t$ where $ϵ_t$ is the forecast error $(actual - forecast)$ for period $t$. In other words, the new forecast is the old one plus an adjustment for the error that occurred in the last forecast.

### What is the "best" value for α?

When α is close to 1, dampening is quick and the smoothed series follows the data closely; when α is close to 0, dampening is slow and the series is smoothed more heavily. We choose the value of α that results in the smallest MSE.

Let us illustrate this principle with an example. Consider the following data set consisting of 12 observations taken over time:

|Time|$y_t$|$S(α=0.1)$|Error|ES|
|:-:|:-:|:-:|:-:|:-:|
|1| 71| | | |
|2| 70| 71| -1.00| 1.00|
|3| 69| 70.9| -1.90| 3.61|
|4| 68| 70.71| -2.71| 7.34|
|5| 64| 70.44| -6.44| 41.47|
|6| 65| 69.80| -4.80| 23.04|
|7| 72| 69.32| 2.68| 7.18|
|8| 78| 69.58| 8.42| 70.90|
|9| 75| 70.43| 4.57| 20.88|
|10|75| 70.88| 4.12| 16.97|
|11|75| 71.29| 3.71| 13.76|
|12|70| 71.67| -1.67| 2.79|

The SSE = 208.94. The MSE is the SSE/11 = 19.0.

![](https://i.imgur.com/jJh9ahF.gif)

### Bootstrapping of Forecasts

What happens if you wish to forecast from some origin, usually the last data point, and no actual observations are available? In this situation we have to modify the formula to become:

$$S_{t+1}=αy_{origin}+(1−α)S_t$$

where $y_{origin}$ remains constant. This technique is known as bootstrapping.

### Single Exponential Smoothing with Trend

Single Exponential Smoothing is not very good when there is a trend. The single coefficient α is not enough. Let us demonstrate this with the following dataset smoothed with an α of 0.3:

|Data|Fit|
|:-:|:-:|
|6.4| |
|5.6| 6.4|
|7.8| 6.2|
|8.8| 6.7|
|11.0| 7.3|
|11.6| 8.4|
|16.7| 9.4|
|15.3| 11.6|
|21.6| 12.7|
|22.4| 15.4|

The resulting graph looks like:

![](https://i.imgur.com/15KuGys.gif)

## Double Exponential Smoothing

Double exponential smoothing uses two constants and is better at handling trends.

### Formula

![](https://i.imgur.com/ek9qCre.png)

Note that the current value of the series is used to calculate its smoothed value replacement in double exponential smoothing. As in the case for single smoothing, there are a variety of schemes to set initial values for $S_t$ and $b_t$ in double smoothing. $S_1$ is in general set to $y_1$. Here are 3 suggestions for $b_1$.

![](https://i.imgur.com/eYLsPWJ.png)

The one-period-ahead forecast is given by: $F_{t+1}=S_t+b_t$.

The m-periods-ahead forecast is given by: $F_{t+m}=S_t+mb_t$.

### Comments

The first smoothing equation adjusts $S_t$ directly for the trend of the previous period, $b_{t−1}$, by adding it to the last smoothed value, $S_{t−1}$. This helps to eliminate the lag and brings $S_t$ to the appropriate base of the current value. The second smoothing equation then updates the trend, which is expressed as the difference between the last two values. The equation is similar to the basic form of single smoothing, but here applied to the updating of the trend.

### Example

Consider the data set: $6.4, 5.6, 7.8, 8.8, 11, 11.6, 16.7, 15.3, 21.6, 22.4$.
Now we will fit a double smoothing model with α=0.3623 and γ=1.0. These are the estimates that result in the lowest possible MSE when comparing the original series to one-step-ahead forecasts (since this version of double exponential smoothing uses the current series value to calculate a smoothed value, the smoothed series cannot be used to determine an α with minimum MSE). The chosen starting values are $S_1=y_1=6.4$ and $b_1=((y_2−y_1)+(y_3−y_2)+(y_4−y_3))/3=0.8$.

For comparison's sake we also fit a single smoothing model with α=0.977 (this results in the lowest MSE for single exponential smoothing).

The MSE for double smoothing is 3.7024. The MSE for single smoothing is 8.8867.

The smoothed results for the example are:

|Data|Double|Single|
|:-:|:-:|:-:|
|6.4| 6.4| |
|5.6| 6.6 (Forecast = 7.2)| 6.4|
|7.8| 7.2 (Forecast = 6.8)| 5.6|
|8.8| 8.1 (Forecast = 7.8)| 7.8|
|11.0| 9.8 (Forecast = 9.1)| 8.8|
|11.6| 11.5 (Forecast = 11.4)| 10.9|
|16.7| 14.5 (Forecast = 13.2)| 11.6|
|15.3| 16.7 (Forecast = 17.4)| 16.6|
|21.6| 19.9 (Forecast = 18.9)| 15.3|
|22.4| 22.8 (Forecast = 23.1)| 21.5|

### Comparison of Forecasts

To see how each method predicts the future, we computed the first five forecasts from the last observation as follows:

|Period|Single|Double|
|:-:|:-:|:-:|
|11 |22.4|25.8|
|12 |22.4|28.7|
|13 |22.4|31.7|
|14 |22.4|34.6|
|15 |22.4|37.6|

#### Plot comparing single and double exponential smoothing forecasts

![](https://i.imgur.com/SjeS5il.gif)

This graph indicates that double smoothing follows the data much closer than single smoothing. Furthermore, for forecasting, single smoothing cannot do better than projecting a straight horizontal line, which is not very likely to occur in reality. So in this case double smoothing is preferred.

#### Plot comparing double exponential smoothing and regression forecasts

![](https://i.imgur.com/KymLR5a.gif)

Both techniques follow the data in similar fashion, but the regression line is more conservative. That is, there is a slower increase with the regression line than with double smoothing.

#### The selection of the technique depends on the forecaster

If it is desired to portray the growth process in a more aggressive manner, then one selects double smoothing. Otherwise, regression may be preferable. It should be noted that in linear regression "time" functions as the independent variable.
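For completeness, both schemes are available in ``statsmodels``. The sketch below is a minimal illustration on the same ten observations, not a reproduction of the hand computation above: statsmodels chooses its own smoothing constants and initial values by minimizing the in-sample error, so the fitted numbers will differ somewhat, but the qualitative behavior (flat vs. trending forecasts) is the same.

```python=
import pandas as pd
from statsmodels.tsa.holtwinters import SimpleExpSmoothing, Holt

data = pd.Series([6.4, 5.6, 7.8, 8.8, 11.0, 11.6, 16.7, 15.3, 21.6, 22.4])

# Single (simple) exponential smoothing: alpha is chosen by minimizing the in-sample SSE
single = SimpleExpSmoothing(data).fit()

# Double exponential smoothing (Holt's linear trend method): level and trend constants are optimized
double = Holt(data).fit()

print(single.params['smoothing_level'])  # fitted alpha
print(single.forecast(5))                # flat forecasts, as discussed above
print(double.forecast(5))                # trending forecasts
```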
## Triple Exponential Smoothing

What happens if the data show trend **and** seasonality? In this case double smoothing will not work. We now introduce a third equation to take care of seasonality. The resulting set of equations is called the "Holt-Winters" (HW) method after the names of the inventors.

### Formula

![](https://i.imgur.com/4H2Xhom.png)

where

* $y$ is the observation
* $S$ is the smoothed observation
* $b$ is the trend factor
* $I$ is the seasonal index
* $F$ is the forecast at m periods ahead
* $t$ is an index denoting a time period

and $α$, $β$, and $γ$ are constants that must be estimated in such a way that the MSE of the error is minimized.

To initialize the HW method we need at least one complete season's data to determine initial estimates of the seasonal indices $I_{t−L}$. A complete season's data consists of $L$ periods. And we need to estimate the trend factor from one period to the next. To accomplish this, it is advisable to use two complete seasons; that is, $2L$ periods.

### Initial Values for the Trend Factor

![](https://i.imgur.com/QSStyqS.png)

### Initial Values for the Seasonal Indices

As we will see in the example, we work with data that consist of 6 years with 4 periods (that is, 4 quarters) per year.

#### Step 1: Compute yearly averages.

![](https://i.imgur.com/9VdDU6O.png)

#### Step 2: Divide the observations by the appropriate yearly mean.

![](https://i.imgur.com/sjvMADG.png)

#### Step 3: Form seasonal indices by computing the average of each row.

![](https://i.imgur.com/KZxkxs4.png)

### The case of the Zero Coefficients

Sometimes it happens that a computer program for triple exponential smoothing outputs a final coefficient for trend (γ) or for seasonality (β) of 0. Or worse, both are outputted as 0! Does this indicate that there is no trend and/or no seasonality? Of course not! It only means that the initial values for trend and/or seasonality were right on the money. No updating was necessary in order to arrive at the lowest possible MSE. We should inspect the updating formulas to verify this.

### Example

The following data set represents 24 observations. These are six years of quarterly data (each year has four quarters).

|Year|Quarter|Period|Sales|
|:-:|:-:|:-:|:-:|
|90|1| 1| 362|
|90|2| 2| 385|
|90|3| 3| 432|
|90|4| 4| 341|
|91|1| 5| 382|
|91|2| 6| 409|
|91|3| 7| 498|
|91|4| 8| 387|
|92|1| 9| 473|
|92|2| 10| 513|
|92|3| 11| 582|
|92|4| 12| 474|
|93|1| 13| 544|
|93|2| 14| 582|
|93|3| 15| 681|
|93|4| 16| 557|
|94|1| 17| 628|
|94|2| 18| 707|
|94|3| 19| 773|
|94|4| 20| 592|
|95|1| 21| 627|
|95|2| 22| 725|
|95|3| 23| 854|
|95|4| 24| 661|

#### Plot of raw data with single, double, and triple exponential forecasts

![](https://i.imgur.com/DLfgJF6.gif)

#### Plot of raw data with triple exponential forecasts

![](https://i.imgur.com/SDbBf5R.gif)

---

# Sources

> [1] https://towardsdatascience.com/almost-everything-you-need-to-know-about-time-series-860241bdc578
> [2] https://www.statisticssolutions.com/time-series-analysis/
> [3] https://newonlinecourses.science.psu.edu/stat510/node/47/
> [4] https://www.itl.nist.gov/div898/handbook/pmc/section4/pmc41.htm
> [5] https://github.com/marcopeix/stock-prediction/blob/master/Stock%20Prediction.ipynb
> [6] https://towardsdatascience.com/end-to-end-time-series-analysis-and-modelling-8c34f09a3014
> [7] https://machinelearningmastery.com/time-series-trends-in-python/
> [8] https://towardsdatascience.com/an-end-to-end-project-on-time-series-analysis-and-forecasting-with-python-4835e6bf050b
> [9] https://www.machinelearningplus.com/time-series/time-series-analysis-python/
> [10] https://www.machinelearningplus.com/time-series/arima-model-time-series-forecasting-python/
> [11] https://www.analyticsvidhya.com/blog/2018/09/multivariate-time-series-guide-forecasting-modeling-python-codes/
> [12] https://www.digitalocean.com/community/tutorials/a-guide-to-time-series-visualization-with-python-3#step-2-%E2%80%94-loading-time-series-data