# Time Series Analysis: an introduction

March 27th, 2023

---

## Introduction

<!-- > Time series data is an important source of information used for future decision making, strategy, and planning operations in different industries: from marketing and finance to education, healthcare, and robotics. -->

* A time series is a collection of data points recorded in time order.
* Anything that is observed sequentially over time is a time series.
* The mathematical and statistical analysis of these data to find hidden patterns is called **time series analysis**.
* Time-series modeling techniques learn patterns from past data in order to forecast future values.

---

## Types of Data

* Time series analysis is a statistical technique that deals with a sequence of data points ordered in time.

![Types of data](https://i.imgur.com/LNL1pr1.png)

* **Time-series data** consists of data points recorded in chronological order over a period of time.

---

## Univariate and multivariate time series

* A time series that records a single feature or variable is called a **univariate time series**.
* When a series records more than one **feature** or **variable**, it is called a **multivariate time series**.
* A time series can be either **continuous** or **discrete**.

---

## Continuous data

* In a **continuous time series**, observations are recorded continuously over a period, as in earthquake seismograph magnitude data, speech data, and temperature.
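As a minimal sketch, assuming the pandas library, these definitions map onto code as follows: a univariate series is a `pandas.Series` with a time index, a multivariate series is a `DataFrame` with one column per variable, and sampling a continuous signal at equally spaced instants yields a discrete series. The dates, values, and sine signal are made up for illustration.

```python
import numpy as np
import pandas as pd

# A discrete, equally spaced time index: one timestamp per day (illustrative dates)
index = pd.date_range(start="2023-01-01", periods=7, freq="D")

# Univariate time series: a single variable indexed by time (made-up values)
temperature = pd.Series(
    [21.0, 22.5, 23.1, 22.8, 24.0, 25.2, 24.7], index=index, name="temperature"
)

# Multivariate time series: several variables sharing the same time index
weather = pd.DataFrame(
    {"temperature": temperature, "pressure": [1012, 1010, 1009, 1011, 1013, 1015, 1014]},
    index=index,
)

# Sampling a continuous signal (a sine wave) every 0.25 time units
# yields a discrete time series indexed by the sampling instants
t = np.arange(0, 2, 0.25)
sampled = pd.Series(np.sin(2 * np.pi * t), index=t, name="signal")

print(temperature.index.freqstr)  # D
print(weather.shape)              # (7, 2)
print(len(sampled))               # 8
```

Keeping the timestamps in the index, rather than in an ordinary column, is what lets pandas resample, slice by date, and align several series on time.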
![Forty years of earthquake seismograph magnitude data](https://i.imgur.com/6tRsH2U.png)

---

## Discrete time series

* In a **discrete time series**, observations are recorded at specific points in time, usually equally spaced, as with temperature increases or decreases, currency exchange rates, and air pressure data, among others.

---

## Cross-sectional data

* Cross-sectional data are gathered at a single point in time for several subjects, such as the **closing prices** of particular stocks on a specific date, election opinion polls, or obesity levels in a population.
* Cross-sectional studies are used in many research areas, such as medicine, economics, and psychology.

---

## Panel data / longitudinal data

* Panel (longitudinal) data contain repeated observations of the same individuals collected over multiple periods of time.
* They combine both dimensions: the same cross-sectional units, such as individuals, companies, or government agencies, are observed periodically.

---

## Forecasting tasks

1. **Problem definition**: understanding how the forecasts will be used, by whom, and how the forecasting method fits within the organisation requiring the forecasts.
2. **Gathering information**: at least two kinds of information are always required: (a) **statistical data**, and (b) the **accumulated expertise** of the people who collect the data and use the forecasts.

---

## Forecasting tasks

3. **Exploratory analysis**: always start by graphing the data:
    - Are there consistent patterns?
    - Is there a significant trend?
    - Is seasonality important?
    - Is there evidence of the presence of business cycles?
    - Are there any outliers in the data?
    - How strong are the relationships among the variables available for analysis?

---

## Forecasting tasks

4. **Choosing and fitting models**: the best model to use depends on the availability of historical data, the strength of relationships between the forecast variable and any explanatory variables, and the way the forecasts are to be used.
5. **Using and evaluating a forecasting model**: once a model has been selected and its parameters estimated, it is used to make forecasts. The performance of the model can only be properly evaluated after the data for the forecast period have become available.

---

## Time series graphics

* A **time series graph** plots observed values on the _y-axis_ against time on the _x-axis_.
* It highlights the behavior and patterns of the data and can lay the foundation for building a reliable model.
* Visualizing time series data is a preliminary tool for detecting whether the data:
    * is mean-reverting or has explosive behavior
    * has a time trend
    * exhibits seasonality
    * shows structural breaks

---

<!-- ## Time series graph (cont.) -->

> In our view, the first step in any time series investigation always involves careful scrutiny of the recorded data plotted over time. This scrutiny often suggests the method of analysis as well as statistics that will be of use in summarizing the information in the data.
>
> -- Shumway and Stoffer

<!-- * While visual inspection should never replace statistical estimation, it can help you decide whether a non-zero mean should be included in the model -->

---

## Mean reverting data

* **Mean-reverting** data tend to return, over time, to a time-invariant mean.
* Knowing whether a model should include a non-zero mean is a prerequisite for choosing appropriate testing and modeling methods.
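A quick way to build intuition for mean reversion is to simulate it. This minimal sketch, assuming NumPy, discretizes the Ornstein-Uhlenbeck equation $dx_t = \theta(\mu - x_t)dt + \sigma dW_t$ with the Euler-Maruyama scheme and contrasts it with a random walk, which has no mean to return to. The helper names, the parameter values ($\theta = 2$, $\mu = 1$, $\sigma = 0.3$), the step size, and the seed are arbitrary choices for illustration, not values from the original material.

```python
import numpy as np

def simulate_ou(theta, mu, sigma, x0, dt, n_steps, rng):
    """Euler-Maruyama discretization of dx = theta * (mu - x) dt + sigma dW."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x[t + 1] = x[t] + theta * (mu - x[t]) * dt + sigma * dW
    return x

def simulate_random_walk(x0, step_sd, n_steps, rng):
    """Cumulative sum of Gaussian shocks: there is no mean to revert to."""
    shocks = rng.normal(0.0, step_sd, n_steps)
    return x0 + np.concatenate([[0.0], np.cumsum(shocks)])

rng = np.random.default_rng(42)
dt = 0.01
ou = simulate_ou(theta=2.0, mu=1.0, sigma=0.3, x0=5.0, dt=dt, n_steps=5000, rng=rng)
# Same shock size per step as the OU process, but without the pull towards mu
rw = simulate_random_walk(x0=5.0, step_sd=0.3 * np.sqrt(dt), n_steps=5000, rng=rng)

# Starting far from mu = 1, the OU path is pulled back towards it...
print(ou[-1000:].mean())
# ...while the random walk's level depends only on its accumulated shocks
print(rw[-1000:].mean())
```

Once the OU path has forgotten its starting point, it fluctuates around $\mu = 1$ with a stationary standard deviation of $\sigma / \sqrt{2\theta} = 0.15$; the random walk has no such anchor.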
<img src="https://i.imgur.com/xUI3j97.png" width="50%"/>

<!-- ![](https://i.imgur.com/xUI3j97.png) -->

---

## Testing for Mean Reversion

* A continuous **mean-reverting** time series can be represented by an **Ornstein-Uhlenbeck** stochastic differential equation:

\begin{eqnarray}
d x_t = \theta (\mu - x_t) dt + \sigma dW_t
\end{eqnarray}

* where $\theta$ is the rate of reversion to the mean, $\mu$ is the long-run mean of the process, $\sigma$ is the volatility (diffusion coefficient) of the process, and $W_t$ is a Wiener process (Brownian motion).

<!-- * In a **discrete setting** it states that change of the price series in the next time is proportional to the difference between the mean and the current price with the addition of Gaussian noise. -->

---

## Trend

* A trend is a long-term increase or decrease in the data.
* It does not have to be linear.
* A trend can _"change direction"_, for example going from an increasing trend to a decreasing one.

<!-- * Reliability of a time series model depends on properly identifying and accounting for time trends -->

<img src="https://i.imgur.com/7eYhmkm.png" width="50%"/>

---

## Seasonality

* **Seasonality** is a periodic fluctuation in which the same pattern recurs at regular intervals of time.
* For example, sales usually increase from September to December and decrease in January and February.

<!-- * Seasonality occurs when time series data exhibits regular and predictable patterns at time intervals that are smaller than a year -->

<img src="https://i.imgur.com/MrNJmxg.png" width="40%">

---

## Detect seasonality

* **Box plots** and **autocorrelation plots** can be used to detect seasonality in the data.
* A **box plot** depicts how data are spread over a range, showing the minimum, first quartile, median, third quartile, and maximum. Drawing one box per season (e.g., per month) reveals recurring patterns.
* **Autocorrelation** measures the correlation between a series and lagged copies of itself and is used to check randomness in the data. A seasonal series shows autocorrelation peaks at multiples of the seasonal period, which helps identify the period when it is not known.
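The autocorrelation idea can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not a formal test: it builds a synthetic monthly series with a 12-month cycle plus noise, computes the sample ACF, and takes the highest peak after the ACF first turns negative as the estimated period. The helpers `sample_acf` and `estimate_period` and the peak-picking rule are hypothetical choices made for this example; libraries such as statsmodels provide their own ACF functions.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.sum(d * d)
    return np.array([np.sum(d[k:] * d[:len(x) - k]) / denom for k in range(max_lag + 1)])

def estimate_period(acf):
    """Lag of the highest ACF value after the ACF first drops below zero."""
    negative = np.flatnonzero(acf < 0)
    if negative.size == 0:
        raise ValueError("ACF never turns negative; no clear seasonal period")
    start = negative[0]
    return int(start + np.argmax(acf[start:]))

# Synthetic monthly data: 10 years with a 12-month seasonal cycle plus noise
rng = np.random.default_rng(0)
months = np.arange(120)
y = 10 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 1, size=months.size)

acf = sample_acf(y, max_lag=24)
period = estimate_period(acf)
print(period)  # 12
```

On real data, plotting the whole ACF (and a box plot per month) is usually more informative than a single argmax, because trends, noise, or several overlapping seasonalities can shift the highest peak.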
---

## Decomposing a time series

* Decomposition separates a time series into components such as **trend**, **seasonality**, **cyclic variation**, and **residuals**.
* Seasonal decomposition estimates these parts by treating the series as either **additive** or **multiplicative**.

---

## Additive model

* An additive model suits series whose components change by roughly constant amounts over time (e.g., a linear trend), so the size of the seasonal fluctuations does not depend on the level of the series.
* It is computed as $$y_t = T_t + S_t + C_t + \epsilon_t$$
* where $T_t$, $S_t$, $C_t$, and $\epsilon_t$ are the trend, seasonality, cyclic variations, and residuals (errors) at time $t$.

---

## Multiplicative model

* A multiplicative model suits non-linear behavior, such as quadratic or exponential growth, where the seasonal fluctuations grow or shrink in proportion to the level of the series.
* It is computed as: $$y_t = T_t \times S_t \times C_t \times \epsilon_t$$

---

## Notebook

---

## References

1. [Hyndman, R.J., and Athanasopoulos, G. (2021). Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia](https://otexts.com/fpp3/)
2. [Introduction to the Fundamentals of Time Series Data and Analysis](https://www.aptech.com/blog/introduction-to-the-fundamentals-of-time-series-data-and-analysis/)
3. [Working with Time Series](https://jakevdp.github.io/PythonDataScienceHandbook/03.11-working-with-time-series.html)
4. [Basics of Statistical Mean Reversion Testing](https://www.quantstart.com/articles/Basics-of-Statistical-Mean-Reversion-Testing/)
5. [Python Data Science Handbook](https://github.com/jakevdp/PythonDataScienceHandbook)
6. [Time Series Analysis with Pandas](https://www.dataquest.io/blog/tutorial-time-series-analysis-with-pandas)
7. [A Comprehensive Guide to Time Series Analysis](https://www.analyticsvidhya.com/blog/2021/10/a-comprehensive-guide-to-time-series-analysis/)

<!-- 1. [Mean Reversion](https://www.investopedia.com/terms/m/meanreversion.asp) 1.
[Crypto Data using AlphaVantatge.jl](https://dm13450.github.io/2021/03/27/CryptoAlphaVantage.html) 1.[Alpha Vantage Cookbook](https://github.com/prediqtiv/alpha-vantage-cookbook/blob/master/symbol-lists.md) 1. [Download historical data using Alpha Vantage](http://cafim.sssup.it/~giulio/other/alpha_vantage/index.html) -->