---
type: slide
tags: presentation
slideOptions:
  theme: white
  transition: 'fade'
  allottedMinutes: 45
---

<style>
.cols{
  display: flex;
}
.col {
  flex: 1;
}
</style>

# Practical Time Series Analysis

---

<!-- [TOC]

---  -->

## What are Time Series?

- Data points indexed in time order
- Time introduces a unique set of problems
![](https://i.imgur.com/D6jIqth.png)

Note:
- Most of our data are time series
- Why is it difficult? Time importance, speed, shift, amplitude...
- How long, and how many data points in the past? Is it changing?

---

## Time Series Tasks

- Forecasting
- Anomaly detection
- Clustering
- Classification

Note:
- What is the most important step in the analysis?

---

## Data Preparation


<div class="cols">
<div class="col">

- The longest and most important step
    - Garbage in = garbage out!
- Try to gather some domain knowledge
- It is always application specific
    

</div>
<div class="col">
    <img style="float: right;" src="https://www.oreilly.com/api/v2/epubs/9781788838535/files/assets/2f80aca9-1de1-420d-8633-eb6f7e987398.jpg">
</div>
</div>

Note:
- A well-trained monkey can run fit & predict
- Be aware of what your goal is
- Scaling - mean, median... did I just lose some information?
- Arcsinh
- Often much harder than the other steps...
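
A minimal sketch of the two scaling ideas from the notes, using only the standard library. The sample values and the function names `robust_scale` and `arcsinh_scale` are made up for illustration; median/IQR scaling is one common robust choice, not the only one.

```python
import math
import statistics

def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = (q3 - q1) or 1.0  # fall back to 1.0 for a constant series
    med = statistics.median(values)
    return [(v - med) / iqr for v in values]

def arcsinh_scale(values):
    """Compress large magnitudes: near-linear around 0, log-like in the tails."""
    return [math.asinh(v) for v in values]

series = [1.0, 2.0, 3.0, 2.0, 1.0, 250.0, -3.0]  # made-up data with one spike
print(robust_scale(series))
print(arcsinh_scale(series))
```

Unlike a plain log transform, arcsinh is defined for zero and negative values and keeps their sign and order.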

---

### Feature Extraction

- Constructing combinations of the variables to get around the time dimension
- We can combine the extracted features with the raw TS data

![](https://github.com/blue-yonder/tsfresh/raw/main/docs/images/introduction_ts_exa_features.png =600x)

Note:
- Many times this is the best approach as we "know" the features
- Usually creating heterogeneous data
- Combination with TS is difficult
- Usage of NN to extract features (transformers, etc.)
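
A hand-rolled sketch of window-based feature extraction, assuming a fixed window size and step. The feature set (mean, std, min, max, slope) and the names `window_features` and `extract` are illustrative choices; libraries such as tsfresh compute far more features than this.

```python
import statistics

def window_features(window):
    """Summarize one window of a series into a flat feature dict."""
    n = len(window)
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(window)
    # Least-squares slope of the values against the time index 0..n-1.
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(window))
    var = sum((t - t_mean) ** 2 for t in range(n))
    return {
        "mean": y_mean,
        "std": statistics.pstdev(window),
        "min": min(window),
        "max": max(window),
        "slope": cov / var if var else 0.0,
    }

def extract(series, size, step):
    """Slide a fixed-size window over the series and featurize each window."""
    return [window_features(series[i:i + size])
            for i in range(0, len(series) - size + 1, step)]

rows = extract([1, 2, 3, 4, 5, 6, 7, 8], size=4, step=2)
print(rows[0])
```

Each row is an ordinary feature vector, so any tabular model can consume it.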

---

## Machine Learning Tasks

- Occam's razor
- Proper definition is key
- Start simple and advance gradually
- Try multiple ML families - there is no single best model

Note:
- People tend to blindly follow Internet guides
- Heterogeneous vs homogeneous data

---

## Anomaly Detection

<div class="cols">
<div class="col">

- What are anomalies?
- Time series anomalies
    - Point outliers
    - Sequence outliers
    - Shape/behavior outliers

<sub>https://arxiv.org/pdf/2002.04236.pdf</sub>
</div>
<div class="col">
    
![](https://i.imgur.com/lzR0oXL.png =400x)
![](https://i.imgur.com/nxX8Cxg.png =400x)
     
</div>
</div>


Note:
- It's hard because anomalies are rare
- Recall-precision trade-off
- Usually, no one can tell you if it's an anomaly
- Try to gather some "human" data

---

### Anomaly Detection Methods

- STL decomposition
- Tree & density approaches (Isolation Forest, LODA)
- Neural Networks (Auto-Encoders)
- Forecasting

![](https://i.imgur.com/MQRmbCn.png =x300)


Note:
- STL - robust, simple, well studied; changes like Corona
- Tree and density - not really for time series
- Auto-Encoders - NN problems (more on that later)
- Forecasting - error accumulation, is my forecast really good?
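
A simplified sketch of decomposition-based detection. This is not STL itself (there is no LOESS smoothing and no trend component): it only subtracts a per-phase median profile and flags large residuals. The period, the threshold `k` and the synthetic series are assumptions for illustration.

```python
import statistics

def seasonal_residuals(series, period):
    """Subtract the per-phase median (a crude seasonal profile) from the series."""
    profile = [statistics.median(series[phase::period]) for phase in range(period)]
    return [y - profile[i % period] for i, y in enumerate(series)]

def flag_anomalies(residuals, k=3.0):
    """Indices whose residual is more than k MADs away from the median residual."""
    med = statistics.median(residuals)
    mad = statistics.median([abs(r - med) for r in residuals]) or 1e-9
    return [i for i, r in enumerate(residuals) if abs(r - med) / mad > k]

pattern = [10, 14, 18, 14]
# Repeating pattern plus small deterministic wiggle, with one injected spike.
series = [pattern[i % 4] + 0.1 * ((i * 7) % 5 - 2) for i in range(24)]
series[13] += 8

print(flag_anomalies(seasonal_residuals(series, period=4)))
```

A real STL fit would also remove a slowly changing trend before the residuals are thresholded.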

---

### Isolation Forest

<div class="cols">
<div class="col">

- Binary tree
- Anomalies are "easy" to isolate
- Random orthogonal boundaries
- Scale invariance

</div>
    

<div class="col">
    
![](https://upload.wikimedia.org/wikipedia/commons/8/84/Normalized_Anomaly_Scores_of_Isolation_Forest.png =300x)
![](https://i.imgur.com/rPATR5P.png =300x)
</div>
</div>
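
A toy sketch of the isolation idea, assuming 2-D points and a depth cap of 10. It follows only the branch that contains the query point and reports the mean isolation depth; the original algorithm additionally normalizes depths into an anomaly score, which is omitted here. The names `path_length` and `mean_depth` are illustrative.

```python
import random

def path_length(point, data, rng, depth=0, max_depth=10):
    """Depth at which `point` is isolated by random axis-aligned splits."""
    if depth >= max_depth or len(data) <= 1:
        return depth
    dim = rng.randrange(len(point))
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the side of the split that contains the point.
    if point[dim] < split:
        side = [row for row in data if row[dim] < split]
    else:
        side = [row for row in data if row[dim] >= split]
    return path_length(point, side, rng, depth + 1, max_depth)

def mean_depth(point, data, n_trees=200, sample_size=256, seed=0):
    """Average isolation depth over many random subsamples ("trees")."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        sample = rng.sample(data, min(sample_size, len(data)))
        total += path_length(point, sample, rng)
    return total / n_trees

gen = random.Random(42)
data = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)]
outlier = (8.0, 8.0)
data.append(outlier)

outlier_depth = mean_depth(outlier, data)
inlier_depth = mean_depth((0.0, 0.0), data)
print(outlier_depth, inlier_depth)  # the outlier is isolated in far fewer splits
```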

---

### LODA: Lightweight On-line Detector of Anomalies 

<div class="cols">
<div class="col">

- Random projections and histograms ([1])
- Very fast and scalable
- Randomness adds robustness
- Anomaly explanation

</div>


<div class="col">
    
![](https://i.imgur.com/WnxH8Q6.png =350x)

</div>
</div>

[1]: http://agents.fel.cvut.cz/stegodata/pdfs/Pev15-Loda.pdf
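
A minimal LODA-style sketch: sparse random projections, one equal-width histogram per projection, and a score equal to the average negative log-density. The class name `Loda`, the bin count, the number of projections and the use of about sqrt(d) non-zero weights per projection are assumptions for illustration, not the reference implementation.

```python
import math
import random

class Loda:
    """Toy detector: sparse random projections plus 1-D histograms."""

    def __init__(self, n_projections=50, n_bins=10, seed=0):
        self.n_projections = n_projections
        self.n_bins = n_bins
        self.rng = random.Random(seed)

    @staticmethod
    def _project(w, x):
        return sum(wi * xi for wi, xi in zip(w, x))

    def fit(self, data):
        dim = len(data[0])
        n_nonzero = max(1, round(math.sqrt(dim)))  # assumed sparsity level
        self.models = []
        for _ in range(self.n_projections):
            w = [0.0] * dim
            for j in self.rng.sample(range(dim), n_nonzero):
                w[j] = self.rng.gauss(0.0, 1.0)
            z = [self._project(w, x) for x in data]
            lo, hi = min(z), max(z)
            width = (hi - lo) / self.n_bins or 1.0
            counts = [0] * self.n_bins
            for v in z:
                counts[min(int((v - lo) / width), self.n_bins - 1)] += 1
            self.models.append((w, lo, width, counts, len(z)))
        return self

    def score(self, x):
        """Average negative log-density over projections; higher = more anomalous."""
        total = 0.0
        for w, lo, width, counts, n in self.models:
            pos = (self._project(w, x) - lo) / width
            inside = 0 <= pos <= self.n_bins
            count = counts[min(int(pos), self.n_bins - 1)] if inside else 0
            density = (count + 1e-3) / (n * width)  # small floor avoids log(0)
            total -= math.log(density)
        return total / len(self.models)

gen = random.Random(1)
train = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)]
detector = Loda().fit(train)
print(detector.score((0.0, 0.0)), detector.score((8.0, 8.0)))
```

Because every projection contributes its own term to the score, comparing the per-projection terms is what makes the "anomaly explanation" possible.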

---

## Forecasting

- Predicting the future from historical values
- A good error metric is essential!
- Can we predict a "random" series?
 
![](https://www.bounteous.com/sites/default/files/styles/insight_550/public/sample-time-seriese-analysis.png?itok=jUaQIXzT =350x)


Note:
- Practical things
    - Why do we want to forecast?
    - Chaotic and random time series
    - Do we have all the data?
    - Does our time series depend 
- A good error metric is essential!
- We don't need a good prediction when nothing is happening
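
A short sketch of three common forecast error metrics, assuming plain Python lists. The sample numbers are made up; which metric is "good" depends on the application.

```python
def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def smape(actual, forecast):
    """Symmetric MAPE in percent; terms with a zero denominator count as 0."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = abs(a) + abs(f)
        total += 2 * abs(a - f) / denom if denom else 0.0
    return 100 * total / len(actual)

def mase(actual, forecast, history, season=1):
    """MAE scaled by the in-sample MAE of the (seasonal) naive forecast."""
    naive_errors = [abs(history[i] - history[i - season])
                    for i in range(season, len(history))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae(actual, forecast) / scale

history = [10, 12, 11, 13, 12, 14]  # made-up training data
actual = [13, 15]
forecast = [14, 14]

print(mae(actual, forecast), smape(actual, forecast), mase(actual, forecast, history))
```

A MASE below 1 means the forecast beats the naive "repeat the last value" baseline on average.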

---

### Model Validation

- Don't use the future to predict the future!
- Watch for big changes in the data (change points)
- Cross-validation & moving window

![](https://www.researchgate.net/publication/341618027/figure/fig4/AS:895046874329097@1590407212999/Time-Series-Cross-Validation.pp =800x)
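
A sketch of forward-chaining splits in which the test block always lies after the training data. The function name and parameters are illustrative; `max_train_size` switches from an expanding window to a moving window.

```python
def time_series_splits(n_samples, n_splits, test_size, max_train_size=None):
    """Yield (train_indices, test_indices); the test block is always in the future."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        start = first_test_start + k * test_size
        train_start = 0 if max_train_size is None else max(0, start - max_train_size)
        yield list(range(train_start, start)), list(range(start, start + test_size))

for train, test in time_series_splits(10, n_splits=3, test_size=2):
    print(train, test)
```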

---

### Forecasting Approaches

- STL decomposition
    - Auto ARIMA
    - Facebook Prophet
- Classic ML
    - Time Frame Shifting
- Neural Networks
    - Recurrent NN and LSTM
    - CNN (SCINet)
    - Decomposition (N-BEATS)
    - Transformers (Informer, etc.) vs simple linear baselines (LTSF-Linear)

Note:
- Stability
- Cost vs effectiveness
- Always fact-check with a simple model
- Heterogeneous vs homogeneous data
- Don't use a tank to kill a fly
- NN practical problems - Number of parameters (LSTM)
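
A sketch of time frame shifting (lag features) together with a seasonal naive baseline for fact-checking. The function names and the toy series are illustrative.

```python
def make_lagged(series, n_lags):
    """Turn a series into (X, y): the previous n_lags values predict the next one."""
    X = [series[i - n_lags:i] for i in range(n_lags, len(series))]
    y = [series[i] for i in range(n_lags, len(series))]
    return X, y

def seasonal_naive(history, horizon, season):
    """Baseline forecast: repeat the last observed season."""
    last_season = history[-season:]
    return [last_season[h % season] for h in range(horizon)]

X, y = make_lagged([1, 2, 3, 4, 5], n_lags=2)
print(X, y)
print(seasonal_naive([1, 2, 3, 4, 5, 6], horizon=4, season=3))
```

The `(X, y)` pairs can be passed to any classic regressor; if it cannot beat `seasonal_naive`, the added complexity is not paying off.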

---

### Facebook Prophet

- Decomposing TS as $y(t) = g(t) + s(t) + h(t) + \epsilon_t$ (trend, seasonality, holidays, noise)

![](https://i.imgur.com/WbQaW5G.png =800x)

Note:
- Start with Auto ARIMA
- Simple and automatic holiday support
- Kinda slow
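
The additive structure can be illustrated by building a synthetic series from its components. This is not Prophet's fitting procedure, only the shape of the model: a piecewise-linear trend, a Fourier-style seasonal term, a holiday effect and noise. All numeric values are made up.

```python
import math
import random

def trend(t):
    """g(t): piecewise-linear growth with one changepoint at t = 60."""
    return 0.5 * t if t < 60 else 30 + 0.1 * (t - 60)

def seasonality(t):
    """s(t): weekly cycle written as a single Fourier term."""
    return 3.0 * math.sin(2 * math.pi * t / 7)

def holiday(t, days=frozenset({24, 25})):
    """h(t): fixed bump on the listed (hypothetical) holiday days."""
    return 10.0 if t in days else 0.0

rng = random.Random(0)
noise = [rng.gauss(0, 0.5) for _ in range(90)]  # the noise term
y = [trend(t) + seasonality(t) + holiday(t) + noise[t] for t in range(90)]
print(y[:5])
```

Fitting reverses this process: given only `y`, estimate each component so that it can be inspected and extrapolated separately.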

---

### N-BEATS

![](https://i.imgur.com/prKb50R.png =800x)

Note:
- Quite simple NN
- We can check output of every layer
- Still, we need to scale our data

---

## Classification & Clustering

- DTW, shape-based approaches, Hidden Markov Model
- Other ML approaches using DTW as metric or using DTW distance matrix

![](https://i.imgur.com/6KkrVdc.png)

Note:
- 1NN DTW - surprisingly good and simple
- Time & space complexity
- Neither a metric nor differentiable!
- sDTW (soft-DTW) - usage with NN
- HMM - complexity
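
A sketch of the classic dynamic-programming DTW and a 1NN classifier on top of it, assuming 1-D series and an absolute-difference cost. The quadratic table makes the time and space complexity from the notes visible; the labeled examples are made up.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def classify_1nn(query, labeled_series):
    """Return the label of the training series closest to `query` under DTW."""
    return min(labeled_series, key=lambda pair: dtw_distance(query, pair[0]))[1]

train = [([0, 1, 2, 1, 0], "peak"), ([2, 1, 0, 1, 2], "valley")]
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))          # 0.0: same shape, different speed
print(classify_1nn([0, 0, 1, 2, 2, 1, 0], train))     # peak
```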

---

## Interesting Links

- Feature Extraction - [tsfresh](https://github.com/blue-yonder/tsfresh)
- Forecasting:
    - [Sktime](https://github.com/sktime/sktime) & [tslearn](https://tslearn.readthedocs.io/en/stable/)
    - Nixtla repositories 
        - [Stats Forecast](https://github.com/Nixtla/statsforecast)
        - [Neural Forecast](https://github.com/Nixtla/neuralforecast)
    - Auto Arima - [pmdarima](https://alkaline-ml.com/pmdarima/)

---

Thank you for your attention!
Questions?

