# 8/11 Data Science Notes

## I. Univariate Time Series

When you model a univariate time series, you are modeling the changes of a single variable ($X$) over $N$ time steps.

$$
X_{N\times 1}=
\begin{bmatrix} x_1\\x_2\\...\\x_N \end{bmatrix}
$$

## I. 1 Forecaster Model

Forecasting involves taking models fit on historical data and using them to predict future observations.

$$
f(x) = f(\begin{bmatrix} x_1\\x_2\\...\\x_t \end{bmatrix})=[\hat{x}_{t+1}]
$$

### I. 1. 1 Types of Algorithms

1. __Recursive Strategy__
This is a naive approach to building a forecasting model. It recursively appends the previous predicted value to the input window in order to compute the next prediction. This algorithm has a relatively high time complexity, and the accuracy of each prediction degrades as the predicted series grows longer.

\begin{eqnarray*}
f(x) &=& f(\begin{bmatrix} x_1\\x_2\\...\\x_t \end{bmatrix})=\begin{bmatrix} \hat{x}_{t+1}\\\hat{x}_{t+2} \end{bmatrix} \\
&=& [f(\begin{bmatrix} x_1\\x_2\\...\\x_t \end{bmatrix}), \ f(\begin{bmatrix} x_2\\x_3\\...\\\hat{x}_{t+1} \end{bmatrix})]
\end{eqnarray*}

2. __Direct Strategy__
In this algorithm, we create one model for each value in the predicted series. Each model takes the same input vector and predicts one specific future value. The downside of this algorithm is that it is hard to scale and requires a large amount of space to store the multiple models.

$$
f(x) = f(\begin{bmatrix} x_1\\x_2\\...\\x_t \end{bmatrix})=\begin{bmatrix} \hat{x}_{t+1}\\\hat{x}_{t+2} \end{bmatrix} \\
f^{1}(x) = f^{1}(\begin{bmatrix} x_1\\x_2\\...\\x_t \end{bmatrix})=\begin{bmatrix} \hat{x}_{t+1} \end{bmatrix} \\
f^{2}(x) = f^{2}(\begin{bmatrix} x_1\\x_2\\...\\x_t \end{bmatrix})=\begin{bmatrix} \hat{x}_{t+2} \end{bmatrix}
$$

3. __Multi Input Multi Output (MIMO)__
This type of model takes a vector as input and outputs a vector of predicted values. In practice it is achieved with deep learning models rather than classical machine learning models.

![](https://i.stack.imgur.com/atDJQ.png)

### I. 1. 2 Preprocessing

1. __Trajectory Matrix__
This matrix is what is given as input to a univariate time series forecasting model. The values to the left of the line are the input values and the value to the right of the line is the label.

$$
\left(\begin{array}{c c c c|c}
x_{1} & x_{2} & \cdots & x_{t} & x_{t+1} \\
x_{2} & x_{3} & \cdots & x_{t+1} & x_{t+2} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
x_{N-t} & x_{N-t+1} & \cdots & x_{N-1} & x_{N}
\end{array}\right)_{(N-t)\times (t+1)}
$$

2. __Sliding Window__
The sliding window is an algorithm used to separate a long series into shorter, overlapping ranges to analyze. Here, we slide the window over the series and append each window (together with its label) as a row of the trajectory matrix.

$$
W = [\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ ]_{1\times t}
$$

3. __Sample Code__
Two equivalent implementations, one using NumPy and one using plain Python lists:

```python=
import numpy as np

def func(df, t):
    # Build the trajectory matrix: each row is a window of t values
    # followed by its label (the next value).
    ret = np.array([])
    n = len(df)
    for j in range(n-t):
        ret = np.append(ret, np.array(df['value'].iloc[j:j+t+1]), axis=0)
    ret = np.reshape(ret, (n-t, t+1))   # (N-t) rows, (t+1) columns
    return ret
```

```python=
def func(df, t):
    # Same trajectory matrix built with plain Python lists.
    ret = [[0 for x in range(t+1)] for y in range(len(df)-t)]
    n = len(df)
    for j in range(n-t):
        for i in range(t+1):
            ret[j][i] = df['value'].iloc[j+i]
    return ret
```
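For reference, here is a quick sanity check of the NumPy version of `func` above. It assumes a DataFrame with a `value` column; the toy numbers are only for illustration.

```python=
import pandas as pd

# Toy series with N = 6 points and a window size of t = 3
df = pd.DataFrame({'value': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
traj = func(df, 3)

print(traj.shape)   # (3, 4) -> (N - t) rows, (t + 1) columns
print(traj)
# [[1. 2. 3. 4.]
#  [2. 3. 4. 5.]
#  [3. 4. 5. 6.]]
```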
### I. 1. 3 Forecasting Output in UTSAD

#### I. 1. 3. 1 Anomaly Score

Recall:

$$
f(x) = f(\begin{bmatrix} x_1\\x_2\\...\\x_t \end{bmatrix})=\begin{bmatrix} \hat{x}_{t+1}\\\hat{x}_{t+2}\\...\\\hat{x}_{N} \end{bmatrix}
$$

Let the vector $\bar{E}$ be the anomaly score, which could also be used as a loss function. We define the vector $\bar{E}$ as follows:

\begin{eqnarray*}
\bar{E} &=& \begin{bmatrix} E_{t+1}\\E_{t+2}\\...\\E_N \end{bmatrix}_{(N-t)\times 1}\\
&=& abs(\begin{bmatrix} \hat{x}_{t+1}\\\hat{x}_{t+2}\\...\\\hat{x}_{N} \end{bmatrix} - \begin{bmatrix} x_{t+1}\\x_{t+2}\\...\\x_{N} \end{bmatrix})
\end{eqnarray*}

A similar concept applies to the Reconstructor model, which uses an anomaly score matrix $E$. Recall:

$$
f(x) = f(\begin{bmatrix} x_1 \ ... \ x_t\\x_2 \ ... \ x_{t+1}\\...\\x_{N-t+1} \ ... \ x_N \end{bmatrix})=\begin{bmatrix} \hat{x}_{1} \ ... \ \hat{x}_{t}\\\hat{x}_{2} \ ... \ \hat{x}_{t+1}\\...\\\hat{x}_{N-t+1} \ ... \ \hat{x}_N \end{bmatrix}_{(N-t+1)\times t}
$$

The matrix $E$ is then defined as follows:

\begin{eqnarray*}
{E} &=& \begin{bmatrix} E_1 \ ... \ E_t\\E_2 \ ... \ E_{t+1}\\...\\E_{N-t+1} \ ... \ E_N \end{bmatrix}_{(N-t+1)\times t} \\
&=& abs( \begin{bmatrix} \hat{x}_{1} \ ... \ \hat{x}_{t}\\\hat{x}_{2} \ ... \ \hat{x}_{t+1}\\...\\\hat{x}_{N-t+1} \ ... \ \hat{x}_N \end{bmatrix} - \begin{bmatrix} x_1 \ ... \ x_t\\x_2 \ ... \ x_{t+1}\\...\\x_{N-t+1} \ ... \ x_N \end{bmatrix} )
\end{eqnarray*}

These anomaly scores could be used in 2 ways:

1. Filtered through an anomaly **threshold** to directly label each value with 0/1 as an anomaly or a normal instance. The threshold can either be determined through domain knowledge or be set statistically using $\pm \ 3\sigma$ (see the sketch at the end of this section).
2. Fed into a clustering algorithm, where the outliers can be labeled as anomalies with 0/1.

#### I. 1. 3. 2 Training and Testing

* __DO NOT SPLIT into training and testing sets.__ Consider the graph below:

  ![](https://i.imgur.com/CYRhZAz.png)

  If we were to split the series at `06:00`, then the left part of the series would be considered an anomaly. Likewise, if we were to split the series at `20:00`, the rightmost part of the series would be considered an anomaly. **Anomaly detection needs to look at the series as a whole.** Therefore, it is recommended to perform *batch mode* training and testing.
* The only case where it is better to split the series into training and testing data is when a part of the data is considered to have no anomalies. In this context, real-time data is referred to as **streaming data**, while offline data is known as **static data**.

## I. 2 Reconstructor Model

* Unlike the Forecaster Model, the Reconstructor Model attempts to recreate the input series.

$$
f(x) = f(\begin{bmatrix} x_1\\x_2\\...\\x_N \end{bmatrix})=\begin{bmatrix} \hat{x}_{1}\\\hat{x}_{2}\\...\\\hat{x}_{N} \end{bmatrix}
$$

* Most reconstructor models use deep learning methods, specifically autoencoders (AE). This makes them less suitable for product-based development, as most deep learning models require online training.
* **ML-Based Reconstructor Model**
The ML approach to a reconstruction model is very similar to the Direct Strategy for forecasting. However, notice that the output is different.

$$
f(x) = f(\begin{bmatrix} x_1\\x_2\\...\\x_t \end{bmatrix})=\begin{bmatrix} \hat{x}_{1}\\\hat{x}_{2} \end{bmatrix} \\
f^{1}(x) = f^{1}(\begin{bmatrix} x_1\\x_2\\...\\x_t \end{bmatrix})=\begin{bmatrix} \hat{x}_{1} \end{bmatrix} \\
f^{2}(x) = f^{2}(\begin{bmatrix} x_1\\x_2\\...\\x_t \end{bmatrix})=\begin{bmatrix} \hat{x}_{2} \end{bmatrix}
$$
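Before moving on to the multivariate case, here is a minimal sketch of option 1 from I. 1. 3. 1: compute the anomaly score $\bar{E} = abs(\hat{x} - x)$ and flag scores that are more than $3\sigma$ above the mean score. Reading the $\pm 3\sigma$ rule as a bound on the score distribution is an assumption here, and the function and variable names are illustrative rather than part of the framework.

```python=
import numpy as np

def anomaly_labels(x_true, x_pred):
    """Label each forecasted point with 1 (anomaly) or 0 (normal).

    x_true: actual values x_{t+1} ... x_N
    x_pred: forecasted values x_hat_{t+1} ... x_hat_N
    """
    # Anomaly score vector E_bar = abs(x_hat - x)
    e = np.abs(np.asarray(x_pred) - np.asarray(x_true))

    # Statistical threshold: mean score + 3 standard deviations.
    # Scores are absolute errors, so only the upper bound matters here.
    threshold = e.mean() + 3 * e.std()

    return (e > threshold).astype(int)

# Example: eleven accurate forecasts and one large error
actual   = [1.0] * 11 + [6.0]
forecast = [1.0] * 12
print(anomaly_labels(actual, forecast))  # [0 0 0 0 0 0 0 0 0 0 0 1]
```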
## II. Multivariate Time Series

When you model a multivariate time series, you are modeling the changes of $m$ variables, represented by the vector $[\bar{x}_1 \ ... \ \bar{x}_m]$, over $N$ time steps.

$$
X =
\begin{bmatrix} \bar{x}_1\\\bar{x}_2\\...\\\bar{x}_m \end{bmatrix} =
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1N}\\
x_{21} & x_{22} & \cdots & x_{2N}\\
\vdots & \vdots & \ddots & \vdots\\
x_{m1} & \cdots & \cdots & x_{mN}
\end{pmatrix}
\\
\text{where } x_{ij} = \text{the } j^{th} \text{ value of sensor } i
$$

The sliding window of size $t$ used for MTSAD would look like the following:

$$
W = \begin{bmatrix} \ & \ & \ \\ \ & \ & \ \\ \end{bmatrix}_{m \times t}
$$

The sample code to implement the sliding window is as follows (a NumPy version and a plain-Python version):

```python=
import numpy as np

def func(df, t, output_sz):
    # Stack the windows window-by-window: the result has shape
    # (n - t + 1, m, t), i.e. one (m x t) block W per window position.
    # output_sz is not used in this version.
    ret = []
    n = len(df)
    m = len(df.columns)             # number of columns (sensors)
    for j in range(n - t + 1):      # slide the window over the series
        tmp = []
        for k in range(m):
            tmp.append(df.iloc[j:j+t, k])
        ret.append(tmp)
    ret = np.array(ret)
    return ret
```

```python=
def func(df, t, output_sz):
    n = len(df)
    m = len(df.columns)
    # Same windows built with plain Python lists, grouped sensor-first:
    # ret[k][j][i] is the i-th value of the j-th window of sensor k.
    ret = [[[0 for x in range(t)] for y in range(n-t+1)] for z in range(m)]
    for k in range(m):
        for j in range(n-t+1):
            for i in range(t):
                ret[k][j][i] = df.iloc[j+i, k]
    return ret
```

## II. 1 Forecaster Model

$$
f(X) = f(
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1t}\\
x_{21} & x_{22} & \cdots & x_{2t}\\
\vdots & \vdots & \ddots & \vdots\\
x_{m1} & \cdots & \cdots & x_{mt}
\end{bmatrix}) = \begin{bmatrix} \hat{x}_{1(t+1)}\\\hat{x}_{2(t+1)}\\...\\\hat{x}_{m(t+1)} \end{bmatrix} \ \ \text{or} \ \ [\hat{x}_{1(t+1)}]
$$

## II. 2 Reconstructor Model

$$
f(X) = f(
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1N}\\
x_{21} & x_{22} & \cdots & x_{2N}\\
\vdots & \vdots & \ddots & \vdots\\
x_{m1} & \cdots & \cdots & x_{mN}
\end{bmatrix}) =
\begin{pmatrix}
\hat{x}_{11} & \hat{x}_{12} & \cdots & \hat{x}_{1N}\\
\hat{x}_{21} & \hat{x}_{22} & \cdots & \hat{x}_{2N}\\
\vdots & \vdots & \ddots & \vdots\\
\hat{x}_{m1} & \cdots & \cdots & \hat{x}_{mN}
\end{pmatrix}
$$
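As a concrete illustration of the forecaster signature in II. 1, here is a minimal sketch that builds length-$t$ windows from the $m \times N$ matrix and fits one model per sensor to predict $\hat{x}_{i(t+1)}$. Linear regression is only a stand-in (it is not one of the candidate models in these notes), and all names and numbers are illustrative.

```python=
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_mts_forecasters(X, t):
    """X: (m, N) array of m sensors over N time steps.
    Returns one fitted model per sensor; each maps a flattened (m x t)
    window to that sensor's next value."""
    m, N = X.shape
    inputs  = np.array([X[:, j:j+t].ravel() for j in range(N - t)])   # (N-t, m*t)
    targets = np.array([X[:, j+t] for j in range(N - t)])             # (N-t, m)
    return [LinearRegression().fit(inputs, targets[:, i]) for i in range(m)]

def forecast_next(models, window):
    """window: (m, t) array -> length-m vector of one-step-ahead predictions."""
    w = window.ravel().reshape(1, -1)
    return np.array([mdl.predict(w)[0] for mdl in models])

# Illustrative usage on a toy 2-sensor series
X = np.vstack([np.arange(20, dtype=float),          # sensor 1: 0, 1, 2, ...
               2.0 * np.arange(20, dtype=float)])   # sensor 2: 0, 2, 4, ...
models = fit_mts_forecasters(X, t=4)
print(forecast_next(models, X[:, -4:]))  # approximately [20. 40.]
```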
## III. Our Framework: In Detail

![](https://i.imgur.com/cBrLMeS.png)

1. __Preprocessing for Temporal Model__
    * Separate all the variables from the time series.
    * Use the sliding window of size $t$ to build the trajectory matrix for the UTS forecasting model.
2. __Preprocessing for Spatial Model__
    * There are two proposed ways of preprocessing for the spatial model.
    * First, directly use the sliding window algorithm to create a matrix like the UTS trajectory matrix, but without the labels.
    * If the time and memory complexity are too high, to the point of compromising the user experience, we can consider the backup plan: perform a simple arithmetic operation (such as a summation or mean) within a defined range to decrease the number of computations needed.
3. __Temporal Model__
    * The temporal model will perform univariate anomaly detection. We are planning to use **forecaster models**.
    * As of 12/8, we are considering ARIMA and XGBoost as the potential candidates.
4. __Spatial Model__
    * The spatial model will perform multivariate anomaly detection. We are planning to use **unsupervised models**.
    * As of 12/8, we are considering One-Class SVM, Isolation Forest, DBSCAN, and KNN as potential candidates.
5. __Postprocessing for Temporal Model__
    * We hope that the temporal model will output an anomaly score (the data type is usually a float).
    * Given the anomaly score, we would use $\pm3\sigma$ as the threshold function to determine whether that timestamp is considered an anomaly.
    * If it is an anomaly, we write 1; if not, we write 0. In the end, we would have $N-t+1$ vectors, each containing $m$ variables.
6. __Postprocessing for Spatial Model__
    * Because the temporal model's output starts at index $t+1$, we need to delete the first $t$ values of the spatial model's output.
7. __Concatenate__
    * After the postprocessing, both outputs have the same format and length, so we can combine them into one matrix of size $(m+1)\times(N-t+1)$ (a minimal sketch of steps 7–8 appears at the end of these notes).
8. __Clustering Model__
    * The vector for each timestamp is used as the input for a clustering model.
    * We have yet to determine which clustering algorithm we would like to use.
9. __Output__
    * The final output will be a list of ranges where the anomalies lie.
    * It will then be visualized using Tableau.

###### tags: `Notes`
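Referenced in step 7 above: a minimal sketch of the concatenation and clustering steps (7–8). Because the clustering algorithm is still undecided, DBSCAN and its parameter values are placeholders only, and the input labels are random stand-ins for the real temporal/spatial outputs.

```python=
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical inputs: 0/1 labels from the temporal model, shape (m, N-t+1),
# and 0/1 labels from the spatial model, shape (1, N-t+1), aligned in step 6.
# Random values are used here only so the sketch runs end to end.
m, n_steps = 3, 100
temporal_labels = np.random.randint(0, 2, size=(m, n_steps))
spatial_labels  = np.random.randint(0, 2, size=(1, n_steps))

# Step 7: concatenate into one (m+1) x (N-t+1) matrix
combined = np.vstack([temporal_labels, spatial_labels])

# Step 8: cluster the per-timestamp vectors (one column per timestamp)
clusters = DBSCAN(eps=0.5, min_samples=5).fit_predict(combined.T)

# Points DBSCAN marks as noise (-1) are treated as the anomalous timestamps
anomalous_timestamps = np.where(clusters == -1)[0]
print(anomalous_timestamps)
```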