# Notes on "[Abnormal Event Detection in Videos using Spatio Temporal Autoencoder](https://arxiv.org/pdf/1701.01546.pdf)"

###### tags: `Video anomaly detection` `Spatiotemporal auto-encoders`

## Introduction

The authors propose a new architecture for anomaly detection in videos. The architecture has two main components: one for spatial feature representation and one for learning the temporal evolution of those spatial features.

## Principle of working

The method is based on the principle that frames containing an abnormality will differ significantly from the previous frames.

## Methodology

### Pre-processing

1) Each frame is extracted from the raw videos and resized to 227×227.
2) The global mean image is calculated by averaging the pixel values at each location over every frame in the training dataset.
3) Each image is scaled between 0 and 1 and the global mean image is subtracted.
4) These images are then converted to gray-scale.

The processed images are then normalized to have zero mean and unit variance.

### Architecture

![STAE architecture](https://i.imgur.com/kFU3wNr.png)

### Regularity scores

The reconstruction error of all pixel values in frame $t$ of the video sequence is taken as the Euclidean distance between the input frame and the reconstructed frame:

$e(t) = ||x(t)-f_{W}(x(t))||_{2}$

where $f_{W}$ denotes the trained spatiotemporal model with weights $W$.

Abnormality score: $S_a(t) = \frac{e(t)-e(t)_{min}}{e(t)_{max}}$

The minimum and maximum are calculated over the window from $t$ to $t+k$; $k$ was chosen to be 50 in the original paper.

Regularity score: $S_r(t) = 1 - S_a(t)$

### Anomaly detection

The reconstruction error of each frame determines whether the frame is classified as anomalous: a frame is flagged when its error exceeds a chosen threshold. The threshold determines how sensitive we wish the detection system to be; a lower threshold flags more frames and therefore triggers more alarms.
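
The scoring and thresholding steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names, the `(T, H, W)` array layout, the inclusive window `[t, t+k]` that is truncated at the end of the sequence, and the guard against a zero maximum are all assumptions made for this example.

```python
import numpy as np


def reconstruction_error(frames, reconstructions):
    """e(t): Euclidean distance between each input frame and its reconstruction.

    Both arrays are assumed to have shape (T, H, W), one slice per frame.
    """
    frames = np.asarray(frames, dtype=float)
    reconstructions = np.asarray(reconstructions, dtype=float)
    diff = (frames - reconstructions).reshape(len(frames), -1)
    return np.linalg.norm(diff, axis=1)


def abnormality_scores(errors, k=None):
    """S_a(t) = (e(t) - e_min) / e_max.

    Assumption: min and max are taken over the window [t, t+k], truncated at
    the end of the sequence. With k=None the whole sequence is used instead.
    """
    errors = np.asarray(errors, dtype=float)
    scores = np.empty(len(errors))
    for t in range(len(errors)):
        window = errors if k is None else errors[t:t + k + 1]
        e_min, e_max = window.min(), window.max()
        # Guard (assumption): avoid division by zero on perfect reconstructions.
        scores[t] = (errors[t] - e_min) / e_max if e_max > 0 else 0.0
    return scores


def regularity_scores(errors, k=None):
    """S_r(t) = 1 - S_a(t)."""
    return 1.0 - abnormality_scores(errors, k)


def flag_anomalies(errors, threshold):
    """Boolean mask: True where the reconstruction error exceeds the threshold."""
    return np.asarray(errors, dtype=float) > threshold
```

For example, `regularity_scores(reconstruction_error(x, x_hat), k=50)` would give one regularity value per frame for a clip `x` and its reconstruction `x_hat`, and `flag_anomalies(reconstruction_error(x, x_hat), threshold)` would mark the frames to report. The threshold value itself is left to the user, since it trades sensitivity against false alarms.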