# Notes on "[Abnormal Event Detection in Videos using Spatiotemporal Autoencoder](https://arxiv.org/pdf/1701.01546.pdf)"
###### tags: `Video anomaly detection` `Spatiotemporal auto-encoders`
## Introduction
The authors propose a new architecture for anomaly detection in videos. It has two main components: one for spatial feature representation, and one for learning the temporal evolution of those spatial features.
## Principle of working
The method is based on the principle that the frames containing an abnormality will be significantly different from the previous frames.
## Methodology
### Pre-processing
1) Each frame is extracted from the raw videos and resized to 227×227.
2) The global mean image is calculated by averaging the pixel values at each location of every frame in the training dataset.
3) The pixel values of each image are scaled to the range 0 to 1, and the global mean image is subtracted.
4) These images are then converted to grayscale. The processed images are then normalized to have zero mean and unit variance.
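
The steps above can be sketched in NumPy roughly as follows. This is a minimal sketch, not the authors' code: it assumes the frames are already resized and stored as `uint8` RGB arrays, it uses BT.601 luma weights for the grayscale conversion, and it normalizes over the whole batch. The notes do not specify any of these choices, and the function names are made up for illustration.

```python
import numpy as np

# BT.601 luma weights: one common RGB-to-grayscale choice (an assumption;
# the notes do not say which conversion is used).
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def global_mean_image(train_frames):
    """Step 2: per-pixel mean over all training frames, on the [0, 1] scale."""
    return (train_frames.astype(np.float64) / 255.0).mean(axis=0)

def preprocess(frames, mean_image):
    """Steps 3 and 4 for frames of shape (N, H, W, 3), dtype uint8."""
    scaled = frames.astype(np.float64) / 255.0   # scale to [0, 1]
    centered = scaled - mean_image               # subtract the global mean image
    gray = centered @ LUMA_WEIGHTS               # (N, H, W, 3) -> (N, H, W)
    # Zero mean, unit variance over the whole batch (assumed granularity).
    return (gray - gray.mean()) / (gray.std() + 1e-8)

# Tiny random frames stand in for real 227x227 video frames.
rng = np.random.default_rng(0)
demo_frames = rng.integers(0, 256, size=(4, 8, 8, 3), dtype=np.uint8)
demo_out = preprocess(demo_frames, global_mean_image(demo_frames))
```

In practice the mean image would be computed once on the training set and reused for test frames.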
### Architecture
![STAE architecture](https://i.imgur.com/kFU3wNr.png)
### Regularity scores
The reconstruction error of all pixel values in frame $t$ of the video sequence is taken as the Euclidean distance between the input frame $x(t)$ and the reconstructed frame $f_{W}(x(t))$, where $f_{W}$ is the trained model with weights $W$:
$e(t) = ||x(t)-f_{W}(x(t))||_{2}$
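
As a rough illustration of this formula, the per-frame error can be computed by flattening each frame and taking the L2 norm of the difference. The sketch assumes the reconstructions are already available as an array with the same shape as the input; `reconstruction_error` is a hypothetical helper name, and the model itself is not shown.

```python
import numpy as np

def reconstruction_error(frames, reconstructed):
    """e(t) for each frame: L2 norm of (input - reconstruction) over all pixels.

    frames, reconstructed: arrays of shape (N, H, W) or (N, H, W, C).
    Returns an array of shape (N,).
    """
    frames = np.asarray(frames, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    diff = (frames - reconstructed).reshape(frames.shape[0], -1)
    return np.linalg.norm(diff, axis=1)

# Two 2x2 frames of zeros "reconstructed" as all ones: each error is sqrt(4) = 2.
demo_errors = reconstruction_error(np.zeros((2, 2, 2)), np.ones((2, 2, 2)))
```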
Abnormality score:
$S_a(t) = \frac{e(t)-e(t)_{min}}{e(t)_{max}}$
The minimum and maximum are computed over the window of frames from $t$ to $t+k$; $k$ was chosen to be 50 in the original paper.
Regularity score:
$S_r(t) = 1 - S_a(t)$
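
A small sketch of the two scores, applied to the errors of one window of frames. It assumes the caller passes in the errors for the window in question and that the largest error is positive; the function names are hypothetical.

```python
import numpy as np

def abnormality_score(errors):
    """S_a = (e - e_min) / e_max, with min and max taken over the given window."""
    errors = np.asarray(errors, dtype=np.float64)
    return (errors - errors.min()) / errors.max()  # assumes errors.max() > 0

def regularity_score(errors):
    """S_r = 1 - S_a."""
    return 1.0 - abnormality_score(errors)

window_errors = [1.0, 2.0, 4.0]
s_a = abnormality_score(window_errors)   # [0.0, 0.25, 0.75]
s_r = regularity_score(window_errors)    # [1.0, 0.75, 0.25]
```

Note that with this formula the frame with the smallest error gets $S_a = 0$, but the frame with the largest error gets $1 - e_{min}/e_{max}$ rather than exactly 1, because the denominator is $e_{max}$ and not $e_{max} - e_{min}$.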
### Anomaly detection
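The reconstruction error of each frame determines whether the frame is classified as anomalous: a frame is flagged when its error crosses a chosen threshold. The threshold determines how sensitive we wish the detection system to be. A low threshold makes the system more sensitive to what happens in the scene, so more alarms are triggered.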
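
The effect of the threshold can be illustrated with a few lines of NumPy. This is only a sketch under the assumption that a frame is flagged when its error (or abnormality score) is strictly greater than the threshold; the values below are made up.

```python
import numpy as np

def flag_anomalies(errors, threshold):
    """Boolean mask: True where the frame's error exceeds the threshold."""
    return np.asarray(errors, dtype=np.float64) > threshold

frame_errors = [0.1, 0.5, 0.9]                       # made-up values
low_threshold = flag_anomalies(frame_errors, 0.4)    # flags two frames
high_threshold = flag_anomalies(frame_errors, 0.8)   # flags one frame
```

Lowering the threshold from 0.8 to 0.4 flags an extra frame, which is the sensitivity trade-off described above.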