Notes on "Abnormal Event Detection in Videos using Spatio Temporal Autoencoder"

tags: Video anomaly detection, Spatiotemporal auto-encoders

Introduction

The authors propose a new architecture for anomaly detection in videos. It has two main components: one for spatial feature representation, and one for learning the temporal evolution of the spatial features.

Principle of working

The method is based on the principle that the frames containing an abnormality will be significantly different from the previous frames.

Methodology

Pre-processing

  1. Each frame is extracted from the raw videos and resized to 227×227.
  2. The global mean image is calculated by averaging the pixel values at each location of every frame in the training dataset.
  3. Each image is scaled between 0 and 1 and the global mean is subtracted.
  4. These images are then converted to gray-scale. The processed images are then normalized to have zero mean and unit variance.
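
The steps above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: it assumes the frames are already resized and stacked into a `uint8` array of shape `(N, H, W, 3)`, that the global mean is taken on the scaled values, that gray-scale conversion uses the common luminance weights, and that the final normalization is computed over the whole set. The function name `preprocess` is hypothetical.

```python
import numpy as np

def preprocess(frames, global_mean=None):
    """Sketch of the pre-processing steps (assumptions noted in comments).

    frames: uint8 array of shape (N, H, W, 3), assumed already resized.
    Returns (processed, global_mean) so the training mean can be reused.
    """
    # Scale pixel values to [0, 1].
    x = frames.astype(np.float64) / 255.0
    # Global mean image: per-pixel average over all frames
    # (assumption: computed on the scaled values).
    if global_mean is None:
        global_mean = x.mean(axis=0)
    x = x - global_mean
    # Gray-scale conversion with luminance weights (an assumption;
    # the notes do not say which conversion is used).
    gray = x @ np.array([0.299, 0.587, 0.114])  # shape (N, H, W)
    # Normalize to zero mean and unit variance over the whole set
    # (assumption); the small constant avoids division by zero.
    gray = (gray - gray.mean()) / (gray.std() + 1e-8)
    return gray, global_mean
```

At test time one would presumably pass the `global_mean` returned for the training set, so that test frames are centered with the training statistics.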

Architecture

[Figure: architecture diagram (image not available)]

Regularity scores

Reconstruction error of all pixel values in frame t of the video sequence is taken as the Euclidean distance between the input frame and the reconstructed frame:

$$e(t) = \lVert x(t) - f_W(x(t)) \rVert_2$$
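
As a small illustration of this formula, the sketch below computes the error for one frame given its reconstruction. The trained model is not part of this example: `reconstruction` stands in for the model's output on `frame`, and the function name is hypothetical.

```python
import numpy as np

def reconstruction_error(frame, reconstruction):
    """Euclidean distance between a frame and its reconstruction,
    taken over all pixel values."""
    diff = np.asarray(frame, dtype=np.float64) - np.asarray(reconstruction, dtype=np.float64)
    # Flatten so the norm is the plain L2 norm over every pixel.
    return float(np.linalg.norm(diff.ravel()))
```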

Abnormality score:

$$S_a(t) = \frac{e(t) - e(t)_{\min}}{e(t)_{\max}}$$

The minimum and maximum are calculated over the frames from t to t+k.
k was chosen to be 50 in the original paper.

Regularity score:

$$S_r(t) = 1 - S_a(t)$$
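
The two scores can be sketched in plain Python as below. This follows the windowing as described in these notes (minimum and maximum over frames t to t+k), which is one reading of the text. The window is truncated at the end of the sequence, and a zero maximum yields a score of 0; both are assumptions made here to keep the example well defined. The function names are hypothetical.

```python
def abnormality_scores(errors, k=50):
    """Abnormality score per frame from a list of reconstruction errors."""
    scores = []
    for t in range(len(errors)):
        # Frames t .. t+k, truncated at the end of the sequence (assumption).
        window = errors[t:t + k + 1]
        e_min, e_max = min(window), max(window)
        # Guard against division by zero (assumption: score 0 in that case).
        scores.append((errors[t] - e_min) / e_max if e_max > 0 else 0.0)
    return scores

def regularity_scores(errors, k=50):
    """Regularity score: one minus the abnormality score."""
    return [1.0 - s for s in abnormality_scores(errors, k)]
```

For example, with errors `[2.0, 4.0, 1.0, 4.0]` and `k=1`, the second frame gets an abnormality score of (4 − 1) / 4 = 0.75 and therefore a regularity score of 0.25.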

Anomaly detection

The reconstruction error of each frame determines whether the frame is classified as anomalous: the error is compared against a threshold, and the threshold determines how sensitive the detection system is.
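
A minimal sketch of this decision rule, assuming a frame is flagged when its reconstruction error is strictly greater than the threshold. The function name and the threshold value in the usage example are illustrative and do not come from the paper.

```python
def detect_anomalies(errors, threshold):
    """Return the indices of frames whose reconstruction error
    exceeds the threshold (assumed rule: error > threshold)."""
    return [t for t, e in enumerate(errors) if e > threshold]

# Illustrative values only.
flagged = detect_anomalies([0.1, 0.9, 0.3], threshold=0.5)
```

Lowering the threshold flags more frames (more sensitive, more false alarms); raising it flags fewer.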