# Video Stabilization
## Introduction
Video stabilization is the process of estimating the undesired camera motion and warping the images to compensate for it. For example, videos taken by hand-held or smartphone cameras are often shaky. Stabilization can be done in hardware, e.g. modern cameras using OIS (Optical Image Stabilization). Besides that, much research applies algorithms to counter undesired vibrations, e.g. deep neural networks such as CNNs. In this blog we will not discuss deep neural networks, and instead focus on classic techniques for this problem.
There are four main steps in the video stabilization process: motion estimation, motion compensation (or smoothing), image warping, and finally cropping to remove the empty regions at the borders of the frames.
We will walk through each step and discuss the algorithms, along with a bit of the mathematics behind them. Note that all of the solutions we present are offline methods; if you want to learn about a real-time method, you can read more in [MeshFlow](http://www.liushuaicheng.org/eccv2016/). Now let's get started.
## Motion Estimation
Vibrations cause features to move abnormally between frames. In order to reduce these movements, we first estimate the motion of features in the current frame. This step consists of two smaller processes: detecting features and estimating the transformation matrix between two adjacent frames. Luckily for us, both are already implemented in OpenCV, but we should still understand them clearly.
### Feature Detection
We can use Harris corner detection, or SURF, SIFT, etc., to detect features. SURF and SIFT are more accurate than Harris (though generally slower), but to keep things easy to understand, we'll outline the main idea of a basic feature detection algorithm: Harris.
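To make the idea concrete, here is a minimal, self-contained sketch of the Harris response map in plain NumPy (a box window stands in for the usual Gaussian weighting; the function name and parameters are illustrative, not OpenCV's API):

```python
import numpy as np

def harris_response(img, k=0.04, window=3):
    """Harris corner response R = det(M) - k * trace(M)^2, where M is the
    structure tensor summed over a small window around each pixel."""
    img = img.astype(np.float64)
    # Image gradients (central differences)
    Ix = np.zeros_like(img)
    Iy = np.zeros_like(img)
    Ix[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2.0
    Iy[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2.0
    # Products of gradients
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    # Sum each product over a window (box filter instead of a Gaussian)
    pad = window // 2
    def box_sum(a):
        a = np.pad(a, pad)
        out = np.zeros_like(img)
        for dy in range(window):
            for dx in range(window):
                out += a[dy:dy + img.shape[0], dx:dx + img.shape[1]]
        return out
    Sxx, Syy, Sxy = box_sum(Ixx), box_sum(Iyy), box_sum(Ixy)
    # Large positive response at corners, negative on edges, ~0 in flat areas
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2
```

Thresholding this response map (plus non-maximum suppression) yields the detected corners; in practice one would call `cv2.cornerHarris` or `cv2.goodFeaturesToTrack` instead.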
### Transformation Matrix Expectation
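As a sketch of this step, the transformation between two adjacent frames can be estimated from matched feature points by least squares. The helper below fits a full affine model in NumPy (the function name and the choice of model are illustrative; in OpenCV one would typically use a robust estimator such as `cv2.estimateAffinePartial2D`, which also handles outlier matches):

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares affine transform mapping src points to dst points.
    src, dst: (N, 2) arrays of matched feature coordinates, N >= 3.
    Returns a 3x3 matrix usable like a homography."""
    n = len(src)
    # Each correspondence gives two linear equations in the 6 affine parameters
    A = np.zeros((2 * n, 6))
    b = np.zeros(2 * n)
    A[0::2, 0:2] = src
    A[0::2, 2] = 1.0
    A[1::2, 3:5] = src
    A[1::2, 5] = 1.0
    b[0::2] = dst[:, 0]
    b[1::2] = dst[:, 1]
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.array([[p[0], p[1], p[2]],
                     [p[3], p[4], p[5]],
                     [0.0, 0.0, 1.0]])
```

Applying this to the matched features of frames $i-1$ and $i$ gives the inter-frame matrix $H_{i-1,i}$ used in the smoothing step below.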
## Motion Compensation
### Additive method
In this section, we will describe a simple strategy named **"Local linear matrix-based smoothing"** that avoids composing matrices. The benefit of this method is that errors produced by the composition of matrices do not accumulate.
This method relies on the notion of *virtual trajectories*: the information used to estimate the set of smooth transformations is not based on the real trajectories, but on the integral of the velocity of a pixel.
* The virtual trajectory of a homography is defined as
$$\bar{H}_{i} = Id + \sum_{l=1}^{i-1}(H_{l,l+1} - Id)$$
or
$$\bar{H}_{i} = \bar{H}_{i-1} + H_{i-1,i}-Id$$
* The smoothed trajectory coefficients of the matrix trajectories for $i = 1,\dots,N$ are calculated using Gaussian smoothing. They are defined by
$$ \tilde{H}_{i}(p,q) = (G_{\sigma}*\{\bar{H}_{j}(p,q)\})_{i}
= \sum_{j=-k}^{k}G_{\sigma}(j)\bar{H}_{i-j}(p,q)$$
and then
$$ \hat{H}_{i} = \tilde{H}_{i} - \bar{H}_{i} + Id$$
* Finally, the rectifying transformation of this method is obtained as $H_{i}^{'} = \hat{H}_{i}^{-1}$
The concept of a virtual trajectory is clearer in the case of points, where the virtual trajectory is not given by the true displacement of the points from frame to frame but by the apparent motion of the point at each position.
The boundary conditions for this method are the same as for the compositional strategy.
The step of **Local linear matrix-based smoothing** can be illustrated in the algorithm below.
>$\begin{aligned}
&\quad\mathbf{Input} : \{H\}, \sigma, bc\\
&\quad\mathbf{Output}: \{H^{'}\}\\
&1.\quad\bar{H}_{1} \leftarrow Id\\
&2.\quad\textrm{for } i \leftarrow 2 \textrm{ to } N \textrm{ do}\\
&3.\quad\quad\bar{H}_{i} \leftarrow \bar{H}_{i-1} + H_{i-1,i}-Id\\
&4.\quad\textrm{for } i \leftarrow 1 \textrm{ to } N \textrm{ do}\\
&5.\quad\quad\textrm{Gaussian convolution}(\{\bar{H}\},\{\tilde{H}_{i}\},i,\sigma,bc)\\
&6.\quad\textrm{for } i \leftarrow 1 \textrm{ to } N \textrm{ do}\\
&7. \quad\quad \hat{H}_{i} \leftarrow \tilde{H}_{i} - \bar{H}_{i} + Id\\
&8. \quad\quad H_{i}^{'} \leftarrow \hat{H}_{i}^{-1}
&\end{aligned}$
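The algorithm above translates almost line by line into NumPy. The sketch below assumes a constant (clamped) boundary condition for the Gaussian convolution at the sequence ends; the function and parameter names are illustrative:

```python
import numpy as np

def smooth_additive(H_list, sigma, k=None):
    """Local linear matrix-based smoothing.
    H_list: list of N-1 inter-frame homographies H_{i-1,i} (3x3 arrays).
    Returns the rectifying transforms H'_i = inv(hat{H}_i) for i = 1..N."""
    N = len(H_list) + 1
    Id = np.eye(3)
    # Step 1-3: virtual trajectories  bar{H}_i = bar{H}_{i-1} + H_{i-1,i} - Id
    bar = np.empty((N, 3, 3))
    bar[0] = Id
    for i in range(1, N):
        bar[i] = bar[i - 1] + H_list[i - 1] - Id
    # Step 4-5: Gaussian convolution of each matrix coefficient over time
    if k is None:
        k = int(3 * sigma)
    j = np.arange(-k, k + 1)
    g = np.exp(-j ** 2 / (2.0 * sigma ** 2))
    g /= g.sum()
    tilde = np.empty_like(bar)
    for i in range(N):
        # Clamp indices at the ends (constant boundary condition)
        idx = np.clip(i - j, 0, N - 1)
        tilde[i] = np.tensordot(g, bar[idx], axes=1)
    # Step 6-8: correction hat{H}_i and rectifying transform H'_i
    return [np.linalg.inv(tilde[i] - bar[i] + Id) for i in range(N)]
```

When every inter-frame matrix is the identity (a static camera), every rectifying transform comes out as the identity too, which is a quick sanity check on the implementation.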
## Warping and Cropping
### Crop and zoom strategy
A post-processing step is needed to remove the empty regions that appear at the borders of the images. We will discuss a simple approach: Crop and Zoom. The idea is to find the largest axis-parallel rectangle that does not contain empty regions, and apply the corresponding crop to all frames to remove them.
#### Crop strategy
A fast and simple process for computing the rectangle is to iteratively project the corners of the images using the smoothed homography of each frame and update the limits of the rectangle by each corner.
The algorithm can be described as follows:
> Current rectangle: $x_{1}, y_{1}, x_{2}, y_{2}$
At frame $i$ do
$\quad$ //project and update the top-left corner
$\quad$ $(x,y,z)^{T} \leftarrow (H_{i}^{'})^{-1}.(0,0,1)^{T}$
$\quad$ If $x/z>x_{1}$ then $x_{1} \leftarrow x/z$
$\quad$ If $y/z > y_{1}$ then $y_{1} \leftarrow y/z$
$\quad$ //project and update the other 3 corners
After iterating to the last frame, we arrive at a rectangle that is close to the largest rectangle containing no empty regions.
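A minimal NumPy sketch of this corner-projection loop, with the bookkeeping for all four corners spelled out (function and variable names are illustrative):

```python
import numpy as np

def crop_rectangle(H_prime_list, nx, ny):
    """Iteratively shrink (x1, y1, x2, y2) so the rectangle avoids the
    empty borders of every stabilized frame.
    H_prime_list: rectifying transforms H'_i (3x3); nx, ny: frame size."""
    x1, y1, x2, y2 = 0.0, 0.0, float(nx), float(ny)
    # Frame corners in homogeneous coordinates: TL, TR, BL, BR
    corners = np.array([[0, 0, 1], [nx, 0, 1], [0, ny, 1], [nx, ny, 1]],
                       dtype=float)
    for Hp in H_prime_list:
        Hinv = np.linalg.inv(Hp)     # original frame -> stabilized frame
        p = Hinv @ corners.T         # projected corners, shape (3, 4)
        px, py = p[0] / p[2], p[1] / p[2]
        # Each projected corner can only push its own edge inwards
        x1 = max(x1, px[0], px[2])   # left edge: TL, BL
        x2 = min(x2, px[1], px[3])   # right edge: TR, BR
        y1 = max(y1, py[0], py[1])   # top edge: TL, TR
        y2 = min(y2, py[2], py[3])   # bottom edge: BL, BR
    return x1, y1, x2, y2
```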
#### Zoom strategy
Let $(x_{m},y_{m}) = \left(\dfrac{x_{1}+x_{2}}{2},\dfrac{y_{1}+y_{2}}{2}\right)$ be the center of the rectangle derived in the earlier step, and let $s = \min\left(\dfrac{x_{2}-x_{1}}{n_{x}},\dfrac{y_{2}-y_{1}}{n_{y}}\right)$ be the scale factor from the original frame size to the rectangle. We can define the **Crop and Zoom** transformation as
$\begin{aligned}
T &= \begin{pmatrix}
1&0&x_{m}\\
0&1&y_{m}\\
0&0&1
\end{pmatrix}\begin{pmatrix}
s&0&0\\
0&s&0\\
0&0&1
\end{pmatrix}\begin{pmatrix}
1&0&-n_{x}/2\\
0&1&-n_{y}/2\\
0&0&1
\end{pmatrix}\\
&=\begin{pmatrix}
s&0&x_{m} - sn_{x}/2\\
0&s&y_{m} - sn_{y}/2\\
0&0&1
\end{pmatrix}
\end{aligned}$
The idea behind this transformation is to move the center of the original frame to $(x_{m},y_{m})$ while at the same time scaling the frame to match the rectangle's size.
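As a quick sanity check, the composed product can be built directly and compared against the closed form above (a minimal sketch; names are illustrative):

```python
import numpy as np

def crop_zoom_transform(x1, y1, x2, y2, nx, ny):
    """Crop-and-zoom matrix T mapping the nx-by-ny frame onto the
    crop rectangle (x1, y1, x2, y2)."""
    xm, ym = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # rectangle centre
    s = min((x2 - x1) / nx, (y2 - y1) / ny)     # scale factor
    # translate frame centre to origin, scale, translate to rectangle centre
    T1 = np.array([[1, 0, -nx / 2], [0, 1, -ny / 2], [0, 0, 1]], dtype=float)
    S = np.diag([s, s, 1.0])
    T2 = np.array([[1, 0, xm], [0, 1, ym], [0, 0, 1]], dtype=float)
    return T2 @ S @ T1
```

Multiplying the three factors out reproduces the single matrix in the equation above, entry for entry.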
### Warping
This is the final step, in which we warp each frame based on the smoothed transformation and the Crop and Zoom transformation.
Notice that for a pixel location $x$ in the stabilized frame, we have
$$I_i^{'}(x) = I_{i}(H_{i}^{'}Tx)$$
This is because $H_{i}^{'}$ and $T$ describe the transformation from the stabilized frame back to the original frame. This relation can be used to determine the color value at each pixel of each new stabilized frame.
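A minimal backward-warping sketch in NumPy (nearest-neighbour sampling for brevity; in practice one would use `cv2.warpPerspective` with the `cv2.WARP_INVERSE_MAP` flag, which likewise treats its matrix as the output-to-input mapping):

```python
import numpy as np

def warp_frame(frame, M, out_shape):
    """Backward warp: for each output pixel x, sample frame at M @ x.
    M maps stabilized coordinates to original coordinates (e.g. H'_i @ T)."""
    h, w = out_shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Output pixel grid in homogeneous coordinates, shape (3, h*w)
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = M @ pts
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    # Pixels falling outside the original frame stay black (the empty regions)
    valid = (sx >= 0) & (sx < frame.shape[1]) & (sy >= 0) & (sy < frame.shape[0])
    out = np.zeros((h, w) + frame.shape[2:], dtype=frame.dtype)
    out[ys.ravel()[valid], xs.ravel()[valid]] = frame[sy[valid], sx[valid]]
    return out
```

Calling this with `M = H_prime @ T` for each frame implements the relation $I_i^{'}(x) = I_{i}(H_{i}^{'}Tx)$ above.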
## Result
## Conclusion
In this report, we have described a workflow for video stabilization as well as different motion compensation strategies.
The smoothing strategies are divided into compositional and additive approaches. Compositional approaches tend to suffer more from errors in previous steps because those errors accumulate through the composed frames. In contrast, additive approaches suffer less because increments are combined by addition rather than by composition.
The downside of these approaches is that they are not suited to real-time implementation, and they can also yield bad results when the video quality is low, the camera is too shaky, perspective is lost, etc. There is a more powerful method that can eliminate these downsides, named **MeshFlow**, but we will discuss it another day.