---
tags: Academic
---
# CV HW 1 Tutorial: Harris Corner
## Requirements
```
python 3.6+
numpy
scipy
scikit-image
matplotlib
```
`numpy` and `scipy` are used for basic operations. `scikit-image` (`skimage`) is used for image stuff. `matplotlib` is the de-facto visualiation tool in `python`.
You should use **virtual environment** or Colab to isolate this project from other python project or system's python to prevent pollution. Code in this project utilizes the mechanism of `ndarray`, so you should have read [Numpy Quickstart](https://docs.scipy.org/doc/numpy/user/quickstart.html) before, especially **indexing**, **masking**, **broadcasting**. If you got time, [Numpy Indexing](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html) is also a good material.
The `import` in this tutorial is:
```python=
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage as ndi
```
## Gaussian Kernel

Take 5x5 gaussian kernel for example, we want to find the gaussian value of *black dots* and organize them in a 5x5 `ndarray`.
```python=
s = 5 # sigma
# Create the x, y coordinates of each dot
cx, cy = (0 + 5 - 1) / 2, (0 + 5 - 1) / 2 # shifting value
xx, yy = np.meshgrid(np.arange(5) - cx, np.arange(5) - cy)
# >>> print(xx)
# [[-2. -1. 0. 1. 2.]
# [-2. -1. 0. 1. 2.]
# [-2. -1. 0. 1. 2.]
# [-2. -1. 0. 1. 2.]
# [-2. -1. 0. 1. 2.]]
# >>> print(yy)
# [[-2. -2. -2. -2. -2.]
# [-1. -1. -1. -1. -1.]
# [ 0. 0. 0. 0. 0.]
# [ 1. 1. 1. 1. 1.]
# [ 2. 2. 2. 2. 2.]]
# Calcuate the gaussian value of each dot
g = np.exp(-(xx ** 2 + yy ** 2) / (2 * s ** 2)) / (2 * np.pi * s ** 2)
# Normalize the kernel to have sum = 1.0
g = g / g.sum()
# >>> print(g)
# [[0.03688345 0.03916419 0.03995536 0.03916419 0.03688345]
# [0.03916419 0.04158597 0.04242606 0.04158597 0.03916419]
# [0.03995536 0.04242606 0.04328312 0.04242606 0.03995536]
# [0.03916419 0.04158597 0.04242606 0.04158597 0.03916419]
# [0.03688345 0.03916419 0.03995536 0.03916419 0.03688345]]
```
Notice that `xx`, `yy`, `g` are `ndarray` with shape `[5, 5]`. The `np.exp(), **, *, /` operations are all element-wise. To do a gaussian blur, you just need to call `output = ndi.convole(img, g)`.
Other way to obtain a 2D gaussian kernel is computing the outer product of two 1D Gaussian kernel using formula:
$$
G(x, y) =
(\frac{1}{\sqrt{2\pi} \sigma_x}
exp(-\frac{x^2}{2 \sigma_x^2}))
((\frac{1}{\sqrt{2\pi} \sigma_y})
exp(-\frac{y^2}{2 \sigma_y^2}))
$$
```python=
sx, sy = 5, 5 # sigma
xx = np.arange(k) - (0 + k - 1) / 2 # [k,]
yy = np.arange(k) - (0 + k - 1) / 2 # [k,]
gx = np.exp(-0.5 * (xx / sx)**2) / (np.sqrt(2 * np.pi) * sx) # [k,]
gy = np.exp(-0.5 * (yy / sy)**2) / (np.sqrt(2 * np.pi) * sy) # [k,]
g = np.outer(gy, gx) # [k, k]
g = g / g.sum()
```
## Sobel
Assume you get the x-gradient `dx` and y-graident `dy`, the magnitue is simply:
```python=
mag = np.sqrt(dx ** 2 + dy ** 2)
```
What else can you expect? ``¯\_(ツ)_/¯``
To visualize gradient direction, [HSV colorspace](https://zh.wikipedia.org/zh-tw/HSL%E5%92%8CHSV%E8%89%B2%E5%BD%A9%E7%A9%BA%E9%97%B4) is a good way. Let *hue* represent the angle of the vector, so different angle has different color. Let *value* represent the magnitude of the vector, so the smaller the magnitude, the darker the color. *Saturation* can be a constant value like `1.0`. Since most visualization library do not support viewing hsv image, you'll need to convert hsv image back to rgb image.

For implementation, angle can be calculated using `np.arctan2`. Be aware that we are not using `np.arctan` since [atan2](https://en.wikipedia.org/wiki/Atan2) has some advantages (say, no need to deal with division by zero). Notice that `skimage.color.hsv2rgb` expects hsv being represnted as `ndarray` with range `[0, 1]` and shape `[h, w, 3]`. You need some normalization to deal with the data range.
```python=
h, w = gray.shape
hsv = np.zeros((h, w, 3))
hsv[..., 0] = (np.arctan2(dy, dx) + np.pi) / (2 * np.pi)
hsv[..., 1] = np.ones((h, w)) # or just write = 1.0
hsv[..., 2] = (mag - mag.min()) / (mag.max() - mag.min())
rgb = color.hsv2rgb(hsv)
```
Expected result (10x10 Gaussian):

## Structure Tensor
Structure tensor is calculated on each pixel of image from its local window. Each pixel has a `2x2` matrix and we'll find the determinant and trace of such matrix to derive harris corner response. Formally,
The structure tensor $A$ of position $(r, c)$ consists of weighted sums of the local patch of $I_{xx}(r, c), I_{xy}(r, c), I_{yy}(r, c)$:
$$
A(r, c) =
\begin{bmatrix}
\sum_{u, v} w(u,v) I_{xx}(r + u, c + v) &
\sum_{u, v} w(u,v) I_{xy}(r + u, c + v)
\\
\sum_{u, v} w(u,v) I_{yx}(r + u, c + v) &
\sum_{u, v} w(u,v) I_{yy}(r + u, c + v)
\end{bmatrix}
$$
which can be rewritten using convolution:
$$
\begin{align}
A(r, c)
=&
\begin{bmatrix}
A_{xx}(r, c) & A_{xy}(r, c) \\
A_{xy}(r, c) & A_{yy}(r, c)
\end{bmatrix} , where \\
& A_{xx} = I_{xx} * w \\
& A_{xy} = I_{xy} * w \\
& A_{yy} = I_{yy} * w
\end{align}
$$
Furthermore, by assuming $w$ is a Gaussian Kernel, we get:
$$
A_{xx} = GaussianBlur(I_{xx}) \\
A_{xy} = GaussianBlur(I_{xy}) \\
A_{yy} = GaussianBlur(I_{yy})
$$
My implmentation:
```python=
def structure_matrix(dx, dy):
'''
Args:
dx: (ndarray) sized [H, W]
dy: (ndarray) sized [H, W]
Return:
Axx: (ndarray) sized [H, W]
Axy: (ndarray) sized [H, W]
Ayy: (ndarray) sized [H, W]
where structure tensor A of (r, c) is
[
[ Axx[r, c], Axy[r, c] ],
[ Axy[r, c], Ayy[r, c] ]
]
'''
Axx = gaussian_blur(dx * dx, 15)
Axy = gaussian_blur(dx * dy, 15)
Ayy = gaussian_blur(dy * dy, 15)
return Axx, Axy, Ayy
```
## Harris Corner Response
The response of pixel $(r, c)$ is $det(A(r, c)) - k \cdot tr(A(r, c))^2$ where
1. $det(A(r, c))$ is `Axx[r, c] * Ayy[r, c] - Axy[r, c] * Axy[r, c]`
2. $tr(A(r, c))$ is `Axx[r, c] + Ayy[r, c]`
Like before, we utilize `ndarray` element-wise operations to obtain response. All pixels' response can be calculated together:
```python=
det = Axx * Ayy - Axy * Axy
tr = Axx + Ayy
R = det - k * tr * tr
```
As you expect, `det`, `tr`, `R` are `ndarray` with shape same as (gray) image.
## NMS
There's a lot of ways to do nms, the way I gonna use here is a simplified version of `skimage.feature.corner_peaks` which utilizes [maximum filter](https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.maximum_filter.html).

Pixel `(r, c)` is a harris corner if and only if:
1. `R[r, c] > threshold`. Threshold is tunned manually, something like `3e-7`.
2. `(r, c)` is a local maximum in `R`, that is, larger than the surrounding pixels' response.
(1.) is easy. `R > 3e-6` do the job. You'll get a `ndarray` with `dtype` `bool` representing the mask, i.e., whether each pixel satisfy the condition.
(2.) can be done using the fact that a local maximum remains same value after `R` being maximum filtered. Comparing the original response with maximum filtered one, the pixels which have same value in both should be a local maximum. `Scipy` has maximum-filter built-in, so `np.abs(ndi.maximum_filer(R) - R) < 1e-15` gets the mask of condition (2.).
Finally, the mask that satisfies both condiations is combined from 2 masks:
```python=
mask1 = (R > 3e-6)
mask2 = (np.abs(ndi.maximum_filter(R, size=30) - R) < 1e-15)
mask = (mask1 & mask2) # or np.logical_and(...)
```
Notice that I don't compare `maxR` and `R` by `maxR == R` because they are both `float ndarray`. In general, doing comparision in floating points should not use `==` due to floating point error.
If you want to get the positions of the corners instead of the mask, `np.nonzero` extracts the positions of the positive elements of a mask.

## Misc.
### Reading Image
Images should be processed as `np.float32 ndarray`, to prevent the overflow or quantization error when processed in `np.uint8`. My usual way to read an image and make it gray-scale:
```python=
from skimage import io
from skimage import util
from skimage import color
img = io.imread('test.png')
img = util.img_as_float(img)
img = color.rgb2gray(img) # [H, W 3] -> [H, W]
```
### Showing Image
`Matplotlib` has [2 interfaces](https://medium.com/@kapil.mathur1987/matplotlib-an-introduction-to-its-object-oriented-interface-a318b1530aed). One is MATLAB style, one is object oriented. While most tutorial use former, I personally prefer latter. Following are some examples.
To show an image:
```python=
fig, ax = plt.subplots()
ax.imshow(img)
plt.show()
```
The `imshow` function can show both `np.uint8` image and `np.float` image. It expects `np.uint8` having range `[0, 255]` and `np.float` image `[0.0, 1.0]`. When showing gray-scale image, `imshow` can have `cmap` ([colormap](https://matplotlib.org/3.1.1/gallery/color/colormap_reference.html)) as argument. It specifies which color each value to mapping to. By default, `cmap` is `viridis`. If you want gray-scale output, `cmap` should be set as `gray`:
```python=
ax.imshow(img, cmap='gray')
```
To draw 2x3 plots in a figure:
```python=
fig, ax = plt.subplots(nrows=2, ncols=3) # ax is ndarray shaped [2, 3]
ax[0, 0].imshow(...)
ax[1, 0].imshow(...)
...
plt.show()
```
To overlap the image with other plots, say some 2d points:
```python=
r, c = np.nonzero(mask)
print(r.shape) # [#points, ]
print(c.shape) # [#points, ]
fig, ax = plt.subplots()
ax.imshow(...)
ax.plot(c, r, 'r.', markersize=3) # or ax.scatter(c, r, ...)
plt.show()
```