**Fields to study**.
* Camera / perspective projection
* Geometric transform
* Digital image
* Image convolution
* Edge detection
* Hough transformation
* Interesting points
* Local descriptors
* Deep learning
---
# Week 6 - Linear filtering
**Smoothing**.
* *Average filter*. Does not model a defocused lens well
* *Gaussian filter*. $G_\sigma = \frac{1}{2\pi \sigma^2}\exp\bigg(-\frac{x^2+y^2}{2\sigma^2}\bigg)$
* *Idea*. Eliminate edge effects
* *Filter width*. Set filter half-width to $3\sigma$
* *Properties*.
* Remove high-frequency components, i.e. noise, from the image
* Convolution with self is another Gaussian
* *Separable kernel*. Product of two 1D Gaussians, i.e. more efficient computation
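The separable-kernel property can be sketched in numpy (a minimal illustration; the `gaussian_blur` name and the $3\sigma$ half-width follow the notes, everything else is my own choice):

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """1D Gaussian kernel with half-width 3*sigma, normalized to sum to 1."""
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable smoothing: one 1D convolution along rows, then one along
    columns, instead of a single (2r+1)^2-tap 2D convolution."""
    k = gaussian_kernel_1d(sigma)
    rows = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")
```

Two 1D passes cost $O(2r+1)$ per pixel instead of $O((2r+1)^2)$ for the equivalent 2D kernel.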
**Noise**.
* *Salt and pepper noise*. Random occurrences of black and white pixels
* *Reduce noise*. Median filter
* *Impulse noise*. Random occurrences of white pixels
* *Gaussian noise*. Variations in intensity drawn from a Gaussian distribution
* *Formulation*. $f(x, y) = \hat{f}(x, y) + \eta(x, y)$ where $\eta(x, y)\sim {\cal{N}}(\mu, \sigma)$
* *Assumption*. Independent, zero-mean noise
* *Reduce noise*. Gaussian filter
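A quick sketch of why the median filter suits salt-and-pepper noise (the noise fraction `p=0.05` and a 3x3 window are assumed values for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_salt_pepper(img, p=0.05):
    """Flip a fraction p of pixels to black (0) or white (1)."""
    noisy = img.copy()
    mask = rng.random(img.shape)
    noisy[mask < p / 2] = 0.0
    noisy[mask > 1 - p / 2] = 1.0
    return noisy

def median_filter(img, size=3):
    """Replace each pixel by the median of its size x size neighborhood;
    isolated extreme pixels never win the median vote."""
    r = size // 2
    padded = np.pad(img, r, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out
```

A mean (Gaussian) filter would instead smear each corrupted pixel into its neighborhood.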
**Unsharp mask filter**. $f + \alpha (f - f*g)$
* $f$ is the image
* $f*g$ is the blurred image
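The formula $f + \alpha (f - f*g)$ translates directly to code (a sketch; $\sigma$ and $\alpha$ values are arbitrary choices):

```python
import numpy as np

def unsharp_mask(f, sigma=1.0, alpha=1.0):
    """Sharpen: f + alpha * (f - f*g), where f*g is a Gaussian-blurred copy.
    (f - f*g) keeps only high-frequency detail, which is added back."""
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2)); k /= k.sum()
    blurred = np.apply_along_axis(np.convolve, 1, f, k, mode="same")
    blurred = np.apply_along_axis(np.convolve, 0, blurred, k, mode="same")
    return f + alpha * (f - blurred)
```

On a step edge this produces the characteristic overshoot on both sides, which is what makes edges look sharper.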
# Week 7
**Finite difference filter**.
* *Prewitt*
* *Sobel*
* *Roberts*
**Effects of noise**. Finite difference filters respond strongly to noise
* *Solution*. Smoothing image
**Derivative of Gaussian filter**. $\frac{d}{dx} (f*g) = f * \frac{d}{dx} g$
* *Implementation issues*.
* Trade-off between smoothing and localization
* Gradient magnitude is large along a thick trail or ridge
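The identity $\frac{d}{dx}(f*g) = f * \frac{d}{dx}g$ means one convolution with the Gaussian derivative smooths and differentiates in one pass. A 1D sketch on a noisy step edge (all parameter values are assumptions for illustration):

```python
import numpy as np

sigma = 2.0
r = int(np.ceil(3 * sigma))
x = np.arange(-r, r + 1)
g = np.exp(-x**2 / (2 * sigma**2)); g /= g.sum()
dg = -x / sigma**2 * g                         # analytic derivative of the Gaussian

signal = np.zeros(64); signal[32:] = 1.0       # step edge at index 32
signal += np.random.default_rng(1).normal(0, 0.05, 64)  # additive noise

# One convolution with d/dx g = smoothed derivative; a plain finite
# difference on the noisy signal would respond strongly everywhere.
response = np.convolve(signal, dg, mode="same")

# Locate the edge as the strongest response, ignoring the half-kernel
# border region where zero padding creates a spurious response.
edge = int(np.argmax(np.abs(response[r:-r]))) + r
```

Larger $\sigma$ suppresses more noise but blurs the peak, which is the smoothing-vs-localization trade-off above.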
**Criteria for an optimal edge detector**.
* *Good detection*. Minimize the probability of false positives and false negatives
* *Good localization*. Detected edges are close to true edges
* *Single response*. Return one point only for each true edge point
**Canny edge detector**. The most widely used edge detector
* *Steps*.
1. Filter image with derivative of Gaussian
2. Find magnitude and orientation of gradient
3. Non-maximum suppression (NMS) to narrow multi-pixel-wide ridges to single-pixel width
4. Linking and thresholding to detect edge
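Step 3 (NMS) is the interesting one: a pixel survives only if its gradient magnitude is a local maximum along the gradient direction. A minimal numpy sketch with the direction quantized to 4 bins (not the full Canny pipeline):

```python
import numpy as np

def canny_nms(mag, gy, gx):
    """Keep a pixel only if its gradient magnitude is at least as large as
    its two neighbors along the (quantized) gradient direction."""
    h, w = mag.shape
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180   # orientation in [0, 180)
    out = np.zeros_like(mag)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            a = angle[i, j]
            if a < 22.5 or a >= 157.5:   # ~horizontal gradient (vertical edge)
                n1, n2 = mag[i, j - 1], mag[i, j + 1]
            elif a < 67.5:               # ~45 degrees
                n1, n2 = mag[i - 1, j + 1], mag[i + 1, j - 1]
            elif a < 112.5:              # ~vertical gradient (horizontal edge)
                n1, n2 = mag[i - 1, j], mag[i + 1, j]
            else:                        # ~135 degrees
                n1, n2 = mag[i - 1, j - 1], mag[i + 1, j + 1]
            if mag[i, j] >= n1 and mag[i, j] >= n2:
                out[i, j] = mag[i, j]
    return out
```

Applied to a blurred edge, the multi-pixel gradient ridge collapses to a single-pixel-wide line, ready for step 4's hysteresis thresholding.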
# Week 8 - Local descriptor
**Keypoint matching**.
1. Find a set of distinctive key points
2. Define a region around each keypoint
3. Extract and normalize the region content
4. Compute a local descriptor from the normalized region
5. Match local descriptors
**Harris detector**. Search for local neighborhoods where image content has two main directions, i.e. eigenvectors
* *Second moment matrix*. $\mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix}I_x^2(\sigma_D) & I_x I_y (\sigma_D) \\ I_x I_y (\sigma_D) & I^2_y(\sigma_D)\end{bmatrix}$
* $g(\sigma_I)$ is a Gaussian filter
* $I(\sigma_D)$ is the image blurred with a Gaussian filter of width $\sigma_D$
* $I_x, I_y$ are image derivatives
* $I_x^2, I_y^2$ are square of image derivatives
* *Cornerness function*. $\text{har} = \det \mu(\sigma_I, \sigma_D) - \alpha \cdot \text{trace}^2 \big[\mu(\sigma_I, \sigma_D)\big]$
* *Pipeline*.
1. Blur image $I$ to obtain $I(\sigma_D)$
2. Compute image derivatives $I_x, I_y$
3. Compute square of derivatives $I_x^2, I_y^2$
4. Apply Gaussian filter $g(\sigma_I)$
5. Compute cornerness function
6. Use NMS to get rid of redundant patches
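The pipeline above (minus the final NMS) in numpy; `alpha=0.05` and the two $\sigma$ values are assumed, typical choices, not prescribed by the notes:

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian smoothing with half-width 3*sigma."""
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2)); k /= k.sum()
    out = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, out, k, mode="same")

def harris_response(img, sigma_d=1.0, sigma_i=2.0, alpha=0.05):
    """Cornerness har = det(mu) - alpha * trace(mu)^2."""
    blurred = gaussian_blur(img, sigma_d)           # 1. blur with sigma_D
    iy, ix = np.gradient(blurred)                   # 2. image derivatives
    ixx, iyy, ixy = ix * ix, iy * iy, ix * iy       # 3. products of derivatives
    sxx = gaussian_blur(ixx, sigma_i)               # 4. window with g(sigma_I)
    syy = gaussian_blur(iyy, sigma_i)
    sxy = gaussian_blur(ixy, sigma_i)
    det = sxx * syy - sxy * sxy                     # 5. cornerness function
    trace = sxx + syy
    return det - alpha * trace**2
```

The response is strongly positive at corners (both eigenvalues large), negative along edges (one dominant direction), and near zero in flat regions.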
**Hessian detector**. Search for strong curvature in two orthogonal directions
* *Hessian determinant*. The determinant of $\begin{bmatrix}I_{xx} & I_{xy} \\ I_{yx} & I_{yy}\end{bmatrix}$, i.e. $I_{xx} I_{yy} - I_{xy}^2$
* *Effects*. Responses mainly on corners and strongly textured areas
# Week 9 - Describe interesting points
**Automatic scale detection**. At any point, try out different scales and pick the one with highest score
**Difference of Gaussian**. Great for finding interesting keypoints in the image
1. Generate several octaves of the original image, corresponding to different scales of the image
2. Within an octave, images are progressively blurred, in terms of $\sigma$, using Gaussian blur
3. DoG images for an octave are obtained as the difference of Gaussian blurrings of an image with two different $\sigma$
4. For each DoG image, compare every pixel with its 8 neighbors as well as 9 pixels in the next scale and 9 pixels in the previous scale
$\to$ If it is a local extremum, it is a potential keypoint, which is best represented at the corresponding scale
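Steps 2-4 for a single octave can be sketched as follows ($\sigma_0 = 1$, $k = \sqrt{2}$, and 5 blur levels are assumed values):

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian smoothing with half-width 3*sigma."""
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2)); k /= k.sum()
    out = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, out, k, mode="same")

def dog_stack(img, sigma0=1.0, k=2**0.5, levels=5):
    """One octave: progressively blur, then difference adjacent blur levels."""
    blurs = [gaussian_blur(img, sigma0 * k**i) for i in range(levels)]
    return np.stack([blurs[i + 1] - blurs[i] for i in range(levels - 1)])

def is_extremum(dog, s, i, j):
    """Compare a pixel with its 26 neighbors in the 3x3x3 scale-space cube."""
    cube = dog[s - 1:s + 2, i - 1:i + 2, j - 1:j + 2]
    v = dog[s, i, j]
    return v == cube.max() or v == cube.min()
```

For a blob-like structure, the center pixel is a scale-space extremum at the level whose $\sigma$ best matches the blob's size.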
**Orientation normalization**.
1. Take a neighborhood around each keypoint depending on the scale
2. Compute the gradient magnitude and the direction within the region
3. Bin the orientation into 36 bins covering 360 degrees
4. The "amount" added to the bin is proportional to the magnitude of the gradient at that point
5. Do this for all pixels around the keypoint to generate a histogram
6. Choose the highest peak in the histogram and any peak above 80% of it as orientations of the keypoint
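Steps 2-6 above (without the scale-dependent window of step 1, and without the notes' distance weighting) reduce to a weighted histogram; bin centers are reported here, an implementation choice of mine:

```python
import numpy as np

def dominant_orientations(patch, n_bins=36):
    """36-bin gradient-orientation histogram, each vote weighted by gradient
    magnitude; return the peak and any bin above 80% of the peak."""
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 360
    width = 360 / n_bins
    bins = (ang // width).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    peak = hist.max()
    return [(b + 0.5) * width for b in range(n_bins) if hist[b] >= 0.8 * peak]
```

A keypoint with two near-equal peaks is duplicated, once per orientation, so matching stays rotation-invariant.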
**Keypoint descriptor**.
1. Take a 16x16 window around each keypoint
2. Divide the window into 16 sub-blocks of size 4x4
3. For each sub-block, create an 8-bin orientation histogram
4. Use the resulting 128 bin values as the keypoint descriptor
**SIFT descriptor**. Use histogram of oriented gradients
* *Idea*.
* Capture important texture information
* Robust to small translation / affine deformations
* *Steps*.
1. Run DoG detector
* Find maxima in location / scale space
* Remove edge points
2. Find all major orientations
* Bin orientations into 36 bin histogram
* Weight by gradient magnitude
* Weight by distance to center
* Return orientations within 0.8 of peak
* Use parabola for better orientation fit
3. For each (x, y, scale, orientation), create descriptor
* Sample 16x16 gradient magnitude and relative orientation
* Bin 4x4 samples into 4x4 histogram
* Threshold values to a max of 0.2, divide by L2 norm
* Final descriptor is a 4x4x8 normalized histogram
**SURF**. Fast approximation of SIFT idea
# Texture descriptor
**Texture**. Regular or stochastic patterns caused by bumps, grooves, and / or markings
**Filter bank**. An overcomplete representation of the image
* *Usage*. Process the image with each filter and keep all responses
**Represent texture**. Measure responses of blobs and edges at various orientations and scales via filter bank
* *Option 1*. Record simple statistics, e.g. mean, std, etc. of absolute filter response
* *Option 2*. Take vectors of filter responses at each pixel and cluster them, then take histograms
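Option 1 can be sketched with a toy two-filter bank (horizontal and vertical derivative filters; the FFT-based convolution and the filter choice are my own simplifications):

```python
import numpy as np

def texture_features(img, filters):
    """Option 1: mean and std of the absolute response to each filter."""
    feats = []
    for f in filters:
        # Circular 2D convolution via FFT -- adequate for a descriptor sketch
        resp = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(f, img.shape)))
        feats += [np.mean(np.abs(resp)), np.std(np.abs(resp))]
    return np.array(feats)
```

Vertically striped texture responds strongly to the horizontal-derivative filter and barely to the vertical one, so the statistics separate the two orientations.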
# Typical image processing workflow

# Fitting, alignment, and instance recognition
**Key point matching**.
1. Find a set of distinctive key points
2. Define a region around each keypoint
3. Extract and normalize the region content
4. Compute a local descriptor from the normalized region
5. Match local descriptors
**Find the objects**.
1. Match interest points from input image to database image
2. Matched points vote for rough position / orientation / scale of object
3. Find positions / orientations / scales with the most votes
4. Compute affine registration and matches using iterative least squares with outlier check
5. Report object if there are at least $T$ matched points
## Line fitting
**Least squares line fitting**. Fit $a,b$ in the equation $y=ax+b$
* *Good*.
* Clearly specified objective
* Convex optimization
* *Bad*.
* May not be what we expect
* Sensitive to outliers
* Does not allow multiple good fits
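A minimal least-squares fit, plus a demonstration of the outlier sensitivity listed above (the data and noise levels are invented for illustration):

```python
import numpy as np

# Fit y = a*x + b by minimizing sum_i (y_i - a*x_i - b)^2
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, 50)   # true line: a=2, b=1

A = np.column_stack([x, np.ones_like(x)])    # design matrix [x, 1]
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

# A single gross outlier drags the whole fit -- least squares is not robust
y_out = y.copy(); y_out[0] = 100.0
(a2, b2), *_ = np.linalg.lstsq(A, y_out, rcond=None)
```

The squared loss lets one bad point dominate the objective, which motivates the Hough transform and RANSAC below.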
**Hough transform**.
* *Idea*.
1. Create a grid of parameter values
2. Each point votes for a set of parameters, incrementing those values in the grid
3. Find maximum or local maxima in grid
* *Examples*. Find lines, i.e. $y=ax+b$
* *Good*.
* Robust to outliers
* Fairly efficient
* Provide multiple good fits
* *Bad*.
* Somewhat sensitive to noise
* Bin size trades off between noise tolerance, precision, and speed / memory
* Not suitable for more than a few parameters
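A sketch of the Hough idea for the $y=ax+b$ example (a 200x200 accumulator and the parameter ranges are arbitrary choices; practical line detectors usually use the polar $\rho$-$\theta$ parameterization instead, which avoids unbounded slopes):

```python
import numpy as np

def hough_lines(points, a_range=(-5, 5), b_range=(-20, 20), n=200):
    """Vote in a discretized (a, b) grid: each point (x, y) lies on every
    line with b = y - a*x, so it votes once per column of the accumulator."""
    a_vals = np.linspace(*a_range, n)
    b_vals = np.linspace(*b_range, n)
    acc = np.zeros((n, n), dtype=int)
    for x, y in points:
        b = y - a_vals * x                  # one b value per candidate a
        idx = np.round((b - b_range[0]) / (b_range[1] - b_range[0]) * (n - 1)).astype(int)
        ok = (idx >= 0) & (idx < n)
        acc[np.arange(n)[ok], idx[ok]] += 1
    i, j = np.unravel_index(np.argmax(acc), acc.shape)
    return a_vals[i], b_vals[j]
```

Outliers scatter single votes across the grid, while collinear points pile up in one cell, which is why the method is robust to outliers.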
**RANSAC**.
* *Idea*
1. Sample the number of points required to fit the model
2. Solve for model parameters using samples
3. Score by the fraction of inliers within a preset threshold of the model
4. Repeat until the best model is found with high confidence
* *Good*
* Robust to outliers
* Applicable to objective functions with more parameters than the Hough transform can handle
* Optimization parameters are easier to choose
* *Bad*.
* Computational time grows quickly with fraction of outliers and number of parameters
* Not good for getting multiple fits
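The four steps above, for line fitting (the iteration count and inlier threshold are assumed values; a line needs exactly 2 sample points):

```python
import numpy as np

def ransac_line(points, n_iters=200, thresh=0.1, rng=None):
    """Repeatedly fit a line to 2 random points and keep the model with the
    largest inlier set, where inliers satisfy |y - (a*x + b)| < thresh."""
    if rng is None:
        rng = np.random.default_rng(0)
    pts = np.asarray(points, dtype=float)
    best, best_inliers = None, 0
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = pts[rng.choice(len(pts), 2, replace=False)]
        if x1 == x2:
            continue                        # vertical sample, skip
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = np.sum(np.abs(pts[:, 1] - (a * pts[:, 0] + b)) < thresh)
        if inliers > best_inliers:
            best, best_inliers = (a, b), inliers
    return best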
**Image registration**. Use the least squares method to solve for the transformation matrix
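For an affine registration, each point match contributes two linear equations in the six parameters, so the transform can be recovered by ordinary least squares (a sketch; the parameter ordering is my own convention):

```python
import numpy as np

def fit_affine(src, dst):
    """Solve dst ~ A @ src + t by least squares. Each match (x, y) -> (x', y')
    gives two rows in the 6 unknowns (a11, a12, a21, a22, tx, ty)."""
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    n = len(src)
    M = np.zeros((2 * n, 6))
    M[0::2, 0:2] = src; M[0::2, 4] = 1.0    # x' = a11*x + a12*y + tx
    M[1::2, 2:4] = src; M[1::2, 5] = 1.0    # y' = a21*x + a22*y + ty
    p, *_ = np.linalg.lstsq(M, dst.ravel(), rcond=None)
    return p[:4].reshape(2, 2), p[4:]
```

With the outlier check from the recognition pipeline above, this solve is iterated: fit, discard matches with large residuals, refit.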