MLRF: Lecture 03

Agenda for lecture 3

  1. Introduction
  2. Finish lecture about local feature detectors
  3. Local feature descriptors
  4. Descriptor matching and indexing
  5. Projective transformations
  6. Homography estimation

A word about divisive clustering

HAC is bottom-up, divisive clustering is performed top-down

Classical approach:

  1. Start with all data
  2. Apply flat clustering
  3. Recursively apply the approach on each cluster until some termination

Pros: each split can yield more than 2 sub-trees
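The classical approach can be sketched as a recursive 2-means split; a minimal numpy illustration (the tiny k-means and the `min_size`/`max_depth` stopping rules are assumptions, not the lecture's reference method):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    # naive flat clustering: random init, then alternate assign / update
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def divisive(X, min_size=3, max_depth=3, depth=0):
    # top-down: flat-cluster, then recurse inside each cluster
    if len(X) <= min_size or depth >= max_depth:
        return [X]
    labels = kmeans(X, 2)
    parts = [X[labels == j] for j in (0, 1)]
    if any(len(p) == 0 for p in parts):   # degenerate split: stop here
        return [X]
    return [leaf for p in parts for leaf in divisive(p, min_size, max_depth, depth + 1)]
```

Using k > 2 in the flat step makes each split produce more than two sub-trees, which binary HAC merges cannot do.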

Summary of last lecture

Global image descriptors

  • Color histogram
  • Limited descriptive power
  • Which distance function?

Clustering

  • K-means
  • Hierarchical Agglomerative Clustering

Local feature detectors

  • Image gradients
  • Edge detector: Sobel, Canny
  • Corner detector: Harris
    • Large image gradient in 2 directions
  • Corner detector: FAST
  • Corner detectors: LoG, DoG, DoH
  • Blob detector: MSER

Next practice sessions

Compute and match descriptors for max. 1 hour (from practice session 2)


Play with ORB keypoint matching to implement a simple AR technique (practice session 3)


Introduction

How are panorama pictures created from multiple pictures?


  1. Detect small parts invariant under viewpoint change: keypoints
  2. Find pairs of matching keypoints using a description of their neighborhood
  3. Compute the most likely transformation to blend images together


Local feature descriptors

Harris & Stephens conclusion

Harris-Stephens trick to avoid computing eigenvalues:


Nowadays linear algebra is cheap, so compute the real eigenvalues.

Then filter using min(λ1, λ2) > λ, where λ is a threshold

This is the Shi-Tomasi variant

Build your own edge/corner detector

We need the eigenvalues λ1 and λ2 of the structure tensor (the second-moment matrix of image gradients, summed block-wise over a window)


dst = cv2.cornerEigenValsAndVecs(src, neighborhood_size, sobel_aperture)  # λ1, λ2 and eigenvectors per pixel
dst = cv2.cornerMinEigenVal(src, neighborhood_size, sobel_aperture)       # only min(λ1, λ2), as used by Shi-Tomasi
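The same response can also be computed by hand. A minimal numpy sketch (using np.gradient instead of a Sobel filter, and a 3×3 window; both are assumptions):

```python
import numpy as np

def box_sum(a, win):
    # block-wise sum over a win x win window, via 2D cumulative sums
    c = np.cumsum(np.cumsum(a, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))
    return c[win:, win:] - c[:-win, win:] - c[win:, :-win] + c[:-win, :-win]

def min_eigenvalue(img, win=3):
    # structure tensor entries from image gradients, summed per window
    Iy, Ix = np.gradient(img.astype(float))
    sxx, syy, sxy = box_sum(Ix * Ix, win), box_sum(Iy * Iy, win), box_sum(Ix * Iy, win)
    # closed-form smallest eigenvalue of the 2x2 symmetric matrix
    return 0.5 * (sxx + syy - np.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2))
```

Thresholding this map with min(λ1, λ2) > λ gives the Shi-Tomasi detector.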

Harris summary

Pros

Translation invariant

  • Large gradients in both directions = stable points

Cons

Not so fast

  • Avoid computing all those derivatives

Not scale invariant

  • Cannot detect corners at different scales

Corner detectors, binary tests: FAST

Features from accelerated segment test (FAST)

Keypoint detector used by ORB

Segment test

Compare the intensity Ip of pixel P with the surrounding pixels (a circle of 16 pixels)

If n contiguous pixels are either:

  • all darker than Ip − t
  • all brighter than Ip + t

then P is detected as a corner
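The segment test can be sketched directly in numpy (unoptimized; the 16 offsets are the standard radius-3 Bresenham circle, and the default t and n values are assumptions):

```python
import numpy as np

# 16 offsets (dx, dy) of the Bresenham circle of radius 3 used by FAST
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_fast_corner(img, y, x, t=10, n=12):
    # segment test: n contiguous circle pixels all darker / all brighter than Ip +- t
    Ip = int(img[y, x])
    ring = np.array([int(img[y + dy, x + dx]) for dx, dy in CIRCLE])
    for mask in (ring > Ip + t, ring < Ip - t):
        m = np.concatenate([mask, mask])   # tile twice to handle wrap-around
        run = 0
        for v in m:
            run = run + 1 if v else 0
            if run >= n:
                return True
    return False
```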


Tricks

  1. Cascading
    • If n = 12 (3/4 of the circle): first test pixels 1, 9, 5 and 13; at least 3 of them must pass, so most non-corners are rejected after a few comparisons
  2. Machine learning
  3. How to perform non-maximal suppression

FAST summary

Pros

Very fast

  • 20 times faster than Harris
  • 40 times faster than DoG

Very robust to transformations (perspective in particular)

Cons

Very sensitive to blur

Corner detectors at different scales: LoG, DoG, DoH

Laplacian of Gaussian (LoG)

The theoretical, slow way

Band-pass filter
It detects objects of a certain size


Laplacian = second derivative

Like Sobel, with one more derivation

Taylor, again:


New filter

Ixx = [1 −2 1] ∗ I


Laplacian filter: ∇²I(x,y) = Ixx(x,y) + Iyy(x,y)

Edge detector, like Sobel but with second derivatives
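The discrete Laplacian can be applied as a single convolution; a minimal sketch (border pixels are simply left at zero):

```python
import numpy as np

def laplacian(img):
    # sum of second derivatives: convolution with [[0,1,0],[1,-4,1],[0,1,0]]
    f = img.astype(float)
    out = np.zeros_like(f)
    out[1:-1, 1:-1] = (f[1:-1, :-2] + f[1:-1, 2:] +
                       f[:-2, 1:-1] + f[2:, 1:-1] - 4.0 * f[1:-1, 1:-1])
    return out
```

The response is zero on flat areas and on linear ramps, and extremal at intensity peaks and edges.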


Laplacian of Gaussian

Mexican hat shape

LoG = detector of circular shapes

Detector of circular shapes:

LoG filter extrema locate "blobs"

  • maxima = dark blobs on light background
  • minima = light blobs on dark background

Detecting corners/blobs

Build a scale-space representation: a stack of images (3D) with increasing σ

Difference of Gaussian (DoG)

Fast approximation of LoG (Used by SIFT)

LoG can be approximated by a difference of 2 Gaussians (DoG) at different scales

DoG filter

It is a band-pass filter

The idea is: "are my bumps contained between such and such wavelengths?"

DoG filter

Intuition

  • Gaussian (g) is a low pass filter

DoG computation in practice

Take an image

Blur it

Take the difference
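These three steps, sketched with a hand-rolled separable Gaussian (the kernel radius and the √2 scale ratio between the two blurs are assumptions):

```python
import numpy as np

def gaussian_kernel(sigma):
    # 1D Gaussian, truncated at 3 sigma, normalized to sum to 1
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, sigma):
    # separable Gaussian blur: filter rows, then columns
    k = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(lambda row: np.convolve(row, k, mode="same"), 1, img.astype(float))
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="same"), 0, tmp)

def dog(img, sigma, k_ratio=np.sqrt(2)):
    # take an image, blur it at two nearby scales, take the difference
    return blur(img, k_ratio * sigma) - blur(img, sigma)
```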

DoG scale generation trick

DoG computation use "octaves"

  • "Octave" because frequency doubles/halves between octaves
  • If the scale ratio between levels is √2, then 3 levels per octave (σ, √2σ, 2σ)
  • Downsample images for the next octave (equivalent to doubling the kernel size)
  • Compute the DoG between consecutive images

DoG: Corner selection

Throw out weak responses and edges

Estimate gradients

  • Similar to Harris, look at nearby responses
  • Not whole image, only a few points! faster
  • Throw out weak responses

Find cornery things

  • Same deal: structure matrix, use det and trace info (SIFT variant)

Determinant of Hessian (DoH)

Faster approximation

LoG vs DoG vs DoH

We prefer a "Laplacian of Gaussian" to detect the small stars

LoG, DoG, DoH summary

Pros

Very robust to transformations

  • Perspective
  • Blur

Adjustable size detector

Cons

Slow

Blob detectors: MSER

Maximally Stable Extremal Regions (MSER)

Detects regions which are stable over thresholds

  1. Create min & max-tree of the image
    • Tree of nested components obtained by thresholding the image at each possible level
      • (Example: the brain has a tumor)
  2. Select the most stable regions
    • stable between t − Δ and t + Δ
    • Rt* is maximally stable iff q(t) = |R(t+Δ) \ R(t−Δ)| / |R(t)| has a local minimum at t*

Summary

Pros

Very robust to transformations

  • Affine transformations
  • Intensity changes

Quite fast

Cons

Does not tolerate blur

Local feature detectors: Conclusion

  • Harris Stephens: can be very stable when combined with DoG
  • Shi-Tomasi: Assumes affine transformation (avoid it with perspective)
  • DoG: slow but very robust (perspective, blur, illumination)
  • DoH: faster than DoG, misses small elements, better with perspective
  • FAST: very fast, robust to perspective change (like DoG), but blur quickly kills it
  • MSER: fast, very stable, good choice when no blur

Introduction

Given some keypoints in image 1, what are the most similar ones in image 2?

This is a nearest neighbor problem in descriptor space

This is also a geometrical problem in coordinate space

Matching

Matching problem

Goal: given 2 sets of descriptors, find the best matching pairs

Need a distance/norm: depends on the descriptor

  • Distribution (histogram)? Stats?
  • Data type?
    • Float, integers: Euclidean, cosine
    • Binary: Hamming

1-way matching

For each xi in the set of descriptors D1, find the closest element yi in D2

We have a match m(xi, yi) for each xi

Symmetry test aka cross check aka 2-way matching

For each xi in the set of descriptors D1, find the closest element yi in D2 such that xi is also the closest element to yi
Ratio test

For each xi in D1, find the 2 closest elements yi and yj in D2

Accept the match only if d(xi, yi) < r · d(xi, yj), for some ratio r < 1
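A sketch of the ratio test (the 0.75 threshold is an assumption; it should be calibrated on a training set as described below):

```python
import numpy as np

def ratio_matches(D1, D2, ratio=0.75):
    # keep a match only if the best distance is clearly smaller
    # than the second-best one (ambiguous matches are discarded)
    d = np.linalg.norm(D1[:, None, :] - D2[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(D1))
    keep = d[rows, best] < ratio * d[rows, second]
    return [(i, best[i]) for i in np.flatnonzero(keep)]
```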

Calibrate the ratio

Adjust it on a training set !

For each correct/incorrect match in your annotated database, plot the PDF of the ratio of closest to second-closest distances.

What is a good ratio in D. Lowe's experiment?

Geometric validation

Summary

Indexing

Indexing pipeline

Use case: We have a database of images and we want to find an object from it

Bruteforce matching aka linear matching

Simply scan all data and keep the closest elements

Does not scale to large databases, but can be faster on small ones

kD-Trees

A binary tree in which every node is a k-dimensional point; each level splits the data along one coordinate axis
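A minimal kd-tree with nearest-neighbour lookup (this variant stores a point at every node; real systems use optimized libraries):

```python
import numpy as np

def kd_build(points, depth=0):
    # split on one coordinate per level, median point as pivot
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]
    points = points[np.argsort(points[:, axis])]
    m = len(points) // 2
    return {"pt": points[m], "axis": axis,
            "left": kd_build(points[:m], depth + 1),
            "right": kd_build(points[m + 1:], depth + 1)}

def kd_nearest(node, q, best=None):
    # depth-first descent, then backtrack into the far side only when
    # the splitting plane is closer than the best distance found so far
    if node is None:
        return best
    d = np.linalg.norm(node["pt"] - q)
    if best is None or d < best[0]:
        best = (d, node["pt"])
    diff = q[node["axis"]] - node["pt"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = kd_nearest(near, q, best)
    if abs(diff) < best[0]:
        best = kd_nearest(far, q, best)
    return best
```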

FLANN - Efficient indexing

  • Original version: hierarchical k-means
  • Construction: repetitive k-means on data (then inside clusters) until minimum cluster size is reached
  • Lookup: traverse the tree in a best-bin-first manner with a backtrack queue; backtrack until enough points are returned

Locality-Sensitive Hashing (LSH)

Hash items using a family of hash functions which project similar items into the same bucket with high probability

Not cryptographic hashing!

Depending on which side of the separating hyperplane the point falls, the corresponding bit is set to 1 or 0.

This gives an approximation of the distance: a binary version of the cosine similarity.
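The random-hyperplane (cosine) LSH described above, sketched in numpy (the 16-bit signature and 128-D descriptors are arbitrary choices):

```python
import numpy as np

def lsh_signature(x, planes):
    # one bit per random hyperplane: 1 if x lies on its positive side
    return (planes @ x > 0).astype(np.uint8)

def hamming(a, b):
    # number of differing bits between two signatures
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
planes = rng.standard_normal((16, 128))   # 16 random hyperplanes for 128-D vectors
x = rng.standard_normal(128)
y = x + 0.01 * rng.standard_normal(128)   # near-duplicate of x
```

Similar vectors collide on most bits (small Hamming distance), while an opposite vector −x flips every bit: the Hamming distance between signatures approximates the angular (cosine) distance.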

Locality Sensitive Hashing (LSH)

  • Fast and efficient with large spaces and lots of data
  • Return a "good match", maybe not the best one
  • kNN can be costly

Which indexing ?

Experiment

Advice for the practice session:

Projective transformations

A linear transformation of pixel coordinates

Image Mappings Overview

Math. foundations & assumptions

  • For planar surfaces, the 3D-to-2D perspective projection reduces to a 2D-to-2D transformation
  • This is just a change of coordinate system
  • This transformation is invertible

Translation

Scale

Rotation

Notation: Partitioned matrices

Affine

Projective

More on projective transform

  • Each point in 2D is actually a vector in 3D (homogeneous coordinates)
  • Equivalent up to a scaling factor
  • Have to normalize to get back to 2D
  • Using a homography to project a point: multiply x̃ by H̃
Summary

Homography estimation, Geometric validation

We want to recover H from keypoint matches

Recover the parameters of a perspective transform

How many correspondences are needed?

Depends on the type of transform:

  • How many for translation ?
  • For rotation ?
  • For general projective transform ?

Reminder: we have 2 knowns (the x and y coordinates) for each match

Linear system

Use linear least squares to solve Ma = b

Solve the system

How reliable is the estimate?

Even worse

Is our data perfect?

We have quite a lot of noise.

Overcoming Least Square Limitations

  • We need a robust estimation

RANSAC: RANdom SAmple Consensus

Algorithm

RANSAC works well even with extreme noise (a large fraction of outliers)
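A generic RANSAC sketch, illustrated here on 2D line fitting rather than homography estimation (the iteration count and inlier threshold are assumptions):

```python
import numpy as np

def ransac(data, fit, error, n_min, n_iters=200, thresh=1.0, seed=0):
    # fit on random minimal samples; keep the model with the most inliers,
    # then refit on the full consensus set
    rng = np.random.default_rng(seed)
    best = np.zeros(len(data), dtype=bool)
    for _ in range(n_iters):
        sample = data[rng.choice(len(data), n_min, replace=False)]
        inliers = error(fit(sample), data) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return fit(data[best]), best

# example model: a 2D line y = a*x + b
def fit_line(pts):
    A = np.column_stack([pts[:, 0], np.ones(len(pts))])
    return np.linalg.lstsq(A, pts[:, 1], rcond=None)[0]

def line_error(ab, pts):
    return np.abs(pts[:, 1] - (ab[0] * pts[:, 0] + ab[1]))
```

For homography estimation the minimal sample is 4 matches and the error is the reprojection distance of each match.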