MLRF: Lecture 03

# MLRF: Lecture 03 # Agenda for lecture 3 1. Introduction 2. Finish lecture about local feature **detectors** 3. Local feature descriptors 4. Descriptor matching and indexing 5. Projective transformations 6. Homography estimation # A word about Divise clustering HAC is *bottom-up*, divisive clustering is performed *top-down* Classical approach: 1. Start with all data 2. Apply flat clustering 3. Recursively apply the approach on each cluster until some termination Pros: can have more than 2 sub-trees # Summary of last lecture Global image descriptors - Color histogram - Limited descriptive power - Which distance function ? Clustering - K-means - Hierarchical Agglomerative Clustering Local feature detectors - Image gradients - Edge detector: Sobel, Canny - Corner detector: Harris - Large image gradient in 2 directions - Corner detector: FAST - Corner detectors: LoG, DoG, DoH - Blob detector: MSER # Nest practice sessions **Compute and match descriptors** for max. 1 hour (from practice session 2) ![](https://i.imgur.com/ME4rRyL.png) Play with **ORB keypoint mathcing** to implement a simple AR technique (practice session 3) ![](https://i.imgur.com/tOqFPM7.png) # Introduction *How are panorama pcitures created from multiple pictures?* ![](https://i.imgur.com/VC2FHNU.png) 1. Detect small parts invariant under viewpoint change: **keypoints** 2. Find pairs of mathcing keypoints using a **description** of their neighborhood 3. Compute the **most likely transformation** to blend images together ![](https://i.imgur.com/M7npHSo.png) # Local feature descriptors ## Harris & Stephen conclusion Harris-Stephens trick to avoid computing eigenvalues: ![](https://i.imgur.com/FLWPRWd.png) :::danger Nowadays linear algebra is cheap, so **compute the real eigenvalues**. ::: Thein filter using $\min(\lambda_1, \lambda_2)\gt\lambda$, $\lambda$ being a threshold :::success This is the Shi-Tomasi variant ::: ## Build your own edge/corner detector We need the eigenvalues $\lambda_1$ and $\lambda_2$ of the structure tensor (hessian matrix with block-wise summing) ![](https://i.imgur.com/7GstOaW.png) ```python dst = cv2.cornerEigenValsAndVecs(src, neighborhood_size, sobel_aperture) dst = cv2.cornerMinEigenVal(src, neighborhood_size, sobel_aperture) ``` ## Harris summary ### Pros Translation invariant - Large gradients in both directions = stable points ### Cons Not so fast - Avoid to compute all those derivatives Not scale invariant - Detect corners at different *scales* # Corner detectors, binary tests FAST ## Features from accelerated segment test (FAST) *Keypoint detector used by ORB* :::info **Segment test** Compare pixel $P$ intensity $I_p$ with surrounding pixels (circle of 16 pixels) ::: :::success If $n$ contiguous pixels are either: - all darker than $I_p-t$ - all brighter than $I_p+t$ then $P$ is detected as a corner ::: ![](https://i.imgur.com/Eu4B86L.png) ## Tricks 1. Cascading - If $n=12$ ($\frac{3}{4}$ of the circle) 3. Machine learning 4. How to perform **non-maximal suppression** ## FAST summary ### Pros Very fast - 20 times faster than Harris - 40 times faster than DoG Very robust to transformations (perspective in particular) ### Cons Very sensitive to blur # Corner detectors at different scales LoG, DoG, DoH ## Laplacian of Gaussian (LoG) > The theoretical, slow way :::info **Band-pass filter** It detects objects of a certain size ::: ![](https://i.imgur.com/7GcefOa.png) ### Laplacian = second derivative Like sobel with 1 more derivation Taylor, again: ![](https://i.imgur.com/vy1GaMl.png) New filter $I_{xx} = \begin{bmatrix}&1 &-2 &1\end{bmatrix} \times I$ ![](https://i.imgur.com/BAlsMNI.png) ### Laplacian filter $\nabla^2I(x,y)$ Edge detector, like Sobel but with second derivatives ![](https://i.imgur.com/6Yho5sq.png) ### Laplacian **of Gaussian** > mexican hat ![](https://i.imgur.com/yZ1Bi2w.png) ### LoG = detector of circular shapes :::success Detector of circular shapes: ::: ![](https://i.imgur.com/UXh0ZNU.png) LoG filter extrema locates "blobs" - maxima = dark blobs on light background - minima = light blobs on dark background ### Detecting corners/blobs Build a scale space representation: *stack of images (3D) with increasing $\sigma$* ![](https://i.imgur.com/ZgwHcUl.png) ## Difference of Gaussian (DoG) Fast approximation of LoG (Used by SIFT) :::info LoG can be approximate by a Difference of 2 Gaussians (DoG) at different scales ![](https://i.imgur.com/WTrNA1i.png) ::: ### DoG filter It is a band-pass filter ![](https://i.imgur.com/xiqeck7.png) ![](https://i.imgur.com/gxogNF1.png) > L'idee est : "est-ce que mes bosses sont comprises entre telles ou telles longueurs d'onde" ### DoG filter Intuition - Gaussian (g) is a low pass filter ![](https://i.imgur.com/oOaZRnt.png) ### DoG computation in practice Take an image ![](https://i.imgur.com/zU0jFr7.png) Blur it ![](https://i.imgur.com/PNSUndv.png) Take the difference ![](https://i.imgur.com/sm39Q0d.png) ### DoG scale generation trick :::info DoG computation use "octaves" ::: - "Octave" because frequency doubles/halves between octaves - If $\sigma=\sqrt{2}$, then $3$ levels per octave - Downsample images for next octave (like double sized kernel) - Compute the DoG between images ![](https://i.imgur.com/Lx4twSa.png) ### DoG: Corner selection :::info Throw out weak responses and edges ::: Estimate gradients - Similar to harris, look at nearby responses - Not whole image, only a few points! faster - Throw out weak responses Find cornery things - Same deal, structure matrix, use dete and trace info (SIFT variant) ![](https://i.imgur.com/qUZfnNO.png) ## Determination of Hessian (DoH) :::info Faster approximation ::: ![](https://i.imgur.com/lNt6Chu.png) ![](https://i.imgur.com/ntMSuUG.png) ## LoG vs DoG vs DoH ![](https://i.imgur.com/sxQR6D6.jpg) :::success On prefere un "Laplacian of Gaussian" pour detecter les petites etoiles ::: ## LoG, DoG DoH summary ### Pros Very robust to transformations - Perspective - Blur Adjustable size detector ### Cons Slow # Blob detectors MSER ## Maximally Stable Extremal Regions (MSER) > Detects regions which are stable over thresholds 1. Create min & max-tree of the image - Tree of included components when thresholdinf the image ar each possible level - ![](https://i.imgur.com/Nbj8pYi.png) - ![](https://i.imgur.com/7kQ4564.png) - Le cerveau a une tumeur 2. Select most stable regions - between $t-\triangle$ and $t+\triangle$ - $R_{t\*}$ is maximally stable iif $q(t)=\vert R_{t-\triangle}\text{\ } R_{t+\triangle}\vert/\vert R_t\vert$ as local minimum at $t^{\*}$ - ![](https://i.imgur.com/EhzBlrT.png) ### Summary #### Pros Very robust to transformations - Affine transformations - Intensity changes Quite fast #### Cons Does not support blur # Local fetaure detectors: Conclusion - Harris Stephens: can be very stable when combined with DoG - Shi-Tomasi: Assumes affine transformation (avoid it with perspective) - DoG: slow but very robust (perspective, blur, illumination) - DoH: faster than DoG, misses small elements, better with perspective - FAST: very fast, robust to perspective change (like DoG), but blur quickly kills it - MSER: fast, very stable, good choice when no blur ![](https://i.imgur.com/AYG4TCS.png) # Introduction > Given som keypoints in image 1, what are the more similar ones in image 2 ? ![](https://i.imgur.com/9FCjhYP.png) :::info This is a **nearest neighbor problem** in **descriptor space** ::: :::warning This is also a **geometrical problem** in **coordinate space** ::: # Matching ## Matching problem :::danger Goal: given 2 sets of descriptors, find the best matching pairs ::: Need a **distance/norm**: depends on the descriptor - Distribution (histogram)? Stats? - Data type ? - Float, integers: Euclidean, cosine - Binary: Hamming ## 1-way matching :::info For each $x_i$ in the set of descriptors $D_1$, find the closest element $y_i$ in $D_2$ ::: ![](https://i.imgur.com/JIOYVVN.png) ![](https://i.imgur.com/cX90LS3.png) :::success We have a match $m(x_i, y_i)$ for each $x_i$ ::: ![](https://i.imgur.com/Lb6FbId.png) ## Symmetry test aka cross check aka 2-way matching :::info For each $x_i$ in the set of descriptor $D_1$, find the closest element $y_i$ in $D_2$ such as $x_i$ is **also the closest** element to $y_i$ ::: ![](https://i.imgur.com/oSkRfIu.png) ![](https://i.imgur.com/mXPJRmi.png) ## Ratio test :::info For each $x_i$ in $D_1$, find the 2 closest elements $y_i$ and $y_j$ in $D_2$ ::: ![](https://i.imgur.com/sSGDA3q.png) ### Calibrate the ratio :::warning Adjust it on a training set ! ::: For each correct/incorrect match in your annotated database, plot the *next to next closest distance* PDF. **What is a good ratio in D. Lowe's experiment ?** ![](https://i.imgur.com/heyBGn6.png) ## Geometric validation ![](https://i.imgur.com/HWSJlMr.png) ## Summary ![](https://i.imgur.com/emb2LW0.jpg) # Indexing ## Indexing pipeline Use case: We have a database of images and we want to find an object from it ![](https://i.imgur.com/IT7nvU1.png) ## Bruteforce matching aka linear matching :::info Simply scan all data and keep the closest elements ::: Does not scale to large databases, but can be faster on small ones ## kD-Trees > binary tree in which every leaf node is a k-dimensional point ## FLANN - Efficient indexing - Original version: hierarchical k-means - Construction: repetitive k-means on data (then inside clusters) until minimum cluster size is reached - Lookup: traverse the tree in a best-bin-first manner with backtrack queue, backtrack until enough point are returned ## Locally Sensitive Hasing (LSH) :::info Hash items using family of hash function which project similar items in the same bucket with high probability ::: :::danger Not cryptographic hashing ! ::: ![](https://i.imgur.com/8pNSTNa.png) ![](https://i.imgur.com/34UbaCg.png) En fonction de quel cote notre separatrice se trouve par rapport au point, on met notre bit a 1 ou 0 ![](https://i.imgur.com/y91ZzFV.png) ![](https://i.imgur.com/olHjq6h.png) :::success Approximation de la distance $\cos$ en binaire ::: ![](https://i.imgur.com/KBfg4xN.png) ## Locality Sensitive Hashing (LSH) - Fast and efficient with large spaces and lot of data - Return a "good match", maybe not the best one - kNN can be costly ## Which indexing ? :::success Experiment ::: Advices for practice session: ![](https://i.imgur.com/6RI3g9V.png) ![](https://i.imgur.com/pWfbArP.png) # Projective transformations ## A linear transformation of pixel coordinates ![](https://i.imgur.com/MuVLu6J.png) ## Image Mappings Overview ![](https://i.imgur.com/l0bhIFj.png) ## Math. foundations & assumptions ![](https://i.imgur.com/pRtFCWX.png) - For planar surfaces, 3D to 2D perspectives projection reduces to a 2D to a 2D transformation - This is just a change of coordinate system - This transformation is **invertible** ## Translation ![](https://i.imgur.com/hsauJ6A.png) ## Scale ![](https://i.imgur.com/FdafdCS.png) ## Rotation ![](https://i.imgur.com/2gPwaoN.png) ### Notation: Partitioned matrices ![](https://i.imgur.com/CvxLHJd.png) ## Affine ![](https://i.imgur.com/JDWlKoO.png) ## Projective ![](https://i.imgur.com/JQVbpnZ.png) ## More on projective transform - Each point in 2D is actually a vector in 3D - Equivalent up to scaling factor - Have to normalize to get back to 2D - Using homogrpahy to project point - Multiply $\tilde x$ by $\tilde H$ ## Summary ![](https://i.imgur.com/HXJQHTn.png) # Homography estimation, Geometric validation ## We want to recover H from keypoint matches ![](https://i.imgur.com/L1sAs1F.png) ## Recover the parameters of a perspective transform ![](https://i.imgur.com/QOPhp1a.png) ## How many correspondences are needed ? Depends on the type of transform: - How many for translation ? - For rotation ? - ... - For general projective transform ? :::danger Reminded: we have 2 knowns for each match ::: ![](https://i.imgur.com/qrJNg89.png) ### Linear system ![](https://i.imgur.com/AdWjP58.png) ### Use Linear Least Square to solve $Ma=b$ ![](https://i.imgur.com/7ZwlwZW.png) ### Solve the system ![](https://i.imgur.com/GFEOQIZ.png) ## How reliable is the estimate ? ![](https://i.imgur.com/NJJYp3J.png) ### Even worse ![](https://i.imgur.com/Lkqj3Hh.png) ## Is our data perfect ? ![](https://i.imgur.com/BSPY6Jy.jpg) On a pas mal de bruit ## Overcoming Least Square Limitations - We need a **robust estimation** ### RANSAC: RANdom SAmple Consensus ![](https://i.imgur.com/DG1ja8H.png) ![](https://i.imgur.com/UvDoR08.png) #### Algorithm ![](https://i.imgur.com/TDS9vMN.png) ![](https://i.imgur.com/cfc8vdt.png) ## RANSAC works well with extreme noises ![](https://i.imgur.com/LWMNKmj.jpg)