# CV shit

## computer vision tasks

- Image representation
- Image description
- Feature extraction
- Image matching
- Stereo image processing
- Range data processing
- Image segmentation

## Levels in image processing

- low level
    - improves quality of image
    - for humans, and higher routines
- mid level
    - feature extraction and pattern detection
- high level
    - classification
    - recognition
    - object identification
    - top down approach
- high level DIP tasks
    - Segmentation
    - Recognition
    - Compression
    - Motion analysis

## Neighbourhood processing tasks

- Smoothing / averaging
- Noise removal / filtering
- Edge detection
- Contrast enhancement

## Distance

### Minkowski dist

- d = (sum(|xi - yi|^p))^(1/p)
- p<1 not metric dist (triangle ineq fail)

### quadratic form distance

- d = sqrt(transpose(x-y)\*A\*(x-y))
- cross bin distance
- used to compare histograms
- specifies cross dependencies of dims
- A is similarity matrix
    - A is identity => euclidean
    - A is diagonal => weighted euclidean
    - A is +ve semi definite (for dist to be >= 0)
- metric distance (needs A +ve definite; with only semi definite A, d can be 0 for x != y)
- Aij = 1 - Cij/Cmax (for color histogram; Cij = distance between the colours of bins i and j, Cmax = max of Cij)

## Covariance and shit

- if x, y are independent, covar is 0
- but if covar is zero, x and y are not necessarily independent
- if covar matrix is diag, then all vars are uncorrelated
- covar matrix shows relationship with all variables
- inverse of covar matrix
    - concentration matrix/precision matrix
    - shows partial correlation and partial variances
    - shows relationship with neighbours

### Mahalanobis distance

- replace A in quadratic form distance with inverse of covariance matrix

### Histogram intersection

- used to compare histograms
- d(h1, h2) = 1 - sum(min(h1i, h2i))/sum(h1)
- not metric distance in general (non symmetric, because only sum(h1) is in the denominator)
- metric when sum of both histograms is the same
- when comparing normalised histograms, it is a metric dist (whether the images have different sizes or the same)
- use
    - image similarity
    - change detection
    - content based image retrieval

### cosine distance

- d = 1 - cos(<(x, y)) = 1 - x.y/(|x||y|)
- not a true metric: triangle ineq fails, and d = 0 for any two vectors with the same direction (the angle itself is a metric on directions)
- angle 0 => d = 0, angle 90 => d = 1 (max angle is 90 when all components are >= 0, because first quad)
- application
    - document matching = two documents with same ratio of words

### Bhattacharyya

- coefficient
    - approximate measure of overlap of 2 distributions
    - determines relative closeness of 2 samples
    - used to compare 2 normalised histograms
    - B(x, y) = sum(sqrt(xi\*yi))
- hellinger distance
    - d = sqrt(1 - B(x, y))
    - metric
- bhattacharyya distance
    - d = -ln(B(x, y))
    - not metric (no triangle ineq)

### Hausdorff

- to compare geometric shapes
- min min function (distance between the closest pair of points)
    - problem
        - doesn't depend on shape
        - one point can be very far away
        - position
- hausdorff is max min distance: h(A, B) = max over a in A of (min over b in B of d(a, b))
- directed h(A, B) is not metric (not symmetric); max(h(A, B), h(B, A)) is symmetric

### Edit distance

- to compare strings and words
- levenshtein
    - min no of edit operations to convert x to y
        - insert
        - del
        - substitution
    - metric dist
    - if strings are same size, hamming dist is upper bound
    - is 0, iff strings are equal
    - lower bound = |len(x) - len(y)|, upper bound = max(len(x), len(y))
- application
    - error correction
    - pattern matching
- cons
    - not normalised (depends on the len of string)

## Fourier

- For fourier -> spatial, need both magnitude and phase
- fourier is stored in float, because it has higher range (spatial is stored in int)
- Magnitude spectrum of a (real) image is always symmetrical about its center
- Phase in fourier transform is symmetrical too, but with the sign flipped (-ve phase at the mirrored point)
- center is F(0, 0) (after shifting the spectrum)
- DC value = center value F(0, 0) [proportional to avg of brightness]
- fmax = 1/(2 \* pixel spacing), i.e. 0.5 cycles per pixel
- logarithmic transform (for display) shows the other frequencies too, otherwise the DC value dominates
- Both halves have the same amount of info (conjugate symmetry), but need both halves to recreate orig; the missing half can be rebuilt from the other

### FFT

- Only generates half (enough for a real image)
- other half by rot (180 deg) and dupl (conjugate)

### mag and phase

- magnitude: the presence (how much) of each sinusoid in orig func
- phase: relative placement of sine and cosine waves
- phase is more important (it carries the structure of the image)

### complexity

- complexity of 1d fourier = O(N^2)
- complexity of FFT = O(N lg(N))
- complexity of 2d fourier = O(N^4)
- complexity of 2d fourier with 1d (rows, then columns) = O(N^3)
- complexity of 2d fourier with 1d FFT = O(N^2 lg(N))

### Properties of dft

- periodic, with period N
- conjugate symmetry (slide 4, 15) (pg 79)
    - f(x, y) real and even => F(u, v) real and even
    - f(x, y) real and odd => F(u, v) imag and odd
- scaling, pg 88
- distribution: (add/subtr) F(f+g) = F(f) + F(g)
- laplacian (derivative property): pg 90, F[d^n f(x)/dx^n] = (2\*pi\*j\*u)^n \* F(u)
- translation: pg 91, useful for translating by N/2 (moves F(0, 0) to the center)
- rotation
- average = F(0, 0)/N (for an NxN image with the 1/N factor in the forward transform; with an unnormalised forward transform it is F(0, 0)/N^2)

## Filtering

- removing unwanted parts (noise)
- enhancing image
- point processing = works on pixel
    - negative
    - contrast stretching
    - thresholding
    - histogram equalization
- area or mask processing = works on neighbourhood
    - need to define area, size and operation
    - operation is weighting the pixel
    - different weights: sharpen, smoothen, edge detection etc
    - filter = mask/kernel/weight matrix
    - handling pixel on boundaries: wrap around or pad with zeros

#### correlation and convolution

- correlation = multiply and add
- convolution = rotate mask by 180 (flip x and y) and multiply and add

### spatial filter

- convolutional filters
    - linear
    - box (avg/mean) filter
        - performs average smoothing
        - sum of mask is 1
        - all weights are equal
    - gaussian filter
        - weights depend on distance from pixel
        - sigma: defines the sharp and flat of peak (sigma high, peak flat)
        - complexity O(2kn^2) when done as two 1d passes (worst n^2k^2 with the full k x k mask)
- order statistics filter
    - non linear
    - median filter
    - rank order filter
- hybrid
    - combination of two

#### Problems

- with averaging, values near a wrong (too bright) pixel will increase

### Order statistic

- median filter
    - replace by median instead of mean
    - advantage
        - sharpness is preserved
        - occasional (wrong) high won't affect
    - if more noise, more than one pass might do good
- rank order
    - any nth order (min, max, median)

## Edge Detection

- Edge is a boundary between two homogeneous regions
- The gray level properties of the two regions on either side of an edge
    - are distinct, and
    - exhibit some local uniformity or homogeneity among themselves
- Edge Operators
    - ![](https://i.imgur.com/TIS6c7r.png)
- Laplacian Operator
    - ![](https://i.imgur.com/dCE1zHv.png)
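
Rough sketch for the quadratic form distance notes (Distance section), in pure Python. The vectors `x`, `y` and the matrices `identity`, `weights`, `cov`, `inv_cov` are made-up values for illustration, not from the notes. It only checks the special cases listed there: identity A gives Euclidean, diagonal A gives weighted Euclidean, inverse covariance gives Mahalanobis.

```python
import math

def quadratic_form_distance(x, y, A):
    """d = sqrt((x - y)^T * A * (x - y)) for plain Python lists."""
    diff = [a - b for a, b in zip(x, y)]
    # (x - y)^T * A * (x - y) written out as a double sum
    total = sum(diff[i] * A[i][j] * diff[j]
                for i in range(len(diff))
                for j in range(len(diff)))
    return math.sqrt(total)

x = [3.0, 0.0]
y = [0.0, 4.0]

# A = identity: plain Euclidean distance
identity = [[1.0, 0.0], [0.0, 1.0]]
print(quadratic_form_distance(x, y, identity))  # 5.0

# A diagonal: weighted Euclidean, second dimension counts 4x
weights = [[1.0, 0.0], [0.0, 4.0]]
print(quadratic_form_distance(x, y, weights))   # sqrt(9 + 4*16)

# A = inverse covariance: Mahalanobis distance.
# cov is a hypothetical diagonal covariance, so its inverse is easy to write by hand.
cov = [[4.0, 0.0], [0.0, 1.0]]
inv_cov = [[0.25, 0.0], [0.0, 1.0]]
print(quadratic_form_distance(x, y, inv_cov))   # sqrt(9*0.25 + 16)
```

Off-diagonal entries in A are what make it a cross bin distance: they let a difference in one bin be partly offset by a difference in a similar bin.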
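
Rough sketch for the histogram intersection and Bhattacharyya notes (Distance section), in pure Python. The histograms `h1`, `h2` are made-up 3-bin examples and the helper names are my own. Hellinger is computed here as sqrt(1 - B), the usual definition. The point is to see the asymmetry of the unnormalised intersection go away after normalising.

```python
import math

def histogram_intersection(h1, h2):
    """d(h1, h2) = 1 - sum(min(h1_i, h2_i)) / sum(h1)."""
    return 1.0 - sum(min(a, b) for a, b in zip(h1, h2)) / sum(h1)

def bhattacharyya_coefficient(p, q):
    """B(p, q) = sum(sqrt(p_i * q_i)), for normalised histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def hellinger(p, q):
    # max(0, ...) guards against tiny negative values from rounding
    return math.sqrt(max(0.0, 1.0 - bhattacharyya_coefficient(p, q)))

def bhattacharyya_distance(p, q):
    return -math.log(bhattacharyya_coefficient(p, q))

def normalise(h):
    total = sum(h)
    return [v / total for v in h]

h1 = [4, 2, 2]   # hypothetical histogram of an 8 pixel image
h2 = [1, 1, 2]   # hypothetical histogram of a 4 pixel image

# Unnormalised: the result depends on which histogram is in the denominator
print(histogram_intersection(h1, h2))  # 0.5
print(histogram_intersection(h2, h1))  # 0.0

# Normalised: both sums are 1, so the distance is symmetric
p1, p2 = normalise(h1), normalise(h2)
print(histogram_intersection(p1, p2), histogram_intersection(p2, p1))

print(bhattacharyya_coefficient(p1, p2))
print(hellinger(p1, p2))
print(bhattacharyya_distance(p1, p2))
```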
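
Rough sketch for the edit distance notes (Distance section): the standard dynamic programming version of Levenshtein, plus a Hamming helper to check the bounds. The function names and test strings are my own choice.

```python
def levenshtein(x, y):
    """Min number of insertions, deletions and substitutions to turn x into y."""
    prev = list(range(len(y) + 1))          # distance from "" to each prefix of y
    for i, cx in enumerate(x, start=1):
        curr = [i]                          # distance from x[:i] to ""
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,                # delete cx
                curr[j - 1] + 1,            # insert cy
                prev[j - 1] + (cx != cy),   # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]

def hamming(x, y):
    """Number of differing positions; only meaningful for equal lengths."""
    return sum(a != b for a, b in zip(x, y))

print(levenshtein("kitten", "sitting"))     # 3

# Same length: Hamming is an upper bound, not always equal
print(hamming("abcd", "bcda"), levenshtein("abcd", "bcda"))  # 4 2

# Bounds from the notes: |len(x) - len(y)| <= d <= max(len(x), len(y))
x, y = "kitten", "sitting"
d = levenshtein(x, y)
print(abs(len(x) - len(y)) <= d <= max(len(x), len(y)))      # True
```

Dividing by max(len(x), len(y)) is one common way to get a value in [0, 1], which addresses the "not normalised" con.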
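
Rough sketch for the Fourier notes: a naive O(N^2) 1D DFT in pure Python, used to check the DC value, the conjugate symmetry for real input, and that magnitude alone is not enough to get the signal back. Assumptions: the forward transform is unnormalised (so the average is F(0)/N in 1D), and `signal` is a made-up list of samples.

```python
import cmath

def dft(f):
    """Naive 1D DFT: F(u) = sum_x f(x) * exp(-2*pi*j*u*x/N). O(N^2)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]

def idft(F):
    """Inverse of dft(), with the 1/N factor on this side."""
    n = len(F)
    return [sum(F[u] * cmath.exp(2j * cmath.pi * u * x / n) for u in range(n)) / n
            for x in range(n)]

signal = [1.0, 2.0, 0.0, 5.0, 3.0, 1.0]   # hypothetical real samples
n = len(signal)
F = dft(signal)

# DC term: F(0) is the sum of the samples, so F(0)/N is the average
print(F[0].real / n)                      # 2.0

# Conjugate symmetry for real input: F(N - u) = conj(F(u)),
# so magnitudes mirror and phases mirror with the sign flipped
print(all(abs(F[n - u] - F[u].conjugate()) < 1e-9 for u in range(1, n)))

# Magnitude and phase together rebuild the signal
magnitude = [abs(c) for c in F]
phase = [cmath.phase(c) for c in F]
rebuilt = idft([cmath.rect(m, p) for m, p in zip(magnitude, phase)])
print(max(abs(r - s) for r, s in zip(rebuilt, signal)) < 1e-9)

# Magnitude alone (phase set to 0) gives back a different signal
no_phase = idft([complex(m, 0.0) for m in magnitude])
print([round(v.real, 2) for v in no_phase])
```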
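
Rough sketch for the correlation and convolution notes (Filtering section), in 1D to keep it short. Boundaries are padded with zeros. The `impulse` signal and the asymmetric `kernel` are made up so the flip is visible; with a symmetric kernel both operations give the same result.

```python
def correlate(signal, kernel):
    """Correlation: slide the kernel over the signal, multiply and add."""
    half = len(kernel) // 2
    padded = [0] * half + list(signal) + [0] * half   # zero padding
    return [sum(w * padded[i + j] for j, w in enumerate(kernel))
            for i in range(len(signal))]

def convolve(signal, kernel):
    """Convolution: rotate the kernel by 180 degrees, then multiply and add."""
    return correlate(signal, kernel[::-1])

impulse = [0, 0, 1, 0, 0]
kernel = [1, 2, 3]

print(correlate(impulse, kernel))   # [0, 3, 2, 1, 0]  (kernel comes out flipped)
print(convolve(impulse, kernel))    # [0, 1, 2, 3, 0]  (kernel comes out as is)
```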
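
Rough sketch for the mean vs median filter notes (Problems and Order statistic sections), again in 1D with zero padding. `row` is a made-up scanline with one wrong bright pixel, and `filter_1d` is a hypothetical helper, not from any library.

```python
from statistics import median

def filter_1d(signal, k, reducer):
    """Apply reducer to every window of odd size k; pad boundaries with zeros."""
    half = k // 2
    padded = [0] * half + list(signal) + [0] * half
    return [reducer(padded[i:i + k]) for i in range(len(signal))]

def mean(window):
    return sum(window) / len(window)

row = [10, 10, 10, 200, 10, 10, 10]   # one "wrong" pixel in a flat region

# Mean (box) filter: the neighbours of the wrong pixel get pulled up
print([round(v, 1) for v in filter_1d(row, 3, mean)])

# Median filter: the occasional wrong high value is thrown away
print(filter_1d(row, 3, median))      # [10, 10, 10, 10, 10, 10, 10]
```

Passing `min` or `max` as the reducer gives the other rank order filters mentioned in the notes.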