# CV notes
## Computer vision tasks
- Image representation
- Image description
- Feature extraction
- Image matching
- Stereo image processing
- Range data processing
- Image segmentation
## Levels in image processing
- low level
    - improves quality of image
    - output is for humans, and for higher level routines
- mid level
    - feature extraction and pattern detection
- high level
    - classification
    - recognition
    - object identification
    - top down approach
- high level DIP tasks
    - Segmentation
    - Recognition
    - Compression
    - Motion analysis
## Neighbourhood processing tasks
- Smoothing / averaging
- Noise removal / filtering
- Edge detection
- Contrast enhancement
## Distance
### Minkowski distance
- p < 1: not a metric distance (triangle inequality fails)
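
A minimal sketch of the Minkowski distance in plain Python, assuming the usual definition d = (sum |xi - yi|^p)^(1/p), which these notes do not spell out. The name `minkowski` and the three sample points are illustrative only. The last lines check numerically that for p = 0.5 the direct route is longer than a detour, so the triangle inequality fails.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


a, b, c = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)

manhattan = minkowski(a, c, 1)   # p = 1
euclidean = minkowski(a, c, 2)   # p = 2

# For p = 0.5 the direct route a -> c is longer than the detour via b.
direct = minkowski(a, c, 0.5)
detour = minkowski(a, b, 0.5) + minkowski(b, c, 0.5)
triangle_holds = direct <= detour
```
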
### Quadratic form distance
- d = sqrt(transpose(x-y)\*A\*(x-y))
- cross bin distance
- used to compare histograms
- specifies cross dependencies of dims
- A is similarity matrix
    - A is identity => Euclidean
    - A is diagonal => weighted Euclidean
    - A is +ve semi definite (for dist to be >= 0)
- metric distance when A is symmetric +ve definite (only a pseudo-metric if A is merely semi definite, since distinct x and y can then have d = 0)
- Aij = 1 - Cij/Cmax (for color histogram)
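
The formula above can be sketched in plain Python as follows. The 3-bin histograms and the `similarity` matrix are made-up values chosen so that bins 0 and 1 count as similar colours; they are assumptions for illustration, not data from the notes.

```python
import math


def quadratic_form_distance(x, y, A):
    """sqrt((x - y)^T A (x - y)) for sequences x, y and a square matrix A (list of rows)."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    total = sum(diff[i] * A[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(max(total, 0.0))  # clamp tiny negative rounding errors


h1 = [1.0, 0.0, 0.0]
h2 = [0.0, 1.0, 0.0]
h3 = [0.0, 0.0, 1.0]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Hypothetical similarity matrix: bins 0 and 1 are similar colours, bin 2 is not.
similarity = [[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.0],
              [0.0, 0.0, 1.0]]

d_euclid = quadratic_form_distance(h1, h2, identity)   # identity -> plain Euclidean
d_near = quadratic_form_distance(h1, h2, similarity)   # similar bins -> small distance
d_far = quadratic_form_distance(h1, h3, similarity)    # unrelated bins -> large distance
```

With the identity matrix every bin mismatch costs the same; the cross-bin terms in `similarity` make a shift between similar bins cheaper than a shift to an unrelated bin.
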
## Covariance
- if x, y are independent, covar is 0
- but if covar is zero, x and y are not necessarily independent
- if covar matrix is diagonal, then all vars are pairwise uncorrelated
- covar matrix shows the relationship between every pair of variables
- inverse of covar matrix
    - concentration matrix/precision matrix
    - shows partial correlations and partial variances
    - shows relationship with neighbours
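
A small numeric check of the first two bullets, as a sketch: y = x^2 is completely determined by x, yet the covariance comes out as zero. The sample values and the helper name `covariance` are illustrative choices.

```python
def covariance(xs, ys):
    """Population covariance of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]          # y is fully determined by x

cov_xy = covariance(xs, ys)       # zero, although y depends on x
cov_xx = covariance(xs, xs)       # covariance of x with itself = variance of x
```
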
### Mahalanobis distance
- replace A in quadratic form distance with inverse of covariance matrix
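
A minimal 2D sketch of this substitution, with the 2x2 inverse written out by hand so that no library is needed. The covariance values are hypothetical; the helper names are not from any particular package.

```python
import math


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis_2d(x, y, cov):
    """Quadratic form distance with A = inverse of the 2x2 covariance matrix."""
    p = inverse_2x2(cov)
    dx, dy = x[0] - y[0], x[1] - y[1]
    return math.sqrt(dx * (p[0][0] * dx + p[0][1] * dy) +
                     dy * (p[1][0] * dx + p[1][1] * dy))


cov = [[4.0, 0.0], [0.0, 1.0]]    # hypothetical: variance 4 along x, 1 along y
origin = (0.0, 0.0)

d_along_x = mahalanobis_2d((2.0, 0.0), origin, cov)   # 2 units = 1 std dev along x
d_along_y = mahalanobis_2d((0.0, 2.0), origin, cov)   # 2 units = 2 std devs along y
d_identity = mahalanobis_2d((3.0, 4.0), origin, [[1.0, 0.0], [0.0, 1.0]])  # Euclidean
```

The same Euclidean offset counts for less along the direction in which the data varies more.
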
### Histogram intersection
- used to compare histograms
- d(h1, h2) = 1 - sum(min(h1i, h2i))/sum(h1)
- not a metric distance in general (not symmetric, because only sum(h1) normalises)
- metric when the sums of both histograms are the same
- for normalised histograms it is a metric (equal to half the L1 distance), whether the source images have the same size or not
- use
    - image similarity
    - change detection
    - content based image retrieval
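
The formula and its asymmetry can be sketched like this. The bin counts are invented to mimic a small and a large image; the function names are illustrative.

```python
def histogram_intersection_distance(h1, h2):
    """1 - sum(min(h1i, h2i)) / sum(h1), as in the formula above."""
    overlap = sum(min(a, b) for a, b in zip(h1, h2))
    return 1.0 - overlap / sum(h1)


def normalise(h):
    """Scale a histogram so that its bins sum to 1."""
    total = sum(h)
    return [v / total for v in h]


small = [2, 2, 0]      # hypothetical counts from a small image
large = [4, 4, 8]      # hypothetical counts from a larger image

d_small_large = histogram_intersection_distance(small, large)   # 1 - 4/4
d_large_small = histogram_intersection_distance(large, small)   # 1 - 4/16

# After normalisation both directions give the same value.
n1, n2 = normalise(small), normalise(large)
d_norm_a = histogram_intersection_distance(n1, n2)
d_norm_b = histogram_intersection_distance(n2, n1)
```
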
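### Cosine distance
- d = 1 - cos(angle(x, y)) = 1 - x.y/(|x||y|)
- not a true metric distance (triangle inequality can fail, although the angle itself satisfies it)
- angle 0 => d = 0, angle 90 => d = 1 (max angle is 90 for vectors with non-negative components, because they all lie in the first quadrant)
- application
    - document matching: two documents with the same ratio of words have d = 0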
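
A short sketch of the document matching case, assuming each document is reduced to a vector of word counts (the counts below are invented). Two documents with the same word proportions get a distance of about 0 regardless of length.

```python
import math


def cosine_distance(x, y):
    """1 - cos(angle between x and y); undefined for zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_x * norm_y)


# Hypothetical word-count vectors: doc_b has the same proportions as doc_a, three times over.
doc_a = [2, 1, 0]
doc_b = [6, 3, 0]
doc_c = [0, 0, 5]

d_same_ratio = cosine_distance(doc_a, doc_b)   # about 0: same word proportions
d_orthogonal = cosine_distance(doc_a, doc_c)   # 1: no words in common
```
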
### Bhattacharyya
- coefficient
    - approximate measure of overlap of 2 distributions
    - determines relative closeness of 2 samples
    - used to compare 2 normalised histograms
    - B(x, y) = sum(sqrt(xi\*yi))
- Hellinger distance
    - d = sqrt(1 - B(x, y))
    - metric
- Bhattacharyya distance
    - d = -ln B(x, y)
    - not metric (no triangle inequality)
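
A minimal sketch of the coefficient and of the -ln form of the distance for normalised histograms. The four example histograms are invented; returning infinity for non-overlapping histograms is a choice made here, since ln(0) is undefined.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Overlap of two normalised histograms: sum(sqrt(pi * qi)), in [0, 1]."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def bhattacharyya_distance(p, q):
    """-ln(B); treated as infinite when the histograms do not overlap at all."""
    b = bhattacharyya_coefficient(p, q)
    return math.inf if b == 0 else -math.log(b)


p = [0.5, 0.5, 0.0]
q = [0.5, 0.5, 0.0]     # identical to p
r = [0.0, 0.0, 1.0]     # no overlap with p
s = [0.25, 0.25, 0.5]   # partial overlap with p

b_identical = bhattacharyya_coefficient(p, q)
b_partial = bhattacharyya_coefficient(p, s)
d_identical = bhattacharyya_distance(p, q)
d_disjoint = bhattacharyya_distance(p, r)
```
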
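### Hausdorff
- to compare geometric shapes
- min min function (shortest distance between any two points of the two shapes)
    - problem
        - doesn't depend on shape
        - one point can be very far away without changing it
        - doesn't account for the position of the objects
- Hausdorff is max min distance: h(A, B) = max over a in A of (min over b in B of d(a, b))
- h(A, B) is not a metric (not symmetric); the symmetric form max(h(A, B), h(B, A)) is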
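
The max min idea can be sketched for finite point sets as below (needs Python 3.8+ for `math.dist`). The two point sets are invented so that one far-away point shows the difference between the min min function and the two directions of the max min distance; `hausdorff` here is the commonly used symmetric variant that takes the larger direction.

```python
import math


def directed_hausdorff(a, b):
    """max over points in a of the distance to the nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric variant: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def min_min_distance(a, b):
    """Shortest distance between any point of a and any point of b."""
    return min(math.dist(p, q) for p in a for q in b)


set_a = [(0.0, 0.0), (1.0, 0.0)]
set_b = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]   # set_a plus one far-away point

d_min_min = min_min_distance(set_a, set_b)      # ignores the far point entirely
d_a_to_b = directed_hausdorff(set_a, set_b)     # every point of a has a match in b
d_b_to_a = directed_hausdorff(set_b, set_a)     # the far point is 4 away from set_a
d_symmetric = hausdorff(set_a, set_b)
```
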
### Edit distance
- to compare strings and words
- Levenshtein
    - min no of edit operations to convert x to y
        - insert
        - delete
        - substitute
    - metric dist
    - if strings are the same size, Hamming dist is an upper bound
    - is 0 iff strings are equal
    - lower bound = |len(x) - len(y)|, upper bound = max(len(x), len(y))
- application
    - error correction
    - pattern matching
- cons
    - not normalised (depends on the length of the strings)
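
A sketch of the Levenshtein distance using the standard row-by-row dynamic programming table, plus a Hamming helper to check the upper bound for equal-length strings. The example words are arbitrary.

```python
def levenshtein(x, y):
    """Minimum number of insertions, deletions and substitutions turning x into y."""
    previous = list(range(len(y) + 1))        # distance from "" to each prefix of y
    for i, cx in enumerate(x, start=1):
        current = [i]                          # distance from x[:i] to ""
        for j, cy in enumerate(y, start=1):
            current.append(min(
                previous[j] + 1,               # delete cx
                current[j - 1] + 1,            # insert cy
                previous[j - 1] + (cx != cy),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def hamming(x, y):
    """Number of positions where two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("strings must have the same length")
    return sum(a != b for a, b in zip(x, y))


d_kitten = levenshtein("kitten", "sitting")
d_equal = levenshtein("abc", "abc")
d_flaw = levenshtein("flaw", "lawn")       # delete 'f', insert 'n'
h_flaw = hamming("flaw", "lawn")           # every position differs
```

For "flaw" and "lawn" the Hamming distance is 4 while two edits suffice, which shows why Hamming is only an upper bound.
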
## Fourier
- For fourier -> spatial, need both magnitude and phase
- fourier is stored in float, because it has a higher range (spatial is stored in int)
- For a real image the magnitude spectrum is always symmetrical about its center
- Phase is symmetrical too, but with the sign flipped (-ve phase at the point mirrored through the center)
- center is F(0, 0) (after shifting the spectrum)
- DC value = center value F(0, 0), proportional to the avg brightness
- fmax = 1/(2*pixel spacing), i.e. 0.5 cycles per pixel
- logarithmic transform of the magnitude compresses the range, so the weaker frequencies show up too
- Both halves hold the same info (conjugate symmetry); the inverse needs the full spectrum, but one half can be rebuilt from the other
### FFT
- For real input, only half of the spectrum needs to be generated
- other half by rotating by 180 deg and duplicating (taking the complex conjugate)
### Magnitude and phase
- magnitude: how much of each sinusoid is present in the orig func
- phase: relative placement (shift) of the sine and cosine waves
- phase is more important (it carries the structure/position of features in the image)
### Complexity
- complexity of 1d fourier = O(N^2)
- complexity of 1d FFT = O(N lg(N))
- complexity of 2d fourier (N x N image) = O(N^4)
- complexity of 2d fourier with 1d fourier on rows, then columns = O(N^3)
- complexity of 2d fourier with 1d FFT = O(N^2 lg(N))
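
To make the first two entries concrete, here is a sketch of a direct 1D DFT next to a recursive radix-2 FFT. Radix-2 is just one common FFT variant, picked here as an assumption; both functions are unnormalised and the sample values are arbitrary.

```python
import cmath


def dft(signal):
    """Direct DFT: N outputs, each a sum over N inputs, hence O(N^2)."""
    n = len(signal)
    return [sum(signal[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]


def fft(signal):
    """Recursive radix-2 FFT, O(N lg N); len(signal) must be a power of two."""
    n = len(signal)
    if n == 1:
        return [complex(signal[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(signal[0::2])          # transform of the even-indexed samples
    odd = fft(signal[1::2])           # transform of the odd-indexed samples
    half = n // 2
    twiddled = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(half)]
    return ([even[k] + twiddled[k] for k in range(half)] +
            [even[k] - twiddled[k] for k in range(half)])


samples = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 4.0]
slow = dft(samples)
fast = fft(samples)
max_difference = max(abs(a - b) for a, b in zip(slow, fast))
```

Applying a 1D transform to every row and then to every column of an N x N image gives the 2D transform, which is where the O(N^3) and O(N^2 lg(N)) entries come from.
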
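### Properties of DFT
- periodic, with period N
- conjugate symmetry (slide 4, 15) (pg 79)
    - f(x, y) real and even => F(u, v) real and even
    - f(x, y) real and odd => F(u, v) imag and odd
- scaling, pg 88
- distributivity over addition/subtraction: F(f+g) = F(f) + F(g) (does not hold for multiplication)
- derivative (leads to the Laplacian), pg 90: F[d^n f(x)/dx^n] = (2(pi)ju)^n\*F(u)
- translation, pg 91: useful for shifting the spectrum by N/2, so that F(0, 0) is at the center
- rotation: rotating f(x, y) rotates F(u, v) by the same angle
- average = F(0, 0)/N (with the 1/N factor in the forward transform; F(0, 0)/N^2 for an unnormalised N x N DFT)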
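
A few of these properties can be checked numerically in 1D. The sketch below assumes the unnormalised definition F(u) = sum f(x) exp(-j 2 pi u x / N), so the mean is F(0)/N; the signals are arbitrary test values.

```python
import cmath


def dft(signal):
    """Unnormalised 1D DFT: F(u) = sum_x f(x) * exp(-j*2*pi*u*x/N)."""
    n = len(signal)
    return [sum(signal[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]


f = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]   # arbitrary real signal
n = len(f)
F = dft(f)

# Conjugate symmetry for real input: F(N - u) = conj(F(u)).
symmetry_error = max(abs(F[n - u] - F[u].conjugate()) for u in range(1, n))

# DC term: with this unnormalised definition F(0) is the sum, so mean = F(0) / N.
mean_from_dc = F[0].real / n

# Distributivity over addition: F(f + g) = F(f) + F(g).
g = [1.0, 0.0, -1.0, 0.0, 2.0, 0.0, -2.0, 0.0]
F_sum = dft([a + b for a, b in zip(f, g)])
linearity_error = max(abs(s - (a + b)) for s, a, b in zip(F_sum, F, dft(g)))

# Real and even input (f(x) = f(N - x)) gives a purely real spectrum.
even = [4.0, 1.0, 2.0, 3.0, 7.0, 3.0, 2.0, 1.0]
max_imaginary = max(abs(v.imag) for v in dft(even))
```
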
## Filtering
- removing unwanted components (e.g. noise)
- enhancing image
- point processing = works on a single pixel
    - negative
    - contrast stretching
    - thresholding
    - histogram equalization
- area or mask processing = works on neighbourhood
    - need to define area, size and operation
    - operation is weighting the pixels
    - different weights: sharpen, smoothen, edge detection etc
- filter = mask/kernel/weight matrix
- handling pixels on boundaries: wrap around or pad with zeros
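
Three of the point operations listed above, sketched on an image stored as a list of rows. The 2x2 patch, the 8-bit range and the linear min/max form of contrast stretching are assumptions made for the example.

```python
def negative(image, max_level=255):
    """Invert every pixel: bright becomes dark and vice versa."""
    return [[max_level - p for p in row] for row in image]


def threshold(image, t):
    """Binary image: 255 where the pixel is at least t, else 0."""
    return [[255 if p >= t else 0 for p in row] for row in image]


def contrast_stretch(image, out_min=0, out_max=255):
    """Linearly map the image's own [min, max] range onto [out_min, out_max]."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:                      # flat image: nothing to stretch
        return [[out_min for _ in row] for row in image]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (p - lo) * scale) for p in row] for row in image]


image = [[100, 110],
         [120, 150]]   # hypothetical low-contrast 2x2 patch

inverted = negative(image)
binary = threshold(image, 115)
stretched = contrast_stretch(image)
```

Each output pixel depends only on the input pixel at the same position (plus, for stretching, the global min and max), which is what separates point processing from mask processing.
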
#### Correlation and convolution
- correlation = slide the mask, multiply and add
- convolution = rotate the mask by 180 deg (flip x and y), then multiply and add
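
The difference shows up clearly on an impulse image. This sketch computes only the region where the mask fits entirely inside the image, so no boundary handling is needed; the mask values are arbitrary.

```python
def correlate2d(image, mask):
    """Slide the mask over the image, multiply and add (fully overlapping positions only)."""
    mh, mw = len(mask), len(mask[0])
    out_h = len(image) - mh + 1
    out_w = len(image[0]) - mw + 1
    return [[sum(image[r + i][c + j] * mask[i][j]
                 for i in range(mh) for j in range(mw))
             for c in range(out_w)]
            for r in range(out_h)]


def rotate180(mask):
    """Flip the mask in both x and y."""
    return [row[::-1] for row in mask[::-1]]


def convolve2d(image, mask):
    """Convolution = correlation with the mask rotated by 180 degrees."""
    return correlate2d(image, rotate180(mask))


# A single bright pixel (impulse) makes the difference visible.
impulse = [[0, 0, 0],
           [0, 1, 0],
           [0, 0, 0]]
mask = [[1, 2],
        [3, 4]]

corr = correlate2d(impulse, mask)   # the mask comes out rotated by 180 degrees
conv = convolve2d(impulse, mask)    # the mask comes out unchanged
```

For a symmetric mask (box, Gaussian) the two operations give the same result.
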
### Spatial filter
- convolutional filters - linear
    - box (avg/mean) filter
        - performs average smoothing
        - sum of mask is 1
        - all weights are equal
    - gaussian filter
        - weights depend on distance from the center pixel
        - sigma: defines how sharp or flat the peak is (sigma high, peak flat)
        - complexity O(2kn^2) when applied as two 1d passes (separable), for an n x n image and k x k mask (worst n^2k^2 as a full 2d mask)
- order statistics filter - non linear
    - median filter
    - rank order filter
- hybrid - combination of two
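
A sketch of how the box and Gaussian masks could be built. The sigma values and the radius are arbitrary; the 2D Gaussian is formed as an outer product of two 1D kernels, which is the property that allows filtering in two 1D passes.

```python
import math


def gaussian_kernel_1d(sigma, radius):
    """Normalised 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(d * d) / (2.0 * sigma * sigma))
               for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def outer(col, row):
    """2D kernel as the outer product of two 1D kernels."""
    return [[a * b for b in row] for a in col]


def box_kernel(k):
    """k x k mean filter: equal weights that sum to 1."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


narrow = gaussian_kernel_1d(sigma=0.5, radius=2)   # sharp peak
wide = gaussian_kernel_1d(sigma=2.0, radius=2)     # flat peak
kernel_2d = outer(wide, wide)                      # same weights as a full 5x5 Gaussian mask
kernel_sum = sum(sum(row) for row in kernel_2d)
box_sum = sum(sum(row) for row in box_kernel(3))
```
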
#### Problems
- with an averaging filter, the values near a wrong (noisy) pixel will increase
### Order statistic
- median filter
    - replace by median instead of mean
    - advantage
        - sharpness is preserved
        - occasional (wrong) high value won't affect the result
    - if more noise, more than one pass might do good
- rank order
    - any nth order (min, max, median)
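
A 1D sketch comparing mean, median and max filtering on a signal with one impulse and one step edge. The signal is invented, the window size is 3, and borders are padded with zeros; `filter_1d` is a hypothetical helper, not a library function.

```python
import statistics


def filter_1d(signal, size, reducer):
    """Apply reducer to each sliding window of odd length `size`; borders are zero padded."""
    half = size // 2
    padded = [0] * half + list(signal) + [0] * half
    return [reducer(padded[i:i + size]) for i in range(len(signal))]


# A step edge (10 -> 50) with one impulse ("wrong" high value) at index 2.
signal = [10, 10, 200, 10, 10, 50, 50, 50]

mean_out = filter_1d(signal, 3, statistics.mean)      # impulse is smeared onto neighbours
median_out = filter_1d(signal, 3, statistics.median)  # impulse removed, edge kept sharp
max_out = filter_1d(signal, 3, max)                   # rank order filter with the highest rank
```
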
## Edge Detection
- Edge is a boundary between two homogeneous regions
- The gray level properties of the two regions on either side of an edge
    - are distinct, and
    - exhibit some local uniformity or homogeneity among themselves
- Edge Operators
- 
- Laplacian Operator
- 
-