---
tags: Digital Image Processing
disqus: hackmd
---
# Part 17
## Object Representation and Description
### Chain Code
It is a boundary-based representation scheme; the boundary is the set of points on the object contour.
Consider a regular grid as shown.

This is the $8$-connectivity situation: to move from point $i$ to point $i + 1$, there are $8$ different possibilities. When we move from one boundary point to the next, the distance is $1$ (or $\sqrt{2}$ for a diagonal move). Suppose each of the eight movement directions is labelled with a number.

The entire boundary can be represented as a sequence of numbers or code. This is known as chain code.
One problem is that, due to noise, the boundary points may not produce a clean code. To combat this, the boundary is resampled on a coarser grid.
Consider the object boundary: the grid nodes closest to it (the green points in the figure) are taken as the resampled boundary points.

Using this, it is possible to form a chain code. Given an anticlockwise numbering system from $0$ to $7$, we can formulate the code sequence.

For this case and starting point, we have the chain code as $0766764543422211$. Note that this is not the code for the original boundary, but a resampled boundary.
The chain code here is translation invariant, but not rotation invariant.

For a simpler case, we have the code $00664422$. Suppose this shape is rotated by $45$ degrees; then we have,

Its chain code is $77553311$. If the two codes are analysed, it is seen that they are different. We need a rotation-invariant code system.
Instead of the code itself, we consider the differential chain code: the difference between each pair of consecutive code elements, taken modulo $8$, with the code treated as a cycle. This captures the rotation pattern between successive moves rather than the absolute directions.
So, for $00664422$ we get $06060606$, and for $77553311$ we also get $06060606$ as the differential chain code. The two codes are identical, so rotation invariance is achieved.
The differential chain code still depends on the starting point, so we redefine the starting point such that the resulting sequence has the minimum numerical value. For example, for $5321632$, redefining the starting point gives $1632532$. The sequence of minimum numerical value is known as the shape number.
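A minimal numpy sketch of these two steps, assuming the chain code is given as a list of direction labels; the difference convention (each element's successor minus itself, modulo $8$) matches the worked examples above:

```python
import numpy as np

def first_difference(code, directions=8):
    """Differential chain code: difference between consecutive elements,
    modulo the number of directions, with the code treated as a cycle."""
    c = np.asarray(code)
    return (np.roll(c, -1) - c) % directions

def min_rotation(code):
    """Redefine the starting point so the sequence has minimum numerical value."""
    c = list(code)
    return min(tuple(c[i:] + c[:i]) for i in range(len(c)))

# The two squares from the text: axis-aligned and rotated by 45 degrees.
print(first_difference([0, 0, 6, 6, 4, 4, 2, 2]))   # [0 6 0 6 0 6 0 6]
print(first_difference([7, 7, 5, 5, 3, 3, 1, 1]))   # [0 6 0 6 0 6 0 6]
# Starting-point normalization, as in the 5321632 example:
print(min_rotation([5, 3, 2, 1, 6, 3, 2]))          # (1, 6, 3, 2, 5, 3, 2)
```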
### Polygonal Approximation
Given a boundary, we can approximate it with a polygon.

We can define the connectivity in each grid cell and thereby define an inner and an outer wall around the boundary. A polygon threaded between the two walls then gives the polygonal approximation,

Another approach considers the boundary and joins the two points farthest away from each other with a line segment. For each of the two resulting arcs, the boundary point farthest from that segment is found. If its distance exceeds a threshold, the segment is split at that point into two new segments and the same process is repeated on each; if it is less, the segment is not broken. This is known as boundary splitting.

Another way is to choose a single starting point and traverse the boundary, extending the current segment from point to point; a new vertex is created only when the perpendicular distance of some traversed boundary point from the line joining the segment's endpoints exceeds the set threshold.
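A sketch of the splitting recursion described above (it is essentially the Ramer-Douglas-Peucker procedure), assuming the boundary arc is given as an `(N, 2)` numpy array of points; for a closed boundary, one would first cut it at the two farthest-apart points:

```python
import numpy as np

def split_boundary(points, threshold):
    """Recursively split an open boundary arc at the point farthest from
    the chord joining its endpoints, until every deviation is below the
    threshold. Returns the polygon vertices (a subset of the input points)."""
    p0, p1 = points[0], points[-1]
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    # Perpendicular distance of each point from the line through p0 and p1.
    d = np.abs(dx * (points[:, 1] - p0[1]) - dy * (points[:, 0] - p0[0]))
    d = d / (np.hypot(dx, dy) + 1e-12)
    k = int(np.argmax(d))
    if d[k] <= threshold:
        return [p0, p1]                       # chord is good enough as an edge
    left = split_boundary(points[:k + 1], threshold)
    right = split_boundary(points[k:], threshold)
    return left[:-1] + right                  # drop the duplicated split point
```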
Now that we have a polygon, we need to get a description from it. Many research papers prefer autoregressive models.

Each interior angle of the polygon is modelled as a linear combination of the previously recorded angles as we move in a clockwise direction,
\begin{equation}
\theta_n = \sum_{i = 1}^{k} \alpha_i \theta_{n - i}
\end{equation}
Here, we have $k$ unknowns, so we need $k$ equations; we therefore take $k$ such vertices. Solving the equations yields $\alpha_i,\ i = 1,2,\ldots,k$, which can be used to describe the polygon and, in turn, the boundary.
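A sketch of how the $\alpha_i$ could be estimated with numpy. Instead of hand-picking exactly $k$ vertices and solving a square system as in the text, this version uses every vertex in a least-squares fit, which reduces to the same exact solve when only $k$ equations are kept:

```python
import numpy as np

def ar_coefficients(angles, k):
    """Fit alpha_1..alpha_k in theta_n = sum_i alpha_i * theta_{n-i},
    where `angles` are the polygon's interior angles in traversal order."""
    theta = np.asarray(angles, dtype=float)
    n = len(theta)
    # One equation per vertex; indices wrap because the polygon is closed.
    A = np.array([[theta[(j - i) % n] for i in range(1, k + 1)]
                  for j in range(n)])
    alpha, *_ = np.linalg.lstsq(A, theta, rcond=None)
    return alpha
```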
Another approach is obtained from the signature: the distance from a reference point (typically the centroid) to the boundary is plotted as a function of angle.

However, this signature is not scale invariant; normalizing the amplitude solves this issue. To make it rotation invariant, we can take the two points where the eigenaxis meets the boundary and use whichever of the two is farther from the centroid as the starting reference.
### Fourier Descriptor

Consider a boundary where each point is represented by $s(k) = (x(k), y(k))$. If we take the x-axis as the real axis and the y-axis as the imaginary axis, each point can be represented as a complex number $s(k) = x(k) + jy(k),\ k = 0,1,\ldots,N - 1$.
Now, we take DFT of this sequence of complex numbers.
\begin{equation}
a(u) = \frac{1}{N}\sum_{k = 0}^{N - 1}s(k)e^{-j2\pi uk /N}
\end{equation}
This sequence can now be considered as a descriptor. If we take IDFT of $a(u)$ we get the original boundary points $s(k)$.
\begin{equation}
s(k) = \sum_{u = 0}^{N - 1}a(u)e^{j2\pi uk /N}
\end{equation}
Normally, we do not consider all Fourier coefficients. Instead, only $M < N$ coefficients are considered. So, its IDFT will yield an approximate reconstruction of the boundary.
\begin{equation}
\hat{s}(k) = \sum_{u = 0}^{M - 1}a(u)e^{j2\pi uk /N}, k = 0,1,...,N - 1
\end{equation}
The low-order coefficients capture the overall structure, whereas the higher-order coefficients capture the finer details of the boundary shape.
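A minimal numpy sketch of both directions, following the text's convention of putting the $\frac{1}{N}$ factor in the forward transform:

```python
import numpy as np

def fourier_descriptors(boundary):
    """a(u) for a boundary given as an (N, 2) array of (x(k), y(k)) points."""
    s = boundary[:, 0] + 1j * boundary[:, 1]    # s(k) = x(k) + j y(k)
    return np.fft.fft(s) / len(s)               # 1/N in the forward transform

def reconstruct(a, M):
    """Approximate boundary from the first M coefficients, as in the text.
    (In practice one often keeps coefficients from both ends of a(u),
    since indices near N-1 are the negative low frequencies.)"""
    a_hat = np.asarray(a, dtype=complex).copy()
    a_hat[M:] = 0                               # discard u = M .. N-1
    s_hat = np.fft.ifft(a_hat) * len(a)         # undo numpy's 1/N in ifft
    return np.column_stack([s_hat.real, s_hat.imag])
```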
### Boundary Straightness
This is defined based on boundary curvature: the ratio of the number of boundary points where the boundary direction changes abruptly to the total number of boundary points.
The lower the value, the straighter the boundary; a high value indicates many bends.
### Bending Energy
If $c(k)$ is the curvature at point $k$, we can define bending energy as,
\begin{equation}
BE = \frac{1}{L}\sum_{k = 1}^{L}c^2(k)
\end{equation}
Where $L$ is the total number of boundary points.
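A minimal sketch of both descriptors, assuming the boundary is available as an 8-direction chain code (for straightness) and as a sampled curvature sequence $c(k)$ (for bending energy); the threshold for what counts as an "abrupt" direction change is an assumption, exposed as a parameter:

```python
import numpy as np

def boundary_straightness(chain_code, abrupt=2):
    """Fraction of boundary points whose direction change is at least
    `abrupt` steps of 45 degrees each; lower values mean straighter."""
    c = np.asarray(chain_code)
    turn = (np.roll(c, -1) - c) % 8
    turn = np.minimum(turn, 8 - turn)     # magnitude of the direction change
    return float(np.mean(turn >= abrupt))

def bending_energy(curvature):
    """BE = (1/L) * sum of c(k)^2 over the L boundary points."""
    c = np.asarray(curvature, dtype=float)
    return float(np.sum(c ** 2) / len(c))
```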
### Region-Based Shape Descriptor
1. Area: Area of a shape is the total number of pixels belonging to the region.
2. Euler Number: Consider the object below,

Here, we have one connected component and three holes. The Euler number is defined as $v = S - N$, where $S$ is the number of connected components and $N$ is the number of holes; here $v = 1 - 3 = -2$. This is a topological invariant (provided the sampling does not destroy small features such as holes).
3. Horizontal and Vertical Projections:

The horizontal projection is,
\begin{equation}
p_h(i) = \sum_j f(i,j)
\end{equation}
The vertical projection is,
\begin{equation}
p_v(j) = \sum_i f(i,j)
\end{equation}
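In numpy, these projections are just row and column sums; for example:

```python
import numpy as np

f = np.array([[0, 1, 1, 0],
              [1, 1, 1, 1],
              [0, 1, 1, 0]])   # a small binary region

p_h = f.sum(axis=1)            # p_h(i): sum over j, one value per row
p_v = f.sum(axis=0)            # p_v(j): sum over i, one value per column
print(p_h, p_v)                # [2 4 2] [1 3 3 1]
```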
### Eccentricity
If we have a region, we find the chord of maximum length within it, and then the longest chord perpendicular to it. The ratio of the length of the maximum chord to that of the perpendicular chord is known as the eccentricity.

### Elongatedness

For the given shape, a bounding rectangle has been constructed: the minimum bounding rectangle. If $a$ is the length of the longer side of the rectangle and $b$ the shorter side, then $\frac{a}{b}$ is the elongatedness of the shape. However, this definition is not complete, since quite different shapes (for example, a curved elongated region) can yield a similar rectangle and ratio. A better definition uses the thickness of the elongated region: elongatedness is the ratio of the area of the region to the square of its maximum thickness,
\begin{equation}
\frac{\text{area}}{(2d)^2}
\end{equation}
Where $d$ is the number of erosion steps applied before the region becomes the null set (so $2d$ is the maximum thickness).
### Rectangularity
It is the maximum, over all orientations of the bounding rectangle, of the ratio of the region area to the area of the bounding rectangle.
### Compactness
It is defined as the ratio of area to perimeter squared. The maximum value, $\frac{1}{4\pi}$, is attained by a circle, whereas the minimum value approaches zero (for example, for a straight line).
### Moments
Here, a normalized grey level image can be interpreted as a probability density function.
Moment of order $p + q$ is defined as,
\begin{equation}
m_{pq} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^p y^q f(x,y) dxdy
\end{equation}
In the discrete case, this can be written as,
\begin{equation}
m_{pq} = \sum_{j = -\infty}^{\infty} \sum_{i = -\infty}^{\infty} i^p j^q f(i,j)
\end{equation}
To make the moment translation invariant, we can take the central moment as,
\begin{equation}
\mu_{pq} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - x_c)^p (y - y_c)^q f(x,y)dxdy
\end{equation}
In the discrete case, this can be written as,
\begin{equation}
\mu_{pq} = \sum_{j = -\infty}^{\infty} \sum_{i = -\infty}^{\infty} (i - x_c)^p (j - y_c)^q f(i,j)
\end{equation}
Where,
\begin{equation}
x_c = \frac{m_{10}}{m_{00}}, y_c = \frac{m_{01}}{m_{00}}
\end{equation}
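A direct numpy translation of these discrete formulas, treating $f(i, j)$ as a 2-D array:

```python
import numpy as np

def moment(f, p, q):
    """Discrete moment m_pq of a grey-level image f(i, j)."""
    i, j = np.indices(f.shape)
    return np.sum((i ** p) * (j ** q) * f)

def central_moment(f, p, q):
    """Translation-invariant central moment mu_pq."""
    m00 = moment(f, 0, 0)
    xc, yc = moment(f, 1, 0) / m00, moment(f, 0, 1) / m00
    i, j = np.indices(f.shape)
    return np.sum(((i - xc) ** p) * ((j - yc) ** q) * f)
```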
There are four moment invariants derived from these central moments. (The expressions for the first three were given in the accompanying figure; the fourth invariant will not be discussed here.) Ref: J. Flusser and T. Suk, Pattern Recognition by Affine Moment Invariants, Pattern Recognition, 26(1): 167-174, 1993.
### Textures
Every image exhibits some texture. Though no precise definition of texture exists, the basic concept is easy to visualize; some example images to distinguish can be,

Two common ways to extract texture descriptors are,
1. Statistical methods
2. Spectral-domain analysis
### Statistical Method
Let $z_i$ be a variable representing the intensities of the texture present in the image, and let $p(z_i)$ be the normalized grey-level histogram. With this, we can define the moment of order $n$ as,
\begin{equation}
\mu_n(z) = \sum_{i = 0}^{L - 1}(z_i - m)^n p(z_i)
\end{equation}
Where,
\begin{equation}
m = \sum_{i = 0}^{L - 1}z_ip(z_i)
\end{equation}
From this equation, several observations follow.
1. The zeroth moment is just the sum of all histogram entries, which equals $1$; therefore, $\mu_0 = 1$.
2. $\mu_1 = 0$ (by the definition of the mean), and $\mu_2 = \sigma^2(z)$: the second-order moment is the variance.
From variance, we can define a texture descriptor as,
\begin{equation}
R = 1 - \frac{1}{1 + \sigma^2(z)}
\end{equation}
For a region of uniform intensity, $\sigma^2(z) = 0$, yielding $R = 0$; as the surface becomes rougher, $R \to 1$.
$\mu_3(z)$ indicates the skewness of the histogram, and $\mu_4(z)$ indicates its relative flatness (kurtosis).
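A minimal sketch for an 8-bit image; note that in practice $\sigma^2(z)$ is often normalized by $(L-1)^2$ so that $R$ stays in $[0, 1]$, whereas here the raw variance is used, as in the text:

```python
import numpy as np

def histogram_texture(image, L=256):
    """Mean, variance, skewness moment and the roughness measure R,
    all computed from the normalized grey-level histogram."""
    z = np.arange(L)
    p, _ = np.histogram(image, bins=L, range=(0, L))
    p = p / p.sum()                        # p(z_i)
    m = np.sum(z * p)                      # mean grey level
    mu2 = np.sum((z - m) ** 2 * p)         # variance sigma^2(z)
    mu3 = np.sum((z - m) ** 3 * p)         # skewness indicator
    R = 1 - 1 / (1 + mu2)                  # 0 for uniform regions
    return m, mu2, mu3, R
```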
However, these histogram-based descriptors carry no information about the relative positions of pixels.
### Co-occurrence Matrix
Let $P$ be a position operator, and let $A_{L \times L}$ be the matrix built with its help ($L$ is the number of grey levels). Each element $a_{ij}$ of the matrix counts the number of times a point with intensity $z_j$ occurs, at the position determined by $P$, relative to a point with intensity $z_i$.
Consider an example,

Suppose $P$ is "one pixel to the right". With this, we generate the matrix $A$; since there are three intensity levels, the matrix is $A_{3 \times 3}$.
$a_{00}$ is the number of pixel pairs with values $0$ and $0$ that obey the position criterion; here, it is $4$. Similarly, $a_{01} = 3$. In this way, we get the matrix as,
\begin{equation}
A = \begin{bmatrix} 4 & 3 & 1 \\ 3 & 3 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\end{equation}
Summing all the elements of $A$ gives the total number of point pairs satisfying $P$, here $18$. The co-occurrence matrix is then defined as $C = \frac{1}{18}A$.
Every element $c_{ij}$ of this matrix represents the joint probability that a pair of points satisfying $P$ has values $(z_i, z_j)$.
Useful texture descriptors derived from $C$ include:
1. The maximum probability $\max_{i,j}\{c_{ij}\}$.
2. Element difference moment of order $k$. This is defined as,
\begin{equation}
\sum_i \sum_j (i - j)^k c_{ij}
\end{equation}
3. The inverse element difference moment of order $k$ (taken over $i \neq j$), defined as,
\begin{equation}
\sum_i \sum_j \frac{c_{ij}}{(i - j)^k}
\end{equation}
4. Entropy is defined as,
\begin{equation}
- \sum_i \sum_j c_{ij} \log{c_{ij}}
\end{equation}
This indicates the degree of randomness.
5. Uniformity is defined as,
\begin{equation}
\sum_i \sum_j c^2_{ij}
\end{equation}
This attains its maximum when the matrix is most concentrated (in the limit, a single $c_{ij} = 1$, as for a constant image) and its minimum when all $c_{ij}$ are equal.
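A sketch of the matrix construction and the five descriptors, assuming an integer-valued image with grey levels $0 \ldots L-1$ and a displacement-style position operator:

```python
import numpy as np

def cooccurrence(img, di=0, dj=1):
    """Co-occurrence matrix for position operator P = (di, dj);
    the default (0, 1) means 'one pixel to the right'."""
    levels = int(img.max()) + 1
    A = np.zeros((levels, levels))
    rows, cols = img.shape
    for i in range(max(0, -di), rows - max(0, di)):
        for j in range(max(0, -dj), cols - max(0, dj)):
            A[img[i, j], img[i + di, j + dj]] += 1
    return A / A.sum()                       # C = A / (number of pairs)

def texture_descriptors(C, k=2):
    i, j = np.indices(C.shape)
    off = i != j                             # exclude the diagonal for item 3
    return {
        "max_probability": C.max(),
        "difference_moment": np.sum((i - j) ** k * C),
        "inverse_difference_moment": np.sum(C[off] / (i - j)[off] ** k),
        "entropy": -np.sum(C[C > 0] * np.log(C[C > 0])),
        "uniformity": np.sum(C ** 2),
    }
```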
### Spectral Descriptors
Here, descriptors are defined from the spectral domain representation. So, first, we need to find the Fourier Spectrum.
For convenience, this Fourier spectrum is converted into polar coordinates, so it can be represented as $S(\theta, r)$. For each fixed $r$, the one-dimensional function $S_r(\theta)$ describes the behaviour of the spectrum around a circle of radius $r$; similarly, for each fixed $\theta$, $S_\theta(r)$ describes it along a ray at angle $\theta$.
More global descriptors are obtained by summing these one-dimensional functions,
\begin{equation}
S(r) = \sum_{\theta = 0}^{\pi} S_\theta(r)
\end{equation}
\begin{equation}
S(\theta) = \sum_{r = 1}^{R} S_r(\theta)
\end{equation}
For an image of size $N \times N$, $R$ is generally taken as $R \approx \frac{N}{2}$.
From these functions, the descriptors normally generated are the location of the highest peak, the mean and variance of both the amplitude and the axial variations, and the distance between the mean and the highest value of the function.

Depending on the texture pattern, these spectral descriptors make it possible to differentiate textures from each other.
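A minimal numpy sketch of $S(r)$ and $S(\theta)$; the nearest-neighbour sampling of the centred spectrum on a polar grid is an implementation choice:

```python
import numpy as np

def spectral_profiles(img, n_theta=180):
    """S(r) and S(theta) from the centred Fourier spectrum of an image."""
    S = np.abs(np.fft.fftshift(np.fft.fft2(img)))
    cy, cx = S.shape[0] // 2, S.shape[1] // 2
    R = min(cy, cx)                              # R ~ N/2 for an N x N image
    thetas = np.deg2rad(np.arange(n_theta))      # angles covering 0 .. pi
    rs = np.arange(1, R)
    # Sample the spectrum on a polar grid (nearest-neighbour lookup).
    rr, tt = np.meshgrid(rs, thetas, indexing="ij")
    y = np.round(cy + rr * np.sin(tt)).astype(int)
    x = np.round(cx + rr * np.cos(tt)).astype(int)
    polar = S[y, x]                              # polar[r, theta] = S_r(theta)
    return polar.sum(axis=1), polar.sum(axis=0)  # S(r), S(theta)
```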
### Shape Number
This was done before with eight directions, from which we obtained the chain code, differential chain code and shape number.
If we consider four directions instead, we can still find all these parameters.

The order of a shape number is the number of digits present in it.
For example, for $0013210$, the order is $7$.
Suppose we have two different objects with their shape numbers,


The shape seen here has been divided by its major and minor axes, followed by a grid representation obtained by drawing the bounding rectangle and dividing it into cells. Now we need to decide how many cells are required for generating the shape number. First, we find the eccentricity of the given shape.
Suppose $n = 18$. We need a rectangle whose shape number has order $18$; since a $p \times q$ rectangle traced in the four-direction code has order $2(p + q)$, there are only a limited number of possibilities. We pick the one whose eccentricity matches that of the object, making the rectangle $3 \times 6$ in this case. This grid will generate the required shape number.
Coming back to the two shapes: when we compare their shape numbers, we ensure that they have the same order.
Here comes the concept of the degree of similarity. Given two shapes $A$ and $B$, the degree of similarity $k$ is the largest order up to which their shape numbers coincide: $S_i(A) = S_i(B)$ for $i \leq k$ and $S_i(A) \neq S_i(B)$ for $i > k$, where $S_i(\cdot)$ denotes the shape number of order $i$. Using this, we can find out whether two given shapes are similar. From this, we can define the distance function $d(A,B) = \frac{1}{k}$: similar shapes agree up to a high order and therefore have a small distance.
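A small illustration of the definition; the inputs are hypothetical dictionaries mapping each order to the shape number computed at that order:

```python
def degree_of_similarity(shapes_A, shapes_B):
    """Largest order k up to which the shape numbers of A and B coincide.
    `shapes_A` / `shapes_B` map order -> shape number (a tuple of digits)."""
    k = 0
    for order in sorted(set(shapes_A) & set(shapes_B)):
        if shapes_A[order] != shapes_B[order]:
            break
        k = order
    return k

# d(A, B) = 1 / k: shapes that agree up to a high order get a small distance.
```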

Here, we can see that increasing the order progressively separates the shapes. Up to order $10$, we can find the similarity between the shapes, with the best match to $A$. This structure is known as a similarity tree.
### Feature Based Recognition
We define an object in terms of a feature vector $x \in \mathbb{R}^n$ of dimension $n$.
If $n = 2$, the object is represented by the features $(x_1, x_2)^T$; there can be more.

Here we can see that the variation of the feature-vector components can separate the two classes $\omega_1$ and $\omega_2$. It is also possible to separate them by a function, say $g(x)$: if $g(x^1) > 0$, the point $x^1$ lies on one side of it, and if $g(x^2) < 0$, the point $x^2$ lies on the other side.
$g(x)$ is also known as a discriminant function.
For $M$ classes, we have the functions $g_1(x), g_2(x), \ldots, g_M(x)$, where $g_i(x)$ is the discriminant function for class $\omega_i$. The decision rule is: $g_i(x) > g_j(x)$ $\forall j, j \neq i \implies x \in \omega_i$; the decision boundary between $\omega_i$ and $\omega_j$ is given by $g_i(x) - g_j(x) = 0$. Equivalently, we can design a discriminant for each pair of classes, $g_{ij}(x) = g_i(x) - g_j(x)$. If $g_{ij}(x) > 0$, then $g_i(x) > g_j(x)$ and therefore $x \in \omega_i$; if it is negative, $x \in \omega_j$.
### Minimum Distance Classifier
For $M$ classes, we have feature vectors from training samples. Every class $\omega_i$ is represented by the mean of its feature vectors,
\begin{equation}
m_i = \frac{1}{N_i}\sum_{x \in \omega_i}x
\end{equation}
This is the representative of class $\omega_i$, so we have $M$ such representatives, one per class. Given any unknown vector $x$, we compute the Euclidean distance to each class mean, $D_i(x) = \|x - m_i\|$, and assign $x$ to the class whose mean is closest.
In vector form, $D_i^2(x) = x^Tx - 2x^Tm_i + m_i^Tm_i$. Since $x^Tx$ is the same for every class, minimizing $D_i(x)$ is equivalent to maximizing the discriminant function $g_i(x) = x^Tm_i - \frac{1}{2}m_i^Tm_i$. This is known as the minimum distance classifier.
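A minimal numpy sketch of this classifier using the discriminant form derived above; `fit` estimates the class means from labelled samples and `predict` maximizes $g_i(x)$:

```python
import numpy as np

class MinimumDistanceClassifier:
    def fit(self, X, y):
        """Store the mean vector m_i of each class from training data."""
        self.classes = np.unique(y)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        return self

    def predict(self, X):
        """Evaluate g_i(x) = x^T m_i - 0.5 m_i^T m_i for every class and
        pick the argmax, which equals the nearest-mean assignment."""
        g = X @ self.means.T - 0.5 * np.sum(self.means ** 2, axis=1)
        return self.classes[np.argmax(g, axis=1)]
```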
### Optimum Statistical Classifier
Suppose $x \in \omega_i$ but the classifier has inferred that $x \in \omega_j$. For such a wrong decision, we define a loss $L_{ij}$. The average loss of deciding class $\omega_j$ is $r_j(x) = \sum_{k = 1}^{M} L_{kj}\,p(\omega_k|x)$.
We know that $p(a|b) = \frac{p(a)p(b|a)}{p(b)}$. Therefore,
\begin{equation}
r_j(x) = \frac{1}{p(x)} \sum_{k = 1}^{M}L_{kj}\,p(x|\omega_k)P(\omega_k)
\end{equation}
Since $\frac{1}{p(x)}$ is common to all classes, it can be dropped, leaving
\begin{equation}
r_j(x) = \sum_{k = 1}^{M}L_{kj}\,p(x|\omega_k)P(\omega_k)
\end{equation}
The loss is $0$ for a correct decision and $1$ for a wrong decision, so $L_{ij} = 1 - \delta_{ij}$, where $\delta_{ij} = 1$ for $i = j$ and $0$ otherwise. Then $r_j(x) = p(x) - p(x|\omega_j)P(\omega_j)$, so minimizing the average loss amounts to maximizing $p(x|\omega_j)P(\omega_j)$: if $p(x|\omega_i)P(\omega_i) > p(x|\omega_j)P(\omega_j)$ for all $j \neq i$, the feature vector is assigned to class $\omega_i$.
This is equivalent to having a discriminant function $g_i(x) = p(x|\omega_i)P(\omega_i)$.
For most applications, these density functions are taken to be Gaussian PDFs of the form,
\begin{equation}
p(x|\omega_i) = \frac{1}{(2\pi)^{n/2}|C_i|^{1/2}}\exp\left[-\frac{1}{2}(x - m_i)^TC_i^{-1}(x - m_i)\right]
\end{equation}
Taking the logarithm of $g_i(x) = p(x|\omega_i)P(\omega_i)$ and dropping class-independent constants, we get,
\begin{equation}
g_i(x) = \ln{P(\omega_i)} - \frac{1}{2}\ln{|C_i|} - \frac{1}{2}(x - m_i)^TC_i^{-1}(x - m_i)
\end{equation}
If all classes are equiprobable and share the same covariance matrix, $C_i = C$, this reduces to,
\begin{equation}
g_i(x) = -\frac{1}{2}[(x - m_i)^TC^{-1}(x - m_i)]
\end{equation}
The quadratic form $(x - m_i)^TC^{-1}(x - m_i)$ is known as the Mahalanobis distance (squared), and the resulting decision rule is the optimum statistical classifier.
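A sketch of the resulting decision rule, assuming the class means, covariance matrices and priors have already been estimated:

```python
import numpy as np

def bayes_gaussian_classify(X, means, covs, priors):
    """Evaluate g_i(x) = ln P(w_i) - 0.5 ln|C_i| - 0.5 (x-m_i)^T C_i^{-1} (x-m_i)
    for every class and return the index of the winning class per sample."""
    scores = []
    for m, C, P in zip(means, covs, priors):
        d = X - m                                    # (n_samples, n_features)
        mahal = np.einsum("nj,jk,nk->n", d, np.linalg.inv(C), d)
        scores.append(np.log(P) - 0.5 * np.log(np.linalg.det(C)) - 0.5 * mahal)
    return np.argmax(np.stack(scores), axis=0)
```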
A classifier can also be realized by training a neural network; the type of neural network typically used for object recognition is a multilayer feed-forward network. (More information in a Deep Learning course.)

-----------------------------------------------