---
tags: Digital Image Processing
disqus: hackmd
---

# Part 15

## Image Segmentation

There are two approaches to tackle this,

1. Discontinuity-based approach - abrupt changes in intensity are considered. This is useful for detecting isolated points, lines or edges.
2. Similarity-based approach - such as thresholding, region growing, and region splitting and merging.

### Discontinuity Based Approach

Using masks, isolated points, lines or edges can be detected. For point detection, it is possible to use a mask like,

\begin{bmatrix}
-1 & -1 & -1 \\
-1 & 8 & -1 \\
-1 & -1 & -1
\end{bmatrix}

A point is detected at $(x,y)$ in the image if the convolution of the mask with the image at that location yields a response $R$ satisfying $|R| > T$, where $T$ is a nonnegative threshold.

For the detection of lines, we can have masks like,

![](https://i.imgur.com/jGyabVZ.png)

When these four masks are applied, a pixel is assigned to a particular line direction if the response of that mask is the maximum among the four.

In the case of edge detection, there is a transition from bright to dark or from dark to bright across an edge. So, as discussed to some extent before, taking the first or second derivative reveals the prominence of an edge, that is, a sharp intensity change. The second derivative can additionally be used to determine the type of edge transition and to locate the edge at its zero crossings.

Given an image $f(x,y)$, its gradient is,

\begin{equation}
\nabla f(x,y) = \begin{bmatrix} G_x \\ G_y \end{bmatrix} = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix}
\end{equation}

Its magnitude can be written as $|\nabla f(x,y)| = \sqrt{G_x^2 + G_y^2} \approx |G_x| + |G_y|$. Its direction can be written as $\alpha(x,y) = \tan^{-1}\big(\frac{G_y}{G_x}\big)$.

#### Prewitt Edge Operator

Horizontal and vertical edges are determined with,

1. Horizontal
\begin{bmatrix}
-1 & -1 & -1 \\
0 & 0 & 0 \\
+1 & +1 & +1
\end{bmatrix}
2. Vertical
\begin{bmatrix}
-1 & 0 & +1 \\
-1 & 0 & +1 \\
-1 & 0 & +1
\end{bmatrix}

#### Sobel Edge Operator

Horizontal and vertical edges are determined with,

1. Horizontal
\begin{bmatrix}
-1 & -2 & -1 \\
0 & 0 & 0 \\
+1 & +2 & +1
\end{bmatrix}
2. Vertical
\begin{bmatrix}
-1 & 0 & +1 \\
-2 & 0 & +2 \\
-1 & 0 & +1
\end{bmatrix}

#### Laplacian Operator

It is defined as,

\begin{equation}
\nabla^2 f(x,y) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}
\end{equation}

The mask for it is defined as,

\begin{bmatrix}
0 & -1 & 0 \\
-1 & 4 & -1 \\
0 & -1 & 0
\end{bmatrix}

or

\begin{bmatrix}
-1 & -1 & -1 \\
-1 & 8 & -1 \\
-1 & -1 & -1
\end{bmatrix}

It is also possible to use the negation of these masks, that is, positive neighbours and a negative centre. However, this operator is quite sensitive to noise and produces double edges.

#### Laplacian of Gaussian (LoG)

Gaussian:

\begin{equation}
h(x,y) = \exp\bigg(\frac{-(x^2 + y^2)}{2\sigma^2}\bigg)
\end{equation}

With $x^2 + y^2 = r^2$, we have,

\begin{equation}
\nabla^2 h = \bigg(\frac{r^2 - \sigma^2}{\sigma^4}\bigg)\exp\bigg(\frac{-r^2}{2\sigma^2}\bigg)
\end{equation}

This is represented as,

![](https://i.imgur.com/nK53Wo6.png)

In the form of a kernel, we obtain,

![](https://i.imgur.com/mqVba2V.png)

The detected edges now have to be linked to get a more prominent result. For edge linking, we have two possibilities,

1. Local processing approach
2. Global processing approach
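Before moving on to edge linking, here is a minimal sketch of the detection step described above, using NumPy and SciPy. The point-detection mask and the Sobel masks are the ones given in this section; the threshold values `T_point` and `T_edge` are arbitrary choices for illustration, not values from the lecture.

```python
import numpy as np
from scipy.ndimage import convolve

def point_and_edge_responses(f, T_point=200.0, T_edge=100.0):
    """Apply the point-detection mask and the Sobel masks to a 2D float image f."""
    # Point-detection mask: responds strongly to isolated points.
    point_mask = np.array([[-1, -1, -1],
                           [-1,  8, -1],
                           [-1, -1, -1]], dtype=float)
    R = convolve(f, point_mask, mode='reflect')
    points = np.abs(R) > T_point               # |R| > T

    # Sobel masks for horizontal and vertical edges.
    sobel_h = np.array([[-1, -2, -1],
                        [ 0,  0,  0],
                        [ 1,  2,  1]], dtype=float)
    sobel_v = sobel_h.T
    Gx = convolve(f, sobel_h, mode='reflect')  # response of the horizontal mask
    Gy = convolve(f, sobel_v, mode='reflect')  # response of the vertical mask

    magnitude = np.abs(Gx) + np.abs(Gy)        # |G_x| + |G_y| approximation
    direction = np.arctan2(Gy, Gx)             # alpha(x, y) = arctan(G_y / G_x)
    edges = magnitude > T_edge
    return points, edges, direction
```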
### Local Processing

Given a point $(x,y)$, another point $(x',y')$ in its neighbourhood, $(x',y') \in N_{xy}$, is linked to it if the two are similar in edge strength and direction, that is,

\begin{equation}
|\nabla f(x,y) - \nabla f(x',y')| \leq T \\
|\alpha (x,y) - \alpha (x',y')| \leq A
\end{equation}

where $T$ and $A$ are the strength and direction thresholds respectively.

### Hough Transform (Global Processing)

It is a mapping from the spatial domain to a parameter space. A straight line in the $xy$ plane can be written as $y = m_1x + c_1$, where $m_1$ is the slope and $c_1$ is the y-intercept. So, two parameters govern the position and orientation of the line. If another coordinate system with axes $m$ and $c$ is constructed, the line is represented there by a single point.

Suppose only a single point is given in the $xy$ plane. For a line to pass through, say, $(x_1,y_1)$, we need $y_1 = m_1x_1 + c_1$, and therefore $c_1 = y_1 - m_1x_1$. So the equivalent of this point in the $mc$ domain is a straight line.

Say there are two points $(x_i, y_i)$ and $(x_j, y_j)$. The straight line passing through them has a unique $m$ and $c$, so it is represented by a single point in the $mc$ domain. To find this line, note that each image point is represented by a line in the $mc$ plane, so we get two such lines, one per point. Their intersection gives the parameters of the straight line passing through the two points. In general, all points lying on the same straight line share the same slope and intercept, so their parameter-space lines meet at a common point.

The parameter plane is subdivided based on the parameter ranges as,

![](https://i.imgur.com/7EUaYJ8.png)

These cells are accumulator cells, with cell $(i,j)$ corresponding to the values $(m_i, c_j)$. Each cell is set to zero initially, and cells that accumulate high counts are then considered as potential lines. We let $m$ take all possible values in the specified range; for a given coordinate point $(x_k,y_k)$ and each $m_p$, the corresponding value $c_q$ is evaluated, and the cell is incremented as $A(p,q) = A(p,q) + 1$.

At the end of the process, the accumulator cells are analysed; say one of them has value $Q$. This is the number of points that lie on that particular line. There will also be lines with smaller accumulator values, so a threshold is set: whenever an accumulator count exceeds the threshold, the corresponding cell is accepted as a line.

Note that the slope can become very large, since $m$ tends to infinity for near-vertical lines. So, instead of this parameterization, the normal representation of a line is used,

\begin{equation}
\rho = x\cos{\theta} + y\sin{\theta}
\end{equation}

![](https://i.imgur.com/07zaHJC.png)

Here, $\theta$ ranges over $\pm 90^{\circ}$ and $|\rho|$ is bounded by $\sqrt{M^2 + N^2}$, where $M \times N$ is the image size.

![](https://i.imgur.com/bVrYs2d.png)

So, we have the accumulator at $(\theta_i, \rho_j)$. For the example below, we get the curves as,

![](https://i.imgur.com/vJsmJ7U.png)
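The following is a minimal sketch of the $(\rho, \theta)$ voting scheme described above, assuming a binary edge map as input (for example, the output of the Sobel-based sketch earlier). The number of $\theta$ and $\rho$ bins and the vote threshold mentioned in the closing comment are arbitrary choices for illustration.

```python
import numpy as np

def hough_lines(edges, n_theta=180, n_rho=None):
    """Vote in an accumulator A(rho, theta) for every edge pixel.

    edges : 2D boolean array (output of an edge detector).
    """
    M, N = edges.shape
    diag = int(np.ceil(np.sqrt(M**2 + N**2)))   # maximum possible |rho|
    if n_rho is None:
        n_rho = 2 * diag + 1                    # rho sampled in [-diag, +diag]

    thetas = np.deg2rad(np.linspace(-90.0, 90.0, n_theta))
    rhos = np.linspace(-diag, diag, n_rho)
    A = np.zeros((n_rho, n_theta), dtype=int)

    ys, xs = np.nonzero(edges)                  # coordinates of edge pixels
    for x, y in zip(xs, ys):
        # rho = x cos(theta) + y sin(theta) for every sampled theta
        rho_vals = x * np.cos(thetas) + y * np.sin(thetas)
        rho_idx = np.round((rho_vals + diag) / (2 * diag) * (n_rho - 1)).astype(int)
        A[rho_idx, np.arange(n_theta)] += 1     # A(p, q) = A(p, q) + 1

    return A, rhos, thetas

# Cells with counts above a chosen vote threshold correspond to detected lines,
# e.g. peaks = np.argwhere(A > vote_threshold).
```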
### Thresholding

An image is represented by a 2D function $f(x,y)$. Consider a dark object on a bright background, or vice versa. If the histogram is plotted, there are two maxima, one for the darker pixels and one for the brighter pixels, with a valley in between, more or less like,

![](https://i.imgur.com/0Iu7JGb.png)

Consider an intensity value $T$ chosen in the valley. If $f(x,y) > T$, we mark that pixel as white (255), else black (0). In that case, we get a binary image which contains the segmented object. This is known as thresholding, and $T$ is the threshold.

In the case of a multimodal histogram,

![](https://i.imgur.com/xsSfF9k.png)

there are two valleys and therefore two threshold values, $T_1$ and $T_2$. Each of the resulting regions then corresponds either to the background or to one of the objects.

If the threshold is a function of the gray level $f(x,y)$ alone, it is known as global thresholding. If it is a function of both $f(x,y)$ and a local property $p(x,y)$, it is known as local thresholding. In general, if the threshold is a function of the image, a local property and the pixel coordinates $(x,y)$, it is called adaptive or dynamic thresholding.

The question, then, is whether a suitable threshold can be chosen automatically from the histogram.

### Automatic Thresholding

1. Choose an initial value of the threshold $T$.
2. Using this, perform the segmentation operation, partitioning the pixels into groups $G_1$ and $G_2$.
3. Compute the mean intensity value of each group, $\mu_1$ and $\mu_2$.
4. Set the new threshold as $T = \frac{\mu_1 + \mu_2}{2}$. Repeat from step 2 until convergence, that is, until $|T_i - T_{i + 1}| \leq T'$, where $T'$ is a preset limit. (A minimal sketch of this procedure is given after the Optimal Thresholding subsection below.)

Instead of thresholding the entire image at once, it is also possible to apply thresholding over a grid of smaller regions, so that illumination variation across the image does not affect the final result too strongly.

### Optimal Thresholding

Here, the (normalized) histogram is treated as a probability density function (PDF) $p(z)$ of the intensity, taken as a random variable $z$.

![](https://i.imgur.com/e8yLsfm.png)

The histogram can be considered as a combination of two PDFs, so it can be written as $p(z) = P_1p_1(z) + P_2p_2(z)$, where $P_1 + P_2 = 1$.

![](https://i.imgur.com/e2dFUWY.png)

Consider a threshold $T$ set in between. The probability of erroneously classifying a background pixel as an object pixel is,

\begin{equation}
E_1(T) = \int_{-\infty}^{T}p_2(z)dz
\end{equation}

Conversely, the probability of classifying an object pixel as background is,

\begin{equation}
E_2(T) = \int_{T}^{\infty}p_1(z)dz
\end{equation}

The overall error is,

\begin{equation}
E(T) = P_2E_1(T) + P_1E_2(T)
\end{equation}

This error is minimized by setting $\frac{\partial E(T)}{\partial T} = 0$. On solving, we get,

\begin{equation}
P_1p_1(T) = P_2p_2(T)
\end{equation}

If the PDFs are assumed to be Gaussian, we have,

\begin{equation}
p(z) = \frac{P_1}{\sqrt{2\pi}\sigma_1}e^{-\frac{(z - \mu_1)^2}{2\sigma_1^2}} + \frac{P_2}{\sqrt{2\pi}\sigma_2}e^{-\frac{(z - \mu_2)^2}{2\sigma_2^2}}
\end{equation}

From the expression above, the value of $T$ is obtained from $AT^2 + BT + C = 0$, where $A = \sigma_1^2 - \sigma_2^2$, $B = 2(\mu_1\sigma_2^2 - \mu_2\sigma_1^2)$ and $C = \sigma_1^2\mu_2^2 - \sigma_2^2\mu_1^2 + 2\sigma_1^2\sigma_2^2\ln{\frac{\sigma_2P_1}{\sigma_1P_2}}$. If $\sigma_1^2 = \sigma_2^2 = \sigma^2$, we have,

\begin{equation}
T = \frac{\mu_1 + \mu_2}{2} + \frac{\sigma^2}{\mu_1 - \mu_2}\ln{\frac{P_2}{P_1}}
\end{equation}
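As referenced above, here is a minimal sketch of the iterative automatic-thresholding steps, assuming a grayscale image stored as a NumPy array. The convergence limit `eps` plays the role of $T'$ and its default value here is an arbitrary choice.

```python
import numpy as np

def iterative_threshold(f, eps=0.5):
    """Basic automatic (iterative) global thresholding, following the steps above."""
    T = f.mean()                                # step 1: initial threshold
    while True:
        G1 = f[f > T]                           # step 2: partition into two groups
        G2 = f[f <= T]
        mu1 = G1.mean() if G1.size else 0.0     # step 3: group means
        mu2 = G2.mean() if G2.size else 0.0
        T_new = 0.5 * (mu1 + mu2)               # step 4: new threshold
        if abs(T - T_new) <= eps:               # stop when |T_i - T_{i+1}| <= T'
            return T_new
        T = T_new

# The binary result is then obtained as, e.g., np.where(f > T, 255, 0).
```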
### Local Thresholding with Gradient

When the modes of the histogram are very unevenly sized, it is possible to restrict attention to a local area of the image, as,

![](https://i.imgur.com/wrUeeLt.png)

Here, the histogram taken over the narrow strip around the boundary is symmetric, so thresholding becomes quite simple. However, this cannot be used directly in this form, since the boundary is unknown in advance. In that case, the image gradient and the Laplacian can be used to locate the edge.

From before, we know that,

\begin{equation}
|\nabla f(x,y)| = \sqrt{G_x^2 + G_y^2} \approx |G_x| + |G_y|, \quad G_x = \frac{\partial f(x,y)}{\partial x}, \quad G_y = \frac{\partial f(x,y)}{\partial y}
\end{equation}

Also, the Laplacian can be written as,

\begin{equation}
\nabla^2 f(x,y) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}
\end{equation}

So, we define the image,

\begin{equation}
s(x,y) = \begin{cases} 0, & |\nabla f| < T \\ +, & |\nabla f| \geq T,\ \nabla^2 f \geq 0 \\ -, & |\nabla f| \geq T,\ \nabla^2 f < 0 \end{cases}
\end{equation}

Three different intensity values take the place of these three labels. Depending on the transitions along a scan line, we can have a $- \to +$ or a $+ \to -$ transition; the former denotes a transition from background to object, and the latter the reverse.

### Region Growing

Given a region $R$ in the image, it is split or segmented into partitions $R_1, R_2, ..., R_n$. The splitting should be done such that $\cup_i R_i = R$, each region $R_i$ is connected, and $R_i \cap R_j = \phi$ for $i \neq j$. A chosen predicate should be true for the pixels of a particular region, but false for the union with another partition.

Consider an image with some seed points as shown,

![](https://i.imgur.com/3K3WGwG.png)

The regions are grown based on various similarity measures, such as comparing intensity values. A window is initialized around each seed point, and the region is grown outward from it; recursive algorithms can be used for this (a minimal sketch is given at the end of this note).

### Splitting and Merging

![](https://i.imgur.com/WzsnFwk.png)

Here, the image is first broken into partitions (initially into four). If the intensity variation within a partition is small, it remains as it is. If there is a large variation, that region is split further into four more partitions. The partitioning is done as,

![](https://i.imgur.com/s23EHdg.png)

Finally, the leaf regions are compared with each other; if adjacent regions are similar, they are merged.
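To make the region-growing step concrete, here is a minimal breadth-first sketch. It assumes a grayscale NumPy image, 4-connectivity, and a simple similarity measure (absolute intensity difference from the seed value within a tolerance `T`); all of these are illustrative choices rather than the only possible ones.

```python
import numpy as np
from collections import deque

def region_grow(f, seeds, T=10.0):
    """Grow one region per (row, col) seed; a pixel joins a region if its
    intensity is within T of the seed intensity (4-connectivity)."""
    labels = np.zeros(f.shape, dtype=int)
    for label, (sy, sx) in enumerate(seeds, start=1):
        seed_val = float(f[sy, sx])
        queue = deque([(sy, sx)])
        labels[sy, sx] = label
        while queue:
            y, x = queue.popleft()
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if (0 <= ny < f.shape[0] and 0 <= nx < f.shape[1]
                        and labels[ny, nx] == 0
                        and abs(float(f[ny, nx]) - seed_val) <= T):
                    labels[ny, nx] = label
                    queue.append((ny, nx))
    return labels
```

For example, `region_grow(f, [(10, 20), (50, 60)], T=15)` labels the two grown regions 1 and 2 and leaves unreached pixels labelled 0.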