Part 3 - HackMD

--- tags: Digital Image Processing disqus: hackmd --- # Part 3 This part starts with some digital image processing techniques that are applied to images. ## Relationship between Pixels It is assumed that the person is familiar with pixels as discussed in Part 1. Consider a pixel in the image. We can imagine that if we consider that pixel as a reference, then we can assign the pixels around it like its neighbours. If we consider the vertical and horizontal pixels around it, then we get a total of 4 neighbours as, ![](https://i.imgur.com/aHnbaLD.png) This is a pixel set of 4 neighbours, also denoted as, $p = N_4(p)$. Each of these pixels is a unit distance away from the reference pixel. The upper and lower pixels are the vertical neighbours, while the other two are horizontal neighbours. What if a boundary pixel is considered as a reference? Thinking upon it, for a non - corner case, the number of neighbours will be $3$ and that for a corner pixel will be $2$ (think on this). If we consider the diagonal pixels also ($N_D(p)$), then we will have 8 pixels as neighbours $(N_8(p) = N_4(p) \cup N_D(p))$, which is represented as, ![](https://i.imgur.com/PaNVJoY.png) ## Connectivity of Points It is a very important concept, which is used in segmentation of a path or an area present in the image. Defining a threshold value $Th$, the image is thresholded such that, \begin{equation} F(x,y) = \begin{cases} (x,y) \in \text{object}, F(x,y) > Th \\ (x,y) \in \text{background, otherwise} \end{cases} \end{equation} One may choose to set the pixel intensity value as white (or $1$) if the threshold is crossed, else it is black (or $0$). Consider a binary image $B$. Therefore, the two pixels will be connected if they are adjacent in some sense. So, either they are neighbours or their intensities are similar. So, for two pixels $p,q$ present in the image, they shall be connected if $p \in N(q)$ or $q \in N(p)$ and $B(p) = B(q)$. Let $V$ be a set of grey levels to define the connectivity of two points $p, q \in V$. Related to this, there are three types of connectivity, 1. 4 - connectivity $p,q \in V$ & $p \in N_4(q)$. 2. 8 - connectivity $p,q \in V$ & $p \in N_8(q)$. 3. M - connectivity (mixed connectivity) $q \in N_4(p)$ or $q \in N_D(p)$ and $N_4(p) \cap N_4(q) = \phi$. This avoids the problem of multiple paths. ## Adjacency Two pixels are said to be adjacent if they are connected. Therefore, depending on the connectivity scheme used, there are three types of adjacency used, 1. 4 - adjacency 2. 8 - adjacency 3. m - adjacency ![](https://i.imgur.com/4DcEJ2o.png) ## Path A path from $p(x,y)$ to $q(s,t)$ is a sequence of distinct pixels $(x_i, y_i)$, where for $i = 0$, we have $p$ and for $i = n$, we have $q$. For the other values of $i$, we have the distinct pixels that are between these two. $n$ is the length of the path. ## Connected Component Considering two points $p, q \in S$, these two points are connected if there are a set of distinct pixels connecting the two in $S$. For these set of pixels, the set connected to $p$ is known as a connected component of $S$. ![](https://i.imgur.com/cor1kDg.png) ![](https://i.imgur.com/UeyT1kj.png) The figure above shows the two connected component sets. ### Algorithm for Group Identification 1. Scan the image from left to right and from top to bottom as a 2D array is traversed through. 2. Assuming 4 - connectivity 3. Consider a pixel $P(x,y)$ that is accessed. Before that, $P(x-1,y)$ and $P(x,y - 1)$ have also been accessed. Now, let $l(P)$ is the pixel value at $P$ and $L(P)$ is the label assigned to it. If $l(P) = 0$, move to next scanning position, else when we encounter $1$, but the connected components are $0$, then assign a new label to $P$. For the case where all of them are $1$, assign the common label to them and if at least one of them is $0$. This process can be demonstrated as shown, ![](https://i.imgur.com/0ViK7CC.png) So, the final result is obtained as, ![](https://i.imgur.com/C1GpVPE.png) Therefore, the two entities have been segmented. ## Distance between Points Given two points $p(x,y)$ and $q(s,t)$, the distance between them can be calculated as, \begin{equation} D(p,q) = \sqrt{(x - s)^2 + (y - t)^2} \end{equation} This is known as Euclidean Distance. $D$ can be any metric and is said to known as a distance metric if, 1. $D(p,q) \geq 0$, $D(p,q) = 0$, means $p$ and $q$ are the same point. 2. $D(p,q) = D(q,p)$ 3. $D(p,z) \leq D(p,q) + D(q,z)$, where $p$, $q$ and $z$ are three points in space. The set of point $S = \{q | D(p,q) \leq r\}$ are the points situated in a disk of radius $r$, centered at point $p$. Another metric is known as City Block or Manhatten Distance, which is represented as, \begin{equation} D_4(p,q) = |x - s| + |y - t| \end{equation} The set $S$ can now be interpreted as the formation of a diamond, spanning a distance $r$ or $r$ blocks in each quadrature axes (or $2r$ in horizontal and vertical axes), centred at point $p$. For another form of metric, known as Chessboard Distance, \begin{equation} D_8(p,q) = max(|x - s|,|y - t|) \end{equation} The set $S$ form a square centred as $p$. For $D_8 = 1$, we form the $8$ - neighbour group. Applications of such distance measure can be for shape matching. ### Skeletonization It is the process of reducing the foreground region of the binary image into a skeletal remnant that largely preserves the extent and connectivity of the original region while throwing away most of the original foreground pixels. Distance measure techniques can be used to do so. Being digital pixels, it is possible to perform logical and arithmetic operations between them. Consider two pixels $p$ and $q$. Then it is possible to perform addition/subtraction ($p \pm q$), multiplication ($p * q$) and modulus ($p \% q$). Also, the logical operations can be performed such as AND ($p.q$), OR ($p + q$) and NOT ($p'$). Logical operations are performed on binary images. NOT can be used for binary image inversion. ![](https://i.imgur.com/DSlmCiz.png) Apart from pixel operations, it is possible to perform neighbourhood operations also such as finding the average value of the centre and eight other pixels and replacing each location with the calculated average. This concept can be generalized by using templates, where another template matrix is taken and weighted average (average of each of the terms in linear combination with the coefficients as the terms of the template.) This is shown as, ![](https://i.imgur.com/3XpkBvh.png) This can be used in various operations in denoising, thinning and edge detections. Rotation and scale invariance