---
tags: Digital Image Processing
disqus: hackmd
---
# Part 10
## Image Enhancement
Image enhancement accentuates certain features of an image so that the result is more suitable for a given application than the original. Enhancement techniques are very much problem-oriented: a technique that works well on one image may not work in general. They are divided into two types,
1. Spatial Domain Technique: Here the processing is done directly on the image plane and pixels undergo direct manipulation.
2. Frequency Domain Technique: Modify the Fourier transform coefficients of the image, then apply the inverse Fourier transform (IFT) to obtain the modified image.
The transformation $T$ can be applied to a signal $f(x)$ to get $T[f(x)] = g(x)$. For images, $g(x,y) = T[f(x,y)]$.
Most operators cover a neighbourhood of pixels around the point $(x,y)$ being processed; for example, a $3 \times 3$ neighbourhood consists of the pixel at $(x,y)$ together with its eight immediate neighbours.

The neighbourhood size varies with the application. For a point transform, the neighbourhood size is $1 \times 1$, so the operator works on a single pixel rather than on the pixel and its neighbourhood. One example of point processing is image thresholding: if the intensity of a pixel is greater than some set threshold, the pixel is assigned one of two values, otherwise it is assigned the other. This makes the image binary, with the higher intensity given to pixels whose value exceeds the threshold.
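Thresholding can be sketched in a few lines of NumPy; the image values and the threshold below are made-up examples:

```python
import numpy as np

# Toy 8-bit grayscale patch (values chosen for illustration).
img = np.array([[ 12, 200,  45],
                [180,  90, 220],
                [ 30, 150,  75]], dtype=np.uint8)

T = 128  # chosen threshold

# Point processing: each output pixel depends only on the input pixel at (x, y).
binary = np.where(img > T, 255, 0).astype(np.uint8)
```

Every pixel above $T$ maps to $255$ and every other pixel to $0$, producing a binary image.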

The other method is mask processing, where a neighbourhood is considered. For the same $3 \times 3$ situation, define a mask of weights $w_{i,j}$ with $i,j \in \{-1,0,1\}$.

The final image can be obtained as
\begin{equation}
g(x,y) = \sum_{i = -1}^{1} \sum_{j = -1}^{1}w_{i,j}f(x + i,y + j)
\end{equation}
Depending on the objective, the values in the mask can be varied accordingly.
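The weighted sum above can be sketched directly in NumPy. `apply_mask` is a hypothetical helper that correlates a mask with the image over the valid interior only (border handling is omitted for brevity):

```python
import numpy as np

def apply_mask(f, w):
    """Apply a (2a+1) x (2b+1) mask w to image f over the valid interior."""
    a, b = w.shape[0] // 2, w.shape[1] // 2
    g = np.zeros((f.shape[0] - 2 * a, f.shape[1] - 2 * b))
    for x in range(g.shape[0]):
        for y in range(g.shape[1]):
            # g(x,y) = sum_i sum_j w[i,j] * f(x+i, y+j)
            g[x, y] = np.sum(w * f[x:x + 2 * a + 1, y:y + 2 * b + 1])
    return g
```

With a $3 \times 3$ averaging mask ($w_{i,j} = 1/9$), each output pixel is the mean of its $3 \times 3$ neighbourhood.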
## Image Negative
This is a point processing technique. Given the image $r$, we have to convert it to $s$. We know, $s = T(r)$. In this case, $T(r) = L - 1 - r$, where $L - 1$ is the maximum intensity level of the scale. So, dark pixels become bright and vice versa.

This can be used for detecting abnormal cell deposition in an organ.
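A minimal sketch for an 8-bit image ($L = 256$), on a toy example:

```python
import numpy as np

L = 256  # number of intensity levels in an 8-bit image
img = np.array([[0, 100, 255]], dtype=np.uint8)  # toy example

# s = T(r) = (L - 1) - r : dark pixels become bright and vice versa.
negative = (L - 1) - img
```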
## Contrast Stretching
This is a useful technique to improve the contrast of an image. The dynamic range (the range of intensities the sensor can register; the larger the dynamic range, the better the possible contrast) can be effectively stretched using different operations. Consider a piecewise-linear intensity transformation passing through the points $(r_1,s_1)$ and $(r_2,s_2)$.

Depending on the values of $(r_1,s_1)$ and $(r_2,s_2)$ we can shape the intensity distribution of the image. If $r_1 = s_1$ and $r_2 = s_2$, the transformation is a line with slope $45^\circ$, so the image retains the same intensities and there is no deviation from the original. Note that it is important to maintain $r_1 \leq r_2$ and $s_1 \leq s_2$ so that the mapping is single valued and monotonically increasing. Choosing the points so that the middle segment has slope greater than one stretches the mid-range of intensities and yields contrast enhancement.
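A sketch of the piecewise-linear stretch using `np.interp`, mapping through $(0,0)$, $(r_1,s_1)$, $(r_2,s_2)$, and $(L-1,L-1)$ (the function name and break-points are illustrative):

```python
import numpy as np

def contrast_stretch(img, r1, s1, r2, s2, L=256):
    # Monotonic piecewise-linear map through (0,0), (r1,s1), (r2,s2), (L-1,L-1);
    # requires r1 <= r2 and s1 <= s2 so the map is single valued and increasing.
    r = np.array([0, r1, r2, L - 1], dtype=float)
    s = np.array([0, s1, s2, L - 1], dtype=float)
    return np.interp(img.astype(float), r, s)
```

With $(r_1,s_1) = (100,50)$ and $(r_2,s_2) = (150,200)$, the mid-range $[100,150]$ is stretched to $[50,200]$; with $r_1 = s_1$ and $r_2 = s_2$ the map is the identity.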

The effect can also be seen in the Fourier transform of the image.

The low contrast image has fewer frequency components and therefore fewer intensity values, hence poor contrast. The enhanced image has more intensity values and more Fourier components; this also indicates the dynamic range of the captured image. However, certain frequency components are redundant and can be removed for compression, so dynamic range compression can be performed, for example with the logarithmic transformation $s = T(r) = c\log{(1 + |r|)}$.
Another method is the power-law transformation, defined as $s = T(r) = cr^{\gamma}$. This is also known as gamma correction.

Different values of $\gamma$ give different corrections: $\gamma < 1$ brightens the image, while $\gamma > 1$ darkens it.
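A sketch of gamma correction on 8-bit data (the helper name is illustrative): intensities are normalized to $[0,1]$, mapped by $s = cr^{\gamma}$, then rescaled back.

```python
import numpy as np

def gamma_correct(img, gamma, c=1.0, L=256):
    r = img.astype(float) / (L - 1)        # normalize to [0, 1]
    s = c * r ** gamma                     # s = c * r^gamma
    return np.round(s * (L - 1)).astype(np.uint8)
```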
## Histogram
Consider a digital image with intensity levels from $0$ to $L - 1$, and let $r_k$ be the $k^{th}$ intensity level. The histogram is defined as $h(r_k) = n_k$, the number of pixels having intensity $r_k$. The normalized histogram is $p(r_k) = \frac{n_k}{n}$, where $n$ is the total number of pixels; it gives the probability of a pixel having intensity $r_k$.
So, a darker image has its histogram concentrated near the origin, whereas a brighter image has its values concentrated near the maximum, $L - 1$.
At the same time, when the histogram values are quite close to each other, it implies that the image has low contrast. A higher contrast image often has the histogram spread over the whole intensity axis.
Consider the situation, where $r$ represents the gray level in an image. Let it be normalized between $0$ (black) and $1$ (white) that is, $r \in [0,1]$.
For point processing, we have $s = T(r)$. The transformation function $T$ has to satisfy the following conditions,
1. $T(r)$ must be single valued and monotonically increasing in $0 \leq r \leq 1$.
2. $0 \leq T(r) \leq 1$, for $0 \leq r \leq 1$.
Note that these conditions are also satisfied by the inverse transformation.
Let $p_r(r)$ be the probability density function of $r$ and $p_s(s)$ that of $s$. If $T^{-1}(s)$ is single valued and monotonically increasing,
\begin{equation}
p_s(s) = p_r(r)\bigg|\frac{dr}{ds}\bigg|_{r = T^{-1}(s)}
\end{equation}
From this, we can also say,
\begin{equation}
s = T(r) = \int_{0}^{r}p_r(w)dw, 0 \leq r \leq 1
\end{equation}
This gives the cumulative distribution of the function $r$. From this, we can find,
\begin{equation}
\frac{ds}{dr} = p_r(r)
\end{equation}
This makes $p_s(s) = p_r(r)\frac{1}{p_r(r)} = 1$. So, if we consider CDF of the function, we can get the processed image with uniform PDF, looking like a high contrast image. This can be used for enhancing the contrast.
This concept can be formulated in discrete sense as,
\begin{equation}
s_k = T(r_k) = \sum_{i = 0}^{k}p_r(r_i) = \sum_{i = 0}^{k}\frac{n_i}{n}
\end{equation}
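The discrete formula above can be sketched as a look-up table in NumPy (`equalize` is a hypothetical name; the $[0,1]$ output is rescaled to $[0, L-1]$ with simple rounding):

```python
import numpy as np

def equalize(img, L=256):
    hist = np.bincount(img.ravel(), minlength=L)   # n_k for each level
    cdf = np.cumsum(hist) / img.size               # s_k = sum_{i<=k} n_i / n
    lut = np.round((L - 1) * cdf).astype(np.uint8) # rescale to [0, L-1]
    return lut[img]                                # map every pixel through s_k
```

A low-contrast image whose pixels occupy only a few adjacent levels gets spread across the full intensity range.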
Histogram equalization maps each intensity through this cumulative distribution, spreading the histogram over the full intensity range.
However, the issue is that it generates a single processed image, that is, it is not an interactive method. So, we can use the Histogram specification in that case.
In histogram specification, we need a target histogram, and the processing must be done in such a way that the histogram of the result matches the target. Let $r$ denote intensities in the original image and $z$ those in the processed image.
So, we have, $s = T(r)$, where,
\begin{equation}
s = T(r) = \int_{0}^{r}p_r(w)dw, G(z) = \int_{0}^{z}p_z(t)dt
\end{equation}
So, $G(z) = T(r) = s$. This gives $G^{-1}(s) = z = G^{-1}[T(r)]$.
So, for each intensity level $k$ we know the number of pixels at that level, and likewise for the target histogram. What mapping should take the original intensities to $z$? An iterative approach can be used. We need $G(z_k) = s_k$, i.e. $G(z_k) - s_k = 0$. For each $k$, start from an initial guess $\hat{z}$ and increment it until $\hat{z}$ is the smallest integer satisfying $G(\hat{z}) - s_k \geq 0$; then take $z_k = \hat{z}$.
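The search for the smallest $z$ with $G(z) - s_k \geq 0$ amounts to an inverse-CDF lookup, which can be sketched with `np.searchsorted` (the function name and toy target histogram are illustrative):

```python
import numpy as np

def hist_specify(img, target_hist, L=256):
    # s_k = T(r_k): CDF of the input image's histogram.
    s = np.cumsum(np.bincount(img.ravel(), minlength=L)) / img.size
    # G(z): CDF of the target histogram.
    G = np.cumsum(target_hist) / np.sum(target_hist)
    # For each s_k, the smallest z with G(z) >= s_k (clipped to a valid level).
    lut = np.minimum(np.searchsorted(G, s), L - 1)
    return lut[img].astype(np.uint8)
```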
Note that these calculations are done with normalized quantities. In the discrete case, the intensity values range between $0$ and $255$, whereas the output from equalization is between $0$ and $1$, so some post-processing has to be done. We can define the final result $s'$ as,
\begin{equation}
s' = Int\bigg[\frac{s - s_{min}}{1 - s_{min}}(L - 1) + 0.5\bigg]
\end{equation}
Note that if the uniform (equalized) histogram is used as the target, histogram specification reduces to histogram equalization.
Image Differencing: Here, we take the difference between two images $f(x,y)$ and $h(x,y)$, giving $g(x,y) = f(x,y) - h(x,y)$. This can be used in medical imaging to check for abnormalities.
Similarly, if we average multiple images taken of the same scene, the noise actually decreases! This can be easily proved. Each noisy image can be described as $g_i(x,y) = f(x,y) + \eta_i(x,y)$, where $\eta_i$ is zero-mean noise. The average of $K$ such images is,
\begin{equation}
\hat{g}(x,y) = \frac{1}{K}\sum_{i = 1}^{K}g_i(x,y)
\end{equation}
Its expectation is $E[\hat{g}(x,y)] = f(x,y)$, and the noise variance drops by a factor of $K$. This is used quite frequently in astronomy.
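A quick numerical check of the noise reduction, on a synthetic scene with Gaussian noise (the scene, noise level, and frame count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.full((64, 64), 100.0)   # ideal, noise-free scene
K = 100                        # number of noisy acquisitions

# g_i = f + eta_i, with zero-mean Gaussian noise of standard deviation 20.
frames = [f + rng.normal(0.0, 20.0, f.shape) for _ in range(K)]
g_hat = np.mean(frames, axis=0)

# Averaging K frames reduces the noise variance by roughly a factor of K.
single_mse = np.mean((frames[0] - f) ** 2)
avg_mse = np.mean((g_hat - f) ** 2)
```

The mean squared error of the averaged image is far below that of any single frame.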
## Mask Processing Techniques
Until now, the neighbourhood was of size $1 \times 1$. However, it can also be larger (with odd dimensions). Recapitulating from before, we have,
\begin{equation}
g(x,y) = \sum_{i = -1}^{1} \sum_{j = -1}^{1}w_{i,j}f(x + i, y + j)
\end{equation}
### Averaging Filter / Smoothing Filter (Lowpass Filter)
We can define the filter as,
\begin{equation}
\frac{1}{9}
\begin{bmatrix}
1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1
\end{bmatrix}
\end{equation}
This type of filter is also known as box filter. So, we get the final image as,
\begin{equation}
g(x,y) = \frac{1}{9}\sum_{i = -1}^{1} \sum_{j = -1}^{1}w_{i,j}f(x + i, y + j)
\end{equation}
This will yield a smooth and blurred image. Also, the sharp edges are blurred. To preserve this, we use weighted masks. For example,
\begin{equation}
\frac{1}{16}
\begin{bmatrix}
1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1
\end{bmatrix}
\end{equation}
The final expression is,
\begin{equation}
g(x,y) = \frac{1}{16}\sum_{i = -1}^{1} \sum_{j = -1}^{1}w_{i,j}f(x + i, y + j)
\end{equation}
The blurring will be lesser in this case. In general, the expression is,
\begin{equation}
g(x,y) = \frac{\sum_{i = -a}^{a} \sum_{j = -b}^{b}w_{i,j}f(x + i, y + j)}{\sum_{i = -a}^{a} \sum_{j = -b}^{b}w_{i,j}}
\end{equation}
The mask size is $M \times N$, where $M = 2a + 1$ and $N = 2b + 1$, i.e. both dimensions are odd. Depending on the size of the kernel, different magnitudes of blurring can be achieved: a larger kernel blurs more, but also suppresses noise better.
### Nonlinear Filters
One filter of this type is the median filter.
#### Median Filter
As before, an odd mask size (say $3 \times 3$) is chosen. Consider a $3 \times 3$ patch of an image $f(x,y)$:
\begin{equation}
\begin{bmatrix}
100 & 85 & 98 \\ 99 & 105 & 102 \\ 90 & 101 & 108
\end{bmatrix}
\end{equation}
This can be arranged in ascending order as $85, 90, 98, 99, 100, 101, 102, 105, 108$. There are a total of $9$ values present; the value at position $5$, namely $100$, is the median of this sequence by definition. So, the centre value ($105$) is replaced by this median ($100$), and the same is done at every pixel location across the image. This is an order-statistics filter.
This retains the sharpness of the image to a greater extent, i.e. its edge preservation property is quite good. This filter is quite useful for tackling impulse noise (also known as salt-and-pepper noise).
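A sketch of a $3 \times 3$ median filter over the image interior (border pixels are left unchanged for simplicity):

```python
import numpy as np

def median_filter(f, size=3):
    a = size // 2
    g = f.copy()
    for x in range(a, f.shape[0] - a):
        for y in range(a, f.shape[1] - a):
            # Replace the centre pixel by the median of its neighbourhood.
            g[x, y] = np.median(f[x - a:x + a + 1, y - a:y + a + 1])
    return g
```

On the patch above, the centre value $105$ is replaced by the median $100$; an isolated salt-and-pepper spike would be removed entirely rather than smeared out.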
#### Sharpening Spatial Filter
Here, differentiation operations are considered.
To consider the derivative, we can either use the first-order derivative or second-order derivative operation. The conditions for the first-order derivative are,
1. Must be zero in areas of the constant grey level.
2. Nonzero at the onset of a grey level step or ramp.
3. Nonzero along ramps.
For the second-order derivative,
1. Zero in flat areas.
2. Nonzero at onset and end of the grey level ramp.
3. Zero along ramps of constant slope.
If we have function, $f(x)$, then its derivative can be denoted as,
\begin{equation}
\frac{df(x)}{dx} = \lim_{\Delta x \to 0}\frac{f(x + \Delta x) - f(x)}{\Delta x}
\end{equation}
In discrete domain, the minimum distance between two pixels is $1$. So, the formula is reduced to,
\begin{equation}
\frac{\partial f(x)}{\partial x} = f(x + 1) - f(x)
\end{equation}
The second order for discrete domain can be written as,
\begin{equation}
\frac{\partial^2 f(x)}{\partial x^2} = f(x + 1) + f(x - 1) - 2f(x)
\end{equation}
Practical examples show that the second-order derivative reacts more strongly to fine detail and sudden peaks. For a step-type discontinuity, the second derivative yields two responses, one positive and one negative, producing a double line; the first derivative gives only one response and hence a single line.
Inference:
1. The first-order derivative gives thicker edges in an image.
2. The second-order derivative has a stronger response to find details like lines and isolated points.
3. The first-order derivative has a higher response to the grey level step, whereas the second-order derivative produces a double response for the same case.
In general, second-order derivatives are better suited for image enhancement.
Considering isotropic operators, we can use the Laplacian operator, which is defined for a function $f$ as,
\begin{equation}
\nabla^2f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}
\end{equation}
For discrete domain, we can use the substitutions defined before in order to get,
\begin{equation}
\nabla^2f = f(x + 1, y) + f(x - 1, y) + f(x, y + 1) + f(x, y - 1) - 4f(x,y)
\end{equation}
This can be arranged in matrix forms as,
\begin{equation}
\begin{bmatrix}
0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0
\end{bmatrix},
\begin{bmatrix}
1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1
\end{bmatrix}
\end{equation}
The RHS mask includes the diagonal neighbours too. The nature of the masks remains the same if the polarity of the centre coefficient is made positive, that is, in the form,
\begin{equation}
\begin{bmatrix}
0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0
\end{bmatrix},
\begin{bmatrix}
-1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1
\end{bmatrix}
\end{equation}
To enhance an image, the Laplacian operator is applied to it and scaled, and the result is combined with the original image. Smooth variations are suppressed while discontinuities are emphasized, sharpening the image. In other words,
\begin{equation}
g(x,y) =
\begin{cases}
f(x,y) - \nabla^2f(x,y), \text{-ve center coefficient} \\
f(x,y) + \nabla^2f(x,y), \text{+ve center coefficient}
\end{cases}
\end{equation}
If the centre coefficient of the Laplacian mask is incremented by one, the addition of the original pixel is folded into the mask itself; the result is known as a composite mask.
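A sketch of sharpening with the composite mask (the negative-centre Laplacian with its centre coefficient incremented by one, so $g = f - \nabla^2 f$ is computed in a single pass over the valid interior):

```python
import numpy as np

def sharpen(f):
    # Composite mask: g(x,y) = f(x,y) - lap(f)(x,y) in one pass.
    w = np.array([[ 0, -1,  0],
                  [-1,  5, -1],
                  [ 0, -1,  0]], dtype=float)
    g = np.zeros((f.shape[0] - 2, f.shape[1] - 2))
    for x in range(g.shape[0]):
        for y in range(g.shape[1]):
            g[x, y] = np.sum(w * f[x:x + 3, y:y + 3])
    return g
```

In flat regions the Laplacian is zero, so the output equals the input; near edges and isolated points the response is boosted.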
#### Unsharp Masking
Here, the blurred form of image $\bar{f}$ is subtracted from the original image $f$ as,
\begin{equation}
f_s(x,y) = f(x,y) - \bar{f}(x,y)
\end{equation}
#### High Boost Filtering
\begin{equation}
f_{hb}(x,y) = Af(x,y) - \bar{f}(x,y), A \geq 1
\end{equation}
This can be also written as,
\begin{equation}
f_{hb}(x,y) = (A-1)f(x,y) + f(x,y) -\bar{f}(x,y) = (A - 1)f(x,y) + f_s(x,y), A \geq 1
\end{equation}
If the sharpened image is obtained using Laplacian operator, it can be written as,
\begin{equation}
f_{hb}(x,y) =
\begin{cases}
Af(x,y) - \nabla^2f(x,y), \text{-ve center coefficient} \\
Af(x,y) + \nabla^2f(x,y), \text{+ve center coefficient}
\end{cases}
\end{equation}
#### First Order Derivative Operators (Sobel Operators)
We know that the gradient of a function $f$ can be written as,
\begin{equation}
\nabla f =
\begin{bmatrix}
\frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y}
\end{bmatrix}
\end{equation}
Its magnitude can be written as,
\begin{equation}
|\nabla f| = \sqrt{\bigg(\frac{\partial f}{\partial x}\bigg)^2 + \bigg(\frac{\partial f}{\partial y}\bigg)^2} \approx \bigg|\frac{\partial f}{\partial x}\bigg| + \bigg|\frac{\partial f}{\partial y}\bigg|
\end{equation}
Using the original definitions, we can obtain,
\begin{equation}
\frac{\partial f}{\partial x} \approx [f(x + 1, y - 1) + f(x + 1, y + 1) + 2f(x + 1, y)] - [f(x - 1, y - 1) + f(x - 1, y + 1) + 2f(x - 1, y)]
\end{equation}
A similar value for $\frac{\partial f}{\partial y}$ can also be obtained. These operators are also known as Sobel operators. Their masks can be written as,
\begin{equation}
\frac{\partial f}{\partial x}:
\begin{bmatrix}
-1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1
\end{bmatrix},
\frac{\partial f}{\partial y}:
\begin{bmatrix}
-1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1
\end{bmatrix}
\end{equation}
Sometimes, such simple filters are used in combination with other image enhancement techniques too!
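A sketch of the Sobel gradient magnitude using the $|\partial f/\partial x| + |\partial f/\partial y|$ approximation over the valid interior (rows are taken as the $x$ direction, matching the masks above):

```python
import numpy as np

def sobel_magnitude(f):
    wx = np.array([[-1, -2, -1],
                   [ 0,  0,  0],
                   [ 1,  2,  1]], dtype=float)  # df/dx mask
    wy = wx.T                                    # df/dy mask
    g = np.zeros((f.shape[0] - 2, f.shape[1] - 2))
    for x in range(g.shape[0]):
        for y in range(g.shape[1]):
            patch = f[x:x + 3, y:y + 3]
            g[x, y] = abs(np.sum(wx * patch)) + abs(np.sum(wy * patch))
    return g
```

A step edge between two flat regions produces a strong response along the edge and zero response in the flat regions.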