---
tags: image processing
---
# Edge Detection
## Table of Contents
[TOC]
## Fundamentals
### adjacency, connectivity
#### neighbors of a pixel
The 4-neighbors of $p$, denoted $N_4(p)$, are its two horizontal and two vertical neighbors; the diagonal neighbors of $p$, denoted $N_D(p)$, are its four diagonal neighbors.

The 8-neighbors of $p$ are $N_8(p) = N_4(p) \cup N_D(p)$.
#### adjacency
$V$: the set of intensity values used to define adjacency (for a binary image, $V = \{1\}$ if we are referring to adjacency of pixels with value $1$).
There are 3 types of adjacency (for a binary image) for $p$ and $q$ with values from $V$:
* 4-adjacency: $p$ and $q$ are 4-adjacent if $q \in N_4(p)$
* 8-adjacency: $p$ and $q$ are 8-adjacent if $q \in N_8(p)$
* m-adjacency (mixed adjacency): $p$ and $q$ are $m$-adjacent if
(a) $q \in N_4(p)$,
or
(b) $q \in N_D(p)$ and the set $N_4(p) \cap N_4(q)$ has no pixels whose values are from $V$.
For example, in the following figure,

$p$ and $q$ are 8-adjacent (blue dashed line) but not m-adjacent (green dashed line).
$\Rightarrow$ m-adjacency eliminates the ambiguities that result from 8-adjacency.
#### digital path
A ***digital path*** from $p$ to $q$ is a sequence of distinct pixels with coordinates
$$
p = (x_0, y_0), (x_1, y_1) , ... ,(x_n, y_n) = q
$$
where $(x_i, y_i)$ and $(x_{i-1}, y_{i-1})$ are adjacent. We say that $p$ and $q$ are connected in $S$ if such a path exists consisting entirely of pixels in the subset $S$. A set $S$ is called a ***connected set*** if every pair of pixels in $S$ is connected.
#### boundary
The ***boundary*** of a region $R$ is the set of pixels in $R$ that are adjacent to pixels in $R^C$.
If $R$ happens to be an entire image, then its boundary is defined as the set of pixels in the first and last rows and columns of the image.
> **Note**
> ***Edges*** are formed from pixels with *derivative values* that exceed a preset threshold. Thus, an edge is a **“local” concept** that is based on a *measure of intensity-level discontinuity* at a point. However, the boundary of a finite region forms a closed path and is thus a **“global” concept**.
>$\Rightarrow$ **Edge and boundary are different!**
### derivatives of a digital function
We require that any approximation used for first derivatives must
(1) Be $0$ in areas of constant intensity.
(2) Be nonzero at the onset of an intensity step or ramp.
(3) Be nonzero at points along an intensity ramp.
Therefore, a basic definition of the first-order derivative of a one-dimensional function $f(x)$ is the difference
$$
\frac{\partial f}{\partial x} = f(x + 1) - f(x)
$$
Similarly, any approximation used for second derivatives must
(1) Be $0$ in areas of constant intensity.
(2) Be nonzero at the onset and end of an intensity step or ramp.
(3) Be $0$ along intensity ramps.
From the Taylor expansion of a univariate function $f$:
$$
f(x + \Delta x) = f(x) + \frac{\Delta x}{1!}f'(x) + \frac{(\Delta x)^2}{2!}f''(x) + \cdots
$$
Therefore,
$$
f(x + \Delta x) + f(x - \Delta x) \approx 2f(x) + f''(x) (\Delta x)^2
$$
When $\Delta x = 1$, we have
$$
f''(x) \approx f(x + 1) + f(x - 1) - 2f(x)
$$

The second-order derivative is **much more aggressive** than the first-order derivative in **enhancing sharp changes** (including noise).
$\Rightarrow$ The magnitude of the response at an isolated point is much stronger for the second derivative than for the first.
Since first-order and second-order derivatives are both linear operators, we can [thus](/kbHoXQkFQgeot5d5T996AA) use convolution to compute their values at a given point.
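As a quick check of these definitions, the sketch below computes both differences with NumPy on a toy 1-D profile (the profile values are illustrative, not from the text):
```python
import numpy as np

# Toy 1-D profile: constant, ramp, constant, then a step (illustrative values).
f = np.array([6, 6, 6, 5, 4, 3, 2, 1, 1, 1, 1, 6, 6, 6], dtype=float)

# First-order difference f(x+1) - f(x).
d1 = f[1:] - f[:-1]

# Second-order difference f(x+1) + f(x-1) - 2 f(x); the kernel [1, -2, 1]
# is symmetric, so convolution and correlation coincide.
d2 = np.convolve(f, [1, -2, 1], mode='valid')

print(d1)  # nonzero along the ramp and at the step
print(d2)  # nonzero only where the slope changes; double (sign-changing) response at the step
```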
### detection of isolated points
From the previous discussion, we know that point detection should be based on the second derivative, which means using the [Laplacian](https://hackmd.io/Y-qXmpiYRyKVAqg9UQ3o3Q#Using-the-second-derivative-for-image-sharpening):
$$
\nabla ^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial ^2 f}{\partial y^2}.
$$
To express this equation in discrete form, we use the definition in the last section, keeping in mind that we now have a second variable. In the $x$-direction, we have
$$
\frac{\partial ^2 f}{\partial x^2} = f(x+ 1, y) + f(x-1, y) - 2f(x,y)
$$
In the $y$-direction, we have
$$
\frac{\partial ^2 f}{\partial y^2} = f(x, y + 1) + f(x, y - 1) - 2f(x,y)
$$
Summing these two equations, we have
$$
\nabla^2 f(x, y) = f(x+ 1, y) + f(x-1, y) + f(x, y + 1) + f(x, y - 1) - 4f(x,y)
$$
So the kernel can be implemented as
$$
\begin{bmatrix}
0 & 1 & 0 \\
1 & -4 & 1 \\
0 & 1 & 0
\end{bmatrix}
$$
The diagonal directions can be incorporated in the definition of the digital Laplacian by adding four more terms, giving the kernel
$$
\begin{bmatrix}
1 & 1 & 1 \\
1 & -8 & 1 \\
1 & 1 & 1
\end{bmatrix}
$$
The basic way in which we use the Laplacian for image sharpening is
$$
g(x, y) = f(x, y) + c [\nabla^2 f(x, y)]
$$
where $c = -1$ for kernels like the two above, whose center coefficient is negative.
> **Example**
> Suppose we use the first kernel; it will sharpen intensity transitions and **de-emphasize regions of slowly varying intensities**.
>
> $\Rightarrow$ The new image tends to have grayish edge lines on a dark, featureless background.
> By **subtracting** the Laplacian image from the original, we can recover the background features while still preserving the sharpening effect (with the two different kernels):
>
> The image produced with the second kernel shows a better sharpening effect than the other. This is because the second kernel provides additional differentiation (sharpening) in the diagonal directions.
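A minimal sketch of this sharpening step (the function name and the 8-bit clipping range are assumptions; the kernels and $c = -1$ follow the discussion above):
```python
import numpy as np
from scipy.ndimage import convolve

def laplacian_sharpen(f, use_diagonals=False, c=-1.0):
    """Sharpen with g = f + c * Laplacian(f); c = -1 matches kernels
    whose center coefficient is negative, as in the two kernels above."""
    if use_diagonals:
        kernel = np.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]], dtype=float)
    else:
        kernel = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
    lap = convolve(f.astype(float), kernel, mode='nearest')
    # Clipping to [0, 255] assumes an 8-bit intensity range.
    return np.clip(f + c * lap, 0, 255)
```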
### line detection
The Laplacian operator is isotropic, so its response is independent of direction. Often, interest lies in detecting lines in a specified direction, which is done with direction-sensitive kernels:

For example, if we filter an image with the leftmost kernel in the figure above, the *maximum responses* would occur at image locations in which a **horizontal line passes through the middle row of the kernel**.
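Horizontal-line detection could be sketched as follows (the kernel is the standard horizontal line-detection kernel; the thresholding step and function name are assumptions):
```python
import numpy as np
from scipy.ndimage import convolve

# Horizontal line-detection kernel: coefficients sum to zero, so the
# response is zero in areas of constant intensity.
horizontal = np.array([[-1, -1, -1],
                       [ 2,  2,  2],
                       [-1, -1, -1]], dtype=float)

def detect_horizontal_lines(f, threshold):
    """Strong positive responses mark light horizontal lines (on a darker
    background) passing through the kernel's middle row."""
    r = convolve(f.astype(float), horizontal, mode='nearest')
    return r >= threshold  # the threshold choice is application-dependent
```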
### edge models
Edge models are classified according to their intensity profiles.
***step edge***:

$\Rightarrow$ Characterized by a transition between two intensity levels occurring ideally over the distance of one pixel.
In practice, digital images have edges that are blurred and noisy. In such situations, edges are more closely modeled as having an intensity ***ramp*** profile:

The slope of the ramp is *inversely proportional* to the degree to which the edge is blurred.
***roof edge***:

Roof edges are models of lines through a region.
Images typically contain all three types of edges.
A horizontal intensity profile and its derivatives:

The intersection between the zero intensity axis and a line extending between the extrema of the second derivative marks a point called the ***zero crossing*** of the second derivative.
From the figure above, we can conclude that:
* The magnitude of the first derivative can be used to detect the presence of an edge at a point in an image.
* The *sign* of the second derivative can be used to determine whether an edge pixel lies on the dark or the light side of an edge.
>**Behavior of the first and second derivatives in the region of a noisy edge**
> In the following figure, the first column represents the intensity profile of the image segment. The second and third columns represent the first derivative and the second derivative, respectively.
> 
> From the figure above, it can be observed that the derivatives are sensitive to noise, with the second derivative more sensitive than the first.
> $\Rightarrow$ Image smoothing should be a serious consideration prior to the use of derivatives in applications.
## Basic Edge Detection
### the image gradient and its properties
Some definitions:
$$
\nabla f(x, y)
\equiv
\begin{bmatrix}
g_x(x, y) \\
g_y(x, y)
\end{bmatrix}
=
\begin{bmatrix}
\frac{\partial f(x, y)}{\partial x} \\
\frac{\partial f(x, y)}{\partial y} \\
\end{bmatrix}
$$
$$
M(x, y) = \| \nabla f(x,y) \|
= \sqrt{g_x^2(x, y) + g_y^2(x, y)}
$$
The ***direction*** of the gradient vector at a point $(x,y)$ is given by
$$
\alpha(x, y) = \tan^{-1}
\left[
\frac{g_y(x, y)}{g_x(x,y)}
\right]
$$
Angles are measured in the counterclockwise direction with respect to the $x$-axis:

Note that $\nabla f(x,y)$ is a *vector image*, while $M(x, y)$ and $\alpha(x, y)$ are scalar images of the same size as $f$.
### gradient operators
We typically use forward finite differences:
$$
g_x(x, y) = \frac{\partial f(x,y)}{\partial x}
= f(x+1, y) - f(x, y)
$$
and
$$
g_y(x, y) = \frac{\partial f(x, y)}{\partial y}
= f(x, y+1) - f(x, y)
$$
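In NumPy, these forward differences and the magnitude and angle images might look like this (treating the first array axis as $x$ is an assumption about the coordinate convention):
```python
import numpy as np

def gradient_images(f):
    """Forward-difference gradient plus magnitude and angle images."""
    f = f.astype(float)
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    gx[:-1, :] = f[1:, :] - f[:-1, :]  # f(x+1, y) - f(x, y)
    gy[:, :-1] = f[:, 1:] - f[:, :-1]  # f(x, y+1) - f(x, y)
    M = np.hypot(gx, gy)               # gradient magnitude M(x, y)
    alpha = np.arctan2(gy, gx)         # gradient direction, in radians
    return M, alpha
```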
When diagonal edge direction is of interest, we need 2-D kernels. The following are various kernels used to compute the gradient at the point labeled $z_5$:

The fact that the Sobel kernels have **better noise-suppression** characteristics makes them preferable.
To decrease the computational burden of evaluating $M(x,y)$ (which requires squares and square roots), a frequently used approach is to approximate the magnitude of the gradient by absolute values:
$$
M(x,y) \approx |g_x| + |g_y|
$$
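A sketch of a Sobel pass using this absolute-value approximation (the kernel orientation convention, with $x$ running along the rows, is an assumption):
```python
import numpy as np
from scipy.ndimage import convolve

# Sobel kernels; here x is taken to run along the rows
# (transpose for the opposite convention).
sobel_x = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]], dtype=float)
sobel_y = sobel_x.T

def sobel_magnitude(f):
    """Approximate the gradient magnitude as M ~ |gx| + |gy|."""
    f = f.astype(float)
    gx = convolve(f, sobel_x, mode='nearest')
    gy = convolve(f, sobel_y, mode='nearest')
    return np.abs(gx) + np.abs(gy)
```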
An example of Sobel absolute value:

As you can see, the contribution made to image detail by the wall bricks is significant. One way to **reduce fine detail** is to **smooth the image prior to computing the edges**. For example, if we use a $5 \times 5$ averaging kernel prior to edge detection, the result would become:

The kernels introduced above respond predominantly to vertical and horizontal edges. The ***Kirsch compass kernels*** are designed to detect edge magnitude and direction in **all eight compass directions**:

Kirsch’s approach was to determine the edge magnitude by convolving an image with all eight kernels and assign the edge magnitude at a point **as the response of the kernel that gave the strongest convolution value at that point**. The edge angle at that point is then the direction associated with that kernel.
$\Rightarrow$ If it's important to emphasize edges oriented in particular diagonal directions, then Kirsch kernels should be used.
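One way to sketch Kirsch's scheme is to build the eight kernels by rotating the outer ring of the north kernel; the ring-rotation construction and the direction indexing below are implementation choices, not from the text:
```python
import numpy as np
from scipy.ndimage import convolve

def kirsch_kernels():
    """Eight Kirsch compass kernels: rotate the outer ring of the
    north kernel [[5,5,5],[-3,0,-3],[-3,-3,-3]] in 45-degree steps."""
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    base = [5, 5, 5, -3, -3, -3, -3, -3]
    kernels = []
    for shift in range(8):
        k = np.zeros((3, 3))
        for (r, c), v in zip(ring, np.roll(base, shift)):
            k[r, c] = v
        kernels.append(k)
    return kernels

def kirsch_edges(f):
    """Edge magnitude = strongest of the eight responses at each pixel;
    edge direction = index of the kernel that produced it."""
    f = f.astype(float)
    responses = np.stack([convolve(f, k, mode='nearest')
                          for k in kirsch_kernels()])
    return responses.max(axis=0), responses.argmax(axis=0)
```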
## The Marr-Hildreth Edge Detector
[Marr and Hildreth](https://hackmd.io/Mmvt2yFNRwGkQufByZEVVA) argued that
1. Intensity changes are not independent of image scale.
$\Rightarrow$ Edge detection requires using operators of **different sizes**.
2. A sudden intensity change will give rise to a peak or trough in the first derivative or, equivalently, to a zero crossing in the second derivative.
These ideas suggest that an operator used for edge detection should have the following features:
1. It should be a differential operator capable of computing a digital approximation of the first or second derivative at every point in the image.
2. It should be capable of being “tuned” to act at any desired scale, so that large operators can be used to detect blurry edges and small operators to detect sharply focused fine detail.
Marr and Hildreth suggested that the most satisfactory operator fulfilling these conditions is the filter
$$
\nabla^2 G(x, y)
$$
where $G$ is the 2-D Gaussian function
$$
G(x, y) = e ^{-\frac{x^2 + y^2}{2 \sigma^2}}
$$
with standard deviation $\sigma$. The expression for $\nabla^2 G$, called the ***Laplacian of Gaussian*** (LoG), is
$$
\nabla^2 G(x, y) = \left(
\frac{x^2 + y^2 - 2\sigma^2}{\sigma^4}
\right)
e^{-\frac{x^2 + y^2}{2 \sigma^2}}
$$
The plot of LoG and a $5 \times 5$ kernel that approximates the shape:

An effective way to generate a LoG kernel is to sample $G(x,y)$ to the desired size and then convolve the resulting array with a Laplacian kernel.
The Marr-Hildreth algorithm consists of convolving the LoG kernel with an input image:
$$
g(x, y) = \left[
\nabla^2 G(x, y)
\right] *
f(x, y)
$$
Because the Laplacian and convolution are *linear processes*, we can write the equation above as
$$
\begin{align}
g(x, y) = \nabla^2 [G(x, y) * f(x,y)]
\end{align}
$$
The Marr-Hildreth edge-detection algorithm may be summarized as follows:
1. Filter the input image with an $n \times n$ Gaussian lowpass kernel obtained by sampling $G(x, y)$.
2. Compute the Laplacian of the image resulting from 1.
3. Find the zero crossings of the image from the previous step.
### detecting zero crossings
One approach for finding the zero crossing at any pixel $p$:
1. Use a $3 \times 3$ neighborhood centered at $p$.
2. Check whether the signs of at least two of its opposing neighboring pixels (left/right, up/down, and the two diagonals) **differ** and the absolute value of their numerical difference exceeds a certain **threshold value**.
This approach generally gives good results. If the result is not ideal, then a [technique](https://ieeexplore.ieee.org/abstract/document/4767838) for finding zero crossings with *subpixel accuracy* can be employed.
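Putting the three-step summary and this zero-crossing test together, a minimal sketch might be (the default threshold, 4% of the maximum LoG magnitude, is an assumption, as is the use of `scipy.ndimage` for the smoothing and Laplacian):
```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def marr_hildreth(f, sigma=2.0, threshold=None):
    """Gaussian smoothing, Laplacian, then zero-crossing detection."""
    log = laplace(gaussian_filter(f.astype(float), sigma))
    if threshold is None:
        threshold = 0.04 * np.abs(log).max()  # assumed default
    edges = np.zeros(log.shape, dtype=bool)
    # Opposing neighbor pairs in a 3x3 neighborhood:
    # left/right, up/down, and the two diagonals.
    pairs = [((0, 1), (0, -1)), ((1, 0), (-1, 0)),
             ((1, 1), (-1, -1)), ((1, -1), (-1, 1))]
    for (r1, c1), (r2, c2) in pairs:
        a = np.roll(log, (r1, c1), axis=(0, 1))
        b = np.roll(log, (r2, c2), axis=(0, 1))
        edges |= (np.sign(a) != np.sign(b)) & (np.abs(a - b) > threshold)
    return edges
```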
### approximation of LoG
It is possible to approximate the LoG function by a ***difference of Gaussians*** (DoG):
$$
\begin{align}
D_G(x, y) = \frac{1}{2 \pi \sigma_1^2} e^{{-\frac{x^2 + y^2}{2 \sigma_1^2}}} -
\frac{1}{2 \pi \sigma_2^2} e^{{-\frac{x^2 + y^2}{2 \sigma_2^2}}}
\end{align}
$$
with $\sigma_1 > \sigma_2$. In order for the LoG and DoG to have the same zero crossings, the value of $\sigma$ for the LoG must be selected based on the following equation:
$$
\begin{align}
\sigma^2 = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 - \sigma_2^2}
\ln \left[
\frac{\sigma_1^2}{\sigma_2^2}
\right]
\end{align}
$$
Although the zero crossings of the LoG and DoG will be the same when this value of $\sigma$ is used, their **amplitude scales** will be different.
Experimental results show that the ratio $\sigma_1 : \sigma_2 = 1.6:1$ is a good choice for matching the characteristics of the human visual system.
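The relation between the two DoG sigmas and the matching LoG sigma is easy to check numerically (a small sketch; the function names and kernel size are arbitrary choices):
```python
import numpy as np

def matched_log_sigma(sigma1, sigma2):
    """Sigma of the LoG sharing zero crossings with DoG(sigma1, sigma2):
    sigma^2 = s1^2 s2^2 / (s1^2 - s2^2) * ln(s1^2 / s2^2)."""
    s1, s2 = sigma1 ** 2, sigma2 ** 2
    return np.sqrt(s1 * s2 / (s1 - s2) * np.log(s1 / s2))

def dog_kernel(size, sigma1, sigma2):
    """Sample the DoG (sigma1 > sigma2) on a size x size grid."""
    r = (size - 1) / 2
    x, y = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1))
    g = lambda s: np.exp(-(x**2 + y**2) / (2 * s**2)) / (2 * np.pi * s**2)
    return g(sigma1) - g(sigma2)

# With the 1.6:1 ratio mentioned above:
print(matched_log_sigma(1.6, 1.0))
```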
### separability of Gaussian kernel
Since the Gaussian function is
$$
\begin{align}
G(x, y) &= e ^{-\frac{x^2 + y^2}{2 \sigma^2}} \\
&=
e ^{-\frac{x^2}{2 \sigma^2}} \cdot
e ^{-\frac{y^2}{2 \sigma^2}}
\end{align}
$$
the Gaussian kernel is separable, which means that the $n \times n$ kernel matrix can be decomposed into
$$
u ^T \cdot v
$$
where $u$ and $v$ are both $1 \times n$ 1-D Gaussian kernels. This makes the convolution much faster when $n$ is large.
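The sketch below verifies the identity behind this speedup: one pass with the $n \times n$ outer-product kernel equals two 1-D passes (the kernel size and sigma are arbitrary):
```python
import numpy as np
from scipy.ndimage import convolve, convolve1d

def gaussian_1d(n, sigma):
    """Sampled, unnormalized 1-D Gaussian, matching G(x, y) above."""
    x = np.arange(n) - (n - 1) / 2
    return np.exp(-x**2 / (2 * sigma**2))

v = gaussian_1d(21, 3.0)
f = np.random.rand(256, 256)

# One 2-D pass with the n x n outer-product kernel ...
full = convolve(f, np.outer(v, v), mode='nearest')

# ... equals two 1-D passes, O(n) instead of O(n^2) work per pixel.
fast = convolve1d(convolve1d(f, v, axis=0, mode='nearest'),
                  v, axis=1, mode='nearest')

print(np.allclose(full, fast))  # True
```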
## The Canny Edge Detector
### Steps
#### smoothing and computing gradients
Using numerical optimization with 1-D step edges **corrupted by additive white Gaussian noise** led to the conclusion that a *good approximation* to the optimal step edge detector is the first derivative of a Gaussian:
$$
\begin{align}
\frac{d}{dx} e^{\frac{-x^2}{2\sigma^2}}
= \frac{-x}{\sigma^2} e^{-\frac{x^2}{2 \sigma^2 }}
\end{align}
$$
We can generalize the preceding result to 2-D if we *know the direction of the edge normal*. Because this direction is unknown at the outset, we would have to **apply the 1-D edge detector in all possible directions at every point**, which is approximated by Gaussian smoothing followed by a gradient computation.
To summarize the step described above, let $f(x,y)$ denote the input image and $G(x,y)$ the Gaussian function:
$$
\begin{align}
G(x, y) = e^{-\frac{x^2 + y^2}{2\sigma^2}}
\end{align}
$$
We form a smoothed image $f_s(x,y)$ by convolving $f$ and $G$:
$$
\begin{align}
f_s(x, y) = G(x, y) * f(x, y)
\end{align}
$$
This operation is followed by computing the gradient magnitude $M_s(x,y)$ and direction $\alpha (x, y)$ as [before](#Basic-Edge-Detection).
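A sketch of this first stage (the use of `scipy.ndimage.gaussian_filter` and centered differences via `np.gradient` are implementation choices, not prescribed by the text):
```python
import numpy as np
from scipy.ndimage import gaussian_filter

def canny_gradients(f, sigma=1.5):
    """Smooth with a Gaussian, then compute gradient magnitude and angle."""
    fs = gaussian_filter(f.astype(float), sigma)   # f_s = G * f
    gx, gy = np.gradient(fs)                       # centered differences
    Ms = np.hypot(gx, gy)                          # M_s(x, y)
    alpha = np.arctan2(gy, gx)                     # alpha(x, y), in radians
    return Ms, alpha
```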
#### nonmaxima suppression
The gradient magnitude image $M_s(x, y)$ usually contains *wide ridges* around local maxima. To **thin the edges**, one approach is to apply ***nonmaxima suppression*** to $M_s(x, y)$.
To determine whether a point is a local maximum, we compare its value to those of its neighbors along the direction of the gradient. In a $3 \times 3$ region we can define 4 orientations for an edge passing through the center point of the region: horizontal, vertical, $+45^\circ$, and $-45^\circ$. To quantize all possible edge directions into these 4 ranges, we define a range of directions over which we consider an edge to be horizontal, and similarly for the other three orientations:

Based on the figure above, denote by $d_k$ the direction that is closest to $\alpha(x, y)$. The next step is to check whether the point is a local maximum along $d_k$. Let $K$ denote the value of $M_s$ at $(x, y)$. If $K$ is less than the value of $M_s$ at one or both of the neighbors of $(x, y)$ along $d_k$, let $g_N(x, y) = 0$ (suppression); otherwise, let $g_N(x, y) = K$. In the following figure, for example, if the edge through $p_5$ is a horizontal edge, we compare its gradient magnitude to those of $p_2$ and $p_8$:

Applying this procedure to all points in $M_s(x,y)$ yields the nonmaxima-suppressed image $g_N(x,y)$.
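A direct (unvectorized) sketch of the suppression step just described; the angle-quantization bookkeeping and axis conventions are assumptions:
```python
import numpy as np

def nonmaxima_suppression(M, alpha):
    """Zero out pixels that are not local maxima along the quantized
    gradient direction (4 bins, 45 degrees apart)."""
    angle = np.rad2deg(alpha) % 180.0
    k = ((angle + 22.5) // 45).astype(int) % 4
    # Neighbor offsets for each direction bin (convention-dependent).
    offsets = {0: (0, 1), 1: (-1, 1), 2: (-1, 0), 3: (-1, -1)}
    gN = np.zeros_like(M)
    rows, cols = M.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dr, dc = offsets[k[r, c]]
            # Keep K only if it is >= both neighbors along d_k.
            if M[r, c] >= M[r + dr, c + dc] and M[r, c] >= M[r - dr, c - dc]:
                gN[r, c] = M[r, c]
    return gN
```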
#### reducing false edge points
Canny's algorithm uses ***hysteresis thresholding***, which requires **2 thresholds**: a low threshold $T_L$ and a high threshold $T_H$ ([experiments](https://ieeexplore.ieee.org/document/4767851) suggest that the ratio of $T_H$ to $T_L$ should be in the range of $2:1$ to $3:1$). We can create 2 additional *binary images*:
$$
g_{NH} (x, y) = g_N(x, y) \ge T_H
$$
and
$$
g_{NL} (x, y) = g_N(x, y) \ge T_L
$$
Finally, we mark a **nonzero** pixel in $g_{NL}$ (a weak edge point) as an edge pixel if it is connected to any **nonzero** pixel in $g_{NH}(x, y)$ (a strong edge point) in the sense of 8-connectivity.
To obtain edges one pixel thick, it is typical to follow the last step with one pass of an [edge-thinning](https://hackmd.io/aERPaYC5RNiljwsKmmWmag#thickening) algorithm.
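The double threshold and 8-connectivity linking can be sketched with `scipy.ndimage.label` (a component-based shortcut that yields the same result as tracing weak pixels from strong ones):
```python
import numpy as np
from scipy.ndimage import label

def hysteresis(gN, t_low, t_high):
    """Keep a weak pixel (>= t_low) only if its 8-connected component
    in g_NL contains at least one strong pixel (>= t_high)."""
    weak = gN >= t_low      # g_NL
    strong = gN >= t_high   # g_NH
    # 8-connectivity structuring element.
    labels, n = label(weak, structure=np.ones((3, 3), dtype=int))
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(labels[strong])] = True
    keep[0] = False         # label 0 is the background
    return keep[labels]
```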
### summary
The Canny edge detection algorithm consists of the following steps:
1. Smooth the input image with a Gaussian filter.
2. Compute the gradient magnitude and angle images.
3. Apply nonmaxima suppression to the gradient magnitude image.
4. Use double thresholding and connectivity analysis to detect and link edges.