:::info
# <center><i class="fa fa-edit"></i> Introduction to Convolution </center>
:::

###### tags: `Convolution` `Filters` `Kernel` `Fourier Transform`

:::success
The objectives of this task were:
- Feature detection and extraction
- Image filtering
:::

## What is Convolution?

Convolution is a term that rings a bell for many who have experience with digital signal processing (DSP), digital image processing, or computer graphics. Convolving two functions in the time domain is equivalent to multiplying their transforms in the frequency domain. Convolution finds application in various domains such as

* Machine Vision
* Audio Processing
* Statistics
* Computer Graphics

Mathematically, convolution is an operation in which two functions are combined and the resulting overlap describes their interaction. Intuitively, we slide one function over the other, multiply the function values wherever they overlap, and sum the resulting products to create a new function.

:::warning
Convolution is an integral that describes the amount of overlap of one function f(t) as the other function g(t) is shifted in time.
:::

The convolution formula is given as

$$ (f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau $$

where
- f(t) is the input signal
- g(t) is the filtering kernel
- (f * g)(t) is the resulting filtered signal

We can observe that the values of the input signal f(t) are multiplied by time-shifted values of the filtering kernel. A fundamental property of the operator is that convolution in the time domain equals multiplication in the frequency domain, which means that a properly designed kernel can be used to either amplify or attenuate signal components.
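The slide-multiply-sum description above can be sketched directly in Python for the 1D discrete case. This is a minimal illustration; the signal and kernel values are arbitrary, and the result is checked against NumPy's own `np.convolve`:

```python
import numpy as np

def conv1d(f, g):
    """Discrete 1D convolution: for each output index t, sum f[tau] * g[t - tau]."""
    n, m = len(f), len(g)
    out = np.zeros(n + m - 1)
    for t in range(len(out)):
        for tau in range(n):
            if 0 <= t - tau < m:
                out[t] += f[tau] * g[t - tau]
    return out

f = np.array([1.0, 2.0, 3.0])   # input signal
g = np.array([0.0, 1.0, 0.5])   # filtering kernel
print(conv1d(f, g))             # matches np.convolve(f, g)
```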
These principles extend to digital image processing, where we can sharpen or blur an image by performing the same convolution operation. Given that an image is a 2D discrete signal, the convolution can be performed in two dimensions using a discrete kernel as follows.

The 2D discrete convolution of an image \( y \) with a kernel \( k \) is given by:

$$ (y * k)[i, j] = \sum_n \sum_m y[i - n, j - m]\, k[n, m] $$

Convolution is in fact a scalar product between the filter weights and the image pixels. The filter is a matrix in which the center element weights the center pixel while the adjacent elements weight the neighbouring pixels. At the borders of the image, different padding strategies can be employed to ensure filtering behaves as expected. Careful selection of the kernel weights is needed to realize the desired effect.

## Naive 2D Convolution implementation using C++

The code below is a naive implementation of 2D convolution in C++; padding is not taken into account, which explains why the resultant matrix (feature map) is smaller than the original image.
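The border strategies mentioned above are easy to see on a small 1D example. A quick NumPy sketch of three common padding modes (the array values are arbitrary):

```python
import numpy as np

row = np.array([1, 2, 3, 4])

# Three common border strategies, padding 2 elements on each side:
print(np.pad(row, 2, mode="constant"))  # zero padding:     [0 0 1 2 3 4 0 0]
print(np.pad(row, 2, mode="edge"))      # replicate edges:  [1 1 1 2 3 4 4 4]
print(np.pad(row, 2, mode="reflect"))   # mirror the image: [3 2 1 2 3 4 3 2]
```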
```cpp=
#include <iostream>
#include <vector>
using namespace std;

// 2D convolution function without padding (valid convolution)
vector<vector<double>> TwoDConvolution(vector<vector<double>> &Matx, vector<vector<double>> &kernel){
    int matrix_height = Matx.size();
    int matrix_width  = Matx[0].size();
    int kernel_height = kernel.size();
    int kernel_width  = kernel[0].size();

    int output_height = matrix_height - kernel_height + 1;
    int output_width  = matrix_width  - kernel_width  + 1;

    vector<vector<double>> output(output_height, vector<double>(output_width, 0.0));

    for (int i = 0; i < output_height; i++) {
        for (int j = 0; j < output_width; j++) {
            double res = 0.0;
            for (int k = 0; k < kernel_height; k++) {
                for (int l = 0; l < kernel_width; l++) {
                    res += Matx[i + k][j + l] * kernel[k][l];
                }
            }
            output[i][j] = res;
        }
    }
    return output;
}

int main(){
    // Simple test input matrix
    vector<vector<double>> input = {
        {1, 2, 3, 4, 5, 6},
        {1, 2, 3, 4, 5, 6},
        {1, 2, 3, 4, 5, 6},
        {1, 2, 3, 4, 5, 6},
        {1, 2, 3, 4, 5, 6},
        {1, 2, 3, 4, 5, 6}
    };

    // Input kernel
    vector<vector<double>> kernel = {
        {1, 2, 3},
        {1, 2, 3},
        {1, 2, 3}
    };

    vector<vector<double>> feature_map = TwoDConvolution(input, kernel);

    cout << "Resultant matrix is shown below:" << endl;
    for (size_t i = 0; i < feature_map.size(); i++) {
        for (size_t j = 0; j < feature_map[0].size(); j++) {
            cout << feature_map[i][j] << "\t";
        }
        cout << endl;
    }
    return 0;
}
```

---

## Convolution in Image Processing

Convolution is at the heart of many filtering algorithms that we use on a day-to-day basis. Three signals of interest are related by convolution:

* Input signal - the array of pixels extracted from a picture
* Filtering kernel - a filter matrix with specific coefficients
* Output signal - the modified, filtered image

Think of convolution as a process of transforming a given image by applying a kernel (filter) across the image. The kernel values determine the output of the resultant image.
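How strongly the kernel values shape the output can be seen on a tiny example. The sketch below uses a hypothetical 5×5 test image containing a single bright pixel, and compares an identity kernel (which leaves pixels unchanged) with a box-average kernel (which smears the pixel across its neighbourhood):

```python
import numpy as np

def valid_conv2d(img, k):
    """Valid (no padding) sliding-window filtering, as in the C++ code above."""
    kh, kw = k.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

img = np.zeros((5, 5))
img[2, 2] = 9.0                                                # single bright pixel

identity = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], float)  # passes pixels through
box      = np.full((3, 3), 1 / 9)                              # 3x3 mean filter

print(valid_conv2d(img, identity))  # the bright pixel stays a single spike
print(valid_conv2d(img, box))       # the spike is spread into a uniform patch
```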
### 1D Convolution

Recall that images are 2-dimensional signals; however, some filter operators are separable and can therefore be applied in two passes. In the first pass, the filter is convolved column-wise, then row-wise, or vice versa. Examples of separable filters include

* Gaussian filter
* Sobel edge detector
* Average blur

Therefore, a separable 2D convolution can be written as a convolution of two 1D filters. Separability saves computation time, since performing 1D convolutions is less computationally intensive.

## Convolution implementation in python

Similar to the C++ implementation above, I did the same implementation of convolution in Python. This is the original image to which the filtering processes will be applied.

![kodim21](https://hackmd.io/_uploads/S1flthD1el.png)

## Python Code Implementation

```python=
# install the needed libraries: numpy, matplotlib and Pillow (imported as PIL)
!pip install numpy
!pip install matplotlib
!pip install pillow

# import libraries
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# read the image
img_path = r"C:\Users\lang_es\Desktop\CompVision\PythonConv\kodim21.png"
img = Image.open(img_path)

# function to display a single image
def show_image(input_img, title="filtered image"):
    plt.imshow(input_img)
    plt.xlabel("X axis")
    plt.ylabel("Y axis")
    plt.title(title)

# display the image
show_image(img, "original image")

# function to plot images adjacent to one another
def plot_images(input_img, filtered_img, title="original image", title_2="filtered image"):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    # plot the first image
    ax[0].imshow(input_img, cmap="gray")
    ax[0].set_title(title)
    ax[0].axis("off")
    # plot the second image
    ax[1].imshow(filtered_img)
    ax[1].set_title(title_2)
    ax[1].axis("off")

# convolution function
def my_convolution(image_path, filter_kernel):
    # load the image
    image = Image.open(image_path)
    # convert to grayscale
    converted_image = image.convert("L")
    # obtain numpy array of the image
    image_array = np.array(converted_image)

    # dimensions of the image and the kernel
    image_height, image_width = image_array.shape
    kernel_height, kernel_width = filter_kernel.shape

    # calculate output dimensions (valid convolution, no padding)
    output_height = image_height - kernel_height + 1
    output_width = image_width - kernel_width + 1

    # initialize output array
    output = np.zeros((output_height, output_width))

    # apply convolution
    for i in range(output_height):
        for j in range(output_width):
            # extract the sub-array of the image under the kernel
            sub_array = image_array[i:i+kernel_height, j:j+kernel_width]
            # convolve the sub-array with the kernel
            convolved_value = np.sum(sub_array * filter_kernel)
            # store the result in the output array
            output[i, j] = convolved_value
    return output

# Gaussian kernel
gaussian_kernel = np.array([
    [1, 2, 1],
    [2, 4, 2],
    [1, 2, 1]
])

gauss_Image = my_convolution(img_path, gaussian_kernel)

# plot the images
plot_images(img, gauss_Image, "original image", "filtered image")
```

This is the output of the filtered image compared side by side with the original image.

![output](https://hackmd.io/_uploads/r1I3_3PJex.png)

### edge detection - vertical

```python=
# Sobel vertical edge detector
vertical_kernel = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1]
])

vert_Image = my_convolution(img_path, vertical_kernel)
plot_images(img, vert_Image, "original image", "Vertical edge filter")
```

Output image

![5daf1187-fdd5-4e13-ad56-e1254713fa3a](https://hackmd.io/_uploads/SJi_I6wJxl.png)

### edge detection horizontal

```python=
# Sobel horizontal edge detector
horizontal_kernel = np.array([
    [-1, -2, -1],
    [0, 0, 0],
    [1, 2, 1]
])

horizontal_Image = my_convolution(img_path, horizontal_kernel)
plot_images(img, horizontal_Image, "original image", "Horizontal edge filter")
```

The resultant image is shown below
![d9e782d9-3b61-4f4e-8f3b-45c4ad3cbcb0](https://hackmd.io/_uploads/S1BDDTvkgg.png)

### laplacian filter - detects edges in all orientations

```python=
laplacian_kernel = np.array([
    [-1, -1, -1],
    [-1, 8, -1],
    [-1, -1, -1]
])

laplacian_Image = my_convolution(img_path, laplacian_kernel)
plot_images(img, laplacian_Image, "original image", "Laplacian filter")
```

Resultant output

![374db0d2-99f2-4187-af2a-1b4d3cea8b0e](https://hackmd.io/_uploads/S1V0KTvJeg.png)

___

## Fourier Transform

The Fourier transform is a faster way of convolving an image with a large filter kernel, in contrast to using a conventional convolution filter.

:::info
In its simplest form, the Fourier transform can be thought of as writing functions as sums of sinusoids.

Another good analogy: consider the Fourier transform as being given a smoothie and, as the talented chef that you are, being expected to recover the ingredients through filtering. Once you have separated all the ingredients, you can blend them together once again to obtain the original smoothie.
:::

## Fourier Transform Definition

The Fourier transform (FT) of a function $f(x)$ is $F(\omega)$, defined as:

$$F(\omega) = \int_{-\infty}^{\infty} f(x)e^{-i\omega x} dx$$

The inverse Fourier transform is:

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega)e^{i\omega x} d\omega$$

where $i = \sqrt{-1}$ and $e^{i\theta} = \cos \theta + i\sin \theta$.

A Fourier transform pair is written as $f(x) \leftrightarrow F(\omega)$ or $\mathcal{F}\{f(x)\} = F(\omega)$.

- If $f(x)$ is a signal, $F(\omega)$ is its **spectrum**.
- If $f(x)$ is a filter's impulse response, $F(\omega)$ is its **frequency response**.
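The claim that filtering can be done faster in the frequency domain rests on the convolution theorem, which can be verified numerically with NumPy's FFT. The signals below are arbitrary illustrative values; zero-padding both to the full output length makes the DFT's circular convolution equal the linear one:

```python
import numpy as np

# Two arbitrary example signals
f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 1.0, 0.5])

# Linear convolution in the "time" domain
conv_time = np.convolve(f, g)

# Same result via the frequency domain: multiply the FFTs and invert
L = len(f) + len(g) - 1
conv_freq = np.fft.ifft(np.fft.fft(f, L) * np.fft.fft(g, L)).real

print(np.allclose(conv_time, conv_freq))  # True
```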
## Fourier Transform Properties Summary

| Domain Property    | Frequency Property |
|--------------------|--------------------|
| discrete           | periodic           |
| periodic           | discrete           |
| discrete, periodic | discrete, periodic |
| box                | sinc               |
| sinc               | box                |
| Gaussian           | Gaussian           |
| impulse train      | impulse train      |

## Fourier Transform in Image Processing

The Fourier transform is an image processing tool that can be used to decompose an image into its constituent sine and cosine components. As mentioned earlier, the output of the transformation represents the image in the frequency domain, while the image itself is in the spatial domain. The Fourier transform finds applicability in many areas such as image analysis, image filtering, image compression and reconstruction.

Since we are dealing with digital images, the discrete Fourier transform (DFT) will be considered. The DFT, unlike the continuous FT, contains only a sampled set of frequencies, but this set of samples is large enough to fully describe the image in the spatial domain. :100: The Fourier domain and the spatial domain have the same dimensions.

The 2D Discrete Fourier Transform (DFT) of a discrete signal $f(i, j)$ of size $N \times N$ is given by $F(k, l)$:

$$F(k, l) = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} f(i, j) e^{-j2\pi (\frac{ki}{N} + \frac{lj}{N})}$$

The equation above can be interpreted as follows: $f(i,j)$ is the spatial domain signal, while the exponential is the basis function corresponding to each point in the Fourier space. Intuitively, to obtain the frequency components of an image, all we need are the spatial values of the image (its pixel values), which we multiply by the exponential basis functions corresponding to points in the Fourier space.
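One coefficient of this formula is easy to check by hand: at $k = l = 0$ the exponential basis function is $e^0 = 1$, so $F(0, 0)$ is simply the sum of all pixel values (the DC component). A quick check on a small made-up 4×4 "image":

```python
import numpy as np

# A small hypothetical 4x4 image
f = np.arange(1, 17, dtype=float).reshape(4, 4)

F = np.fft.fft2(f)

# At k = l = 0 the basis function is constant 1, so F(0, 0) = sum of pixels
print(F[0, 0].real)  # 136.0
print(f.sum())       # 136.0
```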
The 2D Inverse Discrete Fourier Transform (IDFT) of a frequency domain representation $F(k, l)$ of size $N \times N$ recovers the original discrete signal $f(a, b)$:

$$f(a, b) = \frac{1}{N^2} \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} F(k, l) e^{j2\pi (\frac{ka}{N} + \frac{lb}{N})}$$

Here $\frac{1}{N^2}$ is the normalization term for the inverse transformation.

As seen from the formula of the DFT, to perform the Fourier transformation or recover the image we have to evaluate a double sum, and we can leverage this property to split the computation. If we consider $P(k, b)$ as the result of a 1D DFT along the rows of the 2D signal, then $F(k, l)$ is obtained by performing a 1D DFT along the columns of $P(k, b)$:

$$F(k, l) = \sum_{b=0}^{N-1} P(k, b)\, e^{-j2\pi \frac{lb}{N}}$$

After the first pass of 1D transforms, the second pass completes the transformation. Performing the computation in this row-column style is much faster than the direct double sum, though still not comparable to the fast Fourier transform (FFT).

The Fourier transform of an image yields a **complex-valued output**. This output can be represented by:

- **Real and Imaginary Parts**
- **Magnitude and Phase**

In image processing, the **magnitude** of the Fourier transform is often displayed because it primarily reveals the **geometric structure** of the original spatial domain image. However, to accurately **reconstruct the spatial domain image** after frequency domain processing, it is crucial to **preserve both the magnitude and the phase** of the Fourier transform. The values in the Fourier domain typically have a **greater dynamic range** than the spatial domain image, and are therefore usually stored as **float values** for sufficient precision.
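The magnitude/phase split, and the need to keep both for reconstruction, can be demonstrated on a small random array standing in for an image:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((8, 8))          # small stand-in for a grayscale image

F = np.fft.fft2(img)

magnitude = np.abs(F)             # what is usually displayed
phase = np.angle(F)               # needed for faithful reconstruction

# Recombine magnitude and phase, then invert the transform
F_rebuilt = magnitude * np.exp(1j * phase)
img_rebuilt = np.fft.ifft2(F_rebuilt).real

print(np.allclose(img, img_rebuilt))  # True: both parts together recover the image
```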
### Example

Given a rectangular signal in the time domain, we can take its Fourier transform to obtain a sinc function. The derivation is given below.

We consider a rectangular pulse:

$$ f(t) = \begin{cases} A, & -\frac{W}{2} \leq t \leq \frac{W}{2} \\ 0, & \text{otherwise} \end{cases} $$

The continuous Fourier transform of \( f(t) \) is given by:

$$ F(\mu) = \int_{-\infty}^{\infty} f(t) e^{-j2\pi \mu t} \, dt $$

$$ F(\mu) = \int_{-\frac{W}{2}}^{\frac{W}{2}} A e^{-j2\pi \mu t} \, dt $$

This reduces to

$$ F(\mu) = A \int_{-\frac{W}{2}}^{\frac{W}{2}} e^{-j2\pi \mu t} \, dt $$

$$ = \left[ \frac{-A}{j2\pi \mu} e^{-j2\pi \mu t} \right]_{-\frac{W}{2}}^{\frac{W}{2}} = \frac{-A}{j2\pi \mu} \left( e^{-j\pi \mu W} - e^{j\pi \mu W} \right) $$

$$ = \frac{A}{j2\pi \mu} \left( e^{j\pi \mu W} - e^{-j\pi \mu W} \right) $$

We use Euler's identity:

$$ e^{j\theta} - e^{-j\theta} = 2j \sin(\theta) $$

$$ F(\mu) = \frac{A}{j2\pi \mu} \cdot (2j \sin(\pi \mu W)) = A \cdot \frac{\sin(\pi \mu W)}{\pi \mu} $$

Multiplying numerator and denominator by \( W \):

$$ F(\mu) = AW \cdot \frac{\sin(\pi \mu W)}{\pi \mu W} $$

Thus, we arrive at:

$$ F(\mu) = AW \cdot \text{sinc}(\pi \mu W) $$

where

$$ \text{sinc}(x) = \frac{\sin(x)}{x} $$

---

The sinc function is quite useful for filtering and interpolation in digital signal processing. We can see from the derivation that the pulse is defined over a finite interval, and this function forms the basis of filter design. The sinc is an ideal low-pass filter: signal components with frequencies below the cutoff frequency pass unaffected, while higher frequencies are removed completely.
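As a numerical sanity check of the rect-to-sinc pair, we can approximate the continuous Fourier integral by a Riemann sum over a sampled pulse and compare it to the analytic formula. The amplitude, width, time grid and frequency values below are arbitrary choices:

```python
import numpy as np

A, W = 2.0, 1.0                      # pulse amplitude and width (example values)
dt = 1e-3
t = np.arange(-4, 4, dt)             # time grid covering the pulse
f = np.where(np.abs(t) <= W / 2, A, 0.0)

def analytic_F(mu):
    # F(mu) = A * W * sinc(pi * mu * W), with sinc(x) = sin(x) / x
    x = np.pi * mu * W
    return A * W * (np.sin(x) / x if x != 0 else 1.0)

def numeric_F(mu):
    # Riemann-sum approximation of the continuous Fourier integral
    return (np.sum(f * np.exp(-2j * np.pi * mu * t)) * dt).real

for mu in [0.0, 0.5, 1.0, 1.5]:
    print(mu, numeric_F(mu), analytic_F(mu))  # the two columns agree closely
```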
### Python Code Implementation for DFT

```python=
import cv2
import numpy as np
import math

def my_dft(img_path):
    # Read the image in grayscale
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise ValueError("Image not found")

    # Convert it to a numpy array
    img_data = np.asarray(img, dtype=np.float32)
    M, N = img_data.shape  # image dimensions

    # Initialize matrix to store complex DFT values
    dft_matrix = np.zeros((M, N), dtype=np.complex64)

    # Perform the 2D DFT directly from the definition
    for u in range(M):          # u is the frequency along the vertical axis
        for v in range(N):      # v is the frequency along the horizontal axis
            sum_val = 0.0 + 0.0j  # initialize sum_val as a complex number
            # Sum over all spatial points (x, y)
            for x in range(M):
                for y in range(N):
                    angle = -2 * math.pi * ((u * x / M) + (v * y / N))
                    # Add the contribution of the current pixel f(x, y)
                    sum_val += img_data[x, y] * np.exp(1j * angle)
            # Store the result for the frequency coordinate (u, v)
            dft_matrix[u, v] = sum_val
    return dft_matrix
```

### Fourier transform of an image

I left the code to run for more than 10 minutes on my CPU, but the computation of the Fourier transform was still incomplete. The reason is that this naive DFT has $O(M^2 N^2)$ complexity: for each of the $M \times N$ output frequencies it sums over all $M \times N$ pixels, which quickly becomes intractable for a full-sized image. As an alternative, I considered using the fast Fourier transform built into numpy, and presented below is the code implementation as well as the results.
```python=
import numpy as np
import matplotlib.pyplot as plt
import cv2

# Load image
img = cv2.imread('kodim10.png', cv2.IMREAD_GRAYSCALE)
if img is None:
    raise ValueError("Image not found!")

# Perform 2D FFT
fft_result = np.fft.fft2(img)

# Shift the zero-frequency component to the center
fft_shifted = np.fft.fftshift(fft_result)

# Compute the magnitude spectrum: reveals the strength of each frequency component
magnitude_spectrum = np.abs(fft_shifted)
```

### Output of the fourier transformation

![image](https://hackmd.io/_uploads/HJLlXsj1el.png)

**low frequencies** - slowly varying components of the image
**high frequencies** - rapidly changing image parts e.g. edges and noise

## Image Blurring using Gaussian filter

```python=
import cv2
import numpy as np
import matplotlib.pyplot as plt

# Read the image
img = cv2.imread('kodim10.png', cv2.IMREAD_GRAYSCALE)
if img is None:
    raise ValueError("Image not found!")

# Perform 2D FFT
fft_result = np.fft.fft2(img)

# Shift the zero-frequency component to the center
fft_shifted = np.fft.fftshift(fft_result)

# Create a low-pass filter (Gaussian filter)
rows, cols = img.shape
crow, ccol = rows // 2, cols // 2

# Create a meshgrid for the filter
x, y = np.meshgrid(np.arange(cols), np.arange(rows))
d = np.sqrt((x - ccol)**2 + (y - crow)**2)  # distance to the center

# Gaussian low-pass filter
sigma = 30
low_pass_filter = np.exp(-(d**2) / (2 * sigma**2))

# Apply the low-pass filter to the FFT of the image
fft_shifted_filtered = fft_shifted * low_pass_filter

# Inverse FFT to get the image back in the spatial domain
fft_filtered = np.fft.ifftshift(fft_shifted_filtered)  # reverse the shift
img_filtered = np.fft.ifft2(fft_filtered)              # apply inverse FFT

# Take the magnitude of the (ideally real) result
img_filtered = np.abs(img_filtered)

# Plot the original and blurred images
def plot_images(input_img, filtered_img, title, title_2):
    fig, ax = plt.subplots(1, 2, figsize=(12, 6))
    # Plot the first image
    ax[0].imshow(input_img, cmap='gray')
    ax[0].set_title(title)
    ax[0].axis("off")
    # Plot the second image
    ax[1].imshow(filtered_img, cmap='gray')
    ax[1].set_title(title_2)
    ax[1].axis("off")
    plt.show()

plot_images(img, img_filtered, "Original Image", "Blurred Image (Frequency Domain)")
```

Output of the processed images

![image](https://hackmd.io/_uploads/B18oi2oklg.png)

:::success
In a nutshell, we can summarize image processing with the Fourier transform as follows. We map the spatial image (pixel values) to the frequency domain, where we can manipulate the frequency components of the matrix to achieve the desired goal. Recall that when performing convolution, we convolve a filter with the input image to obtain a feature map; in the frequency domain we simply multiply the filter with the transformed image to obtain the filtered result. Finally, we perform the inverse Fourier transform to retrieve the processed image.
:::
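This convolve-versus-multiply equivalence can be verified end to end on a small random array: circular convolution computed directly from the definition matches multiplying the two DFTs and inverting. The 8×8 size and the box kernel are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.random((8, 8))              # small stand-in for an image
kernel = np.zeros((8, 8))
kernel[:3, :3] = 1 / 9                # 3x3 box blur, embedded in an 8x8 array

# Spatial domain: circular 2D convolution computed directly from the definition
spatial = np.zeros((8, 8))
for i in range(8):
    for j in range(8):
        for n in range(8):
            for m in range(8):
                spatial[i, j] += img[(i - n) % 8, (j - m) % 8] * kernel[n, m]

# Frequency domain: element-wise multiplication of the two DFTs, then invert
freq = np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kernel)).real

print(np.allclose(spatial, freq))  # True
```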