# Group[5]_HW[2]_report

## Task 1 - Hybrid Image

### Introduction
- Hybrid images blend the low-frequency content of one image with the high-frequency details of another, producing a single image that looks different depending on the viewing distance. This task creates hybrid images by applying low-pass and high-pass filters (both ideal and Gaussian) in the Fourier domain, illustrating how frequency manipulation affects visual perception.

### Implementation Procedure
- The code in this block automatically detects the number of images in the data directory and creates hybrid images from consecutive pairs (in sorted file-name order). The filter can be switched by changing the last argument (`'ideal'` or `'gaussian'`).
```python=
import os

img_dir = './data/task1and2_hybrid_pyramid/'
img_list = sorted(os.listdir(img_dir))
D0 = 30  # cutoff frequency

for i in range(len(img_list) // 2):
    low_name, high_name = img_list[i*2], img_list[i*2+1]
    print(f"low-pass image: {low_name}, high-pass image: {high_name}")
    img1_result, img2_result, hybrid_result = hybrid_images(
        img_dir + low_name, img_dir + high_name, D0, 'gaussian')
    plot_images(img1_result, img2_result, hybrid_result)  # display helper, not listed in this report
```
- The function below creates a hybrid image by combining the low-frequency content of one image with the high-frequency content of another. It reads the two images and resizes them to a common size, applies the Fourier transform, filters the spectra with a low-pass and a high-pass filter, and converts them back to the spatial domain. Finally, the two filtered images are added together to produce the hybrid image. The procedure follows the guidelines outlined in the homework slides.
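
One detail of this procedure is how the spectrum gets centered: the `shift` helper in the report's code multiplies pixel $(i, j)$ by $(-1)^{i+j}$ before the FFT. The sketch below is not part of the submitted code. `center_by_sign` is a hypothetical helper name and the test image is random data. Under the assumption that height and width are both even, it checks that the sign trick yields the same centered spectrum as applying `np.fft.fftshift` after the transform.

```python
import numpy as np

def center_by_sign(img):
    """Multiply pixel (i, j) by (-1)**(i + j), vectorized (hypothetical helper)."""
    rows, cols = np.indices(img.shape)
    sign = np.where((rows + cols) % 2 == 0, 1.0, -1.0)
    return img * sign

rng = np.random.default_rng(0)
img = rng.random((8, 6))  # assumption: even height and width

# Spectrum centered by the sign pattern applied in the spatial domain
spectrum_sign = np.fft.fft2(center_by_sign(img))
# Spectrum centered by reordering the frequency bins after the FFT
spectrum_shift = np.fft.fftshift(np.fft.fft2(img))

same = np.allclose(spectrum_sign, spectrum_shift)
print(same)
```

For odd dimensions the two spectra are no longer identical, which is why the sketch sticks to even sizes.
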
```python=+
import cv2
import numpy as np

def hybrid_images(img1_pwd, img2_pwd, D0, mode):
    # Read both images as float32 grayscale and bring them to a common size
    img1 = np.float32(cv2.imread(img1_pwd, cv2.IMREAD_GRAYSCALE))
    img2 = np.float32(cv2.imread(img2_pwd, cv2.IMREAD_GRAYSCALE))
    img1, img2 = resize_images(img1, img2)

    # Multiply by (-1)^(i+j) to center the spectrum, then take the 2-D FFT
    img1_fft = np.fft.fft2(shift(img1))
    img2_fft = np.fft.fft2(shift(img2))

    # Low-pass the first image and high-pass the second one
    if mode == 'ideal':
        img1_fft_filtered = ideal_filter(img1_fft, D0, 'low')
        img2_fft_filtered = ideal_filter(img2_fft, D0, 'high')
    else:
        img1_fft_filtered = gaussian_filter(img1_fft, D0, 'low')
        img2_fft_filtered = gaussian_filter(img2_fft, D0, 'high')

    # Back to the spatial domain; shifting again undoes the sign pattern
    img1_shifted = np.fft.ifft2(img1_fft_filtered)
    img2_shifted = np.fft.ifft2(img2_fft_filtered)
    img1_result = shift(np.real(img1_shifted))
    img2_result = shift(np.real(img2_shifted))

    hybrid_result = img1_result + img2_result
    return img1_result, img2_result, hybrid_result
```
- The functions in this block are implemented according to the definitions provided in the homework slides.
```python=+
def ideal_filter(img, D0, mode):
    # Applies an ideal low-pass or high-pass filter with cutoff frequency D0 to a centered spectrum.
    rows, cols = img.shape
    row_center, col_center = rows // 2, cols // 2
    u, v = np.ogrid[-row_center:rows-row_center, -col_center:cols-col_center]
    D_uv = np.sqrt(u**2 + v**2)  # distance from the center of the spectrum
    if mode == 'low':
        result = np.where(D_uv <= D0, 1, 0)
    else:
        result = np.where(D_uv <= D0, 0, 1)
    return result * img

def gaussian_filter(img, D0, mode):
    # Applies a Gaussian low-pass or high-pass filter with cutoff frequency D0 to a centered spectrum.
    rows, cols = img.shape
    row_center, col_center = rows // 2, cols // 2
    u, v = np.ogrid[-row_center:rows-row_center, -col_center:cols-col_center]
    D_uv = np.sqrt(u**2 + v**2)
    if mode == 'low':
        result = np.exp(-D_uv**2 / (2 * D0**2))
    else:
        result = 1 - np.exp(-D_uv**2 / (2 * D0**2))
    return result * img

def shift(img):
    # Multiplies pixel (i, j) by (-1)^(i+j) so that the zero frequency ends up at the center of the spectrum.
    i, j = np.indices(img.shape)
    return img * np.where((i + j) % 2 == 0, 1.0, -1.0)

def resize_images(img1, img2):
    # Resizes both images to the smaller height and the smaller width of the two.
    min_height = min(img1.shape[0], img2.shape[0])
    min_width = min(img1.shape[1], img2.shape[1])
    img1_resized = cv2.resize(img1, (min_width, min_height))
    img2_resized = cv2.resize(img2, (min_width, min_height))
    return img1_resized, img2_resized
```

### Experimental Results
The last image pair is an additional one that we downloaded from the Internet.
- ${D_0=f_{cutoff}=10}$
    - ![output_10](https://hackmd.io/_uploads/Hyx6vFJxJx.jpg)
- ${D_0=f_{cutoff}=30}$
    - ![output_30](https://hackmd.io/_uploads/By9kvYJlyl.jpg)
- ${D_0=f_{cutoff}=50}$
    - ![output_50](https://hackmd.io/_uploads/B1i1_YkeJx.jpg)

### Discussion
1. Images produced with different cutoff frequencies look different at different viewing distances, which shows that the human eye responds differently to low-frequency and high-frequency content.
2. Hybrid images can be used in applications such as art and image processing, and may even be used to add inconspicuous watermarks to images for security purposes.
3. Because different cutoff frequencies yield different visual effects, the threshold can be tuned to suit the intended application.

### Conclusion
Hybrid images combine the low-frequency content of one image with the high-frequency content of another to create a dual-image effect, which is useful for artistic expression and for practical applications such as image processing and security. By selecting an appropriate cutoff frequency, we can control which of the two images the viewer perceives at a given distance.

## Task 2 - Image Pyramid

### Introduction
- A Gaussian pyramid is a stack of image layers, with the original image at the bottom, denoted $G(0)$.
Going up one level, from $G(i)$ to $G(i+1)$, the image shrinks to one-fourth of its previous size in pixels, since both height and width are halved.
- More specifically, to produce $G(i+1)$ we first convolve $G(i)$ with a Gaussian kernel and then remove the even-numbered rows and columns (the 2nd, 4th, ... ones, i.e. 0-based indices $1,3,5,\dots$) from the blurred result.

![20161732N421HzOB7P](https://hackmd.io/_uploads/Sk7EJiMeJe.png)

### Implementation Procedure
- The following function applies the layer-to-layer step repeatedly, converting $G(i)$ to $G(i+n)$, where $n$ is the number of rounds applied to the image. The loop runs another round only while both height and width are at least 32 pixels.
```python=
import cv2
import glob
import numpy as np
import matplotlib.pyplot as plt
import os

def gaussian_pyramid(image):
    """
    Image pyramid
    1. Set the finest scale layer to the image
    2. For each layer, going from next to finest to coarsest:
       obtain this layer by smoothing the next finest layer
       with a Gaussian, and then subsampling it.
    """
    # 5x5 Gaussian kernel, normalized so that its entries sum to 1
    kernel = np.array([[1,  4,  6,  4, 1],
                       [4, 16, 24, 16, 4],
                       [6, 24, 36, 24, 6],
                       [4, 16, 24, 16, 4],
                       [1,  4,  6,  4, 1]], dtype=np.float32)
    kernel = kernel / kernel.sum()

    # Run another round only while both dimensions are at least 32 pixels
    while image.shape[0] >= 32 and image.shape[1] >= 32:
        # image = cv2.pyrDown(image)  # OpenCV call used for comparison
        image = cv2.filter2D(image, -1, kernel)  # blur
        image = image[::2, ::2]  # subsample: keep every second row and column
    return image
```
- This is the main part, which reads the images, runs the function, and saves the processed images.
```python=
input_folder = os.path.join(os.getcwd(), 'data/task1and2_hybrid_pyramid')
output_folder = os.path.join(os.getcwd(), 'GP_output')
os.makedirs(output_folder, exist_ok=True)

for filename in os.listdir(input_folder):
    file_in_path = os.path.join(input_folder, filename)
    image = cv2.imread(file_in_path)
    if image is None:  # skip files that OpenCV cannot read as images
        continue
    image = gaussian_pyramid(image)
    file_out_path = os.path.join(output_folder, filename)
    cv2.imwrite(file_out_path, image)
```

### Experimental Results
- Downsampling continues only while both height and width are at least 32 pixels; otherwise the images would be too blurry to observe.
- The example below shows how an image becomes smaller and smaller going up the levels of the pyramid.

![on](https://hackmd.io/_uploads/B1DGdiGxke.png)

### Discussion
- We compared the images produced by our hand-crafted Gaussian pyramid with those produced by OpenCV's *cv2.pyrDown* and found the results to be basically the same.
    - hand-crafted
      ![4_einstein](https://hackmd.io/_uploads/rJ8NVifg1e.bmp)
    - *cv2.pyrDown*
      ![4_einstein](https://hackmd.io/_uploads/BJ-SNsfgkl.bmp)

### Conclusion
- In this part, we were asked to implement a Gaussian pyramid. After several rounds of downsampling, information from the original images is lost and the results are blurred. If we upsampled the result back toward the original size (to reconstruct the image), the information loss caused by this process would be clearly visible.

## Task 3 - Colorizing the Russian Empire

### Introduction
This assignment explores early color photography using digitized Prokudin-Gorskii glass plate images. Sergei Mikhailovich Prokudin-Gorskii was a pioneer who, in the early 20th century, developed a technique for capturing color photographs by taking three separate black-and-white exposures through red, green, and blue filters. The three exposures were recorded on a single glass plate.
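
To make the plate layout concrete, the following is a minimal sketch on synthetic data rather than the report's pipeline. It assumes the three exposures are stacked top to bottom as blue, green, red (the order used by the `colorize` function in this report), and it aligns raw intensities with an exhaustive search over `np.roll` shifts. `split_plate` and `best_shift` are hypothetical helper names, and both green and red are aligned directly to blue to keep the example short.

```python
import numpy as np

def split_plate(plate):
    """Split a stacked plate (assumed order: B, G, R from top to bottom) into three equal parts."""
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]

def best_shift(moving, fixed, radius=3):
    """Search all (dy, dx) rolls within +/- radius and return the one with the smallest
    sum of squared differences to the fixed channel."""
    best, best_err = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            rolled = np.roll(moving, (dy, dx), axis=(0, 1))
            err = np.sum((rolled - fixed) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

# Build a toy plate: the same random scene, with G and R circularly shifted
rng = np.random.default_rng(0)
scene = rng.random((20, 16))
plate = np.vstack([
    scene,                                  # blue (reference)
    np.roll(scene, (2, -1), axis=(0, 1)),   # green, displaced
    np.roll(scene, (-1, 3), axis=(0, 1)),   # red, displaced
])

b, g, r = split_plate(plate)
g_shift = best_shift(g, b)
r_shift = best_shift(r, b)

# Undo the displacements and stack the channels in R, G, B order
rgb = np.dstack([
    np.roll(r, r_shift, axis=(0, 1)),
    np.roll(g, g_shift, axis=(0, 1)),
    b,
])
print(g_shift, r_shift, rgb.shape)
```

Real scans are far larger than this toy plate, so an exhaustive search at full resolution becomes slow; searching on downscaled copies first keeps the search radius small.
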
### Implementation Procedure
* The main colorization function splits the image into three equal parts representing the blue, green, and red channels, computes the displacement between the green and blue channels and then between the red and the aligned green channels, and finally saves the colorized image as a JPG file in the desired directory.
```python=
import math
import os
import time

import numpy as np
import skimage as sk
import skimage.feature  # makes sk.feature.canny available
import skimage.io as skio
import skimage.transform as sktr

def colorize(file, outputDirPath):
    print(file)
    filename = os.path.basename(file)
    outputFileName = filename.split('.')[0] + '.jpg'
    # run a timer
    start_time = time.time()
    im = skio.imread(file)
    im = sk.img_as_float(im)
    # compute the height of each part (just 1/3 of total)
    height = im.shape[0] // 3
    # separate color channels (stacked top to bottom: blue, green, red)
    b = im[:height]
    g = im[height: 2 * height]
    r = im[2 * height: 3 * height]
    # crop the borders of each channel
    b = crop_center(b)
    g = crop_center(g)
    r = crop_center(r)
    # number of levels in image pyramid (clamped so small images do not give a negative count)
    num_runs = max(0, math.floor(math.log2(b.shape[1] / 100)))
    displacements = [[0, 0], [0, 0]]
    rgb = pyramid_align(r, g, b, num_runs, displacements)
    print("Runtime: %.2f seconds" % (time.time() - start_time))
    # scale from [0, 1] to 8-bit values
    img = (255 * rgb).astype(np.uint8)
    # save the image
    fname = os.path.join(outputDirPath, outputFileName)
    skio.imsave(fname, img)
    # display the image
    skio.imshow(img)
    skio.show()

# keeps the central 85% of the image in each dimension
def crop_center(img):
    rows, cols = img.shape
    crop_rows = int(8.5 * rows / 10)
    margin_rows = int((rows - crop_rows) / 2)
    crop_cols = int(8.5 * cols / 10)
    margin_cols = int((cols - crop_cols) / 2)
    return img[margin_rows: rows - margin_rows, margin_cols: cols - margin_cols]

# performs edge detection using canny filtering (sigma = 3)
def get_edges(A):
    return sk.feature.canny(A, 3)

def pyramid_align(r, g, b, num_runs, displacements):
    # base case
    if num_runs == 0:
        print(displacements)
        return np.dstack((r, g, b))
    factor = 2 ** num_runs
    # rescale the images and keep only their edge maps
    new_r = get_edges(sktr.rescale(r, 1 / factor))
    new_g = get_edges(sktr.rescale(g, 1 / factor))
    new_b = get_edges(sktr.rescale(b, 1 / factor))
    ag_d = get_displacement(new_g, new_b)
    ag = align(new_g, ag_d)
    ar_d = get_displacement(new_r, ag)
    # align original images relative to rescaled image displacement
    g = align(g, [ag_d[0] * factor, ag_d[1] * factor])
    r = align(r, [ar_d[0] * factor, ar_d[1] * factor])
    # first row is green displacement
    # second row is red displacement
    # multiply to keep track of displacement
    displacements[0][0] += ag_d[0] * factor
    displacements[0][1] += ag_d[1] * factor
    displacements[1][0] += ar_d[0] * factor
    displacements[1][1] += ar_d[1] * factor
    return pyramid_align(r, g, b, num_runs - 1, displacements)

# sum of squared differences; the inputs are boolean edge maps,
# so XOR marks the pixels where the two maps disagree
def ssd(im1, im2):
    return np.sum(im1 ^ im2)

def horizontal_shift(img, n):
    return np.roll(img, n, axis=1)

def vertical_shift(img, n):
    return np.roll(img, n, axis=0)

# returns best dx dy displacement
def get_displacement(A, B, threshold=15):
    dx = 0
    dy = 0
    min_ssd = ssd(A, B)
    for u in range(-1 * threshold, threshold):
        for v in range(-1 * threshold, threshold):
            displaced_img = align(A, [u, v])
            new_ssd = ssd(displaced_img, B)
            if new_ssd < min_ssd:
                dx = u
                dy = v
                min_ssd = new_ssd
    return [dx, dy]

# aligns images using horizontal and vertical shift
def align(A, d):
    return horizontal_shift(vertical_shift(A, d[1]), d[0])
```

### Experimental Results
> Original
![cathedral](https://hackmd.io/_uploads/ryx-jVimlke.jpg)

> Colorized
![cathedral](https://hackmd.io/_uploads/H1R0EoQeJg.jpg)

---

> Original
![monastery](https://hackmd.io/_uploads/H1Gero7eJl.jpg)

> Colorized
![monastery](https://hackmd.io/_uploads/BJaNri7eJe.jpg)

---

> Original
![tobolsk](https://hackmd.io/_uploads/ByBWSi7ekx.jpg)

> Colorized
> ![tobolsk](https://hackmd.io/_uploads/SkdSBimgkl.jpg)

### Discussion
1. Large files such as `.tif` scans need cropping and downsampling to keep the processing time reasonable.
2. When the image is scaled down, its brightness and contrast may change because of the change in size.
3. The single-color bands at the edges of the colorized images may be caused by misalignment when the image is split into three parts.

### Conclusion
In this part we delved into the fascinating world of early color photography by colorizing Prokudin-Gorskii's glass plate images of the Russian Empire. Through a combination of image processing techniques, we successfully extracted, aligned, and merged the red, green, and blue channels to reconstruct color images from the provided plates.

## Work Assignment
- 313560007 蕭皓隆: Task 1 Coding, Task 1 Report
- 313554015 周禹彤: Task 2 Coding, Task 2 Report
- 313551011 李佾: Task 3 Coding, Task 3 Report