:::info
# <center><i class="fa fa-rocket"></i> Week 3</center>
### <center>Stereovision - Depth Estimation </center>
:::
## Introduction
Having looked briefly at camera calibration, in this section we dig a little deeper into how we can estimate the depth of a point object.
Depth estimation using a single camera is not possible in the following sense: a point object projects to the same location on the image plane regardless of its distance from the camera, which makes it hard to give an estimate of that distance. To solve this issue, a system of two or more cameras (at least two are required) can be used to capture the same point object. Depth can then be estimated with basic geometry, by triangulating the rays from the two cameras.
Generally, stereo camera systems can be categorized as either parallel systems or general systems. In a parallel system the cameras are parallel to one another, meaning there is no relative rotation (the rotation matrix is the identity) and only the translation between the cameras is given. In a general system the cameras are positioned at an angle to one another, so both the translation vector and the rotation matrix must be given.
### Depth estimation: Parallel Calibrated Camera system
:::spoiler
Recall:
A calibrated camera is one for which all the intrinsic parameters (focal length, principal point, lens distortion) and extrinsic parameters (rotation and translation between the cameras) are known precisely.
:::
In the parallel camera set up, depth can be estimated using the following equation:
$$
\frac{T}{Z} = \frac{T + (x_r - x_l)}{Z - f}
$$
where $f$ is the focal length, $T$ is the baseline (the distance between the two camera centres), $Z$ is the depth, and $x_l$ and $x_r$ are the horizontal coordinates of the projected point in the left and right image planes.
We are interested in computing the depth, so we make $Z$ the subject of the formula:
$$
Z = \frac{fT}{x_l - x_r}
$$
The denominator, the difference between the horizontal image coordinates, is called the disparity. Once $Z$ is known, $X$ and $Y$ can be computed to obtain the full 3D point.
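To make the triangulation concrete, here is a minimal sketch (the function and all numbers are illustrative; it assumes $f$ is expressed in pixels, $T$ in metres, and image coordinates are measured from each camera's principal point):
```python=
def triangulate_parallel(x_l, x_r, y, f, T):
    """Recover (X, Y, Z) for a point seen by a parallel, calibrated stereo pair."""
    disparity = x_l - x_r      # in pixels; positive for a point in front of the rig
    Z = f * T / disparity      # depth from Z = fT / (x_l - x_r)
    X = x_l * Z / f            # back-project with the pinhole model
    Y = y * Z / f
    return X, Y, Z

# Illustrative numbers: f = 700 px, T = 0.12 m, disparity = 14 px -> Z = 6 m
print(triangulate_parallel(x_l=50, x_r=36, y=20, f=700, T=0.12))
```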
To find corresponding points we can scan along a horizontal line, but note that this scanline search is only valid for the parallel camera setup (or after rectification).
The sum of squared differences (SSD) is an algorithm that can be used to find matching points. For a pair of candidate patches it is computed as
$$
SSD(L, R) = \sum_x \sum_y \big( L(x, y) - R(x, y) \big)^2
$$
where $L$ and $R$ are the left-image and right-image patches.
We select the match that gives the minimal SSD cost; equivalently, if we choose to normalize the patches we obtain the normalized cross-correlation, for which the best match is the maximum.
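Below is a minimal sketch of SSD matching along a single scanline; it assumes two rectified grayscale images as float NumPy arrays, and the function name, patch half-width, and search range are illustrative choices rather than a library API:
```python=
import numpy as np

def ssd(patch_a, patch_b):
    # Sum of squared differences between two equally sized patches
    return np.sum((patch_a - patch_b) ** 2)

def match_along_scanline(left, right, row, col, half=3, max_disp=32):
    """Return the disparity that minimizes SSD for one left-image pixel."""
    ref = left[row - half:row + half + 1, col - half:col + half + 1]
    best_disp, best_cost = 0, np.inf
    for d in range(max_disp):
        c = col - d  # candidate column in the right image
        if c - half < 0:
            break
        cand = right[row - half:row + half + 1, c - half:c + half + 1]
        cost = ssd(ref, cand)
        if cost < best_cost:
            best_cost, best_disp = cost, d
    return best_disp
```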
To perform even better, classifiers can be trained that yield even stronger stereo matches.
Instead of matching single points, we can scan the entire image; the resulting image is called a disparity map. A disparity map shows, for every pixel, the disparity in pixels between the matching points in the left and right images.
As a general rule of thumb, objects closer to the camera have larger disparity, while objects farther from the camera have smaller disparity.
Based on this equation:
$$
Z = \frac{fT}{x_l - x_r}
$$
We can see that depth and disparity are inversely proportional. In general, smaller patches capture more detail but are noisier, in contrast to larger patches, which capture less detail but produce smoother results.
Markov random fields are a class of models that are even more effective for stereo matching. The attached link can be used as a point of reference: [Markov Random Fields](https://ermongroup.github.io/cs228-notes/representation/undirected/)
## Correspondence problem and disparity maps
Aside from the epipolar geometry constraint, the other constraints that aid us in identifying corresponding points include:
- Disparity gradient: disparity varies smoothly across a continuous surface
- Ordering: corresponding points appear in the same order along both scanlines
- Uniqueness: each pixel in one image has at most one match in the other
- Similarity: corresponding points have similar appearance (intensity/texture)
To compute the correspondences between pixels in the given images, block matching will be used; the approach is summarized below, with a code sketch after the list.
- Input: left and right images from perfectly aligned cameras
- Output: disparity map
- Block size: the neighborhood size we select when comparing pixels from the left image and the right image
- Pixel
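For reference, OpenCV ships a ready-made block matcher that implements this pipeline. A minimal sketch (the filenames `left.png` and `right.png` are placeholders for a rectified stereo pair):
```python=
import cv2
import matplotlib.pyplot as plt

# Load a rectified stereo pair as grayscale (placeholder filenames)
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matcher: numDisparities must be a multiple of 16, blockSize odd
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)

# compute() returns fixed-point disparities scaled by 16
disparity = stereo.compute(left, right).astype("float32") / 16.0

plt.imshow(disparity, cmap="gray")
plt.title("Disparity map (brighter = closer)")
plt.axis("off")
plt.show()
```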
# Image Processing Techniques
Histogram analysis of images is really important in image processing for several key reasons:
**Brightness and Contrast:** A histogram provides a visual representation of the distribution of pixel intensities. By looking at the histogram, you can quickly assess if an image is generally dark (most pixels concentrated on the left side), bright (most pixels on the right), or has good contrast (pixels spread across a wide range of intensities).
**Intensity Distribution:** It reveals the frequency of each intensity level in the image, giving an overall idea of the tonal distribution. This helps in understanding the characteristics of the image content.
**Dynamic Range:** The histogram shows the range of intensity values present in the image, indicating the extent of detail captured from the darkest to the brightest areas.
**Identifying Issues:** Problems like underexposure, overexposure (saturation), and low contrast can be easily spotted as skewed or narrow histograms. Spikes in the histogram can indicate the presence of quantization noise or specific dominant intensity values.
**Contrast Adjustment:** Based on the histogram, you can apply techniques like histogram stretching to expand the intensity range and increase contrast, making details more visible.
**Brightness Correction:** If an image is too dark or too bright, histogram analysis helps determine the appropriate shift in intensity values to improve overall brightness.
**Histogram Equalization:** This powerful technique aims to create a more uniform distribution of pixel intensities, leading to an increase in the overall contrast and revealing details in both dark and bright regions.
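Histogram equalization is available directly in OpenCV. A minimal sketch, assuming the same 8-bit grayscale test image that is loaded in the next section:
```python=
import cv2
import matplotlib.pyplot as plt

# Equalize an 8-bit grayscale image (cv2.equalizeHist requires uint8 input)
gray = cv2.imread("kodim21.png", cv2.IMREAD_GRAYSCALE)
equalized = cv2.equalizeHist(gray)

# Compare the intensity distributions before and after equalization
fig, ax = plt.subplots(1, 2, figsize=(10, 4))
ax[0].hist(gray.ravel(), bins=256)
ax[0].set_title("Original histogram")
ax[1].hist(equalized.ravel(), bins=256)
ax[1].set_title("Equalized histogram")
plt.show()
```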
### Python Code Implementation
```python=
#import the needed libraries
import cv2
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from skimage.util import random_noise
from skimage.metrics import peak_signal_noise_ratio as psnr
#read second image
img_2 = cv2.imread("kodim21.png", cv2.IMREAD_GRAYSCALE)
print(img_2.shape)
plt.imshow(img_2, cmap = "gray")
```
Resultant Image

```python=
#compute the histogram using opencv (16 bins over the range [0, 256))
hist_2 = cv2.calcHist([img_2], [0], None, [16], [0,256])
#plot the findings: the opencv histogram on the left and a
#numpy/matplotlib histogram of the raw pixel values on the right
fig, ax = plt.subplots(1, 2)
ax[0].set_title("Gray Scale Histogram (opencv)")
ax[0].plot(hist_2)
ax[1].set_title("Histogram of the image")
ax[1].hist(img_2.ravel(), bins=16)
```
Histogram Analysis

#### Observation
* There is a large peak around intensity ~130–150, which corresponds to the middle gray range.
* Few pixels are in the darker (left) or brighter (right) ends.
* This further confirms that the image has many mid-tones and likely lacks strong shadows or highlights — indicating a somewhat flat or evenly lit image
### Low contrast Image
```python=
#read the low-contrast image as grayscale
img_3 = cv2.imread("low_contrast.jpg", cv2.IMREAD_GRAYSCALE)
print(img_3.shape)
#compute the histogram
hist_3 = cv2.calcHist([img_3], [0], None, [16], [0,256])
#plot the findings
fig, ax = plt.subplots(1, 2)
ax[0].set_title(" Gray Scale Histogram (opencv)")
ax[0].plot(hist_3)
ax[1].set_title("Histogram of the image")
ax[1].hist(img_3.ravel(), bins=16)
```
### Output

### Original Image

### Observation
* An extremely high concentration of pixel values is in the first bin (close to 0), representing very dark (nearly black) pixels.
* Subsequent bins have sharply decreasing frequencies, approaching zero.
* This histogram suggests the image is heavily dominated by dark pixels, with very little variation in brightness.
### Elementwise operations - Adding Two Images
Elementwise addition combines two images by adding the pixel values at corresponding positions.
Purpose:
Used in image blending, transparency effects, or highlighting changes.
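Note that for 8-bit images, `cv2.add` (used in the example below) saturates rather than wrapping around, so each output pixel is

$$
c(x, y) = \min\big(a(x, y) + b(x, y),\ 255\big)
$$

whereas plain NumPy addition on `uint8` arrays would overflow modulo 256.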
Example is shown below

```python!
#read our images
lego_a = plt.imread("lego_a.jpg")
lego_b = plt.imread("lego_b.jpg")
#display the images
fig, (ax1, ax2) = plt.subplots(1,2, figsize=(10,5))
ax1.imshow(lego_a)
ax1.axis("off")
ax1.set_title("image a")
ax2.imshow(lego_b)
ax2.axis("off")
ax2.set_title("image b")
#perform arithmetic image operations
added_lego_img = cv2.add(lego_a, lego_b)
plt.imshow(added_lego_img)
plt.axis("off")
plt.title("Added Images")
```
#### Results

#### Some statistics
```python
#some statistics
#slice the image and obtain image channels separately for lego a
lego_a_red = lego_a[:,:,0]
#extract green (index 1 of an RGB array from plt.imread)
lego_a_green = lego_a[:,:,1]
#extract blue (index 2)
lego_a_blue = lego_a[:,:,2]
#slice the image and obtain image channels separately for lego b
lego_b_red = lego_b[:,:,0]
#extract green
lego_b_green = lego_b[:,:,1]
#extract blue
lego_b_blue = lego_b[:,:,2]
#flatten the arrays to 1D
flattened_red_lego_a = lego_a_red.ravel()
flattened_blue_lego_a = lego_a_blue.ravel()
flattened_green_lego_a = lego_a_green.ravel()
#statistics
# statitistics of color channel
# Red channel
mean_red_lego_a = np.mean(flattened_red_lego_a)
median_red_lego_a = np.median(flattened_red_lego_a)
mode_red_lego_a = stats.mode(flattened_red_lego_a, axis=None, keepdims=False)
#green channel
mean_green_lego_a = np.mean(flattened_green_lego_a)
median_green_lego_a = np.median(flattened_green_lego_a)
mode_green_lego_a = stats.mode(flattened_green_lego_a, axis=None, keepdims=False)
# blue channel
mean_blue_lego_a = np.mean(flattened_blue_lego_a)
median_blue_lego_a = np.median(flattened_blue_lego_a)
mode_blue_lego_a = stats.mode(flattened_blue_lego_a, axis=None, keepdims=False)
print(f"The mean of red channel is {mean_red_lego_a}")
print(f"The mode of red channel is {median_red_lego_a}")
print(f"The median of red channel is {mode_red_lego_a}")
print(" ")
print(f"The mean of blue channel is {mean_blue_lego_a}")
print(f"The mode of blue channel is {median_blue_lego_a}")
print(f"The median of blue channel is {mode_blue_lego_a}")
print(" ")
print(f"The mean of green channel is {mean_green_lego_a}")
print(f"The mode of green channel is {median_green_lego_a}")
print(f"The median of green channel is {mode_green_lego_a}")
```
### Results
**Image a**
The mean of red channel is 152.5122355015674
The median of red channel is 162.0
The mode of red channel is ModeResult(mode=np.uint8(161), count=np.int64(72165))
The mean of blue channel is 142.70874412225706
The median of blue channel is 159.0
The mode of blue channel is ModeResult(mode=np.uint8(158), count=np.int64(73333))
The mean of green channel is 145.2903614811912
The median of green channel is 161.0
The mode of green channel is ModeResult(mode=np.uint8(160), count=np.int64(65372))
**Added Images**
The mean of red channel is 238.8343230799373
The median of red channel is 255.0
The mode of red channel is ModeResult(mode=np.uint8(255), count=np.int64(816850))
The mean of blue channel is 226.36845513322885
The median of blue channel is 255.0
The mode of blue channel is ModeResult(mode=np.uint8(255), count=np.int64(786015))
The mean of green channel is 228.01535070532915
The median of green channel is 255.0
The mode of green channel is ModeResult(mode=np.uint8(255), count=np.int64(778667))
### Observation
* All three channels have mid-to-high average intensity values, suggesting the image is neither too dark nor overly saturated in any one color.
* The red channel has the highest mean, so the image may have a warmer tone overall.
* Mode ≈ Median in all channels suggests a fairly balanced distribution without strong skew.
## Matrix Scalar Operations: Brightness and Contrast Shift
```python=
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Load the grayscale image (file name assumed; the original snippet did not show this step)
gray_image = cv2.imread("kodim21.png", cv2.IMREAD_GRAYSCALE)
# Contrast/brightness shift: multiply by 1.5 and add 50, saturating at [0, 255]
adjusted_img = cv2.convertScaleAbs(gray_image, alpha=1.5, beta=50)
# Compute histograms
hist_original = cv2.calcHist([gray_image], [0], None, [256], [0, 256])
hist_adjusted = cv2.calcHist([adjusted_img], [0], None, [256], [0, 256])
# Compute statistics
def print_stats(name, image):
    print(f"\n{name} Image Statistics:")
    print(f"Mean: {np.mean(image):.2f}")
    print(f"Standard Deviation: {np.std(image):.2f}")
    print(f"Min: {np.min(image)}")
    print(f"Max: {np.max(image)}")
print_stats("Original", gray_image)
print_stats("Enhanced", adjusted_img)
# Plot images and histograms
plt.figure(figsize=(12, 6))
# Original image
plt.subplot(2, 2, 1)
plt.title("Original Image")
plt.imshow(gray_image, cmap='gray')
plt.axis("off")
# Enhanced image
plt.subplot(2, 2, 2)
plt.title("Enhanced Image")
plt.imshow(adjusted_img, cmap='gray')
plt.axis("off")
# Histogram of original
plt.subplot(2, 2, 3)
plt.title("Histogram - Original")
plt.plot(hist_original, color='black')
plt.xlim([0, 256])
# Histogram of enhanced
plt.subplot(2, 2, 4)
plt.title("Histogram - Enhanced")
plt.plot(hist_adjusted, color='black')
plt.xlim([0, 256])
plt.tight_layout()
plt.show()
```
The code snippet increases the contrast of the grayscale image by a factor of 1.5 and adds 50 to the brightness of each pixel, resulting in an image with more distinct light and dark areas and an overall brighter appearance.
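The underlying point operation is the linear transform

$$
g(x, y) = \alpha \cdot f(x, y) + \beta, \qquad \alpha = 1.5,\ \beta = 50
$$

with the result saturated to the valid range $[0, 255]$.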

### Matrix Exponentiation (Gamma Correction)
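Gamma correction maps each normalized intensity $r \in [0, 1]$ to

$$
s = r^{\gamma}
$$

so exponents below 1 lift dark values while exponents above 1 suppress them.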
```python=
# Load grayscale image
img = cv2.imread("kodim08.png", cv2.IMREAD_GRAYSCALE)
if img is None:
    raise ValueError("Image not found.")
# Gamma values to apply
my_exponents = [0.4, 1, 1.5, 2]
# Set up the plot
plt.figure(figsize=(10, 12))
for i, gamma in enumerate(my_exponents):
    # Normalize to range [0, 1]
    normalized = img / 255.0
    # Apply gamma correction
    corrected = np.power(normalized, gamma)
    # Rescale to [0, 255] and convert to uint8
    corrected_img = np.uint8(np.clip(corrected * 255, 0, 255))
    # Compute histogram
    hist = cv2.calcHist([corrected_img], [0], None, [256], [0, 256])
    # Show gamma-corrected image
    plt.subplot(len(my_exponents), 2, i * 2 + 1)
    plt.imshow(corrected_img, cmap='gray')
    plt.title(f"Gamma = {gamma}")
    plt.axis("off")
    # Show histogram
    plt.subplot(len(my_exponents), 2, i * 2 + 2)
    plt.plot(hist, color='black')
    plt.title(f"Histogram (Gamma = {gamma})")
    plt.xlim([0, 256])
plt.tight_layout()
plt.show()
```
### Results

* **Gamma < 1:** brightens the image; dark pixels are lifted, revealing detail in the shadows. Useful for underexposed images.
* **Gamma = 1:** no change; the output is identical to the input.
* **Gamma > 1:** darkens the image; dark regions are compressed while differences in the highlights are stretched out. Useful for overexposed images or for adding contrast.
## Downsampling
Downsampling is about creating a more compact version of an image by leaving out some pixels while holding onto the most important details. This technique finds its way into various fields, from squeezing images for storage to making real-time visual tasks snappier, like in computer vision and graphics.
```python=
# Load a grayscale image and normalize to [0, 1]
my_img = cv2.imread("kodim08.png", cv2.IMREAD_GRAYSCALE) / 255.0
# Naive downsampling: keep every second pixel (interleaving/slicing)
interleaved = my_img[::2, ::2]
# Smooth with a 2x2 box filter first, then interleave (reduces aliasing)
kernel = np.ones((2, 2), np.float32) / 4  # Average filter
smoothed = cv2.filter2D(my_img, -1, kernel)
conv_interleaved = smoothed[::2, ::2]
# Plot the original and both downsampled versions
plt.figure(figsize=(12, 6))
# Original (resized to same scale)
plt.subplot(1, 3, 1)
plt.imshow(my_img, cmap='gray')
plt.title("Original (Normalized)")
plt.axis("off")
# Interleaved
plt.subplot(1, 3, 2)
plt.imshow(interleaved, cmap='gray')
plt.title("Interleaved (No Filter)")
plt.axis("off")
# Convolution + Interleaved
plt.subplot(1, 3, 3)
plt.imshow(conv_interleaved, cmap='gray')
plt.title("Filtered + Interleaved")
plt.axis("off")
plt.tight_layout()
plt.show()
```
### Results

## PSNR
Peak Signal-to-Noise Ratio (PSNR) is a widely used metric for assessing the quality of a reconstructed or compressed image compared to its original version. It quantifies the ratio between the maximum possible signal (image intensity) and the noise introduced during processing. A higher PSNR value indicates better image quality: values above 40 dB are generally considered excellent, while values below 30 dB suggest poor quality and noticeable degradation. PSNR is particularly sensitive to large errors, making it a useful tool for detecting significant distortions. In practice, PSNR can be used to evaluate the effect of noise (e.g., Gaussian noise added with `random_noise(lego_a, mode='gaussian', var=var)`) or to compare different image restoration or compression methods.
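Concretely, PSNR is defined through the mean squared error (MSE) between the original image $I$ and the distorted image $K$ (both of size $m \times n$):

$$
MSE = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\big(I(i,j) - K(i,j)\big)^2, \qquad PSNR = 10 \log_{10}\left(\frac{MAX^2}{MSE}\right)
$$

where $MAX$ is the maximum possible pixel value (255 for 8-bit images, 1.0 for normalized floats).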
```python=
# Load the original image
lego_a = plt.imread("lego_a.jpg")
# Normalize to float [0, 1] if the image came in as uint8
if lego_a.dtype != np.float32 and lego_a.dtype != np.float64:
    lego_a = lego_a / 255.0
# Noise variances for low, medium, high intensity
noise_variances = [0.02, 0.05, 1.5]
noisy_images = []
psnr_values = []
# Add noise and compute PSNR (data_range=1.0 since pixels are in [0, 1])
for var in noise_variances:
    noisy = random_noise(lego_a, mode='gaussian', var=var)
    noisy_images.append(noisy)
    psnr_val = psnr(lego_a, noisy, data_range=1.0)
    psnr_values.append(psnr_val)
# Plot original and noisy images with PSNR values
fig, axes = plt.subplots(1, 4, figsize=(16, 5))
axes[0].imshow(lego_a)
axes[0].set_title("Original")
axes[0].axis("off")
titles = ["Low", "Medium", "High"]
for i in range(3):
    axes[i+1].imshow(noisy_images[i])
    axes[i+1].set_title(f"{titles[i]} Noise\nPSNR: {psnr_values[i]:.2f} dB")
    axes[i+1].axis("off")
plt.tight_layout()
plt.show()
```
### Results

Afterwards, the noise was reduced using a simple box (mean) filter; the code is below.
```python=
psnr_denoised = []
denoised_images = []
for i in range(3):
    # Convert to 8-bit, apply a 3x3 box (mean) filter, then back to [0, 1]
    noisy_8bit = (noisy_images[i] * 255).astype(np.uint8)
    denoised = cv2.blur(noisy_8bit, (3, 3))
    denoised = denoised / 255.0
    denoised_images.append(denoised)
    psnr_val = psnr(lego_a, denoised, data_range=1.0)
    psnr_denoised.append(psnr_val)
    print(f"Denoised PSNR for noise level {i+1}: {psnr_val:.2f} dB")
# Plotting
fig, axes = plt.subplots(3, 3, figsize=(12, 9))
titles = ['Original', 'Noisy', 'Denoised']
for i in range(3):
    axes[i, 0].imshow(lego_a)
    axes[i, 0].set_title(titles[0])
    axes[i, 1].imshow(noisy_images[i])
    axes[i, 1].set_title(titles[1] + f" {i+1}")
    axes[i, 2].imshow(denoised_images[i])
    axes[i, 2].set_title(titles[2] + f" {i+1}\nPSNR: {psnr_denoised[i]:.2f} dB")
    for j in range(3):
        axes[i, j].axis('off')
plt.tight_layout()
plt.show()
```
### Results

:::info
# <center><i class="fa fa-rocket"></i> Week 3 - Continuation </center>
### <center>Stereovision - Depth Estimation </center>
:::