# Week 1: Summary
:::info
### Objectives
The objectives of this week were to:
- Familiarize myself with epipolar geometry and camera calibration
- Set up a simple experiment using the RealSense D455 camera
:::
## 16.04.2025 – Camera Calibration
As someone interested in photography, I’ve often been confounded by the issue of distortion in photos. By distortion, I mean situations where we are absolutely certain that the scene we photographed contains straight lines—yet the resulting image shows them as curved. Interesting, isn't it? In some cases, object distances also appear misrepresented, leading us to wonder: why does this happen?
:::spoiler
The distortion effect arises from how cameras perceive the world. A camera’s "vision" gives us valuable insight into how such distortions occur.
:::
:::success
To mitigate distortion effects, **camera calibration** is essential. You can think of calibration as a corrective procedure that reduces the margin of error a camera makes when capturing and processing visual information.
:::
Camera calibration provides a scientific framework for understanding how a 3D point in space is projected onto a 2D image. This knowledge is fundamental in virtually every field that involves imaging or computer vision.
Calibration relies on well-established mathematical transformations rooted in linear algebra. At its core is the equation:
$$
\mathbf{X} = \mathbf{P}\,\mathbf{Y}
$$
Where:
- **X** is the 2D point in the image
- **P** is the transformation matrix (camera matrix)
- **Y** is the 3D point in the real world
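In the standard pinhole-camera model, this projection can be written out in homogeneous coordinates (up to a scale factor $\lambda$), with **P** splitting into an intrinsic and an extrinsic part:
$$
\lambda
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\underbrace{\mathbf{K}}_{\text{intrinsics}}\,
\underbrace{[\mathbf{R} \mid \mathbf{t}]}_{\text{extrinsics}}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
$$
where $(u, v)$ are the pixel coordinates of **X** and $(X_w, Y_w, Z_w)$ are the world coordinates of **Y**. The two parts are described below.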
:::danger
The **camera matrix** contains the transformation parameters that make it possible to convert a 3D point in space into a 2D image. These transformation components fall into two key categories:
- **Intrinsic Parameters** — Attributes of the camera itself
- **Extrinsic Parameters** — How the camera is positioned in space
:::
### 🔍 Intrinsic Parameters
The **intrinsic camera matrix** converts a point from the camera coordinate system to the pixel coordinate system. It is determined by the camera's internal characteristics:
- **Focal Length** – Indicates how “zoomed in” the view is
- **Principal Point** – The point where the optical axis meets the image plane (usually near the image center)
- **Skew** – Describes pixel shape and alignment
- **Aspect Ratio** – The ratio between pixel width and height
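These characteristics are usually collected into a single intrinsic matrix of the standard form
$$
\mathbf{K} =
\begin{bmatrix}
f_x & s & c_x \\
0 & f_y & c_y \\
0 & 0 & 1
\end{bmatrix}
$$
where $f_x$ and $f_y$ are the focal lengths expressed in pixels (their ratio encodes the aspect ratio), $(c_x, c_y)$ is the principal point, and $s$ is the skew (close to zero for most modern sensors).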
### 🧭 Extrinsic Parameters
The **extrinsic parameters** transform a point from world coordinates to the camera coordinate system. They describe the camera's position and orientation:
- **Rotation** – The camera’s orientation in 3D space
- **Translation** – The camera’s position relative to the world
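In equation form, a world point $\mathbf{X}_w$ is mapped into the camera frame as
$$
\mathbf{X}_{\text{cam}} = \mathbf{R}\,\mathbf{X}_w + \mathbf{t}
$$
where $\mathbf{R}$ is a 3×3 rotation matrix and $\mathbf{t}$ a translation vector; together they form the extrinsic part $[\mathbf{R} \mid \mathbf{t}]$ of the camera matrix.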
---
## ❓ Why Perform Camera Calibration?
Imagine being tasked with estimating the distance between two buildings or monitoring a vessel approaching a harbor. Without proper camera calibration, it’s impossible to obtain accurate measurements.
Today, camera calibration is critical across domains such as:
- 🤖 Robotics
- 🏗️ Construction
- 🏭 Industrial manufacturing
---
## 🔧 Simple Camera Calibration Setup
A basic camera calibration workflow includes the following steps:
1. Use a chessboard with known square dimensions
2. Capture multiple images from different angles, ensuring the pattern is clearly visible
3. Detect the chessboard corners
4. Compute the camera's intrinsic and extrinsic parameters
   - You can use **OpenCV**, which implements **Zhang's method**
5. Evaluate **reprojection errors** to assess calibration accuracy
---
### Sample D455 Camera Calibration Setup
In my simple experimental setup, I have an **Intel RealSense D455 camera**. RealSense stereo cameras work much like our own eyes: they combine a pair of infrared imagers with a conventional color sensor for accurate depth estimation. The D455's features include:
* Field of view of 86° × 57° (±3°)
* A recommended range of 40 cm to 6 m
* Accuracy of ±2% at 40 cm

The expected error rate of the D455 camera is less than 2% at 4 meters.
The D455 cameras come factory-calibrated, so we can use a simple Python script to read out the camera intrinsics. The implementation is shown below:
```python=
import pyrealsense2 as rs

# Create a pipeline. The pipeline transfers data from the device to the program
my_pipeline = rs.pipeline()
# Create a configuration object to configure the camera stream
my_config = rs.config()
# Configure the pipeline to stream color frames at 640x480, BGR8, 30 FPS
my_config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
# Start the pipeline with the given configuration
profile = my_pipeline.start(my_config)
# Get the active color stream profile
color_stream = profile.get_stream(rs.stream.color)
# Get the camera intrinsics from the stream profile
camera_intrinsics = color_stream.as_video_stream_profile().get_intrinsics()
# Stop streaming
my_pipeline.stop()

# Print the intrinsic parameters
print("Camera Intrinsic Parameters:")
print(f"Width: {camera_intrinsics.width}")
print(f"Height: {camera_intrinsics.height}")
print(f"PPX (cx): {camera_intrinsics.ppx}")
print(f"PPY (cy): {camera_intrinsics.ppy}")
print(f"fx: {camera_intrinsics.fx}")
print(f"fy: {camera_intrinsics.fy}")
print(f"Distortion Model: {camera_intrinsics.model}")
print(f"Distortion Coefficients: {camera_intrinsics.coeffs}")
```
I executed the script above and obtained the following results:
**Camera Intrinsic Parameters:**
* Width: 640
* Height: 480
* PPX (cx): 325.6593322753906
* PPY (cy): 244.73487854003906
* fx: 384.5003967285156
* fy: 383.7279357910156
* Distortion Model: distortion.inverse_brown_conrady
* Distortion Coefficients: [-0.05377735570073128, 0.06453442573547363, -0.0008009545272216201, 6.907136412337422e-05, -0.02068648859858513]
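For reference, the reported values correspond to the intrinsic matrix
$$
\mathbf{K} =
\begin{bmatrix}
384.50 & 0 & 325.66 \\
0 & 383.73 & 244.73 \\
0 & 0 & 1
\end{bmatrix}
$$
in the standard form introduced above (with the skew taken as zero, since the RealSense intrinsics do not report one).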
### Quick Test – Corner Detection on a Single Image
Having obtained the intrinsic camera parameters, I was keen to see if I could proceed with calibration in any way. So I decided to test the waters with a simple script that checks corner detection on a single image using the OpenCV library. The code is attached below:
```python=
# Import OpenCV and NumPy
import cv2
import numpy as np

# Checkerboard size (internal corners) and image path
CHECKERBOARD_SIZE = (28, 17)
image_path = r"C:\Users\lang_es\Desktop\CompVision\captured_images\image_20250422_123426.png"

# Read the image
img = cv2.imread(image_path)
# Convert to grayscale; corner detection operates on grayscale images
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Detect the chessboard corners
ret, corners = cv2.findChessboardCorners(gray_img, CHECKERBOARD_SIZE, None)

if ret:
    print("Checkerboard detected!")
    # Define the stopping criteria for corner refinement
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
    # Refine corner locations to sub-pixel accuracy
    corners = cv2.cornerSubPix(gray_img, corners, (3, 3), (-1, -1), criteria)
    # Draw and show the corners
    cv2.drawChessboardCorners(img, CHECKERBOARD_SIZE, corners, ret)
    cv2.imshow('Checkerboard Detection', img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
else:
    print("Checkerboard not detected!")
```
The detection result is shown in the window below.

Having tested and confirmed that the detection works, we can extend it to camera calibration. Before the calibration step, I captured a series of chessboard images from different angles. The code implementation is shown below:
```python=
import os
import cv2
import numpy as np
import pyrealsense2 as rs
from datetime import datetime

# Directory where captured images will be stored
save_dir = "captured_images"
# Create the directory if it does not already exist
os.makedirs(save_dir, exist_ok=True)

# Initialize the pipeline and start streaming
pipeline = rs.pipeline()
pipeline.start()
print("Press SPACEBAR to capture or ESC to exit")

try:
    while True:
        frames = pipeline.wait_for_frames()
        color_frame = frames.get_color_frame()
        if not color_frame:
            continue
        color_image = np.asanyarray(color_frame.get_data())
        cv2.imshow("RealSense Camera", color_image)
        key = cv2.waitKey(1)
        if key % 256 == 27:  # ESC
            print("Exiting...")
            break
        elif key % 256 == 32:  # SPACE
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            filename = f"{save_dir}/image_{timestamp}.png"
            cv2.imwrite(filename, color_image)
            print(f"Image saved: {filename}")
finally:
    pipeline.stop()
    cv2.destroyAllWindows()
```
We now have the chessboard images required to calibrate the camera. To perform the calibration step, we proceed with the following code:
```python=
import cv2
import numpy as np
import os
import glob  # Used to find files matching a pattern in the file system

# Calibration pattern type
pattern_type = "checkerboard"
CHECKERBOARD_SIZE = (28, 17)
SQUARE_SIZE_MM = 10

# Path to directory containing calibration images
image_dir = r"C:\Users\lang_es\Desktop\CompVision\captured_images"
# Path to save calibration data
calib_data_path = r"C:\Users\lang_es\Desktop\CompVision"

# Criteria for corner refinement and stopping
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)

if pattern_type == "checkerboard":
    # Prepare 3D object points
    objp = np.zeros((CHECKERBOARD_SIZE[0] * CHECKERBOARD_SIZE[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:CHECKERBOARD_SIZE[0], 0:CHECKERBOARD_SIZE[1]].T.reshape(-1, 2)
    objp *= SQUARE_SIZE_MM

obj_points_3D = []  # 3D points in real world space
img_points_2D = []  # 2D points in image plane
image_size = None

# Get list of image files
image_paths = glob.glob(os.path.join(image_dir, '*.png'))

for image_path in image_paths:
    img = cv2.imread(image_path)
    gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Find the chessboard corners
    ret, corners = cv2.findChessboardCorners(gray_img, CHECKERBOARD_SIZE, None)
    if ret:
        if image_size is None:
            image_size = gray_img.shape[::-1]
        obj_points_3D.append(objp)
        # Refine corner locations to sub-pixel accuracy
        corners2 = cv2.cornerSubPix(gray_img, corners, (3, 3), (-1, -1), criteria)
        img_points_2D.append(corners2)
        cv2.drawChessboardCorners(img, CHECKERBOARD_SIZE, corners2, ret)

if image_size and obj_points_3D and img_points_2D:
    # Calibrate camera
    ret, mtx, dist_coeff, R_vecs, T_vecs = cv2.calibrateCamera(
        obj_points_3D, img_points_2D, image_size, None, None
    )
    print("Calibration complete.")
    print("Camera Matrix:\n", mtx)
    print("Distortion Coefficients:\n", dist_coeff)

    # Save calibration data
    np.savez(
        os.path.join(calib_data_path, "CalibrationMatrix_college_cpt"),
        Camera_matrix=mtx,
        distCoeff=dist_coeff,
        RotationalV=R_vecs,
        TranslationV=T_vecs,
    )
else:
    print("No valid checkerboard patterns found in the images.")
```
The obtained results are presented below.
### Camera Matrix
$$
\mathbf{K} =
\begin{bmatrix}
640.46395702 & 0 & 649.3118425 \\
0 & 640.53032516 & 369.79584193 \\
0 & 0 & 1
\end{bmatrix}
$$
We can observe that the pixel scaling factors in x and y are nearly equal (fx ≈ 640.46, fy ≈ 640.53), meaning the pixels are effectively square. The principal point is at (649.31, 369.80), which should lie near the image center. The images were captured at a resolution of 1280 × 720, so the principal point is offset from the nominal center (640, 360) by roughly 9–10 pixels, which is perfectly normal.
### Distortion Coefficients
$$
\mathbf{D} =
\begin{bmatrix}
-0.0528824890 & 0.0617703141 & -0.000929724206 & 0.0000844753416 & -0.0189956296
\end{bmatrix}
$$
The first two radial distortion coefficients are −0.05288 and 0.06177. The negative first coefficient indicates mild barrel distortion, while the positive second coefficient counteracts it away from the center. The tangential coefficients are −0.00093 and 0.00008; both are close to zero, which is good, since tangential distortion arises from misalignment between the lens and the sensor.
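To close the loop on step 5 of the workflow above (evaluating reprojection error), here is a minimal sketch of how one could check it, assuming the variables from the calibration script (obj_points_3D, img_points_2D, mtx, dist_coeff, R_vecs, T_vecs) are still in scope:
```python=
# Sketch: mean reprojection error (assumes the calibration script's variables are in scope)
total_error = 0
for i in range(len(obj_points_3D)):
    # Re-project the 3D board points using the estimated pose and intrinsics
    projected, _ = cv2.projectPoints(obj_points_3D[i], R_vecs[i], T_vecs[i], mtx, dist_coeff)
    # Average L2 distance between detected and re-projected corners for this image
    error = cv2.norm(img_points_2D[i], projected, cv2.NORM_L2) / len(projected)
    total_error += error
print(f"Mean reprojection error: {total_error / len(obj_points_3D):.4f} px")
```
A good calibration typically yields a mean reprojection error well below one pixel.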
---
## 17.04.2025 – Epipolar Geometry
Typically, it is not possible to recover the 3D structure of the world from a single 2D image: mapping a 3D point onto a 2D image plane involves an information loss, which is effectively the loss of one dimension (depth).
Epipolar geometry is the geometric relationship between two images of the same scene.
:::spoiler
When a scene is captured from multiple views, an interesting relationship arises between the cameras, the 3D points, and their projections onto the image planes.
Epipolar geometry is the geometry that relates the 3D points, their observations, and the cameras.
:::
The line between the camera centers is called the **baseline**, and the plane defined by the two camera centers and the 3D point is the **epipolar plane**.
**Epipoles** are the points where the baseline intersects the image planes, and the lines defined by the intersection of the epipolar plane with the image planes are called **epipolar lines**. A point in one image corresponds to an epipolar line in the other image.
When the cameras are placed parallel to one another, the image planes are parallel and the baseline joining the camera centers is parallel to them; the baseline then never intersects the image planes, so the epipoles lie at infinity. Otherwise, the epipoles may fall inside or outside the images.
:::warning
In a real-world situation, the exact location of a 3D point is never given, but its distance can be estimated using the image projections. As mentioned earlier, the camera's **intrinsic and extrinsic** parameters come in handy here.
:::
Knowing the camera locations and an image point, the epipolar plane can be defined. Given the epipolar plane, the epipolar lines can be determined, and by definition the projection of the 3D point (call it P) must lie on the epipolar line of the second image. Armed with this understanding of epipolar geometry, a strong constraint between image pairs can be created.
The epipolar constraint is a powerful idea that narrows the search for matching points down to a single scanline or search line. Given two stereo images, matching points pixel by pixel across the entire image would be computationally intensive; for this reason we introduce the essential matrix.
The **essential matrix** is a 3×3 matrix that encodes the epipolar geometry. For example, given a point in one image, multiplying it by the essential matrix yields the corresponding epipolar line in the second image, which makes the scanning and matching much easier.
:::danger
- The **essential matrix** is used for calibrated cameras. It is rank-deficient, with rank 2.
- The **fundamental matrix** is used for uncalibrated cameras.
:::
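To make this concrete, here is a minimal sketch of how epipolar lines could be computed with OpenCV for an uncalibrated image pair. The filenames `left.png` and `right.png` and the choice of ORB feature matching are my own assumptions for illustration, not part of the setup above:
```python=
import cv2
import numpy as np

# Hypothetical stereo pair (filenames are assumptions for illustration)
img1 = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Detect and match features (ORB keypoints + brute-force Hamming matching)
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Estimate the fundamental matrix (uncalibrated case) robustly with RANSAC
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)

# For each point in image 1, compute the corresponding epipolar line in image 2
lines2 = cv2.computeCorrespondEpilines(pts1.reshape(-1, 1, 2), 1, F).reshape(-1, 3)
print("First epipolar line (a, b, c):", lines2[0])  # line ax + by + c = 0 in image 2
```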
Remember that the epipolar constraint reduces the search space for correspondences from a 2D region to a line: if a point **x** is observed in the image of camera 1, then the corresponding point **x'** in image two must lie on the associated epipolar line. The coordinates of a point in the two camera frames are related by the equation below:
$$
\mathbf{X}'_c = \mathbf{R} \mathbf{X}_c + \mathbf{T}
$$
Taking the vector (cross) product with **T**, we obtain:
$$
\mathbf{T} \times \mathbf{X}'_c = \mathbf{T} \times \mathbf{R} \mathbf{X}_c + \mathbf{T} \times \mathbf{T}
$$
Since the cross product of a vector with itself is zero:
$$
\mathbf{T} \times \mathbf{T} = \mathbf{0}
$$
So:
$$
\mathbf{T} \times \mathbf{X}'_c = \mathbf{T} \times \mathbf{R} \mathbf{X}_c
$$
## The Essential Matrix
Taking the scalar (dot) product we obtain:
$$
\mathbf{X}'_c \cdot (\mathbf{T} \times \mathbf{X}'_c) = \mathbf{X}'_c \cdot (\mathbf{T} \times \mathbf{R} \mathbf{X}_c)
$$
Since $\mathbf{T} \times \mathbf{X}'_c$ is perpendicular to $\mathbf{X}'_c$, the left-hand side vanishes, and we have:
$$
\mathbf{X}'_c \cdot (\mathbf{T} \times \mathbf{R} \mathbf{X}_c) = 0 \tag{1}
$$
Recall that a vector cross product can be expressed as a matrix multiplication:
$$
\mathbf{T} \times \mathbf{X}_c = [\mathbf{T}]_\times \mathbf{X}_c
$$
Where $[\mathbf{T}]_\times$ is the skew-symmetric matrix formed from **T**:
$$
[\mathbf{T}]_\times =
\begin{bmatrix}
0 & -T_z & T_y \\
T_z & 0 & -T_x \\
-T_y & T_x & 0
\end{bmatrix}
$$
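As a quick sanity check of this identity, here is a small numpy sketch (the vectors are arbitrary, chosen only for illustration):
```python=
import numpy as np

def skew(T):
    """Return the skew-symmetric matrix [T]_x such that skew(T) @ X == np.cross(T, X)."""
    return np.array([
        [0.0, -T[2], T[1]],
        [T[2], 0.0, -T[0]],
        [-T[1], T[0], 0.0],
    ])

T = np.array([1.0, 2.0, 3.0])
X = np.array([4.0, 5.0, 6.0])
print(np.cross(T, X))   # [-3.  6. -3.]
print(skew(T) @ X)      # [-3.  6. -3.] -- same result
```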
So equation (1) can be rewritten as:
$$
\mathbf{X}'_c \cdot ([\mathbf{T}]_\times \mathbf{R} \mathbf{X}_c) = 0
$$
This constraint also holds for the **image rays** **p** and **p'**, which are parallel to the camera-centered position vectors. Defining the **essential matrix** as $\mathbf{E} = [\mathbf{T}]_\times \mathbf{R}$, the constraint becomes:
$$
\mathbf{p}'^{\mathrm{T}} \mathbf{E} \mathbf{p} = 0 \tag{2}
$$
This is the **epipolar constraint**. If we observe a point **p** in one image, then its corresponding point **p'** in the other image must lie on the **epipolar line** defined by equation (2).
:::success
The essential matrix is the product of a skew-symmetric matrix and a rotation matrix, and it can be used to compute the epipolar lines as well as the epipoles. The epipoles can be estimated from the essential matrix: the epipole (in one of the images) lies in the null space of the essential matrix, which means the essential matrix is rank-deficient, with maximum rank 2.
:::
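As a small numerical illustration of this rank-2 / null-space property, one could build an essential matrix from a made-up relative pose (the rotation angle and translation below are arbitrary, chosen purely for illustration) and inspect it:
```python=
import numpy as np

# Made-up relative pose between two cameras (values are illustrative only)
theta = np.deg2rad(10)
R = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
T = np.array([0.5, 0.1, 0.02])
T_x = np.array([
    [0.0, -T[2], T[1]],
    [T[2], 0.0, -T[0]],
    [-T[1], T[0], 0.0],
])

E = T_x @ R  # essential matrix
print("rank(E) =", np.linalg.matrix_rank(E))  # expected: 2

# The epipole (up to scale) lies in the null space of E:
# it is the right-singular vector associated with the (near-)zero singular value
U, S, Vt = np.linalg.svd(E)
epipole = Vt[-1]
print("E @ epipole =", E @ epipole)  # approximately the zero vector
```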
:::info
For calibrated cameras we end up with the essential matrix, while for uncalibrated cameras we end up with the fundamental matrix.
:::
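For completeness, a minimal OpenCV sketch of the calibrated two-view case might look like the following. It assumes the matched points pts1 and pts2 from the earlier snippet and an intrinsic matrix mtx (for example, from the calibration above); this is a sketch of the idea rather than something I ran this week:
```python=
# Sketch: calibrated two-view geometry (assumes pts1, pts2 and the intrinsic matrix mtx exist)
E, mask = cv2.findEssentialMat(pts1, pts2, mtx, method=cv2.RANSAC, prob=0.999, threshold=1.0)

# Decompose E into the relative rotation R and unit-norm translation t between the views
retval, R, t, mask_pose = cv2.recoverPose(E, pts1, pts2, mtx, mask=mask)
print("Relative rotation:\n", R)
print("Relative translation (up to scale):\n", t)
```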