# Group[5]_HW[4]_report

## Introduction
- This homework implements structure from motion (SfM), which reconstructs the 3-dimensional environment from several given 2-dimensional images.

## Implementation Procedure

### Step 1:
- Find correspondences across images.
- We first invoke `find_features()` to detect keypoints and compute descriptors for both images using `SIFT`.
- Then `feature_matching()` performs brute-force descriptor matching and applies *Lowe's ratio test* to keep only good matches under a specified threshold $(0.7)$.

```python=
import heapq
import random

import cv2
import numpy as np
import matplotlib.pyplot as plt

def find_features(img1, img2, method):
    # Detect keypoints and compute descriptors with ORB or SIFT
    if method == 'orb':
        orb = cv2.ORB_create()
        kp1, des1 = orb.detectAndCompute(img1, None)
        kp2, des2 = orb.detectAndCompute(img2, None)
    else:
        sift = cv2.SIFT_create()
        kp1, des1 = sift.detectAndCompute(img1, None)
        kp2, des2 = sift.detectAndCompute(img2, None)
    return kp1, kp2, des1, des2

def feature_matching(des1, des2, threshold):
    matches = []
    for i, desc1 in enumerate(des1):
        # Brute-force squared L2 distance to every descriptor in the second image
        scores = [np.sum((desc1 - desc2) ** 2) for desc2 in des2]
        first, second = heapq.nsmallest(2, range(len(scores)), key=scores.__getitem__)
        # Lowe's ratio test: keep the match only if the best distance is clearly
        # smaller than the second-best distance
        ratio_distance = np.sqrt(scores[first]) / np.sqrt(scores[second])
        if ratio_distance < threshold:
            match = cv2.DMatch()
            match.queryIdx = i
            match.trainIdx = first
            match.distance = np.sqrt(scores[first])
            matches.append(match)
    return matches

'''step-1'''
# img1 and img2 are the two input images, loaded beforehand (e.g., with cv2.imread)
kp1, kp2, des1, des2 = find_features(img1, img2, 'sift')
best_match = feature_matching(des1, des2, 0.7)
```
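- As a sanity check for the hand-written matcher, the same ratio-test matching can be reproduced with OpenCV's built-in brute-force matcher. The snippet below is an illustrative sketch only, not part of the submitted pipeline; it assumes the `des1`/`des2` descriptors and the 0.7 threshold from above.

```python=
# Optional cross-check with OpenCV's brute-force matcher (illustrative only).
bf = cv2.BFMatcher(cv2.NORM_L2)
knn_matches = bf.knnMatch(des1, des2, k=2)
cv_matches = [m for m, n in knn_matches if m.distance < 0.7 * n.distance]
# The number of surviving matches should be comparable to len(best_match).
print(len(best_match), len(cv_matches))
```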
### Step 2:
- This step estimates the fundamental matrix $F$ between the two images using the `RANSAC` algorithm.
- It first normalizes the point coordinates and randomly samples matches, then computes candidate fundamental matrices via the normalized 8-point algorithm and evaluates inliers based on the epipolar error.
- Finally, the fundamental matrix with the highest inlier count is selected.
- Parameters of `RANSAC`:
    - $S=8$: the number of samples required by the 8-point algorithm used to estimate the fundamental matrix
    - $N=3000$: the number of RANSAC iterations
    - $threshold=0.01$: matches with an error below this threshold are considered inliers

```python=
def normalize_points(points, w, h):
    # Construct the normalization transformation matrix (maps pixel coordinates to [-1, 1])
    transform = np.array([
        [2.0 / w, 0, -1],
        [0, 2.0 / h, -1],
        [0, 0, 1.0]
    ])
    # Apply the normalization transformation to homogeneous points
    normalized_points = np.dot(transform, np.vstack((points.T, np.ones(points.shape[0]))))
    return transform, normalized_points

def estimate_fundamental_matrix(S, pts1, pts2, w1, h1, w2, h2):
    # Normalize the points from both images
    T1, pt_1 = normalize_points(pts1, w1, h1)
    T2, pt_2 = normalize_points(pts2, w2, h2)
    # Construct the matrix A for the 8-point algorithm
    A = np.ones((S, 9))
    for i in range(S):
        A[i, 0:3] = pt_2[0, i] * pt_1[:, i].T  # First block of constraints
        A[i, 3:6] = pt_2[1, i] * pt_1[:, i].T  # Second block of constraints
        A[i, 6:9] = pt_1[:, i].T               # Third block of constraints
    # Compute the singular value decomposition (SVD) of A
    U, S, Vt = np.linalg.svd(A)
    # The fundamental matrix is the last row of V.T
    F = Vt[-1].reshape(3, 3)
    # Enforce the rank-2 constraint on F
    U, S, Vt = np.linalg.svd(F)
    # Set the smallest singular value to zero
    S[2] = 0
    S1 = np.zeros((3, 3))
    S1[0, 0] = S[0]
    S1[1, 1] = S[1]
    F = np.dot(np.dot(U, S1), Vt)
    # Denormalize the fundamental matrix
    F = np.dot(T2.T, np.dot(F, T1))
    # Normalize F so that F[2, 2] is 1
    F = F / F[2, 2]
    return F

def RANSAC(S, N, matches, kp1, kp2, threshold, w1, h1, w2, h2):
    best_F = None
    max_inliers = []
    for _ in range(N):
        # Randomly sample S matches
        idx = random.sample(range(len(matches)), S)
        sample_pairs = [matches[i] for i in idx]
        # Extract the points corresponding to the sampled matches
        pts1 = np.array([kp1[m.queryIdx].pt for m in sample_pairs])
        pts2 = np.array([kp2[m.trainIdx].pt for m in sample_pairs])
        try:
            # Estimate a candidate fundamental matrix using the sampled points
            F_candidate = estimate_fundamental_matrix(S, pts1, pts2, w1, h1, w2, h2)
        except np.linalg.LinAlgError:
            continue
        # Evaluate the inliers for the candidate fundamental matrix
        inliers = []
        for m in matches:
            pt1 = np.array([kp1[m.queryIdx].pt[0], kp1[m.queryIdx].pt[1], 1])  # Homogeneous point in image 1
            pt2 = np.array([kp2[m.trainIdx].pt[0], kp2[m.trainIdx].pt[1], 1])  # Homogeneous point in image 2
            # Compute the algebraic epipolar error |x2^T F x1|
            error = np.abs(np.dot(pt2.T, np.dot(F_candidate, pt1)))
            if error < threshold:
                inliers.append(m)
        # Update the best fundamental matrix if more inliers are found
        if len(inliers) > len(max_inliers):
            max_inliers = inliers
            best_F = F_candidate
    return best_F, max_inliers

'''step-2'''
h1, w1, c1 = img1.shape  # ndarray.shape is (height, width, channels)
h2, w2, c2 = img2.shape
F, max_inliers = RANSAC(S=8, N=3000, matches=best_match, kp1=kp1, kp2=kp2,
                        threshold=0.01, w1=w1, h1=h1, w2=w2, h2=h2)
```
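- In the results section we compare this estimate with the one from `cv2.findFundamentalMat()`. A minimal sketch of how such a comparison can be made is shown below; the variable name `F_cv` and the scale normalization are our own illustrative choices (a fundamental matrix is only defined up to scale, so both estimates must be normalized before comparing).

```python=
# Illustrative comparison with OpenCV's estimate (see the Step-2 results).
pts1_all = np.float32([kp1[m.queryIdx].pt for m in best_match])
pts2_all = np.float32([kp2[m.trainIdx].pt for m in best_match])
F_cv, mask = cv2.findFundamentalMat(pts1_all, pts2_all, cv2.FM_RANSAC, 1.0, 0.99)

# F is defined only up to scale, so normalize both matrices before comparing them.
print(F / np.linalg.norm(F))
print(F_cv / np.linalg.norm(F_cv))
```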
### Step 3:
- This step visualizes the epipolar geometry by computing epipolar lines from the fundamental matrix $F$ and drawing them on the images.
- Keypoints in the first image are marked, and their corresponding epipolar lines are drawn in the second image. The two images are then displayed side by side. The inlier matches found in step 2 are used to generate the epipolar lines.

```python=
def compute_epipolar(pts, F):
    # Convert points to homogeneous coordinates
    pts_homogeneous = np.hstack((pts, np.ones((pts.shape[0], 1))))
    # Compute the epipolar lines l' = F x
    lines = np.dot(F, pts_homogeneous.T).T
    return lines

def draw(img1_with_pts, img2_with_lines):
    plt.figure(figsize=(18, 9))
    plt.subplot(1, 2, 1)
    plt.title("Image 1 with points")
    plt.imshow(img1_with_pts)
    plt.subplot(1, 2, 2)
    plt.title("Image 2 with epilines")
    plt.imshow(img2_with_lines)
    plt.axis("off")
    plt.show()
    return

def draw_epipolar_lines(img1, img2, pts1, pts2, F):
    img1_with_pts = img1.copy()
    img2_with_lines = img2.copy()
    h2, w2 = img2.shape[:2]
    # Compute the epipolar lines in image 2 for the points in image 1
    lines = compute_epipolar(pts1, F)
    for pt1, line in zip(pts1, lines):
        # Generate a random color for each point/line pair
        color = tuple(np.random.randint(0, 255, 3).tolist())
        # Draw the point on the first image
        pt1_int = tuple(map(int, pt1))
        cv2.circle(img1_with_pts, pt1_int, 5, color, -1)
        # Compute the line endpoints at the left and right borders of image 2
        x0, y0 = map(int, [0, -line[2] / line[1]])
        x1, y1 = map(int, [w2, -(line[2] + line[0] * w2) / line[1]])
        # Draw the epipolar line on the second image
        cv2.line(img2_with_lines, (x0, y0), (x1, y1), color, 1)
    draw(img1_with_pts, img2_with_lines)
    return img1_with_pts, img2_with_lines

'''step-3'''
best_match_kp1 = np.array([kp1[m.queryIdx].pt for m in max_inliers])  # Points from image 1
best_match_kp2 = np.array([kp2[m.trainIdx].pt for m in max_inliers])  # Points from image 2
img1_with_points, img2_with_lines = draw_epipolar_lines(img1, img2, best_match_kp1, best_match_kp2, F)
```

### Step 4:
- Estimate the essential matrix from the fundamental matrix:
    - $E = K^T F K$
        - $E$: essential matrix
        - $F$: fundamental matrix
        - $K$: camera intrinsic matrix
- Once the essential matrix is known, there are four possible camera pose configurations, denoted by $(C, R)$:
    - $C$ is the camera center.
    - $R$ is the rotation matrix.
    - The camera pose is $P = KR[I_{3\times3} \mid -C]$.

```python=
def essential_matrix(F, K):
    # E = K^T F K
    E = np.matmul(np.matmul(np.transpose(K), F), K)
    U, S, Vt = np.linalg.svd(E)
    W = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
    Wt = np.transpose(W)
    # Candidate camera rotation matrices
    R1 = R2 = np.matmul(np.matmul(U, W), Vt)
    R3 = R4 = np.matmul(np.matmul(U, Wt), Vt)
    # Candidate camera centers (translation direction, up to scale)
    C1 = C3 = U[:, 2]
    C2 = C4 = -U[:, 2]
    solutions = [(R1, C1), (R2, C2), (R3, C3), (R4, C4)]
    return solutions
```

### Step 5:
- To find the correct camera pose, we check the cheirality condition. To be more specific, the reconstructed points must lie in front of the cameras.

```python=
def best_solution(solutions, K, pts1, pts2):
    best_pose = None
    max_cnt = 0
    for R, C in solutions:
        det = np.linalg.det(R)
        # Projection matrices P1 = K[I | 0] and P2 = K[R | t]
        R1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
        R2 = np.hstack((R, C.reshape(3, 1)))
        P1 = K @ R1
        P2 = K @ R2
        # Triangulate 3D points
        points_4D = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)  # homogeneous
        points_3D = points_4D[:3, :] / points_4D[3, :]             # inhomogeneous
        # Count the number of points with positive depth (cheirality)
        cnt = np.sum(points_3D[2, :] > 0)
        # Keep the pose with the most points in front of the camera;
        # if det(R) < 0, flip the sign to obtain a proper rotation
        if cnt > max_cnt:
            max_cnt = cnt
            if det > 0:
                best_pose = (R, C)
            else:
                best_pose = (-R, -C)
    return best_pose

def best_essential_matrix(R, t):
    # Recompose E = [t]_x R from the selected rotation and translation
    tx, ty, tz = t[0], t[1], t[2]
    t_x = np.array([[0, -tz, ty], [tz, 0, -tx], [-ty, tx, 0]])
    E = t_x @ R
    return E
```
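- For completeness, the functions above can be chained in the same style as the earlier `'''step-N'''` snippets. The driver below is only a sketch of that wiring: it assumes the intrinsic matrix `K` of the image pair is already loaded, and the `cv2` calls at the end are one possible way of obtaining the reference `R` and `t` from `findEssentialMat`/`recoverPose` used for the comparison in the results section.

```python=
'''step-4&5 (driver sketch; assumes the intrinsic matrix K is already loaded)'''
solutions = essential_matrix(F, K)                                  # four candidate (R, C) pairs
R, t = best_solution(solutions, K, best_match_kp1, best_match_kp2)  # cheirality check
E = best_essential_matrix(R, t)

# One possible OpenCV cross-check (cf. the Step-4&5 results):
E_cv, _ = cv2.findEssentialMat(best_match_kp1, best_match_kp2, K, method=cv2.RANSAC)
_, R_cv, t_cv, _ = cv2.recoverPose(E_cv, best_match_kp1, best_match_kp2, K)
```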
### Step 6:
- This step triangulates 3D points from the corresponding 2D points in the two images using a custom implementation of the Direct Linear Transform (DLT) algorithm.

```python=
def triangulate_points_custom(P1, P2, pts1, pts2):
    num_points = pts1.shape[0]
    points_3D = np.zeros((4, num_points))
    for i in range(num_points):
        x1, y1 = pts1[i]
        x2, y2 = pts2[i]
        # Construct the A matrix for the DLT
        A = np.array([
            y1 * P1[2, :] - P1[1, :],
            P1[0, :] - x1 * P1[2, :],
            y2 * P2[2, :] - P2[1, :],
            P2[0, :] - x2 * P2[2, :]
        ])
        # Solve for the 3D point using SVD
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1, :]  # The solution is the last row of V^T
        points_3D[:, i] = X
    return points_3D

def triangulate_points(R, t, K, pts1, pts2):
    # Projection matrices P1 = K[I | 0] and P2 = K[R | t]
    R1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
    R2 = np.hstack((R, t.reshape(3, 1)))
    P1 = K @ R1
    P2 = K @ R2
    # Triangulate 3D points
    # points_4D = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)  # OpenCV reference
    points_4D = triangulate_points_custom(P1, P2, pts1, pts2)    # custom DLT
    points_3D = points_4D[:3, :] / points_4D[3, :]               # inhomogeneous
    return points_3D
```

### Step 7 & 8:
- We use the MATLAB code provided by the TA to generate the `.obj` file, the `.mtl` file, and the mesh plot.
- We then import the `.obj` file into Blender for visualization.
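- For reference, the triangulated point cloud can also be written out directly from Python. The sketch below is illustrative only: the submitted results were produced with the TA's MATLAB tool, and the helper name `export_point_cloud_obj`, the per-vertex colour convention, and the reuse of `R`, `t`, `K` from the earlier driver sketch are our own assumptions.

```python=
'''step-6 driver and .obj export (illustrative sketch; not the submitted pipeline)'''
# R, t come from the step-4&5 sketch above; K is the known intrinsic matrix.
points_3D = triangulate_points(R, t, K, best_match_kp1, best_match_kp2)

def export_point_cloud_obj(path, points_3D, img, pts2d):
    # Write a minimal coloured point cloud; "v x y z r g b" is a widely supported
    # (though unofficial) per-vertex colour extension of the OBJ format.
    h, w = img.shape[:2]
    with open(path, 'w') as f:
        for i in range(points_3D.shape[1]):
            x, y, z = points_3D[:, i]
            u, v = pts2d[i]
            b, g, r = img[min(int(v), h - 1), min(int(u), w - 1)]  # OpenCV images are BGR
            f.write(f"v {x} {y} {z} {r / 255.0} {g / 255.0} {b / 255.0}\n")

export_point_cloud_obj('cloud.obj', points_3D, img1, best_match_kp1)
```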
## Experimental Results

### Step-1 (correspondence across images)
#### Mesona
![image](https://hackmd.io/_uploads/HJsVCtFmJe.png)
#### Statue
![step-1-statue](https://hackmd.io/_uploads/HyTXCKKXyx.jpg)
#### Our Own Image
![step-1-own](https://hackmd.io/_uploads/Bkjw-CCXkx.jpg)

### Step-2 (estimate the fundamental matrix)
- To verify the correctness of the fundamental matrix, we compare our results with those computed by `cv2.findFundamentalMat()`.
#### Mesona
![image](https://hackmd.io/_uploads/HkUEk5KXyg.png)
#### Statue
![image](https://hackmd.io/_uploads/ByxKJqFQkx.png)
#### Our Own Image
![image](https://hackmd.io/_uploads/BJgBZC0Xyx.png)

### Step-3 (interest points and epipolar lines)
#### Mesona
![image](https://hackmd.io/_uploads/HyZgitF71g.png)
#### Statue
![step-3-statue](https://hackmd.io/_uploads/rJVERFKQyl.jpg)
#### Our Own Image
![step-3-own](https://hackmd.io/_uploads/SyYaWRRXke.jpg)

### Step-4&5
- We compare the essential matrix `E` and its corresponding rotation matrix `R` and camera center `t` obtained from our hand-crafted code against those from the *cv2* package (`findEssentialMat` & `recoverPose`). The output below shows that both give the same `R` and `t`.
![Screenshot 2024-12-06 at 12.57.05 AM](https://hackmd.io/_uploads/SyThiLk4kx.png)

### Step-6&7
- We reconstruct the 3D scene from the images: the matched feature points are triangulated into a 3D point cloud, the image texture is mapped onto these points, and the result is exported as a 3D model in OBJ format.
#### Mesona
![image](https://hackmd.io/_uploads/BJiAgvl4Jl.png =75%x)
![image](https://hackmd.io/_uploads/SyiIzweN1g.png =75%x)
#### Statue
![image](https://hackmd.io/_uploads/SyTTZvlEJg.png =75%x)
![image](https://hackmd.io/_uploads/HyXJGPeVJg.png =75%x)
#### Our Own Image
![image](https://hackmd.io/_uploads/B1tAfPeVkg.png =75%x)
![image](https://hackmd.io/_uploads/rk3J7wlNJg.png =75%x)

### Step-8
#### Mesona
![image](https://hackmd.io/_uploads/rkKDnwlNJe.png)
#### Statue
![image](https://hackmd.io/_uploads/BJ2GawxEke.png)
#### Our Own Image
![image](https://hackmd.io/_uploads/BkrTjPxVye.png)

## Discussion
- Reconstructing areas with little texture proved challenging, since these regions lack distinct features for the algorithm to detect and match. We encountered this issue particularly with our own dataset, where the background often contained a high density of interest points while the object of interest had comparatively few. This imbalance made it difficult to generate an accurate 3D model. Tightening the ratio-test threshold reduced the number of background matches, but this in turn hindered the computation of a reliable fundamental matrix, which is essential for accurate 3D reconstruction.

## Conclusion
- In this assignment, we implemented the steps required for structure from motion (SfM), including finding correspondences across images, estimating the fundamental matrix with the normalized 8-point algorithm and RANSAC, and visualizing epipolar lines.
- We also computed the essential matrix, recovered the camera pose, triangulated 3D points, and used texture mapping to create a 3D model.
- Finally, the resulting model was visualized in Blender. The experiment demonstrated the complete SfM pipeline and its practical challenges, providing insights into camera calibration, correspondence matching, and 3D reconstruction.

## Work Assignment Plan Between Team Members
- 313560007 蕭皓隆: coding (steps 1-3), composing the report
- 313554015 周禹彤: coding (steps 4-5), composing the report
- 313551011 李佾: coding (steps 6-8), composing the report