# Group[5]_HW[3]_report

## Introduction

- This homework focuses on creating a panoramic image through image stitching. The process involves detecting and describing interest points with the SIFT algorithm, which identifies distinctive features in overlapping images, and demonstrates the steps for constructing wide-field views from multiple images.
- Following the homework spec, feature matching is applied to align corresponding points between the images. Using RANSAC, a homography matrix $H$ is estimated to optimize the transformation between images and reject outliers. Lastly, one image is warped and aligned with the other to produce a seamless panoramic result.

## Implementation Procedure

### Step 1: Interest point detection & feature description by SIFT

- The implementation details are provided within the code block comments.

```python
import cv2


def find_features(img1, img2, method):
    '''
    Check whether the method is 'ORB' or 'SIFT', then call the
    corresponding cv2 API to find the keypoints and descriptors.
    '''
    if method == 'orb':
        orb = cv2.ORB_create()
        kp1, des1 = orb.detectAndCompute(img1, None)
        kp2, des2 = orb.detectAndCompute(img2, None)
    else:
        sift = cv2.SIFT_create()
        kp1, des1 = sift.detectAndCompute(img1, None)
        kp2, des2 = sift.detectAndCompute(img2, None)
    return kp1, kp2, des1, des2
```

### Step 2: Feature matching by SIFT features

- The implementation details are provided within the code block comments.

```python
best_match = feature_matching(des1, des2, threshold)
match_result = combine_result(img1, img2, kp1, kp2, best_match)
```

```python
import heapq

import cv2
import numpy as np


def feature_matching(des1, des2, threshold):
    # Initialize an array to store the index of the best match for each descriptor in des1
    best_match = -1 * np.ones(des1.shape[0], dtype=int)
    # Iterate through each descriptor in des1
    for i, desc1 in enumerate(des1):
        # Calculate the squared L2 distance between desc1 and each descriptor in des2
        scores = [np.sum((desc1 - desc2) ** 2) for desc2 in des2]
        # Find the two smallest distances using a heap for efficiency
        first, second = heapq.nsmallest(2, range(len(scores)), key=scores.__getitem__)
        # Compute the ratio of the smallest to the second-smallest distance (Lowe's ratio test)
        ratio_distance = np.sqrt(scores[first]) / np.sqrt(scores[second])
        # Keep the match only if the ratio is below the threshold
        if ratio_distance < threshold:
            # Record the index of the best match in des2 for desc1
            best_match[i] = first
    return best_match


def combine_result(img1, img2, kp1, kp2, best_match):
    # Get the height and width of the two images
    h1, w1 = img1.shape[0], img1.shape[1]
    h2, w2 = img2.shape[0], img2.shape[1]
    # Initialize an output image that displays img1 and img2 side by side
    match_result = np.zeros((max(h1, h2), w1 + w2, 3), dtype='uint8')
    # Place img1 on the left side, img2 on the right side
    match_result[0:h1, 0:w1] = img1
    match_result[0:h2, w1:] = img2
    # Draw lines connecting matched keypoints between the two images
    for i, label in enumerate(best_match):
        # Skip descriptors without a valid match
        if label != -1:
            c = np.random.randint(0, high=255, size=(3,))
            color = tuple([int(x) for x in c])
            # Keypoint in img1
            ptA = tuple(map(int, kp1[i].pt))
            # Keypoint in img2, shifted right to account for the concatenation
            ptB = (int(kp2[label].pt[0] + w1), int(kp2[label].pt[1]))
            # Draw a line between the matched keypoints
            cv2.line(match_result, ptA, ptB, color, 1)
    return match_result
```
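For completeness, here is a minimal usage sketch tying Steps 1 and 2 together. The file names `left.jpg` and `matches.jpg` are placeholders of our own choosing, and the 0.7 threshold follows the commonly used ratio-test range noted in the discussion; this is an illustration of how the functions fit together, not part of the graded pipeline.

```python
import cv2

# Placeholder input files: any pair of overlapping photos works
img1 = cv2.imread('left.jpg')
img2 = cv2.imread('right.jpg')

# Step 1: detect keypoints and compute descriptors with SIFT
kp1, kp2, des1, des2 = find_features(img1, img2, method='sift')

# Step 2: match descriptors with the ratio test, then visualize the matches
best_match = feature_matching(des1, des2, threshold=0.7)
match_result = combine_result(img1, img2, kp1, kp2, best_match)
cv2.imwrite('matches.jpg', match_result)
```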
### Step 3: RANSAC to find homography matrix $H$

- The transformation between the two images is a 3×3 homography matrix ***H***. Although $H$ has 9 entries, it is defined only up to scale, leaving 8 degrees of freedom (DoF); since each point correspondence contributes 2 equations, we need at least 4 pairs to solve for $H$. Because we have a much larger set of matching points, we apply RANSAC: it iteratively selects random subsets of point pairs to estimate $H$ and keeps the $H$ with the most inliers, which improves robustness to outliers.
- The implementation details are provided within the code block comments.

```python
import numpy as np


def homomat(pairs):
    """
    1) Given n matching pairs, build the 2n x 9 matrix A and solve Ah = 0
    2) Reshape h into the 3 x 3 matrix H
    """
    A = []
    for x1, y1, x2, y2 in pairs:
        A.append([-x1, -y1, -1, 0, 0, 0, x2*x1, x2*y1, x2])
        A.append([ 0,   0,   0, -x1, -y1, -1, y2*x1, y2*y1, y2])
    A = np.array(A)
    # The solution h is the right-singular vector with the smallest singular value
    U, S, V = np.linalg.svd(A)
    H = V[-1].reshape(3, 3)
    H /= H[2, 2]  # normalize so that H[2, 2] = 1
    return H
```

```python
# Classify matches as inliers or outliers by a given threshold
def count_error(matches, H, threshold):
    inliers = []
    for item in matches:
        x1, y1, x2, y2 = item
        p1 = np.array([x1, y1, 1])  # homogeneous coordinates in img1
        p2 = np.array([x2, y2])     # ground-truth [x2', y2'] in img2
        # Estimate where p1 lands in img2 under H
        wx, wy, w = H @ p1
        x = wx / w
        y = wy / w
        estimated_p2 = np.array([x, y])
        # Reprojection error used to decide inliers:
        # [x2, y2] and [x, y] should be as close as possible
        error = np.linalg.norm(p2 - estimated_p2)
        if error < threshold:
            inliers.append(item)
    return inliers
```

```python
import random


# Use RANSAC to sample points and find the best H
def RANSAC(S, N, matches, threshold):
    best_H = None
    max_inliers = []
    # 4) Repeat (1)~(3) N times, updating the current best H in (5)
    for _ in range(N):
        # 1) Sample S correspondences from the feature matching results
        idx = random.sample(range(len(matches)), S)
        sample_pairs = [matches[i] for i in idx]
        # 2) Compute the homography matrix from the sampled correspondences
        try:
            H = homomat(sample_pairs)
        except np.linalg.LinAlgError:
            continue
        # 3) Count the inliers/outliers under the given threshold
        inliers = count_error(matches, H, threshold)
        # 5) Keep the homography matrix with the fewest outliers
        #    (equivalently, the most inliers)
        if len(inliers) > len(max_inliers):
            max_inliers = inliers.copy()
            best_H = H.copy()
    # Additional refinement: recompute a better H from all inliers of the best model
    if max_inliers:
        best_H = homomat(max_inliers)
    else:
        best_H = None
    return max_inliers, best_H
```
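Our `RANSAC` function takes the iteration count $N$ as a parameter. A common rule of thumb from standard RANSAC theory (the helper `ransac_iterations` and its example values are our own sketch, not part of the submitted code) chooses $N$ so that, with probability $p$, at least one sample of $S$ correspondences is outlier-free, given an assumed inlier ratio $w$: $N = \left\lceil \frac{\log(1-p)}{\log(1-w^S)} \right\rceil$.

```python
import math


def ransac_iterations(p=0.99, w=0.5, S=4):
    # Number of iterations needed so that, with probability p, at least one
    # sample of S correspondences contains only inliers, assuming an inlier
    # ratio of w among the matches.
    return math.ceil(math.log(1 - p) / math.log(1 - w ** S))


# Example: even if only half the matches are inliers (w = 0.5), 72 iterations
# give a 99% chance of drawing one all-inlier sample of size 4.
print(ransac_iterations(p=0.99, w=0.5, S=4))  # 72
```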
### Step 4: Warp image to create panoramic image

* Use the function `warp` to overlay the left image and the right image with the provided $H$.
* Then apply linear blending to create smooth, seamless transitions between the individual images that make up the panorama.

```python
import matplotlib.pyplot as plt
import numpy as np


def linearBlending(imgs):
    '''
    Linear blending (also known as feathering)
    '''
    img_left, img_right = imgs
    (hl, wl) = img_left.shape[:2]
    (hr, wr) = img_right.shape[:2]
    img_left_mask = np.zeros((hr, wr), dtype="int")
    img_right_mask = np.zeros((hr, wr), dtype="int")

    # Find the left-image and right-image mask regions (the nonzero pixels)
    for i in range(hl):
        for j in range(wl):
            if np.count_nonzero(img_left[i, j]) > 0:
                img_left_mask[i, j] = 1
    for i in range(hr):
        for j in range(wr):
            if np.count_nonzero(img_right[i, j]) > 0:
                img_right_mask[i, j] = 1

    # Find the overlap mask (the region covered by both images)
    overlap_mask = np.zeros((hr, wr), dtype="int")
    for i in range(hr):
        for j in range(wr):
            if img_left_mask[i, j] > 0 and img_right_mask[i, j] > 0:
                overlap_mask[i, j] = 1

    # Plot the overlap mask
    plt.figure(21)
    plt.title("overlap_mask")
    plt.imshow(overlap_mask.astype(int), cmap="gray")

    # Compute the alpha mask for linearly blending the overlap region;
    # the alpha value depends on the left image
    alpha_mask = np.zeros((hr, wr))
    for i in range(hr):
        minIdx = maxIdx = -1
        for j in range(wr):
            if overlap_mask[i, j] == 1 and minIdx == -1:
                minIdx = j
            if overlap_mask[i, j] == 1:
                maxIdx = j
        if minIdx == maxIdx:
            # All pixels in this row are zero, or only one pixel is nonzero
            continue
        decrease_step = 1 / (maxIdx - minIdx)
        # Alpha decreases linearly from 1 at the left edge of the overlap to 0 at the right edge
        for j in range(minIdx, maxIdx + 1):
            alpha_mask[i, j] = 1 - (decrease_step * (j - minIdx))

    linearBlending_img = np.copy(img_right)
    linearBlending_img[:hl, :wl] = np.copy(img_left)
    # Linearly blend the overlap region
    for i in range(hr):
        for j in range(wr):
            if overlap_mask[i, j] > 0:
                linearBlending_img[i, j] = alpha_mask[i, j] * img_left[i, j] + (1 - alpha_mask[i, j]) * img_right[i, j]

    return linearBlending_img
```

```python
def warp(img_left, img_right, HomoMat):
    '''
    Warp the right image into the left image's frame to create the panorama.
    There are three different blending methods:
    noBlending, linearBlending, linearBlendingWithConstant
    '''
    (hl, wl) = img_left.shape[:2]
    (hr, wr) = img_right.shape[:2]
    # Create the big (stitched) image according to the images' heights and widths
    stitch_img = np.zeros((max(hl, hr), wl + wr, 3), dtype="int")
    stitch_img[:hl, :wl] = img_left

    # Transform the right image's coordinates to the destination (left) image's
    # coordinates with HomoMat, via backward mapping through its inverse
    inv_H = np.linalg.inv(HomoMat)
    for i in range(stitch_img.shape[0]):
        for j in range(stitch_img.shape[1]):
            coor = np.array([j, i, 1])
            img_right_coor = inv_H @ coor  # the corresponding coordinates in the right image
            img_right_coor /= img_right_coor[2]
            # Round to the nearest neighbor (interpolation could be used instead)
            y, x = int(round(img_right_coor[0])), int(round(img_right_coor[1]))  # y indexes width, x indexes height
            # If the computed coordinates fall outside the right image's
            # (height, width), this pixel does not need to be processed
            if x < 0 or x >= hr or y < 0 or y >= wr:
                continue
            # Otherwise copy the corresponding right-image pixel
            stitch_img[i, j] = img_right[x, y]
    stitch_img = linearBlending([img_left, stitch_img])
    return stitch_img
```

## Experimental Result

### Visualization for feature matching

#### SIFT

- $threshold=0.7$
    - ![sift_0.7](https://hackmd.io/_uploads/rJF4Bilbkg.jpg)
- $threshold=0.4$
    - ![sift_0.4](https://hackmd.io/_uploads/H1tHSoeZJx.jpg)

#### ORB

- $threshold=0.7$
    - ![orb_0.7](https://hackmd.io/_uploads/rJ8LHjgZ1l.jpg)
- $threshold=0.4$
    - ![orb_0.4](https://hackmd.io/_uploads/rJYLBog-kl.jpg)

### Result

![hill](https://hackmd.io/_uploads/SJJxcSDWyg.jpg)
![S](https://hackmd.io/_uploads/Sy0kqBDZkl.jpg)
![TV](https://hackmd.io/_uploads/BJAJcBPZye.jpg)
![BIRD](https://hackmd.io/_uploads/SkygqBPZke.jpg)

## Discussion

- In the feature matching stage, the commonly used ratio-test threshold is between 0.7 and 0.8.
- In RANSAC, the pixel threshold used to decide whether a point counts as an inlier is typically between 3 and 10 pixels; the best choice varies with factors like image resolution, feature density, and noise level.
- The library function ***cv2.findHomography*** takes additional considerations into account compared with our hand-crafted version, which may be why its stitching is smoother than ours:
  1) A built-in RANSAC with **finely tuned thresholds** and **convergence criteria**
  2) ***Normalization*** of points, helping ***minimize computational errors*** when calculating the homography matrix (a sketch follows this list)
     - Centering: translate the points so their centroid is at the origin.
     - Scaling: adjust the scale so that the average distance of the points from the origin is a constant.
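As a concrete illustration of this normalization, below is a minimal sketch of Hartley's normalized DLT; the helper `normalize_points` is our own naming, not a cv2 API or part of our submitted code. The points are centered and scaled so that their average distance from the origin is $\sqrt{2}$.

```python
import numpy as np


def normalize_points(pts):
    # Hartley normalization: center the points at the origin and scale them
    # so that their average distance from the origin is sqrt(2).
    pts = np.asarray(pts, dtype=float)
    centroid = pts.mean(axis=0)
    avg_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    scale = np.sqrt(2) / avg_dist
    # 3x3 similarity transform that performs the centering and scaling
    T = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])  # homogeneous coordinates
    normalized = (T @ pts_h.T).T[:, :2]
    return normalized, T
```

A normalized DLT would build $A$ from the normalized coordinates of both images, solve for $\hat{H}$, and then denormalize with $H = T_2^{-1} \hat{H} T_1$, where $T_1$ and $T_2$ are the transforms for the points of the first and second image.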
- RANSAC is refined with the code below, recalculating the best homography matrix from all the inliers of the best model.

```python
if max_inliers:
    best_H = homomat(max_inliers)
else:
    best_H = None
```

- To increase robustness and improve the output image, we use blending to create smooth, seamless transitions between the individual images that make up the panorama.

## Conclusion

- We have successfully implemented an automatic panoramic image stitching pipeline, using SIFT for feature detection and matching and RANSAC for homography estimation.
- We have also addressed challenges in feature matching accuracy and image alignment, achieving stable and precise stitching results.
- This project enhanced our understanding of image processing and feature-based matching, providing a foundation for future applications.

## Work Assignment Plan Between Team Members

- 313560007 蕭皓隆: coding (steps 1 & 2), composing report
- 313554015 周禹彤: coding (step 3), composing report
- 313551011 李佾: coding (step 4), composing report