# Group[5]_HW[3]_report
## Introduction
- This homework focuses on creating a panoramic image through image stitching techniques. The process involves detecting and describing interest points using the SIFT algorithm, which identifies distinctive features in overlapping images. Moreover, the approach demonstrates the steps for constructing wide-field views from multiple images.
- Following the homework spec, feature matching is applied to align corresponding points between the images. Using RANSAC, a homography matrix $H$ is calculated to optimize the transformation between the images while rejecting outliers. Lastly, one image is warped and aligned with the other to produce a seamless panoramic result.
## Implementation Procedure
### Step 1: Interest points detection & feature description by SIFT
- The implementation details are provided within the code block comments.
```python
import cv2

def find_features(img1, img2, method):
    '''
    Check whether the method is 'orb' or 'sift',
    then call the corresponding cv2 API to find the keypoints and descriptors.
    '''
    if method == 'orb':
        orb = cv2.ORB_create()
        kp1, des1 = orb.detectAndCompute(img1, None)
        kp2, des2 = orb.detectAndCompute(img2, None)
    else:
        sift = cv2.SIFT_create()
        kp1, des1 = sift.detectAndCompute(img1, None)
        kp2, des2 = sift.detectAndCompute(img2, None)
    return kp1, kp2, des1, des2
```
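- A minimal usage sketch (the image filenames are hypothetical placeholders, not the homework data):
```python
import cv2

# Hypothetical input files; any pair of overlapping photos works
img1 = cv2.imread('img1.jpg')
img2 = cv2.imread('img2.jpg')
kp1, kp2, des1, des2 = find_features(img1, img2, method='sift')
print(f'{len(kp1)} keypoints in img1, {len(kp2)} keypoints in img2')
```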
### Step 2: Feature matching by SIFT features
- The implementation details are provided within the code block comments.
```python
best_match = feature_matching(des1, des2, threshold)
match_result = combine_result(img1, img2, kp1, kp2, best_match)
```
```python
import heapq
import numpy as np

def feature_matching(des1, des2, threshold):
    # Initialize array to store the index of the best match for each descriptor in des1
    best_match = -1 * np.ones(des1.shape[0], dtype=int)
    # Iterate through each descriptor in des1
    for i, desc1 in enumerate(des1):
        # Calculate the squared L2 distance between desc1 and each descriptor in des2
        # (cast to float so uint8 ORB descriptors do not overflow)
        scores = [np.sum((desc1.astype(float) - desc2.astype(float)) ** 2) for desc2 in des2]
        # Find the two smallest distances using a heap for efficiency
        first, second = heapq.nsmallest(2, range(len(scores)), key=scores.__getitem__)
        # Compute the ratio of the smallest to the second smallest distance (Lowe's ratio test)
        ratio_distance = np.sqrt(scores[first]) / np.sqrt(scores[second])
        # Keep the match only if the ratio is smaller than the threshold
        if ratio_distance < threshold:
            # Record the index of the best match in des2 for desc1
            best_match[i] = first
    return best_match

def combine_result(img1, img2, kp1, kp2, best_match):
    # Get the heights and widths of the two images
    h1, w1 = img1.shape[0], img1.shape[1]
    h2, w2 = img2.shape[0], img2.shape[1]
    # Initialize an output image to display the matches between img1 and img2 side by side
    match_result = np.zeros((max(h1, h2), w1 + w2, 3), dtype='uint8')
    # Place img1 on the left side, img2 on the right side
    match_result[0:h1, 0:w1] = img1
    match_result[0:h2, w1:] = img2
    # Draw lines connecting matched keypoints between the two images
    for i, label in enumerate(best_match):
        # Skip descriptors that failed the ratio test
        if label != -1:
            c = np.random.randint(0, high=255, size=(3,))
            color = tuple(int(x) for x in c)
            # Keypoint in img1
            ptA = tuple(map(int, kp1[i].pt))
            # Keypoint in img2, shifted right to account for the concatenation
            ptB = (int(kp2[label].pt[0] + w1), int(kp2[label].pt[1]))
            # Draw a line between the matched keypoints
            cv2.line(match_result, ptA, ptB, color, 1)
    return match_result
```
### Step 3: RANSAC to find homography matrix $H$
- The transformation between the two images is a $3 \times 3$ homography matrix ***H***. Since $H$ is defined only up to scale, it has 8 degrees of freedom (DoF), and each point correspondence provides 2 equations, so we need at least 4 pairs to solve for $H$. Since we have a much larger set of matching points, we apply RANSAC to iteratively select random subsets of point pairs to estimate $H$, then choose the $H$ with the most inliers, which improves robustness to outliers.
- The implementation details are provided within the code block comments.
```python
import numpy as np

def homomat(pairs):
    """
    1) Given n matching pairs, build the 2n x 9 matrix A, then solve Ah = 0
    2) Reshape h into the 3 x 3 matrix H
    """
    A = []
    for x1, y1, x2, y2 in pairs:
        A.append([-x1, -y1, -1, 0, 0, 0, x2*x1, x2*y1, x2])
        A.append([0, 0, 0, -x1, -y1, -1, y2*x1, y2*y1, y2])
    A = np.array(A)
    # h is the right singular vector with the smallest singular value,
    # i.e. the last row of the V^T returned by np.linalg.svd
    U, S, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    H /= H[2, 2]  # normalize so that H[2, 2] = 1
    return H
```
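- Each matching pair $(x_1, y_1) \leftrightarrow (x_2, y_2)$ contributes the two rows of $A$ built in `homomat` above; they follow from cross-multiplying $x_2 = \frac{h_{11}x_1 + h_{12}y_1 + h_{13}}{h_{31}x_1 + h_{32}y_1 + h_{33}}$ (and the analogous expression for $y_2$) into linear constraints on $\mathbf{h} = (h_{11}, \dots, h_{33})^\top$:

$$
\begin{bmatrix}
-x_1 & -y_1 & -1 & 0 & 0 & 0 & x_2 x_1 & x_2 y_1 & x_2 \\
0 & 0 & 0 & -x_1 & -y_1 & -1 & y_2 x_1 & y_2 y_1 & y_2
\end{bmatrix}\mathbf{h} = \mathbf{0}
$$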
```python
# Classify matches as inliers or outliers by a given threshold
def count_error(matches, H, threshold):
    inliers = []
    for item in matches:
        x1, y1, x2, y2 = item
        p1 = np.array([x1, y1, 1])  # homogeneous coordinates in img1
        p2 = np.array([x2, y2])     # [xi', yi'] in img2 (ground truth)
        # estimation: project p1 into img2 with H
        wx, wy, w = H @ p1
        x = wx / w
        y = wy / w
        estimated_p2 = np.array([x, y])
        # error calculation for finding the inlier indices:
        # [x2, y2] and [x, y] should be as close as possible
        error = np.linalg.norm(p2 - estimated_p2)
        if error < threshold:
            inliers.append(item)
    return inliers
```
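- In other words, `count_error` keeps a match as an inlier exactly when the transfer error of its estimated projection is below the threshold:

$$
e = \left\| \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} - \frac{1}{w}\begin{pmatrix} wx \\ wy \end{pmatrix} \right\|_2,
\qquad
\begin{pmatrix} wx \\ wy \\ w \end{pmatrix} = H \begin{pmatrix} x_1 \\ y_1 \\ 1 \end{pmatrix}
$$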
```python
import random

# Use RANSAC to sample points and find the best H
def RANSAC(S, N, matches, threshold):
    best_H = None
    max_inliers = []
    # 4) Iterate (1)~(3) for N times, updating the current best H in (5)
    for _ in range(N):
        # 1) Sample S correspondences from the feature matching results
        ids = random.sample(range(len(matches)), S)
        sample_pairs = [matches[i] for i in ids]
        # 2) Compute the homography matrix based on these sampled correspondences
        try:
            H = homomat(sample_pairs)
        except np.linalg.LinAlgError:
            continue
        # 3) Check the number of inliers/outliers by a threshold
        inliers = count_error(matches, H, threshold)
        # 5) Keep the homography matrix with the smallest number of outliers
        #    (= with the largest number of inliers)
        if len(inliers) > len(max_inliers):
            max_inliers = inliers.copy()
            best_H = H.copy()
    # additional refinement:
    # recalculate a better H from all inliers of the best model
    if max_inliers:
        best_H = homomat(max_inliers)
    else:
        best_H = None
    return max_inliers, best_H
```
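- A hedged sketch of the glue between Steps 2 and 3 (the parameter values below are illustrative assumptions, not tuned results):
```python
# Build (x1, y1, x2, y2) correspondences from the Step 2 ratio-test matches
matches = [(kp1[i].pt[0], kp1[i].pt[1], kp2[j].pt[0], kp2[j].pt[1])
           for i, j in enumerate(best_match) if j != -1]

# S=4 samples (the minimum to solve H), N=1000 iterations, 5-pixel inlier threshold
max_inliers, best_H = RANSAC(S=4, N=1000, matches=matches, threshold=5)
```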
### Step 4: Warp image to create panoramic image
* Use the function ***warp*** to overlay the left image and the right image with the provided $H$.
* Then apply linear blending to create smooth and seamless transitions between the individual images that make up the panorama, as formulated below.
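* In the overlap region, each output pixel is a convex combination of the two images, where the weight $\alpha$ decreases linearly from $1$ at the left edge of the overlap to $0$ at the right edge:

$$
I(i, j) = \alpha(i, j)\, I_{\text{left}}(i, j) + \bigl(1 - \alpha(i, j)\bigr)\, I_{\text{right}}(i, j)
$$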
```python
import numpy as np
import matplotlib.pyplot as plt

def linearBlending(imgs):
    '''
    Linear blending (also known as feathering)
    '''
    img_left, img_right = imgs
    (hl, wl) = img_left.shape[:2]
    (hr, wr) = img_right.shape[:2]
    img_left_mask = np.zeros((hr, wr), dtype="int")
    img_right_mask = np.zeros((hr, wr), dtype="int")
    # Find the left-image and right-image mask regions (the non-zero pixels)
    for i in range(hl):
        for j in range(wl):
            if np.count_nonzero(img_left[i, j]) > 0:
                img_left_mask[i, j] = 1
    for i in range(hr):
        for j in range(wr):
            if np.count_nonzero(img_right[i, j]) > 0:
                img_right_mask[i, j] = 1
    # Find the overlap mask (the overlap region of the two images)
    overlap_mask = np.zeros((hr, wr), dtype="int")
    for i in range(hr):
        for j in range(wr):
            if img_left_mask[i, j] > 0 and img_right_mask[i, j] > 0:
                overlap_mask[i, j] = 1
    # Plot the overlap mask
    plt.figure(21)
    plt.title("overlap_mask")
    plt.imshow(overlap_mask.astype(int), cmap="gray")
    # Compute the alpha mask for linearly blending the overlap region
    alpha_mask = np.zeros((hr, wr))  # alpha value depends on the left image
    for i in range(hr):
        minIdx = maxIdx = -1
        for j in range(wr):
            if overlap_mask[i, j] == 1 and minIdx == -1:
                minIdx = j
            if overlap_mask[i, j] == 1:
                maxIdx = j
        if minIdx == maxIdx:  # this row has no overlap pixels, or only one
            continue
        decrease_step = 1 / (maxIdx - minIdx)
        for j in range(minIdx, maxIdx + 1):
            alpha_mask[i, j] = 1 - (decrease_step * (j - minIdx))
    linearBlending_img = np.copy(img_right)
    linearBlending_img[:hl, :wl] = np.copy(img_left)
    # Linear blending of the overlap region
    for i in range(hr):
        for j in range(wr):
            if overlap_mask[i, j] > 0:
                linearBlending_img[i, j] = alpha_mask[i, j] * img_left[i, j] + (1 - alpha_mask[i, j]) * img_right[i, j]
    return linearBlending_img
```
```python
def warp(img_left, img_right, HomoMat):
    '''
    Warp the right image to create the panoramic image.
    There are three different blending methods: noBlending, linearBlending, linearBlendingWithConstant
    '''
    (hl, wl) = img_left.shape[:2]
    (hr, wr) = img_right.shape[:2]
    # Create the big (stitch) image according to the images' heights and widths
    stitch_img = np.zeros((max(hl, hr), wl + wr, 3), dtype="int")
    stitch_img[:hl, :wl] = img_left
    # Transform the right image (its coordinates) to the destination image
    # (the coordinates of the left image) with HomoMat, using inverse warping
    inv_H = np.linalg.inv(HomoMat)
    for i in range(stitch_img.shape[0]):
        for j in range(stitch_img.shape[1]):
            coor = np.array([j, i, 1])
            img_right_coor = inv_H @ coor  # the corresponding coordinates in the right image
            img_right_coor /= img_right_coor[2]
            # Nearest-neighbor sampling (interpolation could be used instead)
            y, x = int(round(img_right_coor[0])), int(round(img_right_coor[1]))  # y for width, x for height
            # If the computed coordinates fall outside the (height, width) of the right image, skip this pixel
            if x < 0 or x >= hr or y < 0 or y >= wr:
                continue
            # Otherwise copy the transformed pixel
            stitch_img[i, j] = img_right[x, y]
    stitch_img = linearBlending([img_left, stitch_img])
    return stitch_img
```
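* A short end-to-end sketch (assumed glue code, reusing `best_H` from Step 3; the output filename is a placeholder):
```python
# Warp the right image onto the left image's plane and blend
panorama = warp(img_left, img_right, best_H)

# stitch_img is built with an int dtype, so cast before saving
cv2.imwrite('panorama.jpg', panorama.astype('uint8'))
```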
## Experimental Result
### Visualization for feature matching
#### SIFT
- $threshold=0.7$
- $threshold=0.4$
#### ORB
- $threshold=0.7$
- $threshold=0.4$
### Result
*(stitched panorama results)*
## Discussion
- In the feature matching step, the commonly used ratio-test threshold is between 0.7 and 0.8.
- In RANSAC, the pixel threshold that decides whether a point counts as an inlier is typically between 3 and 10 pixels; the best choice varies with factors like image resolution, feature density, and noise level.
- The library function ***cv2.findHomography*** takes additional considerations into account compared with our hand-crafted version, which may be why its stitching is smoother than ours:
1) A built-in RANSAC with **finely tuned thresholds** and **convergence criteria**
2) ***Normalization*** of points, helping ***minimize computational errors*** when calculating the homography matrix (sketched below)
- Centering: Translate the points so their centroid is at the origin.
- Scaling: Adjust the scale so that the average distance of the points from the origin is a constant.
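- A minimal sketch of that normalization step (Hartley's normalized DLT; the target scale $\sqrt{2}$ is the standard choice and an assumption here, not read from cv2's source):
```python
import numpy as np

def normalize_points(pts):
    # Translate the points so their centroid is at the origin,
    # then scale so the average distance from the origin is sqrt(2)
    pts = np.asarray(pts, dtype=float)
    centroid = pts.mean(axis=0)
    avg_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    s = np.sqrt(2) / avg_dist
    # Similarity transform that applies the centering and scaling
    T = np.array([[s, 0, -s * centroid[0]],
                  [0, s, -s * centroid[1]],
                  [0, 0, 1]])
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    normalized = (T @ pts_h.T).T[:, :2]
    return normalized, T
```
- The homography $\hat{H}$ estimated from the normalized points is then de-normalized as $H = T_2^{-1}\hat{H}T_1$, where $T_1$ and $T_2$ are the transforms computed for the points of the two images.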
- RANSAC is refined with the code below, recalculating the best homography matrix from all inliers of the best model.
```python
if max_inliers:
best_H = homomat(max_inliers)
else:
best_H = None
```
- To increase robustness and improve the output image, we use blending to create smooth and seamless transitions between the individual images that make up the panorama.
## Conclusion
- We have successfully implemented an automatic panoramic image stitching pipeline, using SIFT for feature detection and matching, and RANSAC for homography estimation.
- We have also addressed challenges in feature matching accuracy and image alignment, achieving stable and precise stitching results.
- This project enhanced our understanding of image processing and feature-based matching, providing a foundation for future applications.
## Work Assignment Plan Between Team Members
- 313560007 蕭皓隆: coding (steps 1 & 2), composing report
- 313554015 周禹彤: coding (step 3), composing report
- 313551011 李佾: coding (step 4), composing report