# Computer Vision Questions

## Camera parameters

### Intrinsics

![](https://i.imgur.com/IftKVFI.jpg)

```
K(3, 3) = [
  [fx,  a, cx],  # f: focal length (camera center to image plane)
  [ 0, fy, cy],  # a: skew factor
  [ 0,  0,  1]   # c: optical center
]
```

### Extrinsics

```
H (Homogeneous) = [
  [R00, R01, R02, tx],  # 3-by-3 rotation matrix
  [R10, R11, R12, ty],  # 3-by-1 translation vector
  [R20, R21, R22, tz],
  [  0,   0,   0,  1]
]

Accumulation: P(t=0) @ P(t=1) = P(t=2)
```

### Projection

```
Point in 3D space (homogeneous):    P = [X, Y, Z, 1]^T
Point in the 2D image (homogeneous):    [x, y, 1]^T

 [x]                   [fx,  a, cx]   [R00, R01, R02, tx]   [X]
s[y] = K @ [R|t] @ P = [ 0, fy, cy] @ [R10, R11, R12, ty] @ [Y]
 [1]                   [ 0,  0,  1]   [R20, R21, R22, tz]   [Z]
                                                            [1]

Where s is the scaling factor and [R|t] is the top 3-by-4 block of H.
```

Rotation has 3 DOFs and translation has 3 more, so the extrinsics have 6 DOFs in total. The projection is a linear map in homogeneous coordinates (recovering the pixel coordinates requires dividing by `s`).

## Pose representation

### General

```
H (Homogeneous) = [
  [R00, R01, R02, tx],  // 3-by-3 rotation matrix
  [R10, R11, R12, ty],  // 3-by-1 translation vector
  [R20, R21, R22, tz],
  [  0,   0,   0,  1]
]
```

Characteristics of the rotation matrix:

* The rows are unit vectors that are orthogonal to each other (inner product = 0). The same holds for the columns.
* Determinant `det(R) = 1`.
* The inverse is the transpose: `R^T = inv(R)`.

```
Rx(t) = [[1,      0,       0],
         [0, cos(t), -sin(t)],
         [0, sin(t),  cos(t)]]

Ry(t) = [[ cos(t), 0, sin(t)],
         [      0, 1,      0],
         [-sin(t), 0, cos(t)]]

Rz(t) = [[cos(t), -sin(t), 0],
         [sin(t),  cos(t), 0],
         [     0,       0, 1]]
```

### Roll-Pitch-Yaw

![](https://i.imgur.com/PjahU0b.jpg)

Aircraft principal axes: the axes are fixed to the target object itself instead of the global frame.

```
Rotation matrix R = Rx(roll) @ Ry(pitch) @ Rz(yaw)
```

The multiplication order is convention-dependent, so check which convention a given library uses.

### Euler Angles

![](https://i.imgur.com/lbgGX6z.png)

Each rotation is defined about an axis of the current (already rotated) body frame, so the order of the matrix multiplications matters.

### Quaternions

A 4-element vector that represents a rotation.

```
q = a + bi + cj + dk  # (a, b, c, d are real numbers; i, j, k are basic quaternions).
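# Minimal Python sketch, not from the original notes. Assumptions: Hamilton
# convention, and q is normalized to a unit quaternion before conversion.
# The helper name quat_to_rot is hypothetical.
import numpy as np

def quat_to_rot(a, b, c, d):
    """Rotation matrix equivalent to the quaternion a + bi + cj + dk."""
    n = np.sqrt(a * a + b * b + c * c + d * d)
    a, b, c, d = a / n, b / n, c / n, d / n  # enforce unit length
    return np.array([
        [1 - 2 * (c * c + d * d), 2 * (b * c - a * d),     2 * (b * d + a * c)],
        [2 * (b * c + a * d),     1 - 2 * (b * b + d * d), 2 * (c * d - a * b)],
        [2 * (b * d - a * c),     2 * (c * d + a * b),     1 - 2 * (b * b + c * c)],
    ])

# A rotation by angle t about the z axis is q = cos(t/2) + sin(t/2) k,
# which should reproduce Rz(t) from the "General" section.
t = np.pi / 3
R = quat_to_rot(np.cos(t / 2), 0.0, 0.0, np.sin(t / 2))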
```

## Epipolar Geometry and Homography

### Epipolar Geometry

![](https://i.imgur.com/JMZ2w5K.png)

- Camera centers (`O_L, O_R`)
- Epipolar point / epipole (`e_L, e_R`): The projection of each camera center onto the other camera's image plane.
- Epipolar lines (`e_R:X_R`): Take the ray `XO_L` as an example. All points on it project to the same point on the left frame. The image of this ray in the right frame is the epipolar line. Since the ray contains `O_L`, whose image in the right frame is the epipole `e_R`, all epipolar lines pass through the epipole.
- Epipolar plane (`X, O_L, O_R`): Intersects each image plane along its epipolar line.

### Homography

```
Point before homographic warping: [u, v, 1]^T
Point after homographic warping:  [x, y, 1]^T

 [x]   [h11, h12, h13]   [u]   [fx,  a, cx]   [r_xu, r_xv, tx]   [u]
s[y] = [h21, h22, h23] @ [v] = [ 0, fy, cy] @ [r_yu, r_yv, ty] @ [v]
 [1]   [h31, h32, h33]   [1]   [ 0,  0,  1]   [r_zu, r_zv, tz]   [1]

Where s is the scaling factor and [u, v] are coordinates on a plane.
The homography matrix has 9 entries but only 8 DOFs, because it is defined only up to the scaling factor.
```

Each point correspondence yields two equations, so at least 4 correspondences (with no three points collinear) are needed to estimate the homography matrix. The 8-point algorithm is a different method: it estimates the fundamental matrix, not the homography.

## Essential and Fundamental matrices
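
As a bridge from the epipolar geometry above, here is a minimal numerical sketch of the epipolar constraint. It is not taken from these notes: it assumes the standard textbook definitions `E = [t]_x R` and `F = K^-T E K^-1` with the same `K` for both cameras, and the intrinsics, pose, and 3D point are made-up values. The helper names `skew` and `project` are hypothetical.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def project(K, R, t, X):
    """Pinhole projection s * x = K @ (R @ X + t), returned as [x, y, 1]."""
    x = K @ (R @ X + t)
    return x / x[2]

# Made-up values, for illustration only.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
theta = np.deg2rad(10.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])  # Ry(theta)
t = np.array([-0.5, 0.0, 0.1])
X = np.array([0.3, -0.2, 4.0])  # 3D point expressed in the left camera frame

# The left camera is the reference frame; the right camera sees R @ X + t.
x_left = project(K, np.eye(3), np.zeros(3), X)
x_right = project(K, R, t, X)

K_inv = np.linalg.inv(K)
E = skew(t) @ R          # essential matrix (assumed definition)
F = K_inv.T @ E @ K_inv  # fundamental matrix (assumed definition)

residual = x_right @ F @ x_left  # epipolar constraint, expected to be ~0
line_right = F @ x_left          # epipolar line in the right image (l . x = 0)
e_right = K @ t                  # image of O_L in the right frame (the epipole)
e_right = e_right / e_right[2]
```

Under these assumptions, `residual` vanishes up to floating-point error, and `e_right @ F` is also close to zero, which matches the statement that every epipolar line passes through the epipole.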