
# Camera model

###### tags: `cvdl2020`

## Pinhole camera

Consider the central projection of points in space onto a plane. Let the centre of projection be the origin of a Euclidean coordinate system, and consider the plane $Z=f$, which is called the **image plane** or **focal plane**.

![](https://i.imgur.com/Zwg2hEc.jpg)

A point $(X,Y,Z)^T$ in 3D world coordinates is mapped to the point $(f X/Z, fY/Z, f)^T$ on the image plane. Ignoring the final image coordinate, we conclude that

$$(X,Y,Z)^T \rightarrow (f X/Z, fY/Z)^T$$

Notice that this is a mapping from Euclidean space $R^3$ to Euclidean space $R^2$.

The center of projection is called the **camera center** or **optical center**. The line from the camera center perpendicular to the image plane is called the **principal axis** or **principal ray** of the camera. The point where the principal axis meets the image plane is called the **principal point**. The plane through the camera center parallel to the image plane is called the **principal plane** of the camera.

### Central projection using homogeneous coordinates

If the world and image points are represented by homogeneous vectors, then central projection can be expressed as a linear mapping between their homogeneous coordinates (another reason why homogeneous coordinates are useful). The mapping from world coordinates to the image plane

$$(X,Y,Z)^T \rightarrow (f X/Z, fY/Z)^T$$

can be written as

$$
\begin{bmatrix}
fX \\
fY \\
Z \\
\end{bmatrix}
=
\begin{pmatrix}
f & 0 & 0 & 0 \\
0 & f & 0 & 0 \\
0 & 0 & 1 & 0 \\
\end{pmatrix}
\begin{pmatrix}
X \\
Y \\
Z \\
1 \\
\end{pmatrix}
$$

or

$$x = PX$$

where

- $x$ is the image point represented by a homogeneous 3-vector
- $P$ is a 3x4 homogeneous **camera projection matrix**
- $X$ is the world point in $P^3$, represented by a homogeneous 4-vector

### Principal point offset

The expressions above assume that the origin of coordinates in the image plane is at the principal point. In practice, it may not be.
So the mapping becomes:

$$(X,Y,Z)^T \rightarrow (f X/Z + p_x, fY/Z + p_y)^T$$

where $(p_x, p_y)^T$ are the coordinates of the principal point. In homogeneous coordinates:

$$
\begin{bmatrix}
fX + Z p_x \\
fY + Z p_y \\
Z \\
\end{bmatrix}
=
\begin{pmatrix}
f & 0 & p_x & 0 \\
0 & f & p_y & 0 \\
0 & 0 & 1 & 0 \\
\end{pmatrix}
\begin{pmatrix}
X \\
Y \\
Z \\
1 \\
\end{pmatrix}
$$

or

$$x = K[I|0] X_{cam}$$

where

$$
K =
\begin{bmatrix}
f & 0 & p_x \\
0 & f & p_y \\
0 & 0 & 1 \\
\end{bmatrix}
$$

is the 3x3 **camera calibration matrix**, and the subscript in $X_{cam}$ emphasises that the point is expressed in the camera coordinate frame, whose origin is the camera center and whose $Z$-axis is the principal axis.

### Camera rotation and translation

Let $\tilde X$ be an inhomogeneous 3-vector representing the coordinates of a point in the world coordinate frame, and let $\tilde X_{cam}$ represent the same point in the camera coordinate frame. Since the two coordinate frames are related via a rotation and a translation, we may write $\tilde X_{cam} = R(\tilde X - \tilde C)$, where $\tilde C$ is the coordinate of the camera center in the world coordinate frame and $R$ is a 3x3 rotation matrix representing the orientation of the camera coordinate frame. In homogeneous coordinates:

$$
X_{cam} =
\begin{bmatrix}
R & -R \tilde C \\
0 & 1 \\
\end{bmatrix}
\begin{pmatrix}
X \\
Y \\
Z \\
1 \\
\end{pmatrix}
=
\begin{bmatrix}
R & -R \tilde C \\
0 & 1 \\
\end{bmatrix}
X
$$

Substituting into $x = K[I|0]X_{cam}$ finally gives

$$
x = KR[I|-\tilde C]X
$$

One sees that a general pinhole camera $P=KR[I|-\tilde C]$ has **9** dof: **3** for $K$ ($f,p_x,p_y$), **3** for $R$, and **3** for $\tilde C$.
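The composition $P = KR[I|-\tilde C]$ can be checked numerically. The following is a minimal NumPy sketch, not part of the original notes: the helper names `projection_matrix` and `project` and all numeric values (focal length, principal point, camera center) are assumptions chosen for illustration.

```python
import numpy as np

def projection_matrix(f, px, py, R, C):
    """Build P = K R [I | -C] for a pinhole camera (illustrative helper)."""
    K = np.array([[f, 0.0, px],
                  [0.0, f, py],
                  [0.0, 0.0, 1.0]])
    # [I | -C] is 3x4: the identity next to the negated camera center.
    I_C = np.hstack([np.eye(3), -np.asarray(C, dtype=float).reshape(3, 1)])
    return K @ np.asarray(R, dtype=float) @ I_C

def project(P, X_world):
    """Project an inhomogeneous 3D point and dehomogenise to 2D."""
    X_h = np.append(np.asarray(X_world, dtype=float), 1.0)  # homogeneous 4-vector
    x_h = P @ X_h                                            # homogeneous 3-vector
    return x_h[:2] / x_h[2]

# Assumed example values: f = 800, principal point (320, 240),
# camera axes aligned with the world axes, camera center at (0, 0, -5).
P = projection_matrix(800.0, 320.0, 240.0, np.eye(3), [0.0, 0.0, -5.0])
uv_axis = project(P, [0.0, 0.0, 5.0])  # point on the principal axis
uv_off = project(P, [1.0, 0.5, 5.0])   # point off the principal axis
print(uv_axis, uv_off)
```

With these assumed values, a point on the principal axis lands on the principal point, which is a quick sanity check that the offset $(p_x, p_y)$ and the translation by $-\tilde C$ are applied in the intended order.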
### More on intrinsic & extrinsic parameters

![](https://i.imgur.com/hXQl5bN.png)

$$
\lambda
\begin{bmatrix}
\mu \\
\nu \\
1 \\
\end{bmatrix}
=
\begin{bmatrix}
\alpha & \gamma & \mu_0 \\
0 & \beta & \nu_0 \\
0 & 0 & 1 \\
\end{bmatrix}
\begin{bmatrix}
R_{11} & R_{12} & R_{13} & T_1 \\
R_{21} & R_{22} & R_{23} & T_2 \\
R_{31} & R_{32} & R_{33} & T_3 \\
\end{bmatrix}
\begin{bmatrix}
X \\
Y \\
Z \\
1 \\
\end{bmatrix}
= KE
\begin{bmatrix}
X \\
Y \\
Z \\
1 \\
\end{bmatrix}
= P
\begin{bmatrix}
X \\
Y \\
Z \\
1 \\
\end{bmatrix}
$$

where

$$
\begin{bmatrix}
\mu \\
\nu \\
1 \\
\end{bmatrix}
$$

is the image coordinate,

$$
\begin{bmatrix}
X \\
Y \\
Z \\
1 \\
\end{bmatrix}
$$

is the world coordinate point,

$$
\begin{bmatrix}
R_{11} & R_{12} & R_{13} \\
R_{21} & R_{22} & R_{23} \\
R_{31} & R_{32} & R_{33} \\
\end{bmatrix}
$$

is the **rotation matrix**, and

$$
\begin{bmatrix}
T_1 \\
T_2 \\
T_3 \\
\end{bmatrix}
$$

is the **translation vector**.

Parameters:

- $\alpha = - f \times k_{\mu}$
- $\beta = - f \times k_{\nu}$
- $\gamma$ is the **skew factor**
- $(\mu_0, \nu_0)$ is the **principal point**
- $\lambda$ is the homogeneous **scale factor**; because the last row of $K$ is $(0, 0, 1)$, it equals the depth of the point along the camera's $Z$-axis

#### Intrinsic Parameter

The **intrinsic parameter** matrix $K$ is a projection matrix that maps **3D camera coordinates** to **2D pixel coordinates (undistorted)**:

$$
K =
\begin{bmatrix}
-f \times k_{\mu} & \gamma & \mu_0 \\
0 & -f \times k_{\nu} & \nu_0 \\
0 & 0 & 1 \\
\end{bmatrix}
$$

#### Extrinsic Parameter

The **extrinsic parameter** matrix $E$ is a rotation and translation matrix that maps **3D world coordinates** to **3D camera coordinates**:

$$
E =
\begin{bmatrix}
R_{11} & R_{12} & R_{13} & T_1 \\
R_{21} & R_{22} & R_{23} & T_2 \\
R_{31} & R_{32} & R_{33} & T_3 \\
\end{bmatrix}
$$

#### Camera calibration

$K$ has 7 dof in theory and 8 dof in practice. Why then do most textbooks treat $K$ as an upper-triangular matrix with 5 dof? It's because we cannot recover the full $K$ matrix based on external measurements alone.
When calibrating a camera based on external 3D points or other measurements, we end up estimating the $K$ and $E$ camera parameters simultaneously. Given a full 3x4 camera matrix $P$, we can compute an upper-triangular $K$ matrix by applying a QR-type factorization (RQ decomposition) to its left 3x3 block. (Note that in linear algebra books $R$ denotes an upper-triangular matrix, while in CV $R$ denotes an orthogonal rotation matrix.)

#### Summary

```graphviz
digraph G{
    graph [fontname=Arial, compound=true];
    node [shape=record,style=filled, fillcolor=aquamarine];
    edge [fontcolor=red];
    rankdir=LR;

    subgraph cluster_0 {
        rankdir=LR;
        label="3D";
        world [label="World"];
        cam [label="Camera"];
    }

    subgraph cluster_2 {
        label="2D";
        pixel [label="Pixel (Undistorted)"];
        img [label="Image (Distorted)"];
    }

    // edges
    pixel->img[label="D", dir="both"];
    world->cam[label="E", dir="both"];
    cam->pixel[label="K", dir="both"];
}
```

### Distortion

Distortion is the change introduced into the image when light passes through the lens.

Given a 2D **distorted** image point $(\mu_d, \nu_d)$, we want to find the **undistorted** image point $(\mu_u, \nu_u)$.

- Radial (Barrel) distortion:
    - $\mu_u = \mu_d (1+k_1 r^2 + k_2 r^4 + k_3 r^6)$
    - $\nu_u = \nu_d (1+k_1 r^2 + k_2 r^4 + k_3 r^6)$
    - Here $r$ is the distance of the point from the distortion center, i.e. $r^2 = \mu_d^2 + \nu_d^2$ when coordinates are taken relative to that center.
- Tangential distortion: skip
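The radial model above can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function name `undistort_radial` is hypothetical, the coefficient values are made up, and it assumes the coordinates are already expressed relative to the distortion center so that $r^2 = \mu_d^2 + \nu_d^2$.

```python
import numpy as np

def undistort_radial(mu_d, nu_d, k1, k2=0.0, k3=0.0):
    """Apply (mu_u, nu_u) = (mu_d, nu_d) * (1 + k1 r^2 + k2 r^4 + k3 r^6).

    Assumes (mu_d, nu_d) are measured relative to the distortion center.
    """
    mu_d = np.asarray(mu_d, dtype=float)
    nu_d = np.asarray(nu_d, dtype=float)
    r2 = mu_d**2 + nu_d**2                      # squared radius r^2
    factor = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    return mu_d * factor, nu_d * factor

# Assumed coefficient k1 = 0.1, chosen only for illustration.
mu_u, nu_u = undistort_radial(0.5, 0.0, k1=0.1)
mu_c, nu_c = undistort_radial(0.0, 0.0, k1=0.1)  # the center itself is unchanged
print(mu_u, nu_u, mu_c, nu_c)
```

Because the correction factor depends only on $r$, points are moved along the line through the distortion center, and the center itself stays fixed.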