# Camera Calibration

###### tags: `computer vision`

<style>
figure {
    padding: 4px;
    margin: auto;
    text-align: center;
}

figcaption {
    background-color: black;
    color: white;
    font-style: italic;
    padding: 1px;
    text-align: center;
}
</style>

:::info
Find a camera's intrinsic and extrinsic parameters, which describe the projection from the 3D world coordinate system to the 2D image coordinate system.
:::

![image](https://hackmd.io/_uploads/ByXUvdFtT.png)

## Perspective Projection

Converting from camera coordinates $(x_{c}, y_{c}, z_{c})$ to image coordinates $(u, v)$ involves three steps:

1. Length projection using the [pinhole camera formula](https://hackmd.io/8y0Ez6DVSOy5TeL_LFWw9w#Pinhole-Camera).
2. Unit conversion (e.g. mm -> pixels).
3. Moving the origin of the image coordinate system to the image corner.

![image](https://hackmd.io/_uploads/HkdLKP_Y6.png)

This gives us the following formulas for converting coordinates:

<figure>
<img src="https://hackmd.io/_uploads/B1lKsPOF6.png" width="500">
</figure>

For simplicity, we usually combine the parameters $m$ and $f$ into $f_{x}$ and $f_{y}$:

$$
u = f_{x} \frac{x_{c}}{z_{c}} + o_{x}
$$

$$
v = f_{y} \frac{y_{c}}{z_{c}} + o_{y}
$$

Great! We know how to do projection now. However, the formula above is a ==non-linear transformation==, because it divides by one of the inputs (namely $z_{c}$). That is, there is no constant matrix $M$ that lets us rewrite the formula above as:

$$
\begin{bmatrix} u \\ v \end{bmatrix} = M \begin{bmatrix} x_{c} \\ y_{c} \\ z_{c} \end{bmatrix}
$$

This nonlinearity prevents us from taking advantage of the convenient computational properties of linear maps. It raises the question: "is it possible to represent the transformation as a **matrix-vector product** despite its nonlinearity?" The answer is yes. One solution is to use the homogeneous coordinate system.
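Before moving on, the two projection formulas can be checked numerically. The following is a minimal Python sketch; the function name `project` and the intrinsic values are made up for illustration and do not come from any real camera. It also shows the nonlinearity: scaling the input point does not scale the output pixel.

```python
def project(point_c, fx, fy, ox, oy):
    """Project a camera-frame point (x_c, y_c, z_c) to pixel coordinates (u, v)."""
    x_c, y_c, z_c = point_c
    if z_c <= 0:
        raise ValueError("point must be in front of the camera (z_c > 0)")
    u = fx * x_c / z_c + ox
    v = fy * y_c / z_c + oy
    return u, v


# Hypothetical intrinsics, in pixels (assumed values for illustration only).
FX, FY, OX, OY = 800.0, 800.0, 320.0, 240.0

p = (0.25, -0.125, 2.0)
u, v = project(p, FX, FY, OX, OY)

# Doubling the point leaves the pixel unchanged, so project(2p) != 2 * project(p):
# the map cannot be a plain matrix product on (x_c, y_c, z_c).
p2 = tuple(2 * c for c in p)
u2, v2 = project(p2, FX, FY, OX, OY)

print(u, v)
print(u2, v2)
```

Both `print` calls show the same pixel, which is exactly the behaviour a linear map on $(x_{c}, y_{c}, z_{c})$ could not produce.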
### Homogeneous Coordinate

#### Euclidean -> Homogeneous

$$
\begin{bmatrix} x \\ y \end{bmatrix}
\Rightarrow
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\equiv
\begin{bmatrix} xw \\ yw \\ w \end{bmatrix}
$$

#### Homogeneous -> Euclidean

$$
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
\equiv
\begin{bmatrix} x/z \\ y/z \\ 1 \end{bmatrix}
\Rightarrow
\begin{bmatrix} x/z \\ y/z \end{bmatrix}
$$

#### Advantage

1. Linear transformation

    After converting to the homogeneous coordinate system, we can use a **matrix-vector product** for the perspective projection $(x_{c}, y_{c}, z_{c}, 1) \rightarrow (u, v, 1)$:

    $$
    \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
    \equiv
    \begin{bmatrix} uz_{c} \\ vz_{c} \\ z_{c} \end{bmatrix}
    =
    \begin{bmatrix}
    f_{x} & 0 & o_{x} & 0 \\
    0 & f_{y} & o_{y} & 0 \\
    0 & 0 & 1 & 0
    \end{bmatrix}
    \begin{bmatrix} x_{c} \\ y_{c} \\ z_{c} \\ 1 \end{bmatrix}
    $$

2. Infinity representation

    Homogeneous coordinates can represent points at infinity. For example, a point at infinity can be written as $(u, v, 0)$. This property is useful for representing vanishing points on the 2D plane.

3. Intuitive projection

    A pixel on the 2D image plane actually corresponds to many points in the 3D world (every point along one viewing ray). Homogeneous coordinates represent all of them with a single coordinate:

    $$
    \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
    \equiv
    \begin{bmatrix} uw \\ vw \\ w \end{bmatrix}
    $$

    ![image](https://hackmd.io/_uploads/Hkim4XtKT.png)

## Camera Calibration Matrix

![image](https://hackmd.io/_uploads/ByXUvdFtT.png)

### Intrinsic Matrix

We can decompose the linear transformation matrix in homogeneous coordinates a bit further:

$$
P' = M_{int}P_{c} =
\begin{bmatrix}
f_{x} & 0 & o_{x} & 0 \\
0 & f_{y} & o_{y} & 0 \\
0 & 0 & 1 & 0
\end{bmatrix} P_{c}
=
\begin{bmatrix}
f_{x} & 0 & o_{x} \\
0 & f_{y} & o_{y} \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} I & 0 \end{bmatrix} P_{c}
= K \begin{bmatrix} I & 0 \end{bmatrix} P_{c}
$$

Here $P_{c} = (x_{c}, y_{c}, z_{c}, 1)^{T}$ is the point in homogeneous camera coordinates and $P'$ is its image point in homogeneous coordinates. The matrix $K$ is often referred to as the camera intrinsic matrix.
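The intrinsic projection above can be sketched in a few lines of NumPy. This is only an illustration under assumed values: the intrinsics and the test point are hypothetical, and the variable names (`M_int`, `P_c`, `p_h`) are chosen here to mirror the symbols in the equation.

```python
import numpy as np

# Hypothetical intrinsics, in pixels (assumed values for illustration only).
fx, fy, ox, oy = 800.0, 800.0, 320.0, 240.0

K = np.array([[fx, 0.0, ox],
              [0.0, fy, oy],
              [0.0, 0.0, 1.0]])

# [I | 0] drops the trailing 1 of the homogeneous camera-frame point.
I0 = np.hstack([np.eye(3), np.zeros((3, 1))])
M_int = K @ I0                              # 3x4 matrix K [I | 0]

P_c = np.array([0.25, -0.125, 2.0, 1.0])    # (x_c, y_c, z_c, 1)
p_h = M_int @ P_c                           # (u * z_c, v * z_c, z_c)
u, v = p_h[:2] / p_h[2]                     # homogeneous -> Euclidean

# Any non-zero multiple of the homogeneous point maps to the same pixel.
q_h = M_int @ (3.0 * P_c)
u_scaled, v_scaled = q_h[:2] / q_h[2]

print(p_h, (u, v), (u_scaled, v_scaled))
```

The division by the last component is the only nonlinear step, and it happens after the matrix product, which is what makes the homogeneous form convenient.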
### Extrinsic Matrix

Next, we have to convert points from the world reference system to the camera reference system. This relationship is captured by a rotation matrix $R$ and a translation vector $t$:

$$
P_{c} = M_{ext}P_{w} =
\begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} P_{w}
$$

Here $P_{w} = (x_{w}, y_{w}, z_{w}, 1)^{T}$ is the point in homogeneous world coordinates and $P_{c}$ is the same point in homogeneous camera coordinates.

### Projection Matrix

Combining the intrinsic matrix and the extrinsic matrix, we can derive the projection matrix $M$, which maps a world point $P_{w}$ to its image point $P'$:

$$
P' = MP_{w} = M_{int}M_{ext}P_{w} =
K \begin{bmatrix} I & 0 \end{bmatrix}
\begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} P_{w}
= \boxed{K \begin{bmatrix} R & t \end{bmatrix} P_{w}}
$$

Extended question: how many degrees of freedom does the projection matrix have? <br>
Ans: 5+3+3 = 11 DoF <br>
- $K$: 5 DoF ($f_{x}$, $f_{y}$, $o_{x}$, $o_{y}$, plus a skew term that is taken to be zero in the $K$ above)
- $R$: 3 DoF
- $t$: 3 DoF

## Solve Intrinsic / Extrinsic Matrix by DLT

The Direct Linear Transform (DLT) is an algorithm that solves a [homogeneous system](https://hackmd.io/@jackyyeh/HyXcLNewp).

**Step 1:** Capture an image of an object with known geometry

![image](https://hackmd.io/_uploads/HJ-QeN9Ip.png)

**Step 2:** Identify the correspondences between 3D scene points and image points

![image](https://hackmd.io/_uploads/r12_Qa5Fa.png)

**Step 3:** Expand the matrix equation into linear equations for each correspondence pair. Each correspondence pair contributes two constraint equations. The unknown entries of the projection matrix are written as $p_{ij}$.

$$
P' = M P_{w}
$$

$$
\underbrace{
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
}_\text{known}
=
\underbrace{
\begin{bmatrix}
p_{11} & p_{12} & p_{13} & p_{14} \\
p_{21} & p_{22} & p_{23} & p_{24} \\
p_{31} & p_{32} & p_{33} & p_{34}
\end{bmatrix}
}_\text{unknown}
\underbrace{
\begin{bmatrix} x_{w} \\ y_{w} \\ z_{w} \\ 1 \end{bmatrix}
}_\text{known}
$$

Two constraint equations:

![image](https://hackmd.io/_uploads/SkoRr6qYT.png)

**Step 4:** Rearrange the terms into a homogeneous system <br>
With 12 unknown elements and two equations per correspondence pair (see **step 3**), we need at least 6 correspondence pairs.

![image](https://hackmd.io/_uploads/HyaHlVqUp.png)

**Step 5:** Solve for the entries $p_{ij}$ of the projection matrix using the [homogeneous least square solution](https://hackmd.io/@jackyyeh/HyXcLNewp).
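Steps 3 to 5 can be sketched with NumPy as follows. This is a rough illustration rather than a reference implementation: the function name `dlt_projection_matrix`, the synthetic camera, and the random test points are all assumptions made here, and no coordinate normalization or noise handling is included. The homogeneous least-squares problem is solved with an SVD, taking the right singular vector of the smallest singular value.

```python
import numpy as np


def dlt_projection_matrix(world_pts, image_pts):
    """Estimate the 3x4 projection matrix from >= 6 3D-2D correspondences."""
    world_pts = np.asarray(world_pts, dtype=float)
    image_pts = np.asarray(image_pts, dtype=float)
    if len(world_pts) < 6 or len(world_pts) != len(image_pts):
        raise ValueError("need at least 6 matched correspondences")

    rows = []
    for (x, y, z), (u, v) in zip(world_pts, image_pts):
        X = [x, y, z, 1.0]
        # row1 . X - u * (row3 . X) = 0
        rows.append(X + [0.0] * 4 + [-u * c for c in X])
        # row2 . X - v * (row3 . X) = 0
        rows.append([0.0] * 4 + X + [-v * c for c in X])
    A = np.array(rows)  # shape (2n, 12)

    # Minimise ||A p|| subject to ||p|| = 1: the right singular vector
    # belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 4)


# Synthetic check with a hypothetical camera (assumed values, noise-free data).
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([[0.1], [-0.2], [5.0]])
M_true = K @ np.hstack([R, t])

rng = np.random.default_rng(0)
world = rng.uniform(-1.0, 1.0, size=(8, 3))        # 8 non-coplanar points
homog = np.hstack([world, np.ones((8, 1))])
proj = (M_true @ homog.T).T
image = proj[:, :2] / proj[:, 2:3]                 # pixel coordinates (u, v)

M_est = dlt_projection_matrix(world, image)
M_est = M_est / M_est[2, 3]                        # fix the arbitrary scale
print(np.round(M_est, 3))
```

The estimate is only compared after dividing by one entry, because the SVD returns a unit-norm vector with an arbitrary sign and scale.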
<br> Note that the solution is not unique, because the projection matrix is defined only up to a scale (refer to the [supplement](https://hackmd.io/o0UqmikhQdKBHLiaY5FV-Q?both#Supplement) for an explanation).

**Step 6:** Decompose the projection matrix into the intrinsic and extrinsic matrices

- Find $K$ and $R$ using QR factorization (strictly an RQ factorization of the left $3 \times 3$ block $KR$, since $K$ is upper triangular and $R$ is orthonormal)

    ![image](https://hackmd.io/_uploads/SkmIn4cUp.png)

- Find $t$

    ![image](https://hackmd.io/_uploads/B10i34cUp.png)

## Supplement

- The projection matrix is defined only [up to a scale](https://stackoverflow.com/questions/17114880/up-to-a-scale-factor).

    ![image](https://hackmd.io/_uploads/ryDIbN9UT.png)

    ![image](https://hackmd.io/_uploads/HJe6W4c86.png)

- [Perspective-n-Point (PnP)](https://hackmd.io/@jackyyeh/BJxMZtUUT/%2FXoqLoirfTHmv0RN7n9e-Cw) is another calibration algorithm, which estimates the extrinsic parameters when the camera intrinsic matrix is already known. We will discuss it in a later post.

## Reference

- [Camera Calibration | Camera Calibration](https://www.youtube.com/watch?v=GUbWsXU1mac)
- [Intrinsic and Extrinsic Matrices | Camera Calibration](https://www.youtube.com/watch?v=2XM2Rb2pfyQ)