# Camera Calibration
###### tags: `computer vision`
<style>
figure {
padding: 4px;
margin: auto;
text-align: center;
}
figcaption {
background-color: black;
color: white;
font-style: italic;
padding: 1px;
text-align: center;
}
</style>
:::info
Find a camera's intrinsic and extrinsic parameters, which describe the projection from the 3D world coordinate system to the 2D image coordinate system.
:::

## Perspective Projection
Converting from camera coordinates $(x_{c}, y_{c}, z_{c})$ to image coordinates $(u, v)$ involves three tasks:
1. Perspective projection using the [pinhole camera formula](https://hackmd.io/8y0Ez6DVSOy5TeL_LFWw9w#Pinhole-Camera).
2. Unit conversion (e.g. mm -> pixels).
3. Shifting the origin of the image coordinates to the image corner.

This gives us the following formula for converting coordinates:
<figure>
<img src="https://hackmd.io/_uploads/B1lKsPOF6.png" width="500">
</figure>
For simplicity, we usually combine the parameters $m$ and $f$ into $f_{x}$ and $f_{y}$:
$$
u = f_{x} \frac{x_{c}}{z_{c}} + o_{x}
$$
$$
v = f_{y} \frac{y_{c}}{z_{c}} + o_{y}
$$
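The two formulas above can be sketched directly in code. This is a minimal illustration, not a reference implementation: the function name `project_point` and the numeric intrinsics are assumptions made up for the example.

```python
def project_point(xc, yc, zc, fx, fy, ox, oy):
    """Project a camera-frame point (xc, yc, zc) to pixel coordinates (u, v)."""
    if zc <= 0:
        # The pinhole formula only makes sense for points in front of the camera.
        raise ValueError("point must be in front of the camera (zc > 0)")
    u = fx * xc / zc + ox
    v = fy * yc / zc + oy
    return u, v

# Hypothetical intrinsics: focal lengths in pixels and principal point.
u, v = project_point(0.5, -0.25, 2.0, fx=800.0, fy=800.0, ox=320.0, oy=240.0)
print(u, v)  # 520.0 140.0
```

Note the division by `zc`: this is the step that makes the mapping non-linear.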
Great! We now know how to do the projection. However, the formula above is a ==non-linear transformation== because it divides by one of the inputs (namely $z_{c}$). That is, we cannot rewrite the formula above as:
$$
\begin{bmatrix}
u \\
v
\end{bmatrix} =
M
\begin{bmatrix}
x_{c} \\
y_{c} \\
z_{c}
\end{bmatrix}
$$
This nonlinearity prevents us from taking advantage of the convenient computational properties of linear maps. It raises the question: "is it possible to represent the transformation as a **matrix-vector product** despite its nonlinearity?"
The answer is yes. One solution is to use the homogeneous coordinate system.
### Homogeneous Coordinate
#### Euclidean -> Homogeneous
$$
\begin{bmatrix}
x \\
y
\end{bmatrix}
\Rightarrow
\begin{bmatrix}
x \\
y \\
1
\end{bmatrix}
\equiv
\begin{bmatrix}
xw \\
yw \\
w
\end{bmatrix}
$$
#### Homogeneous -> Euclidean
$$
\begin{bmatrix}
x \\
y \\
z
\end{bmatrix}
\equiv
\begin{bmatrix}
x/z \\
y/z \\
1
\end{bmatrix}
\Rightarrow
\begin{bmatrix}
x/z \\
y/z \\
\end{bmatrix}
$$
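The two conversions above can be sketched with NumPy. The helper names `to_homogeneous` and `from_homogeneous` are hypothetical, chosen only for this illustration.

```python
import numpy as np

def to_homogeneous(p):
    """Euclidean -> homogeneous: append a 1 as the last component."""
    p = np.asarray(p, dtype=float)
    return np.append(p, 1.0)

def from_homogeneous(ph):
    """Homogeneous -> Euclidean: divide by the last component and drop it."""
    ph = np.asarray(ph, dtype=float)
    if ph[-1] == 0:
        raise ValueError("a point at infinity has no Euclidean counterpart")
    return ph[:-1] / ph[-1]

ph = to_homogeneous([2.0, 5.0])        # (2, 5, 1)
same_point = 3.0 * ph                  # (6, 15, 3) represents the same point
print(from_homogeneous(same_point))    # [2. 5.]
```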
#### Advantage
1. Linear transformation
After converting to homogeneous coordinates, perspective projection $(x_{c}, y_{c}, z_{c}, 1) \rightarrow (u, v, 1)$ becomes a **matrix-vector product**:
$$
\begin{bmatrix}
u \\
v \\
1
\end{bmatrix}
\equiv
\begin{bmatrix}
uz_{c} \\
vz_{c} \\
z_{c}
\end{bmatrix} =
\begin{bmatrix}
f_{x} & 0 & o_{x} & 0 \\
0 & f_{y} & o_{y} & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
x_{c} \\
y_{c} \\
z_{c} \\
1
\end{bmatrix}
$$
2. Representation of infinity
Homogeneous coordinates can represent points at infinity. For example, the point at infinity in the direction $(u, v)$ is written as $(u, v, 0)$. This property is useful for representing vanishing points on the 2D image plane.
3. Intuitive model of projection
A pixel on the 2D image plane actually corresponds to many points in the 3D world (all points along one ray). Homogeneous coordinates capture this by letting one coordinate stand for all of its scaled versions:
$$
\begin{bmatrix}
u \\
v \\
1
\end{bmatrix}
\equiv
\begin{bmatrix}
uw \\
vw \\
w
\end{bmatrix}
$$
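The advantages listed above can be seen in a small sketch: the projection is a single matrix-vector product followed by a division, and two different 3D points on the same ray land on the same pixel. The intrinsic values and the helper name `project` are assumptions for illustration.

```python
import numpy as np

# Hypothetical intrinsics, for illustration only.
fx, fy, ox, oy = 800.0, 800.0, 320.0, 240.0
M_int = np.array([
    [fx, 0.0, ox, 0.0],
    [0.0, fy, oy, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])

def project(M, P_h):
    """Apply a 3x4 projection to a homogeneous 3D point, then dehomogenize."""
    p = M @ P_h            # (u * z_c, v * z_c, z_c)
    return p[:2] / p[2]    # divide by the last component to get (u, v)

P_near = np.array([0.5, -0.25, 2.0, 1.0])
P_far = np.array([1.5, -0.75, 6.0, 1.0])   # same ray, three times farther away

uv_near = project(M_int, P_near)
uv_far = project(M_int, P_far)
print(uv_near, uv_far)     # both are pixel (520, 140)
```

The only non-linear step left is the final division, which is deferred until after all matrix products have been applied.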

## Camera Calibration Matrix

### Intrinsic Matrix
Decompose the linear transformation matrix in homogeneous coordinates a bit further:
$$
P' = M_{int}P =
\begin{bmatrix}
f_{x} & 0 & o_{x} & 0 \\
0 & f_{y} & o_{y} & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
P =
\begin{bmatrix}
f_{x} & 0 & o_{x} \\
0 & f_{y} & o_{y} \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
I & 0
\end{bmatrix} P = K
\begin{bmatrix}
I & 0
\end{bmatrix} P
$$
The matrix $K$ is often referred to as the camera intrinsic matrix.
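As a quick sanity check of the factorization above, the product $K \begin{bmatrix} I & 0 \end{bmatrix}$ can be formed numerically. The values in `K` are hypothetical.

```python
import numpy as np

# Hypothetical intrinsic matrix K (zero skew).
K = np.array([
    [800.0, 0.0, 320.0],
    [0.0, 800.0, 240.0],
    [0.0, 0.0, 1.0],
])

# [I | 0]: 3x3 identity with a zero column appended.
I0 = np.hstack([np.eye(3), np.zeros((3, 1))])

M_int = K @ I0   # 3x4 matrix: K in the left block, zeros in the last column
print(M_int)
```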
### Extrinsic Matrix
Next, we have to convert points from the world reference frame to the camera reference frame. This relationship is captured by a rotation matrix $R$ and a translation vector $t$:
$$
P = M_{ext}P_{w} =
\begin{bmatrix}
R & t \\
0 & 1
\end{bmatrix} P_{w}
$$
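A minimal sketch of the world-to-camera transform, assuming a made-up pose (a 90° rotation about the z-axis and a translation of 5 units along z). The helper name `extrinsic_matrix` is hypothetical.

```python
import numpy as np

def extrinsic_matrix(R, t):
    """Assemble the 4x4 matrix [[R, t], [0, 1]] from a 3x3 R and a 3-vector t."""
    M_ext = np.eye(4)
    M_ext[:3, :3] = R
    M_ext[:3, 3] = t
    return M_ext

# Hypothetical pose: rotate 90 degrees about z, then translate along z.
theta = np.deg2rad(90.0)
R = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
t = np.array([0.0, 0.0, 5.0])

M_ext = extrinsic_matrix(R, t)
P_w = np.array([1.0, 0.0, 0.0, 1.0])   # homogeneous world point
P_c = M_ext @ P_w                      # same point in the camera frame: R @ p + t
print(np.round(P_c, 6))
```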
### Projection Matrix
Combining the intrinsic and extrinsic matrices, we obtain the projection matrix $M$:
$$
P' = MP_{w} = M_{int}M_{ext}P_{w} = K
\begin{bmatrix}
I & 0
\end{bmatrix} \begin{bmatrix}
R & t \\
0 & 1
\end{bmatrix}
P_{w} = \boxed{K
\begin{bmatrix}
R & t
\end{bmatrix} P_{w}}
$$
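The boxed result can be checked numerically: building $M$ as $K \begin{bmatrix} R & t \end{bmatrix}$ should give the same matrix as the longer product $K \begin{bmatrix} I & 0 \end{bmatrix} M_{ext}$. All numeric values below are hypothetical.

```python
import numpy as np

# Hypothetical intrinsics and pose.
K = np.array([
    [800.0, 0.0, 320.0],
    [0.0, 800.0, 240.0],
    [0.0, 0.0, 1.0],
])
R = np.eye(3)
t = np.array([0.1, -0.2, 2.0])

Rt = np.hstack([R, t.reshape(3, 1)])           # 3x4 block [R | t]
M = K @ Rt                                     # compact form K [R | t]

M_ext = np.vstack([Rt, [0.0, 0.0, 0.0, 1.0]])  # 4x4 extrinsic matrix
I0 = np.hstack([np.eye(3), np.zeros((3, 1))])  # [I | 0]
M_long = K @ I0 @ M_ext                        # M_int @ M_ext

P_w = np.array([0.4, -0.05, 0.0, 1.0])         # homogeneous world point
p = M @ P_w
uv = p[:2] / p[2]                              # pixel coordinates (u, v)
print(np.allclose(M, M_long), uv)
```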
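Extended question: how many degrees of freedom does the projection matrix have? <br>
Ans: 5+3+3 = 11 DoF <br>
- $K$: 5 DoF ($f_{x}$, $f_{y}$, $o_{x}$, $o_{y}$, plus a skew term that is assumed to be zero in the $K$ above)
- $R$: 3 DoF
- $t$: 3 DoF

This agrees with a direct count: a $3 \times 4$ matrix has 12 entries, minus 1 for the overall scale.
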
## Solve Intrinsic / Extrinsic Matrix by DLT
The Direct Linear Transform (DLT) is an algorithm that estimates the projection matrix from 3D-2D correspondences by solving a [homogeneous system](https://hackmd.io/@jackyyeh/HyXcLNewp).

**Step 1:** Capture an image of an object with known geometry

**Step 2:** Identify the correspondences between 3D scene points and image points

**Step 3:** Expand the matrix equation into linear equations for each correspondence pair. Each pair contributes two constraint equations.
$$
P' = M P_{w}
$$
$$
\underbrace{
\begin{bmatrix}
u \\
v \\
1
\end{bmatrix}
}_\text{known} =
\underbrace{
\begin{bmatrix}
p_{11} & p_{12} & p_{13} & p_{14} \\
p_{21} & p_{22} & p_{23} & p_{24} \\
p_{31} & p_{32} & p_{33} & p_{34} \\
\end{bmatrix}
}_\text{unknown}
\underbrace{
\begin{bmatrix}
x_{w} \\
y_{w} \\
z_{w} \\
1
\end{bmatrix}
}_\text{known}
$$
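Dividing the first two rows by the third (the homogeneous scale) gives the two constraint equations:
$$
u = \frac{p_{11}x_{w} + p_{12}y_{w} + p_{13}z_{w} + p_{14}}{p_{31}x_{w} + p_{32}y_{w} + p_{33}z_{w} + p_{34}}
$$
$$
v = \frac{p_{21}x_{w} + p_{22}y_{w} + p_{23}z_{w} + p_{24}}{p_{31}x_{w} + p_{32}y_{w} + p_{33}z_{w} + p_{34}}
$$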

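**Step 4:** Rearrange the terms <br>
Moving everything to one side makes each constraint linear in the unknowns $p_{ij}$:
$$
p_{11}x_{w} + p_{12}y_{w} + p_{13}z_{w} + p_{14} - u(p_{31}x_{w} + p_{32}y_{w} + p_{33}z_{w} + p_{34}) = 0
$$
$$
p_{21}x_{w} + p_{22}y_{w} + p_{23}z_{w} + p_{24} - v(p_{31}x_{w} + p_{32}y_{w} + p_{33}z_{w} + p_{34}) = 0
$$
Stacking these equations for all pairs gives a homogeneous system $A\mathbf{p} = 0$, where $\mathbf{p}$ holds the 12 unknown elements. Since each pair contributes two equations (see **step 3**), we need at least 6 correspondence pairs.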
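The rearrangement in step 4 can be sketched as code that stacks two rows per correspondence into a matrix $A$ with $A\mathbf{p} = 0$. The function name `build_dlt_matrix` is hypothetical, and storing the 12 unknowns row by row ($p_{11}, p_{12}, \dots, p_{34}$) is an assumption of this sketch.

```python
import numpy as np

def build_dlt_matrix(world_pts, image_pts):
    """Stack two linear constraints per 3D-2D pair into a (2n x 12) matrix A."""
    rows = []
    for (x, y, z), (u, v) in zip(world_pts, image_pts):
        X = [x, y, z, 1.0]
        # Row 1:  (row1 of M) . X - u * (row3 of M) . X = 0
        rows.append(X + [0.0] * 4 + [-u * c for c in X])
        # Row 2:  (row2 of M) . X - v * (row3 of M) . X = 0
        rows.append([0.0] * 4 + X + [-v * c for c in X])
    return np.array(rows)
```

With 6 correspondence pairs, `A` has shape `(12, 12)`; more pairs simply add more rows.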

**Step 5:** Solve for the projection matrix $M$ with the [homogeneous least-squares solution](https://hackmd.io/@jackyyeh/HyXcLNewp). <br>
Note that $M$ has multiple solutions because it is defined only up to a scale (refer to the [supplement](https://hackmd.io/o0UqmikhQdKBHLiaY5FV-Q?both#Supplement) for an explanation).
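One common way to compute the homogeneous least-squares solution is via the SVD: the unit vector minimizing $\|A\mathbf{p}\|$ is the right singular vector belonging to the smallest singular value. The sketch below assumes this approach (the linked note may present it differently), and the name `solve_homogeneous` is hypothetical.

```python
import numpy as np

def solve_homogeneous(A):
    """Return the unit vector p that minimizes ||A p||."""
    # Rows of Vt are right singular vectors, ordered by decreasing singular
    # value, so the last row belongs to the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]

# Tiny illustrative system whose null space is spanned by (1, 1, 1).
A = np.array([
    [1.0, -1.0, 0.0],
    [0.0, 1.0, -1.0],
])
p = solve_homogeneous(A)
print(p / p[0])   # rescaled so the first entry is 1
```

For DLT, `A` would be the $2n \times 12$ matrix from step 4, and `p.reshape(3, 4)` would give the projection matrix if the unknowns are stored row by row. The unit-norm constraint is what fixes the otherwise arbitrary scale.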
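
**Step 6:** Decompose the projection matrix $M$ into the intrinsic and extrinsic matrices
- Find $K$ and $R$: the left $3 \times 3$ block of $M$ equals $KR$, with $K$ upper triangular and $R$ orthonormal, so it can be split using QR-type factorization (specifically, the RQ variant).

- Find $t$: the last column of $M$ equals $Kt$, so $t = K^{-1}\begin{bmatrix} p_{14} & p_{24} & p_{34} \end{bmatrix}^{T}$.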
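The decomposition in step 6 can be sketched with NumPy alone. NumPy has no built-in RQ routine, so this sketch derives one from `np.linalg.qr` by reversing rows; the helper names `rq3` and `decompose_projection` and all numeric values are assumptions for illustration. It also assumes the left $3 \times 3$ block of $M$ is invertible.

```python
import numpy as np

def rq3(A):
    """RQ factorization A = U @ Q (U upper triangular, Q orthogonal) via QR."""
    P = np.flipud(np.eye(3))             # row-reversal permutation, P @ P = I
    Q_t, R_t = np.linalg.qr((P @ A).T)   # (P A)^T = Q_t R_t
    U = P @ R_t.T @ P                    # reversing rows and columns: lower -> upper
    Q = P @ Q_t.T
    return U, Q

def decompose_projection(M):
    """Split a 3x4 projection matrix (known up to scale) into K, R, t."""
    K, R = rq3(M[:, :3])
    D = np.diag(np.sign(np.diag(K)))     # force a positive diagonal on K
    K, R = K @ D, D @ R                  # D @ D = I, so K @ R is unchanged
    t = np.linalg.solve(K, M[:, 3])      # last column of M equals K t
    if np.linalg.det(R) < 0:             # M and -M are equivalent: pick det(R) = +1
        R, t = -R, -t
    return K / K[2, 2], R, t             # normalize so that K[2, 2] = 1

# Hypothetical ground truth, scaled by an arbitrary negative factor.
K_true = np.array([[800.0, 0.0, 320.0], [0.0, 820.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t_true = np.array([0.1, -0.2, 2.0])
M = -2.5 * K_true @ np.hstack([R_true, t_true.reshape(3, 1)])

K, R, t = decompose_projection(M)
print(np.allclose(K, K_true), np.allclose(R, R_true), np.allclose(t, t_true))
```

The sign and scale handling matters because DLT returns $M$ only up to an unknown nonzero factor.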

## Supplement
- Projection matrix is defined only [up to a scale](https://stackoverflow.com/questions/17114880/up-to-a-scale-factor).


- [Perspective-n-Point (PnP)](https://hackmd.io/@jackyyeh/BJxMZtUUT/%2FXoqLoirfTHmv0RN7n9e-Cw) is another algorithm, which estimates the camera pose when the camera intrinsic matrix is already given. We will discuss it in a later post.
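
The "up to a scale" property in the first bullet can be verified in one line. For any nonzero scalar $k$ (a symbol introduced here only for this check), the scaled matrix $kM$ produces the same pixel, because the factor cancels in the final division:
$$
u = \frac{(kMP_{w})_{1}}{(kMP_{w})_{3}} = \frac{k(MP_{w})_{1}}{k(MP_{w})_{3}} = \frac{(MP_{w})_{1}}{(MP_{w})_{3}}
$$
The same cancellation holds for $v$ with the second component, which is why the 12 entries of $M$ carry only 11 degrees of freedom.
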
## Reference
- [Camera Calibration | Camera Calibration](https://www.youtube.com/watch?v=GUbWsXU1mac)
- [Intrinsic and Extrinsic Matrices | Camera Calibration](https://www.youtube.com/watch?v=2XM2Rb2pfyQ)