# Camera model
###### tags: `cvdl2020`
## Pinhole camera
Consider the central projection of points in space onto a plane. Let the centre of projection be the origin of a Euclidean coordinate system, and consider the plane $Z=f$, which is called the **image plane** or **focal plane**.

Under the pinhole model, a point $(X,Y,Z)^T$ in space is mapped to the point $(f X/Z, fY/Z, f)^T$ where the line joining it to the centre of projection meets the image plane. Ignoring the final image coordinate, we conclude that
$$(X,Y,Z)^T \rightarrow (f X/Z, fY/Z)^T$$
Notice that this is a mapping from Euclidean space $R^3$ to Euclidean space $R^2$.
The center of projection is called the **camera center** or **optical center**.
The line from the camera center perpendicular to the image plane is called the **principal axis** or **principal ray** of the camera.
The point where the principal axis meets the image plane is called the **principal point**.
The plane through the camera center parallel to the image plane is called the **principal plane** of the camera.
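
As a quick check of the central projection described above, here is a minimal sketch in plain Python. The function name `project_pinhole` and the numeric values are made up for illustration; the point is assumed to be given in the camera-centred frame, with $Z$ along the principal axis.

```python
def project_pinhole(point, f):
    """Map a 3D point (X, Y, Z) to the image point (f*X/Z, f*Y/Z)."""
    X, Y, Z = point
    if Z == 0:
        # Points on the principal plane have no finite image.
        raise ValueError("point lies on the principal plane (Z = 0)")
    return (f * X / Z, f * Y / Z)


# Hypothetical values: focal length 2, point at depth 4.
x, y = project_pinhole((1.0, 2.0, 4.0), f=2.0)
print(x, y)  # 0.5 1.0
```

Every point on the same ray through the camera center lands on the same image point, e.g. `(2.0, 4.0, 8.0)` also maps to `(0.5, 1.0)`.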
### Central projection using homogeneous coordinates
If world and image points are represented by homogeneous vectors, then central projection can be expressed as a linear mapping between their homogeneous coordinates (another reason why homogeneous coordinates are useful).
The mapping from $R^3$ (world coordinates) to $R^2$ (image plane)
$$(X,Y,Z)^T \rightarrow (f X/Z, fY/Z)^T$$
can be written as
$$
\begin{bmatrix}
fX \\
fY \\
Z \\
\end{bmatrix} =
\begin{pmatrix}
f & 0 & 0 & 0 \\
0 & f & 0 & 0 \\
0 & 0 & 1 & 0 \\
\end{pmatrix}
\begin{pmatrix}
X \\
Y \\
Z \\
1 \\
\end{pmatrix}
$$
or
$$x = PX$$
where
- $x$ is the image point represented by a homogeneous 3-vector
- $P$ is a 3x4 homogeneous **camera projection matrix**
- $X$ is the world coordinate in $P^3$, represented by a homogeneous 4-vector
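
The linear form $x = PX$ can be sketched as follows. The helpers `matvec`, `projection_matrix` and `dehomogenize` are illustrative names, not from any library, and the numbers are arbitrary.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def projection_matrix(f):
    """The 3x4 matrix diag(f, f, 1)[I | 0] of the basic pinhole camera."""
    return [[f, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0]]


def dehomogenize(x):
    """Convert a homogeneous 3-vector to inhomogeneous image coordinates."""
    return (x[0] / x[2], x[1] / x[2])


P = projection_matrix(2.0)
X = [1.0, 2.0, 4.0, 1.0]      # homogeneous 4-vector (X, Y, Z, 1)
x = matvec(P, X)              # homogeneous 3-vector (fX, fY, Z)
print(x)                      # [2.0, 4.0, 4.0]
print(dehomogenize(x))        # (0.5, 1.0)
```

The division by $Z$ only happens in `dehomogenize`; the projection itself is a plain matrix product, which is what makes the homogeneous form convenient.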
### Principal point offset
The expression above assumes that the origin of coordinates in the image plane is at the principal point. In practice it may not be, so in general the mapping becomes:
$$(X,Y,Z)^T \rightarrow (f X/Z + p_x, fY/Z + p_y)^T$$
$$
\begin{bmatrix}
fX + Z p_x \\
fY + Z p_y \\
Z \\
\end{bmatrix} =
\begin{pmatrix}
f & 0 & p_x & 0 \\
0 & f & p_y & 0 \\
0 & 0 & 1 & 0 \\
\end{pmatrix}
\begin{pmatrix}
X \\
Y \\
Z \\
1 \\
\end{pmatrix}
$$
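where $(p_x, p_y)^T$ are the coordinates of the principal point. Writing
$$ K =
\begin{bmatrix}
f & 0 & p_x \\
0 & f & p_y \\
0 & 0 & 1 \\
\end{bmatrix}
$$
the projection takes the concise form
$$x = K[I|0] X$$
where $K$ is the 3x3 **camera calibration matrix**.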
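
A small sketch of the principal point offset, assuming the point is already expressed in the camera frame. The names `calibration_matrix` and `project_camera_frame` and the values of `f`, `px`, `py` are hypothetical.

```python
def calibration_matrix(f, px, py):
    """Calibration matrix with focal length f and principal point (px, py)."""
    return [[f, 0.0, px],
            [0.0, f, py],
            [0.0, 0.0, 1.0]]


def project_camera_frame(K, point):
    """Apply x = K [I | 0] X; for X = (X, Y, Z, 1) this is K times (X, Y, Z)."""
    h = [sum(k * c for k, c in zip(row, point)) for row in K]
    return (h[0] / h[2], h[1] / h[2])


K = calibration_matrix(f=2.0, px=320.0, py=240.0)
print(project_camera_frame(K, (1.0, 2.0, 4.0)))  # (320.5, 241.0)
```

The result is the earlier image point $(fX/Z, fY/Z) = (0.5, 1.0)$ shifted by $(p_x, p_y)$.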
### Camera rotation and translation
Let $\tilde X$ be an inhomogeneous 3-vector representing the coordinates of a point in the world coordinate frame, and let $\tilde X_{cam}$ represent the same point in the camera coordinate frame.
Since the two coordinate frames are related via a rotation and a translation, we may write $\tilde X_{cam} = R(\tilde X - \tilde C)$, where $\tilde C$ is the coordinate of the camera center in the world coordinate frame and $R$ is a 3x3 rotation matrix representing the orientation of the camera coordinate frame. In homogeneous coordinates:
$$X_{cam} =
\begin{bmatrix}
R & -R \tilde C \\
0 & 1 \\
\end{bmatrix}
\begin{pmatrix}
X \\
Y \\
Z \\
1 \\
\end{pmatrix} =
\begin{bmatrix}
R & -R \tilde C \\
0 & 1 \\
\end{bmatrix} X
$$
Finally, combining this with $x = K[I|0]X_{cam}$ gives
$$
x = KR[I|-\tilde C]X
$$
One sees that a general pinhole camera $P=KR[I|-\tilde C]$ has **9** dof: **3** for $K$ ($f,p_x,p_y$), **3** for $R$, and **3** for $\tilde C$.
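
To make the composition $P=KR[I|-\tilde C]$ concrete, here is a minimal sketch in plain Python. The helper names and all numeric values (a 90 degree rotation about the $Z$ axis, a camera center at $(0,0,-5)$) are assumptions chosen so the arithmetic is easy to follow.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def camera_matrix(K, R, C):
    """Build P = K R [I | -C] for calibration K, rotation R, camera center C."""
    I_minus_C = [[1.0, 0.0, 0.0, -C[0]],
                 [0.0, 1.0, 0.0, -C[1]],
                 [0.0, 0.0, 1.0, -C[2]]]
    return matmul(matmul(K, R), I_minus_C)


def project(P, world_point):
    """Project an inhomogeneous world point to inhomogeneous image coordinates."""
    X = list(world_point) + [1.0]
    h = [sum(p * x for p, x in zip(row, X)) for row in P]
    return (h[0] / h[2], h[1] / h[2])


# Hypothetical camera: f = 100, principal point (320, 240).
K = [[100.0, 0.0, 320.0],
     [0.0, 100.0, 240.0],
     [0.0, 0.0, 1.0]]
R = [[0.0, -1.0, 0.0],    # rotation by 90 degrees about the Z axis
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
C = (0.0, 0.0, -5.0)      # camera center in world coordinates

P = camera_matrix(K, R, C)
print(project(P, (1.0, 2.0, 0.0)))  # (280.0, 260.0)
```

By hand: $\tilde X - \tilde C = (1,2,5)^T$, so $\tilde X_{cam} = (-2,1,5)^T$ and the image point is $(100 \cdot (-2)/5 + 320,\ 100 \cdot 1/5 + 240) = (280, 260)$.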
### More on intrinsic & extrinsic parameters

$$
\lambda
\begin{bmatrix}
\mu \\
\nu \\
1 \\
\end{bmatrix} =
\begin{bmatrix}
\alpha & \gamma & \mu_0 \\
0 & \beta & \nu_0 \\
0 & 0 & 1 \\
\end{bmatrix}
\begin{bmatrix}
R_{11} & R_{12} & R_{13} & T_1 \\
R_{21} & R_{22} & R_{23} & T_2 \\
R_{31} & R_{32} & R_{33} & T_3 \\
\end{bmatrix}
\begin{bmatrix}
X \\
Y \\
Z \\
1 \\
\end{bmatrix} = KE
\begin{bmatrix}
X \\
Y \\
Z \\
1 \\
\end{bmatrix} = P
\begin{bmatrix}
X \\
Y \\
Z \\
1 \\
\end{bmatrix}
$$
where
- $(\mu, \nu, 1)^T$ is the image coordinate (homogeneous)
- $(X, Y, Z, 1)^T$ is the world coordinate point (homogeneous)
- $R = [R_{ij}]$ is the 3x3 **rotation matrix**
- $T = (T_1, T_2, T_3)^T$ is the **translation vector**

Parameters:
- $\alpha = - f \times k_{\mu}$
- $\beta = - f \times k_{\nu}$
- $\gamma$ is the **skew factor**
- $(\mu_0, \nu_0)$ is the **principal point**
- $\lambda$ is the **scale factor** of the homogeneous image point; because the last row of $K$ is $(0, 0, 1)$, it equals the depth of the point along the principal axis in the camera frame
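
The role of $\lambda$ can be checked numerically with a short sketch. The function `project_with_scale` and the values in `K` and `E` are hypothetical; they are picked only so that the numbers come out exactly.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def project_with_scale(K, E, world_point):
    """Return (lambda, mu, nu) with lambda * (mu, nu, 1) = K E (X, Y, Z, 1)."""
    X_h = list(world_point) + [1.0]
    X_cam = matvec(E, X_h)    # 3D point in the camera frame
    h = matvec(K, X_cam)      # homogeneous pixel coordinates
    lam = h[2]
    return lam, h[0] / lam, h[1] / lam


# Hypothetical values chosen only for illustration.
K = [[800.0, 4.0, 320.0],     # alpha, gamma (skew), mu_0
     [0.0, 800.0, 240.0],     # beta, nu_0
     [0.0, 0.0, 1.0]]
E = [[1.0, 0.0, 0.0, 0.0],    # R = I, T = (0, 0, 1)
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]

lam, mu, nu = project_with_scale(K, E, (1.0, -1.0, 3.0))
print(lam, mu, nu)  # 4.0 519.0 40.0
```

Here the point has camera coordinates $(1, -1, 4)^T$, and $\lambda = 4$ is exactly its third component.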
#### Intrinsic Parameter
**Intrinsic Parameter** $K$ is a matrix that maps **3D camera coordinate** to **2D pixel coordinate (undistorted)**
$$ K =
\begin{bmatrix}
-f \times k_{\mu} & \gamma & \mu_0 \\
0 & -f \times k_{\nu} & \nu_0 \\
0 & 0 & 1 \\
\end{bmatrix}
$$
#### Extrinsic Parameter
**Extrinsic Parameter** $E$ is a rotation and translation matrix that maps **3D world coordinate** to **3D camera coordinate**
$$ E =
\begin{bmatrix}
R_{11} & R_{12} & R_{13} & T_1 \\
R_{21} & R_{22} & R_{23} & T_2 \\
R_{31} & R_{32} & R_{33} & T_3 \\
\end{bmatrix}
$$
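
Assuming $E$ describes the same rigid transform as $R[I|-\tilde C]$ from the section on camera rotation and translation, the translation $T$ and the camera center $\tilde C$ are related as follows (using $R^{-1} = R^T$ for a rotation matrix):

$$
E = [R \mid T] = R[I \mid -\tilde C]
\quad\Rightarrow\quad
T = -R\tilde C,
\qquad
\tilde C = -R^T T
$$

So $T$ is not the camera position itself; it is the position of the world origin expressed in the camera coordinate frame.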
#### Camera calibration
K has 7 dof in theory and 8 dof in practice. Why then do most textbooks treat K as an upper-triangular matrix with 5 dof?
It's because we cannot recover the full K matrix from external measurements alone. When calibrating a camera from external 3D points or other measurements, we end up estimating the K and E camera parameters simultaneously.
Given a full 3x4 camera matrix P, we can compute an upper-triangular K matrix by applying RQ factorization (a variant of QR factorization) to its left 3x3 block. (Note that in linear algebra books R represents an upper-triangular matrix, while in CV R represents an orthogonal rotation matrix.)
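
The factorization step can be sketched by hand for the 3x3 case. The code below is an illustrative Gram-Schmidt on the rows of the left 3x3 block of P, processed from the bottom row up; it is not a library routine, the function names are made up, and it assumes the block is non-singular. In practice a library routine such as `scipy.linalg.rq` would normally be used instead.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rq_3x3(M):
    """Factor M = K R with K upper-triangular (positive diagonal), R orthogonal."""
    K = [[0.0] * 3 for _ in range(3)]
    R = [None] * 3
    for i in (2, 1, 0):
        residual = list(M[i])
        # Remove the components along the rows of R found so far.
        for j in range(i + 1, 3):
            K[i][j] = dot(M[i], R[j])
            residual = [r - K[i][j] * q for r, q in zip(residual, R[j])]
        K[i][i] = math.sqrt(dot(residual, residual))
        R[i] = [r / K[i][i] for r in residual]
    return K, R


def decompose_camera(P):
    """Split the left 3x3 block of P into K (scaled so K[2][2] = 1) and R."""
    M = [row[:3] for row in P]
    K, R = rq_3x3(M)
    scale = K[2][2]
    K = [[k / scale for k in row] for row in K]
    return K, R


# Hypothetical camera matrix, known only up to scale (here multiplied by 2).
P = [[0.0, -200.0, 640.0, 3200.0],
     [200.0, 0.0, 480.0, 2400.0],
     [0.0, 0.0, 2.0, 10.0]]
K, R = decompose_camera(P)
```

For this P the sketch recovers a K with focal length 100 and principal point (320, 240), and an R that is a 90 degree rotation about the $Z$ axis. Forcing a positive diagonal on K fixes the sign ambiguity of the factorization, but for a general P the recovered R may have determinant -1, which has to be handled separately.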
#### Summary
```graphviz
digraph G{
graph [fontname=Arial, compound=true];
node [shape=record,style=filled, fillcolor=aquamarine];
edge [fontcolor=red];
rankdir=LR;
subgraph cluster_0 {
rankdir=LR;
label="3D";
world [label="World"];
cam [label="Camera"];
}
subgraph cluster_2 {
label="2D";
pixel [label="Pixel (Undistorted)"];
img [label="Image (Distorted)"];
}
// edges
pixel->img[label="D", dir="both"];
world->cam[label="E", dir="both"];
cam->pixel[label="K", dir="both"];
}
```
### Distortion
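Distortion is the deviation from the ideal pinhole projection introduced when light passes through the lens.
Given a 2D **distorted** image point $(\mu_d, \nu_d)$, we want to find the **undistorted** image point $(\mu_u, \nu_u)$.
- Radial (barrel) distortion:
    - $\mu_u = \mu_d (1+k_1 r^2 + k_2 r^4 + k_3 r^6)$
    - $\nu_u = \nu_d (1+k_1 r^2 + k_2 r^4 + k_3 r^6)$
    - here $r$ is the distance of the distorted point from the distortion center and $k_1, k_2, k_3$ are the radial distortion coefficients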
- Tangential distortion: skip
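
The radial model above can be sketched in a few lines. This assumes the coordinates are measured relative to the distortion center, so that $r^2 = \mu_d^2 + \nu_d^2$; the function name and the coefficient values are hypothetical.

```python
def undistort_radial(mu_d, nu_d, k1, k2=0.0, k3=0.0):
    """Apply the radial polynomial to a distorted point (mu_d, nu_d).

    Assumes coordinates are relative to the distortion center,
    so r^2 = mu_d^2 + nu_d^2.
    """
    r2 = mu_d * mu_d + nu_d * nu_d
    factor = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    return (mu_d * factor, nu_d * factor)


# Hypothetical coefficient: with k1 > 0 the point is pushed away from the center.
mu_u, nu_u = undistort_radial(0.5, 0.0, k1=0.1)
print(mu_u, nu_u)
```

With all coefficients set to zero the function returns the input unchanged, and the center itself ($r = 0$) is never moved.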