{%hackmd 5xqeIJ7VRCGBfLtfMi0_IQ %}
# Polynomial regression
## Problem
Consider the data:

| $i$ | $x_i$ | $y_i$ |
| --- | --- | --- |
| 1 | 1 | 2 |
| 2 | 2 | 0 |
| 3 | 3 | 0 |
| 4 | 4 | 1 |

Find a polynomial $f(x) = c_0 + c_1x + c_2x^2$ such that $\sum_{i=1}^4 (f(x_i) - y_i)^2$ is minimized.
## Thought
For any given data set $(x_1, y_1), \ldots, (x_N, y_N)$ and the quadratic polynomial $f(x) = c_0 + c_1x + c_2x^2$, the key observation is that
$$
\begin{bmatrix} f(x_1) \\ \vdots \\ f(x_N) \end{bmatrix} =
\begin{bmatrix} c_0 + c_1x_1 + c_2x_1^2 \\ \vdots \\ c_0 + c_1x_N + c_2x_N^2 \end{bmatrix} =
\begin{bmatrix}
1 & x_1 & x_1^2 \\
\vdots & \vdots & \vdots \\
1 & x_N & x_N^2
\end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix}.
$$
With
$$
A = \begin{bmatrix}
1 & x_1 & x_1^2 \\
\vdots & \vdots & \vdots \\
1 & x_N & x_N^2
\end{bmatrix}, \
\bc = \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix}, \text{ and }
\by = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix},
$$
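we are looking for the $\bc$ that minimizes $\|A\bc - \by\|^2$. This is a least-squares problem. When the columns of $A$ are linearly independent (here, when at least three of the $x_i$ are distinct), $A\trans A$ is invertible, and the unique minimizer is the solution of the normal equations $A\trans A\bc = A\trans\by$, namely $\bc = (A\trans A)^{-1}A\trans\by$.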
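The formula can be sketched in plain Python. This is a minimal sketch, not the course's prescribed method: it assumes exact rational arithmetic via the standard `fractions` module, and the helper names (`design_matrix`, `solve`, `poly_least_squares`) are hypothetical. It builds $A$ row by row, forms $A\trans A$ and $A\trans\by$, and solves the normal equations by Gauss-Jordan elimination instead of computing the inverse explicitly; whenever $A\trans A$ is invertible this gives the same $\bc$.

```python
from fractions import Fraction

# Minimal sketch with hypothetical helper names; exact Fraction arithmetic is an
# illustrative choice so the small example in this note comes out exactly.


def design_matrix(xs, degree):
    """Rows [1, x_i, x_i^2, ..., x_i^degree], i.e. the matrix A."""
    return [[Fraction(x) ** j for j in range(degree + 1)] for x in xs]


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(M, N):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)] for row in M]


def solve(M, b):
    """Solve the square system M c = b by Gauss-Jordan elimination."""
    n = len(M)
    aug = [list(row) + [b_i] for row, b_i in zip(M, b)]  # augmented [M | b]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it up.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: need at least degree + 1 distinct x_i")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot entry becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def poly_least_squares(xs, ys, degree=2):
    """Coefficients [c_0, ..., c_degree] minimizing sum (f(x_i) - y_i)^2."""
    A = design_matrix(xs, degree)
    At = transpose(A)
    AtA = matmul(At, A)
    Aty = [sum(a * Fraction(y) for a, y in zip(row, ys)) for row in At]
    # Normal equations (A^T A) c = A^T y; same c as (A^T A)^{-1} A^T y.
    return solve(AtA, Aty)


if __name__ == "__main__":
    c = poly_least_squares([1, 2, 3, 4], [2, 0, 0, 1])
    print(c)
```

On the data of this problem it prints `[Fraction(21, 4), Fraction(-81, 20), Fraction(3, 4)]`, that is $5.25$, $-4.05$ and $0.75$. For real floating-point data, a library routine such as NumPy's `numpy.linalg.lstsq` is the more common choice; it solves the least-squares problem without forming $A\trans A$, which can lose accuracy when the columns of $A$ are nearly dependent.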
## Sample answer
Let
$$
A = \begin{bmatrix}
1 & 1 & 1 \\
1 & 2 & 4 \\
1 & 3 & 9 \\
1 & 4 & 16
\end{bmatrix}\text{ and }
\by = \begin{bmatrix} 2 \\ 0 \\ 0 \\ 1 \end{bmatrix}.
$$
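As an intermediate step, the entries of $A\trans A$ are sums of powers of the $x_i$ (namely $\sum_i x_i^{j+k}$), and the entries of $A\trans\by$ are the sums $\sum_i x_i^j y_i$, so for this data

$$
A\trans A = \begin{bmatrix}
4 & 10 & 30 \\
10 & 30 & 100 \\
30 & 100 & 354
\end{bmatrix}
\text{ and }
A\trans \by = \begin{bmatrix} 3 \\ 6 \\ 18 \end{bmatrix},
$$

and $\bc$ is the solution of the $3 \times 3$ system $A\trans A\bc = A\trans\by$.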
Then the answer is
$$
\begin{aligned}
\bc &= (A\trans A)^{-1}A\trans \by \\
&= \begin{bmatrix} 5.25 \\ -4.05 \\ 0.75 \end{bmatrix}.
\end{aligned}
$$
Thus, $f(x) = 5.25 - 4.05x + 0.75x^2$ is the quadratic polynomial that best fits the data in the least-squares sense. One may plot it with [Desmos](https://www.desmos.com/calculator) to check visually how close it comes to the data points.
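As a numerical complement to plotting, the short script below evaluates the fit at each $x_i$ and checks the least-squares optimality condition $A\trans(A\bc - \by) = \mathbf{0}$, i.e. that the residual vector is orthogonal to every column of $A$. It is a sketch that assumes the exact values $5.25 = 21/4$, $-4.05 = -81/20$ and $0.75 = 3/4$ and uses the standard `fractions` module to avoid rounding.

```python
from fractions import Fraction

xs = [1, 2, 3, 4]
ys = [2, 0, 0, 1]
# Assumed exact form of the coefficients: 5.25 = 21/4, -4.05 = -81/20, 0.75 = 3/4.
c = [Fraction(21, 4), Fraction(-81, 20), Fraction(3, 4)]


def f(x):
    return c[0] + c[1] * x + c[2] * x**2


# Residuals f(x_i) - y_i, i.e. the entries of A c - y.
residuals = [f(x) - y for x, y in zip(xs, ys)]
sse = sum(r * r for r in residuals)

for x, y, r in zip(xs, ys, residuals):
    print(f"x={x}: f(x)={float(f(x)):.2f}, y={y}, residual={float(r):+.2f}")
print("sum of squared errors:", sse)

# Optimality check: row j of A^T (A c - y) is sum_i x_i**j * r_i; all should be 0.
orthogonality = [sum(r * x**j for x, r in zip(xs, residuals)) for j in range(3)]
print("A^T (A c - y):", [str(v) for v in orthogonality])
```

The residuals come out to $-0.05$, $0.15$, $-0.15$ and $0.05$, so the minimal sum of squared errors is $1/20 = 0.05$, and all three entries of $A\trans(A\bc - \by)$ are zero.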
*This note can be found at Course website > Learning resources.*