{%hackmd 5xqeIJ7VRCGBfLtfMi0_IQ %}

# Polynomial regression

## Problem

Consider the data:

| $i$ | $x_i$ | $y_i$ |
| --- | --- | --- |
| 1 | 1 | 2 |
| 2 | 2 | 0 |
| 3 | 3 | 0 |
| 4 | 4 | 1 |

Find a polynomial $f(x) = c_0 + c_1x + c_2x^2$ such that $\sum_{i=1}^4 (f(x_i) - y_i)^2$ is minimized.

## Thought

For any given data set $(x_1, y_1), \ldots, (x_N, y_N)$ and the quadratic polynomial $f(x) = c_0 + c_1x + c_2x^2$, the key observation is that

$$
\begin{bmatrix}
f(x_1) \\ \vdots \\ f(x_N)
\end{bmatrix}
=
\begin{bmatrix}
c_0 + c_1x_1 + c_2x_1^2 \\ \vdots \\ c_0 + c_1x_N + c_2x_N^2
\end{bmatrix}
=
\begin{bmatrix}
1 & x_1 & x_1^2 \\
\vdots & \vdots & \vdots \\
1 & x_N & x_N^2
\end{bmatrix}
\begin{bmatrix}
c_0 \\ c_1 \\ c_2
\end{bmatrix}.
$$

With

$$
A = \begin{bmatrix}
1 & x_1 & x_1^2 \\
\vdots & \vdots & \vdots \\
1 & x_N & x_N^2
\end{bmatrix}, \
\bc = \begin{bmatrix}
c_0 \\ c_1 \\ c_2
\end{bmatrix}, \text{ and }
\by = \begin{bmatrix}
y_1 \\ \vdots \\ y_N
\end{bmatrix},
$$

we are looking for a vector $\bc$ that minimizes $\|A\bc - \by\|^2$. This is a least squares problem. Its solutions satisfy the normal equation $A\trans A\bc = A\trans\by$, and when the columns of $A$ are linearly independent (which holds whenever at least three of the $x_i$ are distinct), the unique answer is $\bc = (A\trans A)^{-1}A\trans\by$.

## Sample answer

Let

$$
A = \begin{bmatrix}
1 & 1 & 1 \\
1 & 2 & 4 \\
1 & 3 & 9 \\
1 & 4 & 16
\end{bmatrix}\text{ and }
\by = \begin{bmatrix}
2 \\ 0 \\ 0 \\ 1
\end{bmatrix}.
$$

Then the answer is

$$
\begin{aligned}
\bc &= (A\trans A)^{-1}A\trans \by \\
&= \begin{bmatrix}
4 & 10 & 30 \\
10 & 30 & 100 \\
30 & 100 & 354
\end{bmatrix}^{-1}
\begin{bmatrix}
3 \\ 6 \\ 18
\end{bmatrix} \\
&= \begin{bmatrix}
5.25 \\ -4.05 \\ 0.75
\end{bmatrix}.
\end{aligned}
$$

Thus, $f(x) = 5.25 - 4.05x + 0.75x^2$ is the quadratic polynomial that best fits the data in the least squares sense. One may use [desmos](https://www.desmos.com/calculator) to plot the function and check that it passes close to the data points.

*This note can be found at Course website > Learning resources.*
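
The sample answer can also be checked numerically. The following is a minimal sketch in plain Python, not part of the original note: it builds $A$, forms the normal equation $A\trans A\bc = A\trans\by$, and solves it by Gauss-Jordan elimination with exact fractions. The function name `fit_quadratic` is our own choice for illustration, and the code assumes at least three distinct $x_i$ so that $A\trans A$ is invertible.

```python
from fractions import Fraction


def fit_quadratic(xs, ys):
    """Least squares fit of c0 + c1*x + c2*x^2 via the normal equation.

    Assumes at least three distinct values in xs (so A^T A is invertible).
    Returns [c0, c1, c2] as exact fractions.
    """
    n = 3  # number of coefficients
    # Rows of A are [1, x, x^2].
    A = [[Fraction(x) ** k for k in range(n)] for x in xs]
    y = [Fraction(v) for v in ys]

    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(row[i] * row[j] for row in A) for j in range(n)]
        + [sum(row[i] * yi for row, yi in zip(A, y))]
        for i in range(n)
    ]

    # Gauss-Jordan elimination in exact arithmetic.
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it up.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so the pivot entry becomes 1.
        M[col] = [v / M[col][col] for v in M[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]

    # The last column now holds the solution c.
    return [M[i][n] for i in range(n)]


c = fit_quadratic([1, 2, 3, 4], [2, 0, 0, 1])
print([float(v) for v in c])  # [5.25, -4.05, 0.75]
```

Exact fractions are used here only to avoid rounding noise in such a small example; for larger data sets a numerical linear algebra library would be the more usual choice.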