# Least Squares Approximation
### Big Ideas
The goal of least squares approximation is to find the vector $\mathbf{x}$ that comes closest to solving an inconsistent system $A\mathbf{x} \approx \mathbf{b}$. "Closest" means minimizing the length, or norm, of the residual vector, $\| A\mathbf{x} - \mathbf{b}\|$. The approximation can be computed using either the Normal Equations or the QR Equations.

## Definition
:::info
**Definition** The **least squares approximation** of the system $A\mathbf{x} \approx \mathbf{b}$ (where $A$ is $m \times n$ with $m>n$ and $\mathrm{rank}(A)=n$) is the vector $\mathbf{x}$ that minimizes the distance $\| A\mathbf{x} - \mathbf{b} \|$.
:::
## Normal Equations
:::danger
**Theorem (Best Approximation Theorem)**
The least squares approximation, $\mathbf{x}_{\text{LS}}$, is the solution to the **Normal Equations**:
$$
A^TA\mathbf{x} = A^T\mathbf{b}
$$
:::
### Properties
* The system $A^TA\mathbf{x} = A^T\mathbf{b}$ always has at least one solution, as $\mathcal{R}(A^T A) = \mathcal{R}(A^T)$.
* A **unique** solution $\mathbf{x}_{\text{LS}}$ exists if and only if $A$ has linearly independent columns, which means $\mathrm{rank}(A) = n$.
* The resulting projection onto the column space of $A$ is $A\mathbf{x}_{\text{LS}} = \operatorname{Proj}_{\mathcal{R}(A)}(\mathbf{b}) \equiv A(A^T A)^{-1} A^T \mathbf{b}$.
:::warning
**Example**
Consider the inconsistent system
$$
\begin{cases}
3x - y = 4 \\
x + 2y = 0 \\
2x + y = 1
\end{cases}
$$
Let $A = \begin{bmatrix} 3 & -1 \\ 1 & 2 \\ 2 & 1 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} 4 \\ 0 \\ 1 \end{bmatrix}$. The least squares solution is:
$$
\mathbf{x}_{\text{LS}} = (A^T A)^{-1} A^T \mathbf{b} = \frac{1}{83} \begin{bmatrix} 87 \\ -56 \end{bmatrix}
$$
* The vector $A\mathbf{x}_{\text{LS}} \approx \begin{bmatrix} 3.82 \\ -0.3 \\ 1.42 \end{bmatrix}$ is the closest vector in $\mathcal{R}(A)$ to $\mathbf{b}$.
* The error vector $\mathbf{b} - A\mathbf{x}_{\text{LS}}$ is orthogonal to $\mathcal{R}(A)$, confirmed by $A^T (\mathbf{b} - A \mathbf{x}_{\text{LS}}) = \mathbf{0}$.
:::
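As a quick numerical check of this example (a minimal sketch, assuming NumPy is available), the Normal Equations can be solved directly and the orthogonality of the residual verified:
```python
import numpy as np

# Matrix and right-hand side from the example above
A = np.array([[3.0, -1.0],
              [1.0,  2.0],
              [2.0,  1.0]])
b = np.array([4.0, 0.0, 1.0])

# Solve the Normal Equations A^T A x = A^T b
x_ls = np.linalg.solve(A.T @ A, A.T @ b)

print(x_ls)                  # ~ [ 1.0482, -0.6747] = (1/83) * [87, -56]
print(A @ x_ls)              # ~ [ 3.82, -0.30,  1.42], closest vector in R(A) to b
print(A.T @ (b - A @ x_ls))  # ~ [0, 0], residual is orthogonal to R(A)
```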
## QR Equations
:::danger
**Theorem**
If the thin QR decomposition of $A$ is $A = Q_1 R_1$, the least squares approximation $\mathbf{x}_{\text{LS}}$ is the solution to the QR Equations:
$$
R_1\mathbf{x} = Q_1^T \mathbf{b}
$$
The minimum residual norm equals the norm of the component of $\mathbf{b}$ in the orthogonal complement of the range, $\mathcal{R}(A)^{\perp}$:
$$
\| A \mathbf{x}_{\text{LS}} - \mathbf{b} \| = \| Q_2^T \mathbf{b} \|
$$
where the columns of $Q_2$, the remaining columns of the full QR decomposition $A = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$, form an orthonormal basis of $\mathcal{R}(A)^{\perp}$.
:::
* Solving this system is often more *numerically stable* than solving the Normal Equations, especially when $A^TA$ is ill-conditioned. Since $R_1$ is upper triangular, $\mathbf{x}_{\text{LS}}$ is easily found using *backward substitution*.
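A minimal sketch of this procedure (assuming NumPy; the helper name `lstsq_qr` is illustrative): `np.linalg.qr` returns the thin factors by default, and the triangular system is then solved by backward substitution.
```python
import numpy as np

def lstsq_qr(A, b):
    """Least squares via the thin QR decomposition and backward substitution."""
    Q1, R1 = np.linalg.qr(A)            # default 'reduced' mode: Q1 is m x n, R1 is n x n
    c = Q1.T @ b                        # right-hand side of the QR Equations
    n = R1.shape[1]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):      # backward substitution on R1 x = c
        x[i] = (c[i] - R1[i, i + 1:] @ x[i + 1:]) / R1[i, i]
    return x
```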
:::warning
**Example (Finding the Least Squares Approximation)**
Find the least squares approximation, $\mathbf{x}_{\text{LS}}$, for the inconsistent system $A\mathbf{x} \approx \mathbf{b}$, where:
$$
A = \begin{bmatrix} 1 & -1 & 4 \\ 1 & 4 & -2 \\ 1 & 4 & 2 \\ 1 & -1 & 0 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}
$$
Since $\mathrm{rank}(A) = 3$, a unique least squares solution exists.
**Method 1: Using the Normal Equations**
The least squares solution satisfies the Normal Equations: $A^T A \mathbf{x} = A^T \mathbf{b}$.
1. Compute $A^T A$ and $A^T \mathbf{b}$: $$A^T A = \begin{bmatrix} 4 & 6 & 4 \\ 6 & 34 & -4 \\ 4 & -4 & 24 \end{bmatrix}, \quad A^T \mathbf{b} = \begin{bmatrix} 10 \\ 15 \\ 6 \end{bmatrix}$$
2. Solve for $\mathbf{x}_{\text{LS}}$: By computing the inverse $(A^T A)^{-1}$ and multiplying, we find: $$\mathbf{x}_{\text{LS}} = (A^T A)^{-1} A^T \mathbf{b} = \begin{bmatrix} 0.5 & -0.1 & -0.1 \\ -0.1 & 0.05 & 0.025 \\ -0.1 & 0.025 & 0.0625 \end{bmatrix} \begin{bmatrix} 10 \\ 15 \\ 6 \end{bmatrix} = \begin{bmatrix} 2.9 \\ -0.1 \\ -0.25 \end{bmatrix}$$
**Method 2: Using the QR Equations**
The least squares solution satisfies the QR Equations: $R_1 \mathbf{x} = Q_1^T \mathbf{b}$.
1. Use the thin QR decomposition $A = Q_1 R_1$: $$Q_1 = \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & -\frac{1}{2} & -\frac{1}{2} \end{bmatrix}, \quad R_1 = \begin{bmatrix} 2 & 3 & 2 \\ 0 & 5 & -2 \\ 0 & 0 & 4 \end{bmatrix}$$
2. Compute $Q_1^T \mathbf{b}$: $$Q_1^T \mathbf{b} = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \\ \frac{1}{2} & -\frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 5 \\ 0 \\ -1 \end{bmatrix}$$
3. Solve the upper triangular system $$R_1 \mathbf{x}_{\text{LS}} = \begin{bmatrix} 5 \\ 0 \\ -1 \end{bmatrix}$$ via backward substitution: $$\begin{aligned} 4 x_3 = -1 \quad &\implies \quad x_3 = -0.25 \\ 5 x_2 - 2 x_3 = 0 \quad &\implies \quad 5 x_2 - 2(-0.25) = 0 \quad \implies \quad x_2 = -0.1 \\ 2 x_1 + 3 x_2 + 2 x_3 = 5 \quad &\implies \quad 2 x_1 + 3(-0.1) + 2(-0.25) = 5 \quad \implies \quad x_1 = 2.9 \end{aligned}$$
The least squares approximation is:
$$
\mathbf{x}_{\text{LS}} = \begin{bmatrix} 2.9 \\ -0.1 \\ -0.25 \end{bmatrix}
$$
:::
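The two methods can be cross-checked numerically (a sketch assuming NumPy); `np.linalg.lstsq` provides an independent reference answer:
```python
import numpy as np

A = np.array([[1., -1.,  4.],
              [1.,  4., -2.],
              [1.,  4.,  2.],
              [1., -1.,  0.]])
b = np.array([1., 2., 3., 4.])

x_normal = np.linalg.solve(A.T @ A, A.T @ b)    # Method 1: Normal Equations
Q1, R1 = np.linalg.qr(A)                        # Method 2: thin QR
x_qr = np.linalg.solve(R1, Q1.T @ b)            # solve R1 x = Q1^T b
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)   # library reference

print(x_normal, x_qr, x_ref)                    # all ~ [ 2.9, -0.1, -0.25]
```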
:::success
**Exercise**
1. Determine the conditions for a unique least squares approximation: decide whether each of the following statements is true or false.
* (a) Let $A$ be a $m \times n$ matrix with $m \ge n$ and let $\mathbf{b} \in \mathbb{R}^m$. There is a unique vector $\mathbf{x} \in \mathbb{R}^n$ which minimizes the norm of the residual $\|A \mathbf{x} - \mathbf{b}\|$.
* (b) Let $A$ be a $m \times n$ matrix with $m \ge n$ and $\operatorname{rank}(A) = n$, and let $\mathbf{b} \in \mathbb{R}^m$. There is a unique vector $\mathbf{x} \in \mathbb{R}^n$ which minimizes the norm of the residual $\|A \mathbf{x} - \mathbf{b}\|$.
2. Use the QR Equations to find the least squares approximation of the system $A\mathbf{x} \approx \mathbf{b}$, where $\mathbf{b} = [-2,\, -1,\, 0,\, 1,\, 2]^T$ and $A = QR$ with
$$
Q =
\begin{bmatrix}
0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix},
\quad
R =
\begin{bmatrix}
1 & 1 & 1 & 1 \\
0 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0
\end{bmatrix}
$$
:::
## Fitting Models to Data
The least squares approximation is the foundation of data modeling. Given $m$ data points $(t_i,y_i)$ and a model function $f(t,\mathbf{c})$ with parameters $\mathbf{c} = [c_1, \dots, c_n]^T$, the goal is to find the parameters $\mathbf{c}$ that minimize the Sum of Squared Errors (SSE):
$$
SSE = \sum_{i=1}^{m} (y_i - f(t_i,\mathbf{c}))^2
$$
If the model function is a linear combination of basis functions, $f(t,\mathbf{c}) = c_1 f_1(t) + \cdots + c_n f_n(t)$, the problem is called a **linear data fitting problem**. In this case, the SSE can be written in matrix form:
$$
SSE = \Vert \mathbf{y} - A \mathbf{c} \Vert^2
$$
where $\mathbf{y} = [y_1, \dots, y_m]^T$ is the vector of observed values and
$$
A = \begin{bmatrix} f_1(t_1) & \cdots & f_n(t_1) \\ \vdots & \ddots & \vdots \\ f_1(t_m) & \cdots & f_n(t_m) \end{bmatrix}
$$
is the **design matrix**, and the optimal coefficient vector $\mathbf{c}$ is the least squares approximation of the system $A \mathbf{c} \approx \mathbf{y}$.
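A sketch of this setup in code (assuming NumPy; the helper `design_matrix`, the basis functions, and the sample data are illustrative only):
```python
import numpy as np

def design_matrix(t, basis_funcs):
    """Evaluate each basis function f_1, ..., f_n at the sample points t and stack them as columns."""
    return np.column_stack([f(t) for f in basis_funcs])

# Hypothetical data and basis {1, t, t^2}; the optimal c solves A c ~ y in the least squares sense
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.0, 1.4, 2.1, 3.3, 5.2])
A = design_matrix(t, [np.ones_like, lambda s: s, lambda s: s**2])
c, *_ = np.linalg.lstsq(A, y, rcond=None)   # least squares coefficients c = [c_1, c_2, c_3]
```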
:::warning
**Example** Common examples of linear data fitting:
* **Polynomial Fitting:** $f(t) = c_1 + c_2 t + \dots + c_n t^{n-1}$. This results in a design matrix $A$ whose columns are powers of the $t$-values (a generalized Vandermonde matrix):
$$
A =
\begin{bmatrix}
1 & t_1 & t_1^2 & \dots & t_1^{n-1} \\
1 & t_2 & t_2^2 & \dots & t_2^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & t_m & t_m^2 & \dots & t_m^{n-1}
\end{bmatrix}
$$
* **Least Squares Linear Regression:** This is polynomial fitting with $n=2$, i.e. fitting a straight line $y = c_1 + c_2 t$. $$A = \begin{bmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_m \end{bmatrix}$$
:::
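For polynomial fitting specifically, NumPy's `np.vander` builds the Vandermonde design matrix directly; a sketch with hypothetical data for the straight-line case $y = c_1 + c_2 t$:
```python
import numpy as np

# Hypothetical data points (t_i, y_i)
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

A = np.vander(t, N=2, increasing=True)       # columns [1, t], as in the matrix above
c, *_ = np.linalg.lstsq(A, y, rcond=None)    # c ~ [intercept c_1, slope c_2]
print(c)
```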
:::success
**Exercise**
3. Set up (but do not solve) a linear system $B\mathbf{c} = \mathbf{y}$ where the solution is the coefficient vector $\mathbf{c} = [c_0 \; c_1 \; c_2]^T$ for the function $f(t) = c_0 + c_1 \cos(2\pi t) + c_2 \sin(2\pi t)$ that best fits the data: $(0, 1), \; (1/4, 3), \; (1/2, 2), \; (3/4, -1), \; (1, 0)$.
4. Find the best fitting line $y = ax + b$ for three data points $(x_1, y_1) = (-1, 1)$, $(x_2, y_2) = (0, 0)$, $(x_3, y_3) = (1, 0)$.
* (a) Write down the least squares equation for $\begin{bmatrix} a \\ b \end{bmatrix}$.
* (b) Solve the least squares equation.
5. Consider the bivariate data: $(-1, 0)$, $(1,1)$, $(3,1)$.
* (a) Draw a sketch showing the approximate least-squares straight-line fit $y = ax + b$ to this data.
* (b) Write down the least squares (or normal) equation satisfied by $\begin{bmatrix} a \\ b \end{bmatrix}$.
* (c) What quantity is minimized by the solution to the equation in (b)?
:::