# Least Squares Approximation

### Big Ideas

The goal of the least squares approximation is to find the vector $\mathbf{x}$ that comes closest to solving an inconsistent system $A\mathbf{x} \approx \mathbf{b}$. This is achieved by minimizing the length (norm) of the residual vector, $\| A\mathbf{x} - \mathbf{b}\|$. The approximation can be computed using the Normal Equations or the QR Equations.

![image](https://hackmd.io/_uploads/HktMDKqRgx.png)

## Definition

:::info
**Definition**

The **least squares approximation** of the system $A\mathbf{x} \approx \mathbf{b}$ (where $A$ is $m \times n$ with $m > n$ and $\mathrm{rank}(A) = n$) is the vector $\mathbf{x}$ that minimizes the distance $\| A\mathbf{x} - \mathbf{b} \|$.
:::

## Normal Equations

:::danger
**Theorem (Best Approximation Theorem)**

The least squares approximation, $\mathbf{x}_{\text{LS}}$, is the solution to the **Normal Equations**:

$$
A^T A \mathbf{x} = A^T \mathbf{b}
$$
:::

### Properties

* The system $A^T A \mathbf{x} = A^T \mathbf{b}$ always has at least one solution, since $A^T \mathbf{b}$ lies in $\mathcal{R}(A^T) = \mathcal{R}(A^T A)$.
* A **unique** solution $\mathbf{x}_{\text{LS}}$ exists if and only if $A$ has linearly independent columns, that is, $\mathrm{rank}(A) = n$.
* The resulting projection onto the column space of $A$ is $A\mathbf{x}_{\text{LS}} = \operatorname{Proj}_{\mathcal{R}(A)}(\mathbf{b}) = A(A^T A)^{-1} A^T \mathbf{b}$.

:::warning
**Example**

Consider the inconsistent system

$$
\begin{cases} 3x - y = 4 \\ x + 2y = 0 \\ 2x + y = 1 \end{cases}
$$

Let $A = \begin{bmatrix} 3 & -1 \\ 1 & 2 \\ 2 & 1 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} 4 \\ 0 \\ 1 \end{bmatrix}$. The least squares solution is

$$
\mathbf{x}_{\text{LS}} = (A^T A)^{-1} A^T \mathbf{b} = \frac{1}{83} \begin{bmatrix} 87 \\ -56 \end{bmatrix}
$$

* The vector $A\mathbf{x}_{\text{LS}} \approx \begin{bmatrix} 3.82 \\ -0.30 \\ 1.42 \end{bmatrix}$ is the closest vector in $\mathcal{R}(A)$ to $\mathbf{b}$.
* The error vector $\mathbf{b} - A\mathbf{x}_{\text{LS}}$ is orthogonal to $\mathcal{R}(A)$, as confirmed by $A^T (\mathbf{b} - A \mathbf{x}_{\text{LS}}) = \mathbf{0}$.
:::

## QR Equations

:::danger
**Theorem**

If the thin QR decomposition of $A$ is $A = Q_1 R_1$, then the least squares approximation $\mathbf{x}_{\text{LS}}$ is the solution to the **QR Equations**:

$$
R_1 \mathbf{x} = Q_1^T \mathbf{b}
$$

The minimum length of the residual vector equals the norm of the projection of $\mathbf{b}$ onto the orthogonal complement of the range, $\mathcal{R}(A)^{\perp}$:

$$
\| A \mathbf{x}_{\text{LS}} - \mathbf{b} \| = \| Q_2^T \mathbf{b} \|
$$

where the columns of $Q_2$ form an orthonormal basis of $\mathcal{R}(A)^{\perp}$.
:::

* Solving this system is often more *numerically stable* than solving the Normal Equations, especially when $A^T A$ is ill-conditioned. Since $R_1$ is upper triangular, $\mathbf{x}_{\text{LS}}$ is easily found by *backward substitution*.
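As a quick numerical check of both approaches, here is a minimal NumPy sketch (assuming NumPy is available; `np.linalg.solve`, `np.linalg.qr`, and `np.linalg.lstsq` are standard NumPy routines). It reuses the $3 \times 2$ example from the Normal Equations section above; this is a verification sketch, not a production implementation, and in practice the triangular system $R_1 \mathbf{x} = Q_1^T \mathbf{b}$ would be solved by backward substitution rather than a general solver.

```python
import numpy as np

# The 3x2 inconsistent system from the Normal Equations example
A = np.array([[3.0, -1.0],
              [1.0,  2.0],
              [2.0,  1.0]])
b = np.array([4.0, 0.0, 1.0])

# Normal Equations: solve (A^T A) x = A^T b
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# QR Equations: A = Q1 R1 (thin QR), then solve R1 x = Q1^T b
Q1, R1 = np.linalg.qr(A)              # default mode='reduced' gives the thin factorization
x_qr = np.linalg.solve(R1, Q1.T @ b)  # R1 is triangular; use backward substitution in practice

# NumPy's built-in least squares solver, for comparison
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal)                                                      # ~ [1.048, -0.675] = (1/83)[87, -56]
print(np.allclose(x_normal, x_qr), np.allclose(x_normal, x_lstsq))   # True True

# The residual b - A x_LS is orthogonal to R(A): A^T (b - A x_LS) = 0
print(A.T @ (b - A @ x_normal))                                      # ~ [0, 0]
```

The QR route avoids forming $A^T A$ explicitly, whose condition number is the square of that of $A$, which is the source of the stability advantage noted above.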
:::warning
**Example (Finding the Least Squares Approximation)**

Find the least squares approximation, $\mathbf{x}_{\text{LS}}$, for the inconsistent system $A\mathbf{x} \approx \mathbf{b}$, where

$$
A = \begin{bmatrix} 1 & -1 & 4 \\ 1 & 4 & -2 \\ 1 & 4 & 2 \\ 1 & -1 & 0 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}
$$

Since $\mathrm{rank}(A) = 3$, a unique least squares solution exists.

**Method 1: Using the Normal Equations**

The least squares solution satisfies the Normal Equations $A^T A \mathbf{x} = A^T \mathbf{b}$.

1. Compute $A^T A$ and $A^T \mathbf{b}$:
    $$
    A^T A = \begin{bmatrix} 4 & 6 & 4 \\ 6 & 34 & -4 \\ 4 & -4 & 24 \end{bmatrix}, \quad A^T \mathbf{b} = \begin{bmatrix} 10 \\ 15 \\ 6 \end{bmatrix}
    $$
2. Solve for $\mathbf{x}_{\text{LS}}$: computing the inverse $(A^T A)^{-1}$ and multiplying gives
    $$
    \mathbf{x}_{\text{LS}} = (A^T A)^{-1} A^T \mathbf{b} = \begin{bmatrix} 0.5 & -0.1 & -0.1 \\ -0.1 & 0.05 & 0.025 \\ -0.1 & 0.025 & 0.0625 \end{bmatrix} \begin{bmatrix} 10 \\ 15 \\ 6 \end{bmatrix} = \begin{bmatrix} 2.9 \\ -0.1 \\ -0.25 \end{bmatrix}
    $$

**Method 2: Using the QR Equations**

The least squares solution satisfies the QR Equations $R_1 \mathbf{x} = Q_1^T \mathbf{b}$.

1. Use the thin QR decomposition $A = Q_1 R_1$:
    $$
    Q_1 = \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & -\frac{1}{2} & -\frac{1}{2} \end{bmatrix}, \quad R_1 = \begin{bmatrix} 2 & 3 & 2 \\ 0 & 5 & -2 \\ 0 & 0 & 4 \end{bmatrix}
    $$
2. Compute $Q_1^T \mathbf{b}$:
    $$
    Q_1^T \mathbf{b} = \begin{bmatrix} \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \\ \frac{1}{2} & -\frac{1}{2} & \frac{1}{2} & -\frac{1}{2} \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 5 \\ 0 \\ -1 \end{bmatrix}
    $$
3. Solve the upper triangular system
    $$
    R_1 \mathbf{x}_{\text{LS}} = \begin{bmatrix} 5 \\ 0 \\ -1 \end{bmatrix}
    $$
    by backward substitution:
    $$
    \begin{aligned} 4 x_3 = -1 \quad &\implies \quad x_3 = -0.25 \\ 5 x_2 - 2 x_3 = 0 \quad &\implies \quad 5 x_2 - 2(-0.25) = 0 \quad \implies \quad x_2 = -0.1 \\ 2 x_1 + 3 x_2 + 2 x_3 = 5 \quad &\implies \quad 2 x_1 + 3(-0.1) + 2(-0.25) = 5 \quad \implies \quad x_1 = 2.9 \end{aligned}
    $$

Both methods give the least squares approximation

$$
\mathbf{x}_{\text{LS}} = \begin{bmatrix} 2.9 \\ -0.1 \\ -0.25 \end{bmatrix}
$$
:::
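The back-substitution step above can also be checked numerically. The sketch below hard-codes the $Q_1$ and $R_1$ factors from this example (rather than recomputing them, since library QR routines may differ by sign choices), forms $Q_1^T \mathbf{b}$, and solves the triangular system with an explicit backward-substitution loop; a library triangular solver would normally be used instead.

```python
import numpy as np

# Thin QR factors and right-hand side from the worked example above
Q1 = 0.5 * np.array([[ 1, -1,  1],
                     [ 1,  1, -1],
                     [ 1,  1,  1],
                     [ 1, -1, -1]], dtype=float)
R1 = np.array([[2.0, 3.0,  2.0],
               [0.0, 5.0, -2.0],
               [0.0, 0.0,  4.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])

# Sanity checks: Q1 has orthonormal columns and Q1 R1 reproduces A
A = Q1 @ R1
assert np.allclose(Q1.T @ Q1, np.eye(3))

# Right-hand side of the QR Equations
c = Q1.T @ b                                   # [5, 0, -1]

# Backward substitution for the upper triangular system R1 x = c
x = np.zeros(3)
for i in range(2, -1, -1):
    x[i] = (c[i] - R1[i, i + 1:] @ x[i + 1:]) / R1[i, i]

print(x)                                       # [ 2.9, -0.1, -0.25]
print(np.allclose(A.T @ A @ x, A.T @ b))       # x also satisfies the Normal Equations
```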
:::success
**Exercises**

1. Determine the conditions for a unique least squares approximation.
    * (a) Let $A$ be an $m \times n$ matrix with $m \ge n$ and let $\mathbf{b} \in \mathbb{R}^m$. There is a unique vector $\mathbf{x} \in \mathbb{R}^n$ which minimizes the norm of the residual $\|A \mathbf{x} - \mathbf{b}\|$.
    * (b) Let $A$ be an $m \times n$ matrix with $m \ge n$ and $\operatorname{rank}(A) = n$, and let $\mathbf{b} \in \mathbb{R}^m$. There is a unique vector $\mathbf{x} \in \mathbb{R}^n$ which minimizes the norm of the residual $\|A \mathbf{x} - \mathbf{b}\|$.
2. Use the QR equations to find the least squares approximation of the system $A\mathbf{x} = \mathbf{b}$ for $\mathbf{b} = [-2,\, -1,\, 0,\, 1,\, 2]^T$ and
    $$
    Q = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \quad R = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}
    $$
:::

## Fitting Models to Data

The least squares approximation is the foundation of data modeling. Given $m$ data points $(t_i, y_i)$ and a model function $f(t, \mathbf{c})$ with parameters $\mathbf{c} = [c_1, \dots, c_n]^T$, the goal is to find the parameters $\mathbf{c}$ that minimize the Sum of Squared Errors (SSE):

$$
SSE = \sum_i \left( y_i - f(t_i, \mathbf{c}) \right)^2
$$

If the model function is a linear combination of basis functions, $f(t, \mathbf{c}) = c_1 f_1(t) + \cdots + c_n f_n(t)$, the problem is called a **linear data fitting problem**. In this case, the SSE can be written in matrix form:

$$
SSE = \Vert \mathbf{y} - A \mathbf{c} \Vert^2
$$

where

$$
A = \begin{bmatrix} f_1(t_1) & \cdots & f_n(t_1) \\ \vdots & \ddots & \vdots \\ f_1(t_m) & \cdots & f_n(t_m) \end{bmatrix}
$$

is the **design matrix**, and the optimal coefficient vector $\mathbf{c}$ is the least squares approximation of the system $A \mathbf{c} \approx \mathbf{y}$.

:::warning
**Example**

Common examples of linear data fitting:

* **Polynomial Fitting:** $f(t) = c_1 + c_2 t + \dots + c_n t^{n-1}$. This gives a design matrix $A$ whose columns are powers of the $t$-values (a generalized Vandermonde matrix):
    $$
    A = \begin{bmatrix} 1 & t_1 & t_1^2 & \dots & t_1^{n-1} \\ 1 & t_2 & t_2^2 & \dots & t_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_m & t_m^2 & \dots & t_m^{n-1} \end{bmatrix}
    $$
* **Least Squares Linear Regression:** This is polynomial fitting with $n = 2$, fitting a straight line $y = c_1 + c_2 t$:
    $$
    A = \begin{bmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_m \end{bmatrix}
    $$
:::

:::success
**Exercises**

3. Set up (but do not solve) a linear system $B\mathbf{c} = \mathbf{y}$ whose solution is the coefficient vector $\mathbf{c} = [c_0 \; c_1 \; c_2]^T$ for the function $f(t) = c_0 + c_1 \cos(2\pi t) + c_2 \sin(2\pi t)$ that best fits the data $(0, 1), \; (1/4, 3), \; (1/2, 2), \; (3/4, -1), \; (1, 0)$.
4. Find the best fitting line $y = ax + b$ for the three data points $(x_1, y_1) = (-1, 1)$, $(x_2, y_2) = (0, 0)$, $(x_3, y_3) = (1, 0)$.
    * (a) Write down the least squares equation for $\begin{bmatrix} a \\ b \end{bmatrix}$.
    * (b) Solve the least squares equation.
5. Consider the bivariate data $(-1, 0)$, $(1, 1)$, $(3, 1)$.
    * (a) Draw a sketch showing the approximate least-squares straight-line fit $y = ax + b$ to this data.
    * (b) Write down the least squares (or normal) equation satisfied by $\begin{bmatrix} a \\ b \end{bmatrix}$.
    * (c) What quantity is minimized by the solution to the equation in (b)?
:::
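To close the section, here is a small NumPy sketch of how a linear data fitting problem is assembled and solved in code, using the straight-line regression model described above. The data values are made up purely for illustration (they are not taken from the notes or the exercises); `np.vander` builds the design matrix and `np.linalg.lstsq` returns the least squares coefficients.

```python
import numpy as np

# Hypothetical data points (t_i, y_i), for illustration only
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.0, 2.8, 4.2, 4.9])

# Design matrix for the straight-line model y = c1 + c2 t:
# columns are the basis functions 1 and t evaluated at each t_i
A = np.vander(t, N=2, increasing=True)   # same as np.column_stack([np.ones_like(t), t])

# Least squares coefficients minimize SSE = ||y - A c||^2
c, sse, rank, _ = np.linalg.lstsq(A, y, rcond=None)

print(c)      # [c1, c2] ~ [1.04, 0.98] for this data (intercept, slope)
print(sse)    # sum of squared errors of the best-fit line
```

For a degree-$(n-1)$ polynomial fit, `np.vander(t, N=n, increasing=True)` reproduces the generalized Vandermonde design matrix shown in the example above.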