# Error Analysis

### The Big Idea

When you solve a linear system $A \mathbf{x} = \mathbf{b}$, the condition number of the matrix $A$ tells you how sensitive the solution $\mathbf{x}$ is to small changes in the vector $\mathbf{b}$. A large condition number means that a small error in $\mathbf{b}$ can lead to a very large error in the solution $\mathbf{x}$, indicating an ill-conditioned system. Conversely, a small condition number indicates a well-conditioned system.

## Vector Norms

:::info
**Definition**
A **vector norm** is a function that assigns a non-negative length or magnitude to a vector. It satisfies four properties:
1. (Non-negativity) $\| \mathbf{x} \| \geq 0$ for all $\mathbf{x} \in \mathbb{R}^n$
2. (Definiteness) $\| \mathbf{x} \| = 0$ if and only if $\mathbf{x} = \mathbf{0}$
3. (Homogeneity) $\| c \mathbf{x} \| = |c| \| \mathbf{x} \|$ for any $c \in \mathbb{R}$ and $\mathbf{x} \in \mathbb{R}^n$
4. (Triangle inequality) $\| \mathbf{x} + \mathbf{y} \| \leq \| \mathbf{x} \| + \| \mathbf{y} \|$ for all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$
:::

:::info
**Definition**
Commonly used vector norms for a vector $\mathbf{x} = [x_1, \dots, x_n]^T \in \mathbb{R}^n$ are the **$p$-norms**, defined for $1 \leq p < \infty$ as:
$$
\| \mathbf{x} \|_p \equiv (|x_1|^p+\ldots+|x_n|^p)^\frac{1}{p}
$$
Specific examples of these norms include:
* **$1$-norm:** $\|\mathbf{x} \|_1 = |x_1|+\ldots+|x_n|$
* **$2$-norm (Euclidean norm):** $\|\mathbf{x} \|_2 = \sqrt{|x_1|^2+\ldots+|x_n|^2}$
* **$\infty$-norm:** $\| \mathbf{x} \|_\infty \equiv \max_{1 \leq i \leq n} |x_i|$
:::

:::warning
**Example**
The unit ball for a given norm is the set of all vectors with a norm of 1. Different norms define different "shapes" for the unit ball. The image shows the unit balls in $\mathbb{R}^2$ for several $p$-norms.
![image](https://hackmd.io/_uploads/HyYgdNhjgx.png)
:::

### Properties and Examples

* For values of $p$ with $0 < p < 1$, the formula does not satisfy the triangle inequality, so it is not a true norm.
* The $\infty$-norm is the limit of the $p$-norm as $p \to \infty$.

:::warning
**Example**
For the vector $\mathbf{x} =\begin{bmatrix} 3\\4\end{bmatrix}$:
* $\| \mathbf{x} \|_1= 3+4 =7$
* $\|\mathbf{x} \|_2 = \sqrt{3^2+4^2} =5$
* $\|\mathbf{x} \|_\infty = \max \{3,4\}=4$
:::

:::success
**Exercise**
1. Find $c$ in each case:
    * Suppose $\mathbf{x} = \begin{bmatrix} 1 & 2 & 2 & c & 1 & 1 \end{bmatrix}^T \in \mathbb{R}^6$ such that $\|\mathbf{x}\|_3 = 3$. If $c \geq 0$, find $c$.
    * Suppose $\mathbf{x} = \begin{bmatrix} 1 & 3 & 2 & 2 & c & 2 & 1 & 2 \end{bmatrix}^T \in \mathbb{R}^8$ such that $\|\mathbf{x}\|_3 = 5$. If $c \geq 0$, find $c$.
2. Define $\|\mathbf{x}\|_{\min} = \min_k |x_k|$ for any $\mathbf{x} = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}^T \in \mathbb{R}^n$. Determine whether $\|\mathbf{x}\|_{\min}$ is a norm; prove your claim or give a counterexample.
:::
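For a quick numerical check of the worked example above, the $p$-norms can be computed with NumPy. A minimal sketch, assuming only the standard `np.linalg.norm` routine and reusing the vector $\mathbf{x} = [3, 4]^T$:

```python
import numpy as np

x = np.array([3.0, 4.0])

# Built-in p-norms: ord = 1, 2, and np.inf
print(np.linalg.norm(x, ord=1))       # 7.0 = |3| + |4|
print(np.linalg.norm(x, ord=2))       # 5.0 = sqrt(3^2 + 4^2)
print(np.linalg.norm(x, ord=np.inf))  # 4.0 = max{|3|, |4|}

# The general p-norm straight from the definition, e.g. p = 3
p = 3
print(np.sum(np.abs(x) ** p) ** (1 / p))  # same as np.linalg.norm(x, ord=3)
```

The last line is also a convenient way to check answers to Exercise 1, where $p = 3$.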
## Matrix Norms

:::info
**Definition**
A **matrix norm** is a function that assigns a non-negative scalar value to a matrix, satisfying five key properties:
1. Non-negativity: $\|A\| \geq 0$
2. Definiteness: $\|A\| = 0 \Leftrightarrow A = 0$
3. Homogeneity: $\|cA\| = |c| \, \|A\| \;\; \forall c \in \mathbb{R}$
4. Triangle inequality: $\|A + B\| \leq \|A\| + \|B\|$ for any matrix $B$
5. Sub-multiplicativity: $\| A B \| \leq \| A \| \| B \|$
:::

Two common types of matrix norms are the Frobenius norm and the operator norm.

### Frobenius Norm

:::info
**Definition**
The **Frobenius norm** of a matrix $A$ is analogous to the Euclidean norm for vectors. It is calculated by taking the square root of the sum of the squares of all the matrix entries:
$$
\| A \|_F = \sqrt{ \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{i,j}|^2 }
$$
where the $a_{i,j}$ are the entries of
$$
A = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{bmatrix}
$$
:::

* The Frobenius norm can also be expressed in terms of the singular values of $A$, denoted $\sigma_i$. If $r$ is the rank of the matrix, then:
$$\| A \|_F = (\sigma^2_1 + \sigma^2_2 + \dots + \sigma^2_r)^\frac{1}{2}$$
* For a diagonal matrix $D = \mathrm{diag}(d_1, d_2, \dots, d_n)$, the Frobenius norm is the square root of the sum of the squared diagonal entries:
$$\| D \|_F = (d_1^2 + d_2^2 + \cdots + d_n^2)^\frac{1}{2}$$

### Operator Norm

:::info
**Definition**
The **operator norm** of a matrix $A$, also known as the induced norm, quantifies the maximum "stretch" that the matrix applies to a unit vector. With respect to a given vector norm, it is defined as:
$$
\| A \| = \max_{\mathbf{x} \not= 0 } \frac{ \| A \mathbf{x} \| }{ \| \mathbf{x} \|}= \max_{\|\mathbf{x}\| =1 } \| A \mathbf{x} \|
$$
:::

* For the $2$-norm, the operator norm equals the largest singular value of the matrix, $\sigma_1$.
* It also satisfies the property that $\| A \mathbf{x} \| \leq \| A \| \| \mathbf{x} \|$ for any vector $\mathbf{x}$.
* If a matrix $A$ is nonsingular, the operator norm of its inverse is given by
$$\| A^{-1} \| = \frac{1}{ \displaystyle \min_{ \| \mathbf{x} \| = 1 } \| A \mathbf{x} \|}$$

:::warning
**Example**
Consider
$$
A = \begin{bmatrix} 1&1 \\ -1 & 1 \end{bmatrix}
$$
* First compute the $1$-norm of $A$:
$$|x| + |y| = 1 \Longrightarrow |x-y| + |x+y| = \begin{cases} 2|x| & \text{if } |x| \geq |y| \\ 2|y| & \text{if } |x| \leq |y| \end{cases}$$
Thus,
$$
\|A\|_1 = \max_{\|\mathbf{x}\|_1 = 1} \|A\mathbf{x}\|_1 = 2
$$
* Next compute the $2$-norm of $A$:
$$x^2 + y^2 = 1 \Longrightarrow (x-y)^2 + (x+y)^2 = 2(x^2 + y^2) = 2$$
Thus,
$$
\|A\|_2 = \max_{\|\mathbf{x}\|_2 = 1} \|A\mathbf{x}\|_2 = \sqrt{2}
$$
* Finally, compute the $\infty$-norm of $A$:
$$\max \{|x|, |y|\} = 1 \Longrightarrow \max\{|x-y|, |x+y|\} \leq 2$$
Equality holds if and only if $|x| = |y| = 1$. Hence
$$
\|A\|_\infty = \max_{\|\mathbf{x}\|_\infty = 1} \|A\mathbf{x}\|_\infty = 2
$$
:::

:::warning
**Example**
Consider the diagonal matrix $D = \mathrm{diag}(d_1, d_2, \dots, d_n)$.
* Note that
\begin{align} \|D\mathbf{x}\|_2^2 &= (d_1 x_1)^2 + \cdots + (d_n x_n)^2 \\ &\leq C^2 (|x_1|^2 + \cdots + |x_n|^2) \\ &= C^2 \|\mathbf{x}\|_2^2 \end{align}
where $C \equiv \max\{ |d_1|, |d_2|, \dots, |d_n| \}$.
* It follows that
$$\|D \mathbf{x}\|_2 \leq C \|\mathbf{x}\|_2 \Longrightarrow \|D\|_{op} \leq C$$
* Suppose $C = |d_1|$, and take $\mathbf{x} = \mathbf{e}_1$. Then
\begin{align} D\mathbf{x} = d_1 \mathbf{e}_1 &\Longrightarrow \frac{\|D\mathbf{x}\|_2}{\|\mathbf{x}\|_2} = |d_1| = C \\ &\Longrightarrow \|D\|_{op} \geq C \end{align}
* Therefore
$$
\|D\|_{op} = \max \{ |d_1|, |d_2|, \dots, |d_n| \}
$$
:::
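The norms worked out by hand in the examples above can be verified numerically. A minimal sketch assuming only standard NumPy: `np.linalg.norm` with `ord=1`, `2`, `np.inf`, or `'fro'` gives the induced $1$-, $2$-, $\infty$-norms and the Frobenius norm of a matrix, and `np.linalg.svd` confirms that the operator $2$-norm equals the largest singular value $\sigma_1$.

```python
import numpy as np

A = np.array([[ 1.0, 1.0],
              [-1.0, 1.0]])

# Induced (operator) norms, matching the hand computation
print(np.linalg.norm(A, ord=1))       # 2.0        (max absolute column sum)
print(np.linalg.norm(A, ord=2))       # 1.41421... (= sqrt(2))
print(np.linalg.norm(A, ord=np.inf))  # 2.0        (max absolute row sum)

# Frobenius norm: sqrt(1^2 + 1^2 + 1^2 + 1^2) = 2
print(np.linalg.norm(A, ord='fro'))   # 2.0

# Operator 2-norm = largest singular value sigma_1
sigma = np.linalg.svd(A, compute_uv=False)
print(sigma[0])                       # 1.41421..., agrees with ord=2
```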
## Condition Number

:::info
**Definition**
The **condition number** of a nonsingular square matrix $A$ is defined using an operator norm:
$$
\mathrm{cond}(A) \equiv \| A \| \| A^{-1} \|
$$
By convention, we define $\mathrm{cond}(A) = \infty$ if $\det(A) = 0$.
:::

A well-conditioned matrix has a condition number close to 1, while a large condition number indicates an ill-conditioned matrix. The condition number can also be interpreted as the ratio of the maximum stretch to the minimum stretch that the matrix applies to a unit vector.

:::danger
**Proposition**
If $A$ is nonsingular, we have
\begin{align} \mathrm{cond}(A) &= \| A \| \| A^{-1} \| \\ &= \frac{\text{maximum stretch of a unit vector}}{\text{minimum stretch of a unit vector}} \\ &\geq 1 \end{align}
:::

:::warning
**Example**
The image below shows the unit circle and its image under the linear transformation defined by a $2 \times 2$ matrix $A$. Determine $\| A \|$, $\| A^{-1} \|$, and $\mathrm{cond}(A)$.
![image](https://hackmd.io/_uploads/Sk7pvY3jge.png)
Observe:
* the maximum stretch of a unit vector is $\| A \| = 3 / \sqrt{2}$;
* the minimum stretch is $1/\sqrt{2}$, so $\| A^{-1} \| = \sqrt{2}$;
* thus the condition number is $\mathrm{cond}(A) = \frac{3}{\sqrt{2}} \cdot \sqrt{2} = 3$.
:::

## Relative Errors

The condition number directly relates the relative error in the input vector $\mathbf{b}$ to the relative error in the solution vector $\mathbf{x}$. Consider two linear systems:
1. The exact system: $A\mathbf{x} = \mathbf{b}$
2. The perturbed system: $A\mathbf{x}' = \mathbf{b} + \Delta \mathbf{b}$

The change in the solution is $\Delta \mathbf{x} = \mathbf{x}' - \mathbf{x}$. The following theorem provides a bound on the relative error of the solution:

:::danger
**Theorem**
Let $A$ be a nonsingular matrix and consider the linear system $A \mathbf{x} = \mathbf{b}$. If a small change $\Delta \mathbf{b}$ corresponds to a change $\Delta \mathbf{x}$ in the sense that $A(\mathbf{x} + \Delta \mathbf{x}) = \mathbf{b} + \Delta \mathbf{b}$, then
$$
\frac{\| \Delta \mathbf{x} \|}{\| \mathbf{x} \|} \leq \mathrm{cond}(A) \frac{\| \Delta \mathbf{b} \|}{\| \mathbf{b} \|}
$$
:::

This theorem shows that the relative error in the solution can be up to $\mathrm{cond}(A)$ times larger than the relative error in the input.

:::warning
**Example: A Tale of Two Matrices**
The two examples below compare a well-conditioned matrix and an ill-conditioned matrix to show how small changes in $\mathbf{b}$ affect the solution.

**Case 1 (Well-Conditioned):** $A = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} 2 \\ 2 \end{bmatrix}$.
* The solution is $\mathbf{x} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.
* If we introduce a small change $\Delta \mathbf{b} = \begin{bmatrix} 0 \\ 0.001 \end{bmatrix}$, the new solution is $\mathbf{x}' = \begin{bmatrix} 1 \\ 1.0005 \end{bmatrix}$.
* Both the relative change in $\mathbf{b}$ and the relative change in $\mathbf{x}$ are very small; here $\mathrm{cond}(A) = 1$.

**Case 2 (Ill-Conditioned):** $A = \begin{bmatrix} 1 & 1 \\ 1 & 1.001 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} 2 \\ 2 \end{bmatrix}$.
* The solution is $\mathbf{x} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}$.
* If we introduce the same small change $\Delta \mathbf{b} = \begin{bmatrix} 0 \\ 0.001 \end{bmatrix}$, the new solution is $\mathbf{x}' = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.
* This matrix is nearly singular, with a large condition number ($\mathrm{cond}_\infty(A) = 2.001 \times 2001 \approx 4 \times 10^3$). A tiny change in the input produced a large change in the output, demonstrating the system's sensitivity.
:::
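Both cases are easy to reproduce numerically. In the sketch below, `relative_error_demo` is a helper name introduced here for illustration; `np.linalg.solve` and `np.linalg.cond` (which returns the 2-norm condition number by default) are standard NumPy routines.

```python
import numpy as np

def relative_error_demo(A, b, db):
    """Compare the relative change in b with the resulting relative change in x."""
    x  = np.linalg.solve(A, b)            # exact system  A x = b
    dx = np.linalg.solve(A, b + db) - x   # perturbed system
    rel_b = np.linalg.norm(db) / np.linalg.norm(b)
    rel_x = np.linalg.norm(dx) / np.linalg.norm(x)
    print(f"cond(A) = {np.linalg.cond(A):9.1f}   "
          f"rel. change in b = {rel_b:.2e}   rel. change in x = {rel_x:.2e}")

b  = np.array([2.0, 2.0])
db = np.array([0.0, 0.001])

# Case 1: well-conditioned, cond(A) = 1
relative_error_demo(np.array([[2.0, 0.0], [0.0, 2.0]]), b, db)

# Case 2: nearly singular, cond(A) ~ 4e3; the same tiny perturbation of b
# changes x by a relative amount about 2000 times larger
relative_error_demo(np.array([[1.0, 1.0], [1.0, 1.001]]), b, db)
```

In Case 2 the relative change in $\mathbf{x}$ works out to exactly $2000$ times the relative change in $\mathbf{b}$, comfortably within the theorem's bound of $\mathrm{cond}(A) \approx 4 \times 10^3$.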