# Error Analysis
## The Big Idea
When you solve a linear system $A \mathbf{x} = \mathbf{b}$, the condition number of the matrix $A$ tells you how sensitive the solution $\mathbf{x}$ is to small changes in the vector $\mathbf{b}$. A large condition number means that a small error in $\mathbf{b}$ can lead to a very large error in the solution $\mathbf{x}$, indicating an ill-conditioned system. Conversely, a small condition number indicates a well-conditioned system.
## Vector Norms
:::info
**Definition** A **vector norm** is a function that assigns a non-negative length, or magnitude, to a vector. It satisfies four properties:
1. (Non-negativity) $\| \mathbf{x} \| \geq 0$ for all $\mathbf{x} \in \mathbb{R}^n$
2. (Definiteness) $\| \mathbf{x} \| = 0$ if and only if $\mathbf{x} = \mathbf{0}$
3. (Homogeneity) $\| c \mathbf{x} \| = |c| \| \mathbf{x} \|$ for any $c \in \mathbb{R}$ and $\mathbf{x} \in \mathbb{R}^n$
4. (Triangle inequality) $\| \mathbf{x} + \mathbf{y} \| \leq \| \mathbf{x} \| + \| \mathbf{y} \|$ for all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$
:::
:::info
**Definition**
Commonly used vector norms for a vector $\mathbf{x} =[x_1, \dots, x_n]^T \in \mathbb{R}^n$ are called **$p$-norms**. They are defined for $1 \leq p < \infty$ as:
$$
\| \mathbf{x} \|_p \equiv (|x_1|^p+\ldots+|x_n|^p)^\frac{1}{p}
$$
Specific examples of these norms include:
* **$1$-norm:** $\|\mathbf{x} \|_1 = |x_1|+\ldots+|x_n|$
* **$2$-norm (Euclidean norm):** $\|\mathbf{x} \|_2 = \sqrt{|x_1|^2+\ldots+|x_n|^2}$
* **$\infty$-norm:** $\| \mathbf{x} \|_\infty \equiv \max_{1 \leq i \leq n} |x_i|$
:::
:::warning
**Example**
The unit ball for a given norm is the set of all vectors with a norm of 1. Different norms define different "shapes" for the unit ball. The image shows the unit balls in $\mathbb{R}^2$ for the $1$-, $2$-, and $\infty$-norms.

:::
### Properties and Examples
* For values of $p$ where $0<p<1$, the $p$-norm formula does not satisfy the triangle inequality, so it does not define a true norm.
* The $\infty$-norm is the limit of the $p$-norm as $p \to \infty$.
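The limit claim can be checked numerically; as a sketch (assuming NumPy is available), `numpy.linalg.norm` accepts arbitrary values of `ord`:

```python
import numpy as np

x = np.array([3.0, 4.0])

# As p grows, the p-norm decreases toward the infinity-norm max|x_i| = 4
for p in [1, 2, 4, 16, 64]:
    print(p, np.linalg.norm(x, p))

print(np.linalg.norm(x, np.inf))  # 4.0
```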
:::warning
**Example**
For the vector $\mathbf{x} =\begin{bmatrix} 3\\4\end{bmatrix}$:
* $\| \mathbf{x} \|_1= 3+4 =7$
* $\|\mathbf{x} \|_2 = \sqrt{3^2+4^2} =5$
* $\|\mathbf{x} \|_\infty = \max \{3,4\}=4$
:::
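The worked example above can be verified with NumPy (assuming it is installed), since `numpy.linalg.norm` computes vector $p$-norms directly:

```python
import numpy as np

x = np.array([3.0, 4.0])

n1 = np.linalg.norm(x, 1)         # |3| + |4| = 7
n2 = np.linalg.norm(x, 2)         # sqrt(3^2 + 4^2) = 5
ninf = np.linalg.norm(x, np.inf)  # max(|3|, |4|) = 4

print(n1, n2, ninf)  # 7.0 5.0 4.0
```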
:::success
**Exercise**
1.
* Suppose $\mathbf{x} =
\begin{bmatrix} 1 & 2 & 2 & c & 1 & 1 \end{bmatrix}^T \in \mathbb{R}^6$ such that $\|\mathbf{x}\|_3 = 3$. If $c \geq 0$, find $c$.
* Suppose $\mathbf{x} = \begin{bmatrix} 1 & 3 & 2 & 2 & c & 2 & 1 & 2 \end{bmatrix}^T \in \mathbb{R}^8$ such that $\|\mathbf{x}\|_3 = 5$. If $c \geq 0$, find $c$.
2. Define $\|\mathbf{x}\|_{\min} = \min_k |x_k|$ for any $\mathbf{x} = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}^T \in \mathbb{R}^n$. Then prove that $\|\mathbf{x}\|_{\min}$ is a norm.
:::
## Matrix Norms
:::info
**Definition**
A **matrix norm** is a function that assigns a non-negative scalar value to a matrix, satisfying five key properties:
1. Non-negativity: $\|A\| \geq 0$
2. Definiteness: $\|A\| = 0 \Leftrightarrow A = 0$
3. Homogeneity: $\|cA\| = |c| \, \|A\| \;\; \forall c \in \mathbb{R}$
4. Triangle Inequality: $\|A + B\| \leq \|A\| + \|B\|$ for any matrix $B$ of the same size
5. Sub-multiplicativity: $\| A B \| \leq \| A \| \| B \|$
:::
Two common types of matrix norms are the Frobenius norm and the operator norm.
### Frobenius Norm
:::info
**Definition**
The **Frobenius norm** of a matrix $A$ is analogous to the Euclidean norm for vectors. It is calculated by taking the square root of the sum of the squares of all the matrix entries:
$$
\| A \|_F = \sqrt{ \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{i,j}|^2 }
$$
where $a_{i,j}$ are the entries of $A$:
$$
A = \begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n}
\end{bmatrix}
$$
:::
* The Frobenius norm can also be expressed in terms of the singular values of $A$, denoted as $\sigma_i$. If $r$ is the rank of the matrix, then: $$\| A \|_F = (\sigma^2_1 + \sigma^2_2 + \dots + \sigma^2_r)^\frac{1}{2}$$
* For a diagonal matrix $D = \mathrm{diag}(d_1, d_2, \dots, d_n)$, the Frobenius norm is the square root of the sum of the squared diagonal entries. $$\| D \|_F = (d_1^2 + d_2^2 + \cdots + d_n^2)^\frac{1}{2}$$
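Both characterizations of the Frobenius norm can be checked numerically; a minimal sketch with NumPy, using an arbitrary example matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Square root of the sum of squared entries
fro = np.linalg.norm(A, 'fro')  # sqrt(1 + 4 + 9 + 16) = sqrt(30)

# Root-sum-of-squares of the singular values gives the same value
sigma = np.linalg.svd(A, compute_uv=False)
fro_from_svd = np.sqrt(np.sum(sigma**2))

print(fro, fro_from_svd)
```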
### Operator Norm
:::info
**Definition**
The **operator norm** of a matrix $A$ is also known as the induced norm. It quantifies the maximum "stretch" that the matrix applies to a unit vector. With respect to a given vector norm $\|\cdot\|$, it is defined as:
$$
\| A \| = \max_{\mathbf{x} \not= 0 } \frac{ \| A \mathbf{x} \| }{ \| \mathbf{x} \|}=
\max_{\|\mathbf{x}\| =1 } \| A \mathbf{x} \|
$$
:::
* For the $2$-norm, the operator norm equals the largest singular value of the matrix, $\sigma_1$.
* It also satisfies the property that $\| A \mathbf{x} \| \leq \| A \| \| \mathbf{x} \|$ for any vector $\mathbf{x}$.
* If a matrix $A$ is nonsingular, the operator norm of its inverse is given by $$\| A^{-1} \| = \frac{1}{ \displaystyle \min_{ \| \mathbf{x} \| = 1 } \| A \mathbf{x} \|}$$
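These properties can be spot-checked with NumPy's SVD; a sketch, using an arbitrary nonsingular matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

sigma = np.linalg.svd(A, compute_uv=False)  # singular values, descending

# The operator 2-norm equals the largest singular value
assert np.isclose(np.linalg.norm(A, 2), sigma[0])

# For nonsingular A, ||A^{-1}||_2 = 1 / (smallest singular value),
# the reciprocal of the minimum stretch of a unit vector
assert np.isclose(np.linalg.norm(np.linalg.inv(A), 2), 1.0 / sigma[-1])
```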
:::warning
**Example** Consider
$$
A = \begin{bmatrix} 1&1 \\ -1 & 1 \end{bmatrix}
$$
* First compute the $1$-norm of $A$:
$$|x| + |y| = 1 \Rightarrow |x-y| + |x+y| = \begin{cases} 2|x| & \text{if } |x| \geq |y| \\ 2|y| & \text{if } |x| \leq |y| \end{cases}$$ Thus,
$$
\|A\|_1 = \max_{\|\mathbf{x}\|_1 = 1} \|A\mathbf{x}\|_1 = 2
$$
* Next compute the $2$-norm of $A$:
$$x^2 + y^2 = 1 \Longrightarrow (x-y)^2 + (x+y)^2 = 2(x^2 + y^2) = 2$$ Thus,
$$
\|A\|_2 = \max_{\|\mathbf{x}\|_2 = 1} \|A\mathbf{x}\|_2 = \sqrt{2}
$$
* Finally, compute the $\infty$-norm of $A$:
$$\max \{|x|, |y|\} = 1 \Longrightarrow \max\{|x-y|, |x+y|\} \leq 2$$ Equality holds if and only if $|x| = |y| = 1$. Hence
$$
\|A\|_\infty = \max_{\|\mathbf{x}\|_\infty = 1} \|A\mathbf{x}\|_\infty = 2
$$
:::
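The three hand computations above agree with NumPy's induced matrix norms (`ord=1`, `2`, and `np.inf` in `numpy.linalg.norm`):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [-1.0, 1.0]])

print(np.linalg.norm(A, 1))       # 2.0 (also the max absolute column sum)
print(np.linalg.norm(A, 2))       # 1.414... = sqrt(2)
print(np.linalg.norm(A, np.inf))  # 2.0 (also the max absolute row sum)
```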
:::warning
**Example** Consider the diagonal matrix $D = \mathrm{diag}(d_1, d_2, \dots, d_n)$.
* Note that
\begin{align} \|D\mathbf{x}\|_2^2 &= (d_1 x_1)^2 + \cdots + (d_n x_n)^2
\\ &\leq C^2 (|x_1|^2 + \cdots + |x_n|^2) \\ &= C^2 \|\mathbf{x}\|_2^2 \end{align}
where $C \equiv\max\{ |d_1|, |d_2|, \dots, |d_n| \}$.
* It follows that $$\|D \mathbf{x}\|_2 \leq C \|\mathbf{x}\|_2
\Longrightarrow \|D\|_{op} \leq C$$
* Suppose, without loss of generality, that $C = |d_1|$, and take $\mathbf{x} = \mathbf{e}_1$.
\begin{align} D\mathbf{x} = d_1 \mathbf{e}_1 &\Longrightarrow
\frac{\|D\mathbf{x}\|_2}{\|\mathbf{x}\|_2} = |d_1| = C \\ &\Longrightarrow \|D\|_{op} \geq C \end{align}
* Therefore
$$
\|D\|_{op} = \max \{ |d_1|, |d_2|, \dots, |d_n| \}
$$
:::
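A numeric check of the diagonal-matrix result, with an arbitrary choice of entries:

```python
import numpy as np

d = np.array([3.0, -5.0, 2.0])
D = np.diag(d)

# Operator 2-norm of a diagonal matrix = largest |d_i| = 5
assert np.isclose(np.linalg.norm(D, 2), np.max(np.abs(d)))
print(np.linalg.norm(D, 2))  # 5.0
```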
## Condition Number
:::info
**Definition** The **condition number** of a nonsingular square matrix $A$ is defined using an operator norm:
$$
\mathrm{cond}(A) \equiv \| A \| \| A^{-1} \|
$$
By convention, we define $\mathrm{cond}(A) = \infty$ if $\det(A) = 0$.
:::
A well-conditioned matrix has a condition number close to 1, while a large condition number indicates an ill-conditioned matrix. The condition number can also be interpreted as the ratio of the maximum stretch to the minimum stretch that the matrix applies to a unit vector.
:::danger
**Proposition** If $A$ is nonsingular, we have
\begin{align}
\mathrm{cond}(A)
&= \| A \| \| A^{-1} \| \\
&= \frac{\text{maximum stretch of a unit vector}}{\text{minimum stretch of a unit vector}} \\
&\geq 1
\end{align}
:::
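In NumPy the condition number is available directly as `numpy.linalg.cond`; a sketch verifying the definition and the lower bound, using a nearly singular example matrix:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.001]])  # nearly singular

# cond(A) = ||A|| * ||A^{-1}|| in the operator 2-norm
cond = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)

assert np.isclose(cond, np.linalg.cond(A, 2))  # numpy computes the same quantity
assert cond >= 1.0
print(cond)  # roughly 4e3: badly conditioned
```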
:::warning
**Example** The image below shows the unit circle and its image under the linear transformation defined by a $2 \times 2$ matrix $A$. Determine $\| A \|$, $\| A^{-1} \|$ and $\mathrm{cond}(A)$.

Observe
* the maximum stretch of a unit vector is $\| A \| = 3 / \sqrt{2}$
* the minimum stretch is $1/\sqrt{2}$, so $\| A^{-1} \| = \sqrt{2}$
* thus the condition number is $\mathrm{cond}(A) = 3$.
:::
## Relative Errors
The condition number directly relates the relative error in the input vector $\mathbf{b}$ to the relative error in the solution vector $\mathbf{x}$. Consider two linear systems:
1. The exact system: $A\mathbf{x} = \mathbf{b}$
2. The perturbed system: $A\mathbf{x}' = \mathbf{b} + \Delta \mathbf{b}$
The change in the solution is $\Delta \mathbf{x} = \mathbf{x}' - \mathbf{x}$. The following theorem provides a bound on the relative error of the solution:
:::danger
**Theorem** Let $A$ be a nonsingular matrix and consider the linear system $A \mathbf{x} = \mathbf{b}$. If a small change $\Delta \mathbf{b}$ corresponds to a change $\Delta \mathbf{x}$ in the sense that $A(\mathbf{x} + \Delta \mathbf{x}) = \mathbf{b} + \Delta \mathbf{b}$, then
$$
\frac{\| \Delta \mathbf{x} \|}{\| \mathbf{x} \|} \leq \mathrm{cond}(A) \frac{\| \Delta \mathbf{b} \|}{\| \mathbf{b} \|}
$$
:::
This theorem shows that the relative error in the solution can be up to $\mathrm{cond}(A)$ times larger than the relative error in the input.
:::warning
**Example: A Tale of Two Matrices**
The two examples below compare a well-conditioned matrix and an ill-conditioned matrix to show how small changes in $\mathbf{b}$ affect the solution.
**Case 1 (Well-Conditioned):** $A = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} 2 \\ 2 \end{bmatrix}$.
* The solution is $\mathbf{x} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.
* If we introduce a small change $\Delta \mathbf{b} = \begin{bmatrix} 0 \\ 0.001 \end{bmatrix}$, the new solution is $\mathbf{x}' = \begin{bmatrix} 1 \\ 1.0005 \end{bmatrix}$.
* Both the relative change in $\mathbf{b}$ and the relative change in $\mathbf{x}$ are very small.
**Case 2 (Ill-Conditioned):** $A = \begin{bmatrix} 1 & 1 \\ 1 & 1.001 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} 2 \\ 2 \end{bmatrix}$.
* The solution is $\mathbf{x} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}$.
* If we introduce the same small change $\Delta \mathbf{b} = \begin{bmatrix} 0 \\ 0.001 \end{bmatrix}$, the new solution is $\mathbf{x}' = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.
* This matrix is nearly singular, with a large condition number ($\mathrm{cond}(A) \approx 4000$ in the $2$-norm). A tiny change in the input produced a change in the output on the order of the solution itself, demonstrating the system's sensitivity.
:::
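Both cases, together with the theorem's bound on the relative error, can be reproduced with NumPy:

```python
import numpy as np

b = np.array([2.0, 2.0])
db = np.array([0.0, 0.001])  # small perturbation of b

# Case 1: well-conditioned
A1 = np.array([[2.0, 0.0], [0.0, 2.0]])
x1 = np.linalg.solve(A1, b)        # [1, 1]
x1p = np.linalg.solve(A1, b + db)  # [1, 1.0005]

# Case 2: ill-conditioned (nearly singular)
A2 = np.array([[1.0, 1.0], [1.0, 1.001]])
x2 = np.linalg.solve(A2, b)        # [2, 0]
x2p = np.linalg.solve(A2, b + db)  # [1, 1]

# The theorem's bound holds in both cases
# (small tolerance added for floating-point roundoff):
for A, x, xp in [(A1, x1, x1p), (A2, x2, x2p)]:
    rel_x = np.linalg.norm(xp - x) / np.linalg.norm(x)
    rel_b = np.linalg.norm(db) / np.linalg.norm(b)
    assert rel_x <= np.linalg.cond(A, 2) * rel_b + 1e-12
```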