# Condition Numbers
## Overview
A **condition number** measures how sensitive the output of a problem is to small changes in its input.
In other words, if small input changes cause only small output changes, the problem is **well-conditioned**.
Conversely, if small input changes create large output changes, the problem is **ill-conditioned**.
In very simple terms, a **high condition number** indicates the mathematical problem is **unstable** whereas a **low condition number** indicates the problem is **stable.**
To better understand **condition numbers**, we can dissect how they apply to linear systems of equations.
## Condition Number
To begin, the 'textbook definition' of the condition formula is:
$\kappa(A) = \|A\|\,\|A^{-1}\|$
This is the standard way to **measure how sensitive a linear system** is to **small changes in its input.**
The condition number $\kappa(A)$ measures how **sensitive** a system is to small changes in its input. The term $\|A\|$ shows how much the matrix can **stretch** an input, and $\|A^{-1}\|$ shows how much **stretching** is needed to reverse that effect. When both are **large**, $\kappa(A)$ becomes large, meaning even tiny input errors can turn into much larger changes in the output.
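As a minimal sketch of the formula (using NumPy, and borrowing the matrix from the numeric example later in this section), the product $\|A\|\,\|A^{-1}\|$ can be computed directly and compared against NumPy's built-in condition number:

```python
import numpy as np

# The 2x2 matrix used in the numeric example later in this section.
A = np.array([[100.0, 99.0],
              [99.0, 98.0]])

# kappa(A) = ||A|| * ||A^{-1}||, here measured in the 2-norm.
kappa = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)

print(kappa)                 # large: this matrix is ill-conditioned
print(np.linalg.cond(A, 2))  # NumPy's built-in, matching the product above
```

Both prints give roughly $3.9 \times 10^4$, a warning that this system amplifies input errors heavily.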
Each part of the formula contributes to that concept, and each will now be explained in detail.
### $\kappa(A)$: The Condition Number
You can think of $\kappa(A)$ as the **condition number of $A$**, similar to how you read:
- $f(x)$ = a function
- $p(t)$ = a polynomial
So
- $\kappa$ is the **name** of the function
- $A$ is the **input**
- $\kappa(A)$ is the **output**, a single number measuring how sensitive the system is.
### $A$: The System Itself
The matrix $A$ represents the system of equations you are trying to solve. It tells you how the input $b$ relates to the output $x$ through:
$Ax = b$
### $\|A\|$: How Much the System Can Stretch Things
The notation $\|A\|$ is called the **matrix norm** of $A$.
In simple terms, it measures **how much the matrix $A$ can stretch or amplify a vector.**
If $\|A\|$ is large, then $A$ can take a small input and turn it into something much larger.
If $\|A\|$ is small, $A$ does not 'stretch' things as much.
You can think of $\|A\|$ as 'how aggressively the system transforms inputs.'
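To make the 'stretching' idea concrete, the following sketch (using NumPy, with the matrix from the numeric example later in this section) samples random unit vectors and confirms that none of them is stretched by more than $\|A\|$:

```python
import numpy as np

# The matrix from the numeric example later in this section.
A = np.array([[100.0, 99.0],
              [99.0, 98.0]])

# The induced 2-norm is the largest factor by which A can stretch a unit vector.
norm_A = np.linalg.norm(A, 2)

# Sample random unit vectors: none is stretched by more than ||A||.
rng = np.random.default_rng(0)
v = rng.normal(size=(2, 1000))
v /= np.linalg.norm(v, axis=0)          # normalize each column to length 1
stretch = np.linalg.norm(A @ v, axis=0)  # how far A moves each unit vector

print(norm_A)          # about 198
print(stretch.max())   # close to, but never above, norm_A
```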
### $A^{-1}$: The System Running Backwards
$A^{-1}$ is the **inverse** of the matrix $A.$
Mathematically, it tells you how to undo what $A$ does.
From the equation:
$Ax = b$
the solution is
$x = A^{-1}b$
So, $A^{-1}$ is the part of the system that transforms the input $b$ into the output $x$.
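As a quick sketch of $x = A^{-1}b$ (again borrowing the matrix and data from the numeric example later in this section), NumPy applies the inverse without ever forming it explicitly:

```python
import numpy as np

# Matrix and right-hand side from the numeric example later in this section.
A = np.array([[100.0, 99.0],
              [99.0, 98.0]])
b = np.array([199.0, 197.0])

# np.linalg.solve applies A^{-1} to b without forming the inverse explicitly,
# which is both faster and numerically safer than computing inv(A) @ b.
x = np.linalg.solve(A, b)
print(x)   # approximately [1, 1]
```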
Now that we have a base understanding of the formula itself, we will dive into how it relates to the linear systems of equations through $Ax = b$.
## Condition Numbers for Linear Systems $Ax = b$
Focusing on the linear system of $Ax = b$ where the following are true:
1. $A$ is an $n \times n$ matrix of **coefficients**. (These coefficients are the numerical values that multiply the unknowns in the system; e.g. in $3x_1 - 2x_2 = 5$, the numbers **3** and **-2** are **coefficients**.)
2. $x$ is the unknown vector
3. $b$ is the data (or better thought of as the 'input' to the problem. It contains the values that the linear combinations should equal.)
When applied to the linear system $Ax = b$, the condition number addresses a precise question: **If we slightly change the input $b$, how much can the output $x$ change?**
In this dissection, the **original problem** can be stated as:
$Ax = b \quad\Rightarrow\quad x = A^{-1}b$
Now, to illustrate the concept of the **condition number**, an alteration will be made by changing the input $b$ slightly which could represent a few things:
1. A rounding error
2. Measurement noise
3. Any accidental or unavoidable small change in the data
This is represented by:
$b \to b + e$
Where $e$ is a **small error vector**.
This would effectually change the system to:
$A(x + \delta x) = b + e$
where $\delta x$ represents the change in the solution caused by the change in the input.
Solving for the new solution:
$x + \delta x = A^{-1}(b + e)$
Then subtracting the original solution $x = A^{-1}b$ gives:
$\delta x = A^{-1}e$
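The relationship $\delta x = A^{-1}e$ can be checked numerically. The following sketch uses the matrix and data from the numeric example later in this section, with a small error vector $e$ assumed purely for illustration:

```python
import numpy as np

# Matrix and data from the numeric example later in this section;
# e is a small error vector assumed for illustration.
A = np.array([[100.0, 99.0],
              [99.0, 98.0]])
b = np.array([199.0, 197.0])
e = np.array([0.001, -0.002])

x = np.linalg.solve(A, b)            # original solution
x_new = np.linalg.solve(A, b + e)    # solution with perturbed input

delta_x = x_new - x
# The observed change in the solution equals A^{-1} e:
print(delta_x)
print(np.linalg.inv(A) @ e)
```

Here an input error of size about $2 \times 10^{-3}$ produces an output change of size about $0.4$, a nearly 200-fold amplification.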
This equation is extremely important, since it **shows exactly how an error in the input ($e$) is turned into an error in the output ($\delta x$)**.
The matrix $A^{-1}$ is responsible for *amplifying* or *dampening* the error.
Even this simple derivation already demonstrates the core idea:
1. A small change in the input $b$ produces a change in the solution $x$.
2. The matrix $A$ -- specifically how 'invertible' it is -- determines how large that change can be.
If $A^{-1}$ is 'large', then even a very small input error $e$ can create a large output error $\delta x$.
This is exactly the motivation for the **condition number**, which quantifies **how much** the matrix $A$ can amplify errors.
## From Error Amplification to the Condition Number
In the previous section, we arrived at the key relationship:
$\delta x = A^{-1}e$
which tells us how a small change in $e$ in the input vector $b$ produces a change $\delta x$ in the solution.
Next, we will focus on turning this idea into a precise numerical quantity (a single number that measures how **sensitive** the system is).
This number is the **condition number** of the matrix $A$.
## Absolute vs. Relative Error
When we solve a linear system such as $Ax = b$, we usually assume that the vector $b$ we are given is correct. In real problems -- measurements, rounding, data collection, etc. -- the numbers in $b$ are *never* exact.
I.e. there is always some small error.
To understand how these errors behave, we measure their **size** in two different ways:
1. **Absolute error**
2. **Relative error**
## Absolute Error
Absolute error is simply the **raw size** of the mistake.
**Absolute error in the input:**
If the true input is $b$ and what we actually get is $b + e$, then $e$ is the error.
Its absolute size is $\|e\|$,
which tells us 'how big the error vector is by itself.'
**Absolute error in the output:**
Since the system is sensitive, this input error produces a change in the solution $\delta x$.
The size of that change is:
$\|\delta x\| = \|A^{-1}e\|$
which tells us 'how much the solution moved due to the input error.'
## Why Absolute Error Is Not Enough
An absolute error of 0.1 could be either:
1. **Insignificant** if the true value is, for example, 5000
2. **Very Significant** if the true value is, for example, 0.02
Thus, absolute error does not tell the complete story which is why it is also important to consider **relative** error.
## Relative Error
Relative error compares the error to the size of the true value. It tells us **how large the mistake** is in proportion to what it is measuring.
**Relative error in the input:**
$\frac{\|e\|}{\|b\|}$
This means 'how large is the error in the data compared to the data itself.'
**Relative error in the output:**
$\frac{\|\delta x\|}{\|x\|}$
This means 'how large is the change in the solution compared to the solution itself.'
These ratios give a fair sense of how serious the error is.
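Both ratios can be computed directly. This sketch uses the matrix and data from the numeric example later in this section, with a $0.01$ perturbation assumed as the input error:

```python
import numpy as np

# Matrix and data from the numeric example later in this section;
# e is an assumed 0.01 input error.
A = np.array([[100.0, 99.0],
              [99.0, 98.0]])
b = np.array([199.0, 197.0])
e = np.array([0.0, 0.01])

x = np.linalg.solve(A, b)
delta_x = np.linalg.solve(A, b + e) - x

rel_in = np.linalg.norm(e) / np.linalg.norm(b)         # relative input error
rel_out = np.linalg.norm(delta_x) / np.linalg.norm(x)  # relative output error
print(rel_in)    # tiny: about 3.6e-5
print(rel_out)   # large by comparison: about 1.0
```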
## How This Connects to Condition Numbers
The entire purpose of the condition number is to compare:
- **relative change in the output**
vs.
- **relative change in the input**
If a matrix $A$ has a large condition number, then:
A tiny relative error in the input:
$\frac{\|e\|}{\|b\|}$
can turn into a significant relative error in the output:
$\frac{\|\delta x\|}{\|x\|}$
This is precisely why ill-conditioned systems are dangerous -- the errors 'blow up.'
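This comparison can be checked numerically: the observed amplification factor (relative output error divided by relative input error) never exceeds $\kappa(A)$. The sketch below uses the matrix and data from the numeric example in the next section, with an assumed $0.01$ input error:

```python
import numpy as np

# Matrix and data from the numeric example below; e is an assumed 0.01 error.
# This checks the bound: rel_out <= kappa(A) * rel_in.
A = np.array([[100.0, 99.0],
              [99.0, 98.0]])
b = np.array([199.0, 197.0])
e = np.array([0.0, 0.01])

x = np.linalg.solve(A, b)
delta_x = np.linalg.solve(A, b + e) - x

rel_in = np.linalg.norm(e) / np.linalg.norm(b)
rel_out = np.linalg.norm(delta_x) / np.linalg.norm(x)

amplification = rel_out / rel_in   # observed blow-up factor
kappa = np.linalg.cond(A, 2)       # worst-case blow-up factor
print(amplification)   # about 2.8e4
print(kappa)           # about 3.9e4 -- the observed factor stays below it
```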
## Numeric Example of the Condition Number
Consider the $2 \times 2$ matrix:
$A = \begin{pmatrix}100 & 99 \\ 99 & 98\end{pmatrix}$
The right-hand side in this example will be:
$b =
\begin{pmatrix}
199 \\
197
\end{pmatrix}
\;\;\Rightarrow\;\;
x =
\begin{pmatrix}
1 \\
1
\end{pmatrix}$
Next, a small change in the data is introduced:
$b' =
\begin{pmatrix}
199 \\
197.01
\end{pmatrix}
\quad\text{(only a 0.01 change)}$
Solving the system means computing $x' = A^{-1}b'$, so even this tiny change in $b$ is amplified by the inverse of $A$. Since $A$ is nearly singular, $A^{-1}$ has very large entries, which is why the new solution shifts so dramatically.
Solving $Ax = b'$ yields:
$x' \approx
\begin{pmatrix}
2 \\
0
\end{pmatrix}$
A **0.01** change in the input produced a change of about **1** in the output. This is a prime example of what occurs in a system with a **large condition number.**
A system with a **large condition number** behaves as such:
1. Tiny measurement noise becomes a noticeable change in the output.
2. Numerical algorithms produce unstable or unreliable answers.
3. The output cannot be trusted unless the input is extremely precise.
4. Solutions fluctuate dramatically even when the data barely changes.
This is precisely why such systems are referred to as **ill-conditioned**; they are mathematically correct, but **numerically volatile.**
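The numeric example above can be reproduced in a few lines of NumPy, which also shows that the condition number flags the danger before any perturbation is tried:

```python
import numpy as np

# Reproducing the numeric example: a 0.01 change in b
# moves the solution from (1, 1) to roughly (1.99, 0).
A = np.array([[100.0, 99.0],
              [99.0, 98.0]])
b = np.array([199.0, 197.0])
b_prime = np.array([199.0, 197.01])

x = np.linalg.solve(A, b)             # approximately [1, 1]
x_prime = np.linalg.solve(A, b_prime)

print(x_prime)               # approximately [1.99, 0]
print(np.linalg.cond(A, 2))  # about 3.9e4: flags the instability in advance
```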
## Visual Illustration

- On the **left**, a small change in the input (the short red arrow labeled $\delta x$) barely moves the original input $x$.
- The matrix $A$ transforms both vectors.
- On the **right**, the tiny input change becomes a **much larger change** in the output (the long blue arrow between $Ax$ and $A(x + \delta x)$).
The concept of focus in this example is: **a small input change becomes a large output change.**
This amplification is exactly what a **high condition number** measures.
## Summary
You have seen the foundational building blocks of what a **condition number** is, through its base formula and how it relates to linear algebra. The key takeaway is simple: it measures **how sensitive the output is to small changes in the input.**
A system with a **low** condition number behaves predictably, i.e. small input errors cause small output changes.
A system with a **high** condition number is unstable, i.e. even tiny inaccuracies in the data can cause large deviations in the solution.
Simply stated, the **condition number** is a **warning signal.** Its purpose is not to indicate whether the system is mathematically solvable; rather, it indicates whether the solution can be trusted when your **data** is imperfect -- which, in real-world applications, it almost always is.
It is important to recall that the **condition number** only predicts sensitivity for small changes and depends on the chosen matrix norm, so it measures a **worst-case** scenario rather than how the system behaves in every situation.
High **condition numbers** show up in situations such as nearly parallel equations, strongly correlated measurements, or systems close to singular. If you are able to recognize these situations, it will help you understand when solutions may be **unstable** and when better data or different methods are necessary.
When analyzing data and systems, keep this essential concept at the forefront of your mind:
**How much does your system amplify the small errors you cannot avoid?**