# Linear Equations
## 1. Linear System
The goal is to solve a system of linear equations, which can be expressed in the matrix form $A\mathbf{x} = \mathbf{b}$.
* **Representation**: The system is solved using the augmented matrix $[A \mid \mathbf{b}]$.
* **Solution Method (Gaussian Elimination)**: A matrix is transformed into *Row Echelon Form (REF)* or *Reduced Row Echelon Form (RREF)* using three *elementary row operations* (swapping rows, scaling a row by a non-zero constant, or adding a multiple of one row to another).
* **Solvability (Rank)**: The *rank* of a matrix is the number of non-zero rows in its row echelon form. Comparing the rank of the coefficient matrix $A$ and the augmented matrix $[A \mid \mathbf{b}]$ determines the number of solutions:
* *No Solution (Inconsistent)*: $\mathrm{rank}(A) < \mathrm{rank}([A \mid \mathbf{b}])$.
* *Unique Solution (Consistent)*: $\mathrm{rank}(A) = \mathrm{rank}([A \mid \mathbf{b}]) = n$ (where $n$ is the number of variables).
* *Infinite Solutions (Consistent)*: $\mathrm{rank}(A) = \mathrm{rank}([A \mid \mathbf{b}]) = r < n$ (the system has $n - r$ free variables).
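The rank test above is easy to check numerically. Below is a minimal NumPy sketch (the $3 \times 3$ system is arbitrary example data, not from the notes):
```python
import numpy as np

# Arbitrary 3x3 example system.
A = np.array([[ 2.0,  1.0, -1.0],
              [-3.0, -1.0,  2.0],
              [-2.0,  1.0,  2.0]])
b = np.array([8.0, -11.0, -3.0])

# Compare rank(A) with rank([A | b]) to classify the system.
rank_A  = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
n = A.shape[1]

if rank_A < rank_Ab:
    print("inconsistent: no solution")
elif rank_A == n:
    print("unique solution:", np.linalg.solve(A, b))
else:
    print(f"infinitely many solutions, {n - rank_A} free variable(s)")
```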
## 2. LU Decomposition
*LU Decomposition* factors a matrix $A$ into a product of two triangular matrices, $A = LU$, to efficiently solve $A\mathbf{x} = \mathbf{b}$.
* **Factorization Components**:
* $L$: *Unit Lower Triangular Matrix* (ones on the diagonal, zeros above).
* $U$: *Upper Triangular Matrix* (zeros below the diagonal).
* **Solving the System (Two Steps)**:
1. *Forward Substitution*: Solve $L\mathbf{y} = \mathbf{b}$ for the intermediate vector $\mathbf{y}$.
2. *Backward Substitution*: Solve $U\mathbf{x} = \mathbf{y}$ for the solution vector $\mathbf{x}$.
* **Condition for $A=LU$**: The decomposition $A=LU$ exists if the matrix $A$ can be reduced to row echelon form *without using row swaps*.
* **Computational Benefit**: Once $A$ is factored ($O(n^3)$ operation), solving for different $\mathbf{b}$ vectors requires only $O(n^2)$ operations for the two substitution steps, making it highly efficient for multiple solves.
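As a sketch of the factor-once, solve-many pattern (assuming SciPy is available; note that `lu_factor` uses partial pivoting, so in practice it computes $PA = LU$):
```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])

# Factor A once: the O(n^3) step (with partial pivoting, PA = LU).
lu, piv = lu_factor(A)

# Reuse the factorization for several right-hand sides: O(n^2) each,
# one forward and one backward substitution per solve.
for b in (np.array([10.0, 12.0]), np.array([1.0, 0.0])):
    x = lu_solve((lu, piv), b)
    print(b, "->", x)
```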
## 3. Error Analysis
Error analysis uses matrix norms to understand how errors in the input data affect the final solution.
* **Condition Number ($\mathrm{cond}(A)$)**: Defined for a nonsingular square matrix $A$ using an operator norm as $\mathrm{cond}(A) \equiv \| A \| \| A^{-1} \|$.
* It quantifies the ratio of the *maximum stretch* to the *minimum stretch* that $A$ applies to a unit vector, and always satisfies $\mathrm{cond}(A) \geq 1$.
* *Well-conditioned*: $\mathrm{cond}(A)$ is close to $1$.
* *Ill-conditioned*: $\mathrm{cond}(A)$ is large, meaning the system is highly sensitive to perturbations.
* **Relative Error Bound**: The condition number provides a bound on the relative error in the solution $\mathbf{x}$ caused by a small change $\Delta \mathbf{b}$ in the vector $\mathbf{b}$: $$\frac{\| \Delta \mathbf{x} \|}{\| \mathbf{x} \|} \leq \mathrm{cond}(A) \frac{\| \Delta \mathbf{b} \|}{\| \mathbf{b} \|}$$ The relative error in the solution $\mathbf{x}$ can be up to $\mathrm{cond}(A)$ times larger than the relative error in the input $\mathbf{b}$.
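The bound can be observed numerically. The sketch below is my own illustration: it uses a Hilbert matrix (a classic ill-conditioned example) and perturbs $\mathbf{b}$ slightly:
```python
import numpy as np
from scipy.linalg import hilbert

A = hilbert(8)                 # notoriously ill-conditioned
x_true = np.ones(8)
b = A @ x_true
print("cond(A) =", np.linalg.cond(A))      # 2-norm condition number

# Tiny perturbation of b, then re-solve.
rng = np.random.default_rng(0)
db = 1e-10 * rng.standard_normal(8)
x_pert = np.linalg.solve(A, b + db)

print("relative error in b:", np.linalg.norm(db) / np.linalg.norm(b))
print("relative error in x:", np.linalg.norm(x_pert - x_true) / np.linalg.norm(x_true))
# The second number can be up to cond(A) times the first.
```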
## 4. Interpolation
*Interpolation* involves finding a function $f(t)$ (the *interpolant*) that passes exactly through a given set of data points $(t_0, y_0), \dots, (t_d, y_d)$.
* **Polynomial Interpolation**: Finds the unique polynomial $p(t)$ of degree (at most) $d$ that passes through $d+1$ points with distinct nodes $t_0, \dots, t_d$.
* *Monomial Basis*: Using the basis $\{1, t, \dots, t^d\}$ leads to solving a linear system where the coefficient matrix $A$ is the *Vandermonde matrix*.
* *Vandermonde Instability*: The Vandermonde matrix is often *ill-conditioned* for a large number of points, leading to computational instability (large condition number).
* *Alternative Bases (Lagrange/Newton)*: These methods use different polynomial bases to find the same unique polynomial $p(t)$, but the Newton basis is advantageous because it makes the coefficient matrix lower-triangular, allowing for easier iterative updates when adding new data points.
* **Cubic Spline Interpolation**: A highly stable alternative that uses $N$ piecewise cubic polynomials over $N+1$ points.
* *Continuity Conditions*: The function $p(t)$, its first derivative $p'(t)$, and its second derivative $p''(t)$ are required to be continuous at the interior points where the pieces connect.
* *Natural Cubic Spline*: Achieved by setting the second derivatives at the endpoints to zero: $p''_1(t_0) = p''_N(t_N) = 0$.
* *Key Advantage*: The matrix for constructing a cubic spline has a significantly smaller condition number than the Vandermonde matrix, making cubic splines a much more stable and reliable method for practical applications, especially with many data points.
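To illustrate the stability gap, the following sketch (my own example data: 15 equally spaced nodes) compares the conditioning of the Vandermonde matrix with a natural cubic spline built by SciPy:
```python
import numpy as np
from scipy.interpolate import CubicSpline

t = np.linspace(0.0, 1.0, 15)
y = np.sin(2 * np.pi * t)

# Monomial basis: the Vandermonde matrix is badly conditioned.
V = np.vander(t, increasing=True)
print("cond(Vandermonde) =", np.linalg.cond(V))

# Natural cubic spline: p'' vanishes at both endpoints.
spline = CubicSpline(t, y, bc_type="natural")
print("spline(0.3) =", spline(0.3), " true value =", np.sin(2 * np.pi * 0.3))
```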
# Orthogonality
## 5. Subspaces
A *subspace* $U \subseteq \mathbb{R}^n$ is a non-empty subset that is *closed under addition* and *closed under scalar multiplication*; in particular, it must contain the *zero vector* $\mathbf{0}$.
* **Linear Independence and Span**: The *span* of a set of vectors is the smallest subspace containing them. A set of vectors is *linearly independent* if the only way to form the zero vector is by setting all scalar coefficients to zero.
* **Basis and Dimension**: A set of vectors forms a *basis* for a subspace $U$ if it is linearly independent and spans $U$. The *dimension* $\dim U$ is the number of vectors in any basis.
* **Fundamental Subspaces**: For an $m \times n$ matrix $A$, the primary subspaces are the *Nullspace* $\mathcal{N}(A) = \{\mathbf{x} : A\mathbf{x} = \mathbf{0}\}$ (a subspace of $\mathbb{R}^n$) and the *Range* (or *Column Space*) $\mathcal{R}(A) = \{A\mathbf{x} : \mathbf{x} \in \mathbb{R}^n\}$ (a subspace of $\mathbb{R}^m$).
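A quick numerical sketch (using SciPy helpers on an arbitrary rank-1 example) extracts orthonormal bases for these two subspaces and confirms that rank plus nullity equals $n$:
```python
import numpy as np
from scipy.linalg import null_space, orth

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])      # m = 2, n = 3, rank 1

R = orth(A)          # orthonormal basis for the range R(A), a subspace of R^m
N = null_space(A)    # orthonormal basis for the nullspace N(A), a subspace of R^n

print("rank(A)  =", R.shape[1])
print("dim N(A) =", N.shape[1])      # rank + nullity = n
print("A @ N ~ 0:", np.allclose(A @ N, 0))
```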
## 6. Orthogonal Complement
The relationship between subspaces is formalized through the inner product and orthogonal complements.
* **Inner Product (Dot Product)**: For $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, the inner product is $\langle \mathbf{x} , \mathbf{y} \rangle = \mathbf{x}^T \mathbf{y}$.
* **Orthogonal Vectors**: Two vectors are *orthogonal* if their inner product is zero: $\langle \mathbf{x}, \mathbf{y} \rangle = 0$.
* **Orthogonal Complement ($U^\perp$)**: The orthogonal complement of a subspace $U \subseteq \mathbb{R}^n$ is the subspace of all vectors in $\mathbb{R}^n$ that are orthogonal to every vector in $U$.
* **Orthogonal Decomposition**: The entire space $\mathbb{R}^n$ is the direct sum of $U$ and its orthogonal complement: $\mathbb{R}^n = U \oplus U^\perp$. Their dimensions sum up to $n$: $\dim U + \dim U^\perp = n$.
### Fundamental Subspaces Theorem
This theorem defines the orthogonal relationships between the four fundamental subspaces of a matrix $A$:
1. *Nullspace* is the orthogonal complement of the *Row Space* $\mathcal{R}(A^T)$: $\mathcal{N}(A) = \mathcal{R}(A^T)^{\perp}$.
2. *Left Nullspace* $\mathcal{N}(A^T)$ is the orthogonal complement of the *Column Space* $\mathcal{R}(A)$: $\mathcal{N}(A^T) = \mathcal{R}(A)^{\perp}$.
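Both relations can be checked directly; a small sketch on an arbitrary rank-2 example:
```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 3.0]])   # third row = first + second, so rank 2

N_A  = null_space(A)      # basis for N(A)
N_At = null_space(A.T)    # basis for the left nullspace N(A^T)

# N(A) is orthogonal to every row of A, i.e. to the row space R(A^T) ...
print(np.allclose(A @ N_A, 0))
# ... and N(A^T) is orthogonal to every column of A, i.e. to R(A).
print(np.allclose(A.T @ N_At, 0))
```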
## 7. Orthogonal Projection
*Orthogonal Projection* finds the point in a subspace $U$ that is nearest to a given vector $\mathbf{x}$.
* **Projection onto a Vector ($\mathbf{u}$)**: The projection $\mathrm{proj}_{\mathbf{u}}(\mathbf{x})$ is: $$\mathrm{proj}_{\mathbf{u}}(\mathbf{x}) = \frac{\langle \mathbf{x} , \mathbf{u} \rangle}{\langle \mathbf{u} , \mathbf{u} \rangle} \mathbf{u}$$ The projection matrix is $P = \frac{1}{\| \mathbf{u} \|^2} \mathbf{u} \mathbf{u}^T$.
* **Projection onto a Subspace ($U$)**: If $U$ has an *orthogonal basis* $\{\mathbf{u}_1, \dots, \mathbf{u}_m\}$, the projection is the sum of projections onto each basis vector: $$\mathrm{proj}_U(\mathbf{x}) = \sum_{k=1}^m \frac{\langle \mathbf{x} , \mathbf{u}_k \rangle}{ \langle \mathbf{u}_k , \mathbf{u}_k \rangle } \mathbf{u}_k$$
* **Best Approximation Property**: The projection $\mathrm{proj}_U(\mathbf{x})$ is the *unique vector* in $U$ that minimizes the distance (residual norm) to $\mathbf{x}$. $$\|\mathbf{x}- \mathrm{proj}_U(\mathbf{x})\| \le \|\mathbf{x}-\mathbf{y}\| \quad \forall \mathbf{y} \in U$$ The *residual vector* $\mathbf{x} - \mathrm{proj}_U(\mathbf{x})$ is always orthogonal to $U$; it lies in $U^\perp$.
* **Gram-Schmidt Process**: This is an algorithm used to convert a set of linearly independent basis vectors $\{\mathbf{u}_1, \dots, \mathbf{u}_m\}$ into an *Orthogonal Basis* $\{\mathbf{v}_1, \dots, \mathbf{v}_m\}$ for the subspace $U$. Normalizing the orthogonal vectors yields an *Orthonormal Basis (ONB)* $\{\mathbf{w}_1, \dots, \mathbf{w}_m\}$.
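A minimal classical Gram-Schmidt sketch in NumPy, written for readability rather than numerical robustness (in practice modified Gram-Schmidt or Householder QR is preferred); the function name and example matrix are mine:
```python
import numpy as np

def gram_schmidt(U):
    """Return an orthonormal basis for the column space of U
    (columns of U assumed linearly independent)."""
    W = []
    for u in U.T:
        # Subtract the projection of u onto each vector found so far.
        v = u - sum((u @ w) * w for w in W)
        W.append(v / np.linalg.norm(v))
    return np.column_stack(W)

U = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
W = gram_schmidt(U)
print(np.round(W.T @ W, 10))   # identity matrix: columns are orthonormal
```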
## 8. QR Decomposition
The *QR Decomposition* factors a matrix $A$ into the product $A = QR$, where $Q$ is an *orthogonal matrix* and $R$ is an *upper triangular matrix*.
* **Orthogonal Matrix ($Q$)**: A square matrix where $Q^T Q = Q Q^T = I$. Its columns (and rows) form an *ONB*. Orthogonal transformations preserve vector length (norm).
* **Decomposition by Gram-Schmidt**: Applying the Gram-Schmidt process to the columns of $A$ yields $A = Q_1 R_1$, where $Q_1$ has *orthonormal columns*.
* **Subspace Bases from QR**: For an $m \times n$ matrix $A$ with $\mathrm{rank}(A)=n$:
* The $n$ columns of $Q_1$ form an *ONB for the Range* $\mathcal{R}(A)$.
* The remaining $m-n$ columns of $Q_2$ (if $Q=[Q_1 \ Q_2]$) form an *ONB for the Orthogonal Complement* $\mathcal{R}(A)^{\perp}$.
* **Projection using QR**: The orthogonal projection matrix onto the range $\mathcal{R}(A)$ is simply $P = Q_1 Q_1^T$.
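A sketch of these facts with NumPy's full QR on an arbitrary $3 \times 2$ matrix of rank 2:
```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])                 # m = 3, n = 2, rank(A) = 2

Q, R = np.linalg.qr(A, mode="complete")    # Q is 3x3 orthogonal
Q1, Q2 = Q[:, :2], Q[:, 2:]                # ONB for R(A) and for R(A)^perp

P = Q1 @ Q1.T                              # orthogonal projector onto R(A)
x = np.array([1.0, 2.0, 3.0])
print("proj of x onto R(A):", P @ x)
print("residual lies in R(A)^perp:", np.allclose(Q1.T @ (x - P @ x), 0))
```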
## 9. Least Squares Approximation
Least Squares finds the best approximate solution $\mathbf{x}_{\text{LS}}$ to an *inconsistent system* $A\mathbf{x} \approx \mathbf{b}$, by minimizing the length of the residual vector, $\| A\mathbf{x} - \mathbf{b}\|$.
* The vector $A\mathbf{x}_{\text{LS}}$ is the *orthogonal projection* of $\mathbf{b}$ onto the column space $\mathcal{R}(A)$.
* **Normal Equations**: The least squares approximation $\mathbf{x}_{\text{LS}}$ is the solution to the *Normal Equations*: $$A^T A \mathbf{x} = A^T \mathbf{b}$$ A *unique solution* exists if and only if $A$ has linearly independent columns, i.e., $\mathrm{rank}(A) = n$.
* **QR Equations (Numerically Stable Method)**: If $A=Q_1 R_1$ is the thin QR decomposition, the least squares approximation $\mathbf{x}_{\text{LS}}$ is the solution to the upper triangular system: $$R_1\mathbf{x} = Q_1^T \mathbf{b}$$ This method is often preferred for its *numerical stability*.
* **Data Fitting**: Least squares is used for *linear data fitting* problems (e.g., polynomial regression): minimizing the *Sum of Squared Errors (SSE)* $\Vert \mathbf{y} - A \mathbf{c} \Vert^2$ over the coefficient vector $\mathbf{c}$ is exactly a least squares problem for $\mathbf{c}$.
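A short sketch (with made-up data) fitting a straight line $y \approx c_0 + c_1 t$ three equivalent ways: normal equations, thin QR, and NumPy's built-in solver:
```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
A = np.column_stack([np.ones_like(t), t])   # design matrix for c0 + c1 * t

# 1) Normal equations: A^T A c = A^T y.
c_normal = np.linalg.solve(A.T @ A, A.T @ y)

# 2) Thin QR: solve the upper triangular system R1 c = Q1^T y.
Q1, R1 = np.linalg.qr(A)                    # reduced QR by default
c_qr = np.linalg.solve(R1, Q1.T @ y)

# 3) Library least squares routine.
c_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(c_normal, c_qr, c_lstsq)              # all three agree
```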
# Eigenvalues
## 10. Diagonalization
Diagonalization is the process of decomposing a square matrix $A$ into a form that reveals its intrinsic scaling factors and directions.
* **Eigenvalues and Eigenvectors**: An *eigenvalue* $\lambda$ is a scalar that satisfies the equation $A\mathbf{v} = \lambda \mathbf{v}$ for a non-zero *eigenvector* $\mathbf{v}$. The *eigenspace* for $\lambda$ is $E_{\lambda} = \mathcal{N}(A - \lambda I)$.
* **Characteristic Polynomial**: The eigenvalues are the *roots* of the characteristic polynomial $C_A(x) = \det(A - xI)$.
* The *algebraic multiplicity* $m_i$ is the exponent of the factor $(x - \lambda_i)$ in $C_A(x)$.
* The *geometric multiplicity* $d_i$ is the dimension of the eigenspace, $d_i = \dim E_{\lambda_i}$.
* **Diagonalization Theorem**: An $n \times n$ matrix $A$ is *diagonalizable* if and only if it has $n$ linearly independent eigenvectors, which occurs if and only if the algebraic multiplicity equals the geometric multiplicity for every eigenvalue ($m_i = d_i$).
* The decomposition is $A = PDP^{-1}$, where $D$ is the diagonal matrix of eigenvalues and $P$ is the invertible matrix of corresponding eigenvectors.
* **Spectral Theorem (Symmetric Matrices)**: If $A$ is a *symmetric matrix* ($A=A^T$), it is always *orthogonally diagonalizable* ($A = PDP^T$, where $P$ is orthogonal, $P^T=P^{-1}$). Crucially, all eigenvalues are *real* and eigenvectors corresponding to distinct eigenvalues are *orthogonal*.
* **Applications**: Diagonalization simplifies powers of $A$ ($A^k = P D^k P^{-1}$), and links eigenvalues to standard matrix properties: $\det(A) = \prod \lambda_i$ and $\operatorname{tr}(A) = \sum \lambda_i$.
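A numerical sketch (arbitrary symmetric $2 \times 2$ example) verifying the decomposition and the determinant/trace identities:
```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])       # symmetric, hence orthogonally diagonalizable

lam, P = np.linalg.eigh(A)       # real eigenvalues, orthonormal eigenvectors
D = np.diag(lam)

print(np.allclose(A, P @ D @ P.T))                                  # A = P D P^T
print(np.allclose(np.linalg.matrix_power(A, 5), P @ D**5 @ P.T))    # A^k = P D^k P^T
print(np.isclose(np.linalg.det(A), lam.prod()))                     # det = product of eigenvalues
print(np.isclose(np.trace(A), lam.sum()))                           # trace = sum of eigenvalues
```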
## 11. Singular Value Decomposition (SVD)
The SVD generalizes the concept of diagonalization to any $m \times n$ matrix $A$, providing a complete structural decomposition.
* **SVD Definition**: Any $m \times n$ matrix $A$ can be factored as $A = P \Sigma Q^T$.
* $P$ ($m \times m$) and $Q$ ($n \times n$) are *orthogonal matrices*.
* $\Sigma$ ($m \times n$) is a diagonal matrix containing the *singular values* $\sigma_i$.
* **Singular Values ($\sigma_i$)**: The singular values are the square roots of the positive eigenvalues ($\lambda_i$) of the symmetric matrices $A^T A$ and $A A^T$: $\sigma_i = \sqrt{\lambda_i}$. They are ordered such that $\sigma_1 \geq \sigma_2 \geq \cdots > 0$.
* **Fundamental Subspaces**: The SVD provides orthonormal bases for all four fundamental subspaces of $A$:
* The rank of $A$ is the number of positive singular values: $\mathrm{rank}(A) = r$.
* The first $r$ columns of $Q$ span the *Row Space* $\mathcal{R}(A^T)$.
* The last $n-r$ columns of $Q$ span the *Nullspace* $\mathcal{N}(A)$.
* The first $r$ columns of $P$ span the *Column Space* $\mathcal{R}(A)$.
* The last $m-r$ columns of $P$ span the *Left Nullspace* $\mathcal{N}(A^T)$.
* **Norm and Condition Number**: The $\ell_2$ matrix norm is the largest singular value: $\| A \| = \sigma_1$. If $A$ is invertible, the condition number is $\mathrm{cond}(A) = \sigma_1 / \sigma_n$.
* **SVD Expansion and Low-Rank Approximation**: The SVD can be written as a sum of rank-one matrices: $$A = \sum_{i=1}^r \sigma_i \mathbf{p}_i \mathbf{q}_i^T$$ The *truncated SVD* $A_k = \sum_{i=1}^k \sigma_i \mathbf{p}_i \mathbf{q}_i^T$ is the *best rank-$k$ approximation* of $A$.
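A short sketch of the expansion and the truncated SVD (random example matrix; variable names follow the notation above, with `Qt` holding $Q^T$):
```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

P, s, Qt = np.linalg.svd(A)            # A = P @ Sigma @ Q^T

print("||A||_2 equals sigma_1:", np.isclose(np.linalg.norm(A, 2), s[0]))

# Best rank-k approximation: keep the k largest rank-one terms.
k = 2
A_k = sum(s[i] * np.outer(P[:, i], Qt[i, :]) for i in range(k))
print("rank(A_k) =", np.linalg.matrix_rank(A_k))
# Approximation error in the 2-norm is the first discarded singular value.
print("||A - A_k||_2 equals sigma_{k+1}:",
      np.isclose(np.linalg.norm(A - A_k, 2), s[k]))
```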
## 12. Computing Eigenvalues
While finding eigenvalues by solving $\det(A - \lambda I) = 0$ is the theoretical basis, it is *impractical* for large matrices. Numerical methods are used instead.
* **Power Method**: An iterative algorithm used to find the *dominant eigenvalue* $\lambda_1$ (the one with the largest magnitude) and its corresponding eigenvector $\mathbf{v}_1$.
* The normalized iteration is $\mathbf{x}_{k+1} := \frac{A \mathbf{x}_k}{\|A \mathbf{x}_k\|}$.
* As $k \to \infty$, $\mathbf{x}_k$ converges to $\mathbf{v}_1$ and the *Rayleigh Quotient* $\lambda_k = \frac{\mathbf{x}_k^T A \mathbf{x}_k}{ \mathbf{x}_k^T \mathbf{x}_k }$ converges to $\lambda_1$.
* **Inverse Iteration**: A method to find the eigenvalue $\lambda_j$ that is *closest to a desired shift value $s$*.
* It applies the power method to the matrix $(A - sI)^{-1}$.
* Instead of computing the inverse, each step requires solving the system $(A - sI)\mathbf{y}_{k+1} = \mathbf{x}_k$ and normalizing.
* By setting the shift $s = 0$, the method finds the eigenvalue smallest in magnitude.
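A compact sketch of both iterations, following the formulas above (my own implementation; the symmetric test matrix, iteration counts, and random starting vectors are arbitrary choices):
```python
import numpy as np

def power_method(A, iters=200):
    """Dominant eigenvalue/eigenvector via the normalized power iteration."""
    x = np.random.default_rng(0).standard_normal(A.shape[0])
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x @ A @ x / (x @ x), x          # Rayleigh quotient, eigenvector

def inverse_iteration(A, s, iters=200):
    """Eigenvalue of A closest to the shift s (power method on (A - sI)^{-1})."""
    n = A.shape[0]
    x = np.random.default_rng(1).standard_normal(n)
    for _ in range(iters):
        x = np.linalg.solve(A - s * np.eye(n), x)   # solve, never invert
        x /= np.linalg.norm(x)
    return x @ A @ x / (x @ x), x

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print("dominant eigenvalue:  ", power_method(A)[0])
print("eigenvalue nearest 0: ", inverse_iteration(A, 0.0)[0])  # smallest magnitude
print("numpy check:          ", np.linalg.eigvalsh(A))
```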
# Discrete Fourier Transform (DFT)
## 13. Complex Vectors
The DFT is built upon the mathematics of complex numbers and complex vector spaces.
* **Complex Numbers ($z$)**: A complex number is written $z = a + ib$.
* *Polar Form*: $z = r e^{i \theta}$, where $r = |z|$ (modulus) and $\theta = \arg z$ (argument/phase).
* *Euler's Formula*: $e^{i\theta} = \cos\theta + i\sin\theta$.
* *Conjugate*: The complex conjugate of $z = a + ib$ is $\overline{z} = a - ib$.
* **Complex Vector Space ($\mathbb{C}^n$)**: The space of $n$-dimensional vectors with complex entries.
* **Standard Inner Product**: The inner product is defined using the conjugate transpose to ensure the norm is real and positive: $$\langle \mathbf{u} , \mathbf{v} \rangle = \mathbf{u}^T \overline{ \mathbf{v} } = u_1 \overline{v}_1 + \cdots + u_n \overline{v}_n$$
* **Conjugate Transpose ($A^\ast$)**: The adjoint of a complex matrix $A$ is $A^\ast = \overline{A}^T$. The inner product for matrix multiplication is $\langle A \mathbf{u} , \mathbf{v} \rangle = \langle \mathbf{u} , A^\ast \mathbf{v} \rangle$.
* **Hermitian/Unitary Matrices**:
* **Hermitian**: $A = A^\ast$ (analogous to real symmetric). All eigenvalues are real.
* **Unitary**: $A A^\ast = I$ ($A^\ast = A^{-1}$) (analogous to real orthogonal). Preserves vector norms: $\|A\mathbf{v}\| = \|\mathbf{v}\|$.
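A brief NumPy sketch of these definitions (example vectors and matrices are mine; note that the notes' inner product conjugates the second argument, while NumPy's `np.vdot` conjugates the first):
```python
import numpy as np

u = np.array([1 + 2j, 3 - 1j])
v = np.array([2 - 1j, 1j])

# Inner product with the convention <u, v> = u^T conj(v) used here.
print("<u, v> =", np.sum(u * np.conj(v)))
print("<u, u> =", np.sum(u * np.conj(u)))      # real and positive

# Hermitian example: A = A^*, so its eigenvalues are real.
A = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])
print("Hermitian:", np.allclose(A, A.conj().T))
print("eigenvalues:", np.linalg.eigvalsh(A))   # real numbers

# Unitary example: U U^* = I, so norms are preserved.
U = np.array([[1, 1j],
              [1j, 1]]) / np.sqrt(2)
print("unitary:", np.allclose(U @ U.conj().T, np.eye(2)))
print("norm preserved:", np.isclose(np.linalg.norm(U @ u), np.linalg.norm(u)))
```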
## 14. Discrete Fourier Transform
The DFT is a linear transformation that rotates a signal vector $\mathbf{x}$ into the *Fourier basis*.
* **Roots of Unity**: The basis is constructed using the complex *$N$th roots of unity*. The *primitive $N$th root of unity* is $\omega_N = e^{2 \pi i / N}$. The set of all $N$th roots is $\{1, \omega_N, \dots, \omega_N^{N-1}\}$.
* **Fourier Basis ($\mathbf{f}_k$)**: The $k$th Fourier basis vector in $\mathbb{C}^N$ is: $$\mathbf{f}_k = \begin{bmatrix} 1 \\ \omega_N^k \\ \omega_N^{2k} \\ \vdots \\ \omega_N^{(N-1)k} \end{bmatrix}$$
* **Orthogonality**: The Fourier basis is an *orthogonal basis* of $\mathbb{C}^N$: $\langle \mathbf{f}_k , \mathbf{f}_{\ell} \rangle = N \delta_{k\ell}$ (where $\delta_{k\ell}$ is the Kronecker delta).
* **Fourier Matrix ($F_N$)**: The DFT operation uses the Fourier matrix $F_N$. The entry at row $k$ and column $n$ is $F_N[k, n] = \overline{\omega}_N^{kn}$, where $\overline{\omega}_N = \omega_N^{-1}$.
* **DFT Definition**: The DFT of a signal $\mathbf{x} \in \mathbb{C}^N$ is $\mathrm{DFT}(\mathbf{x}) = F_N \mathbf{x}$. The $k$th component of the DFT, $\mathrm{DFT}(\mathbf{x})[k]$, is the inner product $\langle \mathbf{x} , \mathbf{f}_k \rangle$ (i.e., the coefficient scaled by $N$).
* **Properties of Real Signals**: If $\mathbf{x}$ is a real signal, its DFT $\mathbf{y}$ is conjugate symmetric: $\overline{\mathbf{y}[k]} = \mathbf{y}[N-k]$.
* **Inverse DFT (IDFT)**: The transformation is invertible. The IDFT returns the original signal: $$\mathrm{IDFT}(\mathbf{y}) = \frac{1}{N} F_N^\ast \mathbf{y}$$
* **Signal Synthesis (IDFT)**: The IDFT allows any signal $\mathbf{x}$ to be perfectly reconstructed as a sum of sinusoids:$$\mathbf{x} = \sum_{k=0}^{N-1} \left( \frac{\mathrm{DFT}(\mathbf{x})[k]}{N} \right) \mathbf{f}_k$$ This demonstrates that any signal is a linear combination of its frequency components.
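These definitions match NumPy's FFT convention; the sketch below builds $F_N$ explicitly, compares it with `np.fft.fft`, and checks the inversion and conjugate-symmetry properties (signal and size are arbitrary):
```python
import numpy as np

N = 8
n = np.arange(N)
omega = np.exp(2j * np.pi / N)           # primitive Nth root of unity
F = np.conj(omega) ** np.outer(n, n)     # Fourier matrix, F[k, n] = conj(omega)^(k n)

rng = np.random.default_rng(0)
x = rng.standard_normal(N)               # a real test signal

y = F @ x                                # DFT by matrix multiplication
print("matches np.fft.fft:", np.allclose(y, np.fft.fft(x)))

# Inverse DFT: x = (1/N) F^* y.
x_rec = (F.conj().T @ y) / N
print("perfect reconstruction:", np.allclose(x_rec, x))

# Conjugate symmetry for a real signal: y[k] = conj(y[N - k]).
print("conjugate symmetric:", np.allclose(y[1:], np.conj(y[1:][::-1])))
```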
## 15. Frequency, Amplitude, and Phase
The DFT coefficients directly give the *amplitude* and *phase* of the sinusoidal components that make up the signal.
* **Sinusoids**: A discrete-time sinusoid signal $\mathbf{x}$ has the form $\mathbf{x} = A \cos(2\pi k \mathbf{t} + \phi)$, where $\mathbf{t}$ is the vector of sample times $t_n = n/N$, $n = 0, \dots, N-1$.
* $A$ is the *amplitude*.
* $k$ is the *frequency* (cycles per $N$ samples).
* $\phi$ is the *phase*.
* **DFT of a Single Sinusoid**: The DFT $\mathbf{y}$ of $\mathbf{x} = A \cos(2 \pi k \mathbf{t} + \phi)$ is non-zero only at two indices, $k$ and $N-k$: $$\mathrm{DFT}(\mathbf{x})[k] = \frac{AN}{2} e^{i \phi} \hspace{10mm} \mathrm{DFT}(\mathbf{x})[N-k] = \frac{AN}{2} e^{-i \phi}$$
* The magnitude $|\mathbf{y}[k]| = AN/2$ is proportional to the amplitude $A$.
* The phase $\angle \mathbf{y}[k] = \phi$ gives the phase shift.
* **Spectra (Stemplots)**: The frequency content is visualized by plotting the magnitude and phase of $\mathbf{y}$:
* *Magnitude Stemplot*: Plot of $| \mathbf{y}[n] |$ versus frequency index $n$. Shows the amplitude of each frequency component.
* *Angle Stemplot*: Plot of $\angle \mathbf{y}[n]$ versus $n$. Shows the phase of each component.
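A final sketch checking the amplitude and phase relations for a single sinusoid (`N`, `k`, `A`, and `phi` are arbitrary choices; sample times are $t_n = n/N$):
```python
import numpy as np

N, k, A, phi = 64, 5, 2.0, np.pi / 3
t = np.arange(N) / N                       # sample times t_n = n / N
x = A * np.cos(2 * np.pi * k * t + phi)    # single real sinusoid

y = np.fft.fft(x)                          # DFT of the signal

# Only indices k and N - k are (numerically) non-zero.
print("|y[k]|      =", abs(y[k]), "   expected A*N/2 =", A * N / 2)
print("angle(y[k]) =", np.angle(y[k]), "   expected phi =", phi)
print("y[N-k] is the conjugate of y[k]:", np.isclose(y[N - k], np.conj(y[k])))

# All other entries are ~0 (up to roundoff).
mask = np.ones(N, dtype=bool)
mask[[k, N - k]] = False
print("largest other magnitude:", np.abs(y[mask]).max())
```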