---
tags: math, linear algebra
---

# Linear Algebra

## I. Norm

### a. Vector norm

A vector norm is a function mapping $C^m \to R$ that assigns a length to a vector. A norm must satisfy:

* $\| x \| \geq 0$, and $\|x\| = 0$ iff $x = 0$ (**positivity**)
* $\|x+y\| \leq \|x\| + \|y\|$ (**triangle inequality**)
* $\|\alpha x\| = |\alpha|\,\|x\|$ (**scalar scaling**)

Some special norms:

1. $\|x\|_1 = \sum_{i=1}^{m}|x_i|$ (**1-norm**)
2. $\|x\|_2 = \sqrt{\sum_{i=1}^{m}|x_i|^2}$ (**2-norm**)
3. $\|x\|_\infty = \max_{1 \leq i \leq m}|x_i|$ (**infinity norm**)
4. $\|x\|_{p} = \left(\sum_{i=1}^{m} |x_{i}|^p\right)^{1/p}$ with $1 \leq p < \infty$ (**p-norm**)

### b. Matrix norm

A matrix norm measures the "size" of a matrix. The induced matrix norm is defined as:

$$\|A\| = \sup_{x \neq 0} \dfrac{\|Ax\|}{\|x\|}$$

From this definition we get a useful inequality for the induced norm of a product $AB$:

$$\|AB\| \leq \|A\|\,\|B\|$$

*Proof:* $\|AB\| = \sup_{x \neq 0} \dfrac{\|ABx\|}{\|x\|} \leq \sup_{x \neq 0}\dfrac{\|A\|\,\|Bx\|}{\|x\|} \leq \sup_{x \neq 0}\dfrac{\|A\|\,\|B\|\,\|x\|}{\|x\|} = \|A\|\,\|B\|$

Some special norms:

1. **1-norm**

   $\|A\|_1 = \max_{1 \leq j \leq n}\sum_{i=1}^{m}|a_{ij}|$ (maximum absolute column sum)

   For example:
   $$
   A =
   \begin{bmatrix}
   -1 & -2 & 3 \\
   4 & 5 & -6 \\
   -7 & 8 & 9
   \end{bmatrix}
   $$
   $\|col_1\|_1 = 1+4+7 = 12$
   $\|col_2\|_1 = 2+5+8 = 15$
   $\|col_3\|_1 = 3+6+9 = 18$
   $\Rightarrow \|A\|_1 = 18$

2. **Infinity norm**

   $\|A\|_{\infty} = \max_{1 \leq i \leq m}\sum_{j=1}^{n}|a_{ij}|$ (maximum absolute row sum)

   For the same matrix $A$ as above:
   $\|row_1\|_1 = 1+2+3 = 6$
   $\|row_2\|_1 = 4+5+6 = 15$
   $\|row_3\|_1 = 7+8+9 = 24$
   $\Rightarrow \|A\|_{\infty} = 24$

3. **2-norm**

   $\|A\|_2 = \sqrt{\max(eigenvalue(A^TA))} = \max(singular\_values(A))$

   To find the 2-norm we need to work a little more. There are two ways: (1) find the eigenvalues of $A^TA$; (2) compute the SVD of $A$. Both approaches have essentially the same cost.

4. **Frobenius norm**

   $\|A\|_F = \sqrt{\sum_{i}\sum_{j}a_{ij}^2}$

   It is the vector 2-norm applied to the matrix viewed as one long vector of its entries.

5. **p-norm of a diagonal matrix**

   $\|D\|_p = \max_{1 \leq i \leq m}|d_i|$, where $d_i$ are the diagonal entries of $D$.

   For example:
   $$
   D =
   \begin{bmatrix}
   -1 & 0 & 0 \\
   0 & 5 & 0 \\
   0 & 0 & 9
   \end{bmatrix}
   $$
   $\Rightarrow \|D\|_p = 9$

6. **Invariance under unitary multiplication** (by a matrix $Q$)

   Note: a unitary matrix (over $C$) is the analogue of an orthogonal matrix (over $R$).
   * $\|QA\|_2 = \|A\|_2$
   * $\|QA\|_F = \|A\|_F$

## II. Singular Value Decomposition (SVD)

### 1. SVD

SVD is a matrix factorization that decomposes a matrix into 3 special matrices. Given a matrix $A \in R^{m \times n}$:

$$A = U \Sigma V^T$$

where $U, V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix containing the $n$ singular values $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_n \geq 0$.

* The columns of $U$ are called *left singular vectors*.
* The columns of $V$ are called *right singular vectors*.

### 2. Reduced SVD

Reduced form:

$$A = \hat U \hat \Sigma V^T$$

where $\hat U \in R^{m \times n}, \hat \Sigma \in R^{n \times n}, V \in R^{n \times n}$, illustrated below:

![](https://i.imgur.com/KnT9kqg.png)

### 3. Full SVD

Full form:

$$A = U \Sigma V^T$$

where $U \in R^{m \times m}, \Sigma \in R^{m \times n}, V \in R^{n \times n}$, illustrated below:

![](https://i.imgur.com/TfnZCvy.png)

### 4. Theorems

Let $A \in R^{m \times n}$, $p = \min(m, n)$, and let $r \leq p$ denote the number of nonzero singular values of $A$.

**a.** The rank of $A$ is $r$, the number of nonzero singular values.

**b.** $range(A) = <u_1, u_2, \dots, u_r>$ and $null(A) = <v_{r+1}, v_{r+2}, \dots, v_n>$

Both theorems are easily proved from the expansion $A = \sum_{i=1}^{r} \sigma_i u_i v_i^T$.

**c.** $\|A\|_2 = \sigma_1$ and $\|A\|_F = \|\Sigma\|_F = \sqrt{\sum_{i=1}^{r} \sigma_i^2}$

This follows easily from the invariance under unitary multiplication (Section I.b.6).

**d.** The nonzero singular values of $A$ are the square roots of the nonzero eigenvalues of $A^TA$ or $AA^T$ (these matrices have the same nonzero eigenvalues).

$A^TA = V (\Sigma^T \Sigma) V^T$, so $A^TA$ is similar to $\Sigma^T \Sigma$. Therefore they share the same eigenvalues, which are $\sigma_i^2$.

**e.** If $A = A^T$ (symmetric), then the singular values of $A$ are the absolute values of the eigenvalues of $A$.

**f.** For $A \in R^{m \times m}$, $|det(A)| = \prod_{i=1}^m \sigma_i$, meaning the absolute value of the determinant of $A$ is the product of all of the singular values of $A$.

### 5. Computation of SVD

We know that $A = U \Sigma V^T$.

* Step 1: From $A^TA = V (\Sigma^T \Sigma) V^T$, the singular values of $A$ are the square roots of the eigenvalues of $A^TA$, and the eigenvectors of $A^TA$ are the columns of $V$.
* Step 2: After step 1 we have the matrix $V$ and the singular values $\Sigma$, so computing $U$ is straightforward: from $AV = U \Sigma$ we get $u_i = \dfrac{1}{\sigma_i}Av_i$, where $u_i$ is the $i$-th column of $U$.
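As a quick numerical check, the following is a minimal NumPy sketch (assuming NumPy is available; it is an illustration, not a robust SVD routine) of the two-step recipe above, using the $3 \times 3$ example matrix from Section I.b. It also verifies theorem c and the hand-computed 1-norm and infinity norm:

```python
import numpy as np

# Example matrix from Section I.b
A = np.array([[-1., -2.,  3.],
              [ 4.,  5., -6.],
              [-7.,  8.,  9.]])

# Step 1: eigen-decomposition of A^T A gives V and the squared singular values
eigvals, V = np.linalg.eigh(A.T @ A)      # A^T A is symmetric, so eigh applies
order = np.argsort(eigvals)[::-1]         # sort in descending order
eigvals, V = eigvals[order], V[:, order]
sigma = np.sqrt(eigvals)                  # singular values sigma_1 >= sigma_2 >= ...

# Step 2: u_i = (1 / sigma_i) * A v_i  (divides each column of A V by sigma_i)
U = (A @ V) / sigma

# Check the factorization A = U Sigma V^T
print(np.allclose(A, U @ np.diag(sigma) @ V.T))                          # True

# Theorem c: ||A||_2 = sigma_1 and ||A||_F = sqrt(sum sigma_i^2)
print(np.isclose(np.linalg.norm(A, 2), sigma[0]))                        # True
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(sigma**2))))   # True

# Induced 1-norm and infinity norm from Section I.b
print(np.linalg.norm(A, 1), np.linalg.norm(A, np.inf))                   # 18.0 24.0
```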
## III. QR factorization - Orthogonalization

### 1. Projectors

A matrix $P$ is called an *orthogonal projector* iff:

$$P = P^T, \quad P^2 = P$$

If $P$ is an orthogonal projector, then $I-P$ is also an orthogonal projector. The matrix $I-P$ is called the *complementary projector* to $P$. For any projector $P$:

$$range(I-P) = null(P) \\ null(I-P) = range(P)$$

![](https://i.imgur.com/Kyy0cMa.png)

Properties of the rank-one projector illustrated above:

1. $rank(P) = 1$
2. $range(P)$ is a line through the origin (as shown above).

- Projection onto the column space of an arbitrary basis $A$: $P = A(A^TA)^{-1} A^T$
- Projection onto an orthonormal basis $Q$: $P = QQ^T$ (because $Q^TQ = I$)

### 2. QR factorization

Given a matrix $A \in R^{m \times n}$, we want a factorization $A = QR$, where $Q$ is an orthogonal matrix and $R$ is an upper triangular matrix (illustrated below).

![](https://i.imgur.com/F3JXQwY.png)

As with the SVD, QR factorization comes in two forms: a reduced form and a full form.

![](https://i.imgur.com/91maYY8.png)
![](https://i.imgur.com/IhMTyGS.png)

To obtain the full form of the QR factorization, we append to $Q$ additional columns that are orthogonal to all of the previous columns and normalized to unit length.

Ideas to compute the QR factorization:

#### a. Classical Gram-Schmidt orthogonalization

The idea is simple: at the first step, normalize the first column of the original matrix to unit length. At each following step, subtract from the current column its components along all previous vectors, then normalize, so the result is a unit vector orthogonal to all of the previous ones:

$$q_1 = \dfrac{a_1}{\|a_1\|}, \\ q_2 = a_2 - <a_2, q_1>q_1; \quad q_2 = \dfrac{q_2}{\|q_2\|}\\ q_3 = a_3 - <a_3, q_1>q_1 - <a_3, q_2>q_2; \quad q_3 = \dfrac{q_3}{\|q_3\|}\\ \dots$$

where $<u,v>$ is the dot product of $u$ and $v$.

There are many ways to find $R$. One simple way: from $A = QR$ we have $R = Q^TA$.

![](https://i.imgur.com/wcyVEO7.png)

- Solution of $Ax = b$ by QR factorization

From $Ax = b \Rightarrow QRx = b \Rightarrow Rx = Q^Tb$

The right-hand side is easy to compute, and the remaining system is easy to solve because $R$ is triangular (back substitution).

#### b. Modified Gram-Schmidt

It turns out that classical Gram-Schmidt is not numerically stable in practice. The modified version instead makes all of the forthcoming vectors orthogonal to the current vector as soon as it is computed, as in the comparison sketched below.

![](https://i.imgur.com/9XDQjPK.png)
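The following is a minimal NumPy sketch of both variants for the reduced QR factorization (the function names `classical_gs` and `modified_gs` and the Vandermonde test matrix are just illustrative choices, not from the text above). On an ill-conditioned matrix, the loss of orthogonality $\|Q^TQ - I\|$ of the classical variant is typically much larger than that of the modified variant:

```python
import numpy as np

def classical_gs(A):
    """Reduced QR by classical Gram-Schmidt: orthogonalize a_j against q_1..q_{j-1}."""
    m, n = A.shape
    Q, R = np.zeros((m, n)), np.zeros((n, n))
    for j in range(n):
        v = A[:, j].astype(float).copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]     # <a_j, q_i>
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

def modified_gs(A):
    """Reduced QR by modified Gram-Schmidt: once q_j is known, immediately
    remove its component from all remaining (forthcoming) columns."""
    m, n = A.shape
    V = A.astype(float).copy()
    Q, R = np.zeros((m, n)), np.zeros((n, n))
    for j in range(n):
        R[j, j] = np.linalg.norm(V[:, j])
        Q[:, j] = V[:, j] / R[j, j]
        for k in range(j + 1, n):
            R[j, k] = Q[:, j] @ V[:, k]
            V[:, k] -= R[j, k] * Q[:, j]
    return Q, R

# Compare the two variants on an ill-conditioned (nearly dependent columns) matrix
A = np.vander(np.linspace(0, 1, 80), 12, increasing=True)
for gs in (classical_gs, modified_gs):
    Q, R = gs(A)
    print(gs.__name__,
          "||A - QR|| =", np.linalg.norm(A - Q @ R),
          "||Q^T Q - I|| =", np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1])))
```

Both reproduce $A \approx QR$ to machine precision; the difference shows up in how close $Q^TQ$ stays to the identity.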
### 3. Householder algorithm

The idea of the Householder algorithm is to apply a succession of elementary orthogonal matrices $Q_k$ on the left of $A$ to produce the upper triangular matrix $R$:

$$Q_n Q_{n-1} \dots Q_2 Q_1 A = R$$

#### a. Triangularizing by introducing zeros

![](https://i.imgur.com/pWwSZiY.png)

In general, $Q_k$ operates on rows $k, \dots, m$. At step $k$, $Q_k$ introduces zeros below the diagonal entry of the $k$-th column while preserving the zeros already created in the previous columns. After $n$ steps, $A$ has been reduced to the upper triangular matrix $R$.

#### b. Constructing $Q_k$ (Householder reflector)

Each $Q_k$ is chosen to be an orthogonal matrix of the form:

$$
Q_k =
\begin{bmatrix}
I & 0 \\
0 & F
\end{bmatrix}
$$

where $I$ is the $(k-1) \times (k-1)$ identity and $F$ is a special orthogonal matrix (the Householder reflector) of size $(m-k+1) \times (m-k+1)$:

$$F = I - 2 \dfrac{vv^T}{v^Tv}$$

where $v = \|x\|e_1 - x$ and $x$ is the part of the $k$-th column on and below the diagonal.

![](https://i.imgur.com/Kmzyvqy.png)
![](https://i.imgur.com/4iJxqDj.png)

## IV. Eigenvalues, eigenvectors

### 1. Eigenvalues, eigenvectors

Let $A \in R^{m \times m}$. A nonzero vector $x \in R^{m}$ is an eigenvector of $A$, and $\lambda \in R$ is its corresponding eigenvalue, if:

$$Ax = \lambda x$$

The geometric meaning of this formula is that $A$ transforms the eigenvector into a new vector with the same direction as the original vector, only scaled by a different factor.

### 2. Eigenvalue Decomposition

An eigenvalue decomposition of a square matrix $A$ is a factorization:

$$A = X\Lambda X^{-1}$$

For $X^{-1}$ to exist, the columns of $X$ (the eigenvectors) must be linearly independent, i.e. $A$ must have $m$ linearly independent eigenvectors. Having $m$ distinct eigenvalues $\lambda$ is a sufficient (but not necessary) condition for this.

### 3. Algebraic multiplicity and Geometric multiplicity

#### a. Algebraic multiplicity

To find the eigenvalues, we find the roots of the *characteristic polynomial* $p(\lambda) = det(A - \lambda I)$; its roots are the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_m$, counted with multiplicity.

The algebraic multiplicity of an eigenvalue $\lambda$ is its multiplicity as a root of $p(\lambda)$, including duplicates.

#### b. Geometric multiplicity

The eigenvectors corresponding to a single eigenvalue $\lambda$, together with the zero vector, form a subspace of $R^m$ known as the **eigenspace $E_\lambda$**. We can see that $E_\lambda = null(A-\lambda I)$.

The geometric multiplicity of a specific $\lambda$ is the maximum number of linearly independent eigenvectors associated with it, or equivalently $GM = dim(null(A-\lambda I))$.
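To make these notions concrete, here is a small NumPy sketch (the matrices `A` and `B` are hypothetical examples chosen for illustration): a matrix with distinct eigenvalues is rebuilt as $X \Lambda X^{-1}$, and a matrix with a repeated eigenvalue shows algebraic multiplicity 2 but geometric multiplicity 1, so it has no eigenvalue decomposition.

```python
import numpy as np

# A diagonalizable matrix: 3 distinct eigenvalues, so X is invertible
A = np.array([[2., 0., 1.],
              [0., 3., 0.],
              [0., 0., 5.]])
lam, X = np.linalg.eig(A)                  # columns of X are eigenvectors
print(np.allclose(A, X @ np.diag(lam) @ np.linalg.inv(X)))   # True: A = X Lambda X^{-1}

# A defective matrix: eigenvalue 2 repeated
B = np.array([[2., 1.],
              [0., 2.]])
print(np.linalg.eigvals(B))                # [2. 2.] -> algebraic multiplicity 2

# Geometric multiplicity = dim(null(B - 2I)), computed here via the rank
gm = B.shape[0] - np.linalg.matrix_rank(B - 2 * np.eye(2))
print(gm)                                  # 1 -> too few independent eigenvectors, no EVD
```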