奇異值分解 - HackMD

# 奇異值分解 ![Creative Commons License](https://i.creativecommons.org/l/by/4.0/88x31.png) This work by Jephian Lin is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/). $\newcommand{\trans}{^\top} \newcommand{\adj}{^{\rm adj}} \newcommand{\cof}{^{\rm cof}} \newcommand{\inp}[2]{\left\langle#1,#2\right\rangle} \newcommand{\dunion}{\mathbin{\dot\cup}} \newcommand{\bzero}{\mathbf{0}} \newcommand{\bone}{\mathbf{1}} \newcommand{\ba}{\mathbf{a}} \newcommand{\bb}{\mathbf{b}} \newcommand{\bc}{\mathbf{c}} \newcommand{\bd}{\mathbf{d}} \newcommand{\be}{\mathbf{e}} \newcommand{\bh}{\mathbf{h}} \newcommand{\bp}{\mathbf{p}} \newcommand{\bq}{\mathbf{q}} \newcommand{\br}{\mathbf{r}} \newcommand{\bx}{\mathbf{x}} \newcommand{\by}{\mathbf{y}} \newcommand{\bz}{\mathbf{z}} \newcommand{\bu}{\mathbf{u}} \newcommand{\bv}{\mathbf{v}} \newcommand{\bw}{\mathbf{w}} \newcommand{\tr}{\operatorname{tr}} \newcommand{\nul}{\operatorname{null}} \newcommand{\rank}{\operatorname{rank}} %\newcommand{\ker}{\operatorname{ker}} \newcommand{\range}{\operatorname{range}} \newcommand{\Col}{\operatorname{Col}} \newcommand{\Row}{\operatorname{Row}} \newcommand{\spec}{\operatorname{spec}} \newcommand{\vspan}{\operatorname{span}} \newcommand{\Vol}{\operatorname{Vol}} \newcommand{\sgn}{\operatorname{sgn}} \newcommand{\idmap}{\operatorname{id}} \newcommand{\am}{\operatorname{am}} \newcommand{\gm}{\operatorname{gm}}$ ```python from lingeo import random_int_list ``` ## Main idea Continuing the introduction of the singular value decomposition in 314, this section will provide the theoretical foundation of it. ##### Singular value decomposition Let $A$ be an $m\times n$ matrix. Then there are orthonormal bases $\alpha$ and $\beta$ of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively, such that $$ [f_A]_\alpha^\beta = \Sigma = \begin{bmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_r) & O_{r,n-r} \\ O_{m-r,r} & O_{m-r,n-r} \end{bmatrix}, $$ where $\sigma_1\geq\cdots\geq\sigma_r$ and $\operatorname{diag}(\sigma_1,\ldots,\sigma_r)$ is the diagonal matrix with the given diagonal entries. That is, there are $m\times m$ and $n\times n$ orthogonal matrices $U$ and $V$ such that $U^\top AV = \Sigma$ and $A = U\Sigma V^\top$. Recall that $AB$ and $BA$ have the same set of nonzero eigenvalues. (See 506-6.) The values $\sigma_1 \geq \cdots \geq \sigma_r$ are called the **singular values** of $A$. - They are the (positive) square roots of the nonzero eigenvalues of $A\trans A$. - They are the (positive) square roots of the nonzero eigenvalues of $AA\trans$. - They are positive. - There are $r = \rank(A)$ of them. Indeed, the columns of $V$ form an orthonormal eigenbasis of $A\trans A$, while the columns of $U$ form an orthonormal eigenbasis of $AA\trans$. The singular value decomposition of an $m\times n$ matirx can be found by the following steps: 1. Compute an orthonormal eigenbasis $\alpha$ of $A\trans A$. 2. Order the eigenvectors in $\alpha$ by the corresponding eigenvalues in the non-increasing order. Let $\alpha_1$ and $\alpha_2$ be the sets of eigenvectors in $\alpha$ that correspond to positive and zero eigenvalues, respectively. Let $\lambda_1 \geq \cdots \geq \lambda_r$ be the positive eigenvalues and $\sigma_i = \sqrt{\lambda_i}$ for $i = 1,\ldots, r$. 3. Let $\beta_1 = \{\frac{1}{\sigma_i}A\bv: \bv \in \alpha_1\}$. Let $\beta_0$ be an orthonormal basis of $\ker(AA\trans)$. Let $\beta = \beta_1 \cup \beta_0$. Thus, the desired eigenbasis are found. For the construction of the matrices. - Construct $V$ by using $\alpha$ as the columns vectors. - Construct $U$ by using $\beta$ as the columns vectors. - The singular values are the (positive) square roots of nonzero eigenvalues of $A\trans A$ (or $AA\trans$). ## Side stories - $AB$ and $BA$ have the same nonzero eigenvalues - image compression - Moore–Penrose pseudo inverse (TBD) ## Experiments ##### Exercise 1 執行以下程式碼。令 $\alpha = \{\bv_1, \ldots, \bv_n\}$ 為 $\mathbb{R}^n$ 中的垂直標準基、 $\beta = \{\bu_1, \ldots, \bu_m\}$ 為 $\mathbb{R}^m$ 中的垂直標準基。已知一線性函數 $f = \mathbb{R}^n \rightarrow \mathbb{R}^m$ 具有 $[f]_\alpha^\beta = \Sigma$ 的性質。 ```python ### code set_random_seed(0) print_ans = False r = choice([1,2,3]) while True: sigs = list(map(lambda k: abs(k), random_int_list(r))) sigs.sort(reverse=True) if sigs[-1] > 0: break m,n = 3,4 Sigma = zero_matrix(m,n) for i in range(r): Sigma[i,i] = sigs[i] cs = random_int_list(n) print("m,n = %s,%s"%(m, n)) pretty_print(LatexExpr(r"\Sigma ="), Sigma) print("c_1, ..., c_n =", cs) if print_ans: rs = [Sigma[i,i] for i in range(m)] cvs = " + ".join(r"(%s)\mathbf{v}_{%s}"%(cs[i], i + 1) for i in range(n)) fcvs = " + ".join(r"(%s)\mathbf{u}_{%s}"%(rs[i] * cs[i], i + 1) for i in range(m)) pretty_print(LatexExpr(cvs + "=" + fcvs)) print("kernel of f is the span of v%s ~ vn"%(r+1)) print("range of f is the span of u1 ~ u%s"%(r)) ``` $m,n = 3,4$ $$Σ=\begin{bmatrix} 3 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\end{bmatrix}. $$ $c_1, ..., c_n = [5, -5, -5, 0]$ ##### Exercise 1(a) 將 $f(c_1\bv_1 + \cdots + c_n\bv_n)$ 寫成 $\beta$ 的線性組合。 **[由廖緯程同學提供]** 直接計算 $$ \begin{aligned} \Sigma \begin{bmatrix} c_1\\c_2\\c_3\\c_4 \end{bmatrix} &= \begin{bmatrix} 3 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 5\\-5\\-5\\0 \end{bmatrix}\\ &= \begin{bmatrix} 15\\0\\0 \end{bmatrix}. \end{aligned} $$ 所以 $f(c_1 \bv_1 + \cdots + c_n \bv_n) = 15 \bu_1$. ##### Exercise 1(b) 用 $\alpha$ 中的向量來描述 $\ker(f)$。 **[由廖緯程同學提供]** $\alpha$ 是標準基，而且 $\Sigma$ 中的行向量中只有第一行不為零向量，所以只有第一個 $\alpha$ 中的基底，也就是 $\bv_1$ 不在 $\ker(f)$ 裡面。所以 $\ker(f)$ 由剩下的基底張開，即 $\ker(f) = \vspan( \{ \bv_2, \bv_3, \bv_4 \})$. ##### Exercise 1(c) 用 $\beta$ 中的向量來描述 $\range(f)$。 **[由廖緯程同學提供]** 任意取 $\mathbb{R}^4$ 中的向量， $$ \bv = \begin{pmatrix} a\\b\\c\\d \end{pmatrix}. $$ 直接計算 $$ f(\bv) = \begin{pmatrix} 3a\\0\\0 \end{pmatrix} = 3a \bu_1. $$ 所以只要 $\bu_1$ 就可以張成 $\range(f)$，即 $\range(f) = \vspan(\{ \bu_1 \})$. ## Exercises ##### Exercise 2 求以下矩陣的奇異值分解。 ##### Exercise 2(a) $$ A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}. $$ **[由廖緯程同學提供]** 先找出 $A \trans A$ 的 orthonormal eigenbasis $\alpha$， $$ A \trans A = \begin{bmatrix} 2 & 2 & 2\\ 2 & 2 & 2\\ 2 & 2 & 2 \end{bmatrix} $$ 可以觀察到 $\nul(A) = 2$，所以有兩個特徵值為 $0$，而且 $\tr(A) = 6$，所以特徵值分別為 $6, 0, 0$. 經過一些計算可以得到 $$ \ker(A \trans A - 6I) = \vspan\left\{ \begin{pmatrix} \frac{1}{\sqrt{3}}\\ \frac{1}{\sqrt{3}}\\ \frac{1}{\sqrt{3}} \end{pmatrix} \right\}. $$ 及 $$ \ker(A \trans A) = \vspan\left\{ \begin{pmatrix} -\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}\\0 \end{pmatrix}, \begin{pmatrix} -\frac{1}{\sqrt{2}}\\0\\ \frac{1}{\sqrt{2}} \end{pmatrix} \right\}. $$ 所以， $$ \alpha = \left\{ \begin{pmatrix} \frac{1}{\sqrt{3}}\\ \frac{1}{\sqrt{3}}\\ \frac{1}{\sqrt{3}} \end{pmatrix}, \begin{pmatrix} -\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}\\0 \end{pmatrix}, \begin{pmatrix} -\frac{1}{\sqrt{2}}\\0\\ \frac{1}{\sqrt{2}} \end{pmatrix} \right\},\\ \sigma_1 = \sqrt{6}, \sigma_2 = \sigma_3 = 0. $$ $\alpha_1$ 是由 $\alpha$ 中對應的特徵值為正的特徵向量組成，找 $\beta = \beta_1 \cup \beta_0$，先找 $\beta_1$， $$ \begin{aligned} \beta_1 &= \{ \frac{1}{\sigma_i} A \bv_i : \bv_i \in \alpha_1 \}\\ &= \{ \frac{1}{\sqrt{6}} A \bv_1 \}\\ &= \{ \begin{pmatrix} \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} \end{pmatrix} \}. \end{aligned} $$ 再找 $\beta_0$，也就是 $A A \trans$ 的 orthonormal eigenbasis， $$ A A \trans = \begin{bmatrix} 3 & 3\\ 3 & 3 \end{bmatrix}, $$ 特徵值為 $6,0$， $$ \beta_0 = \{ \begin{pmatrix} \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} \end{pmatrix}, \begin{pmatrix} -\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} \end{pmatrix} \}. $$ $\beta = \beta_1 \cup \beta_0$， $$ \beta = \{ \begin{pmatrix} \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} \end{pmatrix}, \begin{pmatrix} -\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} \end{pmatrix} \} $$ $V$ 的行向量是 $\alpha$，$U$ 的行向量是 $\beta$， $$ \begin{aligned} V &= \begin{bmatrix} \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{2}} & -\frac{1} {\sqrt{2}}\\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} & 0\\ \frac{1}{\sqrt{3}} & 0 & \frac{1}{\sqrt{2}}\\ \end{bmatrix}\\ U &= \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}\\ \Sigma &= \begin{bmatrix} \sqrt{6} & 0 & 0\\ 0 & 0 & 0 \end{bmatrix} \end{aligned} $$ 驗證一下， $$ U \Sigma V \trans = \begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix} = A. $$ ##### Exercise 2(b) $$ A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}. $$ **[由廖緯程同學提供]** $V$ 的行向量由 $\alpha$ 組成，$\alpha$ 是 $A \trans A$ 的 orthonormal eigenbasis. $$ \begin{aligned} A \trans A &= \begin{bmatrix} 1 & 0 & 1\\ 0 & 1 & 0\\ 1 & 0 & 1 \end{bmatrix} \end{aligned} $$ 因為 $\nul(A \trans A) = 1$，所以它有一個特徵值為 $0$. 利用柯西交錯定理觀察去掉第一行列的 principal submatrix，剛好是 $I_2$，特徵值為 $1, 1$，所以 $A \trans A$ 有一個特徵值為 $1$. 最後，$\tr(A \trans A) = 3 = 0 + 1 + 2$，所以 $A \trans A$ 的特徵值為 $2, 1, 0$. $$ \begin{aligned} \lambda_1 = 2,\\ \lambda_2 = 1,\\ \lambda_3 = 0. \end{aligned} $$ 分別計算 $\ker(A \trans A - \lambda_i I)$ 的orthonormal basis， $$ \begin{aligned} \bv_1 &= \begin{pmatrix} \frac{1}{\sqrt{2}}\\ 0\\ \frac{1}{\sqrt{2}} \end{pmatrix},\\ \bv_2 &= \begin{pmatrix} 0\\ 1\\ 0 \end{pmatrix},\\ \bv_3 &= \begin{pmatrix} -\frac{1}{\sqrt{2}}\\ 0\\ \frac{1}{\sqrt{2}} \end{pmatrix},\\ \alpha &= \{ \begin{pmatrix} \frac{1}{\sqrt{2}}\\ 0\\ \frac{1}{\sqrt{2}} \end{pmatrix}, \begin{pmatrix} 0\\ 1\\ 0 \end{pmatrix}, \begin{pmatrix} -\frac{1}{\sqrt{2}}\\ 0\\ \frac{1}{\sqrt{2}} \end{pmatrix} \},\\ V &= \begin{bmatrix} \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}}\\ 0 & 1 & 0\\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{bmatrix}. \end{aligned} $$ $\alpha_1$ 是將 $\alpha$ 中對應特徵值為正的特徵向量取出來， $$ \alpha_1 = \left\{ \begin{pmatrix} \frac{1}{\sqrt{2}}\\ 0\\ \frac{1}{\sqrt{2}} \end{pmatrix}, \begin{pmatrix} 0\\ 1\\ 0 \end{pmatrix} \right\}. $$ $\sigma_i = \sqrt{\lambda_i}$，$\lambda_i$ 只取正的， $$ \begin{aligned} \sigma_1 &= \sqrt{2},\\ \sigma_2 &= 1,\\ \Sigma &= \begin{bmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_r) & O_{r,n-r} \\ O_{m-r,r} & O_{m-r,n-r} \end{bmatrix}\\ &= \begin{bmatrix} \sqrt{2} & 0 & 0\\ 0 & 1 & 0 \end{bmatrix}. \end{aligned} $$ $\beta_1 = \{\frac{1}{\sigma_i}A\bv: \bv \in \alpha_1\}$， $$ \beta_1 = \left\{ \begin{pmatrix} 1\\ 0 \end{pmatrix}, \begin{pmatrix} 0\\ 1 \end{pmatrix} \right\}. $$ $U$ 的行向量由 $\beta_1$ 組成， $$ U = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix}. $$ 驗證一下， $$ \begin{aligned} U \Sigma V \trans &= \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} \begin{bmatrix} \sqrt{2} & 0 & 0\\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}}\\ 0 & 1 & 0\\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{bmatrix}\\ &= \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}\\ &= A. \end{aligned} $$ ##### Exercise 3 依照以下步驟求出 $$ A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \end{bmatrix} $$ 的奇異值分解。 ##### Exercise 3(a) 求出 $A\trans A$ 的所有特徵值 $\lambda_1 \geq \cdots \geq \lambda_4$ 及其對應的一組垂直單位長的特徵基底 $\alpha = \{\bv_1,\ldots, \bv_4\}.$ **[由廖緯程同學提供]** $$ A \trans A = \begin{bmatrix} 2 & 2 & 0 & 0\\ 2 & 2 & 0 & 0\\ 0 & 0 & 2 & 2\\ 0 & 0 & 2 & 2 \end{bmatrix} $$ 直接求特徵多項式的根， $$ \begin{aligned} \det(A \trans A - xI) &= x^4 - 8x^3 + 16x^2\\ &= x^2(x - 4)^2. \end{aligned} $$ 所以 $A \trans A$ 的特徵值為 $4, 4, 0, 0$. 分別求出特徵向量後標準化可以得到 $\alpha$， $$ \alpha = \left\{ \begin{pmatrix} \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}\\ 0\\ 0 \end{pmatrix}, \begin{pmatrix} 0\\ 0\\ \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} \end{pmatrix}, \begin{pmatrix} -\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}\\ 0\\ 0 \end{pmatrix}, \begin{pmatrix} 0\\ 0\\ -\frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} \end{pmatrix} \right\}. $$ ##### Exercise 3(b) 求出 $AA\trans$ 的所有特徵值 $\mu_1 \geq \mu_2$ 及其對應的一組垂直單位長的特徵基底 $\beta = \{\bu_1, \bu_2\}$。 **[由廖緯程同學提供]** $$ A A \trans = \begin{bmatrix} 4 & 0\\ 0 & 4 \end{bmatrix} $$ 特徵值是 $4, 4$， $$ \beta =\{ \begin{pmatrix} 1\\ 0 \end{pmatrix}, \begin{pmatrix} 0\\ 1 \end{pmatrix} \}. $$ ##### Exercise 3(c) 令 $\sigma_1 = \sqrt{\lambda_1}$、$\sigma_2 = \sqrt{\lambda_2}$。判斷以下敘述是否正確： 1. $\lambda_1 = \mu_1$ 且 $\lambda_2 = \mu_2$。 2. $A\bv_1 = \sigma\bu_1$ 且 $A\bv_2 = \sigma_2\bu_2$。 3. $A\bv_1$ 是 $AA\trans$ 的特徵向量且特徵值為 $\lambda_1$。 $A\bv_2$ 是 $AA\trans$ 的特徵向量且特徵值為 $\lambda_2$。 4. $A\bv_1$ 的長度為 $\sigma_1$、$A\bv_2$ 的長度為 $\sigma_2$。 **[由廖緯程同學提供]** 1. 正確。$\lambda_1 = \lambda_2 = \mu_1 = \mu_2 = 4$. 2. 不見得相等。 $A \bv_1 \neq \sigma_1 \bu_1$， $$ \begin{aligned} A \bv_1 &= \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \end{bmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}}\\ 0\\ 0 \end{pmatrix}\\ &= \begin{pmatrix} \sqrt{2}\\ \sqrt{2} \end{pmatrix}\\ &\neq 2 \begin{pmatrix} 1\\ 0 \end{pmatrix}\\ &= \sigma_1 \bu_1. \end{aligned} $$ $A \bv_2 \neq \sigma_2 \bu_2$， $$ \begin{aligned} A \bv_2 &= \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \end{bmatrix} \begin{pmatrix} 0\\ 0\\ \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} \end{pmatrix}\\ &= \begin{pmatrix} \sqrt{2}\\ -\sqrt{2} \end{pmatrix}\\ &\neq 2 \begin{pmatrix} 0\\ 1 \end{pmatrix}\\ &= \sigma_2 \bu_2. \end{aligned} $$ 3. 正確。 $A\bv_1$ 是 $AA\trans$ 的特徵向量且特徵值為 $\lambda_1$， $$ \begin{aligned} A A \trans A \bv_1 &= \begin{bmatrix} 4 & 0\\ 0 & 4 \end{bmatrix} \begin{pmatrix} \sqrt{2}\\ \sqrt{2} \end{pmatrix}\\ &= \begin{pmatrix} 4 \sqrt{2}\\ 4 \sqrt{2} \end{pmatrix}\\ &= 4 \begin{pmatrix} \sqrt{2}\\ \sqrt{2} \end{pmatrix}\\ &= \lambda_1 A \bv_1. \end{aligned} $$ $A\bv_2$ 是 $AA\trans$ 的特徵向量且特徵值為 $\lambda_2$， $$ \begin{aligned} A A \trans A \bv_2 &= \begin{bmatrix} 4 & 0\\ 0 & 4 \end{bmatrix} \begin{pmatrix} \sqrt{2}\\ -\sqrt{2} \end{pmatrix}\\ &= \begin{pmatrix} 4 \sqrt{2}\\ -4 \sqrt{2} \end{pmatrix}\\ &= 4 \begin{pmatrix} \sqrt{2}\\ -\sqrt{2} \end{pmatrix}\\ &= \lambda_2 A \bv_2. \end{aligned} $$ 4. 正確。 $A\bv_1$ 的長度為 $\sigma_1$， $$ \begin{aligned} ||{A \bv_1}|| &= \sqrt{2 + 2}\\ &= 2\\ &= \sigma_1. \end{aligned} $$ $A\bv_2$ 的長度為 $\sigma_2$， $$ \begin{aligned} ||{A \bv_2}|| &= \sqrt{2 + 2}\\ &= 2\\ &= \sigma_1. \end{aligned} $$ ##### Exercise 3(d) 將 $\beta$ 做適當的修正使得 $\alpha$、$\beta$ 可以將 $A$ 奇異值分解。 **[由廖緯程同學提供]** 令 $$ \begin{aligned} \beta &= \left\{ \frac{1}{\sigma_1} A \bv_1, \frac{1} {\sigma_2} A \bv_2 \right\}\\ &= \left\{ \begin{pmatrix} \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} \end{pmatrix}, \begin{pmatrix} \frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{2}} \end{pmatrix} \right\},\\ \Sigma &= \begin{bmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_r) & O_{r,n-r} \\ O_{m-r,r} & O_{m-r,n-r} \end{bmatrix}\\ &= \begin{bmatrix} 2 & 0 & 0 & 0\\ 0 & 2 & 0 & 0 \end{bmatrix}. \end{aligned} $$ $U$ 的行向量由 $\beta$ 組成，$V$ 的行向量由 $\alpha$ 組成，驗證一下， $$ \begin{aligned} U \Sigma V \trans &= \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 & 0\\ 0 & 2 & 0 & 0 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 & 0\\ 0 & 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 & 0\\ 0 & 0 & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}\\ &= \begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & 1 & -1 & -1 \end{bmatrix}\\ &= A. \end{aligned} $$ ##### Exercise 4 在 506-6 我們用連續性論證來說明 $AB$ 和 $BA$ 有相同的非零特徵值集合。這裡提供另一種方式來證明。 ##### Exercise 4(a) 令 $$ X = \begin{bmatrix} AB & A \\ O & O \end{bmatrix}, \quad Y = \begin{bmatrix} O & A \\ O & BA \end{bmatrix}. $$ 參考 408 寫出「把 $X$ 的上區塊左乘 $B$ 加到下區塊」的區塊基本矩陣 $E$。並驗證 $EXE^{-1} = Y$。 **[由廖緯程同學提供]** 令 $$ E = \begin{bmatrix} I_m & O\\ B & I_n \end{bmatrix}. $$ $E$ 是下三角矩陣而且滿秩，所以 $E$ 可逆. 驗證 $EX = YE$. $$ \begin{aligned} EX &= \begin{bmatrix} I_m & O\\ B & I_n \end{bmatrix} \begin{bmatrix} AB & A\\ O & O \end{bmatrix}\\ &= \begin{bmatrix} AB & A\\ BAB & BA \end{bmatrix}\\ &= \begin{bmatrix} O & A\\ O & BA \end{bmatrix} \begin{bmatrix} I_m & O\\ B & I_n \end{bmatrix}\\ &= YE. \end{aligned} $$ ##### Exercise 4(b) 證明 $AB$ 和 $BA$ 有相同的非零特徵值集合。 **[由廖緯程同學提供]** 考慮 $X$ 的特徵多項式， $$ \begin{aligned} p_X(x) &= \det(X - xI_{m + n})\\ &=\det( \begin{bmatrix} AB - xI_m & A\\ O & -xI_n \end{bmatrix})\\ &= \det(AB - xI_m) \det(-xI_n)\\ &= p_{AB}(x) \cdot (-x)^n. \end{aligned} $$ 考慮 $Y$ 的特徵多項式， $$ \begin{aligned} p_Y(x) &= \det(Y - xI_{m + n})\\ &=\det( \begin{bmatrix} -xI_m & A\\ O & BA - xI_n \end{bmatrix})\\ &= \det(-xI_m) \det(BA - xI_n)\\ &= p_{BA}(x) \cdot (-x)^m. \end{aligned} $$ 因為 $EXE^{-1} = Y$，所以 $X$ 相似於 $Y$，$X,Y$ 有相同的特徵多項式， $$ p_{AB}(x) \cdot (-x)^n = p_{BA}(x) \cdot (-x)^m. $$ 可以看的出來 $p_{BA}$ 與 $p_{AB}$ 有相同的非零根，所以 $AB$ 與 $BA$ 有相同的非零特徵值. ##### Exercise 5 執行以下程式碼，並用文字解釋輸出的各項是什麼。 ```python ### code import numpy as np arr = np.array([ [1,1,1], [1,1,1] ]) np.linalg.svd(arr) ``` **[由廖緯程同學提供]** 執行結果， ```python (array([[-0.70710678, -0.70710678], [-0.70710678, 0.70710678]]), array([2.44948974e+00, 1.15875172e-16]), array([[-5.77350269e-01, -5.77350269e-01, -5.77350269e-01], [-8.16496581e-01, 4.08248290e-01, 4.08248290e-01], [-6.99362418e-17, -7.07106781e-01, 7.07106781e-01]])) ``` 相當於， $$ \begin{bmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix},\\ \begin{bmatrix} \sqrt{6} & 0 \end{bmatrix},\\ \begin{bmatrix} -\frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}}\\ -\frac{\sqrt{6}}{3} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}}\\ 0 & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}. $$ $\sqrt{6}, 0$ 是 ```arr``` 的奇異值，另外兩個都是正交矩陣，所以猜測它們是 ```arr``` 的奇異值分解，驗證看看， $$ \begin{aligned} &\begin{bmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} \sqrt{6} & 0 & 0\\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} -\frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}}\\ -\frac{\sqrt{6}}{3} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}}\\ 0 & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}\\ &= \begin{bmatrix} -\sqrt{3} & 0 & 0\\ -\sqrt{3} & 0 & 0 \end{bmatrix} \begin{bmatrix} -\frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}}\\ -\frac{\sqrt{6}}{3} & \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}}\\ 0 & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}\\ &= \begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}.\\ \end{aligned} $$ 所以確實是奇異值分解，而且結果與 Exercise 2(a) 不同，這說明了奇異值分解不唯一. :::info 一般來說奇異值指的是非零的那些數字。所以以這個矩陣來說只有一個奇異值 $\sqrt{6}$。但為了程式方便起見，會把後面的 $0$ 也都包進來，這樣才能固定第二項輸出陣列的長度。（同時也是數值方法很難判斷一個數字是不是零。） ::: ##### Exercise 6 以下小題解釋如何利用奇異值分解進行影像壓縮（去除雜訊）。 ##### Exercise 6(a) 執行以下程式碼。選用不同的 `k` 來看看結果有什麼差異。用文字敘述 `k` 對結果的影響、並選一個 `k` 是你能接受的最小值。 ```python ### 未壓縮原圖 import numpy as np from PIL import Image r = 10 img = Image.open('incrediville-side.jpg') x,y = img.size img = img.resize((x // r, y // r)).convert("L") img ``` ```python ### 壓縮 k = 10 arr = np.array(img) u,s,vh = np.linalg.svd(arr) arr_compressed = (u[:,:k] * s[:k]).dot(vh[:k,:]) img_compressed = Image.fromarray(arr_compressed.astype('uint8'), 'L') img_compressed ``` :::warning 難得比較有趣的題目，可惜你們無法執行 ::: ##### Exercise 6(b) 令 $A$ 為一 $m\times n$ 矩陣、且 $\rank(A) = r$。已知 $A = U\Sigma V^\top$ 為其奇異值分解。令 $\bu_1,\ldots,\bu_{m}$ 為 $U$ 的各行向量、而 $\bv_1, \ldots, \bv_{n}$ 為 $V$ 的各行向量。說明 $A$ 可寫成 $$ A = \sum_{i=1}^{r} \sigma_i \bu_i \bv_i\trans. $$ 因此對任意 $k \leq r$ 而言，$A' = \sum_{i=1}^{k} \sigma_i \bu_i \bv_i\trans$ 都可視為 $A$ 的一個逼近。 $Ans:$ $$ A=\begin{bmatrix} | & ~ & | \\ {\bf u}_1 & \cdots & {\bf u}_n \\ | & ~ & | \end{bmatrix} \begin{bmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_r) & O_{r,n-r} \\O_{m-r,r} & O_{m-r,n-r} \end{bmatrix} \begin{bmatrix} - & {\bf v}_1^\top & - \\ ~ & \vdots & ~\\ - & {\bf v}_n^\top & - \end{bmatrix}. $$ $A= \sigma_1 \bu_1 \bv_1\trans+\sigma_2 \bu_2 \bv_2\trans+.....+\sigma_k \bu_k \bv_k \trans$ 可以寫成 $A = \sum_{i=1}^{k} \sigma_i \bu_i \bv_i\trans.$ ##### Exercise 6(c) 在矩陣世界裡，我們定義長度平方為 $\|X\|^2 = \tr(X\trans X)$。因此也可以計算兩矩陣 $X$ 和 $Y$ 的距離為 $\|X - Y\|$。驗證對每一個 $i$ 都有 $\|\bu_i\bv_i\trans\| = 1$，並說明 $\|A - A'\|^2 = \sum_{i = k+1}^r \sigma_i^2$。（因此我們刪除小的奇異值是合理的，並不會改變 $A$ 太多。） **[由廖緯程同學提供]** 因為 $\bu_i, \bv_i$ 長度都是 $1$，即 $\bu_i \trans \bu_i = \bv_i \trans \bv_i = 1$， $$ \begin{aligned} \| \bu_i \bv_i \trans \| &= \sqrt{\tr(\bv_i \bu_i \trans \bu_i \bv_i \trans)}\\ &= \sqrt{\tr(\bv_i \bv_i \trans)}\\ &= \sqrt{1}\\ &= 1. \end{aligned} $$ 對於所有 $i \neq j$，$\bu_i \trans \bu_j = \bv_i \trans \bv_j = \begin{bmatrix}0\end{bmatrix}$，$\tr(\bu_i \bu_i \trans) = \tr(\bv_i \bv_i \trans) = 1$， $$ \begin{aligned} \| A - A' \|^2 &= \tr((A \trans - A' \trans)(A - A'))\\ &= \tr(\sum_{i = k + 1}^r \sigma_i \bv_i \bu_i \trans \sum_{i = k + 1}^r \sigma_i \bu_i \bv_i \trans)\\ &= \tr(\sum_{i = k + 1}^r \sigma_i^2 \bv_i \bu_i \trans \bu_i \bv_i \trans)\\ &= \tr(\sum_{i = k + 1}^r \sigma_i^2 \bv_i \bv_i \trans)\\ &= \sum_{i = k + 1}^r \sigma_i^2. \end{aligned} $$ ##### Exercise 6(d) 令 `x` 為電腦儲存一個浮點數所需的容量。說明一張 $m\times n$ 畫素的灰階圖片大約佔 $mn$ `x` 的容量、而給定 $k$ 經過壓縮後的圖片大約佔 $mk + nk + k$ `x` 的容量。 :::warning - [x] 元 --> 元素 - [x] 用別人的圖要加註出處，不然就不要用；或是自己畫 - [x] 沒有寫結論 ::: $Ans:$ 矩陣 $A$ 總共有 $m\times n$ 個元素，$U_r$ 有 $m\times r$ 個元素，$V_r\trans$ 有 $r\times n$ 個元素，$\Sigma_r$ 則只需儲存主對角的 $r$ 個非零元。若以 $SVD$ 形式儲存，總計有 $(m+n+1)\times r$ 個元素。當 $r$ 遠小於 $m$ 和 $n$ 時，利用矩陣的 $SVD$ 可以大幅減少儲存量。 ![](https://i.imgur.com/GwM22tO.jpg) (Source: [線代啟示錄 | 奇異值分解 (SVD)](https://ccjou.wordpress.com/2009/09/01/%E5%A5%87%E7%95%B0%E5%80%BC%E5%88%86%E8%A7%A3-svd/)) ##### Exercise 7 以下練習討論一個矩陣 $A$ 的**摩爾–彭若斯廣義反矩陣（Moore–Penrose pseudoinverse）** 。記作 $A^\dagger$。 ##### Exercise 7(a) 若 $\sigma_1, \ldots, \sigma_r$ 為非零實數，且 $$ \Sigma = \begin{bmatrix} \operatorname{diag}(\sigma_1, \ldots, \sigma_r) & O_{r,n-r} \\ O_{m-r,r} & O_{m-r,n-r} \end{bmatrix}, $$ 則 $$ \Sigma^\dagger = \begin{bmatrix} \operatorname{diag}(\sigma_1^{-1}, \ldots, \sigma_r^{-1}) & O_{r,m-r} \\ O_{n-r,r} & O_{n-r,m-r} \end{bmatrix}. $$ （注意零矩陣的大小。）求 $$ \Sigma = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} $$ 的 $\Sigma^\dagger$。 :::warning - [x] 標點 - [x] $\sigma_n$ --> $\sigma_i$ ::: $Ans:$ 由定義可觀察到 $m,n$ 需互換且 $\sigma_i$ 要變成 $\sigma_i^{-1}$，因此 $\Sigma^\dagger= \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$。 ##### Exercise 7(b) 若 $A$ 的奇異值分解為 $U\Sigma V\trans$，則 $A^\dagger = V\Sigma^\dagger U\trans$。求 $$ A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ \end{bmatrix} $$ 的 $A^\dagger$。 $Ans:$ 由 2(a) 可知 $V=\begin{bmatrix} \sqrt3/3 & -\sqrt2/2 & -\sqrt6/6 \\ \sqrt3/3& \sqrt2/2 & -\sqrt6/6\\ \sqrt3/3 & 0 & \sqrt6/3 \end{bmatrix}$ ， $U=\begin{bmatrix} \sqrt2/2 & -\sqrt2/2\\ \sqrt2/2 & \sqrt2/2 \end{bmatrix}$， $Σ=\begin{bmatrix} \sqrt6 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$，因此可求得$$U\trans=\begin{bmatrix} \sqrt2/2 & \sqrt2/2\\ -\sqrt2/2 & \sqrt2/2 \end{bmatrix}$$， $$\Sigma^\dagger= \begin{bmatrix} \frac{1}{\sqrt6} & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}。$$ 最後經過計算 $$A^\dagger = V\Sigma^\dagger U\trans = \begin{bmatrix} 1/6 & 1/6 \\ 1/6 & 1/6 \\ 1/6 & 1/6 \end{bmatrix}$$ 。 ##### Exercise 8 令 $A$ 為一 $m\times n$ 矩陣。以下練習建立奇異值分解的理論基礎。 ##### Exercise 8(a) 令 $\bv$ 為 $A\trans A$ 的特徵向量且 $A\trans A\bv = \lambda\bv$。證明 $A\bv$ 滿足 $AA\trans (A\bv) = \lambda(A\bv)$。 :::warning - [x] 用文字取代邏輯符號 $\Rightarrow$ - [x] 標點 ::: Ans: $AA\trans (A\bv)=A(A\trans A\bv)= A\lambda\bv$ 因為 $\lambda$ 是數字而不是矩陣所以可將 $\lambda$ 提到 $A$ 的前面，推得 $A\lambda\bv= \lambda A\bv = \lambda (A\bv)$。 ##### Exercise 8(b) 令 $\bv$ 為 $A\trans A$ 的特徵向量且 $A\trans A\bv = \lambda\bv$。證明對任意向量 $\bu\in\mathbb{R}^n$，都有 $\inp{A\bv}{A\bu} = \lambda\inp{\bv}{\bu}$。 :::warning - [x] 用文字取代邏輯符號 $\Rightarrow$ - [x] 標點 ::: Ans: $\inp{A\bv}{A\bu}=\inp{A\trans A\bv}{\bu}=\inp{\lambda \bv}{\bu}$ 因為 $\lambda$ 是數字而不是矩陣所以可將 $\lambda$ 提出，推得 $\inp{\lambda \bv}{\bu}= \lambda \inp{ \bv}{\bu}$。 ##### Exercise 8(c) 藉由以上性質證明： 1. $\lambda \geq 0$。 2. $A\bv$ 是 $AA\trans$ 的特徵向量且長度為 $\sqrt{\lambda}$。 3. 若 $\alpha_1$ 為 $\Row(A\trans A)$ 的一組垂直標準特徵基底、則 $\beta_1 = \{\frac{1}{\sigma_i}A\bv: \bv \in \alpha_1\}$ 為 $\Row(AA\trans)$ 的一組垂直標準特徵基底。 :::warning - [x] 用文字取代邏輯符號 $\Rightarrow$ - [x] 標點 ::: Ans: 令 $||\bv||=1$ 1. $\lambda=\inp{\bv}{\lambda \bv}=\inp{\bv}{A\trans A \bv}=\inp{A\bv}{A\bv}=||A\bv||^2>0$ 推得 $\lambda>0$。 2. 根據$\ 1.\ \lambda =||A\bv||^2$ 推得 $||A\bv||=\sqrt{\lambda}$。 3. $\sigma_i=\sqrt{\lambda} ,A\bv$ 是 $AA\trans$ 的特徵向量且長度為 $\sqrt{\lambda}$ ，推得 $A\bv\in \Row(A\trans A),$ 是個垂直特徵基底且長度為 $\sqrt{\lambda}$ ，推得 $\frac{1}{\sqrt{\lambda}}A\bv$ 為除以長度後的 $\Row(A\trans A)$ 垂直標準特徵基底，而 $\beta_1 = \{\frac{1}{\sigma_i}A\bv: \bv \in \alpha_1\} \in \Row(A\trans A)$ 所以 $\beta_1$ 是 $\Row(A\trans A)$ 的一組垂直標準特徵基底。 :::info 分數 = 5 :::

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.