---
title: Linear Algebra Note 4
tags: Linear Algebra, 線性代數, 魏群樹, 大學, 國立陽明交通大學, 筆記
---

# Linear Algebra Note 4

## Orthogonality (2021.11.10 ~ 2021.11.17)

- Example
$A = \left[\begin{array}{c c} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ \end{array}\right],\ \vec{b} = \left[\begin{array}{c} 6 \\ 0 \\ 0 \\ \end{array}\right],\ P = A(A^TA)^{-1}A^T \\ \text{where } A^TA = \left[\begin{array}{c c c} 1 & 1 & 1 \\ 0 & 1 & 2 \\ \end{array}\right]\left[\begin{array}{c c} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ \end{array}\right] = \left[\begin{array}{c c} 3 & 3 \\ 3 & 5 \\ \end{array}\right] \\ A^TA\vec{x} = A^T\vec{b} = \left[\begin{array}{c c} 3 & 3 \\ 3 & 5 \\ \end{array}\right]\left[\begin{array}{c} \hat{x_1} \\ \hat{x_2} \\ \end{array}\right] = \left[\begin{array}{c c c} 1 & 1 & 1 \\ 0 & 1 & 2 \\ \end{array}\right]\left[\begin{array}{c} 6 \\ 0 \\ 0 \\ \end{array}\right] = \left[\begin{array}{c} 6 \\ 0 \\ \end{array}\right] \implies \hat{\vec{x}} = \left[\begin{array}{c} \hat{x_1} \\ \hat{x_2} \\ \end{array}\right] = \left[\begin{array}{c} 5 \\ -3 \\ \end{array}\right] \\ \vec{p} = A\hat{\vec{x}} = \left[\begin{array}{c c} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ \end{array}\right]\left[\begin{array}{c} 5 \\ -3 \\ \end{array}\right] = \left[\begin{array}{c} 5 \\ 2 \\ -1 \\ \end{array}\right] \\ \vec{e} = \vec{b} - \vec{p} = \left[\begin{array}{c} 1 \\ -2 \\ 1 \\ \end{array}\right],\ \text{check } \vec{e}^T\vec{p} = \left[\begin{array}{c} 1 \\ -2 \\ 1 \\ \end{array}\right]^T\left[\begin{array}{c} 5 \\ 2 \\ -1 \\ \end{array}\right] = 0,\ \text{check } \vec{e}^TA = \left[\begin{array}{c} 1 \\ -2 \\ 1 \\ \end{array}\right]^T\left[\begin{array}{c c} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ \end{array}\right] = 0\\ \text{Alternatively, } P = A(A^TA)^{-1}A^T \text{ where } (A^TA)^{-1} = {1 \over 6}\left[\begin{array}{c c} 5 & -3 \\ -3 & 3 \\ \end{array}\right] \implies P = {1 \over 6}\left[\begin{array}{c c c} 5 & 2 & -1 \\ 2 & 2 & 2 \\ -1 & 2 & 5 \\ \end{array}\right] \\ \vec{p} = P\vec{b} = {1 \over 6}\left[\begin{array}{c c c} 5 & 2 & -1 \\ 2 & 2 & 2 \\ -1 & 2 & 5 \\ \end{array}\right]\left[\begin{array}{c} 6 \\ 0 \\ 0 \\ \end{array}\right] = \left[\begin{array}{c} 5 \\ 2 \\ -1 \\ \end{array}\right]$
- Theorem
$P^2 = P$
  - Intuition
  $P\vec{b} = \vec{p},\ P(P\vec{b}) = P^2\vec{b} = \vec{p}$
  - Proof
  $P = A(A^TA)^{-1}A^T \\ \begin{split}P^2 = PP &= (A(A^TA)^{-1}A^T)(A(A^TA)^{-1}A^T) \\ &= A(A^TA)^{-1}\color{red}{A^TA(A^TA)^{-1}}A^T \\ &= A(A^TA)^{-1}A^T = P\end{split}$
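The projection example above is easy to reproduce numerically. Below is a minimal NumPy sketch (not part of the original notes) that recomputes $\hat{\vec{x}}$, $\vec{p}$, $\vec{e}$, and $P$, and checks $\vec{e} \perp \vec{p}$ and $P^2 = P$.

```python
import numpy as np

A = np.array([[1, 0],
              [1, 1],
              [1, 2]], dtype=float)
b = np.array([6, 0, 0], dtype=float)

# Solve the normal equations A^T A x_hat = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)        # -> [ 5. -3.]

# Projection p = A x_hat and error e = b - p
p = A @ x_hat                                    # -> [ 5.  2. -1.]
e = b - p                                        # -> [ 1. -2.  1.]

# Projection matrix P = A (A^T A)^{-1} A^T, and the checks P b = p, P^2 = P, e ⟂ p
P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(P @ b, p), np.allclose(P @ P, P), np.isclose(e @ p, 0.0))
```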
### Least Square Approximations

In many practical cases, $A\vec{x} = \vec{b}$ has no solution, e.g. $A$ is a tall matrix (more equations than unknowns).
Let $\vec{e} \triangleq A\vec{x} - \vec{b}$. The best approximation is the one that minimizes $\vec{e}$, i.e. minimizes $\begin{Vmatrix} \vec{e} \end{Vmatrix}^2$; this is the least squares approximation.

- Example (Linear fit)
Find the closest line to the points $(t,\ b) = (0,\ 6),\ (1,\ 0),\ (2,\ 0)$
Let the line be $b = x_0 + x_1t$
$\left.\begin{array}{c} x_0 + x_1 \cdot 0 = 6 \\ x_0 + x_1 \cdot 1 = 0 \\ x_0 + x_1 \cdot 2 = 0 \\ \end{array}\right. \implies \left[\begin{array}{c c} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ \end{array}\right]\left[\begin{array}{c} x_0 \\ x_1 \\ \end{array}\right] = \left[\begin{array}{c} 6 \\ 0 \\ 0 \\ \end{array}\right]\ (A\vec{x} = \vec{b})$
Let $\vec{e} = A\vec{x} - \vec{b}$ and minimize $\begin{Vmatrix} \vec{e} \end{Vmatrix}^2$
$\vec{e} \perp C(A) \iff A^T\vec{e} = 0 \iff A^T(A\hat{\vec{x}} - \vec{b}) = 0 \\ \begin{split} \hat{\vec{x}} &= (A^TA)^{-1}A^T\vec{b} \\ &= (\left[\begin{array}{c c c} 1 & 1 & 1 \\ 0 & 1 & 2 \\ \end{array}\right]\left[\begin{array}{c c} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ \end{array}\right])^{-1}\left[\begin{array}{c c c} 1 & 1 & 1 \\ 0 & 1 & 2 \\ \end{array}\right]\left[\begin{array}{c} 6 \\ 0 \\ 0 \\ \end{array}\right] = \left[\begin{array}{c} 5 \\ -3 \\ \end{array}\right]\end{split} \\ \therefore b = 5 - 3t \text{ is the best linear fit}$
- Theorem (Orthogonality principle)
Let $\vec{e} = A\vec{x} - \vec{b}$ and let $\hat{\vec{x}}$ be the solution to $\min \begin{Vmatrix} \vec{e} \end{Vmatrix}^2 = \min \begin{Vmatrix} A\vec{x} - \vec{b} \end{Vmatrix}^2$. Then $\vec{e} \perp C(A)$ and $A\hat{\vec{x}} = P\vec{b} = \vec{p}$, where $P$ is the projection matrix onto $C(A)$ and $\hat{\vec{x}}$ is the solution to $A^TA\hat{\vec{x}} = A^T\vec{b}$
  - Proof
  Let $\vec{b} = \vec{p} + \vec{e}$ where $\vec{p} \in C(A)$ and $\vec{e} \in N(A^T)$. While $A\vec{x} = \vec{b}$ may **NOT** be solvable, $A\vec{x} = \vec{p}$ is always solvable since $\vec{p} \in C(A)$.
  The objective function (to be minimized) is
  $\begin{split}\begin{Vmatrix} A\vec{x} - \vec{b} \end{Vmatrix}^2 &= \begin{Vmatrix} A\vec{x} - \vec{p} - \vec{e} \end{Vmatrix}^2 \\ &= (A\vec{x} - \vec{p} - \vec{e})^T(A\vec{x} - \vec{p} - \vec{e}) \\ &= \begin{Vmatrix} A\vec{x} - \vec{p} \end{Vmatrix}^2 + \begin{Vmatrix} \vec{e} \end{Vmatrix}^2 - (A\vec{x}-\vec{p})^T\vec{e} - \vec{e}^T(A\vec{x}-\vec{p}) \\ &= \begin{Vmatrix} A\vec{x} - \vec{p} \end{Vmatrix}^2 + \begin{Vmatrix} \vec{e} \end{Vmatrix}^2 - 0 - 0\ (\because N(A^T) \perp C(A)) \\ &= \begin{Vmatrix} A\vec{x} - \vec{p} \end{Vmatrix}^2 + \begin{Vmatrix} \vec{e} \end{Vmatrix}^2\end{split}$
  - Goal: $\min \begin{Vmatrix} A\vec{x}-\vec{b} \end{Vmatrix}^2 = \min \left[ \begin{Vmatrix} A\vec{x} - \vec{p} \end{Vmatrix}^2 + \begin{Vmatrix} \vec{e} \end{Vmatrix}^2 \right] = \min \begin{Vmatrix} A\vec{x} - \vec{p} \end{Vmatrix}^2 + \begin{Vmatrix} \vec{e} \end{Vmatrix}^2 \ge \begin{Vmatrix} \vec{e} \end{Vmatrix}^2$ (the $\begin{Vmatrix} \vec{e} \end{Vmatrix}^2$ term does not depend on $\vec{x}$)
  We attain equality by choosing $\vec{x} = \hat{\vec{x}}$ such that $A\hat{\vec{x}} = P\vec{b} = \vec{p}$; the least squares error is then $\begin{Vmatrix} \vec{e} \end{Vmatrix}^2$
- Summary: When $A\vec{x}=\vec{b}$ has **NO** solution, multiply by $A^T$ and solve $A^TA\hat{\vec{x}}=A^T\vec{b}$. The solution to this alternative system of equations is the least squares approximation to $A\vec{x} = \vec{b}$

![](https://i.imgur.com/el6B4lZ.jpg)
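As a sanity check of the linear fit, the sketch below solves the same normal equations $A^TA\hat{\vec{x}} = A^T\vec{b}$ in NumPy and compares against `np.linalg.lstsq`, which solves the same least squares problem; the variable names simply mirror the notation above.

```python
import numpy as np

# Points (t, b) = (0, 6), (1, 0), (2, 0); model b = x0 + x1 * t
t = np.array([0.0, 1.0, 2.0])
b = np.array([6.0, 0.0, 0.0])

A = np.column_stack([np.ones_like(t), t])   # columns: [1, t]

# Normal equations A^T A x_hat = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)                                # [ 5. -3.]  ->  b = 5 - 3t

# Same answer from the library least squares routine
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_hat, x_lstsq))          # True
```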
### Fitting by a parabola

Given a quadratic equation $b = x_0 + x_1t + x_2t^2$ with observations $(b,\ t)=(b_1,\ t_1),\ \cdots,\ (b_m,\ t_m)$ with $m > 3$ points, the $m$ equations for an exact fit are usually unsolvable
$\left\{\begin{array}{c} x_0 + x_1t_1 + x_2t^2_1 = b_1 \\ x_0 + x_1t_2 + x_2t^2_2 = b_2 \\ \vdots \\ x_0 + x_1t_m + x_2t^2_m = b_m \\ \end{array}\right. \implies A = \left[\begin{array}{c c c} 1 & t_1 & t_1^2 \\ \vdots & \vdots & \vdots \\ 1 & t_m & t_m^2 \\ \end{array}\right]_{m \times 3},\ A\left[\begin{array}{c} x_0 \\ x_1 \\ x_2 \\ \end{array}\right] = \left[\begin{array}{c} b_1 \\ b_2 \\ \vdots \\ b_m \\ \end{array}\right]\ (A\vec{x} = \vec{b})$
Least squares: solve $A^TA\hat{\vec{x}} = A^T\vec{b}$

### Orthogonal Bases

A vector space can be spanned by different bases. A basis whose vectors are mutually orthogonal is called an orthogonal basis.
Recall that vectors $\vec{q_1},\ \vec{q_2},\ \cdots,\ \vec{q_n}$ are orthogonal if $<\vec{q_i},\ \vec{q_j}> = \vec{q_i}^T\vec{q_j} = \left\{\begin{array}{c} 0\ ,\ i \not= j \\ \begin{Vmatrix} \vec{q_i} \end{Vmatrix}^2,\ i = j \\ \end{array}\right.$
- Define (Orthonormal):
The vectors $\vec{q_1},\ \vec{q_2},\ \cdots,\ \vec{q_n}$ are orthonormal (orthogonal and normalized) if $<\vec{q_i},\ \vec{q_j}> = \vec{q_i}^T\vec{q_j} = \left\{\begin{array}{c} 0\ ,\ i \not= j \\ 1\ ,\ i = j \\ \end{array}\right.$ (normalized to unit vectors)
- Theorem
Let $Q = [\vec{q_1}\ \vec{q_2}\ \cdots\ \vec{q_n}]$ where $\vec{q_1},\ \vec{q_2},\ \cdots,\ \vec{q_n}$ are orthonormal. Then $Q^TQ = I$. Moreover, if $Q$ is square, then $Q^{-1} = Q^T$
  - Proof
  $\begin{split}Q^TQ = \left[\begin{array}{c} \vec{q_1}^T \\ \vec{q_2}^T \\ \vdots \\ \vec{q_n}^T \\ \end{array}\right]\left[\begin{array}{c c c c} \vec{q_1} & \vec{q_2} & \cdots & \vec{q_n} \\ \end{array}\right] &= \left[\begin{array}{c c c c} \vec{q_1}^T\vec{q_1} & \vec{q_1}^T\vec{q_2} & \cdots & \vec{q_1}^T\vec{q_n} \\ \vec{q_2}^T\vec{q_1} & \vec{q_2}^T\vec{q_2} & \cdots & \vec{q_2}^T\vec{q_n} \\ \vdots & \vdots & \ddots & \vdots \\ \vec{q_n}^T\vec{q_1} & \vec{q_n}^T\vec{q_2} & \cdots & \vec{q_n}^T\vec{q_n} \\ \end{array}\right] \\ &= \left[\begin{array}{c c c c} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ \end{array}\right] = I\end{split}$
  If $Q$ is square $\implies$ size $= n \times n$ and $Q^TQ = I \implies Q^{-1} = Q^T$
- Example
Permutation matrix $Q = \left[\begin{array}{c c c} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ \end{array}\right]$ is orthogonal
  - Check
  $Q^TQ = \left[\begin{array}{c c c} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ \end{array}\right]\left[\begin{array}{c c c} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ \end{array}\right] = \left[\begin{array}{c c c} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{array}\right] = I$
- Example
Rotation matrix $Q = \left[\begin{array}{c c} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \\ \end{array}\right]$ rotates a vector counterclockwise by $\theta$ and $Q$ is orthogonal
  - Check
  $Q^TQ = \left[\begin{array}{c c} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \\ \end{array}\right]\left[\begin{array}{c c} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \\ \end{array}\right] = \left[\begin{array}{c c} 1 & 0 \\ 0 & 1 \\ \end{array}\right] = I$
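Both examples are easy to verify numerically. The short sketch below checks $Q^TQ = I$ for the permutation matrix and for a rotation matrix (the angle value is an arbitrary choice for the demonstration).

```python
import numpy as np

# Permutation matrix from the example
P = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])

# Rotation matrix for an arbitrary angle theta
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Both have orthonormal columns: Q^T Q = I
print(np.allclose(P.T @ P, np.eye(3)))   # True
print(np.allclose(R.T @ R, np.eye(2)))   # True
```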
- Theorem
Orthonormal matrices (matrices $Q$ with orthonormal columns) preserve the length of a vector, i.e. $\begin{Vmatrix} Q\vec{x} \end{Vmatrix} = \begin{Vmatrix} \vec{x} \end{Vmatrix}\ \forall\ \vec{x} \in \mathbb{R}^n$. Moreover, they preserve dot products, i.e. $(Q\vec{x})^T(Q\vec{y}) = \vec{x}^T\vec{y}$
  - Proof
  $(Q\vec{x})^T(Q\vec{y}) = \vec{x}^T\color{red}{Q^TQ}\vec{y} = \vec{x}^T\color{red}{I}\vec{y} = \vec{x}^T\vec{y}$
  Let $\vec{y} = \vec{x}$, then $(Q\vec{x})^T(Q\vec{x}) = \vec{x}^T\vec{x} \iff \begin{Vmatrix} Q\vec{x} \end{Vmatrix}^2 = \begin{Vmatrix} \vec{x} \end{Vmatrix}^2 \iff \begin{Vmatrix} Q\vec{x} \end{Vmatrix} = \begin{Vmatrix} \vec{x} \end{Vmatrix}$

> Recall that to project a vector $\vec{b}$ onto the subspace spanned by the columns of $A$ (the columns are a basis for $C(A)$):
> $\begin{split}<\vec{e},\ A>\ =\ &<\vec{b}-A\vec{x},\ A>\ =\ 0 \\ \implies &A^T(\vec{b}-A\vec{x}) = 0 \\ \implies &A^T\vec{b} = A^TA\vec{x} \\ \implies &\hat{\vec{x}} = (A^TA)^{-1}A^T\vec{b},\ \vec{p} = A\hat{\vec{x}} = A(A^TA)^{-1}A^T\vec{b}\end{split}$

When we have an orthonormal basis of the subspace, $Q = [\vec{q_1}\ \vec{q_2}\ \cdots\ \vec{q_n}],\ Q^TQ = I$, then $\hat{\vec{x}} = Q^T\vec{b},\ \vec{p} = Q\hat{\vec{x}} = QQ^T\vec{b}$

- The projection of $\vec{b}$ onto a subspace is the sum of the projections of $\vec{b}$ onto each orthonormal vector $\vec{q_i}$ in the orthonormal basis of the subspace
$\begin{split}\vec{p} &= QQ^T\vec{b} = \left[\begin{array}{c c c c} \vec{q_1} & \vec{q_2} & \cdots & \vec{q_n} \\ \end{array}\right]\left[\begin{array}{c} \vec{q_1}^T \\ \vec{q_2}^T \\ \vdots \\ \vec{q_n}^T \\ \end{array}\right]\vec{b} = \left[\begin{array}{c c c c} \vec{q_1} & \vec{q_2} & \cdots & \vec{q_n} \\ \end{array}\right]\left[\begin{array}{c} \vec{q_1}^T\vec{b} \\ \vec{q_2}^T\vec{b} \\ \vdots \\ \vec{q_n}^T\vec{b} \\ \end{array}\right] \\ &=(\underbrace{\vec{q_1}^T\vec{b}}_\text{scalar})\vec{q_1} + (\vec{q_2}^T\vec{b})\vec{q_2} + \cdots + (\vec{q_n}^T\vec{b})\vec{q_n}\ (\vec{b} = \vec{p}+\vec{e},\ \vec{e} \perp \vec{q_i}) \\ &= (\vec{q_1}^T\vec{p})\vec{q_1} + (\vec{q_2}^T\vec{p})\vec{q_2} + \cdots + (\vec{q_n}^T\vec{p})\vec{q_n} \\ &= \vec{p_1} + \vec{p_2} + \cdots + \vec{p_n}\end{split}$
- Let $\vec{p} = \hat{x_1}\vec{q_1} + \hat{x_2}\vec{q_2} + \cdots + \hat{x_n}\vec{q_n}$
$\vec{q_i}^T\vec{p} = \vec{q_i}^T(\hat{x_1}\vec{q_1} + \hat{x_2}\vec{q_2} + \cdots + \hat{x_n}\vec{q_n}) = \hat{x_i}\vec{q_i}^T\vec{q_i}\ (\vec{q_i}^T\vec{q_j} = 0\ \forall\ j \not= i) \\ \therefore \vec{p}^T\vec{q_i} = \hat{x_i}\vec{q_i}^T\vec{q_i},\ \hat{x_i} = {\vec{p}^T\vec{q_i} \over \begin{Vmatrix} \vec{q_i} \end{Vmatrix}^2} \text{ where } \begin{Vmatrix} \vec{q_i} \end{Vmatrix}^2 = 1 \text{ for an orthonormal basis } \{ \vec{q_i} \}$
- Example
$\vec{q_1} = \left[\begin{array}{c} 1 \\ 0 \\ 0 \\ \end{array}\right],\ \vec{q_2} = \left[\begin{array}{c} 0 \\ 1 \\ 0 \\ \end{array}\right],\ \vec{b} = \left[\begin{array}{c} 4 \\ 3 \\ 2 \\ \end{array}\right] \\ \vec{p_1} = (\vec{q_1}^T\vec{b})\vec{q_1} = \left[\begin{array}{c} 1 \\ 0 \\ 0 \\ \end{array}\right]^T\left[\begin{array}{c} 4 \\ 3 \\ 2 \\ \end{array}\right]\left[\begin{array}{c} 1 \\ 0 \\ 0 \\ \end{array}\right] = \left[\begin{array}{c} 4 \\ 0 \\ 0 \\ \end{array}\right] \\ \vec{p_2} = (\vec{q_2}^T\vec{b})\vec{q_2} = \left[\begin{array}{c} 0 \\ 1 \\ 0 \\ \end{array}\right]^T\left[\begin{array}{c} 4 \\ 3 \\ 2 \\ \end{array}\right]\left[\begin{array}{c} 0 \\ 1 \\ 0 \\ \end{array}\right] = \left[\begin{array}{c} 0 \\ 3 \\ 0 \\ \end{array}\right] \\ \vec{p} = \vec{p_1} + \vec{p_2} = \left[\begin{array}{c} 4 \\ 3 \\ 0 \\ \end{array}\right] = \text{ Projection of } \vec{b} \text{ onto } span(\{ \vec{q_1},\ \vec{q_2}\})$
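A minimal NumPy sketch of this example: projecting $\vec{b}$ onto $span(\{\vec{q_1},\ \vec{q_2}\})$ as a sum of one-dimensional projections, and the same result from $\vec{p} = QQ^T\vec{b}$.

```python
import numpy as np

q1 = np.array([1.0, 0.0, 0.0])
q2 = np.array([0.0, 1.0, 0.0])
b  = np.array([4.0, 3.0, 2.0])

# Sum of one-dimensional projections onto each orthonormal vector
p = (q1 @ b) * q1 + (q2 @ b) * q2
print(p)                                  # [4. 3. 0.]

# Same result from p = Q Q^T b
Q = np.column_stack([q1, q2])
print(np.allclose(Q @ (Q.T @ b), p))      # True
```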
### The Gram-Schmidt Process

A systematic way to create an orthonormal basis from an arbitrary basis
Input: $\vec{v_1},\ \vec{v_2},\ \cdots,\ \vec{v_n}$ ($n$ non-zero vectors)
Output: $\vec{q_1},\ \vec{q_2},\ \cdots,\ \vec{q_r}\ (r = rank([\vec{v_1}\ \vec{v_2}\ \cdots\ \vec{v_n}]))$
Step 1: Let $\vec{e_1} = \vec{v_1},\ \vec{q_1} = { \vec{e_1} \over \begin{Vmatrix} \vec{e_1} \end{Vmatrix}}$
Step 2+: $\underbrace{\vec{e_k} = \vec{v_k} - \sum\limits_{i = 1}^{k-1}(\vec{q_i}^T\vec{v_k})\vec{q_i}}_\text{orthogonalization},\ \underbrace{\vec{q_k} = { \vec{e_k} \over \begin{Vmatrix} \vec{e_k} \end{Vmatrix}}}_\text{normalization}$ (if $\vec{e_k} = \vec{0}$, then $\vec{v_k}$ depends on the previous vectors and is skipped, which is why only $r$ vectors are produced)

- Example
$\vec{a} = \left[\begin{array}{c} 1 \\ -1 \\ 0 \\ \end{array}\right],\ \vec{b} = \left[\begin{array}{c} 2 \\ 0 \\ -2 \\ \end{array}\right],\ \vec{c} = \left[\begin{array}{c} 3 \\ -3 \\ 3 \\ \end{array}\right]$
1. Let $\vec{q_1} = { \vec{a} \over \begin{Vmatrix} \vec{a} \end{Vmatrix}} = {1 \over \sqrt{2}}\left[\begin{array}{c} 1 \\ -1 \\ 0 \\ \end{array}\right]$
2. Let $\vec{e_2} = \vec{b} - (\text{projection of } \vec{b} \text{ onto } \vec{q_1}) = \vec{b} - (\vec{q_1}^T\vec{b})\vec{q_1} = \left[\begin{array}{c} 1 \\ 1 \\ -2 \\ \end{array}\right],\ \vec{q_2} = { \vec{e_2} \over \begin{Vmatrix} \vec{e_2} \end{Vmatrix}} = {1 \over \sqrt{6}}\left[\begin{array}{c} 1 \\ 1 \\ -2 \\ \end{array}\right]$
3. Let $\vec{e_3} = \vec{c} - (\vec{q_1}^T\vec{c})\vec{q_1} - (\vec{q_2}^T\vec{c})\vec{q_2} = \left[\begin{array}{c} 1 \\ 1 \\ 1 \\ \end{array}\right],\ \vec{q_3} = { \vec{e_3} \over \begin{Vmatrix} \vec{e_3} \end{Vmatrix}} = {1 \over \sqrt{3}}\left[\begin{array}{c} 1 \\ 1 \\ 1 \\ \end{array}\right]$
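A small NumPy sketch of the process as stated above; the function name `gram_schmidt` and the tolerance are my own choices, not from the notes. Running it on $\vec{a},\ \vec{b},\ \vec{c}$ reproduces $\vec{q_1},\ \vec{q_2},\ \vec{q_3}$.

```python
import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    """Orthogonalize then normalize a list of vectors; dependent vectors are skipped."""
    qs = []
    for v in vectors:
        e = v.astype(float)
        for q in qs:
            e = e - (q @ v) * q          # subtract the projection of v onto each earlier q
        if np.linalg.norm(e) > tol:      # keep only independent directions
            qs.append(e / np.linalg.norm(e))
    return qs

a = np.array([1.0, -1.0,  0.0])
b = np.array([2.0,  0.0, -2.0])
c = np.array([3.0, -3.0,  3.0])

for q in gram_schmidt([a, b, c]):
    print(q)
# q1 = [1, -1, 0]/sqrt(2),  q2 = [1, 1, -2]/sqrt(6),  q3 = [1, 1, 1]/sqrt(3)
```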
### QR Factorization (Decomposition)

$\{\vec{v_1},\ \vec{v_2},\ \cdots,\ \vec{v_n}\} \text{ (linearly independent) } \xrightarrow{Gram-Schmidt\ Process} \{\vec{q_1},\ \vec{q_2},\ \cdots,\ \vec{q_n}\}$
Express $\vec{v_k}$ by the newly obtained orthonormal basis $\{\vec{q_1},\ \vec{q_2},\ \cdots,\ \vec{q_n}\}$
$\vec{v_1} = (\vec{q_1}^T\vec{v_1})\vec{q_1} \\ \vec{v_2} = (\vec{q_2}^T\vec{v_2})\vec{q_2} + (\vec{q_1}^T\vec{v_2})\vec{q_1} \\ \ \vdots \\ \vec{v_k} = (\vec{q_1}^T\vec{v_k})\vec{q_1} + \cdots + (\vec{q_k}^T\vec{v_k})\vec{q_k}$
$\begin{split}\text{Let } A &= \left[\begin{array}{c c c c} \vec{v_1} & \vec{v_2} & \cdots & \vec{v_k} \\ \end{array}\right] \\ &= \left[\begin{array}{c c c c} (\vec{q_1}^T\vec{v_1})\vec{q_1} & (\vec{q_1}^T\vec{v_2})\vec{q_1} & & (\vec{q_1}^T\vec{v_k})\vec{q_1} \\ & + (\vec{q_2}^T\vec{v_2})\vec{q_2} & \cdots & + (\vec{q_2}^T\vec{v_k})\vec{q_2} \\ & & & + \cdots \\ & & & + (\vec{q_k}^T\vec{v_k})\vec{q_k} \\ \end{array}\right] \\ &= \left[\begin{array}{c c c c} \vec{q_1} & \vec{q_2} & \cdots & \vec{q_k} \\ \end{array}\right]\left[\begin{array}{c c c c} \vec{q_1}^T\vec{v_1} & \vec{q_1}^T\vec{v_2} & \cdots & \vec{q_1}^T\vec{v_k} \\ 0 & \vec{q_2}^T\vec{v_2} & \cdots & \vec{q_2}^T\vec{v_k} \\ \vdots & 0 & & \vdots \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \vec{q_k}^T\vec{v_k} \\ \end{array}\right] = QR\end{split}$

$A = QR$; consider the least squares approximation $A^TA\vec{\hat{x}} = A^T\vec{b}$ where $A=QR$
$\begin{split}&\iff (QR)^T(QR)\vec{\hat{x}} = (QR)^T\vec{b} \\ &\iff R^T\color{red}{Q^TQ}R\vec{\hat{x}} = R^TQ^T\vec{b} \\ &\iff R^TR\vec{\hat{x}} = R^TQ^T\vec{b} \\ &\iff (R^T)^{-1}R^TR\vec{\hat{x}} = (R^T)^{-1}R^TQ^T\vec{b} \\ &\iff R\vec{\hat{x}} = Q^T\vec{b} \\ &\therefore \text{ we can solve } \vec{\hat{x}} \text{ by back substitution using } R\end{split}$
- Example
$A = \left[\begin{array}{c c c} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \\ \end{array}\right] \implies \vec{v_1} = \left[\begin{array}{c} 1 \\ 1 \\ 0 \\ \end{array}\right],\ \vec{v_2} = \left[\begin{array}{c} 1 \\ 0 \\ 1 \\ \end{array}\right],\ \vec{v_3} = \left[\begin{array}{c} 0 \\ 1 \\ 1 \\ \end{array}\right] \\ \vec{q_1} = { \vec{v_1} \over \begin{Vmatrix} \vec{v_1} \end{Vmatrix}} = {1 \over \sqrt{2}}\left[\begin{array}{c} 1 \\ 1 \\ 0 \\ \end{array}\right] \\ \vec{e_2} = \vec{v_2} - (\vec{q_1}^T\vec{v_2})\vec{q_1} = \left[\begin{array}{c} {1 \over 2} \\ {-1 \over 2} \\ 1 \\ \end{array}\right],\ \vec{q_2} = { \vec{e_2} \over \begin{Vmatrix} \vec{e_2} \end{Vmatrix}} = \left[\begin{array}{c} {1 \over \sqrt{6}} \\ {-1 \over \sqrt{6}} \\ {2 \over \sqrt{6}} \\ \end{array}\right] \\ \vec{e_3} = \vec{v_3} - (\vec{q_1}^T\vec{v_3})\vec{q_1} - (\vec{q_2}^T\vec{v_3})\vec{q_2} = \left[\begin{array}{c} {-2 \over 3} \\ {2 \over 3} \\ {2 \over 3} \\ \end{array}\right],\ \vec{q_3} = { \vec{e_3} \over \begin{Vmatrix} \vec{e_3} \end{Vmatrix}} = \left[\begin{array}{c} {-1 \over \sqrt{3}} \\ {1 \over \sqrt{3}} \\ {1 \over \sqrt{3}} \\ \end{array}\right] \\ \therefore Q = \left[\begin{array}{c c c} \vec{q_1} & \vec{q_2} & \vec{q_3} \\ \end{array}\right] = \left[\begin{array}{c c c} {1 \over \sqrt{2}} & {1 \over \sqrt{6}} & {-1 \over \sqrt{3}} \\ {1 \over \sqrt{2}} & {-1 \over \sqrt{6}} & {1 \over \sqrt{3}} \\ 0 & {2 \over \sqrt{6}} & {1 \over \sqrt{3}} \\ \end{array}\right],\ \begin{split}R &= \left[\begin{array}{c c c} \vec{q_1}^T\vec{v_1} & \vec{q_1}^T\vec{v_2} & \vec{q_1}^T\vec{v_3} \\ 0 & \vec{q_2}^T\vec{v_2} & \vec{q_2}^T\vec{v_3} \\ 0 & 0 & \vec{q_3}^T\vec{v_3}\\ \end{array}\right] \\ &= \left[\begin{array}{c c c} \sqrt{2} & {1 \over \sqrt{2}} & {1 \over \sqrt{2}} \\ 0 & {3 \over \sqrt{6}} & {1 \over \sqrt{6}} \\ 0 & 0 & {2 \over \sqrt{3}}\\ \end{array}\right]\end{split}$
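The factorization and the least squares recipe $R\hat{\vec{x}} = Q^T\vec{b}$ can be checked with NumPy. In the sketch below, `np.linalg.qr` may flip the signs of some columns relative to the hand computation, and the right-hand side $\vec{b}$ is an arbitrary vector chosen for illustration.

```python
import numpy as np

A = np.array([[1., 1., 0.],
              [1., 0., 1.],
              [0., 1., 1.]])
b = np.array([1., 2., 3.])        # arbitrary right-hand side for illustration

Q, R = np.linalg.qr(A)            # A = QR; column signs may differ from the hand computation
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))

# Solve R x_hat = Q^T b by back substitution (R is upper triangular)
y = Q.T @ b
x_hat = np.zeros_like(y)
for i in reversed(range(len(y))):
    x_hat[i] = (y[i] - R[i, i+1:] @ x_hat[i+1:]) / R[i, i]

print(np.allclose(A @ x_hat, b))  # True here, since this A happens to be invertible
```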
## Determinants (2021.11.19)

- Example
Determinant of a $2 \times 2$ matrix: $A = \left[\begin{array}{c c} a & b \\ c & d \\ \end{array}\right],\ det(A) = \begin{vmatrix} A \end{vmatrix} = ad-bc$
- Properties
1. $\begin{vmatrix} I_n \end{vmatrix} = 1$
2. Exchange of two rows leads to a change of sign
$A = \left[\begin{array}{c c} a & b \\ c & d \\ \end{array}\right],\ A' = \left[\begin{array}{c c} c & d \\ a & b \\ \end{array}\right],\ \begin{vmatrix} A' \end{vmatrix} = bc-ad = -(ad-bc) = -\begin{vmatrix} A \end{vmatrix}$
3. The determinant is a linear function of each row separately
(scalar multiplication) $A' = \left[\begin{array}{c c} ta & tb \\ c & d \\ \end{array}\right] \implies \begin{vmatrix} A' \end{vmatrix} = t(ad-bc) = t\begin{vmatrix} A \end{vmatrix}$
(vector addition) $\begin{vmatrix} A'' \end{vmatrix} = \begin{vmatrix} \left[\begin{array}{c c} a+a' & b+b' \\ c & d \\ \end{array}\right] \end{vmatrix} = \begin{vmatrix} A \end{vmatrix}+\begin{vmatrix} A' \end{vmatrix} \text{ where } A' = \left[\begin{array}{c c} a' & b' \\ c & d \\ \end{array}\right]$
4. If two rows of $A$ are equal, $\begin{vmatrix} A \end{vmatrix} = 0$
   - Proof
   Suppose two rows $\vec{a_i}^T$ and $\vec{a_j}^T$ of $A$ are equal. Let $A'$ be the matrix $A$ with $\vec{a_i}^T$ and $\vec{a_j}^T$ exchanged.
   From 2, $det(A') = -det(A) ... ①$; since $A'=A,\ det(A') = det(A) ... ②$
   From ① and ②, $det(A) = -det(A) \implies det(A) = 0$
5. Subtracting a multiple of one row from another row leaves $\begin{vmatrix} A \end{vmatrix}$ unchanged
$\begin{split}\begin{vmatrix} A' \end{vmatrix} &= \begin{vmatrix} \left[\begin{array}{c c} a & b \\ c-ka & d-kb \\ \end{array}\right] \end{vmatrix} \\ &= \begin{vmatrix} \left[\begin{array}{c c} a & b \\ c & d \\ \end{array}\right] \end{vmatrix} + \begin{vmatrix} \left[\begin{array}{c c} a & b \\ -ka & -kb \\ \end{array}\right] \end{vmatrix} \\ &= \begin{vmatrix} A \end{vmatrix} + (-k)\begin{vmatrix} \left[\begin{array}{c c} a & b \\ a & b \\ \end{array}\right] \end{vmatrix} \\ &= \begin{vmatrix} A \end{vmatrix} + 0 = \begin{vmatrix} A \end{vmatrix}\end{split}$
6. A matrix with a row of zeros has $det(A) = 0$
$A = \left[\begin{array}{c c} a & b \\ 0 & 0 \\ \end{array}\right],\ A' = \left[\begin{array}{c c} a & b \\ 0+a & 0+b \\ \end{array}\right]$ (add the first row to the zero row, a special case of 5)
From 4, $\begin{vmatrix} A' \end{vmatrix} = 0$; from 5, $\begin{vmatrix} A \end{vmatrix} = \begin{vmatrix} A' \end{vmatrix} = 0$
7. If $A$ is triangular, then $\begin{vmatrix} A \end{vmatrix} =$ product of the diagonal elements
$A = \left[\begin{array}{c c c c} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{array}\right] \xrightarrow{Jordan\ elimination} D = \left[\begin{array}{c c c c} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{array}\right] \text{ (diagonal)}$
From 5, $\begin{vmatrix} D \end{vmatrix} = \begin{vmatrix} A \end{vmatrix}$; from 3 and 1, $\begin{vmatrix} D \end{vmatrix} = a_{11}a_{22}\cdots a_{nn}\begin{vmatrix} I_n \end{vmatrix} =$ product of the diagonal elements
8. If $A$ is singular, then $\begin{vmatrix} A \end{vmatrix} = 0$; if $A$ is invertible, $\begin{vmatrix} A \end{vmatrix} \not= 0$
   - Define
   A singular matrix does not have a matrix inverse
   $A$ is singular $\implies$ $A$ does not have full rank $\implies$ $A$ contains an all-zero row after Gauss-Jordan Elimination
   Note that Gaussian Elimination converts $A$ to an upper triangular matrix $U$ using row operations that do not change the determinant and row exchanges that only change its sign
   $\therefore \begin{vmatrix} U \end{vmatrix} = (-1)^k\begin{vmatrix} A \end{vmatrix}$ where $k$ = the number of row exchanges
   If $A$ is singular, $U$ has an all-zero row, so from 6, $\begin{vmatrix} A \end{vmatrix} = {1 \over (-1)^k}\begin{vmatrix} U \end{vmatrix} = 0$
   If $A$ is invertible, $\begin{vmatrix} U \end{vmatrix} =$ product of the diagonal elements of $U$, which are the non-zero pivots $\implies \begin{vmatrix} A \end{vmatrix} = {1 \over (-1)^k}\begin{vmatrix} U \end{vmatrix} \not= 0$
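The argument in property 8 is essentially an algorithm: eliminate $A$ to an upper triangular $U$, count the row exchanges $k$, and read off $\begin{vmatrix} A \end{vmatrix} = (-1)^k \times$ (product of the diagonal of $U$). Below is a minimal sketch of that idea (the helper `det_by_elimination` and its tolerance are my own choices, not from the notes).

```python
import numpy as np

def det_by_elimination(A, tol=1e-12):
    """det(A) = (-1)^k * prod(diagonal of U) after Gaussian elimination with k row exchanges."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    sign = 1.0
    for j in range(n):
        pivot = j + np.argmax(np.abs(U[j:, j]))          # partial pivoting
        if abs(U[pivot, j]) < tol:
            return 0.0                                   # no usable pivot: A is singular
        if pivot != j:
            U[[j, pivot]] = U[[pivot, j]]                # row exchange flips the sign (property 2)
            sign = -sign
        U[j+1:] -= np.outer(U[j+1:, j] / U[j, j], U[j])  # property 5: det unchanged
    return sign * np.prod(np.diag(U))                    # property 7

A = np.array([[1., 1., 0.],
              [1., 0., 1.],
              [0., 1., 1.]])
print(det_by_elimination(A), np.linalg.det(A))           # both -2.0 (up to rounding)
```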