---
title: Linear Algebra Note 4
tags: Linear Algebra, 線性代數, 魏群樹, 大學, 國立陽明交通大學, 筆記
---
# Linear Algebra Note 4
## Orthogonality (2021.11.10 ~ 2021.11.17)
- Example
$A = \left[\begin{array}{c c}
1 & 0 \\
1 & 1 \\
1 & 2 \\
\end{array}\right],\ \vec{b} = \left[\begin{array}{c}
6 \\
0 \\
0 \\
\end{array}\right],\ P = A(A^TA)^{-1}A^T \\
\text{where } A^TA = \left[\begin{array}{c c c}
1 & 1 & 1 \\
0 & 1 & 2 \\
\end{array}\right]\left[\begin{array}{c c}
1 & 0 \\
1 & 1 \\
1 & 2 \\
\end{array}\right] = \left[\begin{array}{c c}
3 & 3 \\
3 & 5 \\
\end{array}\right] \\
A^TA\hat{\vec{x}} = A^T\vec{b} \implies \left[\begin{array}{c c}
3 & 3 \\
3 & 5 \\
\end{array}\right]\left[\begin{array}{c}
\hat{x_1} \\
\hat{x_2} \\
\end{array}\right] = \left[\begin{array}{c c c}
1 & 1 & 1 \\
0 & 1 & 2 \\
\end{array}\right]\left[\begin{array}{c}
6 \\
0 \\
0 \\
\end{array}\right] = \left[\begin{array}{c}
6 \\
0 \\
\end{array}\right] \implies \hat{\vec{x}} = \left[\begin{array}{c}
\hat{x_1} \\
\hat{x_2} \\
\end{array}\right] = \left[\begin{array}{c}
5 \\
-3 \\
\end{array}\right] \\
\vec{p} = A\hat{\vec{x}} = \left[\begin{array}{c c}
1 & 0 \\
1 & 1 \\
1 & 2 \\
\end{array}\right]\left[\begin{array}{c}
5 \\
-3 \\
\end{array}\right] = \left[\begin{array}{c}
5 \\
2 \\
-1 \\
\end{array}\right] \\
\vec{e} = \vec{b} - \vec{p} = \left[\begin{array}{c}
1 \\
-2 \\
1 \\
\end{array}\right],\ \text{check } \vec{e}^T\vec{p} = \left[\begin{array}{c}
1 \\
-2 \\
1 \\
\end{array}\right]^T\left[\begin{array}{c}
5 \\
2 \\
-1 \\
\end{array}\right] = 0,\ \text{check } \vec{e}^TA = \left[\begin{array}{c}
1 \\
-2 \\
1 \\
\end{array}\right]^T\left[\begin{array}{c c}
1 & 0 \\
1 & 1 \\
1 & 2 \\
\end{array}\right] = 0\\
\text{Alternatively, } P = A(A^TA)^{-1}A^T \text{ where } (A^TA)^{-1} = {1 \over 6}\left[\begin{array}{c c}
5 & -3 \\
-3 & 3 \\
\end{array}\right] \implies P = {1 \over 6}\left[\begin{array}{c c c}
5 & 2 & -1 \\
2 & 2 & 2 \\
-1 & 2 & 5 \\
\end{array}\right] \\
\vec{p} = P\vec{b} = {1 \over 6}\left[\begin{array}{c c c}
5 & 2 & -1 \\
2 & 2 & 2 \\
-1 & 2 & 5 \\
\end{array}\right]\left[\begin{array}{c}
6 \\
0 \\
0 \\
\end{array}\right] = \left[\begin{array}{c}
5 \\
2 \\
-1 \\
\end{array}\right]$
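A quick numerical check of this example (a NumPy sketch; variable names mirror the notes):

```python
import numpy as np

# Numerical check of the example above (variable names mirror the notes).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations: A^T A x = A^T b
p = A @ x_hat                               # projection of b onto C(A)
e = b - p                                   # error vector

P = A @ np.linalg.inv(A.T @ A) @ A.T        # projection matrix onto C(A)

print(x_hat)            # [ 5. -3.]
print(p)                # [ 5.  2. -1.]
print(e)                # [ 1. -2.  1.]
print(e @ p)            # ~0        (e is orthogonal to p)
print(e @ A)            # ~[0, 0]   (e is orthogonal to every column of A)
print(np.round(6 * P))  # [[ 5  2 -1], [ 2  2  2], [-1  2  5]]
```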
- Theorem
$P^2 = P$
- Intuition
$P\vec{b} = \vec{p},\ P(P\vec{b}) = P^2\vec{b} = \vec{p}$
- Proof
$P = A(A^TA)^{-1}A^T \\
\begin{split}P^2 = PP &= (A(A^TA)^{-1}A^T)(A(A^TA)^{-1}A^T) \\
&= A(A^TA)^{-1}\color{red}{A^TA(A^TA)^{-1}}A^T \\
&= A(A^TA)^{-1}A^T = P\end{split}$
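A small numerical sanity check of $P^2 = P$ (a sketch with a randomly generated tall matrix $A$, used only for illustration):

```python
import numpy as np

# Sanity check of P^2 = P for a randomly generated tall matrix A (illustrative only).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))            # 5x2, columns independent with probability 1

P = A @ np.linalg.inv(A.T @ A) @ A.T       # projection matrix onto C(A)
print(np.allclose(P @ P, P))               # True (up to floating-point error)
```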
### Least Squares Approximations
In many practical cases, $A\vec{x} = \vec{b}$ has no solution, e.g. when $A$ is a tall matrix (more equations than unknowns)
Let $\vec{e} \triangleq A\vec{x} - \vec{b}$ . The best approximation minimizes the size of $\vec{e}$, i.e. minimizes $\begin{Vmatrix}
\vec{e}
\end{Vmatrix}^2$ ; this is the least squares approximation
- Example (Linear fit)
Find the closest line to points $(t,\ b) = (0,\ 6),\ (1,\ 0),\ (2,\ 0)$
Let the line: $b = x_0 + x_1t$
$\left.\begin{array}{c}
x_0 + x_1 \cdot 0 = 6 \\
x_0 + x_1 \cdot 1 = 0 \\
x_0 + x_1 \cdot 2 = 0 \\
\end{array}\right. \implies \left[\begin{array}{c c}
1 & 0 \\
1 & 1 \\
1 & 2 \\
\end{array}\right]\left[\begin{array}{c}
x_0 \\
x_1 \\
\end{array}\right] = \left[\begin{array}{c}
6 \\
0 \\
0 \\
\end{array}\right]\ (A\vec{x} = \vec{b})$
Let $\vec{e} = A\vec{x} - \vec{b}$ and minimize $\begin{Vmatrix}
\vec{e}
\end{Vmatrix}^2$
$\vec{e} \perp C(A) \iff A^T\vec{e} = \vec{0} \iff A^T(A\hat{\vec{x}} - \vec{b}) = \vec{0} \\
\begin{split} \hat{\vec{x}} &= (A^TA)^{-1}A^T\vec{b} \\
&= (\left[\begin{array}{c c c}
1 & 1 & 1 \\
0 & 1 & 2 \\
\end{array}\right]\left[\begin{array}{c c}
1 & 0 \\
1 & 1 \\
1 & 2 \\
\end{array}\right])^{-1}\left[\begin{array}{c c c}
1 & 1 & 1 \\
0 & 1 & 2 \\
\end{array}\right]\left[\begin{array}{c}
6 \\
0 \\
0 \\
\end{array}\right] = \left[\begin{array}{c}
5 \\
-3 \\
\end{array}\right]\end{split} \\
\therefore b = 5 - 3t \text{ is the best linear fit}$
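The same fit can be reproduced numerically (a NumPy sketch of the normal-equations recipe; `np.linalg.lstsq` is used only as a cross-check):

```python
import numpy as np

# Reproduce the line fit numerically: columns of A are [1, t] for b = x0 + x1*t.
t = np.array([0.0, 1.0, 2.0])
b = np.array([6.0, 0.0, 0.0])
A = np.column_stack([np.ones_like(t), t])

x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # normal equations
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # library cross-check

print(x_normal)   # [ 5. -3.]  ->  b = 5 - 3t
print(x_lstsq)    # same result
```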
- Theorem (Orthogonality principle)
Let $\vec{e} = A\vec{x} - \vec{b}$ and let $\hat{\vec{x}}$ be the solution to min $\begin{Vmatrix}
\vec{e}
\end{Vmatrix}^2$ = min $\begin{Vmatrix}
A\vec{x} - \vec{b}
\end{Vmatrix}^2$ , then $\vec{e} \perp C(A)$ and $A\hat{\vec{x}} = P\vec{b} = \vec{p}$ where $P$ is the projection matrix onto $C(A)$ and $\hat{\vec{x}}$ is the solution to $A^TA\hat{\vec{x}} = A^T\vec{b}$
- Proof
Let $\vec{b} = \vec{p} + \vec{e}$ where $\vec{p} \in C(A)$ and $\vec{e} \in N(A^T)$
while $A\vec{x} = \vec{b}$ may **NOT** be solvable, $A\vec{x} = \vec{p}$ is always solvable as $\vec{p} \in C(A)$
The objective function (to be minimized) is
$\begin{split}\begin{Vmatrix}
A\vec{x} - \vec{b}
\end{Vmatrix}^2 &= \begin{Vmatrix}
A\vec{x} - \vec{p} - \vec{e}
\end{Vmatrix}^2 \\
&= (A\vec{x} - \vec{p} - \vec{e})^T(A\vec{x} - \vec{p} - \vec{e}) \\
&= \begin{Vmatrix}
A\vec{x} - \vec{p}
\end{Vmatrix}^2 + \begin{Vmatrix}
\vec{e}
\end{Vmatrix}^2 - (A\vec{x}-\vec{p})^T\vec{e} - \vec{e}^T(A\vec{x}-\vec{p}) \\
&= \begin{Vmatrix}
A\vec{x} - \vec{p}
\end{Vmatrix}^2 + \begin{Vmatrix}
\vec{e}
\end{Vmatrix}^2 - 0 - 0\ (\because N(A^T) \perp C(A)) \\
&= \begin{Vmatrix}
A\vec{x} - \vec{p}
\end{Vmatrix}^2 + \begin{Vmatrix}
\vec{e}
\end{Vmatrix}^2\end{split}$
- Goal:
min $\begin{Vmatrix}
A\vec{x}-\vec{b}
\end{Vmatrix}^2$ = min \[ $\begin{Vmatrix}
A\vec{x} - \vec{p}
\end{Vmatrix}^2 + \begin{Vmatrix}
\vec{e}
\end{Vmatrix}^2$ \] = min $\begin{Vmatrix}
A\vec{x} - \vec{p}
\end{Vmatrix}^2$ + $\begin{Vmatrix}
\vec{e}
\end{Vmatrix}^2 \ge \begin{Vmatrix}
\vec{e}
\end{Vmatrix}^2$ since $\begin{Vmatrix}
\vec{e}
\end{Vmatrix}^2$ does not depend on $\vec{x}$
we attain equality by choosing $\vec{x} = \hat{\vec{x}}$ so that $A\hat{\vec{x}} = P\vec{b} = \vec{p}$ , hence the least squares error is $\begin{Vmatrix}
\vec{e}
\end{Vmatrix}^2$
- Summary:
When $A\vec{x}=\vec{b}$ has **NO** solution, multiply by $A^T$ and solve $A^TA\hat{\vec{x}}=A^T\vec{b}$ instead. The solution $\hat{\vec{x}}$ of this alternative system is the least squares approximation to $A\vec{x} = \vec{b}$

### Fitting by a parabola
Given a quadratic equation $b = x_0 + x_1t + x_2t^2$ with observations $(b,\ t)=(b_1,\ t_1),\ \cdots,\ (b_m,\ t_m)$
with $m > 3$ points, the $m$ equations for an exact fit are usually unsolvable
$\left\{\begin{array}{c}
x_0 + x_1t_1 + x_2t^2_1 = b_1 \\
x_0 + x_1t_2 + x_2t^2_2 = b_2 \\
\vdots \\
x_0 + x_1t_m + x_2t^2_m = b_m \\
\end{array}\right. \implies A = \left[\begin{array}{c c c}
1 & t_1 & t_1^2 \\
\vdots & \vdots & \vdots \\
1 & t_m & t_m^2 \\
\end{array}\right]_{m \times 3},\ A\left[\begin{array}{c}
x_0 \\
x_1 \\
x_2 \\
\end{array}\right] = \left[\begin{array}{c}
b_1 \\
b_2 \\
\vdots \\
b_m \\
\end{array}\right]\ (A\vec{x} = \vec{b})$
Least Squares: solve $A^TA\hat{\vec{x}} = A^T\vec{b}$
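A minimal NumPy sketch of this parabola fit; the sample data below is made up purely for illustration (roughly $b = 1 + t + t^2$ plus noise):

```python
import numpy as np

# Parabola fit sketch with m = 5 made-up observations (roughly b = 1 + t + t^2 plus noise).
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
b = np.array([1.1, 2.9, 7.2, 12.8, 21.1])

A = np.column_stack([np.ones_like(t), t, t**2])   # m x 3 matrix with rows [1, t_i, t_i^2]

x_hat = np.linalg.solve(A.T @ A, A.T @ b)         # solve A^T A x = A^T b
print(x_hat)                                      # close to [1, 1, 1]
print(np.linalg.norm(A @ x_hat - b))              # small residual ||e||
```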
### Orthogonal Bases
A vector space can be spanned by different bases. A basis whose vectors are mutually orthogonal is called an orthogonal basis.
Recall that vectors $\vec{q_1},\ \vec{q_2},\ \cdots,\ \vec{q_n}$ are orthogonal if
$<\vec{q_i},\ \vec{q_j}> = \vec{q_i}^T\vec{q_j} = \left\{\begin{array}{c}
0\ ,\ i \not= j \\
\begin{Vmatrix}
\vec{q_i}
\end{Vmatrix}^2 = \begin{Vmatrix}
\vec{q_j}
\end{Vmatrix}^2,\ i = j \\
\end{array}\right.$
- Define (Orthonormal): The vectors $\vec{q_1},\ \vec{q_2},\ \cdots,\ \vec{q_n}$ are orthonormal (orthogonal and normalized) if
$<\vec{q_i},\ \vec{q_j}> = \vec{q_i}^T\vec{q_j} = \left\{\begin{array}{c}
0\ ,\ i \not= j \\
1\ ,\ i = j \\
\end{array}\right.$ (normalized to unit vector)
- Theorem
Let $Q = [\vec{q_1}\ \vec{q_2}\ \cdots\ \vec{q_n}]$ where $\vec{q_1},\ \vec{q_2},\ \cdots,\ \vec{q_n}$ are orthonormal. Then $Q^TQ = I$ . Moreover, if $Q$ is square, then $Q^{-1} = Q^T$
- Proof
$\begin{split}Q^TQ = \left[\begin{array}{c}
\vec{q_1}^T \\
\vec{q_2}^T \\
\vdots \\
\vec{q_n}^T \\
\end{array}\right]\left[\begin{array}{c c c c}
\vec{q_1} & \vec{q_2} & \cdots & \vec{q_n} \\
\end{array}\right] &= \left[\begin{array}{c c c c}
\vec{q_1}^T\vec{q_1} & \vec{q_1}^T\vec{q_2} & \cdots & \vec{q_1}^T\vec{q_n} \\
\vec{q_2}^T\vec{q_1} & \vec{q_2}^T\vec{q_2} & \cdots & \vec{q_2}^T\vec{q_n} \\
\vdots & \vdots & \ddots & \vdots \\
\vec{q_n}^T\vec{q_1} & \vec{q_n}^T\vec{q_2} & \cdots & \vec{q_n}^T\vec{q_n} \\
\end{array}\right] \\
&= \left[\begin{array}{c c c c}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1 \\
\end{array}\right] = I\end{split}$
If $Q$ is square $\implies$ size $= n \times n$ and $Q^TQ = I \implies Q^{-1} = Q^T$
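A quick numerical illustration (a sketch): take any matrix with orthonormal columns, e.g. the $Q$ from `np.linalg.qr` of a random square matrix, and check $Q^TQ = I$ and $Q^{-1} = Q^T$:

```python
import numpy as np

# Check Q^T Q = I and Q^{-1} = Q^T for a matrix with orthonormal columns.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # Q has orthonormal columns

print(np.allclose(Q.T @ Q, np.eye(4)))            # True
print(np.allclose(np.linalg.inv(Q), Q.T))         # True: square Q, so Q^{-1} = Q^T
```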
- Example
Permutation matrix $Q = \left[\begin{array}{c c c}
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 0 & 0 \\
\end{array}\right]$ is orthogonal
- Check
$Q^TQ = \left[\begin{array}{c c c}
0 & 0 & 1 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
\end{array}\right]\left[\begin{array}{c c c}
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 0 & 0 \\
\end{array}\right] = \left[\begin{array}{c c c}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
\end{array}\right] = I$
- Example
Rotation matrix $Q = \left[\begin{array}{c c}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta \\
\end{array}\right]$ rotates a vector counterclockwise by $\theta$ and $Q$ is orthogonal
- Check
$Q^TQ = \left[\begin{array}{c c}
\cos\theta & \sin\theta \\
-\sin\theta & \cos\theta \\
\end{array}\right]\left[\begin{array}{c c}
\cos\theta & -\sin\theta \\
\sin\theta & \cos\theta \\
\end{array}\right] = \left[\begin{array}{c c}
1 & 0 \\
0 & 1 \\
\end{array}\right] = I$
- Theorem
A matrix $Q$ with orthonormal columns preserves the length of a vector, i.e. $\begin{Vmatrix}
Q\vec{x}
\end{Vmatrix} = \begin{Vmatrix}
\vec{x}
\end{Vmatrix}\ \forall\ \vec{x} \in \mathbb{R}^n$
Moreover, it preserves dot products, i.e. $(Q\vec{x})^T(Q\vec{y}) = \vec{x}^T\vec{y}$
- Proof
$(Q\vec{x})^T(Q\vec{y}) = \vec{x}^T\color{red}{Q^TQ}\vec{y} = \vec{x}^T\color{red}{I}\vec{y} = \vec{x}^T\vec{y}$
Let $\vec{y} = \vec{x},\ (Q\vec{x})^T(Q\vec{x}) = \vec{x}^T\vec{x} \iff \begin{Vmatrix}
Q\vec{x}
\end{Vmatrix}^2 = \begin{Vmatrix}
\vec{x}
\end{Vmatrix}^2 \iff \begin{Vmatrix}
Q\vec{x}
\end{Vmatrix} = \begin{Vmatrix}
\vec{x}
\end{Vmatrix}$
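A numerical check of this theorem using the rotation matrix from the earlier example (a sketch; the angle is arbitrary):

```python
import numpy as np

# Length and dot-product preservation, checked with a rotation matrix (arbitrary angle).
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([3.0, -1.0])
y = np.array([2.0,  5.0])

print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # True: ||Qx|| = ||x||
print(np.isclose((Q @ x) @ (Q @ y), x @ y))                  # True: (Qx)^T (Qy) = x^T y
```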
> Recall that when projecting a vector $\vec{b}$ onto the subspace spanned by the columns of $A$ (the columns form a basis for $C(A)$):
$\begin{split}<\vec{e},\ A>\ =\ &<\vec{b}-A\vec{x},\ A>\ =\ 0 \\
\implies &A^T(\vec{b}-A\vec{x}) = 0 \\
\implies &A^T\vec{b} = A^TA\vec{x} \\
\implies &\hat{\vec{x}} = (A^TA)^{-1}A^T\vec{b},\ \vec{p} = A\hat{\vec{x}} = A(A^TA)^{-1}A^T\vec{b}\end{split}$
When we have an orthonormal basis of the subspace, $Q = [\vec{q_1}\ \vec{q_2}\ \cdots\ \vec{q_n}]$ satisfies $Q^TQ = I$
Then $\hat{\vec{x}} = Q^T\vec{b},\ \vec{p} = Q\hat{\vec{x}} = QQ^T\vec{b}$
- Projection of $\vec{b}$ onto a subspace is the sum of the projection of $\vec{b}$ onto each orthonormal vector $\vec{q_i}$ in the orthonormal basis of the subspace
$\begin{split}\vec{p} &= QQ^T\vec{b} = \left[\begin{array}{c c c c}
\vec{q_1} & \vec{q_2} & \cdots & \vec{q_n} \\
\end{array}\right]\left[\begin{array}{c}
\vec{q_1}^T \\
\vec{q_2}^T \\
\vdots \\
\vec{q_n}^T \\
\end{array}\right]\vec{b} = \left[\begin{array}{c c c c}
\vec{q_1} & \vec{q_2} & \cdots & \vec{q_n} \\
\end{array}\right]\left[\begin{array}{c}
\vec{q_1}^T\vec{b} \\
\vec{q_2}^T\vec{b} \\
\vdots \\
\vec{q_n}^T\vec{b} \\
\end{array}\right] \\
&=(\underbrace{\vec{q_1}^T\vec{b}}_\text{scalar})\vec{q_1} + (\vec{q_2}^T\vec{b})\vec{q_2} + \cdots + (\vec{q_n}^T\vec{b})\vec{q_n}\ (\vec{b} = \vec{p}+\vec{e},\ \vec{e} \perp \vec{q_i}) \\
&= (\vec{q_1}^T\vec{p})\vec{q_1} + (\vec{q_2}^T\vec{p})\vec{q_2} + \cdots + (\vec{q_n}^T\vec{p})\vec{q_n} \\
&= \vec{p_1} + \vec{p_2} + \cdots + \vec{p_n}\end{split}$
- Let $\vec{p} = \hat{x_1}\vec{q_1} + \hat{x_2}\vec{q_2} + \cdots + \hat{x_n}\vec{q_n}$
$\vec{q_i}^T\vec{p} = \vec{q_i}^T(\hat{x_1}\vec{q_1} + \hat{x_2}\vec{q_2} + \cdots + \hat{x_n}\vec{q_n}) = \hat{x_i}\vec{q_i}^T\vec{q_i}\ (\vec{q_i}^T\vec{q_j} = 0\ \forall\ j \not= i) \\
\therefore \vec{p}^T\vec{q_i} = \hat{x_i}\vec{q_i}^T\vec{q_i},\ \hat{x_i} = {\vec{p}^T\vec{q_i} \over \begin{Vmatrix}
\vec{q_i}
\end{Vmatrix}^2} \text{ where } \begin{Vmatrix}
\vec{q_i}
\end{Vmatrix}^2 = 1 \text{ for orthonormal basis} \{ q_i \}$
- Example
$\vec{q_1} = \left[\begin{array}{c}
1 \\
0 \\
0 \\
\end{array}\right],\ \vec{q_2} = \left[\begin{array}{c}
0 \\
1 \\
0 \\
\end{array}\right],\ \vec{b} = \left[\begin{array}{c}
4 \\
3 \\
2 \\
\end{array}\right] \\
\vec{p_1} = (\vec{q_1}^T\vec{b})\vec{q_1} = \left[\begin{array}{c}
1 \\
0 \\
0 \\
\end{array}\right]^T\left[\begin{array}{c}
4 \\
3 \\
2 \\
\end{array}\right]\left[\begin{array}{c}
1 \\
0 \\
0 \\
\end{array}\right] = \left[\begin{array}{c}
4 \\
0 \\
0 \\
\end{array}\right] \\
\vec{p_2} = (\vec{q_2}^T\vec{b})\vec{q_2} = \left[\begin{array}{c}
0 \\
1 \\
0 \\
\end{array}\right]^T\left[\begin{array}{c}
4 \\
3 \\
2 \\
\end{array}\right]\left[\begin{array}{c}
0 \\
1 \\
0 \\
\end{array}\right] = \left[\begin{array}{c}
0 \\
3 \\
0 \\
\end{array}\right] \\
\vec{p} = \vec{p_1} + \vec{p_2} = \left[\begin{array}{c}
4 \\
3 \\
0 \\
\end{array}\right] = \text{ Projection of } \vec{b} \text{ onto } span(\{ \vec{q_1},\ \vec{q_2}\})$
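A NumPy sketch checking this example: the projection $QQ^T\vec{b}$ equals the sum of the one-dimensional projections $(\vec{q_i}^T\vec{b})\vec{q_i}$:

```python
import numpy as np

# Projection onto span{q1, q2}: Q Q^T b equals the sum of the 1-D projections (q_i^T b) q_i.
q1 = np.array([1.0, 0.0, 0.0])
q2 = np.array([0.0, 1.0, 0.0])
b  = np.array([4.0, 3.0, 2.0])

Q = np.column_stack([q1, q2])

p_matrix = Q @ Q.T @ b
p_sum    = (q1 @ b) * q1 + (q2 @ b) * q2

print(p_matrix)                        # [4. 3. 0.]
print(np.allclose(p_matrix, p_sum))    # True
```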
### The Gram-Schmidt Process
A systematic way to create an orthonormal basis from an arbitrary basis
Input: $\vec{v_1},\ \vec{v_2},\ \cdots,\ \vec{v_n}$ ($n$ non-zero vectors)
Output: $\vec{q_1},\ \vec{q_2},\ \cdots,\ \vec{q_r}\ (r = rank([\vec{v_1}\ \vec{v_2}\ \cdots\ \vec{v_n}]))$
Step1: Let $\vec{e_1} = \vec{v_1},\ \vec{q_1} = { \vec{e_1} \over \begin{Vmatrix}
\vec{e_1}
\end{Vmatrix}}$
Step2+: $\underbrace{\vec{e_k} = \vec{v_k} - \sum\limits_{i = 1}^{k-1}(\vec{q_i}^T\vec{v_k})\vec{q_i}}_\text{orthogonalization},\ \underbrace{\vec{q_k} = { \vec{e_k} \over \begin{Vmatrix}
\vec{e_k}
\end{Vmatrix}}}_\text{normalization}$
- Example
$\vec{a} = \left[\begin{array}{c}
1 \\
-1 \\
0 \\
\end{array}\right],\ \vec{b} = \left[\begin{array}{c}
2 \\
0 \\
-2 \\
\end{array}\right],\ \vec{c} = \left[\begin{array}{c}
3 \\
-3 \\
3 \\
\end{array}\right]$
1. Let $\vec{q_1} = { \vec{a} \over \begin{Vmatrix}
\vec{a}
\end{Vmatrix}} = {1 \over \sqrt{2}}\left[\begin{array}{c}
1 \\
-1 \\
0 \\
\end{array}\right]$
2. Let $\vec{e_2} = \vec{b} - (\text{projection of } \vec{b} \text{ onto } \vec{q_1}) = \vec{b} - (\vec{q_1}^T\vec{b})\vec{q_1},\ \vec{q_2} = { \vec{e_2} \over \begin{Vmatrix}
\vec{e_2}
\end{Vmatrix}}$
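A minimal NumPy sketch of the Gram-Schmidt process above (the helper `gram_schmidt` is illustrative, not from the notes); applying it to $\vec{a},\ \vec{b},\ \vec{c}$ completes the example numerically:

```python
import numpy as np

# Illustrative Gram-Schmidt helper following the steps above: orthogonalize each v_k
# against the q_i found so far, then normalize; (numerically) zero residuals are skipped,
# so the number of outputs is r = rank([v_1 ... v_n]).
def gram_schmidt(vectors, tol=1e-12):
    qs = []
    for v in vectors:
        e = v.astype(float)
        for q in qs:
            e = e - (q @ v) * q          # subtract the projection of v onto q
        norm = np.linalg.norm(e)
        if norm > tol:                   # keep only independent directions
            qs.append(e / norm)
    return qs

# Completing the example numerically:
a = np.array([1.0, -1.0, 0.0])
b = np.array([2.0,  0.0, -2.0])
c = np.array([3.0, -3.0, 3.0])

q1, q2, q3 = gram_schmidt([a, b, c])
print(q1)   # (1, -1, 0)/sqrt(2)
print(q2)   # (1, 1, -2)/sqrt(6)
print(q3)   # (1, 1, 1)/sqrt(3)
```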
### QR Factorization (Decomposition)
$\{\vec{v_1},\ \vec{v_2},\ \cdots,\ \vec{v_n}\} \text{ (linearly independent) } \xrightarrow{Gram-Schmidt\ Process} \{\vec{q_1},\ \vec{q_2},\ \cdots,\ \vec{q_n}\}$
Express $\vec{v_k}$ by the newly obtained orthonormal basis $\{\vec{q_1},\ \vec{q_2},\ \cdots,\ \vec{q_n}\}$
$\vec{v_1} = (\vec{q_1}^T\vec{v_1})\vec{q_1} \\
\vec{v_2} = (\vec{q_2}^T\vec{v_2})\vec{q_2} + (\vec{q_1}^T\vec{v_2})\vec{q_1} \\
\ \vdots \\
\vec{v_k} = (\vec{q_1}^T\vec{v_k})\vec{q_1} + \cdots + (\vec{q_k}^T\vec{v_k})\vec{q_k}$
$\begin{split}\text{Let } A &= \left[\begin{array}{c c c c}
\vec{v_1} & \vec{v_2} & \cdots & \vec{v_k} \\
\end{array}\right] \\
&= \left[\begin{array}{c c c c}
(\vec{q_1}^T\vec{v_1})\vec{q_1} & (\vec{q_1}^T\vec{v_2})\vec{q_1} & & (\vec{q_1}^T\vec{v_k})\vec{q_1} \\
& + (\vec{q_2}^T\vec{v_2})\vec{q_2} & \cdots & + (\vec{q_2}^T\vec{v_k})\vec{q_2} \\
& & & + \cdots \\
& & & + (\vec{q_k}^T\vec{v_k})\vec{q_k} \\
\end{array}\right] \\
&= \left[\begin{array}{c c c c}
\vec{q_1} & \vec{q_2} & \cdots & \vec{q_k} \\
\end{array}\right]\left[\begin{array}{c c c c}
\vec{q_1}^T\vec{v_1} & \vec{q_1}^T\vec{v_2} & \cdots & \vec{q_1}^T\vec{v_k} \\
0 & \vec{q_2}^T\vec{v_2} & \cdots & \vec{q_2}^T\vec{v_k} \\
\vdots & 0 & & \vdots \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \vec{q_k}^T\vec{v_k} \\
\end{array}\right] = QR\end{split}$
Given $A = QR$ , consider the least squares approximation $A^TA\vec{\hat{x}} = A^T\vec{b}$
$\begin{split}&\iff (QR)^T(QR)\vec{\hat{x}} = (QR)^T\vec{b} \\
&\iff R^T\color{red}{Q^TQ}R\vec{\hat{x}} = R^TQ^T\vec{b} \\
&\iff R^TR\vec{\hat{x}} = R^TQ^T\vec{b} \\
&\iff (R^T)^{-1}R^TR\vec{\hat{x}} = (R^T)^{-1}R^TQ^T\vec{b} \\
&\iff R\vec{\hat{x}} = Q^T\vec{b} \\
&\therefore \text{ we can solve } \vec{\hat{x}} \text{ by back substitution since } R \text{ is upper triangular}\end{split}$
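A sketch of QR-based least squares on the earlier line-fit data ($A$ and $\vec{b}$ as in the least squares example): factor $A = QR$ with `np.linalg.qr`, then solve $R\hat{\vec{x}} = Q^T\vec{b}$ by back substitution:

```python
import numpy as np

# QR-based least squares on the earlier line-fit data: solve R x = Q^T b by back substitution.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

Q, R = np.linalg.qr(A)        # reduced QR: Q is 3x2 with orthonormal columns, R is 2x2 upper triangular
y = Q.T @ b

# Back substitution for the upper triangular system R x = y.
x_hat = np.zeros_like(y)
for i in range(len(y) - 1, -1, -1):
    x_hat[i] = (y[i] - R[i, i + 1:] @ x_hat[i + 1:]) / R[i, i]

print(x_hat)                  # [ 5. -3.]  -- same least squares solution as before
```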
- Example
$A = \left[\begin{array}{c c c}
1 & 1 & 0 \\
1 & 0 & 1 \\
0 & 1 & 1 \\
\end{array}\right] \implies \vec{v_1} = \left[\begin{array}{c}
1 \\
1 \\
0 \\
\end{array}\right],\ \vec{v_2} = \left[\begin{array}{c}
1 \\
0 \\
1 \\
\end{array}\right],\ \vec{v_3} = \left[\begin{array}{c}
0 \\
1 \\
1 \\
\end{array}\right] \\
\vec{q_1} = { \vec{v_1} \over \begin{Vmatrix}
\vec{v_1}
\end{Vmatrix}} = {1 \over \sqrt{2}}\left[\begin{array}{c}
1 \\
1 \\
0 \\
\end{array}\right] \\
\vec{e_2} = \vec{v_2} - (\vec{q_1}^T\vec{v_2})\vec{q_1} = \left[\begin{array}{c}
{1 \over 2} \\
{-1 \over 2} \\
1 \\
\end{array}\right],\ \vec{q_2} = { \vec{e_2} \over \begin{Vmatrix}
\vec{e_2}
\end{Vmatrix}} = \left[\begin{array}{c}
{1 \over \sqrt{6}} \\
{-1 \over \sqrt{6}} \\
{2 \over \sqrt{6}} \\
\end{array}\right] \\
\vec{e_3} = \vec{v_3} - (\vec{q_1}^T\vec{v_3})\vec{q_1} - (\vec{q_2}^T\vec{v_3})\vec{q_2} = \left[\begin{array}{c}
{-2 \over 3} \\
{2 \over 3} \\
{2 \over 3} \\
\end{array}\right],\ \vec{q_3} = { \vec{e_3} \over \begin{Vmatrix}
\vec{e_3}
\end{Vmatrix}} = \left[\begin{array}{c}
{-1 \over \sqrt{3}} \\
{1 \over \sqrt{3}} \\
{1 \over \sqrt{3}} \\
\end{array}\right] \\
\therefore Q = \left[\begin{array}{c c c}
\vec{q_1} & \vec{q_2} & \vec{q_3} \\
\end{array}\right] = \left[\begin{array}{c c c}
{1 \over \sqrt{2}} & {1 \over \sqrt{6}} & {-1 \over \sqrt{3}} \\
{1 \over \sqrt{2}} & {-1 \over \sqrt{6}} & {1 \over \sqrt{3}} \\
0 & {2 \over \sqrt{6}} & {1 \over \sqrt{3}} \\
\end{array}\right],\ \begin{split}R &= \left[\begin{array}{c c c}
\vec{q_1}^T\vec{v_1} & \vec{q_1}^T\vec{v_2} & \vec{q_1}^T\vec{v_3} \\
0 & \vec{q_2}^T\vec{v_2} & \vec{q_2}^T\vec{v_3} \\
0 & 0 & \vec{q_3}^T\vec{v_3}\\
\end{array}\right] \\
&= \left[\begin{array}{c c c}
{2 \over \sqrt{2}} & {1 \over \sqrt{2}} & {1 \over \sqrt{2}} \\
0 & {3 \over \sqrt{6}} & {1 \over \sqrt{6}} \\
0 & 0 & {2 \over \sqrt{3}}\\
\end{array}\right]\end{split}$
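A quick numerical check of this factorization (a sketch): rebuild $Q$ and $R$ from the values above and confirm $Q^TQ = I$ and $QR = A$:

```python
import numpy as np

# Rebuild Q and R from the worked example and confirm Q^T Q = I and Q R = A.
s2, s3, s6 = np.sqrt(2), np.sqrt(3), np.sqrt(6)

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Q = np.array([[1/s2,  1/s6, -1/s3],
              [1/s2, -1/s6,  1/s3],
              [0.0,   2/s6,  1/s3]])
R = np.array([[2/s2, 1/s2, 1/s2],
              [0.0,  3/s6, 1/s6],
              [0.0,  0.0,  2/s3]])

print(np.allclose(Q.T @ Q, np.eye(3)))   # True
print(np.allclose(Q @ R, A))             # True
```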
## Determinants (2021.11.19)
- Example
Determinant of $2 \times 2$ matrix $A = \left[\begin{array}{c c}
a & b \\
c & d \\
\end{array}\right],\ det(A) = \begin{vmatrix}
A
\end{vmatrix} = ad-bc$
- Properties
1. $\begin{vmatrix}
I_n
\end{vmatrix} = 1$
2. Exchange of two rows leads to change of sign
$A = \left[\begin{array}{c c}
a & b \\
c & d \\
\end{array}\right],\ A' = \left[\begin{array}{c c}
c & d \\
a & b \\
\end{array}\right],\ \begin{vmatrix}
A'
\end{vmatrix} = bc-ad = -(ad-bc) = -\begin{vmatrix}
A
\end{vmatrix}$
3. The determinant is a linear function of each row separately
(scalar multiplication) $A' = \left[\begin{array}{c c}
ta & tb \\
c & d \\
\end{array}\right] \implies \begin{vmatrix}
A'
\end{vmatrix} = t(ad-bc) = t\begin{vmatrix}
A
\end{vmatrix}$
(vector addition) $\begin{vmatrix}
A''
\end{vmatrix} = \begin{vmatrix}
\left[\begin{array}{c c}
a+a' & b+b' \\
c & d \\
\end{array}\right]
\end{vmatrix} = \begin{vmatrix}
A
\end{vmatrix}+\begin{vmatrix}
A'
\end{vmatrix} \text{ where } A' = \left[\begin{array}{c c}
a' & b' \\
c & d \\
\end{array}\right]$
4. If two rows are equal in $A$ , $\begin{vmatrix}
A
\end{vmatrix} = 0$
- Proof
Suppose two rows $\vec{a_i}^T$ and $\vec{a_j}^T$ of $A$ are equal
Let $A'$ be the matrix $A$ with $\vec{a_i}^T$ and $\vec{a_j}^T$ exchanged
From 2, $det(A') = -det(A) ... ①$, since $A'=A,\ det(A') = det(A) ... ②$
From ① and ②, $det(A) = det(A') = 0$
5. Subtracting a multiple of one row from another row leaves $\begin{vmatrix}
A
\end{vmatrix}$ unchanged
$\begin{split}\begin{vmatrix}
A'
\end{vmatrix} &= \begin{vmatrix}
\left[\begin{array}{c c}
a & b \\
c-ka & d-kb \\
\end{array}\right]
\end{vmatrix} \\
&= \begin{vmatrix}
\left[\begin{array}{c c}
a & b \\
c & d \\
\end{array}\right]
\end{vmatrix} + \begin{vmatrix}
\left[\begin{array}{c c}
a & b \\
-ka & -kb \\
\end{array}\right]
\end{vmatrix} \\
&= \begin{vmatrix}
A
\end{vmatrix} + (-k)\begin{vmatrix}
\left[\begin{array}{c c}
a & b \\
a & b \\
\end{array}\right]
\end{vmatrix} \\
&= \begin{vmatrix}
A
\end{vmatrix} + 0 = \begin{vmatrix}
A
\end{vmatrix}\end{split}$
6. A matrix with a row of zeros has $det(A) = 0$
$A = \left[\begin{array}{c c}
a & b \\
0 & 0 \\
\end{array}\right],\ A' = \left[\begin{array}{c c}
a & b \\
0+a & 0+b \\
\end{array}\right]$
From 4, $\begin{vmatrix}
A'
\end{vmatrix} = 0$ , from 5, $\begin{vmatrix}
A
\end{vmatrix} = \begin{vmatrix}
A'
\end{vmatrix} = 0$
7. If $A$ is triangular, then $\begin{vmatrix}
A
\end{vmatrix} =$ product of diagonal elements
$A = \left[\begin{array}{c c c c}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}
\end{array}\right] \xrightarrow{Jordan\ elimination} D = \left[\begin{array}{c c c c}
a_{11} & 0 & \cdots & 0 \\
0 & a_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{nn}
\end{array}\right]\ (\text{diagonal})$
From 5, $\begin{vmatrix}
A
\end{vmatrix} = \begin{vmatrix}
D
\end{vmatrix}$ ; from 3 and 1, $\begin{vmatrix}
D
\end{vmatrix} = a_{11}a_{22}\cdots a_{nn}\begin{vmatrix}
I
\end{vmatrix} =$ product of diagonal elements (if some $a_{ii} = 0$, elimination leaves a zero row and from 6 the determinant is $0$, again the product of the diagonal elements)
8. If $A$ is singular, then $\begin{vmatrix}
A
\end{vmatrix} = 0$ , if $A$ is invertible, $\begin{vmatrix}
A
\end{vmatrix} \not= 0$
- Define
A singular matrix does not have a matrix inverse
$A$ is singular $\implies$ $A$ does not have full rank $\implies$ $A$ contains an all-zero row after Gauss-Jordan elimination
Note that Gaussian elimination converts $A$ to an upper triangular matrix $U$ using only row operations that leave the determinant unchanged (property 5) and row exchanges that only change its sign (property 2)
$\therefore \begin{vmatrix}
U
\end{vmatrix} = (-1)^k\begin{vmatrix}
A
\end{vmatrix}$ where $k$ = the number of row exchanges
If $A$ is singular, $U$ has a row of zeros, so from 6, $\begin{vmatrix}
U
\end{vmatrix} = 0$ and $\begin{vmatrix}
A
\end{vmatrix} = {1 \over (-1)^k}\begin{vmatrix}
U
\end{vmatrix} = 0$
If $A$ is invertible, $\begin{vmatrix}
U
\end{vmatrix} =$ product of the diagonal elements (pivots) of $U$, all non-zero $\implies \begin{vmatrix}
A
\end{vmatrix} = {1 \over (-1)^k}\begin{vmatrix}
U
\end{vmatrix} \not= 0$
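A sketch of this argument in code: reduce $A$ to an upper triangular $U$ by Gaussian elimination with partial pivoting, count the row exchanges $k$, and compare $(-1)^k \cdot$ (product of $U$'s diagonal) with `np.linalg.det`:

```python
import numpy as np

# Reduce A to upper triangular U by Gaussian elimination with partial pivoting,
# counting row exchanges k, so det(A) = (-1)^k * prod(diag(U)).
def det_by_elimination(A):
    U = A.astype(float)
    n = U.shape[0]
    swaps = 0
    for col in range(n):
        pivot = col + np.argmax(np.abs(U[col:, col]))       # choose the largest pivot
        if np.isclose(U[pivot, col], 0.0):
            return 0.0                                       # no usable pivot: A is singular
        if pivot != col:
            U[[col, pivot]] = U[[pivot, col]]                # row exchange flips the sign
            swaps += 1
        for row in range(col + 1, n):
            U[row] -= (U[row, col] / U[col, col]) * U[col]   # property 5: determinant unchanged
    return (-1) ** swaps * np.prod(np.diag(U))

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
print(np.isclose(det_by_elimination(A), np.linalg.det(A)))  # True

singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])                            # rank 1
print(det_by_elimination(singular))                          # 0.0
```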