# MAT 345 - Inner Products and Orthogonality
## The definition of inner-product.
**Inner Product** Let $V$ be a vector space over $F$ for $F$ either $\mathbb R$ or $\mathbb C$. An ***inner product*** on $V$ is a map $\langle \cdot,\cdot\rangle:V\times V\to F$ satisfying the following:
* ***left-linearity***: $\langle \alpha\mathbf u_1+\beta\mathbf u_2,\mathbf v\rangle=\alpha\langle\mathbf u_1,\mathbf v\rangle+\beta\langle\mathbf u_2,\mathbf v\rangle$. The text calls this linearity in the first *slot*.
* ***conjugate-symmetry***: $\langle \mathbf u,\mathbf v\rangle=\overline{\langle\mathbf v,\mathbf u\rangle}$
* ***positive definite***: $\langle \mathbf u,\mathbf u\rangle\in (0,\infty)$ if $\mathbf u\neq \mathbf 0$.
There are several things to notice: (**Verify these!**)
* ***Right conjugate-linearity*** holds: $\langle \mathbf v, \alpha\mathbf u_1+\beta\mathbf u_2\rangle=\bar\alpha\langle\mathbf v, \mathbf u_1\rangle+\bar\beta\langle\mathbf v,\mathbf u_2\rangle$.
* If $F=\mathbb R$, then conjugate-symmetry just becomes symmetry, that is $\langle\mathbf u,\mathbf v\rangle=\langle\mathbf v,\mathbf u\rangle$ and in this case the inner product is ***bi-linear***.
* If $\mathbf u=\mathbf 0$, then $\langle \mathbf u,\mathbf v\rangle=\langle \mathbf v,\mathbf u\rangle=0$.
**Example** Of course, the standard inner product on $\mathbb C^n$ (or $\mathbb R^n$) defined by $\langle \mathbf u,\mathbf v\rangle=\mathbf v^H\mathbf u$ is an inner product, but if $A$ is any $n\times n$ diagonal matrix with all diagonal entries real and positive, then $\langle \mathbf u,\mathbf v\rangle_A=\mathbf v^HA\mathbf u$ is also an inner product.
:::spoiler **Solution**
Let's do the complex case. Recall that it suffices to show *left-linearity*, *conjugate-symmetry*, and *positive-definiteness*.
* *Left-Linearity*:
\begin{split}
\langle \alpha_1\mathbf u_1&+\alpha_2\mathbf u_2,\mathbf v\rangle_A=\mathbf v^HA( \alpha_1\mathbf u_1+\alpha_2\mathbf u_2)\\
&=\alpha_1\mathbf v^HA\mathbf u_1+\alpha_2\mathbf v^HA\mathbf u_2=\alpha_1\langle \mathbf u_1,\mathbf v\rangle_A+\alpha_2\langle \mathbf u_2,\mathbf v\rangle_A
\end{split}
* *Conjugate-Symmetry*:
$$
\overline{\langle\mathbf v,\mathbf u\rangle_A}=\overline{\mathbf u^HA\mathbf v}=(\mathbf u^HA\mathbf v)^H=\mathbf v^HA^H\mathbf u=\mathbf v^HA\mathbf u
=\langle \mathbf u,\mathbf v\rangle_A
$$
* *Positive-Definite*:
For this notice
\begin{align}
\langle \mathbf u,\mathbf v\rangle_A
=\mathbf v^HA\mathbf u
&=\sum_{i=1}^n (A_{i,i})\bar{\mathbf v}_i \mathbf u_i\\
\|\mathbf u\|^2_A
=\langle\mathbf u,\mathbf u\rangle_A
&=\sum_{i=1}^n (A_{i,i})\bar{\mathbf u}_i \mathbf u_i
=\sum_{i=1}^n (A_{i,i})|\mathbf u_i|^2
\end{align}
Since each $A_{i,i}$ is real and positive, it is clear that $\langle\mathbf u,\mathbf u\rangle_A$ is real and non-negative, and that $\langle\mathbf u,\mathbf u\rangle_A=0$ iff $|\mathbf u_i|^2=0$ for all $i$, hence $\mathbf u_i=0$ for all $i$, and so $\mathbf u=\mathbf 0$.
Note that later on we will see that any ***positive-definite Hermitian*** matrix can be used for $A$.
:::
>
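Here is a quick numerical sanity check of this example (a minimal sketch assuming NumPy; the size and the random vectors are just for illustration):
```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = np.diag(rng.uniform(0.5, 2.0, size=n))   # diagonal matrix with real, positive entries (illustration only)

def ip_A(u, v):
    """The weighted inner product <u, v>_A = v^H A u."""
    return np.conj(v) @ (A @ u)

u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)

print(np.isclose(ip_A(u, v), np.conj(ip_A(v, u))))           # conjugate-symmetry
print(ip_A(u, u).real > 0 and abs(ip_A(u, u).imag) < 1e-12)  # positive-definiteness for u != 0
```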
**Example** An example we will use a lot is that of $V=C([0,1],\mathbb R)$ (continuous functions $f:[0,1]\to\mathbb R$). Here define $\langle f,g\rangle = \int_0^1fg\,dx$. Show that this is an inner product and give an intuitive geometric interpretation of $\|f-g\|$.
:::spoiler **Solution**
The vector space here is real so we need only show left-linearity, symmetry, and positive-definiteness.
* *Linearity*:
\begin{align*}
\langle \alpha_1 f_1&+\alpha_2 f_2, g\rangle=
\int_0^1(\alpha_1 f_1+\alpha_2f_2) g\,dx\\
&=\alpha_1\int_0^1 f_1 g\,dx+\alpha_2\int_0^1 f_2 g\,dx=
\alpha_1\langle f_1, g\rangle+ \alpha_2\langle f_2, g\rangle
\end{align*}
* *Symmetry*: There is nothing to do since $fg= gf$.
* *Positive-Definite*: That $\int_0^1 f^2(x)\,dx\ge 0$ is clear as $f^2(x)\ge 0$. Suppose $f^2(c)>0$ at some $c$; then by continuity there is some $\delta>0$ so that $f^2(x)> f^2(c)/2$ for all $x\in(c-\delta,c+\delta)$ and thus $\int_0^1 f^2\,dx\ge\int_{c-\delta}^{c+\delta} f^2\,dx\ge 2\delta\cdot f^2(c)/2>0$. This shows that $\langle f,f\rangle=0\implies f= 0$.
The intuitive interpretation of $\| f-g\|$ is pretty easy to see. Note that $\int_0^1|f-g|\,dx$ is the area between the graphs of $f$ and $g$. So $\left(\int_0^1(f-g)^2\,dx\right)^{1/2}$ is another way of measuring the "difference" between the two graphs. (See the discussion and pictures below.)
:::
>
There is an interesting relationship between the previous example and the standard inner product. Take $f\in C([0,1],\mathbb R)$ and define $f_n\in\mathbb R^n$ by $f_n(i)=f(i/n)$ for $i=1,\ldots,n$. This replaces the continuous $f$ by a discrete approximation of $f$:

Intuitively, $f_n\to f$ as $n\to\infty$ so we might hope that $\langle f_n,g_n\rangle\to \langle f,g\rangle$ as $n\to\infty$. But this is not true, since as $n$ increases the sum $\langle f_n,g_n\rangle=\sum_{i=1}^nf_n(i)g_n(i)$ just accumulates more and more terms. But if we think about integrals and their finite approximations, it is clear that $\sum_{i=1}^nf_n(i)g_n(i)(1/n)\to \int_0^1 fg\,dx$. So, if we take $A_n$ to be the diagonal matrix with all diagonal entries being $1/n$, then $\sum_{i=1}^nf_n(i)g_n(i)(1/n)=\langle f_n,g_n\rangle_{A_n}\to \langle f,g\rangle$ as $n\to\infty$.
[](https://www.desmos.com/calculator/qbklaltinz)
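Here is a small numerical illustration of this convergence (a sketch assuming NumPy; `f` and `g` are arbitrary test functions):
```python
import numpy as np

f = lambda x: x**2                 # arbitrary test functions, just for illustration
g = lambda x: np.sin(np.pi * x)

for n in (10, 100, 1000, 10000):
    x = np.arange(1, n + 1) / n    # sample points i/n, i = 1, ..., n
    fn, gn = f(x), g(x)
    unscaled = fn @ gn             # <f_n, g_n>: keeps growing with n
    scaled = unscaled / n          # <f_n, g_n>_{A_n} with A_n = (1/n) I
    print(n, unscaled, scaled)
# the scaled values approach int_0^1 x^2 sin(pi x) dx = (pi^2 - 4)/pi^3 ≈ 0.1892
```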
## Cauchy's Inequality and the $2$-norm
**Definition:** Define the ***$2$-norm***, $||\cdot||_2:V\to [0,\infty)$ by $||\mathbf u||_2=\sqrt{\langle \mathbf u,\mathbf u\rangle}$. It is most useful to think of this as $||\mathbf u||_2^2=\langle \mathbf u,\mathbf u\rangle$.
**Notation:** In this class we mostly only consider the $2$-norm so when I write $||\mathbf u||$, I mean $||\mathbf u||_2$.
**Definition** $||\cdot||:V\to [0,\infty)$ is a ***norm on $V$*** iff the following hold
* $||\mathbf u + \mathbf v|| \leq ||\mathbf u|| + ||\mathbf v||$ (***triangle inequality***)

* $||\alpha\cdot \mathbf u||=|\alpha|\cdot ||\mathbf u||$
* $||\mathbf u||=0 \iff \mathbf u = \mathbf 0$
A norm is sometimes called a *length* or *magnitude*.
From a norm $||\cdot||:V\to [0,\infty)$ we get a (homogeneous) distance function $d:V\times V\to[0,\infty)$ by setting $d(\mathbf u,\mathbf v)=||\mathbf u-\mathbf v||$. Note that with this definition $||\mathbf u||=d(\mathbf u,\mathbf 0)$, so the length of $\mathbf u$ is the same as the distance of $u$ from the origin.

**Definition** A ***distance function*** is any function satisfying:
* $d(\mathbf u,\mathbf v)\le d(\mathbf u,\mathbf w)+d(\mathbf w,\mathbf v)$ for any $\mathbf u,\mathbf v,\mathbf w\in V$. (***triangle inequality***)

* $d(\mathbf u,\mathbf v)=d(\mathbf v,\mathbf u)$ for all $\mathbf u,\mathbf v\in V$.
* $d(\mathbf u,\mathbf v)=0\iff \mathbf u=\mathbf v$ for all $\mathbf u,\mathbf v\in V$.
A distance function is ***homogeneous*** iff $d(\alpha \mathbf u,\alpha \mathbf v)=|\alpha|d(\mathbf u,\mathbf v)$.
**Theorem** Let $\langle\cdot,\cdot\rangle$ be an inner product on $V$, then
$$
||\cdot ||_2 \text{ is a norm. }
$$
**Proof** The triangle inequality is dealt with separately below after Cauchy's theorem. The other two follow simply using the axioms of inner products.
$$
\|\alpha \mathbf v\|^2=\langle \alpha \mathbf v,\alpha \mathbf v\rangle=\alpha\bar{\alpha}\langle \mathbf v,\mathbf v\rangle=|\alpha|^2\|\mathbf v\|^2
$$
and
$$
\|\mathbf u\|=0\iff \|\mathbf u\|^2=0\iff \langle \mathbf u,\mathbf u\rangle=0\iff \mathbf u=\mathbf 0
$$
❏
**Theorem (Cauchy's Inequality)**
$$
|\langle \mathbf u,\mathbf v\rangle| \leq ||\mathbf u||\cdot||\mathbf v||
$$
Moreover, $|\langle \mathbf u,\mathbf v\rangle|=\|\mathbf u\|\|\mathbf v\|$ iff $\mathbf u=\lambda \mathbf v$ for some scalar $\lambda$ or $\mathbf v=\mathbf 0$.
**Proof:** If $\mathbf u=\mathbf 0$ or $\mathbf v=\mathbf 0$ both sides are $0$, so assume both are nonzero. Then it suffices to prove this for $||\mathbf u|| = ||\mathbf v|| = 1$. For let $\hat{\mathbf u}= \frac{\mathbf u}{\|\mathbf u\|}$ and $\hat{\mathbf v}=\frac{\mathbf v}{\|\mathbf v\|}$, clearly
\begin{equation}
\|\hat{\mathbf u}\|=\left\|\frac{\mathbf u}{\|\mathbf u\|}\right\|=\left|\frac{1}{\|\mathbf u\|}\right|\|\mathbf u\|=1\tag{since $\|\alpha \mathbf u\|=|\alpha|\|\mathbf u\|$}
\end{equation}
and similarly for $\|\hat{\mathbf v}\|$. So we have
\begin{align}
\bigl|\langle \hat{\mathbf u},\hat{\mathbf v}\rangle\bigr| \le 1
&\iff \left|\langle \hat{\mathbf u},\hat{\mathbf v}\rangle \right|\le \|\hat{\mathbf u}\|\,\|\hat{\mathbf v}\|\\
&\iff \left|\left\langle \frac{\mathbf u}{\|\mathbf u\|},\frac{\mathbf v}{\|\mathbf v\|}\right\rangle\right| \le \left\|\frac{\mathbf u}{\|\mathbf u\|}\right\|\cdot\left\|\frac{\mathbf v}{\|\mathbf v\|}\right\| \\
&\iff \frac{1}{\|\mathbf u\|\|\mathbf v\|}\bigl|\langle \mathbf u,\mathbf v\rangle\bigr|\le \frac{1}{\|\mathbf u\|\|\mathbf v\|}\|\mathbf u\|\|\mathbf v\|\\
&\iff \bigl|\langle \mathbf u,\mathbf v\rangle\bigr|\le \|\mathbf u\|\|\mathbf v\|
\end{align}
Thus we aim to show: $|\langle \mathbf u, \mathbf v\rangle| \leq 1$ where $||\mathbf u||=||\mathbf v||=1$.
Here there is a trick:
\begin{align*}
0&\le\langle \mathbf u - \lambda \mathbf v, \mathbf u - \lambda \mathbf v\rangle\\
&=\langle \mathbf u,\mathbf u-\lambda \mathbf v\rangle -\lambda\langle \mathbf v,\mathbf u-\lambda \mathbf v\rangle\\
&=\langle \mathbf u,\mathbf u\rangle - \bar\lambda\langle \mathbf u, \mathbf v\rangle - \lambda
\langle \mathbf v, \mathbf u\rangle + \lambda\bar\lambda\langle \mathbf v,\mathbf v\rangle\\
&= ||\mathbf u||^2 + |\lambda|^2||\mathbf v||^2 - (\bar\lambda \langle \mathbf u, \mathbf v\rangle
+ \lambda \overline{\langle \mathbf u, \mathbf v\rangle})\tag{let $\lambda=\langle \mathbf u,\mathbf v\rangle$}\\
&= 1+|\langle \mathbf v,\mathbf u\rangle|^2-2|\langle \mathbf v,\mathbf u\rangle|^2\tag{since $\|\mathbf u\|=\|\mathbf v\|=1$}\\
&=1-|\langle \mathbf v,\mathbf u\rangle|^2
\end{align*}
So $0\le 1-|\langle \mathbf v,\mathbf u\rangle|^2$ and so $|\langle \mathbf v,\mathbf u\rangle|\le 1$. Notice that the inequalities are strict unless $\mathbf u=\lambda \mathbf v$, since that is the only case in which $\langle \mathbf u - \lambda \mathbf v, \mathbf u - \lambda \mathbf v\rangle=0$ can hold.
❏
Note the following nice corollary:
**Corollary** For $x_i\in \mathbb R$, $\bigl(\sum_i x_i\bigr)^2\le n\sum_i x_i^2$.
**Proof** Take $\mathbf x=(x_1,\ldots,x_n)$ and $\mathbf 1=(1,\ldots,1)\in\mathbb R^n$; then
$$
\Bigl(\sum _i x_i\Bigr)^2=\bigl\langle \mathbf x,\mathbf 1 \bigr\rangle^2 \le \|\mathbf x\|^2\|\mathbf 1\|^2=\Bigl(\sum_i x_i^2\Bigr)\Bigl(\sum_i 1^2\Bigr)=n\sum_i x_i^2
$$
❏
From this it follows easily that we have:
## Triangle Inequality
**Theorem** Let $V$ be an inner product space with inner product $\langle\cdot,\cdot\rangle$ and $\|\mathbf u\|^2=\langle \mathbf u,\mathbf u\rangle$, then
$$
||\mathbf u+\mathbf v|| \leq ||\mathbf u|| + ||\mathbf v||
$$
Moreover,
$$
||\mathbf u+\mathbf v|| = ||\mathbf u|| + ||\mathbf v|| \text{ iff }\mathbf u=\lambda\mathbf v\text{ for some }\lambda\ge 0\text{ (or }\mathbf v=\mathbf 0\text{)}
$$
**Proof:** This is now a simple computation.
\begin{align*}
||\mathbf u+\mathbf v||^2&=|\langle \mathbf u+\mathbf v, \mathbf u+\mathbf v\rangle|\\
&=|\langle \mathbf u,\mathbf u\rangle+\langle \mathbf u,\mathbf v\rangle+\langle \mathbf v,\mathbf u\rangle+\langle \mathbf v,\mathbf v\rangle|\\
&\le||\mathbf u||^2+||\mathbf v||^2+|\langle \mathbf u,\mathbf v\rangle|+|\langle \mathbf v,\mathbf u\rangle|\tag{$|a+b|\le|a|+|b|$}\\
&= ||\mathbf u||^2+||\mathbf v||^2+2|\langle \mathbf u,\mathbf v \rangle|\\
&\leq ||\mathbf u||^2+||\mathbf v||^2+2||\mathbf u||||\mathbf v||\tag{Cauchy's Inequality}\\
&=(||\mathbf u||+||\mathbf v||)^2
\end{align*}
So $||\mathbf u+\mathbf v||\leq ||\mathbf u||+||\mathbf v||$.
❏
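Both inequalities (and the corollary) are easy to spot-check numerically (a minimal sketch assuming NumPy; the random vectors are only for illustration):
```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(6)        # random test vectors, just for illustration
v = rng.standard_normal(6)
nrm = np.linalg.norm

print(abs(u @ v) <= nrm(u) * nrm(v))           # Cauchy's inequality
print(nrm(u + v) <= nrm(u) + nrm(v))           # triangle inequality
print(u.sum()**2 <= u.size * (u**2).sum())     # corollary with the all-ones vector
```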
Given a norm (any norm) on $V$, there is a distance (metric) defined on $V$ by
$$
d(\mathbf u,\mathbf v) = ||\mathbf u-\mathbf v||.
$$
## Parallelogram Law (Optional)
$$
2(||\mathbf u||^2_2+||\mathbf v||^2_2)= ||\mathbf u-\mathbf v||^2+||\mathbf u+\mathbf v||^2
$$
[](https://www.geogebra.org/calculator/yurn99vz)
**Proof:** This is a simple calculation.
\begin{align*}
||\mathbf u+\mathbf v||^2+||\mathbf u-\mathbf v||^2&=\langle \mathbf u+\mathbf v, \mathbf u+\mathbf v\rangle + \langle \mathbf u-\mathbf v, \mathbf u-\mathbf v\rangle\\
&= \langle \mathbf u,\mathbf u \rangle + \langle \mathbf u,\mathbf v \rangle + \langle \mathbf v,\mathbf u \rangle + \langle \mathbf v,\mathbf v\rangle + \langle \mathbf u, \mathbf u \rangle - \langle \mathbf u, \mathbf v \rangle - \langle \mathbf v, \mathbf u \rangle + \langle \mathbf v,\mathbf v \rangle\\
&=2||\mathbf u||^2+2||\mathbf v||^2
\end{align*}
Note that this is a generalization of the Pythagorean Theorem, since in that case the two diagonals have equal length, so $||\mathbf u-\mathbf v||=||\mathbf u+\mathbf v||$; in fact, it is clear that $||\mathbf u+\mathbf v||^2=||\mathbf u-\mathbf v||^2\iff \text{Re}(\langle \mathbf u,\mathbf v\rangle) = 0$, which in a real inner product space is just $\langle \mathbf u,\mathbf v\rangle = 0$.
## The Polarization Identity
A little computation yields:
\begin{align}
\|\mathbf p+\mathbf q\|^2
&=\langle \mathbf p+\mathbf q,\mathbf p+\mathbf q\rangle\\
&=\|\mathbf p\|^2+\langle \mathbf p,\mathbf q\rangle+\langle \mathbf q,\mathbf p\rangle+\|\mathbf q\|^2\\
&= \|\mathbf p\|^2+\langle \mathbf p,\mathbf q\rangle
+\overline{\langle \mathbf p,\mathbf q\rangle}+\|\mathbf q\|^2\\
&= \|\mathbf p\|^2+2\text{Re}(\langle \mathbf p,\mathbf q\rangle)+\|\mathbf q\|^2
\end{align}
$$
\text{Re}(\langle \mathbf p,\mathbf q\rangle)=\frac{1}{2}\bigl(
\|\mathbf p+\mathbf q\|^2-(\|\mathbf p\|^2+\|\mathbf q\|^2)\bigr)
\tag{polarization}
$$
Since $\text{Im}(z)=\text{Re}(i\bar z)=\text{Re}(\bar i z)$ we have
\begin{align}
\text{Im}(\langle \mathbf p,\mathbf q\rangle)
&=\text{Re}(\bar i\langle \mathbf p,\mathbf q\rangle)
=\text{Re}(\langle \mathbf p,i\mathbf q\rangle)\\
&=\frac{1}{2}\bigl(\|\mathbf p+i\mathbf q\|^2-(\|\mathbf p\|^2+\|i\mathbf q\|^2)\bigr)\\
&=\frac{1}{2}\bigl(\|\mathbf p+i\mathbf q\|^2-(\|\mathbf p\|^2+\|\mathbf q\|^2)\bigr)
\end{align}
So we can reconstruct the inner product from
$$
\langle \mathbf p,\mathbf q\rangle=\frac{1}{2}\left(\|\mathbf p+\mathbf q\|^2-(\|\mathbf p\|^2+\|\mathbf q\|^2)\right)+\frac{i}{2}\left(\|\mathbf p+i\mathbf q\|^2-(\|\mathbf p\|^2+\|i\mathbf q\|^2)\right)
$$
Substituting $\frac{1}{2}\bigl(\|\mathbf p+\mathbf q\|^2+\|\mathbf p-\mathbf q\|^2\bigr)$ for $\|\mathbf p\|^2+\|\mathbf q\|^2$ (Parallelogram Law) we have:
$$
\langle \mathbf p,\mathbf q\rangle=\frac{1}{4}\left(\|\mathbf p+\mathbf q\|^2-\|\mathbf p-\mathbf q\|^2\right)+\frac{i}{4}\left(\|\mathbf p+i\mathbf q\|^2-\|\mathbf p-i\mathbf q\|^2\right)
$$
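This reconstruction is easy to test numerically (a sketch assuming NumPy, using the convention $\langle \mathbf p,\mathbf q\rangle=\mathbf q^H\mathbf p$ from the first example):
```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
p = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # random complex test vectors
q = rng.standard_normal(n) + 1j * rng.standard_normal(n)

ip = np.conj(q) @ p                          # <p, q> = q^H p
nrm2 = lambda w: np.linalg.norm(w)**2        # squared 2-norm

polarized = (nrm2(p + q) - nrm2(p - q)) / 4 + 1j * (nrm2(p + 1j*q) - nrm2(p - 1j*q)) / 4
print(np.isclose(ip, polarized))             # the polarization identity recovers <p, q>
```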
Here we characterize exactly when a norm is the norm derived from an inner-product.
**Note** There is a family of norms on $\mathbb C^n$ or $\mathbb R^n$: for each $p\ge 1$, $||u||_p=\left(\sum_{i=1}^n|u_i|^p\right)^{1/p}$ is a norm on $\mathbb C^n$. (Not trivial to prove!) These are the *$p$-norms*. Of all the $p$-norms, only the $2$-norm comes from an inner-product.
**Theorem:** A norm $||\cdot||:V\to[0,\infty)$ arises from an inner-product iff the parallelogram law holds.
**Proof:** The "only if" part we have shown above. For the "if" part we need to show that defining $\langle u,v\rangle$ by
$$
\langle \mathbf u,\mathbf v\rangle=\frac{1}{4}\left(\|\mathbf u+\mathbf v\|^2-\|\mathbf u-\mathbf v\|^2\right)+\frac{i}{4}\left(\|\mathbf u+i\mathbf v\|^2-\|\mathbf u-i\mathbf v\|^2\right)
$$
satisfies the requirements of an inner-product. That when so defined, $\|u\|^2=\langle u,u\rangle$ is trivial since
\begin{align}
\langle \mathbf u,\mathbf u\rangle
&=\frac{1}{4}\left(\|\mathbf u+\mathbf u\|^2-\|\mathbf u-\mathbf u\|^2\right)+\frac{i}{4}\left(\|\mathbf u+i\mathbf u\|^2-\|\mathbf u-i\mathbf u\|^2\right)\\
&=\frac{1}{4}\|2\mathbf u\|^2+
\frac{i}{4}\left(\|(1+i)\mathbf u\|^2-\|(1-i)\mathbf u\|^2\right)\\
&=\frac{1}{4}4\|\mathbf u\|^2+
\frac{i}{4}\left(|1+i|^2\|\mathbf u\|^2-|1-i|^2\|\mathbf u\|^2\right)\\
&=\|\mathbf u\|^2+
\frac{i}{4}\left(2\|\mathbf u\|^2-2\|\mathbf u\|^2\right)=\|\mathbf u\|^2
\end{align}
<a id="polarization"></a>
:::spoiler **Proof that the axioms of inner-product hold**
**Warning** For this entire proof, $\langle\mathbf u,\mathbf v\rangle$ must be interpreted as defined above; we must not accidentally assume any additional information about it based on what we know about inner products, since we are trying to show that this satisfies the inner product axioms.
This is the polarization formula, which defines the inner-product from the norm. Of course, it still needs to be shown that satisfying the parallelogram law is enough to show that this defines an inner product, and that $||\cdot||$ is the norm associated to this inner product.
The only thing that needs to be proved now is that taking the above as a definition results in $\langle \mathbf u,\mathbf v\rangle$ being an inner product, when the parallelogram law holds.
* **Conjugate symmetry:**
\begin{align*}
\langle \textbf u,\textbf v\rangle&=\frac{\left(||\textbf u+\textbf v||^2-||\textbf u-\textbf v||^2\right) + i\cdot\left(||\textbf u+i\textbf v||^2-||\textbf u-i\textbf v||^2\right)}{4}\\
&=\frac{\left(||\textbf v+\textbf u||^2-||\textbf v-\textbf u||^2\right) + i\cdot\left(||i(\textbf v-i\textbf u)||^2-||i(\textbf v+i\textbf u)||^2\right)}{4}\\
&=\frac{\left(||\textbf v+\textbf u||^2-||\textbf v-\textbf u||^2\right) + i\cdot\left(|i|^2\cdot||\textbf v-i\textbf u||^2-|i|^2\cdot||\textbf v+i\textbf u||^2\right)}{4}\\
&=\frac{\left(||\textbf v+\textbf u||^2-||\textbf v-\textbf u||^2\right) - i\cdot\left(||\textbf v+i\textbf u||^2-||\textbf v-i\textbf u||^2\right)}{4}\\
&=\overline{\langle \textbf v,\textbf u\rangle}
\end{align*}
Note that the parallelogram law is not needed for this part.
* **Additivity:**
We show $\text{Re}(\langle \textbf u_1+\textbf u_2,\textbf v\rangle) = \text{Re}(\langle \textbf u_1,\textbf v\rangle)+\text{Re}(\langle \textbf u_2,\textbf v\rangle)$
\begin{align*}
4\text{Re}(\langle \textbf u_1+\textbf u_2,\textbf v\rangle)
&=\|\textbf u_1+\textbf u_2+\textbf v\|^2-\|\textbf u_1+\textbf u_2-\textbf v\|^2\\
4\text{Re}(\langle\mathbf u_1,\mathbf v\rangle)+4\text{Re}(\langle\mathbf u_2,\mathbf v\rangle)
&=\|\textbf u_1+\textbf v\|^2-\|\textbf u_1-\textbf v\|^2+
\|\textbf u_2+\textbf v\|^2-\|\textbf u_2-\textbf v\|^2
\end{align*}
So our goal is to show the equality of the two right-hand sides of these two equalities.
Rearrange to ask
$$
\begin{align}
\bigl(\|\mathbf u_1+\mathbf u_2+\mathbf v\|^2+\|\mathbf u_1-\mathbf v\|^2\bigr)&-\bigl(\|\mathbf u_1+\mathbf u_2-\mathbf v\|^2+\|\mathbf u_1+\mathbf v\|^2\bigr)\\&\overset{?}{=}\|\textbf u_2+\textbf v\|^2-\|\textbf u_2-\textbf v\|^2\tag{$\dagger$}
\end{align}
$$
We **use** the parallelogram law
$$
\|\mathbf w_1+\mathbf w_2\|^2+\|\mathbf w_1-\mathbf w_2\|^2=2\|\mathbf w_1\|^2+2\|\mathbf w_2\|^2
$$
Take $\mathbf w_1=\mathbf u_1+\mathbf u_2+\mathbf v$ and $\mathbf w_2=\mathbf u_1-\mathbf v$ to get
$$
\begin{align}
\|2\mathbf u_1+\mathbf u_2\|^2+\|\mathbf u_2+2\mathbf v\|^2
&=2\|\mathbf u_1+\mathbf u_2+\mathbf v\|^2+2\|\mathbf u_1-\mathbf v\|^2
\end{align}
$$
Similarly take $\mathbf w_1=\mathbf u_1+\mathbf u_2-\mathbf v$ and $\mathbf w_2=\mathbf u_1+\mathbf v$ to get
$$
\begin{align}
\|2\mathbf u_1+\mathbf u_2\|^2+\|\mathbf u_2-2\mathbf v\|^2
&=2\|\mathbf u_1+\mathbf u_2-\mathbf v\|^2+2\|\mathbf u_1+\mathbf v\|^2
\end{align}
$$
So we have from $(\dagger)$
$$
2\text{LHS}=\|\mathbf u_2+2\mathbf v\|^2-\|\mathbf u_2-2\mathbf v\|^2\overset{?}{=}2\|\mathbf u_2+\mathbf v\|^2-2\|\mathbf u_2-\mathbf v\|^2=2\text{RHS}\quad(\ddagger)
$$
We apply the parallelogram law twice again, first with $\mathbf w_1=\mathbf u_2+\mathbf v$ and $\mathbf w_2=\mathbf v$ to get
$$
\| \mathbf u_2+2\mathbf v\|^2+\|\mathbf u_2\|^2=2\|\mathbf u_2+\mathbf v\|^2+2\|\mathbf v\|^2
$$
With $\mathbf w_1=\mathbf u_2-\mathbf v$ and $\mathbf w_2=-\mathbf v$ we get
$$
\| \mathbf u_2-2\mathbf v\|^2+\|\mathbf u_2\|^2=2\|\mathbf u_2-\mathbf v\|^2+2\|\mathbf v\|^2
$$
We can now verify that $(\ddagger)$ is true:
$$
\begin{align}
\|\mathbf u_2+2\mathbf v\|^2-\|\mathbf u_2-2\mathbf v\|^2
&=\bigl(-\|\mathbf u_2\|^2+2\|\mathbf u_2+\mathbf v\|^2+2\|\mathbf v\|^2\bigr)\\
&\qquad-\bigl(-\|\mathbf u_2\|^2+2\|\mathbf u_2-\mathbf v\|^2+2\|\mathbf v\|^2\bigr)\\
&=2\|\mathbf u_2+\mathbf v\|^2-2\|\mathbf u_2-\mathbf v\|^2
\end{align}
$$
The same argument gives the result with $\text{Im}(\langle \textbf u_1+\textbf u_2,\textbf v\rangle)$ and thus we get
\begin{align*}
\langle \textbf u_1+\textbf u_2,\textbf v\rangle
&=\text{Re}(\langle \textbf u_1+\textbf u_2,\textbf v\rangle)+i\text{Im}(\langle \textbf u_1+\textbf u_2,\textbf v\rangle)\\
&=\text{Re}(\langle \textbf u_1,\textbf v\rangle)+\text{Re}(\langle \textbf u_2,\textbf v\rangle)+i\text{Im}(\langle \textbf u_1,\textbf v\rangle)+i\text{Im}(\langle \textbf u_2,\textbf v\rangle)\\
&=\bigl(\text{Re}(\langle \textbf u_1,\textbf v\rangle)+i\text{Im}(\langle \textbf u_1,\textbf v\rangle)\bigr)+
\bigl(\text{Re}(\langle \textbf u_2,\textbf v\rangle)+i\text{Im}(\langle \textbf u_2,\textbf v\rangle)\bigr)\\
&=\langle \textbf u_1,\textbf v\rangle+\langle \textbf u_2,\textbf v\rangle
\end{align*}
* **Homogeneity** This is the challenging part. There is no direct proof that I know. What we do know from additivity is that $\langle n\mathbf u,\mathbf v\rangle=n\langle\mathbf u,\mathbf v\rangle$ for $n=1,2,3,\ldots$ It is trivial to see that this holds for $n=0$ and to extend to negative integers. From this it is easy to extend to reciprocals and then to rationals. Using conjugate symmetry we can extend to $\mathbb Q(i)=\{r+is\mid r,s\in\mathbb Q\}$, the ***field of complex rationals***.
$$
\text{For all }\alpha\in\mathbb Q(i), \langle \alpha\mathbf u,\mathbf v\rangle=\alpha\langle\mathbf u,\mathbf v\rangle=\langle\mathbf u,\bar\alpha\mathbf v\rangle
$$
At this point, the techniques of algebra fail us. The trick here is to note that $f_{\mathbf u,\mathbf v}:\mathbb C\to\mathbb C$ defined by $f_{\mathbf u,\mathbf v}(\alpha)=\langle \alpha\mathbf u,\mathbf v\rangle$ is continuous, since norms are continuous, and hence if $(\beta_i)_{i=0}^\infty$ is a sequence from $\mathbb Q(i)$ so that $\lim_{i\to\infty}\beta_i=\alpha$, then
$$
\begin{split}
\langle\alpha\mathbf u,\mathbf v\rangle=f_{\mathbf u,\mathbf v}(\alpha)&=f_{\mathbf u,\mathbf v}\bigl(\lim_{i\to\infty}\beta_i\bigr)=\lim_{i\to\infty}f_{\mathbf u,\mathbf v}(\beta_i)=\lim_{i\to\infty}\langle\beta_i\mathbf u,\mathbf v\rangle\\
&=\lim_{i\to\infty}\beta_i\langle\mathbf u,\mathbf v\rangle=\Bigl(\lim_{i\to\infty}\beta_i\Bigr)\langle\mathbf u,\mathbf v\rangle=\alpha\langle\mathbf u,\mathbf v\rangle
\end{split}
$$
* **Positive definite:** We already showed that $\langle \mathbf u, \mathbf u\rangle=\|\mathbf u\|^2$, so positive-definiteness follows.
:::
# Orthogonality
Let $\langle\cdot,\cdot\rangle$ be an inner-product on $V$. For $\mathbf{u,v}\in V$ define $\mathbf{u,v}$ to be ***orthogonal*** iff $\langle\mathbf{u,v}\rangle=0$. Denote this by $\mathbf u\perp\mathbf v$. The reason for defining orthogonality follows from the Pythagorean Theorem.
## Pythagorean Theorem
The Pythagorean Theorem holds for an arbitrary inner product $\langle\cdot,\cdot\rangle$ on $V$: if $\mathbf p\perp\mathbf q$, then $\|\mathbf p+\mathbf q\|^2=\|\mathbf p\|^2+\|\mathbf q\|^2$.
**Note** This is immediate from the *polarization* identity above, but here is a simple argument:
Suppose $\langle \mathbf p,\mathbf q\rangle=0$, then
\begin{align}
\|\mathbf p+\mathbf q\|^2
&=\langle \mathbf p+\mathbf q,\mathbf p+\mathbf q\rangle\\
&=\|\mathbf p\|^2+\langle \mathbf p,\mathbf q\rangle+\langle \mathbf q,\mathbf p\rangle+\|\mathbf q\|^2\\
&=\|\mathbf p\|^2+0+0+\|\mathbf q\|^2\\
&= \|\mathbf p\|^2+\|\mathbf q\|^2\\
\end{align}
❏
**Remark** Notice that for complex vector spaces $\|\mathbf p+\mathbf q\|^2=\|\mathbf p\|^2+\|\mathbf q\|^2$ is not enough to guarantee that $\mathbf p\perp\mathbf q$. In particular:
\begin{align}
\|\mathbf p+\mathbf q\|^2&=\|\mathbf p\|^2+\|\mathbf q\|^2+2\text{Re}(\langle\mathbf p,\mathbf q\rangle)\\
\|\mathbf p+i\mathbf q\|^2&=\|\mathbf p\|^2+\|\mathbf q\|^2+2\text{Im}(\langle\mathbf p,\mathbf q\rangle)
\end{align}
For $\mathbf p\perp\mathbf q$ to obtain we must have both $\text{Re}(\langle\mathbf p,\mathbf q\rangle)=0$ and $\text{Im}(\langle\mathbf p,\mathbf q\rangle)=0$.
**Example** Let $\mathbf v = (a, b)$ and $\mathbf u=(ci,di)$ be vectors in $\mathbb C^2$ with $a,b,c,d\in\mathbb R$, then $\|v\|^2=a^2+b^2$, $\|u\|^2=c^2+d^2$ and
\begin{align}
\|\mathbf v+\mathbf u\|^2&=\|(a+ci,b+di)\|^2\\
&=|a+ci|^2+|b+di|^2\\
&=(a^2+c^2)+(b^2+d^2)\\
&=\|\mathbf v\|^2+\|\mathbf u\|^2\\
\|\mathbf v+i\mathbf u\|^2
&=\|(a-c,b-d)\|^2\\
&=(a-c)^2+(b-d)^2\\
&=\|\mathbf v\|^2+\|i\mathbf u\|^2-2(ac+bd)\\
&\neq\|\mathbf v\|^2+\|i\mathbf u\|^2
=\|\mathbf v\|^2+\|\mathbf u\|^2\quad\text{(provided $ac+bd\neq 0$)}
\end{align}
However, we do have that:
\begin{split}
\mathbf p\perp\mathbf q\iff &\|\mathbf p+\mathbf q\|^2=\|\mathbf p+i\mathbf q\|^2=\|\mathbf p\|^2+\|\mathbf q\|^2
\end{split}
## Projections
Suppose $S\subseteq V$ and $\mathbf p\in V$. Define
$$
d(\mathbf p,S)=\inf\{d(\mathbf p,\mathbf q')\mid \mathbf q'\in S\}
$$
**Claim** When $S$ is a subspace of $V$, there is a unique $\mathbf q\in S$ so that $d(\mathbf p,\mathbf q)=d(\mathbf p,S)$.

This $\mathbf q$ is called the ***projection of $\mathbf p$ into $S$*** and is denoted $\text{proj}_S(\mathbf p)$.
:::spoiler **Proof**
WLOG we may assume $\mathbf p\not\in S$, since positive definiteness guarantees that for $\mathbf p\in S$, $\mathbf p$ is the unique projection of $\mathbf p$ into $S$.
Suppose, towards a contradiction, that $\mathbf q_0 \neq \mathbf q_1$ both satisfy $d(\mathbf p,\mathbf q_i)=d(\mathbf p,S)$.

Consider $\mathbf q= t\mathbf q_0+(1-t)\mathbf q_1$ for $0\le t\le 1$, then
\begin{align*}
\|\mathbf p - \mathbf q\|
&=\|\mathbf p-(t\mathbf q_0+(1-t)\mathbf q_1)\|\\
&=\|\mathbf p+t\mathbf p-t\mathbf q_0 -t\mathbf p-(1-t)\mathbf q_1\|\\
&=\|t(\mathbf p-\mathbf q_0)+(1-t)(\mathbf p - \mathbf q_1)\|\\
&\le t\|\mathbf p-\mathbf q_0\|+(1-t)\|\mathbf p-\mathbf q_1\|\\
&=t d(\mathbf p,S)+(1-t) d(\mathbf p,S)\\
&=d(\mathbf p,S)
\end{align*}
Since strict inequality can't happen it must be that $d(\mathbf p,S)=d(\mathbf p,\mathbf q)$ for all $\mathbf q$ on the line segment $t\mathbf q_0+(1-t)\mathbf q_1$ through $\mathbf q_0$ and $\mathbf q_1$. Moreover, equality can only happen above in the case that $\mathbf p-\mathbf q_0=\lambda(\mathbf p-\mathbf q_1)$ (note $\lambda\neq 1$, since otherwise $\mathbf q_0=\mathbf q_1$), but then we have $(1-\lambda)\mathbf p=\mathbf q_0-\lambda\mathbf q_1$ so $\mathbf p=\frac{1}{1-\lambda}\mathbf q_0+\Bigl(-\frac{\lambda}{1-\lambda}\Bigr)\mathbf q_1$, but $\frac{1}{1-\lambda}+\Bigl(-\frac{\lambda}{1-\lambda}\Bigr)=1$. So $\mathbf p$ would itself be on this line segment and hence in $S$. This is the desired contradiction.
❏
:::
>
We need to see how to find $\text{proj}_S(\mathbf p)$, the key is the following:
**Claim** If $\mathbf q\in S$ and $\langle \mathbf p-\mathbf q,\mathbf q'\rangle = 0$ for all $\mathbf q'\in S$, then
$\|\mathbf p-\mathbf q\|\le\|\mathbf p-\mathbf q'\|$ for all $\mathbf q'\in S$.
Stated slightly differently, if $\mathbf q\in S$ and $\mathbf p-\mathbf q\perp S$, then $\mathbf q=\text{proj}_S(\mathbf p)$. (This is what the picture above is intended to convey.)
**Proof** Let $\mathbf q\in S$ and $\mathbf p-\mathbf q\perp S$, then for any other $\mathbf q'\in S$:
\begin{align}
\|\mathbf p-\mathbf q'\|^2&=
\|(\mathbf p-\mathbf q)+(\mathbf q-\mathbf q')\|^2\\
&=\|(\mathbf p-\mathbf q)\|^2+\|(\mathbf q-\mathbf q')\|^2&&\text{(Pythagorean Theorem)}\\
&\ge \|\mathbf p-\mathbf q\|^2
\end{align}
Thus we have $d(\mathbf p,\mathbf q)\le d(\mathbf p,\mathbf q')$ for all $\mathbf q'\in S$ and hence $\mathbf q=\text{proj}_S(\mathbf p)$.
❏
**Example** For $\mathbf v,\mathbf u\in V$, define
$$
\mathbf q
=\frac{\langle \mathbf v,\mathbf u\rangle}{\langle \mathbf u,\mathbf u\rangle}\mathbf u
=\frac{\langle \mathbf v,\mathbf u\rangle}{\|\mathbf u\|^2}\mathbf u
=\frac{\langle \mathbf v,\mathbf u\rangle}{\|\mathbf u\|}\hat{\mathbf u}
$$
we see
\begin{split}
\langle\mathbf v-\mathbf q,\mathbf u\rangle&=\langle \mathbf v,\mathbf u\rangle
-\langle \mathbf q,\mathbf u\rangle=
\langle \mathbf v,\mathbf u\rangle
-\left\langle\textstyle{\frac{\langle \mathbf v,\mathbf u\rangle}{\langle \mathbf u,\mathbf u\rangle}\mathbf u},\mathbf u\right\rangle
\\
&=\langle \mathbf v,\mathbf u\rangle
-\textstyle{\frac{\langle \mathbf v,\mathbf u\rangle}{\langle \mathbf u,\mathbf u\rangle}}\left\langle\mathbf u,\mathbf u\right\rangle\\
&=\langle \mathbf v,\mathbf u\rangle-\langle \mathbf v,\mathbf u\rangle=0
\end{split}
So $\mathbf q$ is the point in the line $\ell={\mathbb R}\mathbf u$ closest to $\mathbf v$.
**Definition** $\text{proj}_{\mathbf u}^\perp(\mathbf v)=\frac{\langle \mathbf v,\mathbf u\rangle}{\langle \mathbf u,\mathbf u\rangle}\mathbf u$ is the orthogonal projection of $\mathbf v$ onto the subspace spanned by $\mathbf u$, that is, the *line* in the direction of $\mathbf u$.
There are a few useful things to note here:
1. $\|\text{proj}^\perp_{\mathbf u}(\mathbf v)\|=\left\|\frac{\langle \mathbf v,\mathbf u\rangle}{\|\mathbf u\|^2}\mathbf u\right\|=\frac{|\langle \mathbf v,\mathbf u\rangle|}{\|\mathbf u\|^2}\|\mathbf u\|=\frac{|\langle \mathbf v,\mathbf u\rangle|}{\|\mathbf u\|}$
2. In the real case, the sign of $\langle \mathbf v,\mathbf u\rangle$ determines if the projection is in the same direction as $\mathbf u$ or the opposite.
3. If $\mathbf u$ is a unit vector, then $\text{proj}_{\mathbf u}^\perp(\mathbf v)=\langle \mathbf v,\mathbf u\rangle \mathbf u$.

**Important Point** We are using the above picture, but it works just as well if I want to find the projection of $f(x)=x^2$ onto $u(x)=\sin(\pi x)$ where $V=C([0,1],\mathbb R)$ and the inner product is $\langle g, h\rangle=\int_0^1 g(x)h(x)\,dx$.
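For instance, here is a quick numerical version of that projection (a sketch assuming SciPy; it simply evaluates $\frac{\langle f,u\rangle}{\langle u,u\rangle}u$ with `quad`):
```python
import numpy as np
from scipy.integrate import quad

f = lambda x: x**2
u = lambda x: np.sin(np.pi * x)

ip = lambda g, h: quad(lambda x: g(x) * h(x), 0, 1)[0]   # <g, h> = int_0^1 g h dx

c = ip(f, u) / ip(u, u)     # coefficient of the projection onto span{u}
print(c)                    # proj_u(f)(x) = c*sin(pi x), with c = 2(pi^2 - 4)/pi^3 ≈ 0.3786
```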
**Example** Find the projection of $(1,2)$ onto $(3,-1)$ with the standard inner product. Next show that $\langle \mathbf u,\mathbf v\rangle_A=\mathbf v^TA\mathbf u$ is an inner-product on $\mathbb R^2$ where
$$
A = \begin{bmatrix}2&0\\0&4\end{bmatrix}
$$
Find the projection of $(1,2)$ onto $(3,-1)$ with the inner product $\langle\cdot,\cdot\rangle_A$. What is the unit circle under $\|\cdot\|_A$? ([Desmos](https://www.desmos.com/calculator/24ucypac7f))
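A sketch of these two computations (assuming NumPy; it just applies $\text{proj}_{\mathbf u}^\perp(\mathbf v)=\frac{\langle\mathbf v,\mathbf u\rangle}{\langle\mathbf u,\mathbf u\rangle}\mathbf u$ with each inner product):
```python
import numpy as np

v = np.array([1.0, 2.0])
u = np.array([3.0, -1.0])
A = np.diag([2.0, 4.0])

ip_std = lambda a, b: b @ a        # standard inner product <a, b> = b^T a
ip_A   = lambda a, b: b @ A @ a    # weighted inner product <a, b>_A = b^T A a

proj = lambda v, u, ip: (ip(v, u) / ip(u, u)) * u

print(proj(v, u, ip_std))   # standard geometry: (3/10, -1/10)
print(proj(v, u, ip_A))     # <.,.>_A geometry: (-3/11, 1/11)
```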
**Example** Find the projection of $f(x)=\sqrt{x}$ onto $g(x)=x^2$ with the inner product $\langle f(x), h(x)\rangle=\int_0^1 f(x)h(x)\,dx$. ([Desmos](https://www.desmos.com/calculator/f2zierbmex))
## Projections and Orthogonal Complements
**Definition** An ***orthogonal basis*** for a vector space $V$ is a basis $\cal B$ so that if $\mathbf u,\mathbf v\in \cal B$ are distinct, then $\langle\mathbf u,\mathbf v\rangle=0$. An orthogonal basis $\cal B$ is ***orthonormal*** iff every $\mathbf u\in \cal B$ is a unit vector.
### Orthonormal Basis
Let $U\subseteq V$ be a subspace of the inner product space $V$ and ${\cal B}=\{\mathbf u_1,\ldots,\mathbf u_k\}$ be an orthonormal basis for $U$. For $\mathbf u\in U$,
$$
[\mathbf u]_{\cal B}=\begin{bmatrix}
\langle \mathbf u,\mathbf u_1\rangle\\\langle \mathbf u,\mathbf u_2\rangle\\\vdots\\\langle \mathbf u,\mathbf u_k\rangle
\end{bmatrix}
$$
To see this suppose $\mathbf u=\sum_{i=1}^k\alpha_i\mathbf u_i$, then
$$
\langle \mathbf u,\mathbf u_i\rangle=\left\langle\sum_{j=1}^k\alpha_j \mathbf u_j,\mathbf u_i\right\rangle=\sum_{j=1}^k\alpha_j\langle \mathbf u_j,\mathbf u_i\rangle
=\sum_{j=1}^k\alpha_j\begin{cases}1&i=j\\0&i\neq j\end{cases}=\alpha_i
$$
Moreover we have, by the Pythagorean Theorem, that for $u\in U$
$$
||\mathbf u||^2=\sum_{j=1}^k|\langle \mathbf u,\mathbf u_j\rangle|^2=\sum_{j=1}^k|\alpha_j|^2
$$
that is, if
$$
[\mathbf u]_{\cal B}=\begin{bmatrix}\alpha_1\\\vdots\\\alpha_k\end{bmatrix}
$$
then $||\mathbf u||^2=||[\mathbf u]_{\cal B}||^2$. In fact we actually have,
**Claim** For $\mathbf u,\mathbf v\in U$, $\langle \mathbf u, \mathbf v\rangle = [\mathbf u]_{\cal B}\bullet[\mathbf v]_{\cal B}$.
So the ***abstract*** inner product of $U$ maps directly to the standard inner product of $\mathbb R^k$ and hence the ***geometry*** of $U$ and the geometry of $\mathbb R^k$ are the same!
**Example** Let
$$
\mathbf u_1=\frac{1}{\sqrt{2}}(1,1)
\quad\text{and}\quad \mathbf u_2=\frac{1}{\sqrt{2}}(1,-1)
$$
So ${\cal B}=\{\mathbf u_1,\mathbf u_2\}$ is an orthonormal basis for $\mathbb R^2$. Represent $\mathbf v=(5,-4)$ in terms of $\cal B$.
\begin{align}
\mathbf v&=\langle \mathbf v,\mathbf u_1\rangle \mathbf u_1 +\langle \mathbf v,\mathbf u_2\rangle \mathbf u_2\\
&=\frac{1}{\sqrt{2}}\mathbf u_1+\frac{9}{\sqrt{2}}\mathbf u_2
\end{align}
so
$$
[\mathbf v]_{\cal B}=\begin{bmatrix}
\frac{1}{\sqrt{2}}\\\frac{9}{\sqrt{2}}
\end{bmatrix}
$$
and $\|\mathbf v\|^2=5^2+(-4)^2=\Bigl(\frac{1}{\sqrt{2}}\Bigr)^2+\Bigl(\frac{9}{\sqrt{2}}\Bigr)^2=41$
### Orthogonal Projection Revisited
Recall that given a subspace $S\subseteq V$ and $\mathbf v\in V$, the unique point $\mathbf q\in S$ minimizing the distance from $\mathbf v$ to $S$ is the point such that $\mathbf v-\mathbf q\perp S$. We called $\mathbf q$ ***the orthogonal projection of $\mathbf v$ onto $S$*** and denote this by $\text{proj}_S(\mathbf v)$. If ${\cal B}=\{\mathbf u_1,\ldots,\mathbf u_k\}$ is an orthonormal basis for $S$, then
\begin{align*}
\text{proj}_S(\mathbf v)&=\langle \mathbf v,\mathbf u_1\rangle\mathbf u_1 +\langle \mathbf v,\mathbf u_2\rangle \mathbf u_2 + \cdots+\langle \mathbf v,\mathbf u_k\rangle \mathbf u_k&&(\dagger)\\
&=\text{proj}_{\mathbf u_1}(\mathbf v)+\text{proj}_{\mathbf u_2}(\mathbf v)+\cdots+\text{proj}_{\mathbf u_k}(\mathbf v)
\end{align*}
To verify this we need only verify:
**Claim** Let $S\subseteq V$ have an orthonormal basis ${\cal B}=\{\mathbf u_1,\ldots,\mathbf u_k\}$ and let $\mathbf v\in V$. If $\mathbf q=\sum_{i=1}^k\langle \mathbf v,\mathbf u_i\rangle\mathbf u_i$, then $\mathbf v-\mathbf q\perp S$. So $\mathbf q=\text{proj}_S(\mathbf v)$.
**Proof** This is a computation using the properties of the inner product. First notice that it suffices to show that $\mathbf v-\mathbf q\perp \mathbf u_i$ for each $i$. (Verify this!)
\begin{align}
\left\langle \mathbf v-\text{proj}_S(\mathbf v),\mathbf u_i\right\rangle
&=\left\langle \mathbf v-\sum_{j=1}^k \langle \mathbf v,\mathbf u_j\rangle \mathbf u_j,\mathbf u_i\right\rangle\\
&=\langle \mathbf v,\mathbf u_i\rangle-\sum_{j=1}^k\langle \mathbf v,\mathbf u_j\rangle \langle \mathbf u_j,\mathbf u_i\rangle\\
&=\langle \mathbf v,\mathbf u_i\rangle-\langle \mathbf v,\mathbf u_i\rangle=0
\end{align}
❏
**Theorem** If $S\subseteq V$ is a subspace and $\dim(S)=k$, then $S$ has an orthonormal basis ${\cal B}=\{\mathbf u_1,\ldots,\mathbf u_k\}$.
**Proof** This is in essence Gram-Schmidt which we will return to later. The argument is simple: take any basis ${\cal B}=\{\mathbf u_1,\ldots,\mathbf u_k\}$ for $S$ and transform it one vector at a time to an orthonormal basis. We may assume all the $\mathbf u_i$'s are unit vectors. Suppose $\mathbf u_1,\ldots,\mathbf u_l$ are orthonormal, then let $\mathbf u_{l+1}'=\mathbf u_{l+1}-\text{proj}_{\text{span}\{\mathbf u_1,\ldots,\mathbf u_{l}\}}(\mathbf u_{l+1})$. Since $\{\mathbf u_1,\ldots,\mathbf u_l,\mathbf u_{l+1}\}$ is independent, we know $\mathbf u_{l+1}'\neq \mathbf 0$. Now just replace $\mathbf u_{l+1}$ with the normalized version of $\mathbf u_{l+1}'$.
❏
**Theorem** If $V$ is a finite dimensional inner product space and $U\subseteq V$ is a subspace, then $U^\perp=\{\mathbf v\in V\mid \mathbf v\perp U\}$ is a subspace of $V$ and $U\oplus U^\perp=V$.
**Proof** That $U^\perp$ is a subspace is trivial and we have proved this previously, here it is again. Let $\mathbf v,\mathbf v'\in U^\perp$, we must show $\alpha \mathbf v+\alpha' \mathbf v'\in U^\perp$. This follows from properties of inner products. For all $u\in U$,
$$
\langle \alpha \mathbf v+\alpha' \mathbf v',u\rangle=\alpha\langle \mathbf v,\mathbf u\rangle+\alpha'\langle \mathbf v',\mathbf u\rangle
=\alpha(0)+\alpha'(0)=0.
$$
We have done the hard work above to show that every $\mathbf v\in V$ can be written as $\mathbf v'+\text{proj}_U(\mathbf v)$ where $\mathbf v'\perp U$, and so $V=U+U^\perp$.
That $U\cap U^\perp$ is trivial: suppose $\mathbf u\in U\cap U^\perp$; then $\langle \mathbf u,\mathbf u\rangle=0$ and so $\mathbf u=\mathbf 0$.
❏
**Example** Find an orthonormal basis for $S=\text{span}\{\mathbf v_1,\mathbf v_2,\mathbf v_3\}$ where
$$
\mathbf v_1 = \begin{bmatrix}
1\\2\\2\\0
\end{bmatrix}\quad
\mathbf v_2 = \begin{bmatrix}
2\\0\\2\\1
\end{bmatrix}\quad
\mathbf v_3 = \begin{bmatrix}
0\\1\\2\\2
\end{bmatrix}
$$
Find $\text{proj}_S(\mathbf v)$ where $\mathbf v=\begin{bmatrix}1\\1\\1\\1\end{bmatrix}$ and for practice give a plain English interpretation of what you have found here.
:::spoiler **Solution**
1. Normalize $\mathbf v_1$ by dividing by $\|\mathbf v_1\|=\sqrt{1+4+4}=3$:
$$
\mathbf q_1 = \begin{bmatrix}1/3\\2/3\\2/3\\0
\end{bmatrix}=\frac{1}{3} \begin{bmatrix}
1\\2\\2\\0
\end{bmatrix}
$$
2. Project $\mathbf v_2$ onto $\mathbf q_1$
$$
\text{proj}_{\{\mathbf q_1\}}(\mathbf v_2)=
\langle \mathbf v_2, \mathbf q_1\rangle \mathbf q_1=2\mathbf q_1
$$
Then
$$
\mathbf q'_2=\mathbf v_2-\text{proj}_{\{\mathbf q_1\}}(\mathbf v_2)
=\begin{bmatrix}2\\0\\2\\1\end{bmatrix}
-2\begin{bmatrix}1/3\\2/3\\2/3\\0\end{bmatrix}
=\begin{bmatrix}4/3\\-4/3\\2/3\\1\end{bmatrix}
$$
Do a quick check that $\mathbf q_2'\perp \mathbf q_1$. Normalize to get
$$
\mathbf q_2= \frac{1}{\sqrt{5}}\mathbf q_2'=
\frac{1}{3\sqrt{5}}\begin{bmatrix}4\\-4\\2\\3\end{bmatrix}
$$
3. Project $\mathbf v_3$ onto $\text{span}\{\mathbf q_1,\mathbf q_2\}$.
\begin{align}
\text{proj}_{\{\mathbf q_1,\mathbf q_2\}}(\mathbf v_3)&=
\langle \mathbf v_3, \mathbf q_1\rangle \mathbf q_1+\langle \mathbf v_3, \mathbf q_2\rangle \mathbf q_2
=2\mathbf q_1+\frac{2}{\sqrt{5}}\mathbf q_2\\
&=
\frac{2}{3}\begin{bmatrix}1\\2\\2\\0\end{bmatrix}+\frac{2}{15}\begin{bmatrix}4\\-4\\2\\3\end{bmatrix}=
\frac{2}{15}\begin{bmatrix}5\\10\\10\\0\end{bmatrix}+\frac{2}{15}\begin{bmatrix}4\\-4\\2\\3\end{bmatrix}=
\frac{2}{5}\begin{bmatrix}3\\2\\4\\1\end{bmatrix}
\end{align}
So
$$
\mathbf q_3'=\begin{bmatrix}0\\1\\2\\2\end{bmatrix}-\frac{2}{5}\begin{bmatrix}3\\2\\4\\1\end{bmatrix}=
\frac{1}{5}\begin{bmatrix}-6\\1\\2\\8\end{bmatrix}
$$
Again, you should check that this is orthogonal to both $\mathbf q_1$ and $\mathbf q_2$. So
$$
\mathbf q_3=\frac{1}{\sqrt{105}}\begin{bmatrix}-6\\1\\2\\8\end{bmatrix}
$$
Finally,
\begin{align}
\text{proj}_S(\mathbf v)&=\langle \mathbf v,\mathbf q_1\rangle\mathbf q_1
+\langle \mathbf v,\mathbf q_2\rangle\mathbf q_2
+\langle \mathbf v,\mathbf q_3\rangle\mathbf q_3\\
&=\frac{5}{9}\begin{bmatrix}1\\2\\2\\0\end{bmatrix}
+\frac{5}{45}\begin{bmatrix}4\\-4\\2\\3\end{bmatrix}
+\frac{5}{105}\begin{bmatrix}-6\\1\\2\\8\end{bmatrix}
=\frac{1}{63}\begin{bmatrix}45\\45\\90\\45\end{bmatrix}=\frac{5}{7}\begin{bmatrix}1\\1\\2\\1\end{bmatrix}
\end{align}
Recall the interpretation of this: $\text{proj}_S(\mathbf v)$ is the unique vector $\mathbf q\in S$ so that $d(\mathbf v,\mathbf q)=d(\mathbf v,S)$, that is, $\mathbf q$ is the unique point in $S$ closest to $\mathbf v$ in the $\|\cdot\|_2$ norm, i.e., the standard $2$-norm.
:::
>
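Here is a sketch that reproduces this computation numerically (assuming NumPy; the classical Gram-Schmidt loop below is written for clarity, not numerical robustness):
```python
import numpy as np

V = [np.array([1., 2., 2., 0.]),
     np.array([2., 0., 2., 1.]),
     np.array([0., 1., 2., 2.])]

Q = []
for v in V:                                  # classical Gram-Schmidt
    w = v - sum((v @ q) * q for q in Q)      # subtract projections onto earlier q's
    Q.append(w / np.linalg.norm(w))

p = np.array([1., 1., 1., 1.])
proj = sum((p @ q) * q for q in Q)           # proj_S(p) using the orthonormal basis
print(proj)                                  # = (1/63)(45, 45, 90, 45) = (5/7)(1, 1, 2, 1)
```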
**Example** Consider $V=C([0,1])$. This is an inner product space with $\langle f,g\rangle =\int_0^1fg\,dx$. (Check the axioms!)
**Thought Question 1** What does $\|f\|_2$ represent geometrically? As a warm up what does $\|f\|_1=\int_0^1|f|\,dx$ represent? What about $\|f-g\|_1$ and $\|f-g\|_2$?
**Thought Question 2** Given functions $\{f_1,\ldots,f_n\}$ from $C([0,1])$ and $g\in C([0,1])$. Let $S=\text{span}\{f_1,\ldots,f_n\}$ what does $\text{proj}_S(g)$ represent intuitively?
Find an orthonormal basis for $P_3$ (the polynomials of degree at most $3$). Let $g(x)=\cos(3\pi x)$ and find $\hat g=\text{proj}_{P_3}(g)$. Plot $g$ and $\hat g$ and describe geometrically what is being minimized when we say $\hat g$ is the *closest* point to $g$ in $P_3$.
:::spoiler **Solution**
There is some work here. A basis for $P_3$ is given by $\{1,x,x^2,x^3\}$; we can use Gram-Schmidt to convert this to an orthonormal basis.
1. $u_0(x)=1$ (this has unit length already)
2. Find the second vector of the orthonormal basis.
$$
u'_1(x)=x-\int_0^1t\cdot 1\,dt=x-1/2.
$$
$$
\|u_1'(x)\|^2=\int_0^1(t-1/2)^2\,dt=\int_0^1(t^2-t+1/4)\,dt=\frac{1}{3}-\frac{1}{2}+\frac{1}{4}=\frac{1}{12}
$$
and so $u_1(x)=\sqrt{12}(x-1/2)$.
3. Find the third vector of the orthonormal basis.
\begin{align}
u'_2(x)&=x^2-\langle x^2,u_0(x)\rangle u_0(x)-\langle x^2,u_1(x)\rangle u_1(x)\\
&=x^2-\left(\int_0^1t^2\,dt\right)\cdot (1)-
\left(\int_0^1t^2\sqrt{12}(t-1/2)\,dt\right)\cdot(\sqrt{12}(x-1/2))\\
&=x^2-x+1/6
\end{align}
and
$$
\|u'_2(x)\|^2=\langle u'_2,u'_2\rangle=\int_0^1 (t^2-t+1/6)^2\,dt=\frac{1}{180}
$$
So
$$
u_2(x)=\sqrt{180}(x^2-x+1/6)
$$
4. Find the fourth vector of the orthonormal basis. This is getting messy enough that I let the machine do the integrals.
\begin{align}
u'_3(x)&=x^3-\langle x^3,u_0(x)\rangle u_0(x)-\langle x^3,u_1(x)\rangle u_1(x)-\langle x^3,u_2(x)\rangle u_2(x)\\
&=x^3-\left(\int_0^1t^3\,dt\right)\cdot (1)-
\left(\int_0^1t^3\sqrt{12}(t-1/2)\,dt\right)\cdot(\sqrt{12}(x-1/2))\\
&\quad-
\left(\int_0^1t^3\sqrt{180}(t^2-t+1/6)\,dt\right)\cdot(\sqrt{180}(x^2-x+1/6))\\
&=x^3-\frac{3}{2}x^2+\frac{3}{5}x-\frac{1}{20}
\end{align}
and
$$
\|u'_3(x)\|^2=\langle u'_3,u'_3\rangle=\int_0^1 \left(t^3-\frac{3}{2}t^2+\frac{3}{5}t-\frac{1}{20}\right)^2\,dt=\frac{1}{2800}
$$
So
$$
u_3(x)=\sqrt{2800}\left(x^3-\frac{3}{2}x^2+\frac{3}{5}x-\frac{1}{20}\right)
$$
5. Find the projection of $g$ onto $P_3$.
$$
\hat g=\langle g,u_0\rangle u_0+\langle g,u_1\rangle u_1+\langle g,u_2\rangle u_2+\langle g,u_3\rangle u_3
$$
where
\begin{align}
\langle g,u_0\rangle&=\int_0^1\cos(3\pi t)\,dt=0\\
\langle g,u_1\rangle&=\int_0^1\cos(3\pi t)\sqrt{12}\left(t-\frac{1}{2}\right)\,dt=-\frac{4\sqrt{3}}{9\pi^2}\approx -0.078\\
\langle g,u_2\rangle&=\int_0^1\cos(3\pi t)\sqrt{180}\left(t^2-t+\frac{1}{6}\right)\,dt=0\\
\langle g,u_3\rangle&=\int_0^1\cos(3\pi t)\sqrt{2800}\left(t^3-\frac{3}{2}t^2+\frac{3}{5}t-\frac{1}{20}\right)\,dt=-\frac{8 \, {\left(9 \, \sqrt{7} \pi^{2} - 10 \, \sqrt{7}\right)}}{27 \, \pi^{4}}\approx -0.6343\\
\end{align}
So
\begin{align}
\hat g&=-\frac{4\sqrt{3}}{9\pi^2}\sqrt{12}\left(x-\frac{1}{2}\right)-\frac{8 \, {\left(9 \, \sqrt{7} \pi^{2} - 10 \, \sqrt{7}\right)}}{27 \, \pi^{4}}\sqrt{2800}\left(x^3-\frac{3}{2}x^2+\frac{3}{5}x-\frac{1}{20}\right)\\
&=\left(\frac{11200}{27\pi^4}-\frac{1120}{3\pi^2}\right)x^3+\left(\frac{560}{\pi^2}-\frac{5600}{9\pi^4}\right)x^2+\left(\frac{2240}{9\pi^4}-\frac{680}{3\pi^2}\right)x+\left(\frac{20}{\pi^2}-\frac{560}{27\pi^4}\right)\\
&\approx -33.568x^3+50.352x^2-20.411x+1.815
\end{align}
[Desmos](https://www.desmos.com/calculator/xpwnoashhi) (Basically the demo below, but set for this problem.)
:::
>
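Here is a sketch of the same computation done symbolically (assuming SymPy; it runs Gram-Schmidt on $\{1,x,x^2,x^3\}$ and then projects $g$):
```python
import sympy as sp

x = sp.symbols('x')
ip = lambda f, g: sp.integrate(f * g, (x, 0, 1))    # <f, g> = int_0^1 f g dx

basis, Q = [1, x, x**2, x**3], []
for p in basis:                                      # Gram-Schmidt on {1, x, x^2, x^3}
    w = sp.expand(p - sum(ip(p, q) * q for q in Q))
    Q.append(sp.simplify(w / sp.sqrt(ip(w, w))))

g = sp.cos(3 * sp.pi * x)
ghat = sum(ip(g, q) * q for q in Q)                  # proj_{P_3}(g)
print(sp.N(sp.expand(ghat), 5))                      # ≈ -33.568*x**3 + 50.352*x**2 - 20.411*x + 1.815
```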
### Demos
|Demo |Brief description |
|:-:|-|
|[Desmos](https://www.desmos.com/calculator/xpwnoashhi)|This is interactive and lets you "guess" the coefficients.|
|[MATLAB](https://ketchers.github.io/Teaching/345/GSInCont.mlx)|MATLAB code for the same examples.|
|[SageCell](https://sagecell.sagemath.org/?z=eJxtk91ymzAQhe-Z4R2YlAwSUZ1aor1oS2_9FmEwlpFaGSlCbnCfvis0BEPggtk9q--cGf38rS3KhgzHURyd-DmRXcctaonA3-Mogc9yd7Ud6I63tnYctbkgA_lC9u-Msfr3JjJZPYdSQJmLieq0vaB2xfSv1qEJbPEY8VrtkzLZx9G1olAMyeeQOBCYYD_3MoLp8-gJBca7Xl6Mkudbdb4qhbCnmadf6Dv_QsGB4mUfHFlwZJMj23YsRkeWzBYMLBhe9nTVh4giRBRTRLEVEUcGljW6Ryw3Mh9GSYAU9siMbjn8nmaBeoHeCcwL7E4ovFDE0SfvJLZSDzA42NoI2fReOTyVRmmHIIkgf_ak0UrbMrP8lBHFW96dKlUfuSqzFBal2QKiS-jGldJvGxxdcWzJaVt3Ld_g2Iorllxzq7sNqlhQYskc1fVjklgQZk3UzZ8PiElBkh3v3U3xMjtp52DLvMuuF_oNnWXby3-8_Ebq3vDGVfDIpC53X8NRu4tCD2l4Ao99-uPn0f5K0nDl73u26uF2za0Iw4dHpOAFD_4UMZlKOpdsLoupFBjj_5-VN74=&lang=sage&interacts=eJyLjgUAARUAuQ==)|This is a simple demo with few bells and whistles|
|[Geogebra](https://www.geogebra.org/3d/zmj3butj)|This demo tries to represent what is going on in function space and in $\mathbb R^3$ simultaneously.|
|[Python/Sage](https://ketchers.github.io/web/MAT345/InnerProduct.html)|This is a webpage with embedded sage that has descriptions of what is going on and lets you play with the results.|
## Least Squares Solutions to Systems
We have studied solving systems in the form $A\mathbf x=\mathbf y$ and we know that it is possible that there is no solution. But now we can be more precise:
* $A\mathbf x=\mathbf y$ has a solution if and only if $\mathbf y\in\text{Img}(A)=\text{CS}(A)$. In this case, there might be more than one solution and in fact, if $\mathbf x_0$ is any particular solution, then the set of all solutions is $\mathbf x_0+\ker(A)$ or, written differently, $\mathbf x_0+\text{NS}(A)$.
* If $A\mathbf x=\mathbf y$ has no solution, there is still $\hat{\mathbf y}\in \text{CS}(A)$ so that $\|\hat{\mathbf y}-\mathbf y\|_2$ is as small as possible, in particular, $\hat{\mathbf y}=\text{proj}_{\text{CS}(A)}(\mathbf y)$. This $\hat{\mathbf y}$ is unique, but there might be more than one $\hat{\mathbf x}$ so that $A\hat{\mathbf x}=\hat{\mathbf y}$. Any such $\hat{\mathbf x}$ is called a ***least squares*** solution to $A\mathbf x=\mathbf y$ and as above, given any such $\hat{\mathbf x}$, the set $\hat{\mathbf x}+\text{NS}(A)$ is the set of all least square solutions.
**Note** If $\mathbf x$ is an "actual" solution, then it is also a least square solution, so the second notion is more general.
Just to emphasize the important point remember:
$$
\hat{\mathbf x}\text{ is a least square solution to }A\mathbf x=\mathbf y\iff\hat{\mathbf x}\text{ is a solution to }A\mathbf x=\hat{\mathbf y}
$$
We have seen one method of computing least squares solutions to $A\mathbf x=\mathbf y$. Here is a method (not the most efficient):
1. Find a basis ${\cal B}=\{\mathbf v_1,\ldots,\mathbf v_k\}$ for $\text{CS}(A)$, e.g., find $\text{rref}(A)$ and then the basis will be the pivot columns of $A$.
2. Take the basis just found and perform Gram-Schmidt to convert this into an orthonormal basis, ${\cal Q}=\{\mathbf q_1,\ldots,\mathbf q_k\}$.
3. Find $\hat{\mathbf y}=\sum_{i=1}^k\langle \mathbf y, \mathbf q_i\rangle\mathbf q_i$.
4. Solve $A\mathbf x=\hat{\mathbf y}$ by Gaussian elimination. Along the way we find a basis for $\text{NS}(A)$ and thereby find the set of all solutions to $A\mathbf x=\hat{\mathbf y}$, namely $\hat{\mathbf x}+\text{NS}(A)$, that is, all least squares solutions to $A\mathbf x=\mathbf y$.
We need a more efficient method. We want $\hat{\mathbf y}=A\hat{\mathbf x}$ for some $\hat{\mathbf x}$ and $\mathbf y-\hat{\mathbf y}\perp \text{CS}(A)$. Since $\mathbf y-\hat{\mathbf y}\perp\text{CS}(A)$ exactly when $\mathbf y-\hat{\mathbf y}$ is orthogonal to every column of $A$, this says $A^T(\mathbf y-\hat{\mathbf y})=A^T(\mathbf y-A\hat{\mathbf x})=\mathbf 0$. This is equivalent to:
\begin{align}
A^TA\hat{\mathbf x}=A^T\mathbf y
\end{align}
This equation is called ***the normal equation***.
Note that $\ker(A^TA)=\ker(A)$ since
\begin{align}
A^TA\mathbf x=\mathbf 0
&\implies \mathbf x^TA^TA\mathbf x= 0\\
&\implies \langle A\mathbf x,A\mathbf x\rangle=0\\
&\implies A\mathbf x=\mathbf 0\\
&\implies A^TA\mathbf x=\mathbf 0\\
\end{align}
So we have $A^TA\mathbf x=\mathbf 0\iff A\mathbf x=\mathbf 0$ or equivalently, $\ker(A^TA)=\ker(A)$. Thus we have
**Fact**: $\hat{\mathbf x}$ is a least-squares solution to $A\mathbf x=\mathbf y$ iff $A^TA\hat{\mathbf x}=A^T\mathbf y$ and any other least-square solution is $\hat{\mathbf x}+\mathbf z$ for some $\mathbf z\in \text{NS}(A^TA)$.
Usually we apply this in a situation where $A$ is $m\times n$ where $m>n$ (overdetermined) and in this case $A^TA$ is $n\times n$ and $A^T\mathbf y$ is $n\times 1$ so overall solving the system of $n$-equations in $n$-unknowns $A^TA\mathbf x=A^T\mathbf y$ should be easier than solving the system of $m$-equations in $n$-unknowns, $A\mathbf x=\hat{\mathbf y}$.
**A special case:** Notice that in the case that $\ker(A^TA)=\{\mathbf 0\}$ we have that $A^TA$ is invertible and so the unique least-squares solution is $\hat{\mathbf x}=(A^TA)^{-1}A^T\mathbf y$ and the projection of $\mathbf y$ onto $\text{CS}(A)$ is $\text{proj}_{\text{CS}(A)}(\mathbf y)=A(A^TA)^{-1}A^T\mathbf y$. Note that if $A$ is $m\times 1$, so $A=[\mathbf u]$, then
$$\text{proj}_{\text{CS}(A)}(\mathbf y)=\text{proj}_{\mathbf u}(\mathbf y)=A(A^TA)^{-1}A^T\mathbf y=\mathbf u(\mathbf u^T\mathbf u)^{-1}\mathbf u^T\mathbf y=\frac{\langle \mathbf y,\mathbf u\rangle}{\langle\mathbf u,\mathbf u\rangle}\mathbf u
$$
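Here is a sketch of this special case in code (assuming NumPy; the data is random, and in practice one would prefer `np.linalg.lstsq` or a $QR$ factorization over forming $A^TA$):
```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((8, 3))            # overdetermined: 8 equations, 3 unknowns (synthetic data)
y = rng.standard_normal(8)

xhat = np.linalg.solve(A.T @ A, A.T @ y)   # solve the normal equation A^T A x = A^T y
yhat = A @ xhat                            # = proj_{CS(A)}(y)

print(np.allclose(A.T @ (y - yhat), 0))                        # residual is orthogonal to CS(A)
print(np.allclose(xhat, np.linalg.lstsq(A, y, rcond=None)[0])) # agrees with the built-in least squares
```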
Generally we are not in this special case and we have two options:
**Method 1 ([$QR$-decomposition](#QR-Decomposition))** We find an orthonormal basis ${\cal Q}=\{\mathbf q_1,\ldots,\mathbf q_k\}$ for $\text{CS}(A)$ and put these into the columns of a matrix $Q$. Then there is a square invertible matrix $R$ so that $A=QR$. We just compute $[\text{proj}_{\text{CS}(A)}(\mathbf v)]_{\cal Q}=Q^T\mathbf v$ so that $QQ^T\mathbf v=\hat{\mathbf v}$ and we want to find $\hat{\mathbf x}$ so that $A\hat{\mathbf x}=\hat{\mathbf v}$, that is $QR\hat{\mathbf x}=QQ^T\mathbf v$, hence $\hat{\mathbf x} = R^{-1}Q^T\mathbf v$.
**Method 2 ([$A^\dagger$ (pseudo-inverse)](/W070QXZ9T6KGhTCrXyxZMA#Psuedo-inverse-and-least-square-again))** Later, after studying the SVD, we will learn of something called the ***pseudo-inverse*** of $A$, denoted $A^\dagger$, so that we can find
$$
\hat{\mathbf x} = A^\dagger \mathbf v
$$
and $A^TA\hat{\mathbf x}=A^T\mathbf v$, so that $\hat{\mathbf x}$ is a least-squares solution to $A\mathbf x=\mathbf v$. Note in this case that $\hat{\mathbf v}=AA^\dagger \mathbf v=\text{proj}_{\text{CS}(A)}(\mathbf v)$.
**Example** Given some data $\{(x_i,y_i)\mid i=1,\ldots,N\}$ we might try to find a function $y=f(x)$ that best ***fits*** the data. By best fits here we mean $\sum_{i=1}^N(f(x_i)-y_i)^2$ is minimal. Let $X=(x_1,x_2,\ldots,x_N)$, $Y=(y_1,y_2,\ldots,y_N)$. Suppose we want the best $k$-degree polynomial $f(x)=c_0+c_1x+\cdots+c_kx^k$ that fits the data. Then we wish to find $c_0,c_1,\ldots,c_k$. Consider the matrix (with the powers $X^j$ taken entrywise)
$$
A=\Bigl[X^0\,X^1\,X^2\,\cdots\,X^k\Bigr] \text{ and }\mathbf c
=\begin{bmatrix}c_0\\c_1\\\vdots\\c_k\end{bmatrix}
$$
We would have a perfect fit if we could solve $A\mathbf c=Y$, that is $c_0X^0+c_1X^1+\cdots+c_kX^k=Y$. But probably this is too much to ask for. But we can find the unique $\hat Y\in\text{CS}(A)$ minimizing $\|Y-\hat Y\|$, and we can find a $\hat{\mathbf c}$ (not unique in general) so that $A\hat{\mathbf c}=\hat Y$; equivalently, $\hat{\mathbf c}$ minimizes $\|A\mathbf c-Y\|$ over all $\mathbf c$.

The data was generated by adding random noise to the yellow curve. So the closer we get to the yellow curve, the better we are doing at predicting the actual pattern. In this example the data has 300 points, so for $n=13$, the matrix $A$ is $300\times 14$. In each example the red curve is the $n$^th^ degree polynomial that best fits the data.
[Some Sage (python) code](https://cocalc.com/rketchers/teaching/LeastSquares) and some [MATLAB code](https://ketchers.github.io/Teaching/345/Topic%205/LeastSquares.mlx) for this.
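A minimal version of this polynomial fit (a sketch assuming NumPy; the noisy data below is synthetic and only for illustration):
```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 1, 300)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(x.size)   # synthetic noisy data

k = 13
A = np.vander(x, k + 1, increasing=True)       # columns X^0, X^1, ..., X^k, so A is 300 x 14
chat, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients c_0, ..., c_k

yhat = A @ chat                                # fitted values, i.e. proj_{CS(A)}(Y)
print(np.linalg.norm(y - yhat))                # size of the residual
```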
## QR Decomposition
We can encode the Gram-Schmidt procedure into a matrix decomposition. Suppose we are given an $m\times n$ matrix $A=\Bigl[\mathbf u_1\,\cdots\,\mathbf u_n\Bigr]$ with independent columns. (We will discuss the case where A has non-independent columns below.) We run Gram-Schmidt on the columns to get $\mathbf q_1,\ldots,\mathbf q_n$ which are orthonormal and satisfy
$$
\text{span}\{\mathbf u_1,\ldots,\mathbf u_l\}=\text{span}\{\mathbf q_1,\ldots,\mathbf q_l\}
\text{ for }1\le l\le n.
$$
We know that for each $i$,
$$
\mathbf u_i=\sum_{j=1}^i\langle \mathbf u_i,\mathbf q_j\rangle\mathbf q_j
$$
So if $Q=\Bigl[\mathbf q_1,\ldots,\mathbf q_n\Bigr]$, then this can be written as
$$
\mathbf u_i=Q\begin{bmatrix}
\langle \mathbf u_i,\mathbf q_1 \rangle\\
\langle \mathbf u_i,\mathbf q_2 \rangle\\
\vdots\\
\langle \mathbf u_i,\mathbf q_i \rangle\\
0\\
\vdots\\
0
\end{bmatrix}=Q[\mathbf u_i]_{\cal Q}\text{, where }{\cal Q}=\{\mathbf q_1,\ldots,\mathbf q_n\}
$$
So
\begin{align}
R=\Bigl[[\mathbf u_1]_{\cal Q}\,[\mathbf u_2]_{\cal Q}\,\cdots\,[\mathbf u_n]_{\cal Q}\Bigr]
=\begin{bmatrix}
\langle \mathbf u_1,\mathbf q_1 \rangle&
\langle \mathbf u_2,\mathbf q_1 \rangle&
\langle \mathbf u_3,\mathbf q_1 \rangle&
\cdots&
\langle \mathbf u_n,\mathbf q_1 \rangle\\
0&
\langle \mathbf u_2,\mathbf q_2 \rangle&
\langle \mathbf u_3,\mathbf q_2 \rangle&
\cdots&
\langle \mathbf u_n,\mathbf q_2 \rangle\\
0&
0&
\langle \mathbf u_3,\mathbf q_3 \rangle&
\cdots&
\langle \mathbf u_n,\mathbf q_3 \rangle\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
0&0&0&\cdots&\langle \mathbf u_n,\mathbf q_n \rangle
\end{bmatrix}
\end{align}
Notice that $R=[\text{id}]_{\cal B,\cal Q}$ where ${\cal B}=\{\mathbf u_1,\ldots,\mathbf u_n\}$ is taken as a basis for $\text{CS}(A)$. Keep in mind, the actual vectors $\mathbf u_i$ and $\mathbf q_i$ are in $\mathbb R^m$ and here $m>n$ is generally the case.
With $R$ defined like this
$$
A=QR
$$
Since $Q^TQ=I_n$ we see that $R=Q^TA$.
In the current setting, i.e., columns of $A$ independent, we know $\langle\mathbf u_i,\mathbf q_i\rangle\neq 0$ and so $R$
is invertible (why?) and $Q=AR^{-1}$.
We can use this decomposition in application of least-squares applied to $A\mathbf x=\mathbf b$. Recall we are looking for $\hat{\mathbf x}$ satisfying $A^TA\hat{\mathbf x}=A^T\mathbf b$. The following are equivalent:
\begin{align}
A^TA\hat{\mathbf x}&=A^T\mathbf b\\
(QR)^T(QR)\hat{\mathbf x}&=(QR)^T\mathbf b\\
(R^TQ^T)(QR)\hat{\mathbf x}&=R^TQ^T\mathbf b\\
R^T(I_k)R\hat{\mathbf x}&=R^TQ^T\mathbf b\\
R^TR\hat{\mathbf x}&=R^TQ^T\mathbf b\\
R\hat{\mathbf x}&=Q^T\mathbf b\\
\hat{\mathbf x}&=R^{-1}Q^T\mathbf b
\end{align}
**Note**: The 2^nd^ to last step is valid as $R^T$ is invertible.
So we have reduced the $m\times n$ least squares problem $A\mathbf x=\mathbf b$ to solving the $n\times n$ "upper-triangular" system $R\mathbf x=Q^T\mathbf b$, i.e. we have reduced it to a **back-substitution problem**.
Notice that $Q^T\mathbf b=[\hat{\mathbf b}]_{\cal Q}$, or equivalently $QQ^T\mathbf b=\hat{\mathbf b}$ so $Q$ provides a simple way to compute projections.
$$
\text{proj}^\perp_{\text{CS}(A)}(\mathbf v)=QQ^T\mathbf v
$$
where $A=QR$ for $Q$ with orthonormal columns and $R$ echelon with no zero rows.
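Here is a sketch of the $QR$ route (assuming NumPy; `np.linalg.qr` in its default reduced mode returns $Q$ with orthonormal columns and $R$ square upper-triangular):
```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((10, 4))             # synthetic data; columns independent with probability 1
b = rng.standard_normal(10)

Q, R = np.linalg.qr(A, mode='reduced')       # A = QR; Q is 10x4, R is 4x4 upper triangular
xhat = np.linalg.solve(R, Q.T @ b)           # in practice: back-substitute R x = Q^T b

print(np.allclose(A.T @ A @ xhat, A.T @ b))  # xhat satisfies the normal equation
print(np.allclose(Q @ Q.T @ b, A @ xhat))    # Q Q^T b = proj_{CS(A)}(b) = A xhat
```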
## Characterization of Orthogonal Projection
From what we have done above we can define:
$$
\text{proj}^\perp_S(\mathbf v)=\text{ the unique }\hat{\mathbf v}\in S\text{ such that }\mathbf v-\hat{\mathbf v}\perp S
$$
**Theorem** For $S$ a subspace of $V$, the following are true for $P=\text{proj}_S^\perp:V\to S$:
1. $P$ is linear.
2. $P^2=P$ and $\text{img}(P)=S$.
3. $\langle P\mathbf x,\mathbf y\rangle = \langle \mathbf x,P\mathbf y\rangle$.
**Proof** This is a good exercise.❏
**Theorem** The following hold for any $P:V\to S$ satisfying (1)-(3):
4. $\mathbf v-P(\mathbf v)\perp S$ for all $\mathbf{v}\in V$.
5. $\ker(P)=S^\perp$.
6. $V=S\oplus S^\perp$.
:::spoiler **Proof**
Linearity is easy. Let $\hat{\mathbf v}$ and $\hat{\mathbf u}$ be the orthogonal projections of $\mathbf v$ and $\mathbf u$ respectively. Then clearly, for any $\mathbf s\in S$
$$
\langle (c\mathbf u + d\mathbf v) - (c\hat{\mathbf u}+d\hat{\mathbf v}), \mathbf s\rangle =
c\langle \mathbf u-\hat{\mathbf u},\mathbf s\rangle+d\langle \mathbf v-\hat{\mathbf v},\mathbf s\rangle=0+0=0
$$
So $(c\mathbf u + d\mathbf v)-(c\hat{\mathbf u}+d\hat{\mathbf v})\perp S$ and hence
$$
P(c\mathbf u + d\mathbf v)=c\hat{\mathbf u}+d\hat{\mathbf v}
=cP(\mathbf u)+dP(\mathbf v)
$$
That $P^2=P$ is trivial since $P(\mathbf v)\in S$ and $P(\mathbf s)=\mathbf s$ for any $\mathbf s\in S$. We say $P$ is the ***identity*** on $S$. Notice conversely, that $P^2=P$ implies that $P$ is the identity on $\text{img}(P)$, since if $\mathbf u=P(\mathbf v)$, then $P(\mathbf u)=P(P(\mathbf v))=P(\mathbf v)=\mathbf u$.
For (3), note that $\langle x-P(x),P(y)\rangle=0$ (since $x-P(x)\perp S$ and $P(y)\in S$), so $0=\langle x,P(y)\rangle-\langle P(x),P(y)\rangle$, and similarly $\langle P(x),y-P(y)\rangle=0=\langle P(x),y\rangle-\langle P(x),P(y)\rangle$. So we have
$$
\langle x,P(y)\rangle=\langle P(x),P(y)\rangle=\langle P(x),y\rangle
$$
Now suppose we have $P:V\to S$ satisfying (1)-(3). Let $\mathbf s\in S$; then $\mathbf s=P(\mathbf u)$ for some $\mathbf u\in V$ (since $\text{img}(P)=S$) and
\begin{align}
\langle \mathbf v-P(\mathbf v),P(\mathbf u)\rangle&=\langle \mathbf v,P(\mathbf u)\rangle-\langle P(\mathbf v),P(\mathbf u)\rangle\\
&=\langle \mathbf v,P(\mathbf u)\rangle-\langle \mathbf v,PP(\mathbf u)\rangle\tag{from (3)}\\
&=\langle \mathbf v,P(\mathbf u)\rangle-\langle \mathbf v,P(\mathbf u)\rangle\tag{from (2)}\\
&=0
\end{align}
For (5), let $\mathbf v\in \ker(P)$, $\mathbf v-P(\mathbf v)=\mathbf v\in S^\perp$, so $\ker(P)\subseteq S^\perp$. Conversely, if $\mathbf v\in S^\perp$, then $\mathbf v-P(\mathbf v)\in S^\perp$ so $\mathbf v-(\mathbf v-P(\mathbf v))=P(\mathbf v)\in S^\perp$. But $P(\mathbf v)\in S$ and so $P(\mathbf v)\in S\cap S^\perp=\{\mathbf 0\}$ and so $\mathbf v\in \ker(P)$.
(6) follows trivially from (4).
❏
:::
>
What we have shown here is that any $P$ satisfying (1) - (3) is, in fact, the unique orthogonal projection function, so in effect, we have ***axiomatized*** being the orthogonal projection map.
Also, in terms of matrices, (3) can be read as "$P$ is symmetric." This is true since
* If $P$ is symmetric, then clearly, $\langle \mathbf x,P\mathbf y\rangle=(P\mathbf y)^T\mathbf x=\mathbf y^TP^T\mathbf x=\mathbf y^TP\mathbf x=\langle P\mathbf x,\mathbf y\rangle$
* If $P$ is a matrix and $\langle \mathbf x,P\mathbf y\rangle=\langle P\mathbf x,\mathbf y\rangle$ for all $\mathbf x,\mathbf y$, then $P$ is symmetric. (This too is a good exercise: try $\mathbf x=\mathbf e_i$ and $\mathbf y=\mathbf e_j$.)
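A sketch checking these properties for a concrete projection matrix $P=QQ^T$ (assuming NumPy; the columns of $Q$ form an orthonormal basis for a randomly chosen $S$):
```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((5, 2))            # random spanning set for S (illustration only)
Q, _ = np.linalg.qr(B)                     # orthonormal basis for S in the columns of Q
P = Q @ Q.T                                # matrix of proj_S

print(np.allclose(P, P.T))                 # (3) P is symmetric
print(np.allclose(P @ P, P))               # (2) P^2 = P
v = rng.standard_normal(5)
print(np.allclose(Q.T @ (v - P @ v), 0))   # (4) v - P v is orthogonal to S
```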