# Chapter 3.3. Orthogonality and Basis Transformations

The geometric concepts of length, distance, and perpendicularity, which are well known for $\mathbb{R}^{2}$ and $\mathbb{R}^{3}$, are defined here for $\mathbb{R}^{n}$. These concepts provide powerful geometric tools for solving many applied problems, including least-squares problems. All three notions are defined in terms of the inner product of two vectors.

<span style="color: lime;">**Definition 3..**</span> The ==inner product== of vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^{n}$, denoted by $\mathbf{u} \cdot \mathbf{v}$, is the scalar $\mathbf{u}^{T}\mathbf{v} = \displaystyle\sum_{i=1}^{n}u_{i}v_{i}$, where $u_{i}$ and $v_{i}$ denote the $i$th entries of $\mathbf{u}$ and $\mathbf{v}$.

>Example. Let $\mathbf{u}=\left[\begin{array}{r}2 \\ -5 \\ -1\end{array}\right]$ and $\mathbf{v}=\left[\begin{array}{r}3 \\ 2 \\ -3\end{array}\right]$. The inner product of $\mathbf{u}$ and $\mathbf{v}$ and the inner product of $\mathbf{v}$ and $\mathbf{u}$ are $$ \begin{aligned} & \mathbf{u} \cdot \mathbf{v}=\mathbf{u}^T \mathbf{v}=\left[\begin{array}{lll} 2 & -5 & -1 \end{array}\right]\left[\begin{array}{r} 3 \\ 2 \\ -3 \end{array}\right]=(2)(3)+(-5)(2)+(-1)(-3)=-1 \\ & \mathbf{v} \cdot \mathbf{u}=\mathbf{v}^T \mathbf{u}=\left[\begin{array}{lll} 3 & 2 & -3 \end{array}\right]\left[\begin{array}{r} 2 \\ -5 \\ -1 \end{array}\right]=(3)(2)+(2)(-5)+(-3)(-1)=-1 \end{aligned} $$

It is clear from these calculations why $\mathbf{u} \cdot \mathbf{v}=\mathbf{v} \cdot \mathbf{u}$. This commutativity of the inner product holds in general. The following properties of the inner product are easily deduced from properties of the transpose operation in Section 2.1. (See Exercises 21 and 22 at the end of this section.)

<span style="color: mediumslateblue;">**Theorem 3..**</span> Let $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{w}$ be vectors in $\mathbb{R}^n$, and let $\lambda$ be a scalar. Then (i) $\mathbf{u} \cdot \mathbf{v}=\mathbf{v} \cdot \mathbf{u}$; (ii) $(\mathbf{u}+\mathbf{v}) \cdot \mathbf{w}=\mathbf{u} \cdot \mathbf{w}+\mathbf{v} \cdot \mathbf{w}$; (iii) $(\lambda \mathbf{u}) \cdot \mathbf{v}=\lambda(\mathbf{u} \cdot \mathbf{v})=\mathbf{u} \cdot(\lambda \mathbf{v})$; (iv) $\mathbf{u} \cdot \mathbf{u} \geq 0$, and $\mathbf{u} \cdot \mathbf{u}=0$ if and only if $\mathbf{u}=\mathbf{0}$.

*Proof.* (i) Observe that $$\begin{align*} \mathbf{u} \cdot \mathbf{v} &= \mathbf{u}^{T}\mathbf{v} \\ &= \displaystyle\sum_{i=1}^{n}u_{i}v_{i} \\ &= \displaystyle\sum_{i=1}^{n}v_{i}u_{i} \\&= \mathbf{v}^{T}\mathbf{u} = \mathbf{v} \cdot \mathbf{u}.\end{align*}$$

(ii) $(\mathbf{u}+\mathbf{v}) \cdot \mathbf{w} = \displaystyle\sum_{i=1}^{n}(u_{i}+v_{i})w_{i} = \displaystyle\sum_{i=1}^{n}u_{i}w_{i}+\displaystyle\sum_{i=1}^{n}v_{i}w_{i} = \mathbf{u} \cdot \mathbf{w} +\mathbf{v} \cdot \mathbf{w}.$

(iii) $(\lambda \mathbf{u}) \cdot \mathbf{v} = \displaystyle\sum_{i=1}^{n}\lambda u_{i}v_{i} = \lambda\displaystyle\sum_{i=1}^{n}u_{i}v_{i} = \lambda(\mathbf{u} \cdot \mathbf{v})$ and $(\lambda \mathbf{u}) \cdot \mathbf{v} = \displaystyle\sum_{i=1}^{n}\lambda u_{i}v_{i} = \displaystyle\sum_{i=1}^{n} u_{i} (\lambda v_{i})=\mathbf{u} \cdot(\lambda \mathbf{v})$, as required.

(iv) $\mathbf{u} \cdot \mathbf{u} = \displaystyle\sum_{i=1}^{n}u_{i}^{2} \geq 0$. Moreover, $\mathbf{u} \cdot \mathbf{u} = \displaystyle\sum_{i=1}^{n}u_{i}^{2} = 0$ if and only if $u_{i}=0$ for every $i=1,\ldots,n$, that is, if and only if $\mathbf{u}=\mathbf{0}$. $\blacksquare$
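Property (i), the commutativity just proved, is easy to check numerically for the vectors of the example above. A minimal sketch using NumPy (the use of NumPy is an assumption of this sketch, not part of the text):

```python
import numpy as np

u = np.array([2, -5, -1])
v = np.array([3, 2, -3])

# The inner product u . v is the matrix product u^T v,
# i.e. the sum of the elementwise products u_i * v_i.
print(u @ v)  # -1
print(v @ u)  # -1, illustrating commutativity: u . v = v . u
```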
Properties (ii) and (iii) can be combined repeatedly to produce the following useful rule: $$\left(\lambda_{1} \mathbf{u}_{1}+\cdots+\lambda_{p} \mathbf{u}_{p}\right) \cdot \mathbf{w}={\lambda}_{1}\left(\mathbf{u}_{1} \cdot \mathbf{w}\right)+\cdots+\lambda_{p}\left(\mathbf{u}_{p} \cdot \mathbf{w}\right).$$

If $\mathbf{v}$ is in $\mathbb{R}^n$, with entries $v_1, \ldots, v_n$, then the square root of $\mathbf{v} \cdot \mathbf{v}$ is defined because $\mathbf{v} \cdot \mathbf{v}$ is nonnegative.

<span style="color: lime;">**Definition 3..**</span> The ==length (or norm)== of $\mathbf{v}$ is the non-negative scalar $\lVert \mathbf{v} \rVert$ defined by $$ \lVert \mathbf{v} \rVert =\sqrt{\mathbf{v} \cdot \mathbf{v}}=\sqrt{v_{1}^{2}+v_{2}^{2}+\cdots+v_{n}^{2}}, \quad \text { and } \quad \lVert \mathbf{v} \rVert^{2}=\mathbf{v} \cdot \mathbf{v} .$$

For $\mathbf{v}=(a, b)$ in $\mathbb{R}^{2}$, this definition gives $\lVert \mathbf{v} \rVert=\sqrt{a^{2}+b^{2}}$, which by the Pythagorean theorem is the ordinary length of the line segment from the origin to $\mathbf{v}$. A similar calculation with the diagonal of a rectangular box shows that the definition of the length of a vector $\mathbf{v}$ in $\mathbb{R}^3$ also coincides with the usual notion of length.

For any scalar $\lambda$, the length of $\lambda \mathbf{v}$ is $|\lambda|$ times the length of $\mathbf{v}$. That is, $$\lVert \lambda \mathbf{v} \rVert = \sqrt{\lambda^{2}(v_{1}^{2}+v_{2}^{2}+\cdots+v_{n}^{2})} = \lvert \lambda \rvert \sqrt{v_{1}^{2}+v_{2}^{2}+\cdots+v_{n}^{2}} = \lvert \lambda \rvert\lVert \mathbf{v} \rVert.$$

A vector whose length is $1$ is called a unit vector. If we divide a non-zero vector $\mathbf{v}$ by its length, that is, multiply it by $\frac{1}{\lVert \mathbf{v} \rVert}$, we obtain a unit vector $\mathbf{u}$, because the length of $\mathbf{u}$ is $\left(\frac{1}{\lVert \mathbf{v} \rVert}\right)\lVert \mathbf{v} \rVert = 1$. The process of creating $\mathbf{u}$ from $\mathbf{v}$ is sometimes called ==normalizing== $\mathbf{v}$, and we say that $\mathbf{u}$ is in the same direction as $\mathbf{v}$. Several of the examples that follow use the space-saving notation $(v_{1}, \ldots, v_{n})$ for column vectors.

> Example. Let $\mathbf{v}=(1,-2,2,0)$. We find a unit vector $\mathbf{u}$ in the same direction as $\mathbf{v}$. First, $$ \begin{aligned} \lVert \mathbf{v} \rVert^{2} & =\mathbf{v} \cdot \mathbf{v}=(1)^2+(-2)^2+(2)^2+(0)^2=9 \\ \lVert \mathbf{v} \rVert &= \sqrt{9}=3 \end{aligned} $$ Then, multiply $\mathbf{v}$ by $\frac{1}{\lVert \mathbf{v} \rVert}$ to obtain $$ \mathbf{u}=\frac{1}{\lVert \mathbf{v} \rVert} \mathbf{v}=\frac{1}{3} \mathbf{v}=\frac{1}{3}\left(\begin{array}{r} 1 \\ -2 \\ 2 \\ 0 \end{array}\right)=\left(\begin{array}{c} \frac{1}{3} \\ -\frac{2}{3} \\ \frac{2}{3} \\ 0 \end{array}\right). $$ To check that $\|\mathbf{u}\|=1$, it suffices to show that $\|\mathbf{u}\|^2=1$: $$ \begin{aligned} \|\mathbf{u}\|^2 & =\mathbf{u} \cdot \mathbf{u}=\left(\frac{1}{3}\right)^2+\left(-\frac{2}{3}\right)^2+\left(\frac{2}{3}\right)^2+(0)^2 \\ & =\frac{1}{9}+\frac{4}{9}+\frac{4}{9}+0=1 \end{aligned} $$
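The normalization above can be checked with a few lines of code; a minimal sketch, again assuming NumPy, with the values of the example:

```python
import numpy as np

v = np.array([1.0, -2.0, 2.0, 0.0])

# ||v|| = sqrt(v . v); dividing v by its length gives a unit vector
# pointing in the same direction.
norm_v = np.linalg.norm(v)      # 3.0
u = v / norm_v                  # [ 1/3, -2/3, 2/3, 0 ]
print(norm_v, u)
print(np.isclose(u @ u, 1.0))   # True: u is a unit vector
```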
We are ready now to describe how close one vector is to another. Recall that if $a$ and $b$ are real numbers, the distance on the number line between $a$ and $b$ is the number $|a-b|$. Two examples are shown in Fig. 3. This definition of distance in $\mathbb{R}$ has a direct analogue in $\mathbb{R}^{n}$.

<span style="color: lime;">**Definition 3..**</span> For $\mathbf{u}, \mathbf{v} \in \mathbb{R}^{n}$, the ==distance between $\mathbf{u}$ and $\mathbf{v}$==, written as $\operatorname{dist}(\mathbf{u},\mathbf{v})$, is the length of the vector $\mathbf{u}-\mathbf{v}$. That is, $$\operatorname{dist}(\mathbf{u}, \mathbf{v})=\lVert \mathbf{u}-\mathbf{v} \rVert. $$

>Example. To compute the distance between the vectors $\mathbf{u}=(7,1)$ and $\mathbf{v}=(3,2)$, we first compute $$\mathbf{u}-\mathbf{v} =\left(\begin{array}{l} 7 \\ 1 \end{array}\right)-\left(\begin{array}{l} 3 \\ 2 \end{array}\right)=\left(\begin{array}{r} 4 \\ -1 \end{array}\right).$$ Therefore, $$\lVert \mathbf{u}-\mathbf{v} \rVert =\sqrt{4^{2}+(-1)^{2}}=\sqrt{17}.$$

### 3.3.1. Orthogonal Vectors

Next, we formalize the notion of perpendicularity: the concept of perpendicular lines in ordinary Euclidean geometry has an analogue in $\mathbb{R}^n$. Consider $\mathbb{R}^2$ or $\mathbb{R}^3$ and two lines through the origin determined by vectors $\mathbf{u}$ and $\mathbf{v}$. The two lines shown in Fig. 5 are geometrically perpendicular if and only if the distance from $\mathbf{u}$ to $\mathbf{v}$ is the same as the distance from $\mathbf{u}$ to $-\mathbf{v}$. This is the same as requiring the squares of the distances to be equal. Now, $$ \begin{aligned} [\operatorname{dist}(\mathbf{u},-\mathbf{v})]^2 & =\|\mathbf{u}-(-\mathbf{v})\|^2=\|\mathbf{u}+\mathbf{v}\|^2 \\ & =(\mathbf{u}+\mathbf{v}) \cdot(\mathbf{u}+\mathbf{v}) \\ & =\mathbf{u} \cdot(\mathbf{u}+\mathbf{v})+\mathbf{v} \cdot(\mathbf{u}+\mathbf{v}) \\ & =\mathbf{u} \cdot \mathbf{u}+\mathbf{u} \cdot \mathbf{v}+\mathbf{v} \cdot \mathbf{u}+\mathbf{v} \cdot \mathbf{v} \\ & =\|\mathbf{u}\|^2+\lVert \mathbf{v} \rVert^{2}+2 \mathbf{u} \cdot \mathbf{v}. \end{aligned} $$ The same calculation with $\mathbf{v}$ and $-\mathbf{v}$ interchanged shows that $$ \begin{aligned} [\operatorname{dist}(\mathbf{u}, \mathbf{v})]^2 & =\|\mathbf{u}\|^2+\|-\mathbf{v}\|^2+2 \mathbf{u} \cdot(-\mathbf{v}) \\ & =\|\mathbf{u}\|^2+\lVert \mathbf{v} \rVert^2-2 \mathbf{u} \cdot \mathbf{v}. \end{aligned} $$ The two squared distances are equal if and only if $2 \mathbf{u} \cdot \mathbf{v}=-2 \mathbf{u} \cdot \mathbf{v}$, which happens if and only if $\mathbf{u} \cdot \mathbf{v}=0$. This calculation shows that when vectors $\mathbf{u}$ and $\mathbf{v}$ are identified with geometric points, the corresponding lines through the points and the origin are perpendicular if and only if $\mathbf{u} \cdot \mathbf{v}=0$. The following definition generalizes this notion of perpendicularity, or orthogonality, to $\mathbb{R}^n$.

<span style="color: lime;">**Definition 3..**</span> Two vectors $\mathbf{u}$ and $\mathbf{v}$ in $\mathbb{R}^n$ are ==orthogonal== (to each other) if $\mathbf{u} \cdot \mathbf{v}=0$.

Observe that the zero vector is orthogonal to every vector in $\mathbb{R}^n$ because $\mathbf{0}^T \mathbf{v}=0$ for all $\mathbf{v}$. The next theorem provides a useful fact about orthogonal vectors; the proof follows immediately from the expansion of $\|\mathbf{u}+\mathbf{v}\|^2$ above and the definition of orthogonality.

<span style="color: mediumslateblue;">**Theorem 3..**</span> Two vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal if and only if $\lVert \mathbf{u}+\mathbf{v} \rVert^{2}=\lVert \mathbf{u} \rVert^{2}+\lVert \mathbf{v} \rVert^2$.
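The equivalence in this theorem is easy to verify numerically for particular vectors. A small sketch (NumPy assumed; the two vectors below are illustrative and not taken from the text):

```python
import numpy as np

u = np.array([1.0, 2.0, -1.0])
v = np.array([3.0, -1.0, 1.0])   # chosen so that u . v = 3 - 2 - 1 = 0

print(u @ v)                     # 0.0, so u and v are orthogonal
lhs = np.linalg.norm(u + v) ** 2                        # ||u + v||^2
rhs = np.linalg.norm(u) ** 2 + np.linalg.norm(v) ** 2   # ||u||^2 + ||v||^2
print(np.isclose(lhs, rhs))      # True: the Pythagorean identity holds
```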
### 3.3.2. Orthogonal and Orthonormal Bases

<span style="color: lime;">**Definition 3..**</span> A set of vectors $\left\{\mathbf{u}_1, \ldots, \mathbf{u}_p\right\}$ in $\mathbb{R}^n$ is called ==orthogonal== if each pair of distinct vectors from the set is orthogonal, that is, if $\mathbf{u}_i \cdot \mathbf{u}_j=0$ whenever $i \neq j$.

<span style="color: mediumslateblue;">**Theorem 3..**</span> If $S=\left\{\mathbf{u}_{1}, \ldots, \mathbf{u}_{p}\right\}$ is an orthogonal set of non-zero vectors in $\mathbb{R}^{n}$, then $S$ is linearly independent.

*Proof*. If $\mathbf{0}=\lambda_{1} \mathbf{u}_{1}+\cdots+\lambda_{p} \mathbf{u}_{p}$ for some scalars $\lambda_{1}, \ldots, \lambda_{p}$, then, since $\mathbf{u}_{1}$ is orthogonal to $\mathbf{u}_{2},\ldots,\mathbf{u}_{p}$, $$ \begin{aligned} 0 & =\mathbf{0} \cdot \mathbf{u}_1=\left({\lambda}_{1} \mathbf{u}_{1}+{\lambda}_{2} \mathbf{u}_{2}+\cdots+\lambda_{p} \mathbf{u}_p\right) \cdot \mathbf{u}_1 \\ & =\left(\lambda_{1} \mathbf{u}_1\right) \cdot \mathbf{u}_1+\left(\lambda_{2} \mathbf{u}_2\right) \cdot \mathbf{u}_1+\cdots+\left(\lambda_{p} \mathbf{u}_p\right) \cdot \mathbf{u}_{1} \\ & =\lambda_{1}\left(\mathbf{u}_{1} \cdot \mathbf{u}_{1}\right)+\lambda_{2}\left(\mathbf{u}_{2} \cdot \mathbf{u}_{1}\right)+\cdots+\lambda_{p}\left(\mathbf{u}_{p} \cdot \mathbf{u}_{1}\right) \\ & =\lambda_{1}\left(\mathbf{u}_{1} \cdot \mathbf{u}_{1}\right). \end{aligned}$$ Since $\mathbf{u}_{1}$ is nonzero, $\mathbf{u}_1 \cdot \mathbf{u}_1$ is not zero, and so $\lambda_{1}=0$. Similarly, $\lambda_{2},\ldots,\lambda_{p}$ must all be zero. Thus, $S$ is linearly independent. $\blacksquare$

<span style="color: lime;">**Definition 3..**</span> An ==orthogonal basis== of $\mathbb{R}^{n}$ is a basis that is also an orthogonal set. If all the vectors in the orthogonal basis are unit vectors, then it is called an ==orthonormal basis== of $\mathbb{R}^{n}$.

The next theorem suggests why an orthogonal basis is much nicer than other bases: the weights in a linear combination can be computed easily.

<span style="color: mediumslateblue;">**Theorem 3..**</span> Let $\left\{\mathbf{u}_{1},\ldots, \mathbf{u}_{n}\right\}$ be an orthogonal basis of $\mathbb{R}^{n}$. For each $\mathbf{y} \in \mathbb{R}^{n}$, the weights in the linear combination $$ \mathbf{y}=\lambda_{1} \mathbf{u}_{1}+\cdots+\lambda_{n} \mathbf{u}_{n} $$ are given by $$\lambda_{j}=\frac{\mathbf{y} \cdot \mathbf{u}_j}{\mathbf{u}_j \cdot \mathbf{u}_j} \quad(j=1, \ldots, n).$$

*Proof*. Since $\left\{\mathbf{u}_{1}, \ldots, \mathbf{u}_{n}\right\}$ is an orthogonal basis of $\mathbb{R}^{n}$, we have $$\mathbf{y} \cdot \mathbf{u}_{1}=\left(\lambda_{1} \mathbf{u}_{1}+\lambda_{2} \mathbf{u}_{2}+\cdots+\lambda_{n} \mathbf{u}_{n}\right) \cdot \mathbf{u}_{1}=\lambda_{1}\left(\mathbf{u}_1 \cdot \mathbf{u}_1\right).$$ Since $\mathbf{u}_1 \cdot \mathbf{u}_1$ is not zero, the equation above can be solved for $\lambda_{1}$. To find $\lambda_{j}$ for $j=2,\ldots,n$, compute $\mathbf{y} \cdot \mathbf{u}_{j}$ and solve for $\lambda_{j}$ in the same way. $\blacksquare$
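Before the worked example that follows, here is a short sketch of the weight formula in code (NumPy assumed; the orthogonal basis and the vector $\mathbf{y}$ below are illustrative values, not taken from the text):

```python
import numpy as np

# An orthogonal basis of R^3 (pairwise inner products are zero) and a target vector.
U = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, -1.0, 0.0]),
     np.array([0.0, 0.0, 1.0])]
y = np.array([4.0, 2.0, 5.0])

# Weights lambda_j = (y . u_j) / (u_j . u_j); no linear system has to be solved.
weights = [(y @ u) / (u @ u) for u in U]
print(weights)                                  # [3.0, 1.0, 5.0]

# Reconstruct y from the weights to confirm the expansion.
print(sum(w * u for w, u in zip(weights, U)))   # [4. 2. 5.]
```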
>Example. The set $\left\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\right\}$, where $$ \mathbf{u}_1=\left(\begin{array}{l} 3 \\ 1 \\ 1 \end{array}\right), \quad \mathbf{u}_2=\left(\begin{array}{r} -1 \\ 2 \\ 1 \end{array}\right), \quad \mathbf{u}_3=\left(\begin{array}{c} -\frac{1}{2} \\ -2 \\ \frac{7}{2} \end{array}\right), $$ is an orthogonal set. To see this, observe that $$ \begin{aligned} & \mathbf{u}_1 \cdot \mathbf{u}_2=3(-1)+1(2)+1(1)=0 \\ & \mathbf{u}_1 \cdot \mathbf{u}_3=3\left(-\frac{1}{2}\right)+1(-2)+1\left(\frac{7}{2}\right)=0 \\ & \mathbf{u}_2 \cdot \mathbf{u}_3=-1\left(-\frac{1}{2}\right)+2(-2)+1\left(\frac{7}{2}\right)=0 \end{aligned} $$ Since these are three nonzero, mutually orthogonal vectors, they are linearly independent by the theorem above, and therefore the set $S=\left\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\right\}$ is an orthogonal basis for $\mathbb{R}^3$. Next, we express the vector $\mathbf{y}=\left[\begin{array}{r}6 \\ 1 \\ -8\end{array}\right]$ as a linear combination of these vectors. Observe that $$ \begin{aligned} \mathbf{y} \cdot \mathbf{u}_1 & =11, & \mathbf{y} \cdot \mathbf{u}_2 & =-12, & \mathbf{y} \cdot \mathbf{u}_3 & =-33, \\ \mathbf{u}_1 \cdot \mathbf{u}_1 & =11, & \mathbf{u}_2 \cdot \mathbf{u}_2 & =6, & \mathbf{u}_3 \cdot \mathbf{u}_3 & =\frac{33}{2}. \end{aligned} $$ Therefore, by Theorem 3., $$ \begin{aligned} \mathbf{y} & =\frac{\mathbf{y} \cdot \mathbf{u}_1}{\mathbf{u}_1 \cdot \mathbf{u}_1} \mathbf{u}_1+\frac{\mathbf{y} \cdot \mathbf{u}_2}{\mathbf{u}_2 \cdot \mathbf{u}_2} \mathbf{u}_2+\frac{\mathbf{y} \cdot \mathbf{u}_3}{\mathbf{u}_3 \cdot \mathbf{u}_3} \mathbf{u}_3 \\ & =\frac{11}{11} \mathbf{u}_1+\frac{-12}{6} \mathbf{u}_2+\frac{-33}{33 / 2} \mathbf{u}_3 \\ & =\mathbf{u}_1-2 \mathbf{u}_2-2 \mathbf{u}_3 \end{aligned} $$

Notice how easy it is to compute the weights needed to build $\mathbf{y}$ from an orthogonal basis. If the basis were not orthogonal, it would be necessary to solve a system of linear equations in order to find the weights.

The ==Gram-Schmidt process== is a simple algorithm for producing an orthogonal or orthonormal basis of $\mathbb{R}^n$. The examples that follow are aimed at hand calculation. Given a basis $\left\{\mathbf{b}_{1}, \ldots, \mathbf{b}_{n}\right\}$ of $\mathbb{R}^{n}$, the Gram-Schmidt algorithm generates an orthogonal basis $\left\{\mathbf{u}_{1}, \ldots, \mathbf{u}_{n}\right\}$ of $\mathbb{R}^{n}$ as follows: $$ \begin{aligned} \mathbf{u}_{1} & =\mathbf{b}_{1}; \\ \mathbf{u}_{2} & =\mathbf{b}_{2}-\frac{\mathbf{b}_{2} \cdot \mathbf{u}_1}{\mathbf{u}_1 \cdot \mathbf{u}_1} \mathbf{u}_{1}; \\ \mathbf{u}_{3} & =\mathbf{b}_3-\frac{\mathbf{b}_3 \cdot \mathbf{u}_{1}}{\mathbf{u}_{1} \cdot \mathbf{u}_{1}} \mathbf{u}_{1}-\frac{\mathbf{b}_{3} \cdot \mathbf{u}_{2}}{\mathbf{u}_{2} \cdot \mathbf{u}_{2}} \mathbf{u}_{2}; \\ & \vdots \\ \mathbf{u}_{n} & =\mathbf{b}_{n}-\frac{\mathbf{b}_{n} \cdot \mathbf{u}_{1}}{\mathbf{u}_{1} \cdot \mathbf{u}_{1}} \mathbf{u}_1-\frac{\mathbf{b}_{n} \cdot \mathbf{u}_2}{\mathbf{u}_{2} \cdot \mathbf{u}_{2}} \mathbf{u}_2-\cdots-\frac{\mathbf{b}_{n} \cdot \mathbf{u}_{n-1}}{\mathbf{u}_{n-1} \cdot \mathbf{u}_{n-1}} \mathbf{u}_{n-1}. \end{aligned} $$ The same formulas apply to a basis $\left\{\mathbf{b}_{1}, \ldots, \mathbf{b}_{p}\right\}$ of a subspace $W$ of $\mathbb{R}^{n}$, in which case they produce an orthogonal basis of $W$. The next example fully illustrates the Gram-Schmidt process.

> Example. Let $\left\{\mathbf{b}_{1}, \mathbf{b}_{2}, \mathbf{b}_{3}\right\}$ be a basis of a subspace $W$ of $\mathbb{R}^{4}$, where $$\mathbf{b}_{1}=\left(\begin{array}{l}1 \\ 1 \\ 1 \\ 1\end{array}\right), \mathbf{b}_{2}=\left(\begin{array}{l}0 \\ 1 \\ 1 \\ 1\end{array}\right), \mathbf{b}_{3}=\left(\begin{array}{l}0 \\ 0 \\ 1 \\ 1\end{array}\right).$$ We use the Gram-Schmidt algorithm to generate an orthogonal basis of $W$ from $\left\{\mathbf{b}_{1}, \mathbf{b}_{2}, \mathbf{b}_{3}\right\}$.

**Step 1**. Set $\mathbf{u}_{1}=\mathbf{b}_1$.

**Step 2**. The second vector in the orthogonal basis, $\mathbf{u}_{2}$, is computed as follows: $$\mathbf{u}_{2} = \mathbf{b}_2-\frac{\mathbf{b}_{2} \cdot \mathbf{u}_{1}}{\mathbf{u}_{1} \cdot \mathbf{u}_{1}} \mathbf{u}_{1}.$$ Here $\mathbf{b}_{2} \cdot \mathbf{u}_{1}=3$ and $\mathbf{u}_{1} \cdot \mathbf{u}_{1}=4$, so $$ \begin{aligned} \mathbf{u}_{2} & =\left(\begin{array}{l} 0 \\ 1 \\ 1 \\ 1 \end{array}\right)-\frac{3}{4}\left(\begin{array}{l} 1 \\ 1 \\ 1 \\ 1 \end{array}\right)=\left(\begin{array}{r} -\frac{3}{4} \\ \frac{1}{4}\\ \frac{1}{4} \\ \frac{1}{4} \end{array}\right). \end{aligned} $$
**Step 3**. The third vector in the orthogonal basis, $\mathbf{u}_{3}$, is computed as follows: $$\mathbf{u}_{3} =\mathbf{b}_3-\frac{\mathbf{b}_3 \cdot \mathbf{u}_{1}}{\mathbf{u}_{1} \cdot \mathbf{u}_{1}} \mathbf{u}_{1}-\frac{\mathbf{b}_{3} \cdot \mathbf{u}_{2}}{\mathbf{u}_{2} \cdot \mathbf{u}_{2}} \mathbf{u}_{2}.$$ Here $\frac{\mathbf{b}_{3} \cdot \mathbf{u}_{1}}{\mathbf{u}_{1} \cdot \mathbf{u}_{1}}=\frac{2}{4}=\frac{1}{2}$ and $\frac{\mathbf{b}_{3} \cdot \mathbf{u}_{2}}{\mathbf{u}_{2} \cdot \mathbf{u}_{2}}=\frac{1 / 2}{3 / 4}=\frac{2}{3}$, so $$ \begin{aligned} \mathbf{u}_{3} & =\left(\begin{array}{l} 0 \\ 0 \\ 1 \\ 1 \end{array}\right)-\frac{1}{2}\left(\begin{array}{l} 1 \\ 1 \\ 1 \\ 1 \end{array}\right)-\frac{2}{3}\left(\begin{array}{r} -\frac{3}{4} \\ \frac{1}{4}\\ \frac{1}{4} \\ \frac{1}{4} \end{array}\right)=\left(\begin{array}{r} 0 \\ -\frac{2}{3}\\ \frac{1}{3} \\ \frac{1}{3} \end{array}\right). \end{aligned} $$ It is left to the reader to verify that the set $\left\{\mathbf{u}_{1}, \mathbf{u}_{2}, \mathbf{u}_{3}\right\}$ is indeed an *orthogonal* basis of $W$.

### 3.3.3. QR Factorization of Matrices

If an $m \times n$ matrix $\mathbf{A}$ has linearly independent columns $\mathbf{a}^{1}, \ldots, \mathbf{a}^{n}$, then applying the Gram-Schmidt process (with normalizations) to $\mathbf{a}^{1},\ldots,\mathbf{a}^{n}$ amounts to factoring $\mathbf{A}$ as $\mathbf{A}=\mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$ is an ==orthogonal matrix==, a matrix whose columns form an orthonormal basis of the column space $\mathcal{C}(\mathbf{A})$, and $\mathbf{R}$ is an upper-triangular matrix. This factorization is widely used in computer algorithms for various computations, such as solving systems of linear equations and finding eigenvalues. Here are the steps of the QR factorization of an $m \times n$ matrix $\mathbf{A}$:

**Step 1**. If the columns of $\mathbf{A}$ are linearly independent, then go to Step 3.

**Step 2**. If the columns of $\mathbf{A}$ are linearly dependent, then apply Gaussian elimination and identify the pivot columns of $\mathbf{A}$, i.e., the columns of $\mathbf{A}$ corresponding to the leading $1$s in the row-reduced echelon matrix derived from $\mathbf{A}$.

**Step 3**. Apply the Gram-Schmidt process to the pivot columns of $\mathbf{A}$, and normalize the resulting vectors, to obtain the matrix $\mathbf{Q}$ with orthonormal columns (an $m \times k$ matrix, where $k=\operatorname{rank}(\mathbf{A})$).

**Step 4**. Compute the upper-triangular matrix $\mathbf{R}$ using the observation $$\mathbf{Q}^{T}\mathbf{A} = \mathbf{Q}^{T}\mathbf{Q}\mathbf{R} = \mathbf{I}\mathbf{R} = \mathbf{R}.$$

> Example. We QR factorize the following matrix: $$\mathbf{A} = \begin{pmatrix} 3 & -5 & 1 \\ 1 & 1 & 1 \\ -1 & 5 & -2 \\ 3 & -7 & 8 \end{pmatrix}$$ The columns are $\mathbf{a}^{1} = \begin{pmatrix} 3 \\ 1 \\ -1 \\ 3 \end{pmatrix}$, $\mathbf{a}^{2} = \begin{pmatrix} -5 \\ 1 \\ 5 \\ -7 \end{pmatrix}$, and $\mathbf{a}^{3} = \begin{pmatrix} 1 \\ 1 \\ -2 \\ 8 \end{pmatrix}$. Gaussian elimination shows that the rank of $\mathbf{A}$ is $3$, so the columns are linearly independent and $\{\mathbf{a}^{1}, \mathbf{a}^{2}, \mathbf{a}^{3}\}$ is a basis of the column space $\mathcal{C}(\mathbf{A})$. Applying the Gram-Schmidt process to these columns gives the orthogonal vectors $(3,1,-1,3)$, $(1,3,3,-1)$, and $(-3,1,1,3)$, each of length $\sqrt{20}=2\sqrt{5}$. Normalizing them yields $$\mathbf{Q} = \frac{1}{2\sqrt{5}}\begin{pmatrix} 3 & 1 & -3 \\ 1 & 3 & 1 \\ -1 & 3 & 1 \\ 3 & -1 & 3 \end{pmatrix},$$ and therefore $$\mathbf{R} = \mathbf{Q}^{T}\mathbf{A} = \begin{pmatrix} 2\sqrt{5} & -4\sqrt{5} & 3\sqrt{5} \\ 0 & 2\sqrt{5} & -\sqrt{5} \\ 0 & 0 & 2\sqrt{5} \end{pmatrix}.$$ One can check directly that $\mathbf{A}=\mathbf{Q}\mathbf{R}$.
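Both the Gram-Schmidt example and the QR factorization above can be reproduced with a short script. A sketch under the same assumptions as before (NumPy; `gram_schmidt` is an illustrative helper, not a function from the text):

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthogonal set spanning the same subspace as `vectors`."""
    ortho = []
    for b in vectors:
        u = b.astype(float)
        for q in ortho:
            u = u - (b @ q) / (q @ q) * q   # subtract the component of b along q
        ortho.append(u)
    return ortho

# Gram-Schmidt example: b1, b2, b3 from the text.
b1 = np.array([1.0, 1.0, 1.0, 1.0])
b2 = np.array([0.0, 1.0, 1.0, 1.0])
b3 = np.array([0.0, 0.0, 1.0, 1.0])
u1, u2, u3 = gram_schmidt([b1, b2, b3])
print(u2)   # [-0.75  0.25  0.25  0.25]
print(u3)   # approximately [ 0.  -0.6667  0.3333  0.3333]

# QR example: orthogonalize the columns of A, normalize to get Q, then R = Q^T A.
A = np.array([[ 3.0, -5.0,  1.0],
              [ 1.0,  1.0,  1.0],
              [-1.0,  5.0, -2.0],
              [ 3.0, -7.0,  8.0]])
ortho = gram_schmidt([A[:, j] for j in range(A.shape[1])])
Q = np.column_stack([u / np.linalg.norm(u) for u in ortho])
R = Q.T @ A
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: the columns of Q are orthonormal
print(np.allclose(A, Q @ R))             # True: A = QR
print(np.round(R, 4))                    # upper triangular, matching R above
```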