# Linear Algebra Basics
###### tags: `Data Science`, `Linear Algebra`
## Definition
* __Scalar__: a single number. e.g. $n=10$
* __Vector__: an array of numbers. e.g. $\mathbf{x} =\begin{bmatrix}
x_1 \\
x_2 \\
\vdots \\
x_n \\
\end{bmatrix}$
* __Matrix__: a 2-D array of numbers. e.g. $\mathbf{X}=\begin{bmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,m}\\
x_{2,1} & x_{2,2} & \cdots & x_{2,m}\\
\vdots & \vdots & \ddots & \vdots \\
x_{n,1} & x_{n,2} & \cdots & x_{n,m}\\
\end{bmatrix}$
* __Tensor__: an $n$-D array of numbers. e.g. $\mathbf{X}=[x_{i,j,k}]$
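These objects map directly onto NumPy arrays. A minimal sketch (the values are arbitrary examples):

```python
import numpy as np

n = 10                                  # scalar: a single number
x = np.array([1.0, 2.0, 3.0])           # vector: a 1-D array
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])              # matrix: a 2-D array
T = np.zeros((2, 3, 4))                 # tensor: an n-D array (here 3-D)

print(x.ndim, X.ndim, T.ndim)           # 1 2 3
```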
## Vector Operations
### Vector
* Linear combination: given a set of vectors $\{\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \dots, \mathbf{v}^{(n)}\}$, a linear combination of this set is $\mathbf{c}=\sum_{i}x_i\mathbf{v}^{(i)}$ for scalar coefficients $x_i$
* Dot (inner) product: $c=\mathbf{a}\cdot\mathbf{b}$ or $c=\langle\mathbf{a},\mathbf{b}\rangle$, where $c=\sum_{i}a_ib_i$
* Outer product: $\mathbf{C}=\mathbf{a}\otimes\mathbf{b}$, where $C_{i,j}=a_ib_j$
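A short NumPy sketch of these vector operations (the vectors and coefficients are arbitrary examples):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Linear combination: sum_i x_i * v^(i)
vs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
xs = [2.0, -1.0]
lin = sum(x * v for x, v in zip(xs, vs))   # array([ 2., -1.])

c = np.dot(a, b)       # dot (inner) product: sum_i a_i b_i -> 32.0
C = np.outer(a, b)     # outer product: C[i, j] = a_i * b_j, shape (3, 3)
```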
### Matrix
* Transpose: $(\mathbf{X}^T)_{i,j}=\mathbf{X}_{j,i}$
* Addition: $\mathbf{C}=\mathbf{A}+\mathbf{B}$, where $C_{i,j}=A_{i,j}+B_{i,j}$
* Scalar multiplication and addition: $\mathbf{D}=a\mathbf{X}+c$, where $D_{i,j}=aX_{i,j}+c$
* Matrix multiplication: $\mathbf{C}=\mathbf{A}\mathbf{B}$, where $C_{i,j}=\sum_{k}A_{i,k}B_{k,j}$
* Hadamard product: $\mathbf{C}=\mathbf{A}\odot\mathbf{B}$, where $C_{i,j}=A_{i,j}\times B_{i,j}$
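The same operations in NumPy (note that `@` is the matrix product while `*` is the Hadamard product):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

At = A.T               # transpose: At[i, j] == A[j, i]
S  = A + B             # elementwise addition
D  = 2.0 * A + 1.0     # scalar multiplication and addition (broadcast)
C  = A @ B             # matrix product: C[i, j] = sum_k A[i, k] * B[k, j]
H  = A * B             # Hadamard product: H[i, j] = A[i, j] * B[i, j]
```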
## Properties
### Vector
* __Norm__: the $L^p$ norm is defined as $\|\mathbf{x}\|_p=(\sum_i|x_i|^p)^{1/p}$; a norm is any function $f$ that satisfies the following three properties:
    * $f(\mathbf{x})=0\Rightarrow \mathbf{x}=\mathbf{0}$
* $f(\mathbf{x} + \mathbf{y})\leq f(\mathbf{x})+f(\mathbf{y})$
* $\forall\alpha\in\mathbb{R}, f(\alpha \mathbf{x})=|\alpha|f(\mathbf{x})$
* __Euclidean norm__: the $L^2$ norm
* __Orthogonal__: $\mathbf{x}$ and $\mathbf{y}$ are orthogonal if $\mathbf{x}^T\mathbf{y}=0$
* __Orthonormal__: two orthogonal vectors both with unit norm
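`np.linalg.norm` computes the $L^p$ norms directly; a small sketch with hand-picked orthogonal vectors:

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([-4.0, 3.0])

l1   = np.linalg.norm(x, ord=1)        # L^1 norm: 7.0
l2   = np.linalg.norm(x)               # L^2 (Euclidean) norm: 5.0
linf = np.linalg.norm(x, ord=np.inf)   # max norm: 4.0

print(x @ y)                           # 0.0 -> x and y are orthogonal
x_hat = x / np.linalg.norm(x)          # unit norm; x_hat and y_hat are orthonormal
y_hat = y / np.linalg.norm(y)
```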
### Matrix
* __Span__: the set of all points obtainable by the linear combination of a set of vectors
* __Range__: the span of column vectors of $\mathbf{A}$, also known as _column space_.
* __Rank__: the dimension of the range (column space) of $\mathbf{A}$
* __Linear dependence__: a set of vectors is _linearly independent_ if no vector in the set is a linear combination of the other vectors.
* __Singular__: a square matrix with linearly dependent column vectors.
* __Orthogonal__: a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal:
    * $\mathbf{A}^T\mathbf{A}=\mathbf{A}\mathbf{A}^T=\mathbf{I}$ or equivalently $\mathbf{A}^{-1}=\mathbf{A}^T$
* __Frobenius norm__: $\|\mathbf{A}\|_F=\sqrt{\sum_{i,j}A_{i,j}^2}$
* __Determinant__: $\det(\mathbf{A})=\prod_{i=1}^{n}\lambda_i$, where $\lambda_i$ are the eigenvalues of $\mathbf{A}$
* __Trace__: $\mathrm{tr}(\mathbf{A})=\sum_iA_{i,i}$
* __Null space__: the null space of $\mathbf{A}$ is the set of all vectors $\mathbf{x}$ that solve $\mathbf{Ax}=\mathbf{0}$; it is closed under addition and scaling:
* If $\mathbf{Ax}=0$, $\mathbf{Ay}=0$, then $\mathbf{A}(\mathbf{x} + \mathbf{y})=0$
* If $\mathbf{Ax}=0$, then $\mathbf{A}(c\mathbf{x})=0$
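Most of these properties are one-liners in NumPy; the null space can be read off the SVD. A sketch for a square matrix, with an assumed tolerance of `1e-10`:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])              # linearly dependent columns -> singular

print(np.linalg.matrix_rank(A))         # rank: 1
print(np.linalg.norm(A, 'fro'))         # Frobenius norm
print(np.linalg.det(A))                 # determinant: 0.0 (singular)
print(np.trace(A))                      # trace: 5.0

# Null space: right singular vectors whose singular values are (near) zero
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[s < 1e-10]              # each row v satisfies A @ v ≈ 0
print(A @ null_basis.T)                 # ≈ 0
```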
## Properties of Multiplication
* Distributive: $\mathbf{A}(\mathbf{B}+\mathbf{C})=\mathbf{A}\mathbf{B}+\mathbf{A}\mathbf{C}$
* Associative: $\mathbf{A}(\mathbf{B}\mathbf{C})=(\mathbf{A}\mathbf{B})\mathbf{C}$
* __Not__ commutative: $\mathbf{A}\mathbf{B}=\mathbf{B}\mathbf{A}$ is __not__ always true
* Transpose: $(\mathbf{A}\mathbf{B})^T=\mathbf{B}^T\mathbf{A}^T$
* Multiplication by a vector as a linear combination of column vectors: $\mathbf{A}\mathbf{x}=\mathbf{b}$ can be thought of as $\mathbf{b}=x_1A_{:,1}+x_2A_{:,2}+\cdots+x_mA_{:,m}$
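These properties are easy to check numerically; a small sketch with arbitrary example matrices:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
x = np.array([2.0, -1.0])

print(np.allclose(A @ B, B @ A))          # False: not commutative in general
print(np.allclose((A @ B).T, B.T @ A.T))  # True: transpose reverses the order

# A @ x is the linear combination of A's columns weighted by the entries of x
b = A @ x
print(np.allclose(b, x[0] * A[:, 0] + x[1] * A[:, 1]))  # True
```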
## Special Kinds of Vectors and Matrix
* __Unit vector__: a vector with unit $L^2$ norm, i.e. $\|\mathbf{x}\|_2=1$
* __Identity matrix__: $\forall \mathbf{x} \in \mathbb{R}^n,\mathbf{I}_n\mathbf{x}=\mathbf{x}$
* __Inverse matrix__: $\mathbf{A}^{-1}\mathbf{A}=\mathbf{I}_n$; the inverse matrix $\mathbf{A}^{-1}$ only exists when (1) $\mathbf{A}$ is a square matrix, (2) $\mathrm{rank}(\mathbf{A})=n$, and (3) $\det(\mathbf{A})\ne0$ (conditions (2) and (3) are equivalent).
* __Diagonal matrix__: $D_{i,j}=0$ for all $i \ne j$
* __Symmetric matrix__: $\mathbf{A}=\mathbf{A}^T$
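A sketch of these special matrices, using an invertible symmetric example:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])        # square, full rank, det = 5 != 0

I = np.eye(2)                     # identity matrix
A_inv = np.linalg.inv(A)          # inverse exists since A is nonsingular
print(np.allclose(A_inv @ A, I))  # True

D = np.diag([1.0, 2.0])           # diagonal matrix
print(np.allclose(A, A.T))        # True: A is symmetric
```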
## Eigenvector and Eigenvalue
Given a square matrix $\mathbf{A}$, an eigenvector is a nonzero vector $\mathbf{v}$ and an eigenvalue is a scalar $\lambda$ such that $\mathbf{A}\mathbf{v}=\lambda\mathbf{v}$.
* If $\mathbf{x}$ is an eigenvector of $\mathbf{A}$, then $s\mathbf{x}$ for $s \in \mathbb{R}, s \ne 0$ is also an eigenvector of $\mathbf{A}$, with the same eigenvalue
* If $\mathbf{x}$ is an eigenvector of $\mathbf{A}$, then $\mathbf{x}$ is also an eigenvector of $\mathbf{A}^n$, with eigenvalue $\lambda^n$
* If $\mathbf{x}$ is an eigenvector of $\mathbf{A}$, then $\mathbf{x}$ is also an eigenvector of $\mathbf{A}^{-1}$ (with eigenvalue $1/\lambda$) if the eigenvalue $\lambda \ne 0$. If $\lambda=0$, $\mathbf{A}$ is not invertible (singular).
* If $\mathbf{B}=\mathbf{M}^{-1}\mathbf{AM}$, then $\mathbf{B}$ and $\mathbf{A}$ have the same eigenvalues ($\mathbf{B}$ and $\mathbf{A}$ are similar)
* $\mathbf{AB}$ has the same eigenvalues as $\mathbf{BA}$
* Eigenvectors corresponding to two distinct eigenvalues are linearly independent
* __Condition number__: $\frac{\max_i|\lambda_i|}{\min_i|\lambda_i|}$, which measures how rapidly a function changes with respect to small changes in its inputs
* __Definite symmetric matrix__
* __Positive definite__: a matrix whose eigenvalues are all positive
* __Positive semidefinite__: a matrix whose eigenvalues are all positive or zero
* __Negative definite__: a matrix whose eigenvalues are all negative
* __Negative semidefinite__: a matrix whose eigenvalues are all negative or zero
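A sketch verifying these facts with `np.linalg.eig` on a small symmetric matrix (its eigenvalues are 3 and 1, so it is positive definite with condition number 3):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                 # symmetric

lam, V = np.linalg.eig(A)                  # eigenvalues and eigenvectors
v, l = V[:, 0], lam[0]
print(np.allclose(A @ v, l * v))           # True: A v = lambda v

print(np.allclose(np.linalg.det(A), np.prod(lam)))  # det = product of eigenvalues
print(np.isclose(np.trace(A), np.sum(lam)))         # trace = sum of eigenvalues

cond = np.abs(lam).max() / np.abs(lam).min()        # condition number: 3.0

# Definiteness of a symmetric matrix from its eigenvalue signs
w = np.linalg.eigvalsh(A)                  # eigenvalues of a symmetric matrix
print(np.all(w > 0))                       # True -> positive definite
```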
## Reference
* Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. *Deep Learning*. MIT Press.
* Gilbert Strang. 2018. *MIT 18.065: Matrix Methods in Data Analysis, Signal Processing, and Machine Learning*.