# Linear Algebra Basics
###### tags: `Data Science`, `Linear Algebra`
## Definition
* __Scalar__: a single number. e.g. $n=10$
* __Vector__: an array of numbers. e.g. $\mathbf{x} =\begin{bmatrix}
x_1 \\
x_2 \\
\vdots \\
x_n \\
\end{bmatrix}$
* __Matrix__: a 2-D array of numbers. e.g. $\mathbf{X}=\begin{bmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,m}\\
x_{2,1} & x_{2,2} & \cdots & x_{2,m}\\
\vdots & \vdots & \ddots & \vdots \\
x_{n,1} & x_{n,2} & \cdots & x_{n,m}\\
\end{bmatrix}$
* __Tensor__: an $n$-D array of numbers. e.g. $\mathbf{X}=[x_{i,j,k}]$
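These objects map directly onto NumPy arrays. A minimal sketch (the values are arbitrary examples):

```python
import numpy as np

n = 10                                  # scalar: a single number
x = np.array([1.0, 2.0, 3.0])           # vector: a 1-D array
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])              # matrix: a 2-D array
T = np.zeros((2, 3, 4))                 # tensor: an n-D array (here 3-D)

print(x.ndim, X.ndim, T.ndim)           # 1 2 3
```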
## Vector Operations
### Vector
* Linear combination: given a set of vectors $\{\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \dots, \mathbf{v}^{(n)}\}$, a linear combination of this set is $\mathbf{c}=\sum_{i}x_i\mathbf{v}^{(i)}$ for scalar coefficients $x_i$
* Dot (inner) product: $c=\mathbf{a}\cdot\mathbf{b}$ or $c=\langle\mathbf{a},\mathbf{b}\rangle$, where $c=\sum_{i}a_ib_i$
* Outer product: $\mathbf{C}=\mathbf{a}\otimes\mathbf{b}$, where $C_{i,j}=a_ib_j$
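A short NumPy sketch of these vector operations (the vectors and coefficients are arbitrary examples):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Linear combination: sum_i x_i * v^(i)
vs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
xs = [2.0, -1.0]
lin = sum(x * v for x, v in zip(xs, vs))   # array([ 2., -1.])

c = np.dot(a, b)       # dot (inner) product: sum_i a_i b_i -> 32.0
C = np.outer(a, b)     # outer product: C[i, j] = a_i * b_j, shape (3, 3)
```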
### Matrix
* Transpose: $(\mathbf{X}^T)_{i,j}=\mathbf{X}_{j,i}$
* Addition: $\mathbf{C}=\mathbf{A}+\mathbf{B}$, where $C_{i,j}=A_{i,j}+B_{i,j}$
* Scalar multiplication and addition: $\mathbf{D}=a\mathbf{X}+c$, where $D_{i,j}=aX_{i,j}+c$
* Matrix multiplication: $\mathbf{C}=\mathbf{A}\mathbf{B}$, where $C_{i,j}=\sum_{k}A_{i,k}B_{k,j}$
* Hadamard product: $\mathbf{C}=\mathbf{A}\odot\mathbf{B}$, where $C_{i,j}=A_{i,j}\times B_{i,j}$
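The same operations in NumPy (note that `@` is the matrix product while `*` is the Hadamard product):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

At = A.T               # transpose: At[i, j] == A[j, i]
S  = A + B             # elementwise addition
D  = 2.0 * A + 1.0     # scalar multiplication and addition (broadcast)
C  = A @ B             # matrix product: C[i, j] = sum_k A[i, k] * B[k, j]
H  = A * B             # Hadamard product: H[i, j] = A[i, j] * B[i, j]
```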
## Properties
### Vector
* __Norm__: the $L^p$ norm is defined as $\|\mathbf{x}\|_p=(\sum_i|x_i|^p)^{1/p}$; a norm is any function $f$ that satisfies the following three properties:
    * $f(\mathbf{x})=0\Rightarrow \mathbf{x}=\mathbf{0}$
* $f(\mathbf{x} + \mathbf{y})\leq f(\mathbf{x})+f(\mathbf{y})$
* $\forall\alpha\in\mathbb{R}, f(\alpha \mathbf{x})=|\alpha|f(\mathbf{x})$
* __Euclidean norm__: the $L^2$ norm
* __Orthogonal__: $\mathbf{x}$ and $\mathbf{y}$ are orthogonal if $\mathbf{x}^T\mathbf{y}=0$
* __Orthonormal__: two orthogonal vectors both with unit norm
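`np.linalg.norm` computes the $L^p$ norms directly; a small sketch with hand-picked orthogonal vectors:

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([-4.0, 3.0])

l1   = np.linalg.norm(x, ord=1)        # L^1 norm: 7.0
l2   = np.linalg.norm(x)               # L^2 (Euclidean) norm: 5.0
linf = np.linalg.norm(x, ord=np.inf)   # max norm: 4.0

print(x @ y)                           # 0.0 -> x and y are orthogonal
x_hat = x / np.linalg.norm(x)          # unit norm; x_hat and y_hat are orthonormal
y_hat = y / np.linalg.norm(y)
```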
### Matrix
* __Span__: the set of all points obtainable by the linear combination of a set of vectors
* __Range__: the span of column vectors of $\mathbf{A}$, also known as _column space_.
* __Rank__: the dimension of the range (column space) of $\mathbf{A}$
* __Linear dependence__: a set of vectors is _linearly independent_ if no vector in the set is a linear combination of the other vectors.
* __Singular__: a square matrix with linearly dependent column vectors.
* __Orthogonal__: a square matrix whose rows are mutually orthonormal and whose columns are mutually orthonormal:
    * $\mathbf{A}^T\mathbf{A}=\mathbf{A}\mathbf{A}^T=\mathbf{I}$ or equivalently $\mathbf{A}^{-1}=\mathbf{A}^T$
* __Frobenius norm__: $\|\mathbf{A}\|_F=\sqrt{\sum_{i,j}A_{i,j}^2}$
* __Determinant__: $\det(\mathbf{A})=\prod_{i=1}^{n}\lambda_i$, where $\lambda_i$ are the eigenvalues of $\mathbf{A}$
* __Trace__: $\mathrm{tr}(\mathbf{A})=\sum_iA_{i,i}$
* __Null space__: the null space of $\mathbf{A}$ is the set of all vectors $\mathbf{x}$ that solve $\mathbf{Ax}=\mathbf{0}$; it is closed under addition and scaling:
* If $\mathbf{Ax}=0$, $\mathbf{Ay}=0$, then $\mathbf{A}(\mathbf{x} + \mathbf{y})=0$
* If $\mathbf{Ax}=0$, then $\mathbf{A}(c\mathbf{x})=0$
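Most of these properties are one-liners in NumPy; the null space can be read off the SVD. A sketch for a square matrix, with an assumed tolerance of `1e-10`:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])              # linearly dependent columns -> singular

print(np.linalg.matrix_rank(A))         # rank: 1
print(np.linalg.norm(A, 'fro'))         # Frobenius norm
print(np.linalg.det(A))                 # determinant: 0.0 (singular)
print(np.trace(A))                      # trace: 5.0

# Null space: right singular vectors whose singular values are (near) zero
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[s < 1e-10]              # each row v satisfies A @ v ≈ 0
print(A @ null_basis.T)                 # ≈ 0
```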
## Properties of Multiplication
* Distributive: $\mathbf{A}(\mathbf{B}+\mathbf{C})=\mathbf{A}\mathbf{B}+\mathbf{A}\mathbf{C}$
* Associative: $\mathbf{A}(\mathbf{B}\mathbf{C})=(\mathbf{A}\mathbf{B})\mathbf{C}$
* __Not__ commutative: $\mathbf{A}\mathbf{B}=\mathbf{B}\mathbf{A}$ is __not__ always true
* Transpose: $(\mathbf{A}\mathbf{B})^T=\mathbf{B}^T\mathbf{A}^T$
* Multiplication by a vector as a linear combination of column vectors: $\mathbf{A}\mathbf{x}=\mathbf{b}$ can be thought of as $\mathbf{b}=x_1A_{:,1}+x_2A_{:,2}+\cdots+x_mA_{:,m}$
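These properties are easy to check numerically; a small sketch with arbitrary example matrices:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
x = np.array([2.0, -1.0])

print(np.allclose(A @ B, B @ A))          # False: not commutative in general
print(np.allclose((A @ B).T, B.T @ A.T))  # True: transpose reverses the order

# A @ x is the linear combination of A's columns weighted by the entries of x
b = A @ x
print(np.allclose(b, x[0] * A[:, 0] + x[1] * A[:, 1]))  # True
```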
## Special Kinds of Vectors and Matrix
* __Unit vector__: a vector with unit $L^2$ norm, i.e. $\|\mathbf{x}\|_2=1$
* __Identity matrix__: $\forall \mathbf{x} \in \mathbb{R}^n,\mathbf{I}_n\mathbf{x}=\mathbf{x}$
* __Inverse matrix__: $\mathbf{A}^{-1}\mathbf{A}=\mathbf{I}_n$; the inverse matrix $\mathbf{A}^{-1}$ only exists when (1) $\mathbf{A}$ is a square matrix, (2) $\mathrm{rank}(\mathbf{A})=n$, and (3) $\det(\mathbf{A})\ne0$ (conditions (2) and (3) are equivalent).
* __Diagonal matrix__: $D_{i,j}=0$ for all $i \ne j$
* __Symmetric matrix__: $\mathbf{A}=\mathbf{A}^T$
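A sketch of these special matrices, using an invertible symmetric example:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])        # square, full rank, det = 5 != 0

I = np.eye(2)                     # identity matrix
A_inv = np.linalg.inv(A)          # inverse exists since A is nonsingular
print(np.allclose(A_inv @ A, I))  # True

D = np.diag([1.0, 2.0])           # diagonal matrix
print(np.allclose(A, A.T))        # True: A is symmetric
```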
## Eigenvector and Eigenvalue
Given a square matrix $\mathbf{A}$, an eigenvector is a nonzero vector $\mathbf{v}$ and an eigenvalue is a scalar $\lambda$ such that $\mathbf{A}\mathbf{v}=\lambda\mathbf{v}$.
* If $\mathbf{x}$ is an eigenvector of $\mathbf{A}$, then $s\mathbf{x}$ for $s \in \mathbb{R}, s \ne 0$ is also an eigenvector of $\mathbf{A}$, with the same eigenvalue
* If $\mathbf{x}$ is an eigenvector of $\mathbf{A}$, then $\mathbf{x}$ is also an eigenvector of $\mathbf{A}^n$, with eigenvalue $\lambda^n$
* If $\mathbf{x}$ is an eigenvector of $\mathbf{A}$, then $\mathbf{x}$ is also an eigenvector of $\mathbf{A}^{-1}$ (with eigenvalue $1/\lambda$) if the eigenvalue $\lambda \ne 0$. If $\lambda=0$, $\mathbf{A}$ is not invertible (singular).
* If $\mathbf{B}=\mathbf{M}^{-1}\mathbf{AM}$, then $\mathbf{B}$ and $\mathbf{A}$ have the same eigenvalues ($\mathbf{B}$ and $\mathbf{A}$ are similar)
* $\mathbf{AB}$ has the same eigenvalues as $\mathbf{BA}$
* Eigenvectors corresponding to two distinct eigenvalues are linearly independent
* __Condition number__: $\frac{\max_i|\lambda_i|}{\min_i|\lambda_i|}$, which measures how rapidly a function changes with respect to small changes in its inputs
* __Definite symmetric matrix__
* __Positive definite__: a matrix whose eigenvalues are all positive
* __Positive semidefinite__: a matrix whose eigenvalues are all positive or zero
* __Negative definite__: a matrix whose eigenvalues are all negative
* __Negative semidefinite__: a matrix whose eigenvalues are all negative or zero
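A sketch verifying these facts with `np.linalg.eig` on a small symmetric matrix (its eigenvalues are 3 and 1, so it is positive definite with condition number 3):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                 # symmetric

lam, V = np.linalg.eig(A)                  # eigenvalues and eigenvectors
v, l = V[:, 0], lam[0]
print(np.allclose(A @ v, l * v))           # True: A v = lambda v

print(np.allclose(np.linalg.det(A), np.prod(lam)))  # det = product of eigenvalues
print(np.isclose(np.trace(A), np.sum(lam)))         # trace = sum of eigenvalues

cond = np.abs(lam).max() / np.abs(lam).min()        # condition number: 3.0

# Definiteness of a symmetric matrix from its eigenvalue signs
w = np.linalg.eigvalsh(A)                  # eigenvalues of a symmetric matrix
print(np.all(w > 0))                       # True -> positive definite
```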
## Reference
* Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. *Deep Learning*. MIT Press.
* Gilbert Strang. 2018. *MIT 18.065: Matrix Methods in Data Analysis, Signal Processing, and Machine Learning*.