# [Prerequisites for CS467](https://probml.github.io/pml-book/book1.html)

## Linear Algebra

### Vector
- A vector is a list of numbers
- $x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n$
- $x^\top = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}$

### Inner product
- given two vectors $x, y \in \mathbb{R}^n$
- notation: $\langle x,y\rangle$ or $x^\top y$
- $x^\top y = y^\top x = \sum_{i=1}^{n} x_i y_i$ (a scalar)

### Euclidean norm (2-norm)
- the length of a vector
- given a vector $x \in \mathbb{R}^n$
- $\|x\| = \sqrt{\sum_{i=1}^{n} x_i^2}$
- $\|x\|^2 = x^\top x$
- unit vector: $\frac{x}{\|x\|}$

### Orthogonal and orthonormal
- given three vectors $x, y, z \in \mathbb{R}^n$
- if $\langle x,y\rangle = 0$, then $x$ and $y$ are **orthogonal** (perpendicular) to each other
- the set of vectors $\{x,y,z\}$ is said to be **orthonormal** if $x, y, z$ are mutually orthogonal and $\|x\| = \|y\| = \|z\| = 1$

### Linear independence
- given a set of $m$ vectors $\{v_1, v_2, \ldots, v_m\}$
- the set is **linearly dependent** if some vector in the set can be represented as a linear combination of the remaining vectors, e.g.,
  - $v_m = \sum_{i=1}^{m-1} \alpha_i v_i,\ \alpha_i \in \mathbb{R}$
- otherwise, the set is **linearly independent**

### Span
- the **span** of $\{v_1, \ldots, v_m\}$ is the set of all vectors that can be expressed as a linear combination of $\{v_1, \ldots, v_m\}$
  - $\text{span}(\{v_1, \ldots, v_m\}) = \Big\{u : u = \sum_{i=1}^{m} \alpha_i v_i \Big\}$
- if $\{v_1, \ldots, v_m\}$ is linearly independent, where $v_i \in \mathbb{R}^m$, then any vector $u \in \mathbb{R}^m$ can be written as a linear combination of $v_1$ through $v_m$
  - in other words, $\text{span}(\{v_1, \ldots, v_m\}) = \mathbb{R}^m$
- e.g.,
  $v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix},\ v_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix},\quad \text{span}\bigg(\Big\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \Big\}\bigg) = \mathbb{R}^2$
- we call $\{v_1, \ldots, v_m\}$ a **basis** of $\mathbb{R}^m$

### Matrix
- A matrix $A \in \mathbb{R}^{m \times n}$ is a 2d array of numbers with $m$ rows and $n$ columns
- if $m = n$, we call $A$ a square matrix
- Matrix multiplication
  - given two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$
  - $C = AB \in \mathbb{R}^{m \times p}$, where $C_{ij} = \sum_{k=1}^n A_{ik} B_{kj}$
  - not commutative in general: $AB \ne BA$
- Matrix-vector multiplication
  - $Ax$ can be viewed as a linear combination of the columns of $A$, weighted by the entries of $x$

### Transpose $A^\top$
- The transpose of a matrix results from flipping the rows and columns
- $(A^\top)_{ij} = A_{ji}$
- Some properties:
  - $(AB)^\top = B^\top A^\top$
  - $(A+B)^\top = A^\top + B^\top$
- If $A^\top = A$, we call $A$ a **symmetric** matrix

### Trace $\text{tr}(A)$
- The trace of a square matrix is the sum of its diagonal entries
- $\text{tr}(A) = \sum_{i=1}^n A_{ii}$
- If $A, B$ are square matrices, then $\text{tr}(AB) = \text{tr}(BA)$
  - $\text{tr}(AB) = \sum_{i=1}^n \sum_{k=1}^n A_{ik} B_{ki} = \sum_{k=1}^n \sum_{i=1}^n B_{ki} A_{ik} = \text{tr}(BA)$

### Inverse $A^{-1}$
- The inverse of a square matrix $A$ is the unique matrix such that
  - $A^{-1} A = I = A A^{-1}$
- Let $A$ be an $n \times n$ matrix. $A$ is invertible iff
  - the column vectors of $A$ are linearly independent (they span $\mathbb{R}^n$)
  - $\det(A) \neq 0$
- Assume that both $A$ and $B$ are invertible.
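Several of the identities above (inner product, norm, transpose, trace) are easy to verify numerically; here is a quick sketch, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
y = rng.standard_normal(4)

# inner product: x^T y = sum_i x_i y_i (a scalar)
assert np.isclose(x @ y, np.sum(x * y))

# Euclidean norm: ||x||^2 = x^T x
assert np.isclose(np.linalg.norm(x) ** 2, x @ x)

A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# matrix multiplication is not commutative in general
assert not np.allclose(A @ B, B @ A)

# transpose of a product: (AB)^T = B^T A^T
assert np.allclose((A @ B).T, B.T @ A.T)

# cyclic property of the trace: tr(AB) = tr(BA)
assert np.isclose(np.trace(A @ B), np.trace(B @ A))
```

Random Gaussian matrices almost never commute, so the non-commutativity check passes for essentially any seed.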
  Some properties:
  - $(AB)^{-1} = B^{-1} A^{-1}$
  - $(A^{-1})^\top = (A^\top)^{-1}$

### Orthogonal matrix
- a square matrix is orthogonal if all its columns are **orthonormal**
- $U$ is an orthogonal matrix iff
  - $U^\top U = I = U U^\top$
  - $U^{-1} = U^\top$
- Note: if $U$ is not square but has orthonormal columns, we still have $U^\top U = I$ but not $U U^\top = I$

## Calculus

### Chain rule
- $\frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx}$ (single-variable)
- e.g., $y = (x^2 + 1)^3,\ \frac{dy}{dx} = ?$
  - let $u = x^2 + 1$, then $y = u^3$
  - $\frac{dy}{dx} = 3(x^2+1)^2 (2x)$

### Critical points
- $f' > 0 \Rightarrow f$ is increasing
- $f' < 0 \Rightarrow f$ is decreasing
- $f'' > 0 \Rightarrow f'$ is increasing $\Rightarrow f$ is concave up
- $f'' < 0 \Rightarrow f'$ is decreasing $\Rightarrow f$ is concave down
- We call $x = a$ a critical point of a continuous function $f(x)$ if either $f'(a) = 0$ or $f'(a)$ is undefined

### Taylor series
- $f(x) = \sum\limits_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x-a)^n$ (single-variable)
- second-order approximation:
  - $f(x) \approx f(a) + f'(a)(x-a) + \frac{1}{2} f''(a)(x-a)^{2}$ (single-variable)
    - for $x$ close enough to $a$
  - $f(w) \approx f(u) + \nabla f(u)^{\top}(w - u) + \frac{1}{2} (w - u)^{\top} H_f(u) (w - u)$ (multivariate)
    - $w, u \in \mathbb{R}^d$, for $w$ close enough to $u$
    - $\nabla_w f^{\top} = \big[\frac{\partial f}{\partial w_1}, \cdots, \frac{\partial f}{\partial w_d}\big]$
    - $H_f$ is the Hessian matrix, where $(H_f)_{ij} = \frac{\partial^{2} f}{\partial w_{i} \partial w_{j}}$

## Probability

### Conditional probability
- $P(A|B)$: the conditional probability of event $A$ happening given that event $B$ has occurred
- $P(A|B) = \frac{P(A, B)}{P(B)}$
- $P(A|B) = \frac{P(B|A) P(A)}{P(B)}$ (**Bayes' rule**)
- $\{A_i\}_{i=1}^n$ is a partition of the sample space.
  We have $P(B) = \sum_{i=1}^n P(B, A_i) = \sum_{i=1}^n P(B|A_i) P(A_i)$ (law of total probability)

### Independence and conditional independence
- random variable $X$ is independent of $Y$ iff
  - $P(X|Y) = P(X)$
  - $P(X,Y) = P(X) P(Y)$
- random variable $X$ is conditionally independent of $Y$ given $Z$ iff
  - $P(X|Y,Z) = P(X|Z)$
  - $P(X,Y|Z) = P(X|Z) P(Y|Z)$

### Expectation
- $\mathcal{X}$: the **sample space**, the set of all possible outcomes of an experiment
  - e.g., the face of a die: $\mathcal{X} = \{1,2,3,4,5,6\}$
- $p(x)$: probability mass function (discrete) or probability density function (continuous)
- $\mathbb{E}[X] = \sum_{x \in \mathcal{X}} x \, p(x)$ (discrete)
- $\mathbb{E}[X] = \int_{\mathcal{X}} x \, p(x) \, dx$ (continuous)
- more generally, $\mathbb{E}[g(X)] = \int_{\mathcal{X}} g(x) \, p(x) \, dx$
- Linearity of expectation
  - $\mathbb{E}[a X + b] = a \, \mathbb{E}[X] + b$
  - $\mathbb{E}[X+Y] = \mathbb{E}[X] + \mathbb{E}[Y]$
- If $X$ and $Y$ are independent, $\mathbb{E}[XY] = \mathbb{E}[X] \, \mathbb{E}[Y]$
  - why? plug $P(X,Y) = P(X) P(Y)$ into the definition of expectation
  - the converse is *not* true

### Variance and covariance
- $\text{Var}[X] = \mathbb{E}\left[(X-\mu)^{2}\right]$, where $\mu = \mathbb{E}[X]$
  - $= \mathbb{E}\left[X^{2}\right] - \mu^{2}$
- $\text{Var}[a X + b] = a^2 \, \text{Var}[X]$
- $\text{Cov}[X,Y] = \mathbb{E}\left[(X - \mu_X)(Y - \mu_Y)\right]$
- If $X$ and $Y$ are independent, then $\text{Cov}[X, Y] = 0$
  - the converse is *not* true

#### Covariance matrix
- Consider $d$ random variables $X_1, \ldots, X_d$
- $\text{Cov}[X_i, X_j] = \mathbb{E}\left[(X_i - \mu_{X_i})(X_j - \mu_{X_j})\right]$
- We can write a $d \times d$ matrix $\Sigma$, where $\Sigma_{ij} = \text{Cov}[X_i, X_j]$
- when $i = j$ (diagonal), $\mathbb{E}\left[(X_i - \mu_{X_i})(X_i - \mu_{X_i})\right] = \text{Var}[X_i]$
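The variance and covariance definitions above can be sanity-checked from samples; a small sketch, assuming NumPy is available (sample-based, so the independence check holds only approximately):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
X = rng.standard_normal(n)  # Var[X] = 1 in theory
Y = rng.standard_normal(n)  # drawn independently of X

# Var[aX + b] = a^2 Var[X]
a, b = 3.0, 5.0
assert abs(np.var(a * X + b) - a**2 * np.var(X)) < 1e-6

# 2x2 covariance matrix Sigma of (X, Y)
Sigma = np.cov(np.stack([X, Y]))

# independent => Cov[X, Y] = 0 (approximately, from finite samples)
assert abs(Sigma[0, 1]) < 0.02

# diagonal entries of Sigma are the variances
assert abs(Sigma[0, 0] - np.var(X, ddof=1)) < 1e-8
```

Note that `np.cov` uses the unbiased estimator (`ddof=1`) by default, hence the matching `ddof=1` in the last check.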
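Bayes' rule and the law of total probability from the conditional-probability section can be checked with a tiny discrete example; the probabilities here are made-up numbers for illustration:

```python
# partition of the sample space: A1, A2 (hypothetical probabilities)
P_A = {"A1": 0.3, "A2": 0.7}

# conditional probabilities P(B | Ai) (also made up)
P_B_given_A = {"A1": 0.9, "A2": 0.2}

# law of total probability: P(B) = sum_i P(B | Ai) P(Ai)
P_B = sum(P_B_given_A[a] * P_A[a] for a in P_A)
assert abs(P_B - 0.41) < 1e-12  # 0.9*0.3 + 0.2*0.7 = 0.41

# Bayes' rule: P(A1 | B) = P(B | A1) P(A1) / P(B)
P_A1_given_B = P_B_given_A["A1"] * P_A["A1"] / P_B
print(P_A1_given_B)  # 0.27 / 0.41, about 0.66
```

Observing $B$ raises the probability of $A_1$ from $0.3$ to about $0.66$, since $B$ is much more likely under $A_1$.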
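The chain-rule example and the second-order Taylor approximation from the calculus section can also be checked numerically. A minimal sketch using $f(x) = (x^2+1)^3$, with the second derivative worked out by hand (an addition to the notes):

```python
def f(x):
    return (x**2 + 1) ** 3

def fprime(x):
    # chain-rule result from the notes: 3 (x^2 + 1)^2 * 2x
    return 3 * (x**2 + 1) ** 2 * (2 * x)

def fsecond(x):
    # differentiating fprime with the product + chain rules
    return 6 * (x**2 + 1) ** 2 + 24 * x**2 * (x**2 + 1)

a = 0.5

# central finite difference agrees with the chain-rule derivative
h = 1e-6
fd = (f(a + h) - f(a - h)) / (2 * h)
assert abs(fd - fprime(a)) < 1e-4

# second-order Taylor approximation around a, evaluated at a nearby point
d = 0.01
taylor2 = f(a) + fprime(a) * d + 0.5 * fsecond(a) * d**2
assert abs(taylor2 - f(a + d)) < 1e-4
```

The Taylor error shrinks like $d^3$ here, so halving $d$ makes the approximation roughly eight times better.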