Prerequisites for CS467

Linear Algebra

Vector

  • A vector is a list of numbers
  • $x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n$
  • $x^\top = [x_1, x_2, \dots, x_n]$

Inner product

  • given two vectors $x, y \in \mathbb{R}^n$
  • notations: $\langle x, y \rangle$ or $x^\top y$
  • $x^\top y = y^\top x = \sum_{i=1}^{n} x_i y_i$ (a scalar)
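
A minimal NumPy sketch (assuming the `numpy` package is available) showing that the different ways of writing the inner product give the same scalar; the example vectors are arbitrary:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# inner product <x, y> = x^T y = sum_i x_i * y_i
print(np.dot(x, y))        # 32.0
print(x @ y)               # same value, using the @ operator
print(np.sum(x * y))       # same value, from the summation definition
```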

Euclidean norm (2-norm)

  • the length of the vector
  • given a vector $x \in \mathbb{R}^n$
  • $\|x\| = \sqrt{\sum_{i=1}^{n} x_i^2}$
  • $\|x\|^2 = x^\top x$
  • unit vector: $\frac{x}{\|x\|}$
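
A quick check of the norm identities in NumPy; the vector `[3, 4]` is just an arbitrary example:

```python
import numpy as np

x = np.array([3.0, 4.0])

norm = np.linalg.norm(x)    # Euclidean norm, sqrt(3^2 + 4^2) = 5.0
print(norm)
print(np.sqrt(x @ x))       # same value, since ||x||^2 = x^T x

u = x / norm                # unit vector in the direction of x
print(np.linalg.norm(u))    # 1.0
```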

Orthogonal and orthonormal

  • given three vectors $x, y, z \in \mathbb{R}^n$
  • if $\langle x, y \rangle = 0$, then $x$ and $y$ are orthogonal (perpendicular) to each other
  • the set of vectors $\{x, y, z\}$ is said to be orthonormal if $x, y, z$ are mutually orthogonal and $\|x\| = \|y\| = \|z\| = 1$
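
A small sketch verifying orthonormality for the standard basis of $\mathbb{R}^3$ (an assumed example):

```python
import numpy as np

x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
z = np.array([0.0, 0.0, 1.0])

# mutually orthogonal: all pairwise inner products are zero
print(x @ y, y @ z, x @ z)                      # 0.0 0.0 0.0

# each vector has unit length, so the set is orthonormal
print([np.linalg.norm(v) for v in (x, y, z)])   # [1.0, 1.0, 1.0]
```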

Linear independence

  • given a set of $m$ vectors $\{v_1, v_2, \dots, v_m\}$
  • the set is linearly dependent if a vector in the set can be represented
    as a linear combination of the remaining vectors
    • $v_m = \sum_{i=1}^{m-1} \alpha_i v_i, \quad \alpha_i \in \mathbb{R}$
  • otherwise, the set is linearly independent
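
One common numerical check for linear independence is the rank of the matrix whose columns are the vectors; a sketch with arbitrary example vectors:

```python
import numpy as np

# columns of V are v1 = (1,0,0), v2 = (0,1,0), v3 = v1 + v2
V = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0]])

# rank < number of vectors  =>  the set is linearly dependent
print(np.linalg.matrix_rank(V))   # 2, so {v1, v2, v3} is linearly dependent

W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
print(np.linalg.matrix_rank(W))   # 2, so these two columns are linearly independent
```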

Span

  • the span of $\{v_1, \dots, v_m\}$ is the set of all vectors that can be expressed as a linear combination of $\{v_1, \dots, v_m\}$
    • $\operatorname{span}(\{v_1, \dots, v_m\}) = \{u : u = \sum_{i=1}^{m} \alpha_i v_i\}$
  • if $\{v_1, \dots, v_m\}$ is linearly independent, where $v_i \in \mathbb{R}^m$, then any vector $u \in \mathbb{R}^m$ can be written as a linear combination of $v_1$ through $v_m$
    • in other words, $\operatorname{span}(\{v_1, \dots, v_m\}) = \mathbb{R}^m$
    • e.g., $v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $v_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$: $\operatorname{span}\left(\left\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\right\}\right) = \mathbb{R}^2$
    • we call $\{v_1, \dots, v_m\}$ a basis of $\mathbb{R}^m$
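
A sketch of expressing a vector in a basis: solving $B\alpha = u$ for the combination coefficients, assuming the columns of $B$ form a basis (the particular matrix and vector are made up for illustration):

```python
import numpy as np

# columns of B are a basis of R^2 (linearly independent)
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])
u = np.array([3.0, -2.0])

# solve B @ alpha = u for the combination coefficients alpha
alpha = np.linalg.solve(B, u)
print(alpha)        # [ 5. -2.]
print(B @ alpha)    # reconstructs u = [ 3. -2.]
```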

Matrix

  • A matrix $A \in \mathbb{R}^{m \times n}$ is a 2d array of numbers with $m$ rows and $n$ columns
    • if $m = n$, we call $A$ a square matrix
  • Matrix multiplication
    • given two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$
    • $C = AB \in \mathbb{R}^{m \times p}$, where $C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$
    • not commutative: $AB \neq BA$ in general
  • Matrix-vector multiplication
    • can be viewed as a linear combination of the columns of the matrix: $Ax = \sum_{j=1}^{n} x_j a_j$, where $a_j$ is the $j$-th column of $A$
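
A short NumPy sketch of these definitions; the matrices and vector are arbitrary examples:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

print(A @ B)    # matrix product C, with C_ij = sum_k A_ik B_kj
print(B @ A)    # generally a different matrix: multiplication is not commutative

x = np.array([2.0, -1.0])
# A @ x equals x_1 * (first column of A) + x_2 * (second column of A)
print(A @ x)
print(x[0] * A[:, 0] + x[1] * A[:, 1])   # same result
```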

Transpose $A^\top$

  • The transpose of a matrix results from flipping the rows and columns
    • $(A^\top)_{ij} = A_{ji}$
  • Some properties:
    • $(AB)^\top = B^\top A^\top$
    • $(A + B)^\top = A^\top + B^\top$
  • If $A = A^\top$, we call $A$ a symmetric matrix
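
A numerical spot-check of the transpose properties on small example matrices (not from the notes):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[0.0, 1.0],
              [2.0, 3.0]])

# (AB)^T == B^T A^T
print(np.allclose((A @ B).T, B.T @ A.T))   # True

# (A + B)^T == A^T + B^T
print(np.allclose((A + B).T, A.T + B.T))   # True

# A + A^T is symmetric
S = A + A.T
print(np.allclose(S, S.T))                 # True
```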

Trace $\operatorname{tr}(A)$

  • The trace of a square matrix is the sum of its diagonal entries
    • $\operatorname{tr}(A) = \sum_{i=1}^{n} A_{ii}$
  • If $A, B$ are square matrices, then $\operatorname{tr}(AB) = \operatorname{tr}(BA)$
    • $\sum_{i=1}^{n} (AB)_{ii} = \sum_{i=1}^{n} \sum_{k=1}^{n} A_{ik} B_{ki} = \sum_{k=1}^{n} \sum_{i=1}^{n} B_{ki} A_{ik} = \sum_{k=1}^{n} (BA)_{kk}$
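
A quick numerical check of $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ on random matrices (the seed and sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# tr(AB) == tr(BA), even though AB != BA in general
print(np.trace(A @ B))
print(np.trace(B @ A))   # same value up to floating-point error
```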

Inverse $A^{-1}$

  • The inverse of a square matrix $A$ is the unique matrix such that
    • $A^{-1} A = I = A A^{-1}$
  • Let $A$ be an $n \times n$ matrix. $A$ is invertible iff
    • the column vectors of $A$ are linearly independent (they span $\mathbb{R}^n$)
    • $\det(A) \neq 0$
  • Assume that both $A, B$ are invertible. Some properties:
    • $(AB)^{-1} = B^{-1} A^{-1}$
    • $(A^{-1})^\top = (A^\top)^{-1}$
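
A sketch checking these properties with `np.linalg.inv`; the example matrices are arbitrary but invertible:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

A_inv = np.linalg.inv(A)
print(np.allclose(A_inv @ A, np.eye(2)))   # True: A^{-1} A = I

B = np.array([[1.0, 2.0],
              [0.0, 1.0]])
# (AB)^{-1} == B^{-1} A^{-1}
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))   # True

# det(A) != 0 for an invertible matrix
print(np.linalg.det(A))   # 5.0
```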

Orthogonal matrix

  • a square matrix is orthogonal if all its columns are orthonormal
  • $U$ is an orthogonal matrix iff
    • $U^\top U = I = U U^\top$
    • $U^{-1} = U^\top$
  • Note: if $U$ is not square but has orthonormal columns, we still have $U^\top U = I$ but not $U U^\top = I$
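
A sketch using a 2×2 rotation matrix, one standard example of an orthogonal matrix (the angle is arbitrary):

```python
import numpy as np

theta = np.pi / 4
# rotation matrix: its two columns are orthonormal
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(U.T @ U, np.eye(2)))      # True
print(np.allclose(U @ U.T, np.eye(2)))      # True
print(np.allclose(np.linalg.inv(U), U.T))   # True: U^{-1} = U^T
```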

Calculus

Chain rule

  • $\frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx}$ (single-variable)
  • e.g., $y = (x^2 + 1)^3$, $\frac{dy}{dx} = ?$
    • let $u = x^2 + 1$, then $y = u^3$
    • $\frac{dy}{dx} = 3(x^2 + 1)^2 (2x)$
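
A finite-difference spot-check of the chain-rule result above; the evaluation point and step size are arbitrary choices:

```python
import numpy as np

def f(x):
    return (x**2 + 1)**3

def df(x):
    # chain rule: dy/dx = 3(x^2 + 1)^2 * 2x
    return 3 * (x**2 + 1)**2 * (2 * x)

x0, h = 1.5, 1e-6
numeric = (f(x0 + h) - f(x0 - h)) / (2 * h)   # central difference
print(numeric, df(x0))                        # nearly identical values
```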

Critical points

  • $f' > 0$ $\Rightarrow$ $f$ is increasing
  • $f' < 0$ $\Rightarrow$ $f$ is decreasing
  • $f'' > 0$ $\Rightarrow$ $f'$ is increasing $\Rightarrow$ $f$ is concave up
  • $f'' < 0$ $\Rightarrow$ $f'$ is decreasing $\Rightarrow$ $f$ is concave down
  • We call $x = a$ a critical point of a continuous function $f(x)$ if either $f'(a) = 0$ or $f'(a)$ is undefined
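
A tiny worked example, not from the notes: $f(x) = x^3 - 3x$ has critical points where $f'(x) = 3x^2 - 3 = 0$, and the sign of $f''$ classifies them:

```python
def f_prime(x):
    # f(x) = x^3 - 3x  =>  f'(x) = 3x^2 - 3
    return 3 * x**2 - 3

def f_double_prime(x):
    return 6 * x

for a in (-1.0, 1.0):                      # critical points: f'(a) = 0
    print(a, f_prime(a), f_double_prime(a))
# f''(-1) < 0: concave down => local maximum at x = -1
# f''(1)  > 0: concave up   => local minimum at x =  1
```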

Taylor series

  • $f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n$ (single-variable)
  • second-order approximation:
    • $f(x) \approx f(a) + f'(a)(x - a) + \frac{1}{2} f''(a)(x - a)^2$ (single-variable)
      • for $x$ close enough to $a$
    • $f(w) \approx f(u) + \nabla f(u)^\top (w - u) + \frac{1}{2} (w - u)^\top H_f(u) (w - u)$ (multivariate)
      • $w, u \in \mathbb{R}^d$, for $w$ close enough to $u$
      • $\nabla_w f = \left[\frac{\partial f}{\partial w_1}, \dots, \frac{\partial f}{\partial w_d}\right]^\top$
      • $H_f$ is the Hessian matrix, where $(H_f)_{ij} = \frac{\partial^2 f}{\partial w_i \partial w_j}$
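
A sketch of the single-variable second-order approximation for $f(x) = e^x$ around $a = 0$ (the test points are arbitrary):

```python
import numpy as np

a = 0.0

def taylor2(x):
    # f(x) ~= f(a) + f'(a)(x - a) + 0.5 f''(a)(x - a)^2, with f = exp and a = 0
    return 1.0 + (x - a) + 0.5 * (x - a)**2

for x in (0.1, 0.5, 1.0):
    print(x, np.exp(x), taylor2(x))   # approximation degrades as x moves away from a
```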

Probability

Conditional probability

  • $P(A|B)$: the conditional probability of event $A$ happening given that event $B$ has occurred
    • $P(A|B) = \frac{P(A, B)}{P(B)}$
  • $P(A|B) = \frac{P(B|A) P(A)}{P(B)}$ (Bayes rule)
  • if $\{A_i\}_{i=1}^{n}$ is a partition of the sample space, we have $P(B) = \sum_{i=1}^{n} P(B, A_i) = \sum_{i=1}^{n} P(B|A_i) P(A_i)$
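
A small simulation sketch of $P(A|B)$ for a fair die, with $A$ = "roll is 6" and $B$ = "roll is even" (an assumed example, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)   # fair six-sided die

A = (rolls == 6)        # event A: roll is a 6
B = (rolls % 2 == 0)    # event B: roll is even

# P(A|B) = P(A, B) / P(B); should be close to (1/6) / (1/2) = 1/3
p_ab = np.mean(A & B)
p_b = np.mean(B)
print(p_ab / p_b)
```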

Independence and conditional independence

  • random variable $X$ is independent of $Y$ iff
    • $P(X|Y) = P(X)$
    • $P(X, Y) = P(X) P(Y)$
  • random variable $X$ is conditionally independent of $Y$ given $Z$ iff
    • $P(X|Y, Z) = P(X|Z)$
    • $P(X, Y|Z) = P(X|Z) P(Y|Z)$
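
A simulation sketch of conditional independence, under the assumed model that $X$ and $Y$ are independent noisy copies of a hidden $Z$; checking correlation within each value of $Z$ is only a rough diagnostic, not a proof:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
Z = rng.integers(0, 2, size=n)         # hidden coin flip
X = Z + rng.standard_normal(n)         # noisy copy of Z
Y = Z + rng.standard_normal(n)         # another noisy copy, with independent noise

# marginally, X and Y are correlated (both depend on Z)
print(np.corrcoef(X, Y)[0, 1])                   # clearly nonzero

# conditioned on Z, the remaining variation is independent noise
print(np.corrcoef(X[Z == 0], Y[Z == 0])[0, 1])   # close to 0
print(np.corrcoef(X[Z == 1], Y[Z == 1])[0, 1])   # close to 0
```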

Expectation

  • $\mathcal{X}$: the sample space, i.e., the set of all possible outcomes of an experiment
    • e.g., the face of a die, $\mathcal{X} = \{1, 2, 3, 4, 5, 6\}$
  • $p(x)$: probability mass function (discrete) or probability density function (continuous)
  • $E[X] = \sum_{x \in \mathcal{X}} x\, p(x)$ (discrete)
  • $E[X] = \int_{\mathcal{X}} x\, p(x)\, dx$ (continuous)
    • more generally, $E[g(X)] = \int_{\mathcal{X}} g(x)\, p(x)\, dx$
  • Linearity of expectation
    • $E[aX + b] = a E[X] + b$
    • $E[X + Y] = E[X] + E[Y]$
  • If $X$ and $Y$ are independent, $E[XY] = E[X] E[Y]$
    • why? plug $P(X, Y) = P(X) P(Y)$ into the definition of expectation
    • the converse is not true
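
A sketch computing $E[X]$ for a fair die directly from the definition and checking linearity (the constants $a, b$ are arbitrary):

```python
import numpy as np

faces = np.arange(1, 7)      # sample space of a fair die
p = np.full(6, 1 / 6)        # probability mass function

E_X = np.sum(faces * p)
print(E_X)                   # 3.5

# linearity: E[aX + b] = a E[X] + b
a, b = 2.0, 1.0
print(np.sum((a * faces + b) * p), a * E_X + b)   # both 8.0
```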

Variance and covariance

  • $\operatorname{Var}[X] = E[(X - \mu)^2]$, where $\mu = E[X]$
    • $= E[X^2] - \mu^2$
  • $\operatorname{Var}[aX + b] = a^2 \operatorname{Var}[X]$
  • $\operatorname{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)]$
  • If $X$ and $Y$ are independent, then $\operatorname{Cov}[X, Y] = 0$
    • the converse is not true
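
A quick Monte Carlo check of these identities on independent standard normal samples (the seed, constants, and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal(100_000)
Y = rng.standard_normal(100_000)   # independent of X

# Var[aX + b] = a^2 Var[X]
a, b = 3.0, 5.0
print(np.var(a * X + b), a**2 * np.var(X))   # equal up to floating-point error

# independent => covariance close to 0
print(np.cov(X, Y)[0, 1])
```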

Covariance matrix

  • Consider $d$ random variables $X_1, \dots, X_d$
  • $\operatorname{Cov}[X_i, X_j] = E[(X_i - \mu_{X_i})(X_j - \mu_{X_j})]$
  • We can write a $d \times d$ matrix $\Sigma$, where $\Sigma_{ij} = \operatorname{Cov}[X_i, X_j]$
    • when $i = j$ (diagonal), $E[(X_i - \mu_{X_i})(X_i - \mu_{X_i})] = \operatorname{Var}[X_i]$
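
A sketch building a covariance matrix from samples with `np.cov`, which by default treats each row of its input as one variable (the data here is random and only for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
# 10,000 samples of d = 3 random variables, one row per variable
data = rng.standard_normal((3, 10_000))

Sigma = np.cov(data)                  # 3x3 covariance matrix
print(Sigma.shape)                    # (3, 3)
print(np.diag(Sigma))                 # diagonal entries estimate Var[X_i]
print(np.allclose(Sigma, Sigma.T))    # covariance matrices are symmetric
```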