# Prerequisites for CS467
## Linear Algebra
### Vector
- A vector is a list of numbers
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n, \qquad x^\top = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}$$
### Inner product
- given two vectors: $x, y \in \mathbb{R}^n$
- notations: $\langle x, y \rangle$ or $x^\top y$
- $x^\top y = y^\top x = \sum_{i=1}^{n} x_i y_i$ (a scalar)
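A quick sanity check in NumPy (a minimal sketch; the vector values are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# inner product <x, y> = x^T y = sum_i x_i * y_i
print(np.dot(x, y))      # 32.0
print(np.sum(x * y))     # same value, computed element-wise
print(np.dot(y, x))      # symmetric: y^T x = x^T y
```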
### Euclidean norm (2-norm)
- the length of the vector
- given a vector: $x \in \mathbb{R}^n$
- $\|x\| = \sqrt{\sum_{i=1}^{n} x_i^2}$
- $\|x\|^2 = x^\top x$
- unit vector: $\dfrac{x}{\|x\|}$
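A small NumPy sketch of the norm and the corresponding unit vector (example values are arbitrary):

```python
import numpy as np

x = np.array([3.0, 4.0])

norm = np.linalg.norm(x)       # sqrt(3^2 + 4^2) = 5.0
print(norm, np.sqrt(x @ x))    # ||x|| and sqrt(x^T x) agree

unit = x / norm                # unit vector in the direction of x
print(np.linalg.norm(unit))    # 1.0
```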
### Orthogonal and orthonormal
- given three vectors: $x, y, z \in \mathbb{R}^n$
- if $\langle x, y \rangle = 0$, then $x$ and $y$ are **orthogonal** (perpendicular) to each other
- the set of vectors $\{x, y, z\}$ is said to be **orthonormal** if $x, y, z$ are mutually orthogonal, and $\|x\| = \|y\| = \|z\| = 1$
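For example, the standard basis vectors of $\mathbb{R}^3$ form an orthonormal set; a minimal check in NumPy:

```python
import numpy as np

e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
e3 = np.array([0.0, 0.0, 1.0])

# mutually orthogonal: all pairwise inner products are 0
print(e1 @ e2, e1 @ e3, e2 @ e3)                     # 0.0 0.0 0.0
# each vector has unit length
print([np.linalg.norm(v) for v in (e1, e2, e3)])     # [1.0, 1.0, 1.0]
```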
### Linear independence
- given a set of $m$ vectors $\{v_1, v_2, \dots, v_m\}$
- the set is **linearly dependent** if a vector in the set can be represented as a linear combination of the remaining vectors
$$v_m = \sum_{i=1}^{m-1} \alpha_i v_i, \quad \alpha_i \in \mathbb{R}$$
- otherwise, the set is **linearly independent**
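One common numerical check (a sketch, not part of the notes above): stack the vectors as columns of a matrix and compare its rank with the number of vectors.

```python
import numpy as np

v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = v1 + 2 * v2                  # deliberately a linear combination of v1, v2

A = np.column_stack([v1, v2, v3])
# rank < number of vectors  =>  the set is linearly dependent
print(np.linalg.matrix_rank(A))   # 2, so {v1, v2, v3} is linearly dependent
```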
### Span
- the **span** of $\{v_1, \dots, v_m\}$ is the set of all vectors that can be expressed as a linear combination of $\{v_1, \dots, v_m\}$
$$\text{span}(\{v_1, \dots, v_m\}) = \Big\{u : u = \sum_{i=1}^{m} \alpha_i v_i\Big\}$$
- if $\{v_1, \dots, v_m\}$ is linearly independent, where $v_i \in \mathbb{R}^m$, then any vector $u \in \mathbb{R}^m$ can be written as a linear combination of $v_1$ through $v_m$
- in other words, $\text{span}(\{v_1, \dots, v_m\}) = \mathbb{R}^m$
- e.g. $v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $v_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, $\text{span}\Big(\Big\{\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\Big\}\Big) = \mathbb{R}^2$
- we call $\{v_1, \dots, v_m\}$ a **basis** of $\mathbb{R}^m$
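Given a basis, the coefficients of any target vector can be recovered by solving a linear system; a minimal NumPy sketch (the basis and target below are arbitrary examples):

```python
import numpy as np

# columns form a basis of R^2 (linearly independent)
V = np.array([[1.0, 1.0],
              [0.0, 2.0]])
u = np.array([3.0, 4.0])        # any target vector in R^2

alpha = np.linalg.solve(V, u)   # coefficients such that V @ alpha = u
print(alpha, V @ alpha)         # reconstructs u exactly
```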
### Matrix
- A matrix $A \in \mathbb{R}^{m \times n}$ is a 2d array of numbers with $m$ rows and $n$ columns
- if $m = n$, we call $A$ a square matrix
**Matrix multiplication**
- given two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times p}$
- $C = AB \in \mathbb{R}^{m \times p}$, where $C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$
- not commutative: $AB \neq BA$
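A short NumPy illustration of the shapes involved and of non-commutativity (the matrices are arbitrary examples):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])    # 2 x 3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])         # 3 x 2

C = A @ B                          # 2 x 2, C_ij = sum_k A_ik B_kj
print(C.shape, C)

# not commutative: B @ A is 3 x 3, a different matrix entirely
print((B @ A).shape)
```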
**Matrix-vector multiplication**
- can be viewed as a linear combination of the columns of a matrix (see the sketch below)
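A minimal sketch of this view, $Ax = \sum_i x_i \, a_i$ where $a_i$ is the $i$-th column of $A$ (example values are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
x = np.array([10.0, -1.0])

# A @ x equals the linear combination x_1 * (column 1) + x_2 * (column 2)
combo = x[0] * A[:, 0] + x[1] * A[:, 1]
print(A @ x, combo)               # the two agree
```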
### Transpose $A^\top$
- The transpose of a matrix results from flipping the rows and columns
- $(A^\top)_{ij} = A_{ji}$
- Some properties:
  - $(AB)^\top = B^\top A^\top$
  - $(A + B)^\top = A^\top + B^\top$
- If $A^\top = A$, we call $A$ a **symmetric** matrix
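A quick numerical check of the two transpose properties (random matrices, sketch only):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
D = rng.standard_normal((2, 3))

print(np.allclose((A @ B).T, B.T @ A.T))   # (AB)^T = B^T A^T
print(np.allclose((A + D).T, A.T + D.T))   # (A+B)^T = A^T + B^T
```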
### Trace $\text{tr}(A)$
- The trace of a square matrix is the sum of its diagonal
$$\text{tr}(A) = \sum_{i=1}^{n} A_{ii}$$
- If $A, B$ are square matrices, then $\text{tr}(AB) = \text{tr}(BA)$
$$\sum_{i=1}^{n} (AB)_{ii} = \sum_{i=1}^{n} \sum_{k=1}^{n} A_{ik} B_{ki} = \sum_{k=1}^{n} \sum_{i=1}^{n} B_{ki} A_{ik} = \sum_{k=1}^{n} (BA)_{kk}$$
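A numerical spot check of this property (random square matrices, sketch only):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

print(np.trace(A))                                   # sum of the diagonal of A
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # tr(AB) = tr(BA)
```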
### Inverse $A^{-1}$
- The inverse of a square matrix $A$ is the unique matrix such that $A^{-1}A = I = AA^{-1}$
- Let $A$ be an $n \times n$ matrix. $A$ is invertible iff
  - the column vectors of $A$ are linearly independent (span $\mathbb{R}^n$)
  - $\det(A) \neq 0$
- Assume that both $A, B$ are invertible. Some properties:
  - $(AB)^{-1} = B^{-1} A^{-1}$
  - $(A^{-1})^\top = (A^\top)^{-1}$
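A quick NumPy check of the definition and the two properties (random matrices, which are invertible with probability 1; sketch only):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

A_inv = np.linalg.inv(A)
print(np.allclose(A_inv @ A, np.eye(n)))                  # A^{-1} A = I
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))   # (AB)^{-1} = B^{-1} A^{-1}
print(np.allclose(np.linalg.inv(A).T, np.linalg.inv(A.T)))  # (A^{-1})^T = (A^T)^{-1}
```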
### Orthogonal matrix
- a square matrix is orthogonal if all its columns are **orthonormal**
- $U$ is an orthogonal matrix iff $U^\top U = I = U U^\top$
- $U^{-1} = U^\top$
- Note: if $U$ is not square but has orthonormal columns, we still have $U^\top U = I$ but not $U U^\top = I$
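One easy way to get an orthogonal matrix to experiment with is the Q factor of a QR decomposition (a sketch, not part of the notes above):

```python
import numpy as np

rng = np.random.default_rng(3)
U, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # Q factor is orthogonal

print(np.allclose(U.T @ U, np.eye(4)))             # U^T U = I
print(np.allclose(U @ U.T, np.eye(4)))             # U U^T = I (square case)
print(np.allclose(np.linalg.inv(U), U.T))          # U^{-1} = U^T
```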
## Calculus
### Chain rule
- $\dfrac{dy}{dx} = \dfrac{dy}{du} \dfrac{du}{dx}$ (single-variable)
- e.g., $y = (x^2 + 1)^3$, $\dfrac{dy}{dx} = ?$
  - let $u = x^2 + 1$, then $y = u^3$
  - $\dfrac{dy}{dx} = 3(x^2 + 1)^2 (2x)$
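The worked example can be checked symbolically; a minimal SymPy sketch (assuming SymPy is installed):

```python
import sympy as sp

x = sp.symbols('x')
y = (x**2 + 1)**3

dy_dx = sp.diff(y, x)
# difference with the chain-rule answer simplifies to 0
print(sp.simplify(dy_dx - 3 * (x**2 + 1)**2 * 2 * x))   # 0
```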
### Critical points
- $f' > 0 \Rightarrow f$ is increasing
- $f' < 0 \Rightarrow f$ is decreasing
- $f'' > 0 \Rightarrow f'$ is increasing $\Rightarrow f$ is concave up
- $f'' < 0 \Rightarrow f'$ is decreasing $\Rightarrow f$ is concave down
- We call $x = a$ a critical point of a continuous function $f(x)$ if either $f'(a) = 0$ or $f'(a)$ is undefined
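A small SymPy sketch that finds the critical points of an example function and inspects concavity via the second derivative (the function is made up for illustration):

```python
import sympy as sp

x = sp.symbols('x')
f = x**3 - 3 * x          # example function

f1 = sp.diff(f, x)        # f'
f2 = sp.diff(f, x, 2)     # f''

critical = sp.solve(sp.Eq(f1, 0), x)   # points where f'(a) = 0
for a in critical:
    # f''(a) > 0 => concave up (local min); f''(a) < 0 => concave down (local max)
    print(a, f2.subs(x, a))
```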
### Taylor series
- $f(x) = \sum_{n=0}^{\infty} \dfrac{f^{(n)}(a)}{n!} (x - a)^n$ (single-variable)
- second order approximation (single-variable), for $x$ close enough to $a$:
$$f(x) \approx f(a) + f'(a)(x - a) + \frac{1}{2} f''(a)(x - a)^2$$
- multivariate version, for $w, u \in \mathbb{R}^d$ and $w$ close enough to $u$:
$$f(w) \approx f(u) + \nabla f(u)^\top (w - u) + \frac{1}{2} (w - u)^\top H_f(u)\,(w - u)$$
  - $\nabla_w f^\top = \left[ \dfrac{\partial f}{\partial w_1}, \cdots, \dfrac{\partial f}{\partial w_d} \right]$
  - $H_f$ is the Hessian matrix, where $(H_f)_{ij} = \dfrac{\partial^2 f}{\partial w_i \partial w_j}$
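A numerical sketch of the single-variable second-order approximation around a point (the function and point are chosen arbitrarily):

```python
import numpy as np

f   = np.exp                 # example function
df  = np.exp                 # f'  (exp is its own derivative)
d2f = np.exp                 # f''

a, x = 0.0, 0.1              # expand around a, evaluate at a nearby x

approx = f(a) + df(a) * (x - a) + 0.5 * d2f(a) * (x - a) ** 2
print(f(x), approx)          # 1.10517... vs 1.105, close for x near a
```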
## Probability
### Conditional probability
- $P(A|B)$: the conditional probability of event $A$ happening given that event $B$ has occurred
$$P(A|B) = \frac{P(A, B)}{P(B)}$$
$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)} \quad (\textbf{Bayes rule})$$
- $\{A_i\}_{i=1}^{n}$ is a partition of the sample space. We have
$$P(B) = \sum_{i=1}^{n} P(B, A_i) = \sum_{i=1}^{n} P(B|A_i)\,P(A_i)$$
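A tiny worked example of Bayes rule, using the law of total probability for the denominator (all numbers are made up for illustration):

```python
# P(A): prior; P(B|A) and P(B|not A): likelihoods
p_A = 0.01
p_B_given_A = 0.95
p_B_given_notA = 0.10

# law of total probability over the partition {A, not A}
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes rule: P(A|B) = P(B|A) P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)   # about 0.088
```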
### Independence and conditional independence
- random variable $X$ is independent of $Y$ iff
  - $P(X|Y) = P(X)$
  - $P(X, Y) = P(X)\,P(Y)$
- random variable $X$ is conditionally independent of $Y$ given $Z$ iff
  - $P(X|Y, Z) = P(X|Z)$
  - $P(X, Y|Z) = P(X|Z)\,P(Y|Z)$
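A minimal check of $P(X, Y) = P(X)P(Y)$ on a small joint probability table (values made up so that the two binary variables are independent):

```python
import numpy as np

# joint[i, j] = P(X = i, Y = j) for two binary random variables
joint = np.array([[0.12, 0.28],
                  [0.18, 0.42]])

p_x = joint.sum(axis=1)      # marginal P(X)
p_y = joint.sum(axis=0)      # marginal P(Y)

# independence: the joint factorizes into the product of marginals
print(np.allclose(joint, np.outer(p_x, p_y)))   # True
```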
### Expectation
- $X$: the **sample space**. The set of all possible outcomes of an experiment.
  - e.g., the faces of a die, $X = \{1, 2, 3, 4, 5, 6\}$
- $p(x)$: probability mass function (discrete) or probability density function (continuous)
- $E[X] = \sum_{x \in X} x \, p(x)$ (discrete)
- $E[X] = \int_{X} x \, p(x) \, dx$ (continuous)
- more generally, $E[g(X)] = \int_{X} g(x) \, p(x) \, dx$
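The die example in a few lines (a fair die is assumed, so $p(x) = 1/6$ for each face):

```python
import numpy as np

faces = np.arange(1, 7)            # sample space {1, ..., 6}
p = np.full(6, 1 / 6)              # fair die: p(x) = 1/6

expectation = np.sum(faces * p)    # E[X] = sum_x x p(x)
print(expectation)                 # 3.5
```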
**Linearity of expectation**
- $E[aX + b] = a\,E[X] + b$
- $E[X + Y] = E[X] + E[Y]$
- If $X$ and $Y$ are independent, $E[XY] = E[X]\,E[Y]$
  - why? plug $P(X, Y) = P(X)\,P(Y)$ into the definition of expectation
  - the converse is **not** true
### Variance and covariance
- $\text{Var}[X] = E[(X - \mu)^2]$, where $\mu = E[X]$
  - $= E[X^2] - \mu^2$
- $\text{Var}[aX + b] = a^2\,\text{Var}[X]$
- $\text{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)]$
- If $X$ and $Y$ are independent, then $\text{Cov}[X, Y] = 0$
  - the converse is **not** true
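A quick Monte Carlo sketch of the two variance identities (sample size and distribution are arbitrary; the results are only approximate):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(100_000)

# Var[X] = E[X^2] - mu^2
mu = x.mean()
print(np.var(x), np.mean(x**2) - mu**2)        # the two agree

# Var[aX + b] = a^2 Var[X]
a, b = 3.0, 5.0
print(np.var(a * x + b), a**2 * np.var(x))     # approximately equal
```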
**Covariance matrix**
- Consider $d$ random variables $X_1, \dots, X_d$
$$\text{Cov}[X_i, X_j] = E[(X_i - \mu_{X_i})(X_j - \mu_{X_j})]$$
- We can write a $d \times d$ matrix $\Sigma$, where $\Sigma_{ij} = \text{Cov}[X_i, X_j]$
- when $i = j$ (diagonal), $E[(X_i - \mu_{X_i})(X_i - \mu_{X_i})] = \text{Var}[X_i]$
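An empirical covariance matrix estimated from samples, with the diagonal matching the per-variable variances (the sample data is made up):

```python
import numpy as np

rng = np.random.default_rng(5)
# 10,000 samples of d = 3 correlated random variables (rows = variables)
X = rng.multivariate_normal(mean=[0, 0, 0],
                            cov=[[1.0, 0.5, 0.0],
                                 [0.5, 2.0, 0.3],
                                 [0.0, 0.3, 1.5]],
                            size=10_000).T

Sigma = np.cov(X)                       # d x d covariance matrix
print(Sigma.shape)                      # (3, 3)
print(np.allclose(np.diag(Sigma), np.var(X, axis=1, ddof=1)))   # diagonal = variances
```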