# CS467 Cheatsheet

Notes by Ting-Yun Chang for [CSCI 467, Spring 2023](https://robinjia.github.io/classes/spring2023-csci467/).
## Gradient Descent

- Gradient: $\nabla_w f(w) = \begin{bmatrix} \frac{\partial f}{\partial w_1} \\ \vdots \\ \frac{\partial f}{\partial w_d} \end{bmatrix} \in \mathbb{R}^d$, where $w \in \mathbb{R}^d$
- The gradient is the direction of steepest ascent
- The negative gradient is the direction of steepest descent
- Update rule: $w \leftarrow w - \eta \nabla_w L(w)$, where $L(w)$ is the loss function and $\eta$ is the learning rate
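A minimal sketch of this update rule in NumPy, assuming we can evaluate the gradient of the loss; the toy quadratic loss below is an illustration, not from the notes:

```python
import numpy as np

def gradient_descent(grad_fn, w0, eta=0.1, num_steps=200):
    """Repeatedly apply the update rule w <- w - eta * grad L(w)."""
    w = w0.copy()
    for _ in range(num_steps):
        w = w - eta * grad_fn(w)
    return w

# Toy loss (assumption): L(w) = ||w - 3||^2, with gradient 2 * (w - 3),
# so gradient descent should converge to w = [3, 3].
grad_L = lambda w: 2 * (w - 3.0)
print(gradient_descent(grad_L, w0=np.zeros(2)))  # approximately [3. 3.]
```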
## Convexity

- Convex function $f$: a line segment connecting $(x_1, f(x_1))$ and $(x_2, f(x_2))$ must lie on or above the function
- If $f''(x) \geq 0$ and exists everywhere, then $f$ is convex
- If $f$ is convex, then $g(x) = f(Ax + b)$ is convex
- If $f(x)$ and $g(x)$ are convex, so is $f(x) + g(x)$
- For a convex function, any local minimum is a global minimum
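As a worked use of the second-derivative test (this derivation is not in the original notes, but it supports the claim in the Logistic Regression section below that $-\log \sigma(\cdot)$ is convex), take $f(z) = -\log \sigma(z)$ with $\sigma(z) = \frac{1}{1 + e^{-z}}$:

$$
f'(z) = -\frac{\sigma'(z)}{\sigma(z)} = -\bigl(1 - \sigma(z)\bigr) = \sigma(z) - 1,
\qquad
f''(z) = \sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr) \geq 0,
$$

so $f$ is convex. Combined with the composition rule ($f(Ax + b)$ is convex when $f$ is) and the sum rule above, the logistic regression loss is convex in $w$.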
## Maximum Likelihood Estimation (MLE)

- Posit a **probabilistic process** that generated our data
- Find parameters $w$ that make the observed data seem most likely
- $\max_w P(D; w)$, where the data $D = \{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$
- View $w$ as constant-valued but unknown
- **Recipe**: take the negative log-likelihood of the data as the loss function, then use **gradient descent** to find the parameters $w$ that minimize the loss
- e.g., for **discriminative** models, $L(w) = -\sum_{i=1}^{n} \log P(y^{(i)} \mid x^{(i)}; w)$
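A standard worked instance of this recipe (not from the original notes, added for illustration): estimate the heads probability $p$ of a coin from $n$ independent flips containing $k$ heads.

$$
L(p) = -\log P(D; p) = -\bigl[k \log p + (n - k)\log(1 - p)\bigr],
\qquad
L'(p) = -\frac{k}{p} + \frac{n - k}{1 - p} = 0
\;\Rightarrow\;
\hat{p}_{\mathrm{MLE}} = \frac{k}{n}.
$$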
## Maximum A Posteriori Estimation (MAP)

- Assume a prior $P(w)$, which models our prior belief or preference about $w$
- View $w$ as a random variable
- $\max_w P(w \mid D)$: find the parameters $w$ most probable after seeing the data $D$
- $= \max_w P(D \mid w)\, P(w)$
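Spelling out the Bayes' rule step (the middle expression is implied but not written in the original notes):

$$
\arg\max_w P(w \mid D) = \arg\max_w \frac{P(D \mid w)\,P(w)}{P(D)} = \arg\max_w P(D \mid w)\,P(w),
$$

since $P(D)$ does not depend on $w$.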
## Linear/Logistic/Softmax Regression

- Linear regression for regression tasks (predict scalars)
  - Has a closed-form solution, aka the normal equation: set $\nabla_w L(w)$ to $0$
  - $w^* = (X^\top X)^{-1} X^\top y$, where $X \in \mathbb{R}^{n \times d}$, $y \in \mathbb{R}^n$ (see the sketch after this list)
- Logistic regression for binary classification tasks
- Softmax regression for multiclass classification tasks
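A minimal sketch of the normal equation on synthetic data; the data and the use of `np.linalg.solve` in place of an explicit matrix inverse are assumptions, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))                # design matrix, X in R^{n x d}
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)  # targets, y in R^n

# Normal equation: w* = (X^T X)^{-1} X^T y.
# Solving the linear system is numerically preferable to forming the inverse.
w_star = np.linalg.solve(X.T @ X, X.T @ y)
print(w_star)  # close to [2.0, -1.0, 0.5]
```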
### Logistic vs. Softmax

- Logistic function: $\sigma(z) = \frac{1}{1 + e^{-z}}$, range: $(0, 1)$
- Softmax function: $s(z_i) = \frac{e^{z_i}}{\sum_{k=1}^{C} e^{z_k}}$, with $\sum_{i=1}^{C} s(z_i) = 1$
- With only two classes (binary classification), softmax regression is equivalent to logistic regression:
  $$\frac{e^{z_1}}{e^{z_0} + e^{z_1}} = \frac{1}{e^{(z_0 - z_1)} + 1} = \frac{1}{1 + e^{-z'}}, \quad \text{where } z' = z_1 - z_0$$
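A quick numeric check of the two-class equivalence (the helper names and scores are assumptions, not from the notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - np.max(z)              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

z = np.array([0.3, 1.7])           # arbitrary two-class scores z_0, z_1
p_softmax = softmax(z)[1]          # P(class 1) under softmax
p_logistic = sigmoid(z[1] - z[0])  # P(class 1) under logistic, z' = z_1 - z_0
print(p_softmax, p_logistic)       # the two values agree
```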
### Logistic Regression

- $P(y = 1 \mid x) = \sigma(w^\top x)$
- $P(y = -1 \mid x) = 1 - P(y = 1 \mid x) = 1 - \frac{1}{1 + e^{-w^\top x}} = \sigma(-w^\top x)$
- Thus, $P(y \mid x)$ can be written as $\sigma(y\, w^\top x)$
- With MLE, loss $L(w) = \sum_{i=1}^{n} -\log \sigma(y^{(i)} w^\top x^{(i)})$
- We call $y^{(i)} w^\top x^{(i)}$ the *margin*; the larger the margin, the lower the loss
- $-\log \sigma(\cdot)$ is convex $\rightarrow$ the logistic regression loss function is convex
- $\nabla_w L(w) = -\sum_{i=1}^{n} \sigma(-y^{(i)} w^\top x^{(i)}) \cdot y^{(i)} \cdot x^{(i)}$
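A minimal NumPy sketch of this loss and gradient with labels $y \in \{-1, +1\}$; the synthetic data is an assumption, not from the notes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, X, y):
    # L(w) = sum_i -log sigma(y_i * w^T x_i)
    margins = y * (X @ w)
    return -np.sum(np.log(sigmoid(margins)))

def grad(w, X, y):
    # grad L(w) = -sum_i sigma(-y_i * w^T x_i) * y_i * x_i
    coeffs = sigmoid(-y * (X @ w)) * y
    return -X.T @ coeffs

# Synthetic linearly-separable-ish data (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

w = np.zeros(2)
eta = 0.01
for _ in range(500):          # gradient descent on the logistic loss
    w -= eta * grad(w, X, y)
print(w, loss(w, X, y))
```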
## Discriminative vs. Generative Models

### Discriminative

- Directly learn $P(y \mid x)$
- MLE maximizes the conditional likelihood $P(y \mid x)$
  - Likelihood of the data: $\prod_{i=1}^{n} P(y^{(i)} \mid x^{(i)}; w)$
  - Log-likelihood: $\sum_{i=1}^{n} \log P(y^{(i)} \mid x^{(i)}; w)$
- e.g., logistic regression
### Generative

- Learn $P(x \mid y)$ (features given the class)
- Learn $P(y)$ (class prior)
- MLE maximizes the joint likelihood $P(x, y) = P(x \mid y)\, P(y)$
- Prediction: we plug in the learned $P(x \mid y)$ and $P(y)$ to get $P(y \mid x)$
  - $P(y \mid x) = \frac{P(x \mid y)\, P(y)}{P(x)}$, where $P(x) = \sum_{k=1}^{C} P(x \mid y = k)\, P(y = k)$
  - e.g., for binary classification, $P(y = 1 \mid x) = \frac{P(x \mid y = 1)\, P(y = 1)}{P(x \mid y = 0)\, P(y = 0) + P(x \mid y = 1)\, P(y = 1)}$
- e.g., Naive Bayes
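A minimal sketch of the prediction step for a generative classifier, assuming $P(x \mid y)$ and $P(y)$ have already been learned; the hard-coded 1-D Gaussian class-conditionals and priors below are assumptions, not from the notes:

```python
import numpy as np
from scipy.stats import norm

# Assumed learned quantities: class priors P(y=k) and class-conditional
# densities P(x | y=k), here 1-D Gaussians with fixed means and stds.
priors = np.array([0.6, 0.4])          # P(y=0), P(y=1)
cond = [norm(loc=-1.0, scale=1.0),     # P(x | y=0)
        norm(loc=+2.0, scale=1.0)]     # P(x | y=1)

def posterior(x):
    # P(y=k | x) = P(x | y=k) P(y=k) / sum_j P(x | y=j) P(y=j)
    joint = np.array([cond[k].pdf(x) * priors[k] for k in range(2)])
    return joint / joint.sum()

print(posterior(0.5))  # posterior over the two classes at x = 0.5
```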
## kNN

- Keep the training set when making predictions ("non-parametric")
- Define a metric that measures the distance between two datapoints, e.g., Euclidean distance
- Choose a hyperparameter $k$
- To predict the label of a test input $x$:
  - Find its $k$ nearest neighbors in the training set
  - Predict the label that is most common among the neighbors
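A minimal NumPy sketch of this procedure with Euclidean distance; the function name and toy data are assumptions, not from the notes:

```python
import numpy as np

def knn_predict(x, X_train, y_train, k=3):
    # Euclidean distance from the test input x to every training point
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k nearest neighbors
    nearest = np.argsort(dists)[:k]
    # Predict the most common label among the neighbors (majority vote)
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy training set (assumption): two clusters with labels 0 and 1.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(np.array([0.95, 0.9]), X_train, y_train, k=3))  # -> 1
```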