NYCU-2022-Fall
Temporary Grading Policy: Midterm Exam (28%), Final Exam (37%), Homework (35%)
From chapters 1–3
Root-mean-square error \(E_{RMS}=\sqrt{2E(\mathbf{w}^{*})/N}\)
Regularization
It refers to the act of modifying a learning algorithm to favor “simpler” prediction rules to avoid overfitting. Most commonly, regularization refers to modifying the loss function to penalize certain values of the weights you are learning.
Minimize error: \(E(\mathbf{w})=\frac{1}{2}\sum_{n=1}^{N}\{y(x_n,\mathbf{w})-t_n\}^{2}\)
Model selection: choosing \(M\)
Regularization: \(\widetilde{E}(\mathbf{w})=\frac{1}{2}\sum_{n=1}^{N}\{ y(x_n,\mathbf{w})-t_n \}^{2}+\frac{\lambda}{2}\|\mathbf{w}\|^{2}\)
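As a concrete illustration, here is a minimal NumPy sketch of the curve-fitting example (the toy data, degree \(M\), and \(\lambda\) are made up for illustration); it fits a polynomial by regularized least squares and reports \(E(\mathbf{w})\) and \(E_{RMS}\):

```python
import numpy as np

# Toy data: noisy samples of sin(2*pi*x), as in the textbook's curve-fitting example.
rng = np.random.default_rng(0)
N = 10
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

M = 9           # polynomial degree (model selection: choosing M)
lam = 1e-3      # regularization parameter lambda (made-up value)

# Design matrix with columns 1, x, x^2, ..., x^M.
Phi = np.vander(x, M + 1, increasing=True)

# Regularized least squares: w* = (lambda*I + Phi^T Phi)^{-1} Phi^T t.
w = np.linalg.solve(lam * np.eye(M + 1) + Phi.T @ Phi, Phi.T @ t)

# Unregularized error E(w) and root-mean-square error E_RMS.
E = 0.5 * np.sum((Phi @ w - t) ** 2)
E_rms = np.sqrt(2 * E / N)
print(E, E_rms)
```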
In the Bayesian framework, the regularized error function above arises from maximizing the posterior distribution over \(\mathbf{w}\); maximum likelihood alone yields the unregularized sum-of-squares error.
Sum rule: \(p(X)=\sum_Y p(X,Y)\)
Product rule: \(p(X,Y)=p(X|Y)p(Y)=p(Y|X)p(X)\)
Expectation: \(\mathbb{E}[f]=\sum_x p(x)f(x)\)
Conditional expectation: \(\mathbb{E}_x[f|y] = \sum_x p(x|y)f(x)\)
Covariance of two random variables \(x\) and \(y\): \(\mathrm{cov}[x,y]=\mathbb{E}_{x,y}[xy]-\mathbb{E}[x]\mathbb{E}[y]\)
Variance: \(\mathrm{var}[f]=\mathbb{E}[(f(x)-\mathbb{E}[f(x)])^2]\)
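A quick numeric sanity check of the rules above on a made-up 2×2 joint distribution:

```python
import numpy as np

# A made-up joint distribution p(X, Y) over x in {0, 1}, y in {0, 1}.
p_xy = np.array([[0.3, 0.1],
                 [0.2, 0.4]])   # rows index x, columns index y

# Sum rule: p(X) = sum_Y p(X, Y).
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# Product rule: p(X, Y) = p(Y | X) p(X).
p_y_given_x = p_xy / p_x[:, None]
assert np.allclose(p_y_given_x * p_x[:, None], p_xy)

# Expectation, variance, and covariance with f the identity.
x_vals = np.array([0.0, 1.0])
y_vals = np.array([0.0, 1.0])
E_x = np.sum(p_x * x_vals)
E_y = np.sum(p_y * y_vals)
var_x = np.sum(p_x * (x_vals - E_x) ** 2)
E_xy = np.sum(p_xy * np.outer(x_vals, y_vals))
cov_xy = E_xy - E_x * E_y   # cov[x, y] = E[xy] - E[x]E[y]
print(E_x, var_x, cov_xy)
```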
Bayes: \(P(W|D)=\frac{P(D|W)P(W)}{P(D)}\)
Posterior probability: the conditional probability of the parameters obtained after the relevant evidence or data has been taken into account. \(posterior = \frac{likelihood \times prior}{normalization}\)
\(P(D)\): evidence
\(P(W|D)\): posterior
\(P(D|W)\): likelihood function
It expresses how probable the observed data set is for different settings of the parameter vector \(w\).
\(P(W)\): prior
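A minimal numeric check of Bayes' theorem, using a made-up two-hypothesis example:

```python
# Made-up prior and likelihood for two hypotheses W in {w0, w1}.
prior = {"w0": 0.8, "w1": 0.2}          # P(W)
likelihood = {"w0": 0.1, "w1": 0.9}     # P(D | W)

# Evidence P(D) by the sum rule: sum_W P(D | W) P(W).
evidence = sum(likelihood[w] * prior[w] for w in prior)

# Posterior P(W | D) = P(D | W) P(W) / P(D).
posterior = {w: likelihood[w] * prior[w] / evidence for w in prior}
print(posterior)   # {'w0': 0.307..., 'w1': 0.692...}
```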
A widely used estimator among frequentists is maximum likelihood, in which \(w\) is set to the value that maximizes the likelihood function \(p(D|w)\). This corresponds to choosing the value of \(w\) for which the probability of the observed data set is maximized.
In the machine learning literature, the negative logarithm of the likelihood function is called an error function. Because the negative logarithm is a monotonically decreasing function, maximizing the likelihood is equivalent to minimizing the error.
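Concretely, for the Gaussian noise model \(p(t_n|x_n,\mathbf{w},\beta)=\mathcal{N}(t_n|y(x_n,\mathbf{w}),\beta^{-1})\) the negative log-likelihood is

$$
-\ln p(\mathbf{t}\mid\mathbf{x},\mathbf{w},\beta)=\frac{\beta}{2}\sum_{n=1}^{N}\{y(x_n,\mathbf{w})-t_n\}^{2}-\frac{N}{2}\ln\beta+\frac{N}{2}\ln(2\pi),
$$

so maximizing the likelihood with respect to \(\mathbf{w}\) is equivalent to minimizing the sum-of-squares error \(E(\mathbf{w})\).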
\(α\) is the precision of the prior distribution over \(w\) (the inverse of its variance)
\(β\) is the precision of the Gaussian conditional distribution \(p(t|x,w,β)\)
Regularization parameter given by \(λ = α/β\)
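This identification follows from writing out the negative log-posterior with a Gaussian prior \(p(\mathbf{w}|\alpha)=\mathcal{N}(\mathbf{w}|\mathbf{0},\alpha^{-1}\mathbf{I})\):

$$
-\ln p(\mathbf{w}\mid\mathbf{x},\mathbf{t},\alpha,\beta)=\frac{\beta}{2}\sum_{n=1}^{N}\{y(x_n,\mathbf{w})-t_n\}^{2}+\frac{\alpha}{2}\mathbf{w}^{\mathrm T}\mathbf{w}+\text{const},
$$

and dividing through by \(\beta\) gives \(\widetilde{E}(\mathbf{w})\) with \(\lambda=\alpha/\beta\).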
Not all the intuitions developed in spaces of low dimensionality will generalize to spaces of many dimensions
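One standard illustration: in \(D\) dimensions the fraction of a unit sphere's volume lying within \(\epsilon\) of its surface is \(1-(1-\epsilon)^{D}\), which tends to 1 as \(D\) grows. A quick check (the \(\epsilon\) value is arbitrary):

```python
# Fraction of a unit sphere's volume within distance eps of its surface:
# 1 - (1 - eps)^D, which approaches 1 as the dimension D grows.
eps = 0.01
for D in (1, 10, 100, 1000):
    print(D, 1 - (1 - eps) ** D)
# D=1000 gives ~0.99996: almost all the volume sits in the thin shell.
```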
Minimizing the misclassification rate: assign each \(x\) to the class with the largest posterior probability \(p(C_k | x)\)
Minimizing the expected loss (where \(L\) denotes the loss matrix \(L_{kj}\))
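A minimal sketch (the loss matrix and posteriors are made up): choose the class \(j\) that minimizes \(\sum_k L_{kj}\, p(C_k\mid x)\):

```python
import numpy as np

# Made-up loss matrix L[k, j]: cost of deciding class j when the truth is k.
L = np.array([[0.0, 1.0],
              [100.0, 0.0]])       # missing class 1 is very costly

posterior = np.array([0.7, 0.3])   # made-up p(C_k | x)

# Expected loss of each decision j: sum_k L[k, j] * p(C_k | x).
expected_loss = posterior @ L
decision = int(np.argmin(expected_loss))
print(expected_loss, decision)  # picks class 1 despite p(C_0|x) > p(C_1|x)
```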
Loss functions for regression
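For squared loss, the expected loss and its minimizer are

$$
\mathbb{E}[L]=\iint\{y(x)-t\}^{2}\,p(x,t)\,\mathrm{d}x\,\mathrm{d}t,\qquad y^{*}(x)=\mathbb{E}_t[t\mid x],
$$

i.e. the optimal prediction under squared loss is the conditional mean of \(t\) given \(x\).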
Relative entropy and mutual information
Lagrange multipliers
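To maximize \(f(\mathbf{x})\) subject to the constraint \(g(\mathbf{x})=0\), find stationary points of the Lagrangian

$$
L(\mathbf{x},\lambda)=f(\mathbf{x})+\lambda g(\mathbf{x}),\qquad \nabla_{\mathbf{x}}L=0,\quad \frac{\partial L}{\partial\lambda}=g(\mathbf{x})=0.
$$

For example, maximizing \(f(x,y)=1-x^{2}-y^{2}\) subject to \(x+y-1=0\) gives \(x=y=\tfrac{1}{2}\).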
\(KL(p \| q) = - \int p(x)\ln q(x)dx - \left(- \int p(x)\ln p(x) dx\right) = - \int p(x) \ln \left\{ \frac{q(x)}{p(x)}\right\}dx\)
\(KL(q \| p) = - \int q(x)\ln p(x)dx - \left(- \int q(x)\ln q(x) dx\right) = - \int q(x) \ln \left\{ \frac{p(x)}{q(x)}\right\}dx\)
\(KL(p\|q) \neq KL(q\|p)\)
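The asymmetry is easy to check numerically on two made-up discrete distributions:

```python
import numpy as np

# Two made-up discrete distributions over three outcomes.
p = np.array([0.6, 0.3, 0.1])
q = np.array([0.2, 0.5, 0.3])

def kl(a, b):
    """Discrete KL divergence KL(a || b) = sum_x a(x) ln(a(x)/b(x))."""
    return float(np.sum(a * np.log(a / b)))

print(kl(p, q), kl(q, p))  # different values: KL(p||q) != KL(q||p)
```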
| | Discrete | Continuous |
|---|---|---|
| Observation (likelihood) \(P(D\mid W)\) | binomial / multinomial | Gaussian |
| Parameter | \(p,\ (1-p)\) / \(\{p_i\}_{i=1}^{M}\) | \(\mu,\ \sigma^{2}\) |
| Prior \(P(W)\) | beta / Dirichlet | Gaussian / gamma |
| Posterior \(P(W\mid D)\) | beta / Dirichlet | Gaussian / gamma |
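The first column of the table can be checked directly: with a \(\mathrm{Beta}(a, b)\) prior on the success probability and a binomial observation of \(m\) successes in \(N\) trials, the posterior is again a beta distribution (a sketch with made-up counts):

```python
# Conjugacy of the beta prior with a binomial likelihood:
# prior Beta(a, b) + data (m successes, N - m failures)
# -> posterior Beta(a + m, b + N - m).
a, b = 2.0, 2.0       # made-up prior hyperparameters
N, m = 10, 7          # made-up data: 7 successes in 10 trials

a_post, b_post = a + m, b + (N - m)
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, posterior_mean)  # 9.0 5.0 0.642...
```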
2.3.1 Conditional Gaussian distributions
2.3.2 Marginal Gaussian distributions
2.3.4 Maximum likelihood for the Gaussian
2.3.5 Sequential estimation
2.3.6 Bayesian inference for the Gaussian
2.3.7 Student's t-distribution
2.3.9 Mixtures of Gaussians
2.4 The Exponential Family
The probability distributions studied in this chapter (with the exception of the Gaussian mixture) are all specific examples of a broad class of distributions called the exponential family.
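Its general form, with natural parameters \(\boldsymbol\eta\), is

$$
p(\mathbf{x}\mid\boldsymbol\eta)=h(\mathbf{x})\,g(\boldsymbol\eta)\exp\{\boldsymbol\eta^{\mathrm T}\mathbf{u}(\mathbf{x})\},
$$

where \(g(\boldsymbol\eta)\) is the normalization coefficient.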
2.4.2 Conjugate priors
2.4.3 Noninformative priors
2.5 Nonparametric Methods
2.5.1 Kernel density estimators
2.5.2 Nearest-neighbour methods
3.1 Linear basis function models
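A minimal sketch of a linear basis function model (Gaussian basis functions; the data, centres, and width are made up), fitted by ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 30
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=N)

# Gaussian basis functions phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), plus a bias.
mus = np.linspace(0, 1, 9)
s = 0.1
Phi = np.exp(-(x[:, None] - mus[None, :]) ** 2 / (2 * s ** 2))
Phi = np.hstack([np.ones((N, 1)), Phi])   # bias column phi_0(x) = 1

# Maximum likelihood weights via the normal equations: w = (Phi^T Phi)^{-1} Phi^T t.
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
print(w.shape)  # (10,)
```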
Feel free to update the course content.
I suggest reading the Chinese edition of the textbook when preparing for the midterm.
The notes for this course are just copied and pasted from the textbook,
so I won't update them any more, but maybe in the future :P
(They only cover three chapters, chapter 1 through chapter 3.)