NYCU-2022-Fall
Temporary Grading Policy: Midterm Exam (28%), Final Exam (37%), Homework (35%)
From chapters 1–3
Root-mean-square error \(E_{RMS}=\sqrt{2E(\mathbf{w}^{*})/N}\)
Regularization
It refers to the act of modifying a learning algorithm to favor “simpler” prediction rules to avoid overfitting. Most commonly, regularization refers to modifying the loss function to penalize certain values of the weights you are learning.
Minimize error: \(E(\mathbf{w})=\frac{1}{2}\sum_{n=1}^{N}\{y(x_n,\mathbf{w})-t_n\}^{2}\)
Model selection: choosing \(M\)
Regularization: \(\widetilde{E}(\mathbf{w})=\frac{1}{2}\sum_{n=1}^{N}\{ y(x_n,\mathbf{w})-t_n \}^{2}+\frac{\lambda}{2}\|\mathbf{w}\|^{2}\)
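As a concrete illustration, here is a minimal NumPy sketch of the curve-fitting example (the toy data, degree \(M\), and \(\lambda\) are made up for illustration); it fits a polynomial by regularized least squares and reports \(E(\mathbf{w})\) and \(E_{RMS}\):

```python
import numpy as np

# Toy data: noisy samples of sin(2*pi*x), as in the textbook's curve-fitting example.
rng = np.random.default_rng(0)
N = 10
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

M = 9           # polynomial degree (model selection: choosing M)
lam = 1e-3      # regularization parameter lambda (made-up value)

# Design matrix with columns 1, x, x^2, ..., x^M.
Phi = np.vander(x, M + 1, increasing=True)

# Regularized least squares: w* = (lambda*I + Phi^T Phi)^{-1} Phi^T t.
w = np.linalg.solve(lam * np.eye(M + 1) + Phi.T @ Phi, Phi.T @ t)

# Unregularized error E(w) and root-mean-square error E_RMS.
E = 0.5 * np.sum((Phi @ w - t) ** 2)
E_rms = np.sqrt(2 * E / N)
print(E, E_rms)
```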
In the Bayesian framework, the regularized error function above arises from maximizing the posterior distribution over \(\mathbf{w}\); maximum likelihood alone yields the unregularized sum-of-squares error.
Sum rule: \(p(X)=\sum_Y p(X,Y)\)
Product rule: \(p(X,Y)=p(X|Y)p(Y)=p(Y|X)p(X)\)
Expectation: \(\mathbb{E}[f]=\sum_x p(x)f(x)\)
Conditional expectation: \(\mathbb{E}_x[f|y] = \sum_x p(x|y)f(x)\)
Covariance of two random variables \(x\) and \(y\): \(\mathrm{cov}[x,y]=\mathbb{E}_{x,y}[xy]-\mathbb{E}[x]\mathbb{E}[y]\)
Variance: \(\mathrm{var}[f]=\mathbb{E}[(f(x)-\mathbb{E}[f(x)])^2]\)
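A quick numeric sanity check of the rules above on a made-up 2×2 joint distribution:

```python
import numpy as np

# A made-up joint distribution p(X, Y) over x in {0, 1}, y in {0, 1}.
p_xy = np.array([[0.3, 0.1],
                 [0.2, 0.4]])   # rows index x, columns index y

# Sum rule: p(X) = sum_Y p(X, Y).
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# Product rule: p(X, Y) = p(Y | X) p(X).
p_y_given_x = p_xy / p_x[:, None]
assert np.allclose(p_y_given_x * p_x[:, None], p_xy)

# Expectation, variance, and covariance with f the identity.
x_vals = np.array([0.0, 1.0])
y_vals = np.array([0.0, 1.0])
E_x = np.sum(p_x * x_vals)
E_y = np.sum(p_y * y_vals)
var_x = np.sum(p_x * (x_vals - E_x) ** 2)
E_xy = np.sum(p_xy * np.outer(x_vals, y_vals))
cov_xy = E_xy - E_x * E_y   # cov[x, y] = E[xy] - E[x]E[y]
print(E_x, var_x, cov_xy)
```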
Bayes: \(P(W|D)=\frac{P(D|W)P(W)}{P(D)}\)
Posterior probability: the conditional probability of the parameters obtained after the relevant evidence or data has been taken into account. \(posterior = \frac{likelihood \times prior}{normalization}\)
\(P(D)\): evidence
\(P(W|D)\): posterior
\(P(D|W)\): likelihood function
It expresses how probable the observed data set is for different settings of the parameter vector \(w\).
\(P(W)\): prior
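A minimal numeric check of Bayes' theorem, using a made-up two-hypothesis example:

```python
# Made-up prior and likelihood for two hypotheses W in {w0, w1}.
prior = {"w0": 0.8, "w1": 0.2}          # P(W)
likelihood = {"w0": 0.1, "w1": 0.9}     # P(D | W)

# Evidence P(D) by the sum rule: sum_W P(D | W) P(W).
evidence = sum(likelihood[w] * prior[w] for w in prior)

# Posterior P(W | D) = P(D | W) P(W) / P(D).
posterior = {w: likelihood[w] * prior[w] / evidence for w in prior}
print(posterior)   # {'w0': 0.307..., 'w1': 0.692...}
```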
A widely used estimator among frequentists is maximum likelihood, in which \(w\) is set to the value that maximizes the likelihood function \(p(D|w)\). This corresponds to choosing the value of \(w\) for which the probability of the observed data set is maximized.
In the machine learning literature, the negative logarithm of the likelihood function is called an error function. Because the negative logarithm is a monotonically decreasing function, maximizing the likelihood is equivalent to minimizing the error.
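Concretely, for the Gaussian noise model \(p(t_n|x_n,\mathbf{w},\beta)=\mathcal{N}(t_n|y(x_n,\mathbf{w}),\beta^{-1})\) the negative log-likelihood is

$$
-\ln p(\mathbf{t}\mid\mathbf{x},\mathbf{w},\beta)=\frac{\beta}{2}\sum_{n=1}^{N}\{y(x_n,\mathbf{w})-t_n\}^{2}-\frac{N}{2}\ln\beta+\frac{N}{2}\ln(2\pi),
$$

so maximizing the likelihood with respect to \(\mathbf{w}\) is equivalent to minimizing the sum-of-squares error \(E(\mathbf{w})\).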
\(α\) is the precision of the prior distribution over \(w\) (the inverse of its variance)
\(β\) is the precision of the Gaussian conditional distribution \(p(t|x,w,β)\)
Regularization parameter given by \(λ = α/β\)
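This identification follows from writing out the negative log-posterior with a Gaussian prior \(p(\mathbf{w}|\alpha)=\mathcal{N}(\mathbf{w}|\mathbf{0},\alpha^{-1}\mathbf{I})\):

$$
-\ln p(\mathbf{w}\mid\mathbf{x},\mathbf{t},\alpha,\beta)=\frac{\beta}{2}\sum_{n=1}^{N}\{y(x_n,\mathbf{w})-t_n\}^{2}+\frac{\alpha}{2}\mathbf{w}^{\mathrm T}\mathbf{w}+\text{const},
$$

and dividing through by \(\beta\) gives \(\widetilde{E}(\mathbf{w})\) with \(\lambda=\alpha/\beta\).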
Not all the intuitions developed in spaces of low dimensionality will generalize to spaces of many dimensions
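One standard illustration: in \(D\) dimensions the fraction of a unit sphere's volume lying within \(\epsilon\) of its surface is \(1-(1-\epsilon)^{D}\), which tends to 1 as \(D\) grows. A quick check (the \(\epsilon\) value is arbitrary):

```python
# Fraction of a unit sphere's volume within distance eps of its surface:
# 1 - (1 - eps)^D, which approaches 1 as the dimension D grows.
eps = 0.01
for D in (1, 10, 100, 1000):
    print(D, 1 - (1 - eps) ** D)
# D=1000 gives ~0.99996: almost all the volume sits in the thin shell.
```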
Minimizing the misclassification rate: assign each \(x\) to the class with the largest posterior probability \(p(C_k | x)\)
Minimizing the expected loss (where \(L\) denotes the loss matrix \(L_{kj}\))
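A minimal sketch (the loss matrix and posteriors are made up): choose the class \(j\) that minimizes \(\sum_k L_{kj}\, p(C_k\mid x)\):

```python
import numpy as np

# Made-up loss matrix L[k, j]: cost of deciding class j when the truth is k.
L = np.array([[0.0, 1.0],
              [100.0, 0.0]])       # missing class 1 is very costly

posterior = np.array([0.7, 0.3])   # made-up p(C_k | x)

# Expected loss of each decision j: sum_k L[k, j] * p(C_k | x).
expected_loss = posterior @ L
decision = int(np.argmin(expected_loss))
print(expected_loss, decision)  # picks class 1 despite p(C_0|x) > p(C_1|x)
```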
Loss functions for regression
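For squared loss, the expected loss and its minimizer are

$$
\mathbb{E}[L]=\iint\{y(x)-t\}^{2}\,p(x,t)\,\mathrm{d}x\,\mathrm{d}t,\qquad y^{*}(x)=\mathbb{E}_t[t\mid x],
$$

i.e. the optimal prediction under squared loss is the conditional mean of \(t\) given \(x\).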
Relative entropy and mutual information
Lagrange multipliers
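To maximize \(f(\mathbf{x})\) subject to the constraint \(g(\mathbf{x})=0\), find stationary points of the Lagrangian

$$
L(\mathbf{x},\lambda)=f(\mathbf{x})+\lambda g(\mathbf{x}),\qquad \nabla_{\mathbf{x}}L=0,\quad \frac{\partial L}{\partial\lambda}=g(\mathbf{x})=0.
$$

For example, maximizing \(f(x,y)=1-x^{2}-y^{2}\) subject to \(x+y-1=0\) gives \(x=y=\tfrac{1}{2}\).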
\(KL(p \| q) = - \int p(x)\ln q(x)dx - \left(- \int p(x)\ln p(x) dx\right) = - \int p(x) \ln \left\{ \frac{q(x)}{p(x)}\right\}dx\)
\(KL(q \| p) = - \int q(x)\ln p(x)dx - \left(- \int q(x)\ln q(x) dx\right) = - \int q(x) \ln \left\{ \frac{p(x)}{q(x)}\right\}dx\)
\(KL(p\|q) \neq KL(q\|p)\)
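The asymmetry is easy to check numerically on two made-up discrete distributions:

```python
import numpy as np

# Two made-up discrete distributions over three outcomes.
p = np.array([0.6, 0.3, 0.1])
q = np.array([0.2, 0.5, 0.3])

def kl(a, b):
    """Discrete KL divergence KL(a || b) = sum_x a(x) ln(a(x)/b(x))."""
    return float(np.sum(a * np.log(a / b)))

print(kl(p, q), kl(q, p))  # different values: KL(p||q) != KL(q||p)
```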
| | Discrete | Continuous |
|---|---|---|
| Observation (likelihood) \(P(D\mid W)\) | binomial / multinomial | Gaussian |
| Parameter | \(p,\ (1-p)\) / \(\{p_i\}_{i=1}^{M}\) | \(\mu,\ \sigma^{2}\) |
| Prior \(P(W)\) | beta / Dirichlet | Gaussian / gamma |
| Posterior \(P(W\mid D)\) | beta / Dirichlet | Gaussian / gamma |
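The first column of the table can be checked directly: with a \(\mathrm{Beta}(a, b)\) prior on the success probability and a binomial observation of \(m\) successes in \(N\) trials, the posterior is again a beta distribution (a sketch with made-up counts):

```python
# Conjugacy of the beta prior with a binomial likelihood:
# prior Beta(a, b) + data (m successes, N - m failures)
# -> posterior Beta(a + m, b + N - m).
a, b = 2.0, 2.0       # made-up prior hyperparameters
N, m = 10, 7          # made-up data: 7 successes in 10 trials

a_post, b_post = a + m, b + (N - m)
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, posterior_mean)  # 9.0 5.0 0.642...
```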
2.3.1 Conditional Gaussian distributions
2.3.2 Marginal Gaussian distributions
2.3.4 Maximum likelihood for the Gaussian
2.3.5 Sequential estimation
2.3.6 Bayesian inference for the Gaussian
2.3.7 Student's t-distribution
2.3.9 Mixtures of Gaussians
2.4 The Exponential Family
The probability distributions studied in this chapter (with the exception of the Gaussian mixture) are all specific examples of a broad class of distributions called the exponential family.
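Its general form, with natural parameters \(\boldsymbol\eta\), is

$$
p(\mathbf{x}\mid\boldsymbol\eta)=h(\mathbf{x})\,g(\boldsymbol\eta)\exp\{\boldsymbol\eta^{\mathrm T}\mathbf{u}(\mathbf{x})\},
$$

where \(g(\boldsymbol\eta)\) is the normalization coefficient.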
2.4.2 Conjugate priors
2.4.3 Noninformative priors
2.5 Nonparametric Methods
2.5.1 Kernel density estimators
2.5.2 Nearest-neighbour methods
3.1 Linear basis function models
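A minimal sketch of a linear basis function model (Gaussian basis functions; the data, centres, and width are made up), fitted by ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 30
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=N)

# Gaussian basis functions phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), plus a bias.
mus = np.linspace(0, 1, 9)
s = 0.1
Phi = np.exp(-(x[:, None] - mus[None, :]) ** 2 / (2 * s ** 2))
Phi = np.hstack([np.ones((N, 1)), Phi])   # bias column phi_0(x) = 1

# Maximum likelihood weights via the normal equations: w = (Phi^T Phi)^{-1} Phi^T t.
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
print(w.shape)  # (10,)
```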
Feel free to update the course content.
I suggest reading the Chinese edition of the textbook when preparing for the midterm.
The notes for this course are just copied and pasted from the textbook,
so I won't update them any more, but maybe in the future :P
(They only cover three chapters, chapter 1 through chapter 3.)