Machine Learning - 簡仁宗 (2022 Fall)

tags: NYCU-2022-Fall

Class info.

Course information

Temporary Grading Policy: Midterm Exam (28%), Final Exam (37%), Homework (35%)

Homework-01 (2022-Fall-ML)

Date

9/16

Chapter 1 ppt

Chapter 3 ppt

From chapter 3

Root-mean-square error: \(E_{RMS}=\sqrt{2E(\mathbf{w}^{*})/N}\)

  • Regularization
    It refers to the act of modifying a learning algorithm to favor “simpler” prediction rules to avoid overfitting. Most commonly, regularization refers to modifying the loss function to penalize certain values of the weights you are learning.

  • Minimize error: \(E(w)=\frac{1}{2}\sum^{N}_{n=1}\{y(x_n,w)-t_n\}^{2}\)

  • Model selection: choosing \(M\)

  • Regularization: \(\widetilde{E}(w)=\frac{1}{2}\sum^{N}_{n=1}\{ y(x_n,w)-t_n \}^{2}+\frac{\lambda}{2}\|w\|^{2}\)

These results can be expressed in a Bayesian framework: the sum-of-squares error corresponds to maximum likelihood under a Gaussian noise model, while the regularized error corresponds to maximizing the posterior (MAP).
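
As a concrete illustration, here is a minimal NumPy sketch of minimizing the regularized error for the polynomial curve-fitting model via the closed-form solution \(w^{*} = (\lambda I + \Phi^{T}\Phi)^{-1}\Phi^{T}\mathbf{t}\); the degree \(M\), the value of \(\lambda\), and the toy data are assumptions made only for this example.

```python
import numpy as np

# Toy data: noisy samples of sin(2*pi*x), as in the curve-fitting example
rng = np.random.default_rng(0)
N = 10
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

M = 9               # polynomial degree (model selection: choosing M)
lam = np.exp(-18)   # regularization parameter lambda (assumed value)

# Design matrix Phi with columns x^0, x^1, ..., x^M
Phi = np.vander(x, M + 1, increasing=True)

# Minimizer of E~(w) = 1/2 * sum_n {y(x_n, w) - t_n}^2 + lambda/2 * ||w||^2
w_star = np.linalg.solve(lam * np.eye(M + 1) + Phi.T @ Phi, Phi.T @ t)

# Root-mean-square error E_RMS = sqrt(2 E(w*) / N) on the training data
E = 0.5 * np.sum((Phi @ w_star - t) ** 2)
print("E_RMS =", np.sqrt(2 * E / N))
```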

Probability theory (discrete random variables)

  • Sum rule: \(p(X)=\sum_Y p(X,Y)\)

  • Product rule: \(p(X,Y)=p(X|Y)p(Y)=p(Y|X)p(X)\)

  • Expectation: \(\mathop{\mathbb{E}}[f]=\sum_x p(x)f(x)\)

  • Conditional expectation: \(\mathop{\mathbb{E}}_x[f|y] = \sum_x p(x|y)f(x)\)

  • Covariance of \(x\) and \(y\): \(\mathrm{cov}[x,y]=\mathop{\mathbb{E}_{x,y}}[xy]-\mathop{\mathbb{E}}[x]\mathop{\mathbb{E}}[y]\)

  • Variance: \(\rm{var}[f]=\mathop{\mathbb{E}}[(f(x)-\mathop{\mathbb{E}}[f(x)])^2]\)

  • Bayes: \(P(W|D)=\frac{P(D|W)P(W)}{P(D)}\)

  • Posterior probability: the conditional probability obtained once the relevant evidence or data have been taken into account. \(\mathrm{posterior} = \frac{\mathrm{likelihood} \times \mathrm{prior}}{\mathrm{normalization}}\)

\(P(D)\): evidence

\(P(W|D)\): posterior

\(P(D|W)\): likelihood function
It expresses how probable the observed data set is for different settings of the parameter vector \(w\).

\(P(W)\): prior
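
As a small numeric illustration of the product rule, sum rule, and Bayes' theorem together, the sketch below evaluates a posterior over a discrete grid of parameter values; the grid, the uniform prior, and the observed counts are all made up for this example.

```python
import numpy as np

# Hypothetical discrete parameter W: candidate values of a coin's head probability
W = np.array([0.2, 0.5, 0.8])
prior = np.array([1 / 3, 1 / 3, 1 / 3])          # P(W), assumed uniform

# Observed data D: 7 heads out of 10 tosses (assumed counts)
heads, N = 7, 10
likelihood = W**heads * (1 - W) ** (N - heads)   # P(D|W), up to a constant factor

evidence = np.sum(likelihood * prior)            # P(D) from the sum rule
posterior = likelihood * prior / evidence        # P(W|D) from Bayes' theorem

print(posterior, posterior.sum())                # normalized posterior, sums to 1
```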

Probability densities (continuous random variables)

  • cumulative distribution function: \(P(z)=p(x \in (-\infty, z))\)

A widely used frequentist estimator is maximum likelihood, in which \(w\) is set to the value that maximizes the likelihood function \(p(D \mid w)\). This corresponds to choosing the value of \(w\) for which the probability of the observed data set is largest.

In the machine learning literature, the negative logarithm of the likelihood function is called an error function. Because the negative logarithm is a monotonically decreasing function, maximizing the likelihood is equivalent to minimizing the error.
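
For example, under the usual Gaussian noise model \(p(t_n \mid x_n, w, \beta) = \mathcal{N}(t_n \mid y(x_n, w), \beta^{-1})\), the negative log-likelihood is the sum-of-squares error up to constants:

\[ -\ln p(\mathbf{t} \mid \mathbf{x}, w, \beta) = \frac{\beta}{2}\sum_{n=1}^{N}\{y(x_n, w) - t_n\}^{2} - \frac{N}{2}\ln\beta + \frac{N}{2}\ln(2\pi), \]

so maximizing the likelihood with respect to \(w\) is the same as minimizing \(E(w)\).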

9/23

\(α\) is the precision of the Gaussian prior over \(w\) (the reciprocal of its variance)

\(β\) is the precision of the Gaussian conditional distribution \(p(t \mid x, w, β)\) (the inverse noise variance)

Regularization parameter given by \(λ = α/β\)
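
To see where this comes from, with a zero-mean Gaussian prior \(p(w \mid α) = \mathcal{N}(w \mid 0, α^{-1}I)\), maximizing the posterior \(p(w \mid \mathbf{x}, \mathbf{t}, α, β) \propto p(\mathbf{t} \mid \mathbf{x}, w, β)\,p(w \mid α)\) is equivalent to minimizing

\[ \frac{β}{2}\sum_{n=1}^{N}\{y(x_n, w) - t_n\}^{2} + \frac{α}{2}\|w\|^{2}, \]

which, after dividing through by \(β\), is the regularized error \(\widetilde{E}(w)\) with \(λ = α/β\).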

Not all the intuitions developed in spaces of low dimensionality will generalize to spaces of many dimensions

  • Minimizing the misclassification rate: assign each \(x\) to the class with the largest posterior probability \(p(C_k | x)\)


  • Minimizing the expected loss (\(L\) denotes the loss function; see the expressions after this list)

  • Loss functions for regression


  • Relative entropy and mutual information

  • Lagrange multipliers
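
For reference, the standard expected-loss expressions these bullets refer to are, for classification with loss matrix \(L_{kj}\) and for regression with squared loss,

\[ \mathbb{E}[L] = \sum_{k}\sum_{j}\int_{\mathcal{R}_j} L_{kj}\, p(\mathbf{x}, C_k)\, \mathrm{d}\mathbf{x}, \qquad \mathbb{E}[L] = \iint \{y(\mathbf{x}) - t\}^{2}\, p(\mathbf{x}, t)\, \mathrm{d}\mathbf{x}\, \mathrm{d}t, \]

and for the squared loss the optimal prediction is the conditional mean \(y(\mathbf{x}) = \mathbb{E}_t[t \mid \mathbf{x}]\).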

9/30

\(KL(p \| q) = - \int p(x)\ln q(x)\,dx - \left(- \int p(x)\ln p(x)\, dx\right) = - \int p(x) \ln \left\{ \frac{q(x)}{p(x)}\right\}dx\)

\(KL(q \| p) = - \int q(x)\ln p(x)\,dx - \left(- \int q(x)\ln q(x)\, dx\right) = - \int q(x) \ln \left\{ \frac{p(x)}{q(x)}\right\}dx\)

\(KL(p\|q) \neq KL(q\|p)\)
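
A quick numeric check of this asymmetry on two discrete distributions (the probability vectors below are arbitrary illustrative values):

```python
import numpy as np

def kl(p, q):
    """KL(p || q) = sum_x p(x) * ln(p(x) / q(x)) for discrete distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(p * np.log(p / q))

p = [0.1, 0.4, 0.5]
q = [0.8, 0.1, 0.1]

print(kl(p, q), kl(q, p))  # the two directions generally give different values
```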

  • mutual information
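
The definition behind this bullet: the mutual information is the KL divergence between the joint distribution and the product of the marginals, and it can equivalently be written in terms of entropies,

\[ I[\mathbf{x}, \mathbf{y}] = \mathrm{KL}\big(p(\mathbf{x}, \mathbf{y}) \,\|\, p(\mathbf{x})p(\mathbf{y})\big) = -\iint p(\mathbf{x}, \mathbf{y}) \ln \left\{ \frac{p(\mathbf{x})\,p(\mathbf{y})}{p(\mathbf{x}, \mathbf{y})} \right\} \mathrm{d}\mathbf{x}\, \mathrm{d}\mathbf{y} = H[\mathbf{x}] - H[\mathbf{x} \mid \mathbf{y}]. \]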

|  | Discrete | Continuous |
| --- | --- | --- |
| Observation (likelihood) \(P(D \mid W)\) | Binomial / Multinomial | Gaussian |
| Parameters | \(p,\ (1-p)\) / \(\{P_i\}^{M}_{i=1}\) | \(\mu,\ \sigma^{2}\) |
| Prior \(P(W)\) | beta / Dirichlet | Gaussian / gamma |
| Posterior \(P(W \mid D)\) | beta / Dirichlet | Gaussian / gamma |

  • Binomial distribution
    \[ \mathrm{Bin}(m \mid N, \mu) = \binom{N}{m} \mu^{m}(1-\mu)^{N-m} \]
  • Gamma
    \[ \Gamma(x) \equiv \int_0^{\infty} u^{x-1} e^{-u} \mathrm{~d} u \]
  • Beta
    \[ \mathrm{Beta}(\mu \mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, \mu^{a-1}(1-\mu)^{b-1} \]
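
A minimal sketch of the discrete column of the table above: a beta prior combined with a binomial likelihood yields a beta posterior, so the update simply adds the observed counts to the prior's parameters. The prior hyperparameters and counts below are illustrative assumptions.

```python
# Beta-binomial conjugacy: prior Beta(a, b) and likelihood Bin(m | N, mu)
# give the posterior Beta(a + m, b + N - m).
a, b = 2.0, 2.0     # assumed prior hyperparameters
m, N = 7, 10        # observed: m successes in N trials (assumed counts)

a_post, b_post = a + m, b + (N - m)
posterior_mean = a_post / (a_post + b_post)   # E[mu | D]
print(a_post, b_post, posterior_mean)
```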


10/7


2.3.1 Conditional Gaussian distributions

  • An important property of the multivariate Gaussian distribution is that if two sets of variables are jointly Gaussian, then the conditional distribution of one set, conditioned on the other, is again Gaussian. Similarly, the marginal distribution of either set is also Gaussian.
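
Concretely, partitioning \(\mathbf{x}\) into \((\mathbf{x}_a, \mathbf{x}_b)\) with the mean \(\boldsymbol{\mu}\) and covariance \(\Sigma\) partitioned into corresponding blocks, the conditional distribution is Gaussian with

\[ p(\mathbf{x}_a \mid \mathbf{x}_b) = \mathcal{N}(\mathbf{x}_a \mid \boldsymbol{\mu}_{a|b}, \Sigma_{a|b}), \qquad \boldsymbol{\mu}_{a|b} = \boldsymbol{\mu}_a + \Sigma_{ab}\Sigma_{bb}^{-1}(\mathbf{x}_b - \boldsymbol{\mu}_b), \qquad \Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}. \]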

2.3.2 Marginal Gaussian distributions

  • Marginal probability distribution:
    \[ p(\mathbf{x}_a) = \mathcal{N}(\mathbf{x}_a \mid \boldsymbol{\mu}_a, \Sigma_{aa}) \]

2.3.4 Maximum likelihood for the Gaussian

2.3.5 Sequential estimation

  • Robbins-Monro
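
A minimal sketch of sequential estimation for the mean of a Gaussian: each incoming data point updates the running maximum-likelihood estimate without revisiting earlier points. The simulated data stream is an assumption for illustration; the Robbins-Monro procedure generalizes the \(1/N\) factor to a suitable step-size sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
stream = rng.normal(loc=3.0, scale=1.0, size=1000)  # simulated data arriving one at a time

mu = 0.0
for n, x_n in enumerate(stream, start=1):
    # Sequential ML update: mu_N = mu_{N-1} + (1/N) * (x_N - mu_{N-1})
    mu += (x_n - mu) / n

print(mu, stream.mean())  # the sequential and batch estimates coincide
```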

10/14

2.3.6 Bayesian inference for the Gaussian

2.3.7 Student's t-distribution

2.3.9 Mixtures of Gaussians

2.4 The exponential family

The probability distributions studied in this chapter (with the exception of the Gaussian mixture) are specific examples of a broad class of distributions called the exponential family.
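
The general form being referred to is

\[ p(\mathbf{x} \mid \boldsymbol{\eta}) = h(\mathbf{x})\, g(\boldsymbol{\eta})\, \exp\{\boldsymbol{\eta}^{\mathrm{T}} \mathbf{u}(\mathbf{x})\}, \]

where \(\boldsymbol{\eta}\) are the natural parameters and \(g(\boldsymbol{\eta})\) ensures normalization.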

2.4.2 Conjugate priors

2.4.3 Noninformative priors

  • Translation invariance
  • Scale invariance

2.5 Nonparametric methods

  • Histogram methods for density estimation:
    One useful property is that once the histogram has been computed, the data set itself can be discarded, which is an advantage when the data set is large. The histogram approach is also easily applied when data points arrive sequentially.
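
A minimal sketch of this histogram density estimate: partition the axis into bins of width \(\Delta\) and set \(p_i = n_i / (N\Delta)\). The data set and bin count below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=500)                     # assumed one-dimensional data set

counts, edges = np.histogram(data, bins=30)     # bin counts n_i
delta = edges[1] - edges[0]                     # bin width Delta
density = counts / (counts.sum() * delta)       # p_i = n_i / (N * Delta)

print(density.sum() * delta)                    # the estimate integrates to 1
```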

2.5.1 Kernel density estimators

2.5.2 Nearest-neighbour methods


3.1 Linear basis function models

Feel free to update the course content.
I suggest reading the Chinese edition of the textbook when preparing for the midterm.
The notes for this course are mostly copied and pasted from the textbook,
so I won't keep updating them, but maybe in the future :P

Cheat sheet for midterm

(it only covers three chapters, Chapter 1 through Chapter 3)

Reference

Original textbook (electronic copy)

Reading Group Slide

PRML中文版_模式识别与机器学习.pdf
