Noise & Error

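# Noise

The value given by the underlying $f$ may be corrupted by the time we observe it. Where we originally evaluated a deterministic outcome:

$$
[[f(\mathbf{x}) \ne h(\mathbf{x})]]
$$

we now record a probabilistic one:

$$
[[y \ne h(\mathbf{x})]],\quad y \sim P(y|\mathbf{x})
$$

Previously the VC analysis only required the sampled $\mathbf{x}$ to follow the same distribution. Now $y$ must follow the same distribution as well:

$$
(\mathbf{x},y)\sim P(\mathbf{x},y)
$$

Under this condition, the VC guarantee still holds.

## Target Distribution $P(y|\mathbf{x})$

For each $\mathbf{x}$, a target distribution governs the behavior of its mini-target. A label $y$ can therefore be viewed as the result of ideal mini-target + noise, where the noise is the effect of the target distribution. For example:

- ideal mini-target $f(\mathbf{x})=\circ$
- $P(\circ|\mathbf{x})=0.7,\ P(\times|\mathbf{x})=0.3$
- so the noise level is 0.3

It follows that the deterministic case is a special case of the target distribution:

- $P(y|\mathbf{x}) = 1 \text{ for } y = f(\mathbf{x})$
- $P(y|\mathbf{x}) = 0 \text{ for } y \ne f(\mathbf{x})$

So instead of finding the underlying rule $f$, the goal becomes finding the ideal mini-target under a given distribution $P(y|\mathbf{x})$.

# Error Measure

The error measure is an important part of the learning algorithm $\mathcal{A}$.

$$
\begin{aligned}
E_{out}(g) &= \underset{\mathbf{x}\sim P}{\mathcal{E}}\ err(g(\mathbf{x}),f(\mathbf{x}))\\
E_{in}(g) &= \frac{1}{N}\sum_{n=1}^{N} err(g(\mathbf{x}_n),f(\mathbf{x}_n))
\end{aligned}
$$

With noise, $f(\mathbf{x})$ is replaced by $y \sim P(y|\mathbf{x})$, and $f(\mathbf{x}_n)$ by the observed $y_n$.

$err$ is application/user-dependent.

## Classification error

Commonly also called the **"0/1 error"**.

# Weighted Classification

[See the AdaBoost part of Machine Learning Techniques](https://hackmd.io/@ShawnNTU-CS/SJvNkG16h#Weighted-Base-Algorithm)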
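
To tie the noise level and the 0/1 error from the earlier sections together, here is a minimal Python sketch. It is only an illustration and rests on several assumptions not in the notes: the labels $\circ$ / $\times$ are encoded as `+1` / `-1`, the ideal mini-target is a hypothetical 1-D sign rule, and the noise flips each label independently with probability equal to the noise level. The names `zero_one_err`, `e_in`, and `noisy_labels` are made up for this example.

```python
import random


def zero_one_err(prediction, label):
    """0/1 error: 1 if the prediction differs from the label, else 0."""
    return 1 if prediction != label else 0


def e_in(h, xs, ys, err=zero_one_err):
    """In-sample error: average pointwise error of h over the sample."""
    return sum(err(h(x), y) for x, y in zip(xs, ys)) / len(xs)


def noisy_labels(f, xs, noise_level, rng):
    """Draw y ~ P(y|x): keep f(x) with probability 1 - noise_level,
    otherwise flip it (assumed flipping-noise model)."""
    return [-f(x) if rng.random() < noise_level else f(x) for x in xs]


def f(x):
    """Hypothetical ideal mini-target: +1 (circle) if x >= 0, else -1 (cross)."""
    return 1 if x >= 0 else -1


rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.uniform(-1, 1) for _ in range(10000)]
ys = noisy_labels(f, xs, noise_level=0.3, rng=rng)

# Even the ideal mini-target disagrees with roughly 30% of the noisy labels.
print(e_in(f, xs, ys))
```

Under these assumptions, the in-sample 0/1 error of the ideal mini-target itself comes out close to the noise level of 0.3, which matches the intuition that noise sets a floor on how well any hypothesis can agree with the observed labels.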