Pattern Recognition
===
Textbook
===
[Pattern Recognition (downloadable directly via an NCTU IP)](https://www.sciencedirect.com/science/book/9781597492720)
## Chapter 1: Clustering
### Intro.
| | Clustering | PR |
|---|---|---|
| Problem to solve | Given $\{x_1, x_2,\dots,x_n\}$, partition the $n$ points into $k$ clusters | Given $k$ clusters $\{C_1, C_2,\dots,C_k\}$ and a new data point $x_i$, which cluster should $x_i$ be assigned to? |
| One possible solution | $C_1 = \{x_1, x_2\},\ C_2=\{x_3, x_5\},\ \dots,\ C_k=\{x_4,x_6,x_7,\dots,x_n\}$ | $x_i\in C_3$ |
| Solution | No single optimal solution; the only question is whether the clustering result is acceptable | There is a correct answer; we want the misclassification rate to be as low as possible (the theoretical optimum is given by the Bayes classifier) |
Given n data points --> Clustering --> k clusters
Given a new data point --> PR --> which cluster does it belong to?
### 1.1 [k-means](https://hackmd.io/s/BkyMzbSsG#Ch1-k-means)
### 1.2 [Hierarchical Method](https://hackmd.io/s/BkyMzbSsG#Ch2-Hierarchical-Method-%E9%9A%8E%E5%B1%A4%E5%BC%8F)
## Chapter 2: Classifiers Based on Bayes Decision Theory
### 2.1 Introduction
Given a classification task of $M$ classes, $\omega_1, \omega_2, \dots, \omega_M$, and a feature vector $\boldsymbol{x}$, we form the $M$ conditional probabilities $P(\omega_i|\boldsymbol{x}), i = 1, 2, \dots, M$.
:::success
Goal: $\max\limits_i P(\omega_i|\boldsymbol{x})$
:::
### 2.2 Bayes Decision Theory
Given the a priori probabilities $P(\omega_i)$ and the class-conditional probability density functions $p(\boldsymbol{x}|\omega_i)$:
#### Bayes rule
$$
P(\omega_i|\boldsymbol{x}) = \frac{p(\boldsymbol{x}|\omega_i)P(\omega_i)}{p(\boldsymbol{x})}
$$
where $p(\boldsymbol{x})$ is the pdf of $\boldsymbol{x}$
$$
p(\boldsymbol{x}) = \sum_{i = 1}^{M}p(\boldsymbol{x}|\omega_i)P(\omega_i)
$$
#### Bayes classification rule for two-class case
$$
\text{If }P(\omega_1|\boldsymbol{x})>P(\omega_2|\boldsymbol{x}),\ \ \boldsymbol{x}\text{ is classified to }\omega_1 \\
\text{If }P(\omega_2|\boldsymbol{x})>P(\omega_1|\boldsymbol{x}),\ \ \boldsymbol{x}\text{ is classified to }\omega_2 \\
\Bigg\Downarrow \text{ Bayes rule} \\
p(\boldsymbol{x}|\omega_1)P(\omega_1)\gtrless p(\boldsymbol{x}|\omega_2)P(\omega_2) \\
\Bigg\Downarrow \text{if }P(\omega_1)=P(\omega_2) \\
p(\boldsymbol{x}|\omega_1) \gtrless p(\boldsymbol{x}|\omega_2)
$$
#### Minimizing the Classification Error Probability
++**Thm**++ The Bayesian classifier is optimal with respect to minimizing the classification error probability.
++**pf**++
Let $R_i$ be the region of the feature space in which we decide in favor of $\omega_i$.
$$
P_e = P(\boldsymbol{x}\in R_2, \omega_1)+P(\boldsymbol{x} \in R_1, \omega_2)
$$
:::info
$\boldsymbol{x}\in R_2$ although it actually belongs to class $\omega_1$, or $\boldsymbol{x}\in R_1$ although it actually belongs to class $\omega_2$
:::
$$
\begin{align}
P_e &= P(\boldsymbol{x}\in R_2|\omega_1)P(\omega_1)+P(\boldsymbol{x}\in R_1|\omega_2)P(\omega_2)\qquad(\text{joint probability})\\
& =P(\omega_1)\int\limits_{R_2}p(\boldsymbol{x}|\omega_1)d\boldsymbol{x}+P(\omega_2)\int\limits_{R_1}p(\boldsymbol{x}|\omega_2)d\boldsymbol{x}
\end{align}
$$

:::danger
TODO: find a better figure.
:::
or using Bayes rule
$$
P_e = \int\limits_{R_2}P(\omega_1|\boldsymbol{x})p(\boldsymbol{x})d\boldsymbol{x}+\int\limits_{R_1}P(\omega_2|\boldsymbol{x})p(\boldsymbol{x})d\boldsymbol{x}
$$
Error is minimized if the partitioning regions $R_1$ and $R_2$ of the feature space are chosen $s.t.$
$$
R_1: P(\omega_1|\boldsymbol{x})>P(\omega_2|\boldsymbol{x}) \\
R_2: P(\omega_2|\boldsymbol{x}) > P(\omega_1|\boldsymbol{x})
$$
**Some Explanation**
Since the union of $R_1, R_2$ covers all the space, we have
$$
\int\limits_{R_1}P(\omega_1|\boldsymbol{x})p(\boldsymbol{x})d\boldsymbol{x}+\int\limits_{R_2}P(\omega_1|\boldsymbol{x})p(\boldsymbol{x})d\boldsymbol{x} = P(\omega_1)
$$
and thus
$$
P_e = P(\omega_1) - \int\limits_{R_1}(P(\omega_1|\boldsymbol{x})-P(\omega_2|\boldsymbol{x}))p(\boldsymbol{x})d\boldsymbol{x}
$$
which suggests the above partitioning choice.
**Generalization**
$\boldsymbol{x}$ is assigned to class $\omega_i$ if $P(\omega_i|\boldsymbol{x})>P(\omega_j|\boldsymbol{x}) \quad\forall j \ne i$
#### Minimizing the Average Risk
Let $R_j, j = 1,2,\dots,M$ be the regions of the feature space assigned to classes $\omega_j$ respectively.
Let the *loss* $\lambda_{ki}$ be the penalty term associated with the misclassification in which a feature vector $\boldsymbol{x}$ that belongs to class $\omega_k$ lies in $R_i$, $i \ne k$.
Let the *loss matrix* $L$ be the matrix whose entries are $l_{ki} = \lambda_{ki}$:
$$L=\begin{bmatrix}
\lambda_{11} & \lambda_{12} & \cdots & \lambda_{1M} \\
\lambda_{21} & \lambda_{22} & \cdots & \lambda_{2M} \\
\vdots & \vdots & \ddots & \vdots \\
\lambda_{M1} & \lambda_{M2} & \cdots & \lambda_{MM}
\end{bmatrix}$$
:::info
A feature vector $\boldsymbol{x}$ that belongs to class $\omega_k$ but falls in $R_i$ (i.e., $\boldsymbol{x}$ is classified to class $\omega_i$) incurs a penalty of $\lambda_{ki}$.
Typically $\lambda_{kk}=0,\ \forall k = 1,2,\dots,M$.
:::
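For instance, a hypothetical two-class loss matrix (not taken from the text) could be
$$
L = \begin{bmatrix} 0 & 1 \\ 5 & 0 \end{bmatrix}
$$
meaning that misclassifying an $\omega_2$ sample as $\omega_1$ ($\lambda_{21}=5$) is five times as costly as the opposite error ($\lambda_{12}=1$), while correct decisions incur no penalty.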
The *risk* or *loss* associated with $\omega_k$ is defined as
$$
r_k = \sum_{i=1}^{M} \lambda_{ki}\int\limits_{R_i}p(\boldsymbol{x}|\omega_k)d\boldsymbol{x}
$$
:::info
The integral is the probability that a feature vector from class $\omega_k$ is assigned to class $\omega_i$.
:::
**Goal**: choose the partitioning regions $R_j$ so that the *average risk* $r$ is minimized.
$$
\begin{align}
r &= \sum_{k=1}^{M}r_kP(\omega_k) \\
&= \sum_{i=1}^{M}\int\limits_{R_i} \biggl( \sum_{k=1}^{M}\lambda_{ki}p(\boldsymbol{x}|\omega_k)P(\omega_k) \biggr)d\boldsymbol{x}
\end{align}
$$
This is achieved if each of the integrals is minimized.
Select partitioning regions $s.t.$
$$
\boldsymbol{x} \in R_i\ \ \text{if}\ \ l_i \equiv \sum_{k=1}^{M}\lambda_{ki}p(\boldsymbol{x}|\omega_k)P(\omega_k)<l_j\equiv\sum_{k=1}^{M}\lambda_{kj}p(\boldsymbol{x}|\omega_k)P(\omega_k)\ \ \forall j \ne i
$$
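A minimal sketch of this minimum-risk rule, using hypothetical priors, 1-D Gaussian likelihoods, and the asymmetric loss matrix from the example above:
```python
# Minimal sketch: pick the class with the smallest conditional risk
# l_i = sum_k lambda_ki * p(x|w_k) * P(w_k). All numbers are hypothetical.
import numpy as np
from scipy.stats import norm

priors = np.array([0.5, 0.5])                   # P(w_k)
likelihoods = [norm(0.0, 1.0), norm(2.0, 1.0)]  # p(x|w_k)
Lambda = np.array([[0.0, 1.0],                  # Lambda[k, i] = lambda_ki
                   [5.0, 0.0]])                 # missing w_2 costs more

def min_risk_class(x):
    p = np.array([lk.pdf(x) for lk in likelihoods]) * priors  # p(x|w_k)P(w_k)
    risks = Lambda.T @ p    # risks[i] = sum_k lambda_ki * p(x|w_k) * P(w_k)
    return np.argmin(risks) + 1                 # 1-based class index

# x = 1.0 is equidistant from both means, but the high cost of missing
# w_2 pushes the decision toward w_2.
print(min_risk_class(1.0))   # -> 2
```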
**The two-class case**
:::danger
TODO: to be written.
:::
### 2.3 Decision Boundary
:::info
There is a little content here, but it is not very important; to be added later.
:::
### 2.4 Bayesian Classifier for Normal Distribution
:::danger
In $l$-dimensional space, the Gaussian pdf $N(\boldsymbol{\mu_i}, \Sigma_i)$ for the $i^{th}$ class is:
$$
p(\boldsymbol{x}|\omega_i) = \frac{1}{(\sqrt{2\pi})^l \sqrt {|\Sigma_i|}}\exp\left(-0.5 (\boldsymbol{x}-\boldsymbol{\mu_i})^T \Sigma_i ^{-1} (\boldsymbol{x}-\boldsymbol{\mu_i})\right)
$$
where
$$
\boldsymbol{\mu_i} = E[\boldsymbol{x}] \text{ is the }(l\times1) \text{ mean vector}\\
\Sigma_i = E[(\boldsymbol{x}-\boldsymbol{\mu_i})(\boldsymbol{x}-\boldsymbol{\mu_i})^T] \text{ is the } (l \times l) \text{ covariance matrix of class } i
$$
Memorize this (red) block carefully.
:::
:::info
Example with $l=2$:
$\Sigma = \begin{bmatrix}\sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2\end{bmatrix}$
:::
:::success
assign $\boldsymbol{x}$ to class $i$ $\iff$ $P(\omega_i)p(\boldsymbol{x}|\omega_i)$ is max
:::
And $P(\omega_i)p(\boldsymbol{x}|\omega_i)$ is max $\iff$ $\ln P(\omega_i) + \ln p(\boldsymbol{x}|\omega_i)$ is max
++**def**++ $g_i(\boldsymbol{x}) = -0.5 (\boldsymbol{x}-\boldsymbol{\mu_i})^T \Sigma_i ^{-1} (\boldsymbol{x}-\boldsymbol{\mu_i}) + \ln P(\omega_i) + c_i$ ,
$c_i = \bbox[yellow]{-\frac{l}{2} \ln 2\pi} - \frac{1}{2}\ln |\Sigma_i|$
The yellow-highlighted term is the same for every class, so it can be dropped from the comparison.
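A minimal sketch (with made-up 2-D parameters) of evaluating $g_i(\boldsymbol{x})$ for each class and assigning $\boldsymbol{x}$ to the largest; only the class-dependent terms are kept, as noted above:
```python
# Minimal sketch: Gaussian discriminant g_i(x) without the common
# -(l/2) ln(2*pi) term. Means, covariances, and priors are hypothetical.
import numpy as np

def g(x, mu, Sigma, prior):
    """g_i(x) = -0.5 (x-mu)^T Sigma^{-1} (x-mu) + ln P(w_i) - 0.5 ln|Sigma_i|"""
    d = x - mu
    return (-0.5 * d @ np.linalg.inv(Sigma) @ d
            + np.log(prior)
            - 0.5 * np.log(np.linalg.det(Sigma)))

mu1, S1 = np.array([0.0, 0.0]), np.eye(2)
mu2, S2 = np.array([3.0, 3.0]), np.array([[2.0, 0.3], [0.3, 1.0]])
x = np.array([1.0, 2.0])
scores = [g(x, mu1, S1, 0.5), g(x, mu2, S2, 0.5)]
print(np.argmax(scores) + 1)   # assign x to the class with the largest g_i(x)
```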
**Minimum Distance Classifiers**
If the classes are ++equi-probable++ and have the ++same covariance++ ( $P(\omega_i) = 1/M$ and $\Sigma_i = \Sigma,\ \forall i$ ),
then define the new $g_i(\boldsymbol{x}) = -0.5 (\boldsymbol{x}-\boldsymbol{\mu_i})^T \Sigma ^{-1} (\boldsymbol{x}-\boldsymbol{\mu_i})$
- If $\Sigma = \sigma^2 I$,
then $\max\limits_{i}g_i(\boldsymbol{x})$ amounts to assigning $\boldsymbol{x}$ to the $\omega_i$ whose **Euclidean distance** $d_E \equiv ||\boldsymbol{x}-\boldsymbol{\mu_i}||$ is minimum
- Otherwise (non-diagonal, or diagonal but $\sigma_{i}\ne\sigma_j$),
then $\max\limits_{i}g_i(\boldsymbol{x})$ amounts to assigning $\boldsymbol{x}$ to the $\omega_i$ whose **Mahalanobis distance** $d_m \equiv \sqrt{(\boldsymbol{x}-\boldsymbol{\mu_i})^T \Sigma ^{-1} (\boldsymbol{x}-\boldsymbol{\mu_i})}$ is minimum (see the sketch below)
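A minimal sketch of both cases under the stated assumptions (equi-probable classes with a shared, hypothetical covariance); with $\Sigma = \sigma^2 I$ the Mahalanobis distance reduces to a scaled Euclidean distance, so one function covers both rules:
```python
# Minimal sketch: minimum-(Mahalanobis-)distance classifier with a shared
# covariance. Means and covariance are hypothetical.
import numpy as np

means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]   # mu_i
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])             # shared covariance
Sigma_inv = np.linalg.inv(Sigma)

def mahalanobis(x, mu):
    d = x - mu
    return np.sqrt(d @ Sigma_inv @ d)

x = np.array([1.0, 2.0])
# Assign x to the class whose mean is nearest in Mahalanobis distance;
# with Sigma = sigma^2 * I this becomes the Euclidean-distance rule.
print(np.argmin([mahalanobis(x, mu) for mu in means]) + 1)
```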
### 2.5 Estimation of Unknown PDF
Not important.
### 2.6 Classifier using k nearest neighbors (kNN) Rule
From the $N$ stored points, find the $k$ points nearest to $\boldsymbol{x}$ and assign $\boldsymbol{x}$ to the class to which most of them belong.
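A minimal sketch of this rule on a hypothetical 2-D training set, using Euclidean distance and a simple majority vote:
```python
# Minimal sketch: kNN classification by majority vote among the k nearest
# training points. The training data below are hypothetical.
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]               # indices of the k nearest
    return Counter(y_train[nearest]).most_common(1)[0][0]

X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([1, 1, 1, 2, 2, 2])
print(knn_classify(np.array([0.5, 0.5]), X, y, k=3))  # -> 1
print(knn_classify(np.array([5.5, 5.0]), X, y, k=3))  # -> 2
```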
:::danger
Let $P_B$ be the classification error probability of the Bayesian classifier and $P_{NN}$ the classification error probability of the 1-NN rule.
As $N \to \infty$,
$$
P_B \le P_{NN} \le P_B(2-\frac{M}{M-1}P_B) \le 2P_B
$$
This will be on the exam.
:::
:::danger
For two classes,
$$
P_{Bayes} \le P_{kNN} \le P_{Bayes} + \sqrt{\frac{2P_{NN}}{k}}
$$
Therefore,
$$
\text{as } k \to \infty,\quad P_{kNN} \to P_{Bayes}
$$
This will be on the exam.
:::
If $P_{Bayes}$ is small, then
$P_{NN} \cong 2P_{Bayes}$
$P_{3NN} \cong P_{Bayes} + 3(P_{Bayes})^2$
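For example, plugging a hypothetical $P_{Bayes} = 0.05$ into these approximations gives
$$
P_{NN} \cong 2(0.05) = 0.10,\qquad P_{3NN} \cong 0.05 + 3(0.05)^2 = 0.0575
$$
so already at $k=3$ the error is much closer to the Bayes error.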
## Reference
1. S. Theodoridis and K. Koutroumbas, *Pattern Recognition*, 4th ed., Elsevier (Academic Press), 2009.