2.2 Vapnik-Chervonenkis Dimension

# 2.2 Vapnik-Chervonenkis Dimension 假設我們有一個含 $N$ 個點的 dataset，並且如果每個點都可以被 label positive / negative，總共有 $2^N$ 種可能的 label 方式，因此也可以說這 $N$ 個點可以 define $2^N$ 種 learning problems。如果說對這 $2^N$ 個 problems 的任何一個，我們都能找到一個 hypothesis $h \in H$，而這個 $h$ 可以區分 positive 和 negative examples，那我們就說 $H$ shatters $N$ points。 $\rightarrow$ 也就是說，任何可以被這 $N$ 個 data points 所定義出來的 learning problem，都可以從 $H$ 裡找到一個 hypothesis，使這個 problem learned with no error。 > 意思等同於說： > > 對任何 $N$ 個點可能產生的情形，我們都能找到一個 $h \in H$，使得這個 hypothesis $h$ 能正確的 classify 這 $N$ 個點。 ++可以被 $H$ 所 shattered 的 max point 數++，就稱作 Vapnik-Chervonenkis (VC) dimension of $H$，denoted by ==$VC(H)$==。參考資料附的網站連結裡，有一個很清楚的 VC dimension 定義方式： :::info if $\exists$ a set of $n$ points that can be shattered by the classifier, and $\nexists$ set of $n+1$ points that can be shattered by the classifier, $\rightarrow$ then the VC dimension of th classifier is $n$ ::: 舉個例子：假設我們有一個含三個點 A, B, C 的 dataset，每個點都能被分成 negative / positive （以 $-1 /+1$ 表示）的情況下，會有 $2^3$ 種可能的 label 組合，因此我們說這三個點能定義八種 learning problems。如果現在我們用的是 2d-perceptron，那它能不能 shatter 這三個點呢？ > 簡單來說，就是我們想用一條線來把這三個點按照正負區分開來。由下圖我們可以看到這八種情形都能被一條線區隔開來： ![image](https://hackmd.io/_uploads/BJyWXwDdR.png) 但是如果我們的 dataset 有四個點呢？下圖是四個點 label positive / negative 的所有可能情形，共有 $2^4 =16$ 種： ![image](https://hackmd.io/_uploads/Sy4ONwv_A.png) 從上面這個圖我們可以看到如果是用一條直線的話，最右下角的兩種情形沒有辦法將 positive / negative 區分開來，因此我們說 2d-perceptron 的 VC dimension $=3$。 > 實際上 in general，一個 $x$d-perceptron 的 VC dimension $=x+1$。再看一個課本的例子： ![fig 2.6](https://hackmd.io/_uploads/BJs1QPcxA.png) 從上圖中可以看到，這些和座標軸對齊的長方形可以 shatter 二維空間裡的這四個點。 $\rightarrow$ 在這裏，我們的 $H$ 是那些「在二維空間中和座標軸對齊的長方形」所成的 hypothesis class，然後 $VC(H)=4$ - 在計算 $VC(H)$ 時，像這個例子是 $4$，只要我們能找到四個點可以被 shattered 就足夠了，不需要是對任何四個點都成立。 > 舉例來說，四個在同一條線上的點就無法被長方形給 shattered。此外，在二維空間裡面我們找不到任何一個地方能放第五個點，使得我們可以用長方形來對所有可能的 labelings 做分割。 VC dimension 這個概念可能聽起來讓人覺得有點悲觀，因為它告訴我們，假設我們用長方形來當作 hypothesis class，我們最多就只能 learn 只包含四個點的 datasets，聽起來蠻沒用的，然而，這是因為 VC dimension 是獨立於 instances 的機率分佈的。在現實生活中，在彼此附近的 instances 通常會有相同的 label，所以我們不用去想像所有可能的 labelings。有很多包含超過四個點的 datasets 是 learnable 的，所以即使只有很小的 VC dimension 的 hypothesis class 也適用，甚至比起有很大的 VC dimension 的 hypothesis class，我們更喜歡用 VC dimension 小的。 # 參考資料 - [Defining The VC Dimension And Generalization Bound For Support Vector Machines](https://helenedk.medium.com/defining-the-vc-dimension-and-generalization-bound-for-support-vector-machines-5ee3a90f4ba1) > 這篇文章寫得簡單清楚，非常推薦看看！