# Machine Learning Homework 4 SVM

Department: Forestry
Student: M10912004
Name: De-Kai Kao (高得愷)
E-mail: block58697@gmail.com

## 1. Linear hard-margin SVM

For the data set

$$
X=[\vec{x}_{1}\; \vec{x}_{2}\; \vec{x}_{3}] = [(1,2)\;(2,1)\;(4,2)]\\
Y=[y_{1}\;y_{2}\;y_{3}]=[-1\;1\;1]
$$

* Plot ![](https://hackmd.io/_uploads/ByY34fF43.png)

1. Determine the support vectors

$$
\text{Decision Boundary}:\vec{w}\cdot\vec{x}+b=0\\
\text{Upper Boundary}:\vec{w}\cdot\vec{x}+b=1\\
\text{Lower Boundary}:\vec{w}\cdot\vec{x}+b=-1
$$

$$
\text{Width}\;d=\overrightarrow{x^{-}x^{+}}\cdot\frac{\vec{w}}{||w||}
=\frac{1}{||w||}w^{T}(x^{+}-x^{-})
=\frac{1}{||w||}(w^{T}x^{+}-w^{T}x^{-})
=\frac{1}{||w||}[(1-b)-(-1-b)]\\
d=\frac{2}{||w||}
$$

$$
\arg\max_{w,b}\frac{2}{||w||}\Rightarrow \arg\min_{w,b}\frac{||w||^{2}}{2}\\
\text{Objective function}:\min_{w,b}\frac{||w||^{2}}{2}\\
s.t.\;f_{i}(w)=y_{i}(\vec{w}\cdot \vec{x_i}+b)-1\ge0,\;i=1,2,...,N
$$

* This is a quadratic programming problem under the KKT conditions, so Lagrange multipliers are introduced to find the optimal solution.

$$
L_{p}(w,b,\lambda)=\frac{1}{2}||w||^2+\sum_{i=1}^{N}\lambda_{i}[1-y_{i}(\vec{w}\cdot \vec{x_i}+b)]\;......(1)
$$

* The Lagrange method introduces unknown scalars $\lambda_{i}$ so that the constrained extremum of a multivariate function can be found directly by taking partial derivatives with respect to $w$ and $b$.
* If the function carried no constraints, setting the partial derivatives with respect to $w$ and $b$ to zero would give the unconstrained extremum.
* When $f_{i}(w)>0$, the constraint is inactive and $\lambda_{i}=0$.
* When $f_{i}(w)=0$ and $\lambda_{i}\neq0$, the corresponding point is a support vector.

$$
\frac{\partial L_{p}}{\partial w}=0\Rightarrow \vec{w}=\sum_{i=1}^{N}\lambda_{i}y_{i}\vec{x_{i}}\;......(2)\\
\frac{\partial L_{p}}{\partial b}=0\Rightarrow \sum_{i=1}^{N}\lambda_{i}y_{i}=0\;......(3)
$$

* Substituting (2) back into (1) gives the dual form:

$$
\text{Dual form}\;\;L_{D}(\lambda)=\sum_{i=1}^{N}\lambda_{i}-\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\lambda_{i}\lambda_{j}y_{i}y_{j}(x_{i}\cdot x_{j})\;......(4)
$$

* Since $L_{D}(\lambda)$ no longer contains the variables $w,b$, substitute $X,\;Y$ into $L_{D}(\lambda)$.
* Case 1: the support vectors are $\vec{x}_1,\;\vec{x}_2$; since $\vec{x_3}$ is not a support vector, $\lambda_3=0$. With $x_{1}\cdot x_{1}=5$, $x_{1}\cdot x_{2}=4$, $x_{2}\cdot x_{2}=5$:

\begin{aligned} L_{D}(\lambda)&=(\lambda_{1}+\lambda_{2})-\frac{1}{2}[\lambda_{1}\lambda_{1}(5)-\lambda_{1}\lambda_{2}(4)-\lambda_{2}\lambda_{1}(4)+\lambda_{2}\lambda_{2}(5)]\end{aligned}

* Applying the constraints:

$$
s.t.\;\lambda_{i}\ge0,\;\sum^N_{i=1}\lambda_{i}y_{i}=0\\
\Rightarrow\lambda_{1}y_{1}+\lambda_{2}y_{2}=0\\
-\lambda_{1}+\lambda_{2}=0\\
\lambda_{1}=\lambda_{2}
$$

* $L_{D}(\lambda)=2\lambda_{1}-\lambda_{1}^{2}$, so $\frac{\partial L_{D}}{\partial \lambda_{1}}=2-2\lambda_{1}=0$ gives $\lambda_{1}=\lambda_{2}=1$, and from (2), $\vec{w}=(-1)(1,2)+(1)(2,1)=(1,-1)$, consistent with the primal solution below.
* Case 1: substituting the support vectors $\vec{x}_1,\;\vec{x}_2$ into the boundaries ($y_{1}=-1$ places $\vec{x}_1$ on the lower boundary, $y_{2}=1$ places $\vec{x}_2$ on the upper boundary):

$$
1w_{1}+2w_{2}+b=-1\\
2w_{1}+1w_{2}+b=1\\
\text{Upper boundary}\;-\;\text{Lower boundary}\\
\Rightarrow\;w_{1}=w_{2}+2
$$

* Objective function $\min_{w,b}\frac{||w||^2}{2}$:

$$
\Rightarrow w_{1}^{2}+w_{2}^{2}=(w_{2}+2)^2+w_{2}^{2}=2w_{2}^{2}+4w_{2}+4\\
\frac{\partial}{\partial w_{2}}(2w_{2}^{2}+4w_{2}+4)=0,\quad4w_{2}+4=0,\quad w_{2}=-1,\quad w_{1}=1\\
\text{Upper boundary}:2\times1+1\times(-1)+b=1\\
b=0
$$

* Result:

$$
\text{Decision Boundary}:x_{1}-x_{2}=0\\
\text{Upper Boundary}:x_{1}-x_{2}=1\\
\text{Lower Boundary}:x_{1}-x_{2}=-1
$$

2. Derive and plot the decision boundaries using Hard-margin SVM.

![](https://hackmd.io/_uploads/r107TIkSh.png)

* Case 2: if the support vectors were $\vec{x_1},\;\vec{x_3}$, then $\vec{x_2}$ would violate the constraint:

$$
s.t.\;y_{i}(\vec{w}\cdot \vec{x_i}+b)\ge1,\;i=1,2,...,N\\
y_{2}(\vec{w}\cdot \vec{x_2}+b)\ngeq1
$$

* Discussion: The lecture notes only work through an example with two points; once there are more than two points, one has to decide which points are support vectors, and in this exercise I still could not be entirely sure which points are not support vectors. In Case 2, support-vector status actually cannot be determined without knowing $\vec{w}\;\&\;b$, so the non-support vectors here were inferred from the scatter plot. What I do understand is that after introducing the Lagrange multipliers, the support vectors are the points with $\lambda\neq0$, and $\lambda=0$ corresponds to non-support vectors, but it is still not possible to identify the support vectors purely from the dual formulation. I learned online that numerical optimization techniques such as SMO or quadratic programming can do this (see the numerical check below); I apologize that I could not understand all of the formulas within this assignment.
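The hand result can be sanity-checked numerically. Below is a minimal sketch, assuming sklearn is available: a very large C makes the soft-margin solver approximate the hard-margin problem, so it should recover the $\vec{w}$, $b$, and support vectors derived above.

```
import numpy as np
from sklearn import svm

# The three training points and labels from the problem statement
X = np.array([[1, 2], [2, 1], [4, 2]])
Y = np.array([-1, 1, 1])

# A very large C approximates the hard-margin SVM
clf = svm.SVC(kernel='linear', C=1e10).fit(X, Y)

print(clf.coef_)             # w, expected close to [[ 1. -1.]]
print(clf.intercept_)        # b, expected close to [0.]
print(clf.support_vectors_)  # expected [[1. 2.] [2. 1.]]
```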
## 2. Linear soft-margin SVM

For the data set

$$
X=[\vec{x}_{1}\; \vec{x}_{2}\; \vec{x}_{3}\; \vec{x}_{4}] = [(1,2)\;(2,1)\;(3,1)\;(5,1)]\\
Y=[y_{1}\;y_{2}\;y_{3}\;y_{4}]=[-1\;1\;1\;1]
$$

* Plot ![](https://hackmd.io/_uploads/rySa4OANn.png =60%x)

The penalty parameter on the slack variables is chosen to be $C=0.5$.

1. Determine the support vectors

\begin{aligned}L=\min_{w,b,\xi_{i}}\frac{||w||^2}{2}+C\sum_{i=1}^{N}\xi_{i}\quad s.t.:\;y_{i}(\vec{w}\cdot\vec{x_{i}}+b)\ge1-\xi_{i},\;\xi_{i}\ge0 \end{aligned}

* Lagrange multipliers are again introduced to find the optimal solution:

$$
L_{p}(w,b,\xi,\lambda,\mu)=\frac{1}{2}||w||^2+C\sum_{i=1}^{N}\xi_{i}-\sum^N_{i=1}\lambda_{i}[y_{i}(\vec{w}\cdot\vec{x_{i}}+b)-1+\xi_{i}]-\sum^N_{i=1}\mu_{i}\xi_{i}\;......(1)
$$

* The KKT complementary-slackness conditions turn the inequality constraints into equalities:

$$
\lambda_{i}\ge0,\;\lambda_{i}[y_{i}(\vec{w}\cdot\vec{x_i}+b)-1+\xi_{i}]=0\\
\mu_{i}\ge0,\;\mu_{i}\xi_{i}=0
$$

* Setting the partial derivatives with respect to $w,b,\xi_{i}$ to zero:

$$
\frac{\partial L_{p}}{\partial w}=0\Rightarrow \vec{w}=\sum_{i=1}^N\lambda_{i}y_{i}\vec{x_{i}}\\
\frac{\partial L_{p}}{\partial b}=0\Rightarrow \sum_{i=1}^N\lambda_{i}y_{i}=0\\
\frac{\partial L_{p}}{\partial \xi_{i}}=0\Rightarrow C-\lambda_{i}-\mu_{i}=0
$$

* Combining these gives the dual problem:

$$
L_{D}=\max_{\lambda}\sum^N_{i=1}\lambda_{i}-\frac{1}{2}\sum^N_{i=1}\sum^N_{j=1}\lambda_{i}\lambda_{j}y_{i}y_{j}(\vec{x_i}\cdot\vec{x_j})\\
s.t.\quad\sum^N_{i=1}\lambda_{i}y_{i}=0,\;0\le\lambda_{i}\le C=0.5
$$

$$
\left\{
\begin{aligned}
&\lambda_{i}=0\quad &&\text{the point lies outside the two boundaries}\\
&\lambda_{i}=C\quad &&\text{the point lies inside the two boundaries (or is misclassified)}\\
&0<\lambda_{i}<C\quad &&\text{the point lies on a boundary, i.e. it is a free support vector}
\end{aligned}
\right\}\;......(2)
$$

* The pairwise dot products are $x_{1}\cdot x_{1}=5$, $x_{1}\cdot x_{2}=4$, $x_{1}\cdot x_{3}=5$, $x_{1}\cdot x_{4}=7$, $x_{2}\cdot x_{2}=5$, $x_{2}\cdot x_{3}=7$, $x_{2}\cdot x_{4}=11$, $x_{3}\cdot x_{3}=10$, $x_{3}\cdot x_{4}=16$, $x_{4}\cdot x_{4}=26$.
* Case 1: the support vectors are $\vec{x}_1,\;\vec{x}_2$; $\lambda_{3}=0,\;\lambda_{4}=0$

$$
\lambda_{1}y_{1}+\lambda_{2}y_{2}=0\\
\lambda_{1}=\lambda_{2}\\
L_{D}=(\lambda_{1}+\lambda_{2})-\frac{1}{2}(\lambda_{1}\lambda_{1}(5)-\lambda_{1}\lambda_{2}(4)-\lambda_{2}\lambda_{1}(4)+\lambda_{2}\lambda_{2}(5))\\
=2\lambda_{1}-\lambda_{1}^{2}
$$

* $\frac{\partial L_{D}}{\partial \lambda_{1}}=2-2\lambda_{1}=0$ gives $\lambda_{1}=\lambda_{2}=1$, which exceeds the bound $\lambda_{i}\le C=0.5$; clipping at the bound gives $\lambda_{1}=\lambda_{2}=C=0.5$. By (2), both points then lie inside the margin (bounded support vectors).
* Case 2: the support vectors are $\vec{x}_1,\;\vec{x}_3$; $\lambda_{2}=0.5,\;\lambda_{4}=0$

$$
\lambda_{1}y_{1}+\lambda_{2}y_{2}+\lambda_{3}y_{3}=0\\
-\lambda_{1}+0.5+\lambda_{3}=0\quad\lambda_{1}=\lambda_{3}+0.5
$$

\begin{aligned} L_{D}&=(\lambda_{1}+0.5+\lambda_{3})-\frac{1}{2}[\lambda_{1}^{2}(5)+(0.5)^{2}(5)+\lambda_{3}^{2}(10)\\&\quad-2\lambda_{1}(0.5)(4)-2\lambda_{1}\lambda_{3}(5)+2(0.5)\lambda_{3}(7)]\\&=(2\lambda_{3}+1)-\frac{1}{2}\left(5\lambda_{3}^{2}+3\lambda_{3}+\frac{1}{2}\right)\\&=-\frac{5}{2}\lambda_{3}^{2}+\frac{1}{2}\lambda_{3}+\frac{3}{4}\end{aligned}

$$
\frac{\partial L_{D}}{\partial \lambda_{3}}=-5\lambda_{3}+\frac{1}{2}=0,\quad \lambda_{3}=\frac{1}{10},\;\lambda_{1}=\frac{3}{5}
$$

* $\lambda_{1}=0.6>C=0.5$ violates the box constraint, so this case is infeasible.
* Case 3: the support vectors are $\vec{x}_1,\;\vec{x}_3$ with $\lambda_{2}=\lambda_{4}=0.5$ at the bound

$$
\lambda_{1}y_{1}+\lambda_{2}y_{2}+\lambda_{3}y_{3}+\lambda_{4}y_{4}=0\\
-\lambda_{1}+0.5+\lambda_{3}+0.5=0\quad\lambda_{1}=\lambda_{3}+1
$$

* This fails immediately: $\lambda_{1}=\lambda_{3}+1\ge1>C=0.5$ for any $\lambda_{3}\ge0$. Even ignoring the bound, substituting $\lambda_{1}=\lambda_{3}+1$ into the dual gives

\begin{aligned} L_{D}&=(2\lambda_{3}+2)-\frac{1}{2}\left(5\lambda_{3}^{2}+12\lambda_{3}+\frac{29}{4}\right)\\&=-\frac{5}{2}\lambda_{3}^{2}-4\lambda_{3}-\frac{13}{8}\end{aligned}

$$
\frac{\partial L_{D}}{\partial \lambda_{3}}=-5\lambda_{3}-4=0,\quad \lambda_{3}=-\frac{4}{5}
$$

* $\lambda_{3}$ is negative, which also contradicts $\lambda_{i}\ge0$, so this case is ruled out as well (see the numerical check below).
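Solving these cases by hand is error-prone, so the dual is worth checking numerically. Below is a minimal sketch, assuming sklearn is available: SVC with $C=0.5$ solves the same box-constrained dual, and `dual_coef_` stores $y_{i}\lambda_{i}$ for the support vectors.

```
import numpy as np
from sklearn import svm

# The four training points and labels from the problem statement
X = np.array([[1, 2], [2, 1], [3, 1], [5, 1]])
Y = np.array([-1, 1, 1, 1])

clf = svm.SVC(kernel='linear', C=0.5).fit(X, Y)

print(clf.support_vectors_)  # expected [[1. 2.] [2. 1.]]
print(clf.dual_coef_)        # y_i * lambda_i, expected [[-0.5  0.5]]
print(clf.coef_)             # w, expected [[ 0.5 -0.5]]
print(clf.intercept_)        # b; not unique here since both SVs sit at the bound C
```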
2. Derive and plot the decision boundaries using Soft-margin SVM.

After correcting the arithmetic, the $\lambda$ values in Cases 2 and 3 still conflict with the constraints in (2), so those support-vector assumptions must be rejected. The only consistent solution is Case 1 with both multipliers clipped at the bound, $\lambda_{1}=\lambda_{2}=C=0.5$, which gives

$$
\vec{w}=\sum_{i=1}^N\lambda_{i}y_{i}\vec{x_{i}}=0.5(-1)(1,2)+0.5(1)(2,1)=(0.5,-0.5)
$$

Because both support vectors sit at the bound $\lambda=C$, complementary slackness no longer pins down $b$; the KKT inequalities only confine it to $0\le b\le0.5$ (taking the midpoint $b=0.25$ gives the decision boundary $0.5x_{1}-0.5x_{2}+0.25=0$, i.e. $x_{1}-x_{2}+0.5=0$).

## Python sklearn soft-margin

* Import packages

```
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.metrics import accuracy_score
```

* Generate 200 samples for each of three labels

```
# Set up the parameters for the normal distributions
mean1, std1 = 20, 5
mean2, std2 = 50, 10
mean3, std3 = 60, 4
n = 200  # samples per class

# Generate n random samples of the two coordinates for each class
x1 = np.random.normal(mean1, std1, n)
y1 = np.random.normal(mean1, std1, n)
x2 = np.random.normal(mean2-0.5, std2, n)
y2 = np.random.normal(mean2, std2, n)
x3 = np.random.normal(mean3, std3, n)
y3 = np.random.normal(mean1-0.5, std3, n)

# Standardize every coordinate with the pooled mean and standard deviation
mean = np.mean(np.concatenate((x1, x2, x3, y1, y2, y3)))
sigma = np.std(np.concatenate((x1, x2, x3, y1, y2, y3)))
zx1 = (x1-mean) / sigma
zy1 = (y1-mean) / sigma
zx2 = (x2-mean) / sigma
zy2 = (y2-mean) / sigma
zx3 = (x3-mean) / sigma
zy3 = (y3-mean) / sigma

xy1 = np.stack((zx1, zy1), axis=1)
xy2 = np.stack((zx2, zy2), axis=1)
xy3 = np.stack((zx3, zy3), axis=1)

plt.plot(xy1[:,0], xy1[:,1], 'g.', xy2[:,0], xy2[:,1], 'y.', xy3[:,0], xy3[:,1], 'r.')
plt.gca().axis('equal')

# Attach integer labels 1, 2, 3 and build a single DataFrame
one = np.insert(xy1, 2, 1, axis=1)
two = np.insert(xy2, 2, 2, axis=1)
three = np.insert(xy3, 2, 3, axis=1)
df = pd.DataFrame(np.concatenate((one, two, three)), columns=['x', 'y', 'label'])
df['label'] = df['label'].astype(int)
print(df)
```

![](https://hackmd.io/_uploads/B1ztzseS3.png)

* Split the data into training and test sets

```
train = df.sample(frac=0.8)
test = df.loc[df.index.difference(train.index)]
X = train[train.columns[:2]]
Y = train[train.columns[2]]
X_test = test[test.columns[:2]]
Y_test = test[test.columns[2]]
```

* Finally, fit the model with the sklearn package

```
svc = svm.SVC(kernel='linear').fit(X, Y)

# Create a mesh to plot in
x_min, y_min = X.min() - 1
x_max, y_max = X.max() + 1
h = 0.02  # step size in the mesh
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

titles = 'SVC with linear kernel'
Z = svc.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.get_cmap('terrain'), alpha=0.8)

colors = {1: 'g', 2: 'y', 3: 'r'}

# Plot the test points on top of the decision regions
plt.scatter(X_test.x, X_test.y, c=Y_test.map(colors), alpha=0.8)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title(titles)
plt.show()
```

![](https://hackmd.io/_uploads/Bk15SjgS3.png)

* Result: accuracy score = 0.975

```
y_pred = svc.predict(np.array(X_test))
accuracy_score(np.array(Y_test), y_pred)  # 0.975
```

https://hackmd.io/Zfw0FsTjTdi8z2aOQwzTew?view

![](https://hackmd.io/_uploads/SJeP1yfr2.png)
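To tie the fitted model back to the derivations above, the hyperplanes and margin widths can be read off the classifier. A minimal sketch, assuming the `svc` object fitted above is in scope (with three classes, SVC with a linear kernel trains one $(w,b)$ pair per one-vs-one class pair):

```
import numpy as np

# Inspect the fitted one-vs-one hyperplanes of `svc`
for w, b in zip(svc.coef_, svc.intercept_):
    margin = 2 / np.linalg.norm(w)  # margin width 2/||w|| from the derivation
    print(f"w = {w}, b = {b:.3f}, margin = {margin:.3f}")

print("support vectors per class:", svc.n_support_)
```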