# Machine Learning Homework 4 SVM
Department : Forestry
Student : M10912004
Name : De-Kai Kao (高得愷)
E-mail : block58697@gmail.com
## 1. Linear hard-margin SVM
For the data set
$$
X=[\vec{x}_{1}\; \vec{x}_{2}\; \vec{x}_{3} ] = [(1,2)\;(2,1)\;(4,2)]\\Y=[y_{1}\;y_{2}\;y_{3}]=[-1\;1\;1]
$$
* Plot of the data set
1. Determine the support vectors
$$
Decision\;Boundary:\vec{w}\cdot\vec{x}+b=0\\Upper\;Boundary:\vec{w}\cdot\vec{x}+b=1\\Lower\;Boundary:\vec{w}\cdot\vec{x}+b=-1
$$
$$
Width\;d=\overrightarrow{x^{-}x^{+}}\cdot\frac{\vec{w}}{||w||}\\ =\frac{1}{||w||}w^T(x^+-x^-)\\=\frac{1}{||w||}(w^Tx^+-w^Tx^-)\\=\frac{1}{||w||}[(1-b)-(-1-b)]\\d=\frac{2}{||w||}
$$
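* For instance, with the $\vec{w}=(1,-1)$ obtained below, $d=\frac{2}{\sqrt{1^2+(-1)^2}}=\sqrt{2}$.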
$$
argmax_{w,b}\frac{2}{||w||}\Rightarrow argmin_{w,b}\frac{||w||^2}{2}\\Objective \;Function:\;argmin_{w,b}\frac{||w||^2}{2}\\s.t.\;f_i(w)=y_{i}(\vec{w}\cdot \vec{x_i}+b)\ge1,\;i=1,2,...,N
$$
* This is a quadratic programming problem governed by the KKT conditions; a Lagrange multiplier is introduced for each constraint to find the optimal solution.
$$
Lp(w,b,\lambda)=\frac{1}{2}||w||^2+\sum_{i=1}^{N}\lambda_{i}[1-y_{i}(\vec{w}\cdot \vec{x_i}+b)]......(1)\\
$$
* The Lagrange multiplier method introduces unknown scalars ($\lambda$); taking partial derivatives with respect to $w$ and $b$ then yields the constrained extremum of the multivariate function directly
* If the multivariate function were unconstrained, solving the equations from the partial derivatives with respect to $w$ and $b$ would simply give the unconstrained extremum
* When $f_{i}(w)>1$ the constraint is inactive, so $\lambda_{i}=0$
* When $f_{i}(w)=1$ and $\lambda_{i}\neq0$, the point is a support vector; this is how the support vectors are found
$$
\frac{\partial Lp}{\partial w}=0\Rightarrow \vec{w}=\sum_{i=1}^N\lambda_{i}y_{i}\vec{x_{i}}.......(2)\\\frac{\partial Lp}{\partial b}=0\Rightarrow \sum_{i=1}^N\lambda_{i}y_{i}=0.......(3)
$$
* Substituting (2) back into (1), and using (3) to drop the $b$ term, gives
$$
Dual\;form:\;L_{D}(\lambda)=\sum_{i=1}^N\lambda_{i}-\frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N\lambda_{i}\lambda_{j}y_{i}y_{j}(x_{i}\cdot x_{j})......(4)
$$
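* Since the dual depends on the data only through the inner products $(x_i\cdot x_j)$, these can be tabulated first; a small numpy check (a sketch, assuming numpy is available):
```
import numpy as np

# Gram matrix of the three points: entry (i, j) is x_i . x_j
X = np.array([[1, 2], [2, 1], [4, 2]])
print(X @ X.T)
# [[ 5  4  8]
#  [ 4  5 10]
#  [ 8 10 20]]
```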
* Since $L_{D}(\lambda)$ no longer contains the variables $w,b$, substitute $X,\; Y$ into $L_{D}(\lambda)$
* Case 1: the support vectors are $\vec{x}_1,\; \vec{x}_2$; since $\vec{x_3}$ is not a support vector, $\lambda_3=0$
$$
\begin{aligned} L_{D}(\lambda)&=(\lambda_{1}+\lambda_{2})-\frac{1}{2}[\lambda_{1}\lambda_{1}(5)-\lambda_{1}\lambda_{2}(4)-\lambda_{2}\lambda_{1}(4)+\lambda_{2}\lambda_{2}(5)]\end{aligned}
$$
* Substituting the constraints:
$$
s.t.\;\lambda_i\ge0,\;\sum^N_{i=1}\lambda_{i}y_{i}=0\\\Rightarrow\lambda_{1}y_{1}+\lambda_{2}y_{2}=0\\-\lambda_{1}+\lambda_{2}=0\\\lambda_{1}=\lambda_{2}
$$
* $L_{D}(\lambda)=2\lambda_{1}-\lambda_{1}^2$; setting $\frac{\partial L_D}{\partial \lambda_1}=2-2\lambda_1=0$ gives $\lambda_1=\lambda_2=1$, and from (2), $\vec{w}=-\vec{x}_1+\vec{x}_2=(1,-1)$
* Case 1: the support vectors are $\vec{x}_1,\; \vec{x}_2$; substituting them into the $Upper\;\&\;Lower\;boundary$ equations as a primal check
$$
Lower\;boundary\;(y_1=-1):\;w_1+2w_2+b=-1\\Upper\;boundary\;(y_2=+1):\;2w_1+w_2+b=1\\Upper\;boundary\;-\;Lower\;boundary\\\Rightarrow\;w_1=w_2+2\\
$$
* The objective function $argmin_{w,b}\frac{||w||^2}{2}$ then fixes $w_2$
$$
\Rightarrow w_1^2+w_2^2=(w_2+2)^2+w_2^2\\=2w_2^2+4w_2+4\\\frac{\partial ||w||^2}{\partial w_2}=0,\quad4w_2+4=0,\quad w_2=-1,\quad w_1=1\\Upper\;boundary:2\times1+1\times(-1)+b=1\\b=0
$$
* Result:
$$Decision\;Boundary:x_1-x_2=0\\Upper\;Boundary:x_1-x_2=1\\Lower\;Boundary:x_1-x_2=-1$$
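* A quick numerical check of this result (a sketch, assuming scikit-learn is available; sklearn has no explicit hard-margin mode, so a very large $C$ approximates it):
```
import numpy as np
from sklearn import svm

# Hard-margin check: for separable data, a very large C makes the
# soft margin effectively hard.
X = np.array([[1, 2], [2, 1], [4, 2]])
y = np.array([-1, 1, 1])
clf = svm.SVC(kernel='linear', C=1e6).fit(X, y)
print(clf.coef_, clf.intercept_)  # approx. [[ 1. -1.]] and [0.]
print(clf.support_vectors_)       # x1=(1,2) and x2=(2,1), as derived
```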
2. Derive and plot the decision boundaries using Hard-margin SVM.

* Case 2: suppose the support vectors are $\vec{x_1},\; \vec{x_3}$; then $\vec{x_2}$ violates the constraint
$$
s.t.\;f_i(w)=y_{i}(\vec{w}\cdot \vec{x_i}+b)\ge1,\;i=1,2,...,N\\f(\vec{x_2})=y_{2}(\vec{w}\cdot \vec{x_2}+b)\ngeq1
$$
* Discussion:
The lecture notes only include a worked example with two points; with more than two points one has to decide which points are support vectors, and in this exercise I still could not determine with certainty which points are not support vectors. In Case 2 it is actually impossible to judge support-vector status without knowing $\vec{w}\;\&\;b$; here the non-support vectors were inferred from the scatter plot. What I do understand is that after introducing the Lagrange multipliers, the support vectors are the points obtained with $\lambda\neq0$, while $\lambda=0$ corresponds to non-support vectors, but membership still cannot be read off completely from the dual formulation. I learned online that numerical optimization techniques such as SMO or quadratic programming can solve this; I am sorry that I could not fully work through all of those formulas within this assignment.
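Following up on the SMO / quadratic-programming remark above: the dual (4) can be handed to a general-purpose solver, which also reveals the support vectors as the points with $\lambda_i>0$. A minimal sketch with `scipy.optimize.minimize` (SLSQP; assuming scipy is available):
```
import numpy as np
from scipy.optimize import minimize

# Solve the hard-margin dual (4) as a QP: maximize sum(l) - 0.5 l'Ql
# subject to sum(l*y) = 0 and l >= 0, then read off w, b and the SVs.
X = np.array([[1, 2], [2, 1], [4, 2]], dtype=float)
y = np.array([-1., 1., 1.])
Q = (y[:, None] * y[None, :]) * (X @ X.T)      # Q_ij = y_i y_j (x_i . x_j)
obj = lambda l: 0.5 * l @ Q @ l - l.sum()      # minimize the negated dual
cons = {'type': 'eq', 'fun': lambda l: l @ y}  # sum_i lambda_i y_i = 0
res = minimize(obj, np.zeros(3), bounds=[(0, None)] * 3, constraints=[cons])
lam = res.x                                    # approx. [1, 1, 0]
sv = lam > 1e-6                                # support vectors: lambda_i > 0
w = ((lam * y)[:, None] * X).sum(axis=0)       # w = sum lambda_i y_i x_i, eq. (2)
b = (y[sv] - X[sv] @ w).mean()                 # from y_i (w . x_i + b) = 1
print(lam, w, b)                               # w approx. (1, -1), b approx. 0
```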
## 2. Linear soft-margin SVM
For the data set
$$
X=[\vec{x}_{1}\; \vec{x}_{2}\; \vec{x}_{3}\; \vec{x}_{4} ] = [(1,2)\;(2,1)\;(3,1)\;(5,1)]\\Y=[y_{1}\;y_{2}\;y_{3}\;y_{4}]=[-1\;1\;1\;1]
$$
* Plot of the data set
The slack-penalty parameter is chosen to be C=0.5
1. Determine the support vectors
$$
\begin{aligned}L=\min_{w,b,\xi_i}\frac{||w||^2}{2}+C\sum_{i=1}^{N}\xi_i\quad s.t.:\;y_{i}(\vec{w}\cdot\vec{x_{i}}+b)\ge1-\xi_i,\;\xi_i\ge0
\end{aligned}
$$
* Again, Lagrange multipliers are introduced to find the optimal solution
$$
Lp(w,b,\lambda,\mu,\xi)=\\\frac{1}{2}||w||^2+C\left(\sum_{i=1}^{N}\xi_{i}\right)-\sum^N_{i=1}\lambda_{i}[y_{i}(\vec{w}\cdot\vec{x_{i}}+b)-1+\xi_i]-\sum^N_{i=1}\mu_i\xi_i......(1)\\
$$
* By the KKT (complementary slackness) conditions, the inequality constraints become equalities
$$
\lambda_i\ge0,\;\lambda_i[y_i(\vec{w}\;\cdot\vec{x_i}+b)-1+\xi_i]=0\\\mu_i\ge0,\;\mu_i\xi_i=0
$$
* Take the partial derivatives with respect to $w,b,\xi_i$ and set them to zero
$$
\frac{\partial Lp}{\partial w}=0\Rightarrow \vec{w}=\sum_{i=1}^N\lambda_{i}y_{i}\vec{x_{i}}\\\frac{\partial Lp}{\partial b}=0\Rightarrow \sum_{i=1}^N\lambda_{i}y_{i}=0\\\frac{\partial Lp}{\partial \xi}=0\Rightarrow C-\lambda_i-\mu_i=0\\
$$
* Combining these gives the dual objective
$$
L_D=max\sum^N_{i=1}\lambda_i-\frac{1}{2}\sum^N_{i=1}\sum^N_{j=1}\lambda_i\lambda_jy_iy_j(\vec{x_i}\cdot\vec{x_j})\\s.t.\quad\sum^N_{i=1}\lambda_iy_i=0,\;0\le\lambda_i\le C=0.5
$$
$$
\left\{
\begin{aligned}
&\lambda_i=0\quad \text{the point lies outside the two boundaries (beyond the margin)}\\&\lambda_i=C\quad \text{the point lies inside the two boundaries}\\&0<\lambda_i<C\quad \text{the point lies on a boundary, i.e. a support vector}
\end{aligned}
\right\}...(2)
$$
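* Before working through the cases by hand, the same box-constrained dual can be cross-checked numerically with sklearn (a minimal sketch; `dual_coef_` holds $y_i\lambda_i$ for the points with $\lambda_i>0$):
```
import numpy as np
from sklearn import svm

# Cross-check: let sklearn solve the box-constrained dual with C = 0.5.
X = np.array([[1, 2], [2, 1], [3, 1], [5, 1]])
y = np.array([-1, 1, 1, 1])
clf = svm.SVC(kernel='linear', C=0.5).fit(X, y)
print(clf.support_)    # indices of points with lambda_i > 0
print(clf.dual_coef_)  # y_i * lambda_i; |value| = C marks points at the bound
print(clf.coef_, clf.intercept_)
```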
* Case 1: the support vectors are $\vec{x}_1,\; \vec{x}_2;\quad \lambda_3=0,\;\lambda_4=0$
$$
\lambda_1y_1+\lambda_2y_2=0\\\lambda_1=\lambda_2\\L_D=(\lambda_1+\lambda_2)-\frac{1}{2}(\lambda_1\lambda_1(5)-\lambda_1\lambda_2(4)-\lambda_2\lambda_1(4)+\lambda_2\lambda_2(5))\\=2\lambda_1-\lambda_1^2
$$
* $L_{D}(\lambda)=2\lambda_1-\lambda_1^2$ would be maximized at $\lambda_1=1$, but the box constraint caps it at the bound $\lambda_1=\lambda_2=C=0.5$; by (2), $\lambda=C$ places $\vec{x}_1,\vec{x}_2$ inside the margin, contradicting the assumption that they lie on the boundaries
* Case 2: the support vectors are $\vec{x}_1,\; \vec{x}_3;\quad \lambda_2=C=0.5,\;\lambda_4=0$
$$
\lambda_1y_1+\lambda_2y_2+\lambda_3y_3=0\\-\lambda_1+0.5+\lambda_3=0\quad\lambda_1=\lambda_3+0.5\\
$$
$$
\begin{aligned} L_D&=(\lambda_1+0.5+\lambda_3)-\frac{1}{2}[\\&\quad\lambda_1\lambda_1(5)-\lambda_1(0.5)(4)-\lambda_1\lambda_3(5)\\&\quad-(0.5)\lambda_1(4)+(0.5)(0.5)(5)+(0.5)\lambda_3(7)\\&\quad-\lambda_3\lambda_1(5)+\lambda_3(0.5)(7)+\lambda_3\lambda_3(10)]\\&=-\frac{5}{2}\lambda_3^2+\frac{1}{2}\lambda_3+\frac{3}{4}\end{aligned}
$$
$$
\frac{\partial L_D}{\partial \lambda_3}=-5\lambda_3+\frac{1}{2}=0,\quad \lambda_3=\frac{1}{10},\;\lambda_1=\lambda_3+0.5=\frac{3}{5}
$$
* $\lambda_1=\frac{3}{5}$ exceeds $C=0.5$, violating the box constraint $0\le\lambda_i\le C$ in (2)
* Case 3: the support vectors are $\vec{x}_1,\; \vec{x}_3$, now with $\lambda_{2}=\lambda_{4}=C=0.5$
$$
\lambda_1y_1+\lambda_2y_2+\lambda_3y_3+\lambda_4y_4=0\\-\lambda_1+0.5+\lambda_3+0.5=0\quad\lambda_1=\lambda_3+1
$$
$$
\begin{aligned} L_D &=(\lambda_1+1+\lambda_3)-\frac{1}{2}[\\&\quad\lambda_1\lambda_1(5)-\lambda_1(0.5)(4)-\lambda_1\lambda_3(5)-\lambda_1(0.5)(7)\\&\quad-(0.5)\lambda_1(4)+(0.5)(0.5)(5)+(0.5)\lambda_3(7)+(0.5)(0.5)(11)\\&\quad-\lambda_3\lambda_1(5)+\lambda_3(0.5)(7)+\lambda_3\lambda_3(10)+\lambda_3(0.5)(16)\\&\quad-(0.5)\lambda_1(7)+(0.5)(0.5)(11)+(0.5)\lambda_3(16)+(0.5)(0.5)(26)]\\&=-\frac{5}{2}\lambda_3^2-4\lambda_3-\frac{13}{8}\end{aligned}
$$
$$
\frac{\partial L_D}{\partial \lambda_3}=-5\lambda_3-4=0,\quad \lambda_3=-\frac{4}{5},\;\lambda_1=\lambda_3+1=\frac{1}{5}
$$
* $\lambda_3=-\frac{4}{5}$ is negative, violating $\lambda_i\ge0$ in (2)
2. Derive and plot the decision boundaries using Soft-margin SVM.
None of the hand-worked cases yields multipliers consistent with condition (2): Case 1 is driven to the bound $\lambda=C$, Case 2 gives $\lambda_1>C$, and Case 3 gives $\lambda_3<0$. Therefore $\vec{w}=\sum_{i=1}^N\lambda_{i}y_{i}\vec{x_{i}}$ cannot be computed this way, and the dual has to be solved numerically, as sketched below and in the sklearn section that follows.
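The same solver sketch as in section 1, now with the box constraint $0\le\lambda_i\le C$ (assuming scipy is available), recovers the multipliers that the hand-worked cases could not:
```
import numpy as np
from scipy.optimize import minimize

# Soft-margin dual for the four points: the same QP as before, but with
# the box constraint 0 <= lambda_i <= C instead of just lambda_i >= 0.
X = np.array([[1, 2], [2, 1], [3, 1], [5, 1]], dtype=float)
y = np.array([-1., 1., 1., 1.])
C = 0.5
Q = (y[:, None] * y[None, :]) * (X @ X.T)
obj = lambda l: 0.5 * l @ Q @ l - l.sum()
cons = {'type': 'eq', 'fun': lambda l: l @ y}
res = minimize(obj, np.full(4, C / 2), bounds=[(0, C)] * 4, constraints=[cons])
lam = res.x
w = ((lam * y)[:, None] * X).sum(axis=0)
on_margin = (lam > 1e-6) & (lam < C - 1e-6)   # margin SVs per condition (2)
print(lam, w, X[on_margin])
```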
## Python sklearn soft-margin
* Import packages
```
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.metrics import accuracy_score
```
* Generate 200 samples for each of the three labels
```
# Set up the parameters for the normal distributions
mean1 = 20
std1 = 5
mean2 = 50
std2 = 10
mean3 = 60
std3= 4
n = 200
# Generate n random samples of x1 and x2
x1 = np.random.normal(mean1, std1, n)
y1 = np.random.normal(mean1, std1, n)
x2 = np.random.normal(mean2-0.5, std2, n)
y2 = np.random.normal(mean2, std2, n)
x3 = np.random.normal(mean3, std3, n)
y3 = np.random.normal(mean1-0.5, std3, n)
mean = np.mean(np.concatenate((x1, x2, x3, y1, y2, y3)))
sigma = np.std(np.concatenate((x1, x2, x3, y1, y2, y3)))
zx1 = (x1-mean) / sigma
zy1 = (y1-mean) / sigma
zx2 = (x2-mean) / sigma
zy2 = (y2-mean) / sigma
zx3 = (x3-mean) / sigma
zy3 = (y3-mean) / sigma
xy1 = np.stack((zx1, zy1), axis=1)
xy2 = np.stack((zx2, zy2), axis=1)
xy3 = np.stack((zx3, zy3), axis=1)
# plt.xlim(0, 100)
# plt.ylim(0, 100)
plt.plot(xy1[:,0], xy1[:,1], 'g.', xy2[:,0], xy2[:,1], 'y.', xy3[:,0], xy3[:,1], 'r.')
plt.gca().axis('equal')
one = np.insert(xy1, 2, 1, axis=1)
two = np.insert(xy2, 2, 2, axis=1)
three = np.insert(xy3, 2, 3, axis=1)
df = pd.DataFrame(np.concatenate((one, two, three)), columns = ['x','y','label'])
df['label'] = df['label'].astype(int)
print(df)
```

* Split the data into training (80%) and test (20%) sets
```
train = df.sample(frac=0.8)
test = df.loc[df.index.difference(train.index)]
X = train[train.columns[:2]]
Y = train[train.columns[2]]
X_test = test[test.columns[:2]]
Y_test = test[test.columns[2]]
```
* Finally, fit and plot with the sklearn package
```
svc = svm.SVC(kernel='linear').fit(X, Y)
# create a mesh to plot in
x_min, y_min = X.min() - 1  # per-column minima of the two features
x_max, y_max = X.max() + 1
h = .02  # step size in the mesh
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
titles = 'SVC with linear kernel'
Z = svc.predict(np.c_[xx.ravel(), yy.ravel()])
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.get_cmap('terrain'), alpha=0.8)
colors = {1: 'g', 2: 'y', 3: 'r'}
# Plot the test points on top of the decision regions
plt.scatter(X_test.x, X_test.y, c=Y_test.map(colors), alpha=0.8)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title(titles)
plt.show()
```

* Result of accuracy score: 0.975
```
pred = svc.predict(np.array(X_test))
accuracy_score(np.array(Y_test), pred)
# 0.975
```
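* Equivalently, `svc.score` returns the mean accuracy in a single call:
```
print(svc.score(X_test, Y_test))  # same value as accuracy_score above
```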
https://hackmd.io/Zfw0FsTjTdi8z2aOQwzTew?view
