---
title: SVM
tags: research
---
# Support Vector Machine
:::info
Edited by Prof. Huei-Wen Teng and Ming-Hsuan Kang at NYCU and updated on 2021/10/29. All rights reserved.
:::
Support Vector Machine (SVM) is a widely used machine learning algorithm. Consider a set of input data $X \subset \mathbb{R}^d$ and a set of output data $Y \subset \{\pm 1\}$. Let $n$ be the size of $X$, and let $X^{(i)}$ and $Y^{(i)}$ denote the $i$-th input and output, for $i=1,\ldots,n$.
## The standard SVM
Let's start with the standard SVM. First, we consider a linear function of $x$ with parameters $w$ and $b$:
$$g_{w,b}(x) = \langle w, x\rangle +b. $$
And we define
$$\mathrm{sign}(z) = \begin{cases} 1 &, \mbox{ if }z\geq 0 \\
-1 &, \mbox{ if }z<0. \end{cases}
$$
The optimal margin classifier is given by
$$ y = \mathrm{sign}(g_{w,b}(x)),$$
where $(w,b)$ is the solution of
$$ \min_{w,b} \frac{1}{2}\|w\|^2 \quad \mbox{subject to} \quad Y^{(i)}( \langle w, X^{(i)}\rangle +b ) \geq 1 \, \mbox{ for $i=1$ to $n$}.$$
Here, two parameters $w$ and $b$ form the hyperplane, where $w$ gives the direction and $b$ is the intercept.
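As a concrete illustration, the following minimal sketch fits the optimal margin classifier with scikit-learn; the library, the toy 2-D data, and the very large $C$ (used to approximate the hard-margin constraint) are assumptions of this sketch, not part of the notes.
```python
# Minimal sketch of the hard-margin linear SVM; a very large C approximates
# the hard-margin constraint.  Data and tooling are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],         # class +1
              [-1.0, -1.0], [-2.0, -1.0], [-3.0, -2.0]])  # class -1
Y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, Y)   # large C ~ hard margin

w, b = clf.coef_[0], clf.intercept_[0]        # hyperplane direction and intercept
print("w =", w, ", b =", b)
print("prediction at (0.5, 0.5):", clf.predict([[0.5, 0.5]]))  # sign(<w, x> + b)
```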
### Non-linear SVM
Next, we consider the same problem together with a feature map $\phi$. Then, the optimal margin classifier is
$$y =\mathrm{sign}(g(x)) = \mathrm{sign}( \langle w, \phi(x)\rangle +b).$$
Here $(w,b)$ is the solution of
$$ \min_{w,b} \frac{1}{2}\|w\|^2 \quad \mbox{subject to} \quad Y^{(i)}( \langle w, \phi(X^{(i)})\rangle +b ) \geq 1 \, \mbox{ for $i=1$ to $n$}.$$
For example, one may take the quadratic polynomial feature map $$\phi(x) = (x_1,\ldots,x_d, x_1^2,\ldots,x_d^2,x_1x_2, x_1x_3,\ldots,x_{d-1}x_{d}).$$
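The map can be written out explicitly, as in the sketch below; the helper name `quadratic_feature_map` is ours, not from the notes.
```python
# Sketch of the quadratic polynomial feature map written out explicitly.
import numpy as np

def quadratic_feature_map(x):
    """Map x in R^d to (x_1,...,x_d, x_1^2,...,x_d^2, x_i x_j for i < j)."""
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    cross = [x[i] * x[j] for i in range(d) for j in range(i + 1, d)]
    return np.concatenate([x, x ** 2, np.array(cross)])

print(quadratic_feature_map([1.0, 2.0, 3.0]))
# [1. 2. 3. 1. 4. 9. 2. 3. 6.]
```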
## Soft-Margin SVM
One can consider the optimal **soft** margin classifier which is the solution of
$$ \min_{w,b,\xi} \frac{1}{2}\|w\|^2+ C \sum \xi_i \quad \mbox{subject to} \quad Y^{(i)}( \langle w, X^{(i)}\rangle +b ) \geq 1-\xi_i \quad \mbox{and} \quad \xi_i \geq 0 \,,\mbox{for $i=1$ to $n$}.$$
Here $C > 0$ is a hyperparameter that controls the penalty for violating the margin.
The slack variables $\xi_i$ are treated as parameters, which are estimated from the data.
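In practice, the soft-margin problem is what off-the-shelf solvers implement. The sketch below (assuming scikit-learn and synthetic data, which the notes do not prescribe) shows how the hyperparameter $C$ is passed and how it changes the fit.
```python
# Sketch: the soft-margin penalty C corresponds to the C argument of SVC.
# Smaller C tolerates larger slacks xi_i, so more points may violate the margin.
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

X, Y = make_blobs(n_samples=100, centers=2, random_state=0)
Y = 2 * Y - 1                               # relabel {0, 1} -> {-1, +1}

for C in (0.01, 1.0, 100.0):                # candidate values of the hyperparameter
    clf = SVC(kernel="linear", C=C).fit(X, Y)
    print(C, "support vectors per class:", clf.n_support_)
```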

## Weighted Soft-Margin SVM
When the data is imbalanced, we may consider the weighted soft-margin SVM as follows.
$$ \min_{w,b,\xi} \frac{1}{2}\|w\|^2+ \sum C_{i} \xi_i \quad \mbox{subject to} \quad Y^{(i)}( \langle w, X^{(i)}\rangle +b ) \geq 1-\xi_i \quad \mbox{and} \quad \xi_i \geq 0\,,\mbox{ for $i=1$ to $n$}.$$
For convenience, we will set $C_i=C_-$ if $Y^{(i)}=-1$ and $C_i =C_{+}$ if $Y^{(i)}=1$.
In this case, we have to tune the hyperparameters $C_+$ and $C_-$.
We assume the proportion of +1 is small (about 7%).
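One way to realize the per-class penalties $C_-$ and $C_+$ is through scikit-learn's `class_weight` argument, which multiplies $C$ per class; the data and the roughly 7% positive rate below are illustrative assumptions of this sketch.
```python
# Sketch of the weighted soft-margin SVM: the effective penalty for class y
# is C * class_weight[y], so class_weight carries the ratio C_+ / C_-.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, Y = make_classification(n_samples=1000, weights=[0.93, 0.07], random_state=0)
Y = 2 * Y - 1                                    # relabel to {-1, +1}

clf = SVC(kernel="linear", C=1.0,
          class_weight={-1: 1.0, +1: 10.0})      # C_- = 1.0, C_+ = 10.0
clf.fit(X, Y)
print("predicted +1 labels:", np.sum(clf.predict(X) == 1), "out of", np.sum(Y == 1))
```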
### Weighted Soft-Margin Non-linear SVM
When the data is complex, the weighted soft-margin non-linear SVM uses
$$ g(x) = \color{green}{\langle w, \phi(x) \rangle +b}. $$
Here, $\phi(x)$ is the quadratic polynomial feature map, and $(w,b,\xi)$ solves the weighted soft-margin problem above with each $X^{(i)}$ replaced by $\phi(X^{(i)})$.
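A sketch of this classifier is given below, with a degree-2 polynomial kernel standing in for the explicit feature map (scikit-learn's kernel $(\gamma\langle x,x'\rangle + r)^2$ spans the same quadratic monomials up to scaling); all concrete settings are illustrative assumptions.
```python
# Sketch of a weighted soft-margin non-linear SVM: degree-2 polynomial kernel
# plus per-class weights carrying C_+ / C_-.
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, Y = make_classification(n_samples=1000, weights=[0.93, 0.07], random_state=1)
Y = 2 * Y - 1

clf = SVC(kernel="poly", degree=2, coef0=1.0, C=1.0,
          class_weight={-1: 1.0, +1: 10.0}).fit(X, Y)
g = clf.decision_function(X)        # g(x) = <w, phi(x)> + b
print("g on the first five observations:", g[:5])
```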
## The SVM can help to provide the default probability
1. Given a training data set $X_{\mathrm{train}}$, we apply the weighted soft-margin non-linear SVM to obtain a function
$$ g(x) = \color{green}{\langle w, \phi(x) \rangle +b}. $$ Here, $\phi(x)$ is the quadratic polynomial feature map.
2. There are two hyperparameters: $\color{blue}{C}$ and $\color{blue}{\delta}$.
3. The idea is to use $g(x)$ to split the data set $X$ into several groups such that each group has a different default rate.
4. Since the data set is imbalanced, we let $(\mu, \sigma)$ be the mean and the standard deviation of $\{g(X^{(i)})| X^{(i)} \in X_{\mathrm{train}} ,Y^{(i)}<0\}$. In this case, $\mu$ is close to $-1$ and $\sigma$ is very small. (Why?)
5. We split the data into 7 groups according to the values of $g(x)$, using the following cutting points:
$$-\infty\,,\, \mu-\sigma\,,\, \mu - 0.5 \sigma \,,\, \mu \,,\,\mu+0.5 \sigma\,,\, \mu + \sigma \,,\,0 \,,\,\infty.$$
6. To tune the hyperparameters, we compute the default rate of each group for both the training set and the test set, as in the sketch below.
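The following sketch puts the steps together: fit the weighted soft-margin non-linear SVM, read $g(x)$ off `decision_function`, compute $(\mu, \sigma)$ on the $Y=-1$ training observations, and report per-group default rates on the training and test sets. The library choice, synthetic data, and helper names are assumptions of this sketch, and $\mu+\sigma<0$ is assumed so that the cutting points are increasing (consistent with $\mu$ close to $-1$ and a small $\sigma$).
```python
# Sketch of the grouping and default-rate computation (steps 1-6 above).
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

def group_default_rates(g, y, mu, sigma):
    """Default rate (share of Y = +1) in each of the 7 groups cut by g(x)."""
    cuts = [mu - sigma, mu - 0.5 * sigma, mu,
            mu + 0.5 * sigma, mu + sigma, 0.0]           # interior cutting points
    groups = np.digitize(g, cuts)                        # group index 0, ..., 6
    return [float(np.mean(y[groups == k] == 1)) if np.any(groups == k) else np.nan
            for k in range(7)]

X, Y = make_classification(n_samples=2000, weights=[0.93, 0.07], random_state=0)
Y = 2 * Y - 1
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)

clf = SVC(kernel="poly", degree=2, coef0=1.0, C=1.0,
          class_weight={-1: 1.0, +1: 10.0}).fit(X_tr, Y_tr)
g_tr, g_te = clf.decision_function(X_tr), clf.decision_function(X_te)

mu, sigma = g_tr[Y_tr == -1].mean(), g_tr[Y_tr == -1].std()   # on Y = -1 only
print("train default rates by group:", group_default_rates(g_tr, Y_tr, mu, sigma))
print("test  default rates by group:", group_default_rates(g_te, Y_te, mu, sigma))
```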
