---
title: SVM
tags: research
---
# Support Vector Machine
:::info
Edited by Prof. Huei-Wen Teng and Ming-Hsuan Kang at NYCU and updated on 2021/10/29. All rights reserved.
:::
Support Vector Machine (SVM) is a widely used machine learning algorithm. Consider a set of input data $X \subset \mathbb{R}^d$ and a set of output data $Y \subset \{\pm 1\}$. Let $n$ be the size of $X$, and let $X^{(i)}$ and $Y^{(i)}$ denote the $i$-th input and output, for $i=1,\ldots,n$.
## The standard SVM
Let's start with the standard SVM. First, we consider a linear function of $x$ with parameters $w$ and $b$:
$$g_{w,b}(x) = \langle w, x\rangle +b. $$
And we define
$$\mathrm{sign}(z) = \begin{cases} 1 &, \mbox{ if }z\geq 0 \\
-1 &, \mbox{ if }z<0. \end{cases}
$$
The optimal margin classifier is given by
$$ y = \mathrm{sign}(g_{w,b}(x)),$$
where $(w,b)$ is the solution of
$$ \min_{w,b} \frac{1}{2}\|w\|^2 \quad \mbox{subject to} \quad Y^{(i)}( \langle w, X^{(i)}\rangle +b ) \geq 1 \, \mbox{ for $i=1$ to $n$}.$$
Here, two parameters $w$ and $b$ form the hyperplane, where $w$ gives the direction and $b$ is the intercept.
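As a concrete illustration, the following minimal sketch fits the optimal margin classifier with scikit-learn; the library, the toy 2-D data, and the very large $C$ (used to approximate the hard-margin constraint) are assumptions of this sketch, not part of the notes.
```python
# Minimal sketch of the hard-margin linear SVM; a very large C approximates
# the hard-margin constraint.  Data and tooling are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],         # class +1
              [-1.0, -1.0], [-2.0, -1.0], [-3.0, -2.0]])  # class -1
Y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, Y)   # large C ~ hard margin

w, b = clf.coef_[0], clf.intercept_[0]        # hyperplane direction and intercept
print("w =", w, ", b =", b)
print("prediction at (0.5, 0.5):", clf.predict([[0.5, 0.5]]))  # sign(<w, x> + b)
```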
### Non-linear SVM
Next, we consider the same problem together with a feature map $\phi$. Then, the optimal margin classifier is
$$y =\mathrm{sign}(g(x)) = \mathrm{sign}( \langle w, \phi(x)\rangle +b).$$
Here $(w,b)$ is the solution of
$$ \min_{w,b} \frac{1}{2}\|w\|^2 \quad \mbox{subject to} \quad Y^{(i)}( \langle w, \phi(X^{(i)})\rangle +b ) \geq 1 \, \mbox{ for $i=1$ to $n$}.$$
For example, one may take the quadratic polynomial feature map $$\phi(x) = (x_1,\ldots,x_d, x_1^2,\ldots,x_d^2,x_1x_2, x_1x_3,\ldots,x_{d-1}x_{d}).$$
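The map can be written out explicitly, as in the sketch below; the helper name `quadratic_feature_map` is ours, not from the notes.
```python
# Sketch of the quadratic polynomial feature map written out explicitly.
import numpy as np

def quadratic_feature_map(x):
    """Map x in R^d to (x_1,...,x_d, x_1^2,...,x_d^2, x_i x_j for i < j)."""
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    cross = [x[i] * x[j] for i in range(d) for j in range(i + 1, d)]
    return np.concatenate([x, x ** 2, np.array(cross)])

print(quadratic_feature_map([1.0, 2.0, 3.0]))
# [1. 2. 3. 1. 4. 9. 2. 3. 6.]
```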
## Soft-Margin SVM
One can consider the optimal **soft** margin classifier which is the solution of
$$ \min_{w,b,\xi} \frac{1}{2}\|w\|^2+ C \sum \xi_i \quad \mbox{subject to} \quad Y^{(i)}( \langle w, X^{(i)}\rangle +b ) \geq 1-\xi_i \quad \mbox{and} \quad \xi_i \geq 0 \,,\mbox{for $i=1$ to $n$}.$$
Here $C > 0$ is a hyperparameter that controls the penalty for violating the margin.
The slack variables $\xi_i$ are treated as parameters, which are estimated from the data.
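In practice, the soft-margin problem is what off-the-shelf solvers implement. The sketch below (assuming scikit-learn and synthetic data, which the notes do not prescribe) shows how the hyperparameter $C$ is passed and how it changes the fit.
```python
# Sketch: the soft-margin penalty C corresponds to the C argument of SVC.
# Smaller C tolerates larger slacks xi_i, so more points may violate the margin.
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

X, Y = make_blobs(n_samples=100, centers=2, random_state=0)
Y = 2 * Y - 1                               # relabel {0, 1} -> {-1, +1}

for C in (0.01, 1.0, 100.0):                # candidate values of the hyperparameter
    clf = SVC(kernel="linear", C=C).fit(X, Y)
    print(C, "support vectors per class:", clf.n_support_)
```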

## Weighted Soft-Margin SVM
When the data is imbalanced, we may consider the weighted soft-margin SVM as follows.
$$ \min_{w,b,\xi} \frac{1}{2}\|w\|^2+ \sum C_{i} \xi_i \quad \mbox{subject to} \quad Y^{(i)}( \langle w, X^{(i)}\rangle +b ) \geq 1-\xi_i \quad \mbox{and} \quad \xi_i \geq 0\,,\mbox{ for $i=1$ to $n$}.$$
For convenience, we will set $C_i=C_-$ if $Y^{(i)}=-1$ and $C_i =C_{+}$ if $Y^{(i)}=1$.
In this case, we have to tune the hyperparameters $C_+$ and $C_-$.
We assume the proportion of +1 is small (about 7%).
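One way to realize the per-class penalties $C_-$ and $C_+$ is through scikit-learn's `class_weight` argument, which multiplies $C$ per class; the data and the roughly 7% positive rate below are illustrative assumptions of this sketch.
```python
# Sketch of the weighted soft-margin SVM: the effective penalty for class y
# is C * class_weight[y], so class_weight carries the ratio C_+ / C_-.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, Y = make_classification(n_samples=1000, weights=[0.93, 0.07], random_state=0)
Y = 2 * Y - 1                                    # relabel to {-1, +1}

clf = SVC(kernel="linear", C=1.0,
          class_weight={-1: 1.0, +1: 10.0})      # C_- = 1.0, C_+ = 10.0
clf.fit(X, Y)
print("predicted +1 labels:", np.sum(clf.predict(X) == 1), "out of", np.sum(Y == 1))
```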
### Weighted Soft-Margin Non-linear SVM
When the data is complex, the weighted soft-margin non-linear SVM uses
$$ g(x) = \color{green}{\langle w, \phi(x) \rangle +b}. $$
Here, $\phi(x)$ is the quadratic polynomial feature map, and $(w,b,\xi)$ solves the weighted soft-margin problem above with each $X^{(i)}$ replaced by $\phi(X^{(i)})$.
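A sketch of this classifier is given below, with a degree-2 polynomial kernel standing in for the explicit feature map (scikit-learn's kernel $(\gamma\langle x,x'\rangle + r)^2$ spans the same quadratic monomials up to scaling); all concrete settings are illustrative assumptions.
```python
# Sketch of a weighted soft-margin non-linear SVM: degree-2 polynomial kernel
# plus per-class weights carrying C_+ / C_-.
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, Y = make_classification(n_samples=1000, weights=[0.93, 0.07], random_state=1)
Y = 2 * Y - 1

clf = SVC(kernel="poly", degree=2, coef0=1.0, C=1.0,
          class_weight={-1: 1.0, +1: 10.0}).fit(X, Y)
g = clf.decision_function(X)        # g(x) = <w, phi(x)> + b
print("g on the first five observations:", g[:5])
```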
## The SVM can help to provide the default probability
1. Given a training data set $X_{\mathrm{train}}$, we apply the weighted soft-margin non-linear SVM to obtain a function
$$ g(x) = \color{green}{\langle w, \phi(x) \rangle +b}. $$ Here, $\phi(x)$ is the quadratic polynomial feature map.
2. There are two hyperparameters: $\color{blue}{C}$ and $\color{blue}{\delta}$.
3. The idea is to use $g(x)$ to split the data set $X$ into several groups such that each group has a different default rate.
4. Since the data set is imbalanced, we let $(\mu, \sigma)$ be the mean and the standard deviation of $\{g(X^{(i)})| X^{(i)} \in X_{\mathrm{train}} ,Y^{(i)}<0\}$. In this case, $\mu$ is close to $-1$ and $\sigma$ is very small. (Why?)
5. We split the data into 7 groups according to the values of $g(x)$, using the following cutting points:
$$-\infty\,,\, \mu-\sigma\,,\, \mu - 0.5 \sigma \,,\, \mu \,,\,\mu+0.5 \sigma\,,\, \mu + \sigma \,,\,0 \,,\,\infty.$$
6. To tune the hyperparameters, we compute the default rate of each group for both the training set and the test set, as in the sketch below.
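The following sketch puts the steps together: fit the weighted soft-margin non-linear SVM, read $g(x)$ off `decision_function`, compute $(\mu, \sigma)$ on the $Y=-1$ training observations, and report per-group default rates on the training and test sets. The library choice, synthetic data, and helper names are assumptions of this sketch, and $\mu+\sigma<0$ is assumed so that the cutting points are increasing (consistent with $\mu$ close to $-1$ and a small $\sigma$).
```python
# Sketch of the grouping and default-rate computation (steps 1-6 above).
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

def group_default_rates(g, y, mu, sigma):
    """Default rate (share of Y = +1) in each of the 7 groups cut by g(x)."""
    cuts = [mu - sigma, mu - 0.5 * sigma, mu,
            mu + 0.5 * sigma, mu + sigma, 0.0]           # interior cutting points
    groups = np.digitize(g, cuts)                        # group index 0, ..., 6
    return [float(np.mean(y[groups == k] == 1)) if np.any(groups == k) else np.nan
            for k in range(7)]

X, Y = make_classification(n_samples=2000, weights=[0.93, 0.07], random_state=0)
Y = 2 * Y - 1
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)

clf = SVC(kernel="poly", degree=2, coef0=1.0, C=1.0,
          class_weight={-1: 1.0, +1: 10.0}).fit(X_tr, Y_tr)
g_tr, g_te = clf.decision_function(X_tr), clf.decision_function(X_te)

mu, sigma = g_tr[Y_tr == -1].mean(), g_tr[Y_tr == -1].std()   # on Y = -1 only
print("train default rates by group:", group_default_rates(g_tr, Y_tr, mu, sigma))
print("test  default rates by group:", group_default_rates(g_te, Y_te, mu, sigma))
```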
