---
title: 2.5 Introduction to SVM
tags: [UiT teaching/DTE 2502 - Neural Networks]
description: Some comments / suggestions to get python environment up and running
reference: https://<>
---

# <pre>+++ 2.5 Introduction to SVMs +++</pre>

## Training an SVM

![Hyperplane and margins](https://i.imgur.com/OvdSKr6.png)

### Observations

Following the intuition and notation from the lecture, we build the SVM algorithm using two classes of points shown as ${\color{red} \times}$ and ${\color{blue} \blacklozenge}$ respectively.

* The classes are denoted by $\{-1, +1\}$ for ${\color{red} \times}$ and ${\color{blue} \blacklozenge}$ respectively.
* The corresponding margin hyperplanes are ${\color{red} { H \overset{\underset{\mathrm{def}}{}}{=} \{\mathbf{x} : \mathbf{w}^T \cdot \mathbf{x} + w_0 = -1\}}}$ and ${\color{blue} { H \overset{\underset{\mathrm{def}}{}}{=} \{\mathbf{x} : \mathbf{w}^T \cdot \mathbf{x} + w_0 = +1\}}}$, where $w_0$ is the bias term; the points with $y_i=-1$ satisfy $\mathbf{w}^T \cdot \mathbf{x}_i + w_0 \le -1$ and the points with $y_i=+1$ satisfy $\mathbf{w}^T \cdot \mathbf{x}_i + w_0 \ge +1$.
* Since ${\color{blue}{\mathbf{b_0}}}$ and ${\color{red}{\mathbf{a_0}}}$ lie on ${\color{blue} { H}}$ and ${\color{red} { H}}$ respectively, we have ${\color{blue} { \mathbf{w}^T \cdot \mathbf{b_0} + w_0 = +1}}$ and ${\color{red} { \mathbf{w}^T \cdot \mathbf{a_0} + w_0 = -1}}$.
* Consider the point $A$, denoted by the vector ${\color{red}{\mathbf{a_0}}}$, lying on the plane ${\color{red}{H}}$. We can write
$$ {\color{blue}{\mathbf{b_0}}} = {\color{red}{\mathbf{a_0}}} + {\color{magenta}{\mathbf{c_0}}}$$
where $|{\color{magenta}{\mathbf{c_0}}}|$ is the margin by construction, so ${\color{magenta}{\mathbf{c_0}}} = m \widehat{\color{magenta}{\mathbf{w}}} = m \frac{\mathbf{w}}{||\mathbf{w}||}$ for some constant $m$ which denotes the length of the margin.

We have

$${ { \mathbf{w}^T \cdot \mathbf{b_0} + w_0 = 1}}$$

$$\implies { { \mathbf{w}^T \cdot (\mathbf{a_0 + c_0}) + w_0 = 1}}$$

$$\implies { { \mathbf{w}^T \cdot \mathbf{a_0} + w_0 + \mathbf{w}^T \cdot \mathbf{c_0} = 1}}$$

$$\implies { { -1 + \mathbf{w}^T \cdot \mathbf{c_0} = 1}}$$

$$\implies \mathbf{w}^T \cdot \mathbf{c_0} = m \frac{\mathbf{w}^T \cdot \mathbf{w}}{||\mathbf{w}||} = m \frac{||\mathbf{w}||^2}{||\mathbf{w}||} = m\,||\mathbf{w}|| = 2$$

$$\implies m = \frac{2}{||\mathbf{w}||}$$

In order to maximize the margin we therefore have to minimize the norm of the vector $\mathbf{w}$.

### Objective function

$$\mathcal{L}(\mathbf{w}, w_0) = \frac{1}{2} \lambda || \mathbf{w}||^2 + \frac{1}{n} \sum_{i=1}^{n} \max\bigg(0, 1 - y_i(\mathbf{w}^T \cdot \mathbf{x}_i + w_0)\bigg)$$

The loss function to be optimized can be defined as above, where minimizing the first term maximizes the margin and minimizing the second term minimizes the error in the output.

**NOTE:** Convince yourselves that $1 - y_i(\mathbf{w}^T \cdot \mathbf{x}_i + w_0) \le 0$ holds for points lying on or beyond their respective hyperplanes ${\color{blue} { H}}$ and ${\color{red} { H}}$, so such points contribute nothing to the second term.
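The objective above maps directly to a few lines of NumPy. The following is only a minimal sketch, assuming labels coded as $\pm 1$; the function name `svm_loss` and the regularization argument `lam` are our own choices, not taken from the course notebook.

```python
import numpy as np

def svm_loss(w, w0, X, y, lam):
    """Regularized hinge loss: 0.5*lam*||w||^2 + mean(max(0, 1 - y*(Xw + w0))).

    X : (n, d) data matrix, y : (n,) labels in {-1, +1}.
    """
    margins = y * (X @ w + w0)               # y_i * (w^T x_i + w_0) for every sample
    hinge = np.maximum(0.0, 1.0 - margins)   # points outside the margin contribute 0
    return 0.5 * lam * (w @ w) + hinge.mean()

# Tiny check on illustrative data: the zero classifier has hinge loss 1 for every point.
X = np.array([[0.0, 1.0], [2.0, -1.0]])
y = np.array([1.0, -1.0])
print(svm_loss(np.zeros(2), 0.0, X, y, lam=0.01))   # -> 1.0
```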
The loss component w.r.t. the $i$-th data tuple $(\mathbf{x}_i, y_i)$ and the corresponding partial derivatives w.r.t. the $k$-th weight component $w_k$ and the bias $w_0$ can be summarized as

$$\mathcal{L}_i = \left\{\begin{matrix} \frac{1}{2} \lambda || \mathbf{w}||^2 & \text{if } y_i(\mathbf{w}^T \cdot \mathbf{x}_i + w_0) \ge 1, & \frac{\partial \mathcal{L}_i}{\partial w_k} = \lambda w_k, & \frac{\partial \mathcal{L}_i}{\partial w_0} = 0 \\ \frac{1}{2} \lambda || \mathbf{w}||^2 + 1 - y_i(\mathbf{w}^T \cdot \mathbf{x}_i + w_0) & \text{otherwise, } & \frac{\partial \mathcal{L}_i}{\partial w_k} = \lambda w_k - y_i x_{ik}, & \frac{\partial \mathcal{L}_i}{\partial w_0} = - y_i\\ \end{matrix}\right.$$

### Implementing SVM on a two-class problem

Use the `SVM.ipynb`, `ellipse_data_small.mat` and `ellipse_data_large.mat` files for experiments.
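Before opening the notebook, the per-sample gradients above can be prototyped with plain stochastic (sub)gradient descent. This is a sketch under assumed defaults: the function name `fit_svm`, the learning rate `lr`, and the epoch count are illustrative and not taken from `SVM.ipynb`.

```python
import numpy as np

def fit_svm(X, y, lam=0.01, lr=0.001, epochs=100, seed=0):
    """Train a linear SVM by SGD on the per-sample gradients derived above.

    X : (n, d) data, y : (n,) labels in {-1, +1}. Returns (w, w0).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, w0 = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            if y[i] * (X[i] @ w + w0) >= 1:        # correct side of its margin hyperplane
                w -= lr * lam * w                  # dL_i/dw_k = lam*w_k, dL_i/dw_0 = 0
            else:
                w -= lr * (lam * w - y[i] * X[i])  # dL_i/dw_k = lam*w_k - y_i*x_ik
                w0 -= lr * (-y[i])                 # dL_i/dw_0 = -y_i
    return w, w0
```

The `.mat` files can be read with `scipy.io.loadmat`; the variable names stored inside depend on how the files were generated, so inspect the returned dictionary before passing the arrays to `fit_svm`.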