---
title: Spam classification problem
description:
duration: 200
card_type: cue_card
---
### **Problem statement**
<img src="https://drive.google.com/uc?id=1SoIWP9V9xjzoE6MDw7d5ZZCgZYMPOgFh" width=700>
---
title: SVM - support vector machine
description:
duration: 200
card_type: cue_card
---
<img src="https://drive.google.com/uc?id=1avbibjOWa10nLXr-d8ZgAQE7zT9wV2ap" width=700>
---
title: SVM - Geometric intuition
description:
duration: 200
card_type: cue_card
---
### **Geometric intuition behind SVM**
<img src='https://drive.google.com/uc?id=1WpiQhJkbQdvrPlwMp8XgN7-vdo-lL0RB' width='700'>
<img src='https://drive.google.com/uc?id=1SB2ymuy6BJo_S2AnBNlfUWc4xQv1W6pz' width='700'>
<img src='https://drive.google.com/uc?id=1roB26OTVEbXCuqYknXZ6Kjwt0E47y9Cp' width='800'>
---
title: Quiz 1
description:
duration: 60
card_type: quiz_card
---
# Question
What do you mean by generalization in terms of SVM?
# Choices
- [ ] How far the hyperplane is from the training datapoints
- [x] How accurately the SVM can predict outcomes for unseen data
- [ ] How accurately the SVM classifies training datapoints
---
title: SVM - Geometric intuition 2
description:
duration: 200
card_type: cue_card
---
<img src='https://drive.google.com/uc?id=1sw4pKVOJCkaa8oFTvBwfXFi49q7qKxKi' width='800'>
<img src='https://drive.google.com/uc?id=1yFdBZTjjd0cfXOurb5BDSaJN5etsxafx' width='800'>
<img src='https://drive.google.com/uc?id=1_OZwfj1hXMBIWh6FlY-ro-bTmPn6dnJY' width='800'>
<img src='https://drive.google.com/uc?id=1hIc5gVv8wEJ3XDtqfTFuXfAE_1E_MHSS' width='800'>
<img src='https://drive.google.com/uc?id=18vNlvPS5XprZdhEbK2MqJDEoCxQ80crO' width='800'>
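A standard fact worth keeping handy for the quiz below: the distance between two parallel hyperplanes $w^Tx + b = c_1$ and $w^Tx + b = c_2$ is
$$\frac{|c_1 - c_2|}{\|w\|}$$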
---
title: Quiz 2
description:
duration: 60
card_type: quiz_card
---
# Question
If,
- π⁺ : w^T * x + b = 40
- π⁻ : w^T * x + b = -50
then the margin will be:
# Choices
- [ ] 10/|w|
- [ ] 40/|w|
- [ ] 50/|w|
- [x] 90/|w|
---
title: SVM - Demo
description:
duration: 200
card_type: cue_card
---
https://jgreitemann.github.io/svm-demo
<img src='https://drive.google.com/uc?id=149Xs-dDaEhXH8m90fiSlT0SPlUchUo2T' height='400' width='650'>
---
title: Hard Margin SVM
description:
duration: 200
card_type: cue_card
---
<img src='https://drive.google.com/uc?id=1mbTiyZd6SGRjCNc1ItOvZkcGSp-D5h4i' width='800'>
<img src='https://drive.google.com/uc?id=15vv6vu28pQkfwAGuNpL1LeaBTojBTKhH' width='800'>
<img src='https://drive.google.com/uc?id=1Xhkz2rDmyuiQCE2sRWyacuvrQa9LXlvP' width='800'>
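For reference, a standard way to write the hard-margin optimization problem (assuming labels $y_i \in \{-1, +1\}$) is:
$$\min_{w,b} \ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^Tx_i + b) \ge 1 \ \ \forall i$$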
Example -
<img src='https://drive.google.com/uc?id=1VgUbWlosavzPc9ftYrdt0JntpwZP6c1_' width='800'>
<img src='https://drive.google.com/uc?id=1M5-AUwdrAxpqBt9HYwLCbGiwbNNnedK3' width='800'>
<img src='https://drive.google.com/uc?id=1fc5esn73N3G8zpSXU9LannNTGmrVhGXs' width='800'>
---
title: Quiz 3
description:
duration: 60
card_type: quiz_card
---
# Question
What do you mean by a hard margin?
# Choices
- [x] The SVM allows no error in classification.
- [ ] The SVM allows some error in classification.
- [ ] The SVM allows high error in classification.
---
title: Soft Margin SVM
description:
duration: 200
card_type: cue_card
---
<img src='https://drive.google.com/uc?id=1pWeBbjWP3JIG5uD6yYT_I_gaEsH2xF-N' width='800'>
<img src='https://drive.google.com/uc?id=1XQZZ0LtnxBxSaW-Pi3nL1w_DUM0H09zm' width='800'>
Now, our optimization problem becomes:
$$\max \frac{2}{\|w\|}$$
i.e., maximize the margin, along with minimizing the errors $\xi_i$'s,
because we're trying to get the best possible classification.
Can we think of another way to write this?
Taking the reciprocal of the above equation:
$$\min \frac{\|w\|}{2}$$
along with minimizing the $\xi_i$'s.
<img src='https://drive.google.com/uc?id=1ReU2nrrZYrphbW-QEYeICUMZZrjh03pi' width='800'>
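Written out in full, the standard textbook soft-margin objective (with $C$ weighting the slack errors) is:
$$\min_{w,b,\xi} \ \frac{1}{2}\|w\|^2 + C\sum_{i} \xi_i \quad \text{s.t.} \quad y_i(w^Tx_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0$$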
---
title: Hyperparameters in SVM
description:
duration: 200
card_type: cue_card
---
<img src='https://drive.google.com/uc?id=1H09wg9b3P9vQJ5IFPAKR5ShSJhZ7jOck' width='800'>
<img src='https://drive.google.com/uc?id=11sfF-vASmq357iU7VVvzhYD21H6Dn6Wb' width='800'>
Therefore, we need to find a balance here.
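As a quick illustration (a minimal sketch on toy blobs, not the lecture's dataset), you can watch the margin shrink as C grows:
```python=
# Minimal sketch: effect of C on margin width (toy data, illustrative only)
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.2, random_state=0)

for C in [0.01, 1, 100]:
    clf = SVC(kernel='linear', C=C).fit(X, y)
    margin = 2 / np.linalg.norm(clf.coef_[0])  # geometric margin width
    print(f"C={C}: margin width = {margin:.3f}, "
          f"support vectors = {clf.n_support_.sum()}")
```
Smaller C tolerates more margin violations and gives a wider margin; larger C penalizes errors heavily and narrows it.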
---
title: Quiz 4
description:
duration: 60
card_type: quiz_card
---
# Question
What would happen when you use a very large value of C?
# Choices
- [x] We can still classify the training data correctly for the given value of C.
- [ ] We cannot classify the training data correctly for the given value of C.
- [ ] Can't say for sure
---
title: Algebraic intuition behind SVM
description:
duration: 200
card_type: cue_card
---
<img src='https://drive.google.com/uc?id=1WaNSTM_ks_ztOfykq0ic4-Somgb6pX_c' width='800'>
---
title: Hinge loss
description:
duration: 200
card_type: cue_card
---
<img src='https://drive.google.com/uc?id=1WIuz_R4HWi9bZfNTq9SwfVZddOcI0Pd7' width='800'>
<img src='https://drive.google.com/uc?id=1AyhARksxIXfedCiyxvnaQZ1I42xL9bo-' width='800'>
<img src='https://drive.google.com/uc?id=1e4gGFbMGXggB_tgU2f1KO429oT1I_vNX' width='800'>
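A minimal NumPy sketch of the hinge loss (assuming labels $y_i \in \{-1, +1\}$ and raw scores $f(x_i) = w^Tx_i + b$; the values below are illustrative):
```python=
import numpy as np

def hinge_loss(y, scores):
    # zero when y * f(x) >= 1 (correct side, outside the margin);
    # grows linearly with the margin violation otherwise
    return np.maximum(0, 1 - y * scores)

y = np.array([+1, +1, -1, -1])
scores = np.array([2.0, 0.5, -3.0, 0.2])
print(hinge_loss(y, scores))  # [0.  0.5 0.  1.2]
```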
---
title: Quiz 5
description:
duration: 60
card_type: quiz_card
---
# Question
x1, x2, x3 are -ve datapoints lying at distances of 0.2, 3.0, 1.0 units below the π⁻ plane;
what will be their respective ξᵢ?
# Choices
- [ ] 0.8, -2.0, 0.0
- [x] 0.2, 3.0, 1.0
- [ ] 0.8, 2.0, 0.0
---
title: Conclude
description:
duration: 200
card_type: cue_card
---
<img src='https://drive.google.com/uc?id=11N7XqR65bSqV4gmCYpEWNm4QSC3MFxvl' width='800'>
---
title: Comparison with Log Loss
description:
duration: 200
card_type: cue_card
---
<img src='https://drive.google.com/uc?id=1QrO3vqgtp2_9S7efvTjTO3_co_522cTG' width='800'>
<img src='https://drive.google.com/uc?id=16I2Ife_Ep2gJr2SW9LTo_oda3v6blnUM' width='800'>
\
We will not be deriving how we get this equation.
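To see the comparison without deriving it, here's a small sketch (my own, not from the lecture) plotting both losses against the signed score $z = y \cdot (w^Tx + b)$:
```python=
import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-3, 3, 200)
hinge = np.maximum(0, 1 - z)         # hinge loss
logloss = np.log(1 + np.exp(-z))     # log loss, y in {-1, +1} convention

plt.plot(z, hinge, label='hinge loss')
plt.plot(z, logloss, label='log loss')
plt.xlabel('z = y * (w^T x + b)')
plt.ylabel('loss')
plt.legend()
plt.show()
```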
---
title: Data Imbalance
description:
duration: 200
card_type: cue_card
---
<img src='https://drive.google.com/uc?id=1nu0yh8x8p09L3UPbiitQZ5H1KbS2qo1F' width='800'>
---
title: Quiz 6
description:
duration: 60
card_type: quiz_card
---
# Question
Will SVM be impacted if there's an imbalance in the number of datapoints belonging to each class?
# Choices
- [ ] True
- [x] False
---
title: Code
description:
duration: 200
card_type: cue_card
---
```python=
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from collections import Counter
from sklearn import feature_extraction, model_selection, naive_bayes, metrics, svm
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
!gdown 1QViUZJ5UIBCgxB_qbOXTLs_2V48w7MWo
df = pd.read_csv('Spam_processed.csv', encoding='latin-1')
df.dropna(inplace=True)  # drop rows with missing messages
print(df)
```
>Output
```
type message cleaned_message
0 0 Go until jurong point, crazy.. Available only ... go jurong point crazy available bugis n great ...
1 0 Ok lar... Joking wif u oni... ok lar joking wif u oni
2 1 Free entry in 2 a wkly comp to win FA Cup fina... free entry 2 wkly comp win fa cup final tkts 2...
3 0 U dun say so early hor... U c already then say... u dun say early hor u c already say
4 0 Nah I don't think he goes to usf, he lives aro... nah nt think goes usf lives around though
... ... ... ...
5567 1 This is the 2nd time we have tried 2 contact u... 2nd time tried 2 contact u u å750 pound prize ...
5568 0 Will Ì_ b going to esplanade fr home? ì_ b going esplanade fr home
5569 0 Pity, * was in mood for that. So...any other s... pity mood suggestions
5570 0 The guy did some bitching but I acted like i'd... guy bitching acted like interested buying some...
5571 0 Rofl. Its true to its name
```
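Before splitting, it's worth checking the class balance (`Counter` is already imported above; the exact counts depend on the rows dropped by `dropna`):
```python=
# quick label-distribution check: 0 = ham, 1 = spam
print(Counter(df['type']))
```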
- Performing train-test split
- with [CountVectorization](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html)
- and StandardScaler.
```python=
from sklearn.model_selection import train_test_split
df_X_train, df_X_test, y_train, y_test = train_test_split(df['cleaned_message'], df['type'],
                                                          test_size=0.25, random_state=47)
print([np.shape(df_X_train), np.shape(df_X_test)])
# CountVectorizer
f = feature_extraction.text.CountVectorizer()
X_train = f.fit_transform(df_X_train)
X_test = f.transform(df_X_test)
# StandardScaler
scaler = StandardScaler(with_mean=False)  # with_mean=True would densify the sparse matrix
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
print([np.shape(X_train), np.shape(X_test)])
print(type(X_train))
```
>Output
```
[(4173,), (1392,)]
[(4173, 7622), (1392, 7622)]
<class 'scipy.sparse._csr.csr_matrix'>
```
Let's train a Linear SVM on the given Spam/Ham data.
```python=
# SVC
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
params = {
    'C': [1e-4, 0.001, 0.01, 0.1, 1, 10]  # which hyperparam value of C do you think will work well?
}
svc = SVC(class_weight={0: 0.1, 1: 0.5}, kernel='linear')
clf = GridSearchCV(svc, params, scoring="f1", cv=3)
clf.fit(X_train, y_train)
```
>Output
```
GridSearchCV
GridSearchCV(cv=3,
estimator=SVC(class_weight={0: 0.1, 1: 0.5}, kernel='linear'),
param_grid={'C': [0.0001, 0.001, 0.01, 0.1, 1, 10]}, scoring='f1')
estimator: SVC
SVC
SVC(class_weight={0: 0.1, 1: 0.5}, kernel='linear')
```
```python=
res = clf.cv_results_
for i in range(len(res["params"])):
print(f"Parameters:{res['params'][i]} \n Mean score: {res['mean_test_score'][i]} \n Rank: {res['rank_test_score'][i]}")
```
>Output
```
Parameters:{'C': 0.0001}
Mean score: 0.6566305780023073
Rank: 6
Parameters:{'C': 0.001}
Mean score: 0.7742322485787693
Rank: 1
Parameters:{'C': 0.01}
Mean score: 0.767533370474547
Rank: 2
Parameters:{'C': 0.1}
Mean score: 0.7649416969151316
Rank: 3
Parameters:{'C': 1}
Mean score: 0.7649416969151316
Rank: 3
Parameters:{'C': 10}
Mean score: 0.7649416969151316
Rank: 3
```
As you can see,
- we get the best performance when $C=0.001$,
- with an F1 Score of 0.77.
\
Now let's fit this SVM with the best C and evaluate it on the test data.
```python=
svc = SVC(C=0.001, class_weight={0: 0.1, 1: 0.5}, kernel='linear')
svc.fit(X_train, y_train)
y_pred = svc.predict(X_test)
print(metrics.f1_score(y_test, y_pred))
```
>Output
```
0.8835820895522388
```
Linear SVM performs quite well
- on the Spam/Ham data
- with an F1 Score of 0.88
- when using class weights.
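As an optional follow-up (a sketch, not part of the original notebook), the class-wise behaviour can be inspected further:
```python=
# class-wise breakdown of the test predictions (0 = ham, 1 = spam)
print(metrics.confusion_matrix(y_test, y_pred))
print(metrics.classification_report(y_test, y_pred))
```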