---
title: Sklearn Implementation of Logistic regression
description:
duration: 15000
card_type: cue_card
---
Let's load the data for our business case now
```python=
# Churn prediction in telecom
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

!gdown 1uUt7uL-VuF_5cpodYRiriEwhsldeEp3m
churn = pd.read_csv("churn_logistic.csv")
churn.head()

# We will use 5 features for our logistic regression, selected using simple EDA.
# You can go through the EDA notebook to see how these features were chosen:
# https://colab.research.google.com/drive/1nkbiGCMrevDzdSG9yN5bXaxeC8CPJSQg?usp=sharing
```
```python=
cols = ['Day Mins', 'Eve Mins', 'Night Mins', 'CustServ Calls', 'Account Length']
y = churn["Churn"]
y = np.array(y).reshape(len(y), 1)  # Reshaping the target to (m, 1) shape
X = churn[cols]
X.shape
```
> Output
```
(5700, 5)
```
Let's split the data into training, validation and test sets
```python=
from sklearn.model_selection import train_test_split
X_tr_cv, X_test, y_tr_cv, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_tr_cv, y_tr_cv, test_size=0.25,random_state=1)
X_train.shape
```
> Output
```
(3420, 5)
```
We will scale our data before fitting the model
```python=
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)
X_train
```
> Output
```
array([[-1.8525591 , -0.54121117, 1.87596728, 0.0724823 , 2.13378709],
[ 0.93155078, 1.05292599, 0.39854651, -0.54879454, -0.81991418],
[ 0.46912157, 0.11462924, 1.13324217, 0.0724823 , -2.27130187],
...,
[-0.52565742, -0.04014136, -0.68543069, 0.69375914, 0.55508469],
[-0.94359172, -0.58957698, -0.37428909, 1.93631281, -0.36158122],
[-0.58604336, 2.40910335, 1.70935597, 0.69375914, -0.25972945]])
```
```python=
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
print("coef = ",model.coef_)
print("intercept = ",model.intercept_)
```
> Output
```
coef = array([[0.6844725 , 0.29104522, 0.13637423, 0.79640697, 0.0613349 ]])
intercept = array([-0.01215015])
```
```python=
model.predict(X_train)
```
> Output
```
array([0, 1, 1, ..., 1, 1, 1])
```
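Though we won't use this further, a common way to read the coefficients printed above is as odds ratios: because the features were standardized, $e^{coef}$ tells us how the odds of churn multiply for a one-standard-deviation increase in that feature. A minimal sketch, assuming the fitted `model` and the `cols` list from above:
```python=
# Odds ratio per one standard-deviation increase in each (scaled) feature
odds_ratios = np.exp(model.coef_.ravel())
for feature, ratio in zip(cols, odds_ratios):
    print(f"{feature}: odds multiply by ~{ratio:.2f}")
```
For example, the coefficient of about 0.80 for 'CustServ Calls' corresponds to an odds ratio of roughly 2.2.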
---
title: Accuracy Metric
description:
duration: 15000
card_type: cue_card
---
<img src='https://drive.google.com/uc?id=1idIjt8sYlFbBdGy0udJz_NdAMeX_BAf9' width=700>
<img src='https://drive.google.com/uc?id=15_a3vpK24gFOY4BJHaFovUxMQ9nVWuNd' width=700>
Let's implement our accuracy metric now
```python=
def accuracy(y_true, y_pred):
    y_true = y_true.reshape(len(y_true))  # flatten (m, 1) targets to (m,)
    return np.sum(y_true == y_pred) / y_true.shape[0]
```
```python=
accuracy(y_train, model.predict(X_train))
```
> Output
```
0.7058479532163743
```
```python=
accuracy(y_val, model.predict(X_val))
```
> Output
```
0.6982456140350877
```
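As a quick sanity check, sklearn's built-in `accuracy_score` should agree with our implementation. A minimal sketch, assuming the arrays from above:
```python=
from sklearn.metrics import accuracy_score

# Should print the same value as accuracy(y_val, model.predict(X_val))
print(accuracy_score(y_val.ravel(), model.predict(X_val)))
```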
---
title: Quiz 1
description:
duration: 60
card_type: quiz_card
---
# Question
What is the main risk of overfitting when tuning hyperparameters in logistic regression?
# Choices
- [ ] The model may generalize well to unseen data but poorly on the training data
- [x] The model may perform well on the training data but poorly on unseen data
- [ ] The model may underperform compared to a model with default hyperparameter values
- [ ] The model may be too simple and fail to capture complex relationships in the data
---
title: Hyperparameter tuning
description:
duration: 15000
card_type: cue_card
---
We will tune the regularization strength of our model.
You can refer to the documentation for the full list of parameters in logistic regression.
Link: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
Let's start tuning the hyperparameter $C = 1/\lambda$ to improve the performance of the model.
```python=
from sklearn.pipeline import make_pipeline
train_scores = []
val_scores = []
scaler = StandardScaler()
for la in np.arange(0.01, 5000.0, 100):  # range of values of lambda
    scaled_lr = make_pipeline(scaler, LogisticRegression(C=1/la))
    scaled_lr.fit(X_train, y_train)
    train_score = accuracy(y_train, scaled_lr.predict(X_train))
    val_score = accuracy(y_val, scaled_lr.predict(X_val))
    train_scores.append(train_score)
    val_scores.append(val_score)
```
This code follows the same pattern we used earlier for hyperparameter tuning
```python=
!gdown 1bwRmKkPwmLKiqOgQ_LnKH0Vsc3mJKmVR
```
```python=
len(val_scores)
#Output = 50
```
Now, let's plot the graph and pick the Regularization Parameter λ which gives the best validation score
```python=
plt.figure(figsize=(10,5))
plt.plot(list(np.arange(0.01, 5000.0, 100)), train_scores, label="train")
plt.plot(list(np.arange(0.01, 5000.0, 100)), val_scores, label="val")
plt.legend(loc='lower right')
plt.xlabel("Regularization Parameter(λ)")
plt.ylabel("Accuracy")
plt.grid()
plt.show()
```
<img src='https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/043/497/original/download_%283%29.png?1692597511' width=700>
We see how the validation accuracy rises to a peak and then decreases
Notice that as the regularization keeps increasing, the accuracy keeps decreasing, since the model is moving towards underfitting
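This matches the usual behaviour of L2 regularization: as λ grows (i.e. C = 1/λ shrinks), the coefficients are pushed towards zero and the model moves towards underfitting. A minimal sketch, assuming the scaled `X_train` and `y_train` from above, that prints the coefficient norm for a few λ values:
```python=
# Coefficient magnitudes shrink as lambda grows (C = 1/lambda shrinks)
for la in [0.01, 1, 100, 10000]:
    lr = LogisticRegression(C=1/la)
    lr.fit(X_train, y_train.ravel())
    print(f"lambda = {la:>8}: ||coef|| = {np.linalg.norm(lr.coef_):.4f}")
```
The norm of the coefficient vector should decrease steadily as λ increases.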
---
title: Code
description:
duration: 15000
card_type: cue_card
---
Let's take the lambda value as 1000 for this data and check the results
```python=
model = LogisticRegression(C=1/1000)
model.fit(X_train, y_train)
print("Train acc = ",accuracy(y_train, model.predict(X_train)))
print("Test acc = ",accuracy(y_val, model.predict(X_val)))
```
> Output
```
Train acc = 0.7137426900584796
Val acc = 0.7096491228070175
```
We can observe an increase of roughly 0.01 (about 1 percentage point) in both the training and validation accuracy
Let's check our model for test data too
```python=
accuracy(y_test, model.predict(X_test))
```
> Output
```
0.7096491228070175
```
---
title: Quiz 2
description:
duration: 30
card_type: quiz_card
---
# Question
What is the effect of increasing the parameter C (= 1/λ, the inverse of the regularization strength) in logistic regression?
# Choices
- [ ] The model becomes less prone to overfitting
- [ ] The model's training accuracy increases
- [x] The model becomes more prone to overfitting
- [ ] The model's test accuracy increases
---
title: Quiz 3
description:
duration: 30
card_type: quiz_card
---
# Question
How does the regularization strength (λ) affect the magnitude of the model coefficients in logistic regression?
# Choices
- [ ] Higher λ results in larger coefficient values
- [x] Higher λ results in smaller coefficient values
- [ ] λ has no impact on the magnitude of the coefficients
- [ ] The effect of λ on the coefficients depends on the dataset
---
title: Quiz 4
description:
duration: 30
card_type: quiz_card
---
# Question
The logistic regression algorithm estimates the parameters by maximizing the:
# Choices
- [ ] Sum of squared errors
- [ ] Mean squared error
- [x] Likelihood function
- [ ] Cross-entropy loss
---
title: Log odds interpretation of logistic regression
description:
duration: 15000
card_type: cue_card
---
<img src='https://drive.google.com/uc?id=1z-0qkx0h81U_iwb7fVeFQG0RVkpqyPGy' width=700>
<img src='https://drive.google.com/uc?id=1mruiW2aBWCEMjW74WtAC3_AQoeDZ4EdJ' width=700>
#### Which earlier concept is this similar to?
Remember, $σ(p)$ also defined a probability.
So if we treat winning/losing as belonging to class 1/0, then $σ(p)$ here gives the probability of belonging to class 1 (the winning class)
---
title: Quiz 5
description:
duration: 30
card_type: quiz_card
---
# Question
The logistic regression model predicts:
# Choices
- [x] Probabilities
- [ ] Class labels
- [ ] Continuous values
- [ ] Ordinal values
---
title: Log odds interpretation of logistic regression - 2
description:
duration: 15000
card_type: cue_card
---
<img src='https://drive.google.com/uc?id=1Xpm2xAc1oT95bAzZvRQPUobikSRR2Fgs' width=800>
..
<img src='https://drive.google.com/uc?id=1XWM57akV5CFtG8JypxDELnpNokU6nLco' width=800>
What does this mean geometrically?
<img src='https://drive.google.com/uc?id=17CVyUuT9ZLlsqgWhsyKUChPP0o6Nlw33' width=800>
---
title: Quiz 6
description:
duration: 30
card_type: quiz_card
---
# Question
If log(odds) is negative, which of the following holds true?
.
# Choices
- [x] 1-p > p
- [ ] p > 1-p
- [ ] p == 1-p
---
title: Linear regression vs logistic
description:
duration: 15000
card_type: cue_card
---
<img src='https://drive.google.com/uc?id=1F7pWJ-_hmPbEe7LgaJhC9VESNrx0Y24x' width=800>
To find the probability $p$ of the point belonging to class 1, we simply exponentiate both sides and solve for $p$, which gives:
$p = \frac{1}{1 + e^{-z}}$
Note: the sigmoid and the logit are just inverses of each other, and either can be used to build a logistic regression model
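A quick numerical check of this inverse relationship, sketched with plain numpy:
```python=
# logit maps a probability p to log-odds z; the sigmoid maps z back to p
p = np.array([0.1, 0.5, 0.9])
z = np.log(p / (1 - p))          # logit(p), i.e. the log-odds
p_back = 1 / (1 + np.exp(-z))    # sigmoid(z)
print(z)        # roughly [-2.197, 0.0, 2.197]
print(p_back)   # recovers [0.1, 0.5, 0.9]
```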
---
title: Quiz 7
description:
duration: 30
card_type: quiz_card
---
# Question
What is the range of log odds in logistic regression?
.
# Choices
- [ ] (0, 1)
- [x] (-∞, ∞)
- [ ] [0, 1]
- [ ] [0, ∞)
---
title: Quiz 8
description:
duration: 30
card_type: quiz_card
---
# Question
How are log odds transformed into probabilities in logistic regression?
# Choices
- [x] By applying the sigmoid function
- [ ] By taking the exponential function
- [ ] By dividing by the odds ratio
- [ ] By subtracting the intercept term
---
title: Impact of outliers
description:
duration: 15000
card_type: cue_card
---
<img src='https://drive.google.com/uc?id=1aQk_WFojHob2thbycSBBC1hXx2cIL2Lh' width=700>
### Case I: When the outlier lies on the correct side
<img src='https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/036/753/original/image_2023-06-14_052158593.png?1686700322' width=700>
<img src='https://drive.google.com/uc?id=1iDeFLogS9rCNs1WiELMsFoRMIx_jRHZ8' width=700>
Since the loss is very small in this case:
=> The impact of the outlier is very small
### Case II: When the outlier is on the opposite/wrong side
<img src='https://drive.google.com/uc?id=1SKv32h8SUGk4pbOuS6XQnCv20LMnUV6V' width=700>
Let's say $z^i = -4.3$
So $\hat{y} = σ(z^i)$ becomes roughly 0.01
Therefore, L = $-log_e(0.01)$
This comes out almost equal to 4.6, which is a very large value
=> The impact of the outlier will be **very high**
Thus the best thing is to find such outliers and remove them, so that we get accurate results
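To make the contrast concrete, here is a tiny numpy sketch comparing the loss contribution of such a misclassified outlier with that of a confidently correct point (both with true label 1):
```python=
# Loss of a badly misclassified outlier vs a well-classified point (true label = 1)
y_hat_outlier = 0.01   # sigmoid(-4.3), rounded as above
y_hat_good = 0.99      # a confident, correct prediction
print(-np.log(y_hat_outlier))   # ~4.6  -> dominates the loss
print(-np.log(y_hat_good))      # ~0.01 -> negligible contribution
```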
---
title: Quiz 9
description:
duration: 45
card_type: quiz_card
---
# Question
How do outliers affect the classification boundaries in logistic regression?
# Choices
- [x] Outliers shift the classification boundaries closer to the outlier values
- [ ] Outliers have no effect on the classification boundaries
- [ ] Outliers widen the gap between the classification boundaries
- [ ] Outliers make the classification boundaries more sensitive to minor changes
---
title: Multi-class classification
description:
duration: 15000
card_type: cue_card
---
## Multi-class classification
Till now, we have seen how to use logistic regression to classify between two classes
But in the real world there will be cases with many more classes
#### How can we use logistic regression when there are more than two output classes?
<img src='https://drive.google.com/uc?id=1ZXmXc62oRRLsGOxNVvHi4GWITISWvL16' width=700>
<img src='https://drive.google.com/uc?id=1MSTuz_D9AJUZlHgDqMwQsBsyTLAE2gE7' width=700>
To train these models, we can't use the target column as-is, since it contains three classes.
So we will modify the target separately for each of the three models.
Say for model 1, which checks whether the input is orange or not,
the output column is modified by replacing 'orange' with 1 and all other values with 0 (a sketch of this relabeling is shown after the images below)
We will do the same for the other two models
<img src='https://drive.google.com/uc?id=1xCJJoF5j0HJILD0xfhI6hA_1RqwoefHz' width=700>
.
<img src='https://drive.google.com/uc?id=15kHWLomnIvIkr6EmzB1EiDpAddlOQ-q2' width=700>
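As a small illustration of this relabeling, here is a hedged sketch with a made-up `fruits` column (not the churn data above):
```python=
# Hypothetical example: build one binary target per class (one-vs-rest)
fruits = pd.Series(["orange", "apple", "banana", "orange", "apple"])

y_orange = (fruits == "orange").astype(int)   # model 1: orange vs rest
y_apple  = (fruits == "apple").astype(int)    # model 2: apple vs rest
y_banana = (fruits == "banana").astype(int)   # model 3: banana vs rest
print(y_orange.values)   # [1 0 0 1 0]
```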
---
title: Quiz 10
description:
duration: 30
card_type: quiz_card
---
# Question
We want to classify cars into 20 different brands.
How many logistic regression models will we need?
# Choices
- [ ] 10
- [x] 20
- [ ] 21
- [ ] 19
---
title: Multi-class classification - 2
description:
duration: 15000
card_type: cue_card
---
#### Now, given an input point, how do we predict which class it belongs to?
<img src='https://drive.google.com/uc?id=1RTcgUwMq12FlqHJBH3l0jl91mbfCMQxv' width=700>
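In short, we run the point through all the binary models and pick the class whose model outputs the highest probability. A minimal sketch, assuming three already-fitted binary models with hypothetical names `m_orange`, `m_apple`, `m_banana` and a single input row `x`:
```python=
# Pick the class whose one-vs-rest model is most confident
probs = {
    "orange": m_orange.predict_proba(x)[0, 1],   # P(class 1) for the orange-vs-rest model
    "apple":  m_apple.predict_proba(x)[0, 1],
    "banana": m_banana.predict_proba(x)[0, 1],
}
predicted_class = max(probs, key=probs.get)
```
sklearn's `LogisticRegression(multi_class='ovr')`, used in the code card below, performs this combination for us internally.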
---
title: Quiz 11
description:
duration: 30
card_type: quiz_card
---
# Question
For three models, the yhat values come to be:
M1=0.34
M2=0.28
M3=0.35
What would be the predicted output class by the classifier?
# Choices
- [ ] M1
- [ ] M2
- [x] M3
- [ ] None since no model has yhat>0.5
---
title: Quiz 12
description:
duration: 45
card_type: quiz_card
---
# Question
What is the purpose of the one-vs-rest (OvR) strategy in multi-class logistic regression?
# Choices
- [ ] To improve the interpretability of the model coefficients
- [ ] To handle imbalanced datasets in multi-class problems
- [ ] To reduce the complexity of the model
- [x] To transform a multi-class problem into multiple binary classification problems
---
title: Multi-class classification - Code
description:
duration: 15000
card_type: cue_card
---
Let's see an implementation of the same using sklearn
```python=
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.inspection import DecisionBoundaryDisplay
```
Creating some data with multiple classes
```python=
# Dataset creation with 3 classes
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=498,
                           n_features=2,
                           n_classes=3,
                           n_redundant=0,
                           n_clusters_per_class=1,
                           random_state=5)
y = y.reshape(len(y), 1)
print(X.shape, y.shape)
```
> Output
```
(498, 2) (498, 1)
```
Plotting the data
```python=
plt.scatter(X[:, 0], X[:, 1], c=y.ravel())  # flatten y so it is used as a 1-D colour array
plt.show()
```
> Output
<img src='https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/043/499/original/download_%284%29.png?1692600337' width=700>
Splitting the data into train, validation and test sets
```python=
from sklearn.model_selection import train_test_split
X_tr_cv, X_test, y_tr_cv, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
X_train, X_val, y_train, y_val = train_test_split(X_tr_cv, y_tr_cv, test_size=0.25,random_state=4)
X_train.shape
```
> Output
```
(298, 2)
```
Training the one-vs-rest (OvR) logistic regression model
```python=
model = LogisticRegression(multi_class='ovr')
# fit model
model.fit(X_train, y_train)
print(f'Training Accuracy:{model.score(X_train,y_train)}')
print(f'Validation Accuracy :{model.score(X_val,y_val)}')
print(f'Test Accuracy:{model.score(X_test,y_test)}')
```
> Output
```
Training Accuracy:0.9161073825503355
Validation Accuracy :0.91
Test Accuracy:0.91
```
Plotting the decision surface of the OvR LogisticRegression, along with the three one-vs-rest hyperplanes, over the entire data
```python=
_, ax = plt.subplots()
DecisionBoundaryDisplay.from_estimator(model, X, response_method="predict", cmap=plt.cm.Paired, ax=ax)
plt.title("Decision surface of LogisticRegression")
plt.axis("tight")

# Plot also the training points
colors = "bry"
for i, color in zip(model.classes_, colors):
    idx = np.where(y.ravel() == i)  # flatten y so we get row indices
    plt.scatter(
        X[idx, 0], X[idx, 1], c=color, cmap=plt.cm.Paired, edgecolor="black", s=20
    )

# Plot the three one-against-all classifiers
xmin, xmax = plt.xlim()
ymin, ymax = plt.ylim()
coef = model.coef_
intercept = model.intercept_

def plot_hyperplane(c, color):
    def line(x0):
        return (-(x0 * coef[c, 0]) - intercept[c]) / coef[c, 1]
    plt.plot([xmin, xmax], [line(xmin), line(xmax)], ls="--", color=color)

for i, color in zip(model.classes_, colors):
    plot_hyperplane(i, color)
plt.show()
```
> Output
<img src='https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/043/501/original/download_%285%29.png?1692600599' width=700>
Observe:
We can see how one-vs-rest logistic regression is able to classify the multi-class data by combining the three binary decision boundaries
---
title: Quiz 13
description:
duration: 30
card_type: quiz_card
---
# Question
Which evaluation metric is commonly used to assess the performance of a logistic regression model?
# Choices
- [ ] Mean squared error
- [ ] R-squared value
- [x] Accuracy
- [ ] Root mean squared error
---
title: Quiz 14
description:
duration: 30
card_type: quiz_card
---
# Question
Logistic regression assumes that the relationship between the independent variables and the log-odds of the dependent variable is:
# Choices
- [ ] Exponential
- [ ] Quadratic
- [ ] Non-linear
- [x] Linear
---
title: Quiz 15
description:
duration: 30
card_type: quiz_card
---
# Question
How is the loss function typically defined in multi-class logistic regression?
# Choices
- [x] Cross-entropy loss
- [ ] Mean squared error (MSE)
- [ ] Mean absolute error (MAE)
- [ ] Hinge loss