---
title: Sklearn Implementation of Logistic regression
description:
duration: 15000
card_type: cue_card
---
Let's load the data for our business case now
```python=
# Churn prediction in telecom
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

!gdown 1uUt7uL-VuF_5cpodYRiriEwhsldeEp3m
churn = pd.read_csv("churn_logistic.csv")
churn.head()

# We will use 5 features for our logistic regression, selected using simple EDA.
# You can go through the EDA notebook to see how these features were chosen:
# https://colab.research.google.com/drive/1nkbiGCMrevDzdSG9yN5bXaxeC8CPJSQg?usp=sharing
```
```python=
cols = ['Day Mins', 'Eve Mins', 'Night Mins', 'CustServ Calls', 'Account Length']
y = churn["Churn"]
y = np.array(y).reshape(len(y), 1)  # Reshaping the target to (m, 1) shape
X = churn[cols]
X.shape
```
> Output
```
(5700, 5)
```
Let's split the data into training, validation and test sets
```python=
from sklearn.model_selection import train_test_split
X_tr_cv, X_test, y_tr_cv, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X_tr_cv, y_tr_cv, test_size=0.25,random_state=1)
X_train.shape
```
> Output
```
(3420, 5)
```
We will scale our data before fitting the model
```python=
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)
X_train
```
> Output
```
array([[-1.8525591 , -0.54121117, 1.87596728, 0.0724823 , 2.13378709],
[ 0.93155078, 1.05292599, 0.39854651, -0.54879454, -0.81991418],
[ 0.46912157, 0.11462924, 1.13324217, 0.0724823 , -2.27130187],
...,
[-0.52565742, -0.04014136, -0.68543069, 0.69375914, 0.55508469],
[-0.94359172, -0.58957698, -0.37428909, 1.93631281, -0.36158122],
[-0.58604336, 2.40910335, 1.70935597, 0.69375914, -0.25972945]])
```
```python=
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
print("coef = ",model.coef_)
print("intercept = ",model.intercept_)
```
> Output
```
coef = array([[0.6844725 , 0.29104522, 0.13637423, 0.79640697, 0.0613349 ]])
intercept = array([-0.01215015])
```
```python=
model.predict(X_train)
```
> Output
```
array([0, 1, 1, ..., 1, 1, 1])
```
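Though we won't use this further, a common way to read the coefficients printed above is as odds ratios: because the features were standardized, $e^{coef}$ tells us how the odds of churn multiply for a one-standard-deviation increase in that feature. A minimal sketch, assuming the fitted `model` and the `cols` list from above:
```python=
# Odds ratio per one standard-deviation increase in each (scaled) feature
odds_ratios = np.exp(model.coef_.ravel())
for feature, ratio in zip(cols, odds_ratios):
    print(f"{feature}: odds multiply by ~{ratio:.2f}")
```
For example, the coefficient of about 0.80 for 'CustServ Calls' corresponds to an odds ratio of roughly 2.2.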
---
title: Accuracy Metric
description:
duration: 15000
card_type: cue_card
---
<img src='https://drive.google.com/uc?id=1idIjt8sYlFbBdGy0udJz_NdAMeX_BAf9' width=700>
<img src='https://drive.google.com/uc?id=15_a3vpK24gFOY4BJHaFovUxMQ9nVWuNd' width=700>
Let's implement our accuracy metric now
```python=
def accuracy(y_true, y_pred):
    y_true = y_true.reshape(len(y_true))  # flatten (m, 1) targets to (m,)
    return np.sum(y_true == y_pred) / y_true.shape[0]
```
```python=
accuracy(y_train, model.predict(X_train))
```
> Output
```
0.7058479532163743
```
```python=
accuracy(y_val, model.predict(X_val))
```
> Output
```
0.6982456140350877
```
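As a quick sanity check, sklearn's built-in `accuracy_score` should agree with our implementation. A minimal sketch, assuming the arrays from above:
```python=
from sklearn.metrics import accuracy_score

# Should print the same value as accuracy(y_val, model.predict(X_val))
print(accuracy_score(y_val.ravel(), model.predict(X_val)))
```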
---
title: Quiz 1
description:
duration: 60
card_type: quiz_card
---
# Question
What is the main risk of overfitting when tuning hyperparameters in logistic regression?
# Choices
- [ ] The model may generalize well to unseen data but poorly on the training data
- [x] The model may perform well on the training data but poorly on unseen data
- [ ] The model may underperform compared to a model with default hyperparameter values
- [ ] The model may be too simple and fail to capture complex relationships in the data
---
title: Hyperparameter tuning
description:
duration: 15000
card_type: cue_card
---
We will tune the regularization strength of our model.
You can refer to the documentation for the full list of parameters in logistic regression.
Link: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
Let's start tuning the hyperparameter $C = 1/\lambda$ to improve the performance of the model.
```python=
from sklearn.pipeline import make_pipeline
train_scores = []
val_scores = []
scaler = StandardScaler()
for la in np.arange(0.01, 5000.0, 100):  # range of values of lambda
    scaled_lr = make_pipeline(scaler, LogisticRegression(C=1/la))
    scaled_lr.fit(X_train, y_train)
    train_score = accuracy(y_train, scaled_lr.predict(X_train))
    val_score = accuracy(y_val, scaled_lr.predict(X_val))
    train_scores.append(train_score)
    val_scores.append(val_score)
```
This code follows the same pattern we used earlier for hyperparameter tuning
```python=
!gdown 1bwRmKkPwmLKiqOgQ_LnKH0Vsc3mJKmVR
```
```python=
len(val_scores)
#Output = 50
```
Now, let's plot the graph and pick the Regularization Parameter λ which gives the best validation score
```python=
plt.figure(figsize=(10,5))
plt.plot(list(np.arange(0.01, 5000.0, 100)), train_scores, label="train")
plt.plot(list(np.arange(0.01, 5000.0, 100)), val_scores, label="val")
plt.legend(loc='lower right')
plt.xlabel("Regularization Parameter(λ)")
plt.ylabel("Accuracy")
plt.grid()
plt.show()
```
<img src='https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/043/497/original/download_%283%29.png?1692597511' width=700>
We see how the validation accuracy rises to a peak and then decreases
Notice that as the regularization keeps increasing, the accuracy keeps decreasing, since the model is moving towards underfitting
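This matches the usual behaviour of L2 regularization: as λ grows (i.e. C = 1/λ shrinks), the coefficients are pushed towards zero and the model moves towards underfitting. A minimal sketch, assuming the scaled `X_train` and `y_train` from above, that prints the coefficient norm for a few λ values:
```python=
# Coefficient magnitudes shrink as lambda grows (C = 1/lambda shrinks)
for la in [0.01, 1, 100, 10000]:
    lr = LogisticRegression(C=1/la)
    lr.fit(X_train, y_train.ravel())
    print(f"lambda = {la:>8}: ||coef|| = {np.linalg.norm(lr.coef_):.4f}")
```
The norm of the coefficient vector should decrease steadily as λ increases.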
---
title: Code
description:
duration: 15000
card_type: cue_card
---
Let's take the lambda value as 1000 for this data and check the results
```python=
model = LogisticRegression(C=1/1000)
model.fit(X_train, y_train)
print("Train acc = ",accuracy(y_train, model.predict(X_train)))
print("Test acc = ",accuracy(y_val, model.predict(X_val)))
```
> Output
```
Train acc = 0.7137426900584796
Val acc = 0.7096491228070175
```
We can observe an increase of roughly 0.01 (about 1 percentage point) in both the training and validation accuracy
Let's check our model for test data too
```python=
accuracy(y_test, model.predict(X_test))
```
> Output
```
0.7096491228070175
```
---
title: Quiz 2
description:
duration: 30
card_type: quiz_card
---
# Question
What is the effect of increasing the parameter C (= 1/λ, the inverse of the regularization strength) in logistic regression?
# Choices
- [ ] The model becomes less prone to overfitting
- [ ] The model's training accuracy increases
- [x] The model becomes more prone to overfitting
- [ ] The model's test accuracy increases
---
title: Quiz 3
description:
duration: 30
card_type: quiz_card
---
# Question
How does the regularization strength (λ) affect the magnitude of the model coefficients in logistic regression?
# Choices
- [ ] Higher λ results in larger coefficient values
- [x] Higher λ results in smaller coefficient values
- [ ] λ has no impact on the magnitude of the coefficients
- [ ] The effect of λ on the coefficients depends on the dataset
---
title: Quiz 4
description:
duration: 30
card_type: quiz_card
---
# Question
The logistic regression algorithm estimates the parameters by maximizing the:
# Choices
- [ ] Sum of squared errors
- [ ] Mean squared error
- [x] Likelihood function
- [ ] Cross-entropy loss
---
title: Log odds interpretation of logistic regression
description:
duration: 15000
card_type: cue_card
---
<img src='https://drive.google.com/uc?id=1z-0qkx0h81U_iwb7fVeFQG0RVkpqyPGy' width=700>
<img src='https://drive.google.com/uc?id=1mruiW2aBWCEMjW74WtAC3_AQoeDZ4EdJ' width=700>
#### Which earlier concept is this similar to?
Remember, $σ(p)$ also defined a probability.
So if we treat winning/losing as belonging to class 1/0, then $σ(p)$ here gives the probability of belonging to class 1 (the winning class)
---
title: Quiz 5
description:
duration: 30
card_type: quiz_card
---
# Question
The logistic regression model predicts:
# Choices
- [x] Probabilities
- [ ] Class labels
- [ ] Continuous values
- [ ] Ordinal values
---
title: Log odds interpretation of logistic regression - 2
description:
duration: 15000
card_type: cue_card
---
<img src='https://drive.google.com/uc?id=1Xpm2xAc1oT95bAzZvRQPUobikSRR2Fgs' width=800>
..
<img src='https://drive.google.com/uc?id=1XWM57akV5CFtG8JypxDELnpNokU6nLco' width=800>
What does this mean geometrically?
<img src='https://drive.google.com/uc?id=17CVyUuT9ZLlsqgWhsyKUChPP0o6Nlw33' width=800>
---
title: Quiz 6
description:
duration: 30
card_type: quiz_card
---
# Question
If log(odds) is negative, which of the following holds true?
.
# Choices
- [x] 1-p > p
- [ ] p > 1-p
- [ ] p == 1-p
---
title: Linear regression vs logistic
description:
duration: 15000
card_type: cue_card
---
<img src='https://drive.google.com/uc?id=1F7pWJ-_hmPbEe7LgaJhC9VESNrx0Y24x' width=800>
To find the probability $p$ of the point belonging to class 1, we simply exponentiate both sides and solve for $p$, which gives:
$p = \frac{1}{1 + e^{-z}}$
Note: the sigmoid and the logit are just inverses of each other, and either can be used to build a logistic regression model
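A quick numerical check of this inverse relationship, sketched with plain numpy:
```python=
# logit maps a probability p to log-odds z; the sigmoid maps z back to p
p = np.array([0.1, 0.5, 0.9])
z = np.log(p / (1 - p))          # logit(p), i.e. the log-odds
p_back = 1 / (1 + np.exp(-z))    # sigmoid(z)
print(z)        # roughly [-2.197, 0.0, 2.197]
print(p_back)   # recovers [0.1, 0.5, 0.9]
```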
---
title: Quiz 7
description:
duration: 30
card_type: quiz_card
---
# Question
What is the range of log odds in logistic regression?
.
# Choices
- [ ] (0, 1)
- [x] (-∞, ∞)
- [ ] [0, 1]
- [ ] [0, ∞)
---
title: Quiz 8
description:
duration: 30
card_type: quiz_card
---
# Question
How are log odds transformed into probabilities in logistic regression?
# Choices
- [x] By applying the sigmoid function
- [ ] By taking the exponential function
- [ ] By dividing by the odds ratio
- [ ] By subtracting the intercept term
---
title: Impact of outliers
description:
duration: 15000
card_type: cue_card
---
<img src='https://drive.google.com/uc?id=1aQk_WFojHob2thbycSBBC1hXx2cIL2Lh' width=700>
### Case I: When the outlier lies on the correct side
<img src='https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/036/753/original/image_2023-06-14_052158593.png?1686700322' width=700>
<img src='https://drive.google.com/uc?id=1iDeFLogS9rCNs1WiELMsFoRMIx_jRHZ8' width=700>
Since the loss is very small in this case:
=> The impact of the outlier is very small
### Case II: When the outlier is on the opposite/wrong side
<img src='https://drive.google.com/uc?id=1SKv32h8SUGk4pbOuS6XQnCv20LMnUV6V' width=700>
Let's say $z^i = -4.3$
So $\hat{y} = σ(z^i)$ becomes roughly 0.01
Therefore, L = $-log_e(0.01)$
This comes out almost equal to 4.6, which is a very large value
=> The impact of the outlier will be **very high**
Thus the best thing is to find such outliers and remove them, so that we get accurate results
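To make the contrast concrete, here is a tiny numpy sketch comparing the loss contribution of such a misclassified outlier with that of a confidently correct point (both with true label 1):
```python=
# Loss of a badly misclassified outlier vs a well-classified point (true label = 1)
y_hat_outlier = 0.01   # sigmoid(-4.3), rounded as above
y_hat_good = 0.99      # a confident, correct prediction
print(-np.log(y_hat_outlier))   # ~4.6  -> dominates the loss
print(-np.log(y_hat_good))      # ~0.01 -> negligible contribution
```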
---
title: Quiz 9
description:
duration: 45
card_type: quiz_card
---
# Question
How do outliers affect the classification boundaries in logistic regression?
# Choices
- [x] Outliers shift the classification boundaries closer to the outlier values
- [ ] Outliers have no effect on the classification boundaries
- [ ] Outliers widen the gap between the classification boundaries
- [ ] Outliers make the classification boundaries more sensitive to minor changes
---
title: Multi-class classification
description:
duration: 15000
card_type: cue_card
---
## Multi-class classification
Till now, we have seen how to use logistic regression to classify between two classes
But in the real world there will be cases with many more classes
#### How can we use logistic regression when there are more than two output classes?
<img src='https://drive.google.com/uc?id=1ZXmXc62oRRLsGOxNVvHi4GWITISWvL16' width=700>
<img src='https://drive.google.com/uc?id=1MSTuz_D9AJUZlHgDqMwQsBsyTLAE2gE7' width=700>
To train these models, we can't use the target column as-is, since it contains three classes.
So we will modify the target separately for each of the three models.
Say for model 1, which checks whether the input is orange or not,
the output column is modified by replacing 'orange' with 1 and all other values with 0 (a sketch of this relabeling is shown after the images below)
We will do the same for the other two models
<img src='https://drive.google.com/uc?id=1xCJJoF5j0HJILD0xfhI6hA_1RqwoefHz' width=700>
.
<img src='https://drive.google.com/uc?id=15kHWLomnIvIkr6EmzB1EiDpAddlOQ-q2' width=700>
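As a small illustration of this relabeling, here is a hedged sketch with a made-up `fruits` column (not the churn data above):
```python=
# Hypothetical example: build one binary target per class (one-vs-rest)
fruits = pd.Series(["orange", "apple", "banana", "orange", "apple"])

y_orange = (fruits == "orange").astype(int)   # model 1: orange vs rest
y_apple  = (fruits == "apple").astype(int)    # model 2: apple vs rest
y_banana = (fruits == "banana").astype(int)   # model 3: banana vs rest
print(y_orange.values)   # [1 0 0 1 0]
```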
---
title: Quiz 10
description:
duration: 30
card_type: quiz_card
---
# Question
We want to classify cars into 20 different brands.
How many logistic regression models will we need?
# Choices
- [ ] 10
- [x] 20
- [ ] 21
- [ ] 19
---
title: Multi-class classification - 2
description:
duration: 15000
card_type: cue_card
---
#### Now, given an input point, how do we predict which class it belongs to?
<img src='https://drive.google.com/uc?id=1RTcgUwMq12FlqHJBH3l0jl91mbfCMQxv' width=700>
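In short, we run the point through all the binary models and pick the class whose model outputs the highest probability. A minimal sketch, assuming three already-fitted binary models with hypothetical names `m_orange`, `m_apple`, `m_banana` and a single input row `x`:
```python=
# Pick the class whose one-vs-rest model is most confident
probs = {
    "orange": m_orange.predict_proba(x)[0, 1],   # P(class 1) for the orange-vs-rest model
    "apple":  m_apple.predict_proba(x)[0, 1],
    "banana": m_banana.predict_proba(x)[0, 1],
}
predicted_class = max(probs, key=probs.get)
```
sklearn's `LogisticRegression(multi_class='ovr')`, used in the code card below, performs this combination for us internally.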
---
title: Quiz 11
description:
duration: 30
card_type: quiz_card
---
# Question
For three models, the yhat values come to be:
M1=0.34
M2=0.28
M3=0.35
What would be the predicted output class by the classifier?
# Choices
- [ ] M1
- [ ] M2
- [x] M3
- [ ] None since no model has yhat>0.5
---
title: Quiz 12
description:
duration: 45
card_type: quiz_card
---
# Question
What is the purpose of the one-vs-rest (OvR) strategy in multi-class logistic regression?
# Choices
- [ ] To improve the interpretability of the model coefficients
- [ ] To handle imbalanced datasets in multi-class problems
- [ ] To reduce the complexity of the model
- [x] To transform a multi-class problem into multiple binary classification problems
---
title: Multi-class classification - Code
description:
duration: 15000
card_type: cue_card
---
Let's see an implementation of the same using sklearn
```python=
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.inspection import DecisionBoundaryDisplay
```
Creating some data with multiple classes
```python=
# Dataset creation with 3 classes
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=498,
                           n_features=2,
                           n_classes=3,
                           n_redundant=0,
                           n_clusters_per_class=1,
                           random_state=5)
y = y.reshape(len(y), 1)
print(X.shape, y.shape)
```
> Output
```
(498, 2) (498, 1)
```
Plotting the data
```python=
plt.scatter(X[:, 0], X[:, 1], c=y.ravel())  # flatten y so it is used as a 1-D colour array
plt.show()
```
> Output
<img src='https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/043/499/original/download_%284%29.png?1692600337' width=700>
Splitting the data into train, validation and test sets
```python=
from sklearn.model_selection import train_test_split
X_tr_cv, X_test, y_tr_cv, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
X_train, X_val, y_train, y_val = train_test_split(X_tr_cv, y_tr_cv, test_size=0.25,random_state=4)
X_train.shape
```
> Output
```
(298, 2)
```
Training the one-vs-rest (OvR) logistic regression model
```python=
model = LogisticRegression(multi_class='ovr')
# fit model
model.fit(X_train, y_train)
print(f'Training Accuracy:{model.score(X_train,y_train)}')
print(f'Validation Accuracy :{model.score(X_val,y_val)}')
print(f'Test Accuracy:{model.score(X_test,y_test)}')
```
> Output
```
Training Accuracy:0.9161073825503355
Validation Accuracy :0.91
Test Accuracy:0.91
```
Plotting the decision surface of the OvR LogisticRegression, along with the three one-vs-rest hyperplanes, over the entire data
```python=
_, ax = plt.subplots()
DecisionBoundaryDisplay.from_estimator(model, X, response_method="predict", cmap=plt.cm.Paired, ax=ax)
plt.title("Decision surface of LogisticRegression")
plt.axis("tight")

# Plot also the training points
colors = "bry"
for i, color in zip(model.classes_, colors):
    idx = np.where(y.ravel() == i)  # flatten y so we get row indices
    plt.scatter(
        X[idx, 0], X[idx, 1], c=color, cmap=plt.cm.Paired, edgecolor="black", s=20
    )

# Plot the three one-against-all classifiers
xmin, xmax = plt.xlim()
ymin, ymax = plt.ylim()
coef = model.coef_
intercept = model.intercept_

def plot_hyperplane(c, color):
    def line(x0):
        return (-(x0 * coef[c, 0]) - intercept[c]) / coef[c, 1]
    plt.plot([xmin, xmax], [line(xmin), line(xmax)], ls="--", color=color)

for i, color in zip(model.classes_, colors):
    plot_hyperplane(i, color)
plt.show()
```
> Output
<img src='https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/043/501/original/download_%285%29.png?1692600599' width=700>
Observe:
We can see how one-vs-rest logistic regression is able to classify the multi-class data by combining the three binary decision boundaries
---
title: Quiz 13
description:
duration: 30
card_type: quiz_card
---
# Question
Which evaluation metric is commonly used to assess the performance of a logistic regression model?
# Choices
- [ ] Mean squared error
- [ ] R-squared value
- [x] Accuracy
- [ ] Root mean squared error
---
title: Quiz 14
description:
duration: 30
card_type: quiz_card
---
# Question
Logistic regression assumes that the relationship between the independent variables and the log-odds of the dependent variable is:
# Choices
- [ ] Exponential
- [ ] Quadratic
- [ ] Non-linear
- [x] Linear
---
title: Quiz 15
description:
duration: 30
card_type: quiz_card
---
# Question
How is the loss function typically defined in multi-class logistic regression?
# Choices
- [x] Cross-entropy loss
- [ ] Mean squared error (MSE)
- [ ] Mean absolute error (MAE)
- [ ] Hinge loss