IML: Supervised learning
Supervised learning
Supervised learning: the process of teaching a model by feeding it input data together with the correct output data. The model will (hopefully) deduce the correct relationship between inputs and outputs
- An input/output pair is called labeled data
- All pairs form the training set
- Once training is completed, the model can infer new outputs if fed with new inputs.
Given some training data $\{(x_i, y_i)\}_{i=1}^n$, supervised learning aims at finding a model $f$ correctly mapping input data to their respective outputs
- The model can predict new outputs
- The learning mechanism is called regression or classification
Managing data for supervised learning
Hold out some of the data during training (the test data) to evaluate model performance afterwards → train/test split
Use a validation set (also held out from the training data) if model parameters are iteratively adjusted → train/validation split
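A minimal sketch of both splits, using scikit-learn for illustration (the synthetic data and split proportions are arbitrary choices):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # 100 samples, 5 features (synthetic)
y = (X[:, 0] > 0).astype(int)            # toy binary labels

# Hold out a test set, kept untouched until the final evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Split the remainder again into train/validation for iterative tuning
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)
```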
Stratified sampling
For classification purposes
Classes might be imbalanced → use stratified sampling to guarantee a fair balance of train/test samples for each class
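With scikit-learn, stratification is one argument away; a sketch on a deliberately imbalanced toy dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

y = np.array([0] * 90 + [1] * 10)        # imbalanced labels: 90% vs 10%
X = np.arange(100).reshape(-1, 1)        # dummy feature

# stratify=y preserves the 90/10 class ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
print(y_test.mean())                     # 0.1, same ratio as the full dataset
```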
Regression
The art of predicting values
Regression: the output value to predict is quantitative (real number)
How to mathematically model the relationship between predictor variables and their numerical output?
Linear regression
Sometimes, there's no need for a complicated model…
Ordinary Least Squares
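OLS fits $\hat\beta = \arg\min_\beta \|y - X\beta\|_2^2$, which has the closed-form solution $\hat\beta = (X^\top X)^{-1} X^\top y$. A minimal NumPy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(size=50)      # y = 2x + 1 + noise

X = np.column_stack([np.ones_like(x), x])    # prepend an intercept column
beta = np.linalg.solve(X.T @ X, X.T @ y)     # closed-form OLS solution
print(beta)                                  # ≈ [1.0, 2.0]
```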
Anscombe's quartet
For all 4 datasets
The 3rd regression has an outlier, i.e. a data point very far from the others, which risks skewing the regression (probably due to a sensor that crapped out)
The linear regression line and the $R^2$ coefficient are the SAME for all 4 datasets
Least absolute deviation
Linear regression by OLS is sensitive to outliers (thank you, $\ell_2$ norm…)
Is it a good idea?
- The LAD estimator is the MLE of $\beta$ when the noise follows a Laplace distribution
- No analytical formula for LAD
- Harder to find the solution
- Must use a gradient descent approach
- The LAD solution may not be unique
All the lines inside the cone are optimal
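Since LAD has no closed form, a (sub)gradient descent on $\sum_i |y_i - x_i^\top \beta|$ is one option. A rough sketch (step size and iteration count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(size=50)
y[0] = 100.0                                 # inject an outlier

X = np.column_stack([np.ones_like(x), x])
beta = np.zeros(2)
for _ in range(5000):
    residual = y - X @ beta
    # subgradient of sum(|residual|) w.r.t. beta is -X.T @ sign(residual)
    beta += 0.01 * X.T @ np.sign(residual) / len(y)
print(beta)                                  # ≈ [1, 2] despite the outlier
```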
Adding some regularization
Add a penalty term to OLS to enforce particular properties on the coefficients $\beta$
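Two standard penalties are the $\ell_2$ norm (ridge regression: shrinks the coefficients) and the $\ell_1$ norm (lasso: promotes sparsity). A minimal scikit-learn sketch (penalty strengths are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
beta_true = np.array([3.0, -2.0] + [0.0] * 8)    # only 2 informative features
y = X @ beta_true + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)   # l2 penalty: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # l1 penalty: tends to zero out useless ones
print(ridge.coef_.round(2))
print(lasso.coef_.round(2))
```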
From regression to classification
Logistic regression
Linear regression predicts a real value based on the predictor variables
- Does not work if the output $y$ is boolean, i.e. $y \in \{0, 1\}$
- Use logistic regression instead
Linear relationship between the predictor variables and the logit of the event $y = 1$:
$$\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_d x_d, \quad p = P(y = 1)$$
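A minimal scikit-learn sketch on a synthetic boolean outcome:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
print(clf.predict(X[:5]))            # hard class predictions (0 or 1)
print(clf.predict_proba(X[:5]))      # P(y=0) and P(y=1) from the fitted logit model
```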
k-nearest neighbors
The k-NN classifier simply assigns a test data point to the majority class in the neighborhood of that point
Choosing k
- small k: simple but noisy decision boundary
- large k: smoother boundaries but computationally intensive
- $k = \sqrt{n}$ can also serve as a starting heuristic, refined by cross-validation (see the sketch below)
- k should be odd for binary classification (to avoid ties)
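A sketch combining both ideas, fitting a k-NN classifier and picking k by cross-validation over a small grid of odd values (grid and data are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# 5-fold cross-validation over odd values of k
search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7, 9, 11]}, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```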
k-nearest neighbors for regression
Use the k nearest neighbors (in terms of the features only) and average their output values to get the prediction
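The same idea in scikit-learn, as a minimal sketch:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=100)

# Prediction = average of the targets of the 5 nearest training points
reg = KNeighborsRegressor(n_neighbors=5).fit(X, y)
print(reg.predict([[2.0], [7.5]]))
```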
Support Vector Machine
Linear SVM
Training set: $\{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$
Goal: find the hyperplane that best divides the positive samples from the negative samples
What would we like to do here?
Take an average
We are looking for the line that passes most centrally between the two classes
Reminder: the dot product of 2 collinear vectors is $u \cdot v = \pm\,\|u\|\,\|v\|$ (positive if they point in the same direction)
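A minimal linear SVM sketch with scikit-learn, on synthetic $\pm 1$-labeled data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=+2.0, size=(50, 2)),    # positive samples
               rng.normal(loc=-2.0, size=(50, 2))])   # negative samples
y = np.array([+1] * 50 + [-1] * 50)

clf = SVC(kernel="linear").fit(X, y)
print(clf.coef_, clf.intercept_)       # (w, b) of the separating hyperplane
print(clf.support_vectors_.shape)      # only a few points define the margin
```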

Soft margin SVM
Data may not be fully linearly separable → allow some margin violations, penalized by a cost parameter $C$
Kernel SVM
Remember the kernel trick?
Kernel trick:
- map data points into a high dimensional space where they would become linearly separable
- effortlessly interfaced with the SVM by replacing the dot product with its kernelized version

$$K(x, x') = \langle \phi(x), \phi(x') \rangle$$
Widely used kernel functions:
- Polynomial kernel: $K(x, x') = (x^\top x' + c)^d$
- Gaussian RBF kernel: $K(x, x') = \exp\left(-\gamma \|x - x'\|^2\right)$
- Sigmoid kernel: $K(x, x') = \tanh(\gamma\, x^\top x' + c)$
Choosing the right kernel with the right hyperparameters
Kernel → try linear first. If it does not work, RBF is probably the best kernel choice (unless you have some prior information on the geometry of your dataset)
Hyperparameters ($C$ + kernel parameter(s)) → grid search and cross-validation
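A sketch of that workflow, grid search over $C$ and the RBF parameter $\gamma$ with 5-fold cross-validation (grid values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```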
Multiclass SVM
What if we have more than 2 classes?
2 possible strategies
One vs all: one SVM model per class → separates that class from all the other classes
- Assign new points with the winner-takes-all rule
- If there is no outright winner, assign the point to the class of the closest hyperplane (Platt scaling)
One vs one: one SVM model per pair of classes → separates 2 classes at a time, ignoring the other data
- Assign new points with a majority voting rule
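Both strategies exist as scikit-learn wrappers; a sketch on a 3-class toy problem:

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)    # 3 classes

ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)  # one model per class
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)   # one model per pair
print(len(ovr.estimators_), len(ovo.estimators_))          # 3 and 3 (= C(3,2))
```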

Decision trees
Decision trees use recursive partitioning to create a sequence of decision rules on the input features → nested splits of the data points
Input features can be numeric (decision $x_j \le t$) or categorical (decision $x_j \in A$)
Decision node → decision rule for one feature
Classification tree → predicts a class
Regression tree → predicts a real number

At the current node, try all possible decision rules for all features and select the decision that best splits the data
Classification tree → impurity criterion (e.g. Gini index or entropy)
Regression tree → variance reduction

Final decision boundaries → overlapping orthogonal half-planes
Decision on new data → run it down through the branches and assign the class of the leaf it reaches
How to split a node
Which split should we choose between the two candidate splits?

The answer is the left one (it produces purer child nodes)


Stop the recursive partitioning when a node is pure
Pros and cons of decision trees
Pros
- Simple decision rules
- Surprisingly computationally efficient
- Handle multiclass problems
- Handle numeric and categorical features at the same time
Cons
- Prone to overfitting: left unchecked, the tree keeps splitting until every node is pure and memorizes the training data
Potential solution
Restrict the growth of the tree by imposing a maximal tree depth
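A sketch with scikit-learn, capping the depth to curb overfitting:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)    # grows until pure
capped = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(full.score(X_train, y_train), full.score(X_test, y_test))        # perfect on train, worse on test
print(capped.score(X_train, y_train), capped.score(X_test, y_test))    # smaller train/test gap
```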
Random forests
Bagging several decision trees
Decision trees are weak classifiers when considered individually
- Average the decisions of several of them
- Compensates for their respective errors (wisdom of crowds)
- Useless if all decision trees see the same data
- Introduce some variability with bagging (bootstrap aggregating)
- Introduce more variability by selecting only $m$ out of the $d$ total features for each split in each decision tree (typically $m = \sqrt{d}$)

The final decision is taken by majority voting over all the decision tree outputs
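In scikit-learn both sources of variability (bagging + random feature subsets at each split) come bundled in RandomForestClassifier; a minimal sketch:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=16, random_state=0)

# 100 bagged trees, each split drawing sqrt(16) = 4 candidate features
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())
```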

Decision boundaries comparison


Cross-validation
$k$-fold cross-validation
- Divide the whole dataset into $k$ non-overlapping sample blocks
- Train $k$ models, each on $k-1$ training blocks, and test on the remaining block
- Compute performance metrics of each model + average & standard deviation over all models
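scikit-learn automates the whole loop; a sketch with $k = 5$:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

scores = cross_val_score(SVC(), X, y, cv=5)   # 5 models, each tested on one held-out block
print(scores.mean(), scores.std())            # average ± standard deviation across folds
```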

Confusion matrix
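The confusion matrix cross-tabulates true classes against predicted classes; the diagonal counts correct predictions, everything off-diagonal is an error. A minimal sketch with scikit-learn (toy labels):

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 0, 2, 2]
print(confusion_matrix(y_true, y_pred))   # rows = true class, columns = predicted class
```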
