Week 5 - Machine Learning Arch Overview
===

{%hackmd theme-dark %}

Overview
---
- Machine Learning: what is it?
- Evaluating models
- Binary cross entropy
- Logistic regression
- Types of errors
- Regularisation

Machine Learning
---
![](https://i.imgur.com/COxVe4G.png)

\*Deep learning is also known as deep neural networks.

### Definitions
The practice of using algorithms to parse data, learn from it, and make a determination or prediction about something in the world. Rather than hand-coding software routines with a specific set of instructions, the machine is "trained" using large amounts of data and algorithms.

A well-defined learning task is given by a task T, a performance measure P, and experience E.

### When is ML Used?
- Human expertise doesn't exist (navigation on Mars)
- Humans can't explain their expertise (speech recognition)
- Models must be customized (personalized medicine)
- Models are based on huge amounts of data (genomics)
- Learning isn't always useful (e.g. there is no need to "learn" to calculate payroll)

There are 3 components to Machine Learning:
- Representation (how the data is represented)
- Optimization (fitting a model to the data)
- Evaluation (assessing the model)

### Tasks and Techniques
- Supervised
    - Classification:
        - Out of a discrete number of classes, which does the subject belong to?
        - Logistic regression, Support Vector Machines, NN, Ensemble Methods, Decision Trees, k-Nearest Neighbours, Naive Bayes, etc.
    - Regression:
        - Predicting a number
        - Linear regression, Support Vector Regression, NN, Lasso, Ridge
- Unsupervised
    - Clustering:
        - Grouping instances that belong together, without labels
        - K-means, Hierarchical
    - Association Rules:
        - How different instances are related
- Reinforcement Learning
    - (will be covered later)

### Types of Learning
- Supervised (inductive) learning
    - Given: training data + desired outputs (labels)
- Unsupervised learning
    - Given: training data (without desired outputs)
- Semi-supervised learning
    - Given: training data (with a few desired outputs)
    - Transductive learning: transfer some of the learning from the labelled dataset to the unlabelled dataset. (How is this different from testing in supervised learning?)
- Reinforcement learning
    - Rewards from a sequence of actions
- Self-supervised learning
    - A subset of unsupervised learning where output labels can be generated "intrinsically" from data objects, by exposing a relation between parts of the object or between different views of the object.
    - Here, we use a pretext task (e.g. rotating the image to learn more latent features of the image).

### Designing a Learning System
- An ML system consists of a task T (the task the model is expected to perform), a performance measure P, and a training experience E.
    - e.g. ![](https://i.imgur.com/Z73R4es.png)
- Choose exactly what is to be learned (i.e. the target function)
    - e.g. NextMove: B→M, where B denotes the set of board states and M denotes the set of legal moves given a board state. NextMove is our target function.
- Choose how to represent the target function (model):
    - $NextMove = u_0 + u_1x_1 + u_2x_2 + u_3x_3 + u_4x_4 + u_5x_5 + u_6x_6$
    - $x_1$: number of black pieces on the board
    - $x_2$: number of white pieces on the board
    - etc.
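The linear representation of the target function above is just a weighted sum of board features. A minimal Python sketch (the weight and feature values below are hypothetical placeholders, not learned values):

```python
# Linear target function NextMove = u0 + u1*x1 + ... + u6*x6.
# A learning algorithm would adjust the weights from experience;
# here they are illustrative placeholders.
def evaluate_board(features, weights):
    """features: [x1, ..., x6]; weights: [u0, u1, ..., u6]."""
    total = weights[0]  # bias term u0
    for u, x in zip(weights[1:], features):
        total += u * x
    return total

weights = [0.5, 1.0, -1.0, 3.0, -3.0, 0.25, -0.25]  # hypothetical
features = [12, 12, 0, 0, 8, 8]                     # hypothetical board counts
score = evaluate_board(features, weights)
```

Learning then reduces to choosing the weights $u_i$ so that the function's scores match the desired outputs on the training experience.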
- Choose a learning algorithm to infer the target function

### Evaluation
There are different metrics to evaluate a model:
- Accuracy (can be class-specific)
- Confusion matrix (TP, FP, TN, FN)
- Precision (positive predictive value: the fraction of relevant instances among the retrieved instances), recall (sensitivity: the fraction of all relevant instances that are actually retrieved), and F1 (the harmonic mean of precision and recall)
- Receiver Operating Characteristic (ROC) curve: true positive rate vs false positive rate
    - ![](https://i.imgur.com/9EVzPtu.png)
- Area Under the Curve (AUC)
- For regression: mean squared error (L2 loss):
    - $MSE=\frac{1}{n}\sum^{n}_{i=1}(Y_i-\hat{Y}_i)^2$

Information Theory
---
Timestamp: https://www.youtube.com/watch?v=vbgp7TBCJZo
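The binary cross entropy listed in the overview is grounded in these information-theoretic ideas. A minimal sketch using the standard definitions (an assumption, since this section only links the video): Shannon entropy in bits, and BCE with natural log, as is conventional for the loss.

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average BCE loss between labels y_true in {0, 1} and
    predicted probabilities y_pred in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

For example, a fair coin has `entropy([0.5, 0.5]) == 1.0` bit, and BCE shrinks toward zero as the predicted probabilities approach the true labels, which is why it is used as the loss for logistic regression.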