IML - Intro
Motivation
What is learning?
It's all about evolving
Definition of learning: improving with experience so as to perform better in new situations.
Quoting S. Bengio: “Learning is not learning by heart. Any computer can learn by heart. The difficulty is to generalize a behavior to a novel situation.”
Can machines learn?
A new science with a goal and an object.
“How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?”
Tom Mitchell, 2006
What is it good for?
According to Peter Norvig
The 3 main reasons why you may want to use Machine Learning:
Avoid coding numerous complex rules by hand
lower cost, more effective, faster reaction to changing problems
Optimize the parameters of your system given a dataset of yours
Better accuracy
Create systems for which you do not know the rules consciously (e.g. recognizing a face)
Greater potential
AI vs Machine Learning
AI is a very fuzzy concept, much like "any computer program doing something useful"
Think "if-then" rules
ML can be considered a subfield of AI: its algorithms are building blocks for making computers learn to behave more intelligently by generalizing from data, rather than just storing and retrieving items as a database system would.
Engineering point of view: ML is about building programs with tunable parameters (typically an array of floating-point values) that are adjusted automatically so as to improve their behavior by adapting to previously seen data, and that are then evaluated on the hidden test points (seen only after training).
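A minimal sketch of this engineering view (an illustration with toy data; all names and values are our choosing): the whole “program” is an array of two floating-point parameters, adjusted automatically by gradient descent to fit previously seen data.

```python
import numpy as np

# Toy "previously seen data": y is roughly 3*x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 3 * x + 1 + 0.1 * rng.standard_normal(50)

# The tunable part of the program: an array of floating-point parameters.
params = np.zeros(2)  # [slope, intercept]

def predict(params, x):
    return params[0] * x + params[1]

# Adjust the parameters automatically to reduce the squared error on seen data.
lr = 0.1
for _ in range(500):
    err = predict(params, x) - y
    grad = np.array([(err * x).mean(), err.mean()])  # gradient of 0.5 * mean(err**2)
    params -= lr * grad

print(params)  # approaches [3, 1]; quality is then judged on unseen test points
```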
Learning bias
How to guide generalization
It is always possible to find a model complex enough to fit all the examples
Example: polynomial with very high degree
But how would this help us with new samples?
Such an over-complex model would not generalize well.
We need to define a family of acceptable solutions to search within.
This forces the model to learn a “smoothed” representation.
… but it should not smooth the representation too much!
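A hedged sketch of the polynomial example above (data and degrees are chosen arbitrarily for illustration): a degree-9 polynomial can pass through all ten training points, while a degree-3 polynomial is forced into a smoother representation.

```python
import numpy as np

# Ten noisy samples of an underlying sine curve.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(10)

# Degree 9 fits all ten points exactly, noise included: it oscillates between them.
overfit = np.polyfit(x, y, deg=9)
# Degree 3 cannot fit every point: a "smoothed" representation of the data.
smooth = np.polyfit(x, y, deg=3)

x_new = np.linspace(0.05, 0.95, 5)  # unseen inputs
print(np.polyval(overfit, x_new))   # erratic predictions
print(np.polyval(smooth, x_new))    # closer to the underlying curve
```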
Occam’s Principle of Parsimony (14th century): “One should not increase, beyond what is necessary, the number of entities required to explain anything.”
When many solutions are available for a given problem, we should select the simplest one. But what do we mean by simple? We will use prior knowledge of the problem being solved to define what counts as a simple solution.
Example of a prior: smoothness
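One common way to encode a smoothness prior is an L2 penalty on the parameters, i.e. ridge regression (our choice of illustration, not a method prescribed above): large coefficients produce wiggly functions, so penalizing them biases the search toward smoother solutions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: argmin ||Xw - y||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Degree-9 polynomial features on the same noisy sine data as above.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(10)
X = np.vander(x, 10)  # columns x^9, x^8, ..., 1

w_unreg = ridge_fit(X, y, lam=0.0)    # unconstrained: fits the noise
w_smooth = ridge_fit(X, y, lam=1e-3)  # smoothness prior: tame coefficients

print(np.abs(w_unreg).max(), np.abs(w_smooth).max())
```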
Learning as a search problem
Hypothesis space: initial, compatible (with the training set), optimal, and ideal solutions.
What are the sources of error?
Noise, intrinsic error
Your data is not perfect: it can have noisy or erroneous labels (or, as the saying goes, “every model is wrong”). Even if there exists an optimal underlying model, the observations are corrupted by noise.
We are exploring a restricted subset of all possible solutions. Your classifier needs to drop some information about the training set to gain generalization power (simplify to generalize).
There are many ways to explain your training dataset, and it is hard to find an optimal solution among all those possibilities. Our exploration is not very accurate; we are limited by the data seen during training.
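These three sources match a classical decomposition of the excess risk (our formalization, using standard textbook notation): write R for the true risk, f* for the ideal (Bayes) predictor, f*_H for the best solution in the restricted family H, and f̂ for the solution actually found from the training data.

```latex
\underbrace{R(\hat{f}) - R(f^*)}_{\text{excess risk}}
  = \underbrace{R(\hat{f}) - R(f^*_{\mathcal{H}})}_{\text{estimation/optimization error}}
  + \underbrace{R(f^*_{\mathcal{H}}) - R(f^*)}_{\text{approximation error}}
```

The noise contributes the irreducible term R(f*) itself, which no choice of model can remove.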
Density estimation input: samples described by several (possibly correlated) input variables.
Density estimation output: an estimate of the probability density function over the feature space.
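A minimal sketch of this input/output contract, using kernel density estimation (the method is our choice; none is named above):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Input: samples described by two correlated variables.
rng = np.random.default_rng(2)
x1 = rng.standard_normal(500)
x2 = 0.8 * x1 + 0.3 * rng.standard_normal(500)  # correlated with x1
samples = np.vstack([x1, x2])                   # shape (n_dims, n_samples)

# Output: an estimate of the probability density over the feature space.
kde = gaussian_kde(samples)
print(kde([[0.0], [0.0]]))  # estimated density at the point (0, 0)
```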
Three kinds of supervision/training
According to Y. LeCun and S. Bengio
Supervised learning: Training data contains the desired behavior (desired class, outcome, etc.).
Medium feedback
Reinforcement learning: Training data contains partial targets. Did the system do well or not? Is some object present in the image (without knowing its position)?
Weak feedback
Unsupervised/self-supervised learning: Training data is raw; no class or target is given.
There is often a hidden goal in the task: compression, maximum likelihood, predicting parts from other parts (BERT-like)…
Lots of feedback
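To make the three kinds of feedback concrete, here is an illustrative sketch of what the training data looks like in each setting (shapes and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 8))  # 100 samples, 8 features

# Supervised: every sample comes with its full desired output (medium feedback).
y = rng.integers(0, 2, size=100)   # one target class per sample
supervised_data = (X, y)

# Reinforcement: only a scalar "did it go well?" signal (weak feedback).
reward = rng.uniform(size=100)     # one number per action/episode
reinforcement_data = (X, reward)

# Unsupervised/self-supervised: raw data only; the target is built from the data
# itself, e.g. predict a held-out feature from the others (lots of feedback).
masked_inputs, targets = X[:, :-1], X[:, -1]
self_supervised_data = (masked_inputs, targets)
```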
Forms of Machine Learning
According to Cornuejols and Miclet
Exploration-based: Generalization or specialization of rules
Examples: Grammatical inference, heuristic discovery for SAT solvers…
Optimization-based: Topic of this course.
Examples: linear separators and SVMs, neural networks, decision trees, Bayesian networks, HMMs…
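As a first taste of the optimization-based family, here is a minimal perceptron-style linear separator (an illustrative sketch, not the course's reference implementation):

```python
import numpy as np

def train_perceptron(X, y, epochs=20):
    """Learn w, b such that sign(X @ w + b) matches labels y in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified: nudge the separator
                w += yi * xi
                b += yi
    return w, b

# Linearly separable toy data: two shifted Gaussian blobs.
rng = np.random.default_rng(4)
X = np.vstack([rng.standard_normal((50, 2)) + 2,
               rng.standard_normal((50, 2)) - 2])
y = np.array([1] * 50 + [-1] * 50)

w, b = train_perceptron(X, y)
print(np.mean(np.sign(X @ w + b) == y))  # training accuracy; should reach 1.0
```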