# MLRF: Lecture 01
# Scope of this course
> Apply Machine Learning (ML) techniques to solve some practical Computer Vision (CV) problems
- About **Computer Vision** (CV)
- It should be called CV-ML, ML4CV or so...
We need some definitions:
- What is *Computer Vision* ? What is *Pattern Recognition* ? *Shape Recognition* ?
- What is *Machine Learning* ?
- How do those concepts relate together ?
# Agenda for lecture 1
1. Some definitions and basic notions
2. Course outline
3. Introduction to *Twin it !*
4. Pattern Matching
# Some definitions
## Computer Vision
:::info
**Definition**
The automation of visual tasks with the goal of producing results directly or indirectly usable by humans
:::
- **Input**: image(s) in machine format (image acquisition is a subpart of CV)
- **Output**: some pieces
### Example
:::warning
How would you process image pixels to get those results ?
:::
![](https://i.imgur.com/2BF45t3.png)
> Cat pictures on the Internet are important
![](https://i.imgur.com/ynVQQe7.png)
![](https://i.imgur.com/XsmFBiR.png)
- Some applications are direct (like the insect recognition app):
- a human reads and uses the output
- Some applications are indirect (like bank check reading)
- The output is fed to a business system
- Some applications extend what humans can naturally do
- Either by extending our range
## Pattern Recognition
:::info
**Definition**
The field of pattern recognition is concerned with the automatic discovery of regularities in data through the use of computer algorithms and with the use of these regularities to take actions such as *classifying* the data into different categories
> Bishop, 2006
:::
IAPR: **pattern recognition**, **computer vision** and **image processing** in a broad sense
### Examples
- OCR
- Computer vision
- Pedestrian detection
- Computer Vision
- Credit fraud detection
- Not computer vision
$\Rightarrow$ CV$\cap$PR$\neq\emptyset$
### Pattern Recognition is an *inverse* problem
> OCR example - Why Pattern Recognition is hard
![](https://i.imgur.com/ETm4YMn.png)
## "Shapes"
:::info
**Definition**
A way to designate meaningful visual patterns.
:::
Sometimes used to describe "visual percepts"
*Let S and S' be 2 shapes observed in 2 different images which happen to be similar.*
![](https://i.imgur.com/mo3c7OJ.png)
![](https://i.imgur.com/nL14MCD.png)
:::warning
Some **statistics** can help us make better decisions...
Idea: **learn** the distance threshold under which shapes can be deemed identical
:::
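The idea of learning a distance threshold can be sketched as follows. This is only a minimal illustration: the distances and the "same shape" labels are made-up toy values, and `learn_threshold` is a hypothetical helper, not something provided by the course.

```python
import numpy as np

def learn_threshold(distances, is_same):
    """Pick the threshold t maximizing the accuracy of the rule
    'distance <= t  means  the two shapes are the same'."""
    distances = np.asarray(distances, dtype=float)
    is_same = np.asarray(is_same, dtype=bool)
    sorted_d = np.sort(distances)
    # Candidate thresholds: midpoints between consecutive observed distances.
    candidates = (sorted_d[:-1] + sorted_d[1:]) / 2.0
    accuracies = [float(np.mean((distances <= t) == is_same)) for t in candidates]
    best = int(np.argmax(accuracies))  # first best candidate in case of ties
    return float(candidates[best]), accuracies[best]

# Toy labeled pairs (assumption: distances come from some shape comparison).
distances = [0.10, 0.15, 0.22, 0.30, 0.55, 0.60, 0.72, 0.90]
is_same = [True, True, True, False, True, False, False, False]

threshold, accuracy = learn_threshold(distances, is_same)
print(f"learned threshold: {threshold:.3f}, training accuracy: {accuracy:.3f}")
```

Note that the accuracy reported here is measured on the very pairs used to pick the threshold, so it says little about how the rule behaves on new pairs.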
## Machine Learning
### Many forms of Machine Learning
- Focus on **inductive learning** (generalize from examples)
- We will consider both **supervised** (a "teacher" provides labels for examples) and **unsupervised** (only samples)
- Focus on **optimization-based learning techniques** (examples are represented as numerical vectors)
## Examples of optimization-based learning techniques
- Linear classifiers, SVMs
- Neural networks
## ("Statistical") Machine Learning
> Learning means **changing** in order to be **better** (according to a given **criterion**) when a similar situation arrives
> Learning **IS NOT** learning by heart
> Any computer can learn by heart, the difficulty is to **generalize** a behavior to a novel situation
> *Quoting S. Bengio*
### From an engineer's POV
> Machine Learning is about building programs with **tunable parameters** (typically an array of floating point values) that are **adjusted automatically** so as to improve their behavior by **adapting to previously seen data**.
> Machine Learning can be considered *a subfield of AI* since those algorithms can be seen as building blocks to make computers learn
> *Scikit Learn Documentation*
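
To make "tunable parameters adjusted automatically" concrete, here is a small sketch, assuming a linear model fitted by plain gradient descent on a mean squared error. The data, the learning rate and the number of steps are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "previously seen data": y is roughly 3 * x + 0.5, plus a little noise.
x = rng.uniform(-1.0, 1.0, size=100)
y = 3.0 * x + 0.5 + rng.normal(scale=0.05, size=100)

params = np.zeros(2)   # the tunable parameters: [slope, intercept]
learning_rate = 0.1    # assumption: hand-picked for this toy problem

for _ in range(2000):
    error = params[0] * x + params[1] - y
    # Gradient of the mean squared error with respect to [slope, intercept].
    gradient = np.array([2.0 * np.mean(error * x), 2.0 * np.mean(error)])
    params -= learning_rate * gradient

print("learned slope and intercept:", params)
```

The program itself never changes; only the array `params` does, which is exactly the engineer's view given in the quote.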
# Why is learning difficult ?
Given a **finite amount of training data**, you have to derive a **relation for an infinite domain**.
In fact, there is an **infinite** number of such relations
![](https://i.imgur.com/aS55m47.png)
*Which relation is the most appropriate ?*
![](https://i.imgur.com/CSf1D5v.png)
... **the hidden test points**...
## Learning bias
It is **always** possible to find a model **complex enough** to fit **all** the examples
But how would this help us with **new samples** ? Such a model is unlikely to **generalize** well.
We need to define a **family of acceptable solutions to search from**. This forces the model to learn a "smoothed" representation
### So in practice we need
- Examples (data!)
- A tunable algorithm (model)
- An evaluation of the model's fitness to the examples (risk, loss)
- A definition of the model search space (not too big, not too small)
- An optimization strategy
:::success
**The bias/variance compromise**
Small search space:
- Easier to find the best (available) solution
- But it may be far from the ideal one
Large search space:
- It is hard to find the best (available) solution
:::
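The effect of the search space size can be observed on a toy regression problem. The sketch below assumes polynomial models of increasing degree fitted with `np.polyfit`; the sine target, the noise level and the sample sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    """Draw n noisy samples of a sine curve (toy ground truth)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = sample(12)    # few training examples
x_test, y_test = sample(200)     # held-out points, never used for fitting

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 3, 9):         # small, medium and large search spaces
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))

for degree, (train_err, test_err) in results.items():
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```

The training error can only go down as the degree grows, since each family contains the previous one. The test error typically does not follow the same trend, which is the compromise described above.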
## 3 kinds of problems
### Regression
![](https://i.imgur.com/CGxxpbw.png)
$$
x=\underbrace{\begin{pmatrix} \vdots \end{pmatrix}}_{\in\mathbb R^T}\\
y=\underbrace{\begin{pmatrix} \vdots \end{pmatrix}}_{\in\mathbb R^5}
$$
### Classification
![](https://i.imgur.com/XF0PQrn.png)
$$
x\in\mathbb R^5\\
y\in\mathbb R^T
$$
### Density estimation
![](https://i.imgur.com/0hxsLDl.png)
$$
x\in\mathbb R^5\\
\mathbb P(x)\in[0,1]
$$
![](https://i.imgur.com/MXb6dFE.png)
## 3 types of learning
- **Supervised** learning $(x,y)$
- The training data contains the desired behavior (desired class, outcome, etc.)
- **Reinforcement** learning $(x,\tilde y)$
- The training data contains partial targets (for instance, simply whether the machine did well or not)
- **Unsupervised** learning
- The training data is raw, no class or target is given
- There is often a hidden goal in that task (compression, maximum likelihood, etc.)
![](https://i.imgur.com/crL8H8G.png)
## Model validation
![](https://i.imgur.com/bu3yFhU.png)
More on that later
1. You need to **test the generalization** power of your approach
2. So you need data not seen during the training: **a test set**
3. For which you know the **expected output** ("ground-truth", "gold standard", "target",...)
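
A hold-out evaluation following these three points might look like the sketch below. It assumes a toy two-class dataset and a nearest-centroid classifier written by hand in NumPy; both are illustration choices. In practice scikit-learn's `train_test_split` (in `sklearn.model_selection`) does the splitting for you.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two well-separated 2D Gaussian blobs (made up for illustration).
n_per_class = 50
X = np.vstack([rng.normal(-3.0, 1.0, size=(n_per_class, 2)),
               rng.normal(+3.0, 1.0, size=(n_per_class, 2))])
y = np.array([0] * n_per_class + [1] * n_per_class)  # ground truth labels

# Shuffle, then hold out 25% of the samples as a test set.
order = rng.permutation(len(X))
n_test = len(X) // 4
test_idx, train_idx = order[:n_test], order[n_test:]

# "Training": the class centroids are computed from the training samples only.
centroids = np.array([X[train_idx][y[train_idx] == c].mean(axis=0) for c in (0, 1)])

def predict(samples):
    """Assign each sample to the class of its nearest centroid."""
    d = np.linalg.norm(samples[:, None, :] - centroids[None, :, :], axis=2)
    return np.argmin(d, axis=1)

# Generalization is measured on samples the model has never seen.
test_accuracy = float(np.mean(predict(X[test_idx]) == y[test_idx]))
print(f"test accuracy: {test_accuracy:.2f}")
```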
# Benefits of ML
## A duck example
How can we filter out the grass and keep only the duck shape by thresholding in some domain ?
![](https://i.imgur.com/kXq8hDI.png)
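
As a rough sketch of what hand-made thresholding looks like, the snippet below builds a synthetic "duck on grass" image (a yellow rectangle on a green background, both invented for this example) and separates them with a single hand-picked threshold on the red channel. The threshold value is precisely the kind of knob one would prefer to learn from data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic RGB image: green "grass" everywhere, a yellow "duck" rectangle.
height, width = 20, 30
image = np.empty((height, width, 3), dtype=np.float64)
image[:] = (40, 160, 40)                      # grass color (assumption)
truth = np.zeros((height, width), dtype=bool)
truth[5:12, 8:20] = True                      # where the duck really is
image[truth] = (230, 200, 40)                 # duck color (assumption)
image += rng.integers(-20, 21, size=image.shape)   # mild sensor noise
image = np.clip(image, 0, 255).astype(np.uint8)

# Hand-tuned knob: the duck is much "redder" than the grass in this toy image.
RED_THRESHOLD = 128
mask = image[..., 0] > RED_THRESHOLD

print("pixels kept:", int(mask.sum()), "/ expected:", int(truth.sum()))
```

On a real photograph a single fixed threshold rarely suffices: lighting, shadows and textures change from one image to the next, and the threshold would have to be tuned again.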
## Why use Machine Learning in Computer Vision ?
To avoid manual knob turning: it is complex and unsafe.
## But beware of the Machine Learning Magic
![](https://i.imgur.com/45IlSIH.png)
# Actual goals of this course
- Teach you that you can (and should whenever possible) **optimize the parameters** of your CV/PR product
- Show some **simple tools** to try to do it
- Address practical problems
- describe a pattern
- look for a pattern
- match a pattern
- classify a pattern
- describe a set of patterns (an object/an image)
- retrieve an object given a query, segment objects...
- and face the unavoidable work surrounding them
# Course agenda
6 "weeks" (Friday to Friday)
See the [web page](https://www.lrde.epita.fr/~jchazalo/teaching/MLRF/202105_IMAGE_SCIA_S8/) for complete agenda
Weekly tests + assignments (practice sessions). **No final exam**
Weekly workflow should be:
- **Friday, 09:30-10:00**: answer the weekly quiz on Moodle (*starting next Friday*)
- **Friday, 10:00-12:00**: attend the lecture using Teams
- **Friday, 14:00-17:00**: Work on the practice session and join the discussion using Teams
- **Before next Friday**: Complete the assignment and submit your results using Moodle (*for sessions 4, 5 and 6 only*)
# No deep learning !
- We need a course about basic techniques
- There are cases where setting up
# Practice sessions: set up your dev. env.
Basically: Python with:
- Jupyter
- Numpy
- Matplotlib
- Scikit-image (images are in RGB channel order)
- Scikit-learn
- OpenCV (images are in BGR channel order)
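
Because the two image libraries disagree on channel order, images must be converted when passing from one to the other. A minimal NumPy-only sketch, using a tiny hand-made image:

```python
import numpy as np

# A 1x2 image in RGB order: one pure red pixel, one pure blue pixel.
rgb = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the R and B channels (RGB <-> BGR).
bgr = rgb[..., ::-1]

print("RGB red pixel:", rgb[0, 0].tolist())
print("same pixel in BGR order:", bgr[0, 0].tolist())
```

With OpenCV installed, `cv2.cvtColor(image, cv2.COLOR_BGR2RGB)` performs the same conversion. Forgetting it is a common cause of images displayed with swapped red and blue.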
# Why I love Scikit-Learn
## Numpy-friendly
![](https://i.imgur.com/TjB9l0u.png)
## 3-way documentation: User guide, API ref, Examples
![](https://i.imgur.com/2y3Kv0w.png)
## Super smart API
> Decomposition, level of detail, default values, consistency, etc
![](https://i.imgur.com/4B3UHFG.png)
![](https://i.imgur.com/VrAfggf.png)
# Introduction to *Twin it!*
## Overview
A poster game
- $X$ bubbles, all different but
- $Y$ bubbles, which have 1 (and only 1) twin
![](https://i.imgur.com/C21NkuT.jpg)
Your goals:
- Find the pairs
:::success
Discussion (3 minutes):
1. How can we *decompose* the problem ?
2. How can we make *sure* our solution works ?
3. What should we *focus* on ?
:::
Already done:
- Scan the poster
- Stitch the tiles
- Normalize the contrast
## Underlying problems
1. Isolate each bubble $\Rightarrow$ **Segmentation** ![](https://i.imgur.com/0zamne3.png)
- We provide pre-computed results for this step
2. Compare image pairs $\Rightarrow$ **Matching** ![](https://i.imgur.com/LsR95M7.png)
- We will focus on this one
- We will use **Template Matching**
3. Identify pairs $\Rightarrow$ **Calibration** ![](https://i.imgur.com/3OrDVtM.png)
# Template matching
## Why template matching ?
A simple method which will be useful to understand
- Evaluation challenges
- The ideas behind keypoint detection (next lecture)
It can work on the Twin it! case
- Twice the same texture
- Textures are the **same scale**, without **rotation** nor **intensity change**
- Only need to cope with **translation** (and some **small noise**)
## Step by step: Compare 2 images
- 2 arrays of intensities ![](https://i.imgur.com/lTZgevq.png)
- Take the **absolute** difference
![](https://i.imgur.com/htN1gEO.png)
$$
R(x,y) = \vert I_1(x,y) - I_2(x,y)\vert
$$
- Sum the squared differences (SSD)
![](https://i.imgur.com/lKTscKz.png)
$$
S=\sum_{x,y}(I_1(x,y) - I_2(x,y))^2
$$
(Optional) Normalize so that the result belongs to $[0,1]$
![](https://i.imgur.com/grXMn0J.png)
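
The comparison steps above can be sketched in NumPy as follows. The normalization shown here (dividing by the largest SSD possible for 8-bit images) is an assumption chosen for illustration and may differ from the one in the figure.

```python
import numpy as np

def absolute_difference(i1, i2):
    """Pixel-wise |I1 - I2|, computed in float to avoid uint8 wrap-around."""
    return np.abs(i1.astype(np.float64) - i2.astype(np.float64))

def ssd(i1, i2):
    """Sum of squared differences between two images of the same shape."""
    diff = i1.astype(np.float64) - i2.astype(np.float64)
    return float(np.sum(diff ** 2))

def normalized_ssd(i1, i2, max_value=255.0):
    """SSD divided by its largest possible value, so the result is in [0, 1]."""
    return ssd(i1, i2) / (i1.size * max_value ** 2)

i1 = np.array([[10, 20], [30, 40]], dtype=np.uint8)
i2 = np.array([[12, 20], [27, 40]], dtype=np.uint8)

print(absolute_difference(i1, i2))
print("SSD:", ssd(i1, i2))
print("normalized SSD:", normalized_ssd(i1, i2))
```

Casting to float before subtracting matters: with `uint8` arrays, `10 - 12` would wrap around to 254 instead of giving -2.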
## Template Matching: Sliding comparison
- $I_1$ is a small template $T$ to match against $I_2$ (just $I$ after)
- We rewrite the preceding formula to compute a map $R$ of the shape of $I$
- Each pixel of $R$ will have the value of the SSD when the top-left pixel of $T$ is on the pixel $(x,y)$ of $I$
![](https://i.imgur.com/gWIgF8y.png)
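
A direct (and deliberately slow) sketch of this sliding comparison is given below. Two assumptions are made for simplicity: the map only covers the positions where the template fits entirely inside the image, so it is slightly smaller than $I$, and arrays are indexed as `[y, x]` following the NumPy row/column convention.

```python
import numpy as np

def match_template_ssd(image, template):
    """SSD map: out[y, x] is the SSD between the template and the image
    window whose top-left corner is at column x, row y."""
    image = np.asarray(image, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)
    th, tw = template.shape
    ih, iw = image.shape
    out = np.empty((ih - th + 1, iw - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            window = image[y:y + th, x:x + tw]
            out[y, x] = np.sum((window - template) ** 2)
    return out

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 255.0, size=(20, 30))   # toy grayscale image
template = image[5:9, 7:12].copy()               # patch cut out at row 5, column 7

ssd_map = match_template_ssd(image, template)
best_y, best_x = np.unravel_index(np.argmin(ssd_map), ssd_map.shape)
print("best match at row", int(best_y), "column", int(best_x))
```

With OpenCV, `cv2.matchTemplate(image, template, cv2.TM_SQDIFF)` computes the same kind of map far more efficiently.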
## Several approaches $\Leftrightarrow$ Practice session
![](https://i.imgur.com/j8ByvfZ.png)
## About the denominator
![](https://i.imgur.com/rP4p1E2.png)
## Cross correlation: 2 things to know
![](https://i.imgur.com/R84XmMS.png)
**More robust to intensity shift**
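
The robustness to intensity shift can be checked numerically. The sketch below assumes the zero-mean normalized variant of cross correlation (often called ZNCC); the patch and the shift are toy values, and the exact variant shown in the figure may differ.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences."""
    return float(np.sum((np.asarray(a, float) - np.asarray(b, float)) ** 2))

def zncc(a, b):
    """Zero-mean normalized cross correlation, in [-1, 1]."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    a0 = a - a.mean()
    b0 = b - b.mean()
    denom = np.sqrt(np.sum(a0 ** 2) * np.sum(b0 ** 2))
    if denom == 0.0:       # flat patch: correlation is undefined, return 0
        return 0.0
    return float(np.sum(a0 * b0) / denom)

rng = np.random.default_rng(0)
patch = rng.uniform(0.0, 200.0, size=(8, 8))
brighter = patch + 40.0    # same texture, uniformly brighter

print("SSD :", ssd(patch, brighter))    # large, although the texture is identical
print("ZNCC:", zncc(patch, brighter))   # stays at (about) 1
```

Subtracting the mean cancels a uniform brightness offset, and dividing by the norms cancels a uniform gain, while SSD is sensitive to both.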
## Ideal goal
For each bubble, return only a matching pair, if it exists
![](https://i.imgur.com/c8Cbgao.png)
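
One possible way to turn pairwise comparison scores into "at most one twin per bubble" is sketched below. It assumes a symmetric matrix of distances between bubbles is already available, and keeps a pair only if the two bubbles are each other's nearest neighbor and their distance is under a threshold. The matrix values, the threshold and the helper `find_twins` are all hypothetical.

```python
import numpy as np

def find_twins(distances, threshold):
    """Return pairs (i, j), i < j, that are mutual nearest neighbors
    and closer than `threshold`."""
    dist = np.array(distances, dtype=np.float64)   # copy, input left untouched
    np.fill_diagonal(dist, np.inf)                 # a bubble is not its own twin
    nearest = np.argmin(dist, axis=1)
    pairs = []
    for i, j in enumerate(nearest):
        j = int(j)
        if i < j and nearest[j] == i and dist[i, j] <= threshold:
            pairs.append((i, j))
    return pairs

# Toy distances between 4 bubbles: 0 and 2 look alike, 1 and 3 have no twin.
distances = [
    [0.00, 0.70, 0.05, 0.90],
    [0.70, 0.00, 0.60, 0.80],
    [0.05, 0.60, 0.00, 0.75],
    [0.90, 0.80, 0.75, 0.00],
]

twins = find_twins(distances, threshold=0.2)
print("twin pairs:", twins)
```

The threshold is exactly the kind of parameter that should be calibrated on data rather than guessed.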