Content-based image retrieval
Evaluation of image retrieval systems
Texture descriptors (not covered at all)
BoVW
Best practices:
Implement a simple image classifier:
Will be graded
Re-recognize a known 2D or 3D rigid object, potentially viewed from a novel viewpoint, against a cluttered background, and with partial occlusions
Ex: practice session 3
Recognize any instance of a particular general class such as "cat", "car" or "bicycle"
Aka category-level or generic object recognition
More challenging
This lecture and next practice session
This is a supervised machine learning task
This is a very common data representation for a classification problem
Classifier inputs = "samples" with "features"
Classifier outputs = "labels"
Now we just need to select an appropriate method, prepare our data, run some training, test the results, adjust some parameters, compare approaches, display results, …
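As an illustration of this representation and workflow, here is a minimal sketch (not the lecture's code; the synthetic X and y below stand in for real image features and labels), using scikit-learn:

```python
# Samples as rows, features as columns, one label per sample; then the usual
# train / test / score workflow with scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))             # 200 samples, 64 features each (e.g. BoVW histograms)
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # one label per sample

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```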
Consists of dropping some data columns
Can help later stages:
Which columns?
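One possible way to choose the columns to drop (the lecture does not prescribe a specific method; this sketch assumes scikit-learn's feature-selection utilities and synthetic data):

```python
# Drop near-constant columns, then keep only the k most discriminative ones.
import numpy as np
from sklearn.feature_selection import VarianceThreshold, SelectKBest, f_classif

rng = np.random.default_rng(0)
X = rng.random((100, 20))
X[:, 5] = 0.5                       # a constant, hence uninformative, column
y = rng.integers(0, 2, size=100)

X_var = VarianceThreshold(threshold=1e-3).fit_transform(X)      # drops column 5
X_best = SelectKBest(f_classif, k=10).fit_transform(X_var, y)   # keeps the 10 best columns
print(X.shape, "->", X_var.shape, "->", X_best.shape)
```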
What follows is a very limited selection
Only classifiers suitable for image classification as we present it today
input = feature vector
output = label
Many other approaches
Given samples (described by features) and true labels, find a good function that will correctly predict labels for new data samples
Logistic regression, Linear Discriminant Analysis, naive Bayes, Perceptron, simple neural networks…
A learning model that summarizes data with a set of parameters of fixed size (independent of the number of training examples) is called a parametric model. No matter how much data you throw at a parametric model, it won't change its mind about how many parameters it needs.
k-Nearest Neighbors, Decision Trees, SVMs…
"Non-parametric models differ from parametric models int that hte model structure is not specified a priori but is instead determined from data. The term non-parametric is not meant to imply that such models completely lack parameters but that the number and nature of the parameters are flexible and not fixed in advance"
Wikipedia
"Nonparametric methods are good when you have a lot of data and no prior knowledge"
Say you have a dataset with 9 muffins and 1 chihuahua.
You have a new sample to classify.
Which class should you bet on?
If your class prior probabilities are not equal, then you should bet on the most frequent class!
Without such information, you can just pick at random
What is the expected accuracy (true predictions / total predictions) if you have N classes and pick one at random?
Scikit-learn offers a DummyClassifier class which makes it easy to test such baseline strategies
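A minimal sketch of such a baseline, reusing the 9-muffins-vs-1-chihuahua toy data from above:

```python
# "most_frequent" bets on the majority class; "uniform" picks a class at random
# (expected accuracy ~ 1/N for N classes).
import numpy as np
from sklearn.dummy import DummyClassifier

X = np.zeros((10, 1))                        # features are ignored by DummyClassifier
y = np.array(["muffin"] * 9 + ["chihuahua"])

for strategy in ("most_frequent", "uniform"):
    clf = DummyClassifier(strategy=strategy, random_state=0).fit(X, y)
    print(strategy, clf.score(X, y))
```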
Keep all training samples
View new samples as queries over the previously learned / indexed samples
Assign the class of the closest sample(s)
We can consider more than one neighboring sample
Setting K:
keep K well below n, the average number of training samples per class (a common heuristic is K ≈ √n)
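A minimal k-NN sketch with scikit-learn (the digits dataset stands in for real image features):

```python
# k-NN keeps all training samples and labels a new sample by a vote among
# its K nearest neighbours.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)   # K = 5; tune K on a validation set
knn.fit(X_train, y_train)                   # "training" = storing/indexing the samples
print("test accuracy:", knn.score(X_test, y_test))
```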
Very basic classifier
Distance to the mean of the class
It does not take into account differences in variance for each class
Predicted class for x: ŷ = argmin_c ‖x − μ_c‖, where μ_c is the mean of class c
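A sketch of this nearest-mean rule using scikit-learn's NearestCentroid (digits again as a stand-in dataset):

```python
# Nearest-mean (nearest-centroid) classifier: predict argmin_c ||x - mu_c||.
# Note that it ignores per-class covariance.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestCentroid

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = NearestCentroid().fit(X_train, y_train)   # stores one mean vector per class
print("test accuracy:", clf.score(X_test, y_test))
```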
For each class c, the mean μ_c and covariance matrix Σ_c are computed from the set of examples of that class
The covariance matrix is taken into account when computing the distance from an image to the class (Mahalanobis distance): d_c(x)² = (x − μ_c)ᵀ Σ_c⁻¹ (x − μ_c)
The feature vector of the image is projected onto the eigenvectors of the class covariance matrix
General case: need to take into consideration both the class-conditional likelihood P(x | c) and the prior P(c)
Optimal decision rule (Bayes classifier): maximum a posteriori (MAP): ŷ = argmax_c P(c | x) = argmax_c P(x | c) P(c) / P(x)
If classes are equiprobable and the error cost is the same, then, because P(x) is constant, we get the maximum likelihood estimation: ŷ = argmax_c P(x | c)
Training data: pairs (x_i, y_i) of feature vectors and class labels
For each class c, build a model of P(x | c) from the examples of that class
Typically, the number of classes is small (few possible labels) and x is low dimensional
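Fitting one Gaussian (mean + covariance) per class and deciding by MAP corresponds to Quadratic Discriminant Analysis; a minimal scikit-learn sketch on a small, low-dimensional dataset:

```python
# QDA: one Gaussian N(mu_c, Sigma_c) per class, priors P(c) estimated from class
# frequencies, prediction by maximum a posteriori.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

qda = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X_train, y_train)
print("class priors P(c):", qda.priors_)
print("test accuracy:", qda.score(X_test, y_test))
```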
Learn w and b
Problem: how to learn w and b?
Linear classifier: f(x) = σ(wᵀx + b), where σ is the logistic (sigmoid) function
Optimize a loss L(w, b) over the training set to find the best w and b
Trained using gradient descent (no closed form solution)
Formally: w ← w − η ∇_w L(w, b) (and similarly for b)
Where η is the step size: how far to step relative to the gradient
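A toy gradient-descent implementation of logistic regression (a sketch with my own notation and synthetic data, not the lecture's code):

```python
# Minimize the cross-entropy loss of a logistic-regression classifier by
# repeatedly stepping against its gradient.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 2 * X[:, 1] > 0).astype(float)

w, b, eta = np.zeros(2), 0.0, 0.1             # eta = step size (learning rate)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # sigma(w.x + b)
    grad_w = X.T @ (p - y) / len(y)           # gradient of the cross-entropy loss
    grad_b = np.mean(p - y)
    w -= eta * grad_w                         # w <- w - eta * grad
    b -= eta * grad_b

p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
print("training accuracy:", np.mean((p > 0.5) == y))
```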
What is the best (w, b) for this dataset?
Trade-off:
large margin vs few mistakes on training set
Optimization problems:
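For reference, a standard soft-margin formulation of this trade-off (the notation is mine, not necessarily the slide's):

```latex
% Soft-margin linear SVM: minimise 1/2 ||w||^2 (i.e. maximise the margin)
% plus C times the slack variables xi_i (the mistakes / margin violations).
\min_{w,\,b,\,\xi}\;\; \tfrac{1}{2}\lVert w \rVert^{2} \;+\; C \sum_{i=1}^{N} \xi_i
\qquad \text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0
```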
What is the best linear classifier for this dataset?
None. We need something nonlinear!
2 solutions:
Transform the dataset to enable linear separability
The original input space can always be mapped to some higher-dimensional feature space where the training set is separable.
Compute φ(x) for all samples x in the dataset.
Then train a linear classifier just like before
Used to be avoided because of computation issues, but it is a hot topic again.
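One explicit transformation that makes a toy circular dataset linearly separable is a degree-2 polynomial lifting (an illustration only; the lecture does not fix a particular φ):

```python
# Lift the features with polynomial terms, then train a plain linear classifier
# in the lifted space.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)   # not linearly separable in 2D

clf = make_pipeline(PolynomialFeatures(degree=2), LinearSVC(C=1.0, max_iter=10000))
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```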
Linear classification only requires computing dot products
The function φ does not need to be explicit: we can use a kernel function K(x, x′) = ⟨φ(x), φ(x′)⟩
which represents a dot product in a “hidden” feature space.
This gives a non-linear boundary in the original feature space.
“Linear kernel”: identical solution to a linear SVM
“Hellinger kernel”: less sensitive to extreme values in the feature vector
“Histogram intersection kernel”: very robust
“χ²-distance kernel”: good empirical results
“Gaussian kernel”: overall most popular kernel in Machine Learning
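A sketch of some of these options in scikit-learn (the χ² kernel is passed as a precomputed Gram matrix; the exponential χ² variant used here, and the synthetic histogram data, are my assumptions):

```python
# Kernel SVMs: built-in kernels plus a precomputed chi2 kernel, a common choice
# for histogram features such as BoVW.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

rng = np.random.default_rng(0)
X = rng.random((150, 32))                      # nonnegative histogram-like features
X /= X.sum(axis=1, keepdims=True)              # L1 normalisation
y = (X[:, 0] > X[:, 1]).astype(int)

clf_lin = SVC(kernel="linear").fit(X, y)               # same solution as a linear SVM
clf_rbf = SVC(kernel="rbf", gamma="scale").fit(X, y)   # Gaussian (RBF) kernel

K = chi2_kernel(X, X)                                  # exponential chi2 Gram matrix
clf_chi2 = SVC(kernel="precomputed").fit(K, y)         # chi2 kernel via precomputed Gram
print(clf_lin.score(X, y), clf_rbf.score(X, y), clf_chi2.score(K, y))
```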
Using simple square root properties, we have: K_Hellinger(x, y) = Σ_i √(x_i · y_i) = ⟨√x, √y⟩
Trick for next practice session: given a BoVW vector, L1-normalize it and take the element-wise square root, then train a plain linear SVM on the result (equivalent to a Hellinger-kernel SVM)
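A sketch of that trick (hellinger_map is a hypothetical helper name; the BoVW counts below are synthetic):

```python
# Square-root trick: a linear SVM on sqrt(L1-normalised BoVW) emulates a
# Hellinger-kernel SVM, since <sqrt(x), sqrt(y)> = sum_i sqrt(x_i * y_i).
import numpy as np
from sklearn.svm import LinearSVC

def hellinger_map(bovw):
    """L1-normalise each BoVW histogram, then take the element-wise square root."""
    bovw = np.asarray(bovw, dtype=float)
    bovw /= np.maximum(bovw.sum(axis=1, keepdims=True), 1e-12)
    return np.sqrt(bovw)

rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(100, 50))    # fake BoVW counts
y = rng.integers(0, 2, size=100)

clf = LinearSVC(C=1.0, max_iter=10000).fit(hellinger_map(X), y)
```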
All the following classifiers have 90% accuracy
Do all errors have the same cost?
For binary classification
Instead of a single decision threshold on the classifier score, take all possible thresholds for the score
ROC: “Receiver Operating Characteristic”
Kind of signal/noise measure under various tunings
Pink line: random results
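A minimal ROC sketch with scikit-learn (synthetic data; an AUC of 0.5 would correspond to the random-results baseline):

```python
# Sweep all thresholds on the classifier score to get the ROC curve.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=300) > 0).astype(int)

scores = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
fpr, tpr, thresholds = roc_curve(y, scores)   # one (FPR, TPR) point per threshold
print("AUC:", roc_auc_score(y, scores))
```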
Draw samples randomly, with replacement, from the training set.
Enables us to estimate the variance of estimators we use in the classification rule.
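A bootstrap sketch (synthetic data; the resampled statistic here is a classifier's accuracy, chosen as an example):

```python
# Resample the training set with replacement many times and look at the spread
# of the statistic of interest across the bootstrap replicates.
import numpy as np
from sklearn.utils import resample
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)

accs = []
for b in range(100):
    Xb, yb = resample(X, y, replace=True, random_state=b)   # one bootstrap sample
    accs.append(LogisticRegression(max_iter=1000).fit(Xb, yb).score(X, y))
print("accuracy: mean %.3f, std %.3f" % (np.mean(accs), np.std(accs)))
```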
Just keep a part of the dataset for later validation/testing
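A hold-out sketch with scikit-learn's train_test_split (the 60/20/20 split is an arbitrary example):

```python
# Keep part of the data aside: a validation set for tuning, a test set touched
# only once at the very end.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("validation accuracy:", clf.score(X_val, y_val))
```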