# CS231n Lecture 2 & 3: Image Classification Pipeline
Image = a big grid of numbers in [0, 255] (one number per pixel, per color channel)

Challenges: viewpoint/angles, illumination, deformation, occlusion (e.g. only part of a cat is visible), background clutter, intraclass variation
## Image Classifier
Data-Driven Approach
1. Collect dataset of images & labels
2. Use ML to train classifier
3. Evaluate the classifier on new (unseen) images
### Nearest Neighbor
Train: memorize all data and labels
Predict: return the label of the most similar training image, i.e. the one with the smallest distance under the chosen metric (the nearest neighbor).
#### Distance Metric
##### L1 distance (Manhattan)
$$d_1(I_1, I_2) = \sum_{p} \left| I_1^p - I_2^p \right|$$

The speed of training and prediction:
train -> O(1) (just memorize the data)
predict -> O(N) (compare against all N training images)
=> Bad: we want the opposite; slow training is fine, but prediction must be fast.
##### L2 distance (Euclidean)
$$d_2(I_1, I_2) = \sqrt{\sum_{p} \left( I_1^p - I_2^p \right)^2}$$
:::success
If the coordinate frame is rotated, L1 distances change, but L2 distances don't.
:::
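A minimal NumPy sketch of the nearest-neighbor classifier described above, using the L1 distance (the `NearestNeighbor` class and method names are illustrative):

```python
import numpy as np

class NearestNeighbor:
    def train(self, X, y):
        # "Training" is just memorizing all data and labels: O(1)
        self.X_train = X   # shape (N, D): one flattened image per row
        self.y_train = y   # shape (N,)

    def predict(self, X):
        # Prediction compares every test image against all N training images: O(N)
        y_pred = np.zeros(X.shape[0], dtype=self.y_train.dtype)
        for i in range(X.shape[0]):
            dists = np.sum(np.abs(self.X_train - X[i]), axis=1)  # L1 distances
            y_pred[i] = self.y_train[np.argmin(dists)]           # label of the closest image
        return y_pred
```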
### K-Nearest Neighbor
Finding 'K' nearest neighbors instead of a single nearest neighbor.
Instead of copying the label from the single nearest neighbor, take a **majority vote** among the K closest points.
**Never used** for classifying images in the real world:
1. Slow at test time
2. Distance metrics on raw pixels are not informative
3. Curse of dimensionality: densely covering the space requires exponentially many training points (D = 1 needs 4, D = 2 needs 4^2, D = 3 needs 4^3, ...)
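A minimal sketch of the k-NN prediction step described above (assumes integer class labels so `np.bincount` can tally the votes; the function name is illustrative):

```python
import numpy as np

def knn_predict_one(x, X_train, y_train, k=5):
    """Predict one label by majority vote over the k nearest training points."""
    dists = np.sqrt(np.sum((X_train - x) ** 2, axis=1))  # L2 distance to every training point
    nearest = np.argsort(dists)[:k]                       # indices of the k closest points
    return np.bincount(y_train[nearest]).argmax()         # most common label among them
```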
### Linear Classification
*Neural networks are like Legos: linear classifiers are the basic building blocks of the larger network.*
$$f(x, W) = Wx + b$$
b = bias term
* Data-independent preference for some classes over others
* If the dataset is unbalanced, e.g. cats >> dogs, then the bias for the cat class > the bias for the dog class

*Linear classification example: trying to draw lines (decision boundaries) to separate the categories*
:::success
Only allowed to learn one template per category (one row of W per class)
:::
#### Parametric Approach
Input: images (x) and weights (W)
Output: scores for each class
Output = $f(x, W) = Wx + b$, a vector of scores, one per class
:::success
Only the **weights** are needed at test time (the training set can be discarded) => more efficient
:::
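A minimal sketch of the parametric score function above, assuming CIFAR-10-like shapes (32x32x3 images, 10 classes); the function name is illustrative:

```python
import numpy as np

def linear_scores(x, W, b):
    """f(x, W) = Wx + b
    x: flattened image, shape (D,)   e.g. D = 32*32*3 = 3072
    W: weights, shape (C, D)         one row (template) per class
    b: biases, shape (C,)            data-independent preference per class
    """
    return W.dot(x) + b              # class scores, shape (C,)

# Usage with random data (shapes only; the values are meaningless)
D, C = 3072, 10
x = np.random.randn(D)
W = 0.01 * np.random.randn(C, D)
b = np.zeros(C)
scores = linear_scores(x, W, b)      # 10 class scores
```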
## Loss Function
*Tells how good the current classifier is; used to determine the best weights.*
### Multiclass SVM loss
Sum up, over all incorrect classes, the hinge loss against the correct class:
$$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$$
Total loss = average of $L_i$ over all training examples.


E.g. with three training examples: loss(cat image) = 2.9, loss(car image) = 0, loss(frog image) = 12.9, so total loss = (2.9 + 0 + 12.9) / 3 = 5.27
:::success
Values of the losses do not have much meaning.
:::
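A minimal sketch of the per-example hinge loss above (function name illustrative); the last line recomputes the averaged example from the text:

```python
import numpy as np

def svm_loss_one(scores, y):
    """Multiclass SVM loss for one example.
    scores: class scores, shape (C,);  y: index of the correct class
    """
    margins = np.maximum(0, scores - scores[y] + 1)  # max(0, s_j - s_y + 1)
    margins[y] = 0                                   # the correct class contributes nothing
    return margins.sum()

# The numbers from the example above: three per-example losses, averaged
print(np.mean([2.9, 0.0, 12.9]))  # 5.266...
```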
### Softmax Classifier - multinomial logistic regression
$$L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$$
*Loss = -log(probability of the correct class)*
:::success
**Values matter.** Target: get the probability of the correct class to 1 (to lower the loss).
:::
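A minimal sketch of the softmax loss for one example, with the usual max-subtraction for numerical stability (function name illustrative):

```python
import numpy as np

def softmax_loss_one(scores, y):
    """Softmax loss for one example: -log P(correct class).
    scores: class scores, shape (C,);  y: index of the correct class
    """
    shifted = scores - scores.max()                    # shift scores for numerical stability
    probs = np.exp(shifted) / np.sum(np.exp(shifted))  # softmax probabilities
    return -np.log(probs[y])                           # -> 0 as P(correct class) -> 1
```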
### Recap
:::info
A linear classifier produces scores $s = f(x, W) = Wx + b$; a loss function (SVM or softmax) measures how unhappy we are with those scores on the training data, and the total loss is the average over all training examples.
:::
## Optimization
*To find the best values of weights.*
### Random search
Randomly set the values of weights and put them in the loss function to rate the performance.
Performs badly in practice. Don't use it.
### Follow the slope
1D: Find the derivative of the function.
Multi-D:
* The **gradient** is the vector of partial derivatives along each dimension.
* Slope in any direction is the **dot product** of the direction with the gradient.
* The direction of steepest descent is the **negative gradient**.
#### Finding the gradients
* Numerical: approximate, slow, easy to write
* Analytic: exact, fast, error-prone
In practice: Always use analytic. But check with numerical (gradient check).
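A minimal sketch of a numerical gradient check: compare a slow centered-difference estimate against the analytic gradient of a toy loss (the helper name and the toy loss are illustrative):

```python
import numpy as np

def numerical_gradient(f, W, h=1e-5):
    """Approximate dL/dW with centered finite differences (slow; for checking only)."""
    grad = np.zeros_like(W)
    it = np.nditer(W, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        old = W[idx]
        W[idx] = old + h; f_plus = f(W)    # f at W + h in this coordinate
        W[idx] = old - h; f_minus = f(W)   # f at W - h
        W[idx] = old                       # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
        it.iternext()
    return grad

# Check against the analytic gradient of f(W) = sum(W^2), which is 2W
W = np.random.randn(3, 4)
num = numerical_gradient(lambda w: np.sum(w ** 2), W)
print(np.max(np.abs(num - 2 * W)))   # should be tiny (~1e-9)
```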
##### Gradient Descent
*Determine where and how far to step next.*
Learning rate (step size): a hyperparameter that determines how big each step toward lower-loss weights is.
###### Stochastic Gradient Descent (SGD)
Calculating the full sum is too expensive when the data set is huge.
Calculate approximate sum using a **minibatch** of examples instead.
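A minimal sketch of the vanilla minibatch SGD loop; `loss_and_gradient` is a hypothetical function that returns the average loss over the batch and the analytic gradient dL/dW:

```python
import numpy as np

def sgd(W, X_train, y_train, loss_and_gradient,
        learning_rate=1e-3, batch_size=256, num_steps=1000):
    for step in range(num_steps):
        # Sample a minibatch instead of summing the loss over the whole dataset
        idx = np.random.choice(X_train.shape[0], batch_size, replace=False)
        loss, dW = loss_and_gradient(W, X_train[idx], y_train[idx])
        W = W - learning_rate * dW    # step in the direction of the negative gradient
    return W
```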
##### Backpropagation
*A tool for computing gradients in complex functions.*
Split the computation into a graph of simple nodes, then compute the gradient of each node backward from the output using local derivatives and the chain rule.

*Red values: gradients*
Several nodes can be grouped into a single gate (e.g. a sigmoid gate), as long as we can write down the grouped gate's local gradient.
Functions of different gates in the backward pass:
* Add gate: gradient distributor (passes the upstream gradient unchanged to both inputs)
* Max gate: gradient router (routes the full gradient to the larger input, 0 to the other)
* Mul gate: gradient switcher (each input's gradient is the upstream gradient times the *other* input's value)
Jacobian matrix: the matrix of all partial derivatives of a vector-valued output with respect to a vector-valued input; entry (i, j) is the derivative of the i-th output with respect to the j-th input.
###### Vectorized Example
The gradient with respect to a variable should have the **same** shape as the variable.

Code implementation:

*Forward: compute the result of an operation and save any intermediates needed later.
Backward: apply the chain rule to compute the gradients.*
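A minimal sketch of that forward/backward contract for a single matrix-multiply node (class name illustrative); note how each gradient has the same shape as the variable it corresponds to:

```python
import numpy as np

class MultiplyGate:
    """One node in the computation graph: out = W.dot(x)."""
    def forward(self, W, x):
        self.W, self.x = W, x          # save intermediates for the backward pass
        return W.dot(x)                # output, shape (C,)

    def backward(self, dout):
        # dout: upstream gradient dL/d(out), shape (C,)
        dW = np.outer(dout, self.x)    # dL/dW, same shape as W: (C, D)
        dx = self.W.T.dot(dout)        # dL/dx, same shape as x: (D,)
        return dW, dx
```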
## Hyperparameters
Q: How to choose the best hyperparameters (e.g. value of K, distance to use)?
A: Try them all and choose the best.
Split the data into training, validation and testing sets.
Only touch the test set (the unseen data) once, at the very end.
### Cross-Validation
Split the training data into folds and rotate which fold serves as the validation set.

*5-fold cross validation*
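A minimal sketch of k-fold cross-validation; `evaluate` is a hypothetical callback (train on the given folds, return validation accuracy) so the sketch stays classifier-agnostic:

```python
import numpy as np

def cross_validate(X, y, hyperparams, evaluate, num_folds=5):
    """Return the hyperparameter value with the best mean validation accuracy."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    best, best_acc = None, -1.0
    for h in hyperparams:
        accs = []
        for i in range(num_folds):
            # Fold i is the validation set this round; the rest is training data
            X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
            y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
            accs.append(evaluate(h, X_tr, y_tr, X_folds[i], y_folds[i]))
        if np.mean(accs) > best_acc:
            best, best_acc = h, np.mean(accs)
    return best
```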
## Over-Fitting
Occam's Razor: among competing models, prefer the simplest one (to prevent overfitting).


*Blue line: overfitted. Green line: simpler fit, with regularization added.*
:::success
Do not overfit the training data; the goal of ML is to predict unseen data (the test data) well.
:::
### Regularizations
*A penalty on model complexity, added to the loss*
#### L1 regularization: $R(W) = \sum_k \sum_l |W_{k,l}|$
#### L2 regularization: $R(W) = \sum_k \sum_l W_{k,l}^2$
#### Elastic net (L1 + L2): $R(W) = \sum_k \sum_l \left(\beta W_{k,l}^2 + |W_{k,l}|\right)$
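A minimal sketch of these penalties and of adding one to the data loss (the regularization strength `reg` is a hyperparameter; names illustrative):

```python
import numpy as np

def l1_reg(W):
    return np.sum(np.abs(W))                 # R(W) = sum |W|

def l2_reg(W):
    return np.sum(W ** 2)                    # R(W) = sum W^2

def elastic_net(W, beta=0.5):
    return beta * l2_reg(W) + l1_reg(W)      # R(W) = sum (beta * W^2 + |W|)

def total_loss(data_loss, W, reg=1e-4):
    # Full objective: data loss + lambda * R(W); here with the L2 penalty
    return data_loss + reg * l2_reg(W)
```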
#### Max norm regularization
#### Dropout
## Visualizing Features

*E.g.1 Color Histogram*
***

*E.g.2 Histogram of Oriented Gradients (HoG)*
***

*E.g.3 Bag of Words*