# A Closer Look at Few-shot Classification
## Experimental Setup
- addresses the few-shot classification problem under three scenarios:
(1) generic object recognition (uses mini-ImageNet)
(2) fine-grained image classification (uses CUB)
(3) cross-domain adaptation (mini-ImageNet base classes, CUB novel classes)
- datasets:
(1) mini-ImageNet (64 base, 16 validation, and 20 novel classes, following the split of Ravi & Larochelle, 2017)
(2) CUB-200-2011 (100 base, 50 validation, and 50 novel classes, a split proposed by the authors)
(3) cross-domain: mini-ImageNet for base classes, CUB for validation and novel classes (50 validation, 50 novel)
## Implementation Details
Two groups of methods are compared: the baseline and baseline++ classifiers, which are trained with standard supervised learning, and the meta-learning approaches, which are trained episodically.
**Training**
- baseline and baseline++: 400 epochs, batch size 16
- meta-training: 60,000 episodes for 1-shot tasks, 40,000 episodes for 5-shot tasks
- the validation set is used to select the training episodes with the best accuracy
- in each episode, N classes are sampled to form an N-way classification task.
- for each class, k labeled instances are drawn (hence a k-shot task): k instances go to the support set and 16 to the query set (see the sketch below).
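A minimal sketch of this episodic sampling in NumPy (the `sample_episode` name and the `{class_id: [examples]}` data layout are illustrative, not from the paper's code):

```python
import numpy as np

def sample_episode(dataset, n_way=5, k_shot=5, n_query=16, rng=None):
    """Sample one N-way k-shot episode from a {class_id: [examples]} dict."""
    rng = rng or np.random.default_rng()
    classes = rng.choice(list(dataset.keys()), size=n_way, replace=False)
    support, query = [], []
    for label, cls in enumerate(classes):
        # draw k_shot support + n_query query instances of this class, no overlap
        idx = rng.choice(len(dataset[cls]), size=k_shot + n_query, replace=False)
        picked = [dataset[cls][i] for i in idx]
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query
```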
**Testing**
- a total of 600 test tasks are sampled; each draws 5 random novel classes, with k instances per class for the support set and 16 per class for the query set (see the sketch below).
- for baseline and baseline++, the entire support set is used to train a new classifier for the novel classes (100 iterations, batch size 4).
- for the meta-learning methods, the classification model is conditioned on the support set.
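The test protocol could be sketched as follows, reusing `sample_episode` from above; `evaluate_one_task` is a hypothetical stand-in for whichever method is under test, and the 95% confidence interval matches how such results are usually reported:

```python
import numpy as np

def test_protocol(dataset, evaluate_one_task, n_trials=600):
    """Mean accuracy and 95% confidence interval over 600 sampled tasks."""
    accs = np.array([
        evaluate_one_task(*sample_episode(dataset, n_way=5, k_shot=5, n_query=16))
        for _ in range(n_trials)
    ])
    ci95 = 1.96 * accs.std() / np.sqrt(n_trials)  # 95% CI of the mean accuracy
    return accs.mean(), ci95
```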
**Common Settings**
- all methods are trained from scratch with Adam and a learning rate of 10^-3
- standard data augmentation is applied in both the training and meta-training stages: random crop, left-right flip, and color jitter (a sketch follows below)
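In PyTorch/torchvision terms, the common settings might look like this (a sketch: the jitter strengths are assumed, and the Conv-4 block layout follows the standard few-shot recipe rather than the authors' exact code):

```python
import torch
import torch.nn as nn
from torchvision import transforms

# standard augmentation, applied in both the training and meta-training stages
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(84),       # random crop to the 84x84 input size
    transforms.ColorJitter(0.4, 0.4, 0.4),  # color jitter (strengths assumed)
    transforms.RandomHorizontalFlip(),      # left-right flip
    transforms.ToTensor(),
])

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(), nn.MaxPool2d(2))

# Conv-4 backbone: four identical 64-channel conv blocks
backbone = nn.Sequential(conv_block(3, 64), conv_block(64, 64),
                         conv_block(64, 64), conv_block(64, 64), nn.Flatten())

# all methods are trained from scratch with Adam at lr = 1e-3
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-3)
```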
## Evaluation using the standard setting
- few-shot classification setting (1-shot and 5-shot)
- four-layer convolutional backbone (Conv-4)
- input size is 84x84
- during meta-testing, 5-way classification is performed
- the authors use 5-way episodes for both 1-shot and 5-shot, whereas the official results of some methods use 30-way for 1-shot and 20-way for 5-shot
- the authors point out that their re-implementations differ from the originally reported results by no more than 2%, due to slight modifications made for consistency (e.g. using the same optimizer for all methods)
- under the 5-way setting, all models are trained with data augmentation, in both the 1-shot and 5-shot settings, with a Conv-4 backbone
- **baseline++** turns out to be competitive with the meta-learning methods.
- **a major finding: by reducing intra-class variation, even the baseline can achieve high accuracy in the few-shot classification setting** (see the classifier-head sketch below).
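For context, the only difference between the two baselines is the classification head: baseline uses a plain linear layer, while baseline++ replaces it with a cosine-similarity head, which is what reduces intra-class variation. A minimal PyTorch sketch (the scale factor and the 1600-d Conv-4 feature size are assumptions of this sketch):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """baseline++ head: logits are scaled cosine similarities between the
    feature vector and one learned weight vector per class."""
    def __init__(self, feat_dim, n_classes, scale=10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, feat_dim))
        self.scale = scale  # assumed value; re-implementations fix or learn it

    def forward(self, x):
        x = F.normalize(x, dim=-1)            # unit-norm features
        w = F.normalize(self.weight, dim=-1)  # unit-norm class weights
        return self.scale * (x @ w.t())       # cosine-similarity logits

feat_dim, n_classes = 1600, 64  # Conv-4 features on 84x84 inputs; 64 base classes
baseline_head = nn.Linear(feat_dim, n_classes)       # baseline: plain linear layer
baselinepp_head = CosineClassifier(feat_dim, n_classes)
```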
## Evaluation on backbone depth, cross-domain setting (mini-ImageNet → CUB), and further adaptation
### deeper backbone
- the gap between existing methods shrinks when deeper backbones reduce their intra-class variation.
- with deeper backbones, some meta-learning models actually perform worse than the proposed baseline++.
- on mini-ImageNet, some meta-learning models also perform poorly compared to the baseline, attributed to the domain differences between base and novel classes (a backbone-swap sketch follows below).
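A sketch of swapping in a deeper backbone via torchvision (the function name and ResNet choices here are illustrative; `weights=None` trains from scratch, consistent with the common settings above):

```python
import torch.nn as nn
from torchvision import models

def build_backbone(name: str = "resnet18") -> nn.Module:
    """Return a deeper feature extractor to swap in for Conv-4."""
    ctor = {"resnet18": models.resnet18, "resnet34": models.resnet34}[name]
    net = ctor(weights=None)  # weights=None: train from scratch, as above
    net.fc = nn.Identity()    # expose 512-d features instead of ImageNet logits
    return net
```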
### cross-domain shift scenario
- the baseline actually outperforms all meta-learning methods when the domain difference is large.
- in this case, the authors hypothesize that there may be a trade-off between reducing intra-class variation and cross-domain adaptability.
### effect of further adaptation
- the authors adapt the meta-learning models by **fixing the features and training a new softmax classifier** on the support set (a simple adaptation scheme; a sketch follows below).
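A minimal sketch of that adaptation scheme, with hypothetical names: freeze the meta-learned backbone and fit a fresh softmax classifier on the novel support set (100 iterations, batch size 4, mirroring the testing details above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def further_adapt(backbone, support_x, support_y, n_way=5,
                  n_iter=100, batch_size=4):
    """Freeze the meta-learned features, fit a new softmax classifier."""
    for p in backbone.parameters():
        p.requires_grad = False  # features stay fixed
    backbone.eval()
    with torch.no_grad():
        feats = backbone(support_x)  # precompute support-set features

    clf = nn.Linear(feats.shape[1], n_way)
    opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
    for _ in range(n_iter):
        idx = torch.randperm(len(feats))[:batch_size]
        loss = F.cross_entropy(clf(feats[idx]), support_y[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()
    return clf
```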
## Conclusion
- in summary, the authors conducted two further experiments: deeper backbones and cross-domain adaptation.
- the first experiment shows that deeper backbones reduce intra-class variation, which explains why the baselines outperform some meta-learning models.
- the second experiment suggests that reducing intra-class variation can come at the cost of adaptability: when the meta-learning models are given the simple adaptation scheme at test time, they produce better results than the baselines.
- therefore, the authors conclude that there is a trade-off between intra-class variation and adaptability.
- the authors also observe that naively combining meta-learning with this fixed-feature adaptation leads to an **inconsistency** between the meta-training and meta-testing conditions, which explains why ProtoNet's performance degrades when the domain difference is small.
- **therefore, learning to adapt to domain differences in the meta-training stage is important**.
- the authors suggest this as a promising direction for future few-shot classification research: models need to **learn to adapt**.