# A Closer Look at Few-Shot Classification

## Experimental Setup

- Addresses the few-shot classification problem under three scenarios:
  1. generic object recognition (uses mini-ImageNet)
  2. fine-grained image classification (uses CUB)
  3. cross-domain adaptation (mini-ImageNet → CUB)
- Datasets:
  1. mini-ImageNet: 64 base, 16 validation, and 20 novel classes, following the split suggested by Ravi & Larochelle (2017); used for scenario (1)
  2. CUB-200-2011: 100 base, 50 validation, and 50 novel classes, as suggested by the authors; used for scenario (2)
  3. cross-domain: mini-ImageNet for the base classes, CUB for the validation and novel classes (50 validation and 50 novel classes)

## Implementation Details

Two groups of methods are compared: first the baseline and baseline++, and second the meta-learning approaches.

**Training**

- baseline, baseline++: 400 epochs with a batch size of 16
- meta-training: 60,000 episodes for 1-shot tasks, 40,000 episodes for 5-shot tasks
- the validation set is used to select the best training episode (checkpoint)
- in each episode, N classes are sampled to form an N-way classification task
- for each class, k labeled instances are picked (hence a k-shot task): k instances for the support set and 16 instances for the query set

**Testing**

- a total of 600 experiments; each experiment samples 5 random novel classes, with k instances per class for the support set and 16 instances per class for the query set
- for baseline and baseline++, a new classification model is trained on the entire support set of each experiment (100 iterations with a batch size of 4)
- for the meta-learning methods, the classification model is conditioned on the support set at test time

**Common Settings**

- all methods are trained from scratch with Adam and a learning rate of 10^-3
- standard data augmentation is applied: random crop, left-right flip, and color jitter (in both the training and meta-training stages)
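To make the episode protocol above concrete, here is a minimal sketch of N-way k-shot episode sampling. It is an illustration under my own naming (`sample_episode` and its arguments are not from the paper's code), assuming every class has at least k + 16 instances:

```python
import numpy as np

def sample_episode(labels, n_way=5, k_shot=5, n_query=16, rng=None):
    """Sample one N-way k-shot episode: for each of n_way sampled classes,
    pick k_shot support instances and n_query query instances.

    labels: 1-D array of integer class labels, one per image index.
    Returns (support_idx, query_idx) index arrays of shape
    (n_way, k_shot) and (n_way, n_query).
    """
    rng = rng or np.random.default_rng()
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support_idx, query_idx = [], []
    for c in classes:
        pool = rng.permutation(np.flatnonzero(labels == c))
        support_idx.append(pool[:k_shot])                # k labeled support instances
        query_idx.append(pool[k_shot:k_shot + n_query])  # 16 query instances
    return np.stack(support_idx), np.stack(query_idx)
```

At meta-test time, the same sampler would be called 600 times over the novel classes with `n_way=5` to mirror the evaluation protocol described above.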
## Evaluation using the standard setting

- few-shot classification setting (1-shot and 5-shot)
- four-layer convolutional backbone (Conv-4)
- input size is 84×84
- during meta-testing, 5-way classification is performed
- the authors use 5-way for both one-shot and five-shot, whereas the official results use 30-way for one-shot and 20-way for five-shot
- the authors point out that their re-implementations differ from the original results by no more than 2%; the differences are due to slight modifications of the models (e.g., using the same optimizer for all methods for consistency)
- under the 5-way setting, all models are trained with data augmentation, in the 1-shot and 5-shot settings, with a Conv-4 backbone
- **baseline++** turns out to be competitive with the meta-learning methods
- **a major finding is that by reducing intra-class variation, a baseline can also achieve higher accuracy in the few-shot classification setting**

## Evaluation on backbone depth, cross-domain setting (mini-ImageNet → CUB), and further adaptation

### Deeper backbone

- the gap between existing methods shrinks when their intra-class variation is reduced by deeper backbones
- some meta-learning models turn out to be worse than the proposed baseline++ with deeper backbones
- some meta-learning models also perform poorly compared to the baseline on the CUB dataset (caused by domain shift between base and novel classes)

### Cross-domain shift scenario

- the baseline actually outperforms all meta-learning models when the domain difference is large
- in this case, the authors hypothesize that there may be a trade-off between reducing intra-class variation and cross-domain adaptability

### Effect of further adaptation

- the authors adapt the meta-learning models by **fixing the features and training a softmax classifier** on the support set (a simple adaptation scheme); a sketch follows at the end of these notes

## Conclusion

- in summary, the authors conduct two experiments: deeper backbones and domain adaptation
- the first experiment shows that deeper backbones reduce intra-class variation, which explains why the baselines outperform some meta-learning models
- the second experiment shows that reducing intra-class variation may make adaptability poorer; this is demonstrated by training the meta-learning models with the simple adaptation scheme, which produces better results than the baselines
- therefore, the authors conclude that there is a trade-off between intra-class variation and adaptability
- they also observe that combining meta-learning with such adaptation introduces an **inconsistency** between meta-training and test-time conditions, which explains why ProtoNet's performance degrades when the domain difference is small
- **therefore, learning to adapt to different domains in the meta-training stage is important**
- the authors suggest this as a potential research direction in few-shot classification: models need to **learn to adapt**
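As promised above, here is a minimal PyTorch sketch of the further-adaptation scheme: freeze the meta-learned feature extractor and fit only a softmax (linear) classifier on the novel support set. The function name `adapt_on_support` and its arguments are illustrative, not the authors' code:

```python
import torch
import torch.nn as nn

def adapt_on_support(encoder, support_x, support_y, n_way=5,
                     iters=100, lr=1e-3):
    """Fix the features and train a softmax classifier on the support set.

    encoder:   meta-learned feature extractor (kept frozen)
    support_x: support images, shape (n_way * k_shot, C, H, W)
    support_y: class indices in [0, n_way), shape (n_way * k_shot,)
    """
    encoder.eval()
    with torch.no_grad():                  # features are fixed
        feats = encoder(support_x)         # (n_way * k_shot, feat_dim)
    clf = nn.Linear(feats.size(1), n_way)  # softmax classifier head
    opt = torch.optim.Adam(clf.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()        # applies softmax internally
    for _ in range(iters):
        opt.zero_grad()
        loss = loss_fn(clf(feats), support_y)
        loss.backward()
        opt.step()
    return clf  # query predictions: clf(encoder(query_x)).argmax(dim=1)
```

Because only the small linear head is trained, this adaptation is cheap per task, which matches the 100-iteration per-experiment budget described in the testing protocol.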