# Hallucination improves object detection
## Introduction
### Motivation
- in the extremely low-shot setting, the lack of data variation is a problem, especially for novel classes.
- the RPN is a good starting point since it finds the most promising regions (the highest-IoU boxes), but in a low-shot setting there is simply not enough data to cover the within-class variation.
- **in this work, the author trains a network to transfer shared within-class variation from base classes to novel classes**
- one reason is that shared variation is hard to encode in the RPN.
- this paper proposes a hallucinator at the RoI head (**after the RPN**) that generates examples in the **RoI feature space** (the features of the RoI regions/boxes/proposals produced by the RPN)
- this can be seen as a form of data augmentation for building a better classifier.
### Contributions
- contributions are three-fold
(1) the author explores the problem that arises from the lack of within-class variation in training data in the few-shot learning setting.
(2) the author proposes a hallucinator that transfers shared modes of within-class variation from base classes to novel classes, generating data for novel classes.
(3) the author claims that the proposed model outperforms TFA (from the Frustratingly Simple Few-Shot Object Detection paper) in the low-shot setting.
- the author also claims that this work is the first to show the effectiveness of hallucination for few-shot object detection.
## Related Work
### Object detection
- author defines two main groups: (1) serial (2) parallel object detection networks
- serial detectors first generate promising RoIs and then feed each proposal box to a classifier that predicts whether the region contains an object
### Few-shot object detection
There are several lines of work under this paradigm:
- learning better feature representations through **metric learning**
- modified fine-tuning techniques
- **meta-learning techniques**
- techniques to improve the region-proposal generation process via **attention mechanisms and class-aware features**
- additional information such as **semantic relations** and **multi-scale representations**
### Data Hallucination
- Most work focuses on classification tasks and the learned feature space.
- learning from base classes the shared feature transformations to generate novel class features.
- pairwise deformations between examples of the same class.
- combined meta-learner and a hallucinator.
## Approach
- The author builds their model on two existing state-of-the-art baselines: (1) TFA and (2) CoRPN.
### TFA
- TFA is a two-stage fine-tuning few-shot detector: train on base classes and then fine-tune on novel classes (blue area means trainable and grey area means frozen)

- It is built on top of the Faster R-CNN baseline but uses a cosine-similarity classifier to reduce intra-class variance in the few-shot setting.
- It has a ResNet-101 backbone pretrained on ImageNet with a feature pyramid network.
- The training procedure: in the first stage, the detector is trained on base-class instances; in the second stage, it is fine-tuned on novel-class instances, where only the box regressor and classifier are trained while the rest of the network is kept frozen.
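The cosine-similarity classifier can be sketched in NumPy; the scale factor `alpha` and the dimensions below are illustrative, not the paper's settings:

```python
import numpy as np

def cosine_logits(features, weights, alpha=20.0):
    """Scaled cosine-similarity logits.

    features: (N, d) RoI features; weights: (C, d) per-class weight vectors.
    Returns (N, C) logits alpha * cos(f_i, w_c).
    """
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    w = weights / (np.linalg.norm(weights, axis=1, keepdims=True) + 1e-8)
    return alpha * f @ w.T

# A feature aligned with class 0's weight scores highest for class 0,
# regardless of the vectors' magnitudes.
feats = np.array([[1.0, 0.0], [0.0, 2.0]])
W = np.array([[2.0, 0.0], [0.0, 1.0]])
logits = cosine_logits(feats, W)
```

Normalizing both features and weights is what suppresses the intra-class magnitude variation that an ordinary dot-product classifier would be sensitive to.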
### CoRPN

- CoRPN has exactly the same architecture and training procedure as TFA, except for the **proposal generation procedure**.
- CoRPN has multiple RPNs, where each RPN predicts RoIs (high-IoU boxes), so that if one RPN misses a box the others can still capture it.
- The loss of CoRPN is as follows (the weights on the auxiliary terms are hyperparameters):

L = L_RPN + λ_div · L_div + λ_coop · L_coop

where L_div is the divergence loss and L_coop is the cooperation loss. L_div encourages the RPNs to be different, while L_coop encourages the RPNs to cooperate by **setting a lower bound on the RPN responses (predictions) to boxes (anchors)**
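The two auxiliary terms can be sketched in NumPy. This is an illustrative reconstruction of the idea described above (a pairwise-similarity penalty plus a hinge lower bound on the best response), not the paper's exact formulation:

```python
import numpy as np

def corpn_aux_losses(scores, fg_mask, tau=0.5):
    """Schematic divergence / cooperation terms for multiple RPNs.

    scores: (R, A) objectness scores from R RPNs over A anchors, in [0, 1].
    fg_mask: (A,) boolean mask of foreground (high-IoU) anchors.
    """
    R = scores.shape[0]
    # Divergence: penalize similarity between pairs of RPN score vectors,
    # pushing the RPNs to respond differently from each other.
    l_div, pairs = 0.0, 0
    for i in range(R):
        for j in range(i + 1, R):
            l_div += np.mean(scores[i] * scores[j])
            pairs += 1
    l_div /= max(pairs, 1)
    # Cooperation: for each foreground anchor, at least one RPN should
    # respond above the lower bound tau (hinge on the best response).
    best = scores[:, fg_mask].max(axis=0)
    l_coop = np.mean(np.maximum(0.0, tau - best))
    return l_div, l_coop

scores = np.array([[0.9, 0.1, 0.8],
                   [0.2, 0.7, 0.1]])
fg = np.array([True, True, False])
l_div, l_coop = corpn_aux_losses(scores, fg)
```

In this toy case each foreground anchor already has some RPN responding above `tau`, so the cooperation term vanishes while the divergence term stays small because the two RPNs fire on different anchors.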


### The hallucinator model
- as can be seen, the hallucinator is placed after the "box head" (the RoI feature extractor), but it does not contribute any outputs to the box regressor.
- so it can be seen as a way of augmenting the variation of RoI features.
- the hallucinated examples are then appended to the original RoI training examples to train the box classifier.
- **only the original examples are used to train the box regressor, not the hallucinated examples**
- the hallucinator is a two-layer MLP with ReLU. The input size is three times the feature size, and the output size of each linear layer equals the feature size.

- the hallucinator (parameterized by φ) takes as inputs a class prototype, a seed example, and a noise vector.
- the prototypes are used to capture global category information. This can also be seen as **a form of regularization**, preventing the hallucinator from simply copying the seed examples.
- during training, the prototypes computed at the base-class training stage and the novel-class fine-tuning stage differ. All base-class examples are used to compute the base-class prototypes **before training the hallucinator**, and these base-class prototypes are not updated during training.
- however, the novel-class prototypes are updated dynamically using both training and hallucinated examples while training the classifier (prototypes are updated whenever a new hallucinated example is generated).
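A minimal NumPy sketch of the hallucinator forward pass and a dynamic prototype update, under the dimensions described above (the weight initialization and the running-mean update rule are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # feature size (illustrative)

# Two-layer MLP with ReLU: the input is [prototype; seed; noise] (3d),
# and each linear layer outputs a d-dimensional vector, per the notes above.
W1 = rng.normal(scale=0.1, size=(3 * d, d))
W2 = rng.normal(scale=0.1, size=(d, d))

def hallucinate(prototype, seed, noise):
    x = np.concatenate([prototype, seed, noise])  # (3d,)
    h = np.maximum(0.0, x @ W1)                   # ReLU hidden layer
    return h @ W2                                 # hallucinated feature (d,)

def update_prototype(prototype, count, new_example):
    """Running-mean prototype update when a new (real or hallucinated)
    example of the class arrives -- an illustrative update rule."""
    count += 1
    prototype = prototype + (new_example - prototype) / count
    return prototype, count

proto = rng.normal(size=d)
seed = rng.normal(size=d)
fake = hallucinate(proto, seed, rng.normal(size=d))
proto2, n = update_prototype(proto, 1, fake)
```

The running mean mimics "prototypes are updated whenever a new hallucinated example is generated": after the update, the novel-class prototype is the average of the old prototype and the new example.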

- the loss is computed using the pretrained classifier weights (w_yi and w_k) and the seed example x_i of category c_k.
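One plausible shape for this objective, assuming a standard cross-entropy under the fixed pretrained classifier (a sketch of the idea, not necessarily the paper's exact formulation):

```latex
\mathcal{L}_{\mathrm{hal}}(\phi) =
  -\log \frac{\exp\!\big(w_{c_k}^{\top} h\big)}
             {\sum_{j} \exp\!\big(w_{j}^{\top} h\big)},
\qquad h = H_{\phi}\big(p_{c_k},\, x_i,\, z\big)
```

where $H_\phi$ is the hallucinator, $p_{c_k}$ the class prototype, $x_i$ the seed example, and $z$ the noise vector; the classifier weights $w$ stay frozen so the gradient only shapes the hallucinator.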

#### Training style
- Iterative training as opposed to end-to-end joint training: **EM-style training**.
- randomly sample proposals as seed examples to feed into the hallucinator.
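The EM-style alternation can be sketched schematically; the `train_*` functions below are placeholders standing in for real optimization steps:

```python
# Schematic of the EM-style alternation: the classifier and the
# hallucinator are updated in turns, never jointly.
def train_classifier(clf_state, hal_state):
    # Placeholder for training the classifier with the hallucinator fixed.
    return clf_state + 1

def train_hallucinator(hal_state, clf_state):
    # Placeholder for training the hallucinator with the classifier fixed.
    return hal_state + 1

def em_style_training(rounds):
    clf, hal = 0, 0
    history = []
    for _ in range(rounds):
        clf = train_classifier(clf, hal)    # "E-like" step: hallucinator fixed
        hal = train_hallucinator(hal, clf)  # "M-like" step: classifier fixed
        history.append((clf, hal))
    return history

hist = em_style_training(3)
```

The point of the alternation is that each module always trains against a frozen, self-consistent counterpart, avoiding the instability of joint end-to-end optimization.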
#### Actual training procedure

Training on base classes
- First, train a plain detector (without the hallucinator) on the base classes, then train the hallucinator in this pretrained RoI feature space (guided by the pretrained classifier).
Fine-tuning on novel classes
- initially, a batch consists of an imbalanced set of positive and negative examples, with negative examples being the majority.
- First, generate hallucinated examples of the novel classes using the trained hallucinator, then randomly replace background examples with the hallucinator's outputs to obtain a **refined training batch**.
- Then, the classifier is trained again on the refined batch with hallucinated examples.
- Next, the hallucinator is fine-tuned using the classifier, then the classifier is fine-tuned using the updated hallucinator, and so on, alternating between the two.
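The batch-refinement step can be sketched as follows; the function name, labels, and batch layout are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def refine_batch(labels, features, hallucinated, bg_label=0):
    """Randomly replace background RoI examples with hallucinated
    novel-class (label, feature) pairs, as described above."""
    labels = labels.copy()
    features = features.copy()
    bg_idx = np.flatnonzero(labels == bg_label)
    k = min(len(hallucinated), len(bg_idx))
    chosen = rng.choice(bg_idx, size=k, replace=False)
    for slot, (lab, feat) in zip(chosen, hallucinated[:k]):
        labels[slot] = lab
        features[slot] = feat
    return labels, features

# Toy batch: mostly background (label 0), one positive (label 1).
labels = np.array([0, 0, 0, 0, 1])
feats = np.zeros((5, 3))
halluc = [(2, np.ones(3)), (2, np.ones(3))]  # two hallucinated novel examples
new_labels, new_feats = refine_batch(labels, feats, halluc)
```

Replacing background slots (rather than growing the batch) keeps the batch size fixed while shifting the foreground/background ratio toward the underrepresented novel classes.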
## Evaluation
- both the baselines and the proposed models are (Cb + Cn)-way few-shot detectors (base plus novel classes).
- the **standard evaluation procedure** used is based on the Frustratingly Simple Few-Shot Object Detection paper.
- **ground-truth boxes are included among the training examples at the RoI head.**
- the fine-tuning stages on PASCAL VOC and COCO differ. For PASCAL VOC, the classifier is trained on a balanced dataset with both base and novel classes, whereas for COCO a Cn-way classifier is first trained on the novel classes and is then trained (Cb + Cn)-way on a balanced few-shot dataset.