Learning Deep Features for Discriminative Localization
====
> Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba
Computer Science and Artificial Intelligence Laboratory
> MIT
[Back to table of contents](https://hackmd.io/@marmot0814/ryzDejzGr)
# Abstract
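1. Revisit the GAP and show that it gives a CNN localization ability even when trained only on image-level labels
- GAP (global average pooling layer)
2. Achieve 37.1% top-5 error for object localization on ILSVRC 2014 without training on any bounding box annotation
- ILSVRC (ImageNet Large Scale Visual Recognition Challenge)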
# Introduction
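1. Convolutional layers can localize objects, but this ability is lost when fully-connected layers are used for classification.
- Network in Network and GoogLeNet avoid fully-connected layers.
- This minimizes the number of parameters while maintaining high performance.
2. GAP (known as a kind of structural regularizer) does not simply act as a regularizer: it also lets the network keep its localization ability until the final layer.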
3. This approach can be easily transferred to other recognition datasets for generic classification, localization and concept discovery.
- Achieves 37.1% top-5 test error for weakly supervised object localization on ILSVRC, close to the 34.2% achieved by the fully supervised AlexNet.
## Related Work
# Class Activation Mapping


# Weakly-supervised Object Localization
## Setup
- Use AlexNet, VGGnet and GoogLeNet as base networks and build a GAP version of each (written *-GAP).
1. Remove the tail of the network (the fully-connected layers and the softmax).
2. Append, in order:
- a convolutional layer: 3 × 3, stride 1, padding 1, with 1024 units
- a GAP layer
- a softmax layer
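
The head described above (convolutional feature maps, then GAP, then softmax) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function name `gap_head_forward`, the 14 x 14 feature-map size, the 5 classes, and the absence of a bias term are all assumptions made for the example. It also computes the class activation map as the weighted sum of the feature maps, using the same weights that feed the softmax.

```python
import numpy as np

def gap_head_forward(feature_maps, class_weights):
    """Forward pass of a GAP + softmax head (illustrative sketch, no bias).

    feature_maps:  array of shape (K, H, W), output of the last conv layer.
    class_weights: array of shape (C, K), weights of the softmax layer.
    Returns (probs, scores, cams) with shapes (C,), (C,), (C, H, W).
    """
    # Global average pooling: one scalar per feature map.
    gap = feature_maps.mean(axis=(1, 2))                      # (K,)
    # Linear class scores computed from the pooled features.
    scores = class_weights @ gap                              # (C,)
    # Numerically stable softmax over the class scores.
    exp = np.exp(scores - scores.max())
    probs = exp / exp.sum()
    # Class activation maps: weighted sum of the feature maps per class.
    cams = np.tensordot(class_weights, feature_maps, axes=([1], [0]))  # (C, H, W)
    return probs, scores, cams

# Hypothetical sizes: 1024 feature maps of 14 x 14, 5 classes.
rng = np.random.default_rng(0)
feature_maps = rng.random((1024, 14, 14))
class_weights = rng.standard_normal((5, 1024))
probs, scores, cams = gap_head_forward(feature_maps, class_weights)
```

Because pooling and the weighted sum are both linear, the spatial average of each map equals the corresponding class score in this sketch, which is why the map can be read as showing where the evidence for a class comes from.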
## Results
1. Classification

- The GAP versions classify about as well as the original networks.
2. Localization

- Weak supervision still has a long way to go compared with fully supervised localization.
3. Generic Localization
- Compare the feature-extraction ability of the original network and of its GAP version by training a linear SVM on the extracted features.
- The results are similar.
- Test the localization ability under weak supervision.
- The network can still find the position of the object.
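
To compare against ground-truth boxes, a heat map has to be turned into a bounding box. A simple approach is to threshold the map at a fraction of its maximum and take the box around the largest connected region. The sketch below assumes this approach; the function name `cam_to_bbox`, the default threshold of 20% of the maximum, and the use of 4-connectivity are choices made for this example and should be treated as tunable assumptions.

```python
from collections import deque

import numpy as np

def cam_to_bbox(cam, rel_threshold=0.2):
    """Bounding box of the largest 4-connected region above a threshold.

    cam: 2-D array (heat map). Returns (row_min, col_min, row_max, col_max),
    inclusive, or None if no pixel exceeds rel_threshold * cam.max().
    """
    mask = cam > rel_threshold * cam.max()
    visited = np.zeros(mask.shape, dtype=bool)
    height, width = mask.shape
    best = []
    for r0 in range(height):
        for c0 in range(width):
            if not mask[r0, c0] or visited[r0, c0]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(r0, c0)])
            visited[r0, c0] = True
            while queue:
                r, c = queue.popleft()
                component.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    inside = 0 <= nr < height and 0 <= nc < width
                    if inside and mask[nr, nc] and not visited[nr, nc]:
                        visited[nr, nc] = True
                        queue.append((nr, nc))
            if len(component) > len(best):
                best = component
    if not best:
        return None
    rows, cols = zip(*best)
    return min(rows), min(cols), max(rows), max(cols)

# Toy heat map: one 3 x 3 hot block and one isolated hot pixel.
cam = np.zeros((6, 6))
cam[1:4, 2:5] = 1.0
cam[5, 0] = 0.9
bbox = cam_to_bbox(cam)  # box around the 3 x 3 block
```

The isolated pixel passes the threshold but is ignored because its component is smaller than the block.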
# Deep Features for Generic Localization
## Fine-grained Recognition
- with bounding box
## Pattern Discovery
- Given a set of images containing a common concept, test whether the network can find the important regions in these images.
1. How to identify the important regions
1. Train a linear SVM on the GAP-layer features of the GoogLeNet-GAP network, using only image-level labels. Then use the SVM weights and the feature maps before GAP to construct the CAM that identifies the important regions.
2. Experiments
1. Discovering informative objects in scenes
2. Concept localization in weakly labeled images
3. Weakly supervised text detector
- positive set: images with text
- negative set: images without text
4. Interpreting visual question answering (???)
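
The pattern-discovery procedure above (a linear SVM on GAP features, then a CAM built from the SVM weights) can be sketched as follows. The SVM is assumed to be already trained; `svm_weights` is a hypothetical weight vector with one entry per feature map, and `svm_cam` is a name invented for this example. The nearest-neighbour upsampling and the rescaling to [0, 1] are simplifications chosen here for display purposes, not details taken from the notes.

```python
import numpy as np

def svm_cam(feature_maps, svm_weights, image_size):
    """Heat map for one concept from linear-SVM weights (illustrative sketch).

    feature_maps: array of shape (K, H, W), conv features before GAP.
    svm_weights:  array of shape (K,), weights of a linear SVM trained on
                  the GAP features (assumed given).
    image_size:   (rows, cols), assumed to be integer multiples of H and W.
    Returns an array of shape image_size with values in [0, 1].
    """
    # Weighted sum of the feature maps with the SVM weights.
    cam = np.tensordot(svm_weights, feature_maps, axes=([0], [0]))  # (H, W)
    h, w = cam.shape
    ry, rx = image_size[0] // h, image_size[1] // w
    # Nearest-neighbour upsampling: repeat each cell as an ry x rx block.
    cam = np.kron(cam, np.ones((ry, rx)))
    # Rescale to [0, 1] so the map can be overlaid on the image.
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam

# Hypothetical sizes: 8 feature maps of 7 x 7, image of 224 x 224.
rng = np.random.default_rng(0)
feature_maps = rng.random((8, 7, 7))
svm_weights = rng.standard_normal(8)
heat_map = svm_cam(feature_maps, svm_weights, (224, 224))
```

High values in `heat_map` mark the regions that push the SVM decision toward the concept, which is what the experiments above inspect.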
# Visualizing Class-Specific Units

# Conclusion
1. CAM for CNNs with global average pooling.
- Makes it possible to visualize a heat map over a given image.
2. Weak supervision (image-level labels only) is enough for the network to localize objects.