Learning Deep Features for Discriminative Localization

Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba
Computer Science and Artificial Intelligence Laboratory
MIT

Abstract

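  1. Revisits the GAP layer and shows that it lets a CNN localize objects even when trained only on image-level labels
    • GAP: global average pooling layer
  2. Achieves 37.1% top-5 error for object localization on ILSVRC 2014
    • ILSVRC: ImageNet Large Scale Visual Recognition Challenge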
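
As a reminder of what GAP computes, here is a minimal sketch in plain Python: each feature map of the last convolutional layer is averaged down to one number. The function name and the toy values are assumptions for illustration, not taken from the paper.

```python
def global_average_pool(feature_maps):
    """Average each 2-D feature map (a list of rows) down to a single number."""
    pooled = []
    for fmap in feature_maps:
        n_values = len(fmap) * len(fmap[0])
        pooled.append(sum(sum(row) for row in fmap) / n_values)
    return pooled


# Two toy 2x2 feature maps (made-up values).
feature_maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
pooled = global_average_pool(feature_maps)
print(pooled)  # [2.5, 2.0]
```

The pooled vector has one entry per feature map, so its length does not depend on the spatial size of the input.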

Introduction

  1. Convolutional layers can localize objects, but this ability is lost when fully-connected layers are used for classification.
    • Network in Network and GoogLeNet avoid fully-connected layers.
      • This minimizes the number of parameters.
  2. GAP is known as a structural regularizer, but it does more than regularize: it lets the network keep its localization ability until the final layer.
  3. The approach transfers easily to other recognition datasets for generic classification, localization, and concept discovery.
    • It achieves 37.1% top-5 test error for weakly supervised object localization on ILSVRC, close to the fully supervised AlexNet.

Class Activation Mapping
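
In short, the class activation map for class c is a weighted sum of the last convolutional feature maps: M_c(x, y) = sum over k of w_k^c * f_k(x, y), where w_k^c is the weight from the pooled value of feature map k to class c. The sketch below is a minimal illustration in plain Python, not the authors' code. The function name and the toy values are assumptions, the bias term is ignored, and sum pooling is used instead of averaging so that the class score equals the sum of the map exactly (with averaging the two differ by a constant factor).

```python
def class_activation_map(feature_maps, class_weights):
    """Compute M_c(x, y) = sum_k w_k^c * f_k(x, y) for one class c."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


# Toy inputs (made-up values): two 2x2 feature maps and one class's weights.
feature_maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
class_weights = [0.5, -0.25]

cam = class_activation_map(feature_maps, class_weights)
print(cam)  # [[0.5, 1.0], [1.5, 0.0]]

# The class score computed from the pooled features ...
score_from_pooling = sum(
    weight * sum(sum(row) for row in fmap)
    for fmap, weight in zip(feature_maps, class_weights)
)
# ... equals the sum of the class activation map over all positions.
score_from_cam = sum(sum(row) for row in cam)
print(score_from_pooling, score_from_cam)  # 3.0 3.0
```

Each value of the map therefore shows how much that spatial position contributes to the class score.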

Weakly-supervised Object Localization

Setup

  • Use AlexNet, VGGnet, and GoogLeNet to build the GAP variants (AlexNet-GAP, VGGnet-GAP, GoogLeNet-GAP):
    1. Remove some layers (the fully-connected layers and the softmax).
    2. Add new layers in their place:
      • a convolutional layer: 3 × 3, stride 1, padding 1, with 1024 units
      • a GAP layer
      • a softmax layer
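
A quick check of why the added convolution does not shrink the feature maps: with kernel 3, stride 1, and padding 1, the standard output-size formula returns the input size. The helper name and the example sizes below are assumptions for illustration only.

```python
def conv_output_size(input_size, kernel_size, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Example sizes only; any positive input size behaves the same way.
for size in (7, 14, 28):
    out = conv_output_size(size, kernel_size=3, stride=1, padding=1)
    print(size, "->", out)  # the output size equals the input size
```

Keeping the spatial resolution matters here because the resolution of the last convolutional layer is also the resolution of the resulting class activation map.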

Results

  1. Classification
  • Classification performance is similar with and without GAP.
  2. Localization
  • Weak supervision still has a long way to go compared with full supervision.
  3. Generic Localization
  • Test feature extraction: use a linear SVM to compare features from the original network with features from the network with GAP
    • The results are similar.
  • Test localization under weak supervision
    • The network can still find the position of the object.
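
To turn a heatmap into a box for localization, one common heuristic is to threshold the map and take the bounding box of the largest connected region. The sketch below is a minimal plain-Python version under stated assumptions: the function name is hypothetical, the default fraction of 0.2 is an assumed value to be checked against the paper, regions are 4-connected, and the toy heatmap is made up.

```python
from collections import deque


def largest_component_bbox(heatmap, fraction=0.2):
    """Return (x_min, y_min, x_max, y_max) of the largest 4-connected region
    whose values exceed fraction * max(heatmap), or None if there is none."""
    height, width = len(heatmap), len(heatmap[0])
    threshold = fraction * max(max(row) for row in heatmap)
    seen = [[False] * width for _ in range(height)]
    best = []
    for y0 in range(height):
        for x0 in range(width):
            if seen[y0][x0] or heatmap[y0][x0] <= threshold:
                continue
            # Breadth-first flood fill of one connected region.
            seen[y0][x0] = True
            queue, region = deque([(x0, y0)]), []
            while queue:
                x, y = queue.popleft()
                region.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx]
                            and heatmap[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(region) > len(best):
                best = region
    if not best:
        return None
    xs = [x for x, _ in best]
    ys = [y for _, y in best]
    return min(xs), min(ys), max(xs), max(ys)


# Toy heatmap (made-up values): one 2x2 blob and one isolated hot pixel.
heatmap = [
    [0.0, 0.0, 0.0, 0.0, 0.9],
    [0.0, 0.8, 1.0, 0.0, 0.0],
    [0.0, 0.6, 0.7, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
print(largest_component_bbox(heatmap))  # (1, 1, 2, 2)
```

The isolated pixel in the corner is above the threshold but belongs to a smaller region, so it does not affect the box.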

Deep Features for Generic Localization

Fine-grained Recognition

  • Recognition with bounding-box crops, where the box is either annotated or generated from the CAM

Pattern Discovery

  • Given a set of images containing a common concept, test whether the network can find the position of the important regions in these images.
    1. How the important regions are identified
      1. Use the GoogLeNet-GAP network trained with image-level labels. Use the linear SVM weights and the GAP features to construct the CAM that identifies the important regions.
    2. Experiment
      1. Discovering informative objects in the scenes

      2. Concept localization in weakly labeled images

      3. Weakly supervised text detector

        • positive set: images with text
        • negative set: images without text
      4. Interpreting visual question answering (???)

Visualizing Class-Specific Units

Conclusion

  1. CAM for CNNs with global average pooling.
    • Makes it possible to visualize a heatmap of the class-relevant regions on a given image.
  2. Weak supervision (image-level labels only) is enough to localize the object.
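
To overlay the heatmap on the input image, the low-resolution map first has to be enlarged to the image size. The sketch below uses nearest-neighbour repetition in plain Python as a simple stand-in; smoother interpolation (for example bilinear) is the more usual choice in practice. The function name, the integer scale factor, and the toy values are assumptions for illustration.

```python
def upsample_nearest(heatmap, scale):
    """Enlarge a 2-D heatmap by repeating each value scale times per axis."""
    upsampled = []
    for row in heatmap:
        wide_row = [value for value in row for _ in range(scale)]
        # Copy the widened row so the output rows are independent lists.
        upsampled.extend(list(wide_row) for _ in range(scale))
    return upsampled


# Toy 2x2 heatmap (made-up values) enlarged to 4x4.
small = [[0.5, 1.0], [1.5, 0.0]]
for row in upsample_nearest(small, 2):
    print(row)
# [0.5, 0.5, 1.0, 1.0]
# [0.5, 0.5, 1.0, 1.0]
# [1.5, 1.5, 0.0, 0.0]
# [1.5, 1.5, 0.0, 0.0]
```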