meta-learning

Joined on Sep 2, 2021

  • introduces a weight-prediction model that predicts a model's parameters from its few-shot parameters. The meta-learning strategy is designed to disentangle the learning of class-agnostic and class-specific weight parameters. This work focuses on weight generation/prediction for the detector network (box head) by learning a meta-model that regresses class-specific parameters trained from few shots to class-specific parameters trained on the large base classes. Training: in the first stage, a detector is trained on the large base classes, specifically to learn the class-agnostic parameters; in the second stage, only the class-specific parameters are learned while the class-agnostic parameters are kept fixed. The meta-model T receives the class-specific weights of the last layer and the class-specific weights from few-shot training as inputs, where T is parameterized by the class-agnostic weights. The loss for training T penalizes the discrepancy between its predicted parameters and the class-specific parameters trained on the large base classes.
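A minimal sketch of the weight-prediction idea, assuming a purely linear meta-model T and synthetic weight pairs (all names, dimensions, and data here are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: D-dimensional class-specific weight vectors.
D = 8
# Synthetic "base class" pairs: pretend large-sample weights are a fixed
# linear map of few-shot weights plus noise (stand-in for real weight pairs).
A_true = rng.normal(size=(D, D))
w_few = rng.normal(size=(100, D))                 # weights trained from few shots
w_full = w_few @ A_true.T + 0.01 * rng.normal(size=(100, D))

# T: a linear meta-model regressing few-shot weights to large-sample weights,
# trained by gradient descent on an MSE loss.
A = np.zeros((D, D))
lr = 0.01
for _ in range(500):
    pred = w_few @ A.T
    grad = 2 * (pred - w_full).T @ w_few / len(w_few)
    A -= lr * grad

mse = float(np.mean((w_few @ A.T - w_full) ** 2))
```

At test time, such a T would be applied to a novel class's few-shot-trained weights to predict weights closer to those obtainable from abundant data.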
  • this work proposes a deep metric learning (DML) architecture for both image classification and object detection in the few-shot scenario.
  • this is by far the only survey that systematically compares few-shot object detection methods. Taxonomy (based on dataset settings): in terms of novel classes, the problems can be defined as LS-FSOD: a small novel set plus an optional dataset without target supervision for learning generic notions; SS-FSOD: has extra target-domain data without annotations (additional unlabelled examples); WS-FSOD: a small novel set with image-level labels (weakly labelled; sometimes an unlabelled novel set and a base set may be included to compensate for the inaccurate supervisory signals). The base set is usually for learning task-agnostic notions, whereas learning from either the weakly labelled or the unlabelled novel set provides task-specific guidance.
  • # Weakly Supervised Object Localisation and Detection: A Survey
  • # Ideas on improving FSOD
  • this work focuses on improving feature generalization by enhancing the intrinsic characteristics that are universal across object categories. Contributions: enhance object features with a universal prototype, and impose a consistency loss between the enhanced and original features. Intro: intrinsic object characteristics should be invariant under different visual changes such as textural variance or environmental noise.
  • 2021 [CVPR 2021] Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection [CVPR 2021] FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding [CVPR 2021] Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection [arXiv 2102] Should I Look at the Head or the Tail? Dual-awareness Attention for Few-Shot Object Detection [arXiv 2103] Universal-Prototype Augmentation for Few-Shot Object Detection [arXiv 2103] Beyond Max-Margin: Class Margin Equilibrium for Few-shot Object Detection [arXiv 2103] Meta-DETR: Few-Shot Object Detection via Unified Image-Level Meta-Learning
  • leading approaches derived from meta-learning mainly focus on a single visual object. This work proposes meta-learning over ROI features instead of full-image features and introduces the PRN (Predictor-Head Remodeling Network), which shares its main backbone with Faster/Mask R-CNN. The PRN takes as inputs the few-shot objects from the base and novel classes and outputs class-attentive vectors. These vectors then apply channel-wise soft attention to the ROI features to help the predictor discriminate better. Related work: few-shot object recognition spans Bayesian approaches, metric learning, and meta-learning. Bayesian approaches design probabilistic models to discover the information among latent variables; metric learning (similarity learning) focuses on distinguishing similar/dissimilar features among objects of different classes.
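The channel-wise soft attention step described above can be sketched as follows (shapes and variable names are illustrative, not Meta R-CNN's actual code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
C, H, W = 16, 7, 7                            # illustrative ROI feature shape

roi_feature = rng.normal(size=(C, H, W))      # ROI feature from the detector head
class_vector = rng.normal(size=(C,))          # class-attentive vector from the PRN

# Channel-wise soft attention: squash the class-attentive vector to (0, 1)
# and reweight each channel of the ROI feature with it.
attention = sigmoid(class_vector)
attended = roi_feature * attention[:, None, None]
```

One attended feature is produced per class-attentive vector, so the downstream predictor sees class-conditioned ROI features.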
  • treats the problem as a direct set-prediction problem. It is pointed out that self-attention is especially suitable for the constraints of set prediction. Architecture: an encoder-decoder transformer (with non-autoregressive parallel decoding) and a set-based global loss (bipartite matching for computing the loss, making the loss permutation-invariant). Related work: bipartite matching losses for set prediction, encoder-decoder architectures based on transformers, parallel decoding, and object detection methods.
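The permutation-invariant set loss can be illustrated with a brute-force matcher (DETR itself uses the Hungarian algorithm and a richer class/box cost; this toy version uses an L1 box cost only):

```python
from itertools import permutations

import numpy as np

def set_loss(pred_boxes, gt_boxes):
    # Pairwise L1 matching cost between predictions and ground truths.
    cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)
    n = len(gt_boxes)
    # Minimum-cost one-to-one matching (brute force over permutations).
    return min(sum(cost[i, p[i]] for i in range(n)) for p in permutations(range(n)))

preds = np.array([[0.1, 0.1], [0.9, 0.9]])
gts = np.array([[0.9, 0.9], [0.1, 0.1]])       # same set, different order

loss = set_loss(preds, gts)                    # matching makes ordering irrelevant
```

Because the loss is computed on the optimal matching, reordering either the predictions or the ground truths leaves it unchanged, which is exactly the permutation invariance the note refers to.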
  • three experiment setups to validate the GAN: (1) open-set discrimination splits a single dataset into open and closed sets, and classifies open vs. closed test examples; (2) open-set recognition requires K-way classification on the closed set in addition to open-set discrimination; (3) examines open-set discrimination at the pixel level in semantic segmentation, which evaluates pixel-level open-vs-closed classification accuracy. For (1) and (2), a closed-world K-way network (ResNet18) is trained on the closed training set, whereas in (3), HRNet is used as an off-the-shelf (OTS) network.
  • this work mainly focuses on data augmentation. The basic assumption of this approach is that the intra-class cross-sample relationship learned from seen classes can be applied to unseen classes. In data augmentation, diversity and discriminability are especially important in the few-shot setting. Common problems in the data augmentation approach: learning arbitrary transformation mappings may destroy the discriminability of the synthesized samples; when synthesizing samples specifically for certain tasks (regularizing the synthesis process), the task may constrain the synthesis process and thus the synthesized samples tend to collapse into certain modes. The paper proposes a cWGAN-based feature synthesis framework and two novel regularizers.
  • the paper "Hallucination Improves Few-Shot Object Detection" is based on this paper. This paper uses a meta-learning approach: realistic examples might still fail to capture many modes of variation of visual concepts, while unrealistic hallucinations can still lead to a good decision boundary. The training process is model-agnostic, so the hallucination approach can be used with different meta-learning methods. Training with meta-learning allows learning to hallucinate in a way that supports class distinctions (so there is no need to tune for realism or diversity). Related work: based on the work by Hariharan and Girshick, the annotation effort can be avoided by transferring the transformation between a pair of examples from a known category to a novel class's seed example. This paper follows that line of work in an end-to-end manner.
  • # Adaptive Prototype Learning
  • Introduction/Motivation: in the extremely low-shot setting, the lack of data variation is a problem, especially for novel classes. The RPN is a good start since it finds the most promising regions via the highest-IoU boxes, but in a low-shot setting there is simply not enough data to supply that variation. This work claims to train a network that transfers shared within-class variation from the base classes; one reason is that shared variation is hard to encode in the RPN. The paper proposes a hallucinator at the ROI head (after the RPN) to generate examples in the ROI feature space (here, the ROI feature space means the features of the ROI regions/boxes/proposals generated by the RPN). This can be seen as a form of data augmentation for building a better classifier.
  • Preliminaries: why not just use the softmax loss? Sometimes, as in face recognition, two unknown faces must be compared, meaning the set of classes is open rather than fixed. With softmax, the number of classes is fixed at training time, but with triplet training, embeddings can still be computed for classes that differ from those in the training set. Offline mining: at the beginning of each epoch, compute embeddings on the full training set; 3B embeddings are computed to obtain B triplets, and a full pass is required to generate them. Online mining (on-the-fly): given a batch of B images, compute B embeddings, from which B^3 candidate triplets can be formed; most of these triplets are not valid.
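The B vs. B^3 counting behind online mining can be checked on a small batch (a sketch; real online mining would additionally select hard or semi-hard triplets by embedding distance):

```python
import numpy as np

def valid_triplet_mask(labels):
    """Mask of valid (anchor, positive, negative) index triples in a batch:
    anchor != positive, anchor/positive share a label, negative does not."""
    B = len(labels)
    same = labels[:, None] == labels[None, :]
    a, p, n = np.meshgrid(np.arange(B), np.arange(B), np.arange(B), indexing="ij")
    return (a != p) & same[a, p] & ~same[a, n]

labels = np.array([0, 0, 1, 1])      # toy batch: two classes, two examples each
mask = valid_triplet_mask(labels)

candidates = labels.size ** 3        # B^3 candidate triplets from B embeddings
valid = int(mask.sum())              # only a small fraction are valid
```

For this batch of 4 images there are 64 candidate triplets but only 8 valid ones, illustrating why the invalid majority must be masked out before computing the loss.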
  • Introduction: the authors raise three concerns regarding previous work: (1) loss of information due to global average pooling on support features, (2) CNNs are bad at modelling varying spatial distributions, (3) class-specific representations heavily assume those vectors are representative enough. The hypothesis is that, if the above three problems were trivial in FSOD, the choice of support images should not affect performance much. The experimental setup is such that the set of support images changes every time the models are tested on the same query set, over 100 runs. The paper proposes two modules: background attenuation (BA) and cross-image spatial attention (CISA). The BA module is inspired by wave interference, where each feature vector of a high-level feature map is viewed as a wave along the channel dimension; when the representative vector is added back to the feature map, local features with different wave patterns can be treated as noise by the detection network. Cross-image spatial attention is proposed to adaptively transform support images into query-position-aware (QPA) vectors. A QPA vector carries the support information considered most relevant to each query region; by computing the correlation between query proposals and QPA vectors, the model can determine whether a query region belongs to a target object. The dual-awareness attention mechanisms are used to capture object-wise correlations.
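The CISA idea of turning support features into query-position-aware vectors can be sketched as plain dot-product attention (shapes and names are illustrative, not the paper's implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
C, Q, S = 8, 3, 25                         # feature dim, query proposals, support positions

query_props = rng.normal(size=(Q, C))      # per-proposal query features
support_map = rng.normal(size=(S, C))      # flattened support feature map

# Each query proposal attends over support spatial positions, producing one
# query-position-aware (QPA) support vector per proposal.
scores = query_props @ support_map.T / np.sqrt(C)   # (Q, S) similarities
attn = softmax(scores, axis=-1)                     # each row sums to 1
qpa = attn @ support_map                            # (Q, C) QPA vectors
```

Each proposal then compares its own feature against its QPA vector to decide whether it covers a target object.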
  • keywords: attention-RPN, multi-relation detector, contrastive learning strategy, FSOD dataset. Intro: the authors focus on the question "how to correctly localise unseen objects within a cluttered background". They point out that missing bounding boxes on novel objects may be caused by inappropriately low scores of good bounding boxes output by the RPN. The attention module is used to enhance the quality of proposals, while the multi-relation detector module suppresses and filters out false detections in the cluttered background.
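A rough sketch of the attention idea on the RPN side: pool the support feature into a channel-wise kernel, correlate it with the query feature map, and use the resulting similarity map to gate query features towards support-like regions (a simplification; the paper's attention-RPN uses depth-wise cross-correlation, and all shapes here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 4, 10, 10

query_feat = rng.normal(size=(C, H, W))    # query feature map entering the RPN
support_feat = rng.normal(size=(C, 5, 5))  # support feature map

# Global-average-pool the support feature into a 1x1 channel-wise kernel.
kernel = support_feat.mean(axis=(1, 2))                      # (C,)

# Correlate the kernel with the query map to get a spatial similarity map,
# then gate the query features so support-like regions score higher.
sim = np.tensordot(kernel, query_feat, axes=([0], [0]))      # (H, W)
gate = 1.0 / (1.0 + np.exp(-sim))                            # sigmoid gate
attended = query_feat * gate[None, :, :]                     # (C, H, W)
```

Proposals scored on `attended` are biased towards regions resembling the support class, which is the intended effect of raising the scores of good novel-class boxes.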
  • # RCNN variants ![](https://i.imgur.com/glKsSfK.jpg)
  • Abstract summary: meta-learning-based few-shot object detection that transfers knowledge from base classes to novel classes; a coarse-to-fine proposal-based object detection framework; incorporates prototype-based classifiers into both the proposal generation and classification stages; proposal generation incorporates a lightweight matching network (between the query image feature map and spatially pooled class features); an attentive feature alignment method reduces spatial misalignment between proposals and few-shot class examples; experiments are conducted on FSOD benchmarks.
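The prototype-based classifier component can be sketched as nearest-prototype matching by cosine similarity (a generic sketch, not the paper's exact heads; names and toy data are made up):

```python
import numpy as np

def build_prototypes(features, labels):
    """Class prototypes: the mean feature of each class's few-shot examples."""
    classes = np.unique(labels)
    return classes, np.stack([features[labels == c].mean(axis=0) for c in classes])

def classify(x, protos):
    """Assign a proposal feature to the prototype with highest cosine similarity."""
    x = x / np.linalg.norm(x)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return int(np.argmax(p @ x))

rng = np.random.default_rng(0)
# Toy few-shot features: two classes clustered around +1 and -1.
feats = np.vstack([rng.normal(1.0, 0.1, size=(5, 4)),
                   rng.normal(-1.0, 0.1, size=(5, 4))])
labels = np.array([0] * 5 + [1] * 5)

classes, protos = build_prototypes(feats, labels)
pred = classify(np.ones(4), protos)      # query feature near class 0's cluster
```

In the coarse-to-fine framework, similar prototype comparisons would appear twice: once to steer proposal generation and once in the final classification stage.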