tags: notes, unsupervised, domain-adaptation, segmentation, self-supervised, cvpr20
Author: Akshay Kulkarni
Note: CVPR '20 Oral, Code
Brief Outline
They propose a two-step self-supervised DA approach to minimize the inter-domain and intra-domain gaps together.
- They conduct inter-domain adaptation and from this, they separate the target domain into an easy and hard split using an entropy-based ranking function.
- For the intra-domain adaptation, they propose a self-supervised adaptation technique from the easy to the hard split.
Introduction
- Target data collected from the real world have diverse scene distributions, caused by various factors such as moving objects and weather conditions, which leads to a large gap within the target domain itself (the intra-domain gap).
- Previous DA works focus mostly on the inter-domain gap, so this paper presents a two-step DA approach to minimize both the inter-domain and the intra-domain gaps.
- Their model consists of three parts:
- An inter-domain adaptation module to close the inter-domain gap between labeled source data and unlabeled target data.
- An entropy-based ranking system to separate target data into an easy and hard split.
- An intra-domain adaptation module to close intra-domain gap between the easy and hard split (using pseudo labels from the easy domain).
Methodology
*(Figure: overview of the proposed two-step adaptation pipeline — original image not available.)*
- Let $\mathcal{X}_s$ denote a source domain consisting of a set of images $X_s$ with their associated ground-truth $C$-class segmentation maps $Y_s$. Similarly, let $\mathcal{X}_t$ denote a target domain containing a set of unlabeled images $X_t$.
- The first step is inter-domain adaptation, based on common UDA approaches (Tsai et al. 2018 and Vu et al. 2019). Then, the pseudo labels and predicted entropy maps are used by an entropy-based ranking system to cluster the target data into an easy and a hard split.
- The second step is intra-domain adaptation, which consists of aligning the pseudo-labeled easy split with the hard split. The full procedure is illustrated in the figure above.
- The proposed network consists of the inter-domain generator $G_{inter}$ and discriminator $D_{inter}$, and the intra-domain generator $G_{intra}$ and discriminator $D_{intra}$.
Inter-Domain Adaptation
- A sample $X_s$ is drawn from the source domain with its associated map $Y_s$. Each entry $Y_s^{(h,w)}$ of $Y_s$ provides the label of a pixel as a one-hot vector.
- The network $G_{inter}$ takes $X_s$ as input and generates a "soft segmentation map" $P_s$. $G_{inter}$ is optimized in a supervised way by minimizing the cross-entropy loss
$$\mathcal{L}_{seg}^{inter}(X_s, Y_s) = -\sum_{h,w}\sum_{c} Y_s^{(h,w,c)} \log P_s^{(h,w,c)}$$
- ADVENT (Vu et al. 2019) assumes that trained models tend to produce over-confident (low-entropy) predictions for source-like images, and under-confident (high-entropy) predictions for target-like images. Based on this, they propose to utilize entropy maps to align the distribution shift of the features.
- This paper adopts ADVENT for inter-domain adaptation due to its simplicity and effectiveness. The generator takes a target image $X_t$ as input and produces the segmentation map $P_t$, and the entropy map $I_t$ is formulated as
$$I_t^{(h,w)} = -\frac{1}{\log C}\sum_{c} P_t^{(h,w,c)} \log P_t^{(h,w,c)}$$
- To reduce the inter-domain gap, $D_{inter}$ is trained to predict the domain labels for the entropy maps while $G_{inter}$ is trained to fool $D_{inter}$, and the optimization is achieved via the adversarial loss
$$\mathcal{L}_{adv}^{inter}(X_s, X_t) = \sum_{h,w} \log\left(1 - D_{inter}(I_t)^{(h,w)}\right) + \log\left(D_{inter}(I_s)^{(h,w)}\right)$$
- Here, $I_s$ is the entropy map of $X_s$. The segmentation loss and the adversarial loss are optimized together to align the distribution shift between the source and target domains.
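As a framework-free sketch of the normalized entropy map above (function and variable names are mine, not from the paper; the per-pixel probabilities are plain nested lists rather than tensors):

```python
import math

def entropy_map(probs, num_classes):
    """Per-pixel normalized entropy in [0, 1] (ADVENT-style).

    probs: H x W x C nested lists of per-pixel class probabilities.
    Dividing by log(num_classes) makes a uniform prediction score 1.0
    and a one-hot (fully confident) prediction score 0.0.
    """
    ent = []
    for row in probs:
        ent.append([
            -sum(p * math.log(p) for p in px if p > 0) / math.log(num_classes)
            for px in row
        ])
    return ent
```

A uniform pixel (e.g. `[0.5, 0.5]` with two classes) yields entropy 1.0, while a confident one-hot pixel yields 0.0, matching the over-/under-confidence intuition from ADVENT.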
Entropy-based Ranking
- Some target prediction maps are clean (confident and smooth) while others are noisy, despite being generated from the same model. Since this intra-domain gap exists among target images, a straightforward solution is to decompose the target domain into small subdomains.
- To build these splits, they use entropy maps to determine the confidence levels of target predictions. They rank the predictions using the mean value of the entropy map, given by
$$R(X_t) = \frac{1}{HW}\sum_{h,w} I_t^{(h,w)}$$
- Let $X_{te}$ and $X_{th}$ denote a target image assigned to the easy and hard splits respectively. For domain separation, they define the ratio $\lambda = \frac{N_e}{N_t}$, where $N_e$ is the cardinality (number of elements) of the easy split and $N_t$ is the cardinality of the whole target set.
- Note that a threshold value for separation is not used, since it would be specific to a dataset. Instead, they choose the ratio $\lambda$ as a hyperparameter, which shows strong generalization to other datasets.
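The ranking-and-split step can be sketched as follows (a minimal illustration with hypothetical names; the paper's actual implementation operates on files of entropy scores):

```python
def split_target(entropy_means, lam):
    """Rank target images by mean entropy; the lowest-entropy fraction
    lam of images forms the easy split, the rest the hard split.

    entropy_means: dict mapping image id -> mean entropy of its map.
    lam: ratio |easy| / |target|, a hyperparameter.
    """
    ranked = sorted(entropy_means, key=entropy_means.get)  # ascending entropy
    n_easy = int(lam * len(ranked))
    return ranked[:n_easy], ranked[n_easy:]
```

Because the split is defined by a ratio rather than an absolute entropy threshold, the same `lam` can transfer across datasets whose entropy scales differ.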
Entropy Normalization
- Complex scenes (containing many objects) might be categorized as hard.
- For a more representative ranking, they adopt a new normalization: dividing the mean entropy by the number of predicted rare classes in the target image.
- Note that rare classes are pre-defined from the set of all classes (see results section in the paper for definition).
- The entropy normalization helps to move images with many objects to the easy split.
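A minimal sketch of this normalization (names are mine; the guard against images containing no rare class is my assumption, not stated in the note):

```python
def normalized_score(mean_entropy, predicted_classes, rare_classes):
    """Divide the mean entropy by the count of predicted rare classes,
    so complex scenes with many rare objects are not unfairly ranked hard.

    rare_classes: pre-defined set of rare class ids (dataset-specific).
    """
    n_rare = len(set(predicted_classes) & set(rare_classes))
    return mean_entropy / max(n_rare, 1)  # assumed guard: no rare class -> divide by 1
```

An image predicting two rare classes with mean entropy 0.8 gets score 0.4, moving it toward the easy split relative to an equally entropic image with no rare classes.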
Intra-domain Adaptation
- They propose to use the predictions from $G_{inter}$ as pseudo labels for the easy split. Given an image $X_{te}$ from the easy split, the prediction map $P_{te} = G_{inter}(X_{te})$ is a soft segmentation map, which is converted to a pseudo label $\hat{Y}_{te}$ where each entry is a one-hot vector.
- Using these pseudo labels, $G_{intra}$ is optimized by minimizing the cross-entropy loss
$$\mathcal{L}_{seg}^{intra}(X_{te}) = -\sum_{h,w}\sum_{c} \hat{Y}_{te}^{(h,w,c)} \log G_{intra}(X_{te})^{(h,w,c)}$$
- An image $X_{th}$ from the hard split gives the segmentation map $P_{th} = G_{intra}(X_{th})$ (note that this is the $G_{intra}$ being trained and not the fixed $G_{inter}$) and the entropy map $I_{th}$.
- To close the intra-domain gap, the intra-domain discriminator $D_{intra}$ is trained to predict the split labels of $I_{te}$ (easy split, where $I_{te}$ is the entropy map of $X_{te}$) and $I_{th}$ (hard split), and $G_{intra}$ is trained to fool $D_{intra}$. The adversarial loss can be formulated as
$$\mathcal{L}_{adv}^{intra}(X_{te}, X_{th}) = \sum_{h,w} \log\left(1 - D_{intra}(I_{th})^{(h,w)}\right) + \log\left(D_{intra}(I_{te})^{(h,w)}\right)$$
- The complete loss function is the sum of the two segmentation losses and the two adversarial losses, and the objective is to learn a target model according to
$$\min_{G_{inter},\, G_{intra}} \; \mathcal{L}_{seg}^{inter} + \mathcal{L}_{adv}^{inter} + \mathcal{L}_{seg}^{intra} + \mathcal{L}_{adv}^{intra}$$
- Since the proposed method is a two-step self-supervised approach, it is difficult to train in a single stage. They choose to minimize it in three stages as follows:
- Train the inter-domain adaptation model to optimize $G_{inter}$ and $D_{inter}$.
- Generate target pseudo labels using $G_{inter}$ and rank all target images by the entropy-based ranking score.
- Train the intra-domain adaptation model to optimize $G_{intra}$ and $D_{intra}$.
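The pseudo-label generation in the second stage amounts to a per-pixel argmax over the frozen generator's soft map; a plain-Python sketch (names and list-based representation are mine):

```python
def pseudo_label(soft_map):
    """Convert a soft segmentation map (H x W x C probabilities, e.g. from
    the frozen inter-domain generator) into one-hot pseudo labels."""
    out = []
    for row in soft_map:
        new_row = []
        for px in row:
            best = px.index(max(px))  # argmax over the class dimension
            new_row.append([1 if c == best else 0 for c in range(len(px))])
        out.append(new_row)
    return out
```

These hard one-hot maps then play the role of ground truth in the intra-domain cross-entropy loss, which is why only the confident (easy-split) images are used to produce them.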
Note: See Section 4.3 of the paper for theoretical analysis (not able to understand yet).
Conclusion
- A self-supervised DA approach is proposed to minimize the inter-domain and intra-domain gaps simultaneously.