# Notes on "[Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision](https://arxiv.org/abs/2004.07703)" ###### tags: `notes` `unsupervised` `domain-adaptation` `segmentation` `self-supervised` `cvpr20` Author: [Akshay Kulkarni](https://akshayk07.weebly.com/) Note: CVPR '20 Oral, [Code](https://github.com/feipan664/IntraDA) ## Brief Outline They propose a two-step self-supervised DA approach to minimize the inter-domain and intra-domain gap together. 1. They conduct inter-domain adaptation and from this, they separate the target domain into an easy and hard split using an entropy-based ranking function. 2. For the intra-domain adaptation, they propose a self-supervised adaptation technique from the easy to the hard split. ## Introduction * Target data collected from real world have diverse scene distributions, caused by various factors such as moving objects, weather conditions, which leads to a large gap within the target (intra-domain gap). * Previous DA works focus more on inter-domain gap, so this paper presents a 2-step DA approach to minimize the inter-domain and the intra-domain gaps. * Their model consists of 3 parts * An inter-domain adaptation module to close the inter-domain gap between labeled source data and unlabeled target data. * An entropy-based ranking system to separate target data into an easy and hard split. * An intra-domain adaptation module to close intra-domain gap between the easy and hard split (using pseudo labels from the easy domain). ## Methodology ![Proposed Method](https://i.imgur.com/usLsH29.jpg) * Let $\mathcal{S}$ denote a source domain consisting of a set of images $\in \mathbb{R}^{H \times W \times 3}$ with their associated ground-truth $C$-class segmentation maps $\in (1, C)^{H \times W}$. Similarly, let $\mathcal{T}$ denote a target domain containing a set of unlabeled images $\in \mathbb{R}^{H \times W \times 3}$. * The first step is inter-domain adaptation, based on common UDA approaches ([Tsai et. al. 2018](https://arxiv.org/abs/1802.10349) and [Vu et. al. 2019](https://arxiv.org/abs/1811.12833)). Then, the pseudo labels and predicted entropy maps are used by an entropy-based ranking system to cluster the target data into the easy and hard split. * The second step is intra-domain adaptation, which consists of aligning the pseudo-labeled easy split with the hard split. The full procedure is illustrated in the figure above. * The proposed network consists of the inter-domain generator and discriminator $\{G_{inter}, D_{inter}\}$, and the intra-domain generator and discriminator $\{G_{intra}, D_{intra}\}$. ### Inter-Domain Adaptation * A sample $X_s \in \mathbb{R}^{H \times W \times 3}$ is from the source domain with it's associated map $Y_s$. Each entry $Y_s^{(h, w)}=[Y_s^{(h, w, c)}]_c$ of $Y_s$ provides a label of a pixel $(h, w)$ as a one-hot vector. * The network $G_{inter}$ takes $X_s$ as input and generates a "soft segmentation map" $P_s=G_{inter}(X_s)$. $G_{inter}$ is optimized in a supervised way by minimizing the CE loss $$ \mathcal{L}_{inter}^{seg}(X_s, Y_s)=-\sum_{h, w}\sum_c Y_s^{(h, w, c)} \log(P_s^{(h, w, c)}) \tag{1} $$ * ADVENT ([Vu et. al. 2019](https://arxiv.org/abs/1811.12833)) assumes that trained models tend to produce over-confident (low entropy) predictions for source-like images, and under-confident (high entropy) predictions for target-like images. Based on this, they propose to utilize entropy maps to align the distribution shift of the features. 
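To make this concrete, below is a minimal PyTorch sketch of the supervised loss in Eq. (1) and of the per-pixel prediction entropy that ADVENT relies on (formalized in Eq. (2) below). This is not the authors' implementation: the tensor shapes, the 19-class default, and averaging over pixels instead of summing are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def source_seg_loss(logits_s, labels_s, num_classes=19):
    """Supervised per-pixel cross-entropy on source images (Eq. 1).

    logits_s: (B, C, H, W) raw scores from G_inter; softmax gives P_s.
    labels_s: (B, H, W) integer ground-truth class indices (Y_s as class ids).
    """
    log_p_s = F.log_softmax(logits_s, dim=1)                         # log P_s
    y_s = F.one_hot(labels_s, num_classes).permute(0, 3, 1, 2).float()  # one-hot Y_s
    # -sum_c Y_s^(h,w,c) log P_s^(h,w,c), averaged over pixels here
    return -(y_s * log_p_s).sum(dim=1).mean()

def entropy_map(logits):
    """Per-pixel prediction entropy used by ADVENT (Eq. 2 below), shape (B, H, W)."""
    p = F.softmax(logits, dim=1)
    return -(p * torch.log(p + 1e-30)).sum(dim=1)
```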
* This paper adopts ADVENT for inter-domain adaptation due to its simplicity and effectiveness. The generator $G_{inter}$ takes a target image $X_t$ as input and produces the segmentation map $P_t=G_{inter}(X_t)$, and the entropy map $I_t$ is formulated as

$$
I_t^{(h, w)}=\sum_c -P_t^{(h, w, c)} \log(P_t^{(h, w, c)}) \tag{2}
$$

* To reduce the inter-domain gap, $D_{inter}$ is trained to predict the domain labels for the entropy maps while $G_{inter}$ is trained to fool $D_{inter}$, and the optimization is achieved via the loss function

$$
\mathcal{L}_{inter}^{adv}(X_s, X_t)=\sum_{h, w} \log(1 - D_{inter}(I_t^{(h, w)}))+\log(D_{inter}(I_s^{(h, w)})) \tag{3}
$$

* Here, $I_s$ is the entropy map of $X_s$. The loss functions (1) and (3) are optimized to align the distribution shift between the source and target domains.

### Entropy-based Ranking
* Some target prediction maps are clean (confident and smooth) while others are noisy, despite being generated from the same model. Since this intra-domain gap exists among target images, a straightforward solution is to decompose the target domain into small subdomains.
* To build these splits, they use entropy maps to determine the confidence levels of target predictions. They rank the predictions using the mean value of the entropy map $I_t$, given by

$$
R(X_t)=\frac{1}{HW}\sum_{h, w}I_t^{(h, w)} \tag{4}
$$

* Let $X_{te}$ and $X_{th}$ denote a target image assigned to the easy and hard splits respectively. For domain separation, they define $\lambda = \frac{|X_{te}|}{|X_t|}$, where $|X_{te}|$ is the cardinality (number of elements) of the easy split and $|X_{t}|$ is the cardinality of the whole target set.
* Note that a threshold value on the entropy is not used for the separation since it would be specific to a dataset. Instead, they choose the ratio $\lambda$ as a hyperparameter, which generalizes well to other datasets (a standalone code sketch of this ranking and split is given at the end of these notes).

#### Entropy Normalization
* Complex scenes (containing many objects) might otherwise be categorized as *hard*.
* For a more *representative ranking*, they adopt a normalization that divides the mean entropy by the number of predicted rare classes in the target image.
* Note that rare classes are pre-defined from the set of all classes (see the results section in the paper for the definition).
* This entropy normalization helps move images with many objects into the easy split.

### Intra-domain Adaptation
* They propose to use the predictions from $G_{inter}$ as pseudo labels for the easy split. Given an image $X_{te}$ from the easy split, the prediction map $P_{te}=G_{inter}(X_{te})$ is a *soft segmentation map*, which is converted to $\mathcal{P}_{te}$ where each entry is a one-hot vector (a short sketch of this step is given below, before Eq. (6)).
* Using these pseudo labels, $G_{intra}$ is optimized by minimizing the CE loss

$$
\mathcal{L}_{intra}^{seg}(X_{te})=-\sum_{h, w}\sum_c \mathcal{P}_{te}^{(h, w, c)} \log(G_{intra}(X_{te})^{(h, w, c)}) \tag{5}
$$

* An image $X_{th}$ from the hard split gives the segmentation map $P_{th}=G_{intra}(X_{th})$ (note that this is the $G_{intra}$ being trained, not the fixed $G_{inter}$) and the entropy map $I_{th}$.
* To close the intra-domain gap, the intra-domain discriminator $D_{intra}$ is trained to predict the split labels of $I_{te}$ (easy split) and $I_{th}$ (hard split), and $G_{intra}$ is trained to fool $D_{intra}$.
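Before the adversarial loss of this step is written down in Eq. (6) below, here is a rough PyTorch sketch of the pseudo-labeling and the self-supervised loss of Eq. (5). It is a sketch under assumptions, not the released code: the hard-argmax conversion to one-hot labels, the tensor shapes, and the 19-class default are illustrative.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def make_pseudo_labels(g_inter, x_te, num_classes=19):
    """Turn the fixed G_inter's soft map P_te into one-hot pseudo labels (the map written as \\mathcal{P}_te above)."""
    p_te = F.softmax(g_inter(x_te), dim=1)   # (B, C, H, W) soft segmentation map
    hard = p_te.argmax(dim=1)                # most confident class per pixel
    return F.one_hot(hard, num_classes).permute(0, 3, 1, 2).float()

def intra_seg_loss(g_intra, x_te, pseudo_te):
    """Self-supervised cross-entropy on the easy split (Eq. 5), averaged over pixels."""
    log_q = F.log_softmax(g_intra(x_te), dim=1)
    return -(pseudo_te * log_q).sum(dim=1).mean()
```

Per the staged training described later, the pseudo labels would be produced once by the frozen $G_{inter}$ after the first stage and then kept fixed while $G_{intra}$ is trained.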
* The adversarial loss for this game is formulated as

$$
\mathcal{L}_{intra}^{adv}(X_{te}, X_{th})=\sum_{h, w} \log(1 - D_{intra}(I_{th}^{(h, w)}))+\log(D_{intra}(I_{te}^{(h, w)})) \tag{6}
$$

* The complete loss function $\mathcal{L}$ is the sum of the four losses in Equations (1), (3), (5), and (6), and the objective is to learn a target model $G$ according to

$$
G^*=\arg\min_{G_{intra}} \min_{G_{inter}} \max_{\substack{D_{inter} \\ D_{intra}}} \mathcal{L} \tag{7}
$$

* Since the proposed method is a 2-step self-supervised approach, it is difficult to train in a single stage. They choose to minimize the objective in 3 stages:
    1. Train the inter-domain adaptation model to optimize $G_{inter}$ and $D_{inter}$.
    2. Generate target pseudo labels using $G_{inter}$ and rank all target images based on $R(X_t)$.
    3. Train the intra-domain adaptation model to optimize $G_{intra}$ and $D_{intra}$.

**Note**: See Section 4.3 of the paper for a theoretical analysis (which I am not able to fully understand yet).

## Conclusion
* A self-supervised DA approach is proposed to minimize the inter-domain and intra-domain gaps together.
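For completeness, here is a standalone sketch of stage 2, the entropy-based ranking and easy/hard split, combining Eqs. (2) and (4) with the ratio $\lambda$ and the optional rare-class normalization. It is a hedged sketch rather than the released IntraDA code: the data-loader interface, the $\lambda = 0.67$ default, and the rare-class handling are assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def rank_and_split(g_inter, target_loader, lam=0.67, rare_classes=None):
    """Rank target images by mean prediction entropy (Eq. 4) and split them
    into an easy and a hard subset using the ratio lambda.

    target_loader is assumed to yield (name, image) pairs with image of shape
    (1, 3, H, W); rare_classes is an optional iterable of class indices used
    for the entropy normalization described in the notes above.
    """
    names, scores = [], []
    for name, x_t in target_loader:
        p_t = F.softmax(g_inter(x_t), dim=1)               # soft map P_t, (1, C, H, W)
        ent = -(p_t * torch.log(p_t + 1e-30)).sum(dim=1)   # entropy map I_t (Eq. 2)
        r = ent.mean().item()                               # R(X_t), Eq. 4
        if rare_classes:
            pred = p_t.argmax(dim=1)
            n_rare = sum(int((pred == c).any()) for c in rare_classes)
            r /= max(n_rare, 1)                             # divide by # predicted rare classes
        names.append(name)
        scores.append(r)
    order = sorted(range(len(names)), key=lambda i: scores[i])  # lowest entropy first
    n_easy = int(lam * len(names))
    easy = [names[i] for i in order[:n_easy]]
    hard = [names[i] for i in order[n_easy:]]
    return easy, hard
```

The two returned lists of image names would then feed the easy-split and hard-split data loaders used in stage 3.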