tags: notes
domain-adaptation
segmentation
weakly-supervised
unsupervised
ECCV '20 paper; Project Page; Code not released as of 20/09/20.
Author: Akshay Kulkarni
Brief Outline
This paper proposes a framework for domain adaptation (DA) in semantic segmentation with image-level weak labels in the target domain. They use weak labels to enable the interplay between feature alignment and pseudo-labeling, improving both in DA.
Introduction
- Existing UDA methods for semantic segmentation are developed mainly using 2 mechanisms
- Pseudo-label self-training
- In this, pixel-wise pseudo labels are generated via strategies such as confidence scores (BMVC '18, CVPR '19) or self-paced learning (ECCV '18).
- But, such pseudo labels are specific to the target domain and do not consider alignment between domains.
- Distribution alignment between source and target domains
- To alleviate the issue of lacking annotations in the target domain, they propose utilizing weak labels in the form of image- or point-level annotations in the target domain.
- The weak labels could be estimated from the model prediction in the UDA setting or provided by a human oracle in the weakly-supervised DA (WDA) paradigm. Note that this is the first paper to introduce a WDA setting for semantic segmentation.
- Specifically, they use weak labels to perform
- image-level classification to identify the presence/absence of categories in an image as a regularization.
- category-wise domain alignment using such categorical labels.
- For the image-level classification task, weak labels help obtain a better pixel-wise attention map per category. These category-wise attention maps act as guidance to further pool category-wise features for the proposed domain alignment procedure.
- The main contributions of this work are
- They propose the concept of using weak labels to help DA for semantic segmentation.
- They utilize weak labels to improve category-wise alignment for better feature space adaptation.
- They demonstrate the applicability of their method to both UDA and WDA settings.
Methodology
Problem Definition
- In the source domain, they have images and pixel-wise labels denoted as $\mathcal{D}_s = \{(x_s, y_s)\}$, whereas the target dataset contains images and only image-level labels, $\mathcal{D}_t = \{(x_t, y_t)\}$.
- Here, $y_s \in \mathbb{B}^{H \times W \times C}$ consists of pixel-wise one-hot vectors, $y_t \in \mathbb{B}^{C}$ is a multi-hot vector representing the categories present in the image, and $C$ is the number of categories (same for both source and target datasets).
- The image-level labels $y_t$, termed weak labels, can be estimated (in which case they are called pseudo-weak labels, i.e. UDA) or acquired from a human oracle (in which case they are called oracle-weak labels, i.e. WDA).
- Given such data, the problem is to adapt a segmentation model $G$ learned on the source dataset $\mathcal{D}_s$ to the target dataset $\mathcal{D}_t$.
Algorithm Overview
- They first pass both the source and target images through the segmentation network $G$ and obtain their features $F_s, F_t$, segmentation predictions $A_s, A_t$ and the upsampled pixel-wise predictions $O_s, O_t$.
- As a baseline, they use the source pixel-wise annotations $y_s$ to learn $G$, while aligning the output space distributions $O_s$ and $O_t$, following this CVPR '18 paper.
- First, they introduce a module which learns to predict the categories that are present in a target image. Second, they formulate a mechanism to align the features of each individual category between source and target domains.
- To this end, they use category-specific domain discriminators guided by the weak labels to determine which categories should be aligned.
Weak Labels for Category Classification
- To predict whether a category is absent/present in a particular image, they define an image classification task using the weak labels, such that $G$ can discover those categories.
- They feed the target images through $G$ to obtain the predictions $A_t$ and then apply a global pooling layer to obtain a single probability of presence for each category $c$:

$$p_t^c = \sigma\left(\frac{1}{k}\log\left[\frac{1}{HW}\sum_{h,w}\exp\left(k\, A_t^{(h,w,c)}\right)\right]\right) \tag{1}$$
- Here, $\sigma$ is the sigmoid function, so $p_t^c$ represents the probability that category $c$ appears in the image. Note that the pooling in (1) is a smooth approximation of the $\max$ function: the higher the value of $k$, the better it approximates $\max$. They use $k = 1$.
- Using $p_t^c$ and the weak labels $y_t$, they compute the category-wise binary cross-entropy loss:

$$\mathcal{L}_c = -\sum_{c=1}^{C} y_t^c \log p_t^c + (1 - y_t^c)\log(1 - p_t^c) \tag{2}$$
- This loss helps to identify the categories which are absent/present in a particular image, and enforces $G$ to pay attention to those objects/stuff that are only partially identified when the source model is applied directly to the target images.
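The pooling in (1) and the weak-label classification loss can be sketched in NumPy as below; the array shapes and the value of $k$ used in the test are illustrative, not the paper's settings:

```python
import numpy as np

def image_level_prediction(A, k=1.0):
    """LogSumExp (smooth-max) pooling over the spatial dimensions of the
    segmentation logits A (H, W, C), followed by a sigmoid, giving one
    probability of presence per category: p of shape (C,)."""
    H, W, _ = A.shape
    pooled = np.log(np.mean(np.exp(k * A), axis=(0, 1))) / k  # smooth max per category
    return 1.0 / (1.0 + np.exp(-pooled))                      # sigmoid

def weak_label_bce(p, y, eps=1e-8):
    """Category-wise binary cross-entropy between the predictions p and
    the multi-hot weak labels y (both of shape (C,))."""
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
```

A large logit anywhere in a category's channel pushes that category's pooled probability toward 1, which is exactly the "is this class present anywhere?" behaviour the loss needs.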
Weak Labels for Feature Alignment
- Methods in the literature align either the feature space or the output space across domains. However, these alignments are category-agnostic, so they may align features of categories that are not even present in certain images.
- Also, features belonging to different categories may have different domain gaps. Thus, category-wise alignment could be beneficial but has not been widely studied in UDA for semantic segmentation.
Category-wise Feature Pooling
- Given the last-layer features $F$ and the segmentation prediction $A$, they obtain the category-wise features by using the prediction as an attention over the features. Specifically, they obtain the category-wise feature $\mathcal{F}^c$ as a 2048-dimensional vector for category $c$ as follows:

$$\mathcal{F}^c = \sum_{h,w} \sigma_{hw}(A)^{(h,w,c)} \, F^{(h,w)} \tag{3}$$

- Here, $\sigma_{hw}(A)$ is a tensor of dimension $H \times W \times C$, with each channel along the category dimension representing a category-wise attention map obtained by the softmax operation over the spatial dimensions.
- $\sigma_{hw}(A)^{(h,w,c)}$ is a scalar and $F^{(h,w)}$ is a 2048-dimensional vector. So, $\mathcal{F}^c$ is the feature $F$ weighted by $\sigma_{hw}(A)^{(\cdot,\cdot,c)}$ and summed over the spatial map $H \times W$. Note that the subscripts $s$ and $t$ are dropped as they employ the same operation to obtain the category-wise features for both domains.
- Note that $\mathcal{F}^c$ denotes the pooled feature for category $c$, and $\mathcal{F} = \{\mathcal{F}^c\}_{c=1}^{C}$ denotes the set of pooled features for all categories.
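A minimal NumPy sketch of this attention-weighted pooling (the feature dimension is kept generic rather than fixed at 2048):

```python
import numpy as np

def category_wise_pooling(F, A):
    """Pool features F (H, W, D) into one D-dim vector per category,
    using a per-category spatial softmax of the segmentation logits
    A (H, W, C) as an attention map."""
    att = np.exp(A - A.max(axis=(0, 1), keepdims=True))   # numerically stable
    att = att / att.sum(axis=(0, 1), keepdims=True)       # softmax over H, W per category
    # Fc[c] = sum over (h, w) of att[h, w, c] * F[h, w, :]
    return np.einsum('hwc,hwd->cd', att, F)
```

Because each attention map sums to 1 over the spatial grid, every pooled vector is a convex combination of the spatial feature vectors, concentrated on the pixels the network assigns to that category.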
Category-wise Feature Alignment
- To learn $G$ such that the source and target category-wise features are aligned, they use an adversarial loss with category-specific discriminators $D^c$.
- The reason for using category-specific discriminators is to ensure that the feature distribution for each category could be aligned independently, which avoids the noisy distribution modeling from a mixture of categories.
- They train $C$ distinct category-specific discriminators $D^c$ as follows:

$$\mathcal{L}_d = -\sum_{c=1}^{C} y_s^c \log D^c(\mathcal{F}_s^c) + y_t^c \log\left(1 - D^c(\mathcal{F}_t^c)\right) \tag{4}$$

- While training the discriminators, they only compute the loss for those categories which are present in the particular image, via the weak labels $y_s$ and $y_t$ that indicate whether a category occurs in an image or not.
- The adversarial loss on the target images to train $G$ is:

$$\mathcal{L}_{adv} = -\sum_{c=1}^{C} y_t^c \log D^c(\mathcal{F}_t^c) \tag{5}$$
- Similarly, they use the target weak labels to align only those categories present in the target image.
- Note: These 2 loss functions are effectively those used in the original GAN paper and also in the output space adaptation paper (CVPR '18).
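The weak-label gating of these GAN-style losses can be sketched as below. For simplicity the discriminator outputs are passed in directly as probabilities; in the actual method each $D^c$ would be a small network over the pooled features:

```python
import numpy as np

def discriminator_loss(d_src, d_tgt, y_src, y_tgt, eps=1e-8):
    """Each per-category discriminator should output 1 for source pooled
    features and 0 for target ones; the multi-hot weak labels y_src, y_tgt
    mask out categories absent from each image.
    d_src, d_tgt: discriminator outputs in (0, 1), shape (C,)."""
    return (-np.sum(y_src * np.log(d_src + eps))
            - np.sum(y_tgt * np.log(1.0 - d_tgt + eps)))

def adversarial_loss(d_tgt, y_tgt, eps=1e-8):
    """Adversarial term for the segmentation network: fool the
    discriminators of the present categories into predicting 'source'."""
    return -np.sum(y_tgt * np.log(d_tgt + eps))
```

The masking is what prevents the discriminators from modelling a noisy mixture: gradients only flow for categories the weak labels say are actually in the image.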
Network Optimization
Discriminator Training
- Both source and target images are used to train a set of $C$ distinct discriminators $\{D^c\}_{c=1}^{C}$, one per category, which learn to distinguish between the category-wise features drawn from the source and the target domain.
- The optimization problem to train the discriminators can be expressed as $\min_{D} \mathcal{L}_d$.
Segmentation Network Training
- They train $G$ with the pixel-wise CE loss $\mathcal{L}_s$ on the source images, and the image classification loss $\mathcal{L}_c$ and adversarial loss $\mathcal{L}_{adv}$ on the target images.
- The combined loss function to train $G$ is:

$$\min_{G} \; \mathcal{L}_s + \lambda_c \mathcal{L}_c + \lambda_{adv} \mathcal{L}_{adv} \tag{6}$$

- They follow the standard GAN training procedure (NeurIPS '14) to alternately update $G$ and $D$.
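The alternating optimization reduces to a simple weighted sum for the segmentation network; the $\lambda$ weights below are illustrative placeholders, not values reported in these notes:

```python
def generator_objective(L_s, L_c, L_adv, lam_c=0.2, lam_adv=0.001):
    """Total loss minimized w.r.t. the segmentation network G.
    Per GAN-style iteration: (1) update each D^c to minimize the
    discriminator loss with G fixed; (2) update G to minimize this
    combined objective with D fixed. (lambda values are placeholders.)"""
    return L_s + lam_c * L_c + lam_adv * L_adv
```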
Acquiring Weak Labels
Pseudo-Weak Labels (UDA)
- One way is to directly estimate the weak labels using the data available, i.e. source images/labels and target images, which is the UDA setting. In this work, they utilize the baseline model (CVPR '18) to adapt a model learned from the source to the target domain, and obtain the weak labels of target images as follows:

$$y_t^c = \mathbb{1}\left[p_t^c > T\right] \tag{7}$$

- Here, $p_t^c$ is the probability for category $c$ as computed in (1) and $T$ is a threshold, which they set to 0.2 in the experiments.
- They forward a target image through the model and obtain the weak labels using (7) in an online manner. Since these do not require any human supervision, this is the UDA setting.
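The thresholding in (7) is a one-liner; the sketch below assumes the image-level probabilities from (1) are already available:

```python
import numpy as np

def pseudo_weak_labels(p, T=0.2):
    """Threshold the per-category image-level probabilities p (C,) to get
    multi-hot pseudo-weak labels (UDA: no human annotation needed)."""
    return (p > T).astype(np.int64)
```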
Oracle-Weak Labels (WDA)
- In this, they obtain weak labels by querying a human oracle to provide a list of categories that occur in the target image.
- They further show that their method can use different forms of oracle-weak labels by using point supervision (ECCV '16) (which is only slightly more effort compared to image-level supervision).
- In point supervision, they randomly obtain one pixel coordinate for each category present in the image, i.e. the set of tuples $\{(h^c, w^c) : y_t^c = 1\}$. For an image, they compute the loss as follows: $\mathcal{L}_{point} = -\sum_{c \,:\, y_t^c = 1} \log O_t^{(h^c, w^c, c)}$, where $O_t$ is the output prediction of the target after pixel-wise softmax.
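A sketch of this point-supervision loss, assuming the softmaxed prediction is an (H, W, C) array and the annotated points are given as (row, column, category) tuples:

```python
import numpy as np

def point_supervision_loss(O, points, eps=1e-8):
    """Pixel-wise cross-entropy evaluated only at the annotated points.
    O: softmaxed target prediction, shape (H, W, C).
    points: iterable of (h, w, c) tuples, one labelled pixel per
    category present in the image."""
    return -sum(np.log(O[h, w, c] + eps) for h, w, c in points)
```

Only the handful of clicked pixels contribute, which is why this form of supervision costs barely more annotator effort than image-level tags.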
Conclusion
- In this work, they use weak labels to improve domain adaptation for semantic segmentation in both UDA and WDA settings, with the latter being a novel setting.
- They design an image-level classification module using weak labels, enforcing the network to pay attention to categories present in the image. With this guidance from weak labels, they further utilize a category-wise alignment method to improve adversarial alignment in the feature space.
- Their formulation generalizes to both pseudo-weak and oracle-weak labels.