
Notes on "Domain Adaptive Semantic Segmentation Using Weak Labels"

tags: notes domain-adaptation segmentation weakly-supervised unsupervised

ECCV '20 paper; Project Page; Code not released as of 20/09/20.

Author: Akshay Kulkarni

Brief Outline

This paper proposes a framework for domain adaptation (DA) in semantic segmentation with image-level weak labels in the target domain. They use weak labels to enable the interplay between feature alignment and pseudo-labeling, improving both in DA.

Introduction

  • Existing UDA methods for semantic segmentation are developed mainly using two mechanisms:
    • Pseudo-label self-training
      • In this, pixel-wise pseudo labels are generated via strategies such as confidence scores (BMVC '18, CVPR '19) or self-paced learning (ECCV '18).
      • But, such pseudo labels are specific to the target domain and do not consider alignment between domains.
    • Distribution alignment between source and target domains
  • To alleviate the issue of lacking annotations in the target domain, they propose utilizing weak labels in the form of image- or point-level annotations in the target domain.
  • The weak labels could be estimated from the model prediction in the UDA setting or provided by the human oracle in the weakly-supervised DA (WDA) paradigm. Note that this is the first paper to introduce a WDA setting for semantic segmentation.
  • Specifically, they use weak labels to perform
    • image-level classification to identify the presence/absence of categories in an image as a regularization.
    • category-wise domain alignment using such categorical labels.
  • For the image-level classification task, weak labels help obtain a better pixel-wise attention map per category. These category-wise attention maps act as guidance to further pool category-wise features for the proposed domain alignment procedure.
  • The main contributions of this work are
    • They propose the concept of using weak labels to help DA for semantic segmentation.
    • They utilize weak labels to improve category-wise alignment for better feature space adaptation.
    • They demonstrate the applicability of their method to both UDA and WDA settings.

Methodology

Problem Definition

  • In the source domain, they have images and pixel-wise labels, denoted $I_s = \{X_s^i, Y_s^i\}_{i=1}^{N_s}$, whereas the target dataset contains images and only image-level labels, $I_t = \{X_t^i, y_t^i\}_{i=1}^{N_t}$.
  • Here, $X_s, X_t \in \mathbb{R}^{H \times W \times 3}$, $Y_s \in \mathbb{B}^{H \times W \times C}$ contains pixel-wise one-hot vectors, $y_t \in \mathbb{B}^C$ is a multi-hot vector representing the categories present in the image, and $C$ is the number of categories (the same for both source and target datasets).
  • The image-level labels $y_t$, termed weak labels, can be estimated (in which case they are called pseudo-weak labels, i.e. UDA) or acquired from a human oracle (in which case they are called oracle-weak labels, i.e. WDA).
  • Given such data, the problem is to adapt a segmentation model $G$ learned on the source dataset $I_s$ to the target dataset $I_t$.

Algorithm Overview


  • They first pass both the source and target images through the segmentation network $G$ and obtain their features $F_s, F_t \in \mathbb{R}^{H \times W \times 2048}$, segmentation predictions $A_s, A_t \in \mathbb{R}^{H \times W \times C}$, and the upsampled pixel-wise predictions $O_s, O_t \in \mathbb{R}^{H \times W \times C}$.
  • As a baseline, they use source pixel-wise annotations to learn $G$, while aligning the output space distributions $O_s$ and $O_t$, following this CVPR '18 paper.
  • First, they introduce a module which learns to predict the categories present in a target image. Second, they formulate a mechanism to align the features of each individual category between the source and target domains.
  • To this end, they use category-specific domain discriminators $D^c$ guided by the weak labels to determine which categories should be aligned.
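The shape flow above can be sketched with mock tensors (a minimal numpy sketch; the sizes and the 1×1-conv classifier `W_cls` are illustrative stand-ins for the actual ResNet-based $G$, and the upsampling step that produces $O$ at full resolution is omitted):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Illustrative sizes; the real G is a ResNet-based segmentation network.
H, W, C, D = 4, 4, 5, 2048
rng = np.random.default_rng(0)

F = rng.standard_normal((H, W, D))    # last-layer features F
W_cls = rng.standard_normal((D, C))   # stand-in 1x1 conv classifier
A = F @ W_cls                         # segmentation predictions A, H x W x C
O = softmax(A)                        # pixel-wise probabilities (upsampling omitted)
```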

Weak Labels for Category Classification

  • To predict whether a category is absent/present in a particular image, they define an image classification task using the weak labels, such that $G$ can discover those categories.
  • They feed the target images $X_t$ through $G$ to obtain the predictions $A_t$, and then apply a global pooling layer to obtain a single vector of predictions for each category:

$$p_t^c = \sigma_s\left[\frac{1}{k}\log\frac{1}{HW}\sum_{h,w}\exp k A_t^{(h,w,c)}\right] \tag{1}$$

  • Here, $\sigma_s$ is the sigmoid function, so that $p_t^c$ represents the probability that category $c$ appears in the image. Note that (1) is a smooth approximation of the $\max$ function; the higher the value of $k$, the better it approximates $\max$. They use $k = 1$.
  • Using $p_t$ and the weak labels $y_t$, they compute the category-wise binary cross-entropy loss:

$$\mathcal{L}_c(X_t; G) = -\sum_{c=1}^{C} y_t^c \log(p_t^c) + (1 - y_t^c)\log(1 - p_t^c) \tag{2}$$

  • This loss $\mathcal{L}_c$ helps identify the categories which are absent/present in a particular image, and enforces $G$ to pay attention to those objects/stuff that are only partially identified when the source model is applied directly to the target images.
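Equations (1) and (2) can be sketched in numpy as follows (a minimal sketch; the max-subtraction for numerical stability is an added assumption, not part of the paper's formulation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def smooth_max_pool(A, k=1.0):
    # Eq. (1) before the sigmoid: (1/k) * log( mean_{h,w} exp(k * A) ).
    # Subtracting the per-category max keeps the exponentials stable.
    m = A.max(axis=(0, 1))
    return m + np.log(np.exp(k * (A - m)).mean(axis=(0, 1))) / k

def classification_loss(A_t, y_t, k=1.0):
    # Eq. (2): category-wise binary cross-entropy on the pooled scores.
    p_t = sigmoid(smooth_max_pool(A_t, k))   # p_t[c] = P(category c present)
    return -np.sum(y_t * np.log(p_t) + (1 - y_t) * np.log(1 - p_t))
```

For large `k` the pooled score approaches the spatial maximum of `A`, which is why the paper calls (1) a smooth approximation of $\max$.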

Weak Labels for Feature Alignment

  • Methods in the literature align either the feature space or the output space across domains. However, these are category-agnostic, so they may align features of categories that are not present in certain images.
  • Also, features belonging to different categories may have different domain gaps. Thus, category-wise alignment could be beneficial, but it has not been widely studied in UDA for semantic segmentation.

Category-wise Feature Pooling

  • Given the last-layer features $F$ and the segmentation prediction $A$, they obtain the category-wise features by using the prediction as an attention over the features. Specifically, they obtain the category-wise feature $F^c$, a 2048-dimensional vector for the $c^{th}$ category, as follows:

$$F^c = \sum_{h,w} \sigma(A)^{(h,w,c)} F^{(h,w)} \tag{3}$$

  • Here, $\sigma(A)$ is a tensor of dimension $H \times W \times C$, with each channel along the category dimension representing the category-wise attention, obtained by the softmax operation $\sigma$ over the spatial dimensions.
  • $\sigma(A)^{(h,w,c)}$ is a scalar and $F^{(h,w)}$ is a 2048-dimensional vector, so $F^c$ is the sum of the features $F^{(h,w)}$ weighted by $\sigma(A)^{(h,w,c)}$ over the spatial map $H \times W$. Note that the subscripts $s$ and $t$ are dropped, as the same operation is applied to obtain the category-wise features for both domains.
  • Note that $F^c$ denotes the pooled features for the $c^{th}$ category, and $F^C$ denotes the set of pooled features for all categories.
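The pooling in Eq. (3) is a weighted sum over spatial positions, one output vector per category; a minimal numpy sketch (shapes are illustrative):

```python
import numpy as np

def spatial_softmax(A):
    # sigma(A): softmax over the H*W spatial positions, per category channel.
    flat = A.reshape(-1, A.shape[-1])
    e = np.exp(flat - flat.max(axis=0))
    return (e / e.sum(axis=0)).reshape(A.shape)

def category_pool(F, A):
    # Eq. (3): F^c = sum_{h,w} sigma(A)^{(h,w,c)} * F^{(h,w)}  ->  (C, D)
    att = spatial_softmax(A)                  # H x W x C attention maps
    return np.einsum('hwc,hwd->cd', att, F)   # one pooled vector per category
```

Because each attention map sums to 1 over the spatial positions, each $F^c$ is a convex combination of the per-pixel features.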

Category-wise Feature Alignment

  • To learn $G$ such that the source and target category-wise features are aligned, they use an adversarial loss with category-specific discriminators $D^C = \{D^c\}_{c=1}^{C}$.
  • The reason for using category-specific discriminators is to ensure that the feature distribution of each category can be aligned independently, which avoids noisy distribution modeling from a mixture of categories.
  • They train the $C$ distinct category-specific discriminators $D^C$ as follows:

$$\mathcal{L}_d^C(F_s^C, F_t^C; D^C) = -\sum_{c=1}^{C} y_s^c \log D^c(F_s^c) + y_t^c \log\left(1 - D^c(F_t^c)\right) \tag{4}$$

  • While training the discriminators, they only compute the loss for those categories which are present in the particular images, via the weak labels $y_s, y_t \in \mathbb{B}^C$ that indicate whether a category occurs in an image or not.
  • The adversarial loss on the target images used to train $G$ is:

$$\mathcal{L}_{adv}^C(F_t^C; G, D^C) = -\sum_{c=1}^{C} y_t^c \log D^c(F_t^c) \tag{5}$$

  • Similarly, they use the target weak labels $y_t$ to align only those categories present in the target image.
  • Note: these two loss functions are effectively those used in the original GAN paper, and also in the output space adaptation paper (CVPR '18).
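Assuming each `Ds_out[c]` / `Dt_out[c]` holds the scalar output $D^c(F^c) \in (0, 1)$ of the $c^{th}$ discriminator, Eqs. (4) and (5) reduce to weak-label-masked binary cross-entropy terms (a minimal sketch):

```python
import numpy as np

def discriminator_loss(Ds_out, Dt_out, y_s, y_t):
    # Eq. (4): each D^c should output 1 on source features and 0 on target
    # features; the weak labels zero out categories absent from each image.
    return -np.sum(y_s * np.log(Ds_out) + y_t * np.log(1.0 - Dt_out))

def adversarial_loss(Dt_out, y_t):
    # Eq. (5): G is updated so target features fool each present D^c.
    return -np.sum(y_t * np.log(Dt_out))
```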

Network Optimization

Discriminator Training

  • Both source and target images are used to train the set of $C$ distinct discriminators, one per category $c$, which learn to distinguish between category-wise features drawn from the source and target domains.
  • The optimization problem for training the discriminators can be expressed as $\min_{D^C} \mathcal{L}_d^C(F_s^C, F_t^C)$.

Segmentation Network Training

  • They train $G$ with the pixel-wise cross-entropy loss $\mathcal{L}_s$ on the source images, the image classification loss $\mathcal{L}_c$, and the adversarial loss $\mathcal{L}_{adv}^C$ on the target images.
  • The combined loss function to train $G$ is:

$$\min_G \mathcal{L}_s(X_s) + \lambda_c \mathcal{L}_c(X_t) + \lambda_d \mathcal{L}_{adv}^C(F_t^C) \tag{6}$$

  • They follow the standard GAN training procedure (NeurIPS '14) to alternately update $G$ and $D^C$.

Acquiring Weak Labels

Pseudo-Weak Labels (UDA)

  • One way is to directly estimate the weak labels from the available data, i.e. source images/labels and target images, which is the UDA setting. In this work, they utilize the baseline model (CVPR '18) to adapt a model learned on the source domain to the target domain, and obtain the weak labels of the target images as follows:

$$y_t^c = \begin{cases} 1, & p_t^c > T, \\ 0, & \text{otherwise.} \end{cases} \tag{7}$$

  • Here, $p_t^c$ is the probability for category $c$ as computed in (1), and $T$ is a threshold, which they set to 0.2 in the experiments.
  • They forward a target image through the model and obtain the weak labels using (7) in an online manner. Since these do not require human supervision, this is a UDA setting.
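Eq. (7) is a simple thresholding of the pooled category probabilities from Eq. (1) (minimal sketch):

```python
import numpy as np

def pseudo_weak_labels(p_t, T=0.2):
    # Eq. (7): mark category c as present when p_t^c exceeds the threshold T
    # (set to 0.2 in the paper's experiments).
    return (p_t > T).astype(np.int64)
```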

Oracle-Weak Labels (WDA)

  • In this setting, they obtain weak labels by querying a human oracle for a list of the categories that occur in the target image.
  • They further show that their method can use other forms of oracle-weak labels, e.g. point supervision (ECCV '16), which takes only slightly more annotation effort than image-level supervision.
  • In point supervision, they randomly obtain one pixel coordinate for each category present in the image, i.e. the set of tuples $\{(h_c, w_c, c) \mid y_t^c = 1\}$. For an image, they compute the loss as $\mathcal{L}_{point} = -\sum_{c\,:\,y_t^c = 1} y_t^c \log\left(O_t^{(h_c, w_c, c)}\right)$, where $O_t \in \mathbb{R}^{H \times W \times C}$ is the output prediction for the target image after pixel-wise softmax.
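The point-supervision loss is a cross-entropy evaluated only at the annotated pixels; a minimal sketch, where `points` is an assumed list of $(h_c, w_c, c)$ tuples for the categories present:

```python
import numpy as np

def point_supervision_loss(O_t, points):
    # L_point: negative log-probability of the correct category at the one
    # annotated pixel (h_c, w_c) for each category c present in the image.
    # O_t is the H x W x C output after pixel-wise softmax.
    return -sum(np.log(O_t[h, w, c]) for (h, w, c) in points)
```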

Conclusion

  • In this work, they use weak labels to improve domain adaptation for semantic segmentation in both UDA and WDA settings, with the latter being a novel setting.
  • They design an image-level classification module using weak labels, enforcing the network to pay attention to categories present in the image. With this guidance from weak labels, they further utilize a category-wise alignment method to improve adversarial alignment in the feature space.
  • Their formulation generalizes to both pseudo-weak and oracle-weak labels.