
Notes on "Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision"

tags: notes unsupervised domain-adaptation segmentation self-supervised cvpr20

Author: Akshay Kulkarni

Note: CVPR '20 Oral, Code

Brief Outline

They propose a two-step self-supervised DA approach to minimize the inter-domain and intra-domain gaps together.

  1. They conduct inter-domain adaptation and from this, they separate the target domain into an easy and hard split using an entropy-based ranking function.
  2. For the intra-domain adaptation, they propose a self-supervised adaptation technique from the easy to the hard split.

Introduction

  • Target data collected from the real world have diverse scene distributions, caused by factors such as moving objects and weather conditions, which leads to a large gap within the target domain itself (the intra-domain gap).
  • Previous DA works focus mostly on the inter-domain gap, so this paper presents a two-step DA approach to minimize both the inter-domain and the intra-domain gaps.
  • Their model consists of three parts:
    • An inter-domain adaptation module to close the inter-domain gap between labeled source data and unlabeled target data.
    • An entropy-based ranking system to separate target data into an easy and hard split.
    • An intra-domain adaptation module to close intra-domain gap between the easy and hard split (using pseudo labels from the easy domain).

Methodology

(Figure: overview of the full two-step adaptation procedure.)

  • Let $S$ denote a source domain consisting of a set of images $\mathbb{R}^{H \times W \times 3}$ with their associated ground-truth $C$-class segmentation maps $(1, C)^{H \times W}$. Similarly, let $T$ denote a target domain containing a set of unlabeled images $\mathbb{R}^{H \times W \times 3}$.
  • The first step is inter-domain adaptation, based on common UDA approaches (Tsai et al. 2018 and Vu et al. 2019). Then, the pseudo labels and predicted entropy maps are used by an entropy-based ranking system to cluster the target data into the easy and hard splits.
  • The second step is intra-domain adaptation, which consists of aligning the pseudo-labeled easy split with the hard split. The full procedure is illustrated in the figure above.
  • The proposed network consists of the inter-domain generator and discriminator $\{G_{inter}, D_{inter}\}$, and the intra-domain generator and discriminator $\{G_{intra}, D_{intra}\}$.

Inter-Domain Adaptation

  • A sample $X_s \in \mathbb{R}^{H \times W \times 3}$ is from the source domain with its associated map $Y_s$. Each entry $Y_s^{(h,w)} = \big[Y_s^{(h,w,c)}\big]_c$ of $Y_s$ provides the label of a pixel $(h,w)$ as a one-hot vector.
  • The network $G_{inter}$ takes $X_s$ as input and generates a "soft segmentation map" $P_s = G_{inter}(X_s)$. $G_{inter}$ is optimized in a supervised way by minimizing the CE loss

$$\mathcal{L}_{seg}^{inter}(X_s, Y_s) = -\sum_{h,w}\sum_{c} Y_s^{(h,w,c)} \log\big(P_s^{(h,w,c)}\big) \tag{1}$$

  • ADVENT (Vu et al. 2019) assumes that trained models tend to produce over-confident (low-entropy) predictions for source-like images and under-confident (high-entropy) predictions for target-like images. Based on this, it proposes to utilize entropy maps to align the distribution shift of the features.
  • This paper adopts ADVENT for inter-domain adaptation due to its simplicity and effectiveness. The generator $G_{inter}$ takes a target image $X_t$ as input and produces the segmentation map $P_t = G_{inter}(X_t)$, and the entropy map $I_t$ is formulated as

$$I_t^{(h,w)} = -\sum_{c} P_t^{(h,w,c)} \log\big(P_t^{(h,w,c)}\big) \tag{2}$$

  • To reduce the inter-domain gap, $D_{inter}$ is trained to predict the domain labels of the entropy maps while $G_{inter}$ is trained to fool $D_{inter}$, and the optimization is achieved via the loss function

$$\mathcal{L}_{adv}^{inter}(X_s, X_t) = \sum_{h,w} \log\big(1 - D_{inter}(I_t^{(h,w)})\big) + \log\big(D_{inter}(I_s^{(h,w)})\big) \tag{3}$$

  • Here, $I_s$ is the entropy map of $X_s$. The losses (1) and (3) are optimized jointly to align the distribution shift between the source and target domains (a minimal sketch of this training step is given below).
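
Below is a minimal PyTorch-style sketch of one inter-domain training step, assuming `G_inter` outputs per-pixel class logits, `D_inter` is a small fully-convolutional discriminator, and `y_s` holds class indices. The helper names, the binary-cross-entropy form of Eq. (3), and the weight `lam_adv` are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn.functional as F

def entropy_map(p, eps=1e-30):
    """Pixel-wise entropy (Eq. 2) of a soft segmentation map p of shape (N, C, H, W)."""
    return -torch.sum(p * torch.log(p + eps), dim=1)            # -> (N, H, W)

def inter_domain_step(G_inter, D_inter, opt_G, opt_D, x_s, y_s, x_t, lam_adv=0.001):
    """One update on a source batch (x_s, y_s with class indices) and a target batch x_t."""
    # --- update G_inter: supervised CE on source (Eq. 1) + fool D_inter on target entropy ---
    opt_G.zero_grad()
    p_s = F.softmax(G_inter(x_s), dim=1)
    loss_seg = F.nll_loss(torch.log(p_s + 1e-30), y_s)          # Eq. (1)
    p_t = F.softmax(G_inter(x_t), dim=1)
    i_t = entropy_map(p_t).unsqueeze(1)                         # Eq. (2), shape (N, 1, H, W)
    d_t = D_inter(i_t)
    # the generator tries to make target entropy maps look source-like (label 1)
    loss_adv = F.binary_cross_entropy_with_logits(d_t, torch.ones_like(d_t))
    (loss_seg + lam_adv * loss_adv).backward()
    opt_G.step()

    # --- update D_inter: source entropy -> 1, target entropy -> 0 (Eq. 3) ---
    opt_D.zero_grad()
    with torch.no_grad():
        i_s = entropy_map(F.softmax(G_inter(x_s), dim=1)).unsqueeze(1)
    d_s, d_t = D_inter(i_s), D_inter(i_t.detach())
    loss_D = (F.binary_cross_entropy_with_logits(d_s, torch.ones_like(d_s))
              + F.binary_cross_entropy_with_logits(d_t, torch.zeros_like(d_t)))
    loss_D.backward()
    opt_D.step()
```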

Entropy-based Ranking

  • Some target prediction maps are clean (confident and smooth) while others are noisy, despite being generated from the same model. Since this intra-domain gap exists among target images, a straightforward solution is to decompose the target domain into small subdomains.
  • To build these splits, they use entropy maps to determine the confidence levels of target predictions. They rank the predictions using the mean value of the entropy map $I_t$, given by

$$R(X_t) = \frac{1}{HW} \sum_{h,w} I_t^{(h,w)} \tag{4}$$

  • Let $X_{te}$ and $X_{th}$ denote a target image assigned to the easy and the hard split, respectively. For domain separation, they define $\lambda = \frac{|X_{te}|}{|X_t|}$, where $|X_{te}|$ is the cardinality (number of elements) of the easy split and $|X_t|$ is the cardinality of the whole target set.
  • Note that a threshold value for separation is not used, since it would be specific to a dataset. They instead choose the ratio $\lambda$ as a hyperparameter, which shows strong generalization to other datasets (a minimal ranking-and-split sketch follows).
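
A minimal sketch of the ranking (Eq. 4) and the ratio-based split, assuming the per-pixel entropy maps of all target images have already been computed with `G_inter`; the dictionary layout and function name are illustrative.

```python
import numpy as np

def rank_and_split(entropy_maps, lam):
    """entropy_maps: dict {image_id: (H, W) array of pixel entropies from G_inter}.
    lam: easy-split ratio lambda (a hyperparameter of the method)."""
    scores = {img_id: float(np.mean(i_t)) for img_id, i_t in entropy_maps.items()}  # R(X_t), Eq. (4)
    ranked = sorted(scores, key=scores.get)       # ascending: low mean entropy = confident = easy
    n_easy = int(lam * len(ranked))
    return ranked[:n_easy], ranked[n_easy:]       # (easy split, hard split)
```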

Entropy Normalization

  • Complex scenes (containing many objects) tend to have higher mean entropy and might therefore be categorized as hard.
  • For a more representative ranking, they adopt a new normalization, dividing the mean entropy by the number of predicted rare classes in the target image.
  • Note that rare classes are pre-defined from the set of all classes (see results section in the paper for definition).
  • The entropy normalization helps to move images with many objects into the easy split (a small sketch of the normalized score follows).
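
A small sketch of the normalized score, assuming `pred` is the arg-max class map of the target image and `rare_classes` is the paper's pre-defined set of rare class indices; the `max(n_rare, 1)` guard for images that contain no rare classes is my own assumption.

```python
import numpy as np

def normalized_rank(entropy_map, pred, rare_classes):
    """entropy_map: (H, W) pixel entropies; pred: (H, W) arg-max class prediction;
    rare_classes: pre-defined set of rare class indices (see the paper's results section)."""
    n_rare = len(set(np.unique(pred).tolist()) & set(rare_classes))
    return float(np.mean(entropy_map)) / max(n_rare, 1)   # guard when no rare class is predicted
```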

Intra-domain Adaptation

  • They propose to use the predictions from $G_{inter}$ as pseudo labels for the easy split. Given an image $X_{te}$ from the easy split, the prediction map $P_{te} = G_{inter}(X_{te})$ is a soft segmentation map, which is converted to a pseudo-label map in which each entry is a one-hot vector.
  • Using these pseudo labels, $G_{intra}$ is optimized by minimizing the CE loss

$$\mathcal{L}_{seg}^{intra}(X_{te}) = -\sum_{h,w}\sum_{c} P_{te}^{(h,w,c)} \log\big(G_{intra}(X_{te})^{(h,w,c)}\big) \tag{5}$$

  • An image $X_{th}$ from the hard split gives the segmentation map $P_{th} = G(X_{th})$ (note that this is the $G$ being trained and not the fixed $G_{inter}$) and the entropy map $I_{th}$.
  • To close the intra-domain gap, the intra-domain discriminator $D_{intra}$ is trained to predict the split labels of $I_{te}$ (easy split) and $I_{th}$ (hard split), while $G$ is trained to fool $D_{intra}$ (a training-step sketch is given after the equation). The adversarial loss can be formulated as

$$\mathcal{L}_{adv}^{intra}(X_{te}, X_{th}) = \sum_{h,w} \log\big(1 - D_{intra}(I_{th}^{(h,w)})\big) + \log\big(D_{intra}(I_{te}^{(h,w)})\big) \tag{6}$$
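
A minimal PyTorch-style sketch of one intra-domain training step covering Eqs. (5) and (6), assuming `G_inter` is frozen and `G_intra` / `D_intra` mirror the inter-domain pair; as before, the names, the BCE form of the adversarial loss, and `lam_adv` are illustrative assumptions rather than the official implementation.

```python
import torch
import torch.nn.functional as F

def pixel_entropy(p, eps=1e-30):
    """Pixel-wise entropy of a soft segmentation map p (N, C, H, W) -> (N, 1, H, W)."""
    return -(p * torch.log(p + eps)).sum(dim=1, keepdim=True)

def intra_domain_step(G_inter, G_intra, D_intra, opt_G, opt_D, x_te, x_th, lam_adv=0.001):
    """One update on an easy-split batch x_te and a hard-split batch x_th."""
    with torch.no_grad():                                        # pseudo labels from the frozen G_inter
        pseudo = F.softmax(G_inter(x_te), dim=1).argmax(dim=1)

    # --- update G_intra: CE against pseudo labels (Eq. 5) + fool D_intra on hard entropy (Eq. 6) ---
    opt_G.zero_grad()
    p_te = F.softmax(G_intra(x_te), dim=1)
    loss_seg = F.nll_loss(torch.log(p_te + 1e-30), pseudo)       # Eq. (5)
    p_th = F.softmax(G_intra(x_th), dim=1)
    i_th = pixel_entropy(p_th)
    d_h = D_intra(i_th)
    # make hard-split entropy maps look easy-like (label 1)
    loss_adv = F.binary_cross_entropy_with_logits(d_h, torch.ones_like(d_h))
    (loss_seg + lam_adv * loss_adv).backward()
    opt_G.step()

    # --- update D_intra: easy entropy -> 1, hard entropy -> 0 ---
    opt_D.zero_grad()
    i_te, i_th = pixel_entropy(p_te).detach(), i_th.detach()
    d_e, d_h = D_intra(i_te), D_intra(i_th)
    loss_D = (F.binary_cross_entropy_with_logits(d_e, torch.ones_like(d_e))
              + F.binary_cross_entropy_with_logits(d_h, torch.zeros_like(d_h)))
    loss_D.backward()
    opt_D.step()
```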

  • The complete loss function $\mathcal{L}$ is the sum of the four losses (1), (3), (5), and (6), and the objective is to learn a target model $G$ according to

$$G^{*} = \underset{G_{intra}}{\arg\min}\; \min_{G_{inter},\, G_{intra}}\; \max_{D_{inter},\, D_{intra}}\; \mathcal{L} \tag{7}$$

  • Since the proposed method is a two-step self-supervised approach, it is difficult to train in a single stage. They choose to minimize it in three stages as follows:
    1. Train the inter-domain adaptation model to optimize $G_{inter}$ and $D_{inter}$.
    2. Generate target pseudo labels using $G_{inter}$ and rank all target images based on $R(X_t)$.
    3. Train the intra-domain adaptation model to optimize $G_{intra}$ and $D_{intra}$.

Note: See Section 4.3 of the paper for theoretical analysis (not able to understand yet).

Conclusion

  • A self-supervised DA approach is proposed to minimize the inter-domain and intra-domain gaps simultaneously.