# Notes on "Universal Domain Adaptation through Self-Supervision" [[Paper]](https://arxiv.org/pdf/2002.07953.pdf)
###### tags: `notes` `unsupervised` `domain-adaptation`
Notes Author: [Rohit Lal](https://rohitlal.net/)
---
## Brief Outline
- Unsupervised domain adaptation methods traditionally assume that all source categories are present in the target domain. In practice, little may be known about the category overlap between the two domains.
- The authors propose a universally applicable domain adaptation framework that can handle arbitrary category shift, called Domain Adaptative Neighborhood Clustering via Entropy optimization (DANCE).
## Introduction
![](https://i.imgur.com/kPh0sd1.png)
- Combines two novel ideas:
1. Since we cannot fully rely on source categories to learn features discriminative for the target, the authors propose a novel neighborhood clustering technique to learn the structure of the target domain in a self-supervised way.
2. The authors use entropy-based feature alignment and rejection to align target features with the source, or reject them as "unknown" categories, based on their entropy.
- Rather than relying only on the supervision of source categories to learn a discriminative representation, DANCE harnesses the cluster structure of the target domain using self-supervision.
## Methodology
![](https://i.imgur.com/xdsn5Ru.png)
- The task is universal domain adaptation: given a labeled source domain $D_s = \{(x_i^s,y_i^s)\}_{i=1}^{N_s}$ with "known" categories $L_s$, and an unlabeled target domain $D_t = \{x_i^t\}_{i=1}^{N_t}$ which contains all or some of the "known" categories plus possible "unknown" categories.
- The goal is to label each target sample with either one of the $L_s$ labels or the "unknown" label.
![](https://i.imgur.com/0g1MOsZ.png)
### Network Architecture
- Network Adapted from [Semi-supervised Domain Adaptation via Minimax Entropy](https://arxiv.org/pdf/1904.06487.pdf)
- ![](https://i.imgur.com/7vsP5b6.png)
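A minimal PyTorch-style sketch of the classifier head used in this architecture family (L2-normalized features and class weights with temperature scaling, as in MME). The dimensions, temperature value, and initialization below are assumptions, not the paper's exact settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """L2-normalized ("cosine") classifier head with temperature scaling,
    in the style of the MME architecture. feat_dim, num_classes and
    temperature are placeholder values."""
    def __init__(self, feat_dim=512, num_classes=31, temperature=0.05):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(num_classes, feat_dim))
        self.temperature = temperature

    def forward(self, feat):
        feat = F.normalize(feat, dim=1)          # unit-norm features
        w = F.normalize(self.weight, dim=1)      # unit-norm class prototypes
        return feat @ w.t() / self.temperature   # scaled cosine-similarity logits
```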
### Neighborhood Clustering (NC)
- The authors propose to minimize the entropy of each target point's similarity distribution to other target samples and to prototypes. To minimize the entropy, the point moves closer to a nearby point (a neighbor is assumed to exist) or to a prototype.
- As a result, samples from similar classes cluster together (see Fig. 1).
- In the loss below, $p$ denotes the output of the network, converted into a softmax distribution over similarities to other target points and to prototypes; a minimal sketch follows the figure below.
![](https://i.imgur.com/2wBFPgB.png)
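Below is a minimal PyTorch-style sketch of the NC idea, assuming a memory bank of stored target features and the classifier weights as prototypes. The `temperature`, the memory-bank contents, and the handling of a sample's own entry are simplifications rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def neighborhood_clustering_loss(feat, memory_bank, prototypes, temperature=0.05):
    """Entropy of each target point's similarity distribution over
    (i) stored features of other target samples (memory_bank) and
    (ii) class prototypes; minimizing it pulls each point toward a
    neighbor or a prototype. Simplified sketch, not the official code."""
    feat = F.normalize(feat, dim=1)
    bank = F.normalize(torch.cat([memory_bank, prototypes], dim=0), dim=1)
    sim = feat @ bank.t() / temperature                # similarity logits
    p = F.softmax(sim, dim=1)                          # similarity distribution
    entropy = -(p * torch.log(p + 1e-8)).sum(dim=1)    # H(p) per target point
    return entropy.mean()
```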
### Entropy Separation loss (ES)
- The neighborhood clustering loss encourages the target samples to become well-clustered, but we still need to align some of them with “known” source categories while keeping the “unknown” target samples far from the source.
- “unknown" target samples are likely to have a larger entropy of the source classifier’s output than “known” target samples. This is because “unknown" target samples do not share common features with “known" source classes.
- The distance between the entropy and the threshold boundary $\rho$ is defined as $|H(p) - \rho|$, where $p$ is the classification output for a target sample. By maximizing this distance, we push $H(p)$ far from $\rho$. A confidence threshold $m$ applies the separation loss only to confident samples, i.e. those with $|H(p) - \rho| > m$. The final loss function:
![](https://i.imgur.com/Azatc9E.png)
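A hedged sketch of the ES loss as described above, with `rho` and `m` as hyperparameters; the default values below are placeholders, not the paper's settings.

```python
import torch

def entropy_separation_loss(p, rho=1.0, m=0.5):
    """Push the classifier-output entropy H(p) away from the boundary rho,
    but only for samples whose distance |H(p) - rho| already exceeds the
    confidence margin m. Values of rho and m here are placeholders."""
    entropy = -(p * torch.log(p + 1e-8)).sum(dim=1)    # H(p) per target sample
    dist = torch.abs(entropy - rho)
    # maximizing the distance == minimizing its negative; unconfident samples get zero loss
    loss = torch.where(dist > m, -dist, torch.zeros_like(dist))
    return loss.mean()
```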
### Training with Domain Specific Batch Normalization
- The batch normalization layer whitens the feature activations, which contributes to a performance gain.
- Simply splitting source and target samples into different mini-batches and forwarding them separately helps alignment.
- Final objective: $\mathcal{L} = \mathcal{L}_{cls} + \lambda(\mathcal{L}_{nc} + \mathcal{L}_{es})$
  - $\mathcal{L}_{cls}$ is the cross-entropy loss on source samples, and $\lambda$ weights the target-side losses.
- The losses on source and target are calculated on different mini-batches to achieve domain-specific batch normalization; see the training-step sketch below.
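A rough training-step sketch tying the pieces together, reusing the `neighborhood_clustering_loss` and `entropy_separation_loss` sketches above. `backbone`, `classifier`, `memory_bank`, `rho`, `m`, and `lam` are placeholder names; source and target are forwarded in separate mini-batches so batch-norm statistics stay domain-specific.

```python
import torch.nn.functional as F

def train_step(backbone, classifier, x_s, y_s, x_t, memory_bank,
               rho, m, lam, optimizer):
    """One optimization step with the full objective
    L = L_cls + lambda * (L_nc + L_es). All argument names are placeholders."""
    logits_s = classifier(backbone(x_s))                   # source-only mini-batch
    loss_cls = F.cross_entropy(logits_s, y_s)

    feat_t = backbone(x_t)                                 # target-only mini-batch
    p_t = F.softmax(classifier(feat_t), dim=1)
    loss_nc = neighborhood_clustering_loss(feat_t, memory_bank, classifier.weight)
    loss_es = entropy_separation_loss(p_t, rho, m)

    loss = loss_cls + lam * (loss_nc + loss_es)            # final objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```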