# Notes on "Universal Domain Adaptation through Self-Supervision" [[Paper]](https://arxiv.org/pdf/2002.07953.pdf)
###### tags: `notes` `unsupervised` `domain-adaptation`
Notes Author: [Rohit Lal](https://rohitlal.net/)
---
## Brief Outline
- Unsupervised domain adaptation methods traditionally assume that all source categories are present in the target domain. In practice, little may be known about the category overlap between the two domains.
- The authors propose a universally applicable domain adaptation framework that can handle arbitrary category shift, called Domain Adaptative Neighborhood Clustering via Entropy optimization (DANCE).
## Introduction
![](https://i.imgur.com/kPh0sd1.png)
- Combines two novel ideas:
1. Since we cannot fully rely on source categories to learn features discriminative for the target, the authors propose a novel neighborhood clustering technique to learn the structure of the target domain in a self-supervised way.
2. The authors use entropy-based feature alignment and rejection to align target features with the source, or reject them as "unknown" categories, based on their entropy.
- Rather than relying only on the supervision of source categories to learn a discriminative representation, DANCE harnesses the cluster structure of the target domain using self-supervision.
## Methodology
![](https://i.imgur.com/xdsn5Ru.png)
- The task is universal domain adaptation: given a labeled source domain $D_s = \{(x_i^s,y_i^s)\}_{i=1}^{N_s}$ with "known" categories $L_s$, and an unlabeled target domain $D_t = \{x_i^t\}_{i=1}^{N_t}$ which contains all or some of the "known" categories plus possible "unknown" categories.
- The goal is to label each target sample with either one of the $L_s$ labels or the "unknown" label.
![](https://i.imgur.com/0g1MOsZ.png)
### Network Architecture
- Network Adapted from [Semi-supervised Domain Adaptation via Minimax Entropy](https://arxiv.org/pdf/1904.06487.pdf)
- ![](https://i.imgur.com/7vsP5b6.png)
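A minimal PyTorch-style sketch of the classifier head used in this architecture family (L2-normalized features and class weights with temperature scaling, as in MME). The dimensions, temperature value, and initialization below are assumptions, not the paper's exact settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """L2-normalized ("cosine") classifier head with temperature scaling,
    in the style of the MME architecture. feat_dim, num_classes and
    temperature are placeholder values."""
    def __init__(self, feat_dim=512, num_classes=31, temperature=0.05):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(num_classes, feat_dim))
        self.temperature = temperature

    def forward(self, feat):
        feat = F.normalize(feat, dim=1)          # unit-norm features
        w = F.normalize(self.weight, dim=1)      # unit-norm class prototypes
        return feat @ w.t() / self.temperature   # scaled cosine-similarity logits
```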
### Neighborhood Clustering (NC)
- The authors propose to minimize the entropy of each target point's similarity distribution to other target samples and to prototypes. To minimize the entropy, the point moves closer to a nearby point (a neighbor is assumed to exist) or to a prototype.
- As a result, samples from similar classes cluster together (see Fig. 1).
- In the loss below, $p$ denotes the output of the network, converted into a softmax distribution over similarities to other target points and to prototypes; a minimal sketch follows the figure below.
![](https://i.imgur.com/2wBFPgB.png)
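Below is a minimal PyTorch-style sketch of the NC idea, assuming a memory bank of stored target features and the classifier weights as prototypes. The `temperature`, the memory-bank contents, and the handling of a sample's own entry are simplifications rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def neighborhood_clustering_loss(feat, memory_bank, prototypes, temperature=0.05):
    """Entropy of each target point's similarity distribution over
    (i) stored features of other target samples (memory_bank) and
    (ii) class prototypes; minimizing it pulls each point toward a
    neighbor or a prototype. Simplified sketch, not the official code."""
    feat = F.normalize(feat, dim=1)
    bank = F.normalize(torch.cat([memory_bank, prototypes], dim=0), dim=1)
    sim = feat @ bank.t() / temperature                # similarity logits
    p = F.softmax(sim, dim=1)                          # similarity distribution
    entropy = -(p * torch.log(p + 1e-8)).sum(dim=1)    # H(p) per target point
    return entropy.mean()
```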
### Entropy Separation loss (ES)
- The neighborhood clustering loss encourages the target samples to become well-clustered, but we still need to align some of them with “known” source categories while keeping the “unknown” target samples far from the source.
- “unknown" target samples are likely to have a larger entropy of the source classifier’s output than “known” target samples. This is because “unknown" target samples do not share common features with “known" source classes.
- The distance between the entropy and the threshold boundary $\rho$ is defined as $|H(p) - \rho|$, where $p$ is the classification output for a target sample. By maximizing this distance, we push $H(p)$ far from $\rho$. A confidence threshold $m$ applies the separation loss only to confident samples, i.e. those with $|H(p) - \rho| > m$. The final loss function:
![](https://i.imgur.com/Azatc9E.png)
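A hedged sketch of the ES loss as described above, with `rho` and `m` as hyperparameters; the default values below are placeholders, not the paper's settings.

```python
import torch

def entropy_separation_loss(p, rho=1.0, m=0.5):
    """Push the classifier-output entropy H(p) away from the boundary rho,
    but only for samples whose distance |H(p) - rho| already exceeds the
    confidence margin m. Values of rho and m here are placeholders."""
    entropy = -(p * torch.log(p + 1e-8)).sum(dim=1)    # H(p) per target sample
    dist = torch.abs(entropy - rho)
    # maximizing the distance == minimizing its negative; unconfident samples get zero loss
    loss = torch.where(dist > m, -dist, torch.zeros_like(dist))
    return loss.mean()
```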
### Training with Domain Specific Batch Normalization
- The batch normalization layer whitens the feature activations, which contributes to a performance gain.
- Simply splitting source and target samples into different mini-batches and forwarding them separately helps alignment.
- Final objective: $\mathcal{L} = \mathcal{L}_{cls} + \lambda(\mathcal{L}_{nc} + \mathcal{L}_{es})$
  - $\mathcal{L}_{cls}$ is the cross-entropy loss on source samples, and $\lambda$ weights the target-side losses.
- The losses on source and target are calculated on different mini-batches to achieve domain-specific batch normalization; see the training-step sketch below.
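A rough training-step sketch tying the pieces together, reusing the `neighborhood_clustering_loss` and `entropy_separation_loss` sketches above. `backbone`, `classifier`, `memory_bank`, `rho`, `m`, and `lam` are placeholder names; source and target are forwarded in separate mini-batches so batch-norm statistics stay domain-specific.

```python
import torch.nn.functional as F

def train_step(backbone, classifier, x_s, y_s, x_t, memory_bank,
               rho, m, lam, optimizer):
    """One optimization step with the full objective
    L = L_cls + lambda * (L_nc + L_es). All argument names are placeholders."""
    logits_s = classifier(backbone(x_s))                   # source-only mini-batch
    loss_cls = F.cross_entropy(logits_s, y_s)

    feat_t = backbone(x_t)                                 # target-only mini-batch
    p_t = F.softmax(classifier(feat_t), dim=1)
    loss_nc = neighborhood_clustering_loss(feat_t, memory_bank, classifier.weight)
    loss_es = entropy_separation_loss(p_t, rho, m)

    loss = loss_cls + lam * (loss_nc + loss_es)            # final objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```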