# Notes on "Universal Domain Adaptation through Self-Supervision" [[Paper]](https://arxiv.org/pdf/2002.07953.pdf)

###### tags: `notes` `unsupervised` `domain-adaptation`

Notes Author: [Rohit Lal](https://rohitlal.net/)

---

## Brief Outline

- Unsupervised domain adaptation methods traditionally assume that all source categories are present in the target domain. In practice, little may be known about the category overlap between the two domains.
- The authors propose a universally applicable domain adaptation framework that can handle arbitrary category shift, called Domain Adaptive Neighborhood Clustering via Entropy optimization (DANCE).

## Introduction

![](https://i.imgur.com/kPh0sd1.png)

- Combines two novel ideas:
    1. Since we cannot fully rely on source categories to learn features discriminative for the target, the authors propose a novel neighborhood clustering technique to learn the structure of the target domain in a self-supervised way.
    2. The authors use entropy-based feature alignment and rejection to align target features with the source, or reject them as "unknown" categories based on their entropy.
- Rather than relying only on the supervision of source categories to learn a discriminative representation, DANCE harnesses the cluster structure of the target domain using self-supervision.

## Methodology

![](https://i.imgur.com/xdsn5Ru.png)

- The task is universal domain adaptation: given a labeled source domain $D_s = \{(x_i^s,y_i^s)\}_{i=1}^{N_s}$ with "known" categories $L_s$ and an unlabeled target domain $D_t = \{(x_i^t)\}_{i=1}^{N_t}$ which contains all or some of the "known" categories and possibly "unknown" categories.
- The goal is to label each target sample with either one of the $L_s$ labels or the "unknown" label.

![](https://i.imgur.com/0g1MOsZ.png)

### Network Architecture

- Network adapted from [Semi-supervised Domain Adaptation via Minimax Entropy](https://arxiv.org/pdf/1904.06487.pdf)

![](https://i.imgur.com/7vsP5b6.png)

### Neighborhood Clustering (NC)

- The authors propose to minimize the entropy of each target point's similarity distribution to the other target samples and to the class prototypes. To minimize this entropy, a point moves closer to a nearby point (assuming a neighbor exists) or to a prototype.
- Similar classes come together. See Fig. 1.
- $p$ is the output of the network.

![](https://i.imgur.com/2wBFPgB.png)

### Entropy Separation loss (ES)

- The neighborhood clustering loss encourages the target samples to become well-clustered, but we still need to align some of them with the "known" source categories while keeping the "unknown" target samples far from the source.
- "Unknown" target samples are likely to have a larger entropy of the source classifier's output than "known" target samples, because "unknown" target samples do not share common features with the "known" source classes.
- The distance between the entropy and the threshold boundary $\rho$ is defined as $|H(p) - \rho|$, where $p$ is the classification output for a target sample. Maximizing this distance pushes $H(p)$ away from $\rho$. The confidence threshold $m$ restricts the separation loss to confident samples, i.e. samples whose $|H(p) - \rho|$ already exceeds $m$ (see the sketch after this section). The final loss function is:

![](https://i.imgur.com/Azatc9E.png)
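Below is a minimal PyTorch sketch of the two target-domain losses described above. It computes neighbors within a mini-batch (the paper maintains a memory bank of target features), and the temperature `temp` and thresholds `rho` / `m` are illustrative hyperparameters, not the exact values or implementation from the paper.

```python
import torch
import torch.nn.functional as F


def neighborhood_clustering_loss(feat_t, prototypes, temp=0.05):
    """Entropy of each target point's similarity distribution to the other
    target points in the batch and to the class prototypes (classifier weights)."""
    feat_t = F.normalize(feat_t, dim=1)            # (B, D) target features
    prototypes = F.normalize(prototypes, dim=1)    # (K, D) one prototype per source class
    # Similarities to all candidate "neighbors": other target features + prototypes.
    sim = feat_t @ torch.cat([feat_t, prototypes], dim=0).t() / temp   # (B, B + K)
    # Mask out self-similarity so a point cannot pick itself as its neighbor.
    B, K = feat_t.size(0), prototypes.size(0)
    self_mask = torch.eye(B, B + K, device=feat_t.device).bool()
    sim = sim.masked_fill(self_mask, float('-inf'))
    p = F.softmax(sim, dim=1)
    entropy = -(p * torch.log(p + 1e-8)).sum(dim=1)
    return entropy.mean()


def entropy_separation_loss(logits_t, rho, m=0.5):
    """Push H(p) away from the boundary rho, but only for confident samples,
    i.e. those whose entropy is already more than m away from rho."""
    p = F.softmax(logits_t, dim=1)
    H = -(p * torch.log(p + 1e-8)).sum(dim=1)      # per-sample entropy
    dist = (H - rho).abs()
    confident = dist > m
    if confident.any():
        return -dist[confident].mean()             # maximize the distance
    return logits_t.new_zeros(())
```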
### Training with Domain-Specific Batch Normalization

- The batch normalization layer whitens the feature activations, which contributes to a performance gain.
- Simply splitting source and target samples into different mini-batches and forwarding them separately helps alignment.
- Final objective: $\mathcal{L} = \mathcal{L}_{cls} + \lambda(\mathcal{L}_{nc} + \mathcal{L}_{es})$, where $\mathcal{L}_{cls}$ is the cross-entropy loss on source samples.
- The losses on source and target are computed on different mini-batches to achieve domain-specific batch normalization (see the training-step sketch below).
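Putting the pieces together, here is a minimal sketch of one training step. It assumes the two loss functions sketched earlier, a hypothetical `model` that returns `(features, logits)` and exposes its final linear layer as `model.classifier`, and illustrative values for `lam` ($\lambda$) and `rho`; the separate forward passes are what give domain-specific batch-norm statistics.

```python
import torch.nn.functional as F


def train_step(model, optimizer, x_s, y_s, x_t, rho, lam=0.05):
    optimizer.zero_grad()

    # Source mini-batch: plain cross-entropy (L_cls); BN sees only source statistics.
    _, logits_s = model(x_s)
    loss_cls = F.cross_entropy(logits_s, y_s)

    # Target mini-batch forwarded separately, so BN sees only target statistics.
    feat_t, logits_t = model(x_t)
    prototypes = model.classifier.weight          # assumed: weights of the last linear layer
    loss_nc = neighborhood_clustering_loss(feat_t, prototypes)
    loss_es = entropy_separation_loss(logits_t, rho)

    # Final objective: L = L_cls + lambda * (L_nc + L_es)
    loss = loss_cls + lam * (loss_nc + loss_es)
    loss.backward()
    optimizer.step()
    return loss.item()
```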