Unsupervised domain adaptation methods traditionally assume that all source categories are present in the target domain. In practice, however, little may be known about the category overlap between the two domains.
The authors propose a universally applicable domain adaptation framework that can handle arbitrary category shift, called Domain Adaptative Neighborhood Clustering via Entropy optimization (DANCE).
Since we cannot fully rely on source categories to learn features discriminative for the target, the authors propose a novel neighborhood clustering technique to learn the structure of the target domain in a self-supervised way.
The authors use entropy-based feature alignment and rejection to either align target features with the source or reject them as "unknown" categories, based on their entropy.
Rather than relying only on the supervision of source categories to learn a discriminative representation, DANCE harnesses the cluster structure of the target domain using self-supervision.
The task is universal domain adaptation: given a labeled source domain with "known" categories and an unlabeled target domain that contains all or some of the "known" categories plus possibly some "unknown" categories.
The goal is to label each target sample with either one of the L_s source labels or the "unknown" label.
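A minimal sketch of this labeling rule, assuming an entropy-based rejection with boundary rho = log(K)/2 (half the maximum entropy for K source classes); the function and variable names here are illustrative, not from the paper:

```python
import math
import torch
import torch.nn.functional as F

def label_target(logits: torch.Tensor, num_source_classes: int) -> torch.Tensor:
    """Label each target sample with a source class, or -1 for "unknown".

    Rejection is entropy-based: samples whose prediction entropy exceeds the
    boundary rho are marked "unknown". rho = log(K)/2, the midpoint of the
    entropy range [0, log(K)], is an assumed choice.
    """
    p = F.softmax(logits, dim=1)
    entropy = -(p * torch.log(p + 1e-8)).sum(dim=1)  # H(p) per sample
    rho = math.log(num_source_classes) / 2
    preds = p.argmax(dim=1)
    preds[entropy > rho] = -1                        # -1 marks "unknown"
    return preds
```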
The authors propose to minimize the entropy of each target point's similarity distribution over the other target samples and the class prototypes. To minimize this entropy, each point moves closer either to a nearby point (assuming a neighbor exists) or to a prototype.
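A sketch of this neighborhood clustering loss in PyTorch; the memory bank of stored target features, the prototypes, and the temperature tau = 0.05 are simplified stand-ins for the paper's exact implementation (e.g., self-similarity handling and memory-bank updates are omitted):

```python
import torch
import torch.nn.functional as F

def neighborhood_clustering_loss(feats, memory_bank, prototypes, tau=0.05):
    """Entropy of each target feature's similarity distribution over stored
    target features (memory bank) and class prototypes.

    Minimizing this entropy pulls each point toward a nearby point or a
    prototype, encouraging well-clustered target features.
    """
    feats = F.normalize(feats, dim=1)                    # current target batch
    bank = F.normalize(torch.cat([memory_bank, prototypes], dim=0), dim=1)
    sims = feats @ bank.t() / tau                        # scaled similarities
    p = F.softmax(sims, dim=1)                           # similarity distribution
    return -(p * torch.log(p + 1e-8)).sum(dim=1).mean()  # mean entropy
```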
The neighborhood clustering loss encourages the target samples to become well-clustered, but we still need to align some of them with “known” source categories while keeping the “unknown” target samples far from the source.
“unknown" target samples are likely to have a larger entropy of the source classifier’s output than “known” target samples. This is because “unknown" target samples do not share common features with “known" source classes.
The distance between the entropy and the threshold boundary ρ is defined as |H(p) − ρ|, where p is the classification output for a target sample and H(p) is its entropy. By maximizing this distance, we push H(p) far from ρ. The confidence threshold m means the separation loss is applied only to confident samples: L_es(p) = −|H(p) − ρ| if |H(p) − ρ| > m, and 0 otherwise. Final loss function: L = L_cls + λ(L_nc + L_es), i.e., the source classification loss plus the weighted neighborhood clustering and entropy separation losses.
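A sketch of the entropy separation loss under these definitions; m = 0.5 is an illustrative default, not necessarily the paper's value:

```python
import torch
import torch.nn.functional as F

def entropy_separation_loss(logits, rho, m=0.5):
    """Push H(p) away from the boundary rho, applied only to confident
    samples with |H(p) - rho| > m. Maximizing the distance is written as
    minimizing its negative.
    """
    p = F.softmax(logits, dim=1)
    ent = -(p * torch.log(p + 1e-8)).sum(dim=1)  # H(p) per target sample
    dist = (ent - rho).abs()                     # |H(p) - rho|
    mask = dist > m                              # confidence threshold m
    return -dist[mask].mean() if mask.any() else logits.new_zeros(())
```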