Detecting out-of-distribution (OOD) inputs is critical for safely deploying deep learning models in an open-world setting. However, existing OOD detection solutions can be brittle in the open world, facing various types of adversarial OOD inputs.
In this paper, we provide a theoretically motivated method, Adversarial Training with informative Outlier Mining (ATOM), which improves the robustness of OOD detection. We show that, by mining informative auxiliary OOD data, one can significantly improve OOD detection performance, and somewhat surprisingly, generalize to unseen adversarial attacks.
Out-of-distribution (OOD) detection has become an indispensable part of building reliable open-world machine learning models. An OOD detector determines whether an input comes from the same distribution as the training data or from a different distribution.
An OOD image (e.g., a mailbox) can be perturbed so that the OOD detector misclassifies it as in-distribution (e.g., traffic sign data). Failing to detect such an adversarial OOD example can be consequential in safety-critical applications.
The authors propose a novel training framework, Adversarial Training with informative Outlier Mining (ATOM). The key idea is to selectively utilize auxiliary outlier data for estimating a tight decision boundary between ID and OOD data, which leads to robust OOD detection performance.
While recent methods have leveraged auxiliary OOD data, the authors show that randomly selecting outlier samples for training yields a large portion of uninformative samples that do not meaningfully improve the decision boundary between ID and OOD data.
ATOM demonstrates that by mining outlier data with low OOD scores for training, one can significantly improve the robustness of an OOD detector and generalize to unseen adversarial attacks.
The authors extensively evaluate ATOM on common OOD detection benchmarks, as well as a suite of adversarial OOD tasks, as illustrated in Figure 1.
Lastly, the authors provide theoretical analysis for ATOM, characterizing how outlier mining can better shape the decision boundary of the OOD detector.
Fig.1. Robust out-of-distribution detection. When deploying an image classification system (OOD detector $G(\mathbf{x})$ + image classifier $f(\mathbf{x})$) in an open world, there can be multiple types of OOD examples. We consider a broad family of OOD inputs, including (a) Natural OOD, (b) $L_\infty$ OOD, (c) Corruption OOD, and (d) Compositional OOD. In (b-d), a perturbed OOD input (e.g., a perturbed mailbox image) can mislead the OOD detector to classify it as an in-distribution sample. This can trigger the downstream image classifier to predict it as one of the in-distribution classes (e.g., speed limit 70). Through adversarial training with informative outlier mining (ATOM), our method can robustify the decision boundary of the OOD detector $G(\mathbf{x})$, which leads to improved performance across all types of OOD inputs. Solid lines are actual computation flow.
Consider a training dataset $\mathcal{D}_{\text{in}}^{\text{train}}$ drawn i.i.d. from a data distribution $P_{\mathcal{X},\mathcal{Y}}$, where $\mathcal{X}$ is the sample space and $\mathcal{Y} = \{1, \dots, K\}$ is the set of labels. In addition, an auxiliary outlier dataset $\mathcal{D}_{\text{out}}^{\text{auxiliary}}$ is available, drawn from a distribution $U_{\mathcal{X}}$.
The goal is to learn a detector $G: \mathbf{x} \rightarrow \{-1, 1\}$, which outputs $1$ for an in-distribution example $\mathbf{x}$ and $-1$ for a clean or perturbed OOD example $\mathbf{x}$.
Let $\Omega(\mathbf{x})$ be a set of small perturbations on an OOD example $\mathbf{x}$. The detector is evaluated on $\mathbf{x}$ drawn from the in-distribution $P_{\mathcal{X}}$, and on the worst-case input inside $\Omega(\mathbf{x})$ for an OOD example $\mathbf{x}$ drawn from an OOD distribution $Q_{\mathcal{X}}$.
The false negative rate (FNR) and false positive rate (FPR) are defined as

$$\mathrm{FNR}(G) = \mathbb{E}_{\mathbf{x} \sim P_{\mathcal{X}}}\big[\mathbb{1}\{G(\mathbf{x}) = -1\}\big], \qquad \mathrm{FPR}(G; Q_{\mathcal{X}}, \Omega) = \mathbb{E}_{\mathbf{x} \sim Q_{\mathcal{X}}}\big[\mathbb{1}\{\exists\, \mathbf{x}' \in \Omega(\mathbf{x}) \text{ s.t. } G(\mathbf{x}') = 1\}\big].$$
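To make these definitions concrete, below is a minimal sketch of how the two error rates could be computed empirically from detector scores. The function name `empirical_fnr_fpr` and the convention that an input is rejected as OOD when its score is at or above the threshold are assumptions for illustration, not part of the paper.

```python
import numpy as np

def empirical_fnr_fpr(id_scores, worst_case_ood_scores, threshold):
    """Empirical FNR/FPR for a score-based detector G.

    Assumed convention: G(x) = -1 (reject as OOD) when the OOD score is
    >= threshold, and G(x) = 1 (accept as ID) otherwise.
    `worst_case_ood_scores` holds, for each OOD example, the lowest OOD
    score found over its perturbation set Omega(x), i.e. the attacker's
    best attempt to look in-distribution.
    """
    id_scores = np.asarray(id_scores)
    worst_case_ood_scores = np.asarray(worst_case_ood_scores)

    # FNR: fraction of in-distribution inputs wrongly rejected as OOD.
    fnr = np.mean(id_scores >= threshold)
    # FPR: fraction of (worst-case perturbed) OOD inputs accepted as ID.
    fpr = np.mean(worst_case_ood_scores < threshold)
    return float(fnr), float(fpr)
```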
The authors use the terminology outlier mining to denote the process of selecting informative outlier training samples from the pool of auxiliary outlier data. The outlier training data is sampled from a uniform distribution over the region outside the support of the in-distribution.
Fig.2. A toy example in 2D space for illustration of informative outlier mining. With informative outlier mining, we can tighten the decision boundary and build a robust OOD detector.
The classification involves a mixture of ID data and outlier samples. Consider a $(K+1)$-way classifier network $f$, where the $(K+1)$-th class label indicates the out-of-distribution class. Denote by $F(\mathbf{x})$ the softmax output of $f$ on $\mathbf{x}$. The robust training objective is given by

$$\min_{\theta} \; \mathbb{E}_{(\mathbf{x}, y) \sim \mathcal{D}_{\text{in}}^{\text{train}}}\big[\ell(\mathbf{x}, y; F_{\theta})\big] + \mathbb{E}_{\mathbf{x} \sim \mathcal{D}_{\text{out}}^{\text{train}}}\Big[\max_{\mathbf{x}' \in \Omega(\mathbf{x})} \ell(\mathbf{x}', K+1; F_{\theta})\Big],$$

where $\ell$ is the cross-entropy loss, and $\mathcal{D}_{\text{out}}^{\text{train}}$ is the OOD training dataset.
The authors use Projected Gradient Descent (PGD) to solve the inner max of the objective, applying it to half of each minibatch while keeping the other half clean to ensure performance on both clean and perturbed data.
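As a rough illustration of this training step, the sketch below runs PGD on half of the outlier minibatch toward maximizing the loss against the $(K+1)$-th class, keeps the other half clean, and combines both with the clean ID term of the objective. The attack radius, step size, number of steps, and the function names (`pgd_on_outliers`, `robust_training_step`) are illustrative assumptions, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def pgd_on_outliers(model, x, ood_class, eps=8/255, step_size=2/255, n_steps=5):
    """Approximately solve the inner max: perturb outliers within an
    L-infinity ball so the cross-entropy to the OOD class is maximized."""
    target = torch.full((x.size(0),), ood_class, dtype=torch.long, device=x.device)
    x_adv = x.clone().detach()
    for _ in range(n_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step_size * grad.sign()  # gradient ascent step
        x_adv = x + (x_adv - x).clamp(-eps, eps)          # project back to the ball
        x_adv = x_adv.clamp(0.0, 1.0)                     # stay a valid image
    return x_adv.detach()

def robust_training_step(model, optimizer, x_in, y_in, x_out, num_classes):
    """One optimization step: clean cross-entropy on ID data, plus cross-entropy
    to the (K+1)-th class on outliers, with PGD applied to half of the outlier
    minibatch and the other half kept clean."""
    ood_class = num_classes                      # 0-based index of class K+1
    half = x_out.size(0) // 2
    x_out_adv = pgd_on_outliers(model, x_out[:half], ood_class)
    x_out_mix = torch.cat([x_out_adv, x_out[half:]], dim=0)
    y_out = torch.full((x_out_mix.size(0),), ood_class, dtype=torch.long,
                       device=x_out.device)

    loss = F.cross_entropy(model(x_in), y_in) + F.cross_entropy(model(x_out_mix), y_out)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```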
Once trained, the OOD detector $G$ can be constructed as

$$G(\mathbf{x}) = \begin{cases} -1, & \text{if } F(\mathbf{x})_{K+1} \ge \gamma, \\ \phantom{-}1, & \text{if } F(\mathbf{x})_{K+1} < \gamma, \end{cases}$$

where $\gamma$ is the threshold; in practice, it can be chosen on the in-distribution data so that a high fraction of the test examples are correctly classified by $G$. Here, $F(\mathbf{x})_{K+1}$ is the OOD score of $\mathbf{x}$.
For an input $\mathbf{x}$ labeled as in-distribution by $G$, one can obtain its semantic label using

$$\hat{y}(\mathbf{x}) = \underset{y \in \{1, \dots, K\}}{\arg\max}\; F(\mathbf{x})_y.$$
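A sketch of how inference could look under this construction is given below. The last softmax output is treated as the OOD score, and `choose_threshold` picks $\gamma$ as a quantile of in-distribution scores; the 95% default and all function names are illustrative assumptions rather than the paper's released code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ood_score(model, x):
    """OOD score F(x)_{K+1}: softmax probability of the (K+1)-th class,
    assumed here to be the last output of the network."""
    return F.softmax(model(x), dim=1)[:, -1]

@torch.no_grad()
def choose_threshold(model, id_loader, device, keep_fraction=0.95):
    """Pick gamma on in-distribution data so that `keep_fraction` of the ID
    examples score below it (i.e. are accepted as in-distribution)."""
    scores = torch.cat([ood_score(model, x.to(device)) for x, _ in id_loader])
    return torch.quantile(scores, keep_fraction).item()

@torch.no_grad()
def detect_and_classify(model, x, gamma):
    """G(x) = -1 (OOD) if the score is >= gamma, else +1 (ID); ID inputs get
    their semantic label from the argmax over the first K classes."""
    probs = F.softmax(model(x), dim=1)
    is_ood = probs[:, -1] >= gamma
    labels = probs[:, :-1].argmax(dim=1)
    return is_ood, labels
```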
During each training epoch, the authors randomly sample $N$ data points from the auxiliary OOD dataset $\mathcal{D}_{\text{out}}^{\text{auxiliary}}$ and use the current model to infer their OOD scores. Next, they sort the data points in ascending order of OOD score and select a subset of $n$ data points, starting with the $qN$-th data point in the sorted list. The selected samples are then used as the OOD training data $\mathcal{D}_{\text{out}}^{\text{train}}$ for the next epoch of training.
Intuitively, $q$ determines the informativeness of the sampled points w.r.t. the OOD detector: the larger $q$ is, the less informative those sampled examples become.
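The selection step above can be summarized in a few lines. The sketch below assumes the auxiliary pool is a PyTorch dataset yielding (image, label) pairs; the function name `mine_informative_outliers` and the batching details are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset

@torch.no_grad()
def mine_informative_outliers(model, aux_pool, N, n, q, device, batch_size=256):
    """Randomly sample N candidates from the auxiliary outlier pool, score them
    with the current model, sort by OOD score in ascending order, and keep the
    n points starting at position floor(q * N)."""
    candidate_idx = torch.randperm(len(aux_pool))[:N]
    loader = DataLoader(Subset(aux_pool, candidate_idx.tolist()), batch_size=batch_size)

    scores = []
    for x, _ in loader:                            # aux_pool assumed to yield (x, label)
        probs = F.softmax(model(x.to(device)), dim=1)
        scores.append(probs[:, -1].cpu())          # OOD score = prob of class K+1
    scores = torch.cat(scores)

    # Low-score outliers look the most "in-distribution" and are therefore the
    # most informative for tightening the decision boundary.
    order = torch.argsort(scores)                  # ascending
    start = int(q * N)
    selected = candidate_idx[order[start:start + n]]
    return selected                                # indices into aux_pool for D_out^train
```

Selecting from the low-score end (small $q$) keeps the hardest outliers, while increasing $q$ shifts the selection toward outliers the model already rejects confidently.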