# Response to Reviewer kE4t (1)

We thank Reviewer kE4t for appreciating the relevance of our work. We now address the main points raised in the review.

> [...] to provide a faithful comparison to related work some experimental evidence on real world data might be beneficial:

**Prior works already provide experimental evidence on real-world data.** Our work draws on two directions of related work: 1) theoretical results that characterize the inductive bias of min-$\ell_2$-norm interpolators (for regression) and max-$\ell_2$-margin interpolators (for classification) in the overparameterized regime; and 2) the empirical observation of the phenomenon of robust overfitting [1]. We assume that Reviewer kE4t refers to the latter, but we would be happy to clarify the relationship to the former line of work as well if desired. Prior work shows experimentally that deep neural networks trained on large image datasets benefit from early stopping when evaluated with the adversarially robust risk [1] or with the worst-case risk among subpopulations [2,3]. A recent work [4] argues that overfitting of the adversarially robust risk may be due to noise in the training data. In our paper, we instead show that robust overfitting occurs even in settings where label noise is reduced to a minimum (see the experiments in Figure 1 for more details).

> While the results presented in this paper are useful and interesting I have some concerns about the form of adversarial perturbations considered. In particular, there is not sufficient motivation why the particular definition of adversarial perturbation is used. It seems as though considering perturbation orthogonal to the ground truth may primarily be well suited to the linear ground truth setting. It's not clear if this is a valid choice for other models or how to generalize such perturbation to different models especially one where the data is lower dimensional. It is also unclear why "adversarial" perturbations are consistent. That is a strong assumption.

**Consistent perturbations in practice and extending the definition to generic function classes.** We assume that the reviewer is questioning whether adversarial perturbations in practice (e.g., on images) are indeed consistent, and whether this assumption is reasonable for the theoretical analysis. We kindly ask Reviewer kE4t to correct our presumption if they are not satisfied with our response. Adversarial perturbations were originally defined in the context of image data to be *imperceptible* to the human eye [5], in the sense that they do not change the ground-truth label, or equivalently, that the ground-truth classifier perfectly classifies the perturbed inputs. This property of perturbations $\delta$ is what we call consistency (as has been done before in [6]), i.e., $f^\star(x+\delta) = f^\star(x)$ for a generic ground truth $f^\star$ and all inputs $x$. We note that this definition is general and applies to arbitrary function classes. In particular, for linear models it takes the form of the orthogonality condition $\langle \theta^\star, \delta \rangle = 0$ that we use in our derivations.
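To make this condition concrete, below is a minimal numerical sketch (our own illustration for this response, not code from the paper; the dimension, perturbation norm, and random seed are arbitrary choices). It constructs a perturbation orthogonal to $\theta^\star$ and checks that it leaves the ground-truth label unchanged, i.e., that it is consistent.

```python
# Illustrative sketch (not from the paper): for a linear ground truth
# f*(x) = sign(<theta*, x>), any perturbation orthogonal to theta* is
# consistent, i.e., it does not change the ground-truth label.
import numpy as np

rng = np.random.default_rng(0)
d = 50
theta_star = rng.standard_normal(d)
theta_star /= np.linalg.norm(theta_star)          # unit-norm ground truth

f_star = lambda z: np.sign(theta_star @ z)        # ground-truth classifier

x = rng.standard_normal(d)                        # an arbitrary input

# Draw a random direction and project out its component along theta*,
# so that <theta*, delta> = 0 (the orthogonality/consistency condition).
delta = rng.standard_normal(d)
delta -= (delta @ theta_star) * theta_star
delta *= 0.5 / np.linalg.norm(delta)              # arbitrary perturbation norm

assert np.isclose(theta_star @ delta, 0.0)        # orthogonal by construction
print("label unchanged:", f_star(x + delta) == f_star(x))   # True: consistent
```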
> On a different axis, it is not clear why training in the regression setting does not include adversarial examples but training in the classification setting does. What structural differences in the two problems imply the use of these different settings?

**Difference between adversarial training for regression and classification.** For linear regression, adversarial training either renders interpolating estimators infeasible, or requires oracle knowledge of the ground truth, which leaks too much information and allows perfect recovery of $\theta^\star$. In contrast, for linear classification, interpolation is easier to achieve: it only requires the sign of $\langle x_i, \theta \rangle$ to match the label $y_i$ for all $i$. In particular, when the data is sufficiently high-dimensional, it is possible to find an interpolator of the adversarially perturbed training set (see the numerical sketch after the reference list below). We further elaborate on the topic of consistent adversarial training in the general comments.

[1]: Overfitting in Adversarially Robust Deep Learning. Leslie Rice, Eric Wong, and Zico Kolter. ICML 2020. \
[2]: Distributionally Robust Neural Networks. Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, and Percy Liang. ICLR 2020. \
[3]: An Investigation of Why Overparameterization Exacerbates Spurious Correlations. Shiori Sagawa, Aditi Raghunathan, Pang Wei Koh, and Percy Liang. ICML 2020. \
[4]: How Benign is Benign Overfitting? Amartya Sanyal, Puneet K. Dokania, Varun Kanade, and Philip Torr. ICLR 2021. \
[5]: Explaining and Harnessing Adversarial Examples. Ian Goodfellow, Jonathon Shlens, and Christian Szegedy. ICLR 2015. \
[6]: Understanding and Mitigating the Tradeoff Between Robustness and Accuracy. Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John Duchi, and Percy Liang. ICML 2020.
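As referenced above, the following minimal sketch (again our own illustration, with arbitrary choices of $n$, $d$, the perturbation budget $\epsilon$, and the reference direction) shows that consistent perturbations leave the training set linearly separable, so an interpolator of the adversarially perturbed training set can be found by standard training; here we use a simple perceptron for concreteness.

```python
# Illustrative sketch (not from the paper): with consistent (orthogonal to
# theta*) perturbations, the perturbed training set remains linearly separable
# (theta* itself still separates it), so an interpolating linear classifier of
# the adversarially perturbed data exists and is found by a perceptron.
import numpy as np

rng = np.random.default_rng(1)
n, d, eps = 20, 200, 0.5                 # few samples, high dimension, budget

theta_star = np.zeros(d)                 # unit-norm ground truth e_1
theta_star[0] = 1.0

X = rng.standard_normal((n, d))
y = np.sign(X @ theta_star)

# Worst-case consistent perturbation of each training point against a reference
# direction (here: the min-l2-norm interpolator of the clean data), obtained by
# projecting the reference direction onto the orthogonal complement of theta*.
theta_ref = X.T @ np.linalg.solve(X @ X.T, y)
v = theta_ref - (theta_ref @ theta_star) * theta_star
v /= np.linalg.norm(v)
X_adv = X - eps * y[:, None] * v[None, :]   # <theta*, delta_i> = 0 for all i

# Perceptron on the perturbed set: converges because the data stay separable.
theta = np.zeros(d)
for _ in range(10_000):
    margins = y * (X_adv @ theta)
    i = np.argmin(margins)
    if margins[i] > 0:
        break
    theta += y[i] * X_adv[i]

print("interpolates perturbed training set:", bool(np.all(y * (X_adv @ theta) > 0)))
```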