# A message to the AC
We want to bring the following issues to your attention:
1. **Incorrect claims about Prop. 1.** `Reviewer xfGh` claims (1) that Prop. 1 is incorrect, without providing any mathematical justification, and (2) that Prop. 1 does not hold for nonconcave problems, which is false. They also criticize this result for being "straightforward" and "not surprising;" that our results are clear and interpretable is a strength, not a weakness.
2. **Impossible guarantees requested.** `Reviewer xfGh` has indicated that our paper is incomplete without a proof that our framework is *guaranteed* to find an adversarial example. Such a guarantee has never appeared in the literature because it cannot hold in general. Relatedly, `Reviewer xfGh` has indicated that they do not understand the difference between an optimization problem and global optimality guarantees of first-order methods. We have included ample evidence in our response, including new plots and examples, but the reviewer has ignored this evidence and refuses to acknowledge it (see [our response](https://openreview.net/forum?id=naS22RQnFv&noteId=nq8mvcVmTK)).
3. **Unrealistic expectation for a conference paper.** `Reviewer iTuT`'s only remaining criticism is that we do not extend our work beyond supervised classification (to, e.g., segmentation or regression). This is an unrealistic standard for a conference paper; we are not aware of *any* paper which proposes a new AT algorithm for classification and, in the same paper, extends this algorithm to another setting (see [our response](https://openreview.net/forum?id=naS22RQnFv&noteId=yGzmAFq4bn)).
4. **Unjustified experiments requested.** `Reviewer iTuT` has ignored our theoretical evidence and vaguely asks for more experiments without justifying (a) what experiments they would like to see and (b) why our paper would be improved by adding them. We have provided experiments on standard benchmarks which directly support the claims we make in our paper. All of the reviewers highlighted the *novelty* of our work; rejecting a novel paper because it could add additional ablations is a harmful standard to set for the community.
Overall, we feel strongly that we have addressed all of the concerns highlighted by the reviewers. The reviewers who gave `Rating=4` acknowledged that our edits and new evidence improve our paper (`Reviewer xfGh`: ["Thanks for the additional responses to my follow-up questions. They alleviate some of my concerns;"](https://openreview.net/forum?id=naS22RQnFv&noteId=dym8zU642t) `Reviewer iTuT`: ["the authors have addressed some concerns such as the Speed-performance trade-off"](https://openreview.net/forum?id=naS22RQnFv&noteId=cDSQBLdIZk)). However, despite their view that our paper has improved post-rebuttal, they refuse to raise their scores and have offered no strong justification for that refusal. In general, we feel that any reader of this forum would agree that `Reviewer xfGh` and `Reviewer iTuT` are looking for arbitrary and inconsequential reasons to reject our work; this is not in the spirit of fair and unbiased reviewing.
# xfGh part 2
We'll go through your second comment step by step.
> Proposition 1 considers $\eta^\star$ and $j^\star$, defined as the optimal solutions of BETA, then draws conclusions. For nonconvex $M_\theta$, the optimization problem in Eq. 13 cannot be exactly solved.
We agree. This is not a strength or weakness of any AT formulation. That is, when optimizing over DNNs, the following problems can't be solved to global optimality:
* **The 0-1 problem in (11)**, since it is discontinuous and has zero gradient almost everywhere.
* **The surrogate problem in (7)** due to non-concavity.
* **The margin problem in (13)** due to non-concavity.
Note that (7) and (13) can be solved to local optimality using first-order methods, whereas (11) cannot be solved via first-order methods at all. The hardness of non-concave maximization is well known in the AT literature, and for this reason, formal guarantees of global optimality for the inner maximization problem **have never appeared in the literature**. The following table covers the most common AT algorithms:
| Ref. | Conference | Citations | Proves global opt. guarantees? | extra info |
|---|---|---|---|---|
| [PGD](https://openreview.net/forum?id=rJzIBfZAb) | ICLR'18 | 9251 | **NO** | -|
| [TRADES](https://proceedings.mlr.press/v97/zhang19p.html) | ICML'19 (Long Talk) | 1893 | **NO** | Champion of NeurIPS 2018 Adv. Vision Challenge / In RobustBench, all top-10 methods use TRADES ([source](https://hongyanz.github.io))|
| [MART](https://openreview.net/forum?id=rklOg6EFwS) | ICLR 2020 | 492 | **NO** | - |
| [AWP](https://arxiv.org/abs/2004.05884) | NeurIPS 2020 | 446 | **NO** | - |
| [RST](https://papers.nips.cc/paper_files/paper/2019/hash/32e0bd1497aa43e02a42f47d9d6515ad-Abstract.html) | NeurIPS 2019 | 600 | **NO** | - |
We want to convey the following: *the lack of convergence guarantees for AT algorithms has not prevented the community from finding them useful and impactful.*
> when you apply Alg. 1 there is no guarantee that the perturbation returned induces misclassification.
We kindly ask the reviewer to reread our claims: **We never claimed that our algorithm was *guaranteed* to produce an adversarial example**, because **this kind of guarantee will never hold for any algorithm** (there are instances where adv. examples do not exist). To clarify, could you please signal whether you agree or disagree with the following?
1. There **does not currently exist** an algorithm that **guarantees** the return of an adversarial example (one that induces misclassification).
2. Even if one could obtain a **global maximizer** of (11), (7), or (13), one would still **not** be guaranteed to find an adversarial example.
Both of these statements are true, and they reflect that this part of **your concern does not apply to our work or to the broader field of adv. robustness**.
> Does the conclusion of Proposition 1 generalize to $(\eta, j)$ that are approximated solutions to Equation 13?
Prop. 1 only makes a statement about the optima of the margin problem in (13), namely, that they are also optimal for the 0-1 problem in (11). In general, there is no guarantee that an arbitrary feasible point of (13) will be optimal for (11).
> you argue BETA defines the correct objective (and PGD is flawed), I expect a clarification on how Prop. 1 supports BETA as the correct formulation. This is important from the theoretical side
As we show in the counterexample given in our rebuttal, there are cases where **adv. examples exist, but approximate (or even globally optimal) solutions of the surrogate problem in (7) obtained by PGD *do not* find any adv. example, whereas solving (13) does**. This is the "fundamental flaw" in maximizing the cross-entropy. Our formulation (13) does not suffer from this issue; i.e., if there exists an adv. example, our formulation will *always* prefer it over a non-adversarial one. We decouple our argument into three steps:
**Step 0: Maximizing the 0-1 loss in (11) is *correct but intractable*:** it is discontinuous and has zero gradient almost everywhere.
**Step 1: Maximizing the cross-entropy is *incorrect but tractable*.** The cross-entropy can be approximately optimized (e.g., via PGD). However, in Sec. 3.1 we show that "(7) can fail to produce an adv. example even if one exists, and even at optimality" (lines 136-137). In our rebuttal, we provide an example where maximizing the cross-entropy **does not find** an adv. example even though one exists, which is **the "fundamental flaw"**. The conclusion is this: **one should maximize (13) rather than an upper bound on the 0-1 loss (like the cross-entropy) to avoid these pitfalls.**
**Step 2: BETA is *correct and tractable*.** Prop. 1 shows that instead of solving (11), one can solve the BETA problem in (13), which is smooth enough to enable first-order methods, meaning it can be (approximately) optimized efficiently.
Please indicate which sentences in this argument you disagree with (if there are any).
---
> "You argue BETA-AT eliminates robust overfitting (RO), I am expecting sufficient theoretical or empirical evidence. I do not understand how the provided results prove the elimination of RO in BETA-AT."
Again, we kindly ask the reviewer to reread our introduction, which states that
> "Our empirical evaluations reveal our approach...eliminates the problem of RO"
This is **empirically** shown, not theoretically proven; to be clear, we do not claim a *theoretical* elimination of RO. This is repeated numerous times (lines 233-235, 266-267, and 298-300). RO is an empirically defined phenomenon (see [Wong et al., 2020]), and correspondingly our result is an *empirical* elimination of robust overfitting.
If you would like us to use language that clearly emphasizes that this result is empirical, we will do so. But we kindly ask the reviewer to evaluate our paper based on the claims we actually make.
> I am not convinced that the empirical results based on a single dataset/architecture are sufficient to claim elimination of RO. Why you think more experiments are not necessary?"
Please see [our answer](https://openreview.net/forum?id=naS22RQnFv&noteId=kyHqEV7tIk) to a similar question by Reviewer iTuT.
---
# answer to iTuT message
Based on your message, we have addressed `Concern (1): Speed-performance trade-off` and have answered `Question (a): Our focus on classification/0-1 loss`. Given that you indicated that this would improve our paper in your original review, we ask that you consider raising your score. Below, we answer the remainder of your questions.
---
> the authors have addressed some concerns, such as the speed-performance trade-off
This was one of the two concerns raised in your original review. Given that we have "addressed" this, we urge you to consider raising your score.
> "...it still needs more evaluation on larger models and other datasets (not to mention different parameter setting such as the perturbation range, constrains of norm)."
In your review, you emphasized:
* Our **"novel formulation and algorithm"**
* That our approach constitutes a **"meaningful research direction"**
* Our strong **"theoretical support"**
* That our algorithm achieves **"comparable performance to the SOTA"**
The vague suggestion that we should add more experiments -- particularly without indicating (a) why this would improve our paper and (b) what experiments/datasets/models you have in mind -- does not change any of the four strengths listed above. Based on your comment and review, **a novel formulation and algorithm, a meaningful direction, theoretical evidence, and SOTA experiments** are only enough for a score of ``Rating: 4: Borderline reject``. And so we ask: what are your criteria for acceptance?
Furthermore, here are three recent papers which (1) propose adversarial training/evaluation algorithms, (2) are **published** in NeurIPS/ICLR/ICML, and (3) have **experimental evaluations very similar to ours**:
* **TRADES (ICML '19):** Experiments only on MNIST and CIFAR-10. Only considers $\epsilon=8/255$ for $\ell_\infty$ perturbations with the ResNet architecture for CIFAR.
* **MART (ICLR '20):** Experiments only on CIFAR-10. Only considers $\epsilon=8/255$ for $\ell_\infty$ perturbations with the ResNet architecture for CIFAR.
* **DALE (NeurIPS '21):** Experiments only on MNIST and CIFAR-10. Only considers $\epsilon=8/255$ for $\ell_\infty$ perturbations with the ResNet architecture for CIFAR.
Unlike our work, none of these papers compare to RobustBench or provide speed-complexity trade-offs. This evidence conclusively shows that we have provided sufficient experimental evidence to merit acceptance.
> PGD and other attack methods have been used to test the robustness in different fields.
We do not understand this comment. What do you mean by "different fields?" Note that we use PGD and several other methods to evaluate robustness in Tables 1 and 2 of our paper.
> Although the classification problem is commonly used to evaluate the adversarial training/evaluation algorithm, I still concern that whether the proposed AT algorithm could be used as potential alternative to PGD in other applications.
We are absolutely sure that this is **beyond the scope of our paper** and **should not be a criterion** for evaluating our paper. Consider that **nearly every paper which has proposed an algorithm for adversarial examples in the last ten years has focused on supervised classification**, e.g., `FGSM (ICLR '15)`, `DeepFool (CVPR '16)`, `PGD (ICLR '18)`, `Convex Adv. Polytope (ICML '18)`, `TRADES (ICML '19)`, `Free AT (NeurIPS '19)`, `YOPO (NeurIPS '19)`, `MART (ICLR '20)`, `Fast AT (ICLR '20)`, and many others.
Furthermore:
* Adversarial examples were discovered (see [A, B]) in the context of **supervised classification**, as were the most prominent algorithms: FGSM, PGD, and TRADES.
* The standard leaderboard [RobustBench](https://robustbench.github.io) only considers supervised classification.
* The problem of robust overfitting [C], which we study in this paper, was introduced in and is **only known in the context of supervised classification**.
We ask you the following:
> Can you find any published/well-known paper that (a) proposes an algorithm for adversarial examples, (b) evaluates on supervised classification, and (c) considers other problems (e.g., regression)?
Finally, we remind the reviewer of our listed contributions in lines 55-66, which only concern supervised classification. **Adding additional settings would go far beyond any reasonable expectation for an 8-page conference paper**. Please evaluate our paper based on the claims we make, and not on claims (e.g., about other problems) that we do not make.
**References in this message**
[A] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. "Intriguing properties of neural networks." 2014.
[B] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. "Evasion Attacks against Machine Learning at Test Time." 2013.
[C] Leslie Rice, Eric Wong, and J. Zico Kolter. "Overfitting in adversarially robust deep learning." ICML 2020.
# answer to xfGh message
Thanks for your reply! We have carefully considered your comments.
---
> "I am not able to find this new subsection. Can you point out its location or directly reply below in response to this comment?"
As far as we can tell, it is not possible to upload a new PDF, so we have pasted our new subsection below:
> **Sample efficiency.** One potential limitation of the BETA-AT algorithm introduced in Algorithm 2 is its sample efficiency: BETA computes one adversarial perturbation per class, but only one of these perturbations is chosen for the upper level of the bilevel formulation (20). In this way, one could argue that there is wasted computational effort in discarding perturbations that achieve high values of the negative margin (12). This potential shortcoming is a byproduct of the non-smoothness of the max operator in (22). In practice, it manifests in the fact that BETA-AT must compute $K$ candidate adversarial perturbations per instance, whereas traditional AT computes only one such perturbation per instance. As we show in Section 5, one can view this increased computational cost as the price of (a) ameliorating robust overfitting and (b) eliminating the need for heuristics to accurately evaluate test-time adversarial robustness. Furthermore, in Appendix C, we show that on average, BETA is 5.11 times faster than AutoAttack without any loss of performance on the top entries on RobustBench; see Figure 2 for details. In this appendix, we also provide a transparent analysis of the trade-off between training time and robust accuracy in Figure 3, showing that BETA-AT's mitigation of robust overfitting can come at the cost of slightly longer epochs.
The Figures 2 and 3 referenced in this paragraph are shown in our rebuttal PDF.
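To make the selection step described in the quoted subsection concrete, here is a minimal, hypothetical PyTorch-style sketch of BETA's per-instance inner loop as described above: one margin-maximizing perturbation is computed per target class, and only the candidate achieving the largest negative margin is passed to the upper level. The model interface, step size, number of steps, sign-gradient ascent, and $\ell_\infty$ projection are illustrative assumptions, not our actual implementation.

```python
import torch

def beta_inner_loop(model, x, y, num_classes, eps, steps=10, step_size=0.01):
    """Sketch: one candidate perturbation per target class; only the best one is returned.

    Assumes `model` maps a single (unbatched) input to a vector of `num_classes` logits,
    and takes the negative margin for class j to be logits[j] - logits[y]."""
    best_margin, best_eta = -float("inf"), None
    for j in range(num_classes):
        if j == y:
            continue
        eta = torch.zeros_like(x, requires_grad=True)
        for _ in range(steps):
            logits = model(x + eta)
            margin = logits[j] - logits[y]             # negative margin for target class j
            (grad,) = torch.autograd.grad(margin, eta)
            with torch.no_grad():
                eta += step_size * grad.sign()          # ascent step (sign step is an assumption)
                eta.clamp_(-eps, eps)                   # projection onto an l_inf ball (assumption)
        with torch.no_grad():
            logits = model(x + eta)
            final_margin = (logits[j] - logits[y]).item()
        if final_margin > best_margin:
            best_margin, best_eta = final_margin, eta.detach()
    return best_eta, best_margin
```

In BETA-AT, only the single returned perturbation would then be used in the upper-level update of $\theta$, which is exactly the sample-efficiency consideration discussed in the subsection above.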
---
> "Thanks for comparing the running time between BETA and AutoAttack. These comparisons on running time are helpful and could be appropriately incorporated in Table 2 of the original submission. Since APGD-T is also presented as a baseline in Table 2, I am wondering about the running time of APGD-T compared with BETA."
In line with your suggestion, we added APGD-T to Figure 2 and found that while APGD-T is faster than AutoAttack, **BETA is $>4.5$ times faster than APGD-T** (unfortunately, OpenReview does not allow us to re-upload the PDF with the updated figure). We hope you agree that this represents strong evidence in favor of the scalability of our approach for adversarial evaluation. And as this addresses one of your main concerns, we hope you'll consider increasing your score.
---
> "If I understand correctly, Proposition 1 is the central theoretical result of your work. For nonconvex optimization, the assumption that one can solve the inner maximization exactly, assumed in Proposition 1, does not hold, so it is possible for the equivalence claimed in Proposition 1 not to hold for nonconcave inner optimization with neural networks."
We think that there has been a misunderstanding, because your characterization of our assumptions is not correct: **there is no assumption that the problem in (11) can or cannot be solved in practice, regardless of concavity** (see the proof in Appendix A), and the proposition does not identify either problem as being "better" than the other (as mentioned in your review). In plain terms, Prop. 1 shows that the optima of the negative-margin maximization problem defined in (13) are also optimal for (11), which captures the problem of finding adversarial examples for the 0-1 loss. This has nothing to do with whether one can find these optima in practice. One implication is the following: since the objective of (13) is differentiable (almost everywhere), one can readily solve (13) using gradient ascent, whereas the original problem in (11) cannot be attacked with gradient-based methods because its objective is discontinuous.
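To illustrate this last implication concretely, here is a minimal sketch in which a random linear map stands in for the network $M_\theta$ and the negative margin for a hypothetical target class $j$ is taken to be the difference of two logits (all values and dimensions are illustrative assumptions). It shows that the 0-1 objective in (11) carries no gradient signal, whereas the margin objective in (13) does:

```python
import torch

torch.manual_seed(0)
K, d, y = 10, 32, 3                      # classes, input dimension, true label (placeholders)
W = torch.randn(K, d)                    # random linear "model" standing in for M_theta
x = torch.randn(d)

eta = torch.zeros(d, requires_grad=True)
logits = W @ (x + eta)

# 0-1 objective, as in (11): piecewise constant in eta, so it carries no gradient information.
zero_one = (logits.argmax() != y).float()       # no grad_fn; the gradient is zero a.e.

# Negative-margin objective for a target class j, as in (13): differentiable almost everywhere.
j = 5                                            # hypothetical target class
margin = logits[j] - logits[y]
margin.backward()
print(zero_one.item(), eta.grad.norm().item())   # the margin gradient is informative
```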
---
> "Please correct me if I am wrong. Theoretically, BETA and standard adversarial Training generate approximated solutions for nonconcave inner maximization problems (Equation 11)."
This is not correct. **Our algorithm finds approximate solutions to the inner problem in (11)**, which aligns exactly with the goal of finding adversarial examples. **Standard AT (e.g., PGD) finds approximate solutions to a different problem** -- specifically, the surrogate problem in (7) -- which, as we show in Section 3.1, may yield a very poor solution for (11) and is therefore quite misaligned with the goal of finding adversarial examples; see the discussion under the "Limitation I: Weak attackers" sub-heading.
---
> "Then why does the optimization problem of BETA (Equations 17, 18, 19) define a better formulation (for adversarially robust learning) than standard AT?"
Building on the response above, answering this question is the central purpose of our paper. Our argument is as follows:
1. **Theoretical limitations.** Maximizing an upper bound of the 0-1 loss provides no guarantees of finding adversarial examples (see Limitation 1, eq. 9), and therefore standard AT is not guaranteed to improve robustness (see Limitation 2).
2. **A counterexample.** Furthermore, as we prove in the example in our rebuttal, there exist instances where maximizing the cross-entropy loss yields a solution which does not result in an adversarial example and is fundamentally misaligned with the maximization of the 0-1 loss.
3. **Theoretical evidence.** Thus, maximizing a surrogate is *inconsistent* with the adversarial examples problem. However, maximizing the correct objective -- the 0-1 loss -- is intractable, since it is discontinuous and has zero gradient almost everywhere. Prop. 1 shows that this discontinuous 0-1 problem can be replaced with an almost-everywhere differentiable objective which has the same optima.
4. **SOTA test-time robustness.** As shown in Table 2, PGD attacks (which maximize the cross-entropy) overestimate adversarial robustness. On the other hand, BETA provides a SOTA estimate in practice while being 5.11 times faster than the best known algorithm.
5. **Elimination of robust overfitting.** We show that the performance of the last-iterate (which is more representative of the optimization formulation) obtained with BETA-AT eliminates robust overfitting on CIFAR-10 (the most common benchmark for adversarial robustness).
It would be helpful if you could clarify whether you agree or disagree with this argument. And if you disagree, could you be specific about which piece(s) you do not agree with?
---
> "For this, I can see two options: either you can theoretically prove that your method is always better than the standard formulation of adversarial training (e.g., eliminating robust overfitting), or you give sufficient and convincing empirical evidence showing that your method is better. However, I do not see supporting solid evidence in your paper and response from either option."
We have provided evidence that satisfies both of your criteria:
* **Theoretical.** In the example in our previous response, we proved that there are cases in which maximizing the cross-entropy loss is provably worse than maximizing the objective we propose. We feel that this satisfies your first criterion. Do you agree?
* **Empirical.** We give convincing empirical evidence in Table 1 and Figures 1 & 3 that our algorithm eliminates robust overfitting. We feel that this satisfies your second criterion. Do you agree?
# Reviewer xfGh (score 4)
Thank you for your review of our work! We are pleased that you think our paper is **"well-structured and easy to follow,"** that it focuses on an **"important research topic,"** and that our technical and algorithmic analysis is **"well-explained"**. We are also glad that you think our proposed algorithm **"mitigates robust overfitting"** and performs similarly to AutoAttack, constituting a **promising** approach to adversarial attacks and training.
**Summary of our response:** We understand that you feel that three main ingredients are missing from our paper: (1) a clarification of the novelty of the technical aspects of our approach; (2) a discussion of the computational complexity of our proposed algorithm; (3) an expanded experiments section. Our response focuses on these items, and in our view, we resolve your concerns. Here is a summary of the changes we have made:
* **Novelty/main claims: A new example.** We added a new example illustrating why maximizing the cross-entropy loss is "fundamentally flawed;" see response A4 in the next post.
* **Computational complexity: new section and plot.** At the end of Section 4, we added a new subsection dedicated to the computational complexity of our method relative to other adversarial training and evaluation methods. We also added a new plot (see our PDF response) showing that on average, **BETA is 5.11 times faster than AutoAttack** on the top entries on RobustBench while matching AutoAttack's scores.
* **Proposed experiments: some provided, some beyond the scope.** While you propose several interesting directions for future work (e.g., providing convergence guarantees, using additional datasets, scaling to new architectures, and combining BETA with PGD), we clarify why these ideas are beyond the scope of our paper. In writing this paper, our goal is to convince the reader that our non-zero-sum formulation for AT is the correct formulation. We support this idea with sufficient theoretical and empirical evidence.
Please have a look through our detailed answers in the next post to each of the points you raised in your review. We feel strongly that these responses resolve most, if not all, of your concerns. And if this is the case, we hope that you'll consider raising your score to "accept." Thanks!
---
> Q1. [...] clarify the technical contributions [...] using a margin-based loss for finding adversarial examples is not new to the field.
A1. We kindly ask the reviewer to re-read lines 268-281, which clearly show that **we agree with you!** Using a margin-based loss for **finding** adversarial examples is **not new** (as we note in lines 277-281 with reference to [Gowal et al., 2019]), but using the multitargeted approach for **training**, which leads to our bilevel formulation, **is new and constitutes a contribution to the literature**. Furthermore, we emphasize that **we did not claim to be the first to use margin-based losses**, meaning that there is no need to "clarify our technical contributions;" see our listed contributions (lines 54-66).
> Q2. Proposition 1 shows that defining the worst-case perturbation based on margin-based loss is a better alternative if we can solve the non-concave inner maximization problem. This is not surprising, and its proof looks straightforward.
A2. **We completely agree with you**, and our agreement on this fact is a central pillar of our paper; our result and proof are straightforward and unsurprising. What is surprising is that for the past decade, researchers have kept using surrogate losses (like the cross-entropy) during the **training phase**, when in fact Proposition 1 convincingly shows that a better approach is to use the multi-targeted margin-based attack. Our goal is not to convince you that Proposition 1 (note that we do not call it a "theorem") is a surprising result (see footnote 2 on page 5); rather, our goal in stating this result is to give theoretical support to our claim that our non-zero-sum (bilevel) formulation in (17)-(19) is the correct formulation for AT.
> Q3. The assumption that we can solve the inner maximization problem exactly is too strong to be held in practice.
A3. This is true, **BUT** our goal is to find a correct optimization formulation for AT, not to provide formal guarantees of convergence. Note that traditional AT also lacks any convergence guarantees, yet this does not diminish its merits. Such guarantees are arguably a separate direction of research, and while there have been some advances on that front, it is out of scope for our paper.
> Q4. The claim that "using surrogate loss is fundamentally flawed" is a bit exaggerated, so the authors should consider toning down such claims.
A4. Based on your review, you agree with the following: (a) our approach eliminates robust overfitting and (b) our algorithm is a "straightforward modification of adversarial training" in the sense that the only modification we make is to substitute a margin-based loss for a surrogate loss in the attacker's objective. Based on your agreement with (a) and (b), it follows logically that the flaw in adversarial training that leads to robust overfitting is the use of a surrogate loss in the attacker's objective. (Note that Reviewer gJU3 agreed with our evidence in calling the use of a surrogate loss an "overlooked limitation," and with our conclusion in saying that this insight leads to "new understanding" of the problem and "[contributes] to the existing literature.") Therefore, from our perspective, the above quote from your review indicates that you agree with our evidence, but not with our conclusion that surrogate losses are a fundamental flaw in adversarial training. Does this argument make sense, or is there any way that you think we could write this more clearly?
Here is one way that we could make this argument clearer, by means of an example which shows that maximizing the cross-entropy loss is fundamentally flawed. Consider two possible vectors of predicted probabilities (i.e., softmax outputs):
- ``Option A``: $(1/K + \epsilon, 1/K - \epsilon, \ldots, 1/K)$
- ``Option B``: $(0.5 - \epsilon, 0.5 + \epsilon, 0, 0, \ldots, 0)$
where $\epsilon$ is a small positive number, and assume the first class is the correct one, so we compute the cross-entropy with respect to the vector $(1, 0, \ldots, 0)$.
For ``Option A``, the cross-entropy is $-\log(1/K + \epsilon)$, which tends to **infinity** as $K\to \infty$ and $\epsilon \to 0$. For ``Option B``, the cross-entropy is $-\log(0.5-\epsilon) \approx \log 2$, a small constant. Hence, if one maximizes the cross-entropy, one will always choose ``Option A``. However, ``Option A`` **never** corresponds to an adversarial example, since the predicted class is the correct one. In contrast, ``Option B`` is always an adversarial example (the predicted class is the second, incorrect one), yet it has low (i.e., constant) cross-entropy. This illustrates why we claim that naively maximizing the cross-entropy is a flawed approach. We will add this example to the final version of the paper to clarify our argument.
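For concreteness, here is a small numeric check of this example with the hypothetical values $K = 10$ and $\epsilon = 0.01$ (chosen purely for illustration):

```python
import math

K, eps = 10, 0.01
correct = 0                                   # index of the correct class

# Option A: near-uniform probabilities with the correct class slightly ahead.
option_a = [1 / K + eps, 1 / K - eps] + [1 / K] * (K - 2)
# Option B: an incorrect class slightly ahead of the correct one.
option_b = [0.5 - eps, 0.5 + eps] + [0.0] * (K - 2)

for name, probs in [("A", option_a), ("B", option_b)]:
    cross_entropy = -math.log(probs[correct])
    is_adversarial = max(range(K), key=probs.__getitem__) != correct
    print(f"Option {name}: CE = {cross_entropy:.2f}, adversarial = {is_adversarial}")

# Option A attains the larger cross-entropy yet is never adversarial;
# Option B is adversarial despite its small, roughly constant cross-entropy.
```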
> Q5. One limitation is the additional computational overhead required to find the largest negative margin for each target class.
A5. Our goal is not to devise an efficient algorithm per se, but to determine the correct optimization formulation for adversarial training, which we believe (a) we have accomplished in this work and (b) represents a significant contribution to the field. Given the correct formulation, one can then derive improved algorithms for faster optimization, but this is left for future work. Note that this is normal in the field, as the original versions of AT were slow and later work focused on speeding up these algorithms (see, e.g., [Shafahi et al., 2019], [Wong et al., 2020]).
> Q6. For datasets with more classes I can imagine BETA and BETA-AT will suffer from scalability issues.
A6. This is true, and was stated in lines 212-219 of our paper. To make this clearer, at the end of Section 4, we added a new subsection dedicated to the computational complexity of our method relative to other adversarial training and evaluation methods.
> Q7. I failed to understand why SBETA-AT solves the scalability issue since it also loops through all the target classes.
A7. Our argument concerns **sample efficiency**: BETA generates $K$ possible adversarial examples (eq. 21) but ends up using **only one** of them to calculate a gradient for the network parameters (eq. 20). In contrast, SBETA weighs each of the $K$ possible adversarial examples obtained in the lower level (eq. 35) and uses all of their corresponding gradients (weighted proportionally to the achieved margin) in the upper level (eq. 34).
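To illustrate the difference, here is a minimal sketch with placeholder margin values; the softmax weighting below is an assumption made purely for illustration, not the exact weighting in (34)-(35). In BETA, only the best candidate receives a nonzero gradient; in SBETA, every candidate contributes:

```python
import torch

# Hypothetical negative margins achieved by the per-class candidate perturbations
# for a single instance (placeholder values; in BETA-AT each comes from eq. (21)).
margins = torch.tensor([-0.8, 0.3, -0.1, 0.7], requires_grad=True)

# BETA (eq. 20): the upper level uses only the single best candidate,
# so only that candidate receives a nonzero gradient.
margins.max().backward()
print("BETA gradient: ", margins.grad)       # e.g. tensor([0., 0., 0., 1.])

margins.grad = None

# SBETA (eqs. 34-35), sketched here as a softmax weighting over the candidates:
# every candidate contributes a gradient, weighted according to its achieved margin.
weights = torch.softmax(margins.detach(), dim=0)
(weights * margins).sum().backward()
print("SBETA gradient:", margins.grad)       # all entries are nonzero
```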
> Q8. A major weakness of this paper is the insufficient experimental results.
A8. Our tables show that the performance of the last iterate obtained with our optimization formulation achieves a much higher robust accuracy than the other baselines. Please note that the last iterate of an optimization algorithm is more indicative of the optimization formulation used. Specifically, the other methods require early stopping to avoid robust overfitting, whereas our method does not need such a heuristic. In situations where one does not have access to a validation set, strong performance at the last iterate of training is essential, and ours is the first algorithm to provide this.
> Q9. The empirical observations that BETA matches the estimate of Auto-Attack and BETA-AT mitigates robust overfitting are still interesting from my perspective. However, these claims are not fully justified due to missing comparisons and limited evaluations
A9. We are pleased that you find this result "interesting." We assert that offering an apples-to-apples comparison to the top models on RobustBench is the gold standard for evaluating adversarial performance. If you feel that this is insufficient, can you be more specific about the "missing comparisons" and/or "limited evaluations?"
> Q10. The main advantage of BETA over Auto-Attack is claimed to be its lower computational complexity, but no running time is reported. So, I recommend the authors report the running time of the results presented in Table 2 or conduct a complexity analysis of BETA with comparisons to attack algorithms used in Auto-Attack.
A10. We added a new plot (see our PDF response) showing that on average, **BETA is 5.11 times faster than AutoAttack** on the top entries on RobustBench while matching AutoAttack's scores. This means that one can match SOTA with significantly less computational complexity and without the need for heuristics, which is a strong argument in favor of our algorithm.
> Q11. The main experiments regarding BETA-AT are only conducted using ResNet 18 model architecture, which is much smaller than state-of-the-art
A11. Our goal in this paper is to validate the claims made in the introduction, not to provide **absolute** (architecture + optimizer + heuristics + (generated) data) state-of-the-art numbers. Said another way, we aim to show that **our method** is SOTA when compared in a controlled, yet representative setting (ResNet-18), which is consistent with many previous works on the topic that only use ResNets (see, e.g., Table 2 in the PGD paper [Madry et al., 2017], Table 1 in the Convex AdvPolytope paper [Wong and Kolter, 2018], Tables 2-3 in the MART paper [Wang et al., 2019], Tables 1-2 in the FastAT paper [Wong et al., 2020], and so on). Note that a good experimental setup tries to isolate the effect of each choice in the training loop, which is precisely what we do: our argument is that our optimization formulation is better, and so we only need to vary this choice.
> Q12. it is a bit unclear whether the mitigation of robust overfitting also happens for larger models [...] provide results for larger models.
A12. Is there a reason to believe the results won't hold for larger models? ResNet-18 is already a relatively large deep neural network and is similar to many used in vision tasks. The fate of this paper does not hinge on scaling to billions of parameters. While one can always add more experiments, we feel strongly that we have given sufficient theoretical and empirical evidence for our formulation.
> Q13. The paper only considers CIFAR-10. It is unclear whether the observations generalize to other image datasets such as SVHN, CIFAR-100, and ImageNet.
A13. Considering datasets no larger than CIFAR-10 is standard practice in the field; see the PGD paper [Madry et al., 2017], the Convex AdvPolytope paper [Wong and Kolter, 2018], the MART paper [Wang et al., 2019], the FastAT paper [Wong et al., 2020], and likely any other canonical reference in the field. It is unreasonable to hold our paper to a significantly higher standard, given that we also provide sufficient theoretical motivation for our framework, whereas (many) past works only provide empirical results.
> Q14. Other schemes for mitigating robust overfitting and improving robust generalization exist (see [1-3]). These papers should be properly cited and discussed.
A14. Thanks for this pointer -- we have added references to the works [1-3] listed in your review, along with a thorough discussion of them, to our paper.
> Q15. It is a bit unclear to see the advantage of your method compared with these approaches.
A15. Our goal is to convince you that adversarial training should be cast as a bilevel optimization formulation (non-zero-sum game), not that any one algorithm is absolutely better than another. The main methodological difference between our work and the papers you mention is that we derive a new optimization formulation from theoretical arguments, whereas those references use empirical tricks to mitigate robust overfitting. For example:
* [1] finds via empirical exploration that flatness leads to an improvement in robust generalization in practice and performs experiments to verify claims;
* [2] performs some experiments, finds in practice that small-loss examples are more correlated with robust overfitting, designs a method that tries to fix the symptom;
* [3] finds that in practice adding more data improves robust generalization, which seems intuitive but lacks theoretical backing specific to robust overfitting.
In contrast, our work theoretically exposes the weaknesses of the zero-sum formulation of AT. We then offer a theoretically principled solution, which we verify in a representative set of experiments.
> Q16. your proposed BETA seems to be a general alternative to PGD attacks, so it would be interesting to conduct experiments to see if combining your method with them further improve model robustness.
A16. Because our goal in this paper is to convince the reader that our proposed optimization formulation is more aligned with the goal of adversarial training, we only compare against algorithms with different optimization formulations. That is, we do not use heuristics to boost scores, which is needed to reach the absolute SOTA on leaderboards like RobustBench. While we agree that doing so would be interesting, it is beyond the scope of our paper.
> Q17. Can you clarify how Smooth BETA Adversarial Training (SBETA-AT) solves the sample efficiency issue? Generally speaking, do you expect your method to be scalable to datasets with more classes like CIFAR-100?
A17. We believe SBETA might be needed when the number of classes is very large, because otherwise (e.g., on CIFAR-100) we would be obtaining many candidate adversarial examples and using only one per iteration. That said, SBETA is only an optimization formulation, and we use the simplest algorithm (gradient descent for the upper level, gradient ascent for the lower level); future work could focus on improving the complexity of BETA/SBETA, similar to how the original slow PGD (2017) was made faster in later papers such as [Wong et al., 2020].
**References**
[Shafahi et al., 2019] Shafahi, Ali, et al. "Adversarial training for free!." Advances in Neural Information Processing Systems 32 (2019).
[Wong et al., 2020] Wong, Eric, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training." arXiv preprint arXiv:2001.03994 (2020).
# Reviewer iTuT (score 4)
Thank you for your review! We are pleased that you think our formulation and algorithm are **"novel"** and that our paper constitutes a **"meaningful research direction."** We are also glad that our SOTA empirical results render our algorithm a **"valuable alternative to existing methods"** and that you think that our approach has **"potential for addressing adversarial training in different domains."**
**Main points in our response:** We understand that you think that two elements are missing from our paper: (1) a discussion of the speed-performance trade-off, and (2) a clarification of why our algorithm does not achieve "significant performance improvements" relative to other training algorithms. You also raised three questions, regarding (a) why we base our formulation on the 0-1 loss and whether there are any limitations to this choice, (b) whether our method applies to problems outside of multi-class classification, and (c) whether there are additional benchmarks/experiments that would be relevant. In an updated version of our paper, we have addressed all of your concerns and answered each of your questions. Here is a summary of our response and of the changes we have made in line with your comments:
* **Main concern (1):** At the end of Section 4, we added a new subsection dedicated to the computational complexity of our method relative to other adversarial training and evaluation methods. We also added a new plot (see our PDF response) showing that on average, **BETA is 5.11 times faster than AutoAttack** on the top entries on RobustBench while matching AutoAttack's scores.
* **Main concern (2):** We disagree, for two main reasons. Firstly, if one looks at the last iterate, our algorithm **significantly outperforms the baselines** by **eliminating robust overfitting**, and if one can use early stopping, then based on Table 1, our algorithm performs similarly, and in some cases outperforms the baselines. Secondly, note that offering absolute performance improvements (with early stopping) was not one of our listed contributions (see lines 55-66); our main contribution is to show that our formulation eliminates robust overfitting, which you noted in your summary of our work.
* **Question (a):** As we explicitly state in lines 69-78, we consider supervised classification problems, which exactly corresponds to minimizing the 0-1 loss. This is consistent with nearly all of the well-known papers that have introduced adversarial training/evaluation algorithms in the last 10 years (e.g., the FGSM, PGD, FastAT, and MART papers, as well as many others).
* **Question (b):** While a similar approach may apply to other problems (e.g., regression or semantic segmentation), this is beyond the scope of our paper. A promising direction for future work would be to extend our formulation to these settings.
* **Question (c):** We have presented theoretical and empirical evidence for our framework. Note that while almost any paper can benefit from additional experiments, by considering adversarial training and evaluation on CIFAR-10, we have provided at least as many experiments as many of the seminal papers in this field (e.g., the FGSM, PGD, and MART papers, as well as many others). We have also added additional experiments in our rebuttal (see the first bullet).
More detailed explanations and comments are given in the post directly following this one. We feel that these answers **completely address your concerns and questions**, and as such, we hope that you'll consider increasing your score to "accept." Thank you!
---
**Concern (1): Speed-performance trade-off.** We agree that we could add more details regarding the computational complexity of our method. Based on your comments, in Section 4, we have added a **new subsection called Sample efficiency** where we discuss the running time of BETA relative to other adversarial training algorithms. To summarize this section, whereas typical AT algorithms (e.g., PGD) perform untargeted attacks, our algorithm uses a targeted attack, which introduces additional complexity. However, in our PDF response, we show that our heuristic-free algorithm **BETA is 5.11x faster than AutoAttack** on average when evaluating the adversarial performance of the top entries on RobustBench. While not the fastest available AT algorithm, our approach **eliminates robust overfitting** via a simple and intuitive bilevel procedure. This indicates that one must pay a price to avoid overfitting, which is to be expected.
Trade-off in number of steps...
**Concern (2): Significant performance improvements.** Your claim that our algorithm does not offer significant improvements is not correct. If one cannot run early stopping (e.g., due to the additional complexity it imparts), then by inspecting the last-iterate column of Table 1, one observes a considerable difference between our method and the baselines. This is consistent with the claims that we made in the introduction (see lines 55-66), which focus on the elimination of robust overfitting.
However, if one can run early stopping, then based on Table 1, our method performs just as well as, and in some cases better than, the baselines we considered. In some sense, this is the expected result. The simple, theoretically-motivated modification that we introduce does not lead to a fundamentally new algorithm, but to a more nuanced version of traditional adversarial training that eliminates robust overfitting. Based on your review, we are in agreement that this constitutes a contribution to the existing literature, given that numerous papers have recently attempted to tackle the problem of robust overfitting.
**Question (a): Our focus on classification/0-1 loss.** You ask "why [we] specifically start from the 0-1 loss to derive" our formulation. In response, note that we explicitly state in lines 69-78 that this paper concerns supervised classification, which precisely corresponds to the minimization of the 0-1 loss. **This is consistent with nearly all of the well-known papers in this field that have introduced adversarial training/evaluation algorithms in the last 10 years** (e.g., FGSM [Goodfellow et al., 2014], PGD [Madry et al., 2017], FastAT [Wong et al., 2020], MART [Wang et al., 2019], AutoAttack [Croce et al., 2020], as well as many others). As you marked ``Confidence=4`` (i.e., you are confident about your familiarity with related work), we hope you agree that focusing our analysis on a different setting would deviate severely from the bulk of the literature.
Relatedly, you also asked whether there are "any inherent limitations or assumptions" associated with the choice to minimize the 0-1 loss. **This question is answered in lines 76-88** (see the paragraph which starts, "Prominent among the barriers to solving (1) in practice..."). To summarize, the 0-1 loss is discontinuous and therefore not amenable to gradient-based optimization (its gradient is zero almost everywhere). That is why we have to relax the objective using surrogates.
**Question (b): Going beyond classification.** We feel strongly that focusing our analysis on classification problems is not a weakness of our paper. As we mentioned above, this approach is consistent with nearly every paper that introduces an adversarial training/evaluation algorithm. While this is an exciting direction for future work, considering other problems such as regression or semantic segmentation is **beyond the scope of our work**.
**Question (c): Other benchmarks.** Your question is whether we could perform more experiments that "further support the claims made in the paper." However, in your review, you wrote that our empirical results render our approach a "valuable alternative to existing methods." So our question for you is: why do we need more experiments if you feel we have already made a compelling case?
Consider the following argument. We supported our claims with both theoretical and empirical evidence. On the theoretical side, we rigorously showed why the standard formulation for adversarial training suffers from two main limitations (see Section 3.1). Then, on the empirical side, we derived a new formulation for adversarial training, which we showed leads to an algorithm that eliminates robust overfitting and matches SOTA for adversarial evaluation without heuristics (see Sections 4-5). While any paper can benefit from further experiments, between the experiments in the paper and the new experiments in this rebuttal, we have certainly shown that the claims made in our paper hold, which is the aim of any paper under review.
**References**
[Goodfellow et al., 2014] Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014).
[Madry et al., 2017] Madry, Aleksander, et al. "Towards deep learning models resistant to adversarial attacks." arXiv preprint arXiv:1706.06083 (2017).
[Wang et al., 2019] Wang, Yisen, et al. "Improving adversarial robustness requires revisiting misclassified examples." International conference on learning representations. 2019.
[Wong et al., 2020] Wong, Eric, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training." arXiv preprint arXiv:2001.03994 (2020).
[Croce et al., 2020] Croce, Francesco, et al. "Robustbench: a standardized adversarial robustness benchmark." arXiv preprint arXiv:2010.09670 (2020).
---
<!-- > Q1. The generalizability of the algorithm to other adversarial scenarios outside of classification problems needs further investigation and clarification.
A1. We believe there is a misunderstanding, our work deals exclusively with supervised classification scenarios. -->
<!-- > Q2. The paper lacks a comprehensive evaluation of the algorithm's trade-off between performance and speed, particularly in the smooth version mentioned in the appendix.
A2. The goal of our work is not to derive the fastest possible algorithm, but to show why the zero-sum formulation of AT has some flaws, and how our bilevel (non-zero-sum) formulation fixes some of such issues. Improving the computational complexity of the algorithms is left for future work. -->
<!-- > Q3. The provided tables do not convincingly demonstrate significant performance improvements brought by the algorithm, requiring more extensive experiments and result analysis for validation.
A3. We disagree, looking at the performance of the last iterate column in table 1 we observe a considerable difference between our method and the baselines (without early stopping), which is our main performance improvement claim (elimination of robust overfitting due to correct optimization formulation). -->
<!-- > Q4. Can the authors elaborate on why they specifically start from the 0-1 loss to derive the bilevel optimization formulation?
A4. In supervised classification, which our work focuses on, the goal is to minimize the classification error rate. It corresponds to the 0-1 loss. -->
<!-- > Q5. Are there any inherent limitations or assumptions associated with this choice?
A5. The well-known limitations are the discontiunity and the fact that the loss is not amenable to gradient-based optimization (it has zero gradient almost everywhere). That is why we have to relax the objective using surrogates. -->
<!-- > Q6. Could the authors provide more insights into the algorithm's generalizability beyond classification problems?
A6. See A1. Our work deals exclusively with supervised classification scenarios. -->
<!-- > Q7. Can the authors address the potential performance improvement achieved through avoiding overfitting during the training process? It seems that PGD outperforms BETA by simply applying early stopping in Fig,1.
Precisely our main claim is that our optimization formulation is the correct one as it eliminates robust overfitting and the need to use the early stopping heuristic. Note that this is one important question in the literature, starting with the influential paper **Overfitting in adversarially robust deep learning**. Leslie Rice, Eric Wong, J. Zico Kolter. ICML 2020. -->
<!-- > Q8. It would be valuable to see a more comprehensive evaluation of the algorithm's trade-off between performance and speed, especially in the smooth version mentioned in the appendix. How does the proposed algorithm compare to other methods in terms of both effectiveness and efficiency?
A8. See A2. -->
<!-- > Q9. The provided tables do not convincingly show significant performance improvements resulting from the algorithm. Could the authors provide additional analysis or experiments to validate the effectiveness of their approach and better highlight its advantages over existing methods?
A9. See A3. -->
<!-- > Q10. Are there any additional experiments or benchmarks that could further support the claims made in the paper? It would be helpful to include more comprehensive comparisons based on more datasets.
A10. Our claims have theoretical backing as well as some experimental results that agree with the theory. We agree more experimental evaluation is always better but less important when there are some theoretical arguments. We would be happy to discuss if there are any disagreements regarding our theoretical arguments. -->
# Reviewer gXG9 (score 5)
Thank you for your positive review of our work! We are glad that you think our paper is **"well-written"** and **"easy to follow"**, and we agree with you that our formulation in (10)-(11) is **"a new starting point for the study of adversarial robustness"**.
**Main points in our response:** We understand that you think there are two minor ingredients missing from our paper: (1) a discussion of the computational complexity of our algorithm, and (2) a comparison to "SOTA adversarial training methods." You also raised two questions regarding (a) why there are no error bars on the experiments and (b) why the bilevel problems in (15)-(16) and (17)-(19) are equivalent. In an updated version of our paper, we have addressed all of your concerns. Here is a summary of our response and of the changes we have made in line with your comments:
* **Minor concern (1):** At the end of Section 4, we added a new subsection dedicated to the computational complexity of our method relative to other adversarial training methods. We also added a new plot (see our PDF response) showing that on average, **BETA is 5.11 times faster than AutoAttack** on the top entries on RobustBench while matching AutoAttack's scores.
* **Minor concern (2):** Table 2 already contains the top 10 entries on RobustBench, which are definitionally SOTA; we have added additional baselines to Table 1 in line with your comment; see Table XXX.
* **Question (a):** The precedent of not including error bars was set in past work (e.g., the FGSM, PGD, FastAT, and MART papers, as well as many others). We answered "yes" because this answer is vacuously true: there are no plots that require error bars.
* **Question (b):** A full proof relating the optimal solution of (15)-(16) to that of (17)-(19) is given in the detailed comments. We will include this proof in an appendix, with pointers in the main text.
More detailed explanations and comments are given in the post directly following this one. We feel that these answers **completely address your concerns and questions**. And, as you have already rated us as having ``Soundness=3``, ``Presentation=3``, and ``Contribution=3``, we hope that you'll consider increasing your score to "accept." Thank you!
---
**Minor concern (1): Discussion of computational complexity.** We agree that we could add more details regarding the computational complexity of our method. Based on your comments, in Section 4, we have added a **new subsection called Sample efficiency** where we discuss the running time of BETA relative to other adversarial training algorithms. To summarize this section, whereas typical AT algorithms (e.g., PGD) perform untargeted attacks, our algorithm uses a targeted attack, which introduces additional complexity. However, in our PDF response, we show that our heuristic-free algorithm **BETA is 5.11x faster than AutoAttack** on average when evaluating the adversarial performance of the top entries on RobustBench. While not the fastest available AT algorithm, our approach **eliminates robust overfitting** via a simple and intuitive bilevel procedure. This indicates that one must pay a price to avoid overfitting, which is to be expected.
**Minor concern (2): Comparison to SOTA.** As noted above, Table 2 already contains the top-10 entries on RobustBench, which are by definition SOTA. In line with your comment, we have also added additional baselines to Table 1.
**Question (a): Error bars in the experiments.** Your first question concerns why we checked "yes" for the error bars question. We emphasize that we had to select either "yes" or "no"; if there had been an option to abstain from answering this question, we would have done so, because we feel that **if we had included error bars, we would have diverged from the norms set in the seminal works of this field**. So to answer your question: when prompted to answer "yes" or "no," we answered "yes," as this answer is vacuously true in the sense that there were no experiments that required error bars. Here are further details and references pointing to a small subset of the seminal works in this field, none of which employ error bars for the same experiments that we run.
* **Table 1:** we did not include error bars because it is not standard practice to do so (see, e.g., Table 2 in the PGD paper [Madry et al., 2017], Table 1 in the Convex AdvPolytope paper [Wong and Kolter, 2018], Tables 2-3 in the MART paper [Wang et al., 2019], Tables 1-2 in the FastAT paper [Wong et al., 2020], and so on). Given that you set ``Confidence=5`` in your review (i.e., you are "very familiar with the related work"), we hope you agree that including error bars does not follow the standard practice for this field.
* **Table 2:** we compared the performance of BETA with the SOTA AutoAttack method for each of the top entries on the widely-used RobustBench leaderboard [Croce et al., 2020]. Note that RobustBench does not use error bars, and therefore we followed standard practice by not including error bars in this table.
We hope you agree that in the case of this paper, the presence or absence of error bars is not a weakness; it is merely an artifact of a poorly-worded question in the submission form for NeurIPS this year and of our effort to adhere to standard practices in the field of adversarial robustness.
**Question (b): Equivalence of the bilevel problems.** Your second question concerns the equivalence between the bilevel problem in (15)-(16) and the bilevel problem in (17)-(19). We will walk through the derivation step by step. Let's start by fixing a data point $(X,Y)$. Now consider eq. (16), which is itself an optimization problem:
$$ \eta^\star\in\argmax_{\eta\in\Delta} \max_{j\in[K]-\{Y\}} M_\theta(X+\eta,Y)_j \tag{16}$$
where for simplicity, we have defined $\Delta=\{\eta : ||\eta||\leq \epsilon\}$. Note that as we are maximizing over both $\eta$ and $j$, we can switch the order of maximization:
$$ \max_{j\in[K]-\{Y\}} \max_{\eta\in\Delta} M_\theta(X+\eta,Y)_j \tag{SwappedMax}$$
Now for each $j\in[K]-\{Y\}$, let's write the solution to the inner problem as
$$ \eta^\star_j \in \argmax_{\eta\in\Delta} M_\theta(X+\eta,Y)_j $$
so that `(SwappedMax)` can be *equivalently* rewritten as:
$$ \max_{j\in[K]-\{Y\}} M_\theta(X+\eta^\star_j, Y)_j \quad\text{where}\quad \eta^\star_j \in \argmax_{\eta\in\Delta} M_\theta(X+\eta,Y)_j. $$
Now consider the maximization over $j$. Since there are only $K-1$ elements in $[K]-\{Y\}$, we can simply choose the index $j^\star$ for which $M_\theta(X+\eta^\star_j, Y)_j$ is maximized, i.e.,
$$ j^\star \quad\text{satisfies}\quad M_\theta(X+\eta_{j^\star}^\star, Y)_{j^\star} \geq M_\theta(X+\eta_j^\star, Y)_j \quad\forall j\in[K]-\{Y\}$$
Putting together these pieces, we have shown that $\eta^\star$ is optimal for (16) if and only if $\eta^\star_{j^\star}$ is optimal for the problems defined by
$$
\eta^\star_{j} \in \argmax_{\eta\in\Delta} M_\theta(X+\eta,Y)_j \quad\forall j\in[K]-\{Y\} \\
j^\star \in \argmax_{j\in[K]-\{Y\}} M_\theta(X+\eta^\star_j, Y)_j
$$
which completes the proof and answers your question.
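As a toy sanity check of the swap-the-maxima step (not part of the paper), one can also verify the equivalence numerically on a finite grid of candidate perturbations; the values of $M_\theta$ below are random placeholders:

```python
import itertools
import random

random.seed(0)
K, Y = 5, 0
etas = range(20)                                   # finite stand-in for the feasible set Delta
M = {(e, j): random.gauss(0.0, 1.0) for e in etas for j in range(K)}
targets = [j for j in range(K) if j != Y]

joint_max = max(M[e, j] for e, j in itertools.product(etas, targets))    # as in (16)
swapped_max = max(max(M[e, j] for e in etas) for j in targets)           # per-class maxima, then max over j
assert abs(joint_max - swapped_max) < 1e-12
print(joint_max, swapped_max)
```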
#### References
[Croce et al., 2020] Croce, Francesco, et al. "Robustbench: a standardized adversarial robustness benchmark." arXiv preprint arXiv:2010.09670 (2020).
[Madry et al., 2017] Madry, Aleksander, et al. "Towards deep learning models resistant to adversarial attacks." arXiv preprint arXiv:1706.06083 (2017).
[Wong and Kolter, 2018] Wong, Eric, and Zico Kolter. "Provable defenses against adversarial examples via the convex outer adversarial polytope." International conference on machine learning. PMLR, 2018.
[Wang et al., 2019] Wang, Yisen, et al. "Improving adversarial robustness requires revisiting misclassified examples." International conference on learning representations. 2019.
[Wong et al., 2020] Wong, Eric, Leslie Rice, and J. Zico Kolter. "Fast is better than free: Revisiting adversarial training." arXiv preprint arXiv:2001.03994 (2020).
# Reviewer gJU3 (Score 7)
Thank you for your positive review of our work! We are glad that you feel that our work **"contributes to the existing literature"** and that our approach **"adds new understanding"** regarding robust overfitting. We certainly agree that our paper **"[closes] the generalization gap between robust AT settings and standard neural net training settings."** Our response is summarized below:
**Main points in our response:** We understand that you have one minor concern regarding the empirical performance of adversarial training, and a question about whether we've looked at other factors impacting test time performance.
* **Minor concern:** We agree with you -- our method does not significantly improve performance (cf. Table 1 and Figure 1), which is why **we did not claim this as a contribution** (see lines 55-66). As you note in your review, our "simple modification" is enough to eliminate robust overfitting. This modification does not lead to a fundamentally new algorithm, but to a more nuanced version of traditional adversarial training that eliminates robust overfitting. Based on your review, we are in agreement that this constitutes a contribution to the existing literature, given that numerous papers have recently attempted to tackle the problem of robust overfitting. We hope to look more carefully at this concern, as boosting the training-time performance of this algorithm is a promising direction for future work.
* **Question:** In line with the precedent set by RobustBench, we did not look at different model capacities for each algorithm when evaluating the test-time performance of BETA; this would be a promising direction for future work. Similarly, we feel that determining the theoretical and empirical extent to which more data improves the performance of BETA is outside the scope of our paper, although we are excited about looking into this in a future paper.
