adv NTK rebuttal

# adv NTK rebuttal * R7n6: Thank you for your time and your enthusiastic review! To address your comments: * We will provide slightly more details on the NTK computation in the appendix. In Sec. 3.2 we used Eq. (C.29) and its derivative, evaluated on the entire training data. For the robustness-usefulness analysis in Sec. 4 we use the (differentiable) analytical expressions for the various architectures, available in the neural tangents library, and compute kernels of size 10K for MNIST and the entire binary CIFAR (also 10K) - these details were in a caption before and are now in the main text. In general it is harder to compute analytical kernels in the multi-class case, especially for more sophisticated architectures. The details on the empirical kernel dynamics computations in Sec. 5 are in the Appendix. * line 68: You are right that training data are needed. Our intent in this line is to compare the method we present with attacks that train substitute models, where the dependence on data is obvious. So in that sense, there is no change in the assumptions of the two models, which is why we do not mention it there. * We appreciate your concerns about paragraph L241-L245. We will modify/shorten these lines in the revision, to make the distinction between our model and the traditional BB and WB models more crisp. We will mention that our threat model explicitly requires access to the training data, and how this may depart from the currently established definition of the black-box threat model. We will make similar clarifications in the other places you point out. Thank you for alerting us to this potential source for confusion: we will disambiguate our claims (and remove allusions to the Black Box threat model). * About Fig. 2, you are right. The colored boxes are for illustrative purposes and the boundaries are chosen arbitrarily. We will add a note in the caption. * vnuA: Thank you for your time and your review! We address your concerns: * We agree that our claim is valid only in the lazy regime. This is reflected in the title of this section "White box = Black box in the kernel regime" (which we have now adjusted). We realize that this generated confusion, and will make sure to delineate the claim clearly in the rest of the section. * We considered the same setup as in Arora et. al, ("Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks", ICML'19) which is one of the first works in the NTK literature. * We apologize if that section in the supplementary material was not clear enough. We will try to clarify the confusion in the revised version. The point is that we do not compare performance on binary problems with performance on multiclass, we just argue that achieving robustness on multiclass classification is generally more difficult than in the binary case (see for instance: Qian et al., "Robustness from Simple Classifiers", https://arxiv.org/abs/2002.09422, 2020). Based on the results on MNIST (which did consider a very small amount of robust features - 3), we conclude that it is unlikely that a kernel composed of robust features will yield a robust machine. Regarding your questions: 1. We review aspects of the NTK theory in the most general setting, while referring interested readers to the relevant papers for the exact assumptions of the theory. We clarified the confusion around the dimension of the weights in the revised version. Thank you for suggesting this. As indicated in the Appendix, for the transfer results in Sec. 3.2 we use the two-layer network with frozen weights of Eq. (C.29). For the usefulness-robustness results we use standard FC{1,2,3,4,5} and CONV{1,2} architectures, which analytical expressions provided in the neural tangents library. 2. Anlyzing the minimization of cross entropy loss in the lazy regime is a non-trivial subject, where the neural net expressions change significantly. See for example [Lee et al 19] Appendix B.2. We do not attempt to pursue this direction in our paper. 3. Multi-step attacks are harder to be analyzed analytically, so we only consider this scenario in the last section, where we do adversarial training. 4. Appendix B contains the derivation of the attacks on kernels. Please let us know if you have any suggestions on how to improve it. 5. In that section, we deliberately stay in the kernel regime, as reflected in the title of the section. We will make sure to make it more clear in the revised version. 6. Please note that the accuracy reaches 2% and not 200% (the y-axis shows %). 7. Thank you for the suggestion! We will improve the visibility of figures in the revised version! In summary, we will make the setting of Section 3.2 more clear, as you suggested and as is suggested by some of the other referees. Note, however, that the role of that Section is mostly to motivate the analysis that follows (and our revised version will adjust the claims in the introduction that refer to it). We are at your disposal for further questions about the rest of the paper and hope that you might consider raisig your score, since you seem to appreciate its contributions. * bexg: Thank you very much for your comprehensive and valuable review. We have been thrilled to read your very thoughtful comments and the enthusiasm you convey for our study and are very grateful for your remarks, which we have taken as a starting point for a few improvements of our paper. We hope that the answer to your questions and changes we have made alleviate your concerns as to possibly fuzzy details and technical flaws. We truly hope our clarifications will incite you to raise your score, and are at your disposal for further clarifications if necessary. To answer your questions: > 1. Number of epochs of training in Fig. 4 and corresponding robustness: * You are right that these numbers would look suspicious under "normal" conditions. However, note that these experiments were performed in a different regime ("close to the NTK"): The number of epochs is high, since learning is slowed down due to the choice of hyperparameters (learning rate, variance at initialization, large width, $\ell_2$ loss, full batch GD - please see Appendix C for details). The robustness seems very large, but notice that Fig. 4 shows a binary task and hence the numbers must be appreciated accordingly. The same experiment, when done on multiclass MNIST (Fig C.1(a)), shows very low levels of robustness (1%), which agrees with previous literature (which mainly focuses on multiclass problems). Finaly, notice that robustness being larger in the beginning than in later stages of training agrees with our finding that the top eigenfunctions of the NTK are more robust (as the top ones are the ones being learned first when nets are trained in the kernel regime). 2. Unclear usefulness metric: * Good catch - thank you! Indeed, our definition of usefulness and robustness of a feature differs slightly from what we present in Fig 2 (right) and Figs. D.2 and D.3. We corrected this in the revised version. In brief, we view the usefulness of a feature as its classification ability (and, accordingly, its robustness as its classification ability under adversarial perturbations). This makes the extension to the multiclass case natural (App. A). In our figures, we use robustness against FGSM attacks as a surrogate for robustness, as mentioned in Sec. 4. >3. Weak results on transferability/black-box attacks * You are absolutley right that our experimental analysis in Section 3.2 is limited to the kernel regime (and we do mention this throught the paper, though possibly not enough). In the revised version, we are making sure to put the contributions of this section in a better context. Doing the same set of experiments with empirical kernels and more realistic networks (such as deep convolutional architectures) is definitely interesting and important, albeit computationally demanding and sligthly beyond the scope of this paper in our opinion. We will try to emphasize that the contributions of this section serve more as a motivation for the remainder of the paper, and adjust the claims about black-box attacks. > Miscellanea: Figure colors, font sizes, references: * Thank you for pointing these out. We have modified the color map of Fig. 3 to something hopefully more satisfying. We updated all references, and increased the font size where possible. Thank you for helping us improve our work! We hope our responses address your concerns. * Ly8J: Thank you for your time and your review! We appreciate your critical read of our work. Before we adress your questions, please allow us to comment on the intent of this paper: In general, since adversarial robustness has not been studied much (or at all) from an NTK viewpoint, we chose breadth over depth in many places of our work, also trying to open new areas of exploration for future work. However, the fact that adversarial examples transfer (in the kernel regime), the fact that the distinction between robust and non-robust features seems to hold on kernels as well - and that robust features tend to correspond to the top of the eigenspectrum, and the empirical phenomena of movement and laziness of the empirical kernel during adversarial training are all novel contributions of our study. In our revision, we tried to improve the clarity of our claims regarding the so-called black box attack we introduce, as we realize they have been confusing for several of the referees. We hope you will appreciate what we believe is a more nuanced presentation (and hence less flamboyant). We also agree that our transfer results to wide neural nets are in no way surprising (and were not meant to be): since previous works have not studied or emphasized transfer of *gradients* with respect to the data, we wanted to establish and highlight these results chiefly as a basis for what follows and to make sure the foundations hold (i.e. gradients behave as we expect). >I liked the E.1 Linearized Adversarial Training experiment, but wish the authors had developed this further. Thank you! We were severely constrained by space and scope, and have left extensions of this particular line of research to future work. However, in the revision, we have added an additional set of data comparing linearization at initialization (previously we only linearized after 25 and 50 epochs) and expand our observations on the resulting gap between standard and robust test accuracy (which increases the earlier we linearize). We believe this phenomenon warrants further in-depth study. In reponse to your questions: > The third investigation (evolution of empirical NTK) seems to make a couple of leap of faiths (maybe they are better justified but in that case you should make them clearer). In particular, it is not obvious at all why the top eigenspectrum should correspond to more robust features for the empirical NTK and particularly for adv training (as this was only shown for the exact NTK and std training). We agree that this later part of our discussion (L. 351-359) is speculative and makes the assumptions you point out (a leap of faith from the analytical to the empirical NTK). We only attempt to provide some intuition on the mechanism of adversarial training through what has been analyzed already in the paper. However, we believe that the rest of this section makes interesting contributions (e.g. slowdown of kernel during adversarial training) that are not based on any assumptions. > Can you provide more details on how exactly you computed usefulness and robustness for the NTK features? I could not find them in the Appendix. * Thank you for asking this! The way that we computed them indeed differs slightly from the definitions of Section 2.2. The revised version corrects this. In brief, we view the usefulness of a feature as its classification ability (and, accordingly, its robustness as its classification ability under adversarial perturbations). Please let us know if you have any more questions about this. > Have you tried computing the robustness of the eigen-features arising from the empirical NTK? * This is a very good suggestion and we were initially planning to include such a study in our paper. However, computing the empirical kernel for convolutional, multi-output architectures for the **whole dataset** and then computing its gradients to estimate robustness is computationally prohibitive at the moment (at least to the best our knowledge). So, reproducing the study of Section 4 for empirical kernels is not trivial computationally. We believe that this is one of the possible future directions that our work opens. Please also note that any prior work you might be referring to is unpublished and has (not coincidentally) not been made available on the arXiv. # Revision of the paper Summary of revisions in the updated version: Main paper: 1. We have clarified that our transfer results from (analytical) kernels to neural nets are only provided in the "lazy" regime and have clarified statements comparing the "NTK-attack" to black box and white box thread models. We clarify that our claims to the effectiveness of this attack are only verified for the wide networks in the "lazy"-regime considered in Sec. 3.2. 2. We have clarified the nature of the "NTK thread model": it requires training data and knowledge of the model architecture, but no access to model weights, so constitutes a type of "analytical substitution attack". 3. We have slightly modified the definitions of robust and useful features in Section 2.2 and Appendix A according to their classification ability. This gives a straightforward extension to the multiclass case and does not require balanced or normalized features. 4. We have clarified "usefulness" in the captions of Figs. 2 (Right) and D.2 and D.3 in the Appendix and pointed out that the shaded boxes are arbitrary and serve to visualize various regions in usefulness-robustness space 5. We have improved the color rendering of Fig. 3 and increased font sizes where possible 6. We fixed the legend of Fig. 6 Left (which already had a correct rendition in the former appendix, which we now moved to the main text) 7. We have updated all references to refer to the latest published version Appendix: 1. In Sec. E.1.we have expanded our analysis of linearized models and added an additional set of data comparing linearization at *initialization* (previously we only linearized after 25 and 50 epochs) and expand our observations on the resulting gap between standard and robust test accuracy (which increases the earlier we linearize). We believe this phenomenon warrants further in-depth study. 2. We slightly modify the definition of usefulness/robustness in App A, as explained for the main paper 3. We have added more detail on the size of the datasets used in Sec. 3.2 (the entire dataset) 4. We explanded hte captions on Figs. D.2 and D.3 to indicate y-axis values for useful features and clarifying that shaded boxes serve for visualization only. 5. We clarify in Sec. D.2. the exact architectures and data set sizes we used for the usefulness-robustness calculations #Rebuttal to the rebuttal vnuA We thank the reviewer for their response to our comments, and appreciation of our efforts. Perhaps the following clarifications can address your comments, and in particular convey our complete surprise to read your comment/weakness 3 since we made special effort to include many multi-class experiments. > The transfer learning attack only holds for the lazy regime, then if the network is outside the lazy regime, the black box attack from this paper will not have the same effect as white-box attack, therefore my first weakness still holds. You are correct that in Section 3 we present transfer results to wide nets in the NTK parameter setting. Our main goal here is to make sure gradients (with respect to the data) transfer from NTKs to wide nets as we would expect (but which has not been verified anywhere in the literature, to our knowledge). We then discuss the type of threat model such an "NTK-attack" would constitute (it requires description of the model architecture and training data, but not more, thus is somewhat outside the standard attack models that are usually studied). Having established these two facts, we then open the question for future work (see lines 369-371: "Sec. 3 argues that transferable attacks from the NTK may be as effective as white-box attacks, but this warrants an in-depth study across architectures, kernels and data sets (which has not been the main focus of this work).") In this paper, we aim to lay out a wider spectrum of benefits the "NTK-lens" offers to the study of adversarial robustness: this is not intended as a cryptography paper proposing a new attack. We truly hope the reviewer can appreciate our other contributions as well (Sec. 4, 5 and the analytical derivations complimenting Sec 3) and want to convey the spirit of this paper as one that opens the route to wider inquiry. > For [1], the initialization is I), and as far as I can tell, is for the purpose of analysis to understand how width changes with . In fact, [2,3,4,5] all use the same standard initialization with normalization factor. Therefore, it’s still unclear to me why choosing . We appreciate your concern and thank you for the additional references, which are all important works. As already mentioned in our previous comment, we have decided to consider the same setup as in Arora et. al, ("Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks", ICML'19) which is one of the first (and arguably influential) works in the NTK literature (reference [1] from your list). Please allow us to cite from the arXiv version of that paper (https://arxiv.org/pdf/1901.08584.pdf): page 17, "Experiment Setup", first paragraph: "...Our theory requires a small scaling factor κ during the initialization (cf. (1)). We fix in all experiments...." Note that Arora et al. have the term in their setup (formula above Eq. (1) on page 1) - as we do in ours (Eq. (C.28)). While Du et al. (reference [2]) consider only the case (see footnote 3 on page 5 of [1]) [1] includes also into bounds in and, relevant for us, sets to a small constant in the experiments. As far as we could tell, it is important that smaller results in smaller outputs at initialization. (Perhaps related is a similar parameter in [Arora et al. '19: On Exact Computation with an Infinitely Wide Neural Net] that multiplies the output of the net.) We hope this clarifies our choices, but we are at your disposal for further clarifications, if needed. > I appreciate the authors' thoughts on different experiments. However, since this is more of an experimental paper, merely binary classification with two-layer ReLU network won't make a strong argument or any insightful suggestions for real-world application. Does the author try any transfer-attack experiments for multi-layer neural networks? It’d be more convincing if the idea could go beyond two-layer networks. We are truly surprised at this comment. Our paper contains non-binary (multi-class) experiments for each topic we touch upon (transfer to wide nets, features and their visualization, kernel dynamics). Might the reviewer perhaps have overlooked Appendix C (which is entirely dedicated to transfer in the multi-class case) and reference to it in lines 232-234 (of the revised version of Aug 3) - "We reproduce these plots for MNIST in the Appendix, leading to similar conclusions."? While space constraints didn't allow us present all multi-class experiments in the main body of the paper (they are in App. C and D) we have all of Section 5 completely in the multi-class setting. We hope our responses are satisfactory and might incite you to reconsider your score, since you seem to appreciate the novel connection between the NTK and adversarial robustness. bexg: We again thank the reviewer for his generosity with input to our paper. We are pleased our comments were helpful. While we do not want to extend this discussion unnecessarily, we just wanted to make clear what we meant by "computationally prohibitive" in response to this last point: > It is true, as some other reviewers mention, that some experiments could have been extended further or be a bit more thorough (e.g., evaluate with PGD, rather than FGSM; or performed thorough comparisons for practically-sized finite-width linearised networks), but I do not think these are enough reasons to argue for a rejection. > However, on this last point, I would like to add that, to the best of my knowledge, it is in fact not prohibitive to compare the dynamics of a standard neural network and its linearised counterpart. This has been previously done by (Fort et al. 2020, Baratin et al. 2021, Ortiz-jimenez et al. 2021) without access to much hardware. For the same reason, computing adversarial attacks of the linearised neural networks should not require a prohibitive computational cost if performed using standard gradient-based attack pipelines.

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.