# Invariance-aware smoothing NeurIPS 2022 Review

# Original reviews

## Reviewer 1

### Strengths

* Invariant models for computing with point cloud data are becoming more and more prevalent, and understanding issues of robustness with these models and their connection to certified robustness seems timely. This paper seems to initiate this line of work.
* The paper is well-written, with ample details on background material, derivations, and technical points. The appendices are extensive and catalogue conceptual/computational aspects of the experiments.
* It seems to be a useful methodological contribution to show that it is 'enough' in some practical cases to use the simple post-processing approach to certification, even if it is not Neyman-Pearson-exact (although see some caveats below).

### Weaknesses

* The contributions here seem somewhat narrow -- it would be better if theory and experiments relevant to the case of 3D rotations were presented, given that this seems to be most important for the applications used as motivation (point cloud nets, molecular prediction, ...). Most of the experiments are used to motivate the idea that the 'post-processing' invariance approach is comparable to the tight certificates approach for 2D rotations; one would like to see that this continues to hold in 3D.
* It seems strange that the tight certificates approach does not yield superior accuracy in Figure 6.
* The theorems and derivations throughout the paper are presented in a formalism-heavy way that often belies significant underlying conceptual simplicity. I think the clarity of the paper could be improved as a whole by also mentioning some of these simple conceptual intuitions: for example, eventually the post-processing approach that is argued to be sufficient (Theorem 2) is quite intuitive (it might be nice to have an Algorithm environment somewhere that summarizes exactly what approach the authors are ultimately proposing that one takes away from the paper, to have a useful methodological contribution -- with all the competing approaches it is easy to get a bit lost reading through); the proof of Theorem 3 essentially only depends on rotation invariance of the Gaussian distribution; the 'canonical representation' formalism from Definition 3 (is this a standard term?) and its instantiations throughout Section 6 seem to be understandable in terms of whether or not you can quotient in a sufficiently nice sense by the action of the symmetry group; the observation discussed in the paragraph at line 217 can probably be explained using some classical insights from invariant theory.
* It seems that generalizing this to images will be infeasible in general, given that the invariances one is interested in here act on the image plane rather than on the image values, which means that the analogue of Theorem 3 will not transfer (cf. Section 7). More generally, one imagines it would be challenging to extend this approach to any similar 'homogeneous space' setting of the invariant group action.

### Minor Issues

* Line 45: it seems perhaps premature to call a method from 2019 "classic".
* Line 84: spurious period.
* Line 91: should this line 'First, one can use the fact that ...' state … rather than …? The discussion below seems to demand that this is true.
* Line 92: I would recommend stating the restriction to binary classification outside of a footnote, as it is done systematically throughout the paper.
* In Appendix F, it would make sense to reference the general version of the Neyman-Pearson calculation proved in [27] for general values of the probabilities -- I imagine the current omission won't bother a reader who is familiar with randomized smoothing, but I found it necessary to do the calculation myself and then reference what was shown in [27] to convince myself that something wasn't being omitted here.
* Equation (5): why is there a slash through the bottom line on the symbol? I do not understand this result.

### Questions

* I would appreciate some clarifications on the contributions and the relationship to prior work. The contributions claimed at line 40 are quite narrow -- is this also the first work that takes up the systematic study of randomized smoothing combined with invariant classifiers? The section in the related work at line 52 seems to suggest this is the case, but doesn't claim it directly.
* The expressions in Theorem 7 seem a bit too complex to easily parse, even when looking at the full details in the appendix -- one is curious whether some pathological behavior emerges for 'strange' values of the parameters; whether there is a 'typical' setting (random? incoherent?) where it is possible to get more insight into the result's predictions; etc. If the authors can point to any details in the appendices that help interpret these issues and could be integrated into the main body, this could help.

## Reviewer 2

### Strengths

* Robustness certificates are usually used to prove that certain invariances hold for a model. Using a priori knowledge about invariances to increase the scope of provable (derived) robustness is an interesting avenue.
* The paper conducts a solid theoretical analysis and further supports its conclusions using empirical evidence.

### Weaknesses

* The applicability of the work might be somewhat limited, because it is difficult to construct models that are provably robust to certain input transformations, and not just weakly invariant due to training. The certificates developed in the paper only hold when strict invariance can be assumed.
* There are some clarity issues, especially regarding the different types of certificates. There are a number of different certificates (black-box, gray-box, post-processing, pre-processing, tight), and it's difficult to keep track of their meanings and relationships. The different types of certificates should be clearly defined (ideally already in the intro, at least roughly) and delineated. For instance, it's not clear to me what the role of the pre-processing certificates is. Are they more than post-processing certificates proven to be tight?
* A central quantity in the theoretical analysis is the probability (line 92) of predicting the target class under randomized smoothing, which is used to bound classification probabilities. However, in the original randomized smoothing work, this quantity is compared to the probability of the next most likely class, which is absent from the analysis here. As far as I can see the workaround is to assume that this probability exceeds 0.5, though this seems like an overly strict assumption in a multi-class scenario. Wouldn't comparing against the next most likely probability allow for deriving tighter bounds? Or is the analysis limited to binary classification? Footnote 2 seems to suggest something like that, but the definition of g only mentions a general label set. Please clarify this.

### Questions

* In Theorem 6, shouldn't g have the property of being rotation instead of translation invariant?
* I am not an expert in the spatial-data domain, so I am not sure how realistic, for instance, attacks using point translations or rotations are; it might be worth discussing the threat model or other ways in which such transformations might enter the test data in more detail.
* Could you clarify why the set of invariance transformations needs to form a group?

## Reviewer 3

### Strengths

1. The paper is quite complete.
2. The presentation is good.

### Weaknesses

1. The contribution of this paper is a bit incremental. Specifically, Theorems 1, 2, and 5 are obvious to me.
2. The significance of invariance-aware randomized smoothing is limited. The neural network classifiers that satisfy invariance are extremely rare, so the author needs to construct invariant versions of PointNet and DGCNN via some tricks.

### Questions

* Is Eq. (91) in Lemma 14 (appendix) wrong? It might be a typo. I think the left-hand side is the same as the right-hand side.
* Is group averaging (line 222) a common technique for improving classification performance, or just for the invariance property in the field of 2D/3D point cloud classification?
* This paper should include more discussions on the connection to [Li et al., 2021], which is also an important work on transformation-specific smoothing.
  * [Li et al., 2021] TSS: Transformation-specific smoothing for robustness certification, CCS 2021.

### Limitations

* The contribution of this paper is limited since invariant models are rarely studied for some reasons (might be the poor performance or the limited generalization ability?).

# Summaries and Rebuttals

## Reviewer 1

### Strengths

* Invariant models are interesting
* Well-written
* Like idea of having "good enough" simple method

### Weaknesses

* Would like to see gap between post-processing and tight in 3D rotation
* Strange that post-processing better than tight in Figure 6 (but they say in their own summary that some looseness is due to Monte Carlo)
* Too formalism-heavy. Should mention conceptual intuitions
* Post-processing approach quite intuitive
* Would be nice to have some algorithm environment summarize which approach we suggest people use.
* Idea of canonical representation and choice of canonical representation can be explained by whether you can quotient input space in a nice manner
* Fact that worst-case classifier is group-averaging can likely be explained via classical result
* Hard to generalize to signals over space rather than direct coordinate data

### Minor issues

* Not call Cohen "classic"
* Typos ll. 84, 91
* Make binary classification footnote part of main text
* Include Cohen paper's general version of multi-class cert in Appendix F
* Why slash through bottom of the >= symbol?

### Questions

* First work to do invariance + smoothing? (Response: Yes. And going even further, first work to study interplay of invariance and certification at all. Is also mentioned in abstract)
* Strange behavior of rotation cert w.r.t. parameters?

## Rebuttal Reviewer 1

We would like to thank the reviewer for their in-depth review and their constructive feedback. Before addressing the other comments and questions, we would like to respond to your first question concerning the contribution and relationship to prior work.

### Contributions and prior work

Our submission is the first to study the interplay between certifiable robustness and invariance (irrespective of the certification method).
Thus, the contribution even goes beyond being "the first work that takes up the systematic study of randomized smoothing combined with invariant classifiers". Perhaps our statement about certificates for "specific operations with invariances" in Section 2 contributed to the lack of clarity concerning our contribution. We were trying to explain that neural networks commonly use a few specific operations (e.g. batch norm) that happen to have certain invariances and are compatible with existing white-box certification techniques. But their invariances have never been discussed in the certifiable robustness literature, let alone used as a tool for certification. To clarify this, we have updated the list of contributions in Section 1 and the second part of the "orthogonal research directions" paragraph in Section 2.

### Comments

#### [...] it would be better if theory and experiments relevant to the case of 3D rotations were presented [...]

First, let us reiterate that the post-processing-based certificate is compatible with arbitrary Euclidean isometries, including 3D rotations. In Appendix A.2.4 we compare it to the tight certificate for elemental / axis-aligned rotations in 3D. The former is (expectedly) not affected by adversarial rotations and more effective than the latter. We have updated our experimental section to directly reference this section of the appendix.

Regardless, we agree with you that deriving tight certificates for 3D rotations would be an interesting extension. However, we hope that the novelty of our work as a whole and our strong result for translation invariance outweigh the fact that we have not derived this specific result yet.

#### I think the clarity of the paper could be improved as a whole by also mentioning some of these simple conceptual intuitions: [...]

Based on your feedback and that of Reviewer XHgH, we have made the following changes to more easily convey certain concepts:

* More clearly delineated the different types of certificates in Sections 1 and 5.
* Added an intuitive explanation of Theorem 3 (how using zero-mean, isotropic noise preserves rotation invariance).
* Renamed "canonical representation" to "canonical map" (to avoid confusion with group representations).
* Improved the explanation of the tight certification approach (focusing on how it amounts to finding a distribution over orbits/equivalence classes and applying Neyman-Pearson, rather than on technical details).

Concerning your comment about connections between the worst-case classifier and invariant theory: We were sadly unable to find this specific result in the existing literature. However, the fact that the group averaging performed by this specific classifier leads to invariance should be sufficiently intuitive for readers.

#### It seems strange that the tight certificates approach does not yield superior accuracy in Figure 6.

As you correctly mentioned in your summary, "there is extra Monte Carlo complexity/error associated with the tight certificate approach here". We also point this out in l.308-309. After our initial submission, we repeated the same experiment using $10\times$ as many samples and observed the tight Monte Carlo certificate converging towards the post-processing certificate (figshare link). Given the tight confidence bounds obtained by using this large number of samples, we expect the "true" (non-Monte Carlo) certificate to also be very close to the post-processing-based one. For the camera-ready version, we plan to repeat all experiments with a larger number of samples.
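
To illustrate why a larger sample count narrows the Monte Carlo gap, here is a small SciPy sketch of a Clopper-Pearson lower confidence bound on the smoothed class probability (a standard choice in the randomized smoothing literature). This is an illustration only, not code from the submission; the empirical frequency of 0.8, the confidence level and the sample counts are hypothetical values.

```python
# Illustration only (not code from the submission): how a Clopper-Pearson lower
# confidence bound on the smoothed class probability tightens as the number of
# Monte Carlo samples grows. The empirical frequency of 0.8, the confidence
# level and the sample counts are hypothetical values chosen for illustration.
from scipy.stats import beta

alpha = 0.001           # bound holds with probability at least 1 - alpha
empirical_freq = 0.8    # hypothetical fraction of samples predicting y*

for n in (1_000, 10_000, 100_000):
    k = int(empirical_freq * n)              # observed "successes" out of n samples
    lower = beta.ppf(alpha, k, n - k + 1)    # Clopper-Pearson lower bound
    print(f"n = {n:>6}: lower confidence bound on p >= {lower:.4f}")
```

As $n$ grows, the lower bound approaches the empirical frequency, which is why certificates computed from more samples come closer to the exact (non-Monte Carlo) ones.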
#### It seems that generalizing this to images will be infeasible in general [...]

You are right, the current approach is not applicable to symmetry groups acting on images or other signals defined over some homogeneous space (aside from the translation certificate, which could be applied to intensity-invariant models). But, while computer vision is one important application domain, there are various other active fields (particularly ML for science and medicine) that work with spatial data and in which invariant models are much more prevalent (see also Section 1). We believe that, since this is the first foray into bringing together model invariance and certification, we can be excused for focusing on one (important) domain, rather than trying to tackle invariant machine learning as a whole. We do of course expect future work to generalize to other domains (either using randomized smoothing or some other, more suitable paradigm).

### Minor issues

* l.45: We removed the sentence about "classic" randomized smoothing.
* l.84: We removed the spurious period.
* l.91: No. We are interested in determining whether $f(X') = y^*$. To clarify this, we have updated the conditional clause to "if, for perturbed input $X'$, class $y^*$ is more likely than all other classes combined".
* l.92: We have moved the footnote about binary vs. multi-class certificates into the main text. Note that the "binary certificate" is valid for arbitrary models, not just binary classifiers.
* Appendix F: We were not sure what you meant by "Neyman-Pearson calculation proved in [27] for general values of the probabilities". We suppose you meant how Neyman-Pearson can be used to prove both upper and lower bounds and have thus restated both Lemmata in Appendix F, and explicitly written out the black-box multiclass certificate from [27].
* Eq. 5: The slash through the bottom of $\gneq$ was meant to highlight that it is a strict inequality. We have replaced it with a normal "$>$".

### Questions

#### I would appreciate some clarifications on the contributions and the relationship to prior work.

See "Contributions and prior work" above.

#### "The expressions in Theorem 7 seem a bit too complex to easily parse [...] -- one is curious whether some pathological behavior emerges for 'strange' values of the parameters; whether there is a 'typical' setting (random? incoherent?)"

We are sorry, but we are not entirely sure what you are interested in. Assuming you are interested in whether we have any insights into the functional dependence between the certificate's output and its parameters:

1. The certificate guarantees complete robustness for arbitrary adversarial rotations, i.e. $\Delta = R X - X$. We derive which specific parameter values correspond to adversarial rotations in Appendix H.2.
2. When $|\Delta|$ is large relative to $|X|$, then $X'$ cannot be the result of adversarial rotations and we no longer have this nice special case (see also Appendix H.2 and Fig. 9d/e).
3. The certificate appears to be symmetric w.r.t. parameter $\epsilon_4$, i.e. to only depend on its magnitude (see Fig. 2).

Note that our method is not the only randomized smoothing certificate that does not have a closed-form analytical expression that can be easily analyzed. For instance, certificates for discrete data require solving a linear program (see [1] and [2]).

---

We hope that we have addressed all comments to your satisfaction and would be happy to respond to any further questions during the discussion period.

[1] [Tight Certificates of Adversarial Robustness for Randomly Smoothed Classifiers](https://proceedings.neurips.cc/paper/2019/hash/fa2e8c4385712f9a1d24c363a2cbe5b8-Abstract.html); Guang-He Lee, Yang Yuan, Shiyu Chang, Tommi Jaakkola; NeurIPS 2019

[2] [Efficient Robustness Certificates for Discrete Data: Sparsity-Aware Randomized Smoothing for Graphs, Images and More](https://proceedings.mlr.press/v119/bojchevski20a.html); Aleksandar Bojchevski, Johannes Gasteiger, Stephan Günnemann; ICML 2020

## Reviewer 2

### Strengths

* Interesting idea
* Solid analysis and experiments

### Weaknesses

* Difficult to construct models that are actually invariant, not due to train-time augmentation (Response: Invariant and equivariant models are the bread-and-butter of various large, active machine learning subfields)
* Types of certificates confusing (maybe remove pre-processing)
* What is role of pre-processing?
* Why only binary? Or is it multiclass?

### Questions

* Typo in Theorem 6
* How realistic are translations and rotations? (Response: More literature on types of attacks. Important: We are trivially robust to translations/rotations. We are interested in arbitrary perturbations)
* Why do we need a group? (Response: To have inverse elements, e.g. for Theorem 3)

## Rebuttal Reviewer 2

We would like to thank the reviewer for the detailed comments on our submission and are glad to hear that they also find the combination of robustness certification and invariant machine learning interesting. In the following, we will first address your three comments concerning potential weaknesses of our paper, before responding to your additional questions.

### Comments

#### The applicability of the work might be somewhat limited, because it is difficult to construct models that are provably robust to certain input transformations [...]

While the construction of invariant and equivariant models is certainly challenging, it has been extensively and actively studied since the publication of "Group Equivariant Convolutional Networks" in 2016 [1]. At this point, the ways of enforcing these properties are theoretically well-established (e.g. equivariant linear layers are necessarily group convolutions [2]) and there exist methods to build models that are invariant to arbitrary symmetry groups (see, for instance, [3]). More importantly, invariant and equivariant models have become the de-facto standard in multiple large and active fields (outside computer vision for planar images). For instance:

* Message-passing GNNs are the standard for graph classification and invariant to graph isomorphisms.
* Virtually all point cloud classifiers since PointNet are built around permutation invariance.
* Point-cloud models that are equivariant or invariant under spatial transformations (which we study in this submission) have been the SOTA for tasks like molecular property prediction, molecular dynamics or drug discovery for years (see, for instance, this leaderboard for [QM9](https://paperswithcode.com/sota/formation-energy-on-qm9) and the papers for models like [MPNN](https://proceedings.mlr.press/v70/gilmer17a.html), [SchNet](https://arxiv.org/abs/1706.08566), [GemNet](https://arxiv.org/abs/2106.08903), [SpookyNet](https://arxiv.org/abs/2105.00304) or [PaiNN](https://arxiv.org/abs/2102.03150)).

See also Section 1 of our submission. While our method is of course -- by design -- constrained to invariant models, this class of models is interesting, important and popular enough that we would not consider it a weakness.
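
For reference, the two properties can be written compactly (standard definitions, not notation specific to our submission): for a group $T$ acting on the input space,

$$
\text{invariance:}\quad f(t(X)) = f(X)\ \ \forall t \in T,
\qquad
\text{equivariance:}\quad f(t(X)) = t'\bigl(f(X)\bigr)\ \ \forall t \in T,
$$

where $t'$ denotes the corresponding action of $t$ on the output space (for invariance, this action is trivial).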
#### There are some clarity issues [...] The different types of certificates should be clearly defined (ideally already in the intro, at least roughly) and delineated. For instance, it's not clear to me what the role of the pre-processing certificates is?

Post-processing and pre-processing are two views or characterizations of the same certificate. The post-processing certificate takes an original certified region $B$ and applies all transformations from the symmetry group to construct a larger certified region $\tilde{B}$. The pre-processing formulation simply describes how you would determine whether $X' \in \tilde{B}$ for a specific $X'$, i.e. it is an implicit characterization of region $\tilde{B}$. The only reason we discuss the pre-processing perspective separately is that it allows for a more convenient comparison with the tight certificates we derive in subsequent sections (e.g. in Section 6.2, where we show that the tight and post-processing-based certificates for translation invariance are identical). Based on your feedback, we have added a delineation of the different certificate types to Section 1 and further clarified that post-processing and pre-processing are indeed two views on the same certificate in Section 5.
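
In symbols (a compact restatement of the two views, not an equation quoted from the paper), with $T$ denoting the symmetry group:

$$
\tilde{B} = \{\, t(X) \mid X \in B,\ t \in T \,\},
\qquad
X' \in \tilde{B} \;\Longleftrightarrow\; \exists\, t \in T:\ t^{-1}(X') \in B .
$$

The left-hand description is the post-processing view; the membership test on the right is the pre-processing view, and it relies on every transformation having an inverse in $T$ (see also the discussion of the group property below).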
#### Wouldn't comparing $p_{X',y^*}$ against the next most likely probability allow for deriving tighter bounds? Or is the analysis limited to binary classification?

The analysis is not limited to binary classification. Like in the original randomized smoothing paper [4], there are two valid ways of certifying robustness for an arbitrary number of classes:

1. Proving that $p_{X',y^*} > 0.5$, i.e. $y^*$ is more likely than all other classes combined.
2. Proving that $p_{X',y^*} > \max_{y \neq y^*} p_{X',y}$, i.e. $y^*$ is more likely than the next most likely class.

For the sake of exposition we focus on the "binary certificate" (1.) throughout the main text, since the multiclass certificate (2.) can be constructed by combining two "binary certificates" (identical to the original randomized smoothing paper [4]). We discuss how to compute multiclass versions of all certificates in Appendix F. Note that "binary certificate" is not to be confused with a certificate that is limited to binary classifiers.

To address your comment, we have moved the footnote into the main text and included the above clarifications. We have further removed the potentially confusing "binary certificate" name from the main text.

### Questions

#### In Theorem 6, shouldn't g have the property of being rotation instead of translation invariant?

Yes, we have corrected that typo.

#### I am not sure how realistic, for instance, attacks using point translations or rotations are; it might be worth discussing the threat model or other ways in which such transformations might enter the test data in more detail.

Based on your feedback, we have expanded the paragraph discussing adversarial attacks on point cloud data in Section 2. We now:

* highlight that the field has evolved beyond simply adapting methods from the image domain,
* reference work that demonstrates physically realizable attacks (e.g. adversarial examples that can attack models through LiDAR sensors),
* reference work on adversarial attacks via parametric transformations (e.g. translations or rotations) and discuss how it relates to our work.

Note that we are mostly concerned with adversarial attacks that modify points and are *not* translations or rotations. Since our model is invariant, we already know that such attacks cannot influence our model's prediction.

#### Could you clarify why the set of invariance transformations needs to form a group?

We primarily require the group property so that the inverse of each transformation is also an element of the group. This is needed for our derivation of Theorem 3 (see Appendix C.3). The inverses are also necessary to ensure that the post-processing and pre-processing perspectives describe the same certificate. Otherwise, there may be points $X'$ that can be reached by applying transformations from group $T$ to a certified region $B$ (post-processing perspective) but not mapped back into $B$ (pre-processing perspective). Aside from that, most commonly studied invariances (translation, rotation, reflection, permutation, graph isomorphism, ...) correspond to groups, and the group formalism is used in most literature on invariant machine learning.

We hope that we have addressed all comments and questions to your satisfaction. Please let us know if you have any further questions during the discussion period.

[1] [Group Equivariant Convolutional Networks](https://proceedings.mlr.press/v48/cohenc16.html); Taco Cohen, Max Welling; ICML 2016

[2] [On the Generalization of Equivariance and Convolution in Neural Networks to the Action of Compact Groups](https://proceedings.mlr.press/v80/kondor18a.html); Risi Kondor, Shubhendu Trivedi; ICML 2018

[3] [A Program to Build E(N)-Equivariant Steerable CNNs](https://openreview.net/forum?id=WE4qe9xlnQw); Gabriele Cesa, Leon Lang, Maurice Weiler; ICLR 2022

[4] [Certified Adversarial Robustness via Randomized Smoothing](https://proceedings.mlr.press/v97/cohen19c.html); Jeremy M Cohen, Elan Rosenfeld, J. Zico Kolter; ICML 2019

## Summary Reviewer 3

### Strengths

* Paper is complete
* Presentation is good

### Weaknesses

* Incremental (Theorems 1, 2, 5 obvious)
  * Response: 1, 2 are supposed to be obvious, are our nice simplistic baseline
  * 5 is a trivial consequence of 2. Interesting bit is that we prove it to be tight using our approach
* No one uses invariant models. We need to construct invariant versions via some tricks (1. They are bread-and-butter of various subfields. 2. Not "some tricks", but CVPR papers (also cite other PCA-based papers). 3. We opted for these models for comparison with existing work on randomized smoothing for spatial data. And because it is common to use well-known baselines)

### Questions

* Typo Lemma 14
* Is group averaging common for improving performance or just for building invariant models? (Means of achieving invariance. But one that is associated with increased accuracy, cite e.g. Cohen)
* Want more discussion of transformation-specific smoothing (Give extra paragraph distinguishing three types of transformation-specific certificates: 1. Over-approximate reachable set. 2. Parameter as input, 2.1. via randomized smoothing, 2.2. via convex relaxation)

### Limitations

* Invariant models rarely studied

## Rebuttal Reviewer 3

We would like to thank the reviewer for their helpful comments and questions, which have allowed us to further improve upon our submission. Before going through them, let us first address your two main comments concerning the paper's quality:

### Comments

#### The contribution of this paper is a bit incremental. Specifically, Theorems 1, 2, and 5 are obvious to me.

Firstly, we disagree with the assessment that this paper is incremental. No prior work has studied the interplay of invariance and certifiable robustness.
Theorems 1 and 2 are very straightforward, but it is important to consider them within the overall context of our paper. Since no one has studied the certification of invariant models before, we first propose the easily understandable "post-processing-based certificate" (Theorems 1 and 2). This method serves as a baseline against which we can compare the provably tight certificates we derive in Section 6 (which, aside from studying invariance and certification for the first time, are our main contribution). We have updated the introduction to clarify this from the very beginning.

We strongly disagree with Theorem 5 being obvious. The right-hand side of Theorem 5 is obvious because it can be obtained by simply applying Theorem 2 to translation invariance. What makes Theorem 5 interesting is that it is derived using our method for tight certification from Section 6.1 (that is why we have an equality with $\min_{h \in \mathbb{H}_T} \Pr\left[h(Z) = y^*\right]$, where $\mathbb{H}_T$ is the set of all translation-invariant classifiers). The interesting and non-trivial aspect is that this very simple certificate is actually tight -- it is impossible to find a better one.

#### The significance of invariance-aware randomized smoothing is limited. The neural network classifiers that satisfy invariance are extremely rare, so the author needs to construct invariant versions of PointNet and DGCNN via some tricks.

Invariant classifiers are not rare at all (unless one is exclusively interested in a few specific computer vision tasks for planar images -- and ignores the translation invariance of CNNs). Invariant neural networks have been extensively and actively studied since the publication of "Group Equivariant Convolutional Networks" [1] in 2016. More importantly, invariant and equivariant models have become the de-facto standard in multiple large and active fields. For instance:

* Message-passing GNNs are the standard for graph classification and invariant to graph isomorphisms.
* Virtually all point cloud classifiers since PointNet are built around permutation invariance.
* Point-cloud models that are equivariant or invariant under spatial transformations (which we study in this submission) have been the SOTA for tasks like molecular property prediction, molecular dynamics or drug discovery for years (see, for instance, this leaderboard for [QM9](https://paperswithcode.com/sota/formation-energy-on-qm9) and the papers for models like [MPNN](https://proceedings.mlr.press/v70/gilmer17a.html), [SchNet](https://arxiv.org/abs/1706.08566), [GemNet](https://arxiv.org/abs/2106.08903), [SpookyNet](https://arxiv.org/abs/2105.00304) or [PaiNN](https://arxiv.org/abs/2102.03150)).

See also Section 1 of our submission. Furthermore, we would find it questionable to dismiss the work of Li et al. (ICCV 21) [2] and Xiao et al. (ICME 20) [3], on which the models in our experiments are based, as "some tricks". We opted for these specific models because all prior work on (non-invariant) randomized smoothing for point clouds also used PointNet and DGCNN as base classifiers.

Given the ubiquity of invariances in machine learning (and the ongoing interest in robust machine learning), we believe the joint study of both aspects to indeed be significant.

### Questions

#### Is Eq. (91) in Lemma 14 (appendix) wrong? It might be a typo.

Yes, the right-hand side should have $\mathbb{H}$ (the set of all classifiers) instead of $\mathbb{H}_T$ (the set of all invariant classifiers). We have corrected that typo.

#### Is group averaging (line 222) a common technique for improving classification performance, or just for the invariance property in the field of 2D/3D point cloud classification?

Group averaging is a technique for achieving invariance. As discussed above, invariant models are often more accurate than models without such inductive biases.
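
As a schematic illustration of this point (our notation for this response, not an equation from the paper): averaging a base classifier's outputs over a finite group $T$ produces an invariant function,

$$
\bar{f}(X) = \frac{1}{|T|} \sum_{t \in T} f\bigl(t(X)\bigr),
\qquad
\bar{f}\bigl(t'(X)\bigr) = \frac{1}{|T|} \sum_{t \in T} f\bigl(t(t'(X))\bigr) = \bar{f}(X) \quad \text{for all } t' \in T,
$$

because composing with $t'$ only permutes the summands; for continuous groups, the sum is replaced by an integral over the group.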
#### This paper should include more discussions on the connection to [Li et al., 2021]

While we already cited [Li et al., 2021], we have expanded our discussion of transformation-specific certification. We now distinguish between white-box and randomized-smoothing-based methods (including that of Li et al., 2021) and highlight the unique aspects of different randomized-smoothing-based methods.

We hope that we have addressed all of your comments and questions to your satisfaction. We further hope that we have been at least somewhat successful in convincing you that equivariance and invariance do indeed play a significant role in the larger machine learning community. Please let us know if you have any further questions during the discussion period.

[1] [Group Equivariant Convolutional Networks](https://proceedings.mlr.press/v48/cohenc16.html); Taco Cohen, Max Welling; ICML 2016

[2] [A Closer Look at Rotation-Invariant Deep Point Cloud Analysis](https://openaccess.thecvf.com/content/ICCV2021/html/Li_A_Closer_Look_at_Rotation-Invariant_Deep_Point_Cloud_Analysis_ICCV_2021_paper.html); Feiran Li, Kent Fujiwara, Fumio Okura, Yasuyuki Matsushita; ICCV 2021

[3] [Endowing Deep 3D Models with Rotation Invariance Based on Principal Component Analysis](https://ieeexplore.ieee.org/abstract/document/9102947); Zelin Xiao, Hongxin Lin, Renjie Li, Lishuai Geng, Hongyang Chao, Shengyong Ding; ICME 2020

## Changelog

Based on the reviewers' comments we have made further improvements to our submission. Changed or added sentences are highlighted in blue in the updated manuscript. Line numbers below refer to the updated submission.

### Main changes

* Appendix is now included in the main submission PDF
* Section 1 (Introduction)
  * Delineate certificate types in Section 1 (ll. 34-37)
  * Add "first study on interplay of invariance and certifiable robustness" to contributions list (l. 44)
* Section 2 (Related work)
  * Expand discussion of prior work and threat models for spatial data (ll. 65-73)
  * Expand discussion of transformation-specific randomized smoothing (ll. 81-91)
  * Improve clarity of discussion of "operations with invariances" (ll. 93-95)
* Section 3 (Randomized smoothing)
  * Move discussion of binary vs. multiclass certificates into main text (ll. 109-111)
  * Remove ambiguous "binary certificate" term from main text (ll. 109-111)
* Section 5 (Post-processing-based certificates)
  * Stress that pre-processing and post-processing are two alternative characterizations of / views on the same certificate (ll. 146-147)
  * Provide intuitive explanation of isotropic smoothing distributions preserving invariance to isometries (ll. 165-169)
* Section 6 (Tight gray-box certificates)
  * Improve intuitive explanation of tight certification approach (finding distributions over equivalence classes) (ll. 190-195)
* Section 8 (Experimental evaluation)
  * Reference experiment on post-processing-based certificate for 3D rotation invariance from Appendix A.2.4 (ll. 327-328)

### Minor changes

* Removed sentence calling the 2019 randomized smoothing paper "classic"
* Removed spurious period (l. 101)
* Stress that certification requires bounding class probabilities for the perturbed input, not the clean input (l. 108)
* Include Neyman-Pearson upper bound and black-box multiclass certificate in discussion of multi-class certificates (Appendix F)
* Replaced "$\gneq$" with "$>$" in Theorem 6
* Replaced erroneous "translation invariant" with "rotation invariant" in Theorem 6
* Replaced erroneous "$\mathbb{H}_\mathbb{T}$" with "$\mathbb{H}$" in Eq. 91 of Lemma 14
* Replaced "canonical representation" with "canonical map" everywhere
* Removed a few redundant or unnecessary words / clauses to stay within the page limit
