## Discussion
**Thank you for carefully reviewing and confirming the errors reported in our submission and for drafting a summary table!** We have amended the table below. It will undoubtedly make a valuable addition to the manuscript!
A few comments on the changes below:
1. [9] is a refined journal version of [10], and both are based on the same code. The completed table is below.
2. We realize that the section in the appendix about [75] is not clear. The problem in the code of [75] is that the attack does not use (an appropriate estimate of) the posterior predictive mean. That is, the code of **[75] attacks a single logit sample (i.e., $f(x ; \theta^i)$) instead of a mean of multiple softmax predictions (i.e., $\frac{1}{S} \sum\_{i=1}^S p(y | x ; \theta^i)$, as described above)**. Since BNNs are stochastic classifiers, they use multiple Monte Carlo samples at evaluation (typically at least $S=10$ and often more, e.g., $S=100$). If the number of Monte Carlo samples used in the attack is too small, the attack does not target the predictive function actually used at evaluation (which uses multiple Monte Carlo samples). We discuss how to correctly attack a stochastic classifier in lines 97--100 of the manuscript. We show how to fix the code of [75] to use an appropriate estimate of the posterior predictive mean below [\*]. Using an appropriate attack (switching to a multi-sample posterior predictive mean estimate and replacing `torch.nn.CrossEntropyLoss` with `torch.nn.NLLLoss`) breaks the method proposed in [75] and reproduces the "re-evaluated" results presented in Table 5 in the appendix of the manuscript.
Work | Model prediction | Softmax applied in the attack
----- | ----- | ------- |
[9,10] | Line 134 and line 254 in `model_bnn.py` | Line 76 and line 99 in `adversarialAttacks.py`
[60] | Line 66 in `cats_and_dogs.py` (softmax output) | In the Cleverhans attack code: line 61 in `Cleverhans v2.0/utils_tf.py`, line 142 in `Cleverhans v3.1/cleverhans/attack/fast_gradient_method.py`, and line 47 in `Cleverhans v4.0/cleverhans/tf2/attacks/fast_gradient_method.py`
[75] | Line 34 and line 47 in `SEBR_evaluating.py` (single-sample logit instead of an average of softmax predictions) | Line 37 and line 50 in `SEBR_evaluating.py`
[\*] To attack an appropriate estimate of the posterior predictive mean in `SEBR_evaluating.py`, the following changes are made to the code (see annotations):
```
import torch
import torch.nn as nn  # imports needed to run this snippet standalone


def fgsm(model, X, y, norm, epsilon):
    delta = torch.zeros_like(X, requires_grad=True)
    X2 = norm(X + delta).cuda()
    ### [original code] single-sample logit prediction:
    ### outputs = model(X2)
    outputs, _, _, _, _ = model.sample_predict(X2, 20)  # [fixed code] 20 logit MC samples
    outputs = torch.nn.functional.softmax(outputs, dim=-1)  # [fixed code] apply softmax to each logit MC sample
    outputs = outputs.mean(dim=0)  # [fixed code] Monte Carlo estimate of the posterior predictive mean
    if type(outputs) == type(()):
        outputs = outputs[0]
    ### [original code] cross-entropy loss computed on the single logit prediction:
    ### loss = nn.CrossEntropyLoss()(outputs, y.cuda())
    loss = nn.NLLLoss()(torch.log(outputs + 1e-10), y.cuda())  # [fixed code] cross-entropy loss computed on the mean of the softmax predictions
    loss.backward()
    return epsilon * delta.grad.sign()


def pgd(model, X, y, norm, epsilon, alpha, num_iter):
    delta = torch.zeros_like(X, requires_grad=True)
    delta.data.uniform_(-epsilon, epsilon)
    for t in range(num_iter):
        X2 = norm(X + delta).cuda()
        ### [original code] single-sample logit prediction:
        ### outputs = model(X2)
        outputs, _, _, _, _ = model.sample_predict(X2, 20)  # [fixed code] 20 logit MC samples
        outputs = torch.nn.functional.softmax(outputs, dim=-1)  # [fixed code] apply softmax to each logit MC sample
        outputs = outputs.mean(dim=0)  # [fixed code] Monte Carlo estimate of the posterior predictive mean
        if type(outputs) == type(()):
            outputs = outputs[0]
        ### [original code] cross-entropy loss computed on the single logit prediction:
        ### loss = nn.CrossEntropyLoss()(outputs, y.cuda())
        loss = nn.NLLLoss()(torch.log(outputs + 1e-10), y.cuda())  # [fixed code] cross-entropy loss computed on the mean of the softmax predictions
        loss.backward()
        delta.data = (delta + alpha * delta.grad.data.sign()).clamp(-epsilon, epsilon)
        delta.grad.zero_()
    return delta.detach()
```
We also adjusted the alpha (step size) parameter in PGD, which was set to 100,000 (`1e5` in line 66 of `SEBR_evaluating.py`); this value is non-standard and makes the PGD attack very weak. We changed it to `alpha = epsilon * 0.2` to ensure alpha lies in a more commonly used range (between 0 and 1). We did not tune this attack parameter and expect that PGD can be made even more effective by tuning it, as is commonly done in the adversarial attacks literature.
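For reference, this change can be written in the same annotation style as above (a sketch only; the exact form of line 66 in `SEBR_evaluating.py` may differ, e.g., the value may be passed as a function argument rather than assigned to a variable named `alpha`):
```
### [original code] non-standard, very large PGD step size:
### alpha = 1e5
alpha = epsilon * 0.2  # [fixed code] step size set to a fraction of the perturbation budget
```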
We have added the details above to the appendix of the manuscript. Thank you for your help in making these points clearer!
---
## General response
We thank all reviewers for their thoughtful and constructive comments.
### Claims about BNNs' inherent adversarial robustness in the literature
We want to use the general response to further clarify and highlight claims made in the literature (including in peer-reviewed papers published at reputable venues) about the inherent robustness of BNNs to adversarial attacks.
When conducting our literature search, **we were surprised by the abundance of statements in the literature that BNNs are a priori robust to adversarial attacks** and that this belief has then been **propagated to related works sections of various follow-up works**.
Statements about the robustness of BNNs in earlier works (published after peer review at reputable venues) include:
* > "Bayesian neural networks have been found to have adversarial robustness naturally" [Zhang et al. 2021]
* > "the Bayesian neural network shows significant robustness in the experiments in terms of classification accuracy" [Pang et al. 2021]
* > "Experimental results..., representing the finite data regime, ... showing that BNNs can display both high accuracy and robustness to gradient based adversarial attacks" [Carbone et al. 2020].
These beliefs can be shown to have been further propagated to more recent works:
* > "BNNs have been shown to possess many favorable robustness properties against adversarial attacks" [Yuan et al. 2021]
* > "using 2 BNNs trained on MNIST and attacked with $\ell\_\infty$-FGSM and $\ell\_\infty$-PGD attacks, BNNs are found to be robust to these attacks" [Uchendu et al. 2021]
* > "Bayesian models are suggested to be more robust" [Li et al. 2021] "networks trained with stochastic gradient descent are known to be less robust to adversarial perturbations than networks trained with Bayesian inference" [Palma et al. 2021]
Additionally, beliefs about the usefulness of Bayesian pipelines for adversarial detection have also been propagated:
* > "Bayesian neural networks (BNNs) have shown promise in adversarial detection" [Deng et al. 2021]
* > "Recent advances in Bayesian neural networks have demonstrated that uncertainty estimates can be used to detect adversarial attacks" [Li et al. 2019]
* > "uncertainty estimation can also be used to detect anomalous samples" [Zhang et al. 2022]
Given this ample evidence, of which the above is only a small sample, we feel a strong need to provide a definitive study that carefully investigates the benefits (or lack thereof) of BNNs for adversarial robustness and detection. We believe that this will lead to a more nuanced understanding of the robustness of BNNs and hope it will ultimately contribute to the advancement of the field.
### Requested comparison to deterministic neural networks
Several reviewers requested a direct comparison to deterministic neural networks. The lack of robustness to adversarial attacks in deterministic neural networks has been widely reported in prior works, and while the focus of the paper is to investigate whether BNNs are inherently adversarially robust (as claimed in several published works), we agree that including deterministic neural network results from our implementation may help put our findings into context.
Following your suggestions, we have incorporated an additional empirical evaluation of deterministic neural networks (NNs) using the same architecture, code base, and adversarial evaluation methods as for the BNNs. Specifically, we trained deterministic CNNs with SGD using a learning rate of 0.05 and deterministic ResNets with a learning rate of 0.005. The number of epochs was aligned with that used for the BNNs. The deterministic NNs achieved clean accuracy comparable (with differences within the margin of error) to that of the BNNs, ensuring a fair and balanced comparison. For the robustness evaluation, we used the PGD40 attack with the same adversarial budget $\epsilon$ as for the BNNs.
The robust accuracy (in %, standard deviation in parentheses) for deterministic NNs is shown below:
Model \ Dataset | MNIST($\epsilon=0.3$) | FMNIST($\epsilon=0.1$) | CIFAR10($\epsilon=8/255$) |
----- | ----- | ------- | -------- |
CNN | 1.10(0.31) | 8.02(0.23) | - |
ResNet | 0.39(0.03) | 4.64(0.03) | 4.74(0.10) |
We further conducted a comparative analysis of these results against the BNNs in two distinct settings: MNIST with a CNN and CIFAR10 with a ResNet. The robust accuracy (in %, standard deviation in parentheses) for the BNNs and the deterministic baseline is shown below:
Setting \ Method | Deterministic | HMC | MCD | PSVI | FSVI |
----- | ----- | ------- | -------- | -------- | -------- |
MNIST+CNN, PGD | 1.10(0.31) | 0.57(0.05) | 0.52(0.03) | 0.64(0.02) | 0.60(0.06) |
CIFAR10+ResNet, PGD | 4.74(0.10) | - | 4.23(0.17) | 5.25(2.27) | 5.18(0.27) |
Intriguingly, our findings do not uncover a significant advantage of BNNs over deterministic NNs with regard to robust accuracy (test accuracy under PGD attack). In the context of MNIST with a CNN, deterministic NNs performed slightly better than the BNNs, while methods such as PSVI and FSVI achieved higher robust accuracy for CIFAR-10 with a ResNet. Notably, in both scenarios, the observed gap was smaller than three times the standard deviation. We believe these results, along with the other evidence presented in the paper, coalesce into a compelling argument that BNNs are not inherently robust and may even be less robust than deterministic NNs.
We would be happy to conduct further follow-up experiments if you have further questions. If you wish to inspect our code, we included links to colab notebooks at the top of the appendix submitted in the supplementary material.
---
**Thank you for reviewing our work!**
# Reviewer 1: eV94
Thank you for your thoughtful and constructive questions and suggestions!
We address your questions and comments below. Please let us know if you have any remaining questions.
---
> I do not think in the literature there is the widespread claim that BNNs are inherently robust to adversarial perturbations.
When conducting our literature search, we were surprised by the abundance of statements in the literature that BNNs are a priori robust to adversarial attacks and that this belief has then been propagated to related works sections of various follow-up works. Please see the General Response for details.
---
> I think in its current form the paper is missing an opportunity. [...] only fixed epsilon are used and no comparison with deterministic NNs is performed
Thank you for the suggestion! While the focus of the paper is to investigate whether BNNs are inherently adversarially robust (as claimed in several published works), **we agree that including deterministic neural network results may help put our findings into context and performed a comparison between BNNs and deterministic NNs**. The results are presented in the tables in the general response.
**Interestingly, our findings do not uncover a significant advantage of BNNs over deterministic NNs with regard to robust accuracy (test accuracy under PGD attack).** In the context of MNIST with a CNN, deterministic NNs performed slightly better than the BNNs, while methods such as PSVI and FSVI achieved higher robust accuracy for CIFAR-10 with a ResNet. In both scenarios, the observed gap was smaller than three times the standard deviation. We believe these results and the other evidence presented in the paper coalesce into a compelling argument that BNNs are not inherently robust and may even be less robust than deterministic NNs.
We would be happy to conduct further follow-up experiments if you have further questions. If you wish to inspect our code, we included links to colab notebooks at the top of the appendix submitted in the supplementary material.
---
> Also, already papers exist that try to combine adversarial learning with Bayesian inference [70,75] and contrasting empirical results have already being published [7]
We agree that [70] and [75] represent steps in the right direction. However, adversarial training for BNNs is not the focus of our analysis. As explained above, our analysis is aimed at providing a definitive assessment of the numerous claims in the literature that BNNs are inherently robust.
While [7] does provide some negative results, they are limited to small network architectures that provide little insight into the behavior of BNNs constructed from larger architectures that are of practical interest in, for example, image classification problems. Our analysis is systematic, thorough, and includes practically relevant evaluation settings and neural network architectures. It is consistent with and corroborates the claims in [7], and we hope that our study will serve as a definitive statement in this debate.
---
>[75] is not the only paper that consider robust training of BNNs, see e.g. [70] that the authors already cite, so it is unclear to me why the authors only compare with [75] and not also with [70]
The main focus of the paper is on BNNs that have not been explicitly trained to be adversarially robust, unlike [70]. Moreover, the method of [70] shows gains in certifiable adversarial robustness, but in a different regime (much smaller perturbations) than the one that is the focus of our study.
---
> the theoretical results [5,9] focus on limiting conditions and only guarantee vanishing zero gradients on the data manifold, but they give no guarantees on what happens if you sample outside the data manifold like in the PGD case and/or in the finite width case, and they seem to be quite clear on the fact that additional experimental investigations are needed.
We do not take issue with the validity of the theoretical results but note that the claims about the empirical robustness of BNNs suggest that BNNs are robust (or at least more robust than they are shown to be in our study). In addition, our experiments with their code show that their optimistic empirical results are due to an error in their adversarial attack. To emphasize this point again: we do not criticize the theoretical contributions but
1. point out that the empirical adversarial robustness claimed in the paper does not reproduce and that
2. this is due to an error in the authors' attack.
---
> For simpler tasks I believe the authors should also compare with Gaussian processes
Thank you for the suggestion. GPs have already been the focus of independent prior studies (e.g. [32] in the paper) that agree with the central conclusions of our work. We also note that GPs are a fundamentally different model class since no feature learning occurs in (non-parametric) GPs, such as the "infinite-width" BNNs you referenced.
---
> In line 248 the authors mention that before of attacking the BNN for the particular experiments they rescale the logits. This modifies the attack you are using, so can you elaborate why you think this modification makes the comparison fair with the original results?
In order for a defense to claim success, it must be able to withstand *any* adversarial attack. The fact that a simple rescaling of the logits allows a vanilla attack to succeed means that, unfortunately, the proposed defense was not a real defense. When adversarially attacking a proposed defense, we can use *any* attack as long as the perturbation is within the prescribed $\epsilon$-ball around the data point. This is the concept of adaptive attacks, which is now standard in the evaluation of adversarial robustness; [1] is an excellent reference on the topic.
[1] On Evaluating Adversarial Robustness, Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian J. Goodfellow, Aleksander Madry, Alexey Kurakin (2019)
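To illustrate the kind of logit rescaling referred to in line 248, here is a minimal sketch (not our exact code; the model interface and the temperature value are assumptions made only for illustration). Dividing saturated logits by a temperature $T > 1$ softens the softmax so that gradients do not vanish, while the resulting perturbation still lies in the same $\epsilon$-ball and hence remains a valid attack:
```
import torch
import torch.nn as nn

def fgsm_rescaled(model, X, y, epsilon, T=10.0):
    # model is assumed to return unnormalized logits
    delta = torch.zeros_like(X, requires_grad=True)
    logits = model(X + delta)
    loss = nn.CrossEntropyLoss()(logits / T, y)  # attack loss on the rescaled logits
    loss.backward()
    return epsilon * delta.grad.sign()
```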
---
Please let us know if you have any further questions!
# Reviewer 2: Jcvq
Thank you for your thoughtful and constructive questions and suggestions!
We are pleased that you found our submission "well-motivated" and "of importance" and that our findings are "valuable". We hope that our response below addresses all of your questions and concerns.
We address your questions and comments below. Please let us know if you have any remaining questions.
---
> While the findings are valuable, it would have been more appropriate if the authors had provided some guidelines on improving BNN robustness, or assessed transfer attacks from single deterministic models to BNNs, etc. As in current format, provided that the authors make their benchmarking tool public, this work is more suited for the benchmark and datasets track at NeurIPS. However, I'd like to defer my judgement on this issue until after the discussion phase
Our comprehensive evaluation of past work, together with the design of our own experiments on novel tasks (semantic shift detection), is not focused on a particular dataset or methodology, and **applying adversarial attacks correctly using existing Python libraries only requires minor modifications to avoid the errors made in previous works**. Our paper systematically studies representative approximate inference methods for BNNs, uncovering failure modes similar to those of their deterministic counterparts, and does not propose a benchmarking testbed: it points out which errors were made in prior works and shows that, after fixing these errors, BNNs exhibit little to no evidence of inherent robustness to adversarial attacks.
---
>Have the authors considered evaluating their attacks against more advance stochastic gradient MCMC methods (such as cSGHMC, cSGLD, etc.) or deep ensembles?
Rudner et al. (2022) showed that FSVI outperforms deep ensembles in terms of uncertainty quantification. SGHMC (stochastic gradient HMC) is an approximation to HMC, and HMC typically outperforms SGHMC.
Given that both state-of-the-art BNN approximate inference methods and deterministic NNs fail to be inherently adversarially robust, it is unclear why the methods you suggested would provide increased robustness. We are working on adding an SGHMC baseline, but it would be highly surprising if the qualitative results were meaningfully different from those presented in the manuscript.
---
> The double softmax problem seems to be widely reported in this work. It's important that this problem is described in the main body along with the annotated code of the original implementation showcasing the errors.
We agree that it may appear odd to focus on a mundane error like this. The reason we do so is that we found this error to be very common and that it reflects a misunderstanding about how to compute a Bayesian posterior predictive distribution in Bayesian neural networks. We believe this emphasis helps readers appreciate the errors made in prior work (and, in some cases, understand why prior work obtained more optimistic results), but we would be happy to de-emphasize it if you do not think it is useful to readers.
---
> The authors mention that clipping the feature values to be between 0 and 1 is problematic. I don't see how. It's normal for a machine learning system to have bounds on it's inputs and outputs defined and it's fair to reject examples which do not fit these bounds.
The issue in this specific setting is that the original images lie inside [0, 255], while after the adversarial attack they are clipped incorrectly to lie in [0, 1] (instead of [0,255]).
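A minimal, self-contained sketch of the difference (tensor shape and perturbation budget chosen only for illustration):
```
import torch

x = torch.randint(0, 256, (1, 3, 32, 32)).float()  # raw image with pixel values in [0, 255]
delta = torch.empty_like(x).uniform_(-8.0, 8.0)     # adversarial perturbation (epsilon = 8)

# incorrect: clipping a [0, 255] image to [0, 1] saturates almost every pixel,
# so the result is no longer a small perturbation of x
x_adv_wrong = torch.clamp(x + delta, 0.0, 1.0)

# correct: clip to the actual value range of the (unnormalized) input images
x_adv_correct = torch.clamp(x + delta, 0.0, 255.0)
```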
---
> If the authors are claiming implementation issues with the past work, then it must be clearly pointed out (i.e. present the annotated code to showcase where the errors are appearing) It's important to double check things like software version, etc. to make sure that things haven't changed since the time the authors of the original studies released their implementation.
Thank you for the suggestion! We will include annotated snippets of code in the revision to highlight our guiding principles for correct evaluation on BNNs.
---
Please let us know if you have any further questions!
# Reviewer 3: Vfd6
Thank you for your thoughtful and constructive questions and suggestions!
We are pleased that you found our empirical evaluation and discussion of related work to be "thorough" and that our submission addresses a "very important" problem. We hope that our response below addresses all of your questions and concerns.
We address your questions and comments below. Please let us know if you have any remaining questions.
---
> In this context, the reported results should further include appropriate entries pertaining to the results using deterministic architectures, or even adversarially trained methods for direct comparison and interpretation of the results.
Thank you for the suggestion! While the focus of the paper is to investigate whether BNNs are inherently adversarially robust (as claimed in several published works), **we agree that including deterministic neural network results may help put our findings into context and performed a comparison between BNNs and deterministic NNs**. The results are presented in the tables in the general response.
**Interestingly, our findings do not uncover a significant advantage of BNNs over deterministic NNs with regard to robust accuracy (test accuracy under PGD attack).** In the context of MNIST with a CNN, deterministic NNs performed slightly better than the BNNs, while methods such as PSVI and FSVI achieved higher robust accuracy for CIFAR-10 with a ResNet. In both scenarios, the observed gap was smaller than three times the standard deviation. We believe these results and the other evidence presented in the paper coalesce into a compelling argument that BNNs are not inherently robust and may even be less robust than deterministic NNs.
We would be happy to conduct further follow-up experiments if you have further questions. If you wish to inspect our code, we included links to colab notebooks at the top of the appendix submitted in the supplementary material.
---
> There are two instances that report: "Deterministic NNs have close to 0% robust accuracy, while we show low single digits, but we have not optimized our attacks for this proof-of-principle analysis" (Table 2 caption) and "Note that for deterministic neural networks, robust accuracy under adversarial attacks approaches 0% while for our attacks on BNNs it is in the low single digits (still below the 10% accuracy for random guessing). Since the goal of this work is to evaluate claims of significant adversarial robustness of BNNs, we have not optimized our attacks to drive accuracy to approach zero but believe this to be possible.". I do not see why the authors need to reiterate this fact.
We will delete these sentences and add the direct comparison between the robustness of BNNs and that of deterministic NNs. Please see our general response for experimental results for this comparison.
---
> Similarly, in line 344 the authors state: "Our empirical analysis has refuted prior accessible evidence that BNNs enjoy some natural inherent robustness to adversarial attacks,", I find this phrasing a bit too strong. The experimental evidence suggest that maybe EMPIRICALLY there is no significant inherent robustness to BNNs as claimed in some works but coming back to the previous point, even single digits difference needs to be further investigated.
We agree that it would be useful to add additional nuance here. We have made appropriate changes in the manuscript.
---
>Are there any other works that deviate from the standard small scale datasets? BNNs trained on larger datasets may exhibit significantly different properties.
Which small-scale datasets are you referring to? Modern BNNs are typically evaluated on large-ish datasets, such as FashionMNIST and CIFAR-10, with ResNet-18 architectures (or larger). That being said, exact Bayesian inference in NNs can be challenging even for small datasets and small NN architectures. We believe that the combination of FashionMNIST (which is surprisingly challenging to classify with an accuracy above 94%) and the small CNNs used in our study (which contain several hundred thousand parameters) gives the right amount of generality. Training a BNN on a subset of these datasets degrades performance, and there is no obvious reason why a decrease in the dataset size would lead to an improvement in adversarial robustness. (In fact, it is more likely to deteriorate adversarial robustness or leave it unchanged than to improve it. We would expect the same for larger datasets.)
---
>There are some works that are missing and could be interesting [1,2]. [1] Constitutes a recent adversarial training for BNNs. [2] introduces a data driven sparse and stochastic activation. They optimize an ELBO so it falls under the BNN umbrella and results with EoT seem good. It is apparent that some kind of gradient masking is naturally arising, but this is up to the adversary to solve and could not in principle be achieved without a bayesian treatment.
Thank you for sharing these references. As we understand it, [1] and [2] are optimized to yield adversarial robustness. In contrast, in our study, **we assess the inherent adversarial robustness of BNNs**, which---as noted above---has been claimed as a feature of BNNs in a number of peer-reviewed papers published at reputable venues. Moreover, unless we are mistaken, [2], while considering ELBO-type optimization, does not directly address the robustness of the BNN inference pipeline (e.g., for selective prediction).
---
Please let us know if you have any further questions!
# Reviewer 4: 5fRB
Thank you for your questions and feedback!
We were pleased that you found our submission to make a "good" contribution and to have an "excellent" presentation. We address the one weakness mentioned in your review below and also answer your questions about the double softmax.
We address your questions and comments below. Please let us know if you have any remaining questions.
---
> The paper aruges that BNNs are not inherently robust in practice. However, there is no experimental comparison between BNNs and deterministic NNs. If BNNs are more robust than deterministic NNs, we can still make the conclusion that BNNs are inherently more robust.
Thank you for the suggestion! While the focus of the paper is to investigate whether BNNs are inherently adversarially robust (as claimed in several published works), **we agree that including deterministic neural network results may help put our findings into context and performed a comparison between BNNs and deterministic NNs**. The results are presented in the tables in the general response.
**Interestingly, our findings do not uncover a significant advantage of BNNs over deterministic NNs with regard to robust accuracy (test accuracy under PGD attack).** In the context of MNIST with a CNN, deterministic NNs performed slightly better than the BNNs, while methods such as PSVI and FSVI achieved higher robust accuracy for CIFAR-10 with a ResNet. In both scenarios, the observed gap was smaller than three times the standard deviation. We believe these results and the other evidence presented in the paper coalesce into a compelling argument that BNNs are not inherently robust and may even be less robust than deterministic NNs.
We would be happy to conduct further follow-up experiments if you have further questions. If you wish to inspect our code, we included links to colab notebooks at the top of the appendix submitted in the supplementary material.
---
> It is argued that many previous papers [9,10,60,75] have the “double-softmax” problem. The paper shows in Line 793, Deterministic models produce pre-loss outputs and the softmax operations are done in losses, while BNNs directly produce probabilities. However, I do not think it is right. For a Bayesian neural networks with multiple layers (without softmax operation in the last layer), there is no guarantee that the direct output from the model is in [0,1] and can be seen as a probability. Could you please show more analysis about the "double-softmax" problem in these papers?
BNN predictions are performed using a Monte Carlo estimate of the posterior predictive mean. The posterior predictive mean is computed by making several stochastic forward passes through the neural network to obtain samples $f(x ; \theta^i) \in \mathbb{R}^{Q}$, where $Q$ is the number of output dimensions, computing predictions $p(y | x ; \theta^i)$, which for classification are obtained by computing $\text{softmax}(f(x ; \theta^i))$, and then averaging the different $p(y | x ; \theta^i)$; that is, the posterior predictive mean estimate is given by $\frac{1}{S} \sum\_{i=1}^S p(y | x ; \theta^i)$. This is an average of softmax predictions (with entries in $[0,1]$). Averaging the $f(x ; \theta^i)$ instead would lead to a different value and would be incorrect.
The double-softmax problem arises because adversarial example libraries typically take the unnormalized logits (i.e., $f(x ; \theta^i)$) as input and internally define a cross-entropy loss function, $-\sum\_{k=1}^Q \delta(y_k = 1) \log \text{softmax}(\text{input})\_k$, so that the high-level attack interface only requires passing $f(x ; \theta)$ as $\text{input}$. The double-softmax problem occurs when the posterior predictive mean estimate of a BNN, i.e., $\frac{1}{S} \sum\_{i=1}^S p(y | x ; \theta^i)$ (which, as stated above, is already an average of softmax predictions), is passed as $\text{input}$, which leads to a softmax function being applied to an average of softmax predictions.
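To make this concrete, here is a small self-contained sketch (with random tensors standing in for the logit samples $f(x ; \theta^i)$) contrasting the incorrect and correct loss computations:
```
import torch
import torch.nn as nn

S, Q = 100, 10                        # number of MC samples, number of classes
y = torch.tensor([3])                 # true label for a single input
logit_samples = torch.randn(S, 1, Q)  # stand-ins for f(x; theta^i), i = 1, ..., S

# Monte Carlo estimate of the posterior predictive mean: average of softmax predictions
probs = torch.softmax(logit_samples, dim=-1).mean(dim=0)  # shape (1, Q), rows sum to 1

# incorrect: CrossEntropyLoss applies log_softmax internally, i.e., a second softmax
loss_double_softmax = nn.CrossEntropyLoss()(probs, y)

# correct: take the log of the predictive probabilities and use NLLLoss
loss_correct = nn.NLLLoss()(torch.log(probs + 1e-10), y)
```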
---
Please let us know if you have any further questions!
---
Nearly all the BNNs we examine implement a softmax function to directly output $p(y|x, \theta)$. These implementations then attack this output with the cross-entropy loss, leading to the double-softmax problem. We would also like to present empirical evidence.
We are sorry that we are not allowed to provide links during the rebuttal period, but the official implementation of [1] can be found on GitHub. In this implementation, the output of the BNN is already post-softmax (see line 134 in `model_bnn.py`). However, when the authors perform the adversarial attack, either FGSM or PGD, they use the cross-entropy loss, which adds another softmax to the prediction (see line 76 and line 99 in `adversarialAttacks.py`).
[1] Ginevra Carbone, Matthew Wicker, Luca Laurenti, Andrea Patane', Luca Bortolussi, and Guido Sanguinetti. Robustness of bayesian neural networks to gradient-based attacks. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 15602–15613. Curran Associates, Inc., 2020.
In response to the reviewer's request, we follow the standard evaluation benchmark and report robust accuracy (in %) across a range of perturbation radii $\epsilon$:
$\epsilon$ | 0/1275 | 2/1275 | 4/1275 | 6/1275 | 8/1275 | 10/1275 | 12/1275 | 14/1275 | 16/1275 | 18/1275 | 20/1275 | 22/1275 | 24/1275 | 26/1275 | 28/1275 | 30/1275 | 32/1275 | 34/1275 | 36/1275 | 38/1275 | 40/1275 |
---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
FSVI | 93.25 | 84.35 | 64.67 | 50.47 | 40.50 | 32.70 | 26.14 | 21.02 | 17.16 | 14.42 | 12.25 | 10.15 | 8.84 | 7.97 | 7.24 | 6.68 | 6.38 | 6.14 | 5.95 | 5.72 | 5.58 |
PSVI | 95.10 | 83.76 | 62.04 | 44.29 | 31.26 | 22.51 | 16.65 | 12.26 | 9.28 | 7.23 | 5.81 | 4.88 | 4.35 | 3.98 | 3.75 | 3.60 | 3.51 | 3.44 | 3.36 | 3.35 | 3.36 |
MCD | 94.06 | 86.46 | 69.14 | 49.53 | 31.67 | 19.57 | 12.92 | 8.79 | 6.86 | 5.60 | 5.22 | 5.02 | 4.90 | 4.82 | 4.86 | 4.84 | 4.83 | 4.85 | 4.86 | 4.87 | 4.85 |
Deterministic | 94.06 | 81.45 | 58.62 | 37.78 | 24.12 | 16.46 | 11.53 | 8.76 | 6.97 | 6.03 | 5.43 | 5.09 | 4.90 | 4.77 | 4.62 | 4.59 | 4.56 | 4.55 | 4.54 | 4.54 | 4.55 |
# To Reviewer 4
In response to the reviewer's request, we evaluate the small CNN on MNIST adhering to the standard adversarial evaluation benchmark. We observe that the performance of the deterministic neural network (NN) lies between those of FSVI, PSVI, and MCDropout. Notably, for the deterministic model, the robust accuracy at smaller radii varies significantly with the choice of hyperparameters; this variation can reach up to 5%. This substantial variation underscores our rationale for assessing robustness only at certain well-established (larger) perturbation levels, as is common practice in the adversarial robustness literature.
$\epsilon$ | 0/200 | 3/200 | 6/200 | 9/200 | 12/200 | 15/200 | 18/200 | 21/200 | 24/200 | 27/200 | 30/200 | 33/200 | 36/200 | 39/200 | 42/200 | 45/200 | 48/200 | 51/200 | 54/200 | 57/200 | 60/200 |
---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- |
FSVI | 99.27(0.01) | 97.47(0.09) | 86.22(1.16) | 47.88(2.54) | 13.31(1.84) | 2.59(0.88) | 0.74(0.12) | 0.60(0.03) | 0.59(0.02) | 0.59(0.02) | 0.59(0.02) | 0.59(0.02) | 0.59(0.02) | 0.58(0.02) | 0.59(0.02) | 0.59(0.02) | 0.59(0.02) | 0.58(0.02) | 0.58(0.02) | 0.58(0.02) | 0.58(0.02) |
PSVI | 99.21(0.02) | 96.65(0.34) | 83.00(1.55) | 50.95(0.72) | 15.60(2.20) | 2.66(0.88) | 0.83(0.18) | 0.72(0.09) | 0.71(0.08) | 0.70(0.08) | 0.70(0.09) | 0.71(0.09) | 0.70(0.09) | 0.70(0.09) | 0.70(0.09) | 0.70(0.08) | 0.70(0.08) | 0.70(0.09) | 0.70(0.09) | 0.70(0.08) | 0.71(0.09) |
MCD | 99.39(0.04) | 98.67(0.12) | 95.48(0.19) | 86.71(0.76) | 69.22(1.97) | 43.67(2.74) | 21.46(1.18) | 8.99(0.15) | 4.06(0.11) | 2.26(0.34) | 1.53(0.36) | 1.14(0.35) | 0.96(0.31) | 0.86(0.25) | 0.79(0.19) | 0.72(0.16) | 0.68(0.13) | 0.64(0.11) | 0.63(0.09) | 0.60(0.06) | 0.59(0.06) |
Deterministic | 99.41(0.03) | 98.44(0.10) | 94.42(0.23) | 83.34(0.44) | 61.77(1.35) | 35.76(1.72) | 18.02(1.39) | 9.52(1.08) | 5.40(1.18) | 3.39(1.10) | 2.33(0.91) | 1.70(0.79) | 1.33(0.65) | 1.12(0.57) | 0.97(0.46) | 0.87(0.40) | 0.80(0.36) | 0.77(0.31) | 0.74(0.28) | 0.70(0.24) | 0.67(0.21) |