# Response to R4
We thank **Reviewer RJnB** for the thoughtful review and insightful recommendations. Below we respond to the key questions and comments, which we hope addresses the major concerns:
## Clarifying claims regarding zero transferability
> On page 2, the authors claim an ensemble with zero transferability can be learned from disjoint frequency subsets. This claim is a bit of a stretch without an attack bound. An unbounded attack can surely reach 100\% transferability.
Consider an ensemble of two disjoint-frequency models $F_A$ and $F_B$ that operate on frequency subsets $S_A$ and $S_B$. We acknowledge the reviewer’s point that an unbounded attack would always succeed. However, the probability that an attack succeeds against $F_A$ is the same with or without knowledge of $F_B$, which is what we mean by “zero” transferability. This is because $F_B$’s output does not depend on the frequencies in $S_A$, so the gradients of $F_B$’s output with respect to the frequencies in $S_A$ are always zero. We will clarify this in the main paper.
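The zero-gradient argument can be checked numerically. The snippet below is a minimal sketch, not our actual models: `F_B` is a hypothetical toy function of the $S_B$ coefficients of a 1-D orthonormal DCT, and the subsets, sizes, and helper names are assumptions made for illustration only.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th frequency basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

n = 8
D = dct_matrix(n)
rng = np.random.default_rng(0)

# Hypothetical disjoint subsets of the 1-D spectrum.
S_A = np.array([0, 1, 2, 3])
S_B = np.array([4, 5, 6, 7])

# Toy stand-in for F_B: a smooth function of the S_B coefficients only.
w_B = rng.normal(size=S_B.size)

def F_B(x):
    coeffs = D @ x
    return np.tanh(w_B @ coeffs[S_B])

def grad_wrt_coeffs(f, x, eps=1e-6):
    """Central finite-difference gradient of f w.r.t. each DCT coefficient."""
    g = np.zeros(n)
    for k in range(n):
        step = D.T @ (eps * np.eye(n)[k])  # pixel-space step along frequency k
        g[k] = (f(x + step) - f(x - step)) / (2 * eps)
    return g

x = rng.normal(size=n)
g = grad_wrt_coeffs(F_B, x)
print("gradient on S_A:", g[S_A])  # numerically zero
print("gradient on S_B:", g[S_B])  # generally non-zero
```

In this toy setting, an attacker who only has access to $F_B$ obtains no gradient signal on the frequencies that $F_A$ uses.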
## Assumptions on attack budget of disjoint classifiers
>In the Appendix, the proof of bounds uses the assumption that each disjoint classifier has an attack budget strictly smaller than the total budget, that is, $\epsilon_i<\epsilon$ and $\epsilon=\sum_i\epsilon_i$. This assumption does not always hold, especially when the decision boundaries of the classifiers largely overlap. The question boils down to whether one can always assume the decision boundary of a classifier Fi is completely orthogonal to the other classifiers in the ensemble?
To answer the reviewer’s question about the attack budget bounds in the proof, we first point out that $\epsilon_i \leq \epsilon$ (less than or equal, not strictly less) holds trivially. Regarding the sum of the perturbation budgets, we would like to prove the following:
Consider a disjoint frequency ensemble consisting of $N$ models: $F_1, F_2, \ldots, F_N$. Let the models be assigned disjoint subsets of frequencies $S_1, S_2, \ldots, S_N$ respectively, where $S_1 \cup S_2 \cup \ldots \cup S_N = S_0$, $S_0$ is the entire frequency spectrum, and $S_i \cap S_j = \emptyset \; \forall \; i \neq j$. For an attack, let $\delta_f$ be an arbitrary perturbation in the frequency domain under the total budget constraint $\epsilon$, and let $\delta_f^{S_1}, \delta_f^{S_2}, \ldots, \delta_f^{S_N}$ be its components on the different subsets. Note that a perturbation budget $\epsilon$ in the pixel domain translates to the same value in the frequency domain (Parseval's theorem: the $L_2$ norm is invariant under an orthonormal frequency transform), i.e., $\|\delta_f\|_2 \leq \epsilon$. Now, $\|\delta_f^{S_i}\|_2^2$ is the sum of squares of the perturbations at the individual frequencies in $S_i$. Since the pairwise intersections of the subsets are empty and the subsets cover $S_0$, $\sum_{i=1}^N \|\delta_f^{S_i}\|_2^2$ counts the contribution of each frequency exactly once.
Therefore, $\sum_{i=1}^N \|\delta_f^{S_i}\|_2^2 = \|\delta_f\|_2^2 \leq \epsilon^2$. (If the subsets weren’t disjoint, a frequency could be counted more than once and the sum could exceed the total, as pointed out by the reviewer.) Thus, writing $\epsilon_i = \|\delta_f^{S_i}\|_2$, the per-model budgets satisfy $\sum_{i=1}^N \epsilon_i^2 \leq \epsilon^2$, and in particular $\epsilon_i \leq \epsilon$ for every $i$.
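The energy bookkeeping behind this bound can be verified numerically. The following is a minimal sketch under stated assumptions: a 1-D orthonormal DCT-II, a hypothetical random partition into four subsets, and an arbitrary budget `eps`; none of these values come from our experiments.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th frequency basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)
    return m

n = 16
eps = 0.5  # hypothetical total L2 budget
D = dct_matrix(n)
rng = np.random.default_rng(1)

# Pixel-domain perturbation scaled to sit exactly on the budget.
delta = rng.normal(size=n)
delta *= eps / np.linalg.norm(delta)

# Parseval: the orthonormal transform preserves the L2 norm.
delta_f = D @ delta

# Hypothetical random partition of the spectrum into 4 disjoint subsets.
subsets = np.array_split(rng.permutation(n), 4)

# Squared L2 norm of the perturbation restricted to each subset.
energies = [float(np.sum(delta_f[S] ** 2)) for S in subsets]
per_model_budgets = [np.sqrt(e) for e in energies]

print("||delta_f||_2     :", np.linalg.norm(delta_f))
print("sum of energies   :", sum(energies), "vs eps^2 =", eps ** 2)
print("per-model budgets :", per_model_budgets)
```

Each frequency is counted once, so the per-subset energies add up to the total energy, and no single subset can receive more than the total budget.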
We also note that for the case of disjoint ensembles, the decision boundaries of the individual classifiers are always pairwise orthogonal. Consider the models $F_1$ and $F_2$ defined on the frequency subsets $S_1$ and $S_2$ respectively. Since $F_1$ depends only on $S_1$, the normal to the decision boundary of $F_1$ lies in the span of the frequencies in $S_1$, and similarly for $F_2$. Now, we know $S_1 \cap S_2 = \emptyset$, so the two normals have no frequency in common, which implies that the decision boundaries are orthogonal. To illustrate this, consider a simple example in 2D, an ensemble of two models:
$F_A(x,y) = \mathbb{1}[x > \theta]$ and $F_B(x,y) = \mathbb{1}[y > \phi]$, i.e., each model outputs 1 when its coordinate exceeds its threshold and 0 otherwise. The decision boundaries of $F_A$ and $F_B$ are always parallel to the y-axis and x-axis respectively, and hence are always perpendicular.
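The 2D example can be written out directly. This is only an illustrative sketch; the threshold values and the test point are hypothetical.

```python
import numpy as np

theta, phi = 0.3, -0.2  # hypothetical thresholds

def F_A(x, y):
    """Depends on x only; its boundary is the line x = theta."""
    return int(x > theta)

def F_B(x, y):
    """Depends on y only; its boundary is the line y = phi."""
    return int(y > phi)

# Unit normals of the two decision boundaries.
normal_A = np.array([1.0, 0.0])
normal_B = np.array([0.0, 1.0])
print("dot product of normals:", normal_A @ normal_B)

# Moving a point along y alone can flip F_B but never F_A.
x0, y0 = 0.1, 0.0
shifts = np.linspace(-10.0, 10.0, 201)
outputs_A = {F_A(x0, y0 + s) for s in shifts}
outputs_B = {F_B(x0, y0 + s) for s in shifts}
print("F_A outputs along y:", outputs_A)
print("F_B outputs along y:", outputs_B)
```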
## Experimental methodology for attacking a D3 ensemble
>How is a D3 ensemble attacked? Do you assign an attack budget $\epsilon_i$ smaller than the total budget $\epsilon$ to each classifier in the ensemble?
For any of the ensemble settings (D3 or any of the baselines), we computed the final logits of the ensemble as the average of the logits of the constituent models. Therefore, the ensemble can be treated as a single model which outputs the average logits. Then, we assign the total attack budget of $\epsilon$ to the ensemble.
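A minimal sketch of this logit averaging is given below. The member models are hypothetical linear stand-ins and `make_ensemble` is an illustrative helper name, not a function from our code base.

```python
import numpy as np

def make_ensemble(models):
    """Wrap member models into one callable that returns their mean logits."""
    def ensemble(x):
        return np.mean([m(x) for m in models], axis=0)
    return ensemble

rng = np.random.default_rng(0)

# Hypothetical members: 4 linear models mapping 32 features to 10 logits.
weights = [rng.normal(size=(10, 32)) for _ in range(4)]
members = [lambda x, W=W: W @ x for W in weights]

ensemble = make_ensemble(members)
x = rng.normal(size=32)
logits = ensemble(x)
print(logits.shape)
```

An attack then only sees `ensemble` as a single model and spends the full budget $\epsilon$ on it.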
>What if you apply D3 ensembles on the adversarial samples computed from the entire frequency spectrum?
Below we show results for the performance of D3-S(4) on the adversarial examples computed via attacking the model looking at the entire frequency spectrum (i.e., AT). We generate the attacks via 50 steps of APGD-CE.
| Epsilon ($L_2$) | 0.5 | 1 | 5 | 10 | 20 |
|-----------------|-----|-----|-----|-----|-----|
| Accuracy (%) | 100 | 100 | 100 | 100 | 1.4 |

| Epsilon ($L_\infty$) | 0.004 | 0.016 | 0.032 | 0.064 | 0.128 |
|----------------------|-------|-------|-------|-------|-------|
| Accuracy (%) | 100 | 100 | 100 | 100 | 49.1 |
We can observe here that D3-S(4) is resilient to attacks computed from the entire frequency spectrum as well.
> In Section 4.3, Expectation Over Transformation (EoT) is used as the attack strategy. Shouldn't EoT be used in all experiments where an ensemble is employed?
As discussed above, D3 and each of the baseline ensembles can be represented by a single model that outputs the average logits. Moreover, there is no runtime randomness, i.e., the ensembles produce the same output for repeated queries of the same input. Therefore, we do not need to apply EoT when attacking a single D3 ensemble (or any of the baseline ensembles). In Section 4.3, we propose an extension of D3 in which we employ a pool of multiple D3 ensembles (with different random frequency partitionings) and randomly select one ensemble from the pool to generate the output. To handle this runtime randomness, we employ EoT to attack the pool of D3 ensembles.
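The distinction can be sketched as follows. This is an illustrative toy, not our attack implementation: the pool members are hypothetical linear models, the loss is softmax cross-entropy, and `eot_grad` simply averages input gradients over randomly drawn pool members.

```python
import numpy as np

rng = np.random.default_rng(0)
d, c = 6, 3  # hypothetical input dimension and number of classes

# Hypothetical pool: each entry stands in for one deterministic ensemble.
pool = [rng.normal(size=(c, d)) for _ in range(4)]

def loss_grad(W, x, label):
    """Gradient of softmax cross-entropy w.r.t. x for logits W @ x."""
    z = W @ x
    p = np.exp(z - z.max())
    p /= p.sum()
    p[label] -= 1.0  # dL/dz = softmax(z) - onehot(label)
    return W.T @ p

def eot_grad(pool, x, label, n_samples=64):
    """EoT estimate: average the gradient over randomly selected members."""
    idx = rng.integers(len(pool), size=n_samples)
    return np.mean([loss_grad(pool[i], x, label) for i in idx], axis=0)

x = rng.normal(size=d)
g_pool = eot_grad(pool, x, label=0)          # randomized pool: needs averaging
g_single = eot_grad([pool[0]], x, label=0)   # one deterministic ensemble
print(g_pool.shape, np.allclose(g_single, loss_grad(pool[0], x, 0)))
```

With a single deterministic ensemble, the EoT average collapses to the ordinary gradient, which is why EoT is only needed for the randomized pool.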
> Why is the performance of D3-R superior to other ensembles: ADP, GAL, and DV? I assume they all can be trained on random subsets of the frequency components in the DCT feature space.
The other baseline ensembles (ADP, GAL, DV) involve training multiple models, each of which is trained on the entire feature space. They provide robustness by increasing the diversity among the constituent models, using either regularizers or inter-model adversarial training (please refer to Appendix Section A.2.1 for more details). D3, on the other hand, provides diversity by using disjoint features. Given this, training the other baselines on subsets of the frequency components would make them identical to D3. We will clarify this in the paper.
## Disjoint Classifiers
> While the authors did mention the limitation of their approach, the "disjointedness" of each classifier in the ensemble should be clearly defined and verified.
We would like to emphasize here that D3 ensures that the classifiers are disjoint by design: no two models in the ensemble look at the same frequency. Therefore, there are no common features that an adversary can target to attack all the models simultaneously. Please refer to Section 3 in the main paper for more details.
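Disjointness in this sense is straightforward to verify programmatically. The check below is a minimal sketch; `is_disjoint_partition` and the example index sets are hypothetical and only illustrate the property that the subsets are pairwise disjoint and together cover the spectrum.

```python
def is_disjoint_partition(subsets, n_freqs):
    """True if the subsets are pairwise disjoint and cover 0..n_freqs-1."""
    seen = set()
    for subset in subsets:
        current = set(subset)
        if seen & current:  # a frequency is shared by two models
            return False
        seen |= current
    return seen == set(range(n_freqs))

# Hypothetical assignment of 8 frequencies to 4 models.
valid = [[0, 4], [1, 5], [2, 6], [3, 7]]
overlapping = [[0, 4], [4, 5], [2, 6], [3, 7]]

print(is_disjoint_partition(valid, 8))
print(is_disjoint_partition(overlapping, 8))
```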