# MIA
### 2022/09/23
---
## Setting
For a black-box MemGuard-defended model $M_D$, the data owner only knows
(1) the model architecture of $M_D$
(2) the training algorithm of the MemGuard model, $M_D = A(S)$
(3) the dataset $S = \{ s_i = (\mathbf{x}_i, \mathbf{y}_i)\}, \mathbf{x}_i \in \mathbb{R}^d, \mathbf{y}_i \in \mathbb{R}^c$.
The data owner wants to know whether a data point $s_i$ was used to train the model $M_D$.
---
## Training Algorithms
1. Train a shadow model $M_T$ on a subset $S_T$ of the full dataset $S$
2. Train an MIA attack model $f_{\theta}: \mathbb{R}^c \to \mathbb{R}$, which predicts whether the data point $\mathbf{x}_i$ is in the training set via $f_{\theta}(M_T(\mathbf{x}_i))$, based on the shadow model $M_T$ and the dataset $S_T \cup S_D$.
$$
\arg\min_{\theta} - \frac{1}{N} \left( \sum_{\mathbf{x}_i \in S_T} \log(f_{\theta}(M_T(\mathbf{x}_i))) + \sum_{\mathbf{x}_i \in S_D} \log(1 - f_{\theta}(M_T(\mathbf{x}_i)))\right)
$$
where $S_D \cap S_T = \emptyset$, $S_T, S_D \subset S$, and $N$ is the number of elements of $S_T \cup S_D$.
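A minimal PyTorch sketch of step 2 under some assumptions: `f` ends in a sigmoid so its output lies in $(0, 1)$, and `conf_in` / `conf_out` hold the shadow-model confidence vectors $M_T(\mathbf{x}_i)$ for members and non-members (all names are illustrative, not from the paper).
```python
import torch
import torch.nn as nn

def train_attack_model(f, conf_in, conf_out, epochs=500, lr=0.01):
    """Fit f_theta on shadow confidences; BCE is exactly the
    negative log-likelihood written above."""
    opt = torch.optim.Adam(f.parameters(), lr=lr)
    x = torch.cat([conf_in, conf_out])              # M_T(x_i)
    y = torch.cat([torch.ones(len(conf_in)),        # 1: x_i in S_T
                   torch.zeros(len(conf_out))])     # 0: x_i in S_D
    bce = nn.BCELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = bce(f(x).squeeze(-1), y)
        loss.backward()
        opt.step()
    return f
```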
---
3. Train the target model $M$ with $S_T$. Then, train the MemGuard noise $E$ from the target model $M$ with the dataset $S_T \cup S_D$. Let $E = \{ \mathbf{e}_i = n(M(\mathbf{x}_i))\}, \mathbf{x}_i \in S_T \cup S_D$. We then obtain the protected confidence vectors $\{M(\mathbf{x}_i) + \mathbf{e}_i\}$
4. Split $S_{T} \cup S_{D}$ into a training set $S_{T}^{train} \cup S_{D}^{train}$ and a test set $S_{T}^{test} \cup S_{D}^{test}$. Perform Algorithm 1, Line 9
$$
\theta^*, \tilde{\theta}^* = \arg\min_{\theta, \tilde{\theta}} \text{ErrorRate}(M + n(M), \mathcal{R}_{\theta, \tilde{\theta}}(f_{\theta}), S_{T}^{train}, S_{D}^{train})
$$
5. Evaluate the smoothed attack model $\mathcal{R}_{\theta^*, \tilde{\theta}^*}(f_{\theta^*})$ on the MemGuard outputs $\{ M(\mathbf{x}_i) + n(M(\mathbf{x}_i)) \}, \mathbf{x}_i \in S_{T}^{test} \cup S_{D}^{test}$
$$
\text{MIA}(\mathcal{R}_{\theta^*, \tilde{\theta}^*}(f_{\theta^*}), M + n(M), S_{T}^{test} \cup S_{D}^{test})
$$
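A hedged sketch of this evaluation step (names are mine): the smoothed attack predicts membership from the protected confidence vectors, and we report plain accuracy.
```python
import torch

@torch.no_grad()
def mia_accuracy(attack, protected_conf, membership):
    """protected_conf: M(x_i) + n(M(x_i)); membership: 1 iff x_i in S_T^test."""
    pred = (attack(protected_conf).squeeze(-1) > 0.5).float()
    return (pred == membership).float().mean().item()
```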
---
## Randomized Smoothness
Given a dataset $\mathcal{S} = \{ \mathbf{x}_i, \mathbf{y}_i \}_{i = 1}^{N}, \mathbf{x}_i \in \mathbb{R}^d, \mathbf{y}_i \in \mathbb{R}$, and a data sample $\mathbf{x}_i$ fed to the attack classifier,
the randomized smoothed attack model can be represented as
$$
\mathcal{R}_{\theta, \tilde{\theta}}(f_{\theta}(\mathbf{x}_i)) = f_{\theta}(\mathbf{x}_i + p_{\tilde{\theta}}(\mathbf{x}_i) \times \mathbf{z}_i)
$$
where $p_{\tilde{\theta}}(\mathbf{x})$ is the multiplier of the Gaussian noise, $\theta$ are the parameters of the model $f$, $\tilde{\theta}$ are the parameters of the model $p$, and $\mathbf{z} \sim \mathcal{N}(0, \mathbf{I}), \mathbf{z} \in \mathbb{R}^d$.
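A minimal sketch of $\mathcal{R}_{\theta, \tilde{\theta}}$ as a PyTorch module, assuming `p(x)` returns a scalar or a tensor broadcastable to `x`:
```python
import torch
import torch.nn as nn

class RandomizedSmoothedAttack(nn.Module):
    """R_{theta, theta~}: evaluate f on a noisy copy of the input."""
    def __init__(self, f, p):
        super().__init__()
        self.f = f  # attack classifier f_theta
        self.p = p  # noise multiplier p_theta~

    def forward(self, x):
        z = torch.randn_like(x)           # z ~ N(0, I), z in R^d
        return self.f(x + self.p(x) * z)  # f(x + p(x) * z)
```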
---
## Find The Optimal Parameter of The Randomized Smoothness
To obtain the optimal randomized smoothed function $\mathcal{R}_{\theta^*, \tilde{\theta}^*}(f_{\theta^*})$, we formulate the optimization problem as
$$
\theta^*, \tilde{\theta}^*
= \arg \min_{\theta, \tilde{\theta}} -\frac{1}{N} \sum_{i=1}^{N} y_i \log(f_{\theta}(\mathbf{x}_i + p_{\tilde{\theta}}(\mathbf{x}_i) \times \mathbf{z}_i)) + (1 - y_i) \log(1 - f_{\theta}(\mathbf{x}_i + p_{\tilde{\theta}}(\mathbf{x}_i) \times \mathbf{z}_i))
$$
where $\mathbf{z}_i \sim \mathcal{N}(0, \mathbf{I}), \mathbf{I} \in \mathbb{R}^{d \times d}$
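A sketch of this joint optimization over $\theta$ and $\tilde{\theta}$, with a single fresh draw of $\mathbf{z}_i$ per step (Experiment 4 replaces this with an MC average). `x` are confidence vectors, `y` the binary membership labels, and `f` is again assumed to end in a sigmoid.
```python
import torch
import torch.nn as nn

def fit_smoothed_attack(f, p, x, y, epochs=500, lr=0.01):
    params = list(f.parameters()) + list(p.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    bce = nn.BCELoss()  # the negative log-likelihood above
    for _ in range(epochs):
        z = torch.randn_like(x)  # fresh z_i each step
        loss = bce(f(x + p(x) * z).squeeze(-1), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return f, p
```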
---
## Variants of Function $p$
I've designed some variants of the function $p$ (sketched below)
- ``None``: no perturbation, $p_{\tilde{\theta}}(\mathbf{x}_i) = 0$
- ``Const Vector``: $p_{\tilde{\theta}}(\mathbf{x}_i) = \mathbf{v}, \mathbf{v} \in \mathbb{R}^d$. The parameter $\mathbf{v} \in \tilde{\theta}$ is trainable.
- ``Const Scaler``: $p_{\tilde{\theta}}(\mathbf{x}_i) = c, c \in \mathbb{R}$. The parameter $c \in \tilde{\theta}$ is trainable.
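The three variants as PyTorch modules ($d$ is the confidence-vector dimension; the initial values of $\mathbf{v}$ and $c$ are my guesses):
```python
import torch
import torch.nn as nn

class NoneP(nn.Module):            # p(x) = 0: no perturbation
    def forward(self, x):
        return torch.zeros_like(x)

class ConstVector(nn.Module):      # p(x) = v, trainable v in R^d
    def __init__(self, d):
        super().__init__()
        self.v = nn.Parameter(torch.ones(d))

    def forward(self, x):
        return self.v

class ConstScaler(nn.Module):      # p(x) = c, trainable scalar c
    def __init__(self):
        super().__init__()
        self.c = nn.Parameter(torch.ones(1))

    def forward(self, x):
        return self.c
```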
---
# Experiment 1:
---
- Optimizer: Adam with LR = 0.01
- Batch size: 256
- Epochs: 500
- Training/testing dataset size: 500 members + 500 non-members each
- For each combination, we re-train the shadow model, the attack classifier, and the trainable parameters of the perturbation $p$ 10 times and report the mean and standard deviation of the test accuracy.
---
## Randomized Smoothed Testing Acc. (Mean/Std)
| P Value | None | Const Vec | Const Scaler |
|:-------:|:-----------:|:-----------:|:------------:|
| 1.0 | 0.597/0.051 | 0.493/0.005 | 0.517/0.057 |
| 0.7 | 0.574/0.027 | 0.495/0.006 | 0.515/0.044 |
| 0.5 | 0.581/0.009 | 0.500/0.013 | 0.524/0.043 |
| 0.3 | 0.611/0.012 | 0.527/0.052 | 0.545/0.064 |
| 0.1 | 0.693/0.011 | 0.550/0.094 | 0.641/0.076 |
| 0 | 0.822/0.009 | 0.635/0.140 | 0.725/0.154 |
---
## Non-Randomized Smoothed Testing Acc. (Mean/Std)
| P Value | None | Const Vec | Const Scaler |
|:-------:|:-----------:|:-----------:|:------------:|
| 1.0 | 0.543/0.066 | 0.540/0.076 | 0.574/0.063 |
| 0.7 | 0.569/0.020 | 0.579/0.048 | 0.561/0.038 |
| 0.5 | 0.586/0.011 | 0.572/0.022 | 0.580/0.006 |
| 0.3 | 0.613/0.008 | 0.618/0.011 | 0.606/0.011 |
| 0.1 | 0.693/0.014 | 0.695/0.024 | 0.699/0.009 |
| 0 | 0.820/0.007 | 0.821/0.007 | 0.822/0.006 |
---
## Observation of Experiment 1:
1. It doesn't seem to work well
2. The test accuracy has a high standard deviation, especially for ``Const Vec`` and ``Const Scaler``
3. The non-randomized variants give better accuracy
---
## Hypothesis:
**For Observation 1.**
1. The complexity of $p_{\tilde{\theta}}$ may not be enough
**For Observation 2.**
2. The optimizer doesn't find good parameters during optimization
3. The randomness of $\mathbf{z}$ makes the loss surface rugged
4. Improper optimizer choice
5. Not yet converged
**For Observation 3.**
6. The randomness of $\mathbf{z}$ makes the loss surface rugged
---
## Experiments
**For hypothesis 1**
Use trainable single-/multi-layer NNs as $p_{\tilde{\theta}}$
**For hypothesis 2**
Manually specify the parameter $c$ for the ``Const Scaler`` perturbation $p_{\tilde{\theta}}(\mathbf{x}) = c$
**For hypotheses 3 and 6**
Take the expectation over the random variables $\mathbf{z}_i$
$$
\theta^*, \tilde{\theta}^*
= \arg \min_{\theta, \tilde{\theta}} -\frac{1}{N} \sum_{i=1}^{N} \mathbb{E}_{\mathbf{z}_i} \left[ y_i \log(f_{\theta}(\mathbf{x}_i + p_{\tilde{\theta}}(\mathbf{x}_i) \times \mathbf{z}_i)) + (1 - y_i) \log(1 - f_{\theta}(\mathbf{x}_i + p_{\tilde{\theta}}(\mathbf{x}_i) \times \mathbf{z}_i)) \right]
$$
I use Monte Carlo (MC) estimation of the expectation with 100 samples (see the sketch below)
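A sketch of the MC estimator (the sample count is the only knob; gradients flow through every draw):
```python
import torch
import torch.nn as nn

def mc_loss(f, p, x, y, n_samples=100):
    """Average the smoothed-attack BCE over independent draws of z."""
    bce = nn.BCELoss()
    losses = []
    for _ in range(n_samples):
        z = torch.randn_like(x)
        losses.append(bce(f(x + p(x) * z).squeeze(-1), y))
    return torch.stack(losses).mean()
```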
---
# Experiment 2
---
## Experiment 2
Use trainable single-/multi-layer NNs as $p_{\tilde{\theta}}$. I've designed some variants of $p_{\tilde{\theta}}$ with higher complexity (sketched below)
- ``Linear``: $p_{\tilde{\theta}}(\mathbf{x}_i) = \mathbf{w} \mathbf{x}_i, \mathbf{w} \in \mathbb{R}^{d \times d}$. The parameter $\mathbf{w} \in \tilde{\theta}$ is trainable.
- ``Non-Linear``: $p_{\tilde{\theta}}(\mathbf{x}_i)$ is a trainable 3-layer fully-connected NN. Each layer has 256 neurons.
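Sketches of these two variants, assuming the non-linear $p$ outputs a vector of dimension $d$ (the slide fixes only the 256-neuron width):
```python
import torch.nn as nn

d = 10  # confidence-vector dimension (an assumption, e.g. 10 classes)

linear_p = nn.Linear(d, d, bias=False)  # p(x) = w x, w in R^{d x d}

nonlinear_p = nn.Sequential(            # 3-layer fully-connected NN,
    nn.Linear(d, 256), nn.ReLU(),       # 256 neurons per hidden layer
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, d),
)
```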
---
## Randomized Smoothed Testing Acc. (Mean/Std)
| P Value | None | Linear | Non-Linear | Const Vec | Const Scaler |
|:-------:|:-----------:|:-----------:|:-----------:|:-----------:|:------------:|
| 1.0 | 0.597/0.051 | 0.535/0.053 | 0.624/0.076 | 0.493/0.005 | 0.517/0.057 |
| 0.7 | 0.574/0.027 | 0.571/0.034 | 0.653/0.009 | 0.495/0.006 | 0.515/0.044 |
| 0.5 | 0.581/0.009 | 0.578/0.021 | 0.640/0.043 | 0.500/0.013 | 0.524/0.043 |
| 0.3 | 0.611/0.012 | 0.615/0.006 | 0.631/0.019 | 0.527/0.052 | 0.545/0.064 |
| 0.1 | 0.693/0.011 | 0.701/0.017 | 0.700/0.020 | 0.550/0.094 | 0.641/0.076 |
| 0 | 0.822/0.009 | 0.826/0.008 | 0.821/0.007 | 0.635/0.140 | 0.725/0.154 |
---
## Non-Randomized Smoothed Testing Acc. (Mean/Std)
| P Value | None | Linear | Non-Linear | Const Vec | Const Scaler |
|:-------:|:-----------:|:-----------:|:-----------:|:-----------:|:------------:|
| 1.0 | 0.543/0.066 | 0.642/0.007 | 0.797/0.007 | 0.540/0.076 | 0.574/0.063 |
| 0.7 | 0.569/0.020 | 0.631/0.013 | 0.795/0.010 | 0.579/0.048 | 0.561/0.038 |
| 0.5 | 0.586/0.011 | 0.616/0.015 | 0.797/0.013 | 0.572/0.022 | 0.580/0.006 |
| 0.3 | 0.613/0.008 | 0.650/0.012 | 0.791/0.013 | 0.618/0.011 | 0.606/0.011 |
| 0.1 | 0.693/0.014 | 0.719/0.011 | 0.773/0.031 | 0.695/0.024 | 0.699/0.009 |
| 0 | 0.820/0.007 | 0.824/0.008 | 0.822/0.005 | 0.821/0.007 | 0.822/0.006 |
---
## Observation of Experiment 2
1. The non-randomized ``Linear`` and ``Non-Linear`` variants appear better than the randomized ones.
2. Even in the non-randomized setting, ``Const Vec`` and ``Const Scaler`` don't work well.
---
# Experiment 3
---
## Experiment 3
Manually specify the parameter $c$ for the ``Const Scaler`` perturbation $p_{\tilde{\theta}}(\mathbf{x}) = c$
I sweep $c$ over a geometric series with initial value 0.8 and ratio 0.8, i.e. $c = 0.8^k$
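Concretely, the grid is (the exponent range is inferred from the tables below, down to $0.8^{30} \approx 0.0012$):
```python
c_values = [0.8 ** k for k in range(1, 31)]  # 0.8, 0.64, 0.512, ..., 0.0012
```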
---
## Randomized Smoothed Testing Acc. (Mean)
##### "Perturb Scale" denotes the fixed constant $c$
| Perturb Scale | P: 1.0 | P: 0.7 | P: 0.5 | P: 0.3 | P: 0.1 | P: 0.0 |
|:-------------:|:------:|:------:|:------:|:------:|:------:|:------:|
| 0.8 | 0.495 | 0.494 | 0.493 | 0.497 | 0.494 | 0.501 |
| 0.64 | 0.496 | 0.491 | 0.507 | 0.501 | 0.502 | 0.496 |
| 0.512 | 0.491 | 0.492 | 0.499 | 0.503 | 0.495 | 0.495 |
| 0.2097 | 0.515 | 0.497 | 0.502 | 0.504 | 0.495 | 0.496 |
| 0.1073 | 0.506 | 0.495 | 0.500 | 0.507 | 0.642 | 0.540 |
---
| Perturb Scale | P: 1.0 | P: 0.7 | P: 0.5 | P: 0.3 | P: 0.1 | P: 0.0 |
|:-------------:|:------:|:------:|:------:|:------:|:------:|:------:|
| 0.0549 | 0.580 | 0.529 | 0.496 | 0.502 | 0.690 | 0.757 |
| 0.0225 | 0.533 | 0.538 | 0.509 | 0.513 | 0.687 | 0.804 |
| 0.0115 | 0.535 | 0.551 | 0.536 | 0.582 | 0.677 | 0.813 |
| 0.0059 | 0.573 | 0.570 | 0.573 | 0.600 | 0.680 | 0.825 |
| 0.0024 | 0.585 | 0.574 | 0.585 | 0.616 | 0.688 | 0.823 |
| 0.0012 | 0.547 | 0.547 | 0.581 | 0.613 | 0.685 | 0.821 |
---
## Observation of Experiment 3
1. The best $c = 0.0024$ yields test accuracy similar to the non-randomized ``Const Scaler``, which means the randomness doesn't improve accuracy.
---
# Experiment 4
---
## Experiment 4
Take the expectation over the random variables $\mathbf{z}_i$
$$
\theta^*, \tilde{\theta}^*
= \arg \min_{\theta, \tilde{\theta}} -\frac{1}{N} \sum_{i=1}^{N} \mathbb{E}_{\mathbf{z}_i} \left[ y_i \log(f_{\theta}(\mathbf{x}_i + p_{\tilde{\theta}}(\mathbf{x}_i) \times \mathbf{z}_i)) + (1 - y_i) \log(1 - f_{\theta}(\mathbf{x}_i + p_{\tilde{\theta}}(\mathbf{x}_i) \times \mathbf{z}_i)) \right]
$$
I use MC estimation of the expectation with 1000 samples and rerun experiments 2 and 3.
---
## Reproduce EXP 2: Randomized Smoothed Testing Acc. (Mean/Std)
| P Value | None | Linear | Non-Linear | Const Vec | Const Scaler |
|:-------:|:-----------:|:-----------:|:-----------:|:-----------:|:------------:|
| 1.0 | 0.545/0.067 | 0.622/0.012 | 0.797/0.006 | 0.493/0.006 | 0.518/0.054 |
| 0.7 | 0.577/0.025 | 0.623/0.006 | 0.797/0.006 | 0.491/0.009 | 0.556/0.037 |
| 0.5 | 0.576/0.012 | 0.621/0.008 | 0.799/0.010 | 0.495/0.017 | 0.584/0.010 |
| 0.3 | 0.615/0.013 | 0.637/0.012 | 0.797/0.005 | 0.511/0.033 | 0.613/0.006 |
| 0.1 | 0.696/0.013 | 0.714/0.005 | 0.742/0.026 | 0.636/0.095 | 0.689/0.014 |
| 0 | 0.820/0.010 | 0.820/0.006 | 0.822/0.006 | 0.684/0.157 | 0.821/0.007 |
---
## Reproduce EXP 3: Randomized Smoothed Testing Acc. (Mean/Std)
##### "Perturb Scale" denotes the fixed constant $c$
| Perturb Scale | P: 1.0 | P: 0.7 | P: 0.5 | P: 0.3 | P: 0.1 | P: 0.0 |
|:-------------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|
| 0.8 | 0.538/0.060 | 0.572/0.040 | 0.586/0.014 | 0.615/0.008 | 0.687/0.019 | 0.825/0.012 |
| 0.64 | 0.555/0.062 | 0.581/0.033 | 0.580/0.006 | 0.613/0.010 | 0.702/0.014 | 0.819/0.006 |
| 0.512 | 0.528/0.052 | 0.568/0.034 | 0.583/0.010 | 0.615/0.009 | 0.694/0.017 | 0.823/0.006 |
| 0.2097 | 0.521/0.056 | 0.581/0.023 | 0.582/0.013 | 0.611/0.008 | 0.692/0.011 | 0.821/0.009 |
| 0.1073 | 0.539/0.060 | 0.576/0.036 | 0.584/0.011 | 0.618/0.007 | 0.696/0.015 | 0.821/0.008 |
---
## Reproduce EXP 3: Randomized Smoothed Testing Acc. (Mean/Std)
##### "Perturb Scale" denotes the fixed constant $c$
| Perturb Scale | P: 1.0 | P: 0.7 | P: 0.5 | P: 0.3 | P: 0.1 | P: 0.0 |
|:-------------:|:------------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|
| 0.0549 | 0.591/0.047 | 0.561/0.046 | 0.578/0.007 | 0.618/0.009 | 0.693/0.016 | 0.819/0.006 |
| 0.0225 | 0.542/0.070 | 0.564/0.038 | 0.586/0.020 | 0.615/0.014 | 0.690/0.018 | 0.824/0.007 |
| 0.0115 | 0.533/0.062 | 0.578/0.034 | 0.578/0.013 | 0.615/0.011 | 0.693/0.013 | 0.821/0.007 |
| 0.0059 | 0.563/0.068 | 0.588/0.013 | 0.584/0.007 | 0.613/0.009 | 0.693/0.010 | 0.823/0.010 |
| 0.0024 | 0.561/0.072 | 0.583/0.030 | 0.587/0.013 | 0.617/0.009 | 0.699/0.014 | 0.820/0.005 |
| 0.0012 | 0.591/0.050 | 0.557/0.045 | 0.582/0.021 | 0.609/0.005 | 0.696/0.015 | 0.819/0.007 |
---
## Observation of Experiment 4
1. MC estimation reduces the variance of the accuracy in most cases.
2. However, although MC improves the accuracy by smoothing the loss surface, the performance is at best competitive with the non-randomized variants.
---
# Experiment 5
---
## Experiment 5
So why does ``Const Scaler`` perform poorly? We propose one hypothesis
1. The decision boundary of the attack classifier isn't smooth enough.
As a result, if we can make the boundary smoother, we may get better performance. Thus, we design two experiments (the augmentation is sketched below)
1. Augment: replicate the dataset three times and add Gaussian noise $\mathcal{N}(0, 0.001)$ to the replicas.
2. One-hot: follow the setting of the original paper [Certified Adversarial Robustness via Randomized Smoothing](https://arxiv.org/abs/1902.02918)
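A sketch of the Augment setting. Two assumptions: the slide's $\mathcal{N}(0, 0.001)$ is read with 0.001 as the variance, and the original samples are kept alongside the noisy replicas.
```python
import torch

def augment(x, y, copies=3, var=0.001):
    """Replicate (x, y) `copies` times with additive Gaussian noise."""
    noisy = [x + var ** 0.5 * torch.randn_like(x) for _ in range(copies)]
    return torch.cat([x] + noisy), torch.cat([y] * (copies + 1))
```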
---
#### Randomized Smoothed
Non-extracted model, MC with 1000 samples
``Const Scaler``, Perturb Scale = 0.0024
| P Value | Augment | One-Hot | Augment + One-Hot |
|:-------:|:-----------:|:-----------:|:-----------------:|
| 1.0 | 0.657/0.009 | 0.506/0.012 | 0.501/0.004 |
| 0.7 | 0.628/0.005 | 0.495/0.010 | 0.501/0.005 |
| 0.5 | 0.609/0.007 | 0.501/0.009 | 0.499/0.004 |
| 0.3 | 0.624/0.004 | 0.508/0.011 | 0.498/0.006 |
| 0.1 | 0.715/0.003 | 0.500/0.006 | 0.498/0.004 |
| 0 | 0.822/0.003 | 0.500/0.013 | 0.497/0.005 |
---
#### Non-Randomized Smoothed
Non-extracted model, ``Const Scaler``
| P Value | Augment | One-Hot | Augment + One-Hot |
|:-------:|:-----------:|:-----------:|:-----------------:|
| 1.0 | 0.662/0.009 | 0.497/0.014 | 0.500/0.005 |
| 0.7 | 0.625/0.005 | 0.493/0.016 | 0.500/0.006 |
| 0.5 | 0.611/0.003 | 0.504/0.012 | 0.500/0.007 |
| 0.3 | 0.620/0.005 | 0.504/0.011 | 0.500/0.005 |
| 0.1 | 0.712/0.004 | 0.510/0.015 | 0.502/0.004 |
| 0 | 0.822/0.003 | 0.499/0.012 | 0.498/0.004 |
---
## Observation of Experiment 5
1. Randomized smoothing works, but the decision boundary (``Const Scaler`` performs poorly without data augmentation) and the loss surface (high-variance accuracy) are rugged, so it needs threefold data augmentation.
2. We may try augmenting more data to probe the performance upper bound.
---
# Experiment 6
---
## Experiment 6
Examine the performance on an extracted MemGuard model. All settings are the same as in Reproduce EXP 2.
Hyperparameters
- Extracted model
- MC with 1000 samples
---
## Randomized Smoothed Testing Acc. (Mean/Std)
| P Value | None | Linear | Non-Linear | Const Vec | Const Scaler |
|:-------:|:-----------:|:-----------:|:-----------:|:-----------:|:------------:|
| 1.0 | 0.496/0.008 | 0.492/0.006 | 0.493/0.011 | 0.489/0.006 | 0.485/0.007 |
| 0.7 | 0.491/0.007 | 0.486/0.008 | 0.490/0.014 | 0.491/0.006 | 0.485/0.009 |
| 0.5 | 0.487/0.014 | 0.491/0.010 | 0.494/0.009 | 0.490/0.005 | 0.486/0.008 |
| 0.3 | 0.497/0.007 | 0.498/0.011 | 0.490/0.012 | 0.497/0.003 | 0.491/0.012 |
| 0.1 | 0.489/0.013 | 0.489/0.006 | 0.494/0.014 | 0.491/0.006 | 0.498/0.008 |
| 0 | 0.488/0.005 | 0.489/0.008 | 0.488/0.011 | 0.491/0.006 | 0.491/0.006 |
<!-- ##### Denote $c$ as perturb scale
| Perturb Scale | P: 1.0 | P: 0.7 | P: 0.5 | P: 0.3 | P: 0.1 | P: 0.0 |
|:-------------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|
| 0.8 | 0.496/0.007 | 0.496/0.014 | 0.503/0.012 | 0.501/0.011 | 0.499/0.010 | 0.506/0.011 |
| 0.64 | 0.500/0.008 | 0.501/0.010 | 0.500/0.007 | 0.500/0.007 | 0.499/0.013 | 0.500/0.010 |
| 0.512 | 0.499/0.009 | 0.501/0.009 | 0.501/0.015 | 0.496/0.017 | 0.500/0.011 | 0.499/0.013 |
| 0.2097 | 0.507/0.008 | 0.497/0.008 | 0.500/0.010 | 0.500/0.006 | 0.497/0.015 | 0.494/0.008 |
| 0.1073 | 0.500/0.014 | 0.499/0.010 | 0.494/0.011 | 0.500/0.010 | 0.500/0.011 | 0.502/0.011 | -->
---
## Non-Randomized Smoothed Testing Acc. (Mean/Std)
| P Value | None | Linear | Non-Linear | Const Vec | Const Scaler |
|:-------:|:-----------:|:-----------:|:-----------:|:-----------:|:------------:|
| 1.0 | 0.493/0.013 | 0.495/0.015 | 0.485/0.006 | 0.486/0.007 | |
| 0.7 | 0.491/0.008 | 0.488/0.004 | 0.497/0.017 | | |
| 0.5 | 0.490/0.008 | 0.487/0.008 | 0.494/0.014 | | |
| 0.3 | 0.489/0.010 | 0.496/0.011 | 0.488/0.005 | | |
| 0.1 | 0.491/0.009 | 0.497/0.010 | 0.502/0.008 | | |
| 0 | 0.484/0.006 | 0.496/0.011 | 0.495/0.012 | | |
The empty fields are still in progress.
<!--##### Denote $c$ as perturb scale
| Perturb Scale | P: 1.0 | P: 0.7 | P: 0.5 | P: 0.3 | P: 0.1 | P: 0.0 |
|:-------------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|:-----------:|
| 0.0549 | 0.492/0.011 | 0.497/0.011 | 0.505/0.007 | 0.499/0.012 | 0.498/0.011 | 0.493/0.008 |
| 0.0225 | 0.499/0.010 | 0.498/0.010 | 0.497/0.006 | 0.496/0.011 | 0.502/0.013 | 0.499/0.013 |
| 0.0115 | 0.502/0.011 | 0.497/0.015 | 0.498/0.011 | 0.501/0.009 | 0.496/0.009 | 0.497/0.007 |
| 0.0059 | 0.501/0.014 | 0.501/0.014 | 0.500/0.010 | 0.505/0.011 | 0.498/0.011 | 0.501/0.005 |
| 0.0024 | 0.491/0.013 | 0.500/0.012 | 0.505/0.010 | 0.494/0.008 | 0.497/0.014 | 0.500/0.016 |
| 0.0012 | 0.498/0.009 | 0.500/0.015 | 0.497/0.008 | 0.501/0.012 | 0.499/0.012 | 0.501/0.010 | -->
---
# Conclusion
1. Randomized smoothing works, but the decision boundary (``Const Scaler`` performs poorly without data augmentation) and the loss surface (high-variance accuracy) are rugged, so threefold data augmentation is needed. We may try augmenting more data to probe the performance upper bound.
2. However, a deterministic NN-based perturbation may perform better than randomized smoothing; this requires more experiments.
---
# Appendix
---
## Experiment 3:
Given a fixed perturbation scale $c$
| Perturb Scale | P: 1.0 | P: 0.7 | P: 0.5 | P: 0.3 | P: 0.1 | P: 0.0 |
|:------------:|:------:|:------:|:------:|:------:|:------:|:------:|
| 0.8 | 0.495 | 0.494 | 0.493 | 0.497 | 0.494 | 0.501 |
| 0.64 | 0.496 | 0.491 | 0.507 | 0.501 | 0.502 | 0.496 |
| 0.512 | 0.491 | 0.492 | 0.499 | 0.503 | 0.495 | 0.495 |
| 0.4096 | | | | | | |
| 0.3276 | | | | | | |
| 0.2621 | | | | | | |
| 0.2097 | 0.515 | 0.497 | 0.502 | 0.504 | 0.495 | 0.496 |
| 0.1677 | | | | | | |
| 0.1342 | | | | | | |
| 0.1073 | 0.506 | 0.495 | 0.500 | 0.507 | 0.642 | 0.540 |
| 0.0858 | | | | | | |
| 0.0687 | | | | | | |
| 0.0549 | 0.580 | 0.529 | 0.496 | 0.502 | 0.690 | 0.757 |
| 0.0439 | | | | | | |
| 0.0351 | | | | | | |
| 0.0281 | | | | | | |
| 0.0225 | 0.533 | 0.538 | 0.509 | 0.513 | 0.687 | 0.804 |
| 0.0180 | | | | | | |
| 0.0144 | | | | | | |
| 0.0115 | 0.535 | 0.551 | 0.536 | 0.582 | 0.677 | 0.813 |
| 0.0092 | 0.601 | 0.574 | 0.564 | 0.596 | 0.675 | 0.818 |
| 0.0073 | | | | | | |
| 0.0059 | 0.573 | 0.570 | 0.573 | 0.600 | 0.680 | 0.825 |
| 0.0047 | | | | | | |
| 0.0037 | | | | | | |
| 0.0030 | | | | | | |
| 0.0024 | 0.585 | 0.574 | 0.585 | 0.616 | 0.688 | 0.823 |
| 0.0019 | | | | | | |
| 0.0015 | | | | | | |
| 0.0012 | 0.547 | 0.547 | 0.581 | 0.613 | 0.685 | 0.821 |