# References
:::info
- Remember to tag the conference in parentheses after the title (if there is one)
- Link: paste a URL where the BibTeX can be grabbed directly (github/openreview/...); entries with `inproceedings` BibTeX are preferable
- Relevance: pick one of
    - Atk: Attack
    - AT: TRADES (AT) (Memorization)
    - GM: Gradient Masking (Obfuscated Gradients)
    - Q/V: Quantification/Visualization
    - Oth: Others
- Intro: mainly for related work; introduce papers we haven't read (a short tag is enough for papers already covered in the main text)
- Importance: pick one of
    - MT: Main Text
    - RW: Related Work
    - NC: No Cite
:::
#### 1. Theoretically Principled Trade-off between Robustness and Accuracy (ICML'19)
- Link: https://github.com/yaodongyu/TRADES
- Relevance: AT
- Intro: TRADES
- Importance: MT
#### 2. Towards Deep Learning Models Resistant to Adversarial Attacks (ICLR'18)
- Link: https://openreview.net/forum?id=rJzIBfZAb
- Relevance: Atk
- Intro: PGD
- Importance: MT
#### 3. On the Loss Landscape of Adversarial Training: Identifying Challenges and How to Overcome Them (NeurIPS'20)
- Link: https://github.com/liuchen11/AdversaryLossLandscape
- Relevance: Q/V
- Intro: loss landscape of AT
- Importance: RW
#### 4. On the Convergence and Robustness of Adversarial Training (ICML'19)
- Link: https://github.com/YisenWang/dynamic_adv_training
- Relevance: Q/V
- Intro: FOSC
- Importance: MT
#### 5. Gradient Masking of Label Smoothing in Adversarial Robustness (IEEE Access'21)
- Link: https://ieeexplore.ieee.org/document/9311250
- Relevance: GM
- Intro: SGCS
- Importance: MT
#### 6. Annealing Self-Distillation Rectification Improves Adversarial Training (ICLR'24)
- Link: https://openreview.net/forum?id=eT6oLkm1cm
- Relevance: AT
- Intro: ADR
- Importance: NC
#### 7. Logit Pairing Methods Can Fool Gradient-Based Attacks (NeurIPSW'18)
- Link: https://github.com/uds-lsv/evaluating-logit-pairing-methods
- Relevance: GM
- Intro: ALP can fool gradient-based attacks.
- Importance: RW
#### 8. Adversarial Examples Are Not Bugs, They Are Features (NeurIPS'19)
- Link: https://papers.nips.cc/paper_files/paper/2019/hash/e2c420d928d4bf8ce0ff2ec19b371514-Abstract.html
- Relevance: AT
- Intro: Identifying (Non)Robust features, trade-off between Robustness/Accuracy
- Importance: RW
#### 9. Robustness May Be at Odds with Accuracy (ICLR'19)
- Link: https://openreview.net/forum?id=SyxAb30cY7
- Relevance: AT
- Intro: trade-off between Robustness/Accuracy
- Importance: RW
#### 10. Adversarial Examples Are Not Real Features (NeurIPS'23)
- Link: https://openreview.net/forum?id=hSkEcIFi3o
- Relevance: AT
- Intro: Inspecting (Non)Robust features
- Importance: NC
#### 11. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples (ICML'18)
- Link: https://github.com/anishathalye/obfuscated-gradients
- Relevance: GM
- Intro: Obfuscated Gradients
- Importance: MT
#### 12. Explaining and Harnessing Adversarial Examples (ICLR'15)
- Link: https://arxiv.org/abs/1412.6572
- Relevance: Atk
- Intro: FGSM
- Importance: RW
#### 13. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks (ICML'20)
- Link: https://github.com/fra31/auto-attack
- Relevance: Atk
- Intro: Autoattack (APGD)
- Importance: MT
#### 14. Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack (ICML'20)
- Link: https://github.com/fra31/fab-attack
- Relevance: Atk
- Intro: FAB
- Importance: MT
#### 15. Square Attack: a query-efficient black-box adversarial attack via random search (ECCV'20)
- Link: https://github.com/max-andr/square-attack
- Relevance: Atk
- Intro: Square Attack
- Importance: MT
#### 16. Adversarial Logit Pairing
- Link: https://arxiv.org/abs/1803.06373
- Relevance: AT
- Intro: ALP
- Importance: RW
#### 17. Visualizing the Loss Landscape of Neural Nets (NeurIPS'18)
- Link: https://papers.nips.cc/paper_files/paper/2018/hash/a41b3bb3e6b050b6c9067c67f663b915-Abstract.html
- Relevance: Q/V
- Intro: Loss landscape
- Importance: RW
#### 18. Improving Adversarial Robustness Requires Revisiting Misclassified Examples (ICLR'20)
- Link: https://openreview.net/forum?id=rklOg6EFwS
- Relevance: AT
- Intro: MART (AT for Misclassified Examples)
- Importance: RW
#### 19. Towards Evaluating the Robustness of Neural Networks (S&P'17)
- Link: updated
- Intro: CW
#### 20. Label Smoothing and Logit Squeezing: A Replacement for Adversarial Training?
- Link: https://arxiv.org/abs/1910.11585
- Relevance: logit pairing
#### 21. Adversarial Machine Learning at Scale (ICLR'17)
#### 22. A Robust Gradient Sampling Algorithm for Nonsmooth, Nonconvex Optimization
- Link: https://epubs.siam.org/doi/10.1137/030601296
- Relevance: AT
- Intro: Convergence guarantees for finding critical points of a non-convex, non-smooth function are hard to obtain (as noted in #23); proposes an optimization method I don't fully understand yet
- Importance: RW (TBD)
#### 23. Generalization Error Bounds of Gradient Descent for Learning Over-Parameterized Deep ReLU Networks (AAAI'20)
- Link: https://ojs.aaai.org/index.php/AAAI/article/view/5736/
- Relevance: Oth - Smoothness
- Intro:
- Importance: RW / NC
#### 24. A Convergence Theory for Deep Learning via Over-Parameterization (ICML'19)
- Link: https://proceedings.mlr.press/v97/allen-zhu19a.html
- Relevance: Oth - Smoothness
- Intro:
- Importance: RW / NC
#### 25. Exploring Memorization in Adversarial Training (ICLR'22)
- Link: https://openreview.net/forum?id=7gE9V9GBZaI
- Relevance: Oth - Memorization
- Intro: Memorization of random label in TRADES
- Importance: RW
#### 26. Adversarial Weight Perturbation Helps Robust Generalization (NeurIPS'20)
- Link: updated
#### 27. Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples (CoRR'20)
- Link: https://github.com/imrahulr/adversarial_robustness_pytorch
- Importance: RW (exp set)
#### 28. Overfitting in Adversarially Robust Deep Learning (ICML'20)
- Link: updated
#### 29. Bag of Tricks for Adversarial Training (ICLR'21)
- Link: updated
#### 30./31. CIFAR/TinyImageNet
- Link: updated
#### 32. robustbench
- Link: updated
#### 33. Evaluating and Understanding the Robustness of Adversarial Logit Pairing
- Link: https://arxiv.org/abs/1807.10272
- Intro: Analysis of ALP
- Importance: RW
#### 34. Robust Overfitting May Be Mitigated by Properly Learned Smoothening (ICLR'21)
- Link: https://openreview.net/forum?id=qZzy5urZw9
#### 35. Escaping From Saddle Points—Online Stochastic Gradient for Tensor Decomposition (COLT'15)
- Link: updated
#### 36. Gradient Masking Causes CLEVER to Overestimate Adversarial Perturbation Size
- Link: https://arxiv.org/abs/1804.07870
- Relevance: GM
#### 37. Gradient Masking and the Underestimated Robustness Threats of Differential Privacy in Deep Learning
- Link: https://arxiv.org/abs/2105.07985
- Relevance: GM
#### 38. Regularizer to Mitigate Gradient Masking Effect during Single-Step Adversarial Training
- Link: https://ieeexplore.ieee.org/document/9025502
- Relevance: GM
#### 39. Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness
- Link: https://github.com/HanxunH/MDAttack?tab=readme-ov-file
- Relevance: GM
#### 40. Attacks Which Do Not Kill Training Make Adversarial Learning Stronger (ICML'20)
- Link: https://github.com/zjfheart/Friendly-Adversarial-Training
- Relevance: AT
#### 41. Randomized Adversarial Training via Taylor Expansion (CVPR'23)
- Link: https://github.com/Alexkael/Randomized-Adversarial-Training
- Relevance: AT (TRADES)
#### 42. Enhancing Adversarial Training with Second-Order Statistics of Weights (CVPR'22)
- Relevance: AT (TRADES)
#### 43. MMA Training: Direct Input Space Margin Maximization through Adversarial Training (ICLR'20)
- Link: https://github.com/BorealisAI/mma_training
- Relevance: AT
# TRADES_evaluation
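For reference, `adv.beta` in the tables below is the β of the TRADES objective (ref. #1): CE on the clean logits plus β times KL(f(x) ‖ f(x′)), with the KL taken in the direction used by the official implementation. A minimal numpy sketch for illustration only (the actual training code operates on PyTorch logits):

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def trades_loss(logits_clean, logits_adv, y, beta):
    """TRADES objective: CE(clean, y) + beta * KL(p_clean || p_adv).

    logits_clean, logits_adv: (N, C) arrays; y: (N,) integer labels.
    """
    p_clean = softmax(logits_clean)
    p_adv = softmax(logits_adv)
    n = len(y)
    ce = -np.mean(np.log(p_clean[np.arange(n), y] + 1e-12))
    kl = np.mean(
        np.sum(p_clean * (np.log(p_clean + 1e-12) - np.log(p_adv + 1e-12)), axis=-1)
    )
    return ce + beta * kl
```

A larger β weights the robustness (KL) term more heavily, which matches the tables below: β = 1 runs trade robust accuracy for clean accuracy relative to β = 3 / 6.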
## tiny-imagenet
### 80epoch:
| | clean | autoattack | pgd-10 | pgd-100 | apgd(ce) | square |
|:-------------------:|:------:|:----------:|:------:|:-------:|:--------:|:------:|
| best_adv_score.pt | 0.4881 | 0.1595 | 0.2510 | 0.2192 | 0.2192 | 0.1882 |
| best_clean_score.pt | 0.5023 | 0.1240 | 0.2096 | 0.1566 | 0.1877 | 0.1593 |
### 100epoch:
| | clean | autoattack | pgd-10 | pgd-100 |
|:-------------------:|:------:|:----------:|:------:|:-------:|
| best_adv_score.pt | 0.5192 | 0.1225 | 0.2412 | 0.1371 |
| best_clean_score.pt | 0.5225 | 0.0996 | 0.2265 | 0.1265 |
## cifar-100
<!-- ### random seed (best_adv_score.pt):
| clean | autoattack | pgd-10 | pgd-100 | apgd(ce) | square |
|:------:|:----------:|:------:|:-------:|:--------:|:------:|
| 0.5585 | 0.2407 | 0.2917 | 0.2875 | 0.2868 | 0.2829 |
| 0.5597 | 0.2484 | 0.2943 | 0.2905 | 0.2903 | 0.2850 |
| 0.5624 | 0.2470 | 0.2942 | 0.2900 | 0.2900 | 0.2863 | -->
### adv.beta = 3, batch_size = 256
#### best_adv_score.pt
| clean | autoattack | tpgd | pgd-10 | apgd(ce) | apgd(dlr) | fab | square |
|:------:|:----------:|:------:|:------:|:--------:|:---------:|:------:|:----------:|
| 0.5304 | **0.1544** | 0.4265 | 0.2656 | 0.2206 | 0.2656 | 0.2262 | **0.1682** |
| 0.5749 | 0.2294 | 0.4484 | 0.2742 | 0.2681 | 0.2401 | 0.2447 | 0.2755 |
| 0.5782 | 0.2306 | 0.4479 | 0.2752 | 0.2708 | 0.2409 | 0.2426 | 0.2749 |
| 0.5738 | 0.2331 | 0.4454 | 0.2774 | 0.2716 | 0.2430 | 0.2469 | 0.2761 |
| 0.5896 | **0.1464** | 0.4361 | 0.2818 | 0.2189 | 0.2238 | 0.3105 | **0.1586** |
| 0.5448 | **0.1297** | 0.4103 | 0.2819 | 0.2252 | 0.2068 | 0.3029 | **0.1389** |
#### final_199.pt
| clean | autoattack | pgd-10 |
|:------:|:----------:|:----------:|
| 0.5616 | 0.2097 | 0.2402 |
| 0.5649 | 0.2167 | 0.2439 |
| 0.5653 | 0.2171 | 0.2473 |
| 0.5604 | 0.2138 | 0.2413 |
| 0.5898 | **0.1521** | **0.2708** |
#### loss landscape
- stable (2)


- unstable (5)


- somehow rectified (1)


*(final)*


### adv.beta = 1, batch_size = 256
| clean | autoattack | pgd-10 |
|:------:|:----------:|:------:|
| 0.5843 | 0.0857 | 0.2524 |
| 0.6320 | 0.0754 | 0.2819 |
| 0.6348 | **0.0526** | 0.3184 |
| 0.6292 | 0.0859 | 0.2788 |
| 0.5639 | 0.0766 | 0.2669 |
| 0.6331 | **0.1507** | 0.2723 |
| 0.5894 | 0.0875 | 0.2711 |
| 0.5837 | 0.0970 | 0.3334 |
| 0.6202 | 0.0900 | 0.2875 |
| 0.5512 | 0.0691 | 0.2724 |
### X: all broken, P: partly broken, V: all unbroken
| adv.beta \ batchsize | 128 | 256 | 512 | 1024 |
|:--------------------:|:------:|:---:|:---:|:------:|
| 1 | X | X | P | V or P |
| 3 | X or P | P | V | V |
| 6 | P | V | | |
### nostep (adv.beta = 3, batch_size = 128)
| | autoattack | pgd |
|:-----------------:|:---------------:|:---------------:|
| best_adv_score.pt | 0.1233 / 0.0934 | 0.2872 / 0.3106 |
| final_199.pt | 0.1814 / 0.1839 | 0.2292 / 0.2315 |
* Isolates the phenomenon (not dependent on step size change)
* However, many epochs allow the model to stabilize
Run 1:


Run 2:


Corresponding experiments: `cifar100_nostep_beta3_128_1`, `cifar100_nostep_beta3_128_2`
### Current evaluation workflow
1. Model training $\Rightarrow$ tensorboard plot + best ckpt + last ckpt
2. Run AutoAttack / individual attacks on the best and last ckpts
3. Plot loss landscape
4. Record SGCS curve
### No random start in eval
(runs/eval_norestart) (3 256)


### No random start in TPGD inner maximization
(runs/TPGD_norand) (3 256)

(1 128)

| | clean | autoattack | pgd-10 | pgd-40 | apgd |
|:-------------------:|:------:|:----------:|:------:|:------:|:----:|
| 1 | 0.6425 | 0.0954 | 0.2933 | 0.1848 | 0.1867 |
| 2 | 0.6862 | 0.0338 | 0.2629 | 0.0972 | 0.1414 |
* Still unstable after removing the random start
### Tracking extra metrics
1. 3_256 step



2. 3_256 nostep



3. (1, 6)_256 nostep



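For the gradient-direction metrics tracked above: SGCS/WSGCS follow ref. #5 and our code, so this helper is only a generic sketch of the underlying operation (not their exact definition). Cosine similarity between two flattened gradient sets can be computed as:

```python
import numpy as np

def grad_cosine_similarity(grads_a, grads_b, eps=1e-12):
    """Cosine similarity between two sets of gradients.

    grads_a, grads_b: sequences of per-parameter gradient arrays
    (arbitrary shapes); they are flattened and concatenated.
    Returns a scalar in [-1, 1]; values near 0 indicate near-orthogonal
    update directions.
    """
    va = np.concatenate([np.ravel(g) for g in grads_a])
    vb = np.concatenate([np.ravel(g) for g in grads_b])
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + eps))
```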
### Weight Loss landscape
1. What is recorded:
- {[2d, 3d], [ce, kl, total]}




- experiments done: 1~40 ckpts and final ckpt of (1, 6)_256
- focus on 2d mapping
- direction of the two vectors?
2. General stable (6_256) vs unstable (1_256)
The final ckpts of the stable and unstable models do not look very different (6_256 is shown above)
1_256_last:



3. Training before and after SGCS / KL / eval changes
Stored under src/results; hard to share on HackMD.
Observations:
- At first, CE is prioritized in both 1 and 6, leading to multiple local minima in the KL loss landscape (around epochs 12~13)
- Personally, I feel that measuring with the "training" loss does not yield much new information.
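The 2-D weight landscapes above follow the two-random-directions recipe of ref. #17. A toy sketch of the grid evaluation, where `loss_fn` is a hypothetical flat-weight loss (in practice it wraps a forward pass over a data batch) and a simple norm rescaling stands in for their filter-wise normalization:

```python
import numpy as np

def weight_landscape_2d(loss_fn, theta, n=21, span=1.0, seed=0):
    """Loss surface over theta + a*d1 + b*d2 for a 2-D grid of (a, b).

    loss_fn: maps a flat weight vector to a scalar loss.
    theta:   flat weight vector of the checkpoint being inspected.
    d1, d2 are random Gaussian directions rescaled to ||theta|| -- a
    crude stand-in for the filter-wise normalization of ref. #17.
    """
    rng = np.random.default_rng(seed)
    d1 = rng.standard_normal(theta.shape)
    d2 = rng.standard_normal(theta.shape)
    d1 *= np.linalg.norm(theta) / np.linalg.norm(d1)
    d2 *= np.linalg.norm(theta) / np.linalg.norm(d2)
    coords = np.linspace(-span, span, n)
    grid = np.empty((n, n))
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            grid[i, j] = loss_fn(theta + a * d1 + b * d2)
    return coords, grid  # e.g. plt.contourf(coords, coords, grid)
```

This also makes the "direction of the two vectors?" question above concrete: the plot depends on the draw of d1/d2, so fixing `seed` across checkpoints keeps the slices comparable.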
### Possible TODOs
1. Experiments
- [x] Wideresnet nostep evaluation
- [x] Would it be possible to plot the two terms in the loss respectively?
- [ ] Cosine similarity detector? (Update weights only if resulting loss landscape is smooth, else reselect batch)
- [x] Per epoch covariance / variance between each batch's clean / robust loss
- training or eval?
- What extra info do we get?
- [ ] Loss landscape w.r.t. weights (+ save each ckpt)
- https://github.com/tomgoldstein/loss-landscape
- What extra info do we get?
- PGD and square
- Monitor gradient / perpendicular
- Use 3_256 instable vs stable instead of 1 vs 6
- [ ] (much later) Replace SGCS with FOSC?
- https://arxiv.org/pdf/2112.08304.pdf
- https://github.com/YisenWang/dynamic_adv_training
2. Theory
* Methods to evaluate how "balanced" a batch (dataset) is
* Class
* Perturbed class
* embedding
* feature map mse
* Variance / Covariance of loss terms
* Discussion on how the non robust models "heal" themselves and why it breaks in the first place
3. Advice
* Iteratively update the two terms instead of using a sum
* Plot loss landscape wrt weights as well
* Save all checkpoints for one of each case (examine SGCS dropping before eval acc changing) (check if we can save 200 checkpoints on server)
4. Misc notes
* Survey two-termed loss functions
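For the SGCS→FOSC item above: FOSC (ref. #4) measures how well the inner maximization has converged. A minimal numpy sketch for the L∞ threat model (`grad` is the gradient of the inner-maximization loss at `x_adv`):

```python
import numpy as np

def fosc(x_adv, x_orig, grad, eps):
    """First-Order Stationary Condition of Wang et al. (ref. #4).

    For the L-inf ball of radius eps around x_orig:
        c(x) = eps * ||g||_1 + <x_orig - x_adv, g>
    c(x) >= 0, and c(x) = 0 iff x_adv is a first-order stationary
    point, i.e. the inner maximization has fully converged.
    """
    g = np.ravel(grad)
    return float(eps * np.abs(g).sum() + np.ravel(x_orig - x_adv) @ g)
```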
## Record in PPT form
https://docs.google.com/presentation/d/1QBpf9tlMfbot1kcuXVeYmh4mpW7FWgX7CkUA7j9mla4/edit?usp=sharing
## Items to report / discuss
### 0104
- [x] Similarity to label leaking
- [x] No random start in eval still causes observable instability $\Rightarrow$ SGCS implementation should be OK
- [x] Results of batch balance evaluation
    - Variance / covariance of the two loss terms (within an epoch, across batches) + individual term tracking
    - Record values during training
    - Discussion on results?
- [x] Saved all epochs (asked the sysadmin; it's fine)
    - Motivation: observation that SGCS drops before eval acc becomes unstable
    - TODO: plot landscape w.r.t. weights
- [x] FOSC instead of SGCS?
    - "On the Convergence and Robustness of Adversarial Training" https://arxiv.org/pdf/2112.08304.pdf
    - https://github.com/YisenWang/dynamic_adv_training
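The batch-balance item above (variance/covariance of the two loss terms across batches within an epoch) can be sketched as follows; the function name is ours, not from the codebase:

```python
import numpy as np

def loss_term_stats(ce_per_batch, kl_per_batch):
    """Within-epoch statistics of the two TRADES loss terms across batches.

    ce_per_batch, kl_per_batch: 1-D sequences of the per-batch CE and
    (beta-scaled) KL values collected during one training epoch.
    Returns (var_ce, var_kl, cov, pearson_r).
    """
    ce = np.asarray(ce_per_batch, dtype=float)
    kl = np.asarray(kl_per_batch, dtype=float)
    c = np.cov(ce, kl)  # 2x2 sample covariance matrix
    r = c[0, 1] / np.sqrt(c[0, 0] * c[1, 1])
    return c[0, 0], c[1, 1], c[0, 1], float(r)
```

A strongly negative covariance would indicate batches where one term is optimized at the expense of the other, which is one way to operationalize "how balanced a batch is".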
### 0109
- [x] Results of weight loss landscape
    - Terms of loss / implementation
    - Results
        - General stable / unstable
        - Before and after SGCS drop
- [x] FOSC implementation
- [x] Suggestions from a senior labmate
- [ ] Future TODOs
    - [ ] What is the equivalent loss function of AutoAttack (Square?), and should we plot it?
    - [ ] A good metric to quantify smoothness of the landscape?
### 0118
- FOSC
- Weight landscape (directions / different losses / 3_256)
### 0229
- Should SGCS / FOSC target the inner maximization during training, or the one during evaluation?
- Weight loss landscape update: the sum looks like CW / Square because that attack did not succeed and returned the original image
- The model is unstable only when SGCS spikes; it can self-heal
- After breaking, it returns to the original convergence point, but model performance never recovers to the normal level of the same config
- activations / weights in the model
- Techniques:
    - JS kind of failed
    - Would MART alone solve the problem?
TODO:
- Check whether the instability occurs in other training procedures
- MART schedule and scheduling as a fixing technique / FOSC as a regularizer
- Fix the JS bug
- Gradients
- Clean up code
- Remove ADR
Direction summary:
- Clean up code (must do)
- Generalization (top priority)
- Gradients (high priority)
- Other metrics as support (done / only minor fixes needed)
- Fix
### 0307
- FOSC / SGCS
    - Q: on PGD or TPGD?
- Gradient norm / direction exps
    - Q: l2?
    - Q: WSGCS is negative
    - Then what?
    - Numerical instability
- Proposal? https://nips.cc/
**TODO:**
- Per batch analysis
- MART / ALP
- Solutions: Rollback / FOSC as regularizer / Scheduling
- Remove ADR code
## Experiments / Tools
### Phenomenon Exists
- [ ] TRADES acc / adv acc eval $\Rightarrow$ Instability
- [ ] Datasets
    - [ ] Tiny ImageNet
    - [ ] CIFAR-10
    - [ ] CIFAR-100
- [ ] Models
    - [ ] WRN
    - [ ] ResNet-18
- [ ] Parameters
    - [ ] Beta (1, 3, 6)
    - [ ] Batch size (128, 256, 512)
    - [ ] LR (0.1, 0.01)
- [ ] Image loss landscape
### Explanation
- [ ] Weight loss landscape
- [ ] FOSC
    - [ ] Per epoch
    - [ ] Per batch
- [ ] SGCS
    - [ ] Per epoch
    - [ ] Per batch
- [ ] WSGCS
- [ ] WGradnorm
    - [ ] CE
    - [ ] KL
    - [ ] Full
- [ ] Cosine similarity between CE and KL
- [ ] Entropy
- [ ] Class distribution / correct vs. incorrect
- [ ] Random labels
- [ ] Memorization
- [ ] Label leaking
### Solutions
- [ ] Loss function changes
- [ ] LR
    - [ ] Small
    - [ ] Scheduling
### TODO
- [ ] Explainability
    - [ ] Clean batch vs PGD (fail)
    - [ ] Standard vs PGD (look up)
    - [ ] https://arxiv.org/abs/2306.11035
- [ ] Solution
    - [ ] Perturb / noise on image (works)
    - [ ] Weight perturbation (one random step) (fail)
- [ ] Poster / Writing
    - [x] Abstract
        - [x] Contributions
    - [x] Intro
        - [x] PGD-AT loss
        - [x] TRADES loss
        - [x] Overestimation
        - [x] FOSC / SGCS equations
    - [x] Phenomenon exists + FOSC
        - [x] General case
        - [x] FOSC graph
        - [x] Acc table (parameters), three groups
        - [x] Img loss landscape
    - [x] Explaining
        - [ ] FOSC / SGCS epoch / (batch)
        - [ ] Acc (depending on results)
        - [ ] Weight gradients cosine similarity
    - [x] Discussion
        - [ ] Baseline methods (just mention them, unless the others fail)
        - [ ] Healing (FOSC)
        - [ ] Perturb img (works)
        - [ ] Perturb weights (fail)