# References

:::info
- Title: put the conference in parentheses (if there is one)
- Link: paste a URL where the BibTeX can be grabbed directly (github/openreview/...); sources with an `inproceedings` entry are preferred
- Relevance: pick one of
    - Atk: Attack
    - AT: TRADES (AT) (Memorization)
    - GM: Gradient Masking (Obfuscated Gradients)
    - Q/V: Quantification/Visualization
    - Oth: Others
- Intro: mainly for related work; introduce papers we haven't covered (a short code name is enough for papers already covered in the main text)
- Importance: pick one of
    - MT: Main Text
    - RW: Related Work
    - NC: No Cite
:::

#### 1. Theoretically Principled Trade-off between Robustness and Accuracy (ICML'19)
- Link: https://github.com/yaodongyu/TRADES
- Relevance: AT
- Intro: TRADES
- Importance: MT

#### 2. Towards Deep Learning Models Resistant to Adversarial Attacks (ICLR'18)
- Link: https://openreview.net/forum?id=rJzIBfZAbc
- Relevance: Atk
- Intro: PGD
- Importance: MT

#### 3. On the Loss Landscape of Adversarial Training: Identifying Challenges and How to Overcome Them (NeurIPS'20)
- Link: https://github.com/liuchen11/AdversaryLossLandscape
- Relevance: Q/V
- Intro: loss landscape of AT
- Importance: RW

#### 4. On the Convergence and Robustness of Adversarial Training (ICML'19)
- Link: https://github.com/YisenWang/dynamic_adv_training
- Relevance: Q/V
- Intro: FOSC
- Importance: MT

#### 5. Gradient Masking of Label Smoothing in Adversarial Robustness (IEEE Access'21)
- Link: https://ieeexplore.ieee.org/document/9311250
- Relevance: GM
- Intro: SGCS
- Importance: MT

#### 6. Annealing Self-Distillation Rectification Improves Adversarial Training (ICLR'24)
- Link: https://openreview.net/forum?id=eT6oLkm1cm
- Relevance: AT
- Intro: ADR
- Importance: NC

#### 7. Logit Pairing Methods Can Fool Gradient-Based Attacks (NeurIPSW'18)
- Link: https://github.com/uds-lsv/evaluating-logit-pairing-methods
- Relevance: GM
- Intro: ALP can fool gradient-based attacks
- Importance: RW

#### 8. Adversarial Examples Are Not Bugs, They Are Features (NeurIPS'19)
- Link: https://papers.nips.cc/paper_files/paper/2019/hash/e2c420d928d4bf8ce0ff2ec19b371514-Abstract.html
- Relevance: AT
- Intro: identifying (non-)robust features; trade-off between robustness and accuracy
- Importance: RW

#### 9. Robustness May Be at Odds with Accuracy (ICLR'19)
- Link: https://openreview.net/forum?id=SyxAb30cY7
- Relevance: AT
- Intro: trade-off between robustness and accuracy
- Importance: RW

#### 10. Adversarial Examples Are Not Real Features (NeurIPS'23)
- Link: https://openreview.net/forum?id=hSkEcIFi3o
- Relevance: AT
- Intro: inspecting (non-)robust features
- Importance: NC

#### 11. Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples (ICML'18)
- Link: https://github.com/anishathalye/obfuscated-gradients
- Relevance: GM
- Intro: obfuscated gradients
- Importance: MT

#### 12. Explaining and Harnessing Adversarial Examples (ICLR'15)
- Link: https://arxiv.org/abs/1412.6572
- Relevance: Atk
- Intro: FGSM
- Importance: RW

#### 13. Reliable Evaluation of Adversarial Robustness with an Ensemble of Diverse Parameter-free Attacks (ICML'20)
- Link: https://github.com/fra31/auto-attack
- Relevance: Atk
- Intro: AutoAttack (APGD)
- Importance: MT

#### 14. Minimally Distorted Adversarial Examples with a Fast Adaptive Boundary Attack (ICML'20)
- Link: https://github.com/fra31/fab-attack
- Relevance: Atk
- Intro: FAB
- Importance: MT

#### 15. Square Attack: A Query-efficient Black-box Adversarial Attack via Random Search (ECCV'20)
- Link: https://github.com/max-andr/square-attack
- Relevance: Atk
- Intro: Square Attack
- Importance: MT
#### 16. Adversarial Logit Pairing
- Link: https://arxiv.org/abs/1803.06373
- Relevance: AT
- Intro: ALP
- Importance: RW

#### 17. Visualizing the Loss Landscape of Neural Nets (NeurIPS'18)
- Link: https://papers.nips.cc/paper_files/paper/2018/hash/a41b3bb3e6b050b6c9067c67f663b915-Abstract.html
- Relevance: Q/V
- Intro: loss landscape
- Importance: RW

#### 18. Improving Adversarial Robustness Requires Revisiting Misclassified Examples (ICLR'20)
- Link: https://openreview.net/forum?id=rklOg6EFwS
- Relevance: AT
- Intro: MART (AT for misclassified examples)
- Importance: RW

#### 19. Towards Evaluating the Robustness of Neural Networks (S&P'17)
- Link: updated
- Intro: CW

#### 20. Label Smoothing and Logit Squeezing: A Replacement for Adversarial Training?
- Link: https://arxiv.org/abs/1910.11585
- Relevance: logit pairing

#### 21. Adversarial Machine Learning at Scale (ICLR'17)

#### 22. A Robust Gradient Sampling Algorithm for Nonsmooth, Nonconvex Optimization
- Link: https://epubs.siam.org/doi/10.1137/030601296
- Relevance: AT
- Intro: convergence to critical points of a non-convex, non-smooth function is a hard question (quoted from 23); proposes a training method I don't understand yet
- Importance: RW (TBD)

#### 23. Generalization Error Bounds of Gradient Descent for Learning Over-Parameterized Deep ReLU Networks (AAAI'20)
- Link: https://ojs.aaai.org/index.php/AAAI/article/view/5736/
- Relevance: Oth - Smoothness
- Intro: -
- Importance: RW / NC

#### 24. A Convergence Theory for Deep Learning via Over-Parameterization (ICML'19)
- Link: https://proceedings.mlr.press/v97/allen-zhu19a.html
- Relevance: Oth - Smoothness
- Intro: -
- Importance: RW / NC

#### 25. Exploring Memorization in Adversarial Training (ICLR'22)
- Link: https://openreview.net/forum?id=7gE9V9GBZaI
- Relevance: Oth - Memorization
- Intro: memorization of random labels in TRADES
- Importance: RW

#### 26. Adversarial Weight Perturbation Helps Robust Generalization (NeurIPS'20)
- Link: updated

#### 27. Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples (CoRR)
- Link: https://github.com/imrahulr/adversarial_robustness_pytorch
- Importance: RW (exp set)

#### 28. Overfitting in Adversarially Robust Deep Learning (ICML'20)
- Link: updated

#### 29. Bag of Tricks for Adversarial Training (ICLR'21)
- Link: updated

#### 30./31. CIFAR / TinyImageNet
- Link: updated

#### 32. RobustBench
- Link: updated

#### 33. Evaluating and Understanding the Robustness of Adversarial Logit Pairing
- Link: https://arxiv.org/abs/1807.10272
- Intro: analysis of ALP
- Importance: RW

#### 34. Robust Overfitting May Be Mitigated by Properly Learned Smoothening (ICLR'21)
- Link: https://openreview.net/forum?id=qZzy5urZw9

#### 35. Escaping from Saddle Points: Online Stochastic Gradient for Tensor Decomposition
- Link: updated

#### 36. Gradient Masking Causes CLEVER to Overestimate Adversarial Perturbation Size
- Link: https://arxiv.org/abs/1804.07870
- Relevance: GM

#### 37. Gradient Masking and the Underestimated Robustness Threats of Differential Privacy in Deep Learning
- Link: https://arxiv.org/abs/2105.07985
- Relevance: GM

#### 38. Regularizer to Mitigate Gradient Masking Effect during Single-Step Adversarial Training
- Link: https://ieeexplore.ieee.org/document/9025502
- Relevance: GM

#### 39. Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness
- Link: https://github.com/HanxunH/MDAttack
- Relevance: GM

#### 40. Attacks Which Do Not Kill Training Make Adversarial Learning Stronger
- Link: https://github.com/zjfheart/Friendly-Adversarial-Training
- Relevance: AT

#### 41. Randomized Adversarial Training via Taylor Expansion
- Link: https://github.com/Alexkael/Randomized-Adversarial-Training
- Relevance: TRADES

#### 42. Enhancing Adversarial Training with Second-Order Statistics of Weights
- Relevance: TRADES

#### 43. MMA Training: Direct Input Space Margin Maximization through Adversarial Training
- Link: https://github.com/BorealisAI/mma_training
- Relevance: AT
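For quick reference while reading the logs below: the TRADES objective from ref. 1, as usually stated. $\beta$ corresponds to the `adv.beta` swept in the experiments, and the CE / KL pieces are the "two terms in the loss" tracked separately later on (TPGD here appears to denote PGD on the inner KL maximization):

$$
\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \underbrace{\mathrm{CE}\big(f_\theta(x),\, y\big)}_{\text{clean term}} \;+\; \beta \cdot \underbrace{\max_{\|x'-x\|_\infty \le \epsilon} \mathrm{KL}\big(f_\theta(x)\,\|\,f_\theta(x')\big)}_{\text{robust term (inner maximization)}} \Big]
$$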
# TRADES_evaluation

## tiny-imagenet

### 80 epochs:

|                     | clean  | autoattack | pgd-10 | pgd-100 | apgd(ce) | square |
|:-------------------:|:------:|:----------:|:------:|:-------:|:--------:|:------:|
| best_adv_score.pt   | 0.4881 | 0.1595     | 0.2510 | 0.2192  | 0.2192   | 0.1882 |
| best_clean_score.pt | 0.5023 | 0.1240     | 0.2096 | 0.1566  | 0.1877   | 0.1593 |

### 100 epochs:

|                     | clean  | autoattack | pgd-10 | pgd-100 |
|:-------------------:|:------:|:----------:|:------:|:-------:|
| best_adv_score.pt   | 0.5192 | 0.1225     | 0.2412 | 0.1371  |
| best_clean_score.pt | 0.5225 | 0.0996     | 0.2265 | 0.1265  |

## cifar-100

<!-- ### random seed (best_adv_score.pt):
| clean  | autoattack | pgd-10 | pgd-100 | apgd(ce) | square |
|:------:|:----------:|:------:|:-------:|:--------:|:------:|
| 0.5585 | 0.2407     | 0.2917 | 0.2875  | 0.2868   | 0.2829 |
| 0.5597 | 0.2484     | 0.2943 | 0.2905  | 0.2903   | 0.2850 |
| 0.5624 | 0.2470     | 0.2942 | 0.2900  | 0.2900   | 0.2863 |
-->

### adv.beta = 3, batch_size = 256

#### best_adv_score.pt

| clean  | autoattack | tpgd   | pgd-10 | apgd(ce) | apgd(dlr) | fab    | square     |
|:------:|:----------:|:------:|:------:|:--------:|:---------:|:------:|:----------:|
| 0.5304 | **0.1544** | 0.4265 | 0.2656 | 0.2206   | 0.2656    | 0.2262 | **0.1682** |
| 0.5749 | 0.2294     | 0.4484 | 0.2742 | 0.2681   | 0.2401    | 0.2447 | 0.2755     |
| 0.5782 | 0.2306     | 0.4479 | 0.2752 | 0.2708   | 0.2409    | 0.2426 | 0.2749     |
| 0.5738 | 0.2331     | 0.4454 | 0.2774 | 0.2716   | 0.2430    | 0.2469 | 0.2761     |
| 0.5896 | **0.1464** | 0.4361 | 0.2818 | 0.2189   | 0.2238    | 0.3105 | **0.1586** |
| 0.5448 | **0.1297** | 0.4103 | 0.2819 | 0.2252   | 0.2068    | 0.3029 | **0.1389** |

#### final_199.pt

| clean  | autoattack | pgd-10     |
|:------:|:----------:|:----------:|
| 0.5616 | 0.2097     | 0.2402     |
| 0.5649 | 0.2167     | 0.2439     |
| 0.5653 | 0.2171     | 0.2473     |
| 0.5604 | 0.2138     | 0.2413     |
| 0.5898 | **0.1521** | **0.2708** |

#### loss landscape

- stable (2)
![image](https://hackmd.io/_uploads/S1oUcAAH6.png =50%x) ![image](https://hackmd.io/_uploads/HkuNjARHT.png =50%x)
- unstable (5)
![image](https://hackmd.io/_uploads/BJ3di00Bp.png =50%x) ![image](https://hackmd.io/_uploads/SkJCjRRST.png =50%x)
- somehow rectified (1)
![image](https://hackmd.io/_uploads/HykBaARBa.png =50%x) ![image](https://hackmd.io/_uploads/Skn8a00BT.png =50%x)
*(final)*
![image](https://hackmd.io/_uploads/rJ8a60RHp.png =50%x) ![image](https://hackmd.io/_uploads/r1ykA0ABa.png =50%x)

### adv.beta = 1, batch_size = 256

| clean  | autoattack | pgd-10 |
|:------:|:----------:|:------:|
| 0.5843 | 0.0857     | 0.2524 |
| 0.6320 | 0.0754     | 0.2819 |
| 0.6348 | **0.0526** | 0.3184 |
| 0.6292 | 0.0859     | 0.2788 |
| 0.5639 | 0.0766     | 0.2669 |
| 0.6331 | **0.1507** | 0.2723 |
| 0.5894 | 0.0875     | 0.2711 |
| 0.5837 | 0.0970     | 0.3334 |
| 0.6202 | 0.0900     | 0.2875 |
| 0.5512 | 0.0691     | 0.2724 |

### X: all broken, P: partly broken, V: all unbroken

| adv.beta \ batch size | 128    | 256 | 512 | 1024   |
|:---------------------:|:------:|:---:|:---:|:------:|
| 1                     | X      | X   | P   | V or P |
| 3                     | X or P | P   | V   | V      |
| 6                     | P      | V   |     |        |
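A minimal sketch of how the autoattack numbers in the tables above can be produced with the `autoattack` package from ref. 13, assuming the usual $\ell_\infty$, $\epsilon = 8/255$ setting; `build_model` and `load_cifar100_test` are hypothetical placeholders for our own model and data code:

```python
import torch
from autoattack import AutoAttack  # ref. 13: https://github.com/fra31/auto-attack

# Hypothetical helpers standing in for our model definition / data loading.
model = build_model()                                   # e.g. ResNet-18 returning logits
model.load_state_dict(torch.load("best_adv_score.pt"))  # or final_199.pt
model.eval().cuda()

x_test, y_test = load_cifar100_test()  # image tensors in [0, 1], integer labels

# version="standard" runs APGD-CE, APGD-T (DLR), FAB-T, and Square (refs 13-15)
# and reports the worst case over all four as the "autoattack" robust accuracy.
adversary = AutoAttack(model, norm="Linf", eps=8 / 255, version="standard")
x_adv = adversary.run_standard_evaluation(x_test.cuda(), y_test.cuda(), bs=256)

with torch.no_grad():
    robust_acc = (model(x_adv).argmax(1) == y_test.cuda()).float().mean().item()
print(f"autoattack robust acc: {robust_acc:.4f}")
```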
### nostep (adv.beta = 3, batch_size = 128)

|                   | autoattack      | pgd             |
|:-----------------:|:---------------:|:---------------:|
| best_adv_score.pt | 0.1233 / 0.0934 | 0.2872 / 0.3106 |
| final_199.pt      | 0.1814 / 0.1839 | 0.2292 / 0.2315 |

* Isolates the phenomenon (it does not depend on the step-size change)
* However, the many epochs allow the model to stabilize

_1
![image](https://hackmd.io/_uploads/HJeAlxELT.png)
![image](https://hackmd.io/_uploads/SkyBblVUT.png)
_2
![image](https://hackmd.io/_uploads/ByIhRyVI6.png)
![image](https://hackmd.io/_uploads/H1HyyeNI6.png)

Corresponding experiments: cifar100_nostep_beta3_128_1, cifar100_nostep_beta3_128_2

### Current evaluation workflow

1. Model training $\Rightarrow$ tensorboard plot + best ckpt + last ckpt
2. Run autoattack / individual attacks on the best and last ckpts
3. Plot the loss landscape
4. Record the SGCS curve

### No random start in eval (runs/eval_norestart) (3 256)

![image](https://hackmd.io/_uploads/By4HiDKwT.png)
![image](https://hackmd.io/_uploads/HksIovYvp.png)

### No random start in TPGD inner maximization (runs/TPGD_norand)

(3 256)
![image](https://hackmd.io/_uploads/HJBXyuYDa.png)
(1 128)
![image](https://hackmd.io/_uploads/r1plOvcva.png)

|   | clean  | autoattack | pgd-10 | pgd-40 | apgd   |
|:-:|:------:|:----------:|:------:|:------:|:------:|
| 1 | 0.6425 | 0.0954     | 0.2933 | 0.1848 | 0.1867 |
| 2 | 0.6862 | 0.0338     | 0.2629 | 0.0972 | 0.1414 |

* Still unstable after removing the random start

### Tracking extra metrics

1. 3_256 step
![image](https://hackmd.io/_uploads/SkuoBszu6.png)
![image](https://hackmd.io/_uploads/BJn2rofua.png)
![image](https://hackmd.io/_uploads/Hkdb8izdT.png)
2. 3_256 nostep
![image](https://hackmd.io/_uploads/BkVbDoM_a.png)
![image](https://hackmd.io/_uploads/Bysmwof_6.png)
![image](https://hackmd.io/_uploads/Sks_vszuT.png)
3. (1, 6)_256 nostep
![image](https://hackmd.io/_uploads/ByCR_iMda.png)
![image](https://hackmd.io/_uploads/HJK1toM_T.png)
![image](https://hackmd.io/_uploads/r1cZKiG_6.png)

### Weight Loss landscape

1. What is recorded: {[2d, 3d], [ce, kl, total]} (a plotting sketch follows this section)
![image](https://hackmd.io/_uploads/BJfqFajuT.png =x150) ![image](https://hackmd.io/_uploads/BkujtaouT.png =x150) ![image](https://hackmd.io/_uploads/H1NpYao_T.png =x150) ![image](https://hackmd.io/_uploads/ry9ZqaiO6.png =x150)
    - experiments done: ckpts 1~40 and the final ckpt of (1, 6)_256
    - focus on the 2d mapping
    - direction of the two vectors?
2. General stable (6_256) vs unstable (1_256): the final ckpts of the stable and unstable models don't seem too different (6_256 is above). 1_256_last:
![image](https://hackmd.io/_uploads/Sy-ZW0sdp.png =x150) ![image](https://hackmd.io/_uploads/r11pxRodT.png =x150) ![image](https://hackmd.io/_uploads/rytQ-0i_6.png =x150)
3. Training before and after the SGCS / KL / eval changes: in src/results, hard to share on HackMD. Observations:
    - At first, CE is prioritized in both 1 and 6, leading to multiple local minima in the KL loss landscape (around epoch 12~13)
    - Personally, I feel that measuring with the "training" loss does not yield much new information.
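A rough sketch of how the 2D weight-space landscapes above can be generated, following the filter-normalized random directions of ref. 17 (the tomgoldstein/loss-landscape approach linked in the TODOs). Function names are mine, not the repo's; `loss_fn` could be the CE term, the KL term, or the total TRADES loss on a fixed, pre-generated batch:

```python
import torch

def filter_normalized_direction(model):
    """One random weight-space direction, rescaled filter-by-filter to match
    the norms of the model's own filters (the normalization from ref. 17)."""
    direction = []
    for p in model.parameters():
        d = torch.randn_like(p)
        if p.dim() > 1:  # conv / linear weights: normalize each filter (row)
            for d_f, p_f in zip(d, p):
                d_f.mul_(p_f.norm() / (d_f.norm() + 1e-10))
        else:            # biases / BN parameters: zeroed, as in the reference repo
            d.zero_()
        direction.append(d)
    return direction

@torch.no_grad()
def weight_landscape_2d(model, loss_fn, batch, steps=21, span=1.0):
    """Loss on a (steps x steps) grid around the current weights, along two
    filter-normalized random directions. loss_fn(model, batch) -> scalar."""
    base = [p.detach().clone() for p in model.parameters()]
    dir_x = filter_normalized_direction(model)
    dir_y = filter_normalized_direction(model)
    coords = torch.linspace(-span, span, steps)
    grid = torch.zeros(steps, steps)
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            for p, p0, vx, vy in zip(model.parameters(), base, dir_x, dir_y):
                p.copy_(p0 + a * vx + b * vy)
            grid[i, j] = loss_fn(model, batch)
    for p, p0 in zip(model.parameters(), base):  # restore the original weights
        p.copy_(p0)
    return grid  # plot with e.g. matplotlib contourf
```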
### Possible TODOs

1. Experiments
    - [x] WideResNet nostep evaluation
    - [x] Would it be possible to plot the two terms in the loss separately?
    - [ ] Cosine similarity detector? (update weights only if the resulting loss landscape is smooth, else reselect the batch)
    - [x] Per-epoch covariance / variance between each batch's clean / robust loss
        - training or eval?
        - What extra info do we get?
    - [ ] Loss landscape w.r.t. weights (+ save each ckpt)
        - https://github.com/tomgoldstein/loss-landscape
        - What extra info do we get?
        - PGD and square
        - Monitor the gradient / perpendicular component
        - Use 3_256 unstable vs stable instead of 1 vs 6
    - [ ] (much later) replace SGCS with FOSC?
        - https://arxiv.org/pdf/2112.08304.pdf
        - https://github.com/YisenWang/dynamic_adv_training
2. Theory
    * Methods to evaluate how "balanced" a batch (or dataset) is
        * class
        * perturbed class
        * embedding
        * feature map MSE
        * variance / covariance of the loss terms
    * Discussion on how the non-robust models "heal" themselves and why they break in the first place
3. Advice
    * Iteratively update the two terms instead of using a sum
    * Plot the loss landscape w.r.t. weights as well
    * Save all checkpoints for one run of each case (examine SGCS dropping before the eval acc changes) (check if we can save 200 checkpoints on the server)
4. Miscellaneous notes
    * Survey two-term loss functions

## Record in PPT form

https://docs.google.com/presentation/d/1QBpf9tlMfbot1kcuXVeYmh4mpW7FWgX7CkUA7j9mla4/edit?usp=sharing

## Items to report / discuss

### 0104

- [x] Similarity to label leaking
- [x] No random start in eval still causes observable instability $\Rightarrow$ the SGCS implementation should be OK
- [x] Results of the batch balance evaluation
    - Variance / covariance of the two loss terms (within an epoch, across batches) + individual term tracking
    - Record values during training
    - Discussion on the results?
- [x] Saved all epochs (asked the sysadmin; it's fine)
    - Motivation: the observation that SGCS drops before the eval acc becomes unstable
    - TODO: plot the landscape w.r.t. weights
- [x] FOSC instead of SGCS?
    - "On the Convergence and Robustness of Adversarial Training" https://arxiv.org/pdf/2112.08304.pdf
    - https://github.com/YisenWang/dynamic_adv_training

### 0109

- [x] Results of the weight loss landscape
    - terms of the loss / implementation
    - results
        - general stable / unstable
        - before and after the SGCS drop
- [x] FOSC implementation
- [x] Senior labmate's suggestions
- [ ] Future TODOs
    - [ ] What is the equivalent loss function of AutoAttack (Square?), and should we plot it?
    - [ ] A good metric to quantify the smoothness of a landscape?

### 0118

- FOSC
- Weight landscape (directions / different losses / 3_256)

### 0229

- Should SGCS / FOSC target the inner maximization during training, or the evaluation attack?
- Weight loss landscape update: the sum looks like CW / Square because that attack did not succeed and returned the original image
- The model is unstable only during the SGCS drop / spike, and it can self-heal
    - After breaking, it returns to the original convergence region, but the model's performance never recovers to the normal level of the same config
    - activations / weights in the model
- Techniques:
    - JS kind of failed
    - Would MART just solve the problem?

TODO:
- Check whether the instability occurs under other training procedures
    - MART
- Schedule / scheduling as a fixing technique, FOSC as a regularizer
- Fix the JS bug
- Gradients
- Clean up the code
- Remove ADR

Direction summary:
- Clean up the code (must do)
- Generalization (top priority)
- Gradients (high priority)
- Other metrics as support (done / only minor fixes needed)
- Fix

### 0307

- FOSC / SGCS
    - Q: on PGD or TPGD?
- Gradient norm / direction experiments
    - Q: l2?
    - Q: WSGCS is negative
        - Then what?
    - Numerical instability
- Proposal? https://nips.cc/
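Since FOSC keeps coming up in these discussions, the criterion from ref. 4 restated here for convenience (written from memory of that paper, so worth double-checking against the source). For the inner maximization of a loss $f$ over the $\ell_\infty$ ball $\mathcal{X} = \{x : \|x - x_0\|_\infty \le \epsilon\}$ around a clean point $x_0$, the FOSC value of an iterate $x_k$ is

$$
c(x_k) = \max_{x \in \mathcal{X}} \langle x - x_k,\, \nabla_x f(x_k) \rangle = \epsilon \,\|\nabla_x f(x_k)\|_1 - \langle x_k - x_0,\, \nabla_x f(x_k) \rangle ,
$$

so $c(x_k) \ge 0$, with $c(x_k) = 0$ exactly at a first-order stationary point of the constrained problem; smaller values mean the inner maximization has converged better.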
**TODO:**
- Per-batch analysis
- MART / ALP
- Solutions: rollback / FOSC as a regularizer / scheduling
- Remove the ADR code

## Experiments / Tools

### Phenomenon Exists

- [ ] TRADES acc / adv acc eval $\Rightarrow$ instability
- [ ] Datasets
    - [ ] Tiny ImageNet
    - [ ] CIFAR-10
    - [ ] CIFAR-100
- [ ] Models
    - [ ] WRN
    - [ ] ResNet-18
- [ ] Parameters
    - [ ] beta (1, 3, 6)
    - [ ] batch size (128, 256, 512)
    - [ ] LR (0.1, 0.01)
- [ ] Image loss landscape

### Explanation

- [ ] Weight loss landscape
- [ ] FOSC
    - [ ] per epoch
    - [ ] per batch
- [ ] SGCS
    - [ ] per epoch
    - [ ] per batch
- [ ] WSGCS
- [ ] WGradnorm
    - [ ] CE
    - [ ] KL
    - [ ] full
- [ ] Cosine similarity between CE and KL (sketch at the end of this doc)
- [ ] Entropy
- [ ] Class distribution / correct vs incorrect
- [ ] Random labels
- [ ] Memorization
- [ ] Label leaking

### Solutions

- [ ] Loss function changes
- [ ] LR
    - [ ] small
    - [ ] scheduling

### TODO

- [ ] Explainability
    - [ ] Clean batch vs PGD (fail)
    - [ ] Standard vs PGD (look up)
    - [ ] https://arxiv.org/abs/2306.11035
- [ ] Solution
    - [ ] Perturbation / noise on the image (works)
    - [ ] Weight perturbation (one random step) (fail)
- [ ] Poster / Writing
    - [x] Abstract
        - [x] contributions
    - [x] Intro
        - [x] PGD-AT loss
        - [x] TRADES loss
        - [x] overestimation
        - [x] FOSC / SGCS formulas
    - [x] Phenomenon exists + FOSC
        - [x] general case
        - [x] FOSC graph
        - [x] acc table (parameters), three groups
        - [x] image loss landscape
    - [x] Explaining
        - [ ] FOSC / SGCS per epoch / (batch)
        - [ ] acc (depending on results)
        - [ ] weight gradients cosine similarity
    - [x] Discussion
        - [ ] baseline methods (just mention them, unless the others fail)
        - [ ] healing (FOSC)
        - [ ] perturb image (works)
        - [ ] perturb weights (fail)
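A minimal sketch (names and details are mine, not from our codebase) of the "cosine similarity between CE and KL" item above: backprop each TRADES term separately on the same batch and compare the two weight-gradient directions. `x_adv` is assumed to be pre-generated, e.g. by the TPGD inner maximization.

```python
import torch
import torch.nn.functional as F

def ce_kl_grad_cosine(model, x_clean, x_adv, y):
    """Cosine similarity between the weight gradients of the clean CE term
    and the adversarial KL term of the TRADES loss, on one batch."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient of the clean cross-entropy term.
    ce = F.cross_entropy(model(x_clean), y)
    grad_ce = torch.autograd.grad(ce, params)

    # Gradient of the KL term, matching the criterion_kl usage in TRADES.
    kl = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                  F.softmax(model(x_clean), dim=1),
                  reduction="batchmean")
    grad_kl = torch.autograd.grad(kl, params)

    g_ce = torch.cat([g.flatten() for g in grad_ce])
    g_kl = torch.cat([g.flatten() for g in grad_kl])
    return F.cosine_similarity(g_ce, g_kl, dim=0).item()
```

Tracked per batch, this would slot directly into the per-batch analysis TODO; a value staying positive would suggest the two terms pull the weights in compatible directions.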