# Recommended Papers

## 1. Background and Foundations of Adversarial Robustness

### 1.1 Intriguing Properties of Neural Networks

## 2. Adversarial Attack and Training

### 2.1 Explaining and Harnessing Adversarial Examples (ICLR, 2015)

#### Key Insights

* Adversarial examples can be explained as a property of high-dimensional dot products. They are a result of models being too **linear**, rather than too nonlinear.
* The generalization of adversarial examples across different models can be explained by adversarial perturbations being highly aligned with the **weight vectors** of a model, and by different models learning similar functions when trained to perform the same task.
* The **direction of perturbation**, rather than the specific point in space, matters most. Space is not full of pockets of adversarial examples that finely tile the reals like the rational numbers.
* Because it is the direction that matters most, adversarial perturbations generalize across different clean examples.
* The paper introduces a family of fast methods for generating adversarial examples.
* Adversarial training can act as a regularizer, providing even further regularization than dropout.
* Control experiments failed to reproduce this regularization effect with simpler but less efficient regularizers, including L1 weight decay and adding noise.
* Models that are easy to optimize are easy to perturb.
* Linear models lack the capacity to resist adversarial perturbation; only structures with a hidden layer (where the **universal approximator theorem** applies) should be trained to resist adversarial perturbation.
* RBF networks are resistant to adversarial examples.
* Models trained to model the input distribution are not resistant to adversarial examples.
* Ensembles are not resistant to adversarial examples.

#### Methods

* Generating adversarial examples with the Fast Gradient Sign Method (FGSM):

$$\eta = \epsilon \cdot \operatorname{sign}(\nabla_x J(\theta, x, y))$$

* Adversarial training against adversarial examples generated by FGSM (see the code sketch at the end of this document):

$$\widetilde J(\theta, x, y) = \alpha J(\theta, x, y) + (1-\alpha)\, J\big(\theta,\ x + \epsilon \cdot \operatorname{sign}(\nabla_x J(\theta, x, y)),\ y\big)$$

#### Open Questions

- Whether it is better to perturb the input, the hidden layers, or both.
- Trade-off: ease of optimization has come at the cost of models that are easily misled by adversarial examples. **Possible direction: the development of optimization procedures that are able to train models whose behavior is more locally stable.**

### 2.2 Towards Evaluating the Robustness of Neural Networks

### 2.3 Towards Deep Learning Models Resistant to Adversarial Attacks

### 2.4 Theoretically Principled Trade-off between Robustness and Accuracy

## 3. Theoretical Understanding of Adversarial Examples

* Robustness May Be at Odds with Accuracy
* Adversarially Robust Generalization Requires More Data
* Adversarial Examples Are Not Bugs, They Are Features

## 4. Randomized Smoothing

* Certified Adversarial Robustness via Randomized Smoothing
* Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers

## 5. Lipschitz Networks

* Towards Certifying L-infinity Robustness using Neural Networks with L-inf-dist Neurons
* Boosting the Certified Robustness of L-infinity Distance Nets

## 6. Other Approaches in Certified Robustness

* Provable Defenses against Adversarial Examples via the Convex Outer Adversarial Polytope
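
---

As a concrete illustration of the two formulas in Section 2.1 (the FGSM perturbation and the mixed adversarial-training objective), here is a minimal PyTorch sketch. It assumes a classifier `model` trained with cross-entropy loss; the function names `fgsm_perturbation` and `adversarial_training_loss` and the default values `epsilon=0.25` and `alpha=0.5` are illustrative choices, not code from the paper.

```python
import torch
import torch.nn.functional as F


def fgsm_perturbation(model, x, y, epsilon):
    """FGSM: eta = epsilon * sign(grad_x J(theta, x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # Gradient w.r.t. the input only (does not touch parameter gradients).
    grad_x = torch.autograd.grad(loss, x)[0]
    return epsilon * grad_x.sign()


def adversarial_training_loss(model, x, y, epsilon=0.25, alpha=0.5):
    """Mixed objective: alpha * J(x, y) + (1 - alpha) * J(x + eta, y)."""
    eta = fgsm_perturbation(model, x, y, epsilon)  # no gradient flows through eta
    clean_loss = F.cross_entropy(model(x), y)
    adv_loss = F.cross_entropy(model(x + eta), y)
    return alpha * clean_loss + (1 - alpha) * adv_loss
```

In a training loop, the value returned by `adversarial_training_loss` would replace the ordinary cross-entropy loss before calling `backward()` and stepping the optimizer.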