# Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
###### tags: `Defense`
###### paper origin: 2016 IEEE Symposium on Security & Privacy
###### paper: [link](https://arxiv.org/pdf/1511.04508.pdf)
###### video: [link](https://www.youtube.com/watch?v=oQr0gODUiZo&ab_channel=IEEESymposiumonSecurityandPrivacy)

# 1. INTRODUCTION

## Research Problems
Defending deep neural networks against adversarial attacks.

## Proposed Solutions
Defensive distillation.

## What is distillation?
* In machine learning, knowledge distillation is the process of transferring knowledge from a large model to a smaller one. While large models (such as very deep neural networks or ensembles of many models) have higher knowledge capacity than small models, this capacity might not be fully utilized, and evaluating a model can be just as computationally expensive even when it uses little of that capacity. **Knowledge distillation transfers knowledge from a large model to a smaller model without loss of validity. Since smaller models are less expensive to evaluate, they can be deployed on less powerful hardware (such as a mobile device).** The gradient of the knowledge distillation loss **E** with respect to a logit **zi** of the distilled model is given by ![](https://i.imgur.com/AXCUMCs.png) where **zi_hat** are the logits of the large model. For large values of **t** this can be approximated as ![](https://i.imgur.com/4lHxWUz.png) ![](https://i.imgur.com/JFl6TLc.png) and under the zero-mean hypothesis ![](https://i.imgur.com/zx5Aq7B.png) it becomes ![](https://i.imgur.com/TSiM6p4.png) which is the derivative of ![](https://i.imgur.com/IJHzooI.png), i.e. the loss is equivalent to matching the logits of the two models, as done in model compression.
* Distillation is motivated by the end goal of reducing the size of DNN architectures, or of ensembles of DNN architectures, so as to reduce their computing resource needs and in turn allow deployment on resource-constrained devices like smartphones. **The general intuition behind the technique is to extract the class probability vectors produced by a first DNN, or an ensemble of DNNs, and use them to train a second DNN of reduced dimensionality without loss of accuracy.**
* A **softmax** layer is simply a layer that takes the vector Z(X) of outputs produced by the last hidden layer of a DNN (the logits) and normalizes it into a probability vector F(X), the output of the DNN, which assigns a probability to each class of the dataset for input X. ![](https://i.imgur.com/1hWRRRr.png) **The higher the temperature of a softmax, the more ambiguous its probability distribution becomes (all probabilities in the output F(X) are close to 1/N)**, whereas the smaller the temperature, the more discrete (peaked) its probability distribution becomes (one probability in F(X) is close to 1 and the remainder are close to 0). A minimal code sketch of this temperature softmax follows below.
* The second model, although smaller, achieves accuracy comparable to the original model while being **less computationally expensive**.
* Instead of transferring knowledge between different architectures, **we propose to use the knowledge extracted from a DNN to improve its own resilience to adversarial samples. We use defensive distillation to smooth the model learned by a DNN architecture during training, helping the model generalize better to samples outside of its training dataset.**
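As a concrete illustration of the temperature softmax described above (not from the paper; the function name and example logits are my own), here is a minimal NumPy sketch showing how a high temperature pushes F(X) toward the uniform vector 1/N, while a low temperature makes it nearly one-hot:

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Temperature softmax: F_i(X) = exp(z_i / T) / sum_j exp(z_j / T)."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                    # shift for numerical stability (does not change the result)
    e = np.exp(z)
    return e / e.sum()

logits = np.array([5.0, 2.0, 1.0, 0.5])   # hypothetical logits Z(X) for N = 4 classes
for T in (1.0, 20.0, 0.1):
    print(f"T = {T:>4}:", softmax_with_temperature(logits, T).round(3))
# T =  1.0 -> moderately peaked distribution
# T = 20.0 -> all probabilities close to 1/N (ambiguous)
# T =  0.1 -> essentially one-hot (discrete)
```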
At test time, models trained with defensive distillation are less sensitive to adversarial samples and are therefore more suitable for deployment in security-sensitive settings.

## Contributions
* We articulate the requirements for the design of adversarial-sample defenses for DNNs. These guidelines highlight the inherent tension between the **defensive robustness, output accuracy, and performance** of DNNs.
* We introduce defensive distillation, a procedure for training DNN-based classifier models that are more robust to perturbations. Distillation extracts additional knowledge about training points, in the form of class probability vectors produced by a DNN, and feeds it back into the training regimen. This departs substantially from past uses of distillation, which aimed to shrink DNN architectures to improve computational performance; here the extracted knowledge is fed back into the original model.
* We analytically investigate defensive distillation as a security countermeasure. We show that distillation generates smoother classifier models by reducing their sensitivity to input perturbations. **These smoother DNN classifiers are found to be more resilient to adversarial samples and to have improved class generalizability properties.**
* We show empirically that defensive distillation reduces the success rate of adversarial sample crafting from 95.89% to 0.45% against a first DNN trained on the MNIST dataset, and from 87.89% to 5.11% against a second DNN trained on the CIFAR10 dataset.
* A further empirical exploration of the distillation parameter space shows that a correct parameterization can reduce the sensitivity of a DNN to input perturbations by a factor of 10^30. In turn, this increases the average minimum number of input features that must be perturbed to achieve adversarial targets by 790% for the first DNN and by 556% for the second DNN.

# 2. Implementation

## Adversarial attack
**(From: The limitations of deep learning in adversarial settings)**

Adversarial crafting framework:
![](https://i.imgur.com/86gBPg6.png)

* **(1) Jacobian (forward derivative):**
![](https://i.imgur.com/vGfDgYm.png)
This gives the gradient of the outputs with respect to the inputs (see the autodiff sketch at the end of the Robustness subsection below).
* **(2) Adversarial saliency map:**
![](https://i.imgur.com/2rkM2zM.png)

Result:
![](https://i.imgur.com/yI8gNY6.png)

---

## Robustness
* Visualizing the hardness metric:
![](https://i.imgur.com/3D5bZ2N.png)
* Robustness is achieved by ensuring that the classification output by a DNN remains somewhat constant in a closed neighborhood around any given sample drawn from the classifier's input distribution. **The larger this neighborhood is for all inputs within the natural distribution of samples, the more robust the DNN is.**
* ![](https://i.imgur.com/iQViBFX.png)
where ‖·‖ is a norm that must be specified according to the context. The higher the average minimum perturbation required to misclassify a sample from the data manifold, the more robust the DNN is to adversarial samples. (This is the minimum perturbation that makes the attack succeed.)
* The robustness of a trained DNN model F is:
![](https://i.imgur.com/l4icXY3.png)
where inputs X are drawn from the distribution µ that the DNN architecture is attempting to model with F, and ∆adv(X, F) is defined to be the minimum perturbation required to misclassify sample X into each of the other classes. A sketch of how this expectation can be estimated empirically is given below.
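The forward derivative and saliency map referenced in the Adversarial attack subsection above can be sketched with a modern autodiff framework. The sketch below is a simplified, feature-wise variant under my own naming (`model` is assumed to map a single input vector of shape `(d,)` to a vector of class probabilities); the original attack works on pairs of features with additional constraints, so treat this only as an illustration:

```python
import torch

def forward_derivative(model, x):
    """Forward derivative (Jacobian) dF/dX of the output probabilities F(X)
    for a single input x of shape (d,). Result has shape (num_classes, d)."""
    return torch.autograd.functional.jacobian(model, x)

def saliency_map(jac, target):
    """Simplified (feature-wise) adversarial saliency map: a feature is salient
    if increasing it raises F_target while lowering the sum of the other classes."""
    grad_t = jac[target]                      # dF_target / dX
    grad_o = jac.sum(dim=0) - grad_t          # sum over the other classes of dF_j / dX
    s = grad_t * (-grad_o)                    # large when grad_t > 0 and grad_o < 0
    s[(grad_t < 0) | (grad_o > 0)] = 0.0      # zero out features violating the sign conditions
    return s

# usage sketch: perturb the most salient feature(s) toward the target class
# jac = forward_derivative(model, x); s = saliency_map(jac, target_class)
# idx = torch.argmax(s); x_adv = x.clone(); x_adv[idx] += theta
```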
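The robustness metric above is the expectation, over the input distribution, of the norm of the minimum adversarial perturbation, so in practice it can only be estimated on a finite sample. A minimal sketch, assuming some attack routine `craft_adversarial(model, x)` (for example, a JSMA-style search) that returns the closest adversarial sample it can find; all names are illustrative:

```python
import numpy as np

def estimate_robustness(model, samples, craft_adversarial, ord=2):
    """Monte-Carlo estimate of rho_adv(F) = E_{X~mu}[ ||Delta_adv(X, F)|| ]:
    average, over test samples, the norm of the smallest perturbation found
    that flips the model's classification."""
    norms = []
    for x in samples:
        x_adv = craft_adversarial(model, x)                    # closest adversarial sample found
        norms.append(np.linalg.norm((x_adv - x).ravel(), ord=ord))
    return float(np.mean(norms))
```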
## Defense
* Overview of the defense mechanism, based on transferring the knowledge contained in probability vectors through distillation: **knowledge extracted by distillation, in the form of probability vectors, and transferred into smaller networks to maintain accuracy comparable with that of larger networks, can also improve the generalization capabilities of DNNs outside of their training dataset and therefore enhance their resilience to perturbations.**
![](https://i.imgur.com/sx0FVre.png)
**(Training objective: minimize the cross entropy between F^d(X) and F(X).)** A minimal training sketch of this two-step procedure is given at the end of this note.
* Minimize:
![](https://i.imgur.com/kNPUJ5J.png)
* Minimize:
![](https://i.imgur.com/YYNydLD.png)

# 3. Result
* Overview of architectures
![](https://i.imgur.com/JYT2ouy.png)
* Overview of training parameters
![](https://i.imgur.com/5zicL9s.png)
* An exploration of the temperature parameter space
![](https://i.imgur.com/GwImmcJ.png)
* Influence of distillation on accuracy
![](https://i.imgur.com/ZIqyGKh.png)
* An exploration of the impact of temperature on the amplitude of adversarial gradients
![](https://i.imgur.com/OXkY645.png)
* Quantifying the impact of distillation temperature on robustness
![](https://i.imgur.com/02FbolC.png)

# 4. Report
* **TL;DR:** Distillation is a technique for transferring the knowledge of a complex network into a simpler one. Defensive distillation is designed on top of this technique and is shown to resist adversarial attacks that use small perturbations.
* **Strengths:** Defensive distillation produces smoother classifier models by reducing the classifier's sensitivity to input perturbations. Experiments show that these smoother DNN classifiers are more robust to adversarial samples. The method is also generic, and it makes the model lighter and faster.
* **Weaknesses:** Defensive distillation is essentially a classic gradient-masking defense against adversarial attacks. Even without access to the true gradients, an attacker can succeed with approximate gradients, or break the defended model by exploiting the transferability of adversarial samples. Because the approach defends against adversarial examples without changing the network architecture and while affecting accuracy as little as possible, it is a static defense: although it successfully lowers the attack success rate, it is easily broken again. The C&W attack, for example, attacked this defense very successfully (it was shown to be ineffective less than a year after publication). In addition, the method only applies to classification tasks and to models that are easy to train.
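Referenced from the Defense section above: a minimal PyTorch sketch of the two-step defensive distillation procedure (train the first network at temperature T on hard labels, then train a second network of the same architecture at the same temperature on the soft probability vectors produced by the first). The network class, data loader, and hyperparameters below are placeholders, not the paper's exact setup:

```python
import torch
import torch.nn.functional as F

T = 20  # distillation temperature; the paper explores values roughly between 1 and 100

def train_at_temperature(model, loader, labels_fn, epochs=10, lr=0.1):
    """Train `model` with a softmax at temperature T against (possibly soft) label vectors."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            log_probs = F.log_softmax(model(x) / T, dim=1)    # softmax at temperature T
            target = labels_fn(x, y)                          # probability-vector labels
            loss = -(target * log_probs).sum(dim=1).mean()    # cross entropy with soft targets
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# Step 1: train the initial network on hard (one-hot) labels at temperature T.
def hard_labels(x, y, num_classes=10):
    return F.one_hot(y, num_classes).float()

# teacher = train_at_temperature(Net(), train_loader, hard_labels)   # Net, train_loader: placeholders

# Step 2: train the distilled network, with the same architecture and the same
# temperature T, on the soft label vectors produced by the first network.
def soft_labels(teacher):
    def fn(x, y):
        with torch.no_grad():
            return F.softmax(teacher(x) / T, dim=1)
    return fn

# distilled = train_at_temperature(Net(), train_loader, soft_labels(teacher))
# At test time the distilled network is used at temperature 1 (i.e. its raw logits/softmax).
```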