# Paper Note - Self-Attention Attribution: Interpreting Information Interactions Inside Transformer

<style> .red { color: red; } </style>

Paper source: https://arxiv.org/abs/2004.11207

Other references:
* https://zhuanlan.zhihu.com/p/366767736
* https://toutiao.io/posts/mriypje/preview (*)
* https://zhuanlan.zhihu.com/p/148105536 (integrated gradients -> attribution method)

## Paper Abstract

### Motivation

* Prior work strives to attribute model decisions to individual input features with different saliency measures, but they **fail to explain how these input features interact with each other to reach predictions.**

### Paper Target

* We propose a **self-attention attribution method** (ATTATTR) based on **integrated gradients** (Sundararajan, Taly, and Yan 2017).

### Paper Contributions

* We propose to use **self-attention attribution** to **interpret the information interactions** inside Transformer.
* We present **how to derive interaction trees** based on attribution scores, which **visualizes the compositional structures** learnt by Transformer.
* We show that the proposed attribution method can be used to **prune self-attention heads** and **construct adversarial triggers**.

### Paper Structure

1. Background knowledge (related work)
2. Method (how to calculate attribution scores using the **integrated gradients** method)
3. Experiments with pruning
4. Tree visualization to show the interaction relationships
5. Adversarial triggers
6. Conclusion

## Related Work

* Are Sixteen Heads Really Better than One? https://arxiv.org/abs/1905.10650
  Pruning method used as a comparison
* Axiomatic Attribution for Deep Networks (*) https://arxiv.org/abs/1703.01365
  Integrated gradients
* Attention Is All You Need. https://arxiv.org/abs/1706.03762
  Transformer
* BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/abs/1810.04805
  BERT

The Transformer and BERT parts can be kept brief; I assume everyone already knows them roughly.

## Method

**<span class="red">Our goal is to calculate an attribution score for each attention connection.</span>**

![](https://hackmd.io/_uploads/r1gLd87Bh.png)

### Transformer

![](https://hackmd.io/_uploads/BymXKIXS2.png)

### Self-Attention Attribution

![](https://hackmd.io/_uploads/B1pwYI7B2.png)

![](https://hackmd.io/_uploads/rJIutIXBh.png)

m is the number of approximation steps (see the code sketch after the Effectiveness Analysis section below).

## Experiments

1. Results on pruning
2. Visualization tree
3. Adversarial triggers

Datasets:
* MNLI
* RTE
* SST-2
* MRPC

Hyperparameters:
* number of BERT layers: 12
* number of attention heads in each layer: 12
* size of hidden embeddings: 768
* data split: https://mrqa.github.io/2019/assets/papers/3_Paper.pdf
* fine-tuning setting: https://arxiv.org/abs/1810.04805

### Effectiveness Analysis

<span class="red">Goal: to justify that the self-attention edges with larger attribution scores contribute more to the model decision.</span>

Prune the attention heads incrementally in each layer according to their attribution scores and record the performance.
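Below is a minimal PyTorch-style sketch of the two steps above: the Riemann approximation of integrated gradients over one head's attention matrix, Attr(A) ≈ A ⊙ (1/m) Σₖ ∂L((k/m)·A)/∂A, and the ordering of heads by their maximum attribution score for incremental pruning. The `forward_loss` callable is a placeholder assumption: it should run the fine-tuned model with the chosen head's attention matrix overridden by the tensor it receives, which with a stock BERT implementation would require a forward hook or a modified attention module.

```python
import torch

def attention_attribution(forward_loss, attention, m=20):
    """Integrated-gradients attribution for one head's attention matrix A,
    approximated as A * (1/m) * sum_{k=1..m} dL((k/m) * A)/dA,
    where m is the number of approximation steps.

    forward_loss: hypothetical callable that runs the model with this head's
                  attention matrix replaced by the given tensor and returns the loss.
    attention:    the head's attention matrix A, shape (seq_len, seq_len).
    """
    grad_sum = torch.zeros_like(attention)
    for k in range(1, m + 1):
        scaled = (k / m * attention).detach().requires_grad_(True)
        loss = forward_loss(scaled)   # forward pass with A replaced by (k/m) * A
        loss.backward()               # gradient of the loss w.r.t. the scaled matrix
        grad_sum += scaled.grad
    return attention * grad_sum / m   # element-wise attribution scores


def heads_in_pruning_order(head_attributions):
    """Order heads from least to most important, taking a head's importance to be
    the maximum attribution score in its matrix (the paper averages this over
    examples; a single example is used here for brevity)."""
    importance = {h: attr.max().item() for h, attr in head_attributions.items()}
    return sorted(importance, key=importance.get)
```

For the effectiveness analysis, heads would then be pruned one by one following this order while recording task accuracy after each removal.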
![](https://hackmd.io/_uploads/By0AZD7rn.png)

### Attention Head Pruning

<span class="red">Research on identifying and pruning the unimportant attention heads.</span>

Comparison with other metrics: accuracy difference and the Taylor expansion.

#### Head Importance

* ATTATTR (proposed method)

  ![](https://hackmd.io/_uploads/SkAB7vmrh.png)

* Taylor expansion

  ![](https://hackmd.io/_uploads/SyqD7DXHh.png)

![](https://hackmd.io/_uploads/SJ_d7vXS2.png)

#### Universality of Important Heads

![](https://hackmd.io/_uploads/ByMhQP7H3.png)

### Visualizing Information Flow Inside Transformer

![](https://hackmd.io/_uploads/SkyVsC7Bn.png)

### Adversarial Attack

![](https://hackmd.io/_uploads/rJ_LoA7S2.png)
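As a companion to the Adversarial Attack figure above: roughly, the paper builds triggers from the attention dependencies (token pairs) that receive the largest attribution scores on correctly classified examples, and inserts those patterns into other inputs. A rough sketch of the extraction step, assuming a hypothetical `attribution_fn` that returns an example's tokens together with one head's attribution matrix (as computed in the earlier sketch):

```python
from collections import Counter

def extract_trigger_patterns(examples, attribution_fn, top_k=5):
    """Count the token pairs whose attention edges receive the largest
    attribution scores across a set of examples; the most frequent pairs
    are candidate adversarial trigger patterns."""
    pattern_counts = Counter()
    for example in examples:
        tokens, attr = attribution_fn(example)  # attr: (seq_len, seq_len) torch tensor
        seq_len = attr.size(-1)
        top_edges = attr.flatten().topk(top_k).indices  # strongest attention edges
        for idx in top_edges.tolist():
            i, j = divmod(idx, seq_len)
            pattern_counts[(tokens[i], tokens[j])] += 1
    return pattern_counts.most_common(top_k)
```

The extracted patterns would then be inserted into held-out inputs to test whether the model's predictions flip; that attack step is not shown here.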