# Paper Note - Self-Attention Attribution: Interpreting Information Interactions Inside Transformer
<style>
.red {
color: red;
}
</style>
paper source:
https://arxiv.org/abs/2004.11207
other references:
https://zhuanlan.zhihu.com/p/366767736
https://toutiao.io/posts/mriypje/preview (*)
https://zhuanlan.zhihu.com/p/148105536 (integrated gradients -> attribution method)
## Paper Abstract
### Motivation
* Prior work strives to attribute model decisions to individual input features with different saliency measures, but they **fail to explain how these input features interact with each other to reach predictions.**
### Paper Target
* We propose a **self-attention attribution method** (ATTATTR) based on **integrated gradient** (Sundararajan, Taly, and Yan 2017).
### Paper Contributions
* We propose to use **self-attention attribution** to **interpret the information interactions** inside Transformer.
* We present **how to derive interaction trees** based on attribution scores, which **visualizes the compositional structures** learnt by Transformer.
* We show that the proposed attribution method can be used to **prune self-attention heads**, and **construct adversarial triggers**.
### Paper Structure
1. Background knowledge (Related work)
2. Method (how to calculate attribution scores using the **integrated gradient** method)
3. Experiments with pruning
4. Interaction tree visualization to show the relationships between tokens
5. Adversarial Triggers
6. Conclusion
## Related Work
* Are Sixteen Heads Really Better than One?
https://arxiv.org/abs/1905.10650
Method as comparison in pruning
* Axiomatic Attribution for Deep Networks (*)
https://arxiv.org/abs/1703.01365
Integrated gradients
* Attention is All you Need.
https://arxiv.org/abs/1706.03762
Transformer
* BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
https://arxiv.org/abs/1810.04805
BERT
Transformer and BERT can be kept brief, since I assume everyone is roughly familiar with them anyway.
## Method
**<span class="red">Our goal is to calculate an attribution score for each attention connection.</span>**

### Transformer
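
The objects being attributed are the attention score matrices of the individual heads. For reference, the scaled dot-product attention of one head (Vaswani et al. 2017):

$$A_h = \operatorname{softmax}\!\left(\frac{Q_h K_h^{\top}}{\sqrt{d_k}}\right), \qquad \operatorname{Attention}(Q_h, K_h, V_h) = A_h V_h$$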

### Self-Attention Attribution
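
As I recall the paper's formulation, the attribution score of head $h$ multiplies its attention score matrix element-wise with the integrated gradient of the model output $F$ with respect to those scores (using zero attention as the baseline), and the integral is approximated with a Riemann sum:

$$\mathrm{Attr}_h(A) = A_h \odot \int_0^1 \frac{\partial F(\alpha A)}{\partial A_h}\, d\alpha \;\approx\; \frac{A_h}{m} \odot \sum_{k=1}^{m} \frac{\partial F\!\left(\tfrac{k}{m}A\right)}{\partial A_h}$$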


where $m$ is the number of approximation steps.
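
A minimal sketch of the Riemann approximation above, assuming a hypothetical `model_fn` that runs the frozen model with a given attention matrix plugged into the head under analysis and returns a scalar output (e.g. the gold-label logit); this interface is a simplification for illustration, not the paper's code:

```python
import torch

def attention_attribution(model_fn, A, steps=20):
    """Approximate Attr_h(A) = A_h ⊙ ∫ ∂F(αA)/∂A_h dα with a Riemann sum.

    model_fn: hypothetical callable mapping this head's attention matrix
              to a scalar model output (e.g. the logit of the gold label).
    A:        attention score matrix of one head, shape (seq_len, seq_len).
    steps:    m, the number of approximation steps.
    """
    total_grad = torch.zeros_like(A)
    for k in range(1, steps + 1):
        # Interpolate between the zero baseline and the observed attention.
        scaled = (k / steps) * A.detach()
        scaled.requires_grad_(True)
        out = model_fn(scaled)
        # Gradient of the model output w.r.t. the interpolated attention.
        total_grad += torch.autograd.grad(out, scaled)[0]
    # Element-wise product of attention scores and the averaged gradients.
    return A.detach() * total_grad / steps
```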
## Experiments
1. Results on pruning
2. Interaction tree visualization
3. Adversarial Triggers
Datasets:
* MNLI
* RTE
* SST-2
* MRPC
Hyperparameters:
* number of BERT layers: 12
* number of attention heads in each layer: 12
* the size of hidden embeddings: 768
* data split: https://mrqa.github.io/2019/assets/papers/3_Paper.pdf
* fine-tuning setting: https://arxiv.org/abs/1810.04805
### Effectiveness Analysis
<span class="red">To justify that the self-attention edges with larger attribution scores contribute more to the model decision.</span>
Prune the attention heads incrementally in each layer according to their attribution scores and record the performance.
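
A rough sketch of that evaluation loop, assuming a hypothetical `head_scores` array (one importance value per head) and an `evaluate(model, head_mask)` helper that runs the task with the masked heads zeroed out; both names are illustrative:

```python
import numpy as np

def per_layer_pruning_curves(head_scores, evaluate, model):
    """For each layer, prune its heads one by one (lowest score first).

    head_scores: array of shape (num_layers, num_heads) with importance scores.
    evaluate:    hypothetical helper returning task accuracy for a head mask.
    """
    num_layers, num_heads = head_scores.shape
    curves = {}
    for layer in range(num_layers):
        head_mask = np.ones((num_layers, num_heads))
        curve = [evaluate(model, head_mask)]       # accuracy with all heads
        for head in np.argsort(head_scores[layer]):
            head_mask[layer, head] = 0.0           # drop the next head
            curve.append(evaluate(model, head_mask))
        curves[layer] = curve
    return curves
```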

### Attention Head Pruning
<span class="red">Research on identifying and pruning the unimportant attention heads.</span>
Comparison with other metrics: accuracy difference and the Taylor expansion.
#### Head Importance
* ATTATTR (proposed method): a head's importance is the expected maximum attribution value it receives (see the formulas below)
* Taylor expansion (from "Are Sixteen Heads Really Better than One?"): the expected absolute product of the head's output and the gradient of the loss with respect to it
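
The two importance metrics compared here, as I understand them from the paper and from Michel et al. (2019), where $\mathrm{Att}_h(x)$ denotes the output of head $h$:

$$I_h^{\text{AttAttr}} = \mathbb{E}_x\!\left[\max\big(\mathrm{Attr}_h(A)\big)\right], \qquad I_h^{\text{Taylor}} = \mathbb{E}_x\left|\mathrm{Att}_h(x)^{\top}\,\frac{\partial \mathcal{L}(x)}{\partial \mathrm{Att}_h(x)}\right|$$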


#### Universality of Important Heads

### Visualizing Information Flow Inside Transformer
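
The interaction trees are derived from attribution scores aggregated over the heads of each layer. A much-simplified greedy sketch (not necessarily the paper's exact construction): sum the attribution matrices over heads per layer, then keep, for each receiving token, the incoming edge with the largest aggregated attribution above a threshold:

```python
import numpy as np

def greedy_attribution_tree(layer_attr, threshold=0.0):
    """Greedy sketch of an interaction tree from per-layer attributions.

    layer_attr: list with one array per layer, each of shape
                (num_heads, seq_len, seq_len); entry [h, i, j] is the
                attribution of the edge from token j into token i.
    Returns a list of (layer, receiver, sender, score) edges.
    """
    edges = []
    for layer, attr in enumerate(layer_attr):
        agg = attr.sum(axis=0)  # aggregate attribution over heads
        for receiver in range(agg.shape[0]):
            sender = int(np.argmax(agg[receiver]))
            score = float(agg[receiver, sender])
            if score > threshold and sender != receiver:
                edges.append((layer, receiver, sender, score))
    return edges
```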

### Adversarial Attack
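
The attack uses the most attributed attention dependencies as trigger patterns that get inserted into other inputs. A toy sketch of the extraction step only (insertion and evaluation are omitted), assuming `tokens` and an aggregated attribution matrix `attr` for one example:

```python
import numpy as np

def top_attributed_pairs(tokens, attr, k=3):
    """Return the k token pairs with the largest attribution scores.

    tokens: list of token strings for one example.
    attr:   aggregated attribution matrix, shape (len(tokens), len(tokens)).
    """
    flat = np.argsort(attr, axis=None)[::-1][:k]   # top-k flat indices
    pairs = []
    for idx in flat:
        i, j = divmod(int(idx), attr.shape[1])
        pairs.append((tokens[i], tokens[j], float(attr[i, j])))
    return pairs
```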
