---
title: Attention is not not explanation
date: 2020-06-24 15:12:00
comments: true
author: Darcy
categories:
- nlp study group
tags:
- NLP
---
###### tags: `study` `paper` `DSMI lab`
paper: [Attention is not not explanation](https://arxiv.org/pdf/1908.04626.pdf)
(I'm not a fan of how this paper is written: lots of double negatives, and the structure is a bit messy.)
## Introduction
This post is about a paper written in response to "Attention is not Explanation" (hereafter ==J&W==). Its goal is not to claim that attention *is* explanatory, but to argue that "attention is not necessarily non-explanatory." The paper points out several potential problems in J&W's experiments and arguments, and proposes some better analysis approaches (although I am not fully convinced).
<!-- more -->
## Main Claim
1. Attention Distribution is not a Primitive
> The base attention weights are not assigned arbitrarily by the model, but rather computed by an integral component whose parameters were trained alongside the rest of the layers; the way they work depends on each other.
Attention weights are learned jointly with the rest of the model; they do not exist independently of it. J&W's experiment that randomly permutes attention weights is therefore not entirely appropriate: when constructing an adversary, the model should be retrained.
2. Existence does not Entail Exclusivity
> We hold that attention scores are used as providing an explanation; not the explanation.
Finding another plausible explanation does not mean the original one is not an explanation; explanations are not unique. This is especially true for binary classification tasks: the input is many words (very high-dimensional) while the output is a single value between 0 and 1, so the dimensionality reduction leaves a lot of room for flexibility.
## Defining Explanation
J&W do not clearly define what an explanation is; in fact, prior literature defines explanation in different ways. When discussing explainability in AI, the following three terms come up frequently.
1. transparency: identifying the parts of a model that humans can understand.
> Attention mechanisms do provide a look into the inner workings of a model, as they produce an easily-understandable weighting of hidden states.
2. explainability
    - can support decision making
    - can mimic the human ability to make inferences from what has happened in the past
3. interpretability
    - the relationship between input and output (similar to a regressor in regression)
    - requires domain experts to help verify
## Experiments
### Dataset
Same experiments as J&W, but only the binary classification tasks; the QA tasks are not included.

* Diabetes: whether a patient is diagnosed with diabetes from their ICU discharge summary
* Anemia: whether the patient is diagnosed with acute (neg.) or chronic (pos.) anemia
* IMDb: positive or negative sentiment from movie reviews
* SST: positive or negative sentiment from sentences
* AgNews: the topic of news articles as either world (neg.) or business (pos.)
* 20News: the topic of news articles as either baseball (neg.) or hockey (pos.)
### Model
Same as J&W:
* single-layer bidirectional LSTM with tanh activation
* attention layer
* softmax prediction
* hyperparameters are set to be the same as in J&W
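
To make the setup concrete, here is a minimal PyTorch sketch of such a model. All names (`AttnBiLSTM`, `hidden_dim`, ...) and dimensions are my own assumptions for illustration, not code from either paper.

```python
import torch
import torch.nn as nn

class AttnBiLSTM(nn.Module):
    """Single-layer BiLSTM + additive (tanh) attention + softmax classifier.
    A rough sketch of the architecture described above; hyperparameters are made up."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.attn_proj = nn.Linear(2 * hidden_dim, hidden_dim)
        self.attn_vec = nn.Linear(hidden_dim, 1, bias=False)
        self.out = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, tokens):                      # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens))        # (batch, seq_len, 2*hidden)
        scores = self.attn_vec(torch.tanh(self.attn_proj(h)))  # (batch, seq_len, 1)
        alpha = torch.softmax(scores, dim=1)        # attention weights over tokens
        context = (alpha * h).sum(dim=1)            # weighted sum of hidden states
        return torch.softmax(self.out(context), dim=-1), alpha.squeeze(-1)
```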
### Uniform as the Adversary
> Attention is not explanation if you don’t need it
On some tasks, giving every token the same weight works about as well as learned attention; on those tasks, attention indeed adds little.
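
As a rough illustration (assuming the hypothetical `AttnBiLSTM` sketched above, not the authors' code), the uniform baseline simply replaces the learned attention distribution with a constant one:

```python
import torch

def forward_with_uniform_attention(model, tokens):
    """Replace learned attention with a uniform distribution over tokens.
    Assumes the hypothetical AttnBiLSTM interface sketched earlier."""
    h, _ = model.lstm(model.embed(tokens))          # (batch, seq_len, 2*hidden)
    seq_len = h.size(1)
    alpha = torch.full((h.size(0), seq_len, 1), 1.0 / seq_len, device=h.device)
    context = (alpha * h).sum(dim=1)                # every token weighted equally
    return torch.softmax(model.out(context), dim=-1)
```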
### Variance within a Model
(To be honest, I am not entirely sure what this experiment is trying to show.)
* Test whether the variances observed by J&W between trained attention scores and adversarially-obtained ones are unusual.
* Train 8 models with different initial seeds
* Plot the distribution of the Jensen-Shannon divergence (JSD) between the attention weights produced by these models (see the sketch after this list)
* IMDb, SST, and Anemia are robust to seed changes.
* (e): the adversarial attentions produced by J&W are indeed very different from those of the base model
* (d): negative-label instances in the Diabetes dataset get relatively arbitrary attention distributions across the different random seeds, so attention weights with very different distributions can still perform well
* (f): this suggests that the adversarial attentions produced by J&W are not adversarial enough
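
For reference, here is a small sketch of how the JSD between two attention distributions over the same instance could be computed; the implementation details (including the base-2 logarithm) are my own choices, not the paper's code.

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two attention distributions
    (1-D arrays that each sum to 1)."""
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))   # KL divergence in bits
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# e.g. compare attention from two models trained with different seeds:
# jsd(alpha_seed1[i], alpha_seed2[i]) for each instance i
```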

### Training an Adversary
So how can a reasonable adversary be constructed? As noted above, attention weights do not exist independently of the model, so each new set of attention weights should come with a newly trained model.
* Given a base model $M_b$, train a model $M_a$ which
- can provide similar prediction scores
- its distribution of attention weights should be very different from $M_b$
* Loss function: $L(M_a, M_b)=\sum^N_{i=1}TVD(\hat{y}_a^i,\hat{y}_b^i)-\lambda KL(\alpha_a^i||\alpha_b^i)$
where $TVD(\hat{y}_a^i,\hat{y}_b^i)=\frac{1}{2}\sum_{y}|\hat{y}_{a}^i(y)-\hat{y}_{b}^i(y)|$ is the total variation distance between the two output distributions
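
A rough PyTorch sketch of this objective for one batch is given below; the variable names, the `eps` smoothing, and the reduction over the batch are my own assumptions.

```python
import torch

def adversarial_loss(y_a, y_b, alpha_a, alpha_b, lam=1e-3, eps=1e-12):
    """TVD(predictions) - lambda * KL(attention), matching the loss above.
    y_*: (batch, num_classes) predicted distributions; alpha_*: (batch, seq_len) attention."""
    tvd = 0.5 * (y_a - y_b).abs().sum(dim=-1)                               # per-instance TVD
    kl = (alpha_a * ((alpha_a + eps) / (alpha_b + eps)).log()).sum(dim=-1)  # KL(alpha_a || alpha_b)
    return (tvd - lam * kl).sum()
```

Minimizing this loss keeps $M_a$'s predictions close to $M_b$'s (small TVD) while pushing its attention distributions away from $M_b$'s (large KL), which is exactly what the adversary needs.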

* In the figure below:
    * the more convex the curve, the more the attention weights can be manipulated
    * the x-axis is the attention JSD between a given model and $M_b$
    * legend:
        * triangles: models trained with different random seeds at a fixed $\lambda$ (the best $\lambda$ for each dataset)
        * squares: the uniform-weight model
        * plus signs: J&W's adversarial model
        * dots: models trained with different values of $\lambda$

### Diagnosing attention distribution by guiding simpler models
With RNN-style models, neighbouring tokens influence each other, and it is hard to rule out the effect of that context on explainability. This paper therefore uses an MLP (a non-contextual model that cannot look at its left or right neighbours) to diagnose whether attention weights provide better guides.

* take a set of pretrained attention weights (compared against letting the MLP learn its own attention)
* train the MLP
* the experiments find that weights from the original (LSTM) model still work best, so attention does provide a better guide (see the sketch below)
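
A minimal sketch of this diagnostic, assuming per-token attention weights from the base model are supplied externally; the `GuidedMLP` name, dimensions, and pooling details are my own illustration.

```python
import torch
import torch.nn as nn

class GuidedMLP(nn.Module):
    """Non-contextual diagnostic model: each token embedding is transformed independently
    by an MLP, then pooled with externally supplied (frozen) attention weights."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.mlp = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.Tanh())
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, tokens, alpha):               # alpha: pretrained attention (batch, seq_len)
        h = self.mlp(self.embed(tokens))            # (batch, seq_len, hidden), no context mixing
        context = (alpha.unsqueeze(-1) * h).sum(dim=1)
        return torch.softmax(self.out(context), dim=-1)
```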

## Conclusion
* First, clearly define what an explanation is.
* J&W's experiments have quite a few holes; the authors provide more reasonable experimental designs.
* The MLP experiments show that attention is somehow meaningful.
* Future work: extend the experiments to QA tasks and to languages other than English, and bring in experts for evaluation.