---
title: Attention is not not explanation
date: 2020-06-24 15:12:00
comments: true
author: Darcy
categories:
- nlp study group
tags:
- NLP
---
###### tags: `study` `paper` `DSMI lab`
paper: [Attention is not not explanation](https://arxiv.org/pdf/1908.04626.pdf)
(I'm not a fan of how this paper is written: lots of double negatives, and the structure is a bit messy.)
## Introduction
This post is about a paper written in response to "Attention is not Explanation" (hereafter ==J&W==). Its goal is not to claim that attention *is* explanatory, but to argue that "attention is not necessarily non-explanatory." The paper points out several potential problems in J&W's experiments and arguments, and proposes some better analysis approaches (although I am not fully convinced).
<!-- more -->
## Main Claim
1. Attention Distribution is not a Primitive
> The base attention weights are not assigned arbitrarily by the model, but rather computed by an integral component whose parameters were trained alongside the rest of the layers; the way they work depends on each other.
Attention weights are learned jointly with the rest of the model; they do not exist independently of it. J&W's experiment that randomly permutes attention weights is therefore not entirely appropriate: when constructing an adversary, the model should be retrained.
2. Existence does not Entail Exclusivity
> We hold that attention scores are used as providing an explanation; not the explanation.
Finding another plausible explanation does not mean the original one is not an explanation; explanations are not unique. This is especially true for binary classification tasks: the input is many words (very high-dimensional) while the output is a single value between 0 and 1, so the dimensionality reduction leaves a lot of room for flexibility.
## Defining Explanation
J&W do not clearly define what an explanation is; in fact, prior literature defines explanation in different ways. When discussing explainability in AI, the following three terms come up frequently.
1. transparency: identifying the parts of a model that humans can understand.
> Attention mechanisms do provide a look into the inner workings of a model, as they produce an easily-understandable weighting of hidden states.
2. explainability
    - can support decision making
    - can mimic the human ability to make inferences from what has happened in the past
3. interpretability
    - the relationship between input and output (similar to a regressor in regression)
    - requires domain experts to help verify
## Experiments
### Dataset
Same experiments as J&W, but only the binary classification tasks; the QA tasks are not included.

* Diabetes: whether a patient is diagnosed with diabetes from their ICU discharge summary
* Anemia: whether the patient is diagnosed with acute (neg.) or chronic (pos.) anemia
* IMDb: positive or negative sentiment from movie reviews
* SST: positive or negative sentiment from sentences
* AgNews: the topic of news articles as either world (neg.) or business (pos.)
* 20News: the topic of news articles as either baseball (neg.) or hockey (pos.)
### Model
Same as J&W:
* single-layer bidirectional LSTM with tanh activation
* attention layer
* softmax prediction
* hyperparameters are set to be the same as in J&W
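
To make the setup concrete, here is a minimal PyTorch sketch of such a model. All names (`AttnBiLSTM`, `hidden_dim`, ...) and dimensions are my own assumptions for illustration, not code from either paper.

```python
import torch
import torch.nn as nn

class AttnBiLSTM(nn.Module):
    """Single-layer BiLSTM + additive (tanh) attention + softmax classifier.
    A rough sketch of the architecture described above; hyperparameters are made up."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.attn_proj = nn.Linear(2 * hidden_dim, hidden_dim)
        self.attn_vec = nn.Linear(hidden_dim, 1, bias=False)
        self.out = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, tokens):                      # tokens: (batch, seq_len)
        h, _ = self.lstm(self.embed(tokens))        # (batch, seq_len, 2*hidden)
        scores = self.attn_vec(torch.tanh(self.attn_proj(h)))  # (batch, seq_len, 1)
        alpha = torch.softmax(scores, dim=1)        # attention weights over tokens
        context = (alpha * h).sum(dim=1)            # weighted sum of hidden states
        return torch.softmax(self.out(context), dim=-1), alpha.squeeze(-1)
```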
### Uniform as the Adversary
> Attention is not explanation if you don’t need it
On some tasks, giving every token the same weight works about as well as learned attention; on those tasks, attention indeed adds little.
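
As a rough illustration (assuming the hypothetical `AttnBiLSTM` sketched above, not the authors' code), the uniform baseline simply replaces the learned attention distribution with a constant one:

```python
import torch

def forward_with_uniform_attention(model, tokens):
    """Replace learned attention with a uniform distribution over tokens.
    Assumes the hypothetical AttnBiLSTM interface sketched earlier."""
    h, _ = model.lstm(model.embed(tokens))          # (batch, seq_len, 2*hidden)
    seq_len = h.size(1)
    alpha = torch.full((h.size(0), seq_len, 1), 1.0 / seq_len, device=h.device)
    context = (alpha * h).sum(dim=1)                # every token weighted equally
    return torch.softmax(model.out(context), dim=-1)
```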
### Variance within a Model
(To be honest, I am not entirely sure what this experiment is trying to show.)
* Test whether the variances observed by J&W between trained attention scores and adversarially-obtained ones are unusual.
* Train 8 models with different initial seeds
* Plot the distribution of the Jensen-Shannon divergence (JSD) between the attention weights produced by these models (see the sketch after this list)
* IMDb, SST, and Anemia are robust to seed changes.
* (e): the adversarial attentions produced by J&W are indeed very different from those of the base model
* (d): negative-label instances in the Diabetes dataset get relatively arbitrary attention distributions across the different random seeds, so attention weights with very different distributions can still perform well
* (f): this suggests that the adversarial attentions produced by J&W are not adversarial enough
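
For reference, here is a small sketch of how the JSD between two attention distributions over the same instance could be computed; the implementation details (including the base-2 logarithm) are my own choices, not the paper's code.

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two attention distributions
    (1-D arrays that each sum to 1)."""
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log2(a / b))   # KL divergence in bits
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# e.g. compare attention from two models trained with different seeds:
# jsd(alpha_seed1[i], alpha_seed2[i]) for each instance i
```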

### Training an Adversary
So how can a reasonable adversary be constructed? As noted above, attention weights do not exist independently of the model, so each new set of attention weights should come with a newly trained model.
* Given a base model $M_b$, train a model $M_a$ which
- can provide similar prediction scores
- its distribution of attention weights should be very different from $M_b$
* Loss function: $L(M_a, M_b)=\sum^N_{i=1}TVD(\hat{y}_a^i,\hat{y}_b^i)-\lambda KL(\alpha_a^i||\alpha_b^i)$
where $TVD(\hat{y}_a^i,\hat{y}_b^i)=\frac{1}{2}\sum_{y}|\hat{y}_{a}^i(y)-\hat{y}_{b}^i(y)|$ is the total variation distance between the two output distributions
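
A rough PyTorch sketch of this objective for one batch is given below; the variable names, the `eps` smoothing, and the reduction over the batch are my own assumptions.

```python
import torch

def adversarial_loss(y_a, y_b, alpha_a, alpha_b, lam=1e-3, eps=1e-12):
    """TVD(predictions) - lambda * KL(attention), matching the loss above.
    y_*: (batch, num_classes) predicted distributions; alpha_*: (batch, seq_len) attention."""
    tvd = 0.5 * (y_a - y_b).abs().sum(dim=-1)                               # per-instance TVD
    kl = (alpha_a * ((alpha_a + eps) / (alpha_b + eps)).log()).sum(dim=-1)  # KL(alpha_a || alpha_b)
    return (tvd - lam * kl).sum()
```

Minimizing this loss keeps $M_a$'s predictions close to $M_b$'s (small TVD) while pushing its attention distributions away from $M_b$'s (large KL), which is exactly what the adversary needs.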

* In the figure below:
    * the more convex the curve, the more the attention weights can be manipulated
    * the x-axis is the attention JSD between a given model and $M_b$
    * legend:
        * triangles: models trained with different random seeds at a fixed $\lambda$ (the best $\lambda$ for each dataset)
        * squares: the uniform-weight model
        * plus signs: J&W's adversarial model
        * dots: models trained with different values of $\lambda$

### Diagnosing attention distribution by guiding simpler models
With RNN-style models, neighbouring tokens influence each other, and it is hard to rule out the effect of that context on explainability. This paper therefore uses an MLP (a non-contextual model that cannot look at its left or right neighbours) to diagnose whether attention weights provide better guides.

* take a set of pretrained attention weights (compared against letting the MLP learn its own attention)
* train the MLP
* the experiments find that weights from the original (LSTM) model still work best, so attention does provide a better guide (see the sketch below)
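
A minimal sketch of this diagnostic, assuming per-token attention weights from the base model are supplied externally; the `GuidedMLP` name, dimensions, and pooling details are my own illustration.

```python
import torch
import torch.nn as nn

class GuidedMLP(nn.Module):
    """Non-contextual diagnostic model: each token embedding is transformed independently
    by an MLP, then pooled with externally supplied (frozen) attention weights."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.mlp = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.Tanh())
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, tokens, alpha):               # alpha: pretrained attention (batch, seq_len)
        h = self.mlp(self.embed(tokens))            # (batch, seq_len, hidden), no context mixing
        context = (alpha.unsqueeze(-1) * h).sum(dim=1)
        return torch.softmax(self.out(context), dim=-1)
```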

## Conclusion
* First, clearly define what an explanation is.
* J&W's experiments have quite a few holes; the authors provide more reasonable experimental designs.
* The MLP experiments show that attention is somehow meaningful.
* Future work: extend the experiments to QA tasks and to languages other than English, and bring in experts for evaluation.