[Explainable AI] Transformer Interpretability Beyond Attention Visualization. Transformer Interpretability and Visualization
tags:
Literature Reading
XAI
Visualization
Interpretability
Transformer
Entry page for AI / ML study notes
Notes on ViT and Transformers
Transformer Interpretability Beyond Attention Visualization
Official code
Core concepts
Prerequisites
Transformer architecture and Self-Attention
References
Self-Attention
Scaled Dot-Product Attention
Multi-head Attention
Overall structure of the Transformer model
Explainability in computer vision
Given an input image and a CNN, many methods have been proposed for generating heatmaps that indicate local relevance. Most of these methods fall into one of two categories: gradient-based methods and attribution propagation methods.
XAI - Gradient based methods
Intuitive understanding of gradients
Gradient based methods
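As a concrete illustration of the gradient-based family (not the method of this paper), here is a minimal PyTorch sketch of a vanilla gradient (saliency) map; the tiny CNN and the random input tensor are placeholders for a real classifier and a preprocessed image.

```python
import torch
import torch.nn as nn

# Minimal vanilla-gradient (saliency) sketch for the gradient-based family.
# The tiny CNN below and the random input are placeholders.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),            # 10 placeholder classes
).eval()

image = torch.rand(1, 3, 224, 224, requires_grad=True)
logits = model(image)
target_class = logits.argmax(dim=1).item()

# Back-propagate the target-class score to the input pixels.
logits[0, target_class].backward()

# Saliency map: per-pixel gradient magnitude, max over colour channels.
saliency = image.grad.abs().max(dim=1)[0]    # shape (1, 224, 224)
```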
XAI - Attribution propagation methods
LRP (Layer-wise Relevance Propagation)
Following Deep Taylor Decomposition (DTD), relevance is back-propagated from the predicted class to the input image.
The total relevance is conserved at every layer.
- Explainable AI explained! | #6 Layerwise Relevance Propagation with MRI data
How each neuron's relevance score (contribution) is computed
Looks at each neuron's share of the overall influence (global), rather than weight sensitivity (local)
Compute each neuron's relevance score (contribution)
Redistribute the relevance scores to the preceding layer (outputs → inputs)
Recursively back-propagate until the input layer is reached (a minimal sketch of the rule follows below)
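A minimal numpy sketch of the basic LRP-ε rule for one fully connected layer, illustrating the conservation and redistribution steps listed above; the function name `lrp_linear` and the toy two-layer network are illustrative, not the paper's implementation.

```python
import numpy as np

def lrp_linear(a, W, b, R_out, eps=1e-6):
    """Basic LRP-epsilon rule for one fully connected layer.

    a     : activations entering the layer, shape (d_in,)
    W     : weight matrix, shape (d_in, d_out)
    b     : bias, shape (d_out,)
    R_out : relevance of the layer's outputs, shape (d_out,)
    Returns relevance of the inputs, shape (d_in,); the total is (approximately) conserved.
    """
    z = a @ W + b                     # forward pre-activations
    z = z + eps * np.sign(z)          # stabiliser to avoid division by zero
    s = R_out / z                     # share of relevance per output unit
    return a * (W @ s)                # redistribute: R_j = a_j * sum_k w_jk * s_k

# Toy usage: two layers, propagate relevance from the output back to the input.
rng = np.random.default_rng(0)
a0 = rng.random(4); W1 = rng.normal(size=(4, 3)); b1 = np.zeros(3)
a1 = np.maximum(a0 @ W1 + b1, 0);    W2 = rng.normal(size=(3, 2)); b2 = np.zeros(2)
R2 = np.array([1.0, 0.0])            # start: all relevance on the predicted class
R1 = lrp_linear(a1, W2, b2, R2)
R0 = lrp_linear(a0, W1, b1, R1)      # relevance at the input
```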
Other LRP references
Key points of the paper
See Intro to Transformers and Transformer Explainability (the authors' talk, from the 45-minute mark)
1. Start from the attention matrix
2. How multiple attention maps are aggregated
Attention rollout (Abnar et al., 2020)
Transformer Interpretability Beyond Attention Visualization (Chefer et al., 2021)
The attention heads are aggregated using both the relevance maps of the attention matrices and their gradients
Relevance maps (LRP)
Layer-Wise Relevance Propagation (LRP) formula
The LRP propagation process
Propagates from the output layer back to the input layer
By defining each layer's contribution and relevance to the layer above it, relevance is propagated from the final output, layer by layer, down to the pixel level of the original image
The relevance map determines how much each similarity score influences the output
The attention map only contains the similarity values \(Q \cdot K^T\). Within a Transformer block, however, the attention map is further processed by other network layers (e.g. it is also multiplied by a linear layer), so the raw similarity values alone cannot fully reflect how much a particular token (patch) influences the class prediction. The paper therefore proposes computing relevance scores (LRP relevance values) instead.
Gradients
Weighted average of the heads
\(\overline{A}^{(b)} = I + \mathbb{E}_h\left[\left(\nabla A^{(b)} \odot R^{(n_b)}\right)^{+}\right]\quad\quad(13)\)
\(C = \overline{A}^{(1)} \cdot \overline{A}^{(2)} \cdots \overline{A}^{(B)} \quad\quad(14)\)
\(C \in \mathbb{R}^{s \times s}\): the weighted attention relevance map
\((\nabla A \odot R)^+\)
Uses the gradients as weights, multiplied element-wise with the LRP relevance map
Only positive values are kept when computing the gradient weights
\(\mathbb{E}_h\) is the mean across the “heads” dimension
\(I\): because of the skip connections in the Transformer block, an identity matrix is added when computing the attention relevance map, so that each token's (patch's) own features are not lost
To improve on rollout's simple averaging over the heads, this work exploits the fact that gradient magnitudes respond to the class signal: the relevance map (LRP) is multiplied element-wise by the gradients, i.e. class-specific weights are obtained, yielding a weighted average over the attention heads
The weighted attention maps of the individual layers are then aggregated by matrix multiplication (layer aggregation by matrix multiplication), see Eq. (14)
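Assuming the per-block attention-map gradients and LRP relevance maps have already been computed elsewhere, a minimal numpy sketch of how Eqs. (13) and (14) combine them could look as follows; the function name `relevance_rollout` and the toy shapes are illustrative.

```python
import numpy as np

def relevance_rollout(grads, relevances):
    """Aggregate per-layer attention relevance as in Eqs. (13)-(14).

    grads, relevances: lists of length B with arrays of shape (h, s, s),
    i.e. the attention-map gradients and LRP relevance per head for each block
    (assumed to be pre-computed elsewhere). Returns C of shape (s, s).
    """
    s = grads[0].shape[-1]
    C = np.eye(s)
    for grad_A, R in zip(grads, relevances):
        A_bar = np.clip(grad_A * R, 0, None)   # (grad A ⊙ R)^+ : keep only positive values
        A_bar = A_bar.mean(axis=0)             # E_h : mean over the heads dimension
        A_bar = np.eye(s) + A_bar              # add identity for the skip connection
        C = C @ A_bar                          # Eq. (14): multiply the blocks in order
    return C

# Toy usage with random tensors (B=2 blocks, h=3 heads, s=5 tokens).
rng = np.random.default_rng(0)
grads = [rng.normal(size=(3, 5, 5)) for _ in range(2)]
rels  = [rng.normal(size=(3, 5, 5)) for _ in range(2)]
C = relevance_rollout(grads, rels)             # rows of C give per-token relevance
```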
Evaluation of the experimental results
Qualitative evaluation
Perturbation tests
Positive and negative perturbation
In positive perturbation, pixels are masked in order of relevance score, from highest to lowest
In negative perturbation, pixels are masked in order of relevance score, from lowest to highest
In both cases, the area under the curve (AUC) is measured as 10%–90% of the pixels are removed (a minimal sketch follows below)
An example of ROC curves with good (AUC = 0.9) and satisfactory (AUC = 0.65) parameters of specificity and sensitivity
Simply ROC Curve
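A minimal numpy sketch of such a perturbation test, under the assumption that a callable `evaluate_accuracy(mask)` re-evaluates the model on the masked image; the function names and the 10%–90% grid follow the description above and are illustrative.

```python
import numpy as np

def perturbation_auc(relevance, evaluate_accuracy, positive=True,
                     fractions=np.arange(0.1, 0.91, 0.1)):
    """Mask pixels by relevance order and integrate accuracy over the removal fractions.

    relevance         : (H, W) relevance map for one image
    evaluate_accuracy : callable(mask) -> model accuracy with masked pixels removed
                        (stand-in for running the classifier on the perturbed image)
    positive          : True  -> remove the most relevant pixels first
                        False -> remove the least relevant pixels first
    """
    order = np.argsort(relevance.ravel())
    if positive:
        order = order[::-1]                    # highest relevance removed first
    scores = []
    for frac in fractions:
        k = int(frac * order.size)
        mask = np.ones(relevance.size, dtype=bool)
        mask[order[:k]] = False                # False = pixel removed
        scores.append(evaluate_accuracy(mask.reshape(relevance.shape)))
    scores = np.asarray(scores)
    # Trapezoidal area under the accuracy-vs-fraction curve.
    return float(np.sum((scores[1:] + scores[:-1]) / 2 * np.diff(fractions)))
```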
Perturbation test results
Segmentation
The segmentation metrics (pixel accuracy, mAP, and mIoU)
Mean Average Precision (mAP) explained
Average Precision (AP)
WHAT IS THE DIFFERENCE BETWEEN PRECISION-RECALL CURVE VS ROC-AUC CURVE?
Precision and Recall Made Simple
mAP (mean average precision) is the average of multiple AP values (a minimal sketch of per-class AP follows below)
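A minimal numpy sketch of per-class AP computed from ranked prediction scores (mAP is then the mean over classes); the function `average_precision` is illustrative and not tied to any particular benchmark's AP definition.

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one class: mean precision at the ranks of the true positives,
    which equals the area under the step-wise precision-recall curve."""
    order = np.argsort(scores)[::-1]                 # rank predictions by confidence
    labels = np.asarray(labels, dtype=bool)[order]
    precision_at_k = np.cumsum(labels) / np.arange(1, labels.size + 1)
    n_pos = max(int(labels.sum()), 1)
    return float(precision_at_k[labels].sum() / n_pos)

# Toy usage; mAP is simply the mean of average_precision over all classes.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
labels = np.array([1, 0, 1, 0, 1])
ap = average_precision(scores, labels)               # -> about 0.76
```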
Mean Intersection over Union (mIoU) explained
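A minimal numpy sketch of IoU and mIoU for binary segmentation masks, as used in the segmentation comparison; the function names are illustrative.

```python
import numpy as np

def iou(pred_mask, gt_mask):
    """Intersection over Union for one binary mask pair."""
    pred_mask = pred_mask.astype(bool)
    gt_mask = gt_mask.astype(bool)
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return intersection / union if union > 0 else 0.0

def mean_iou(pred_masks, gt_masks):
    """mIoU: average IoU over a set of (class or image) mask pairs."""
    return float(np.mean([iou(p, g) for p, g in zip(pred_masks, gt_masks)]))
```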
Language reasoning
Ablation study
Model settings
Algorithm for the models
Settings for the models
Results
Testing with an eBird MAE model
Method
Qualitative image tests
Discussion
References
Transformer Interpretability Beyond Attention Visualization