Attention Mechanism: Concepts and Context
-
"Attention is a core ingredient of 'conscious' AI." (Yoshua Bengio, ICLR 2020)
The attention mechanism (Attention) lets a person (or an algorithm) focus on one or a few elements at a time. In machine learning it is a data-processing method that is widely used in natural language processing, image recognition, speech recognition, and many other kinds of tasks.
In his ICLR 2020 talk, Bengio drew on the two cognitive systems that psychologist and economist Daniel Kahneman describes in his book Thinking, Fast and Slow.
- System 1 is unconscious: intuitive and fast, non-verbal and habitual, and it handles only implicit knowledge.
- System 2 is conscious: linguistic and algorithmic; it combines reasoning and planning and works with explicit forms of knowledge.
-
The focus of human visual attention
- How many faces can you spot in the image below?
-
The family of attention models
The fourth major class of deep-learning models
- In 2021, Stanford listed the transformer as the fourth major class of deep-learning models after the MLP, CNN, and RNN, and defined the family of transformer-based models as foundation models.
Comparison with CNN and RNN
-
Notation
- n : sequence length
- d : representation dimension
- k : kernel size of convolutions
-
Characteristics of self-attention
- Computation is fully parallelizable.
- The maximum path length between any two positions is the shortest possible (a constant 1).
- The shorter the path, the easier it is to capture/query global information.
- For a CNN, the larger the kernel size k, the larger the receptive field.
- Computational cost is high for long sequences (it grows quadratically with n); see the per-layer comparison after this list.
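For reference, the per-layer complexity and maximum path length reported in "Attention Is All You Need" (Vaswani et al., 2017, Table 1) can be summarized with the notation n, d, k defined above; this is the standard comparison from the paper, not something specific to these notes:

```latex
% Per-layer complexity and maximum path length (Vaswani et al., 2017, Table 1)
\text{Self-attention:} \quad O(n^{2}\cdot d) \text{ per layer}, \quad O(1) \text{ max path length}
\text{Recurrent:}      \quad O(n\cdot d^{2}) \text{ per layer}, \quad O(n) \text{ max path length}
\text{Convolutional:}  \quad O(k\cdot n\cdot d^{2}) \text{ per layer}, \quad O(\log_{k} n) \text{ max path length}
```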
-
Model architecture
-
Each token simultaneously computes the similarity between itself and every token (including itself) to obtain its attention scores.
-
This can be understood as building a matrix of similarity scores between all tokens in the input, as sketched below.
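A minimal NumPy sketch of this idea (the array names and sizes are illustrative assumptions, not from the original notes): for n tokens with d-dimensional embeddings, the pairwise dot products form an n × n similarity matrix.

```python
import numpy as np

n, d = 5, 8                  # assumed: 5 tokens, 8-dim embeddings
X = np.random.randn(n, d)    # token embeddings, one row per token

# Every token is compared against every token (including itself):
# entry [i, j] is the dot-product similarity between token i and token j.
similarity = X @ X.T         # shape (n, n)
print(similarity.shape)      # (5, 5)
```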
-
-
Every token in the sentence simultaneously serves as the Query, the Key, and the Value.
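In the standard Transformer formulation, the same token embeddings X are mapped by three learned projection matrices into Q, K, and V. The sketch below uses random matrices purely as stand-ins for learned weights; the shapes are illustrative assumptions.

```python
import numpy as np

n, d = 5, 8                  # assumed sizes: 5 tokens, model dimension 8
X = np.random.randn(n, d)    # the same embeddings play all three roles

# Learned projection matrices (random stand-ins here).
W_q = np.random.randn(d, d)
W_k = np.random.randn(d, d)
W_v = np.random.randn(d, d)

Q = X @ W_q                  # queries, shape (n, d)
K = X @ W_k                  # keys,    shape (n, d)
V = X @ W_v                  # values,  shape (n, d)
```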
-
Attention score
- i.e., the Value (V) of each embedding vector, multiplied by the attention weights (and summed over all tokens)
-
Attention weights = Q × Kᵀ
- This yields the relevance matrix between each token and every token.
- It is then scaled and passed through Softmax to become a probability distribution in (0, 1); the full formula is given below.
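Putting the two pieces together gives the standard scaled dot-product attention formula from "Attention Is All You Need", where the scaling factor is the square root of the key dimension d_k:

```latex
% Attention weights: scaled dot products of queries and keys, softmax-normalized.
% Output: those weights applied to the values.
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```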
-
encoder :
Using the global (whole-sequence) context, compute each embedding vector's (in NLP, each token's) relevance to every token in the input, giving that token's attention scores over all tokens.
-
decoder :
Masked Self-Attention: each position attends only to positions up to and including itself, so future tokens stay hidden during generation; a masking sketch follows below.
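A minimal sketch of the causal mask, assuming the score-matrix layout used above (rows = query positions, columns = key positions); entries above the diagonal are set to -inf before the softmax so their weights become zero. The values here are random stand-ins.

```python
import numpy as np

n = 5                                  # assumed sequence length
scores = np.random.randn(n, n)         # raw Q·Kᵀ scores (stand-in values)

# Causal mask: position i may only attend to positions j <= i.
mask = np.triu(np.ones((n, n), dtype=bool), k=1)
scores = np.where(mask, -np.inf, scores)

# Softmax row by row; masked entries receive zero attention weight.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
```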
-
-
Scaled Dot-Product Attention
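A compact, runnable NumPy version of the operation this section describes (a sketch, not a reference implementation; the function and argument names are my own):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return softmax(Q Kᵀ / sqrt(d_k)) V together with the attention weights."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (n, n) similarity matrix
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V, weights                       # (n, d), (n, n)

# Example: 5 tokens with 8-dimensional representations.
n, d = 5, 8
Q, K, V = (np.random.randn(n, d) for _ in range(3))
output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape, weights.shape)                    # (5, 8) (5, 5)
```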
Multi-head Attention
- Composed of multiple self-attention heads computed in parallel, whose outputs are concatenated and linearly projected.
- Gives the model flexible, multi-faceted attention, since each head can focus on different relationships; see the sketch below.
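A minimal multi-head sketch under the same assumptions as above (head count, dimensions, and the projection matrices are illustrative stand-ins for learned parameters):

```python
import numpy as np

def multi_head_attention(X, num_heads, W_q, W_k, W_v, W_o):
    """Split the projections into heads, run scaled dot-product attention per
    head, concatenate the head outputs, and apply the output projection."""
    n, d = X.shape
    d_head = d // num_heads
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    heads = []
    for h in range(num_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        scores = Q[:, sl] @ K[:, sl].T / np.sqrt(d_head)   # (n, n) per head
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)                 # softmax over keys
        heads.append(w @ V[:, sl])                         # (n, d_head)
    return np.concatenate(heads, axis=-1) @ W_o            # (n, d)

# Example with random stand-ins for the learned projection matrices.
n, d, num_heads = 5, 8, 2
X = np.random.randn(n, d)
W_q, W_k, W_v, W_o = (np.random.randn(d, d) for _ in range(4))
print(multi_head_attention(X, num_heads, W_q, W_k, W_v, W_o).shape)  # (5, 8)
```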
Position encoding
- CNNs and RNNs carry positional information inherently, but self-attention by itself does not.
- The encoding is added onto each token embedding (X) so that every token carries positional information.
- Positional encoding
- X + P
- P ∈ ℝ^(n×d) : one positional vector per position, matching the embedding dimension d
- P : the positional-encoding matrix (a common choice is the sinusoidal encoding shown below)
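Assuming the notes refer to the fixed sinusoidal encoding of the original Transformer (the most common choice, though learned encodings also exist), the entries of P are:

```latex
% Sinusoidal positional encoding (Vaswani et al., 2017):
% i indexes the position, and columns 2j / 2j+1 alternate sine and cosine.
P_{i,\,2j}   = \sin\!\left(\frac{i}{10000^{2j/d}}\right), \qquad
P_{i,\,2j+1} = \cos\!\left(\frac{i}{10000^{2j/d}}\right)
```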
Recommended learning resources