[Transformer] Self-Attention and the Transformer
tags:
Literature Reading
Self-Attention
Transformer
ViT
Notes on ViT and the Transformer
Concepts and context of the attention mechanism
Attention is a core ingredient of ‘conscious’ AI (ICLR 2020 Yoshua Bengio)
The focus of human visual attention
The attention model family
The importance of the Transformer and its breakthrough ideas
The fourth major class of deep learning models
On the Opportunities and Risks of Foundation Models
Comparison with CNN and RNN
Notation
Characteristics of the self-attention mechanism
Original paper: Attention Is All You Need
Overall structure of the Transformer model
Model architecture

Each word simultaneously computes its similarity with itself and with every other word to obtain attention scores.
This can be understood as building a similarity-score matrix between all the words in the vocabulary.
Every word (token) in the sentence simultaneously serves as the Query, the Key, and the Value.
Attention scores
Attention weights = $QK^\top$ (each query is dotted against every key; see the sketch below)
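The following is a minimal NumPy sketch of that score matrix, assuming toy dimensions and random placeholder projections `W_q` and `W_k` (not values from the paper): the same token embeddings are projected to Q and K, and $QK^\top$ gives the raw similarity of every token against every other token.

```python
import numpy as np

# Toy example: 4 tokens, embedding dimension 8 (sizes are arbitrary).
# In self-attention, Q and K are linear projections of the SAME token
# embeddings, so every token acts as both a query and a key.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))   # (seq_len, d_model)
W_q = rng.normal(size=(8, 8))      # placeholder query projection
W_k = rng.normal(size=(8, 8))      # placeholder key projection

Q = tokens @ W_q                   # (seq_len, d_k)
K = tokens @ W_k                   # (seq_len, d_k)

# Raw attention scores: the (seq_len x seq_len) similarity matrix
# of every token against every other token described above.
scores = Q @ K.T
print(scores.shape)                # (4, 4)
```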
Encoder:
Using global (whole-sentence) information, compute the relevance of each embedding vector (in NLP, each word) to all the other words, obtaining that word's attention scores over every word.
Decoder:
Use the current embedding vector as Q and query it against the representations (attention matrix) learned by the encoder; a cross-attention sketch follows below.
At inference time, the decoder generates tokens autoregressively, feeding each predicted token back in as input for the next step.
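Below is a rough NumPy sketch of that decoder-side query (cross-attention), assuming toy sizes and random placeholder projections `W_q`, `W_k`, `W_v`: the decoder's current embedding supplies the query, while the encoder outputs supply the keys and values.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
d_model = 8
enc_out = rng.normal(size=(6, d_model))    # encoder outputs for 6 source tokens
dec_state = rng.normal(size=(1, d_model))  # current decoder embedding vector

# Cross-attention: the decoder state is projected to Q, the encoder
# outputs are projected to K and V (projection matrices are placeholders).
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q = dec_state @ W_q
K = enc_out @ W_k
V = enc_out @ W_v

weights = softmax(Q @ K.T / np.sqrt(d_model))  # (1, 6) attention over source tokens
context = weights @ V                          # (1, d_model) context vector
print(weights.round(2), context.shape)
```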
Self-Attention
Scaled Dot-Product Attention
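As a quick reference, a small NumPy sketch of the formula from the paper, $\mathrm{Attention}(Q,K,V)=\mathrm{softmax}(QK^\top/\sqrt{d_k})\,V$; the toy input sizes below are arbitrary.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # scale by sqrt(d_k) to keep scores well-behaved
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V, weights

rng = np.random.default_rng(2)
x = rng.normal(size=(5, 16))                     # 5 tokens, d_k = 16 (toy sizes)
out, w = scaled_dot_product_attention(x, x, x)   # self-attention: Q = K = V
print(out.shape, w.shape)                        # (5, 16) (5, 5)
```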
Multi-head Attention
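A simplified multi-head sketch, assuming random placeholder per-head projections and a plain Python loop over the heads (real implementations batch the heads into single matrix multiplies): each head attends in its own lower-dimensional subspace, and the head outputs are concatenated and mixed by an output projection.

```python
import numpy as np

def multi_head_attention(x, num_heads, rng):
    """Minimal multi-head self-attention sketch (weights are random placeholders)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Each head has its own Q/K/V projections into a smaller subspace.
        W_q = rng.normal(size=(d_model, d_head))
        W_k = rng.normal(size=(d_model, d_head))
        W_v = rng.normal(size=(d_model, d_head))
        Q, K, V = x @ W_q, x @ W_k, x @ W_v
        scores = Q @ K.T / np.sqrt(d_head)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w = w / w.sum(axis=-1, keepdims=True)    # softmax over keys
        heads.append(w @ V)
    # Concatenate the heads and mix them with an output projection W_o.
    W_o = rng.normal(size=(d_model, d_model))
    return np.concatenate(heads, axis=-1) @ W_o

rng = np.random.default_rng(3)
x = rng.normal(size=(5, 16))
print(multi_head_attention(x, num_heads=4, rng=rng).shape)   # (5, 16)
```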
Positional Encoding
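A sketch of the sinusoidal positional encoding defined in the paper: sine on the even dimensions and cosine on the odd dimensions, added to the token embeddings so the model can use word order; the sequence length and model dimension here are arbitrary toy values.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(...)."""
    pos = np.arange(seq_len)[:, None]                # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]             # (1, d_model/2)
    angle = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)                      # even dimensions
    pe[:, 1::2] = np.cos(angle)                      # odd dimensions
    return pe

pe = sinusoidal_positional_encoding(seq_len=10, d_model=16)
print(pe.shape)   # (10, 16): added element-wise to the token embeddings
```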
Recommended learning resources