paper
Before: complex recurrent or convolutional neural network
Proposed architecture: Transformer, based on attention mechanisms
Supplement (attention)
https://blog.csdn.net/Enjoy_endless/article/details/88679989
Supplement (self-attention)
From an image point of view, the receptive field is wider: self-attention sees the whole input at once, whereas a CNN only looks at one kernel-sized window at a time.
After passing through the encoder, we can tell which parts of the input are important.
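A minimal NumPy sketch of single-head scaled dot-product self-attention (toy sizes and names are assumed, not from the paper's code), showing that every position attends over the whole sequence:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.
    X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # (seq_len, seq_len): every token scores every token
    weights = softmax(scores, axis=-1)        # attention weights over the whole sequence
    return weights @ V                        # (seq_len, d_k)

# Toy example: 5 tokens, d_model = 8, d_k = 4
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 4)
```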
Supplement (residual connection)
source: arXiv:1512.03385
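A hedged sketch of the idea in arXiv:1512.03385: the sub-layer only has to learn the residual F(x), and the input x is added back to its output (function names here are illustrative):

```python
import numpy as np

def residual_block(x, sublayer):
    """Output = x + F(x): the sub-layer only learns the residual."""
    return x + sublayer(x)

x = np.ones(4)
print(residual_block(x, lambda v: 0.1 * v))   # [1.1 1.1 1.1 1.1]
```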
Supplement (batch normalization)
encoder: N = 6, produce outputs of dimension \(d_{model} = 512\)
Add: residual connection
Norm: layer normalization
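A minimal Keras-style sketch of the Add & Norm step around one encoder sub-layer (residual connection, then layer normalization); an assumption-level illustration, not the paper's code:

```python
import tensorflow as tf

d_model = 512

def add_and_norm(x, sublayer_output):
    """LayerNorm(x + Sublayer(x)), as applied around each sub-layer."""
    return tf.keras.layers.LayerNormalization(epsilon=1e-6)(x + sublayer_output)

x = tf.random.normal((2, 10, d_model))                                         # (batch, seq_len, d_model)
attn_out = tf.keras.layers.MultiHeadAttention(num_heads=8, key_dim=64)(x, x)   # self-attention sub-layer
y = add_and_norm(x, attn_out)
print(y.shape)   # (2, 10, 512)
```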
Another person's explanation:
source: https://jalammar.github.io/illustrated-transformer/
My understanding:
paper: In "encoder-decoder attention" layers, the queries come from the previous decoder layer, and the memory keys and values come from the output of the encoder.
Decoder: N = 6
\(h = 8, d_k = d_v = d_{model}/h = 64\)
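A quick check of the head dimensions: each of the h = 8 heads works in d_k = d_v = 64, and concatenating the heads restores d_model = 512 (the sequence length is an assumed toy value):

```python
import numpy as np

d_model, h = 512, 8
d_k = d_v = d_model // h       # 64
seq_len = 10

# Each head outputs (seq_len, d_v); concatenating h heads gives back d_model.
heads = [np.zeros((seq_len, d_v)) for _ in range(h)]
concat = np.concatenate(heads, axis=-1)
print(d_k, concat.shape)       # 64 (10, 512)
```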
Dense(dff, activation='relu')
fully connected: \(FFN(x) = \max(0, xW_1 + b_1)W_2 + b_2 = \mathrm{ReLU}(xW_1 + b_1)W_2 + b_2\)
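A hedged Keras sketch of the position-wise FFN matching the formula above; the paper's inner dimension is d_ff = 2048:

```python
import tensorflow as tf

d_model, d_ff = 512, 2048

# FFN(x) = ReLU(x W1 + b1) W2 + b2, applied to each position independently
ffn = tf.keras.Sequential([
    tf.keras.layers.Dense(d_ff, activation='relu'),  # x W1 + b1, then ReLU
    tf.keras.layers.Dense(d_model),                  # W2, b2
])

x = tf.random.normal((2, 10, d_model))
print(ffn(x).shape)   # (2, 10, 512)
```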
Because using binary position values would waste space, we can use their continuous float counterparts: sinusoidal functions.
source: https://kazemnejad.com/blog/transformer_architecture_positional_encoding/
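A minimal NumPy implementation of the sinusoidal positional encoding, \(PE(pos, 2i) = \sin(pos / 10000^{2i/d_{model}})\) and \(PE(pos, 2i+1) = \cos(pos / 10000^{2i/d_{model}})\) (max_len is an assumed toy value):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal PE: even dimensions use sin, odd dimensions use cos."""
    pos = np.arange(max_len)[:, None]                  # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]               # (1, d_model/2)
    angle = pos / np.power(10000, 2 * i / d_model)     # (max_len, d_model/2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

print(positional_encoding(50, 512).shape)   # (50, 512)
```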
How does the Transformer handle variable-length inputs?
The sequence length is independent of the weight dimensions.
Below we only track the dimension changes; softmax does not change dimensions.
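A small dimension check supporting this: the weight matrices depend only on d_model and d_k, so the same weights handle any sequence length (toy sizes assumed):

```python
import numpy as np

d_model, d_k = 512, 64
Wq = np.random.randn(d_model, d_k)     # weight shape is independent of sequence length

for seq_len in (3, 17, 100):           # variable-length inputs
    X = np.random.randn(seq_len, d_model)
    Q = X @ Wq                         # (seq_len, d_k): only the length dimension changes
    print(seq_len, Q.shape)
```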
How are the decoder and encoder connected?
source: https://jalammar.github.io/illustrated-transformer/
paper: In “encoder-decoder attention” layers, the queries come from the previous decoder layer, and the memory keys and values come from the output of the encoder.
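A hedged Keras sketch of that connection: in encoder-decoder attention the query comes from the decoder and the key/value (memory) come from the encoder output (batch size and lengths are assumed toy values):

```python
import tensorflow as tf

d_model = 512
enc_output = tf.random.normal((2, 12, d_model))   # encoder output: memory keys and values
dec_hidden = tf.random.normal((2, 7, d_model))    # output of the previous decoder layer: queries

cross_attn = tf.keras.layers.MultiHeadAttention(num_heads=8, key_dim=64)
out = cross_attn(query=dec_hidden, value=enc_output, key=enc_output)
print(out.shape)   # (2, 7, 512): one output vector per decoder position
```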
paper: The encoder is composed of a stack of N = 6 identical layers. Each layer has two sub-layers.
Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding
paper: we share the same weight matrix between the two embedding layers and the pre-softmax linear transformation
Why can the weights be shared? It is related to NLP subwords: the source and target sides use a shared subword (BPE) vocabulary, so one embedding matrix can serve both embedding layers and the output projection.
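A minimal sketch of the weight sharing: one embedding matrix E both embeds tokens and, transposed, serves as the pre-softmax projection (the vocabulary size here is an assumed toy value):

```python
import numpy as np

vocab_size, d_model = 32000, 512
E = np.random.randn(vocab_size, d_model) * 0.02   # shared embedding matrix

tokens = np.array([5, 42, 7])
x = E[tokens]          # embedding lookup: (3, d_model)
logits = x @ E.T       # pre-softmax linear transformation reuses E^T: (3, vocab_size)
print(x.shape, logits.shape)
```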
Why do we need the FFN, when self-attention is already followed by a linear transformation?
The ReLU adds a non-linearity, which improves performance.