Transformer
tags: Deep Learning for Computer Vision
Self-Attention
query, key : used to calculate the relationship (similarity) between two words
softmax : normalizes the scalar scores \(a_{i,j}\) into attention weights between 0 and 1
value : the vector of each word, combined through a weighted sum with those attention weights to form the output
Step 1 :
Use \(X = \{x_1, x_2, ..., x_N\}\) to represent the \(N\) input tokens, and obtain the initial query, key, and value representations \(Q, K, V\) through a linear transformation \(W\).
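In the standard Transformer formulation the three projections use separate learned matrices (written here as \(W^Q, W^K, W^V\); the note above collapses them into a single \(W\)):

$$
Q = XW^Q, \quad K = XW^K, \quad V = XW^V
$$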
Step 2 :
Calculate the similarity score between each query \(q\) and key \(k\) by taking their inner product.
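Concretely, each score is the dot product of a query with a key; the original Transformer additionally scales by \(\sqrt{d_k}\) (the key dimension), which is assumed here:

$$
a_{i,j} = \frac{q_i \cdot k_j}{\sqrt{d_k}}
$$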
Step 3 :
Apply softmax to normalize the scores into the range between 0 and 1.
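Written out, the softmax normalizes each row of scores so that the weights for a given query sum to 1:

$$
\hat{a}_{i,j} = \mathrm{softmax}_j(a_{i,j}) = \frac{\exp(a_{i,j})}{\sum_{t}\exp(a_{i,t})}
$$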
Step 4 :
Take the weighted sum of the value vectors using each weight \(a_{i,j}\).
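The output for word \(i\) is the attention-weighted sum of the value vectors:

$$
b_i = \sum_{j}\hat{a}_{i,j}\, v_j
$$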
Step 5 :
Perform attention for each word, so every word acts as a query over all the others.
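Since every word plays the query role, the whole procedure can be written in the standard matrix form of scaled dot-product attention:

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
$$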
Implementation
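A minimal NumPy sketch of the five steps above; the scaling by \(\sqrt{d_k}\) follows the standard formulation, and the variable names and toy shapes are illustrative:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (N, d_model)."""
    Q = X @ Wq                       # Step 1: project inputs to queries
    K = X @ Wk                       # Step 1: ... keys
    V = X @ Wv                       # Step 1: ... values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # Step 2: similarity between every query and key
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # Step 3: softmax over each row
    return weights @ V               # Steps 4-5: weighted sum of values for every word

# Toy example: 4 "words", d_model = d_k = d_v = 8
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```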
Multi-Head Self-Attention
Split the attention into multiple heads that operate in separate subspaces, allowing the model to focus on different aspects of the information (each head's weights are initialized randomly).
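Continuing the NumPy sketch above, a rough illustration of the head split, assuming \(d_{model}\) is divisible by the number of heads and including the usual output projection \(W^O\):

```python
def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Split Q, K, V into `num_heads` subspaces and attend in each one separately."""
    N, d_model = X.shape
    d_head = d_model // num_heads            # assumes d_model is divisible by num_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for h in range(num_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        q, k, v = Q[:, s], K[:, s], V[:, s]  # this head's subspace
        scores = q @ k.T / np.sqrt(d_head)
        scores -= scores.max(axis=-1, keepdims=True)
        w = np.exp(scores)
        w /= w.sum(axis=-1, keepdims=True)
        heads.append(w @ v)                  # each head attends to different information
    return np.concatenate(heads, axis=-1) @ Wo   # concatenate heads, then project

# Using the toy tensors above, with an extra output projection and 2 heads:
Wo = rng.normal(size=(8, 8))
print(multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads=2).shape)  # (4, 8)
```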

Layer normalization
The Decoder in Transformer
Vision Transformer
Query-Key-Value Attention
CNN vs. Transformer
When we train a CNN, the kernel is learnable and is convolved over the entire image; regions that produce a high convolution response are more important.
Similarly, the Transformer uses the attention mechanism to compute query-key weights that indicate how important each region is.
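As a rough, hypothetical illustration of this parallel (reusing the toy variables from the Implementation sketch above), the query-key weights over flattened image patches can be read as a per-region importance map:

```python
# Treat a 4x4 grid of patch embeddings (16 patches, 8-dim each) as the token sequence.
patches = rng.normal(size=(16, 8))
Q, K = patches @ Wq, patches @ Wk
scores = Q @ K.T / np.sqrt(K.shape[-1])
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
# weights[i] is a length-16 importance map over all patches for patch i,
# playing a role analogous to a strong convolution response in a CNN.
print(weights[0].round(2))
```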
PS-ViT
Transformer for Semantic Segmentation
Architecture
Different patch size