Deep Learning for Computer Vision
We load a pre-trained transformer model to implement image captioning and visualize its attention.
FULL TRANSFORMER NETWORK FOR IMAGE CAPTIONING
The output feature of the last decoder layer is used to predict the next word via a linear layer whose output dimension equals the vocabulary size. We take the example image below to show the caption predicted by the model and to visualize the "words-to-patches" cross-attention weights in the decoder.
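The prediction head described above can be sketched as follows. This is a minimal illustration, not the project's actual code; the dimensions (`d_model`, `vocab_size`, `seq_len`) are hypothetical placeholders:

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the real model's dimensions may differ.
d_model, vocab_size, seq_len = 512, 30522, 128

# Output feature of the last decoder layer: (batch, seq_len, d_model)
decoder_out = torch.randn(1, seq_len, d_model)

# Linear head whose output dimension equals the vocabulary size
lm_head = nn.Linear(d_model, vocab_size)
logits = lm_head(decoder_out)            # (1, seq_len, vocab_size)

# Next-word prediction: argmax over the vocabulary at each position
next_word_ids = logits.argmax(dim=-1)    # (1, seq_len)
```

At inference time, only the logits at the last generated position are used to pick the next token.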
According to the `torch.nn.MultiheadAttention` code, we can directly get `attn_output_weights` to visualize the attention map:
https://github.com/pytorch/pytorch/blob/master/torch/nn/functional.py
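Concretely, calling the module with `need_weights=True` returns the attention weights alongside the output. A minimal sketch with illustrative dimensions (the embedding size, head count, and sequence lengths here are assumptions, not the project's settings):

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 512, 8
L, S = 20, 128   # target (caption) length, source sequence length

mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
query = torch.randn(1, L, embed_dim)         # decoder word features
key = value = torch.randn(1, S, embed_dim)   # encoder patch features

# need_weights=True returns attn_output_weights averaged over heads
attn_output, attn_weights = mha(query, key, value, need_weights=True)
# attn_weights: (batch, L, S) -- one row of source weights per word
```

Each row of `attn_weights` is a softmax distribution over the source positions, which is what we reshape and overlay on the image.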
In our case, the source sequence length \(S = 128\) is the maximum number of position embeddings.
Visualization of the attention weights computed by the "words-to-patches" cross attention in the last decoder layer. "A young girl holding up a slice of pizza." is the caption generated by our model. We can see that the attention for both "girl" and "pizza" localizes on the corresponding image regions.
Unlike image classification, which only needs the first row of the attention matrix \(q\cdot k\), showing attention maps for image captioning requires the full matrix \(q\cdot k\): one row of patch weights for each generated word.
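The reshaping step can be sketched as below. The 14×14 patch grid is an assumption (e.g. a 224×224 image with 16×16 patches); the actual grid depends on the model's patch layout:

```python
import torch

# Hypothetical cross-attention weights of shape (L, S), where
# S = num_patches (assumed 14 x 14 = 196 here).
L, grid = 10, 14
S = grid * grid
attn_weights = torch.softmax(torch.randn(L, S), dim=-1)

# For classification one would keep only the first row; for captioning
# we reshape EVERY row into a patch grid: one map per generated word.
maps = attn_weights.reshape(L, grid, grid)   # (L, 14, 14)
```

Each `maps[i]` can then be upsampled to the image size and overlaid to show where word `i` attends.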