Week 8 Notes
Transformer Networks 🤖
Purpose of encoder: Compress low-entropy input into a higher-entropy, lower-dimensional representation (encoding)
Purpose of decoder: Interpret the encoded representation to produce an output
Positional encoding:
Encoder-Decoder Attention similarities:
Multi-Head attention:
Feed-Forward Network
Decoder Layers:
CNNs
Why CNN?
Convolution
Architecture in 2014 paper: sentence matrix (n words × k-dimensional vectors) -> convolution filter -> max-over-time pooling
Generating a feature map
Why do we take the maximum value (in pooling)?
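The pipeline above (sentence matrix -> filter -> max-over-time pooling) can be sketched in plain Python. This is only a minimal illustration, not the paper's implementation: the toy values, the function names `conv_feature_map` and `max_over_time`, and the ReLU non-linearity are all assumptions made for the example.

```python
def conv_feature_map(sentence, filt, bias=0.0):
    """Slide a filter spanning h words over an n x k sentence matrix.

    Returns a feature map with n - h + 1 entries, one per window.
    ReLU is assumed here as the non-linearity.
    """
    n, h = len(sentence), len(filt)
    feature_map = []
    for i in range(n - h + 1):
        window = sentence[i:i + h]
        total = bias
        # Element-wise product of the window and the filter, summed up.
        for w_row, f_row in zip(window, filt):
            total += sum(w * f for w, f in zip(w_row, f_row))
        feature_map.append(max(0.0, total))
    return feature_map


def max_over_time(feature_map):
    """Keep only the strongest activation, wherever it occurred."""
    return max(feature_map)


# Toy input: n = 4 words, each a k = 2 dimensional vector.
sentence = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [0.0, 0.0],
]
# One filter spanning h = 2 consecutive words.
filt = [
    [1.0, 0.0],
    [0.0, 1.0],
]

feature_map = conv_feature_map(sentence, filt)  # [2.0, 1.0, 1.0]
pooled = max_over_time(feature_map)             # 2.0
```

Taking the maximum yields one value per filter regardless of sentence length, and that value reflects the window where the filter matched most strongly.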
Multi-channel approach
Character-level CNN
Advantages: The model is robust against typos and misspellings, and can be used for strings like URLs
Disadvantages: Takes longer to train
Quantization
Character-level CNNs (CLCNN) tend to perform better with larger datasets.
URL example: Sum pooling is used instead of max pooling to "accumulate" features
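A rough sketch of character quantization and of the difference between sum pooling and max pooling. The alphabet, the function names, and the sample string are hypothetical. For simplicity, pooling is applied directly to the one-hot vectors here; in an actual model it would be applied to convolutional feature maps.

```python
# Hypothetical alphabet; a real model would fix its own character set.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789./:-"


def quantize(text, alphabet=ALPHABET):
    """One-hot encode each character; unknown characters become all zeros."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in text.lower():
        row = [0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1
        rows.append(row)
    return rows


def sum_pool(rows):
    """Accumulate each feature over all positions."""
    return [sum(col) for col in zip(*rows)]


def max_pool(rows):
    """Keep only the largest value of each feature over all positions."""
    return [max(col) for col in zip(*rows)]


encoded = quantize("a.b.c.a")
summed = sum_pool(encoded)
maxed = max_pool(encoded)

a, dot = ALPHABET.index("a"), ALPHABET.index(".")
# Sum pooling counts occurrences: 'a' appears twice, '.' three times.
# Max pooling only records that each character appeared at least once.
```

In this toy case, sum pooling preserves how often a pattern occurs, while max pooling discards that count. That is one way to read "accumulate" in the URL example.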
Papers Referenced: