Matching Networks for One Shot Learning
Code
Abstract
1 Introduction
2 Model
The non-parametric approach used to solve one-shot learning is based on the following two points:
2.1 Model Architecture
Inspired by:
The prediction is \(\hat y = f(D^{train}, x^{test})\), written probabilistically as \(P(\hat y \mid \hat x, S)\), where \(S = \{(x_i, y_i)\}_{i=1}^k\) is the support set.
Matching Net expresses this model as: \[\hat y = \sum\limits_{i=1}^k a(\hat x, x_i)\,y_i\]
2.1.1 Attention Kernel
This paper gives \(a(\hat x, x_i)\) a new interpretation: viewed as an attention kernel, the model predicts the label of the support-set image that receives the most attention. A common attention kernel is cosine similarity followed by a softmax: \[a(\hat x, x_i) = \dfrac{e^{c(f(\hat x), g(x_i))}}{\sum_{j=1}^k e^{c(f(\hat x), g(x_j))}}\]
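As a minimal NumPy sketch of this kernel together with the prediction \(\hat y = \sum_i a(\hat x, x_i)\,y_i\) — assuming \(f\) and \(g\) have already produced embedding vectors (all function names here are mine, not from the paper):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity c(u, v)."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def attention_kernel(f_xhat, g_support):
    """a(x_hat, x_i): softmax over cosine similarities to each support embedding."""
    sims = np.array([cosine(f_xhat, g_xi) for g_xi in g_support])
    e = np.exp(sims - sims.max())            # numerically stable softmax
    return e / e.sum()

def predict(f_xhat, g_support, y_onehot):
    """y_hat = sum_i a(x_hat, x_i) * y_i: a distribution over labels."""
    a = attention_kernel(f_xhat, g_support)
    return a @ y_onehot
```

Because each \(y_i\) is one-hot, the output is a proper probability distribution over the labels present in the support set.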
2.1.2 Full Context Embeddings
The embedding vector becomes \(g(x_i) \leftarrow g(x_i, S)\): the output of the embedding function depends on both the given \(x_i\) and the entire support set. Since the support set is randomly sampled each time, conditioning the embedding on both \(S\) and \(x_i\) smooths out the variability caused by that random selection. By analogy with the relationship between a word and its context in machine translation, \(S\) can be viewed as the context of \(x_i\), which is why the paper uses an LSTM in the embedding function.
Each \(x_i\) in the support set passes through several convolution layers, then through a bi-LSTM.
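Assuming the convolutional features \(g'(x_i)\) have already been computed, the bi-LSTM embedding \(g(x_i, S)\) might be sketched in NumPy as below — the forward state, backward state, and \(g'(x_i)\) are summed as a skip connection; the function names and weight packing are my own, not the paper's:

```python
import numpy as np

def lstm_cell(x, h, c, W):
    """One plain LSTM step; W packs the four gates, shape (4d, len(x)+len(h))."""
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    c_new = sig(f) * c + sig(i) * np.tanh(g)
    return sig(o) * np.tanh(c_new), c_new

def embed_support(gp_S, W_fwd, W_bwd):
    """g(x_i, S): bi-LSTM over the support set's conv features gp_S (n, d).
    Returns h_fwd_i + h_bwd_i + g'(x_i) for each support item."""
    n, d = gp_S.shape
    h_f = np.zeros((n, d)); h_b = np.zeros((n, d))
    h = np.zeros(d); c = np.zeros(d)
    for i in range(n):                       # forward pass
        h, c = lstm_cell(gp_S[i], h, c, W_fwd)
        h_f[i] = h
    h = np.zeros(d); c = np.zeros(d)
    for i in reversed(range(n)):             # backward pass
        h, c = lstm_cell(gp_S[i], h, c, W_bwd)
        h_b[i] = h
    return h_f + h_b + gp_S                  # skip connection to g'(x_i)
```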
The Fully Conditional Embedding \(f\)
\[
\hat h_k, c_k = \mathrm{LSTM}(f'(\hat x), [h_{k-1}, r_{k-1}], c_{k-1}) \\
h_k = \hat h_k + f'(\hat x) \\
r_{k-1} = \sum_{i=1}^{|S|} a(h_{k-1}, g(x_i))\, g(x_i) \\
a(h_{k-1}, g(x_i)) = \mathrm{softmax}(h_{k-1}^\top g(x_i))
\]
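These K processing steps can be sketched in NumPy as follows, assuming the support embeddings \(g(x_i)\) are stacked row-wise in a matrix and using a single hand-rolled LSTM cell (the names and weight packing are my own, not from the paper):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def lstm_cell(x, h, c, W):
    """One plain LSTM step; W packs the four gates, shape (4d, len(x)+len(h))."""
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    c_new = sig(f) * c + sig(i) * np.tanh(g)
    return sig(o) * np.tanh(c_new), c_new

def fce_f(f_xhat, g_S, W, K):
    """Fully conditional embedding f: K attention-LSTM steps whose input is
    f'(x_hat) and whose hidden state is [h_{k-1}, r_{k-1}]."""
    d = f_xhat.shape[0]
    h = np.zeros(d); c = np.zeros(d)
    for _ in range(K):
        a = softmax(g_S @ h)                           # a(h_{k-1}, g(x_i))
        r = g_S.T @ a                                  # read-out r_{k-1}
        h_hat, c = lstm_cell(f_xhat, np.concatenate([h, r]), c, W)
        h = h_hat + f_xhat                             # skip: h_k = h_hat_k + f'(x_hat)
    return h
```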
With this in place, revisit the structure once more:

The following is my own understanding.
2.2 Training Strategy
Training is set up as N-way, K-shot classification.
An example of one "episode":
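Episode construction can be sketched with a hypothetical helper, assuming the dataset is a dict mapping each label to its list of examples: sample N classes, then K support examples plus a few query examples per class.

```python
import random

def sample_episode(dataset, n_way, k_shot, q_queries=1):
    """Sample one N-way K-shot episode from {label: [examples]}.
    Returns a support set of N*K (x, y) pairs and a disjoint query set."""
    classes = random.sample(sorted(dataset), n_way)
    support, query = [], []
    for y in classes:
        picks = random.sample(dataset[y], k_shot + q_queries)
        support += [(x, y) for x in picks[:k_shot]]
        query   += [(x, y) for x in picks[k_shot:]]
    return support, query
```

The model is trained to classify the query items given only the episode's support set, matching the conditions it will face at test time.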
References
tags:
fewshot learning