Prototypical Networks for Few-shot Learning. NIPS 2017

Abstract

  • Some simple design decisions can yield substantial improvements over recent approaches that involve complicated architectural choices and meta-learning.
  • Prototypical Networks learn a metric space in which classification can be performed by computing distances to prototype representations of each class.

Prototypical Networks

Model

  • Prototype \(c_k\) for class \(k\): \(c_k = \frac{1}{|S_k|}\sum\limits_{(x_i, y_i)\in S_k} f_\phi(x_i) \tag 1\)
  • Predicted probability for class \(k\): \(p_\phi(y=k|x) = \dfrac{\exp(-d(f_\phi(x), c_k))}{\sum_{k'}\exp(-d(f_\phi(x), c_{k'}))} \tag 2\)
  • Loss contribution of one query example: \(\dfrac{1}{N_CN_Q}\left[d(f_\phi(x),c_k)+\log\sum_{k'}\exp(-d(f_\phi(x),c_{k'}))\right]\tag 3\)
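Eqs. (1) and (2) can be sketched in a few lines of NumPy; the function names and array shapes here are my own for illustration, not from the paper's reference code:

```python
import numpy as np

def prototypes(support_emb, support_lbl, n_classes):
    """Eq. (1): each class prototype is the mean of that class's support embeddings."""
    return np.stack([support_emb[support_lbl == k].mean(axis=0)
                     for k in range(n_classes)])

def class_probs(query_emb, protos):
    """Eq. (2): softmax over negative squared Euclidean distances to the prototypes."""
    # d[i, k] = ||query_i - c_k||^2, via broadcasting
    d = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    logits = -d
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)
```

A query embedding that coincides with one prototype and lies far from the others gets probability near 1 for that class.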

  • \(D\) is the full training set; \(D_k\) is the subset of the training set with class \(k\)
  • \(f_\phi\) is the embedding function
  • The first term of the loss comes from \(-\log(\exp(-d(f_\phi(x), c_k)))\), which simplifies to \(d(f_\phi(x), c_k)\)
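The simplification can be checked numerically: the negative log of Eq. (2) splits into the distance term plus a log-sum-exp term, as in Eq. (3). A sketch with made-up distances:

```python
import numpy as np

# Hypothetical distances from one query to each of three prototypes;
# class 0 is taken as the true class.
d = np.array([1.0, 4.0, 9.0])
k = 0

# -log p(y=k|x) computed directly from the softmax in Eq. (2) ...
p = np.exp(-d) / np.exp(-d).sum()
nll_direct = -np.log(p[k])

# ... equals the expanded form in Eq. (3): d_k + log sum_{k'} exp(-d_{k'})
nll_expanded = d[k] + np.log(np.exp(-d).sum())
```

The two values agree up to floating-point error.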

  • The mean of the support-set embeddings of the images in one class is called that class's prototype
  • Training pulls each query embedding closer to the prototype of its own class and pushes it away from the prototypes of the other classes
  • Additional experiments show that training tasks with more ways than the testing tasks perform better, while matching the training shot to the testing shot performs best
  • Using (squared) Euclidean distance outperforms cosine similarity, presumably because cosine distance is not a Bregman divergence (for a Bregman divergence such as squared Euclidean distance, the class mean is the optimal prototype)
  • The method also extends to zero-shot learning
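The episodic setup described above (\(N_C\)-way support/query splits sampled per training step) can be sketched as a sampler; the dataset layout (`data_by_class` as one array of embeddings per class) and all names are assumptions for illustration:

```python
import numpy as np

def sample_episode(data_by_class, n_way, n_shot, n_query, rng):
    """Sample one few-shot episode: pick n_way classes, then split each
    chosen class's examples into a support set and a query set."""
    classes = rng.choice(len(data_by_class), size=n_way, replace=False)
    support, query, q_labels = [], [], []
    for new_label, c in enumerate(classes):
        # Shuffle the class's examples and take n_shot + n_query of them
        idx = rng.permutation(len(data_by_class[c]))[:n_shot + n_query]
        examples = data_by_class[c][idx]
        support.append(examples[:n_shot])
        query.append(examples[n_shot:])
        q_labels += [new_label] * n_query  # relabel classes 0..n_way-1
    return np.stack(support), np.concatenate(query), np.array(q_labels)
```

Per the experiments above, `n_way` at training time can exceed the test-time way, while `n_shot` would be kept equal to the test-time shot.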
tags: fewshot learning