
Segment Anything Model (SAM) Enhances Pseudo Labels for Weakly Supervised Semantic Segmentation

Paper: NeurIPS2023
Original: Segment Anything Model (SAM) Enhances Pseudo Labels for Weakly Supervised Semantic Segmentation

Categories:

  1. Weakly Supervised
  2. Semantic Segmentation
  3. Segment Anything Model (SAM)

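Goal:

To generate better pseudo-labels.

Weakly supervised semantic segmentation (WSSS) aims to train a segmentation model using only image-level annotations rather than pixel-level ones.
Most methods use CAM to obtain pseudo-labels that are closer to pixel level, then use them to train a fully supervised semantic segmentation model. The problem is that CAM produces a coarse region per class, as in the figure below, rather than an object-based boundary that traces the object's outline.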

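Original image
This paper uses the pseudo-label produced by CAM as a prompt and combines it with the masks produced by SAM.

Segment Anything Model (SAM) was proposed by facebookresearch. As "anything" suggests, it does not know the class of the objects it segments, which is why it is called class-agnostic. The model can produce fine-grained instance masks.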

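A typical CAM-assisted WSSS pipeline has four steps:

  1. First, train a classification model on images with image-level annotations.
  2. Multiply the model's intermediate feature maps by the class-specific weights and sum them to obtain a coarse location for each class.
  3. Build pixel-level pseudo-labels with pixel affinity-based methods or saliency guidance.
  4. Finally, use these pixel-level pseudo-labels to train a semantic segmentation model.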
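Step 2 can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's code: the names `features` and `fc_weights` are hypothetical, the random inputs stand in for a real classifier, and the ReLU plus per-class max normalisation are a common convention assumed here rather than something stated in this note.

```python
import numpy as np

def class_activation_maps(features, fc_weights):
    """Weighted sum of feature maps per class.

    features:   (D, h, w) feature maps from the last conv layer (assumed layout)
    fc_weights: (K, D) weights of the final classification layer
    returns:    (K, h, w) maps, clipped at 0 and scaled so each class peaks at 1
    """
    # For every class k: sum_d fc_weights[k, d] * features[d]
    cams = np.einsum("kd,dhw->khw", fc_weights, features)
    cams = np.maximum(cams, 0.0)  # keep positive evidence only (assumed convention)
    peak = cams.reshape(cams.shape[0], -1).max(axis=1)
    peak = np.where(peak > 0, peak, 1.0)  # avoid dividing by zero for empty maps
    return cams / peak[:, None, None]

# Toy inputs standing in for a trained classifier
rng = np.random.default_rng(0)
features = rng.random((8, 4, 4))
fc_weights = rng.standard_normal((3, 8))
cams = class_activation_maps(features, fc_weights)
```

Thresholding each map in `cams` would give the coarse per-class regions that steps 3 and 4 build on.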

Survey

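Pixel affinity-based methods:

A deep neural network, AffinityNet, predicts the semantic similarity between neighbouring pixels. Its training labels come from CAM: the CAM output is thresholded to keep the confident regions, and a pair of neighbouring pixels is labelled 1 if the two pixels are of the same kind and 0 otherwise.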
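The labelling rule can be illustrated on a toy label map. This sketch only looks at horizontally adjacent pixels and uses a hypothetical `IGNORE` marker for pixels where CAM is not confident; both are simplifying assumptions for illustration, not details taken from AffinityNet.

```python
import numpy as np

IGNORE = -1  # hypothetical marker for pixels whose CAM score is not confident

def horizontal_affinity_labels(confident_labels):
    """Affinity targets for each pixel and its right-hand neighbour.

    confident_labels: (H, W) int map holding a class id where CAM is confident
                      and IGNORE elsewhere
    returns:          (H, W-1) map with 1 for same class, 0 for different class,
                      IGNORE when either pixel of the pair is not confident
    """
    left = confident_labels[:, :-1]
    right = confident_labels[:, 1:]
    affinity = (left == right).astype(np.int64)
    affinity[(left == IGNORE) | (right == IGNORE)] = IGNORE
    return affinity

labels = np.array([[1, 1, 2],
                   [1, IGNORE, 2]])
aff = horizontal_affinity_labels(labels)
```

In the first row the pair (1, 1) gets label 1 and the pair (1, 2) gets label 0; both pairs in the second row touch an unconfident pixel and are ignored.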


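Saliency guidance:

a binary mask segmenting the one object a person is most likely to look first

That is, a binary mask marks where a person is most likely to look when viewing the image, which makes it hard to distinguish multiple classes. In the example below, only the train is marked.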


I think this is why the method is evaluated only on the following datasets:
PASCAL VOC 2012 and MS COCO 2014

SAM Enhanced pseudo-labels (SEPL)

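During training, an image $X \in \mathbb{R}^{H \times W \times C}$ is paired with an image-level label vector $y = [y_1, y_2, \dots, y_K]^T \in \{0,1\}^K$, where $y_k = 1$ means class $k$ is present and $y_k = 0$ otherwise.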

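  1. First, a classifier $f$ is trained on the dataset.
  2. In the WSSS stage, the image is fed into $f$ to obtain Class Activation Maps (CAM) $M = [M_1, \dots, M_K]$, where $M_k \in \mathbb{R}^{H \times W}$.
  3. SAM takes an image $X \in \mathbb{R}^{H \times W \times C}$ as input and returns a list of masks, one per subpart (that is, per object in the image): $S = [S_1, \dots, S_L]$, where $S_l \in \{0,1\}^{H \times W}$. $L$ is the number of masks and depends on the image. Taking $S_3$ as an example, $S_3 \in \{0,1\}^{H \times W}$ indicates which pixels of the $H \times W$ image belong to the third mask: 1 if a pixel belongs to it, 0 otherwise.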
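To make the shape of $S$ concrete, the sketch below stacks SAM-style output into one boolean array. It assumes the output is a list of dicts with a `"segmentation"` entry holding an H×W boolean array, which is my understanding of what the automatic mask generator in the official `segment_anything` package returns. The two hand-written masks and the helper name `stack_sam_masks` are stand-ins so the example runs without a model.

```python
import numpy as np

def stack_sam_masks(sam_output):
    """Turn a non-empty list of SAM-style dicts into an (L, H, W) boolean array."""
    return np.stack([np.asarray(m["segmentation"], dtype=bool) for m in sam_output])

# Hand-written stand-in for SAM output on a 3x3 image (not real model output)
sam_output = [
    {"segmentation": np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]], dtype=bool)},
    {"segmentation": np.array([[0, 0, 0], [0, 0, 1], [0, 0, 1]], dtype=bool)},
]
S = stack_sam_masks(sam_output)  # S[l - 1] plays the role of S_l in the note
```

Here `S.shape` is `(L, H, W)` with `L = 2`, and `S[0][i, j]` is `True` exactly when pixel `(i, j)` belongs to the first mask.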

CAM and SAM thus provide class information and object information, respectively.

The method then proceeds in two stages:

  1. mask assignment: as noted above, the masks produced by SAM are class-agnostic and carry no class information. In this stage each SAM mask $S_l$ is assigned to the class whose CAM pseudo-label it has the largest intersection with.

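In the notation above, each SAM mask $S_l$ is intersected with the pseudo-label $P_k$ of every class $k \in \{1, \dots, K\}$, and $S_l$ is assigned to the class with the largest intersection. A mask that overlaps no pseudo-label is ignored: if a class is absent, its pseudo-label has no activation, so such a mask is never used. After all SAM masks have been processed, we obtain a mask assignment list $A = [A_1, \dots, A_K]$, where $A_k$ contains the masks assigned to class $k$ (a class can have several masks).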
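The assignment stage can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: it assumes the pseudo-labels $P_k$ are already available as boolean arrays, and the function name `assign_masks` is made up for this example. Indices are 0-based in the code, while the note counts from 1.

```python
import numpy as np

def assign_masks(sam_masks, pseudo_labels):
    """Assign each SAM mask to the class whose pseudo-label it overlaps most.

    sam_masks:     (L, H, W) boolean array, one mask per row
    pseudo_labels: (K, H, W) boolean array, pseudo_labels[k] is P_k
    returns:       list of K lists; entry k holds the indices of masks assigned to class k
    """
    num_classes = pseudo_labels.shape[0]
    assignments = [[] for _ in range(num_classes)]
    for l, mask in enumerate(sam_masks):
        # Intersection size between this mask and every class pseudo-label
        inter = (pseudo_labels & mask).reshape(num_classes, -1).sum(axis=1)
        if inter.max() == 0:
            continue  # no overlap with any pseudo-label: the mask is ignored
        assignments[int(inter.argmax())].append(l)
    return assignments

pseudo_labels = np.zeros((2, 4, 4), dtype=bool)
pseudo_labels[0, 0:2, 0:2] = True   # class 0: top-left block
pseudo_labels[1, 2:4, :] = True     # class 1: bottom two rows

sam_masks = np.zeros((3, 4, 4), dtype=bool)
sam_masks[0, 0:2, 0:3] = True       # overlaps class 0 only
sam_masks[1, 1:4, 0:2] = True       # overlaps both, more with class 1
sam_masks[2, 0:2, 3] = True         # overlaps nothing

assignments = assign_masks(sam_masks, pseudo_labels)
```

The second mask touches both pseudo-labels (2 pixels of class 0, 4 pixels of class 1) and goes to class 1, while the third mask is dropped because it touches neither.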

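  2. mask selection: to overcome CAM's false activations (here this seems to mean that the CAM pseudo-label is too large rather than lying off the class, which is why the paper speaks of the case where overlap with the background is known to be minimal), the SAM masks that overlap most with the CAM activation map are selected. To overcome partial activation, SAM masks that overlap several CAM pseudo-labels are also used. The two stages are shown below.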


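$O_s$ is the fraction of the mask (from SAM) that is covered by the pseudo-label (from CAM). (This is presumably meant to handle partial activation: even if the pseudo-label is small, the mask is object-aware, so the whole object is still segmented out.) $O_p$ is the fraction of the pseudo-label that is covered by the mask. (This is presumably meant to handle false activation: if the pseudo-label is large and the mask is small, the mask's object-awareness means that only the mask region is segmented out.)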
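Written as formulas, the two ratios described above would be as follows, assuming $|\cdot|$ denotes the number of pixels in a binary map and $P_k$ is the pseudo-label of the class the mask $S$ was assigned to (the notation is introduced here for illustration):

$$
O_s = \frac{|S \cap P_k|}{|S|}, \qquad O_p = \frac{|S \cap P_k|}{|P_k|}
$$

Both ratios lie between 0 and 1 and share the same numerator; they differ only in whether the intersection is measured against the mask or against the pseudo-label.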

If $O_s$ and $O_p$ exceed the following thresholds, mask $S$ is kept as an enhanced pseudo-label:

  1. $O_s > t_1$ with $t_1 = 0.5$: the pseudo-label covers more than 50% of the mask (partial activation).
  2. $O_p > t_2$ with $t_2 = 0.85$: the mask covers more than 85% of the pseudo-label (false activation).

This is an AND: mask $S$ is kept as an enhanced pseudo-label only if both 1 and 2 hold.
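A small sketch of the threshold test for a single mask, under the reading given in this note. The thresholds 0.5 and 0.85 are the values quoted above; the function names and the `require_both` switch are hypothetical. The default follows the note's statement that both conditions must hold, and setting the switch to `False` gives an OR variant for comparison.

```python
import numpy as np

T1, T2 = 0.5, 0.85  # thresholds t1 and t2 quoted in the note

def overlap_ratios(mask, pseudo_label):
    """Return (O_s, O_p): intersection over mask size and over pseudo-label size."""
    inter = np.logical_and(mask, pseudo_label).sum()
    o_s = inter / mask.sum() if mask.any() else 0.0
    o_p = inter / pseudo_label.sum() if pseudo_label.any() else 0.0
    return float(o_s), float(o_p)

def keep_mask(mask, pseudo_label, t1=T1, t2=T2, require_both=True):
    """Decide whether a SAM mask is kept; require_both=True is the note's AND reading."""
    o_s, o_p = overlap_ratios(mask, pseudo_label)
    if require_both:
        return o_s > t1 and o_p > t2
    return o_s > t1 or o_p > t2

# Toy case: a small pseudo-label sitting inside a larger object mask
pseudo_label = np.zeros((4, 4), dtype=bool)
pseudo_label[0:2, 0:2] = True       # 4 pixels
mask = np.zeros((4, 4), dtype=bool)
mask[0:3, 0:3] = True               # 9 pixels, fully contains the pseudo-label

o_s, o_p = overlap_ratios(mask, pseudo_label)
kept_and = keep_mask(mask, pseudo_label)
kept_or = keep_mask(mask, pseudo_label, require_both=False)
```

In this toy case the mask covers the whole pseudo-label ($O_p = 1$) but the pseudo-label covers only 4 of the mask's 9 pixels ($O_s \approx 0.44$), so the mask is rejected when both conditions are required and accepted when either one suffices.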

Now let's look at the algorithm.


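First for loop: mask assignment

  1. Each mask takes the class whose CAM pseudo-label it has the largest intersection with.

Second for loop: mask selection

  1. if: when the pseudo-label has no activation (the class is absent), there is no need to decide whether the mask set should be used.

    For each class:

    1. for loop: compute $O_s$ and $O_p$ between the pseudo-label and every mask in the class-k mask set; if both exceed their thresholds, update tmp with the mask.
    2. if: when tmp is empty, meaning no mask passed the thresholds, tmp falls back to the pseudo-label itself.
    3. Update the pseudo-label by an element-wise OR with tmp and assign it to class k.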
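The second loop can be sketched as below. This follows the steps as listed in this note rather than the paper's pseudocode, which is not reproduced here: masks are accumulated into `tmp` with OR, the AND condition on the two thresholds is used, and the result is stored per class. The function name, the per-class boolean output, and the toy data are assumptions made for illustration.

```python
import numpy as np

def select_masks(pseudo_labels, sam_masks, assignments, t1=0.5, t2=0.85):
    """Mask selection for every class.

    pseudo_labels: (K, H, W) boolean array, pseudo_labels[k] is P_k
    sam_masks:     (L, H, W) boolean array of SAM masks
    assignments:   list of K lists with the mask indices assigned to each class
    returns:       (K, H, W) boolean array of enhanced pseudo-labels
    """
    enhanced = np.zeros_like(pseudo_labels)
    for k, p_k in enumerate(pseudo_labels):
        if not p_k.any():
            continue  # class not activated: nothing to select
        tmp = np.zeros_like(p_k)
        for l in assignments[k]:
            s = sam_masks[l]
            if not s.any():
                continue  # guard against empty masks
            inter = (s & p_k).sum()
            o_s = inter / s.sum()
            o_p = inter / p_k.sum()
            if o_s > t1 and o_p > t2:  # AND reading, as stated in the note
                tmp |= s
        if not tmp.any():
            tmp = p_k.copy()  # no mask passed: fall back to the pseudo-label
        enhanced[k] |= tmp
    return enhanced

pseudo_labels = np.zeros((3, 5, 5), dtype=bool)
pseudo_labels[0, 1:3, 1:4] = True   # class 0: 6 pixels
pseudo_labels[2, 0, :] = True       # class 2: top row; class 1 is absent

sam_masks = np.zeros((2, 5, 5), dtype=bool)
sam_masks[0, 1:4, 1:4] = True       # 9 pixels, contains the class-0 pseudo-label
sam_masks[1, :, 0] = True           # left column, barely touches class 2

assignments = [[0], [], [1]]
enhanced = select_masks(pseudo_labels, sam_masks, assignments)
```

For class 0 the mask passes both thresholds ($O_s = 6/9$, $O_p = 1$) and replaces the smaller pseudo-label; class 1 is skipped; for class 2 the only candidate fails ($O_s = 0.2$), so the original pseudo-label is kept.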

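Since the method only improves pseudo-label quality, it can be applied on top of any model.

The results are shown in the figure below. By combining CAM's class-awareness with SAM's object-awareness, the method gives very good results even when false and partial activations are present.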


Experiments

Evaluation datasets: PASCAL VOC 2012 and MS COCO 2014

Because the setting is weakly supervised, only image-level labels are used as ground-truth labels when generating pseudo-labels.

mIoU is used as the evaluation metric.
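For reference, a minimal mIoU computation on a single pair of label maps is sketched below. It is a simplification: benchmark evaluations normally accumulate intersections and unions over the whole dataset before averaging, and the function name and toy arrays here are made up for illustration.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes that appear in pred or target."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = (p | t).sum()
        if union == 0:
            continue  # class absent from both maps: skip it
        ious.append((p & t).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")

pred = np.array([[0, 0],
                 [1, 1]])
target = np.array([[0, 1],
                   [1, 1]])
miou = mean_iou(pred, target, num_classes=2)
```

In the toy example class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.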

SAM is run with the official code; see the paper for the hyperparameter settings and model choices.

On the PASCAL VOC dataset

Comparison of pseudo-label quality for existing SOTA models combined with this method:


Comparison of SSS (Supervised Semantic Segmentation) results when training on the pseudo-labels produced by existing SOTA models combined with this method:


On the COCO dataset

Comparison of pseudo-label quality for existing SOTA models combined with this method:


Comparison of SSS (Supervised Semantic Segmentation) results when training on the pseudo-labels produced by existing SOTA models combined with this method:


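When does SAM not help?

The authors analyse why SAM fails and attribute it to the initial pseudo-labels, the SAM masks, and the enhancement algorithm.

Initial pseudo-labels: the original pseudo-labels are used to find the relevant masks, so if they activate the wrong regions or fail to activate the right ones, SAM does not help and can even harm the pseudo-labels.

In the example below, the pseudo-label wrongly activates the decorations and under-activates the boat.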


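SAM masks: SAM handles most VOC and COCO images, but it occasionally merges different objects into one or misses the correct regions. In the figure below, the first row shows pets merged with other objects, and the second row shows objects that were not segmented.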


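Enhancement algorithm: although the algorithm above handles most cases, some situations still cause problems.

One SAM mask can contain other masks. In the figure below, column g contains columns e and f. Pseudo-labels are often partial activations, and the algorithm favours larger masks in order to produce more complete pseudo-labels, so in this case the larger mask ends up misleading the pseudo-label.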

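Later in the paper there are examples on scenes with very open backgrounds, where SAM is very helpful and outlines objects in fine detail; they are worth a separate look.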