Segment Anything Model (SAM) Enhances PseudoLabels for Weakly Supervised Semantic Segmentation
Paper: NeurIPS 2023
Original: Segment Anything Model (SAM) Enhances PseudoLabels for Weakly Supervised Semantic Segmentation
Categories:
- Weakly Supervised
- Semantic Segmentation
- Segment Anything Model (SAM)
Goal:
Produce better pseudo-labels.
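Weakly supervised semantic segmentation (WSSS) aims to train segmentation using only image-level annotations instead of pixel-level ones.
Most methods use CAM to obtain pixel-level pseudo-labels and then train a fully supervised semantic segmentation model on them. The problem is that CAM produces a coarse, class-based region, as in the figure below, rather than an object-based boundary that traces the object's contour.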
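Original image
This paper uses the pseudo-labels produced by CAM as cues and combines them with the masks produced by SAM. The Segment Anything Model (SAM), released by facebookresearch, segments "anything", so it does not know the class of the objects it outlines; this is why it is called class-agnostic. It produces fine-grained instance masks.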
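A typical CAM-based WSSS pipeline has four steps:
- First, train a classification model on images with image-level labels.
- Weight the model's intermediate feature maps by class-specific weights and sum them to obtain a coarse localization of each class.
- Build pixel-level pseudo-labels with pixel affinity-based methods or saliency guidance.
- Finally, train a semantic segmentation model on these pixel-level pseudo-labels.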
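Step 2 of the pipeline (the CAM itself) can be sketched in a few lines of NumPy. This is a minimal sketch under assumptions: `features` is a (C, H, W) array from the last convolutional layer and `fc_weights` is the (K, C) weight matrix of the classifier's final linear layer. The ReLU and max-normalization are common conventions, not details taken from this paper.

```python
import numpy as np

def compute_cam(features: np.ndarray, fc_weights: np.ndarray, class_idx: int) -> np.ndarray:
    """Class activation map for one class.

    features:   (C, H, W) feature maps from the last conv layer (assumed layout)
    fc_weights: (K, C) weights of the classifier's final linear layer
    """
    # Weight each channel by the class-specific weight and sum over channels -> (H, W)
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    # Keep only positive evidence (a common convention)
    cam = np.maximum(cam, 0)
    # Normalize to [0, 1] so a single threshold can be applied later
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```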
Survey
Pixel affinity-based methods:
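A deep neural network, AffinityNet, predicts the semantic affinity between neighboring pixels. Its training targets come from CAM: the CAM is thresholded to keep only the confident parts, and a pair of neighboring pixels is labeled 1 if the two pixels have the same label and 0 otherwise.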
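A minimal sketch of how AffinityNet-style affinity targets can be derived from thresholded CAMs. The thresholds `hi`/`lo`, the `IGNORE` value, and the restriction to non-negative neighbor offsets are assumptions for illustration, not AffinityNet's exact settings.

```python
import numpy as np

IGNORE = 255  # hypothetical marker for low-confidence pixels

def confident_labels(cams: np.ndarray, hi: float = 0.7, lo: float = 0.05) -> np.ndarray:
    """cams: (K, H, W) normalized CAMs.
    Returns (H, W) labels: 0 = background, k + 1 = class k, IGNORE = uncertain."""
    fg_score = cams.max(axis=0)
    labels = cams.argmax(axis=0) + 1
    labels = np.where(fg_score >= hi, labels, IGNORE)  # confident foreground only
    labels = np.where(fg_score < lo, 0, labels)        # confident background
    return labels

def affinity_labels(labels: np.ndarray, offset: tuple) -> np.ndarray:
    """Affinity between each pixel and its neighbor at (dy, dx), with dy, dx >= 0.
    Returns an (H - dy, W - dx) array: 1 = same label, 0 = different,
    IGNORE if either pixel is uncertain."""
    dy, dx = offset
    h, w = labels.shape
    a = labels[: h - dy, : w - dx]
    b = labels[dy:, dx:]
    aff = (a == b).astype(np.int64)
    aff[(a == IGNORE) | (b == IGNORE)] = IGNORE
    return aff
```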
Saliency guidance:
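A saliency map is "a binary mask segmenting the one object a person is most likely to look at first", i.e. it marks where a viewer is most likely to look in the image. This makes it hard to handle multiple classes: in the example below, only the train is marked.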
I think this is why the method is evaluated only on the following datasets: PASCAL VOC 2012 and MS COCO 2014.
SAM Enhanced pseudo-labels (SEPL)
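During training, each image $I \in \mathbb{R}^{H \times W \times 3}$ corresponds to an image-level label vector $y \in \{0,1\}^K$, where $y_k = 1$ means class $k$ is present and $y_k = 0$ otherwise.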
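- First, train a classifier on the dataset.
- Then, in the WSSS stage, feed image $I$ into the classifier to obtain class activation maps (CAMs), which give the pseudo-labels $P = \{p_1, \dots, p_K\}$, where $p_k \in \{0,1\}^{H \times W}$ marks the pixels activated for class $k$.
- SAM takes image $I$ as input and returns a list of masks $M = \{m_1, \dots, m_N\}$, one per subpart (i.e. each object in the image), where $m_i \in \{0,1\}^{H \times W}$ and the number of masks $N$ depends on the image. For example, if we pick $i = 4$, then $m_4$ indicates which pixels of the $H \times W$ image belong to the 4th mask: 1 if they do, 0 otherwise.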
CAM and SAM thus provide class information and object information, respectively.
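To make the notation concrete, here are two hypothetical helpers: one builds $M$ from SAM's output, and the other builds $P$ from CAMs. The first assumes SAM masks come in the format returned by `SamAutomaticMaskGenerator.generate` in the `segment_anything` package: a list of dicts whose `"segmentation"` entry is a boolean (H, W) array. The CAM threshold `thr` is a hypothetical value, since the paper builds on pseudo-labels produced by existing methods.

```python
import numpy as np

def sam_masks_to_array(sam_outputs: list) -> np.ndarray:
    """Stack SAM masks into M with shape (N, H, W) and values in {0, 1}.
    sam_outputs: list of dicts with a boolean (H, W) 'segmentation' array."""
    return np.stack([o["segmentation"] for o in sam_outputs]).astype(np.uint8)

def cams_to_pseudo_labels(cams: np.ndarray, y, thr: float = 0.5) -> np.ndarray:
    """Binarize CAMs into per-class pseudo-labels P with shape (K, H, W).
    cams: (K, H, W) normalized CAMs; y: (K,) image-level label vector.
    Classes absent from the image (y_k = 0) get an all-zero map."""
    p = (cams >= thr).astype(np.uint8)
    p[np.asarray(y) == 0] = 0
    return p
```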
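Two stages then follow in order:
- Mask assignment: as noted above, SAM's masks are class-agnostic and carry no class information. This stage checks which class's CAM pseudo-label each SAM mask intersects most, and assigns the mask to that class.
In the notation above, each SAM mask $m_i$ is intersected with the pseudo-label $p_k$ of every class and assigned to the class with the largest intersection, $k^* = \arg\max_k |m_i \cap p_k|$. If $m_i$ does not overlap any pseudo-label, it is ignored. In particular, if class $k$ is absent from the image, $p_k$ has no activation, so no mask is assigned to it. After all SAM masks are processed, we obtain a mask assignment list $A = \{A_1, \dots, A_K\}$, where $A_k$ contains the masks assigned to class $k$ (one class can have several masks).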
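Here is a sketch of the mask-assignment stage as described above, in NumPy. It assumes $M$ and $P$ are stored as binary arrays of shape (N, H, W) and (K, H, W). The function name is hypothetical, and ties are broken toward the lower class index, a detail the note does not specify.

```python
import numpy as np

def assign_masks(masks: np.ndarray, pseudo_labels: np.ndarray) -> list:
    """Mask assignment.
    masks: (N, H, W) binary SAM masks; pseudo_labels: (K, H, W) binary CAM pseudo-labels.
    Returns a list of K lists; entry k holds the indices of the masks assigned to class k."""
    K = pseudo_labels.shape[0]
    assignment = [[] for _ in range(K)]
    # inter[i, k] = |m_i ∩ p_k|, counted over all pixels
    inter = np.einsum("nhw,khw->nk", masks.astype(np.int64), pseudo_labels.astype(np.int64))
    for i in range(masks.shape[0]):
        if inter[i].max() == 0:  # overlaps no pseudo-label: ignore this mask
            continue
        assignment[int(inter[i].argmax())].append(i)  # class with the largest intersection
    return assignment
```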
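- Mask selection: to overcome CAM's false activation, the SAM masks that overlap most with the CAM activation map are selected. Here, false activation means the CAM pseudo-label is too large rather than lying on the wrong class, which is why the paper refers to the case where the masks are known to overlap minimally with the background. At the same time, to overcome partial activation, SAM masks that overlap several CAM pseudo-label regions are also used. The two stages are illustrated below.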
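Let $\gamma = \frac{|m \cap p_k|}{|m|}$ be the fraction of a SAM mask $m$ covered by the CAM pseudo-label $p_k$. This presumably targets partial activation: even when the pseudo-label is small, the mask is object-aware and is not split, so the whole object is still segmented. Let $\beta = \frac{|m \cap p_k|}{|p_k|}$ be the fraction of the pseudo-label covered by the mask. This presumably targets false activation: when the pseudo-label is large and the mask is smaller, object awareness means only the mask region is segmented. A mask is kept as part of the enhanced pseudo-label if $\gamma$ and $\beta$ exceed the following thresholds:
- $\gamma > \tau_\gamma$: the pseudo-label covers more than a fraction $\tau_\gamma$ of the mask (partial activation)
- $\beta > \tau_\beta$: the mask covers more than a fraction $\tau_\beta$ of the pseudo-label (false activation)
This is an AND condition: a mask is kept as part of the enhanced pseudo-label only if both conditions hold.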
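The two ratios and the AND test can be written directly. In this sketch, `tau_gamma` and `tau_beta` stand in for the paper's thresholds, whose values are not given here.

```python
import numpy as np

def overlap_ratios(mask: np.ndarray, pseudo_label: np.ndarray) -> tuple:
    """gamma = |m ∩ p| / |m|: fraction of the SAM mask covered by the pseudo-label.
    beta  = |m ∩ p| / |p|: fraction of the pseudo-label covered by the SAM mask."""
    inter = np.logical_and(mask, pseudo_label).sum()
    gamma = inter / max(int(mask.sum()), 1)          # guard against empty masks
    beta = inter / max(int(pseudo_label.sum()), 1)   # guard against empty pseudo-labels
    return float(gamma), float(beta)

def keep_mask(mask, pseudo_label, tau_gamma: float, tau_beta: float) -> bool:
    """A mask is kept only if BOTH ratios exceed their thresholds (AND)."""
    gamma, beta = overlap_ratios(mask, pseudo_label)
    return gamma > tau_gamma and beta > tau_beta
```

For example, take a 4-pixel mask that lies entirely inside an 8-pixel pseudo-label. Then $\gamma = 4/4 = 1.0$ and $\beta = 4/8 = 0.5$, so the mask survives only if $\tau_\beta < 0.5$.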
Next, the algorithm:
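First for loop: mask assignment
- Each mask takes the class whose CAM pseudo-label has the largest intersection with it.
Second for loop: mask selection
- If the pseudo-label $p_k$ has no activation (class $k$ is absent), there is no need to decide which masks in its set to adopt.
For each remaining class $k$:
- Loop over every mask $m$ in the class-$k$ mask set $A_k$ and compute $\gamma$ and $\beta$ against $p_k$. If both exceed their thresholds, merge $m$ into tmp.
- If tmp is empty, i.e. no mask passed the thresholds, set tmp to the pseudo-label $p_k$.
- Update the pseudo-label with the element-wise OR of tmp and $p_k$, and assign the result to class $k$.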
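Putting the second loop together, here is a sketch that follows this note's step-by-step description, including the final element-wise OR between tmp and $p_k$. It assumes boolean arrays and the index-list assignment format from the first loop (a hypothetical representation). Check the paper's Algorithm for the exact update rule.

```python
import numpy as np

def select_masks(pseudo_labels: np.ndarray, masks: np.ndarray, assignment: list,
                 tau_gamma: float, tau_beta: float) -> np.ndarray:
    """Second loop (mask selection), following this note's description.
    pseudo_labels: (K, H, W) bool; masks: (N, H, W) bool;
    assignment: list of K lists of mask indices. Returns updated (K, H, W) bool labels."""
    out = pseudo_labels.copy()
    for k, p in enumerate(pseudo_labels):
        if not p.any():  # class k absent: no activation, skip its mask set
            continue
        tmp = np.zeros_like(p)
        for i in assignment[k]:
            m = masks[i]
            inter = np.logical_and(m, p).sum()
            gamma = inter / max(m.sum(), 1)  # fraction of the mask covered by p_k
            beta = inter / p.sum()           # fraction of p_k covered by the mask
            if gamma > tau_gamma and beta > tau_beta:
                tmp |= m  # merge every mask that passes both thresholds
        if not tmp.any():  # nothing passed: fall back to the original pseudo-label
            tmp = p
        out[k] = tmp | p  # element-wise OR, as described in the note
    return out
```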
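The method only improves pseudo-label quality, so it can be applied on top of any model.
The results are shown below. Combining CAM's class awareness with SAM's object awareness, SAM yields very good pseudo-labels even when CAM suffers from false or partial activation.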
Experiments
Datasets: PASCAL VOC 2012 and MS COCO 2014
Because the setting is weakly supervised, only image-level labels are used as ground truth when generating pseudo-labels.
mIoU is used as the evaluation metric.
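For reference, here is a common way to compute mIoU from a confusion matrix. This is a generic sketch, not the paper's evaluation code; `ignore_index=255` follows the usual PASCAL VOC convention for void pixels.

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int, ignore_index: int = 255) -> float:
    """mIoU via a confusion matrix; pixels labeled ignore_index in gt are skipped.
    Classes that appear in neither pred nor gt are excluded from the mean."""
    valid = gt != ignore_index
    # Row = ground-truth class, column = predicted class
    idx = num_classes * gt[valid].astype(np.int64) + pred[valid].astype(np.int64)
    conf = np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    present = union > 0
    return float((inter[present] / union[present]).mean())
```

For example, with `gt = [0, 0, 1, 1]` and `pred = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so mIoU is 7/12.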
SAM is run with the official code; see the paper for hyperparameter settings and model choices.
On PASCAL VOC:
Pseudo-label quality of existing SOTA methods with and without this method:
Segmentation results when pseudo-labels from existing SOTA methods, with and without this method, are used to train an SSS (Supervised Semantic Segmentation) model:
On COCO:
Pseudo-label quality of existing SOTA methods with and without this method:
Segmentation results when pseudo-labels from existing SOTA methods, with and without this method, are used to train an SSS (Supervised Semantic Segmentation) model:
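When does SAM not help?
The authors analyze the cases where SAM is ineffective and attribute them to three sources: the initial pseudo-labels, the SAM masks, and the enhancement algorithm.
Initial pseudo-labels: the original pseudo-labels are used to find the relevant masks. If they activate the wrong regions or fail to activate the correct ones, SAM cannot help and may even degrade the pseudo-labels.
In the example below, the pseudo-label wrongly activates the decorations and under-activates the boat.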
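SAM masks: SAM handles most VOC and COCO images, but it occasionally merges different objects into one mask or misses the correct region. In the figure below, the first row shows the pet and other objects treated as a single mask, and the second row shows some regions left unsegmented.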
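Enhancement Algorithm: although the algorithm handles most cases, it runs into trouble when a SAM mask contains other masks. In the figure below, the mask in column g contains those in columns e and f. Pseudo-labels are often partially activated, and the algorithm is biased toward larger masks in order to produce more complete pseudo-labels. In this case it ends up misleading the pseudo-label.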
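Later in the paper, the method is also applied to scenes with very sparse backgrounds, where SAM is very helpful in delineating fine object details. That part is worth reading separately.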