Segment Anything Model (SAM) Enhances PseudoLabels for Weakly Supervised Semantic Segmentation

Paper: NeurIPS 2023
Original: Segment Anything Model (SAM) Enhances PseudoLabels for Weakly Supervised Semantic Segmentation

Categories:

Weakly Supervised
Semantic Segmentation
Segment Anything Model (SAM)

Goal:

To produce better pseudo-labels.

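Weakly supervised semantic segmentation (WSSS) aims to learn segmentation from image-level labels only, rather than pixel-level annotations.
Most methods use CAM to obtain pixel-level pseudo-labels and then train a fully supervised semantic segmentation model on them. The problem is that CAM produces a coarse, class-based region (see the figure below) rather than an object-based boundary that traces the object's contour.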

[Figure unavailable (original figure)]
This paper uses the CAM-generated pseudo-labels as guidance and combines them with SAM's masks.

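Segment Anything Model (SAM), released by facebookresearch, segments "anything". This means it does not know the class of the objects it outlines, which is why it is called class-agnostic. The model can produce fine-grained instance masks.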

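A typical CAM-based WSSS pipeline has four steps:

1. First, train a classification model on images with image-level labels.
2. Weight the model's intermediate feature maps with the class-specific weights and sum them to obtain a coarse localization map for each class.
3. Build pixel-level pseudo-labels with pixel affinity-based methods or saliency guidance.
4. Finally, train a semantic segmentation model on these pixel-level pseudo-labels.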
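Step 2 (the CAM itself) is just a class-weighted sum of feature maps. Below is a minimal NumPy sketch. It assumes the classic CAM setup: global average pooling followed by a single linear layer. The function name `compute_cam` and the per-class max normalization are illustrative choices. Upsampling the maps to the input size $H \times W$ is omitted.

```python
import numpy as np

def compute_cam(features: np.ndarray, fc_weights: np.ndarray) -> np.ndarray:
    """Class activation maps from a GAP + linear classifier (sketch).

    features:   (C, h, w) feature maps from the last convolutional layer
    fc_weights: (K, C) weights of the linear classification layer
    returns:    (K, h, w) maps, each scaled so its maximum is 1
    """
    # M_k(i, j) = sum_c w_{k,c} * F_c(i, j)
    cams = np.einsum("kc,chw->khw", fc_weights, features)
    cams = np.maximum(cams, 0)  # keep only positive evidence
    # normalize each class map by its peak; the floor avoids division by zero
    peak = cams.reshape(cams.shape[0], -1).max(axis=1)
    return cams / np.maximum(peak, 1e-8)[:, None, None]
```

Thresholding these maps gives the coarse "seed" regions that steps 3 and 4 refine.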

Survey

Pixel affinity-based methods:

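AffinityNet, a deep neural network, predicts the semantic affinity between neighboring pixels. Its training targets come from the CAM: thresholding the CAM keeps only the confident regions, and a pair of neighboring pixels is labeled 1 if they belong to the same class and 0 otherwise.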
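The affinity targets described above can be sketched as follows. This is a simplification under stated assumptions. Only each pixel's right and lower neighbor is compared; AffinityNet itself uses all pairs within a radius. Unconfident pixels carry a hypothetical `IGNORE` value, and any pair touching one is excluded.

```python
import numpy as np

IGNORE = 255  # hypothetical marker for pixels whose CAM score is not confident

def affinity_labels(seed: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Affinity targets between each pixel and its right / lower neighbor (sketch).

    seed: (H, W) label map from thresholded CAMs: a class id where the CAM is
          confident, IGNORE elsewhere.
    Returns (right, down): 1 = same class, 0 = different class,
    IGNORE = the pair touches an unconfident pixel.
    """
    def pair(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        out = (a == b).astype(np.uint8)
        out[(a == IGNORE) | (b == IGNORE)] = IGNORE
        return out

    right = pair(seed[:, :-1], seed[:, 1:])  # shape (H, W-1)
    down = pair(seed[:-1, :], seed[1:, :])   # shape (H-1, W)
    return right, down
```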

[Figure unavailable]

Saliency guidance:

a binary mask segmenting the one object a person is most likely to look at first

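It is a binary mask marking where a person would most likely look first when viewing the image. As a result, it has difficulty identifying multiple classes. In the example below, only the train is marked.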

[Figure unavailable]

I suspect this is why the method was evaluated only on the following datasets: PASCAL VOC 2012 and MS COCO 2014.

SAM Enhanced pseudo-labels (SEPL)

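During training, each image $X \in \mathbb{R}^{H \times W \times C}$ is paired with an image-level label vector $y = [y_{1}, y_{2}, \ldots, y_{K}]^{T} \in \{0, 1\}^{K}$. Here $y_{k} = 1$ means class $k$ is present in the image, and $y_{k} = 0$ otherwise.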

First, a classifier $f$ is trained on the dataset. In the WSSS stage, each image is fed into $f$ to obtain the Class Activation Maps (CAMs) $M = [M_{1}, \ldots, M_{K}]$, where $M_{k} \in \mathbb{R}^{H \times W}$.
SAM takes an image $X \in \mathbb{R}^{H \times W \times C}$ as input and returns a list of masks, one for each subpart (i.e., each object in the image): $S = [S_{1}, \ldots, S_{L}]$, where $S_{l} \in \{0, 1\}^{H \times W}$ and $L$ is the number of masks, which depends on the image. For example, $S_{3} \in \{0, 1\}^{H \times W}$ records which pixels of the $H \times W$ image belong to the 3rd mask: 1 if they do, 0 otherwise.
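Turning SAM's output into the stack $S$ is a one-liner. The sketch below assumes SAM's automatic mode from the official `segment_anything` package, whose `SamAutomaticMaskGenerator.generate()` returns a list of dicts with a boolean `"segmentation"` array of shape $(H, W)$. The helper name `masks_to_stack` is hypothetical. The SAM call itself is shown only in comments because it needs a downloaded checkpoint.

```python
import numpy as np

def masks_to_stack(sam_output: list[dict]) -> np.ndarray:
    """Stack SAM's automatic masks into S with shape (L, H, W), values in {0, 1}.

    Each element of sam_output must hold a boolean (H, W) array under the
    "segmentation" key. Requires at least one mask (np.stack fails on an empty list).
    """
    return np.stack([m["segmentation"] for m in sam_output]).astype(np.uint8)

# Producing sam_output with the official package (needs a checkpoint file):
#   from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
#   sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
#   sam_output = SamAutomaticMaskGenerator(sam).generate(image)  # HxWx3 uint8 RGB
```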

Together, CAM and SAM provide class information and object information, respectively.

Two sequential stages follow:

Mask assignment: as noted above, SAM masks are class-agnostic and carry no class information. This stage checks which class's CAM pseudo-label each SAM mask $S_{l}$ intersects most, and assigns the mask to that class.

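In the notation above, each SAM mask $S_{l}$ is intersected with the pseudo-label $P_{k}$ (derived from the CAM) of every class $k \in \{1, \ldots, K\}$. The mask is then assigned to the class with the largest intersection. A mask that overlaps no pseudo-label is ignored. In particular, if a class is absent, its pseudo-label has no activation, so no mask is assigned to it. After all SAM masks are processed, the result is a mask assignment list $A = [A_{1}, \ldots, A_{K}]$, where $A_{k}$ contains the masks assigned to the $k$-th class (one class can receive several masks).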
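The assignment rule above can be sketched in NumPy. This is a minimal sketch, not the authors' code. It assumes each $P_k$ is already a binary mask (e.g., a thresholded CAM). The function name `assign_masks` is hypothetical, and ties go to the lowest class index via `argmax`.

```python
import numpy as np

def assign_masks(S: np.ndarray, P: np.ndarray) -> list[list[int]]:
    """Mask assignment (sketch): give each class-agnostic SAM mask a class.

    S: (L, H, W) binary SAM masks
    P: (K, H, W) binary per-class pseudo-labels
    Returns A, where A[k] lists the indices of the masks assigned to class k.
    """
    # inter[l, k] = |S_l ∩ P_k|, counted in pixels
    inter = np.einsum("lhw,khw->lk", S.astype(np.int64), P.astype(np.int64))
    A: list[list[int]] = [[] for _ in range(P.shape[0])]
    for l in range(S.shape[0]):
        if inter[l].max() == 0:
            continue  # overlaps no pseudo-label: ignore this mask
        A[int(inter[l].argmax())].append(l)
    return A
```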

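Mask selection: to overcome CAM's false activation, the SAM masks that overlap most with the CAM activation map are selected. (False activation here mostly means the CAM pseudo-label is too large, not that it lies on the wrong class; hence the paper's condition of minimal overlap with the background.) To overcome partial activation, SAM masks that overlap multiple CAM pseudo-label regions are also used. The two stages are shown below: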

[Figure unavailable]

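$O_{s}$ is the portion of the mask (from SAM) covered by the pseudo-label (from CAM). This presumably addresses partial activation: even when the pseudo-label is small, the mask is object-aware, so the whole object can still be segmented.

$O_{p}$ is the portion of the pseudo-label covered by the mask. This presumably addresses false activation: when the pseudo-label is too large and the mask is smaller, the object-aware mask ensures that only the mask region is segmented.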
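Written as ratios, the two overlaps can be formalized as below. This is one formalization of the wording above, assuming $|\cdot|$ counts the pixels equal to 1, $S_l$ is a mask assigned to class $k$, and $P_k$ is that class's pseudo-label:

```latex
O_s = \frac{|S_l \cap P_k|}{|S_l|}, \qquad
O_p = \frac{|S_l \cap P_k|}{|P_k|}
```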

A mask $S$ is kept as an enhanced pseudo-label if $O_{s}$ and $O_{p}$ exceed the following thresholds:

1. $O_{s} > t_{1}$, with $t_{1} = 0.5$: the pseudo-label covers more than 50% of the mask $\to$ partial activation
2. $O_{p} > t_{2}$, with $t_{2} = 0.85$: the mask covers more than 85% of the pseudo-label $\to$ false activation

Note: this is an AND condition. Mask $S$ is kept as an enhanced pseudo-label only if both conditions above hold.
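A worked example of the AND rule with hypothetical pixel counts helps here: a mask with $|S| = 1000$, a pseudo-label with $|P| = 800$, and an overlap of $|S \cap P| = 700$.

```latex
O_s = \frac{|S \cap P|}{|S|} = \frac{700}{1000} = 0.70 > t_1 = 0.5, \qquad
O_p = \frac{|S \cap P|}{|P|} = \frac{700}{800} = 0.875 > t_2 = 0.85
```

Both conditions hold, so the mask is kept. If the pseudo-label instead spilled into the background so that $|P| = 1000$ with the same overlap, then $O_p = 700 / 1000 = 0.70 < 0.85$. The mask would be rejected under the AND rule even though $O_s$ still passes.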

The algorithm is shown below:

[Figure unavailable]

First for loop: mask assignment

Each mask is assigned to the class whose CAM pseudo-label has the largest intersection with it.

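Second for loop: mask selection

If the pseudo-label has no activation (the class is absent), there is no need to check whether that class's mask set should be used.

For each class:
1. In a for loop, compute $O_{s}$ and $O_{p}$ between the pseudo-label and each mask in the class-$k$ mask set; if both exceed their thresholds, merge that mask into tmp.
2. If tmp is empty, i.e., no mask passed the thresholds, set tmp to the pseudo-label itself.
3. Update the pseudo-label with the element-wise OR of tmp and the pseudo-label, and assign the result to the $k$-th class.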
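The second loop can be sketched in NumPy as follows. The sketch follows the steps exactly as listed in this note: the AND rule, the fallback to the pseudo-label, and the final element-wise OR. The paper's Algorithm 1 may differ in details. `select_masks` is a hypothetical name. `A` is the output of mask assignment, and the default thresholds are the note's $t_1 = 0.5$ and $t_2 = 0.85$.

```python
import numpy as np

def select_masks(S, P, A, t1=0.5, t2=0.85):
    """Mask selection as described in this note (sketch).

    S: (L, H, W) binary SAM masks; P: (K, H, W) binary pseudo-labels;
    A: A[k] = indices of the masks assigned to class k.
    Returns the enhanced pseudo-labels as a (K, H, W) boolean array.
    """
    S = S.astype(bool)
    P = P.astype(bool)
    enhanced = P.copy()
    for k in range(P.shape[0]):
        p_area = P[k].sum()
        if p_area == 0:
            continue  # no activation: class absent, nothing to evaluate
        tmp = np.zeros_like(P[k])
        for l in A[k]:
            inter = np.logical_and(S[l], P[k]).sum()
            o_s = inter / S[l].sum()   # fraction of the mask covered by the pseudo-label
            o_p = inter / p_area       # fraction of the pseudo-label covered by the mask
            if o_s > t1 and o_p > t2:  # AND rule, following this note
                tmp |= S[l]
        if not tmp.any():
            tmp = P[k].copy()          # step 2: fall back to the pseudo-label
        enhanced[k] = tmp | P[k]       # step 3: element-wise OR
    return enhanced
```

In the usage below, a partially activated pseudo-label (6 pixels) is grown to the full 8-pixel SAM mask, because $O_s = 0.75$ and $O_p = 1.0$ both pass.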

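The method only improves pseudo-label quality, so it can be applied on top of any model.

The results are shown below. By combining CAM's class awareness with SAM's object awareness, the method achieves very good results even when false and partial activation occur.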

[Figure unavailable]

Experiments

Evaluation datasets: PASCAL VOC 2012 and MS COCO 2014

Because the setting is weakly supervised, only image-level labels are used as ground truth when generating pseudo-labels.

mIoU is used as the evaluation metric.
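For reference, mIoU can be computed from a confusion matrix. The sketch below uses a hypothetical `mean_iou` helper. It assumes integer label maps and the PASCAL VOC convention of marking ignored pixels with 255. It averages only over classes that appear in the prediction or the ground truth. In a real evaluation the confusion matrix is accumulated over the whole dataset before the division.

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int,
             ignore_index: int = 255) -> float:
    """mIoU for one pair of label maps (sketch).

    pred, gt: integer label maps of the same shape, values in [0, num_classes);
    gt pixels equal to ignore_index are excluded.
    """
    valid = gt != ignore_index
    p = pred[valid].astype(np.int64)
    g = gt[valid].astype(np.int64)
    # conf[i, j] = number of pixels with ground truth i predicted as j
    conf = np.bincount(g * num_classes + p, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    present = union > 0  # skip classes absent from both maps
    return float((inter[present] / union[present]).mean())
```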

SAM is run with the official code; see the paper for the hyperparameter settings and the model variants used.

On the PASCAL VOC dataset

Comparison of pseudo-label quality when this method is added to previous SOTA models

[Figure unavailable]

Comparison of SSS (Supervised Semantic Segmentation) trained on pseudo-labels generated by previous SOTA models with this method added

[Figure unavailable]

On the COCO dataset

Comparison of pseudo-label quality when this method is added to previous SOTA models

[Figure unavailable]

Comparison of SSS (Supervised Semantic Segmentation) trained on pseudo-labels generated by previous SOTA models with this method added

[Figure unavailable]

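When does SAM not help?

The authors analyze the cases where SAM fails and attribute them to the initial pseudo-labels, the SAM masks, and the enhancement algorithm.

Initial pseudo-labels: the original pseudo-labels are used to find the relevant masks. If they activate the wrong regions or fail to activate the correct ones, SAM will not help and can even degrade the pseudo-labels.

In the example below, the pseudo-label wrongly activates the decorations and under-activates the boat.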

[Figure unavailable]

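SAM masks: SAM handles most VOC and COCO images, but it occasionally merges different objects into one mask or misses correct regions. In the figure below, the first row shows a pet merged with other objects into a single mask, while in the second row some regions are not segmented at all.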

[Figure unavailable]

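Enhancement algorithm: although the algorithm above handles most cases, some cases still cause problems. For example, a SAM mask may contain other masks: in the figure below, column g contains columns e and f. Pseudo-labels often suffer from partial activation, so the algorithm favors larger masks to produce more complete pseudo-labels. In this nested case, that preference misleads the pseudo-label.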

[Figure unavailable]

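Near the end, the paper applies the method to scenes with sparse, uncluttered backgrounds, where SAM delineates objects in very fine detail; see the paper for these examples.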
