# Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation

## Introduction: Weak-to-Strong Consistency Regularization

- supervise the ==strongly perturbed== unlabeled image $x^s$ with the prediction yielded from the ==weakly perturbed== image $x^\omega$ ($x^s$: trains the model; $x^\omega$: produces the prediction)

### Strengthening FixMatch

- **Expanding a broader perturbation space**: unified perturbation network
    - FixMatch limitation
        - perturbations are confined to the image level => the model cannot explore a broader perturbation space nor maintain consistency at diverse levels
    - Improvement
        - raw image: apply pre-defined image-level strong perturbations
        - extracted features of the weakly perturbed image: add single-channel dropout
- **Sufficiently harvesting original perturbations**: vary the input
    - FixMatch limitation
        - only a single strong view of each unlabeled image is used => the manually pre-defined perturbation space is not fully exploited
    - Improvement
        - ==randomly== sample ==dual independent== views from the perturbation pool
        - feed them into the student model in parallel
        - supervise both by their shared weak view

## Model Architecture

### Weak-to-Strong Consistency Regularization

- parameters
    - weak perturbation $A^\omega$
    - strong perturbation $A^s$
    - supervised loss $L_s$ (cross-entropy loss)
    - batch size for unlabeled data $B_u$
    - pre-defined confidence threshold $\tau$
- teacher model $\hat{F}$: produces pseudo labels on weakly perturbed images
    - ![image](https://hackmd.io/_uploads/rkEMRZtZR.png =35%x)
- student model $F$: leverages strongly perturbed images for model optimization
    - ![image](https://hackmd.io/_uploads/ryqv0WYWC.png =40%x)
- $H$: minimizes the entropy between $p^\omega$ and $p^s$
- unsupervised loss $L_u$
    - ![image](https://hackmd.io/_uploads/By8C0bYZR.png =60%x)
- loss function
    - $L=\frac{1}{2}(L_s+L_u)$

### Unified Perturbations for Images and Features

![image](https://hackmd.io/_uploads/SkJHGWFZA.png =40%x)

- inject perturbations on the features of the weakly perturbed image $x^\omega$
- parameters
    - segmentation model $f$
    - encoder $g$
    - decoder $h$
    - extracted features of $x^\omega$: $e^\omega = g(x^\omega)$
    - auxiliary feature perturbation: $p^{fp}=h(P(e^\omega))$, where $P$ denotes feature perturbations (dropout / adding uniform noise)
    - $p^s, p^\omega$: FixMatch outputs
- feedforward streams
    - simplest stream: $x^\omega\rightarrow f\rightarrow p^\omega$
    - image-level strong perturbation stream: $x^s\rightarrow f\rightarrow p^s$
    - feature perturbation stream: $x^\omega\rightarrow g\rightarrow P\rightarrow h\rightarrow p^{fp}$
- unsupervised loss
    ![image](https://hackmd.io/_uploads/rJpumbtW0.png)

### Double-Stream Perturbations

![image](https://hackmd.io/_uploads/r1mqNZK-R.png =45%x)

- idea: obtain dual-stream perturbations $(x^{s_1}, x^{s_2})$ from $x^\omega$ independently through the strong perturbation pool $A^s$
    - ($A^s$ is pre-defined but not deterministic, so $x^{s_1}\neq x^{s_2}$)
- parameters
    - $k_\omega$: classifier weight of the class predicted by $x^\omega$
    - $(q_{s_1}, q_{s_2})$: features of $(x^{s_1}, x^{s_2})$
    - classifier weight of class $i$: $k_i$
- loss function: InfoNCE loss
    ![image](https://hackmd.io/_uploads/H1SJS-FWC.png)
    - $(q_{s_1}, q_{s_2})$: positive pair
    - all classifier weights except $k_\omega$ are negative samples

### Holistic Framework

![image](https://hackmd.io/_uploads/B1G66eFWC.png =45%x)

- loss function
    ![image](https://hackmd.io/_uploads/ByRCt-YWC.png)

## Implementation Details

- segmentation model: DeepLabv3+ based on ResNet
- mini-batch: 8 labeled images + 8 unlabeled images
- $A^s$: ST++ & CutMix
- $x^\omega$: resized (0.5~2.0) $\rightarrow$ cropped $\rightarrow$ flipped
- dropout: 50%

## Conclusion

#### Since FixMatch with image-level strong perturbations already performs well :+1:

- unify image- and feature-level perturbations: diverse perturbation space
- dual-stream perturbation: fully exploits image-level perturbations
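
The confidence-thresholded unsupervised loss $L_u$ described above (cross-entropy between the student's prediction on the strong view and the hard pseudo label from the weak view, masked where the weak view's confidence falls below $\tau$) can be sketched in NumPy. This is a minimal illustrative sketch, not the paper's implementation; all function names are made up here.

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def unsupervised_loss(logits_weak, logits_strong, tau=0.95):
    """FixMatch-style L_u: per-pixel CE against the weak-view pseudo label,
    counted only where the weak view's max confidence >= tau."""
    p_w = softmax(logits_weak)          # (B, H, W, C) probs from x^w
    p_s = softmax(logits_strong)        # (B, H, W, C) probs from x^s
    pseudo = p_w.argmax(-1)             # hard pseudo labels from the weak view
    mask = p_w.max(-1) >= tau           # keep only confident pixels
    # -log p_s[pseudo] at every pixel
    ce = -np.log(np.take_along_axis(p_s, pseudo[..., None], axis=-1)[..., 0] + 1e-12)
    return (ce * mask).sum() / max(mask.sum(), 1)

# toy example: 1 image, 2x2 pixels, 3 classes
rng = np.random.default_rng(0)
lw = rng.normal(size=(1, 2, 2, 3)) * 5          # fairly confident weak logits
ls = lw + rng.normal(size=lw.shape) * 0.1       # strong view close to weak view
loss = unsupervised_loss(lw, ls, tau=0.5)
```

With the strong-view logits close to the weak-view logits, the masked cross-entropy stays small; pixels below the threshold contribute nothing, mirroring the indicator term in the paper's loss.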
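
The feature perturbation $P$ applied to $e^\omega = g(x^\omega)$ is channel dropout at 50%. A hedged NumPy sketch of Dropout2d-style channel dropout (the helper name and inverted-dropout scaling are illustrative choices, not taken from the paper's code):

```python
import numpy as np

def channel_dropout(features, p=0.5, rng=None):
    """Zero out entire feature channels with probability p and scale the
    surviving channels by 1/(1-p), i.e. inverted channel-wise dropout."""
    rng = rng or np.random.default_rng()
    B, C, H, W = features.shape
    keep = (rng.random((B, C, 1, 1)) >= p).astype(features.dtype)
    return features * keep / (1.0 - p)

rng = np.random.default_rng(0)
e_w = rng.normal(size=(2, 8, 4, 4))     # e^w = g(x^w): (batch, channels, H, W)
e_fp = channel_dropout(e_w, p=0.5, rng=rng)
# each channel is either zeroed out entirely or scaled by 1/(1-p) = 2
```

The perturbed features then pass through the decoder, $p^{fp}=h(P(e^\omega))$, giving the third feedforward stream alongside the plain weak and strong streams.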
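
The InfoNCE view of the dual-stream perturbation, as described in the note, treats $(q_{s_1}, q_{s_2})$ as the positive pair and the classifier weights of all classes except the pseudo-labeled one, $k_\omega$, as negatives. The sketch below is my illustrative reconstruction of that description in NumPy (the exact equation is in the paper; the temperature value and normalization here are assumptions):

```python
import numpy as np

def info_nce(q_s1, q_s2, class_weights, pseudo_class, temperature=0.1):
    """InfoNCE-style loss: pull q_s1 toward q_s2 (positive pair) and push
    it away from classifier weights k_i of all classes i != pseudo_class."""
    def sim(a, b):
        return a @ b / temperature
    pos = np.exp(sim(q_s1, q_s2))
    negs = sum(np.exp(sim(q_s1, k))
               for i, k in enumerate(class_weights) if i != pseudo_class)
    return float(-np.log(pos / (pos + negs)))

rng = np.random.default_rng(0)
d, num_classes = 16, 5
q1 = rng.normal(size=d); q1 /= np.linalg.norm(q1)
q2 = q1 + 0.05 * rng.normal(size=d); q2 /= np.linalg.norm(q2)  # near-identical views
K = rng.normal(size=(num_classes, d)); K /= np.linalg.norm(K, axis=1, keepdims=True)
loss = info_nce(q1, q2, K, pseudo_class=3)
```

Because $A^s$ is stochastic, the two streams give nearly-aligned but distinct features of the same content, which is exactly the positive pair this contrastive term rewards.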