[Notes] On Feature Normalization and Data Augmentation


CVPR 2021
arxiv
Github

Overview

Normalization
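- batch_norm and instance_norm differ only in which dimensions the statistics are computed over: batch_norm computes one $μ$, $σ$ per channel across the whole batch, while instance_norm computes one $μ$, $σ$ per channel within each individual instance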
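- Batch normalization is commonly used during training to rescale features
  - The features of a batch are standardized (shifted and scaled) by subtracting $μ$ and dividing by $σ$: $\tilde{z} = \frac{z - μ}{σ}$
  - They are then multiplied by $γ$ and shifted by $β$, so the features are not pinned to zero mean and unit variance: ${\hat{z}}^{i} = γ ⊙ {\tilde{z}}^{i} + β$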
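- Instance normalization, by contrast, performs better than batch_norm on tasks such as image generation, because the moments ($μ$, $σ$) obtained from instance_norm or positional normalization capture style and shape well
  - One explanation I have come across: image generation cares more about the differences between instances. Generating an image does not require information from other images, so the image should keep its own characteristics, namely its $μ$ and $σ$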
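
To make the difference between these normalizations concrete, here is a minimal NumPy sketch. It assumes features laid out as `(B, C, H, W)`; the helper `standardize` and the fixed values chosen for `gamma` and `beta` are illustrative only and are not taken from the paper.

```python
import numpy as np

def standardize(x, axes, eps=1e-5):
    """Subtract the mean and divide by the std computed over `axes`."""
    mu = x.mean(axis=axes, keepdims=True)
    sigma = np.sqrt(x.var(axis=axes, keepdims=True) + eps)
    return (x - mu) / sigma, mu, sigma

rng = np.random.default_rng(0)
z = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 16, 16))  # (B, C, H, W)

# Batch norm: one (mu, sigma) per channel, shared by the whole batch.
z_bn, mu_bn, _ = standardize(z, axes=(0, 2, 3))
# Instance norm: one (mu, sigma) per channel of each instance.
z_in, mu_in, _ = standardize(z, axes=(2, 3))
# Positional norm: one (mu, sigma) per spatial position of each instance.
z_pono, mu_pono, _ = standardize(z, axes=(1,))

# Affine step of batch norm; gamma and beta are learnable in practice,
# here they are fixed to arbitrary values for illustration.
gamma = np.full((1, 4, 1, 1), 1.5)
beta = np.full((1, 4, 1, 1), 0.5)
z_hat = gamma * z_bn + beta

print(mu_bn.shape, mu_in.shape, mu_pono.shape)
```

The shapes of the moments show what each variant keeps: batch norm has no per-instance statistics at all, while instance norm and positional norm each produce a separate set of moments for every instance.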
In image recognition (classification) tasks, the $μ$ and $σ$ of latent features are treated as noise that normalization such as batch_norm should remove. In image generation tasks, however, the $μ$ and $σ$ of latent features are themselves a kind of feature
- For example (figure 1), the $μ$ and $σ$ that positional normalization (PONO) extracts from the first layer of ResNet-18 are still enough to predict the class
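- Comparing the classification error rates: classifying from the moments alone (PONO moments, red) already beats random guessing (Random Baseline, gray). If the moments are removed (PONO normalized, blue), the result is worse than standard PONO (green), so the moments are in fact important features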
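The method in this paper builds on positional normalization. Since the moments represent shape and style, exchanging the moments of two instances forces the model to learn from instance A's normalized feature distribution and instance B's moments at the same time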
- The method is called Moment Exchange (MoEX)

Methodology

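The overview above already covers the core of the paper, so this section looks at the method in more detail. I have to say that the approach is extremely similar to DOMAIN GENERALIZATION WITH MIXSTYLE. The only difference is that this paper obtains $μ$ and $σ$ through intra-instance normalization (positional normalization here), whereas DOMAIN GENERALIZATION WITH MIXSTYLE obtains $μ$ and $σ$ through instance normalization.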

MoEX


The input $X_{A}$ passes through the model to produce features $h_{A}$, and intra-instance normalization yields the moments $(μ_{A}, σ_{A})$ of $h_{A}$. In the same way, the features $h_{B}$ of the input $X_{B}$ yield $(μ_{B}, σ_{B})$ through intra-instance normalization.

- Normalize the features of $A$, then scale and shift them with $(μ_{B}, σ_{B})$
- Normalize the features of $B$, then scale and shift them with $(μ_{A}, σ_{A})$

Replacing the moments of the normalized $h_{A}$ with $(μ_{B}, σ_{B})$ forces the model to attend to two aspects of the data at once: the normalized features and the moments.
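
The exchange described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes `(B, C, H, W)` features, uses positional normalization as the intra-instance normalization, and the names `pono`, `moex`, and `perm` are hypothetical.

```python
import numpy as np

def pono(h, eps=1e-5):
    """Positional normalization: moments over the channel axis of (B, C, H, W)."""
    mu = h.mean(axis=1, keepdims=True)
    sigma = np.sqrt(h.var(axis=1, keepdims=True) + eps)
    return (h - mu) / sigma, mu, sigma

def moex(h, perm):
    """Give every instance i the moments of instance perm[i]."""
    h_norm, mu, sigma = pono(h)
    return h_norm * sigma[perm] + mu[perm]

rng = np.random.default_rng(0)
h = rng.normal(size=(2, 8, 4, 4))   # index 0 plays A, index 1 plays B
perm = np.array([1, 0])             # swap the moments of A and B
h_mixed = moex(h, perm)

# h_mixed[0] corresponds to h_A^(B): normalized A features carrying B's moments.
_, mu_mixed, sigma_mixed = pono(h_mixed)
_, mu_orig, sigma_orig = pono(h)
print(np.allclose(mu_mixed[0], mu_orig[1], atol=1e-4))
print(np.allclose(sigma_mixed[0], sigma_orig[1], atol=1e-3))
```

Re-measuring the moments of the mixed features shows that instance A now carries the mean and standard deviation of instance B at every spatial position, while its normalized features are untouched.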

Ground-truth

$$y = λ \cdot ℓ (h_{A}^{(B)}, y_{A}) + (1 - λ) \cdot ℓ (h_{A}^{(B)}, y_{B}), \quad λ \in [0, 1]$$

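Since the mixed feature combines characteristics of two instances, its training target has to combine both ground truths as well: the loss on $h_{A}^{(B)}$ is a weighted sum of the losses against each instance's own ground truth.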

Table 10 of the original paper runs a parameter study on $λ$. Overall, $λ = 0.9$ is the most recommended setting.
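
As a small worked example of the interpolated loss, the sketch below assumes that $ℓ$ is the cross-entropy of a classifier; that choice, the helper names `cross_entropy` and `moex_loss`, and the example logits are assumptions made for illustration.

```python
import numpy as np

def cross_entropy(logits, label):
    """Cross-entropy of one logit vector against an integer class label."""
    shifted = logits - logits.max()                      # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())  # log-softmax
    return -log_probs[label]

def moex_loss(logits_mixed, y_a, y_b, lam=0.9):
    """lam * l(h_A^(B), y_A) + (1 - lam) * l(h_A^(B), y_B)."""
    loss_a = cross_entropy(logits_mixed, y_a)
    loss_b = cross_entropy(logits_mixed, y_b)
    return lam * loss_a + (1.0 - lam) * loss_b

# Hypothetical classifier output for the mixed feature h_A^(B).
logits = np.array([2.0, 0.5, -1.0])
loss = moex_loss(logits, y_a=0, y_b=1, lam=0.9)
print(loss)
```

With `lam=1.0` the loss reduces to the ordinary loss against $y_{A}$, so $λ$ controls how much weight the label of the instance that donated the moments receives.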

Normalization

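The method rests on one important premise: normalization may only be performed inside an instance, i.e. intra-instance normalization. Positional normalization, for example, standardizes across the channels at each spatial position of a single instance.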

$$({\hat{h}}_{i}^{ℓ}, μ_{i}^{ℓ}, σ_{i}^{ℓ}) = F (h_{i}^{ℓ})$$

Annotation

- $_{i}$ : the $i$-th input $x_{i}$
- $ℓ$ : the feature $h_{i}^{ℓ}$ of the $ℓ$-th layer

Suppose a function $F$ performs the intra-instance normalization. Given feature maps, $F$ returns (1) the normalized feature ${\hat{h}}_{i}^{ℓ}$, (2) $μ_{i}^{ℓ}$, and (3) $σ_{i}^{ℓ}$
- In terms of the figure above, $({\hat{h}}_{A}, μ_{A}, σ_{A}) = F (h_{A})$

$$h_{i}^{ℓ} = F^{- 1} ({\hat{h}}_{i}^{ℓ}, μ_{i}^{ℓ}, σ_{i}^{ℓ})$$

Annotation

- $_{i}$ : the $i$-th input $x_{i}$
- $ℓ$ : the feature $h_{i}^{ℓ}$ of the $ℓ$-th layer

Correspondingly, $F^{- 1}$ undoes the normalization. Given (1) ${\hat{h}}_{i}^{ℓ}$, (2) $μ_{i}^{ℓ}$, and (3) $σ_{i}^{ℓ}$, $F^{- 1}$ scales and shifts the normalized feature back
- In terms of the figure above, $h_{A}^{(B)} = F^{- 1} ({\hat{h}}_{A}, μ_{B}, σ_{B})$
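
The pair $F$ and $F^{- 1}$ can be sketched as two small NumPy functions. This is an illustrative sketch under assumptions: features are laid out as `(B, C, H, W)`, the `AXES` table is a hypothetical helper that maps a few intra-instance normalizations to the axes they reduce over (positional, instance, and layer normalization; group normalization is left out because it needs an extra reshape), and `eps` is a conventional stabilizer rather than a value from the paper.

```python
import numpy as np

# Axes reduced by each intra-instance normalization on (B, C, H, W) features.
# None of them includes axis 0, so no statistics are shared across instances.
AXES = {"PONO": (1,), "IN": (2, 3), "LN": (1, 2, 3)}

def F(h, axes, eps=1e-5):
    """Return the normalized feature together with its moments."""
    mu = h.mean(axis=axes, keepdims=True)
    sigma = np.sqrt(h.var(axis=axes, keepdims=True) + eps)
    return (h - mu) / sigma, mu, sigma

def F_inv(h_hat, mu, sigma):
    """Scale and shift a normalized feature with the given moments."""
    return h_hat * sigma + mu

rng = np.random.default_rng(0)
h = rng.normal(size=(2, 6, 5, 5))  # index 0 plays A, index 1 plays B

for name, axes in AXES.items():
    h_hat, mu, sigma = F(h, axes)
    # Feeding F_inv an instance's own moments recovers the original feature.
    assert np.allclose(F_inv(h_hat, mu, sigma), h)
    # Feeding it another instance's moments gives h_A^(B).
    h_a_b = F_inv(h_hat[0], mu[1], sigma[1])
    print(name, mu.shape[1:], h_a_b.shape)
```

Because $F^{- 1}$ only needs a normalized feature and a pair of moments, swapping in the moments of another instance requires no change to either function.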

MoEX confines itself to normalization inside an instance and simply swaps $μ_{i}^{ℓ}$ and $σ_{i}^{ℓ}$, so it can still be combined with inter-instance normalization (e.g. batch normalization), i.e. normalization across instances.

There are many kinds of intra-instance normalization (IN, GN, LN), and the authors report the corresponding experiments in Table 8.

My Conclusions

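I really do think this paper's approach is extremely similar to DOMAIN GENERALIZATION WITH MIXSTYLE. In principle the two are identical; they only approach the idea from slightly different angles and were submitted to different conferences.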
