
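# Differentiable Augmentation for Data-Efficient GAN Training

###### tags: `paper`

## [Slides](https://drive.google.com/drive/u/0/folders/1N-GQqji8LPbVCFME4HX_EMGJopfcdbgN)

## Abstract

Training a good GAN usually requires a large amount of image data. This paper aims to improve data efficiency through Differentiable Augmentation (DiffAugment). DiffAugment applies the augmentation to both generated and real images, which gives better results and more stable training. The paper reports FID 6.8 and IS 100.8 on ImageNet 128x128, and claims that on CIFAR-10 only 20% of the original training data is needed to get close to the current state of the art.

- IS (Inception Score): a metric for evaluating GAN models. It uses Inception Net-V3 (Google's image classification model) to classify the generated images and checks whether they are **sharp** and **diverse**. Higher is better.
- FID (Fréchet Inception Distance): a weakness of IS is that it ignores the distance between generated and real images. FID is a metric designed to address this: it evaluates the model by computing the distance between the feature vectors of generated images and those of real images. Lower is better, and an FID of 0 means the generated images are indistinguishable from the real ones.

## Introduction

GANs have advanced very quickly in recent years. State-of-the-art models can already generate highly realistic images, which has driven many applications. However, GANs still place heavy demands on computation and training data. Many studies have proposed ways to reduce the computational cost, but data efficiency still leaves room for improvement. Well-known datasets such as ImageNet provide more than a million images to support GAN development. Collecting and labeling that many images takes a large amount of human labor, and sometimes that many images simply cannot be obtained, so reducing the amount of data a GAN needs is important.

However, if the training data is limited to 10% or 20% of the original, model performance drops sharply. D's training accuracy saturates very quickly while its validation accuracy keeps falling. It is reasonable to infer that this is overfitting: D has simply memorized the training set instead of learning general features.

Data augmentation is the usual answer to this problem: cropping, flipping, scaling, masking, and so on increase the diversity of the data without collecting more of it. Applying data augmentation to a GAN, however, is fundamentally different. If we augment only the real images, we are to some extent encouraging G to generate the augmented images. As an alternative, we can augment both the real and the generated images when training D, but this upsets the delicate balance between G and D and leads to poor convergence. This paper proposes a different solution, DiffAugment: for the training of both G and D, the same differentiable augmentation is applied to both real and generated images.

## Method

Recall the GAN setup, which gives the two loss expressions below. G samples z at random from the latent space and generates G(z), and D learns to distinguish G(z) from real data x. The most basic GAN alternates between optimizing D and G, that is, between these two loss functions:

![](https://i.imgur.com/CjlxlnS.png)

> The loss functions f~D~ and f~G~ here can be chosen differently, so different GAN losses fit this form.

Although many studies have proposed better GAN architectures and loss functions, the underlying overfitting problem remains. D still tends to memorize the real data it has seen. Once that happens, D penalizes every generated image, and G no longer receives useful information.

![](https://i.imgur.com/4uU3zMR.png)

The figure above shows that even for BigGAN, currently a very powerful model, and even with 100% of the dataset, the gap between training accuracy and validation accuracy keeps widening, which indicates that D is memorizing the training images.

> BigGAN already uses Spectral Normalization to constrain the output of each layer, but overfitting still occurs.

### Augment reals only

The most direct way to apply augmentation to a GAN is to apply an augmentation T to the real images x only, which gives the objectives shown below.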
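
Before the equations, a small sketch may help make the role of T concrete. It is only an illustration under stated assumptions: NumPy is used instead of a deep learning framework, the hinge loss is assumed as the choice of f~D~ and f~G~, and the sign convention (f~D~ applied to -D(real) and to D(fake)) is an assumption, since the equations in these notes are images. The names `center_cutout`, `d_loss_reals_only`, and the toy `D` and `G` are hypothetical stand-ins, not anything from the paper's code.

```python
import numpy as np

def hinge_f_d(v):
    """Assumed f_D (hinge loss): max(0, 1 + v)."""
    return np.maximum(0.0, 1.0 + v)

def hinge_f_g(v):
    """Assumed f_G (hinge loss): identity."""
    return v

def d_loss_reals_only(D, G, T, x, z, f_d=hinge_f_d):
    """Discriminator loss when only the real batch is augmented."""
    real_term = f_d(-D(T(x))).mean()  # D is trained on T(x), not on x
    fake_term = f_d(D(G(z))).mean()   # generated images are left untouched
    return real_term + fake_term

def g_loss(D, G, z, f_g=hinge_f_g):
    """Generator loss: unchanged, because T never touches G(z) here."""
    return f_g(-D(G(z))).mean()

def center_cutout(images):
    """Hypothetical T: zero out a centered square of half the image size."""
    out = images.copy()
    h, w = out.shape[-2:]
    out[..., h // 4 : h // 4 + h // 2, w // 4 : w // 4 + w // 2] = 0.0
    return out

# Toy stand-ins (hypothetical): all-white "real" images, a generator that
# reproduces them exactly, and a discriminator that scores mean brightness.
x = np.ones((4, 3, 8, 8))
z = np.zeros((4, 16))
G = lambda latent: np.ones((latent.shape[0], 3, 8, 8))
D = lambda images: images.mean(axis=(1, 2, 3))

loss_plain = d_loss_reals_only(D, G, lambda im: im, x, z)  # no augmentation
loss_aug = d_loss_reals_only(D, G, center_cutout, x, z)    # cutout on reals only
print(loss_plain, loss_aug)
```

In this toy setup the generator already matches x exactly, yet once T is applied the images D is trained to call "real" have holes in them (mean brightness 0.75 instead of 1.0). The real-side target has moved from x to T(x), while the generator loss still only ever sees un-augmented G(z).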
![](https://i.imgur.com/epnURnn.png)

However, this approach actually deviates from the original goal of a GAN: the generator learns the distribution of T(x) rather than that of x. This restricts which augmentations we can use (possibly only horizontal flips remain). With anything stronger, G learns to generate the altered images (unnatural color, cutout holes).

![](https://i.imgur.com/wPVVWLx.png)

This table shows the side effects of applying strong augmentation in the "augment reals only" and "augment D only" settings.

### Augment D only

To fix the problem of "augment reals only", so that G does not generate images that look altered, the following method was derived: when training D, the augmentation T is applied to both the real x and the generated G(z):

![](https://i.imgur.com/w9xvgCw.png)

If G successfully models the distribution of x, then T(G(z)) and T(x) should be hard for D to tell apart. In experiments, however, this method gives even worse results.

![](https://i.imgur.com/kkIhpum.png)

D is very good at telling T(G(z)) from T(x), but it fails completely on the un-augmented G(z), where its accuracy is below 10%. The likely reason is that this setup breaks the balance between how G and D learn, which causes the whole training to fail. This example also illustrates how easily GAN training becomes unstable.

### DiffAugment for GAN

Building on the observations above, the paper proposes DiffAugment. The idea is that we must also propagate gradients through the augmented samples to G, so the augmentation T has to be differentiable. This is DiffAugment:

![](https://i.imgur.com/ICen65r.png)

Experiments on BigGAN with the *Translation, Cutout, Color* augmentations gave good results.

> Translation: shifting the image
> Cutout: masking with a random square of half image size
> Color: including random brightness, contrast, and saturation

## Experiments

Based on the leading class-conditional BigGAN and unconditional StyleGAN2.

### ImageNet

- Follows BigGAN on ImageNet at 128x128
- Augments real images with random horizontal flips, yielding the best reimplementation of BigGAN
- Uses simple Translation DiffAugment for all the data percentage settings
- Result

![](https://i.imgur.com/EvjbD2G.png)

### CIFAR-10 & CIFAR-100

- Models: BigGAN, CR-BigGAN, StyleGAN2
- Random horizontal flips are applied here as well
- The baselines already use advanced regularization techniques

> Translation + Cutout for BigGAN
> Color + Cutout for StyleGAN2 w/ 100% data
> Translation + Color + Cutout for StyleGAN2 w/ 10% or 20% data

- DiffAugment improves all the baselines without any hyperparameter changes
- Sets new state-of-the-art records

#### FID

![](https://i.imgur.com/Z0iygLV.png)

#### CIFAR-10 Details

![](https://i.imgur.com/movFXtI.png)

#### CIFAR-100 Details

![](https://i.imgur.com/QeCuhZY.png)

### Few-Shot Generation