# Generative Adversarial Network (GAN) 生成式對抗網路

Course: Hung-yi Lee, Machine Learning 2021 [link](https://www.youtube.com/watch?v=4OWp0wDu6Xw)

![](https://i.imgur.com/lTYKiQO.png)

- The network takes an input x together with a random variable z sampled from some distribution
    - The distribution must be simple enough to sample from
- The network's output is itself a distribution

## Unconditional Generation

- Ignore the input x for now
- Anime face generation
    - low-dim vector => high-dim image
    - The choice of distribution matters little; the generator learns to transform it

## Discriminator

![](https://i.imgur.com/8q2CwGz.png)

- Takes an image as input, outputs a scalar
- Tries to tell the generator's images apart from real ones

## Algorithm

- Initialize generator G and discriminator D
- In each training iteration:
    - Fix generator G, update discriminator D
        - Early on, G's outputs are necessarily poor, so D only needs to separate G's outputs from the ground truth
        - Framed as classification, one-hot labels separate generated samples from real ones (high scores for real objects, low scores for generated objects)
        - ![](https://i.imgur.com/G2tagiP.png)
    - Fix D, update G
        - G and D can be concatenated into one larger network
        - Only the generator's weights are updated
        - The goal is to make D's output as large as possible (more realistic, fooling the discriminator)
        - ![](https://i.imgur.com/E3eEI2q.png)
    - Repeat

## Theory

- Generator training: find generator parameters that minimize the divergence between P~G~ and P~data~
  ![](https://i.imgur.com/vXvh8da.png)
- GAN: as long as we can sample from P~G~ and P~data~, the divergence can be estimated
  ![](https://i.imgur.com/GwPrV5I.png)
- Discriminator training: find D's parameters that maximize the objective function
  ![](https://i.imgur.com/G3350Ma.png)
    - Equivalent to training a classifier
    - max V(D, G) is related to the JS divergence (intuition: if the data can be well classified, the two distributions differ substantially, i.e. higher divergence)
- [Other divergences](https://arxiv.org/abs/1606.00709)

## Tips

- JS divergence is not always suitable
    - In most cases P~G~ and P~data~ do not overlap (in high-dimensional space they may intersect only in a negligible set)
    - JS divergence is always log 2 when two distributions do not overlap
    - :star: Training a binary classifier on two non-overlapping data distributions usually yields 100% accuracy
    - :star2: For GANs, that accuracy (or the loss) is therefore not very meaningful
      ![](https://i.imgur.com/iF7Nocu.png)
    - We only access P~G~ and P~data~ through sampling
- Wasserstein distance
  ![](https://i.imgur.com/LiH3T93.png)
    - Given two distributions P and Q, move the mass of P until its distribution matches Q
    - The Wasserstein distance is defined by the moving plan with the smallest average moving distance
    - Because the W distance changes as the distributions get closer, the model can actually be trained
      ![](https://i.imgur.com/xZY37Zm.png)
    - Constraint: D must be sufficiently smooth (otherwise training D will not converge)
    - There are many ways to enforce this...
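The "moving plan" intuition above can be made concrete in one dimension, where the optimal plan simply pairs sorted samples. A minimal sketch (the function name `wasserstein_1d` and the equal-sample-size restriction are our simplifications):

```python
def wasserstein_1d(p_samples, q_samples):
    """1-D Wasserstein-1 distance between two equal-size empirical samples.

    In one dimension the cheapest moving plan pairs the sorted values,
    so W1 is just the mean absolute difference between sorted samples.
    """
    p, q = sorted(p_samples), sorted(q_samples)
    if len(p) != len(q):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)

# Unlike JS divergence, which saturates at log 2 for any pair of
# non-overlapping distributions, W1 keeps growing as Q moves away
# from P, so the generator always gets a usable training signal:
print(wasserstein_1d([0.0, 0.0, 0.0], [1.0, 1.0, 1.0]))    # 1.0
print(wasserstein_1d([0.0, 0.0, 0.0], [10.0, 10.0, 10.0]))  # 10.0
```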
    - [Improved WGAN: Gradient Penalty](https://arxiv.org/abs/1704.00028)
    - [Spectral Normalization](https://arxiv.org/abs/1802.05957)

## Evaluation

- Feed the generator's images into an image classifier and look at the class distribution
    - For a single image, a more concentrated class distribution means higher visual quality
    - Averaging the class distributions over many images, a more uniform result means higher diversity
    - Quality and diversity are thus evaluated over different sets of images
      ![](https://i.imgur.com/2Y1UNw7.png)
- Inception Score (IS)
    - Uses InceptionNet; good quality plus large diversity gives a higher IS
- Frechet Inception Distance (FID)
    - Feed many images into InceptionNet and take the vectors right before the softmax
    - Real images and generated images yield two Gaussians
    - FID is the distance between these two Gaussian distributions; smaller is better
      ![](https://i.imgur.com/s1zoSi4.png)

## Issues

- Mode collapse (diversity)
    - G produces many nearly identical images (the distribution concentrates in one region, e.g. only the hair color changes) to fool the discriminator
      ![](https://i.imgur.com/fiU9tL9.png)
- Mode dropping (diversity)
    - Within a single iteration G's outputs look diverse, but across many iterations the data barely changes (e.g. the face shape stays fixed and only the skin tone varies)
      ![](https://i.imgur.com/fmEjHbf.png)

## Conditional GAN

- Besides the sampled distribution, G can also take extra information x as input (e.g. the prompt "red eyes" tells the machine to generate red eyes)
- D must then also take both the extra information x and the generated data y as input
  ![](https://i.imgur.com/fwCVl6L.png)
- Example: image translation (pix2pix)
    - image in, image out
    - [Paper: Image-to-Image Translation with Conditional Adversarial Networks](https://arxiv.org/abs/1611.07004)
- Example: sound-to-image
    - [Paper: Towards Audio to Scene Image Synthesis using Generative Adversarial Network](https://arxiv.org/abs/1808.04108)
- Example: animating the Mona Lisa
    - [Paper: Few-Shot Adversarial Learning of Realistic Neural Talking Head Models](https://arxiv.org/abs/1905.08233)

## Unsupervised Conditional GAN

- Example: image style transfer
    - Input a real photo, output an anime character
    - Paired data is unavailable (collecting real photos and drawing the matching anime characters is too costly)
    - [Paper](https://arxiv.org/abs/1907.10830)
- Cycle GAN
    - Feeding only domain-x images (real photos) is not enough, since nothing ties the output to the input
    - The input x is transformed by the first G and then by a second G that must reconstruct the original x
      ![](https://i.imgur.com/f8e4h8T.png)

## Reference

- [Anime Face Generation](https://zhuanlan.zhihu.com/p/24767059)
- [Style GAN](https://www.gwern.net/Faces)
- [Progressive GAN](https://arxiv.org/abs/1710.10196)
- [BigGAN](https://arxiv.org/abs/1809.11096)
- [Training language GANs from Scratch](https://arxiv.org/abs/1905.09922)
- [Are GANs Created Equal? A Large-Scale Study](https://arxiv.org/abs/1711.10337)
- [Pros and cons of GAN evaluation measures](https://arxiv.org/abs/1802.03446)
- [More: Hung-yi Lee, GAN 2018](https://www.youtube.com/watch?v=DQNNMiAP5lw&list=PLJV_el3uVTsMq6JEFPW35BCiOQTsoqwNw)
- [More: Hung-yi Lee, VAE](https://www.youtube.com/watch?v=8zomhgKrsmQ)
- [More: Hung-yi Lee, Flow-based Generative Model](https://www.youtube.com/watch?v=uXY18nzdSsM)
- [wjohn1483.github.io](https://wjohn1483.github.io/page4/)
- [Unsupervised Abstractive Summarization](https://arxiv.org/abs/1810.02851)
- [Unsupervised Translation-1](https://arxiv.org/abs/1710.04087)
- [Unsupervised Translation-2](https://arxiv.org/abs/1710.11041)
- [Unsupervised ASR-1](https://arxiv.org/abs/1804.00316)
- [Unsupervised ASR-2](https://arxiv.org/abs/1812.09323)
- [Unsupervised ASR-3](https://arxiv.org/abs/1904.04100)
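Supplementing the FID definition in the Evaluation section above: the Frechet distance between two Gaussians is ||mu1 - mu2||^2 + Tr(S1 + S2 - 2(S1 S2)^(1/2)). Under the simplifying assumption of diagonal covariances the matrix square root becomes elementwise, giving a per-dimension formula. A minimal sketch (`fid_diagonal` is our own name; real FID implementations use the full covariance of InceptionNet features):

```python
import math

def fid_diagonal(mu1, var1, mu2, var2):
    """Frechet distance between two Gaussians with diagonal covariances.

    With diagonal S1 and S2 the trace term of the general formula
    decomposes into a sum over dimensions.
    """
    return sum(
        (m1 - m2) ** 2 + v1 + v2 - 2.0 * math.sqrt(v1 * v2)
        for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2)
    )

# Identical Gaussians give distance 0 (the best possible score);
# shifting one mean by 3 adds 3^2 = 9 to the distance.
print(fid_diagonal([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))  # 0.0
print(fid_diagonal([3.0], [1.0], [0.0], [1.0]))                      # 9.0
```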