Lecture 13: Generative Models

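---
tags: cs231
---
# Lecture 13: Generative Models

![](https://i.imgur.com/WL87l4S.png)
![](https://i.imgur.com/1cUUExD.png)

- A generative model is a type of model trained with unsupervised learning.
- An image generative model is the more specific case: the model is given a set of images as its training set, and we want it to generate images that resemble the training images as closely as possible.

![](https://i.imgur.com/RiLd6Xi.png)

# PixelRNN/CNN - Explicit Density Model

![](https://i.imgur.com/SSThWrM.png)

- The image is generated one pixel at a time, with each new pixel conditioned on the pixels generated before it. This turns the prediction of the joint distribution over all pixels in an image into a sequence of conditional predictions, $p(x) = \prod_{i} p(x_i \mid x_1, \dots, x_{i-1})$.

![](https://i.imgur.com/paVUbDk.png)

Pixels must be processed one at a time (left to right, top to bottom), so although the results are good, generation is very slow. In terms of performance, PixelCNN is faster than PixelRNN, but its results are not as good.

Group 2 questions:

1. Could you briefly introduce the two Markov chain models, the Boltzmann Machine and GSN?
    - Background reading: https://www.stockfeel.com.tw/%E9%A1%9E%E7%A5%9E%E7%B6%93%E7%B6%B2%E8%B7%AF%E7%9A%84%E5%BE%A9%E8%88%88%EF%BC%9A%E6%B7%B1%E5%BA%A6%E5%AD%B8%E7%BF%92%E7%B0%A1%E5%8F%B2/
    - Multi-step Markov chain methods are computationally expensive. For these reasons, GAN was designed to avoid Markov chains. Reference: https://sinpycn.github.io/2017/04/29/GAN-Tutorial-How-do-generative-models-work.html
2. Data augmentation and generative models can both be used to generate images. When should each one be used?

# Variational Autoencoder (VAE)

## AutoEncoder (AE)

![](https://i.imgur.com/0ovepsy.png)

:::info
:question: Is an AutoEncoder the same as a general-purpose data compression algorithm?

No. Unlike general-purpose algorithms such as ZIP, MP3, and JPG, a trained Autoencoder model is data-specific: it only works on the type of data it was trained on. Its reconstructions are also lossy, so some of the original information is lost.
:::

* Goal:
    * Make the output as close to the input as possible, i.e. make the reconstruction loss as small as possible
* Common applications:
    * Feature extraction
    * Dimensionality reduction
    * Anomaly detection

The autoencoder is solely trained to encode and decode with as little loss as possible, no matter how the latent space is organised. Thus, if we are not careful about the definition of the architecture, it is natural that, during the training, the network takes advantage of any overfitting possibilities to achieve its task as well as it can… unless we explicitly regularise it!

![](https://i.imgur.com/7m5iZGm.png)

## Variational Autoencoder (VAE)

![](https://i.imgur.com/CEUnAU6.png)

A variational autoencoder can be defined as an autoencoder whose training is regularised to avoid overfitting and to ensure that the latent space has good properties that enable a generative process.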
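
The regularisation described above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the lecture's implementation: it assumes the encoder outputs a mean `mu` and a log-variance `log_var` per latent dimension (a diagonal Gaussian), that the prior is a standard normal, and that reconstruction error is a plain squared error. The function names and the example values are hypothetical, and plain Python lists are used instead of a deep-learning framework to keep the arithmetic visible.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, diag(exp(log_var))) to N(0, I).

    This is the term that regularises the latent space.
    """
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def vae_loss(x, x_reconstructed, mu, log_var):
    """Reconstruction error (squared error, an assumption) plus the KL regulariser."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_reconstructed))
    return reconstruction + kl_to_standard_normal(mu, log_var)


rng = random.Random(0)
mu, log_var = [0.5, -1.0], [0.0, -2.0]  # hypothetical encoder outputs for one input
z = reparameterize(mu, log_var, rng)    # latent code that would be passed to the decoder
print("z =", z)
print("KL term =", kl_to_standard_normal(mu, log_var))
```

The KL term is zero only when the encoder output matches the prior exactly (`mu = 0`, `log_var = 0`), so minimising it pulls the encoded distributions toward a common, well-behaved region of the latent space, while the reconstruction term keeps them informative.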
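![](https://i.imgur.com/wsv9f7X.png)
![](https://i.imgur.com/3KQ4WOc.png)
![](https://i.imgur.com/mGbs2CH.png)
![](https://i.imgur.com/Og1TNN7.jpg)

With a VAE, each dimension of the latent space can be interpreted, e.g. as tilt angle, shape change, or change of facial expression, so ideally the generated image can be adjusted as desired.

https://towardsdatascience.com/understanding-variational-autoencoders-vaes-f70510919f73

## Denoising AE (DAE)

![](https://i.imgur.com/c0WFTk9.png)

A Denoising AE is a neural network that learns to denoise images. It can also be used to extract features from images similar to those in the training set. In practice, random noise is added to the input and the network is trained to recover the original noise-free data, so the model learns to denoise.

```
import numpy as np

# Placeholder data so the snippet runs on its own (an assumption, not from the
# original notes); replace with real images scaled to [0, 1].
x_train = np.random.rand(100, 28, 28)
x_test = np.random.rand(20, 28, 28)

noise_factor = 0.5  # controls the amount of noise; larger values add more noise
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape)
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape)
# Clip the data back into the [0, 1] range
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
```

![](https://i.imgur.com/NXFY35Y.png)
![](https://i.imgur.com/FjCIRpI.png)

## Sparse AE (SAE)

If a person can do many jobs A, B, C, ..., they are unlikely to be an expert at job A or at job B. Likewise, if a neuron responds to every training example, it makes little difference whether that neuron is there or not.

* Goal:
    * Add a sparse constraint so that each neuron becomes an expert at recognizing certain specific inputs.
* Advantage:
    * Helps avoid overfitting

![](https://i.imgur.com/bEUj82d.png)

https://github.com/y0ast/VAE-TensorFlow

# GAN (Generative Adversarial Networks)

- Implicit Density Model: for when we do not want explicit density modeling but still want to generate samples
- Key roles in a GAN
    - Training phase
        - Generator network: generates fake images, aiming to make the Discriminator classify these fake images as real
        - Discriminator network: tells real images from fake ones. If an image comes from the original data, the Discriminator should output 1; otherwise it should output 0.
        - Minimax objective function: to make both the Generator and the Discriminator progressively stronger, i.e. to reduce the loss of each network, the two are tied together in a minimax objective function, which serves as one overall loss function
- How to train (in each iteration)
    * Start from an initial Generator and Discriminator
    1. Fix the Generator and update the Discriminator's parameters (i.e. train only the Discriminator)
       ![](https://i.imgur.com/C5rQ803.jpg)
    2. Fix the Discriminator and update the Generator's parameters (i.e. train only the Generator)
       ![](https://i.imgur.com/IBoTixF.png)
    3. Algorithm steps
       ![](https://i.imgur.com/cSMfBPR.png)

---

:::info
:question: Group 3 asked how [that image]() was generated
* It uses Image-to-Image Translation with Conditional Adversarial Networks
    * [paper link](https://arxiv.org/pdf/1611.07004.pdf)
    * [code link](https://github.com/phillipi/pix2pix) (Torch implementation)
* Unlike a plain GAN, a conditional GAN takes a noise vector and an image as input, and tries to generate an image that the Discriminator judges to be real
    * GANs: learn a mapping from random noise vector z to output image y, $G : z \rightarrow y$
    * Image-to-image conditional GANs: learn a mapping from observed image x and random noise vector z to y, $G : \{x, z\} \rightarrow y$
* For details, see Section 3 (Method) of the paper linked above
:::

---

:::info
:question: Question 2 from Groups 2 and 3: in projects such as bank passbook and National Taxation Bureau image recognition, is it appropriate to use a GAN to generate new data?

:bulb: :man_in_tuxedo: I don't think it is necessarily appropriate, because:
1. Data augmentation generates data in fairly fixed ways, such as shifts, reflections, rotations, or color alterations, so we roughly know that the new images will not differ much from the originals. With a GAN, however, we cannot be sure what it will produce (I worry that whoever uses it will take one look and yell "GAN!" as well). See the images below (from reference 1):
    - Counting problems
      ![](https://i.imgur.com/8uDWK4Z.jpg)
    - Perspective problems
      ![](https://i.imgur.com/SvnlRxJ.jpg)
    - Structure problems
      ![](https://i.imgur.com/gUpQkSH.jpg)
2. Of course, there are also cases where adding GAN-generated data improves results, as in the paper in reference 3. The paper in reference 2, however, reports results below expectations.
3. I searched a bit, and DAGAN (Data Augmentation GAN) seems to fit this use case. There is also [code (DAGAN)](https://github.com/AntreasAntoniou/DAGAN), and its generated images look fairly good [(datasets)](https://drive.google.com/drive/folders/1IqdhiQzxHysSSnfSrGA9_jKTWzp9gl0k)
4. In short, it is still worth experimenting with.

* References for this question
    1. Online article: [一文看懂生成式对抗网络GANs：介绍指南及前景展望](https://36kr.com/p/5086889)
    2. 2017: [The Effectiveness of Data Augmentation in Image Classification using Deep Learning](https://arxiv.org/pdf/1712.04621.pdf)
    3. 2019: [Data Augmentation Using GANs](https://arxiv.org/pdf/1904.09135.pdf)
    4. Medium article: [GANs for Data Augmentation](https://medium.com/reality-engines/gans-for-data-augmentation-21a69de6c60b)
:::

---

- Generative Adversarial Nets: Convolutional Architectures
    - Code links:
        - [TensorFlow version](https://github.com/carpedm20/DCGAN-tensorflow)
        - [Torch version](https://github.com/soumith/dcgan.torch)
    - Paper link -> [Here](https://arxiv.org/pdf/1511.06434.pdf); the key part is Section 6.3.2
- Interpretable Vector Math

---

:::info
- [ ] :question: Group 4 asked: why is it hard for a GAN to converge?
    1. One side climbs while the other descends: in the contest between the Generator and the Discriminator, a decrease in the Generator's loss can come with an increase in the Discriminator's loss, so the two do not improve together. Sometimes, even when an equilibrium point is reached, it is a point that helps neither side (e.g. $f(x, y) = xy$)
    2. Generator mode collapse: on MNIST, for example, the Generator may find that images of the digit 1 easily fool the Discriminator, so it keeps producing the digit 1 over and over. It is like always guessing C on an exam :smile::smile::smile:
    3. Vanishing gradients for the Generator: when the Discriminator is too accurate, its loss quickly converges to 0 and it gives the Generator little information, so the Generator's loss barely changes, which causes vanishing gradients
    * References for this question
        1. [为什么GAN难以训练](https://www.jianshu.com/p/93f6c62eadbb): the function in point 1 comes from here
        2. [[機器學習] GAN 筆記](https://medium.com/hoskiss-stand/gan-note-791358c3b10b): point 2 comes from here
        3. [GAN不穩定因素](https://www.itread01.com/content/1544807946.html): nearly everything is based on this one XD
- [ ] :question: Group 4 also asked: if we want to use a GAN to generate data, which metrics tell us whether the model is good enough to be used to increase the amount of data?
    - Inception Score: measures the quality and diversity of generated images
    - Fréchet Inception Distance
    - Precision and recall measured with GAN-train and GAN-test
    - Reference paper: [How good is my GAN?](https://arxiv.org/pdf/1807.09499.pdf) -> mainly see 2. Related Work
:::

---

Presentation references:
1. Prof. Hung-yi Lee's 2018 course: [Link](http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2018/Lecture/GAN%20(v2).pdf), [video](https://www.youtube.com/watch?v=DQNNMiAP5lw&list=PLJV_el3uVTsMq6JEFPW35BCiOQTsoqwNw&index=1)

---

# Questions

| Group |<center> Question </center>|
|:-------------------:|----------------------|
| Group 1 |<center>Presenting group</center>|
| Group 2 |1. Could you briefly introduce the two Markov chain models, the Boltzmann Machine and GSN? 2. Data augmentation and generative models can both be used to generate images. When should each one be used? |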
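| Group 3 |1. How is image-to-image translation done? ![](https://i.imgur.com/K1bZ1GJ.jpg) 2. In projects such as bank passbook and National Taxation Bureau image recognition, is it appropriate to use a GAN to generate new data? 3. The four main AutoEncoder types are AutoEncoder (AE), Variational Autoencoder (VAE), Denoising AE (DAE), and Sparse AE (SAE). Which metrics should I use to decide which method to use for a given task? |
| Group 4 |1. The VAE introduced in this lecture uses a Normal distribution as the prior. Are there works that obtain better results with other priors, or does the choice of prior not matter? 2. Taken on its own, GAN is a very powerful model, but people often say it is hard to converge. What makes it hard to converge? 3. If we want to use a GAN to generate data, which metrics tell us whether the model is good enough to be used to increase the amount of data? |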
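
On Group 4's second question, the $f(x, y) = xy$ example cited in these notes can be simulated directly. The sketch below is an illustration under assumptions, not something taken from the lecture: it assumes both players update simultaneously with plain gradient steps and a fixed step size `lr`, with `x` as the minimizing player and `y` as the maximizing player. The function name `simultaneous_gda` and the starting point are hypothetical.

```python
import math


def simultaneous_gda(x, y, lr=0.1, steps=100):
    """Run simultaneous gradient descent-ascent on f(x, y) = x * y.

    x is updated to decrease f (df/dx = y); y is updated to increase f (df/dy = x).
    Returns the distance from the equilibrium (0, 0) at the start and after every step.
    """
    distances = [math.hypot(x, y)]
    for _ in range(steps):
        grad_x, grad_y = y, x  # both gradients are evaluated at the current point
        x, y = x - lr * grad_x, y + lr * grad_y
        distances.append(math.hypot(x, y))
    return distances


distances = simultaneous_gda(x=1.0, y=1.0)
print(f"distance from (0, 0): start {distances[0]:.3f}, end {distances[-1]:.3f}")
```

Under these assumptions each step multiplies the distance from $(0, 0)$ by $\sqrt{1 + lr^2}$, so the two players circle the equilibrium and drift away from it instead of settling there. This is one simple picture of why alternating improvements by the two networks need not converge.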