# books: GANs-in-Action

[github](https://github.com/GANs-in-Action/gans-in-action) and reference [0614 AIV NLP&GAN (I)](/geO-eoI1R2eWVaZI0EVDVQ)

[code]
- [cyclegan.ipynb](https://drive.google.com/file/d/1WsQ8ybWLjiks9j_qDXQCEXbjCdrhTy_Q/view?usp=sharing)
- [dcgan.ipynb](https://drive.google.com/file/d/140y8H3UCXX6AhyXXUrUB22ksWrqaDrJR/view?usp=sharing)
- [datasets for cyclegan](https://people.eecs.berkeley.edu/~taesung_park/CycleGAN/datasets/)

Noting some other articles first (fb 6/13 4:40 am):
1. [Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein GAN." arXiv preprint arXiv:1701.07875 (2017).](https://arxiv.org/pdf/1701.07875.pdf)
2. [Arjovsky, Martin, and Léon Bottou. "Towards principled methods for training generative adversarial networks." arXiv preprint arXiv:1701.04862 (2017).](https://arxiv.org/pdf/1701.04862.pdf)
3. [Zhihu — 令人拍案叫绝的Wasserstein GAN (a widely cited WGAN explainer)](https://zhuanlan.zhihu.com/p/25071913)
4. [Python Keras implementation — eriklindernoren/Keras-GAN](https://github.com/eriklindernoren/Keras-GAN/blob/master/wgan/wgan.py)

| GAN# | Networks | Input | Output | Loss | Goal | Remark |
| -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| GAN | $D(x)$ | | | loss='binary_crossentropy'<br>optimizer=Adam() | $D^* = \frac{P_r}{P_r + P_g}$ <br> $J_G = -J_D \Rightarrow 2\,JS(P_r\|P_g) - 2\log 2$ at the optimal $D$ | |
| | $G(z)$ | | | loss='binary_crossentropy'<br>optimizer=Adam() | | |
| DCGAN | $D(x)$ | | | loss='binary_crossentropy'<br>optimizer=Adam() | $D^* = \frac{P_r}{P_r + P_g}$ <br> $J_G \Rightarrow KL(P_g\|P_r) - 2\,JS(P_r\|P_g)$ at the optimal $D$ | |
| | $G(z)$ | | | loss='binary_crossentropy'<br>optimizer=Adam() | | but there is no real $x$ input, so this is NS-GAN |
| WGAN | $D(x)$ | | | | | |
| | $G(z)$ | | | | | |
| SGAN | $D_{sup}(x)$ | | | loss='categorical_crossentropy'<br>optimizer=Adam() | | |
| | $D_{unsup}(x)$ | | | loss='binary_crossentropy'<br>optimizer=Adam() | | |
| | $G(z)$ | | | loss='binary_crossentropy'<br>optimizer=Adam() | | but there is no real $x$ input, so this is NS-GAN |
| CGAN | $D(x)$ | | | | | |
| | $G(z)$ | | | | | |
| CycleGAN | $D(x)$ | | | loss='binary_crossentropy' or 'mse'<br>optimizer=Adam() | | |
| | $G(x)$ | | | $G_{AB}(A_{pic})$ <br> $G_{BA}(B_{pic})$ <br> $G_{BA}(G_{AB}(A_{pic}))$ <br> $G_{AB}(G_{BA}(B_{pic}))$ | | ** what would the corresponding loss definition / comparison baseline be in NLP??? |
| | Identity | | | $G_{AB}(B_{pic})$ <br> $G_{BA}(A_{pic})$ | | ** does NLP have this (identity) property? |

The six CycleGAN loss terms above correspond to the combined model's compile call:

```
# So basically we have six loss terms:
# (B ==> A) similarity to A: adversarial loss, DA(GBA(imgsB)), from the discriminator's output probability
# (A ==> B) similarity to B: adversarial loss, DB(GAB(imgsA)), from the discriminator's output probability
# round trip of an A image, GBA(GAB(imgsA)): cycle-consistency loss
# round trip of a B image, GAB(GBA(imgsB)): cycle-consistency loss
# A mapped into its own domain, GBA(imgsA): identity loss
# B mapped into its own domain, GAB(imgsB): identity loss
self.combined.compile(loss=['mse', 'mse', 'mae', 'mae', 'mae', 'mae'],
                      loss_weights=[1, 1,
                                    self.lambda_cycle, self.lambda_cycle,
                                    self.lambda_id, self.lambda_id],
                      optimizer=optimizer)
```

## Paper and article
1. [GAN slides by Ian Goodfellow](https://www.iangoodfellow.com/slides/2016-12-04-NIPS.pdf)
2. From NVIDIA researchers: progressively grow the network, from small images to large images... [paper: Progressive Growing of GANs for Improved Quality, Stability, and Variation](https://arxiv.org/abs/1710.10196)
3. [Wasserstein GAN](https://arxiv.org/pdf/1701.07875.pdf)

### Chapter 2: VAE
- A generative model should have
  - fidelity (samples look real)
  - diversity
- AE (autoencoder): x $\Rightarrow$ encoder $\Rightarrow$ latent space $z$ $\Rightarrow$ decoder $\Rightarrow$ $x^*$
  - $\|x - x^*\|$ is the reconstruction loss
- VAE (Variational AutoEncoder)
  - the output is not a discrete category; it varies continuously
  - VAE is based on Bayesian machine learning
  - an AE treats the latent space as an unstructured array of random numbers (taking MNIST as the example); a VAE instead tries to find a suitable description (a mean and a variance) of the region of latent space where the data lives
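To make the encoder/decoder pipeline and the reconstruction loss above concrete, here is a minimal autoencoder sketch in the same Keras style used later in these notes; the layer sizes, `latent_dim`, and the use of flattened 784-dim MNIST vectors are illustrative assumptions, not the book's code:

```
# Minimal AE sketch: x -> encoder -> z -> decoder -> x*, trained with a reconstruction loss
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

latent_dim = 32  # size of the latent space z (illustrative choice)

ae = Sequential([
    # encoder: x -> z
    Dense(128, activation='relu', input_dim=784),
    Dense(latent_dim, activation='relu'),
    # decoder: z -> x*
    Dense(128, activation='relu'),
    Dense(784, activation='sigmoid'),
])

# ||x - x*||: train the network to reproduce its own input
ae.compile(loss='mse', optimizer='adam')
# ae.fit(X_train_flat, X_train_flat, epochs=20, batch_size=128)  # inputs double as targets
```

A VAE differs in that the encoder outputs a mean and a (log-)variance instead of a single code, $z$ is sampled from that Gaussian, and a KL term is added to the reconstruction loss so the latent space follows the chosen prior.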
### Chapter 3: GAN basics
- Procedure, per training iteration:
  1. Discriminator
     - take a batch of real $x$ and a batch of generated $G(z) = x^*$
     - train the discriminator: minimize the loss with respect to $\theta_D$
     - remark: $G(z)$ here only produces $x^*$; the generator itself is not updated in this step
  2. Generator
     - take a batch of generated $G(z) = x^*$
     - train the generator: maximize the loss with respect to $\theta_G$
     - remark: the network used is gan = (discriminator + generator), with $\theta_D$ fixed while training $\theta_G$

### Chapter 4: DCGAN (Deep Convolutional GAN)
- DCGAN is not easy to train, so Batch Normalization is introduced
- plain normalization: $\hat{x} = \frac{x-\mu}{\sigma}$
- the benefit of normalization is that a layer's input 'features' no longer destabilize training just because they are on different scales; only when the input 'features' are scaled consistently is the comparison between them meaningful
- Batch Normalization
  - $\hat{x} = \frac{x-\mu}{\sqrt{\sigma^2+\epsilon}}$, where $\epsilon$ is a small constant that avoids division by zero
  - given $\hat{x}$, compute $y = \gamma\hat{x} + \beta$ before passing it to the next layer
  - $\gamma$ and $\beta$ are trainable
  - in Keras, batch normalization is applied before the activation...
- Generator: the tensor shapes have to be worked out carefully
```
def build_generator(z_dim):

    model = Sequential()

    # Reshape input into 7x7x256 tensor via a fully connected layer
    model.add(Dense(256 * 7 * 7, input_dim=z_dim))
    model.add(Reshape((7, 7, 256)))

    # Transposed convolution layer, from 7x7x256 into 14x14x128 tensor
    model.add(Conv2DTranspose(128, kernel_size=3, strides=2, padding='same'))

    # Batch normalization
    model.add(BatchNormalization())

    # Leaky ReLU activation
    model.add(LeakyReLU(alpha=0.01))

    # Transposed convolution layer, from 14x14x128 to 14x14x64 tensor
    model.add(Conv2DTranspose(64, kernel_size=3, strides=1, padding='same'))

    # Batch normalization
    model.add(BatchNormalization())

    # Leaky ReLU activation
    model.add(LeakyReLU(alpha=0.01))

    # Transposed convolution layer, from 14x14x64 to 28x28x1 tensor
    model.add(Conv2DTranspose(1, kernel_size=3, strides=2, padding='same'))

    # Output layer with tanh activation
    model.add(Activation('tanh'))
    # tanh is used here because the images come out clearer than with sigmoid (?)

    return model
```
- gan
```
def build_gan(generator, discriminator):

    model = Sequential()

    # Combined Generator -> Discriminator model
    model.add(generator)
    model.add(discriminator)

    return model
```
- ...
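The compile step in the next block calls `build_discriminator(img_shape)`, which these notes never show. Below is a minimal DCGAN-style discriminator sketch for MNIST; it assumes `img_shape = (28, 28, 1)` and `tensorflow.keras` imports, and the filter counts and strides are illustrative choices rather than the book's exact listing:

```
# Assumed imports (same Keras layer names as used elsewhere in these notes)
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten, LeakyReLU


def build_discriminator(img_shape):

    model = Sequential()

    # Convolutional layer, from 28x28x1 into 14x14x32 tensor
    model.add(Conv2D(32, kernel_size=3, strides=2,
                     input_shape=img_shape, padding='same'))
    model.add(LeakyReLU(alpha=0.01))

    # Convolutional layer, from 14x14x32 into 7x7x64 tensor
    model.add(Conv2D(64, kernel_size=3, strides=2, padding='same'))
    model.add(LeakyReLU(alpha=0.01))

    # Flatten and output a single real-vs-fake probability
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))

    return model
```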
```
# Build and compile the Discriminator
discriminator = build_discriminator(img_shape)
discriminator.compile(loss='binary_crossentropy',
                      optimizer=Adam(),
                      metrics=['accuracy'])

# Build the Generator
generator = build_generator(z_dim)

# Keep Discriminator's parameters constant for Generator training
discriminator.trainable = False

# Build and compile GAN model with fixed Discriminator to train the Generator
gan = build_gan(generator, discriminator)
gan.compile(loss='binary_crossentropy', optimizer=Adam())
```
- train
```
losses = []
accuracies = []
iteration_checkpoints = []


def train(iterations, batch_size, sample_interval):

    # Load the MNIST dataset
    (X_train, _), (_, _) = mnist.load_data()

    # Rescale [0, 255] grayscale pixel values to [-1, 1]
    X_train = X_train / 127.5 - 1.0
    X_train = np.expand_dims(X_train, axis=3)

    # Labels for real images: all ones
    real = np.ones((batch_size, 1))

    # Labels for fake images: all zeros
    fake = np.zeros((batch_size, 1))

    for iteration in range(iterations):

        # -------------------------
        #  Train the Discriminator
        # -------------------------

        # Get a random batch of real images
        idx = np.random.randint(0, X_train.shape[0], batch_size)
        imgs = X_train[idx]

        # Generate a batch of fake images
        z = np.random.normal(0, 1, (batch_size, 100))
        gen_imgs = generator.predict(z)

        # Train Discriminator
        d_loss_real = discriminator.train_on_batch(imgs, real)
        d_loss_fake = discriminator.train_on_batch(gen_imgs, fake)
        d_loss, accuracy = 0.5 * np.add(d_loss_real, d_loss_fake)

        # ---------------------
        #  Train the Generator
        # ---------------------

        # Generate a batch of fake images
        z = np.random.normal(0, 1, (batch_size, 100))
        gen_imgs = generator.predict(z)

        # Train Generator
        g_loss = gan.train_on_batch(z, real)

        if (iteration + 1) % sample_interval == 0:

            # Save losses and accuracies so they can be plotted after training
            losses.append((d_loss, g_loss))
            accuracies.append(100.0 * accuracy)
            iteration_checkpoints.append(iteration + 1)

            # Output training progress
            print("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" %
                  (iteration + 1, d_loss, 100.0 * accuracy, g_loss))

            # Output a sample of generated images
            sample_images(generator)
```
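The loop above reports progress through `sample_images(generator)`, which is not defined anywhere in these notes. A minimal sketch follows; the 4x4 grid, the matplotlib plotting, and the 100-dim noise (matching the shape used in `train()`) are assumptions:

```
import matplotlib.pyplot as plt
import numpy as np


def sample_images(generator, grid_rows=4, grid_cols=4):

    # Sample random noise vectors and generate a small grid of images
    z = np.random.normal(0, 1, (grid_rows * grid_cols, 100))
    gen_imgs = generator.predict(z)

    # Rescale from [-1, 1] (tanh output) back to [0, 1] for display
    gen_imgs = 0.5 * gen_imgs + 0.5

    fig, axs = plt.subplots(grid_rows, grid_cols, figsize=(4, 4))
    for i in range(grid_rows):
        for j in range(grid_cols):
            axs[i, j].imshow(gen_imgs[i * grid_cols + j, :, :, 0], cmap='gray')
            axs[i, j].axis('off')
    plt.show()


# Example run (hyperparameter values are illustrative, not the book's)
# train(iterations=20000, batch_size=128, sample_interval=1000)
```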
### Chapter 5
![](https://i.imgur.com/I4Q3Hyb.png)
(from iangoodfellow.com: https://www.iangoodfellow.com/slides/2016-12-04-NIPS.pdf)
- Maximum Likelihood
  - $\theta^* = \mathop{\arg\max}\limits_{\theta} \; E_{x \sim p_{data}} \log p_{model}(x \mid \theta)$
- VAE:
  - $\log p(x) \geq \log p(x) - D_{KL}(q(z)\|p(z|x)) = E_{z \sim q} \log p(x, z) + H(q)$
- Challenges in training GANs
  1. Mode collapse:
     - IS (Inception Score): flags interclass mode collapse
     - FID (Fréchet Inception Distance): flags intraclass mode collapse
  2. Slow convergence:
  3. Overgeneralization: e.g. a cow with several heads...
- ...
- min-max GAN
- KL
  - $D_{KL}(P\|Q) = -\sum\limits_{x\in R}P(x)\log\frac{1}{P(x)}+\sum\limits_{x\in R}P(x)\log\frac{1}{Q(x)} = \sum\limits_{x\in R}P(x)\log\frac{P(x)}{Q(x)}$
- JS
  - $JS(P_1\|P_2) = \frac{1}{2} KL(P_1\|\frac{P_1+P_2}{2}) + \frac{1}{2} KL(P_2\|\frac{P_1+P_2}{2})$
- In the textbook:
  - $J^D = -E[\log D(x)] - E[\log(1-D(G(z)))]$, which is intuitive
  - simplified form: $J^D = -D(x) + D(G(z))$
  - $J^G = -J^D$
- [KL, JS and Wasserstein](https://translate.google.com/translate?hl=zh-TW&sl=zh-CN&u=https://zxth93.github.io/2017/09/27/KL%25E6%2595%25A3%25E5%25BA%25A6JS%25E6%2595%25A3%25E5%25BA%25A6Wasserstein%25E8%25B7%259D%25E7%25A6%25BB/index.html&prev=search&pto=aue)
- In fact, if we look at the code we actually run, we are not using the min-max (JS) form:
  - $J^D = -E[\log D(x)] - E[\log(1-D(G(z)))]$, which is intuitive
  - $J^G = -E[\log D(G(z))]$
  - this is the NS-GAN (Non-Saturating GAN): in the min-max form the generator would train on $+E[\log(1-D(G(z)))]$, which saturates once the discriminator becomes confident, so the generator would learn very slowly; the non-saturating loss avoids this
- WGAN
  - As the Zhihu article above puts it, compared with the first form of the original GAN, WGAN changes only four things:
    - remove the sigmoid from the discriminator's last layer
    - the generator and discriminator losses no longer take a log
    - after every discriminator update, clip the absolute value of its parameters to a fixed constant $c$
    - do not use momentum-based optimizers (including momentum and Adam); RMSProp is recommended, SGD also works

### Chapter 6

### Chapter 7: SGAN (Semi-Supervised GAN)
- Discriminator: N+1 classes (N classes from real data, +1 for generated data)
- Generator: latent code $\Rightarrow$ fake data $x^*$, trying to make the discriminator classify it into one of the N real classes
- Discriminator: $(x, (x,y), x^*)$ $\Rightarrow$ class probabilities $\Rightarrow$ accuracy, while screening out $x^*$
- $loss_D$: unsupervised loss + supervised loss
- The goal of SGAN is not the generator but a discriminator that both classifies correctly and tells real from fake
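A minimal sketch of the two discriminator heads described above, wired to the losses listed in the comparison table at the top of these notes (`categorical_crossentropy` for $D_{sup}$, `binary_crossentropy` for $D_{unsup}$). The shared convolutional core, the layer sizes, and the sigmoid real-vs-fake head standing in for the "+1" class are illustrative assumptions, not the book's exact listing:

```
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten, Input, LeakyReLU
from tensorflow.keras.optimizers import Adam

num_classes = 10          # the N real classes (e.g. MNIST digits)
img_shape = (28, 28, 1)


def build_discriminator_core(img_shape):
    # Convolutional feature extractor shared by both heads
    model = Sequential()
    model.add(Conv2D(64, kernel_size=3, strides=2, input_shape=img_shape, padding='same'))
    model.add(LeakyReLU(alpha=0.01))
    model.add(Conv2D(128, kernel_size=3, strides=2, padding='same'))
    model.add(LeakyReLU(alpha=0.01))
    model.add(Flatten())
    return model


core = build_discriminator_core(img_shape)
img = Input(shape=img_shape)
features = core(img)

# Supervised head: classify labeled real images into the N classes
d_sup = Model(img, Dense(num_classes, activation='softmax')(features))
d_sup.compile(loss='categorical_crossentropy', optimizer=Adam())

# Unsupervised head: real vs. generated (plays the role of the "+1" class),
# trained on unlabeled real images and on the generator's fakes
d_unsup = Model(img, Dense(1, activation='sigmoid')(features))
d_unsup.compile(loss='binary_crossentropy', optimizer=Adam())
```

Because both heads share `core`, the supervised gradient from the labeled subset and the unsupervised real-vs-fake gradient update the same features, which is what lets SGAN learn a usable classifier from relatively few labels.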