# ML2021Spring-hw6 Notes

## Task Description

71,314 anime face images at 96x96. The goal is to use a GAN to learn to generate similar 64x64 anime faces.

![Screenshot_20240216_162800](https://hackmd.io/_uploads/ByC9n52oa.png)
![Screenshot_20240216_162914](https://hackmd.io/_uploads/BJrk6qnsp.png)

Note: there is no Kaggle scoring this time; the goal is simply to experiment with various techniques.

## Comparison of Test Results

### Sample Code (DCGAN)
10 epochs, n_critic=1, batch_size=64
![Epoch_009](https://hackmd.io/_uploads/Hy6WZo2oa.jpg)

### Modified DCGAN
Changed network structure
![Epoch_010](https://hackmd.io/_uploads/HyGrWo3oT.jpg)

20 epochs, batch_size=128
![Epoch_023](https://hackmd.io/_uploads/HyKvWs2ia.jpg)

### WGAN clipping
50 epochs, n_critic=5
![Epoch_050](https://hackmd.io/_uploads/Hyn2Ws3s6.jpg)

50 epochs, n_critic=1, batch_size=128
![Epoch_050](https://hackmd.io/_uploads/r1sCbo2iT.jpg)

50 epochs, n_critic=1, batch_size=64
![Epoch_050](https://hackmd.io/_uploads/HJjZMsniT.jpg)

50 epochs, n_critic=1, but with linear layer, batch_size=128
![Epoch_050](https://hackmd.io/_uploads/rJAVMo3j6.jpg)

### WGAN GP

### StyleGAN2
Trained and generated with the open-source PyTorch package [lucidrains/stylegan2-pytorch](https://github.com/lucidrains/stylegan2-pytorch).

29000 steps
![29](https://hackmd.io/_uploads/SkLmt9aia.jpg)

## Process and Code Walkthrough

### Understanding and Trying the Sample Code (DCGAN)

#### Generator:
```python=
class Generator(nn.Module):
    def __init__(self, in_dim, dim=64):
        super(Generator, self).__init__()
        def dconv_bn_relu(in_dim, out_dim):
            return nn.Sequential(
                nn.ConvTranspose2d(in_dim, out_dim, 5, 2,
                                   padding=2, output_padding=1, bias=False),
                nn.BatchNorm2d(out_dim),
                nn.ReLU()
            )
        self.l1 = nn.Sequential(
            nn.Linear(in_dim, dim * 8 * 4 * 4, bias=False),
            nn.BatchNorm1d(dim * 8 * 4 * 4),
            nn.ReLU()
        )
        self.l2_5 = nn.Sequential(
            dconv_bn_relu(dim * 8, dim * 4),   # shape: (bs, dim*4, 8, 8)
            dconv_bn_relu(dim * 4, dim * 2),   # (bs, dim*2, 16, 16)
            dconv_bn_relu(dim * 2, dim),       # (bs, dim, 32, 32)
            nn.ConvTranspose2d(dim, 3, 5, 2, padding=2, output_padding=1),  # (bs, 3, 64, 64)
            nn.Tanh()
        )
        self.apply(weights_init)

    def forward(self, x):
        y = self.l1(x)
        y = y.view(y.size(0), -1, 4, 4)  # shape: (bs, dim*8, 4, 4)
        y = self.l2_5(y)
        return y
```

A quick note on the meaning of the ConvTranspose2d parameters: [ConvTranspose2d reference](https://blog.51cto.com/u_11466419/5459142)

In ConvTranspose2d, setting padding to 0 is equivalent to a Conv2d with padding = (kernel_size - 1) (full padding), and setting padding to 1 corresponds to a Conv2d with padding = (kernel_size - 2).

![09021621_62c87475eeb5810007](https://hackmd.io/_uploads/rJCq3opja.gif)

In a GAN, the input is a tensor sampled from a normal distribution, and ConvTranspose2d lets the output image be larger than the input tensor. As shown above, the input tensor is 2x2 with 1 channel; after a ConvTranspose2d with padding=0, the output becomes 4x4 with 1 channel.

The stride also behaves differently from Conv2d: here it controls how many zeros are inserted between the values of the input tensor. The default stride=1 inserts no zeros, stride=2 inserts one zero, and so on.

![09021622_62c874766003076958](https://hackmd.io/_uploads/H1Dphspsp.gif)

The figure above shows a ConvTranspose2d with stride=2, padding=0.

The output spatial size follows (in - 1) × stride - 2 × padding + kernel_size + output_padding, so each layer here (kernel_size=5, stride=2, padding=2, output_padding=1) doubles the feature map, e.g. (4 - 1) × 2 - 4 + 5 + 1 = 8; the per-layer output shapes are noted in the code comments.

#### Discriminator:
```python=
class Discriminator(nn.Module):
    def __init__(self, in_dim, dim=64):
        super(Discriminator, self).__init__()
        def conv_bn_lrelu(in_dim, out_dim):
            return nn.Sequential(
                nn.Conv2d(in_dim, out_dim, 5, 2, 2),
                nn.BatchNorm2d(out_dim),
                nn.LeakyReLU(0.2),
            )
        self.ls = nn.Sequential(
            nn.Conv2d(in_dim, dim, 5, 2, 2),
            nn.LeakyReLU(0.2),
            conv_bn_lrelu(dim, dim * 2),
            conv_bn_lrelu(dim * 2, dim * 4),
            conv_bn_lrelu(dim * 4, dim * 8),
            nn.Conv2d(dim * 8, 1, 4),
            nn.Sigmoid(),
        )
        self.apply(weights_init)

    def forward(self, x):
        y = self.ls(x)
        y = y.view(-1)
        return y
```

A plain CNN architecture using LeakyReLU (ReLU simply truncates negative values, while LeakyReLU multiplies them by $\alpha$, here 0.2).

![Screenshot_20240217_142629](https://hackmd.io/_uploads/ryQsbAaj6.png)
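Both networks call `self.apply(weights_init)`, but `weights_init` itself is not shown above (the sample code defines its own version). A minimal sketch of what such a DCGAN-style initializer typically looks like, assuming the usual N(0, 0.02) convention from the PyTorch DCGAN tutorial; the actual sample-code function may differ:

```python=
import torch.nn as nn

def weights_init(m):
    # DCGAN-style initialization: conv weights ~ N(0, 0.02),
    # BatchNorm weights ~ N(1, 0.02) with zero bias.
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight, 1.0, 0.02)
        nn.init.constant_(m.bias, 0.0)
```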
#### Optimizer and Loss function:
```python=
# Loss
criterion = nn.BCELoss()

# Optimizer
opt_D = torch.optim.Adam(D.parameters(), lr=lr, betas=(0.5, 0.999))
opt_G = torch.optim.Adam(G.parameters(), lr=lr, betas=(0.5, 0.999))
```

#### Training:
```python=
steps = 0
for e, epoch in enumerate(range(n_epoch)):
    progress_bar = tqdm(dataloader)
    for i, data in enumerate(progress_bar):
        imgs = data
        imgs = imgs.cuda()
        bs = imgs.size(0)

        # Train D
        z = Variable(torch.randn(bs, z_dim)).cuda()
        r_imgs = Variable(imgs).cuda()
        f_imgs = G(z)

        # Label
        r_label = torch.ones((bs)).cuda()
        f_label = torch.zeros((bs)).cuda()

        # Model forwarding
        r_logit = D(r_imgs.detach())
        f_logit = D(f_imgs.detach())

        # Compute the loss for the discriminator.
        r_loss = criterion(r_logit, r_label)
        f_loss = criterion(f_logit, f_label)
        loss_D = (r_loss + f_loss) / 2

        # Model backwarding
        D.zero_grad()
        loss_D.backward()

        # Update the discriminator.
        opt_D.step()

        # Train G
        if steps % n_critic == 0:
            # Generate some fake images.
            z = Variable(torch.randn(bs, z_dim)).cuda()
            f_imgs = G(z)

            # Model forwarding
            f_logit = D(f_imgs)

            # Compute the loss for the generator.
            loss_G = criterion(f_logit, r_label)

            # Model backwarding
            G.zero_grad()
            loss_G.backward()

            # Update the generator.
            opt_G.step()

        steps += 1
```

Each step updates the Discriminator first and then the Generator. The `n_critic` value sets the ratio between Discriminator and Generator updates: the Generator is only updated once every `n_critic` steps.

#### Results
10 epochs, n_critic=1, batch_size=64
![Epoch_009](https://hackmd.io/_uploads/Hy6WZo2oa.jpg)

### Modifying the Generator Architecture

In the official PyTorch [DCGAN Tutorial](https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html), the network structure looks like this:

![Screenshot_20240217_150828](https://hackmd.io/_uploads/ryCdiC6oT.png)

Compared with the sample code, it drops the initial Linear layer and adds one more ConvTranspose2d layer. I tried the same change, removing the Linear layer and adding an extra ConvTranspose2d, with the following result (10 epochs):

![Epoch_010](https://hackmd.io/_uploads/HyGrWo3oT.jpg)

It looks a bit blurrier than the earlier sample-code result, so I decided to keep training. After 23 epochs:

![Epoch_023](https://hackmd.io/_uploads/HyKvWs2ia.jpg)

The faces that already looked decent at 10 epochs got better, but conversely, the ones that were already bad got even worse.

### WGAN_CP

> HW6 PPT: ![Screenshot_20240217_153728](https://hackmd.io/_uploads/SkmSGJAiT.png)

Following the instructions, the first step is to remove the final Sigmoid layer from the Discriminator.

#### Optimizer
```python=
opt_D = torch.optim.RMSprop(D.parameters(), lr=lr)
opt_G = torch.optim.RMSprop(G.parameters(), lr=lr)
```

#### Calculate Loss
```python=
loss_D = -torch.mean(D(r_imgs)) + torch.mean(D(f_imgs))
...
loss_G = -torch.mean(D(f_imgs))
```
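For reference, these `loss_D` / `loss_G` lines are the WGAN critic and generator objectives written as code; stated as equations (my own restatement, not from the homework slides):

$$
L_D = -\mathbb{E}_{x \sim p_{\text{data}}}\big[D(x)\big] + \mathbb{E}_{z \sim p_z}\big[D(G(z))\big],
\qquad
L_G = -\mathbb{E}_{z \sim p_z}\big[D(G(z))\big]
$$

Minimizing $L_D$ pushes the critic to score real images above fakes, and the weight clipping below keeps $D$ roughly 1-Lipschitz so that this score gap approximates the Wasserstein distance.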
#### Weight Clipping
```python=
for p in D.parameters():
    p.data.clamp_(-clip_value, clip_value)
```

Here clip_value is set to 0.01. The TAs suggest training the WGAN for 50 epochs with n_critic set to 5, i.e. updating the Discriminator 5 times before each single Generator update.

Results following those instructions:

50 epochs, n_critic=5
![Epoch_050](https://hackmd.io/_uploads/Hyn2Ws3s6.jpg)

Not very satisfying, so I tried lowering n_critic to 1 so the Generator gets updated more often:

50 epochs, n_critic=1, batch_size=128
![Epoch_050](https://hackmd.io/_uploads/r1sCbo2iT.jpg)

This looks quite a bit better.

Then I tried adjusting batch_size:

50 epochs, n_critic=1, batch_size=64
![Epoch_050](https://hackmd.io/_uploads/HJjZMsniT.jpg)

And adding the Linear layer from the sample code back in:

50 epochs, n_critic=1, but with linear layer, batch_size=128
![Epoch_050](https://hackmd.io/_uploads/rJAVMo3j6.jpg)

These last three WGAN runs are roughly on par; which setting is better is a matter of taste.

### StyleGAN2

Trained and generated with the open-source PyTorch package [lucidrains/stylegan2-pytorch](https://github.com/lucidrains/stylegan2-pytorch).

I first trained with the package's default settings:
* batch_size: 5
* gradient accumulation: every 6 steps
* output size: 128x128

The image below shows the result after 29000 steps (training with the defaults was too slow, so I stopped early):

![29](https://hackmd.io/_uploads/SkLmt9aia.jpg)

exponential moving average:
![29-ema](https://hackmd.io/_uploads/HyuJq1RsT.jpg)

mixing regularity:
![29-mr](https://hackmd.io/_uploads/By1H51Rsp.jpg)

Quoting the package author's explanation of exponential moving average and mixing regularity from a GitHub issue:

> EMA stands for exponential moving average, and it's simply a way to keep an approximate average of some variable, weighted by the more recent values. there's a trick to make this work, code-wise, by keeping and updating a single variable.
>
> mixing regularity is a technique introduced in stylegan to improve disentanglement of each resolution layer (recall that in stylegan, the latent vector z is translated to style vector w by a feedforward network, each w is then fed into each resolution layer). what you do is, 10% of the time, you actually pick two random z's resulting in two random w's. you then feed N random resolution layers one of the w's and the rest the other
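To make the EMA part concrete: the update described above amounts to keeping a slowly-moving shadow copy of the generator's weights. A minimal sketch of the idea (my own illustration, not the package's actual code; the decay value 0.99 is just an illustrative choice):

```python=
import copy
import torch

@torch.no_grad()
def update_ema(ema_model, model, decay=0.99):
    # ema_param <- decay * ema_param + (1 - decay) * param
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1 - decay)

# Usage sketch: ema_G = copy.deepcopy(G); call update_ema(ema_G, G) after
# every optimizer step, and sample from ema_G for smoother outputs --
# presumably what the "-ema" grids above were sampled from.
```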
I have to say, the level of detail StyleGAN produces is simply not in the same league as DCGAN or WGAN. This is after only 29000 steps (the defaults recommend 150000 steps for 128x128 images). There are still some artifacts around the edges, but the facial details are quite good.

A brief overview of the basic ideas behind StyleGAN2 follows; related articles are collected here: [hackmd](https://hackmd.io/@wilson920430/H1HGadhia)

#### Progressive GAN (PG-GAN)

![Screenshot_20240217_172148](https://hackmd.io/_uploads/Bks65eRoT.png)

Core idea: both the Generator and the Discriminator start training at low resolution; once they handle low resolution well, the networks are progressively grown to generate higher-resolution images.

#### StyleGAN

![v2-b54e4ac6af2ffb7e0b0b7697b64e937e_720w](https://hackmd.io/_uploads/rJqaslkhT.jpg)

StyleGAN keeps the PG-GAN idea but adds a Mapping Network, which transforms the input latent z into an intermediate latent w (a small code sketch of this mapping appears at the end of these notes).

Quoting ChatGPT's explanation:

> This mapping network is learned; its goal is to map the random noise vector from the latent space into a more meaningful space, called the style space.
>
> The main purpose of this mapping is to turn the noise vector into a more controllable representation, so the generator can better understand and manipulate it to produce realistic images. The mapping consists of several Linear layers and activation functions: the Linear layers apply linear transformations to the input vector, while the activation functions introduce non-linearity, letting the model learn a complex mapping.
>
> ![Screenshot_20240219_230849](https://hackmd.io/_uploads/HyThJx-2T.png)

Update: I eventually let the StyleGAN training run to completion. Results:

![generated-02-19-2024_19-40-30-0-ema](https://hackmd.io/_uploads/H1pXfJZna.jpg)
![generated-02-19-2024_19-35-24-0-mr](https://hackmd.io/_uploads/HkxUzJWnp.jpg)

<div style="position:relative; width:100%; height:0px; padding-bottom:100.000%"><iframe allow="fullscreen;autoplay" allowfullscreen height="100%" src="https://streamable.com/e/ocnhtk?autoplay=1&nocontrols=1" width="100%" style="border:none; width:100%; height:100%; position:absolute; left:0px; top:0px; overflow:hidden;"></iframe></div>
<br>
<iframe allow="fullscreen;autoplay" allowfullscreen height="392" src="https://streamable.com/e/j3trnh?autoplay=1&nocontrols=1" width="392" style="border:none;"></iframe>
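Closing footnote on the Mapping Network mentioned above: stripped of details, the z → w mapping is just a small MLP. A toy sketch of the idea (my own illustration, not the actual StyleGAN2 code; the real mapping network is 8 layers deep and normalizes z first, and the depth/width here are arbitrary):

```python=
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    # Maps a noise vector z to a style vector w via stacked Linear layers
    # with LeakyReLU non-linearities; w is then fed to every resolution
    # block of the generator.
    def __init__(self, latent_dim=512, depth=4):
        super().__init__()
        layers = []
        for _ in range(depth):
            layers += [nn.Linear(latent_dim, latent_dim), nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)

w = MappingNetwork()(torch.randn(4, 512))  # (4, 512) style vectors
```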