###### tags: `COMP3340 Applied DL`
# Lecture 6: GAN
## Introduction
In this section, we introduce Generative Adversarial Networks (GANs). GANs are an approach to generative modeling, which uses a model to generate new examples with deep learning methods such as convolutional neural networks.
<div align="center">
<img src="https://github.com/geyuying/GAN_lecture/blob/main/definition.jpg?raw=true" style="zoom:30%">
</div>
Generative Adversarial Networks (GANs) usually consist of two models that compete with each other to capture the data distribution. The Generator in a GAN learns to create fake data by incorporating feedback from the Discriminator. The Discriminator in a GAN is a classifier that distinguishes real data from the fake data created by the Generator.
GANs adopt a clever way of training a generative model by framing the problem as a supervised learning problem with two sub-models.
<div align="center">
<img src="https://github.com/geyuying/GAN_lecture/blob/main/GAN.png?raw=true" style="zoom:60%">
</div>
The Generator model takes a fixed-length random vector as input and generates a sample in the domain. The Discriminator takes an image as input (real or generated) and predicts a binary class label: real (from the image dataset) ‘1’, or fake (generated by the Generator) ‘0’. The Generator needs to learn how to create data in such a way that the Discriminator can no longer distinguish it as fake.
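As a minimal illustration of these two sub-models (a toy sketch, not the architecture from any particular paper; the latent size `z_dim` and the layer widths are arbitrary choices), a generator and discriminator for flattened 28×28 images could look like:
```python
import torch
import torch.nn as nn

z_dim = 100  # length of the fixed-length random input vector (arbitrary choice)

# Generator: random vector -> flattened 28x28 image
netG = nn.Sequential(
    nn.Linear(z_dim, 256), nn.ReLU(),
    nn.Linear(256, 28 * 28), nn.Tanh(),   # outputs scaled to [-1, 1]
)

# Discriminator: flattened image -> probability that it is real
netD = nn.Sequential(
    nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),      # '1' = real, '0' = fake
)

z = torch.randn(16, z_dim)   # a batch of random vectors
fake = netG(z)               # generated samples
p_real = netD(fake)          # discriminator's belief that each sample is real
```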
## GAN Basis
### Overview of GAN Structure
A generative adversarial network (GAN) has two parts:
- The generator learns to fool the discriminator by generating real-looking images.
- The discriminator learns to distinguish the generator's fake data from real data.
When training begins, the generator produces obviously fake data, and the discriminator quickly learns to tell that it's fake:
<div align="center">
<img src="https://github.com/ChongjianGE/COMP3340_Applied_DL_Note/blob/main/init_stage.png?raw=true"/ style="zoom:50%">
</div>
As training progresses, the generator gets closer to producing output that can fool the discriminator:
<div align="center">
<img src="https://github.com/ChongjianGE/COMP3340_Applied_DL_Note/blob/main/middle.png?raw=true"/ style="zoom:50%">
</div>
Finally, if generator training goes well, the discriminator gets worse at telling the difference between real and fake. It starts to classify fake data as real, and its accuracy decreases.
<div align="center">
<img src="https://github.com/ChongjianGE/COMP3340_Applied_DL_Note/blob/main/final.png?raw=true"/ style="zoom:50%">
</div>
The whole classic GAN system works as shown in the following picture. Both the generator and the discriminator are neural networks. The generator output is connected directly to the discriminator input. Through backpropagation, the discriminator's classification provides a signal that the generator uses to update its weights.
<div align="center">
<img src="https://github.com/ChongjianGE/COMP3340_Applied_DL_Note/blob/main/gan_structure_2.png?raw=true"/ style="zoom:80%">
</div>
### Generator
The generator part of a GAN learns to create fake data by incorporating feedback from the discriminator. It learns to make the discriminator classify its output as real.
Generator training requires tighter integration between the generator and the discriminator than discriminator training requires. The portion of the GAN that trains the generator includes:
- Random or conditional input vectors
- Generator network, which transforms the random or conditional input into a data instance
- Discriminator network, which classifies the generated data
- Discriminator output
- Generator loss, which penalizes the generator for failing to fool the discriminator
<div align="center">
<img src="https://github.com/ChongjianGE/COMP3340_Applied_DL_Note/blob/main/g_loss_bp2.png?raw=true"/ style="zoom:80%">
</div>
During the training process, the generator feeds into the discriminator net, and the discriminator produces the output we're trying to affect. The generator loss penalizes the generator for producing a sample that the discriminator network classifies as fake. So backpropagation starts at the output and flows back through the discriminator into the generator.
We train the generator with the following procedure:
- Sample random noise or conditional input vectors
- Produce generator output from sampled random noise.
- Get discriminator "Real" or "Fake" classification for generator output.
- Calculate loss from discriminator classification.
- Backpropagate through both the discriminator and generator to obtain gradients.
- Use gradients to **change only the generator weights**.
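Continuing the toy `netG`/`netD` sketch from the introduction (hypothetical names, not the lecture's reference implementation), one generator step could be sketched as:
```python
criterion = nn.BCELoss()
optimizer_G = torch.optim.Adam(netG.parameters(), lr=2e-4)

z = torch.randn(16, z_dim)                        # 1. sample random noise
fake = netG(z)                                    # 2. produce generator output
pred = netD(fake)                                 # 3. discriminator's real/fake prediction
loss_G = criterion(pred, torch.ones_like(pred))   # 4. loss: we WANT D to output "real" (1)
optimizer_G.zero_grad()
loss_G.backward()                                 # 5. backprop through D into G
optimizer_G.step()                                # 6. update only the generator weights
```
Only `netG`'s parameters are registered in `optimizer_G`, so even though gradients flow through the discriminator, the step changes the generator weights alone.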
### Discriminator
The discriminator in a GAN is simply a classifier. It tries to distinguish real data from the data created by the generator. It could use any network architecture appropriate to the type of data it's classifying. The discriminator's training data comes from two sources:
- **Real data instances**, such as real pictures of people. The discriminator uses these instances as positive examples during training. The real data instances are shown as the upper "sample" box in the following figure.
- **Fake data instances** created by the generator. The discriminator uses these instances as negative examples during training. The fake data instances are shown as the lower "sample" box in the following figure.
<div align="center">
<img src="https://github.com/ChongjianGE/COMP3340_Applied_DL_Note/blob/main/d_loss_bp2.png?raw=true"/ style="zoom:80%">
</div>
The discriminator connects to two loss functions. During discriminator training, the discriminator ignores the generator loss and just uses the discriminator loss. The generator loss is used during generator training, as described in the previous section.
During discriminator training:
- The discriminator classifies both real data and fake data from the generator.
- The discriminator loss penalizes the discriminator for misclassifying a real instance as fake or a fake instance as real.
- The discriminator updates its weights through backpropagation from the discriminator loss through the discriminator network.
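The corresponding discriminator step in the same toy setting (note the `detach()`, which stops gradients from flowing back into the generator):
```python
optimizer_D = torch.optim.Adam(netD.parameters(), lr=2e-4)

real = torch.rand(16, 28 * 28) * 2 - 1            # stand-in for a batch of real images
fake = netG(torch.randn(16, z_dim)).detach()      # generated batch, detached from G

loss_real = criterion(netD(real), torch.ones(16, 1))    # real instances -> label 1
loss_fake = criterion(netD(fake), torch.zeros(16, 1))   # fake instances -> label 0
loss_D = (loss_real + loss_fake) * 0.5
optimizer_D.zero_grad()
loss_D.backward()                                 # gradients only for the discriminator
optimizer_D.step()
```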
## GAN Training
### Formulation
The above dynamic of the GAN can be formulated as a minimax game as below,
$$\min_G\max_D V(D,G)$$
where the discriminator is trying to maximize its reward $V$, while the generator is trying to minimize the discriminator's reward $V$. The reward $V$ is formulated as below,
$$V(D,G)=E_{x\sim p_{data}} \log D(x) + E_{z\sim p(z)}\log(1-D(G(z)))$$
where $D(x)$ is the output from the discriminator for real data $x$, $G(z)$ is the output from the generator for noise samples $z\sim p(z)$, and $D(G(z))$ is the output from the discriminator for the generated fake data $G(z)$.
The generator and the discriminator are trained jointly using the minimax objective function as below,
$$\min_{\theta_g}\max_{\theta_d}[E_{x\sim p_{data}} \log D_{\theta_d}(x) + E_{z\sim p(z)}\log(1-D_{\theta_d}(G_{\theta_g}(z)))]$$
The discriminator $D_{\theta_d}$ wants to maximize the objective function, such that the output for real data, denoted $D_{\theta_d}(x)$, is close to 1 and the output for fake data, denoted $D_{\theta_d}(G_{\theta_g}(z))$, is close to 0. The generator $G_{\theta_g}$ wants to minimize the objective function, such that the discriminator's output for its generated data, $D_{\theta_d}(G_{\theta_g}(z))$, is close to 1. In this way, the discriminator is fooled into thinking that the data produced by the generator is real.
The above minimax objective function is optimized with an alternating training strategy:
- Gradient ascent on the discriminator as below:
$$\max_{\theta_d}[E_{x\sim p_{data}} \log D_{\theta_d}(x) + E_{z\sim p(z)}\log(1-D_{\theta_d}(G_{\theta_g}(z)))]$$
- Gradient descent on the generator as below:
$$\min_{\theta_g}E_{z\sim p(z)}\log(1-D_{\theta_d}(G_{\theta_g}(z)))$$
In practice, $\log(1-D_{\theta_d}(G_{\theta_g}(z)))$ yields very small gradients when the discriminator confidently rejects the generated samples, which is exactly the situation early in training. Instead of performing gradient descent on the generator, we therefore perform gradient ascent on the non-saturating objective:
$$\max_{\theta_g}E_{z\sim p(z)}\log(D_{\theta_d}(G_{\theta_g}(z)))$$
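In code, both generator objectives reduce to a binary cross-entropy with different signs and targets; a small sketch with a stand-in for $D_{\theta_d}(G_{\theta_g}(z))$:
```python
import torch
import torch.nn.functional as F

p_fake = torch.sigmoid(torch.randn(16, 1))   # stand-in for D(G(z)), values in (0, 1)

# Saturating objective: minimize E[log(1 - D(G(z)))]
loss_saturating = torch.log(1 - p_fake).mean()

# Non-saturating objective: maximize E[log D(G(z))],
# i.e. minimize -log D(G(z)), which is BCE against a target of all ones
loss_non_saturating = F.binary_cross_entropy(p_fake, torch.ones_like(p_fake))
```
Both push $D(G(z))$ toward 1, but the non-saturating form provides much larger gradients when $D(G(z))$ is close to 0, which is exactly the regime at the start of training.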
### Alternating Training
The generator and the discriminator have different training processes. An alternating training strategy is utilized to train the generator and the discriminator separately.
- The generator trains for one or more iterations. We keep the discriminator constant during the generator training phase.
- The discriminator trains for one or more iterations. We keep the generator constant during the discriminator training phase.
- Repeat the above two steps to continue to train the generator and discriminator networks.
The **alternating training sample code** is as follows:
```python
import torch

class GAN():
def __init__(self):
        pass  # network and optimizer construction omitted
def forward(self):
"""Run forward pass; called by both functions <optimize_parameters> and <test>."""
self.fake_B = self.netG(self.real_A) # G(A)
def backward_D(self):
"""Calculate GAN loss for the discriminator"""
        # Fake; stop backprop to the generator by detaching fake_B
        fake_AB = torch.cat((self.real_A, self.fake_B), 1)  # conditional GAN: feed both input and output to the discriminator
        pred_fake = self.netD(fake_AB.detach())
        self.loss_D_fake = self.criterionGAN(pred_fake, False)  # fake samples should be classified as fake
# Real
real_AB = torch.cat((self.real_A, self.real_B), 1)
pred_real = self.netD(real_AB)
        self.loss_D_real = self.criterionGAN(pred_real, True)  # real samples should be classified as real
# combine loss and calculate gradients
self.loss_D = (self.loss_D_fake + self.loss_D_real) * 0.5
self.loss_D.backward()
def backward_G(self):
"""Calculate GAN and L1 loss for the generator"""
# First, G(A) should fake the discriminator
fake_AB = torch.cat((self.real_A, self.fake_B), 1)
pred_fake = self.netD(fake_AB)
        self.loss_G_GAN = self.criterionGAN(pred_fake, True)  # G wants D to classify its output as real
# Second, G(A) = B
self.loss_G_L1 = self.criterionL1(self.fake_B, self.real_B) * self.opt.lambda_L1
# combine loss and calculate gradients
self.loss_G = self.loss_G_GAN + self.loss_G_L1
self.loss_G.backward()
def optimize_parameters(self):
self.forward() # compute fake images: G(A)
# update D
        self.set_requires_grad(self.netD, True)   # enable backprop for D
self.optimizer_D.zero_grad() # set D's gradients to zero
self.backward_D() # calculate gradients for D
self.optimizer_D.step() # update D's weights
# update G
        self.set_requires_grad(self.netD, False)  # D requires no gradients when optimizing G
        self.optimizer_G.zero_grad()  # set G's gradients to zero
        self.backward_G()             # calculate gradients for G
        self.optimizer_G.step()       # update G's weights
if __name__ == '__main__':
model = GAN()
model.optimize_parameters()
```
### Common Problems
GANs have a number of common failure modes, all of which are areas of active research. While none of these problems have been completely solved, the following content mentions some things that researchers have tried.
- Vanishing Gradients
When the discriminator is perfect, the loss saturates and we end up with almost no gradient to update the generator during learning iterations. The following figure demonstrates an experiment: as the discriminator gets better, the gradient vanishes quickly (a numerical sketch of this effect appears at the end of this section).
<div align="center">
<img src="https://github.com/ChongjianGE/COMP3340_Applied_DL_Note/blob/main/vanishing_grad.png?raw=true"/ style="zoom:40%">
</div>
As a result, training a GAN faces a dilemma:
1. If the discriminator behaves badly, the generator does not receive accurate feedback and the loss function cannot represent reality.
2. If the discriminator does a great job, the gradient of the loss function drops close to zero and learning becomes very slow or even stalls.
This dilemma can make GAN training very tough.
- Mode Collapse
During training, the generator may collapse to a setting where it always produces the same outputs. This is a common failure case for GANs, commonly referred to as mode collapse. Even though the generator might be able to trick the corresponding discriminator, it fails to learn to represent the complex real-world data distribution and gets stuck in a small space with extremely low variety.
<div align="center">
<img src="https://github.com/ChongjianGE/COMP3340_Applied_DL_Note/blob/main/mode_collapse.png?raw=true"/ style="zoom:80%">
</div>
- Failure to Converge
As the generator improves with training, the discriminator's performance gets worse because it can no longer easily tell the difference between real and fake. If the generator succeeds perfectly, the discriminator has 50% accuracy. For a GAN, convergence is therefore often a fleeting, rather than stable, state, and such an unstable convergence state is very hard to maintain.
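To make the vanishing-gradient problem concrete, the following numerical sketch (not from the original lecture) compares the gradient of the two generator objectives with respect to the discriminator's logit when the discriminator confidently rejects a fake sample:
```python
import torch

# Discriminator logit for a fake sample it confidently rejects: D(G(z)) ≈ 0.0025
logit = torch.tensor([-6.0], requires_grad=True)

p = torch.sigmoid(logit)
grad_sat = torch.autograd.grad(torch.log(1 - p), logit)[0]   # ≈ -0.0025: almost no signal

p = torch.sigmoid(logit)  # rebuild the graph for the second loss
grad_ns = torch.autograd.grad(-torch.log(p), logit)[0]       # ≈ -0.9975: strong signal

print(grad_sat.item(), grad_ns.item())
```
With the saturating loss $\log(1-D(G(z)))$, the gradient with respect to the logit equals $-D(G(z))$ and vanishes as the discriminator gets better; the non-saturating loss keeps a usable gradient.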
## Classic Methods
### Progressive GAN
#### Overview
This method trains GANs starting from low-resolution images and then progressively increases the resolution by adding layers to the network, as shown below. It involves training by starting with a very small image and then incrementally adding blocks of layers so that the output size of the generator model increases. It also increases the input size of the discriminator model until the desired image size is obtained. This incremental nature allows the training to first discover the large-scale structure of the image distribution and then shift attention to increasingly finer-scale detail, instead of having to learn all scales simultaneously.
<div align="center">
<img src="https://miro.medium.com/max/1050/1*yxd2UrZAuyWMphTaB5iaKQ.png"/ style="zoom:80%">
</div>
#### Growing the Generator
- Taking the growth from 16×16 to 32×32 pixels as an example: for the generator, this involves adding a new block of convolutional layers that outputs a 32×32 image.
- The output of the 16×16 layer is upsampled to 32×32 using nearest-neighbor interpolation and combined with the output of the new 32×32 block, weighted by $1-\alpha$ and $\alpha$ respectively.
- $\alpha$ is small initially and is slowly increased over training iterations until all weight is given to the new 32×32 output layers.
<div align="center">
<img src="https://miro.medium.com/max/1050/1*-lY_AywUNxaWVmdo0qQ5sA.png"/ style="zoom:80%">
</div>
#### Growing the Discriminator
- Taking the growth from 16×16 to 32×32 pixels as an example: for the discriminator, this involves adding a new block of convolutional layers at the input of the model to support images of 32×32 pixels.
- The input image is downsampled to 16×16 using average pooling. The output of the new 32×32 block of layers is also downsampled using average pooling.
- The two downsampled versions of the input are combined in a weighted manner, starting with full weight on the downsampled raw input and linearly transitioning to full weight on the output of the new input-layer block.
#### Training
- Progressive GAN starts out generating very low resolution images. When training stabilizes, a new layer is added and the resolution is doubled. This continues until the output reaches the desired resolution as shown in the below figure.
- To do this, they first artificially shrank their training images to a very small starting resolution (only 4×4 pixels). They created a generator with just a few layers to synthesize images at this low resolution, and a corresponding discriminator of mirrored architecture. Because these networks were so small, they trained relatively quickly and learned only the large-scale structures visible in the heavily blurred images.
- When the first layers completed training, they added another layer to the generator and the discriminator, doubling the output resolution to 8×8. The trained weights in the earlier layers were kept, but not locked, and the new layer was faded in gradually to stabilize the transition (the fade-in described above). Training resumed until the GAN was once again synthesizing convincing images, this time at the new 8×8 resolution.
- In this way, they continued to add layers, double the resolution and train until the desired output size was reached. By progressively growing the networks in this fashion, high-level structure is learned first, and training is stabilized.
<div align="center">
<img src="https://miro.medium.com/max/1050/1*tUhgr3m54Qc80GU2BkaOiQ.gif"/ style="zoom:70%">
</div>
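The growth schedule itself is not shown in the sample code below; a minimal sketch of how it could be driven (iteration counts and batch size are arbitrary choices; `generator` follows the `expand`/`alpha` interface of the sample code):
```python
# Illustrative schedule: one phase per resolution, fading the new block in
# by ramping alpha from 0 to 1 during the first half of each phase.
phase_iters = 10_000              # iterations per resolution phase (arbitrary)
for expand in range(6):           # 6 blocks: 4x4 -> 8x8 -> ... -> 128x128
    for it in range(phase_iters):
        alpha = min(1.0, it / (phase_iters / 2))  # linear fade-in, then fixed at 1
        # z = torch.randn(batch, 512, 1, 1)
        # fake = generator(z, expand=expand, alpha=alpha)
        # ...alternating D/G updates at this resolution...
```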
**Sample code:**
#### Generator
```python
import torch
from torch import nn
from torch.nn import functional as F

# ConvBlock and PixelNorm are helper modules from the referenced
# progressive-gan-pytorch implementation (reference 9).
class Generator(nn.Module):
def __init__(self, code_dim=512):
super().__init__()
self.code_norm = PixelNorm()
self.progression = nn.ModuleList([ConvBlock(512, 512, 4, 3, 3, 1),
ConvBlock(512, 512, 3, 1, 3, 1),
ConvBlock(512, 512, 3, 1, 3, 1),
ConvBlock(512, 512, 3, 1, 3, 1),
ConvBlock(512, 256, 3, 1, 3, 1),
ConvBlock(256, 128, 3, 1, 3, 1)])
self.to_rgb = nn.ModuleList([nn.Conv2d(512, 3, 1),
nn.Conv2d(512, 3, 1),
nn.Conv2d(512, 3, 1),
nn.Conv2d(512, 3, 1),
nn.Conv2d(256, 3, 1),
nn.Conv2d(128, 3, 1),])
    def forward(self, input, expand=0, alpha=-1):
        # `expand` selects how many blocks have been grown (i.e., the current resolution);
        # `alpha` in [0, 1) fades the newest block in, and alpha=-1 disables fading.
        out = self.code_norm(input)
        for i, (conv, to_rgb) in enumerate(zip(self.progression, self.to_rgb)):
            if i > 0 and expand > 0:
                upsample = F.interpolate(out, scale_factor=2)
                out = conv(upsample)
            else:
                out = conv(out)
            if i == expand:
                out = to_rgb(out)
                if i > 0 and 0 <= alpha < 1:
                    # fade-in: blend the upsampled RGB output of the previous block
                    # with the RGB output of the newly added block
                    skip_rgb = self.to_rgb[i - 1](upsample)
                    out = (1 - alpha) * skip_rgb + alpha * out
                break
        return out
```
#### Discriminator
```python
class Discriminator(nn.Module):
def __init__(self):
super().__init__()
self.progression = nn.ModuleList([ConvBlock(128, 256, 3, 1, 3, 1, pixel_norm=False),
ConvBlock(256, 512, 3, 1, 3, 1, pixel_norm=False),
ConvBlock(512, 512, 3, 1, 3, 1, pixel_norm=False),
ConvBlock(512, 512, 3, 1, 3, 1, pixel_norm=False),
ConvBlock(512, 512, 3, 1, 3, 1, pixel_norm=False),
ConvBlock(513, 512, 3, 1, 4, 0, pixel_norm=False),])
self.from_rgb = nn.ModuleList([nn.Conv2d(3, 128, 1),
nn.Conv2d(3, 256, 1),
nn.Conv2d(3, 512, 1),
nn.Conv2d(3, 512, 1),
nn.Conv2d(3, 512, 1),
nn.Conv2d(3, 512, 1),])
self.n_layer = len(self.progression)
self.linear = nn.Linear(512, 1)
    def forward(self, input, expand=0, alpha=-1):
        for i in range(expand, -1, -1):
            index = self.n_layer - i - 1
            if i == expand:
                out = self.from_rgb[index](input)
            if i == 0:
                # minibatch standard deviation: append the batch's mean std
                # as an extra feature map (hence the 513 input channels above)
                mean_std = input.std(0).mean()
                mean_std = mean_std.expand(input.size(0), 1, 4, 4)
                out = torch.cat([out, mean_std], 1)
            out = self.progression[index](out)
            if i > 0:
                out = F.avg_pool2d(out, 2)
                if i == expand and 0 <= alpha < 1:
                    # fade-in: blend the downsampled raw input (through the previous
                    # from_rgb layer) with the output of the newly added block
                    skip_rgb = F.avg_pool2d(input, 2)
                    skip_rgb = self.from_rgb[index + 1](skip_rgb)
                    out = (1 - alpha) * skip_rgb + alpha * out
        out = out.squeeze(2).squeeze(2)
        out = self.linear(out)
        return out
```
### Pix2Pix GAN
#### Image-to-Image translation
Image-to-image translation is the problem of changing a given image in a specific or controlled way. Examples include translating a photograph of a landscape from summer to winter or translating a photograph to a segmented image, or even image inpainting. It is a challenging problem that typically requires the development of a specialized model and hand-crafted loss function for the type of translation task being performed.
<div align="center">
<img src="https://github.com/ChongjianGE/COMP3340_Applied_DL_Note/blob/main/image2image.png?raw=true"/ style="zoom:50%">
</div>
#### Pix2Pix GAN for Image-to-Image translation
The Pix2Pix GAN is a general approach for image-to-image translation. It is based on the conditional generative adversarial network, where a target image is generated, conditional on a given input image. In this case, the Pix2Pix GAN changes the loss function so that the generated image is both plausible in the content of the target domain, and is a plausible translation of the input image.
In Pix2Pix GAN, the generator model is provided with a given image as input and generates a translated version of the image. The discriminator model is given an input image and a real or generated paired image and must determine whether the paired image is real or fake. Finally, the generator model is trained to both fool the discriminator model and to minimize the loss between the generated image and the expected target image.
As such, the Pix2Pix GAN must be trained on image datasets that are comprised of input images (before translation) and output or target images (after translation).
#### U-Net Generator Model in Pix2Pix GAN
The U-Net generator model takes an image as input; unlike a traditional GAN generator, it does not take a point from the latent space as input.
- **Input:** Image from source domain
- **Output:** Image in target domain
The U-Net model architecture is similar to the encoder-decoder architecture, which involves taking an image as input and downsampling it over a few layers until a bottleneck layer, where the representation is then upsampled again over a few layers before outputting the final image at the desired size. In addition, skip connections link encoder and decoder layers of the same size: for example, the first layer of the encoder produces feature maps of the same size as the last layer of the decoder, and the two are merged. This is repeated for each layer in the encoder and the corresponding layer of the decoder, forming a U-shaped model.
<div align="center">
<img src="https://github.com/ChongjianGE/COMP3340_Applied_DL_Note/blob/main/unet2.png?raw=true" style="zoom:60%">
</div>
Here, we present a simple **U-Net block sample code**. The whole U-Net consists of *n* nested U-Net blocks, each wrapping the next via its `submodule` argument.
```python
import functools

import torch
import torch.nn as nn

class UnetSkipConnectionBlock(nn.Module):
"""Defines the Unet submodule with skip connection.
X -------------------identity----------------------
|-- downsampling -- |submodule| -- upsampling --|
"""
def __init__(self, outer_nc, inner_nc, input_nc=None,
submodule=None, outermost=False, innermost=False, norm_layer=nn.BatchNorm2d, use_dropout=False):
"""Construct a Unet submodule with skip connections.
Parameters:
outer_nc (int) -- the number of filters in the outer conv layer
inner_nc (int) -- the number of filters in the inner conv layer
input_nc (int) -- the number of channels in input images/features
submodule (UnetSkipConnectionBlock) -- previously defined submodules
outermost (bool) -- if this module is the outermost module
innermost (bool) -- if this module is the innermost module
norm_layer -- normalization layer
use_dropout (bool) -- if use dropout layers.
"""
super(UnetSkipConnectionBlock, self).__init__()
self.outermost = outermost
if type(norm_layer) == functools.partial:
use_bias = norm_layer.func == nn.InstanceNorm2d
else:
use_bias = norm_layer == nn.InstanceNorm2d
if input_nc is None:
input_nc = outer_nc
downconv = nn.Conv2d(input_nc, inner_nc, kernel_size=4,
stride=2, padding=1, bias=use_bias)
downrelu = nn.LeakyReLU(0.2, True)
downnorm = norm_layer(inner_nc)
uprelu = nn.ReLU(True)
upnorm = norm_layer(outer_nc)
if outermost:
upconv = nn.ConvTranspose2d(inner_nc * 2, outer_nc,
kernel_size=4, stride=2,
padding=1)
down = [downconv]
up = [uprelu, upconv, nn.Tanh()]
model = down + [submodule] + up
elif innermost:
upconv = nn.ConvTranspose2d(inner_nc, outer_nc,
kernel_size=4, stride=2,
padding=1, bias=use_bias)
down = [downrelu, downconv]
up = [uprelu, upconv, upnorm]
model = down + up
else:
upconv = nn.ConvTranspose2d(inner_nc * 2, outer_nc,
kernel_size=4, stride=2,
padding=1, bias=use_bias)
down = [downrelu, downconv, downnorm]
up = [uprelu, upconv, upnorm]
if use_dropout:
model = down + [submodule] + up + [nn.Dropout(0.5)]
else:
model = down + [submodule] + up
self.model = nn.Sequential(*model)
def forward(self, x):
if self.outermost:
return self.model(x)
else: # add skip connections
return torch.cat([x, self.model(x)], 1)
```
<div align="center">
<img src="https://github.com/ChongjianGE/COMP3340_Applied_DL_Note/blob/main/u-net_block.png?raw=true" style="zoom:60%">
</div>
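The nesting itself can be written as a thin wrapper; the sketch below follows the recursive construction used in the pix2pix reference implementation (with `num_downs=8` for 256×256 images):
```python
class UnetGenerator(nn.Module):
    """Assemble the full U-Net by nesting UnetSkipConnectionBlocks,
    from the innermost (bottleneck) block outwards."""
    def __init__(self, input_nc, output_nc, num_downs, ngf=64,
                 norm_layer=nn.BatchNorm2d, use_dropout=False):
        super(UnetGenerator, self).__init__()
        # innermost block (the bottleneck)
        unet_block = UnetSkipConnectionBlock(ngf * 8, ngf * 8, submodule=None,
                                             norm_layer=norm_layer, innermost=True)
        # intermediate blocks, all at width ngf * 8
        for _ in range(num_downs - 5):
            unet_block = UnetSkipConnectionBlock(ngf * 8, ngf * 8, submodule=unet_block,
                                                 norm_layer=norm_layer, use_dropout=use_dropout)
        # gradually reduce the number of filters towards the outside
        unet_block = UnetSkipConnectionBlock(ngf * 4, ngf * 8, submodule=unet_block, norm_layer=norm_layer)
        unet_block = UnetSkipConnectionBlock(ngf * 2, ngf * 4, submodule=unet_block, norm_layer=norm_layer)
        unet_block = UnetSkipConnectionBlock(ngf, ngf * 2, submodule=unet_block, norm_layer=norm_layer)
        # outermost block maps back to the output image channels
        self.model = UnetSkipConnectionBlock(output_nc, ngf, input_nc=input_nc,
                                             submodule=unet_block, outermost=True,
                                             norm_layer=norm_layer)

    def forward(self, input):
        return self.model(input)
```
For example, `UnetGenerator(3, 3, num_downs=8)` builds a generator that maps a 3-channel 256×256 image to a 3-channel 256×256 translation.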
#### PatchGAN Discriminator for Image-to-Image translation
The discriminator model takes an image from the source domain and an image from the target domain and predicts the likelihood of whether the image from the target domain is a real or generated translation of the source image. The discriminators used in a traditional GAN and in PatchGAN are quite different. In a traditional GAN, the discriminator usually outputs a single probability judging whether the generated image is real or fake. The discriminator adopted in PatchGAN instead outputs a probability matrix, where each element judges the authenticity of one patch of the image.
- **Input:** Image from source domain, and Image from the target domain.
- **Output:** Probability that the image from the target domain is a real translation of the source image.
<div align="center">
<img src="https://github.com/ChongjianGE/COMP3340_Applied_DL_Note/blob/main/patchGAN_D.png?raw=true" style="zoom:60%">
</div>
Here is the **sample code for the classic PatchGAN discriminator**.
```python
import functools

import torch.nn as nn

class NLayerDiscriminator(nn.Module):
"""Defines a PatchGAN discriminator"""
def __init__(self, input_nc, ndf=64, n_layers=3, norm_layer=nn.BatchNorm2d):
"""Construct a PatchGAN discriminator
Parameters:
input_nc (int) -- the number of channels in input images
ndf (int) -- the number of filters in the last conv layer
n_layers (int) -- the number of conv layers in the discriminator
norm_layer -- normalization layer
"""
super(NLayerDiscriminator, self).__init__()
if type(norm_layer) == functools.partial: # no need to use bias as BatchNorm2d has affine parameters
use_bias = norm_layer.func == nn.InstanceNorm2d
else:
use_bias = norm_layer == nn.InstanceNorm2d
kw = 4
padw = 1
sequence = [nn.Conv2d(input_nc, ndf, kernel_size=kw, stride=2, padding=padw), nn.LeakyReLU(0.2, True)]
nf_mult = 1
nf_mult_prev = 1
for n in range(1, n_layers): # gradually increase the number of filters
nf_mult_prev = nf_mult
nf_mult = min(2 ** n, 8)
            sequence += [
                nn.Conv2d(ndf * nf_mult_prev, ndf * nf_mult, kernel_size=kw, stride=2, padding=padw, bias=use_bias),
                norm_layer(ndf * nf_mult),
                nn.LeakyReLU(0.2, True)
            ]
nf_mult_prev = nf_mult
nf_mult = min(2 ** n_layers, 8)
        sequence += [
            nn.Conv2d(ndf * nf_mult_prev, ndf * nf_mult, kernel_size=kw, stride=1, padding=padw, bias=use_bias),
            norm_layer(ndf * nf_mult),
            nn.LeakyReLU(0.2, True)
        ]
        sequence += [nn.Conv2d(ndf * nf_mult, 1, kernel_size=kw, stride=1, padding=padw)]  # output 1-channel prediction map
self.model = nn.Sequential(*sequence)
def forward(self, input):
"""Standard forward."""
return self.model(input)
```
If the input image has **3** channels (and the model is constructed with `norm_layer=nn.InstanceNorm2d`), the overall PatchGAN discriminator structure defined by the above code is as follows:
```
[
    Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)),
    LeakyReLU(negative_slope=0.2, inplace=True),
    Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)),
    InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False),
    LeakyReLU(negative_slope=0.2, inplace=True),
    Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)),
    InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False),
    LeakyReLU(negative_slope=0.2, inplace=True),
    Conv2d(256, 512, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1)),
    InstanceNorm2d(512, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False),
    LeakyReLU(negative_slope=0.2, inplace=True),
    Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), padding=(1, 1))
]
```
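A quick shape check (assuming the blanks are filled in as in the sample code above): for a 256×256 input, the discriminator emits a 30×30 map, and each entry scores one receptive-field patch of the input.
```python
import torch
import torch.nn as nn

netD = NLayerDiscriminator(input_nc=3, norm_layer=nn.InstanceNorm2d)  # class defined above
x = torch.randn(1, 3, 256, 256)   # one dummy 256x256 RGB image
print(netD(x).shape)              # torch.Size([1, 1, 30, 30]) prediction map
```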
#### Loss Function for Classic Pix2Pix GAN
The discriminator model is trained in a standalone manner in the same way as a traditional GAN model, minimizing the negative log likelihood of identifying real and fake images, although conditioned on a source image.
$$\mathcal{L}_D=\max_{\theta_d}[E_{x\sim p_{data}} \log D_{\theta_d}(x) + E_{z\sim p(z)}\log(1-D_{\theta_d}(G_{\theta_g}(z)))]$$
The generator model is trained using both the adversarial loss from the discriminator model and the L1 (mean absolute pixel difference) loss between the generated translation of the source image and the expected target image $y$, weighted by a hyperparameter $\lambda_{L1}$ (cf. `lambda_L1` in the sample code above).
$$\mathcal{L}_G=\min_{\theta_g}[E_{z\sim p(z)}\log(1-D_{\theta_d}(G_{\theta_g}(z)))] + \lambda_{L1}\mathcal{L}_{1}(G_{\theta_g}(z), y)$$
### CycleGAN
#### Problem With Image-to-Image Translation
Traditionally, training an image-to-image translation model requires a dataset comprised of paired examples. That is, the input image $X$ and the same image with the desired modification $Y$ are needed. However, the requirement for a paired training dataset is a limitation. For example, if we want to translate horses to zebras, we don't have paired images at hand. Such datasets are challenging and expensive to prepare, e.g. photos of different scenes under different conditions. More examples are shown in the following.
<div align="center">
<img src="https://github.com/ChongjianGE/COMP3340_Applied_DL_Note/blob/main/unpaired_data.png?raw=true" style="zoom:60%">
</div>
#### CycleGAN for Unpaired Image-to-Image Translation
CycleGAN can be trained without paired examples. That is, it does not require examples of photographs before and after the translation in order to train the model. Instead, the model uses a collection of photographs from each domain and extracts and harnesses the underlying style of images in the collection in order to perform the translation.
#### CycleGAN Architecture
Consider the problem where we are interested in translating images from horses to zebras and zebras to horses. The CycleGAN architecture consists of two GANs. Each GAN has a discriminator and a generator model, meaning there are four models in total in the architecture. The two GANs are shown in the following figure.
- Generator Model 1:
- Input: Takes photos of zebras (collection 1).
- Output: Generates photos of horses (collection 2).
- Generator Model 2:
- Input: Takes photos of horses (collection 2).
    - Output: Generates photos of zebras (collection 1).
- Discriminator Model 1:
- Input: Takes photos of horses from collection 2 and output from Generator Model 1 (generated horses).
    - Output: Likelihood that the image is from collection 2.
- Discriminator Model 2:
- Input: Takes photos of zebras from collection 1 and output from Generator Model 2 (generated zebras).
    - Output: Likelihood that the image is from collection 1.
<div align="center">
<img src="https://github.com/ChongjianGE/COMP3340_Applied_DL_Note/blob/main/two_gans2.png?raw=true" style="zoom:60%">
</div>
Thus, the four models form two cycles in the training process, as shown in the following figure. The two cycle processes involve two losses, i.e., the forward cycle-consistency loss and the backward cycle-consistency loss. Here are the detailed components of the two losses.
- Forward Cycle-Consistency Loss:
- Input photo of zebras (collection 1) to GAN_1
- Output photo of horses from GAN_1
- Input photo of horses from GAN_1 to GAN_2
- Output photo of zebras from GAN_2
- Compare photo of zebras (collection 1) to photo of zebras from GAN_2
- Backward Cycle Consistency Loss:
- Input photo of horses (collection 2) to GAN_2
- Output photo of zebras from GAN_2
- Input photo of zebras from GAN_2 to GAN_1
- Output photo of horses from GAN_1
- Compare photo of horses (collection 2) to photo of horses from GAN_1
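Written out as equations (with $G_1$: zebra→horse and $G_2$: horse→zebra as above), the two cycle-consistency terms compare each image with its reconstruction after a round trip through both generators:
$$\mathcal{L}_{cyc}(G_1,G_2)=E_{x\sim p_{zebra}}\|G_2(G_1(x))-x\|_1 + E_{y\sim p_{horse}}\|G_1(G_2(y))-y\|_1$$
This corresponds to `criterionCycle` (an L1 loss) weighted by $\lambda_A$ and $\lambda_B$ in the sample code below.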
<div align="center">
<img src="https://github.com/ChongjianGE/COMP3340_Applied_DL_Note/blob/main/two_cycles.png?raw=true" style="zoom:60%">
</div>
**Sample code:**
```python
import torch

class CycleGAN():
def __init__(self):
        pass  # network and optimizer construction omitted
def forward(self):
"""Run forward pass; called by both functions <optimize_parameters> and <test>."""
self.fake_B = self.netG_A(self.real_A) # G_A(A)
self.rec_A = self.netG_B(self.fake_B) # G_B(G_A(A))
self.fake_A = self.netG_B(self.real_B) # G_B(B)
self.rec_B = self.netG_A(self.fake_A) # G_A(G_B(B))
def backward_D_basic(self, netD, real, fake):
"""Calculate GAN loss for the discriminator
Parameters:
netD (network) -- the discriminator D
real (tensor array) -- real images
fake (tensor array) -- images generated by a generator
Return the discriminator loss.
We also call loss_D.backward() to calculate the gradients.
"""
# Real
pred_real = netD(real)
loss_D_real = self.criterionGAN(pred_real, True)
# Fake
pred_fake = netD(fake.detach())
loss_D_fake = self.criterionGAN(pred_fake, False)
# Combined loss and calculate gradients
loss_D = (loss_D_real + loss_D_fake) * 0.5
loss_D.backward()
return loss_D
def backward_D_A(self):
"""Calculate GAN loss for discriminator D_A"""
fake_B = self.fake_B_pool.query(self.fake_B)
self.loss_D_A = self.backward_D_basic(self.netD_A, self.real_B, fake_B)
def backward_D_B(self):
"""Calculate GAN loss for discriminator D_B"""
fake_A = self.fake_A_pool.query(self.fake_A)
self.loss_D_B = self.backward_D_basic(self.netD_B, self.real_A, fake_A)
def backward_G(self):
"""Calculate the loss for generators G_A and G_B"""
lambda_idt = self.opt.lambda_identity
lambda_A = self.opt.lambda_A
lambda_B = self.opt.lambda_B
# Identity loss
if lambda_idt > 0:
# G_A should be identity if real_B is fed: ||G_A(B) - B||
self.idt_A = self.netG_A(self.real_B)
self.loss_idt_A = self.criterionIdt(self.idt_A, self.real_B) * lambda_B * lambda_idt
# G_B should be identity if real_A is fed: ||G_B(A) - A||
self.idt_B = self.netG_B(self.real_A)
self.loss_idt_B = self.criterionIdt(self.idt_B, self.real_A) * lambda_A * lambda_idt
else:
self.loss_idt_A = 0
self.loss_idt_B = 0
        # GAN loss D_A(G_A(A))
        self.loss_G_A = self.criterionGAN(self.netD_A(self.fake_B), True)
        # GAN loss D_B(G_B(B))
        self.loss_G_B = self.criterionGAN(self.netD_B(self.fake_A), True)
        # Forward cycle loss || G_B(G_A(A)) - A||
        self.loss_cycle_A = self.criterionCycle(self.rec_A, self.real_A) * lambda_A
        # Backward cycle loss || G_A(G_B(B)) - B||
        self.loss_cycle_B = self.criterionCycle(self.rec_B, self.real_B) * lambda_B
# combined loss and calculate gradients
self.loss_G = self.loss_G_A + self.loss_G_B + self.loss_cycle_A + self.loss_cycle_B + self.loss_idt_A + self.loss_idt_B
self.loss_G.backward()
def optimize_parameters(self):
"""Calculate losses, gradients, and update network weights; called in every training iteration"""
# forward
self.forward() # compute fake images and reconstruction images.
# G_A and G_B
self.set_requires_grad([self.netD_A, self.netD_B], False) # Ds require no gradients when optimizing Gs
self.optimizer_G.zero_grad() # set G_A and G_B's gradients to zero
self.backward_G() # calculate gradients for G_A and G_B
self.optimizer_G.step() # update G_A and G_B's weights
# D_A and D_B
self.set_requires_grad([self.netD_A, self.netD_B], True)
self.optimizer_D.zero_grad() # set D_A and D_B's gradients to zero
self.backward_D_A() # calculate gradients for D_A
        self.backward_D_B()      # calculate gradients for D_B
self.optimizer_D.step() # update D_A and D_B's weights
if __name__ == '__main__':
model = CycleGAN()
model.optimize_parameters()
```
## Quiz
### Quiz 1 - Writing and correcting the code of G-D compete process
Descriptions: The following G-D training code contains several errors. Find and correct them, then fill in the blanks in `optimize_parameters`.
```python=
class GAN():
def __init__(self):
xxxx
def forward(self):
"""Run forward pass; called by both functions <optimize_parameters> and <test>."""
self.fake_B = self.netG(self.real_A) # G(A)
def backward_D(self):
"""Calculate GAN loss for the discriminator"""
# Fake; stop backprop to the generator by detaching fake_B
fake_AB = torch.cat((self.real_A, self.fake_B),1) # we use conditional GANs; we need to feed both input and output to the discriminator
pred_fake = self.netD(fake_AB) # inference the fake sample
self.loss_D_fake = self.criterionGAN(pred_fake, True)
# Real
real_AB = torch.cat((self.real_A, self.real_B), 1)
pred_real = self.netD(real_AB) # inference the real sample
self.loss_D_real = self.criterionGAN(pred_real, False)
# combine loss and calculate gradients
self.loss_D = (self.loss_D_fake + self.loss_D_real) * 0.5
self.loss_D.backward()
def backward_G(self):
"""Calculate GAN and L1 loss for the generator"""
# First, G(A) should fake the discriminator
fake_AB = torch.cat((self.real_A, self.fake_B), 1)
pred_fake = self.netD(fake_AB)
self.loss_G_GAN = self.criterionGAN(pred_fake, False)
# Second, G(A) = B
self.loss_G_L1 = self.criterionL1(self.fake_B, self.real_B) * self.opt.lambda_L1
# combine loss and calculate gradients
self.loss_G = self.loss_G_GAN + self.loss_G_L1
self.loss_G.backward()
def optimize_parameters(self):
self.forward() # compute fake images: G(A)
# update D
self.set_requires_grad(self.netD, False) # enable backprop for D
"""_____Blank(1)_________""" # set D's gradients to zero
"""_____Blank(2)_________""" # calculate gradients for D
"""_____Blank(3)_________""" # update D's weights
# update G
self.set_requires_grad(self.netD, True) # D requires no gradients when optimizing G
"""_____Blank(4)_________""" # set G's gradients to zero
"""_____Blank(5)_________""" # calculate graidents for G
"""_____Blank(6)_________""" # udpate G's weights
if __name__ == '__main__':
model = GAN()
model.optimize_parameters()
```
### Quiz 2 - Writing the code of PatchGAN Discriminator
Descriptions: Since you have completed the code of the simple discriminator in Quiz 1, please extend it to a PatchGAN discriminator implementation.
```python=
class NLayerDiscriminator(nn.Module):
"""Defines a PatchGAN discriminator"""
def __init__(self, input_nc, ndf=64, n_layers=3, norm_layer=nn.BatchNorm2d):
"""Construct a PatchGAN discriminator
Parameters:
input_nc (int) -- the number of channels in input images
ndf (int) -- the number of filters in the last conv layer
n_layers (int) -- the number of conv layers in the discriminator
norm_layer -- normalization layer
"""
super(NLayerDiscriminator, self).__init__()
if type(norm_layer) == functools.partial: # no need to use bias as BatchNorm2d has affine parameters
use_bias = norm_layer.func == nn.InstanceNorm2d
else:
use_bias = norm_layer == nn.InstanceNorm2d
kw = 4
padw = 1
sequence = [nn.Conv2d(input_nc, ndf, kernel_size=kw, stride=2, padding=padw), nn.LeakyReLU(0.2, True)]
nf_mult = 1
nf_mult_prev = 1
for n in range(1, n_layers): # gradually increase the number of filters
nf_mult_prev = nf_mult
nf_mult = min(2 ** n, 8)
# input channel = ndf * nf_mult_prev
# output channel = ndf * nf_mult
# stride = 2
sequence += [
nn.Conv2d("""_____Blank(1)_________"""),
norm_layer("""_____Blank(2)_________"""),
nn.LeakyReLU(0.2, True)
]
nf_mult_prev = nf_mult
nf_mult = min(2 ** n_layers, 8)
# input channel = ndf * nf_mult_prev
# output channel = ndf * nf_mult
# stride = 1
sequence += """_____Blank(3)_________"""
sequence += [nn.Conv2d(ndf * nf_mult, """_____Blank(4)_________""", kernel_size=kw, stride=1, padding=padw)] # output ?? channel prediction map
self.model = nn.Sequential(*sequence)
def forward(self, input):
"""Standard forward."""
return self.model(input)
```
### Quiz 3 - Writing the code of Cycle Loss
Descriptions: Write the implementation of the 4 detailed losses in CycleGAN.
```python=
def backward_G(self):
"""Calculate the loss for generators G_A and G_B"""
lambda_idt = self.opt.lambda_identity
lambda_A = self.opt.lambda_A
lambda_B = self.opt.lambda_B
# GAN loss D_A(G_A(A))
"""_____Blank(1)_________"""
# GAN loss D_B(G_B(B))
"""_____Blank(2)_________"""
# Forward cycle loss || G_B(G_A(A)) - A||
"""_____Blank(3)_________"""
# Backward cycle loss || G_A(G_B(B)) - B||
"""_____Blank(4)_________"""
# combined loss and calculate gradients
self.loss_G = self.loss_G_A + self.loss_G_B + self.loss_cycle_A + self.loss_cycle_B
self.loss_G.backward()
```
## Answer
### Answer 1 - Writing and correcting the code of G-D compete process
<font color='red'>Correcting code:</font>
- In `backward_D` (fake branch): pred_fake = self.netD(fake_AB.detach())
- In `backward_D` (fake branch): self.loss_D_fake = self.criterionGAN(pred_fake, False)
- In `backward_D` (real branch): self.loss_D_real = self.criterionGAN(pred_real, True)
- In `backward_G`: self.loss_G_GAN = self.criterionGAN(pred_fake, True)
- In `optimize_parameters` (update D): self.set_requires_grad(self.netD, True)
- In `optimize_parameters` (update G): self.set_requires_grad(self.netD, False)
<font color='red'>Writing code:</font>
- Blank(1): self.optimizer_D.zero_grad()
- Blank(2): self.backward_D()
- Blank(3): self.optimizer_D.step()
- Blank(4): self.optimizer_G.zero_grad()
- Blank(5): self.backward_G()
- Blank(6): self.optimizer_G.step()
### Answer 2 - Writing the code of PatchGAN Discriminator
- Blank(1): nn.Conv2d(ndf * nf_mult_prev, ndf * nf_mult, kernel_size=kw, stride=2, padding=padw, bias=use_bias)
- Blank(2): norm_layer(ndf * nf_mult)
- Blank(3): [nn.Conv2d(ndf * nf_mult_prev, ndf * nf_mult, kernel_size=kw, stride=1, padding=padw, bias=use_bias), norm_layer(ndf * nf_mult), nn.LeakyReLU(0.2, True)]
- Blank(4): 1 (the discriminator outputs a one-channel prediction map)
### Answer 3 - Writing the code of Cycle Loss
- Blank(1): self.loss_G_A = self.criterionGAN(self.netD_A(self.fake_B), True)
- Blank(2): self.loss_G_B = self.criterionGAN(self.netD_B(self.fake_A), True)
- Blank(3): self.loss_cycle_A = self.criterionCycle(self.rec_A, self.real_A) * lambda_A
- Blank(4): self.loss_cycle_B = self.criterionCycle(self.rec_B, self.real_B) * lambda_B
## Reference
1. https://github.com/open-mmlab/mmgeneration
2. https://arxiv.org/abs/1710.10196
3. https://arxiv.org/abs/1703.10593
4. https://developers.google.com/machine-learning/gan/
5. https://machinelearningmastery.com/
6. https://machinelearningmastery.com/what-are-generative-adversarial-networks-gans/
7. http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf
8. https://towardsdatascience.com/progressively-growing-gans-9cb795caebee
9. https://github.com/rosinality/progressive-gan-pytorch
10. https://towardsdatascience.com/progan-how-nvidia-generated-images-of-unprecedented-quality-51c98ec2cbd2