# DL - Generative Adversarial Networks (GAN)
## Summary
* [Overview](#overview)
* [Architecture](#architecture)
* [Loss Function](#loss-function)
* [Implementation](#implementation)
* [Application Of Generative Adversarial Networks (GANs)](#application-of-generative-adversarial-networks-gans)
* [References](#references)
## Overview
A generative adversarial network (GAN) is a deep learning architecture that involves training two neural networks to compete against each other in order to generate more realistic and authentic data based on a given training dataset. This can include generating new images from an existing image database or producing original music from a collection of songs. The term "adversarial" in GAN refers to the fact that it involves two separate networks that are pitted against each other.
The first network, known as the generator, takes random noise as input and aims to transform it into new data samples that resemble the training data. Meanwhile, the second network, called the discriminator, attempts to predict whether a given sample comes from the original dataset or is a fake produced by the generator. The discriminator essentially acts as a judge, determining the authenticity of the generated data.
During training, the generator keeps generating improved versions of fake data samples, while the discriminator continuously tries to improve its ability to differentiate between real and fake data. This iterative process pushes the generator to generate more realistic data, as it aims to fool the discriminator. The ultimate goal is to reach a point where the discriminator is unable to distinguish between the generated data and the original data.
## Architecture

A Generative Adversarial Network (GAN) is a deep learning architecture consisting of two neural networks: the Generator and the Discriminator. These networks are trained simultaneously using an adversarial training approach.
### Generator Model
A Generator in GANs is a neural network that creates fake data on which the discriminator is trained. It learns to generate plausible data. The generated examples/instances become negative training examples for the discriminator. It takes a fixed-length random vector carrying noise as input and generates a sample.

The main aim of the Generator is to make the discriminator classify its output as real. The part of the GAN that trains the Generator includes:
- noisy input vector
- generator network, which transforms the random input into a data instance
- discriminator network, which classifies the generated data
- generator loss, which penalizes the Generator for failing to fool the discriminator
Backpropagation is used to adjust each weight in the right direction by calculating the weight's impact on the output. The resulting gradients flow back through the discriminator into the generator and are used to update the generator's weights.

### Discriminator Model
The discriminator’s role is to distinguish between real and fake data. If you’re thinking about GANs in the context of images (which is a common application), the discriminator tries to tell apart real images from the fake images generated by the generator.
The Discriminator is a neural network that distinguishes real data from the fake data created by the Generator. The discriminator's training data comes from two different sources:
- The real data instances, such as real pictures of birds, humans, currency notes, etc., are used by the Discriminator as positive samples during training.
- The fake data instances created by the Generator are used as negative examples during the training process.

The discriminator connects to two loss functions: the discriminator loss and the generator loss. During discriminator training, however, it ignores the generator loss and uses only the discriminator loss.
In the process of training the discriminator, the discriminator classifies both real data and fake data from the generator. The discriminator loss penalizes the discriminator for misclassifying a real data instance as fake or a fake data instance as real.
The discriminator updates its weights through backpropagation from the discriminator loss through the discriminator network.

## Loss Function
Generative Adversarial Networks (GANs) utilize loss functions to train both the generator and the discriminator. The loss function helps adjust the weights of these models during training to optimize their performance. Both the generator and the discriminator are trained with the binary cross-entropy loss, which can be written as

$$L(y, p) = -\left[\, y \log(p) + (1 - y) \log(1 - p) \,\right]$$

where:
- *L*(*y*,*p*) is the loss value;
- *y* is the true label (either 0 or 1);
- *p* is the predicted probability of the sample belonging to class 1.
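As a quick illustration, this loss can be computed directly or with PyTorch's built-in nn.BCELoss (a minimal sketch; the probabilities and labels below are made-up values):
```
import torch
import torch.nn as nn

# Hypothetical predicted probabilities and true labels (illustrative values only)
p = torch.tensor([0.9, 0.2, 0.7])   # predicted probabilities of class 1
y = torch.tensor([1.0, 0.0, 1.0])   # true labels: 1 = real, 0 = fake

# Manual binary cross-entropy: L(y, p) = -[y*log(p) + (1 - y)*log(1 - p)], averaged
manual_bce = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()

# The same value via PyTorch's built-in loss
bce = nn.BCELoss()(p, y)

print(manual_bce.item(), bce.item())  # both ≈ 0.2284
```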
### Discriminator Loss
The discriminator’s goal is to correctly classify real samples as real and fake samples (produced by the generator) as fake. Its loss is typically represented as:

$$L_D = -\frac{1}{N}\sum_{i=1}^{N} \log D(x_i) \;-\; \frac{1}{M}\sum_{i=1}^{M} \log\bigl(1 - D(G(z_i))\bigr)$$

where *D*(·) is the discriminator’s predicted probability that its input is real, *G*(·) is the generator, *x_i* are samples from the real dataset, *N* is the number of samples from the real dataset, *z_i* are samples from the noise distribution, and *M* is the number of samples from the noise distribution.
The first term on the right-hand side penalizes the discriminator for misclassifying real data, while the second term penalizes it for misclassifying the fake data produced by the generator.
### Generator Loss
The generator’s goal is to produce samples that the discriminator incorrectly classifies as real. Its loss is typically represented as:

$$L_G = -\frac{1}{M}\sum_{i=1}^{M} \log D\bigl(G(z_i)\bigr)$$

This term penalizes the generator when the discriminator correctly identifies its outputs as fake.
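A minimal sketch of these two losses in PyTorch, using made-up discriminator outputs in place of *D*(*x_i*) and *D*(*G*(*z_i*)) (the values below are purely illustrative):
```
import torch

# Hypothetical discriminator outputs (probabilities), standing in for D(x_i) and D(G(z_i))
d_real = torch.tensor([0.90, 0.80, 0.95])        # scores on N = 3 real samples
d_fake = torch.tensor([0.10, 0.30, 0.05, 0.20])  # scores on M = 4 generated samples

# Discriminator loss: -1/N * sum(log D(x_i)) - 1/M * sum(log(1 - D(G(z_i))))
d_loss = -torch.log(d_real).mean() - torch.log(1 - d_fake).mean()

# Generator loss: -1/M * sum(log D(G(z_i)))
g_loss = -torch.log(d_fake).mean()

print(f"L_D = {d_loss.item():.4f}, L_G = {g_loss.item():.4f}")
```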
### Combined Loss
The combined GAN Loss, often referred to as the minimax loss, is a combination of the discriminator and generator losses. It can be expressed as:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\bigl[\log D(x)\bigr] + \mathbb{E}_{z \sim p_z(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]$$

This represents the adversarial nature of GAN training, where the generator and the discriminator are in a two-player minimax game. The discriminator tries to maximize its ability to classify real and fake data correctly, while the generator tries to minimize the discriminator’s ability by generating realistic data.
### Gradient Penalties
Gradient penalty is a technique used to stabilize training by penalizing the discriminator when its gradients become too steep. This can help avoid issues such as mode collapse. The penalty term can be written as:

$$GP = \lambda \, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\Bigl[\bigl(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - k\bigr)^2\Bigr]$$

where
- *GP* represents the gradient penalty term;
- *λ* is a hyperparameter that controls the strength of the penalty;
- the gradient term $\nabla_{\hat{x}} D(\hat{x})$ is the gradient of the discriminator’s output with respect to its input $\hat{x}$;
- $P_{\hat{x}}$ represents the distribution of interpolated samples between real and generated data;
- *k* is a target norm for the gradient, often set to 1.
The discriminator loss with gradient penalty can be incorporated as follows:

$$L_D^{GP} = L_D + GP$$

This loss function consists of the usual GAN discriminator loss components for real and generated data, plus the gradient penalty term to regularize the discriminator’s behavior. The key concept is that by penalizing large gradients, you encourage the discriminator to behave more smoothly, which can lead to a more stable training process for both the generator and the discriminator. This approach can be adjusted and adapted depending on the specific characteristics of the GAN architecture and the problem at hand.
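As an illustrative sketch of how such a gradient penalty can be computed in PyTorch (the tiny stand-in discriminator, the λ value of 10, and the random batches below are assumptions for the example, not the CIFAR-10 model from the Implementation section):
```
import torch
import torch.nn as nn

# A tiny stand-in discriminator, used only to make this sketch runnable
discriminator = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))

def gradient_penalty(discriminator, real, fake, lambda_gp=10.0, k=1.0):
    # Interpolate between real and generated samples: x_hat = eps*x + (1-eps)*G(z)
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)

    # Gradient of the discriminator's output with respect to x_hat
    d_out = discriminator(x_hat)
    grads = torch.autograd.grad(outputs=d_out, inputs=x_hat,
                                grad_outputs=torch.ones_like(d_out),
                                create_graph=True, retain_graph=True)[0]

    # Penalize deviations of the gradient norm from the target k
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - k) ** 2).mean()

# Random batches standing in for real and generated images
real = torch.randn(8, 3, 32, 32)
fake = torch.randn(8, 3, 32, 32)
gp = gradient_penalty(discriminator, real, fake)
# In practice this term is added to the usual discriminator loss: d_loss = L_D + gp
```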
## Implementation
We will walk through the following steps to understand how a GAN is implemented:
**Step 1: Importing the Required Libraries**
```
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import numpy as np
# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```
This implementation uses [**PyTorch**](https://www.geeksforgeeks.org/getting-started-with-pytorch/) to build a Generative Adversarial Network (GAN) and train it on the CIFAR-10 image dataset, alternating between discriminator and generator updates. Generated images are visualized every tenth epoch so the GAN’s progress can be tracked.
**Step 2: Defining a Transform**
The code below uses PyTorch’s transforms.Compose to define a simple image transform that converts images into tensors and normalizes them.
```
# Define a basic transform
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
```
**Step 3: Loading the Dataset**
The code below creates a [**CIFAR-10 dataset**](https://www.geeksforgeeks.org/cifar-10-image-classification-in-tensorflow/) for training: it specifies a root directory, enables train mode, downloads the data if needed, and applies the transform defined above. It then wraps the training set in a [**DataLoader**](https://www.geeksforgeeks.org/datasets-and-dataloaders-in-pytorch/) with a batch size of 32 and shuffling enabled.
```
train_dataset = datasets.CIFAR10(root='./data', train=True,
                                 download=True, transform=transform)
dataloader = torch.utils.data.DataLoader(train_dataset,
                                         batch_size=32, shuffle=True)
```
**Step 4: Defining parameters to be used in later processes**
The GAN is trained with the following hyperparameters:
- latent_dim is the dimensionality of the latent space.
- lr is the optimizer’s learning rate.
- beta1 and beta2 are the coefficients for the [**Adam optimizer**](https://www.geeksforgeeks.org/intuition-of-adam-optimizer/).
- num_epochs is the total number of training epochs.
```
# Hyperparameters
latent_dim = 100
lr = 0.0002
beta1 = 0.5
beta2 = 0.999
num_epochs = 10
```
**Step 5: Defining a Utility Class to Build the Generator**
The generator architecture for the GAN is defined in PyTorch with the code below.
- The Generator class inherits from [**nn.Module**](https://www.geeksforgeeks.org/create-model-using-custom-module-in-pytorch/). It consists of a sequential model with linear, reshaping (Unflatten), upsampling, convolutional, batch normalization, ReLU, and Tanh layers.
- The network synthesizes an image (img) from a latent vector (z), which is the generator’s output.
The architecture uses a series of learned transformations to turn the initial random noise in the latent space into a meaningful image.
```
# Define the generator
class Generator(nn.Module):
    def __init__(self, latent_dim):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(latent_dim, 128 * 8 * 8),
            nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128, momentum=0.78),
            nn.ReLU(),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64, momentum=0.78),
            nn.ReLU(),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),
            nn.Tanh()
        )

    def forward(self, z):
        img = self.model(z)
        return img
```
**Step 6: Defining a Utility Class to Build the Discriminator**
The PyTorch code below defines the discriminator architecture for the GAN. The Discriminator class inherits from nn.Module and is composed of convolutional, [**LeakyReLU**](https://www.geeksforgeeks.org/tensorflow-js-tf-layers-leakyrelu-function/), [**dropout**](https://www.geeksforgeeks.org/dropout-in-neural-networks/), batch normalization, and linear layers arranged in a sequential model.
The discriminator takes an image (img) as input and outputs its validity, i.e., the probability that the input image is real as opposed to artificially generated.
```
# Define the discriminator
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.25),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ZeroPad2d((0, 1, 0, 1)),
            nn.BatchNorm2d(64, momentum=0.82),
            nn.LeakyReLU(0.25),
            nn.Dropout(0.25),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(128, momentum=0.82),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.25),
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(256, momentum=0.8),
            nn.LeakyReLU(0.25),
            nn.Dropout(0.25),
            nn.Flatten(),
            nn.Linear(256 * 5 * 5, 1),
            nn.Sigmoid()
        )

    def forward(self, img):
        validity = self.model(img)
        return validity
```
**Step 7: Building the Generative Adversarial Network**
The code snippet defines and initializes a discriminator (Discriminator) and a generator (Generator).
- Both models are moved to the designated device (GPU if available). [**Binary Cross Entropy Loss**](https://www.geeksforgeeks.org/tensorflow-js-tf-metrics-binarycrossentropy-function/), which is frequently used for GANs, is selected as the loss function (adversarial_loss).
- For the generator (optimizer_G) and discriminator (optimizer_D), distinct Adam optimizers with predetermined learning rates and betas are also defined.
```
# Define the generator and discriminator
# Initialize generator and discriminator
generator = Generator(latent_dim).to(device)
discriminator = Discriminator().to(device)
# Loss function
adversarial_loss = nn.BCELoss()
# Optimizers
optimizer_G = optim.Adam(generator.parameters(),
                         lr=lr, betas=(beta1, beta2))
optimizer_D = optim.Adam(discriminator.parameters(),
                         lr=lr, betas=(beta1, beta2))
```
**Step 8: Training the Generative Adversarial Network**
For a Generative Adversarial Network (GAN), the code implements the training loop.
- The training data batches are iterated through during each epoch. Whereas the generator (optimizer_G) is trained to generate realistic images that trick the discriminator, the discriminator (optimizer_D) is trained to distinguish between real and phony images.
- The generator and discriminator’s adversarial losses are computed. Model parameters are updated by means of Adam optimizers and the losses are backpropagated.
- Discriminator and generator losses are printed to track progress. For a visual assessment of the training process, generated images are also displayed every 10 epochs.
```
# Training loop
for epoch in range(num_epochs):
    for i, batch in enumerate(dataloader):
        # Move the batch of real images to the device
        real_images = batch[0].to(device)

        # Adversarial ground truths
        valid = torch.ones(real_images.size(0), 1, device=device)
        fake = torch.zeros(real_images.size(0), 1, device=device)

        # ---------------------
        #  Train Discriminator
        # ---------------------
        optimizer_D.zero_grad()

        # Sample noise as generator input
        z = torch.randn(real_images.size(0), latent_dim, device=device)

        # Generate a batch of images
        fake_images = generator(z)

        # Measure discriminator's ability to classify real and fake images
        real_loss = adversarial_loss(discriminator(real_images), valid)
        fake_loss = adversarial_loss(discriminator(fake_images.detach()), fake)
        d_loss = (real_loss + fake_loss) / 2

        # Backward pass and optimize
        d_loss.backward()
        optimizer_D.step()

        # -----------------
        #  Train Generator
        # -----------------
        optimizer_G.zero_grad()

        # Generate a batch of images
        gen_images = generator(z)

        # Adversarial loss
        g_loss = adversarial_loss(discriminator(gen_images), valid)

        # Backward pass and optimize
        g_loss.backward()
        optimizer_G.step()

        # ---------------------
        #  Progress Monitoring
        # ---------------------
        if (i + 1) % 100 == 0:
            print(
                f"Epoch [{epoch+1}/{num_epochs}] "
                f"Batch {i+1}/{len(dataloader)} "
                f"Discriminator Loss: {d_loss.item():.4f} "
                f"Generator Loss: {g_loss.item():.4f}"
            )

    # Display generated images every 10 epochs
    if (epoch + 1) % 10 == 0:
        with torch.no_grad():
            z = torch.randn(16, latent_dim, device=device)
            generated = generator(z).detach().cpu()
            grid = torchvision.utils.make_grid(generated, nrow=4, normalize=True)
            plt.imshow(np.transpose(grid.numpy(), (1, 2, 0)))
            plt.axis("off")
            plt.show()
```
Output:
```
Epoch [10/10] Batch 1300/1563 Discriminator Loss: 0.4473 Generator Loss: 0.9555
Epoch [10/10] Batch 1400/1563 Discriminator Loss: 0.6643 Generator Loss: 1.0215
Epoch [10/10] Batch 1500/1563 Discriminator Loss: 0.4720 Generator Loss: 2.5027
```
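Once training finishes, the generator can be used on its own to synthesize new images. Below is a short, optional sketch that reuses the generator, latent_dim, and device defined above; the file name generator.pth is just an example.
```
# Sample new images from the trained generator
generator.eval()
with torch.no_grad():
    z = torch.randn(16, latent_dim, device=device)
    samples = generator(z).cpu()

grid = torchvision.utils.make_grid(samples, nrow=4, normalize=True)
plt.imshow(np.transpose(grid.numpy(), (1, 2, 0)))
plt.axis("off")
plt.show()

# Optionally save the trained weights for later reuse
torch.save(generator.state_dict(), "generator.pth")
```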
## Application Of Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs) have applications across many different fields. Here are some of their widely recognized uses:
- **Image Synthesis and Generation:** GANs are often used for image synthesis and generation tasks. By learning the distribution underlying the training dataset, they can create new, lifelike images that resemble the training data. These generative networks have facilitated the development of realistic avatars, high-resolution photographs, and original artwork.
- **Image-to-Image Translation:** GANs may be used for image-to-image translation problems, where the objective is to convert an input image from one domain to another while preserving its key features. For instance, GANs can turn daytime photos into nighttime scenes, transform sketches into realistic images, or change the artistic style of an image.
- **Text-to-Image Synthesis:** GANs have been used to create images from text descriptions. Given a text input such as a phrase or a caption, a GAN can produce images that correspond to the description, enabling the generation of realistic visual material from text-based instructions.
- **Data Augmentation:** GANs can augment existing datasets and increase the robustness and generalizability of machine-learning models by creating synthetic data samples.
- **Super-Resolution:** GANs can enhance the resolution and quality of low-resolution images. By training on pairs of low-resolution and high-resolution images, GANs can generate high-resolution images from low-resolution inputs, enabling improved image quality in applications such as medical imaging, satellite imaging, and video enhancement.
## References
1. [Generative Adversarial Networks, Ian J. Goodfellow](https://arxiv.org/abs/1406.2661)
2. [Wasserstein GAN, Martin Arjovsky](https://arxiv.org/pdf/1701.07875.pdf)
3. [Deep Learning Lectures, Generative Adversarial Networks](https://www.youtube.com/watch?v=wFsI2WqUfdA&list=PLqYmG7hTraZCDxZ44o4p3N5Anz3lLRVZF&index=10&t=0s)