The document provides an overview of Generative Adversarial Networks (GANs), detailing their structure, training techniques, and various applications such as image generation and style transfer. It discusses the roles of the generator and discriminator, different types of GANs like Vanilla GAN, Progressive GAN, and Conditional GAN, as well as challenges faced in training them. Additionally, it highlights advancements in style transfer using architectures like CycleGAN and pix2pix GAN.
Gen AI Unit 3
Unit 3: Generative Adversarial Networks
Generative Adversarial Networks – Vanilla GAN – Progressive GAN – Style transfer and image transformation – Image Generation with GANs – Style Transfer with GANs

Generative Adversarial Networks
• Generative Adversarial Networks (GANs) were developed in 2014 by Ian Goodfellow and his teammates.
• A GAN is an approach to generative modeling that generates a new set of data, based on training data, that looks like the training data.
• GANs have two main blocks (two neural networks) which compete with each other and are thereby able to capture, copy, and analyze the variations in a dataset.
To understand the term GAN, let's break it into three parts:
• Generative – to learn a generative model, which describes how data is generated in terms of a probabilistic model.
• Adversarial – the training of the model is done in an adversarial setting.
• Networks – deep neural networks are used for training.

Generator
• The generator network takes random input (typically noise) and generates samples, such as images, text, or audio, that resemble the training data it was trained on.
• The goal of the generator is to produce samples that are indistinguishable from real data.

Discriminator
• The discriminator network, on the other hand, tries to distinguish between real and generated samples.
• It is trained with real samples from the training data and generated samples from the generator.
• The discriminator's objective is to correctly classify real data as real and generated data as fake.

Adversarial training
• The training process is an adversarial game between the generator and the discriminator.
• The generator aims to produce samples that fool the discriminator, while the discriminator tries to improve its ability to distinguish between real and generated data. This adversarial training pushes both networks to improve over time.
• As training progresses, the generator becomes more adept at producing realistic samples, while the discriminator becomes more skilled at differentiating between real and generated data.
• Ideally, this process converges to a point where the generator produces high-quality samples that are difficult for the discriminator to distinguish from real data.

Components of Generative Adversarial Networks (GANs)
1) Discriminator – a supervised approach: a simple classifier that predicts whether data is real or fake. It is trained on real data and provides feedback to the generator.
2) Generator – an unsupervised learning approach: a neural network with hidden layers, activations, and a loss function, which generates fake data modeled on the original (real) data. Its aim is to generate fakes that, guided by the discriminator's feedback, the discriminator can no longer recognize as fake. Once the generator consistently fools the discriminator, training stops and we can say that a generalized GAN model has been created. A minimal sketch of the two networks follows.
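The following is a minimal PyTorch sketch of these two blocks (not from the original text; the layer widths, the 100-dimensional noise input, and the 784-dimensional flattened-image output are illustrative assumptions):

```python
import torch
import torch.nn as nn

# Minimal illustrative generator: maps a random noise vector to a
# flattened 28x28 image (784 values). All sizes are assumptions.
class Generator(nn.Module):
    def __init__(self, noise_dim=100, img_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 256),
            nn.ReLU(),
            nn.Linear(256, img_dim),
            nn.Tanh(),  # outputs scaled to [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

# Minimal illustrative discriminator: a binary classifier that scores
# an input image as real (close to 1) or fake (close to 0).
class Discriminator(nn.Module):
    def __init__(self, img_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)
```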
Training Techniques of GAN
Define the Problem
• Begin by defining the problem you aim to solve with GANs, whether it is generating audio, poems, text, images, or another type of content.
Select GAN Architecture
• Choose the specific GAN architecture that suits the problem statement and requirements.
Train Discriminator on Real Dataset
• Initially, train the discriminator on real dataset samples (without noise), classifying both real and fake data, and update the discriminator weights based on its performance and misclassifications.
Train Generator
• Provide random noise inputs to the generator to produce fake outputs. During generator training, the discriminator is kept idle. Over many epochs, the generator learns through backpropagation to transform noise into meaningful data.
Train Discriminator on Fake Data
• The generated fake samples are passed to the discriminator for classification. The discriminator provides feedback to the generator, helping it improve its output.
Train Generator with Discriminator Feedback
• The generator is trained based on the feedback from the discriminator, aiming to create more convincing fake samples. This iterative process continues until the generator's output successfully deceives the discriminator. A minimal sketch of this alternating loop is shown below.
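Below is a minimal, illustrative PyTorch version of this alternating loop, reusing the Generator and Discriminator sketched earlier. The optimizer settings, epoch count, and the `real_loader` data source are assumptions, not part of the original text:

```python
import torch
import torch.nn as nn

# Assumes the illustrative Generator and Discriminator classes sketched
# earlier, plus a `real_loader` that yields batches of flattened real
# images scaled to [-1, 1]. All hyperparameters below are assumptions.
G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()
num_epochs = 50

for epoch in range(num_epochs):
    for real in real_loader:
        b = real.size(0)
        ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)

        # Step 1: train the discriminator (real -> 1, fake -> 0).
        # detach() keeps the generator idle during this step.
        fake = G(torch.randn(b, 100)).detach()
        loss_d = bce(D(real), ones) + bce(D(fake), zeros)
        opt_d.zero_grad()
        loss_d.backward()
        opt_d.step()

        # Step 2: train the generator using the discriminator's
        # feedback: it is rewarded when D labels its fakes as real.
        fake = G(torch.randn(b, 100))
        loss_g = bce(D(fake), ones)
        opt_g.zero_grad()
        loss_g.backward()
        opt_g.step()
```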
Applications of GAN
• Generate new data from available data – generating new samples from available samples that are not identical to the real ones.
• Generate realistic pictures of people who have never existed.
• GANs are not limited to images; they can generate text, articles, songs, poems, etc.
• Generate music with a cloned voice – given a voice sample, GANs can generate a similar voice clone.
• Text-to-image generation (e.g., ObjGAN, an object-driven GAN).
• Creation of anime characters in game development and animation production.
• Image-to-image translation – translating one image into another without changing the background of the source image. For example, a GAN can replace a dog with a cat.
• Low resolution to high resolution – given a low-resolution image or video, a GAN can produce a high-resolution version of the same.
• Prediction of the next frame in a video – trained on small frames of video, GANs can generate or predict the next frame.

Challenges of GAN
• The problem of stability between generator and discriminator: the discriminator should not be too strict, but lenient enough for the generator to learn.
• The problem of determining the positioning of objects: for example, where a picture should contain 3 horses, the generator may create 1 horse with 6 eyes.
• The problem of understanding global objects: GANs do not understand the global or holistic structure of a scene, which is similar to the problem of perspective. A GAN may therefore generate an image that is unrealistic or physically impossible.
• The problem of understanding perspective: GANs trained on 2-D images do not understand 3-D structure, so they fail to create images with consistent 3-D geometry.

Vanilla GAN
• Vanilla GANs are the simplest type of GANs. They consist of two neural networks, a generator and a discriminator, which are trained in a two-player minimax game.
• The generator learns to produce fake data that resembles real data, while the discriminator learns to distinguish between real and fake data.

Progressive GAN
• Training starts with a very small image; blocks of layers are then added incrementally, so that the output size of the generator and the input size of the discriminator grow until the desired image size is obtained.
• This approach has proven very effective at generating high-quality synthetic images that are highly realistic.
• It involves 4 major steps:
1) Progressive growing (of model and layers)
2) Minibatch std on the discriminator
3) Normalization with PixelNorm
4) Equalized learning rate
• During the training process, new blocks of convolutional layers are systematically added to both the generator model and the discriminator model.
• This incremental addition of convolutional layers allows the models to learn coarse-level detail effectively at the beginning and later learn ever finer detail, on both the generator and discriminator side.
• Adding a new block of layers involves a skip connection (as shown in the figure): the new block is connected either to the output of the generator or to the input of the discriminator, and its output is blended with the existing output or input layer using a weighting that controls the influence of the new block.
• For example, a generator that outputs a 16×16 image and a discriminator that takes a 16×16 pixel image can be grown to the size of 32×32. A minimal sketch of this weighted fade-in follows.
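The following sketch illustrates the fade-in weighting described above (the function name `fade_in` and the tensor shapes are illustrative assumptions; the blending weight grows from 0 to 1 as the new block is phased in):

```python
import torch
import torch.nn.functional as F

def fade_in(alpha, upsampled_old, new_block_out):
    """Weighted blend of the new block's output with the upsampled output
    of the existing stack. alpha ramps from 0.0 to 1.0 over training,
    gradually handing control to the new, higher-resolution block."""
    return alpha * new_block_out + (1.0 - alpha) * upsampled_old

# Illustrative shapes for growing a generator from 16x16 to 32x32 RGB:
old = F.interpolate(torch.randn(1, 3, 16, 16), scale_factor=2)  # 16 -> 32
new = torch.randn(1, 3, 32, 32)  # output of the newly added block
blended = fade_in(alpha=0.3, upsampled_old=old, new_block_out=new)
```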
Style transfer and image transformation

Image Generation with GANs
• The taxonomy of generative models
• The discriminator model
• The generator model
• Training GANs

The generator is learning to generate good enough fake samples, while the discriminator is working hard to discriminate between real and fake. More formally, this is termed the minimax game, where the value function V(G, D) is described as follows:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

• We can better understand the value function V(G, D) by separating out the objective function for each of the players. The following equations describe the individual objective functions:

J^{(D)} = -\tfrac{1}{2}\,\mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] - \tfrac{1}{2}\,\mathbb{E}_{z}[\log(1 - D(G(z)))], \qquad J^{(G)} = -J^{(D)}

where J^{(D)} is the discriminator objective function in the classical sense, J^{(G)} is the generator objective, equal to the negative of the discriminator's, and p_data is the distribution of the training data.
• The objective functions help us to understand the aim of each of the players. If we assume both probability densities are non-zero everywhere, we can get the optimal value of D(x) as:

D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_{\text{model}}(x)}

• The simplest yet widely used way of training a GAN is as follows. Repeat the following steps N times, where N is the total number of iterations:
1. Repeat k times:
   • Sample a minibatch of size m from the generator: {z1, z2, …, zm} ~ p_model(z)
   • Sample a minibatch of size m from the actual data: {x1, x2, …, xm} ~ p_data(x)
   • Update the discriminator loss, J^{(D)}
2. Set the discriminator as non-trainable
3. Sample a minibatch of size m from the generator: {z1, z2, …, zm} ~ p_model(z)
4. Update the generator loss, J^{(G)}

Deep Convolutional GAN
• Let's start by preparing the discriminator model. CNN-based binary classifiers are simple models. One modification we make here is to use strides greater than 1 to down-sample the input between layers instead of using pooling layers. This helps provide better stability for training the generator model. We also rely on batch normalization and Leaky ReLU for the same purpose (although these were not used in the original GAN paper). Another important aspect of this discriminator (compared to the vanilla GAN discriminator) is the absence of fully connected layers. A minimal sketch follows.
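The sketch below is an illustrative DCGAN-style discriminator; the channel counts and the 64×64 input size are assumptions, while the defining choices (strided convolutions instead of pooling, batch normalization, Leaky ReLU, no fully connected layers) follow the description above:

```python
import torch.nn as nn

# Minimal sketch of a DCGAN-style discriminator for 64x64 RGB inputs.
# Downsampling is done by stride-2 convolutions rather than pooling,
# and there are no fully connected layers.
dcgan_discriminator = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),    # 64 -> 32
    nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),  # 32 -> 16
    nn.BatchNorm2d(128),
    nn.LeakyReLU(0.2),
    nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1), # 16 -> 8
    nn.BatchNorm2d(256),
    nn.LeakyReLU(0.2),
    nn.Conv2d(256, 1, kernel_size=8),  # 8 -> 1: single real/fake score
    nn.Flatten(),
    nn.Sigmoid(),
)
```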
Vector arithmetic
• The ability to manipulate the latent vectors by addition, subtraction, and so on to generate meaningful output transformations is a powerful tool.
• The authors of the DCGAN paper showed that the representative z-space of the generator indeed obeys such a rich linear structure.
• Similar to vector arithmetic in the NLP domain, where word2vec generates a vector similar to "Queen" upon performing the manipulation "King" – "Man" + "Woman", DCGAN allows the same in the visual domain.

Conditional GAN
• CGANs work by training the generator model to generate fake samples conditioned on specific characteristics of the required output.
• The discriminator, on the other hand, needs to do some extra work. It needs to learn not only to differentiate between fake and real, but also to mark samples as fake if the generated sample and its conditioning characteristics do not match.
• We denote the conditioning input as y and transform the value function for the GAN minimax game as follows:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x \mid y)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z \mid y)))]

where log D(x|y) is the discriminator output for a real sample x conditioned on y and, similarly, log(1 - D(G(z|y))) is the discriminator output for a fake sample, G(z|y), conditioned on y. A minimal sketch of this conditioning follows.
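The following is an illustrative sketch of how the conditioning input y can be injected into the generator; the embedding approach and all sizes (10 classes, 100-dimensional noise, 784-dimensional output) are assumptions, not details from the text:

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Generator computing G(z|y): the label y is embedded and
    concatenated with the noise vector z before the first layer."""
    def __init__(self, noise_dim=100, num_classes=10, img_dim=784):
        super().__init__()
        self.embed = nn.Embedding(num_classes, num_classes)
        self.net = nn.Sequential(
            nn.Linear(noise_dim + num_classes, 256),
            nn.ReLU(),
            nn.Linear(256, img_dim),
            nn.Tanh(),
        )

    def forward(self, z, y):
        return self.net(torch.cat([z, self.embed(y)], dim=1))

z = torch.randn(16, 100)             # a batch of noise vectors
y = torch.randint(0, 10, (16,))      # the conditioning labels
fake = ConditionalGenerator()(z, y)  # one sample per (z, y) pair
```

The discriminator is conditioned in the same way, receiving y alongside the real or generated sample so it can reject samples that do not match their conditioning.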
Wasserstein GAN
• The main difference between typical GANs and W-GANs is that W-GANs treat the discriminator as a critic.
• Hence, instead of simply classifying input images as real or fake, the W-GAN discriminator (or critic) generates a score to inform the generator about the realness or fakeness of the input image.
• The maximum likelihood game we discussed in the initial sections of the chapter explained the task as one where we try to minimize the divergence between p_g and p_data using KL divergence, that is:

\Theta^* = \arg\min_{\Theta} D_{\text{KL}}\left(p_{\text{data}}(x) \parallel p_g(x)\right)

• Mathematically, the Wasserstein distance can be stated as the infimum (or greatest lower bound, denoted as inf) over all transport plans, denoted as W(source, destination), that is:

W(p_{\text{data}}, p_g) = \inf_{\gamma \in \Pi(p_{\text{data}},\, p_g)} \mathbb{E}_{(x, y) \sim \gamma}[\lVert x - y \rVert]

Style Transfer with GANs
• Neural networks are improving at a number of tasks involving analytical and linguistic skills. Creativity is one sphere where humans have had the upper hand. Art is not only subjective and without defined boundaries, it is also difficult to quantify.
• Generative Adversarial Networks (GANs) in particular have been studied and explored in detail for the task of style transfer over the years. One such example uses the CycleGAN architecture to successfully transform photographs into paintings in the styles of famous artists such as Monet and Van Gogh.

Paired style transfer using pix2pix GAN
• In Image Generation with GANs, we discussed a number of innovations related to GAN architectures that led to improved results and better control of the output class. One of those innovations was the conditional GAN. This simple yet powerful addition to the GAN setup enabled us to navigate the latent vector space and control the generator to generate specific outputs.
• Style transfer is an intriguing research area, pushing the boundaries of creativity and deep learning together.
• As the name suggests, this GAN architecture takes a specific type of image as input and transforms it into a different domain. It is called paired style transfer as the training set needs to have matching samples from both source and target domains.
• This generic approach has been shown to effectively synthesize high-quality images from label maps and edge maps, and even to colorize images.

The U-Net generator
• Since CNNs are optimized for computer vision tasks, using them for both the generator and discriminator architectures has a number of advantages.
• The two choices are the vanilla encoder-decoder architecture and the encoder-decoder architecture with skip connections.
• The architecture with skip connections has more in common with the U-Net model than with the plain encoder-decoder setup. Hence, the generator in the pix2pix GAN is termed a U-Net generator.
• A typical encoder (in the encoder-decoder setup) takes an input and passes it through a series of downsampling layers to generate a condensed vector form. This condensed vector is termed the bottleneck features.
• The decoder part then upsamples the bottleneck features to generate the final output. This setup is extremely useful in a number of scenarios, such as language translation and image reconstruction. The bottleneck features condense the overall input into a lower-dimensional space.
• The U-Net architecture uses skip connections to shuttle important features between the input and output.
• In the case of the pix2pix GAN, skip connections are added between every ith downsampling layer and the (n - i)th upsampling layer, where n is the total number of layers in the generator.
• Each skip connection concatenates all channels of the ith layer with those of the (n - i)th layer, the ith layer's channels being appended to the (n - i)th layer's.

Use cases

Unpaired style transfer using CycleGAN
• Paired style transfer is a powerful setup with a number of use cases, some of which we discussed in the previous section. It provides the ability to perform cross-domain transfer given a pair of source and target domain datasets.
• The pix2pix setup also showcased the power of GANs to understand and learn the required loss functions without the need to specify them manually.
• CycleGAN improves upon the paired style transfer architecture by relaxing the constraints on input and output images.
• CycleGAN explores the unpaired style transfer paradigm, where the model tries to learn the stylistic differences between source and target domains without explicit pairing between input and output images.

Overall setup for CycleGAN
• For CycleGAN, the training dataset consists of unpaired samples: a source set {xi} and a target set {yj}, with no specific information regarding which xi matches which yj. In order to reduce the search space and add more constraints to the search for the best possible generator G, the authors introduced a property called cycle consistency: translating an image to the other domain and back should recover the original image. A minimal sketch of this loss follows.

[Figure: CycleGAN generated outputs at different stages of training for the apples-to-oranges experiment]
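The sketch below illustrates the cycle-consistency loss; the L1 penalty and the two generators G: X → Y and F: Y → X follow the CycleGAN formulation, while the function name and the weighting `lambda_cyc` are illustrative assumptions:

```python
import torch
import torch.nn as nn

def cycle_consistency_loss(G, F, real_x, real_y, lambda_cyc=10.0):
    """L_cyc = E[||F(G(x)) - x||_1] + E[||G(F(y)) - y||_1].
    G maps domain X to Y and F maps Y back to X; translating an image
    to the other domain and back should reproduce the original."""
    l1 = nn.L1Loss()
    forward_cycle = l1(F(G(real_x)), real_x)    # x -> Y -> back to X
    backward_cycle = l1(G(F(real_y)), real_y)   # y -> X -> back to Y
    return lambda_cyc * (forward_cycle + backward_cycle)

# Illustrative check with identity "generators": the loss is exactly zero.
identity = nn.Identity()
x, y = torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)
assert cycle_consistency_loss(identity, identity, x, y).item() == 0.0
```

In the full CycleGAN objective, this term is added to the two adversarial losses, one for each generator-discriminator pair.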