Unit 5
Generative Adversarial Networks (GANs) were introduced by Ian Goodfellow and his
colleagues in 2014. GANs are a class of neural networks that autonomously learn
patterns in the input data to generate new examples resembling the original dataset.
A GAN’s architecture consists of two neural networks:
Generator: creates synthetic data from random noise, aiming to produce samples so realistic that
the discriminator cannot distinguish them from real data.
Discriminator: acts as a critic, evaluating whether the data it receives is real or fake.
Generative Adversarial Network (GAN)
Generator Model
The generator is a deep neural network that takes random noise as input to generate
realistic data samples (e.g., images or text). It learns the underlying data distribution by
adjusting its parameters through backpropagation.
The generator’s objective is to produce samples that the discriminator classifies as real. The
loss function is:
J_G = -\frac{1}{m} \sum_{i=1}^{m} \log D(G(z_i))
Where,
J_G measures how well the generator is fooling the discriminator.
\log D(G(z_i)) is the log of the probability the discriminator assigns to a generated sample being real.
The generator aims to minimize this loss, encouraging it to produce samples that the
discriminator classifies as real (D(G(z_i)) close to 1).
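As a concrete illustration, here is a minimal PyTorch sketch of this loss. It assumes a generator G and a discriminator D (ending in a sigmoid) are defined elsewhere; all names and shapes are illustrative, not a fixed API.

```python
import torch
import torch.nn as nn

# Minimal sketch of J_G, assuming a generator G and a discriminator D
# (ending in a sigmoid) are defined elsewhere; names/shapes are illustrative.
bce = nn.BCELoss()  # mean binary cross-entropy over the batch

def generator_loss(G, D, batch_size, noise_dim):
    z = torch.randn(batch_size, noise_dim)   # random noise z_i
    fake = G(z)                              # generated samples G(z_i)
    pred = D(fake)                           # D's probability that each sample is real
    target = torch.ones_like(pred)           # generator wants D(G(z)) -> 1
    # BCE against all-real targets equals -(1/m) * sum_i log D(G(z_i))
    return bce(pred, target)
```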
Discriminator Model
The discriminator acts as a binary classifier, distinguishing between real and generated data. It
learns to improve its classification ability through training, refining its parameters to detect fake
samples more accurately.
When dealing with image data, the discriminator often employs convolutional layers or other
relevant architectures suited to the data type. These layers help extract features and enhance the
model’s ability to differentiate between real and generated samples.
The discriminator minimizes the negative log-likelihood of correctly classifying both real and
generated samples. This loss pushes the discriminator to label generated samples as fake and real
samples as real:
J_D = -\frac{1}{m} \sum_{i=1}^{m} \log D(x_i) - \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D(G(z_i))\right)
J_D measures the discriminator’s ability to discern generated samples from real ones.
\log D(x_i) is the log-probability that the discriminator correctly classifies real data as real.
\log(1 - D(G(z_i))) is the log-probability that the discriminator correctly classifies generated
samples as fake.
By minimizing this loss, the discriminator becomes more effective at distinguishing between real and
generated samples.
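A matching sketch of J_D, reusing bce, G, and D from the generator sketch above (again illustrative, not a fixed API):

```python
import torch

def discriminator_loss(G, D, real_batch, noise_dim):
    m = real_batch.size(0)
    # Real term: -(1/m) * sum_i log D(x_i)
    real_pred = D(real_batch)
    real_loss = bce(real_pred, torch.ones_like(real_pred))
    # Fake term: -(1/m) * sum_i log(1 - D(G(z_i)))
    z = torch.randn(m, noise_dim)
    fake = G(z).detach()                     # detach: update only D in this step
    fake_pred = D(fake)
    fake_loss = bce(fake_pred, torch.zeros_like(fake_pred))
    return real_loss + fake_loss
```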
Minimax Loss
GANs follow a minimax optimization where the generator and discriminator are adversaries:
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
Where,
G is the generator network and D is the discriminator network.
x represents real data samples drawn from the true data distribution p_{data}(x).
z represents random noise sampled from a prior distribution p_z(z) (usually a normal or uniform
distribution).
D(x) is the probability that the discriminator classifies real data as real.
D(G(z)) is the probability that the discriminator classifies data produced by the generator as real.
The generator tries to minimize this objective, while the discriminator tries to maximize it, which
corresponds to maximizing its classification accuracy.
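To see how this objective relates to the losses above: J_D is simply the negative of a batch (Monte Carlo) estimate of the value function,

J_D = -\frac{1}{m}\sum_{i=1}^{m}\log D(x_i) - \frac{1}{m}\sum_{i=1}^{m}\log\left(1 - D(G(z_i))\right) \approx -V(D, G)

so maximizing V(D, G) over D is equivalent to minimizing J_D, and the generator’s minimization acts on the second expectation.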
1. Generator’s First Move
G takes a random noise vector as input. This noise vector contains random values and acts as the
starting point for G’s creation process. Using its internal layers and learned patterns, G
transforms the noise vector into a new data sample, like a generated image.
2. Discriminator’s Turn
D receives two kinds of inputs:
Real data samples from the training dataset.
The data samples generated by G in the previous step.
D’s job is to analyze each input and determine whether it’s real data or something G cooked up.
It outputs a probability score between 0 and 1. A score of 1 indicates the data is likely real, and
0 suggests it’s fake.
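These two moves can be sketched in a few lines of PyTorch, reusing G and D from the sketches above; the batch size, noise dimension, and dataloader are illustrative assumptions:

```python
import torch

# Toy walk-through of the two moves above, reusing G and D; the batch
# size, noise dimension, and `dataloader` are illustrative assumptions.
noise_dim, batch_size = 100, 16
z = torch.randn(batch_size, noise_dim)     # step 1: random noise vector
fake_images = G(z)                         # G transforms noise into samples

real_images = next(iter(dataloader))       # real samples from the training set
p_real = D(real_images)                    # probability score, ideally near 1
p_fake = D(fake_images)                    # ideally near 0 if D spots the fakes
```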
3. Adversarial Learning
If the discriminator correctly classifies real data as real and fake data as fake, it strengthens its
ability slightly.
If the generator successfully fools the discriminator, it receives a positive update, while the
discriminator is penalized.
4. Generator’s Improvement
Every time the discriminator misclassifies fake data as real, the generator learns and improves.
Over multiple iterations, the generator produces more convincing synthetic samples.
5. Discriminator’s Adaptation
The discriminator continuously refines its ability to distinguish real from fake data. This ongoing
duel between the generator and discriminator enhances the overall model’s learning process.
6. Training Progression
As training continues, the generator becomes highly proficient at producing realistic data.
Eventually, the discriminator struggles to distinguish real from fake, indicating that the GAN has
reached a well-trained state.
At this point, the generator can be used to generate high-quality synthetic data for various
applications.
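Putting the pieces together, a minimal training loop built on the loss sketches above might look like this (the optimizer settings, num_epochs, and dataloader are assumptions, not prescriptions):

```python
import torch

# Minimal training loop combining the loss sketches above; optimizer
# settings, num_epochs, and dataloader are illustrative assumptions.
g_opt = torch.optim.Adam(G.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(D.parameters(), lr=2e-4)

for epoch in range(num_epochs):
    for real_batch in dataloader:
        # Discriminator step: minimize J_D (i.e., maximize V(D, G))
        d_opt.zero_grad()
        d_loss = discriminator_loss(G, D, real_batch, noise_dim)
        d_loss.backward()
        d_opt.step()

        # Generator step: minimize J_G = -(1/m) * sum_i log D(G(z_i))
        g_opt.zero_grad()
        g_loss = generator_loss(G, D, real_batch.size(0), noise_dim)
        g_loss.backward()
        g_opt.step()
```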
Types of GANs
1. Vanilla GAN
Vanilla GAN is the simplest type of GAN. It consists of:
A generator and a discriminator, both built using multi-layer perceptrons (MLPs).
The model optimizes its mathematical formulation using stochastic gradient descent (SGD).
While Vanilla GANs serve as the foundation for more advanced GAN models, they often struggle with
issues like mode collapse and unstable training.
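A minimal MLP generator/discriminator pair in the vanilla-GAN spirit might look like this; the sizes (100-dimensional noise, 784 = 28x28 flattened images) are illustrative:

```python
import torch.nn as nn

# A minimal MLP generator/discriminator pair in the vanilla-GAN spirit;
# the sizes (noise of 100, 784 = 28x28 flattened images) are illustrative.
G = nn.Sequential(
    nn.Linear(100, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Tanh(),       # outputs scaled to [-1, 1]
)
D = nn.Sequential(
    nn.Linear(784, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),      # probability the input is real
)
```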
2. Conditional GAN (CGAN)
Conditional GANs (CGANs) introduce an additional conditional parameter to guide the generation
process. Instead of generating data randomly, CGANs allow the model to produce specific types of
outputs.
Working of CGANs:
A conditional variable (y) is fed into both the generator and the discriminator.
This ensures that the generator creates data corresponding to the given condition (e.g., generating
images of specific objects).
The discriminator also receives the labels to help distinguish between real and fake data.
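A minimal sketch of the conditioning step, with hypothetical sizes (10 classes, 100-dimensional noise):

```python
import torch
import torch.nn as nn

# Sketch of CGAN conditioning: the label y is embedded and concatenated
# with the noise before it enters the generator (the discriminator gets
# the same label information alongside its input). Sizes are illustrative.
label_emb = nn.Embedding(10, 10)              # e.g. 10 class labels

z = torch.randn(16, 100)                      # noise batch
y = torch.randint(0, 10, (16,))               # conditioning labels
g_in = torch.cat([z, label_emb(y)], dim=1)    # generator input [z ; y] -> (16, 110)
```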
3. Deep Convolutional GAN (DCGAN)
Deep Convolutional GANs (DCGANs) are among the most popular and widely used types of GANs,
particularly for image generation.
What Makes DCGAN Special?
Uses Convolutional Neural Networks (CNNs) instead of simple multi-layer perceptrons (MLPs).
Max pooling layers are replaced with strided convolutions, making the model more efficient.
Fully connected layers are removed, allowing for better spatial understanding of images.
DCGANs have been highly successful in generating high-quality images, making them a go-to choice for
deep learning researchers.
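For illustration, DCGAN-style building blocks in PyTorch might look like the following (channel counts and kernel sizes are illustrative, not the exact published architecture):

```python
import torch.nn as nn

# Illustrative DCGAN-style building blocks: transposed (fractionally
# strided) convolutions upsample in the generator, and strided
# convolutions replace pooling in the discriminator.
g_block = nn.Sequential(
    nn.ConvTranspose2d(100, 128, kernel_size=4, stride=1, padding=0),  # 1x1 -> 4x4
    nn.BatchNorm2d(128),
    nn.ReLU(),
)
d_block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),  # halves spatial size, no pooling
    nn.LeakyReLU(0.2),
)
```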
4. Laplacian Pyramid GAN (LAPGAN)
Laplacian Pyramid GAN (LAPGAN) is designed to generate ultra-high-quality images by leveraging a
multi-resolution approach.
Working of LAPGAN:
Uses multiple generator-discriminator pairs at different levels of the Laplacian pyramid.
Images are first downsampled at each level of the pyramid and then upscaled again, with conditional
GANs (CGANs) adding back detail at each level.
This process allows the image to gradually refine details, reducing noise and improving clarity.
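A rough sketch of how one pyramid level separates coarse content from detail (the pooling/interpolation choices here are simplifications of the blur-based pyramid used in practice):

```python
import torch.nn.functional as F

# Sketch of one Laplacian-pyramid level: the residual stores the detail
# lost by downsampling; LAPGAN trains a conditional GAN per level to
# generate such residuals when upscaling.
def laplacian_level(img):
    small = F.avg_pool2d(img, kernel_size=2)                  # downsample
    up = F.interpolate(small, scale_factor=2,
                       mode="bilinear", align_corners=False)  # upsample back
    residual = img - up                                       # high-frequency detail
    return small, residual
```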
5. Super Resolution GAN (SRGAN)
Super-Resolution GAN (SRGAN) is specifically designed to increase the resolution of low-quality
images while preserving details.
Working of SRGAN:
Uses a deep neural network combined with an adversarial loss function.
Enhances low-resolution images by adding finer details, making them appear sharper and more
realistic.
Helps reduce common image upscaling errors, such as blurriness and pixelation.
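As a sketch, the generator’s combined objective might be written as follows, assuming a discriminator D with sigmoid output; SRGAN itself also uses a VGG-based perceptual loss, which is omitted here for brevity:

```python
import torch
import torch.nn as nn

# Sketch of an SRGAN-style generator objective: a content loss on the
# super-resolved image plus a small adversarial term. The 1e-3 weight is
# the value commonly cited from the SRGAN paper; treat it as illustrative.
mse = nn.MSELoss()

def srgan_generator_loss(D, sr_image, hr_image):
    content = mse(sr_image, hr_image)                # pixel-wise content loss
    adversarial = -torch.log(D(sr_image)).mean()     # push D(SR) toward 1
    return content + 1e-3 * adversarial
```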
Libraries and tools for GAN