
C 03 Variational Autoencoders Generative Adversarial Network

Uploaded by

srinivasa p
Copyright © All Rights Reserved

Advanced Generative AI: Models and Architecture
Variational Autoencoders and Generative Adversarial Network
Quick Recap

• Explain the roles of the encoder and decoder in a Large Language Model and how they contribute to understanding and generating language.
• Describe the importance of tokenization and embedding in Large Language Models and their impact on the models' language processing capabilities.
Engage and Think

Imagine you are a member of a development team that has created a virtual wardrobe application, leveraging the capabilities of variational autoencoders (VAEs) and generative adversarial networks (GANs). The goal of this application is to allow users to see how various clothing options would look on them through a virtual avatar.

How can VAEs enhance the application's ability to handle a diverse range of clothing styles and body shapes, ensuring that clothes fit avatars accurately and realistically? Furthermore, how can GANs be used to generate new, unique clothing items that do not yet exist, expanding the selection of virtual garments available for users to try on?
Learning Objectives

By the end of this lesson, you will be able to:

• Analyze the effectiveness of variational autoencoders in data representation and reconstruction, focusing on their structure and output quality
• Design and implement a variational autoencoder for a specific data modeling task, demonstrating practical skills in neural network configuration and data preprocessing
• Use generative adversarial networks to generate unique images, demonstrating an understanding of their architecture and function
Introduction to Autoencoders
Autoencoders

An autoencoder is a neural network that learns to compress and reconstruct input data.

[Figure: input layer X → encoder → code h (latent space) → decoder → output layer X']

• Autoencoders are unsupervised learning models.
• They are used for dimensionality reduction, data compression, and feature extraction.
• The model consists of three main components:
o An encoder, which maps input data to a lower-dimensional representation
o Latent space, where the data is in its most compressed form
o A decoder, which maps the lower-dimensional representation back to a reconstruction of the original input
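A minimal sketch of the encoder, code, and decoder in plain Python may help. It assumes a linear autoencoder with a 4-dimensional input and a 2-dimensional code, and its weights `W` are hand-picked for illustration rather than learned; `encode`, `decode`, and `reconstruction_mse` are hypothetical names.

```python
# Minimal linear autoencoder sketch: x (4-D) -> h (2-D code) -> x' (4-D).
# Assumption: the weights are hand-picked for illustration, not trained.

# Encoder weights: each row produces one code unit (here it keeps the first two features).
W = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]

def encode(x):
    """Map the input x to the lower-dimensional code h = W x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def decode(h):
    """Map the code back to input space with the transposed weights, x' = W^T h."""
    n_inputs = len(W[0])
    return [sum(W[i][j] * h[i] for i in range(len(h))) for j in range(n_inputs)]

def reconstruction_mse(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

x = [1.0, 2.0, 3.0, 0.0]
h = encode(x)        # [1.0, 2.0]
x_hat = decode(h)    # [1.0, 2.0, 0.0, 0.0]
print(h, x_hat, reconstruction_mse(x, x_hat))  # MSE = 9/4 = 2.25
```

The third input feature is lost in the 2-dimensional code, so it cannot be reconstructed. Training an autoencoder means adjusting the weights so that this bottleneck keeps the information that matters most for reconstruction.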
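Autoencoders

An autoencoder reconstructs the image to be as close as possible to the original; a close reconstruction indicates that the network has learned a meaningful representation.

[Figure: input image → encoding → compressed middle layer → decoding → reconstructed image. The compressed middle layer is the representation we want.]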
Challenges in Autoencoders

Autoencoders have been successful in encoding and reconstructing data; however, they have limitations:

• Data generation: Traditional autoencoders are limited in generating new, unseen data.
• Robustness in learning: Autoencoders often learn oversimplified representations that miss the data's true complexity.
• Handling variability: Autoencoders struggle with the inherent variability and randomness in data, which is crucial for tasks like image generation or simulation.

Note
The limitations of standard autoencoders led to the development of Variational Autoencoders (VAEs), which address these challenges and provide a more versatile and effective generative model.
Introduction to Variational Autoencoders

Variational autoencoders (VAEs) are autoencoders with generative capabilities.

[Figure: input layer (x1 to x8) → latent representation → output layer (x̂1 to x̂8)]
Functions of VAE

• VAEs are designed to learn the underlying probability distribution of the input data, rather than only compressing it.
• They can generate new data samples similar to the training data, making them powerful for generative tasks.
• They provide a more robust and generalizable way of representing data, capturing its inherent variability and complexity.
VAE Architecture

VAE architecture contains the same components as an autoencoder: an encoder, a latent space, and a decoder.

[Figure: input image $x$ → encoder $q_\phi(z \mid x)$ → mean $\mu_{z|x}$ and standard deviation $\Sigma_{z|x}$ → latent space $z$ → decoder $p_\theta(x \mid z)$ → reconstructed image $\hat{x}$. Training loss = reconstruction loss + KL divergence term.]


Roles of VAE Components

Each component plays a distinct role in the functionality of a Variational Autoencoder.

Encoder
• Compresses input data into a latent space representation
• Produces the parameters (mean and variance) of a probability distribution
Latent space
• Is a compressed representation of input data in the form of a probability distribution
• Encodes essential features for data reconstruction
Decoder
• Reconstructs input data from its latent space representation
• Determines the quality of the reconstruction, which is central to overall VAE performance
Reconstruction loss
• Measures how well the decoder reconstructs the input data
• Commonly uses Mean Squared Error (MSE) or cross-entropy loss
• Guides training, pushing the VAE to capture the important features of the input data
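The two reconstruction losses named above can be sketched in a few lines of Python. This is an illustration, not a library API: `mse_loss` and `bce_loss` are hypothetical helpers, both average over features (many VAE implementations sum over pixels instead), and `bce_loss` assumes inputs scaled to [0, 1] with decoder outputs interpreted as probabilities.

```python
import math

def mse_loss(x, x_hat):
    """Mean squared error, a common choice for real-valued inputs."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def bce_loss(x, x_hat, eps=1e-7):
    """Binary cross-entropy for inputs in [0, 1] (e.g. scaled pixel intensities).
    x_hat holds the decoder's predicted probabilities; eps avoids log(0)."""
    total = 0.0
    for a, p in zip(x, x_hat):
        p = min(max(p, eps), 1.0 - eps)
        total += -(a * math.log(p) + (1.0 - a) * math.log(1.0 - p))
    return total / len(x)

x = [0.0, 1.0, 1.0, 0.0]        # original input
good = [0.1, 0.9, 0.8, 0.2]     # close reconstruction
bad = [0.9, 0.1, 0.2, 0.8]      # poor reconstruction
print(mse_loss(x, good), mse_loss(x, bad))
print(bce_loss(x, good), bce_loss(x, bad))
```

Both losses are lower for the closer reconstruction, which is the signal that steers training.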
KL divergence term
• Is an essential part of the VAE's overall loss function
• Measures the divergence of the latent space distribution from a prior (usually standard normal) distribution
• Regularizes the latent space, which helps with generalization and reduces overfitting
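For the usual choice of a diagonal Gaussian encoder and a standard normal prior, the KL term has a closed form. The sketch below assumes the encoder outputs the log-variance rather than the variance (a common convention that keeps the variance positive); `kl_to_standard_normal` is a hypothetical name.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.
    Closed form per dimension: 0.5 * (mu^2 + sigma^2 - log(sigma^2) - 1)."""
    return sum(0.5 * (m * m + math.exp(lv) - lv - 1.0) for m, lv in zip(mu, log_var))

print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0: identical to the prior
print(kl_to_standard_normal([1.0, 0.0], [0.0, 0.0]))  # 0.5: mean shifted by 1 in one dimension
```

The term is zero only when every latent dimension matches the prior, so it penalizes encodings that drift far from $\mathcal{N}(0, I)$.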
Quick Check

Question: In a Variational Autoencoder (VAE), what is the purpose of the latent space?

A. To store the original input data
B. To compress the input data for storage efficiency
C. To capture a probabilistic representation of the input data
D. To directly generate the output data from the input data
VAE Training Process
VAE Generative Training Process

Training a Variational Autoencoder (VAE) is a multi-step process that unlocks its generative capabilities. The steps of the training process are:

1. Data collection
2. Encoding
3. Sampling
4. Decoding
5. Objective function
6. Training and backpropagation
7. Generative capability
Data Collection

The following steps describe the training process of a VAE and show how the model captures and recreates complex data patterns.

• Gather a large dataset of existing content.
• Ensure the dataset represents the domain or type of data you want the VAE to generate.
Encoding

• The next step is encoding.
• The encoder, usually a neural network, maps the input data $x$ to a latent space $z$.
• It outputs the mean $\mu_\phi(x)$ and variance $\sigma_\phi^2(x)$ of a Gaussian distribution over the latent space.
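As a sketch of this step, the toy encoder below maps a 3-dimensional input to the mean and log-variance of a 2-dimensional Gaussian. Everything here is an assumption for illustration: the weights `W_MU` and `W_LOGVAR` are fixed numbers instead of learned parameters, the encoder is a single linear layer, and it outputs $\log \sigma_\phi^2(x)$, from which the variance is recovered with `exp`.

```python
import math

# Hypothetical single-layer encoder for a 3-D input and a 2-D latent space.
# Assumption: weights are fixed for illustration; a real encoder learns them.
W_MU = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]
W_LOGVAR = [[-0.1, 0.0, 0.2], [0.1, -0.3, 0.0]]

def linear(W, x):
    """Matrix-vector product W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def encode(x):
    """Return the mean mu_phi(x) and log-variance log sigma_phi^2(x) of q_phi(z|x)."""
    mu = linear(W_MU, x)
    log_var = linear(W_LOGVAR, x)
    return mu, log_var

mu, log_var = encode([1.0, 2.0, 3.0])
variance = [math.exp(lv) for lv in log_var]  # exp keeps the variance strictly positive
print(mu, variance)
```

Predicting the log-variance lets the network output any real number while the variance itself stays positive.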
Sampling

• The model samples from the distribution it learned in the latent space.
• This sampling enables the creation of new data points from that distribution.
• This process introduces a crucial element of randomness necessary for the model's generative capabilities.
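Sampling is usually implemented with the reparameterization trick, which writes the sample as $z = \mu_\phi(x) + \sigma_\phi(x) \cdot \epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$, so that gradients can flow through $\mu_\phi$ and $\sigma_\phi$ during backpropagation. The sketch below assumes log-variance inputs and uses Python's `random` module; `sample_latent` is a hypothetical name.

```python
import math
import random

def sample_latent(mu, log_var, rng):
    """Draw z ~ N(mu, diag(exp(log_var))) with the reparameterization trick:
    z = mu + sigma * eps, eps ~ N(0, 1). The randomness lives in eps, so z is
    a differentiable function of mu and log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

rng = random.Random(42)
mu, log_var = [0.4, -0.6], [0.5, -0.5]
samples = [sample_latent(mu, log_var, rng) for _ in range(10000)]
mean_0 = sum(s[0] for s in samples) / len(samples)
print(round(mean_0, 2))  # close to mu[0] = 0.4
```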
Decoding

• The decoder, another neural network, generates new data samples.
• It maps the latent representation $z$ back to the data space.
• The decoder learns the mean $\mu_\theta(z)$ and variance $\sigma_\theta^2(z)$ in the data space.
Objective Function

• Training a VAE involves optimizing an objective function.
• This objective function comprises two components:
a. Minimizing the reconstruction error between the input and the generated data
b. Minimizing the Kullback-Leibler (KL) divergence between the learned distribution in the latent space and a standard Gaussian distribution
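Written out, the two components combine into a single loss to minimize, which equals the negative of what is usually called the evidence lower bound (ELBO). The notation follows the encoder $q_\phi(z \mid x)$ and decoder $p_\theta(x \mid z)$ from the architecture figure; $J$, the number of latent dimensions, is introduced here.

```latex
\mathcal{L}(\theta, \phi; x)
  = \underbrace{-\,\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]}_{\text{a. reconstruction error}}
  + \underbrace{D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\middle\|\, \mathcal{N}(0, I)\right)}_{\text{b. KL divergence}}

% Closed form of the KL term for a diagonal Gaussian encoder with J latent dimensions:
D_{\mathrm{KL}} = \frac{1}{2} \sum_{j=1}^{J} \left( \mu_{\phi,j}^2(x) + \sigma_{\phi,j}^2(x) - \log \sigma_{\phi,j}^2(x) - 1 \right)
```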
Training and Backpropagation

• The model is trained with backpropagation.
• The process computes gradients of the objective function with respect to the encoder and decoder parameters.
• The parameters are updated to minimize the objective function.
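The update rule can be illustrated on the KL term alone, whose gradients have a simple closed form: $\partial D_{\mathrm{KL}}/\partial \mu = \mu$ and $\partial D_{\mathrm{KL}}/\partial \log\sigma^2 = \frac{1}{2}(\sigma^2 - 1)$. This is a simplified sketch, not a full VAE training loop: it updates a single latent mean and log-variance directly, whereas a real VAE backpropagates through the encoder and decoder weights with an autodiff framework such as TensorFlow. `kl_and_grads` is a hypothetical name.

```python
import math

def kl_and_grads(mu, log_var):
    """KL(N(mu, exp(log_var)) || N(0, 1)) for one latent dimension, plus its gradients
    dKL/dmu = mu and dKL/dlog_var = 0.5 * (exp(log_var) - 1)."""
    kl = 0.5 * (mu * mu + math.exp(log_var) - log_var - 1.0)
    return kl, mu, 0.5 * (math.exp(log_var) - 1.0)

mu, log_var, lr = 2.0, 1.0, 0.1
for _ in range(200):
    kl, g_mu, g_lv = kl_and_grads(mu, log_var)
    # Gradient-descent step: move each parameter against its gradient.
    mu -= lr * g_mu
    log_var -= lr * g_lv
print(round(mu, 4), round(log_var, 4), kl)
```

With the KL term alone, gradient descent drives the latent distribution to the standard normal prior ($\mu \to 0$, $\log\sigma^2 \to 0$). In a full VAE the reconstruction term pulls in the opposite direction, so the encoder keeps enough information about $x$ to reconstruct it.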
Generative Capability

• A key feature of a VAE is its continuous latent space.
• It enables straightforward random sampling and smooth interpolation between data points.
• This versatility allows VAEs to generate a wide variety of data types effectively.
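Interpolation in the latent space can be sketched as straight-line mixing between two latent vectors. In a trained VAE, each intermediate vector would be passed to the decoder to produce a sample that blends the two endpoints; the decoder is omitted here, and `interpolate` is a hypothetical helper. New samples can likewise be generated by drawing $z$ from the standard normal prior and decoding it.

```python
def interpolate(z_a, z_b, steps):
    """Return `steps` evenly spaced latent vectors on the line from z_a to z_b (inclusive)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # mixing weight goes from 0 to 1
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path

for z in interpolate([0.0, 1.0], [1.0, -1.0], steps=5):
    print(z)  # each z would be passed to a trained decoder
```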
Role of VAE in Generative AI

Below are two important roles of Variational Autoencoders in the evolution of artificial intelligence:

• VAEs are instrumental in understanding and modeling complex data distributions.
• VAEs significantly contribute to the development of generative models in artificial intelligence.
Quick Check

Question: What are the essential components of the objective function used in the training of a Variational Autoencoder (VAE)?

A. Maximizing the alignment between the encoder and decoder networks
B. Minimizing the reconstruction error and the Kullback-Leibler (KL) divergence
C. Increasing the complexity of the latent space for better data representation
D. Reducing the number of layers in the neural network to prevent overfitting
Demo: Implementing a VAE with TensorFlow for Image
Generation Using the MNIST Dataset

Duration: 20 minutes

Problem Statement:
The task involves implementing a Variational Autoencoder (VAE) using TensorFlow to generate images. This requires understanding how VAEs work and applying that knowledge to a practical dataset: MNIST, a collection of handwritten digit images.

Objective:
The goal is to use a VAE built with TensorFlow for image generation. The focus is on the MNIST dataset, which contains 60,000 training examples and 10,000 test examples of handwritten digits. Each digit is size-normalized and centered in a 28x28 pixel image. Each image will be transformed into a 1-dimensional NumPy array of 784 features (28*28). The primary aim is to use the VAE to create new images that resemble those in the MNIST dataset.

Note
Please download the solution document from the Reference Material Section and
follow the Jupyter Notebook for step-by-step execution.
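The preprocessing described above can be sketched without any dependencies. `flatten_and_scale` is a hypothetical helper, the input is a synthetic 28x28 stand-in rather than a real MNIST digit, and scaling pixel values from 0-255 to [0, 1] is an added assumption (a common choice when the reconstruction loss is binary cross-entropy).

```python
def flatten_and_scale(image):
    """Turn a 28x28 grid of 0-255 pixel values into a flat list of 784 floats in [0, 1]."""
    if len(image) != 28 or any(len(row) != 28 for row in image):
        raise ValueError("expected a 28x28 image")
    return [pixel / 255.0 for row in image for pixel in row]

# Synthetic stand-in for one MNIST digit: a vertical stroke in column 14.
image = [[255 if col == 14 else 0 for col in range(28)] for _ in range(28)]
vector = flatten_and_scale(image)
print(len(vector), max(vector), min(vector))  # 784 1.0 0.0
```

With NumPy, the same transformation is typically applied to the whole dataset at once, for example `x_train.reshape(-1, 784).astype("float32") / 255.0` after loading the data with `tf.keras.datasets.mnist.load_data()`.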
VAE Generative Applications
Image Generation

VAEs excel at creating new and realistic images.

• They generate unique visual artworks.


• They create in-game assets, characters,
and environments.
• VAEs assist in generating medical
images for research and diagnostics.
Anomaly Detection

VAEs play a vital role in identifying anomalies or outliers in datasets.

• They assist in spotting unusual financial transactions.
• They enhance security systems by detecting irregular network activities.
• They improve manufacturing processes by identifying defects.
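A common way to use a VAE for anomaly detection is to flag inputs that it reconstructs poorly, since the model has learned to reconstruct typical data well. The sketch below assumes a generic `reconstruct` function standing in for a trained VAE's encode-then-decode pass; `flag_anomalies`, `toy_reconstruct`, the sample values, and the threshold are all hypothetical.

```python
def flag_anomalies(samples, reconstruct, threshold):
    """Return indices of samples whose reconstruction MSE exceeds `threshold`.
    `reconstruct` stands in for a trained VAE's encode-then-decode pass."""
    flagged = []
    for i, x in enumerate(samples):
        x_hat = reconstruct(x)
        error = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
        if error > threshold:
            flagged.append(i)
    return flagged

# Toy stand-in model: it has only "learned" inputs whose features are roughly equal,
# so it reconstructs every input as the average of its features.
def toy_reconstruct(x):
    avg = sum(x) / len(x)
    return [avg] * len(x)

transactions = [[1.0, 1.1], [2.0, 2.0], [5.0, 0.0], [0.9, 1.0]]
print(flag_anomalies(transactions, toy_reconstruct, threshold=0.5))  # [2]
```

In practice the threshold is often set from the reconstruction errors observed on normal validation data, for example a high percentile of them.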
Drug Discovery

VAEs are valuable tools for drug discovery.

• They can speed up the identification of potential drug candidates.
• They help design molecules with specific properties for various applications.
Data Imputation

VAEs help fill in missing or incomplete data.

• VAEs complete patient records in healthcare, aiding medical professionals.
• VAEs impute missing financial data for analysis.
• VAEs are invaluable wherever missing data complicates decision-making.
Drawbacks of VAE

A major disadvantage of VAEs is that they tend to produce blurry, less realistic outputs.

Note
GANs are known for producing high-quality, sharp, and realistic outputs,
particularly in image generation.
Generative Adversarial Network (GAN)
Introduction to GANs

Generative Adversarial Networks (GANs) are deep learning architectures that use structures like convolutional neural networks for generative modeling.

• GANs generate highly realistic samples with sharp details and intricate features.
• They are effective at producing natural-looking images that closely resemble the training data.
• For example, after training on an existing set of images, a GAN can generate new images in the same style.

[Figure: sample GAN-generated images. The images shown are imaginary.]

Source: https://arxiv.org/abs/1906.00446?ref=assemblyai.com
Introduction to GANs

Problems in VAEs
• VAEs tend to produce samples that are often blurry or averaged representations.
• They may struggle to capture the full richness and diversity of the data distribution.

How do GANs solve this issue?
• GANs excel at capturing high-frequency details and generating more realistic and diverse samples.
• They produce images that capture the complexity and variability of real data.
GAN Architecture

GANs use two neural networks: a generator and a discriminator. These networks engage in an
adversarial relationship.

This adversarial dynamic forms a zero-sum game, where one network's progress is at the
expense of the other.
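The zero-sum game is commonly written as a minimax objective: the discriminator $D$ tries to maximize the value $V(D, G)$, and the generator $G$ tries to minimize it. Here $p_{\text{data}}$ denotes the real data distribution and $p_z$ the noise distribution fed to the generator; this notation is introduced for the formula.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\!\left(1 - D(G(z))\right)\right]
```

In practice the generator is often trained to maximize $\log D(G(z))$ instead of minimizing $\log(1 - D(G(z)))$, which gives stronger gradients early in training, when the discriminator easily rejects generated samples.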
GAN Architecture

Each component has its own function in a Generative Adversarial Network (GAN):

• The generator receives input (typically a random noise vector) and creates sample images.
• The generated sample images and the real samples are passed to the discriminator.
• The discriminator functions as a binary classifier, outputting a probability between 0 and 1.

A result closer to 1 indicates that the sample is likely real.
A result closer to 0 indicates that the sample is likely fake.

Note
Both the generator and discriminator are commonly implemented using CNNs (Convolutional Neural Networks), particularly for image-related tasks.
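The discriminator's 0-to-1 output is typically a sigmoid applied to a raw score, and both networks are then trained with binary cross-entropy. The sketch below is a hypothetical illustration of those pieces for a single real and a single generated sample; `discriminator_loss` and `generator_loss` are made-up names, and the generator loss shown is the widely used non-saturating variant.

```python
import math

def sigmoid(t):
    """Map a raw discriminator score (logit) to a probability in (0, 1), avoiding overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy with real samples labeled 1 and generated samples labeled 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: the generator wants D(G(z)) close to 1."""
    return -math.log(d_fake)

d_real = sigmoid(2.0)   # about 0.88: the discriminator leans "real"
d_fake = sigmoid(-1.0)  # about 0.27: the discriminator leans "fake"
print(round(d_real, 2), round(d_fake, 2))
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```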
Quick Check

Question: In the architecture of a Generative Adversarial Network (GAN), what is the role of the discriminator?

A. The discriminator generates new image data from a noise vector.
B. The discriminator guides the generator to produce more realistic images by providing feedback.
C. The discriminator classifies input images as real or fake.
D. The discriminator optimizes the noise vector to improve image generation.
GAN Training Process
Training Process

Below is the training process for Generative Adversarial Networks (GANs):

Step 1: Initialize both the generator and discriminator networks with random weights.
Step 2: Train the discriminator network on a batch of real data samples and another batch of generated data samples.
Step 3: Train the generator network to create new data samples that can deceive the discriminator network.
Step 4: Repeat steps 2 and 3 until the networks converge.
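To make the four steps concrete, here is a deliberately tiny GAN in plain Python. It is a sketch under strong simplifying assumptions, not the TensorFlow implementation used in the demo: the "images" are single numbers drawn from a normal distribution with mean 4, the generator is the linear map G(z) = a·z + b, the discriminator is logistic regression D(x) = sigmoid(w·x + c), and the gradients are derived by hand. The names `train_toy_gan`, `a`, `b`, `w`, and `c` are hypothetical.

```python
import math
import random

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def train_toy_gan(steps=2000, batch=32, lr=0.01, seed=0):
    """Alternate discriminator and generator updates on 1-D data drawn from N(4, 1)."""
    rng = random.Random(seed)
    # Step 1: initialize generator G(z) = a*z + b and discriminator D(x) = sigmoid(w*x + c).
    a, b = rng.uniform(0.5, 1.5), 0.0
    w, c = rng.uniform(-0.1, 0.1), 0.0
    b_history = []
    for _ in range(steps):  # Step 4: repeat steps 2 and 3
        real = [rng.gauss(4.0, 1.0) for _ in range(batch)]
        z = [rng.gauss(0.0, 1.0) for _ in range(batch)]

        # Step 2: train D on a real batch and a generated batch
        # (gradient ascent on log D(real) + log(1 - D(fake))).
        gw = gc = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            gw += (1.0 - d) * x
            gc += 1.0 - d
        for zi in z:
            fake = a * zi + b
            d = sigmoid(w * fake + c)
            gw -= d * fake
            gc -= d
        w += lr * gw / batch
        c += lr * gc / batch

        # Step 3: train G to fool the updated D
        # (gradient ascent on log D(G(z)), the non-saturating generator objective).
        ga = gb = 0.0
        for zi in z:
            d = sigmoid(w * (a * zi + b) + c)
            ga += (1.0 - d) * w * zi
            gb += (1.0 - d) * w
        a += lr * ga / batch
        b += lr * gb / batch
        b_history.append(b)
    return a, b, b_history

a, b, history = train_toy_gan()
print(f"generator offset b after training: {b:.2f} (real data mean is 4.0)")
```

Because this discriminator can only compare means, training pushes b toward 4 but may overshoot and oscillate instead of settling, a small-scale view of why GANs can be hard to train. Image GANs replace both linear maps with CNNs and compute gradients with automatic differentiation.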
Application: StyleGAN

The following example shows generated human faces that do not belong to any real individuals.

• The images are generated by StyleGAN.
• NVIDIA developed StyleGAN (Style Generative Adversarial Network), which generates highly realistic and customizable synthetic images.
• Its primary innovation is control over content and style, leading to diverse, high-quality, personalized visuals.

Source: https://user-images.githubusercontent.com/6625384/64915614-b82efd00-d730-11e9-92e4-f3a6de1a5575.png
Benefits of GAN

• GANs operate in an unsupervised learning framework and don't require labeled data during
training.
• GANs can be applied to image-to-image translation tasks, like converting satellite images to
maps, black-and-white photos to color, or day-to-night scene translation.
• GANs can be used for style transfer in images, allowing for the synthesis of images in the style
of a particular artist or a given set of images.
Demo: Generating Fake Images with Generative Adversarial
Networks (GANs)

Duration: 20 minutes

Problem Statement:
The task is to implement Generative Adversarial Networks (GANs) for synthetic image generation. It
involves mastering GAN principles and applying them to a specific dataset, focusing on the interplay
between the generator and discriminator networks.

Objective:
The aim is to develop a GAN, using TensorFlow, to create fake images. The project centers on training
the generator network to produce images that can convincingly pass as real to the discriminator. The
goal is to demonstrate GANs' effectiveness in producing diverse, realistic images.

Note
Please download the solution document from the Reference Material Section and
follow the Jupyter Notebook for step-by-step execution.
Industrial Use Case of GAN

Industrial use cases of GANs in the fashion industry include:

Virtual clothing try-on
• Customers can upload a photo to see how different clothing items would look on them.
• GANs generate realistic images of the customer wearing the clothes.

Customized shopping
• GANs enable retailers to create personalized shopping experiences by generating tailored recommendations.
• This allows retailers to engage customers more effectively.
Drawbacks of GAN

Difficult to train
• GANs can be difficult to train due to the adversarial relationship between the generator and discriminator.

Limited subset
• GANs are more prone to mode collapse, where the generator produces only a limited subset of the possible samples.
Quick Check

Question: What is the primary advantage of GANs over Variational Autoencoders (VAEs) in generating images?

A. GANs generate more abstract representations of images.
B. GANs produce images that are often blurry or averaged representations.
C. GANs are less capable of capturing high-frequency details in images.
D. GANs excel at capturing high-frequency details and generating more realistic and diverse samples.
Guided Practice

Duration: 20 minutes

Overview:

This assignment is designed to enhance understanding and skills in advanced AI technologies through
engaging tasks. It focuses on developing practical solutions using AI in various creative scenarios. The
objective is to deepen knowledge in the application of AI for innovative and generative tasks,
emphasizing hands-on experience in utilizing AI for diverse creative outputs.
Key Takeaways

• Variational autoencoders are autoencoders with generative capabilities.
• GANs generate highly realistic samples with sharp details and intricate features.
• StyleGAN allows you to create unique, high-resolution art, such as realistic faces of non-existent people.
Q&A
