Unit V Deep Generative Models - Part 01

The document discusses Generative Models, particularly focusing on Deep Generative Models (DGMs) like Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), which learn complex data distributions for generating new samples. It also covers Boltzmann Machines and their variations, including Restricted Boltzmann Machines (RBMs) and Deep Belief Networks (DBNs), highlighting their structures, training processes, and applications in tasks such as dimensionality reduction and feature learning. The document emphasizes the advantages of these models in unsupervised learning and their effectiveness in various applications such as image and speech recognition.

Introduction

• A Generative Model is a powerful way of learning any kind of data distribution using unsupervised learning and has achieved tremendous success.

• All types of generative models aim at learning the true data distribution of the training set so as to generate new data points with some variations.

• Deep generative models (DGMs) are neural networks with many hidden layers trained to approximate complicated, high-dimensional probability distributions using a large number of samples.
Introduction
• These models have gained significant attention in recent years
due to their ability to learn complex data distributions and
generate new samples from those distributions.

• When trained successfully, we can use DGMs to estimate the likelihood of each observation and create new samples from the underlying distribution.

• The two most popular approaches for deep generative modeling are:

1. Variational Autoencoders (VAE)

2. Generative Adversarial Networks (GAN).


Introduction
1. Variational Autoencoders (VAE):

VAEs are probabilistic graphical models rooted in Bayesian inference. VAEs aim to learn a low-dimensional latent representation of training data, which can be used to generate new data points.

VAEs combine an encoder and a decoder network.

The encoder maps input data to a latent space, and the decoder generates samples from this latent space.

VAEs are commonly used for generative tasks and representation learning.
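
To make the encoder–decoder idea concrete, here is a minimal sketch of a VAE in PyTorch, assuming a flattened 784-dimensional input (e.g. MNIST-style images scaled to [0, 1]); the layer sizes and the names VAE and vae_loss are illustrative choices, not something specified in these notes.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=2):
        super().__init__()
        # Encoder: maps input data to the mean and log-variance of a latent Gaussian
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder: maps a latent code back to data space
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterisation trick: z = mu + sigma * eps, with eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

def vae_loss(x, x_recon, mu, logvar):
    # Reconstruction term (x assumed in [0, 1]) plus KL divergence to the standard normal prior
    recon = nn.functional.binary_cross_entropy(x_recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```

New data points can then be generated by sampling z from a standard normal distribution and passing it through the decoder.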
Introduction
2. Generative Adversarial Networks (GAN): GANs consist of
a generator and a discriminator.

• The generator generates data samples, while the discriminator evaluates whether a given sample is real or generated.

• The training process involves an adversarial game between the generator and discriminator, leading to the generator learning to produce realistic data samples.
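
The adversarial game can be sketched as the training loop below (PyTorch). The MLP sizes, learning rates, and the synthetic stand-in for real data are illustrative assumptions; the point is the structure of alternating discriminator and generator updates.

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real_batch = torch.randn(32, data_dim) * 0.5 + 2.0   # stand-in for real data
    fake_batch = G(torch.randn(32, latent_dim))           # generated samples

    # 1) Discriminator step: classify real samples as 1 and generated samples as 0
    d_loss = bce(D(real_batch), torch.ones(32, 1)) + \
             bce(D(fake_batch.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Generator step: try to make the discriminator output 1 on generated samples
    g_loss = bce(D(fake_batch), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```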
Boltzmann machine
• A Boltzmann machine is designed to learn probability distributions over its set of inputs.

• There are three key concepts to know about Boltzmann machines:

1. Stochasticity: Unlike traditional deterministic neural networks, Boltzmann machines incorporate randomness.

• The state of each neuron (node) in the network is determined probabilistically based on the states of the neighboring neurons and a temperature parameter.
Boltzmann machine
2. Energy Function: The Boltzmann machine assigns an
energy to each possible state of the system.

Lower energy states are more probable. The energy function typically involves weights between nodes and biases.

3. Equilibrium: The machine aims to reach a thermal equilibrium where the distribution of states follows the Boltzmann distribution.

This distribution specifies that the probability of a system being in a certain state decreases exponentially with the energy of that state.
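
• Concretely (this is the standard textbook form, not spelled out in these notes): a state s with energy E(s) has probability P(s) = e^(−E(s)/T) / Z, with Z = ∑_s′ e^(−E(s′)/T), where T is the temperature parameter and Z is the normalising partition function, so higher-energy states are exponentially less likely.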
Boltzmann machine
• A Boltzmann machine is essentially a fully connected, two-layer neural network.
• These two layers are the visible layer and the hidden layer.
• The visible layer is analogous to the input layer in feedforward neural networks.
• Although the second layer is called the hidden layer, it functions more as an output layer: a Boltzmann machine has no separate hidden layer between the input and output layers.
Boltzmann machine
• The basic units of a Boltzmann machine are binary
neurons that can be in one of two states: on (1) or off (0).

• There are two types of units in a Boltzmann machine:

• Visible units: Correspond to the input data.

• Hidden units: Capture dependencies and abstract features that are not directly observed.

• Weights: Represent connections between pairs of units. These can be symmetric (i.e., the weight from unit i to unit j is the same as from unit j to unit i).

• Biases: Represent the threshold for each unit.


Boltzmann machine
• The figure below shows the very simple structure of a Boltzmann machine.

• This Boltzmann machine has three hidden neurons and four visible neurons.

• A Boltzmann machine is fully connected because every neuron has a connection to every other neuron. However, no neuron is connected to itself.
Boltzmann machine
• Types of Boltzmann Machines

1. Restricted Boltzmann Machines (RBMs): A simplified version of the Boltzmann machine where the network is restricted to a bipartite graph, meaning there are no connections within the visible units or the hidden units.

• As the figure below shows, an RBM is not fully connected: all hidden neurons are connected to each visible neuron.
Boltzmann machine
• There are no connections among the hidden neurons, nor are there connections among the visible neurons.

2. Deep Belief Networks (DBNs): Composed of multiple layers of RBMs. These networks can learn hierarchical representations of the data.

3. Deep Boltzmann Machines (DBMs): A Deep Boltzmann Machine (DBM) is an advanced type of Boltzmann machine designed to model complex, high-dimensional data.

It extends the idea of a Restricted Boltzmann Machine (RBM) by stacking multiple layers of hidden units, creating a deep architecture that can capture intricate patterns and dependencies in data.
Restricted Boltzmann machine (RBM)
• A Restricted Boltzmann Machine (RBM) is a simplified
version of a Boltzmann machine with certain restrictions
that make it easier to train and more practical for many
applications.

• Structure of Restricted Boltzmann Machines

A Restricted Boltzmann Machine (RBM) is a generative, stochastic, two-layer artificial neural network that can learn a probability distribution over its set of inputs.
Restricted Boltzmann machine (RBM)
• Visible Units (V): These units represent the input data.
The number of visible units corresponds to the number of
features in the input data.

• Hidden Units (H): These units capture the dependencies and patterns in the input data. The number of hidden units is a hyperparameter that can be tuned.

• Weights (W): Each visible unit is connected to every hidden unit with a symmetric weight. The weight matrix W defines these connections.
Restricted Boltzmann machine (RBM)
• Biases: There are bias terms for both visible units (𝑎) and
hidden units (𝑏). These biases help in adjusting the
activation thresholds of the units.

• The restriction in a Restricted Boltzmann Machine is that there is no intra-layer communication (nodes of the same layer are not connected).

• Visible units are not connected to other visible units, and hidden units are not connected to other hidden units.

• This restriction allows for more efficient training algorithms than for the general class of Boltzmann machines.
Restricted Boltzmann machine (RBM)
• Energy function in RBM

• The energy of a configuration (a state of visible and hidden units) in an RBM is defined as:

• E(v, h) = −∑ᵢ aᵢvᵢ − ∑ⱼ bⱼhⱼ − ∑ᵢ,ⱼ vᵢWᵢⱼhⱼ

• where:

• 𝑣𝑖 is the state of visible unit 𝑖,

• ℎ𝑗 is the state of hidden unit j,

• 𝑎𝑖 is the bias of visible unit 𝑖,

• 𝑏𝑗 is the bias of hidden unit j,

• 𝑊𝑖𝑗 is the weight between visible unit 𝑖 and hidden unit j.
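
As a quick check of the formula, the NumPy sketch below evaluates E(v, h) for one small, randomly chosen configuration; the sizes and parameter values are purely illustrative.

```python
# Evaluating the RBM energy E(v, h) = -a.v - b.h - v^T W h for one configuration.
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 4, 3
v = rng.integers(0, 2, n_visible)           # binary visible state
h = rng.integers(0, 2, n_hidden)            # binary hidden state
a = rng.normal(size=n_visible)              # visible biases
b = rng.normal(size=n_hidden)               # hidden biases
W = rng.normal(size=(n_visible, n_hidden))  # weights W[i, j]

energy = -a @ v - b @ h - v @ W @ h
print("E(v, h) =", energy)
```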


Restricted Boltzmann machine (RBM)
• Probabilistic Activation

• The states of the units are binary (0 or 1) and are activated probabilistically based on their energies.

• The probability that a hidden unit hⱼ is activated (i.e., set to 1) given the visible units v is:

• P(hⱼ = 1 | v) = σ(bⱼ + ∑ᵢ vᵢWᵢⱼ)

• Similarly, the probability that a visible unit vᵢ is activated given the hidden units h is:

• P(vᵢ = 1 | h) = σ(aᵢ + ∑ⱼ hⱼWᵢⱼ)
Restricted Boltzmann machine (RBM)
• where σ(x) is the logistic sigmoid function:

• σ(x) = 1 / (1 + e⁻ˣ)
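
These conditional probabilities translate directly into code. The NumPy sketch below samples the hidden units given the visible units and vice versa; the weight matrix, biases, and input vector are illustrative assumptions.

```python
# Sampling from the RBM conditionals:
# P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij)
# P(v_i = 1 | h) = sigmoid(a_i + sum_j h_j W_ij)
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v, W, b, rng):
    p_h = sigmoid(b + v @ W)                    # P(h_j = 1 | v)
    return (rng.random(p_h.shape) < p_h).astype(float), p_h

def sample_visible(h, W, a, rng):
    p_v = sigmoid(a + W @ h)                    # P(v_i = 1 | h)
    return (rng.random(p_v.shape) < p_v).astype(float), p_v

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 3))          # 4 visible x 3 hidden units
a, b = np.zeros(4), np.zeros(3)                 # visible / hidden biases
v0 = np.array([1.0, 0.0, 1.0, 1.0])             # an input vector
h0, _ = sample_hidden(v0, W, b, rng)
v1, _ = sample_visible(h0, W, a, rng)
```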
Training RBMs

• Training an RBM involves adjusting the weights and biases to minimize the difference between the observed data distribution and the distribution modeled by the RBM.

• The primary algorithm used for this purpose is Contrastive Divergence (CD).
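
A single CD-1 update can be sketched as follows (NumPy). This follows the standard one-step contrastive divergence recipe, using the reconstruction probabilities in the negative phase; the batch, learning rate, and dimensions are illustrative and not taken from these notes.

```python
# One Contrastive Divergence (CD-1) parameter update for an RBM.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr=0.1, rng=np.random.default_rng(0)):
    # Positive phase: hidden probabilities and a sample, given the data v0
    p_h0 = sigmoid(b + v0 @ W)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: reconstruct the visible units, then recompute hidden probabilities
    p_v1 = sigmoid(a + h0 @ W.T)
    p_h1 = sigmoid(b + p_v1 @ W)
    # Gradient approximation: <v h>_data - <v h>_reconstruction
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / v0.shape[0]
    a += lr * (v0 - p_v1).mean(axis=0)
    b += lr * (p_h0 - p_h1).mean(axis=0)
    return W, a, b

# Example: one update on a tiny batch of 4-dimensional binary vectors
rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(4, 3))
a, b = np.zeros(4), np.zeros(3)
batch = np.array([[1., 0., 1., 1.], [0., 1., 1., 0.]])
W, a, b = cd1_update(batch, W, a, b)
```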
Restricted Boltzmann machine (RBM)
• Working of Restricted Boltzmann Machine
An RBM works with two biases:

• The hidden bias helps the RBM produce the activations on the forward pass, while

• The visible layer’s biases help the RBM learn the reconstructions on the backward pass.

• Forward pass

• The following figure shows the working of the RBM in the forward pass.
Restricted Boltzmann machine (RBM)

• The forward pass is the first step in training an RBM with multiple inputs.

• The inputs are multiplied by the weights and then added to the bias.
Restricted Boltzmann machine (RBM)

• The result is then passed through a sigmoid activation function and the output determines if the hidden state gets activated or not.

• Weights will be a matrix with the number of input nodes as the number of rows and the number of hidden nodes as the number of columns.

• The first hidden node will receive the vector multiplication of the inputs multiplied by the first column of weights before the corresponding bias term is added to it.
Restricted Boltzmann machine (RBM)
• The sigmoid function is given by σ(x) = 1 / (1 + e⁻ˣ).

• So the equation that we get in this step is h(1) = σ(Wᵀ v(0) + b),

• where h(1) and v(0) are the corresponding vectors (column matrices) for the hidden and the visible layers, with the superscript indicating the iteration (v(0) means the input that we provide to the network), and b is the hidden-layer bias vector.
Restricted Boltzmann machine (RBM)
• Backward pass

• The backward pass is the reverse or the reconstruction phase.

• It is similar to the first pass but in the opposite direction, as shown below:
Restricted Boltzmann machine (RBM)

• The reconstruction is computed as v(1) = σ(W h(1) + a),

• where v(1) and h(1) are the corresponding vectors (column matrices) for the visible and the hidden layers, with the superscript indicating the iteration, and a is the visible-layer bias vector.
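
Put together, the forward and backward passes amount to two matrix–vector products followed by the sigmoid, as in the small NumPy sketch below (sizes and parameter values are illustrative):

```python
# Forward pass h(1) = sigmoid(W^T v(0) + b) and backward pass v(1) = sigmoid(W h(1) + a).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 3))   # 4 visible x 3 hidden
a, b = np.zeros(4), np.zeros(3)          # visible / hidden biases
v0 = np.array([1.0, 0.0, 1.0, 1.0])      # input provided to the network

h1 = sigmoid(W.T @ v0 + b)   # forward pass: hidden activations
v1 = sigmoid(W @ h1 + a)     # backward pass: reconstruction of the input
```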
Applications of RBM
• RBMs have been used in various applications, including:

1. Dimensionality Reduction: Learning compact representations of data.

2. Feature Learning: Extracting useful features from raw data.

3. Collaborative Filtering: Building recommendation systems.

4. Pre-training Deep Networks: Initializing the weights of deep networks in a layer-wise manner.
Deep Belief Neural Networks
• A Restricted Boltzmann Machine (RBM) is a type of
generative stochastic artificial neural network that can
learn a probability distribution from its inputs.

• Deep belief networks, in particular, can be created by “stacking” RBMs and fine-tuning the resulting deep network via gradient descent and backpropagation.

• DBNs belong to the family of unsupervised learning algorithms and are known for their ability to learn hierarchical representations from data.
Deep Belief Neural Networks
• DBNs differ in operation: unlike autoencoders and RBMs, which work with raw input data, a DBN operates on an input layer with one neuron for each input vector and goes through numerous levels before arriving at the final layer.

• The final outputs are produced using probabilities acquired from earlier layers.
Deep Belief Neural Networks
• The Architecture of DBN

• The top two layers form the associative memory, and the bottom layer consists of the visible units.

• The arrows pointing towards the layer closest to the data point to relationships between all lower layers.
Deep Belief Neural Networks
• Directed acyclic connections in the lower layers translate
associative memory to observable variables.

• The lowest layer of visible units receives the input data as binary or real-valued data.

• Like an RBM, a DBN has no intra-layer connections.

• The hidden units represent features that encapsulate the data’s correlations.

• A matrix of proportional weights W connects two layers.


Deep Belief Neural Networks

• The “Input Layer” represents the initial layer, which has one neuron
for each input vector.

• “Hidden Layer 1” is the first Restricted Boltzmann Machine (RBM) layer, which learns the fundamental structure of the data.
Deep Belief Neural Networks
• “Hidden Layer 2” and subsequent layers are additional RBMs
that learn higher-level features as we move through the
network.

• We can have multiple hidden layers depending on the complexity of the task.

• “Output Layer” is used for supervised learning tasks like classification or regression.

• The arrows indicate the flow of information from one layer to the next, and the connections between neurons in adjacent layers represent the weights that are learned during training.
Deep Belief Neural Networks
• Training the RBMs:

• One of the unique aspects of DBNs is that each RBM is trained independently using a technique called contrastive divergence.

• This method allows us to approximate the gradient of the log-likelihood of the data with respect to the RBM’s parameters.

• After training, the output of one RBM becomes the input for the next, creating a stacked structure of RBMs.
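
The greedy, layer-wise stacking described above can be sketched with scikit-learn's BernoulliRBM, where each trained RBM's hidden output becomes the next RBM's input; the layer sizes, hyperparameters, and random stand-in data are illustrative assumptions.

```python
# Greedy layer-wise stacking of RBMs to form a DBN-style feature hierarchy.
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
X = (rng.random((200, 64)) > 0.5).astype(float)   # stand-in binary training data

layer_sizes = [32, 16]            # hidden units of each stacked RBM
rbms, layer_input = [], X
for n_hidden in layer_sizes:
    rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05,
                       n_iter=10, random_state=0)
    rbm.fit(layer_input)                       # train this layer with contrastive divergence
    layer_input = rbm.transform(layer_input)   # its hidden output feeds the next RBM
    rbms.append(rbm)

# `layer_input` now holds the top-level hidden representation of X
```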
Deep Belief Neural Networks
• Fine-Tuning for Supervised Learning:

• After the DBN has been assembled through the training of its
RBMs, it can be fine-tuned for supervised learning tasks.

• This fine-tuning process entails adjusting the weights of the final layer using supervised learning techniques like backpropagation.

• DBNs have gained popularity for their impressive performance across various applications.

• From image and speech recognition to natural language processing, they have consistently delivered state-of-the-art results.
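
A sketch of this fine-tuning step: the pre-trained RBM weight matrices initialise a feedforward network, a fresh output layer is added, and backpropagation adjusts the output layer's weights on labelled data. This is a PyTorch sketch; rbm_weights, the layer sizes, and the random labelled data are placeholders standing in for the result of the earlier pre-training stage.

```python
import torch
import torch.nn as nn

# Placeholders for weights obtained from greedy layer-wise RBM pre-training (64 -> 32 -> 16)
rbm_weights = [torch.randn(64, 32) * 0.1, torch.randn(32, 16) * 0.1]
rbm_biases = [torch.zeros(32), torch.zeros(16)]

layers = []
for W, b in zip(rbm_weights, rbm_biases):
    lin = nn.Linear(W.shape[0], W.shape[1])
    with torch.no_grad():
        lin.weight.copy_(W.T)          # nn.Linear stores weights as (out, in)
        lin.bias.copy_(b)
    layers += [lin, nn.Sigmoid()]
out_layer = nn.Linear(16, 10)          # new output layer for a 10-class task
model = nn.Sequential(*layers, out_layer)

# Freeze the pre-trained layers so only the final layer is adjusted, as described above
# (pass model.parameters() to the optimiser instead to fine-tune every layer).
for layer in layers:
    for p in layer.parameters():
        p.requires_grad_(False)

X, y = torch.rand(200, 64), torch.randint(0, 10, (200,))   # illustrative labelled data
opt = torch.optim.Adam(out_layer.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(20):
    opt.zero_grad()
    loss = loss_fn(model(X), y)        # supervised loss, minimised by backpropagation
    loss.backward()
    opt.step()
```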
Deep Belief Neural Networks
• One of the main advantages of DBNs is their ability to learn features from the data in an unsupervised manner.
• 1. DBNs can also learn a hierarchical representation of the data, with each layer learning increasingly sophisticated features from the lower layers to the higher layers.
• 2. DBNs have proven resistant to overfitting, owing to model regularisation and to the fact that only a small amount of labelled data is needed during the fine-tuning phase.
• 3. DBNs can handle missing data, which happens frequently in many real-world applications where some data may be corrupted or absent.
