
NARASARAOPETA INSTITUTE OF TECHNOLOGY

Department of Computer Science and Engineering


DEEP LEARNING (III – AI&ML) – II SEM

UNIT V
Interactive Applications of Deep Learning: Machine Vision, Natural Language processing,
Generative Adversarial Networks, Deep Reinforcement Learning. [Text Book 1]
Deep Learning Research: Autoencoders, Deep Generative Models: Boltzmann Machines,
Restricted Boltzmann Machines, Deep Belief Networks. [Text Book 1]

Machine Vision
Machine Vision" refers to the utilization of deep learning techniques, particularly within the
domain of computer vision, to develop interactive systems that can perceive and understand
visual information from the environment and respond to user input or environmental changes in
real-time.
Convolutional Neural Networks (CNNs):
The Two-Dimensional Structure of Visual Imagery:
• Convolutional neural networks (CNNs) are commonly used in image recognition tasks.
CNNs are specifically designed to work with two-dimensional data and are capable of
preserving spatial information through layers such as convolutional and pooling layers.
• In the context of handwritten digit recognition using MNIST, CNNs can learn
hierarchical representations of features directly from the two-dimensional pixel grid,
without the need for flattening the images into one-dimensional arrays.

Computational Complexity:
• The computational complexity of processing images in a dense neural network increases
rapidly with the size of the input image.
• For example, for a 28x28-pixel MNIST image with one color channel, flattening the image and
passing it into a dense layer gives each neuron 785 parameters (784 pixel weights plus a bias).



• For larger images, such as a 200x200-pixel full-color RGB image with three color
channels and 40,000 pixels per channel, each neuron in a dense layer would have
120,001 parameters.
• Convolutional neural networks (CNNs) are commonly employed in image processing
tasks. CNNs leverage shared weights and local connectivity to reduce the number of
parameters and computational complexity significantly
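As a quick check of the figures quoted above, the arithmetic below is a small illustrative sketch (not code from the textbook); it compares the parameter count of one dense neuron against one 3x3 convolutional filter:

# Illustrative arithmetic only: parameters per dense neuron vs. per 3x3 filter.
mnist_dense_params = 28 * 28 * 1 + 1   # 784 pixel weights + 1 bias = 785
rgb_dense_params = 200 * 200 * 3 + 1   # 120,000 pixel weights + 1 bias = 120,001
conv_filter_params = 3 * 3 * 3 + 1     # 27 shared weights + 1 bias = 28, regardless of image size

print(mnist_dense_params, rgb_dense_params, conv_filter_params)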

Convolutional Layers:
• Convolutional layers consist of sets of kernels, which are also known as filters. Each of
these kernels is a small window (called a patch) that scans across the image (in more
technical terms, the filter convolves), from top left to bottom right.
• Kernels are made up of weights, which—as in dense layers—are learned through
backpropagation. Kernels can range in size, but a typical size is 3x3. For the
monochromatic MNIST digits, this 3x3-pixel window would consist of 3 x 3 x 1 = 9
weights, for a total of 10 parameters (like an artificial neuron in a dense
layer, every convolutional filter has a bias term b).

When reading a page of a book written in English, we begin in the top-left corner and read to the
right. Every time we reach the end of a row of text, we progress to the next row. In this way, we
eventually reach the bottom-right corner, thereby reading all of the words on the page.



Analogously, the kernel in a convolutional layer begins on a small window of pixels in the top-
left corner of a given image. From the top row downward, the kernel scans from left to right,
until it eventually reaches the bottom-right corner, thereby scanning all of the pixels in the
image.

At each filter position, the kernel weights are multiplied element-wise with the pixel values in the
window and summed, and we add some bias term b (say, -0.19) to arrive at z:

Multiple Filters:
• Multiple filters (also known as kernels) are used to extract different features from the
input images. Each filter performs convolution operations across the input image,
resulting in feature maps that highlight specific patterns or structures within the image.
• The filters in the layers react to increasingly complex combinations of these simple
features, learning to represent increasingly abstract spatial patterns and eventually
building a hierarchy from simple lines and colors up to complex textures and shapes.
• The number of filters in the layer, like the number of neurons in a dense layer, is a
hyperparameter that we configure ourselves.



A Convolutional Example:
Convolutional layers are a nontrivial departure from the simpler fully connected layers, so, to
help you make sense of the way the pixel values and weights combine to produce feature maps,
we've created a detailed contrived example with accompanying math. To begin, imagine
we're convolving over a single RGB image that's 3x3 pixels in size.



Shown in the middle of the figure are the 3x3 arrays for each of the three channels:
• Red, green, and blue. Note that the image has been padded with zeros on all four sides.
• Below the arrays of pixel values you’ll find the weight matrices for each of the channels.
We chose a kernel size of 3x3, and given that there are three channels in the input image
the weights matrix will be an array with dimensions [3,3,3], shown here individually. The
bias term is 0.2.
Finally, the activation for the last filter position has been calculated, and the activation map is
complete.

Activation map:
• An activation map, also known as a feature map, is a two-dimensional array that
represents the output of applying a set of filters (kernels) to an input image in a
convolutional neural network (CNN).

Convolutional Filter Hyperparameters:


• Convolutional filters in convolutional neural networks (CNNs) have several hyperparameters
that affect their behavior and performance in feature extraction.
1. Kernel size
2. Stride length
3. Padding
Kernel Size (Filter Size):
This hyperparameter determines the spatial extent of the filter. It defines the width and height of
the receptive field over which the filter operates. Common filter sizes include 3x3, 5x5, and 7x7,
but smaller sizes are also used for specific tasks.



Stride length:
The stride determines the step size at which the filter is moved across the input image during the
convolution operation.
Padding:
Padding refers to the addition of extra border pixels around the input image. It is used to control
the spatial dimensions of the output feature maps
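As a minimal sketch of how these hyperparameters are specified (assuming TensorFlow 2.x / Keras; the particular values are illustrative):

from tensorflow.keras.layers import Conv2D

conv = Conv2D(filters=32,          # number of kernels in the layer
              kernel_size=(3, 3),  # spatial extent of each filter
              strides=(1, 1),      # step size as the filter scans the image
              padding='same',      # 'same' pads with zeros; 'valid' adds no padding
              activation='relu')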

Pooling Layers:
▪ Pooling layers help in reducing computational complexity, controlling overfitting, and
increasing translation invariance.
▪ These layers help reduce the spatial dimensions (width and height) of the input volume,
which in turn reduces the computational complexity of the network and helps in
extracting dominant features



LeNet-5 in Keras:
▪ LeNet-5 is a classical convolutional neural network architecture proposed by Yann
LeCun et al. in 1998 for handwritten digit recognition.
▪ It consists of two sets of convolutional and average pooling layers, followed by three
fully connected layers.

Retaining two-dimensional image shape:

CNN model inspired by LeNet-5:
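The original code listing is not reproduced here; the following is a sketch consistent with the description in the bullet points below, assuming TensorFlow 2.x / Keras, 28x28x1 MNIST inputs, and illustrative dense-layer sizes:

# Keep the two-dimensional image shape, e.g.:
#   X_train = X_train.reshape(60000, 28, 28, 1).astype('float32') / 255
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation='relu',
           input_shape=(28, 28, 1)),                   # first convolutional layer
    Conv2D(64, kernel_size=(3, 3), activation='relu'),  # second convolutional layer
    MaxPooling2D(pool_size=(2, 2)),                      # reduce spatial dimensions
    Dropout(0.25),                                       # reduce overfitting risk
    Flatten(),                                           # 3-D activation map -> 1-D array
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax'),                     # 10 MNIST digit classes
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()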



• The integers 32 and 64 correspond to the number of filters we're specifying for the
first and second convolutional layers, respectively. kernel_size is set to 3x3 pixels.
• We’re using relu as our activation function.
• We’re using the default stride length, which is 1 pixel (along both the vertical and the
horizontal axes).
• We’re using the default padding, which is 'valid'.
• MaxPooling2D() is used to reduce computational complexity.
• Dropout() reduces the risk of overfitting to our training data.
• Finally, Flatten() converts the three-dimensional activation map output by Conv2D() to a
one-dimensional array.
A summary of LeNet-5-inspired ConvNet architecture:



Applications of Machine Vision:
Machine vision, the field of enabling machines to visually perceive and understand their
surroundings, finds applications across various industries and domains.

These are examples of various machine vision applications. We have encountered classification
previously in this chapter, but now we cover object detection, semantic segmentation, and
instance segmentation.
Object detection:
Object detection has broad applications, such as detecting pedestrians in the field of view for
autonomous driving, or for identifying anomalies in medical images.
Generally speaking, object detection is divided into two tasks: detection (identifying where the
objects in the image are) and then, subsequently, classification (identifying what the objects are
that have been detected).
Seminal models—ones that have defined progress in this area—include R-CNN, Fast R-CNN,
Faster R-CNN, and YOLO.
R-CNN:
R-CNN (Region-based Convolutional Neural Network) is a seminal object detection framework
that popularized the use of deep learning for object detection tasks.
To emulate this attention to regions of interest, Girshick and his coworkers developed R-CNN to:
1. Perform a selective search for regions of interest (ROIs) within the image.
2. Extract features from these ROIs by using a CNN.
3. Combine two "traditional" machine learning approaches (linear regression and support
vector machines) to, respectively, refine the locations of bounding boxes and classify objects
within each of those boxes.
R-CNNs redefined the state of the art in object detection, achieving a massive gain in
performance over the previous best model in the Pattern Analysis, Statistical Modeling and
Computational Learning (PASCAL) Visual Object Classes (VOC) competition.



Faster R-CNN:
Faster R-CNN builds upon the R-CNN framework by integrating a Region Proposal Network
(RPN) directly into the detection pipeline. This eliminates the need for external region proposal
methods like selective search, resulting in a more unified and efficient architecture.

YOLO:
• YOLO, which stands for "You Only Look Once," is a state-of-the-art real-time object
detection system introduced by Joseph Redmon et al. in 2016.
• Unlike traditional object detection methods that use region proposal algorithms and
multi-stage pipelines, YOLO approaches the task in a different way, aiming for both high
accuracy and fast inference speed.



Natural Language Processing
Natural Language Processing (NLP) is a field of artificial intelligence (AI) and computational
linguistics that focuses on enabling computers to understand, interpret, and generate human
language in a way that is both meaningful and contextually appropriate. NLP encompasses a
broad range of tasks and techniques aimed at processing and analyzing natural language data,
such as text and speech.
Preprocessing Natural Language Data:
Tokenization:
• Tokenization is the process of breaking down a text into smaller units, typically words or
subwords.

NLTK (Natural Language Toolkit) is a popular library in Python for natural language processing
tasks. The "corpus" submodule within NLTK contains a collection of text corpora for various
languages and domains.

The sent_tokenize() method in NLTK:


It is used for tokenizing text into sentences. Tokenization is the process of breaking text into
smaller units, such as words or sentences, to facilitate further processing.
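A minimal sketch of sentence tokenization over the Gutenberg corpus (assumes nltk is installed and that the 'gutenberg' and 'punkt' resources have been downloaded):

import nltk
from nltk.corpus import gutenberg
from nltk import sent_tokenize

nltk.download('gutenberg')   # text corpora, including Jane Austen's Emma
nltk.download('punkt')       # the pretrained sentence tokenizer

gberg_sent_tokens = sent_tokenize(gutenberg.raw())   # split the raw text into sentences
print(gberg_sent_tokens[1])                          # a stand-alone sentence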

The first book in the Project Gutenberg corpus is Emma, and the first element of
gberg_sent_tokens contains the book's title page, chapter markers, and first sentence, all
(erroneously) blended together with newline characters (\n):



A stand-alone sentence is found in the second element, which you can view by executing
gberg_sent_tokens[1]:

nltk’s word_tokenize() method:


from nltk import word_tokenize
word_tokenize(gberg_sent_tokens[1])
The word father, for example, is the 15th word in the second sentence, as you can see by running
this line of code:
word_tokenize(gberg_sent_tokens[1])[14]
Converting All Characters to Lowercase:
Convert all text to lowercase to ensure consistency. This prevents the model from treating words
like "Word" and "word" differently.

Removing Stop Words and Punctuation:


To handle these, let’s use the + operator to concatenate together nltk’s list of English stop words
with the string library’s list of punctuation marks:



Stemming:
Stemming is a text normalization technique in natural language processing (NLP) that involves
reducing words to their root or base form, called the "stem." The main purpose of stemming is to
remove affixes (suffixes, prefixes, infixes) from words so that variations of the same word are
mapped to the same root, thereby reducing the vocabulary size and improving text analysis and
retrieval.
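A minimal stemming sketch using NLTK's Porter stemmer (one common choice; other stemmers behave similarly):

from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()
print([stemmer.stem(w) for w in ['running', 'runs', 'houses', 'easily']])
# -> ['run', 'run', 'hous', 'easili']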

Creating Word Embeddings with word2vec:


• Word vectors, also known as word embeddings, are numerical representations of words: each
word is assigned a meaningful location within a multidimensional space called the vector space,
such that words with similar meanings sit close to one another in terms of geometric distance.
• Before training begins, each word is assigned a random location within the vector space.
• Word2Vec is a popular technique for generating word embeddings: dense vector
representations of words that capture semantic and syntactic relationships between words in a
continuous vector space.
The Essential Theory Behind word2vec:
There are two main architectures for training Word2Vec models:
1. Continuous Bag of Words (CBOW)
2. Skip-gram.



Continuous Bag of Words (CBOW) is a popular natural language processing technique used
to generate word embeddings. Word embeddings are important for many NLP tasks because
they capture semantic and syntactic relationships between words in a language.

Skip-Gram:
• The Skip-Gram model learns distributed representations of words in a continuous vector
space. The main objective of Skip-Gram is to predict context words (words surrounding a
target word) given a target word.
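A minimal training sketch with the gensim library (an assumed library choice; hyperparameter values are illustrative, and parameter names follow gensim 4.x):

from gensim.models import Word2Vec

sentences = [['the', 'dog', 'barked'],
             ['the', 'cat', 'meowed'],
             ['the', 'dog', 'ran', 'home']]     # toy corpus of tokenized sentences

model = Word2Vec(sentences=sentences,
                 vector_size=64,    # dimensionality of the word vectors
                 window=2,          # context window size
                 sg=1,              # 1 = skip-gram, 0 = CBOW
                 min_count=1)       # keep even rare words in this tiny corpus
dog_vector = model.wv['dog']        # the learned embedding for "dog"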



Evaluating Word Vectors:
• Evaluating word vectors is essential to ensure that they capture meaningful semantic and
syntactic relationships between words accurately.
• Several methods can be used to evaluate word vectors, including intrinsic and extrinsic
evaluation techniques.
Intrinsic Evaluation:
• Intrinsic evaluation assesses the quality of word vectors directly, on intermediate tasks such as
word similarity and word analogies, rather than on a downstream application.
1. Word Similarity
2. Word Analogies, e.g., "King" - "Man" + "Woman" ≈ "Queen"
Extrinsic evaluation:
• "Extrinsic" in the context of evaluation generally refers to assessing something in relation
to its external or practical application, rather than its inherent properties.
Example Sentence:
• "The movie was attractive and kept me on the edge of my seat."
1. Downstream Task Selection
2. Integration of Word Embeddings
3. Model Training
4. Evaluation of Task Performance
5. Comparison and Analysis

Plotting word vectors:


• It can provide valuable insights into the relationships between words in a high-dimensional
space.
• One common approach is to use dimensionality reduction techniques and to project word
vectors onto a lower-dimensional space (typically 2D or 3D) for visualization.
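A visualization sketch along those lines (assumes scikit-learn and matplotlib, and reuses the pretrained vectors wv loaded above):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

words = ['king', 'queen', 'man', 'woman', 'paris', 'france', 'london', 'england']
vectors = np.array([wv[word] for word in words])

coords = TSNE(n_components=2, perplexity=3, random_state=42).fit_transform(vectors)

plt.scatter(coords[:, 0], coords[:, 1])
for (x, y), word in zip(coords, words):
    plt.annotate(word, (x, y))    # label each point with its word
plt.show()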



Networks Designed for Sequential Data:
• Networks designed for sequential data, such as recurrent neural networks (RNNs) and
their variants, are particularly well suited for tasks involving sequences such as text.
• These variants include specialized layer types like long short-term memory units (LSTMs) and
gated recurrent units (GRUs); a minimal example follows below.
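A minimal sketch of a sequence model built around an LSTM layer (assumes TensorFlow 2.x / Keras; the vocabulary size and layer widths are illustrative):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

model = Sequential([
    Embedding(input_dim=10000, output_dim=64),  # map word indices to word vectors
    LSTM(64),                                   # process the sequence of embeddings
    Dense(1, activation='sigmoid'),             # e.g. a binary sentiment output
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])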

Recurrent Neural Networks:


Consider the following sentences:
“Jon and Grant are writing a book together. They have really enjoyed writing it.”
The human mind can track the concepts in the second sentence quite easily. You already
know that "they" in the second sentence refers to the authors, and "it" refers to the book
they are writing. Although this task is easy for a human reader, it is not so trivial for a neural
network, which must retain information from earlier in the sequence to resolve such references.



Generative Adversarial Networks (GANs)
• GANs are used for generative modeling, a task that involves learning the underlying
distribution of a dataset in order to generate new samples similar to those in the dataset.
• GANs consist of two neural networks:
a generator and a discriminator, which are trained simultaneously in a competitive
manner.

The key components of a GAN are:


1. Generator:
• The generator is a neural network that takes random noise as input and generates
synthetic data samples. It learns to map input noise vectors from a low-dimensional
space (often called the latent space) to the high-dimensional space of the data
distribution.
• The goal of the generator is to produce data samples that are indistinguishable from real
samples in the training dataset.
2. Discriminator:
• The discriminator is another neural network that acts as a binary classifier. It takes input
data samples (either real or generated) and predicts whether they are real or fake.
• The discriminator is trained to distinguish between real and fake samples, effectively
learning to differentiate between the real data distribution and the distribution of
generated samples.
3. Adversarial Training:
• The generator and discriminator are trained simultaneously in a minimax game. The
generator aims to generate samples that are so realistic that the discriminator cannot



distinguish them from real samples, while the discriminator aims to correctly
classify real and fake samples.
• This adversarial training process leads to both networks improving over time, with
the generator learning to generate more realistic samples and the discriminator
becoming better at distinguishing real from fake.

Discriminator training loop: forward propagation through the generator produces fake images.
These are mixed into batches with real images from the dataset and, together with their labels y,
are used to train the discriminator; only the discriminator's weights are updated in this step.

Generator training loop: forward propagation through the generator produces fake images, and
inference with the discriminator scores these images. The generator is improved through
backpropagation; only the generator's weights are updated in this step. During generator training,
the label y paired with each generated image is always set to 1.
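The two loops can be sketched as follows. This is a schematic sketch only; it assumes that a generator, a compiled discriminator, and a stacked adversarial_model (as sketched in the later sections) already exist:

import numpy as np

def gan_train_step(real_images, batch_size, z_dim=100):
    # Discriminator step: real images labeled 1, generated images labeled 0.
    noise = np.random.normal(0, 1, size=(batch_size, z_dim))
    fake_images = generator.predict(noise)
    x = np.concatenate([real_images, fake_images])
    y = np.concatenate([np.ones(batch_size), np.zeros(batch_size)])
    discriminator.train_on_batch(x, y)

    # Generator step: labels fixed to 1 so that backpropagation through the
    # stacked model pushes the generator toward fooling the discriminator.
    noise = np.random.normal(0, 1, size=(batch_size, z_dim))
    adversarial_model.train_on_batch(noise, np.ones(batch_size))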



The Quick, Draw! Dataset:
• The Quick, Draw! Dataset is a collection of doodles or sketches made by users through
the Quick, Draw! game developed by Google.
• The dataset contains millions of drawings across hundreds of categories, including everyday
objects like "banana" and "car".
Example of sketches drawn by humans who have played the Quick, Draw! game. Baseballs,
baskets, and bees:

Loading the data: assuming you set up your directory structure the same as ours and downloaded
the apple.npy file, you can load these data in with np.load(), as in the sketch below.


Again, if your directory structure is different from ours or you selected a different category of
NumPy images from the Quick, Draw! dataset, then you'll need to amend the input_images path
variable to your particular circumstance.

We divide by 255 to scale our pixels to be in the range of 0 to 1, just as we did for the MNIST
digits.
As an example, you can view a bitmap of the 4,243rd sketch from the apple category by running this code:
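A sketch covering loading, scaling, and viewing the data (the quickdraw_data/ path is an assumption; adjust it to wherever you saved apple.npy):

import numpy as np
import matplotlib.pyplot as plt

input_images = 'quickdraw_data/apple.npy'        # hypothetical path to the download
data = np.load(input_images)                     # shape: (n_sketches, 784)
data = data / 255.                               # scale pixel values to [0, 1]
data = np.reshape(data, (data.shape[0], 28, 28, 1))

plt.imshow(data[4242, :, :, 0], cmap='Greys')    # the 4,243rd apple sketch
plt.show()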

The Discriminator Network:


The discriminator is a fairly straightforward convolutional network, built from Conv2D layers.
Discriminator model architecture:



Schematic representation of our discriminator network for predicting whether an input image is
real (in this case, a hand-drawn apple from the Quick, Draw! dataset) or fake (produced by an
image generator)



Building and compiling the discriminator network:
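A sketch of one such discriminator and its compilation (layer counts, dropout rate, and optimizer settings are illustrative assumptions, not the textbook's exact listing):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Dropout, Flatten, Dense
from tensorflow.keras.optimizers import RMSprop

discriminator = Sequential([
    Conv2D(64, kernel_size=5, strides=2, padding='same',
           activation='relu', input_shape=(28, 28, 1)),
    Dropout(0.4),
    Conv2D(128, kernel_size=5, strides=2, padding='same', activation='relu'),
    Dropout(0.4),
    Flatten(),
    Dense(1, activation='sigmoid'),   # outputs P(real): 1 = real, 0 = fake
])
discriminator.compile(loss='binary_crossentropy',
                      optimizer=RMSprop(learning_rate=0.0008),
                      metrics=['accuracy'])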

The Adversarial Network:


The term "adversarial" refers to the adversarial relationship between the generator and the
discriminator. The generator tries to outsmart the discriminator by generating realistic samples,
while the discriminator tries to become better at distinguishing real from fake samples to catch
the generator's mistakes.
During training, the two networks take turns: in the discriminator-training step only the
discriminator's weights are trainable, and in the generator-training step only the generator's
weights are trainable, with the other network used for inference only. Backpropagation updates
only the trainable network in each step.
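A sketch of the stacked adversarial model (assumes the generator and the compiled discriminator above already exist):

from tensorflow.keras.models import Sequential

discriminator.trainable = False        # freeze the discriminator inside this model;
                                       # only the generator learns through it
adversarial_model = Sequential([generator, discriminator])
adversarial_model.compile(loss='binary_crossentropy', optimizer='adam')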



Deep Reinforcement Learning
RL is a branch of machine learning where an agent learns to interact with an environment by
taking actions to maximize cumulative rewards. The agent observes the state of the environment,
selects actions, receives rewards, and learns from the consequences of its actions through trial
and error.
• The agent takes an action within an environment.
• The environment returns two types of information to the agent:
1. Rewards:
• Rewards indicate the immediate feedback provided by the environment to the agent after
taking an action.
• Rewards quantify how good or bad the agent's action was in the given state.
2. State: This is how the environment changes in response to an agent’s action.

The Cart-Pole Game:


1. The objective is to balance a pole on top of a cart. The pole is connected to the cart at a
purple dot, which functions as a pin that permits the pole to rotate along the horizontal axis.
2. The cart itself can only move horizontally, either to the left or to the right.
3. Reward Function: The agent receives a reward at each time step based on the current state of
the environment.
4. Episode Termination: The episode ends when one of the following conditions is met:
1. The pole angle exceeds a certain threshold (fallen over).
2. The cart moves outside the boundaries of the track.
3. A maximum number of time steps is reached.
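A minimal interaction loop with the Cart-Pole environment using a random policy (assumes the gym package; newer gymnasium releases use a slightly different reset/step API):

import gym

env = gym.make('CartPole-v1')
state = env.reset()
total_reward, done = 0, False
while not done:
    action = env.action_space.sample()              # random action: push left (0) or right (1)
    state, reward, done, info = env.step(action)    # environment returns new state and reward
    total_reward += reward
print('episode reward:', total_reward)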



Markov decision process:
Reinforcement learning problems can be defined mathematically as something called a Markov
decision process. MDPs feature the so-called Markov property—an assumption that the current
timestep contains all of the pertinent information about the state of the environment from
previous timesteps.
1. States (S): A finite set of states represents all possible situations or configurations that the
system can be in. At any given time, the system is in one of these states.
2. Actions (A): A finite set of actions represents all possible decisions or choices that the
decision-maker can take. Each action leads to a transition from one state to another.
3. Transition Probabilities (P): For each state-action pair, there is a probability distribution over
the next possible states. These transition probabilities determine the likelihood of
transitioning to each possible state after taking a particular action in the current state. It
captures the stochastic nature of the system dynamics.
4. Rewards (R): A reward function defines the immediate payoff or reward received by the
decision-maker after taking an action in a particular state. The goal of the decision-maker is
typically to maximize the cumulative reward over time.
5. Policy (π): A policy is a mapping from states to actions, which specifies the decision-making
strategy of the decision-maker. It determines the action to take in each state based on the
current state and possibly some additional information.
The reinforcement learning loop can be considered a Markov decision process, which is defined
by the five components S, A, R, P, and γ (the discount factor).



In a Markov decision process, more-distant reward is discounted relative to reward that is more
immediately attainable. Using the Atari game Pac-Man to illustrate this concept (with a trilobite
standing in for Mr. Pac-Man himself), with γ = 0.9, cherries (or a fish!) only one timestep away
are valued at 90 points, whereas cherries (a fish) 20 timesteps away are valued at about 12.2
points. Like the ghosts in the Pac-Man game, the octopus here is roaming around and hoping to
kill the poor trilobite. This is why immediately attainable rewards are more valuable than distant
ones: there is a higher chance of being killed before reaching the fish that is farther away.
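The discounting arithmetic above can be checked directly:

gamma = 0.9
print(100 * gamma ** 1)    # reward one timestep away: 90.0
print(100 * gamma ** 20)   # reward 20 timesteps away: ~12.16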

The policy function π enables an agent to map any state s (from the set of all possible states S) to
an action a from the set of all possible actions A.



Deep Learning Research
Autoencoders:
• Autoencoders are used to compress data and reduce its dimensionality.
• Learning to compress and effectively represent input data without specific labels is the
essential principle of an autoencoder.
• The basic structure of an autoencoder consists of an encoder and a decoder
• Encoder: The encoder takes an input and transforms it into a latent-space representation. This
latent space typically has a lower dimensionality than the input space.
• Decoder: The decoder takes the latent-space representation produced by the encoder and
attempts to reconstruct the original input data.
• An encoder function h = f (x) and a decoder that produces a reconstruction r = g(h).

1. Undercomplete Autoencoders:
An undercomplete autoencoder is a type of autoencoder neural network architecture where the
dimensionality of the latent space (also known as the bottleneck layer or encoding layer) is lower
than the dimensionality of the input data. Learning an undercomplete representation forces the
autoencoder to capture the most salient features of the training data.
The learning process is described simply as minimizing a loss function

L(x, g(f(x)))

where L is a loss function penalizing g(f(x)) for being dissimilar from x, such as the mean
squared error.
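A minimal undercomplete autoencoder sketch in Keras (assumes TensorFlow 2.x; the 32-dimensional bottleneck for 784-dimensional inputs is illustrative):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

autoencoder = Sequential([
    Dense(32, activation='relu', input_shape=(784,)),   # encoder: h = f(x)
    Dense(784, activation='sigmoid'),                    # decoder: r = g(h)
])
autoencoder.compile(loss='mse', optimizer='adam')        # L(x, g(f(x))) as mean squared error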



2. Regularized Autoencoders:
RAE stands for "Regularized Autoencoder" and refers to a specific type of autoencoder that
incorporates regularization techniques to prevent overfitting and improve generalization.
• L1 or L2 regularization:
This involves adding a penalty term to the loss function, which discourages large weights
in the network. L1 regularization penalizes the absolute value of the weights, while L2
regularization penalizes the square of the weights.
• Dropout:
Dropout is a technique commonly used in deep learning to prevent overfitting. It involves
randomly setting a fraction of input units to zero during training, which helps to prevent co-
adaptation of units in the network.



3. Sparse autoencoders:
Sparse autoencoders are a type of autoencoder neural network architecture that imposes sparsity
constraints on the activations of the hidden layers (also known as the encoding layer) during
training.
• The goal of sparse autoencoders is to learn sparse representations of the input data, where
only a small subset of the neurons in the hidden layers are active at any given time.
• This is typically achieved by adding a sparsity penalty Ω(h) on the activations h of the hidden
(code) layer to the reconstruction loss, so that the training objective becomes
L(x, g(f(x))) + Ω(h).

4. Denoising autoencoders:
The denoising autoencoder (DAE) is an autoencoder that receives a corrupted data point as input
and is trained to predict the original, uncorrupted data point as its output.
Denoising autoencoders are a type of autoencoder neural network architecture designed to learn
robust representations of input data by removing noise from the input during training. They
achieve this by training the autoencoder to reconstruct clean versions of noisy input data,
effectively learning to denoise the input.
1. Noise Injection: During training, noise is intentionally added to the input data to create
corrupted versions of the original data. Common types of noise include Gaussian noise,
dropout noise, or masking noise, where random elements of the input are set to zero.
2. Reconstruction Objective: The denoising autoencoder is trained to minimize the
reconstruction error between the clean input data and the reconstructed output. The objective
is to teach the autoencoder to recover the original, clean data from the noisy input.
3. Regularization: In addition to the reconstruction objective, denoising autoencoders often
incorporate regularization techniques to encourage the learned representations to be robust to



variations in the input data. This can include techniques such as weight regularization (e.g.,
L1 or L2 regularization) or sparsity regularization.

A denoising autoencoder (DAE) instead minimizes L(x, g(f(x̃))), where x̃ is a copy of x that has
been corrupted by some form of noise.
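A sketch of the corruption-and-reconstruction setup (reuses the autoencoder sketched earlier; the Gaussian noise level is an illustrative assumption):

import numpy as np
from tensorflow.keras.datasets import mnist

(x_train, _), _ = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.

noise = np.random.normal(loc=0.0, scale=0.3, size=x_train.shape)
x_train_noisy = np.clip(x_train + noise, 0., 1.)     # corrupted copies, x~

autoencoder.fit(x_train_noisy, x_train,              # input: corrupted x~, target: clean x
                epochs=10, batch_size=128)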



5. Learning Manifolds with Autoencoders:
Learning manifolds with autoencoders involves training autoencoder neural network
architectures to capture the underlying structure or geometry of high-dimensional data, typically
embedded in lower-dimensional manifolds within the input space. By encoding and decoding
data samples, autoencoders aim to learn a compressed representation of the input data that
captures its essential characteristics while reducing dimensionality.

As an example, we can create a one-dimensional manifold in 784-dimensional space: take an
MNIST image with 784 pixels and transform it by translating it vertically. The amount of vertical
translation defines a coordinate along a one-dimensional manifold that traces out a curved path
through image space; plotting a few points along this manifold traces out that curve.
6. Contractive Autoencoders:
• Contractive autoencoders are a type of autoencoder neural network architecture that
incorporates an additional regularization term in the loss function during training.
• The regularization term penalizes the gradients of the encoder's output with respect to the
input data, encouraging the learned representation to be insensitive to small variations in the
input.
• This regularization term effectively constrains the model to learn smooth and robust
representations of the input data.



Deep generative models
Deep generative models are a class of neural network architectures designed to model complex
data distributions and generate new data samples that resemble the training data. These models
leverage deep learning techniques to learn hierarchical representations of the data, allowing them
to capture high-level features and dependencies.
Boltzmann Machine:
A Boltzmann machine is a kind of stochastic recurrent neural network whose units make binary
(on/off) decisions and carry associated biases. Several Boltzmann machines can be combined
to build even more sophisticated systems such as a deep belief network.
• Boltzmann machines have a learning algorithm that allows them to discover interesting
features in datasets composed of binary vectors.
• The learning algorithm is generally slow in networks with many layers of feature
detectors, but it can be made much faster by learning one layer of feature detectors at a time.

The Boltzmann machine is an energy-based model, meaning we define the joint probability
distribution over binary vectors x using an energy function:

P(x) = exp(-E(x)) / Z,   with energy   E(x) = -x^T U x - b^T x

where U is the "weight" matrix of model parameters, b is the vector of bias parameters, and Z is
the partition function that normalizes the distribution.



Boltzmann Machine Learning:
Learning algorithms for Boltzmann machines are usually based on maximum likelihood. All
Boltzmann machines have an intractable partition function. "Boltzmann Machine Learning"
refers to the process of training Boltzmann Machines (BMs), a type of neural network model, to
learn and represent patterns in data. Boltzmann Machines belong to the family of energy-based
models and are used in unsupervised learning tasks.
Restricted Boltzmann Machines
Restricted Boltzmann Machines (RBMs) are a type of generative neural network model that
belong to the family of energy-based models (EBMs). They were introduced by Geoffrey Hinton
and collaborators in the mid-2000s as a simplified version of Boltzmann Machines, aimed at
making learning and inference more tractable.
RBMs are undirected probabilistic graphical models containing a layer of observable variables
and a single layer of latent variables. RBMs may be stacked (one on top of the other) to form
deeper models; the descriptions below, labeled (a)-(c), give some examples. The RBM itself is
a bipartite graph, with no connections permitted between any variables in the observed layer or
between any units in the latent layer.

Examples of models that may be built with restricted Boltzmann machines.


(a)The restricted Boltzmann machine itself is an undirected graphical model based on a bipartite
graph, with visible units in one part of the graph and hidden units in the other part. There are no
connections among the visible units, nor any connections among the hidden units. Typically
every visible unit is connected to every hidden unit but it is possible to construct sparsely
connected RBMs such as convolutional RBMs.

(b) A deep belief network is a hybrid graphical model involving both directed and undirected
connections. Like an RBM, it has no intralayer connections. However, a DBN has multiple



hidden layers, and thus there are connections between hidden units that are in separate layers.
All of the local conditional probability distributions needed by the deep belief network are
copied directly from the local conditional probability distributions of its constituent RBMs.
Alternatively, we could also represent the deep belief network with a completely undirected
graph, but it would need intralayer connections to capture the dependencies between parents.

(c)A deep Boltzmann machine is an undirected graphical model with several layers of latent
variables. Like RBMs and DBNs, DBMs lack intralayer connections. DBMs are less closely tied
to RBMs than DBNs are. When initializing a DBM from a stack of RBMs, it is necessary to
modify the RBM parameters slightly. Some kinds of DBMs may be trained without first training
a set of RBMs.
Training Restricted Boltzmann Machines:
Training Restricted Boltzmann Machines (RBMs) involves adjusting the weights and biases of
the model to better capture the underlying distribution of the training data. RBMs are typically
trained using contrastive divergence (CD), a variant of stochastic gradient descent (SGD) that
approximates the gradient of the log-likelihood of the data.
The contrastive divergence algorithm is a computationally efficient approximation of the
maximum likelihood learning objective for RBMs.
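A schematic CD-1 update in NumPy for a small binary RBM (a sketch of the idea, not an optimized implementation; the variable names are ours):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, b_vis, b_hid, lr=0.01):
    # Positive phase: hidden probabilities and samples given the data batch v0.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (np.random.rand(*p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step back to a "reconstruction" of the visibles.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)
    # Approximate gradient: data statistics minus reconstruction statistics.
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / v0.shape[0]
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid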



Deep Belief Networks (DBNs)
Deep Belief Networks (DBNs) are a type of generative neural network model composed of
multiple layers of stochastic latent variables, where each pair of adjacent layers has the bipartite
structure of an RBM. DBNs combine the capabilities of probabilistic graphical models, such as
Restricted Boltzmann Machines (RBMs), with the representational power of deep neural networks.
• A deep belief network (DBN) is built by stacking RBM layers. In the stack, every RBM layer
can communicate with both the previous and subsequent layers; hence it is a network
assembled out of many single-layer networks.
• Except for the first and final layers, every layer in a DBN plays a dual role, serving as the
hidden layer for the nodes that come before it and as the input layer for the nodes that come
after. Some applications of deep belief networks are recognizing, clustering, and generating
images, video sequences, and motion-capture data.
A DBN models the joint distribution between the observed vector x and the l hidden layers h^k
as follows:

P(x, h^1, ..., h^l) = ( ∏_{k=0}^{l-2} P(h^k | h^(k+1)) ) P(h^(l-1), h^l)

where x = h^0, P(h^k | h^(k+1)) is the conditional distribution for the visible units conditioned on
the hidden units of the RBM at level k, and P(h^(l-1), h^l) is the visible-hidden joint distribution
in the top-level RBM.
The architecture of a Deep-Belief Network with two RBMs is as shown in Fig.



The graphical model for a deep Boltzmann machine (DBM) has one visible layer (at the bottom)
and two or more hidden layers. Connections exist only between units in neighboring layers;
there are no intralayer connections.

Like RBMs and DBNs, DBMs typically contain only binary units (as we assume here for
simplicity of our presentation of the model), but it is straightforward to include real-valued
visible units.
A DBM is an energy-based model, meaning that the joint probability distribution over the
model variables is parametrized by an energy function E. In the case of a deep Boltzmann
machine with one visible layer, v, and three hidden layers, h^(1), h^(2), and h^(3), the joint
probability is given by:

P(v, h^(1), h^(2), h^(3)) = (1 / Z(θ)) exp( -E(v, h^(1), h^(2), h^(3); θ) )



To simplify our presentation, we omit the bias parameters below. The DBM energy function is
then defined as follows:

E(v, h^(1), h^(2), h^(3); θ) = -v^T W^(1) h^(1) - (h^(1))^T W^(2) h^(2) - (h^(2))^T W^(3) h^(3)

A deep Boltzmann machine, re-arranged to reveal its bipartite graph structure:

DBN training has two phases:
1. Pre-train Phase
2. Fine-tune Phase
The pre-train phase consists of greedily training multiple layers of RBMs, one layer at a time,
while the fine-tune phase treats the resulting stack as a feed-forward neural network whose
weights are adjusted with supervised training.



Applications:
DBNs have been applied to a wide range of tasks in machine learning, including image
recognition, speech recognition, natural language processing, and recommendation systems.
They have been particularly successful in domains where large amounts of unlabeled data are
available and where learning hierarchical representations of the data is crucial for achieving high
performance.

