
DL Ia2

The document provides an overview of various neural network architectures including Autoencoders, Deep Belief Networks (DBNs), Generative Adversarial Networks (GANs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs). Each section describes the structure, function, and applications of these networks, highlighting their capabilities in tasks such as dimensionality reduction, feature extraction, image generation, and sequence modeling. Additionally, it discusses the specific layers and components within CNNs and RNNs, emphasizing their roles in processing and analyzing data.

Uploaded by saurabh
Copyright © All Rights Reserved

UNIT 3

1. ALL ABOUT AUTOENCODERS?


->
- Autoencoders are used for dimensionality reduction and data-specific compression.
- The output is a reconstruction of the input data, produced from its most efficient (compressed) encoding.
- The encoder part of the network is used for encoding and sometimes even for data
compression purposes.
- Encoding is achieved by the encoder part of the network, which has a decreasing
number of hidden units in each layer.
- Thus this part is forced to pick up only the most significant and representative features
of the data.
- The second half of the network performs the decoding function.
- This part has an increasing number of hidden units in each layer and thus tries to
reconstruct the original input from the encoded data.
- Because the target output is the input itself, no labels are needed; thus autoencoders are an unsupervised learning technique.
- Autoencoders rely on backpropagation to update their weights.
- Common variants of autoencoders:
• Compression Autoencoders:
- The network input must pass through a bottleneck region of the network before
being expanded back into the output representation.
• Denoising Autoencoders:
- The autoencoder is given a corrupted version (e.g., some features are removed
randomly) of the input and the network is forced to learn the uncorrupted output.
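Below is a minimal sketch of the encoder, bottleneck, and decoder idea. It assumes a single linear hidden layer trained with plain stochastic gradient descent; real autoencoders usually stack several nonlinear layers. The function names, toy data, and learning rate are hypothetical. Setting `drop_prob` above zero turns it into a denoising autoencoder. In that mode, input features are randomly zeroed, but the reconstruction error is still measured against the clean input.

```python
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def train_autoencoder(data, n_hidden, lr=0.02, epochs=300, drop_prob=0.0, seed=0):
    """Train a linear encoder/decoder pair with SGD; returns weights and per-epoch MSE.

    drop_prob > 0 gives a denoising autoencoder: each input feature is zeroed
    with that probability, but the error is computed against the clean input.
    """
    rng = random.Random(seed)
    n_in = len(data[0])
    W_enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    W_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            x_in = [0.0 if rng.random() < drop_prob else xi for xi in x]
            h = matvec(W_enc, x_in)        # encoder: n_in -> n_hidden (bottleneck)
            x_hat = matvec(W_dec, h)       # decoder: n_hidden -> n_in
            err = [xh - xi for xh, xi in zip(x_hat, x)]  # compare with the clean input
            total += sum(e * e for e in err) / n_in
            # Backpropagate 0.5 * ||x_hat - x||^2; the hidden-layer gradient is
            # computed first, using the decoder weights before they change.
            grad_h = [sum(W_dec[i][j] * err[i] for i in range(n_in)) for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    W_dec[i][j] -= lr * err[i] * h[j]
            for j in range(n_hidden):
                for k in range(n_in):
                    W_enc[j][k] -= lr * grad_h[j] * x_in[k]
        history.append(total / len(data))
    return W_enc, W_dec, history


# Hypothetical 4-D data that really lives on a 2-D subspace: [a, b, a + b, a - b]
rng = random.Random(1)
data = []
for _ in range(20):
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data.append([a, b, a + b, a - b])

_, _, hist = train_autoencoder(data, n_hidden=2)
print(f"MSE epoch 1: {hist[0]:.4f}, epoch {len(hist)}: {hist[-1]:.4f}")
```

A 2-unit bottleneck suffices here only because the toy data has two underlying degrees of freedom. With real data, the bottleneck width trades compression against reconstruction quality.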

2. ALL ABOUT DBN?


->
• DBNs are composed of layers of Restricted Boltzmann Machines (RBMs) for the pretrain
phase and then a feed-forward network for the fine-tuning phase.
• Feature Extraction using RBM:
- The fundamental purpose of RBMs in the context of deep learning and DBNs is to learn
higher-level features of a dataset in an unsupervised training fashion.
- Each successive RBM hidden layer in the pretrain phase learns progressively more complex
features from the distribution of the data.
- These higher-order features are progressively combined in nonlinear ways to do elegant
automated feature engineering.
- A Boltzmann machine has a visible (input) layer and one or several hidden layers.
- A Boltzmann machine is a generative, unsupervised model.
- Training involves learning a probability distribution from an original dataset and using it to
make inferences about never-before-seen data.
- Boltzmann machines are primarily divided into two categories:
• 1. Energy-based Models (EBMs)
• 2. Restricted Boltzmann Machines (RBMs):
• Visible nodes are not connected to each other, and hidden nodes are not connected to
each other.
• Other than that, RBMs are exactly the same as Boltzmann machines.
• They do not have recurrent (intra-layer) connections, making the training process more stable.
• They use a more efficient training algorithm called Contrastive Divergence, which
approximates the gradient of the log-likelihood and speeds up the learning
process.
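Below is a minimal sketch of one Contrastive Divergence update (CD-1, a single Gibbs step) for a binary RBM in plain Python. The class, learning rate, and two toy patterns are hypothetical. CD-1 is the simplest form of the algorithm. In a DBN, the hidden-unit probabilities of a trained RBM would become the input of the next RBM in the stack.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class RBM:
    """Binary RBM: visible units connect only to hidden units
    (no visible-visible or hidden-hidden weights)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        """One CD-1 step on a single example; returns its reconstruction error."""
        ph0 = self.hidden_probs(v0)      # positive phase (driven by the data)
        h0 = self.sample(ph0)
        pv1 = self.visible_probs(h0)     # one Gibbs step back to the visible layer
        v1 = self.sample(pv1)
        ph1 = self.hidden_probs(v1)      # negative phase (driven by the reconstruction)
        for i in range(len(v0)):
            for j in range(len(ph0)):
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(ph0)):
            self.c[j] += lr * (ph0[j] - ph1[j])
        return sum((a - p) ** 2 for a, p in zip(v0, pv1))


patterns = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
rbm = RBM(n_visible=6, n_hidden=2)
errors = []
for epoch in range(300):
    errors.append(sum(rbm.cd1_update(v) for v in patterns) / len(patterns))

# The hidden probabilities are the learned features (the input to a next RBM in a DBN).
features = [rbm.hidden_probs(v) for v in patterns]
print(errors[0], errors[-1], features)
```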

3. ALL ABOUT GAN?


->
- A Generative Adversarial Network (GAN) is a machine learning model in which two
neural networks compete with each other to become more accurate in their predictions.
- GANs typically run unsupervised and learn through an adversarial zero-sum game:
one network's gain is the other network's loss.
- Essentially, GANs create their own training data.
- The two neural networks that make up a GAN are:
- generator:
- When modeling images, the generator is typically a deconvolutional neural network.
- The goal of the generator is to artificially manufacture outputs that could
easily be mistaken for real data.
- discriminator:
- It is typically a convolutional neural network.
- The goal of the discriminator is to identify which of the outputs it receives have been
artificially created.
- As the feedback loop between the adversarial networks continues, the generator will
begin to produce higher-quality output and the discriminator will become better at
flagging data that has been artificially created.
- When training GANs, we want to update the parameters such that the network will
generate more believable output images based on the training data.
- The goal here is to make images realistic enough that the discriminator network is
fooled to the point that it cannot distinguish between the real and the
synthetic input data.
- The Discriminator Network:
- When modeling images, the discriminator network is typically a standard CNN.
- Using a secondary neural network as the discriminator network allows the GAN to
train both networks in parallel in an unsupervised fashion.
- These discriminator networks take images as input and then output a
classification (real or generated).
- The Generator Network:
- The generator network in GANs generates data (or images) with a special kind of
layer called a deconvolutional layer.
- Deconvolutional Networks:
• The deconvolutional layers in a deconvolutional network map features to pixels when
modelling images, which is the opposite of what a normal convolutional layer does.
• This aspect of deconvolutional networks is what enables us to generate images as
output from neural networks.
• Deconvolutional networks are unsupervised and trained in a layer-wise fashion,
similar to a DBN.
- Conditional GANs:
• The conditional generative adversarial network, or cGAN for short, is a type of GAN
that involves the conditional generation of images by a generator model.
• Image generation can be conditional on a class label, if available, allowing the
targeted generation of images of a given type.
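The competition between the two networks is usually written as a minimax objective. This comes from the original GAN formulation, not from these notes: min over G, max over D, of E_x[log D(x)] + E_z[log(1 - D(G(z)))]. Below is a minimal sketch of the two resulting loss terms, computed from discriminator output probabilities. The generator uses the common "non-saturating" variant, which maximizes log D(G(z)) instead of minimizing log(1 - D(G(z))). The function names and example probabilities are hypothetical, and the networks and gradient updates themselves are omitted.

```python
import math


def bce(p, target, eps=1e-12):
    """Binary cross-entropy for one predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))


def discriminator_loss(d_real, d_fake):
    """D is pushed toward 1 on real samples and toward 0 on generated ones.
    d_real / d_fake: D(x) and D(G(z)) for a mini-batch."""
    real = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real + fake


def generator_loss(d_fake):
    """Non-saturating generator loss: low when D labels generated samples as real."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)


# A discriminator that separates real from fake well...
print(discriminator_loss([0.95, 0.9], [0.05, 0.1]), generator_loss([0.05, 0.1]))
# ...versus one that has been fooled and outputs 0.5 for everything.
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]), generator_loss([0.5, 0.5]))
```

When the discriminator is fully fooled (0.5 everywhere), its loss is 2 ln 2. This matches the "cannot distinguish real from synthetic" end state described above.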

4. ALL ABOUT CNN?


->
- A convolutional neural network (CNN or ConvNet) is a network architecture for deep
learning that learns directly from data.
- CNNs are particularly useful for finding patterns in images to recognize objects, classes,
and categories.
- They can also be quite effective for classifying audio, time-series, and signal data.
- The goal of a CNN is to learn higher-order features in the data via convolutions.
- They are well suited to object recognition with images and consistently top image
classification competitions.
- They can identify faces, individuals, street signs, platypuses, and many other aspects of
visual data.
- They’re also good at analyzing sound.
- CNNs are powering major advances in machine vision, which has obvious applications
for self-driving cars, robotics, drones, and treatments for the visually impaired.
- With CNNs, we can arrange the neurons in a three-dimensional structure:
A. Width
B. Height
C. Depth
- These attributes of the input match up to an image structure for which we have:
a) Image width in pixels
b) Image height in pixels
c) RGB channels as the depth

- BIOLOGICAL INSPIRATION:
- The biological inspiration for CNNs is the visual cortex in animals.
- The visual cortex is the human brain's vision-processing center.
- The visual cortex is the primary cortical region of the brain that receives,
integrates, and processes visual information relayed from the retinas.
- The cells in the visual cortex are sensitive to small subregions of the input.
- These smaller subregions are tiled together to cover the entire visual field.
- The cells are well suited to exploit the strong spatially local correlation found in
the types of images our brains process and act as local filters over the input space.
- There are two classes of cells in this region of the brain.
- The simple cells activate when they detect edge-like patterns, while the more
complex cells have a larger receptive field and are invariant to
the position of the pattern.

- CNN ARCHITECTURE:

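• A CNN is built from an input layer, feature-extraction layers (convolutional and pooling
layers), and classification layers (fully connected layers); each is described below.
• The output of the final fully connected layers is typically a two-dimensional array of
dimensions [b × N], where b is the number of examples in the mini-batch and N is the
number of classes we’re interested in scoring.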

5. ALL ABOUT CNN LAYERS?


->
• CNN INPUT LAYER:
• Input layers are where we load and store the raw input data of the image for
processing in the network.
• This input data specifies the width, height, and number of channels.
• Typically, the number of channels is three, for the RGB values for each pixel.

• CNN CONVOLUTIONAL LAYER:


• Transform the input data by using a patch of locally connecting neurons from the
previous layer.
• Computes a dot product between the region of the neurons in the input layer and the
weights to which they are locally connected in the output layer.
• A convolution is defined as a mathematical operation describing a rule for how to
merge two sets of information.
• It is important in both physics and mathematics and defines a bridge between the
space/time domain and the frequency domain through the use of Fourier transforms.
• It takes input, applies a convolution kernel, and gives us a feature map as output.
• Convolutional layers have parameters for the layer and additional hyperparameters.
• Gradient descent is used to train the parameters in this layer.
• The following are the major components of convolutional layers:
• Filters:
- A filter is a function whose width and height are smaller than the width and
height of the input volume.
- Filters (e.g., convolutions) are applied across the width and height of the input
volume in a sliding-window manner.
- A filter also extends through every depth slice of the input volume.
• Activation maps:
- An activation is the numerical output of a neuron, reflecting how much information it lets pass
through.
- The output of applying a filter to the input volume is known as that filter's
activation map.
• Parameter sharing:
- CNNs use a parameter-sharing scheme to control the total parameter count.
- This helps training time because we’ll use fewer resources to learn the training
dataset.
- To implement parameter sharing in CNNs, we first denote a single two-dimensional
slice of depth (a depth slice), then constrain the neurons in each depth slice to
use the same weights and bias.
- This gives us significantly fewer parameters (or weights) for a given
convolutional layer.
• Layer-specific hyperparameters:
- The following hyperparameters dictate the spatial arrangement and
size of the output volume from a convolutional layer:
• Filter (or kernel) size (field size):
- Filter size refers to the dimensions of the filter/kernel in the ConvNet.
• Output depth:
- The depth hyperparameter controls the number of neurons in the
convolutional layer that are connected to the same region of the input
volume (equivalently, the number of filters).
• Stride:
- Stride configures how far our sliding filter window will move per
application of the filter function.
- Lower settings for stride will allocate more depth columns in the output
volume.
• Zero-padding:
- The last hyperparameter is zero-padding, with which we can control the
spatial size of the output volumes.
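Below is a minimal single-channel sketch of the convolutional layer's sliding dot product, including stride and zero-padding. The output-size formula (N - F + 2P) / S + 1 and the parameter-count helper are standard results. They are added here as assumptions, not taken from the notes, and the function names are hypothetical. Like most deep learning libraries, this computes cross-correlation, so the kernel is not flipped. For a multi-channel input, a filter would span every depth slice and sum the per-slice results.

```python
def pad(image, p):
    """Surround a 2-D list with p rows/columns of zeros (zero-padding)."""
    width = len(image[0]) + 2 * p
    top = [[0.0] * width for _ in range(p)]
    bottom = [[0.0] * width for _ in range(p)]
    middle = [[0.0] * p + list(row) + [0.0] * p for row in image]
    return top + middle + bottom


def output_size(n, f, stride, padding):
    """Output length along one spatial dimension: (N - F + 2P) / S + 1."""
    return (n - f + 2 * padding) // stride + 1


def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over `image`; each output entry is a dot product between
    the kernel and one local patch (its receptive field)."""
    x = pad(image, padding) if padding else image
    kh, kw = len(kernel), len(kernel[0])
    out_h = output_size(len(image), kh, stride, padding)
    out_w = output_size(len(image[0]), kw, stride, padding)
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            top, left = r * stride, c * stride
            row.append(sum(kernel[i][j] * x[top + i][left + j]
                           for i in range(kh) for j in range(kw)))
        feature_map.append(row)
    return feature_map


def conv_param_count(f, in_depth, out_depth):
    """With parameter sharing, each of the out_depth filters has f*f*in_depth
    weights plus one bias, reused at every spatial position."""
    return (f * f * in_depth + 1) * out_depth


# Hypothetical 5x5 image containing a vertical edge, and a vertical-edge kernel.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 0, 1] for _ in range(3)]
print(conv2d(image, kernel))                        # 3x3 activation map
print(conv2d(image, kernel, stride=2, padding=1))   # also 3x3: (5 - 3 + 2)//2 + 1
print(conv_param_count(3, 3, 16))                   # 448 shared parameters
print(conv_param_count(3, 3, 16) * 32 * 32)         # 458752 if a 32x32 output did not share
```

The last two lines make parameter sharing concrete for sixteen 3 × 3 filters on an RGB input. The shared layer has 448 parameters, versus 458,752 if every position of a 32 × 32 output had its own weights.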

• CNN POOLING LAYER:


• Pooling layers are commonly inserted between successive convolutional layers.
• Pooling layers progressively reduce the size of the data representation over the network and
help control overfitting.
• The pooling layer operates independently on every depth slice of the input.
• The pooling layer typically uses the max() operation to resize the input data spatially.
• This operation is referred to as max pooling.
• Pooling layers use filters to perform the downsampling process on the input volume.
• These layers perform downsampling operations along the spatial dimensions of the
input data.
• This means that if the input image were 32 pixels wide by 32 pixels tall, the output
image would be smaller in width and height.
• The most common setup for a pooling layer is to apply 2 × 2 filters with a stride of 2.
• This will downsample each depth slice in the input volume by a factor of two on both
spatial dimensions (e.g., 32 × 32 becomes 16 × 16).
• This downsampling operation will result in 75 percent of the activations being
discarded.
• Pooling layers do not have parameters for the layer but do have additional
hyperparameters.
• This layer does not involve parameters, because it computes a fixed function of the
input volume.
• It is not common to use zero-padding for pooling layers.
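Below is a minimal sketch of max pooling on one depth slice with the common 2 × 2 window and stride 2. The feature-map values are hypothetical. It shows both the 75 percent reduction and the 32 × 32 to 16 × 16 example from the list above.

```python
def max_pool(fmap, size=2, stride=2):
    """Max pooling on one depth slice: keep the largest value in each window.
    It has no learnable parameters; size and stride are hyperparameters."""
    out_h = (len(fmap) - size) // stride + 1
    out_w = (len(fmap[0]) - size) // stride + 1
    return [[max(fmap[r * stride + i][c * stride + j]
                 for i in range(size) for j in range(size))
             for c in range(out_w)]
            for r in range(out_h)]


fmap = [[1, 3, 2, 1],
        [4, 6, 5, 0],
        [7, 2, 9, 8],
        [1, 0, 3, 4]]
pooled = max_pool(fmap)
kept = len(pooled) * len(pooled[0])
total = len(fmap) * len(fmap[0])
print(pooled)                                   # [[6, 5], [7, 9]]
print(f"discarded: {1 - kept / total:.0%}")     # discarded: 75%

big = max_pool([[0] * 32 for _ in range(32)])
print(len(big), len(big[0]))                    # 16 16
```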

• CNN – FULLY CONNECTED LAYER:


• Fully connected layers are dense networks of neurons.
• They are applied after the convolutional and max pooling layers.
• They classify the output.
• They associate features with a particular label.
• We use this layer to compute class scores that we’ll use as output of the network.
• Fully connected layers have the normal parameters for the layer and
hyperparameters.
• Fully connected layers perform transformations on the input data volume that are a
function of the activations in the input volume and the parameters.
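Below is a minimal sketch of a fully connected layer that turns a mini-batch of flattened features into the [b × N] class-score array. A softmax then converts each row into class probabilities. The softmax step is a common choice added here, not something the notes specify. The weights, biases, and feature values are hypothetical.

```python
import math


def dense(batch, W, b):
    """Fully connected layer: every input feature connects to every output unit.
    batch: b x D inputs, W: N x D weights, b: N biases -> b x N class scores."""
    return [[sum(w * x for w, x in zip(row, xs)) + bias for row, bias in zip(W, b)]
            for xs in batch]


def softmax(scores):
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical: 2 examples, each a flattened 4-value feature vector from the
# last pooling layer, scored against N = 3 classes.
features = [[1.0, 0.0, 2.0, 1.0],
            [0.0, 1.0, 0.0, 3.0]]
W = [[0.5, -0.2, 0.1, 0.0],
     [-0.3, 0.8, 0.0, 0.4],
     [0.2, 0.1, 0.6, -0.5]]
b = [0.0, 0.1, -0.1]

scores = dense(features, W, b)            # shape [b x N] = [2 x 3]
probs = [softmax(row) for row in scores]
labels = [row.index(max(row)) for row in probs]
print(labels)                             # [2, 1]
```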
UNIT 4
6. ALL ABOUT RNN?
->
- An RNN works on the principle of saving the output of a particular layer and feeding it
back to the input in order to predict the output of the layer.
- RNNs are related to the family of feed-forward neural networks (they extend it with recurrent connections).
- They differ from ordinary feed-forward networks in their ability to send information
across time-steps.
- They allow for both parallel and sequential computation.
- They can compute anything a traditional computer can compute.
- They are a large feedback network of connected neurons.
- An RNN takes each vector from a sequence of input vectors and models them one at a time.
- This allows the network to retain its state while modeling each input vector across the
window of input vectors.
- Modeling the time dimension is a hallmark of Recurrent Neural Networks.
- APPLICATIONS:
- Natural Language Processing (NLP):
- Used for language modeling, auto-completion, text generation, machine
translation, and speech recognition.
- Time Series Prediction:
- Applied in stock price prediction, weather forecasting, sales prediction, and
credit scoring.
- Image and Video Analysis:
- Employed for image captioning, video analysis, action recognition, and
tracking.
- Healthcare:
- Analyzes time series medical data for disease prediction and drug discovery.
- Robotics and Control Systems:
- Utilized in robot control, dynamic system modeling, and coordination tasks.
- TYPES:
- One-to-one:
- Structure of the Vanilla Neural Network.
- It is used to solve general machine learning problems that have only one input
and output.
- Example: classification of images.
- One-to-many:
- sequence output.
- For example, image captioning takes an image and outputs a sequence of
words.
- Many-to-one:
- sequence input.
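- For example, sentiment analysis, where a sentence (a sequence of words) is the input and a
single sentiment label is the output.
- Many-to-many:
- sequence input and sequence output.
- For example, video classification: label each frame.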

7. MODELLING THE TIME DIMENSION?


->
- Many classification tools have been applied successfully without modeling the time
dimension, assuming independence between inputs.
- Other variations of these tools capture the time dynamic by modeling a sliding window
of the input.
- Sliding-window techniques have a limited window width and will fail to capture any
effects larger than the fixed window size.
- An RNN includes a feedback loop that it uses to learn from sequences, including
sequences of varying lengths.
- It contains an extra parameter matrix for the connections between time steps, which is
used to capture the temporal relationships in the data.
- RNNs are trained to generate sequences, in which the output at each time-step is based
on both the current input and the input at all previous time steps.
- A standard RNN computes its gradients with an algorithm called backpropagation through
time (BPTT).
- Understanding model input and output:
- Traditional machine learning operates on the concept of a single fixed-size input
vector.
- In traditional modeling activities, we typically would see an input-to-output
relationship of fixed input size to fixed output size.
- This is commonly the pattern for modeling in building classifiers for image
classification or classifying columnar data.
- Recurrent Neural Networks change this input dynamic to include multiple input
vectors, one for each time step, and each vector can have many columns.
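Below is a minimal forward pass of a vanilla RNN. It assumes the common update h_t = tanh(W_xh x_t + W_hh h_(t-1) + b). W_hh is the extra parameter matrix for the connections between time steps. It is reused at every step, so the same function handles sequences of any length. The weights and sequences are hypothetical. The two sequences differ only at the first time step, yet their final hidden states still differ. A fixed window shorter than the sequence would miss this. BPTT, not shown, would backpropagate through this same loop.

```python
import math


def rnn_forward(inputs, W_xh, W_hh, b_h):
    """inputs: list of time-step vectors x_1..x_T (any length T).
    W_xh: H x D input weights; W_hh: H x H recurrent weights shared across
    time steps. Returns the hidden state after every step."""
    H = len(b_h)
    h = [0.0] * H                        # initial hidden state h_0
    states = []
    for x in inputs:
        # The comprehension reads the previous h; h is replaced only afterwards.
        h = [math.tanh(sum(W_xh[i][k] * x[k] for k in range(len(x)))
                       + sum(W_hh[i][j] * h[j] for j in range(H))
                       + b_h[i])
             for i in range(H)]
        states.append(h)
    return states


W_xh = [[0.5], [-0.3]]                   # H = 2 hidden units, D = 1 input column
W_hh = [[0.4, 0.1], [0.2, 0.3]]          # recurrent weights between time steps
b_h = [0.0, 0.0]

seq_a = [[1.0]] + [[0.0]] * 5            # differs from seq_b only at t = 1
seq_b = [[-1.0]] + [[0.0]] * 5
states_a = rnn_forward(seq_a, W_xh, W_hh, b_h)
states_b = rnn_forward(seq_b, W_xh, W_hh, b_h)
print(states_a[-1], states_b[-1])
```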

8. 3D VOLUMETRIC INPUT?
->
1. Mini-batch Size:
- Mini-batch size is the number of input records (collections of time-series points for a
single source entity) we want to model per batch.
2. Number of columns in our vector per time-step:
- The number of columns matches up to the traditional feature column count found in a
normal input vector.
3. Number of time-steps:
- The number of time-steps is how we represent the change in the input vector over time.
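Below is a small sketch of arranging per-entity time series into this 3-D volume, indexed [mini-batch][column][time-step] in the order listed above. The sensor data and helper names are hypothetical. Libraries differ in axis order. Keras recurrent layers, for example, expect (batch, time-steps, features), so check the convention of the framework in use.

```python
def make_rnn_batch(records):
    """records: one time series per entity; each series is a list of time steps,
    each time step a list of feature columns.
    Returns a nested list indexed [example][column][time_step]."""
    return [[[step[col] for step in series] for col in range(len(series[0]))]
            for series in records]


def shape(batch):
    return (len(batch), len(batch[0]), len(batch[0][0]))


# Hypothetical sensor data: 2 entities (mini-batch size 2), 3 columns
# (temperature, pressure, humidity), 4 time steps each.
records = [
    [[20.1, 1.01, 0.40], [20.3, 1.02, 0.41], [20.2, 1.00, 0.43], [20.6, 1.01, 0.45]],
    [[18.0, 0.99, 0.55], [18.2, 0.98, 0.56], [18.1, 0.99, 0.58], [18.4, 1.00, 0.60]],
]
batch = make_rnn_batch(records)
print(shape(batch))      # (2, 3, 4) -> [mini-batch size, columns, time-steps]
print(batch[0][0])       # temperature of entity 0 over time: [20.1, 20.3, 20.2, 20.6]
```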
9. GENERAL RECURRENT NEURAL NETWORK ARCHITECTURE?
->
- RNNs are a superset of feed-forward neural networks that adds the concept of recurrent
connections.
- These connections span adjacent time-steps, giving the model the concept of time.
- The conventional (non-recurrent) connections in recurrent neural networks do not contain cycles.
- The output is computed from the hidden state at the given time-step.
- The input vector at the previous time step can influence the current output at
the current time-step through the recurrent connections.
- We can chain layers of these specialized recurrent neurons together to build better
models.
- We connect the output of the previous layer to the input of the next layer.
- ISSUES IN RNN:
• Vanishing Gradient Problem:
• RNNs suffer from the problem of vanishing gradients. The gradients carry
information used in the RNN, and when the gradient becomes too small, the
parameter updates become insignificant.
• This makes learning long data sequences difficult.
• Exploding Gradient Problem:
• While training a neural network, if the slope tends to grow exponentially instead
of decaying, this is called an exploding gradient. This problem arises when large
error gradients accumulate, resulting in very large updates to the neural
network model weights during the training process.
• Long training time, poor performance, and bad accuracy are the major consequences of these
gradient problems.
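Below is a toy illustration of both problems. It assumes a one-unit RNN with a single recurrent weight w and no input (x = 0, h_0 = 0, so tanh stays in its linear region). The gradient reaching the first time step is a product of one factor per step. It therefore shrinks geometrically when |w| < 1 and grows geometrically when |w| > 1. Real RNNs multiply Jacobian matrices rather than scalars, but the same repeated-product effect applies.

```python
import math


def gradient_through_time(w, steps, h0=0.0, x=0.0):
    """d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_(t-1) + x).

    By the chain rule this is a product of `steps` local factors
    w * (1 - h_t**2), one per time step.
    """
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)   # local derivative dh_t / dh_(t-1)
    return grad


for w in (0.5, 1.0, 1.5):
    print(f"w={w}: gradient after 50 steps = {gradient_through_time(w, 50):.3e}")
```

With 50 steps, w = 0.5 gives 0.5^50 (below 1e-15), so the gradient vanishes. w = 1.5 gives 1.5^50 (above 6e8), so it explodes. Only w = 1.0 keeps it at exactly 1.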

10. LONG SHORT-TERM MEMORY (LSTM) NETWORKS?
->
- LSTM networks are the most commonly used variation of Recurrent Neural Networks.
- An LSTM is a variety of RNN that is capable of learning long-term dependencies, especially in
sequence prediction problems.
- LSTM has feedback connections, i.e., it is capable of processing entire sequences of
data, not just single data points such as images.
- This finds application in speech recognition, machine translation, etc.
- LSTM is a special kind of RNN, which shows outstanding performance on a large variety
of problems.
- Properties of LSTM networks:
• Better update equation.
• Better back propagation
- Example use cases of LSTMs:
• Generating sentences.
• Classifying time-series
• Speech recognition
• Handwriting recognition
• Polyphonic music modeling
- With Recurrent Neural Networks, we introduce the idea of a type of connection that
connects the output of a hidden-layer neuron as an input to the same hidden-layer
neuron.
- With this recurrent connection, we can take input from the previous time step into the
neuron as part of the incoming information.
- Architecture of LSTM:

• A typical LSTM network is made up of memory blocks called cells.
• Two states are transferred to the next cell: the cell state and the
hidden state.
• The memory blocks are responsible for remembering things and manipulations to this
memory are done through three major mechanisms, called gates.
• FORGET GATE:
- A forget gate is responsible for removing information from the cell state.
- This is required for optimizing the performance of the LSTM network.
- This gate takes in two inputs, h_(t-1) (the hidden state from the previous cell) and x_t (the input
at the current time-step), and passes them through a sigmoid function; values near 0 erase the
corresponding cell-state entries and values near 1 keep them.
• INPUT GATE:
- The input gate is responsible for the addition of information to the cell state.
- Regulating what values need to be added to the cell state by involving a sigmoid
function (a regulatory filter on h_(t-1) and x_t).
- Creating a vector of candidate values from h_(t-1) and x_t using the tanh function, which
outputs values from -1 to +1.
- Multiplying the value of the regulatory filter by the created vector and then
adding this useful information to the cell state via addition operation.
• OUTPUT GATE:
- Create a vector after applying the tanh function to the cell state, thereby scaling the
values to the range -1 to +1.
- Make a filter using the values of h_(t-1) and x_t, such that it can regulate the values
that need to be output from the vector created above.
- Multiply the value of this regulatory filter by the vector created in step 1 and
send it out as output and also to the hidden state of the next cell.
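Below is a minimal sketch of one LSTM time step following the three gates above. It assumes the standard formulation: each gate is a sigmoid of a weighted sum over the concatenation [h_(t-1), x_t], and the candidate vector uses tanh. The parameter layout, sizes, and random initialization are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """W v + b for a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step. params maps each gate name to (W, b), where W has
    shape H x (H + D) and acts on the concatenation [h_(t-1), x_t]."""
    z = h_prev + x_t                                              # [h_(t-1), x_t]
    f = [sigmoid(v) for v in affine(*params["forget"], z)]        # forget gate: what to erase
    i = [sigmoid(v) for v in affine(*params["input"], z)]         # input gate: what to write
    g = [math.tanh(v) for v in affine(*params["candidate"], z)]   # candidate values in (-1, 1)
    o = [sigmoid(v) for v in affine(*params["output"], z)]        # output gate: what to expose
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]  # new cell state
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]                  # new hidden state
    return h_t, c_t


def init_params(hidden, inputs, seed=0):
    """Hypothetical small random initialization with zero biases."""
    rng = random.Random(seed)
    return {gate: ([[rng.uniform(-0.5, 0.5) for _ in range(hidden + inputs)]
                    for _ in range(hidden)], [0.0] * hidden)
            for gate in ("forget", "input", "candidate", "output")}


params = init_params(hidden=3, inputs=2)
h, c = [0.0] * 3, [0.0] * 3
for x_t in [[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]]:
    h, c = lstm_step(x_t, h, c, params)
print("hidden state:", h)
print("cell state:", c)
```

The cell state is updated by element-wise multiply and add rather than by repeated squashing. This additive path is what lets LSTMs carry information across many steps.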
11. ALL ABOUT RvNN?
->
- Recursive Neural Networks (RvNNs) are deep neural networks used for natural language processing.
- We get an RvNN when the same weights are applied recursively on a structured input to
obtain a structured prediction.
- They can handle hierarchical data.
- The tree structure means combining child nodes to produce parent nodes.
- Each child-parent bond has a weight matrix, and similar children have the same
weights.
- The number of children in every node is fixed (so they can use the same weights).
- RvNNs are used when there's a need to parse an entire sentence.
- They are composed of a shared-weight matrix and a binary tree structure.
- This allows the recursive network to learn varying sequences of words or parts of an image.
- It is useful as a sentence and scene parser.
- They use a variation of backpropagation called backpropagation through structure (BPTS).
- The feed-forward pass happens bottom-up, and backpropagation is top-down.
- Think of the objective as the top of the tree, whereas the inputs are the bottom.
- RvNN VARIETIES:
1. Recursive Autoencoder:
• Just like its feed-forward cousin, a recursive autoencoder learns how to
reconstruct the input.
• In the case of NLP, it learns how to reconstruct contexts.
• A semisupervised recursive autoencoder learns the likelihood of certain labels in
each context.
2. Recursive Neural Tensor Network:
• Another variation is a supervised neural network.
• It computes a supervised objective at each node of the tree.
• The tensor part of this means that it calculates the gradient a little differently,
factoring in more information at each node by taking advantage of another
dimension of information using a tensor.
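Below is a minimal sketch of the bottom-up (feed-forward) pass of a recursive network over a binary parse tree. It assumes the common composition parent = tanh(W [left; right] + b), with a single W and b shared by every node. The word vectors, weights, and tree are hypothetical, and training with BPTS is not shown.

```python
import math


def compose(left, right, W, b):
    """parent = tanh(W [left; right] + b); the same W and b are used at every node."""
    children = left + right                      # concatenate the two child vectors
    return [math.tanh(sum(w * v for w, v in zip(row, children)) + bias)
            for row, bias in zip(W, b)]


def encode(tree, embeddings, W, b):
    """A tree is either a word (leaf) or a (left, right) pair of subtrees.
    Children are encoded first, so information flows bottom-up."""
    if isinstance(tree, str):
        return embeddings[tree]
    left, right = tree
    return compose(encode(left, embeddings, W, b),
                   encode(right, embeddings, W, b), W, b)


# Hypothetical 2-D word vectors and a 2 x 4 shared weight matrix.
embeddings = {"the": [0.1, 0.0], "cat": [0.7, 0.2],
              "sat": [0.0, 0.9], "down": [0.3, -0.4]}
W = [[0.5, -0.1, 0.3, 0.2],
     [0.0, 0.4, -0.2, 0.6]]
b = [0.0, 0.0]

sentence = (("the", "cat"), ("sat", "down"))    # ((the cat) (sat down))
vec = encode(sentence, embeddings, W, b)
print(vec)                                      # one 2-D vector for the whole sentence
```

Because each parent has the same size as its children, the same W can be applied at any depth of the tree. This is why the number of children per node must be fixed.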

12. ALL ABOUT TRANSFORMERS?


->
- The transformer neural network aims to solve sequence-to-sequence tasks while
handling long-range dependencies with ease.
- It is a deep learning (DL) model based on a self-attention mechanism that weights the
importance of each part of the input data differently.
- It is mainly used in computer vision (CV) and natural language processing (NLP).
- It is designed to process sequential input data, such as natural language, and perform tasks like
text summarization and translation.
- It processes the entire input at once.
- The attention mechanism allows the model to focus on the most relevant parts of the
input for each output.

- TRANSFORMER ARCHITECTURE:
• It uses an encoder-decoder structure.
• The encoder maps an input sequence to a series of continuous representations.
• The decoder receives the encoder’s output and the decoder’s output at the previous
time step and generates an output sequence.
• The architecture first converts the input data into an n-dimensional embedding,
which is then fed to an encoder.
• The encoder and decoder consist of modules stacked on each other several times.
• The modules mainly include feed-forward and multi-head attention layers.
• Multi-head attention:
- The multi-head attention mechanism enables the model to pay attention to
multiple parts of the input simultaneously.
• Self-attention:
- Allows the model to relate each word in the input to every other word in the same sequence.
• Positional encoding:
- Carries information about each word's position in the sentence, since attention by itself
does not account for word order.
- Helps us to represent a pattern that can be learned by the model.
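Below is a minimal sketch of scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V, and of the sinusoidal positional encoding used by the original Transformer. Both formulas come from the original Transformer paper rather than these notes. To stay short, it assumes Q = K = V = the input; real layers first apply learned projection matrices. The multi-head version simply splits the embedding dimensions between heads. The token vectors are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def attention(Q, K, V):
    """Scaled dot-product attention over lists of row vectors (one per token)."""
    d_k = len(K[0])
    out, weights = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)               # how strongly this token attends to each token
        weights.append(w)
        out.append([sum(w[j] * V[j][c] for j in range(len(V))) for c in range(len(V[0]))])
    return out, weights


def multi_head_self_attention(X, num_heads):
    """Hypothetical simplification: no learned projections; each head attends
    over its own slice of the embedding dimensions, then heads are concatenated."""
    d_head = len(X[0]) // num_heads
    heads = []
    for h in range(num_heads):
        part = [row[h * d_head:(h + 1) * d_head] for row in X]
        out, _ = attention(part, part, part)
        heads.append(out)
    return [sum((heads[h][t] for h in range(num_heads)), []) for t in range(len(X))]


def positional_encoding(num_positions, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd ones."""
    pe = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


# Hypothetical embeddings for a 3-token sentence (d_model = 4).
tokens = [[1.0, 0.0, 1.0, 0.0],
          [0.0, 1.0, 0.0, 1.0],
          [1.0, 1.0, 0.0, 0.0]]
pe = positional_encoding(3, 4)
X = [[a + p for a, p in zip(t, r)] for t, r in zip(tokens, pe)]  # add position info

out, weights = attention(X, X, X)        # self-attention: every token sees every token
mh = multi_head_self_attention(X, num_heads=2)
print(weights[0])                        # attention distribution of token 0 (sums to 1)
```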

- TRANSFORMER CHALLENGES:
• The vanilla Transformer model helps overcome the RNN model’s shortcomings but has
two key issues:
- Limited context dependency:
- Although the Transformer outperforms the LSTM for character-level language modeling,
it cannot keep long-term dependency information beyond the configured
context length.
- It cannot correlate with words that appeared several segments ago.
- Context fragmentation:
- The Transformer is trained from scratch for each segment.
- No context information is stored in the first few symbols of each segment,
leading to performance issues.
