NNDL--UNIT - III

Introduction to Deep Learning, Historical Trends in Deep Learning, Deep Feed-Forward
Networks, Gradient-Based Learning, Hidden Units, Architecture Design, Back-Propagation and Other
Differentiation Algorithms
Introduction to Deep Learning:
Deep learning is the branch of machine learning that is based on artificial neural network architecture. An artificial neural network (ANN) uses layers of interconnected nodes, called neurons, that work together to process and learn from the input data.

In a fully connected deep neural network, there is an input layer, one or more hidden layers connected one after the other, and an output layer. Each neuron receives input from the neurons of the previous layer (or from the input layer). The output of one neuron becomes the input to the neurons in the next layer, and this process continues until the final layer produces the output of the network.

The layers of the neural network transform the input data through a series of nonlinear transformations, allowing the network to learn complex representations of the input data.

[Figure: Artificial Intelligence, Machine Learning, Deep Learning, and Data Science]

Architectures:

• Deep Neural Networks
A neural network with a certain level of complexity, meaning that several hidden layers lie between the input and output layers. Such networks are highly proficient at modeling and processing non-linear relationships.
• Deep Belief Networks
A deep belief network (DBN) is a class of deep neural network composed of multiple layers of belief networks.

NNDL –Unit-3-Notes Prepared by Assistant Professor V. Ravindranath

Steps to perform DBN:
1. Using the Contrastive Divergence algorithm, a layer of features is learned from the visible units.
2. Next, the activations of the previously trained features are treated as visible units, from which the next layer of features is learned.
3. Finally, once the final hidden layer has been learned, the whole DBN is trained.
• Recurrent Neural Networks
An RNN performs sequential computation and is loosely modeled on the human brain (a large feedback network of connected neurons). Because it can remember important information about the inputs it has already received, it makes more precise predictions on sequential data.
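The three DBN steps listed above can be illustrated with a small NumPy sketch. It assumes binary visible units and uses CD-1 (contrastive divergence with a single Gibbs step), one common variant of the Contrastive Divergence algorithm. The names `RBM` and `pretrain_dbn`, the learning rate, and the toy data are hypothetical choices for illustration. The sketch covers the greedy, layer-by-layer learning of steps 1 and 2; any later fine-tuning of the whole DBN is not shown.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RBM:
    """Binary RBM trained with one step of contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.a = np.zeros(n_visible)  # visible biases
        self.b = np.zeros(n_hidden)   # hidden biases
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.a)

    def cd1_update(self, v0, lr):
        # Positive phase: hidden units driven by the data
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)  # sample binary states
        # Negative phase: one Gibbs step down to the visible units and up again
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.a += lr * (v0 - pv1).mean(axis=0)
        self.b += lr * (ph0 - ph1).mean(axis=0)


def pretrain_dbn(data, hidden_sizes, epochs=50, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    rbms, layer_input = [], data
    for n_hidden in hidden_sizes:
        rbm = RBM(layer_input.shape[1], n_hidden, rng)
        for _ in range(epochs):           # Step 1: learn a layer of features with CD
            rbm.cd1_update(layer_input, lr)
        rbms.append(rbm)
        # Step 2: this layer's feature activations become the visible data for the next RBM
        layer_input = rbm.hidden_probs(layer_input)
    return rbms


rng = np.random.default_rng(1)
data = (rng.random((20, 6)) < 0.5).astype(float)   # toy binary data
rbms = pretrain_dbn(data, hidden_sizes=[4, 2])
print([rbm.W.shape for rbm in rbms])   # [(6, 4), (4, 2)]
```

Each RBM only ever sees the output of the layer below it, which is what makes the procedure greedy and layer-wise.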

Types of Deep Learning Networks:

1. Feed Forward Neural Network
2. Recurrent Neural Network
3. Convolutional Neural Network
4. Restricted Boltzmann Machine
5. Auto Encoders

1. Feed-Forward Neural Network

Before training a neural network, it is important to understand what it has to recognize: recognition is done by separating the data samples with some decision boundary.
"The process of receiving an input to produce some kind of output to make some kind of prediction is known as Feed Forward." The feed-forward neural network is the core of many other important neural networks, such as the convolutional neural network.

Applications:
• Data Compression
• Pattern Recognition
• Computer Vision
• Sonar Target Recognition
• Speech Recognition
• Handwritten Character Recognition

2. Recurrent Neural Network:

This is another variation of feed-forward networks. Here, each neuron in the hidden layers receives an input with a specific time delay. The recurrent neural network mainly accesses the information from previous iterations.

For example, to guess the next word in a sentence, one must know the words that were used before it. An RNN not only processes the inputs but also shares the same weights across time steps.


This weight sharing keeps the size of the model from increasing with the size of the input. However, recurrent neural networks compute slowly, cannot take future inputs into account for the current state, and have difficulty remembering information from far back in the sequence.
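A minimal sketch of the recurrence described above, assuming a simple Elman-style RNN with a tanh activation (the names `rnn_forward`, `W_xh`, and `W_hh`, the sizes, and the random data are hypothetical). It shows the two points made in the text: the same weights are reused at every time step, so a 5-step and a 50-step sequence use exactly the same parameters, and the hidden state carries information from earlier inputs forward.

```python
import numpy as np


def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a simple (Elman-style) RNN over a sequence of input vectors.

    The same W_xh, W_hh and b_h are reused at every time step, so the number
    of parameters does not depend on the sequence length.
    """
    h = np.zeros(W_hh.shape[0])                    # initial hidden state
    states = []
    for x_t in xs:
        # the new state mixes the current input with the previous state
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
W_xh = rng.normal(0.0, 0.5, size=(n_hidden, n_in))
W_hh = rng.normal(0.0, 0.5, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)

short_seq = rng.normal(size=(5, n_in))
long_seq = rng.normal(size=(50, n_in))
print(rnn_forward(short_seq, W_xh, W_hh, b_h).shape)  # (5, 4)
print(rnn_forward(long_seq, W_xh, W_hh, b_h).shape)   # (50, 4)
```

The Python loop also makes the slow, step-by-step nature of the computation visible: step t cannot start until step t - 1 has finished.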

Applications:
• Machine Translation
• Robot Control
• Time Series Prediction & Anomaly Detection
• Speech Recognition
• Speech Synthesis
• Rhythm Learning
• Music Composition

3. Convolutional Neural Network (CNN)

These are a special kind of neural network mainly used for image classification, clustering of images, and object recognition. They build hierarchical representations of images, and deep convolutional neural networks are generally preferred over other neural networks when the best accuracy on such image tasks is needed.
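The core operation of a convolutional layer can be sketched in NumPy as below. This is an illustrative, unoptimized implementation: the function name `conv2d_valid`, the toy image, and the hand-picked edge-detecting kernel are assumptions for the example, whereas a real CNN learns its kernel values during training.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution as computed by a CNN layer.

    As in common deep learning libraries, this is technically
    cross-correlation (the kernel is not flipped).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # slide the kernel and take the weighted sum of the image patch
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


image = np.zeros((5, 5))
image[:, 2:] = 1.0                          # dark left part, bright right part
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)   # responds to left-to-right increases
feature_map = np.maximum(0.0, conv2d_valid(image, kernel))  # ReLU
print(feature_map)
# [[3. 3. 0.]
#  [3. 3. 0.]
#  [3. 3. 0.]]
```

The feature map is large only where the kernel overlaps the vertical edge; stacking such layers is what lets deeper layers respond to more complex patterns.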

Applications:
• Identify Faces, Street Signs, and Tumors.
• Image Recognition.
• Video Analysis.
• NLP.
• Anomaly Detection.
• Drug Discovery.
• Checkers Game.
• Time Series Forecasting.

4. Restricted Boltzmann Machine:

The neurons in the input (visible) layer and the hidden layer have symmetric connections between them. However, there are no connections between neurons within the same layer. In contrast to RBMs, general Boltzmann machines do have connections inside the hidden layer. These restrictions in RBMs help the model train efficiently.

Applications:
• Filtering.
• Feature Learning.
• Classification.
• Risk Detection.
• Business and Economic Analysis.


5. Auto Encoders:
An autoencoder neural network is another kind of unsupervised machine learning algorithm. The number of hidden cells is smaller than the number of input cells, while the number of input cells equals the number of output cells.
An autoencoder is trained to produce an output similar to its input, which forces it to find common patterns and generalize the data.
Autoencoders are mainly used to obtain a smaller representation of the input, and they help reconstruct the original data from the compressed data. The algorithm is comparatively simple, since it only requires the output to be identical to the input.
• Encoder: Converts the input data into a lower-dimensional representation.
• Decoder: Reconstructs the data from the compressed representation (lower to higher dimensions).
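The encoder/decoder idea can be sketched as a linear autoencoder (no activation functions, chosen here only for simplicity) trained by gradient descent to reproduce its input. The toy data, layer sizes, learning rate, and the name `train_linear_autoencoder` are all assumptions: 3-D points that lie on a single line are squeezed through a 1-unit hidden layer and then reconstructed.

```python
import numpy as np


def train_linear_autoencoder(X, n_hidden, lr=0.05, epochs=3000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, size=(d, n_hidden))   # encoder: d -> n_hidden
    W_dec = rng.normal(0.0, 0.1, size=(n_hidden, d))   # decoder: n_hidden -> d
    losses = []
    for _ in range(epochs):
        H = X @ W_enc            # encode: compressed code
        X_hat = H @ W_dec        # decode: reconstruction of the input
        err = X_hat - X
        losses.append(np.mean(np.sum(err ** 2, axis=1)))
        G = 2.0 * err / n        # gradient of the loss w.r.t. X_hat
        grad_dec = H.T @ G
        grad_enc = X.T @ (G @ W_dec.T)
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec, losses


rng = np.random.default_rng(1)
t = rng.uniform(-1.0, 1.0, size=(100, 1))
X = t * np.array([[1.0, 2.0, -1.0]])     # 3-D points lying on one line
W_enc, W_dec, losses = train_linear_autoencoder(X, n_hidden=1)
print(f"reconstruction error: first {losses[0]:.3f}, last {losses[-1]:.2e}")
```

Because the data really has only one degree of freedom, a single hidden unit is enough for the decoder to rebuild the inputs almost exactly from the compressed code.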
Applications:
• Classification.
• Clustering.
• Feature Compression (Density).

Deep Learning Applications:

a) Self-Driving Cars:
A self-driving car captures images of its surroundings and processes a huge amount of data to decide which action to take: turn left, turn right, or stop. Making these decisions reliably could help reduce the number of accidents that happen every year.
b) Voice-Controlled Assistants:
When we talk about voice-controlled assistants, Siri is the first one that comes to mind. You can tell Siri what you want it to do, and it will search for it and display the result.
c) Automatic Image Caption Generation:
Whatever image is uploaded, the algorithm generates a caption for it. For example, for an image of a blue eye, it displays the image with a caption such as "blue-colored eye" at the bottom.
d) Automatic Machine Translation:
Automatic machine translation uses deep learning to convert text from one language into another.
Limitations:
• It learns only from observations.
• It is prone to bias issues.
Advantages:
• It reduces the need for feature engineering.
• It eliminates unnecessary costs.
• It can identify defects that are difficult to detect.
• It delivers best-in-class performance on many problems.
Disadvantages:
• It requires a sufficient amount of data.
• It is quite expensive to train.
• It does not have a strong theoretical foundation.


Deep Feed-Forward Neural Networks:
In a feed-forward neural network, there are no feedback loops or connections in the network. There is simply an input layer, hidden layers, and an output layer.
There can be multiple hidden layers, depending on the kind of data you are dealing with. The number of hidden layers is known as the depth of the neural network, and a deeper network can learn a wider range of functions.
• The input layer first provides the neural network with data.
• The output layer then makes predictions on that data based on a series of functions.
• The sigmoid function is a commonly used activation function in deep neural networks.

The Feed-Forward Process, Mathematically:

1) The first input is fed to the network as a row vector containing x1, x2, and 1, where 1 is the bias value:
[ x1 x2 1 ]
2) The inputs are multiplied by the weights of the first and second linear models to obtain each model's score, which indicates whether the point lies in that model's positive region. This is done by multiplying the input vector by a weight matrix using matrix multiplication.

3) Next, take the sigmoid of the scores, which gives the probability of the point being in the positive region of each model:

$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \sigma([\,\text{score}_1 \ \ \text{score}_2\,]) = [\,p_1 \ \ p_2\,]$

4) Multiply the probabilities obtained in the previous step by the second set of weights, appending 1 as the bias input whenever a combination of inputs is taken.


Finally, to obtain the probability of the point being in the positive region of this combined model, take the sigmoid once more, which produces the final output of the feed-forward process.

Non-Linear Model in the Output Layer:

Take the neural network from before, whose hidden layer holds two linear models that are combined to form a non-linear model in the output layer.

We use this non-linear model to produce an output that describes the probability of a point being in the positive region. Take the point (2, 2); along with the bias, the input is represented as
[ 2 2 1 ]

Recall that the first linear model in the hidden layer is defined by the score $-4x_1 - x_2 + 12$.

This means that, in the first model, the inputs are multiplied by -4 and -1, and the bias input is multiplied by 12 to obtain the linear combination.

In the second model, the inputs are multiplied by -1/5 and 1, and the bias input is multiplied by 3 to obtain the linear combination of the same point.


Now, to obtain the probability of the point being in the positive region of each model, we compute both scores and apply the sigmoid to them:

$\text{score}_1 = -4(2) - 1(2) + 12 = 2, \qquad \text{score}_2 = -\tfrac{1}{5}(2) + 1(2) + 3 = 4.6$
$\sigma(2) \approx 0.88, \qquad \sigma(4.6) \approx 0.99$

The second layer contains the weights that dictate how the linear models of the first layer combine into the non-linear model of the second layer. The weights are 1.5 and 1, with a bias value of 0.5.
Now, we multiply the probabilities from the first layer by the second set of weights:

$1.5(0.88) + 1(0.99) + 0.5(1) \approx 2.81$

Finally, we take the sigmoid of this final score:

$\sigma(2.81) \approx 0.94$
This is the complete math behind the feed-forward process, in which the inputs traverse the entire depth of the neural network. In this example, there is only one hidden layer, but whether there is one hidden layer or twenty, the computation is the same for every hidden layer.
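The worked example above can be checked with a few lines of NumPy. The weight layout used here (one column per hidden linear model, with the bias weight in the last row) is an assumption made for this sketch.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


x = np.array([2.0, 2.0, 1.0])          # the point (2, 2) plus bias input 1

# Hidden layer: each column holds one linear model's weights [w1, w2, bias]
W1 = np.array([[-4.0, -1 / 5],
               [-1.0,  1.0],
               [12.0,  3.0]])
hidden_scores = x @ W1                  # [2.0, 4.6]
hidden_probs = sigmoid(hidden_scores)   # about [0.88, 0.99]

# Output layer: weights 1.5 and 1 with bias 0.5 combine the two models
W2 = np.array([1.5, 1.0, 0.5])
output_score = np.append(hidden_probs, 1.0) @ W2   # about 2.81
output_prob = sigmoid(output_score)                # about 0.94
print(hidden_scores, hidden_probs.round(2), round(output_prob, 2))
```

Adding more hidden layers would simply repeat the "multiply by weights, append the bias input, apply the sigmoid" pattern once per layer.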

A feed-forward neural network is an artificial neural network in which the nodes do not form a cycle. In this kind of neural network, all the perceptrons are organized in layers: the input layer takes the input, and the output layer generates the output. Because the hidden layers have no link with the outside world, they are called hidden layers.

Each perceptron in one layer is connected to each node in the subsequent layer, so the nodes are fully connected. There are no visible or invisible connections between nodes in the same layer, and there are no back-loops in a feed-forward network. To minimize the prediction error, the backpropagation algorithm can be used to update the weight values.

Gradient Descent in Machine Learning:

Gradient descent is a method for training machine learning models by minimizing the error between the actual and expected results.
It is one of the most commonly used optimization algorithms for training machine learning models, and it is also used to train neural networks.
• In mathematical terminology, an optimization algorithm refers to the task of minimizing or maximizing an objective function f(x) parameterized by x.
• Similarly, in machine learning, optimization is the task of minimizing the cost function parameterized by the model's parameters.


• The main objective of gradient descent is to minimize a function, such as a convex cost function, through iterative parameter updates.
Once machine learning models are optimized, they can be used as powerful tools for artificial intelligence and various computer science applications.
Gradient descent locates a local minimum or local maximum of a function as follows:
• Moving toward the negative gradient, i.e., away from the gradient of the function at the current point, leads to a local minimum of that function.
• Moving toward the positive gradient, i.e., along the gradient of the function at the current point, leads to a local maximum of that function.

Moving toward the positive gradient is known as gradient ascent, while moving toward the negative gradient is gradient descent, which is also known as steepest descent. The main objective of using a gradient descent algorithm is to minimize the cost function through iteration. To achieve this goal, it performs two steps iteratively:
• Calculate the first-order derivative of the function to compute the gradient, or slope, of that function.
• Move in the direction opposite to the gradient (the direction in which the slope increases) by α times the gradient, where α is the learning rate. The learning rate is a tuning parameter in the optimization process that determines the length of the steps.
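The two iterative steps can be sketched in plain Python on a one-dimensional function. The test function f(x) = (x - 3)^2, the starting point, and the learning rate are hypothetical choices; the sketch also runs gradient ascent on g(x) = -(x - 3)^2 for contrast.

```python
def gradient_descent(grad, x0, learning_rate=0.1, n_steps=100):
    x = x0
    for _ in range(n_steps):
        x = x - learning_rate * grad(x)   # step against the gradient
    return x


def gradient_ascent(grad, x0, learning_rate=0.1, n_steps=100):
    x = x0
    for _ in range(n_steps):
        x = x + learning_rate * grad(x)   # step along the gradient
    return x


# f(x) = (x - 3)^2 has derivative 2(x - 3) and a minimum at x = 3
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
# g(x) = -(x - 3)^2 has derivative -2(x - 3) and a maximum at x = 3
x_max = gradient_ascent(lambda x: -2 * (x - 3), x0=0.0)
print(round(x_min, 6), round(x_max, 6))   # 3.0 3.0
```

The only difference between the two functions is the sign of the update, which is exactly the distinction between descent and ascent described above.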

What is a Cost Function?
The cost function measures the difference, or error, between the actual values and the expected values at the current position, expressed as a single real number.
• It improves machine learning efficiency by providing feedback to the model so that it can minimize the error and find the local or global minimum.
• Gradient descent iterates along the direction of the negative gradient until the cost function approaches its minimum (ideally close to zero). At this point, the model stops learning further.
• Although the terms cost function and loss function are often used as synonyms, there is a minor difference between them.
• The loss function refers to the error of one training example, while the cost function calculates the average error across the entire training set.
• The cost function is calculated after making a hypothesis with initial parameters; these parameters are then modified with the gradient descent algorithm over known data to reduce the cost function.
• The elements involved are the hypothesis, its parameters, the cost function, and the goal of minimizing that cost.
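To make the loss/cost distinction concrete, here is a sketch that uses the straight-line hypothesis y = mx + c and assumes the squared error as the per-example loss, so the cost is the mean squared error (other losses are possible). The data points and parameter values are made up for illustration; the goal would then be to choose m and c so that this cost is as small as possible.

```python
import numpy as np


def predict(x, m, c):
    """Hypothesis: a straight line y = m*x + c with parameters m and c."""
    return m * x + c


def squared_error_loss(y_true, y_pred):
    """Loss: the error of each individual training example."""
    return (y_true - y_pred) ** 2


def mse_cost(y_true, y_pred):
    """Cost: the average loss over the entire training set."""
    return np.mean(squared_error_loss(y_true, y_pred))


x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 6.0])
y_pred = predict(x, m=2.0, c=1.0)        # [1. 3. 5.]
print(squared_error_loss(y, y_pred))     # [0. 0. 1.]  (one value per example)
print(mse_cost(y, y_pred))               # about 0.333 (one number for the whole set)
```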

How does Gradient Descent work?

Before looking at how gradient descent works, recall how the slope of a line is defined in linear regression. The equation for simple linear regression is:

Y = mx + c, where m is the slope of the line and c is the intercept on the Y-axis.

The starting point (shown in the figure above) is just an arbitrary point used to evaluate the performance. At this starting point, we compute the first derivative, or slope, and use a tangent line to measure the steepness of the slope.
The slope is steep at the starting point, but as new parameters are generated, the steepness gradually decreases until the lowest point is reached, which is called the point of convergence.
The main objective of gradient descent is to minimize the cost function, i.e., the error between the expected and actual values. To minimize the cost function, two factors are required:

Direction & Learning Rate

These two factors determine the partial-derivative calculation of future iterations and allow the algorithm to reach the point of convergence, i.e., a local or global minimum. Let's briefly discuss the learning rate.


Learning Rate:
It is defined as the step size taken to reach the minimum, or lowest point. This is typically a small value that is evaluated and updated based on the behavior of the cost function. A high learning rate results in larger steps but risks overshooting the minimum, whereas a low learning rate takes small steps, which improves precision but requires more iterations.
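The effect of the learning rate α can be seen on the simple function f(x) = x^2, whose gradient is 2x, so each gradient descent step multiplies x by (1 - 2α). The three learning rates below are hypothetical values chosen to show slow progress, steady convergence, and overshooting.

```python
def descend(learning_rate, x0=1.0, n_steps=20):
    """Gradient descent on f(x) = x**2 (gradient 2*x); returns the trajectory."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] - learning_rate * 2 * xs[-1])
    return xs


slow = descend(0.01)      # each step multiplies x by 0.98: small, safe steps
good = descend(0.1)       # each step multiplies x by 0.8
too_big = descend(1.1)    # each step multiplies x by -1.2: jumps past 0 and diverges
print(round(slow[-1], 3), round(good[-1], 4), round(too_big[-1], 1))   # 0.668 0.0115 38.3
```

With α = 1.1 the very first step already lands on the other side of the minimum (x = -1.2), and every later step overshoots by more.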

Types of Gradient Descent:

Based on how much of the training data is used to compute the error for each update, the gradient descent learning algorithm can be divided into three types:
1. Batch Gradient Descent
2. Stochastic Gradient Descent
3. Mini-Batch Gradient Descent

1. Batch Gradient Descent:

Batch gradient descent (BGD) computes the error for each point in the training set and updates the model only after evaluating all training examples. One such pass over the training set is known as a training epoch. In simple words, it is a greedy approach in which we sum over all examples for each update.
Advantages of Batch Gradient Descent:
• It produces less noise in comparison to the other types of gradient descent.
• It produces stable gradient descent convergence.
• It is computationally efficient, as all resources are used for all training samples.

2. Stochastic Gradient Descent:

Stochastic gradient descent (SGD) is a type of gradient descent that processes one training example per iteration. Within each epoch, it goes through the examples in the dataset one at a time and updates all the parameters after each training example.


Because it requires only one training example at a time, it is easier to store in allocated memory.
However, it loses some computational efficiency compared with batch gradient descent, because its frequent updates take more time overall. Further, due to these frequent updates, its gradient is noisy.
This noise can sometimes be helpful, allowing it to escape local minima and find the global minimum.
Advantages of Stochastic Gradient Descent:
• It is easier to fit in the desired memory.
• Each update is relatively fast to compute compared with batch gradient descent.
• It is more efficient for large datasets.

3. Mini-Batch Gradient Descent:

Mini-batch gradient descent combines batch and stochastic gradient descent. It divides the training dataset into small batches and performs an update on each batch separately.
Splitting the training dataset into smaller batches strikes a balance between the computational efficiency of batch gradient descent and the speed of stochastic gradient descent. Hence, it achieves higher computational efficiency with less noisy gradients.

Advantages of Mini-Batch Gradient Descent:

• It is easier to fit in allocated memory.
• It is computationally efficient.
• It produces stable gradient descent convergence.
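The three variants differ only in how many examples are used for each update, which the sketch below expresses with a single `batch_size` argument. It fits the line Y = mx + c from the earlier section to noise-free toy data; the function name `fit_line`, the learning rate, and the number of epochs are assumptions for illustration.

```python
import numpy as np


def fit_line(x, y, batch_size, lr=0.1, epochs=1000, seed=0):
    """Fit y = m*x + c by minimizing the mean squared error.

    batch_size == len(x)     -> batch gradient descent
    batch_size == 1          -> stochastic gradient descent
    1 < batch_size < len(x)  -> mini-batch gradient descent
    """
    rng = np.random.default_rng(seed)
    m, c = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        order = rng.permutation(n)                 # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = (m * x[idx] + c) - y[idx]        # prediction error on this batch
            grad_m = 2.0 * np.mean(err * x[idx])
            grad_c = 2.0 * np.mean(err)
            m -= lr * grad_m                       # one parameter update per batch
            c -= lr * grad_c
    return m, c


x = np.linspace(0.0, 1.0, 40)
y = 2.0 * x + 1.0                                  # noise-free toy data
for name, bs in [("batch", len(x)), ("stochastic", 1), ("mini-batch", 8)]:
    m, c = fit_line(x, y, batch_size=bs)
    print(name, round(m, 3), round(c, 3))
```

Per epoch, batch gradient descent makes 1 update, mini-batch (size 8) makes 5, and stochastic gradient descent makes 40; on real, noisy data the smaller batches would also produce visibly noisier parameter paths.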

Hidden Units:
The design of hidden units is an extremely active area of research and does not yet have many definitive guiding theoretical principles.
• Rectified linear units are an excellent default choice of hidden unit.
• It is usually impossible to predict in advance which kind will work best.
• The design process consists of trial and error: intuiting that a kind of hidden unit may work well, and then evaluating its performance on a validation set.
• Some hidden units are not differentiable at all input points.

For example, the rectified linear function g(z) = max{0, z} is not differentiable at z = 0.

This may seem to invalidate g for use with a gradient-based learning algorithm. In practice, however, gradient descent still performs well enough for these models to be used for machine learning tasks.

Most hidden units can be described as accepting a vector of inputs x, computing an affine transformation $z = W^\top x + b$, and then applying an element-wise non-linear function g(z).


Most hidden units are distinguished from each other only by the choice of the form of the activation function g(z).

Rectified Linear Units (ReLU) and Their Generalizations:

Rectified linear units use the activation function g(z) = max{0, z}.
Rectified linear units are easy to optimize because they are similar to linear units:

• The only difference from linear units is that their output is 0 across half of their domain.
• The derivative is 1 everywhere the unit is active.
• Thus the gradient direction is far more useful than with activation functions that introduce second-order effects.
Rectified linear units are typically used on top of an affine transformation: $h = g(W^\top x + b)$.
It is good practice to set all elements of b to a small positive value, such as 0.1.
This makes it likely that the ReLUs will initially be active for most training samples and allow the derivatives to pass through.
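A small NumPy sketch of a ReLU hidden layer, $h = g(W^\top x + b)$, with the biases set to 0.1 as suggested above. The layer sizes, weight scale, and input vector are hypothetical; `relu_grad` returns 0 at exactly z = 0, which is one common convention for the point where ReLU is not differentiable.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def relu_grad(z):
    # 1 where the unit is active, 0 elsewhere; at exactly z == 0 we return 0,
    # one common convention since ReLU is not differentiable there
    return (z > 0).astype(float)


def relu_layer(x, W, b):
    z = W.T @ x + b          # affine transformation z = W^T x + b
    return relu(z), z        # element-wise non-linearity h = g(z)


rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.001, size=(4, 5))   # small random weights (4 inputs, 5 units)
b = np.full(5, 0.1)                       # small positive bias
x = np.array([1.0, -2.0, 0.5, 3.0])
h, z = relu_layer(x, W, b)
print(relu_grad(z))   # [1. 1. 1. 1. 1.]: every unit starts active
```

Because every unit starts with a positive pre-activation, each one passes a derivative of 1 back to its weights on the first update instead of a derivative of 0.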

ReLU vs. Other Activations:

• Sigmoid and tanh activation functions are hard to use in networks with many layers because of the vanishing gradient problem.
• ReLU largely overcomes the vanishing gradient problem, allowing models to learn faster and perform better.
• ReLU is the default activation function for MLPs and CNNs.
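The vanishing gradient problem comes from multiplying many activation derivatives together during backpropagation. The sketch below assumes a hypothetical 20-layer chain in which every unit has the same pre-activation z = 0.5, and it ignores the weights so that only the activation derivatives are compared: the sigmoid derivative never exceeds 0.25, while an active ReLU has derivative 1.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)      # at most 0.25, reached at z = 0


def relu_grad(z):
    return 1.0 if z > 0 else 0.0


depth = 20
z = 0.5   # hypothetical pre-activation, the same in every layer for simplicity
sig_chain = np.prod([sigmoid_grad(z)] * depth)    # product of 20 factors of ~0.235
relu_chain = np.prod([relu_grad(z)] * depth)      # product of 20 factors of 1.0
print(f"sigmoid chain: {sig_chain:.1e}, ReLU chain: {relu_chain}")
```

The sigmoid product is vanishingly small, so the earliest layers would receive almost no gradient, whereas the ReLU product stays at 1 as long as the units are active.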

One drawback of rectified linear units is that they cannot learn via gradient-based methods on examples for which their activation is zero.
Three generalizations of rectified linear units are based on using a non-zero slope $\alpha_i$ when $z_i < 0$:
$h_i = g(z, \alpha)_i = \max(0, z_i) + \alpha_i \min(0, z_i)$
1. Absolute value rectification fixes $\alpha_i = -1$ to obtain g(z) = |z|. It is used for object recognition from images.
2. A leaky ReLU fixes $\alpha_i$ to a small value like 0.01.
3. A parametric ReLU (PReLU) treats $\alpha_i$ as a learnable parameter.
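The formula $h_i = \max(0, z_i) + \alpha_i \min(0, z_i)$ covers all three variants; only the choice of $\alpha_i$ changes. In this sketch the PReLU slope is simply fixed at 0.2 (a hypothetical value) rather than learned, and `dh_dalpha` shows the gradient with respect to $\alpha_i$ that a learning rule would use.

```python
import numpy as np


def generalized_relu(z, alpha):
    """h_i = max(0, z_i) + alpha_i * min(0, z_i)."""
    return np.maximum(0.0, z) + alpha * np.minimum(0.0, z)


z = np.array([-2.0, -0.5, 0.0, 1.5])
abs_rect = generalized_relu(z, -1.0)     # absolute value rectification: |z|
leaky = generalized_relu(z, 0.01)        # leaky ReLU
alpha = np.full(4, 0.2)                  # PReLU slopes (would be learned in practice)
prelu = generalized_relu(z, alpha)
dh_dalpha = np.minimum(0.0, z)           # gradient of h_i w.r.t. alpha_i
print(abs_rect, leaky, prelu, dh_dalpha, sep="\n")
```

Unlike plain ReLU, all three variants give a non-zero gradient for negative inputs (unless α is 0), so units in that region can still learn.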

Architecture Design:
The architecture of a neural network is the structure of interconnected nodes, called neurons, organized in layers. The design of a neural network architecture is important because it determines how the network functions and learns.

When designing a neural network architecture, you can consider things like:
• Problem: Understand the problem you are trying to solve
• Model objectives: Define what you want the model to do
• Network type: Choose the type of network you want to use
• Model complexity: Consider how complex the model should be
• Layers and units: Decide how many layers and units the network should have
• Activation functions: Choose the activation functions to use
• Regularization and dropout: Decide how to use regularization and dropout
• Optimization algorithm and learning rate: Select an optimization algorithm and a learning rate
Some types of neural network architectures include:
• Convolutional Neural Networks (CNNs): Used for image processing and analysis, such as image classification and facial recognition
• Recurrent Neural Networks (RNNs): Used for processing sequential data, such as time-series data
• Generative Adversarial Networks (GANs): Used for generative tasks, where the network learns to generate new data that resembles the original dataset
Other types of neural network models include the Feedforward Neural Network, Long Short-Term Memory (LSTM) Network, Gated Recurrent Unit (GRU) Network, Autoencoder, and Radial Basis Function Network (RBFN).

Architecture Terminology
The word architecture refers to the overall structure of the network:
• How many units should it have?
• How should the units be connected to each other?
• Most neural networks are organized into groups of units called layers.
• Most neural network architectures arrange these layers in a chain structure,
• with each layer being a function of the layer that preceded it.

Main Architectural Considerations

1. Choice of the depth of the network
2. Choice of the width of each layer

[Figure: Generic Neural Architectures (1-11)]


Back-Propagation and Other Differentiation Algorithms
Backpropagation is an algorithm that propagates the errors backward from the output nodes to the input nodes; hence it is simply referred to as the backward propagation of errors. It is used in a vast range of neural network applications in data mining, such as character recognition and signature verification.

Neural Network:
Neural networks are an information-processing paradigm inspired by the human nervous system. Just as the human nervous system has biological neurons, neural networks have artificial neurons, which are mathematical functions derived from biological neurons.

The human brain is estimated to have tens of billions of neurons (recent estimates put it at about 86 billion), each connected to an average of around 10,000 other neurons. Each neuron receives signals through synapses, which control the effect of each signal on the neuron.

Back Propagation:

Backpropagation is a widely used algorithm for training feed-forward neural networks. It computes the gradient of the loss function with respect to the network weights, and it does so far more efficiently than naively computing the gradient with respect to each weight directly.

This efficiency makes it possible to use gradient methods, such as gradient descent or stochastic gradient descent, to train multi-layer networks and update the weights to minimize the loss.

The backpropagation algorithm works by computing the gradient of the loss function with respect to each weight via the chain rule, computing the gradient one layer at a time, and iterating backward from the last layer to avoid redundant computation of intermediate terms in the chain rule.
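The chain-rule computation can be illustrated on a tiny network with one weight per layer, $y = \sigma(w_2\,\sigma(w_1 x))$, and loss $L = \tfrac{1}{2}(y - t)^2$. The sketch computes the gradients backward from the output, reusing the intermediate term `dL_db` for both weights, and checks them against a finite-difference approximation. The network, the values, and the function names are assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(w1, w2, x):
    a = w1 * x          # hidden pre-activation
    h = sigmoid(a)      # hidden activation
    b = w2 * h          # output pre-activation
    y = sigmoid(b)      # network output
    return a, h, b, y


def loss(w1, w2, x, t):
    return 0.5 * (forward(w1, w2, x)[3] - t) ** 2


def backprop(w1, w2, x, t):
    a, h, b, y = forward(w1, w2, x)
    dL_dy = y - t
    dL_db = dL_dy * y * (1 - y)    # computed once, reused below
    dL_dw2 = dL_db * h
    dL_dh = dL_db * w2             # move one layer back
    dL_da = dL_dh * h * (1 - h)
    dL_dw1 = dL_da * x
    return dL_dw1, dL_dw2


w1, w2, x, t = 0.5, -1.2, 1.5, 1.0
g1, g2 = backprop(w1, w2, x, t)
eps = 1e-6
num1 = (loss(w1 + eps, w2, x, t) - loss(w1 - eps, w2, x, t)) / (2 * eps)
num2 = (loss(w1, w2 + eps, x, t) - loss(w1, w2 - eps, x, t)) / (2 * eps)
print(g1, num1)
print(g2, num2)
```

The finite-difference check needs two extra forward passes per weight, while backpropagation gets every gradient from a single backward pass, which is the efficiency gain described above.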


Working of Back Propagation:
Neural networks use supervised learning to generate output vectors from the input vectors that the network operates on. The network compares the generated output with the desired output and produces an error report if they do not match. It then adjusts the weights according to this error report to get the desired output.
Features of Back Propagation:
1. It is the gradient descent method, as used in the case of the simple perceptron network, applied to networks with differentiable units.
2. It differs from other networks in the way the weights are calculated during the learning period of the network.
3. Training is done in three stages:
• the feed-forward of the input training pattern
• the calculation and backpropagation of the error
• the updating of the weights

Algorithm of Back Propagation:

Step 1: Inputs X arrive through the preconnected path.

Step 2: The input is modeled using weights W, which are usually initialized randomly.
Step 3: Calculate the output of each neuron, from the input layer through the hidden layer to the output layer.
Step 4: Calculate the error in the outputs:
Backpropagation Error = Actual Output - Desired Output
Step 5: From the output layer, go back to the hidden layer and adjust the weights to reduce the error.
Step 6: Repeat the process until the desired output is achieved.

Parameters:
• x = input training vector, x = (x1, x2, …, xn).
• t = target vector, t = (t1, t2, …, tm).
• δk = error term at output unit k.
• δj = error term at hidden unit j.
• α = learning rate.
• v0j = bias of hidden unit j.
• w0k = bias of output unit k.
Training Algorithm:

Step 1: Initialize the weights to small random values.

Step 2: While the stopping condition is false, do Steps 3 to 9.
Step 3: For each training pair, do Steps 4 to 8 (Feed-Forward).
Step 4: Each input unit X_i (i = 1 to n) receives the input signal x_i and transmits it to all units in the hidden layer.
Step 5: Each hidden unit Z_j (j = 1 to a) sums its weighted input signals to calculate its net input:
$z_{in,j} = v_{0j} + \sum_{i=1}^{n} x_i v_{ij}$
applies its activation function,
$z_j = f(z_{in,j})$
and sends this signal to all units in the layer above, i.e., the output units.
Each output unit Y_k (k = 1 to m) sums its weighted input signals,
$y_{in,k} = w_{0k} + \sum_{j=1}^{a} z_j w_{jk}$
and applies its activation function to calculate the output signal:
$y_k = f(y_{in,k})$

Back Propagation of Error:

Step 6: Each output unit Y_k (k = 1 to m) receives the target pattern corresponding to the input pattern, and its error information term is calculated as:
$\delta_k = (t_k - y_k)\, f'(y_{in,k})$
Step 7: Each hidden unit Z_j (j = 1 to a) sums its delta inputs from all units in the layer above:
$\delta_{in,j} = \sum_{k=1}^{m} \delta_k w_{jk}$
The error information term is calculated as:
$\delta_j = \delta_{in,j}\, f'(z_{in,j})$

Updating of Weights and Biases:

Step 8: Each output unit Y_k (k = 1 to m) updates its bias and weights (j = 0 to a).
The weight correction term is $\Delta w_{jk} = \alpha \delta_k z_j$ and the bias correction term is $\Delta w_{0k} = \alpha \delta_k$.
Therefore:
$w_{jk}(\text{new}) = w_{jk}(\text{old}) + \Delta w_{jk}$
$w_{0k}(\text{new}) = w_{0k}(\text{old}) + \Delta w_{0k}$
Each hidden unit Z_j (j = 1 to a) updates its bias and weights (i = 0 to n). The weight correction term is
$\Delta v_{ij} = \alpha \delta_j x_i$
and the bias correction term is
$\Delta v_{0j} = \alpha \delta_j$
Therefore:
$v_{ij}(\text{new}) = v_{ij}(\text{old}) + \Delta v_{ij}$
$v_{0j}(\text{new}) = v_{0j}(\text{old}) + \Delta v_{0j}$


Step 9: Test the stopping condition. The stopping condition can be the minimization of the error or a fixed number of epochs.
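Steps 1 to 9 can be sketched in NumPy for one hidden layer, assuming the binary sigmoid as the activation f, so that f'(y_{in,k}) = y_k(1 - y_k) and f'(z_{in,j}) = z_j(1 - z_j). The weight initialization range, the learning rate, the number of hidden units, the OR training data, and the fixed epoch budget used as the stopping condition are hypothetical choices. The matrices `V` and `W` follow the v_ij and w_jk notation of the algorithm, with row 0 of each holding the biases.

```python
import numpy as np


def f(x):
    """Binary sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-x))


def f_prime_from_output(fx):
    """Sigmoid derivative f'(x), written in terms of the output f(x)."""
    return fx * (1.0 - fx)


def total_error(V, W, X, T):
    Z = f(V[0] + X @ V[1:])              # hidden outputs for every pattern
    Y = f(W[0] + Z @ W[1:])              # network outputs for every pattern
    return 0.5 * np.sum((T - Y) ** 2)


def train_backprop(X, T, a=4, alpha=0.5, epochs=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, m = X.shape[1], T.shape[1]
    # Step 1: small random weights; row 0 of each matrix holds the biases v0j / w0k
    V = rng.uniform(-0.5, 0.5, size=(n + 1, a))   # input -> hidden (v_ij)
    W = rng.uniform(-0.5, 0.5, size=(a + 1, m))   # hidden -> output (w_jk)
    errors = [total_error(V, W, X, T)]
    for _ in range(epochs):                       # Step 2, with a fixed epoch budget
        for x, t in zip(X, T):                    # Step 3: each training pair
            # Steps 4-5: feed-forward
            z = f(V[0] + x @ V[1:])               # z_j = f(z_in,j)
            y = f(W[0] + z @ W[1:])               # y_k = f(y_in,k)
            # Step 6: output error terms
            delta_k = (t - y) * f_prime_from_output(y)
            # Step 7: hidden error terms (uses W before it is updated)
            delta_j = (W[1:] @ delta_k) * f_prime_from_output(z)
            # Step 8: weight and bias corrections
            W[1:] += alpha * np.outer(z, delta_k)
            W[0] += alpha * delta_k
            V[1:] += alpha * np.outer(x, delta_j)
            V[0] += alpha * delta_j
        errors.append(total_error(V, W, X, T))   # Step 9: monitor the stopping condition
    return V, W, errors


X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[0.0], [1.0], [1.0], [1.0]])        # logical OR as a toy target
V, W, errors = train_backprop(X, T)
print(f"total error: before {errors[0]:.3f}, after {errors[-1]:.4f}")
```

Updating after every training pair (rather than after the whole set) matches the "for each training pair" loop of Step 3; replacing Step 9 with a check such as "stop when the total error falls below a threshold" would only change the outer loop.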

Types of Back Propagation:

There are two types of backpropagation networks.
1. Static Back Propagation: A network designed to map static inputs to static outputs. These networks can solve static classification problems such as OCR (Optical Character Recognition).
2. Recurrent Back Propagation: Another network, used for fixed-point learning. In recurrent backpropagation, activation is fed forward until a fixed value is reached. Static backpropagation provides an instant mapping, whereas recurrent backpropagation does not.
Advantages:
• It is simple, fast, and easy to program.
• Only the number of inputs is tuned, not any other parameter.
• It is flexible and efficient.
• Users do not need to learn any special functions.
Disadvantages:
• It is sensitive to noisy data and irregularities; noisy data can lead to inaccurate results.
• Its performance is highly dependent on the input data.
• Training can take too much time.
• A matrix-based approach is preferred over a mini-batch approach.
