
NNDL--UNIT - III

Introduction to Deep Learning, Historical Trends in Deep Learning, Deep Feed-Forward Networks, Gradient-Based Learning, Hidden Units, Architecture Design, Back-Propagation and Other Differentiation Algorithms
Introduction to Deep Learning:
Deep learning is a branch of machine learning based on artificial Neural Network architectures. An artificial Neural Network (ANN) uses Layers of interconnected nodes, called Neurons, that work together to process and learn from the Input data.

In a fully connected Deep Neural Network, there is an Input Layer and one or more
Hidden Layers connected one after the other. Each neuron receives Input from the previous
Layer Neurons or the Input Layer. The Output of one neuron becomes the Input to other
Neurons in the next Layer of the Network, and this process continues until the final Layer
produces the Output of the Network.

The Layers of the Neural Network transform the Input data through a series of
nonlinear transformations, allowing the Network to learn complex representations of the Input
data.

[Figure: Deep Learning as a subset of Machine Learning, which is a subset of Artificial Intelligence, overlapping with Data Science]

Architectures:

 Deep Neural Networks


A Deep Neural Network is a Neural Network with a certain level of complexity, meaning that several Hidden Layers lie between the Input and Output Layers. Such Networks are highly proficient at modelling and processing non-linear associations.
 Deep Belief Networks
A Deep Belief Network is a class of Deep Neural Network that comprises multiple Layers of belief Networks.



Steps to perform DBN:
1. With the help of the Contrastive Divergence algorithm, a Layer of features is learned
from perceptible units.
2. Next, the previously trained features are treated as visible units, and a further Layer of features is learned from them.
3. Lastly, when the learning of the final Hidden Layer is accomplished, then the whole
DBN is trained.
 Recurrent Neural Networks
It permits parallel as well as sequential computation, much like the human brain (a large feedback Network of connected Neurons). Because these Networks can remember the important information about the Input they have received, they can make more precise predictions.

Types of Deep Learning Networks:


1. Feed Forward Neural Network
2. Recurrent Neural Network
3. Convolution Neural Network
4. Restricted Boltzmann Machine
5. Auto Encoders

1. Feed-Forward Neural Network


The Network learns to recognize data samples by separating them with a decision boundary. The process of receiving an Input and producing an Output in order to make a prediction is known as Feed-Forward. The Feed-Forward Neural Network is the core of many other important Neural Networks, such as the Convolutional Neural Network.

Applications:
 Data Compression
 Pattern Recognition
 Computer Vision
 Sonar Target Recognition
 Speech Recognition
 Handwritten Characters Recognition

2. Recurrent Neural Network:


This is another variation of Feed-Forward Networks. Here each of the Neurons
present in the Hidden Layers receives an Input with a specific delay in time. The Recurrent
Neural Network mainly accesses the previous information of existing Iterations.

For example, to guess the next word in a sentence, one must have knowledge of the words that were used before it. The Network not only processes the Inputs but also shares its weights across time and can handle Inputs of varying length.



The size of the model does not increase with the size of the Input. However, the problems with a Recurrent Neural Network are its slow computational speed, the fact that it does not consider any future Input for the current state, and its difficulty in recalling information from far in the past.

Applications:
 Machine Translation
 Robot Control
 Time Series Prediction & Anomaly Detection
 Speech Recognition
 Speech Synthesis
 Rhythm Learning
 Music Composition

3. Convolution Neural Network(CNN)


These are a special kind of Neural Network mainly used for Image classification, clustering of Images, and object recognition. They enable the construction of hierarchical Image representations. To achieve the best accuracy on such tasks, deep Convolutional Neural Networks are preferred over other Neural Networks.

Applications:
 Identify Faces, Street Signs, and Tumors.
 Image Recognition.
 Video Analysis.
 NLP.
 Anomaly Detection.
 Drug Discovery.
 Checkers Game.
 Time Series Forecasting.

4. Restricted Boltzmann Machine:


In a Restricted Boltzmann Machine, the Neurons of the Input Layer and the Hidden Layer are joined by symmetric connections, but there are no connections between Neurons within the same Layer. In contrast to the RBM, an ordinary Boltzmann Machine does have internal connections inside the Hidden Layer. These restrictions in RBMs are what allow the model to be trained efficiently.

Applications:
 Filtering.
 Feature Learning.
 Classification.
 Risk Detection.
 Business and Economic Analysis.



5. Auto Encoders:
An Auto Encoder Neural Network is another kind of Unsupervised machine learning algorithm. The number of Hidden Cells is smaller than the number of Input Cells, while the number of Input Cells is equal to the number of Output Cells.
An Auto Encoder Network is trained to reproduce the fed Input at its Output, which forces it to find common patterns and generalize the data.
Auto Encoders are mainly used to obtain a smaller representation of the Input and to reconstruct the original data from that compressed representation. The algorithm is comparatively simple, as it only requires the Output to be identical to the Input.
 Encoder: converts the Input data into a lower-dimensional representation.
 Decoder: reconstructs the original data from the compressed, lower-dimensional representation.
Applications:
 Classification.
 Clustering.
 Feature Compression (Density).
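
A minimal NumPy sketch of the Encoder/Decoder idea described above (the dimensions, weights and activation here are illustrative assumptions, not values from the notes):

import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(0)
x = rng.random(8)                        # 8-dimensional Input (assumed size)

# Encoder: convert the Input into a lower-dimensional (3-cell) code
W_enc = rng.normal(size=(3, 8)) * 0.1
code = sigmoid(W_enc @ x)

# Decoder: reconstruct the original 8 dimensions from the compressed code
W_dec = rng.normal(size=(8, 3)) * 0.1
x_hat = sigmoid(W_dec @ code)

# Training adjusts W_enc and W_dec so that the Output matches the Input
reconstruction_error = np.mean((x - x_hat) ** 2)
print(reconstruction_error)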

Deep Learning Applications:


a) Self-Driving Cars:
A self-driving car captures Images of its surroundings and processes a huge amount of data, and then decides which action to take: turn left, turn right, or stop. Acting on these decisions helps reduce the accidents that happen every year.
b) Voice Controlled Assistance:
When we talk about voice-controlled assistance, Siri is the first thing that comes to mind. You can tell Siri whatever you want it to do, and it will search for it and display the result for you.
c) Automatic Image Caption Generation:
Whatever Image is uploaded, the algorithm generates an appropriate caption for it. For example, an Image of a blue-coloured eye is displayed with a caption such as "blue-coloured eye" at the bottom of the Image.
d) Automatic Machine Translation:
With the help of automatic machine translation, we are able to convert one
language into another with the help of deep learning.
Limitations:
 It learns only from the data it observes.
 It can suffer from bias issues.
Advantages:
 It lessens the need for Feature Engineering.
 It eliminates needless costs.
 It easily identifies defects that are difficult to detect.
 It delivers best-in-class performance on many problems.
Disadvantages
 It requires a sufficient amount of data.
 It is quite expensive to train.
 It does not have a strong theoretical foundation.



Deep Feed-Forward Neural Networks:-
In a Feed-Forward Neural Network, there are no feedback loops or connections in the Network. There is simply an Input Layer, one or more Hidden Layers, and an Output Layer.
There can be multiple Hidden Layers, depending on what kind of data you are dealing with. The number of Hidden Layers is known as the depth of the Neural Network. The deeper the Network, the larger the class of functions it can learn.
 The Input Layer first provides the Neural Network with data.
 The Output Layer then makes predictions on that data, based on a series of functions.
 The ReLU function is the most commonly used activation function in deep Neural Networks.

Feed-Forward Process Through Mathematically:

1) The first Input is fed to the Network and represented as the vector [ x1 x2 1 ], where the 1 is the bias value.

2) Each Input is multiplied by the weights of the first and the second model to obtain its score of being in the positive region in each model. The Inputs are multiplied by the matrix of weights using matrix multiplication.

3) After that, take the sigmoid of the scores, which gives the probability of the point being in the positive region in both models:

probability = sigmoid(score) = 1 / (1 + e^(-score))

4) Multiply the probabilities obtained from the previous step by the second set of weights, again including a 1 as the bias whenever taking a combination of Inputs.



As before, to obtain the probability that the point is in the positive region of this combined model, we take the sigmoid of the result, which produces the final Output of the Feed-Forward process.

Non-Linear Model In The Output Layer:


Consider the Neural Network from before, with the linear models in the Hidden Layer that are combined to form the non-linear model in the Output Layer.

We use this non-linear model to produce an Output that describes the probability of the point being in the positive region. The point is (2, 2); along with the bias, the Input is represented as
[ 2 2 1 ]

Recall the first linear model in the Hidden Layer and the equation that defines it:
-4x1 - x2 + 12 = 0

This means that, to obtain the linear combination in the first Layer, the Inputs are multiplied by -4 and -1, and the bias value is multiplied by 12.

In the second model, the Inputs are multiplied by -1/5 and 1, and the bias is multiplied by 3, to obtain the linear combination for the same point.



Now, to obtain the probability that the point is in the positive region relative to both models, we apply the sigmoid to both linear combinations.

The second Layer contains the weights that dictate how the linear models of the first Layer are combined to obtain the non-linear model of the second Layer. These weights are 1.5 and 1, with a bias value of 0.5.
Now, we multiply the probabilities obtained from the first Layer by the second set of weights.

Now, we will take the sigmoid of our final score

This is the complete mathematics behind the Feed-Forward process, in which the Inputs traverse the entire depth of the Neural Network. In this example there is only one Hidden Layer, but whether there is one Hidden Layer or twenty, the computational process is the same for every Hidden Layer.
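
The whole worked example can also be written as a short NumPy sketch (a minimal illustration, not code from the notes); it reproduces the computation above for the point (2, 2) with the stated weights.

import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Input point (2, 2) with the bias term appended
x = np.array([2.0, 2.0, 1.0])

# First (Hidden) Layer: one row per linear model, including the bias weights 12 and 3
W1 = np.array([[-4.0, -1.0, 12.0],
               [-0.2,  1.0,  3.0]])
hidden = sigmoid(W1 @ x)                # probabilities from the two linear models

# Second (Output) Layer: weights 1.5 and 1 with a bias of 0.5
W2 = np.array([1.5, 1.0, 0.5])
score = W2 @ np.append(hidden, 1.0)     # append 1 for the bias again
output = sigmoid(score)                 # final probability from the non-linear model
print(output)                           # roughly 0.94

With these weights the Hidden probabilities come out to roughly 0.88 and 0.99, and the final Output is approximately 0.94.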

A Feed-Forward Neural Network is simply a Network in which the connections between nodes do not form a cycle. In this kind of Neural Network, all the perceptrons are organized in Layers, such that the Input Layer takes the Input and the Output Layer generates the Output. Because the Hidden Layers do not link with the outside world, they are called Hidden Layers.

Each of the perceptrons in a single Layer is connected to every node in the subsequent Layer, so all of the nodes are fully connected. There are no connections, visible or invisible, between nodes within the same Layer, and there are no back-loops in a Feed-Forward Network. To minimize the prediction error, the Back Propagation algorithm can be used to update the weight values.

Gradient Descent in Machine Learning:


Gradient Descent in Machine Learning is the process of minimizing the error between actual and expected results in order to train Machine Learning Models. It is one of the most commonly used Optimization Algorithms for training models, and it is also used to train Neural Networks.
 In mathematical terminology, Optimization algorithm refers to the task of
minimizing/maximizing an objective function f(x) parameterized by x.
 Similarly, in Machine Learning, optimization is the task of minimizing the cost function
parameterized by the model's parameters.



 The main objective of Gradient descent is to minimize the convex function using
iteration of parameter updates.
Once Machine Learning Models are Optimized, then these models can be used as
powerful tools for Artificial Intelligence and various computer science applications.
The best way to define the local minimum or local maximum of a function using Gradient descent is as follows:
 If we move towards the Negative Gradient, i.e., away from the Gradient of the function at the current point, we reach the local minimum of that function.
 If we move towards the Positive Gradient, i.e., in the direction of the Gradient of the function at the current point, we reach the local maximum of that function.

The procedure of moving towards the Positive Gradient is known as Gradient Ascent, while moving towards the Negative Gradient is Gradient Descent, also known as steepest descent. The main objective of using a Gradient descent algorithm is to minimize the cost function using iteration. To achieve this goal, it performs two steps iteratively:
 Calculates the first-order derivative of the function to compute the Gradient or slope of that function.
 Moves away from the direction of the Gradient, i.e., downhill from the current point, by alpha times the Gradient, where alpha is the Learning Rate: a tuning parameter in the optimization process that helps decide the length of the steps.

What is Cost-Function?
The cost function is defined as the measurement of the difference or error between the actual values and the expected values at the current position, expressed as a single real number.
 It helps to improve machine learning efficiency by providing feedback to the model so that it can minimize the error and find the local or global minimum.
 Gradient descent continuously iterates along the direction of the negative Gradient until the cost function approaches its minimum (ideally zero). At this point the model stops learning further.
 Although the cost function and the loss function are often treated as synonymous, there is a minor difference between them.
 The difference lies in the scope of the error measured during training: the loss function refers to the error of a single training example, while the cost function calculates the average error across the entire training set.
 The cost function is calculated after making a hypothesis with initial parameters and
modifying these parameters using Gradient descent algorithms over known data to
reduce the cost function.
 Hypothesis: the model's prediction, e.g. h(x) = mx + c for a linear model.
 Parameters: the values the model learns, e.g. m and c.
 Cost function: the average error of the hypothesis over the training set, e.g. the mean squared error J(m, c) (sketched in code below).
 Goal: minimize the cost function with respect to the parameters.
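
As a concrete illustration, a mean squared error cost for the simple linear hypothesis h(x) = mx + c (the same regression equation used in the next subsection) could be written as follows; the sample data are made up for the example.

import numpy as np

def predict(x, m, c):
    # hypothesis: the simple linear model h(x) = m*x + c
    return m * x + c

def cost(x, y, m, c):
    # mean squared error: the average error over the whole training set
    return np.mean((predict(x, m, c) - y) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0])      # made-up training Inputs
y = np.array([3.0, 5.0, 7.0, 9.0])      # made-up targets, generated from y = 2x + 1
print(cost(x, y, m=0.0, c=0.0))         # large cost for a poor initial guess
print(cost(x, y, m=2.0, c=1.0))         # cost is 0 for the true parameters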

How does Gradient Descent work?


Before examining the working principle of Gradient descent, we should know some basic concepts for finding the slope of a line from linear regression. The equation of simple linear regression is given as:

Y = mX + c, where m is the slope of the line and c is the intercept on the Y-axis.

An arbitrary starting point is chosen to evaluate the performance. At this starting point, we take the first derivative, or slope, and use a tangent line to measure the steepness of that slope.

The slope is steep at the starting (arbitrary) point, but as new parameters are generated the steepness gradually reduces, until the lowest point is approached; this is called the point of convergence.
The main objective of Gradient descent is to minimize the cost function, i.e., the error between the expected and actual values. To minimize the cost function, two factors are required:

Direction & Learning Rate


These two factors determine the partial derivative calculations of future iterations and guide them towards the point of convergence, i.e., the local or global minimum. Let us briefly discuss the learning rate.



Learning Rate:
It is defined as the step size taken to reach the minimum or lowest point. This is
typically a small value that is evaluated and updated based on the behavior of the cost
function. If the learning rate is high, it results in larger steps but also leads to risks of
overshooting the minimum.
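
A minimal sketch of Gradient descent for the linear model Y = mX + c, using the mean squared error cost described above (the data, learning rate and number of iterations are illustrative assumptions):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])      # made-up training Inputs
y = np.array([3.0, 5.0, 7.0, 9.0])      # targets generated from y = 2x + 1

m, c = 0.0, 0.0                         # arbitrary starting point
alpha = 0.05                            # Learning Rate (step size)

for _ in range(2000):
    error = (m * x + c) - y             # prediction error on every example
    grad_m = 2 * np.mean(error * x)     # partial derivative of the MSE cost w.r.t. m
    grad_c = 2 * np.mean(error)         # partial derivative of the MSE cost w.r.t. c
    m -= alpha * grad_m                 # step away from the Gradient (downhill)
    c -= alpha * grad_c

print(m, c)                             # converges near m = 2, c = 1 (the point of convergence)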

Types of Gradient Descent:-


Based on the error in various training models, the Gradient Descent learning algorithm
can be divided into 3 types.
1. Batch Gradient Descent,
2. Stochastic Gradient Descent
3. Mini-Batch Gradient Descent.

1. Batch Gradient Descent:-


Batch Gradient Descent (BGD) computes the error for each point in the training set and updates the model only after all training examples have been evaluated. This procedure is known as a training epoch. In simple words, it is a greedy approach in which we sum over all examples for each update.
Advantages of Batch Gradient descent:
 It produces less noise in comparison to other types of Gradient descent.
 It produces stable Gradient descent convergence.
 It is computationally efficient, as all resources are used to process all training samples together.

2. Stochastic Gradient Descent:-


Stochastic Gradient Descent (SGD) is a type of Gradient descent that processes one training example per iteration. In other words, it works through the training examples one at a time and updates the parameters after each individual example.



As it requires only one training example at a time, it is easier to store in the allocated memory. It loses some computational efficiency in comparison to batch Gradient descent, because its frequent updates take more computation overall, and those frequent updates also produce a noisy Gradient.
However, this noise can sometimes be helpful for finding the global minimum and for escaping local minima.
Advantages of Stochastic Gradient descent:
 It is easier to allocate in desired memory.
 It is relatively fast to compute than batch Gradient descent.
 It is more efficient for large datasets.

3. Mini Batch Gradient Descent:


Mini-Batch Gradient Descent is a combination of batch and stochastic Gradient descent. It divides the training dataset into small batches and performs an update for each of those batches.
Splitting the training dataset into smaller batches strikes a balance between the computational efficiency of batch Gradient descent and the speed of stochastic Gradient descent. Hence, we achieve a type of Gradient descent with high computational efficiency and a less noisy Gradient.

Advantages of Mini Batch Gradient Descent:


 It is easier to fit in allocated memory.
 It is computationally efficient.
 It produces stable Gradient descent convergence.
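
The three variants differ only in how many examples contribute to each update. In the rough sketch below (data, learning rate and batch size are assumptions), batch_size = len(x) gives Batch Gradient Descent, batch_size = 1 gives Stochastic Gradient Descent, and values in between give Mini-Batch Gradient Descent.

import numpy as np

rng = np.random.default_rng(0)
x = rng.random(100)                     # made-up training Inputs
y = 2 * x + 1                           # targets from y = 2x + 1

m, c = 0.0, 0.0
alpha = 0.1
batch_size = 10                         # 100 -> batch GD, 1 -> SGD, in between -> mini-batch

for epoch in range(200):
    order = rng.permutation(len(x))     # shuffle the examples each epoch
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        error = (m * x[idx] + c) - y[idx]
        m -= alpha * 2 * np.mean(error * x[idx])   # update from this batch only
        c -= alpha * 2 * np.mean(error)

print(m, c)                             # approaches m = 2, c = 1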

Hidden Units:
The design of Hidden units is an extremely active area of research, and it does not yet have many definitive guiding theoretical principles.
 Rectified Linear Units are an excellent default choice of Hidden unit.
 It is usually impossible to predict in advance which kind of Hidden unit will work best.
 The design process consists of trial and error: intuiting that a kind of Hidden unit may work well, and then evaluating its performance on a validation set.
 Some Hidden units are not differentiable at all Input points.

For Example, the Rectified Linear Function g(z)=max{0, z} is not differentiable at z = 0.


This may seem like it invalidates g for use with a Gradient based learning algorithm. In
practice, Gradient descent still performs well enough for these models to be used for
machine learning tasks.

Most Hidden units can be described as accepting a vector of Inputs x, computing an affine transformation z = W^T x + b, and then applying an element-wise non-linear function g(z).



Most Hidden units are distinguished from each other only by the choice of the form of
the activation function g(z)
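
In code, the computation performed by one Hidden Layer is just the following (the sizes, the small positive bias, and the choice of ReLU as g are illustrative assumptions):

import numpy as np

def relu(z):
    # the non-linear function g(z) = max{0, z}, applied element-wise
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
x = rng.random(4)                      # vector of Inputs x (4 features, assumed)
W = rng.normal(size=(5, 4)) * 0.1      # weights of 5 Hidden units (assumed width)
b = np.full(5, 0.1)                    # small positive bias

z = W @ x + b                          # the affine transformation
h = relu(z)                            # the element-wise non-linearity g(z)
print(h)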

Rectified Linear Units (ReLU) and Their Generalizations:


Rectified Linear Units use the activation function g(z) = max{0, z}.
Rectified Linear Units are easy to optimize due to similarity with linear units.

 The only difference from linear units is that their Output is 0 across half of their domain.
 The derivative is 1 everywhere that the unit is active.
 Thus the Gradient direction is far more useful for learning than it is with activation functions that introduce second-order effects.
Rectified Linear Units are typically used on top of an affine transformation: h = g(W^T x + b).
It is good practice to set all elements of b to a small positive value such as 0.1. This makes it likely that the ReLU will initially be active for most training samples and allow the derivatives to pass through.

RLU Vs Other Activations:


 Sigmoid and tanh activation functions cannot be used in Networks with many Layers due to the vanishing Gradient problem.
 ReLU overcomes the vanishing Gradient problem, allowing models to learn faster and
perform better.
 ReLU is the default activation function with MLP and CNN

One drawback to rectified linear units is that they cannot learn via gradient based
methods on examples for which their activation is zero.
Three generalizations of Rectified Linear Units are based on using a non-zero slope αi when zi < 0:
hi = g(z, α)i = max(0, zi) + αi min(0, zi)
1. Absolute value rectification fixes αi = -1 to obtain g(z) = |z|. It is used for object recognition from Images.
2. A leaky ReLU fixes αi to a small value like 0.01.
3. A parametric ReLU treats αi as a learnable parameter; a short code sketch of all three follows below.
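
A small sketch of these variants, written directly from the formula above (the α values used are the ones mentioned in the text):

import numpy as np

def generalized_relu(z, alpha):
    # h_i = g(z, alpha)_i = max(0, z_i) + alpha * min(0, z_i)
    return np.maximum(0.0, z) + alpha * np.minimum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 1.5])

relu = generalized_relu(z, alpha=0.0)                 # standard ReLU
abs_rectification = generalized_relu(z, alpha=-1.0)   # absolute value rectification, g(z) = |z|
leaky = generalized_relu(z, alpha=0.01)               # leaky ReLU with a small fixed slope
# A parametric ReLU uses the same formula but treats alpha as a learnable parameter.

print(relu, abs_rectification, leaky)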

Architecture Design:
The architecture of a neural network is the structure of interconnected nodes, called
neurons, that are organized in layers. The design of a neural network architecture is
important because it determines how the network functions and learns.

When designing a neural network architecture, you can consider things like:
 Problem: Understand the problem you're trying to solve
 Model objectives: Define what you want the model to do
 Network type: Choose the type of network you want to use
 Model complexity: Consider how complex the model should be
 Layers and units: Decide how many layers and units the network should have
 Activation functions: Choose the activation functions to use
 Regularization and dropout: Decide how to use regularization and dropout
 Optimization algorithm and learning rate: Select an optimization algorithm and
learning rate
Some types of neural network architectures include:
 Convolutional Neural Networks (CNNs): Used for image processing and analysis,
such as image classification and facial recognition
 Recurrent Neural Networks (RNNs): Used for processing sequential data, such as
time-series data
 Generative Adversarial Networks (GANs): Used for generative tasks, where the
network automatically learns to generate new data that resembles the original dataset
Other types of neural network models include: Feedforward Neural Network, Long
Short-Term Memory (LSTM) Network, Gated Recurrent Unit (GRU) Network, Auto
encoder, and Radial Basis Function Network (RBFN).

Architecture Terminology
The word architecture refers to the overall structure of the Network:
 How many units should it have?
 How should the units be connected to each other?
 Most Neural Networks are organized into groups of units called Layers.
 Most Neural Network architectures arrange these Layers in a chain structure, with each Layer being a function of the Layer that preceded it.

Main Architectural Considerations


1. Choice of depth of Network
2. Choice of width of each Layer
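
As a sketch of these two choices, the depth of the Network and the width of each Layer can be written down as a list of Layer sizes from which the weight matrices of a fully connected Network are built (the sizes below are arbitrary assumptions):

import numpy as np

rng = np.random.default_rng(0)

layer_sizes = [8, 16, 16, 1]             # width of each Layer; the number of weight matrices is the depth
weights = [rng.normal(size=(n_out, n_in)) * 0.1
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    h = x
    for W, b in zip(weights, biases):
        # chain structure: each Layer is a function of the Layer that preceded it
        h = np.maximum(0.0, W @ h + b)   # ReLU applied to every Layer for simplicity
    return h

print(forward(rng.random(8)))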

[Figures: Generic Neural Architectures (1-11), omitted]



Back Propagation And Other Differentiation Algorithms
Back Propagation is an algorithm that propagates the errors backward, from the Output nodes to the Input nodes. It is therefore simply referred to as the backward propagation of errors. It is used in a vast range of Neural Network applications in data mining, such as character recognition and signature verification.

Neural Network:
Neural Networks are an information processing paradigm inspired by the human nervous system. Just as the human nervous system has biological Neurons, Neural Networks have artificial Neurons: mathematical functions derived from biological Neurons.

The human brain is estimated to have about 10 billion Neurons, each connected to an average of 10,000 other Neurons. Each Neuron receives signals through synapses, which control the effect of the signal on the Neuron.

Back Propagation:

Back Propagation is a widely used algorithm for training Feed-Forward Neural Networks. It computes the Gradient of the loss function with respect to the Network weights. It is far more efficient than naively computing the Gradient with respect to each weight individually.

This efficiency makes it possible to use Gradient methods to train multi-Layer Networks and update the weights to minimize the loss; variants such as Gradient descent or stochastic Gradient descent are often used.

The back propagation algorithm works by computing the Gradient of the loss function
with respect to each weight via the chain rule, computing the Gradient Layer by Layer, and
iterating backward from the last Layer to avoid redundant computation of intermediate terms
in the chain rule.



Working of Back Propagation:
Neural Networks use supervised learning to generate Output vectors from the Input vectors that the Network operates on. The Network compares the generated Output with the desired Output and produces an error report if the two do not match. It then adjusts the weights according to this error report to obtain the desired Output.
Features of Back Propagation:
1. It is the Gradient descent method as used in the case of a simple perceptron Network with differentiable units.
2. It differs from other Networks in the process by which the weights are calculated during the learning period of the Network.
3. Training is done in three stages:
 the Feed-Forward of the Input training pattern,
 the calculation and Back Propagation of the error, and
 the updating of the weights.

Algorithm of Back Propagation:

Step 1: Inputs X arrive through the preconnected path.


Step 2: The Input is modeled using true weights W. Weights are usually chosen randomly.
Step 3: Calculate the Output of each neuron from the Input Layer to the Hidden Layer to the
Output Layer.
Step 4: Calculate the error in the Outputs
Back propagation Error= Actual Output – Desired Output
Step 5: From the Output Layer, go back to the Hidden Layer to adjust the weights to reduce
the error.
Step 6: Repeat the process until the desired Output is achieved.

Parameters :
 x = Input training vector x = (x1, x2, …, xn).
 t = target Output vector t = (t1, t2, …, tm).
 δk = error term at Output unit k.
 δj = error term at Hidden unit j.
 α = Learning Rate.
 v0j = bias on Hidden unit j.
 w0k = bias on Output unit k.
Training Algorithm :

Step 1: Initialize the weights to small random values.
Step 2: While the stopping condition is false, do Steps 3 to 9.
Step 3: For each training pair, do Steps 4 to 8 (Feed-Forward and Back Propagation).
Step 4: Each Input unit xi (i = 1 to n) receives the Input signal and transmits the signal xi to all the Hidden units.
Step 5: Each Hidden unit zj (j = 1 to a) sums its weighted Input signals to calculate its net Input
z_inj = v0j + Σ xi vij   (i = 1 to n)
applies the activation function
zj = f(z_inj)
and sends this signal to all units in the Layer above, i.e., the Output units.
Each Output unit yk (k = 1 to m) sums its weighted Input signals
y_ink = w0k + Σ zj wjk   (j = 1 to a)
and applies its activation function to calculate the Output signal
yk = f(y_ink)

Back Propagation Error :

Step 6: Each Output unit yk (k = 1 to m) receives a target pattern corresponding to the Input training pattern, and the error information term is calculated as
δk = (tk – yk) f'(y_ink)
Step 7: Each Hidden unit zj (j = 1 to a) sums its delta Inputs from the units in the Layer above
δ_inj = Σ δk wjk   (k = 1 to m)
and the error information term is calculated as
δj = δ_inj f'(z_inj)

Updation of Weight and Bias :

Step 8: Each Output unit yk (k = 1 to m) updates its bias and weights (j = 0 to a).
The weight correction term is Δwjk = α δk zj, and the bias correction term is Δw0k = α δk.
Therefore wjk(new) = wjk(old) + Δwjk
and w0k(new) = w0k(old) + Δw0k.
Each Hidden unit zj (j = 1 to a) updates its bias and weights (i = 0 to n).
The weight correction term is Δvij = α δj xi, and the bias correction term is Δv0j = α δj.
Therefore vij(new) = vij(old) + Δvij
and v0j(new) = v0j(old) + Δv0j.



Step 9: Test the stopping condition. The stopping condition can be the minimization of the error or a fixed number of epochs.
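
The training algorithm above, for a single Hidden Layer with sigmoid activations, can be sketched in NumPy as follows (the Layer sizes, data and learning rate are made-up assumptions; the update rules follow Steps 4 to 8):

import numpy as np

def f(s):
    # sigmoid activation; f'(s) = f(s) * (1 - f(s))
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(0)
X = rng.random((50, 3))                          # 50 training vectors, n = 3 Inputs
T = (X.sum(axis=1) > 1.5).astype(float)          # made-up binary targets

V = rng.normal(size=(3, 4)) * 0.5                # Input -> Hidden weights v_ij (a = 4 Hidden units)
v0 = np.zeros(4)                                 # Hidden biases v_0j
W = rng.normal(size=(4, 1)) * 0.5                # Hidden -> Output weights w_jk (m = 1 Output unit)
w0 = np.zeros(1)                                 # Output bias w_0k
alpha = 0.5                                      # Learning Rate

for epoch in range(200):                         # Step 2: repeat until the stopping condition
    for x, t in zip(X, T):                       # Step 3: for each training pair
        # Feed-Forward (Steps 4-5)
        z = f(v0 + x @ V)
        y = f(w0 + z @ W)
        # Back Propagation of error (Steps 6-7)
        delta_k = (t - y) * y * (1 - y)
        delta_j = (W @ delta_k) * z * (1 - z)
        # Updating of weights and biases (Step 8)
        W += alpha * np.outer(z, delta_k)
        w0 += alpha * delta_k
        V += alpha * np.outer(x, delta_j)
        v0 += alpha * delta_j

# Step 9: in practice, stop when the error is small enough or after a fixed number of epochs
predictions = f(w0 + f(v0 + X @ V) @ W).ravel()
print(np.mean((T - predictions) ** 2))           # the error shrinks as training proceeds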

Types of Back Propagation:


There are two types of Back Propagation Networks.
1. Static Back Propagation: Static Back Propagation is used in Networks designed to map static Inputs to static Outputs. Such Networks can solve static classification problems such as OCR (Optical Character Recognition).
2. Recurrent Back Propagation: Recurrent Back Propagation is used in Networks for fixed-point learning. Activation in recurrent Back Propagation is fed forward until a fixed value is reached. Static Back Propagation provides an instant mapping, while recurrent Back Propagation does not.
Advantages:
 It is simple, fast, and easy to program.
 There are no parameters to tune apart from the number of Inputs.
 It is flexible and efficient.
 Users do not need to learn any special functions.
Disadvantages:
 It is sensitive to noisy data and irregularities; noisy data can lead to inaccurate results.
 Performance is highly dependent on the Input data.
 Training can take a large amount of time.
 A matrix-based approach is preferred over a mini-batch approach.

