
UNIT-1

BACK PROPAGATION NETWORKS


Introduction
Artificial Intelligence (AI)
• AI enables machines to think without any human intervention. Artificial
intelligence is the simulation of human intelligence processes by machines,
especially computer systems.
• Specific applications of AI include expert systems, natural language
processing, speech recognition and machine vision. It is a broad area of computer
science.
Machine Learning (ML)
• ML is a subset of AI that uses statistical learning algorithms to build smart
systems. The ML systems can automatically learn and improve without
explicitly being programmed.
• The recommendation systems on music and video streaming services are
examples of ML.
• The machine learning algorithms are classified into three categories: supervised,
unsupervised and reinforcement learning.
Deep Learning (DL)
• Deep learning is a type of machine learning and artificial intelligence (AI) that imitates
the way humans gain certain types of knowledge.
• This subset of AI is a technique that is inspired by the way a human brain filters
information. It is associated with learning from examples.
• DL systems help a computer model filter the input data through layers to predict
and classify information. Deep learning processes information in a way loosely
modelled on the human brain.
• It is used in technologies such as driver-less cars. DL network architectures are classified
into Convolutional Neural Networks, Recurrent Neural Networks, and Recursive Neural
Networks.

Deep Learning Process


Difference between AI, ML & DL

Definition
• AI is basically the study/process which enables machines to mimic human behaviour through particular algorithms.
• ML is the study that uses statistical methods enabling machines to improve with experience.
• DL is the study that makes use of neural networks (similar to neurons present in the human brain) to imitate functionality just like a human brain.

Scope
• AI is the broader family consisting of ML and DL as its components.
• ML is the subset of AI.
• DL is the subset of ML.

Nature
• AI is a computer algorithm which exhibits intelligence through decision making.
• ML is an AI algorithm which allows systems to learn from data.
• DL is an ML algorithm that uses deep (more than one layer) neural networks to analyze data and provide output accordingly.

Approach
• AI involves search trees and much complex math.
• If you have a clear idea about the logic (math) involved and you can visualize the complex functionalities like K-Means, Support Vector Machines, etc., then it defines the ML aspect.
• If you are clear about the math involved but don't have an idea about the features, and so break the complex functionalities into linear/lower-dimension features by adding more layers, then it defines the DL aspect.

Aim
• The aim of AI is basically to increase the chances of success, not accuracy.
• The aim of ML is to increase accuracy, not caring much about the success ratio.
• DL attains the highest rank in terms of accuracy when it is trained with a large amount of data.

Categories
• Three broad categories/types of AI are: Artificial Narrow Intelligence (ANI), Artificial General Intelligence (AGI) and Artificial Super Intelligence (ASI).
• Three broad categories/types of ML are: Supervised Learning, Unsupervised Learning and Reinforcement Learning.
• DL can be considered as neural networks with a large number of parameters and layers, lying in one of the four fundamental network architectures: Unsupervised Pre-trained Networks, Convolutional Neural Networks, Recurrent Neural Networks and Recursive Neural Networks.

Efficiency
• The efficiency of AI is basically the efficiency provided by ML and DL respectively.
• ML is less efficient than DL as it can't work with higher dimensions or larger amounts of data.
• DL is more powerful than ML as it can easily work with larger sets of data.

Examples
• AI applications: Google's AI-powered predictions, ridesharing apps like Uber and Lyft, commercial flights using an AI autopilot, etc.
• ML applications: virtual personal assistants (Siri, Alexa, Google, etc.), email spam and malware filtering.
• DL applications: sentiment-based news aggregation, image analysis and caption generation, etc.
Difference between ML & DL
• Data: ML performs well on small to medium datasets; DL performs well on large datasets.
• Hardware: ML is able to function on a CPU; DL requires significant computing power, e.g., a GPU.
• Features: In ML, features need to be manually identified; DL learns features automatically.
• Training time: ML is quick to train; DL is computationally intensive.
Machine Learning algorithms
ML algorithms can be broadly classified into three categories
1. Supervised,
2. Unsupervised and
3. Reinforcement learning.
Supervised Learning
• In supervised learning we have input variables (x) and an output variable (Y) and we use an
algorithm to learn the mapping from input to output.
• In other words, a supervised learning algorithm takes a known set of input dataset and its known
responses to the data (output) to learn the regression/classification model.
• A learning algorithm then trains a model to generate a prediction for the response to new data or the
test datasets.
• Supervised learning, also known as supervised machine learning, is a subcategory of machine
learning and artificial intelligence.
• It is defined by its use of labeled datasets to train algorithms to classify data or predict
outcomes accurately.
• There is a relationship between two or more variables i.e., a change in one variable is associated
with a change in the other variable. For example, salary based on work experience or weight based on
height.
Supervised learning deals with two main tasks: regression and classification.
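
To make this concrete, here is a minimal sketch (not part of the original notes) of a supervised regression model, assuming scikit-learn is available; the salary/experience numbers are made up for illustration:

```python
# Hypothetical sketch: supervised regression (salary predicted from experience)
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up labeled data: years of experience (x) and known salaries (Y)
X = np.array([[1], [2], [3], [5], [8]])
y = np.array([30000, 35000, 42000, 55000, 72000])

model = LinearRegression().fit(X, y)   # learn the mapping from input to output
print(model.predict([[4]]))            # predict the response for new, unseen data
```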
Unsupervised Learning
• Unsupervised Learning is used when we do not have labelled data.
• Its main focus is to learn more about the data by inferring patterns in the dataset
without reference to the known outputs.
• It is called unsupervised because the algorithms are left on their own to group the
unsorted information by finding similarities, differences and patterns in the data.
• Unsupervised learning is mostly performed as a part of exploratory data analysis. It is
most commonly used to find clusters of data and for dimensionality reduction.
• Unsupervised learning, also known as unsupervised machine learning, uses machine
learning algorithms to analyze and cluster unlabeled datasets.
• These algorithms discover hidden patterns or data groupings without the need for
human intervention.
Unsupervised learning deals with clustering and association rule mining problems.
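
As an illustration, a minimal clustering sketch, assuming scikit-learn is available (the data is synthetic):

```python
# Hypothetical sketch: unsupervised clustering of unlabeled data with k-means
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two loose, unlabeled groups of 2-D points
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10).fit(X)   # groups found without labels
print(kmeans.labels_[:10])                        # inferred cluster of each point
print(kmeans.cluster_centers_)                    # discovered cluster centres
```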
Reinforcement Learning
• Reinforcement learning can be explained as learning by continuously interacting with
the environment.
• It is a type of machine learning algorithm in which an agent learns from an
interactive environment in a trial and error way by continuously using feedback
from its previous actions and experiences.
• Reinforcement learning uses rewards and punishments: the agent receives rewards
for performing correct actions and penalties for incorrect ones.
• Reinforcement Learning is a type of machine learning algorithm that learns to solve a
multi-level problem by trial and error.
• The machine is trained on real-life scenarios to make a sequence of decisions. It
receives either rewards or penalties for the actions it performs.
• Its goal is to maximize the total reward.
Reinforcement learning deals with the exploration-exploitation trade-off, Markov decision
processes, policy learning, deep reinforcement learning and value learning.
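
A toy sketch of the reward-and-penalty loop described above, using tabular Q-learning on a hypothetical 5-state chain environment (all names and numbers are illustrative):

```python
# Hypothetical sketch: tabular Q-learning on a 5-state chain environment.
# The agent starts in state 0; reaching state 4 pays +1, every move costs 0.01.
import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))   # action values, learned by trial and error
alpha, gamma, epsilon = 0.1, 0.9, 0.2

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # explore with probability epsilon, otherwise exploit current knowledge
        if np.random.rand() < epsilon:
            a = np.random.randint(n_actions)
        else:
            a = int(Q[s].argmax())
        s_next = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
        r = 1.0 if s_next == n_states - 1 else -0.01     # reward or penalty feedback
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])  # learn from feedback
        s = s_next

print(Q.argmax(axis=1))   # learned policy: "move right" in every non-terminal state
```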
Neural Networks
• A Neural Network or Neural Net is a system of interconnected processing units called
neurons.
• Artificial Neural Networks (ANN) or Neural Networks is an integral part of Artificial
Intelligence and the foundation of Deep Learning.
• ANN is the computational architecture consisting of neurons that mathematically
represent how a biological neural network operates to identify and recognize
relationships within the data.
• Essentially, neural networks are non-linear machine learning models, which can be
used for both supervised and unsupervised learning.
• Neural networks are also seen as a set of algorithms, which are modeled loosely based
on the human brain and are built to identify patterns.
Basics of Neural Networks

Neural Network Types


A perceptron is a neural network that performs computations to track features in the
input data. It connects artificial neurons using simple logic-gate-like units with binary
outputs. There are two types of perceptron models:
1. Single Layer Perceptron
2. Multi-Layered Perceptron
An artificial neuron
An artificial neuron is a mathematical function based on a model of biological neurons,
where each neuron takes inputs, weighs them separately, sums them up and passes this
sum through a nonlinear function to produce output.
Characteristics:
❖ A neuron is a mathematical function modelled on the working of biological neurons
❖ It is an elementary unit in an artificial neural network
❖ One or more inputs are separately weighted
❖ Inputs are summed and passed through a nonlinear function to produce output
❖ Every neuron holds an internal state called the activation signal
❖ Each connection link carries information about the input signal
❖ Every neuron is connected to another neuron via a connection link


Single Layer Perceptron model
• This is one of the simplest types of artificial neural network (ANN). A single-layered perceptron model
consists of a feed-forward network and includes a threshold transfer function inside the model.
• The main objective of the single-layer perceptron model is to analyse linearly separable objects
with binary outcomes.
• In a single-layer perceptron model, the algorithm has no prior recorded data, so it begins with
randomly allocated values for the weight parameters.
• It then computes the weighted sum of all inputs. If the total sum exceeds a pre-determined
threshold, the model is activated and shows the output value as +1.
• If the output matches the desired or threshold value, the performance of the model is considered
satisfactory, and the weights are not changed.
• However, errors arise when the model is fed input patterns whose weighted sums fall on the
wrong side of the threshold.
• Hence, to obtain the desired output and minimize errors, the input weights must be adjusted.
A single-layer perceptron can learn only linearly separable patterns, as the sketch below illustrates.
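
A minimal sketch of this training procedure (illustrative, not from the notes): a perceptron learning the linearly separable AND function by adjusting weights only when the output is wrong:

```python
# Hypothetical sketch: a single-layer perceptron learning the AND function
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 0, 0, 1])                   # AND is linearly separable

rng = np.random.default_rng(0)
w = rng.uniform(-0.1, 0.1, 2)                # randomly allocated initial weights
b, lr = 0.0, 0.1

for epoch in range(20):
    for xi, ti in zip(X, t):
        y = 1 if xi @ w + b > 0 else 0       # threshold transfer function
        if y != ti:                          # adjust weights only when output is wrong
            w += lr * (ti - y) * xi
            b += lr * (ti - y)

print([1 if xi @ w + b > 0 else 0 for xi in X])   # -> [0, 0, 0, 1]
```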

Fig: Multi-layer Perceptron model

Multi-Layered Perceptron model
• Like a single-layer perceptron model, a multi-layer perceptron model has the same basic
structure but a greater number of hidden layers.
• The multi-layer perceptron model is trained with the backpropagation algorithm, which executes
in two stages as follows:
1. Forward Stage: activation propagates from the input layer through the hidden layers and
terminates at the output layer.
2. Backward Stage: weight and bias values are modified as per the model's requirement; the
error between the actual and desired output is propagated backward from the output layer.
A multilayer perceptron model has greater processing power and can process both linear and
non-linear patterns. Further, it can also implement logic gates such as AND, OR, XOR, XNOR,
and NOR, as the sketch below shows for XOR.
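
To see the extra power hidden layers provide, here is a sketch using scikit-learn's MLPClassifier (an assumption; the notes do not prescribe a library) on XOR, which no single-layer perceptron can learn:

```python
# Hypothetical sketch: a multi-layer perceptron learning XOR, which is not
# linearly separable and therefore out of reach for a single-layer perceptron
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 1, 1, 0])                       # XOR truth table

# One hidden layer of 4 tanh units is enough to carve out the XOR regions
clf = MLPClassifier(hidden_layer_sizes=(4,), activation='tanh',
                    solver='lbfgs', max_iter=2000, random_state=1)
clf.fit(X, t)
print(clf.predict(X))                            # expected: [0 1 1 0]
```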
Neural Network Architecture
How Neural Networks work?
Neural Networks are complex systems built from artificial neurons.
The neurons receive many inputs and produce a single output.
Neural networks are composed of layers of neurons.
Artificial neurons or perceptron consist of:
∙ Input
∙ Weight
∙ Bias
∙ Activation Function
∙ Output
•These layers consist of the following:
Input layer
∙ The input layer receives data represented by a numeric value. Hidden layers perform
most of the computations required by the network. Finally, the output layer predicts the
output.
∙ Each layer is made of neurons. Once the input layer receives data, it is redirected to the
hidden layer. Each input is assigned a weight.
Weight
• A weight is a value in a neural network that transforms input data within the network's
hidden layers. Each input arriving from the input layer is multiplied by its weight value.
Let's assume the input to be a, and the associated weight to be W1. Then after
passing through the node the input becomes a*W1.
Bias – In addition to the weights, another linear component, called the bias, is applied to the input. It is
added to the result of multiplying the weight by the input, and basically shifts the range of the
weight-multiplied input. After adding the bias, the result would look like a*W1+bias. This is the final
linear component of the input transformation.
Learning rule:
The learning rule is a rule or algorithm which modifies the parameters of the neural network so that a
given input to the network produces a favored output. This learning process typically works by
modifying the weights and thresholds.
Activation Function – Once the linear component is applied to the input, a non-linear
function is applied to it. This is done by applying the activation function to the linear
combination. The activation function translates the input signals to output signals. The
output after application of the activation function would look something like f(a*W1+b)
where f() is the activation function.
Ex:
Suppose we have "n" inputs given as X1 to Xn and corresponding weights
Wk1 to Wkn, and a bias given as bk. Each weight is first multiplied by its
corresponding input, and the products are then added together along with the bias. Let this be called u:
u = Σ w*x + b
The activation function is applied to u, i.e. f(u), and we receive the final output from the
neuron as yk = f(u).
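
This computation is only a few lines of code; a minimal sketch with a sigmoid chosen as the activation function f (the values are illustrative):

```python
# Hypothetical sketch: one artificial neuron, with a sigmoid as f
import numpy as np

def neuron(x, w, b):
    u = np.dot(w, x) + b          # weighted sum plus bias: u = sum(w*x) + b
    return 1 / (1 + np.exp(-u))   # activation function f(u)

x = np.array([0.5, -1.0, 2.0])    # inputs X1..Xn
w = np.array([0.4, 0.3, -0.2])    # weights Wk1..Wkn
b = 0.1                           # bias bk

print(neuron(x, w, b))            # output yk = f(u)
```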
The similarities among the terminologies of biological and artificial neural networks, based on
their functionalities:

Biological Neural Network (BNN) → Artificial Neural Network (ANN)
• Five senses (via which the input is received) → Sources of data (from which the input is collected)
• Dendrites (used to pass the input through) → Wires or connections (to pass the received inputs)
• Neurons/nodes/soma/nucleus/processing unit (carry electrical impulses and transmit the information to other nerve cells) → Neurons/nodes (the unit that consolidates all the information and takes the decision)
• Synapses (how the neurons talk to each other) → Weights or interconnections (transform the input data within the hidden layers of the network)
• Axon (carries nerve impulses away from the cell body; in short, a vehicle to channel the output) → Output (output of the neural network)
Applications of Neural Networks
•Neural networks are effectively applied to several fields to resolve data issues, some examples are listed below.

Facial Recognition
•Neural networks are playing a significant role in facial recognition. Some smartphones can identify the age of a person. This is
based on facial features and visual pattern recognition.

Weather Forecasting
•Neural networks are trained to recognize patterns and identify distinct kinds of weather. Weather forecasting with the
help of neural networks not only predicts the weather but also recognizes the patterns behind it.
Music composition
•Neural networks are mastering patterns in sounds and tunes. These networks train themselves adequately to create new music.
They are also being used in music composition software.

Image processing and Character recognition
•Neural networks can recognize and learn patterns in an image. Image processing is a growing field.
•Image recognition is used in:
∙ Facial recognition.
∙ Cancer cell detection.
∙ Satellite imagery processing for use in defense and agriculture.
Character recognition is helping to detect fraud and national security assessments.
Advantages of Neural Networks
Fault tolerance
•In a neural network, even if a few neurons are not working properly, that would not prevent the
neural networks from generating outputs.
Real-time Operations
•Neural networks can learn synchronously and easily adapt to their changing environments.
Adaptive Learning
•Neural networks can learn how to work on different tasks based on the data given, in order to produce
the right output.
Parallel processing capacity
•Neural networks have the strength and ability to perform multiple jobs simultaneously.
Disadvantages of Neural Networks
Unexplained behavior of the network
•Neural networks provide a solution to a problem, but due to the complexity of the
networks, they don't provide the reasoning behind "why and how" they made the
decisions they made. Therefore, trust in the network may be reduced.
Architectures of Neural Network
The three fundamental network architectures are as listed below:
1. Convolutional Neural Networks
2. Recurrent Neural Networks
3. Recursive Neural Networks
Convolutional Neural Networks:
• CNN is a type of artificial neural network, which is widely used for image/object recognition and
classification. Deep Learning thus recognizes objects in an image by using a CNN.
• Convolutional Neural Network is basically an artificial neural network that is most widely used in the
field of Computer Vision for analyzing and classifying images.
• It is a deep learning algorithm that takes the input image and assigns weights/biases to various aspects
or objects in the image, so that it can differentiate one from the other.
• The hidden layers of a CNN typically consist of convolutional layers, pooling layers, fully connected
layers, and normalization layers.
• The architecture of a ConvNet is analogous to that of the connectivity pattern of Neurons in the Human
Brain and was inspired by the organization of the Visual Cortex.
For example, a CNN model might train to detect cars in images. Cars can be viewed as the
sum of their parts, including the wheels, boot, and windscreen. Each feature of a car equates
to a low-level pattern identified by the neural network, which then combines these parts to
create a high-level pattern.
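
A sketch of the hidden-layer stack described above (convolutional, pooling, normalization, and fully connected layers), written with the Keras API on the assumption that TensorFlow is installed; the sizes are arbitrary illustrations, not a recommended architecture:

```python
# Hypothetical sketch: a small CNN stacking convolutional, pooling,
# normalization, and fully connected layers (32x32 RGB input, 10 classes)
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation='relu', input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),                 # pooling layer
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.BatchNormalization(),           # normalization layer
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),   # fully connected layer
    tf.keras.layers.Dense(10, activation='softmax') # class probabilities
])
model.summary()
```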
Recurrent Neural Networks
• Recurrent Neural Networks is a type of neural network architecture that is
used in sequence prediction problems and is heavily used in the field of
Natural Language Processing.
• A recurrent neural network (RNN) is a deep learning model that is trained to
process and convert a sequential data input into a specific sequential data
output.
• RNNs are called recurrent because they perform the same task for every
element of a sequence, with the output being dependent on the previous
computations.
• Another way to think about RNNs is that they have a “memory” which
captures information about what has been calculated so far.
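
The "memory" idea can be made concrete in a few lines (a sketch; the weight shapes are illustrative): the same weights are applied at every step, and the hidden state h carries what has been computed so far:

```python
# Hypothetical sketch: a plain RNN cell applied to a sequence, step by step
import numpy as np

def rnn_forward(inputs, Wx, Wh, b):
    h = np.zeros(Wh.shape[0])              # hidden state: the network's "memory"
    for x in inputs:                       # the same task for every element...
        h = np.tanh(Wx @ x + Wh @ h + b)   # ...output depends on previous computations
    return h                               # final state summarizes the whole sequence

rng = np.random.default_rng(0)
sequence = [rng.standard_normal(3) for _ in range(5)]   # five 3-dimensional inputs
Wx = rng.standard_normal((4, 3)) * 0.5                  # input-to-hidden weights
Wh = rng.standard_normal((4, 4)) * 0.5                  # hidden-to-hidden weights (reused)
b = np.zeros(4)
print(rnn_forward(sequence, Wx, Wh, b))
```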
Recursive Neural Networks
• “A recursive neural network is a kind of deep neural network created by applying
the same set of weights recursively over a structured input, to produce a
structured prediction over variable-size input structures, or a scalar prediction on
it, by traversing a given structure in topological order.”
•A recursive neural network is more like a hierarchical network where there is really
no time aspect to the input sequence but the input has to be processed
hierarchically in a tree fashion.
•A typical example is learning a parse tree of a sentence by recursively taking the
output of the operation performed on smaller chunks of the text.
• A subset of deep neural networks called recursive neural networks
(RvNNs) are capable of learning organized and detailed data. By
repeatedly using the same set of weights on structured inputs, RvNN
enables you to obtain a structured prediction. Recursive refers to the
neural network's application to its output.
• Recursive neural networks are capable of handling hierarchical data
because of their deep tree-like structure. In a tree structure, parent
nodes are created by joining child nodes. There is a weight matrix for
every child-parent bond, and comparable children share the same
weights. To allow for recursive operations and the use of the same weights,
the number of children for each node in the tree is fixed.
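
A sketch of this recursive combination (illustrative shapes and names): the same weight matrix is applied at every child-parent bond, with a fixed fan-in of two children:

```python
# Hypothetical sketch: a recursive network combining child nodes into parent nodes
import numpy as np

rng = np.random.default_rng(0)
d = 4                                         # dimension of every node vector
W = rng.standard_normal((d, 2 * d)) * 0.1     # one weight matrix, reused at every bond
b = np.zeros(d)

def combine(left, right):
    # Parent representation from two children (fixed fan-in of 2)
    return np.tanh(W @ np.concatenate([left, right]) + b)

# Leaves could be word vectors; the tree ((w1 w2) (w3 w4)) is processed bottom-up
w1, w2, w3, w4 = (rng.standard_normal(d) for _ in range(4))
root = combine(combine(w1, w2), combine(w3, w4))  # traversal in topological order
print(root)                                       # structured prediction at the root
```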
Need for Multilayer Networks
Single-layer networks – limitations:
• They cannot solve linearly inseparable problems; they can only solve linearly
separable problems
• They cannot solve complex problems
• They cannot be used when a large input-output data set is available
• They cannot capture the complex information available in the training pairs
Hence, to overcome the above limitations, we use multi-layer networks.
Multi-Layer Networks
• Any neural network which has at least one layer between the input and output
layers is called a multi-layer network
• Layers present between the input and output layers are called hidden layers
• An input-layer neural unit just collects the inputs and forwards them to the next higher
layer
• Hidden-layer and output-layer neural units process the information fed to them
and produce an appropriate output
• Multi-layer networks provide optimal solutions for arbitrary classification problems
• Multi-layer networks use linear discriminants, where the inputs are non-linear
BACK PROPAGATION NETWORKS (BPN)
• Backpropagation is an algorithm that propagates the errors from the output nodes back to the
input nodes. Therefore, it is simply referred to as the backward propagation of errors.
• It is used in many applications of neural networks in data mining, such as character recognition,
signature verification, etc.
• Backpropagation is the essence of neural network training. It is the method of fine-tuning the
weights of a neural network based on the error rate obtained in the previous epoch (i.e.,
iteration).
• Proper tuning of the weights allows you to reduce error rates and make the model reliable by
increasing its generalization.
• Backpropagation in neural network is a short form for “backward propagation of errors.”
• It is a standard method of training artificial neural networks.
• This method helps calculate the gradient of a loss function with respect to all the weights in the
network.
BACK PROPAGATION NETWORKS (BPN)
• Introduced by Rumelhart, Hinton, & Williams in 1986. BPN is a Multi- layer
Feedforward Network but error is back propagated, Hence the name Back
Propagation Network (BPN).
• It uses Supervised Training process; it has a systematic procedure for training the
network and is used in Error Detection and Correction.
• Generalized Delta Law /Continuous Perceptron Law/ Gradient Descent Law is
used in this network.
• The generalized delta rule minimizes the mean squared error calculated at the
output. The delta law has a faster convergence rate when compared with the
perceptron law.
• It is the extended version of the perceptron training law.
• The main limitation of this law is the local minima problem, which reduces
convergence speed; even so, it converges faster than the perceptron law.
Fig 1: Back Propagation Network
• Figure 1 represents a BPN network architecture. Even though multi-level perceptrons can be
used, BPN is more flexible and efficient.
• In figure 1 the weights between input and the hidden portion is considered as Wij and the
weight between first hidden to the next layer is considered as Vjk.
• This network is valid only for Differential Output functions. The Training process used in
backpropagation involves three stages, which are listed as below:
1. Feedforward of input training pair
2. Calculation and backpropagation of associated error
3. Adjustments of weights
How Backpropagation Algorithm Works
• The Back propagation algorithm in neural network computes the gradient of the loss
function for a single weight by the chain rule.
• It efficiently computes one layer at a time, unlike a naive direct computation. It
computes the gradient, but it does not define how the gradient is used.
• It generalizes the computation in the delta rule.
Features of Backpropagation
• It is the gradient descent method as used in the case of simple perceptron
network with the differentiable unit.
• It is different from other networks in the process by which the
weights are calculated during the learning period of the network.
• Training is done in three stages:
• the feed-forward of the input training pattern
• the calculation and backpropagation of the error
• the updating of the weights
Backpropagation Algorithm

Step 1: Inputs X, arrive through the pre connected path.


Step 2: The input is modeled using true weights W. Weights are usually chosen
randomly.
Step 3: Calculate the output of each neuron from the input layer to the hidden layer
to the output layer.
Step 4: Calculate the error in the outputs
Backpropagation Error= Actual Output – Desired Output
Step 5: From the output layer, go back to the hidden layer to adjust the weights to reduce
the error.
Step 6: Repeat the process until the desired output is achieved.
Parameters:
• x = input training vector x = (x1, x2, …, xn).
• t = target vector t = (t1, t2, …, tn).
• δk = error at output unit k.
• δj = error at hidden unit j.
• α = learning rate.
• v0j = bias of hidden unit j.
• w0k = bias of output unit k.
Training Algorithm
Step 1: Initialize the weights to small random values.
Step 2: While the stopping condition is false, do steps 3 to 10.
Step 3: For each training pair, do steps 4 to 9 (feed-forward).
Step 4: Each input unit receives the input signal xi and transmits it to all the
hidden units.
Step 5: Each hidden unit zj (j=1 to a) sums its weighted input signals to calculate its net
input
zinj = v0j + Σ xi vij (i=1 to n)
Applying the activation function zj = f(zinj), it sends this signal to all units in the layer
above, i.e. the output units.
Each output unit yk (k=1 to m) sums its weighted input signals
yink = w0k + Σ zj wjk (j=1 to a)
and applies its activation function to calculate the output signal
yk = f(yink)
Backpropagation Error
Step 6: Each output unit yk (k=1 to m) receives a target pattern
corresponding to the input pattern, then the error term is calculated as:
δk = (tk – yk) f′(yink)
Step 7: Each hidden unit zj (j=1 to a) sums its delta inputs from all units in the
layer above
δinj = Σ δk wjk (k=1 to m)
The error information term is calculated as:
δj = δinj f′(zinj)
Updating of weights and biases
Step 8: Each output unit yk (k=1 to m) updates its bias and weights (j=0 to a).
The weight correction term is given by:
Δwjk = α δk zj
and the bias correction term is given by
Δw0k = α δk
Therefore wjk(new) = wjk(old) + Δwjk
w0k(new) = w0k(old) + Δw0k
Each hidden unit zj (j=1 to a) updates its bias and weights (i=0 to n).
The weight correction term is
Δvij = α δj xi
and the bias correction term is
Δv0j = α δj
Therefore vij(new) = vij(old) + Δvij
v0j(new) = v0j(old) + Δv0j
Step 9: Test the stopping condition. The stopping condition can be the minimization of error or a fixed number of epochs.
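
The training algorithm above translates almost line for line into code. The sketch below (illustrative; it assumes a sigmoid activation so that f′(u) = f(u)(1 − f(u))) trains a 2-4-1 network on XOR, with variable names following Steps 4-9:

```python
# Hypothetical sketch of Steps 1-9, training a 2-4-1 network on XOR
import numpy as np

def f(u):  return 1.0 / (1.0 + np.exp(-u))   # sigmoid activation function
def df(u): return f(u) * (1.0 - f(u))        # its derivative f'(u)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
n, a, m, alpha = 2, 4, 1, 0.5                # inputs, hidden units, outputs, learning rate

rng = np.random.default_rng(0)               # Step 1: small random initial weights
v, v0 = rng.uniform(-0.5, 0.5, (n, a)), np.zeros(a)   # input -> hidden weights, biases
w, w0 = rng.uniform(-0.5, 0.5, (a, m)), np.zeros(m)   # hidden -> output weights, biases

for epoch in range(5000):                              # Step 2: until stop condition
    for x, t in zip(X, T):                             # Step 3: for each training pair
        z_in = v0 + x @ v;  z = f(z_in)                # Steps 4-5: hidden layer
        y_in = w0 + z @ w;  y = f(y_in)                # Step 5: output layer
        delta_k = (t - y) * df(y_in)                   # Step 6: output error term
        delta_j = (w @ delta_k) * df(z_in)             # Step 7: hidden error term
        w += alpha * np.outer(z, delta_k); w0 += alpha * delta_k  # Step 8: output updates
        v += alpha * np.outer(x, delta_j); v0 += alpha * delta_j  #         hidden updates

# Outputs approach [0, 1, 1, 0]; exact values depend on the random initialization
print(np.round(f(f(X @ v + v0) @ w + w0), 2).ravel())
```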
BPN Algorithm
The algorithm for BPN is classified into four major steps as follows:
1. Initialization of biases and weights
2. Feedforward process
3. Backpropagation of errors
4. Updating of weights & biases

Algorithm
I. Initialization of weights:
Step 1: Initialize the weights to small random values near zero
Step 2: While the stop condition is false, do steps 3 to 10
Step 3: For each training pair, do steps 4 to 9
II. Feed forward of inputs
Step 4: Each input xi is received and forwarded to the higher (hidden) layer
Step 5: Each hidden unit sums its weighted inputs as follows: Zinj = Woj + Σ xi Wij
• Applying the activation function Zj = f(Zinj)
• This value is passed to the output layer
Step 6: Each output unit sums its weighted inputs
• yink = Vok + Σ Zj Vjk
Applying the activation function
• Yk = f(yink)
III. Backpropagation of Errors
Step 7: δk = (tk – Yk) f′(yink)
δinj = Σ δk Vjk, and δj = δinj f′(Zinj)
IV. Updating of Weights & Biases
Step 8: The weight correction terms are
ΔVjk = α δk Zj (hidden-to-output) and ΔWij = α δj xi (input-to-hidden)
• The bias correction terms are
ΔVok = α δk and ΔWoj = α δj
Step 9: The new weights are
Wij(new) = Wij(old) + ΔWij
Vjk(new) = Vjk(old) + ΔVjk
The new biases are
Woj(new) = Woj(old) + ΔWoj
Vok(new) = Vok(old) + ΔVok
Step 10: Test for the stop condition
Merits
• Has a smooth effect on weight correction
• Computing time is less if the weights are small
• 100 times faster than the perceptron model
• Has a systematic weight-updating procedure
• Backpropagation is fast, simple, and easy to program
• It has no parameters to tune apart from the number of inputs
• It is a flexible method, as it does not require prior knowledge about the network
• It is a standard method that generally works well
• It does not need any special mention of the features of the function to be learned
Demerits
• The learning phase requires intensive calculations
• Selection of the number of hidden-layer neurons is an issue
• Selection of the number of hidden layers is also an issue
• The network can get trapped in local minima
• Temporal instability
• Network paralysis
• Training time is longer for complex problems
• It is sensitive to noisy data and irregularities; noisy data can lead to inaccurate results
• Performance is highly dependent on the input data
• Training can consume too much time
• The matrix-based approach is preferred over a mini-batch
Need for Backpropagation
• Backpropagation is “backpropagation of errors” and is very useful for training
neural networks.
• It’s fast, easy to implement, and simple.
• Backpropagation does not require any parameters to be set, except the number of
inputs.
• Backpropagation is a flexible method because no prior knowledge of the network
is required.
Types of Backpropagation
There are two types of backpropagation networks:
1. Static backpropagation: It is one kind of backpropagation network which produces a
mapping of a static input for static output.
• These types of networks are capable of solving static classification problems such as
OCR (Optical Character Recognition).
2. Recurrent backpropagation: Recurrent Back propagation in data mining is fed
forward until a fixed value is achieved. After that, the error is computed and
propagated backward.
• The main difference between both of these methods is: that the mapping is rapid in
static back-propagation while it is non-static in recurrent backpropagation.
• Recurrent backpropagation is another network used for fixed-point learning. Activation
in recurrent backpropagation is feed-forward until a fixed value is reached. Static
backpropagation provides an instant mapping, while recurrent backpropagation does
not provide an instant mapping
Applications of Backpropagation
The applications are:
• The neural network is trained to enunciate each letter of a word and a sentence
• It is used in the field of speech recognition
• It is used in the field of character and face recognition
Gradient Descent
• Gradient descent is by far the most popular optimization strategy used in Machine
Learning and Deep Learning at the moment.
• It is used while training our model, can be combined with every algorithm, and is easy
to understand and implement.
• Gradient measures how much the output of a function changes if we change the inputs
a little.
• We can also think of a gradient as the slope of a function. The higher the gradient, the
steeper the slope and the faster the model learns.
• Gradient descent can be thought of as climbing down to the bottom of a valley, instead of
as climbing up a hill. This is because it is a minimization algorithm that minimizes a
given function.
• Consider a cost surface over the parameters w and b: gradient descent finds the
values of w and b that correspond to the minimum of the cost function.
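
A sketch of that minimization (the data is made up; the cost is mean squared error): compute the gradient of the cost with respect to w and b, then repeatedly step downhill:

```python
# Hypothetical sketch: gradient descent finding w and b that minimize the cost
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0                     # data generated by the "true" w = 2, b = 1

w, b, lr = 0.0, 0.0, 0.05             # start anywhere; lr scales each downhill step
for step in range(2000):
    err = (w * x + b) - y             # prediction error
    grad_w = 2 * np.mean(err * x)     # slope of the cost in the w direction
    grad_b = 2 * np.mean(err)         # slope of the cost in the b direction
    w -= lr * grad_w                  # move against the gradient (down the valley)
    b -= lr * grad_b

print(w, b)                           # converges toward the minimum: w ~ 2, b ~ 1
```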
Types of Gradient Descent
Batch Gradient Descent
• In batch gradient descent, we use the complete dataset available to compute the
gradient of the cost function.
• Batch gradient descent is very slow because we need to calculate the gradient
on the complete dataset to perform just one update, and if the dataset is large
then it will be a difficult task.
1. The cost function is calculated after the initialization of parameters.
2. All the records are read into memory from the disk.
3. After calculating the gradient sum over the whole dataset for one iteration, we move one
step and repeat the process.
Mini-batch Gradient Descent
• It is a widely used algorithm that produces faster and more accurate results. Here, the
dataset is split into small batches of 'n' training examples.
• It is faster because it does not use the complete dataset in each step. In every iteration, we
use a batch of 'n' training examples to compute the gradient of the cost function.
• It reduces the variance of the parameter updates, which can lead to more
stable convergence.
• It can also make use of highly optimized matrix operations that make computing the
gradient very efficient.
Stochastic Gradient Descent
• We use stochastic gradient descent for faster computation. The first step is to
randomize the complete dataset.
• Then, we use only one training example in every iteration to calculate the
gradient of the cost function for updating every parameter.
• It is faster for larger datasets also because it uses only one training example in
each iteration.
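
The three variants differ only in how many examples feed each gradient estimate. In the sketch below (an illustrative linear model on synthetic data), batch_size selects the behaviour: the full dataset gives batch gradient descent, 1 gives stochastic gradient descent, and anything in between gives mini-batch:

```python
# Hypothetical sketch: batch vs. mini-batch vs. stochastic gradient descent
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = X @ np.array([1.5, -2.0, 0.5])          # synthetic linear targets

def gradient_descent(batch_size, lr=0.05, epochs=50):
    w = np.zeros(3)
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)            # randomize the dataset each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            err = X[batch] @ w - y[batch]
            w -= lr * 2 * X[batch].T @ err / len(batch)  # gradient on this batch only
    return w

# batch_size = n -> batch GD; = 1 -> stochastic GD; in between -> mini-batch GD
for bs in (len(X), 32, 1):
    print(bs, gradient_descent(bs))         # each approaches [1.5, -2.0, 0.5]
```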
• This unit covered the basic concepts and working of the backpropagation
algorithm.
• The backpropagation algorithm is the heart of a neural network.
