Unit-2 DL Cse

The document discusses feedforward neural networks. It provides details on their structure where information flows in one direction from input to output. It also describes how to calculate the number of parameters in feedforward neural networks with one or more hidden layers using a generalized formula. Backpropagation is introduced as an important concept for training these networks using gradient descent.

Uploaded by

Sushant Vyas


UNIT-2

Feedforward Neural Networks

A feed-forward neural network is an artificial neural network in which the connections
between nodes do not form a cycle. Its opposite is a recurrent neural network, in which
certain pathways are cycled. The feed-forward model is the simplest form of neural network
because information is processed in one direction only: the data may pass through multiple
hidden nodes, but it always moves forward and never backwards. Such networks can be used
in pattern recognition. This type of organization is also described as bottom-up or top-down.

Learning Parameters of Feedforward Neural Networks:

Mathematically, a feed-forward neural network defines a mapping y = f(x; θ) and learns the
values of the parameters θ that give the best function approximation.

Note: There is also a bias unit in a feed-forward neural network in all the layers except the
output layer.
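
As a minimal sketch of the mapping y = f(x; θ), the Python snippet below treats θ as a list of (weight matrix, bias vector) pairs, one pair per layer transition. The layer sizes, the random initial values, the sample input and the sigmoid activation are illustrative assumptions, not values taken from these notes.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x, theta):
    """Compute y = f(x; theta) by passing x through each layer in order.

    theta is a list of (W, b) pairs; information only ever moves forward.
    """
    a = x
    for W, b in theta:
        a = sigmoid(a @ W + b)
    return a

# Illustrative sizes (assumption): 3 inputs, 4 hidden units, 2 outputs.
layer_sizes = [3, 4, 2]
rng = np.random.default_rng(0)
theta = [
    (rng.normal(size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

x = np.array([0.5, -1.0, 2.0])  # made-up input
y = feed_forward(x, theta)
print(y.shape)
```

Learning then amounts to adjusting the entries of `theta` so that `feed_forward` approximates the target function as closely as possible.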

Let us now use this knowledge to find the number of parameters.

Scenario 1: A feed-forward neural network with just one hidden layer. The numbers of units in
the input, hidden and output layers are 3, 4 and 2 respectively.

Assumptions:

i = number of neurons in input layer

h = number of neurons in hidden layer

o = number of neurons in output layer

From the diagram, we have i = 3, h = 4 and o = 2. Note that the red colored neuron is the
bias for that layer. Each bias of a layer is connected to all the neurons in the next layer
except the bias of the next layer.

Mathematically:
Number of connections between the first and second layer: 3 × 4 = 12, which is nothing but
the product of i and h.

Number of connections between the second and third layer: 4 × 2 = 8, which is nothing but
the product of h and o.

There are connections between layers via bias as well. Number of connections between the
bias of the first layer and the neurons of the second layer (except bias of the second layer):
1 × 4, which is nothing but h.

Number of connections between the bias of the second layer and the neurons of the third
layer: 1 × 2, which is nothing but o.

Summing up all:

3×4+4×2+1×4+1×2

= 12 + 8 + 4 + 2

= 26

Thus, this feed-forward neural network has 26 connections in all and thus will have 26
trainable parameters.

Let us try to generalize using this equation and find a formula.

3×4+4×2+1×4+1×2

=3×4+4×2+4+2

=i×h+h×o+h+o

Thus, the total number of parameters in a feed-forward neural network with one hidden
layer is given by:

(i × h + h × o) + h + o

Since this network is small, it was also possible to count the connections in the
diagram to find the total number. But what if the network has more layers? Let us work through
one more scenario and see whether this formula still works or needs an extension.

Scenario 2: A feed-forward neural network with three hidden layers. The numbers of units in the
input, first hidden, second hidden, third hidden and output layers are 3, 5, 6, 4 and 2
respectively.

Assumptions:

i = number of neurons in input layer

h1 = number of neurons in first hidden layer

h2 = number of neurons in second hidden layer

h3 = number of neurons in third hidden layer

o = number of neurons in output layer

Number of connections between the first and second layer: 3 × 5 = 15, which is nothing but
the product of i and h1.

Number of connections between the second and third layer: 5 × 6 = 30, which is nothing but
the product of h1 and h2.

Number of connections between the third and fourth layer: 6 × 4 = 24, which is nothing but
the product of h2 and h3.

Number of connections between the fourth and fifth layer: 4 × 2= 8, which is nothing but
the product of h3 and o.

Number of connections between the bias of the first layer and the neurons of the second
layer (except bias of the second layer): 1 × 5 = 5, which is nothing but h1.

Number of connections between the bias of the second layer and the neurons of the third
layer: 1 × 6 = 6, which is nothing but h2.

Number of connections between the bias of the third layer and the neurons of the fourth
layer: 1 × 4 = 4, which is nothing but h3.

Number of connections between the bias of the fourth layer and the neurons of the fifth
layer: 1 × 2 = 2, which is nothing but o.

Summing up all:

3×5+5×6+6×4+4×2+1×5+1×6+1×4+1×2

= 15 + 30 + 24 + 8 + 5 + 6 + 4 + 2

= 94

Thus, this feed-forward neural network has 94 connections in all and thus 94 trainable
parameters.

Let us try to generalize using this equation and find a formula.

3×5+5×6+6×4+4×2+1×5+1×6+1×4+1×2

=3×5+5×6+6×4+4×2+5+6+4+2

= i × h1 + h1 × h2 + h2 × h3+ h3 × o + h1 + h2 + h3+ o

Thus, the total number of parameters in a feed-forward neural network with three hidden
layers is given by:

(i × h1 + h1 × h2 + h2 × h3 + h3 × o) + h1 + h2 + h3+ o

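Thus, the formula to find the total number of trainable parameters in a feed-forward neural
network with n hidden layers is given by:

$$\left(i \times h_1 + \sum_{k=1}^{n-1} h_k \times h_{k+1} + h_n \times o\right) + \sum_{k=1}^{n} h_k + o$$

With n = 1 this reduces to (i × h + h × o) + h + o, and with n = 3 it reduces to the
three-hidden-layer formula above.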
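
The counting rule can also be sketched in Python for any number of hidden layers: add the products of consecutive layer sizes (the weights), then add one bias connection for every neuron outside the input layer. The function name `count_parameters` is a hypothetical helper introduced here for illustration.

```python
def count_parameters(layer_sizes):
    """Count trainable parameters of a fully connected feed-forward network.

    layer_sizes lists the number of neurons per layer, input first and
    output last, e.g. [i, h1, ..., hn, o].
    """
    # Weights: one per connection between consecutive layers.
    weights = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
    # Biases: one per neuron in every layer except the input layer.
    biases = sum(layer_sizes[1:])
    return weights + biases

one_hidden = count_parameters([3, 4, 2])          # Scenario with one hidden layer
three_hidden = count_parameters([3, 5, 6, 4, 2])  # Scenario with three hidden layers
print(one_hidden, three_hidden)  # 26 94
```

Both results agree with the counts obtained by hand in the two scenarios above.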

Backpropagation Gradient Descent (GD):

Backpropagation is one of the most important concepts in neural networks. Our task is to classify
our data as well as possible. For this, we have to update the weights and biases of the network.
In a linear regression model, we use gradient descent to optimize the parameters. Similarly, here we
use gradient descent, with the gradients computed by backpropagation. Backpropagation algorithms
are a set of methods used to efficiently train artificial neural networks following a gradient
descent approach that exploits the chain rule. The main feature of backpropagation is the
iterative, recursive and efficient method through which it calculates the updated weights,
improving the network until it is able to perform the task for which it is being trained.

Now, how is the error function used in backpropagation, and how does backpropagation work?
Consider a network with two inputs (x1, x2), two hidden neurons (H1, H2) and two output
neurons (y1, y2), with the following values.

Input values
x1=0.05
x2=0.10

Initial weights
w1=0.15 w5=0.40
w2=0.20 w6=0.45
w3=0.25 w7=0.50
w4=0.30 w8=0.55

Bias values
b1=0.35 b2=0.60

Target values
T1=0.01
T2=0.99

Now, we first calculate the values of H1 and H2 by a forward pass.

Forward Pass

To find the value of H1, we first multiply the input values by the weights and add the bias:

H1=x1×w1+x2×w2+b1
H1=0.05×0.15+0.10×0.20+0.35
H1=0.3775

To calculate the final result of H1, we apply the sigmoid function:

$$H1_{\text{final}} = \frac{1}{1 + e^{-H1}} = \frac{1}{1 + e^{-0.3775}} = 0.593269992$$

We will calculate the value of H2 in the same way as H1

H2=x1×w3+x2×w4+b1
H2=0.05×0.25+0.10×0.30+0.35
H2=0.3925

To calculate the final result of H2, we apply the sigmoid function:

$$H2_{\text{final}} = \frac{1}{1 + e^{-H2}} = \frac{1}{1 + e^{-0.3925}} = 0.596884378$$

Now, we calculate the values of y1 and y2 in the same way as we calculated H1 and H2.

To find the value of y1, we multiply the inputs to the output layer, i.e., the final values of
H1 and H2, by the weights and add the bias:

y1=H1final×w5+H2final×w6+b2
y1=0.593269992×0.40+0.596884378×0.45+0.60
y1=1.10590597

To calculate the final result of y1, we apply the sigmoid function:

$$y1_{\text{final}} = \frac{1}{1 + e^{-y1}} = \frac{1}{1 + e^{-1.10590597}} = 0.75136507$$

We will calculate the value of y2 in the same way as y1

y2=H1final×w7+H2final×w8+b2
y2=0.593269992×0.50+0.596884378×0.55+0.60
y2=1.2249214

To calculate the final result of y2, we apply the sigmoid function:

$$y2_{\text{final}} = \frac{1}{1 + e^{-y2}} = \frac{1}{1 + e^{-1.2249214}} = 0.772928465$$

Our target values are 0.01 and 0.99, so the final values of y1 and y2 do not match the
target values T1 and T2.
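
The forward pass can be reproduced with a few lines of Python. This is a small sketch that uses the inputs, weights and biases listed above; the variable names with the `_final` suffix are chosen here to denote values after the sigmoid.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Values from the worked example.
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60

# Hidden layer: weighted sum, then sigmoid.
H1 = x1 * w1 + x2 * w2 + b1
H2 = x1 * w3 + x2 * w4 + b1
H1_final, H2_final = sigmoid(H1), sigmoid(H2)

# Output layer: the hidden outputs are the inputs here.
y1 = H1_final * w5 + H2_final * w6 + b2
y2 = H1_final * w7 + H2_final * w8 + b2
y1_final, y2_final = sigmoid(y1), sigmoid(y2)

print(round(H1_final, 9), round(H2_final, 9))
print(round(y1_final, 9), round(y2_final, 9))
```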

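Now, we will find the total error, which is half the sum of the squared differences between
the target outputs and the actual outputs. The total error is calculated as

$$E_{\text{total}} = \sum \frac{1}{2}(\text{target} - \text{output})^2$$

So, the total error is

$$E_{\text{total}} = \frac{1}{2}(T1 - y1_{\text{final}})^2 + \frac{1}{2}(T2 - y2_{\text{final}})^2 = \frac{1}{2}(0.01 - 0.75136507)^2 + \frac{1}{2}(0.99 - 0.772928465)^2 = 0.274811083 + 0.023560026 = 0.298371109$$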

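Now, we will backpropagate this error to update the weights using a backward pass.

To update a weight, we calculate the error corresponding to that weight with the help of the
total error. The error on weight w is calculated by differentiating the total error with respect to w:

$$\text{Error}_{w} = \frac{\partial E_{\text{total}}}{\partial w}$$

We work backwards, so first consider the last weight w5:

$$\text{Error}_{w5} = \frac{\partial E_{\text{total}}}{\partial w5} \qquad (1)$$

$$E_{\text{total}} = \frac{1}{2}(T1 - y1_{\text{final}})^2 + \frac{1}{2}(T2 - y2_{\text{final}})^2 \qquad (2)$$

From equation (2), it is clear that we cannot partially differentiate it with respect to
w5 because w5 does not appear in it. We split equation (1) into multiple terms with the chain
rule so that we can differentiate with respect to w5:

$$\frac{\partial E_{\text{total}}}{\partial w5} = \frac{\partial E_{\text{total}}}{\partial y1_{\text{final}}} \times \frac{\partial y1_{\text{final}}}{\partial y1} \times \frac{\partial y1}{\partial w5} \qquad (3)$$

Now, we calculate each term one by one to differentiate E total with respect to w5:

$$\frac{\partial E_{\text{total}}}{\partial y1_{\text{final}}} = -(T1 - y1_{\text{final}}) = 0.75136507 - 0.01 = 0.74136507 \qquad (4)$$

$$\frac{\partial y1_{\text{final}}}{\partial y1} = \frac{e^{-y1}}{(1 + e^{-y1})^2} \qquad (5)$$

Putting the value $e^{-y1} = \frac{1 - y1_{\text{final}}}{y1_{\text{final}}}$ in equation (5):

$$\frac{\partial y1_{\text{final}}}{\partial y1} = y1_{\text{final}} \times (1 - y1_{\text{final}}) = 0.75136507 \times 0.24863493 = 0.186815602$$

Since y1 = H1final × w5 + H2final × w6 + b2:

$$\frac{\partial y1}{\partial w5} = H1_{\text{final}} = 0.593269992$$

So, we put these values in equation (3) to find the final result:

$$\frac{\partial E_{\text{total}}}{\partial w5} = 0.74136507 \times 0.186815602 \times 0.593269992 = 0.082167041$$

Now, we will calculate the updated weight w5new with the help of the following formula, where η
is the learning rate (the updated weights listed in these notes correspond to η = 0.5):

$$w5_{\text{new}} = w5 - \eta \times \frac{\partial E_{\text{total}}}{\partial w5} = 0.40 - 0.5 \times 0.082167041 = 0.35891648$$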

In the same way, we calculate w6new, w7new and w8new, which gives the following values:

w5new=0.35891648
w6new=0.408666186
w7new=0.511301270
w8new=0.561370121

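
The output-layer update can be checked numerically with the sketch below. It recomputes the forward pass, the total error and the four output-layer gradients from the chain rule. The learning rate `eta = 0.5` is an assumption inferred from the listed updated weights, and the `delta` variable names are chosen here for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Values from the worked example; w maps the weight index to its value.
x1, x2 = 0.05, 0.10
T1, T2 = 0.01, 0.99
w = {1: 0.15, 2: 0.20, 3: 0.25, 4: 0.30, 5: 0.40, 6: 0.45, 7: 0.50, 8: 0.55}
b1, b2 = 0.35, 0.60
eta = 0.5  # learning rate (assumed; consistent with the listed new weights)

# Forward pass.
H1_final = sigmoid(x1 * w[1] + x2 * w[2] + b1)
H2_final = sigmoid(x1 * w[3] + x2 * w[4] + b1)
y1_final = sigmoid(H1_final * w[5] + H2_final * w[6] + b2)
y2_final = sigmoid(H1_final * w[7] + H2_final * w[8] + b2)

E_total = 0.5 * (T1 - y1_final) ** 2 + 0.5 * (T2 - y2_final) ** 2

# dE/dy = dE/dy_final * dy_final/dy for each output neuron.
delta1 = (y1_final - T1) * y1_final * (1.0 - y1_final)
delta2 = (y2_final - T2) * y2_final * (1.0 - y2_final)

# dE/dw = delta of the receiving neuron * output of the sending neuron.
grads = {5: delta1 * H1_final, 6: delta1 * H2_final,
         7: delta2 * H1_final, 8: delta2 * H2_final}
new_w = {k: w[k] - eta * g for k, g in grads.items()}

for k in sorted(new_w):
    print(f"w{k}new = {new_w[k]:.9f}")
```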
Backward pass at Hidden layer:

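Now, we will backpropagate to our hidden layer and update the weights w1, w2, w3
and w4, as we have done with the weights w5, w6, w7 and w8.

We will calculate the error at w1 as

$$\text{Error}_{w1} = \frac{\partial E_{\text{total}}}{\partial w1}$$

The expression for E total contains no w1, so we cannot partially differentiate it with respect
to w1 directly. We split the derivative into multiple terms with the chain rule:

$$\frac{\partial E_{\text{total}}}{\partial w1} = \frac{\partial E_{\text{total}}}{\partial H1_{\text{final}}} \times \frac{\partial H1_{\text{final}}}{\partial H1} \times \frac{\partial H1}{\partial w1}$$

Now, we calculate each term one by one. There is no H1final term in E total, so we split the
first factor over the two output errors E1 and E2, where E total = E1 + E2:

$$\frac{\partial E_{\text{total}}}{\partial H1_{\text{final}}} = \frac{\partial E1}{\partial H1_{\text{final}}} + \frac{\partial E2}{\partial H1_{\text{final}}}$$

We split again because E1 and E2 contain no H1final term:

$$\frac{\partial E1}{\partial H1_{\text{final}}} = \frac{\partial E1}{\partial y1} \times \frac{\partial y1}{\partial H1_{\text{final}}}, \qquad \frac{\partial E2}{\partial H1_{\text{final}}} = \frac{\partial E2}{\partial y2} \times \frac{\partial y2}{\partial H1_{\text{final}}}$$

E1 and E2 are written in terms of y1final and y2final rather than y1 and y2, so we split once more:

$$\frac{\partial E1}{\partial y1} = \frac{\partial E1}{\partial y1_{\text{final}}} \times \frac{\partial y1_{\text{final}}}{\partial y1}, \qquad \frac{\partial E2}{\partial y2} = \frac{\partial E2}{\partial y2_{\text{final}}} \times \frac{\partial y2_{\text{final}}}{\partial y2}$$

Now, we put in the values. As in the output layer, the sigmoid derivative written with
$e^{-y}$ simplifies to output × (1 − output):

$$\frac{\partial E1}{\partial y1} = (y1_{\text{final}} - T1) \times y1_{\text{final}}(1 - y1_{\text{final}}) = 0.74136507 \times 0.186815602 = 0.138498562$$

$$\frac{\partial E2}{\partial y2} = (y2_{\text{final}} - T2) \times y2_{\text{final}}(1 - y2_{\text{final}}) = (-0.217071535) \times 0.175510053 = -0.038098236$$

Since y1 = H1final × w5 + H2final × w6 + b2 and y2 = H1final × w7 + H2final × w8 + b2, the
remaining factors are the original (not yet updated) weights w5 = 0.40 and w7 = 0.50:

$$\frac{\partial E1}{\partial H1_{\text{final}}} = 0.138498562 \times 0.40 = 0.055399425, \qquad \frac{\partial E2}{\partial H1_{\text{final}}} = -0.038098236 \times 0.50 \approx -0.019049119$$

$$\frac{\partial E_{\text{total}}}{\partial H1_{\text{final}}} = 0.055399425 + (-0.019049119) = 0.036350306$$

We have the first factor; we still need the other two. For the sigmoid of the hidden neuron:

$$\frac{\partial H1_{\text{final}}}{\partial H1} = H1_{\text{final}} \times (1 - H1_{\text{final}}) = 0.593269992 \times 0.406730008 = 0.241300709$$

We calculate the partial derivative of the total net input to H1 with respect to w1 the
same way as we did for the output neuron. Since H1 = x1 × w1 + x2 × w2 + b1:

$$\frac{\partial H1}{\partial w1} = x1 = 0.05$$

So, we put these values together to find the final result:

$$\frac{\partial E_{\text{total}}}{\partial w1} = 0.036350306 \times 0.241300709 \times 0.05 = 0.000438568$$

Now, we will calculate the updated weight w1new with the help of the following
formula, with learning rate η = 0.5 (the value consistent with the updated weights listed below):

$$w1_{\text{new}} = w1 - \eta \times \frac{\partial E_{\text{total}}}{\partial w1} = 0.15 - 0.5 \times 0.000438568 = 0.149780716$$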

In the same way, we calculate w2new, w3new and w4new, which gives the following
values:

w1new=0.149780716
w2new=0.19956143
w3new=0.24975114
w4new=0.29950229

We have now updated all the weights. The error on the network was 0.298371109 when
we fed forward the inputs 0.05 and 0.1. After the first round of backpropagation, the total error
is down to 0.291027924. After repeating this process 10,000 times, the total error is down to
0.0000351085. At this point, the output neurons generate 0.015912196 and 0.984065734,
i.e., values close to our targets of 0.01 and 0.99, when we feed forward 0.05 and 0.1.
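
The whole procedure (forward pass, backward pass, weight update) can be repeated in a loop. The following NumPy sketch does this for 10,000 rounds. It makes two assumptions that are not stated explicitly in these notes: the learning rate is 0.5, and the biases b1 and b2 are kept fixed while only the eight weights are updated.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.05, 0.10])           # inputs x1, x2
t = np.array([0.01, 0.99])           # targets T1, T2
W_hidden = np.array([[0.15, 0.20],   # w1, w2 feed H1
                     [0.25, 0.30]])  # w3, w4 feed H2
W_out = np.array([[0.40, 0.45],      # w5, w6 feed y1
                  [0.50, 0.55]])     # w7, w8 feed y2
b1, b2 = 0.35, 0.60                  # biases, held fixed (assumption)
eta = 0.5                            # learning rate (assumption)

def forward(W_hidden, W_out):
    h = sigmoid(W_hidden @ x + b1)
    y = sigmoid(W_out @ h + b2)
    return h, y

def total_error(y):
    return 0.5 * np.sum((t - y) ** 2)

errors = []
for _ in range(10_000):
    h, y = forward(W_hidden, W_out)
    errors.append(total_error(y))
    # Backward pass: output deltas first, then hidden deltas,
    # both computed with the weights from before this update.
    delta_out = (y - t) * y * (1.0 - y)
    delta_hidden = (W_out.T @ delta_out) * h * (1.0 - h)
    W_out = W_out - eta * np.outer(delta_out, h)
    W_hidden = W_hidden - eta * np.outer(delta_hidden, x)

h, y = forward(W_hidden, W_out)
final_error = total_error(y)
print(errors[0], errors[1], final_error, y)
```

The first two entries of `errors` correspond to the error before any update and after the first round, so they can be compared with the values quoted above.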

Root Mean Squared Propagation (RMSProp)

Root Mean Squared Propagation, or RMSProp for short, is an extension to the gradient
descent optimization algorithm. It is an unpublished extension, first described in Geoffrey
Hinton’s lecture notes. RMSProp is designed to accelerate the optimization process, e.g.
by decreasing the number of function evaluations required to reach the optimum, or to improve
the capability of the optimization algorithm, e.g. by reaching a better final result. It is related to
another extension to gradient descent called Adaptive Gradient, or AdaGrad: RMSProp
extends AdaGrad to avoid the effect of a monotonically decreasing learning rate. RMSProp
maintains a decaying average of squared gradients. The calculation of the mean squared
partial derivative for one parameter is as follows:

s(t+1) = (s(t) * rho) + (f'(x(t))^2 * (1.0-rho))

Where s(t+1) is the decaying moving average of the squared partial derivative for one
parameter for the current iteration of the algorithm, s(t) is the decaying moving average
squared partial derivative for the previous iteration, f'(x(t))^2 is the squared partial
derivative for the current parameter, and rho is a hyperparameter, typically with the value
of 0.9 like momentum.

Because we take a decaying average of the squared partial derivatives and then the
square root of this average, the technique takes its name from the root mean square (RMS), i.e.,
the square root of the mean squared partial derivatives. Here RMS(s(t+1)) is the square root
of s(t+1). For example, the custom step size for a parameter may be written as:

cust_step_size(t+1) = step_size / (1e-8 + RMS(s(t+1)))

Once we have the custom step size for the parameter, we can update the parameter using
the custom step size and the partial derivative f'(x(t)).

x(t+1) = x(t) - cust_step_size(t+1) * f'(x(t))

This process is then repeated for each input variable until a new point in the search space is
created and can be evaluated.
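
The three update equations translate directly into code. The sketch below applies them to a simple bowl-shaped objective f(x) = sum of x squared, which is an assumption chosen only for illustration, as are the starting point, the step size of 0.01 and the 500 iterations.

```python
import numpy as np

def objective(x):
    """Illustrative objective (assumption): f(x) = sum of x_i^2."""
    return np.sum(x ** 2)

def gradient(x):
    """Partial derivatives of the objective: f'(x) = 2x."""
    return 2.0 * x

def rmsprop(x0, step_size=0.01, rho=0.9, n_iter=500, eps=1e-8):
    x = np.asarray(x0, dtype=float).copy()
    s = np.zeros_like(x)  # decaying average of squared partial derivatives
    for _ in range(n_iter):
        g = gradient(x)
        # s(t+1) = s(t) * rho + f'(x(t))^2 * (1 - rho)
        s = rho * s + (1.0 - rho) * g ** 2
        # cust_step_size(t+1) = step_size / (1e-8 + sqrt(s(t+1)))
        cust_step_size = step_size / (eps + np.sqrt(s))
        # x(t+1) = x(t) - cust_step_size(t+1) * f'(x(t))
        x = x - cust_step_size * g
    return x

x_start = np.array([1.0, -0.8])
x_final = rmsprop(x_start)
print(objective(x_start), objective(x_final))
```

Because `s` and `cust_step_size` are arrays, every parameter gets its own step size, which is the per-parameter behaviour described above.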

RMSProp is a very effective extension of gradient descent and is one of the preferred
approaches generally used to fit deep learning neural networks.

Adam:
The Adam optimization algorithm is an extension to stochastic gradient descent. It is used to
update network weights iteratively based on training data. Adam was presented by Diederik
Kingma from OpenAI and Jimmy Ba. The name Adam is derived from adaptive moment
estimation.

The benefits of using Adam on non-convex optimization problems are as follows:

Straightforward to implement.
Computationally efficient.
Little memory requirements.
Invariant to diagonal rescale of the gradients.
Well suited for problems that are large in terms of data and/or parameters.
Appropriate for non-stationary objectives.
Appropriate for problems with very noisy and/or sparse gradients.
Hyper-parameters have intuitive interpretation and typically require little tuning.

Adam combines the advantages of two other extensions of stochastic gradient
descent. Specifically:

Adaptive Gradient Algorithm (AdaGrad), which maintains a per-parameter learning rate that
improves performance on problems with sparse gradients (e.g. natural language and
computer vision problems).

Root Mean Square Propagation (RMSProp), which also maintains per-parameter learning
rates that are adapted based on the average of recent magnitudes of the gradients for the
weight (e.g. how quickly it is changing). This means the algorithm does well on online and
non-stationary problems (e.g. noisy).
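
These notes do not give Adam's update equations, so the sketch below follows the commonly published formulation: a decaying average of the gradients (first moment), a decaying average of the squared gradients (second moment, as in RMSProp), and a bias correction for both. The objective f(x) = sum of x squared, the starting point, the step size `alpha = 0.01` and the iteration count are assumptions for illustration; `beta1 = 0.9`, `beta2 = 0.999` and `eps = 1e-8` are the usual default values.

```python
import numpy as np

def gradient(x):
    """Gradient of the illustrative objective f(x) = sum of x_i^2 (assumption)."""
    return 2.0 * x

def adam(x0, alpha=0.01, beta1=0.9, beta2=0.999, eps=1e-8, n_iter=1000):
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # first moment: decaying average of gradients
    v = np.zeros_like(x)  # second moment: decaying average of squared gradients
    for t in range(1, n_iter + 1):
        g = gradient(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g ** 2
        # Bias correction compensates for m and v starting at zero.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        # Per-parameter step, scaled by the root of the second moment.
        x = x - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return x

x_start = np.array([1.0, -0.8])
x_final = adam(x_start)
print(x_final)
```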

Weight initialization in neural networks:

Why Weight Initialization?

Its main objective is to prevent layer activation outputs from exploding or vanishing
during forward propagation. If either of these problems occurs, the loss gradients
will be either too large or too small, and the network will take more time to converge, if it is
able to do so at all.

If we initialize the weights correctly, our objective, i.e., the optimization of the loss function,
will be achieved in the least time; otherwise, converging to a minimum using gradient
descent may be impossible.

Different Weight Initialization Techniques

One of the important things to keep in mind while building a neural
network is to initialize the weight matrices for the connections between layers
correctly.

Let us look at the following two initialization scenarios, which can cause issues while training
the model:

Zero Initialization (Initialized all weights to 0)

If we initialize all the weights to 0, the derivative of the loss function is the same for
every weight in W[l], so all weights have the same value in
subsequent iterations. This makes the hidden units symmetric, and the symmetry persists
through all n iterations. Thus, initializing the weights to zero makes the network no better than a
linear model. It is important to note that setting the biases to 0 does not create any problems,
because non-zero weights take care of breaking the symmetry: even if the bias is 0, the values in
every neuron will still be different.
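
The symmetry problem can be seen in a small numerical sketch. The network size (3 inputs, 4 hidden units, 2 outputs), the sigmoid activation, the squared-error loss and the single made-up training sample are all assumptions for illustration. With all weights at zero, every hidden unit receives exactly the same gradient, so the units can never become different from one another.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([[0.5, -1.0, 2.0]])  # one made-up sample with 3 features
t = np.array([[1.0, 0.0]])        # made-up target for 2 outputs

# Zero initialization of all weights and biases.
W1, b1 = np.zeros((3, 4)), np.zeros(4)
W2, b2 = np.zeros((4, 2)), np.zeros(2)

# Forward pass.
h = sigmoid(x @ W1 + b1)          # every hidden unit outputs the same value
y = sigmoid(h @ W2 + b2)

# Backward pass for the squared error 0.5 * sum((t - y)^2).
delta_out = (y - t) * y * (1.0 - y)
dW2 = h.T @ delta_out             # row j: gradients of hidden unit j's outgoing weights
delta_hidden = (delta_out @ W2.T) * h * (1.0 - h)
dW1 = x.T @ delta_hidden          # column j: gradients of hidden unit j's incoming weights

print(dW2)
print(dW1)
```

All rows of `dW2` are identical and all columns of `dW1` are identical (here they are zero), so a gradient step changes every hidden unit in exactly the same way.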

Random Initialization (Initialized weights randomly)

– This technique tries to address the problems of zero initialization: it prevents neurons
from learning the same features of their inputs. Since our goal is to make each neuron learn
a different function of its input, this technique gives much better accuracy than zero
initialization.

– In general, it is used to break the symmetry. It is better to assign random non-zero values
to the weights.

– Remember, neural networks are very sensitive and prone to overfitting, as they quickly
memorize the training data.

“What happens if the randomly initialized weights are very high or very low?”
(a) Vanishing gradients :

During backpropagation, abs(dW) tends to get smaller and smaller as we go backward through
the layers, especially in the case of deep neural networks. In
this case, the weights of the earlier layers are adjusted slowly.

Because of this, the weight updates are minor, which results in slower convergence.

This makes the optimization of our loss function slow. In the worst case,
it may completely stop the neural network from training further.

More specifically, in the case of the sigmoid and tanh activation functions, if the
weights are very large, the gradient will be vanishingly small, effectively preventing the
weights from changing their value. This is because abs(dW) will increase very slightly or
possibly get smaller and smaller after every iteration.

This is where the ReLU activation function helps: with it, vanishing gradients are
generally not a problem, as the gradient is 0 for negative (and zero) inputs and 1
for positive inputs.
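
A rough numerical illustration of this effect: the sigmoid derivative is at most 0.25, so a chain of such factors shrinks very quickly, while the ReLU derivative for a positive input is exactly 1. The depth of 10 layers and the sample inputs are assumptions for illustration, and the weights are left out of the product to keep the sketch simple.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 0 for z <= 0, 1 for z > 0."""
    return 1.0 if z > 0 else 0.0

n_layers = 10                      # illustrative depth (assumption)
peak = sigmoid_grad(0.0)           # largest value the sigmoid derivative can take
saturated = sigmoid_grad(5.0)      # derivative when the neuron is saturated

sigmoid_chain = peak ** n_layers   # best case for a chain of sigmoid derivatives
relu_chain = relu_grad(1.0) ** n_layers

print(peak, saturated, sigmoid_chain, relu_chain)
```

Even in the best case the sigmoid chain is below one millionth after ten layers, whereas the ReLU chain stays at 1 for positive inputs.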

(b) Exploding gradients :

This is the exact opposite of the vanishing gradients case discussed above.

Suppose we have weights that are non-negative and large, with small activations A.
When these weights are multiplied along the different layers, they cause a very large
change in the value of the overall gradient of the cost. This means that the changes in W, given
by the equation W = W - ⍺ * dW, will be in huge steps, and the downward momentum will increase.
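
To make the size of the effect concrete, the sketch below multiplies a gradient by the same weight once per layer and then applies the update W = W - ⍺ * dW. The weight values 1.5 and 0.5, the depth of 20 layers and the learning rate 0.01 are assumptions for illustration, and activation slopes are taken as 1 to isolate the effect of the weights.

```python
def gradient_scale(weight, n_layers):
    """Factor applied to a gradient after passing back through n_layers
    layers that each multiply it by `weight` (activation slopes taken as 1)."""
    return weight ** n_layers

def sgd_step(W, dW, alpha):
    """Plain gradient descent update: W = W - alpha * dW."""
    return W - alpha * dW

large = gradient_scale(1.5, 20)  # weights above 1: the gradient explodes
small = gradient_scale(0.5, 20)  # weights below 1: the gradient vanishes

W_new = sgd_step(W=1.5, dW=large, alpha=0.01)
print(large, small, W_new)
```

Even with a small learning rate, the exploded gradient moves the weight far away from its previous value in a single step.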

Eigen values and Eigen vectors:

Role of Eigen values and Eigen vectors in deep learning
Picking the features that best represent the data and eliminating less useful features is an
example of dimensionality reduction. We can use eigenvalues and eigenvectors to identify the
dimensions that are most useful and prioritize our computational resources toward them.

What is an Eigenvalue?

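Mathematically, the eigenvalue is the number by which the eigenvector is multiplied to
produce the same result as multiplying the matrix by the vector, as shown in
Equation 1.

Equation 1 Ax = λx

where A is a square matrix, λ is the eigenvalue and x is the eigenvector.

Moving everything to one side, with I the identity matrix, gives Equation 2. A non-zero
eigenvector x exists only when the matrix A - λI is singular, which gives Equation 3; solving
Equation 3 for λ yields the eigenvalues. Details of how to calculate the determinant of a
matrix can be found in a linear algebra textbook.

Equation 2 (A - λI)x = 0

Equation 3 det(A - λI) = 0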
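
The three equations can be checked numerically with NumPy's `numpy.linalg.eig`. The 2 × 2 matrix below is a made-up example, not one taken from these notes.

```python
import numpy as np

# Made-up symmetric matrix used only for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eig returns the eigenvalues and a matrix whose columns are the eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

for lam, x in zip(eigenvalues, eigenvectors.T):
    # Equation 1: A x = lambda x
    assert np.allclose(A @ x, lam * x)
    # Equation 2: (A - lambda I) x = 0
    assert np.allclose((A - lam * np.eye(2)) @ x, 0.0)
    # Equation 3: det(A - lambda I) = 0
    assert abs(np.linalg.det(A - lam * np.eye(2))) < 1e-9

print(np.sort(eigenvalues))
```

For dimensionality reduction, the eigenvectors with the largest eigenvalues point along the directions that matter most, so those are the dimensions to keep.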
