Back in NN
A neural network minimises its loss function through gradient descent by finding the optimum values of the weights and biases using backpropagation.
The gradient descent algorithm gives us the following parameter update equation:

wˡₖⱼ = wˡₖⱼ − η (∂L/∂wˡₖⱼ)

where k and j are the indices of the weight in the weight matrix and l is the index of the layer.
Given the neural network in the diagram, for the output layer, the following weights
and bias terms will be updated using the gradient descent update equation:
Now, for the hidden layer, the following weights and biases will be updated:
To compute these gradients, you use an algorithm called backpropagation.
As you can see, these formulas involve partial derivatives of the loss function L with respect to the weights and biases. To compute these derivatives, you use the chain rule, and you can observe how the gradient computation at one layer depends on quantities from the other layers.
Now, let’s simplify the neural network given above and represent it in a condensed
format as shown below.
In this case, the loss function is a function of w₁, b₁, w₂ and b₂.
The loss function, the activation function and the cumulative inputs are shown in the
following expressions:
Now, let’s compute the gradient of the loss function with respect to one of the
weights to understand how backpropagation works.
Suppose you want to calculate ∂L/∂w₂, that is, the gradient of the loss function with respect to w₂. Using the chain rule:

∂L/∂w₂ = (∂L/∂h₂)(∂h₂/∂z₂)(∂z₂/∂w₂)

Based on the definition of the loss function, L is a direct function of h₂, h₂ is a function of z₂, and z₂ is a function of w₂.
Now, let’s see how each of the three terms on the RHS of the equation is computed:

L = (1/2)(y − h₂)²  →  ∂L/∂h₂ = ∂/∂h₂ [(1/2)(y − h₂)²] = −(y − h₂)

h₂ = tanh(z₂)  →  ∂h₂/∂z₂ = 1 − tanh²(z₂) = 1 − (h₂)²

z₂ = w₂h₁ + b₂  →  ∂z₂/∂w₂ = h₁
Hence, you get the gradient of the loss function with respect to w₂:

∂L/∂w₂ = [−(y − h₂)][1 − (h₂)²][h₁]
With this, you have completed the computation of the gradient of the loss function
with respect to the weight 𝑤2 for backpropagation.
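To see this chain of derivatives concretely, here is a minimal Python sketch of the condensed network, assuming tanh activations at both layers and the squared-error loss above; the function names and any sample values are illustrative, not taken from the text.

```python
import math

# Condensed network: x -> (w1, b1, tanh) -> h1 -> (w2, b2, tanh) -> h2
# Loss: L = 0.5 * (y - h2)^2
def forward(x, w1, b1, w2, b2):
    h1 = math.tanh(w1 * x + b1)   # hidden activation
    h2 = math.tanh(w2 * h1 + b2)  # output activation
    return h1, h2

def grad_w2(x, y, w1, b1, w2, b2):
    h1, h2 = forward(x, w1, b1, w2, b2)
    dL_dh2 = -(y - h2)       # derivative of 0.5 * (y - h2)^2 w.r.t. h2
    dh2_dz2 = 1 - h2 ** 2    # derivative of tanh, expressed via h2
    dz2_dw2 = h1             # since z2 = w2 * h1 + b2
    return dL_dh2 * dh2_dz2 * dz2_dw2  # chain rule
```

Comparing `grad_w2` with a finite-difference estimate of the loss is a quick sanity check for any hand-derived gradient.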
The housing data set has two inputs, which are the size of the house and the number of
rooms available, and one output, which is the price of the house.
As seen in the computation of the forward pass, we randomly initialise the weights and
biases in the network. Let’s now take the same initialisation and the same input
observation that you used earlier while doing forward propagation.
So, we have:
As previously calculated, the output prediction h²₁ obtained is 0.63, whereas the actual output y is −0.54. Using backpropagation, let’s update the weights and biases such that this difference between the predicted and the actual output gets minimised.
The steps taken to update the weights and biases between the hidden layer and the
output layer are as follows.
First, you will focus on the weights of the output layer. Let’s take the gradient of L with respect to w²₁₁.
You know that:

∂L/∂w²₁₁ = (∂L/∂h²₁)(∂h²₁/∂z²₁)(∂z²₁/∂w²₁₁) (using the chain rule)
1) ∂L/∂h²₁ = ∂/∂h²₁ [(1/2)(y − h²₁)²] = −(y − h²₁)

∂L/∂h²₁ = −(−0.54 − 0.63) = 1.17

2) ∂h²₁/∂z²₁ = 1, as h²₁ = z²₁ (linear activation)

3) ∂z²₁/∂w²₁₁ = ∂/∂w²₁₁ (b²₁ + w²₁₁h¹₁ + w²₁₂h¹₂)

∂z²₁/∂w²₁₁ = h¹₁ = 0.484

Hence, this evaluates to ∂L/∂w²₁₁ = 1.17 × 1 × 0.484 = 0.5663.
Now, using the update rule for gradient descent and considering the learning rate η as 0.2:

w²₁₁(updated) = w²₁₁ − η ∂L/∂w²₁₁ = 0.3 − (0.2 × 0.5663) = 0.1867
Similarly, ∂L/∂w²₁₂ = (∂L/∂h²₁)(∂h²₁/∂z²₁)(∂z²₁/∂w²₁₂).
Since you have already computed the first two derivatives, let’s now compute the third one.

∂z²₁/∂w²₁₂ = ∂/∂w²₁₂ (b²₁ + w²₁₁h¹₁ + w²₁₂h¹₂)

∂z²₁/∂w²₁₂ = h¹₂ = 0.424

Hence, this evaluates to ∂L/∂w²₁₂ = 1.17 × 1 × 0.424 = 0.4961.
Now,

w²₁₂(updated) = w²₁₂ − η ∂L/∂w²₁₂ = 0.2 − (0.2 × 0.4961) = 0.1008.
Finally, for the bias b²₁: you have already computed the first two derivatives, and the third one can be computed as shown below.

∂z²₁/∂b²₁ = ∂/∂b²₁ (b²₁ + w²₁₁h¹₁ + w²₁₂h¹₂)

∂z²₁/∂b²₁ = 1

Hence, this evaluates to ∂L/∂b²₁ = 1.17 × 1 × 1 = 1.17.
Now,

b²₁(updated) = b²₁ − η ∂L/∂b²₁ = 0.4 − (0.2 × 1.17) = 0.166.
So, you have the updated values of the weights and biases of the output layer from a single iteration:

w²₁₁(updated) = 0.1867, w²₁₂(updated) = 0.1008, b²₁(updated) = 0.166
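The three output-layer updates can be cross-checked with a short Python script; the variable names are illustrative, and all numbers come from the worked example above.

```python
# Output-layer gradients for the worked example (linear output activation).
y = -0.54              # actual output
h_out = 0.63           # predicted output h²₁
h1, h2 = 0.484, 0.424  # hidden activations h¹₁ and h¹₂
eta = 0.2              # learning rate

dL_dh = -(y - h_out)         # = 1.17
grad_w11 = dL_dh * 1 * h1    # dh/dz = 1 for a linear activation
grad_w12 = dL_dh * 1 * h2
grad_b1 = dL_dh * 1 * 1

# Gradient descent updates from the initial values w²₁₁=0.3, w²₁₂=0.2, b²₁=0.4
w11_new = 0.3 - eta * grad_w11
w12_new = 0.2 - eta * grad_w12
b1_new = 0.4 - eta * grad_b1
```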
The steps involved in computing the updated weights and biases of the hidden layer are shown in the image below.
Now, let’s start with computing the weights and biases corresponding to the first
neuron of the hidden layer.
Taking the gradient of L with respect to w¹₁₁, you can say that:

∂L/∂w¹₁₁ = (∂L/∂h²₁)(∂h²₁/∂h¹₁)(∂h¹₁/∂w¹₁₁) = (∂L/∂h²₁)(∂h²₁/∂h¹₁)[(∂h¹₁/∂z¹₁)(∂z¹₁/∂w¹₁₁)]

The first term is the same as before:

∂L/∂h²₁ = ∂/∂h²₁ [(1/2)(y − h²₁)²] = −(y − h²₁)

∂L/∂h²₁ = −(−0.54 − 0.63) = 1.17
Now, let’s compute the second, third and fourth derivative terms.

1) ∂h²₁/∂h¹₁ = ∂/∂h¹₁ (b²₁ + w²₁₁h¹₁ + w²₁₂h¹₂) = w²₁₁ = 0.30

2) ∂h¹₁/∂z¹₁ = σ(z¹₁)(1 − σ(z¹₁)) = h¹₁(1 − h¹₁) = 0.484(1 − 0.484)

3) ∂z¹₁/∂w¹₁₁ = ∂/∂w¹₁₁ (b¹₁ + w¹₁₁x₁ + w¹₁₂x₂) = x₁ = −0.32

Hence, this evaluates to ∂L/∂w¹₁₁ = 1.17 × 0.30 × 0.484 × (1 − 0.484) × (−0.32) = −0.028.
Now,

w¹₁₁(updated) = w¹₁₁ − η ∂L/∂w¹₁₁ = 0.2 − 0.2 × (−0.028) = 0.2056.
Similarly, ∂L/∂w¹₁₂ = (∂L/∂h²₁)(∂h²₁/∂h¹₁)[(∂h¹₁/∂z¹₁)(∂z¹₁/∂w¹₁₂)].
Since you have already computed the values of the first three terms, you simply need to calculate the pending derivative term.

∂z¹₁/∂w¹₁₂ = ∂/∂w¹₁₂ (b¹₁ + w¹₁₁x₁ + w¹₁₂x₂) = x₂ = −0.66

Hence, this evaluates to ∂L/∂w¹₁₂ = 1.17 × 0.30 × 0.484 × (1 − 0.484) × (−0.66) = −0.058.
Now,

w¹₁₂(updated) = w¹₁₂ − η ∂L/∂w¹₁₂ = 0.15 − 0.2 × (−0.058) = 0.1616.
Similarly, ∂L/∂b¹₁ = (∂L/∂h²₁)(∂h²₁/∂h¹₁)[(∂h¹₁/∂z¹₁)(∂z¹₁/∂b¹₁)], where ∂z¹₁/∂b¹₁ = 1.

Hence, this evaluates to ∂L/∂b¹₁ = 1.17 × 0.30 × 0.484 × (1 − 0.484) × 1 = 0.088.
Now,

b¹₁(updated) = b¹₁ − η ∂L/∂b¹₁ = 0.1 − 0.2 × (0.088) = 0.0824.
Hence, for the first node, the updated values of the weights and biases, computed using gradient descent with a learning rate η of 0.2, are:

w¹₁₁(updated) = w¹₁₁ − η ∂L/∂w¹₁₁ = 0.2 − 0.2 × (−0.028) = 0.2056

w¹₁₂(updated) = w¹₁₂ − η ∂L/∂w¹₁₂ = 0.15 − 0.2 × (−0.058) = 0.1616

b¹₁(updated) = b¹₁ − η ∂L/∂b¹₁ = 0.1 − 0.2 × (0.088) = 0.0824
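As a cross-check, the first hidden neuron's chain-rule product can be evaluated in Python with the numbers above (variable names are illustrative):

```python
# First hidden neuron: all three gradients share the common factor
# (dL/dh_out) * (dh_out/dh1) * (dh1/dz1); only dz1/dparam differs.
y, h_out = -0.54, 0.63
x1, x2 = -0.32, -0.66   # inputs to the network
h1 = 0.484              # sigmoid activation of hidden neuron 1
w11_out = 0.30          # output-layer weight w²₁₁
eta = 0.2

common = -(y - h_out) * w11_out * h1 * (1 - h1)

grad_w11 = common * x1  # dz1/dw11 = x1
grad_w12 = common * x2  # dz1/dw12 = x2
grad_b1 = common        # dz1/db1 = 1

w11_new = 0.2 - eta * grad_w11
w12_new = 0.15 - eta * grad_w12
b1_new = 0.1 - eta * grad_b1
```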
In the same manner, you calculate the weights and biases corresponding to the second neuron in the hidden layer.
Starting with the derivative of the loss function L with respect to w¹₂₁:

∂L/∂w¹₂₁ = (∂L/∂h²₁)(∂h²₁/∂h¹₂)(∂h¹₂/∂w¹₂₁) = (∂L/∂h²₁)(∂h²₁/∂h¹₂)[(∂h¹₂/∂z¹₂)(∂z¹₂/∂w¹₂₁)]

1) ∂h²₁/∂h¹₂ = ∂/∂h¹₂ (b²₁ + w²₁₁h¹₁ + w²₁₂h¹₂) = w²₁₂ = 0.20

2) ∂h¹₂/∂z¹₂ = σ(z¹₂)(1 − σ(z¹₂)) = h¹₂(1 − h¹₂) = 0.424(1 − 0.424)

3) ∂z¹₂/∂w¹₂₁ = ∂/∂w¹₂₁ (b¹₂ + w¹₂₁x₁ + w¹₂₂x₂) = x₁ = −0.32
Also, for w¹₂₂ and b¹₂, the first three terms will remain the same; only the last term will change. Hence, you will compute only the last term.

∂z¹₂/∂w¹₂₂ = x₂ = −0.66

∂z¹₂/∂b¹₂ = 1
Hence, for the second node:

∂L/∂w¹₂₁ = (∂L/∂h²₁)(∂h²₁/∂h¹₂)[(∂h¹₂/∂z¹₂)(∂z¹₂/∂w¹₂₁)] = 1.17 × 0.20 × 0.424 × (1 − 0.424) × (−0.32) = −0.018

∂L/∂w¹₂₂ = (∂L/∂h²₁)(∂h²₁/∂h¹₂)[(∂h¹₂/∂z¹₂)(∂z¹₂/∂w¹₂₂)] = 1.17 × 0.20 × 0.424 × (1 − 0.424) × (−0.66) = −0.038

∂L/∂b¹₂ = (∂L/∂h²₁)(∂h²₁/∂h¹₂)[(∂h¹₂/∂z¹₂)(∂z¹₂/∂b¹₂)] = 1.17 × 0.20 × 0.424 × (1 − 0.424) × 1 = 0.057
Now, computing the updated values of the weights and biases using gradient descent and a learning rate η of 0.2:

w¹₂₁(updated) = w¹₂₁ − η ∂L/∂w¹₂₁ = 0.5 − 0.2 × (−0.018) = 0.5036

w¹₂₂(updated) = w¹₂₂ − η ∂L/∂w¹₂₂ = 0.6 − 0.2 × (−0.038) = 0.6076

b¹₂(updated) = b¹₂ − η ∂L/∂b¹₂ = 0.25 − 0.2 × (0.057) = 0.2386
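The same check for the second hidden neuron, again with illustrative variable names:

```python
# Second hidden neuron: the common factor now uses w²₁₂ and h¹₂.
y, h_out = -0.54, 0.63
x1, x2 = -0.32, -0.66
h2 = 0.424              # sigmoid activation of hidden neuron 2
w12_out = 0.20          # output-layer weight w²₁₂
eta = 0.2

common = -(y - h_out) * w12_out * h2 * (1 - h2)

w21_new = 0.5 - eta * (common * x1)   # dz2/dw21 = x1
w22_new = 0.6 - eta * (common * x2)   # dz2/dw22 = x2
b2_new = 0.25 - eta * common          # dz2/db2 = 1
```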
Given below are the new values for weights and biases after one step of gradient
descent for the hidden and the output layers.
Now, let’s perform another forward pass and check if performing backpropagation
and updating the weights and biases once has helped in reducing the loss.
You can see that the loss computed with the updated weights and biases is lower than before, which is what we want. By repeatedly performing backpropagation to move the weights and biases towards their optimum values, you can continue reducing the loss. This will eventually give you a predicted output that is as close as possible to the actual expected output. This is how a neural network learns using backpropagation.
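That claim can be verified numerically: the sketch below runs the forward pass with the original and the updated parameters and compares the two losses. The network layout (two sigmoid hidden neurons feeding one linear output) and all numbers come from the worked example; the function names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, params):
    """Forward pass: two sigmoid hidden neurons feeding one linear output."""
    (w11, w12, b1), (w21, w22, b2), (v1, v2, c) = params
    h1 = sigmoid(b1 + w11 * x[0] + w12 * x[1])
    h2 = sigmoid(b2 + w21 * x[0] + w22 * x[1])
    return c + v1 * h1 + v2 * h2

x, y = (-0.32, -0.66), -0.54

# (hidden neuron 1), (hidden neuron 2), (output neuron), before and after
# the single backpropagation step worked through above.
before = ((0.2, 0.15, 0.1), (0.5, 0.6, 0.25), (0.3, 0.2, 0.4))
after = ((0.2056, 0.1616, 0.0824), (0.5036, 0.6076, 0.2386),
         (0.1867, 0.1008, 0.166))

loss_before = 0.5 * (y - predict(x, before)) ** 2   # prediction ≈ 0.63
loss_after = 0.5 * (y - predict(x, after)) ** 2     # smaller loss
```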
Point 2: For each layer, compute the cumulative input and apply the non-linear
activation function on each neuron of each layer to get the prediction.
Point 4: Assess the performance of the neural network through a loss function, for
example, a cross-entropy loss function for classification and RMSE for regression.
Point 6: Compute the derivative of the loss function with respect to the weights in the
output layer.
Point 7: From the last layer to the first layer, for each layer, compute the gradient of
the loss function with respect to the weights at each layer and all the intermediate
gradients.
Updating the Model Parameters Using an Optimisation Algorithm such as Gradient Descent
Point 8: Once all the gradients of the loss with respect to the weights and biases are
obtained, use the gradient descent update equation to update the values of the
weights and biases.
Point 9: Repeat the process for a specified number of iterations or until the
predictions made by the model are acceptable.
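Points 2–9 can be stitched together into a single training loop. Below is a hypothetical minimal sketch for the same two-hidden-neuron network (sigmoid hidden units, linear output, squared-error loss); all function and variable names are illustrative, and a real implementation would vectorise the parameters as matrices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(x, y, params, eta=0.2, iters=100):
    """Repeat forward pass, loss, backpropagation and update (Points 2-9)."""
    w11, w12, b1, w21, w22, b2, v1, v2, c = params
    loss = None
    for _ in range(iters):
        # Point 2: forward pass
        h1 = sigmoid(b1 + w11 * x[0] + w12 * x[1])
        h2 = sigmoid(b2 + w21 * x[0] + w22 * x[1])
        pred = c + v1 * h1 + v2 * h2
        # Point 4: squared-error loss
        loss = 0.5 * (y - pred) ** 2
        # Points 6-7: gradients via the chain rule, output layer first
        dL = -(y - pred)
        g_v1, g_v2, g_c = dL * h1, dL * h2, dL
        g1 = dL * v1 * h1 * (1 - h1)   # common factor for hidden neuron 1
        g2 = dL * v2 * h2 * (1 - h2)   # common factor for hidden neuron 2
        # Point 8: gradient descent updates
        v1, v2, c = v1 - eta * g_v1, v2 - eta * g_v2, c - eta * g_c
        w11, w12, b1 = w11 - eta * g1 * x[0], w12 - eta * g1 * x[1], b1 - eta * g1
        w21, w22, b2 = w21 - eta * g2 * x[0], w22 - eta * g2 * x[1], b2 - eta * g2
    return (w11, w12, b1, w21, w22, b2, v1, v2, c), loss
```

Running this with the initial values from the worked example drives the loss towards zero over a few hundred iterations (Point 9).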