
Computing Gradient using Backpropagation

Now let's understand how to apply gradient descent to a deep neural network. We
have a cost function that we'd like to minimize using gradient descent.

So far, when we talked about perceptrons, we described them as a linear function followed by a non-linearity. We took the threshold function to be our non-linearity, but there are many other reasonable choices, such as the Sigmoid function. The calculation is similar in the case of the sigmoid: the perceptron computes a linear function of its inputs using its weights, adds a constant term called the bias, and feeds this value into the sigmoid function to get the output.
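
As a minimal sketch of that computation in Python (the variable names and values here are illustrative, not from the original text):

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_output(x, w, b):
    """Linear function of the inputs plus a bias, fed through a sigmoid."""
    z = np.dot(w, x) + b   # weighted sum of the inputs plus the bias
    return sigmoid(z)

# Example: three inputs, three weights, one bias
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
b = 0.3
print(perceptron_output(x, w, b))  # a value strictly between 0 and 1
```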

[email protected]
ZV0GDF798E

One of the reasons it is important to work with Sigmoids is that these functions are smooth and differentiable, whereas the derivative of the threshold function is either zero or undefined. The value of the Sigmoid ranges from 0 to 1, and its derivative lies in the range 0 to 0.25.
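
To see where this bound comes from: the sigmoid is σ(x) = 1 / (1 + e^(−x)), and its derivative works out to σ′(x) = σ(x)(1 − σ(x)). Since σ(x) always lies strictly between 0 and 1, the product σ(1 − σ) is largest when σ(x) = 0.5, which happens at x = 0 and gives a maximum derivative of 0.25.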

This small derivative, always a fraction well below 1, is one of the reasons for not using the Sigmoid function in the hidden layers of a deep neural network: when many such fractions are multiplied together across layers, the gradient shrinks rapidly (the vanishing gradient problem).

Using gradient descent, we need to minimize (at least approximately) the non-convex function that comes from computing the quadratic cost function on the network's output. This cost function depends on all of the weights and biases inside the network.

The gradient is a measure of how changes to the weights and biases change the value
of the cost function, and we know we want to move in the direction opposite to the
gradient because we want to minimize the cost function.

Let's quickly discuss what gradient descent actually is.


Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost). You can think of it as a large bowl, where the bowl is a plot of the cost function. A random position on the surface of the bowl is the cost of the current values of the coefficients. The bottom of the bowl is the cost of the best set of coefficients, the minimum of the function. The goal is to keep trying different values for the coefficients, evaluate their cost, and select new coefficients that have a slightly better (lower) cost. Repeating this process enough times will lead to the bottom of the bowl, and you will know the values of the coefficients that result in the minimum cost.

[email protected]
ZV0GDF798E

Steps in Gradient Descent:


1. Initialize the weights with random values and calculate the cost function.
2. Calculate the gradient. We need to know the gradient so that we know the
direction (sign) to move the coefficient values in order to get a lower cost on the
next iteration.
3. Adjust the weights with the gradients to reach the optimal values where cost is
minimized.

new_coefficient = old_coefficient - (learning_rate * gradient)

4. Use the new weights for prediction and to calculate the new cost.
5. Repeat steps 2 to 4 until further adjustments to the weights no longer significantly reduce the error (a minimal sketch of this loop follows this list).
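
Here is that loop applied to a simple one-dimensional quadratic cost (the cost function, learning rate, and stopping threshold are illustrative assumptions, not from the original text):

```python
import numpy as np

def cost(w):
    """Illustrative quadratic cost with its minimum at w = 3."""
    return (w - 3.0) ** 2

def gradient(w):
    """Derivative of the cost with respect to w."""
    return 2.0 * (w - 3.0)

w = np.random.rand()      # step 1: random initialization
learning_rate = 0.1

for step in range(100):
    g = gradient(w)                # step 2: compute the gradient
    if abs(g) < 1e-6:              # step 5: stop when adjustments are tiny
        break
    w = w - learning_rate * g      # step 3: new = old - (lr * gradient)

print(w, cost(w))  # w should be close to 3, cost close to 0
```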

How do we compute the gradient?


In the context of neural networks, we can compute the gradient using what's called backpropagation. This method calculates the gradient of a loss function with respect to all the weights in the network. The intuition behind it is simpler than the mechanics: what backpropagation is really doing is using the chain rule to compute the gradient layer by layer.

[email protected]
ZV0GDF798E

Let's get into the intuition. Imagine the m outputs of our neural network are 𝑎1, 𝑎2, ..., 𝑎𝑚, and these are the values we feed into the quadratic cost function. Then it's straightforward to compute the partial derivative of the cost function with respect to any of these values. The partial derivative of a function of several variables is its derivative with respect to one of those variables, with the others held constant (as opposed to the total derivative, in which all variables are allowed to vary). Here we find a partial derivative for each of these values; together they form a gradient that tells us how much, and in which direction, each value should change to reduce the cost.
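
Concretely, if y1, ..., ym are the target outputs, one standard form of the quadratic cost (written out here as an assumption; the text itself does not give the formula) is C = ½ Σᵢ (𝑎ᵢ − yᵢ)², and its partial derivative with respect to the ith output is simply ∂C/∂𝑎ᵢ = 𝑎ᵢ − yᵢ: the difference between that output and its target.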

Now, the idea is that these m outputs are themselves determined by the weights in the previous layer. If we think about the ith output, it's a function of the weights coming into the ith perceptron in the last layer, and also of its bias. We can again compute how changes in these weights and biases affect the ith output; this is exactly where we need the non-linearity to be differentiable, which the Sigmoid is. Backpropagation continues in this manner, computing the partial derivatives of the cost function with respect to the weights and biases in each layer from the partial derivatives already computed for the layer above. That's backpropagation.
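
As a minimal sketch of this layer-by-layer computation for a tiny two-layer sigmoid network with a quadratic cost (the network shape, data, and variable names are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # derivative of the sigmoid, at most 0.25

# Tiny network: 2 inputs -> 3 hidden perceptrons -> 1 output, all sigmoid
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((3, 2)), np.zeros(3)
W2, b2 = rng.standard_normal((1, 3)), np.zeros(1)

x, y = np.array([0.5, -0.2]), np.array([1.0])

# Forward pass: keep the pre-activations z, we need them for derivatives
z1 = W1 @ x + b1;  a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)
cost = 0.5 * np.sum((a2 - y) ** 2)   # quadratic cost on the output

# Backward pass: chain rule, output layer first
delta2 = (a2 - y) * sigmoid_prime(z2)         # dC/dz2
grad_W2 = np.outer(delta2, a1)                # dC/dW2
grad_b2 = delta2                              # dC/db2

delta1 = (W2.T @ delta2) * sigmoid_prime(z1)  # push the error back one layer
grad_W1 = np.outer(delta1, x)                 # dC/dW1
grad_b1 = delta1                              # dC/db1
```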

Now that we have all the tools we need to apply deep learning, let’s understand all the
steps involved in creating neural networks:

1. Pick a network architecture: Decide the number of layers, the number of perceptrons in each layer, the activation functions, and the number of output perceptrons according to the problem statement.
2. Random Initialization of Weights: The weights are randomly initialized to small values between 0 and 1, very close to zero.
3. Implementation of Forward Propagation to compute the activations of each hidden layer and the cost for a set of input vectors. The cost function helps determine how well the neural network fits the training data.
4. Implementation of Backpropagation to compute the gradient.
5. Use of gradient descent together with backpropagation to minimize the cost function as a function of the parameters (the weights and biases).
6. Repeat steps 3 to 5 until the model finds the optimal parameters (a minimal end-to-end sketch follows this list).
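
Tying these steps together, here is a minimal end-to-end sketch that continues the previous snippet (it reuses sigmoid, sigmoid_prime, W1, b1, W2, b2, x, and y from there; the learning rate and iteration count are illustrative assumptions):

```python
learning_rate = 0.5

for epoch in range(1000):
    # Steps 3-4: forward pass, then backpropagated gradients
    z1 = W1 @ x + b1;  a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2; a2 = sigmoid(z2)

    delta2 = (a2 - y) * sigmoid_prime(z2)
    delta1 = (W2.T @ delta2) * sigmoid_prime(z1)

    # Step 5: gradient descent update, new = old - (lr * gradient)
    W2 -= learning_rate * np.outer(delta2, a1); b2 -= learning_rate * delta2
    W1 -= learning_rate * np.outer(delta1, x);  b1 -= learning_rate * delta1

print(sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2))  # output should approach y
```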

Why does gradient descent on a non-convex function work at all? In low dimensions, it seems obvious that it would get stuck in local minima. But in high dimensions, it seems to actually work, and the truth is that no one knows why.

There are several possible explanations:


1. Maybe these functions are closer to convex than we think, and are at least convex on a large region of the parameter space.

2. Another factor may be that what it means for a point to be a local minimum is much more stringent in higher dimensions than in lower dimensions. A point is a local minimum if in every direction you try to move, the function starts to increase. Intuitively, it seems much harder to be a local minimum in higher dimensions because there are so many directions in which you could escape.
3. Yet another takeaway is that when you apply backpropagation to fit the parameters, you might get very different answers depending on where you start. This just doesn't happen with convex functions, because wherever you start from, you'll always end up in the same place: the globally optimal solution.

Also, neural networks first became popular in the 1980s, but computers just weren't powerful enough back then for us to implement deep neural networks and understand their power, so it was only possible to work with fairly small neural networks. The truth is that if you're stuck with a small neural network and you want to solve some classification task, there are much better machine learning approaches, such as Ensemble methods and Support Vector Machines. But in recent times, vast advances in computing power have made it feasible to work with truly huge neural networks, and this has been a major driving force behind their research.

Lastly, about hierarchical representations: we talked about some of the philosophy and connections to neuroscience. There is, at best, a very loose parallel between these two domains, and you're advised not to take this comparison too literally. In the early days, there was so much focus on doing exactly what neuroscience tells us happens in the visual cortex that researchers actually stayed away from gradient descent, because it wasn't, and still isn't, clear that the visual cortex can implement these types of algorithms.
