
Lecture 7: Backpropagation for Neural Networks.

Dr Etienne Pienaar
[email protected]  Room 5.43

University of Cape Town

October 11, 2021

Backpropagation
Based on Nielsen (2015).

Backpropagation is a strategy for calculating the gradient of the cost function with respect to the parameters of the network. Although the calculations can become tedious, the only mathematical machinery required is the chain rule for differentiation. First, one defines working variables:
A linear component:

    z_j^l = \sum_{k=1}^{d_{l-1}} a_k^{l-1} w_{kj}^l + b_j^l

A working gradient:

    \delta_j^l = \frac{\partial C}{\partial z_j^l}

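To make the indexing concrete, here is a minimal NumPy sketch of the forward computation for a single layer, written with explicit loops so that it mirrors the scalar sum above. The names (layer_forward, a_prev, W, b) are illustrative rather than taken from the slides, and a sigmoid activation is assumed.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def layer_forward(a_prev, W, b):
        """Compute z_j^l = sum_k a_k^{l-1} w_{kj}^l + b_j^l and a_j^l = sigmoid(z_j^l).

        a_prev : (d_{l-1},) activations a^{l-1} from the previous layer
        W      : (d_{l-1}, d_l) weight matrix with W[k, j] = w_{kj}^l
        b      : (d_l,) bias vector
        """
        d_prev, d_curr = W.shape
        z = np.zeros(d_curr)
        for j in range(d_curr):              # one linear component per node j in layer l
            for k in range(d_prev):
                z[j] += a_prev[k] * W[k, j]
            z[j] += b[j]
        a = sigmoid(z)                       # a_j^l = sigma(z_j^l)
        return z, a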
After defining the working variables, one starts at the terminal nodes of the network and works backwards. For these purposes, we calculate the gradient w.r.t. the working variable in the output layer:
    \delta_j^L = \sum_{k=1}^{d_L = q} \frac{\partial C}{\partial a_k^L} \frac{\partial a_k^L}{\partial z_j^L}

               = \frac{\partial C}{\partial a_j^L} \frac{\partial a_j^L}{\partial z_j^L}      (only the k = j term survives, since a_k^L = \sigma(z_k^L) depends on z_j^L only when k = j)

               = \frac{\partial C}{\partial a_j^L} \sigma'(z_j^L)

Note that everything we need to evaluate δ_j^L is already known after evaluating the forward updating equation for the network. That is, after a forward pass through the network, all we need to know when running backwards are the activation function specifications and the working variable values.

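As a concrete example, suppose the cost is quadratic, C = ½ Σ_j (a_j^L − y_j)², so that ∂C/∂a_j^L = a_j^L − y_j. The slides leave the cost generic; the quadratic choice here is only for illustration. Then δ_j^L is computed directly from the forward-pass quantities, e.g.:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_prime(z):
        s = sigmoid(z)
        return s * (1.0 - s)

    def delta_output(a_L, z_L, y):
        """delta_j^L = dC/da_j^L * sigma'(z_j^L) for the quadratic cost C = 0.5*||a^L - y||^2."""
        dC_da = a_L - y                      # dC/da_j^L = a_j^L - y_j for this cost
        return dC_da * sigmoid_prime(z_L)    # element-wise product over the output nodes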
Now, in order to evaluate the gradients in the interim layers of the
network, we iterate backwards from the terminal layer. For these
purposes, we derive:
    \delta_j^{l-1} = \frac{\partial C}{\partial z_j^{l-1}}      (cost w.r.t. the linear component in layer l-1)

                   = \sum_{k=1}^{d_l} \frac{\partial C}{\partial z_k^l} \frac{\partial z_k^l}{\partial z_j^{l-1}}      (chain rule)

                   = \sum_{k=1}^{d_l} \delta_k^l \frac{\partial z_k^l}{\partial z_j^{l-1}}      (substitute the working gradient)

                   = \sum_{k=1}^{d_l} \delta_k^l \frac{\partial}{\partial z_j^{l-1}} \Big[ \sum_{m=1}^{d_{l-1}} \sigma(z_m^{l-1}) w_{mk}^l + b_k^l \Big]      (using a_m^{l-1} = \sigma(z_m^{l-1}))

                   = \sum_{k=1}^{d_l} w_{jk}^l \delta_k^l \, \sigma'(z_j^{l-1})      (only the m = j term survives)

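A minimal sketch of this backward update, again with explicit loops so that it mirrors the scalar sum; the names are illustrative, a sigmoid activation is assumed, and W[j, k] stores w_{jk}^l.

    import numpy as np

    def sigmoid_prime(z):
        s = 1.0 / (1.0 + np.exp(-z))
        return s * (1.0 - s)

    def delta_previous(delta_l, W, z_prev):
        """delta_j^{l-1} = (sum_k w_{jk}^l delta_k^l) * sigma'(z_j^{l-1}).

        delta_l : (d_l,) working gradients in layer l
        W       : (d_{l-1}, d_l) weights of layer l, W[j, k] = w_{jk}^l
        z_prev  : (d_{l-1},) linear components in layer l-1
        """
        d_prev, d_curr = W.shape
        delta_prev = np.zeros(d_prev)
        for j in range(d_prev):
            total = 0.0
            for k in range(d_curr):
                total += W[j, k] * delta_l[k]            # sum_k w_{jk}^l delta_k^l
            delta_prev[j] = total * sigmoid_prime(z_prev[j])
        return delta_prev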
So, we start with the terminal working gradients δ_j^L and then work our way backwards using the backwards updating equation to find δ_j^{L-1}, δ_j^{L-2}, etc. That is, evaluate

    \delta_j^L = \frac{\partial C}{\partial a_j^L} \sigma'(z_j^L)

and then propagate:

    \delta_j^{l-1} = \sum_{k=1}^{d_l} w_{jk}^l \delta_k^l \, \sigma'(z_j^{l-1})

for all l and j.


But these are 'working' gradients. What does that have to do with the parameters of our model?

As it turns out, we can make some final manipulations to get the gradient of the cost function w.r.t. the model parameters. First, for the biases of the model, we have:

    \frac{\partial C}{\partial b_j^l} = \frac{\partial C}{\partial z_j^l} \frac{\partial z_j^l}{\partial b_j^l} = \delta_j^l      (1)

and then for the weights:

    \frac{\partial C}{\partial w_{kj}^l} = \frac{\partial C}{\partial z_j^l} \frac{\partial z_j^l}{\partial w_{kj}^l} = a_k^{l-1} \delta_j^l.      (2)

Equations 1 and 2, along with the terminal condition in Equation 6, define the (backwards) updating equations for the back-propagation procedure.

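In code, Equations 1 and 2 amount to a copy and an outer product once the δ's are known. A minimal sketch with illustrative names (a_prev holds a^{l-1}, delta holds δ^l):

    import numpy as np

    def parameter_gradients(a_prev, delta):
        """Gradients of C w.r.t. the biases and weights of layer l.

        a_prev : (d_{l-1},) activations a^{l-1}
        delta  : (d_l,) working gradients delta^l
        """
        grad_b = delta.copy()             # Eq. (1): dC/db_j^l = delta_j^l
        grad_W = np.outer(a_prev, delta)  # Eq. (2): dC/dw_{kj}^l = a_k^{l-1} delta_j^l
        return grad_b, grad_W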
Vector and Matrix form of backprop?

Can you write these equations in vector form? Can you write them in matrix form? Start by defining:

    \delta^l = (\delta_1^l, \delta_2^l, \ldots, \delta_{d_l}^l)^T, \quad a^l = (a_1^l, a_2^l, \ldots, a_{d_l}^l)^T, \quad z^l = (z_1^l, z_2^l, \ldots, z_{d_l}^l)^T,

each of dimension d_l × 1, and of course the parameters W_l = (w_{jk}^l)_{d_{l-1} \times d_l} and b_l = (b_j^l)_{d_l \times 1}, so that:

    z^l = W_l^T a^{l-1} + b^l
    a^l = \sigma(z^l)

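In this notation the full forward pass is only a few lines. A minimal NumPy sketch, assuming sigmoid activations in every layer and lists weights and biases holding the W_l and b_l (illustrative names, not from the slides):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward_pass(x, weights, biases):
        """Collect z^l and a^l for every layer.

        weights : list of (d_{l-1}, d_l) matrices W_l
        biases  : list of (d_l,) vectors b_l
        """
        a = x
        zs, activations = [], [x]
        for W, b in zip(weights, biases):
            z = W.T @ a + b        # z^l = W_l^T a^{l-1} + b^l
            a = sigmoid(z)         # a^l = sigma(z^l)
            zs.append(z)
            activations.append(a)
        return zs, activations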
For the vector form of the equations, we can look at the structure of the scalar updating equations for some guidance:

    \delta_j^L = \frac{\partial C}{\partial a_j^L} \times \sigma'(z_j^L)

for all j in the final layer. So calculate a vector with elements \frac{\partial C}{\partial a_j^L} and a vector with elements \sigma'(z_j^L), and compute the vector δ^L element-wise.

Then for the backward update:

    \delta_j^{l-1} = \underbrace{\Big( \sum_{k=1}^{d_l} w_{jk}^l \delta_k^l \Big)}_{\text{matrix mult.?}} \times \sigma'(z_j^{l-1}).

The first term coincides with the definition of matrix multiplication (post-multiplication by a vector), resulting in a single element of a vector. Again, conclude with element-wise multiplication.

Written out in full, we arrive at:

    \delta^{l-1} = \frac{\partial C_i}{\partial z^{l-1}} = [W_l]_{d_{l-1} \times d_l} [\delta^l]_{d_l \times 1} \odot [\sigma'_{l-1}(z^{l-1})]_{d_{l-1} \times 1}

for l = L, L − 1, \ldots, where \sigma'_l(\cdot) denotes the derivative of the activation function in layer l, \odot denotes element-wise multiplication, and the terminal condition for the equations is given by:

    \delta^L = \frac{\partial C_i}{\partial a^L} \odot \sigma'_L(z^L).

Again, that's only half the story... What about the weights and biases? Well, for the biases we just use

    \frac{\partial C_i}{\partial b_l} = \frac{\partial C_i}{\partial z^l} = [\delta^l]_{d_l \times 1}

in each layer. Then for the weights, noting again the similarity to matrix multiplication:

    \frac{\partial C_i}{\partial W_l} = a^{l-1} (\delta^l)^T = [a^{l-1}]_{d_{l-1} \times 1} [(\delta^l)^T]_{1 \times d_l}.

Does this make sense? What are the dimensions of W_l?

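Putting the vector equations together, one complete forward-plus-backward pass can be sketched as follows, again assuming sigmoid activations and the quadratic cost used in the earlier sketches (the slides keep both generic):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_prime(z):
        s = sigmoid(z)
        return s * (1.0 - s)

    def backprop(x, y, weights, biases):
        """One forward + backward pass; returns dC/db_l and dC/dW_l for every layer."""
        # Forward pass: store z^l and a^l for every layer.
        a = x
        zs, activations = [], [x]
        for W, b in zip(weights, biases):
            z = W.T @ a + b                             # z^l = W_l^T a^{l-1} + b^l
            a = sigmoid(z)
            zs.append(z)
            activations.append(a)

        # Terminal condition: delta^L = dC/da^L (element-wise) sigma'(z^L), quadratic cost assumed.
        delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
        grad_b = [None] * len(weights)
        grad_W = [None] * len(weights)
        grad_b[-1] = delta                              # dC/db^L = delta^L
        grad_W[-1] = np.outer(activations[-2], delta)   # dC/dW_L = a^{L-1} (delta^L)^T

        # Backward recursion: delta^{l-1} = W_l delta^l (element-wise) sigma'(z^{l-1}).
        for l in range(len(weights) - 2, -1, -1):
            delta = (weights[l + 1] @ delta) * sigmoid_prime(zs[l])
            grad_b[l] = delta
            grad_W[l] = np.outer(activations[l], delta)
        return grad_b, grad_W

A gradient-descent step would then subtract a learning rate times grad_W[l] and grad_b[l] from the corresponding W_l and b_l in each layer.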
References I

[Nielsen 2015] Nielsen, Michael A.: Neural Networks and Deep Learning. Determination Press, 2015.

