
The back-propagation algorithm

January 8, 2012

Ryan
The neuron

- The sigmoid equation is what is typically used as the transfer function between neurons. It is similar to the step function, but is continuous and differentiable.

-   σ(x) = 1 / (1 + e^{-x})    (1)

Figure: The Sigmoid Function (plot of σ(x) for x from -5 to 5)

- One useful property of this transfer function is the simplicity of computing its derivative. Let's do that now...
The derivative of the sigmoid transfer function

    d/dx σ(x) = d/dx [ 1 / (1 + e^{-x}) ]
              = e^{-x} / (1 + e^{-x})^2
              = ((1 + e^{-x}) − 1) / (1 + e^{-x})^2
              = (1 + e^{-x}) / (1 + e^{-x})^2 − (1 / (1 + e^{-x}))^2
              = σ(x) − σ(x)^2

    σ' = σ(1 − σ)
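As a quick numerical check of this identity (not part of the original derivation), the short Python sketch below compares σ'(x) = σ(x)(1 − σ(x)) against a finite-difference estimate of the derivative:

import math

def sigmoid(x):
    """The sigmoid transfer function σ(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    """Derivative of the sigmoid via the identity σ' = σ(1 − σ)."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Central finite difference as an independent estimate of the derivative.
h = 1e-6
for x in (-2.0, 0.0, 3.0):
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
    print(x, sigmoid_prime(x), numeric)  # the two columns should agree closely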
Single input neuron

Figure: A Single-Input Neuron (input ξ, weight ω, transfer function σ, output O)

The figure above shows a single neuron with only a single input. Without a bias term, the equation defining the figure is

    O = σ(ξω)

Adding a bias term θ gives

    O = σ(ξω + θ)
Multiple input neuron

Figure: A Multiple-Input Neuron (inputs ξ1, ξ2, ξ3 with weights ω1, ω2, ω3, a bias θ, a summation node, the transfer function σ, and output O)

Figure 3 is the diagram representing the following equation:

    O = σ(ω1 ξ1 + ω2 ξ2 + ω3 ξ3 + θ)
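To make the figure concrete, here is a minimal Python sketch of this forward computation; the particular weight, input, and bias values are hypothetical illustration values, not taken from the slides:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron_output(weights, inputs, bias):
    """O = σ(ω1 ξ1 + ω2 ξ2 + ω3 ξ3 + θ), written for any number of inputs."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(activation)

# Hypothetical example: three inputs, three weights, one bias.
print(neuron_output(weights=[0.5, -0.3, 0.8], inputs=[1.0, 2.0, 0.5], bias=0.1))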
A neural network

Figure: A layer

Figure: A neural network (layers labeled I, J, and K)
The back propagation algorithm

Notation
- x_j^ℓ : Input to node j of layer ℓ
- W_ij^ℓ : Weight from layer ℓ−1 node i to layer ℓ node j
- σ(x) = 1 / (1 + e^{-x}) : Sigmoid transfer function
- θ_j^ℓ : Bias of node j of layer ℓ
- O_j^ℓ : Output of node j in layer ℓ
- t_j : Target value of node j of the output layer
The error calculation

Given a set of training targets t_k and output-layer outputs O_k, we can write the error as

    E = (1/2) Σ_{k∈K} (O_k − t_k)^2

We let the error of the network for a single training iteration be denoted by E. We want to calculate ∂E/∂W_jk, the rate of change of the error with respect to a given connection weight, so that we can minimize it.

Now we consider two cases: the node is an output node, or it is in a hidden layer...
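As a small sketch of this error for a single training example, assuming outputs and targets are plain Python lists (the values below are hypothetical):

def network_error(outputs, targets):
    """E = 1/2 · Σ_k (O_k − t_k)^2 over the output layer."""
    return 0.5 * sum((o - t) ** 2 for o, t in zip(outputs, targets))

# Hypothetical two-node output layer.
print(network_error(outputs=[0.8, 0.2], targets=[1.0, 0.0]))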
Output layer node

    ∂E/∂W_jk = ∂/∂W_jk [ (1/2) Σ_{k∈K} (O_k − t_k)^2 ]
             = (O_k − t_k) ∂O_k/∂W_jk
             = (O_k − t_k) ∂σ(x_k)/∂W_jk
             = (O_k − t_k) σ(x_k)(1 − σ(x_k)) ∂x_k/∂W_jk
             = (O_k − t_k) O_k (1 − O_k) O_j

For notation purposes I will define δ_k to be the expression (O_k − t_k) O_k (1 − O_k), so we can rewrite the equation above as

    ∂E/∂W_jk = O_j δ_k

where

    δ_k = O_k (1 − O_k)(O_k − t_k)
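A minimal Python sketch of this result, assuming O_j, O_k, and t_k are already known from a forward pass (the values below are hypothetical):

def output_delta(o_k, t_k):
    """δ_k = O_k (1 − O_k)(O_k − t_k)."""
    return o_k * (1.0 - o_k) * (o_k - t_k)

def output_weight_gradient(o_j, o_k, t_k):
    """∂E/∂W_jk = O_j δ_k."""
    return o_j * output_delta(o_k, t_k)

print(output_weight_gradient(o_j=0.6, o_k=0.8, t_k=1.0))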
Hidden layer node

    ∂E/∂W_ij = ∂/∂W_ij [ (1/2) Σ_{k∈K} (O_k − t_k)^2 ]
             = Σ_{k∈K} (O_k − t_k) ∂O_k/∂W_ij
             = Σ_{k∈K} (O_k − t_k) ∂σ(x_k)/∂W_ij
             = Σ_{k∈K} (O_k − t_k) σ(x_k)(1 − σ(x_k)) ∂x_k/∂W_ij
             = Σ_{k∈K} (O_k − t_k) O_k (1 − O_k) (∂x_k/∂O_j)(∂O_j/∂W_ij)
             = Σ_{k∈K} (O_k − t_k) O_k (1 − O_k) W_jk ∂O_j/∂W_ij
             = (∂O_j/∂W_ij) Σ_{k∈K} (O_k − t_k) O_k (1 − O_k) W_jk
             = O_j (1 − O_j) (∂x_j/∂W_ij) Σ_{k∈K} (O_k − t_k) O_k (1 − O_k) W_jk
             = O_j (1 − O_j) O_i Σ_{k∈K} (O_k − t_k) O_k (1 − O_k) W_jk

But, recalling our definition of δ_k, we can write this as

    ∂E/∂W_ij = O_i O_j (1 − O_j) Σ_{k∈K} δ_k W_jk

Similar to before, we will now define all terms besides O_i to be δ_j, so we have

    ∂E/∂W_ij = O_i δ_j
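The corresponding Python sketch, assuming the output-layer deltas δ_k and the weights W_jk leaving hidden node j are already available (the values below are hypothetical):

def hidden_delta(o_j, deltas_k, weights_jk):
    """δ_j = O_j (1 − O_j) Σ_k δ_k W_jk."""
    return o_j * (1.0 - o_j) * sum(d * w for d, w in zip(deltas_k, weights_jk))

def hidden_weight_gradient(o_i, o_j, deltas_k, weights_jk):
    """∂E/∂W_ij = O_i δ_j."""
    return o_i * hidden_delta(o_j, deltas_k, weights_jk)

print(hidden_weight_gradient(o_i=0.9, o_j=0.6, deltas_k=[0.05, -0.02], weights_jk=[0.4, -0.7]))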
How weights affect errors

For an output layer node k ∈ K

    ∂E/∂W_jk = O_j δ_k

where

    δ_k = O_k (1 − O_k)(O_k − t_k)

For a hidden layer node j ∈ J

    ∂E/∂W_ij = O_i δ_j

where

    δ_j = O_j (1 − O_j) Σ_{k∈K} δ_k W_jk
What about the bias?

If we incorporate the bias term θ into the equation, you will find that

    ∂O/∂θ = O(1 − O) ∂θ/∂θ

and because ∂θ/∂θ = 1, we can view the bias term as a weight attached to a node whose output is always one.

This holds for any layer ℓ we are concerned with; substituting into the previous equations gives us

    ∂E/∂θ_ℓ = δ_ℓ

(because a constant output of 1 replaces O_{ℓ−1}, the output from the "previous layer").
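In code, this observation simply means that the bias gradient is the node's delta itself, i.e. the usual weight gradient with the "previous layer" output fixed at 1. A minimal sketch with a hypothetical delta value:

def bias_gradient(delta_node):
    """∂E/∂θ = δ for the node: a weight gradient whose input is the constant 1."""
    constant_input = 1.0  # the bias behaves like a weight from a node that always outputs 1
    return constant_input * delta_node

print(bias_gradient(0.05))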
The back propagation algorithm

1. Run the network forward with your input data to get the network output.
2. For each output node compute

       δ_k = O_k (1 − O_k)(O_k − t_k)

3. For each hidden node calculate

       δ_j = O_j (1 − O_j) Σ_{k∈K} δ_k W_jk

4. Update the weights and biases as follows. Given

       ∆W = −η δ_ℓ O_{ℓ−1}
       ∆θ = −η δ_ℓ

   apply

       W + ∆W → W
       θ + ∆θ → θ
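Putting the four steps together, here is a minimal NumPy sketch of one training iteration for a network with a single hidden layer (layers I → J → K). All sizes, inputs, targets, and initial weights are made-up illustration values; the variable names follow the notation of these slides rather than any particular library:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
eta = 0.5                              # learning rate η
O_i = np.array([0.3, 0.7])             # outputs of the input layer I
t = np.array([1.0, 0.0])               # targets for the output layer K

W_ij = rng.normal(size=(2, 3))         # weights from layer I to layer J
theta_j = np.zeros(3)                  # biases of layer J
W_jk = rng.normal(size=(3, 2))         # weights from layer J to layer K
theta_k = np.zeros(2)                  # biases of layer K

# 1. Run the network forward.
O_j = sigmoid(O_i @ W_ij + theta_j)
O_k = sigmoid(O_j @ W_jk + theta_k)

# 2. Output-layer deltas: δ_k = O_k (1 − O_k)(O_k − t_k).
delta_k = O_k * (1 - O_k) * (O_k - t)

# 3. Hidden-layer deltas: δ_j = O_j (1 − O_j) Σ_k δ_k W_jk.
delta_j = O_j * (1 - O_j) * (W_jk @ delta_k)

# 4. Update weights and biases: ∆W = −η δ_ℓ O_{ℓ−1}, ∆θ = −η δ_ℓ.
W_jk += -eta * np.outer(O_j, delta_k)
theta_k += -eta * delta_k
W_ij += -eta * np.outer(O_i, delta_j)
theta_j += -eta * delta_j

print(0.5 * np.sum((O_k - t) ** 2))    # the error E for this iteration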
