Multi-Layer Feed-Forward Networks

Backpropagation is a method for training multi-layer neural networks by calculating the gradient of the loss function with respect to the network parameters. It involves propagating errors from the output layer backwards through the network to calculate how each unit contributed to the overall error. This allows adjusting weights to minimize loss by gradient descent. Specifically, it uses the chain rule to distribute output errors to hidden units weighted by their connections to output units, then recursively calculates errors for hidden units based on errors of units they connect to.

Uploaded by

Vishnu Das


BACK PROPAGATION

A single-layer network has severe restrictions: the class of tasks that it can accomplish is very
limited. In this chapter we will focus on feed-forward networks with layers of processing units.
Minsky and Papert (1969) showed that a two-layer feed-forward network can overcome many of
these restrictions, but did not present a solution to the problem of how to adjust the weights
from input to hidden units. An answer to this question was presented by Rumelhart, Hinton and
Williams (1986), and similar solutions appear to have been published earlier (Werbos, 1974;
Parker, 1985; Le Cun, 1985). The central idea behind this solution is that the errors for the
units of the hidden layer are determined by back-propagating the errors of the units of the
output layer. For this reason the method is often called the back-propagation learning rule.
Back-propagation can also be considered as a generalisation of the delta rule for non-linear
activation functions and multi-layer networks.

Multi-layer feed-forward networks

A feed-forward network has a layered structure. Each layer consists of units which receive their
input from units in the layer directly below and send their output to units in the layer directly
above. There are no connections within a layer. The $N_i$ inputs are fed into the first layer
of $N_{h,1}$ hidden units. The input units are merely 'fan-out' units; no processing takes place in
these units. The activation of a hidden unit is a function $F$ of the weighted inputs plus a bias:

$$y_k = F\Big(\sum_j w_{jk}\, y_j + \theta_k\Big)$$

The output of the hidden units is distributed over the next layer of $N_{h,2}$ hidden units, and
so on until the last layer of hidden units, whose outputs are fed into a layer of $N_o$ output
units. Although back-propagation can be applied to networks with any number of layers, it has
been shown, just as for networks with binary units (Hornik, Stinchcombe, & White, 1989; Funahashi,
1989; Cybenko, 1989; Hartman, Keeler, & Kowalski, 1990), that only one layer of hidden units
suffices to approximate any function with finitely many discontinuities to arbitrary precision,
provided the activation functions of the hidden units are non-linear (the universal approximation
theorem). In most applications a feed-forward network with a single layer of hidden units is used
with a sigmoid activation function for the units.
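The forward computation described above can be sketched as follows. This is a minimal illustration, not taken from the source: the network size, the weight values and the helper name `layer_forward` are assumptions, and the sigmoid is applied to the output unit as well as to the hidden units.

```python
import math

def sigmoid(s):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-s))

def layer_forward(inputs, weights, biases):
    """Activations of one layer: y_k = F(sum_j w_jk * y_j + theta_k).

    weights[j][k] is the weight from sending unit j to receiving unit k.
    """
    outputs = []
    for k in range(len(biases)):
        s_k = sum(inputs[j] * weights[j][k] for j in range(len(inputs))) + biases[k]
        outputs.append(sigmoid(s_k))
    return outputs

# Hypothetical network: 2 inputs, 2 hidden units, 1 output unit.
x = [1.0, 0.0]                      # input units only fan out their values
w_ih = [[0.5, -0.5], [0.3, 0.8]]    # input -> hidden weights (illustrative)
theta_h = [0.0, 0.0]                # hidden biases
w_ho = [[1.0], [-1.0]]              # hidden -> output weights (illustrative)
theta_o = [0.0]                     # output bias

y_h = layer_forward(x, w_ih, theta_h)    # hidden activations
y_o = layer_forward(y_h, w_ho, theta_o)  # network output
```

Because each layer only reads the activations of the layer below, the same function can be reused for any number of hidden layers.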

Delta Rule
Since we are now using units with non-linear activation functions, we have to generalise the delta
rule. The activation is a differentiable function of the total input, given by

$$y_k^p = F(s_k^p),$$

in which

$$s_k^p = \sum_j w_{jk}\, y_j^p + \theta_k.$$

To get the correct generalisation of the delta rule as presented in the previous chapter, we must
set

$$\Delta_p w_{jk} = -\gamma\, \frac{\partial E^p}{\partial w_{jk}}.$$

The error measure $E^p$ is defined as the total quadratic error for pattern p at the output units:

$$E^p = \frac{1}{2} \sum_{o=1}^{N_o} (d_o^p - y_o^p)^2,$$

where $d_o^p$ is the desired output for unit o when pattern p is clamped.
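As a small illustration of this error measure, the quadratic error for one pattern can be computed as below. The function name `pattern_error` and the example values are assumptions made for this sketch.

```python
def pattern_error(desired, actual):
    """Quadratic error for one pattern: half the sum of squared output differences."""
    return 0.5 * sum((d - y) ** 2 for d, y in zip(desired, actual))

# Hypothetical pattern with two output units.
desired = [1.0, 0.0]
actual = [0.8, 0.4]
e_p = pattern_error(desired, actual)  # 0.5 * (0.2**2 + 0.4**2), about 0.1
```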

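We further set $E = \sum_p E^p$ as the summed squared error. We can write

$$\frac{\partial E^p}{\partial w_{jk}} = \frac{\partial E^p}{\partial s_k^p}\, \frac{\partial s_k^p}{\partial w_{jk}}.$$

From the definition of the total input, $s_k^p = \sum_j w_{jk}\, y_j^p + \theta_k$, we see that the
second factor is

$$\frac{\partial s_k^p}{\partial w_{jk}} = y_j^p.$$

When we define

$$\delta_k^p = -\frac{\partial E^p}{\partial s_k^p},$$

we will get an update rule which is equivalent to the delta rule as described in the previous
chapter, resulting in a gradient descent on the error surface if we make the weight changes
according to:

$$\Delta_p w_{jk} = \gamma\, \delta_k^p\, y_j^p.$$

The trick is to figure out what $\delta_k^p$ should be for each unit k in the network. The
interesting result, which we now derive, is that there is a simple recursive computation of these
δ's which can be implemented by propagating error signals backward through the network.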

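To compute $\delta_k^p$ we apply the chain rule to write this partial derivative as the product of
two factors, one factor reflecting the change in error as a function of the output of the unit and
one reflecting the change in the output as a function of changes in the input. Thus, we have

$$\delta_k^p = -\frac{\partial E^p}{\partial s_k^p} = -\frac{\partial E^p}{\partial y_k^p}\, \frac{\partial y_k^p}{\partial s_k^p}.$$

Let us compute the second factor. Since $y_k^p = F(s_k^p)$, we see that

$$\frac{\partial y_k^p}{\partial s_k^p} = F'(s_k^p),$$

which is simply the derivative of the activation function F for the kth unit, evaluated at the
total input to that unit. To compute the first factor, we consider two cases. First, assume that
unit k is an output unit k = o of the network. In this case, it follows from the definition of
$E^p$ that

$$\frac{\partial E^p}{\partial y_o^p} = -(d_o^p - y_o^p),$$

which is the same result as we obtained with the standard delta rule. Substituting this and the
derivative of F into the expression for $\delta_k^p$, we get

$$\delta_o^p = (d_o^p - y_o^p)\, F'(s_o^p)$$

for any output unit o. Secondly, if k is not an output unit but a hidden unit k = h, we do not
readily know the contribution of the unit to the output error of the network. However, the error
measure can be written as a function of the net inputs from hidden to output layer,
$E^p = E^p(s_1^p, s_2^p, \ldots, s_j^p, \ldots)$, and we use the chain rule to write

$$\frac{\partial E^p}{\partial y_h^p} = \sum_{o=1}^{N_o} \frac{\partial E^p}{\partial s_o^p}\, \frac{\partial s_o^p}{\partial y_h^p} = \sum_{o=1}^{N_o} \frac{\partial E^p}{\partial s_o^p}\, w_{ho} = -\sum_{o=1}^{N_o} \delta_o^p\, w_{ho}.$$

Substituting this in the expression for $\delta_k^p$ yields

$$\delta_h^p = F'(s_h^p) \sum_{o=1}^{N_o} \delta_o^p\, w_{ho}.$$

The expressions for $\delta_o^p$ and $\delta_h^p$ give a recursive procedure for computing the δ's
for all units in the network, which are then used to compute the weight changes according to
$\Delta_p w_{jk} = \gamma\, \delta_k^p\, y_j^p$. This procedure constitutes the generalised delta
rule for a feed-forward network of non-linear units.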
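One way to gain confidence in the recursive δ computation is to compare it with a numerical derivative of the error. The sketch below does this for a single input-to-hidden weight. It is only an illustration: the choice of tanh as the differentiable activation F, the network size, and all names and values are assumptions that do not come from the source.

```python
import math

def F(s):
    """Assumed differentiable activation (tanh chosen only for this sketch)."""
    return math.tanh(s)

def dF(s):
    """Derivative of the assumed activation."""
    return 1.0 - math.tanh(s) ** 2

def forward(x, w_ih, th_h, w_ho, th_o):
    """Return total inputs and activations of the hidden and output layers."""
    s_h = [sum(x[i] * w_ih[i][h] for i in range(len(x))) + th_h[h]
           for h in range(len(th_h))]
    y_h = [F(s) for s in s_h]
    s_o = [sum(y_h[h] * w_ho[h][o] for h in range(len(y_h))) + th_o[o]
           for o in range(len(th_o))]
    y_o = [F(s) for s in s_o]
    return s_h, y_h, s_o, y_o

def deltas(d, s_h, s_o, y_o, w_ho):
    """Output deltas first, then hidden deltas from the weighted output deltas."""
    delta_o = [(d[o] - y_o[o]) * dF(s_o[o]) for o in range(len(d))]
    delta_h = [dF(s_h[h]) * sum(delta_o[o] * w_ho[h][o] for o in range(len(d)))
               for h in range(len(s_h))]
    return delta_o, delta_h

def error(d, y_o):
    """Quadratic error for one pattern."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(d, y_o))

# Hypothetical pattern and weights.
x, d = [0.5, -1.0], [0.2]
w_ih = [[0.1, -0.4], [0.7, 0.3]]
th_h = [0.05, -0.1]
w_ho = [[0.6], [-0.2]]
th_o = [0.0]

s_h, y_h, s_o, y_o = forward(x, w_ih, th_h, w_ho, th_o)
delta_o, delta_h = deltas(d, s_h, s_o, y_o, w_ho)

# Analytic gradient for the weight from input 0 to hidden unit 1: -delta_h * y_j.
analytic = -delta_h[1] * x[0]

# Central finite difference on the same weight.
eps = 1e-6
w_ih[0][1] += eps
e_plus = error(d, forward(x, w_ih, th_h, w_ho, th_o)[3])
w_ih[0][1] -= 2 * eps
e_minus = error(d, forward(x, w_ih, th_h, w_ho, th_o)[3])
w_ih[0][1] += eps  # restore the original weight
numeric = (e_plus - e_minus) / (2 * eps)
```

If the derivation is implemented correctly, `analytic` and `numeric` should agree up to the finite-difference error.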

Understanding backpropagation
The equations derived in the previous section may be mathematically correct, but what do they
actually mean? Is there a way of understanding back-propagation other than reciting the
necessary equations? The answer is, of course, yes. In fact, the whole back-propagation process
is intuitively very clear. What happens in the above equations is the following. When a learning
pattern is clamped, the activation values are propagated to the output units, and the actual
network output is compared with the desired output values; we usually end up with an error in
each of the output units. Let's call this error $e_o$ for a particular output unit o. We have to
bring $e_o$ to zero. The simplest method to do this is the greedy method: we strive to change the
connections in the neural network in such a way that, next time around, the error $e_o$ will be
zero for this particular pattern. We know from the delta rule that, in order to reduce an error,
we have to adapt its incoming weights according to

$$\Delta w_{ho} = \gamma\, (d_o - y_o)\, y_h.$$

That's step one. But it alone is not enough: when we only apply this rule, the weights from input
to hidden units are never changed, and we do not have the full representational power of the
feed-forward network as promised by the universal approximation theorem. In order to adapt the
weights from input to hidden units, we again want to apply the delta rule. In this case, however,
we do not have a value for δ for the hidden units. This is solved by the chain rule, which does the
following: distribute the error of an output unit o to all the hidden units that it is connected to,
weighted by this connection. Put differently, a hidden unit h receives a delta from each output
unit o equal to the delta of that output unit weighted with (= multiplied by) the weight of the
connection between those units.

Working with back-propagation

The application of the generalised delta rule thus involves two phases. During the first phase the
input x is presented and propagated forward through the network to compute the output values
$y_o^p$ for each output unit. This output is compared with its desired value $d_o$, resulting in an
error signal $\delta_o^p$ for each output unit. The second phase involves a backward pass through
the network during which the error signal is passed to each unit in the network and appropriate
weight changes are calculated.

Weight adjustments with sigmoid activation function

• The weight of a connection is adjusted by an amount proportional to the product of an
error signal δ on the unit k receiving the input and the output of the unit j sending this
signal along the connection:

$$\Delta_p w_{jk} = \gamma\, \delta_k^p\, y_j^p.$$

• If the unit is an output unit, the error signal is given by

$$\delta_o^p = (d_o^p - y_o^p)\, F'(s_o^p).$$

Take as the activation function F the 'sigmoid' function, defined as

$$y^p = F(s^p) = \frac{1}{1 + e^{-s^p}}.$$

In this case the derivative is equal to

$$F'(s^p) = y^p\, (1 - y^p),$$

such that the error signal for an output unit can be written as:

$$\delta_o^p = (d_o^p - y_o^p)\, y_o^p\, (1 - y_o^p).$$

• The error signal for a hidden unit is determined recursively in terms of error signals of the
units to which it directly connects and the weights of those connections. For the sigmoid
activation function:

$$\delta_h^p = y_h^p\, (1 - y_h^p) \sum_{o=1}^{N_o} \delta_o^p\, w_{ho}.$$
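The two phases and the sigmoid error signals listed above can be combined into a single training step, sketched below for one hidden layer. This is a minimal illustration under stated assumptions: biases are omitted for brevity, and the function name `train_step`, the learning rate value and the example weights are hypothetical.

```python
import math

def sigmoid(s):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-s))

def train_step(x, d, w_ih, w_ho, gamma):
    """One forward and one backward pass for a single pattern.

    Updates w_ih and w_ho in place and returns the pattern error
    measured before the update. Biases are left out of this sketch.
    """
    n_i, n_h, n_o = len(x), len(w_ho), len(d)

    # Phase 1: forward pass.
    y_h = [sigmoid(sum(x[i] * w_ih[i][h] for i in range(n_i))) for h in range(n_h)]
    y_o = [sigmoid(sum(y_h[h] * w_ho[h][o] for h in range(n_h))) for o in range(n_o)]

    # Phase 2: backward pass, using the sigmoid derivative y * (1 - y).
    delta_o = [(d[o] - y_o[o]) * y_o[o] * (1.0 - y_o[o]) for o in range(n_o)]
    delta_h = [y_h[h] * (1.0 - y_h[h]) * sum(delta_o[o] * w_ho[h][o] for o in range(n_o))
               for h in range(n_h)]

    # Weight changes: learning rate * delta of receiving unit * output of sending unit.
    for h in range(n_h):
        for o in range(n_o):
            w_ho[h][o] += gamma * delta_o[o] * y_h[h]
    for i in range(n_i):
        for h in range(n_h):
            w_ih[i][h] += gamma * delta_h[h] * x[i]

    return 0.5 * sum((d[o] - y_o[o]) ** 2 for o in range(n_o))

# Hypothetical pattern and initial weights.
x, d = [1.0, 0.5], [0.9]
w_ih = [[0.2, -0.3], [0.4, 0.1]]
w_ho = [[0.3], [-0.2]]
errors = [train_step(x, d, w_ih, w_ho, gamma=0.5) for _ in range(50)]
```

Note that the hidden deltas are computed before the hidden-to-output weights are changed, so both layers are updated from the same forward pass.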

Learning rate and momentum

The learning procedure requires that the change in weight is proportional to
$\partial E^p / \partial w$. True gradient descent requires that infinitesimal steps are taken. The
constant of proportionality is the learning rate γ. For practical purposes we choose a learning
rate that is as large as possible without leading to oscillation. One way to avoid oscillation at
large γ is to make the change in weight dependent on the past weight change by adding a momentum
term:

$$\Delta w_{jk}(t+1) = \gamma\, \delta_k^p\, y_j^p + \alpha\, \Delta w_{jk}(t),$$

where t indexes the presentation number and α is a constant which determines the effect of the
previous weight change.
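The momentum term can be sketched for a single weight as follows. The function name `momentum_update` and the numerical values are assumptions chosen for illustration; `grad_term` stands for the product of the error signal of the receiving unit and the output of the sending unit.

```python
def momentum_update(weight, grad_term, prev_change, gamma, alpha):
    """Return (new_weight, change), where
    change = gamma * grad_term + alpha * prev_change."""
    change = gamma * grad_term + alpha * prev_change
    return weight + change, change

# Hypothetical run: a constant gradient term over three presentations.
w, prev = 0.0, 0.0
history = []
for t in range(3):
    w, prev = momentum_update(w, grad_term=1.0, prev_change=prev, gamma=0.1, alpha=0.5)
    history.append(prev)
```

With a constant gradient term the successive changes grow (approximately 0.1, 0.15, 0.175 here), which shows how the momentum term reinforces steps that keep pointing in the same direction.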

Although, theoretically, the back-propagation algorithm performs gradient descent on the total
error only if the weights are adjusted after the full set of learning patterns has been presented,
more often than not the learning rule is applied to each pattern separately, i.e., a pattern p is
applied, $E^p$ is calculated, and the weights are adapted (p = 1, 2, ..., P). There exists empirical
indication that this results in faster convergence. Care has to be taken, however, with the order in
which the patterns are taught. For example, when using the same sequence over and over again
the network may become focused on the first few patterns. This problem can be overcome by
using a permuted training method.
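One possible reading of a permuted training method is to present the patterns in a freshly shuffled order in every pass over the training set. The sketch below assumes that interpretation; the generator name `permuted_epochs`, the fixed seed and the example patterns are hypothetical.

```python
import random

def permuted_epochs(patterns, n_epochs, seed=0):
    """Yield, for each epoch, the full pattern list in a newly shuffled order."""
    rng = random.Random(seed)           # fixed seed only to make the sketch repeatable
    order = list(range(len(patterns)))
    for _ in range(n_epochs):
        rng.shuffle(order)              # new permutation for every epoch
        yield [patterns[i] for i in order]

# Hypothetical training set of (input, desired output) pairs.
patterns = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
            ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]

epochs = list(permuted_epochs(patterns, n_epochs=3))
for epoch in epochs:
    for x, d in epoch:
        pass  # the per-pattern weight update would be applied here
```

Every pattern is still presented exactly once per epoch; only the order changes.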
