
Backpropagation Algorithm
The learning algorithm used to adjust the synaptic weights of a multilayer perceptron is
known as back-propagation. This algorithm provides a computationally efficient method for
the training of the multilayer perceptron. Although it does not provide a solution for all problems,
it put to rest the criticism about learning in multilayer neural networks.
The output of neuron $j$ is given by

$$y_j = \varphi_j(v_j), \qquad v_j = \sum_{i=1}^{n} w_{ij}\, y_i$$

where $v_j$ is the induced local field of neuron $j$, $\varphi_j(\cdot)$ its activation function, and $y_i$ the outputs of the $n$ neurons connected to it.
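As an illustration, the following minimal sketch (Python with NumPy, not part of the original text) computes this forward pass for a single neuron; the logistic function is an assumed choice of $\varphi_j$:

```python
import numpy as np

def neuron_forward(w_j, y_prev, phi=lambda v: 1.0 / (1.0 + np.exp(-v))):
    """Forward pass of one neuron j: v_j = sum_i w_ij * y_i, y_j = phi_j(v_j)."""
    v_j = np.dot(w_j, y_prev)   # induced local field v_j
    return phi(v_j)             # neuron output y_j

# Illustrative example: 3 inputs from the previous layer
y_prev = np.array([0.5, -1.0, 0.25])
w_j = np.array([0.2, 0.4, -0.1])
print(neuron_forward(w_j, y_prev))
```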

The error at the output of neuron j is given as:


$$e_j(k) = y_j(k) - d_j(k)$$

where:
$d_j$ is the desired output,
$y_j$ is the neuron output,
$k$ indexes the $k$-th training example.
The instantaneous sum of the squared output errors is given by:
$$\xi(k) = \frac{1}{2} \sum_{j=1}^{l} e_j^2(k)$$

where l is the number of neurons of the output layer.
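For concreteness, a small sketch (Python/NumPy, with illustrative values not taken from the text) of the error and the instantaneous cost for one example $k$, following the definitions above:

```python
import numpy as np

# Illustrative outputs and desired responses for one example k (l = 2 output neurons)
y = np.array([0.8, 0.3])   # neuron outputs y_j(k)
d = np.array([1.0, 0.0])   # desired outputs d_j(k)

e = y - d                  # e_j(k) = y_j(k) - d_j(k)
xi = 0.5 * np.sum(e ** 2)  # xi(k) = (1/2) * sum_j e_j(k)^2
print(e, xi)               # [-0.2  0.3] 0.065
```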


Using gradient descent, the weight connecting neuron $i$ to neuron $j$ is updated as:

$$\Delta w_{ij}(k) = w_{ij}(k+1) - w_{ij}(k) = -\eta\, \frac{\partial \xi(k)}{\partial w_{ij}(k)}$$

where $\eta$ is the learning rate. The correction term $\Delta w_{ij}(k)$ is known as the delta rule. The term $\partial \xi(k)/\partial w_{ij}(k)$ can be calculated by the chain rule as

$$\frac{\partial \xi(k)}{\partial w_{ij}(k)} = \frac{\partial \xi(k)}{\partial e_j(k)}\, \frac{\partial e_j(k)}{\partial y_j(k)}\, \frac{\partial y_j(k)}{\partial v_j(k)}\, \frac{\partial v_j(k)}{\partial w_{ij}(k)}$$

Fig. 1. Signal flow through the network: an input $y_k$ reaches hidden neuron $i$ through the weight $w_{ki}$ (giving $v_i$ and $y_i$), and the hidden output reaches output neuron $j$ through the weight $w_{ij}$ (giving $v_j$ and $y_j$), whose output is compared with the desired response $d_j$.

The partial derivatives are given by

$$\frac{\partial \xi(k)}{\partial e_j(k)} = e_j(k), \qquad
\frac{\partial e_j(k)}{\partial y_j(k)} = 1, \qquad
\frac{\partial y_j(k)}{\partial v_j(k)} = \varphi'_j(v_j(k)), \qquad
\frac{\partial v_j(k)}{\partial w_{ij}(k)} = y_i(k)$$

So, the delta rule can be rewritten as

$$w_{ij}(k+1) = w_{ij}(k) - \eta\, y_i(k)\, e_j(k)\, \varphi'_j(v_j(k))$$

For a weight $w_{ki}(k)$ of the hidden layer, connecting input node $k$ to hidden neuron $i$, the same chain rule gives

$$\frac{\partial \xi(k)}{\partial w_{ki}(k)} = \sum_j \frac{\partial \xi(k)}{\partial e_j(k)}\, \frac{\partial e_j(k)}{\partial y_j(k)}\, \frac{\partial y_j(k)}{\partial v_j(k)}\, \frac{\partial v_j(k)}{\partial y_i(k)}\, \frac{\partial y_i(k)}{\partial v_i(k)}\, \frac{\partial v_i(k)}{\partial w_{ki}(k)}$$

$$= \left\{ \sum_j e_j(k)\, \varphi'_j(v_j(k))\, w_{ij} \right\} \varphi'_i(v_i(k))\, y_k(k)$$

$$= e_i(k)\, \varphi'_i(v_i(k))\, y_k(k)$$

where $e_i(k) = \sum_j e_j(k)\, \varphi'_j(v_j(k))\, w_{ij}$ is the error backpropagated to hidden neuron $i$. So

$$w_{ki}(k+1) = w_{ki}(k) - \eta\, y_k(k)\, e_i(k)\, \varphi'_i(v_i(k))$$


In the output layer $\varphi_j(v_j)$ is linear, so $\varphi'_j(v_j) = 1$ and

$$w_{ij}(k+1) = w_{ij}(k) - \eta\, y_i(k)\, e_j(k)$$
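Putting the two update rules together, the sketch below (Python/NumPy; the layer sizes, the logistic hidden activation, the learning rate value, and the random initial weights are illustrative assumptions, not values from the text) performs one training step for a network with one hidden layer and a linear output layer, following the updates derived above:

```python
import numpy as np

def backprop_step(W_ki, W_ij, u, d, eta=0.1):
    """One backpropagation step for a 1-hidden-layer MLP with linear output.

    W_ki: input-to-hidden weights, shape (n_hidden, n_inputs)
    W_ij: hidden-to-output weights, shape (n_outputs, n_hidden)
    u: input vector, d: desired output vector, eta: learning rate.
    """
    phi = lambda v: 1.0 / (1.0 + np.exp(-v))   # assumed hidden activation
    dphi = lambda v: phi(v) * (1.0 - phi(v))   # its derivative

    # Forward pass
    v_i = W_ki @ u                             # hidden induced local fields
    y_i = phi(v_i)                             # hidden outputs
    y_j = W_ij @ y_i                           # linear output layer: y_j = v_j

    # Errors
    e_j = y_j - d                              # output errors e_j(k) = y_j(k) - d_j(k)
    e_i = W_ij.T @ e_j                         # e_i(k) = sum_j e_j(k) * phi'_j * w_ij (phi'_j = 1 here)

    # Delta-rule updates
    W_ij_new = W_ij - eta * np.outer(e_j, y_i)              # w_ij(k+1) = w_ij(k) - eta*y_i*e_j
    W_ki_new = W_ki - eta * np.outer(e_i * dphi(v_i), u)    # w_ki(k+1) = w_ki(k) - eta*y_k*e_i*phi'_i(v_i)
    return W_ki_new, W_ij_new

# Tiny usage example with random weights
rng = np.random.default_rng(0)
W_ki = rng.normal(size=(4, 3))
W_ij = rng.normal(size=(2, 4))
W_ki, W_ij = backprop_step(W_ki, W_ij, u=np.array([0.5, -1.0, 0.25]), d=np.array([1.0, 0.0]))
```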

The backpropagation algorithm has become the most popular one for training the multilayer perceptron. It is computationally very efficient, and it is able to classify information that is not linearly separable. The algorithm is a gradient technique, implementing a single-step search in the direction of the minimum, which could be a local minimum rather than the global one; therefore, its convergence to a global optimum cannot be guaranteed.
Function Approximation
A multilayer perceptron trained with the backpropagation algorithm is able to perform
a general nonlinear input-output mapping from $\mathbb{R}^n$ (the dimension of the input space) to $\mathbb{R}^l$
(the dimension of the output space). In 1989, two additional papers were published with proofs that
the multilayer perceptron is a universal approximator of continuous functions. The capability
of the multilayer perceptron to approximate arbitrary continuous functions is established in the
following theorem.
Theorem 1 (Cybenko): Let the activation function $\varphi(\cdot)$ of the neural network be nonconstant,
bounded, and monotone increasing. Then for any continuous function $f$ and any
$\varepsilon > 0$, there exist an integer $m$ and real constants $v_i$, $w_{ij}$ and $\theta_i$ such that a neural
network with one hidden layer satisfies

$$\left|\, \sum_{i=1}^{m} v_i\, \varphi\!\left( \sum_{j=1}^{n} w_{ij}\, u_j + \theta_i \right) - f(u_1, u_2, \ldots, u_n) \right| < \varepsilon$$
This theorem applies directly to the multilayer perceptron, with the following characteristics:
1. The input nodes: $u_1, u_2, \ldots, u_n$.
2. A hidden layer of $m$ neurons completely connected to the inputs.
3. The activation function $\varphi(\cdot)$ of the hidden neurons is nonconstant, bounded, and monotonically increasing.
4. The network output is a linear combination of the hidden neuron outputs.
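As a concrete reading of the theorem, the sketch below (Python/NumPy; the logistic sigmoid, the number of hidden neurons $m$, and the random constants are illustrative assumptions) evaluates a network of exactly this form, $F(u) = \sum_i v_i\, \varphi(\sum_j w_{ij} u_j + \theta_i)$; the theorem asserts that, for a suitable $m$ and suitable constants, $|F(u) - f(u)|$ can be made smaller than any $\varepsilon > 0$:

```python
import numpy as np

def F(u, v, W, theta, phi=lambda x: 1.0 / (1.0 + np.exp(-x))):
    """One-hidden-layer approximator F(u) = sum_i v_i * phi(sum_j w_ij * u_j + theta_i)."""
    return v @ phi(W @ u + theta)

# Illustrative instance: n = 2 inputs, m = 5 hidden neurons, random constants
rng = np.random.default_rng(1)
m, n = 5, 2
v, W, theta = rng.normal(size=m), rng.normal(size=(m, n)), rng.normal(size=m)
print(F(np.array([0.3, -0.7]), v, W, theta))
```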
