Backpropagation Algorithm
The learning algorithm used to adjust the synaptic weights of a multilayer perceptron is known as backpropagation. This algorithm provides a computationally efficient method for training the multilayer perceptron. Even though it does not give a solution to all problems, it put to rest the criticism about learning in multilayer neural networks.
For a neuron $j$, the output is

$$ y_j = \varphi_j(v_j), $$

where the induced local field is

$$ v_j = \sum_{i=1}^{n} w_{ij}\, y_i, $$

and the instantaneous error energy at iteration $k$ is

$$ \varepsilon(k) = \frac{1}{2} \sum_{j} e_j^2(k), $$

where $e_j(k) = d_j(k) - y_j(k)$ is the error signal at output neuron $j$ and the sum runs over the output neurons.
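A minimal NumPy sketch may make these forward-pass and error-energy equations concrete. The names below (`phi`, `forward`, `W1`, `W2`) are assumptions for illustration only, and tanh is an assumed choice for the activation $\varphi$:

```python
import numpy as np

def phi(v):
    """Assumed activation function phi: the hyperbolic tangent."""
    return np.tanh(v)

def forward(u, W1, W2):
    """Forward pass through one hidden layer.

    u  : input vector, shape (n,)
    W1 : hidden-layer weights, shape (m, n); row i holds the weights w_{ki} into neuron i
    W2 : output-layer weights, shape (l, m); row j holds the weights w_{ij} into neuron j
    """
    v_hidden = W1 @ u          # v_i = sum_k w_{ki} u_k
    y_hidden = phi(v_hidden)   # y_i = phi_i(v_i)
    v_out = W2 @ y_hidden      # v_j = sum_i w_{ij} y_i
    y_out = phi(v_out)         # y_j = phi_j(v_j)
    return v_hidden, y_hidden, v_out, y_out

def error_energy(d, y_out):
    """Error signal e_j(k) = d_j(k) - y_j(k) and energy eps(k) = 1/2 sum_j e_j(k)^2."""
    e = d - y_out
    return e, 0.5 * np.sum(e**2)
```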
The correction term $\Delta w_{ij}(k) = -\eta\, \partial \varepsilon(k) / \partial w_{ij}(k)$, where $\eta$ is the learning-rate parameter, is known as the delta rule. The term $\partial \varepsilon(k) / \partial w_{ij}(k)$ can be calculated by the chain rule as

$$ \frac{\partial \varepsilon(k)}{\partial w_{ij}(k)} = \frac{\partial \varepsilon(k)}{\partial e_j(k)}\, \frac{\partial e_j(k)}{\partial y_j(k)}\, \frac{\partial y_j(k)}{\partial v_j(k)}\, \frac{\partial v_j(k)}{\partial w_{ij}(k)}. $$
Fig. 1. Signal-flow graph of hidden neuron $i$ and output neuron $j$: input node $k$ feeds neuron $i$ through weight $w_{ki}$ (induced field $v_i$, output $y_i$), and neuron $i$ feeds output neuron $j$ through weight $w_{ij}$ (induced field $v_j$, output $y_j$, desired response $d_j$).
The partial derivatives are given by

$$ \frac{\partial \varepsilon(k)}{\partial e_j(k)} = e_j(k), \qquad \frac{\partial e_j(k)}{\partial y_j(k)} = -1, \qquad \frac{\partial y_j(k)}{\partial v_j(k)} = \varphi'_j\big(v_j(k)\big), \qquad \frac{\partial v_j(k)}{\partial w_{ij}(k)} = y_i(k), $$

so that the weight update for an output neuron becomes

$$ w_{ij}(k+1) = w_{ij}(k) + \eta\, e_j(k)\, \varphi'_j\big(v_j(k)\big)\, y_i(k). $$
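This output-layer update can be sketched in a few lines, reusing the quantities computed by the `forward` helper from the earlier block; `phi_prime` (the derivative of the assumed tanh activation) and the learning rate `eta` are likewise assumed names:

```python
def phi_prime(v):
    """Derivative of the assumed tanh activation: phi'(v) = 1 - tanh(v)^2."""
    return 1.0 - np.tanh(v)**2

def update_output_weights(W2, eta, e, v_out, y_hidden):
    """Delta rule for output neurons:
    w_{ij}(k+1) = w_{ij}(k) + eta * e_j(k) * phi'_j(v_j(k)) * y_i(k)."""
    delta_out = e * phi_prime(v_out)                # local gradient delta_j(k)
    return W2 + eta * np.outer(delta_out, y_hidden)
```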
For a weight $w_{ki}(k)$ connecting input node $k$ to hidden neuron $i$, the same chain rule extends through the output layer:

$$ \frac{\partial \varepsilon(k)}{\partial w_{ki}(k)} = \sum_j \frac{\partial \varepsilon(k)}{\partial e_j(k)}\, \frac{\partial e_j(k)}{\partial y_j(k)}\, \frac{\partial y_j(k)}{\partial v_j(k)}\, \frac{\partial v_j(k)}{\partial y_i(k)}\, \frac{\partial y_i(k)}{\partial v_i(k)}\, \frac{\partial v_i(k)}{\partial w_{ki}(k)} = -\Big\{ \sum_j e_j(k)\, \varphi'_j\big(v_j(k)\big)\, w_{ij}(k) \Big\}\, \varphi'_i\big(v_i(k)\big)\, y_k(k). $$

So the update for a hidden-layer weight is

$$ w_{ki}(k+1) = w_{ki}(k) + \eta\, \Big\{ \sum_j e_j(k)\, \varphi'_j\big(v_j(k)\big)\, w_{ij}(k) \Big\}\, \varphi'_i\big(v_i(k)\big)\, y_k(k). $$
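The hidden-layer correction can be sketched in the same style: the output-layer local gradients are propagated back through the weights $w_{ij}$ before being scaled by $\varphi'_i(v_i)$ and the input signal. The helper names are assumptions carried over from the previous sketches:

```python
def update_hidden_weights(W1, W2, eta, e, v_out, v_hidden, u):
    """Backpropagated update for hidden neurons:
    w_{ki}(k+1) = w_{ki}(k)
                  + eta * { sum_j e_j(k) phi'_j(v_j(k)) w_{ij}(k) } * phi'_i(v_i(k)) * u_k."""
    delta_out = e * phi_prime(v_out)                         # output-layer local gradients
    delta_hidden = (W2.T @ delta_out) * phi_prime(v_hidden)  # delta_i(k) for hidden neurons
    return W1 + eta * np.outer(delta_hidden, u)
```

One forward pass followed by these two weight corrections constitutes a single iteration of the gradient search discussed next.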
The backpropagation algorithm has become the most popular one for training the multilayer perceptron. It is computationally very efficient and is able to classify information that is not linearly separable. The algorithm is a gradient technique, implementing only a one-step search in the direction of the minimum, which could be a local rather than a global one. Hence it is not possible to demonstrate its convergence to a global optimum.
Function Approximation
A multilayer perceptron trained with the backpropagation algorithm is able to perform a general nonlinear input-output mapping from $\mathbb{R}^n$ (the dimension of the input space) to $\mathbb{R}^l$ (the dimension of the output space). In 1989, two additional papers were published with proofs that the multilayer perceptron is a universal approximator of continuous functions. The capability of the multilayer perceptron to approximate arbitrary continuous functions is established in the following theorem.
Theorem 1 (Cybenko): Let the activation function $\varphi(\cdot)$ of the neural network be a nonconstant, bounded, and monotone-increasing continuous function. Then for any continuous function $f$ and any $\varepsilon > 0$, there exist an integer $m$ and real constants $v_i$, $w_{ij}$, and $\theta_i$ such that a neural network with one hidden layer satisfies

$$ \left| \sum_{i=1}^{m} v_i\, \varphi\!\left( \sum_{j=1}^{n} w_{ij} u_j - \theta_i \right) - f(u_1, u_2, \ldots, u_n) \right| < \varepsilon. $$
This theorem applies directly to the multilayer perceptron, with the following characteristics:
1. The input nodes are $u_1, u_2, \ldots, u_n$.
2. A hidden layer of $m$ neurons is completely connected to the input.
3. The activation function $\varphi(\cdot)$ of the hidden neurons is nonconstant, bounded, and monotonically increasing.
4. The network output is a linear combination of the hidden neuron outputs.
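As a sketch of the network form assumed in Theorem 1, the snippet below evaluates $F(u) = \sum_{i=1}^{m} v_i\, \varphi(\sum_{j=1}^{n} w_{ij} u_j - \theta_i)$. The logistic sigmoid is an assumed choice for $\varphi$, and in practice the constants $v_i$, $w_{ij}$, $\theta_i$ would be obtained by training, for example with backpropagation:

```python
import numpy as np

def sigmoid(x):
    """A nonconstant, bounded, monotone-increasing activation, as Theorem 1 requires."""
    return 1.0 / (1.0 + np.exp(-x))

def one_hidden_layer_output(u, v, W, theta):
    """F(u) = sum_i v_i * phi( sum_j w_{ij} u_j - theta_i ).

    u     : input vector, shape (n,)
    v     : output weights, shape (m,)
    W     : hidden-layer weights, shape (m, n)
    theta : hidden thresholds, shape (m,)
    """
    return float(v @ sigmoid(W @ u - theta))
```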