Multilayer Perceptron
Structure
6.1 Introduction
    Objectives
6.2 Multi-Layer Perceptron (MLP)
6.3 Backpropagation Learning
6.4 Summary
6.5 Solutions/Answers
6.6 Practical Assignment
6.7 References
6.1 INTRODUCTION
In this unit, we extend the single layer neural network architecture to the multilayer feedforward (MLFF) network with backpropagation (BP) learning. This network is also called the multilayer perceptron; as its name suggests, it has more than one layer. First, in Sec. 6.2, we briefly review the perceptron model discussed in Unit 4 and show how it is altered to form MLFF networks. In Sec. 6.3, we derive the generalized delta (backpropagation) learning rule and see how it is implemented in practice. We also examine variations in the learning process that improve its efficiency, and describe ways to avoid some potential problems that can arise during training.
Objectives
After studying this unit, you should be able to:
• define the multi-layer perceptron;
• formulate the multi-layer model for the given activation functions of input,
hidden and output layers;
• implement the backpropagation algorithm.
6.2 MULTI-LAYER PERCEPTRON (MLP)
The first approach to solving linearly inseparable problems such as XOR was to use more than one perceptron, each set up to identify a small, linearly separable section of the inputs. Their outputs were then combined by another perceptron, which produced a final indication of the class to which the input belongs.
Fig. 1: Combination of perceptrons to solve the XOR problem
However, such a combination cannot learn: there is no way of deciding which connections between active inputs should be strengthened in order to reinforce the correct parts of the network, because the actual inputs are effectively masked off from the output units by the intermediate layer. Moreover, the two states of a neuron, on or off, give no indication of the scale by which the weights should be adjusted. This is illustrated in Fig. 2, which shows step functions with thresholds at θ and at 0. Such hard-limiting threshold functions remove the very information that is needed if the network is to learn successfully. The network cannot determine which of the input weights should be increased and which should not, and so it is unable to produce a better solution next time. The way around this difficulty is to adjust the step function used for thresholding slightly and to use a slightly different nonlinearity.
Fig. 2: “Step” or “Heaviside” functions with thresholds at θ and at 0
Fig. 3: Model of a single neuron with inputs I1, …, In, weights w1, …, wn, a bias input I0 with weight w0 = θ, a summation unit and an activation function producing the output O
Now let us define a few nonlinear activation functions in addition to those defined earlier.
i) Linear Function: The output φ(I) is directly proportional to the input I. The corresponding graph is shown in Fig. 4.
Fig. 4: Linear activation function
ii) Piecewise Linear Function: The output is defined as
O = 1 if mI ≥ 1,
O = mI if −1 < mI < 1,
O = −1 if mI ≤ −1.
The corresponding graph is shown in Fig. 5.
Fig. 5: Piecewise linear activation function, saturating at +1 and −1
iii) Hard Limiter Function: The output is given as O = sgn[I], and the corresponding graph is shown in Fig. 6.
Fig. 6: Hard limiter (signum) function
iv) Unipolar Sigmoidal and Bipolar Sigmoidal Functions: The output of the unipolar sigmoidal function is given as O = 1 / (1 + exp(−λI)), whereas the output of the bipolar sigmoidal function is given as O = tanh[λI]. The corresponding graphs are shown in Fig. 7.
Fig. 7: Unipolar and bipolar sigmoidal functions
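As a quick illustration, these activation functions can be coded directly in C. The sketch below is ours (the function names, and the parameters m for the slope and lambda for the gain, are illustrative choices, not notation fixed by this unit); sgn(0) is taken as +1 here.

#include <math.h>

/* Piecewise linear function: output mI, saturating at +1 and -1. */
double piecewise_linear(double I, double m)
{
    double a = m * I;
    if (a >= 1.0)  return  1.0;
    if (a <= -1.0) return -1.0;
    return a;
}

/* Hard limiter: O = sgn(I), with sgn(0) taken as +1. */
double hard_limiter(double I)
{
    return (I >= 0.0) ? 1.0 : -1.0;
}

/* Unipolar sigmoid: O = 1 / (1 + exp(-lambda * I)). */
double unipolar_sigmoid(double I, double lambda)
{
    return 1.0 / (1.0 + exp(-lambda * I));
}

/* Bipolar sigmoid: O = tanh(lambda * I). */
double bipolar_sigmoid(double I, double lambda)
{
    return tanh(lambda * I);
}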
Consider the input and output vectors

I_I = [I_I1, I_I2, …, I_Im]^T and O_O = [O_O1, O_O2, …, O_On]^T.

Here the input layer consists of m neurons and the output layer consists of n neurons; the input layer uses the linear transfer function and the output layer uses the unipolar sigmoidal function. Accordingly,

{O_I} = {I_I}   (m × 1),

and the input to the jth output neuron is the weighted sum

I_Oj = Σ_{i=1}^{m} w_ij I_Ii.

In matrix form, the input to the output layer can be written as

[I_O] = [W]^T [O_I] = [W]^T [I_I],

where [I_O] is n × 1, [W]^T is n × m and [I_I] is m × 1.
The output of the kth output neuron is then

O_Ok = 1 / (1 + e^(−λ I_Ok)),

or, in matrix form, [O_O] = f([W]^T [I_I]),

where λ is the sigmoidal gain and [W] is the weight matrix, also known as the connection matrix.
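The relation [I_O] = [W]^T [I_I] followed by the sigmoid amounts to two nested loops. The following is a minimal sketch assuming m = 2 input and n = 2 output neurons, a weight array w[i][j] holding w_ij (input i to output j) and a sigmoidal gain lambda; these names and sizes are our own assumptions for illustration.

#include <math.h>

#define M 2   /* number of input neurons (assumed)  */
#define N 2   /* number of output neurons (assumed) */

/* O_out[j] = 1 / (1 + exp(-lambda * I_Oj)), with I_Oj = sum_i w[i][j] * I_in[i]. */
void forward_single_layer(const double I_in[M], const double w[M][N],
                          double lambda, double O_out[N])
{
    for (int j = 0; j < N; j++) {
        double I_Oj = 0.0;
        for (int i = 0; i < M; i++)
            I_Oj += w[i][j] * I_in[i];          /* [I_O] = [W]^T [I_I] */
        O_out[j] = 1.0 / (1.0 + exp(-lambda * I_Oj));
    }
}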
Fig. 8: Single layer network with m input neurons connected to n output neurons through the weights w_ij
In a multilayer perceptron, the adapted perceptrons are arranged in layers. The model considered here has three layers: an input layer, an output layer, and a layer between the input and the output called the hidden layer. We use a linear transfer function for the perceptrons in the input layer and sigmoidal (squashed-S) functions for the hidden layer and the output layer. The input layer performs no weighted sum or thresholding. Thus, we have modified the single layer perceptron by changing the nonlinearity from a step function to a sigmoidal function and by adding a hidden layer. This type of network can recognize more complex patterns. The input-output mapping of the multilayer perceptron is given by

O = N3[N2[N1[I]]],

where N1 is the mapping provided by the input layer, N2 is the nonlinear mapping provided by the hidden layer and N3 is the nonlinear mapping provided by the output layer. The multilayer perceptron supports many applications of neural networks, such as function approximation, learning and generalization. A multilayer network with one hidden layer is shown in Fig. 9. The activity of the output units is determined by the activity of the neurons in the hidden layer and the weights between the hidden and output layers, while the activity of the neurons in the hidden layer is determined by the activities of the neurons in the input layer and the connecting weights between the input and hidden units. In such networks, the neurons in the hidden layer are free to construct their own representations of the input.
Fig. 9: Multilayer network with one hidden layer (m input, p hidden and n output neurons)
E1) Consider the weights w11, w12, w21 and w22 on the connections from the input neurons to the hidden layer neurons, and let v1, v2 be the weights on the connections from the hidden layer neurons to the output neuron, with the following values:
w11 = 0.15, w12 = 0.3, w21 = 0.15, w22 = 0.3, v1 = −0.3 and v2 = 0.3. Also assume that the input (0, 0) generates (0, 0), (1, 1) generates (1, 1), and (1, 0) or (0, 1) generates (0, 1) as the hidden layer output.
i) Write the output set of the neurons at the hidden layer.
ii) Check whether the vectors in this set are linearly separable or not. Give a reason.
iii) Assume that the inputs (0, 0) and (1, 1) give 0, and (0, 1) generates 1 as output at the output layer; then draw the digraph of this network.
iv) Obtain the activations of the layers in the network.
E3) Consider the following table for the connections between the input neurons and
the hidden layer neurons.
6.3 BACKPROPAGATION LEARNING

Let us assume that the activation function relating the input of the input layer to its output is linear, so that

{O}_I = {I}_I   (n × 1).

The input to a hidden neuron is the weighted sum of the outputs of the input neurons. Thus, the input I_Hp to the pth hidden neuron is

I_Hp = Σ_{i=1}^{n} V_ip O_Ii,   p = 1, 2, …, (number of hidden neurons).

Denoting the weight matrix, or connectivity matrix, between the input neurons and the hidden neurons by [V], we can write the input to the hidden layer as
{I}_H = [V]^T {O}_I   in matrix notation, where {I}_H is p × 1, [V]^T is p × n and {O}_I is n × 1.
Let us again take the activation function for the output of the pth hidden neuron to be the sigmoidal (squashed-S) function. Then the output O_Hp is given by

O_Hp = 1 / (1 + e^(−λ(I_Hp − θ_Hp))),

where θ_Hp is the threshold of the pth hidden neuron. The computation of the hidden layer is shown in Fig. 12.
Fig. 12: Computation of the hidden layer from the input layer through the weights V, with a bias input O_I0 = −1 and thresholds θ_H
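In code, the hidden-layer computation with per-neuron thresholds can be sketched as follows; the array names and the fixed sizes are assumptions made for this illustration, not part of the unit's notation.

#include <math.h>

#define N_IN  3   /* input neurons (assumed)  */
#define P_HID 2   /* hidden neurons (assumed) */

/* O_H[p] = 1 / (1 + exp(-lambda * (I_Hp - theta_H[p]))),
   where I_Hp = sum_i V[i][p] * O_I[i].                     */
void hidden_layer(const double O_I[N_IN], const double V[N_IN][P_HID],
                  const double theta_H[P_HID], double lambda,
                  double O_H[P_HID])
{
    for (int p = 0; p < P_HID; p++) {
        double I_Hp = 0.0;
        for (int i = 0; i < N_IN; i++)
            I_Hp += V[i][p] * O_I[i];           /* {I}_H = [V]^T {O}_I */
        O_H[p] = 1.0 / (1.0 + exp(-lambda * (I_Hp - theta_H[p])));
    }
}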
Fig. 13: Computation of the output layer from the hidden layer through the weights W, with a bias input O_H0 = −1 and thresholds θ_O
The Euclidean norm of the error E_1 for the first training pattern is given by

E_1 = Σ_{i=1}^{n} E_1i = (1/2) Σ_{i=1}^{n} (O_Ti − O_Ci)²,

where E_1i is the error in the ith output neuron for the first training pattern, O_Ti is the target output and O_Ci is the calculated output.
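A small helper for this error measure might look as follows (the function name is our own choice):

/* E = (1/2) * sum over the n output neurons of (O_T[i] - O_C[i])^2. */
double pattern_error(const double O_T[], const double O_C[], int n)
{
    double E = 0.0;
    for (int i = 0; i < n; i++) {
        double diff = O_T[i] - O_C[i];
        E += 0.5 * diff * diff;
    }
    return E;
}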
Using this error, let us now write the modified values of the weight matrices using the steepest descent method:

[V]^(t+1) = [V]^t + [ΔV]^(t+1)
[W]^(t+1) = [W]^t + [ΔW]^(t+1)

where

[ΔW]^(t+1) = α [ΔW]^t + η [y]   (p × m),
[ΔV]^(t+1) = α [ΔV]^t + η [x]   (n × p),

and α is the momentum coefficient and η is the learning rate.
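In code, this update is an element-wise operation on the weight and weight-change matrices. The sketch below assumes the matrices are stored as flat arrays of length size; alpha and eta correspond to α and η above, and grad_term stands for the [y] or [x] matrix of the text.

/* delta = alpha * delta + eta * grad_term;  w = w + delta. */
void update_weights(double w[], double delta[], const double grad_term[],
                    int size, double alpha, double eta)
{
    for (int k = 0; k < size; k++) {
        delta[k] = alpha * delta[k] + eta * grad_term[k];
        w[k] += delta[k];
    }
}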
Step 2: For each training pair, assume that there are 'n' inputs given by [I]_I (n × 1) and 'm' outputs given by [O]_O (m × 1), in normalized form.

Step 3: Set the number of neurons in the hidden layer to lie between 1 < p < 21.
Step 9: Find
[V]^(t+1) = [V]^t + [ΔV]^(t+1)
[W]^(t+1) = [W]^t + [ΔW]^(t+1)

Step 10: Repeat Steps 5 to 9 until the error converges to within the tolerance value.
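As a rough starting point for the practical assignment at the end of this unit, the sketch below performs one forward and one backward pass for a small 2-2-1 network with unipolar sigmoidal units, assuming λ = 1, no thresholds, no momentum term and the standard generalized delta rule; it is a minimal sketch under those assumptions, not the only way to implement the steps above.

#include <math.h>

#define NI 2   /* input neurons (assumed)  */
#define NH 2   /* hidden neurons (assumed) */

static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

/* One training iteration for a 2-2-1 network; returns the squared error.
   V[i][p]: weight from input i to hidden neuron p; W[p]: hidden p to output. */
double train_step(const double I[NI], double target,
                  double V[NI][NH], double W[NH], double eta)
{
    double O_H[NH], I_O = 0.0;

    /* Forward pass: the input layer is linear, so O_I = I. */
    for (int p = 0; p < NH; p++) {
        double I_Hp = 0.0;
        for (int i = 0; i < NI; i++)
            I_Hp += V[i][p] * I[i];
        O_H[p] = sigmoid(I_Hp);
        I_O += W[p] * O_H[p];
    }
    double O = sigmoid(I_O);

    /* Backward pass: delta at the output, then deltas at the hidden layer. */
    double d = (target - O) * O * (1.0 - O);
    for (int p = 0; p < NH; p++) {
        double d_hid = d * W[p] * O_H[p] * (1.0 - O_H[p]);
        W[p] += eta * d * O_H[p];            /* hidden-to-output update */
        for (int i = 0; i < NI; i++)
            V[i][p] += eta * d_hid * I[i];   /* input-to-hidden update  */
    }
    return (target - O) * (target - O);
}

Calling train_step repeatedly over all training pairs, and stopping when the error falls below a tolerance, gives the repetition described in Step 10.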
Let us consider the following example to understand the algorithm.
Example 1: Consider the three training sets given in Table 2.

Table 2
I1      I2      Output O
0.3     –0.2    0.2
0.4     0.6     0.3
0.6     –0.2    0.1
Solution: Let us find the improved weights. The architecture of this model is given in Fig. 14.

Fig. 14: Network for Example 1, with initial input-to-hidden weights V11 = 0.2, V21 = 0.1, V12 = 0.4, V22 = –0.2 and hidden-to-output weights W11 = –0.5, W21 = 0.2, for the first training pair (0.3, –0.2)
For the first training pair, the input to the hidden layer is

[I]_H = [V]^T [O]_I = [0.2  0.1; 0.4  –0.2][0.3; –0.2] = [0.04; 0.16].

The calculated output of the hidden layer (taking λ = 1) is

[O_C]_H = [1/(1 + e^(–0.04)); 1/(1 + e^(–0.16))] = [0.51; 0.54].

The input to the output layer is

[I]_O = [W]^T [O_C]_H = [–0.5  0.2][0.51; 0.54] = –0.15,

and the calculated output is

[O_C]_O = 1/(1 + e^(0.15)) = 0.462.
The error is

error = (O_TO – O_CO)² = (0.2 – 0.462)² = 0.069.

Next,

d = (O_TO – O_CO)(O_CO)(1 – O_CO) = (0.2 – 0.462)(0.462)(1 – 0.462) = –0.065,

so that

[y] = [O_C]_H [d] = [0.51; 0.54](–0.065) = [–0.033; –0.035].
Here assume that α = 1 and η = 0.5. Then

[Δw]^1 = α [Δw]^0 + η [y] = [–0.517; 0.182].
The error propagated back to the hidden layer is

[e] = [w][d] = [0.032; –0.013],

and

[d*] = [–0.001; 0.0004],

so that

[X] = [O]_I [d*]^T = [0.3; –0.2][–0.001  0.0004] = [–0.0003  0.0001; 0.0002  –0.00009].

The updated weights are then

[V]^1 = [0.1998  0.40007; 0.1001  –0.20004]

and [W]^1 = [–1.01; 0.38].
Using these modified [V]^1 and [W]^1, the error is calculated again, and the next training set can then be processed.
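To check the arithmetic of Example 1, the short program below reproduces the forward pass for the first training pair with the initial weights (λ = 1 assumed); it prints values close to those quoted above (0.04 and 0.16, 0.51 and 0.54, about –0.15, 0.46 and 0.069).

#include <stdio.h>
#include <math.h>

static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

int main(void)
{
    /* Initial weights and first training pair of Example 1. */
    double V[2][2] = { {0.2, 0.4}, {0.1, -0.2} };  /* V[i][p]: input i -> hidden p */
    double W[2]    = { -0.5, 0.2 };                /* hidden -> output             */
    double I[2]    = { 0.3, -0.2 };
    double target  = 0.2;

    double I_H[2], O_H[2], I_O = 0.0;
    for (int p = 0; p < 2; p++) {
        I_H[p] = V[0][p] * I[0] + V[1][p] * I[1];
        O_H[p] = sigmoid(I_H[p]);
        I_O   += W[p] * O_H[p];
    }
    double O_C = sigmoid(I_O);

    printf("I_H = (%.3f, %.3f)\n", I_H[0], I_H[1]);
    printf("O_H = (%.3f, %.3f)\n", O_H[0], O_H[1]);
    printf("I_O = %.3f, O_C = %.3f\n", I_O, O_C);
    printf("error = %.3f\n", (target - O_C) * (target - O_C));
    return 0;
}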
E4) Find the modified weights for the training set having input I1 = 0.3, I2 = –0.5 and output O = 0.1, with

[V] = [0.1  0.4; –0.2  0.2] and [W] = [0.2; –0.5].
E5) How many layers are there in a multilayer neural network?
E6) What is the cycle used to modify the weight values?
6.4 SUMMARY
In this unit, we have covered the following points.
i) Neural network architectures are broadly classified into single layer feedforward networks and multilayer feedforward networks. If only the input and output layers are present, the network is a single layer network. If, in addition to the input and output layers, one or more intermediate layers exist, the network is a multilayer network.
ii) Backpropagation is a systematic method of training multilayer neural networks. It is built on a sound mathematical foundation and has very good application potential.
6.5 SOLUTIONS/ANSWERS
E1) i) Output set = {(0, 0), (1, 1), (0, 1)}
ii) The three vectors given in (i) are linearly separable, with (0, 0) and (1, 1) on one side of the separating line and (0, 1) on the other side.
iii) The digraph of the network has weights w11 = 0.15, w12 = 0.3, w21 = 0.15 and w22 = 0.3 on the input-to-hidden connections, v1 = –0.3 and v2 = 0.3 on the hidden-to-output connections, and thresholds of 0.2 at the hidden and output neurons.
iv)
Input    Activation (hidden layer)   Output (hidden layer)   Output neuron activation   Output of the network
(0, 0)   (0, 0)                      (0, 0)                  0                          0
(1, 1)   (0.3, 0.6)                  (1, 1)                  0                          0
(0, 1)   (0.15, 0.3)                 (0, 1)                  0.3                        1
(1, 0)   (0.15, 0.3)                 (0, 1)                  0.3                        1
E2) The points A(1, 1), B(–1, 1), C(–1, –1) and D(1, –1) plotted in the input plane for the output O.
E3) i) The network has three hidden neurons with input weights (1, 1, 0.2), (0.1, –1, 0.3) and (–1, –1, 0.6) and thresholds 1.8, 0.05 and –0.2 respectively; their contributions to the output neuron are 0.6, 0.3 and 0.6, and the output neuron has a threshold of 0.5. These values are used in the table below.
Vertex/        Hidden    Weighted sum          Comment    Activation   Contribution   Sum
Coordinates    neuron                                                  to output
O: 0, 0, 0       1       0+0+0 = 0             < 1.8         0            0
                 2       0+0+0 = 0             < 0.05        0            0
                 3       0+0+0 = 0             > –0.2        1            0.6         0.6*
A: 0, 0, 1       1       0+0+0.2 = 0.2         < 1.8         0            0
                 2       0+0+0.3 = 0.3         > 0.05        1            0.3
                 3       0+0+0.6 = 0.6         > –0.2        1            0.6         0.9*
B: 0, 1, 0       1       0+1+0 = 1             < 1.8         0            0
                 2       0–1+0 = –1            < 0.05        0            0
                 3       0–1+0 = –1            < –0.2        0            0           0
C: 0, 1, 1       1       0+1+0.2 = 1.2         < 1.8         0            0
                 2       0+0.1+0.2 = 0.2       > 0.05        1            0.3
                 3       0–1+0.6 = –0.4        < –0.2        0            0           0.3
D: 1, 0, 0       1       1+0+0 = 1             < 1.8         0            0
                 2       0.1+0+0 = 0.1         > 0.05        1            0.3
                 3       –1+0+0 = –1           < –0.2        0            0           0.3
E: 1, 0, 1       1       1–0+0.2 = 1.2         < 1.8         0            0
                 2       0.1+0+0.2 = 0.4       > 0.05        1            0.3
                 3       –1+0+0.6 = –0.4       < –0.2        0            0           0.3
F: 1, 1, 0       1       1+1+0 = 2             > 1.8         1            0.6
                 2       0.1–1+0 = –0.9        < 0.05        0            0
                 3       –1–1+0 = –2           < –0.2        0            0           0.6*
G: 1, 1, 1       1       1+1+0.2 = 2.2         > 1.8         1            0.6
                 2       0.1–1+0.3 = –0.8      < 0.05        0            0
                 3       –1–1+0.6 = –1.4       < –0.2        0            0           0.6*

* The output neuron fires, as this value is greater than 0.5 (the threshold value); the function value is +1.
6.6 PRACTICAL ASSIGNMENT

Write a program in 'C' language to implement the backpropagation algorithm. Show the step-by-step outputs of the input, hidden and output neurons, as well as the errors. How are the weights W and V modified? Use the data given in Example 1.
6.7 REFERENCES