Back Propagation Network : Soft Computing
Course Lecture 15 – 20, notes, slides
www.myreaders.info/ , RC Chakraborty, e-mail [email protected] , Aug. 10, 2010
http://www.myreaders.info/html/soft_computing.html
Topics
1. Back-Propagation Network – Background
2. Back Propagation Network
3. Back-Propagation Algorithm
4. References
Back-Propagation Network

What is BPN ?
A back-propagation network (BPN) is a multi-layer feed-forward network trained by propagating the error at the output layer backward, layer by layer, to adjust the weights of the network.
1. Back-Propagation Network – Background

The real world presents situations where data is incomplete or noisy, and some mechanism is needed that may help to reconstruct the missing data. It is in such situations that the back-propagation network is useful. A BackProp network consists of at least three layers of units :
− an input layer,
− at least one intermediate hidden layer, and
− an output layer.
• With BackProp networks, learning occurs during a training phase :
− each input pattern in a training set is applied to the input units and then propagated forward through the network;
− the pattern of activation arriving at the output layer is compared with the correct (target) output pattern to calculate an error signal;
− the error signal for each such target output pattern is then back-propagated from the outputs to the inputs in order to appropriately adjust the weights in each layer of the network.
1.1 Learning

AND function realized by a single threshold unit :

      X1   X2  |  Y
      0    0   |  0
      0    1   |  0
      1    0   |  0
      1    1   |  1

[Fig. : inputs I1 and I2 (units A and B) connected through weights W1 and W2 to an output unit C that produces the output O]

For the unit to realize the AND function, the weights w1, w2 and the threshold θ must satisfy the conditions :

      w1·0 + w2·0 < θ ,    w1·0 + w2·1 < θ ,
      w1·1 + w2·0 < θ ,    w1·1 + w2·1 > θ

− one possible solution :
  if both weights are set to 1 and the threshold is set to 1.5, then
      (1)(0) + (1)(0) < 1.5 assign 0 ,   (1)(0) + (1)(1) < 1.5 assign 0
      (1)(1) + (1)(0) < 1.5 assign 0 ,   (1)(1) + (1)(1) > 1.5 assign 1
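As a check, the minimal Python sketch below evaluates such a threshold unit for the four input pairs, using the weights w1 = w2 = 1 and threshold θ = 1.5 chosen above (the function name is illustrative).

    # Minimal sketch: a two-input threshold unit realizing AND with the
    # solution found above (w1 = w2 = 1, theta = 1.5).
    def threshold_unit(x1, x2, w1=1.0, w2=1.0, theta=1.5):
        net = w1 * x1 + w2 * x2          # weighted sum of the two inputs
        return 1 if net > theta else 0   # output 1 only if net exceeds the threshold

    for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x1, x2, "->", threshold_unit(x1, x2))   # prints 0, 0, 0, 1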
• Example 1

AND Problem (truth table and network as shown above)

The output unit computes the weighted sum of its two inputs and compares this value with a threshold θ.
− if the net input (net) is greater than the threshold, then the output is 1, else it is 0.
− mathematically, the computation performed by the output unit is
      net = w1 I1 + w2 I2 ;   output = 1 if net > θ , otherwise output = 0.
• Example 2

Marital status and occupation
In example 1 above, the input characteristics may instead be attributes such as marital status (single or married) and occupation, with the output a corresponding yes/no decision computed by the same kind of threshold unit.
1.2 Simple Learning Machines

Rosenblatt (late 1950s) proposed learning networks called the Perceptron. The task of the perceptron is to discover a set of connection weights (and a threshold) that correctly classifies a set of binary input vectors, by comparing the actual output of the network with the target output for each training pattern.
• Error Measure ( learning rule )

As mentioned above, the error measure is the difference between the actual output of the network and the target output (0 or 1).
− If the output is correct, then no change is made to the weights or the threshold.
− Case 1 : If the output unit is 1 but needs to be 0, then the threshold is increased by 1 and the weights on connections from active (1) inputs are decreased by 1.
− Case 2 : If the output unit is 0 but needs to be 1, then the opposite changes are made.
• Perceptron Learning Rule : Equations

The perceptron learning rule is governed by two equations, one for the change in the weights and one for the change in the threshold. For a training pattern p with inputs ipi, target output tp and actual output op :

      ∆ wi = (tp - op) ipi = dp ipi
      ∆ θ  = - (tp - op)   = - dp

where dp = (tp - op) is the difference between the target and the actual output.
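A minimal Python sketch of this update rule follows; the zero initial weights, the AND training data and the fixed number of passes are illustrative assumptions, not part of the original notes.

    # Minimal sketch of the perceptron learning rule stated above:
    #   delta_w_i = (tp - op) * ipi      delta_theta = -(tp - op)
    def perceptron_step(weights, theta, inputs, target):
        net = sum(w * x for w, x in zip(weights, inputs))
        output = 1 if net > theta else 0
        d = target - output                                       # dp = tp - op
        weights = [w + d * x for w, x in zip(weights, inputs)]    # delta_w_i = dp * ipi
        theta = theta - d                                         # delta_theta = -dp
        return weights, theta

    # Illustrative use: learning the AND function from zero initial weights.
    weights, theta = [0.0, 0.0], 0.0
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    for _ in range(10):                       # a few passes over the training set
        for inputs, target in data:
            weights, theta = perceptron_step(weights, theta, inputs, target)
    print(weights, theta)                     # a weight/threshold pair realizing AND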
1.3 Hidden Layer

Back-propagation is simply a way to determine the error values in the hidden layers; this needs to be done in order to update the weights of those layers. The classic example of a problem that requires a hidden layer is the XOR function, which is not linearly separable :

      X1   X2  |  Y
      1    1   |  0
      1    0   |  1
      0    1   |  1
      0    0   |  0

[Fig. : a network with inputs X1 and X2 feeding hidden units A and B, which in turn feed an output unit C producing Y]
2. Back Propagation Network

Learning By Example

Consider the multi-layer feed-forward back-propagation network below. The subscripts I, H, O denote input, hidden and output neurons.

[Fig. : Multi-layer feed-forward back-propagation network — ℓ input neurons, m hidden neurons and n output neurons; weights V between the input and hidden layers and weights W between the hidden and output layers]

The weight of the arc between the i th input neuron and the j th hidden neuron is Vij .
The weight of the arc between the i th hidden neuron and the j th output neuron is Wij .
The table below indicates an 'nset' of input and output data. It shows ℓ inputs and the corresponding n outputs.

Table : 'nset' of input and output data
      No  |  Input : I1  I2  ....  Iℓ  |  Output : O1  O2  ....  On

In this section, the computations in the input, hidden and output layers of such a three-layer network are explained; the step-by-step implementation of the BPN algorithm is then illustrated by solving an example in the next section.
2.1 Computation of Input, Hidden and Output Layers

(Ref. previous section, Fig. Multi-layer feed-forward back-propagation network)

• Input Layer Computation

The input layer neurons simply pass the input on; their transfer function is 1 (unity). Since the output of the input layer equals its input,

      { O }I = { I }I
       ℓx1      ℓx1        (denotes matrix row, column size)

The hidden neurons are connected by synapses to the input neurons.
- Let Vij be the weight of the arc between the i th input neuron and the j th hidden neuron.
- The input to a hidden neuron is the weighted sum of the outputs of the input neurons, i.e. IHp = V1p OI1 + V2p OI2 + . . . + Vℓp OIℓ , which in matrix form is { I }H = [ V ]^T { O }I .
• Hidden Layer Computation

Shown below is the p th neuron of the hidden layer. It has inputs from the outputs of the input layer neurons, weighted by V1p , . . . , Vℓp . If the weighted sum of its inputs IHp is passed through the sigmoidal function, then the output of the p th hidden neuron is given by

      OHp = 1 / ( 1 + e^( -λ ( IHp - θHp ) ) )

where OHp is the output of the p th hidden neuron, IHp is the input to the p th hidden neuron, θHp is its threshold and λ is the sigmoidal gain.
• Output Layer Computation

Shown below is the q th neuron of the output layer. It has inputs from the outputs of the hidden layer neurons OH1 , . . . , OHm , weighted by W1q , . . . , Wmq . If the weighted sum of its inputs IOq is passed through the sigmoidal function, then the output of the q th output neuron is given by

      OOq = 1 / ( 1 + e^( -λ ( IOq - θOq ) ) )

[Fig. : the q th output neuron receiving OH1 . . . OHm through weights W1q . . . Wmq , with threshold θOq and output OOq]

Note : here again the threshold need not be treated as a separate quantity; it can be absorbed as an extra weight from a unit whose output is always 1.
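A small Python sketch of these two layer computations; the sigmoidal gain lam and the threshold theta are assumed to be supplied by the caller, and the function names are illustrative.

    import math

    # Minimal sketch of the hidden / output neuron computation described above:
    # weighted sum of the incoming outputs, then the sigmoidal function.
    def sigmoid(x, lam=1.0):
        return 1.0 / (1.0 + math.exp(-lam * x))

    def neuron_output(inputs, weights, theta, lam=1.0):
        net = sum(w * o for w, o in zip(weights, inputs))   # IHp or IOq
        return sigmoid(net - theta, lam)                    # 1 / (1 + e^(-lam (I - theta)))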
2.2 Calculation of Error

(Refer to the earlier Fig. "Multi-layer feed-forward back-propagation network" and the table indicating an 'nset' of input and output data for the purpose of training.)

Consider any r th output neuron. For the target output value T given in the table of training data, let the calculated output be O.

The error norm in the output of the r th output neuron is

      E1r = (1/2) e2r = (1/2) (T - O)^2

where E1r is 1/2 of the second norm of the error er in the r th neuron for the given training pattern, and e2r is the square of the error, considered to make it independent of sign (+ve or -ve), i.e. only the magnitude matters.

The Euclidean norm of the error E1 for the first training pattern is given by

                  n
      E1 = (1/2)  Σ  ( Tor - Oor )^2
                 r=1

This error function is for one training pattern. If the same measure is accumulated over all the training patterns, we get

                nset
      E (V, W) =  Σ  Ej (V, W, I)
                 j=1
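These two error expressions translate directly into a couple of lines of Python; the list-of-lists layout of the targets and outputs is an assumption made only for illustration.

    # Minimal sketch of the error measures defined above.
    def pattern_error(targets, outputs):
        # E1 = (1/2) * sum over r of (Tr - Or)^2, for one training pattern
        return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))

    def total_error(all_targets, all_outputs):
        # E(V, W) = sum over the 'nset' training patterns of Ej(V, W, I)
        return sum(pattern_error(t, o) for t, o in zip(all_targets, all_outputs))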
3. Back-Propagation Algorithm

The benefits of hidden layer neurons have been explained. The hidden layer allows the network to form its own internal representation of the input-output mapping, and this enables a hierarchical network to learn any mapping and not just the linearly separable ones.
3.1 Algorithm for Training Network

The basic algorithm loop structure and the step-by-step procedure for training a back-propagation network are given below.
• Back-Propagation Algorithm - Step-by-step procedure

■ Step 1 :
  Normalize the I/P and O/P with respect to their maximum values.
  For each training pair, assume that in normalized form there are
  ℓ inputs given by { I }I (ℓ x 1) and n outputs given by { O }O (n x 1).

■ Step 2 :
  Assume the number of neurons in the hidden layer to be m.

■ Step 3 :
  Let [ V ] represent the weights of synapses connecting the input neurons
  to the hidden neurons and [ W ] the weights of synapses connecting the
  hidden neurons to the output neurons. Initialize [ V ] and [ W ] to small
  random values (usually between -1 and +1), and set [ ∆V ]^0 = [ ∆W ]^0 = [ 0 ].

■ Step 4 :
  For the training data, present one set of inputs and outputs. Present the
  pattern to the input layer { I }I ; then, by using the linear activation
  function, the output of the input layer may be evaluated as

      { O }I = { I }I
       ℓx1      ℓx1

■ Step 5 :
  Compute the inputs to the hidden layer by multiplying the corresponding
  weights of synapses as

      { I }H = [ V ]^T { O }I
       mx1      mxℓ     ℓx1

■ Step 6 :
  Let the hidden layer units evaluate the output using the sigmoidal
  function as

      { O }H  with elements  OHi = 1 / ( 1 + e^( - IHi ) )
       mx1
fo
.in
rs
de
SC - NN - BPN – Algorithm
ea
■ Step 7 :
yr
.m
w
w
Compute the inputs to the output layers by multiplying corresponding
,w
ty
weights of synapses as
or
ab
kr
ha
T
C
{ I }O = [ W] { O }H
C
R
■ Step 8 :
Let the output layer units, evaluate the output using sigmoidal
function as
–
–
1
{ O }O = - (IOj)
(1+e )
–
–
ea
■ Step 9 :
yr
.m
w
w
Calculate the error using the difference between the network output
,w
ty
th
and the desired output as for the j training set as
or
ab
kr
ha
C
√∑ (Tj - Ooj )2
C
R
EP = n
■ Step 10 :
Find a term { d } as
–
–
–
–
nx1
■ Step 11 :
  Find the [ Y ] matrix as

      [ Y ] = { O }H 〈 d 〉
       mxn     mx1    1xn

■ Step 12 :
  Find

      [ ∆W ]^(t+1) = α [ ∆W ]^t + η [ Y ]
        mxn              mxn         mxn

  where α is the momentum factor and η is the learning rate.

■ Step 13 :
  Find the error reaching the hidden layer and the term { d* } as

      { e } = [ W ] { d }
       mx1     mxn   nx1

      { d* }  with elements  d*i = ei ( OHi ) ( 1 - OHi )
       mx1

  Find the [ X ] matrix as

      [ X ] = { O }I 〈 d* 〉 = { I }I 〈 d* 〉
       ℓxm     ℓx1    1xm      ℓx1    1xm
■ Step 14 :
  Find

      [ ∆V ]^(t+1) = α [ ∆V ]^t + η [ X ]
        ℓxm              ℓxm         ℓxm

■ Step 15 :
  Update the weights as

      [ V ]^(t+1) = [ V ]^t + [ ∆V ]^(t+1)
      [ W ]^(t+1) = [ W ]^t + [ ∆W ]^(t+1)

■ Step 16 :
  Find the error rate as

      error rate = Σ EP / nset

■ Step 17 :
  Repeat steps 4 to 16 until the convergence in the error rate is less
  than the tolerance value.

■ End of Algorithm
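As a compact illustration of steps 4 to 15, the Python sketch below performs one training iteration for a single input-output pair using numpy. The layer sizes, the learning rate eta, the momentum alpha and the initial weight matrices are assumptions supplied by the caller, and the sigmoid is used without the threshold term, exactly as in steps 6 and 8.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_step(I, T, V, W, dV, dW, eta=0.6, alpha=0.0):
        """One pass of steps 4 - 15 for a single training pair (I, T).

        I : input vector (l x 1)            T : target vector (n x 1)
        V : input-hidden weights (l x m)    W : hidden-output weights (m x n)
        dV, dW : previous weight changes, used by the momentum term alpha.
        """
        O_I = I                                # step 4  : linear input layer, { O }I = { I }I
        I_H = V.T @ O_I                        # step 5  : { I }H = [ V ]^T { O }I
        O_H = sigmoid(I_H)                     # step 6  : sigmoidal hidden outputs
        I_O = W.T @ O_H                        # step 7  : { I }O = [ W ]^T { O }H
        O_O = sigmoid(I_O)                     # step 8  : sigmoidal network outputs
        E_p = np.sqrt(np.sum((T - O_O) ** 2)) / T.size   # step 9 : error for this pair
        d = (T - O_O) * O_O * (1 - O_O)        # step 10 : { d }
        Y = O_H @ d.T                          # step 11 : [ Y ] = { O }H < d >
        dW = alpha * dW + eta * Y              # step 12 : [ ∆W ]
        e = W @ d                              # step 13 : error reaching the hidden layer
        d_star = e * O_H * (1 - O_H)           #           { d* }
        X = O_I @ d_star.T                     #           [ X ] = { O }I < d* >
        dV = alpha * dV + eta * X              # step 14 : [ ∆V ]
        V, W = V + dV, W + dW                  # step 15 : update the weights
        return V, W, dV, dW, E_p

Iterating this step over all the training pairs and accumulating E_p gives the error rate of step 16; training stops when it falls below the tolerance (step 17).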
3.2 Example : Training Back-Prop Network

• Problem :

Consider a network with two input neurons, two hidden neurons and one output neuron, trained on the single input-output pair given below.

In this problem,
- there are two inputs and one output.
- the values lie between -1 and +1, i.e., there is no need to normalize the values.
- the training inputs are I1 = 0.4 and I2 = -0.7, and the target output is TO = 0.1.
- the initial weights are

      [ V ]^0 =   0.1    0.4          [ W ]^0 =   0.2
                 -0.2    0.2                     -0.5

(The calculations below use a learning rate η = 0.6 and initial [ ∆V ]^0 = [ ∆W ]^0 = [ 0 ].)
■ Step 1 : Input the first training set data (ref eq. of step 1)

      { O }I = { I }I =   0.4
       ℓx1      ℓx1      -0.7
                          2x1
■ Step 2 : Initialize the weights (ref eq. of step 3)

      [ V ]^0 =   0.1    0.4          [ W ]^0 =   0.2
                 -0.2    0.2                     -0.5

■ Step 3 : Compute the inputs to the hidden layer (ref eq. of step 5)

      { I }H = [ V ]^T { O }I =   0.1  -0.2     0.4    =   0.18
                                  0.4   0.2    -0.7        0.02
■ Step 4 : (ref eq. of step 6)

      { O }H =   1 / ( 1 + e^( -0.18 ) )    =   0.5448
                 1 / ( 1 + e^( -0.02 ) )        0.505
■ Step 5 : (ref eq. of step 7)

      { I }O = [ W ]^T { O }H = ( 0.2   -0.5 )   0.5448   =  -0.14354
                                                 0.505

  and, using the eq. of step 8, the network output is

      { O }O = 1 / ( 1 + e^( 0.14354 ) ) = 0.4642
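The forward-pass figures above can be reproduced with a few lines of numpy; the variable names are illustrative, while the numbers are exactly those given in the problem statement.

    import numpy as np

    # Reproducing the forward pass of the worked example (steps 1 - 5 above).
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    O_I = np.array([[0.4], [-0.7]])              # step 1 : { O }I = { I }I
    V0  = np.array([[0.1, 0.4], [-0.2, 0.2]])    # initial [ V ]
    W0  = np.array([[0.2], [-0.5]])              # initial [ W ]

    I_H = V0.T @ O_I                             # [[0.18], [0.02]]
    O_H = sigmoid(I_H)                           # [[0.5448], [0.5050]]
    I_O = W0.T @ O_H                             # [[-0.14354]]
    O_O = sigmoid(I_O)                           # [[0.4642]]
    print(I_H.ravel(), O_H.ravel(), I_O.ravel(), O_O.ravel())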
■ Step 8 : (ref eq. of step 10)

      d = ( TO - OO1 ) ( OO1 ) ( 1 - OO1 )
        = ( 0.1 - 0.4642 ) ( 0.4642 ) ( 1 - 0.4642 ) = -0.09058

  (ref eq. of step 11), from the values at step 4 and the term d above :

      [ Y ] = { O }H 〈 d 〉 =   0.5448  ( -0.09058 )  =   -0.0493
                                0.505                     -0.0457

  (ref eq. of step 12), with η = 0.6 and [ ∆W ]^0 = [ 0 ] :

      [ ∆W ]^1 = α [ ∆W ]^0 + η [ Y ] =   -0.02958
                                          -0.02742

  (ref eq. of step 13), from the values at step 2 and the term d above :

      { e } = [ W ] { d } =    0.2  ( -0.09058 )  =   -0.018116
                              -0.5                     0.04529
■ Step 11 : (ref eq. of step 13)

      { d* } = ei ( OHi ) ( 1 - OHi ) =   -0.00449
                                           0.01132

      [ X ] = { O }I 〈 d* 〉 =    0.4   ( -0.00449    0.01132 )
                                 -0.7

            =   -0.001796    0.004528
                 0.003143   -0.007924

  (ref eq. of step 14), with η = 0.6 and [ ∆V ]^0 = [ 0 ] :

      [ ∆V ]^1 = α [ ∆V ]^0 + η [ X ] =   -0.001077    0.002716
                                           0.001885   -0.004754
■ Step 14 : (ref eq. of step 15)

      [ V ]^1 = [ V ]^0 + [ ∆V ]^1

              =   0.1    0.4    +   -0.001077    0.002716
                 -0.2    0.2         0.001885   -0.004754

              =   0.0989    0.4027
                 -0.1981    0.1952
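The backward-pass quantities and the updated weights above can be checked with the self-contained numpy sketch below (learning rate η = 0.6 and zero momentum, as assumed in this example).

    import numpy as np

    # Checking the backward pass and the weight update of the worked example
    # (learning rate eta = 0.6, zero initial momentum terms).
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    eta = 0.6

    O_I = np.array([[0.4], [-0.7]])              # training inputs
    T   = np.array([[0.1]])                      # target output
    V0  = np.array([[0.1, 0.4], [-0.2, 0.2]])    # initial [ V ]
    W0  = np.array([[0.2], [-0.5]])              # initial [ W ]

    O_H = sigmoid(V0.T @ O_I)                    # [[0.5448], [0.5050]]
    O_O = sigmoid(W0.T @ O_H)                    # [[0.4642]]

    d      = (T - O_O) * O_O * (1 - O_O)         # [[-0.09058]]
    dW     = eta * (O_H @ d.T)                   # [[-0.02958], [-0.02742]]
    d_star = (W0 @ d) * O_H * (1 - O_H)          # [[-0.00449], [0.01132]]
    dV     = eta * (O_I @ d_star.T)              # [[-0.001077, 0.002716], [0.001885, -0.004754]]

    print(V0 + dV)                               # [ V ]^1 ~ [[0.0989, 0.4027], [-0.1981, 0.1952]]
    print(W0 + dW)                               # [ W ]^1 ~ [[0.1704], [-0.5274]]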
■ Step 15 : (ref eq. of step 15)

      [ W ]^1 = [ W ]^0 + [ ∆W ]^1 =    0.2   +   -0.02958   =    0.1704
                                       -0.5       -0.02742       -0.5274

■ Step 16 :
  The error rate for the iteration is evaluated as per the eq. of step 16.

■ Step 17 :
  Iterations are carried out till we get the error less than the tolerance.
4. References : Textbooks

1. "Neural Networks, Fuzzy Logic, and Genetic Algorithms - Synthesis and Applications", by S. Rajasekaran and G. A. Vijayalakshmi Pai, Prentice Hall of India.

2. "Soft Computing and Intelligent Systems Design - Theory, Tools and Applications", by Fakhreddine O. Karray and Clarence de Silva, Pearson Education.