
48 Artificial Neural Network: An Introduction

Set up a network with bipolar sigmoidal units (λ = 1) to achieve the following two-to-one mappings:

• y = 6 sin(πx1) + cos(πx2)
• y = sin(πx1) cos(0.2πx2)

Set up two sets of data, each consisting of 10 input-output pairs, one for training and the other for testing. The input-output data are obtained by varying the input variables (x1, x2) randomly within [-1, +1]. Also, the output data are normalized within [-1, 1]. Apply training to find proper weights in the network.

3 Supervised Learning Network
Learning Objectives

• The basic networks in supervised learning.
• How the perceptron learning rule is better than the Hebb rule.
• Original perceptron layer description.
• Delta rule with single output unit.
• Architecture, flowchart, training algorithm and testing algorithm for perceptron, Adaline, Madaline, back-propagation and radial basis function networks.
• The various learning factors used in BPN.
• An overview of Time Delay, Functional Link, Wavelet and Tree Neural Networks.
• Difference between back-propagation and RBF networks.

3.1 Introduction

This chapter covers major topics involving supervised learning networks and their associated single-layer and multilayer feed-forward networks. The following topics are discussed in detail: the perceptron learning rule for simple perceptrons, the delta rule (Widrow-Hoff rule) for Adaline and single-layer feed-forward networks with continuous activation functions, and the back-propagation algorithm for multilayer feed-forward networks with continuous activation functions. In short, all the feed-forward networks have been explored.

3.2 Perceptron Networks

3.2.1 Theory

Perceptron networks come under single-layer feed-forward networks and are also called simple perceptrons. As described in Table 2-2 (Evolution of Neural Networks) in Chapter 2, various types of perceptrons were designed by Rosenblatt (1962) and Minsky-Papert (1969, 1988). However, a simple perceptron network was discovered by Block in 1962.

The key points to be noted in a perceptron network are:

1. The perceptron network consists of three units, namely, the sensory unit (input unit), the associator unit (hidden unit) and the response unit (output unit).
~
2. The sensory units are connected to associator units with fixed weights having values 1, 0 or -1, which are assigned at random.

3. The binary activation function is used in the sensory unit and the associator unit.

4. The response unit has an activation of 1, 0 or -1. The binary step with fixed threshold θ is used as activation for the associator. The output signals that are sent from the associator unit to the response unit are only binary.

5. The output of the perceptron network is given by

   y = f(yin)

   where f(yin) is the activation function, defined as

   f(yin) = 1 if yin > θ;  0 if -θ ≤ yin ≤ θ;  -1 if yin < -θ

6. The perceptron learning rule is used in the weight updation between the associator unit and the response unit. For each training input, the net will calculate the response and determine whether or not an error has occurred.

7. The error calculation is based on the comparison of the values of the targets with those of the calculated outputs.

8. The weights on the connections from the units that send the nonzero signal will get adjusted suitably.

9. The weights will be adjusted on the basis of the learning rule if an error has occurred for a particular training pattern, i.e.,

   wi(new) = wi(old) + α t xi
   b(new) = b(old) + α t

If no error occurs, there is no weight updation and hence the training process may be stopped. In the above equations, the target value "t" is +1 or -1 and α is the learning rate. In general, these learning rules begin with an initial guess at the weight values; successive adjustments are then made on the basis of the evaluation of an objective function. Eventually, the learning rules reach a near-optimal or optimal solution in a finite number of steps.

Figure 3-1 Original perceptron network (sensory unit: a sensor grid representing any pattern, with fixed weights of value 1, 0, -1 assigned at random; associator unit; response unit with desired output 1, 0 or -1).

A perceptron network with its three units is shown in Figure 3-1. As shown in Figure 3-1, a sensory unit can be a two-dimensional matrix of 400 photodetectors upon which a lighted picture with a geometric black-and-white pattern impinges. These detectors provide a binary (0/1) electrical signal if the input signal is found to exceed a certain value of threshold. Also, these detectors are connected randomly with the associator unit. The associator unit is found to consist of a set of subcircuits called feature predicates. The feature predicates are hard-wired to detect the specific feature of a pattern and are equivalent to the feature detectors. For a particular feature, each predicate is examined with a few or all of the responses of the sensory unit. It can be found that the results from the predicate units are also binary (0/1). The last unit, i.e. the response unit, contains the pattern-recognizers or perceptrons. The weights present in the input layers are all fixed, while the weights on the response unit are trainable.

3.2.2 Perceptron Learning Rule

In case of the perceptron learning rule, the learning signal is the difference between the desired and actual response of a neuron. The perceptron learning rule is explained as follows.

Consider a finite number of input training vectors, with their associated target (desired) values x(n) and t(n), where "n" ranges from 1 to N. The target is either +1 or -1. The output "y" is obtained on the basis of the net input calculated and the activation function applied over the net input:

   y = f(yin) = 1 if yin > θ;  0 if -θ ≤ yin ≤ θ;  -1 if yin < -θ

The weight updation in case of perceptron learning is as shown. If y ≠ t, then

   w(new) = w(old) + α t x   (α - learning rate)

else, we have

   w(new) = w(old)
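As a concrete illustration, the update rule and its error-driven stopping test can be sketched in Python. This is an illustrative sketch, not from the text: the threshold value θ = 0.2, the bipolar AND data and all function names are assumptions.

```python
def f(y_in, theta=0.2):
    """Perceptron activation: 1, 0 or -1 around the fixed threshold theta."""
    return 1 if y_in > theta else (-1 if y_in < -theta else 0)

def train_perceptron(samples, alpha=1.0, theta=0.2, max_epochs=100):
    """Train a single-output perceptron; stop when an epoch makes no weight change."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0                       # weights and bias start at zero
    for _ in range(max_epochs):
        changed = False
        for x, t in samples:
            y = f(b + sum(wi * xi for wi, xi in zip(w, x)), theta)
            if y != t:                          # update only when an error occurs
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        if not changed:
            break
    return w, b

# bipolar AND: target is +1 only when both inputs are +1
data = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, b = train_perceptron(data)
outputs = [f(b + sum(wi * xi for wi, xi in zip(w, x))) for x, _ in data]
```

For the bipolar AND data shown, the loop settles after a few epochs at w = [1, 1], b = -1, and training stops as soon as an entire epoch produces no weight change.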
The perceptron rule convergence theorem states: "If there is a weight vector W* such that f(x(n) · W*) = t(n) for all n, then for any starting vector w, the perceptron learning rule will converge to a weight vector that gives the correct response for all training patterns, and this learning takes place within a finite number of steps provided that the solution exists."

3.2.3 Architecture

In the original perceptron network, the output obtained from the associator unit is a binary vector, and hence that output can be taken as input signal to the response unit and classification can be performed. Here only the weights between the associator unit and the output unit can be adjusted, and the weights between the sensory and associator units are fixed. As a result, the discussion of the network is limited to a single portion. Thus, the associator unit behaves like the input unit. A simple perceptron network architecture is shown in Figure 3-2.

Figure 3-2 Single classification perceptron network.

In Figure 3-2, there are n input neurons, 1 output neuron and a bias. The input-layer and output-layer neurons are connected through a directed communication link, which is associated with weights. The goal of the perceptron net is to classify the input pattern as a member or not a member to a particular class.

3.2.4 Flowchart for Training Process

The flowchart for the perceptron network training is shown in Figure 3-3. The network has to be suitably trained to obtain the response. The flowchart depicted here presents the flow of the training process. As depicted in the flowchart, first the basic initialization required for the training process is performed. Then the training loop continues, in which each training input pair is presented to the network. The training (weight updation) is done on the basis of the comparison between the calculated and desired output: if y ≠ t, then wi(new) = wi(old) + α t xi and b(new) = b(old) + α t; else the weights and bias are left unchanged. The loop is terminated if there is no change in weight.

Figure 3-3 Flowchart for perceptron network with single output.

3.2.5 Perceptron Training Algorithm for Single Output Classes

The perceptron algorithm can be used for either binary or bipolar input vectors, having bipolar targets, a fixed threshold and variable bias. The algorithm discussed in this section is not particularly sensitive to the initial values of the weights or the value of the learning rate. In the algorithm discussed below, initially the inputs are assigned. Then the net input is calculated. The output of the network is obtained by applying the activation function over the calculated net input. On performing comparison over the calculated and
the desired output, the weight updation process is carried out. The entire network is trained based on the mentioned stopping criterion. The algorithm of a perceptron network is as follows:

Step 0: Initialize the weights and the bias (for easy calculation they can be set to zero). Also initialize the learning rate α (0 < α ≤ 1). For simplicity α is set to 1.
Step 1: Perform Steps 2-6 until the final stopping condition is false.
Step 2: Perform Steps 3-5 for each training pair indicated by s:t.
Step 3: The input layer containing input units is applied with identity activation functions:

   xi = si

Step 4: Calculate the output of the network. To do so, first obtain the net input:

   yin = b + Σ(i=1 to n) xi wi

where "n" is the number of input neurons in the input layer. Then apply activations over the net input calculated to obtain the output:

   y = f(yin) = 1 if yin > θ;  0 if -θ ≤ yin ≤ θ;  -1 if yin < -θ

Step 5: Weight and bias adjustment: Compare the value of the actual (calculated) output and the desired (target) output.

   If y ≠ t, then
      wi(new) = wi(old) + α t xi
      b(new) = b(old) + α t
   else, we have
      wi(new) = wi(old)
      b(new) = b(old)

Step 6: Train the network until there is no weight change. This is the stopping condition for the network. If this condition is not met, then start again from Step 2.

The algorithm discussed above is not sensitive to the initial values of the weights or the value of the learning rate.

3.2.6 Perceptron Training Algorithm for Multiple Output Classes

For multiple output classes, the perceptron training algorithm is as follows:

Step 0: Initialize the weights, biases and learning rate suitably.
Step 1: Check for stopping condition; if it is false, perform Steps 2-6.
Step 2: Perform Steps 3-5 for each bipolar or binary training vector pair s:t.
Step 3: Set activation (identity) of each input unit i = 1 to n:

   xi = si

Step 4: First, the net input is calculated as

   yinj = bj + Σ(i=1 to n) xi wij

Then activations are applied over the net input to calculate the output response:

   yj = f(yinj) = 1 if yinj > θ;  0 if -θ ≤ yinj ≤ θ;  -1 if yinj < -θ

Step 5: Make adjustment in weights and bias for j = 1 to m and i = 1 to n.

   If tj ≠ yj, then
      wij(new) = wij(old) + α tj xi
      bj(new) = bj(old) + α tj
   else, we have
      wij(new) = wij(old)
      bj(new) = bj(old)

Step 6: Test for the stopping condition, i.e., if there is no change in weights then stop the training process, else start again from Step 2.

It can be noticed that after training, the net classifies each of the training vectors. The above algorithm is suited for the architecture shown in Figure 3-4.

3.2.7 Perceptron Network Testing Algorithm

It is best to test the network performance once the training process is complete. For efficient performance of the network, it should be trained with more data. The testing algorithm (application procedure) is as follows:

Step 0: The initial weights to be used here are taken from the training algorithm (the final weights obtained during training).
Step 1: For each input vector X to be classified, perform Steps 2-3.
Step 2: Set activations of the input unit.
Step 3: Obtain the response of the output unit:

   yin = Σ(i=1 to n) xi wi

   y = f(yin) = 1 if yin > θ;  0 if -θ ≤ yin ≤ θ;  -1 if yin < -θ

Thus, the testing algorithm tests the performance of the network.

Figure 3-4 Network architecture for perceptron network for several output classes.

The condition for separating the region of positive response from the region of zero response is

   w1 x1 + w2 x2 + b > θ

The condition for separating the region of zero response from the region of negative response is

   w1 x1 + w2 x2 + b < -θ

The conditions above are stated for a single-layer perceptron network with two input neurons, one output neuron and one bias.

3.3 Adaptive Linear Neuron (Adaline)

3.3.1 Theory

The units with linear activation function are called linear units. A network with a single linear unit is called an Adaline (adaptive linear neuron). That is, in an Adaline, the input-output relationship is linear. Adaline uses bipolar activation for its input signals and its target output. The weights between the input and the output are adjustable. The bias in Adaline acts like an adjustable weight, whose connection is from a unit with activations being always 1. Adaline is a net which has only one output unit. The Adaline network may be trained using the delta rule. The delta rule may also be called the least mean square (LMS) rule or Widrow-Hoff rule. This learning rule is found to minimize the mean-squared error between the activation and the target value.

3.3.2 Delta Rule for Single Output Unit

The Widrow-Hoff rule is very similar to the perceptron learning rule. However, their origins are different. The perceptron learning rule originates from the Hebbian assumption while the delta rule is derived from the gradient-descent method (it can be generalized to more than one layer). Also, the perceptron learning rule stops after a finite number of learning steps, but the gradient-descent approach continues forever, converging only asymptotically to the solution. The delta rule updates the weights between the connections so as to minimize the difference between the net input to the output unit and the target value. The major aim is to minimize the error over all training patterns. This is done by reducing the error for each pattern, one at a time.

The delta rule for adjusting the weight wi (i = 1 to n) is

   Δwi = α(t - yin)xi

where Δwi is the weight change; α the learning rate; x the vector of activation of the input units; yin the net input to the output unit, i.e., yin = Σ(i=1 to n) xi wi; and t the target output. The delta rule in case of several output units, for adjusting the weight from the ith input unit to the jth output unit (for each pattern), is

   Δwij = α(tj - yinj)xi

3.3.3 Architecture

As already stated, Adaline is a single-unit neuron, which receives input from several units and also from one unit called bias. An Adaline model is shown in Figure 3-5. The basic Adaline model consists of trainable weights. Inputs are either of the two values (+1 or -1) and the weights have signs (positive or negative). Initially, random weights are assigned. The net input calculated is applied to a quantizer transfer function (possibly activation function) that restores the output to +1 or -1. The Adaline model compares the actual output with the target output and on the basis of the training algorithm, the weights are adjusted.

3.3.4 Flowchart for Training Process

The flowchart for the training process is shown in Figure 3-6. This gives a pictorial representation of the network training. The conditions necessary for weight adjustments have to be checked carefully. The weights and other required parameters are initialized. Then the net input is calculated, the output is obtained and compared with the desired output for calculation of error. On the basis of the error factor, the weights are adjusted.
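The delta rule of Section 3.3.2 can be sketched as a single-pattern update in Python. This is a minimal illustration, not from the text; the function name, learning rate and sample values are assumptions.

```python
def delta_rule_update(w, b, x, t, alpha=0.1):
    """One delta-rule (LMS / Widrow-Hoff) step: the weights move toward the
    target using the raw net input y_in, not a thresholded output."""
    y_in = b + sum(wi * xi for wi, xi in zip(w, x))
    error = t - y_in
    w = [wi + alpha * error * xi for wi, xi in zip(w, x)]
    b = b + alpha * error
    return w, b, error

# a single update moves the net input toward the target t = 1
w, b, e = delta_rule_update([0.0, 0.0], 0.0, [1.0, -1.0], t=1.0)
```

Unlike the perceptron rule, the update is proportional to the remaining error (t - yin), so weight changes shrink as the net input approaches the target.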
Figure 3-5 Adaline model (inputs x1, ..., xn with weights w1, ..., wn and bias b feed the net input yin = Σ xi wi; an adaptive algorithm driven by the output error e = t - yin adjusts the weights; the learning supervisor supplies the target t).

3.3.5 Training Algorithm

The Adaline network training algorithm is as follows:

Step 0: Weights and bias are set to some random values but not zero. Set the learning rate parameter α.
Step 1: Perform Steps 2-6 when stopping condition is false.
Step 2: Perform Steps 3-5 for each bipolar training pair s:t.
Step 3: Set activations for input units i = 1 to n:

   xi = si

Step 4: Calculate the net input to the output unit:

   yin = b + Σ(i=1 to n) xi wi

Step 5: Update the weights and bias for i = 1 to n:

   wi(new) = wi(old) + α(t - yin)xi
   b(new) = b(old) + α(t - yin)

Step 6: If the highest weight change that occurred during training is smaller than a specified tolerance then stop the training process, else continue. This is the test for the stopping condition of a network.

The range of the learning rate can be between 0.1 and 1.0.

Figure 3-6 Flowchart for Adaline training process (initialize weights, bias and learning rate α; for each training pair s:t activate the input units xi = si, compute yin = b + Σ xi wi, and update wi(new) = wi(old) + α(t - yin)xi, b(new) = b(old) + α(t - yin); stop when the largest weight change Ei falls below the specified tolerance Es).
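Steps 0-6 above can be sketched as a small Python routine. This is an illustrative sketch only: the tolerance, learning rate, epoch cap, random seed and data are assumptions, not values from the text.

```python
import random

def train_adaline(samples, alpha=0.1, tol=1e-4, max_epochs=1000):
    """Train a single Adaline unit with the delta rule.

    samples: list of (x, t) pairs with bipolar inputs and targets.
    Stops when the largest weight change in an epoch falls below tol
    (Step 6), or after max_epochs epochs."""
    n = len(samples[0][0])
    rnd = random.Random(0)
    w = [rnd.uniform(-0.5, 0.5) for _ in range(n)]   # Step 0: small nonzero values
    b = rnd.uniform(-0.5, 0.5)
    for _ in range(max_epochs):                      # Step 1
        biggest = 0.0
        for x, t in samples:                         # Steps 2-3
            y_in = b + sum(wi * xi for wi, xi in zip(w, x))   # Step 4
            err = t - y_in
            for i in range(n):                       # Step 5
                dw = alpha * err * x[i]
                w[i] += dw
                biggest = max(biggest, abs(dw))
            b += alpha * err
            biggest = max(biggest, abs(alpha * err))
        if biggest < tol:                            # Step 6
            break
    return w, b

# AND function with bipolar inputs and targets
data = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, b = train_adaline(data)
predict = lambda x: 1 if b + sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
```

The final step function `predict` corresponds to the testing procedure of Section 3.3.6: after training, the learned weights settle near the least-squares solution (about w1 = w2 = 0.5, b = -0.5 for this data), which classifies all four patterns correctly.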

3.3.6 Testing Algorithm

It is essential to perform the testing of a network that has been trained. When training is completed, the Adaline can be used to classify input patterns. A step function is used to test the performance of the network. The testing procedure for the Adaline network is as follows:

Step 0: Initialize the weights. (The weights are obtained from the training algorithm.)
Step 1: Perform Steps 2-4 for each bipolar input vector x.
Step 2: Set the activations of the input units to x.
Step 3: Calculate the net input to the output unit:

   yin = b + Σ xi wi

Step 4: Apply the activation function over the net input calculated:

   y = 1 if yin ≥ 0;  -1 if yin < 0

3.4 Multiple Adaptive Linear Neurons

3.4.1 Theory

The multiple adaptive linear neurons (Madaline) model consists of many Adalines in parallel with a single output unit whose value is based on certain selection rules. It may use the majority vote rule. On using this rule, the output would have as answer either true or false. On the other hand, if the AND rule is used, the output is true if and only if both the inputs are true, and so on. The weights that are connected from the Adaline layer to the Madaline layer are fixed, positive and possess equal values. The weights between the input layer and the Adaline layer are adjusted during the training process. The Adaline and Madaline layer neurons have a bias of excitation "1" connected to them. The training process for a Madaline system is similar to that of an Adaline.

3.4.2 Architecture

A simple Madaline architecture is shown in Figure 3-7, which consists of "n" units of input layer, "m" units of Adaline layer and "1" unit of the Madaline layer. Each neuron in the Adaline and Madaline layers has a bias of excitation 1. The Adaline layer is present between the input layer and the Madaline (output) layer; hence, the Adaline layer can be considered a hidden layer. The use of the hidden layer gives the net a computational capability which is not found in single-layer nets, but this complicates the training process to some extent.

The Adaline and Madaline models can be applied effectively in communication systems for adaptive equalizers and adaptive noise cancellation and other cancellation circuits.

Figure 3-7 Architecture of Madaline layer.

3.4.3 Flowchart of Training Process

The flowchart of the training process of the Madaline network is shown in Figure 3-8. In case of training, the weights between the input layer and the hidden layer are adjusted, and the weights between the hidden layer and the output layer are fixed. The time taken for the training process in the Madaline network is very high compared to that of the Adaline network.

3.4.4 Training Algorithm

In this training algorithm, only the weights between the hidden layer and the input layer are adjusted, and the weights for the output units are fixed. The weights v1, v2, ..., vm and the bias b0 that enter into the output unit Y are determined so that the response of unit Y is 1. Thus, the weights entering the Y unit may be taken as

   v1 = v2 = ··· = vm = 1/2

and the bias can be taken as

   b0 = 1/2

The activation for the Adaline (hidden) and Madaline (output) units is given by

   f(x) = 1 if x ≥ 0;  -1 if x < 0

Step 0: Initialize the weights. The weights entering the output unit are set as above. Set initial small random values for the Adaline weights. Also set the initial learning rate α.
Step 1: When stopping condition is false, perform Steps 2-8.
Step 2: For each bipolar training pair s:t, perform Steps 3-7.
Step 3: Activate input layer units. For i = 1 to n,

   xi = si

Step 4: Calculate the net input to each hidden Adaline unit:

   zinj = bj + Σ(i=1 to n) xi wij,   j = 1 to m
Figure 3-8 Flowchart for training of Madaline. (For each bipolar training pair s:t: activate the input units xi = si; find the net input to each hidden Adaline unit, zinj = bj + Σ xi wij, j = 1 to m, and calculate the outputs zj = f(zinj); calculate the net input to the output unit, yin = b0 + Σ zj vj, and the output y = f(yin). If t = y, no update is made; if t = 1, update the weights on the unit zj whose net input is closest to zero, bj(new) = bj(old) + α(1 - zinj), wij(new) = wij(old) + α(1 - zinj)xi; if t = -1, update the weights on all units zk which have positive net input, bk(new) = bk(old) + α(-1 - zink), wik(new) = wik(old) + α(-1 - zink)xi. Stop when there are no weight changes or a specified number of epochs is reached.)
Step 5: Calculate the output of each hidden unit:

   zj = f(zinj)

Step 6: Find the output of the net:

   yin = b0 + Σ(j=1 to m) zj vj
   y = f(yin)

Step 7: Calculate the error and update the weights.

1. If t = y, no weight updation is required.
2. If t ≠ y and t = +1, update weights on zj, where the net input is closest to 0 (zero):

   bj(new) = bj(old) + α(1 - zinj)
   wij(new) = wij(old) + α(1 - zinj)xi

3. If t ≠ y and t = -1, update weights on units zk whose net input is positive:

   wik(new) = wik(old) + α(-1 - zink)xi
   bk(new) = bk(old) + α(-1 - zink)

Step 8: Test for the stopping condition. (If there is no weight change or the weights reach a satisfactory level, or if a specified maximum number of iterations of weight updation has been performed, then stop; else continue.)

Madalines can be formed with the weights on the output unit set to perform some logic functions. If there are only two hidden units present, or if there are more than two hidden units, then the "majority vote rule" function may be used.

3.5 Back-Propagation Network

3.5.1 Theory

The back-propagation learning algorithm is one of the most important developments in neural networks (Bryson and Ho, 1969; Werbos, 1974; LeCun, 1985; Parker, 1985; Rumelhart, 1986). This network has re-awakened the scientific and engineering community to the modeling and processing of numerous quantitative phenomena using neural networks. This learning algorithm is applied to multilayer feed-forward networks consisting of processing elements with continuous differentiable activation functions. The networks associated with the back-propagation learning algorithm are called back-propagation networks (BPNs). For a given set of training input-output pairs, this algorithm provides a procedure for changing the weights in a BPN to classify the given input patterns correctly. The basic concept for this weight update algorithm is simply the gradient-descent method as used in the case of simple perceptron networks with differentiable units. This is a method where the error is propagated back to the hidden units. The aim of the neural network is to train the net to achieve a balance between the net's ability to respond (memorization) and its ability to give reasonable responses to input that is similar but not identical to the one that is used in training (generalization).

The back-propagation algorithm is different from other networks in respect to the process by which the weights are calculated during the learning period of the network. The general difficulty with multilayer perceptrons is calculating the weights of the hidden layers in an efficient way that would result in a very small or zero output error. When the hidden layers are increased, the network training becomes more complex. To update the weights, the error must be calculated. The error, which is the difference between the actual (calculated) and the desired (target) output, is easily measured at the output layer. It should be noted that at the hidden layers, there is no direct information of the error. Therefore, other techniques should be used to calculate an error at the hidden layer which will cause minimization of the output error, and this is the ultimate goal.

The training of the BPN is done in three stages - the feed-forward of the input training pattern, the calculation and back-propagation of the error, and the updation of weights. The testing of the BPN involves the computation of the feed-forward phase only. There can be more than one hidden layer (more beneficial) but one hidden layer is sufficient. Even though the training is very slow, once the network is trained it can produce its outputs very rapidly.

3.5.2 Architecture

A back-propagation neural network is a multilayer, feed-forward neural network consisting of an input layer, a hidden layer and an output layer. The neurons present in the hidden and output layers have biases, which are the connections from the units whose activation is always 1. The bias terms also act as weights. Figure 3-9 shows the architecture of a BPN, depicting only the direction of information flow for the feed-forward phase. During the back-propagation phase of learning, signals are sent in the reverse direction.

The inputs sent to the BPN and the output obtained from the net could be either binary (0, 1) or bipolar (-1, +1). The activation function could be any function which increases monotonically and is also differentiable.

Figure 3-9 Architecture of a back-propagation network.
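The three stages of BPN training (feed-forward, back-propagation of error, and weight updation) can be sketched for a small n-p-1 network with bipolar sigmoidal units. This is an illustrative sketch only: the network size, learning rate, epoch count, seed and training data are assumptions, and the derivative uses the bipolar-sigmoid form f'(x) = 0.5[1 + f(x)][1 - f(x)] referred to in Section 2.3.3.

```python
import math, random

def bipolar_sigmoid(x):
    """Bipolar sigmoid f(x) = 2/(1 + e^-x) - 1, with range (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def d_bipolar_sigmoid(fx):
    """Derivative expressed through the function value: f'(x) = 0.5(1 + f)(1 - f)."""
    return 0.5 * (1.0 + fx) * (1.0 - fx)

def train_bpn(samples, p=4, alpha=0.25, epochs=5000, seed=1):
    """One-hidden-layer BPN trained with incremental back-propagation."""
    rnd = random.Random(seed)
    n = len(samples[0][0])
    v = [[rnd.uniform(-0.5, 0.5) for _ in range(p)] for _ in range(n + 1)]  # row n holds biases v0j
    w = [rnd.uniform(-0.5, 0.5) for _ in range(p + 1)]                      # last entry is bias w0
    for _ in range(epochs):
        for x, t in samples:
            # Phase I: feed-forward
            z = [bipolar_sigmoid(v[n][j] + sum(x[i] * v[i][j] for i in range(n)))
                 for j in range(p)]
            y = bipolar_sigmoid(w[p] + sum(z[j] * w[j] for j in range(p)))
            # Phase II: back-propagate the error (deltas use the old weights)
            delta_k = (t - y) * d_bipolar_sigmoid(y)
            delta_j = [delta_k * w[j] * d_bipolar_sigmoid(z[j]) for j in range(p)]
            # Phase III: update weights and biases
            for j in range(p):
                w[j] += alpha * delta_k * z[j]
                for i in range(n):
                    v[i][j] += alpha * delta_j[j] * x[i]
                v[n][j] += alpha * delta_j[j]
            w[p] += alpha * delta_k

    def predict(x):
        z = [bipolar_sigmoid(v[n][j] + sum(x[i] * v[i][j] for i in range(n)))
             for j in range(p)]
        return bipolar_sigmoid(w[p] + sum(z[j] * w[j] for j in range(p)))
    return predict

# bipolar AND as a small target function; after training the output sign
# matches the target on every pattern
data = [([1, 1], -1e0 * -1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
predict = train_bpn(data)
```

The same routine can be pointed at the bipolar XOR data, the classic problem a single-layer net cannot learn, though convergence then depends more strongly on the random initialization.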
3.5.3 Flowchart for Training Process

The flowchart for the training process using a BPN is shown in Figure 3-10. The terminologies used in the flowchart and in the training algorithm are as follows:

x = input training vector (x1, ..., xi, ..., xn)
t = target output vector (t1, ..., tk, ..., tm)
α = learning rate parameter
xi = input unit i. (Since the input layer uses the identity activation function, the input and output signals here are the same.)
v0j = bias on the jth hidden unit
w0k = bias on the kth output unit
zj = hidden unit j. The net input to zj is

   zinj = v0j + Σ(i=1 to n) xi vij

and the output is

   zj = f(zinj)

yk = output unit k. The net input to yk is

   yink = w0k + Σ(j=1 to p) zj wjk

and the output is

   yk = f(yink)

δk = error correction weight adjustment for wjk that is due to an error at output unit yk, which is back-propagated to the hidden units that feed into unit yk
δj = error correction weight adjustment for vij that is due to the back-propagation of error to the hidden unit zj

Also, it should be noted that the commonly used activation functions are the binary sigmoidal and bipolar sigmoidal activation functions (discussed in Section 2.3.3). These functions are used in the BPN because of the following characteristics: (i) continuity; (ii) differentiability; (iii) nondecreasing monotony. The range of the binary sigmoid is from 0 to 1, and for the bipolar sigmoid it is from -1 to +1.

Figure 3-10 Flowchart for back-propagation network training: for each training pair, receive input signal xi and transmit it to the hidden units; in each hidden unit calculate zinj = v0j + Σ xi vij and zj = f(zinj), j = 1 to p; then calculate the output signal yink = w0k + Σ zj wjk and yk = f(yink), k = 1 to m.

3.5.4 Training Algorithm

The error back-propagation learning algorithm can be outlined as follows:

Step 0: Initialize weights and learning rate (take some small random values).
Step 1: Perform Steps 2-9 when stopping condition is false.
Step 2: Perform Steps 3-8 for each training pair.
Feed-forward phase (Phase I):

Step 3: Each input unit receives input signal xi and sends it to the hidden units (i = 1 to n).
Step 4: Each hidden unit zj (j = 1 to p) sums its weighted input signals to calculate the net input:

   zinj = v0j + Σ(i=1 to n) xi vij

Calculate the output of the hidden unit by applying its activation function over zinj (binary or bipolar sigmoidal activation function):

   zj = f(zinj)

and send the output signal from the hidden unit to the input of the output layer units.
Step 5: For each output unit yk (k = 1 to m), calculate the net input:

   yink = w0k + Σ(j=1 to p) zj wjk

and apply the activation function to compute the output signal:

   yk = f(yink)

Back-propagation of error (Phase II):

Step 6: Each output unit yk (k = 1 to m) receives a target pattern corresponding to the input training pattern and computes the error correction term:

   δk = (tk - yk) f'(yink)

The derivative f'(yink) can be calculated as in Section 2.3.3. On the basis of the calculated error correction term, update the change in weights and bias:

   Δwjk = α δk zj;   Δw0k = α δk

Also, send δk to the hidden layer backwards.
Step 7: Each hidden unit (zj, j = 1 to p) sums its delta inputs from the output units:

   δinj = Σ(k=1 to m) δk wjk

The term δinj gets multiplied with the derivative of f(zinj) to calculate the error term:

   δj = δinj f'(zinj)

The derivative f'(zinj) can be calculated as discussed in Section 2.3.3, depending on whether the binary or bipolar sigmoidal function is used. On the basis of the calculated δj, update the change in weights and bias:

   Δvij = α δj xi;   Δv0j = α δj

Figure 3-10 (continued) Flowchart for back-propagation network training: compute the error correction factor δk = (tk - yk) f'(yink) between the output and hidden layers; find the weight and bias correction terms Δwjk = α δk zj, Δw0k = α δk; calculate the error term δj = δinj f'(zinj) between the hidden and input layers; compute the change in weights and bias Δvij = α δj xi, Δv0j = α δj; update the weight and bias on the output unit, wjk(new) = wjk(old) + Δwjk, w0k(new) = w0k(old) + Δw0k, and on the hidden unit, vij(new) = vij(old) + Δvij, v0j(new) = v0j(old) + Δv0j; stop when no weight changes occur or a specified number of epochs is reached.
70 Supervised Learning Network 3.5 Back-Propagation Network 71 :IIf
Weight and bias updation (Phase III):

Step 8: Each output unit (yk, k = 1 to m) updates the bias and weights:

    w_jk(new) = w_jk(old) + Δw_jk
    w_0k(new) = w_0k(old) + Δw_0k

Each hidden unit (zj, j = 1 to p) updates its bias and weights:

    v_ij(new) = v_ij(old) + Δv_ij
    v_0j(new) = v_0j(old) + Δv_0j

Step 9: Check for the stopping condition. The stopping condition may be a certain number of epochs reached or the actual output equaling the target output.

The above algorithm uses the incremental approach for updation of weights, i.e., the weights are changed immediately after a training pattern is presented. There is another way of training called batch-mode training, where the weights are changed only after all the training patterns are presented. The effectiveness of the two approaches depends on the problem, but batch-mode training requires additional local storage for each connection to maintain the immediate weight changes. When a BPN is used as a classifier, it is equivalent to the optimal Bayesian discriminant function for asymptotically large sets of statistically independent training patterns.

The problem in this case is whether the back-propagation learning algorithm can always converge and find proper weights for the network even after enough learning. It will converge since it implements a gradient descent on the error surface in the weight space, and this will roll down the error surface to the nearest minimum error and stop. This becomes true only when the relation existing between the input and the output training patterns is deterministic and the error surface is deterministic. This is not the case in the real world because the produced square-error surfaces are always at random. This is the stochastic nature of the back-propagation algorithm, which is purely based on the stochastic gradient-descent method. The BPN is a special case of stochastic approximation.

If the BPN algorithm converges at all, then it may get stuck with local minima and may be unable to find satisfactory solutions. The randomness of the algorithm helps it to get out of local minima. The error functions may have a large number of global minima because of permutations of weights that keep the network input-output function unchanged. This causes the error surfaces to have numerous troughs.

3.5.5 Learning Factors of Back-Propagation Network

The training of a BPN is based on the choice of various parameters. Also, the convergence of the BPN is based on some important learning factors such as the initial weights, the learning rate, the updation rule, the size and nature of the training set, and the architecture (number of layers and number of neurons per layer).

3.5.5.1 Initial Weights

The ultimate solution may be affected by the initial weights of a multilayer feed-forward network. They are initialized at small random values. The choice of initial weights determines how fast the network converges. The initial weights cannot be very high because the sigmoidal activation functions used here may get saturated from the beginning itself and the system may be stuck at a local minimum or at a very flat plateau at the starting point itself. One method of choosing the weights is choosing them in the range

    [−3/√(o_i), 3/√(o_i)]

where o_i is the number of units that feed into unit i. Another scaling method (the Nguyen-Widrow initialization) sets

    v_ij(new) = γ v_ij(old) / ||v_j(old)||

where v_j is the average weight calculated for all values of i, and the scale factor γ = 0.7(P)^(1/n) ("n" is the number of input neurons and "P" is the number of hidden neurons).

3.5.5.2 Learning Rate α

The learning rate (α) affects the convergence of the BPN. A larger value of α may speed up the convergence but might result in overshooting, while a smaller value of α has the vice-versa effect. The range of α from 10^−3 to 10 has been used successfully for several back-propagation algorithmic experiments. Thus, a large learning rate leads to rapid learning but there is oscillation of weights, while a lower learning rate leads to slower learning.

3.5.5.3 Momentum Factor

The gradient descent is very slow if the learning rate α is small and oscillates widely if α is too large. One very efficient and commonly used method that allows a larger learning rate without oscillations is adding a momentum factor to the normal gradient-descent method.

The momentum factor is denoted by η ∈ [0, 1], and a value of 0.9 is often used for it. Also, this approach is more useful when some training data are very different from the majority of the data. A momentum factor can be used with either pattern-by-pattern updating or batch-mode updating. In case of batch mode, it has the effect of complete averaging over the patterns. Even though the averaging is only partial in the pattern-by-pattern mode, it leaves some useful information for weight updation.

The weight updation formulas used here are

    w_jk(t+1) = w_jk(t) + α δ_k z_j + η [w_jk(t) − w_jk(t−1)]

and

    v_ij(t+1) = v_ij(t) + α δ_j x_i + η [v_ij(t) − v_ij(t−1)]

where Δw_jk(t+1) = α δ_k z_j + η [w_jk(t) − w_jk(t−1)] is the complete weight change, and similarly for Δv_ij(t+1). The momentum factor also helps in faster convergence.
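The momentum-augmented update can be sketched as follows (a hedged illustration: `grad_term` stands for the already-computed α δ x term, and the function name is illustrative):

```python
def momentum_update(w, w_prev, grad_term, eta=0.9):
    """Gradient step plus momentum:
    w(t+1) = w(t) + grad_term + eta * (w(t) - w(t-1)).
    Returns the new weight and the value to use as w_prev on the next call."""
    w_new = w + grad_term + eta * (w - w_prev)
    return w_new, w
```

Carrying the pair (w, w_prev) forward between calls realizes the η [w(t) − w(t−1)] term of the formulas above without storing a full gradient history.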

3.5.5.4 Generalization

The best network for generalization is BPN. A network is said to be generalized when it sensibly interpolates with input patterns that are new to the network. When there are many trainable parameters for the given amount of training data, the network learns well but does not generalize well. This is usually called overfitting or overtraining. One solution to this problem is to monitor the error on the test set and terminate the training when the error increases. With a small number of trainable parameters, the network fails to learn the training data and performs very poorly on the test data. For improving the ability of the network to generalize from a training data set to a test data set, it is desirable to make small changes in the input space of a pattern without changing the output components. This is achieved by introducing variations in the input space of training patterns as part of the training set. However, computationally, this method is very expensive. Also, a net with a large number of nodes is capable of memorizing the training set at the cost of generalization. As a result, smaller nets are preferred over larger ones.

3.5.5.5 Number of Training Data

The training data should be sufficient and proper. There exists a rule of thumb which states that the training data should cover the entire expected input space, and while training, training-vector pairs should be selected randomly from the set. Assume the input space to be linearly separable into "L" disjoint regions with their boundaries being part of hyperplanes. Let "T" be the lower bound on the number of training patterns. Then, choosing T such that T/L >> 1 will allow the network to discriminate pattern classes using fine piecewise hyperplane partitioning. Also, in some cases, scaling or normalization has to be done to help learning.

3.5.5.6 Number of Hidden Layer Nodes

If there exists more than one hidden layer in a BPN, the calculations performed for a single layer are repeated for all the layers and are summed up at the end. In all multilayer feed-forward networks, the size of a hidden layer is very important. The number of hidden units required for an application needs to be determined separately. The size of a hidden layer is usually determined experimentally. For a network of a reasonable size, the size of hidden nodes has to be only a relatively small fraction of the input layer. For example, if the network does not converge to a solution, it may need more hidden nodes. On the other hand, if the network converges, the user may try a very few hidden nodes and then settle finally on a size based on overall system performance.

3.5.6 Testing Algorithm of Back-Propagation Network

The testing procedure of the BPN is as follows:

Step 0: Initialize the weights. The weights are taken from the training algorithm.
Step 1: Perform Steps 2-4 for each input vector.
Step 2: Set the activation of input unit x_i (i = 1 to n).
Step 3: Calculate the net input to hidden unit z_j and its output. For j = 1 to p,

    z_inj = v_0j + Σ (i=1 to n) x_i v_ij
    z_j = f(z_inj)

Step 4: Now compute the output of the output layer unit. For k = 1 to m,

    y_ink = w_0k + Σ (j=1 to p) z_j w_jk
    y_k = f(y_ink)

Use sigmoidal activation functions for calculating the output.

3.6 Radial Basis Function Network

3.6.1 Theory

The radial basis function (RBF) network is a classification and functional approximation neural network developed by M.J.D. Powell. The network uses the most common nonlinearities such as sigmoidal and Gaussian kernel functions. The Gaussian functions are also used in regularization networks. The response of such a function is positive for all values of y; the response decreases to 0 as |y| → ∞. The Gaussian function is generally defined as

    f(y) = e^(−y²)

The derivative of this function is given by

    f'(y) = −2y e^(−y²) = −2y f(y)

The graphical representation of this Gaussian function is shown in Figure 3-11.

When the Gaussian potential functions are being used, each node is found to produce an identical output for inputs existing within the fixed radial distance from the center of the kernel; they are found to be radially symmetric, and hence the name radial basis function network. The entire network forms a linear combination of the nonlinear basis functions.

Figure 3-11 Gaussian kernel function.
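The Gaussian kernel and its derivative, exactly as defined above, can be evaluated with a few lines (names are illustrative):

```python
import numpy as np

def gaussian(y):
    # f(y) = exp(-y^2): maximal at y = 0, decaying to 0 as |y| grows
    return np.exp(-y ** 2)

def gaussian_derivative(y):
    # f'(y) = -2 * y * exp(-y^2) = -2 * y * f(y)
    return -2.0 * y * gaussian(y)
```

Plotting `gaussian` over [−2, 2] reproduces the bell shape of Figure 3-11.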



Figure 3-12 Architecture of RBF.

3.6.2 Architecture

The architecture of the radial basis function network (RBFN) is shown in Figure 3-12: an input layer, a hidden (RBF) layer and an output layer. The architecture consists of two layers whose output nodes form a linear combination of the kernel (or basis) functions computed by means of the RBF nodes or hidden layer nodes. The basis function (nonlinearity) in the hidden layer produces a significant nonzero response to the input stimulus it has received only when the input falls within a small localized region of the input space. This network can also be called a localized receptive field network.

3.6.3 Flowchart for Training Process

The flowchart for the training process of the RBF is shown in Figure 3-13. In this case, the centers of the RBF functions have to be chosen (a sufficient number has to be selected to ensure adequate sampling) and, based on all parameters, the output of the network is calculated.

3.6.4 Training Algorithm

The training algorithm describes in detail all the calculations involved in the training process depicted in the flowchart. The training is started in the hidden layer with an unsupervised learning algorithm. The training is continued in the output layer with a supervised learning algorithm. Simultaneously, we can apply a supervised learning algorithm to the hidden and output layers for fine-tuning of the network. The training algorithm is given as follows.

Step 0: Set the weights to small random values.
Step 1: Perform Steps 2-8 when the stopping condition is false (e.g., until the number of epochs is reached or there is no weight change).
Step 2: Perform Steps 3-7 for each input.
Step 3: Each input unit (x_i for all i = 1 to n) receives input signals and transmits them to the next hidden layer unit.

Figure 3-13 Flowchart for the training process of RBF.
Step 4: Calculate the radial basis function.
Step 5: Select the centers for the radial basis function. The centers are selected from the set of input vectors. It should be noted that a sufficient number of centers have to be selected to ensure adequate sampling of the input vector space.
Step 6: Calculate the output from the hidden layer unit:

    v_i(x_i) = exp[ −Σ (j=1 to r) (x_ji − x̂_ji)² / σ_i² ]

where x̂_ji is the center of the RBF unit for the input variables; σ_i the width of the ith RBF unit; x_ji the jth variable of the input pattern.
Step 7: Calculate the output of the neural network:

    y_nm = Σ (i=1 to k) w_im v_i(x_i) + w_0

where k is the number of hidden layer nodes (RBF functions); y_nm the output value of the mth node in the output layer for the nth incoming pattern; w_im the weight between the ith RBF unit and the mth output node; w_0 the biasing term at the nth output node.
Step 8: Calculate the error and test for the stopping condition. The stopping condition may be a number of epochs or, to a certain extent, weight change.

Thus, a network can be trained using the RBFN.

3.7 Time Delay Neural Network

The neural network has to respond to a sequence of patterns. Here the network is required to produce a particular output sequence in response to a particular sequence of inputs. A shift register can be considered as a tapped delay line. Consider a case of a multilayer perceptron where the tapped outputs of the delay line are applied to its inputs. This type of network constitutes a time delay neural network (TDNN). The output consists of a finite temporal dependence on its inputs, given as

    U(t) = F[x(t), x(t−1), ..., x(t−n)]

where F is any nonlinearity function. The multilayer perceptron with delay line is shown in Figure 3-14. When the function U(t) is a weighted sum, the TDNN is equivalent to a finite impulse response (FIR) filter. In TDNN, when the output is being fed back through a unit delay into the input layer, the net computed here is equivalent to an infinite impulse response (IIR) filter. Figure 3-15 shows a TDNN with output feedback.

Figure 3-14 Time delay neural network (FIR filter).

Figure 3-15 TDNN with output feedback (IIR filter).

Thus, a neuron with a tapped delay line is called a TDNN unit, and a network which consists of TDNN units is called a TDNN. A specific application of TDNNs is speech recognition. The TDNN can be trained using the back-propagation learning rule with a momentum factor.

3.8 Functional Link Networks

These networks are specifically designed for handling linearly non-separable problems using an appropriate input representation. Thus, a suitable enhanced representation of the input data has to be found out. This can be achieved by increasing the dimensions of the input space. The expanded input data, rather than the actual input data, is used for training. In this case, higher-order input terms are chosen so that they are linearly independent of the original pattern components. Thus, the input representation has been enhanced and linear separability can be achieved in the extended space. One of the functional link model networks is shown in Figure 3-16. This model is helpful for learning continuous functions. For this model, the higher-order input terms are obtained using orthogonal basis functions such as sin πx, cos πx, sin 2πx, cos 2πx, etc.

The most common example of linear nonseparability is the XOR problem. The functional link networks help in solving this problem. The inputs now are

    x1   x2   x1x2    t
     1    1     1    −1
     1   −1    −1     1
    −1    1    −1     1
    −1   −1     1    −1
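The tapped delay line that feeds a TDNN can be sketched as follows (a hedged illustration; the function name and zero-padding of the initially empty delay line are assumptions, not from the text):

```python
import numpy as np

def tapped_delay_line(x, n):
    """Build TDNN input vectors [x(t), x(t-1), ..., x(t-n)] from a scalar sequence x.
    Entries before the start of the sequence are padded with zeros."""
    frames = []
    for t in range(len(x)):
        taps = [x[t - d] if t - d >= 0 else 0.0 for d in range(n + 1)]
        frames.append(taps)
    return np.array(frames)
```

Each row of the returned array is one input vector presented to the multilayer perceptron, so a weighted sum of the taps realizes the FIR-filter case of U(t).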

Figure 3-16 Functional link network model.

Figure 3-17 The XOR problem.

Thus, it can be easily seen that the functional link network in Figure 3-17 is used for solving this problem. The functional link network consists of only one layer; therefore, it can be trained using the delta learning rule instead of the generalized delta learning rule used in BPN. As a result, the learning speed of the functional link network is faster than that of the BPN.

3.9 Tree Neural Networks

The tree neural networks (TNNs) are used for the pattern recognition problem. The main concept of this network is to use a small multilayer neural network at each decision-making node of a binary classification tree for extracting the non-linear features. TNNs completely extract the power of tree classifiers by using appropriate local features at the different levels and nodes of the tree. A binary classification tree is shown in Figure 3-18.

Figure 3-18 Binary classification tree.

The decision nodes are present as circular nodes and the terminal nodes are present as square nodes. The terminal node has a class label denoted by C associated with it. The rule base is formed in the decision node (splitting rule in the form of f(x) < θ). The rule determines whether the pattern moves to the right or to the left. Here, f(x) indicates the associated feature of the pattern and "θ" is the threshold. The pattern will be given the class label of the terminal node on which it has landed. The classification here is based on the fact that the appropriate features can be selected at different nodes and levels in the tree. The output feature y = f(x) obtained by a multilayer network at a particular decision node is used in the following way:

    x directed to left child node tL, if y < 0
    x directed to right child node tR, if y ≥ 0

The algorithm for a TNN consists of two phases:

1. Tree growing phase: In this phase, a large tree is grown by recursively finding the rules for splitting until all the terminal nodes have pure or nearly pure class membership, else it cannot split further.
2. Tree pruning phase: Here a smaller tree is selected from the pruned subtree to avoid the overfitting of data.

The training of TNN involves two nested optimization problems. In the inner optimization problem, the BPN algorithm can be used to train the network for a given pair of classes. On the other hand, in the outer optimization problem, a heuristic search method is used to find a good pair of classes. The TNN, when tested on a character recognition problem, decreases the error rate and the size of the tree relative to that of the standard classification tree design methods. The TNN can be implemented for the waveform recognition problem. It obtains comparable error rates and the training here is faster than the large BPN for the same application. Also, TNN provides a structured approach to neural network classifier design problems.

3.10 Wavelet Neural Networks

The wavelet neural network (WNN) is based on wavelet transform theory. This network helps in approximating arbitrary nonlinear functions. The powerful tool for function approximation is wavelet decomposition.

Let f(x) be a piecewise continuous function. This function can be decomposed into a family of functions, which is obtained by dilating and translating a single wavelet function φ: R^n → R as

    f(x) = Σ (i=1 to N) w_i det[D_i^(1/2)] φ[D_i(x − t_i)]

where D_i = diag(d_i), d_i ∈ R_+^n are the dilation vectors; t_i are the translation vectors; det[·] is the determinant operator. The wavelet function φ selected should satisfy some properties. For selecting φ: R^n → R, the condition may be

    φ(x) = φ1(x1) φ1(x2) ··· φ1(xn)  for x = (x1, x2, ..., xn)
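The separable construction above can be sketched numerically (the scalar wavelet φ1 used here anticipates the definition given with Figure 3-19; function names are illustrative):

```python
import numpy as np

def scalar_wavelet(x):
    # phi_1(x) = -x * exp(-x^2 / 2), the Gaussian-derivative ("scalar") wavelet
    return -x * np.exp(-x ** 2 / 2.0)

def separable_wavelet(x_vec):
    # phi(x) = phi_1(x_1) * phi_1(x_2) * ... * phi_1(x_n)
    return np.prod(scalar_wavelet(np.asarray(x_vec, dtype=float)))
```

The product over coordinates makes the multidimensional wavelet vanish wherever any single coordinate sits at zero, mirroring the one-dimensional shape in every axis.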

Figure 3-19 Wavelet neural network.

where

    φ1(x) = −x exp(−x²/2)

is called a scalar wavelet. The network structure can be formed based on the wavelet decomposition as

    y(x) = Σ (i=1 to N) w_i φ[D_i(x − t_i)] + ȳ

where ȳ helps to deal with nonzero mean functions on finite domains. For proper dilation, a rotation can be made for better network operation:

    y(x) = Σ (i=1 to N) w_i φ[D_i R_i(x − t_i)] + ȳ

where R_i are the rotation matrices. The network which performs according to the above equation is called a wavelet neural network. This is a combination of translation, rotation and dilation; and if a wavelet is lying on the same line, it is called a wavelon, in comparison to the neurons in neural networks. The wavelet neural network is shown in Figure 3-19.

3.11 Summary

In this chapter we have discussed the supervised learning networks. In most of the classification and recognition problems, the widely used networks are the supervised learning networks. The architecture, the learning rule, the flowchart for the training process and the training algorithm are discussed in detail for the perceptron network, Adaline, Madaline, back-propagation network and radial basis function network. The perceptron network can be trained for single output classes as well as multi-output classes. Also, many Adaline networks combine together to form a Madaline network. These networks are trained using the delta learning rule. The back-propagation network is the most commonly used network in real-time applications. The error is back-propagated here and is fine-tuned for achieving better performance. The basic difference between the back-propagation network and the radial basis function network is the activation function used. The radial basis function network mostly uses the Gaussian activation function. Apart from these networks, some special supervised learning networks such as time delay neural networks, functional link networks, tree neural networks and wavelet neural networks have also been discussed.

3.12 Solved Problems

1. Implement AND function using perceptron networks for bipolar inputs and targets.

Solution: Table 1 shows the truth table for the AND function with bipolar inputs and targets:

Table 1
    x1   x2    t
     1    1    1
     1   −1   −1
    −1    1   −1
    −1   −1   −1

The perceptron network, which uses the perceptron learning rule, is used to train the AND function. The network architecture is as shown in Figure 1. The input patterns are presented to the network one by one. When all the four input patterns are presented, one epoch is said to be completed. The initial weights and threshold are set to zero, i.e., w1 = w2 = b = 0 and θ = 0. The learning rate α is set equal to 1.

Figure 1 Perceptron network for AND function.

For the first input pattern, x1 = 1, x2 = 1 and t = 1, with weights and bias w1 = 0, w2 = 0 and b = 0, calculate the net input:

    y_in = b + x1 w1 + x2 w2 = 0 + 1 × 0 + 1 × 0 = 0

The output y is computed by applying the activation function over the net input calculated:

    y = f(y_in) = { 1 if y_in > 0;  0 if y_in = 0;  −1 if y_in < 0 }

Here we have taken θ = 0. Hence, when y_in = 0, y = 0.

Check whether t = y. Here, t = 1 and y = 0, so t ≠ y; hence weight updation takes place:

    w_i(new) = w_i(old) + α t x_i

    w1(new) = w1(old) + α t x1 = 0 + 1 × 1 × 1 = 1
    w2(new) = w2(old) + α t x2 = 0 + 1 × 1 × 1 = 1
    b(new) = b(old) + α t = 0 + 1 × 1 = 1

Here, the changes in weights are

    Δw1 = α t x1;    Δw2 = α t x2;    Δb = α t

The weights w1 = 1, w2 = 1, b = 1 are the final weights after the first input pattern is presented. The same process is repeated for all the input patterns. The process can be stopped when all the targets become equal to the calculated output, or when a separating line is obtained using the final weights for separating the positive responses from the negative responses. Table 2 shows the training of the perceptron network until its target and calculated output converge for all the patterns.
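The pattern-by-pattern training just described can be sketched directly (a minimal illustration of the rule in Problem 1; variable names are ours):

```python
import numpy as np

def perceptron_train_and(alpha=1.0, theta=0.0, max_epochs=10):
    """Train the bipolar AND function with the perceptron rule and the
    three-level activation f: 1 if y_in > theta, 0 in [-theta, theta], -1 below."""
    samples = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, -1)]
    w = np.zeros(2)
    b = 0.0
    for _ in range(max_epochs):
        changed = False
        for x1, x2, t in samples:
            y_in = b + x1 * w[0] + x2 * w[1]
            y = 1 if y_in > theta else (-1 if y_in < -theta else 0)
            if y != t:  # update only when the output disagrees with the target
                w[0] += alpha * t * x1
                w[1] += alpha * t * x2
                b += alpha * t
                changed = True
        if not changed:  # an epoch with no updates means convergence
            break
    return w, b
```

Running it reproduces the converged weights of Table 2: w1 = 1, w2 = 1, b = −1.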

Table 2
    Input         Target   Net input   Calculated   Weight changes    Weights
    x1   x2   1   (t)      (y_in)      output (y)   Δw1   Δw2   Δb    w1   w2   b
                                                                      (0    0   0)
    EPOCH-1
     1    1   1    1         0            0           1     1    1     1    1    1
     1   −1   1   −1         1            1          −1     1   −1     0    2    0
    −1    1   1   −1         2            1           1    −1   −1     1    1   −1
    −1   −1   1   −1        −3           −1           0     0    0     1    1   −1
    EPOCH-2
     1    1   1    1         1            1           0     0    0     1    1   −1
     1   −1   1   −1        −1           −1           0     0    0     1    1   −1
    −1    1   1   −1        −1           −1           0     0    0     1    1   −1
    −1   −1   1   −1        −3           −1           0     0    0     1    1   −1

The final weights and bias after the second epoch are

    w1 = 1, w2 = 1, b = −1

Since the threshold for the problem is zero, the equation of the separating line is

    x2 = −(w1/w2) x1 − (b/w2)

Here

    w1 x1 + w2 x2 + b > θ
    w1 x1 + w2 x2 + b > 0

Thus, using the final weights we obtain

    x2 = −(1/1) x1 − (−1/1)
    x2 = −x1 + 1

It can be easily found that the above straight line separates the positive response region from the negative response region, as shown in Figure 2.

Figure 2 Decision boundary for AND function in perceptron training (θ = 0).

The same methodology can be applied for implementing other logic functions such as OR, AND NOT, NAND, etc. If there exists a threshold value θ ≠ 0, then two separating lines have to be obtained, i.e., one to separate the positive response from zero and the other for separating zero from the negative response.

2. Implement OR function with binary inputs and bipolar targets using the perceptron training algorithm up to 3 epochs.

Solution: The truth table for the OR function with binary inputs and bipolar targets is shown in Table 3.

Table 3
    x1   x2    t
     1    1    1
     1    0    1
     0    1    1
     0    0   −1

The perceptron network, which uses the perceptron learning rule, is used to train the OR function. The network architecture is shown in Figure 3. The initial values of the weights and bias are taken as zero, i.e.,

    w1 = w2 = b = 0

Also the learning rate is 1 and the threshold is 0.2. So, the activation function becomes

    y = f(y_in) = { 1 if y_in > 0.2;  0 if −0.2 ≤ y_in ≤ 0.2;  −1 if y_in < −0.2 }

Figure 3 Perceptron network for OR function.

The network is trained as per the perceptron training algorithm and the steps are as in Problem 1 (given for the first pattern). Table 4 gives the network training for 3 epochs.

Table 4
    Input         Target   Net input   Calculated   Weight changes    Weights
    x1   x2   1   (t)      (y_in)      output (y)   Δw1   Δw2   Δb    w1   w2   b
                                                                      (0    0   0)
    EPOCH-1
     1    1   1    1         0            0           1     1    1     1    1    1
     1    0   1    1         2            1           0     0    0     1    1    1
     0    1   1    1         2            1           0     0    0     1    1    1
     0    0   1   −1         1            1           0     0   −1     1    1    0
    EPOCH-2
     1    1   1    1         2            1           0     0    0     1    1    0
     1    0   1    1         1            1           0     0    0     1    1    0
     0    1   1    1         1            1           0     0    0     1    1    0
     0    0   1   −1         0            0           0     0   −1     1    1   −1
    EPOCH-3
     1    1   1    1         1            1           0     0    0     1    1   −1
     1    0   1    1         0            0           1     0    1     2    1    0
     0    1   1    1         1            1           0     0    0     2    1    0
     0    0   1   −1         0            0           0     0   −1     2    1   −1

The final weights at the end of the third epoch are

    w1 = 2, w2 = 1, b = −1

Further epochs have to be done for the convergence of the network.

3. Find the weights using the perceptron network for the AND NOT function when all the inputs are presented only one time. Use bipolar inputs and targets.

Solution: The truth table for the AND NOT function is shown in Table 5.

Table 5
    x1   x2    t
     1    1   −1
     1   −1    1
    −1    1   −1
    −1   −1   −1
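The epoch-by-epoch tables above (Tables 2 and 4) can be reproduced with one small routine; this sketch runs the perceptron rule for a fixed number of epochs under the same three-level activation (names are illustrative):

```python
import numpy as np

def train_perceptron(samples, theta, epochs, alpha=1.0):
    """Perceptron rule with activation: 1 if y_in > theta,
    0 if -theta <= y_in <= theta, -1 otherwise. Returns (w, b)."""
    w = np.zeros(len(samples[0][0]))
    b = 0.0
    for _ in range(epochs):
        for x, t in samples:
            x = np.array(x, dtype=float)
            y_in = b + x @ w
            y = 1 if y_in > theta else (-1 if y_in < -theta else 0)
            if y != t:
                w += alpha * t * x
                b += alpha * t
    return w, b

# Table 2: bipolar AND, theta = 0, two epochs suffice
and_w, and_b = train_perceptron(
    [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)], theta=0.0, epochs=2)

# Table 4: binary-input OR, theta = 0.2, three epochs as in Problem 2
or_w, or_b = train_perceptron(
    [([1, 1], 1), ([1, 0], 1), ([0, 1], 1), ([0, 0], -1)], theta=0.2, epochs=3)
```

The results match the tables: (1, 1) with b = −1 for AND, and (2, 1) with b = −1 for OR after its third epoch.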
The network architecture of the AND NOT function is shown in Figure 4. Let the initial weights be zero, and α = 1, θ = 0. For the first input sample, x1 = 1, x2 = 1, t = −1, we compute the net input as

    y_in = b + Σ (i=1 to 2) x_i w_i = b + x1 w1 + x2 w2 = 0 + 1 × 0 + 1 × 0 = 0

Figure 4 Network for AND NOT function.

Applying the activation function

    y = f(y_in) = { 1 if y_in > 0;  0 if y_in = 0;  −1 if y_in < 0 }

over the net input, we obtain y = f(0) = 0. Since t ≠ y, the new weights are computed as

    w1(new) = w1(old) + α t x1 = 0 + 1 × (−1) × 1 = −1
    w2(new) = w2(old) + α t x2 = 0 + 1 × (−1) × 1 = −1
    b(new) = b(old) + α t = 0 + 1 × (−1) = −1

The weights after presenting the first sample are

    w = [−1 −1 −1]

For the second input sample, x1 = 1, x2 = −1, t = 1, we calculate the net input as

    y_in = b + x1 w1 + x2 w2 = −1 + 1 × (−1) + (−1) × (−1) = −1 − 1 + 1 = −1

The output y = f(y_in) is obtained by applying the activation function, hence y = −1. Since t ≠ y, the new weights are calculated as

    w1(new) = w1(old) + α t x1 = −1 + 1 × 1 × 1 = 0
    w2(new) = w2(old) + α t x2 = −1 + 1 × 1 × (−1) = −2
    b(new) = b(old) + α t = −1 + 1 × 1 = 0

The weights after presenting the second sample are

    w = [0 −2 0]

For the third input sample, x1 = −1, x2 = 1, t = −1, the net input is calculated as

    y_in = b + x1 w1 + x2 w2 = 0 + (−1) × 0 + 1 × (−2) = 0 + 0 − 2 = −2

The output is obtained as y = f(y_in) = −1. Since t = y, no weight change takes place. Thus, even after presenting the third input sample, the weights are

    w = [0 −2 0]

For the fourth input sample, x1 = −1, x2 = −1, t = −1, the net input is calculated as

    y_in = b + x1 w1 + x2 w2 = 0 + (−1) × 0 + (−1) × (−2) = 0 + 0 + 2 = 2

The output is obtained as y = f(y_in) = 1. Since t ≠ y, the new weights on updating are given as

    w1(new) = w1(old) + α t x1 = 0 + 1 × (−1) × (−1) = 1
    w2(new) = w2(old) + α t x2 = −2 + 1 × (−1) × (−1) = −1
    b(new) = b(old) + α t = 0 + 1 × (−1) = −1

The weights after presenting the fourth input sample are

    w = [1 −1 −1]

One epoch of training for the AND NOT function using the perceptron network is tabulated in Table 6.

Table 6
    Input         Target   Net input   Calculated   Weights
    x1   x2   1   (t)      (y_in)      output (y)   w1   w2   b
                                                    (0    0   0)
     1    1   1   −1         0            0         −1   −1  −1
     1   −1   1    1        −1           −1          0   −2   0
    −1    1   1   −1        −2           −1          0   −2   0
    −1   −1   1   −1         2            1          1   −1  −1

4. Find the weights required to perform the following classification using the perceptron network. The vectors (1, 1, 1, 1) and (−1, 1, −1, −1) belong to the class (so have target value 1); the vectors (1, 1, 1, −1) and (1, −1, −1, 1) do not belong to the class (so have target value −1). Assume the learning rate to be 1 and the initial weights to be 0.

Solution: The truth table for the given vectors is given in Table 7.

Table 7
    Input                   Target (t)
    x1   x2   x3   x4   b
     1    1    1    1   1     1
    −1    1   −1   −1   1     1
     1    1    1   −1   1    −1
     1   −1   −1    1   1    −1

Let w1 = w2 = w3 = w4 = b = 0 and the learning rate α = 1. Since the threshold θ = 0.2, the activation function is

    y = f(y_in) = { 1 if y_in > 0.2;  0 if −0.2 ≤ y_in ≤ 0.2;  −1 if y_in < −0.2 }

The network architecture is shown in Figure 5. The net input is given by

    y_in = b + x1 w1 + x2 w2 + x3 w3 + x4 w4

Figure 5 Network architecture.

The training is performed and the weights are tabulated in Table 8.

Table 8
    Inputs             Target   Net input   Output   Weight changes          Weights
    (x1 x2 x3 x4 b)    (t)      (y_in)      (y)      (Δw1 Δw2 Δw3 Δw4 Δb)    (w1 w2 w3 w4 b)
                                                                            ( 0  0  0  0 0)
    EPOCH-1
    ( 1  1  1  1 1)     1         0           0      ( 1  1  1  1  1)       ( 1  1  1  1 1)
    (−1  1 −1 −1 1)     1        −1          −1      (−1  1 −1 −1  1)       ( 0  2  0  0 2)
    ( 1  1  1 −1 1)    −1         4           1      (−1 −1 −1  1 −1)       (−1  1 −1  1 1)
    ( 1 −1 −1  1 1)    −1         1           1      (−1  1  1 −1 −1)       (−2  2  0  0 0)
    EPOCH-2
    ( 1  1  1  1 1)     1         0           0      ( 1  1  1  1  1)       (−1  3  1  1 1)
    (−1  1 −1 −1 1)     1         3           1      ( 0  0  0  0  0)       (−1  3  1  1 1)
    ( 1  1  1 −1 1)    −1         3           1      (−1 −1 −1  1 −1)       (−2  2  0  2 0)
    ( 1 −1 −1  1 1)    −1        −2          −1      ( 0  0  0  0  0)       (−2  2  0  2 0)
    EPOCH-3
    ( 1  1  1  1 1)     1         2           1      ( 0  0  0  0  0)       (−2  2  0  2 0)
    (−1  1 −1 −1 1)     1         2           1      ( 0  0  0  0  0)       (−2  2  0  2 0)
    ( 1  1  1 −1 1)    −1        −2          −1      ( 0  0  0  0  0)       (−2  2  0  2 0)
    ( 1 −1 −1  1 1)    −1        −2          −1      ( 0  0  0  0  0)       (−2  2  0  2 0)

Thus, in the third epoch, all the calculated outputs become equal to the targets and the network has converged. The network convergence can also be checked by forming separating line equations for separating the positive response region from zero and zero from the negative response region.

5. Classify the two-dimensional input pattern shown in Figure 6 using the perceptron network. The symbol "*" indicates the data representation to be +1 and "•" indicates the data to be −1. The patterns are I-F. For pattern I, the target is +1, and for F, the target is −1.
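Table 8 above can likewise be reproduced programmatically; this sketch runs the same rule over the four training vectors for the three epochs traced there:

```python
import numpy as np

# The four training vectors of Problem 4 with their targets
data = [([1, 1, 1, 1], 1), ([-1, 1, -1, -1], 1),
        ([1, 1, 1, -1], -1), ([1, -1, -1, 1], -1)]
w = np.zeros(4)
b = 0.0
theta = 0.2  # threshold of the three-level activation

for _ in range(3):  # the three epochs of Table 8
    for x, t in data:
        x = np.array(x, dtype=float)
        y_in = b + x @ w
        y = 1 if y_in > theta else (-1 if y_in < -theta else 0)
        if y != t:  # perceptron rule: update only on error
            w += t * x
            b += t
```

After the third epoch the weights settle at (w1, w2, w3, w4, b) = (−2, 2, 0, 2, 0), matching the converged row of Table 8.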
86
= b +x1w1 + XZW2 +X3w3 +X4W4 +xsws
+XGW6 + X7WJ + xawa + X9W9
Supervised learning Network

1
3.12 Solved Problems

w;(new) = w;(old)+ O:IXS = 1 + 1 x -1 x 1 = 0 lnitiaJly all the weights and links are assumed to be
W6(new) == WG(oJd) + 0:0:6 = -1 + 1 X -1 X 1 = -2 small raridom values, say 0.1, and the learning rare is
87
II
also set to 0.1. Also here the least mean square error
=0+1 x0+1 x0+1 x 0+(-1) xO W?{new) = W?(old) + atx'] =I+ 1 x -1 x 1 = 0
+1xO+~Dx0+1x0+1x0+1xO wg(new) = ws(old)+ o:txs = 1 + 1 x -1 x -1 = 2
· miy Qe set. The weights are calculated until the least
m~ square error is obtained.
I
Yin= 0 fU9(new) == fV9(old) + etfX9 = 1 + 1 x -1 x -1 "== 2 The initial weighlS are taken to be WJ = W2 =
b[new) = b(old) +or= I+ 1 x -1 = 0 b = 0.1 and rhe learning rate ct = 0.1. For the first
Therefore, by applying the activation function the
input sample, XJ = 1, X2 = 1, t = 1, we calculate the
output is given by y = ff.J;n) = 0. Now since t '# y, The weighlS afrer presenting rhe second input sam~ net input as
the new weights are computed as pie are ~

Wi(new) = WJ(oJd)+ atx1 =-0+ 1 X 1 X 1 = 1


w = [0 0 0 - 2 0 -2 0 2 2 0]
' 2
Yin= b+ Lx;w; = b+ Lx;w;
w,(new) = w,(old) + 01>2 = 0 + 1 x 1 x 1 = 1
The network architecture is as shown in Figure 7. The i=l i=l
w3(new) = w3(old) + at:q =0+ 1 x 1 x 1= 1 network can be further trained for its convergence. = b+x1w1 +xzwz
Figure 5 Network archirecrure. W.j(new) = W4(o!d) + CUX4:;:: 0 + l X l X -1 = -1
= 0.1 + 1 X 0.1 + 1 X 0.1 = 0.3
w;(new) = w;(old) + atx;_ = 0 + 1 x 1 x l = 1
••• ••• WG(new) = W6(old) + CttxG = 0 + 1 X 1 X -1 = -1 Now compute (t- y;n) = (1- 0.3) = 0.7. Updating
the weights we obrain,
•• W)(new) = W)(old)+ O"'J = 0 + 1 x 1 x 1 = 1
ws(new) = wg(old) + ""' = 0 + 1 x 1 x 1 = 1 w;(new) = w;(old) + a(t- y;n)x;
•••
W<J(new) = rlJ9(old) + O:fX9 = 0 + 1 x l x 1 = 1
'I' 'P where a(t- y;11 )x; is called as weight change fl.w;.
b(new) = b(old) + ot = 0 + 1 x 1 = 1
The new weights are obtained as
Figure 6 I~F data representation.
The weights afrer presenting first input sample are y
Implement OR function with bipolar inputs and targets using Adaline network.

Solution: The truth table for the OR function with bipolar inputs and targets is shown in Table 10.

Table 10
x1   x2    t
 1    1    1
 1   -1    1
-1    1    1
-1   -1   -1

Initially all the weights and the bias are assumed to be small random values, say 0.1, and the learning rate is also set to 0.1. A tolerance on the least mean square error may also be set; the weights are adjusted until the least mean square error is obtained.

The initial weights are taken to be w1 = w2 = b = 0.1 and the learning rate α = 0.1. For the first input sample, x1 = 1, x2 = 1, t = 1, we calculate the net input as

y_in = b + Σ xi wi = b + x1w1 + x2w2 = 0.1 + 1×0.1 + 1×0.1 = 0.3

Now compute (t - y_in) = (1 - 0.3) = 0.7. Updating the weights we obtain

wi(new) = wi(old) + α(t - y_in)xi

where α(t - y_in)xi is called the weight change Δwi. The new weights are obtained as

w1(new) = w1(old) + Δw1 = 0.1 + 0.1×0.7×1 = 0.1 + 0.07 = 0.17
w2(new) = w2(old) + Δw2 = 0.1 + 0.1×0.7×1 = 0.17
b(new) = b(old) + Δb = 0.1 + 0.1×0.7 = 0.17

where

Δw1 = α(t - y_in)x1
Δw2 = α(t - y_in)x2
Δb = α(t - y_in)

Now we calculate the error:

E = (t - y_in)² = (0.7)² = 0.49

The final weights after presenting the first input sample are

w = [0.17 0.17 0.17]

and error E = 0.49.
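The single delta-rule step just worked can be checked with a few lines of Python (a sketch; the variable names are mine, the numbers are from the text):

```python
# One Adaline (delta-rule) update for the OR problem's first sample,
# following the hand computation above (w1 = w2 = b = 0.1, alpha = 0.1).
alpha = 0.1
w1 = w2 = b = 0.1
x1, x2, t = 1, 1, 1

y_in = b + x1 * w1 + x2 * w2        # net input: 0.3
err = t - y_in                      # (t - y_in) = 0.7
w1 += alpha * err * x1              # 0.17
w2 += alpha * err * x2              # 0.17
b += alpha * err                    # 0.17
E = err ** 2                        # squared error for this sample: 0.49
print(w1, w2, b, E)
```

Note that, unlike the perceptron, the update uses the raw net input y_in rather than the thresholded output, so the weights change even when the sign of the output is already correct.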
These calculations are performed for all the input samples and the error is calculated. One epoch is completed when all the input patterns have been presented. Summing up the errors obtained for each input sample during one epoch gives the total mean square error of that epoch. The network training is continued until this error is minimized to a very small value.

Adopting the method above, the network training is done for the OR function using the Adaline network and is tabulated below in Table 11 for α = 0.1.

Table 11 (initial weights w1 = w2 = b = 0.1)

Inputs              Net input            Weight changes                Weights                       Error
x1  x2  1   t       y_in      (t-y_in)   Δw1      Δw2      Δb         w1      w2      b             (t-y_in)²
EPOCH-1
 1   1  1   1       0.3        0.7       0.07     0.07     0.07       0.17    0.17    0.17          0.49
 1  -1  1   1       0.17       0.83      0.083   -0.083    0.083      0.253   0.087   0.253         0.69
-1   1  1   1       0.087      0.913    -0.0913   0.0913   0.0913     0.1617  0.1783  0.3443        0.83
-1  -1  1  -1       0.0043    -1.0043    0.1004   0.1004  -0.1004     0.2621  0.2787  0.2439        1.01
EPOCH-2
 1   1  1   1       0.7847     0.2153    0.0215   0.0215   0.0215     0.2837  0.3003  0.2654        0.046
 1  -1  1   1       0.2488     0.7512    0.0751  -0.0751   0.0751     0.3588  0.2251  0.3405        0.564
-1   1  1   1       0.2069     0.7931   -0.0793   0.0793   0.0793     0.2795  0.3044  0.4198        0.629
-1  -1  1  -1      -0.1641    -0.8359    0.0836   0.0836  -0.0836     0.3631  0.388   0.336         0.699
EPOCH-3
 1   1  1   1       1.0873    -0.0873   -0.0087  -0.0087  -0.0087     0.3543  0.3793  0.3275        0.0076
 1  -1  1   1       0.3025     0.6975    0.0697  -0.0697   0.0697     0.4241  0.3096  0.3973        0.487
-1   1  1   1       0.2827     0.7173   -0.0717   0.0717   0.0717     0.3523  0.3813  0.469         0.515
-1  -1  1  -1      -0.2647    -0.7353    0.0735   0.0735  -0.0735     0.4259  0.4548  0.3954        0.541
EPOCH-4
 1   1  1   1       1.2761    -0.2761   -0.0276  -0.0276  -0.0276     0.3983  0.4272  0.3678        0.076
 1  -1  1   1       0.3389     0.6611    0.0661  -0.0661   0.0661     0.4644  0.3611  0.4339        0.437
-1   1  1   1       0.3307     0.6693   -0.0669   0.0669   0.0669     0.3974  0.428   0.5009        0.448
-1  -1  1  -1      -0.3246    -0.6754    0.0675   0.0675  -0.0675     0.465   0.4956  0.4333        0.456
EPOCH-5
 1   1  1   1       1.3939    -0.3939   -0.0394  -0.0394  -0.0394     0.4256  0.4562  0.393         0.155
 1  -1  1   1       0.3634     0.6366    0.0637  -0.0637   0.0637     0.4893  0.3925  0.457         0.405
-1   1  1   1       0.3609     0.6391   -0.0639   0.0639   0.0639     0.4253  0.4564  0.5215        0.408
-1  -1  1  -1      -0.3603    -0.6397    0.064    0.064   -0.064      0.4893  0.5204  0.4575        0.409

The total mean square error after each epoch is given in Table 12.

Table 12
Epoch      Total mean square error
Epoch 1    3.02
Epoch 2    1.938
Epoch 3    1.5506
Epoch 4    1.417
Epoch 5    1.377

Thus from Table 12 it can be noticed that as training goes on, the error value gets minimized. Hence, further training can be continued for further minimization of the error. The network architecture of the Adaline network for the OR function is shown in Figure 8, with the weights obtained at the end of the fifth epoch (w1 = 0.4893, w2 = 0.5204, b = 0.4575).

Figure 8: Network architecture of Adaline for OR function.
7. Use Adaline network to train ANDNOT function with bipolar inputs and targets. Perform 2 epochs of training.

Solution: The truth table for the ANDNOT function with bipolar inputs and targets is shown in Table 13.

Table 13
x1   x2    t
 1    1   -1
 1   -1    1
-1    1   -1
-1   -1   -1

Initially the weights and bias are assumed to have a random value, say 0.2. The learning rate is also set to 0.2. The weights are updated until the least mean square error is obtained. The initial weights are w1 = w2 = b = 0.2 and α = 0.2. For the first input sample, x1 = 1, x2 = 1, t = -1, we calculate the net input as

y_in = b + x1w1 + x2w2 = 0.2 + 1×0.2 + 1×0.2 = 0.6

Now compute (t - y_in) = (-1 - 0.6) = -1.6. Updating the weights we obtain

wi(new) = wi(old) + α(t - y_in)xi
w1(new) = w1(old) + α(t - y_in)x1 = 0.2 + 0.2×(-1.6)×1 = -0.12
w2(new) = w2(old) + α(t - y_in)x2 = 0.2 + 0.2×(-1.6)×1 = -0.12
b(new) = b(old) + α(t - y_in) = 0.2 + 0.2×(-1.6) = -0.12

Now we compute the error:

E = (t - y_in)² = (-1.6)² = 2.56

The final weights after presenting the first input sample are w = [-0.12 -0.12 -0.12] and error E = 2.56.

The same operational steps are carried out for 2 epochs of training and the network performance is noted. It is tabulated as shown in Table 14.

Table 14 (initial weights w1 = w2 = b = 0.2)

Inputs              Net input            Weight changes                Weights                       Error
x1  x2  1   t       y_in      (t-y_in)   Δw1      Δw2      Δb         w1      w2      b             (t-y_in)²
EPOCH-1
 1   1  1  -1       0.6       -1.6      -0.32    -0.32    -0.32      -0.12   -0.12   -0.12          2.56
 1  -1  1   1      -0.12       1.12      0.22    -0.22     0.22       0.10   -0.34    0.10          1.25
-1   1  1  -1      -0.34      -0.66      0.13    -0.13    -0.13       0.24   -0.48   -0.03          0.43
-1  -1  1  -1       0.21      -1.21      0.24     0.24    -0.24       0.48   -0.23   -0.27          1.47
EPOCH-2
 1   1  1  -1      -0.02      -0.98     -0.195   -0.195   -0.195      0.28   -0.43   -0.46          0.95
 1  -1  1   1       0.25       0.76      0.15    -0.15     0.15       0.43   -0.58   -0.31          0.57
-1   1  1  -1      -1.33       0.33     -0.065    0.065    0.065      0.37   -0.51   -0.25          0.106
-1  -1  1  -1      -0.11      -0.90      0.18     0.18    -0.18       0.55   -0.33   -0.43          0.80

The total mean square error at the end of the two epochs is the summation of the errors of all input samples, as shown in Table 15.

Table 15
Epoch      Total mean square error
Epoch 1    5.71
Epoch 2    2.43

Hence from Table 15 it is clearly understood that the mean square error decreases as training progresses. Also, if training is continued, it can be noted that at the end of the sixth epoch the error becomes approximately equal to 1. The network architecture for the ANDNOT function using the Adaline network is shown in Figure 9.

Figure 9: Network architecture for ANDNOT function using Adaline network.
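The same LMS loop, pointed at the ANDNOT truth table with the 0.2 initializations used here, reproduces the totals of Table 15 up to rounding (the script is a sketch with illustrative names):

```python
# One epoch of LMS (delta-rule) updates for an Adaline, applied twice to the
# ANDNOT data of Table 13 to reproduce the epoch errors of Table 15.
def adaline_epoch(samples, w, b, alpha):
    total = 0.0
    for x, t in samples:
        y_in = b + sum(xi * wi for xi, wi in zip(x, w))
        err = t - y_in
        w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
        b += alpha * err
        total += err ** 2
    return w, b, total

andnot = [([1, 1], -1), ([1, -1], 1), ([-1, 1], -1), ([-1, -1], -1)]
w, b = [0.2, 0.2], 0.2
errors = []
for _ in range(2):
    w, b, e = adaline_epoch(andnot, w, b, 0.2)
    errors.append(e)
print(errors)   # close to Table 15's 5.71 and 2.43
```

The exact epoch-1 total is 5.7156, which the text rounds to 5.71; the epoch-2 total comes out at about 2.43.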

8. Using Madaline network, implement XOR function with bipolar inputs and targets. Assume the required parameters for training of the network.

Solution: The training pattern for the XOR function is given in Table 16.

Table 16
x1   x2    t
 1    1   -1
 1   -1    1
-1    1    1
-1   -1   -1

The Madaline Rule I (MRI) algorithm, in which the weights between the hidden layer and the output layer remain fixed, is used for training the network. Initializing the weights to small random values, the network architecture is as shown in Figure 10, with initial weights. From Figure 10, the initial weights and bias are [w11 w21 b1] = [0.05 0.2 0.3], [w12 w22 b2] = [0.1 0.2 0.15] and [v1 v2 b3] = [0.5 0.5 0.5].

Figure 10: Network architecture of Madaline for XOR function (initial weights given).

For the first input sample, x1 = 1, x2 = 1, target t = -1, and learning rate α equal to 0.5:

Calculate the net input to the hidden units:

z_in1 = b1 + x1w11 + x2w21 = 0.3 + 1×0.05 + 1×0.2 = 0.55
z_in2 = b2 + x1w12 + x2w22 = 0.15 + 1×0.1 + 1×0.2 = 0.45

Calculate the outputs z1, z2 by applying the activation over the net inputs computed. The activation function is given by

f(z_in) = 1 if z_in ≥ 0;  -1 if z_in < 0

Hence

z1 = f(z_in1) = f(0.55) = 1
z2 = f(z_in2) = f(0.45) = 1

After computing the output of the hidden units, find the net input entering the output unit:

y_in = b3 + z1v1 + z2v2 = 0.5 + 1×0.5 + 1×0.5 = 1.5

Apply the activation function over the net input y_in to calculate the output y:

y = f(y_in) = f(1.5) = 1

Since t ≠ y, weight updation has to be performed. Also, since t = -1, the weights are updated on the hidden units z1 and z2 that have positive net input. Since here both net inputs z_in1 and z_in2 are positive, updating the weights and bias on both hidden units, we obtain

wij(new) = wij(old) + α(t - z_inj)xi
bj(new) = bj(old) + α(t - z_inj)

This implies:

w11(new) = w11(old) + α(t - z_in1)x1 = 0.05 + 0.5(-1 - 0.55)×1 = -0.725
w21(new) = w21(old) + α(t - z_in1)x2 = 0.2 + 0.5(-1 - 0.55)×1 = -0.575
b1(new) = b1(old) + α(t - z_in1) = 0.3 + 0.5(-1 - 0.55) = -0.475
w12(new) = w12(old) + α(t - z_in2)x1 = 0.1 + 0.5(-1 - 0.45)×1 = -0.625
w22(new) = w22(old) + α(t - z_in2)x2 = 0.2 + 0.5(-1 - 0.45)×1 = -0.525
b2(new) = b2(old) + α(t - z_in2) = 0.15 + 0.5(-1 - 0.45) = -0.575

All the weights and biases between the input layer and the hidden layer are adjusted. This completes the training for the first input sample. The same process is repeated until the weights converge. It is found that the weights converge at the end of 3 epochs. Table 17 shows the training performance of the Madaline network for the XOR function.

Table 17

Inputs            Net inputs           Outputs        Output unit     Weights
x1  x2  1   t     z_in1     z_in2      z1   z2        y_in    y       w11     w21     b1      w12      w22     b2
EPOCH-1
 1   1  1  -1     0.55      0.45        1    1         1.5    1      -0.725  -0.575  -0.475  -0.625   -0.525  -0.575
 1  -1  1   1    -0.625    -0.675      -1   -1        -0.5   -1       0.0875 -1.39    0.34   -0.625   -0.525  -0.575
-1   1  1   1    -1.1375   -0.475      -1   -1        -0.5   -1       0.0875 -1.39    0.34   -1.3625   0.2125  0.1625
-1  -1  1  -1     1.6375    1.3125      1    1         1.5    1       1.4065 -0.069  -0.98   -0.207    1.369  -0.994
EPOCH-2
 1   1  1  -1     0.3565    0.168       1    1         1.5    1       0.7285 -0.75   -1.66   -0.791   -0.207  -1.58
 1  -1  1   1    -0.1845   -3.154      -1   -1        -0.5   -1       1.3205 -1.34   -1.068  -0.791    0.785  -1.58
-1   1  1   1    -3.728    -0.002      -1   -1        -0.5   -1       1.3205 -1.34   -1.068  -1.29     0.785  -1.08
-1  -1  1  -1    -1.0495   -1.071      -1   -1        -0.5   -1       1.3205 -1.34   -1.068  -1.29     1.29   -1.08
EPOCH-3
 1   1  1  -1    -1.0865   -1.083      -1   -1        -0.5   -1       1.32   -1.34   -1.07   -1.29     1.29   -1.08
 1  -1  1   1     1.5915   -3.655       1   -1         0.5    1       1.32   -1.34   -1.07   -1.29     1.29   -1.08
-1   1  1   1    -3.728     1.501      -1    1         0.5    1       1.32   -1.34   -1.07   -1.29     1.29   -1.08
-1  -1  1  -1    -1.0495   -1.701      -1   -1        -0.5   -1       1.32   -1.34   -1.07   -1.29     1.29   -1.08

The network architecture of the Madaline network with the final weights for the XOR function is shown in Figure 11.

Figure 11: Madaline network for XOR function (final weights given).
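The first-sample MRI step can be sketched as follows. Note this covers only the branch exercised here (t = -1, where every hidden unit with positive net input is updated); the t = +1 branch of MRI, which updates the unit whose net input is closest to zero, is omitted. Names are illustrative:

```python
# One Madaline Rule I (MRI) update for the first XOR sample; the output-layer
# weights v1, v2, b3 stay fixed, only the hidden (Adaline) units learn.
alpha = 0.5
w11, w21, b1 = 0.05, 0.2, 0.3      # hidden unit z1
w12, w22, b2 = 0.1, 0.2, 0.15      # hidden unit z2
v1, v2, b3 = 0.5, 0.5, 0.5         # fixed output unit
x1, x2, t = 1, 1, -1

f = lambda s: 1 if s >= 0 else -1  # bipolar step activation
z_in1 = b1 + x1 * w11 + x2 * w21               # 0.55
z_in2 = b2 + x1 * w12 + x2 * w22               # 0.45
y = f(b3 + f(z_in1) * v1 + f(z_in2) * v2)      # 1, but t = -1

if y != t and t == -1:
    # t = -1: update every hidden unit whose net input is positive
    if z_in1 > 0:
        w11 += alpha * (t - z_in1) * x1
        w21 += alpha * (t - z_in1) * x2
        b1 += alpha * (t - z_in1)
    if z_in2 > 0:
        w12 += alpha * (t - z_in2) * x1
        w22 += alpha * (t - z_in2) * x2
        b2 += alpha * (t - z_in2)
print(w11, w21, b1, w12, w22, b2)
```

The printed values match the hand computation: w11 = -0.725, w21 = -0.575, b1 = -0.475, w12 = -0.625, w22 = -0.525, b2 = -0.575, i.e. the first row of Table 17.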
9. Using back-propagation network, find the new weights for the net shown in Figure 12. It is presented with the input pattern [0, 1] and the target output is 1. Use a learning rate α = 0.25 and the binary sigmoidal activation function.

Figure 12: Network.

Solution: The new weights are calculated based on the training algorithm in Section 3.5.4. The initial weights are [v11 v21 v01] = [0.6 -0.1 0.3], [v12 v22 v02] = [-0.3 0.4 0.5] and [w1 w2 w0] = [0.4 0.1 -0.2], and the learning rate is α = 0.25. The activation function used is the binary sigmoidal activation function, given by

f(x) = 1 / (1 + e^(-x))

Given the input sample [x1, x2] = [0, 1] and target t = 1:

Calculate the net input. For the z1 layer,

z_in1 = v01 + x1v11 + x2v21 = 0.3 + 0×0.6 + 1×(-0.1) = 0.2

For the z2 layer,

z_in2 = v02 + x1v12 + x2v22 = 0.5 + 0×(-0.3) + 1×0.4 = 0.9

Applying the activation to calculate the output, we obtain

z1 = f(z_in1) = 1/(1 + e^(-0.2)) = 0.5498
z2 = f(z_in2) = 1/(1 + e^(-0.9)) = 0.7109

Calculate the net input entering the output layer. For the y layer,

y_in = w0 + z1w1 + z2w2 = -0.2 + 0.5498×0.4 + 0.7109×0.1 = 0.09101

Applying the activation to calculate the output, we obtain

y = f(y_in) = 1/(1 + e^(-0.09101)) = 0.5227

Compute the error portion δk:

δ1 = (t1 - y)f'(y_in)

Now

f'(y_in) = f(y_in)[1 - f(y_in)] = 0.5227[1 - 0.5227] = 0.2495

This implies

δ1 = (1 - 0.5227)(0.2495) = 0.1191

Find the changes in weights between the hidden and output layer:

Δw1 = αδ1z1 = 0.25×0.1191×0.5498 = 0.0164
Δw2 = αδ1z2 = 0.25×0.1191×0.7109 = 0.02117
Δw0 = αδ1 = 0.25×0.1191 = 0.02978

Compute the error portion δj between the input and hidden layer (j = 1 to 2):

δj = δ_inj f'(z_inj), where δ_inj = Σk δk wjk = δ1 wj1 [since there is only one output neuron]

⇒ δ_in1 = δ1w1 = 0.1191×0.4 = 0.04764
⇒ δ_in2 = δ1w2 = 0.1191×0.1 = 0.01191

Error, δ1 = δ_in1 f'(z_in1):
f'(z_in1) = f(z_in1)[1 - f(z_in1)] = 0.5498[1 - 0.5498] = 0.2475
δ1 = 0.04764×0.2475 = 0.0118

Error, δ2 = δ_in2 f'(z_in2):
f'(z_in2) = f(z_in2)[1 - f(z_in2)] = 0.7109[1 - 0.7109] = 0.2055
δ2 = 0.01191×0.2055 = 0.00245

Now find the changes in weights between the input and hidden layer:

Δv11 = αδ1x1 = 0.25×0.0118×0 = 0
Δv21 = αδ1x2 = 0.25×0.0118×1 = 0.00295
Δv01 = αδ1 = 0.25×0.0118 = 0.00295
Δv12 = αδ2x1 = 0.25×0.00245×0 = 0
Δv22 = αδ2x2 = 0.25×0.00245×1 = 0.0006125
Δv02 = αδ2 = 0.25×0.00245 = 0.0006125

Compute the final weights of the network:

v11(new) = v11(old) + Δv11 = 0.6 + 0 = 0.6
v12(new) = v12(old) + Δv12 = -0.3 + 0 = -0.3
v21(new) = v21(old) + Δv21 = -0.1 + 0.00295 = -0.09705
v22(new) = v22(old) + Δv22 = 0.4 + 0.0006125 = 0.4006125
v01(new) = v01(old) + Δv01 = 0.3 + 0.00295 = 0.30295
v02(new) = v02(old) + Δv02 = 0.5 + 0.0006125 = 0.5006125
w1(new) = w1(old) + Δw1 = 0.4 + 0.0164 = 0.4164
w2(new) = w2(old) + Δw2 = 0.1 + 0.02117 = 0.12117
w0(new) = w0(old) + Δw0 = -0.2 + 0.02978 = -0.17022

Thus, the final weights have been computed for the network shown in Figure 12.
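The full forward and backward pass above can be condensed into a short script (a sketch; the 2-2-1 layout and index conventions follow Figure 12, the variable names are mine):

```python
import math

# One back-propagation step for Problem 9 (binary sigmoid), reproducing
# the hand-computed weight updates.
f = lambda x: 1.0 / (1.0 + math.exp(-x))
alpha = 0.25
v = [[0.6, -0.3],      # [[v11, v12],
     [-0.1, 0.4]]      #  [v21, v22]]
v0 = [0.3, 0.5]        # [v01, v02]
w, w0 = [0.4, 0.1], -0.2
x, t = [0, 1], 1

# forward pass
z_in = [v0[j] + x[0] * v[0][j] + x[1] * v[1][j] for j in range(2)]
z = [f(zi) for zi in z_in]
y = f(w0 + z[0] * w[0] + z[1] * w[1])

# backward pass (binary sigmoid derivative: f' = f(1 - f))
delta_k = (t - y) * y * (1 - y)
delta_j = [delta_k * w[j] * z[j] * (1 - z[j]) for j in range(2)]

# weight updates
w = [w[j] + alpha * delta_k * z[j] for j in range(2)]
w0 += alpha * delta_k
for j in range(2):
    for i in range(2):
        v[i][j] += alpha * delta_j[j] * x[i]
    v0[j] += alpha * delta_j[j]

print(round(w[0], 4), round(w[1], 4), round(w0, 4))   # 0.4164 0.1212 -0.1702
```

Since x1 = 0, the weights v11 and v12 on the first input receive no update, exactly as in the hand computation.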
10. Find the new weights, using back-propagation network, for the network shown in Figure 13. The network is presented with the input pattern [-1, 1] and the target output is +1. Use a learning rate of α = 0.25 and the bipolar sigmoidal activation function.

Figure 13: Network.

Solution: The initial weights are [v11 v21 v01] = [0.6 -0.1 0.3], [v12 v22 v02] = [-0.3 0.4 0.5] and [w1 w2 w0] = [0.4 0.1 -0.2], and the learning rate is α = 0.25. The activation function used is the bipolar sigmoidal activation function, given by

f(x) = 2/(1 + e^(-x)) - 1 = (1 - e^(-x))/(1 + e^(-x))

Given the input sample [x1, x2] = [-1, 1] and target t = 1:

Calculate the net input. For the z1 layer,

z_in1 = v01 + x1v11 + x2v21 = 0.3 + (-1)×0.6 + 1×(-0.1) = -0.4

For the z2 layer,

z_in2 = v02 + x1v12 + x2v22 = 0.5 + (-1)×(-0.3) + 1×0.4 = 1.2

Applying the activation to calculate the output, we obtain

z1 = f(z_in1) = (1 - e^(0.4))/(1 + e^(0.4)) = -0.1974
z2 = f(z_in2) = (1 - e^(-1.2))/(1 + e^(-1.2)) = 0.537

Calculate the net input entering the output layer. For the y layer,

y_in = w0 + z1w1 + z2w2 = -0.2 + (-0.1974)×0.4 + 0.537×0.1 = -0.22526

Applying the activation to calculate the output, we obtain

y = f(y_in) = (1 - e^(0.22526))/(1 + e^(0.22526)) = -0.1122

Compute the error portion δk:

δ1 = (t1 - y)f'(y_in)

Now

f'(y_in) = 0.5[1 + f(y_in)][1 - f(y_in)] = 0.5[1 - 0.1122][1 + 0.1122] = 0.4937

This implies

δ1 = (1 + 0.1122)(0.4937) = 0.5491

Find the changes in weights between the hidden and output layer:

Δw1 = αδ1z1 = 0.25×0.5491×(-0.1974) = -0.0271
Δw2 = αδ1z2 = 0.25×0.5491×0.537 = 0.0737
Δw0 = αδ1 = 0.25×0.5491 = 0.1373

Compute the error portion δj between the input and hidden layer (j = 1 to 2):

δj = δ_inj f'(z_inj), where δ_inj = Σk δk wjk = δ1 wj1 [since there is only one output neuron]

⇒ δ_in1 = δ1w1 = 0.5491×0.4 = 0.21964
⇒ δ_in2 = δ1w2 = 0.5491×0.1 = 0.05491

Error, δ1 = δ_in1 f'(z_in1) = 0.21964×0.5×(1 + 0.1974)(1 - 0.1974) = 0.1056
Error, δ2 = δ_in2 f'(z_in2) = 0.05491×0.5×(1 - 0.537)(1 + 0.537) = 0.0195

Now find the changes in weights between the input and hidden layer:

Δv11 = αδ1x1 = 0.25×0.1056×(-1) = -0.0264
Δv21 = αδ1x2 = 0.25×0.1056×1 = 0.0264
Δv01 = αδ1 = 0.25×0.1056 = 0.0264
Δv12 = αδ2x1 = 0.25×0.0195×(-1) = -0.0049
Δv22 = αδ2x2 = 0.25×0.0195×1 = 0.0049
Δv02 = αδ2 = 0.25×0.0195 = 0.0049

Compute the final weights of the network:

v11(new) = v11(old) + Δv11 = 0.6 - 0.0264 = 0.5736
v12(new) = v12(old) + Δv12 = -0.3 - 0.0049 = -0.3049
v21(new) = v21(old) + Δv21 = -0.1 + 0.0264 = -0.0736
v22(new) = v22(old) + Δv22 = 0.4 + 0.0049 = 0.4049
v01(new) = v01(old) + Δv01 = 0.3 + 0.0264 = 0.3264
v02(new) = v02(old) + Δv02 = 0.5 + 0.0049 = 0.5049
w1(new) = w1(old) + Δw1 = 0.4 - 0.0271 = 0.3729
w2(new) = w2(old) + Δw2 = 0.1 + 0.0737 = 0.1737
w0(new) = w0(old) + Δw0 = -0.2 + 0.1373 = -0.0627

Thus, the final weights have been computed for the network shown in Figure 13.
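The same pass with the bipolar sigmoid and its derivative 0.5(1 + f)(1 - f) reproduces the updates of this problem (again a sketch with illustrative names, same layout as before):

```python
import math

# One back-propagation step for Problem 10 with the bipolar sigmoid
# f(x) = (1 - e^-x)/(1 + e^-x), reproducing the hand-computed final weights.
f = lambda x: (1.0 - math.exp(-x)) / (1.0 + math.exp(-x))
alpha = 0.25
v = [[0.6, -0.3],      # [[v11, v12],
     [-0.1, 0.4]]      #  [v21, v22]]
v0 = [0.3, 0.5]        # [v01, v02]
w, w0 = [0.4, 0.1], -0.2
x, t = [-1, 1], 1

# forward pass
z_in = [v0[j] + x[0] * v[0][j] + x[1] * v[1][j] for j in range(2)]
z = [f(zi) for zi in z_in]
y = f(w0 + z[0] * w[0] + z[1] * w[1])

# backward pass (bipolar sigmoid derivative: 0.5(1 + f)(1 - f))
delta_k = (t - y) * 0.5 * (1 + y) * (1 - y)
delta_j = [delta_k * w[j] * 0.5 * (1 + z[j]) * (1 - z[j]) for j in range(2)]

# weight updates
w = [w[j] + alpha * delta_k * z[j] for j in range(2)]
w0 += alpha * delta_k
for j in range(2):
    for i in range(2):
        v[i][j] += alpha * delta_j[j] * x[i]
    v0[j] += alpha * delta_j[j]

print(round(v[0][0], 4), round(w[0], 4), round(w0, 4))   # 0.5736 0.3729 -0.0627
```

Because the error (t - y) is large here, the output-layer bias receives the biggest correction, just as in the hand computation.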
3.13 Review Questions

1. What is supervised learning and how is it different from unsupervised learning?
2. How does learning take place in supervised learning?
3. From a mathematical point of view, what is the process of learning in supervised learning?
4. What is the building block of the perceptron?
5. Does perceptron require supervised learning? If no, what does it require?
6. List the limitations of perceptron.
7. State the activation function used in perceptron network.
8. What is the importance of threshold in perceptron network?
9. Mention the applications of perceptron network.
10. What are feature detectors?
11. With a neat flowchart, explain the training process of perceptron network.
12. What is the significance of error signal in perceptron network?
13. State the testing algorithm used in perceptron algorithm.
14. How is the linear separability concept implemented using perceptron network training?
15. Define perceptron learning rule.
16. Define delta rule.
17. State the error function for delta rule.
18. What is the drawback of using optimization algorithm?
19. What is Adaline?
20. Draw the model of an Adaline network.
21. Explain the training algorithm used in Adaline network.
22. How is a Madaline network formed?
23. Is it true that Madaline network consists of many perceptrons?
24. State the characteristics of weighted interconnections between Adaline and Madaline.
25. How is training adopted in Madaline network using majority vote rule?
26. State few applications of Adaline and Madaline.
27. What is meant by epoch in training process?
28. What is meant by gradient descent method?
29. State the importance of back-propagation algorithm.
30. What is called as memorization and generalization?
31. List the stages involved in training of back-propagation network.
32. Draw the architecture of back-propagation algorithm.
33. State the significance of error portions δk and δj in BPN algorithm.
34. What are the activations used in back-propagation network algorithm?
35. What is meant by local minima and global minima?
36. Derive the generalized delta learning rule.
37. Derive the derivatives of the binary and bipolar sigmoidal activation functions.
38. What are the factors that improve the convergence of learning in BPN network?
39. What is meant by incremental learning?
40. Why is gradient descent method adopted to minimize error?
41. What are the methods of initialization of weights?
42. What is the necessity of momentum factor in weight updation process?
43. Define "over fitting" or "over training."
44. State the techniques for proper choice of learning rate.
45. What are the limitations of using momentum factor?
46. How many hidden layers can there be in a neural network?
47. What is the activation function used in radial basis function network?
48. Explain the training algorithm of radial basis function network.
49. By what means can an IIR and an FIR filter be formed in neural network?
50. What is the importance of functional link network?
51. Write a short note on binary classification tree neural network.
52. Explain in detail about wavelet neural network.

3.14 Exercise Problems

1. Implement NOR function using perceptron network for bipolar inputs and targets.
2. Find the weights required to perform the following classifications using perceptron network. The vectors (1, 1, -1, -1) and (1, -1, 1, -1) belong to the class (so have target value 1); the vectors (-1, -1, -1, 1) and (-1, -1, 1, 1) do not belong to the class (so have target value -1). Assume learning rate 1 and initial weights 0.
