TB - 04 - Superwised Learning
TB - 04 - Superwised Learning
TB - 04 - Superwised Learning
oM
48 Artificial Neural Network: An Introduction
'.;;
network with bipolar sigmoidal units (A= 1) ro
.,
. ~!
3
testing. The input-output data are obtained by
achieve the following [)YO-to-one mappings: varying inpuc variables (xt,Xz) within [-1,+1] ~
~
Learning Objectives -----'''-----------------,
The basic networks in supervised learning. Adaline, Madaline, back~propagarion and
How the perceptron learning rule is better radial basis funcrion network.
rhan the Hebb rule. The various learning facrors used in BPN.
Original percepuon layer description. • An overview of Ttme Delay, Function Link,
Delta rule with single output unit. Wavelet and Tree Neural Networks.
I 3.1 Introduction
The chapter covers major topics involving supervised learning networks and their associated single-layer
and multilayer feed-forward networks. The following topics have been discussed in derail- rh'e- perceptron
learning r'Ule for simple perceptrons, the delta rule (Widrow-Hoff rule) for Adaline and single-layer feed-
forward flC[\VOrks with continuous activation functions, and the back-propagation algorithm for multilayer
feed-forward necworks with cominuous activation functions. ln short, ali the feed-forward networks have
been explored.
I. The perceptron network consists of three units, namely, sensory unit (input unit), associator unit (hidden
unit), response unit (output unit).
~
50 SupeNised Learning Network 3.2 Perceptron Networks
51
2. The sensory units are connected to associamr units with fixed weights having values 1, 0 or -l, which are Output
assigned at random. · - o·or 1 Output Desired
II
3. The binary activation function is used in sensory unit and associator unit. Fixed _weight
t Oar 1 output
4. The response unit has an'activarion of l, 0 or -1. The binary step wiili fixed threshold 9 is used as
activation for associator. The output signals £hat are sem from the associator unit to the response unit are
valUe ciN., 0, -1
at randorr\ .
\ 0 G) y,
9~
only binary.
5. TiiCQUt'put of the percepuon network is given by
- ---
i . \.--- '
~¢~
' iX1
y = f(y,,)
X X X
\i
;x,
G) G)
..., \.~ X
:.I\
' < •
.,cl. 1 X I I \ Xn
where J(y;n) is activation function and is defmed as &.,
·'~
"
•<
· ~. tr Sensory unit 1
•
f(\- ,-. \~ ..
~
sensor grid " /
) ·~ if J;n> 9
)\t \ .. ._ representing any-·'
f(y;,) ={ if -9~y;11 56
'Z -1 if y;71 <-9
lJa~------ .
@ @ ry~
6. The perceptron learning rule is used in the weight updation between the associamr unit and the response e,
unit. For each training input, the net will calculate the response and it will Oetermine whelfier or not an Assoc1ator un~ . Response unit
error has occurred.
Figure 3·1 Ori~erceprron network.
w~t fL 9-·
7. The error calculation is based on the comparison of th~~~~~rgets with those of the ca1~t!!_~~ed
outputs. b'"~ r>-"-j ::.Kq>
(l.. u,•>J;>.-l? '<Y\
' ~ AJA I &J)
8. The weights on the connections from the units that send the nonzero signal will get adjusted suitably. I 3.2.2 Perceptron Learning Rule '
9. The weights will be adjusted on the basis of the learning_rykjf an error has occurred for a particular
training patre_!Jl.,..i.e..,- In case of the percepuon learrling rule, the learning signal is the difference between esir.ed...and.actuaL...- -·--,
~ponse of a neuron. The perceptron learning rule IS exp rune as o ows: j ~ f.:] (\ :._ PK- A-£. )
Wi{new) = Wj{old) + a tx1• Consider a finite "n" number of input training vectors, with their associated r;g~ ~ired) values x(n) {
and t{n), where "n" r~o N. The target is either+ 1 or -1. The ourput ''y" is obtained on the
b(new) = b(old) + at basis of the net input calculated and activation function being applied over the net input.
If no error occurs, there is no weight updarion and hence the training process may be stopped. In the above
equations, the target value "t" is+ I or-land a is the learningrate.ln general, these learning rules begin with
an initial guess at rhe weight values and then successive adjusunents are made on the basis of the evaluation
of an ob~~ve function. Evenrually, the lear!Jillg rules reac~.a near~optimal or optimal solution in a finite __
y = f(y,,) = l~
-1
if J1i1 > (}
if-{} 5Jirl 58
if Jin < -{}
\r~~ ~r
I
-~~
~
'
.,
number of steps. -------
APcrceprron nerwork with irs three units is shown in Figure 3~1. A£ shown in Figure 3~1. a sensory unir The weight updacion in case of perceprron learning is as shown. X~ -~~. ·.
can be a two-dimensional matrix of 400 photodetectors upon which a lighted picture with geometric black
and white pmern impinges. These detectors provide a bif!.~.{~) __:~r-~lgl.signal__if.f\1_~ i~.u.~und lfy ,P • then /I
co exceei~. certain value of threshold. Also, these detectors are conne ed randomly with the associator ullit. w{new) = w{old) + a tx {a - learning rate)
The associator unit is found to conSISt of a set ofsubcircuits called atrtre predicates. The feature predicates are else, we have
(
hard-wired to detect the specific fearure of a pattern and are e "valent to the feature detectors. For a particular
w(new) = w(old)
fearure, each predicate is examined with a few or all of the ponses of the sensory unit. It can be found that
the results from the predicate units are also binary (0 1). The last unit, i.e. response unit, contains the
pattern~recognizers or perceptrons. The weights pr tin the input layers are all fixed, while the weights on
the response unit are trainable.
~I
l
52 Supervised Learning Network 3.2 Perceptron Networks 53
For
each No
Figure 3-2 Single classification perceptron network. s:t
training patterns, and this learning takes place within a finite number of steps provided that the solution
exists."-
I 3.2.3 Architecture
In the original perceptron ne[Work, the output obtained from the associator unit is a binary vector, and hence
that output can be taken as input signal to the res onse unit and classificanon can be performed. Here only
the weights be[l.veen the associator unit and the output unit can be adjuste , an, t e we1ghrs between the
sensory _and associator units are faxed. As a result, the discussion of the network is limited. to a single portion. Apply activation, obtain
Thus, the associator urut behaves like the input unit. A simple perceptron network architecrure is shown in Y= f(y,)
Figure 3•2. --~·------
In Figure 3-2, there are n input neurons, 1 output neuron and a bias. The inpur-layer and output-
layer neurons are connected through a directed communication link, which is associated with weights. The
goal of the perceptron net is to classify theJ!w.w: pa~~tern as a member or not a member to a p~nicular
class. · -···-·.··-· ·-.. --- ......
y!l~~~
~1
ilie desired output, the weight updation process is carried out. The entire neMork is trained based on the Step 2: Perform Steps 3--5 for each bipolar or binary training vector pair s:t.
mentioned stopping criterion. The algorithm of a percepuon network is as follows: Step 3, Set activation (identity) of each input unit i = 1 ton:
I StepO: Initi-alize ili~weights a~d th~bia~for ~ ~culation they can b-e set to zero). Also initialize the / x;;= ~{
learning race a(O < a,;:= 1). For simplicity a is set to 1.
Step 1: Perform Steps 2-6 until the final stopping condition is false. Step 4, irst, the net input is calculated as i A
Step 2: Perform Steps 3-5 for each training pair indicated by s:t. I ,_..... It: -.-' --.~ ',_ -~,
---- :::::::::J~ "'( ;-;· J)'
Step 3: The input layer containing input units is applied with identity activation functions: ~----
b(new) = b(old) + Of
Step 6: Test for the stopping condition, i.e., if there is no change in weights then stop the training process,
else, we have
1 else stan again from Step 2. 1
I.'
7Vi(new) = WJ(old}
b(new) = b(old) It em be noticed that after training, the net classifies each of the training vectors. The above algorithm is
I
i
Step 6: Train the nerwork until diere is no weight change. This is the stopping condition for the network. suited for the architecture shown in Figure 3~4. ~
j
If this condition is not met, then start again from Step 2. i
I
The algorithm discussed above is not sensitive to the initial values of the weights or the value of the
3.2. 7 Percept ron Network Testing Algorithm
~
~
It is best to test the network performance once the training process is complete. For efficient performance
learning rare.
of the network, it should be trained with more data. The testing algorithm (application procedure) is as
I
~
follows: ~!I
3.2.6 Perceptron Training Algorithm for Multiple Output Classes il\!
For multiple output classes, the perceptron training algorithm is as follows:
\ Step 0:-- Initialize the weights, biases and learning rare suitably.
Step 1: Check for stopping c?ndirion; if it is false, perform Steps 2-6.
I
I Step 0: The initi~ weights to be used here are taken from the training algorithms (the final weights I
obtained.i:l.uring training).
Step 1: For each input vector X to be classified, perform Steps 2-3.
Step 2: Set activations of the input unit.
II
I:;i
I
.,,.
011
r
56 Supervised Learriing Network
~- 3.3 Adaptive Unear Neuron (Adaline) 57
~~
~,
~~3.3 Adaptive Linear Neuron (Adaline)
'
1
I 3.3.1 Theory ,
x, 'x,
The unirs with linear activation function are called li~ear.~ts. A network ~ith a single linear unit is called
an Adaline (adaptive linear neuron). That is, in an Adaline, the input-output relationship is linear. Adaline
./~~ \~\J
/w,, uses bipolar activation for its input signals and its target output. The weights be.cween the input and the
omput are adjustable. The bias in Adaline acts like an adjustable weighr, whose connection is from a unit
with activations being always 1. Ad.aline is a net which has only one output unit. The Adaline nerwork may
w,l be trained using delta rule. The delta rule may afso be called as least mean square (LMS) rule or Widrow~Hoff
Xi
(x;)~ "/ ~ y 1:
~(s)--
YJ
I • -----+- YJ
rule. This learning rule is found to minimize the mean~squared error between the activation and the target
value.
The Widrow-Hoff rule is very similar to percepuon learning rule. However, rheir origins are different. The
perceptron learning rule originates from the Hebbian assumption while the delta rule is derived from the
x, ( x,).£::::___ _ _~
w -
gradienc~descem method (it can be generalized to more than one layer). Also, the perceptron learning rule
stops after a finite number ofleaming steps, but the gradient~descent approach concinues forever, converging
Figure 3·4 Network archirecture for percepuon network for several output classes. only asymptotically to the solution. The delta rule updates the weights between the connections so as w
minimize the difference between the net input ro the output unit and the target value. The major aim is to
Step 3: Obrain the· response of output unit. minimize the error over all training parrerns. This is done by reducing the error for each pattern, one at a
rime.
The delta rule for adjusting rhe weight of ith pattern {i = 1 ro n) is
Yin = L" x;w; / ' ·
i=l
D.w; = a(t- y1,)x1
where D.w; is the weight change; a the learning rate; xthe vector of activation of input unit;y;, the net input
I if y;, > 8
to output unit, i.e., Y Li=l
= x;w;; t rhe target output. The deha rule in case of several output units for
Y = f(yhl) = { _o ~f ~e sy;, ~8 _,/'\ adjusting the weight from ith input unit to the jrh output unit (for each pattern) is
1 tfy111 <-8 IJ.wij = a(t;- y;,,j)x;
Thus, the testing algorithm resLS the performance of nerwork. I 3.3.3 Architeclure
As already stated, Adaline is a single~unir neuron, which receives input from several units and also from one
unit called bias. An Adaline inodel is shown in Figure 3~5. The basic Adaline model consists of trainable
weights. Inputs are either of the two values (+ 1 or -1) and the weights have signs (positive or negative).
The condition for separaring the response &om re~o is Initially, random weights are assigned. The net input calculated is applied to a quantizer transfer function
(possibly activation function) that restOres the output to +1 or -1. The Adaline model compares the actual
WJXJ + tiJ2X]. + b> (} output with the target output and on the basis of the training algorithm, the weights are adjusted.
_______
The condition for separating the resPonse from_...r~~o t~~ion of nega~ve
..
~--
is I 3.3.4 Flowchart lor Training Process
WI X} + 'WJ.X]_ + b < -(} The flowchan for the training process is shown in Figure 3~6. This gives a picrorial representation of the
network training. The conditions necessary for weight adjustments have co be checked carefully. The weights
The conditions- above are stated for a siilgie:f.i~p;;~~~ ~~~~;~k~ith rwo Input neurons and one output and other required parameters are initialized. Then the net input is calculated, output is obtained and compared
neuron and one bias. with the desired output for calculation of error. On the basis of the error Factor, weights are adjusted.
58 Supervised Learning Network 3.3 Adaptive Linear Neuron (Adaline) 59
Adaptive
algorithm I• e = t- Ym 1 Output error
generator +t For
No
each
.. ................................. Learning supervisor
~... s: t
Figure 3·5 Adaline model.
Yes
.Step 0: Weights and bias are set to some random values bur not zero. Set the learning rate parameter ct.
Step 1: Perform Steps 2-6 when stopping condition is false.
Step 2: Perform Steps 3~5 for each bipolar training pair s:t.
Step 3: Set activations for input units i = I to n.
Weight updation
x;=s; w;(new) = w1(old) + a(t- Y1n)Xi
b(new) = b(old) + a(r- Yinl
Seep 4: Calculate the net input to the output unit.
"
y;, = b+ Lx;w;
i=J
Step 6: If the highest weight change rhat occurred during training is smaller than a specified toler-
ance ilien stop ilie uaining process, else continue. This is the rest for stopping condition of a
network.
The range of learning rate Can be be[Ween 0.1 and 1.0. Figure 3·6 Flowcharr for Adaline training process.
I
I
I
1._
.~
Ic is essential to perform the resting of a network rhat has been trained. When training is completed, the
Adaline can be used ro classify input patterns. A step &merion is used to test the performance of the network.
The resting procedure for thC Adaline nerwc~k is as follows:
J Step 0: Initialize the weights. (The weights are obtained from ilie ttaining algorithm.) J
Step 1: Perform Steps 2-4 for each bipolar input vecror x.
Step 2: Set the activations of the input units to x.
Step 3: Calculate the net input to rhe output unit:
]in= b+ Lx;Wj
Step 4: Apply the activation funcrion over the net input calculated: Figure 3·7 Archireaure of Madaline layer.
1 ify,"~o
y= and the output layer are ftxed. The time raken for the training process in the Madaline network is very high
{ -1 ifJin<O
compared to that of the Adaline network.
{_
Adaline. lifx~O
f(x) = 1 if x < 0
I 3.4.2 Architectury>
A simple Madaline architecture is shown in Figure 3-7, which consists of"n" uniu of input layer, "m" units Step 0: Initialize the weighu. The weights entering the output unit are set as above. Set initial small
ofAdaline layer and "1" unit of rhe Madaline layer. Each neuron in theAdaline and Madaline layers has a bias random values for Adaline weights. Also set initial learning rate a.
of excitation 1. The Adaline layer is present between the input layer and the Madaline (output) layer; hence, Step 1: When stopping condition is false, perform Steps 2-3.
the Adaline layer can be considered a hidden layer. The use of the hidden layer gives the net computational
Step 2: For each bipolar training pair s:t, perform Steps 3-7.
capability which is nor found in single-layer nets, but chis complicates rhe training process to some extent.
The Adaline and Madaline models can be applied effectively in communication systems of adaptive Step 3: Activate input layer units. Fori;:::: 1 to n,
equalizers and adaptive noise cancellation and other cancellation circuits. x;:;: s;
I 3.4.3 Rowchart of Training Process Step 4: Calculate net input to each hidden Adaline unit:
The flowchart of the traini[lg process of the Madaline network is shown in Figure 3-8. In case of training, the "
Zinj:;:bj+ LxiWij, j:;: l tom
weighu between the input layer and the hidden layer are adjusted, and the weights between the hidden layer i=l
62 Supervised Learning Network 3.4 Multiple Adaptive Linear Neurons 63
(
p A
T
Set small random value
weights for adallne layer.
Initialize a
c}----~
t= 1" No
Yes
No
>--+---{8
Update weights on unit z1whose
net input is closest to zero.
b1(new) = b1(old) + a(1-z~)
w,(new) = wi(old) + a(1-zoy)X1
Activate input units
X10': s,, b1 ton
I
Calculate output
zJ= f(z.,)
I If no
Calculater net input to output unit No ( weight changes
c) (or) specilied
Y..,=b0 ·;i:zyJ
,., ' number of
epochs
T ' / (8
Calculate output Yes '
Y= l(y,)
Step 5: Calculate output of each hidden unit: The back-propagation algorithm is different from mher networks in respect to the process by whic
weights are calculated during the learning period of the ne[INork. The general difficulty with the multilayer
Zj = /(z;n) pe'rceprrons is calculating the weights of the hidden layers in an efficient way that would result in a very small
or zero output error. When the hidden layers are incteas'ed the network training becomes more complex. To
Step 6: Find the output of the net: update weights, the error must be calculated. The error, Which is the difference between the actual (calculated)
and the desired (target) output, is easily measured at the"Output layer. It should be noted that at the hidden
y;, = bo + Lqvj
"' layers, there is no direct information of the en'or. Therefore, other techniques should be used to calculate an
j=l error at the hidden layer, which will cause minimization of the output error, and this is the ultimate goal.
The training of the BPN is done in three stages - the feed-forward of rhe input training pattern, the
y =f(y;")
calculation and back-propagation of the error, and updation of weights. The tescin of the BPN involves the
Step 7: Calculate the error and update ilie weighcs. compuration of feed-forward phase onlx.,There can be more than one hi en ayer (more beneficial) bur one
hidden layer is sufhcienr. Even though the training is very slow, once the network is trained it can produce
1. If t = y, no weight updation is required. its outputs very rapidly.
2. If t f y and t = +1, update weights on Zj, where net input is closest to 0 (zero):
I 3.5.2 Architecture
bj(new) = bj(old) + a (1 - z;11j}
wij(new) = W;i(old) + a (1 - z;11j)x; A back-propagation neural network is a multilayer, feed~forv.rard neural network consisting of an input layer,
a hidden layer and an output layer. The neurons present in che hidden and output layers have biases, which
3. If t f y and t = -1, update weights on units Zk whose net input is positive: are rhe connections from the units whose activation is always 1. The bias terms also acts as weights. Figure 3-9
shows the architecture of a BPN, depicting only the direction of information Aow for the feed~forward phase.
w;k(new) = w;k(old) + a (-1 - z;, k) x;
1 During the b~R3=l)3tion phase of learnms., si nals are sent in the reverse direction
b,(new) = b,(old) +a (-1- z;,.,) The inputs sent to the BPN and the output obtained from the net could be e1ther binary (0, I) or
bipolar (-1, + 1). The activation function could be any function which increases monotonically and is also
Step 8: Test for the stopping condition. (If there is no weight change or weight reaches a satisFactory level, differentiable.
or if a specifted maximum number of iterations of weight updarion have been performed then
1 stop, or else continue). I
Madalines can be formed with the weights on the output unit set to perform some logic functions. If there
are only t\VO hidden units presenr, or if there are more than two hidden units, then rhe "majoriry vote rule"
function may be used. /
I'
1 3.5.1 Theory
The back~propagarion learning algorithm is one of the most important developments in neural net\vorks
(Bryson and Ho, 1969; Werbos, 1974; Lecun, 1985; Parker, 1985; Rumelhan, 1986). This network has re-
awakened the scientific and engineering community to the model in and rocessin of nu
phenomena usin ne networks. This learning algori m IS a lied !tilayer feed-forward ne_two_d~
con;rung o processing elemen~S with continuous renua e activation functions. e networks associated
with back-propagation learning algorithm are so e ac -propagation networ. (BPNs). For a given set
of training input-output pair, chis algorithm provides a procedure for changing the weights in a BPN to
classify the given input patterns correctly. The basic concept for this weight update algorithm is simply the
gradient-des em method as used in the case of sim le crce uon networks with differentiable units. This is a r(~.
method where the error is propagated ack to the hidden unit. he aim o t e neur networ IS w train the ''
net to achieve a balance between the net's ability to respond (memorization) and irs ability to give reason~e
I ~~ure3·9
l
Architecture of a back-propagation network.
responses to rhe inpm mar "simi,.,. bur not identi/to me one mar is used in ttaining (generalization).
66 Super.<ise_d Learni~g Network 3.5 Back·Propagalion Network 67
The flowchart for rhe training process using a BPN is shown in Figure 3-10. The terminologies used in the
flowchart and in the uaining algorithm are as follows:
x = input training vecro.r (XJ, ... , x;, ... , x11 )
t = target output vector (t), ... , t/r, ... , tm) -
a = learning rate parameter
x; :;::. input unit i. (Since rhe input layer uses identity activation function, the input and output signals © "
here are same.)
VOj = bias on jdi hidd~n unit
wok = bias on kch output unit FOr each No
~=hidden unirj. The net inpUt to Zj is training pair >-~----(B
x. t
"
Zinj = llOj +I: XjVij
i=l Yes
and rhe output is
Zj = f(zi"j) Receive Input signal x1 &
transmit to hidden unit
z;=f(Z;nj), ]=1top
and rhe output is i= 1\o n
y; = f(y,";)
Ok =. error correction weight adjusrmen~. for Wtk ~hat is due tO an error at output unit Yk• which is
back-propagared m the hidden uni[S thai feed into u~
Of = error correction weight adjustment for Vij that is due m the back-proEagation of error to the
hidden uni<zj- b>• '\f"-( L""'-'iJ ~-fe_,l.. ,,'-'.fJ Z-J' ...--
Also, ir should be noted that tOe commonly used acrivarion functions are l:imary sigmoidal and bipolar
sigmoidal activation functions (discussed in Section 2.3.3). These functions are used in the BPN because of Calculate output signal from
the following characteristics: (i) continui~; (ii) djffereorjahilit:ytlm) nQndeCreasing mon0£9.11Y· output layer,
p
The range of binary sigmoid is fio;Q to 1, and for bipolar sigmoid it is from -1 to+ 1. Yink =- Wok+ :E z,wik
"'
Yk = f(Yink), k =1 tom
The error back-propagation learning algorithm can be oudined in ilie following algorithm:
Figure 3·10
!Step 0: Initialize weights and learning rate (take some small random values).
Step 1: Perform Sreps 2-9 when stopping condition is false.
Step 2: Perform Steps 3-8 for~ traini~~r.
I
L
Supervised learning Network
3.5 Back·Propagation Network 69
68
_, - ------------._
lf:edjorward p~as' (Phas:fJ_I
A
Zfnf' =
-
v;j + LX
"
ill;;
I
-v
Y. '..,
= l to n}.
,I
'rJ
i=l
Calculate output of the hidden uilit by applying its activation functions over Zinj (binary or bipolar
Find weight & bias correction term
ll.Wjk. = aO,zj> l\W01c = ~J"II
Zj = /(z;,j)
and send the output signal from the hidden unit to the input of output layer units.
Step 5: For each output unity,~o (k = I to m),_ca.lcuhue the net input: ,I
,\--. o\•\
(between hidden and input)
m
~nJ=f}kWjk ' I
p
~ = 0,,1f'(z1,p
Yink = Wok + L ZjWjk
j~l
I
Compute change in weights & bias based
on bj.l!.vii= aqx;. ll.v01 = aq
Yk = f(y;,,)
"'
8inj= z=okwpr
k=l
The term 8inj gets multiplied wirh ilie derivative of j(Zinj) to calculate the error tetm:
8j=8;11jj'(z;nj)
The derivative /'(z;71j) can be calculated as C!TS:cllssed in Section 2.3.3 depending on whether
binary or bipolar sigmoidal function is used. On the basis of the calculated 8j, update rhe change
in weights and bias:
I
-I
'
70 Supervised Learning Network 3.5 Back-Propagation Network 71 :IIf
. Wlighr and bias upddtion (PhaJ~ Ill): I from the beginning itself and the system may be smck at a local minima or at a very flat plateau at the starting
•
point itself. One method of choosing the weigh~ is choosing it in the range
Step 8: Each output unit (yk, k = 1 tom) updates the bias and weights:
I
Wjk(new) = Wjk(old)+6.wjk I -3' 3 J.
[ .fO;' _;a,'
= WQk(oJd)+L'.WQk '
WOk(new)
i
Each hidden unit (z;,j = 1 top) updates its bias and weights: I
Vij(new) = Vij(o!d)+6.vij
'<y(new) = VOj(old)+t.voj
Step 9: Check for the sropping condition. The stopping condition may be cenain number of epochs
1 reached or when ilie actual omput equals the t<Uget output. 1 V,j'(new) =y Vij(old)
llvj(old)ll
The above algorithm uses the incremental approach for updarion of weights, i.e., the weights are being
where Vj is the average weight calculated for all values of i, and the scale factory= 0.7(P) 11n ("n" is the
changed immediately after a training pattern is presented. There is another way of training called batch-mode
number of input neurons and "P" is the nwnber of hidden neurons).
training, where the weights are changed only after all the training patterns are presented. The effectiveness of
rwo approaches depends on the problem, but batch-mode training requires additional local storage for each
3.5.5.2 Learning Rate a
connection to maintain the immediate weight changes. When a BPN is used as a classifier, it is equivalent to
the optimal Bayesian discriminant function for asymptOtically large sets of statistically independent training The learning rate (a) affects the convergence of the BPN. A larger value of a may speed up the convergence
but might result in overshooting, while a smaller value of a has vice-versa effecr. The range of a from 10- 3
pauerns.
The problem in this case is whether the back-propagation learning algorithm can always converge and find to 10 has been used successfulfy for several back-propagation algorithmic experiments. Thus, a large learning I
proper weights for network even after enough learning. It will converge since it implements a gradient-descent rate leads to rapid learning bm there is oscillation of wei_g!lts, while the lower learning rare leads to slower
on the error surface in the weight space, and this will roll down the error surface to the nearest minimum error learning. -
and will stop. This becomes true only when the relation existing between rhe input and the output training
patterns is deterministic and rhe error surface is deterministic. This is nm the case in real world because the 3.5.5.3 Momentum Factor
produced square-error surfaces are always at random. This is the stochastic nature of the back-propagation The gradient descent is very slow if the learning rare a is small and oscillates widely if a is roo large. One
algorithm, which is purely based on the srochastic gradient-descent method. The BPN is a special case of very efficient and commonly used method that altows a larger learning rate without oscillations is by adding
stochastic approximation. a momentum factor ro rhc;_.!,LQ!DlaLgradient-descen_t __m~_r]l_Qq., _
If rhe BPN algorithm converges at all, then it may get smck with local minima and may be unable to The-iil"Omemum E'cror IS denoted by 1] E [0, i] and the value of 0.9 is often used for the momentum
find satisfactory solutions. The randomness of the algorithm helps it to get out of local minima. The error factor. Also, this approach is more useful when some training data are ve rem from the ma·oriry
functions may have large number of global minima because of permutations of weights that keep the network of clara. A momentum factor can be used with either p uern y pattern up atillg or batch-"iiii e up a -
input-output function unchanged. This"6.uses the error surfaces to have numerous troughs. ing.-I'iicase of batch mode, it has the effect of complete averagirig over rhe patterns. Even though the
averaging is only partial in the panern-by-pattern mode, it leaves some useful i-nformation for weight
updation.
3.5.5 Learning Factors _of Back-Propagation Network
The weight updation formulas used here are
The training of a BPN is based on the choice of various parameters. Also, the convergence of the BPN is
Wjk(t+ I)= Wji(t) + ao,Zj+ry [Wjk(t)- Wjk(t- I)]
based on some important learning factors such as rhe initial weights, the learning rare, the updation rule,
the size and nature of the training set, and the architecture (number of layers and number of neurons per ll.•uj~(r+ 1)
layer).
and
3.5.5.1 Initial Weights
Vij(t+ 1) = Vij(t) + a8jXi+1J{Vij(t)- Vij(t- l)]
The ultimate solution may be affected by the initial weights of a multilayer feed-forward nerwork. They are
ll.v;j(r+ l)
initialized at small random values. The choice of r wei t determines how fast the network converges. I
The initial weights cannm be very high because t q~g-~oidal acriva · ed here may get samrated I The momenlum factor also helps in fas"r convergence.
L
'.
72 Supervised Learning Network 3.6 Radiat Basis Function Network 73
3.5.5.4 Generalization Step 4: Now c?mpure the output of the output layer unit. Fork= I tom,
The best network for generalization is BPN. A network is said robe generalized when it sensibly imerpolates p
with input networks thai: are new to the nerwork. When there are many trainable parameters for the given link =:WOk + L ZjWjk
amount of training dam, the network learns well bm does not generalize well. This is usually called overfitting ·. ·j=l
or overtraining. One solurion to this problem is to moniror the error on the rest sec and terminate the training
when che error increases. With small number of trainable parameters, ~e network fails to learn the training Jk = f(yj,,)
_r!-'' ~.,_,r; data and performs very poorly. on the .test data. For improving rhe abi\icy of the network ro generalize from Use sigmoidal activation functions for calculating the output.
.-.!( ~o_ a training data set w a rest clara set, ir is desirable to make small changes in rhe iripur space of a panern,
}{i 1
.,'e,) without changing the output components. This is achieved by introducing variations in the in pur space of
-0
c..!( '!f.!' training panerns as pan of the training set. However, computationally, this method is very expensive. Also,
,-. ,:'\ j a net With large number of nodes is capable of membfizing the training set at the cost of generali:zation ...As a I 3.6 Radial Basis Function Network
?\ Ji result, smaller nets are preferred than larger ones.
r I 3.6.1 Theory
3.5.5.5 Number of Training Data
The radial basis function (RBF) is a classification and functional approximation neural network developed
The training clara should be sufficient and proper. There exisrs a rule of thumb, which states !!!:r rhe training
by M.J.D. Powell. The newark uses the most common nonlineariries such as sigmoidal and Gaussian kernel
dat:uhould cover the entire expected input space, and while training, training-vector pairs should be selected
functions. The Gaussian functions are also used in regularization networks. The response of such a function is
randomly from the set. Assume that theffiput space as being linearly separable into "L" disjoint regions
positive for all values ofy; rhe response decreases to 0 as lyl _. 0. The Gaussian function is generally defined as
with their boundaries being part of hyper planes. Let "T" be the lower bound on the ~umber~ of training
pens. Then, choosing T suE!!_ that TIL ») will allow the network w discriminate pauern classes using f(y) = ,-1
fine piecewise hyperplane parririomng. Also in some cases, scaling.ornot;!:flalization has to be done to help
learning. __ ,•' ··: }) \ .. The derivative of this function is given by
3.5.5.6 Number of Hidden Layer Nodes .•. A/77 _/ ['(yl = -zy,-r' = -2yf(yl
If there exists more than one hidden layer in a BPN, rhe~~ICufarions
performed for a single layer are The graphical represemarion of this Gaussian Function is shown in Figure 3-11 below.
repeated for all the layers and are summed up at rhe end. In case of"all mufnlayer feed-forward networks, When rhe Gaussian potemial functions are being used, each node is found to produce an idemical outpm
rhe size of a h1dden layer i'f"VeTy important. The number of hidden units required for an application needs for inputs existing wirhin the fixed radial disrance from rhe center of the kernel, they are found m be radically
to be determined separately. The size of a hidden lay~_:___is usually determi_~Q~~p_qim~~- For a network symmerric, and hence the name radial basis function network. The emire network forms a linear combination
of a reasonable size,~ SIZe of hidden nod -- araariVel}r~mall fraction of the inpllrl~For of the nonlinear basis function.
example, if the network does not converge to a solution, it may need mor hidduJ lmdes:-i3~and,
Step 0: Initialize the weights. The weights are taken from the training algorithm.
Step 1: Perform Steps 2-4 for each input vector.
Step 2: Set the activation of input unit for x; (i = I ro n).
Step 3: Calculate the net input to hidden unit x and irs output-. For j = 1 ro p,
"
Zinj = VOj + L XiVij ~----~~--r---L-~--~r-----~Y
i:=l -2 -1 0 2
x,
X,
For "'- No
each >--
x,
Input Hidden Output
layer layer (RBF) layer
The flowchart for rhe training process of the RBF is shown in Figure 3-13 below. In this case, the cemer of
the RBF functions has to be chosen and hence, based on all parameters, the output of network is calculated.
The training algorithm describes in derail ali rhe calculations involved in the training process depicted in rhe
flowchart. The training is starred in the hidden layer with an unsupervised learning algorithm. The training is
continued in the output layer with a supervised learning algorithm. Simultaneously, we can apply supervised
learning algorithm to ilie hidden and output layers for fme-runing of the network. The training algorithm is
If no
given as follows. 'epochs (or)
no
I Ste~ 0: Set the weights to small random values. No weight
hange
Step 1: Perform Steps 2-8 when the stopping condition is false.
Step 2: Perform Steps 3-7 for each input. Yes f+------------'
Step 3: Each input unir .(x; for all i ::= 1 ron) receives inpm signals and transmits to rhe next hidden layer
unit.
Figure 3-13 Flowchart for the training process ofRBF.
76 Supervised Learning Network
3.8 Functional Link Networks 77
·Step 4: Calculate the radial basis function.
Step 5: Select the cemers for che radial basis function. The cenrers are selected from rhe set of input
vea:ors. It should be ·noted that a sufficient number of centen; have m be selected to ensure
X( I)
Delay line
l
adequate sampli~g of the input vecmr space. X( I) !<(1-D X( I-n)
Step 6: Calculate the output from the hidden layer unit:
-
Multllayar perceptron
r
t,rxji- Xji)']
v;(x;) =
exp [-
J-
a2
T
0(1)
'
where Xj; is the center of the RBF unit for input variables; a; the width of ith RBF unit; xp rhe Figure 3·14 Time delay neural network (FIR fiher).
jth variable of input panern.
Step 7: Calculate the output of the neural network:
i=l
where k is the number of hidden layer nodes (RBF funcrion);y,m the output value of mrh node in Multilayer perceptron z-1
output layer for the nth incoming panern; Wim rhe weight between irh RBF unit and mrh ourpur
node; wo the biasing term at nrh output node.
Step 8: Calculate the error and test for the stopping condition. The stopping condition may be number 0(1)
of epochs or ro a certain ex:renr weight change.
Figure 3·15 TDNN wirh ompur feedback (IIR filter).
Yes No
obtained by a multilayer network at a panicular decision node is used in the following way:
Figure 3·16 Functional line nerwork model.
x directed to left child node tL, if y < 0
x directed to right child node tR, if y ::: 0
x, 'x, The algorithm for a TNN consists of two phases:
~
/
1. Tree growing phase: In this phase, a large rree is grown by recursively fmding the rules for splitting until
x, x, 0 all the terminal nodes have pure or nearly pure class membership, else it cannot split further.
y y
2. Tree pnming phase: Here a smaller tree is being selected from the pruned subtree to avoid the overfilling
1 of data.
The training ofTNN involves [\VO nested optimization problems. In the inner optimization problem, the
~G BPN algorithm can be used to train the network for a given pair of classes. On the other hand, in omer
Figure 3·17 The XOR problem.
optimization problem, a heuristic search method is used to find a good pair of classes. The TNN when rested
on a character recognition problem decreases the error rare and size of rhe uee relative to that of the smndard
classifiCation tree design methods. The TNN can be implemented for waveform recognition problem. It
Thus, ir can be easily seen rhar rhe functional link nerwork in Figure 3~ 17 is used for solving this problem. obtains comparable error rates and the training here is faster than the large BPN for the same application.
The li.Jncriona.llink network consists of only one layer, therefore, ir can be uained using delta learning rule Also, TNN provides a structured approach to neural network classifier design problems.
instead of rhe generalized delta learning rule used in BPN. As, a result, rhe learning speed of the fUnc6onal
link network is faster rhan that of the BPN.
I 3.10 Wavelet Neural Networks
I 3.9 Tree Neural Networks The wavelet neural network (WNN) is based on the wavelet transform theory. This nwvork helps in
approximating arbitrary nonlinear functions. The powerful tool for function approximation is wavelet
The uee neural networks (TNNs) are used for rhe pattern recognition problem. The main concept of this decomposition.
network is m use a small multilayer neural nerwork ar each decision-making node of a binary classification Letj(x) be a piecewise cominuous function. This function can be decomposed into a family of functions,
tree for extracting the non-linear features. TNNs compbely extract rhe power of tree classifiers for using which is obtained by dilating and translating a single wavelet function¢: !(' --')- R as
appropriate local fearures at the rlilterent levels and nodes of the tree. A binary classification tree is shown in
Figure 3-18.
The decision nodes are present as circular nodes and the terminal nodes are present as square nodes. The
j(x) = L' w;det [D) 12] ¢ [D;(x- 1;)]
i::d
terminal node has class label denoted 'by Cassociated with it. The rule base is formed in the decision node
(splitting rule in the form off(x) < 0 ). The rule determines whether the panern moves to the right or to the where D,. is the diag(d,·), d,. EJ?t
ate dilation vectors; Di and t; are the translational vectors; det [ ] is the
left. Here,f(x) indicates the associated feature ofparcern and"(}" is the threshold. The pattern will be given determinant operator. The w:..velet function¢ selecred should satisfy some properties. For selecting¢: If' --')o
the sJass label of the terminal node on which it has landed. The classification here is based on the fact iliat R, the condition may be
the appropriate features can be selected ar different nodes and levels in the tree. The output feature y = j(x)
,P(x) =¢1 (XJ) .. t/J1 (X 11 ) forx:::: (x, X?.· . . , X11 )
L_i.._
..~~'"·
80 Supervised Learning Network 3.12 Solved Problems 81
ro form a Madaline network. These networks are trained using delta learning rule. Back-propagation network
-r is the most commonly used network in the real time applications. The error is back-propagated here and is
fine runed for achieving better performance. The basic difference between the back-propagation network and
~)-~-{~Q-{~}-~~-~ 7 radial basis function network is the activation funct'ion. use;d. The radial basis function network mostly uses
Gaussian activation funcr.ion. Apart from these nerWor~; some special supervised learning networks such as
:_,, : \] : : ~ time delay neural ne[Wotks, functional link networks, tree neural networks and wavelet neural networks have
also been discussed.
0----{~J--[~]-----{~-BJ------0-r
:· I I
K
Input( X
3.12 Solved Problems
Output
I. I!Jlplement AND function using perceptron net- Calculate the net input
~ //works for bipol~nd targets.
&-c~J-{~~J-G-cd
y;, = b+xtWJ +X2W2
Solution: Table 1···shows the truth table for AND
function with bipolar inputs and targelS:
=O+Ix0+1x0=0
Figure 3·19 Wavelet neural network. The output y is computed by applying activations
Table 1
over the net input calculated:
I {
X]
where "'I I ify;,> 0 · - .
-I -I y = f(;y;,) = 0 if y;, = 0
¢, (x) = -xexp ( -~J -I I -I -1 ify;71 <0
-I -I -I . - ··-· . .- -- .--_-==-...
Here we have rake~-1) = O.)Hence, when,y;11 = 0,
is called scalar wavelet. The network structure can be formed based on rhe wavelet decomposirion as y= 0. ---···
The perceptron network, which uses perceptron
" learning rule, is used to train the AND function. Check whether t = y. Here, t = 1 andy = 0, so
y(x) = L w;¢ [D;(x- <;)] +y The network architecture is as shown in Figure l. t f::. y, hence weight updation takes place:
i=l
The input patterns are presemed to the network one
w;(new) = zv;(old) + ct.t:x;
where J helps to deal with nonzero mean functions on finite domains. For proper dilation, a rotation can be by one. When all the four input patterns are pre-
made for bener network operation: sented, then one epoch is said to be completed. The WJ(new) = WJ(oJd}+ CUXJ =0+] X I X l = 1
initial weights and threshold are set to zero, i.e., W2(ncw) = W2(old) + atx:z = 0 + 1 x l x 1 = I
WJ = WJ. = h = 0 and IJ = 0. The learning rate
y(x) = L" w;¢ [D;R;(x- <;)] + y a is set equal to 1.
b(ncw) = h(old) + at= 0 + 1 x I = l
i=l
Here, the change in weights are
x,~
where R; are the rotation marrices. The network which performs according to rhe above equation is called
Ll.w! = ~Yt:q;
wavelet neural network. This is a combination of translation, rotarian and dilation; and if a wavelet is lying on
the same line, then it is called wavekm in comparison to the neurons in neural networks. The wavelet neural Ll.W2 = atxz;
network is shown in Figure 3-19. b..b = at
X, ~ y y
Table2
Weights 0----z The final weights at the end of third epoch are
w, =2,W]_ = l,b= -1
Input
Target Net input
Calculated
output
Weight changes
W) W]. b x, X w,~y y
Fu-rther epochs have to be done for the convergence
~
X) X]. (t) (y,,) (y) ~WI f:j.W'l M (0 0 0) of'the network.
· 3. _Bnd-the weights using percepuon network for
EPOCH-I
I 0 0 I /AND NOT function when all the inpms are pre-
I
-I -1 -I -I 0 2 0 sented only one time. Use bipolar inputS and
Figure 3 Perceptron network for OR function.
-I -I 2 +I -1 -I I -I ' targets.
0 0 0 1 -1 -I The perceptron network, which uses perceptron Solution: The truth table for ANDNOT function is
-1 -1 -I -3 -I
learning rule, is used to train the OR function. shown in Table 5.
EPOCH-2
0 0 0 -I The network architecture is shown in Figure 3. TableS
I I
0 0 0 -I The initial values of the weights and bias are taken
I -1 -I -1 -I t
-1 as zero, i.e., Xj "'-
-I -I -I -I 0 0 0
-I
I I -I
-I -1 -3 -I 0 0 0
WJ=W]_:::::b:::::O 1 -I I
-I I -1
target and calculated ourput converge for all the ~ Also the learning rate is 1 and threshold is 0.2. So, -I -I -I
patterns. the aaivation function becomes
The final weights and bias after second epoch are The network architecture of AND NOT function is
/'~-..._.};.- . .~:- 1 if y;/1> 0.2 ~ shown as in Figure 4. Let the initial weights be zero
=l,W'l=l, b=-1 (-1, 1)
,_~-- ~Yin ~ 0.2
W[
0 . _,. \.. . .. , [(yin) ;::: { O if - 0.2 and ct = l,fJ = 0. For the first input sample, we
compme the net input as
Since the threshold for the problem is zero, the
equation of the separating line is '
-x,
l~
~
,x,. . J/
";?").. The network is trained as per the perceptron training "
·. ~i algorithm and the steps are as in problem 1 (given for Yin= b+ Lx;w; = h+x1w 1 +xzlil2
w, b
X2 = - - X i - -
:.. /1--'
first pattern}. Table 4 gives the network rraining for i=-1
(-1,-1) (1,-1) ~=-X,+1 '/
Here
'"' "" 3 epochs. =O+IxO+IxO=O
Table4
W[X! + lli2X2 + b > $ Weights
W]X] + UlzX2 + b> Q -X,
Input Calculated Weight changes
Figure 2 Decision boundary for AND function
Target Net input output w, W2 b
Thus, using the final weights we obtain Xi X2 (t) {y;,,) (y) ~W) ~., ~b (0 0 0)
in perceptron training{$= 0).
I (-1) EPOCH-I
X2 = -}x' - -~- ~'mplemenr OR function with binary inputs and I 0 0 I I I
0 2 0 0 0
lil~J
L _ -xt+l · bipolar targw using perceptron training algo-
rithm upto 3 epochs.
0 2 0 0 0 I I 0
h can be easily found that the above straight line 0 0 -I 0 0 -I I I 0
Solution: The uuth table for OR function with EPOCH-2
separates the positive response and negative response
binary inputs and bipolar targets is shown in Table 3. 2 0 0 0 I I 0
region, as shown in Figure 2.
I 0 0 0 0 I I 0
The same methodology can be applied for imple- Table 3 0 I I 0 0 0 I I 0
menting other logic functions such as OR, AND- t 0 0 -I 0 0 0 0 0 I I -I
NOT, NAND, etc. If there exists a threshold value
Xj
"'-
EPOCH-3
f) ::j:. 0, then two separating lines have to be obtained, I
I I I 0 0 0 I I -I
i.e., one to se-parate positive response from zero 0 I 0 0 0 I 0 I 2 I 0
and the other for separating zero from the negative 0 I 0 I I I I 0 0 0 2 I 0
0 0 -I 0 0 -I 0 0 0 0 -I 2 I -I
response.
"'
C:J
Supervised Learning Network
84 3.12 Solved Problema 85
For the third input sample, XI = -1, X2 = 1,
0----z t = -1, the net input is calculated as,
4. Pind the weights required to perform the follow- Table.7
w,~y
/ ing classification using percepuon network. The Input
(/ vectors (1,), 1, 1) and (-1, 1 -1, -1) are belong-
x, x, _..,;¥'
y '
]in= b+ Lx;w;= b+XJWJ +X2WJ. ing to the class (so have rarger value 1), vectors X2 b Targ.t (t)
w, i=l (1, 1, 1, -1) and (1, -1, -1, 1) are not belong-
'J
··) 1 "'1 "'
=0+-1 X O+ 1 X -2=0+0-2= -2 ing to the class (so have target value -1). Assume -1 1 -1 -1
X,
X, learning rate as 1 and initial weights as 0.
-1 1 -1
Figure 4 Network for AND NOT function. The output is oblained as y = fi.J;n) -1. Since = Solution: The truth table for lhe given vectors is given -1 -1 1 1 -1
t = y, no weight changes. Thus, even after presenting
in Table_?.·-· -·---.. ><
Applying the activation function over the net input, clJe third input sample, the weights are
Le~·Wt = ~~.l/l3. = W< "' b ,;;-p and the
we obtain lear7cng ratec; = 1. Since the thresWtl = 0.2, so Thus,ln the third epoch, all the calculated outputs
w=[O -2 0]
become equal to targets and the necwork has con-
=I ~
ify;,. > 0 the.' ctivation function is
y=f(y,,) if-O~y;11 ::S:0 For the fourth input sample, x1 = -1, X2 = -1, verged. The network convergence can also be checked
y., { ~
if ]in> 0.2
l-1 ify;,. < -0
t = -1, the net input is calculated as
'
if -0.2 :S Yin :S 0.1
by forming separating line equations for separating
positive response regions from zero and zero from
negative response region.
Hence, the output y = f (y;,.) = 0. Since t ::/= y, U.e -1 if Yin< -0.2
]in= b+ Lx;w; = b+x1w1 +X21112 The network architecture is shown in Figure 5.
new weights are computed as
i=l The net input is given by
WJ (new) = W] (o\d) + (UX] = 0 + 1 X -} X 1 = -} =0+-lxO+(-lx-2)
5. Classify the two-dimensiona1 input pattern shown
]in= b+x1w1 +xzWJ. +X3W3 _/ in Figure 6 using perceptron network. The sym~
U12(new) = W2.(old) + cttx2_ = 0 + 1 x -1 x l = -1 =0+0+2=2 bol "*" indicates the da[a representation to be +1
+x4w4
b(new) = b(old)+ at= 0 + 1 x -1 = -1 and "•" indicates data robe -1. The patterns are
The output is obtained as y = f (y;n) = 1. Since The training is performed and the weights are tabu- I-F. For panern I, the targer is+ 1, and for F, the
The weights after presenting the first sample are t f. y, the new weights on updating are given as lated in Table 8. target is -1.
w=[-1-1-1] WJ (new) = WJ (old)+ £UXj = 0+ l X -I X -I = 1
Tables
For the seconci inpur sample, we calculate the net IU2(new) = Ul!(old) + ct!X'z = -2 +I x -1 x -1 =-I
inpur as Weights
b(ncw) = b{old) +at= O+ 1 X -1 = -1 Inputs Target Net input Output Weight changes (w, w, w, w4 b)
' (x, X4 b) (t) (Y;,) (y) (.6.w1 /J.llJ2 .6.w3 IJ.w4 !:J.b) (0 0 0 0 0)
Yin= b + L:x;w; = b +x1w1 +X2W.Z X2
i:= I
The weights after presenting foun:h input sample are
w= [1 -1 -1]
EPOCH-! "'
=-l+lx-1+(-lx-1) ( 1 1 1 1 1) 1 0 0 1 1 1 l 1 1 1 1 1 1
(-1 1 -1 -1 1) 1 -1 -1 -1 1 -1 -1 1 0 2 0 0 2
One epoch of training for AND NOT function using
=-1-1+1=-1 ( 1 1 l -1 1) -1 4 I -1 -1 -I 1 -1 -1 1 -1 1
perceptron network is tabulated in Table 6.
( 1 -1 -1 1 1) -1 1 1 -1 1 1 -1 -1 -2 2 0 0 0
The output y = f(y;") is obtained by applying
Table& EPOCH-2
activation function, hence y = -1.
( 1 1 1 1 1) 1 0 0 1 1 1 1 1 -1 3 1
Since t i= y, the new weights are calculated as Weights
Calculated (-1 1 -1 -1 1) 1 3 1 0 0 0 0 0 -1 3 1
Input
Wj{new) = WJ(oJd) + CUXJ = -l + 1 X I X J = 0 _ _ _ Target Net input output WJ "'2 b ( 1 1 1 -1 1) -1 4 1 -1 -1 -1 1 -1 -2 2 0 2 0
(y) (0 0 0)
XI X:Z 1 (t) (y;,)
I
( 1 -1 -1 1 1) -1 -2 -1 0 0 0 0 0 -2 2 0 2 0
Ul2(new) = Ul2(old) + CtD:l = -1 + 1 x l x-I= -2
1 1 -1 0 0 -1 -1 -1 EPOCH-3
b(new) = b{old) +at= -1 + l xI =0
0 -2 0 ( 1 1 1 1 1) 1 2 1 0 0 0 0 0 -2 2 0 2 0
1 -1 1 1 -1 -1
The weights after presenting the second sample are -1 1 1 -1 -2 -1 0 -2 0 I (-1
( 1 1 1
1 -1 -1
-1
l)
1) -1
1 2
-2 -1
1 0
0
0
0
0
0
0
0
0
0
-2 2
-2 2
0
0
2 0
2 0
-1 -1 1 -1 2 1 1 -1 -l
l
w= [0 -2 0] !__1_ -1 -1 1 1) -1 -2 -1 0 0 0 0 0 -2 2 0 2 0
I
86
= b +x1w1 + XZW2 +X3w3 +X4W4 +xsws
+XGW6 + X7WJ + xawa + X9W9
Supervised learning Network
1
3.12 Solved Problems
w;(new) = w;(old)+ O:IXS = 1 + 1 x -1 x 1 = 0 lnitiaJly all the weights and links are assumed to be
W6(new) == WG(oJd) + 0:0:6 = -1 + 1 X -1 X 1 = -2 small raridom values, say 0.1, and the learning rare is
87
II
also set to 0.1. Also here the least mean square error
=0+1 x0+1 x0+1 x 0+(-1) xO W?{new) = W?(old) + atx'] =I+ 1 x -1 x 1 = 0
+1xO+~Dx0+1x0+1x0+1xO wg(new) = ws(old)+ o:txs = 1 + 1 x -1 x -1 = 2
· miy Qe set. The weights are calculated until the least
m~ square error is obtained.
I
Yin= 0 fU9(new) == fV9(old) + etfX9 = 1 + 1 x -1 x -1 "== 2 The initial weighlS are taken to be WJ = W2 =
b[new) = b(old) +or= I+ 1 x -1 = 0 b = 0.1 and rhe learning rate ct = 0.1. For the first
Therefore, by applying the activation function the
input sample, XJ = 1, X2 = 1, t = 1, we calculate the
output is given by y = ff.J;n) = 0. Now since t '# y, The weighlS afrer presenting rhe second input sam~ net input as
the new weights are computed as pie are ~
I~
+ X6W6 + X7lll] + XflWB +X<) IV<) Figure 7 Network architecture.
.6.wz = a(t- y;,)X2
The initial weights are all assumed to be zero, i.e., =1+ l X 1+ 1 X l+ l X 1+ 1 X -1 + 1 X 1 lmplemenr OR function with bipolar inputs and
e = 0 and a = 1. The activation function is given by
+1x-1+1x1+(-1)x 1+(-1)x1 targelS using Adaline network.
t.b = o(t- y;,)
. -1 ifyrn < -0 I Therefore the output is given by y = f (y;u) = l. E = (r- y;,) 2 = (0.7) 2 = 0.49
I Since t f= y, rhe new weights are Table 10
For the first input sample, Xj = [l 1 L ~ I--1 -1 1 1 t The final weights after presenting ftrsr inpur sam·
1 1], t = l, the net input is calculated as
w,(new) == + o:oq == l + 1 x -1
WJ(old) X\== 0 Xj X:z
- pie are
fV2(new) == fV2(old) + O:tx]. = 1 + 1 X -1 Xl=0 1
-1 w= [0.17 0.17 0.17]
w3(new) = w3(old)+ O:b:J =I+\ X -1 X1= 0
y;, = b + Lx;w; -1
i=l
w~(new) = wq(old) + CtP:4 =-I+ 1 x -1 x t = -2 -1 -1 -1 and errorE= 0.49.
11
II
88 Supervised learning Network 3.12 Solved Problems 89
These calculations are performed for all the input Table 12 7. UseAdaline nerwork to train AND NOT funaion w,(new) = w,(old) + a(t- y,,)x:z
'
samples and the error is caku1ared. One epoch is
completed when all the input patterns are presented.
Summing up all the errors obtained for each input
Epoch
Epoch I
Total mean square error
3.02
with bipolar inputs and targets. Perform 2 epochs
of training.
= 0.2+ 0.2 X (-1.6) X I= -0.12
b(new) = b(old) + a(t- y;,) !
Epoch 2 1.938 Solution: The truth table for ANDNOT function = 0.2+ 0.2 (-1.6) = -0.12
sample during one epoch will give the mtal mean X '!:
Epoch 3 1.5506 with bipolar inputs and targets is shown in Table 13.
square error of that epoch. The network training is
Epoch 4 1.417 Table 13 Now we compute the error,
continued until this error is minimized to a very small
Epoch 5 1.377
value.
Adopting the method above, the network training E= (t- y;,) 2 = (-1.6) 2 = 2.56
~-
is done for OR function using Adaline network and
is tabulated below in Table 11 for a = 0.1. The final weights after presenting first input sample
-".~
The total mean square error aft:er each epoch is a<e w = [-0.12- 0.12- 0.12] and errorE= 2.56.
The operational steps are carried for 2 epochs
given as in Table 12. ,1 @ w1 == 0.4893 f::\_ ~
of training and network performance is noted. It is
Thus from Table 12, it can be noticed that as
training goes on, the error value gets minimized.
~~1'~Y Initially the weights and bias have assumed a random
tabulated as shown in Table 14.
Hence, further training can be continued for fur~ - value say 0.2. The learning rate is also set m 0.2. The
weights are calculated until the least mean square error
The total mean square error at the end of two
epochs is summation of the errors of all input samples
t:her minimization of error. The network archirecrure ~ is obtained. The initial weights are WJ = W1. b = =
of Adaline network for OR function is shown in as shown in Table 15.
0.2, and a= 0.2. For the fim input samplex1 = 1,
Figure 8. Figure 8 Network architecture of Adaline.
.::q = l, & = -1, we calculate the net input as Table15
Yin= b + XtWJ + X2lli2
)
Table 11
= 0.2+ I X 0.2+ I X 0.2= 0.6
Epoch Total mean square error
ll
Weights
Epoch I 5.71 :~
Net Epoch 2 2.43 ·'
Inputs T: Weight changes Now compute (t- Yin} = (-1- 0.6) = -1.6.
- - a<get input Wt b Enor
X] x:z I t Yin (r- Y;,l) i>wt
"'"" i>b (0.1 ""
0.1 0.1) (t- Y;,? Updacing ilie weights we obtain
Hence from Table 15, it is clearly undersrood rhat the .,
EPOCH-I w,-(new) = w,-(old) + o:(t- y,n)x; mean square error decreases as training progresses.
I I I I 0.3 0.7 0,07 0,07 om 0.17 0.17 0.17 0.49
Also, it can be noted rhat at the end of the sixth
'
\;
I -1 I I 0.17 0.83 0.083 -0.083 0.083 0.253 0.087 0.253 0.69 The new weights are obtained as
-I I I I 0.087 0.913 -0.0913 0,0913 0,0913 0.1617 0.1783 0.3443 0.83 epoch, rhe error becomes approximately equal to l.
-1 -1 1 -I 0.0043 -1.0043 0.1004 0.1004 -0.1004 0.2621 0.2787 0.2439 1.01 WI (new) ::::: w, (old) + ct(t- Jj )x,
11 The network architecture for ANDNOT function
EPOCH.2 = 0.2 + 0.2 X (-1.6) X I= -0.12 using Adaline network is shown in Figure 9.
1 I 1 1 0.7847 0.2153 0.0215 0.0215 0.0215 0.2837 0.3003 0.2654 0.046
I -1 1 I 0.2488 0.7512 0.7512 -0.0751 0.0751 0.3588 0.2251 0.3405 0.564 Table 14
-I I 1 I 0.2069 0.7931 -0.7931 0.0793 0.0793 0.2795 0.3044 0.4198 0.629
-1 -1 I -I Weights
-0.1641 -0.8359 0.0836 0.0836 -0.0836 0.3631 0.388 0.336 0.699 Ne<
Inputs Weight changes
EPOCH-3 _ _ Target input w, b Error
I I I I 1.0873 -0.0873 -0.087 -0.087 -0.087 0.3543 0.3793 0.3275 0.0076
t>w, M (0.2 ""
0.2 0.2) (t- Y;n)2
-I
I -1 I
I I
-1 -1 1 -1
I
I
0.3025 +0.6975
0.2827
0.0697 -0.0697 0.0697 0.4241 0.3096 0.3973
0.7173 -0.0717 0,0717 0,0717 0.3523 0.3813 0.469
-0.2647 -0.7353 0.0735 0.0735 -0.0735 0.4259 0.4548 0.3954
0.487
0.515
0.541
X[ X:Z
EPOCH-I
I t Y;" (t-y;rl)
"'""
I -I 0.6 -1.6 -0.32 -0.32 -0.32 -0.12 -0.12 -0.12 2.56
EPOCH-4
I I I I 0,076 -I I I -0.12 1.12 0.22 -0.22 0.22 0.10 -0.34 0.10 1.25
1.2761 -0.2761 -0.0276 -0.0276 -0.0276 0.3983 0.4272 0.3678
I -1 I I 0.3389 0.6611 0.0661 -0.0661 0.0661 0.4644 0.3611 0.4339 0.437 -I I I -I -0.34 -0.66 0.13 -0.13 -0.13 0.24 -0.48 -0.03 0.43
-I I 1 I 0.3307 0.6693 -0.0669 0.0669 0.0699 0.3974 0.428 0.5009 0.448 -1 -1 I -I 0.21 -1.2 0.24 0.24 -0.24 0.48 -0.23 -0.27 1.47
-1 -1 I -I -0.3246 -0.6754 0.0675 0.0675 -0.0675 0.465 0.4956 0.4333 0.456 EPOCH-2
EPOCH-5 -I -0.02 -0.98 -0.195 -0.195 -0.195 0.28 -0.43 -0.46 0.95
I I I I 1.3939 -0.3939 -0.0394 -0.0394 -0.0394 0.4256 0.4562 0.393 0.155
I -1 I I 0.25 0.76 0.15 -0.15 0.15 0.43 -0.58 -0.31 0.57
I -1 I I 0.3634 0.6366 0.0637 -0.0637 0.0637 0.4893 0.3925 0.457 0.405
-I I I I 0.3609 0.6391 -0.0639 0.0639 0.0639 0.4253 0.4654 0.5215 0.408 -I I I -I -1.33 0.33 -0.065 0.065 0.065 0.37 -0.51 -0.25 0.106
-1 -1 I -I -0.3603 -0.6397 0.064 0.064 -0.064 0.4893 0.5204 0.4575 0.409 -1 -1 I -I -0.11 -0.90 0.18 0.18 -0.18 0.55 -0.38 0.43 0.8
I
-~
3.1 '2 Solved Problems 91
90 Supervised learning Network
...
b.,o learning rate a equal to 0.5: =0.2+0.5(-1-0.55) X 1 =-0.575
"'22 (new)= "'22 (old)+ a(t- z;" 2)"2
x, x ') w1=o.ss
Calculate net input to the hidden units:
1 y =0.2+0.5(-1-0.45)x 1=-0.525'
Zinl = + XJ WlJ + X2U/2J
b1
y
>Nz"'_o.~ = 0.3 + 1 X 0.05 + 1 X 0.2 = 0.55
b2 (new]= b2 (old)+ a(t- z;d
x, x, Zin2 = /n. +X} WJ2 + xiW22 = 0.15+0.5(-1-0.45)=-0.575
= 0.15 + 1 X 0.1 + 1 X 0.2 = 0.45 All the weights and bias between the input layer and
Figure 9 Network architecrure for ANDNOT hidden layer are adjusted. This completes the train-
function using Adaline nerwork.. Calculate the output z 1,Z2 by applying the activa- ~::-1.08
ing for the first epoch. The same process is repeated
tions over the net input computed. The activation until the weight converges. It is found that the weight
8 Using Madaline network, implement XOR func- function is given by Figure 11 Madaline network for XOR function
tion with bipolar inputs and targets. Assume the converges at the end of 3 epochs. Table 17 shows the
(final weights given).
required parameters for training of the network. I ifz;,<:O training performance of Madaline network for XOR y
! (Zir~) = ( -1 ifz;11 <0 function.
Solution: The uaining pattern for XOR function is The network architecture for Madaline network
given in Table 16. Hence, with final weights for XOR function is shown in
Table 16 z1 = j(z;,,) = /(0.55) = I Figure 11.
z, = /(z;,,) = /(0.45) = 1 9._}Jsing back-propagation_ network, find the new
• After computing the output of the hidden units, / weights ~or the ~et shown in Figure 12. It is pre- .0.5
, semed wuh the mput pattern [0, 1] and the target 0.3,
then find the net input entering into the output
output is 1. Use a learning rare a = 0.25 and
unit:
binary sigmoidal activation function.
Yin= b3 +zJVJ +z2112
Solution: The new weights are calculated based
The Madaline Rule I (MRI) algorithm in which the = 0.5 + 1 X 0.5 + I X 0.5 = 1.5 on the training -algorithm in Section 3.5.4. The
-oj
weights between the hidden layer and ourpur layer Figure 12 Ne[Work.
remain fixed is used for uaining the nerwork. Initializ- • Apply the activation function over the net input initial weights are [v11 v11 vod = [0.6 -0.1 0.3],
ing the weights to small random values, the net\York Yin to calculate the output y.
Table 17
architecture is as shown in Figure 10, widt initial
y = f(;y;,) = /(1.5) = 1 Inputs Target
weights. From Figure 10, rhe initial weights and bias b, b2
X~ (t} wn
are [wu "'21 bd = [0.05 0.2 0.3], [wn "'22 b,] = Since t f:. y, weight updation has to be performed. Zinl Zinl ZJ Zl Y;11 Y "'21 W12
'""
[0.1 0.2 0.15] and [v 1 v, b3] = [0.5 0.5 0.5]. For fim Also since t = -1, the weights are updated on z1
EPOCH-I
and Zl that have positive net input. Since here both
I I 1 -1 0.55 0.45 I 1 1.5 1-0.725 -0.58 -0.475-0.625 -0.525 -0.575
1lbj=0.3 net inputs Zinl and Zinl are positive, updating the 1-1 I I -0.625 -0.675 -1-1 -0.5 -1 0.0875-1.39 0.34 -0.625 -0.525 -0.575
weights and bias on both hidden units, we obtain -I 1 1 I -1.1375 -0.475 -I -1 -0.5 -I 0.0875 -1.39 0.34 -1.3625 0.2125 0.1625
Wij(new) = Wij(old) + a(t- Zin)x; -1-1 1 -1 1.6375 1.3125 1 1 1.5 1 1.4065 -0.069 -0.98 -0.207 1.369 -0.994
bj(new) = bj(old) + a(t- z;"j) EPOCH-2
1 I I -1 0.3565 0.168 1 I 1.5 I 0.7285 -0.75 -1.66 -0.791 -0.207 -1.58
y
This implies: 1-1 I 1 -0.1845-3.154 -1-1-0.5-1 1.3205-1.34 -1.068-0.791 0.785 -1.58
-1 1 I 1 -3.728 -0.002 -1-1-0.5-1 1.3205 -1.34 -1.068- 1.29 0.785 -1.08
WI! (new)= WI! (old)+ a(t- ZinJ)XJ
-1-1 I -1 -1.0495-1.071 -1-1-0.5-1 1.3205 -1.34 -1.068-1.29 1.29 -1.08
=0.05+0.5(-1-0.55) X 1 = -0.725
EPOCH-3
WJ2(new) = WJ2(old) + a(t- Zin2)Xl 1.32 -1.34 -1.07 - 1.29 1.29 -1.08
1 1 1 -1 -1.0865-1.083 -1-1-0.5-1
'bz =0.15 =0.!+0.5(-1-0.45) X I =-0.625 -1.34 -1.07 -1.29 1.29 -1.08
1-1 I I 1.5915-3.655 1-1 0.5 I 1.32
b1(new)= b1(old)+a(t-z;"Il -I 1 I I -3.728 1.501 -1 1 0.5 1 1.32 -1.34 -1.07 -1.29 1.29 -1.08
Figure 10 Nerwork archicecrure ofMadaline for 1.29
=0.3+0.5( -I- 0.55) = -0.475 1-1 1 -1 -1.0495-1.701 -1-1-0.5-1 1.32 -1.34 -1.07 -1.29 -1.08
XOR funcr.ions .(initial weights given).
92 SupeJVised Learning Network
-I 3.12 Solved Problems 93
I Compute rhe final weights of the network:
[v12 vn "02l = [-0.3 0.40.5] and [w, w, wo] = [0.4 This implies
0.1 -0.2], and the learning' rate is a = 0.25. Acti- v11(new) = VIt(old)+b.vJI = 0.6 + 0 = 0.6
!, = (I - 0.5227) (0.2495) = 0.1191
vation function used is binary sigmoidal activation vn(new) = vn(old)+t.v12 = -0.3 + 0 = -0.3 .
function and is given by Find the change5~Ulweights be~een hidden and "21 (new) = "21 (oldl+<'>"21
output layer:.
I = -0.1 + 0.00295 = -0.09705
f(x) = I+ ,-• <'>wi = a!1 ZI = 0.25 X 0.1191 X 0.5498 vu(new) = vu(old)+t>vu
,-- 0.0164 ::>
= 0.4 + 0.0006125 = 0.4006125
Given the output sample [x 1, X2] = [0, 1] and target
t= 1, t.w, = a!1 Z2 = 0.25 X 0.1191 X 0.7109 w,(new) = w1(old)+t.w, = 0.4 + 0.0164,
Calculate the net input: For zt layer ---=o:o2iT7 = 0.4164
Figure 13 Network.
<'>wo = a! 1 = 0.25 x 0.1191 = 0.02978 w2(now) = w,(old)+<'>W2 = 0.1 + 0.02!17
Zinl = !lQJ + XJ + X2V21
V11
Compute the error portion 8j between input and = 0.!2!17 For z2layer
= 0.3+0 X 0.6+ I X -0.1 = 0.2 hidden layer (j = 1 to 2): VOl (new) = VOl (old)+<'>•OI = 0.3 + 0.00295
For z2 layer ~f'( = 0.30295
z;,2 = V02 + XJVJ2 + X2V22
Dj= O;,j Zinj)
= 0.5 + (-1) X -0.3 +I X 0.4 = l.2
vo2(new) = 1102(old)+.6.vo2
Zjril = VQ2 + Xj V!2 + X2.V1.2 '
O;,j= I:okwjk = 0.5 + 0.0006125 = 0.5006!25 Applying activation to calculate the output, we
= 0.5 + 0 X -0.3 +I X 0.4 = 0.9 k=!/
.,.(new)= .,.(old)+8wo = -0.2 + 0.02976 obtain
8;nj = 81 Wj! I·.' only one output neuron]
Applying activation co calculate Ute output, we 1_ 1 _ t'0.4
obrain ------
=>!;,I= !1 wn = 0.1191.K0ft = 0.04764
-~
= -0.!7022
Thus, the final weights hav~ been computed for the
t-"inl
ZI =f(z; 1l = - - - = - - = -0.!974
n 1 + t'-z:;nl 1 + /1.4
I
ZI = f(z;,,) = - - - = - - - = 0.5498
1 + e-z.o.1 1 + t-0.2
I =>O;,z = Ot Wzl = 0.1191
_,- X 0.1 = 0.01191
_-:~
network shown in Figure 12.
zz =/(z;,2) = -
1- t'-Z:,;JL
- - = - -1- 2 = 0.537
l - t'-1.2
Error, 81 =O;,,f'(Zirll). 1+t-Zin2 1 +e-.
I 1 19. Find rhe new weights, using back-propagation
z2 = f(z· 2l = - - - = - - - = 0.7109 j'(z;,I) = f(z;,,) [1- f(z;,,)] network for the network shown in Figure 13.
m 1 + e-Zilll 1 + e-0.9 Calculate lhe net input entering the output layer.
= 0.5498[1- 0.5498] = 0.2475 The network is presented with the input pat- For y layer
Calculate the net input entering the output layer. 0 1 =8;,1/'(z;,J) tern l-1, 1] and the target output is +1. Use a
For y layer
= 0.04764 X 0.2475 = 0.0118
learning rate of a = 0.25 and bipolar sigmoidal Yin= WO + ZJWJ +zzWz
activation function. = -0.2 + (-0.1974) X 0.4 + 0.537 X 0.1
Ji11 = WO+ZJWJ +z2wz Error, Oz =0;,a/'(z;,2) Sn_ly.tion: The initial weights are [vii VZI vod = [0.6 = -0.22526
= -0.2 + 0.5498 X 0.4 + 0.7109 X 0.1
j'(z;,) = f(z;d [1 - f(z;,2)] ·0.1 0.3], [v12 "22 vo2l = [ -0.3 0.4 0.5] and [w,
= 0.09101 Wz wo] = [0.4 0.1 -0.2], and die learning rme is Applying activations to calculate the output, we
= 0.7109[1 - 0.7!09] = 0.2055
Applying activations to calculate the output, we
a= 0.25. obtain
Oz =8;,zf' (z;,2) Activation function used is binary sigmoidal 1 1 0.22526
obtain
= 0.01191 X 0.2055 = 0.00245 activacion function and is given by 1 - t'- '" _-_--",=< -0.1!22
1 1
y = f(y;,) = l + t'-y,.. = 1 + 11-22526
Y = f{y;n) = ~ = 1 + e-0.09101 = 0.5227 Now find rhe changes in weights between input 2 1 -e-x
and hidden layer: f (x )----1---
- 1 +e-x - 1 +e-x Compute the error portion 8k:
Compute the error portion 811.:
.6.v 11 =a0 1x1 =0.25 x0.0118 x0=0 Given the input sample [x1, X21 = [-1, l] and target !, = (t, - yllf' (y;,,)
!,= (t,- y,)f'(y,,.,) <'>"21 = a!pQ=0.25 X 0.0118 X I =0.00295 t= 1:
Now
'
----------------
I f'(J;.) = 0.5[1 + f(J;,)] [I- f(J;,)]
= 0.5[! - 0.1122][1 + 0.1122] = 0.4937 .
-- .
-~~
ll:"22 =a!2X'2 =0.25 X 0.00245 X I =0.0006125 I = Q.3 + (-1) X 0.6 +I X -0.1 = -0.4
!' (J;,) = 0.2495 <'>v02 =a!2=0.25 x 0.00245 =0.0006!25
I '-..
-·---
)
l _...-/
l
3.14 Exercise Prob!ems 95
94 Supervised learning Network
13. State the testing algorithm used in perceptron 34. What are the activations used in back-
This implies f>'OI =•01 = 0.25 X 0.1056';'0.0264 propagation network algorithm?
algorithm.
t,.,,=•o 2x, =0.25 x 0.0195 x -1 =-0.0049 35. What is meant by local minima and global
,, = (l + 0.1122) (0.4937) = 0.5491 14. How is _the linear separability concept imple-
[,."22 = cl02X, =0.25 X 0.0195 X 1 =0.0049 mented using perceprron network training? minima?
Find the changes in weights between hidden and l>'02 = •o2= 0.25 X 0.0195 =0.0049 3i5. · Derive the generalized delta learning rule.
15. Define perceprron learning rule.
output layer:
16. Define d_dta rule. 37. Derive the derivations of the binary and bipolar
Comp'Lite the final weights of the nerwork:
L\w1 = a81 ZJ = 0.25 X 0.5491 X -0.1974 1.1~ SGlte the error function for delta rule. sigmoidal activation function.
= -0.0271 18. What is the drawback of using optimization 38. What are the factors that improve the conver-
""(new) = "" (old)+t., 11 = 0.6- 0.0264
gence of learning in BPN network?
/).w, = •01 Z2 = 0.25 X 0.549! X 0.537 = 0.0737 = 0.5736 algorithm?
39. What is meant by incremenrallearning?
L\wo = a81 = 0.25 x 0.5491 = 0.1373 ,,(n<w) = ,,(old)+t.,, = -0.3-0.0049 19. What is Adaline?
40. Why is gradient descent method adopted to
20. Draw the model of an Adaline network.
Compute the error portion Bj beMeen input and = -0.3049 minimize error?
21. Explain the training algorithm used in Adaline
hidden layer (j = 1 to 2): "21 (new) = "21 (old)+t...., 1 = -0.1 + 0.0264 41. What are the methods of initialization of
network.
= -0.0736 weights?
81 = 8;/ljj' (z;nj) 22. How is a Madaline network fOrmed?
m ...,,(new) = "22(old)+t."22 = 0.4 + 0.0049 42. What is the necessity of momentum factor in
23. Is it true that Madaline network consists of many
8inj = L 8k Wjk = 0.4049 perceptrons?
weight updation process?
43. Define "over fitting" or "over training."
~I 24. Scare the characteristics of weighted interconnec-
WI (new) = WI (old)+t.w 1 = 0.4- 0.0271
._ 8inj = 81 WjJ [· •· only one output neuron] tions between Adaline and Madaline. 44. State the techniques for proper choice oflearning
= 0.3729 rate.
=>8in1 =81 WJJ = 0.5491 X 0.4 = 0.21964
w,(n<w) = w,(old)+t.w, = 0.1 + 0.0737
25. How is training adopted in Madaline network
using majority vme rule? 45. What are the limitations of using momentum
( =>o;., =o, ""' = o.549I x o.1 = o.05491 = 0.1737 factor?
Error, 81 =8;,J/'(z;nJ) = 0.21964 X 0.5 26. State few applications of Adaline and Madaline;
1 ''' (n<w) = "OI (old)+l>'OI = 0.3 + 0.0264 46. How many hidden layers can there be in a neural
27. What is meant by epoch in training process?
~
X (I +0.1974)(1- 0.1974) = 0.1056 network?
= 0.3264 28. Wha,r is meant by gradient descent meiliod?
Error, 82 =8;112/'(z;,2) = 0.05491 X 0.5 47. What is the activation function used in radial
"oz(n<w) = '02(old)+t..,, = 0.5 + 0.0049 29. State ilie importance of back-propagation
X (1- 0.537)(1 + 0.537) = 0.0195 basis function network?
= 0.5049 algorithm.
48. Explain the training algorithm of radial basis
Now find the changes in weights berw-een input wo(new) = wo(old)+t.wo = -0.2 + 0.1373 30. What is called as memorization and generaliza- function network.
and hidden layer: = -0.0627 tion? 49. By what means can an IIR and an FIR filter be
31. List the stages involved in training of back- formed in neural network?
f'l.V]J =Cl:'8]X1 =0.25 X 0.1056 X -1 = -0.0264
Thus, the final weight has been computed for the propagation network.
/).'21 =•OiX, =0.25 X 0.1056 X 1 =0.0264 50. What is the importance of functional link net-
network shown in Figure 13. 32. Draw the architecture of back-propagation algo· work?
I 3.13 Review Questions
rithm.
33. State the significance of error portions 8k and Oj
51. Write a short note on binary classification tree
neural network.
1. What is supervised learning and how is it differ- 7. Smte the activation function used in perceprron in BPN algorithm.
52. Explain in detail about wavelet neural network.
em from unsupervised learning? network.
2. How does learning take place in supervised 8. What is the imporrance of threshold in percep-
learning? tron network? I 3.14 Exercise Problems
3. From a mathematical point of view, what is the 9. Mention the applications of perceptron network.
1. Implement NOR function using perceptron are belonging to the class (so have targ.etvalue 1),
process of learning in supervised learning? 10. What are feature detectors?
network for bipolar inputs and targets. vector (-1, -1, -1, 1) and (-1, -1, 1 1) are
4. What is the building block of the perceprron? 11. With a neat flowchart, explain the training not belonging to the class_ (so have target· value
2. Find the weights required to perform the fol-
5. Does perceprron require supervised learning? If process of percepuon network. -1). Assume learning rate 1 and initial weighlS
lowing classifications using perceptron network
no, what does it require? 12. What is the significance of error signal in per- "0.
The vectors (1, 1, -1, -1) ,nd (!,-I. 1, -I)
6. List the limitations of perceptron. ceptron network?
,L