


for their application to optimization. The field of probabilistic reasoning is also sometimes included under the soft computing umbrella for its handling of randomness and uncertainty. The importance of soft computing lies in using these methodologies in partnership - they all offer their own benefits, which are generally not competitive and can therefore work together. As a result, several hybrid systems were looked at - systems in which such partnerships exist.
2  Artificial Neural Network: An Introduction
Learning Objectives

- The fundamentals of artificial neural networks.
- The evolution of neural networks.
- Comparison between biological and artificial neurons.
- Basic models of artificial neural networks.
- The different types of connections of neural networks, learning and activation functions are included.
- Various terminologies and notations used throughout the text.
- The basic fundamental neuron model - McCulloch-Pitts neuron and Hebb network.
- The concept of linear separability to form decision boundary regions.

2.1 Fundamental Concept
Neural networks are information processing systems, which are constructed and implemented to model the human brain. The main objective of neural network research is to develop a computational device for modeling the brain to perform various computational tasks at a faster rate than traditional systems. Artificial neural networks perform various tasks such as pattern matching and classification, optimization, function approximation, vector quantization and data clustering. These tasks are difficult for traditional computers, which are faster at algorithmic computational tasks and precise arithmetic operations. Therefore, for the implementation of artificial neural networks, high-speed digital computers are used, which makes the simulation of neural processes feasible.

2.1.1 Artificial Neural Network

As already stated in Chapter 1, an artificial neural network (ANN) is an efficient information processing system which resembles a biological neural network in its characteristics. ANNs possess a large number of highly interconnected processing elements called nodes or units or neurons, which usually operate in parallel and are configured in regular architectures. Each neuron is connected with another by a connection link. Each connection link is associated with weights which contain information about the input signal. This information is used by the neuron net to solve a particular problem. ANNs' collective behavior is characterized by their ability to learn, recall and generalize training patterns or data, similar to that of a human brain. They have the capability to model networks of original neurons as found in the brain. Thus, the ANN processing elements are called neurons or artificial neurons.

Figure 2-1 Architecture of a simple artificial neuron net.

Figure 2-2 Neural net of pure linear equation.

Figure 2-3 Graph for y = mx.

It should be noted that each neuron has an internal state of its own. This internal state is called the activation or activity level of the neuron, which is a function of the inputs the neuron receives. The activation signal of a neuron is transmitted to other neurons. Remember, a neuron can send only one signal at a time, which can be transmitted to several other neurons.

To depict the basic operation of a neural net, consider a set of neurons, say X1 and X2, transmitting signals to another neuron, Y. Here X1 and X2 are input neurons, which transmit signals, and Y is the output neuron, which receives signals. Input neurons X1 and X2 are connected to the output neuron Y over weighted interconnection links (W1 and W2) as shown in Figure 2-1.
For the above simple neuron net architecture, the net input has to be calculated in the following way:

    y_in = x1 w1 + x2 w2

where x1 and x2 are the activations of the input neurons X1 and X2, i.e., the outputs of the input signals. The output y of the output neuron Y can be obtained by applying the activation over the net input, i.e., as a function of the net input:

    y = f(y_in)

    Output = Function (net input calculated)

The function to be applied over the net input is called the activation function. There are various activation functions, which will be discussed in the forthcoming sections. The above calculation of the net input is similar to the calculation of the output of a pure linear straight-line equation (y = mx). The neural net of a pure linear equation is as shown in Figure 2-2.

Here, to obtain the output y, the slope m is directly multiplied with the input signal. This is a linear equation. Thus, when slope and input are linearly varied, the output is also linearly varied, as shown in Figure 2-3. This shows that the weight involved in the ANN is equivalent to the slope of the linear straight line.

2.1.2 Biological Neural Network

It is well known that the human brain consists of a huge number of neurons, approximately 10^11, with numerous interconnections. A schematic diagram of a biological neuron is shown in Figure 2-4.

Figure 2-4 Schematic diagram of a biological neuron.

The biological neuron depicted in Figure 2-4 consists of three main parts:

1. Soma or cell body - where the cell nucleus is located.
2. Dendrites - where the nerve is connected to the cell body.
3. Axon - which carries the impulses of the neuron.

Dendrites are tree-like networks made of nerve fiber connected to the cell body. An axon is a single, long connection extending from the cell body and carrying signals from the neuron. The end of the axon splits into fine strands. It is found that each strand terminates into a small bulb-like organ called a synapse. It is through the synapse that the neuron introduces its signals to other neurons. The receiving ends of these synapses on the nearby neurons can be found both on the dendrites and on the cell body. There are approximately 10^4 synapses per neuron in the human brain.

Electric impulses are passed between the synapse and the dendrites. This type of signal transmission involves a chemical process in which specific transmitter substances are released from the sending side of the junction. This results in an increase or decrease in the electric potential inside the body of the receiving cell. If the electric potential reaches a threshold, then the receiving cell fires, and a pulse or action potential of fixed strength and duration is sent out through the axon to the synaptic junctions of the other cells. After firing, a cell has to wait for a period of time called the refractory period before it can fire again. The synapses are said to be inhibitory if they let passing impulses hinder the firing of the receiving cell, or excitatory if they let passing impulses cause the firing of the receiving cell.
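The net input and activation computation described above for the simple two-input neuron of Figure 2-1 can be sketched as follows. This is a minimal illustration; the threshold activation and the example input/weight values are assumptions made for the sketch, not part of the text.

```python
def net_input(inputs, weights):
    """Weighted sum of input activations: y_in = sum(x_i * w_i)."""
    return sum(x * w for x, w in zip(inputs, weights))

def binary_step(y_in, theta=0.5):
    """A simple activation function: fire (1) when net input reaches theta."""
    return 1 if y_in >= theta else 0

# Two input neurons X1, X2 with weights W1, W2 feeding the output neuron Y.
x = [1, 0]        # activations of X1 and X2 (illustrative values)
w = [0.6, 0.4]    # connection weights W1 and W2 (illustrative values)

y_in = net_input(x, w)     # 1*0.6 + 0*0.4 = 0.6
y = binary_step(y_in)      # 0.6 >= 0.5, so Y fires: y = 1
print(y_in, y)
```

As in the text, the weight here plays the role of the slope m in y = mx: scaling a weight scales the net input linearly before the activation is applied.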


Figure 2-5 Mathematical model of artificial neuron.

Figure 2-5 shows a mathematical representation of the above-discussed chemical processing taking place in an artificial neuron. In this model, the net input is calculated as

    y_in = x1 w1 + x2 w2 + ... + xn wn = Σ (i=1 to n) xi wi

where i represents the ith processing element. The activation function is applied over it to calculate the output. The weight represents the strength of the synapse connecting the input and the output neurons. A positive weight corresponds to an excitatory synapse, and a negative weight corresponds to an inhibitory synapse.

The terms associated with the biological neuron and their counterparts in the artificial neuron are presented in Table 2-1.

Table 2-1 Terminology relationships between biological and artificial neurons

    Biological neuron    Artificial neuron
    Cell                 Neuron
    Dendrites            Weights or interconnections
    Soma                 Net input
    Axon                 Output

2.1.3 Brain vs. Computer - Comparison Between Biological Neuron and Artificial Neuron

A comparison could be made between biological and artificial neurons on the basis of the following criteria:

1. Speed: The cycle time of execution in the ANN is of a few nanoseconds, whereas in the case of the biological neuron it is of a few milliseconds. Hence, the artificial neuron modeled using a computer is faster.

2. Processing: Basically, the biological neuron can perform massive parallel operations simultaneously. The artificial neuron can also perform several parallel operations simultaneously, but, in general, the artificial neuron network process is faster than that of the brain.

3. Size and complexity: The total number of neurons in the brain is about 10^11 and the total number of interconnections is about 10^15. Hence, it can be noted that the complexity of the brain is comparatively higher, i.e., the computational work takes place not only in the brain cell body, but also in the axon, synapse, etc. On the other hand, the size and complexity of an ANN is based on the chosen application and the network designer. The size and complexity of a biological neural network is more than that of an artificial one.

4. Storage capacity (memory): The biological neuron stores information in its interconnections or in synapse strengths, but in an artificial neuron it is stored in contiguous memory locations. In an artificial neuron, the continuous loading of new information may sometimes overload the memory locations. As a result, some of the addresses containing older memories may be destroyed. But in the case of the brain, new information can be added in the interconnections by adjusting the strengths without destroying the older information. A disadvantage related to the brain is that sometimes its memory may fail to recollect the stored information, whereas in an artificial neuron, once the information is stored in its memory locations, it can be retrieved. Owing to these facts, the adaptability is more toward an artificial neuron.

5. Tolerance: The biological neuron possesses fault-tolerant capability, whereas the artificial neuron has no fault tolerance. The distributed nature of the biological neurons enables them to store and retrieve information even when the interconnections in them get disconnected. Thus biological neurons are fault tolerant. But in the case of artificial neurons, the information gets corrupted if the network interconnections are disconnected. Biological neurons can accept redundancies, which is not possible in artificial neurons. Even when some cells die, the human nervous system appears to be performing with the same efficiency.

6. Control mechanism: In an artificial neuron modeled using a computer, there is a control unit present in the Central Processing Unit, which can transfer and control precise scalar values from unit to unit, but there is no such control unit for monitoring in the brain. The strength of a neuron in the brain depends on the active chemicals present and on whether neuron connections are strong or weak as a result of structure rather than of individual synapses. However, the ANN possesses simpler interconnections and is free from chemical actions similar to those taking place in the brain (biological neuron). Thus, the control mechanism of an artificial neuron is very simple compared to that of a biological neuron.

So, we have gone through a comparison between ANNs and biological neural networks. In short, we can say that an ANN possesses the following characteristics:

1. It is a neurally implemented mathematical model.
2. There exist a large number of highly interconnected processing elements called neurons in an ANN.
3. The interconnections with their weighted linkages hold the informative knowledge.
4. The input signals arrive at the processing elements through connections and connecting weights.
5. The processing elements of the ANN have the ability to learn, recall and generalize from the given data by suitable assignment or adjustment of weights.
6. The computational power can be demonstrated only by the collective behavior of neurons, and it should be noted that no single neuron carries specific information.

The above-mentioned characteristics make the ANNs connectionist models, parallel distributed processing models, self-organizing systems, neuro-computing systems and neuro-morphic systems.



2.2 Evolution of Neural Networks

The evolution of neural networks has been facilitated by the rapid development of architectures and algorithms that are currently being used. The history of the development of neural networks, along with the names of their designers, is outlined in Table 2-2. In the later years, the discovery of the neural net resulted in the implementation of optical neural nets, the Boltzmann machine, spatiotemporal nets, pulsed neural networks and support vector machines.

Table 2-2 Evolution of neural networks

    Year                    | Neural network                      | Designer                              | Description
    1943                    | McCulloch-Pitts neuron              | McCulloch and Pitts                   | The arrangement of neurons in this case is a combination of logic functions. The unique feature of this neuron is the concept of threshold.
    1949                    | Hebb network                        | Hebb                                  | It is based upon the fact that if two neurons are found to be active simultaneously, then the strength of the connection between them should be increased.
    1958, 1959, 1962, 1988  | Perceptron                          | Frank Rosenblatt, Block, Minsky and Papert | Here the weights on the connection path can be adjusted.
    1960                    | Adaline                             | Widrow and Hoff                       | Here the weights are adjusted to reduce the difference between the net input to the output unit and the desired output. The result here is very negligible. Mean squared error is obtained.
    1972                    | Kohonen self-organizing feature map | Kohonen                               | The concept behind this network is that the inputs are clustered together to obtain a fired output neuron. The clustering is performed by a winner-take-all policy.
    1982, 1984, 1985, 1986, 1987 | Hopfield network               | John Hopfield and Tank                | This neural network is based on fixed weights. These nets can also act as associative memory nets.
    1986                    | Back-propagation network            | Rumelhart, Hinton and Williams        | This network is multilayer, with error being propagated backwards from the output units to the hidden units.
    1988                    | Counter-propagation network         | Grossberg                             | This network is similar to the Kohonen network; here the learning occurs for all units in a particular layer, and there exists no competition among these units.
    1987-1990               | Adaptive Resonance Theory (ART)     | Carpenter and Grossberg               | The ART network is designed for both binary inputs and analog-valued inputs. Here the input patterns can be presented in any order.
    1988                    | Radial basis function network       | Broomhead and Lowe                    | This resembles a back-propagation network, but the activation function used is a Gaussian function.
    1988                    | Neocognitron                        | Fukushima                             | This network is essential for character recognition. The deficiency that occurred in the cognitron network (1975) was corrected by this network.

2.3 Basic Models of Artificial Neural Network

The models of ANN are specified by three basic entities, namely:

1. the model's synaptic interconnections;
2. the training or learning rules adopted for updating and adjusting the connection weights;
3. their activation functions.

2.3.1 Connections

The neurons should be visualized for their arrangements in layers. An ANN consists of a set of highly interconnected processing elements (neurons) such that each processing element's output is found to be connected through weights to the other processing elements or to itself; delay lead and lag-free connections are allowed. Hence, the arrangements of these processing elements and the geometry of their interconnections are essential for an ANN. The point where a connection originates and terminates should be noted, and the function of each processing element in an ANN should be specified.

Besides the simple neuron shown in Figure 2-1, there exist several other types of neural network connections. The arrangement of neurons to form layers and the connection pattern formed within and between layers is called the network architecture. There exist five basic types of neuron connection architectures. They are:

1. single-layer feed-forward network;
2. multilayer feed-forward network;
3. single node with its own feedback;
4. single-layer recurrent network;
5. multilayer recurrent network.

Figures 2-6 to 2-10 depict the five types of neural network architectures. Basically, neural nets are classified into single-layer or multilayer neural nets. A layer is formed by taking a processing element and combining it with other processing elements. Practically, a layer implies a stage: going stage by stage, the input stage and the output stage are linked with each other. These linked interconnections lead to the formation of various network architectures. When a layer of processing nodes is formed, the inputs can be connected to these nodes with various weights, resulting in a series of weighted outputs. Thus, a single-layer feed-forward network is formed.

Figure 2-6 Single-layer feed-forward network.


Figure 2-7 Multilayer feed-forward network.

A multilayer feed-forward network (Figure 2-7) is formed by the interconnection of several layers. The input layer is that which receives the input, and this layer has no function except buffering the input signal. The output layer generates the output of the network. Any layer that is formed between the input and output layers is called a hidden layer. This hidden layer is internal to the network and has no direct contact with the external environment. It should be noted that there may be zero to several hidden layers in an ANN. The more the number of hidden layers, the more the complexity of the network. This may, however, provide an efficient output response. In a fully connected network, every output from one layer is connected to each and every node in the next layer.

A network is said to be a feed-forward network if no neuron in the output layer is an input to a node in the same layer or in a preceding layer. On the other hand, when outputs can be directed back as inputs to same- or preceding-layer nodes, it results in the formation of feedback networks. If the feedback of the output of the processing elements is directed back as input to the processing elements in the same layer, then it is called lateral feedback. Recurrent networks are feedback networks with closed loops. Figure 2-8(A) shows a simple recurrent neural network having a single neuron with feedback to itself. Figure 2-9 shows a single-layer network with a feedback connection in which a processing element's output can be directed back to the processing element itself, or to another processing element, or to both.

Figure 2-8 (A) Single node with its own feedback. (B) Competitive nets.

Figure 2-9 Single-layer recurrent network.

Figure 2-10 Multilayer recurrent network.

The architecture of a competitive layer is shown in Figure 2-8(B), the competitive interconnections having fixed weights of -ε. This net is called Maxnet, and it will be discussed in the unsupervised learning network category. Apart from the network architectures discussed so far, there also exists another type of architecture with lateral feedback, which is called the on-center-off-surround or lateral inhibition structure. In this
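The multilayer feed-forward idea of Figure 2-7 - a buffering input layer, a hidden layer, and an output layer - can be sketched as repeated dense layers. The layer sizes, weights, and the choice of a logistic activation here are assumptions made for illustration only.

```python
import math

def layer(x, W, f):
    """One dense layer: output_j = f(sum_i x_i * W[i][j])."""
    return [f(sum(xi * wij for xi, wij in zip(x, col)))
            for col in zip(*W)]          # zip(*W) iterates the columns of W

sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))

x = [1.0, -1.0]                          # input layer only buffers the signal
W_hidden = [[0.5, -0.5], [0.5, 0.5]]     # input -> hidden weights (assumed)
W_out = [[1.0], [-1.0]]                  # hidden -> output weights (assumed)

h = layer(x, W_hidden, sigmoid)          # hidden layer: internal to the network
y = layer(h, W_out, sigmoid)             # output layer: the network's response
print(h, y)
```

With linear activations, composing the two layers would collapse into a single matrix product, which is exactly why Section 2.3.3 insists on nonlinear activation functions for multilayer nets.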
structure, each processing neuron receives two different classes of inputs - "excitatory" input from nearby processing elements and "inhibitory" input from more distantly located processing elements. This type of interconnection is shown in Figure 2-11.

Figure 2-11 Lateral inhibition structure.

In Figure 2-11, the connections with open circles are excitatory connections and the links with solid connective circles are inhibitory connections. From Figure 2-10, it can be noted that a processing element's output can be directed back to the nodes in a preceding layer, forming a multilayer recurrent network. Also, in these networks, a processing element's output can be directed back to the processing element itself and to other processing elements in the same layer. Thus, the various network architectures discussed in Figures 2-6 to 2-11 can be suitably used for giving an effective solution to a problem by using an ANN.

2.3.2 Learning

The main property of an ANN is its capability to learn. Learning or training is a process by means of which a neural network adapts itself to a stimulus by making proper parameter adjustments, resulting in the production of the desired response. Broadly, there are two kinds of learning in ANNs:

1. Parameter learning: It updates the connecting weights in a neural net.
2. Structure learning: It focuses on the change in network structure (which includes the number of processing elements as well as their connection types).

The above two types of learning can be performed simultaneously or separately. Apart from these two categories, the learning in an ANN can be generally classified into three categories: supervised learning, unsupervised learning and reinforcement learning. Let us discuss these learning types in detail.

2.3.2.1 Supervised Learning

The learning here is performed with the help of a teacher. Let us take the example of the learning process of a small child. The child doesn't know how to read or write. He/she is being taught by the parents at home and by the teacher in school. The children are trained and molded to recognize the alphabets, numerals, etc. Their each and every action is supervised by a teacher. Actually, a child works on the basis of the output that he/she has to produce. All these real-time events involve supervised learning methodology. Similarly, in ANNs following supervised learning, each input vector requires a corresponding target vector, which represents the desired output. The input vector along with the target vector is called a training pair. The network is informed precisely about what should be emitted as output. The block diagram of Figure 2-12 depicts the working of a supervised learning network.

Figure 2-12 Supervised learning.

During training, the input vector is presented to the network, which results in an output vector. This output vector is the actual output vector. Then the actual output vector is compared with the desired (target) output vector. If there exists a difference between the two output vectors, then an error signal is generated by the network; this error signal is used for the adjustment of weights until the actual output matches the desired (target) output.

2.3.2.2 Unsupervised Learning

The learning here is performed without the help of a teacher. Consider the learning process of a tadpole: it learns by itself. That is, a child fish learns to swim by itself; it is not taught by its mother. Thus, its learning process is independent and is not supervised by a teacher. In ANNs following unsupervised learning, the input vectors of similar type are grouped without the use of training data to specify how each group looks or to which group a member belongs. In the training process, the network receives the input patterns and organizes these patterns to form clusters. When a new input pattern is applied, the neural network gives an output response indicating the class to which the input pattern belongs. If for an input a pattern class cannot be found, then a new class is generated. The block diagram of unsupervised learning is shown in Figure 2-13.

Figure 2-13 Unsupervised learning.

From Figure 2-13 it is clear that there is no feedback from the environment to inform what the outputs should be or whether the outputs are correct. In this case, the network must itself discover patterns, regularities, features or categories from the input data, and relations for the input data over the output. While discovering all these features, the network undergoes changes in its parameters. This process is called self-organizing, in which exact clusters will be formed by discovering similarities and dissimilarities among the objects.
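The supervised error-correction loop of Figure 2-12 - present input, compare actual output with the desired (target) output, adjust weights by the error - can be sketched as below. The single linear unit, the learning-rate value, and the training pairs are assumptions made for this sketch; the text does not prescribe a specific update rule here.

```python
def train_step(x, target, w, lr=0.1):
    """One supervised update on a single linear unit.

    Computes the actual output, forms the error signal
    (desired - actual), and nudges each weight along it.
    """
    actual = sum(xi * wi for xi, wi in zip(x, w))   # actual output vector (scalar here)
    error = target - actual                         # error signal: desired minus actual
    new_w = [wi + lr * error * xi for xi, wi in zip(x, w)]
    return new_w, error

w = [0.0, 0.0]
# Training pairs: each input vector comes with its target (a "training pair").
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)]

for _ in range(50):                 # repeat until the error becomes small
    for x, t in data:
        w, err = train_step(x, t, w)
print(w)                            # weights approach [1.0, -1.0]
```

The loop stops changing the weights once the actual output matches the target, mirroring the "error signal generator" block in the figure.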

2.3.2.3 Reinforcement Learning

This learning process is similar to supervised learning. In the case of supervised learning, the correct target output values are known for each input pattern. But, in some cases, less information might be available. For example, the network might be told that its actual output is only "50% correct" or so. Thus, here only critic information is available, not the exact information. The learning based on this critic information is called reinforcement learning, and the feedback sent is called the reinforcement signal.

The block diagram of reinforcement learning is shown in Figure 2-14. Reinforcement learning is a form of supervised learning because the network receives some feedback from its environment. However, the feedback obtained here is only evaluative and not instructive. The external reinforcement signals are processed in the critic signal generator, and the obtained critic signals are sent to the ANN for proper adjustment of weights so as to get better critic feedback in the future. Reinforcement learning is also called learning with a critic, as opposed to learning with a teacher, which indicates supervised learning.

Figure 2-14 Reinforcement learning.

So, now you've a fair understanding of the three generalized learning rules used in the training process of ANNs.

2.3.3 Activation Functions

To better understand the role of the activation function, let us assume a person is performing some work. To make the work more efficient and to obtain the exact output, some force or activation may be given. This activation helps in achieving the exact output. In a similar way, the activation function is applied over the net input to calculate the output of an ANN.

The information processing of a processing element can be viewed as consisting of two major parts: input and output. An integration function (say f) is associated with the input of a processing element. This function serves to combine activation, information or evidence from an external source or other processing elements into a net input to the processing element. The nonlinear activation function is used to ensure that a neuron's response is bounded - that is, the actual response of the neuron is conditioned or dampened as a result of large or small activating stimuli and is thus controllable.

Certain nonlinear functions are used to achieve the advantages of a multilayer network over a single-layer network. When a signal is fed through a multilayer network with linear activation functions, the output obtained remains the same as that obtained using a single-layer network. For this reason, nonlinear functions are widely used in multilayer networks compared to linear functions.

There are several activation functions. Let us discuss a few in this section:

1. Identity function: It is a linear function and can be defined as

       f(x) = x  for all x

   The output here remains the same as the input. The input layer uses the identity activation function.

2. Binary step function: This function can be defined as

       f(x) = 1  if x >= θ
              0  if x < θ

   where θ represents the threshold value. This function is most widely used in single-layer nets to convert the net input to an output that is binary (1 or 0).

3. Bipolar step function: This function can be defined as

       f(x) =  1  if x >= θ
              -1  if x < θ

   where θ represents the threshold value. This function is also used in single-layer nets to convert the net input to an output that is bipolar (+1 or -1).

4. Sigmoidal functions: The sigmoidal functions are widely used in back-propagation nets because of the relationship between the value of the function at a point and the value of the derivative at that point, which reduces the computational burden during training. Sigmoidal functions are of two types:

   - Binary sigmoid function: It is also termed the logistic sigmoid function or unipolar sigmoid function. It can be defined as

         f(x) = 1 / (1 + e^(-λx))

     where λ is the steepness parameter. The derivative of this function is

         f'(x) = λ f(x) [1 - f(x)]

     The range of this sigmoid function is from 0 to 1.

   - Bipolar sigmoid function: This function is defined as

         f(x) = 2 / (1 + e^(-λx)) - 1 = (1 - e^(-λx)) / (1 + e^(-λx))

     where λ is the steepness parameter and the sigmoid function range is between -1 and +1. The derivative of this function is

         f'(x) = (λ/2) [1 + f(x)] [1 - f(x)]

   The bipolar sigmoidal function is closely related to the hyperbolic tangent function, which is written as

       h(x) = (e^x - e^(-x)) / (e^x + e^(-x)) = (1 - e^(-2x)) / (1 + e^(-2x))

   The derivative of the hyperbolic tangent function is

       h'(x) = [1 + h(x)] [1 - h(x)]

   If the network uses binary data, it is better to convert it to bipolar form and use the bipolar sigmoidal activation function or the hyperbolic tangent function.

5. Ramp function: The ramp function is defined as

       f(x) = 1  if x > 1
              x  if 0 <= x <= 1
              0  if x < 0

The graphical representations of all the activation functions are shown in Figure 2-15(A)-(F).

Figure 2-15 Depiction of activation functions: (A) identity function; (B) binary step function; (C) bipolar step function; (D) binary sigmoidal function; (E) bipolar sigmoidal function; (F) ramp function.
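The activation functions listed above can be transcribed directly from their definitions. This sketch uses `lambda_` for the steepness parameter λ and `theta` for the threshold θ; the checked relationship (bipolar sigmoid with λ = 2 equals tanh) follows from the formulas in the text.

```python
import math

def binary_step(x, theta=0.0):
    """1 if x >= theta, else 0."""
    return 1 if x >= theta else 0

def bipolar_step(x, theta=0.0):
    """+1 if x >= theta, else -1."""
    return 1 if x >= theta else -1

def binary_sigmoid(x, lambda_=1.0):
    """Logistic sigmoid: 1 / (1 + e^(-lambda*x)), range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-lambda_ * x))

def binary_sigmoid_deriv(x, lambda_=1.0):
    """f'(x) = lambda * f(x) * (1 - f(x))."""
    f = binary_sigmoid(x, lambda_)
    return lambda_ * f * (1.0 - f)

def bipolar_sigmoid(x, lambda_=1.0):
    """2 / (1 + e^(-lambda*x)) - 1, range (-1, +1)."""
    return 2.0 / (1.0 + math.exp(-lambda_ * x)) - 1.0

def ramp(x):
    """1 for x > 1, x on [0, 1], 0 for x < 0."""
    return 1.0 if x > 1 else (x if x >= 0 else 0.0)

print(binary_sigmoid(0.0))                    # midpoint of the (0, 1) range
# Bipolar sigmoid with steepness 2 coincides with tanh:
print(abs(bipolar_sigmoid(1.0, 2.0) - math.tanh(1.0)) < 1e-12)
```

Note how cheap the sigmoid derivatives are once f(x) is known - this is exactly the property the text credits for reducing the computational burden in back-propagation training.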
I 2.4 Important Terminologies of ANNs
This section introduces you ro the various terminologies related with ANNs. +1f-----

I 2.4.1 Weights 0 X
\
In the architecrure ofan ANN, each neuron is connected ro other neurons by means ofdirected communication -1
links, and each communication link is associated with weights. The weighrs contain information about e
if'!pur ~nal. This information is used by the net ro solve a problem. The we1ghr can ented in
-rem1sOf matrix. T4e weight matrix can alSO bt c:rlled connectzon matrix. To form a mathematical notation, it (C) (D)
is assumed that there are "n" processingelemenrs in~ each processing element has exaaly "m"
adaptive weighr.s. Thus, rhe weight matrix W is defined by l(x),

wT\ \w'' WJ2 WJm


\
'',.
-,, I(!C)
WT W22 \~·,, "\

W=
2

I=
""' IU)_m

+1

'·"'
w~j LWn] 7Vn2 1Unm
+1 X
(E) (F)

where wi = [wi1, wi2, ..., wim]^T, i = 1, 2, ..., n, is the weight vector of processing element i, and wij is the weight from processing element "i" (source node) to processing element "j" (destination node).

Figure 2-15 Depiction of activation functions: (A) identity function; (B) binary step function; (C) bipolar step function; (D) binary sigmoidal function; (E) bipolar sigmoidal function; (F) ramp function.

If the weight matrix W contains all the adaptive elements of an ANN, then the set of all W matrices will determine the set of all possible information processing configurations for this ANN. The ANN can be realized by finding an appropriate matrix W. Hence, the weights encode long-term memory (LTM) and the activation states of neurons encode short-term memory (STM) in a neural network.

2.4.2 Bias

The bias included in the network has its impact in calculating the net input. The bias is included by adding a component x0 = 1 to the input vector. Thus, the input vector becomes

X = (1, x1, ..., xi, ..., xn)

The bias is considered like another weight, that is, w0j = bj. Consider the simple network shown in Figure 2-16 with bias. From Figure 2-16, the net input to the output neuron Yj is calculated as

y_inj = Σ(i=0 to n) xi wij = x0 w0j + x1 w1j + x2 w2j + ... + xn wnj
      = w0j + Σ(i=1 to n) xi wij

y_inj = bj + Σ(i=1 to n) xi wij
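This net-input computation with bias can be sketched in a few lines of code; the sketch below is only an illustration, with names chosen here rather than taken from the text:

```python
def net_input(x, w, b):
    # y_in = b + sum_i x_i * w_i for a single output neuron
    return b + sum(xi * wi for xi, wi in zip(x, w))

def net_input_bias_as_weight(x, w, b):
    # Equivalent form: treat the bias as weight w0 = b on a
    # constant input x0 = 1 and sum from i = 0.
    return sum(xi * wi for xi, wi in zip([1.0] + list(x), [b] + list(w)))
```

Both forms agree, which is exactly the point of writing w0j = bj.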
The activation function discussed in Section 2.3.3 is applied over this net input to calculate the output. The bias can also be explained as follows: Consider the equation of a straight line,

y = mx + c

where x is the input, m is the weight, c is the bias and y is the output. The equation of the straight line can also be represented as a block diagram, shown in Figure 2-17. Thus, bias plays a major role in determining the output of the network. The bias can be of two types: positive bias and negative bias. The positive bias helps in increasing the net input of the network, and the negative bias helps in decreasing the net input of the network. Thus, as a result of the bias effect, the output of the network can be varied.

Figure 2-16 Simple net with bias.

Figure 2-17 Block diagram for straight line.

2.4.3 Threshold

Threshold is a set value based upon which the final output of the network may be calculated. The threshold value is used in the activation function. A comparison is made between the calculated net input and the threshold to obtain the network output. For each and every application, there is a threshold limit. Consider a direct current (DC) motor. If its maximum speed is 1500 rpm, then the threshold based on the speed is 1500 rpm. If the motor is run at a speed higher than its set threshold, it may damage the motor coils. Similarly, in neural networks, based on the threshold value, the activation functions are defined and the output is calculated. The activation function using threshold can be defined as

f(net) = 1 if net ≥ θ; 0 if net < θ

where θ is the fixed threshold value.

2.4.4 Learning Rate

The learning rate is denoted by "α". It is used to control the amount of weight adjustment at each step of training. The learning rate, ranging from 0 to 1, determines the rate of learning at each time step.

2.4.5 Momentum Factor

Convergence is made faster if a momentum factor is added to the weight updation process. This is generally done in the back-propagation network. If momentum has to be used, the weights from one or more previous training patterns must be saved. Momentum helps the net make reasonably large weight adjustments as long as the corrections are in the same general direction for several patterns.

2.4.6 Vigilance Parameter

The vigilance parameter, denoted by "ρ", is generally used in the adaptive resonance theory (ART) network, where it controls the degree of similarity required for patterns to be assigned to the same cluster unit.

2.4.7 Notations

The notations mentioned in this section have been used in this textbook for explaining each network.

xi: Activation of unit Xi, input signal.
yj: Activation of unit Yj, yj = f(y_inj).
wij: Weight on connection from unit Xi to unit Yj.
bj: Bias acting on unit j. Bias has a constant activation of 1.
W: Weight matrix, W = {wij}.
y_inj: Net input to unit Yj, given by y_inj = bj + Σi xi wij.
||x||: Norm or magnitude of vector X.
θj: Threshold for activation of neuron Yj.
S: Training input vector, S = (s1, ..., si, ..., sn).
T: Training output vector, T = (t1, ..., tj, ..., tn).
X: Input vector, X = (x1, ..., xi, ..., xn).
Δwij: Change in weights, given by Δwij = wij(new) − wij(old).
α: Learning rate; it controls the amount of weight adjustment at each step of training.

2.5 McCulloch-Pitts Neuron

2.5.1 Theory

The McCulloch-Pitts neuron was the earliest neural network, discovered in 1943. It is usually called the M-P neuron. The M-P neurons are connected by directed weighted paths. It should be noted that the activation of an M-P neuron is binary, that is, at any time step the neuron may fire or may not fire. The weights associated with the communication links may be excitatory (weight is positive) or inhibitory (weight is negative). All the


excitatory connection weights entering into a particular neuron will have the same weights. The threshold plays a major role in the M-P neuron: there is a fixed threshold for each neuron, and if the net input to the neuron is greater than the threshold then the neuron fires. Also, it should be noted that any nonzero inhibitory input would prevent the neuron from firing. The M-P neurons are most widely used in the case of logic functions.

2.5.2 Architecture

A simple M-P neuron is shown in Figure 2-18. As already discussed, the M-P neuron has both excitatory and inhibitory connections. It is excitatory with weight w (w > 0) or inhibitory with weight −p (p < 0). In Figure 2-18, inputs from x1 to xn possess excitatory weighted connections and inputs from x(n+1) to x(n+m) possess inhibitory weighted interconnections. Since the firing of the output neuron is based upon the threshold, the activation function here is defined as

f(y_in) = 1 if y_in ≥ θ; 0 if y_in < θ

For inhibition to be absolute, the threshold with the activation function should satisfy the following condition:

θ > nw − p

The output will fire if it receives, say, "k" or more excitatory inputs but no inhibitory inputs, where

kw ≥ θ > (k − 1)w

The M-P neuron has no particular training algorithm. An analysis has to be performed to determine the values of the weights and the threshold. Here the weights of the neuron are set along with the threshold to make the neuron perform a simple logic function. The M-P neurons are used as building blocks on which we can model any function or phenomenon that can be represented as a logic function.

Figure 2-18 McCulloch-Pitts neuron model.

2.6 Linear Separability

An ANN does not give an exact solution for a nonlinear problem. However, it provides possible approximate solutions to nonlinear problems. Linear separability is the concept wherein the separation of the input space into regions is based on whether the network response is positive or negative.

A decision line is drawn to separate positive and negative responses. The decision line may also be called the decision-making line, decision-support line or linear-separable line. The necessity of the linear separability concept was felt to classify the patterns based upon their output responses. Generally, the net input calculated to the output unit is given as

y_in = b + Σ(i=1 to n) xi wi

For example, if a bipolar step activation function is used over the calculated net input (y_in), then the value of the function is 1 for a positive net input and −1 for a negative net input. Also, it is clear that there exists a boundary between the regions where y_in > 0 and y_in < 0. This boundary may be called the decision boundary and can be determined by the relation

b + Σ(i=1 to n) xi wi = 0

On the basis of the number of input units in the network, the above equation may represent a line, a plane or a hyperplane. The linear separability of the network is based on the decision-boundary line. If there exist weights (with bias) for which all the training input vectors having positive (correct) response, +1, lie on one side of the decision boundary and all the other vectors having negative (incorrect) response, −1, lie on the other side of the decision boundary, then we can conclude that the problem is "linearly separable."

Consider a single-layer network as shown in Figure 2-19 with bias included. The net input for the network shown in Figure 2-19 is given as

y_in = b + x1 w1 + x2 w2

The separating line, for which the boundary lies between the values x1 and x2 so that the net gives a positive response on one side and a negative response on the other side, is given as

b + x1 w1 + x2 w2 = 0

Figure 2-19 A single-layer neural net.
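A quick way to see the decision boundary at work is to evaluate the sign of the net input for a few points. The sketch below is illustrative only; the particular weights w1 = w2 = 1, b = −1, which place the line so that only the first-quadrant bipolar point responds positively (an AND-like response), are an assumption made here:

```python
def response(x1, x2, w1, w2, b):
    # Bipolar step over the net input: +1 on one side of the
    # decision line b + x1*w1 + x2*w2 = 0, and -1 on the other.
    y_in = b + x1 * w1 + x2 * w2
    return 1 if y_in >= 0 else -1

# With w1 = w2 = 1 and b = -1, the line x1 + x2 = 1 separates the
# corner (1, 1) from the other three bipolar corners.
for point in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
    print(point, response(*point, 1, 1, -1))
```

Changing the weights and bias moves and rotates the line, which is exactly what training does.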

If weight w2 is not equal to 0, then we get

x2 = −(w1/w2) x1 − (b/w2)

Thus, the requirement for the positive response of the net is

b + x1w1 + x2w2 > 0

During the training process, the values of w1, w2 and b are determined so that the net will produce a positive (correct) response for the training data. If, on the other hand, a threshold value is being used, then the condition for obtaining the positive response from the output unit is

Net input received > θ (threshold)
y_in > θ
x1w1 + x2w2 > θ

The separating line equation will then be

x1w1 + x2w2 = θ
x2 = −(w1/w2) x1 + θ/w2   (with w2 ≠ 0)

During the training process, the values of w1 and w2 have to be determined so that the net will have a correct response to the training data. For this correct response, the line passes close through the origin. In certain situations, even for a correct response, the separating line does not pass through the origin.

Consider a network having positive response in the first quadrant and negative response in all other quadrants (AND function) with either binary or bipolar data. The decision line is drawn separating the positive response region from the negative response region. This is depicted in Figure 2-20.

Figure 2-20 Decision boundary line.

Thus, based on the conditions discussed above, the equation of this decision line may be obtained. Also, in all the networks that we would be discussing, the representation of data plays a major role. However, the data representation mode has to be decided: whether it would be in binary form or in bipolar form. It may be noted that the bipolar representation is better than the binary representation: using bipolar data, missing values can be represented by "0" and mistakes by reversing the input values from "+1" to "−1", or vice-versa.

2.7 Hebb Network

2.7.1 Theory

For a neural net, the Hebb learning rule is a simple one. Let us understand it. Donald Hebb stated in 1949 that in the brain, learning is performed by the change in the synaptic gap. Hebb explained it: "When an axon of cell A is near enough to excite cell B, and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both the cells such that A's efficiency, as one of the cells firing B, is increased."

According to the Hebb rule, the weight vector is found to increase proportionately to the product of the input and the learning signal. Here the learning signal is equal to the neuron's output. In Hebb learning, if two interconnected neurons are 'on' simultaneously, then the weights associated with these neurons can be increased by the modification made in their synaptic gap (strength). The weight update in the Hebb rule is given by

wi(new) = wi(old) + xi y

The Hebb rule is more suited for bipolar data than binary data. If binary data is used, the above weight updation formula cannot distinguish two conditions, namely:

1. A training pair in which an input unit is "on" and the target value is "off."
2. A training pair in which both the input unit and the target value are "off."

Thus, there are limitations in Hebb rule application over binary data. Hence, the representation using bipolar data is advantageous.

2.7.2 Flowchart of Training Algorithm

The training algorithm is used for the calculation and adjustment of weights. The flowchart for the training algorithm of the Hebb network is given in Figure 2-21. The notations used in the flowchart have already been discussed in Section 2.4.7. In Figure 2-21, s: t refers to each training input and target output pair. As long as there exists a pair of training input and target output, the training process takes place; else, it is stopped.

2.7.3 Training Algorithm

The training algorithm of the Hebb network is given below:

Step 0: First initialize the weights. Basically in this network they may be set to zero, i.e., wi = 0 for i = 1 to n, where "n" may be the total number of input neurons.

Step 1: Steps 2-4 have to be performed for each input training vector and target output pair, s: t.

Step 2: Input unit activations are set. Generally, the activation function of the input layer is the identity function:

xi = si for i = 1 to n

Step 3: Output unit activations are set: y = t.

Step 4: Weight adjustments and bias adjustments are performed:

wi(new) = wi(old) + xi y
b(new) = b(old) + y

The above five steps complete the algorithmic process. In Step 4, the weight updation formula can also be given in vector form as

w(new) = w(old) + xy

Here the change in weight can be expressed as

Δw = xy

As a result,

w(new) = w(old) + Δw

The Hebb rule can be used for pattern association, pattern categorization, pattern classification and over a range of other areas.

Figure 2-21 Flowchart of Hebb training algorithm.

2.8 Summary

In this chapter we have discussed the basics of an ANN and its growth. A detailed comparison between the biological neuron and the artificial neuron has been included to enable the reader to understand the basic difference between them. An ANN is constructed with a few basic building blocks. The building blocks are based on the models of artificial neurons and the topology of a few basic structures. Concepts of supervised learning, unsupervised learning and reinforcement learning are briefly included in this chapter. Various activation functions and different types of layered connections are also considered here. The basic terminologies of ANN are discussed with their typical values. A brief description of the McCulloch-Pitts neuron model is provided. The concept of linear separability is discussed and illustrated with suitable examples. Details are provided for the effective training of a Hebb network.

2.9 Solved Problems

1. For the network shown in Figure 1, calculate the net input to the output neuron.

Figure 1 Neural net.

Solution: The given neural net consists of three input neurons and one output neuron. The inputs and weights are

[x1, x2, x3] = [0.3, 0.5, 0.6]
[w1, w2, w3] = [0.2, 0.1, −0.3]

The net input can be calculated as

y_in = x1w1 + x2w2 + x3w3
     = 0.3 × 0.2 + 0.5 × 0.1 + 0.6 × (−0.3)
     = 0.06 + 0.05 − 0.18 = −0.07

2. Calculate the net input for the network shown in Figure 2, with bias included in the network.

Solution: The given net consists of two input neurons, a bias and an output neuron. The inputs are
[x1, x2] = [0.2, 0.6] and the weights are [w1, w2] = [0.3, 0.7]. Since the bias is included, b = 0.45 and the bias input x0 is equal to 1. The net input is calculated as

y_in = b + x1w1 + x2w2
     = 0.45 + 0.2 × 0.3 + 0.6 × 0.7
     = 0.45 + 0.06 + 0.42 = 0.93

Therefore y_in = 0.93 is the net input.

Figure 2 Simple neural net.

3. Obtain the output of the neuron Y for the network shown in Figure 3 using activation functions as: (i) binary sigmoidal and (ii) bipolar sigmoidal.

Figure 3 Neural net.

Solution: The given network has three input neurons with bias and one output neuron. These form a single-layer network. The inputs are given as [x1, x2, x3] = [0.8, 0.6, 0.4] and the weights are [w1, w2, w3] = [0.1, 0.3, −0.2] with bias b = 0.35 (its input is always 1). The net input to the output neuron is

y_in = b + Σ(i=1 to n) xi wi   [n = 3, because only 3 input neurons are given]
     = b + x1w1 + x2w2 + x3w3
     = 0.35 + 0.8 × 0.1 + 0.6 × 0.3 + 0.4 × (−0.2)
     = 0.35 + 0.08 + 0.18 − 0.08 = 0.53

(i) For the binary sigmoidal activation function,

y = f(y_in) = 1/(1 + e^(−y_in)) = 1/(1 + e^(−0.53)) = 0.625

(ii) For the bipolar sigmoidal activation function,

y = f(y_in) = 2/(1 + e^(−y_in)) − 1 = 2/(1 + e^(−0.53)) − 1 = 0.259

4. Implement AND function using McCulloch-Pitts neuron (take binary data).

Solution: Consider the truth table for the AND function (Table 1).

Table 1
x1  x2  y
1   1   1
1   0   0
0   1   0
0   0   0

In the McCulloch-Pitts neuron, only analysis is being performed. Hence, assume the weights to be w1 = 1 and w2 = 1. The network architecture is shown in Figure 4. With these assumed weights, the net input is calculated for the four inputs. For inputs

(1, 1), y_in = x1w1 + x2w2 = 1 × 1 + 1 × 1 = 2
(1, 0), y_in = x1w1 + x2w2 = 1 × 1 + 0 × 1 = 1
(0, 1), y_in = x1w1 + x2w2 = 0 × 1 + 1 × 1 = 1
(0, 0), y_in = x1w1 + x2w2 = 0 × 1 + 0 × 1 = 0

Figure 4 Neural net.

For an AND function, the output is high if both the inputs are high. For this condition, the net input is 2. Hence, based on this net input, the threshold is set, i.e., if the threshold value is greater than or equal to 2, then the neuron fires, else it does not fire. So the threshold value is set equal to 2 (θ = 2). This can also be obtained by

θ ≥ nw − p

Here, n = 2, w = 1 (excitatory weights) and p = 0 (no inhibitory weights). Substituting these values in the above-mentioned equation, we get

θ ≥ 2 × 1 − 0 ⟹ θ ≥ 2

Thus, the output of neuron Y can be written as

y = f(y_in) = 1 if y_in ≥ 2; 0 if y_in < 2

where "2" represents the threshold value.

5. Implement ANDNOT function using McCulloch-Pitts neuron (use binary data representation).

Solution: In the case of the ANDNOT function, the response is true if the first input is true and the second input is false. For all other input variations, the response is false. The truth table for the ANDNOT function is given in Table 2.

Table 2
x1  x2  y
0   0   0
0   1   0
1   0   1
1   1   0

The given function gives an output only when x1 = 1 and x2 = 0. The weights have to be decided only after the analysis. The net can be represented as shown in Figure 5.

Figure 5 Neural net (weights fixed after analysis).

Case 1: Assume that both weights w1 and w2 are excitatory, i.e.,

w1 = w2 = 1

Then for the four inputs calculate the net input using

y_in = x1w1 + x2w2

For inputs

(1, 1), y_in = 1 × 1 + 1 × 1 = 2
(1, 0), y_in = 1 × 1 + 0 × 1 = 1
(0, 1), y_in = 0 × 1 + 1 × 1 = 1
(0, 0), y_in = 0 × 1 + 0 × 1 = 0

From the calculated net inputs, it is not possible to fire the neuron for input (1, 0) only. Hence, these weights are not suitable. Assume one weight as excitatory and the other as inhibitory, i.e.,

w1 = 1, w2 = −1
Now calculate the net input. For the inputs

(1, 1), y_in = 1 × 1 + 1 × (−1) = 0
(1, 0), y_in = 1 × 1 + 0 × (−1) = 1
(0, 1), y_in = 0 × 1 + 1 × (−1) = −1
(0, 0), y_in = 0 × 1 + 0 × (−1) = 0

From the calculated net inputs, now it is possible to fire the neuron for input (1, 0) only by fixing a threshold of 1, i.e., θ ≥ 1 for the Y unit. Thus,

w1 = 1; w2 = −1; θ ≥ 1

Note: The value of θ is calculated using the following:

θ ≥ nw − p
θ ≥ 2 × 1 − 1   [for "p", only the inhibitory magnitude is considered]
θ ≥ 1

Thus, the output of neuron Y can be written as

y = f(y_in) = 1 if y_in ≥ 1; 0 if y_in < 1

6. Implement XOR function using McCulloch-Pitts neuron (consider binary data).

Solution: The truth table for the XOR function is given in Table 3.

Table 3
x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

In this case, the output is "ON" only for an odd number of 1's. For the rest it is "OFF." The XOR function cannot be represented by a simple and single logic function; it is represented as

y = z1 + z2

where

z1 = x1 x̄2 (function 1)
z2 = x̄1 x2 (function 2)
y = z1 (OR) z2 (function 3)

A single-layer net is not sufficient to represent the function. An intermediate layer is necessary.

Figure 6 Neural net for XOR function (the weights shown are obtained after analysis).

First function (z1 = x1 x̄2): The truth table for function z1 is shown in Table 4.

Table 4
x1  x2  z1
0   0   0
0   1   0
1   0   1
1   1   0

The net representation is given as follows.

Case 1: Assume both weights as excitatory, i.e.,

w11 = w21 = 1

Calculate the net inputs. For inputs

(0, 0), z1_in = 0 × 1 + 0 × 1 = 0
(0, 1), z1_in = 0 × 1 + 1 × 1 = 1
(1, 0), z1_in = 1 × 1 + 0 × 1 = 1
(1, 1), z1_in = 1 × 1 + 1 × 1 = 2

Hence, it is not possible to obtain function z1 using these weights.

Case 2: Assume one weight as excitatory and the other as inhibitory, i.e.,

w11 = 1; w21 = −1

Calculate the net inputs. For inputs

(0, 0), z1_in = 0 × 1 + 0 × (−1) = 0
(0, 1), z1_in = 0 × 1 + 1 × (−1) = −1
(1, 0), z1_in = 1 × 1 + 0 × (−1) = 1
(1, 1), z1_in = 1 × 1 + 1 × (−1) = 0

On the basis of this calculated net input, it is possible to get the required output. Hence,

w11 = 1; w21 = −1; θ ≥ 1 for the z1 neuron

Figure 7 Neural net for Z1.

Second function (z2 = x̄1 x2): The truth table for function z2 is shown in Table 5.

Table 5
x1  x2  z2
0   0   0
0   1   1
1   0   0
1   1   0

The net representation is given as follows.

Case 1: Assume both weights as excitatory, i.e.,

w12 = w22 = 1

Now calculate the net inputs. For the inputs

(0, 0), z2_in = 0 × 1 + 0 × 1 = 0
(0, 1), z2_in = 0 × 1 + 1 × 1 = 1
(1, 0), z2_in = 1 × 1 + 0 × 1 = 1
(1, 1), z2_in = 1 × 1 + 1 × 1 = 2

Hence, it is not possible to obtain function z2 using these weights.

Case 2: Assume one weight as excitatory and the other as inhibitory, i.e.,

w12 = −1; w22 = 1

Now calculate the net inputs. For the inputs

(0, 0), z2_in = 0 × (−1) + 0 × 1 = 0
(0, 1), z2_in = 0 × (−1) + 1 × 1 = 1
(1, 0), z2_in = 1 × (−1) + 0 × 1 = −1
(1, 1), z2_in = 1 × (−1) + 1 × 1 = 0

On the basis of the calculated net input, it is possible to get the required output. Hence,

w12 = −1; w22 = 1; θ ≥ 1 for the z2 neuron

Figure 8 Neural net for Z2.

Third function (y = z1 OR z2): The truth table for this function is shown in Table 6.

Table 6
x1  x2  z1  z2  y
0   0   0   0   0
0   1   0   1   1
1   0   1   0   1
1   1   0   0   0

Here the net input is calculated using

y_in = z1 v1 + z2 v2

Case 1: Assume both weights as excitatory, i.e.,

v1 = v2 = 1

Now calculate the net input. For inputs

(0, 0), y_in = 0 × 1 + 0 × 1 = 0
(0, 1), y_in = 0 × 1 + 1 × 1 = 1
(1, 0), y_in = 1 × 1 + 0 × 1 = 1
(1, 1), y_in = 0 × 1 + 0 × 1 = 0
(because for x1 = 1 and x2 = 1, z1 = 0 and z2 = 0)

Setting a threshold of θ ≥ 1 with v1 = v2 = 1 gives the required output, which implies that the net is recognized. Therefore, the analysis is made for the XOR function using M-P neurons. Thus, for the XOR function, the weights are obtained as

w11 = w22 = 1 (excitatory)
w12 = w21 = −1 (inhibitory)
v1 = v2 = 1 (excitatory)

Figure 9 Neural net for Y (Z1 OR Z2).

7. Using the linear separability concept, obtain the response for OR function (take bipolar inputs and bipolar targets).

Solution: Table 7 is the truth table for the OR function with bipolar inputs and targets.

Table 7
x1   x2   y
 1    1   1
 1   −1   1
−1    1   1
−1   −1  −1

The truth table inputs and corresponding outputs have been plotted in Figure 10. If the output is 1, it is denoted as "+", else "−". Assuming the coordinates (−1, 0) and (0, −1) to be (x1, y1) and (x2, y2), the slope "m" of the straight line can be obtained as

m = (y2 − y1)/(x2 − x1) = (−1 − 0)/(0 + 1) = −1

We now calculate c:

c = y1 − m x1 = 0 − (−1)(−1) = −1

Using these values, the equation for the line is given as

y = mx + c = (−1)x − 1 = −x − 1

Figure 10 Graph for 'OR' function.

Here the quadrants are not x and y but x1 and x2, so the above equation becomes

x2 = −x1 − 1   (2.1)

This can be written as

x2 = (−w1/w2) x1 − (b/w2)   (2.2)

Comparing Eqs. (2.1) and (2.2), we get

w1/w2 = 1;  b/w2 = 1

Therefore, w1 = 1, w2 = 1 and b = 1. Calculating the net input and output of the OR function on the basis of these weights and bias, we get the entries in Table 8.

Table 8
x1   x2   y_in = b + Σ xi wi   y = f(y_in)
 1    1    3                    1
 1   −1    1                    1
−1    1    1                    1
−1   −1   −1                   −1

Thus, the output of neuron Y can be written as

y = f(y_in) = 1 if y_in ≥ 1; −1 if y_in < 1

where the threshold is taken as "1" (θ = 1) based on the calculated net input. Hence, using the linear separability concept, the response is obtained for the "OR" function.

8. Design a Hebb net to implement logical AND function (use bipolar inputs and targets).

Solution: The training data for the AND function is given in Table 9.

Table 9
Inputs        Target
x1   x2   b   y
 1    1   1    1
 1   −1   1   −1
−1    1   1   −1
−1   −1   1   −1

The network is trained using the Hebb network training algorithm discussed in Section 2.7.3. Initially the weights and bias are set to zero, i.e.,

w1 = w2 = b = 0

First input [x1 x2 b] = [1 1 1] and target = 1 [i.e., y = 1]: Setting the initial weights as old weights and applying the Hebb rule, we get

wi(new) = wi(old) + xi y
w1(new) = w1(old) + x1 y = 0 + 1 × 1 = 1
w2(new) = w2(old) + x2 y = 0 + 1 × 1 = 1
b(new) = b(old) + y = 0 + 1 = 1

The weights calculated above are the final weights obtained after presenting the first input. These weights are used as the initial weights when the second input pattern is presented. The weight change here is Δwi = xi y. Hence, the weight changes relating to the first input are

Δw1 = x1 y = 1 × 1 = 1
Δw2 = x2 y = 1 × 1 = 1
Δb = y = 1

Second input [x1 x2 b] = [1 −1 1] and y = −1: The initial or old weights here are the final (new) weights obtained by presenting the first input pattern, i.e.,

[w1 w2 b] = [1 1 1]

The weight change here is

Δw1 = x1 y = 1 × (−1) = −1
Δw2 = x2 y = −1 × (−1) = 1
Δb = y = −1

The new weights here are

w1(new) = w1(old) + Δw1 = 1 − 1 = 0
w2(new) = w2(old) + Δw2 = 1 + 1 = 2
b(new) = b(old) + Δb = 1 − 1 = 0

Similarly, by presenting the third and fourth input patterns, the new weights can be calculated. Table 10 shows the values of the weights for all inputs.

Table 10
Inputs           Weight changes     Weights
x1   x2  b   y   Δw1  Δw2  Δb      w1  w2   b
                                   (0   0   0)
 1    1  1   1    1    1    1       1   1   1
 1   −1  1  −1   −1    1   −1       0   2   0
−1    1  1  −1    1   −1   −1       1   1  −1
−1   −1  1  −1    1    1   −1       2   2  −2

The separating line equation is given by

x2 = (−w1/w2) x1 − (b/w2)

For each input, use the weights obtained after presenting that input to obtain the separating line. For the first input [1 1 1], the separating line is

x2 = (−1/1) x1 − (1/1) ⟹ x2 = −x1 − 1

Similarly, for the second input [1 −1 1], the separating line is

x2 = (−0/2) x1 − (0/2) ⟹ x2 = 0
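The hand computation of Table 10 is easy to reproduce programmatically. The following sketch (variable names are illustrative choices) applies the Hebb update wi(new) = wi(old) + xi·y to the bipolar AND data:

```python
# Bipolar AND training pairs: ([x1, x2], target)
samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]

w = [0.0, 0.0]  # weights w1, w2, initialized to zero
b = 0.0         # bias

history = []
for x, y in samples:
    # Hebb rule: w_i(new) = w_i(old) + x_i * y and b(new) = b(old) + y
    w = [wi + xi * y for wi, xi in zip(w, x)]
    b += y
    history.append((list(w), b))

for weights, bias in history:
    print(weights, bias)  # last row is [2.0, 2.0] -2.0, matching Table 10
```

Each printed row corresponds to one row of the "Weights" columns of Table 10.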
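The two-layer McCulloch-Pitts solution to XOR from Problem 6 can likewise be verified in code; this sketch hard-codes the weights and thresholds obtained in that analysis:

```python
def mp_neuron(inputs, weights, theta):
    # McCulloch-Pitts neuron: fires (1) iff the net input
    # sum_i x_i * w_i reaches the fixed threshold theta.
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= theta else 0

def xor(x1, x2):
    # Weights from Problem 6: w11 = w22 = 1 (excitatory),
    # w12 = w21 = -1 (inhibitory), v1 = v2 = 1, theta = 1 throughout.
    z1 = mp_neuron([x1, x2], [1, -1], theta=1)   # z1 = x1 AND NOT x2
    z2 = mp_neuron([x1, x2], [-1, 1], theta=1)   # z2 = NOT x1 AND x2
    return mp_neuron([z1, z2], [1, 1], theta=1)  # y = z1 OR z2

for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, xor(*pair))
```

The intermediate layer (z1, z2) is what makes the nonlinearly separable XOR representable, as the analysis showed.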
network with bipolar sigmoidal units (λ = 1) to achieve the following two-to-one mappings:

• y = 6 sin(πx1) + cos(πx2)
• y = sin(πx1) cos(0.2πx2)

Set up two sets of data, each consisting of 10 input-output pairs, one for training and the other for testing. The input-output data are obtained by varying the input variables (x1, x2) within [−1, +1] randomly. Also, the output data are normalized within [−1, 1]. Apply training to find proper weights in the network.

Supervised Learning Network 3

Learning Objectives

The basic networks in supervised learning.
How the perceptron learning rule is better than the Hebb rule.
Original perceptron layer description.
Delta rule with single output unit.
Architecture, flowchart, training algorithm and testing algorithm for perceptron, Adaline, Madaline, back-propagation and radial basis function network.
The various learning factors used in BPN.
An overview of Time Delay, Functional Link, Wavelet and Tree Neural Networks.
Difference between back-propagation and RBF networks.

3.1 Introduction

This chapter covers major topics involving supervised learning networks and their associated single-layer and multilayer feed-forward networks. The following topics have been discussed in detail: the perceptron learning rule for simple perceptrons, the delta rule (Widrow-Hoff rule) for Adaline and single-layer feed-forward networks with continuous activation functions, and the back-propagation algorithm for multilayer feed-forward networks with continuous activation functions. In short, all the feed-forward networks have been explored.

3.2 Perceptron Networks

3.2.1 Theory

Perceptron networks come under single-layer feed-forward networks and are also called simple perceptrons. As described in Table 2-2 (Evolution of Neural Networks) in Chapter 2, various types of perceptrons were designed by Rosenblatt (1962) and Minsky-Papert (1969, 1988). However, a simple perceptron network was discovered by Block in 1962.

The key points to be noted in a perceptron network are:

1. The perceptron network consists of three units, namely, the sensory unit (input unit), the associator unit (hidden unit) and the response unit (output unit).

2. The sensory units are connected to associamr units with fixed weights having values 1, 0 or -l, which are Output
assigned at random. · - o·or 1 Output Desired
II
3. The binary activation function is used in sensory unit and associator unit. Fixed _weight
t Oar 1 output

4. The response unit has an'activarion of l, 0 or -1. The binary step wiili fixed threshold 9 is used as
activation for associator. The output signals £hat are sem from the associator unit to the response unit are
valUe ciN., 0, -1
at randorr\ .
\ 0 G) y,

9~
only binary.
5. TiiCQUt'put of the percepuon network is given by
- ---
i . \.--- '

~¢~
' iX1

{', r' '-


c

y = f(y,,)
X X X
\i
;x,
G) G)
..., \.~ X
:.I\
' < •
.,cl. 1 X I I \ Xn
where J(y;n) is activation function and is defmed as &.,
·'~
"
•<
· ~. tr Sensory unit 1

f(\- ,-. \~ ..
~
sensor grid " /
) ·~ if J;n> 9
)\t \ .. ._ representing any-·'
f(y;,) ={ if -9~y;11 56
'Z -1 if y;71 <-9
lJa~------ .
@ @ ry~
6. The perceptron learning rule is used in the weight updation between the associamr unit and the response e,
unit. For each training input, the net will calculate the response and it will Oetermine whelfier or not an Assoc1ator un~ . Response unit
error has occurred.
Figure 3·1 Ori~erceprron network.
w~t fL 9-·
7. The error calculation is based on the comparison of th~~~~~rgets with those of the ca1~t!!_~~ed
outputs. b'"~ r>-"-j ::.Kq>
(l.. u,•>J;>.-l? '<Y\
' ~ AJA I &J)
8. The weights on the connections from the units that send the nonzero signal will get adjusted suitably. I 3.2.2 Perceptron Learning Rule '
9. The weights will be adjusted on the basis of the learning_rykjf an error has occurred for a particular
training patre_!Jl.,..i.e..,- In case of the percepuon learrling rule, the learning signal is the difference between esir.ed...and.actuaL...- -·--,
response of a neuron. The perceptron learning rule is explained as follows.

Consider a finite number of input training vectors, with their associated target (desired) values x(n) and t(n), where "n" ranges from 1 to N. The target is either +1 or -1. The output "y" is obtained on the basis of the net input calculated and the activation function applied over the net input:

    y = f(y_in) = {  1   if y_in > θ
                     0   if -θ ≤ y_in ≤ θ
                    -1   if y_in < -θ

The weight updation in case of perceptron learning is as shown:

    If y ≠ t, then
        w(new) = w(old) + αtx   (α - learning rate)
    else, we have
        w(new) = w(old)

The weights and bias are thus adjusted as

    w_i(new) = w_i(old) + αtx_i
    b(new) = b(old) + αt

If no error occurs, there is no weight updation and hence the training process may be stopped. In the above equations, the target value "t" is +1 or -1 and α is the learning rate. In general, these learning rules begin with an initial guess at the weight values; successive adjustments are then made on the basis of the evaluation of an objective function. Eventually, the learning rules reach a near-optimal or optimal solution in a finite number of steps.

A perceptron network with its three units is shown in Figure 3-1. As shown in Figure 3-1, a sensory unit can be a two-dimensional matrix of 400 photodetectors upon which a lighted picture with a geometric black-and-white pattern impinges. These detectors provide a binary (0/1) electrical signal if the input signal is found to exceed a certain threshold value. The detectors are connected randomly with the associator unit. The associator unit consists of a set of subcircuits called feature predicates, which are hard-wired to detect the specific feature of a pattern and are equivalent to feature detectors. For a particular feature, each predicate is examined with a few or all of the responses of the sensory unit. The results from the predicate units are also binary (0 or 1). The last unit, i.e. the response unit, contains the pattern recognizers or perceptrons. The weights present in the input layers are all fixed, while the weights on the response unit are trainable.
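The updates above can be sketched in a few lines of code. This is a minimal illustration, not code from the text; the bipolar AND task, the threshold θ = 0.2 and all function names are assumptions made for the example.

```python
# Perceptron rule sketch: y = f(y_in) with a dead zone of width 2*theta,
# and w(new) = w(old) + alpha*t*x only when the response y differs from t.

def activation(y_in, theta=0.2):
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0

def train_perceptron(samples, alpha=1.0, theta=0.2, max_epochs=100):
    n = len(samples[0][0])
    w = [0.0] * n              # weights and bias start at zero
    b = 0.0
    for _ in range(max_epochs):
        changed = False
        for x, t in samples:   # one pass over the training pairs s:t
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))
            y = activation(y_in, theta)
            if y != t:         # update only when an error occurs
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        if not changed:        # stop when no weight changed in an epoch
            break
    return w, b

# Bipolar AND: target is +1 only when both inputs are +1
data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
weights, bias = train_perceptron(data)
```

On this separable task the loop stops as soon as one full epoch produces no weight change.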

training patterns, and this learning takes place within a finite number of steps provided that the solution exists."

Figure 3-2 Single classification perceptron network.

3.2.3 Architecture

In the original perceptron network, the output obtained from the associator unit is a binary vector, and hence that output can be taken as the input signal to the response unit, and classification can be performed. Here only the weights between the associator unit and the output unit can be adjusted, and the weights between the sensory and associator units are fixed. As a result, the discussion of the network is limited to a single portion. Thus, the associator unit behaves like the input unit. A simple perceptron network architecture is shown in Figure 3-2.

In Figure 3-2, there are n input neurons, 1 output neuron and a bias. The input-layer and output-layer neurons are connected through a directed communication link, which is associated with weights. The goal of the perceptron net is to classify the input pattern as a member or not a member of a particular class.
3.2.4 Flowchart for Training Process

The flowchart for the perceptron network training is shown in Figure 3-3. The network has to be suitably trained to obtain the response. The flowchart depicted here presents the flow of the training process. As depicted in the flowchart, first the basic initialization required for the training process is performed. The entire loop of the training process continues until the training input pair is presented to the network. The training (weight updation) is done on the basis of the comparison between the calculated and desired output. The loop is terminated if there is no change in weight.

Figure 3-3 Flowchart for perceptron network with single output.

3.2.5 Perceptron Training Algorithm for Single Output Classes

The perceptron algorithm can be used for either binary or bipolar input vectors, having bipolar targets, threshold being fixed and variable bias. The algorithm discussed in this section is not particularly sensitive to the initial values of the weights or the value of the learning rate. In the algorithm discussed below, initially the inputs are assigned. Then the net input is calculated. The output of the network is obtained by applying the activation function over the calculated net input. On performing comparison over the calculated and

the desired output, the weight updation process is carried out. The entire network is trained based on the mentioned stopping criterion. The algorithm of a perceptron network is as follows:

Step 0: Initialize the weights and the bias (for easy calculation they can be set to zero). Also initialize the learning rate α (0 < α ≤ 1). For simplicity, α is set to 1.
Step 1: Perform Steps 2-6 until the final stopping condition is false.
Step 2: Perform Steps 3-5 for each training pair indicated by s:t.
Step 3: The input layer containing input units is applied with identity activation functions:

    x_i = s_i

Step 4: Calculate the output of the network. To do so, first obtain the net input:

    y_in = b + Σ_{i=1 to n} x_i w_i

where "n" is the number of input neurons in the input layer. Then apply the activation over the net input calculated to obtain the output:

    y = f(y_in) = {  1   if y_in > θ
                     0   if -θ ≤ y_in ≤ θ
                    -1   if y_in < -θ

Step 5: Weight and bias adjustment: Compare the value of the actual (calculated) output and the desired (target) output.

    If y ≠ t, then
        w_i(new) = w_i(old) + αtx_i
        b(new) = b(old) + αt
    else, we have
        w_i(new) = w_i(old)
        b(new) = b(old)

Step 6: Train the network until there is no weight change. This is the stopping condition for the network. If this condition is not met, then start again from Step 2.

The algorithm discussed above is not sensitive to the initial values of the weights or the value of the learning rate.

3.2.6 Perceptron Training Algorithm for Multiple Output Classes

For multiple output classes, the perceptron training algorithm is as follows:

Step 0: Initialize the weights, biases and learning rate suitably.
Step 1: Check for stopping condition; if it is false, perform Steps 2-6.
Step 2: Perform Steps 3-5 for each bipolar or binary training vector pair s:t.
Step 3: Set activation (identity) of each input unit i = 1 to n:

    x_i = s_i

Step 4: Calculate the output response of each output unit j = 1 to m. First, the net input is calculated:

    y_inj = b_j + Σ_{i=1 to n} x_i w_ij

Then activations are applied over the net input to calculate the output response:

    y_j = f(y_inj) = {  1   if y_inj > θ
                        0   if -θ ≤ y_inj ≤ θ
                       -1   if y_inj < -θ

Step 5: Make adjustment in weights and bias for j = 1 to m and i = 1 to n.

    If t_j ≠ y_j, then
        w_ij(new) = w_ij(old) + αt_j x_i
        b_j(new) = b_j(old) + αt_j
    else, we have
        w_ij(new) = w_ij(old)
        b_j(new) = b_j(old)

Step 6: Test for the stopping condition, i.e., if there is no change in weights then stop the training process, else start again from Step 2.

It can be noticed that after training, the net classifies each of the training vectors. The above algorithm is suited for the architecture shown in Figure 3-4.

Figure 3-4 Network architecture for perceptron network for several output classes.

3.2.7 Perceptron Network Testing Algorithm

It is best to test the network performance once the training process is complete. For efficient performance of the network, it should be trained with more data. The testing algorithm (application procedure) is as follows:

Step 0: The initial weights to be used here are taken from the training algorithms (the final weights obtained during training).
Step 1: For each input vector x to be classified, perform Steps 2-3.
Step 2: Set the activations of the input units.
Step 3: Obtain the response of the output unit:

    y_in = Σ_{i=1 to n} x_i w_i

    y = f(y_in) = {  1   if y_in > θ
                     0   if -θ ≤ y_in ≤ θ
                    -1   if y_in < -θ

Thus, the testing algorithm tests the performance of the network.

The condition for separating the region of positive response from the region of zero response is

    w_1 x_1 + w_2 x_2 + b > θ

The condition for separating the region of zero response from the region of negative response is

    w_1 x_1 + w_2 x_2 + b < -θ

The conditions above are stated for a single-layer perceptron network with two input neurons, one output neuron and one bias.

3.3 Adaptive Linear Neuron (Adaline)

3.3.1 Theory

The units with linear activation function are called linear units. A network with a single linear unit is called an Adaline (adaptive linear neuron). That is, in an Adaline, the input-output relationship is linear. Adaline uses bipolar activation for its input signals and its target output. The weights between the input and the output are adjustable. The bias in Adaline acts like an adjustable weight, whose connection is from a unit with activation being always 1. Adaline is a net which has only one output unit. The Adaline network may be trained using the delta rule. The delta rule may also be called the least mean square (LMS) rule or Widrow-Hoff rule. This learning rule is found to minimize the mean-squared error between the activation and the target value.

3.3.2 Delta Rule for Single Output Unit

The Widrow-Hoff rule is very similar to the perceptron learning rule. However, their origins are different. The perceptron learning rule originates from the Hebbian assumption while the delta rule is derived from the gradient-descent method (it can be generalized to more than one layer). Also, the perceptron learning rule stops after a finite number of learning steps, but the gradient-descent approach continues forever, converging only asymptotically to the solution. The delta rule updates the weights between the connections so as to minimize the difference between the net input to the output unit and the target value. The major aim is to minimize the error over all training patterns. This is done by reducing the error for each pattern, one at a time.

The delta rule for adjusting the weight of the ith pattern (i = 1 to n) is

    Δw_i = α(t - y_in)x_i

where Δw_i is the weight change; α the learning rate; x the vector of activations of the input units; y_in the net input to the output unit, i.e., y_in = Σ_{i=1 to n} x_i w_i; and t the target output. The delta rule in case of several output units, for adjusting the weight from the ith input unit to the jth output unit (for each pattern), is

    Δw_ij = α(t_j - y_inj)x_i

3.3.3 Architecture

As already stated, Adaline is a single-unit neuron, which receives input from several units and also from one unit called bias. An Adaline model is shown in Figure 3-5. The basic Adaline model consists of trainable weights. Inputs are either of the two values (+1 or -1) and the weights have signs (positive or negative). Initially, random weights are assigned. The net input calculated is applied to a quantizer transfer function (possibly activation function) that restores the output to +1 or -1. The Adaline model compares the actual output with the target output and, on the basis of the training algorithm, the weights are adjusted.

3.3.4 Flowchart for Training Process

The flowchart for the training process is shown in Figure 3-6. This gives a pictorial representation of the network training. The conditions necessary for weight adjustments have to be checked carefully. The weights and other required parameters are initialized. Then the net input is calculated, the output is obtained and compared with the desired output for calculation of error. On the basis of the error factor, the weights are adjusted.

Figure 3-5 Adaline model.
3.3.5 Training Algorithm

The Adaline network training algorithm is as follows:

Step 0: Weights and bias are set to some random values but not zero. Set the learning rate parameter α.
Step 1: Perform Steps 2-6 when stopping condition is false.
Step 2: Perform Steps 3-5 for each bipolar training pair s:t.
Step 3: Set activations for input units i = 1 to n:

    x_i = s_i

Step 4: Calculate the net input to the output unit:

    y_in = b + Σ_{i=1 to n} x_i w_i

Step 5: Update the weights and bias for i = 1 to n:

    w_i(new) = w_i(old) + α(t - y_in)x_i
    b(new) = b(old) + α(t - y_in)

Step 6: If the highest weight change that occurred during training is smaller than a specified tolerance, then stop the training process, else continue. This is the test for the stopping condition of a network.

The range of the learning rate can be between 0.1 and 1.0.

Figure 3-6 Flowchart for Adaline training process.
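The steps above can be sketched as follows. This is a hedged illustration rather than the book's code: the bipolar AND data, the 0.1 starting weights and the tolerance are assumptions chosen for the example.

```python
# Adaline (delta/LMS rule) sketch: w_i(new) = w_i(old) + alpha*(t - y_in)*x_i.
# Note the error is taken on the net input y_in, not on a thresholded output.

def train_adaline(samples, alpha=0.1, tol=1e-4, max_epochs=1000):
    n = len(samples[0][0])
    w = [0.1] * n                  # Step 0: small nonzero starting weights
    b = 0.1
    for _ in range(max_epochs):
        largest_change = 0.0
        for x, t in samples:       # Steps 3-5 for each bipolar pair s:t
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))
            err = t - y_in
            w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
            b += alpha * err
            largest_change = max(largest_change, abs(alpha * err))
        if largest_change < tol:   # Step 6: stop on a tiny weight change
            break
    return w, b

def classify(x, w, b):
    # Step-function quantizer used at testing time (Section 3.3.6)
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1 if y_in >= 0 else -1

data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = train_adaline(data)
```

Because bipolar AND is not exactly representable by a linear unit, the LMS error never reaches zero; the weights settle near the least-squares solution (roughly w = (0.5, 0.5), b = -0.5), and the quantizer still classifies every pattern correctly.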

3.3.6 Testing Algorithm

It is essential to perform the testing of a network that has been trained. When training is completed, the Adaline can be used to classify input patterns. A step function is used to test the performance of the network. The testing procedure for the Adaline network is as follows:

Step 0: Initialize the weights. (The weights are obtained from the training algorithm.)
Step 1: Perform Steps 2-4 for each bipolar input vector x.
Step 2: Set the activations of the input units to x.
Step 3: Calculate the net input to the output unit:

    y_in = b + Σ x_i w_i

Step 4: Apply the activation function over the net input calculated:

    y = {  1   if y_in ≥ 0
          -1   if y_in < 0

3.4 Multiple Adaptive Linear Neurons

3.4.1 Theory

The multiple adaptive linear neurons (Madaline) model consists of many Adalines in parallel with a single output unit whose value is based on certain selection rules. It may use the majority vote rule. On using this rule, the output would have as answer either true or false. On the other hand, if the AND rule is used, the output is true if and only if both the inputs are true, and so on. The weights that are connected from the Adaline layer to the Madaline layer are fixed, positive and possess equal values. The weights between the input layer and the Adaline layer are adjusted during the training process. The Adaline and Madaline layer neurons have a bias of excitation "1" connected to them. The training process for a Madaline system is similar to that of an Adaline.

3.4.2 Architecture

A simple Madaline architecture is shown in Figure 3-7, which consists of "n" units of input layer, "m" units of Adaline layer and "1" unit of the Madaline layer. Each neuron in the Adaline and Madaline layers has a bias of excitation 1. The Adaline layer is present between the input layer and the Madaline (output) layer; hence, the Adaline layer can be considered a hidden layer. The use of the hidden layer gives the net a computational capability which is not found in single-layer nets, but this complicates the training process to some extent. The Adaline and Madaline models can be applied effectively in communication systems of adaptive equalizers and adaptive noise cancellation and other cancellation circuits.

Figure 3-7 Architecture of Madaline layer.

3.4.3 Flowchart of Training Process

The flowchart of the training process of the Madaline network is shown in Figure 3-8. In case of training, the weights between the input layer and the hidden layer are adjusted, and the weights between the hidden layer and the output layer are fixed. The time taken for the training process in the Madaline network is very high compared to that of the Adaline network.

3.4.4 Training Algorithm

In this training algorithm, only the weights between the hidden layer and the input layer are adjusted, and the weights for the output units are fixed. The weights v_1, v_2, ..., v_m and the bias b_0 that enter into output unit Y are determined so that the response of unit Y is 1. Thus, the weights entering the Y unit may be taken as

    v_1 = v_2 = ··· = v_m = 1/2

and the bias can be taken as

    b_0 = 1/2

The activation for the Adaline (hidden) and Madaline (output) units is given by

    f(x) = {  1   if x ≥ 0
             -1   if x < 0

Step 0: Initialize the weights. The weights entering the output unit are set as above. Set initial small random values for the Adaline weights. Also set the initial learning rate α.
Step 1: When stopping condition is false, perform Steps 2-8.
Step 2: For each bipolar training pair s:t, perform Steps 3-7.
Step 3: Activate input layer units. For i = 1 to n,

    x_i = s_i

Step 4: Calculate the net input to each hidden Adaline unit:

    z_inj = b_j + Σ_{i=1 to n} x_i w_ij,   j = 1 to m

Figure 3-8 Flowchart for training of Madaline.
-

Step 5: Calculate the output of each hidden unit:

    z_j = f(z_inj)

Step 6: Find the output of the net:

    y_in = b_0 + Σ_{j=1 to m} z_j v_j

    y = f(y_in)

Step 7: Calculate the error and update the weights.

    1. If t = y, no weight updation is required.
    2. If t ≠ y and t = +1, update weights on z_j, where the net input is closest to 0 (zero):
        b_j(new) = b_j(old) + α(1 - z_inj)
        w_ij(new) = w_ij(old) + α(1 - z_inj)x_i
    3. If t ≠ y and t = -1, update weights on units z_k whose net input is positive:
        w_ik(new) = w_ik(old) + α(-1 - z_ink)x_i
        b_k(new) = b_k(old) + α(-1 - z_ink)

Step 8: Test for the stopping condition. (If there is no weight change or the weights reach a satisfactory level, or if a specified maximum number of iterations of weight updation have been performed, then stop; else continue.)

Madalines can be formed with the weights on the output unit set to perform some logic functions. If there are only two hidden units present, or if there are more than two hidden units, then the "majority vote rule" function may be used.

3.5 Back-Propagation Network

3.5.1 Theory

The back-propagation learning algorithm is one of the most important developments in neural networks (Bryson and Ho, 1969; Werbos, 1974; LeCun, 1985; Parker, 1985; Rumelhart, 1986). This network has re-awakened the scientific and engineering community to the modeling and processing of numerous quantitative phenomena using neural networks. The learning algorithm is applied to multilayer feed-forward networks consisting of processing elements with continuous differentiable activation functions. The networks associated with the back-propagation learning algorithm are also called back-propagation networks (BPNs). For a given set of training input-output pairs, this algorithm provides a procedure for changing the weights in a BPN to classify the given input patterns correctly. The basic concept for this weight update algorithm is simply the gradient-descent method as used in the case of simple perceptron networks with differentiable units. This is a method where the error is propagated back to the hidden units. The aim of the neural network is to train the net to achieve a balance between the net's ability to respond (memorization) and its ability to give reasonable responses to input that is similar but not identical to the one that is used in training (generalization).

The back-propagation algorithm is different from other networks in respect to the process by which the weights are calculated during the learning period of the network. The general difficulty with multilayer perceptrons is calculating the weights of the hidden layers in an efficient way that would result in a very small or zero output error. When the hidden layers are increased, the network training becomes more complex. To update the weights, the error must be calculated. The error, which is the difference between the actual (calculated) and the desired (target) output, is easily measured at the output layer. It should be noted that at the hidden layers, there is no direct information of the error. Therefore, other techniques should be used to calculate an error at the hidden layer, which will cause minimization of the output error, and this is the ultimate goal. The training of the BPN is done in three stages - the feed-forward of the input training pattern, the calculation and back-propagation of the error, and updation of weights. The testing of the BPN involves the computation of the feed-forward phase only. There can be more than one hidden layer (more beneficial) but one hidden layer is sufficient. Even though the training is very slow, once the network is trained it can produce its outputs very rapidly.

3.5.2 Architecture

A back-propagation neural network is a multilayer, feed-forward neural network consisting of an input layer, a hidden layer and an output layer. The neurons present in the hidden and output layers have biases, which are the connections from the units whose activation is always 1. The bias terms also act as weights. Figure 3-9 shows the architecture of a BPN, depicting only the direction of information flow for the feed-forward phase. During the back-propagation phase of learning, signals are sent in the reverse direction. The inputs sent to the BPN and the output obtained from the net could be either binary (0, 1) or bipolar (-1, +1). The activation function could be any function which increases monotonically and is also differentiable.

Figure 3-9 Architecture of a back-propagation network.
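A function meeting both requirements is the bipolar sigmoid of Section 2.3.3; the sketch below (an illustration, not code from the text) also shows its derivative, which the training algorithm needs:

```python
import math

def bipolar_sigmoid(x):
    # Monotonically increasing, differentiable, range (-1, +1)
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def bipolar_sigmoid_deriv(x):
    # Derivative expressed through the function value itself:
    # f'(x) = [1 + f(x)][1 - f(x)] / 2
    fx = bipolar_sigmoid(x)
    return 0.5 * (1.0 + fx) * (1.0 - fx)
```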

3.5.3 Flowchart for Training Process

The flowchart for the training process using a BPN is shown in Figure 3-10. The terminologies used in the flowchart and in the training algorithm are as follows:

x = input training vector (x_1, ..., x_i, ..., x_n)
t = target output vector (t_1, ..., t_k, ..., t_m)
α = learning rate parameter
x_i = input unit i. (Since the input layer uses the identity activation function, the input and output signals here are the same.)
v_0j = bias on the jth hidden unit
w_0k = bias on the kth output unit
z_j = hidden unit j. The net input to z_j is

    z_inj = v_0j + Σ_{i=1 to n} x_i v_ij

and the output is

    z_j = f(z_inj)

y_k = output unit k. The net input to y_k is

    y_ink = w_0k + Σ_{j=1 to p} z_j w_jk

and the output is

    y_k = f(y_ink)

δ_k = error correction weight adjustment for w_jk that is due to an error at output unit y_k, which is back-propagated to the hidden units that feed into unit y_k
δ_j = error correction weight adjustment for v_ij that is due to the back-propagation of error to the hidden unit z_j

Also, it should be noted that the commonly used activation functions are the binary sigmoidal and bipolar sigmoidal activation functions (discussed in Section 2.3.3). These functions are used in the BPN because of the following characteristics: (i) continuity; (ii) differentiability; (iii) nondecreasing monotony. The range of the binary sigmoid is from 0 to 1, and for the bipolar sigmoid it is from -1 to +1.

Figure 3-10 Flowchart for training of back-propagation network.

3.5.4 Training Algorithm

The error back-propagation learning algorithm can be outlined as follows:

Step 0: Initialize weights and learning rate (take some small random values).
Step 1: Perform Steps 2-9 when stopping condition is false.
Step 2: Perform Steps 3-8 for each training pair.
Feed-forward phase (Phase I):

Step 3: Each input unit receives the input signal x_i and sends it to the hidden units (i = 1 to n).
Step 4: Each hidden unit z_j (j = 1 to p) sums its weighted input signals to calculate the net input:

    z_inj = v_0j + Σ_{i=1 to n} x_i v_ij

Calculate the output of the hidden unit by applying its activation function over z_inj (binary or bipolar sigmoidal activation function):

    z_j = f(z_inj)

and send the output signal from the hidden unit to the input of the output layer units.
Step 5: For each output unit y_k (k = 1 to m), calculate the net input:

    y_ink = w_0k + Σ_{j=1 to p} z_j w_jk

and apply the activation function to compute the output signal:

    y_k = f(y_ink)

Back-propagation of error (Phase II):

Step 6: Each output unit y_k (k = 1 to m) receives a target pattern corresponding to the input training pattern and computes the error correction term:

    δ_k = (t_k - y_k) f'(y_ink)

The derivative f'(y_ink) can be calculated as in Section 2.3.3. On the basis of the calculated error correction term, update the change in weights and bias:

    Δw_jk = αδ_k z_j;   Δw_0k = αδ_k

Also, send δ_k to the hidden layer backwards.
Step 7: Each hidden unit (z_j, j = 1 to p) sums its delta inputs from the output units:

    δ_inj = Σ_{k=1 to m} δ_k w_jk

The term δ_inj gets multiplied with the derivative of f(z_inj) to calculate the error term:

    δ_j = δ_inj f'(z_inj)

The derivative f'(z_inj) can be calculated as discussed in Section 2.3.3 depending on whether the binary or bipolar sigmoidal function is used. On the basis of the calculated δ_j, update the change in weights and bias:

    Δv_ij = αδ_j x_i;   Δv_0j = αδ_j
Weight and bias updation (Phase III):

Step 8: Each output unit (y_k, k = 1 to m) updates the bias and weights:

    w_jk(new) = w_jk(old) + Δw_jk
    w_0k(new) = w_0k(old) + Δw_0k

Each hidden unit (z_j, j = 1 to p) updates its bias and weights:

    v_ij(new) = v_ij(old) + Δv_ij
    v_0j(new) = v_0j(old) + Δv_0j

Step 9: Check for the stopping condition. The stopping condition may be a certain number of epochs reached or the actual output equalling the target output.

The above algorithm uses the incremental approach for updation of weights, i.e., the weights are changed immediately after a training pattern is presented. There is another way of training called batch-mode training, where the weights are changed only after all the training patterns are presented. The effectiveness of the two approaches depends on the problem, but batch-mode training requires additional local storage for each connection to maintain the immediate weight changes. When a BPN is used as a classifier, it is equivalent to the optimal Bayesian discriminant function for asymptotically large sets of statistically independent training patterns.

The problem in this case is whether the back-propagation learning algorithm can always converge and find proper weights for the network even after enough learning. It will converge since it implements a gradient descent on the error surface in the weight space, and this will roll down the error surface to the nearest minimum error and stop. This becomes true only when the relation existing between the input and the output training patterns is deterministic and the error surface is deterministic. This is not the case in the real world because the produced square-error surfaces are always at random. This is the stochastic nature of the back-propagation algorithm, which is purely based on the stochastic gradient-descent method. The BPN is a special case of stochastic approximation.

If the BPN algorithm converges at all, then it may get stuck at local minima and may be unable to find satisfactory solutions. The randomness of the algorithm helps it to get out of local minima. The error functions may have a large number of global minima because of permutations of weights that keep the network input-output function unchanged. This causes the error surfaces to have numerous troughs.

3.5.5 Learning Factors of Back-Propagation Network

The training of a BPN is based on the choice of various parameters. Also, the convergence of the BPN is based on some important learning factors such as the initial weights, the learning rate, the updation rule, the size and nature of the training set, and the architecture (number of layers and number of neurons per layer).

3.5.5.1 Initial Weights

The ultimate solution may be affected by the initial weights of a multilayer feed-forward network. They are initialized at small random values. The choice of initial weights determines how fast the network converges. The initial weights cannot be very high because the sigmoidal activation functions used here may get saturated from the beginning itself, and the system may be stuck at a local minimum or at a very flat plateau at the starting point itself. One method of choosing the weights is choosing them in the range

    [-3/√o_i, 3/√o_i]

where o_i is the number of input connections feeding into unit i. The weights may also be scaled as

    v_ij(new) = γ v_ij(old) / ||v_j(old)||

where v_j is the average weight calculated for all values of i, and the scale factor γ = 0.7(P)^(1/n) ("n" is the number of input neurons and "P" is the number of hidden neurons).

3.5.5.2 Learning Rate α

The learning rate (α) affects the convergence of the BPN. A larger value of α may speed up the convergence but might result in overshooting, while a smaller value of α has the vice-versa effect. The range of α from 10^-3 to 10 has been used successfully for several back-propagation algorithmic experiments. Thus, a large learning rate leads to rapid learning but there is oscillation of weights, while a lower learning rate leads to slower learning.

3.5.5.3 Momentum Factor

The gradient descent is very slow if the learning rate α is small and oscillates widely if α is too large. One very efficient and commonly used method that allows a larger learning rate without oscillations is adding a momentum factor to the normal gradient-descent method. The momentum factor is denoted by η ∈ [0, 1], and a value of 0.9 is often used for it. This approach is also more useful when some training data are very different from the majority of the data. A momentum factor can be used with either pattern-by-pattern updating or batch-mode updating. In case of batch mode, it has the effect of complete averaging over the patterns. Even though the averaging is only partial in the pattern-by-pattern mode, it leaves some useful information for weight updation.

The weight updation formulas used here are

    w_jk(t+1) = w_jk(t) + αδ_k z_j + η[w_jk(t) - w_jk(t-1)]

and

    v_ij(t+1) = v_ij(t) + αδ_j x_i + η[v_ij(t) - v_ij(t-1)]

where the momentum terms η[w_jk(t) - w_jk(t-1)] = ηΔw_jk(t) and η[v_ij(t) - v_ij(t-1)] = ηΔv_ij(t) reuse the weight changes of the previous step. The momentum factor also helps in faster convergence.


3.5.5.4 Generalization

The best network for generalization is BPN. A network is said to be generalized when it sensibly interpolates with input patterns that are new to the network. When there are many trainable parameters for the given amount of training data, the network learns well but does not generalize well. This is usually called overfitting or overtraining. One solution to this problem is to monitor the error on the test set and terminate the training when the error increases. With a small number of trainable parameters, the network fails to learn the training data and performs very poorly on the test data. For improving the ability of the network to generalize from a training data set to a test data set, it is desirable to make small changes in the input space of a pattern without changing the output components. This is achieved by introducing variations in the input space of the training patterns as part of the training set. However, this method is computationally very expensive. Also, a net with a large number of nodes is capable of memorizing the training set at the cost of generalization. As a result, smaller nets are preferred over larger ones.

3.5.5.5 Number of Training Data

The training data should be sufficient and proper. There exists a rule of thumb which states that the training data should cover the entire expected input space, and during training, training-vector pairs should be selected randomly from the set. Assume the input space to be linearly separable into "L" disjoint regions with their boundaries being part of hyperplanes. Let "T" be the lower bound on the number of training patterns. Then, choosing T such that T/L >> 1 will allow the network to discriminate pattern classes using fine piecewise hyperplane partitioning. Also, in some cases, scaling or normalization has to be done to help learning.

3.5.5.6 Number of Hidden Layer Nodes

If there exists more than one hidden layer in a BPN, the calculations performed for a single layer are repeated for all the layers and are summed up at the end. In the case of all multilayer feed-forward networks, the size of a hidden layer is very important. The number of hidden units required for an application needs to be determined separately. The size of a hidden layer is usually determined experimentally. For a network of reasonable size, the size of hidden nodes is a relatively small fraction of the input layer. For example, if the network does not converge to a solution, it may need more hidden nodes; on the other hand, if the network converges, the user may try a very few hidden nodes and then settle finally on a size based on overall system performance.

3.5.6 Testing Algorithm of Back-Propagation Network

The testing procedure of the BPN is as follows:

Step 0: Initialize the weights. The weights are taken from the training algorithm.
Step 1: Perform Steps 2-4 for each input vector.
Step 2: Set the activation of input unit x_i (i = 1 to n).
Step 3: Calculate the net input to each hidden unit and its output. For j = 1 to p,

z_inj = v0j + Σ_{i=1}^{n} x_i v_ij
z_j = f(z_inj)

Step 4: Now compute the output of the output layer unit. For k = 1 to m,

y_ink = w0k + Σ_{j=1}^{p} z_j w_jk
y_k = f(y_ink)

Use sigmoidal activation functions for calculating the output.

I 3.6 Radial Basis Function Network

I 3.6.1 Theory

The radial basis function (RBF) is a classification and functional approximation neural network developed by M.J.D. Powell. The network uses the most common nonlinearities such as sigmoidal and Gaussian kernel functions. The Gaussian functions are also used in regularization networks. The response of such a function is positive for all values of y; the response decreases to 0 as |y| tends to infinity. The Gaussian function is generally defined as

f(y) = e^(-y^2)

The derivative of this function is given by

f'(y) = -2y e^(-y^2) = -2y f(y)

The graphical representation of this Gaussian function is shown in Figure 3-11 below.

Figure 3-11 Gaussian kernel function.

When the Gaussian potential functions are being used, each node is found to produce an identical output for inputs existing within the fixed radial distance from the center of the kernel; they are found to be radially symmetric, and hence the name radial basis function network. The entire network forms a linear combination of the nonlinear basis functions.
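The testing pass of Steps 0-4 can be sketched in code. The routine below is a minimal illustration, not the book's own program; the function name and the array layout of the weight matrices are our assumptions. It applies Step 3 and Step 4 with the binary sigmoid.

```python
import math

def bpn_test(x, v, v0, w, w0):
    """Forward (testing) pass of a trained BPN.

    x  : input vector (length n)
    v  : input-to-hidden weights, v[i][j] connects input i to hidden unit j
    v0 : hidden-unit biases (length p)
    w  : hidden-to-output weights, w[j][k] connects hidden unit j to output k
    w0 : output-unit biases (length m)
    """
    f = lambda a: 1.0 / (1.0 + math.exp(-a))  # binary sigmoidal activation
    # Step 3: net input and activation of each hidden unit
    z = [f(v0[j] + sum(x[i] * v[i][j] for i in range(len(x))))
         for j in range(len(v0))]
    # Step 4: net input and activation of each output unit
    return [f(w0[k] + sum(z[j] * w[j][k] for j in range(len(z))))
            for k in range(len(w0))]
```

With all weights zero, every net input is zero, so each sigmoid returns 0.5; that gives a quick sanity check of the two layers.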



Figure 3-12 Architecture of RBF (input layer, hidden RBF layer, output layer).

I 3.6.2 Architecture

The architecture of the radial basis function network (RBFN) is shown in Figure 3-12. The architecture consists of two layers whose output nodes form a linear combination of the kernel (or basis) functions computed by the RBF nodes or hidden layer nodes. The basis function (nonlinearity) in the hidden layer produces a significant nonzero response to the input stimulus it has received only when the input falls within a small localized region of the input space. This network can also be called a localized receptive field network.

I 3.6.3 Flowchart for Training Process

The flowchart for the training process of the RBF is shown in Figure 3-13 below. In this case, the centers of the RBF functions have to be chosen, and then, based on all the parameters, the output of the network is calculated.

I 3.6.4 Training Algorithm

The training algorithm describes in detail all the calculations involved in the training process depicted in the flowchart. The training is started in the hidden layer with an unsupervised learning algorithm. The training is continued in the output layer with a supervised learning algorithm. Simultaneously, we can also apply a supervised learning algorithm to the hidden and output layers for fine-tuning of the network. The training algorithm is given as follows.

Step 0: Set the weights to small random values.
Step 1: Perform Steps 2-8 when the stopping condition is false.
Step 2: Perform Steps 3-7 for each input.
Step 3: Each input unit (x_i for all i = 1 to n) receives input signals and transmits them to the next hidden layer unit.

Figure 3-13 Flowchart for the training process of RBF.
Step 4: Calculate the radial basis function.
Step 5: Select the centers for the radial basis function. The centers are selected from the set of input vectors. It should be noted that a sufficient number of centers have to be selected to ensure adequate sampling of the input vector space.
Step 6: Calculate the output from the hidden layer unit:

v_i(x_i) = exp[ -Σ_{j=1}^{r} (x_ji - x̂_ji)^2 / σ_i^2 ]

where x̂_ji is the center of the ith RBF unit for the input variables; σ_i the width of the ith RBF unit; x_ji the jth variable of the input pattern.
Step 7: Calculate the output of the neural network:

y_nm = Σ_{i=1}^{k} w_im v_i(x_i) + w0

where k is the number of hidden layer nodes (RBF functions); y_nm the output value of the mth node in the output layer for the nth incoming pattern; w_im the weight between the ith RBF unit and the mth output node; w0 the biasing term at the nth output node.
Step 8: Calculate the error and test for the stopping condition. The stopping condition may be a number of epochs or a certain extent of weight change.

Thus, a network can be trained using the RBFN.

Figure 3-14 Time delay neural network (FIR filter).

Figure 3-15 TDNN with output feedback (IIR filter).
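Steps 6 and 7 can be illustrated with a short sketch. This is an assumption-laden illustration rather than the book's code: the function name and argument layout are ours, and a single output node is assumed. It evaluates the Gaussian hidden units and then the linear output combination.

```python
import math

def rbf_output(x, centers, sigma, w, w0):
    """Output of an RBF network for one input vector x.

    centers : list of RBF centers (chosen from the input vectors, Step 5)
    sigma   : width of each RBF unit
    w       : weights from each RBF unit to the single output node
    w0      : biasing term at the output node
    """
    # Step 6: Gaussian response of each hidden (RBF) unit
    v = [math.exp(-sum((xj - cj) ** 2 for xj, cj in zip(x, c)) / (s ** 2))
         for c, s in zip(centers, sigma)]
    # Step 7: linear combination at the output node
    return w0 + sum(wi * vi for wi, vi in zip(w, v))
```

When the input coincides with a center, that unit's response is exp(0) = 1, so the output reduces to the bias plus that unit's weight.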

I 3.7 Time Delay Neural Network

The neural network has to respond to a sequence of patterns. Here the network is required to produce a particular output sequence in response to a particular sequence of inputs. A shift register can be considered as a tapped delay line. Consider a case of a multilayer perceptron where the tapped outputs of the delay line are applied to its inputs. This type of network constitutes a time delay neural network (TDNN). The output consists of a finite temporal dependence on its inputs, given as

U(t) = F[x(t), x(t-1), ..., x(t-n)]

where F is any nonlinearity function. The multilayer perceptron with delay line is shown in Figure 3-14. When the function U(t) is a weighted sum, the TDNN is equivalent to a finite impulse response (FIR) filter. In a TDNN, when the output is fed back through a unit delay into the input layer, the net computed here is equivalent to an infinite impulse response (IIR) filter. Figure 3-15 shows a TDNN with output feedback.

Thus, a neuron with a tapped delay line is called a TDNN unit, and a network which consists of TDNN units is called a TDNN. A specific application of TDNNs is speech recognition. The TDNN can be trained using the back-propagation learning rule with a momentum factor.

I 3.8 Functional Link Networks

These networks are specifically designed for handling linearly non-separable problems using an appropriate input representation. Thus, a suitable enhanced representation of the input data has to be found. This can be achieved by increasing the dimensions of the input space. The expanded input data, rather than the actual input data, is used for training. In this case, higher-order input terms are chosen so that they are linearly independent of the original pattern components. Thus, the input representation is enhanced and linear separability can be achieved in the extended space. One of the functional link model networks is shown in Figure 3-16. This model is helpful for learning continuous functions. For this model, the higher-order input terms are obtained using orthogonal basis functions such as sin πx, cos πx, sin 2πx, cos 2πx, etc.

The most common example of linear nonseparability is the XOR problem, and the functional link networks help in solving it. The inputs now are

x1   x2   x1x2   t
 1    1     1    -1
 1   -1    -1     1
-1    1    -1     1
-1   -1     1    -1
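The expanded XOR representation described above can be trained with the delta (LMS) rule on a single layer. The sketch below is our own minimal illustration (the epoch count, learning rate and variable names are our choices): the product term x1*x2 makes the pattern linearly separable, so a single unit learns it.

```python
# inputs expanded with the product term x1*x2; targets are bipolar XOR
data = [((1, 1, 1), -1), ((1, -1, -1), 1), ((-1, 1, -1), 1), ((-1, -1, 1), -1)]

w, b, alpha = [0.0, 0.0, 0.0], 0.0, 0.1
for _ in range(50):                      # delta-rule (LMS) training epochs
    for x, t in data:
        y_in = b + sum(wi * xi for wi, xi in zip(w, x))
        err = t - y_in                   # delta rule: w += alpha*(t - y_in)*x
        w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
        b += alpha * err

# in the extended space the pattern is separable: sign(y_in) matches XOR
outputs = [1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
           for x, _ in data]
```

Since the target here is exactly t = -x1x2, the weights converge toward w = (0, 0, -1) with zero bias, which is why no hidden layer is needed.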

Figure 3-16 Functional link network model.

Figure 3-17 The XOR problem.

Thus, it can be easily seen that the functional link network shown in Figure 3-17 is used for solving this problem. The functional link network consists of only one layer; therefore, it can be trained using the delta learning rule instead of the generalized delta learning rule used in BPN. As a result, the learning speed of the functional link network is faster than that of the BPN.

I 3.9 Tree Neural Networks

The tree neural networks (TNNs) are used for the pattern recognition problem. The main concept of this network is to use a small multilayer neural network at each decision-making node of a binary classification tree for extracting the nonlinear features. TNNs completely extract the power of tree classifiers by using appropriate local features at the different levels and nodes of the tree. A binary classification tree is shown in Figure 3-18.

Figure 3-18 Binary classification tree.

The decision nodes are drawn as circular nodes and the terminal nodes as square nodes. Each terminal node has a class label, denoted by C, associated with it. The rule base is formed at the decision nodes (a splitting rule of the form f(x) < θ). The rule determines whether the pattern moves to the right or to the left. Here f(x) indicates the associated feature of the pattern and "θ" is the threshold. The pattern is given the class label of the terminal node on which it lands. The classification here is based on the fact that appropriate features can be selected at different nodes and levels in the tree. The output feature y = f(x) obtained by a multilayer network at a particular decision node is used in the following way:

x directed to left child node tL, if y < 0
x directed to right child node tR, if y >= 0

The algorithm for a TNN consists of two phases:

1. Tree growing phase: In this phase, a large tree is grown by recursively finding the rules for splitting until all the terminal nodes have pure or nearly pure class membership; otherwise the tree cannot split further.
2. Tree pruning phase: Here a smaller tree is selected from the pruned subtrees to avoid the overfitting of data.

The training of a TNN involves two nested optimization problems. In the inner optimization problem, the BPN algorithm can be used to train the network for a given pair of classes. In the outer optimization problem, a heuristic search method is used to find a good pair of classes. The TNN, when tested on a character recognition problem, decreases the error rate and the size of the tree relative to those of standard classification tree design methods. The TNN can also be implemented for a waveform recognition problem, where it obtains comparable error rates and trains faster than a large BPN for the same application. Also, TNN provides a structured approach to neural network classifier design problems.

I 3.10 Wavelet Neural Networks

The wavelet neural network (WNN) is based on wavelet transform theory. This network helps in approximating arbitrary nonlinear functions; wavelet decomposition is a powerful tool for function approximation.

Let f(x) be a piecewise continuous function. This function can be decomposed into a family of functions, obtained by dilating and translating a single wavelet function φ: R^n -> R, as

f(x) = Σ_{i=1}^{N} w_i det[D_i^(1/2)] φ[D_i(x - t_i)]

where D_i = diag(d_i), with d_i in R+^n the dilation vectors; t_i the translation vectors; and det[·] the determinant operator. The wavelet function φ selected should satisfy certain properties. For selecting φ: R^n -> R, the condition may be

φ(x) = φ1(x1) φ1(x2) ... φ1(xn)   for x = (x1, x2, ..., xn)
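The wavelet decomposition above can be sketched numerically in one dimension. This is our own illustration under stated assumptions: the function and variable names are ours, and the mother wavelet is the Gaussian-derivative scalar wavelet φ1(u) = -u exp(-u^2/2) that this section uses.

```python
import math

def wavelon(x, d, t):
    """One-dimensional wavelon: phi(d * (x - t)) with the scalar wavelet
    phi(u) = -u * exp(-u**2 / 2)."""
    u = d * (x - t)
    return -u * math.exp(-u * u / 2.0)

def wnn_output(x, w, d, t, y_bar=0.0):
    """Wavelet network output y(x) = sum_i w_i * phi(d_i * (x - t_i)) + y_bar,
    with dilations d_i, translations t_i; y_bar handles nonzero-mean
    functions on finite domains."""
    return y_bar + sum(wi * wavelon(x, di, ti) for wi, di, ti in zip(w, d, t))
```

At its own translation point each wavelon responds with zero (the scalar wavelet is odd), so the network output there reduces to y_bar.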

In the above product form,

φ1(x) = -x exp(-x^2/2)

is called the scalar wavelet. The network structure can be formed based on the wavelet decomposition as

y(x) = Σ_{i=1}^{N} w_i φ[D_i(x - t_i)] + ȳ

where ȳ helps to deal with nonzero mean functions on finite domains. For proper dilation, a rotation can be made for better network operation:

y(x) = Σ_{i=1}^{N} w_i φ[D_i R_i(x - t_i)] + ȳ

where R_i are the rotation matrices. The network which performs according to the above equation is called a wavelet neural network. It is a combination of translation, rotation and dilation; and if a wavelet is lying on the same line, then it is called a wavelon, in comparison to the neurons in neural networks. The wavelet neural network is shown in Figure 3-19.

Figure 3-19 Wavelet neural network.

I 3.11 Summary

In this chapter we have discussed the supervised learning networks. In most of the classification and recognition problems, the widely used networks are the supervised learning networks. The architecture, the learning rule, the flowchart for the training process and the training algorithm are discussed in detail for the perceptron network, Adaline, Madaline, back-propagation network and radial basis function network. The perceptron network can be trained for single output classes as well as multioutput classes. Also, many Adaline networks combine together to form a Madaline network. These networks are trained using the delta learning rule. The back-propagation network is the most commonly used network in real-time applications. The error is back-propagated here and is fine-tuned for achieving better performance. The basic difference between the back-propagation network and the radial basis function network is the activation function used: the radial basis function network mostly uses the Gaussian activation function. Apart from these networks, some special supervised learning networks such as time delay neural networks, functional link networks, tree neural networks and wavelet neural networks have also been discussed.

I 3.12 Solved Problems

1. Implement AND function using perceptron networks for bipolar inputs and targets.

Solution: Table 1 shows the truth table for the AND function with bipolar inputs and targets:

Table 1
x1   x2   t
 1    1    1
 1   -1   -1
-1    1   -1
-1   -1   -1

The perceptron network, which uses the perceptron learning rule, is used to train the AND function. The network architecture is as shown in Figure 1. The input patterns are presented to the network one by one. When all the four input patterns are presented, one epoch is said to be completed. The initial weights and threshold are set to zero, i.e., w1 = w2 = b = 0 and θ = 0. The learning rate α is set equal to 1.

Figure 1 Perceptron network for AND function.

For the first input pattern, x1 = 1, x2 = 1 and t = 1, with weights and bias w1 = 0, w2 = 0 and b = 0:

Calculate the net input

y_in = b + x1w1 + x2w2 = 0 + 1 x 0 + 1 x 0 = 0

The output y is computed by applying the activation function over the net input calculated:

y = f(y_in) = 1 if y_in > θ;  0 if -θ <= y_in <= θ;  -1 if y_in < -θ

Here we have taken θ = 0. Hence, when y_in = 0, y = 0.

Check whether t = y. Here, t = 1 and y = 0, so t ≠ y; hence weight updation takes place:

w_i(new) = w_i(old) + α t x_i
w1(new) = w1(old) + α t x1 = 0 + 1 x 1 x 1 = 1
w2(new) = w2(old) + α t x2 = 0 + 1 x 1 x 1 = 1
b(new) = b(old) + α t = 0 + 1 x 1 = 1

Here, the changes in weights are

Δw1 = α t x1;  Δw2 = α t x2;  Δb = α t

The weights w1 = 1, w2 = 1, b = 1 are the final weights after the first input pattern is presented. The same process is repeated for all the input patterns. The process can be stopped when all the targets become equal to the calculated output, or when a separating line is obtained using the final weights for separating the positive responses from the negative responses. Table 2 shows the training of the perceptron network until its target and calculated output converge for all the patterns.

Table 2
Input          Target  Net input  Output  Weight changes   Weights
x1  x2   1      (t)     (y_in)     (y)    Δw1  Δw2  Δb     w1  w2   b   (0 0 0)
EPOCH-1
 1   1   1       1        0         0       1    1   1      1   1   1
 1  -1   1      -1        1         1      -1    1  -1      0   2   0
-1   1   1      -1        2         1       1   -1  -1      1   1  -1
-1  -1   1      -1       -3        -1       0    0   0      1   1  -1
EPOCH-2
 1   1   1       1        1         1       0    0   0      1   1  -1
 1  -1   1      -1       -1        -1       0    0   0      1   1  -1
-1   1   1      -1       -1        -1       0    0   0      1   1  -1
-1  -1   1      -1       -3        -1       0    0   0      1   1  -1

The final weights and bias after the second epoch are

w1 = 1, w2 = 1, b = -1

Since the threshold for the problem is zero, the equation of the separating line is

x2 = -(w1/w2) x1 - (b/w2)

Thus, using the final weights we obtain

x2 = -(1/1) x1 - (-1/1) = -x1 + 1

It can be easily found that this straight line separates the positive response region from the negative response region, as shown in Figure 2.

Figure 2 Decision boundary for AND function in perceptron training (θ = 0).

The same methodology can be applied for implementing other logic functions such as OR, ANDNOT, NAND, etc. If there exists a threshold value θ ≠ 0, then two separating lines have to be obtained, i.e., one to separate the positive response from zero and the other for separating zero from the negative response.

2. Implement OR function with binary inputs and bipolar targets using perceptron training algorithm up to 3 epochs.

Solution: The truth table for the OR function with binary inputs and bipolar targets is shown in Table 3.

Table 3
x1  x2   t
 1   1   1
 1   0   1
 0   1   1
 0   0  -1

The perceptron network, which uses the perceptron learning rule, is used to train the OR function. The network architecture is shown in Figure 3. The initial values of the weights and bias are taken as zero, i.e.,

w1 = w2 = b = 0

Also the learning rate is 1 and the threshold is 0.2. So, the activation function becomes

y = f(y_in) = 1 if y_in > 0.2;  0 if -0.2 <= y_in <= 0.2;  -1 if y_in < -0.2

Figure 3 Perceptron network for OR function.

The network is trained as per the perceptron training algorithm and the steps are as in Problem 1 (given for the first pattern). Table 4 gives the network training for 3 epochs.

Table 4
Input          Target  Net input  Output  Weight changes   Weights
x1  x2   1      (t)     (y_in)     (y)    Δw1  Δw2  Δb     w1  w2   b   (0 0 0)
EPOCH-1
 1   1   1       1        0         0       1    1   1      1   1   1
 1   0   1       1        2         1       0    0   0      1   1   1
 0   1   1       1        2         1       0    0   0      1   1   1
 0   0   1      -1        1         1       0    0  -1      1   1   0
EPOCH-2
 1   1   1       1        2         1       0    0   0      1   1   0
 1   0   1       1        1         1       0    0   0      1   1   0
 0   1   1       1        1         1       0    0   0      1   1   0
 0   0   1      -1        0         0       0    0  -1      1   1  -1
EPOCH-3
 1   1   1       1        1         1       0    0   0      1   1  -1
 1   0   1       1        0         0       1    0   1      2   1   0
 0   1   1       1        1         1       0    0   0      2   1   0
 0   0   1      -1        0         0       0    0  -1      2   1  -1

The final weights at the end of the third epoch are

w1 = 2, w2 = 1, b = -1

Further epochs have to be done for the convergence of the network.

3. Find the weights using perceptron network for ANDNOT function when all the inputs are presented only one time. Use bipolar inputs and targets.

Solution: The truth table for the ANDNOT function is shown in Table 5.

Table 5
x1  x2   t
 1   1  -1
 1  -1   1
-1   1  -1
-1  -1  -1

The network architecture of the ANDNOT function is shown in Figure 4. Let the initial weights be zero and α = 1, θ = 0. For the first input sample, x1 = 1, x2 = 1, t = -1, we compute the net input as

y_in = b + Σ x_i w_i = b + x1w1 + x2w2 = 0 + 1 x 0 + 1 x 0 = 0
Applying the activation function over the net input, we obtain

y = f(y_in) = 1 if y_in > 0;  0 if y_in = 0;  -1 if y_in < 0

Hence, the output y = f(y_in) = 0. Since t ≠ y, the new weights are computed as

w1(new) = w1(old) + α t x1 = 0 + 1 x (-1) x 1 = -1
w2(new) = w2(old) + α t x2 = 0 + 1 x (-1) x 1 = -1
b(new) = b(old) + α t = 0 + 1 x (-1) = -1

The weights after presenting the first sample are

w = [-1 -1 -1]

For the second input sample, x1 = 1, x2 = -1, t = 1, we calculate the net input as

y_in = b + x1w1 + x2w2 = -1 + 1 x (-1) + (-1) x (-1) = -1 - 1 + 1 = -1

The output y = f(y_in) = -1 is obtained by applying the activation function. Since t ≠ y, the new weights are calculated as

w1(new) = w1(old) + α t x1 = -1 + 1 x 1 x 1 = 0
w2(new) = w2(old) + α t x2 = -1 + 1 x 1 x (-1) = -2
b(new) = b(old) + α t = -1 + 1 x 1 = 0

The weights after presenting the second sample are

w = [0 -2 0]

For the third input sample, x1 = -1, x2 = 1, t = -1, the net input is calculated as

y_in = b + x1w1 + x2w2 = 0 + (-1) x 0 + 1 x (-2) = -2

The output is obtained as y = f(y_in) = -1. Since t = y, no weight change takes place. Thus, even after presenting the third input sample, the weights are

w = [0 -2 0]

For the fourth input sample, x1 = -1, x2 = -1, t = -1, the net input is calculated as

y_in = b + x1w1 + x2w2 = 0 + (-1) x 0 + (-1) x (-2) = 2

The output is obtained as y = f(y_in) = 1. Since t ≠ y, the new weights on updating are given as

w1(new) = w1(old) + α t x1 = 0 + 1 x (-1) x (-1) = 1
w2(new) = w2(old) + α t x2 = -2 + 1 x (-1) x (-1) = -1
b(new) = b(old) + α t = 0 + 1 x (-1) = -1

The weights after presenting the fourth input sample are

w = [1 -1 -1]

One epoch of training for the ANDNOT function using the perceptron network is tabulated in Table 6.

Table 6
Input          Target  Net input  Output  Weights
x1  x2   1      (t)     (y_in)     (y)    w1  w2   b   (0 0 0)
 1   1   1      -1        0         0     -1  -1  -1
 1  -1   1       1       -1        -1      0  -2   0
-1   1   1      -1       -2        -1      0  -2   0
-1  -1   1      -1        2         1      1  -1  -1

Figure 4 Network for ANDNOT function.

4. Find the weights required to perform the following classification using perceptron network. The vectors (1, 1, 1, 1) and (-1, 1, -1, -1) belong to the class (so have target value 1); the vectors (1, 1, 1, -1) and (1, -1, -1, 1) do not belong to the class (so have target value -1). Assume learning rate 1 and initial weights 0.

Solution: The truth table for the given vectors is given in Table 7.

Table 7
x1  x2  x3  x4   b   Target (t)
 1   1   1   1   1       1
-1   1  -1  -1   1       1
 1   1   1  -1   1      -1
 1  -1  -1   1   1      -1

Let w1 = w2 = w3 = w4 = b = 0 and the learning rate α = 1. Since the threshold θ = 0.2, the activation function is

y = f(y_in) = 1 if y_in > 0.2;  0 if -0.2 <= y_in <= 0.2;  -1 if y_in < -0.2

The network architecture is shown in Figure 5. The net input is given by

y_in = b + x1w1 + x2w2 + x3w3 + x4w4

The training is performed and the weights are tabulated in Table 8.

Table 8
Inputs             Target  Net input  Output  Weight changes          Weights
(x1 x2 x3 x4  b)    (t)     (y_in)     (y)    (Δw1 Δw2 Δw3 Δw4 Δb)   (w1 w2 w3 w4 b)  (0 0 0 0 0)
EPOCH-1
( 1  1  1  1  1)     1        0         0       1   1   1   1   1      1   1   1   1   1
(-1  1 -1 -1  1)     1       -1        -1      -1   1  -1  -1   1      0   2   0   0   2
( 1  1  1 -1  1)    -1        4         1      -1  -1  -1   1  -1     -1   1  -1   1   1
( 1 -1 -1  1  1)    -1        1         1      -1   1   1  -1  -1     -2   2   0   0   0
EPOCH-2
( 1  1  1  1  1)     1        0         0       1   1   1   1   1     -1   3   1   1   1
(-1  1 -1 -1  1)     1        3         1       0   0   0   0   0     -1   3   1   1   1
( 1  1  1 -1  1)    -1        3         1      -1  -1  -1   1  -1     -2   2   0   2   0
( 1 -1 -1  1  1)    -1       -2        -1       0   0   0   0   0     -2   2   0   2   0
EPOCH-3
( 1  1  1  1  1)     1        2         1       0   0   0   0   0     -2   2   0   2   0
(-1  1 -1 -1  1)     1        2         1       0   0   0   0   0     -2   2   0   2   0
( 1  1  1 -1  1)    -1       -2        -1       0   0   0   0   0     -2   2   0   2   0
( 1 -1 -1  1  1)    -1       -2        -1       0   0   0   0   0     -2   2   0   2   0

Thus, in the third epoch, all the calculated outputs become equal to the targets and the network has converged. The network convergence can also be checked by forming separating line equations for separating the positive response region from zero and zero from the negative response region.
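The hand training carried out in Problems 1-4 can be condensed into a short routine. This is a sketch under our own naming, not the book's code; it uses the same threshold activation and the update w(new) = w(old) + α t x applied only when the computed output differs from the target.

```python
def perceptron_train(data, alpha=1.0, theta=0.0, epochs=10):
    """Perceptron training: y = 1 if y_in > theta, 0 if -theta <= y_in <= theta,
    -1 otherwise; weights change only when y != t."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        changed = False
        for x, t in data:
            y_in = b + sum(wi * xi for wi, xi in zip(w, x))
            y = 1 if y_in > theta else (-1 if y_in < -theta else 0)
            if y != t:                    # weight updation: w += alpha*t*x
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        if not changed:                   # all outputs match the targets
            break
    return w, b

# bipolar AND (Problem 1): training converges to w1 = w2 = 1, b = -1
and_data = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
```

Running it on the bipolar AND data reproduces the final weights of Table 2, with convergence detected in the second epoch.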
Figure 5 Network architecture.

5. Classify the two-dimensional input pattern shown in Figure 6 using perceptron network. The symbol "*" indicates the data representation to be +1 and "•" indicates the data to be -1. The patterns are I-F. For pattern I, the target is +1, and for F, the target is -1.

Figure 6 I-F data representation.

Solution: The training patterns for this problem are tabulated in Table 9.

Table 9
Pattern  x1  x2  x3  x4  x5  x6  x7  x8  x9   1   Target (t)
I         1   1   1  -1   1  -1   1   1   1   1       1
F         1   1   1   1   1   1   1  -1  -1   1      -1

The initial weights are all assumed to be zero, i.e., θ = 0 and α = 1. The activation function is given by

y = f(y_in) = 1 if y_in > 0;  0 if y_in = 0;  -1 if y_in < 0

For the first input sample, x1 = [1 1 1 -1 1 -1 1 1 1], t = 1, the net input is calculated as

y_in = b + Σ_{i=1}^{9} x_i w_i
     = b + x1w1 + x2w2 + x3w3 + x4w4 + x5w5 + x6w6 + x7w7 + x8w8 + x9w9
     = 0 + 1 x 0 + 1 x 0 + 1 x 0 + (-1) x 0 + 1 x 0 + (-1) x 0 + 1 x 0 + 1 x 0 + 1 x 0 = 0

Therefore, by applying the activation function the output is given by y = f(y_in) = 0. Now since t ≠ y, the new weights are computed as

w1(new) = w1(old) + α t x1 = 0 + 1 x 1 x 1 = 1
w2(new) = 0 + 1 x 1 x 1 = 1
w3(new) = 0 + 1 x 1 x 1 = 1
w4(new) = 0 + 1 x 1 x (-1) = -1
w5(new) = 0 + 1 x 1 x 1 = 1
w6(new) = 0 + 1 x 1 x (-1) = -1
w7(new) = 0 + 1 x 1 x 1 = 1
w8(new) = 0 + 1 x 1 x 1 = 1
w9(new) = 0 + 1 x 1 x 1 = 1
b(new) = b(old) + α t = 0 + 1 x 1 = 1

The weights after presenting the first input sample are

w = [1 1 1 -1 1 -1 1 1 1 1]

For the second input sample, x2 = [1 1 1 1 1 1 1 -1 -1], t = -1, the net input is calculated as

y_in = b + Σ x_i w_i
     = 1 + 1 x 1 + 1 x 1 + 1 x 1 + 1 x (-1) + 1 x 1 + 1 x (-1) + 1 x 1 + (-1) x 1 + (-1) x 1
y_in = 2

Therefore the output is given by y = f(y_in) = 1. Since t ≠ y, the new weights are

w1(new) = w1(old) + α t x1 = 1 + 1 x (-1) x 1 = 0
w2(new) = 1 + 1 x (-1) x 1 = 0
w3(new) = 1 + 1 x (-1) x 1 = 0
w4(new) = -1 + 1 x (-1) x 1 = -2
w5(new) = 1 + 1 x (-1) x 1 = 0
w6(new) = -1 + 1 x (-1) x 1 = -2
w7(new) = 1 + 1 x (-1) x 1 = 0
w8(new) = 1 + 1 x (-1) x (-1) = 2
w9(new) = 1 + 1 x (-1) x (-1) = 2
b(new) = b(old) + α t = 1 + 1 x (-1) = 0

The weights after presenting the second input sample are

w = [0 0 0 -2 0 -2 0 2 2 0]

The network architecture is as shown in Figure 7. The network can be further trained for its convergence.

Figure 7 Network architecture.

6. Implement OR function with bipolar inputs and targets using Adaline network.

Solution: The truth table for the OR function with bipolar inputs and targets is shown in Table 10.

Table 10
x1  x2   t
 1   1   1
 1  -1   1
-1   1   1
-1  -1  -1

Initially all the weights and links are assumed to be small random values, say 0.1, and the learning rate is also set to 0.1. Also, here a least mean square error limit may be set. The weights are calculated until the least mean square error is obtained.

The initial weights are taken to be w1 = w2 = b = 0.1 and the learning rate α = 0.1. For the first input sample, x1 = 1, x2 = 1, t = 1, we calculate the net input as

y_in = b + Σ_{i=1}^{2} x_i w_i = b + x1w1 + x2w2 = 0.1 + 1 x 0.1 + 1 x 0.1 = 0.3

Now compute (t - y_in) = (1 - 0.3) = 0.7. Updating the weights we obtain

w_i(new) = w_i(old) + α(t - y_in)x_i

where α(t - y_in)x_i is called the weight change Δw_i. The new weights are obtained as

w1(new) = w1(old) + Δw1 = 0.1 + 0.1 x 0.7 x 1 = 0.1 + 0.07 = 0.17
w2(new) = w2(old) + Δw2 = 0.1 + 0.1 x 0.7 x 1 = 0.17
b(new) = b(old) + Δb = 0.1 + 0.1 x 0.7 = 0.17

where

Δw1 = α(t - y_in)x1;  Δw2 = α(t - y_in)x2;  Δb = α(t - y_in)

Now we calculate the error:

E = (t - y_in)^2 = (0.7)^2 = 0.49

The final weights after presenting the first input sample are w = [0.17 0.17 0.17] and error E = 0.49.
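The delta-rule update just carried out by hand can be scripted to run over a whole epoch. The helper below is a minimal sketch (function and variable names are ours): it applies w(new) = w(old) + α(t - y_in)x to every sample and accumulates the squared errors.

```python
def adaline_epoch(data, w, b, alpha=0.1):
    """One epoch of Adaline (delta rule / LMS) training; returns the updated
    weights, bias and the total mean square error of the epoch."""
    total_error = 0.0
    for x, t in data:
        y_in = b + sum(wi * xi for wi, xi in zip(w, x))
        err = t - y_in                     # (t - y_in) drives the update
        w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
        b += alpha * err
        total_error += err ** 2
    return w, b, total_error

# bipolar OR (Problem 6), initial weights and bias 0.1, alpha = 0.1
or_data = [((1, 1), 1), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), -1)]
```

One call reproduces the first epoch of the hand computation: the first sample yields weights 0.17 and error 0.49, and the epoch total matches the 3.02 reported for epoch 1.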
88 Supervised learning Network 3.12 Solved Problems 89

These calculations are performed for all the input Table 12 7. UseAdaline nerwork to train AND NOT funaion w,(new) = w,(old) + a(t- y,,)x:z
'
samples and the error is caku1ared. One epoch is
completed when all the input patterns are presented.
Summing up all the errors obtained for each input
Epoch
Epoch I
Total mean square error
3.02
with bipolar inputs and targets. Perform 2 epochs
of training.
= 0.2+ 0.2 X (-1.6) X I= -0.12
b(new) = b(old) + a(t- y;,) !
Epoch 2 1.938 Solution: The truth table for ANDNOT function = 0.2+ 0.2 (-1.6) = -0.12
sample during one epoch will give the mtal mean X '!:
Epoch 3 1.5506 with bipolar inputs and targets is shown in Table 13.
square error of that epoch. The network training is
Epoch 4 1.417 Table 13 Now we compute the error,
continued until this error is minimized to a very small
Epoch 5 1.377
value.
Adopting the method above, the network training E= (t- y;,) 2 = (-1.6) 2 = 2.56

~-­
is done for OR function using Adaline network and
is tabulated below in Table 11 for a = 0.1. The final weights after presenting first input sample

-".~
The total mean square error aft:er each epoch is a<e w = [-0.12- 0.12- 0.12] and errorE= 2.56.
The operational steps are carried for 2 epochs
given as in Table 12. ,1 @ w1 == 0.4893 f::\_ ~
of training and network performance is noted. It is
Thus from Table 12, it can be noticed that as
training goes on, the error value gets minimized.
~~1'~Y Initially the weights and bias have assumed a random
tabulated as shown in Table 14.
Hence, further training can be continued for fur~ - value say 0.2. The learning rate is also set m 0.2. The
weights are calculated until the least mean square error
The total mean square error at the end of two
epochs is summation of the errors of all input samples
t:her minimization of error. The network archirecrure ~ is obtained. The initial weights are WJ = W1. b = =
of Adaline network for OR function is shown in as shown in Table 15.
0.2, and a= 0.2. For the fim input samplex1 = 1,
Figure 8. Figure 8 Network architecture of Adaline.
.::q = l, & = -1, we calculate the net input as Table15
Yin= b + XtWJ + X2lli2
)
Table 11
= 0.2+ I X 0.2+ I X 0.2= 0.6
Epoch Total mean square error
ll
Weights
Epoch I 5.71 :~
Net Epoch 2 2.43 ·'
Inputs T: Weight changes Now compute (t- Yin} = (-1- 0.6) = -1.6.
- - a<get input Wt b Enor
X] x:z I t Yin (r- Y;,l) i>wt
"'"" i>b (0.1 ""
0.1 0.1) (t- Y;,? Updacing ilie weights we obtain
Hence from Table 15, it is clearly undersrood rhat the .,
EPOCH-I w,-(new) = w,-(old) + o:(t- y,n)x; mean square error decreases as training progresses.
I I I I 0.3 0.7 0,07 0,07 om 0.17 0.17 0.17 0.49
Also, it can be noted rhat at the end of the sixth
'
\;
I -1 I I 0.17 0.83 0.083 -0.083 0.083 0.253 0.087 0.253 0.69 The new weights are obtained as
(Continuation of the epoch-by-epoch training table of the previous problem; learning rate α = 0.1.)

Inputs        Target  Net input  Error     Weight changes             Weights                  Error
x1  x2  1     t       y_in       (t-y_in)  Δw1      Δw2      Δb       w1      w2      b        (t-y_in)²
EPOCH-1
-1   1  1      1      0.087      0.913    -0.0913   0.0913   0.0913   0.1617  0.1783  0.3443   0.83
-1  -1  1     -1      0.0043    -1.0043    0.1004   0.1004  -0.1004   0.2621  0.2787  0.2439   1.01
EPOCH-2
 1   1  1      1      0.7847     0.2153    0.0215   0.0215   0.0215   0.2837  0.3003  0.2654   0.046
 1  -1  1      1      0.2488     0.7512    0.0751  -0.0751   0.0751   0.3588  0.2251  0.3405   0.564
-1   1  1      1      0.2069     0.7931   -0.0793   0.0793   0.0793   0.2795  0.3044  0.4198   0.629
-1  -1  1     -1     -0.1641    -0.8359    0.0836   0.0836  -0.0836   0.3631  0.3880  0.3360   0.699
EPOCH-3
 1   1  1      1      1.0873    -0.0873   -0.0087  -0.0087  -0.0087   0.3543  0.3793  0.3275   0.0076
 1  -1  1      1      0.3025     0.6975    0.0697  -0.0697   0.0697   0.4241  0.3096  0.3973   0.487
-1   1  1      1      0.2827     0.7173   -0.0717   0.0717   0.0717   0.3523  0.3813  0.4690   0.515
-1  -1  1     -1     -0.2647    -0.7353    0.0735   0.0735  -0.0735   0.4259  0.4548  0.3954   0.541
EPOCH-4
 1   1  1      1      1.2761    -0.2761   -0.0276  -0.0276  -0.0276   0.3983  0.4272  0.3678   0.076
 1  -1  1      1      0.3389     0.6611    0.0661  -0.0661   0.0661   0.4644  0.3611  0.4339   0.437
-1   1  1      1      0.3307     0.6693   -0.0669   0.0669   0.0669   0.3974  0.4280  0.5009   0.448
-1  -1  1     -1     -0.3246    -0.6754    0.0675   0.0675  -0.0675   0.4650  0.4956  0.4333   0.456
EPOCH-5
 1   1  1      1      1.3939    -0.3939   -0.0394  -0.0394  -0.0394   0.4256  0.4562  0.3930   0.155
 1  -1  1      1      0.3634     0.6366    0.0637  -0.0637   0.0637   0.4893  0.3925  0.4570   0.405
-1   1  1      1      0.3609     0.6391   -0.0639   0.0639   0.0639   0.4253  0.4654  0.5215   0.408
-1  -1  1     -1     -0.3603    -0.6397    0.0640   0.0640  -0.0640   0.4893  0.5204  0.4575   0.409

Thus, at the end of the fifth epoch, the error becomes approximately equal to 1.

The network architecture for the ANDNOT function using an Adaline network is shown in Figure 9. With initial weights and bias of 0.2 each and learning rate α = 0.2, the weight update for the first input sample (x1 = 1, x2 = 1, t = -1, y_in = 0.6) is

w1(new) = w1(old) + α(t - y_in)x1 = 0.2 + 0.2 × (-1.6) × 1 = -0.12

and similarly for w2 and b. Table 14 shows the epoch-wise training performance of the Adaline network for the ANDNOT function.

Table 14
Inputs        Target  Net input  Error     Weight changes             Weights                  Error
x1  x2  1     t       y_in       (t-y_in)  Δw1      Δw2      Δb       w1      w2      b        (t-y_in)²
EPOCH-1
 1   1  1     -1      0.60      -1.60     -0.32    -0.32    -0.32    -0.12   -0.12   -0.12     2.56
 1  -1  1      1     -0.12       1.12      0.22    -0.22     0.22     0.10   -0.34    0.10     1.25
-1   1  1     -1     -0.34      -0.66      0.13    -0.13    -0.13     0.24   -0.48   -0.03     0.43
-1  -1  1     -1      0.21      -1.21      0.24     0.24    -0.24     0.48   -0.23   -0.27     1.47
EPOCH-2
 1   1  1     -1     -0.02      -0.98     -0.195   -0.195   -0.195    0.28   -0.43   -0.46     0.95
 1  -1  1      1      0.25       0.76      0.15    -0.15     0.15     0.43   -0.58   -0.31     0.57
-1   1  1     -1     -1.33       0.33     -0.065    0.065    0.065    0.37   -0.51   -0.25     0.106
-1  -1  1     -1     -0.11      -0.90      0.18     0.18    -0.18     0.55   -0.38   -0.43     0.80
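The delta-rule updates tabulated above can be reproduced with a short program. The sketch below is illustrative rather than from the text (the function name and data layout are my own); it trains an Adaline on the bipolar ANDNOT data with the same initial weights (0.2 each) and learning rate α = 0.2, and its first update gives w1 = -0.12 as computed above.

```python
def train_adaline(samples, alpha, w, b, epochs):
    """Adaline / delta rule: after each sample, w_i += alpha*(t - y_in)*x_i,
    b += alpha*(t - y_in), where y_in is the linear net input."""
    history = []                      # total squared error per epoch
    for _ in range(epochs):
        total_sq_error = 0.0
        for x, t in samples:
            y_in = b + sum(wi * xi for wi, xi in zip(w, x))
            err = t - y_in
            w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
            b = b + alpha * err
            total_sq_error += err ** 2
        history.append(total_sq_error)
    return w, b, history

# ANDNOT with bipolar inputs/targets: t = 1 only for x1 = 1, x2 = -1
andnot = [((1, 1), -1), ((1, -1), 1), ((-1, 1), -1), ((-1, -1), -1)]
w, b, hist = train_adaline(andnot, alpha=0.2, w=[0.2, 0.2], b=0.2, epochs=2)
```

Note that, unlike the perceptron, the weights move on every sample (the error is t minus the net input, not the thresholded output), so the squared error shrinks gradually from epoch to epoch rather than the net converging exactly.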
3.12 Solved Problems (Supervised Learning Network)

Figure 9 Network architecture for ANDNOT function using Adaline network.

8. Using Madaline network, implement XOR function with bipolar inputs and targets. Assume the required parameters for training of the network.

Solution: The training pattern for the XOR function is given in Table 16.

Table 16
x1  x2 | t
 1   1 | -1
 1  -1 |  1
-1   1 |  1
-1  -1 | -1

The Madaline Rule I (MRI) algorithm, in which the weights between the hidden layer and output layer remain fixed, is used for training the network. Initializing the weights to small random values, the network architecture is as shown in Figure 10. From Figure 10, the initial weights and bias are [w11 w21 b1] = [0.05 0.2 0.3], [w12 w22 b2] = [0.1 0.2 0.15] and [v1 v2 b3] = [0.5 0.5 0.5].

Figure 10 Network architecture of Madaline for XOR function (initial weights given).

For the first input sample, x1 = 1, x2 = 1, target t = -1, and learning rate α equal to 0.5:

• Calculate the net input to the hidden units:

z_in1 = b1 + x1 w11 + x2 w21 = 0.3 + 1 × 0.05 + 1 × 0.2 = 0.55
z_in2 = b2 + x1 w12 + x2 w22 = 0.15 + 1 × 0.1 + 1 × 0.2 = 0.45

• Calculate the outputs z1, z2 by applying the activation over the net inputs computed. The activation function is given by

f(z_in) = 1 if z_in ≥ 0; -1 if z_in < 0

Hence

z1 = f(z_in1) = f(0.55) = 1
z2 = f(z_in2) = f(0.45) = 1

• After computing the output of the hidden units, find the net input entering the output unit:

y_in = b3 + z1 v1 + z2 v2 = 0.5 + 1 × 0.5 + 1 × 0.5 = 1.5

• Apply the activation function over the net input y_in to calculate the output y:

y = f(y_in) = f(1.5) = 1

Since t ≠ y, weight updation has to be performed. Also, since t = -1, the weights are updated on those hidden units whose net input is positive. Since here both net inputs z_in1 and z_in2 are positive, updating the weights and bias on both hidden units, we obtain

w_ij(new) = w_ij(old) + α(t - z_inj)x_i
b_j(new) = b_j(old) + α(t - z_inj)

This implies:

w11(new) = w11(old) + α(t - z_in1)x1 = 0.05 + 0.5(-1 - 0.55) × 1 = -0.725
w21(new) = w21(old) + α(t - z_in1)x2 = 0.2 + 0.5(-1 - 0.55) × 1 = -0.575
b1(new) = b1(old) + α(t - z_in1) = 0.3 + 0.5(-1 - 0.55) = -0.475
w12(new) = w12(old) + α(t - z_in2)x1 = 0.1 + 0.5(-1 - 0.45) × 1 = -0.625
w22(new) = w22(old) + α(t - z_in2)x2 = 0.2 + 0.5(-1 - 0.45) × 1 = -0.525
b2(new) = b2(old) + α(t - z_in2) = 0.15 + 0.5(-1 - 0.45) = -0.575

All the weights and bias between the input layer and hidden layer are adjusted. This completes the training for the first input sample of the first epoch. The same process is repeated until the weights converge. It is found that the weights converge at the end of 3 epochs. Table 17 shows the training performance of the Madaline network for the XOR function.

Table 17
Inputs      Target  Net inputs          Outputs       Output    Weights
x1  x2  1   t       z_in1     z_in2     z1   z2  y_in  y    w11     w21     b1      w12     w22     b2
EPOCH-1
 1   1  1   -1       0.55      0.45      1    1   1.5   1   -0.725  -0.575  -0.475  -0.625  -0.525  -0.575
 1  -1  1    1      -0.625    -0.675    -1   -1  -0.5  -1    0.0875 -1.39    0.34   -0.625  -0.525  -0.575
-1   1  1    1      -1.1375   -0.475    -1   -1  -0.5  -1    0.0875 -1.39    0.34   -1.3625  0.2125  0.1625
-1  -1  1   -1       1.6375    1.3125    1    1   1.5   1    1.4065 -0.069  -0.98   -0.207   1.369  -0.994
EPOCH-2
 1   1  1   -1       0.3565    0.168     1    1   1.5   1    0.7285 -0.75   -1.66   -0.791   0.785  -1.58
 1  -1  1    1      -0.1845   -3.154    -1   -1  -0.5  -1    1.3205 -1.34   -1.068  -0.791   0.785  -1.58
-1   1  1    1      -3.728    -0.002    -1   -1  -0.5  -1    1.3205 -1.34   -1.068  -1.29    1.29   -1.08
-1  -1  1   -1      -1.0495   -1.071    -1   -1  -0.5  -1    1.3205 -1.34   -1.068  -1.29    1.29   -1.08
EPOCH-3
 1   1  1   -1      -1.0865   -1.083    -1   -1  -0.5  -1    1.32   -1.34   -1.07   -1.29    1.29   -1.08
 1  -1  1    1       1.5915   -3.655     1   -1   0.5   1    1.32   -1.34   -1.07   -1.29    1.29   -1.08
-1   1  1    1      -3.728     1.501    -1    1   0.5   1    1.32   -1.34   -1.07   -1.29    1.29   -1.08
-1  -1  1   -1      -1.0495   -1.071    -1   -1  -0.5  -1    1.32   -1.34   -1.07   -1.29    1.29   -1.08

The network architecture of the Madaline network for the XOR function, with the final weights, is shown in Figure 11.

Figure 11 Madaline network for XOR function (final weights given).
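The MRI update just carried out by hand can be sketched in code. The function below is a minimal illustration (its name, argument layout and the handling of the two cases of t are my own reading of the rule); applied to the first XOR sample with the initial weights above, it reproduces the updated values w11 = -0.725 through b2 = -0.575.

```python
def mri_step(x, t, W, b, v, b3, alpha):
    """One Madaline Rule I (MRI) step: the output-layer weights v, b3 stay
    fixed; only the hidden (Adaline) units are adjusted. If t = -1, every
    unit with positive net input is pushed towards -1; if t = +1, the unit
    whose net input is closest to zero is pushed towards +1."""
    sign = lambda z: 1 if z >= 0 else -1
    z_in = [b[j] + sum(W[i][j] * x[i] for i in range(len(x)))
            for j in range(len(b))]
    z = [sign(zj) for zj in z_in]
    y = sign(b3 + sum(vj * zj for vj, zj in zip(v, z)))
    if y != t:
        if t == -1:
            targets = [j for j, zj in enumerate(z_in) if zj > 0]
        else:
            targets = [min(range(len(z_in)), key=lambda j: abs(z_in[j]))]
        for j in targets:
            for i in range(len(x)):
                W[i][j] += alpha * (t - z_in[j]) * x[i]
            b[j] += alpha * (t - z_in[j])
    return W, b

# First training sample of the XOR problem above, with the same initial weights
W = [[0.05, 0.1], [0.2, 0.2]]     # W[i][j]: input i -> hidden unit j
b = [0.3, 0.15]
W, b = mri_step((1, 1), -1, W, b, v=[0.5, 0.5], b3=0.5, alpha=0.5)
```

Notice that the hidden units are trained with an Adaline-style error (t minus the net input), while the fixed output layer simply realizes a logical OR of the hidden responses.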
9. Using back-propagation network, find the new weights for the net shown in Figure 12. It is presented with the input pattern [0, 1] and the target output is 1. Use a learning rate α = 0.25 and binary sigmoidal activation function.

Figure 12 Network.

Solution: The new weights are calculated based on the training algorithm in Section 3.5.4. The initial weights are [v11 v21 v01] = [0.6 -0.1 0.3], [v12 v22 v02] = [-0.3 0.4 0.5] and [w1 w2 w0] = [0.4 0.1 -0.2], and the learning rate is α = 0.25. The activation function used is the binary sigmoidal activation function, given by

f(x) = 1/(1 + e^(-x))

Given the input sample [x1, x2] = [0, 1] and target t = 1.

Calculate the net input. For the z1 layer,

z_in1 = v01 + x1 v11 + x2 v21 = 0.3 + 0 × 0.6 + 1 × (-0.1) = 0.2

For the z2 layer,

z_in2 = v02 + x1 v12 + x2 v22 = 0.5 + 0 × (-0.3) + 1 × 0.4 = 0.9

Applying activation to calculate the output, we obtain

z1 = f(z_in1) = 1/(1 + e^(-0.2)) = 0.5498
z2 = f(z_in2) = 1/(1 + e^(-0.9)) = 0.7109

Calculate the net input entering the output layer. For the y layer,

y_in = w0 + z1 w1 + z2 w2 = -0.2 + 0.5498 × 0.4 + 0.7109 × 0.1 = 0.09101

Applying activation to calculate the output, we obtain

y = f(y_in) = 1/(1 + e^(-0.09101)) = 0.5227

Compute the error portion δk:

δk = (tk - yk) f'(y_in)

Now f'(y_in) = f(y_in)[1 - f(y_in)] = 0.5227[1 - 0.5227] = 0.2495. This implies

δ1 = (1 - 0.5227)(0.2495) = 0.1191

Find the changes in weights between the hidden and output layer:

Δw1 = α δ1 z1 = 0.25 × 0.1191 × 0.5498 = 0.0164
Δw2 = α δ1 z2 = 0.25 × 0.1191 × 0.7109 = 0.02117
Δw0 = α δ1 = 0.25 × 0.1191 = 0.02978

Compute the error portion δj between the input and hidden layer (j = 1 to 2):

δ_inj = Σk δk wjk
δ_inj = δ1 wj1 [∵ only one output neuron]
⇒ δ_in1 = δ1 w1 = 0.1191 × 0.4 = 0.04764
⇒ δ_in2 = δ1 w2 = 0.1191 × 0.1 = 0.01191

Error, δ1 = δ_in1 f'(z_in1):

f'(z_in1) = f(z_in1)[1 - f(z_in1)] = 0.5498[1 - 0.5498] = 0.2475
δ1 = 0.04764 × 0.2475 = 0.0118

Error, δ2 = δ_in2 f'(z_in2):

f'(z_in2) = f(z_in2)[1 - f(z_in2)] = 0.7109[1 - 0.7109] = 0.2055
δ2 = 0.01191 × 0.2055 = 0.00245

Now find the changes in weights between the input and hidden layer:

Δv11 = α δ1 x1 = 0.25 × 0.0118 × 0 = 0
Δv21 = α δ1 x2 = 0.25 × 0.0118 × 1 = 0.00295
Δv01 = α δ1 = 0.25 × 0.0118 = 0.00295
Δv12 = α δ2 x1 = 0.25 × 0.00245 × 0 = 0
Δv22 = α δ2 x2 = 0.25 × 0.00245 × 1 = 0.0006125
Δv02 = α δ2 = 0.25 × 0.00245 = 0.0006125

Compute the final weights of the network:

v11(new) = v11(old) + Δv11 = 0.6 + 0 = 0.6
v12(new) = v12(old) + Δv12 = -0.3 + 0 = -0.3
v21(new) = v21(old) + Δv21 = -0.1 + 0.00295 = -0.09705
v22(new) = v22(old) + Δv22 = 0.4 + 0.0006125 = 0.4006125
v01(new) = v01(old) + Δv01 = 0.3 + 0.00295 = 0.30295
v02(new) = v02(old) + Δv02 = 0.5 + 0.0006125 = 0.5006125
w1(new) = w1(old) + Δw1 = 0.4 + 0.0164 = 0.4164
w2(new) = w2(old) + Δw2 = 0.1 + 0.02117 = 0.12117
w0(new) = w0(old) + Δw0 = -0.2 + 0.02978 = -0.17022

Thus, the final weights have been computed for the network shown in Figure 12.

10. Find the new weights, using back-propagation network, for the network shown in Figure 13. The network is presented with the input pattern [-1, 1] and the target output is +1. Use a learning rate of α = 0.25 and bipolar sigmoidal activation function.

Figure 13 Network.

Solution: The initial weights are [v11 v21 v01] = [0.6 -0.1 0.3], [v12 v22 v02] = [-0.3 0.4 0.5] and [w1 w2 w0] = [0.4 0.1 -0.2], and the learning rate is α = 0.25. The activation function used is the bipolar sigmoidal activation function, given by

f(x) = 2/(1 + e^(-x)) - 1 = (1 - e^(-x))/(1 + e^(-x))

Given the input sample [x1, x2] = [-1, 1] and target t = 1.

Calculate the net input. For the z1 layer,

z_in1 = v01 + x1 v11 + x2 v21 = 0.3 + (-1) × 0.6 + 1 × (-0.1) = -0.4

For the z2 layer,

z_in2 = v02 + x1 v12 + x2 v22 = 0.5 + (-1) × (-0.3) + 1 × 0.4 = 1.2

Applying activation to calculate the output, we obtain

z1 = f(z_in1) = (1 - e^(0.4))/(1 + e^(0.4)) = -0.1974
z2 = f(z_in2) = (1 - e^(-1.2))/(1 + e^(-1.2)) = 0.537

Calculate the net input entering the output layer. For the y layer,

y_in = w0 + z1 w1 + z2 w2 = -0.2 + (-0.1974) × 0.4 + 0.537 × 0.1 = -0.22526

Applying activation to calculate the output, we obtain

y = f(y_in) = (1 - e^(0.22526))/(1 + e^(0.22526)) = -0.1122

Compute the error portion δk:

δ1 = (t1 - y1) f'(y_in)

Now

f'(y_in) = 0.5[1 + f(y_in)][1 - f(y_in)] = 0.5[1 - 0.1122][1 + 0.1122] = 0.4937

This implies

δ1 = (1 + 0.1122)(0.4937) = 0.5491

Find the changes in weights between the hidden and output layer:

Δw1 = α δ1 z1 = 0.25 × 0.5491 × (-0.1974) = -0.0271
Δw2 = α δ1 z2 = 0.25 × 0.5491 × 0.537 = 0.0737
Δw0 = α δ1 = 0.25 × 0.5491 = 0.1373

Compute the error portion δj between the input and hidden layer (j = 1 to 2):

δ_inj = Σk δk wjk
δ_inj = δ1 wj1 [∵ only one output neuron]
⇒ δ_in1 = δ1 w1 = 0.5491 × 0.4 = 0.21964
⇒ δ_in2 = δ1 w2 = 0.5491 × 0.1 = 0.05491

Error, δ1 = δ_in1 f'(z_in1) = 0.21964 × 0.5 × [1 + (-0.1974)][1 - (-0.1974)] = 0.1056
Error, δ2 = δ_in2 f'(z_in2) = 0.05491 × 0.5 × (1 - 0.537)(1 + 0.537) = 0.0195

Now find the changes in weights between the input and hidden layer:

Δv11 = α δ1 x1 = 0.25 × 0.1056 × (-1) = -0.0264
Δv21 = α δ1 x2 = 0.25 × 0.1056 × 1 = 0.0264
Δv01 = α δ1 = 0.25 × 0.1056 = 0.0264
Δv12 = α δ2 x1 = 0.25 × 0.0195 × (-1) = -0.0049
Δv22 = α δ2 x2 = 0.25 × 0.0195 × 1 = 0.0049
Δv02 = α δ2 = 0.25 × 0.0195 = 0.0049

Compute the final weights of the network:

v11(new) = v11(old) + Δv11 = 0.6 - 0.0264 = 0.5736
v12(new) = v12(old) + Δv12 = -0.3 - 0.0049 = -0.3049
v21(new) = v21(old) + Δv21 = -0.1 + 0.0264 = -0.0736
v22(new) = v22(old) + Δv22 = 0.4 + 0.0049 = 0.4049
v01(new) = v01(old) + Δv01 = 0.3 + 0.0264 = 0.3264
v02(new) = v02(old) + Δv02 = 0.5 + 0.0049 = 0.5049
w1(new) = w1(old) + Δw1 = 0.4 - 0.0271 = 0.3729
w2(new) = w2(old) + Δw2 = 0.1 + 0.0737 = 0.1737
w0(new) = w0(old) + Δw0 = -0.2 + 0.1373 = -0.0627

Thus, the final weights have been computed for the network shown in Figure 13.

3.13 Review Questions

1. What is supervised learning and how is it different from unsupervised learning?
2. How does learning take place in supervised learning?
3. From a mathematical point of view, what is the process of learning in supervised learning?
4. What is the building block of the perceptron?
5. Does perceptron require supervised learning? If no, what does it require?
6. List the limitations of perceptron.
7. State the activation function used in perceptron network.
8. What is the importance of threshold in perceptron network?
9. Mention the applications of perceptron network.
10. What are feature detectors?
11. With a neat flowchart, explain the training process of perceptron network.
12. What is the significance of error signal in perceptron network?
13. State the testing algorithm used in perceptron algorithm.
14. How is the linear separability concept implemented using perceptron network training?
15. Define perceptron learning rule.
16. Define delta rule.
17. State the error function for delta rule.
18. What is the drawback of using optimization algorithm?
19. What is Adaline?
20. Draw the model of an Adaline network.
21. Explain the training algorithm used in Adaline network.
22. How is a Madaline network formed?
23. Is it true that Madaline network consists of many perceptrons?
24. State the characteristics of weighted interconnections between Adaline and Madaline.
25. How is training adopted in Madaline network using majority vote rule?
26. State few applications of Adaline and Madaline.
27. What is meant by epoch in training process?
28. What is meant by gradient descent method?
29. State the importance of back-propagation algorithm.
30. What is meant by memorization and generalization?
31. List the stages involved in training of back-propagation network.
32. Draw the architecture of back-propagation algorithm.
33. State the significance of error portions δk and δj in BPN algorithm.
34. What are the activations used in back-propagation network algorithm?
35. What is meant by local minima and global minima?
36. Derive the generalized delta learning rule.
37. Derive the derivatives of the binary and bipolar sigmoidal activation functions.
38. What are the factors that improve the convergence of learning in BPN network?
39. What is meant by incremental learning?
40. Why is gradient descent method adopted to minimize error?
41. What are the methods of initialization of weights?
42. What is the necessity of momentum factor in weight updation process?
43. Define "over fitting" or "over training."
44. State the techniques for proper choice of learning rate.
45. What are the limitations of using momentum factor?
46. How many hidden layers can there be in a neural network?
47. What is the activation function used in radial basis function network?
48. Explain the training algorithm of radial basis function network.
49. By what means can an IIR and an FIR filter be formed in neural network?
50. What is the importance of functional link network?
51. Write a short note on binary classification tree neural network.
52. Explain in detail about wavelet neural network.

3.14 Exercise Problems

1. Implement NOR function using perceptron network for bipolar inputs and targets.
2. Find the weights required to perform the following classifications using perceptron network: the vectors (1, 1, -1, -1) and (1, -1, 1, -1) belong to the class (so have target value 1); vectors (-1, -1, -1, 1) and (-1, -1, 1, 1) do not belong to the class (so have target value -1). Assume learning rate 1 and initial weights 0.
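The hand computations in the back-propagation problems above all follow the same recipe: a forward pass, the output error term δk, the hidden error terms δj, and then the weight changes. The sketch below illustrates one such training step for a 2-2-1 network (the helper names and the list-based weight layout are my own); either the binary or the bipolar sigmoid can be selected.

```python
import math

def sigmoid(x, bipolar=False):
    """Binary sigmoid f(x) = 1/(1+e^-x); bipolar variant f(x) = 2/(1+e^-x) - 1."""
    s = 1.0 / (1.0 + math.exp(-x))
    return 2.0 * s - 1.0 if bipolar else s

def dsigmoid(fx, bipolar=False):
    """Derivative expressed through the activation value fx = f(x)."""
    return 0.5 * (1.0 + fx) * (1.0 - fx) if bipolar else fx * (1.0 - fx)

def bpn_step(x, t, V, v0, w, w0, alpha, bipolar=False):
    """One training step of a 2-2-1 back-propagation network.
    V[i][j]: weight from input i to hidden unit j; v0: hidden biases;
    w, w0: hidden-to-output weights and bias. Returns the updated
    weights and the output y computed before the update."""
    # Forward pass
    z_in = [v0[j] + sum(V[i][j] * x[i] for i in range(2)) for j in range(2)]
    z = [sigmoid(zi, bipolar) for zi in z_in]
    y_in = w0 + sum(wj * zj for wj, zj in zip(w, z))
    y = sigmoid(y_in, bipolar)
    # Error terms (delta_k for the output unit, delta_j for the hidden units)
    dk = (t - y) * dsigmoid(y, bipolar)
    dj = [dk * w[j] * dsigmoid(z[j], bipolar) for j in range(2)]
    # Weight updates
    w = [w[j] + alpha * dk * z[j] for j in range(2)]
    w0 = w0 + alpha * dk
    V = [[V[i][j] + alpha * dj[j] * x[i] for j in range(2)] for i in range(2)]
    v0 = [v0[j] + alpha * dj[j] for j in range(2)]
    return V, v0, w, w0, y

# Reproduce solved problem 9: input [0, 1], target 1, binary sigmoid, alpha = 0.25
V, v0 = [[0.6, -0.3], [-0.1, 0.4]], [0.3, 0.5]
V, v0, w, w0, y = bpn_step((0, 1), 1, V, v0, [0.4, 0.1], -0.2, alpha=0.25)
```

Run on the data of solved problem 9, the step yields y ≈ 0.5227 before the update and w1 ≈ 0.4164 afterwards, matching the hand calculation up to rounding.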

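For exercise problem 1 (a NOR perceptron with bipolar inputs and targets), one possible sketch is given below. It uses a plain sign activation with zero threshold rather than the θ band of the textbook algorithm, and starts from zero weights.

```python
def train_perceptron(samples, alpha=1.0, max_epochs=100):
    """Perceptron rule: on a misclassified sample, w_i += alpha*t*x_i and
    b += alpha*t; stop when an epoch produces no weight change."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    predict = lambda x: 1 if b + sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
    for _ in range(max_epochs):
        changed = False
        for x, t in samples:
            if predict(x) != t:
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        if not changed:
            break
    return w, b, predict

# NOR with bipolar inputs/targets: output is 1 only when both inputs are -1
nor = [((1, 1), -1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), 1)]
w, b, predict = train_perceptron(nor)
```

With these choices the net converges after a single weight change, to w = [-1, -1] and b = -1, which classifies all four NOR patterns correctly; NOR is linearly separable, so the perceptron rule is guaranteed to converge.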

3. Classify the two-dimensional patterns shown in the figure below using perceptron network.

[Figure: pixel patterns for the characters "C" (target value +1) and "A" (target value -1).]

4. Implement AND function using Adaline network.

5. Using the delta rule, find the weights required to perform the following classifications: vectors (1, 1, -1, -1) and (-1, -1, -1, -1) belong to the class having target value 1; vectors (1, 1, 1, 1) and (-1, -1, 1, -1) do not belong to the class, having target value -1. Use a learning rate of 0.5 and assume random initial weights. Also, using each of the training vectors as input, test the response of the net.

6. Implement AND function using Madaline network.

7. With a suitable example, discuss perceptron network training with and without bias.

8. Using back-propagation network, find the new weights for the network shown in the following figure. The network is presented with the input pattern [1, 0] and target output 1. Use a learning rate of α = 0.3 and binary sigmoidal activation function.

[Figure: 2-2-1 network with initial weights for Problem 8.]

9. Find the new weights for the network given in the above problem using back-propagation network. The network is presented with the input pattern [1, -1] and target output +1. Use a learning rate of α = 0.3 and bipolar sigmoidal activation function.

10. Find the new weights for the network shown in Problem 8 using BPN. The network is presented with the input pattern [-1, 1] and target output -1. Use a learning rate of α = 0.45 and a suitable activation function.

3.15 Projects

1. Classify upper case letters and lower case letters using perceptron network. Use as many output units based on the training set as possible. Test the network with noisy patterns as well.

2. Write a suitable computer program to classify the numbers between 0 and 9 using Adaline network.

3. Write a computer program to train a Madaline to perform AND function using MRI algorithm.

4. Write a program implementing BPN for training a single-hidden-layer back-propagation network with bipolar sigmoidal units (λ = 1) to achieve the following two-to-one mappings:
   • y = 6 sin(π x1) + cos(π x2)
   • y = sin(π x1) + cos(0.2π x2)
   Set up two sets of data, each consisting of 10 input-output pairs, one for training and the other for testing. The input-output data are obtained by varying the input variables (x1, x2) within [-1, +1] randomly. Also normalize the output data within [-1, 1]. Apply training to find the proper weights in the network.

Associative Memory Networks 4

Learning Objectives

Gives details on associative memories.
Discusses the training algorithms used for pattern association networks - the Hebb rule and the outer products rule.
The architecture, flowchart for training process, training algorithm and testing algorithm of autoassociative, heteroassociative and bidirectional associative memory (BAM) are discussed in detail.
Variants of BAM - continuous BAM and discrete BAM - are included.
Hopfield network with its electrical model is described with training algorithm.
Analysis of the energy function is performed for BAM, discrete and continuous Hopfield networks.
An overview is given on the iterative autoassociative network - linear autoassociator memory, brain-in-the-box network and autoassociator with threshold unit.
Also, temporal associative memory is discussed in brief.

4.1 Introduction

An associative memory network can store a set of patterns as memories. When the associative memory is presented with a key pattern, it responds by producing one of the stored patterns, which closely resembles or relates to the key pattern. Thus, the recall is through association of the key pattern with the information memorized. These types of memories are also called content-addressable memories (CAM), in contrast to the traditional address-addressable memories in digital computers, where a stored pattern (in bytes) is recalled by its address. It is also a matrix memory, as in RAM/ROM. The CAM can be viewed as associating data to address, i.e., for every data item in the memory there is a corresponding unique address. It can also be viewed as a data correlator: here the input data are correlated with the stored data in the CAM. It should be noted that the stored patterns must be unique, i.e., different patterns in each location. If the same pattern exists in more than one location in the CAM, then, even though the correlation is correct, the address is noted to be ambiguous. The basic structure of CAM is given in Figure 4-1.

Associative memory makes a parallel search within a stored data file. The concept behind this search is to output any one or all stored items which match the given search argument, and to retrieve the stored data either completely or partially.

Two types of associative memories can be differentiated: autoassociative memory and heteroassociative memory. Both these nets are single-layer nets in which the weights are determined in a manner that the net stores a set of pattern associations. Each of these associations is an input-output vector pair, say, s:t. If each of the output vectors is the same as the input vector with which it is associated, then the net is said to be autoassociative.
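The content-addressable recall described in this introduction can be illustrated with the Hebb (outer products) rule that the chapter goes on to present. The following is a minimal autoassociative sketch (the function names are my own):

```python
def store(patterns):
    """Outer products (Hebb) rule: W = sum over stored bipolar patterns s
    of the outer product s s^T."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for s in patterns:
        for i in range(n):
            for j in range(n):
                W[i][j] += s[i] * s[j]
    return W

def recall(W, x):
    """Recall by thresholding W x back to a bipolar vector."""
    sign = lambda v: 1 if v >= 0 else -1
    return [sign(sum(W[i][j] * x[j] for j in range(len(x))))
            for i in range(len(W))]

stored = [1, -1, 1, -1]
W = store([stored])
key = [1, 1, 1, -1]          # the stored pattern with one component flipped
```

Presenting either the stored pattern itself or the one-bit-corrupted key returns the stored pattern, since with a single stored pattern W x = s (s · x) points in the direction of s whenever the key correlates positively with it.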