
Principe, J.C.

Artificial Neural Networks


The Electrical Engineering Handbook
Ed. Richard C. Dorf
Boca Raton: CRC Press LLC, 2000
© 2000 by CRC Press LLC
20
Artificial Neural Networks

20.1 Definitions and Scope
Introduction • Definitions and Style of Computation • ANN Types and Applications
20.2 Multilayer Perceptrons
Function of Each PE • How to Train MLPs • Applying Back-Propagation in Practice • A Posteriori Probabilities
20.3 Radial Basis Function Networks
20.4 Time-Lagged Networks
Memory Structures • Training-Focused TLN Architectures
20.5 Hebbian Learning and Principal Component Analysis Networks
Hebbian Learning • Principal Component Analysis • Associative Memories
20.6 Competitive Learning and Kohonen Networks
20.1 Definitions and Scope

Introduction
Artificial neural networks (ANN) are among the newest signal-processing technologies in the engineer's toolbox. The field is highly interdisciplinary, but our approach will restrict the view to the engineering perspective. In engineering, neural networks serve two important functions: as pattern classifiers and as nonlinear adaptive filters. We will provide a brief overview of the theory, learning rules, and applications of the most important neural network models.
Definitions and Style of Computation
An ANN is an adaptive, most often nonlinear, system that learns to perform a function (an input/output map) from data. Adaptive means that the system parameters are changed during operation, normally called the training phase. After the training phase the ANN parameters are fixed and the system is deployed to solve the problem at hand (the testing phase). The ANN is built with a systematic step-by-step procedure to optimize a performance criterion or to follow some implicit internal constraint, which is commonly referred to as the learning rule. The input/output training data are fundamental in neural network technology, because they convey the necessary information to "discover" the optimal operating point. The nonlinear nature of the neural network processing elements (PEs) provides the system with great flexibility to achieve practically any desired input/output map, i.e., some ANNs are universal mappers.

There is a style in neural computation that is worth describing (Fig. 20.1). An input is presented to the network and a corresponding desired or target response is set at the output (when this is the case the training is called supervised). An error is composed from the difference between the desired response and the system
output. This error information is fed back to the system and adjusts the system parameters in a systematic fashion (the learning rule). The process is repeated until the performance is acceptable. It is clear from this description that the performance hinges heavily on the data. If one does not have data that cover a significant portion of the operating conditions, or if the data are noisy, then neural network technology is probably not the right solution. On the other hand, if there is plenty of data but the problem is too poorly understood to derive an approximate model, then neural network technology is a good choice.

This operating procedure should be contrasted with traditional engineering design, made of exhaustive subsystem specifications and intercommunication protocols. In ANNs, the designer chooses the network topology, the performance function, the learning rule, and the criterion to stop the training phase, but the system automatically adjusts the parameters. So, it is difficult to bring a priori information into the design, and when the system does not work properly it is also hard to incrementally refine the solution. But ANN-based solutions are extremely efficient in terms of development time and resources, and in many difficult problems ANNs provide performance that is difficult to match with other technologies. Denker said 10 years ago that ANNs "are the second best way to implement a solution," motivated by the simplicity of their design and by their universality, only shadowed by the traditional design obtained by studying the physics of the problem. At present, ANNs are emerging as the technology of choice for many applications, such as pattern recognition, prediction, system identification, and control.
ANN Types and Applications
It is always risky to establish a taxonomy of a technology, but our motivation is one of providing a quick overview of the application areas and the most popular topologies and learning paradigms.

FIGURE 20.1 The style of neural computation.
Application | Topology | Supervised Learning | Unsupervised Learning
Association | Hopfield [Zurada, 1992; Haykin, 1994] | - | Hebbian [Zurada, 1992; Haykin, 1994; Kung, 1993]
Association | Multilayer perceptron [Zurada, 1992; Haykin, 1994; Bishop, 1995] | Back-propagation [Zurada, 1992; Haykin, 1994; Bishop, 1995] | -
Association | Linear associative mem. [Zurada, 1992; Haykin, 1994] | - | Hebbian
Pattern recognition | Multilayer perceptron [Zurada, 1992; Haykin, 1994; Bishop, 1995] | Back-propagation | -
Pattern recognition | Radial basis functions [Zurada, 1992; Bishop, 1995] | Least mean square | k-means [Bishop, 1995]
Feature extraction | Competitive [Zurada, 1992; Haykin, 1994] | - | Competitive
Feature extraction | Kohonen [Zurada, 1992; Haykin, 1994] | - | Kohonen
Feature extraction | Multilayer perceptron [Kung, 1993] | Back-propagation | -
Feature extraction | Principal comp. anal. [Zurada, 1992; Kung, 1993] | - | Oja's [Zurada, 1992; Kung, 1993]
Prediction, system ID | Time-lagged networks [Zurada, 1992; Kung, 1993; de Vries and Principe, 1992] | Back-propagation through time [Zurada, 1992] | -
Prediction, system ID | Fully recurrent nets [Zurada, 1992] | |
It is clear that multilayer perceptrons (MLPs), the back-propagation algorithm and its extensions - time-lagged networks (TLN) and back-propagation through time (BPTT), respectively - hold a prominent position in ANN technology. It is therefore only natural to spend most of our overview presenting the theory and tools of back-propagation learning. It is also important to notice that Hebbian learning (and its extension, the Oja rule) is also a very useful (and biologically plausible) learning mechanism. It is an unsupervised learning method, since there is no need to specify the desired or target response to the ANN.
20.2 Multilayer Perceptrons
Multilayer perceptrons are a layered arrangement of nonlinear PEs as shown in Fig. 20.2. The layer that receives the input is called the input layer, and the layer that produces the output is the output layer. The layers that do not have direct access to the external world are called hidden layers. A layered network with just the input and output layers is called the perceptron. Each connection between PEs is weighted by a scalar, w_ij, called a weight, which is adapted during learning.
The PEs in the MLP are composed of an adder followed by a smooth saturating nonlinearity of the sigmoid type (Fig. 20.3). The most common saturating nonlinearities are the logistic function and the hyperbolic tangent. The threshold is used in other nets. The importance of the MLP is that it is a universal mapper (implements arbitrary input/output maps) when the topology has at least two hidden layers and a sufficient number of PEs [Haykin, 1994]. Even MLPs with a single hidden layer are able to approximate continuous input/output maps. This means that we will rarely need to choose topologies with more than two hidden layers. But these are existence proofs, so the issue that we must solve as engineers is to choose how many layers and how many PEs in each layer are required to produce good results.

Many problems in engineering can be thought of in terms of a transformation of an input space, containing the input, to an output space where the desired response exists. For instance, dividing data into classes can be thought of as transforming the input into 0 and 1 responses that will code the classes [Bishop, 1995]. Likewise, identification of an unknown system can also be framed as a mapping (function approximation) from the input to the system output [Kung, 1993]. The MLP is highly recommended for these applications.
FIGURE 20.2 MLP with one hidden layer (d-k-m).
FIGURE 20.3 A PE and the most common nonlinearities.
Function of Each PE
Let us study briefly the function of a single PE with two inputs [Zurada, 1992]. If the nonlinearity is the threshold nonlinearity we can immediately see that the output is simply 1 and -1. The surface that divides these subspaces is called a separation surface, and in this case it is a line of equation

$w_1 x_1 + w_2 x_2 + b = 0$   (20.1)

i.e., the PE weights and the bias control the orientation and position of the separation line, respectively (Fig. 20.4). In many dimensions the separation surface becomes a hyperplane of dimension one less than the dimensionality of the input space. So, each PE creates a dichotomy in the input space. For smooth nonlinearities the separation surface is not crisp; it becomes fuzzy, but the same principles apply. In this case, the size of the weights controls the width of the fuzzy boundary (larger weights shrink the fuzzy boundary).

The perceptron input/output map is built from a juxtaposition of linear separation surfaces, so the perceptron gives zero classification error only for linearly separable classes (i.e., classes that can be exactly classified by hyperplanes).
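To make the role of the weights and bias concrete, the short sketch below (an illustration, not code from the chapter; the weight and input values are made up) computes the output of a two-input threshold PE and of its smooth logistic counterpart, with Eq. (20.1) defining the separation line between the two responses.

```python
import numpy as np

def threshold_pe(x, w, b):
    """Two-input PE with a hard threshold: output is +1 or -1; Eq. (20.1) is the boundary."""
    return 1.0 if np.dot(w, x) + b >= 0 else -1.0

def logistic_pe(x, w, b):
    """Same PE with a smooth logistic nonlinearity; larger |w| sharpens the fuzzy boundary."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

w = np.array([2.0, -1.0])   # hypothetical weights: orientation of the separation line
b = -0.5                    # hypothetical bias: position of the separation line

for x in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    print(x, threshold_pe(x, w, b), round(logistic_pe(x, w, b), 3))
```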
When one adds one layer to the perceptron, creating a one-hidden-layer MLP, the type of separation surfaces changes drastically. It can be shown that this learning machine is able to create "bumps" in the input space, i.e., an area of high response surrounded by low responses [Zurada, 1992]. The function of each PE is always the same, no matter if the PE is part of a perceptron or an MLP. However, notice that the output layer in the MLP works with the result of the hidden layer activations, creating an embedding of functions and producing more complex separation surfaces. The one-hidden-layer MLP is able to produce nonlinear separation surfaces.

If one adds an extra layer (i.e., two hidden layers), the learning machine can now combine bumps at will, which can be interpreted as a universal mapper, since there is evidence that any function can be approximated by localized bumps. One important aspect to remember is that changing a single weight in the MLP can drastically change the location of the separation surfaces; i.e., the MLP achieves the input/output map through the interplay of all its weights.
How to Train MLPs
One fundamental issue is how to adapt the weights w_ij of the MLP to achieve a given input/output map. The core ideas have been around for many years in optimization, and they are extensions of well-known engineering principles, such as the least mean square (LMS) algorithm of adaptive filtering [Haykin, 1994]. Let us review the theory here. Assume that we have a linear PE (f(net) = net) and that one wants to adapt the weights so as to minimize the square difference between the desired signal and the PE response (Fig. 20.5).

This problem has an analytical solution known as the least squares solution [Haykin, 1994]. The optimal weights are obtained as the product of the inverse of the input autocorrelation function (R^-1) and the cross-correlation vector (P) between the input and the desired response. The analytical solution is equivalent to a search for the minimum of the quadratic performance surface J(w) using gradient descent, where the weights at each iteration k are adjusted by
FIGURE 20.4 A two-input PE and its separation surface.
$w(k+1) = w(k) - \eta\,\nabla J(k)$   (20.2)

where η is a small constant called the step size, and ∇J(k) is the gradient of the performance surface at iteration k.

Bernard Widrow in the late 1960s proposed a very efficient estimate to compute the gradient at each iteration,

$\nabla J(k) \approx \frac{\partial}{\partial w}\left[\frac{1}{2}\,\varepsilon^2(k)\right] = -\varepsilon(k)\,x(k)$   (20.3)

which when substituted into Eq. (20.2) produces the so-called LMS algorithm. He showed that the LMS converged to the analytic solution provided the step size η is small enough. Since it is a steepest descent procedure, the largest step size is limited by the inverse of the largest eigenvalue of the input autocorrelation matrix. The larger the step size (below this limit), the faster the convergence, but the final values will "rattle" around the optimal value in a basin that has a radius proportional to the step size. Hence, there is a fundamental trade-off between speed of convergence and accuracy in the final weight values. One great appeal of the LMS algorithm is that it is very efficient (just one multiplication per weight) and requires only local quantities to be computed.
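As a concrete illustration of Eqs. (20.2) and (20.3), the sketch below (an illustrative example, not from the chapter; the signal and step size are made up) adapts the weights of a linear PE with the LMS rule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical identification problem: the desired signal comes from an unknown linear system.
true_w = np.array([0.8, -0.3])
x = rng.standard_normal((500, 2))                    # input patterns
d = x @ true_w + 0.01 * rng.standard_normal(500)     # desired response (slightly noisy)

w = np.zeros(2)       # adaptive weights of the linear PE
eta = 0.05            # step size (must be small enough for convergence)

for n in range(len(x)):
    y = w @ x[n]              # PE response
    err = d[n] - y            # instantaneous error
    w += eta * err * x[n]     # LMS update, Eqs. (20.2)-(20.3)

print("estimated weights:", w)   # should approach true_w
```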
The LMS algorithm can be framed as a computation of partial derivatives of the cost with respect to the unknowns, i.e., the weight values. In fact, with the chain rule one writes

$\frac{\partial J}{\partial w_i} = \frac{\partial J}{\partial y}\,\frac{\partial y}{\partial w_i} = -\varepsilon\,x_i$   (20.4)

and we obtain the LMS algorithm for the linear PE. What happens if the PE is nonlinear? If the nonlinearity is differentiable (smooth), we can still apply the same method, because the chain rule prescribes that (Fig. 20.6)
FIGURE 20.5 Computing analytically the optimal weights for the linear PE.
FIGURE 20.6 How to extend LMS to nonlinear PEs with the chain rule.
$\frac{\partial J}{\partial w_i} = \frac{\partial J}{\partial y}\,\frac{\partial y}{\partial \mathrm{net}}\,\frac{\partial \mathrm{net}}{\partial w_i} = -\varepsilon\,f'(\mathrm{net})\,x_i$   (20.5)

where f'(net) is the derivative of the nonlinearity computed at the operating point. Equation (20.5) is known as the delta rule, and it will train the perceptron [Haykin, 1994]. Note that throughout the derivation we skipped the pattern index for simplicity, but this rule is applied for each input pattern. However, the delta rule cannot train MLPs since it requires the knowledge of the error signal at each PE.
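The delta rule of Eq. (20.5) differs from LMS only by the factor f'(net). A minimal sketch (illustrative only; the logistic nonlinearity, data, and step size are assumptions, not prescribed by the chapter):

```python
import numpy as np

def logistic(net):
    return 1.0 / (1.0 + np.exp(-net))

rng = np.random.default_rng(1)
x = rng.standard_normal((200, 2))
d = (x[:, 0] + x[:, 1] > 0).astype(float)   # hypothetical 0/1 targets

w = np.zeros(2)
b = 0.0
eta = 0.1

for n in range(len(x)):
    net = w @ x[n] + b
    y = logistic(net)
    err = d[n] - y
    fprime = y * (1.0 - y)          # derivative of the logistic at the operating point
    w += eta * err * fprime * x[n]  # delta rule, Eq. (20.5)
    b += eta * err * fprime
```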
The principle of ordered derivatives can be extended to multilayer networks, provided we organize the computations in flows of activation and error propagation. The principle is very easy to understand, but a little complex to formulate in equation form [Haykin, 1994].
Suppose that we want to adapt the weights connected to a hidden layer PE, the ith PE (Fig. 20.7). One can decompose the computation of the partial derivative of the cost with respect to the weight w_ij as

$\frac{\partial J}{\partial w_{ij}} = \underbrace{\frac{\partial J}{\partial y_i}}_{1}\;\underbrace{\frac{\partial y_i}{\partial w_{ij}}}_{2}$   (20.6)

i.e., the partial derivative with respect to the weight is the product of the partial derivative with respect to the PE state - part 1 in Eq. (20.6) - times the partial derivative of the local activation with respect to the weights - part 2 in Eq. (20.6). This last quantity is exactly the same as for the nonlinear PE (f'(net_i) x_j), so the big issue is the computation of ∂J/∂y_i. For an output PE, ∂J/∂y_i becomes the injected error ε in Eq. (20.4). For the hidden ith PE, ∂J/∂y_i is evaluated by summing all the errors that reach the PE from the top layer through the topology when the injected errors ε_k are clamped at the top layer, or in an equation

$\frac{\partial J}{\partial y_i} = \sum_k \frac{\partial J}{\partial y_k}\,\frac{\partial y_k}{\partial y_i} = -\sum_k \varepsilon_k\,f'(\mathrm{net}_k)\,w_{ki}$   (20.7)

Substituting back in Eq. (20.6) we finally get

$\frac{\partial J}{\partial w_{ij}} = -x_j\,f'(\mathrm{net}_i)\sum_k \varepsilon_k\,f'(\mathrm{net}_k)\,w_{ki}$   (20.8)
FIGURE 20.7 How to adapt the weights connected to the ith PE.
This equation embodies the back-propagation training algorithm [Haykin, 1994; Bishop, 1995]. It can be rewritten as the product of a local activation (part 1) and a local error (part 2), exactly as the LMS and the delta rules. But now the local error is a composition of errors that flow through the topology, which becomes equivalent to the existence of a desired response at the PE.

There is an intrinsic flow in the implementation of the back-propagation algorithm: first, inputs are applied to the net and activations computed everywhere to yield the output activation. Second, the external errors are computed by subtracting the net output from the desired response. Third, these external errors are utilized in Eq. (20.8) to compute the local errors for the layer immediately preceding the output layer, and the computations are chained up to the input layer. Once all the local errors are available, Eq. (20.2) can be used to update every weight. These three steps are then repeated for other training patterns until the error is acceptable.
Step three is equivalent to injecting the external errors in the dual topology and back-propagating them up to the input layer [Haykin, 1994]. The dual topology is obtained from the original one by reversing data flow and substituting summing junctions by splitting nodes and vice versa. The error at each PE of the dual topology is then multiplied by the activation of the original network to compute the weight updates. So, effectively the dual topology is being used to compute the local errors, which makes the procedure highly efficient. This is the reason back-propagation trains a network of N weights with a number of multiplications proportional to N (O(N)), instead of O(N^2) for previous methods of computing partial derivatives known in control theory. Using the dual topology to implement back-propagation is the best and most general method to program the algorithm in a digital computer.
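To tie Eqs. (20.5) through (20.8) together, here is a compact sketch of pattern-by-pattern back-propagation for a one-hidden-layer MLP with logistic PEs. It only illustrates the flow described above (forward activations, external errors, local errors, weight updates); the data, topology sizes, and step size are made-up values, not part of the chapter.

```python
import numpy as np

def logistic(net):
    return 1.0 / (1.0 + np.exp(-net))

rng = np.random.default_rng(2)
d_in, k_hid, m_out = 2, 4, 1                         # d-k-m topology (hypothetical sizes)
W1 = 0.1 * rng.standard_normal((k_hid, d_in + 1))    # hidden weights (last column = bias)
W2 = 0.1 * rng.standard_normal((m_out, k_hid + 1))   # output weights
eta = 0.2

# Hypothetical training set: XOR-like patterns.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)

for epoch in range(5000):
    for x, d in zip(X, D):
        # 1) forward pass: activations everywhere
        h = logistic(W1 @ np.append(x, 1.0))
        y = logistic(W2 @ np.append(h, 1.0))
        # 2) external error at the output
        err = d - y
        # 3) local errors: output layer, then hidden layer (Eq. 20.7)
        delta_out = err * y * (1.0 - y)
        delta_hid = (W2[:, :k_hid].T @ delta_out) * h * (1.0 - h)
        # weight updates (Eqs. 20.2 and 20.8)
        W2 += eta * np.outer(delta_out, np.append(h, 1.0))
        W1 += eta * np.outer(delta_hid, np.append(x, 1.0))

# After training, the outputs should approach [0, 1, 1, 0].
print([round(float(logistic(W2 @ np.append(logistic(W1 @ np.append(x, 1.0)), 1.0))[0]), 2) for x in X])
```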
Applying Back-Propagation in Practice
Now that we have an algorithm to train MLPs, let us see what the practical issues are in applying it. We will address the following aspects: size of the training set vs. number of weights, search procedures, how to stop training, and how to set the topology for maximum generalization.
Size of Training Set
The size of the training set is very important for good performance. Remember that the ANN gets its information from the training set. If the training data do not cover the full range of operating conditions, the system may perform badly when deployed. Under no circumstances should the training set be smaller than the number of weights in the ANN. A good size for the training set is ten times the number of weights in the network, with the lower limit being set around three times the number of weights (these values should be taken as an indication, subject to experimentation for each case) [Haykin, 1994].
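As a quick worked example of this rule of thumb (the topology sizes are arbitrary, chosen only for illustration), the number of weights of a d-k-m MLP with biases and the corresponding recommended training-set sizes can be computed as follows.

```python
def mlp_weight_count(d, k, m):
    """Weights (including biases) of a one-hidden-layer d-k-m MLP."""
    return (d + 1) * k + (k + 1) * m

n_weights = mlp_weight_count(d=8, k=10, m=3)       # hypothetical 8-10-3 topology
print(n_weights)                                    # 123 weights
print(3 * n_weights, "to", 10 * n_weights, "training patterns recommended")
```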
Search Procedures
Searching along the direction of the gradient is fine if the performance surface is quadratic. However, in ANNs this is rarely the case, because of the use of nonlinear PEs and topologies with several layers. So, gradient descent can be caught in local minima, and the search becomes very slow in regions of small curvature. One efficient way to speed up the search in regions of small curvature and, at the same time, to stabilize it in narrow valleys is to include a momentum term in the weight adaptation
$w_{ij}(n+1) = w_{ij}(n) + \eta\,\delta_i(n)\,x_j(n) + \alpha\left[w_{ij}(n) - w_{ij}(n-1)\right]$   (20.9)

The value of the momentum α should be set experimentally between 0.5 and 0.9. There are many more modifications to the conventional gradient search, such as adaptive step sizes, annealed noise, conjugate gradients, and second-order methods (using information contained in the Hessian matrix), but the simplicity and power of momentum learning is hard to beat [Haykin, 1994; Bishop, 1995].
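A sketch of the momentum update of Eq. (20.9), written as a drop-in replacement for the plain gradient step used earlier (the names and parameter values are illustrative assumptions):

```python
import numpy as np

eta, alpha = 0.2, 0.9          # step size and momentum (0.5 to 0.9 is the suggested range)

def momentum_step(W, W_prev, delta, activation):
    """One momentum update, Eq. (20.9): returns (new W, previous W for the next call)."""
    W_new = W + eta * np.outer(delta, activation) + alpha * (W - W_prev)
    return W_new, W

# usage inside a training loop, with delta and activation from back-propagation:
# W2, W2_prev = momentum_step(W2, W2_prev, delta_out, np.append(h, 1.0))
```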
How to Stop Training
The stop criterion is a fundamental aspect of training. The simple ideas of capping the number of iterations or of letting the system train until a predetermined error value are not recommended. The reason is that we want the ANN to perform well on the test set data; i.e., we would like the system to perform well on data it
never saw before (good generalization) [Bishop, 1995]. The error in the training set tends to decrease with iteration when the ANN has enough degrees of freedom to represent the input/output map. However, the system may be remembering the training patterns (overfitting) instead of finding the underlying mapping rule. This is called overtraining. To avoid overtraining, the performance in a validation set, i.e., a set of input data that the system never saw before, must be checked regularly during training (e.g., once every 50 passes over the training set). The training should be stopped when the performance in the validation set starts to increase, despite the fact that the performance in the training set continues to decrease. This method is called cross validation. The validation set should be 10% of the training set, and distinct from it.
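A minimal early-stopping loop implementing this cross-validation recipe might look as follows; train_pass and val_error are caller-supplied callables standing in for one back-propagation pass and the validation error measure, and the checking interval and pass limit are assumptions of this sketch.

```python
def train_with_early_stopping(weights, train_pass, val_error, max_passes=5000, check_every=50):
    """Generic early stopping: train_pass(weights) -> new weights, val_error(weights) -> float.
    Stops when the error in the validation set starts to increase (cross validation)."""
    best_err = float("inf")
    best_weights = weights
    for p in range(max_passes):
        weights = train_pass(weights)            # one pass of back-propagation over the training set
        if p % check_every == 0:
            err = val_error(weights)             # performance on data never used for training
            if err > best_err:                   # validation error went up: overtraining has begun
                break
            best_err, best_weights = err, weights
    return best_weights
```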
Size of the Topology
The size of the topology should also be carefully selected. If the number of layers or the size of each layer is too small, the network does not have enough degrees of freedom to classify the data or to approximate the function, and the performance suffers.

On the other hand, if the size of the network is too large, performance may also suffer. This is the phenomenon of overfitting that we mentioned above. But one alternative way to control it is to reduce the size of the network. There are basically two procedures to set the size of the network: either one starts small and adds new PEs, or one starts with a large network and prunes PEs [Haykin, 1994]. One quick way to prune the network is to impose a penalty term in the performance function - a regularizing term - such as limiting the slope of the input/output map [Bishop, 1995]. A regularization term that can be implemented locally is

$w_{ij}(n+1) = w_{ij}(n)\left[1 - \frac{\lambda}{1 + w_{ij}^2(n)}\right] + \eta\,\delta_i(n)\,x_j(n)$   (20.10)

where λ is the weight decay parameter and δ_i the local error. Weight decay tends to drive unimportant weights to zero.
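A sketch of the locally implementable decay of Eq. (20.10) as it would slot into the weight-update step; the parameter values are illustrative assumptions, and the exact form of the decay factor follows the reconstruction of Eq. (20.10) above.

```python
import numpy as np

eta, lam = 0.2, 1e-3    # step size and weight decay parameter (illustrative values)

def decayed_update(W, delta, activation):
    """Eq. (20.10): shrink each weight (most strongly when it is small), then add the gradient term."""
    decay = 1.0 - lam / (1.0 + W**2)
    return W * decay + eta * np.outer(delta, activation)
```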
A Posteriori Probabilities
We will finish the discussion of the MLP by noting that this topology, when trained with the mean square error, is able to estimate directly at its outputs a posteriori probabilities, i.e., the probability that a given input pattern belongs to a given class [Bishop, 1995]. This property is very useful because the MLP outputs can be interpreted as probabilities and operated on as numbers. In order to guarantee this property, one has to make sure that each class is attributed to one output PE, that the topology is sufficiently large to represent the mapping, that the training has converged to the absolute minimum, and that the outputs are normalized between 0 and 1. The first requirements are met by good design, while the last can be easily enforced if the softmax activation is used as the output PE [Bishop, 1995],

$y_i = \frac{\exp(\mathrm{net}_i)}{\sum_j \exp(\mathrm{net}_j)}$   (20.11)
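A one-line softmax, written in a numerically safe form (the max subtraction is a standard implementation detail, not something the chapter discusses):

```python
import numpy as np

def softmax(net):
    """Eq. (20.11): outputs are positive and sum to 1, so they can be read as class probabilities."""
    e = np.exp(net - np.max(net))   # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, -1.0])))   # approximately [0.705, 0.259, 0.035]
```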
20.3 Radial Basis Function Networks
The radial basis function (RBF) network constitutes another way of implementing arbitrary input/output mappings. The most significant difference between the MLP and the RBF lies in the PE nonlinearity. While the PE in the MLP responds to the full input space, the PE in the RBF is local, normally a Gaussian kernel in the input space. Hence, it only responds to inputs that are close to its center; i.e., it has basically a local response.
The RBF network is also a layered net, with the hidden layer built from Gaussian kernels and a linear (or nonlinear) output layer (Fig. 20.8). Training of the RBF network is normally done in two stages [Haykin, 1994]: first, the centers x_i are adaptively placed in the input space using competitive learning or k-means clustering [Bishop, 1995], which are unsupervised procedures. Competitive learning is explained later in the chapter. The variance of each Gaussian is chosen as a percentage (30 to 50%) of the distance to the nearest center. The goal is to cover the input data distribution adequately. Once the RBFs are located, the second-layer weights w_i are trained using the LMS procedure.

RBF networks are easy to work with, they train very fast, and they have shown good properties both for function approximation and for classification. The problem is that they require lots of Gaussian kernels in high-dimensional spaces.
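A compact sketch of this two-stage procedure; the center placement here uses a few plain k-means iterations rather than full competitive learning, and the data, sizes, and the 50% variance rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((300, 2))                        # hypothetical input data
d = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(300)     # hypothetical desired response

# Stage 1: place the centers (unsupervised k-means iterations).
K = 10
centers = X[rng.choice(len(X), K, replace=False)]
for _ in range(20):
    labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([X[labels == k].mean(0) if np.any(labels == k) else centers[k] for k in range(K)])

# Width of each Gaussian: a fraction (here 50%) of the distance to the nearest other center.
dists = np.sqrt(((centers[:, None] - centers[None]) ** 2).sum(-1))
np.fill_diagonal(dists, np.inf)
sigma = 0.5 * dists.min(1)

def hidden(x):
    """Gaussian kernel outputs: local responses around each center."""
    return np.exp(-((x - centers) ** 2).sum(1) / (2 * sigma ** 2))

# Stage 2: train the output weights with LMS.
w = np.zeros(K)
eta = 0.05
for n in range(len(X)):
    phi = hidden(X[n])
    w += eta * (d[n] - w @ phi) * phi
```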
20.4 Time-Lagged Networks
The MLP is the most common neural network topology, but it can only handle instantaneous information, since the system has no memory and it is feedforward. In engineering, the processing of signals that exist in time requires systems with memory, i.e., linear filters. Another alternative to implement memory is to use feedback, which gives rise to recurrent networks. Fully recurrent networks are difficult to train and to stabilize, so it is preferable to develop topologies based on MLPs but where explicit subsystems to store the past information are included. These subsystems are called short-term memory structures [de Vries and Principe, 1992]. The combination of an MLP with short-term memory structures is called a time-lagged network (TLN). The memory structures can eventually be recurrent, but the feedback is local, so stability is still easy to guarantee. Here, we will cover just one TLN topology, called focused, where the memory is at the input layer. The most general TLNs have memory added anywhere in the network, but they require other more-involved training strategies (BPTT [Haykin, 1994]). The interested reader is referred to de Vries and Principe [1992] for further details.

The function of a short-term memory in the focused TLN is to represent the past of the input signal, while the nonlinear PEs provide the mapping as in the MLP (Fig. 20.9).
Memory Structures
The simplest memory structure is built from a tap delay line (Fig. 20.10). The memory by delays is a single-input, multiple-output system that has no free parameters except its size K. The tap delay memory is the memory utilized in the time-delay neural network (TDNN), which has been utilized successfully in speech recognition and system identification [Kung, 1993].

A different mechanism for linear memory is feedback (Fig. 20.11). Feedback allows the system to remember past events because of the exponential decay of the response. This memory has limited resolution because of the low pass required for long memories. But notice that, unlike the memory by delay, memory by feedback provides the learning system with a free parameter that controls the length of the memory. Memory by feedback has been used in Elman and Jordan networks [Haykin, 1994].
FIGURE 20.8 Radial basis function (RBF) network.
It is possible to combine the advantages of memory by feedback with those of the memory by delays in linear systems called dispersive delay lines. The most studied of these memories is a cascade of low-pass functions called the gamma memory [de Vries and Principe, 1992]. The gamma memory has a free parameter μ that controls and decouples memory depth from resolution of the memory. Memory depth D is defined as the first moment of the impulse response from the input to the last tap K, while memory resolution R is the number of taps per unit time. For the gamma memory D = K/μ and R = μ; i.e., changing μ modifies the memory depth and resolution inversely. This recursive parameter can be adapted with the output MSE like the other network parameters; i.e., the ANN is able to choose the best memory depth to minimize the output error, unlike the tap delay memory.
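The sketch below contrasts the two memory structures for a scalar input stream: a tap delay line and a gamma memory. The gamma recursion used here, x_k(n) = (1 - μ) x_k(n-1) + μ x_{k-1}(n-1), is the commonly cited form from de Vries and Principe [1992]; treat the exact indexing and the constants as assumptions of this illustration.

```python
import numpy as np

K, mu = 5, 0.4                     # number of taps and gamma parameter (illustrative values)

def tap_delay_step(taps, u):
    """Shift the delay line and insert the new sample (depth K, resolution 1)."""
    return np.concatenate(([u], taps[:-1]))

def gamma_step(taps, u):
    """Gamma memory update: depth K/mu and resolution mu are set by the single parameter mu."""
    new = taps.copy()
    new[0] = u                      # tap 0 is the input itself
    for k in range(1, K + 1):
        new[k] = (1.0 - mu) * taps[k] + mu * taps[k - 1]
    return new

delay_taps = np.zeros(K)
gamma_taps = np.zeros(K + 1)
for u in np.sin(0.2 * np.arange(50)):        # hypothetical input signal
    delay_taps = tap_delay_step(delay_taps, u)
    gamma_taps = gamma_step(gamma_taps, u)
# delay_taps or gamma_taps[1:] would feed the MLP of the focused TLN
```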
Training-Focused TLN Architectures
The appeal of the focused architecture is that the MLP weights can still be adapted with back-propagation. However, the input/output mapping produced by these networks is static. The input memory layer brings in past input information to establish the value of the mapping.

As we know from engineering, the size of the memory is fundamental to identify, for instance, an unknown plant or to perform prediction with a small error. But note that with the focused TLN the models for system identification become nonlinear (i.e., nonlinear moving average - NMA).

When the tap delay implements the short-term memory, straight back-propagation can be utilized since the only adaptive parameters are the MLP weights. When the gamma memory is utilized (or the context PE), the recursive parameter is adapted in a total adaptive framework (or the parameter is preset by some external consideration). The equations to adapt the context PE and the gamma memory are shown in Figs. 20.11 and 20.12, respectively. For the context PE, δ(n) refers to the total error that is back-propagated from the MLP and that reaches the dual context PE.
FIGURE 20.9 A focused TLN.
FIGURE 20.10 Tap delay line memory.
FIGURE 20.11 Memory by feedback (context PE).
20.5 Hebbian Learning and Principal Component Analysis Networks
Hebbian Learning
Hebbian learning is an unsupervised learning rule that captures similarity between an input and an output through correlation. To adapt a weight w_i using Hebbian learning we adjust the weights according to Δw_i = η x_i y, or in an equation [Haykin, 1994],

$w_i(n+1) = w_i(n) + \eta\,x_i(n)\,y(n)$   (20.12)

where η is the step size, x_i is the ith input, and y is the PE output.

The output of the single PE is an inner product between the input and the weight vector (formula in Fig. 20.13). It measures the similarity between the two vectors - i.e., if the input is close to the weight vector the output y is large; otherwise it is small. The weights are computed by an outer product of the input X and output Y, i.e., W = XY^T, where T means transpose. The problem of Hebbian learning is that it is unstable; i.e., the weights will keep on growing with the number of iterations [Haykin, 1994].

Oja proposed to stabilize the Hebbian rule by normalizing the new weight by its size, which gives the rule [Haykin, 1994]:

$w_i(n+1) = w_i(n) + \eta\,y(n)\left[x_i(n) - y(n)\,w_i(n)\right]$   (20.13)

The weights now converge to finite values. They still define in the input space the direction where the data cluster has its largest projection, which corresponds to the eigenvector with the largest eigenvalue of the input correlation matrix [Kung, 1993]. The output of the PE provides the largest eigenvalue of the input correlation matrix.
FIGURE 20.12 Gamma memory (dispersive delay line).
FIGURE 20.13 Hebbian PE.
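A short sketch of Eqs. (20.12) and (20.13) on synthetic data (the data distribution and step size are illustrative assumptions): the Oja weights converge toward the leading eigenvector of the input correlation matrix, while the plain Hebbian weights keep growing.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical 2-D data with a dominant direction along the first axis.
X = rng.standard_normal((2000, 2)) @ np.array([[2.0, 0.0], [0.0, 0.5]])

eta = 0.01
w_hebb = np.array([0.1, 0.1])
w_oja = np.array([0.1, 0.1])

for x in X:
    y_h = w_hebb @ x
    w_hebb += eta * x * y_h                   # Eq. (20.12): unstable, |w| keeps growing
    y_o = w_oja @ x
    w_oja += eta * y_o * (x - y_o * w_oja)    # Eq. (20.13): Oja's rule, converges

print(np.linalg.norm(w_hebb))                 # very large: Hebbian weights blow up
print(w_oja / np.linalg.norm(w_oja))          # points along the dominant data direction
```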
Principal Component Analysis
Principal component analysis (PCA) is a well-known technique in signal processing that is used to project a signal onto a signal-specific basis. The importance of PCA is that it provides the best linear projection to a subspace in terms of preserving the signal energy [Haykin, 1994]. Normally, PCA is computed analytically through a singular value decomposition. PCA networks offer an alternative to this computation by providing an iterative implementation that may be preferred for real-time operation in embedded systems.

The PCA network is a one-layer network with linear processing elements (Fig. 20.14). One can extend Oja's rule to many output PEs (fewer than or equal to the number of input PEs), according to the formula shown in Fig. 20.14, which is called Sanger's rule [Haykin, 1994]. The weight matrix rows (which contain the weights connected to the output PEs in descending order) are the eigenvectors of the input correlation matrix. If we set the number of output PEs equal to M < D, we will be projecting the input data onto the M largest principal components. Their outputs will be proportional to the M largest eigenvalues. Note that we are performing an eigendecomposition through an iterative procedure.
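Since the chapter only points to Fig. 20.14 for Sanger's rule, the sketch below uses the commonly cited form Δw_ij = η y_i (x_j - Σ_{k≤i} y_k w_kj) [Haykin, 1994]; the data, sizes, and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
D, M = 5, 2                                    # input dimension and number of principal components
X = rng.standard_normal((5000, D)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])  # hypothetical data

W = 0.1 * rng.standard_normal((M, D))          # rows converge to the M leading eigenvectors
eta = 0.002

for x in X:
    y = W @ x
    for i in range(M):
        # Sanger's rule: deflate the input by the reconstruction of the first i components.
        x_hat = W[: i + 1].T @ y[: i + 1]
        W[i] += eta * y[i] * (x - x_hat)

print(W @ W.T)   # close to the identity: rows are (approximately) orthonormal eigenvector estimates
```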
Associative Memories
Hebbian learning is also the rule used to create associative memories [Zurada, 1992]. The most-utilized associative memory implements heteroassociation, where the system is able to associate an input X to a designated output Y, which can be of a different dimension (Fig. 20.15). So, in heteroassociation the signal Y works as the desired response.

We can train such a memory using Hebbian learning or LMS, but LMS provides a more efficient encoding of information. Associative memories differ from conventional computer memories in several respects. First, they are content addressable, and the information is distributed throughout the network, so they are robust to noise in the input. With nonlinear PEs or recurrent connections (as in the famous Hopfield network [Haykin, 1994]) they display the important property of pattern completion, i.e., when the input is distorted or only partially available, the recall can still be perfect.
FIGURE 20.14 PCA network.
FIGURE 20.15 Associative memory (heteroassociation).
A special case of associative memories is called the autoassociator (Fig. 20.16), where the training output of size D is equal to the input signal (also of size D) [Kung, 1993]. Note that the hidden layer has fewer PEs (M < D) than the input (bottleneck layer). V_1 = V_2^T is enforced. The function of this network is one of encoding or data reduction. The training of this network (the V_2 matrix) is done with LMS. It can be shown that this network also implements PCA with M components, even when the hidden layer is built from nonlinear PEs.
20.6 Competitive Learning and Kohonen Networks
Competition is a very efficient way to divide the computing resources of a network. Instead of having each output PE more or less sensitive to the full input space, as in the associative memories, in a competitive network each PE specializes in a piece of the input space and represents it [Haykin, 1994]. Competitive networks are linear, single-layer nets (Fig. 20.17). Their functionality is directly related to the competitive learning rule, which belongs to the unsupervised category. First, only the PE that has the largest output gets its weights updated. The weights of the winning PE are updated according to the formula in Fig. 20.17, in such a way that they approach the present input. The step size controls exactly how large this adjustment is (see Fig. 20.17).

Notice that there is an intrinsic nonlinearity in the learning rule: only the PE that has the largest output (the winner) has its weights updated. All the other weights remain unchanged. This is the mechanism that allows the competitive net PEs to specialize.

Competitive networks are used for clustering; i.e., an M-output PE net will seek M clusters in the input space. The weights of each PE will correspond to the center of mass of one of the M clusters of input samples. When a given pattern is shown to the trained net, only one of the outputs will be active and can be used to label the sample as belonging to one of the clusters. No more information about the input data is preserved.
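The winner-take-all update described above can be sketched as follows. Since the chapter gives the update only in Fig. 20.17, the form used here, moving the winning weight vector toward the present input, and the winner selection by closest weight vector (equivalent to largest output when inputs and weights are normalized) should be taken as assumptions, as should the data.

```python
import numpy as np

rng = np.random.default_rng(6)
# Hypothetical data drawn from three clusters.
X = np.vstack([rng.normal(c, 0.3, (200, 2)) for c in ([0, 0], [3, 0], [0, 3])])
rng.shuffle(X)

M = 3
W = rng.standard_normal((M, 2))     # one weight vector per output PE
eta = 0.05

for x in X:
    winner = np.argmin(((W - x) ** 2).sum(1))   # closest weight vector (winning PE)
    W[winner] += eta * (x - W[winner])          # only the winner moves toward the present input

print(W)   # rows approach the centers of mass of the clusters
```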
FIGURE 20.16 Autoassociator.
FIGURE 20.17 Competitive neural network.

Competitive learning is one of the fundamental components of the Kohonen self-organizing feature map (SOFM) network, which is also a single-layer network with linear PEs [Haykin, 1994]. Kohonen learning creates an annealed competition in the output space, by adapting not only the winner PE weights but also those of its spatial neighbors, using a Gaussian neighborhood function Λ. The output PEs are arranged in linear or two-dimensional neighborhoods (Fig. 20.18).

Kohonen SOFM networks produce a mapping from the continuous input space to the discrete output space that preserves topological properties of the input space (i.e., local neighbors in the input space are mapped to neighbors in the output space). During training, both the spatial neighborhood and the learning constant are decreased slowly, by starting with a large neighborhood σ_0 and decreasing it (N_0 controls the scheduling). The initial step size η_0 also needs to be scheduled (by K).

The Kohonen SOFM network is useful to project the input to a subspace as an alternative to PCA networks. The topological properties of the output space provide more information about the input than straight clustering.
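A compact one-dimensional SOFM sketch following this recipe; the exponential schedules for the neighborhood width and step size are common choices, and all constants here are illustrative assumptions rather than values given in the chapter.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.random((2000, 2))                    # hypothetical inputs in the unit square

M = 10                                        # output PEs arranged on a line
W = rng.random((M, 2))
positions = np.arange(M)                      # output-space coordinates of the PEs

sigma0, N0 = 3.0, 1000.0                      # initial neighborhood width and its decay constant
eta0, K = 0.5, 2000.0                         # initial step size and its decay constant

for n, x in enumerate(X):
    sigma = sigma0 * np.exp(-n / N0)          # shrink the neighborhood during training
    eta = eta0 * np.exp(-n / K)               # anneal the step size
    winner = np.argmin(((W - x) ** 2).sum(1))
    # Gaussian neighborhood: output-space neighbors of the winner are also updated.
    Lambda = np.exp(-((positions - winner) ** 2) / (2 * sigma ** 2))
    W += eta * Lambda[:, None] * (x - W)

print(W)   # the weights become ordered along the line, preserving input-space topology
```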
References

C. M. Bishop, Neural Networks for Pattern Recognition, New York: Oxford University Press, 1995.
de Vries and J. C. Principe, "The gamma model: a new neural model for temporal processing," Neural Networks, vol. 5, pp. 565-576, 1992.
S. Haykin, Neural Networks: A Comprehensive Foundation, New York: Macmillan, 1994.
S. Y. Kung, Digital Neural Networks, Englewood Cliffs, N.J.: Prentice-Hall, 1993.
J. M. Zurada, Introduction to Artificial Neural Systems, West Publishing, 1992.
Further Information

The literature in this field is voluminous. We decided to limit the references to textbooks for an engineering audience, with different levels of sophistication. Zurada is the most accessible text, Haykin the most comprehensive. Kung provides interesting applications of both PCA networks and nonlinear signal processing and system identification. Bishop concentrates on the design of pattern classifiers.

Interested readers are directed to the following journals for more information: IEEE Transactions on Signal Processing, IEEE Transactions on Neural Networks, Neural Networks, Neural Computation, and Proceedings of the Neural Information Processing Systems Conference (NIPS).
FIGURE 20.18 Kohonen SOFM.