2 Neuron Model and Network Architectures
Notation
Neural networks are so new that standard mathematical notation and architectural representations for them have not yet been firmly established. In addition, papers and books on neural networks have come from many diverse fields, including engineering, physics, psychology and mathematics, and many authors tend to use vocabulary peculiar to their specialty. As a result, many books and papers in this field are difficult to read, and concepts are made to seem more complex than they actually are. This is a shame, as it has prevented the spread of important new ideas. It has also led to more than one “reinvention of the wheel.”
In this book we have tried to use standard notation where
possible, to be clear and to keep matters simple without
sacrificing rigor. In particular, we have tried to define practical
conventions and use them consistently.
Figures, mathematical equations and text discussing both
figures and mathematical equations will use the following
notation:
Scalars — small italic letters: a,b,c
Vectors — small bold nonitalic letters: a,b,c
Matrices — capital BOLD nonitalic letters: A,B,C
Additional notation concerning the network architectures will be
introduced as you read this chapter. A complete list of the
notation that we use throughout the book is given in Appendix
B, so you can look there if you have a question.
Neuron Model
A single-input neuron is shown in Figure 2.1. The scalar input p is multiplied by the scalar weight w to form wp, one of the terms that is sent to the summer. The other input, 1, is multiplied by a bias b and then passed to the summer. The summer output n, often referred to as the net input, goes into a transfer function f, which produces the scalar neuron output a. (Some authors use the term “activation function” rather than transfer function and “offset” rather than bias.)
If we relate this simple model back to the biological neuron that we discussed in Chapter 1, the weight w corresponds to the strength of a synapse, the summer and the transfer function represent the cell body, and the neuron output a represents the signal on the axon.
The neuron output is calculated as

a = f(wp + b) .
The hard limit transfer function, shown on the left side of Figure 2.2, sets the output of the neuron to 0 if the function argument is less than 0, or to 1 if its argument is greater than or equal to 0.

(Figure 2.2, left panel: the hard limit transfer function a = hardlim(n); right panel: a single-input hardlim neuron, a = hardlim(wp + b), whose output changes where p = -b/w.)
Figure 2.2 Hard Limit Transfer Function
The graph on the right side of Figure 2.2 illustrates the input/output characteristic of a single-input neuron that uses a
hard limit transfer function. Here we can see the effect of the
weight and the bias. Note that an icon for the hard limit
transfer function is shown between the two figures. Such icons
will replace the general f in network diagrams to show the
particular transfer function that is being used.
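As a rough MATLAB sketch of this behavior, the commands below evaluate a single-input hard limit neuron over a range of inputs. The weight and bias values are chosen only for illustration, and hardlim is defined inline here rather than taken from any toolbox.

    % Single-input neuron with a hard limit transfer function (illustrative values)
    hardlim = @(n) double(n >= 0);   % a = 0 for n < 0, a = 1 for n >= 0
    w = 2;                           % example weight (assumed for illustration)
    b = -3;                          % example bias   (assumed for illustration)
    p = -2:0.5:4;                    % a range of scalar inputs
    a = hardlim(w*p + b);            % neuron output for each input
    % The output switches from 0 to 1 where the net input crosses zero,
    % i.e. at p = -b/w = 1.5, showing the effect of the weight and the bias.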
The output of a linear transfer function is equal to its input:

a = n ,    (2.1)
The log-sigmoid transfer function is shown in Figure 2.4.

(Figure 2.4, left panel: the log-sigmoid transfer function a = logsig(n); right panel: a single-input logsig neuron, a = logsig(wp + b).)
Figure 2.4 Log-Sigmoid Transfer Function
This transfer function takes the input (which may have any value between plus and minus infinity) and squashes the output into the range 0 to 1, according to the expression:
a = 1 / (1 + e^(-n)) .    (2.2)
The log-sigmoid transfer function is commonly used in multilayer
networks that are trained using the backpropagation algorithm,
in part because this function is differentiable (see Chapter 11).
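As a minimal sketch (no toolbox functions assumed), the log-sigmoid of Eq. (2.2) can be evaluated directly to see how any net input is squashed into the range 0 to 1; the sample net inputs are arbitrary.

    % Log-sigmoid transfer function, Eq. (2.2), defined inline
    logsig = @(n) 1 ./ (1 + exp(-n));
    n = [-10 -1 0 1 10];             % sample net inputs
    a = logsig(n)                    % outputs lie strictly between 0 and 1
    % logsig(-10) is close to 0, logsig(10) is close to 1, and logsig(0) = 0.5;
    % the curve is smooth, which is why it is differentiable.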
Most of the transfer functions used in this book are summarized
in Table
2.1. Of course, you can define other transfer functions in
addition to those shown in Table 2.1 if you wish.
To experiment with a single-input neuron, use the Neural Network Design Demonstration One-Input Neuron nnd2n1.
Name                          Input/Output Relation                   MATLAB Function
Hard Limit                    a = 0,  n < 0                           hardlim
                              a = 1,  n ≥ 0
Symmetrical Hard Limit        a = -1, n < 0                           hardlims
                              a = +1, n ≥ 0
Linear                        a = n                                   purelin
Saturating Linear             a = 0,  n < 0                           satlin
                              a = n,  0 ≤ n ≤ 1
                              a = 1,  n > 1
Symmetric Saturating Linear   a = -1, n < -1                          satlins
                              a = n,  -1 ≤ n ≤ 1
                              a = 1,  n > 1
Log-Sigmoid                   a = 1 / (1 + e^(-n))                    logsig
Hyperbolic Tangent Sigmoid    a = (e^n - e^(-n)) / (e^n + e^(-n))     tansig
Positive Linear               a = 0,  n < 0                           poslin
                              a = n,  0 ≤ n                           
Competitive                   a = 1,  neuron with max n               compet
                              a = 0,  all other neurons
Table 2.1 Transfer Functions
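The relations in Table 2.1 map directly onto one-line MATLAB definitions. The sketch below writes several of them as anonymous functions, as inline stand-ins for the toolbox functions named in the table; the sample net input is arbitrary.

    % Several transfer functions of Table 2.1, written as anonymous functions
    hardlim  = @(n) double(n >= 0);
    hardlims = @(n) 2*double(n >= 0) - 1;
    purelin  = @(n) n;
    satlin   = @(n) min(max(n, 0), 1);
    satlins  = @(n) min(max(n, -1), 1);
    logsig   = @(n) 1 ./ (1 + exp(-n));
    tansig   = @(n) (exp(n) - exp(-n)) ./ (exp(n) + exp(-n));
    poslin   = @(n) max(n, 0);

    n = 0.5;                                        % an arbitrary net input
    [hardlim(n) purelin(n) satlin(n) logsig(n)]     % compare several outputs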
Multiple-Input Neuron
Typically, a neuron has more than one input. A neuron with R inputs is shown in Figure 2.5. The individual inputs p1, p2, ..., pR are each weighted by corresponding elements w1,1, w1,2, ..., w1,R of the weight matrix W.
(Figure 2.5: multiple-input neuron with inputs p1, p2, ..., pR, weights w1,1 through w1,R, bias b, net input n and transfer function f; a = f(Wp + b).)
(Multiple-input neuron, abbreviated notation: p is R x 1, W is 1 x R, and b, n and a are scalars; a = f(Wp + b).)
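A minimal sketch of the multiple-input neuron, with R = 3 and every value assumed purely for illustration:

    % Multiple-input neuron: a = f(Wp + b), with W a 1 x R row vector
    logsig = @(n) 1 ./ (1 + exp(-n));  % transfer function chosen for the example
    W = [1.5  -0.8  0.4];              % 1 x 3 weight matrix (assumed values)
    p = [2; 1; -1];                    % 3 x 1 input vector  (assumed values)
    b = 0.5;                           % scalar bias
    n = W*p + b;                       % net input: w1,1*p1 + w1,2*p2 + w1,3*p3 + b
    a = logsig(n)                      % scalar neuron output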
Network Architectures
Commonly one neuron, even with many inputs, may not be
sufficient. We might need five or ten, operating in parallel, in
what we will call a “layer.” This concept of a layer is discussed
below.
A Layer of Neurons
A single-layer network of S neurons is shown in Figure 2.7. Note that each of the R inputs is connected to each of the neurons and that the weight matrix now has S rows.
(Figure 2.7: layer of S neurons. Each input p1 through pR connects to every neuron; the weights run from w1,1 to wS,R, with biases b1 through bS, net inputs n1 through nS and outputs a1 through aS; a = f(Wp + b).)
Each row of the weight matrix corresponds to one of the S neurons in the layer, and each column to one of the R inputs:

W = [ w1,1  w1,2  ...  w1,R
      w2,1  w2,2  ...  w2,R
       ...   ...        ...
      wS,1  wS,2  ...  wS,R ]    (2.6)
(Layer of S neurons, abbreviated notation: p is R x 1, W is S x R, and b, n and a are S x 1; a = f(Wp + b).)
Here again, the symbols below the variables tell you that for
this layer, p is a vector of length R, W is an S × R matrix, and a and b are vectors of length S. As defined previously, the layer
includes the weight matrix, the summation and multiplication
operations, the bias vector b , the transfer function boxes and
the output vector.
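As a sketch of these dimensions (all values assumed only for illustration), a layer with R = 3 inputs and S = 2 neurons can be computed in a single line:

    % Layer of S neurons: a = f(Wp + b); W is S x R, p is R x 1, b and a are S x 1
    satlin = @(n) min(max(n, 0), 1);   % example transfer function from Table 2.1
    W = [ 0.5  -1.0   2.0 ;            % S x R = 2 x 3 weight matrix (assumed)
          1.0   0.3  -0.7 ];
    p = [1; 2; 3];                     % R x 1 input vector (assumed)
    b = [0.1; -0.2];                   % S x 1 bias vector  (assumed)
    a = satlin(W*p + b)                % S x 1 output vector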
We append the number of the layer as a superscript to the names for each of these variables. Thus, the weight matrix for the first layer is written as W1, and the weight matrix for the second layer is written as W2. This notation is used in the three-layer network shown in Figure 2.9.
(Figure 2.9: three-layer network. The R inputs feed the first layer of S1 neurons; the first-layer outputs a1 feed the second layer, and the second-layer outputs a2 feed the third layer. The layers have weight matrices W1, W2, W3, bias vectors b1, b2, b3, transfer functions f1, f2, f3 and outputs a1, a2, a3.)
(Three-layer network, abbreviated notation: the input p is R x 1; W1 is S1 x R with b1, n1 and a1 of size S1 x 1 and transfer function f1; W2 is S2 x S1 with b2, n2 and a2 of size S2 x 1 and f2; W3 is S3 x S2 with b3, n3 and a3 of size S3 x 1 and f3.)
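A minimal sketch of the layer-superscript notation in use: the three-layer forward pass a3 = f3(W3 f2(W2 f1(W1 p + b1) + b2) + b3). All sizes, transfer functions and (random) values below are assumed only to show the structure.

    % Three-layer network, abbreviated notation: each layer output feeds the next
    logsig  = @(n) 1 ./ (1 + exp(-n));
    purelin = @(n) n;
    R = 4; S1 = 3; S2 = 2; S3 = 1;           % layer sizes (assumed)
    W1 = randn(S1, R);  b1 = randn(S1, 1);   % first-layer weights and biases
    W2 = randn(S2, S1); b2 = randn(S2, 1);   % second layer
    W3 = randn(S3, S2); b3 = randn(S3, 1);   % third (output) layer
    p  = randn(R, 1);                        % network input

    a1 = logsig(W1*p  + b1);                 % a1 = f1(W1 p + b1)
    a2 = logsig(W2*a1 + b2);                 % a2 = f2(W2 a1 + b2)
    a3 = purelin(W3*a2 + b3)                 % a3 = f3(W3 a2 + b3), the network output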
It might seem that networks with biases would be more powerful than those without, and that is true. Note, for instance, that a neuron without a bias will always have a net input n of zero when the network inputs p are zero. This may not be desirable and can be avoided by the use of a bias. The effect of the bias is discussed more fully in Chapters 3, 4 and 5.
In later chapters we will omit a bias in some examples or demonstrations. In some cases this is done simply to reduce the number of network parameters. With just two variables, we can plot system convergence in a two-dimensional plane. Three or more variables are difficult to display.

Recurrent Networks
Before we discuss recurrent networks, we need to introduce some simple building blocks. The first is the delay block, which is illustrated in Figure 2.11.
(Delay block: input u(t), output a(t), initial condition a(0); a(t) = u(t - 1).)
Figure 2.11 Delay Block
The delay output a(t) is computed from its input u(t) according to

a(t) = u(t - 1) .    (2.7)
Thus the output is the input delayed by one time step. (This
assumes that time is updated in discrete steps and takes on only
integer values.) Eq. (2.7) requires that the output be initialized
at time t = 0 . This initial condition is indicated in Figure 2.11 by
the arrow coming into the bottom of the delay block.
Another related building block, which we will use for the continuous-time recurrent networks in Chapters 15-18, is the integrator, which is shown in Figure 2.12.
(Figure 2.12: integrator block with input u(t), output a(t) = ∫₀ᵗ u(τ) dτ + a(0), and initial condition a(0).)
In a recurrent network, the layer output is fed back to the layer input through the delay block introduced above. The input vector p supplies the initial condition for the delay, so that

a(0) = p ,    a(t + 1) = satlins(Wa(t) + b) .

(Figure: recurrent layer of S neurons with weight matrix W, bias vector b and a delay D in the feedback path.)
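A rough sketch of iterating this recurrence for a few time steps, with satlins from Table 2.1 defined inline; the weights, bias and initial condition are assumed only for illustration.

    % Recurrent layer: a(0) = p, a(t+1) = satlins(W a(t) + b)
    satlins = @(n) min(max(n, -1), 1);
    W = [0.5 0.2; -0.3 0.4];          % weight matrix (assumed values)
    b = [0.1; -0.1];                  % bias vector   (assumed values)
    p = [1; -1];                      % initial condition a(0) = p
    a = p;
    for t = 1:3                       % iterate the delayed feedback three steps
        a = satlins(W*a + b);         % a(t) becomes a(t+1)
    end
    a                                 % the layer output after three time steps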
Summary of Results
Single-Input Neuron
(Single-input neuron: input p, weight w, bias b, net input n, transfer function f; a = f(wp + b).)
Multiple-Input Neuron
(Multiple-input neuron: inputs p1 through pR, weights w1,1 through w1,R, bias b, transfer function f; a = f(Wp + b).)
(Multiple-input neuron, abbreviated notation: p is R x 1, W is 1 x R, and b, n and a are scalars; a = f(Wp + b).)
Transfer Functions
Name                          Input/Output Relation                   MATLAB Function
Hard Limit                    a = 0,  n < 0                           hardlim
                              a = 1,  n ≥ 0
Symmetrical Hard Limit        a = -1, n < 0                           hardlims
                              a = +1, n ≥ 0
Linear                        a = n                                   purelin
Saturating Linear             a = 0,  n < 0                           satlin
                              a = n,  0 ≤ n ≤ 1
                              a = 1,  n > 1
Symmetric Saturating Linear   a = -1, n < -1                          satlins
                              a = n,  -1 ≤ n ≤ 1
                              a = 1,  n > 1
Log-Sigmoid                   a = 1 / (1 + e^(-n))                    logsig
Hyperbolic Tangent Sigmoid    a = (e^n - e^(-n)) / (e^n + e^(-n))     tansig
Positive Linear               a = 0,  n < 0                           poslin
                              a = n,  0 ≤ n                           
Competitive                   a = 1,  neuron with max n               compet
                              a = 0,  all other neurons
Layer of Neurons
(Layer of S neurons, abbreviated notation: p is R x 1, W is S x R, and b, n and a are S x 1; a = f(Wp + b).)
Delay
(Delay block: input u(t), initial condition a(0); a(t) = u(t - 1).)
Integrator
(Integrator block: input u(t), initial condition a(0); a(t) = ∫₀ᵗ u(τ) dτ + a(0).)
Recurrent Network
(Recurrent layer: the input p sets the initial condition, and the layer output a(t + 1) is fed back through a delay to form a(t).)

a(0) = p ,    a(t + 1) = satlins(Wa(t) + b)
Solved Problems
P2.1 The input to a single-input neuron is 2.0, its weight is 2.3 and its
bias is -3.
i. What is the net input to the transfer function?
ii. What is the neuron output?
i. The net input is given by:
n = wp + b = (2.3)(2) + (-3) = 1.6
ii. The neuron output cannot be determined, because the transfer function has not been specified.
P2.2 What is the output of the neuron of P2.1 if it has the following
transfer functions?
i. Hard limit
ii. Linear
iii. Log-sigmoid
i. For the hard limit transfer function:
a = hardlim(1.6) = 1.0
ii. For the linear transfer function:
a = purelin(1.6) = 1.6
iii. For the log-sigmoid transfer function:
a = logsig(1.6) = 1 / (1 + e^(-1.6)) = 0.8320
Verify this result using MATLAB and the function logsig, which is in the MININNET directory (see Appendix B).
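As a sketch of that verification (the transfer functions are defined inline here, so the commands run even without the demonstration directory installed):

    % Net input and outputs for the single-input neuron of P2.1 and P2.2
    w = 2.3;  p = 2.0;  b = -3;
    n = w*p + b                          % net input, n = 1.6
    hardlim = @(n) double(n >= 0);
    purelin = @(n) n;
    logsig  = @(n) 1 ./ (1 + exp(-n));
    [hardlim(n)  purelin(n)  logsig(n)]  % 1.0000  1.6000  0.8320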
For a two-input neuron with weight matrix W = [3 2], input vector p = [-5 6]^T and bias b = 1.2, the net input is

n = Wp + b = [3 2][-5 6]^T + 1.2 = -1.8 .
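A one-line MATLAB check of this net input, using the same values taken from the equation above:

    % Net input for the two-input neuron: n = Wp + b
    W = [3 2];  p = [-5; 6];  b = 1.2;
    n = W*p + b                          % returns -1.8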
Epilogue
This chapter has introduced a simple artificial neuron and has illustrated how different neural networks can be created by connecting groups of neurons in various ways. One of the main objectives of this chapter has been to introduce our basic notation. As the networks are discussed in more detail in later chapters, you may wish to return to Chapter 2 to refresh your memory of the appropriate notation.
This chapter was not meant to be a complete presentation of the
networks we have discussed here. That will be done in the
chapters that follow. We will begin in Chapter 3, which will
present a simple example that uses some of the networks
described in this chapter, and will give you an opportunity to
see these networks in action. The networks demonstrated in
Chapter 3 are representative of the types of networks that are
covered in the remainder of this text.
Exercises
E2.1 The input to a single-input neuron is 2.0, its weight is 1.3 and its bias is 3.0. What possible kinds of transfer function, from Table 2.1, could this neuron have, if its output is:
i. 1.6
ii. 1.0
iii. 0.9963
iv. -1.0
E2.2 Consider a single-input neuron with a bias. We would like the output to be -1 for inputs less than 3 and +1 for inputs greater than or equal to 3.
i. What kind of a transfer function is required?
ii. What bias would you suggest? Is your bias in any way related to the input weight? If yes, how?
iii. Summarize your network by naming the transfer function and stating the bias and the weight. Draw a diagram of the network. Verify the network performance using MATLAB.
E2.3 Given a two-input neuron with the following weight matrix and input vector: W = [3 2] and p = [-5 7]^T, we would like to have an output of 0.5. Do you suppose that there is a combination of bias and transfer function that might allow this?
i. Is there a transfer function from Table 2.1 that will do the
job if the bias is zero?
ii. Is there a bias that will do the job if the linear transfer
function is used? If yes, what is it?
iii. Is there a bias that will do the job if a log-sigmoid transfer
function is used? Again, if yes, what is it?
iv. Is there a bias that will do the job if a symmetrical hard limit transfer function is used? Again, if yes, what is it?