
3 An Illustrative Example

Theory and Examples

Problem Statement
A produce dealer has a warehouse that stores a variety of fruits and vegetables. When fruit is brought to the warehouse, various types of fruit may be mixed together. The dealer wants a machine that will sort the fruit according to type. There is a conveyor belt on which the fruit is loaded. This conveyor passes through a set of sensors, which measure three properties of the fruit: shape, texture and weight. These sensors are somewhat primitive. The shape sensor will output a 1 if the fruit is approximately round and a -1 if it is more elliptical. The texture sensor will output a 1 if the surface of the fruit is smooth and a -1 if it is rough. The weight sensor will output a 1 if the fruit is more than one pound and a -1 if it is less than one pound.

The three sensor outputs will then be input to a neural network. The purpose of the network is to decide which kind of fruit is on the conveyor, so that the fruit can be directed to the correct storage bin. To make the problem even simpler, let's assume that there are only two kinds of fruit on the conveyor: apples and oranges.

[Figure: fruit on the conveyor passes through the sensors; the sensor outputs feed a neural network, which drives a sorter that directs apples and oranges to separate bins]

As each fruit passes through the sensors it can be represented by a three-dimensional vector. The first element of the vector will represent shape, the second element will represent texture and the third element will represent weight:


$$p = \begin{bmatrix} \text{shape} \\ \text{texture} \\ \text{weight} \end{bmatrix}. \qquad (3.1)$$

Therefore, a prototype orange would be represented by

$$p_1 = \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix}, \qquad (3.2)$$

and a prototype apple would be represented by

$$p_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}. \qquad (3.3)$$

The neural network will receive one three-dimensional input vector for each fruit on the conveyor and must make a decision as to whether the fruit is an orange ($p_1$) or an apple ($p_2$).

Now that we have defined this simple (trivial?) pattern recognition problem, let's look briefly at three different neural networks that could be used to solve it. The simplicity of our problem will facilitate our understanding of the operation of the networks.

Perceptron
The first network we will discuss is the perceptron. Figure 3.1 illustrates a single-layer perceptron with a symmetric hard limit transfer function hardlims.

[Figure: a single-layer perceptron; the R×1 input vector p is multiplied by the S×R weight matrix W and added to the S×1 bias b to give the net input n, and the output is a = hardlims(Wp + b)]

Figure 3.1 Single-Layer Perceptron


Two-Input Case
Before we use the perceptron to solve the orange and apple recognition problem (which will require a three-input perceptron, i.e., R = 3), it is useful to investigate the capabilities of a two-input/single-neuron perceptron (R = 2), which can be easily analyzed graphically. The two-input perceptron is shown in Figure 3.2.

[Figure: a single neuron with inputs p1 and p2, weights w1,1 and w1,2, bias b, net input n, and output a = hardlims(Wp + b)]

Figure 3.2 Two-Input/Single-Neuron Perceptron


Single-neuron perceptrons can classify input vectors into two categories. For example, for a two-input perceptron, if $w_{1,1} = -1$ and $w_{1,2} = 1$ then

$$a = \text{hardlims}(n) = \text{hardlims}\left(\begin{bmatrix} -1 & 1 \end{bmatrix} p + b\right). \qquad (3.4)$$

Therefore, if the inner product of the weight matrix (a single row vector in this case) with the input vector is greater than or equal to $-b$, the output will be 1. If the inner product of the weight vector and the input is less than $-b$, the output will be $-1$. This divides the input space into two parts. Figure 3.3 illustrates this for the case where $b = -1$. The blue line in the figure represents all points for which the net input n is equal to 0:

$$n = \begin{bmatrix} -1 & 1 \end{bmatrix} p - 1 = 0. \qquad (3.5)$$

Notice that this decision boundary will always be orthogonal to the weight matrix, and the position of the boundary can be shifted by changing b. (In the general case, W is a matrix consisting of a number of row vectors, each of which will be used in an equation like Eq. (3.5). There will be one boundary for each row of W. See Chapter 4 for more on this topic.) The shaded region contains all input vectors for which the output of the network will be 1. The output will be $-1$ for all other input vectors.
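
To make the boundary concrete, here is a minimal sketch (Python/NumPy; the code and names are ours, not the book's) of the two-input perceptron with the weights and bias used in Figure 3.3:

```python
import numpy as np

def hardlims(n):
    """Symmetric hard limit: +1 if n >= 0, otherwise -1."""
    return 1 if n >= 0 else -1

W = np.array([-1.0, 1.0])   # single-row weight matrix: w11 = -1, w12 = 1
b = -1.0                    # bias used in Figure 3.3

# Test one point on each side of the boundary n = [-1 1]p - 1 = 0
for p in (np.array([-1.0, 1.0]), np.array([1.0, -1.0])):
    n = W @ p + b
    print(p, '->', hardlims(n))   # +1 in the shaded region, -1 elsewhere
```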


[Figure: the decision boundary in the (p1, p2) plane for W = [-1 1], b = -1; the weight vector W is orthogonal to the boundary and points into the shaded region where n > 0, with n < 0 on the other side]

Figure 3.3 Perceptron Decision Boundary


The key property of the single-neuron perceptron, therefore, is that it can
separate input vectors into two categories. The decision boundary between
the categories is determined by the equation

$$Wp + b = 0. \qquad (3.6)$$

Because the boundary must be linear, the single-layer perceptron can only
be used to recognize patterns that are linearly separable (can be separated
by a linear boundary). These concepts will be discussed in more detail in
Chapter 4.

Pattern Recognition Example


Now consider the apple and orange pattern recognition problem. Because there are only two categories, we can use a single-neuron perceptron. The vector inputs are three-dimensional (R = 3), therefore the perceptron equation will be

$$a = \text{hardlims}\left(\begin{bmatrix} w_{1,1} & w_{1,2} & w_{1,3} \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} + b\right). \qquad (3.7)$$

We want to choose the bias b and the elements of the weight matrix so that the perceptron will be able to distinguish between apples and oranges. For example, we may want the output of the perceptron to be 1 when an apple is input and $-1$ when an orange is input. Using the concept illustrated in Figure 3.3, let's find a linear boundary that can separate oranges and apples. The two prototype vectors (recall Eq. (3.2) and Eq. (3.3)) are shown in Figure 3.4. From this figure we can see that the linear boundary that divides these two vectors symmetrically is the $p_1, p_3$ plane.

[Figure: the prototype vectors p1 (orange) and p2 (apple) plotted in (p1, p2, p3) space]

Figure 3.4 Prototype Vectors


The p 1 p 3 plane, which will be our decision boundary, can be described by
the equation

p2 = 0 , (3.8)

or

$$\begin{bmatrix} 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} + 0 = 0. \qquad (3.9)$$

Therefore the weight matrix and bias will be

$$W = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}, \quad b = 0. \qquad (3.10)$$

The weight matrix is orthogonal to the decision boundary and points toward the region that contains the prototype pattern $p_2$ (apple), for which we want the perceptron to produce an output of 1. The bias is 0 because the decision boundary passes through the origin.

Now let's test the operation of our perceptron pattern classifier. It classifies perfect apples and oranges correctly, since


Orange:

$$a = \text{hardlims}\left(\begin{bmatrix} 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix} + 0\right) = -1 \quad (\text{orange}), \qquad (3.11)$$

Apple:

$$a = \text{hardlims}\left(\begin{bmatrix} 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} + 0\right) = 1 \quad (\text{apple}). \qquad (3.12)$$

But what happens if we put a not-so-perfect orange into the classifier? Let’s
say that an orange with an elliptical shape is passed through the sensors.
The input vector would then be

$$p = \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix}. \qquad (3.13)$$

The response of the network would be

 –1 
 
a = hardlims  0 1 0 – 1 + 0 = – 1  orange  . (3.14)
 
 –1 

In fact, any input vector that is closer to the orange prototype vector than
to the apple prototype vector (in Euclidean distance) will be classified as an
orange (and vice versa).
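
As a quick check of Eq. (3.11) through Eq. (3.14), here is a minimal sketch of the classifier (Python/NumPy; the function and variable names are illustrative, not from the book):

```python
import numpy as np

def hardlims(n):
    """Symmetric hard limit: +1 if n >= 0, otherwise -1."""
    return np.where(n >= 0, 1, -1)

W = np.array([[0, 1, 0]])          # weight matrix from Eq. (3.10)
b = 0                              # bias from Eq. (3.10)

p_orange = np.array([1, -1, -1])   # prototype orange, Eq. (3.2)
p_apple  = np.array([1, 1, -1])    # prototype apple, Eq. (3.3)
p_odd    = np.array([-1, -1, -1])  # elliptical orange, Eq. (3.13)

for p in (p_orange, p_apple, p_odd):
    a = hardlims(W @ p + b)        # a = hardlims(Wp + b)
    print(p, '->', 'apple' if a[0] == 1 else 'orange')
```
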
To experiment with the perceptron network and the apple/orange classification problem, use the Neural Network Design Demonstration Perceptron Classification (nnd3pc).
This example has demonstrated some of the features of the perceptron network, but by no means have we exhausted our investigation of perceptrons. This network, and variations on it, will be examined in Chapters 4 through 13. Let's consider some of these future topics.

In the apple/orange example we were able to design a network graphically, by choosing a decision boundary that clearly separated the patterns. What about practical problems, with high-dimensional input spaces? In Chapters 4, 7, 10 and 11 we will introduce learning algorithms that can be used to

Neural Networks:
Learning Process
Prof. Sven Lončarić

[email protected]
http://www.fer.hr/ipg

Overview of topics
•  Introduction
•  Error-correction learning
•  Hebb learning
•  Competitive learning
•  Credit-assignment problem
•  Supervised learning
•  Reinforcement learning
•  Unsupervised learning

Introduction
•  One of the most important features of an ANN is its ability to learn from the environment
•  An ANN learns through an iterative process of adapting its synaptic weights and thresholds
•  After each iteration the ANN should have more knowledge about its environment

Definition of learning
•  Definition of learning in the ANN context:
   –  Learning is a process in which the unknown ANN parameters are adapted through a continuous process of stimulation by the environment
   –  The type of learning is determined by the way in which the parameter changes take place
•  This definition implies the following sequence of events:
   –  The environment stimulates the ANN
   –  The ANN changes as a result of the stimulation
   –  The ANN responds differently to the environment because of the change

Notation
•  vj and vk are the activations of neurons j and k
•  xj and xk are the outputs of neurons j and k
•  wkj(n) is the synaptic weight from neuron j to neuron k at time n

[Figure: neuron j, with activation vj and output xj = ϕ(vj), feeds neuron k, with activation vk and output xk = ϕ(vk), through the synaptic weight wkj]

Notation
•  If in step n the synaptic weight wkj(n) is changed by Δwkj(n), we get the new weight:

   wkj(n+1) = wkj(n) + Δwkj(n)

   where wkj(n) and wkj(n+1) are the old and new weights between neurons k and j
•  A set of rules that solves the learning problem is called a learning algorithm
•  There is no unique learning algorithm; there are many different learning algorithms, each with its own advantages and drawbacks

Algorithms and learning paradigms
•  Learning algorithms determine how the weight correction Δwkj(n) is computed
•  Learning paradigms determine the relation of the ANN to its environment
•  The three basic learning paradigms are:
   –  Supervised learning
   –  Reinforcement learning
   –  Unsupervised learning

Basic learning approaches
•  According to the learning algorithm:
   –  Error-correction learning
   –  Hebb learning
   –  Competitive learning
   –  Boltzmann learning
   –  Thorndike learning
•  According to the learning paradigm:
   –  Supervised learning
   –  Reinforcement learning
   –  Unsupervised learning

Error-correction learning
•  Belongs to the supervised learning paradigm
•  Let dk(n) be the desired output of neuron k at time n
•  Let yk(n) be the obtained output of neuron k at time n
•  The output yk(n) is computed from the input vector x(n)
•  The input vector x(n) and the desired output dk(n) together form an example that is presented to the ANN at time n
•  The error is the difference between the desired and the obtained output of neuron k at time n:

   ek(n) = dk(n) − yk(n)

Error-correction learning
•  The goal of error-correction learning is to minimize an error function derived from the errors ek(n), so that the obtained output of every neuron approximates the desired output in some statistical sense
•  A frequently used error function is the mean square error:

   J = E[ ½ Σk ek²(n) ]

   where E[·] is the statistical expectation operator and the summation runs over all neurons in the output layer

Error function
•  The problem with minimizing the error function J is that it requires knowledge of the statistical properties of the random processes ek(n)
•  For this reason, an estimate of the error function at step n is used as the optimization criterion:

   E(n) = ½ Σk ek²(n)

•  This approach yields an approximate solution

Delta learning rule
•  Minimizing the error function with respect to the weights wkj(n) gives the delta learning rule:

   Δwkj(n) = η ek(n) xj(n)

   where η is a positive constant determining the learning rate
•  The weight change is proportional to the error and to the value at the corresponding input
•  The learning rate η must be chosen carefully:
   –  a small η gives stability, but learning is slow
   –  a large η speeds up learning, but risks instability

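As an illustration, a minimal sketch of one delta-rule update for a single linear neuron (Python/NumPy; the data values are made up for the example):

```python
import numpy as np

eta = 0.1                        # learning rate
w = np.zeros(3)                  # weights w_kj of neuron k
x = np.array([1.0, -1.0, 0.5])   # input vector x(n)
d = 1.0                          # desired output d_k(n)

y = w @ x            # obtained output y_k(n), assuming a linear neuron
e = d - y            # error e_k(n) = d_k(n) - y_k(n)
w += eta * e * x     # delta rule: w_kj(n+1) = w_kj(n) + eta * e_k(n) * x_j(n)

E = 0.5 * e**2       # single-neuron error estimate E(n)
print(w, E)
```
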
Error surface
•  If we plot the error value J as a function of the synaptic weights, we obtain a multidimensional error surface
•  The learning problem consists of finding the point on the error surface with the smallest error (i.e., minimizing the error)

Error surface
•  Depending on the type of neurons there are two possibilities:
   –  the ANN consists of linear neurons: the error surface is a quadratic function with a single global minimum
   –  the ANN consists of nonlinear neurons: the error surface has one or more global minima and multiple local minima
•  Learning starts from an arbitrary point on the error surface and proceeds by minimization:
   –  in the first case it converges to the global minimum
   –  in the second case it can also converge to a local minimum

Hebb learning
•  Hebb's principle of learning states (Hebb, The Organization of Behavior, 1949):
   –  When the axon of neuron A is close enough to activate neuron B and repeatedly does so, metabolic changes take place such that the efficiency of neuron A in activating neuron B is increased
•  An extension of this principle (Stent, 1973):
   –  If one neuron does not influence (stimulate) another neuron, the synapse between them becomes weaker or is eliminated completely

Activity product rule
•  According to Hebb's principle, the weights are changed as follows:

   Δwkj(n) = F(yk(n), xj(n))

   where yk(n) and xj(n) are the output and the j-th input of the k-th neuron
•  A special case of this principle is:

   Δwkj(n) = η yk(n) xj(n)

   where the constant η determines the learning rate
•  This rule is called the activity product rule

Activity product rule
•  The weight update is proportional to the input value:

   Δwkj(n) = η yk(n) xj(n)

•  Problem: iterative updates with the same input and output cause a continuous increase of the weight wkj

[Figure: Δwkj as a function of xj, a straight line through the origin with slope η·yk]

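A minimal sketch of the saturation problem (Python/NumPy; the values are illustrative, and a linear neuron is assumed so that the output can be computed):

```python
import numpy as np

eta = 0.1
w = np.array([0.5, 0.5])          # initial weights
x = np.array([1.0, 1.0])          # the same input presented repeatedly

for n in range(5):
    y = w @ x                     # output y_k(n) of a linear neuron
    w = w + eta * y * x           # activity product rule
    print(n, w)                   # the weights grow at every step
```
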
Generalized activity product rule
•  To overcome the problem of weight saturation, modifications have been proposed that limit the growth of the weight wkj
•  Non-linear limiting factor (Kohonen, 1988):

   Δwkj(n) = η yk(n) xj(n) − α yk(n) wkj(n)

   where α is a positive constant
•  This expression can be rewritten as:

   Δwkj(n) = α yk(n) [c xj(n) − wkj(n)]

   where c = η/α

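The same loop with Kohonen's limiting term no longer grows without bound (again a sketch under the same linear-neuron assumption); the weights settle at c·x:

```python
import numpy as np

eta, alpha = 0.1, 1.0             # so c = eta/alpha = 0.1
w = np.array([0.5, 0.5])
x = np.array([1.0, 1.0])

for n in range(20):
    y = w @ x
    w = w + eta * y * x - alpha * y * w   # generalized activity product rule
print(w)                          # converges to c * x = [0.1, 0.1]
```
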
Generalized activity product rule
•  In the generalized Hebb rule, all inputs with xj(n) < wkj(n)/c result in a reduction of the weight wkj
•  Inputs with xj(n) > wkj(n)/c increase the weight wkj

[Figure: Δwkj as a function of xj, a straight line with slope η·yk that crosses zero at xj = wkj/c and has intercept −α·yk·wkj]

Competitive learning
•  Unsupervised learning
•  Neurons compete for the opportunity to become active
•  Only one neuron can be active at any time
•  Useful for classification problems
•  Three elements of competitive learning:
   –  a set of neurons with randomly selected weights, so that they respond differently to a given input
   –  a limit on the weight of each neuron
   –  a competition mechanism, so that only one neuron is active at any single time (the winner-takes-all neuron)

Competitive learning
•  An example network with a single layer of neurons

[Figure: four inputs x1 … x4 in the input layer, fully connected to the neurons of the output layer]

Competitive learning
l  In order to win, activity vj of neuron x must be the
largest of all neurons
l  Output yj of the winning neuron j is equal to 1; 

for all other neurons the output is 0
l  The learning rule is defined as:

⎪ η ( xi − w ji ) if neuron j won
Δw ji = ⎨
⎪⎩ 0 if neuron j lost
l  The learning rule has effect of shifting the weight
vector wj towards the vector x

22
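A minimal winner-takes-all sketch (Python/NumPy; the network size and input are illustrative):

```python
import numpy as np

def competitive_step(W, x, eta=0.1):
    """One competitive-learning step: only the winner moves toward x."""
    v = W @ x                      # activities v_j of all neurons
    j = int(np.argmax(v))          # winner: the neuron with the largest activity
    W[j] += eta * (x - W[j])       # dw_ji = eta * (x_i - w_ji), winner only
    return j

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))        # 3 neurons, 4 inputs, random initial weights
x = np.array([1.0, 0.0, 0.0, 1.0])
print(competitive_step(W, x))      # index of the winning neuron
```
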
An example of competitive learning
•  Assume that each input vector has norm equal to one, so that it can be represented as a point on the N-dimensional unit sphere
•  Assume that the weight vectors also have norm equal to one, so that they too can be represented as points on the N-dimensional unit sphere
•  During training, input vectors are presented to the network and the winning neuron's weights are updated

An example of competitive learning
•  The learning process can be represented as movement of the weight vectors along the unit sphere

[Figure: input vectors and weight vectors on the unit sphere, shown in the initial state and in the final state]

Credit-assignment problem
•  The credit-assignment problem is an important issue in learning algorithms
•  It is the problem of assigning credit or blame for the overall learning outcome to the many internal decisions of the learning system

Supervised learning
•  Supervised learning is characterized by the presence of a teacher

[Figure: the environment feeds both the teacher and the ANN; the teacher's desired output is compared with the ANN's obtained output to form the error signal]

Supervised learning
•  The teacher has knowledge in the form of input-output pairs used for training
•  The error is the difference between the desired and the obtained output for a given input vector
•  The ANN parameters change under the influence of the input vectors and the error values
•  The learning process is repeated until the ANN learns to imitate the teacher
•  Once learning is completed, the teacher is no longer required and the ANN can work without supervision

Supervised learning
•  The error function can be the mean square error; it depends on the free parameters (the weights)
•  The error function can be represented as a multidimensional error surface
•  Any ANN configuration is defined by its weights and corresponds to a point on the error surface
•  The learning process can be viewed as movement of this point down the error surface towards the global minimum

Supervised learning
•  A point on the error surface moves towards the minimum based on the gradient
•  The gradient at any point on the surface is a vector pointing in the direction of steepest ascent, so the point moves against the gradient (steepest descent)

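A minimal gradient-descent sketch (Python/NumPy; the quadratic error surface here is an assumed toy example, not from the lecture):

```python
import numpy as np

w_star = np.array([1.0, -2.0])       # assumed location of the global minimum

def grad_E(w):
    """Gradient of the toy error surface E(w) = 0.5 * |w - w_star|^2."""
    return w - w_star

eta = 0.3
w = np.array([5.0, 5.0])             # arbitrary starting point
for _ in range(50):
    w -= eta * grad_E(w)             # move against the gradient (steepest descent)
print(w)                             # close to w_star
```
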
Supervised learning
•  Examples of supervised learning algorithms:
   –  the LMS (least-mean-square) algorithm
   –  the BP (back-propagation) algorithm
•  A disadvantage of supervised learning is that learning is not possible without a teacher: the ANN can only learn from the provided examples

Supervised learning
•  Supervised learning can be implemented to work offline or online
•  In offline learning:
   –  the ANN learns first
   –  once learning is completed, the ANN does not change any more
•  In online learning:
   –  the ANN learns during the exploitation phase
   –  learning is performed in real time; the ANN is dynamic

Reinforcement learning
•  Reinforcement learning has an online character
•  The input-output mapping is learned through an iterative process in which a measure of learning quality is maximized
•  Reinforcement learning overcomes the problem of supervised learning, where training examples are required

Reinforcement learning
•  In reinforcement learning, the teacher does not present input-output training examples, but only gives a grade representing a measure of learning quality
•  The grade is a scalar value (a number)
•  The error function is unknown in reinforcement learning
•  The learning algorithm must determine the direction of motion in the learning space through a trial-and-error approach

Thorndike law of effect
•  The reinforcement learning principle:
   –  if the actions of the learning system result in a positive grade, the likelihood that the system will take similar actions in the future increases
   –  otherwise, the likelihood of taking such actions is reduced

Unsupervised learning
•  In unsupervised learning there is no teacher assisting the learning process
•  Competitive learning is an example of unsupervised learning

Unsupervised learning
•  The neurons of a layer compete for a chance to learn (to modify their weights based on the input vector)
•  In the simplest approach, the winner-takes-all strategy is used

Comparison of supervised and unsupervised learning
•  The most popular algorithm for supervised learning is the error back-propagation algorithm
•  A disadvantage of this algorithm is poor scaling: learning complexity grows exponentially with the number of layers

Problems
•  Problem 2.1
   –  The delta rule and the Hebb rule are two different learning algorithms. Describe the differences between these rules.
•  Problem 2.5
   –  An input of value 1 is connected to a synapse whose weight has an initial value equal to 1. Calculate the weight update using:
      •  the basic Hebb rule with learning-rate parameter η = 0.1
      •  the modified Hebb rule with η = 0.1 and c = 0.1

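A worked sketch of Problem 2.5 (Python; we assume a linear neuron, y = w·x, since the problem does not state the activation, and we use α = η/c for the modified rule):

```python
eta, c = 0.1, 0.1
alpha = eta / c           # from c = eta/alpha, so alpha = 1.0
w, x = 1.0, 1.0           # initial weight and input
y = w * x                 # assumed linear neuron: y = 1

dw_basic = eta * y * x                     # basic Hebb rule: 0.1
dw_mod = eta * y * x - alpha * y * w       # modified rule: 0.1 - 1.0 = -0.9
print(dw_basic, dw_mod)
```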
