Theory and Examples
Problem Statement
A produce dealer has a warehouse that stores a variety of fruits and vegetables. When fruit is brought to the warehouse, various types of fruit may be mixed together. The dealer wants a machine that will sort the fruit according to type. There is a conveyor belt on which the fruit is loaded. This conveyor passes through a set of sensors, which measure three properties of the fruit: shape, texture and weight. These sensors are somewhat primitive. The shape sensor will output a 1 if the fruit is approximately round and a –1 if it is more elliptical. The texture sensor will output a 1 if the surface of the fruit is smooth and a –1 if it is rough. The weight sensor will output a 1 if the fruit is more than one pound and a –1 if it is less than one pound.

The three sensor outputs will then be input to a neural network. The purpose of the network is to decide which kind of fruit is on the conveyor, so that the fruit can be directed to the correct storage bin. To make the problem even simpler, let's assume that there are only two kinds of fruit on the conveyor: apples and oranges.
[Figure: the sorting system. Fruit on the conveyor passes through the sensors; the sensor outputs feed a neural network, which drives a sorter that directs each fruit to the apple or orange bin.]
Each fruit passing through the sensors can thus be represented by a three-dimensional vector, whose elements are the shape, texture and weight readings:

$$\mathbf{p} = \begin{bmatrix} \text{shape} \\ \text{texture} \\ \text{weight} \end{bmatrix}. \qquad (3.1)$$

A prototype orange would then be represented by

$$\mathbf{p}_1 = \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix}, \qquad (3.2)$$

and a prototype apple by

$$\mathbf{p}_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}. \qquad (3.3)$$
The neural network will receive one three-dimensional input vector for each fruit on the conveyor and must make a decision as to whether the fruit is an orange (p1) or an apple (p2).
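For concreteness, the sensor encoding and the two prototypes can be written down directly. The following minimal Python/NumPy sketch (our own illustration, not part of the original text; the variable names are ours) simply encodes them as vectors:

```python
import numpy as np

# Sensor encoding: shape (+1 round / -1 elliptical),
# texture (+1 smooth / -1 rough), weight (+1 over 1 lb / -1 under 1 lb).
p1_orange = np.array([1, -1, -1])  # prototype orange, Eq. (3.2)
p2_apple  = np.array([1,  1, -1])  # prototype apple,  Eq. (3.3)
```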
Now that we have defined this simple (trivial?) pattern recognition problem, let's look briefly at three different neural networks that could be used to solve it. The simplicity of our problem will facilitate our understanding of the operation of the networks.
Perceptron
The first network we will discuss is the perceptron. Figure 3.1 illustrates a single-layer perceptron with a symmetric hard limit transfer function, hardlims.
Figure 3.1: Single-Layer Perceptron. [Diagram: input vector p (R×1) multiplied by weight matrix W (S×R); bias b (S×1) added to give net input n (S×1); output a = hardlims(Wp + b), with R inputs and S neurons.]
Two-Input Case
Before we use the perceptron to solve the orange and apple recognition problem (which will require a three-input perceptron, i.e., R = 3), it is useful to investigate the capabilities of a two-input/single-neuron perceptron (R = 2), which can be easily analyzed graphically. The two-input perceptron is shown in Figure 3.2.
Figure 3.2: Two-Input/Single-Neuron Perceptron. [Diagram: inputs p1 and p2 with weights w1,1 and w1,2, bias b, summer Σ producing net input n, and output a = hardlims(Wp + b).]

The output of this network is

$$a = \text{hardlims}(n) = \text{hardlims}\left(\begin{bmatrix} w_{1,1} & w_{1,2} \end{bmatrix}\mathbf{p} + b\right). \qquad (3.4)$$
Therefore, if the inner product of the weight matrix (a single row vector in this case) with the input vector is greater than or equal to –b, the output will be 1. If the inner product of the weight vector and the input is less than –b, the output will be –1. This divides the input space into two parts. Figure 3.3 illustrates this for the case where b = –1. The blue line in the figure represents all points for which the net input n is equal to 0:
$$n = \begin{bmatrix} -1 & 1 \end{bmatrix}\mathbf{p} - 1 = 0. \qquad (3.5)$$
Notice that this decision boundary will always be orthogonal to the weight matrix, and the position of the boundary can be shifted by changing b. (In the general case, W is a matrix consisting of a number of row vectors, each of which will be used in an equation like Eq. (3.5). There will be one boundary for each row of W. See Chapter 4 for more on this topic.) The shaded region contains all input vectors for which the output of the network will be 1. The output will be –1 for all other input vectors.
Figure 3.3: Perceptron Decision Boundary. [Plot: the p1–p2 plane with the boundary line n = 0 crossing p1 = –1 and p2 = 1; the weight vector W = [–1 1] is orthogonal to the line and points into the shaded region where n > 0; n < 0 on the other side.]
In general, the decision boundary between the two categories is determined by the equation

$$\mathbf{Wp} + b = 0. \qquad (3.6)$$
Because the boundary must be linear, the single-layer perceptron can only
be used to recognize patterns that are linearly separable (can be separated
by a linear boundary). These concepts will be discussed in more detail in
Chapter 4.
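As a quick numerical check on Eq. (3.5), here is a minimal Python/NumPy sketch (our own illustration, not part of the original text) that evaluates the two-input perceptron on points on either side of the boundary:

```python
import numpy as np

def hardlims(n):
    """Symmetric hard limit transfer function: +1 if n >= 0, else -1."""
    return np.where(n >= 0, 1, -1)

W = np.array([[-1, 1]])   # weight matrix from Eq. (3.5)
b = np.array([-1])        # bias, b = -1

# Points above the line give +1; points below give -1.
for p in [np.array([0, 2]), np.array([2, 0]), np.array([0, 1])]:
    n = W @ p + b
    print(p, "->", hardlims(n))
```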
Now we can apply the perceptron to the apple/orange recognition problem. Because there are only two categories and the inputs are three-dimensional, the network output will be

$$a = \text{hardlims}\left( \begin{bmatrix} w_{1,1} & w_{1,2} & w_{1,3} \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} + b \right). \qquad (3.7)$$
We want to choose the bias b and the elements of the weight matrix so that the perceptron will be able to distinguish between apples and oranges. For example, we may want the output of the perceptron to be 1 when an apple is input and –1 when an orange is input. Using the concept illustrated in Figure 3.3, let's find a linear boundary that can separate oranges and apples. The two prototype vectors (recall Eq. (3.2) and Eq. (3.3)) are shown in Figure 3.4. From this figure we can see that the linear boundary that divides these two vectors symmetrically is the p1, p3 plane.
Figure 3.4: Prototype Vectors. [Plot: three-dimensional input space with axes p1, p2, p3, showing p1 (orange) and p2 (apple) on opposite sides of the p1, p3 plane.]
The p1, p3 plane, which will be our decision boundary, can be described by the equation

$$p_2 = 0, \qquad (3.8)$$

or

$$\begin{bmatrix} 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} + 0 = 0. \qquad (3.9)$$

Therefore the weight matrix and bias will be

$$\mathbf{W} = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}, \qquad b = 0. \qquad (3.10)$$
The weight matrix is orthogonal to the decision boundary and points toward the region that contains the prototype pattern p2 (apple), for which we want the perceptron to produce an output of 1. The bias is 0 because the decision boundary passes through the origin.
Now let’s test the operation of our perceptron pattern classifier. It classifies
perfect apples and oranges correctly since
Orange:

$$a = \text{hardlims}\left( \begin{bmatrix} 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix} + 0 \right) = -1 \;\;(\text{orange}), \qquad (3.11)$$

Apple:

$$a = \text{hardlims}\left( \begin{bmatrix} 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} + 0 \right) = 1 \;\;(\text{apple}). \qquad (3.12)$$
But what happens if we put a not-so-perfect orange into the classifier? Let’s
say that an orange with an elliptical shape is passed through the sensors.
The input vector would then be
$$\mathbf{p} = \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix}. \qquad (3.13)$$

The response of the network would then be

$$a = \text{hardlims}\left( \begin{bmatrix} 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix} + 0 \right) = -1 \;\;(\text{orange}). \qquad (3.14)$$
In fact, any input vector that is closer to the orange prototype vector than
to the apple prototype vector (in Euclidean distance) will be classified as an
orange (and vice versa).
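To make the whole classifier concrete, here is a short Python/NumPy sketch (our own illustration; the function and variable names are ours) that implements Eq. (3.10) and reproduces the outputs of Eqs. (3.11) through (3.14):

```python
import numpy as np

def hardlims(n):
    """Symmetric hard limit transfer function: +1 if n >= 0, else -1."""
    return np.where(n >= 0, 1, -1)

W = np.array([[0, 1, 0]])        # weight matrix, Eq. (3.10)
b = 0                            # bias, Eq. (3.10)

p1    = np.array([ 1, -1, -1])   # prototype orange
p2    = np.array([ 1,  1, -1])   # prototype apple
p_odd = np.array([-1, -1, -1])   # elliptical orange, Eq. (3.13)

for name, p in [("prototype orange", p1),
                ("prototype apple", p2),
                ("elliptical orange", p_odd)]:
    a = hardlims(W @ p + b).item()
    print(name, "->", "apple" if a == 1 else "orange")
```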
To experiment with the perceptron network and the apple/orange classifica-
tion problem, use the Neural Network Design Demonstration Perceptron
Classification (nnd3pc).
This example has demonstrated some of the features of the perceptron net-
work, but by no means have we exhausted our investigation of perceptrons.
This network, and variations on it, will be examined in Chapters 4 through
13. Let’s consider some of these future topics.
In the apple/orange example we were able to design a network graphically,
by choosing a decision boundary that clearly separated the patterns. What
about practical problems, with high-dimensional input spaces? In Chapters 4, 7, 10 and 11 we will introduce learning algorithms that can be used to train networks to solve such problems.
Neural Networks: Learning Process
Prof. Sven Lončarić
[email protected]
http://www.fer.hr/ipg
Overview of topics
• Introduction
• Error-correction learning
• Hebb learning
• Competitive learning
• Credit-assignment problem
• Supervised learning
• Reinforcement learning
• Unsupervised learning
Introduction
• One of the most important features of an ANN is its ability to learn from the environment
• An ANN learns through an iterative process of adapting its synaptic weights and thresholds
• After each iteration the ANN should have more knowledge about its environment
Definition of learning
• Definition of learning in the ANN context:
  • Learning is a process in which the unknown parameters of an ANN are adapted through a continuing process of stimulation from the environment
  • The kind of learning is determined by the way in which the parameter changes take place
• This definition implies the following sequence of events:
  • The environment stimulates the ANN
  • The ANN changes as a result of the stimulation
  • The ANN responds to the environment in a new way because of the change
Notation
• vj and vk are the activations of neurons j and k
• xj = ϕ(vj) and xk = ϕ(vk) are the outputs of neurons j and k
• Let wkj(n) be the synaptic weight from neuron j to neuron k at time n

[Diagram: neuron j with activation vj and output xj = ϕ(vj), connected through weight wkj to neuron k with activation vk and output xk = ϕ(vk).]
Notation
• If in step n the synaptic weight wkj(n) is changed by Δwkj(n), we get the new weight:

wkj(n+1) = wkj(n) + Δwkj(n)

where wkj(n) and wkj(n+1) are the old and new weights between neurons k and j
• A set of rules that solves the learning problem is called a learning algorithm
• There is no unique learning algorithm; rather, there are many different learning algorithms, each with its own advantages and drawbacks
Algorithms and learning paradigms
• Learning algorithms determine how the weight correction Δwkj(n) is computed
• Learning paradigms determine the relation of the ANN to its environment
• The three basic learning paradigms are:
  • Supervised learning
  • Reinforcement learning
  • Unsupervised learning
Basic learning approaches
• According to the learning algorithm:
  • Error-correction learning
  • Hebb learning
  • Competitive learning
  • Boltzmann learning
  • Thorndike learning
• According to the learning paradigm:
  • Supervised learning
  • Reinforcement learning
  • Unsupervised learning
Error-correction learning
• Belongs to the supervised learning paradigm
• Let dk(n) be the desired output of neuron k at time n
• Let yk(n) be the obtained output of neuron k at time n
• The output yk(n) is obtained using the input vector x(n)
• The input vector x(n) and desired output dk(n) represent an example that is presented to the ANN at time n
• The error is the difference between the desired and obtained outputs of neuron k at time n:

ek(n) = dk(n) − yk(n)
Error-correction learning
• The goal of error-correction learning is to minimize an error function derived from the errors ek(n), so that the obtained outputs of all neurons approximate the desired outputs in some statistical sense
• A frequently used error function is the mean square error:

J = E[ ½ Σk ek²(n) ]

where E[·] is the statistical expectation operator and the summation runs over all neurons in the output layer
Error function
• The problem with minimizing the error function J is that it requires knowledge of the statistical properties of the random processes ek(n)
• For this reason, an estimate of the error function at step n is used as the optimization function instead:

E(n) = ½ Σk ek²(n)
Delta learning rule
• Minimizing the error function with respect to the weights wkj(n) gives the delta learning rule:

Δwkj(n) = η ek(n) xj(n)

where η is a positive constant determining the learning rate
• The weight change is proportional to the error and to the value at the corresponding input
• The learning rate η must be chosen carefully:
  • A small η gives stability, but learning is slow
  • A large η speeds up learning, but risks instability
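To make the rule concrete, here is a minimal Python/NumPy sketch (our own toy setup, not from the slides) that applies the delta rule Δwkj(n) = η ek(n) xj(n) to a single linear neuron:

```python
import numpy as np

rng = np.random.default_rng(0)
eta = 0.1                                 # learning rate (eta)

# Hypothetical toy problem: a linear neuron y = w . x should learn
# the mapping d = 2*x1 - x2 from examples (x(n), d(n)).
X = rng.uniform(-1.0, 1.0, size=(50, 2))  # input vectors x(n)
d = 2.0 * X[:, 0] - X[:, 1]               # desired outputs d(n)

w = np.zeros(2)                           # initial weights
for epoch in range(20):
    for x, dk in zip(X, d):
        y = w @ x                         # obtained output y_k(n)
        e = dk - y                        # error e_k(n) = d_k(n) - y_k(n)
        w += eta * e * x                  # delta rule: Δw = η e x

print(w)                                  # converges toward [2, -1]
```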
Error surface
• If we plot the error value J as a function of the synaptic weights, we obtain a multidimensional error surface
• The learning problem consists of finding the point on the error surface that has the smallest error (i.e. minimizing the error)
Error surface
• Depending on the type of neurons, there are two possibilities:
  • If the ANN consists of linear neurons, the error surface is a quadratic function with a single global minimum
  • If the ANN consists of nonlinear neurons, the error surface has one or more global minima as well as multiple local minima
• Learning starts from an arbitrary point on the error surface, and through the minimization process:
  • In the first case it converges to the global minimum
  • In the second case it can also converge to a local minimum
Hebb learning
• Hebb's principle of learning says (Hebb, The Organization of Behavior, 1949):
  • When the axon of neuron A is close enough to excite neuron B and repeatedly takes part in activating it, metabolic changes take place such that the efficiency of neuron A in activating neuron B is increased
• An extension of this principle (Stent, 1973):
  • If one neuron does not influence (stimulate) another neuron, then the synapse between them becomes weaker or is completely eliminated
Activity product rule
• According to Hebb's principle, the weights are changed as follows:

Δwkj(n) = F(yk(n), xj(n))

where yk(n) and xj(n) are the output and the j-th input of the k-th neuron
• A special case of this principle is:

Δwkj(n) = η yk(n) xj(n)

where the constant η determines the learning rate
• This rule is called the activity product rule
Activity product rule
• The weight update is proportional to the input value:

Δwkj(n) = η yk(n) xj(n)

[Plot: Δwkj versus xj, a line through the origin with slope ηyk.]
• Problem: repeated updates with the same input and output cause unbounded growth of the weight wkj
Generalized activity product rule
• To overcome the problem of weight saturation, modifications have been proposed that limit the growth of the weight wkj
• Non-linear limiting (forgetting) factor (Kohonen, 1988):

Δwkj(n) = η yk(n) xj(n) − α yk(n) wkj(n)

where α is a positive constant
• This expression can be written as:

Δwkj(n) = α yk(n)[c xj(n) − wkj(n)]

where c = η/α
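The saturation problem and the effect of the forgetting term are easy to see numerically. The following hedged Python sketch (our own numbers, chosen for illustration) repeatedly presents the same input/output pair under both rules:

```python
eta, alpha = 0.5, 0.1        # learning rate and forgetting constant
x, y = 1.0, 1.0              # same input and output at every step

w_plain, w_forget = 1.0, 1.0
for n in range(100):
    w_plain  += eta * y * x                         # basic Hebb rule
    w_forget += eta * y * x - alpha * y * w_forget  # with forgetting term

print(w_plain)    # grows without bound: 1 + 100 * 0.5 = 51
print(w_forget)   # settles near c * x = (eta / alpha) * x = 5
```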
Generalized activity product rule
• In the generalized Hebb rule, all inputs such that xj(n) < wkj(n)/c result in a reduction of the weight wkj

[Plot: Δwkj versus xj, a line with slope ηyk crossing zero at xj = wkj/c.]
Competitive learning
• Unsupervised learning
• Neurons compete for the opportunity to become active
• Only one neuron can be active at any time
• Useful for classification problems
• Three elements of competitive learning:
  • A set of neurons with randomly chosen weights, so that they respond differently to a given input
  • A limit on the weight of each neuron
  • A competition mechanism such that only one neuron is active at any single time (the winner-takes-all neuron)
Competitive learning
• An example network with a single layer of neurons:

[Diagram: four inputs x1–x4 in an input layer, fully connected to a layer of output neurons.]
Competitive learning
• In order to win, the activity vj of neuron j must be the largest among all neurons
• The output yj of the winning neuron j is equal to 1; for all other neurons the output is 0
• The learning rule is defined as:

Δwji = η (xi − wji)   if neuron j won
Δwji = 0              if neuron j lost

• The learning rule has the effect of shifting the weight vector wj towards the input vector x
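A compact Python/NumPy sketch of this winner-takes-all rule (our own toy data; the normalization mirrors the unit-sphere example that follows):

```python
import numpy as np

rng = np.random.default_rng(1)
eta = 0.2

def normalize(v):
    """Scale vectors to unit norm (points on the unit sphere)."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

W = normalize(rng.normal(size=(3, 2)))    # 3 neurons, random unit weights
X = normalize(rng.normal(size=(100, 2)))  # unit-norm input vectors

for x in X:
    j = np.argmax(W @ x)                  # winner: largest activity v_j
    W[j] += eta * (x - W[j])              # shift w_j towards x
    W[j] = normalize(W[j])                # keep w_j on the unit sphere

print(W)                                  # rows approximate cluster directions
```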
An example of competitive learning
• Let us assume that each input vector has unit norm, so that it can be represented as a point on the N-dimensional unit sphere
• Let us assume that the weight vectors also have unit norm, so they too can be represented as points on the N-dimensional unit sphere
• During training, input vectors are presented to the network and the weights of the winning neuron are updated
An example of competitive learning
• The learning process can be represented as movement of the weight vectors along the unit sphere

[Diagram: input vectors and weight vectors as points on the unit sphere; in the initial state the weight vectors are scattered, in the final state each weight vector has moved into a cluster of input vectors.]
Credit-assignment problem
• The credit-assignment problem is an important issue in learning algorithms
• It is the problem of assigning credit or blame for the overall learning outcome to the large number of internal decisions made by the learning system
Supervised learning
• Supervised learning is characterized by the presence of a teacher

[Diagram: the environment provides the input to both the teacher and the ANN; the teacher supplies the desired output, the ANN the obtained output; their difference is the error, which is fed back to adjust the ANN.]
Supervised learning
• The teacher has knowledge in the form of input-output pairs used for training
• The error is the difference between the desired and obtained outputs for a given input vector
• The ANN parameters change under the influence of the input vectors and error values
• The learning process is repeated until the ANN has learned to imitate the teacher
• After learning is completed, the teacher is no longer required and the ANN can work without supervision
Supervised learning
• The error function can be the mean square error; it depends on the free parameters (the weights)
• The error function can be represented as a multidimensional error surface
• Any ANN configuration is defined by its weights and corresponds to a point on the error surface
• The learning process can be viewed as the movement of this point down the error surface towards the global minimum
Supervised learning
• A point on the error surface moves towards the minimum based on the gradient
• The gradient at any point on the surface is a vector pointing in the direction of steepest ascent, so learning moves opposite to the gradient, in the direction of steepest descent
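For a linear neuron the error surface is quadratic, and "moving down the gradient" can be sketched in a few lines of Python (our own toy example, not from the slides):

```python
import numpy as np

# Quadratic error surface E(w) = 0.5 * (d - w.x)^2 for one fixed example.
x = np.array([1.0, 2.0])     # input
d = 3.0                      # desired output
w = np.array([0.0, 0.0])     # starting point on the error surface
eta = 0.1                    # step size

for step in range(100):
    e = d - w @ x            # error at the current point
    grad = -e * x            # gradient dE/dw = -(d - w.x) * x
    w = w - eta * grad       # step in the direction of steepest descent

print(w, d - w @ x)          # final weights; error is now near 0
```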
Supervised learning
• Examples of supervised learning algorithms are:
  • The LMS (least-mean-square) algorithm
  • The BP (back-propagation) algorithm
• A disadvantage of supervised learning is that learning is not possible without a teacher
• The ANN can only learn from the provided examples
Supervised learning
• Supervised learning can be implemented to work offline or online
• In offline learning:
  • The ANN learns first
  • When learning is completed, the ANN does not change any more
• In online learning:
  • The ANN learns during the exploitation phase
  • Learning is performed in real time – the ANN is dynamic
Reinforcement learning
• Reinforcement learning is online in character
• The input-output mapping is learned through an iterative process in which a measure of learning quality is maximized
• Reinforcement learning avoids the problem of supervised learning, where labeled training examples are required
Reinforcement learning
• In reinforcement learning the teacher does not present input-output training examples, but only gives a grade representing a measure of learning quality
• The grade is a scalar value (a number)
• The error function is unknown in reinforcement learning
• The learning algorithm must therefore determine the direction of motion in the learning space through a trial-and-error approach
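The trial-and-error idea can be illustrated with a very small sketch (entirely our own construction, not a standard reinforcement learning algorithm): perturb the parameters at random and keep a perturbation only when the scalar grade improves:

```python
import numpy as np

rng = np.random.default_rng(2)

def grade(w):
    """Hypothetical scalar grade from the teacher: higher is better."""
    return -np.sum((w - np.array([0.5, -0.3])) ** 2)

w = np.zeros(2)
best = grade(w)
for trial in range(200):
    w_try = w + 0.1 * rng.normal(size=2)  # random trial move
    g = grade(w_try)
    if g > best:                          # positive grade: reinforce,
        w, best = w_try, g                # keep the action that scored well

print(w)                                  # approaches [0.5, -0.3]
```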
Thorndike's law of effect
• The reinforcement learning principle:
  • If the actions of the learning system result in a positive grade, then the likelihood that the system will take similar actions in the future increases
  • Otherwise, the likelihood of taking such actions is reduced
Unsupervised learning
• In unsupervised learning there is no teacher assisting the learning process
• Competitive learning is an example of unsupervised learning
Unsupervised learning
• A layer of neurons competes for the chance to learn (to modify their weights based on the input vector)
• In the simplest approach, the winner-takes-all strategy is used
Comparison of supervised and unsupervised learning
• The most popular algorithm for supervised learning is the error back-propagation algorithm
• A disadvantage of this algorithm is poor scaling: learning complexity grows exponentially with the number of layers
Problems
• Problem 2.1
  • The delta rule and the Hebb rule are two different learning algorithms. Describe the differences between these rules.
• Problem 2.5
  • An input of value 1 is connected to a synapse whose weight has an initial value of 1. Calculate the weight update using:
    • the basic Hebb rule with learning rate parameter η = 0.1
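A hedged worked hint for Problem 2.5, assuming a linear neuron y = wx (the excerpt above does not state the activation, so this is an assumption): with x = 1 and w(0) = 1 the output is y = 1 · 1 = 1, so the basic Hebb rule gives Δw(0) = η y x = 0.1 · 1 · 1 = 0.1, and the updated weight is w(1) = w(0) + Δw(0) = 1.1.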