Multi Layer Feed-Forward NN

Rossella Cancelliere


Multi layer feed-forward NN
We consider a more general network architecture: between the input and output
layers there are hidden layers, as illustrated below.
Hidden nodes do not directly receive inputs nor send outputs to the external
environment.
FNNs overcome the limitation of single-layer NNs: they can handle non-linearly separable learning tasks.

[Figure: a feed-forward network with an input layer, a hidden layer, and an output layer]

XOR problem
A typical example of a non-linearly separable function is the XOR. This function takes two input arguments with values in {-1, 1} and returns one output in {-1, 1}, as specified in the following table:

  x1   x2   x1 XOR x2
  -1   -1      -1
  -1    1       1
   1   -1       1
   1    1      -1

If we think of -1 and 1 as encodings of the truth values false and true, respectively, then XOR computes the logical exclusive or, which yields true if and only if the two inputs have different truth values.

[Figure: the four XOR input pairs in the (x1, x2) plane, labelled by their output value 1 or -1]
In this graph of the XOR, the input pairs giving output 1 and -1 are shown. These two classes cannot be separated using one line; we have to use two lines.

The following NN with two hidden nodes realizes this non-linear separation: each hidden node describes one of the two lines. This NN uses the sign activation function. The two arrows in the figure indicate the regions where the network output will be 1. The output node is used to combine the outputs of the two hidden nodes.
[Figure: a network with inputs x1 and x2, two hidden nodes, and one output node realizing the XOR separation; the recoverable weight labels are +1, -1 and 0.1]
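To make the construction concrete, here is a minimal sketch (assuming NumPy) of a two-hidden-node network with sign activations that reproduces the XOR table above. The exact weights in the figure are not fully recoverable, so the values below are one standard, hypothetical choice in which each hidden node detects one of the two input pairs with differing signs.

import numpy as np

def sign(v):
    # Sign activation used by this XOR network: returns +1 or -1.
    return np.where(v >= 0, 1, -1)

# Hypothetical weights: each hidden node realizes one of the two separating lines.
W_hidden = np.array([[ 1.0, -1.0],   # hidden node 1 fires for (x1, x2) = (1, -1)
                     [-1.0,  1.0]])  # hidden node 2 fires for (x1, x2) = (-1, 1)
b_hidden = np.array([-0.5, -0.5])
w_out = np.array([1.0, 1.0])         # the output node combines the two hidden outputs
b_out = 1.0

def xor_net(x1, x2):
    x = np.array([x1, x2])
    h = sign(W_hidden @ x + b_hidden)   # hidden layer: one line per node
    return sign(w_out @ h + b_out)      # output layer

for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print(x1, x2, "->", xor_net(x1, x2))   # expected: -1, 1, 1, -1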

Types of decision regions
A network with a single node separates the input plane into two half-planes by the line w0 + w1·x1 + w2·x2 = 0: it outputs one class where w0 + w1·x1 + w2·x2 > 0 and the other where w0 + w1·x1 + w2·x2 < 0.

A one-hidden-layer network can realize a convex decision region: each hidden node realizes one of the lines (L1, L2, L3, L4) bounding the convex region.
[Figure: a single-node network with weights w0, w1, w2 and its separating line; a one-hidden-layer network and the convex region bounded by the lines L1-L4]


Neuron model


• The classical learning algorithm of FFNNs is based on the gradient descent method. For this reason the activation functions used in FFNNs are continuous and differentiable everywhere, so that the network error is a differentiable function of the weights.
• A typical activation function that can be viewed as a continuous
approximation of the step (threshold) function is the Sigmoid
Function. The activation function for node j is:
  ϕ(v_j) = 1 / (1 + e^(-a·v_j)),   with a > 0

where v_j = ∑_i w_ji·y_i, with w_ji the weight of the link from node i to node j and y_i the output of node i.

[Figure: plot of ϕ(v_j) for v_j in [-10, 10], for increasing values of a]

• When a → ∞, ϕ 'becomes' the step function.
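As a small illustration, here is a minimal sketch (assuming NumPy; the values of a are arbitrary) of the sigmoid and of its derivative ϕ'(v) = a·ϕ(v)·(1 - ϕ(v)), which reappears later in the delta rule.

import numpy as np

def sigmoid(v, a=1.0):
    # phi(v) = 1 / (1 + exp(-a*v)), with a > 0
    return 1.0 / (1.0 + np.exp(-a * v))

def sigmoid_prime(v, a=1.0):
    # phi'(v) = a * phi(v) * (1 - phi(v)), used later in the delta rule
    y = sigmoid(v, a)
    return a * y * (1.0 - y)

v = np.linspace(-10, 10, 5)
for a in (0.5, 1.0, 5.0):
    print(a, np.round(sigmoid(v, a), 3))   # larger a brings phi closer to the step function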


Training: Backprop algorithm
• The Backprop algorithm searches for weight values
that minimize the total error of the network over the set
of training examples (training set).
• Backprop consists of the repeated application of the
following two passes:
– Forward pass: in this step the network is activated on one
example and the error of (each neuron of) the output layer is
computed.
– Backward pass: in this step the network error is used for
updating the weights (credit assignment problem). This
process is more complex than the LMS algorithm for Adaline,
because hidden nodes are not linked to the error directly but
only through the nodes of the next layer. Therefore, starting at
the output layer, the error is propagated backwards through
the network, layer by layer. This is done by recursively
computing the local gradient of each weight.

Backprop

• Back-propagation training algorithm
[Figure: in the forward step the network activation is propagated from node i to node k through the weight w_ki; in the backward step the error is propagated in the opposite direction]
• Backprop adjusts the weights of the NN in order to minimize the network total mean squared error.

Total Mean Squared Error
• The error of output neuron j after the activation of the network on the n-th training example (x(n), d(n)) is:

  e_j(n) = d_j(n) - y_j(n)

• The pattern error is the sum of the squared errors of the output neurons:

  E(n) = (1/2) · ∑_{j output node} e_j(n)²

• The total mean squared error is the average of the network errors over the training examples:

  E_AV = (1/N) · ∑_{n=1..N} E(n)
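As a minimal sketch of these two quantities (assuming NumPy; the target and output values are made up, just to show the shapes involved):

import numpy as np

def pattern_error(d, y):
    # E(n) = 1/2 * sum over output nodes j of (d_j(n) - y_j(n))^2
    e = d - y
    return 0.5 * np.sum(e ** 2)

def total_mean_squared_error(targets, outputs):
    # E_AV = (1/N) * sum over the N training examples of E(n)
    return np.mean([pattern_error(d, y) for d, y in zip(targets, outputs)])

# Hypothetical targets and outputs for two examples with two output neurons each.
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
outputs = [np.array([0.8, 0.1]), np.array([0.3, 0.7])]
print(total_mean_squared_error(targets, outputs))   # 0.0575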

Weight Update Rule

The Backprop weight update rule is based on the gradient descent method: take a step in the direction yielding the maximum decrease of the network error E. This direction is the opposite of the gradient of E.

  w_ji = w_ji + ∆w_ji

  ∆w_ji = -η · ∂E/∂w_ji,   with η > 0

The input of neuron j is:

  v_j = ∑_{i=0,...,m} w_ji · y_i

Using the chain rule we can write:

  ∂E/∂w_ji = (∂E/∂v_j) · (∂v_j/∂w_ji)

Moreover, defining the error signal of neuron j as:

  δ_j = -∂E/∂v_j

then, from ∂v_j/∂w_ji = y_i, we get:

  ∆w_ji = η · δ_j · y_i

Weight update of output neuron
In order to compute the weight change ∆w_ji we need to know the error signal δ_j of neuron j. There are two cases, depending on whether j is an output or a hidden neuron.

If j is an output neuron then, using the chain rule, we obtain:

  δ_j = -∂E/∂v_j = -(∂E/∂e_j)·(∂e_j/∂y_j)·(∂y_j/∂v_j) = -e_j·(-1)·ϕ'(v_j) = e_j·ϕ'(v_j)

because e_j = d_j - y_j and y_j = ϕ(v_j).

So if j is an output node then the weight w_ji from neuron i to neuron j is updated by:

  ∆w_ji = η·(d_j - y_j)·ϕ'(v_j)·y_i


Weight update of hidden neuron
If j is a hidden neuron then its error signal δ_j is computed using the error signals of all the neurons of the next layer. Using the chain rule we have:

  δ_j = -∂E/∂v_j = -(∂E/∂y_j)·(∂y_j/∂v_j)

Observe that ∂y_j/∂v_j = ϕ'(v_j) and that

  ∂E/∂y_j = ∑_{k in next layer} (∂E/∂v_k)·(∂v_k/∂y_j) = -∑_{k in next layer} δ_k·w_kj

Then:

  δ_j = ϕ'(v_j) · ∑_{k in next layer} δ_k·w_kj

So if j is a hidden node then the weight w_ji from neuron i to neuron j is updated by:

  ∆w_ji = η·y_i·ϕ'(v_j) · ∑_{k in next layer} δ_k·w_kj

Summary: Delta Rule
• Delta rule:   ∆w_ji = η·δ_j·y_i

  δ_j = ϕ'(v_j)·(d_j - y_j)                       if j is an output node
  δ_j = ϕ'(v_j) · ∑_{k of next layer} δ_k·w_kj    if j is a hidden node

where, for the sigmoid activation, ϕ'(v_j) = a·y_j·(1 - y_j)
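For concreteness, a small worked example with made-up numbers: for an output node with a = 1, output y_j = 0.8, target d_j = 1 and input y_i = 0.5, we get ϕ'(v_j) = 1·0.8·0.2 = 0.16, δ_j = 0.16·(1 - 0.8) = 0.032, and with η = 0.1 the update is ∆w_ji = 0.1·0.032·0.5 = 0.0016.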

Generalized delta rule
• If η is small then the algorithm learns the weights very slowly, while if η is large then the large weight changes may cause unstable behavior, with oscillations of the weight values.
• A technique for tackling this problem is the introduction
of a momentum term in the delta rule which takes into
account previous updates. We obtain the following
generalized Delta rule:

  ∆w_ji(n) = α·∆w_ji(n-1) + η·δ_j(n)·y_i(n)

where α is the momentum constant, 0 ≤ α < 1. The momentum accelerates the descent in steady downhill directions and has a stabilizing effect in directions that oscillate in time.
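A minimal sketch of this update for one weight matrix (assuming NumPy; the shapes and the δ and y values are made up, and would come from backprop in practice):

import numpy as np

eta, alpha = 0.1, 0.9          # learning rate and momentum constant (0 <= alpha < 1)
W = np.zeros((3, 4))           # weights w_ji from 4 source nodes to 3 target nodes
dW_prev = np.zeros_like(W)     # previous update, Delta w_ji(n-1)

def momentum_update(W, dW_prev, delta, y):
    # Delta w_ji(n) = alpha * Delta w_ji(n-1) + eta * delta_j(n) * y_i(n)
    dW = alpha * dW_prev + eta * np.outer(delta, y)
    return W + dW, dW

delta = np.array([0.05, -0.02, 0.01])    # made-up error signals delta_j
y = np.array([1.0, 0.3, -0.7, 0.5])      # made-up outputs y_i of the previous layer
W, dW_prev = momentum_update(W, dW_prev, delta, y)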

Other techniques: η adaptation

Other heuristics for accelerating the convergence of the back-prop algorithm through η adaptation:
• Heuristic 1: Every weight has its own η.
• Heuristic 2: Every η is allowed to vary from one
iteration to the next.


Backprop learning algorithm (incremental mode)
n=1;
initialize w(n) randomly;
while (stopping criterion not satisfied and n < max_iterations)
for each example (x,d)
- run the network with input x and compute the output y
- update the weights in backward order starting from
those of the output layer:
w_ji = w_ji + ∆w_ji
with ∆w_ji computed using the (generalized) Delta rule
end-for
n = n+1;
end-while;
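A minimal runnable sketch of this incremental-mode algorithm for a one-hidden-layer network with sigmoid activations is given below (assuming NumPy; the network sizes, learning rate, epoch count and the 0/1-encoded XOR training set are illustrative assumptions, and the explicit bias vectors play the role of the i = 0 weights in v_j = ∑_i w_ji·y_i):

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(v, a=1.0):
    return 1.0 / (1.0 + np.exp(-a * v))

# Training set: XOR with inputs and targets encoded in {0, 1}.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)

n_in, n_hidden, n_out = 2, 2, 1
eta, a = 0.5, 1.0

# Initialize weights and biases randomly in [-0.5, 0.5].
W1 = rng.uniform(-0.5, 0.5, (n_hidden, n_in));  b1 = rng.uniform(-0.5, 0.5, n_hidden)
W2 = rng.uniform(-0.5, 0.5, (n_out, n_hidden)); b2 = rng.uniform(-0.5, 0.5, n_out)

for epoch in range(10000):                       # crude stopping criterion: fixed number of epochs
    for x, d in zip(X, D):                       # incremental mode: one example at a time
        # Forward pass: run the network with input x and compute the output.
        v1 = W1 @ x + b1;  y1 = sigmoid(v1, a)
        v2 = W2 @ y1 + b2; y2 = sigmoid(v2, a)

        # Backward pass: error signals from the delta rule, phi'(v) = a*y*(1-y).
        delta2 = a * y2 * (1 - y2) * (d - y2)            # output nodes
        delta1 = a * y1 * (1 - y1) * (W2.T @ delta2)     # hidden nodes

        # Update the weights in backward order, starting from the output layer.
        W2 += eta * np.outer(delta2, y1); b2 += eta * delta2
        W1 += eta * np.outer(delta1, x);  b1 += eta * delta1

for x in X:
    y = sigmoid(W2 @ sigmoid(W1 @ x + b1, a) + b2, a)
    print(x, "->", np.round(y, 2))   # should approach 0, 1, 1, 0 (depends on the random init)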

Backprop algorithm (batch mode)

• In batch mode the weights are updated only after all examples have been processed, using the formula

  w_ji = w_ji + ∑_x ∆w_ji(x)

where the sum ranges over the training examples x.
• The learning process continues on an epoch-
by-epoch basis until the stopping condition is
satisfied.
• In the incremental mode choose a randomized
ordering for selecting the examples in the
training set in order to avoid poor performance.


Stopping criteria
• Sensible stopping criteria:
– total mean squared error change:
Back-prop is considered to have converged when the
absolute rate of change in the average squared error per
epoch is sufficiently small (in the range [0.1, 0.01]).
– generalization based criterion:
After each epoch the NN is tested for generalization. If the
generalization performance is adequate then stop. If this
stopping criterion is used then the part of the training set
used for testing the network generalization is not used for
updating the weights.


NN DESIGN
The following features are very important for
NN design:
• Data representation
• Network Topology
• Network Parameters
• Training
• Validation

Data Representation
• Data representation depends on the problem;
generally NNs work on continuous (real valued)
attributes.
• Attributes of different types may have different
ranges of values; this can affect the training
process. Normalization may be used, so that each
attribute assumes values between 0 and 1.
  x_i = (x_i - min_i) / (max_i - min_i)

where min_i and max_i represent the range of that attribute over the training set.
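A minimal sketch of this normalization (assuming NumPy and a small made-up training matrix; an attribute that is constant over the training set, with max_i = min_i, would need special handling):

import numpy as np

def min_max_normalize(X):
    # Rescale each attribute (column) to [0, 1] using its range over the training set:
    # x_i = (x_i - min_i) / (max_i - min_i)
    mins, maxs = X.min(axis=0), X.max(axis=0)
    return (X - mins) / (maxs - mins), mins, maxs

# Hypothetical training data: 3 examples, 2 attributes with very different ranges.
X_train = np.array([[10.0, 0.2], [20.0, 0.8], [30.0, 0.5]])
X_norm, mins, maxs = min_max_normalize(X_train)
print(X_norm)   # each column now lies between 0 and 1
# New data should be rescaled with the same mins and maxs computed on the training set.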

Network Topology
• The number of layers and of neurons
depend on the specific task. In practice this
issue is solved by trial and error.
• Two types of adaptive algorithms can be
used:
– start from a large network and successively
remove some neurons and links until network
performance degrades (pruning).
– begin with a small network and introduce new
neurons until performance is satisfactory.

Network parameters

• How are the weights initialized?


• How is the learning rate chosen?
• How many hidden layers and how many
neurons?
• How many examples in the training set?


Weights and learning rate
• In general, initial weights are randomly
chosen, with typical values between -1.0
and 1.0 or -0.5 and 0.5.

• The right value of η depends on the application. Values between 0.1 and 0.9 have been used in many applications.
• Other heuristics adapt η during training, as described in the previous slides.

Training

• Rule of thumb:
– the number of training examples should be at
least four to ten times the number of weights of
the network.
• Other rule:

  N > |W| / (1 - a)

where |W| is the number of weights and a is the expected accuracy on the test set.
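For concreteness, with made-up numbers: a network with |W| = 100 weights and an expected test-set accuracy of a = 0.9 would need N > 100 / (1 - 0.9) = 1000 training examples according to this rule.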


Applicability of FNN

Boolean functions:
• Every Boolean function can be represented
by a network with a single hidden layer
Continuous functions:
• Every bounded piece-wise continuous
function can be approximated with arbitrarily
small error by a network with one hidden
layer.
• Any continuous function can be
approximated to arbitrary accuracy by a
network with two hidden layers.

Approximation by FNN - theorem

Let ϕ(·) be a nonconstant, bounded, and monotone-increasing continuous function. Let I_m0 denote the m0-dimensional unit hypercube [0,1]^m0. Then, given any function f ∈ C(I_m0) and ε > 0, there exist an integer m1 and sets of real constants α_i, b_i and w_ij such that

  F(x_1, ..., x_m0) = ∑_{i=1..m1} α_i · ϕ( ∑_{j=1..m0} w_ij·x_j + b_i )

is an approximation of f, i.e.

  |F(x_1, ..., x_m0) - f(x_1, ..., x_m0)| < ε

for all (x_1, ..., x_m0) in I_m0.
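The approximating form F is exactly the output of a one-hidden-layer MLP with a linear output node, as the comments below spell out. A minimal sketch of this structure (assuming NumPy; the sizes and the constants α_i, w_ij, b_i are random, made-up values, just to show the shape of F):

import numpy as np

def phi(v):
    # A sigmoid: nonconstant, bounded, monotone-increasing and continuous,
    # so it satisfies the conditions of the theorem.
    return 1.0 / (1.0 + np.exp(-v))

def F(x, alpha, W, b):
    # F(x_1, ..., x_m0) = sum_i alpha_i * phi( sum_j w_ij * x_j + b_i )
    return alpha @ phi(W @ x + b)

m0, m1 = 2, 8                     # input dimension and number of hidden nodes
rng = np.random.default_rng(0)
alpha = rng.normal(size=m1)       # output weights alpha_i
W = rng.normal(size=(m1, m0))     # hidden weights w_ij
b = rng.normal(size=m1)           # hidden biases b_i

x = np.array([0.3, 0.7])          # a point of the unit hypercube I_m0
print(F(x, alpha, W, b))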


Approximation by FNN - comments

The sigmoidal function used for the construction of MLPs satisfies the conditions imposed on ϕ(·).

F(x_1, ..., x_m0) represents the output of a MLP with:
• m0 input nodes and m1 hidden nodes
• synaptic weights w_ij and biases b_i for the hidden nodes
• synaptic weights α_i for the output nodes

The universal approximation theorem is an existence theorem: it guarantees that an approximating network exists, but it does not say how to construct it.


The theorem states that a single hidden layer is sufficient for an MLP to compute a uniform approximation to a given training set represented by the set of inputs x_1, ..., x_m0.

In 1993 Barron established the approximation properties of an MLP, evaluating the rate at which the error decreases as O(1/m1).


Applications of FFNN

Classification, pattern recognition, diagnosis:


• FNN can be applied to solve non-linearly separable
learning problems.
– Recognizing printed or handwritten characters,
– Face recognition, Speech recognition
– Object classification by means of salient features
– Analysis of signals to determine their nature and source
Regression and Forecasting
• FNN can be applied to learn non-linear functions (regression) and in particular functions whose input is a sequence of measurements over time (time series).