Multi-Layer Feed-Forward Neural Networks (FNN)
We consider a more general network architecture: between the input and output
layers there are hidden layers, as illustrated below.
Hidden nodes neither receive inputs directly from, nor send outputs directly to, the external environment.
FNNs overcome the limitation of single-layer NN: they can handle non-linearly
separable learning tasks.
[Figure: a feed-forward network with an input layer, a hidden layer and an output layer.]
FNN
XOR problem
A typical example of a non-linearly separable function is the XOR. This function takes two input arguments with values in {-1, 1} and returns one output in {-1, 1}, as specified in the following table:

x1   x2   x1 XOR x2
-1   -1      -1
-1    1       1
 1   -1       1
 1    1      -1
If we think of -1 and 1 as encodings of the truth values false and true, respectively, then XOR computes the logical exclusive or, which yields true if and only if the two inputs have different truth values.
FNN
XOR problem
In the graph of the XOR function, the input pairs giving output 1 and those giving output -1 are plotted in the (x1, x2) plane. These two classes cannot be separated using a single line; two lines are needed.

The following NN with two hidden nodes realizes this non-linear separation, where each hidden node describes one of the two lines. This NN uses the sign activation function; the output node is used to combine the outputs of the two hidden nodes.

[Figure: the XOR points in the (x1, x2) plane and the network with inputs x1, x2, two hidden nodes and one output node; the two arrows indicate the regions where the network output will be 1.]
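A minimal numerical sketch of this construction. The weights below are illustrative choices that realize XOR with sign units; they are not necessarily the values shown in the original figure:

```python
import numpy as np

def sign(v):
    # Sign activation returning values in {-1, +1}
    return np.where(v >= 0, 1.0, -1.0)

def xor_net(x1, x2):
    # Hidden node 1: fires (+1) when at least one input is +1 (one separating line)
    h1 = sign(x1 + x2 + 1.5)
    # Hidden node 2: fires (+1) only when both inputs are +1 (the other separating line)
    h2 = sign(x1 + x2 - 1.5)
    # Output node combines the two half-planes: +1 iff h1 = +1 and h2 = -1
    return sign(h1 - h2 - 1.5)

for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))   # prints -1, 1, 1, -1
```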
FNN
Types of decision regions
A network with a single node (inputs x1, x2, weights w1, w2 and bias w0) divides the input plane into two half-plane decision regions:

w0 + w1 x1 + w2 x2 > 0
w0 + w1 x1 + w2 x2 < 0
Increasing the slope parameter a makes the (sigmoid) activation function steeper, where v_j = ∑_i w_ji y_i is the input of neuron j.
FNN
Training: Backprop algorithm
• The Backprop algorithm searches for weight values
that minimize the total error of the network over the set
of training examples (training set).
• Backprop consists of the repeated application of the
following two passes:
– Forward pass: in this step the network is activated on one
example and the error of (each neuron of) the output layer is
computed.
– Backward pass: in this step the network error is used for updating the weights (credit assignment problem). This process is more complex than the LMS algorithm for the Adaline, because hidden nodes are not directly linked to the error, but only through the nodes of the next layer. Therefore, starting at the output layer, the error is propagated backwards through the network, layer by layer, by recursively computing the local gradient of each neuron.
Backprop FNN
Network activation (forward step) and error propagation (backward step)
[Figure: in the forward step, activations flow from neuron i to neuron k through the weight w_ki; in the backward step, the error flows in the opposite direction.]
FNN
Total Mean Squared Error
• The error of output neuron j after the activation of the network on the n-th training example (x(n), d(n)) is:

e_j(n) = d_j(n) − y_j(n)

• The network error on the n-th example is the sum over the output neurons:

E(n) = ½ ∑_{j output node} e_j(n)²

• The total mean squared error is the average of the network errors over the training examples:

E_AV = (1/N) ∑_{n=1}^{N} E(n)
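A small numerical sketch of these two formulas. The arrays d and y are illustrative values holding the desired and actual outputs of the output neurons for N = 4 examples:

```python
import numpy as np

# Illustrative data: N = 4 examples, 2 output neurons
d = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])  # desired outputs d_j(n)
y = np.array([[0.9, 0.2], [0.1, 0.7], [0.8, 0.9], [0.3, 0.1]])  # network outputs y_j(n)

e = d - y                            # e_j(n) = d_j(n) - y_j(n)
E_n = 0.5 * np.sum(e**2, axis=1)     # E(n) = 1/2 * sum_j e_j(n)^2, one value per example
E_av = np.mean(E_n)                  # E_AV = (1/N) * sum_n E(n)
print(E_n, E_av)
```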
The weights are adjusted by gradient descent on the error:

w_ji = w_ji + ∆w_ji,   with   ∆w_ji = −η ∂E/∂w_ji,   η > 0
FNN
Weight Update Rule
The input of neuron j is v_j = ∑_{i=0,...,m} w_ji y_i.

Writing the error signal (local gradient) of neuron j as δ_j = −∂E/∂v_j, from ∂v_j/∂w_ji = y_i we get

∆w_ji = η δ_j y_i
FNN
Weight update of output neuron
In order to compute the weight change ∆w ji we need to know the error signal
δ j of neuron j .
There are two cases, depending on whether j is an output or a hidden neuron.
If j is an output neuron then using the chain rule we obtain:
δ_j = −∂E/∂v_j = −(∂E/∂e_j)(∂e_j/∂y_j)(∂y_j/∂v_j) = −e_j·(−1)·φ'(v_j) = e_j φ'(v_j)

because e_j = d_j − y_j and y_j = φ(v_j). Therefore

∆w_ji = η (d_j − y_j) φ'(v_j) y_i
FNN
Weight update of hidden neuron
If j is a hidden neuron then its error signal δ j is computed using the
error signals of all the neurons of the next layer.
Using the chain rule we have:

δ_j = −∂E/∂v_j = −(∂E/∂y_j)(∂y_j/∂v_j)

Observe that ∂y_j/∂v_j = φ'(v_j) and

∂E/∂y_j = ∑_{k in next layer} (∂E/∂v_k)(∂v_k/∂y_j) = −∑_{k in next layer} δ_k w_kj

Then

δ_j = (∑_{k in next layer} δ_k w_kj) · φ'(v_j)
FNN
Summary: Delta Rule
• Delta rule: ∆w_ji = η δ_j y_i, where
– δ_j = e_j φ'(v_j) if j is an output neuron,
– δ_j = φ'(v_j) ∑_{k in next layer} δ_k w_kj if j is a hidden neuron,
and, for the logistic sigmoid with slope a, φ'(v_j) = a y_j (1 − y_j).
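The following sketch puts the previous formulas together for a network with one hidden layer and logistic sigmoid units with slope a, trained in incremental mode. The architecture (2-2-1), the data encoding, the learning rate and the number of epochs are illustrative assumptions, not values prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)
a, eta = 1.0, 0.5                            # sigmoid slope and learning rate (assumed)

def phi(v):                                  # logistic sigmoid with slope a
    return 1.0 / (1.0 + np.exp(-a * v))

# XOR encoded with 0/1 targets so they lie in the sigmoid's output range
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
D = np.array([[0.], [1.], [1.], [0.]])

n_in, n_hid, n_out = 2, 2, 1
W1 = rng.uniform(-0.5, 0.5, (n_hid, n_in + 1))    # hidden weights (last column = bias)
W2 = rng.uniform(-0.5, 0.5, (n_out, n_hid + 1))   # output weights (last column = bias)

for epoch in range(10000):
    for x, d in zip(X, D):
        # Forward pass
        y0 = np.append(x, 1.0)               # inputs plus bias input
        y1 = np.append(phi(W1 @ y0), 1.0)    # hidden outputs plus bias input
        y2 = phi(W2 @ y1)                    # network outputs
        # Backward pass
        e = d - y2                           # e_j = d_j - y_j
        delta2 = e * a * y2 * (1 - y2)       # output: delta_j = e_j * phi'(v_j)
        # hidden: delta_j = phi'(v_j) * sum_k delta_k * w_kj (bias column excluded)
        delta1 = a * y1[:-1] * (1 - y1[:-1]) * (W2[:, :-1].T @ delta2)
        # Delta rule: w_ji = w_ji + eta * delta_j * y_i
        W2 += eta * np.outer(delta2, y1)
        W1 += eta * np.outer(delta1, y0)

for x in X:
    y1 = np.append(phi(W1 @ np.append(x, 1.0)), 1.0)
    print(x, phi(W2 @ y1))                   # typically close to 0, 1, 1, 0
```

With only two hidden units, training XOR can occasionally get stuck in a local minimum; re-running with a different random initialization (or adding the momentum term of the next slide) usually fixes this.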
FNN
Generalized delta rule
• If η is small then the algorithm learns the weights very slowly, while if η is large then the large changes of the weights may cause unstable behavior, with oscillations of the weight values.
• A technique for tackling this problem is the introduction of a momentum term in the delta rule, which takes into account previous updates. We obtain the following generalized delta rule:

∆w_ji(n) = α ∆w_ji(n−1) + η δ_j(n) y_i(n),   0 ≤ α < 1

where α is the momentum constant.
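A minimal sketch of how the momentum term enters the update. The numeric values and variable names are illustrative:

```python
import numpy as np

eta, alpha = 0.1, 0.9                   # learning rate and momentum constant (assumed)
w = np.zeros(3)                         # weights w_ji of one neuron
dw_prev = np.zeros_like(w)              # previous update Delta w_ji(n-1)

def momentum_update(delta_j, y, w, dw_prev):
    # Generalized delta rule: Delta w(n) = alpha*Delta w(n-1) + eta*delta_j*y
    dw = alpha * dw_prev + eta * delta_j * y
    return w + dw, dw

w, dw_prev = momentum_update(0.2, np.array([1.0, 0.5, -1.0]), w, dw_prev)
```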
FNN
Other techniques: η adaptation
FNN
Batch and incremental training
• In batch mode the weights are updated only once per epoch, after all the training examples have been presented, according to the formula

w_ji = w_ji + ∑_{x training example} ∆w_ji(x)
• The learning process continues on an epoch-
by-epoch basis until the stopping condition is
satisfied.
• In incremental mode, choose a randomized ordering for selecting the examples of the training set at each epoch, in order to avoid poor performance (see the sketch below).
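A sketch of one epoch in the two modes. Here delta_w is a hypothetical stand-in for the per-example weight change η δ_j y_i of the previous slides; the data and step size are illustrative:

```python
import numpy as np

rng = np.random.default_rng()
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # toy training set
w = np.zeros(2)

def delta_w(w, x):
    # Stand-in for the per-example weight change computed by Backprop
    return 0.1 * (x - w)

for epoch in range(5):
    # Incremental mode: random presentation order, weights updated after each example
    for n in rng.permutation(len(X)):
        w = w + delta_w(w, X[n])
    # Batch mode would instead accumulate the changes and apply them once per epoch:
    # w = w + sum(delta_w(w, x) for x in X)
```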
FNN
Stopping criteria
• Sensible stopping criteria:
– total mean squared error change:
Backprop is considered to have converged when the absolute rate of change of the average squared error per epoch is sufficiently small (typically in the range [0.01, 0.1]); a check of this kind is sketched after this list.
– generalization based criterion:
After each epoch the NN is tested for generalization. If the generalization performance is adequate, then stop. If this stopping criterion is used, then the part of the training set used for testing the network's generalization must not be used for updating the weights.
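A minimal sketch of the first criterion, assuming an illustrative threshold and error history:

```python
def converged(error_history, threshold=0.01):
    # Stop when the absolute change of the average squared error
    # between two consecutive epochs is sufficiently small.
    if len(error_history) < 2:
        return False
    return abs(error_history[-1] - error_history[-2]) < threshold

print(converged([0.90, 0.40, 0.15, 0.145]))  # True: last change is 0.005 < 0.01
```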
FNN
NN DESIGN
The following features are very important for
NN design:
• Data representation
• Network Topology
• Network Parameters
• Training
• Validation
FNN
Data Representation
• Data representation depends on the problem;
generally NNs work on continuous (real valued)
attributes.
• Attributes of different types may have different
ranges of values; this can affect the training
process. Normalization may be used, so that each
attribute assumes values between 0 and 1.
x_i = (x_i − min_i) / (max_i − min_i)
where min i and max i represent the range of that
attribute over the training set.
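A sketch of this min-max normalization, assuming the training set is stored with one attribute per column:

```python
import numpy as np

X_train = np.array([[2.0, 100.0],
                    [4.0, 250.0],
                    [6.0, 400.0]])           # illustrative training set

mins = X_train.min(axis=0)                   # min_i over the training set
maxs = X_train.max(axis=0)                   # max_i over the training set
X_norm = (X_train - mins) / (maxs - mins)    # each attribute now lies in [0, 1]

# New data must be scaled with the same min_i and max_i computed on the training set
x_new = np.array([5.0, 175.0])
x_new_norm = (x_new - mins) / (maxs - mins)
```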
FNN
Network Topology
• The number of layers and of neurons
depend on the specific task. In practice this
issue is solved by trial and error.
• Two types of adaptive algorithms can be
used:
– start from a large network and successively
remove some neurons and links until network
performance degrades (pruning).
– begin with a small network and introduce new
neurons until performance is satisfactory.
FNN
Network parameters
FNN
Weights and learning rate
• In general, initial weights are randomly
chosen, with typical values between -1.0
and 1.0 or -0.5 and 0.5.
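A minimal sketch of this initialization for one weight matrix; the layer sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng()
n_inputs, n_hidden = 4, 3

# Initial weights drawn uniformly from a small symmetric interval, e.g. [-0.5, 0.5]
W = rng.uniform(-0.5, 0.5, size=(n_hidden, n_inputs + 1))   # +1 column for the bias
```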
FNN
Training
• Rule of thumb:
– the number of training examples should be at
least four to ten times the number of weights of
the network.
• Other rule:
FNN
Expressive power
Boolean functions:
• Every boolean function can be represented
by a network with a single hidden layer
Continuous functions:
• Every bounded piece-wise continuous
function can be approximated with arbitrarily
small error by a network with one hidden
layer.
• Any continuous function can be
approximated to arbitrary accuracy by a
network with two hidden layers.
i.e.   |F(x_1, …, x_m0) − f(x_1, …, x_m0)| < ε
Rossella Cancelliere 13
NN 4 11-00
F(x_1, …, x_m0) represents the output of an MLP with:
FNN
Approximation by FNN - comments