
Neural Networks and Deep Learning

UNIT – 3
Feedforward Neural Networks

Dr. D. SUDHEER
Assistant Professor
Computer Science and Engineering
VNRVJIET
Introduction
• By a suitable choice of architecture for a feedforward network, it is possible to perform several pattern recognition tasks.
• Analysis of the linear associative network shows that the network is limited in its capabilities: it places a constraint on the number of input patterns that can be associated correctly.
• This constraint on the number of input patterns is overcome by using a two-layer feedforward network with nonlinear processing units in the output layer.
• This modification automatically leads to the consideration of pattern classification problems.
• Classification problems which are not linearly separable are called hard problems.
• In order to overcome the constraint of linear separability in pattern classification problems, a multilayer feedforward network with nonlinear processing units in all the intermediate (hidden) layers and in the output layer is proposed.
• Such a multilayer feedforward architecture can represent the solution of hard problems in a network.
• However, it introduces the problem of hard learning, i.e., the difficulty of adjusting the weights of the network to capture the functional relationship implied by the given input-output pattern pairs.
• The hard learning problem is solved by using the backpropagation learning algorithm.
Analysis of Pattern Association Networks

• The objective in pattern association is to design a network that can represent the association in the pairs of vectors (a_l, b_l), l = 1, 2, ..., L, through a set of weights to be determined by a learning law.
• The given set of input-output pattern pairs is called the training data.


a. Linear Associative Network

• Each output unit receives inputs from the M input units corresponding to the M-dimensional input vectors.
• Due to the linearity of the output function, the activation values (x_i) and the signal values of the units in the input layer are the same as the input data values a_li.
• The activation value of the jth output unit is

  y_j = Σ_{i=1}^{M} w_ji x_i = Σ_{i=1}^{M} w_ji a_li,   j = 1, 2, ..., N

• The weights are determined by using the criterion that the total mean squared error between the desired output and the actual output is to be minimized.
b. Determination of weights by computation
• For the linear associative network, the error in the output is given by the distance between the desired output vector and the actual output vector. Summed over all L pattern pairs, the total error is

  E(W) = Σ_{l=1}^{L} || b_l - W a_l ||^2

which is to be minimized with respect to the weight matrix W.
• The minimizing weight matrix is given by the pseudoinverse solution W = B A+, where A = [a_1 a_2 ... a_L] is the M x L matrix of input vectors and B = [b_1 b_2 ... b_L] is the N x L matrix of desired output vectors.


• The following singular value decomposition (SVD) of the M x L matrix A is used to compute the pseudoinverse and to evaluate the minimum error:

  A = Σ_{i=1}^{r} λ_i p_i q_i^T,   so that   A+ = Σ_{i=1}^{r} (1/λ_i) q_i p_i^T

where r is the rank of A, the λ_i are the nonzero singular values, and p_i, q_i are the corresponding left and right singular vectors.
• The minimum error is then E_min = || B - B A+ A ||^2, the component of B lying outside the space spanned by the input vectors; it is zero when the input vectors a_l are linearly independent.
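As a rough illustration of this computation, the sketch below (toy dimensions and randomly generated pattern pairs are assumed, not values from the text) forms the pseudoinverse A+ from the SVD of A and computes W = B A+; the residual corresponds to the minimum error discussed above.

```python
import numpy as np

# Minimal sketch (assumed toy data): pseudoinverse-based weights W = B A+ via SVD.
M, N, L = 4, 2, 3
rng = np.random.default_rng(0)
A = rng.standard_normal((M, L))   # columns are the input vectors a_l
B = rng.standard_normal((N, L))   # columns are the desired output vectors b_l

# SVD of A; the reciprocal nonzero singular values give the pseudoinverse A+.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T   # equivalent to np.linalg.pinv(A) for full-rank A

W = B @ A_pinv                           # minimum mean squared error weight matrix

# Residual (minimum) error; zero when the a_l are linearly independent.
print(np.linalg.norm(B - W @ A) ** 2)
```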


c. Determination of weights by learning
• It is desirable to determine the weights of a network in an incremental manner.
• Each update of the weights with a new input data pair can be interpreted as network learning.
• Computationally also, learning is preferable because it does not require information about all the training set data at the same time.
• It is also preferable to have learning confined to a local operation.
• Two learning laws and their variations, as applicable to a linear associative network, are discussed:
  1. Hebb's law
  2. Widrow's law
• Let the input pattern vector a_l and the corresponding desired output pattern vector b_l be applied to the linear associative network.
• According to Hebb's law, the update of the weight of a connection depends only on the activations of the processing units it connects:

  Δw_ji = x_i y_j = a_li b_lj,   i.e., in matrix form   ΔW = b_l a_l^T


• Note that the computation of the increment x_i y_j = a_li b_lj is purely local to the processing unit and the input-output pattern pair.
• The updated weight matrix for the application of the lth pair (a_l, b_l) is given by

  W(l) = W(l - 1) + b_l a_l^T

where W(l - 1) refers to the weight matrix after presentation of the first (l - 1) pattern pairs, and W(l) refers to the weight matrix after presentation of the first l pattern pairs; W has dimension N x M.
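A minimal sketch of this incremental Hebbian update, using assumed toy dimensions and random pattern pairs (not values from the text):

```python
import numpy as np

# Incremental Hebbian learning for a linear associative network (toy data assumed).
M, N, L = 4, 2, 3
rng = np.random.default_rng(0)
pairs = [(rng.standard_normal(M), rng.standard_normal(N)) for _ in range(L)]

W = np.zeros((N, M))                 # W(0) = 0
for a_l, b_l in pairs:
    W += np.outer(b_l, a_l)          # W(l) = W(l-1) + b_l a_l^T

# Recall: apply a stored input a_k and compare with the desired output b_k.
a_k, b_k = pairs[0]
print(W @ a_k, b_k)                  # equal only if the stored a_l are orthonormal
```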


• To verify whether the network has learnt the association of the given set of input-output pattern vector pairs, apply the input pattern a_k and determine the actual output vector b'_k:

  b'_k = W a_k = Σ_{l=1}^{L} b_l (a_l^T a_k)

• If the input vectors are orthonormal, then b'_k = b_k and the desired pattern is recalled exactly; otherwise the recalled output contains contributions (crosstalk) from the other stored pairs.


Widrow's law:
• A form of Widrow learning can be used to obtain W = B A+ recursively.
• Let W(l - 1) be the weight matrix after presentation of (l - 1) samples.
• Then W(l - 1) = B(l - 1) A+(l - 1), where the matrices B(l - 1) and A(l - 1) are composed of the first (l - 1) vectors b_k and the first (l - 1) vectors a_k.
• The updated matrix is given by

  W(l) = W(l - 1) + (b_l - W(l - 1) a_l) p_l^T

where the vector p_l has to be computed recursively from all the previously presented input vectors.


• By starting with zero initial values for all the weights, and successively adding the pairs (a_1, b_1), (a_2, b_2), ..., (a_L, b_L), we can obtain the final pseudoinverse-based weight matrix W = B A+.
• The problem with this approach is that the recursive procedure cannot be implemented locally, because of the need to compute p_l from all the previous inputs.
• The same eventual effect can be approximately realized using the following variation of the above learning law:

  ΔW(l) = η (b_l - W(l - 1) a_l) a_l^T


• where η is a small positive constant called the learning rate parameter.
• The learning law can be implemented locally, for each output unit j, as

  w_j(l) = w_j(l - 1) + η (b_lj - w_j^T(l - 1) a_l) a_l

where w_j(l - 1) is the weight vector associated with the jth processing unit in the output layer of the linear associative network at the (l - 1)th iteration.
• With a sufficiently small learning rate the weights converge close to the minimum-error (pseudoinverse) solution, although the convergence is slow.
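A sketch of this local (LMS-style) version of the learning law on assumed synthetic data; with a small learning rate the weights approach the error-minimizing solution.

```python
import numpy as np

# Local Widrow-style (LMS) learning on assumed synthetic, consistent data.
M, N, L = 4, 2, 200
rng = np.random.default_rng(1)
A = rng.standard_normal((M, L))          # input vectors a_l as columns
W_true = rng.standard_normal((N, M))
B = W_true @ A                           # consistent targets, so zero error is attainable

eta = 0.01                               # small learning rate parameter
W = np.zeros((N, M))
for epoch in range(50):
    for l in range(L):
        a_l, b_l = A[:, l], B[:, l]
        err = b_l - W @ a_l              # desired output minus actual output
        W += eta * np.outer(err, a_l)    # local update for every output unit j

print(np.linalg.norm(W - W_true))        # shrinks towards 0 for a small eta
```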
Summary of pattern association network

• The rank r of a matrix is the maximum number of linearly independent rows (or columns) of the matrix. For the pattern association network, the error is zero only when the input vectors a_l are linearly independent, i.e., when r = L.
Analysis of Pattern Classification Networks

• Each input pattern is associated with a distinct class label.
• The distinct output patterns can be viewed as distinct classes.
• There is no restriction on the number of input patterns that may be associated with a given output pattern (class).
• The output patterns are points in a discrete N-dimensional space.
• Sometimes the input patterns may be corrupted by external noise.
• Even a noisy or approximate input is mapped onto one of the distinct output patterns; this is called accretive behaviour.


Pattern classification with Perceptron
• The number of units in the input layer corresponds to the dimensionality of the input pattern vectors.
• Typically, if the weighted sum of the input values to the output unit exceeds the threshold, the output signal is labelled as 1, otherwise as 0.
• Suppose a subset of the input patterns belongs to one class (say class A1) and the remaining subset belongs to another class (say class A2); the perceptron has to learn weights (and a threshold) such that it outputs 1 for patterns of A1 and 0 for patterns of A2.
• Note that the dividing surface between the two classes is given by

  Σ_{i=1}^{M} w_i a_i - θ = 0,   i.e.,   w^T a = θ

• This equation represents a linear hyperplane in the M-dimensional space. The hyperplane becomes a point if M = 1, a straight line if M = 2, and a plane if M = 3.


• Suppose the subsets A1 and A2 of points in the M-dimensional space contain the sample patterns belonging to the classes A1 and A2, respectively.
• We need to determine whether a given pattern belongs to A1 or A2, i.e., to find a hyperplane that separates the two subsets.


Perceptron learning as gradient descent

• The perceptron learning law can be written as

  w(m + 1) = w(m) + η e(m) x(m)

where x(m) is the input vector at step m, e(m) = b(m) - s(m) is the error signal, b(m) is the desired output, and s(m) is the actual (thresholded) output.
• The product of the output error e(m) and the activation value w^T(m) x(m) can be used as an instantaneous measure of performance:

  E(m) = - e(m) w^T(m) x(m)

• Gradient descent on this measure then gives the weight update

  Δw(m) = - η ∂E(m)/∂w(m) = η e(m) x(m)

which is exactly the perceptron learning law stated above.
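A minimal sketch of perceptron training by this rule, on assumed linearly separable 2-D data (class A1 labelled 1, class A2 labelled 0); the data and hyperparameters are illustrative only.

```python
import numpy as np

# Perceptron learning as gradient descent on assumed separable 2-D data.
rng = np.random.default_rng(2)
X1 = rng.standard_normal((50, 2)) + [2, 2]    # class A1 samples
X2 = rng.standard_normal((50, 2)) + [-2, -2]  # class A2 samples
X = np.vstack([X1, X2])
b = np.hstack([np.ones(50), np.zeros(50)])    # desired outputs

X_aug = np.hstack([X, np.ones((100, 1))])     # append 1 to absorb the threshold
w = np.zeros(3)
eta = 0.1

for epoch in range(20):
    for x_m, b_m in zip(X_aug, b):
        s = 1.0 if w @ x_m > 0 else 0.0       # hard-limiting output unit
        e = b_m - s                           # error signal e(m)
        w += eta * e * x_m                    # w(m+1) = w(m) + eta * e(m) * x(m)

preds = (X_aug @ w > 0).astype(float)
print("training accuracy:", (preds == b).mean())
```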


[Figure (slides 27-28): worked forward-pass example. A small feedforward network with input units x0 and x1 (input values 5.1 and 3.5), hidden units x2-x5, and output unit x6 is evaluated for a target Y = [0, 1, 2]; the activation of each unit is computed layer by layer as a weighted sum of the previous layer's outputs.]
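The specific weights from the slide diagram are not recoverable here, but the following sketch (with placeholder weights, not the slide's values) shows how such a forward pass is computed layer by layer:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass through a small feedforward network.
# Assumed layer sizes: 2 inputs -> 2 hidden -> 2 hidden -> 1 output.
x = np.array([5.1, 3.5])                       # input pattern
W1 = np.array([[0.1, 0.5], [0.3, 0.2]])        # layer-1 weights (hypothetical)
W2 = np.array([[0.2, 0.6], [0.7, 0.3]])        # layer-2 weights (hypothetical)
W3 = np.array([[0.7, 0.1]])                    # output weights (hypothetical)

h1 = sigmoid(W1 @ x)        # activations of the first hidden layer
h2 = sigmoid(W2 @ h1)       # activations of the second hidden layer
y = W3 @ h2                 # linear output unit
print(h1, h2, y)
```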
Pattern representation problem



Multi class classification



Geometrical representation of hard problems

• A pattern classification problem can be viewed as determining the hypersurfaces separating the multidimensional patterns belonging to different classes.
• A two-layer network consisting of two input units and N output units can produce only N distinct straight lines in the two-dimensional pattern space; such straight-line boundaries alone cannot separate the classes of a hard (linearly inseparable) problem.


• A multilayer perceptron with nonlinear units, such as sigmoid units, produces smooth decision surfaces instead of piecewise-linear hyperplanes.


Analysis of pattern mapping networks

• If a function transforms a point in the M-dimensional input pattern space to a point in the N-dimensional output pattern space, then the problem of capturing the implied functional relationship from examples is called a mapping problem.
• The network accomplishing this task is called a mapping network.
• The pattern mapping problem is a more general case of the classification problem.
• The objective of pattern mapping is to capture the generalization implied in the given input-output pattern pairs.
• It can also be viewed as approximating a function from the given data.
• In terms of function approximation, for an input close to the ones used in training, the output of the network should be close to the corresponding desired values.


Pattern Mapping Network

• A multilayer feedforward neural network with at least two hidden layers, along with the input and output layers, can perform the pattern classification task.
• The same model can also perform the pattern mapping task.
• The number of hidden layers depends on the nature of the mapping problem.
• Except for the input layer, the units in the other layers must be nonlinear in order to produce the generalization.


Ref: B. Yegnanarayana, Artificial Neural Networks, Table 4.6, for the backpropagation algorithm.
• The hard learning problem is solved by using a differentiable nonlinear output function for each unit in the hidden and output layers.
• The corresponding learning law is based on propagating the error from the output layer back to the hidden layers for updating the weights.
• This is an error-correcting learning law, also called the generalized delta rule.
• It is based on the principle of gradient descent along the error surface.
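A compact sketch of the generalized delta rule for a network with one hidden layer, trained on an assumed XOR-like (linearly inseparable) problem; this is only an illustration under assumed data and layer sizes, not the full algorithm of Table 4.6.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Backpropagation (generalized delta rule) sketch: one hidden layer, sigmoid units,
# assumed XOR-like data (class depends on the sign of x1 * x2).
rng = np.random.default_rng(3)
X = rng.standard_normal((200, 2))
T = (X[:, 0] * X[:, 1] > 0).astype(float)[:, None]     # a hard (nonlinear) problem

W1, b1 = rng.standard_normal((2, 8)) * 0.5, np.zeros(8)
W2, b2 = rng.standard_normal((8, 1)) * 0.5, np.zeros(1)
eta = 1.0

for epoch in range(3000):
    H = sigmoid(X @ W1 + b1)             # hidden layer activations
    Y = sigmoid(H @ W2 + b2)             # output layer activations
    dY = (Y - T) * Y * (1 - Y)           # output delta: error times f'(activation)
    dH = (dY @ W2.T) * H * (1 - H)       # delta propagated back to the hidden layer
    W2 -= eta * H.T @ dY / len(X); b2 -= eta * dY.mean(axis=0)
    W1 -= eta * X.T @ dH / len(X); b1 -= eta * dH.mean(axis=0)

print("misclassification rate:", np.mean((Y > 0.5) != T))  # should fall during training
```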


• In this so-called backpropagation network, the objective is to capture (in the weights) the complex nonlinear hypersurfaces separating the classes.
• The complexity of the surface that can be represented is determined by the number of hidden units in the network.
• In a classification problem, the input patterns belonging to a class are expected to have some common features which are different from those of patterns belonging to another class.
• For a classification problem, the trained neural network is expected to perform some kind of generalization, which is possible only if there are some features common among the input patterns belonging to each class.
• These common features are captured by the network during training.

