
Artificial Neural Network

For BCSE Final Year


Jadavpur University
Trainable Pattern Classifiers: The Deterministic Approach
Here we study classifiers whose decision functions are generated from training patterns by means of iterative "learning" algorithms. We know that once a type of decision function has been specified, the problem reduces to determining its coefficients. Some algorithms will be discussed that are capable of learning the solution coefficients from the training sets whenever those training pattern sets are separable by the specified decision functions.
Contd…
Frank Rosenblatt (1962) introduced the most primitive type of trainable pattern classifier in the form of an Artificial Neural Network (ANN), known as the Single-Layer Perceptron (SLP).
What is a Neural Network ?
A prototype nerve cell, called a neurone, is shown below. Electrical impulses propagating along the axon (axon potentials) activate the synaptic junctions. These, in turn, produce further excitations (post-synaptic potentials) which travel along the dendrites towards the next neurone.
Contd…
Figure 1. A prototype neurone
A Neural Processing Element
Figure 2. A neural processing element
Another typical model of
neurone
Figure 3. Another typical model of a neurone
Contd…
The firing rate of each neurone is controlled by the region where the axon joins the cell body, called the hillock zone. When the membrane potential at the hillock zone rises above a certain threshold value, around -60 mV, it causes a travelling wave of charge to propagate. The neurone must restore itself to its proper resting state of balance before sending out the next packet of charge; this recovery time is called the refractory period.
Contd…
So information is passed via synapses. The synapses are termed excitatory or inhibitory depending on whether the post-synaptic potentials increase or reduce the hillock potential, respectively enhancing or reducing the likelihood of triggering an impulse there.
Contd…
The current level of understanding of brain function is so primitive that not even one area of the brain is yet completely understood. Thus an artificial neural network only tries to mimic the biological neural network in a very crude and primitive manner.
What is an ANN?
An artificial neural network is a parallel, distributed information processing structure in the form of a directed graph, with the sub-definitions and restrictions given in the next slides.
Contd…
1. The nodes of the graph are called
processing elements.
2. The links of the graph are called connections. Each connection functions as an instantaneous, unidirectional signal-conduction path.
3. Each processing element can receive any
number of incoming connections (also
called input connections).
Contd…
4. Each processing element can have any
number of outgoing connections, but the
signals in all of these must be the same.
5. Processing elements can have local
memory.
6. Each processing element possesses a
transfer function, which can use (and
alter) local memory, can use input
signals, and which produces the
processing element's output signal.
Contd…
Transfer functions can operate continuously or episodically. If they operate episodically, there must be an input called "activate" that causes the processing element's transfer function to operate on the current input signals and local memory values, and to produce an updated output signal (and possibly to modify local memory values). Continuous processing elements are always operating. The "activate" input arrives via a connection from a scheduling processing element that is part of the network.
Contd…
7. Input signals to a neural network from outside the network arrive via connections that originate in the outside world. Outputs from the network to the outside world are connections that leave the network.
Contd…
The input signals x1, x2, …, xn arriving at the processing element are supplied to the transfer function, as is the "activate" input. The transfer function of an episodically updated processing element, when activated, uses the current values of the input signals, as well as values in local memory, to produce the processing element's new output signal value y.
Contd…
Note that the strength of each synaptic
junction is represented by a multiplicative
factor, or weight with a positive sign for
excitatory connections, and negative
otherwise. The hillock zone is modeled by a
summation of the signals received from
every link. The firing rate of the neurone in
response to the aggregate signal is then
described by a mathematical function,
whose value represents the frequency of
emission of electrical impulses along the
axon.
Contd…
The general artificial neuron model has five components, shown in the following list. (The subscript i indicates the i-th input or weight.)
1. A set of inputs, Xi
2. A set of weights, Wi
3. A bias, Θ
4. An activation function, f
5. The neuron output, Y
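As a minimal sketch (not part of the original slides), these five components can be combined in a few lines of Python; the sigmoid activation and the example values of x, w and Θ are illustrative assumptions only.

```python
import numpy as np

def neuron_output(x, w, theta, f):
    """Generic artificial neuron: Y = f(sum_i W_i * X_i - theta)."""
    return f(np.dot(w, x) - theta)

# Illustrative choices only: a sigmoid activation and arbitrary example values.
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

x = np.array([0.5, -1.0, 2.0])   # inputs X_i
w = np.array([0.8,  0.2, 0.1])   # weights W_i
theta = 0.3                      # bias (threshold) Θ
print(neuron_output(x, w, theta, sigmoid))   # neuron output Y
```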
Table 1: A comparison of neural networks and conventional computers.

Neural network             Conventional computers
Many simple processors     Few complex processors
Few processing steps       Many computational steps
Distributed processing     Symbolic processing
Trained by example         Explicit programming
Contd…
Algorithm                  Type              Function
Hopfield                   recursive         optimization
Multi-layered perceptron   feedforward       classification
Kohonen                    self-organizing   data coding
Temporal differences       predictive        forecasting
A taxonomy of six neural nets that can be used as classifiers.

NEURAL NET CLASSIFIERS
  Binary input
    Supervised:   Hopfield net, Hamming net
    Unsupervised: Carpenter/Grossberg classifier
  Continuous-valued input
    Supervised:   Perceptron, Multi-layer perceptron
    Unsupervised: Kohonen self-organizing feature map

The classical algorithms shown beneath these nets are the optimum classifier, the leader clustering algorithm, the Gaussian classifier, the k-nearest neighbour and mixture classifiers, and the k-means clustering algorithm.
Single Layer Perceptron
The single-layer perceptron consists of only one node and can be used with both continuous-valued and binary inputs. A perceptron that decides whether an input belongs to one of two classes (denoted A and B) is shown below. The single node computes a weighted sum of the input elements, subtracts a threshold (Θ) and passes the result through a hard-limiting nonlinearity such that the output y is either +1 or -1, for class A and class B respectively.
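A minimal sketch of this decision rule, assuming NumPy and a hard limiter that returns +1 for a positive net sum and -1 otherwise (how a net sum of exactly zero is handled is not specified in the slides):

```python
import numpy as np

def slp_decide(x, w, theta):
    """Single-layer perceptron: +1 -> class A, -1 -> class B."""
    s = np.dot(w, x) - theta        # weighted sum minus the threshold
    return 1 if s > 0 else -1       # hard-limiting nonlinearity

# Illustrative weights and threshold (not taken from the slides).
w = np.array([1.0, -0.5])
theta = 0.2
print(slp_decide(np.array([0.7, 0.3]), w, theta))   # +1, i.e. class A
```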
Contd…
Figure: a single-layer perceptron for the two-class decision.
Contd…
The perceptron forms two decision regions
separated by a hyperplane which in 2-D is a
line. As can be seen, the equation of the
boundary line depends on the connection
weights and the threshold.
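For example, in two dimensions the boundary is the set of points where the weighted sum equals the threshold,
$$ w_1 x_1 + w_2 x_2 - \Theta = 0 \quad\Longrightarrow\quad x_2 = -\frac{w_1}{w_2}\, x_1 + \frac{\Theta}{w_2} \qquad (w_2 \ne 0), $$
a straight line whose slope and intercept are determined by the connection weights and the threshold.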
Contd…
Connection weights and the threshold in a perceptron can be fixed or adapted using a number of different algorithms. The original perceptron convergence procedure for adjusting weights was developed by Rosenblatt. It is described in the next slides.
Contd…
First, connection weights and the threshold value are initialized to small random non-zero values. Then a new input with N continuous-valued elements is applied and the output is computed as in Fig. 12. Connection weights are adapted only when an error occurs, using the formula in Step 4. This formula includes a gain term that ranges from 0.0 to 1.0 and controls the adaptation rate.
The Perceptron
Convergence Procedure
 Step 1. Initialize Weights and Threshold
Set Wi(0), 0 ≤ i ≤ N-1, and Θ to small random values. Here Wi(t) is the weight from input i at time t and Θ is the threshold in the output node.
Contd…
 Step 2. Present New Input and Desired Output
Present the new continuous-valued input X0, X1, …, XN-1 along with the desired output d(t).
Contd…
 Step 3. Calculate Actual Output
$$ y(t) = f\left( \sum_{i=0}^{N-1} W_i(t)\, X_i(t) \;-\; \Theta \right) $$
Contd…
 Step 4. Adapt Weights
$$ W_i(t+1) = W_i(t) + \eta\,[\,d(t) - y(t)\,]\,X_i(t), \qquad 0 \le i \le N-1 $$
$$ d(t) = \begin{cases} +1 & \text{if input is from class A} \\ -1 & \text{if input is from class B} \end{cases} $$
In these equations η is a positive gain fraction less than 1 and d(t) is the desired correct output for the current input. Note that the weights are unchanged if the correct decision is made by the net.
Contd…
 Step 5. Repeat by Going to Step 2.
Note that the gain term η must be adjusted to satisfy the conflicting requirements of fast adaptation for real changes in the input distributions and averaging of past inputs to provide stable weight estimates.
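A minimal Python sketch of Steps 1-5, assuming NumPy, a ±1 hard-limiting output, and a fixed gain η; treating the threshold Θ as a trainable bias in Step 4 is an added assumption, not stated on the slide:

```python
import numpy as np

def perceptron_train(X, d, eta=0.5, max_epochs=100, seed=0):
    """Perceptron convergence procedure (Steps 1-5).

    X   : (num_patterns, N) array of continuous-valued inputs
    d   : desired outputs, +1 for class A and -1 for class B
    eta : gain term, 0.0 < eta <= 1.0
    """
    rng = np.random.default_rng(seed)
    w = rng.uniform(-0.05, 0.05, size=X.shape[1])       # Step 1: small random weights
    theta = rng.uniform(-0.05, 0.05)                    # ... and threshold
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(X, d):                     # Step 2: present new input
            y = 1 if np.dot(w, x) - theta > 0 else -1   # Step 3: actual output
            if y != target:                             # Step 4: adapt only on error
                w = w + eta * (target - y) * x
                theta = theta - eta * (target - y)      # assumption: Θ adapted as a bias
                errors += 1
        if errors == 0:                                 # Step 5: repeat until error-free
            break
    return w, theta
```

For linearly separable training sets this loop stops after a complete error-free pass, in line with Rosenblatt's convergence result cited below.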
Contd…
Rosenblatt proved that if the inputs presented from the two classes are separable (that is, they lie on opposite sides of some hyperplane), then the perceptron convergence procedure converges and positions the decision hyperplane between those two classes.
Contd…
One problem with the perceptron
convergence procedure is that decision
boundaries may oscillate continuously when
inputs are not separable and distributions
overlap.
Multi-Layer Perceptron
Multi-layer perceptrons are feed-forward nets with one or more layers of nodes (called hidden layers) between the input and output nodes. A three-layer perceptron with two layers of hidden units is shown below. Multi-layer perceptrons overcome many of the limitations of the single-layer perceptron, and have been shown to be successful for many problems of interest.
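As an illustrative sketch (not from the slides), the forward pass of such a feed-forward net, here with two hidden layers of hard-limiting units and randomly chosen example weights, can be written as:

```python
import numpy as np

def hard_limit(a):
    """Element-wise hard limiter: +1 for positive input, -1 otherwise."""
    return np.where(a > 0, 1, -1)

def mlp_forward(x, weights, thresholds):
    """Feed-forward pass; weights[k] maps the outputs of layer k to layer k+1."""
    a = x
    for W, theta in zip(weights, thresholds):
        a = hard_limit(W @ a - theta)
    return a

# Example: 2 inputs, hidden layers of 3 and 2 units, one output unit.
rng = np.random.default_rng(1)
weights = [rng.standard_normal((3, 2)),
           rng.standard_normal((2, 3)),
           rng.standard_normal((1, 2))]
thresholds = [np.zeros(3), np.zeros(2), np.zeros(1)]
print(mlp_forward(np.array([0.2, -0.7]), weights, thresholds))
```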
An MLP with one hidden layer
Contd…
The capabilities of perceptrons with one, two,
and three layers that use hard-limiting
nonlinearities are illustrated in the above
Figure. The second column in this figure
indicates the types of decision regions that
can be formed with different nets. The next
two columns present examples of decision
regions which could be formed for the
exclusive OR problem and a problem with
meshed regions. The rightmost column gives
examples of the most general decision
regions that can be formed.
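To make the exclusive-OR example concrete, here is a small sketch of a perceptron with one hidden layer of hard-limiting units that realises XOR; the weights and thresholds are hand-chosen for illustration and are not given in the slides (0/1 outputs are used for readability instead of ±1):

```python
def step(a):
    """Hard limiter with 0/1 outputs."""
    return 1 if a > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # fires when at least one input is on
    h2 = step(x1 + x2 - 1.5)    # fires only when both inputs are on
    return step(h1 - h2 - 0.5)  # fires when h1 is on but h2 is off

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))  # prints 0, 1, 1, 0: the XOR truth table
```

A single hard-limiting unit cannot form this decision region, since the two classes are not linearly separable; the hidden layer carves out the region between two parallel lines.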
Pattern classification using
MLP
Contd…
 The decision boundary provided by the SLP is given by
$$ y(t) = f\left( \sum_{i=0}^{N-1} W_i(t)\, X_i(t) \;-\; \Theta \right) $$
 Letting Θ = 0, we require
$$ y(t) > 0 \ \text{if the input is from class } \omega_1, \qquad y(t) < 0 \ \text{if the input is from class } \omega_2. $$
Contd…
 In other words, we want to find a solution weight vector W with the property that W'X > 0 for all patterns of ω1 and W'X < 0 for all patterns of ω2. …(1)
 If the patterns of ω2 are multiplied by -1, we obtain the equivalent condition W'X > 0 for all patterns.
Contd…
 Letting N represent the total number of augmented sample patterns in both classes, we may express the problem as one of finding a vector W such that the system of inequalities
$$ W'X > 0 \qquad \ldots(2) $$
is satisfied, where
$$ X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{pmatrix} $$
Contd…
W = (w1, w2, …, wn, wn+1)' and 0 is the zero vector.
 If there exists a W which satisfies expression (2), the inequalities are said to be consistent; otherwise, they are inconsistent.
Contd…
 Following the condition given in (1), the Perceptron Algorithm is given by:
If X(t) ∈ ω1 and W'(t)X(t) > 0, let W(t+1) = W(t).
 Otherwise, replace W(t) by W(t+1) = W(t) + η X(t), where η is the correction factor.
Contd…
If X(t) ∈ ω2 and W'(t)X(t) < 0, let W(t+1) = W(t).
Otherwise, replace W(t) by W(t+1) = W(t) - η X(t).
 Here the amount of weight correction is not proportional to the amount of error, but is a constant fraction of the input pattern being misclassified.
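A compact sketch of one step of this fixed-increment rule (the function name and the use of NumPy are assumptions for illustration):

```python
import numpy as np

def fixed_increment_update(w, x, cls, eta=1.0):
    """One presentation of an augmented pattern x belonging to class cls (1 or 2).

    Class 1 requires w'x > 0 and class 2 requires w'x < 0; on a misclassification
    the weights move by a constant fraction eta of the input pattern.
    """
    score = np.dot(w, x)
    if cls == 1 and score <= 0:
        return w + eta * x
    if cls == 2 and score >= 0:
        return w - eta * x
    return w                      # correct decision: weights unchanged
```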
Problem..
 Apply the perceptron algorithm to the following augmented patterns to find a solution weight vector for a two-class problem.
 The classes are ω1: {(0,0,1)', (0,1,1)'} and ω2: {(1,0,1)', (1,1,1)'}.
Letting η = 1 and W(1) = 0, and presenting the patterns in the above order, results in the following sequence of steps:
Contd…
- 0 0
   
W (1) X (1)  (0,0,0) 0   0, W (2)  W (1)  X (1)   0 
1  1 
   
0 0
   
W (2) X (2)  (0,0,1)1   1, W (3)  W (2)   0 
1  1 
   
1    1
   
W (3) X (3)  (0,0,1) 0   1, W (4)  W (3)  X (3)   0 
1   0
   
Contd…
1   1
   
W (4) X (4)  (1,0,0)1  1, W (5)  W (4)   0 
1  0
   
where corrections on the weight vector were
made in the first and third steps because of
misclassification, Since a solution has been
obtained only when the algorithm yields a
complete, error-free iteration through all
patterns, the training set must be presented
again.
Contd…
The machine learning process is continued by letting X(5) = X(1), X(6) = X(2), X(7) = X(3) and X(8) = X(4). The second iteration through the patterns yields:
$$ W'(5)X(5) = 0, \qquad W(6) = W(5) + X(5) = (-1,0,1)' $$
Contd…
$$ W'(6)X(6) = 1, \qquad W(7) = W(6) = (-1,0,1)' $$
$$ W'(7)X(7) = 0, \qquad W(8) = W(7) - X(7) = (-2,0,0)' $$
$$ W'(8)X(8) = -2, \qquad W(9) = W(8) = (-2,0,0)' $$
Contd…
Since two errors occurred in this iteration, the patterns are presented again:
$$ W'(9)X(9) = 0, \qquad W(10) = W(9) + X(9) = (-2,0,1)' $$
$$ W'(10)X(10) = 1, \qquad W(11) = W(10) = (-2,0,1)' $$
Contd…
$$ W'(11)X(11) = -1, \qquad W(12) = W(11) = (-2,0,1)' $$
$$ W'(12)X(12) = -1, \qquad W(13) = W(12) = (-2,0,1)' $$
Contd…
It is easily verified that in the next iteration all patterns are classified correctly. The solution vector is, therefore, W = (-2,0,1)'. The corresponding decision function is d(X) = -2x1 + 1 which, when set equal to zero, becomes the equation of the decision boundary shown in the Figure below.
Figure: (a) Patterns belonging to two classes. (b) Decision boundary determined by training.
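The hand computation above can be checked with a short script (a sketch assuming η = 1 and cyclic presentation of the four augmented patterns, as in the example):

```python
import numpy as np

patterns = [np.array(p, dtype=float) for p in
            [(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)]]
classes = [1, 1, 2, 2]            # omega_1, omega_1, omega_2, omega_2
w = np.zeros(3)                   # W(1) = 0
eta = 1.0

while True:
    errors = 0
    for x, cls in zip(patterns, classes):
        score = np.dot(w, x)
        if cls == 1 and score <= 0:      # pattern of omega_1 misclassified
            w += eta * x
            errors += 1
        elif cls == 2 and score >= 0:    # pattern of omega_2 misclassified
            w -= eta * x
            errors += 1
    if errors == 0:                      # stop after a complete error-free pass
        break

print(w)   # [-2.  0.  1.], i.e. the solution W = (-2, 0, 1)'
```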
Contd…
According to Eq. (2), we may express the perceptron algorithm in an equivalent form by multiplying the augmented patterns of one class by -1. Thus, arbitrarily multiplying the patterns of ω2 by -1, we can write the perceptron algorithm as
Contd…

$$ W(t+1) = \begin{cases} W(t) & \text{if } W'(t)X(t) > 0 \\[4pt] W(t) + \eta X(t) & \text{if } W'(t)X(t) \le 0 \end{cases} $$
where η is a positive correction increment.
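As a quick check (not part of the original slides), multiplying the ω2 patterns of the earlier example by -1 gives the normalized set (0,0,1)', (0,1,1)', (-1,0,-1)', (-1,-1,-1)', and the solution W = (-2,0,1)' indeed satisfies W'X > 0 for every normalized pattern:
$$ (-2,0,1)(0,0,1)' = 1, \quad (-2,0,1)(0,1,1)' = 1, \quad (-2,0,1)(-1,0,-1)' = 1, \quad (-2,0,1)(-1,-1,-1)' = 1. $$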
THANK YOU
