
Single Layer Neural Networks

Dr. N. YADAIAH
Professor of EEE
Department of Electrical & Electronics Engineering
JNTUH College of Engineering, Hyderabad
[email protected]


Perceptrons
• The Perceptron is one of the earliest models of the artificial neuron.
• It was proposed by Rosenblatt in 1958.
• It is a single-layer neural network whose weights can be trained to produce the correct target vector when presented with the corresponding input vector.
• The training technique used is called the Perceptron learning rule.
• The Perceptron generated great interest due to its ability to generalize from its training vectors and to work with randomly distributed connections.
• Perceptrons are especially suited for problems in pattern classification.


Perceptrons

Linear separability
• A set of two-dimensional patterns (x1, x2) of two classes is linearly separable if there exists a line in the (x1, x2) plane,
  w0 + w1 x1 + w2 x2 = 0,
  that separates all patterns of one class from those of the other class.
• Such a perceptron can be built with three inputs x0 = 1, x1, x2 and weights w0, w1, w2.
• For n-dimensional patterns (x1, . . . , xn), the hyperplane
  w0 + w1 x1 + w2 x2 + . . . + wn xn = 0
  divides the space into two regions.
• Can we get the weights from a set of sample patterns?
  If the problem is linearly separable, then YES (by perceptron learning).

LINEAR SEPARABILITY

Definition: Two sets of points A and B in an n-dimensional space are called linearly separable if n+1 real numbers w1, w2, . . . , wn+1 exist such that every point (x1, x2, . . . , xn) ∈ A satisfies w1x1 + w2x2 + . . . + wnxn ≥ wn+1 and every point (x1, x2, . . . , xn) ∈ B satisfies w1x1 + w2x2 + . . . + wnxn < wn+1.

Absolute Linear Separability
Two sets of points A and B in an n-dimensional space are called absolutely linearly separable if n+1 real numbers w1, w2, . . . , wn+1 exist such that every point (x1, x2, . . . , xn) ∈ A satisfies w1x1 + w2x2 + . . . + wnxn > wn+1 and every point (x1, x2, . . . , xn) ∈ B satisfies w1x1 + w2x2 + . . . + wnxn < wn+1.

• Two finite sets of points A and B in n-dimensional space that are linearly separable are also absolutely linearly separable.
• In general, absolutely linearly separable ⇒ linearly separable; if the sets are finite, linearly separable ⇒ absolutely linearly separable.
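As a small illustration of the definition above, the sketch below checks whether a given weight vector (w1, . . . , wn, wn+1) separates two finite point sets A and B. The function name, data layout, and the bipolar AND example are assumptions made for this illustration, not part of the original notes.

# Minimal sketch of the linear-separability definition above.
# A and B are lists of n-dimensional points; w holds (w1, ..., wn, w_{n+1}).
def separates(w, A, B, absolute=False):
    """True if w1*x1 + ... + wn*xn >= w_{n+1} for every point of A and
    < w_{n+1} for every point of B (strict '>' for A when absolute=True)."""
    *weights, threshold = w
    def s(x):
        return sum(wi * xi for wi, xi in zip(weights, x))
    ok_A = all((s(x) > threshold) if absolute else (s(x) >= threshold) for x in A)
    ok_B = all(s(x) < threshold for x in B)
    return ok_A and ok_B

# Example: bipolar AND, class A = {(1, 1)}, class B = the other three corners.
A = [(1, 1)]
B = [(-1, -1), (-1, 1), (1, -1)]
print(separates((1, 1, 1), A, B))   # True: x1 + x2 >= 1 holds only for (1, 1)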


Examples of linearly separable classes

Logical AND function (bipolar patterns)

  x1   x2   output
  -1   -1   -1
  -1    1   -1
   1   -1   -1
   1    1    1

Decision boundary: w1 = 1, w2 = 1, w0 = -1, i.e. -1 + x1 + x2 = 0
x: class I (output = 1), o: class II (output = -1)

Logical OR function (bipolar patterns)

  x1   x2   output
  -1   -1   -1
  -1    1    1
   1   -1    1
   1    1    1

Decision boundary: w1 = 1, w2 = 1, w0 = 1, i.e. 1 + x1 + x2 = 0
x: class I (output = 1), o: class II (output = -1)
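A quick check (an illustration, not from the notes) that the weights quoted above reproduce the bipolar AND and OR truth tables via the sign of w0 + w1*x1 + w2*x2:

# Verify the AND and OR decision boundaries on the four bipolar corners.
def classify(w0, w1, w2, x1, x2):
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else -1

corners = [(-1, -1), (-1, 1), (1, -1), (1, 1)]

print("AND:", [classify(-1, 1, 1, x1, x2) for x1, x2 in corners])
# expected [-1, -1, -1, 1]
print("OR: ", [classify(1, 1, 1, x1, x2) for x1, x2 in corners])
# expected [-1, 1, 1, 1]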

Perceptron Model

net = Σ_{i=0}^{n} w_i x_i,        O = f(net)

where x1, . . . , xn are the inputs, x0 = 1 is the bias input, w0, w1, . . . , wn are the weights, and f(·) is a hard-limiter activation function.

Fig. 3.1 Schematic diagram of the Perceptron

Depending upon the type of activation function, the Perceptron may be classified into two types:
• Discrete Perceptron, in which the activation function is a hard limiter, i.e. the sgn(net) function.
• Continuous Perceptron, in which the activation function is a sigmoid function, which is differentiable.
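A minimal sketch of the perceptron of Fig. 3.1 with the two activation choices just described (hard limiter vs. bipolar sigmoid); the function and variable names, and the slope value lam = 1, are assumptions of this illustration.

import math

def net(weights, inputs):
    # weights = [w0, w1, ..., wn], inputs = [x1, ..., xn]; x0 = 1 is the bias input
    return weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))

def discrete_output(n):
    # hard limiter / sgn(net): discrete perceptron
    return 1 if n >= 0 else -1

def continuous_output(n, lam=1.0):
    # bipolar sigmoid: continuous perceptron, differentiable
    return 2.0 / (1.0 + math.exp(-lam * n)) - 1.0

w = [-1.0, 1.0, 1.0]                                 # bipolar AND weights from the previous slide
print(discrete_output(net(w, [1, 1])))               # 1
print(round(continuous_output(net(w, [1, 1])), 3))   # smooth value in (-1, 1)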


Single Layer Discrete Perceptron Networks (SLDP)


To develop insight into the behavior of a pattern classifier, it is necessary to plot a map of the decision regions in the n-dimensional space spanned by the n input variables. The two decision regions are separated by a hyperplane defined by

Σ_{i=0}^{n} w_i x_i = 0

Fig. 3.2 Illustration of the hyperplane (in this example, a straight line) as the decision boundary for a two-dimensional, two-class pattern classification problem.


Single Layer Discrete Perceptron Networks (SLDP)

For the Perceptron to function properly, the two classes C1 and C2 must be linearly separable.

Fig. 3.3 (a) A pair of linearly separable patterns; (b) a pair of nonlinearly separable patterns.

In Fig. 3.3(a), the two classes C1 and C2 are sufficiently separated from each other to draw a hyperplane (in this case a straight line) as the decision boundary.


Single Layer Discrete Perceptron Networks (SLDP)

Assume that the input variables originate from two linearly separable classes.

Let æ1 be the subset of training vectors X1(1), X1(2), . . . that belong to class C1, and
æ2 be the subset of training vectors X2(1), X2(2), . . . that belong to class C2.

Given the sets of vectors æ1 and æ2 to train the classifier, the training process involves the adjustment of the weight vector W in such a way that the two classes C1 and C2 are separated. That is, there exists a weight vector W such that we may write

WX > 0 for every input vector X belonging to class C1
WX ≤ 0 for every input vector X belonging to class C2

Single Layer Discrete Perceptron Networks (SLDP)


The algorithm for updating the weights may be formulated as follows:

1. If the kth member of the training set, Xk, is correctly classified by the weight vector Wk computed at the kth iteration of the algorithm, no correction is made to the weight vector of the Perceptron:
   Wk+1 = Wk   if WkXk > 0 and Xk belongs to class C1
   Wk+1 = Wk   if WkXk ≤ 0 and Xk belongs to class C2

2. Otherwise, the weight vector of the Perceptron is updated in accordance with the rule:
   Wk+1 = Wk − ηXk   if WkXk > 0 and Xk belongs to class C2
   Wk+1 = Wk + ηXk   if WkXk ≤ 0 and Xk belongs to class C1

where the learning-rate parameter η controls the adjustment applied to the weight vector. A minimal code sketch of this rule is given below.
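The sketch below implements one application of this rule, assuming augmented input vectors (the bias input x0 = 1 folded into X and W) and class labels +1 for C1 and -1 for C2; the names are illustrative, not from the notes.

# One application of the fixed-increment rule above, for augmented vectors.
# label = +1 for class C1, -1 for class C2. Illustrative sketch only.
def update(W, X, label, eta=1.0):
    activation = sum(w * x for w, x in zip(W, X))
    if label == +1 and activation <= 0:      # C1 pattern on the wrong side
        return [w + eta * x for w, x in zip(W, X)]
    if label == -1 and activation > 0:       # C2 pattern on the wrong side
        return [w - eta * x for w, x in zip(W, X)]
    return W                                 # correctly classified: no change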


Discrete Perceptron training algorithm


Consider that P training patterns are available for training the model:
{(X1, t1), (X2, t2), . . . , (XP, tP)}, where Xi is the ith input vector and ti is the ith target output, i = 1, 2, . . . , P.

Learning Algorithm
Step 1: Set the learning rate η (0 < η ≤ 1).
Step 2: Initialize the weights and bias at small random values.
Step 3: Set p ← 1, where p indicates the pth input vector presented.


Algorithm continued..

Step 4: Compute the output response

net_p = Σ_{i=1}^{n} w_i x_i^p + b

O_p = f(net_p)

where f is the activation function.

For the bipolar binary activation function:
o_p = f(net_p) = +1 if net_p > θ, −1 if net_p ≤ θ

For the unipolar binary activation function:
o_p = f(net_p) = 1 if net_p > θ, 0 otherwise


Algorithm continued..

Step 5: Update the weights:

w_i^(k+1) = w_i^k + η (t_p − o_p) x_i^p

Here, the weights are updated only if the target and the output do not match.

Step 6: If p < P, then p ← p + 1; go to step 4 and compute the output response for the next input. Otherwise go to step 7.

Step 7: Test the stopping condition: if the weights did not change during the last pass, stop and store the final weights (W) and bias (b); else go to step 3.

The network training stops when all the input vectors are correctly classified, i.e. when the target value matches the output for all the input vectors. A runnable sketch of this procedure is given below.
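The sketch below collects Steps 1-7 into a short training routine. It assumes unipolar (0/1) targets, a fixed threshold θ, and a bias kept at zero so that it mirrors the hand calculation that follows; all names and the epoch limit are assumptions of this illustration.

# Sketch of the training loop in Steps 1-7, using the unipolar activation
# and 0/1 targets. The bias is kept fixed at zero to mirror the example below.
def train_perceptron(patterns, targets, w, eta=0.1, theta=0.2, max_epochs=100):
    """patterns: list of input vectors, targets: list of 0/1 targets,
    w: initial weight list. Returns the final weights."""
    for _ in range(max_epochs):
        changed = False
        for x, t in zip(patterns, targets):
            net = sum(wi * xi for wi, xi in zip(w, x))              # Step 4
            o = 1 if net > theta else 0                             # unipolar output
            if o != t:                                              # Step 5
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
                changed = True
        if not changed:                                             # Step 7
            break
    return w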


Example: Build the Perceptron network to realize fundamental logic gates, such as AND, OR and XOR.

Solution:
The following steps show the hand calculations with the OR-gate input-output data.

Table: OR logic gate function

  X1   X2   Output (Target)
   0    0    0
   0    1    1
   1    0    1
   1    1    1

Step 1: Initialize the weights w1 = 0.1, w2 = 0.3.
Step 2: Set the learning rate η = 0.1 and the threshold value θ = 0.2.
Step 3: Apply the input patterns one by one and repeat steps 4 and 5.


For input 1:
Let us consider the input X1 = [0, 0] with target t1 = 0.

Step 4: Compute the net input to the Perceptron:
net1 = Σ_{i=1}^{2} w_i x_i + b = 0.1 × 0 + 0.3 × 0 + 0 = 0  (bias initialized to 0)

With the unipolar binary activation function (θ = 0.2), the output is obtained as
o1 = f(0) = 0

Step 5: The output is the same as the target, t1 = 0; that is, the input pattern is correctly classified. Therefore, the weights and bias remain at their previous values; no weight update takes place. The weight vector for the next input is W = [0.1 0.3].

For input 2:
Steps 4 and 5 are repeated for the next input, X2 = [0, 1], with target t2 = 1.
The net input is obtained as
net2 = 0.1 × 0 + 0.3 × 1 = 0.3
The corresponding output is o2 = f(0.3) = 1, since 0.3 > θ = 0.2.

The output is the same as the target, t2 = 1; that is, the input pattern is correctly classified. Therefore, the weights and bias remain at their previous values; no update takes place. The weight vector for the next input is W = [0.1 0.3].


For input 3:
Repeat steps 4 and 5 for the next input, X3 = [1, 0], with target t3 = 1.
Compute the net input to the Perceptron and the output:
net3 = 0.1 × 1 + 0.3 × 0 = 0.1
o3 = f(0.1) = 0, since 0.1 ≤ θ = 0.2.

The output is not the same as the target, t3 = 1, so the weights are updated using equation (3.14):
w1 ← w1 + η (t3 − o3) x1 = 0.1 + 0.1 × (1 − 0) × 1 = 0.2
w2 ← w2 + η (t3 − o3) x2 = 0.3 + 0.1 × (1 − 0) × 0 = 0.3

The new weights are: [0.2 0.3].

For input 4:
Repeat steps 4 and 5 for the next input, X4 = [1, 1], with target t4 = 1.
Compute the net input to the Perceptron and the output:
net4 = 0.2 × 1 + 0.3 × 1 = 0.5
The corresponding output, using equation (3.13), is obtained as o4 = f(0.5) = 1.

The output is the same as the target, t4 = 1; that is, the input pattern is correctly classified. Therefore, the weights and bias remain at their previous values; no update takes place. The weight matrix after completion of one cycle is W = [0.2 0.3].

The summary of the weight changes is described in Table 3.2.

Table 3.2: The updated weights (first training cycle)

  X1   X2   Net   Output   Target   w1    w2
   -    -    -      -        -      0.1   0.3   (initial)
   0    0   0.0     0        0      0.1   0.3
   0    1   0.3     1        1      0.1   0.3
   1    0   0.1     0        1      0.2   0.3
   1    1   0.5     1        1      0.2   0.3
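The short snippet below (an illustrative check, not part of the original notes) replays the first training cycle above and prints the Net / Output / Target / weight columns of Table 3.2.

# Replay the first training cycle of the hand calculation above.
w, eta, theta = [0.1, 0.3], 0.1, 0.2
OR_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

for x, t in OR_data:
    net = sum(wi * xi for wi, xi in zip(w, x))
    o = 1 if net > theta else 0
    if o != t:
        w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
    print(x, round(net, 2), o, t, [round(wi, 2) for wi in w])
# (0, 0) 0.0 0 0 [0.1, 0.3]
# (0, 1) 0.3 1 1 [0.1, 0.3]
# (1, 0) 0.1 0 1 [0.2, 0.3]
# (1, 1) 0.5 1 1 [0.2, 0.3]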

9
Single layer Neural Networks

Results

Fig. 3.4 The error profile during the training of the Perceptron to learn the input-output relation of the OR gate.

Fig. 3.5 The error profile during the training of the Perceptron to learn the input-output relation of the AND gate.

Fig. 3.6 The error profile during the training of the Perceptron to learn the input-output relation of the XOR gate (the error does not converge to zero).


Single-Layer Continuous Perceptron networks

• The activation function used in modeling the Continuous Perceptron is sigmoidal, which is differentiable.

• The two advantages of using a continuous activation function are (i) finer control over the training procedure and (ii) the differential characteristics of the activation function, which are used for the computation of the error gradient.

• This gives scope to use gradients in modifying the weights. The gradient or steepest-descent method is used in updating the weights: starting from an arbitrary weight vector W, the gradient ∇E(W) of the current error function is computed.


Single-Layer Continuous Perceptron networks

The updated weight vector may be written as

W(k+1) = Wk − η ∇E(Wk)                                   (3.22)

where η is the learning constant.

The error function at step k may be written as

Ek = (1/2) (tk − ok)²                                    (3.23a)
or
Ek = (1/2) [tk − f(Wk X)]²                               (3.23b)


Single-Layer Continuous Perceptron networks


The error minimization algorithm (3.22) requires computation of the gradient of the error function (3.23), which may be written as

∇E(Wk) = ∇{ (1/2) [tk − f(netk)]² }                      (3.24)

The (n+1)-dimensional gradient vector is defined as

∇E(Wk) = [ ∂E/∂w0, ∂E/∂w1, . . . , ∂E/∂wn ]ᵀ             (3.25)


Single-Layer Continuous Perceptron networks


Using (3.24), we obtain the gradient vector as

∇E(Wk) = −(tk − ok) f'(netk) [ ∂(netk)/∂w0, ∂(netk)/∂w1, . . . , ∂(netk)/∂wn ]ᵀ      (3.26)

Since netk = Wk X, we have

∂(netk)/∂wi = xi,  for i = 0, 1, . . . , n               (3.27)

(x0 = 1 for the bias element.)

Single-Layer Continuous Perceptron networks

Using (3.27), the gradient (3.26) can be written as

∇E(Wk) = −(tk − ok) f'(netk) X                            (3.28a)
or
∂E/∂wi = −(tk − ok) f'(netk) xi,  for i = 0, 1, . . . , n  (3.28b)

so that the weight adjustment is

Δwi^k = −η ∂E(Wk)/∂wi = η (tk − ok) f'(netk) xi           (3.29)


Single-Layer Continuous Perceptron networks

For the bipolar continuous (sigmoid) activation function, f'(netk) = (1/2)(1 − ok²), so the gradient (3.28a) can be written as

∇E(Wk) = −(1/2) (tk − ok)(1 − ok²) X                      (3.32)

and the complete delta training rule for the bipolar continuous activation function results from (3.32) as

W(k+1) = Wk + (η/2) (tk − ok)(1 − ok²) Xk                 (3.33)

where k denotes the number of the training step. A minimal code sketch of one such update is given below.
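The sketch below applies one delta-rule update (3.33) with the bipolar sigmoid activation; the names, the unit slope of the sigmoid, and the example data are assumptions of this illustration.

import math

# One delta-rule step (3.33) for the bipolar sigmoid f(net) = 2/(1+exp(-net)) - 1,
# whose derivative is (1 - o**2)/2. Augmented vectors: x[0] = 1 carries the bias.
def delta_step(W, X, t, eta=0.1):
    net = sum(w * x for w, x in zip(W, X))
    o = 2.0 / (1.0 + math.exp(-net)) - 1.0
    grad_factor = 0.5 * eta * (t - o) * (1.0 - o ** 2)   # (eta/2)(t - o)(1 - o^2)
    return [w + grad_factor * x for w, x in zip(W, X)], o

# Example: nudge the weights toward target +1 for the input (1, 1).
W = [0.0, 0.1, 0.3]
W, o = delta_step(W, [1, 1, 1], t=+1)
print(round(o, 4), [round(w, 4) for w in W])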


Perceptron Convergence Theorem

This theorem states that the Perceptron learning law converges to a final set of weight values in a finite number of steps, if the classes are linearly separable.

Proposition: If the sets P and N are finite and linearly separable, the Perceptron learning algorithm updates the weight vector wt only a finite number of times.


Perceptron Convergence Theorem

Proof: Let us make three simplifications, without losing generality:

(i) The sets P and N can be joined into a single set P' = P ∪ N⁻, where N⁻ consists of the negated elements of N.

(ii) The vectors in P' can be normalized (‖pi‖ = 1), because if a weight vector w is found such that w · x > 0, then this also holds for any vector αx, where α is a positive constant.

(iii) The weight vector can also be normalized (‖w*‖ = 1). Since we assume that a solution for the linear separation problem exists, we call w* a normalized solution vector.


Perceptron Convergence Theorem



Now, assume that after t + 1 steps the weight vector w_{t+1} has been computed. This means that at time t a vector pi was incorrectly classified by the weight vector w_t, and so a correction was applied:

w_{t+1} = w_t + pi                                        (3.37)

The cosine of the angle ρ between w_{t+1} and w* is

cos ρ = (w* · w_{t+1}) / (‖w_{t+1}‖ ‖w*‖)                 (3.38)

Numerator of equation (3.38):
w* · w_{t+1} = w* · (w_t + pi) = w* · w_t + w* · pi ≥ w* · w_t + δ

where δ = min{ w* · p : p ∈ P' }


Perceptron Convergence Theorem



Since w defines an absolute linear separation ( it means finite sets + linearly
separable ) of P and N , we know that   0
By induction, we obtain
   
w*  wt 1  w   w0  t  1 (3.39)
(Induction is:
we have
  
w*  wt  w  wt 1  
  
w*  wt 1  w  wt 1    
  
w*  wt 1  w wt 1  2 : Induction
Therefore
   
w*  wt 1  w   w0  t  1 )


Perceptron Convergence Theorem

Denominator of equation (3.38):

‖w_{t+1}‖² = (w_t + pi) · (w_t + pi) = ‖w_t‖² + 2 w_t · pi + ‖pi‖²

Since w_t · pi ≤ 0 (remember, w_t was corrected using pi),

‖w_{t+1}‖² ≤ ‖w_t‖² + ‖pi‖² = ‖w_t‖² + 1   (since pi is normalized)

By induction: ‖w_{t+1}‖² ≤ ‖w_0‖² + (t + 1)                (3.40)


Perceptron Convergence Theorem

Substituting (3.39) and (3.40) into (3.38), we get

cos ρ ≥ [w* · w_0 + (t + 1) δ] / sqrt(‖w_0‖² + t + 1) ≈ (t + 1) δ / sqrt(t + 1) = sqrt(t + 1) · δ

The right-hand side grows proportionally to sqrt(t), and since δ > 0, it can become arbitrarily large. However, since cos ρ ≤ 1, t must be bounded by a maximum value of the order of 1/δ².

⇒ The number of corrections to the weight vector must be finite.


Limitations of Perceptron
• There are, however, limitations to the capabilities of the Perceptron.
• It will learn the solution, if there is a solution to be found.
• First, the output values of a Perceptron can take on only one of two values (True or False).
• Second, a Perceptron can only classify linearly separable sets of vectors. If a straight line or a plane can be drawn to separate the input vectors into their correct categories, the input vectors are linearly separable and the Perceptron will find the solution.
• If the vectors are not linearly separable, learning will never reach a point where all vectors are classified properly.
• The most famous example of the Perceptron's inability to solve problems with linearly non-separable vectors is the Boolean XOR realization, as the short check below illustrates.
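The short check below (an illustration, not part of the notes) runs a discrete-perceptron loop on bipolar XOR data; because XOR is not linearly separable, at least one pattern is misclassified in every epoch, so training never terminates with zero errors. The data layout and epoch limit are assumptions of this sketch.

# XOR is not linearly separable: the discrete-perceptron loop never reaches
# zero errors. Bipolar inputs/targets; augmented weights: bias, w1, w2.
XOR_data = [((1, -1, -1), -1), ((1, -1, 1), 1), ((1, 1, -1), 1), ((1, 1, 1), -1)]
w, eta = [0.0, 0.0, 0.0], 0.1

for epoch in range(50):
    errors = 0
    for x, t in XOR_data:
        o = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
        if o != t:
            w = [wi + eta * (t - o) / 2 * xi for wi, xi in zip(w, x)]
            errors += 1
    # errors stays >= 1 in every epoch; the weights keep cycling
print("misclassified in the last epoch:", errors)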


THANK YOU
