PMR5406 Redes Neurais e Lógica Fuzzy: Aula 3 Single Layer Perceptron

The document discusses the architecture and learning algorithms of single-layer perceptrons and adaptive linear neurons (Adalines). It describes the perceptron model, which uses a non-linear activation function and can be used for binary classification problems. The fixed-increment learning rule is presented to update the perceptron's weights so as to eliminate classification errors on the training examples. However, perceptrons cannot represent the XOR function, because it is not linearly separable. Adalines use a linear model and the least-mean-square (LMS) algorithm, which minimizes the squared error by updating the weights in the direction of steepest descent.


PMR5406 Redes Neurais
e Lógica Fuzzy
Lecture 3
Single Layer Perceptron
Based on:
Neural Networks, Simon Haykin, Prentice-Hall, 2nd edition
Course slides by Elena Marchiori, Vrije University
Architecture
• We consider the architecture: a feed-forward NN with one layer.
• It is sufficient to study single layer perceptrons with just one neuron:



Perceptron: Neuron Model
• Uses a non-linear (McCulloch-Pitts) model of a neuron:

[Figure: inputs x1, x2, ..., xm are weighted by w1, w2, ..., wm and summed together with the bias b to give the induced local field v; the activation φ(v) produces the output y]

φ is the sign function:

φ(v) = +1 if v >= 0
       -1 if v <  0
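As an illustration, here is a minimal Python sketch of this neuron model (the function names and the NumPy dependency are illustrative choices, not part of the original slides):

```python
import numpy as np

def sign(v):
    # phi(v): +1 if v >= 0, -1 otherwise
    return 1.0 if v >= 0 else -1.0

def perceptron_output(w, x):
    # w = [b, w1, ..., wm] and x = [+1, x1, ..., xm]: the bias is absorbed into the weight vector
    return sign(np.dot(w, x))

# example: two inputs, weights (1, -1) and bias 0.5
print(perceptron_output(np.array([0.5, 1.0, -1.0]), np.array([1.0, 2.0, 1.0])))  # prints 1.0
```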

Perceptron: Applications

• The perceptron is used for classification: classify correctly a set of examples into one of the two classes C1, C2:
  • If the output of the perceptron is +1, then the input is assigned to class C1
  • If the output is -1, then the input is assigned to class C2

Perceptron: Classification
• The equation below describes a hyperplane in the input space. This hyperplane is used to separate the two classes C1 and C2:

  Σ (i = 1 to m) wi xi + b = 0

[Figure: in the (x1, x2) plane, the decision boundary w1x1 + w2x2 + b = 0 separates the decision region for C1 (where w1x1 + w2x2 + b > 0) from the decision region for C2 (where w1x1 + w2x2 + b <= 0)]
Perceptron: Limitations

• The perceptron can only model linearly separable functions.
• The perceptron can be used to model the following Boolean functions:
  • AND
  • OR
  • COMPLEMENT
• But it cannot model XOR. Why?

Perceptron: Limitations
• XOR is not linearly separable.
• It is impossible to separate the classes C1 and C2 with only one line.

[Figure: the XOR examples plotted at the corners of the unit square in the (x1, x2) plane; examples of opposite classes occupy diagonally opposite corners, so no single straight line can separate C1 from C2]
Perceptron: Learning Algorithm

• Variables and parameters:
  x(n) = input vector = [+1, x1(n), x2(n), ..., xm(n)]^T
  w(n) = weight vector = [b(n), w1(n), w2(n), ..., wm(n)]^T
  b(n) = bias
  y(n) = actual response
  d(n) = desired response
  η = learning-rate parameter
The fixed-increment learning algorithm
• Initialization: set w(0) = 0
• Activation: activate the perceptron by applying the input example (vector x(n) and desired response d(n))
• Compute the actual response of the perceptron:
  y(n) = sgn[wT(n)x(n)]
• Adapt the weight vector: if d(n) and y(n) are different, then
  w(n + 1) = w(n) + η[d(n) - y(n)]x(n)
  where d(n) = +1 if x(n) ∈ C1
               -1 if x(n) ∈ C2
• Continuation: increment time step n by 1 and go to the Activation step
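A compact Python sketch of this fixed-increment rule, with the bias kept as the first weight (the function name, the NumPy dependency, and the stopping criterion of one error-free pass are illustrative assumptions, not given in the slides):

```python
import numpy as np

def train_perceptron(X, d, eta=1.0, max_epochs=100):
    # X: (N, m) array of input examples; d: (N,) array of desired responses (+1 or -1)
    N, m = X.shape
    Xb = np.hstack([np.ones((N, 1)), X])          # prepend the fixed input +1 for the bias
    w = np.zeros(m + 1)                           # initialization: w(0) = 0
    for _ in range(max_epochs):
        errors = 0
        for x_n, d_n in zip(Xb, d):
            y_n = 1.0 if w @ x_n >= 0 else -1.0   # y(n) = sgn[wT(n)x(n)]
            if y_n != d_n:
                w = w + eta * (d_n - y_n) * x_n   # w(n+1) = w(n) + eta[d(n) - y(n)]x(n)
                errors += 1
        if errors == 0:                           # stop after a full pass with no corrections
            break
    return w
```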
Example

• Consider a training set C1 ∪ C2, where:
  C1 = {(1,1), (1,-1), (0,-1)}   elements of class +1
  C2 = {(-1,-1), (-1,1), (0,1)}  elements of class -1
• Use the perceptron learning algorithm to classify these examples.
• w(0) = [1, 0, 0]^T, η = 1

Example

[Figure: the six training examples plotted in the (x1, x2) plane, with the points of C1 marked + and the points of C2 marked -; the learned decision boundary is the line 2x1 - x2 = 0]
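Using the illustrative train_perceptron sketch from the previous slide (note that the sketch initializes w(0) = 0 rather than [1, 0, 0]^T as above, and the exact weights it finds depend on the presentation order; the line 2x1 - x2 = 0 shown in the figure is one valid separating boundary):

```python
import numpy as np

# C1 (class +1) and C2 (class -1) from the example
X = np.array([(1, 1), (1, -1), (0, -1), (-1, -1), (-1, 1), (0, 1)], dtype=float)
d = np.array([+1, +1, +1, -1, -1, -1], dtype=float)

w = train_perceptron(X, d, eta=1.0)
print("learned [b, w1, w2] =", w)
# afterwards sgn(b + w1*x1 + w2*x2) should equal d(n) for every training example
```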

Convergence of the learning algorithm

Suppose the datasets C1, C2 are linearly separable. The perceptron convergence algorithm converges after n0 iterations, with n0 <= nmax, on the training set C1 ∪ C2.

Proof:
• Suppose x ∈ C1 ⇒ output = 1 and x ∈ C2 ⇒ output = -1.
• For simplicity assume w(1) = 0, η = 1.
• Suppose the perceptron incorrectly classifies x(1) ... x(n) ∈ C1,
  so that wT(k)x(k) <= 0.
• Error-correction rule:
  w(2) = w(1) + x(1)
  w(3) = w(2) + x(2)          ⇒ w(n+1) = x(1) + ... + x(n)
  ...
  w(n+1) = w(n) + x(n)
Convergence theorem (proof)

• Let w0 be such that w0T x(n) > 0 for all x(n) ∈ C1.
  (w0 exists because C1 and C2 are linearly separable.)
• Let α = min { w0T x(n) | x(n) ∈ C1 }.
• Then w0T w(n+1) = w0T x(1) + ... + w0T x(n) >= nα
• Cauchy-Schwarz inequality:
  ||w0||^2 ||w(n+1)||^2 >= [w0T w(n+1)]^2 >= n^2 α^2

  ||w(n+1)||^2 >= n^2 α^2 / ||w0||^2          (A)

Convergence theorem (proof)
• Now we consider another route:
  w(k+1) = w(k) + x(k)
  ||w(k+1)||^2 = ||w(k)||^2 + ||x(k)||^2 + 2 wT(k)x(k)
  (Euclidean norm; the last term is <= 0 because x(k) is misclassified)
  ⇒ ||w(k+1)||^2 <= ||w(k)||^2 + ||x(k)||^2,   k = 1, ..., n
• With ||w(1)||^2 = 0:
  ||w(2)||^2 <= ||w(1)||^2 + ||x(1)||^2
  ||w(3)||^2 <= ||w(2)||^2 + ||x(2)||^2
  ...
  ⇒ ||w(n+1)||^2 <= Σ (k = 1 to n) ||x(k)||^2

Convergence theorem (proof)
• Let β = max { ||x(n)||^2 | x(n) ∈ C1 }
• Then ||w(n+1)||^2 <= nβ          (B)
• For sufficiently large values of n, (B) comes into conflict with (A).
  Hence n cannot be greater than nmax, the value for which (A) and (B) are
  both satisfied with the equality sign:

  nmax^2 α^2 / ||w0||^2 = nmax β   ⇒   nmax = β ||w0||^2 / α^2

• The perceptron convergence algorithm therefore terminates in at most
  nmax = β ||w0||^2 / α^2  iterations.

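For illustration only, the bound can be evaluated numerically on the earlier example set; the separating vector w0 below (corresponding to the boundary 2x1 - x2 = 0) is an assumed choice, since the slides do not fix one:

```python
import numpy as np

# C1 from the example, in augmented form [+1, x1, x2] to match w = [b, w1, w2]
C1 = np.array([(1, 1, 1), (1, 1, -1), (1, 0, -1)], dtype=float)
w0 = np.array([0.0, 2.0, -1.0])           # assumed separating vector: 2*x1 - x2 = 0

alpha = (C1 @ w0).min()                   # alpha = min w0^T x(n) over x(n) in C1
beta = (C1 ** 2).sum(axis=1).max()        # beta  = max ||x(n)||^2 over x(n) in C1
n_max = beta * (w0 @ w0) / alpha ** 2     # upper bound on the number of weight corrections
print(alpha, beta, n_max)                 # here: 1.0, 3.0, 15.0
```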
Adaline: Adaptive Linear Element

• The output y is a linear combination of the inputs x:

[Figure: inputs x1, ..., xm are weighted by w1, ..., wm and summed by a linear combiner to produce the output y]

  y = Σ (j = 0 to m) xj(n) wj(n)

Adaline: Adaptive Linear Element
• Adaline: uses a linear neuron model and the Least-Mean-Square (LMS) learning algorithm.
• The idea: try to minimize the squared error, which is a function of the weights:

  E(w(n)) = 1/2 e^2(n)

  e(n) = d(n) - Σ (j = 0 to m) xj(n) wj(n)

• We can find the minimum of the error function E by means of the steepest descent method.

Steepest Descent Method
• Start with an arbitrary point.
• Find the direction in which E decreases most rapidly: the negative gradient

  -(gradient of E(w)) = -[ ∂E/∂w1, ..., ∂E/∂wm ]

• Make a small step in that direction:

  w(n + 1) = w(n) - η (gradient of E(n))


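A toy numerical illustration of the method (the quadratic function E below is invented for this example and is not from the slides):

```python
import numpy as np

# E(w) = (w1 - 1)^2 + 2*(w2 + 0.5)^2, minimized at w = (1, -0.5)
def grad_E(w):
    return np.array([2.0 * (w[0] - 1.0), 4.0 * (w[1] + 0.5)])

w = np.array([5.0, 5.0])       # arbitrary starting point
eta = 0.1                      # learning-rate parameter (step size)
for n in range(200):
    w = w - eta * grad_E(w)    # w(n+1) = w(n) - eta * (gradient of E)
print(w)                       # close to [1.0, -0.5]
```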
Least-Mean-Square algorithm
(Widrow-Hoff algorithm)
• Approximation of the gradient of E:

  ∂E(w(n))/∂w(n) = e(n) ∂e(n)/∂w(n) = -e(n) x(n)^T

• The update rule for the weights becomes:

  w(n + 1) = w(n) + η x(n) e(n)


Summary of LMS algorithm
Training sample: input signal vector x(n)
                 desired response d(n)
User-selected parameter: η > 0
Initialization: set ŵ(1) = 0
Computation: for n = 1, 2, ... compute
  e(n) = d(n) - ŵT(n)x(n)
  ŵ(n+1) = ŵ(n) + η x(n) e(n)

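A minimal Python sketch of this loop (the function name, the NumPy dependency, and the single pass over the data are illustrative assumptions; the slides do not specify a stopping criterion):

```python
import numpy as np

def lms(X, d, eta=0.01):
    # Least-Mean-Square (Widrow-Hoff): X is (N, m) inputs, d is (N,) desired responses
    N, m = X.shape
    Xb = np.hstack([np.ones((N, 1)), X])   # prepend +1 so w[0] plays the role of the bias
    w = np.zeros(m + 1)                    # initialization: w(1) = 0
    for x_n, d_n in zip(Xb, d):
        e_n = d_n - w @ x_n                # e(n) = d(n) - w^T(n) x(n)
        w = w + eta * e_n * x_n            # w(n+1) = w(n) + eta * x(n) * e(n)
    return w
```

For a quick check, lms could be run on the same example data used for the perceptron above; with a small η the output w @ [1, x1, x2] moves toward the desired responses.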
