Module 4

Perceptron

Outline

• Perceptron
• Perceptron Learning Algorithm
• Convergence Theorem
• Linearly Separable Boolean Functions

McCulloch-Pitts Neuron - Recap

A single McCulloch-Pitts neuron can be used to represent boolean functions which are linearly separable.

Linear separability (for boolean functions): there exists a line (plane) such that all inputs which produce a 1 lie on one side of the line (plane) and all inputs which produce a 0 lie on the other side of the line (plane).
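As a quick illustration, here is a minimal sketch (in Python) of a McCulloch-Pitts neuron computing the linearly separable AND function. The threshold θ = 2 is hand-coded, which is precisely the limitation the perceptron removes.

```python
def mp_neuron(inputs, theta):
    """McCulloch-Pitts neuron: fires (returns 1) when the sum of the
    boolean inputs reaches the hand-coded threshold theta."""
    return 1 if sum(inputs) >= theta else 0

# AND over two inputs: the neuron fires only when both inputs are 1.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mp_neuron((x1, x2), theta=2))
```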

Perceptron

Why Perceptron?

So now a few questions arise:


• What about non-boolean (say, real) inputs?
• Do we always need to hand-code the threshold?
• Are all inputs equal? What if we want to assign more weight (importance) to some inputs?
• What about functions which are not linearly separable?

Perceptron - Introduction
• Frank Rosenblatt, an American psychologist, proposed the classical perceptron model (1958)
• A more general computational model than McCulloch-Pitts neurons
• Main differences: the introduction of numerical weights for inputs and a mechanism for learning these weights
• Inputs are no longer limited to boolean values
• Refined and carefully analyzed by Minsky and Papert (1969); it is their model that is referred to as the perceptron model here
Perceptron (Cont'd)
• Instead of keeping the threshold $\theta$ as a separate quantity, we treat it as one of the inputs: an input $x_0$ which is always on, with weight $w_0 = -\theta$.
• The goal for the other inputs and their weights is then to make their weighted sum outweigh this bias input.
• This gives the perceptron equation: the neuron fires when $\sum_{i=0}^{n} w_i x_i \geq 0$, and does not fire otherwise (a sketch of this decision rule follows below).
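A minimal sketch of this decision rule in Python, assuming the bias trick above ($x_0 = 1$, $w_0 = -\theta$):

```python
def perceptron_fires(weights, inputs):
    """Perceptron decision rule with the bias folded in:
    weights[0] is w0 = -theta and inputs[0] is the always-on x0 = 1.
    Fires (returns 1) when the weighted sum is >= 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= 0 else 0

# Example: AND with theta = 2 becomes w = (-2, 1, 1), x = (1, x1, x2).
print(perceptron_fires((-2, 1, 1), (1, 1, 1)))  # 1: sum is 0 >= 0
print(perceptron_fires((-2, 1, 1), (1, 1, 0)))  # 0: sum is -1 < 0
```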

Perceptron (Cont'd)

So now:
• Why are we trying to implement boolean functions?
• Why do we need weights?
• Why is $w_0 = -\theta$ called the bias?

Perceptron (Cont'd)
• Consider the task of predicting whether we would like a movie or not
• Suppose we base our decision on 3 inputs (binary, for simplicity)
• Based on our past viewing experience (data), we may give a high weight to isDirectorNolan as compared to the other inputs
• Specifically, even if the actor is not Matt Damon and the genre is not thriller, we would still want to cross the threshold $\theta$ by assigning a high weight to isDirectorNolan
• $w_0$ is called the bias as it represents the prior (prejudice)
• A movie buff may have a very low threshold and may watch any movie irrespective of the genre, actor, or director [$\theta = 0$]
Perceptron (Cont'd)

• On the other hand, a selective viewer may only watch thrillers starring Matt Damon and directed by Nolan [$\theta = 3$], i.e. all three inputs must be on
• The weights ($w_1, w_2, \ldots, w_n$) and the bias ($w_0$) will depend on the data (the viewer's history in this case)
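To make this concrete, here is a small hypothetical example reusing the perceptron_fires sketch above. The input names isActorDamon and isGenreThriller and all the weight values are illustrative assumptions, not learned from data:

```python
# Inputs: (x0 = 1, isActorDamon, isGenreThriller, isDirectorNolan).
movie = (1, 0, 0, 1)  # not Damon, not a thriller, but directed by Nolan

# Movie buff: theta = 0, so w0 = 0 and every movie crosses the threshold.
print(perceptron_fires((0, 1, 1, 1), movie))   # 1: watches anything

# Selective viewer: theta = 3, so all three inputs must be on to fire.
print(perceptron_fires((-3, 1, 1, 1), movie))  # 0: Nolan alone is not enough

# Nolan fan: a high weight on isDirectorNolan crosses theta = 3 by itself.
print(perceptron_fires((-3, 1, 1, 3), movie))  # 1: -3 + 3 = 0 >= 0
```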

Perceptron (Cont'd)

• What kind of functions can be implemented using the perceptron?
• Any difference from McCulloch-Pitts neurons?

Difference between Perceptron and MP Neurons
• From the equations it is clear that even a perceptron separates the input space into two halves
• All inputs which produce a 1 lie on one side and all inputs which produce a 0 lie on the other side
• In other words, a single perceptron can only be used to implement linearly separable functions
• Then what is the difference? The weights (including the threshold) can be learned, and the inputs can be real valued
• We will first revisit some boolean functions, and then see the perceptron learning algorithm (for learning the weights)

Introduction to Error and Error Surfaces

Let us plot the error surface corresponding to different values of $w_0$, $w_1$, $w_2$.
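A minimal sketch of how such an error surface can be computed, assuming (as an illustrative choice) that the error for a given $(w_0, w_1, w_2)$ is the number of misclassified points on a small boolean dataset:

```python
import itertools

# Toy dataset: the OR function over two boolean inputs, (x1, x2) -> y.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

def error(w0, w1, w2):
    """Number of points misclassified by the perceptron (w0, w1, w2)."""
    mistakes = 0
    for (x1, x2), y in data:
        fired = 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0
        mistakes += fired != y
    return mistakes

# Evaluate the error over a coarse grid of weights; plotting this grid
# (e.g. with matplotlib) gives the error surface discussed here.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
surface = {w: error(*w) for w in itertools.product(grid, repeat=3)}
print(min(surface, key=surface.get))  # a weight setting with fewest mistakes
```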
Perceptron Learning Algorithm

Apart from implementing boolean functions, what can a perceptron be used for?

• We will now see a more principled approach for learning these weights and the threshold.
• Apart from implementing boolean functions (which does not look very interesting), what can a perceptron be used for?
• Our interest lies in the use of the perceptron as a binary classifier.

• Let us reconsider our problem of deciding whether to watch a movie or not
• Suppose we are given a list of m movies and a label (class) associated with each movie indicating whether the user liked the movie or not: a binary decision
• Further, suppose we represent each movie with n features (some boolean, some real valued, since ratings are real valued)
• We will assume that the data is linearly separable and we want a perceptron to learn how to make this decision

In other words, we want the perceptron to find the equation of this separating plane (or, equivalently, find the values of $w_1, w_2, \ldots, w_n$).
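The algorithm proceeds by cycling through the data and nudging the weights whenever a point is misclassified: add the input to the weights if the perceptron should have fired but did not, and subtract it in the opposite case. A minimal sketch in Python, assuming every input is augmented with a leading $x_0 = 1$ so that the bias is learned as $w_0$:

```python
import random

def train_perceptron(P, N, max_epochs=1000):
    """Perceptron learning algorithm.
    P: inputs labelled 1; N: inputs labelled 0.
    Each input is a tuple augmented with a leading x0 = 1 (bias)."""
    w = [random.uniform(-1, 1) for _ in range(len(P[0]))]
    for _ in range(max_epochs):
        converged = True
        for x in P + N:
            s = sum(wi * xi for wi, xi in zip(w, x))
            if x in P and s < 0:        # should fire but does not: w = w + x
                w = [wi + xi for wi, xi in zip(w, x)]
                converged = False
            elif x in N and s >= 0:     # fires but should not: w = w - x
                w = [wi - xi for wi, xi in zip(w, x)]
                converged = False
        if converged:                   # a full pass with no mistakes
            return w
    return w

# Example: learn the OR function; each point is (x0 = 1, x1, x2).
P = [(1, 0, 1), (1, 1, 0), (1, 1, 1)]
N = [(1, 0, 0)]
print(train_perceptron(P, N))
```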

Convergence Theorem for the Perceptron Learning Algorithm

Convergence theorem

Definition:
Two sets P and N of points in an n-dimensional space are called absolutely linearly separable if $n + 1$ real numbers $w_0, w_1, \ldots, w_n$ exist such that every point $(x_1, x_2, \ldots, x_n) \in P$ satisfies $\sum_{i=1}^{n} w_i x_i > w_0$ and every point $(x_1, x_2, \ldots, x_n) \in N$ satisfies $\sum_{i=1}^{n} w_i x_i < w_0$.

Proposition:
If the sets P and N are finite and linearly separable, the perceptron learning algorithm updates the weight vector $w_t$ a finite number of times. In other words: if the vectors in P and N are tested cyclically one after the other, a weight vector $w_t$ is found after a finite number of steps $t$ which can separate the two sets.

Linearly Separable Boolean Functions

So what do we do about functions which are not linearly separable? One example is the XOR function.
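To see why no single perceptron can implement XOR, write down the firing conditions for all four inputs (using the convention above, with $w_0 = -\theta$) and observe that they are mutually inconsistent:

```latex
\begin{align*}
(0,0) \mapsto 0 &: \quad w_0 < 0 \\
(0,1) \mapsto 1 &: \quad w_0 + w_2 \geq 0 \\
(1,0) \mapsto 1 &: \quad w_0 + w_1 \geq 0 \\
(1,1) \mapsto 0 &: \quad w_0 + w_1 + w_2 < 0
\end{align*}
% Adding the two middle conditions gives w_1 + w_2 >= -2 w_0, while the
% last condition gives w_1 + w_2 < -w_0. Hence -2 w_0 < -w_0, i.e.
% w_0 > 0, contradicting the first condition, so no such weights exist.
```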

Summary

• Perceptron Learning Algorithm: a principled approach for learning the weights.
• Convergence Theorem: on finite, linearly separable data, a weight vector $w_t$ that separates the two sets is found after a finite number of steps $t$.
• Linearly Separable Boolean Functions: a single perceptron can only implement these; it cannot deal with functions, such as XOR, that are not linearly separable.