Module 4
Outline
• Perceptron
• Perceptron Learning Algorithm
• Convergence Theorem
• Linearly Separable Boolean Functions
McCulloch-Pitts Neuron - Recap
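Since the recap figure is not reproduced here, a minimal sketch of the McCulloch-Pitts neuron in Python (boolean inputs aggregated against a fixed threshold $\theta$, no learned weights; excitatory inputs only are assumed):

```python
def mp_neuron(inputs, theta):
    # Boolean inputs, no weights: fires iff at least theta inputs are on
    return 1 if sum(inputs) >= theta else 0

print(mp_neuron([1, 1, 0], theta=2))  # -> 1 (two of the three inputs are on)
```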
Perceptron
Why Perceptron?
Perceptron - Introduction
• Frank Rosenblatt, an American psychologist, proposed the classical perceptron model (1958)
• A more general computational model than McCulloch-Pitts neurons
• Main differences: introduction of numerical weights for inputs and a mechanism for learning these weights
• Inputs are no longer limited to boolean values
• Refined and carefully analyzed by Minsky and Papert (1969) - their model is referred to as the perceptron model here
Perceptron (Cont'd)
• Instead of treating the threshold as a separate quantity, we consider it as one of the inputs: an extra input $x_0$ which is always on (fixed at 1), with weight $w_0 = -\theta$.
• The goal for all the other inputs and their weights is then to make their weighted sum exceed $\theta$, i.e., outweigh the $-\theta$ contribution of this always-on input.
• This gives the perceptron equation: the neuron fires when $\sum_{i=0}^{n} w_i x_i \geq 0$, and does not fire otherwise (a minimal code sketch follows).
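A minimal sketch of this decision rule in Python (NumPy and the function name are assumptions made for illustration):

```python
import numpy as np

def perceptron_fires(x, w):
    # x: inputs x_1 .. x_n; w: weights [w_0, w_1, ..., w_n] with w_0 = -theta
    x = np.concatenate(([1.0], x))        # prepend the always-on input x_0 = 1
    return 1 if np.dot(w, x) >= 0 else 0  # fires iff the summation is >= 0

print(perceptron_fires([0, 1, 1], w=np.array([-1.0, 0.5, 0.5, 0.5])))
# -> 1, since -1 + 0.5 + 0.5 = 0 >= 0
```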
Perceptron (Cont'd)
So now:
• Why are we trying to implement boolean functions?
• Why do we need weights?
• Why is $w_0 = -\theta$ called the bias?
Perceptron (Cont'd)
• Consider the task of predicting whether we would like a movie or not
• Suppose we base our decision on 3 inputs (binary, for simplicity)
• Based on our past viewing experience (data), we may give a high weight to isDirectorNolan as compared to the other inputs (see the numeric sketch below)
• Specifically, even if the actor is not Matt Damon and the genre is not thriller, we would still want to cross the threshold $\theta$ by assigning a high weight to isDirectorNolan
• $w_0$ is called the bias as it represents the prior (prejudice)
• A movie buff may have a very low threshold and may watch any movie irrespective of the genre, actor, or director [$\theta = 0$]
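A toy numeric version of this intuition (the three feature names follow the slide; the weight and threshold values are assumptions chosen for illustration):

```python
import numpy as np

theta = 1.0
# [w_0, w_isActorDamon, w_isGenreThriller, w_isDirectorNolan]:
# the high weight on isDirectorNolan lets that input alone cross theta
w = np.array([-theta, 0.2, 0.2, 1.5])

movie = np.array([0, 0, 1])         # not Damon, not a thriller, but Nolan
x = np.concatenate(([1.0], movie))  # prepend the always-on input x_0 = 1
print("watch" if np.dot(w, x) >= 0 else "skip")  # -> watch (-1 + 1.5 >= 0)
```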
Difference between Perceptron and MP neurons
• From the equations it is clear that even a perceptron separates the input space into two halves
• All inputs which produce a 1 lie on one side and all inputs which produce a 0 lie on the other side
• In other words, a single perceptron can only be used to implement linearly separable functions
• Then what is the difference?
• The weights (including the threshold) can be learned, and
• the inputs can be real valued
• We will first revisit some boolean functions and then see the perceptron learning algorithm (for learning weights)
Introduction to Error and Error Surfaces
Let us plot the error surface corresponding to different values of $w_0, w_1, w_2$
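A sketch of how such a surface could be computed (the toy AND data, the grid ranges, and counting misclassifications as the error are all assumptions made for illustration):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])          # the AND function, as an assumed example

w0 = -1.0                           # fix w0 = -theta and sweep w1, w2
w1s = np.linspace(-2, 2, 41)
w2s = np.linspace(-2, 2, 41)

error = np.zeros((len(w1s), len(w2s)))
for i, w1 in enumerate(w1s):
    for j, w2 in enumerate(w2s):
        pred = (w0 + X @ np.array([w1, w2]) >= 0).astype(int)
        error[i, j] = np.sum(pred != y)   # number of misclassified points

print(error.min())  # -> 0.0: some (w1, w2) on the grid classifies AND perfectly
```

Here error[i, j] is the surface height at (w1s[i], w2s[j]); plotting this grid (e.g., with matplotlib) gives the error surface.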
Perceptron Learning Algorithm
Apart from implementing boolean functions, what can a perceptron be used for?
• We will now see a more principled approach for learning these weights and the threshold.
• Apart from implementing boolean functions (which does not look very interesting), what can a perceptron be used for?
• Our interest lies in the use of the perceptron as a binary classifier.
• Let us reconsider our problem of deciding whether to watch a movie or not
• Suppose we are given a list of m movies and a label (class) associated with each movie indicating whether the user liked this movie or not: a binary decision
• Further, suppose we represent each movie with n features (some boolean, some real valued, because ratings will be real valued)
• We will assume that the data is linearly separable and we want a perceptron to learn how to make this decision
In other words, we want the perceptron to find the equation of this separating plane (or find the values of $w_1, w_2, \ldots, w_n$)
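A compact sketch of the perceptron learning algorithm as it is classically stated (the function name, the random initialization, the epoch cap, and stopping after one full error-free pass are implementation choices assumed here; labels are taken to be 0/1):

```python
import numpy as np

def perceptron_learning(X, y, max_epochs=1000, seed=0):
    """Classical perceptron learning algorithm (labels y in {0, 1}).

    Returns w = [w_0, w_1, ..., w_n] with the bias folded in as w_0.
    """
    X = np.hstack([np.ones((len(X), 1)), X])  # prepend x_0 = 1 for the bias
    w = np.random.default_rng(seed).standard_normal(X.shape[1])  # random init
    for _ in range(max_epochs):
        converged = True
        for x, label in zip(X, y):
            fired = np.dot(w, x) >= 0
            if label == 1 and not fired:   # x in P but w.x < 0: w = w + x
                w = w + x
                converged = False
            elif label == 0 and fired:     # x in N but w.x >= 0: w = w - x
                w = w - x
                converged = False
        if converged:                      # a full pass with no updates
            return w
    return w
```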
Convergence Theorem for Perceptron Learning Algorithm
Convergence theorem
Definition:
Two sets P and N of points in an n-dimensional space are called absolutely linearly separable if $n + 1$ real numbers $w_0, w_1, \ldots, w_n$ exist such that every point $(x_1, x_2, \ldots, x_n) \in P$ satisfies $\sum_{i=1}^{n} w_i x_i > w_0$ and every point $(x_1, x_2, \ldots, x_n) \in N$ satisfies $\sum_{i=1}^{n} w_i x_i < w_0$.
Proposition:
If the sets P and N are finite and linearly separable, the perceptron learning algorithm updates the weight vector $w_t$ a finite number of times.
In other words: if the vectors in P and N are tested cyclically one after the other, a weight vector $w_t$ is found after a finite number of steps $t$ which can separate the two sets.
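A quick illustration of the proposition, reusing the perceptron_learning sketch above (the AND function serves as an assumed, linearly separable example):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])   # AND: linearly separable, so updates must stop

w = perceptron_learning(X, y)
preds = (np.hstack([np.ones((4, 1)), X]) @ w >= 0).astype(int)
print(preds)                 # -> [0 0 0 1], found after finitely many updates
```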
Linearly Separable Boolean Functions
So what do we do about functions which are not linearly separable?
One example is the XOR function: a single perceptron would need $w_0 < 0$ (for input $(0,0)$), $w_0 + w_1 \geq 0$ and $w_0 + w_2 \geq 0$ (for $(1,0)$ and $(0,1)$), yet $w_0 + w_1 + w_2 < 0$ (for $(1,1)$) - and the first three conditions together contradict the fourth, so no choice of weights works.
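A brute-force illustration of this contradiction (the grid of candidate weights and its resolution are assumptions; a failed sweep is a demonstration, while the algebraic argument above is the actual proof):

```python
import numpy as np
from itertools import product

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])   # XOR

grid = np.linspace(-2, 2, 21)
separable = any(
    np.array_equal((w0 + X @ np.array([w1, w2]) >= 0).astype(int), y)
    for w0, w1, w2 in product(grid, repeat=3)
)
print(separable)             # -> False: no (w0, w1, w2) on the grid implements XOR
```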
Summary