Machine Learning:
Perceptrons
Prof. Dr. Martin Riedmiller
Albert-Ludwigs-University Freiburg
AG Maschinelles Lernen
Machine Learning: Perceptrons p.1/24
Neural Networks
The human brain has approximately $10^{11}$ neurons
switching time approx. 0.001 s (computer: approx. $10^{-10}$ s)
connections per neuron: $10^4$ to $10^5$
approx. 0.1 s for face recognition, i.e. at most about 100 sequential computation steps
⇒ massive parallelism
additionally: robustness, distributedness
ML aspects: use biology as an inspiration for artificial neural models and algorithms; do not try to explain biology: technically imitate and exploit its capabilities
Machine Learning: Perceptrons p.2/24
Biological Neurons
Dendrites input information to the cell
The neuron fires (has an action potential) if a certain voltage threshold is exceeded
Information is output via the axon
The axon is connected to dendrites of other cells via synapses
Learning corresponds to adaptation of the efficiency of a synapse, i.e. of the synaptic weight
[Figure: biological neuron with dendrites, soma, axon, and synapses]
Machine Learning: Perceptrons p.3/24
Historical ups and downs
[Figure: timeline (1950–2000) of the ups and downs of neural network research; milestones recoverable from the slide include: 1942 artificial neurons (McCulloch/Pitts), 1949 Hebbian learning (Hebb), 1958 Rosenblatt perceptron, 1960 Lernmatrix/Adaline (Widrow/Hoff), 1969 "Perceptrons" (Minsky/Papert), 1970 evolutionary algorithms (Rechenberg), 1972 self-organizing maps (Kohonen), 1982 Hopfield networks (Hopfield), 1986 backpropagation, 1992 support vector machines, boosting, computational learning theory]
Machine Learning: Perceptrons p.4/24
Perceptrons: adaptive neurons
perceptrons (Rosenblatt 1958, Minsky/Papert 1969) are generalized variants of an earlier, simpler model (McCulloch/Pitts neurons, 1942):
- inputs are weighted
- weights are real numbers (positive and negative)
- no special inhibitory inputs
a perceptron with $n$ inputs is described by a weight vector $\vec w = (w_1, \dots, w_n)^T \in \mathbb{R}^n$ and a threshold $\theta \in \mathbb{R}$. It calculates the following function:
$$(x_1, \dots, x_n)^T \mapsto y = \begin{cases} 1 & \text{if } x_1 w_1 + x_2 w_2 + \dots + x_n w_n \ge \theta \\ 0 & \text{if } x_1 w_1 + x_2 w_2 + \dots + x_n w_n < \theta \end{cases}$$
Machine Learning: Perceptrons p.5/24
Perceptrons: adaptive neurons
(cont.)
for convenience: replace the threshold by an additional weight (bias weight) $w_0 = -\theta$. A perceptron with weight vector $\vec w$ and bias weight $w_0$ performs the following calculation:
$$(x_1, \dots, x_n)^T \mapsto y = f_{step}\Big(w_0 + \sum_{i=1}^{n} w_i x_i\Big) = f_{step}(w_0 + \langle \vec w, \vec x \rangle)$$
with
$$f_{step}(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{if } z < 0 \end{cases}$$
[Figure: perceptron unit with constant input 1 weighted by $w_0$ and inputs $x_1, \dots, x_n$ weighted by $w_1, \dots, w_n$]
Machine Learning: Perceptrons p.6/24
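A minimal NumPy sketch of this computation (the function names are illustrative, not from the slides):

```python
import numpy as np

def f_step(z):
    """Threshold activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def perceptron_output(w, w0, x):
    """Output of a perceptron with weight vector w and bias weight w0."""
    return f_step(w0 + np.dot(w, x))

# example: w = (1, 1), w0 = -0.5 outputs 1 exactly when x1 + x2 >= 0.5
w, w0 = np.array([1.0, 1.0]), -0.5
print(perceptron_output(w, w0, np.array([0.0, 1.0])))  # -> 1
print(perceptron_output(w, w0, np.array([0.0, 0.0])))  # -> 0
```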
Perceptrons: adaptive neurons
(cont.)
geometric interpretation of a perceptron:
- input patterns $(x_1, \dots, x_n)$ are points in $n$-dimensional space
- points with $w_0 + \langle \vec w, \vec x \rangle = 0$ lie on a hyperplane defined by $w_0$ and $\vec w$
- points with $w_0 + \langle \vec w, \vec x \rangle > 0$ lie above the hyperplane
- points with $w_0 + \langle \vec w, \vec x \rangle < 0$ lie below the hyperplane
- perceptrons partition the input space into two halfspaces along a hyperplane
[Figures: separating hyperplane with upper and lower halfspace, shown in two dimensions ($x_1$, $x_2$) and in three dimensions ($x_1$, $x_2$, $x_3$)]
Machine Learning: Perceptrons p.7/24
Perceptron learning problem
perceptrons can automatically adapt to example data ⇒ supervised learning: classification
perceptron learning problem:
given:
- a set of input patterns $P \subseteq \mathbb{R}^n$, called the set of positive examples
- another set of input patterns $N \subseteq \mathbb{R}^n$, called the set of negative examples
task:
- generate a perceptron that yields 1 for all patterns from $P$ and 0 for all patterns from $N$
obviously, there are cases in which the learning task is unsolvable, e.g. if $P \cap N \neq \emptyset$
Machine Learning: Perceptrons p.8/24
Perceptron learning problem
(cont.)
Lemma (strict separability):
Whenever there exists a perceptron that classifies all training patterns accurately, there is also a perceptron that classifies all training patterns accurately and no training pattern is located on the decision boundary, i.e. $w_0 + \langle \vec w, \vec x \rangle \neq 0$ for all training patterns.
Proof:
Let $(\vec w, w_0)$ be a perceptron that classifies all patterns accurately. Hence,
$$\langle \vec w, \vec x \rangle + w_0 \begin{cases} \ge 0 & \text{for all } \vec x \in P \\ < 0 & \text{for all } \vec x \in N \end{cases}$$
Define $\varepsilon := \min\{ -(\langle \vec w, \vec x \rangle + w_0) \mid \vec x \in N \} > 0$. Then:
$$\langle \vec w, \vec x \rangle + w_0 + \frac{\varepsilon}{2} \begin{cases} \ge \frac{\varepsilon}{2} > 0 & \text{for all } \vec x \in P \\ \le -\frac{\varepsilon}{2} < 0 & \text{for all } \vec x \in N \end{cases}$$
(for $\vec x \in N$, the definition of $\varepsilon$ gives $\langle \vec w, \vec x \rangle + w_0 \le -\varepsilon$, so adding $\varepsilon/2$ keeps the value strictly negative)
Thus, the perceptron $(\vec w, w_0 + \frac{\varepsilon}{2})$ proves the lemma.
Machine Learning: Perceptrons p.9/24
Perceptron learning algorithm:
idea
assume the perceptron makes an error on a pattern $\vec x \in P$: $\langle \vec w, \vec x \rangle + w_0 < 0$
how can we change $\vec w$ and $w_0$ to avoid this error? We need to increase $\langle \vec w, \vec x \rangle + w_0$:
- increase $w_0$
- if $x_i > 0$, increase $w_i$
- if $x_i < 0$ (negative influence), decrease $w_i$
perceptron learning algorithm: in this case, add $\vec x$ to $\vec w$ and add 1 to $w_0$. Errors on negative patterns are handled analogously (subtract $\vec x$ from $\vec w$, subtract 1 from $w_0$).
[Figures: geometric interpretation in ($x_1$, $x_2$, $x_3$)-space of increasing $w_0$ and of modifying $\vec w$ by adding $\vec x$]
Machine Learning: Perceptrons p.10/24
Perceptron learning algorithm
Require: a set of positive training patterns $P$ and a set of negative training patterns $N$
Ensure: if one exists, a perceptron is learned that classifies all patterns accurately
1: initialize weight vector $\vec w$ and bias weight $w_0$ arbitrarily
2: while there exists a misclassified pattern $\vec x \in P \cup N$ do
3:   if $\vec x \in P$ then
4:     $\vec w \leftarrow \vec w + \vec x$
5:     $w_0 \leftarrow w_0 + 1$
6:   else
7:     $\vec w \leftarrow \vec w - \vec x$
8:     $w_0 \leftarrow w_0 - 1$
9:   end if
10: end while
11: return $\vec w$ and $w_0$
Machine Learning: Perceptrons p.11/24
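A runnable Python/NumPy sketch of this pseudocode (assumptions not in the pseudocode: zero initialization instead of an arbitrary one, and a `max_iter` safeguard so the loop also stops on non-separable data):

```python
import numpy as np

def train_perceptron(P, N, max_iter=10_000):
    """Sketch of the perceptron learning algorithm above.
    P, N: sequences of positive / negative input patterns (length-n vectors).
    Returns (w, w0), or None if max_iter is exceeded (e.g. not separable)."""
    patterns = [(np.asarray(x, float), +1) for x in P] + \
               [(np.asarray(x, float), -1) for x in N]
    n = len(patterns[0][0])
    w, w0 = np.zeros(n), 0.0          # "arbitrary" initialization: zero vector

    for _ in range(max_iter):
        misclassified = [(x, sign) for x, sign in patterns
                         if (sign > 0) != (np.dot(w, x) + w0 >= 0)]
        if not misclassified:
            return w, w0              # all patterns classified correctly
        x, sign = misclassified[0]    # pick some misclassified pattern
        w, w0 = w + sign * x, w0 + sign
    return None
```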
Perceptron learning algorithm:
example
$N = \{(1, 0)^T, (1, 1)^T\}$, $P = \{(0, 1)^T\}$
⇒ exercise
Machine Learning: Perceptrons p.12/24
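One way to check the exercise numerically is to run the `train_perceptron` sketch from the previous slide on these sets; the problem is linearly separable, so the loop terminates, though the exact weights found depend on the initialization and on the order in which misclassified patterns are visited:

```python
P = [np.array([0.0, 1.0])]
N = [np.array([1.0, 0.0]), np.array([1.0, 1.0])]

w, w0 = train_perceptron(P, N)
print("learned weights:", w, "bias weight:", w0)
for x in P + N:
    print(x, "->", 1 if np.dot(w, x) + w0 >= 0 else 0)  # 1 for P, 0 for N
```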
Perceptron learning algorithm:
convergence
Lemma (correctness of perceptron learning):
Whenever the perceptron learning algorithm terminates, the perceptron given by $(\vec w, w_0)$ classifies all patterns accurately.
Proof: follows immediately from the algorithm.
Theorem (termination of perceptron learning):
Whenever there exists a perceptron that classifies all training patterns correctly, the perceptron learning algorithm terminates.
Proof:
For simplification we add the bias weight to the weight vector, i.e. $\vec w = (w_0, w_1, \dots, w_n)^T$, and prepend 1 to all patterns, i.e. $\vec x = (1, x_1, \dots, x_n)^T$. We denote by $\vec w^{(t)}$ the weight vector in the $t$-th iteration of perceptron learning and by $\vec x^{(t)}$ the pattern used in the $t$-th iteration.
Machine Learning: Perceptrons p.13/24
Perceptron learning algorithm:
convergence proof (cont.)
Let $\vec w^*$ be a weight vector that strictly classifies all training patterns. Then
$$\langle \vec w^*, \vec w^{(t+1)} \rangle = \langle \vec w^*, \vec w^{(t)} \pm \vec x^{(t)} \rangle = \langle \vec w^*, \vec w^{(t)} \rangle \pm \langle \vec w^*, \vec x^{(t)} \rangle \ge \langle \vec w^*, \vec w^{(t)} \rangle + \delta$$
with
$$\delta := \min\big(\{\langle \vec w^*, \vec x \rangle \mid \vec x \in P\} \cup \{-\langle \vec w^*, \vec x \rangle \mid \vec x \in N\}\big)$$
$\delta > 0$ since $\vec w^*$ strictly classifies all patterns.
Hence,
$$\langle \vec w^*, \vec w^{(t+1)} \rangle \ge \langle \vec w^*, \vec w^{(0)} \rangle + (t+1)\,\delta$$
Machine Learning: Perceptrons p.14/24
Perceptron learning algorithm:
convergence proof (cont.)
$$\|\vec w^{(t+1)}\|^2 = \langle \vec w^{(t+1)}, \vec w^{(t+1)} \rangle = \langle \vec w^{(t)} \pm \vec x^{(t)}, \vec w^{(t)} \pm \vec x^{(t)} \rangle = \|\vec w^{(t)}\|^2 \pm 2\,\langle \vec w^{(t)}, \vec x^{(t)} \rangle + \|\vec x^{(t)}\|^2 \le \|\vec w^{(t)}\|^2 + \varepsilon$$
with
$$\varepsilon := \max\{\|\vec x\|^2 \mid \vec x \in P \cup N\}$$
(the term $\pm 2\,\langle \vec w^{(t)}, \vec x^{(t)} \rangle$ is non-positive because $\vec x^{(t)}$ was misclassified by $\vec w^{(t)}$)
Hence,
$$\|\vec w^{(t+1)}\|^2 \le \|\vec w^{(0)}\|^2 + (t+1)\,\varepsilon$$
Machine Learning: Perceptrons p.15/24
Perceptron learning algorithm:
convergence proof (cont.)
$$\cos \angle(\vec w^*, \vec w^{(t+1)}) = \frac{\langle \vec w^*, \vec w^{(t+1)} \rangle}{\|\vec w^*\| \cdot \|\vec w^{(t+1)}\|} \ge \frac{\langle \vec w^*, \vec w^{(0)} \rangle + (t+1)\,\delta}{\|\vec w^*\| \cdot \sqrt{\|\vec w^{(0)}\|^2 + (t+1)\,\varepsilon}}$$
Since $\cos \angle(\vec w^*, \vec w^{(t+1)}) \le 1$, $t$ must be bounded above. $\square$
Machine Learning: Perceptrons p.16/24
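To make the bound explicit (an addition, not spelled out on the slide): assuming the common initialization $\vec w^{(0)} = \vec 0$, the last inequality becomes
$$1 \ge \frac{(t+1)\,\delta}{\|\vec w^*\|\,\sqrt{(t+1)\,\varepsilon}} \quad\Longrightarrow\quad \sqrt{t+1} \le \frac{\|\vec w^*\|\,\sqrt{\varepsilon}}{\delta} \quad\Longrightarrow\quad t+1 \le \frac{\|\vec w^*\|^2\,\varepsilon}{\delta^2},$$
so the number of weight updates is bounded by $\|\vec w^*\|^2 \varepsilon / \delta^2$.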
Perceptron learning algorithm:
convergence
Lemma (worst-case running time):
If the given problem is solvable, perceptron learning terminates after at most $(n+1)^2\,2^{(n+1)\log(n+1)}$ iterations.
[Figure: plot of this worst-case bound as a function of $n$, y-axis up to 8e+07]
Exponential running time is a problem of the perceptron learning algorithm. There are algorithms that solve the problem with complexity $O(n^{7/2})$.
Machine Learning: Perceptrons p.17/24
Perceptron learning algorithm:
cycle theorem
Lemma:
If a weight vector occurs twice during perceptron learning, the given task is not solvable. (Remark: here, "weight vector" means the extended variant that also contains $w_0$.)
Proof: next slide
Lemma:
Starting the perceptron learning algorithm with weight vector $\vec 0$ on an unsolvable problem, at least one weight vector will occur twice.
Proof: omitted, see Minsky/Papert, "Perceptrons".
Machine Learning: Perceptrons p.18/24
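Taken together, the two lemmas suggest a simple unsolvability test when starting from the zero vector: remember every extended weight vector seen and stop as soon as one repeats. A Python sketch of this idea (the function name and loop structure are illustrative; for solvable problems it terminates with a correct perceptron by the convergence theorem):

```python
def perceptron_is_solvable(P, N):
    """Run perceptron learning from the zero vector; return False as soon as an
    extended weight vector repeats (task unsolvable by the cycle lemmas),
    True once no pattern is misclassified."""
    n = len((P + N)[0])
    w = [0.0] * (n + 1)                                  # extended vector (w0, w1, ..., wn)
    seen = {tuple(w)}
    while True:
        update = None
        for x, sign in [(p, +1) for p in P] + [(q, -1) for q in N]:
            ext = [1.0] + list(x)                        # prepend the constant input 1
            s = sum(wi * xi for wi, xi in zip(w, ext))
            if (sign > 0 and s < 0) or (sign < 0 and s >= 0):
                update = (ext, sign)                     # first misclassified pattern
                break
        if update is None:
            return True                                  # a separating perceptron was found
        ext, sign = update
        w = [wi + sign * xi for wi, xi in zip(w, ext)]   # perceptron update
        if tuple(w) in seen:
            return False                                 # weight vector occurred twice
        seen.add(tuple(w))
```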
Perceptron learning algorithm:
cycle theorem
Proof:
Assume $\vec w^{(t+k)} = \vec w^{(t)}$. Meanwhile, the patterns $\vec x^{(t+1)}, \dots, \vec x^{(t+k)}$ have been applied. Without loss of generality, assume $\vec x^{(t+1)}, \dots, \vec x^{(t+q)} \in P$ and $\vec x^{(t+q+1)}, \dots, \vec x^{(t+k)} \in N$. Hence:
$$\vec w^{(t)} = \vec w^{(t+k)} = \vec w^{(t)} + \vec x^{(t+1)} + \dots + \vec x^{(t+q)} - \big(\vec x^{(t+q+1)} + \dots + \vec x^{(t+k)}\big)$$
$$\Rightarrow \quad \vec x^{(t+1)} + \dots + \vec x^{(t+q)} = \vec x^{(t+q+1)} + \dots + \vec x^{(t+k)}$$
Assume a solution $\vec w^*$ exists. Then:
$$\langle \vec w^*, \vec x^{(t+i)} \rangle \begin{cases} \ge 0 & \text{if } i \in \{1, \dots, q\} \\ < 0 & \text{if } i \in \{q+1, \dots, k\} \end{cases}$$
Hence,
$$\langle \vec w^*, \vec x^{(t+1)} + \dots + \vec x^{(t+q)} \rangle \ge 0 \quad\text{and}\quad \langle \vec w^*, \vec x^{(t+q+1)} + \dots + \vec x^{(t+k)} \rangle < 0,$$
although both sums are equal: contradiction!
Machine Learning: Perceptrons p.19/24
Perceptron learning algorithm:
Pocket algorithm
how can we determine a good perceptron if the given task cannot be solved perfectly? ("good" in the sense of: the perceptron makes a minimal number of errors)
perceptron learning: the number of errors does not decrease monotonically during learning
idea: memorize the best weight vector that has occurred so far! ⇒ Pocket algorithm
Machine Learning: Perceptrons p.20/24
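A Python sketch of this idea (the random visiting order, the update budget, and the function name are illustrative choices, not prescribed by the slide):

```python
import numpy as np

def pocket_algorithm(P, N, n_updates=1000, rng=None):
    """Sketch of the pocket algorithm: run perceptron updates and keep
    ("pocket") the weights with the fewest training errors seen so far."""
    rng = rng or np.random.default_rng(0)
    X = [(np.asarray(x, float), +1) for x in P] + [(np.asarray(x, float), -1) for x in N]
    n = len(X[0][0])
    w, w0 = np.zeros(n), 0.0

    def n_errors(w, w0):
        return sum(1 for x, sign in X
                   if (sign > 0) != (np.dot(w, x) + w0 >= 0))

    best = (w.copy(), w0, n_errors(w, w0))
    for _ in range(n_updates):
        x, sign = X[rng.integers(len(X))]              # visit patterns in random order
        if (sign > 0) != (np.dot(w, x) + w0 >= 0):     # misclassified -> perceptron update
            w, w0 = w + sign * x, w0 + sign
            e = n_errors(w, w0)
            if e < best[2]:                            # better than the pocketed weights?
                best = (w.copy(), w0, e)
    w_best, w0_best, _ = best
    return w_best, w0_best
```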
Perceptron networks
perceptrons can only learn linearly separable problems
famous counterexample: XOR$(x_1, x_2)$: $P = \{(0,1)^T, (1,0)^T\}$, $N = \{(0,0)^T, (1,1)^T\}$
networks with several perceptrons are computationally more powerful (cf. McCulloch/Pitts neurons)
let's try to find a network with two perceptrons that can solve the XOR problem:
- first step: find a perceptron that classifies three patterns accurately, e.g. $w_0 = -0.5$, $w_1 = w_2 = 1$ classifies $(0,0)^T$, $(0,1)^T$, $(1,0)^T$ correctly but fails on $(1,1)^T$
- second step: find a perceptron that uses the output of the first perceptron as an additional input. Hence, the training patterns are: $N = \{(0,0,0), (1,1,1)\}$, $P = \{(0,1,1), (1,0,1)\}$. Perceptron learning yields, e.g., $v_0 = -1$, $v_1 = v_2 = -1$, $v_3 = 2$
Machine Learning: Perceptrons p.21/24
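A short sketch that wires up the two perceptrons with the weights above and verifies the XOR truth table:

```python
def f_step(z):
    return 1 if z >= 0 else 0

def xor_network(x1, x2):
    """Two-perceptron XOR network with the weights discussed above."""
    h = f_step(-0.5 + 1 * x1 + 1 * x2)            # first perceptron: correct except on (1,1)
    return f_step(-1 - 1 * x1 - 1 * x2 + 2 * h)   # second perceptron uses h as extra input

for x1 in (0, 1):
    for x2 in (0, 1):
        print((x1, x2), "->", xor_network(x1, x2))  # prints 0, 1, 1, 0
```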
Perceptron networks
(cont.)
XOR-network:
[Figure: the two-perceptron XOR network, with $x_1$ and $x_2$ feeding the first perceptron, and $x_1$, $x_2$ and the first perceptron's output feeding the second perceptron, labeled with the weights from the previous slide]
Geometric interpretation:
[Figures: partitioning of the first perceptron; partitioning of the second perceptron assuming the first perceptron yields 0; partitioning of the second perceptron assuming the first perceptron yields 1; combining both]
Machine Learning: Perceptrons p.22/24
Historical remarks
Rosenblatt perceptron (1958):
- retinal input (array of pixels)
- preprocessing level, calculation of features
- adaptive linear classifier
- inspired by human vision
if the features are complex enough, everything can be classified
if the features are restricted (only parts of the retinal pixels are available to each feature), some interesting tasks cannot be learned (Minsky/Papert, 1969)
important idea: create features instead of learning from raw data
[Figure: retina → features → linear classifier]
Machine Learning: Perceptrons p.23/24
Summary
Perceptrons are simple neurons with limited representation capabilities: linearly separable functions only
simple but provably working learning algorithm
networks of perceptrons can overcome these limitations
working in feature space may help to overcome the limited representation capability
Machine Learning: Perceptrons p.24/24