Machine Learning
Lecture 2
Review of basic concepts
‣ Feature vectors, labels
‣ Training set
‣ Classifier
‣ Training error
‣ Test error
‣ Set of classifiers
Review: training set
[Figure: training points in the (x1, x2) plane, labeled + and −]
Review: a classifier
[Figure: a decision boundary in the (x1, x2) plane splitting it into regions h(x) = +1 and h(x) = −1]
Review: test set
[Figure: unlabeled test points (?) shown alongside the training points and the regions h(x) = +1 and h(x) = −1]
This lecture
‣ The set of linear classifiers
‣ Linear separation
‣ Perceptron algorithm
Linear classifiers
[Figure: a linear decision boundary in the (x1, x2) plane]
Linear classifiers through origin
[Figure: a linear decision boundary through the origin in the (x1, x2) plane]
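The handwritten decision rules on these slides did not survive extraction; as a sketch, assuming the standard form h(x; θ, θ_0) = sign(θ · x + θ_0), with θ_0 = 0 for the through-origin case (names below are illustrative):

```python
import numpy as np

def predict(x, theta, theta_0=0.0):
    """Linear classifier h(x; theta, theta_0) = sign(theta . x + theta_0).
    With the default theta_0 = 0 this is a classifier through the origin."""
    return 1 if np.dot(theta, x) + theta_0 > 0 else -1
```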
Linear separation: examples
[Figures: example training sets in the (x1, x2) plane illustrating when a linear boundary can separate the labels]
Linear separation
Definition: Training examples S_n = {(x^(i), y^(i)), i = 1, …, n} are
linearly separable if there exists a parameter vector θ̂ and
offset parameter θ̂_0 such that y^(i)(θ̂ · x^(i) + θ̂_0) > 0 for all
i = 1, …, n.
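A direct check of this definition for a candidate pair (θ̂, θ̂_0), as a minimal sketch (the helper name and toy data are made up):

```python
import numpy as np

def is_separated_by(X, y, theta, theta_0):
    """True iff y^(i) (theta . x^(i) + theta_0) > 0 for every i.
    X: (n, d) feature vectors, y: (n,) labels in {-1, +1}."""
    return bool(np.all(y * (X @ theta + theta_0) > 0))

# Toy data split by the line x1 + x2 = 1:
X = np.array([[2.0, 1.0], [1.0, 1.0], [0.0, 0.0], [-1.0, 0.5]])
y = np.array([1, 1, -1, -1])
print(is_separated_by(X, y, np.array([1.0, 1.0]), -1.0))  # True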
Learning linear classifiers
‣ Training error for a linear classifier (through origin)
Learning linear classifiers
‣ Training error for a linear classifier
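The handwritten formulas on these two slides were lost in extraction. The standard training error counts the fraction of misclassified examples, E_n(θ, θ_0) = (1/n) Σ_i [[ y^(i)(θ · x^(i) + θ_0) ≤ 0 ]], where [[·]] is 1 if its argument holds and 0 otherwise; setting θ_0 = 0 gives the through-origin case. A minimal sketch under that assumption:

```python
import numpy as np

def training_error(X, y, theta, theta_0=0.0):
    """Fraction of examples with y^(i)(theta . x^(i) + theta_0) <= 0.
    theta_0 = 0 recovers the through-origin case. Illustrative names."""
    mistakes = y * (X @ theta + theta_0) <= 0
    return float(np.mean(mistakes))
```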
Learning algorithm: perceptron
Algorithm 1 Perceptron Algorithm (without offset)
procedure Perceptron({(x^(i), y^(i)), i = 1, …, n}, T)
    θ = 0 (vector)
    for t = 1, …, T do
        for i = 1, …, n do
            if y^(i) (θ · x^(i)) ≤ 0 then
                θ = θ + y^(i) x^(i)
    return θ
We should first establish that the perceptron updates tend to correct
mistakes. To see this, consider a simple two-dimensional example (figure 4),
with points chosen such that the algorithm makes a mistake on each of them
during the first pass. As a result, the updates become θ(0) = 0,
θ(1) = θ(0) + y^(1) x^(1), and so on.
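A runnable version of Algorithm 1, as a minimal sketch (the array-based interface is illustrative, not from the slides):

```python
import numpy as np

def perceptron(X, y, T):
    """Perceptron without offset (Algorithm 1). X: (n, d) features,
    y: (n,) labels in {-1, +1}, T: number of passes over the data."""
    n, d = X.shape
    theta = np.zeros(d)
    for t in range(T):
        for i in range(n):
            # Mistake (or boundary case): nudge theta toward the example.
            if y[i] * np.dot(theta, X[i]) <= 0:
                theta = theta + y[i] * X[i]
    return theta
```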
Perceptron algorithm: example
[Figure: perceptron updates on a toy training set in the (x1, x2) plane]
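The example on this slide is a figure; the following toy run (the data points are made up for illustration) prints the sequence of updates starting from θ(0) = 0, in the spirit of the note above:

```python
import numpy as np

# Made-up toy data; with theta initialized to zero, the first example
# always triggers an update, since y * (theta . x) = 0 counts as a mistake.
X = np.array([[1.0, 2.0], [-2.0, 1.0], [-1.0, -2.0]])
y = np.array([1, 1, -1])

theta = np.zeros(2)
for t in range(2):  # two passes suffice on this data
    for i in range(len(y)):
        if y[i] * np.dot(theta, X[i]) <= 0:
            theta = theta + y[i] * X[i]
            print(f"mistake on example {i + 1}: theta -> {theta}")
```

Here θ goes 0 → [1, 2] → [-1, 3], after which every example satisfies y^(i)(θ · x^(i)) > 0 and no further updates occur.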
Perceptron (with offset)
The perceptron algorithm and the above statements about convergence extend
to the case with the offset parameter.
Algorithm 2 Perceptron Algorithm (with offset)
procedure Perceptron({(x^(i), y^(i)), i = 1, …, n}, T)
    θ = 0 (vector), θ_0 = 0 (scalar)
    for t = 1, …, T do
        for i = 1, …, n do
            if y^(i) (θ · x^(i) + θ_0) ≤ 0 then
                θ = θ + y^(i) x^(i)
                θ_0 = θ_0 + y^(i)
    return θ, θ_0
Why is the offset parameter updated in this way? Think of it as adding
an additional coordinate that is set to 1 for all examples, mapping
our examples x ∈ R^d to x′ ∈ R^(d+1) such that x′ = [x_1, …, x_d, 1].
Running the perceptron without offset on these augmented examples updates
the last coordinate of θ′ exactly as Algorithm 2 updates θ_0.
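A runnable version of Algorithm 2, as a minimal sketch; the comment marks where the augmented-coordinate view explains the θ_0 update:

```python
import numpy as np

def perceptron_with_offset(X, y, T):
    """Perceptron with offset (Algorithm 2). X: (n, d) features,
    y: (n,) labels in {-1, +1}, T: number of passes. Illustrative names."""
    n, d = X.shape
    theta, theta_0 = np.zeros(d), 0.0
    for t in range(T):
        for i in range(n):
            if y[i] * (np.dot(theta, X[i]) + theta_0) <= 0:
                theta = theta + y[i] * X[i]
                # Same as updating the last coordinate of theta' when
                # each x is augmented with a constant coordinate 1.
                theta_0 = theta_0 + y[i]
    return theta, theta_0
```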
Key things to understand
‣ Parametric families (sets) of classifiers
‣ The set of linear classifiers
‣ Linear separation
‣ Perceptron algorithm