01 Halfspaces Perceptron
PAGE 3
Feature extraction Function approximation
The Perceptron algorithm learns a classifier using linear combinations of features
PAGE 4
Aims
At the end of the lecture, we should be able to:
★ Identify the components of a dataset required for supervised learning.
★ Interpret the separating hyperplane hypothesis class geometrically.
★ Implement the Perceptron algorithm and list its properties.
★ Reproduce Novikoff’s proof of the Perceptron convergence theorem.
PAGE 5
Lecture Outline
I. What is needed in order to learn a classifier?
The structure of observations and hypotheses
II. How can we learn a hypothesis from data?
The Perceptron Algorithm
III. Why does this work?
Convergence analysis and other properties
IV. Summary + Housekeeping
PAGE 7
A motivating example: predicting whether you’ll pass a class
PAGE 8
A motivating example: predicting whether you’ll pass a class
PAGE 9
Divination effort dataset
PAGE 10
The Binary Classification Problem
PAGE 11
Exploring the “divination effort” dataset
PAGE 12
The slope-intercept form for a line is inconvenient
PAGE 13
A line defines a hyperplane, or affine set, in ℝ²
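Only the slide title survives extraction; as a reminder, the standard way to write this set (using the weight vector w and offset b that appear on the following slides) is
\[
H \;=\; \{\, \mathbf{x} \in \mathbb{R}^{2} \;:\; \langle \mathbf{w}, \mathbf{x} \rangle + b = 0 \,\}, \qquad \mathbf{w} \neq \mathbf{0},
\]
an affine set: a line which, unlike the slope-intercept form, can also represent vertical lines.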
PAGE 14
The notations and meaning of inner product
\[
\begin{bmatrix} x_1 & x_2 & x_3 & x_4 \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \end{bmatrix}
\;=\; \langle \mathbf{w}, \mathbf{x} \rangle \;=\; \mathbf{w}^{\top}\mathbf{x} \;=\; \sum_{i} w_i x_i \;=\; -b
\]
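As a small illustration (not from the slides; the vector values are arbitrary), the common notations for the inner product all compute the same number:

```python
import numpy as np

# Example weight and feature vectors (arbitrary illustrative values).
w = np.array([0.5, -1.0, 2.0, 0.25])
x = np.array([1.0, 3.0, 0.5, 4.0])

# Three equivalent ways to compute the inner product <w, x> = w^T x.
dot_call = np.dot(w, x)                            # library call
matmul   = w @ x                                   # matrix-multiplication operator
by_hand  = sum(wi * xi for wi, xi in zip(w, x))    # explicit sum_i w_i x_i

assert np.isclose(dot_call, matmul) and np.isclose(matmul, by_hand)
print(dot_call)  # a point x lies on the hyperplane exactly when this equals -b
```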
PAGE 15
From hyperplanes to halfspaces
PAGE 16
The separating hyperplane hypothesis class
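Only the slide title survives extraction, so for reference here is the standard formalization of the separating hyperplane (halfspace) hypothesis class:
\[
\mathcal{H} \;=\; \bigl\{\, h_{\mathbf{w},b} : \mathbf{x} \mapsto \operatorname{sign}\bigl(\langle \mathbf{w}, \mathbf{x} \rangle + b\bigr) \;\bigm|\; \mathbf{w} \in \mathbb{R}^{d},\; b \in \mathbb{R} \,\bigr\}.
\]
Each hypothesis labels the halfspace on one side of its hyperplane positive and the other side negative.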
PAGE 17
Biological interpretation
PAGE 18
Lecture Outline
I. What is needed in order to learn a classifier?
The structure of observations and hypotheses
II. How can we learn a hypothesis from data?
The Perceptron Algorithm
III. Why does this work?
Convergence analysis and other properties
IV. Summary + Housekeeping
PAGE 19
Statistical (Batch) Learning
PAGE 20
Online Learning
PAGE 21
The Perceptron Algorithm
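The slide carries only the title, so here is a minimal sketch of the standard Perceptron update rule in Python (function and variable names are illustrative, not from the lecture):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Train a halfspace classifier sign(<w, x> + b) with the Perceptron rule.

    X: (n, d) array of feature vectors; y: (n,) array of labels in {-1, +1}.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i + b) <= 0:   # misclassified (or exactly on the boundary)
                w += y_i * x_i             # w <- w + y_i * x_i
                b += y_i                   # b <- b + y_i
                mistakes += 1
        if mistakes == 0:                  # a full pass with no errors: converged
            break
    return w, b
```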
PAGE 22
Get ready to watch the Perceptron algorithm in action
[Figure: the Perceptron on the dataset, plotted over the features (x1, x2); positive and negative examples are shown together with the current parameters and the old vs. new decision boundary after an update.]
PAGE 23
PAUSE (1 min)
write down your predictions
PAGE 24
The Perceptron algorithm in Action
PAGE 25
Lecture Outline
I. What is needed in order to learn?
The structure of observations and hypotheses
II. How can we learn a hypothesis from data?
The Perceptron Algorithm
III. Why does this work?
Convergence analysis and other properties
IV. Summary + Housekeeping
PAGE 26
The Perceptron convergence theorem (informal)
PAGE 27
How can we define linear separability?
PAGE 28
The padding trick simplifies analysis
\[
\begin{bmatrix} x_1 & x_2 & x_3 & x_4 \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \end{bmatrix} + b
\;=\;
\begin{bmatrix} x_1 & x_2 & x_3 & x_4 & 1 \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \\ b \end{bmatrix}
\]
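The same trick in code, as a small illustration (array names and values are assumptions):

```python
import numpy as np

x = np.array([1.0, 3.0, 0.5, 4.0])     # original features
w = np.array([0.5, -1.0, 2.0, 0.25])   # original weights
b = -2.0                               # bias term

x_pad = np.append(x, 1.0)              # pad the features with a constant 1
w_pad = np.append(w, b)                # absorb the bias into the weight vector

assert np.isclose(w @ x + b, w_pad @ x_pad)  # same affine function, now purely linear
```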
PAGE 29
Biological interpretation
PAGE 30
Biological interpretation with padding
PAGE 31
Linear separability and the margin, 𝛾, of a dataset, D
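Only the slide title survives extraction; the standard definition it refers to is: for a dataset D and a unit-norm separator w* (using padded feature vectors, so no explicit bias term),
\[
\gamma \;=\; \min_{(\mathbf{x}_i, y_i) \in D} \; y_i\, \langle \mathbf{w}^{*}, \mathbf{x}_i \rangle , \qquad \|\mathbf{w}^{*}\| = 1,
\]
and D is linearly separable exactly when some such w* achieves γ > 0.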
PAGE 32
The Oracle Vector
PAGE 33
Linear separability and the margin, 𝛾, of a dataset, D
PAGE 34
Finite number of errors on linearly separable data
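The formal claim behind this slide is the classic Novikoff bound; stated here for reference in its standard form, with R an upper bound on the norms of the (padded) feature vectors:
\[
\text{if } \|\mathbf{x}_i\| \le R \text{ for all } i \text{ and } D \text{ has margin } \gamma > 0,
\text{ then the Perceptron makes at most } \left(\tfrac{R}{\gamma}\right)^{2} \text{ updates.}
\]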
PAGE 35
A proof in two parts (Novikoff, 1962)
I. Do updates necessarily result in progress?
II. Will it ever stop?
PAGE 36
A proof in two parts (Novikoff, 1962)
I. Do updates necessarily result in progress?
II. Will it ever stop?
PAGE 37
Do updates necessarily result in progress?
PAGE 38
Do updates necessarily result in progress?
PAGE 39
Do updates necessarily result in progress?
PAGE 40
Do updates necessarily result in progress?
Yes. The weight vector increases its alignment with the oracle at every update.
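The calculation behind this claim is the standard one (w_t denotes the weights before the t-th update, w* the unit-norm oracle, and the weights are assumed to start at zero):
\[
\langle \mathbf{w}_{t+1}, \mathbf{w}^{*} \rangle
\;=\; \langle \mathbf{w}_{t} + y_i \mathbf{x}_i,\; \mathbf{w}^{*} \rangle
\;=\; \langle \mathbf{w}_{t}, \mathbf{w}^{*} \rangle + y_i \langle \mathbf{x}_i, \mathbf{w}^{*} \rangle
\;\ge\; \langle \mathbf{w}_{t}, \mathbf{w}^{*} \rangle + \gamma ,
\]
so after T updates \(\langle \mathbf{w}_{T+1}, \mathbf{w}^{*} \rangle \ge T\gamma\).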
PAGE 41
A proof in two parts (Novikoff, 1962)
I. Do updates necessarily result in progress?
Yes. The weight vector increases its alignment with the oracle at every update.
PAGE 42
A proof in two parts (Novikoff, 1962)
I. Do updates necessarily result in progress?
Yes. The weight vector increases its alignment with the oracle at every update.
PAGE 43
Will it ever stop?
Can now be interpreted as: “Is there an upper bound on the norm of the parameter vector?”
PAGE 44
Will it ever stop?
Upper bound:
Lower bound:
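The formulas on this slide did not survive extraction; in the standard argument they read as follows. Upper bound: an update is made only when \(y_i \langle \mathbf{w}_t, \mathbf{x}_i \rangle \le 0\), so
\[
\|\mathbf{w}_{t+1}\|^{2} \;=\; \|\mathbf{w}_{t}\|^{2} + 2\, y_i \langle \mathbf{w}_{t}, \mathbf{x}_i \rangle + \|\mathbf{x}_i\|^{2} \;\le\; \|\mathbf{w}_{t}\|^{2} + R^{2},
\qquad\text{hence}\qquad \|\mathbf{w}_{T+1}\| \le \sqrt{T}\, R .
\]
Lower bound: from the progress step above, \(T\gamma \le \langle \mathbf{w}_{T+1}, \mathbf{w}^{*} \rangle \le \|\mathbf{w}_{T+1}\|\) by Cauchy–Schwarz (since \(\|\mathbf{w}^{*}\| = 1\)). Combining the two,
\[
T\gamma \;\le\; \sqrt{T}\, R \quad\Longrightarrow\quad T \;\le\; \left(\frac{R}{\gamma}\right)^{2},
\]
so the algorithm stops after a finite number of updates.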
PAGE 45
Finite number of errors on linearly separable data
PAGE 46
QUICK BREAK (2 mins)
go over your notes or discuss with your neighbor
PAGE 47
A few more properties
▪ The solution found by the Perceptron algorithm is not unique
▪ There are infinitely many solutions
▪ No guarantee of optimality, in terms of maximizing the margin
▪ The Perceptron algorithm will not converge if data are not linearly separable
▪ The algorithm will never halt; it will cycle
▪ The algorithm is inappropriate for such problems
▪ Multiple valid termination conditions (see the sketch after this list)
▪ Weights have stopped changing
▪ Exhausted some update budget
▪ Error on the training or validation set has stopped decreasing
▪ There are different strategies for controlling the order of arrival of samples
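A sketch of how these termination conditions can be combined in practice (illustrative only; the update budget and the no-change check are assumptions, not prescribed by the slides):

```python
import numpy as np

def perceptron_with_budget(X, y, max_updates=1000):
    """Perceptron loop that also terminates on non-separable data.

    Stops when (a) a full pass makes no mistakes (the weights stop changing),
    or (b) an update budget is exhausted.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    updates = 0
    while updates < max_updates:
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i + b) <= 0:
                w += y_i * x_i
                b += y_i
                mistakes += 1
                updates += 1
                if updates >= max_updates:   # budget exhausted (data may not be separable)
                    return w, b
        if mistakes == 0:                    # weights stopped changing: converged
            return w, b
    return w, b
```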
PAGE 48
Multiclass classification
Yu, Xiaoqun, Jaehyuk Jang, and Shuping Xiong. Frontiers in Aging Neuroscience 13 (2021): 692865.
PAGE 49
Learning a multiclass classifier with Perceptron
One vs all
▪ Train a classifier for each class
▪ Output: arg maxᵢ ⟨wᵢ, x⟩ (see the sketch after this list)
One vs. one
▪ Train a classifier for each pair of classes
▪ e.g. if 4 classes, 6 possible pairs
▪ Output: Majority vote
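A minimal sketch of one-vs-all prediction with per-class Perceptron weights (names are illustrative; each wₖ, bₖ is assumed to be trained exactly as in the binary case, with class k as +1 and all other classes as −1):

```python
import numpy as np

def one_vs_all_predict(W, b, x):
    """Predict a class label from K binary halfspace classifiers.

    W: (K, d) array with one weight vector per class; b: (K,) array of biases.
    Returns the index of the class whose score <w_k, x> + b_k is largest.
    """
    scores = W @ x + b          # score of each class on input x
    return int(np.argmax(scores))
```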
PAGE 50
Lecture Outline
I. What is needed in order to learn?
The structure of observations and hypotheses
II. How can we learn a hypothesis from data?
The Perceptron Algorithm
III. Why does this work?
Convergence analysis and other properties
IV. Summary + Housekeeping
PAGE 51
Aims
We should now be able to:
✓ Identify the components of a dataset required for supervised learning.
✓ Interpret the separating hyperplane hypothesis class geometrically.
✓ Implement the Perceptron algorithm and list its properties.
✓ Reproduce Novikoff’s proof of the Perceptron convergence theorem.
PAGE 52
PAGE 53
On the horizon
posting today →
examples are up →