01 Halfspaces Perceptron


CS 480/680

Introduction to Machine Learning


Lecture 1
Halfspaces and the Perceptron Algorithm
Kathryn Simone
10 September 2024
Classification: Fall detection from accelerometer data

Clockwise from top:


Yu, Xiaoqun, Jaehyuk Jang, and Shuping Xiong. Frontiers in Aging Neuroscience 13 (2021): 692865.
Voelker, Aaron, Ivana Kajić, and Chris Eliasmith. Advances in Neural Information Processing Systems 32 (2019).
Barkley and Simone 2023, unpublished.

PAGE 2
Most of ML makes use of linear methods

PAGE 3
[Figure panels: feature extraction; function approximation]
The Perceptron algorithm learns a classifier using linear combinations of features

PAGE 4
Aims
At the end of the lecture, we should be able to:
★ Identify the components of a dataset required for supervised learning.
★ Interpret the separating hyperplane hypothesis class geometrically.
★ Implement the Perceptron algorithm and list its properties.
★ Reproduce Novikoff’s proof of the Perceptron convergence theorem.

PAGE 5
Lecture Outline
I. What is needed in order to learn a classifier?
The structure of observations and hypotheses
II. How can we learn a hypothesis from data?
The Perceptron Algorithm
III. Why does this work?
Convergence analysis and other properties
IV. Summary + Housekeeping

PAGE 6
Lecture Outline
I. What is needed in order to learn a classifier?
The structure of observations and hypotheses
II. How can we learn a hypothesis from data?
The Perceptron Algorithm
III. Why does this work?
Convergence analysis and other properties
IV. Summary + Housekeeping

PAGE 7
A motivating example: predicting whether you’ll pass a class

PAGE 8
A motivating example: predicting whether you’ll pass a class

PAGE 9
Divination effort dataset

PAGE 10
The Binary Classification Problem

PAGE 11
Exploring the “divination effort” dataset

PAGE 12
The slope-intercept form for a line is inconvenient

PAGE 13
A line defines a hyperplane, or affine set, in ℝ²

PAGE 14
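One way to make the connection concrete (the slide presumably shows this graphically): rewriting the slope-intercept form $x_2 = m x_1 + c$ as a single linear constraint treats both coordinates symmetrically and extends to any dimension,

$$
x_2 - m x_1 - c = 0 \iff \langle \mathbf{w}, \mathbf{x} \rangle + b = 0,
\qquad \mathbf{w} = \begin{bmatrix} -m \\ 1 \end{bmatrix},\; b = -c.
$$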
The notation and meaning of the inner product

$$
\begin{bmatrix} x_1 & x_2 & x_3 & x_4 \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \end{bmatrix}
= \langle \mathbf{w}, \mathbf{x} \rangle = -b
$$

PAGE 15
From hyperplanes to halfspaces

PAGE 16
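As a minimal illustration of the idea (a sketch, not taken from the slides): a halfspace classifier simply checks the sign of the inner product plus bias. The names and example values below are placeholders.

```python
import numpy as np

def halfspace_predict(w, b, x):
    """Return +1 if x lies in the positive halfspace of the
    hyperplane {x : <w, x> + b = 0}, and -1 otherwise."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Example: the line x1 + x2 - 1 = 0 in R^2
w = np.array([1.0, 1.0])
b = -1.0
print(halfspace_predict(w, b, np.array([2.0, 2.0])))   # +1 (positive side)
print(halfspace_predict(w, b, np.array([0.0, 0.0])))   # -1 (negative side)
```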
The separating hyperplane hypothesis class

PAGE 17
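Written out, the hypothesis class sketched here is (under the usual convention that labels are ±1):

$$
\mathcal{H} = \left\{\, \mathbf{x} \mapsto \operatorname{sign}\!\left(\langle \mathbf{w}, \mathbf{x} \rangle + b\right) \;:\; \mathbf{w} \in \mathbb{R}^d,\; b \in \mathbb{R} \,\right\}.
$$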
Biological interpretation

PAGE 18
Lecture Outline
I. What is needed in order to learn a classifier?
The structure of observations and hypotheses
II. How can we learn a hypothesis from data?
The Perceptron Algorithm
III. Why does this work?
Convergence analysis and other properties
IV. Summary + Housekeeping

PAGE 19
Statistical (Batch) Learning

PAGE 20
Online Learning

PAGE 21
The Perceptron Algorithm

PAGE 22
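Since the slide itself is only a figure in this transcript, here is a minimal sketch of the standard mistake-driven update (labels assumed to be in {-1, +1}; the stopping rule and variable names are placeholders, not from the slides):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Standard Perceptron: on each mistake, w <- w + y_i * x_i, b <- b + y_i.
    X: (n, d) array of features; y: (n,) array of labels in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * (np.dot(w, x_i) + b) <= 0:   # misclassified (or on the boundary)
                w += y_i * x_i
                b += y_i
                mistakes += 1
        if mistakes == 0:                          # a full pass with no mistakes: done
            break
    return w, b
```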
Get ready to watch the Perceptron algorithm in action

[Figure: the Perceptron algorithm in action. Axes: features x1 (horizontal) and x2 (vertical); Positive and Negative points; old and new decision boundaries; the current parameters are shown as they update.]

PAGE 23
PAUSE (1 min)
write down your predictions

PAGE 24
The Perceptron algorithm in action

PAGE 25
Lecture Outline
I. What is needed in order to learn?
The structure of observations and hypotheses
II. How can we learn a hypothesis from data?
The Perceptron Algorithm
III. Why does this work?
Convergence analysis and other properties
IV. Summary + Housekeeping

PAGE 26
The Perceptron convergence theorem (informal)

Linearly separable Perceptron converges

PAGE 27
How can we define linear separability?

PAGE 28
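One common formalization (the slide's exact notation is not visible in this transcript): a dataset $D = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$ with $y_i \in \{-1, +1\}$ is linearly separable if

$$
\exists\, \mathbf{w}^\ast,\, b^\ast \;\text{ such that }\; y_i\left(\langle \mathbf{w}^\ast, \mathbf{x}_i \rangle + b^\ast\right) > 0 \quad \text{for all } i = 1, \dots, n.
$$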
The padding trick simplifies analysis

$$
\begin{bmatrix} x_1 & x_2 & x_3 & x_4 \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \end{bmatrix} + b
\;=\;
\begin{bmatrix} x_1 & x_2 & x_3 & x_4 & 1 \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \\ b \end{bmatrix}
$$

PAGE 29
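A one-line version of the trick in code (a hedged sketch, not from the slides): append a constant 1 to every feature vector so the bias becomes just another weight.

```python
import numpy as np

def pad(X):
    """Append a constant-1 column so that <w_padded, x_padded> = <w, x> + b."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(pad(X))   # [[1. 2. 1.], [3. 4. 1.]]
```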
Biological interpretation

PAGE 30
Biological interpretation with padding

PAGE 31
Linear separability and the margin, 𝛾, of a dataset, D

PAGE 32
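The usual definition behind this slide (assuming the padding trick, so the bias is folded into $\mathbf{w}$, and a unit-norm oracle vector $\mathbf{w}^\ast$):

$$
\gamma \;=\; \min_{i}\; y_i \,\langle \mathbf{w}^\ast, \mathbf{x}_i \rangle,
\qquad \|\mathbf{w}^\ast\| = 1,
$$

so $D$ is linearly separable with margin $\gamma$ exactly when $\gamma > 0$.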
The Oracle Vector

The Matrix (1999)

PAGE 33
Linear separability and the margin, 𝛾, of a dataset, D

PAGE 34
Finite number of errors on linearly separable data

PAGE 35
A proof in two parts (Novikoff, 1962)
I. Do updates necessarily result in progress?
II. Will it ever stop?

PAGE 36
A proof in two parts (Novikoff, 1962)
I. Do updates necessarily result in progress?
II. Will it ever stop?

PAGE 37
Do updates necessarily result in progress?

PAGE 38
Do updates necessarily result in progress?

PAGE 39
Do updates necessarily result in progress?

PAGE 40
Do updates necessarily result in progress?
Yes. The weight vector
increases its alignment with
the oracle at every update.

PAGE 41
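For reference, the one-line calculation behind this claim (with the padding trick, the update $\mathbf{w}_{t+1} = \mathbf{w}_t + y_i \mathbf{x}_i$ on a mistake, and a unit-norm oracle $\mathbf{w}^\ast$ with margin $\gamma$):

$$
\langle \mathbf{w}^\ast, \mathbf{w}_{t+1} \rangle
= \langle \mathbf{w}^\ast, \mathbf{w}_t \rangle + y_i \langle \mathbf{w}^\ast, \mathbf{x}_i \rangle
\;\ge\; \langle \mathbf{w}^\ast, \mathbf{w}_t \rangle + \gamma,
$$

so after $T$ mistakes, $\langle \mathbf{w}^\ast, \mathbf{w}_T \rangle \ge T\gamma$ (starting from $\mathbf{w}_0 = \mathbf{0}$).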
A proof in two parts (Novikoff, 1962)
I. Do updates necessarily result in progress?
Yes. The weight vector increases its alignment with the oracle at every update.

II. Will it ever stop?

PAGE 42
A proof in two parts (Novikoff, 1962)
I. Do updates necessarily result in progress?
Yes. The weight vector increases its alignment with the oracle at every update.

II. Will it ever stop?

PAGE 43
Will it ever stop?
Can now be interpreted as: “Is there an upper bound on the norm of the parameter vector?”

PAGE 44
Will it ever stop?
Upper bound:

Lower bound:

PAGE 45
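The corresponding bound on the norm (assuming $\|\mathbf{x}_i\| \le R$ for all $i$, and that updates happen only on mistakes, where $y_i \langle \mathbf{w}_t, \mathbf{x}_i \rangle \le 0$):

$$
\|\mathbf{w}_{t+1}\|^2
= \|\mathbf{w}_t\|^2 + 2\, y_i \langle \mathbf{w}_t, \mathbf{x}_i \rangle + \|\mathbf{x}_i\|^2
\;\le\; \|\mathbf{w}_t\|^2 + R^2,
$$

so $\|\mathbf{w}_T\|^2 \le T R^2$ after $T$ mistakes. Combining with Part I via Cauchy–Schwarz, $T\gamma \le \langle \mathbf{w}^\ast, \mathbf{w}_T \rangle \le \|\mathbf{w}_T\| \le \sqrt{T}\,R$, which yields the mistake bound $T \le R^2/\gamma^2$.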
Finite number of errors on linearly separable data

PAGE 46
QUICK BREAK (2 mins)
go over your notes or discuss with your neighbor

PAGE 47
A few more properties
▪ The solution found by the Perceptron algorithm is not unique
▪ There are infinitely many solutions
▪ No guarantee of optimality, in terms of maximizing the margin
▪ The Perceptron algorithm will not converge if data are not linearly separable
▪ The algorithm will never halt; it will cycle
▪ The algorithm is inappropriate for such problems
▪ Multiple valid termination conditions
▪ Weights have stopped changing
▪ Exhausted some update budget
▪ Error on the training or validation set has stopped decreasing
▪ There are different strategies for controlling the order of arrival of samples

PAGE 48
Multiclass classification

[Figure: binary labels (Fall vs. Not Fall) alongside a multiclass version of the same task.]

Yu, Xiaoqun, Jaehyuk Jang, and Shuping Xiong. Frontiers in Aging Neuroscience 13 (2021): 692865.

PAGE 49
Learning a multiclass classifier with Perceptron
One vs. all
▪ Train a classifier for each class
▪ Output: $\arg\max_i \langle \mathbf{w}_i, \mathbf{x} \rangle$ (sketched in code below)
One vs. one
▪ Train a classifier for each pair of classes
▪ e.g., with 4 classes there are 6 possible pairs
▪ Output: majority vote

PAGE 50
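A hedged sketch of the one-vs.-all rule above, reusing the `perceptron` trainer sketched earlier (the class count and variable names are placeholders, not from the slides):

```python
import numpy as np

def train_one_vs_all(X, y, n_classes):
    """Train one binary Perceptron per class: class k vs. the rest."""
    models = []
    for k in range(n_classes):
        y_k = np.where(y == k, 1, -1)       # relabel: class k -> +1, all others -> -1
        models.append(perceptron(X, y_k))   # (w_k, b_k) from the earlier sketch
    return models

def predict_one_vs_all(models, x):
    """Predict the class whose classifier gives the largest score <w_k, x> + b_k."""
    scores = [np.dot(w, x) + b for (w, b) in models]
    return int(np.argmax(scores))
```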
Lecture Outline
I. What is needed in order to learn?
The structure of observations and hypotheses
II. How can we learn a hypothesis from data?
The Perceptron Algorithm
III. Why does this work?
Convergence analysis and other properties
IV. Summary + Housekeeping

PAGE 51
Aims
We should now be able to:
✓ Identify the components of a dataset required for supervised learning.
✓ Interpret the separating hyperplane hypothesis class geometrically.
✓ Implement the Perceptron algorithm and list its properties.
✓ Reproduce Novikoff’s proof of the Perceptron convergence theorem.

PAGE 52
PAGE 53
On the horizon

[Annotations on a course timeline: "posting today", "examples are up"]

PAGE 54
PAGE 55
PAGE 56
