Lecture1 Slides 1

The document discusses classification using the perceptron algorithm. It uses an example of predicting whether customers will buy a health insurance plan based on their age and income. The goal is to create a linear classifier that splits the data into two classes using a decision boundary learned from labeled training data.


● Last lecture: classification in ML

● This lecture: classification using the perceptron algorithm


Example: health insurance company

● Data on whether customers bought the plan:

    Client   Age (yrs)   Income (k£)   Bought?
    1        25          30            No
    2        45          60            Yes
    3        30          50            Yes
    4        22          25            No
    5        35          45            Yes
    6        55          70            Yes
    7        40          55            No
    8        60          80            Yes
    9        50          40            No
    10       28          35            No

● Task: predict whether a new customer is likely to buy the plan, given their age and income.
  ○ Goal: split the data into 2 classes (bought / didn't buy) that best match the class-labeled training data.
  ○ Classification model: linear classifier
      $f(w, x) = \mathrm{sign}(w^T x)$, with $f: \mathbb{R}^D \to \{-1, +1\}$
  ○ Initial guess: $w_0 = (-130, 2, 1)^T$
● Hypothesis: there is some decision boundary in the data which makes this classification possible.
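To make the setup concrete, here is a minimal sketch (not part of the original slides) that encodes the table as NumPy arrays and evaluates the initial guess. The names `X`, `y`, `w0` and the helper `f` are my own; a constant first feature is assumed so that the first entry of $w_0$ acts as the intercept.

```python
import numpy as np

# Columns: constant 1 (so w[0] acts as the intercept), age (yrs), income (k GBP)
X = np.array([
    [1, 25, 30], [1, 45, 60], [1, 30, 50], [1, 22, 25], [1, 35, 45],
    [1, 55, 70], [1, 40, 55], [1, 60, 80], [1, 50, 40], [1, 28, 35],
], dtype=float)
# Labels: +1 = bought, -1 = didn't buy
y = np.array([-1, +1, +1, -1, +1, +1, -1, +1, -1, -1], dtype=float)

w0 = np.array([-130.0, 2.0, 1.0])  # initial guess from the slide

def f(w, x):
    """Linear classifier f(w, x) = sign(w^T x), mapping onto {-1, +1}."""
    return 1.0 if w @ x >= 0 else -1.0

preds = np.array([f(w0, x) for x in X])
print(preds == y)  # w0 misclassifies clients 3, 5, 7 and 9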
Example: health insurance company

[Figure: age–income scatter plot of the training data, split by the initial decision boundary $w_0^T x = 0$ into the class "bought" ($y = +1$) and the class "didn't buy" ($y = -1$); the points $y_1 = -1$, $y_2 = +1$, $y_3 = +1$ and $y_7 = -1$ are labeled.]
Example: health insurance company

[Figure: the same scatter plot with the initial decision boundary; $x_1$ and $x_2$ are correctly classified, while $x_3$ and $x_7$ fall on the wrong side of the boundary.]

○ Misclassification error: number of misclassified data points

    $F(w) = \sum_{i=1}^{N} \mathbb{I}(f(w, x_i) \neq y_i)$

  ■ assigns the same penalty to all incorrect decisions, regardless of how 'bad' they are.
  ■ For the labeled points: $\mathbb{I}(f(w, x_1) \neq y_1) = 0$ and $\mathbb{I}(f(w, x_2) \neq y_2) = 0$ (correctly classified); $\mathbb{I}(f(w, x_3) \neq y_3) = 1$ and $\mathbb{I}(f(w, x_7) \neq y_7) = 1$ (incorrectly classified).

○ Perceptron error: sum of perpendicular distances of every misclassified data point to the decision boundary,

    $F(w) = \sum_{i=1}^{N} \max(0, -y_i w^T x_i)$

  ■ 'Penalizes' incorrect decisions by the distance from the decision boundary $w^T x$ in the direction $w$ (perpendicular distance).
  ■ For the labeled points: $\max(0, -y_1 w^T x_1) = 0$ and $\max(0, -y_2 w^T x_2) = 0$; $\max(0, -y_3 w^T x_3) = -w^T x_3 = 20$ and $\max(0, -y_7 w^T x_7) = w^T x_7 = 5$.
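As a concrete check, here is a minimal sketch of both error functions (assuming the `X`, `y`, `w0` arrays from the earlier snippet; the function names are my own). Besides the labeled points $x_3$ and $x_7$, clients 5 and 9 are also misclassified by $w_0$.

```python
# Assumes the X, y, w0 arrays from the earlier sketch.
import numpy as np

def misclassification_error(w, X, y):
    """F(w) = sum_i I(f(w, x_i) != y_i): counts the misclassified points."""
    preds = np.where(X @ w >= 0, 1.0, -1.0)
    return int(np.sum(preds != y))

def perceptron_error(w, X, y):
    """F(w) = sum_i max(0, -y_i w^T x_i): penalises mistakes by distance."""
    return float(np.sum(np.maximum(0.0, -y * (X @ w))))

print(misclassification_error(w0, X, y))  # 4  (clients 3, 5, 7 and 9)
print(perceptron_error(w0, X, y))         # 50.0 = 20 (x3) + 15 (x5) + 5 (x7) + 10 (x9)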
SGD: algorithm (Section 9 Lecture Notes)

● Step 1. Initialization: Select an initial guess for $w_0$, a convergence tolerance $\varepsilon > 0$ and a step size (learning rate) parameter $\alpha > 0$; set iteration number $n = 0$.
● Step 2. Gradient descent step: Compute new model parameters,
    $w_{n+1} = w_n - \alpha \nabla F(w_n)$
● Step 3. Convergence test: Compute the new loss function value $F(w_{n+1})$ and the loss function improvement $\Delta F = |F(w_{n+1}) - F(w_n)|$; if $\Delta F < \varepsilon$, exit with solution $w^* = w_{n+1}$.
● Step 4. Iteration: update $n = n + 1$ and go to step 2.
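A minimal sketch of this loop, assuming the loss `F` and its gradient `grad_F` are supplied by the caller; the iteration cap `max_iter` is an added safeguard that the four steps above do not include.

```python
import numpy as np

def gradient_descent(F, grad_F, w0, alpha=0.1, eps=1e-6, max_iter=10_000):
    """Steps 1-4 of the slide, for a generic loss F with gradient grad_F."""
    w = np.asarray(w0, dtype=float)        # Step 1: initial guess, n = 0
    for _ in range(max_iter):              # (iteration cap added as a safeguard)
        w_new = w - alpha * grad_F(w)      # Step 2: w_{n+1} = w_n - alpha * grad F(w_n)
        if abs(F(w_new) - F(w)) < eps:     # Step 3: Delta F < eps => converged
            return w_new
        w = w_new                          # Step 4: n = n + 1, back to Step 2
    return w

# Example: minimizing F(w) = ||w||^2, whose gradient is 2w
print(gradient_descent(lambda w: float(w @ w), lambda w: 2 * w, [3.0, -2.0]))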
Perceptron classification

● Classification model (D-dimensional):

    $f(w, x) = \mathrm{sign}(w_1 x^{(1)} + w_2 x^{(2)} + \cdots + w_D x^{(D)}) = \mathrm{sign}(w^T x)$

● Perceptron error function:

    $F(w) = \sum_{i=1}^{N} \max(0, -y_i w^T x_i)$

● Gradient with respect to w:

    $\nabla F(w) = -\sum_{i=1}^{N} y_i x_i \, \mathbb{I}(-y_i w^T x_i \geq 0)$

● Intuitively, the gradient is just the sum of $-y_i x_i$ over the incorrectly classified points.
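As a check on the gradient formula, a minimal sketch (again assuming the `X`, `y`, `w0` arrays defined earlier; the function name is my own):

```python
# Assumes the X, y, w0 arrays from the earlier sketch.
import numpy as np

def perceptron_grad(w, X, y):
    """grad F(w) = -sum_i y_i x_i I(-y_i w^T x_i >= 0)."""
    wrong = (-y * (X @ w)) >= 0                   # indicator, one entry per point
    return -(y[wrong, None] * X[wrong]).sum(axis=0)

print(perceptron_grad(w0, X, y))  # [0. 25. 0.] at w0 on this dataset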
Perceptron training: algorithm

● Step 1. Initialization: Select a starting candidate classification model $w_0$, set iteration number $n = 0$, choose a maximum number of iterations $R$ and a learning rate $\alpha > 0$.
● Step 2. Gradient descent step: Compute new model parameters: taking each $i = 1, 2, \ldots, N$ in turn, if $\mathrm{sign}(w_n^T x_i) \neq y_i$, then
    $w_{n+1} = w_n + \alpha y_i x_i$
● Step 3. Iteration: If $n < R$, update $n = n + 1$ and go to step 2; otherwise exit with solution $w^* = w_n$.
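A minimal runnable sketch of this procedure, assuming the `X`, `y` arrays defined earlier. Treating $w^T x = 0$ as a positive prediction is a convention the slide leaves open, so the exact final weights can differ slightly from the $w^\star$ reported in the worked example below.

```python
# Assumes the X, y arrays from the earlier sketch.
import numpy as np

def train_perceptron(X, y, w0, alpha=0.1, R=1000):
    w = np.asarray(w0, dtype=float)                     # Step 1: initialization
    for n in range(R):                                  # Step 3: stop after R passes
        for x_i, y_i in zip(X, y):                      # Step 2: take each i in turn...
            if (1.0 if w @ x_i >= 0 else -1.0) != y_i:  # ...and update on a mistake
                w = w + alpha * y_i * x_i               # w <- w + alpha * y_i * x_i
    return w

w_star = train_perceptron(X, y, np.array([-130.0, 2.0, 1.0]))
print(w_star)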
Example: health insurance company

● Perceptron algorithm in action
  ○ $n = 0$, $w_0 = (-130, 2, 1)^T$, $R = 1000$, $\alpha = 0.1$:
    ■ $i = 3$: $w_1 = w_0 + 0.1 \times (+1) \times (1, 30, 50)^T = (-129.9, 5, 6)^T$
    ■ $i = 4$: $w_1 = (-129.9, 5, 6)^T + 0.1 \times (-1) \times (1, 22, 25)^T = (-130, 2.8, 3.5)^T$
    ■ $i = 7$: $w_1 = (-130, 2.8, 3.5)^T + 0.1 \times (-1) \times (1, 40, 55)^T = (-130.1, -1.2, -2)^T$
    ■ …
  ○ $n = 1$, $w_1 = (-130, -0.2, 2)^T$: update weights, and so on… until $n = R$
  ○ $n = 999$, $w^\star = (-128.98, -1.85, 3.55)^T$

[Figure: the scatter plot with the labeled points $y_1 = -1$, $y_2 = +1$, $y_3 = +1$, $y_4 = -1$, $y_5 = +1$, $y_6 = +1$, $y_7 = -1$ and the decision boundary as it is updated.]
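A minimal sketch that reproduces the first pass ($n = 0$) of this trace, assuming the `X`, `y` arrays from earlier; the first three printed updates should match the slide.

```python
# Assumes the X, y arrays from the earlier sketch.
import numpy as np

w = np.array([-130.0, 2.0, 1.0])  # w0
alpha = 0.1
for i, (x_i, y_i) in enumerate(zip(X, y), start=1):   # one pass, n = 0
    if (1.0 if w @ x_i >= 0 else -1.0) != y_i:
        w = w + alpha * y_i * x_i
        print(f"i = {i}: w = {w}")
# First three updates match the slide:
#   i = 3: w = [-129.9   5.    6. ]
#   i = 4: w = [-130.    2.8   3.5]
#   i = 7: w = [-130.1  -1.2  -2. ]
# (clients 8 and 9 also trigger updates later in the same pass)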
Perceptron algorithm: analysis

● If the data is linearly separable, the perceptron algorithm always converges to a decision boundary with zero error, but there is no guarantee on the number of iterations needed to reach that fixed point.
● If the data is not linearly separable, there is no convergence guarantee – the algorithm can cycle between local optima of the perceptron error function, so we need to stop after some number of iterations R (a practical early-stopping variant is sketched below).
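A minimal sketch of that early-stopping variant, assuming the `X`, `y` arrays from earlier: when the data is linearly separable, a full pass with no updates means zero training error, so the loop can exit before reaching R.

```python
# Assumes the X, y arrays from the earlier sketch.
import numpy as np

def train_until_consistent(X, y, w0, alpha=0.1, R=1000):
    w = np.asarray(w0, dtype=float)
    for n in range(R):
        updated = False
        for x_i, y_i in zip(X, y):
            if (1.0 if w @ x_i >= 0 else -1.0) != y_i:
                w = w + alpha * y_i * x_i
                updated = True
        if not updated:   # a clean pass: zero training error, so stop early
            print(f"separating boundary found after {n + 1} passes")
            return w
    return w              # not separable (or R too small): return the last iterate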
Perceptron training in action

[Figure: a sequence of slides showing the decision boundary moving through the data over successive training iterations.]
Some remarks about the perceptron

● A simple linear classifier based on the perceptron error function rather than the misclassification error function.
● A very important classic algorithm in the history of ML, and a direct precursor to modern deep learning algorithms.
● Extremely simple; there are mathematically better "linear single-layer" algorithms (e.g. support vector machines), so the perceptron is rarely used in practice today.
● Understanding the perceptron is critical to understanding most of the main principles of modern ML classification.
To recap

● We learned the perceptron algorithm for classifying data.
● It is only guaranteed to converge if the training data is linearly separable (and the solution may not be unique).
● Question: how would you generalize the algorithm to K > 2 classes?
● Next: Neural networks (we will answer the question above)

Further Reading
● Bishop, Pattern Recognition and Machine Learning (PRML), Section 4.1.7
● Russell & Norvig, Artificial Intelligence: A Modern Approach (R&N), Section 18.6.3
● Hastie, Tibshirani & Friedman, The Elements of Statistical Learning (H&T), Section 4.5.1
