8 SVM
Unit Vectors

$\vec{v} = 4\hat{\imath} + 3\hat{\jmath} = 4\begin{pmatrix}1\\0\end{pmatrix} + 3\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}4\\0\end{pmatrix} + \begin{pmatrix}0\\3\end{pmatrix} = \begin{pmatrix}4\\3\end{pmatrix}$
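A quick numeric check of this decomposition (a minimal numpy sketch; the variable names are illustrative):

```python
import numpy as np

i_hat = np.array([1.0, 0.0])        # unit vector along the x-axis
j_hat = np.array([0.0, 1.0])        # unit vector along the y-axis

v = 4 * i_hat + 3 * j_hat           # compose v from the unit vectors
print(v)                            # [4. 3.]
```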
Vector Magnitude

Pythagorean theorem: with a = 4 and b = 3 (the legs of the right triangle formed by the components of $\vec{v}$),

$a^2 + b^2 = c^2 \;\Rightarrow\; c = \|\vec{v}\| = \sqrt{a^2 + b^2}$
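The same computation in code (a minimal sketch; numpy's built-in norm is shown alongside the explicit Pythagorean form):

```python
import numpy as np

v = np.array([4.0, 3.0])            # a = 4, b = 3

c = np.sqrt(np.sum(v ** 2))         # explicit Pythagorean form: sqrt(a^2 + b^2)
print(c)                            # 5.0
print(np.linalg.norm(v))            # 5.0 -- numpy's built-in Euclidean norm
```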
Vector Normalization

• How can we construct a unit vector out of a given vector of any length?

Input: $\vec{v}$    Output: $\hat{v}$

Normalization: $\hat{v} = \dfrac{\vec{v}}{\|\vec{v}\|}$
Vector Normalization

• Example: for $\vec{v} = \begin{pmatrix} 4 \\ 3 \end{pmatrix}$, $\|\vec{v}\| = \sqrt{3^2 + 4^2} = \sqrt{25} = 5$, so

$\hat{v} = \dfrac{\vec{v}}{\|\vec{v}\|} = \begin{pmatrix} 4/5 \\ 3/5 \end{pmatrix}$

Let us verify that its length is 1:

$\|\hat{v}\| = \sqrt{(4/5)^2 + (3/5)^2} = \sqrt{(16 + 9)/25} = 1$
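A sketch of this normalization in numpy, verifying that the result has unit length:

```python
import numpy as np

v = np.array([4.0, 3.0])
v_hat = v / np.linalg.norm(v)       # divide by the magnitude

print(v_hat)                        # [0.8 0.6]  i.e. (4/5, 3/5)
print(np.linalg.norm(v_hat))        # 1.0 -- unit length, as verified above
```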
Vector Inner Product

$u = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}, \qquad \vec{v} = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}$

• What is the inner product of $u$ and $\vec{v}$?

$u^T \vec{v} = p \times \|u\| \;\equiv\; \begin{pmatrix} u_1 & u_2 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = u_1 v_1 + u_2 v_2$

$p$ is the length of the projection of $\vec{v}$ onto $u$ (it can be signed).
Vector Inner Product

[Figure: $\vec{v}$ projected onto $u$; the projection points opposite to $u$, so $p$ is negative here.]

The same identity holds: $u^T \vec{v} = p \times \|u\| = u_1 v_1 + u_2 v_2$.
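A small numeric illustration of the two equivalent views of the inner product (the vectors here are made up, not from the slides):

```python
import numpy as np

u = np.array([3.0, 1.0])            # illustrative vectors, not from the slides
v = np.array([1.0, 2.0])

inner = u @ v                       # component form: u1*v1 + u2*v2
p = inner / np.linalg.norm(u)       # signed length of the projection of v onto u

print(inner)                        # 5.0
print(p * np.linalg.norm(u))        # 5.0 -- the projection form gives the same value
```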
Recall: Perceptron

• Given a training set $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(n)}, y^{(n)})$
• $\{x^{(1)}, x^{(2)}, \ldots, x^{(n)}\}$ is the data set, and $y^{(i)} \in \{1, -1\}$ is the class label
• Perceptron: a binary classifier that maps its input to an output value $f(x)$:

$f(x) = \begin{cases} +1, & \text{if } w^T x + b \ge 0 \\ -1, & \text{if } w^T x + b < 0 \end{cases}$

Decision boundary: wTx + b = 0
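A minimal sketch of the perceptron decision rule above (the weights and inputs are illustrative, not from the slides):

```python
import numpy as np

def perceptron_predict(w, b, x):
    """Perceptron decision rule: +1 on or above the hyperplane, -1 below."""
    return 1 if w @ x + b >= 0 else -1

w = np.array([1.0, -1.0])           # illustrative weights, not from the slides
b = 0.5
print(perceptron_predict(w, b, np.array([2.0, 1.0])))   # +1  (w.x + b = 1.5)
print(perceptron_predict(w, b, np.array([0.0, 2.0])))   # -1  (w.x + b = -1.5)
```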
Perceptron: Intuition

[A sequence of figure-only slides.]
Limitations of Perceptron

[A sequence of figure-only slides.]
Support Vector Machines

[Figure: a linearly separable data set with a separating hyperplane wTx + b = 0 and margin 𝛾.]
The Objective of SVM

[Figure: the same setting; the hyperplane wTx + b = 0 is chosen to maximize the margin 𝛾.]
Finding the Decision Boundary

[Figure: the decision boundary wTx + b = 0 between the margin hyperplanes wTx + b = +1 and wTx + b = −1, with w normal to the boundary; the points lying on the margin hyperplanes are the support vectors.]
Computing the margin width

• Assume that all data points are at least distance 1 from the hyperplane; then the following two constraints follow:

wTxi + b ≥ +1 if yi = +1
wTxi + b ≤ −1 if yi = −1
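These two constraints are commonly folded into a single inequality, a standard step not shown explicitly here:

```latex
% Multiplying each constraint by its label y_i \in \{+1, -1\}
% collapses the two cases into one inequality:
y_i \, (w^T x_i + b) \ge 1 \quad \text{for all } i
```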
Computing the margin width

[Figure: x1 on the hyperplane wTx + b = +1 and x2 on wTx + b = −1, each at distance 𝛾 from wTx + b = 0.]

What we know:
• wT x1 + b = +1
• wT x2 + b = −1
• x1 = x2 + λw
• ∥x1 − x2∥ = 2𝛾

It is now easy to get 𝛾 in terms of w and b.
Computing the margin width

Since x1 is on the hyperplane defined by wT x + b = +1, we know that wT x1 + b = 1. Substituting x1 = x2 + λw:

wT (x2 + λw) + b = 1
⟹ (wT x2 + b) + λ wT w = 1
⟹ −1 + λ wT w = 1
⟹ λ = 2 / (wT w)
Computing the margin width

2𝛾 = ∥x1 − x2∥ = ∥λw∥ = λ∥w∥

Because λ = 2 / (wT w) and wT w = ∥w∥²:

$2\gamma = \dfrac{2\|w\|}{w^T w} = \dfrac{2\|w\|}{\|w\|^2} = \dfrac{2}{\|w\|}$
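A numeric check of this derivation (w, b, and x2 are made-up values satisfying the setup):

```python
import numpy as np

w = np.array([2.0, 1.0])            # illustrative w and b, not from the slides
b = -3.0

lam = 2 / (w @ w)                   # lambda = 2 / (w^T w), from the derivation
x2 = np.array([1.0, 0.0])           # satisfies w.x2 + b = -1 (lower hyperplane)
x1 = x2 + lam * w                   # step along w to the upper hyperplane

print(w @ x1 + b)                   # 1.0 -- x1 indeed lies on w.x + b = +1
print(np.linalg.norm(x1 - x2))      # 0.894... = 2*gamma
print(2 / np.linalg.norm(w))        # 0.894... -- matches 2 / ||w||
```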
Computing the margin width

$\gamma = 1 / \|w\|$

Thus, maximizing 𝛾 is the same as minimizing ∥w∥.
The Objective of SVM
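This slide is figure-only in the original; for completeness, the standard hard-margin primal that the preceding derivation leads to (a well-known formulation, not transcribed from the slide) is:

```latex
\min_{w,\,b} \; \frac{1}{2}\,\|w\|^2
\quad \text{subject to} \quad
y_i \, (w^T x_i + b) \ge 1 \;\; \text{for all } i
```

Minimizing $\tfrac{1}{2}\|w\|^2$ is equivalent to minimizing $\|w\|$ and gives a convex, differentiable objective.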
The Objective of SVM - Illustration

[Figure: a point xi = (xi1, xi2) and the weight vector w = (w1, w2).]

wTxi + b ≥ +1 if yi = +1
wTxi + b ≤ −1 if yi = −1

• What is the inner product of w and xi (i.e., wT xi)?
The Objective of SVM - Illustration

• Project xi onto w, and let pi be the signed length of that projection. Then:

wT xi = w1·xi1 + w2·xi2 = pi · ∥w∥
The Objective of SVM - Illustration

• Rewriting the constraints using wT xi = pi · ∥w∥:

pi · ∥w∥ ≥ +1 if yi = +1
pi · ∥w∥ ≤ −1 if yi = −1
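A small numeric check that wT xi and pi · ∥w∥ are the same quantity (illustrative values):

```python
import numpy as np

w = np.array([2.0, 1.0])            # illustrative values, not from the slides
x_i = np.array([1.5, 1.0])

inner = w @ x_i                     # w1*x_i1 + w2*x_i2
p_i = inner / np.linalg.norm(w)     # signed projection of x_i onto w

print(inner)                        # 4.0
print(p_i * np.linalg.norm(w))      # 4.0 -- identical, as claimed
```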
The Objective of SVM - Illustration

[Figure: a candidate boundary wT x = 0 passing close to x1 and x2, so the projections p1 and p2 onto w are small.]

• Project x1 onto w: p1 is positive and very small. Hence, for p1 · ∥w∥ to be ≥ +1, ∥w∥ has to be very large!
• Likewise, p2 is negative and very small in magnitude. Hence, for p2 · ∥w∥ to be ≤ −1, ∥w∥ has to be very large!
• But the optimization objective is to minimize ∥w∥; hence, SVM will not prefer this decision boundary.
The Objective of SVM - Illustration

[Figure: the boundary that passes close to the data, marked NO.]
The Objective of SVM - Illustration

[Figure: a boundary wT x = 0 well separated from both classes, marked YES; the projections p1 and p2 onto w are large, so the constraints hold even with a small ∥w∥, and the margin hyperplanes wT x = +1 and wT x = −1 are far apart.]
Example

[Figure: four data points — [3, 4], [1, 2], [4, 3], [2, 1] — from two classes, with the hyperplane wT x = −1 drawn. Goal: maximize the margin!]
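One way to solve this example programmatically: a sketch using scikit-learn's linear SVC (not named on the slides), assuming for illustration that [3, 4] and [4, 3] form the positive class and [1, 2] and [2, 1] the negative class:

```python
import numpy as np
from sklearn.svm import SVC

# Assumed labels for illustration: upper-right points positive, lower-left negative
X = np.array([[3, 4], [4, 3], [1, 2], [2, 1]], dtype=float)
y = np.array([+1, +1, -1, -1])

# A very large C approximates the hard-margin SVM derived above
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("margin width 2/||w|| =", 2 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
```

With these assumed labels the learned boundary sits midway between the two classes, and the points lying on the margin hyperplanes come out as support vectors.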
References

• Support Vector Machines, Andrew W. Moore, CMU School of Computer Science
• Introduction to Information Retrieval: Support Vector Machines