SVM Minus Kernel 71
Linear Classifiers
[Figure: 2-D training data, where one marker denotes +1 and the other denotes -1, shown with several candidate linear separators]
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
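That definition can be computed directly: the distance from a point x to the boundary w.x + b = 0 is |w.x + b| / ||w||, and the boundary can be grown symmetrically until it hits the nearest datapoint. A minimal sketch (the function name and toy data are illustrative, not from the slides):

```python
import numpy as np

def margin_width(w, b, X):
    """Width the boundary w.x + b = 0 could be grown to before
    hitting a datapoint, assuming all points are classified correctly.
    Distance from a point x to the boundary is |w.x + b| / ||w||."""
    distances = np.abs(X @ w + b) / np.linalg.norm(w)
    return 2.0 * distances.min()   # grow symmetrically on both sides

# Toy data: boundary is the line x0 = 0 (w = [1, 0], b = 0)
X = np.array([[1.0, 0.0], [-2.0, 1.0], [3.0, -1.0]])
print(margin_width(np.array([1.0, 0.0]), 0.0, X))  # 2.0
```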
Maximum Margin
[Figure: the +1 / -1 data with the maximum-margin linear separator; the support vectors lie on the margin]
The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called an LSVM).
Support Vectors are those datapoints that the margin pushes up against.
Linear SVM
Specifying a line and margin
[Figure: parallel Plus-Plane, Classifier Boundary, and Minus-Plane; "Predict Class = +1" zone above the plus-plane, "Predict Class = -1" zone below the minus-plane]
• Plus-plane = { x : w . x + b = +1 }
• Minus-plane = { x : w . x + b = -1 }
Classify as +1 if w . x + b >= 1
            -1 if w . x + b <= -1
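The rule above can be sketched directly; note it deliberately leaves the zone between the two planes unlabeled. A minimal illustration (names and toy values are assumptions):

```python
import numpy as np

def classify(w, b, x):
    """Classify per the rule above: +1 if w.x + b >= 1, -1 if w.x + b <= -1.
    Points strictly inside the margin (-1 < w.x + b < 1) get no label here."""
    s = np.dot(w, x) + b
    if s >= 1:
        return +1
    if s <= -1:
        return -1
    return None  # inside the margin: the rule above assigns no class

w, b = np.array([2.0, 0.0]), -1.0
print(classify(w, b, np.array([1.5, 0.0])))   # w.x + b = 2.0 -> 1
print(classify(w, b, np.array([-0.5, 0.0])))  # w.x + b = -2.0 -> -1
```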
Computing the margin width
[Figure: parallel planes wx + b = +1, wx + b = 0, wx + b = -1, with the "Predict Class = +1" and "Predict Class = -1" zones; M = Margin Width]
How do we compute M in terms of w and b?
• Plus-plane = { x : w . x + b = +1 }
• Minus-plane = { x : w . x + b = -1 }
• The vector w is perpendicular to the Plus Plane.
Computing the margin width
[Figure as before, with x- a point on the minus-plane and x+ the closest plus-plane point; M = Margin Width]
How do we compute M in terms of w and b?
• Plus-plane = { x : w . x + b = +1 }
• Minus-plane = { x : w . x + b = -1 }
• The vector w is perpendicular to the Plus Plane
• Let x- be any point on the minus plane
• Let x+ be the closest plus-plane-point to x-.
• Claim: x+ = x- + λ w for some value of λ. Why?
Computing the margin width
[Figure as before; M = Margin Width]
The line from x- to x+ is perpendicular to the planes. So to get from x- to x+, travel some distance in direction w.
• Plus-plane = { x : w . x + b = +1 }
• Minus-plane = { x : w . x + b = -1 }
• The vector w is perpendicular to the Plus Plane
• Let x- be any point on the minus plane
• Let x+ be the closest plus-plane-point to x-.
• Claim: x+ = x- + λ w for some value of λ. Why?
Computing the margin width
[Figure as before; M = Margin Width]
What we know:
• w . x+ + b = +1
• w . x- + b = -1
• x+ = x- + λ w
• |x+ - x- | = M
It’s now easy to get M in terms of w and b.
Computing the margin width
[Figure as before; M = Margin Width]
What we know:
• w . x+ + b = +1
• w . x- + b = -1
• x+ = x- + λ w
• |x+ - x-| = M
It’s now easy to get M in terms of w and b:
w . (x- + λ w) + b = 1
=> w . x- + b + λ w.w = 1
=> -1 + λ w.w = 1
=> λ = 2 / (w.w)
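The algebra above can be checked numerically: pick any w and b, take a point x- on the minus-plane, set λ = 2 / (w.w), and verify that x+ = x- + λw lands on the plus-plane. A small sketch (the specific numbers are illustrative):

```python
import numpy as np

w, b = np.array([3.0, 4.0]), 2.0
x_minus = np.array([-1.0, 0.0])   # w.x + b = -3 + 2 = -1: on the minus-plane
assert np.isclose(w @ x_minus + b, -1.0)

lam = 2.0 / (w @ w)               # lambda = 2 / (w.w) from the derivation
x_plus = x_minus + lam * w        # travel distance lambda in direction w
print(w @ x_plus + b)             # 1.0: x+ lies on the plus-plane
```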
Computing the margin width
[Figure as before; M = Margin Width]
What we know:
• w . x+ + b = +1
• w . x- + b = -1
• x+ = x- + λ w
• |x+ - x-| = M
• λ = 2 / (w.w)
M = |x+ - x-| = |λ w| = λ |w| = λ sqrt(w.w) = 2 sqrt(w.w) / (w.w) = 2 / sqrt(w.w) = 2 / || w ||
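Continuing the same numeric check: the margin measured as the distance |x+ - x-| = |λw| should agree with the closed form M = 2 / ||w||. A quick verification (toy values assumed):

```python
import numpy as np

w = np.array([3.0, 4.0])                  # ||w|| = 5
lam = 2.0 / (w @ w)                       # lambda = 2 / (w.w)
M_from_points = np.linalg.norm(lam * w)   # |x+ - x-| = |lambda w|
M_formula = 2.0 / np.linalg.norm(w)       # M = 2 / ||w||
print(M_from_points, M_formula)           # both 0.4
```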
Learning the Maximum Margin Classifier
[Figure as before; M = 2 / sqrt(w.w) = 2 / || w ||]
Given a guess of w and b we can
• Compute whether all data points are in the correct half-planes
• Compute the width of the margin
So now we just need to write a program to search the space of w’s and b’s to find the widest margin that matches all the datapoints.
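Such a search could look like the sketch below: a naive scan over candidate (w, b) pairs, keeping the feasible one with the largest M = 2 / ||w||. This is purely to illustrate the idea on the slide; real SVM solvers use quadratic programming, and the candidate grid and toy data here are made up:

```python
import numpy as np
from itertools import product

def widest_margin(X, y, candidates):
    """Naive search: among candidate (w, b) pairs that put every datapoint
    in its correct half-plane (y_k * (w.x_k + b) >= 1), return the one
    with the largest margin M = 2 / ||w||. Not a real SVM solver."""
    best, best_M = None, -1.0
    for w, b in candidates:
        if all(yk * (np.dot(w, xk) + b) >= 1 for xk, yk in zip(X, y)):
            M = 2.0 / np.linalg.norm(w)
            if M > best_M:
                best, best_M = (w, b), M
    return best, best_M

# 1-D toy data, separable at x = 0
X = np.array([[-2.0], [-1.5], [1.5], [2.0]])
y = np.array([-1, -1, +1, +1])
cands = [(np.array([a]), b)
         for a, b in product([0.5, 2.0 / 3, 1.0, 2.0], [-0.5, 0.0, 0.5])]
(w, b), M = widest_margin(X, y, cands)
print(w, b, M)  # smallest feasible ||w|| wins: w = [2/3], b = 0, M = 3
```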
Learning the Maximum Margin Classifier
[Figure as before; M = 2 / sqrt(w.w) = 2 / || w ||]
Given a guess of w, b we can
• Compute whether all data points are in the correct half-planes
• Compute the margin width
Assume R datapoints, each (xk, yk) where yk = +/- 1
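With labels yk = +/- 1, the two half-plane conditions from the earlier classification rule (w.xk + b >= +1 when yk = +1, and <= -1 when yk = -1) collapse into the single check yk * (w.xk + b) >= 1 over all R datapoints. A minimal sketch (function name and toy data are illustrative):

```python
import numpy as np

def all_correct(w, b, X, y):
    """Check the R constraints: each (x_k, y_k) sits in its correct
    half-plane, i.e. w.x_k + b >= +1 when y_k = +1 and <= -1 when
    y_k = -1. Both cases collapse to y_k * (w.x_k + b) >= 1."""
    return bool(np.all(y * (X @ w + b) >= 1))

X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([+1, -1])
print(all_correct(np.array([1.0, 0.0]), 0.0, X, y))  # True: constraints hold
print(all_correct(np.array([0.2, 0.0]), 0.0, X, y))  # False: points inside margin
```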