ML Lecture: Linear SVM
Linear Classifiers
f(x,w,b) = sign(w∙x + b)
Suppose you are given a plot of two labeled classes on a graph (one marker denotes +1, the other denotes -1). Can you decide a separating line for the classes?
[Figure: the two classes with a candidate separating line w∙x + b = 0; w∙x + b > 0 on the +1 side and w∙x + b < 0 on the -1 side]
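A minimal sketch of this decision function in Python (the weight vector w, bias b, and test points below are made-up illustration values, not from the slides):

import numpy as np

def linear_classify(x, w, b):
    # f(x, w, b) = sign(w . x + b): +1 on one side of the boundary, -1 on the other
    return 1 if np.dot(w, x) + b >= 0 else -1

# hypothetical 2-D example: the line x1 + x2 - 1 = 0 as the boundary
w = np.array([1.0, 1.0])
b = -1.0
print(linear_classify(np.array([2.0, 2.0]), w, b))    # -> 1
print(linear_classify(np.array([-1.0, -1.0]), w, b))  # -> -1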
Linear Classifiers
f(x,w,b) = sign(w∙x + b)
Any of these would be fine.. ..but which is best?
[Figure: several candidate separating lines, all consistent with the same data]
Linear Classifiers
f(x,w,b) = sign(w∙x + b)
How would you classify this data?
[Figure: a candidate separating line that leaves a point misclassified to the +1 class]
Classifier Margin
f(x,w,b) = sign(w∙x + b)
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
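A small numeric sketch of this definition (all values below are hypothetical): the distance of point xi from the boundary is yi(w∙xi + b)/|w|, and growing the boundary symmetrically first hits the closest point.

import numpy as np

def margin_width(X, y, w, b):
    # signed distance of each (correctly classified) point from the boundary w.x + b = 0
    dists = y * (X @ w + b) / np.linalg.norm(w)
    # growing the boundary symmetrically, it first hits the closest point,
    # so the width it can be increased by is twice the smallest distance
    return 2 * dists.min()

# hypothetical separable data and a candidate separating line
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])
print(margin_width(X, y, w=np.array([1.0, 1.0]), b=0.0))  # ~2.83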
Maximum Margin
f(x,w,b) = sign(w∙x + b)
The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (called an LSVM).
Support Vectors are those datapoints that the margin pushes up against.
1. Maximizing the margin is good according to intuition and PAC theory.
2. Implies that only support vectors are important; other training examples are ignorable.
3. Empirically it works very very well.
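A tiny sketch that picks out the support vectors of a given separating line as the points closest to the boundary (the line and data below are hypothetical):

import numpy as np

# hypothetical separating line and training points
w, b = np.array([1.0, 0.0]), 0.0
X = np.array([[1.0, 0.5], [2.0, 0.0], [-1.0, -0.5], [-3.0, 1.0]])
y = np.array([1, 1, -1, -1])

dist = y * (X @ w + b) / np.linalg.norm(w)   # distance of each point to the boundary
print(X[np.isclose(dist, dist.min())])       # the datapoints the margin pushes up against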
Linear SVM Mathematically
[Figure: the "Predict Class = +1" zone and the "Predict Class = -1" zone, bounded by the planes w∙x + b = +1, w∙x + b = 0, and w∙x + b = -1; M = Margin Width, measured between the outer planes through the support vectors x+ and x-]

What we know:
w ∙ x+ + b = +1
w ∙ x- + b = -1
w ∙ (x+ - x-) = 2
M = |x+ - x-| = 2 / |w|
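The step from w∙(x+ - x-) = 2 to M = 2/|w| uses the fact that x+ - x- is parallel to w. A quick numeric check, with a made-up w and b: start from any point on w∙x + b = -1, step by 2w/(w∙w) to land on w∙x + b = +1, and the distance travelled is 2/|w|.

import numpy as np

w = np.array([3.0, 4.0])   # hypothetical weight vector, |w| = 5
b = 2.0

x_minus = np.array([0.0, -0.75])          # satisfies w.x + b = -1
x_plus = x_minus + 2 * w / np.dot(w, w)   # lands on w.x + b = +1

print(np.dot(w, x_minus) + b)             # -1.0
print(np.dot(w, x_plus) + b)              # +1.0
print(np.linalg.norm(x_plus - x_minus))   # 0.4
print(2 / np.linalg.norm(w))              # 0.4  -> M = 2/|w|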
Linear SVM Mathematically
Goal:
1) Correctly classify all training data:
   w∙xi + b ≥ +1 if yi = +1
   w∙xi + b ≤ -1 if yi = -1
   i.e. yi(w∙xi + b) ≥ 1 for all i
2) Maximize the Margin M = 2/|w|; the same as minimizing ½ wᵀw

We can formulate a Quadratic Optimization Problem and solve for w and b:
Minimize Φ(w) = ½ wᵀw
subject to yi(w∙xi + b) ≥ 1 for all i
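A hedged sketch of this quadratic program using scipy's general-purpose SLSQP solver (the toy data and starting point are made up; in practice a dedicated QP or SVM solver would be used):

import numpy as np
from scipy.optimize import minimize

# hypothetical separable training data in R^2
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(v):
    w = v[:2]                       # v = (w1, w2, b)
    return 0.5 * np.dot(w, w)       # Phi(w) = 1/2 w'w

def constraint(v):
    w, b = v[:2], v[2]
    return y * (X @ w + b) - 1      # y_i (w.x_i + b) - 1 >= 0 for all i

res = minimize(objective, x0=np.zeros(3), method="SLSQP",
               constraints=[{"type": "ineq", "fun": constraint}])
w, b = res.x[:2], res.x[2]
print(w, b)                         # maximum-margin w and b for this toy set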
Solving the Optimization Problem
Find w and b such that
Φ(w) = ½ wᵀw is minimized;
and for all {(xi, yi)}: yi(wᵀxi + b) ≥ 1
Need to optimize a quadratic function subject to linear constraints.
Quadratic optimization problems are a well-known class of
mathematical programming problems, and many (rather intricate)
algorithms exist for solving them.
The solution involves constructing a dual problem where a Lagrange
multiplier αi is associated with every constraint in the primal
problem:
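The slide stops at the colon; for reference, the standard dual of this primal (in the same notation) is:

Find α1 … αN that maximize Σi αi - ½ Σi Σj αi αj yi yj (xi ∙ xj)
subject to αi ≥ 0 for all i, and Σi αi yi = 0.

Only the support vectors end up with αi > 0, and the primal solution is recovered as w = Σi αi yi xi, with b obtained from any support vector xk via yk(w ∙ xk + b) = 1.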
Blue diamonds are positive examples and red squares are negative
examples. We would like to discover a simple SVM that accurately
discriminates the two classes.
Solution:
Since the data is linearly separable, we can use a linear SVM (that is, one whose mapping function Φ() is the identity function). By inspection, it should be obvious that there are three support vectors.
After substituting the values of the support vectors and finding the dot products, we get … . Thus, the equation of the hyperplane is … , and plotting this line gives the decision boundary.
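Examples like this can also be checked mechanically with scikit-learn's linear SVC using a large C (which approximates the hard margin). The coordinates below are placeholders standing in for the diamond/square points, which are not reproduced here:

import numpy as np
from sklearn.svm import SVC

# placeholder coordinates for the blue diamonds (+1) and red squares (-1)
X = np.array([[2.0, 1.0], [2.0, -1.0], [4.0, 0.0],
              [-2.0, 1.0], [-2.0, -1.0], [-4.0, 0.0]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)   # very large C ~ hard-margin LSVM
clf.fit(X, y)

print(clf.support_vectors_)          # the datapoints the margin pushes against
print(clf.coef_, clf.intercept_)     # w and b of the separating hyperplane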
Practice example
Suppose we are given the following positively labeled data points in R2: … , and the following negatively labeled data points in R2: … . See the figure below. Find the equation of the hyperplane separating the two classes.