
Introduction to Machine Learning

Support Vector Machine


&
Its Applications
A portion of the slides is taken from Prof. Andrew Moore’s SVM tutorial at http://www.cs.cmu.edu/~awm/tutorials
Overview
 Introduction to Support Vector Machines (SVM)
 Properties of SVM
 Applications
   Gene Expression Data Classification
   Text Categorization
 Discussion
Support Vector Machine
 A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. In two-dimensional space this hyperplane is a line dividing the plane into two parts, with each class lying on one side.
Linear Classifiers

f(x,w,b) = sign(w∙x + b)

 denotes +1 : the region where w∙x + b > 0
 denotes -1 : the region where w∙x + b < 0

Suppose you are given a plot of two labeled classes on a graph. Can you decide a separating line for the classes?

(Figure: the input x is fed to the classifier f to produce the estimated label yest; the scatter plot shows the two classes with a candidate line w∙x + b = 0.)

Here w is a weight vector and x is a feature vector. The bias term b determines the offset of the hyperplane from the origin along the direction of the weight vector w.
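To make the decision rule concrete, here is a minimal sketch (not part of the original slides) of the linear decision function in NumPy; the weight vector w, bias b and sample points are illustrative assumptions.

```python
import numpy as np

def linear_classifier(x, w, b):
    """Return +1 or -1 according to sign(w.x + b)."""
    return 1 if np.dot(w, x) + b > 0 else -1

# Illustrative (assumed) parameters: the separating line x1 + x2 - 3 = 0
w = np.array([1.0, 1.0])
b = -3.0

print(linear_classifier(np.array([4.0, 2.0]), w, b))  # +1, since w.x + b = 3 > 0
print(linear_classifier(np.array([0.5, 1.0]), w, b))  # -1, since w.x + b = -1.5 < 0
```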
Linear Classifiers

f(x,w,b) = sign(w∙x + b)

 denotes +1
 denotes -1

(Figure: several candidate separating lines are drawn through the data.) Any of these would be fine... but which is best?
Linear Classifiers

f(x,w,b) = sign(w∙x + b)

 denotes +1
 denotes -1

How would you classify this data? (Figure: a poorly placed line leaves a point misclassified to the +1 class.)
Classifier Margin

f(x,w,b) = sign(w∙x + b)

 denotes +1
 denotes -1

Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
Maximum Margin

f(x,w,b) = sign(w∙x + b)

 denotes +1
 denotes -1

1. Maximizing the margin is good according to intuition and PAC theory.
2. It implies that only support vectors are important; other training examples are ignorable.
3. Empirically it works very very well.

The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (called an LSVM).

Support Vectors are those datapoints that the margin pushes up against, i.e. the individual points lying on the margin boundaries.
Linear SVM Mathematically

(Figure: the separating hyperplane wx + b = 0 with the two margin boundaries wx + b = +1, bordering the "Predict Class = +1" zone, and wx + b = -1, bordering the "Predict Class = -1" zone; x+ and x- are points on these boundaries and M = Margin Width.)

What we know:
 w ∙ x+ + b = +1
 w ∙ x- + b = -1
 w ∙ (x+ - x-) = 2
 Therefore M = 2 / |w|
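To make the last step explicit (a short derivation added here, consistent with the definitions above): the margin is the length of the projection of x+ - x- onto the unit normal w/|w|,

```latex
M \;=\; \frac{\mathbf{w}}{\lVert \mathbf{w}\rVert}\cdot\left(\mathbf{x}^{+}-\mathbf{x}^{-}\right)
  \;=\; \frac{\mathbf{w}\cdot\mathbf{x}^{+}-\mathbf{w}\cdot\mathbf{x}^{-}}{\lVert \mathbf{w}\rVert}
  \;=\; \frac{(1-b)-(-1-b)}{\lVert \mathbf{w}\rVert}
  \;=\; \frac{2}{\lVert \mathbf{w}\rVert}.
```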
Linear SVM Mathematically

 Goal: 1) Correctly classify all training data
   w ∙ xi + b ≥ +1 if yi = +1
   w ∙ xi + b ≤ -1 if yi = -1
   i.e. yi (w ∙ xi + b) ≥ 1 for all i

 2) Maximize the Margin M = 2/|w|, which is the same as minimizing ½ wTw

 We can formulate a Quadratic Optimization Problem and solve for w and b:

   Minimize Φ(w) = ½ wTw
   subject to yi (w ∙ xi + b) ≥ 1 for all i
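As a hedged illustration (the slides do not prescribe a solver; scipy's SLSQP method and the tiny toy dataset below are assumptions added here), the hard-margin primal problem can be handed directly to a general-purpose constrained optimizer:

```python
import numpy as np
from scipy.optimize import minimize

# Assumed, illustrative linearly separable toy data in R^2
X = np.array([[2.0, 0.0], [3.0, 1.0],      # +1 class
              [0.0, 0.0], [-1.0, 1.0]])    # -1 class
y = np.array([1.0, 1.0, -1.0, -1.0])

# Pack the variables as theta = [w1, w2, b] and minimize Phi(w) = 1/2 w.w
objective = lambda th: 0.5 * np.dot(th[:2], th[:2])

# One inequality constraint per training point: y_i (w.x_i + b) - 1 >= 0
constraints = [
    {"type": "ineq",
     "fun": (lambda th, xi=xi, yi=yi: yi * (np.dot(th[:2], xi) + th[2]) - 1.0)}
    for xi, yi in zip(X, y)
]

res = minimize(objective, x0=np.zeros(3), method="SLSQP", constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", np.round(w, 3), " b =", round(b, 3))  # roughly w = [1, 0], b = -1 for this data
```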
Solving the Optimization
Problem
Find w and b such that
Φ(w) =½ wTw is minimized;
and for all {(xi ,yi)}: yi (wTxi + b) ≥ 1
 Need to optimize a quadratic function subject to linear constraints.
 Quadratic optimization problems are a well-known class of
mathematical programming problems, and many (rather intricate)
algorithms exist for solving them.
 The solution involves constructing a dual problem where a Lagrange multiplier αi is associated with every constraint in the primal problem:
Find α1…αN such that


Q(α) =Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) αi ≥ 0 for all αi
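For completeness, a hedged sketch of posing this dual in a generic QP solver (cvxopt is an assumed choice, not something the lecture specifies): minimize ½αᵀPα + qᵀα with Pij = yiyjxiTxj and q = -1, subject to -α ≤ 0 and yᵀα = 0.

```python
import numpy as np
from cvxopt import matrix, solvers

def svm_dual(X, y):
    """Solve the hard-margin dual: maximize sum(a) - 1/2 sum_ij a_i a_j y_i y_j x_i.x_j."""
    n = len(y)
    P = np.outer(y, y) * (X @ X.T)          # P_ij = y_i y_j x_i.x_j
    P += 1e-8 * np.eye(n)                   # tiny ridge for numerical stability
    q = -np.ones(n)                         # maximizing sum(a) == minimizing -sum(a)
    G = -np.eye(n)                          # -a_i <= 0, i.e. a_i >= 0
    h = np.zeros(n)
    A = y.reshape(1, -1).astype(float)      # equality constraint: sum_i a_i y_i = 0
    b = np.zeros(1)
    solvers.options["show_progress"] = False
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h), matrix(A), matrix(b))
    return np.ravel(sol["x"])               # the multipliers a_1 .. a_N
```

On a small linearly separable dataset most of the returned αi come out numerically zero; the non-zero ones mark the support vectors.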
The Optimization Problem Solution
 The solution has the form:
   w = Σαiyixi    b = yk - wTxk for any xk such that αk ≠ 0
 Each non-zero αi indicates that the corresponding xi is a support vector.
 Then the classifying function will have the form:
   f(x) = ΣαiyixiTx + b
 Notice that it relies on an inner product between the test point x and the support vectors xi – we will return to this later.
 Also keep in mind that solving the optimization problem involved computing the inner products xiTxj between all pairs of training points.
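Tying the two formulas above together in code (again a sketch; the tolerance for treating an αi as non-zero is an assumption):

```python
import numpy as np

def recover_w_b(alphas, X, y, tol=1e-6):
    """w = sum_i a_i y_i x_i ;  b = y_k - w.x_k for any x_k with a_k != 0 (here a_k > tol)."""
    w = (alphas * y) @ X
    k = int(np.argmax(alphas > tol))        # index of one support vector
    b = y[k] - w @ X[k]
    return w, b

def classify(x, alphas, X, y, b):
    """f(x) = sum_i a_i y_i (x_i.x) + b, using only inner products with the training points."""
    return np.sign((alphas * y) @ (X @ x) + b)
```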
Example:
Suppose we are given the positively labeled data points in R2 and the negatively labeled data points in R2 shown in the figure below.

Blue diamonds are positive examples and red squares are negative examples. We would like to discover a simple SVM that accurately discriminates the two classes.
Solution:
 Since the data is linearly separable, we can use a linear SVM (that is, one whose mapping function Φ() is the identity function). By inspection, it should be obvious that there are three support vectors.
 We will use vectors augmented with a 1 as a bias input, giving three augmented support vectors.
 We now need to find the three multipliers α1, α2 and α3 from three linear equations, one per support vector, obtained by requiring each support vector to satisfy its margin constraint with equality.
After substituting the values and computing the dot products, we obtain three linear equations in α1, α2 and α3. Solving these equations simultaneously gives the multipliers; substituting them into the expression for the weight vector from the solution slide (w = Σαiyixi, together with the bias term) gives w and b. Thus we obtain the equation of the hyperplane, and plotting this line gives the decision surface shown in the figure.

Now we can classify any new point as belonging to the -ve class or the +ve class, e.g. (1,3), (2,-1).

Note: the equation of a line parallel to the y-axis is x = c, and the equation of a line parallel to the x-axis is y = c.
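Since the concrete coordinates are not reproduced in this text, here is a hedged end-to-end check on an assumed, illustrative dataset; scikit-learn's SVC with a linear kernel and a very large C approximates the hard-margin classifier and exposes the support vectors, w and b directly:

```python
import numpy as np
from sklearn.svm import SVC

# Assumed, illustrative linearly separable toy data (not the slide's actual points)
X = np.array([[2.0, 0.0], [3.0, 1.0], [3.0, -1.0],    # positive class
              [0.0, 0.0], [-1.0, 1.0], [0.0, -1.0]])  # negative class
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)   # very large C approximates the hard margin
clf.fit(X, y)

print("support vectors:\n", clf.support_vectors_)
print("w =", clf.coef_[0], " b =", clf.intercept_[0])
print("predictions for (1,3) and (2,-1):", clf.predict([[1, 3], [2, -1]]))
```

With kernel="linear", coef_ and intercept_ give w and b, so the fitted line can be compared against a hand-derived hyperplane.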
Practice example
Suppose we are given a set of positively labeled data points in R2 and a set of negatively labeled data points in R2, as shown in the figure below. Find the equation of the hyperplane separating the two classes.

 We identify the two support vectors from the figure.
 After following the complete procedure we find the two multipliers; substituting them into w = Σαiyixi and b = yk - wTxk gives the weight vector and the bias.
This gives us the separating hyperplane equation with the resulting w and b; see the figure below. Now we can classify any new point as belonging to the positive or negative class.