
Support Vector Machines

Perceptron Revisited: Linear Separators

• Binary classification can be viewed as the task of separating classes in feature space with a hyperplane:

  wᵀx + b = 0   (the separating hyperplane)
  wᵀx + b > 0   (predict class +1)
  wᵀx + b < 0   (predict class -1)

  f(x) = sign(wᵀx + b)
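The decision rule can be written directly in a few lines of NumPy. A minimal sketch follows; the weight vector w and bias b are made-up values used only to illustrate the sign computation.

import numpy as np

w = np.array([2.0, -1.0])   # assumed weight vector (normal to the hyperplane)
b = -0.5                    # assumed bias term

def predict(x):
    """Classify a point as +1 or -1 depending on which side of the hyperplane it lies."""
    return np.sign(w @ x + b)

print(predict(np.array([1.0, 0.0])))   # w.x + b =  1.5 > 0  ->  1.0
print(predict(np.array([0.0, 1.0])))   # w.x + b = -1.5 < 0  -> -1.0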
Linear Separators

• Which of the linear separators is optimal?


Classification Margin

• Distance from example xᵢ to the separator is  r = yᵢ(wᵀxᵢ + b) / ||w||

• Examples closest to the hyperplane are support vectors.
• Margin ρ of the separator is the distance between the support vectors of the two classes.
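A minimal sketch of the distance formula above, with assumed values of w, b and one example (xᵢ, yᵢ) chosen only for illustration:

import numpy as np

w = np.array([3.0, 4.0])        # ||w|| = 5
b = 1.0
x_i = np.array([1.0, 1.0])      # an example point
y_i = 1                         # its label

r = y_i * (w @ x_i + b) / np.linalg.norm(w)
print(r)                        # (3 + 4 + 1) / 5 = 1.6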
Maximum Margin Classification
• Maximizing the margin is good according to intuition and PAC theory.
• Implies that only support vectors matter; other training examples are ignorable.
Linear SVM Mathematically
• Let training set {(xᵢ, yᵢ)}, i = 1..n, xᵢ ∈ Rᵈ, yᵢ ∈ {-1, 1} be separated by a hyperplane with margin ρ. Then for each training example (xᵢ, yᵢ):

  wᵀxᵢ + b ≤ -ρ/2  if yᵢ = -1
  wᵀxᵢ + b ≥  ρ/2  if yᵢ =  1     ⇔    yᵢ(wᵀxᵢ + b) ≥ ρ/2

• For every support vector xₛ the above inequality is an equality. After rescaling w and b by ρ/2 in the equality, we obtain that the distance between each xₛ and the hyperplane is

  r = yₛ(wᵀxₛ + b) / ||w|| = 1 / ||w||

• Then the margin can be expressed through the (rescaled) w and b as:

  ρ = 2r = 2 / ||w||
Linear SVMs Mathematically (cont.)
• Then we can formulate the quadratic optimization problem:

  Find w and b such that
    ρ = 2 / ||w|| is maximized
  and for all (xᵢ, yᵢ), i = 1..n :  yᵢ(wᵀxᵢ + b) ≥ 1

  Which can be reformulated as:

  Find w and b such that
    Φ(w) = ||w||² = wᵀw is minimized
  and for all (xᵢ, yᵢ), i = 1..n :  yᵢ(wᵀxᵢ + b) ≥ 1
Solving the Optimization Problem

  Find w and b such that
    Φ(w) = wᵀw is minimized
  and for all (xᵢ, yᵢ), i = 1..n :  yᵢ(wᵀxᵢ + b) ≥ 1

• Need to optimize a quadratic function subject to linear constraints.

• Quadratic optimization problems are a well-known class of mathematical programming problems for which several (non-trivial) algorithms exist.
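One way to see this quadratic program in action is to hand it to a general-purpose convex solver. The sketch below uses the cvxpy library (an assumption; any QP solver would do) on a tiny made-up dataset that is linearly separable.

import numpy as np
import cvxpy as cp

# Toy data: two points of class +1 and two of class -1 (assumed, for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0],
              [0.0, 0.0], [1.0, 0.0]])
y = np.array([1, 1, -1, -1])

w = cp.Variable(2)
b = cp.Variable()

# Hard-margin primal: minimize w^T w subject to y_i (w^T x_i + b) >= 1.
constraints = [cp.multiply(y, X @ w + b) >= 1]
problem = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
problem.solve()

print("w =", w.value, "b =", b.value)
print("margin rho = 2 / ||w|| =", 2 / np.linalg.norm(w.value))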
Soft Margin Classification
• What if the training set is not linearly separable?
• Slack variables ξᵢ can be added to allow misclassification of difficult or noisy examples; the resulting margin is called soft.
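A minimal sketch of the soft-margin formulation, again with cvxpy (an assumption) on a small made-up dataset containing one noisy point; ξᵢ and the penalty parameter C are the quantities described above.

import numpy as np
import cvxpy as cp

X = np.array([[2.0, 2.0], [3.0, 3.0], [0.5, 0.2],   # the third "+1" point sits among the -1 points
              [0.0, 0.0], [1.0, 0.0]])
y = np.array([1, 1, 1, -1, -1])
C = 1.0                                   # trade-off between margin size and total slack

w = cp.Variable(2)
b = cp.Variable()
xi = cp.Variable(len(y), nonneg=True)     # slack variables, one per training example

# Soft-margin primal: minimize w^T w + C * sum(xi_i)
# subject to y_i (w^T x_i + b) >= 1 - xi_i and xi_i >= 0.
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi]
problem = cp.Problem(cp.Minimize(cp.sum_squares(w) + C * cp.sum(xi)), constraints)
problem.solve()

print("slacks =", np.round(xi.value, 3))  # nonzero entries mark margin violations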
Linear SVMs: Overview
• The classifier is a separating hyperplane.

• Most “important” training points are support vectors; they define the hyperplane.

• Quadratic optimization algorithms can identify which training points xᵢ are support vectors.
Non-linear SVMs
• Datasets that are linearly separable with some noise work out great.

• But what are we going to do if the dataset is just too hard?

• How about… mapping data to a higher-dimensional space, e.g. x → (x, x²)?
Non-linear SVMs: Feature spaces
• General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable:

  Φ: x → φ(x)
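The 1-D example from the previous slide can be reproduced directly: data that no single threshold on the line separates becomes linearly separable after the explicit map φ(x) = (x, x²). The dataset and the separating hyperplane below are assumptions chosen for illustration.

import numpy as np

x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 3.0])
y = np.array([1, 1, -1, -1, 1, 1])        # the negative points sit between the positives

phi = np.column_stack([x, x ** 2])        # map each x to (x, x^2)

# In the mapped space the hyperplane x^2 = 2.5 (w = (0, 1), b = -2.5) separates the classes.
w, b = np.array([0.0, 1.0]), -2.5
print(np.sign(phi @ w + b))               # matches y: [ 1.  1. -1. -1.  1.  1.]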
SVM applications

• SVMs were originally proposed by Boser, Guyon and Vapnik in 1992 and gained increasing popularity in the late 1990s.

• SVMs are currently among the best performers for a number of classification tasks ranging from text to genomic data.

• SVMs can be applied to complex data types beyond feature vectors (e.g. graphs, sequences, relational data) by designing kernel functions for such data.
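In practice it is enough to supply the learner with a kernel (Gram) matrix. A minimal sketch with scikit-learn's SVC and kernel='precomputed' follows; here the Gram matrix is just a linear kernel on assumed toy vectors, whereas for graphs or sequences it would come from a kernel designed for that data type.

import numpy as np
from sklearn.svm import SVC

X_train = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y_train = np.array([1, 1, -1, -1])

K_train = X_train @ X_train.T             # n x n Gram matrix, K[i, j] = k(x_i, x_j)

clf = SVC(kernel="precomputed")
clf.fit(K_train, y_train)

X_test = np.array([[2.5, 2.5]])
K_test = X_test @ X_train.T               # kernel values between test and training points
print(clf.predict(K_test))                # -> [1]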
