Support Vector Machines
Linear Separators
• A linear classifier separates the two classes with a hyperplane wᵀx + b = 0: points with wᵀx + b > 0 lie on one side of it and points with wᵀx + b < 0 on the other.
• The resulting decision function is f(x) = sign(wᵀx + b).
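• To make the decision rule concrete, here is a minimal sketch of evaluating f(x) = sign(wᵀx + b) with NumPy; the weight vector w and bias b are made-up numbers, not taken from any slide.

    import numpy as np

    # Hypothetical parameters of an already-trained linear classifier.
    w = np.array([2.0, -1.0])
    b = 0.5

    def predict(x):
        """Return the predicted label in {-1, +1} for a single example x."""
        return 1 if np.dot(w, x) + b > 0 else -1

    print(predict(np.array([1.0, 0.0])))   # wᵀx + b =  2.5 > 0  ->  1
    print(predict(np.array([-1.0, 1.0])))  # wᵀx + b = -2.5 < 0  -> -1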
Maximum Margin Classification
• Maximizing the margin is good both intuitively and according to PAC learning theory, which gives generalization bounds that improve as the margin grows.
• It implies that only the support vectors matter; the other training examples can be ignored without changing the solution (see the sketch below).
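• To illustrate the second point, the following sketch trains a (nearly) hard-margin linear SVM, retrains it on its support vectors alone, and compares the two hyperplanes. scikit-learn and the toy blobs are my own choices; the slides do not prescribe a library or dataset.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.RandomState(0)
    # Two linearly separable blobs of 20 points each (invented toy data).
    X = np.vstack([rng.randn(20, 2) + [3, 3], rng.randn(20, 2) - [3, 3]])
    y = np.array([1] * 20 + [-1] * 20)

    # A very large C approximates the hard-margin SVM.
    full = SVC(kernel="linear", C=1e6).fit(X, y)

    # Retrain using only the support vectors found by the first model.
    sv = full.support_                      # indices of the support vectors
    reduced = SVC(kernel="linear", C=1e6).fit(X[sv], y[sv])

    print("full    :", full.coef_, full.intercept_)
    print("reduced :", reduced.coef_, reduced.intercept_)
    # The two (w, b) pairs coincide up to solver tolerance: the non-support
    # vectors had no influence on the separating hyperplane.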
Linear SVM Mathematically
• Let the training set {(xᵢ, yᵢ)}, i = 1..n, with xᵢ ∈ ℝᵈ and yᵢ ∈ {−1, 1}, be separated by a hyperplane with margin ρ. Then for each training example (xᵢ, yᵢ):
  wᵀxᵢ + b ≤ −ρ/2 if yᵢ = −1
  wᵀxᵢ + b ≥ ρ/2 if yᵢ = 1
  ⇔ yᵢ(wᵀxᵢ + b) ≥ ρ/2
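• As a small numeric illustration of the combined constraint, the sketch below evaluates yᵢ(wᵀxᵢ + b) for a toy dataset and a hypothetical hyperplane (both invented here), taking ρ as twice the smallest of those values.

    import numpy as np

    # Toy separable data: two points per class (invented for this example).
    X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -2.0], [-1.0, -3.0]])
    y = np.array([1, 1, -1, -1])

    # A hypothetical separating hyperplane wᵀx + b = 0.
    w = np.array([1.0, 1.0])
    b = 0.0

    margins = y * (X @ w + b)          # yᵢ(wᵀxᵢ + b) for every example
    rho = 2 * margins.min()            # margin implied by this hyperplane
    print(margins)                     # [4. 4. 4. 4.]
    print(np.all(margins >= rho / 2))  # True: every constraint is satisfied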
Linear SVMs: Overview
• The classifier is a separating hyperplane.
• Most “important” training points are support vectors; they define the
hyperplane.
• How about… mapping data to a higher-dimensional space:
[Figure: the same data mapped to the two-dimensional space (x, x²)]
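• The sketch below carries out exactly that mapping: a 1-D dataset that no single threshold on x can separate becomes linearly separable after each point is mapped to (x, x²). The data and the use of scikit-learn are my own illustration.

    import numpy as np
    from sklearn.svm import SVC

    # 1-D toy data (invented): the negative class sits between the positives,
    # so no threshold on x alone separates the two classes.
    x = np.array([-3.0, -2.5, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 2.5, 3.0])
    y = np.where(np.abs(x) > 1.5, 1, -1)

    # Map each point to the higher-dimensional space (x, x²).
    phi = np.column_stack([x, x ** 2])

    clf = SVC(kernel="linear", C=1e6).fit(phi, y)
    print((clf.predict(phi) == y).all())   # True: separable after the mapping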
Non-linear SVMs: Feature spaces
• General idea: the original feature space can always be mapped to some
higher-dimensional feature space where the training set is separable:
Φ: x → φ(x)
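• A useful property of such maps (and the reason the kernel functions mentioned in the applications slide below work) is that inner products in the mapped space can often be computed directly from the original vectors, so φ never has to be evaluated explicitly. The quadratic map below is one standard example; the vectors are made up.

    import numpy as np

    def phi(v):
        """One explicit quadratic feature map for a 2-D vector v."""
        x1, x2 = v
        return np.array([1.0,
                         np.sqrt(2) * x1, np.sqrt(2) * x2,
                         x1 ** 2, x2 ** 2,
                         np.sqrt(2) * x1 * x2])

    x = np.array([1.0, 2.0])
    z = np.array([3.0, -1.0])

    # Inner product after the mapping ...
    print(phi(x) @ phi(z))      # 4.0
    # ... equals a simple function of the original vectors: (1 + xᵀz)².
    print((1 + x @ z) ** 2)     # 4.0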
SVM applications
• SVMs were originally proposed by Boser, Guyon and Vapnik in 1992 and gained increasing popularity in the late 1990s.
• SVMs are currently among the best performers for a number of classification
tasks ranging from text to genomic data.
• SVMs can be applied to complex data types beyond feature vectors (e.g.
graphs, sequences, relational data) by designing kernel functions for such data.
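• As a hedged sketch of that last point, the example below builds a simple k-mer counting ("spectrum") kernel for short sequences and passes the resulting kernel matrix to an SVM; the kernel choice, the sequences, and the use of scikit-learn's precomputed-kernel interface are my own illustration, not something prescribed by the slides.

    import numpy as np
    from collections import Counter
    from sklearn.svm import SVC

    def spectrum_kernel(a, b, k=2):
        """Count the k-mers shared by two sequences (a simple string kernel)."""
        ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
        cb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
        return sum(ca[m] * cb[m] for m in ca)

    # Tiny made-up DNA-like sequences with labels.
    seqs = ["AAAATTTT", "AAAAGTTT", "CGCGCGCG", "CGCGGCGC"]
    y = [1, 1, -1, -1]

    # Kernel matrix between all pairs of training sequences.
    K = np.array([[spectrum_kernel(a, b) for b in seqs] for a in seqs])

    clf = SVC(kernel="precomputed").fit(K, y)
    print(clf.predict(K))   # predicted labels for the training sequences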