Lecture 17 - Hyperplane Classifiers - SVM - Plain
[Figure: a separating hyperplane $\boldsymbol{w}^\top \boldsymbol{x} + b = 0$ with margin boundaries $\boldsymbol{w}^\top \boldsymbol{x} + b = \pm 1$; points of class $-1$ satisfy $\boldsymbol{w}^\top \boldsymbol{x} + b \le -1$]

The objective function for hard-margin SVM:
$$\min_{\boldsymbol{w}, b} \; \frac{\|\boldsymbol{w}\|^2}{2} \quad \text{s.t.} \quad y_n(\boldsymbol{w}^\top \boldsymbol{x}_n + b) \ge 1, \quad n = 1, \dots, N$$
This is a constrained optimization problem with inequality constraints; both the objective and the constraints are convex.
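As a brief aside on where this objective comes from: the distance between the two margin boundaries works out to $2/\|\boldsymbol{w}\|$, so minimizing $\|\boldsymbol{w}\|^2/2$ is equivalent to maximizing the margin. A short sketch of the standard computation:

```latex
% Margin width between the two boundary hyperplanes.
% Take x_+ on w^T x + b = +1 and x_- on w^T x + b = -1,
% with x_+ - x_- parallel to w (the normal direction), so
% x_+ - x_- = gamma * w / ||w|| for margin width gamma.
% Subtracting the two boundary equations gives w^T (x_+ - x_-) = 2, hence:
\[
  \gamma \;=\; \frac{\boldsymbol{w}^\top(\boldsymbol{x}_+ - \boldsymbol{x}_-)}{\|\boldsymbol{w}\|}
        \;=\; \frac{2}{\|\boldsymbol{w}\|}
\]
```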
Soft-Margin SVM (More Commonly Used)
Allow some training examples to fall within the no-man's land (margin region). It is even okay for some training examples to fall totally on the wrong side of the hyperplane.
The extent of "violation" by a training input $(\boldsymbol{x}_n, y_n)$ is known as its slack $\xi_n \ge 0$; $\xi_n > 1$ means the input is totally on the wrong side. The margin constraints become
$$\boldsymbol{w}^\top \boldsymbol{x}_n + b \ge 1 - \xi_n \quad \text{if } y_n = +1$$
$$\boldsymbol{w}^\top \boldsymbol{x}_n + b \le -1 + \xi_n \quad \text{if } y_n = -1$$
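As a small illustration of how slacks behave, here is a sketch with hypothetical $\boldsymbol{w}$, $b$, and toy data; taking the smallest slack satisfying the constraints above gives $\xi_n = \max\{0, 1 - y_n(\boldsymbol{w}^\top \boldsymbol{x}_n + b)\}$:

```python
import numpy as np

# Hypothetical hyperplane parameters and toy data (for illustration only)
w = np.array([1.0, -1.0])
b = 0.5
X = np.array([[2.0, 0.0],    # well on the correct side
              [0.2, 0.0],    # inside the margin region
              [-1.0, 0.0]])  # totally on the wrong side
y = np.array([+1, +1, +1])

# Slack for each point: how much it violates y_n (w^T x_n + b) >= 1
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
print(xi)  # 0 => outside margin; 0 < xi <= 1 => inside margin; xi > 1 => wrong side
```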
Solving Hard-Margin SVM
The hard-margin SVM dual optimization problem is
$$\max_{\boldsymbol{\alpha} \ge 0} \; \boldsymbol{\alpha}^\top \mathbf{1} - \frac{1}{2} \boldsymbol{\alpha}^\top \mathbf{G} \boldsymbol{\alpha} \quad \text{s.t.} \quad \sum_{n=1}^{N} \alpha_n y_n = 0$$
where $\mathbf{G}$ is a p.s.d. matrix with $G_{mn} = y_m y_n \boldsymbol{x}_m^\top \boldsymbol{x}_n$ (also called the Gram matrix) and $\mathbf{1}$ is a vector of all 1s. This amounts to maximizing a concave function (or minimizing a convex function) subject to $\boldsymbol{\alpha} \ge 0$ and $\sum_n \alpha_n y_n = 0$; many methods exist to solve it. (Note: for various SVM solvers, see "Support Vector Machine Solvers" by Bottou and Lin.)
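A minimal sketch of solving this dual numerically, assuming toy separable data and using scipy's general-purpose SLSQP solver (real SVM packages use specialized QP/SMO solvers instead):

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (assumed for illustration)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
N = len(y)

# Gram matrix with labels folded in: G_mn = y_m y_n x_m^T x_n
G = (y[:, None] * X) @ (y[:, None] * X).T

# Dual: max_alpha  1^T alpha - 0.5 alpha^T G alpha
#       s.t. alpha >= 0 and sum_n alpha_n y_n = 0
# (we minimize the negative objective)
def neg_dual(alpha):
    return 0.5 * alpha @ G @ alpha - alpha.sum()

res = minimize(
    neg_dual,
    x0=np.zeros(N),
    bounds=[(0.0, None)] * N,                            # alpha_n >= 0
    constraints={"type": "eq", "fun": lambda a: a @ y},  # sum_n alpha_n y_n = 0
    method="SLSQP",
)
alpha = res.x
print("alpha:", np.round(alpha, 4))  # nonzero entries mark the support vectors
```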
Solving Hard-Margin SVM
Once we have the $\alpha_n$'s by solving the dual, we can get $\boldsymbol{w}$ and $b$ as
$$\boldsymbol{w} = \sum_{n=1}^{N} \alpha_n y_n \boldsymbol{x}_n, \qquad b = y_s - \boldsymbol{w}^\top \boldsymbol{x}_s \;\text{ for any support vector } \boldsymbol{x}_s \text{ (often averaged over all of them)}$$
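A sketch of this recovery step as a small function (names hypothetical; it takes a dual solution such as the `alpha` from the sketch above):

```python
import numpy as np

def recover_primal(alpha, X, y, tol=1e-6):
    """Recover (w, b) from the dual solution alpha (hard-margin case)."""
    # w = sum_n alpha_n y_n x_n
    w = (alpha * y) @ X
    # Any support vector (alpha_n > 0) lies on the margin: y_s (w^T x_s + b) = 1,
    # so b = y_s - w^T x_s; average over all support vectors for stability.
    sv = alpha > tol
    b = np.mean(y[sv] - X[sv] @ w)
    return w, b
```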
Solving Soft-Margin SVM
The soft-margin SVM Lagrangian contains extra terms (involving the slacks $\xi_n$ and their dual variables $\beta_n$) that were not present in the hard-margin SVM. There are two sets of dual variables, $\boldsymbol{\alpha} = [\alpha_1, \dots, \alpha_N]$ for the margin constraints and $\boldsymbol{\beta} = [\beta_1, \dots, \beta_N]$ for the constraints $\xi_n \ge 0$. We will eliminate the primal variables $\boldsymbol{w}$, $b$, and $\boldsymbol{\xi}$ to get a dual problem containing only the dual variables.
Solving Soft-Margin SVM
The Lagrangian problem to solve:
$$\min_{\boldsymbol{w}, b, \boldsymbol{\xi}} \; \max_{\boldsymbol{\alpha} \ge 0, \boldsymbol{\beta} \ge 0} \; \frac{\|\boldsymbol{w}\|^2}{2} + C \sum_{n=1}^{N} \xi_n + \sum_{n=1}^{N} \alpha_n \big( 1 - \xi_n - y_n(\boldsymbol{w}^\top \boldsymbol{x}_n + b) \big) - \sum_{n=1}^{N} \beta_n \xi_n$$
Note: if we ignore the bias term $b$, then we don't need to handle the constraint $\sum_n \alpha_n y_n = 0$ (the problem becomes a bit easier to solve). Otherwise, the $\alpha_n$'s are coupled, and some optimization techniques such as coordinate ascent can't easily be applied.
The soft-margin SVM solution has three types of support vectors (those with nonzero $\alpha_n$): points lying exactly on the margin boundary ($\xi_n = 0$), points within the margin region but on the correct side ($0 < \xi_n \le 1$), and points on the wrong side of the hyperplane ($\xi_n > 1$).
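A small sketch that buckets support vectors into these three types, assuming `alpha` and `xi` arrays from a soft-margin solve (all names hypothetical):

```python
import numpy as np

def support_vector_types(alpha, xi, tol=1e-6):
    """Bucket support vectors (alpha_n > 0) by their slack value xi_n."""
    sv = alpha > tol
    on_margin = sv & (xi <= tol)                # xi = 0: exactly on the margin
    in_margin = sv & (xi > tol) & (xi <= 1.0)   # 0 < xi <= 1: inside margin, correct side
    wrong_side = sv & (xi > 1.0)                # xi > 1: wrong side of the hyperplane
    return on_margin, in_margin, wrong_side
```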
SVMs via Dual Formulation: Some Comments
Recall the final dual objectives for hard-margin and soft-margin SVM:
$$\text{Hard-margin:} \;\; \max_{\boldsymbol{\alpha} \ge 0} \; \boldsymbol{\alpha}^\top \mathbf{1} - \frac{1}{2} \boldsymbol{\alpha}^\top \mathbf{G} \boldsymbol{\alpha} \qquad \text{Soft-margin:} \;\; \max_{0 \le \boldsymbol{\alpha} \le C} \; \boldsymbol{\alpha}^\top \mathbf{1} - \frac{1}{2} \boldsymbol{\alpha}^\top \mathbf{G} \boldsymbol{\alpha}$$
Note: both of these ignore the bias term $b$; otherwise we would need another constraint, $\sum_n \alpha_n y_n = 0$.
The sum of slacks $\sum_n \xi_n$ is like a sum of hinge losses, and $C$ and the regularization hyperparameter play similar (mutually inverse) roles. We can learn $\boldsymbol{w}$ directly by minimizing the regularized hinge-loss objective
$$\frac{\|\boldsymbol{w}\|^2}{2} + C \sum_{n=1}^{N} \max\{0,\; 1 - y_n \boldsymbol{w}^\top \boldsymbol{x}_n\}$$
using (stochastic) (sub)gradient descent. The hinge-loss version is preferred for linear SVMs, or when using other regularizers on $\boldsymbol{w}$ (e.g., an $\ell_1$ regularizer for a sparse $\boldsymbol{w}$).
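A minimal sketch of this subgradient approach, assuming toy data and hand-picked step size and iteration count (not a tuned implementation):

```python
import numpy as np

def linear_svm_subgrad(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimize ||w||^2/2 + C * sum_n max(0, 1 - y_n w^T x_n) by subgradient descent."""
    N, D = X.shape
    w = np.zeros(D)
    for _ in range(epochs):
        margins = y * (X @ w)
        viol = margins < 1
        # Subgradient: w (from the regularizer) minus C * sum of y_n x_n over violators
        grad = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        w -= lr * grad
    return w

# Toy usage (hypothetical data)
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
print("learned w:", linear_svm_subgrad(X, y))
```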
SVM: Summary
A hugely (perhaps the most!) popular classification algorithm
Reasonably mature, highly optimized SVM software is freely available (perhaps the reason why it is more popular than various other competing algorithms)
Some popular ones: libSVM, LIBLINEAR; sklearn also provides SVM implementations (see the usage sketch after this list)
Lots of work on scaling up SVMs* (both large $N$ and large $D$)
Extensions beyond binary classification (e.g., multiclass, structured outputs)
Can even be used for regression problems (Support Vector Regression)
Nonlinear extensions possible via kernels
* See: "Support Vector Machine Solvers" by Bottou and Lin
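For instance, a minimal usage example with scikit-learn's SVC (toy data assumed):

```python
import numpy as np
from sklearn.svm import SVC

# Toy binary classification data (for illustration only)
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# Soft-margin SVM with a linear kernel; C controls the slack penalty
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)
print(clf.support_vectors_)       # the support vectors found
print(clf.predict([[1.0, 1.0]]))  # predict a new point
```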
Coming up next
A coordinate ascent algorithm for solving the SVM dual
Multi-class SVM
One-class SVM
Kernel methods and nonlinear SVM via kernels