Lecture 17 - Hyperplane Classifiers - SVM - Plain

This document discusses support vector machines (SVMs) for classification. It explains that SVMs find the optimal hyperplane that separates classes with the maximum margin. The hyperplane is defined by a weight vector w and bias b. Hard-margin SVMs require all training examples to satisfy margin constraints, while soft-margin SVMs allow some violations via slack variables. The goal is to maximize the margin while minimizing slack variables. SVMs are solved using Lagrangian duality, resulting in a quadratic programming problem that is optimized to find the separating hyperplane.

Hyperplane based Classifiers (2):

Large-Margin Classification - SVM

CS771: Introduction to Machine Learning


Piyush Rai
2
Support Vector Machine (SVM)
(SVM was originally proposed by Vapnik and colleagues in the early 90s)

 Hyperplane based classifier. Ensures a large margin around the hyperplane

 Will assume a linear hyperplane of the form 𝒘⊤𝒙 + b = 0 (nonlinear ext. later)

 Each class must lie beyond its supporting hyperplane:
  𝒘⊤𝒙_n + b ≥ +1 if y_n = +1
  𝒘⊤𝒙_n + b ≤ −1 if y_n = −1
  or compactly, y_n(𝒘⊤𝒙_n + b) ≥ 1 ∀n

 "Margin" of the hyperplane = distance of the closest point (on either side) from the hyperplane
  (distance of an input 𝒙_n from the h.p. is |𝒘⊤𝒙_n + b| / ‖𝒘‖)
  γ = min_{1≤n≤N} |𝒘⊤𝒙_n + b| / ‖𝒘‖ = 1/‖𝒘‖, so the total margin is 2/‖𝒘‖ (see the sketch below)

 Want the hyperplane such that this margin is maximized (max-margin hyperplane): a constrained optimization problem

 Two other "supporting" hyperplanes, 𝒘⊤𝒙 + b = +1 and 𝒘⊤𝒙 + b = −1, define a "no man's land"
  (The +1/−1 in the supporting hyperplane equations is arbitrary; can replace by any scalar m/−m and the solution won't change, except a simple scaling of 𝒘 and b)

 Ensure that zero training examples fall in this region (will relax later)

 The SVM idea: Position the hyperplane s.t. this region is as "wide" as possible

(Figure, not reproduced: the separating hyperplane 𝒘⊤𝒙 + b = 0, the Class +1 region where 𝒘⊤𝒙 + b ≥ 1, the Class −1 region where 𝒘⊤𝒙 + b ≤ −1, and the two supporting hyperplanes bounding the margin)
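The margin computation on this slide can be illustrated in a few lines of NumPy. This is a minimal sketch, not part of the original slides; the toy data and the particular (𝒘, b) are made up for illustration.

```python
import numpy as np

# Toy 2D data: two well-separated classes, labels in {+1, -1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1, +1, -1, -1])

# An assumed separating hyperplane w^T x + b = 0 (placeholder values)
w = np.array([1.0, 1.0])
b = 0.0

scores = X @ w + b                            # w^T x_n + b for every example
feasible = np.all(y * scores >= 1)            # hard-margin constraints y_n (w^T x_n + b) >= 1
gamma = np.min(np.abs(scores)) / np.linalg.norm(w)   # margin: distance of the closest point
total_margin = 2.0 / np.linalg.norm(w)        # equals the width of the no-man's land only when
                                              # the closest points lie exactly on w^T x + b = +/-1
print(feasible, gamma, total_margin)
```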
3
Hard-Margin SVM
 Hard-Margin: Every training example must fulfil margin condition
 Meaning: Must not have any example in the no-man’s land
 Also want to maximize the margin 2/‖𝒘‖; this is equivalent to minimizing ‖𝒘‖, or equivalently ½‖𝒘‖²

 The objective func. for hard-margin SVM (a sketch of a direct solver is given below):
  min_{𝒘,b} ½‖𝒘‖²  s.t.  y_n(𝒘⊤𝒙_n + b) ≥ 1  ∀n

 A constrained optimization problem with inequality constraints. Objective and constraints both are convex

(Figure, not reproduced: the same max-margin geometry as on the previous slide, with the hyperplane 𝒘⊤𝒙 + b = 0 and the supporting hyperplanes 𝒘⊤𝒙 + b = ±1)
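Since the objective and constraints are convex, the problem can be handed to a generic convex solver. A minimal sketch using the cvxpy library (cvxpy is not mentioned in the lecture and is used here only for illustration; the toy data are assumed linearly separable):

```python
import numpy as np
import cvxpy as cp

# Linearly separable toy data
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# min (1/2)||w||^2  s.t.  y_n (w^T x_n + b) >= 1 for all n
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print(w.value, b.value)
```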
4
Soft-Margin SVM (More Commonly Used)
 Allow some training examples to fall within the no-man's land (margin region)

 Even okay for some training examples to fall totally on the wrong side of the h.p.

 Extent of "violation" by a training input (𝒙_n, y_n) is known as its slack ξ_n ≥ 0
  ξ_n > 1 means the example is totally on the wrong side (see the sketch below)

 The margin constraints, relaxed by the slacks:
  𝒘⊤𝒙_n + b ≥ +1 − ξ_n if y_n = +1
  𝒘⊤𝒙_n + b ≤ −1 + ξ_n if y_n = −1

 Soft-margin constraint: y_n(𝒘⊤𝒙_n + b) ≥ 1 − ξ_n ∀n
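A minimal NumPy sketch of how the slacks described above can be computed for a given hyperplane; the data and the (𝒘, b) are made-up placeholders.

```python
import numpy as np

X = np.array([[2.0, 2.0], [0.3, 0.2], [-0.5, 0.5], [-3.0, -1.0]])
y = np.array([+1, +1, -1, -1])
w = np.array([1.0, 1.0])   # an assumed hyperplane
b = 0.0

# Slack of each example: how much it violates y_n (w^T x_n + b) >= 1
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))

inside_margin = (xi > 0) & (xi <= 1)   # in the no-man's land, still on the correct side
wrong_side    = xi > 1                 # totally on the wrong side of the hyperplane
print(xi, inside_margin, wrong_side)
```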
5
Soft-Margin SVM (Contd)
 Goal: Still want to maximize the margin, such that
  Soft-margin constraints are satisfied for all training ex.
  We do not have too many margin violations (the sum of slacks Σ_n ξ_n should be small; the sum of slacks is like the training error)

 The objective func. for soft-margin SVM (a sketch evaluating it is given below):
  min_{𝒘,b,ξ} ½‖𝒘‖² + C Σ_n ξ_n  s.t.  y_n(𝒘⊤𝒙_n + b) ≥ 1 − ξ_n, ξ_n ≥ 0 ∀n
  (½‖𝒘‖² is inversely prop. to the margin; Σ_n ξ_n is the training error; C is a trade-off hyperparam)
  A constrained optimization problem with inequality constraints. Objective and constraints both are convex

 Hyperparameter C controls the trade-off between a large margin and a small training error (needs to be tuned)
  Large C: small training error but also small margin (bad)
  Small C: large margin but large training error (bad)
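A small NumPy sketch evaluating the soft-margin objective above for a candidate (𝒘, b), to make the margin / slack trade-off concrete. C, the data, and (𝒘, b) are illustrative placeholders, not values from the lecture.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """(1/2)||w||^2 + C * sum of slacks, with slack xi_n = max(0, 1 - y_n (w^T x_n + b))."""
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * np.dot(w, w) + C * np.sum(xi)

X = np.array([[2.0, 2.0], [0.3, 0.2], [-0.5, 0.5], [-3.0, -1.0]])
y = np.array([+1, +1, -1, -1])
w, b = np.array([1.0, 1.0]), 0.0

# Large C penalizes slack heavily (small training error, small margin);
# small C tolerates slack (large margin, possibly larger training error).
for C in [0.1, 1.0, 10.0]:
    print(C, soft_margin_objective(w, b, X, y, C))
```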
6

Solving the SVM Problem

7
Solving Hard-Margin SVM
 The hard-margin SVM optimization problem is
  min_{𝒘,b} ½‖𝒘‖²  s.t.  y_n(𝒘⊤𝒙_n + b) ≥ 1  ∀n

 A constrained optimization problem. One option is to solve it using Lagrange's method

 Introduce Lagrange multipliers α_n ≥ 0, one for each constraint, and solve
  L(𝒘, b, α) = ½‖𝒘‖² + Σ_n α_n {1 − y_n(𝒘⊤𝒙_n + b)}
  α = [α_1, …, α_N] denotes the vector of Lagrange multipliers

 It is easier (and helpful; we will soon see why) to solve the dual: minimize L w.r.t. 𝒘 and b first, and then maximize w.r.t. α (written out below)
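For reference, a reconstruction of the Lagrangian and the primal/dual ordering in standard notation; this is consistent with the slide's description but is not copied from the original slide.

```latex
\[
\mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha})
  = \tfrac{1}{2}\lVert \boldsymbol{w}\rVert^2
  + \sum_{n=1}^{N} \alpha_n \bigl\{ 1 - y_n(\boldsymbol{w}^\top \boldsymbol{x}_n + b) \bigr\},
  \qquad \alpha_n \ge 0 \;\; \forall n
\]
\[
\text{primal: } \min_{\boldsymbol{w}, b}\; \max_{\boldsymbol{\alpha} \ge 0}\; \mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha}),
\qquad
\text{dual: } \max_{\boldsymbol{\alpha} \ge 0}\; \min_{\boldsymbol{w}, b}\; \mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha})
\]
```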
8
Solving Hard-Margin SVM

 The dual problem (min w.r.t. 𝒘, b, then max w.r.t. α) is
  max_{α ≥ 0} min_{𝒘,b} L(𝒘, b, α) = ½‖𝒘‖² + Σ_n α_n {1 − y_n(𝒘⊤𝒙_n + b)}
  (Note: if we ignore the bias term b then we don't need to handle the constraint Σ_n α_n y_n = 0 and the problem becomes a bit easier to solve; otherwise the α_n's are coupled and some opt. techniques such as co-ordinate ascent can't easily be applied)

 Taking (partial) derivatives of L w.r.t. 𝒘 and b and setting them to zero gives (verify)
  ∂L/∂𝒘 = 0 ⟹ 𝒘 = Σ_n α_n y_n 𝒙_n
  ∂L/∂b = 0 ⟹ Σ_n α_n y_n = 0
  (α_n tells us how important training example (𝒙_n, y_n) is)

 The solution 𝒘 is simply a weighted sum of all the training inputs

 Substituting these in the Lagrangian, we get the dual problem as (verify)
  max_{α ≥ 0, Σ_n α_n y_n = 0}  α⊤1 − ½ α⊤Gα
  where G is the N×N p.s.d. matrix with G_{mn} = y_m y_n 𝒙_m⊤𝒙_n (also called the Gram matrix), and 1 is a vector of all 1s

 Note that the inputs appear only as pairwise dot products 𝒙_m⊤𝒙_n. This will be useful later on when we make the SVM nonlinear using kernel methods

 This is also a "quadratic program" (QP) – a quadratic function of the variables α. We are maximizing a concave function (or minimizing a convex function) s.t. α ≥ 0 and Σ_n α_n y_n = 0. Many methods exist to solve it (a small sketch is given below; for various SVM solvers, see "Support Vector Machine Solvers" by Bottou and Lin)
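A minimal NumPy sketch of the objects in the dual above: the Gram matrix G and the dual objective α⊤1 − ½α⊤Gα. The data and α are placeholders; an actual solver would maximize this subject to the stated constraints.

```python
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
N = len(y)

# Gram matrix: G_{mn} = y_m y_n x_m^T x_n  (inputs appear only via dot products)
G = (y[:, None] * y[None, :]) * (X @ X.T)

def dual_objective(alpha):
    # alpha^T 1 - (1/2) alpha^T G alpha, to be maximized s.t. alpha >= 0, sum_n alpha_n y_n = 0
    return np.sum(alpha) - 0.5 * alpha @ G @ alpha

alpha = np.full(N, 0.1)   # placeholder value (it happens to satisfy sum_n alpha_n y_n = 0), not the optimum
print(G.shape, dual_objective(alpha))
```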
9
Solving Hard-Margin SVM
 Once we have the α_n's by solving the dual, we can get 𝒘 and b as
  𝒘 = Σ_n α_n y_n 𝒙_n
  b = y_n − 𝒘⊤𝒙_n for any n with α_n > 0 (in practice, often averaged over all such n)
  (see the sketch below)

 A nice property: Most α_n's in the solution will be zero (sparse solution)

 Reason: the KKT conditions. For the optimal α_n's, we must have
  α_n {1 − y_n(𝒘⊤𝒙_n + b)} = 0

 Thus α_n is nonzero only if y_n(𝒘⊤𝒙_n + b) = 1, i.e., the training example lies exactly on the boundary (on one of the supporting hyperplanes 𝒘⊤𝒙 + b = +1 or 𝒘⊤𝒙 + b = −1)

 These examples are called support vectors
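A minimal NumPy sketch of the recovery step described above: given dual variables α (assumed to come from some dual solver; the values below are placeholders, not a real solution), form 𝒘 as the weighted sum of inputs and read b off the support vectors.

```python
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
alpha = np.array([0.25, 0.0, 0.25, 0.0])   # placeholder dual solution (sparse: many zeros)

w = (alpha * y) @ X                         # w = sum_n alpha_n y_n x_n
sv = np.where(alpha > 1e-8)[0]              # support vectors: nonzero alpha_n
# KKT => support vectors satisfy y_n (w^T x_n + b) = 1, so b = y_n - w^T x_n;
# averaging over all support vectors is numerically more stable
b = np.mean(y[sv] - X[sv] @ w)

print(w, b, sv)
```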
10
Solving Soft-Margin SVM
 Recall the soft-margin SVM optimization problem
  min_{𝒘,b,ξ} ½‖𝒘‖² + C Σ_n ξ_n  s.t.  y_n(𝒘⊤𝒙_n + b) ≥ 1 − ξ_n, ξ_n ≥ 0 ∀n
  Here ξ = [ξ_1, …, ξ_N] is the vector of slack variables

 Introduce Lagrange multipliers for each constraint (α_n ≥ 0 for each margin constraint, β_n ≥ 0 for each ξ_n ≥ 0 constraint) and solve the Lagrangian
  L(𝒘, b, ξ, α, β) = ½‖𝒘‖² + C Σ_n ξ_n + Σ_n α_n {1 − ξ_n − y_n(𝒘⊤𝒙_n + b)} − Σ_n β_n ξ_n

 The slack-related terms above (shown in red on the original slide) were not present in the hard-margin SVM

 Two sets of dual variables, α = [α_1, …, α_N] and β = [β_1, …, β_N]

 Will eliminate the primal variables 𝒘, b, ξ to get a dual problem containing only the dual variables
11
Solving Soft-Margin SVM

 The Lagrangian problem to solve:
  max_{α ≥ 0, β ≥ 0} min_{𝒘,b,ξ} L(𝒘, b, ξ, α, β)
  (Note: if we ignore the bias term b then we don't need to handle the constraint Σ_n α_n y_n = 0 and the problem becomes a bit easier to solve; otherwise the α_n's are coupled and some opt. techniques such as co-ordinate ascent can't easily be applied)

 Taking (partial) derivatives of L w.r.t. 𝒘, b, and ξ_n and setting them to zero gives
  ∂L/∂𝒘 = 0 ⟹ 𝒘 = Σ_n α_n y_n 𝒙_n (again a weighted sum of the training inputs)
  ∂L/∂b = 0 ⟹ Σ_n α_n y_n = 0
  ∂L/∂ξ_n = 0 ⟹ C − α_n − β_n = 0

 Using C − α_n − β_n = 0 and β_n ≥ 0, we have α_n ≤ C (for hard-margin, α_n had no upper bound)

 Substituting these in the Lagrangian gives the dual problem
  max_{0 ≤ α ≤ C, Σ_n α_n y_n = 0}  α⊤1 − ½ α⊤Gα
  The dual variables β don't appear in the dual problem!
  Given α, 𝒘 and b can be found just like in the hard-margin SVM case

 We are again maximizing a concave function (or minimizing a convex function), now s.t. 0 ≤ α_n ≤ C and Σ_n α_n y_n = 0. Many methods exist to solve it (a minimal sketch is given below; for various SVM solvers, see "Support Vector Machine Solvers" by Bottou and Lin)

 In the solution, α will still be sparse just like in the hard-margin SVM case. The nonzero α_n correspond to the support vectors
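A minimal sketch of one way to (approximately) solve this dual, taking the slide's shortcut of ignoring the bias term so the equality constraint Σ_n α_n y_n = 0 disappears and only the box constraint 0 ≤ α_n ≤ C remains: plain projected gradient ascent in NumPy. Real solvers (SMO, dual coordinate ascent, QP packages) are far more efficient; the data, C, step size, and iteration count are placeholders.

```python
import numpy as np

def svm_dual_pga(X, y, C=1.0, lr=0.01, iters=500):
    """Projected gradient ascent on alpha^T 1 - 0.5 alpha^T G alpha, with 0 <= alpha <= C.
    Bias b is ignored, so the constraint sum_n alpha_n y_n = 0 is not enforced."""
    G = (y[:, None] * y[None, :]) * (X @ X.T)
    alpha = np.zeros(len(y))
    for _ in range(iters):
        grad = 1.0 - G @ alpha                        # gradient of the dual objective
        alpha = np.clip(alpha + lr * grad, 0.0, C)    # ascent step + projection onto the box
    w = (alpha * y) @ X                               # recover w as the weighted sum of inputs
    return alpha, w

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
alpha, w = svm_dual_pga(X, y, C=10.0)
print(alpha, w)
```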
12
Support Vectors in Soft-Margin SVM
 The hard-margin SVM solution had only one type of support vector
  All lay on the supporting hyperplanes 𝒘⊤𝒙 + b = +1 and 𝒘⊤𝒙 + b = −1

 The soft-margin SVM solution has three types of support vectors (all with nonzero α_n); see the sketch below
  1. Lying on the supporting hyperplanes (ξ_n = 0)
  2. Lying within the margin region but still on the correct side of the hyperplane (0 < ξ_n < 1)
  3. Lying on the wrong side of the hyperplane, i.e., misclassified training examples (ξ_n > 1)
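A small NumPy sketch that, given a learned (𝒘, b) and the dual α (placeholder values below, not an actual solution), sorts the support vectors into the three types listed above using their slacks.

```python
import numpy as np

X = np.array([[0.6, 0.4], [0.5, 0.4], [0.5, 0.5], [-3.0, -1.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
w, b = np.array([1.0, 1.0]), 0.0            # assumed learned hyperplane
alpha = np.array([0.3, 0.7, 1.0, 0.0])      # placeholder dual solution

margins = y * (X @ w + b)
xi = np.maximum(0.0, 1.0 - margins)         # slack of each example
sv = alpha > 1e-8                           # support vectors: nonzero alpha_n

on_margin     = sv & np.isclose(margins, 1.0)   # type 1: on a supporting hyperplane
in_margin     = sv & (xi > 0) & (xi < 1)        # type 2: inside margin, correct side
misclassified = sv & (xi > 1)                   # type 3: on the wrong side
print(on_margin, in_margin, misclassified)
```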
13
SVMs via Dual Formulation: Some Comments
 Recall the final dual objectives for hard-margin and soft-margin SVM
  Hard-margin: max_{α ≥ 0} α⊤1 − ½ α⊤Gα
  Soft-margin: max_{0 ≤ α ≤ C} α⊤1 − ½ α⊤Gα
  (Note: Both of these ignore the bias term b; otherwise we would need the additional constraint Σ_n α_n y_n = 0)

 The dual formulation is nice due to two primary reasons
  Allows conveniently handling the margin-based constraints (via the Lagrangian)
  Allows learning nonlinear separators by replacing the inner products 𝒙_m⊤𝒙_n in G by general kernel-based similarities (more on this when we talk about kernels; a small sketch is given below)

 However, the dual formulation can be expensive if N is large (esp. compared to D)
  Need to solve for N variables
  Need to pre-compute and store the N×N Gram matrix G
  Lot of work on speeding up SVMs in these settings (e.g., can use co-ord. descent for the α's)
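As a preview of the kernel extension mentioned above, a small NumPy sketch showing how the dot-product Gram matrix can be swapped for a kernel-based one. The RBF kernel and its bandwidth are illustrative choices; kernels are covered properly in a later lecture.

```python
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])

# Linear SVM dual uses plain dot products between inputs
K_linear = X @ X.T

# Nonlinear SVM: replace dot products with kernel similarities, e.g., an RBF kernel
gamma = 0.5
sq_norms = np.sum(X**2, axis=1)
sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * (X @ X.T)
K_rbf = np.exp(-gamma * sq_dists)

# Everything else in the dual stays the same; only the similarity matrix changes
G_linear = (y[:, None] * y[None, :]) * K_linear
G_rbf = (y[:, None] * y[None, :]) * K_rbf
print(G_linear.shape, G_rbf.shape)
```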
14
Solving for SVM in the Primal
 Maximizing the margin subject to constraints led to the soft-margin formulation of SVM
  min_{𝒘,b,ξ} ½‖𝒘‖² + C Σ_n ξ_n  s.t.  y_n(𝒘⊤𝒙_n + b) ≥ 1 − ξ_n, ξ_n ≥ 0 ∀n

 Note that the slack ξ_n is the same as max{0, 1 − y_n(𝒘⊤𝒙_n + b)}, i.e., the hinge loss for (𝒙_n, y_n)

 Thus the above is equivalent to minimizing the regularized hinge loss
  L(𝒘, b) = ½‖𝒘‖² + C Σ_n max{0, 1 − y_n(𝒘⊤𝒙_n + b)}
  The sum of slacks is like the sum of hinge losses, and C plays a role similar to an (inverse) regularization hyperparameter

 Can learn (𝒘, b) directly by minimizing this objective using (stochastic) (sub)gradient descent (a minimal sketch is given below)

 The hinge-loss version is preferred for linear SVMs, or when using other regularizers on 𝒘 (e.g., the ℓ1 norm)
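A minimal NumPy sketch of batch subgradient descent on the regularized hinge loss above, written in the ½‖𝒘‖² + C·Σ hinge form. The learning rate, iteration count, C, and toy data are placeholders; a stochastic version would use one example (or a mini-batch) per step.

```python
import numpy as np

def svm_primal_subgradient(X, y, C=1.0, lr=0.01, iters=1000):
    """Minimize 0.5*||w||^2 + C * sum_n max(0, 1 - y_n (w^T x_n + b)) by batch subgradient descent."""
    N, D = X.shape
    w, b = np.zeros(D), 0.0
    for _ in range(iters):
        margins = y * (X @ w + b)
        viol = margins < 1                    # examples with nonzero hinge loss
        grad_w = w - C * (y[viol] @ X[viol])  # regularizer gradient + hinge subgradient
        grad_b = -C * np.sum(y[viol])
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
w, b = svm_primal_subgradient(X, y, C=10.0)
print(w, b, np.sign(X @ w + b))
```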
15
SVM: Summary
 A hugely (perhaps the most!) popular classification algorithm
 Reasonably mature, highly optimized SVM software is freely available (perhaps the reason why it is more popular than various other competing algorithms)
 Some popular ones: libSVM and LIBLINEAR; sklearn also provides SVM implementations (see the sketch below)
 Lots of work on scaling up SVMs* (both large N and large D)
 Extensions beyond binary classification (e.g., multiclass, structured outputs)
 Can even be used for regression problems (Support Vector Regression)
 Nonlinear extensions possible via kernels

*
See: “Support Vector Machine Solvers” by Bottou and Lin
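For completeness, a minimal usage sketch with scikit-learn's SVM implementations mentioned above; the toy data are placeholders and the hyperparameters would normally be tuned.

```python
import numpy as np
from sklearn.svm import SVC, LinearSVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1, +1, -1, -1])

# Linear SVM (uses the LIBLINEAR solver)
lin = LinearSVC(C=1.0).fit(X, y)
print(lin.coef_, lin.intercept_)

# Kernelized SVM (uses the libSVM solver); linear kernel here, other kernels also possible
svc = SVC(kernel="linear", C=1.0).fit(X, y)
print(svc.support_vectors_, svc.predict([[1.0, 1.5]]))
```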
16
Coming up next
 A co-ordinate ascent algorithm for solving the SVM dual
 Multi-class SVM
 One-class SVM
 Kernel methods and nonlinear SVM via kernels

