Support Vector Machines (II): CMSC 422

This document discusses support vector machines (SVMs) for classification. It reviews the maximum margin principle behind SVMs and how SVMs can handle non-separable data using slack variables and a regularization parameter C. The document formulates the SVM optimization problem and explains how it can be solved using Lagrange multipliers, leading to sparse solutions where only support vectors have non-zero coefficients. It also discusses how kernels can be used to apply SVMs to non-linear classification.


Support Vector Machines (II)

CMSC 422
MARINE CARPUAT
[email protected]

Slides credit: Piyush Rai


What we know about SVM so far

REVIEW
The Maximum Margin Principle
• Find the hyperplane with maximum separation margin on the training data
Support Vector Machine (SVM)
Characterizing the margin
Let's assume the entire training data is correctly classified by the (w, b) that achieves the maximum margin
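The margin equations on this slide are not reproduced in this extraction; the standard definition consistent with this setup is

\gamma(\mathbf{w}, b) = \min_{n} \frac{y_n(\mathbf{w}^\top \mathbf{x}_n + b)}{\|\mathbf{w}\|}

With the canonical scaling \min_n y_n(\mathbf{w}^\top \mathbf{x}_n + b) = 1, the margin equals 1/\|\mathbf{w}\|, so maximizing the margin amounts to minimizing \|\mathbf{w}\|.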
Solving the SVM Optimization Problem
(assuming linearly separable data)
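The formulation on the slide is an image; the standard hard-margin problem it corresponds to is

\min_{\mathbf{w}, b} \; \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t.} \quad y_n(\mathbf{w}^\top \mathbf{x}_n + b) \ge 1, \;\; n = 1, \dots, N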

This is a quadratic program, for which many off-the-shelf solvers exist.
SVM: the solution!
(assuming linearly separable data)
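The solution equations are likewise not reproduced here; in the standard derivation the optimal parameters take the form

\mathbf{w}^{*} = \sum_{n} \alpha_n y_n \mathbf{x}_n, \qquad b^{*} = y_k - \mathbf{w}^{*\top} \mathbf{x}_k \ \text{ for any support vector } \mathbf{x}_k

where most \alpha_n are zero: only the support vectors (the points lying exactly on the margin) have \alpha_n > 0.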
What if the data is not separable?

GENERAL CASE SVM SOLUTION


SVM in the non-separable case
• No hyperplane can separate the classes perfectly

• We still want to find the max margin hyperplane, but
– We will allow some training examples to be misclassified
– We will allow some training examples to fall within the margin region
SVM Optimization Problem

The C hyperparameter dictates which term of the objective (reconstructed below) dominates the minimization:

• Small C => prefer a large margin, allowing more misclassified examples
• Large C => prefer a small number of misclassified examples, but at the expense of a smaller margin
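The objective itself appears only as an image on the slide; the standard soft-margin problem it refers to is

\min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{n=1}^{N} \xi_n \quad \text{s.t.} \quad y_n(\mathbf{w}^\top \mathbf{x}_n + b) \ge 1 - \xi_n, \;\; \xi_n \ge 0 \;\; \forall n

where the slack variable \xi_n measures how much example n violates the margin.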
Introducing Lagrange Multipliers…

Terms in red are those that were not there in the separable case!
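The Lagrangian is shown as an image on the slide; the standard soft-margin Lagrangian, with multipliers \alpha_n \ge 0 for the margin constraints and \mu_n \ge 0 for \xi_n \ge 0, is

L(\mathbf{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\mu}) = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_n \xi_n - \sum_n \alpha_n \big[ y_n(\mathbf{w}^\top \mathbf{x}_n + b) - 1 + \xi_n \big] - \sum_n \mu_n \xi_n

The \xi-related terms, which are absent in the separable case, are presumably the ones highlighted in red on the original slide.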
Formulating the dual objective

Note
• Given 𝛼, the solution for w, b has the same form as in the separable case
• 𝛼 is again sparse; the nonzero 𝛼𝑛's correspond to support vectors
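For reference, the standard dual that results from eliminating w, b, and \xi (the slide's own version is not in this extraction):

\max_{\boldsymbol{\alpha}} \; \sum_{n} \alpha_n - \frac{1}{2} \sum_{m,n} \alpha_m \alpha_n y_m y_n \, \mathbf{x}_m^\top \mathbf{x}_n \quad \text{s.t.} \quad 0 \le \alpha_n \le C, \;\; \sum_n \alpha_n y_n = 0

The only change from the separable case is the upper bound \alpha_n \le C.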
Support Vectors in the Non-Separable Case

We now have 3 types of support vectors! (The three cases are listed below.)
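The three cases are not spelled out in this extraction; in the standard treatment they are:
(1) support vectors lying exactly on the margin boundary: 0 < \alpha_n < C, \; \xi_n = 0
(2) support vectors inside the margin region but still correctly classified: \alpha_n = C, \; 0 < \xi_n \le 1
(3) misclassified support vectors: \alpha_n = C, \; \xi_n > 1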
Notes on training
• Solving the quadratic problem is O(N^3)
– Can be prohibitive for large datasets

• But many options to speed up training
– Approximate solvers
– Learn from what we know about training linear models (see the sketch below)
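A minimal sketch of that last point using scikit-learn (not from the original slides; the dataset and parameter values are illustrative choices): SGDClassifier with the hinge loss trains a linear SVM by stochastic gradient descent instead of solving the quadratic program, which scales much better with N.

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic data standing in for a large training set
X, y = make_classification(n_samples=100000, n_features=50, random_state=0)

# loss="hinge" with an L2 penalty corresponds to a linear SVM trained by SGD;
# alpha is the regularization strength (it plays roughly the role of 1/C)
clf = SGDClassifier(loss="hinge", penalty="l2", alpha=1e-4, max_iter=1000)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))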
Recall: Learning a Linear Classifier as an Optimization Problem
Objective function = loss function + regularizer
• Loss function: measures how well the classifier fits the training data
• Regularizer: prefers solutions that generalize well

Indicator function: 1 if (.) is true, 0 otherwise


The loss function used here is called the 0-1 loss (a reconstruction is given below).
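The objective on this slide is an image; a reconstruction consistent with the description above, using a generic regularizer R and trade-off weight \lambda (placeholder notation of mine, the slide's may differ), is

\min_{\mathbf{w}, b} \; \sum_{n=1}^{N} \mathbf{1}\big[ y_n(\mathbf{w}^\top \mathbf{x}_n + b) \le 0 \big] + \lambda \, R(\mathbf{w})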
Recall: Learning a Linear Classifier as an Optimization Problem

• Problem: The 0-1 loss above is NP-hard to optimize exactly/approximately in general

• Solution: Different loss function approximations and regularizers lead to specific algorithms (e.g., perceptron, support vector machines, etc.)
Recall: Approximating the 0-1 loss with surrogate loss functions
• Examples (with b = 0)
– Hinge loss
– Log loss
– Exponential loss

• All are convex upper bounds on the 0-1 loss (standard definitions are given below)
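The comparison plot is not reproduced here; the standard definitions of these surrogates, with b = 0 and score s = \mathbf{w}^\top \mathbf{x}, are:

\ell_{\text{hinge}}(y, s) = \max(0,\, 1 - ys) \qquad \ell_{\text{log}}(y, s) = \log\big(1 + e^{-ys}\big) \qquad \ell_{\text{exp}}(y, s) = e^{-ys}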
What is the SVM loss function?
Recall: What is the perceptron optimizing?

• Its loss function is a variant of the hinge loss (both losses are written out below)
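Neither loss is written out in this extraction; the standard forms consistent with the two slides above are

\text{SVM:} \quad \min_{\mathbf{w}, b} \; \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_n \max\big(0,\, 1 - y_n(\mathbf{w}^\top \mathbf{x}_n + b)\big)

\text{Perceptron:} \quad \min_{\mathbf{w}, b} \; \sum_n \max\big(0,\, -y_n(\mathbf{w}^\top \mathbf{x}_n + b)\big)

i.e., the perceptron uses a hinge with zero margin and no regularizer, while the SVM uses the margin-1 hinge plus L2 regularization.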


SVM + KERNELS
Kernelized SVM training
Kernelized SVM prediction

Note
• Kernelized SVM needs the support vectors at test time! (The standard prediction rule is reproduced below.)
• While an unkernelized SVM can just store w
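The prediction rule itself is an image on the slide; the standard kernelized form is

f(\mathbf{x}) = \operatorname{sign}\Big( \sum_{n:\, \alpha_n > 0} \alpha_n y_n \, k(\mathbf{x}_n, \mathbf{x}) + b \Big)

which is why the support vectors \mathbf{x}_n (those with \alpha_n > 0) must be kept around at test time, while a linear SVM can collapse the sum into a single vector w.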
Example: decision boundary of an SVM with an RBF Kernel (a small scikit-learn sketch follows)
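Not part of the original slides: a minimal scikit-learn sketch that fits an RBF-kernel SVM on a toy non-linear dataset (dataset and parameter values are my own illustrative choices).

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Toy 2-D data that no straight line separates well
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# RBF-kernel SVM: C trades margin size against training errors,
# gamma controls the width of the RBF kernel
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

print("support vectors per class:", clf.n_support_)
print("training accuracy:", clf.score(X, y))

Evaluating clf.decision_function on a grid of points and contouring it at 0 would reproduce the kind of curved decision boundary pictured on the slide.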
What you should know
• What are Support Vector Machines
• How to train SVMs
– Which optimization problem we need to solve
• Geometric interpretation
– What are support vectors and what is their relationship with parameters w, b?
• How do SVMs relate to the general formulation of linear classifiers
• Why/how can SVMs be kernelized
