Support Vector Machines (II): CMSC 422

This document discusses support vector machines (SVMs) for classification. It reviews the maximum margin principle behind SVMs and how SVMs can handle non-separable data using slack variables and a regularization parameter C. The document formulates the SVM optimization problem and explains how it can be solved using Lagrange multipliers, leading to sparse solutions where only support vectors have non-zero coefficients. It also discusses how kernels can be used to apply SVMs to non-linear classification.


Support Vector Machines (II)

CMSC 422
MARINE CARPUAT
[email protected]

Slides credit: Piyush Rai


What we know about SVM so far

REVIEW
The Maximum Margin Principle
• Find the hyperplane with maximum separation margin on the training data
Support Vector Machine (SVM)
Characterizing the margin
Let's assume the entire training data is correctly classified by the (w, b) that achieves the maximum margin
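The margin equations on this slide are not reproduced in this extraction; the standard definition consistent with this setup is

\gamma(\mathbf{w}, b) = \min_{n} \frac{y_n(\mathbf{w}^\top \mathbf{x}_n + b)}{\|\mathbf{w}\|}

With the canonical scaling \min_n y_n(\mathbf{w}^\top \mathbf{x}_n + b) = 1, the margin equals 1/\|\mathbf{w}\|, so maximizing the margin amounts to minimizing \|\mathbf{w}\|.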
Solving the SVM Optimization Problem
(assuming linearly separable data)
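The formulation on the slide is an image; the standard hard-margin problem it corresponds to is

\min_{\mathbf{w}, b} \; \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t.} \quad y_n(\mathbf{w}^\top \mathbf{x}_n + b) \ge 1, \;\; n = 1, \dots, N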

This is a quadratic program, for which many off-the-shelf solvers exist.
SVM: the solution!
(assuming linearly separable data)
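The solution equations are likewise not reproduced here; in the standard derivation the optimal parameters take the form

\mathbf{w}^{*} = \sum_{n} \alpha_n y_n \mathbf{x}_n, \qquad b^{*} = y_k - \mathbf{w}^{*\top} \mathbf{x}_k \ \text{ for any support vector } \mathbf{x}_k

where most \alpha_n are zero: only the support vectors (the points lying exactly on the margin) have \alpha_n > 0.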
What if the data is not separable?

GENERAL CASE SVM SOLUTION


SVM in the non-separable case
• No hyperplane can separate the classes perfectly

• We still want to find the max margin hyperplane, but
– We will allow some training examples to be misclassified
– We will allow some training examples to fall within the margin region
SVM Optimization Problem

The C hyperparameter dictates which term of the objective (reconstructed below) dominates the minimization:

• Small C => prefer a large margin, allowing more misclassified examples
• Large C => prefer a small number of misclassified examples, but at the expense of a smaller margin
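The objective itself appears only as an image on the slide; the standard soft-margin problem it refers to is

\min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{n=1}^{N} \xi_n \quad \text{s.t.} \quad y_n(\mathbf{w}^\top \mathbf{x}_n + b) \ge 1 - \xi_n, \;\; \xi_n \ge 0 \;\; \forall n

where the slack variable \xi_n measures how much example n violates the margin.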
Introducing Lagrange Multipliers…

Terms in red are those that were not there in the separable case!
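The Lagrangian is shown as an image on the slide; the standard soft-margin Lagrangian, with multipliers \alpha_n \ge 0 for the margin constraints and \mu_n \ge 0 for \xi_n \ge 0, is

L(\mathbf{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\mu}) = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_n \xi_n - \sum_n \alpha_n \big[ y_n(\mathbf{w}^\top \mathbf{x}_n + b) - 1 + \xi_n \big] - \sum_n \mu_n \xi_n

The \xi-related terms, which are absent in the separable case, are presumably the ones highlighted in red on the original slide.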
Formulating the dual objective

Note
• Given 𝛼, the solution for w, b has the same form as in the separable case
• 𝛼 is again sparse; the nonzero 𝛼𝑛's correspond to support vectors
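For reference, the standard dual that results from eliminating w, b, and \xi (the slide's own version is not in this extraction):

\max_{\boldsymbol{\alpha}} \; \sum_{n} \alpha_n - \frac{1}{2} \sum_{m,n} \alpha_m \alpha_n y_m y_n \, \mathbf{x}_m^\top \mathbf{x}_n \quad \text{s.t.} \quad 0 \le \alpha_n \le C, \;\; \sum_n \alpha_n y_n = 0

The only change from the separable case is the upper bound \alpha_n \le C.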
Support Vectors in the Non-Separable Case

We now have 3 types of support vectors! (The three cases are listed below.)
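The three cases are not spelled out in this extraction; in the standard treatment they are:
(1) support vectors lying exactly on the margin boundary: 0 < \alpha_n < C, \; \xi_n = 0
(2) support vectors inside the margin region but still correctly classified: \alpha_n = C, \; 0 < \xi_n \le 1
(3) misclassified support vectors: \alpha_n = C, \; \xi_n > 1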
Notes on training
• Solving the quadratic problem is O(N^3)
– Can be prohibitive for large datasets

• But many options to speed up training
– Approximate solvers
– Learn from what we know about training linear models (see the sketch below)
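A minimal sketch of that last point using scikit-learn (not from the original slides; the dataset and parameter values are illustrative choices): SGDClassifier with the hinge loss trains a linear SVM by stochastic gradient descent instead of solving the quadratic program, which scales much better with N.

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic data standing in for a large training set
X, y = make_classification(n_samples=100000, n_features=50, random_state=0)

# loss="hinge" with an L2 penalty corresponds to a linear SVM trained by SGD;
# alpha is the regularization strength (it plays roughly the role of 1/C)
clf = SGDClassifier(loss="hinge", penalty="l2", alpha=1e-4, max_iter=1000)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))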
Recall: Learning a Linear Classifier as an Optimization Problem
Objective function = loss function + regularizer
• Loss function: measures how well the classifier fits the training data
• Regularizer: prefers solutions that generalize well

Indicator function: 1 if (.) is true, 0 otherwise


The loss function used here is called the 0-1 loss (a reconstruction is given below).
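The objective on this slide is an image; a reconstruction consistent with the description above, using a generic regularizer R and trade-off weight \lambda (placeholder notation of mine, the slide's may differ), is

\min_{\mathbf{w}, b} \; \sum_{n=1}^{N} \mathbf{1}\big[ y_n(\mathbf{w}^\top \mathbf{x}_n + b) \le 0 \big] + \lambda \, R(\mathbf{w})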
Recall: Learning a Linear Classifier as an Optimization Problem

• Problem: The 0-1 loss above is NP-hard to optimize exactly/approximately in general

• Solution: Different loss function approximations and regularizers lead to specific algorithms (e.g., perceptron, support vector machines, etc.)
Recall: Approximating the 0-1 loss with surrogate loss functions
• Examples (with b = 0)
– Hinge loss
– Log loss
– Exponential loss

• All are convex upper bounds on the 0-1 loss (standard definitions are given below)
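The comparison plot is not reproduced here; the standard definitions of these surrogates, with b = 0 and score s = \mathbf{w}^\top \mathbf{x}, are:

\ell_{\text{hinge}}(y, s) = \max(0,\, 1 - ys) \qquad \ell_{\text{log}}(y, s) = \log\big(1 + e^{-ys}\big) \qquad \ell_{\text{exp}}(y, s) = e^{-ys}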
What is the SVM loss function?
Recall: What is the perceptron optimizing?

• Its loss function is a variant of the hinge loss (both losses are written out below)
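Neither loss is written out in this extraction; the standard forms consistent with the two slides above are

\text{SVM:} \quad \min_{\mathbf{w}, b} \; \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_n \max\big(0,\, 1 - y_n(\mathbf{w}^\top \mathbf{x}_n + b)\big)

\text{Perceptron:} \quad \min_{\mathbf{w}, b} \; \sum_n \max\big(0,\, -y_n(\mathbf{w}^\top \mathbf{x}_n + b)\big)

i.e., the perceptron uses a hinge with zero margin and no regularizer, while the SVM uses the margin-1 hinge plus L2 regularization.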


SVM + KERNELS
Kernelized SVM training
Kernelized SVM prediction

Note
• Kernelized SVM needs the support vectors at test time! (The standard prediction rule is reproduced below.)
• While an unkernelized SVM can just store w
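The prediction rule itself is an image on the slide; the standard kernelized form is

f(\mathbf{x}) = \operatorname{sign}\Big( \sum_{n:\, \alpha_n > 0} \alpha_n y_n \, k(\mathbf{x}_n, \mathbf{x}) + b \Big)

which is why the support vectors \mathbf{x}_n (those with \alpha_n > 0) must be kept around at test time, while a linear SVM can collapse the sum into a single vector w.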
Example: decision boundary of an SVM with an RBF Kernel (a small scikit-learn sketch follows)
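Not part of the original slides: a minimal scikit-learn sketch that fits an RBF-kernel SVM on a toy non-linear dataset (dataset and parameter values are my own illustrative choices).

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Toy 2-D data that no straight line separates well
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# RBF-kernel SVM: C trades margin size against training errors,
# gamma controls the width of the RBF kernel
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

print("support vectors per class:", clf.n_support_)
print("training accuracy:", clf.score(X, y))

Evaluating clf.decision_function on a grid of points and contouring it at 0 would reproduce the kind of curved decision boundary pictured on the slide.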
What you should know
• What are Support Vector Machines
• How to train SVMs
– Which optimization problem we need to solve
• Geometric interpretation
– What are support vectors and what is their relationship with parameters w, b?
• How do SVMs relate to the general formulation of linear classifiers
• Why/how can SVMs be kernelized
