This document discusses the derivation and application of Support Vector Machines (SVM) and kernels, focusing on the dual formulation for both linearly separable and non-separable cases. It highlights the importance of support vectors, the role of Lagrange multipliers, and the use of various kernel functions to avoid explicit feature computation. Additionally, it addresses concerns about overfitting in high-dimensional feature spaces and strategies to mitigate it.


Support Vector Machines & Kernels

Lecture 6

David Sontag
New York University

Slides adapted from Luke Zettlemoyer, Carlos Guestrin, and Vibhav Gogate
Dual SVM derivation (1) – the linearly separable case

Original optimization problem (one constraint per training example):

Rewrite the constraints, introducing one Lagrange multiplier per example.

Lagrangian:

Our goal now is to solve:

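The formulas on this slide did not survive extraction. As a hedged reconstruction, the standard hard-margin primal, its Lagrangian, and the min-max problem referred to above are:

\min_{w,b} \; \tfrac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_j (w \cdot x_j + b) \ge 1 \;\; \forall j

L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2 - \sum_j \alpha_j \left[ y_j (w \cdot x_j + b) - 1 \right], \qquad \alpha_j \ge 0

\min_{w,b} \; \max_{\alpha \ge 0} \; L(w, b, \alpha)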

Dual SVM derivation (2) – the linearly separable case

(Primal)

Swap min and max

(Dual)

Slater’s condition from convex optimization guarantees that these two optimization problems are equivalent!

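Written out (a standard rendering of what the slide shows; under Slater's condition the two optimal values coincide):

\text{(Primal)} \quad \min_{w,b} \; \max_{\alpha \ge 0} \; L(w, b, \alpha)

\text{(Dual)} \quad \max_{\alpha \ge 0} \; \min_{w,b} \; L(w, b, \alpha)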
Dual SVM derivation (3) – the linearly separable case

[Background figure: an example feature mapping, Φ(x) = ( x(1), …, x(n), x(1)x(2), x(1)x(3), …, e^{x(1)}, … )ᵀ ]

(Dual)

Can solve for optimal w, b as a function of α:

∂L/∂w = w − Σj αj yj xj = 0

Substituting these values back in (and simplifying), we obtain:

(Dual)

(Sums are over all training examples; the αj and yj are scalars; xj · xk is a dot product.)


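For reference, the resulting dual problem in its standard hard-margin form (the slide's own rendering was lost in extraction):

\max_{\alpha} \; \sum_j \alpha_j - \tfrac{1}{2} \sum_{j,k} \alpha_j \alpha_k \, y_j y_k \, (x_j \cdot x_k) \quad \text{s.t.} \quad \alpha_j \ge 0 \;\; \forall j, \quad \sum_j \alpha_j y_j = 0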


So, in dual formulation we will solve for α directly!


• w and b are computed from α (if needed)
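A minimal numpy sketch of this step, assuming the optimal dual variables have already been found by some QP solver (the function and variable names are illustrative, not from the slides):

import numpy as np

def recover_w_b(X, y, alpha, tol=1e-8):
    """Recover (w, b) from the dual variables of a hard-margin SVM.

    X: (m, d) training inputs, y: (m,) labels in {-1, +1},
    alpha: (m,) optimal dual variables.
    """
    # w = sum_j alpha_j y_j x_j
    w = (alpha * y) @ X
    # Any support vector (alpha_j > 0) has a tight constraint y_j (w.x_j + b) = 1,
    # so b = y_j - w.x_j for that j.
    j = int(np.where(alpha > tol)[0][0])
    b = y[j] - w @ X[j]
    return w, b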
Dual SVM derivation (3) – the linearly separable case

Lagrangian:

αj > 0 for some j implies the constraint is tight. We use this to obtain b:

(1)

(2)

(3)
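The three numbered steps are images in the original; a standard reconstruction, for some j with αj > 0, is:

(1) \quad y_j (w \cdot x_j + b) = 1

(2) \quad b = y_j - w \cdot x_j \qquad (\text{using } y_j \in \{-1, +1\}, \text{ so } 1/y_j = y_j)

(3) \quad b = y_j - \sum_k \alpha_k y_k \, (x_k \cdot x_j)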
Classification rule using dual solution

Using the dual solution, prediction requires only dot products of the new example's feature vector with the support vectors.
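A hedged reconstruction of the rule the slide shows:

\hat{y}(x) = \operatorname{sign}\Big( \sum_{j \in SV} \alpha_j y_j \, (x_j \cdot x) + b \Big)

Only training points with αj > 0 (the support vectors) contribute to the sum.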
Dual for the non-separable case

Primal: Solve for w,b,α:

Dual:

What changed?
• Added an upper bound of C on each αi!
• Intuitive explanation:
  – Without slack, αi → ∞ when constraints are violated (points misclassified)
  – An upper bound of C limits the αi, so misclassifications are allowed
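For reference, the standard soft-margin formulas (the slide's own boxes did not extract):

\text{Primal:} \quad \min_{w,b,\xi} \; \tfrac{1}{2}\|w\|^2 + C \sum_j \xi_j \quad \text{s.t.} \quad y_j (w \cdot x_j + b) \ge 1 - \xi_j, \;\; \xi_j \ge 0

\text{Dual:} \quad \max_{\alpha} \; \sum_j \alpha_j - \tfrac{1}{2} \sum_{j,k} \alpha_j \alpha_k \, y_j y_k \, (x_j \cdot x_k) \quad \text{s.t.} \quad 0 \le \alpha_j \le C, \quad \sum_j \alpha_j y_j = 0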
Support vectors

• Complementary slackness conditions (written out after this list of bullets):

• Support vectors: points xj such that yj (w* · xj + b) ≤ 1
  (includes all j such that αj* > 0, but also additional points where αj* = 0 ∧ yj (w* · xj + b) = 1)

• Note: the SVM dual solution may not be unique!
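The complementary slackness conditions referenced above, in their standard soft-margin (KKT) form:

\alpha_j^* \left[ y_j (w^* \cdot x_j + b) - 1 + \xi_j^* \right] = 0, \qquad (C - \alpha_j^*) \, \xi_j^* = 0 \quad \forall j

So αj* > 0 forces the margin constraint to be tight (up to the slack ξj*), and αj* < C forces ξj* = 0.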


Dual SVM interpretation: Sparsity

[Figure: separating hyperplane w.x + b = 0 with margin boundaries w.x + b = +1 and w.x + b = −1]

Final solution tends to be sparse:
• αj = 0 for most j
• don’t need to store these points to compute w or make predictions

Non-support vectors:
• αj = 0
• moving them will not change w

Support vectors:
• αj > 0
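A small illustration of this sparsity using scikit-learn (not from the slides; the toy data and parameters are arbitrary):

import numpy as np
from sklearn.svm import SVC

# Two well-separated Gaussian blobs; most points end up far from the boundary.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(200, 2) + [2, 2], rng.randn(200, 2) - [2, 2]])
y = np.hstack([np.ones(200), -np.ones(200)])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only the support vectors (alpha_j > 0) are stored; typically far fewer than n.
print("training points:", len(X))
print("support vectors:", clf.support_vectors_.shape[0])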
SVM with kernels

• Never compute features explicitly!!!
  – Compute dot products in closed form

• Predict with: (see the sketch after this slide)

• O(n²) time in size of dataset to compute the objective
  – much work on speeding this up
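A minimal sketch of kernelized prediction, assuming the support vectors X_sv, their labels y_sv, dual variables alpha_sv, and bias b are already available (all names are illustrative, and the quadratic kernel is just one choice):

import numpy as np

def quadratic_kernel(u, v):
    # K(u, v) = (1 + u.v)^2, a degree-2 polynomial kernel
    return (1.0 + u @ v) ** 2

def predict(x, X_sv, y_sv, alpha_sv, b, K=quadratic_kernel):
    # Classify x using only kernel evaluations against the support vectors:
    # sign( sum_j alpha_j y_j K(x_j, x) + b )
    s = sum(a * yj * K(xj, x) for a, yj, xj in zip(alpha_sv, y_sv, X_sv))
    return np.sign(s + b)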
Quadratic kernel

[Tommi Jaakkola]
Quadratic kernel

Feature mapping given by:

[Cynthia Rudin]
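The kernel and feature mapping on these slides are images; as a hedged reconstruction, one standard quadratic kernel and its feature mapping for 2-dimensional inputs are:

K(u, v) = (1 + u \cdot v)^2

\Phi(x) = \big( 1, \; \sqrt{2}\,x_1, \; \sqrt{2}\,x_2, \; x_1^2, \; x_2^2, \; \sqrt{2}\,x_1 x_2 \big), \quad \text{so that} \quad \Phi(u) \cdot \Phi(v) = (1 + u \cdot v)^2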
Common kernels

• Polynomials of degree exactly d

• Polynomials of degree up to d

• Gaussian kernels (the argument of the exponential is the Euclidean distance, squared; standard forms of all three are written out after this list)

• And many others: very active area of research!
  (e.g., structured kernels that use dynamic programming to evaluate, string kernels, …)
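A hedged reconstruction of the standard forms of the three kernels listed above:

K(u, v) = (u \cdot v)^d \qquad \text{(degree exactly } d\text{)}

K(u, v) = (1 + u \cdot v)^d \qquad \text{(degree up to } d\text{)}

K(u, v) = \exp\!\big( -\|u - v\|^2 / 2\sigma^2 \big) \qquad \text{(Gaussian)}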
Gaussian kernel

[Figure: decision surface of a Gaussian-kernel SVM, showing level sets (i.e. w.x = r for some r) and the support vectors]

[Cynthia Rudin] [mblondel.org]


Kernel algebra

Q: How would you prove that the “Gaussian kernel” is a valid kernel?

A: Expand the Euclidean norm as follows:

Then, apply (e) from above.

To see that this is a kernel, use the Taylor series expansion of the exponential, together with repeated application of (a), (b), and (c). The feature mapping is infinite dimensional!

[Justin Domke]
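As a hedged sketch of the expansion being described (taking σ = 1 for simplicity):

\exp\!\big( -\tfrac{1}{2}\|u - v\|^2 \big) = \exp\!\big( -\tfrac{1}{2}\|u\|^2 \big) \exp\!\big( -\tfrac{1}{2}\|v\|^2 \big) \exp(u \cdot v) = \exp\!\big( -\tfrac{1}{2}\|u\|^2 \big) \exp\!\big( -\tfrac{1}{2}\|v\|^2 \big) \sum_{k=0}^{\infty} \frac{(u \cdot v)^k}{k!}

Each term (u \cdot v)^k is a polynomial kernel, and the closure rules let us scale, sum, and take limits of kernels, so the Gaussian is a valid kernel whose feature mapping is infinite dimensional.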
Overfitting?

• Huge feature space with kernels: should we worry about overfitting?
  – SVM objective seeks a solution with large margin
    • Theory says that large margin leads to good generalization (we will see this in a couple of lectures)
  – But everything overfits sometimes!!!
  – Can control by (see the sketch after this list):
    • Setting C
    • Choosing a better kernel
    • Varying parameters of the kernel (width of Gaussian, etc.)
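One common way to tune these knobs in practice, sketched with scikit-learn (the data and parameter grid are arbitrary, not from the slides):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy data; in practice, use the actual training set.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Search over C and the Gaussian-kernel width (gamma is roughly 1 / (2 sigma^2)),
# keeping the combination with the best cross-validated accuracy.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)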
