
Support Vector Machines & Kernels

Lecture 5

David Sontag
New York University

Slides adapted from Luke Zettlemoyer and Carlos Guestrin


Multi-class SVM
As with the binary SVM, we introduce slack variables and maximize the margin:

To predict, we use:

Now can we learn it? 
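The slide's equations (images in the original deck) are not reproduced above. As a sketch, one standard way to write a multi-class SVM with slack variables — assuming one weight vector w_y per class and a shared slack per example, which may differ in detail from the slide — is:

\min_{\{w_y\},\,\xi} \;\; \frac{1}{2}\sum_{y} \|w_y\|^2 \;+\; C \sum_{j} \xi_j
\quad\text{s.t.}\quad w_{y_j} \cdot x_j \;\ge\; w_{y'} \cdot x_j + 1 - \xi_j \;\;\; \forall j,\; \forall y' \ne y_j, \qquad \xi_j \ge 0

with prediction \hat{y}(x) = \arg\max_{y} \; w_y \cdot x.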


How to deal with imbalanced data?

•  In many practical applications we may have imbalanced data sets
•  We may want errors to be equally distributed
between the positive and negative classes
•  A slight modification to the SVM objective
does the trick!

Class-specific weighting of the slack variables
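The modified objective is not shown on the slide; a plausible form (an assumption, using separate slack penalties C_+ and C_- for the two classes) is:

\min_{w,b,\xi} \;\; \frac{1}{2}\|w\|^2 \;+\; C_{+} \sum_{j:\, y_j = +1} \xi_j \;+\; C_{-} \sum_{j:\, y_j = -1} \xi_j

subject to the usual constraints y_j (w \cdot x_j + b) \ge 1 - \xi_j and \xi_j \ge 0. Choosing C_+ and C_- inversely proportional to the class sizes spreads the errors more evenly between the positive and negative classes.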


What’s Next!
•  Learn one of the most interesting and
exciting recent advancements in machine
learning
–  The “kernel trick”
–  High dimensional feature spaces at no extra
cost!
•  But first, a detour
–  Constrained optimization!
Constrained optimization

[Three plots of minimizing x²:]
No constraint: x* = 0
Constraint x ≥ -1: x* = 0
Constraint x ≥ 1: x* = 1

How do we solve with constraints?


 Lagrange Multipliers!!!
Lagrange multipliers – Dual variables
Constraint: x ≥ b. Rewrite it as x − b ≥ 0 and add a Lagrange multiplier α ≥ 0.
Introduce the Lagrangian (objective):
L(x, α) = x² − α(x − b)
We will solve:
min_x max_{α ≥ 0} L(x, α)
Why is this equivalent?
•  min is fighting max!
   x < b  ⇒  (x − b) < 0  ⇒  max_{α ≥ 0} −α(x − b) = ∞
   min won't let this happen!
•  x > b, α ≥ 0 (the new constraint)  ⇒  (x − b) > 0  ⇒  max_{α ≥ 0} −α(x − b) = 0, with α* = 0
   min is cool with 0, and L(x, α) = x² (the original objective)
•  x = b  ⇒  α can be anything, and L(x, α) = x² (the original objective)

The min on the outside forces max to behave, so constraints will be satisfied.
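As a small worked instance (my own illustration, not from the slides), take b > 0:

\frac{\partial L}{\partial x} = 2x - \alpha = 0 \;\Rightarrow\; x = \frac{\alpha}{2},
\qquad
g(\alpha) = \min_x L(x, \alpha) = \alpha b - \frac{\alpha^2}{4},
\qquad
\max_{\alpha \ge 0} g(\alpha) \;\Rightarrow\; \alpha^* = 2b,\;\; x^* = b

which is exactly the constrained optimum; for b ≤ 0 the unconstrained optimum is already feasible, so α* = 0 and x* = 0 as before.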
Dual SVM derivation (1) – the linearly
separable case

Original optimization problem:

Rewrite the constraints, with one Lagrange multiplier per example.

Lagrangian:

Our goal now is to solve:
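The equations themselves are omitted above; the standard linearly separable (hard-margin) SVM they refer to is, as a sketch:

\min_{w,b} \;\; \frac{1}{2}\|w\|^2
\quad\text{s.t.}\quad y_j (w \cdot x_j + b) - 1 \ge 0 \;\;\forall j

L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_j \alpha_j \big[ y_j (w \cdot x_j + b) - 1 \big], \qquad \alpha_j \ge 0

\min_{w,b} \; \max_{\alpha \ge 0} \; L(w, b, \alpha)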


Dual SVM derivation (2) – the linearly
separable case

(Primal)

Swap min and max

(Dual)

Slater’s condition from convex optimization guarantees that these two optimization problems are equivalent!
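Written out (a sketch of the step the slide illustrates):

(Primal) \;\; \min_{w,b} \; \max_{\alpha \ge 0} \; L(w, b, \alpha)
\quad\longrightarrow\quad
(Dual) \;\; \max_{\alpha \ge 0} \; \min_{w,b} \; L(w, b, \alpha)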
 
Dual SVM derivation (3) – the linearly separable case

φ(x) = [ x^(1), …, x^(n), x^(1) x^(2), x^(1) x^(3), …, e^(x^(1)), … ]

(Dual)

Can solve for the optimal w, b as a function of α:

∂L/∂w = w − Σ_j α_j y_j x_j   (set to zero ⇒ w = Σ_j α_j y_j x_j)

Substituting these values back in (and simplifying), we obtain the dual problem:

(Dual)

The sums run over all training examples, the α_i α_j y_i y_j terms are scalars, and the training examples enter only through the dot product x_i · x_j.
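For reference, the standard dual of the hard-margin SVM (which is what the omitted slide formula should correspond to) is:

\max_{\alpha \ge 0} \;\; \sum_i \alpha_i \;-\; \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j \, y_i y_j \, (x_i \cdot x_j)
\quad\text{s.t.}\quad \sum_i \alpha_i y_i = 0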


 
So, in dual formulation we will solve for α directly!


•  w and b are computed from α (if needed)
Dual SVM derivation (3) – the linearly
separable case
Lagrangian:

αj > 0 for some j implies the corresponding constraint is tight (complementary slackness). We use this to obtain b:

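The three numbered steps on the slide are not reproduced; reconstructed in the standard way (a sketch, using a support vector j with α_j > 0):

(1) \;\; y_j (w \cdot x_j + b) = 1
(2) \;\; b = y_j - w \cdot x_j
(3) \;\; b = y_j - \sum_i \alpha_i y_i \, (x_i \cdot x_j)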
Dual for the non-separable case – same basic
story (we will skip details)

Primal: Solve for w, b, ξ:

Dual:

What changed?
•  Added upper bound of C on αi!
•  Intuitive explanation:
   •  Without slack, αi → ∞ when constraints are violated (points misclassified)
   •  Upper bound of C limits the αi, so misclassifications are allowed
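For reference, the soft-margin primal and its dual (standard forms; the slide's own notation may differ slightly):

Primal: \;\; \min_{w,b,\xi} \;\; \frac{1}{2}\|w\|^2 + C \sum_j \xi_j
\quad\text{s.t.}\quad y_j (w \cdot x_j + b) \ge 1 - \xi_j, \;\; \xi_j \ge 0

Dual: \;\; \max_{\alpha} \;\; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j \, y_i y_j \, (x_i \cdot x_j)
\quad\text{s.t.}\quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i y_i = 0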
Wait a minute: why did we learn about the dual
SVM?

•  There are some quadratic programming algorithms that can solve the dual faster than the primal
–  At least for small datasets

•  But, more importantly, the “kernel trick”!!!


Reminder: What if the data is not
linearly separable?
Use features of features
of features of features….
 
φ(x) = [ x^(1), …, x^(n), x^(1) x^(2), x^(1) x^(3), …, e^(x^(1)), … ]

Feature space can get really large really quickly!


Higher order polynomials
[Plot: number of monomial terms vs. number of input dimensions, for polynomial degrees d = 2, 3, 4]

m – number of input features
d – degree of polynomial

The number of terms grows fast! For d = 6, m = 100: about 1.6 billion terms.
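A quick sanity check on that count (my own snippet, assuming the plot counts monomials of degree exactly d in m variables, i.e. C(m + d − 1, d)):

# Count monomials of degree exactly d in m variables: C(m + d - 1, d)
import math

def num_monomials(m: int, d: int) -> int:
    return math.comb(m + d - 1, d)

print(num_monomials(100, 6))  # 1609344100 -- about 1.6 billion, matching the slide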
Dual formulation only depends on
dot-products, not on w!

First, we introduce features. (Remember: the examples x only appear in one dot product.)

Next, replace the dot product with a kernel:

Why is this useful???
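Spelling out the replacement (standard form; the slide shows it as an image): with K(x_i, x_j) = φ(x_i) · φ(x_j), the dual becomes

\max_{\alpha} \;\; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad\text{s.t.}\quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i y_i = 0

so training never needs φ(x) explicitly — only kernel values.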



Efficient dot-product of polynomials
Polynomials of degree exactly d

d = 1:
φ(u) · φ(v) = [u1, u2] · [v1, v2] = u1 v1 + u2 v2 = u · v

d = 2:
φ(u) · φ(v) = [u1², u1 u2, u2 u1, u2²] · [v1², v1 v2, v2 v1, v2²]
            = u1² v1² + 2 u1 v1 u2 v2 + u2² v2²
            = (u1 v1 + u2 v2)²
            = (u · v)²

For any d (we will skip the proof): φ(u) · φ(v) = (u · v)^d

•  Cool! Taking a dot product and exponentiating gives the same result as mapping into the high-dimensional space and then taking the dot product
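A quick numerical check of the d = 2 identity (my own snippet; the explicit feature-map ordering is one of several equivalent choices):

# Verify: phi(u) . phi(v) == (u . v)**2 for the degree-2 monomial feature map
import numpy as np

def phi2(x):
    # All degree-2 monomials of a 2-d vector: [x1^2, x1*x2, x2*x1, x2^2]
    return np.array([x[0]*x[0], x[0]*x[1], x[1]*x[0], x[1]*x[1]])

u, v = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(phi2(u) @ phi2(v))   # 1.0
print((u @ v) ** 2)        # 1.0 -- same value, without ever forming phi explicitly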
Finally: the “kernel trick”!

•  Never compute features explicitly!!!
–  Compute dot products in closed form
•  Constant-time high-dimensional dot-products for many classes of features
•  But, O(n²) time in size of dataset to compute the objective
–  Naïve implementations are slow
–  Much work on speeding this up
Common kernels
•  Polynomials of degree exactly d

•  Polynomials of degree up to d

•  Gaussian kernels

•  Sigmoid

•  And many others: very active area of research!
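The kernel formulas themselves are omitted above; the usual forms (standard definitions, with my own notation for the hyperparameters) are:

Polynomial of degree exactly d: \; K(u, v) = (u \cdot v)^d
Polynomial of degree up to d:  \; K(u, v) = (u \cdot v + 1)^d
Gaussian (RBF):                \; K(u, v) = \exp\!\big( -\|u - v\|^2 / 2\sigma^2 \big)
Sigmoid:                       \; K(u, v) = \tanh(\eta \, u \cdot v + \nu)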
