
Lecture 3: Dual problems and Kernels

C4B Machine Learning Hilary 2011 A. Zisserman

• Primal and dual forms

• Linear separability revisited

• Feature mapping

• Kernels for SVMs


• Kernel trick
• requirements
• radial basis functions

SVM – review
• We have seen that for an SVM learning a linear classifier

f(x) = w^T x + b
is formulated as solving an optimization problem over w :
\min_{w \in R^d} \|w\|^2 + C \sum_{i=1}^{N} \max(0, 1 - y_i f(x_i))
• This quadratic optimization problem is known as the primal problem.

• Instead, the SVM can be formulated to learn a linear classifier


f(x) = \sum_{i=1}^{N} \alpha_i y_i (x_i^T x) + b

by solving an optimization problem over α_i.

• This is known as the dual problem, and we will look at the advantages
of this formulation.
Sketch derivation of dual form
The Representer Theorem states that the solution w can always be
written as a linear combination of the training data:
w = \sum_{j=1}^{N} \alpha_j y_j x_j

Proof: see example sheet.

Now, substitute for w in f(x) = w^T x + b:

f(x) = \left( \sum_{j=1}^{N} \alpha_j y_j x_j \right)^T x + b = \sum_{j=1}^{N} \alpha_j y_j (x_j^T x) + b

and for w in the cost function \min_w \|w\|^2 subject to y_i (w^T x_i + b) \ge 1, \forall i:

\|w\|^2 = \left( \sum_j \alpha_j y_j x_j \right)^T \left( \sum_k \alpha_k y_k x_k \right) = \sum_{jk} \alpha_j \alpha_k y_j y_k (x_j^T x_k)

Hence, an equivalent optimization problem is over α_j:

\min_{\alpha_j} \sum_{jk} \alpha_j \alpha_k y_j y_k (x_j^T x_k) \quad \text{subject to} \quad y_i \left( \sum_{j=1}^{N} \alpha_j y_j (x_j^T x_i) + b \right) \ge 1, \; \forall i
and a few more steps are required to complete the derivation.

Primal and dual formulations


N is number of training points, and d is dimension of feature vector x.

Primal problem: for w ∈ R^d

\min_{w \in R^d} \|w\|^2 + C \sum_{i=1}^{N} \max(0, 1 - y_i f(x_i))

Dual problem: for α ∈ R^N (stated without proof):


\max_{\alpha_i \ge 0} \sum_i \alpha_i - \frac{1}{2} \sum_{jk} \alpha_j \alpha_k y_j y_k (x_j^T x_k) \quad \text{subject to} \quad 0 \le \alpha_i \le C \; \forall i, \;\; \sum_i \alpha_i y_i = 0

• Complexity of solution is O(d^3) for primal, and O(N^3) for dual

• If N ≪ d then it is more efficient to solve for α than for w

• Dual form only involves (x_j^T x_i). We will return to why this is an
advantage when we look at kernels.
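
As a concrete check of the primal/dual relationship and the Representer Theorem (a minimal sketch assuming numpy and scikit-learn are available; the toy data and parameter choices are illustrative): SVC with a linear kernel solves the dual, and w can be reconstructed from the α_i y_i stored for the support vectors.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=2, random_state=0)   # toy two-class data
y = 2 * y - 1                                                 # relabel classes as {-1, +1}

clf = SVC(kernel='linear', C=1.0).fit(X, y)                   # fitting solves the dual problem

# dual_coef_ stores alpha_i * y_i for the support vectors; all other alpha_i are zero
w_dual = clf.dual_coef_ @ clf.support_vectors_                # w = sum_i alpha_i y_i x_i
print(np.allclose(w_dual, clf.coef_))                         # matches the primal weight vector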
Primal and dual formulations

Primal version of classifier:

f(x) = w^T x + b

Dual version of classifier:


f(x) = \sum_{i=1}^{N} \alpha_i y_i (x_i^T x) + b

At first sight the dual form appears to have the disadvantage of a K-NN classifier — it requires the training data points x_i. However, many of the α_i are zero. The ones that are non-zero define the support vectors x_i.

Support Vector Machine

[Figure: the separating hyperplane w^T x + b = 0, its distance b/||w|| from the origin, and the support vectors lying on the margin]

f(x) = \sum_i \alpha_i y_i (x_i^T x) + b   (the sum runs over the support vectors)
Handling data that is not linearly separable

• introduce slack variables


\min_{w \in R^d, \; \xi_i \in R^+} \|w\|^2 + C \sum_{i=1}^{N} \xi_i

subject to

y_i (w^T x_i + b) \ge 1 - \xi_i \quad \text{for } i = 1 \ldots N

• but what can be done when a linear classifier is not appropriate?

Solution 1: use polar coordinates

[Figure: the same data plotted in the original coordinates and in polar coordinates (r, θ)]

• Data is linearly separable in polar coordinates


• Acts non-linearly in original space
Φ : (x_1, x_2)^T → (r, θ)^T, \quad R^2 → R^2
Solution 2: map data to higher dimension
Φ : (x_1, x_2)^T → (x_1^2, \; x_2^2, \; \sqrt{2}\, x_1 x_2)^T, \quad R^2 → R^3


Z= 2x1x2

0
Y = x2
2 X = x2
1
• Data is linearly separable in 3D
• This means that the problem can still be solved by a linear classifier
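
A minimal sketch of this idea (assuming numpy and scikit-learn; the toy data and the circular labelling rule are illustrative): the points are not linearly separable in 2D but become separable after the explicit map Φ.

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.where(X[:, 0]**2 + X[:, 1]**2 < 0.5, 1, -1)     # circular rule: not linearly separable in 2D

def phi(X):
    # explicit feature map R^2 -> R^3 from the slide
    return np.column_stack([X[:, 0]**2, X[:, 1]**2, np.sqrt(2) * X[:, 0] * X[:, 1]])

clf = LinearSVC(C=10.0, max_iter=10000).fit(phi(X), y)  # linear classifier in the mapped space
print(clf.score(phi(X), y))                             # close to 1.0: separable after the map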

SVM classifiers in a transformed feature space


[Figure: the map Φ sends points from R^d to R^D, where the decision boundary f(x) = 0 is a hyperplane]

Φ : x → Φ(x), \quad R^d → R^D

Learn classifier linear in w for R^D:

f(x) = w^T Φ(x) + b
Primal Classifier in transformed feature space

Classifier, with w ∈ R^D:

f(x) = w^T Φ(x) + b

Learning, for w ∈ R^D:

\min_{w \in R^D} \|w\|^2 + C \sum_{i=1}^{N} \max(0, 1 - y_i f(x_i))

• Simply map x to Φ(x) where data is separable

• Solve for w in high dimensional space R^D

• Complexity of solution is now O(D^3) rather than O(d^3)

Dual Classifier in transformed feature space

Classifier:
f(x) = \sum_{i=1}^{N} \alpha_i y_i (x_i^T x) + b

→ f(x) = \sum_{i=1}^{N} \alpha_i y_i \, \Phi(x_i)^T \Phi(x) + b
Learning:
\max_{\alpha_i \ge 0} \sum_i \alpha_i - \frac{1}{2} \sum_{jk} \alpha_j \alpha_k y_j y_k (x_j^T x_k)

→ \max_{\alpha_i \ge 0} \sum_i \alpha_i - \frac{1}{2} \sum_{jk} \alpha_j \alpha_k y_j y_k \, \Phi(x_j)^T \Phi(x_k)
subject to
0 \le \alpha_i \le C \; \forall i, \quad \text{and} \quad \sum_i \alpha_i y_i = 0
Dual Classifier in transformed feature space
• Note that Φ(x) only occurs in pairs Φ(x_j)^T Φ(x_i)

• Once the scalar products are computed, complexity is again O(N^3); it is not necessary to learn in the D-dimensional space, as it is for the primal

• Write k(x_j, x_i) = Φ(x_j)^T Φ(x_i). This is known as a Kernel

Classifier:
f(x) = \sum_{i=1}^{N} \alpha_i y_i \, k(x_i, x) + b
Learning:
\max_{\alpha_i \ge 0} \sum_i \alpha_i - \frac{1}{2} \sum_{jk} \alpha_j \alpha_k y_j y_k \, k(x_j, x_k)
subject to
0 \le \alpha_i \le C \; \forall i, \quad \text{and} \quad \sum_i \alpha_i y_i = 0
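
As an illustration of how the kernelised classifier is evaluated, here is a plain-numpy sketch (the function and variable names are illustrative; the α_i and b are assumed to come from whatever solver was used for the dual). Only points with non-zero α_i, the support vectors, contribute.

import numpy as np

def decision_function(x, X_train, y_train, alpha, b, k):
    # only training points with alpha_i > 0 (the support vectors) contribute to the sum
    sv = alpha > 0
    return sum(a * y * k(xi, x) for a, y, xi in zip(alpha[sv], y_train[sv], X_train[sv])) + b

def quadratic(x, z):
    # example kernel from the slides: k(x, z) = (x^T z)^2
    return (x @ z) ** 2

# toy usage with made-up values for alpha and b
X_train = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_train = np.array([1, -1, 1])
alpha = np.array([0.5, 0.5, 0.0])      # the third point is not a support vector
print(decision_function(np.array([0.5, 0.5]), X_train, y_train, alpha, 0.1, quadratic))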

Special transformations
Φ : (x_1, x_2)^T → (x_1^2, \; x_2^2, \; \sqrt{2}\, x_1 x_2)^T, \quad R^2 → R^3

\Phi(x)^T \Phi(z) = (x_1^2, \; x_2^2, \; \sqrt{2}\, x_1 x_2) \, (z_1^2, \; z_2^2, \; \sqrt{2}\, z_1 z_2)^T
                  = x_1^2 z_1^2 + x_2^2 z_2^2 + 2 x_1 x_2 z_1 z_2
                  = (x_1 z_1 + x_2 z_2)^2
                  = (x^T z)^2
Kernel Trick
• Classifier can be learnt and applied without explicitly computing Φ(x)

• All that is required is the kernel k(x, z) = (x^T z)^2

• Complexity is still O(N^3)
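
A quick numeric check of this identity (toy vectors, assuming numpy): the explicit map Φ and the kernel k(x, z) = (x^T z)^2 give the same value, so Φ never has to be formed.

import numpy as np

def phi(x):
    # explicit feature map R^2 -> R^3 from the previous slide
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

x = np.array([0.3, -1.2])
z = np.array([2.0, 1.0])
print(phi(x) @ phi(z))     # scalar product in the mapped space
print((x @ z) ** 2)        # kernel evaluated in the original space: same value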


Example kernels

• Linear kernels: k(x, x') = x^T x'


• Polynomial kernels: k(x, x') = (1 + x^T x')^d for any d > 0

— Contains all polynomial terms up to degree d


• Gaussian kernels: k(x, x') = exp(−||x − x'||^2 / 2σ^2) for σ > 0

— Infinite dimensional feature space
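
For reference, one-line sketches of these three kernels (assuming numpy; x and z are 1-D arrays, and the default parameter values are illustrative):

import numpy as np

def linear_kernel(x, z):
    return x @ z

def polynomial_kernel(x, z, d=3):
    return (1.0 + x @ z) ** d

def gaussian_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))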

Valid kernels – when can the kernel trick be used?

• Given some arbitrary function k(x_i, x_j), how do we know if it corresponds to a scalar product Φ(x_i)^T Φ(x_j) in some space?

• Mercer kernels: if k(·, ·) satisfies:

— Symmetric: k(x_i, x_j) = k(x_j, x_i)
— Positive semi-definite: α^T K α ≥ 0 for all α ∈ R^N, where K is the N × N Gram matrix with entries K_ij = k(x_i, x_j)

then k(·, ·) is a valid kernel.

• e.g. k(x, z) = x^T z is a valid kernel, k(x, z) = x − x^T z is not.
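
A small sketch of the Mercer check in practice (toy data, assuming numpy): build the Gram matrix of a candidate kernel and test symmetry and positive semi-definiteness through its eigenvalues.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))

def gram(k, X):
    # N x N Gram matrix with entries K_ij = k(x_i, x_j)
    return np.array([[k(xi, xj) for xj in X] for xi in X])

K = gram(lambda x, z: x @ z, X)                   # linear kernel
print(np.allclose(K, K.T))                        # symmetric
print(np.linalg.eigvalsh(K).min() >= -1e-10)      # eigenvalues non-negative (up to round-off)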


SVM classifier with Gaussian kernel

N = size of training data


f(x) = \sum_{i=1}^{N} \alpha_i y_i \, k(x_i, x) + b

(each x_i is a support vector; its weight α_i may be zero)

Gaussian kernel: k(x, x') = exp(−||x − x'||^2 / 2σ^2)

Radial Basis Function (RBF) SVM


f(x) = \sum_{i=1}^{N} \alpha_i y_i \exp(-\|x - x_i\|^2 / 2\sigma^2) + b

RBF Kernel SVM Example

[Figure: training data from two classes plotted against feature x and feature y]

• data is not linearly separable in original feature space


f(x) = \sum_{i=1}^{N} \alpha_i y_i \exp(-\|x - x_i\|^2 / 2\sigma^2) + b

[Figures: the RBF kernel SVM decision function on this data, with the contours f(x) = 1, f(x) = 0 and f(x) = −1 marked, for several parameter settings: σ = 1.0 with C = ∞, C = 100 and C = 10, then C = ∞ with σ = 1.0, σ = 0.25 and σ = 0.1]

• Decreasing C gives a wider (soft) margin

• Decreasing σ moves the classifier towards a nearest neighbour classifier
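
A hedged sketch of this kind of experiment (assuming scikit-learn; the toy data is illustrative, scikit-learn parameterises the RBF kernel as gamma = 1/(2σ^2), and a very large C stands in for C = ∞):

from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)   # toy non-linear data

for sigma, C in [(1.0, 1e6), (1.0, 100.0), (1.0, 10.0), (0.25, 1e6), (0.1, 1e6)]:
    gamma = 1.0 / (2.0 * sigma ** 2)                          # scikit-learn's RBF parameter
    clf = SVC(kernel='rbf', C=C, gamma=gamma).fit(X, y)
    print(f"sigma={sigma}, C={C:g}: {clf.n_support_.sum()} support vectors")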
Kernel block structure
N × N Gram matrix with entries K_ij = k(x_i, x_j)

[Figures: decision boundaries for a linear kernel (C = 0.1) and an RBF kernel (C = 1, gamma = 0.25), with positive and negative vectors, support vectors, margin vectors, the decision boundary and the two margins marked; below them, the corresponding Gram matrices for the linear and RBF kernels]

The kernel measures similarity between the points.
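
A small sketch (assuming numpy and scikit-learn; the toy data is illustrative) that computes both Gram matrices with the training points ordered by class, so the within-class similarity of the RBF kernel shows up as blocks:

import numpy as np
from sklearn.metrics.pairwise import linear_kernel, rbf_kernel

rng = np.random.default_rng(2)
X_pos = rng.normal(loc=+2.0, size=(15, 2))        # positive class
X_neg = rng.normal(loc=-2.0, size=(15, 2))        # negative class
X = np.vstack([X_pos, X_neg])                     # rows ordered by class

K_lin = linear_kernel(X)                          # K_ij = x_i^T x_j
K_rbf = rbf_kernel(X, gamma=0.25)                 # K_ij = exp(-gamma ||x_i - x_j||^2)

# with the rows sorted by class, within-class entries are much larger than between-class ones
print(K_rbf[:15, :15].mean(), K_rbf[:15, 15:].mean())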

Kernel Trick - Summary


• Classifiers can be learnt for high dimensional feature spaces, without actually having to map the points into the high dimensional space

• Data may be linearly separable in the high dimensional space, but not
linearly separable in the original feature space

• Kernels can be used for an SVM because of the scalar product in the dual
form, but can also be used elsewhere – they are not tied to the SVM formalism

• Kernels apply also to objects that are not vectors, e.g. k(h, h') = \sum_k \min(h_k, h'_k) for histograms with bins h_k, h'_k (see the sketch after this list)

• We will see other examples of kernels later in regression and unsupervised learning
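
A sketch of the histogram intersection kernel mentioned above (assuming numpy; the histogram values are illustrative toys):

import numpy as np

def histogram_intersection(h1, h2):
    # k(h, h') = sum_k min(h_k, h'_k)
    return np.minimum(h1, h2).sum()

h1 = np.array([0.1, 0.4, 0.3, 0.2])    # two toy normalised histograms
h2 = np.array([0.3, 0.3, 0.2, 0.2])
print(histogram_intersection(h1, h2))  # 0.8: close to 1 when the histograms are similar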
Background reading
• Bishop, chapters 6.2 and 7

• Hastie et al, chapter 12

• More on web page: http://www.robots.ox.ac.uk/~az/lectures/ml
