0% found this document useful (0 votes)

84 views

Machine Learning-Kernel Methods

This document summarizes a lecture on support vector machines (SVMs). It discusses three key topics: 1) The separable case, where training examples are linearly separable. The SVM aims to maximize the margin between classes by minimizing a regularization penalty. 2) The non-separable case, where examples are not perfectly separable. The optimization problem is modified to allow for some misclassifications by adding penalty terms. 3) A comparison of SVMs to logistic regression, noting they both aim to minimize a regularized empirical loss function, though SVMs focus on large-margin separation while logistic regression models class probabilities.

Uploaded by

aviral1987

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

84 views

Machine Learning-Kernel Methods

Uploaded by

aviral1987

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 5

Machine learning: lecture 7 Topics

Tommi S. Jaakkola • Support vector machines

MIT CSAIL – separable case, formulation, margin
[email protected] – non-separable case, penalties, and logistic regression
– dual solution, kernels
– examples, properties

Tommi Jaakkola, MIT CSAIL 2

Support vector machine (SVM) SVM: separable case

(d
• When the training examples are linearly separable we can • We minimize "w1"2/2 = 2
i=1 wi /2 subject to
maximize a geometric notion of margin (distance to the
yi [w0 + xTi w1] − 1 ≥ 0, i = 1, . . . , n
boundary) by minimizing the regularization penalty
d margin = 1/!ŵ1! o
! o o o
x
"w1"2/2 = wi2/2 x o f (x; ŵ) = ŵ0 + xT ŵ1
o
i=1 x o
o
o x
x o o o x ŵ1 o
o o ooo o o xx xxx
subject to the classification constraints x
o
x x x
x− x+ w1
x o x
o + −
x |x − x |/2
yi[w0 + xTi w1] −1≥0 x
x
o x
x x
for i = 1, . . . , n. x • The resulting margin and the “slope” "ŵ1" are inversely
x
x related
• The solution is defined only on the basis of a subset of
examples or “support vectors”

Tommi Jaakkola, MIT CSAIL 3 Tommi Jaakkola, MIT CSAIL 4

SVM: non-separable case SVM: non-separable case cont’d

• When the examples are not linearly separable we can modify • We can also write the SVM optimization problem more
the optimization problem slightly to add a penalty for compactly as
violating the classification constraints: ξi
n "
! & #$ '+%
We minimize
n C 1 − yi [w0 + xi w1] +"w1"2/2
T
!
"w1"2/2 + C ξi x o
o
o o
i=1

i=1 x x
o
o where (z) = z if z ≥ 0 and zero otherwise (i.e., returns the
+
x o
subject to relaxed classification x
o positive part).
x ŵ1 o
constraints x x x
x
yi [w0 + xTi w1] − 1 + ξi ≥ 0, x
o
x

for i = 1, . . . , n. Here ξi ≥ 0 are

called “slack” variables.

Tommi Jaakkola, MIT CSAIL 5 Tommi Jaakkola, MIT CSAIL 6

SVM: non-separable case cont’d SVM vs logistic regression
• We can also write the SVM optimization problem more • When viewed from the point of view of regularized empirical
compactly as loss minimization, SVM and logistic regression appear quite
ξi similar:
n "
! & #$ '+%
1 !& '+
n
C 1 − yi [w0 + xi w1] +"w1"2/2
T
SVM: 1 − yi [w0 + xTi w1] + λ"w1"2/2
i=1 n i=1

where (z) = z if z ≥ 0 and zero otherwise (i.e., returns the

+ − log P (y |x,w)
i
n " & #$ '%
1!
positive part). Logistic: − log g yi [w0 + xTi w1] +λ"w1"2/2
n i=1
• This is equivalent to regularized empirical loss minimization
where g(z) = (1 + exp(−z))−1 is the logistic function.
1 !& '+
n
1 − yi [w0 + xTi w1] + λ"w1"2/2 (Note that we have transformed the problem maximizing the
n i=1
penalized log-likelihood into minimizing negative penalized
where λ = 1/nC is the regularization parameter. log-likelihood.)

Tommi Jaakkola, MIT CSAIL 7 Tommi Jaakkola, MIT CSAIL 8

SVM vs logistic regression cont’d SVM: solution, Lagrange multipliers

• The difference comes from how we penalize “errors”: • Back to the separable case: how do we solve
&" z
1!
n #$ %' min "w1"2/2 subject to
Both: Loss yi [w0 + xTi w1] + λ"w1"2/2
n i=1 yi [w0 + xTi w1] − 1 ≥ 0, i = 1, . . . , n
5
SVM loss
• SVM: 4.5 LR loss

Loss(z) = (1 − z)+ 3.5

3
loss

2.5

• Regularized logistic reg: 2

1.5

Loss(z) = log(1 + exp(−z)) 1

0.5

0
!4 !3 !2 !1 0 1 2 3 4
z

Tommi Jaakkola, MIT CSAIL 9 Tommi Jaakkola, MIT CSAIL 10

SVM: solution, Lagrange multipliers SVM: solution, Lagrange multipliers

• Back to the separable case: how do we solve • Back to the separable case: how do we solve

min "w1"2/2 subject to min "w1"2/2 subject to

yi [w0 + xTi w1] − 1 ≥ 0, i = 1, . . . , n yi [w0 + xTi w1] − 1 ≥ 0, i = 1, . . . , n

• Let start by representing the constraints as losses • Let start by representing the constraints as losses
+ +
) * 0, yi [w0 + xTi w1] − 1 ≥ 0 ) * 0, yi [w0 + xTi w1] − 1 ≥ 0
max α 1 − yi [w0 + xTi w1] = max α 1 − yi [w0 + xTi w1] =
α≥0 ∞, otherwise α≥0 ∞, otherwise

and rewrite the minimization problem in terms of these

, n
! ) *-
min "w1"2/2 + max αi 1 − yi [w0 + xTi w1]
w αi≥0
i=1

Tommi Jaakkola, MIT CSAIL 11 Tommi Jaakkola, MIT CSAIL 12

SVM: solution, Lagrange multipliers SVM solution cont’d
• Back to the separable case: how do we solve • We can then swap ’max’ and ’min’:
min "w1"2/2 subject to , n
! ) *-
min max "w1"2/2 + αi 1 − yi [w0 + xTi w1]
yi [w0 + xTi w1] − 1 ≥ 0, i = 1, . . . , n w {αi≥0}
i=1
, n
! ) *-
• Let start by representing the constraints as losses ?
= max min "w1"2/2 + αi 1 − yi [w0 + xTi w1]
+ {αi≥0} w
) * 0, yi [w0 + xTi w1] − 1 ≥ 0 $ i=1 %" #
max α 1 − yi [w0 + xTi w1] = J(w;α)
α≥0 ∞, otherwise
As a result we have to be able to minimize J(w; α) with
and rewrite the minimization problem in terms of these
respect to parameters w for any fixed setting of the Lagrange
, !n
) *- multipliers αi ≥ 0.
min "w1"2/2 + max αi 1 − yi [w0 + xTi w1]
w αi≥0
i=1
, n
! ) *-
= min max "w1"2/2 + αi 1 − yi [w0 + xTi w1]
w {αi≥0}
i=1

Tommi Jaakkola, MIT CSAIL 13 Tommi Jaakkola, MIT CSAIL 14

SVM solution cont’d SVM solution cont’d

• We can then swap ’max’ and ’min’: • We can then substitute the solution
! n
, !n
) *- ∂
min max "w1"2/2 + αi 1 − yi [w0 + xTi w1] J(w; α) = w1 − αiyixi = 0
w {αi≥0}
∂w1 i=1
i=1
! n
, n
! ) *- ∂
?
= max min "w1" /2 +2
αi 1 − yi [w0 + xTi w1] J(w; α) = − αiyi = 0
{αi≥0} w ∂w0 i=1
$ i=1 %" #
J(w;α) back into the objective and get (after some algebra):
, !n
) *-
We can find the optimal ŵ as a function of {αi} by setting
max "ŵ1"2/2 + αi 1 − yi [ŵ0 + xTi ŵ1]
the derivatives to zero: P
αi ≥ 0
i=1
n αiyi = 0
∂ ! i

J(w; α) = w1 − αiyixi = 0 ,!
n
1 !
n -
∂w1 i=1 = max αi − yiyj αiαj (xTi xj )
n αi ≥ 0 2 i,j=1
∂ ! P
αiyi = 0
i=1
J(w; α) = − αiyi = 0 i

∂w0 i=1

Tommi Jaakkola, MIT CSAIL 15 Tommi Jaakkola, MIT CSAIL 16

SVM solution: summary SVM solution: summary

• We can find the optimal setting of the Lagrange multipliers • We can find the optimal setting of the Lagrange multipliers
αi by maximizing αi by maximizing
n
! n n n
1 ! ! 1 !
αi − yiyj αiαj (xTi xj ) αi − yiyj αiαj (xTi xj )
i=1
2 i,j=1 i=1
2 i,j=1
( (
subject to αi ≥ 0 and i αiyi = 0. Only αi’s corresponding subject to αi ≥ 0 and i αiyi = 0. Only αi’s corresponding
to “support vectors” will be non-zero. to “support vectors” will be non-zero.
• We can make predictions on any new example x according
to the sign of the discriminant function
n
)! !
ŵ0 + xT ŵ1= ŵ0 + xT α̂iyixi) = ŵ0 + α̂iyi(xT xi)
i=1 i∈SV

Tommi Jaakkola, MIT CSAIL 17 Tommi Jaakkola, MIT CSAIL 18

SVM solution: summary SVM solution: summary
• We can find the optimal setting of the Lagrange multipliers • We can find the optimal setting of the Lagrange multipliers
αi by maximizing αi by maximizing
n
! n n n
1 ! ! 1 !
αi − yiyj αiαj (xTi xj ) αi − yiyj αiαj (xTi xj )
i=1
2 i,j=1 i=1
2 i,j=1
( (
subject to αi ≥ 0 and i αiyi = 0. Only αi’s corresponding subject to αi ≥ 0 and i αiyi = 0. Only αi’s corresponding
to “support vectors” will be non-zero. to “support vectors” will be non-zero.
• We can make predictions on any new example x according • We can make predictions on any new example x according
to the sign of the discriminant function to the sign of the discriminant function
n n
)! ! )! !
ŵ0 + xT ŵ1 = ŵ0 + xT α̂iyixi)= ŵ0 + α̂iyi(xT xi) ŵ0 + xT ŵ1 = ŵ0 + xT α̂iyixi) = ŵ0 + α̂iyi(xT xi)
i=1 i∈SV i=1 i∈SV

Tommi Jaakkola, MIT CSAIL 19 Tommi Jaakkola, MIT CSAIL 20

Non-linear classifier Non-linear classifier

• So far our classifier can make only linear separations
x x
• As with linear regression and logistic regression models, we x x
x

can easily obtain a non-linear classifier by first mapping our x

x
x
x

examples x = [x1 x2] into longer feature vectors φ(x)

x
x x
x
√ √ √ x
x
x x

φ(x) = [x21 x22 2x1x2 2x1 2x2 1]

Linear separator in the feature φ-space
and then applying the linear classifier to the new feature
vectors φ(x) x
x x
x x x
x
x
x
x
x
x x
x
x
x

Non-linear separator in the original x-space

Tommi Jaakkola, MIT CSAIL 21 Tommi Jaakkola, MIT CSAIL 22

Feature mapping and kernels Examples of kernel functions

• Let’s look at the previous example in a bit more detail • Linear kernel
√ √ √
x → φ(x) = [x21 x22 2x1x2 2x1 2x2 1] K(x, x%) = (xT x%)

• The SVM classifier deals only with inner products of examples • Polynomial kernel
(or feature vectors). In this example, ) *p
K(x, x%) = 1 + (xT x%)
φ(x) φ(x ) =
T %
x21x%2
1 + x22x%2
2 + 2x1x2x1x2
% %
+ 2x1x%1 + 2x2x%2 +1 where p = 2, 3, . . .. To get the feature vectors we
= (1 + x1x%1 + x2x%2)2 concatenate all up to pth order polynomial terms of the
) T % 2
* components of x (weighted appropriately)
= 1 + (x x )
• Radial basis kernel
so the inner products can be evaluated without ever explicitly . /
1
constructing the feature vectors φ(x)! K(x, x%) = exp − "x − x%"2
2
) *2
• K(x, x%) = 1 + (xT x%) is a kernel function (inner product In this case the feature space is infinite dimensional function
in the feature space) space (use of the kernel results in a non-parametric classifier).

Tommi Jaakkola, MIT CSAIL 23 Tommi Jaakkola, MIT CSAIL 24

SVM examples
2 2

1.5 1.5

1 1

0.5 0.5

0 0

!0.5 !0.5

!1 !1
!1.5 !1 !0.5 0 0.5 1 1.5 2 !1.5 !1 !0.5 0 0.5 1 1.5 2

linear 2nd order polynomial

2 2

1.5 1.5

1 1

0.5 0.5

0 0

!0.5 !0.5

!1 !1
!1.5 !1 !0.5 0 0.5 1 1.5 2 !1.5 !1 !0.5 0 0.5 1 1.5 2

4th order polynomial 8th order polynomial

Tommi Jaakkola, MIT CSAIL 25

Timo Contest
No ratings yet
Timo Contest
21 pages
SVM
No ratings yet
SVM
21 pages
CS-13410 Introduction To Machine Learning
No ratings yet
CS-13410 Introduction To Machine Learning
33 pages
SVM Tutorial
100% (1)
SVM Tutorial
34 pages
SVM Tutorial
No ratings yet
SVM Tutorial
34 pages
20 SVM
No ratings yet
20 SVM
35 pages
SVM Tutorial
No ratings yet
SVM Tutorial
34 pages
SVM PRESENTATION
No ratings yet
SVM PRESENTATION
34 pages
SVM Tutorial
No ratings yet
SVM Tutorial
31 pages
Machine Learning - Open Elective - Part III
No ratings yet
Machine Learning - Open Elective - Part III
90 pages
Support Vector Machine
No ratings yet
Support Vector Machine
45 pages
Support Vector Machines (SVMS) : Cs479/679 Pattern Recognition Dr. George Bebis
No ratings yet
Support Vector Machines (SVMS) : Cs479/679 Pattern Recognition Dr. George Bebis
37 pages
Lec5 Support vector machine
No ratings yet
Lec5 Support vector machine
28 pages
Support Vector Machines: Jeff Wu
No ratings yet
Support Vector Machines: Jeff Wu
35 pages
L5 SVM
No ratings yet
L5 SVM
61 pages
By: Moataz Al-Haj: Vision Topics - Seminar (University of Haifa)
No ratings yet
By: Moataz Al-Haj: Vision Topics - Seminar (University of Haifa)
69 pages
Lec06 SVM
No ratings yet
Lec06 SVM
25 pages
By: Moataz Al-Haj: Vision Topics - Seminar (University of Haifa)
No ratings yet
By: Moataz Al-Haj: Vision Topics - Seminar (University of Haifa)
69 pages
Support Vector Machines For Classification and Regression
No ratings yet
Support Vector Machines For Classification and Regression
8 pages
Support Vector Machine
No ratings yet
Support Vector Machine
19 pages
Chapter 5 - Support Vector Machine: Prepared By: Shier Nee, SAW
No ratings yet
Chapter 5 - Support Vector Machine: Prepared By: Shier Nee, SAW
44 pages
27-Module 4 - Support Vector Machine and Naïve Bayes-20-09-2024
No ratings yet
27-Module 4 - Support Vector Machine and Naïve Bayes-20-09-2024
31 pages
Support Vector Machine
No ratings yet
Support Vector Machine
55 pages
SVM
No ratings yet
SVM
36 pages
UNIT - 2
No ratings yet
UNIT - 2
15 pages
SVM Tutorial
No ratings yet
SVM Tutorial
28 pages
05_SVM
No ratings yet
05_SVM
35 pages
svm
No ratings yet
svm
33 pages
SVM-CDing2024 11 15
No ratings yet
SVM-CDing2024 11 15
54 pages
Support Vector Machines (II) : CMSC 422
No ratings yet
Support Vector Machines (II) : CMSC 422
26 pages
12_Bài toán phân lớp_SVM_v2
No ratings yet
12_Bài toán phân lớp_SVM_v2
138 pages
SVM
No ratings yet
SVM
44 pages
An Introduction Of: Support Vector Machine
No ratings yet
An Introduction Of: Support Vector Machine
36 pages
UNIT - 2-1
No ratings yet
UNIT - 2-1
7 pages
Support Vector Machine
No ratings yet
Support Vector Machine
35 pages
Lecture 18 - SVM
No ratings yet
Lecture 18 - SVM
54 pages
svm
No ratings yet
svm
36 pages
1632118884_ML-TCS-Lecture-15 (1)
No ratings yet
1632118884_ML-TCS-Lecture-15 (1)
46 pages
Support Vector Machines: Xiaojin Zhu
No ratings yet
Support Vector Machines: Xiaojin Zhu
41 pages
Machine Learning
No ratings yet
Machine Learning
45 pages
L5_SVMs
No ratings yet
L5_SVMs
37 pages
10_SVM (1)
No ratings yet
10_SVM (1)
77 pages
MergedPDF Iml
No ratings yet
MergedPDF Iml
114 pages
SVM Explained PDF
No ratings yet
SVM Explained PDF
19 pages
(Optimization) SVMs
No ratings yet
(Optimization) SVMs
19 pages
Support Vector Machine (SVM)
No ratings yet
Support Vector Machine (SVM)
4 pages
Support vector machine
No ratings yet
Support vector machine
49 pages
SVM notes unit 4.docx
No ratings yet
SVM notes unit 4.docx
8 pages
Support Vector Machines: Vibhav Gogate The University of Texas at Dallas
No ratings yet
Support Vector Machines: Vibhav Gogate The University of Texas at Dallas
36 pages
Introduction To: Support Vector Machines
No ratings yet
Introduction To: Support Vector Machines
53 pages
Machine Learning - SVM
No ratings yet
Machine Learning - SVM
11 pages
ASU Assignment2 Sol
No ratings yet
ASU Assignment2 Sol
8 pages
SVM Problems1
No ratings yet
SVM Problems1
5 pages
Lecture 2.7
No ratings yet
Lecture 2.7
18 pages
10 SVM
No ratings yet
10 SVM
23 pages
Time Series Forecasting by Using Wavelet Kernel SVM
No ratings yet
Time Series Forecasting by Using Wavelet Kernel SVM
52 pages
DataMining_Chapter5
No ratings yet
DataMining_Chapter5
9 pages
Lecture Notes SVM
No ratings yet
Lecture Notes SVM
4 pages
Lecture Notes SVM
No ratings yet
Lecture Notes SVM
4 pages
First Periodical Test in G8 Mathematics SY 2016 17
No ratings yet
First Periodical Test in G8 Mathematics SY 2016 17
2 pages
An Amazing Functional Equation
No ratings yet
An Amazing Functional Equation
3 pages
This Chapter Is Focused On - A
No ratings yet
This Chapter Is Focused On - A
8 pages
2D Translation in Computer Graphics
No ratings yet
2D Translation in Computer Graphics
31 pages
MATHS EDEXCEL GUIDE 1.0
No ratings yet
MATHS EDEXCEL GUIDE 1.0
11 pages
Determinants and Diagonalization - Linear Algebra
No ratings yet
Determinants and Diagonalization - Linear Algebra
44 pages
Tutorial-05 (Answers)
No ratings yet
Tutorial-05 (Answers)
9 pages
Price Elasticity Derivation
No ratings yet
Price Elasticity Derivation
5 pages
Addmath Form 5 Pap1 2013
No ratings yet
Addmath Form 5 Pap1 2013
2 pages
Exercise 1
No ratings yet
Exercise 1
4 pages
Lecature Real15
No ratings yet
Lecature Real15
3 pages
3rd Sem
No ratings yet
3rd Sem
1 page
Q1 PDF
No ratings yet
Q1 PDF
252 pages
DG10 Sols
No ratings yet
DG10 Sols
4 pages
Math Tricks: Multiply Up To 19x19 in You Head
No ratings yet
Math Tricks: Multiply Up To 19x19 in You Head
2 pages
Section P.1 Section P.2 Section P.3 Section P.4
No ratings yet
Section P.1 Section P.2 Section P.3 Section P.4
22 pages
Tiruvannamalai: 11 TH Mathematics
No ratings yet
Tiruvannamalai: 11 TH Mathematics
65 pages
Finite Element Method: 3.0 Fe Theory: General Continuum
No ratings yet
Finite Element Method: 3.0 Fe Theory: General Continuum
30 pages
Further Pure Mathematics FP1: Pearson Edexcel GCE
No ratings yet
Further Pure Mathematics FP1: Pearson Edexcel GCE
32 pages
Second Course in Linear Algebra
100% (1)
Second Course in Linear Algebra
36 pages
50 Marks Speed Test For 12th Maths
No ratings yet
50 Marks Speed Test For 12th Maths
3 pages
Mathematics For Ntse Questions - MM
No ratings yet
Mathematics For Ntse Questions - MM
127 pages
Exact Differential Equation
100% (2)
Exact Differential Equation
16 pages
22 Variation PDF
No ratings yet
22 Variation PDF
3 pages
Arithmetic Progression Sample Paper 3
No ratings yet
Arithmetic Progression Sample Paper 3
9 pages
F6_1 (1)
No ratings yet
F6_1 (1)
155 pages
Generating-Sequences 1
No ratings yet
Generating-Sequences 1
35 pages
The Properties of Determinants:: 11 22 II NN
No ratings yet
The Properties of Determinants:: 11 22 II NN
15 pages
Maths Presentation Group 4 (Deg1b)
No ratings yet
Maths Presentation Group 4 (Deg1b)
2 pages

Machine Learning-Kernel Methods

Uploaded by

Machine Learning-Kernel Methods

Uploaded by

Machine learning: lecture 7 Topics

Tommi S. Jaakkola • Support vector machines

Tommi Jaakkola, MIT CSAIL 2

Support vector machine (SVM) SVM: separable case

Tommi Jaakkola, MIT CSAIL 3 Tommi Jaakkola, MIT CSAIL 4

SVM: non-separable case SVM: non-separable case cont’d

for i = 1, . . . , n. Here ξi ≥ 0 are

Tommi Jaakkola, MIT CSAIL 5 Tommi Jaakkola, MIT CSAIL 6

where (z) = z if z ≥ 0 and zero otherwise (i.e., returns the

Tommi Jaakkola, MIT CSAIL 7 Tommi Jaakkola, MIT CSAIL 8

SVM vs logistic regression cont’d SVM: solution, Lagrange multipliers

Loss(z) = (1 − z)+ 3.5

• Regularized logistic reg: 2

Loss(z) = log(1 + exp(−z)) 1

Tommi Jaakkola, MIT CSAIL 9 Tommi Jaakkola, MIT CSAIL 10

SVM: solution, Lagrange multipliers SVM: solution, Lagrange multipliers

min "w1"2/2 subject to min "w1"2/2 subject to

and rewrite the minimization problem in terms of these

Tommi Jaakkola, MIT CSAIL 11 Tommi Jaakkola, MIT CSAIL 12

Tommi Jaakkola, MIT CSAIL 13 Tommi Jaakkola, MIT CSAIL 14

SVM solution cont’d SVM solution cont’d

Tommi Jaakkola, MIT CSAIL 15 Tommi Jaakkola, MIT CSAIL 16

SVM solution: summary SVM solution: summary

Tommi Jaakkola, MIT CSAIL 17 Tommi Jaakkola, MIT CSAIL 18

Tommi Jaakkola, MIT CSAIL 19 Tommi Jaakkola, MIT CSAIL 20

Non-linear classifier Non-linear classifier

can easily obtain a non-linear classifier by first mapping our x

examples x = [x1 x2] into longer feature vectors φ(x)

φ(x) = [x21 x22 2x1x2 2x1 2x2 1]

Non-linear separator in the original x-space

Tommi Jaakkola, MIT CSAIL 21 Tommi Jaakkola, MIT CSAIL 22

Feature mapping and kernels Examples of kernel functions

Tommi Jaakkola, MIT CSAIL 23 Tommi Jaakkola, MIT CSAIL 24

linear 2nd order polynomial

4th order polynomial 8th order polynomial

Tommi Jaakkola, MIT CSAIL 25

You might also like