Lecture 11

Kernels (SVMs, Logistic Regression)

Aarti Singh
Machine Learning 10-315
Oct 2, 2019
Constrained optimization – dual problem

Primal problem:                (b +ve)

Moving the constraint to the objective function
Lagrangian:

Dual problem:

If strong duality holds, then d* = p*,
and x*, α* satisfy the KKT conditions, including
α*(x* − b) = 0
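The equations on this slide are images. A sketch of the standard scalar example consistent with the surrounding text ("b +ve", the complementary slackness condition α*(x* − b) = 0) is:

```latex
% Assumed scalar example (the slide's own equations are not in the text):
\text{Primal:}\quad p^* = \min_x\; x^2 \quad\text{s.t.}\quad x \ge b,\;\; b > 0
% Lagrangian (constraint moved into the objective, multiplier alpha >= 0):
L(x,\alpha) = x^2 - \alpha\,(x - b)
% Dual:
d^* = \max_{\alpha \ge 0}\;\min_x\; L(x,\alpha)
% Strong duality: d^* = p^*, and (x^*, \alpha^*) satisfy the KKT conditions, including
\alpha^*\,(x^* - b) = 0
```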
Dual SVM – linearly separable case

n training points, d features (x1, …, xn) where xi is a d-dimensional vector

• Primal problem:

  w – weights on features (d-dim problem)

• Dual problem (derivation):

  α – weights on training pts (n-dim problem)
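The primal and dual on this slide are images; for reference, a sketch of the standard hard-margin formulation they correspond to:

```latex
% Hard-margin SVM, n points (x_j, y_j), y_j in {-1, +1}
% Primal (d-dimensional, in w):
\min_{w,b}\ \tfrac{1}{2}\,\|w\|^2
\quad\text{s.t.}\quad y_j\,(w\cdot x_j + b) \ge 1,\;\; j = 1,\dots,n

% Dual (n-dimensional, in alpha):
\max_{\alpha \ge 0}\ \sum_j \alpha_j \;-\; \tfrac{1}{2}\sum_{j,k}\alpha_j\alpha_k\, y_j y_k\,(x_j\cdot x_k)
\quad\text{s.t.}\quad \sum_j \alpha_j y_j = 0,
\qquad\text{with}\quad w = \sum_j \alpha_j y_j\, x_j
```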
Dual SVM – linearly separable case

Dual problem is also QP
Solution gives the αj's

Use support vectors with αk > 0 to
compute b, since the constraint is tight:
(w·xk + b) yk = 1
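A minimal sketch (not from the lecture) of solving this dual QP numerically, assuming the cvxopt package and a small linearly separable dataset:

```python
import numpy as np
from cvxopt import matrix, solvers

# Hard-margin dual: max sum(alpha) - 1/2 alpha' Q alpha, s.t. alpha >= 0, y' alpha = 0
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

Q = (y[:, None] * X) @ (y[:, None] * X).T      # Q_jk = y_j y_k (x_j . x_k)
P = matrix(Q)
q = matrix(-np.ones(n))                        # cvxopt minimizes 1/2 a'Pa + q'a
G = matrix(-np.eye(n))                         # -alpha_j <= 0, i.e. alpha_j >= 0
h = matrix(np.zeros(n))
A = matrix(y.reshape(1, -1))                   # sum_j alpha_j y_j = 0
b = matrix(0.0)

solvers.options["show_progress"] = False
alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])

w = ((alpha * y)[:, None] * X).sum(axis=0)     # w = sum_j alpha_j y_j x_j
sv = alpha > 1e-6                              # support vectors: alpha_k > 0
b_val = np.mean(y[sv] - X[sv] @ w)             # from tight constraints (w.x_k + b) y_k = 1
print(w, b_val, alpha.round(3))
```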
Dual SVM – non-separable case

• Primal problem (minimize over w, b, {ξj}):

• Dual problem (Lagrange multipliers α, µ):
  L(w, b, ξ, α, µ)
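The primal and Lagrangian on this slide are images; a sketch of the standard soft-margin form they refer to:

```latex
% Soft-margin primal with slack variables xi_j:
\min_{w,b,\{\xi_j\}}\ \tfrac{1}{2}\|w\|^2 + C\sum_j \xi_j
\quad\text{s.t.}\quad y_j\,(w\cdot x_j + b) \ge 1 - \xi_j,\;\; \xi_j \ge 0

% Lagrangian with multipliers alpha_j, mu_j >= 0:
L(w,b,\xi,\alpha,\mu) = \tfrac{1}{2}\|w\|^2 + C\sum_j \xi_j
  - \sum_j \alpha_j\,[\,y_j(w\cdot x_j + b) - 1 + \xi_j\,] - \sum_j \mu_j\,\xi_j
```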
Dual SVM – non-separable case

The upper bound on αj comes from ∂L/∂ξ = 0
Intuition: if C → ∞, we recover the hard-margin SVM

Dual problem is also QP
Solution gives the αj's
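The resulting dual (again a sketch of the standard form, since the slide's equations are images):

```latex
\max_{\alpha}\ \sum_j \alpha_j - \tfrac{1}{2}\sum_{j,k}\alpha_j\alpha_k\, y_j y_k\,(x_j\cdot x_k)
\quad\text{s.t.}\quad \sum_j \alpha_j y_j = 0,\qquad 0 \le \alpha_j \le C
% \partial L/\partial\xi_j = 0 gives C - \alpha_j - \mu_j = 0; with \mu_j \ge 0 this means \alpha_j \le C.
% As C -> infinity the box constraint vanishes and the hard-margin dual is recovered.
```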
So why solve the dual SVM?
• There are some quadratic programming algorithms that can solve the dual faster than the primal (especially in high dimensions, d >> n)

• But, more importantly, the “kernel trick”!!!
Separable using higher-order features

[Figure: data that is not linearly separable in (x1, x2) becomes separable using higher-order features such as r = √(x1² + x2²) or x1².]
What if data is not linearly separable?
Use features of features
of features of features….

Φ(x) = (x1², x2², x1x2, …., exp(x1))

Feature space becomes really large very quickly!
Higher Order Polynomials
m – input features, d – degree of polynomial

The number of terms grows fast!
d = 6, m = 100:
about 1.6 billion terms
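A quick check of that count (not from the slides), assuming the slide counts the number of distinct monomials of exactly degree d in m variables, C(m + d − 1, d):

```python
from math import comb

# Number of degree-d monomials in m variables: C(m + d - 1, d)
m, d = 100, 6
print(comb(m + d - 1, d))   # 1609344100, i.e. about 1.6 billion terms
```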
Dual formulation only depends on
dot products, not on w!

Φ(x) – high-dimensional feature space, but never need it explicitly as long
as we can compute the dot product fast using some kernel K
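The kernelized dual this refers to (a sketch of the standard form; the slide's equation is an image):

```latex
\max_{\alpha}\ \sum_j \alpha_j - \tfrac{1}{2}\sum_{j,k}\alpha_j\alpha_k\, y_j y_k\, K(x_j, x_k),
\qquad K(x_j, x_k) = \Phi(x_j)\cdot\Phi(x_k)
```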
Dot Product of Polynomials

d = 1

d = 2

general d
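A small numerical check (not from the slides) that the degree-2 polynomial kernel equals an explicit feature-map dot product, assuming the feature map Φ(x) = (x1², √2·x1x2, x2²):

```python
import numpy as np

def phi2(x):
    # Explicit degree-2 monomial feature map for a 2-d input
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([1.0, 3.0])
z = np.array([2.0, -1.0])

print(np.dot(phi2(x), phi2(z)))   # explicit feature-space dot product
print(np.dot(x, z) ** 2)          # kernel (x . z)^2 -- same value, no feature map needed
```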
Finally: The Kernel Trick!

• Never represent features explicitly
  – Compute dot products in closed form

• Constant-time high-dimensional dot products for many classes of features
Common Kernels
• Polynomials of degree d

• Polynomials of degree up to d

• Gaussian/Radial kernels (polynomials of all orders – recall the series expansion of exp)

• Sigmoid
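The formulas on this slide are images; the usual forms of these kernels are:

```latex
K(x,z) = (x\cdot z)^d                                  % polynomial of degree d
K(x,z) = (x\cdot z + 1)^d                              % polynomial of degree up to d
K(x,z) = \exp\!\big(-\|x - z\|^2 / (2\sigma^2)\big)    % Gaussian / RBF
K(x,z) = \tanh(\eta\, x\cdot z + \nu)                  % sigmoid
```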
Mercer Kernels
What functions are valid kernels that correspond to feature vectors Φ(x)?

Answer: Mercer kernels K
• K is continuous
• K is symmetric
• K is positive semi-definite, i.e. xᵀKx ≥ 0 for all x
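A small numerical sanity check (not from the slides) of the positive semi-definite property: the Gram matrix of an RBF kernel, with a hypothetical bandwidth σ = 1.0, has no negative eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

sigma = 1.0
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / (2 * sigma**2))   # Gram matrix K[i, j] = K(x_i, x_j)

eigvals = np.linalg.eigvalsh(K)          # symmetric matrix -> real eigenvalues
print(eigvals.min() >= -1e-10)           # True: numerically all eigenvalues >= 0
```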
Overfitting
• Huge feature space with kernels – what about overfitting???
  – Maximizing margin leads to a sparse set of support vectors
  – Some interesting theory says that SVMs search for simple hypotheses with large margin
  – Often robust to overfitting
What about classification time?
• For a new input x, if we need to represent Φ(x), we are in trouble!
• Recall the classifier: sign(w·Φ(x) + b)

• Using kernels we are cool!
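Why kernels suffice at test time (the standard identity behind this slide; its equation is an image):

```latex
w = \sum_i \alpha_i\, y_i\, \Phi(x_i)
\;\;\Rightarrow\;\;
\operatorname{sign}\big(w\cdot\Phi(x) + b\big)
  = \operatorname{sign}\Big(\sum_i \alpha_i\, y_i\, K(x_i, x) + b\Big)
```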
SVMs with Kernels
• Choose a set of features and a kernel function
• Solve the dual problem to obtain the support vectors αi
• At classification time, compute the kernel expansion and classify by its sign
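A minimal sketch of this recipe using scikit-learn (not part of the lecture); the data, kernel, and parameters are illustrative:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)   # not linearly separable

clf = SVC(kernel="rbf", C=1.0, gamma=1.0)   # Gaussian/RBF kernel, solved in the dual
clf.fit(X, y)

print(clf.n_support_)                            # number of support vectors per class
print(clf.predict([[0.1, 0.2], [2.0, 2.0]]))     # sign of the kernel expansion
```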
SVMs with Kernels
• Iris dataset, 2 vs 1,3, Linear Kernel

SVMs with Kernels
• Iris dataset, 1 vs 2,3, Polynomial Kernel of degree 2

SVMs with Kernels
• Iris dataset, 1 vs 2,3, Gaussian RBF kernel

SVMs with Kernels
• Iris dataset, 1 vs 2,3, Gaussian RBF kernel

SVMs with Kernels
• Chessboard dataset, Gaussian RBF kernel

SVMs with Kernels
• Chessboard dataset, Polynomial kernel
Corel Dataset

Corel Dataset
Olivier Chapelle, 1998

USPS Handwritten digits
SVMs vs. Logistic Regression

                    SVMs          Logistic Regression
  Loss function     Hinge loss    Log loss
SVMs vs. Logistic Regression
SVM: Hinge loss

Logistic Regression: Log loss (−ve log conditional likelihood)

[Plot: log loss, hinge loss, and 0-1 loss as a function of the margin, shown over the range −1 to 1.]
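The standard forms of these three losses, with f(x) = w·x + b and y ∈ {−1, +1} (the slide's own formulas are images):

```latex
\text{0-1 loss:}\quad \mathbb{1}[\,y\,f(x) \le 0\,]
\qquad
\text{Hinge loss:}\quad \max\big(0,\ 1 - y\,f(x)\big)
\qquad
\text{Log loss:}\quad \log\big(1 + e^{-y\,f(x)}\big)
```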
SVMs vs. Logistic Regression

                                  SVMs          Logistic Regression
  Loss function                   Hinge loss    Log loss
  High-dimensional features
  with kernels                    Yes!          Yes!
  Solution sparse                 Often yes!    Almost always no!
  Semantics of output             “Margin”      Real probabilities
Kernels in Logistic Regression

• Define weights in terms of features:

• Derive simple gradient descent rule on the αi
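A minimal sketch (not from the lecture) of what such a rule looks like, assuming the weights are defined as w = Σi αi Φ(xi), labels are in {−1, +1}, and the kernel, step size, and iteration count are illustrative:

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma**2))

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)      # XOR-like labels, need a kernel

K = rbf_kernel(X, X)                            # Gram matrix
alpha = np.zeros(len(X))                        # one weight per training point
lr = 0.1
for _ in range(500):
    f = K @ alpha                               # f(x_i) = sum_j alpha_j K(x_j, x_i); bias omitted
    grad = K @ (-y * sigmoid(-y * f))           # gradient of sum_i log(1 + exp(-y_i f_i)) w.r.t. alpha
    alpha -= lr * grad / len(X)

f_train = K @ alpha
print((np.sign(f_train) == y).mean())           # training accuracy of the sketch
```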
What you need to know
• Maximizing margin
• Derivation of the SVM formulation
• Slack variables and hinge loss
• Relationship between SVMs and logistic regression
  – 0/1 loss
  – Hinge loss
  – Log loss
• Tackling multiple classes
  – One against All
  – Multiclass SVMs
• Dual SVM formulation
  – Easier to solve when dimension is high (d >> n)
  – Kernel Trick
