Slide 9 - SVM

1. Support Vector Machines (SVMs) are an alternative to logistic regression that find a decision boundary with the largest minimum distance to the nearest data points of any class.
2. SVMs can learn non-linear decision boundaries using kernel methods, which implicitly map inputs to high-dimensional feature spaces.
3. When using an SVM, users must select parameters such as the cost parameter C and the kernel function; software packages can then solve for the optimal parameters.


Support Vector Machines
Optimization objective
Machine Learning (Andrew Ng)
Alternative view of logistic regression

Hypothesis: $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$

If $y = 1$, we want $h_\theta(x) \approx 1$, i.e. $\theta^T x \gg 0$.
If $y = 0$, we want $h_\theta(x) \approx 0$, i.e. $\theta^T x \ll 0$.
Alternative view of logistic regression

Cost of example: $-\left( y \log h_\theta(x) + (1 - y) \log(1 - h_\theta(x)) \right)$

If $y = 1$ (want $\theta^T x \gg 0$): the cost curve $-\log \frac{1}{1 + e^{-\theta^T x}}$ is approximated by a piecewise-linear function $\text{cost}_1(\theta^T x)$, flat at zero for large $\theta^T x$.
If $y = 0$ (want $\theta^T x \ll 0$): the cost curve $-\log\left( 1 - \frac{1}{1 + e^{-\theta^T x}} \right)$ is approximated by $\text{cost}_0(\theta^T x)$, flat at zero for very negative $\theta^T x$.
We simply drop the $1/m$ terms; since $1/m$ is a positive constant, scaling the objective by it does not change the minimizer, so we end up with the same optimal value of $\theta$.

Support vector machine

Logistic regression:
$$\min_\theta \; \frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \left( -\log h_\theta(x^{(i)}) \right) + (1 - y^{(i)}) \left( -\log\left( 1 - h_\theta(x^{(i)}) \right) \right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$

Support vector machine:
$$\min_\theta \; C \sum_{i=1}^{m} \left[ y^{(i)} \, \text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \text{cost}_0(\theta^T x^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$$

Compared with logistic regression, the $1/m$ factor is dropped, the log costs are replaced by $\text{cost}_1$ and $\text{cost}_0$, and the trade-off is reparameterized: the constant $C$ in front of the data term plays the role of $1/\lambda$.
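As a concrete illustration, here is a minimal Octave sketch of this objective (not from the slides; the function names and the hinge form of cost1/cost0 are my assumptions, consistent with the plots in the lecture):

cost1 = @(z) max(0, 1 - z);   % zero once z >= 1, linear below
cost0 = @(z) max(0, 1 + z);   % zero once z <= -1, linear above

% X: m x n matrix of examples (first column all ones for the intercept),
% y: m x 1 labels in {0,1}, theta: n x 1, C: cost/regularization trade-off.
svm_objective = @(theta, X, y, C) ...
    C * sum(y .* cost1(X * theta) + (1 - y) .* cost0(X * theta)) ...
    + 0.5 * sum(theta(2:end) .^ 2);   % the intercept theta(1) is not regularized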
Support Vector Machines
Large Margin Intuition
Machine Learning
Support Vector Machine

[Plots of $\text{cost}_1(z)$ and $\text{cost}_0(z)$ against $z = \theta^T x$, with axis ticks at $-1$ and $1$: $\text{cost}_1(z) = 0$ for $z \ge 1$; $\text{cost}_0(z) = 0$ for $z \le -1$.]

If $y = 1$, we want $\theta^T x \ge 1$ (not just $\ge 0$).
If $y = 0$, we want $\theta^T x \le -1$ (not just $< 0$).
If $C$ is very large, the optimizer is strongly motivated to choose $\theta$ so that the summed cost term is exactly zero, which requires these margin conditions to hold for every training example.
SVM Decision Boundary

$$\min_\theta \; C \sum_{i=1}^{m} \left[ y^{(i)} \, \text{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)}) \, \text{cost}_0(\theta^T x^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$$

Whenever $y^{(i)} = 1$: we need $\theta^T x^{(i)} \ge 1$ (so that $\text{cost}_1 = 0$).
Whenever $y^{(i)} = 0$: we need $\theta^T x^{(i)} \le -1$ (so that $\text{cost}_0 = 0$).

With $C$ large, the problem therefore reduces to $\min_\theta \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$ subject to those constraints, which is what produces the large-margin boundary.
This is why the SVM is called a large margin classifier.
Large margin classifier in presence of outliers

[Figure: training data in the $(x_1, x_2)$ plane with one outlier. With $C$ very large, the decision boundary swings drastically to accommodate the outlier; with $C$ not too large, the SVM keeps the better, large-margin decision boundary.]
Support Vector Machines
Kernels I
Machine Learning
Non-linear Decision Boundary

[Figure: positive and negative examples in the $(x_1, x_2)$ plane, separable only by a non-linear boundary.]

One option is high-order polynomial features: predict $y = 1$ if $\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1 x_2 + \theta_4 x_1^2 + \theta_5 x_2^2 + \cdots \ge 0$, i.e. $\theta_0 + \theta_1 f_1 + \theta_2 f_2 + \theta_3 f_3 + \cdots$ with $f_1 = x_1$, $f_2 = x_2$, $f_3 = x_1 x_2, \ldots$

Is there a different / better choice of the features $f_1, f_2, f_3, \ldots$?
Kernel

Given $x$, compute new features depending on proximity to landmarks $l^{(1)}, l^{(2)}, l^{(3)}$:

$$f_i = \text{similarity}(x, l^{(i)}) = \exp\left( -\frac{\lVert x - l^{(i)} \rVert^2}{2\sigma^2} \right)$$

[Figure: three landmarks $l^{(1)}, l^{(2)}, l^{(3)}$ marked in the $(x_1, x_2)$ plane.]

If $x$ is close to a landmark, the corresponding feature is near 1; if it is far away, the feature is near 0. This similarity function is the Gaussian kernel.
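A minimal Octave sketch of this feature construction (the function name and calling convention are mine, not from the slides):

function f = landmark_features(x, L, sigma)
  % f(i) = Gaussian similarity between column vector x and the i-th
  % landmark, stored as the i-th row of L.
  m = size(L, 1);
  f = zeros(m, 1);
  for i = 1:m
    d = x(:) - L(i, :)';                   % difference to landmark i
    f(i) = exp(-(d' * d) / (2 * sigma^2)); % Gaussian similarity
  end
end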
Support Vector Machines
Kernels II
Machine Learning
Choosing the landmarks

Given $x$:
$f_1 = \text{similarity}(x, l^{(1)})$, $f_2 = \text{similarity}(x, l^{(2)})$, $f_3 = \text{similarity}(x, l^{(3)})$

[Figure: a non-linear decision region in the $(x_1, x_2)$ plane induced by the three landmarks.]

Predict $y = 1$ if $\theta_0 + \theta_1 f_1 + \theta_2 f_2 + \theta_3 f_3 \ge 0$ (a prediction sketch follows below).

Where to get $l^{(1)}, l^{(2)}, l^{(3)}, \ldots$?
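A tiny Octave sketch of that prediction rule (the name is mine; f is the 3-vector of similarities computed above):

% theta = [theta0; theta1; theta2; theta3]; prepend the constant 1 so the
% intercept theta0 participates in theta' * [1; f].
predict_y = @(theta, f) double(theta' * [1; f] >= 0);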
SVM with Kernels

Given $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})$,
choose $l^{(1)} = x^{(1)}, \; l^{(2)} = x^{(2)}, \ldots, l^{(m)} = x^{(m)}$.

Given example $x$:
$f_1 = \text{similarity}(x, l^{(1)})$, $f_2 = \text{similarity}(x, l^{(2)})$, $\ldots$, collected as $f \in \mathbb{R}^{m+1}$ with $f_0 = 1$.

For training example $(x^{(i)}, y^{(i)})$: compute $f^{(i)} = \left[ f_0^{(i)}; f_1^{(i)}; \ldots; f_m^{(i)} \right]$, where $f_j^{(i)} = \text{similarity}(x^{(i)}, l^{(j)})$ and $f_0^{(i)} = 1$. (Note $f_i^{(i)} = 1$, since each example is its own landmark.)
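Continuing the sketch (and assuming the landmark_features helper above), the whole training set can be mapped to kernel features like this:

function F = kernel_features(X, sigma)
  % X: m x n matrix with one training example per row. Using the training
  % examples themselves as landmarks gives F: m x (m+1), one f^(i) per row.
  m = size(X, 1);
  F = zeros(m, m + 1);
  for i = 1:m
    F(i, 1) = 1;                                          % f0 = 1
    F(i, 2:end) = landmark_features(X(i, :)', X, sigma)';
  end
end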
SVM with Kernels

Hypothesis: given $x$, compute features $f \in \mathbb{R}^{m+1}$. Predict "$y = 1$" if $\theta^T f \ge 0$.

Training:
$$\min_\theta \; C \sum_{i=1}^{m} \left[ y^{(i)} \, \text{cost}_1(\theta^T f^{(i)}) + (1 - y^{(i)}) \, \text{cost}_0(\theta^T f^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{m} \theta_j^2$$

Solve this minimization problem to get the parameters $\theta$ of your SVM. In practice, use off-the-shelf software packages that people have developed to minimize this cost function; those packages already embody the necessary numerical optimization tricks.
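For intuition only, here is a toy Octave subgradient-descent sketch of that training problem (my own illustration; real SVM packages use far better specialized solvers):

function theta = train_svm_subgrad(F, y, C, alpha, iters)
  % F: m x (m+1) kernel-feature matrix, y: m x 1 labels in {0,1},
  % alpha: step size, iters: number of subgradient steps.
  theta = zeros(size(F, 2), 1);
  for t = 1:iters
    z = F * theta;                      % margins theta' * f^(i)
    g1 = -double(z < 1);                % subgradient of cost1(z) = max(0, 1-z)
    g0 = double(z > -1);                % subgradient of cost0(z) = max(0, 1+z)
    grad = C * (F' * (y .* g1 + (1 - y) .* g0));
    grad(2:end) = grad(2:end) + theta(2:end);  % regularizer, skipping theta0
    theta = theta - alpha * grad;
  end
end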
SVM parameters:

$C$ (plays the role of $1/\lambda$).
  Large $C$: lower bias, high variance.
  Small $C$: higher bias, low variance.

$\sigma^2$.
  Large $\sigma^2$: features $f_i$ vary more smoothly. Higher bias, lower variance.
  Small $\sigma^2$: features $f_i$ vary less smoothly. Lower bias, higher variance.
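To see the smoothness effect of $\sigma^2$ concretely, a short Octave illustration (the values are arbitrary):

d = 0:0.5:3;                               % distances ||x - l|| from a landmark
f_small_sigma = exp(-d.^2 / (2 * 0.5^2));  % drops off sharply: wiggly features
f_large_sigma = exp(-d.^2 / (2 * 3.0^2));  % decays slowly: smooth features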
Support Vector Machines
Using an SVM
Machine Learning
Use an SVM software package (e.g. liblinear, libsvm, …) to solve for parameters $\theta$.

Need to specify:
Choice of parameter $C$.
Choice of kernel (similarity function):
  E.g. no kernel ("linear kernel"): predict "$y = 1$" if $\theta^T x \ge 0$.
  Gaussian kernel: $f_i = \exp\left( -\frac{\lVert x - l^{(i)} \rVert^2}{2\sigma^2} \right)$, where $l^{(i)} = x^{(i)}$. Need to choose $\sigma^2$.
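If the libsvm Octave/MATLAB bindings happen to be installed, invoking such a package looks roughly like this (a sketch based on libsvm's documented interface; the option values are illustrative, and note that libsvm writes its RBF kernel as $\exp(-\gamma \lVert u - v \rVert^2)$, so $\gamma = 1/(2\sigma^2)$):

% Train with the RBF (Gaussian) kernel: -t 2, cost C = 1, gamma = 0.5.
model = svmtrain(y_train, X_train, '-t 2 -c 1 -g 0.5');
% Predict on held-out data.
predictions = svmpredict(y_test, X_test, model);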
Kernel (similarity) functions:

function f = kernel(x1, x2)
  % Gaussian kernel: similarity between feature vectors x1 and x2.
  % sigma would normally be set elsewhere; hard-coded here for illustration.
  sigma = 1.0;
  f = exp(-sum((x1 - x2) .^ 2) / (2 * sigma^2));
end

Note: do perform feature scaling before using the Gaussian kernel.
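A quick usage check of this function (with illustrative values):

x1 = [1; 2];
x2 = [1.5; 1.8];
s = kernel(x1, x2)   % near 1 for nearby points, near 0 for distant ones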
Other choices of kernel

Note: not all similarity functions $\text{similarity}(x, l)$ make valid kernels. (They need to satisfy a technical condition called Mercer's theorem to make sure SVM packages' optimizations run correctly and do not diverge.)

Many off-the-shelf kernels are available:
Polynomial kernel: $k(x, l) = (x^T l + \text{constant})^{\text{degree}}$ (see the sketch below).
More esoteric: string kernel, chi-square kernel, histogram intersection kernel, …
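A one-line Octave sketch of the polynomial kernel (the name and example values are mine):

% k(x, l) = (x' * l + c)^d, e.g. c = 1, d = 3.
poly_kernel = @(x, l, c, d) (x' * l + c) ^ d;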
Multi-class classification

Many SVM packages already have built-in multi-class classification functionality.

Otherwise, use the one-vs.-all method: train $K$ SVMs, one to distinguish $y = i$ from the rest, for $i = 1, 2, \ldots, K$; get $\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(K)}$. Pick the class $i$ with the largest $(\theta^{(i)})^T x$. (A sketch follows below.)
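A one-vs.-all Octave sketch, reusing the toy helpers defined earlier (kernel_features, train_svm_subgrad, landmark_features); it assumes X, y (labels 1..K), sigma, C, alpha, iters, and a new example x are already defined, and a real system would call an SVM package instead:

F = kernel_features(X, sigma);
K = max(y);                            % classes labeled 1..K
Theta = zeros(size(F, 2), K);
for k = 1:K
  % Binary problem: class k (label 1) vs. all other classes (label 0).
  Theta(:, k) = train_svm_subgrad(F, double(y == k), C, alpha, iters);
end

% Prediction: kernel features of x against the training set, then pick the
% class with the largest decision value (theta^(k))' * f.
f_new = [1; landmark_features(x, X, sigma)];
[~, prediction] = max(Theta' * f_new);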
Logistic regression vs. SVMs

Let $n$ = number of features ($x \in \mathbb{R}^{n+1}$) and $m$ = number of training examples.

If $n$ is large (relative to $m$): use logistic regression, or SVM without a kernel ("linear kernel").
If $n$ is small and $m$ is intermediate: use SVM with a Gaussian kernel.
If $n$ is small and $m$ is large: create/add more features, then use logistic regression or SVM without a kernel.

A neural network is likely to work well for most of these settings, but may be slower to train.
