
Introduction to Support Vector Machines

Thanks:
Andrew Moore (CMU)
and
Martin Law (Michigan State University)

Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.
History of SVM

 SVM is related to statistical learning theory [3].
 SVM was first introduced in 1992 [1].
 SVM became popular because of its success in handwritten digit recognition:
   1.1% test error rate for SVM, the same as the error rate of a carefully constructed neural network, LeNet 4.
   See Section 5.11 in [2] or the discussion in [3] for details.
 SVM is now regarded as an important example of "kernel methods", one of the key areas in machine learning.
 Note: the meaning of "kernel" here is different from the "kernel" function used in Parzen windows.

[1] B. E. Boser et al. A Training Algorithm for Optimal Margin Classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144-152, Pittsburgh, 1992.
[2] L. Bottou et al. Comparison of Classifier Methods: A Case Study in Handwritten Digit Recognition. Proceedings of the 12th IAPR International Conference on Pattern Recognition, vol. 2, pp. 77-82, 1994.
[3] V. Vapnik. The Nature of Statistical Learning Theory. 2nd edition, Springer, 1999.
Linear Classifiers

Estimation: x → f → y_est, with f(x, w, b) = sign(w·x + b)

w: weight vector
x: data vector

(Figure: a scatter of datapoints from the two classes, labelled +1 and -1.)

How would you classify this data?
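As a concrete illustration (an addition, not part of the original slides), here is a minimal sketch of this decision rule in Python, assuming the weight vector w and bias b are already known; the function name is hypothetical:

import numpy as np

def linear_classify(X, w, b):
    """Classify rows of X with the linear rule f(x) = sign(w.x + b).

    X: (n_samples, n_features) array of data vectors
    w: (n_features,) weight vector
    b: scalar bias
    Returns an array of +1 / -1 labels.
    """
    scores = X @ w + b          # signed score; its sign decides the class
    return np.where(scores >= 0, 1, -1)

# Example: a toy 2-D dataset and a hand-picked separating line
X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
w = np.array([1.0, 1.0])
b = 0.0
print(linear_classify(X, w, b))   # -> [ 1  1 -1 -1]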
Linear Classifiers

f(x, w, b) = sign(w·x + b)

(Figure: several candidate separating lines drawn through the same two-class data.)

Any of these would be fine...
...but which is best?
Classifier Margin

f(x, w, b) = sign(w·x + b)

Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.

(Figure: a separating line with the margin band drawn around it.)
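To make this definition concrete (an addition, not from the slides), the margin of a given boundary on labelled data (X, y) with y in {+1, -1} can be computed as the smallest distance from any point to the boundary; a minimal NumPy sketch:

import numpy as np

def margin(X, y, w, b):
    """Geometric margin of the boundary w.x + b = 0 on labelled data.

    Returns min_i y_i * (w.x_i + b) / ||w||; this is positive only if
    every point lies on the correct side of the boundary.
    """
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)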
Maximum Margin

f(x, w, b) = sign(w·x + b)

The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM, called a linear SVM (LSVM).

(Figure: the maximum margin separating line for the two-class data.)
Maximum Margin

f(x, w, b) = sign(w·x + b)

The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM, called a linear SVM (LSVM).

Support vectors are those datapoints that the margin pushes up against.

(Figure: the maximum margin line with the support vectors highlighted on the margin boundaries.)
Why Maximum Margin?

f(x, w, b) = sign(w·x + b)

Intuitively, the maximum margin boundary is the safest choice: it is as far as possible from the datapoints of both classes, so small perturbations of the data or of the boundary are least likely to cause misclassifications. Moreover, the boundary is determined only by the support vectors, the datapoints that the margin pushes up against.
How to calculate the distance from a point to a line?

The decision boundary is the line w·x + b = 0, where
  x: data vector
  w: normal vector of the line
  b: offset (a scalar)

 See http://mathworld.wolfram.com/Point-LineDistance2-Dimensional.html
 In our 2-D case the line is w1*x1 + w2*x2 + b = 0, so w = (w1, w2) and x = (x1, x2).
Estimate the Margin

The boundary is the line w·x + b = 0 (x: data vector, w: normal vector, b: offset).

 What is the distance expression for a point x to the line w·x + b = 0?

   d(x) = |x·w + b| / ||w||_2 = |x·w + b| / sqrt(Σ_{i=1..d} w_i²)

where d is the dimension of the input space.
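A small sketch of this distance computation (added here, not in the original slides):

import numpy as np

def point_to_boundary_distance(x, w, b):
    """Distance from point x to the hyperplane w.x + b = 0."""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

print(point_to_boundary_distance(np.array([3.0, 4.0]), np.array([1.0, 1.0]), -2.0))
# |3 + 4 - 2| / sqrt(2) ≈ 3.54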
Large-margin Decision Boundary

 The decision boundary should be as far away from the data of both classes as possible.
 We should maximize the margin, m.
 The distance between the origin and the line w^T x = -b is |b| / ||w||.

(Figure: Class 1 and Class 2 separated by the boundary, with the margin m marked between the two classes.)
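With the canonical scaling used on the next slides, where the closest points satisfy y(w·x + b) = 1, the margin has a closed form (a standard result, added here for completeness):

\[
m \;=\; \frac{2}{\|w\|},
\]

so maximizing the margin is equivalent to minimizing \(\tfrac{1}{2}\|w\|^2\).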
Finding the Decision Boundary

 Let {x1, ..., xn} be our data set and let yi ∈ {1, -1} be the class label of xi.
 The decision boundary should classify all points correctly.
 To see this: when yi = -1 we want w·xi + b ≤ -1, and when yi = 1 we want w·xi + b ≥ 1; combining the two cases, yi(w·xi + b) ≥ 1 for all i. For support vectors, yi(w·xi + b) = 1.
 The decision boundary can be found by solving the constrained optimization problem shown below.
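The constrained problem referred to above is the standard hard-margin primal (the formula image is missing from this copy; this is the usual form):

\[
\min_{w,\,b}\ \tfrac{1}{2}\|w\|^2 \quad\text{subject to}\quad y_i\,(w \cdot x_i + b) \ge 1, \qquad i = 1,\dots,n,
\]

whose solution maximizes the margin m = 2/||w||.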
Next step… (optional)

 Converting the SVM to a form we can solve
   Dual form
 Allowing a few errors
   Soft margin
 Allowing a nonlinear boundary
   Kernel functions
The Dual Problem (we ignore the derivation)

 The new objective function is in terms of the ai only.
 It is known as the dual problem: if we know w, we know all the ai; if we know all the ai, we know w.
 The original problem is known as the primal problem.
 The objective function of the dual problem needs to be maximized!
 The dual problem is stated below. The constraints ai ≥ 0 are the properties of the ai when we introduce the Lagrange multipliers, and the constraint Σi ai yi = 0 is the result of differentiating the original Lagrangian w.r.t. b.
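The dual problem itself (the equation image is missing from this copy; this is the standard form):

\[
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i \;-\; \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,(x_i \cdot x_j)
\quad\text{subject to}\quad \alpha_i \ge 0,\;\; \sum_{i=1}^{n}\alpha_i y_i = 0 .
\]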
The Dual Problem

 This is a quadratic programming (QP) problem.
   A global maximum of the ai can always be found.
 w can be recovered from the ai as shown below.
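The recovery formula referred to above (a standard result; the equation image is missing here):

\[
w \;=\; \sum_{i=1}^{n} \alpha_i\, y_i\, x_i .
\]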
Characteristics of the Solution

 Many of the ai are zero (see the next page for an example).
   w is a linear combination of a small number of data points.
   This "sparse" representation can be viewed as data compression, as in the construction of a k-NN classifier.
 The xi with non-zero ai are called support vectors (SV).
   The decision boundary is determined only by the SVs.
   Let tj (j = 1, ..., s) be the indices of the s support vectors. We can write w = Σj a_tj y_tj x_tj.
 For testing with a new data point z:
   Compute w·z + b = Σj a_tj y_tj (x_tj · z) + b, and classify z as class 1 if the sum is positive, and class 2 otherwise.
   Note: w need not be formed explicitly.
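As an illustration (added, not from the slides), a minimal sketch of this test-time computation; the names sv_X, sv_y, sv_alpha are hypothetical and are assumed to hold the support vectors, their labels, and their multipliers:

import numpy as np

def svm_decision(z, sv_X, sv_y, sv_alpha, b, kernel=np.dot):
    """Evaluate sum_j a_tj * y_tj * K(x_tj, z) + b over the support vectors only."""
    s = sum(a * y * kernel(x, z) for a, y, x in zip(sv_alpha, sv_y, sv_X)) + b
    return 1 if s > 0 else -1   # class 1 if the sum is positive, class 2 otherwise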
A Geometrical Interpretation

(Figure: Class 1 and Class 2 points with the margin boundaries. Most multipliers are zero — a2 = a3 = a4 = a5 = a7 = a9 = a10 = 0 — and only the support vectors on the margin have non-zero multipliers: a1 = 0.8, a6 = 1.4, a8 = 0.6.)
Allowing errors in our solutions

 We allow "errors" ξi in classification; they are based on the output of the discriminant function w^T x + b.
 Σi ξi approximates the number of misclassified samples.

(Figure: Class 1 and Class 2 points, some of which fall inside the margin or on the wrong side of the boundary; these incur non-zero ξi.)
Soft Margin Hyperplane

 If we minimize Σi ξi, the ξi can be computed by
   ξi = max(0, 1 - yi(w^T xi + b))
   The ξi are "slack variables" in the optimization.
   Note that ξi = 0 if there is no error for xi.
   Σi ξi is an upper bound on the number of training errors.
 We want to minimize (1/2)||w||² + C Σi ξi
   C: tradeoff parameter between error and margin.
 The optimization problem becomes the one shown below.
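The resulting optimization problem (the equation image is missing; this is the standard soft-margin formulation):

\[
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i,\;\; \xi_i \ge 0 .
\]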
Extension to Non-linear Decision Boundaries

 So far we have only considered large-margin classifiers with a linear decision boundary.
 How do we generalize this to a nonlinear boundary?
 Key idea: transform xi to a higher-dimensional space to "make life easier".
   Input space: the space where the points xi are located.
   Feature space: the space of the φ(xi) after the transformation.
Transforming the Data (c.f. DHS Ch. 5)

(Figure: points in the input space are mapped by φ(·) to the feature space, where the two classes become linearly separable.)

 Note: in practice the feature space is of higher dimension than the input space.
 Computation in the feature space can be costly because it is high dimensional.
   The feature space is typically infinite-dimensional!
 The kernel trick comes to the rescue.
The Kernel Trick

 Recall the SVM dual optimization problem: the data points only appear inside inner products (see below).
 As long as we can calculate the inner product in the feature space, we do not need the mapping φ(·) explicitly.
 Many common geometric operations (angles, distances) can be expressed by inner products.
 Define the kernel function K by K(xi, xj) = φ(xi)·φ(xj).
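Written out, the dual with the inner products replaced by the kernel (standard form, added because the equation image is missing; with the soft margin the first constraint becomes 0 ≤ αi ≤ C):

\[
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i \;-\; \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j\,y_i y_j\,K(x_i, x_j)
\quad\text{subject to}\quad \alpha_i \ge 0,\;\; \sum_i \alpha_i y_i = 0 .
\]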
An Example for φ(.) and K(.,.)

 Suppose φ(.) is the standard degree-2 map of a 2-D input x = (x1, x2):
   φ(x) = (1, √2 x1, √2 x2, x1², x2², √2 x1 x2)
 An inner product in the feature space is then
   φ(x)·φ(y) = (x·y + 1)²
 So, if we define the kernel function as K(x, y) = (x·y + 1)², there is no need to carry out φ(.) explicitly.
 This use of a kernel function to avoid carrying out φ(.) explicitly is known as the kernel trick.
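A quick numerical check of this identity (added here, not in the slides); phi below is the 6-D construction stated above:

import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D input x = (x1, x2),
    chosen so that phi(x).phi(y) == (x.y + 1)**2."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 * x1, x2 * x2,
                     np.sqrt(2) * x1 * x2])

def poly_kernel(x, y):
    return (np.dot(x, y) + 1.0) ** 2

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.dot(phi(x), phi(y)))   # 4.0
print(poly_kernel(x, y))        # 4.0 -- same value, without forming phi explicitly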
More on Kernel Functions

 Not all similarity measures can be used as kernel functions, however.
 The kernel function needs to satisfy the Mercer condition, i.e., the function must be "positive definite".
 This implies that the n-by-n kernel matrix, in which the (i, j)-th entry is K(xi, xj), is always positive (semi-)definite.
 This also means that the optimization problem can be solved in polynomial time!
Examples of Kernel Functions

 Polynomial kernel with degree d
 Radial basis function (RBF) kernel with width σ
   Closely related to radial basis function neural networks
   The corresponding feature space is infinite-dimensional
 Sigmoid kernel with parameters κ and θ
   It does not satisfy the Mercer condition for all κ and θ

(The standard forms of these kernels are given below.)
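The usual definitions of these kernels (the formula images are not in this copy; sign conventions for the sigmoid's offset θ vary):

\[
K(x, y) = (x \cdot y + 1)^d, \qquad
K(x, y) = \exp\!\left(-\frac{\|x - y\|^2}{2\sigma^2}\right), \qquad
K(x, y) = \tanh(\kappa\, x \cdot y + \theta).
\]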
Non-linear SVMs: Feature spaces

 General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable:

   Φ: x → φ(x)
Example

 Suppose we have 5 one-dimensional data points:
   x1=1, x2=2, x3=4, x4=5, x5=6, with 1, 2, 6 as class 1 and 4, 5 as class 2, i.e. y1=1, y2=1, y3=-1, y4=-1, y5=1
 We use the polynomial kernel of degree 2: K(x, y) = (xy + 1)²
 C is set to 100.
 We first find the ai (i = 1, ..., 5) by solving the dual problem with this kernel.
Example

 By using a QP solver, we get
   a1=0, a2=2.5, a3=0, a4=7.333, a5=4.833
   Note that the constraints are indeed satisfied.
 The support vectors are {x2=2, x4=5, x5=6}.
 The discriminant function is
   f(x) = Σi ai yi (xi·x + 1)² + b = 0.6667 x² - 5.333 x + b
 b is recovered by solving f(2)=1, or f(5)=-1, or f(6)=1, since x2 and x5 lie on the line f(x)=1 and x4 lies on the line f(x)=-1.
 All three give b=9.
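For readers who want to reproduce this example (an addition; the slides use a generic QP solver), scikit-learn's SVC with kernel="poly", degree=2, gamma=1 and coef0=1 computes exactly (gamma·⟨x, y⟩ + coef0)^degree = (xy + 1)²:

import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0], [2.0], [4.0], [5.0], [6.0]])
y = np.array([1, 1, -1, -1, 1])

clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=100.0)
clf.fit(X, y)

print(clf.support_)     # indices of the support vectors (expect x2, x4, x5)
print(clf.dual_coef_)   # alpha_i * y_i for the support vectors
print(clf.intercept_)   # b (expect a value close to 9)
print(clf.predict(X))   # the five training points classified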
Example

(Figure: value of the discriminant function along the input axis. It is positive near x = 1, 2 and 6 (class 1) and negative near x = 4 and 5 (class 2), with zero crossings in between.)
Degree of Polynomial Features

(Figure: decision boundaries obtained with polynomial kernels of degree 1 through 6 (panels X^1 to X^6) on the same dataset; a higher degree gives a more flexible boundary.)
Choosing the Kernel Function

 Probably the trickiest part of using an SVM.
Software

 A list of SVM implementations can be found at http://www.kernel-machines.org/software.html
 Some implementations (such as LIBSVM) can handle multi-class classification.
 SVMLight is among the earliest implementations of SVM.
 Several Matlab toolboxes for SVM are also available.
Summary: Steps for Classification

 Prepare the pattern matrix.
 Select the kernel function to use.
 Select the parameters of the kernel function and the value of C.
   You can use the values suggested by the SVM software, or you can set apart a validation set to determine the values of the parameters.
 Execute the training algorithm and obtain the ai.
 Unseen data can be classified using the ai and the support vectors.
 (A minimal end-to-end sketch of these steps is shown below.)
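A minimal end-to-end sketch of the steps above (an addition, not from the slides), assuming scikit-learn and an RBF kernel as one reasonable choice, with cross-validation standing in for the validation set mentioned above:

import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

# 1. Prepare the pattern matrix (toy data here)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2-3. Select the kernel, then tune its parameter and C
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
                    cv=3)

# 4. Execute the training algorithm (the ai are found internally)
grid.fit(X_train, y_train)

# 5. Classify unseen data using the learned support vectors
print(grid.best_params_, grid.score(X_test, y_test))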
Conclusion

 SVM is a useful alternative to neural networks.
 Two key concepts of SVM: maximizing the margin and the kernel trick.
 Many SVM implementations are available on the web for you to try on your data set!
Resources

 http://www.kernel-machines.org/
 http://www.support-vector.net/
 http://www.support-vector.net/icml-tutorial.pdf
 http://www.kernel-machines.org/papers/tutorial-nips.ps.gz
 http://www.clopinet.com/isabelle/Projects/SVM/applist.html
Appendix: Distance from a point to a line

 Equation for the line: let u be a parameter; then any point P on the line through P1 and P2 can be described as
   P = P1 + u (P2 - P1)
 Let P be the foot of the perpendicular from P3 to the line. Then u is determined by the requirement that (P2 - P1) is orthogonal to (P3 - P):
   (P3 - P) · (P2 - P1) = 0
   P = P1 + u (P2 - P1)
 Here P1 = (x1, y1), P2 = (x2, y2), P3 = (x3, y3).
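Solving these two equations for u gives the standard closed form (the formula image is missing from this copy):

\[
u = \frac{(x_3 - x_1)(x_2 - x_1) + (y_3 - y_1)(y_2 - y_1)}{\|P_2 - P_1\|^2}.
\]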
Distance and margin

 With this u, the foot of the perpendicular is
   x = x1 + u (x2 - x1)
   y = y1 + u (y2 - y1)
 The distance between the point P3 and the line is therefore the distance between P = (x, y) above and P3.
 Thus, d = |P3 - P|.
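Written out (a standard identity, added for completeness):

\[
d = \|P_3 - P\| = \sqrt{(x_3 - x)^2 + (y_3 - y)^2}
  = \frac{\left|(x_2 - x_1)(y_1 - y_3) - (x_1 - x_3)(y_2 - y_1)\right|}{\|P_2 - P_1\|}.
\]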
