
Lecture 8: SVM

Overview
• SVM for a linearly separable binary data set
• Main goal: design a hyperplane that classifies all training vectors into two classes
• The best model is the one that leaves the maximum margin from both classes (a small fitting sketch follows the figure below)
• The two class labels are +1 (positive examples) and −1 (negative examples)

[Figure: the two classes plotted in the (X1, X2) feature plane, separated by a maximum-margin hyperplane]
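A minimal sketch of this setup (my own illustration, not from the lecture; it assumes scikit-learn and a made-up toy data set): fit a linear SVM to a small separable set labelled +1/−1 and read off the learned hyperplane and its margin width.

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data, labelled +1 / -1 (hypothetical example).
X = np.array([[1.0, 1.0], [2.0, 2.5], [1.5, 2.0],   # positive examples
              [4.0, 4.5], [5.0, 5.0], [4.5, 4.0]])  # negative examples
y = np.array([+1, +1, +1, -1, -1, -1])

# A very large C approximates the hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("hyperplane: w =", w, " b =", b)
print("margin width = 2 / ||w|| =", 2 / np.linalg.norm(w))
```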
Intuition behind SVM
SVM more formally
Margin in terms of W: the width of the margin between the two classes' supporting hyperplanes is 2/‖w‖, so maximizing the margin is equivalent to minimizing ‖w‖
SVM as a minimization problem

min_{w,b}  (1/2) ‖w‖²                                   (quadratic objective)
s.t.  y_n (wᵀ x_n − b) − 1 ≥ 0   for all n              (linear constraints)
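This quadratic program with linear constraints can be handed directly to a QP solver. A minimal sketch (my own, assuming the cvxpy library and a toy data set; the lecture does not prescribe a solver), following the objective and the wᵀx − b convention above:

```python
import numpy as np
import cvxpy as cp

# Toy separable data with labels y_n in {+1, -1} (hypothetical example).
X = np.array([[1.0, 1.0], [2.0, 2.5], [4.0, 4.5], [5.0, 5.0]])
y = np.array([+1, +1, -1, -1])

n, d = X.shape
w = cp.Variable(d)
b = cp.Variable()

# min (1/2)||w||^2   s.t.   y_n (w^T x_n - b) - 1 >= 0  for all n
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w - b) - 1 >= 0]
cp.Problem(objective, constraints).solve()

print("w =", w.value, " b =", b.value)
```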
We wish to find the w and b which minimize, and the α which maximize, the primal Lagrangian L_P (whilst keeping αi ≥ 0 ∀i). We can do this by differentiating L_P with respect to w and b and setting the derivatives to zero:
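For reference, the primal Lagrangian being described and the two stationarity conditions it yields (standard hard-margin derivation, stated here in the slides' w, b, αi notation):

L_P = (1/2) ‖w‖² − Σ_i α_i [ y_i (wᵀ x_i − b) − 1 ],   α_i ≥ 0

∂L_P/∂w = 0  ⇒  w = Σ_i α_i y_i x_i
∂L_P/∂b = 0  ⇒  Σ_i α_i y_i = 0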
Characteristics of the Solution
▪ Many of the αi are zero (see the next slide for an example)
▪ w is a linear combination of a small number of data points
▪ This "sparse" representation can be viewed as data compression, as in the construction of the kNN classifier
▪ The xi with non-zero αi are called support vectors (SV)
▪ The decision boundary is determined only by the SVs
▪ Let tj (j = 1, ..., s) be the indices of the s support vectors. We can write
  w = Σ_{j=1..s} α_{tj} y_{tj} x_{tj}
▪ For testing with a new data point z
▪ Compute  wᵀ z − b = Σ_{j=1..s} α_{tj} y_{tj} (x_{tj}ᵀ z) − b
  and classify z as class 1 if the sum is positive, and class 2 otherwise
▪ Note: w need not be formed explicitly (as the sketch below illustrates)
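A sketch of how this looks in practice (my own example, assuming scikit-learn; the attribute names are scikit-learn's, not the lecture's, and scikit-learn's decision function uses w·x + b rather than the slides' w·x − b, which only flips the sign of b). SVC exposes the support vectors and the products αi·yi directly, so the score for a new point z can be computed from the support vectors alone, without ever forming w:

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (hypothetical); labels in {+1, -1}.
X = np.array([[1.0, 1.0], [2.0, 2.5], [1.5, 2.0],
              [4.0, 4.5], [5.0, 5.0], [4.5, 4.0]])
y = np.array([+1, +1, +1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

# dual_coef_[0, j] holds alpha_j * y_j for the j-th support vector;
# typically only a few of the training points end up as support vectors.
print("support vector indices:", clf.support_)
print("alpha_i * y_i         :", clf.dual_coef_[0])

# Score a new point z using only the support vectors (w is never formed).
z = np.array([3.0, 3.0])
score = clf.dual_coef_[0] @ (clf.support_vectors_ @ z) + clf.intercept_[0]
print("score =", score, "-> class", +1 if score > 0 else -1)

# Sanity check against the library's own decision function.
print("decision_function     :", clf.decision_function(z.reshape(1, -1))[0])
```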
A Geometrical Interpretation

[Figure: data points of Class 1 and Class 2 with their multipliers αi marked; only the three support vectors on the margin have non-zero values (α1 = 0.8, α6 = 1.4, α8 = 0.6), while all other αi are 0, so the decision boundary is determined by those points alone]
Example
Kernel trick
Non-linear SVMs: Feature spaces
▪ General idea: the original feature space can always be
mapped to some higher-dimensional feature space where the
training set is separable:

Φ: x → φ(x)
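A small numerical sketch of this idea (my own example, not from the slides): for 2-D inputs, an explicit quadratic feature map φ reproduces the polynomial kernel (1 + xᵀz)², so working with the kernel is equivalent to taking inner products in the higher-dimensional φ-space without ever constructing it.

```python
import numpy as np

def phi(v):
    """Explicit quadratic feature map for a 2-D vector v = (v1, v2)."""
    v1, v2 = v
    return np.array([1.0,
                     np.sqrt(2) * v1, np.sqrt(2) * v2,
                     v1 ** 2, v2 ** 2,
                     np.sqrt(2) * v1 * v2])

def poly_kernel(x, z, d=2):
    """Polynomial kernel K(x, z) = (1 + x^T z)^d."""
    return (1.0 + x @ z) ** d

x = np.array([0.5, -1.0])
z = np.array([2.0, 3.0])

print(phi(x) @ phi(z))    # inner product in the mapped 6-D space
print(poly_kernel(x, z))  # same value, computed directly in 2-D
```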

SVM for non-linear separability
Kernels
▪ Why use kernels?
▪ Make a non-separable problem separable (see the sketch after this list)
▪ Map data into a better representational space
▪ Common kernels
▪ Linear: K(x, z) = xᵀz
▪ Polynomial: K(x, z) = (1 + xᵀz)^d
▪ Gives feature conjunctions
▪ Radial basis function (RBF): K(x, z) = exp(−‖x − z‖² / (2σ²)), corresponding to an infinite-dimensional feature space
▪ RBF kernels have not proven very useful in text classification
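A short comparison sketch of the first point above (my own example, assuming scikit-learn; the data set and parameters are illustrative, not from the lecture): a data set that a linear SVM cannot separate becomes separable with an RBF kernel.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=1.0).fit(X, y)

# Training accuracy: the linear model cannot fit the rings,
# while the RBF model separates them via its implicit feature map.
print("linear kernel:", linear_svm.score(X, y))
print("rbf kernel   :", rbf_svm.score(X, y))
```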


