Support Vector Machine

Support Vector Machines (SVM) are supervised learning algorithms primarily used for classification tasks, aiming to find the optimal hyperplane that separates different classes in n-dimensional space. SVM can handle both linearly and non-linearly separable data, utilizing support vectors to define the decision boundary and maximize the margin between classes. The document also discusses mathematical formulations, optimization problems, and the kernel trick for mapping data into higher-dimensional spaces to improve classification accuracy.


Machine Learning Group, University of Texas at Austin

Support Vector Machines


• Support Vector Machine (SVM) is one of the most popular supervised
  learning algorithms; it can be used for both classification and regression
  problems.
• However, it is primarily used for classification problems in machine
  learning.
• The goal of the SVM algorithm is to find the best line or decision
  boundary that segregates n-dimensional space into classes, so that new
  data points can easily be placed in the correct category in the future.
• This best decision boundary is called a hyperplane. SVM chooses the
  extreme points/vectors that help in creating the hyperplane.
• These extreme points are called support vectors, and hence the algorithm is
  termed a Support Vector Machine.
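
As a quick illustration of this classification setting, the sketch below fits an SVM classifier with scikit-learn (an assumed dependency; the toy dataset and variable names are made up for illustration) and predicts the class of a new point.

```python
# Minimal sketch: fitting an SVM classifier, assuming scikit-learn is installed.
import numpy as np
from sklearn.svm import SVC

# Toy 2-D dataset: two classes that are linearly separable.
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],   # class -1
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear")   # linear decision boundary (hyperplane)
clf.fit(X, y)

# Classify a new, previously unseen point.
print(clf.predict([[4.0, 4.0]]))   # predicted class label
print(clf.support_vectors_)        # the extreme points that define the hyperplane
```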


Continue...
• Consider the below diagram in which there are two different categories
that are classified using a decision boundary or hyperplane:


Continue...
• Example: SVM can be understood with the example we used for the KNN
  classifier. Suppose we see a strange cat that also has some features of
  dogs. If we want a model that can accurately identify whether it is a cat
  or a dog, such a model can be created using the SVM algorithm.
• We first train the model with many images of cats and dogs so that it can
  learn the different features of cats and dogs, and then we test it on this
  strange creature.
• The support vectors define a decision boundary between these two classes
  (cat and dog); since the algorithm chooses the extreme cases (support
  vectors), it will look at the extreme cases of cat and dog.
• On the basis of the support vectors, it will classify the new example as a
  cat.


Continue...
• Consider the below diagram:

• The SVM algorithm can be used for face detection, image classification,
  text categorization, etc.


Types of SVM
• SVM can be of two types:

• Linear SVM: Linear SVM is used for linearly separable data. If a dataset
  can be classified into two classes by using a single straight line, then
  such data is termed linearly separable data, and the classifier used is
  called a Linear SVM classifier.

• Non-linear SVM: Non-linear SVM is used for non-linearly separable data.
  If a dataset cannot be classified by using a straight line, then such data
  is termed non-linear data, and the classifier used is called a Non-linear
  SVM classifier. A comparison of the two is sketched below.
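
As a rough sketch of this distinction (assuming scikit-learn; make_blobs and make_circles are just convenient toy datasets), a linear kernel suffices for linearly separable data, while an RBF kernel handles data that no straight line can separate:

```python
# Sketch: linear vs. non-linear SVM on toy data, assuming scikit-learn is installed.
from sklearn.datasets import make_blobs, make_circles
from sklearn.svm import SVC

# Linearly separable data: two well-separated blobs.
X_lin, y_lin = make_blobs(n_samples=200, centers=2, cluster_std=1.0, random_state=0)
print("linear kernel on blobs:  ", SVC(kernel="linear").fit(X_lin, y_lin).score(X_lin, y_lin))

# Non-linearly separable data: one class inside a ring of the other class.
X_circ, y_circ = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
print("linear kernel on circles:", SVC(kernel="linear").fit(X_circ, y_circ).score(X_circ, y_circ))
print("rbf kernel on circles:   ", SVC(kernel="rbf").fit(X_circ, y_circ).score(X_circ, y_circ))
```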


Continue...
• Hyperplane and Support Vectors in the SVM algorithm:
• Hyperplane: There can be multiple lines/decision boundaries that segregate
  the classes in n-dimensional space, but we need to find the best decision
  boundary that helps to classify the data points. This best boundary is
  known as the hyperplane of SVM.
• The dimensions of the hyperplane depend on the number of features in the
  dataset: if there are 2 features (as shown in the image), the hyperplane is
  a straight line; if there are 3 features, the hyperplane is a 2-dimensional
  plane.
• We always create the hyperplane that has the maximum margin, i.e. the
  maximum distance between the hyperplane and the nearest data points of each
  class.
• Support Vectors:
• The data points or vectors that are closest to the hyperplane and that
  affect the position of the hyperplane are termed support vectors. Since
  these vectors support the hyperplane, they are called support vectors.

How does SVM work?


Continue...
• Linear SVM:
• The working of the SVM algorithm can be understood by using an example.
  Suppose we have a dataset that has two labels (green and blue), and the
  dataset has two features x1 and x2. We want a classifier that can classify
  a pair (x1, x2) of coordinates as either green or blue. Consider the below
  image:

Continue...
• Since this is a 2-D space, the two classes can easily be separated by just
  using a straight line. But there can be multiple lines that separate these
  classes. Consider the below image:

• The SVM algorithm helps to find the best line or decision boundary; this
  best boundary or region is called a hyperplane.
• The SVM algorithm finds the points of both classes that are closest to the
  boundary. These points are called support vectors.
• The distance between these vectors and the hyperplane is called the margin,
  and the goal of SVM is to maximize this margin. The hyperplane with the
  maximum margin is called the optimal hyperplane. A small sketch of these
  ideas on toy data follows.
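
A small sketch of these ideas (assuming scikit-learn; the two-feature toy data is made up for illustration): fit a linear SVM, read off the support vectors, and compute the margin width as 2/||w||.

```python
# Sketch: support vectors and margin of a linear SVM, assuming scikit-learn.
import numpy as np
from sklearn.svm import SVC

# Toy data with two features (x1, x2) and two labels ("blue" = 0, "green" = 1).
X = np.array([[1, 1], [2, 1], [1, 2], [2, 2],
              [5, 5], [6, 5], [5, 6], [6, 6]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # a large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]                   # normal vector of the separating hyperplane
print("support vectors:\n", clf.support_vectors_)
print("margin width 2/||w|| =", 2.0 / np.linalg.norm(w))
```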

Perceptron Revisited: Linear Separators

• Binary classification can be viewed as the task of separating classes in
  feature space:

      w^T x + b = 0   (separating hyperplane)
      w^T x + b > 0   (one side)
      w^T x + b < 0   (other side)

      f(x) = sign(w^T x + b)
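
As a concrete reading of this decision rule, here is a plain NumPy sketch; the weight vector and test points below are arbitrary examples, not values from the slides.

```python
# Sketch: the linear decision rule f(x) = sign(w^T x + b) in NumPy.
import numpy as np

w = np.array([2.0, -1.0])   # example weight vector (normal to the hyperplane)
b = -1.0                    # example bias term

def f(x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return np.sign(w @ x + b)

print(f(np.array([3.0, 1.0])))   # w^T x + b = 4 > 0  -> +1
print(f(np.array([0.0, 2.0])))   # w^T x + b = -3 < 0 -> -1
```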


Linear Separators

• Which of the linear separators is optimal?


Classification Margin
• The (signed) distance from example x_i to the separator is
      r = (w^T x_i + b) / ||w||
• Examples closest to the hyperplane are support vectors.
• The margin ρ of the separator is the distance between the support vectors
  of the two classes (the width of the band around the hyperplane that
  contains no training points). A small numeric check appears below.
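
A small numeric check of these quantities (plain NumPy; the hyperplane and points are arbitrary illustrations):

```python
# Sketch: distance of points to a hyperplane and the resulting margin, in NumPy.
import numpy as np

w = np.array([1.0, 1.0])    # example hyperplane normal
b = -3.0                    # example bias: the hyperplane is x1 + x2 = 3

def signed_distance(x):
    """Signed distance r = (w^T x + b) / ||w|| from point x to the hyperplane."""
    return (w @ x + b) / np.linalg.norm(w)

positives = np.array([[3.0, 1.0], [4.0, 2.0]])   # class +1 examples
negatives = np.array([[1.0, 1.0], [0.0, 2.0]])   # class -1 examples

# The margin is the gap between the closest examples on either side.
rho = min(signed_distance(x) for x in positives) - max(signed_distance(x) for x in negatives)
print("margin rho =", rho)
```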


Maximum Margin Classification


• Maximizing the margin is good according to intuition and PAC theory.
• It implies that only the support vectors matter; the other training
  examples are ignorable.


Linear SVM Mathematically


• Let the training set {(x_i, y_i)}, i = 1..n, with x_i ∈ R^d and
  y_i ∈ {-1, 1}, be separated by a hyperplane with margin ρ. Then for each
  training example (x_i, y_i):

      w^T x_i + b ≤ -ρ/2   if y_i = -1
      w^T x_i + b ≥  ρ/2   if y_i =  1

  which is equivalent to  y_i (w^T x_i + b) ≥ ρ/2.

• For every support vector x_s the above inequality is an equality. After
  rescaling w and b by ρ/2 in the equality, we obtain that the distance
  between each x_s and the hyperplane is

      r = y_s (w^T x_s + b) / ||w|| = 1 / ||w||

• Then the margin can be expressed through the (rescaled) w and b as:

      ρ = 2r = 2 / ||w||


Linear SVMs Mathematically (cont.)

• Then we can formulate the quadratic optimization problem:

  Find w and b such that
      ρ = 2 / ||w||  is maximized
  and for all (x_i, y_i), i = 1..n :  y_i (w^T x_i + b) ≥ 1

  which can be reformulated as:

  Find w and b such that
      Φ(w) = ||w||² = w^T w  is minimized
  and for all (x_i, y_i), i = 1..n :  y_i (w^T x_i + b) ≥ 1


Solving the Optimization Problem

Find w and b such that
    Φ(w) = w^T w  is minimized
and for all (x_i, y_i), i = 1..n :  y_i (w^T x_i + b) ≥ 1

• We need to optimize a quadratic function subject to linear constraints.

• Quadratic optimization problems are a well-known class of mathematical
  programming problems for which several (non-trivial) algorithms exist.
• The solution involves constructing a dual problem in which a Lagrange
  multiplier α_i is associated with every inequality constraint of the primal
  (original) problem:

  Find α_1 … α_n such that
      Q(α) = Σ α_i - ½ Σ Σ α_i α_j y_i y_j x_i^T x_j  is maximized and
      (1) Σ α_i y_i = 0
      (2) α_i ≥ 0 for all α_i

  A sketch of solving this dual numerically follows.
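
As one possible sketch of solving this dual numerically (assuming the cvxopt package is available; the tiny separable dataset is made up), the problem is cast in cvxopt's standard QP form: minimize ½ αᵀPα + qᵀα subject to Gα ≤ h and Aα = b.

```python
# Sketch: solving the hard-margin SVM dual with a generic QP solver (cvxopt assumed).
import numpy as np
from cvxopt import matrix, solvers

# Tiny linearly separable dataset.
X = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 4.0], [5.0, 4.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
n = len(y)

# Dual: maximize  Σ α_i - ½ ΣΣ α_i α_j y_i y_j x_i^T x_j
# In cvxopt's minimization form:  min ½ α^T P α + q^T α,  G α <= h,  A α = b.
P = matrix(np.outer(y, y) * (X @ X.T))   # P_ij = y_i y_j x_i^T x_j
q = matrix(-np.ones(n))                  # the -1 vector flips max into min
G = matrix(-np.eye(n))                   # -α_i <= 0, i.e. α_i >= 0
h = matrix(np.zeros(n))
A = matrix(y.reshape(1, -1))             # equality constraint: Σ α_i y_i = 0
b = matrix(0.0)

sol = solvers.qp(P, q, G, h, A, b)
alpha = np.ravel(sol["x"])
print("alphas:", np.round(alpha, 4))     # non-zero alphas mark the support vectors
```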

The Optimization Problem Solution

• Given a solution α_1 … α_n to the dual problem, the solution to the primal
  is:

      w = Σ α_i y_i x_i        b = y_k - Σ α_i y_i x_i^T x_k   for any α_k > 0

• Each non-zero α_i indicates that the corresponding x_i is a support vector.

• Then the classifying function is (note that we don't need w explicitly):

      f(x) = Σ α_i y_i x_i^T x + b

• Notice that it relies on an inner product between the test point x and the
  support vectors x_i – we will return to this later.
• Also keep in mind that solving the optimization problem involved computing
  the inner products x_i^T x_j between all training points. The sketch below
  recovers w and b from a fitted model in this way.
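
A sketch of these recovery formulas using scikit-learn's fitted attributes (an assumed dependency; note that clf.dual_coef_ stores the products α_i·y_i for the support vectors, not α_i alone):

```python
# Sketch: recovering w and b from the dual solution, assuming scikit-learn.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 4.0], [5.0, 4.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ hard margin

alpha_y = clf.dual_coef_[0]          # alpha_i * y_i for each support vector
sv = clf.support_vectors_

w = alpha_y @ sv                     # w = sum_i alpha_i y_i x_i
k = clf.support_[0]                  # index of one support vector
b = y[k] - w @ X[k]                  # b = y_k - w^T x_k

print("w:", w, " vs clf.coef_:", clf.coef_[0])
print("b:", b, " vs clf.intercept_:", clf.intercept_[0])
```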

Soft Margin Classification

• What if the training set is not linearly separable?

• Slack variables ξ_i can be added to allow misclassification of difficult or
  noisy examples; the resulting margin is called a soft margin.


Soft Margin Classification Mathematically

• The old formulation:

  Find w and b such that
      Φ(w) = w^T w  is minimized
  and for all (x_i, y_i), i = 1..n :  y_i (w^T x_i + b) ≥ 1

• The modified formulation incorporates slack variables:

  Find w and b such that
      Φ(w) = w^T w + C Σ ξ_i  is minimized
  and for all (x_i, y_i), i = 1..n :  y_i (w^T x_i + b) ≥ 1 - ξ_i,  ξ_i ≥ 0

• The parameter C can be viewed as a way to control overfitting: it "trades
  off" the relative importance of maximizing the margin and fitting the
  training data. The sketch below shows its effect on a noisy dataset.
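
A quick sketch of that trade-off (scikit-learn assumed; the overlapping blobs are arbitrary toy data): a small C tolerates more margin violations and keeps a wider margin with more support vectors, while a large C fits the training data more tightly.

```python
# Sketch: how the soft-margin parameter C trades margin width against training fit.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping (not perfectly separable) toy data.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_[0])
    print(f"C={C:6}: margin={margin:.3f}, "
          f"support vectors={len(clf.support_vectors_)}, "
          f"train accuracy={clf.score(X, y):.3f}")
```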


Soft Margin Classification – Solution

• The dual problem is identical to the separable case (it would not be
  identical if the 2-norm penalty for slack variables, C Σ ξ_i², were used in
  the primal objective; we would then need additional Lagrange multipliers
  for the slack variables):

  Find α_1 … α_n such that
      Q(α) = Σ α_i - ½ Σ Σ α_i α_j y_i y_j x_i^T x_j  is maximized and
      (1) Σ α_i y_i = 0
      (2) 0 ≤ α_i ≤ C for all α_i

• Again, the x_i with non-zero α_i will be support vectors.

• The solution to the dual problem is:

      w = Σ α_i y_i x_i
      b = y_k (1 - ξ_k) - Σ α_i y_i x_i^T x_k   for any k such that α_k > 0

  Again, we don't need to compute w explicitly for classification:

      f(x) = Σ α_i y_i x_i^T x + b


Theoretical Justification for Maximum Margins

• Vapnik has proved the following:

  The class of optimal linear separators has VC dimension h bounded from
  above as

      h ≤ min( ⌈D² / ρ²⌉ , m₀ ) + 1

  where ρ is the margin, D is the diameter of the smallest sphere that can
  enclose all of the training examples, and m₀ is the dimensionality.

• Intuitively, this implies that regardless of the dimensionality m₀ we can
  minimize the VC dimension by maximizing the margin ρ.

• Thus, the complexity of the classifier is kept small regardless of
  dimensionality.


Linear SVMs: Overview

• The classifier is a separating hyperplane.

• The most "important" training points are the support vectors; they define
  the hyperplane.

• Quadratic optimization algorithms can identify which training points x_i
  are support vectors, i.e. those with non-zero Lagrange multipliers α_i.

• Both in the dual formulation of the problem and in the solution, training
  points appear only inside inner products:

  Find α_1 … α_n such that
      Q(α) = Σ α_i - ½ Σ Σ α_i α_j y_i y_j x_i^T x_j  is maximized and
      (1) Σ α_i y_i = 0
      (2) 0 ≤ α_i ≤ C for all α_i

      f(x) = Σ α_i y_i x_i^T x + b


Non-linear SVMs

• Datasets that are linearly separable (perhaps with some noise) work out
  great:

  [Figure: 1-D data along the x-axis, separable by a single threshold at 0]

• But what are we going to do if the dataset is just too hard?

  [Figure: 1-D data along the x-axis where no single threshold separates the classes]

• How about mapping the data to a higher-dimensional space:

  [Figure: the same data mapped to (x, x²), where a straight line now separates the classes]

  A sketch of this mapping appears below.
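
A minimal sketch of that idea in NumPy (the 1-D toy data is made up): points that no single threshold can separate become linearly separable after mapping x → (x, x²).

```python
# Sketch: making 1-D data separable by mapping x -> (x, x^2).
import numpy as np

# 1-D data: the positive class sits on both ends, the negative class in the
# middle, so no single threshold on x separates them.
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array([ 1,    1,   -1,  -1,  -1,   1,   1 ])

# Map to a 2-D feature space: phi(x) = (x, x^2).
phi = np.column_stack([x, x ** 2])

# In the new space the horizontal line x2 = 2 separates the classes.
predictions = np.where(phi[:, 1] > 2.0, 1, -1)
print("correctly separated:", np.array_equal(predictions, y))
```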

Non-linear SVMs: Feature spaces

• General idea: the original feature space can always be mapped to some
higher-dimensional feature space where the training set is separable:

Φ: x → φ(x)


The “Kernel Trick”

• The linear classifier relies on an inner product between vectors:
      K(x_i, x_j) = x_i^T x_j
• If every data point is mapped into a high-dimensional space via some
  transformation Φ: x → φ(x), the inner product becomes:
      K(x_i, x_j) = φ(x_i)^T φ(x_j)
• A kernel function is a function that is equivalent to an inner product in
  some feature space.
• Example:
  2-dimensional vectors x = [x1 x2]; let K(x_i, x_j) = (1 + x_i^T x_j)².
  We need to show that K(x_i, x_j) = φ(x_i)^T φ(x_j):
      K(x_i, x_j) = (1 + x_i^T x_j)²
                  = 1 + x_i1² x_j1² + 2 x_i1 x_j1 x_i2 x_j2 + x_i2² x_j2² + 2 x_i1 x_j1 + 2 x_i2 x_j2
                  = [1  x_i1²  √2 x_i1 x_i2  x_i2²  √2 x_i1  √2 x_i2]^T [1  x_j1²  √2 x_j1 x_j2  x_j2²  √2 x_j1  √2 x_j2]
                  = φ(x_i)^T φ(x_j),   where φ(x) = [1  x1²  √2 x1 x2  x2²  √2 x1  √2 x2]
• Thus, a kernel function implicitly maps data to a high-dimensional space
  (without the need to compute each φ(x) explicitly). A numeric check of this
  identity appears below.
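
A quick NumPy check of this identity on arbitrary example vectors:

```python
# Sketch: verifying the kernel identity (1 + x^T z)^2 == phi(x)^T phi(z) numerically.
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on 2-D inputs."""
    x1, x2 = x
    return np.array([1.0, x1**2, np.sqrt(2)*x1*x2, x2**2, np.sqrt(2)*x1, np.sqrt(2)*x2])

def K(x, z):
    """Polynomial kernel computed directly in the original 2-D space."""
    return (1.0 + x @ z) ** 2

x = np.array([1.0, 2.0])   # arbitrary example vectors
z = np.array([3.0, -1.0])

print(K(x, z))             # kernel value without leaving 2-D space
print(phi(x) @ phi(z))     # the same value via the explicit 6-D feature map
```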

What Functions are Kernels?

• For some functions K(x_i, x_j), checking that K(x_i, x_j) = φ(x_i)^T φ(x_j)
  can be cumbersome.
• Mercer's theorem:
  Every positive semi-definite symmetric function is a kernel.
• Positive semi-definite symmetric functions correspond to a positive
  semi-definite symmetric Gram matrix:

        | K(x1,x1)  K(x1,x2)  K(x1,x3)  …  K(x1,xn) |
        | K(x2,x1)  K(x2,x2)  K(x2,x3)  …  K(x2,xn) |
    K = |    …         …         …      …     …     |
        | K(xn,x1)  K(xn,x2)  K(xn,x3)  …  K(xn,xn) |

  One way to check this numerically is sketched below.
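
A small sketch of that check (plain NumPy; the sample points and the RBF kernel are arbitrary choices): build the Gram matrix for a candidate kernel and confirm its eigenvalues are non-negative.

```python
# Sketch: checking positive semi-definiteness of a kernel's Gram matrix in NumPy.
import numpy as np

def rbf_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel, used here as the candidate kernel to test."""
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))          # arbitrary sample points

# Gram matrix K_ij = K(x_i, x_j).
K = np.array([[rbf_kernel(xi, xj) for xj in X] for xi in X])

eigenvalues = np.linalg.eigvalsh(K)   # symmetric matrix -> real eigenvalues
print("smallest eigenvalue:", eigenvalues.min())
print("positive semi-definite (up to round-off):", eigenvalues.min() > -1e-10)
```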


Examples of Kernel Functions


• Linear: K(x_i, x_j) = x_i^T x_j
  – Mapping Φ: x → φ(x), where φ(x) is x itself.

• Polynomial of power p: K(x_i, x_j) = (1 + x_i^T x_j)^p
  – Mapping Φ: x → φ(x), where φ(x) has C(d+p, p) dimensions (the binomial
    coefficient "d+p choose p").

• Gaussian (radial-basis function): K(x_i, x_j) = exp( -||x_i - x_j||² / (2σ²) )
  – Mapping Φ: x → φ(x), where φ(x) is infinite-dimensional: every point is
    mapped to a function (a Gaussian); a combination of such functions for the
    support vectors is the separator.

• The higher-dimensional space still has intrinsic dimensionality d (the
  mapping is not onto), but linear separators in it correspond to non-linear
  separators in the original space. These kernels are sketched in code below.
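
A compact sketch of these three kernel functions in NumPy (σ, p, and the test vectors are arbitrary example values):

```python
# Sketch: the three example kernels as plain NumPy functions.
import numpy as np

def linear_kernel(x, z):
    return x @ z

def polynomial_kernel(x, z, p=2):
    return (1.0 + x @ z) ** p

def gaussian_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])
print(linear_kernel(x, z), polynomial_kernel(x, z), gaussian_kernel(x, z))
```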

Non-linear SVMs Mathematically

• Dual problem formulation:

  Find α_1 … α_n such that
      Q(α) = Σ α_i - ½ Σ Σ α_i α_j y_i y_j K(x_i, x_j)  is maximized and
      (1) Σ α_i y_i = 0
      (2) α_i ≥ 0 for all α_i

• The solution is:

      f(x) = Σ α_i y_i K(x_i, x) + b

• The optimization techniques for finding the α_i's remain the same! A usage
  sketch with a custom kernel follows.
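
As a usage sketch (scikit-learn assumed; the toy dataset and kernel choice are illustrative), SVC accepts a callable kernel that returns the Gram matrix, so the same optimization machinery runs with a hand-written kernel:

```python
# Sketch: plugging a hand-written kernel into scikit-learn's SVC.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

def poly2_kernel(A, B):
    """Degree-2 polynomial kernel; returns the Gram matrix between rows of A and B."""
    return (1.0 + A @ B.T) ** 2

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

clf = SVC(kernel=poly2_kernel)   # custom callable kernel
clf.fit(X, y)
print("training accuracy with custom kernel:", clf.score(X, y))
```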


SVM applications

• SVMs were originally proposed by Boser, Guyon and Vapnik in 1992 and
  gained increasing popularity in the late 1990s.
• SVMs are currently among the best performers for a number of classification
  tasks ranging from text to genomic data.
• SVMs can be applied to complex data types beyond feature vectors (e.g.
  graphs, sequences, relational data) by designing kernel functions for such
  data.
• SVM techniques have been extended to a number of tasks such as regression
  [Vapnik et al. '97], principal component analysis [Schölkopf et al. '99],
  etc.
• The most popular optimization algorithms for SVMs use decomposition to
  hill-climb over a subset of the α_i's at a time, e.g. SMO [Platt '99] and
  [Joachims '99].
• Tuning SVMs remains a black art: selecting a specific kernel and its
  parameters is usually done in a trial-and-error manner.
