Lec06 SVM

SVM - Support Vector Machines

• A classification method for both linear and nonlinear data
• It uses a nonlinear mapping to transform the original training data into a higher dimension
• In this new, higher-dimensional space, it searches for the linear optimal separating hyperplane (i.e., the "decision boundary")
• With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane
• SVM finds this hyperplane using support vectors ("essential" training tuples) and margins (defined by the support vectors)

SVM - History and Applications
• Vapnik and colleagues (1992)—groundwork from Vapnik &
Chervonenkis’ statistical learning theory in 1960s
• Features: training can be slow but accuracy is high owing to their ability
to model complex nonlinear decision boundaries (margin maximization)
• Used for both classification and numeric prediction (regression)
• Applications:
– handwritten digit recognition, object recognition, speaker
identification, benchmarking time-series prediction tests

Linear Classifiers

Consider a two-dimensional dataset with two classes.

How would we classify this dataset?

Linear Classifiers

Both of the lines can be linear classifiers.

Linear Classifiers

There are many lines that can be linear classifiers.

Which one is the optimal classifier?


Classifier Margin

Define the margin of a linear classifier as the width by which the decision boundary could be increased before hitting a data point.

Maximum Margin

The maximum margin linear classifier is the linear classifier with the maximum margin.

This is the simplest kind of SVM (called a Linear SVM).

Support Vectors
• Examples closest to the hyperplane are support vectors.
• The margin ρ of the separator is the distance between the two parallel hyperplanes that pass through the support vectors of each class.

[Figure: points of the two classes, red (+1) and blue (−1), separated by the hyperplane w·x + b = 0 with margin ρ. The half-space w·x + b > 0 is classified as +1 and w·x + b < 0 as −1, so the classification rule is f(x) = sign(w·x + b). The points lying on the edges of the margin are the support vectors.]

Support Vectors

• Distance from example xi to the separator is r = |wTxi + b| / ||w||, where ||w|| = √(w1² + … + wn²).
SVM - Linearly Separable
• A separating hyperplane can be written as
  W · X + b = 0
  where W = {w1, w2, …, wn} is a weight vector and b a scalar (bias)
• For 2-D it can be written as
  w0 + w1 x1 + w2 x2 = 0
• The hyperplanes defining the sides of the margin are:
  H1: w0 + w1 x1 + w2 x2 ≥ 1 for yi = +1, and
  H2: w0 + w1 x1 + w2 x2 ≤ −1 for yi = −1
• Any training tuples that fall on hyperplanes H1 or H2 (i.e., the sides defining the margin) are support vectors
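As an illustration, a minimal scikit-learn sketch (toy data, scikit-learn assumed available) showing that the support vectors of an approximately hard-margin linear SVM lie on H1 and H2, i.e. satisfy w·x + b ≈ ±1:

import numpy as np
from sklearn.svm import SVC

# Toy linearly separable 2-D data
X = np.array([[1, 1], [2, 1], [1, 2],     # class -1
              [4, 4], [5, 4], [4, 5]])    # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin (linearly separable) case
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# Support vectors lie on H1 / H2, so w.x + b evaluates to roughly +1 or -1
for sv in clf.support_vectors_:
    print(sv, "->", round(float(w @ sv + b), 3))

print("margin width = 2/||w|| =", 2 / np.linalg.norm(w))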

Linear SVM Mathematically
• Let the training set {(xi, yi)}i=1..n, xi ∈ Rd, yi ∈ {−1, 1}, be separated by a hyperplane with margin ρ. Then for each training example (xi, yi):

  wTxi + b ≤ −ρ/2   if yi = −1
  wTxi + b ≥ +ρ/2   if yi = +1

  or, equivalently, yi(wTxi + b) ≥ ρ/2

• For every support vector xs the above inequality is an equality. After rescaling w and b by ρ/2 in the equality, we obtain that the distance between each xs and the hyperplane is

  r = ys(wTxs + b) / ||w|| = 1 / ||w||

• Then the margin can be expressed through the (rescaled) w and b as:

  ρ = 2r = 2 / ||w||
Linear SVMs Mathematically (cont.)
• Then we can formulate the quadratic optimization problem:

Find w and b such that
  ρ = 2 / ||w|| is maximized
  and for all (xi, yi), i = 1..n : yi(wTxi + b) ≥ 1

Which can be reformulated as:

Find w and b such that
  Φ(w) = ||w||² = wTw is minimized
  and for all (xi, yi), i = 1..n : yi(wTxi + b) ≥ 1

Solving the Optimization Problem

Find w and b such that
  Φ(w) = wTw is minimized
  and for all (xi, yi), i = 1..n : yi(wTxi + b) ≥ 1
• Need to optimize a quadratic function subject to linear constraints.
• Quadratic optimization problems are a well-known class of mathematical
programming problems for which several (non-trivial) algorithms exist.
• The solution involves constructing a dual problem where a Lagrange multiplier αi is
associated with every inequality constraint in the primal (original) problem:

Find α1…αn such that
  Q(α) = Σαi − ½ΣΣαiαjyiyjxiTxj is maximized and
  (1) Σαiyi = 0
  (2) αi ≥ 0 for all αi
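To make this concrete, here is a hedged sketch of solving the dual with the cvxopt QP solver (assuming cvxopt is installed; the function and variable names are illustrative, not part of the slides). cvxopt minimizes ½αᵀPα + qᵀα subject to Gα ≤ h and Aα = b, so the dual above maps to P_ij = yiyjxiTxj, q = −1, G = −I, h = 0, A = yᵀ, b = 0:

import numpy as np
from cvxopt import matrix, solvers

def svm_dual(X, y):
    """Solve the hard-margin SVM dual for a small toy problem (illustrative sketch)."""
    n = X.shape[0]
    K = X @ X.T                                 # Gram matrix of inner products x_i . x_j
    P = matrix(np.outer(y, y) * K)              # P_ij = y_i y_j x_i.x_j
    q = matrix(-np.ones(n))                     # maximizing sum(alpha) == minimizing -sum(alpha)
    G = matrix(-np.eye(n))                      # -alpha_i <= 0, i.e. alpha_i >= 0
    h = matrix(np.zeros(n))
    A = matrix(y.reshape(1, -1).astype(float))  # equality constraint sum(alpha_i y_i) = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.array(sol["x"]).flatten()         # the Lagrange multipliers alpha_i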

The Optimization Problem Solution
• Given a solution α1…αn to the dual problem, the solution to the primal is:

  w = Σαiyixi        b = yk − ΣαiyixiTxk   for any αk > 0

• Each non-zero αi indicates that the corresponding xi is a support vector.
• Then the classifying function is (note that we don't need w explicitly):

  f(x) = ΣαiyixiTx + b

• Notice that it relies on an inner product between the test point x and the support vectors xi.
• Also keep in mind that solving the optimization problem involved computing the inner products xiTxj between all training points.
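For reference, a fitted scikit-learn SVC stores exactly these quantities: dual_coef_ holds the products αiyi for the support vectors, support_vectors_ holds the corresponding xi, and intercept_ holds b (the toy data below is made up), so f(x) can be reproduced by hand:

import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.0], [4.0, 4.0], [5.0, 4.0]])
y = np.array([-1, -1, 1, 1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

x_test = np.array([3.0, 2.0])
# f(x) = sum_i (alpha_i y_i) x_i.x + b, summed over the support vectors only
f = clf.dual_coef_[0] @ (clf.support_vectors_ @ x_test) + clf.intercept_[0]
print(f, clf.decision_function([x_test])[0])   # the two values should agree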

Soft Margin Classification
• What if the training set is not linearly separable?
• Slack variables ξi can be added to allow misclassification of difficult or noisy examples; the resulting margin is called a soft margin.


Soft Margin Classification Mathematically
• The old formulation:

Find w and b such that
  Φ(w) = wTw is minimized
  and for all (xi, yi), i = 1..n : yi(wTxi + b) ≥ 1

• The modified formulation incorporates slack variables:

Find w and b such that
  Φ(w) = wTw + CΣξi is minimized
  and for all (xi, yi), i = 1..n : yi(wTxi + b) ≥ 1 − ξi,  ξi ≥ 0
• Parameter C can be viewed as a way to control overfitting: it “trades off” the relative
importance of maximizing the margin and fitting the training data.
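As an illustration of this trade-off, a small scikit-learn sketch (made-up data with one injected noisy point; scikit-learn assumed available) that compares a small and a large C:

import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) - 2, rng.randn(20, 2) + 2])
y = np.array([-1] * 20 + [1] * 20)
X[0] = [2.5, 2.5]   # a "difficult" class -1 point deep inside class +1 territory

for C in (0.1, 1000.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2 / np.linalg.norm(clf.coef_[0])
    print(f"C={C}: margin width = {margin:.3f}, support vectors = {len(clf.support_vectors_)}")

# Typically the small C keeps a wide margin and simply absorbs the noisy point as slack,
# while the large C narrows the margin in an attempt to fit it.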

Soft Margin Classification – Solution
• The dual problem is identical to the separable case (it would not be identical if the 2-norm penalty for slack variables, CΣξi², were used in the primal objective; in that case we would need additional Lagrange multipliers for the slack variables):
Find α1…αN such that
Q(α) =Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) 0 ≤ αi ≤ C for all αi
• Again, xi with non-zero αi will be support vectors.
• The solution to the dual problem is:

  w = Σαiyixi
  b = yk(1 − ξk) − ΣαiyixiTxk   for any k s.t. αk > 0

• Again, we don't need to compute w explicitly for classification:

  f(x) = ΣαiyixiTx + b
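Relative to the earlier hard-margin cvxopt sketch (again a hypothetical helper, cvxopt assumed installed), only the inequality constraints change: 0 ≤ αi ≤ C becomes two stacked blocks in G and h:

import numpy as np
from cvxopt import matrix, solvers

def svm_dual_soft(X, y, C):
    """Soft-margin SVM dual: the same QP as the separable case, with box constraints 0 <= alpha_i <= C."""
    n = X.shape[0]
    K = X @ X.T
    P = matrix(np.outer(y, y) * K)
    q = matrix(-np.ones(n))
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))    # -alpha_i <= 0  and  alpha_i <= C
    h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
    A = matrix(y.reshape(1, -1).astype(float))
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.array(sol["x"]).flatten()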

Theoretical Justification for Maximum Margins
• Vapnik has proved the following:
The class of optimal linear separators has VC dimension h bounded from above as

  h ≤ min( ⌈D²/ρ²⌉ , m0 ) + 1

where ρ is the margin, D is the diameter of the smallest sphere that can enclose all of the training examples, and m0 is the dimensionality.

• Intuitively, this implies that regardless of dimensionality m0 we can minimize


the VC dimension by maximizing the margin ρ.

• Thus, complexity of the classifier is kept small regardless of dimensionality.

Linear SVMs: Overview
• The classifier is a separating hyperplane.
• Most “important” training points are support vectors; they define the
hyperplane.
• Quadratic optimization algorithms can identify which training points xi
are support vectors with non-zero Lagrangian multipliers αi.
• Both in the dual formulation of the problem and in the solution, training points appear only inside inner products:

Find α1…αN such that
  Q(α) = Σαi − ½ΣΣαiαjyiyjxiTxj is maximized and
  (1) Σαiyi = 0
  (2) 0 ≤ αi ≤ C for all αi

f(x) = ΣαiyixiTx + b
Non-linear SVMs

• Datasets that are linearly separable with some noise work out great.
• But what are we going to do if the dataset is just too hard?
• How about mapping the data to a higher-dimensional space?
  [Figure: 1-D data on the x axis that no single threshold can separate becomes separable once each point is also plotted against x², i.e. mapped to (x, x²).]
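A minimal NumPy sketch of this idea (made-up 1-D data; the mapping φ(x) = (x, x²) and the separating line are illustrative):

import numpy as np

# 1-D data: class +1 sits in the middle, class -1 on both sides,
# so no single threshold on x can separate them.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([-1, -1, 1, 1, 1, -1, -1])

# Map each point to phi(x) = (x, x^2)
phi = np.column_stack([x, x ** 2])

# In the (x, x^2) plane the horizontal line x^2 = 2.5 separates the classes
w, b = np.array([0.0, -1.0]), 2.5
print(np.sign(phi @ w + b))   # matches y: the mapped data is linearly separable
print(y)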
Non-linear SVMs: Feature spaces

• General idea: the original feature space can always be mapped to some higher-
dimensional feature space where the training set is separable:

Φ: x → φ(x)

The “Kernel Trick”

• The linear classifier relies on inner product between vectors K(xi,xj)=xiTxj


• If every datapoint is mapped into high-dimensional space via some transformation
Φ: x → φ(x), the inner product becomes:
K(xi,xj)= φ(xi) Tφ(xj)
• A kernel function is a function that is equivalent to an inner product in some feature space.
• Example:
  2-dimensional vectors x = [x1 x2]; let K(xi,xj) = (1 + xiTxj)²
  Need to show that K(xi,xj) = φ(xi)Tφ(xj):
  K(xi,xj) = (1 + xiTxj)² = 1 + xi1²xj1² + 2xi1xj1xi2xj2 + xi2²xj2² + 2xi1xj1 + 2xi2xj2
           = [1  xi1²  √2 xi1xi2  xi2²  √2 xi1  √2 xi2]T [1  xj1²  √2 xj1xj2  xj2²  √2 xj1  √2 xj2]
           = φ(xi)Tφ(xj),  where φ(x) = [1  x1²  √2 x1x2  x2²  √2 x1  √2 x2]
• Thus, a kernel function implicitly maps data to a high-dimensional space (without the need to compute each φ(x) explicitly).
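A quick numerical check of this identity (the two vectors are made-up values):

import numpy as np

def phi(x):
    """Explicit feature map for the quadratic kernel K(a, b) = (1 + a.b)^2 in 2-D."""
    x1, x2 = x
    return np.array([1, x1**2, np.sqrt(2)*x1*x2, x2**2, np.sqrt(2)*x1, np.sqrt(2)*x2])

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

lhs = (1 + a @ b) ** 2   # kernel computed in the original 2-D space
rhs = phi(a) @ phi(b)    # inner product in the 6-D feature space
print(lhs, rhs)          # both print 4.0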
Examples of Kernel Functions
• Linear: K(xi,xj)= xiTxj
– Mapping Φ: x → φ(x), where φ(x) is x itself

• Polynomial of power p: K(xi,xj) = (1 + xiTxj)^p
  – Mapping Φ: x → φ(x), where φ(x) has (d+p choose p) dimensions

• Gaussian (radial-basis function): K(xi,xj) = exp( −||xi − xj||² / (2σ²) )
  – Mapping Φ: x → φ(x), where φ(x) is infinite-dimensional: every point is mapped to a function (a Gaussian); a combination of such functions for the support vectors is the separator.

• The higher-dimensional space still has intrinsic dimensionality d (the mapping is not onto), but linear separators in it correspond to non-linear separators in the original space.
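A compact sketch of these kernels as plain functions (the parameter values p, σ and the test vectors are illustrative):

import numpy as np

def linear_kernel(a, b):
    return a @ b

def polynomial_kernel(a, b, p=3):
    return (1 + a @ b) ** p

def gaussian_kernel(a, b, sigma=1.0):
    return np.exp(-np.linalg.norm(a - b) ** 2 / (2 * sigma ** 2))

a, b = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(linear_kernel(a, b), polynomial_kernel(a, b), gaussian_kernel(a, b))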

Non-linear SVMs Mathematically
• Dual problem formulation:

Find α1…αn such that
  Q(α) = Σαi − ½ΣΣαiαjyiyjK(xi, xj) is maximized and
  (1) Σαiyi = 0
  (2) αi ≥ 0 for all αi

• The solution is:

  f(x) = ΣαiyiK(xi, x) + b

• Optimization techniques for finding αi’s remain the same!
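Putting the pieces together, a hedged, self-contained sketch of the kernelized decision function (the support vectors, multipliers αi, bias b, and RBF kernel below are made-up illustrative values):

import numpy as np

def decision_function(x, X_sv, y_sv, alphas, b, kernel):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b, summed over the support vectors."""
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alphas, y_sv, X_sv)) + b

rbf = lambda u, v: np.exp(-np.linalg.norm(u - v) ** 2 / 2.0)   # illustrative RBF kernel
X_sv = np.array([[0.0, 1.0], [2.0, 2.0]])                      # made-up support vectors
y_sv = np.array([1, -1])
alphas = np.array([0.7, 0.7])
b = 0.1

print(np.sign(decision_function(np.array([0.5, 1.0]), X_sv, y_sv, alphas, b, rbf)))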

SVM applications
• SVMs were originally proposed by Boser, Guyon and Vapnik in 1992 and gained increasing popularity in the late 1990s.
• SVMs are currently among the best performers for a number of classification tasks
ranging from text to genomic data.
• SVMs can be applied to complex data types beyond feature vectors (e.g. graphs,
sequences, relational data) by designing kernel functions for such data.
• SVM techniques have been extended to a number of tasks such as regression [Vapnik
et al. ’97], principal component analysis [Schölkopf et al. ’99], etc.
• The most popular optimization algorithms for SVMs use decomposition to hill-climb over a subset of the αi's at a time, e.g. SMO [Platt '99] and SVMlight [Joachims '99].
• Tuning SVMs remains a black art: selecting a specific kernel and parameters is
usually done in a try-and-see manner.
