This document provides an introduction to support vector machines (SVMs). It discusses how SVMs find the optimal hyperplane for binary classification that maximizes the margin between the two classes. The hyperplane is determined by support vectors, which are the data points closest to the decision boundary. The document describes how SVMs solve a quadratic optimization problem to learn the hyperplane parameters that maximize the margin. It also covers extensions of SVMs to non-linearly separable data using soft margins that allow some misclassification with a penalty.


INTRODUCTION TO MACHINE LEARNING

SUPPORT VECTOR MACHINE

The slides are from Raymond J. Mooney (ML Research Group @ Univ. of Texas)

Mingon Kang, Ph.D.


Department of Computer Science @ UNLV
Linear Separators
 Binary classification can be viewed as the task of
separating classes in feature space:
The separating hyperplane:   wTx + b = 0
Points on one side:          wTx + b > 0
Points on the other side:    wTx + b < 0

f(x) = sign(wTx + b)
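To make the decision rule concrete, here is a minimal Python/NumPy sketch (Python is not part of the original slides; the particular w and b are the values from the worked example later in the deck):

import numpy as np

def predict(w, b, x):
    # Linear classifier: sign of the score w^T x + b.
    return np.sign(w @ x + b)

w = np.array([1.0, 2.0])   # normal vector of the hyperplane
b = -5.5                   # bias
print(predict(w, b, np.array([2.0, 3.0])))   #  1.0 -> positive side
print(predict(w, b, np.array([1.0, 1.0])))   # -1.0 -> negative side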
Ch. 15

Linear classifiers: Which Hyperplane?


 Lots of possible choices for a, b, c in a decision line ax + by − c = 0.

 A Support Vector Machine (SVM) finds an optimal* solution.
 Maximizes the distance between the hyperplane and the “difficult points” close to the decision boundary
 One intuition: if there are no points near the decision surface, then there are no very uncertain classification decisions
Sec. 15.1

Support Vector Machine (SVM)


 SVMs maximize the margin around the separating hyperplane.
◼ A.k.a. large margin classifiers
 The decision function is fully specified by a subset of the training samples, the support vectors.
 Solving an SVM is a quadratic programming problem.

(Figure: the support vectors lie on the margin boundaries; the maximum-margin separator is contrasted with one that gives a narrower margin.)
Sec. 15.1

Maximum Margin: Formalization



 w: decision hyperplane normal vector


 xi: data point i
 yi: class of data point i (+1 or -1)
 Classifier is: f(xi) = sign(wTxi + b)
 Functional margin of xi is: yi (wTxi + b)
 The functional margin of a dataset is twice the minimum
functional margin for any point
 The factor of 2 comes from measuring the whole width of the
margin
 Problem: we can increase this margin simply by scaling w, b….
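A quick numeric illustration of this scaling problem, reusing the hyperplane from the later worked example (Python/NumPy, not from the slides): multiplying w and b by a constant multiplies the functional margin by that constant without moving the hyperplane.

import numpy as np

x, y = np.array([2.0, 3.0]), 1          # one labeled point
w, b = np.array([1.0, 2.0]), -5.5       # a separating hyperplane

print(y * (w @ x + b))                  # functional margin: 2.5
print(y * (10 * w @ x + 10 * b))        # scaled by 10: 25.0, same hyperplane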
Sec. 15.1

Geometric Margin
 Distance from an example to the separator is  r = y (wTx + b) / ‖w‖
 Examples closest to the hyperplane are support vectors.
 Margin ρ of the separator is the width of separation between the support vectors of the two classes.

Derivation of r (see figure: x is an example, x′ its projection onto the hyperplane):
The segment x′ − x is perpendicular to the decision boundary, so it is parallel to w.
The unit vector in that direction is w/‖w‖, so the segment is r·w/‖w‖, and x′ = x − y·r·w/‖w‖.
x′ satisfies wTx′ + b = 0, so wT(x − y·r·w/‖w‖) + b = 0.
Recall that ‖w‖ = sqrt(wTw), so this is wTx − y·r·‖w‖ + b = 0.
Solving for r (and using y² = 1, since y = ±1) gives r = y(wTx + b)/‖w‖.
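The formula can be checked numerically; a short Python/NumPy sketch (same example hyperplane as before, not from the slides) that also verifies the projected point x′ lands on the hyperplane:

import numpy as np

w, b = np.array([1.0, 2.0]), -5.5
x, y = np.array([1.0, 1.0]), -1

r = y * (w @ x + b) / np.linalg.norm(w)      # geometric distance to the hyperplane
print(r)                                     # ~1.118 (= sqrt(5)/2)

# Cross-check: x' = x - y*r*w/|w| must lie exactly on the hyperplane.
x_prime = x - y * r * w / np.linalg.norm(w)
print(w @ x_prime + b)                       # ~0.0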
Sec. 15.1

Linear SVM Mathematically


The linearly separable case

 Assume that the functional margin of each data item is at least 1; then the following two constraints follow for a training set {(xi ,yi)}:

wTxi + b ≥ 1 if yi = 1
wTxi + b ≤ −1 if yi = −1
 For support vectors, the inequality becomes an equality
 Then, since each example’s distance from the hyperplane is  r = y (wTx + b) / ‖w‖,
and this distance equals 1/‖w‖ at the support vectors, the margin (full width of separation) is:

ρ = 2 / ‖w‖
Sec. 15.1

Linear Support Vector Machine (SVM)

 Hyperplane:  wTx + b = 0

 Taking xa and xb to be the closest points on the two margin boundaries, they satisfy:
wTxa + b = 1
wTxb + b = −1

 Extra scale constraint:
min over i = 1,…,n of |wTxi + b| = 1

 Subtracting the two boundary equations gives wT(xa − xb) = 2; since xa − xb is parallel to w and its length is the margin width, this implies:
ρ = ‖xa − xb‖₂ = 2/‖w‖₂
Worked example: Geometric margin

 The maximum margin weight vector is parallel to the line from (1, 1) to (2, 3), so the weight vector is (a multiple of) (1, 2).
 The decision boundary is normal (“perpendicular”) to it, halfway between the two points.
 It passes through (1.5, 2).
 So the decision boundary is x1 + 2x2 − 5.5 = 0.
 The geometric margin (the full width between the two points) is √5.
Worked example: Functional margin
 Let’s minimize ‖w‖ subject to yi(wTxi + b) ≥ 1.
 The constraint holds with equality at the support vectors; w = (a, 2a) for some a.
For (1, 1) with yi = −1:  a + 2a + b = −1
For (2, 3) with yi = +1:  2a + 6a + b = 1
 So a = 2/5 and b = −11/5.
 The optimal hyperplane is  w = (2/5, 4/5) and b = −11/5.
 Margin ρ is 2/|w| = 2/√(4/25 + 16/25) = 2/(2√5/5) = √5.
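A short numeric check of this worked example (Python/NumPy; the points, labels, and claimed w, b are taken from the slides above):

import numpy as np

X = np.array([[1.0, 1.0], [2.0, 3.0]])   # the two support vectors
y = np.array([-1, 1])

w, b = np.array([2/5, 4/5]), -11/5       # claimed optimal hyperplane

print(y * (X @ w + b))                   # [1. 1.]  functional margin = 1 at both SVs
print(2 / np.linalg.norm(w))             # 2.236... = sqrt(5), the margin rho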
Sec. 15.1

Linear SVMs Mathematically (cont.)


 Then we can formulate the quadratic optimization problem:

Find w and b such that


ρ = 2/‖w‖ is maximized; and for all {(xi , yi)}:
wTxi + b ≥ 1 if yi = 1;  wTxi + b ≤ −1 if yi = −1

 A better formulation (min ‖w‖ = max 1/‖w‖):

Find w and b such that


Φ(w) =½ wTw is minimized;

and for all {(xi ,yi)}: yi (wTxi + b) ≥ 1
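This minimization can be handed to a general-purpose convex solver directly. A minimal sketch assuming the cvxpy package (a generic Python modeling library, not something used in the slides), on a tiny hand-made dataset; real SVM libraries use specialized solvers instead:

import cvxpy as cp
import numpy as np

# Toy linearly separable data (illustrative).
X = np.array([[1.0, 1.0], [0.0, 1.0], [2.0, 3.0], [3.0, 3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

w = cp.Variable(2)
b = cp.Variable()
objective = cp.Minimize(0.5 * cp.sum_squares(w))        # (1/2) w^T w
constraints = [cp.multiply(y, X @ w + b) >= 1]          # y_i (w^T x_i + b) >= 1
cp.Problem(objective, constraints).solve()

print(w.value, b.value)   # for this data: close to w = (2/5, 4/5), b = -11/5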


Sec. 15.1

Solving the Optimization Problem


Find w and b such that
Φ(w) =½ wTw is minimized;
and for all {(xi ,yi)}: yi (wTxi + b) ≥ 1

 This is now optimizing a quadratic function subject to linear constraints


 Quadratic optimization problems are a well-known class of mathematical
programming problems, and many (intricate) algorithms exist for solving them
(with many special-purpose ones built for SVMs)
 The solution involves constructing a dual problem in which a Lagrange multiplier αi
is associated with every constraint in the primal problem:
Find α1…αN such that
Q(α) =Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) αi ≥ 0 for all αi
Sec. 15.1

The Optimization Problem Solution


 The solution has the form:

w = Σαiyixi        b = yk − wTxk   for any xk such that αk ≠ 0

 Each non-zero αi indicates that the corresponding xi is a support vector.
 Then the classifying function will have the form:

f(x) = ΣαiyixiTx + b

 Notice that it relies on an inner product between the test point x and the
support vectors xi
 We will return to this later.
 Also keep in mind that solving the optimization problem involved computing the
inner products xiTxj between all pairs of training points.

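Given dual multipliers αi from a QP solver, the solution above is a few lines of NumPy. A sketch using the two points of the earlier worked example, for which the dual optimum is α1 = α2 = 2/5 (stated here, not derived):

import numpy as np

X = np.array([[1.0, 1.0], [2.0, 3.0]])
y = np.array([-1.0, 1.0])
alpha = np.array([0.4, 0.4])     # dual solution for the worked example; both points are SVs

w = (alpha * y) @ X              # w = sum_i alpha_i y_i x_i
k = np.argmax(alpha)             # any index with alpha_k != 0
b = y[k] - w @ X[k]              # b = y_k - w^T x_k

def f(x_new):
    # Classify via inner products with the support vectors (equivalent to sign(w^T x + b)).
    return np.sign((alpha * y) @ (X @ x_new) + b)

print(w, b)                      # [0.4 0.8] -2.2  (= (2/5, 4/5), -11/5)
print(f(np.array([3.0, 3.0])))   # 1.0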
Sec. 15.2.1

Soft Margin Classification

 If the training data is not linearly separable, slack variables ξi can be added to
allow misclassification of difficult or noisy examples.
 Allow some errors
 Let some points be moved to where they belong, at a cost
 Still, try to minimize training set errors, and to place the hyperplane “far” from each class (large margin)

(Figure: two misclassified points, with slacks ξi and ξj measuring how far they lie beyond the margin.)
Sec. 15.2.1
Soft Margin Classification Mathematically
 The old formulation:

Find w and b such that


Φ(w) =½ wTw is minimized and for all {(xi ,yi)}
yi (wTxi + b) ≥ 1
 The new formulation incorporating slack variables:

Find w and b such that


Φ(w) =½ wTw + CΣξi is minimized and for all {(xi ,yi)}
yi (wTxi + b) ≥ 1- ξi and ξi ≥ 0 for all i
 Parameter C can be viewed as a way to control overfitting
 A regularization term

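In practice this formulation is usually solved through a library. A minimal sketch assuming scikit-learn is available (SVC and its C parameter are scikit-learn names, not from the slides), showing how C trades margin width against slack:

import numpy as np
from sklearn.svm import SVC

# Noisy, not perfectly separable toy data (illustrative).
X = np.array([[1, 1], [0, 1], [2, 3], [3, 3], [1.5, 2.2]])
y = np.array([-1, -1, 1, 1, -1])          # the last point sits on the "wrong" side

for C in (0.1, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Small C -> wider margin, more slack tolerated; large C -> fewer margin violations.
    print(C, clf.coef_, clf.intercept_, clf.n_support_)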
Sec. 15.2.1

Soft Margin Classification – Solution


 The dual problem for soft margin classification:

Find α1…αN such that


Q(α) =Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) 0 ≤ αi ≤ C for all αi

 Neither slack variables ξi nor their Lagrange multipliers appear in the dual
problem!
 Again, xi with non-zero αi will be support vectors.
 Solution to the dual problem is:
w = Σαiyixi          (w is not needed explicitly for classification!)
b = yk(1 − ξk) − wTxk   where k = argmaxk′ αk′
f(x) = ΣαiyixiTx + b
Sec. 15.1

Classification with SVMs


 Given a new point x, we can score its projection
onto the hyperplane normal:
 I.e., compute score: wTx + b = ΣαiyixiTx + b
◼ Decide class based on whether the score is < or > 0

 Can set a confidence threshold t:
Score > t: yes
Score < −t: no
Else: don’t know
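A small Python sketch of this three-way decision rule (the helper name decide and the threshold value are illustrative, not from the slides):

import numpy as np

def decide(w, b, x, t=1.0):
    # Return +1 / -1 only when the score clears the confidence threshold t.
    score = w @ x + b
    if score > t:
        return 1
    if score < -t:
        return -1
    return None          # "don't know" region

w, b = np.array([0.4, 0.8]), -2.2
print(decide(w, b, np.array([3.0, 3.0])))   # 1     (score 1.4 > t)
print(decide(w, b, np.array([1.5, 2.0])))   # None  (score 0.0, inside the band)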
Sec. 15.2.1

Linear SVMs: Summary


 The classifier is a separating hyperplane.

 The most “important” training points are the support vectors; they define the
hyperplane.

 Quadratic optimization algorithms can identify which training points xi are


support vectors with non-zero Lagrange multipliers αi.

 Both in the dual formulation of the problem and in the solution, training
points appear only inside inner products:

Find α1…αN such that f(x) = ΣαiyixiTx + b


Q(α) =Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) 0 ≤ αi ≤ C for all αi
Sec. 15.2.3

Non-linear SVMs
 Datasets that are linearly separable (with some noise) work out great:

(figure: 1-D points along the x axis, separable by a single threshold)

 But what are we going to do if the dataset is just too hard?

(figure: 1-D points along the x axis where no single threshold separates the classes)

 How about … mapping data to a higher-dimensional space:


(figure: the same data mapped to (x, x²) becomes linearly separable)
Sec. 15.2.3

Non-linear SVMs: Feature spaces


 General idea: the original feature space can
always be mapped to some higher-dimensional
feature space where the training set is separable:

Φ: x → φ(x)

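A tiny numeric illustration of the idea, assuming the mapping φ(x) = (x, x²) suggested by the previous slide’s picture (Python/NumPy, data values illustrative):

import numpy as np

x = np.array([-3.0, -2.0, 2.0, 3.0, -0.5, 0.0, 0.5])
y = np.array([1, 1, 1, 1, -1, -1, -1])      # positives far from 0, negatives near 0

phi = np.column_stack([x, x ** 2])          # map 1-D data to 2-D: (x, x^2)

# In the mapped space, the horizontal line x^2 = 2 separates the classes.
print(np.all((phi[:, 1] > 2) == (y == 1)))  # True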
Sec. 15.2.3

The “Kernel Trick”

 The linear classifier relies on an inner product between vectors K(xi,xj)=xiTxj


 If every datapoint is mapped into high-dimensional space via some
transformation Φ: x → φ(x), the inner product becomes:
K(xi,xj) = φ(xi)Tφ(xj)
 A kernel function is some function that corresponds to an inner product in some
expanded feature space.
 Example:
2-dimensional vectors x = [x1 x2]; let K(xi,xj) = (1 + xiTxj)².
Need to show that K(xi,xj) = φ(xi)Tφ(xj):
K(xi,xj) = (1 + xiTxj)² = 1 + xi1²xj1² + 2 xi1xj1xi2xj2 + xi2²xj2² + 2xi1xj1 + 2xi2xj2
         = [1  xi1²  √2 xi1xi2  xi2²  √2 xi1  √2 xi2]T [1  xj1²  √2 xj1xj2  xj2²  √2 xj1  √2 xj2]
         = φ(xi)Tφ(xj),   where φ(x) = [1  x1²  √2 x1x2  x2²  √2 x1  √2 x2]
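This identity is easy to check numerically (a Python/NumPy sketch; the two vectors are arbitrary):

import numpy as np

def K(a, b):
    # Polynomial kernel of degree 2.
    return (1 + a @ b) ** 2

def phi(v):
    # Explicit feature map from the expansion above.
    x1, x2 = v
    return np.array([1, x1**2, np.sqrt(2)*x1*x2, x2**2, np.sqrt(2)*x1, np.sqrt(2)*x2])

xi, xj = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(K(xi, xj), phi(xi) @ phi(xj))   # both 4.0: same inner product, without building phi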
Sec. 15.2.3

Kernels
 Why use kernels?
 Make non-separable problem separable.
 Map data into better representational space

 Common kernels
 Linear

 Polynomial: K(x,z) = (1 + xTz)^d


◼ Gives feature conjunctions
 Radial basis function (infinite dimensional space)

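A sketch comparing these kernels, assuming scikit-learn's SVC (its kernel, degree, and gamma parameters are library names, not from the slides), on a toy radially separable dataset that a linear kernel cannot handle but polynomial and RBF kernels can:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(np.linalg.norm(X, axis=1) < 1.0, 1, -1)   # inner circle vs. outside

for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel, degree=2, gamma="scale").fit(X, y)
    # Training accuracy: linear stays near the base rate; poly/rbf should be close to 1.0.
    print(kernel, round(clf.score(X, y), 2))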
