Lecture Notes: Geometry of Support Vector Machines and Kernel Trick
Statistics & Discrete Methods of Data Sciences CS395T(51800), CSE392(63625), M393C(54377)
MW 11:00a.m.-12:30p.m. GDC 5.304, [email protected]
A separating hyperplane $w \cdot x = b$ classifies every training point $(x_i, y_i)$, $y_i \in \{-1, +1\}$, correctly if
\[
y_i (w \cdot x_i - b) > 0, \quad i = 1, \dots, m.
\]
If such a hyperplane exists, it is not unique. In real-world classification problems it is quite likely that one would require non-linear separators with a reasonable complexity vs. accuracy tradeoff.
Since the training data are merely samples of the instance space, and not necessarily adequate "representative" samples, doing well on the training data does not guarantee (or even imply) that one will do well on the entire instance space. A related issue is that the training data distribution is unknown; in contrast to classical statistical inference, we do not estimate this unknown distribution. Nevertheless, optimal learning algorithms can be developed without first estimating the distribution.
Note that
\[
\operatorname*{argmax}_{w} \frac{2}{\sqrt{w \cdot w}}
= \operatorname*{argmax}_{w} \frac{2}{\|w\|_2}
= \operatorname*{argmax}_{w} \frac{2}{\|w\|_2^2}
= \operatorname*{argmin}_{w} \frac{1}{2}\,(w \cdot w)
\]
Then we could write down the following optimization problem that SVM seeks to solve:
\[
\begin{aligned}
\min_{w,b} \quad & \frac{1}{2}\, w \cdot w \\
\text{s.t.} \quad & y_i (w \cdot x_i - b) - 1 \ge 0, \quad i = 1, \dots, m
\end{aligned}
\tag{1}
\]
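As a concrete illustration, here is a minimal Python sketch of problem (1); it assumes the cvxpy and numpy packages are available and uses a small hypothetical, linearly separable toy data set.

    # Sketch: hard-margin SVM primal (1), assuming cvxpy and numpy are available.
    import numpy as np
    import cvxpy as cp

    # Hypothetical separable toy data: rows of X are instances x_i, labels y_i in {-1, +1}.
    X = np.array([[2.0, 2.0], [2.5, 1.0], [-1.0, -1.5], [-2.0, -0.5]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    m, n = X.shape

    w = cp.Variable(n)
    b = cp.Variable()
    objective = cp.Minimize(0.5 * cp.sum_squares(w))      # (1/2) w . w
    constraints = [cp.multiply(y, X @ w - b) >= 1]        # y_i (w . x_i - b) - 1 >= 0
    cp.Problem(objective, constraints).solve()

    print("w =", w.value, "b =", b.value)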
Noisy Data Case. One relaxes the SVM problem to a "soft" margin, so that separability is required to hold only up to some error:
\[
\begin{aligned}
\min_{w,b,\xi} \quad & \frac{1}{2}\, w \cdot w + \nu \sum_{i=1}^{m} \xi_i \\
\text{s.t.} \quad & y_i (w \cdot x_i - b) \ge 1 - \xi_i, \quad i = 1, \dots, m \\
& \xi_i \ge 0, \quad i = 1, \dots, m
\end{aligned}
\tag{2}
\]
With $\xi_i > 0$, a point can lie inside the margin. Note that $\sum_{i=1}^{m} \xi_i$ is $\|\xi\|_1$, i.e. the $L_1$ norm; this promotes sparsity and thus sparse errors. The ideal penalty would be based on the number of errors, i.e. the $L_0$ norm $\|\xi\|_0 = |\{i : \xi_i > 0\}|$, which directly minimizes the number of errors. However, the $L_0$ norm is non-convex; the $L_1$ norm is its convex relaxation.
Keeping the $L_1$ penalty, the soft-margin SVM problem is therefore
\[
\begin{aligned}
\min_{w,b,\xi} \quad & \frac{1}{2}\, w \cdot w + \nu \sum_{i=1}^{m} \xi_i \\
\text{s.t.} \quad & y_i (w \cdot x_i - b) \ge 1 - \xi_i, \quad i = 1, \dots, m \\
& \xi_i \ge 0, \quad i = 1, \dots, m
\end{aligned}
\tag{3}
\]
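A similar sketch for the soft-margin problem (3) adds the slack variables $\xi$ and the penalty weight $\nu$; cvxpy, the toy data, and the choice $\nu = 1$ are again assumptions made only for illustration.

    # Sketch: soft-margin SVM primal (3) with slack variables, assuming cvxpy/numpy.
    import numpy as np
    import cvxpy as cp

    # Hypothetical data that are NOT linearly separable (last point sits among the positives).
    X = np.array([[2.0, 2.0], [2.5, 1.0], [-1.0, -1.5], [-2.0, -0.5], [1.5, 1.5]])
    y = np.array([1.0, 1.0, -1.0, -1.0, -1.0])
    m, n = X.shape
    nu = 1.0                                      # penalty weight (arbitrary choice)

    w, b, xi = cp.Variable(n), cp.Variable(), cp.Variable(m)
    objective = cp.Minimize(0.5 * cp.sum_squares(w) + nu * cp.sum(xi))
    constraints = [cp.multiply(y, X @ w - b) >= 1 - xi, xi >= 0]
    cp.Problem(objective, constraints).solve()

    print("slacks xi =", np.round(xi.value, 3))   # points with xi_i > 0 violate the margin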
Introducing Lagrange multipliers $\mu_i \ge 0$ for the margin constraints and $\delta_i \ge 0$ for the constraints $\xi_i \ge 0$, the Lagrangian is
\[
L(w, b, \xi, \mu, \delta) = \frac{1}{2}\, w \cdot w + \nu \sum_{i=1}^{m} \xi_i
- \sum_{i=1}^{m} \mu_i \big[ y_i (w \cdot x_i - b) - 1 + \xi_i \big]
- \sum_{i=1}^{m} \delta_i \xi_i .
\]
Setting its partial derivatives to zero gives
\[
\frac{\partial L}{\partial w} = w - \sum_{i=1}^{m} \mu_i y_i x_i = 0 \;\Longrightarrow\; w = \sum_{i=1}^{m} \mu_i y_i x_i
\]
\[
\frac{\partial L}{\partial b} = \sum_{i=1}^{m} \mu_i y_i = 0
\]
\[
\frac{\partial L}{\partial \xi_i} = \nu - \mu_i - \delta_i = 0 \;\Longrightarrow\; 0 \le \mu_i \le \nu, \quad i = 1, 2, \dots, m
\]
When $\mu_i = 0$, i.e. $y_i (w \cdot x_i - b) > 1 - \xi_i$, instance $x_i$ is classified correctly and is not a boundary point. When $\mu_i > 0$, i.e. $y_i (w \cdot x_i - b) = 1 - \xi_i$, then $x_i$ is a boundary point with margin error $\xi_i \ge 0$ kept as small as possible. These boundary points are the support vectors, and $w$ is determined by them.
Substituting $\frac{\partial L}{\partial w} = 0$ and $\frac{\partial L}{\partial \xi_i} = 0$ back into the Lagrangian yields the following dual formulation:
\[
\begin{aligned}
\max_{\mu} \quad & \sum_{i=1}^{m} \mu_i - \frac{1}{2} \sum_{i,j} (y_i y_j \, x_i \cdot x_j)\, \mu_i \mu_j \\
\text{s.t.} \quad & 0 \le \mu_i \le \nu, \quad i = 1, \dots, m \\
& \sum_{i=1}^{m} y_i \mu_i = 0
\end{aligned}
\tag{4}
\]
If we denote by $\mathbf{1}$ the vector with all entries equal to 1, then the maximization problem can be written as
\[
\max_{\mu} \; \mu^{T} \mathbf{1} - \frac{1}{2}\, \mu^{T} M \mu
\]
with Gram matrix $M_{ij} = y_i y_j \, x_i \cdot x_j$, which is a Positive Semi-Definite (PSD) matrix.
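The dual (4) can likewise be solved as a small quadratic program. A minimal sketch, again assuming cvxpy, numpy, and hypothetical toy data: it builds $M$, solves for $\mu$, recovers $w = \sum_i \mu_i y_i x_i$, and marks the support vectors as the points with $\mu_i > 0$.

    # Sketch: dual soft-margin SVM (4) in the matrix form above, assuming cvxpy/numpy.
    import numpy as np
    import cvxpy as cp

    X = np.array([[2.0, 2.0], [2.5, 1.0], [-1.0, -1.5], [-2.0, -0.5], [1.5, 1.5]])
    y = np.array([1.0, 1.0, -1.0, -1.0, -1.0])   # hypothetical toy data
    m = X.shape[0]
    nu = 1.0

    A = y[:, None] * X                           # rows are y_i x_i, so M = A A^T
    mu = cp.Variable(m)
    # mu^T M mu = || A^T mu ||^2, which keeps the objective in an explicitly concave form.
    objective = cp.Maximize(cp.sum(mu) - 0.5 * cp.sum_squares(A.T @ mu))
    constraints = [mu >= 0, mu <= nu, y @ mu == 0]
    cp.Problem(objective, constraints).solve()

    w = (mu.value * y) @ X                       # w = sum_i mu_i y_i x_i
    support = np.where(mu.value > 1e-6)[0]       # support vectors: mu_i > 0
    print("w =", np.round(w, 3), "support vectors:", support)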
The reason to introduce the dual problem is the following: the dual form of the SVM is simpler than the primal SVM, and its key feature is that the objective function is now expressed entirely through inner products of data instance pairs $\langle x_i, x_j \rangle$.
Consider a feature map
\[
\phi(x) : \mathbb{R}^n \longrightarrow \mathbb{R}^p, \quad p > n.
\]
The functional space formulation of a kernel space is a Hilbert space $\mathcal{H} = \{K : \mathbb{R}^n \times \mathbb{R}^n \longrightarrow \mathbb{R} \text{ defines an inner product}\}$. Here $K$ is given by $K(x, y) = \langle \phi(x), \phi(y) \rangle$.
Algorithms whose computations on input vectors can be expressed purely through inner products between those vectors are amenable to the kernel trick, in which $x_i \cdot x_j$ is replaced by $\phi^{T}(x_i)\, \phi(x_j) = K(x_i, x_j)$.
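As an illustrative sketch (with an assumed polynomial kernel and hypothetical data), the only change to the dual above is that the Gram matrix and the decision function are built from $K(x_i, x_j)$ instead of $x_i \cdot x_j$:

    # Sketch: kernelized dual ingredients, assuming numpy; K replaces the raw inner product.
    import numpy as np

    def K(u, v, d=2):
        """Homogeneous polynomial kernel (u . v)^d, standing in for phi(u)^T phi(v)."""
        return (u @ v) ** d

    X = np.array([[2.0, 2.0], [2.5, 1.0], [-1.0, -1.5], [-2.0, -0.5]])
    y = np.array([1.0, 1.0, -1.0, -1.0])          # hypothetical data
    m = len(y)

    # Kernel Gram matrix for the dual: M_ij = y_i y_j K(x_i, x_j).
    M = np.array([[y[i] * y[j] * K(X[i], X[j]) for j in range(m)] for i in range(m)])

    # Given dual variables mu (e.g. from the QP above), the classifier never needs w explicitly:
    #   f(x) = sum_i mu_i y_i K(x_i, x) - b
    def decision(x, mu, b):
        return sum(mu[i] * y[i] * K(X[i], x) for i in range(m)) - b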
We know the Gram matrix $M$ in the dual formulation is Positive Semi-Definite (PSD) since $M = Q^{T} Q$, where
\[
Q = \begin{pmatrix} y_1 x_1 & y_2 x_2 & \cdots & y_m x_m \end{pmatrix}.
\]
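A quick numerical confirmation of this fact, assuming numpy and the same kind of hypothetical data:

    # Numerical confirmation that M = Q^T Q is PSD: its eigenvalues are all >= 0.
    import numpy as np

    X = np.array([[2.0, 2.0], [2.5, 1.0], [-1.0, -1.5], [-2.0, -0.5]])
    y = np.array([1.0, 1.0, -1.0, -1.0])          # hypothetical data
    Q = (y[:, None] * X).T                        # columns of Q are y_i x_i
    M = Q.T @ Q                                   # Gram matrix M_ij = y_i y_j x_i . x_j
    print(np.linalg.eigvalsh(M) >= -1e-10)        # all True, up to round-off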
(III) For every finite set $\{x_i\}_{i=1}^{q}$ and all $q$, the matrix $K$ with $K_{ij} = K(x_i, x_j)$ is PSD.
To better understand the relationship between feature maps and kernels, consider:
• Homogeneous Polynomial Kernel: $x, y \in \mathbb{R}^s \Longrightarrow k(x, y) = (x^T y)^d = \big( \sum_{i=1}^{s} x_i y_i \big)^d$, $d > 0$.
The feature map can be defined as a vector in a $\binom{s+d-1}{d}$-dimensional vector space,
\[
\phi(x) \equiv \left( \sqrt{\binom{d}{n_1 \cdots n_s}}\; x_1^{n_1} \cdots x_s^{n_s} \right), \qquad \sum_{i=1}^{s} n_i = d, \quad n_i \ge 0.
\]
One constructs the Gram matrix of kernel values as follows: $M_{ij} = K(x_i, x_j) = (x_i^T x_j)^d$.
When $s = d = 2$, $(x^T y)^2 = x_1^2 y_1^2 + 2 x_1 x_2 y_1 y_2 + x_2^2 y_2^2$; we pick $\phi(x) = (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2)$ and thereby create the Gram matrix with $M_{ij} = (x_i^T x_j)^2$.
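A quick numerical check of this identity, assuming numpy; the two vectors are arbitrary.

    # Check that (x^T y)^2 equals phi(x) . phi(y) for the s = d = 2 feature map above.
    import numpy as np

    def phi(x):
        # phi(x) = (x1^2, x2^2, sqrt(2) x1 x2)
        return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

    x = np.array([1.0, 3.0])
    y = np.array([2.0, 1.0])
    print((x @ y) ** 2, phi(x) @ phi(y))   # both print 25.0: (2 + 3)^2 = 25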
• Non-Homogeneous Polynomial Kernel: all monomials of degree $\le d$,
\[
K(x, y) = (x^T y + \alpha)^d = \left( x_1 y_1 + x_2 y_2 + \cdots + x_k y_k + \sqrt{\alpha}\,\sqrt{\alpha} \right)^d .
\]
Again consider the case $s = d = 2$, so that $K(x, y) = (x_1 y_1 + x_2 y_2 + \alpha)^2$: the map $\phi : \mathbb{R}^2 \longrightarrow \mathbb{R}^6$ sends a conic curve in the measurement plane to a hyperplane in the six-dimensional feature space.
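One explicit choice of such a map, written here as an assumption consistent with the expansion of $(x^T y + \alpha)^2$ rather than a form taken from the notes, together with a numerical check assuming numpy:

    # One possible phi: R^2 -> R^6 with phi(x) . phi(y) = (x^T y + alpha)^2 (assumed form).
    import numpy as np

    alpha = 1.0   # arbitrary choice of the constant offset

    def phi6(x):
        x1, x2 = x
        return np.array([x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2,
                         np.sqrt(2 * alpha) * x1, np.sqrt(2 * alpha) * x2, alpha])

    x = np.array([1.0, 3.0])
    y = np.array([2.0, 1.0])
    print((x @ y + alpha) ** 2, phi6(x) @ phi6(y))   # both evaluate to 36.0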