Support Vector Machine (3)
Section 1: Introduction
$(\alpha): \; w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b = 0$   (1)
Where:
$x = [x_1, x_2, \ldots, x_n]^\top$ represents the coordinates of a point on the hyperplane.
$w = [w_1, w_2, \ldots, w_n]^\top$ is a normal vector of $(\alpha)$.
$b$ is a constant.
where
$$\|w\|_2 = \sqrt{w_1^2 + w_2^2 + \cdots + w_n^2} = \sqrt{w^\top w}$$
is the $\ell_2$-norm of $w$.
Definition:
The Lagrange dual function of (P1) is derived from its Lagrangian function. For any pair of inputs $(\lambda, \nu)$, it is the infimum of the Lagrangian over all $x$ in the domain $D$:
$$g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu) = \inf_{x \in D} \left( f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{j=1}^{p} \nu_j h_j(x) \right)$$
Key Properties:
The dual function g (λ, ν) is always concave, even if f0 (x) is not
convex.
The dual function g (λ, ν) provides a lower bound for the optimal
value f0 (x∗ ).
Since, for any feasible $\tilde{x}$ and any $\lambda \succeq 0$,
$$L(\tilde{x}, \lambda, \nu) = f_0(\tilde{x}) + \sum_{i=1}^{m} \lambda_i f_i(\tilde{x}) + \sum_{j=1}^{p} \nu_j h_j(\tilde{x}) \le f_0(\tilde{x}),$$
it follows that:
$$g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu) \le L(\tilde{x}, \lambda, \nu) \le f_0(\tilde{x}), \quad \text{in particular } g(\lambda, \nu) \le f_0(x^*).$$
Key Concept:
Each pair (λ, ν) provides a lower bound g (λ, ν) for the optimal value
f0 (x∗ ).
The pair (λ∗ , ν ∗ ) that gives the highest lower bound, g (λ∗ , ν ∗ ), is
called the optimal Lagrange multipliers.
Dual Problem:
The Lagrange dual problem of (P1) is to find the best such lower bound:
$$\text{(P2)}: \quad \max_{\lambda, \nu} \; g(\lambda, \nu) \quad \text{subject to } \lambda \succeq 0.$$
Additional Notes:
(P2) is always a convex optimization problem, regardless of the convexity of (P1).
The difference $f_0(x^*) - g(\lambda^*, \nu^*)$ is called the optimal duality gap (see the worked example below).
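For concreteness, a small worked example (a toy problem added here for illustration; it is not from the original slides): take $f_0(x) = x^2$ with a single constraint $f_1(x) = 1 - x \le 0$. Then
$$L(x, \lambda) = x^2 + \lambda(1 - x), \qquad g(\lambda) = \inf_{x} L(x, \lambda) = \lambda - \frac{\lambda^2}{4} \quad (\text{attained at } x = \lambda/2).$$
Maximizing $g$ over $\lambda \ge 0$ gives $\lambda^* = 2$ and $g(\lambda^*) = 1$, which equals the primal optimum $f_0(x^*) = 1$ at $x^* = 1$, so the optimal duality gap is zero.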
Strong Duality and Optimal Duality Gap
Strong Duality:
If the optimal duality gap is zero, we say that strong duality occurs.
Significance:
Solving the dual problem (P2) allows us to find the exact optimal
value of the primal problem (P1).
Constraint Qualifications:
For a convex optimization problem (P1), certain conditions called
constraint qualifications ensure strong duality.
A fundamental example of such a qualification is Slater’s condition.
Strictly Feasible Point (Definition):
A point x is strictly feasible if it satisfies:
$$f_i(x) < 0, \;\forall i = 1, \ldots, m, \qquad h_j(x) = 0, \;\forall j = 1, \ldots, p.$$
This means the inequality constraints are strictly satisfied, and the equality constraints hold.
Slater’s Theorem:
If a strictly feasible point exists and (P1) is convex, then strong
duality holds.
KKT Conditions for Convex Problems
1 Primal feasibility:
$$f_i(x^*) \le 0, \;\forall i = 1, \ldots, m, \qquad h_j(x^*) = 0, \;\forall j = 1, \ldots, p.$$
2 Dual feasibility:
λ∗i ≥ 0, ∀i = 1, . . . , m.
3 Complementary slackness:
λ∗i fi (x∗ ) = 0, ∀i = 1, . . . , m.
4 Stationarity:
$$\nabla_x f_0(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla_x f_i(x^*) + \sum_{j=1}^{p} \nu_j^* \nabla_x h_j(x^*) = 0.$$
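Checking these conditions on the toy example above ($f_0(x) = x^2$, $f_1(x) = 1 - x$, $x^* = 1$, $\lambda^* = 2$):
$$f_1(x^*) = 0 \le 0, \qquad \lambda^* = 2 \ge 0, \qquad \lambda^* f_1(x^*) = 2 \cdot 0 = 0, \qquad \nabla f_0(x^*) + \lambda^* \nabla f_1(x^*) = 2 - 2 = 0.$$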
Given:
A dataset $D = \{(x^{(i)}, y^{(i)})\}_{i=1}^{m}$, where:
$x^{(i)} \in \mathbb{R}^n$ (feature vectors),
$y^{(i)} \in \{-1, 1\}$ (class labels), for $i = 1, \ldots, m$.
Assume the two classes of data points (y (i) = 1 and y (i) = −1) are
linearly separable.
Question:
How can we find the best hyperplane to separate these two classes?
Definition:
Support Vector Machine (SVM) is a supervised learning algorithm
designed for classification problems.
It identifies the optimal hyperplane that separates two classes of
data points.
The hyperplane is selected to maximize the margin, which is the
distance to the nearest data points (support vectors) from both
classes.
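As a quick illustration of this definition, the sketch below fits a linear SVM with scikit-learn on a tiny hand-made dataset and inspects the resulting hyperplane and support vectors. The library choice and the data are my own assumptions; the slides do not prescribe an implementation.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny, linearly separable toy dataset (hypothetical, for illustration only).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# A linear SVM with a very large C approximates the hard-margin formulation.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print("w =", clf.coef_[0])           # normal vector of the separating hyperplane
print("b =", clf.intercept_[0])      # bias term
print("support vectors:\n", clf.support_vectors_)
```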
Definition:
In the n-dimensional space, the separating hyperplane (α) has the
form:
w1 x1 + w2 x2 + · · · + wn xn + b = w⊤ x + b = 0,
where:
x = [x1 , x2 , . . . , xn ]⊤ : coordinates of a point on the hyperplane.
w = [w1 , w2 , . . . , wn ]⊤ : normal vector to the hyperplane α.
b: a constant (bias term).
Conditions (C1):
For all pairs (x(i) , y (i) ) ∈ D, the following must hold:
$$w^\top x^{(i)} + b \ge 0 \;\text{ if } y^{(i)} = 1, \qquad w^\top x^{(i)} + b \le 0 \;\text{ if } y^{(i)} = -1;$$
equivalently, $y^{(i)}(w^\top x^{(i)} + b) \ge 0$ for all $i = 1, \ldots, m$.
Definition:
The distance d from each data point (x(i) , y (i) ) to the hyperplane α
is given by:
$$d = \frac{\left| w_1 x_1^{(i)} + w_2 x_2^{(i)} + \cdots + w_n x_n^{(i)} + b \right|}{\sqrt{w_1^2 + w_2^2 + \cdots + w_n^2}} = \frac{\left| w^\top x^{(i)} + b \right|}{\|w\|_2}.$$
Scaling Observation:
The coefficients (w, b) are not unique. Scaling them by any positive
constant k ∈ R+ still satisfies (C1).
The distance d from any point in D to the hyperplane remains
unchanged under this scaling.
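A small numeric check of the distance formula and the scaling observation (the toy numbers and the use of NumPy are my own choices, for illustration only):

```python
import numpy as np

w = np.array([2.0, -1.0])
b = 0.5
X = np.array([[1.0, 1.0], [-2.0, 0.5], [0.0, 3.0]])  # arbitrary points

def distance(w, b, X):
    # d = |w^T x + b| / ||w||_2 for each row x of X
    return np.abs(X @ w + b) / np.linalg.norm(w)

d1 = distance(w, b, X)
d2 = distance(5.0 * w, 5.0 * b, X)   # scale (w, b) by k = 5
print(d1)
print(d2)                            # identical: distances are unchanged by scaling
```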
Simplified Assumption:
To remove this redundancy, we can assume that the data points closest to the hyperplane satisfy:
$$\min_{i = 1, \ldots, m} y^{(i)}\left( w^\top x^{(i)} + b \right) = 1. \quad \text{(Eq.1)}$$
Implication of Equation 1:
Since (Eq.1) holds, we can conclude that for every i = 1, . . . , m, the
following holds:
y (i) (w⊤ x(i) + b) ≥ 1.
Margin Size:
With the normalization (Eq.1), the margin size (the distance from the hyperplane to the closest data points) is:
$$\text{margin} = \min_{i} \frac{y^{(i)}\left( w^\top x^{(i)} + b \right)}{\|w\|_2} = \frac{1}{\|w\|_2}.$$
Objective:
The goal of SVM is to maximize the margin size, which is
equivalent to solving for the pair of optimal values (w∗ , b ∗ ) of the
following optimization problem:
$$(w^*, b^*) = \arg\max_{w, b} \frac{1}{\|w\|_2},$$
subject to:
y (i) (w⊤ x(i) + b) ≥ 1, ∀i = 1, . . . , m.
Reformulated Problem:
The above problem is equivalent to minimizing the squared norm of
w:
$$(w^*, b^*) = \arg\min_{w, b} \frac{1}{2}\|w\|_2^2,$$
subject to:
$$1 - y^{(i)}\left( w^\top x^{(i)} + b \right) \le 0, \quad \forall i = 1, \ldots, m. \quad \text{(P3)}$$
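To make (P3) concrete, here is a minimal sketch that solves it numerically with scipy.optimize on a tiny hand-made dataset. The data, the SLSQP method, and the use of scipy are my own choices for illustration; the slides do not prescribe an implementation.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable toy dataset (hypothetical).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
n = X.shape[1]

def objective(params):
    # params = [w_1, ..., w_n, b]; minimize (1/2)||w||^2
    w = params[:n]
    return 0.5 * np.dot(w, w)

# One inequality constraint per point: y_i (w^T x_i + b) - 1 >= 0.
constraints = [
    {"type": "ineq",
     "fun": (lambda p, xi=xi, yi=yi: yi * (np.dot(p[:n], xi) + p[n]) - 1.0)}
    for xi, yi in zip(X, y)
]

res = minimize(objective, x0=np.zeros(n + 1), method="SLSQP", constraints=constraints)
w_star, b_star = res.x[:n], res.x[n]
print("w* =", w_star, "b* =", b_star)
```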
Lagrangian:
The Lagrangian for the optimization problem (P3) is defined as:
$$L(w, b, \lambda) = \frac{1}{2}\|w\|_2^2 + \sum_{i=1}^{m} \lambda_i \left( 1 - y^{(i)}\left( w^\top x^{(i)} + b \right) \right),$$
where:
λ = [λ1 , λ2 , . . . , λm ]⊤ are the Lagrange multipliers.
λi ≥ 0, ∀i = 1, . . . , m.
Key Insights:
It can be proven that (P3) is a convex optimization problem.
Slater’s condition is satisfied for (P3), ensuring strong duality holds.
Conclusion:
The optimal solutions w∗ , b ∗ , λ∗ for the dual problem can be obtained
by solving the Karush-Kuhn-Tucker (KKT) conditions of (P3).
KKT Conditions:
Primal feasibility:
$$1 - y^{(i)}\left( (w^*)^\top x^{(i)} + b^* \right) \le 0, \quad \forall i = 1, \ldots, m. \quad \text{(C2.1)}$$
Dual feasibility:
λ∗i ≥ 0, ∀i = 1, . . . , m. (C2.2)
Complementary slackness:
$$\lambda_i^* \left( 1 - y^{(i)}\left( (w^*)^\top x^{(i)} + b^* \right) \right) = 0, \quad \forall i = 1, \ldots, m. \quad \text{(C2.3)}$$
Stationarity with respect to w∗ :
$$\frac{\partial L}{\partial w^*} = w^* - \sum_{i=1}^{m} \lambda_i^* y^{(i)} x^{(i)} = 0. \quad \text{(C2.4)}$$
Stationarity with respect to b ∗ :
$$\frac{\partial L}{\partial b^*} = -\sum_{i=1}^{m} \lambda_i^* y^{(i)} = 0. \quad \text{(C2.5)}$$
Solving (P3) Using the Dual Problem
Motivation:
Directly solving for w∗ , b ∗ , λ∗ using the KKT conditions can be
computationally intensive.
Instead, solving for λ in the Lagrange dual problem of (P3) is more
efficient and commonly done.
Lagrange Dual Function:
The dual function g(λ) is defined as:
$$g(\lambda) = \inf_{w, b} L(w, b, \lambda).$$
Key Insight:
Since strong duality holds for (P3), maximizing g(λ) subject to λ ⪰ 0 yields the same optimal value as the primal problem, and the optimal multipliers λ* can be recovered from this dual problem.
Finding $\inf_{w,b} L(w, b, \lambda)$
Key Steps:
To find $\inf_{w,b} L$, set the partial derivatives of L with respect to w and b to zero.
Partial Derivatives:
With respect to w:
$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{m} \lambda_i y^{(i)} x^{(i)} = 0 \;\Rightarrow\; w = \sum_{i=1}^{m} \lambda_i y^{(i)} x^{(i)}. \quad \text{(Eq.2)}$$
With respect to b:
$$\frac{\partial L}{\partial b} = -\sum_{i=1}^{m} \lambda_i y^{(i)} = 0 \;\Rightarrow\; \sum_{i=1}^{m} \lambda_i y^{(i)} = 0. \quad \text{(Eq.3)}$$
Substituting (Eq.2) and (Eq.3) into g (λ):
$$g(\lambda) = \sum_{i=1}^{m} \lambda_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \lambda_i \lambda_j y^{(i)} y^{(j)} (x^{(i)})^\top x^{(j)}. \quad \text{(Eq.4)}$$
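The substitution step behind (Eq.4), spelled out (a short derivation added for completeness):
$$L(w, b, \lambda) = \frac{1}{2} w^\top w + \sum_{i=1}^{m} \lambda_i - \sum_{i=1}^{m} \lambda_i y^{(i)} w^\top x^{(i)} - b \sum_{i=1}^{m} \lambda_i y^{(i)}.$$
By (Eq.3) the last term vanishes, and with (Eq.2) both $w^\top w$ and $\sum_i \lambda_i y^{(i)} w^\top x^{(i)}$ equal $\sum_{i}\sum_{j} \lambda_i \lambda_j y^{(i)} y^{(j)} (x^{(i)})^\top x^{(j)}$, which gives (Eq.4).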
Observation:
From (C2.3), we conclude that λ∗i can be greater than 0 only if:
$$y^{(i)}\left( (w^*)^\top x^{(i)} + b^* \right) = 1,$$
i.e. only for the data points that lie exactly on the margin (the support vectors).
Step 1: Determine the support set:
$$S = \{ i \mid \lambda_i^* \ne 0 \}.$$
Step 2: Calculate w∗ :
Using (C2.4):
$$w^* = \sum_{i \in S} \lambda_i^* y^{(i)} x^{(i)}.$$
Step 3: Calculate b∗:
Since $x^{(i)}$ is a support vector for every $i \in S$, we have:
$$y^{(i)}\left( (w^*)^\top x^{(i)} + b^* \right) = 1 \;\Rightarrow\; b^* = y^{(i)} - (w^*)^\top x^{(i)},$$
and averaging over all support vectors gives $b^* = \frac{1}{|S|} \sum_{i \in S} \left( y^{(i)} - (w^*)^\top x^{(i)} \right)$.
Separating Hyperplane:
The separating hyperplane α is defined as:
$$\alpha: \; (w^*)^\top x + b^* = \sum_{i \in S} \lambda_i^* y^{(i)} (x^{(i)})^\top x + \frac{1}{|S|} \sum_{i \in S} \left( y^{(i)} - (w^*)^\top x^{(i)} \right) = 0.$$
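A minimal sketch of Steps 1–3 end to end: maximize g(λ) from (Eq.4) subject to λi ≥ 0 and Σ λi y(i) = 0, then recover w* and b* from the support set. The toy data and the use of scipy.optimize are my own choices; in practice a dedicated QP solver would normally be used.

```python
import numpy as np
from scipy.optimize import minimize

# Same kind of toy data as before (hypothetical).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m = len(y)
K = (X * y[:, None]) @ (X * y[:, None]).T   # K_ij = y_i y_j x_i^T x_j

def neg_dual(lmbda):
    # negative of g(lambda) from (Eq.4), so that minimizing it maximizes g
    return 0.5 * lmbda @ K @ lmbda - lmbda.sum()

res = minimize(
    neg_dual,
    x0=np.zeros(m),
    method="SLSQP",
    bounds=[(0, None)] * m,                               # lambda_i >= 0
    constraints=[{"type": "eq", "fun": lambda l: l @ y}],  # sum_i lambda_i y_i = 0
)
lmbda = res.x
S = lmbda > 1e-6                       # Step 1: support set
w_star = (lmbda[S] * y[S]) @ X[S]      # Step 2: w* from (C2.4)
b_star = np.mean(y[S] - X[S] @ w_star) # Step 3: b* averaged over support vectors
print("support vectors:\n", X[S])
print("w* =", w_star, "b* =", b_star)
```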
Role of ξ (the slack variables):
ξi measures the degree of margin violation or misclassification for each data point.
ξi = 0: the point is correctly classified and lies on or outside the margin.
ξi > 0: the point either lies within the margin or is misclassified.
Constraints:
⟨w · xi ⟩ + b ≥ 1 − ξi , if yi = 1
⟨w · xi ⟩ + b ≤ −1 + ξi , if yi = −1
Section 3: Types of SVM - Soft Margin
Optimization Objective:
$$\arg\min_{w, b, \xi} \left( \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i \right) \quad \text{(Eq.1)}$$
where:
C > 0 is the penalty constant.
The larger C is, the more heavily margin violations (misclassifications) are penalized.
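The effect of C can be seen by fitting a linear soft-margin SVM for several values of C. The sketch below uses scikit-learn and synthetic overlapping data; both are my own choices for illustration, not part of the slides.

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping 2-D toy data (hypothetical), to show the role of C.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[-1, -1], size=(50, 2)),
               rng.normal(loc=[1, 1], size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in [0.1, 1.0, 10000.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {clf.n_support_.sum()} support vectors, "
          f"train accuracy={clf.score(X, y):.2f}")
```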
Section 3: Types of SVM – Linear SVM – Soft Margin
$$\nabla_{\xi_n} L = 0 \;\Rightarrow\; \lambda_n = C - \mu_n \quad \text{(Eq.5)}$$
Interpretation of Conditions:
This relationship shows that λi is bounded above by C, since µi ≥ 0.
If µi = 0, then λi = C: the multiplier sits at the upper boundary of the box constraint 0 ≤ λi ≤ C, and by (Eq.10) ξi may then be positive, i.e. the point may violate the margin.
Section 3: Types of SVM - Linear SVM – Soft Margin
The dual objective of the soft-margin problem has the same form as (Eq.4), written here in the inner-product notation of this section:
$$L(\lambda) = \sum_{i=1}^{N} \lambda_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j y_i y_j \langle x_i, x_j \rangle.$$
Notes:
This dual form depends only on λi and the inner products ⟨xi , xj⟩, making it computationally efficient.
The goal is to maximize L(λ) with respect to λ, which controls the influence of each support vector.
Subject to the following constraints:
$$0 \le \lambda_i \le C, \quad \forall i = 1, \ldots, N$$
$$\sum_{i=1}^{N} \lambda_i y_i = 0$$
In addition, the KKT conditions of the soft-margin problem require:
ξi ≥ 0 (Eq.7)
λi ≥ 0 (Eq.8)
µi ≥ 0 (Eq.9)
µi ξi = 0 (Eq.10)
yi ((w · xi ) + b) − 1 + ξi ≥ 0, ∀i = 1 . . . N (Eq.11)
λi (yi ((w · xi ) + b) − 1 + ξi ) = 0 (Eq.12)
[Figure: data points relative to the margin — points on the margin satisfy $y_n(w^\top x_n + b) = 1$; points with slack satisfy $y_n(w^\top x_n + b) = 1 - \xi_n \le 1$.]
[Figure: decision boundaries and margins for C = 0.1, C = 1, and C = 10000.]
The feature map $\phi: x \mapsto \phi(x)$ sends the input data into a higher-dimensional feature space.
In the feature space, the dual problem is to maximize
$$\sum_{n=1}^{N} \lambda_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} \lambda_n \lambda_m y_n y_m \, k(x_n, x_m)$$
subject to:
$$\sum_{n=1}^{N} \lambda_n y_n = 0, \qquad 0 \le \lambda_n \le C, \quad \forall n.$$
Here:
S is the support set with λm > 0.
M is the set of support vectors on the margin where 0 < λm < C .
k(xn , xm ) = Φ(xn )T Φ(xm ) is the kernel function.
For two points x and z in the original space, the dot product in the
feature space is:
$$\Phi(x)^\top \Phi(z) = [1, \sqrt{2}x_1, \sqrt{2}x_2, x_1^2, \sqrt{2}x_1x_2, x_2^2]^\top [1, \sqrt{2}z_1, \sqrt{2}z_2, z_1^2, \sqrt{2}z_1z_2, z_2^2]$$
$$= 1 + 2x_1z_1 + 2x_2z_2 + x_1^2z_1^2 + 2x_1z_1x_2z_2 + x_2^2z_2^2 = (1 + x^\top z)^2 = k(x, z).$$
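A quick numeric sanity check of this identity (NumPy and the specific test points are my own choices):

```python
import numpy as np

def phi(v):
    # Explicit degree-2 polynomial feature map for 2-D input, as written above.
    x1, x2 = v
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, np.sqrt(2) * x1 * x2, x2**2])

x = np.array([0.5, -1.2])
z = np.array([2.0, 0.3])
lhs = phi(x) @ phi(z)      # dot product in the feature space
rhs = (1 + x @ z) ** 2     # kernel evaluated in the original space
print(lhs, rhs)            # both ≈ 2.6896: the two values agree
```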
Polynomial:
K (x, z) = (γ(x · z) + r )^d , γ, r ∈ R, d ∈ N
Sigmoid:
K (x, z) = tanh(γ(x · z) + r ), γ, r ∈ R
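For reference, these kernels map onto scikit-learn's SVC parameters roughly as gamma ↔ γ, coef0 ↔ r, degree ↔ d. The toy XOR-style data below is hypothetical and only meant to show the parameters in use.

```python
import numpy as np
from sklearn.svm import SVC

# XOR-like toy labels: not linearly separable in the original space.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([-1, -1, 1, 1])

poly_svm = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=1.0).fit(X, y)
sigmoid_svm = SVC(kernel="sigmoid", gamma=0.5, coef0=0.0, C=1.0).fit(X, y)
print(poly_svm.score(X, y), sigmoid_svm.score(X, y))
```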
Pros
Works well when there is a clear margin of separation between classes
Effective in high-dimensional spaces
Effective when the number of dimensions exceeds the number of samples
Memory-efficient
Drawbacks
Not suitable for large datasets
Performs poorly when classes overlap
Underperforms when the number of features greatly exceeds the number of training samples
Lack of a probabilistic interpretation for classification
Computationally expensive for large datasets
Sensitive to the choice of kernel and parameters
Memory-intensive due to storing the kernel matrix
Limited to two-class problems
Not suitable for datasets with missing values