An Overview On Support Vector Machines
01 INTRODUCTION
02 PRIMAL PROBLEM
03 DUAL PROBLEM
04 HARD MARGIN
05 SOFT MARGIN
06 CONCLUSION
Introduction to SVMs
Support Vector Machines (SVMs) are a set of supervised learning methods used for classification, regression, and outlier detection.
In Support Vector Machines, the goal is to find a hyperplane that separates the data points into two classes with the largest possible margin.
The hard margin SVM is used when the data is linearly separable without any error, while the soft margin SVM allows for misclassification when the data is not perfectly separable.
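As a quick illustration, here is a minimal scikit-learn sketch that fits a linear soft-margin SVM; the tiny two-class dataset and the choice C=1.0 are assumptions made purely for this example, not part of the original slides.

import numpy as np
from sklearn.svm import SVC

# toy 2-D data: two loosely separated classes (hypothetical values)
X = np.array([[2.0, 2.0], [2.5, 1.5], [3.0, 2.5],
              [0.5, 0.5], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# a linear soft-margin SVM; smaller C tolerates more margin violations
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print("w =", clf.coef_[0], "b =", clf.intercept_[0])
print("support vectors:", clf.support_vectors_)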
The Primal Problem
The Primal Problem refers to the original optimization problem formulated for the classification task. It seeks to find the best separating hyperplane between two classes by maximizing the margin between the closest data points (support vectors) and the hyperplane.
The primal problem is a quadratic programming problem that tries to minimize the norm of
the weight vector (which corresponds to maximizing the margin) while penalizing
misclassified or incorrectly placed points using slack variables.
Equation
Given a dataset $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^d$ are the feature vectors and $y_i \in \{-1, +1\}$ are the class labels, the primal problem is to minimize the following objective function:
$$\min_{w,\, b,\, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$$
subject to
$$y_i (w^\top x_i + b) \geq 1 - \xi_i, \qquad \xi_i \geq 0, \qquad i = 1, \dots, n$$
Where:
w is the weight vector that defines the hyperplane.
b is the bias term.
ξ_i are slack variables, which allow for some misclassification in non-linearly separable cases.
C is the regularization parameter controlling the trade-off between maximizing the margin
and minimizing the classification error (penalizing the slack variables).
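To make the objective concrete, here is a small sketch that solves this primal problem directly with CVXPY; the toy data and the value C=1.0 are assumptions for illustration, and in practice a dedicated SVM solver would normally be used instead.

import numpy as np
import cvxpy as cp

# toy data (hypothetical), with x_i in R^2 and y_i in {-1, +1}
X = np.array([[2.0, 2.0], [2.5, 1.0], [0.5, 0.5], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n, d = X.shape
C = 1.0  # regularization parameter

w = cp.Variable(d)
b = cp.Variable()
xi = cp.Variable(n)  # slack variables

# objective: (1/2)||w||^2 + C * sum_i xi_i
objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
# constraints: y_i (w^T x_i + b) >= 1 - xi_i and xi_i >= 0
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value, "slacks =", xi.value)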
The Dual Problem
The dual problem focuses on maximizing a different objective function that depends only on the Lagrange multipliers, denoted as α_i. These multipliers are associated with the constraints in the primal problem.
It is often used in practice because it allows the introduction of kernel functions to handle non-
linearly separable data and provides a more computationally efficient solution, especially for
high-dimensional feature spaces.
The dual problem is:
$$\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i^\top x_j)$$
subject to
$$0 \leq \alpha_i \leq C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0$$
Where:
α_i are the Lagrange multipliers.
y_i are the class labels.
x_i^\top x_j is the dot product of the feature vectors x_i and x_j.
C is the regularization parameter from the primal problem.
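The following sketch solves this dual problem with CVXPY on the same kind of toy data; the quadratic term is written as a squared norm of the vector sum_i α_i y_i x_i, which is equivalent for the linear kernel. The data and C are illustrative assumptions.

import numpy as np
import cvxpy as cp

# toy data (hypothetical)
X = np.array([[2.0, 2.0], [2.5, 1.0], [0.5, 0.5], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n, C = len(y), 1.0

alpha = cp.Variable(n)  # Lagrange multipliers
# (1/2) sum_ij alpha_i alpha_j y_i y_j <x_i, x_j> = (1/2) || sum_i alpha_i y_i x_i ||^2
quad = 0.5 * cp.sum_squares(X.T @ cp.multiply(alpha, y))
objective = cp.Maximize(cp.sum(alpha) - quad)
constraints = [alpha >= 0, alpha <= C, y @ alpha == 0]
cp.Problem(objective, constraints).solve()

# recover the primal weight vector: w = sum_i alpha_i y_i x_i
w = X.T @ (alpha.value * y)
print("alpha =", alpha.value, "w =", w)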
Hard Margin
The Hard Margin case intends to find a hyperplane that separates the two classes in such a way that no data points are misclassified. Given a dataset $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$, we want to find a hyperplane of the form:
$$w^\top x + b = 0$$
where w is the normal vector to the hyperplane and b is the bias term.
To achieve the maximum margin, we need to minimize the norm of the weight vector w, since the margin is $\frac{2}{\|w\|}$. Keeping this in mind, the primal problem reduces to:
$$\min_{w,\, b} \ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i (w^\top x_i + b) \geq 1, \quad i = 1, \dots, n$$
This is a convex quadratic programming problem, where the objective is quadratic in w, and the constraints are linear in w and b.
Lagrangian for Hard Margin:
We form the Lagrangian for the constrained optimization problem by introducing Lagrange multipliers $\alpha_i \geq 0$ for each constraint $y_i (w^\top x_i + b) \geq 1$:
$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i (w^\top x_i + b) - 1 \right]$$
Dual Problem:
To derive the dual problem, we compute the partial derivatives of the Lagrangian with respect to w and b:
$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{n} \alpha_i y_i x_i = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i$$
$$\frac{\partial L}{\partial b} = -\sum_{i=1}^{n} \alpha_i y_i = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0$$
Dual Optimization Problem
The dual formulation of the hard margin SVM is:
$$\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i^\top x_j)$$
subject to
$$\alpha_i \geq 0, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0$$
Solution:
The optimal weight vector and bias are recovered from the solution of the dual optimization problem:
$$w^{*} = \sum_{i=1}^{n} \alpha_i^{*} y_i x_i, \qquad b^{*} = y_k - w^{*\top} x_k \ \text{ for any support vector } x_k \ (\alpha_k^{*} > 0)$$
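In practice, a hard margin can be approximated with a soft-margin solver by making the penalty C very large, so that essentially no slack is tolerated. The sketch below uses scikit-learn this way on assumed, linearly separable toy data.

import numpy as np
from sklearn.svm import SVC

# linearly separable toy data (hypothetical)
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.5],
              [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# a very large C approximates the hard-margin SVM (no slack tolerated)
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)  # geometric margin width 2 / ||w||
print("w =", w, "b =", b, "margin =", margin)
print("support vectors:", clf.support_vectors_)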
Soft Margin
When the data is not linearly separable, we allow for some misclassification by introducing slack variables $\xi_i \geq 0$. The slack variable $\xi_i$ measures how much the i-th data point violates the margin. The soft margin primal problem is:
$$\min_{w,\, b,\, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i (w^\top x_i + b) \geq 1 - \xi_i, \ \ \xi_i \geq 0$$
Here, C is a regularization parameter that controls the trade-off between maximizing the margin and minimizing the classification error (through the slack variables $\xi_i$).
Lagrangian for Soft Margin:
Similar to the hard margin case, we construct the Lagrangian by introducing Lagrange multipliers $\alpha_i \geq 0$ for the margin constraints and $\mu_i \geq 0$ for the slack variables:
$$L(w, b, \xi, \alpha, \mu) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left[ y_i (w^\top x_i + b) - 1 + \xi_i \right] - \sum_{i=1}^{n} \mu_i \xi_i$$
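The short sketch below shows how C trades margin width against margin violations, using scikit-learn on assumed, slightly overlapping toy data; the specific values of C compared here are arbitrary choices for illustration.

import numpy as np
from sklearn.svm import SVC

# toy data with overlapping points, so it is not perfectly separable (hypothetical)
X = np.array([[2.0, 2.0], [2.5, 1.5], [0.4, 0.6],
              [0.5, 0.5], [1.0, 0.0], [2.2, 1.8]])
y = np.array([1, 1, 1, -1, -1, -1])

for C in (0.1, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    # smaller C -> wider margin but more violations; larger C -> narrower margin, fewer violations
    print(f"C={C:>6}: margin width = {2.0 / np.linalg.norm(w):.3f}, "
          f"training accuracy = {clf.score(X, y):.2f}")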
Conclusion
Use hard margin SVM when the data is linearly separable and free of noise or outliers. It
requires perfect separation and is sensitive to outliers, making it suitable for clean datasets
where misclassifications are unacceptable.
Use soft margin SVM when the data is not perfectly separable and may contain noise or outliers. It allows for misclassifications through slack variables and includes a regularization parameter C to balance margin size and classification errors. This approach is more robust and flexible for real-world datasets where noise is common.
THANK YOU