SVM

Support Vector Machines (SVMs) are powerful tools for classifying both linearly and non-linearly separable datasets using various kernel functions like linear, polynomial, and Radial Basis Function (RBF) kernels. The choice of kernel affects the model's ability to separate classes and handle complexities in the data, with linear kernels being effective for linearly separable data and polynomial and RBF kernels suited for non-linear relationships. Additionally, SVMs can be categorized into hard margin and soft margin types, with soft margin SVMs allowing for some misclassifications to improve robustness against outliers.


Mod 3

SVMs handle linear and non-linearly separable datasets. This unique capability and their robust
mathematical foundation have propelled SVMs to the forefront of various applications across
industries, from text classification and image recognition to bioinformatics and financial forecasting.
Understanding SVMs can open up possibilities in your data analysis journey.
 Many real-world datasets exhibit non-linear relationships between features and classes,
making them non-linearly separable. Non-linear SVMs address this challenge by employing
kernel functions to implicitly map the original feature space into a higher-dimensional space,
where linear separation becomes feasible.
 By transforming the data into a higher-dimensional space, SVMs can find hyperplanes that
effectively separate classes even when they are not linearly separable in the original feature
space.

The linear kernel is the simplest kernel function used in Support Vector Machines (SVMs). It's
essentially a dot product between two data points. Mathematically, it's represented as:

K(x1, x2) = x1 · x2

where:

 K(x1, x2) is the kernel function.

 x1 and x2 are two data points.

 · represents the dot product.
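
For example, if x1 = (1, 2) and x2 = (3, 4), then K(x1, x2) = (1)(3) + (2)(4) = 11.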

How it Works:

1. Data Representation: Each data point (e.g., an image, a document) is represented as a feature vector. This vector contains numerical values that describe the characteristics of the data point.

2. Kernel Function: The linear kernel calculates the similarity between two data points by
computing the dot product of their feature vectors. A higher dot product generally indicates
greater similarity.

3. Hyperplane: The SVM algorithm uses the kernel function to find the optimal hyperplane that
separates the data points into different classes. This hyperplane is defined by the support
vectors, which are the data points closest to the decision boundary.

When to Use the Linear Kernel:

 Linearly Separable Data: The linear kernel is most effective when the data is linearly
separable. This means that a straight line (or a hyperplane in higher dimensions) can
perfectly separate the different classes.

 High-Dimensional Data: In high-dimensional spaces, the linear kernel can be computationally efficient compared to non-linear kernels.

 Simplicity: Due to its simplicity, the linear kernel is often used as a baseline for comparison
with other kernel functions.

Limitations:
 Non-Linearly Separable Data: If the data is not linearly separable, the linear kernel may not
perform well. In such cases, non-linear kernels like the polynomial kernel or the RBF kernel
are often more effective.
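
As a rough illustration, a linear-kernel SVM can be trained with scikit-learn; the tiny dataset below is invented purely for demonstration and is assumed to be linearly separable.

    # Minimal sketch: linear-kernel SVM on a toy, linearly separable dataset.
    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 8]])  # feature vectors
    y = np.array([0, 0, 0, 1, 1, 1])                                # class labels

    clf = SVC(kernel="linear")   # uses K(x1, x2) = x1 . x2
    clf.fit(X, y)

    print(clf.support_vectors_)  # the points closest to the decision boundary
    print(clf.predict([[4, 4]])) # classify a new point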

The polynomial kernel is a powerful tool within Support Vector Machines (SVMs) that allows them
to handle non-linearly separable data. Here's a breakdown of its key aspects:

Core Idea:

 Essentially, the polynomial kernel transforms the original input data into a higher-
dimensional space.

 This transformation is done implicitly, meaning we don't actually calculate the coordinates of
the data in this higher space. Instead, the kernel function calculates the dot product of the
transformed vectors.

 The advantage of this transformation is that data that's not separable by a straight line (or
hyperplane) in the original space might become separable in the higher-dimensional space.

Mathematical Representation:

 The polynomial kernel is defined as:

o K(x, y) = (x ⋅ y + c)^d

o Where:

 x and y are the input vectors.

 c is a constant that controls the influence of lower-degree terms.

 d is the degree of the polynomial, which determines the complexity of the decision boundary.

How it Works:

 By raising the dot product of the input vectors to a certain power (d), the kernel effectively
creates new features that are combinations of the original features.

 This allows the SVM to learn complex, non-linear decision boundaries.

 For example, if d = 2, the kernel creates features that are quadratic combinations of the
original features.
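
A small NumPy check (with made-up 2-D vectors) illustrates the point for d = 2 and c = 1: the kernel value equals the dot product of explicitly expanded quadratic feature vectors, so the higher-dimensional mapping never has to be computed directly.

    # Sketch: for d = 2 and c = 1, K(x, y) = (x . y + 1)^2 matches the dot product
    # of the explicit quadratic feature maps phi(x) and phi(y).
    import numpy as np

    def phi(v):
        # explicit feature map for the degree-2 polynomial kernel with c = 1
        x1, x2 = v
        return np.array([x1**2, x2**2,
                         np.sqrt(2) * x1 * x2,
                         np.sqrt(2) * x1,
                         np.sqrt(2) * x2,
                         1.0])

    x = np.array([1.0, 2.0])   # toy vectors, invented for illustration
    y = np.array([3.0, 0.5])

    print((x @ y + 1) ** 2)    # kernel value
    print(phi(x) @ phi(y))     # same number, via the explicit mapping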

Key Considerations:

 Degree (d):

o A higher degree allows for more complex decision boundaries but also increases the
risk of overfitting.

o Choosing the right degree is crucial for optimal performance.

 Constant (c):
o The constant c influences the balance between higher-degree and lower-degree
terms.

 Computational Cost:

o Polynomial kernels can be computationally more expensive than linear kernels, especially for high degrees.

Applications:

 Polynomial kernels are useful in situations where the data exhibits non-linear relationships.

 They have been applied in areas like:

o Image processing.

o Text classification.
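
In scikit-learn, for instance, the polynomial kernel is selected with kernel='poly', where degree plays the role of d and coef0 the role of c (scikit-learn additionally scales the dot product by a gamma factor); the dataset and parameter values below are only illustrative.

    # Sketch: SVM with a polynomial kernel on a toy non-linear dataset.
    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=200, noise=0.15, random_state=0)  # two interleaving half-circles

    clf = SVC(kernel="poly", degree=3, coef0=1, gamma="scale")    # degree -> d, coef0 -> c
    clf.fit(X, y)
    print(clf.score(X, y))                                        # training accuracy on the toy data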

The Radial Basis Function (RBF) kernel is a very popular kernel function used in Support Vector
Machines (SVMs), particularly for non-linear classification. Here's a breakdown of its key
characteristics:

Core Idea:

 The RBF kernel implicitly maps the input data into an infinite-dimensional space.

 This allows SVMs to create very complex, non-linear decision boundaries.

 It measures the similarity between data points based on their distance.

Mathematical Representation:

 The RBF kernel is defined as:

o K(x, y) = exp(-γ ||x - y||²)

o Where:

 x and y are the input vectors.


 γ (gamma) is a hyperparameter that controls the width of the kernel.

 ||x - y||² is the squared Euclidean distance between x and y.

 exp is the exponential function.
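
As a quick illustration (toy points and an arbitrarily chosen gamma), the kernel value can be computed directly:

    # Sketch: computing the RBF kernel value for two toy points.
    import numpy as np

    def rbf_kernel(x, y, gamma):
        # K(x, y) = exp(-gamma * ||x - y||^2)
        return np.exp(-gamma * np.sum((x - y) ** 2))

    x = np.array([1.0, 2.0])
    y = np.array([2.0, 0.0])
    print(rbf_kernel(x, y, gamma=0.5))  # close points give values near 1, distant points near 0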

How it Works:

 Essentially, the RBF kernel calculates how similar two data points are by measuring their
distance.

 Points that are close together have a high similarity, while points that are far apart have a
low similarity.

 The gamma parameter determines how much influence a single training example has.

 A small gamma means a wider influence, while a large gamma means a narrower influence.

Key Considerations:

 Gamma (γ):

o This is a crucial hyperparameter.

o A high gamma leads to a very "tight" fit, which can result in overfitting.

o A low gamma leads to a "loose" fit, which can result in underfitting.

 Computational Cost:

o The RBF kernel can be computationally expensive, especially for large datasets.

 Flexibility:

o The RBF kernel is very flexible and can handle complex, non-linear decision
boundaries.

Applications:

 The RBF kernel is widely used in various applications, including:

o Image classification.

o Bioinformatics.

o Text classification.
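
A minimal scikit-learn sketch (toy data and hand-picked gamma values, chosen only for illustration) that shows the gamma trade-off in practice:

    # Sketch: RBF-kernel SVM, comparing a small and a large gamma on toy data.
    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)

    for gamma in (0.1, 100):                      # small vs. large gamma
        clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
        print(gamma, clf.score(X, y))             # a large gamma tends to fit the training data more tightly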

HARD MARGIN AND SOFT MARGIN SVM

1. Linear SVM:

 This refers to an SVM that aims to find a linear hyperplane to separate the data.

 In a 2D space, this hyperplane is a straight line; in 3D, it's a plane; and in higher dimensions,
it's a hyperplane.

 A linear SVM is suitable when the data is linearly separable, meaning it can be perfectly
divided by a straight line or hyperplane.

2. Hard Margin SVM:


 A hard margin SVM aims to find a hyperplane that perfectly separates the data, with no
misclassifications allowed.

 It requires the data to be strictly linearly separable.

 The goal is to maximize the margin, which is the distance between the hyperplane and the closest data points (support vectors) from each class.

 Limitations:

o It's very sensitive to outliers. Even a single outlier can significantly affect the
hyperplane or make it impossible to find one.

o It only works when the data is perfectly linearly separable, which is rarely the case in
real-world datasets.

3. Soft Margin SVM:

 A soft margin SVM addresses the limitations of the hard margin SVM by allowing some
misclassifications.

 It introduces "slack variables" that allow some data points to be on the wrong side of the
margin or even the wrong side of the hyperplane.

 This makes the SVM more robust to outliers and allows it to work with data that is not
perfectly linearly separable.

 A "C" parameter is used to control the trade-off between maximizing the margin and
minimizing the number of misclassifications.

o A small C allows for more misclassifications, leading to a wider margin.

o A large C penalizes misclassifications more heavily, leading to a narrower margin.

 Benefits:

o More robust to outliers.

o Can handle data that is not perfectly linearly separable.

o Provides better generalization performance.
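
A brief sketch (toy data with a deliberately mislabeled outlier, invented for illustration) of how C controls this trade-off in scikit-learn:

    # Sketch: effect of C in a soft margin SVM on toy data containing an outlier.
    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[1, 1], [2, 1], [1, 2], [6, 6], [7, 6], [6, 7], [5, 5]])  # last point is an outlier
    y = np.array([0, 0, 0, 1, 1, 1, 0])                                     # labeled with the "wrong" class

    for C in (0.1, 1000):                          # small C: wider margin, more tolerant of misclassification
        clf = SVC(kernel="linear", C=C).fit(X, y)  # large C: narrower margin, closer to hard margin behaviour
        print(C, clf.n_support_)                   # number of support vectors per class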


SVM Regression

Support Vector Regression (SVR) is a machine learning technique that extends Support Vector
Machines (SVMs) to handle regression problems, predicting continuous outcomes rather than
classifying data into discrete categories. It aims to find a function that best fits the data while
minimizing prediction errors.
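
A minimal SVR sketch (synthetic data and illustrative hyperparameters only):

    # Sketch: Support Vector Regression on a noisy sine curve (synthetic data).
    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0, 5, size=(60, 1)), axis=0)    # 60 one-dimensional inputs
    y = np.sin(X).ravel() + rng.normal(scale=0.1, size=60)  # noisy continuous targets

    reg = SVR(kernel="rbf", C=10, epsilon=0.1)  # epsilon sets the tolerance tube around the fitted function
    reg.fit(X, y)
    print(reg.predict([[2.5]]))                 # predicted continuous value for a new input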
