Support Vector Machine

Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression, focusing on maximizing the margin between data points of different classes. Key concepts include hyperplanes, support vectors, and kernel functions, which help in transforming data for better separability. SVM can be applied effectively to small to medium-sized datasets with clear margins of separation and can handle non-linear boundaries through various kernel types.


Support Vector Machine (SVM): a supervised machine learning algorithm that can be used for both classification and regression tasks.

Core Concepts
1. Hyperplane:
o A decision boundary that separates data points of different classes.
In 2D, it's a line; in 3D, it's a plane; and in higher dimensions, it's a hyperplane.
2. Margin:
o The distance between the hyperplane and the nearest data points
(support vectors). SVM aims to maximize this margin.
3. Support Vectors:
o Data points closest to the hyperplane that influence its position and
orientation.
4. Kernel Trick:
o SVM uses kernel functions to transform data into higher dimensions,
making it easier to find a hyperplane in cases where data is not
linearly separable.
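To make these concepts concrete, here is a minimal sketch (the scikit-learn setup and the toy dataset are illustrative assumptions, not part of the original notes) that fits a linear SVM and inspects the support vectors and hyperplane parameters:

# Minimal sketch (assumed setup): fit a linear SVM with scikit-learn and
# inspect the support vectors that define the maximum-margin hyperplane.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy two-class dataset with a clear gap between the clusters (illustrative only).
X, y = make_blobs(n_samples=100, centers=2, random_state=42)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The training points closest to the hyperplane; only these determine the boundary.
print("Support vectors:\n", clf.support_vectors_)
# Hyperplane parameters w and b (decision boundary: w.x + b = 0).
print("w =", clf.coef_, "b =", clf.intercept_)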

Hyperparameters in SVM
1. C (Regularization Parameter)
• Purpose: Balances the trade-off between maximizing the margin and
minimizing classification error.
• Effect:
o Large C: Focuses on correctly classifying all training points, resulting
in a smaller margin and potential overfitting.
o Small C: Allows more misclassifications but achieves a wider margin,
improving generalization.
Real-life Analogy for C:
Imagine you're designing a security system for a museum:
1. Large C:
o The security guard checks every single person and every detail of
their belongings. This ensures no one suspicious gets through but
slows down entry for everyone (overfitting).
o Applied to SVM: This results in a very tight decision boundary that
might not generalize well to new visitors (new data).
2. Small C:
o The guard only checks for large, obvious threats (e.g., large bags or
unusual behavior). Some small errors may occur, but it ensures quick
and smooth entry for most people (better generalization).
o Applied to SVM: The decision boundary is looser, focusing on the
broader picture and tolerating some mistakes.
Scenario: Classifying emails as "spam" or "not spam".
• Large C: The model tries to perfectly classify every email in the training
data. If one legitimate email contains the word "win" (often found in spam),
the model might overfit and treat all emails with "win" as spam.
• Small C: The model tolerates a few misclassified emails in the training data
but finds a broader, more generalized rule for spam classification.
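A rough sketch of this trade-off (the dataset and the two C values are illustrative assumptions): a very large C tends to score higher on the training data, while a small C often generalizes better on held-out data.

# Illustrative comparison of a large and a small C on the same noisy data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for C in (100.0, 0.01):  # large C vs. small C
    clf = SVC(kernel="linear", C=C).fit(X_train, y_train)
    print(f"C={C}: train={clf.score(X_train, y_train):.2f}, "
          f"test={clf.score(X_test, y_test):.2f}")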

2. Kernel
• Purpose: Determines the transformation applied to the data.
• Types:
o Linear Kernel: For linearly separable data.
o Polynomial Kernel: For more complex patterns.
o Radial Basis Function (RBF) Kernel: Popular for non-linear data due
to its flexibility.
o Sigmoid Kernel: Acts like a neural network activation function.
• Advice:
o Use a linear kernel for datasets where features are already linearly
separable or have high dimensionality.
o Use RBF as a default for non-linear data.
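A quick way to compare the kernels listed above is to cross-validate each one on the same data. The sketch below uses scikit-learn's SVC and a toy non-linear dataset (both are assumptions, not part of the original notes):

# Compare the listed kernels on a non-linearly separable toy dataset.
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    score = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()
    print(f"{kernel:7s}: mean CV accuracy = {score:.2f}")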

3. Gamma (Kernel Coefficient for RBF, Polynomial, and Sigmoid Kernels)


• Purpose: Defines the influence of a single training example.
• Effect:
o Large Gamma: Focuses on points close to each other, leading to
more complex models (risk of overfitting).
o Small Gamma: Considers distant points, resulting in simpler models
(risk of underfitting).
Real-life Analogy for Gamma:
Imagine you're planning where to install Wi-Fi routers in a building:
1. Large Gamma:
o Each router provides a very small range of coverage. You need
many routers to ensure the entire building has Wi-Fi.
o Applied to SVM: The model focuses on very small regions, fitting
tightly around data points, which can lead to overfitting.
2. Small Gamma:
o Each router provides a wide range of coverage, ensuring fewer
routers are needed. However, the signal might be weaker or less
precise.
o Applied to SVM: The model creates smooth, broad decision
boundaries, which might oversimplify the problem.

Example Data Scenario: Classifying customers into "high-value" and "low-value" groups based on spending habits.
• Large Gamma:
o The model focuses on very specific spending patterns. A customer
spending $500 on groceries and $50 on electronics might be treated
differently from one spending $505 and $45, leading to overfitting.
• Small Gamma:
o The model considers broader patterns. It might classify all customers
spending over $500 as "high-value," missing finer details.
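As a hedged sketch of this effect (the dataset and gamma values are illustrative assumptions), a very large gamma usually fits the training set almost perfectly but drops on the test set, while a very small gamma underfits both:

# How gamma changes an RBF SVM: local influence (large gamma) vs. broad
# influence (small gamma).
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for gamma in (100.0, 1.0, 0.01):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_train, y_train)
    print(f"gamma={gamma}: train={clf.score(X_train, y_train):.2f}, "
          f"test={clf.score(X_test, y_test):.2f}")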

4. Degree (for Polynomial Kernel)


• Purpose: Sets the degree of the polynomial kernel.
• Recommended Value: Begin with degree = 3 for most applications.
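A minimal sketch of that starting point (the dataset is an assumption; degree is only used by the polynomial kernel):

# Polynomial-kernel SVC starting from the recommended degree = 3.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
clf = SVC(kernel="poly", degree=3, coef0=1.0, C=1.0).fit(X, y)
print("Training accuracy:", clf.score(X, y))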

When to Use SVM


1. Small to Medium-sized Datasets:
o SVM performs well with a moderate number of data points but may
struggle with extremely large datasets.
2. High Dimensionality:
o Effective when the number of features is greater than the number of
samples.
3. Clear Margins of Separation:
o SVM excels in cases where there is a clear boundary between classes.
4. Non-linear Boundaries:
o Using kernels, SVM can handle complex, non-linear decision
boundaries.

How to Use SVM Effectively


1. Preprocessing:
o Scale features using standardization or normalization to avoid bias from features
with larger ranges.
2. Start Simple:
o Begin with a linear kernel. If the results are poor, try RBF or polynomial kernels.
3. Use Cross-validation:
o Perform grid search or randomized search to tune hyperparameters (C, kernel, gamma, degree).
4. Balance the Dataset:
o For imbalanced datasets, consider using class weights (class_weight='balanced'
in scikit-learn) to give more importance to the minority class.
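The sketch below strings these steps together (scaling, class weights, and a cross-validated grid search); the toy imbalanced dataset and the parameter grid are illustrative assumptions:

# Scale features, balance classes, and grid-search C / kernel / gamma with CV.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Imbalanced toy data: roughly 90% of samples in one class.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),            # SVM is sensitive to feature scales
    ("svm", SVC(class_weight="balanced")),  # give more weight to the minority class
])

param_grid = {
    "svm__C": [0.1, 1, 10, 100],
    "svm__kernel": ["linear", "rbf"],
    "svm__gamma": ["scale", 0.01, 0.1],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))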

The SVM margin is sensitive to feature scales, so always scale features before training.

Maximum margin hyperplane

Soft margin and hard margin

Hard Margin SVM:


1. Definition:
• A hard margin SVM aims to find the hyperplane that perfectly
separates the data into different classes with no misclassifications.
• It assumes that the data is linearly separable, meaning there is a clear
gap between the two classes.
2. Characteristics:
• The hard margin SVM is sensitive to outliers and noise because even
a single mislabeled point can prevent finding a feasible hyperplane.
• It may not perform well when the data is not perfectly separable,
leading to overfitting.
3. Use Case:
• Hard margin SVM is suitable when you are confident that the
classes are perfectly separable and there is minimal noise in the
data.

Large C Value and Narrow Margin

So far, we have used hard margin classification: all training samples are on the "correct side of the street". This only works if the data is linearly separable, and it is sensitive to outliers, so it may not generalize well.

Soft Margin SVM:


1. Definition:
• A soft margin SVM allows for some misclassifications to find a
balance between achieving a clear separation and handling noisy
data.
• It introduces a penalty for misclassified points, allowing for a more
flexible decision boundary.
2. Characteristics:
• Soft margin SVM is less sensitive to outliers and noise because it
allows for a margin of error in the classification.
• It performs better on datasets that are not perfectly separable.
3. Use Case:
• Soft margin SVM is suitable when there is some level of noise or
overlap between the classes. It provides a more realistic approach to
classification.
Small C Value and Large Margin

Objective: find a good balance between keeping the margin as large as possible and limiting margin violations. This trade-off is controlled by the hyperparameter C: a larger value gives a smaller margin but fewer violations.
SVM Regression

The trick is to reverse the objective: instead of trying to fit the largest possible street between
two classes while limiting margin violations, SVM Regression tries to fit as many instances as
possible on the street while limiting margin violations (i.e., instances off the street).

The width of the street is controlled by a hyperparameter ϵ.
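A minimal SVR sketch (the data and hyperparameter values are made up for illustration): epsilon sets the width of the street, and points inside it incur no penalty.

# Support Vector Regression: fit a noisy sine curve with an epsilon-wide street.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(100, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

reg = SVR(kernel="rbf", C=10.0, epsilon=0.1)
reg.fit(X, y)
print("R^2 on training data:", reg.score(X, y))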

Multi-Class SVM

Turn Binary Classifier into Multiclass:


- One-vs-Rest (OVR)
- One-vs-One (OVO)
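In scikit-learn, SVC handles multi-class problems with a one-vs-one scheme internally, and OneVsRestClassifier wraps any binary classifier to get one-vs-rest instead; the dataset below is an illustrative assumption.

# One-vs-one (built into SVC) vs. an explicit one-vs-rest wrapper.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # three-class toy dataset

ovo = SVC(kernel="rbf", decision_function_shape="ovo").fit(X, y)
ovr = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)

# For 3 classes, OvO trains 3 pairwise classifiers; OvR trains 3 one-vs-rest ones.
print("OvO decision values shape:", ovo.decision_function(X).shape)
print("OvR accuracy:", ovr.score(X, y))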

Kernels

- Polynomial
- Gaussian (RBF)
- Sigmoid
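For reference, the standard textbook forms of these kernels (where γ, r, and d correspond to the gamma, coef0, and degree hyperparameters discussed above):

Polynomial: $K(x, z) = (\gamma \, x^\top z + r)^d$
Gaussian (RBF): $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$
Sigmoid: $K(x, z) = \tanh(\gamma \, x^\top z + r)$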
