Hands-On Machine Learning, 3rd Edition

This chapter discusses support vector machines (SVMs) including linear SVM classification for linearly separable data using a hard margin. It introduces soft margins to handle non-separable data and allow for some misclassification. Non-linear classification techniques like kernel tricks and similarity features are described to project data into higher dimensions for linear separation. SVM regression is also covered, which aims to predict continuous output values. Key concepts like support vectors, margins, kernels, and computational complexity are defined.


CHAPTER 5: SUPPORT VECTOR MACHINES (SVM)

CONTENT
Introduction

Important Definitions

Linear SVM Classification (Hard Margin)

Soft Margin

Feature Scaling in SVM

Non-linear Classification

Computational Complexity

SVM Regression

Decision Function and Predictions

Training Objective

Quadratic Programming

The Dual Problem

INTRODUCTION

o A Support Vector Machine (SVM) is a powerful and versatile machine learning model, capable of performing both linear and non-linear classification.
o SVMs are particularly well suited to classification of complex, small- to medium-sized datasets.
o The goal of the SVM algorithm is to find the best line or decision boundary that segregates the n-dimensional feature space into classes, so that new data points can easily be assigned to the correct category in the future. This best decision boundary is called a hyperplane.

INTRODUCTION

Pros and Cons of SVM


➢ Pros
• Versatile Kernel Functions
• Effective in High-Dimensional Spaces
• Effective in cases where the number of features is greater than
the number of data points (samples).
➢ Cons
• Sensitivity to Noise: SVMs are sensitive to noise in the
dataset, as outliers might affect the construction of the
hyperplane and margin.
• Scalability Issues: SVMs might not be suitable for large-
scale datasets due to their computational demands and
memory requirements, which can make training time-
consuming.

IS SVM USED FOR MULTICLASS DATASETS?

o SVMs were originally designed for binary classification, but several strategies exist to extend them to handle multiple classes.
o Two common approaches are the "one-vs-one" and "one-vs-all" (one-vs-rest) strategies, as sketched below.

IMPORTANT DEFINITIONS
• Hyperplane: There can be multiple lines/decision boundaries that segregate the classes in n-dimensional space; the best decision boundary, the one the SVM chooses, is called the hyperplane.
• Support Vectors: The data points or vectors that are closest to the hyperplane and which affect its position are termed support vectors.
• Margin: The margin is the distance between the support vectors and the hyperplane. A wider margin indicates better classification performance.
• Loss Function: SVR uses a loss function that penalizes deviations of predicted values from the actual values. Common loss functions include the epsilon-insensitive loss or the mean squared error.
• Kernel Trick: Like the classification SVM, SVR can also benefit from the kernel trick, which allows the algorithm to implicitly map the input data into a higher-dimensional space, making it possible to learn a decision function that is non-linear in the original feature space.

LINEAR SVM CLASSIFIER

• The linear SVM classifier is used for linearly separable data: if a dataset can be split into two classes by a single straight line (a hyperplane in higher dimensions), the data is termed linearly separable, and the classifier used is called a linear SVM classifier.
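
A minimal sketch of a linear SVM classifier in scikit-learn; the iris petal features, the scaler, and C=1 are illustrative assumptions rather than choices made in these slides:

```python
# Hedged sketch: fitting a linear SVM classifier on (roughly) linearly separable data.
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = load_iris()
X = iris.data[:, 2:4]                       # petal length, petal width
y = (iris.target == 2).astype(int)          # Iris virginica vs. the rest

svm_clf = make_pipeline(StandardScaler(), LinearSVC(C=1, random_state=42))
svm_clf.fit(X, y)
print(svm_clf.predict([[5.5, 1.7]]))        # classify a new flower
```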

HARD (LARGE) MARGIN
• It is clear that there can be multiple hyperplanes that segregate our data points, i.e., that separate the red and blue circles.
• One reasonable choice for the best hyperplane is the one that represents the largest separation, or margin, between the two classes.
• So we choose the hyperplane whose distance to the nearest data point on each side is maximized. If such a hyperplane exists, it is known as the maximum-margin hyperplane, or hard margin.

FEATURE SCALING IN SVM

[Figure: linear SVM decision boundaries with feature scaling vs. without feature scaling]
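
A minimal sketch of why this matters in practice: placing a StandardScaler in front of the SVM inside a pipeline. The tiny made-up dataset only illustrates the pattern:

```python
# Hedged sketch: SVMs are sensitive to feature scales, so scale features first.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.array([[1.0, 50.0], [5.0, 20.0], [3.0, 80.0], [5.0, 60.0]])  # very different scales
y = np.array([0, 0, 1, 1])

unscaled_svm = SVC(kernel="linear").fit(X, y)                        # large-range feature dominates
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel="linear")).fit(X, y)
# With scaling, both features contribute comparably to the margin;
# without it, the decision boundary is driven almost entirely by the second feature.
```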

SOFT MARGIN
There are two main issues with hard margin classification:
• First, it only works if the data is linearly separable.
• Second, it is quite sensitive to outliers.

• The objective is to find a good balance between keeping the margin as large as possible and limiting the margin violations.

SOFT MARGIN
• The regularization parameter "C" in SVM controls the trade-off between maximizing the margin and minimizing training errors. A small C focuses on maximizing the margin, potentially allowing some misclassifications (soft margin) and preventing overfitting; a large C focuses on minimizing training errors, at the cost of a smaller margin and a higher risk of overfitting.
• On the left, using a low C value, the margin is quite large, but many instances end up on the street.
• On the right, using a high C value, the classifier makes fewer margin violations but ends up with a smaller margin.
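
A small sketch of the C trade-off (the dataset and the two C values are arbitrary illustrative choices):

```python
# Hedged sketch: effect of the C hyperparameter on a soft-margin linear SVM.
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = load_iris()
X = iris.data[:, 2:4]
y = (iris.target == 2).astype(int)

for C in (0.01, 100):
    clf = make_pipeline(StandardScaler(), LinearSVC(C=C, random_state=42)).fit(X, y)
    print(f"C={C:>6}: training accuracy = {clf.score(X, y):.3f}")
# A low C tolerates more margin violations (wider margin, stronger regularization);
# a high C penalizes violations heavily (narrower margin, closer to hard-margin behaviour).
```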

NON-LINEAR CLASSIFICATION METHODS

1. Adding Similarity Features
o The "similarity features" are the transformed features in a higher-dimensional space, computed based on a similarity function (a sketch of this follows below).

2. Kernel Trick
o A method where non-linear data is projected into a higher-dimensional space in which it can be linearly separated by a plane.
o Instead of explicitly transforming the input features into a higher-dimensional space, the kernel trick allows SVMs to work in this higher-dimensional space implicitly.
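
A hedged sketch of the first approach, computing Gaussian RBF similarity features explicitly; the 1-D toy data, the landmark positions, and gamma are made-up values for illustration:

```python
# Hedged sketch: adding RBF similarity features by hand, then fitting a linear SVM.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import LinearSVC

X = np.array([[-4.0], [-3.0], [-1.0], [0.0], [1.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1, 1, 0, 0])          # not linearly separable in 1-D

landmarks = np.array([[-2.0], [1.0]])        # arbitrarily chosen landmarks
X_sim = rbf_kernel(X, landmarks, gamma=0.3)  # one similarity feature per landmark

clf = LinearSVC(C=10, random_state=42).fit(X_sim, y)   # separable in similarity space
print(clf.predict(rbf_kernel(np.array([[0.5]]), landmarks, gamma=0.3)))
```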

KERNEL FUNCTION

1. Linear Kernel: suitable for linearly separable data.
2. Polynomial Kernel: introduces polynomial features, useful for capturing non-linear relationships.
3. Gaussian RBF Kernel: creates circular (radial) decision boundaries, suitable for complex patterns.
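
A hedged sketch of kernelized SVMs on a non-linear dataset; the moons data and the hyperparameter values are illustrative assumptions:

```python
# Hedged sketch: polynomial and Gaussian RBF kernels via the kernel trick.
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

poly_svm = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3, coef0=1, C=5))
rbf_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=5, C=0.001))

for name, clf in (("poly", poly_svm), ("rbf", rbf_svm)):
    clf.fit(X, y)
    print(name, "training accuracy:", round(clf.score(X, y), 3))
```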

COMPUTATIONAL COMPLEXITY & SELECTED KERNEL
• As a rule of thumb, you should always try the linear kernel
first (remember that LinearSVC is much faster than
SVC(kernel="linear")), especially if the training set is very
large or if it has plenty of features.
• In the absence of expert knowledge, the Radial Basis
Function kernel makes a good default kernel (once you
have established it is a problem requiring a non-linear
model).

SVM REGRESSION
WHAT IS THE DIFFERENCE BETWEEN CLASSIFICATION AND REGRESSION?

Classification
• The output variable is a category or label.
• The goal is to assign input data points to predefined categories or classes.
• Examples: spam detection (classifying emails as spam or not spam); image classification (identifying objects in an image).

Regression
• The output variable is a continuous value.
• The goal is to predict a numerical value based on input features.
• Examples: predicting house prices based on features like square footage, number of bedrooms, etc.; predicting a person's income based on education and experience.
• Support Vector Machines (SVMs) can be used for regression tasks as well, and this variant is known as Support Vector Regression (SVR).
• In SVR, the goal is to predict a continuous output, and the algorithm aims to find a function that fits the data while minimizing the prediction error.
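
A minimal SVR sketch; the synthetic linear-plus-noise data and the epsilon/C values are illustrative assumptions:

```python
# Hedged sketch: Support Vector Regression with an epsilon-insensitive margin.
import numpy as np
from sklearn.svm import LinearSVR, SVR

rng = np.random.RandomState(42)
X = 2 * rng.rand(100, 1)
y = 4 + 3 * X.ravel() + rng.randn(100)       # noisy linear relationship

lin_svr = LinearSVR(epsilon=0.5, random_state=42).fit(X, y)   # linear SVR
rbf_svr = SVR(kernel="rbf", C=100, epsilon=0.1).fit(X, y)     # non-linear SVR

print(lin_svr.predict([[1.0]]), rbf_svr.predict([[1.0]]))
```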

• The SVR approach allows for flexibility in capturing complex relationships in data, and it is particularly useful when dealing with datasets where the relationship between the input features and the output variable is not strictly linear.

• In summary, Support Vector Regression (SVR) is an extension of Support Vector Machines (SVMs) to regression tasks. It aims to predict continuous output variables by finding a function that fits as many instances as possible within a certain margin while limiting the error of the instances that fall outside it.

DECISION FUNCTION AND PREDICTIONS
DECISION FUNCTION
• The decision function is used to determine the class of a new instance.
• The decision function is w^T.x + b, where w is the vector of feature weights, x is the input instance, and b is the bias term.
• If w^T.x + b is positive, the predicted class ŷ is the positive class (1); otherwise, it is the negative class (0).

DECISION BOUNDARY
• The decision boundary is the set of points where the decision function is equal to 0.
• In a two-dimensional space (two features, like petal width and petal length), the decision boundary is a straight line.
• The goal of training a linear SVM is to find the optimal w and b values that define this decision boundary.
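
A hedged sketch relating w^T.x + b to scikit-learn's API; the data and C value reuse the illustrative choices from the earlier sketches:

```python
# Hedged sketch: computing the decision function manually and via scikit-learn.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

iris = load_iris()
X = iris.data[:, 2:4]                     # petal length, petal width
y = (iris.target == 2).astype(int)

svm_clf = LinearSVC(C=1, random_state=42).fit(X, y)
w = svm_clf.coef_[0]                      # feature weights
b = svm_clf.intercept_[0]                 # bias term

x_new = np.array([5.5, 1.7])
score = w @ x_new + b                     # decision function value w^T.x + b
print("manual decision function:", score)
print("sklearn decision_function:", svm_clf.decision_function([x_new])[0])
print("predicted class:", int(score > 0))
```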

MARGIN
• The margin is the distance between the decision boundary and the nearest data point from either class.
• SVM aims to maximize this margin. A wider margin generally leads to better generalization to new, unseen data.
• The points where the decision function is equal to 1 or -1 (represented by dashed lines) are parallel to and at an equal distance from the decision boundary, forming the margin.

TRAINING THE SVM
• Training the SVM involves finding the right values for w and b that maximize the margin.
• In a hard-margin SVM, the goal is to make the margin as wide as possible without allowing any data points to violate it.
• In a soft-margin SVM, some margin violations are allowed, but the goal is still to keep the margin reasonably wide while penalizing violations.

IN SUMMARY

The SVM tries to find the optimal decision boundary that separates the different classes in a way that maximizes the margin between them. The margin ensures a robust classification, and the SVM training process involves finding the right parameters to achieve this optimal separation.

TRAINING OBJECTIVE
HARD MARGIN LINEAR SVM CLASSIFIER OBJECTIVE
• The SVM wants a clean, clear separation between classes
with no errors. It's strict about not allowing any data points to
cross the decision boundary

1. Objective: The goal is to find a decision boundary (defined by w and b) that maximizes the margin between the different classes.

2. Constraint: For a hard-margin SVM, we want all positive instances to have a decision function greater than or equal to 1, and all negative instances to have a decision function less than or equal to -1.

3. Mathematical representation: The objective function to minimize is 1/2 · w^T w, which is equivalent to 1/2 · ||w||^2. The minimization is subject to the constraint t_i (w^T x_i + b) >= 1 for all instances, where t_i = 1 for positive instances and t_i = -1 for negative instances.

SOFT MARGIN LINEAR SVM CLASSIFIER OBJECTIVE

• The SVM allows for a bit of flexibility. It understands that the data may not be perfectly separable, so it permits some points to cross the decision boundary, but with caution.

SOFT MARGIN LINEAR SVM CLASSIFIER OBJECTIVE

1. Introduction of slack variables (ζ_i): Introduce slack variables ζ_i to measure how much each instance is allowed to violate the margin; ζ_i is greater than or equal to zero for each instance.

2. Conflicting objectives: There are two conflicting goals:
• minimizing the slack variables ζ_i to reduce margin violations
• minimizing 1/2 · w^T w to increase the margin

3. Trade-off with hyperparameter C: Introduce a hyperparameter C to balance these conflicting objectives. It allows you to control the trade-off between having a large margin and allowing some instances to violate the margin.

4. Mathematical representation: The objective function to minimize becomes 1/2 · w^T w + C Σ_{i=1}^{m} ζ_i, subject to the constraints t_i (w^T x_i + b) >= 1 - ζ_i and ζ_i >= 0 for all instances.

• For a hard margin SVM, the focus is on having no margin
violations, and the margin is maximized.
• For a soft margin SVM, some margin violations are allowed, controlled by slack variables (ζ_i), and the trade-off between margin size and violations is controlled by the hyperparameter C.
• In both cases, the objective is to find the optimal w and b
values that define the decision boundary while considering
the balance between maximizing the margin and allowing
for some flexibility in the presence of misclassifications.

QUADRATIC PROGRAMMING
DEFINITION AND COMPONENTS
• QP is a type of mathematical optimization problem where the goal is to minimize a quadratic objective function subject to linear constraints. The general form is: minimize 1/2 · p^T H p + f^T p, subject to A p <= b, where:
• p is a vector representing the parameters to be optimized.
• H is a matrix associated with the quadratic terms in the objective function.
• f is a vector associated with the linear terms in the objective function.
• A is a matrix representing the linear constraints.
• b is a vector defining the right-hand side of the linear constraints.
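
A hedged sketch that maps the hard-margin primal onto this QP form with p = [b, w] and solves it with cvxopt (an assumed external QP solver; any solver with the same interface would do). The tiny separable dataset is made up:

```python
# Hedged sketch: hard-margin linear SVM primal as a QP (min 1/2 p^T H p + f^T p, s.t. A p <= b).
import numpy as np
from cvxopt import matrix, solvers   # assumed QP solver

X = np.array([[1.0, 1.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])
t = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])      # labels in {-1, +1}
m, n = X.shape

# Parameter vector p = [b, w1, ..., wn]
H = np.zeros((n + 1, n + 1)); H[1:, 1:] = np.eye(n)  # quadratic term -> 1/2 * ||w||^2
f = np.zeros(n + 1)                                   # no linear term
A = -t[:, None] * np.hstack([np.ones((m, 1)), X])     # rows: -t_i * [1, x_i]
b_vec = -np.ones(m)                                   # A p <= -1  <=>  t_i (w^T x_i + b) >= 1

sol = solvers.qp(matrix(H), matrix(f), matrix(A), matrix(b_vec))
p = np.array(sol["x"]).ravel()
print("bias:", p[0], "weights:", p[1:])
```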

Hard margin linear SVM
• Objective: find the optimal parameters p for a linear SVM with a hard margin (clear separation, no margin violations).

Soft margin linear SVM
• Objective: allow for some margin violations (flexibility) while still maintaining a reasonably wide margin.

Using a QP solver
• Feed these parameters into a QP solver; the resulting p will contain the bias term and the feature weights.

THE DUAL PROBLEM
• The dual problem is a related optimization problem; in SVM, solving the dual problem gives the same solution as the primal problem under certain conditions.
• It involves minimizing a quadratic expression in the dual variables α: minimize 1/2 · Σ_i Σ_j α_i α_j t_i t_j (x_i · x_j) − Σ_i α_i, subject to α_i >= 0 for all i and Σ_i α_i t_i = 0.
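
A hedged sketch of what the dual-side solution looks like in scikit-learn, which solves the dual internally; the dataset and kernel choice are illustrative:

```python
# Hedged sketch: inspecting the support vectors and dual coefficients of a fitted kernel SVM.
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
y = (y == 2).astype(int)                 # binary task to keep the dual quantities simple

clf = SVC(kernel="rbf", C=1.0).fit(X, y)
print(clf.support_vectors_.shape)        # support vectors found by the dual solver
print(clf.dual_coef_.shape)              # dual coefficients (t_i * alpha_i), one per support vector
```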

ADVANTAGES OF DUAL PROBLEM

• Efficiency: Solving the dual problem is often faster than solving the primal problem when the number of training instances is smaller than the number of features.

• Kernel Trick: The dual problem makes the kernel trick possible, a technique that allows SVMs to implicitly operate in high-dimensional feature spaces without explicitly computing the transformation. This is particularly useful when dealing with non-linearly separable data.

THANK YOU

