Hands-On Machine Learning, 3rd Edition
CONTENTS
Introduction
Important Definitions
Soft Margin
Computational Complexity
SVM Regression
Training Objective
Quadratic Programming
INTRODUCTION
IS SVM USED FOR MULTICLASS DATASETS?
SVMs are fundamentally binary classifiers, but they handle multiclass datasets by combining several binary classifiers, typically with a one-versus-rest (OvR) or one-versus-one (OvO) strategy.
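A minimal sketch of multiclass classification (scikit-learn assumed, matching the LinearSVC/SVC classes mentioned later; the dataset is illustrative). SVC trains one binary classifier per pair of classes (OvO) under the hood:

```python
# Sketch (scikit-learn assumed): SVC handles a 3-class dataset directly;
# internally it trains one binary classifier per pair of classes (OvO).
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # 3 classes: setosa, versicolor, virginica
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)
print(clf.predict(X[:3]))           # predicted class labels
```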
IMPORTANT DEFINITIONS
• Hyperplane: The decision boundary that separates the classes in n-dimensional space. Many candidate boundaries can segregate the classes; SVM searches for the best one.
• Support Vectors: The data points closest to the hyperplane, which determine its position, are called support vectors.
• Margin: The distance between the support vectors and the hyperplane. A wider margin generally indicates better classification performance.
• Loss Function: SVR uses a loss function that penalizes deviations of predicted values from the actual values. Common choices include the epsilon-insensitive loss and the mean squared error.
• Kernel Trick: Like the classification SVM, SVR can benefit from the kernel trick, which lets the algorithm implicitly map the input data into a higher-dimensional space, making it possible to learn a decision function that is nonlinear in the original feature space.
LINEAR SVM CLASSIFIER
HARD (LARGE) MARGIN
• It is clear that there are multiple hyperplanes that segregate our data points, i.e., that classify the red and blue circles correctly.
• One reasonable choice for the best hyperplane is the one that represents the largest separation, or margin, between the two classes.
• So we choose the hyperplane whose distance to the nearest data point on each side is maximized. If such a hyperplane exists, it is known as the maximum-margin hyperplane, or hard margin.
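A minimal sketch of a near-hard-margin classifier (scikit-learn assumed; the blob data is made up for illustration). A very large C approximates a hard margin, since almost no violations are tolerated:

```python
# Near-hard-margin SVM: a huge C leaves essentially no room for violations.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X = np.r_[rng.normal(-2, 0.5, (20, 2)),   # two linearly separable blobs
          rng.normal(2, 0.5, (20, 2))]
y = np.r_[np.zeros(20), np.ones(20)]

clf = SVC(kernel="linear", C=1e9)   # C -> infinity approximates a hard margin
clf.fit(X, y)
print(clf.support_vectors_)         # the points that pin down the margin
```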
FEATURE SCALING IN SVM
SVMs are sensitive to the scales of the features: if one feature has a much larger scale than the others, the widest-margin hyperplane will mostly ignore the small-scale features. Scaling the data (e.g., standardization) before training usually produces a much better decision boundary.
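A minimal sketch of scaling before fitting (scikit-learn assumed; the book does this with StandardScaler inside a Pipeline):

```python
# Standardize features, then fit a linear SVM; scaling typically widens
# the margin in directions that were previously dominated by
# large-scale features.
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
svm_clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0))
svm_clf.fit(X, y)
```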
SOFT MARGIN
There are two main issues with hard margin classification:
• First, it only works if the data is linearly separable.
• Second, it is quite sensitive to outliers.
• The regularization parameter C in SVM controls the trade-off between maximizing the margin and minimizing training errors. A small C focuses on maximizing the margin, potentially allowing some misclassifications (soft margin) and preventing overfitting.
• On the left, using a low C value, the margin is quite large, but many instances end up on the street.
• On the right, using a high C value, the classifier makes fewer margin violations but ends up with a smaller margin, as sketched below.
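A small sketch contrasting the two regimes (scikit-learn assumed; the two C values are arbitrary):

```python
# Low C: wide margin, more margin violations tolerated (stronger regularization).
# High C: fewer violations, narrower margin, higher risk of overfitting.
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
soft_clf = LinearSVC(C=0.01)    # left plot: large margin, many violations
hard_clf = LinearSVC(C=100.0)   # right plot: small margin, few violations
for clf in (soft_clf, hard_clf):
    clf.fit(X, y)
```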
NONLINEAR CLASSIFICATION
NONLINEAR CLASSIFICATION METHODS
• Projection to higher dimensions: a method where nonlinear data is projected onto a higher-dimensional space such that it becomes easier to classify, since it can be linearly divided by a plane.
• Similarity features: the "similarity features" are the transformed features in the higher-dimensional space, computed based on a similarity function.
• Kernel trick: instead of explicitly transforming the input features into a higher-dimensional space, the kernel trick allows SVMs to work in this higher-dimensional space implicitly. (See the sketch below.)
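A minimal sketch of the explicit feature-expansion approach (scikit-learn assumed; make_moons stands in for any nonlinearly separable dataset):

```python
# Project the data into a higher-dimensional polynomial feature space,
# then fit a *linear* SVM there; the resulting boundary is nonlinear
# in the original space.
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)
poly_clf = make_pipeline(
    PolynomialFeatures(degree=3),
    StandardScaler(),
    LinearSVC(C=10, max_iter=10_000),
)
poly_clf.fit(X, y)
```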
KERNEL FUNCTIONS
• Linear kernel: suitable for linearly separable data.
• Polynomial kernel: introduces polynomial features, useful for capturing nonlinear relationships.
• RBF (Gaussian) kernel: creates circular decision boundaries, suitable for complex patterns.
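For reference, a sketch of the three kernels instantiated with scikit-learn's SVC (the hyperparameter values are illustrative only):

```python
from sklearn.svm import SVC

linear_clf = SVC(kernel="linear")                   # linearly separable data
poly_clf   = SVC(kernel="poly", degree=3, coef0=1)  # polynomial relationships
rbf_clf    = SVC(kernel="rbf", gamma=5, C=1.0)      # complex, circular boundaries
```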
COMPUTATIONAL COMPLEXITY & KERNEL SELECTION
• As a rule of thumb, you should always try the linear kernel
first (remember that LinearSVC is much faster than
SVC(kernel="linear")), especially if the training set is very
large or if it has plenty of features.
• In the absence of expert knowledge, the Radial Basis
Function kernel makes a good default kernel (once you
have established it is a problem requiring a non-linear
model).
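A sketch of the two options (scikit-learn assumed). LinearSVC trains in roughly O(m × n) time, while the kernelized SVC scales between roughly O(m² × n) and O(m³ × n), which is why it gets painfully slow on large training sets:

```python
# Same mathematical model, very different training algorithms:
from sklearn.svm import LinearSVC, SVC

fast_clf = LinearSVC(C=1.0)             # liblinear: ~O(m*n), scales well
slow_clf = SVC(kernel="linear", C=1.0)  # libsvm: ~O(m^2*n) to O(m^3*n)
```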
SVM REGRESSION
WHAT IS THE DIFFERENCE BETWEEN CLASSIFICATION AND REGRESSION?

Classification:
• The output variable is a category or label.
• The goal is to assign input data points to predefined categories or classes.

Regression:
• The output variable is a continuous value.
• The goal is to predict a numerical value based on input features.
• Support Vector Machine (SVM) can be used for regression tasks as well; this variant is known as Support Vector Regression (SVR).
• In SVR, the goal is to predict a continuous output, and the algorithm aims to find a function that fits the data while minimizing the prediction error.
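A minimal SVR sketch (scikit-learn assumed; the epsilon value and data are illustrative). In SVR the objective is reversed: the model tries to fit as many instances as possible inside a tube whose width is controlled by epsilon, and only points outside the tube contribute to the loss:

```python
# Linear Support Vector Regression: epsilon sets the width of the
# no-penalty tube around the predicted function.
import numpy as np
from sklearn.svm import LinearSVR

rng = np.random.default_rng(0)
X = 2 * rng.random((50, 1))
y = 4 + 3 * X[:, 0] + rng.normal(0, 0.5, 50)   # noisy linear target

svr = LinearSVR(epsilon=0.5)
svr.fit(X, y)
print(svr.predict([[1.5]]))
```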
• The SVR approach allows for flexibility in capturing
complex relationships in data, and it is particularly useful
when dealing with datasets where the relationship between
the input features and the output variable is not strictly
linear.
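To capture such nonlinear relationships, a kernelized SVR can be used; a sketch (scikit-learn assumed, hyperparameters illustrative):

```python
# Kernelized SVR fits a nonlinear function; C trades off flatness
# against tolerance for points outside the epsilon-tube.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1)) - 1
y = 0.2 + 0.1 * X[:, 0] + 0.5 * X[:, 0] ** 2 + rng.normal(0, 0.1, 100)

svr_poly = SVR(kernel="poly", degree=2, C=100, epsilon=0.1)
svr_poly.fit(X, y)
```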
DECISION FUNCTION AND PREDICTIONS
DECISION FUNCTION
• The decision function is used to determine the class of a new instance.
• If w^T x + b is positive, the predicted class y is the positive class (1); otherwise, it is the negative class (0).

DECISION BOUNDARY
• The decision boundary is the set of points where the decision function is equal to 0.
• The goal of training a linear SVM is to find the optimal w and b values that define this decision boundary.
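A sketch verifying this relationship (scikit-learn assumed; iris is reduced to two classes to keep the task binary):

```python
# The decision function of a linear SVM is literally w^T x + b.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X, y = X[y != 2], y[y != 2]                  # binary problem: classes 0 and 1

clf = LinearSVC(C=1.0, max_iter=10_000).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

scores = X @ w + b                           # manual decision function
assert np.allclose(scores, clf.decision_function(X))
y_pred = (scores >= 0).astype(int)           # positive side -> class 1
```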
MARGIN
• The margin is the distance between the decision boundary and the nearest data point from either class.
• SVM aims to maximize this margin. A wider margin generally leads to better generalization to new, unseen data.

TRAINING THE SVM
• Training the SVM involves finding the right values for w and b to maximize the margin.
• In a hard-margin SVM, the goal is to make the margin as wide as possible without allowing any data points to violate the margin.
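One step worth making explicit: for a linear SVM the margin width is tied directly to the norm of w (the standard derivation, consistent with the training objective below):

```latex
% Distance from a point x to the hyperplane w^T x + b = 0:
%   |w^T x + b| / ||w||
% The margin boundaries are where w^T x + b = +1 and -1, so the
% full width of the street is
\[
\text{margin width} = \frac{2}{\lVert \mathbf{w} \rVert}
\]
% Hence maximizing the margin is equivalent to minimizing ||w||.
```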
EXAMPLES
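As a concrete example, a short end-to-end sketch (scikit-learn assumed; the dataset and hyperparameters are illustrative):

```python
# Full pipeline: scale features, fit an RBF-kernel SVM, evaluate.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=2, C=1.0))
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```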
IN SUMMARY
TRAINING OBJECTIVE
HARD MARGIN LINEAR SVM CLASSIFIER OBJECTIVE
• The SVM wants a clean, clear separation between classes
with no errors. It's strict about not allowing any data points to
cross the decision boundary.
Objective: The goal is to find a decision boundary (defined by w and b) that maximizes the margin between the different classes.
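In equation form (the standard hard margin formulation, with t⁽ⁱ⁾ = 1 for positive instances and −1 for negative ones):

```latex
\[
\min_{\mathbf{w},\, b} \; \frac{1}{2}\,\mathbf{w}^\mathsf{T}\mathbf{w}
\quad \text{subject to} \quad
t^{(i)}\!\left(\mathbf{w}^\mathsf{T}\mathbf{x}^{(i)} + b\right) \ge 1
\quad \text{for } i = 1, 2, \dots, m
\]
```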
SOFT MARGIN LINEAR SVM CLASSIFIER OBJECTIVE
• For a hard margin SVM, the focus is on having no margin
violations, and the margin is maximized.
• For a soft margin SVM, some margin violations are allowed,
controlled by slack variables (ζi), and the trade-off between
margin size and violations is controlled by the
hyperparameter C.
• In both cases, the objective is to find the optimal w and b
values that define the decision boundary while considering
the balance between maximizing the margin and allowing
for some flexibility in the presence of misclassifications.
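In equation form (the standard soft margin formulation; ζ⁽ⁱ⁾ measures how much the i-th instance is allowed to violate the margin):

```latex
\[
\min_{\mathbf{w},\, b,\, \zeta} \;
\frac{1}{2}\,\mathbf{w}^\mathsf{T}\mathbf{w} + C \sum_{i=1}^{m} \zeta^{(i)}
\quad \text{subject to} \quad
t^{(i)}\!\left(\mathbf{w}^\mathsf{T}\mathbf{x}^{(i)} + b\right) \ge 1 - \zeta^{(i)}
\ \text{ and } \ \zeta^{(i)} \ge 0
\quad \text{for } i = 1, \dots, m
\]
```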
QUADRATIC PROGRAMMING
DEFINITION AND COMPONENTS
• QP is a type of mathematical optimization problem where
the goal is to minimize a quadratic objective function
subject to linear constraints.
• p is a vector representing parameters to be optimized.
• H is a matrix associated with the quadratic terms in the
objective function.
• f is a vector associated with linear terms in the objective
function.
• A is a matrix representing linear constraints.
• b is a vector defining the right-hand side of the linear constraints.
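Putting the components together, the general QP problem reads (standard form, matching the definitions above):

```latex
\[
\min_{\mathbf{p}} \; \frac{1}{2}\,\mathbf{p}^\mathsf{T}\mathbf{H}\,\mathbf{p}
  + \mathbf{f}^\mathsf{T}\mathbf{p}
\quad \text{subject to} \quad
\mathbf{A}\,\mathbf{p} \le \mathbf{b}
\]
```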
Both the hard margin and the soft margin linear SVM objectives can be expressed as QP problems by plugging in the appropriate H, f, A, and b parameters.
THE DUAL PROBLEM
• The dual problem is a related optimization problem, and in
SVM, solving the dual problem gives the same solution as
the primal problem under certain conditions.
• It involves minimizing a quadratic expression involving the
dual variables (α).
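In equation form (the standard dual of the hard margin problem; the α⁽ⁱ⁾ ≥ 0 are the dual variables):

```latex
\[
\min_{\alpha} \;
\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m}
  \alpha^{(i)} \alpha^{(j)} t^{(i)} t^{(j)}\,
  {\mathbf{x}^{(i)}}^{\mathsf{T}} \mathbf{x}^{(j)}
  \;-\; \sum_{i=1}^{m} \alpha^{(i)}
\quad \text{subject to} \quad
\alpha^{(i)} \ge 0 \ \text{for all } i,
\qquad \sum_{i=1}^{m} \alpha^{(i)} t^{(i)} = 0
\]
% Once the optimal alphas are found, the primal solution is recovered as
% w = sum_i alpha_i t_i x_i, and b from any support vector.
```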
ADVANTAGES OF THE DUAL PROBLEM
• The dual problem is faster to solve than the primal one when the number of training instances is smaller than the number of features.
• More importantly, the dual problem makes the kernel trick possible, while the primal does not.
THANK YOU