Hands-On Machine Learning, 3rd Edition

This chapter discusses support vector machines (SVMs) including linear SVM classification for linearly separable data using a hard margin. It introduces soft margins to handle non-separable data and allow for some misclassification. Non-linear classification techniques like kernel tricks and similarity features are described to project data into higher dimensions for linear separation. SVM regression is also covered, which aims to predict continuous output values. Key concepts like support vectors, margins, kernels, and computational complexity are defined.


CHAPTER 5: SUPPORT VECTOR MACHINES (SVM)

CONTENT
Introduction

Important Definitions

Linear SVM Classification (Hard Margin)

Soft Margin

Feature Scaling in SVM

Non-linear Classification

Computational Complexity

SVM Regression

Decision Function and Predictions

Training Objective

Quadratic Programming

The Dual Problem

INTRODUCTION

o A Support Vector Machine (SVM) is a powerful and versatile machine learning model, capable of performing both linear and non-linear classification.
o SVMs are particularly well suited to classification of complex, small- to medium-sized datasets.
o The goal of the SVM algorithm is to find the best line or decision boundary that segregates the n-dimensional feature space into classes, so that new data points can easily be assigned to the correct category in the future. This best decision boundary is called a hyperplane.

INTRODUCTION

Pros and Cons of SVM


➢ Pros
• Versatile Kernel Functions
• Effective in High-Dimensional Spaces
• Effective in cases where the number of features is greater than
the number of data points (samples).
➢ Cons
• Sensitivity to Noise: SVMs are sensitive to noise in the
dataset, as outliers might affect the construction of the
hyperplane and margin.
• Scalability Issues: SVMs might not be suitable for large-
scale datasets due to their computational demands and
memory requirements, which can make training time-
consuming.

IS SVM USED FOR MULTICLASS DATASETS?

o SVMs were originally designed for binary classification, but several strategies exist to extend them to handle multiple classes.
o Two common approaches are the "one-vs-one" and "one-vs-all" (one-vs-rest) strategies, as sketched below.

IMPORTANT DEFINITIONS
• Hyperplane: There can be multiple lines/decision boundaries that segregate the classes in n-dimensional space; the best decision boundary, the one the SVM chooses, is called the hyperplane.
• Support Vectors: The data points or vectors that are closest to the hyperplane and which affect its position are termed support vectors.
• Margin: The margin is the distance between the support vectors and the hyperplane. A wider margin indicates better classification performance.
• Loss Function: SVR uses a loss function that penalizes deviations of predicted values from the actual values. Common loss functions include the epsilon-insensitive loss or the mean squared error.
• Kernel Trick: Like the classification SVM, SVR can also benefit from the kernel trick, which allows the algorithm to implicitly map the input data into a higher-dimensional space, making it possible to learn a decision function that is non-linear in the original feature space.

LINEAR SVM CLASSIFIER

• The linear SVM classifier is used for linearly separable data: if a dataset can be split into two classes by a single straight line (a hyperplane in higher dimensions), the data is termed linearly separable, and the classifier used is called a linear SVM classifier.
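
A minimal sketch of a linear SVM classifier in scikit-learn; the iris petal features, the scaler, and C=1 are illustrative assumptions rather than choices made in these slides:

```python
# Hedged sketch: fitting a linear SVM classifier on (roughly) linearly separable data.
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = load_iris()
X = iris.data[:, 2:4]                       # petal length, petal width
y = (iris.target == 2).astype(int)          # Iris virginica vs. the rest

svm_clf = make_pipeline(StandardScaler(), LinearSVC(C=1, random_state=42))
svm_clf.fit(X, y)
print(svm_clf.predict([[5.5, 1.7]]))        # classify a new flower
```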

HARD (LARGE) MARGIN
• It is clear that there can be multiple hyperplanes that segregate our data points, i.e., that separate the red and blue circles.
• One reasonable choice for the best hyperplane is the one that represents the largest separation, or margin, between the two classes.
• So we choose the hyperplane whose distance to the nearest data point on each side is maximized. If such a hyperplane exists, it is known as the maximum-margin hyperplane, or hard margin.

FEATURE SCALING IN SVM

[Figure: linear SVM decision boundaries with feature scaling vs. without feature scaling]
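
A minimal sketch of why this matters in practice: placing a StandardScaler in front of the SVM inside a pipeline. The tiny made-up dataset only illustrates the pattern:

```python
# Hedged sketch: SVMs are sensitive to feature scales, so scale features first.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.array([[1.0, 50.0], [5.0, 20.0], [3.0, 80.0], [5.0, 60.0]])  # very different scales
y = np.array([0, 0, 1, 1])

unscaled_svm = SVC(kernel="linear").fit(X, y)                        # large-range feature dominates
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel="linear")).fit(X, y)
# With scaling, both features contribute comparably to the margin;
# without it, the decision boundary is driven almost entirely by the second feature.
```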

SOFT MARGIN
There are two main issues with hard margin classification:
• First, it only works if the data is linearly separable.
• Second, it is quite sensitive to outliers.

• The objective is to find a good balance between keeping the margin as large as possible and limiting the margin violations.

SOFT MARGIN
• The regularization parameter "C" in SVM controls the trade-off between maximizing the margin and minimizing training errors. A small C focuses on maximizing the margin, potentially allowing some misclassifications (soft margin) and preventing overfitting; a large C focuses on minimizing training errors, at the cost of a smaller margin and a higher risk of overfitting.
• On the left, using a low C value, the margin is quite large, but many instances end up on the street.
• On the right, using a high C value, the classifier makes fewer margin violations but ends up with a smaller margin.
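
A small sketch of the C trade-off (the dataset and the two C values are arbitrary illustrative choices):

```python
# Hedged sketch: effect of the C hyperparameter on a soft-margin linear SVM.
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = load_iris()
X = iris.data[:, 2:4]
y = (iris.target == 2).astype(int)

for C in (0.01, 100):
    clf = make_pipeline(StandardScaler(), LinearSVC(C=C, random_state=42)).fit(X, y)
    print(f"C={C:>6}: training accuracy = {clf.score(X, y):.3f}")
# A low C tolerates more margin violations (wider margin, stronger regularization);
# a high C penalizes violations heavily (narrower margin, closer to hard-margin behaviour).
```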

NON-LINEAR CLASSIFICATION METHODS

1. Adding Similarity Features
o The "similarity features" are the transformed features in a higher-dimensional space, computed based on a similarity function (a sketch of this follows below).

2. Kernel Trick
o A method where non-linear data is projected into a higher-dimensional space in which it can be linearly separated by a plane.
o Instead of explicitly transforming the input features into a higher-dimensional space, the kernel trick allows SVMs to work in this higher-dimensional space implicitly.
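
A hedged sketch of the first approach, computing Gaussian RBF similarity features explicitly; the 1-D toy data, the landmark positions, and gamma are made-up values for illustration:

```python
# Hedged sketch: adding RBF similarity features by hand, then fitting a linear SVM.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import LinearSVC

X = np.array([[-4.0], [-3.0], [-1.0], [0.0], [1.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1, 1, 0, 0])          # not linearly separable in 1-D

landmarks = np.array([[-2.0], [1.0]])        # arbitrarily chosen landmarks
X_sim = rbf_kernel(X, landmarks, gamma=0.3)  # one similarity feature per landmark

clf = LinearSVC(C=10, random_state=42).fit(X_sim, y)   # separable in similarity space
print(clf.predict(rbf_kernel(np.array([[0.5]]), landmarks, gamma=0.3)))
```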

KERNEL FUNCTION

1. Linear Kernel: suitable for linearly separable data.
2. Polynomial Kernel: introduces polynomial features, useful for capturing non-linear relationships.
3. Gaussian RBF Kernel: creates circular (radial) decision boundaries, suitable for complex patterns.
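
A hedged sketch of kernelized SVMs on a non-linear dataset; the moons data and the hyperparameter values are illustrative assumptions:

```python
# Hedged sketch: polynomial and Gaussian RBF kernels via the kernel trick.
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

poly_svm = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3, coef0=1, C=5))
rbf_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=5, C=0.001))

for name, clf in (("poly", poly_svm), ("rbf", rbf_svm)):
    clf.fit(X, y)
    print(name, "training accuracy:", round(clf.score(X, y), 3))
```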

COMPUTATIONAL COMPLEXITY & SELECTED KERNEL
• As a rule of thumb, you should always try the linear kernel
first (remember that LinearSVC is much faster than
SVC(kernel="linear")), especially if the training set is very
large or if it has plenty of features.
• In the absence of expert knowledge, the Radial Basis
Function kernel makes a good default kernel (once you
have established it is a problem requiring a non-linear
model).

SVM REGRESSION
WHAT IS THE DIFFERENCE BETWEEN CLASSIFICATION AND REGRESSION?

Classification
• The output variable is a category or label.
• The goal is to assign input data points to predefined categories or classes.
• Examples: spam detection (classifying emails as spam or not spam); image classification (identifying objects in an image).

Regression
• The output variable is a continuous value.
• The goal is to predict a numerical value based on input features.
• Examples: predicting house prices based on features like square footage, number of bedrooms, etc.; predicting a person's income based on education and experience.
• Support Vector Machines (SVMs) can be used for regression tasks as well, and this variant is known as Support Vector Regression (SVR).
• In SVR, the goal is to predict a continuous output, and the algorithm aims to find a function that fits the data while minimizing the prediction error.
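
A minimal SVR sketch; the synthetic linear-plus-noise data and the epsilon/C values are illustrative assumptions:

```python
# Hedged sketch: Support Vector Regression with an epsilon-insensitive margin.
import numpy as np
from sklearn.svm import LinearSVR, SVR

rng = np.random.RandomState(42)
X = 2 * rng.rand(100, 1)
y = 4 + 3 * X.ravel() + rng.randn(100)       # noisy linear relationship

lin_svr = LinearSVR(epsilon=0.5, random_state=42).fit(X, y)   # linear SVR
rbf_svr = SVR(kernel="rbf", C=100, epsilon=0.1).fit(X, y)     # non-linear SVR

print(lin_svr.predict([[1.0]]), rbf_svr.predict([[1.0]]))
```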

• The SVR approach allows for flexibility in capturing complex relationships in data, and it is particularly useful when dealing with datasets where the relationship between the input features and the output variable is not strictly linear.

• In summary, Support Vector Regression (SVR) is an extension of Support Vector Machines (SVMs) to regression tasks. It aims to predict continuous output variables by finding a function that fits as many instances as possible within a certain margin while limiting the error of the instances that fall outside it.

DECISION FUNCTION AND PREDICTIONS
DECISION FUNCTION
• The decision function is used to determine the class of a new instance.
• The decision function is w^T.x + b, where w is the vector of feature weights, x is the input instance, and b is the bias term.
• If w^T.x + b is positive, the predicted class ŷ is the positive class (1); otherwise, it is the negative class (0).

DECISION BOUNDARY
• The decision boundary is the set of points where the decision function is equal to 0.
• In a two-dimensional space (two features, like petal width and petal length), the decision boundary is a straight line.
• The goal of training a linear SVM is to find the optimal w and b values that define this decision boundary.
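
A hedged sketch relating w^T.x + b to scikit-learn's API; the data and C value reuse the illustrative choices from the earlier sketches:

```python
# Hedged sketch: computing the decision function manually and via scikit-learn.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

iris = load_iris()
X = iris.data[:, 2:4]                     # petal length, petal width
y = (iris.target == 2).astype(int)

svm_clf = LinearSVC(C=1, random_state=42).fit(X, y)
w = svm_clf.coef_[0]                      # feature weights
b = svm_clf.intercept_[0]                 # bias term

x_new = np.array([5.5, 1.7])
score = w @ x_new + b                     # decision function value w^T.x + b
print("manual decision function:", score)
print("sklearn decision_function:", svm_clf.decision_function([x_new])[0])
print("predicted class:", int(score > 0))
```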

MARGIN
• The margin is the distance between the decision boundary and the nearest data point from either class.
• SVM aims to maximize this margin. A wider margin generally leads to better generalization to new, unseen data.
• The points where the decision function is equal to 1 or -1 (represented by dashed lines) are parallel to and at an equal distance from the decision boundary, forming the margin.

TRAINING THE SVM
• Training the SVM involves finding the right values for w and b that maximize the margin.
• In a hard-margin SVM, the goal is to make the margin as wide as possible without allowing any data points to violate it.
• In a soft-margin SVM, some margin violations are allowed, but the goal is still to keep the margin reasonably wide while penalizing violations.

IN SUMMARY

The SVM tries to find the optimal decision boundary that separates the different classes in a way that maximizes the margin between them. The margin ensures a robust classification, and the SVM training process involves finding the right parameters to achieve this optimal separation.

TRAINING OBJECTIVE
HARD MARGIN LINEAR SVM CLASSIFIER OBJECTIVE
• The SVM wants a clean, clear separation between classes
with no errors. It's strict about not allowing any data points to
cross the decision boundary

1. Objective: The goal is to find a decision boundary (defined by w and b) that maximizes the margin between the different classes.

2. Constraint: For a hard-margin SVM, we want all positive instances to have a decision function greater than or equal to 1, and all negative instances to have a decision function less than or equal to -1.

3. Mathematical representation: The objective function to minimize is 1/2 · w^T w, which is equivalent to 1/2 · ||w||^2. The minimization is subject to the constraint t_i (w^T x_i + b) >= 1 for all instances, where t_i = 1 for positive instances and t_i = -1 for negative instances.

SOFT MARGIN LINEAR SVM CLASSIFIER OBJECTIVE

• The SVM allows for a bit of flexibility. It understands that the data may not be perfectly separable, so it permits some points to cross the decision boundary, but with caution.

SOFT MARGIN LINEAR SVM CLASSIFIER OBJECTIVE

1. Introduction of slack variables (ζ_i): Introduce slack variables ζ_i to measure how much each instance is allowed to violate the margin; ζ_i is greater than or equal to zero for each instance.

2. Conflicting objectives: There are two conflicting goals:
• minimizing the slack variables ζ_i to reduce margin violations
• minimizing 1/2 · w^T w to increase the margin

3. Trade-off with hyperparameter C: Introduce a hyperparameter C to balance these conflicting objectives. It allows you to control the trade-off between having a large margin and allowing some instances to violate the margin.

4. Mathematical representation: The objective function to minimize becomes 1/2 · w^T w + C Σ_{i=1}^{m} ζ_i, subject to the constraints t_i (w^T x_i + b) >= 1 - ζ_i and ζ_i >= 0 for all instances.

• For a hard margin SVM, the focus is on having no margin
violations, and the margin is maximized.
• For a soft margin SVM, some margin violations are allowed, controlled by slack variables (ζ_i), and the trade-off between margin size and violations is controlled by the hyperparameter C.
• In both cases, the objective is to find the optimal w and b
values that define the decision boundary while considering
the balance between maximizing the margin and allowing
for some flexibility in the presence of misclassifications.

QUADRATIC PROGRAMMING
DEFINITION AND COMPONENTS
• QP is a type of mathematical optimization problem where the goal is to minimize a quadratic objective function subject to linear constraints. The general form is: minimize 1/2 · p^T H p + f^T p, subject to A p <= b, where:
• p is a vector representing the parameters to be optimized.
• H is a matrix associated with the quadratic terms in the objective function.
• f is a vector associated with the linear terms in the objective function.
• A is a matrix representing the linear constraints.
• b is a vector defining the right-hand side of the linear constraints.
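
A hedged sketch that maps the hard-margin primal onto this QP form with p = [b, w] and solves it with cvxopt (an assumed external QP solver; any solver with the same interface would do). The tiny separable dataset is made up:

```python
# Hedged sketch: hard-margin linear SVM primal as a QP (min 1/2 p^T H p + f^T p, s.t. A p <= b).
import numpy as np
from cvxopt import matrix, solvers   # assumed QP solver

X = np.array([[1.0, 1.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])
t = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])      # labels in {-1, +1}
m, n = X.shape

# Parameter vector p = [b, w1, ..., wn]
H = np.zeros((n + 1, n + 1)); H[1:, 1:] = np.eye(n)  # quadratic term -> 1/2 * ||w||^2
f = np.zeros(n + 1)                                   # no linear term
A = -t[:, None] * np.hstack([np.ones((m, 1)), X])     # rows: -t_i * [1, x_i]
b_vec = -np.ones(m)                                   # A p <= -1  <=>  t_i (w^T x_i + b) >= 1

sol = solvers.qp(matrix(H), matrix(f), matrix(A), matrix(b_vec))
p = np.array(sol["x"]).ravel()
print("bias:", p[0], "weights:", p[1:])
```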

Hard margin linear SVM
• Objective: find the optimal parameters p for a linear SVM with a hard margin (clear separation, no margin violations).

Soft margin linear SVM
• Objective: allow for some margin violations (flexibility) while still maintaining a reasonably wide margin.

Using a QP solver
• Feed these parameters into a QP solver; the resulting p will contain the bias term and the feature weights.

THE DUAL PROBLEM
• The dual problem is a related optimization problem; in SVM, solving the dual problem gives the same solution as the primal problem under certain conditions.
• It involves minimizing a quadratic expression in the dual variables α: minimize 1/2 · Σ_i Σ_j α_i α_j t_i t_j (x_i · x_j) − Σ_i α_i, subject to α_i >= 0 for all i and Σ_i α_i t_i = 0.
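
A hedged sketch of what the dual-side solution looks like in scikit-learn, which solves the dual internally; the dataset and kernel choice are illustrative:

```python
# Hedged sketch: inspecting the support vectors and dual coefficients of a fitted kernel SVM.
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
y = (y == 2).astype(int)                 # binary task to keep the dual quantities simple

clf = SVC(kernel="rbf", C=1.0).fit(X, y)
print(clf.support_vectors_.shape)        # support vectors found by the dual solver
print(clf.dual_coef_.shape)              # dual coefficients (t_i * alpha_i), one per support vector
```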

ADVANTAGES OF DUAL PROBLEM

• Efficiency: Solving the dual problem is often faster than solving the primal problem when the number of training instances is smaller than the number of features.

• Kernel Trick: The dual problem makes the kernel trick possible, a technique that allows SVMs to implicitly operate in high-dimensional feature spaces without explicitly computing the transformation. This is particularly useful when dealing with non-linearly separable data.

THANK YOU

