
Support Vector Machine

Prof. Dr. Magdy M. Aboul-Ela


Email: [email protected]

Dr. Heba Zaki


Email: [email protected]
WHAT IS AN SVM?
• Support Vector Machine (SVM) is a powerful supervised machine
learning algorithm used for classification, regression, and even outlier
detection tasks.
• SVMs can be used for a variety of tasks, such as text classification,
image classification, spam detection, handwriting identification,
gene expression analysis, face detection, and anomaly detection.
• SVMs are adaptable and efficient in a variety of applications because
they can manage high-dimensional data and nonlinear relationships.
• SVM algorithms are very effective because they seek the maximum-margin
separating hyperplane between the different classes in the
target feature.
SVM (CONT.)

• The goal of the SVM algorithm is to create the best line or decision
boundary that can separate n-dimensional space into classes so
that we can easily put the new data point in the correct category in
the future.
• This best decision boundary is called a hyperplane.
• The decision boundary created by SVMs is called the maximum
margin classifier or the maximum-margin hyperplane.
HOW DOES AN SVM WORK?
• A simple linear SVM classifier works by making a straight line
between two classes.
• That means all of the data points on one side of the line will
represent a category and the data points on the other side of the
line will be put into a different category.
• This means there can be an infinite number of lines to choose
from.
• It chooses the best line to classify the data points: the one that is as far
away from the closest data points as possible.
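To make this concrete, here is a minimal sketch of a linear SVM classifier using scikit-learn (the toy data points are made up for illustration):

import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two groups that a straight line can separate.
X = np.array([[1, 1], [2, 1], [1, 2],    # class 0
              [5, 5], [6, 5], [5, 6]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")    # a straight-line decision boundary
clf.fit(X, y)

print(clf.predict([[1.5, 1.5], [5.5, 5.5]]))   # -> [0 1]
print(clf.support_vectors_)   # the closest points, which define the line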
HYPERPLANE
• The best hyperplane is the one that represents the largest separation or
margin between the two classes.
• In the accompanying figure, the maximum-margin hyperplane is the line
labeled L2.
• Selecting a hyperplane for data with outliers:
• The SVM algorithm has the ability to ignore outliers
and find the hyperplane that maximizes the margin.
• SVM is robust to outliers.
HYPERPLANE (CONT.)
• We're trying to find the line between the two closest points that keeps the other
data points separated.
• So, the two closest data points give us the support vectors we'll use to find that line.
That line is called the decision boundary (hyperplane).
• The decision boundary doesn't have to be a line.
• It's also referred to as a hyperplane because you can find the
decision boundary with any number of features, not just two.
UNDERSTANDING SVM
Suppose we're given these two samples of blue stars and purple hearts (just a
schematic representation; no real data are used here), and our job is to find
the line that separates them best.
What do we mean by best here?
UNDERSTANDING SVM (CONT.)
• Let’s see the image below. Could you guess which line would
separate the two samples better?

UNDERSTANDING SVM (CONT.)

• Yes, the red line on the left is better than the orange line because it
creates the 'widest road' (margin) between the two
groups.
UNDERSTANDING SVM (CONT.)

• The samples on the edge of the boundary lines (the dotted lines) are known
as 'Support Vectors'. On the left side there are two such samples (blue
stars), compared to one on the right.
• A few important points about support vectors:
• Support vectors are the samples that are most difficult to classify.
• They directly affect the process of finding the optimum location of the decision
boundaries (dotted lines).
• Only a very small subset of the training samples (the support vectors) can fully specify the
decision function.
• (We will see this in more detail once we learn the math behind SVM.)
WHY SVMS ARE USED IN MACHINE LEARNING

• It can handle both classification and regression on linear and


non-linear data.
• They can find complex relationships between your data without
you needing to do a lot of transformations on your own.
• It's a great option when you are working with smaller datasets
that have tens to hundreds of thousands of features.
• More accurate results when compared to other algorithms because
of their ability to handle small, complex datasets.
11
TYPES OF SVMS
SVMs can be of two types:
1- Simple SVM: Typically used for linear regression and
classification problems.
• It is used for linearly separable data: a dataset that can
be classified into two classes by a single straight line.
2- Kernel SVM: Has more flexibility for non-linear data because
you can add more features to fit a hyperplane in a higher-dimensional
space rather than a two-dimensional space.
1- SIMPLE (LINEAR) SVM:
• Suppose we have a dataset that has two tags
(green and blue), and the dataset has two features
x1 and x2.
• We want a classifier that can classify the pair (x1, x2)
of coordinates as either green or blue.
• Since this is a 2-D space, we can easily separate these
two classes just by using a straight line.
• But there can be multiple lines that can separate
these classes.
1- SIMPLE (LINEAR) SVM: (CONT.)
• The SVM algorithm helps to find the best line or
decision boundary (hyperplane).
• It finds the closest points from both classes
to the boundary.
• These points are called support vectors.
• The distance between the vectors and the hyperplane
is called the margin.
• The goal of SVM is to maximize this margin.
• The hyperplane with the maximum margin is called
the optimal hyperplane.
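As a small illustration, once a linear SVM is fitted, the margin width can be read off the learned weight vector w as 2/||w|| (a sketch assuming scikit-learn; the data and the large-C setting are made up for illustration):

import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 2.0], [4.0, 4.0], [5.0, 5.0]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear", C=1e6)   # a large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]                    # weight vector of the hyperplane w . x + b = 0
print(2.0 / np.linalg.norm(w))      # margin width: distance between the dotted lines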


2- KERNEL SVM
• Linear Kernel functions
• These are commonly recommended for text classification because most of
these types of classification problems are linearly separable.
• Linear kernel functions are faster than most of the others.
• Here's the function that defines the linear decision boundary:
f(X) = w^T * X + b
• In this equation, w^T is the weight vector that you want to minimize, X is
the data that you're trying to classify, and b is the bias (intercept)
estimated from the training data.
• f(X) defines the decision boundary that the SVM returns.
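As a sketch, the linear kernel itself is just the dot product of two feature vectors; the decision function above then applies the learned w and b (the function names here are illustrative):

import numpy as np

def linear_kernel(x1, x2):
    # k(x1, x2) = x1 . x2 -- the similarity score the linear kernel computes
    return np.dot(x1, x2)

def decision_function(w, b, x):
    # f(x) = w^T x + b from the slide; sign(f) gives the predicted class
    return np.dot(w, x) + b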
2- KERNEL SVM (CONT.)
• Polynomial Kernel functions
• It isn't used in practice very often because it isn't as
computationally efficient as other kernels and its predictions
aren't as accurate.
• Here's the function for a polynomial kernel:
f(X1, X2) = (a + X1^T * X2) ^ b
• f(X1, X2) represents the polynomial decision boundary that will separate
your data; a is a constant term and b is the polynomial degree.
• X1 and X2 represent your data.
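A minimal sketch of that formula (the parameter names mirror a and b above; degree 3 is just an example value):

import numpy as np

def polynomial_kernel(x1, x2, a=1.0, degree=3):
    # k(x1, x2) = (a + x1 . x2) ** degree, where degree plays the role of b
    return (a + np.dot(x1, x2)) ** degree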
2- KERNEL SVM (CONT.)
• Gaussian Radial Basis Function (RBF)
• One of the most powerful and commonly used kernels in SVMs.
Usually the choice for non-linear data.
• Here's the equation for an RBF kernel:
f(X1, X2) = exp(-gamma * ||X1 - X2||^2)
• The gamma parameter defines how far the influence of a
single training example reaches, with low values meaning 'far'
and high values meaning 'close'.
• ||X1 - X2|| is the Euclidean distance between the two feature vectors.
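A sketch of the RBF formula (the gamma value is arbitrary, chosen only for illustration):

import numpy as np

def rbf_kernel(x1, x2, gamma=0.5):
    # k(x1, x2) = exp(-gamma * ||x1 - x2||^2)
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))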
2- KERNEL SVM (CONT.)
• Sigmoid
• More useful in neural networks than in support vector machines,
but there are occasional specific use cases.
• Here's the function for a sigmoid kernel:
f(X, y) = tanh(alpha * X^T * y + C)
• In this function, alpha is a scaling parameter and C is an offset
value that accounts for some misclassification of the data.
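A sketch of the sigmoid kernel (the alpha and c defaults are illustrative):

import numpy as np

def sigmoid_kernel(x1, x2, alpha=0.01, c=0.0):
    # k(x1, x2) = tanh(alpha * x1 . x2 + c)
    return np.tanh(alpha * np.dot(x1, x2) + c)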
Linear Classifiers
A linear classifier maps an input x through f to an estimated label y_est:
f(x, w, b) = sign(w . x - b)
where w is the weight vector and x is the data vector.
[Figure: 2-D datapoints, with one marker denoting +1 and another denoting -1.
How would you classify this data?]
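Before answering, here is that decision rule as a sketch in code (sign(0) is treated as +1, which is one common convention):

import numpy as np

def linear_classifier(x, w, b):
    # f(x, w, b) = sign(w . x - b): returns the estimated label, +1 or -1
    return 1 if np.dot(w, x) - b >= 0 else -1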
Many candidate lines classify all of the training data correctly.
Any of these would be fine..
..but which is best?
Classifier Margin
Define the margin of a linear classifier as the width that the
boundary could be increased by before hitting a datapoint.
Maximum Margin
The maximum margin linear classifier is the linear classifier
with the maximum margin.
This is the simplest kind of SVM, called a Linear SVM (LSVM).
How to calculate the distance from a point to a line?
The line is w x + b = 0, where:
x – data vector
w – normal vector to the line
b – scalar offset
◼ In our case, w1*x1 + w2*x2 + b = 0,
◼ thus w = (w1, w2), x = (x1, x2)
Estimate the Margin
◼ What is the distance expression for a point x to the line w x + b = 0?

d(x) = |w . x + b| / ||w||_2 = |w . x + b| / sqrt(Σ_{i=1..d} w_i^2)
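That expression in code (the example line and point are made up for illustration):

import numpy as np

def distance_to_line(x, w, b):
    # d(x) = |w . x + b| / ||w||_2
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

# Distance from the point (1, 1) to the line x1 + x2 - 4 = 0:
print(distance_to_line(np.array([1.0, 1.0]), np.array([1.0, 1.0]), -4.0))  # ~1.414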
Large-margin Decision Boundary
◼ The decision boundary should be as far away from the data of both classes
as possible.
◼ We should maximize the margin, m.
◼ The distance between the origin and the line w^T x = -b is |b| / ||w||.
[Figure: Class 1 and Class 2 on either side of the boundary, separated by margin m.]
Finding the Decision Boundary
◼ Let {x1, ..., xn} be our data set and let yi ∈ {1, -1} be the class label of xi.
◼ The decision boundary should classify all points correctly.
◼ To see this:
when yi = -1, we want w xi + b <= -1;
when yi = 1, we want w xi + b >= 1;
so for all points, yi(w xi + b) >= 1, with equality for the support vectors.
◼ The decision boundary can be found by solving the following constrained
optimization problem (the standard hard-margin formulation):

minimize (1/2) ||w||^2 over w, b
subject to yi(w . xi + b) >= 1 for all i = 1, ..., n
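As an illustrative sketch, this small program solves that optimization problem directly with SciPy's general-purpose solver (the four data points are made up; real SVM libraries instead solve the dual with specialized QP methods):

import numpy as np
from scipy.optimize import minimize

# Four toy points: class -1 near the origin, class +1 farther out.
X = np.array([[1.0, 1.0], [2.0, 2.0], [4.0, 4.0], [5.0, 5.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

def objective(p):
    w = p[:2]
    return 0.5 * np.dot(w, w)  # (1/2) ||w||^2

# One constraint per point: y_i (w . x_i + b) - 1 >= 0
constraints = [{"type": "ineq",
                "fun": lambda p, i=i: y[i] * (np.dot(p[:2], X[i]) + p[2]) - 1.0}
               for i in range(len(y))]

res = minimize(objective, x0=np.array([1.0, 0.0, 0.0]), constraints=constraints)
w, b = res.x[:2], res.x[2]
print(w, b)  # expect w ~ (0.5, 0.5), b ~ -3: support vectors (2,2) and (4,4)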
HISTORY OF SVM

• SVM became popular because of its success in handwritten digit
recognition:
• a 1.1% test error rate for SVM, the same as the error rate of
a carefully constructed neural network.
• SVM is now regarded as an important example of "kernel
methods", one of the key areas in machine learning.
PROS AND CONS OF USING SVMS
• Pros:
• Effective on datasets with multiple features, like financial or medical data.
• Effective in cases where the number of features is greater than the number of data points.
• Uses a subset of training points in the decision function (called support vectors), which makes it
memory efficient.
• Different kernel functions can be specified for the decision function. You can use common kernels, but
it's also possible to specify custom kernels.

• Cons:
• If the number of features is much bigger than the number of data points, avoiding overfitting when
choosing the kernel function and regularization term is crucial.
• SVMs don't directly provide probability estimates. Those are calculated using an expensive five-fold
cross-validation.
• Because of their high training time, SVMs work best on small sample sets.
