ML Lec9 SVM
• The goal of the SVM algorithm is to find the best line or decision
boundary that separates n-dimensional space into classes, so that new
data points can easily be placed in the correct category.
• This best decision boundary is called a hyperplane.
• The decision boundary created by SVMs is called the maximum-margin
classifier or the maximum-margin hyperplane (see the sketch below).
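To make this concrete, here is a minimal sketch using scikit-learn's SVC with a
linear kernel; the toy points are made up purely for illustration.

    # A minimal sketch of the idea above: fit a maximum-margin linear
    # classifier and place a new point in a category.
    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[1, 2], [2, 3], [3, 3],    # one category
                  [6, 5], [7, 8], [8, 6]])   # the other category
    y = np.array([-1, -1, -1, 1, 1, 1])

    clf = SVC(kernel="linear")               # maximum-margin linear classifier
    clf.fit(X, y)

    # A new data point falls on one side of the learned hyperplane.
    print(clf.predict([[4, 3]]))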
HOW DOES SVM WORK?
• A simple linear SVM classifier works by drawing a straight line
between two classes.
• All of the data points on one side of the line represent one
category, and the data points on the other side of the line are put
into a different category.
• This means there can be an infinite number of lines to choose from.
• The algorithm chooses the line that is as far away as possible from
the closest data points of each class.
HYPERPLANE
• The best hyperplane is the one that represents the largest separation, or
margin, between the two classes.
• In the original figure, the maximum-margin hyperplane is the line labeled L2.
• Selecting a hyperplane for data with outliers:
• The SVM algorithm can ignore outliers and find the best hyperplane
that maximizes the margin.
• SVM is robust to outliers.
HYPERPLANE (CONT.)
• We're trying to find the line between the two closest points that keeps the
other data points separated.
• So, the two closest data points give us the support vectors we'll use to find
that line. That line is called the decision boundary (hyperplane).
• The decision boundary doesn't have to be a line.
• It's also referred to as a hyperplane because you can find the
decision boundary with any number of features, not just two.
UNDERSTANDING SVM
Suppose we’re given these two samples of blue stars and purple hearts (just a
schematic representation; no real data are used here), and our job is to find
the line that separates them best.
What do we mean by best here?
UNDERSTANDING SVM (CONT.)
• Let’s see the image below. Could you guess which line would
separate the two samples better?
UNDERSTANDING SVM (CONT.)
• Yes, the red line on the left is better than the orange line because the
red line creates the ‘widest road’ (margin) between the two groups.
UNDERSTANDING SVM (CONT.)
• The samples on the edge of the boundary lines (dotted lines) are known
as ‘Support Vectors’. On the left side there are two such samples (blue
stars), compared to one on the right.
• A few important points about support vectors:
• Support vectors are the samples that are most difficult to classify.
• They directly affect the process of finding the optimum location of the
decision boundaries (dotted lines).
• Only a very small subset of training samples (the support vectors) is
needed to fully specify the decision function, as the sketch below
illustrates.
• (We will see this in more detail once we learn the math behind SVM.)
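As a quick check of the last point, here is a sketch (with made-up toy data)
showing that refitting on the support vectors alone reproduces essentially the
same decision boundary:

    # The decision function is fully specified by the support vectors:
    # refitting on them alone gives the same hyperplane (up to tolerance).
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.RandomState(1)
    X = np.vstack([rng.randn(20, 2) - 2, rng.randn(20, 2) + 2])
    y = np.array([-1] * 20 + [1] * 20)

    clf = SVC(kernel="linear", C=10).fit(X, y)
    sv = clf.support_                          # indices of the support vectors
    clf_sv = SVC(kernel="linear", C=10).fit(X[sv], y[sv])

    print(clf.coef_, clf.intercept_)           # same line...
    print(clf_sv.coef_, clf_sv.intercept_)     # ...from the support vectors only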
WHY SVMS ARE USED IN MACHINE LEARNING
Linear Classifiers
f(x, w, b) = sign(w · x - b)
[Figure: an input x is fed to the classifier f, which outputs the estimate y_est; one marker denotes class +1, the other denotes class -1. Several different straight lines separating the two classes are drawn through the same data.]
Any of these would be fine... but which is best?
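Read literally, f picks a side of one particular line. Below is a sketch with a
hand-picked w and b standing in for any one of the candidate lines (the values
are illustrative, not learned):

    # f(x, w, b) = sign(w · x - b), read literally.
    import numpy as np

    def linear_classifier(x, w, b):
        """Return +1 or -1 depending on which side of the line x falls."""
        return 1 if np.dot(w, x) - b >= 0 else -1

    w = np.array([1.0, 1.0])   # normal vector of one candidate line
    b = 7.0                    # its offset
    print(linear_classifier(np.array([2.0, 3.0]), w, b))   # -> -1
    print(linear_classifier(np.array([7.0, 8.0]), w, b))   # -> +1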
Classifier Margin
Define the margin of a linear classifier as the width that the boundary
could be increased by before hitting a datapoint.
[Figure: the same two-class data with one separating line; the margin is the band around the line that is free of datapoints.]
Maximum Margin
The maximum-margin linear classifier is the linear classifier with the
maximum margin.
This is the simplest kind of SVM (called an LSVM): the linear SVM.
[Figure: the same data with the maximum-margin separating line.]
How to calculate the distance from a point to a line?
The line (hyperplane) is written as w·x + b = 0, where
• x – a point (vector),
• w – the normal vector of the line,
• b – a scalar offset (bias).
[Figure: a point x and its perpendicular distance, along the normal vector w, to the line w·x + b = 0.]
Estimate the Margin
The distance from a point x to the line w·x + b = 0 is

d(x) = |x·w + b| / ||w||_2 = |x·w + b| / sqrt( Σ_{i=1}^{d} w_i² )
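The same formula, verbatim in NumPy; the w and b below are illustrative values
(written in the w·x + b = 0 form used above), not learned parameters:

    # d(x) = |w·x + b| / ||w||_2, exactly as in the formula above.
    import numpy as np

    def distance_to_hyperplane(x, w, b):
        return abs(np.dot(w, x) + b) / np.linalg.norm(w)

    w = np.array([1.0, 1.0])
    b = -7.0                   # the line w·x + b = 0, i.e. x1 + x2 = 7
    print(distance_to_hyperplane(np.array([2.0, 3.0]), w, b))   # ~1.414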
Large-margin Decision Boundary
◼ The decision boundary should be as far away from the data of both classes
as possible.
◼ We should maximize the margin, m.
[Figure: Class 1 and Class 2 separated by a decision boundary with margin m.]
Finding the Decision Boundary
◼ Let {x1, ..., xn} be our data set and let yi ∈ {1, -1} be the class label of xi.
◼ The decision boundary should classify all points correctly.
◼ To see this:
when yi = -1, we wish w·xi + b ≤ -1;
when yi = 1, we wish w·xi + b ≥ 1.
For support vectors, yi(w·xi + b) = 1.
◼ The decision boundary can be found by solving the following constrained
optimization problem (a solver sketch follows below):

minimize (1/2)||w||²  subject to  yi(w·xi + b) ≥ 1 for all i.
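To see the optimization concretely, the sketch below hands it to a generic
constrained solver (scipy.optimize.minimize with SLSQP) on four made-up points;
real SVM implementations use specialized quadratic-programming solvers instead.

    # Hard-margin SVM as the constrained problem above:
    #   minimize (1/2)||w||^2  subject to  y_i (w·x_i + b) >= 1 for all i.
    import numpy as np
    from scipy.optimize import minimize

    X = np.array([[1.0, 2.0], [2.0, 3.0], [6.0, 5.0], [7.0, 8.0]])
    y = np.array([-1.0, -1.0, 1.0, 1.0])

    def objective(params):                    # params = [w1, w2, b]
        w = params[:2]
        return 0.5 * np.dot(w, w)

    def margin_constraints(params):           # each entry must be >= 0
        w, b = params[:2], params[2]
        return y * (X @ w + b) - 1.0

    res = minimize(objective, x0=np.zeros(3), method="SLSQP",
                   constraints=[{"type": "ineq", "fun": margin_constraints}])
    w, b = res.x[:2], res.x[2]
    print("w =", w, "b =", b, "margin width =", 2 / np.linalg.norm(w))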
HISTORY OF SVM
PROS AND CONS OF USING SVMS
• Pros:
• Effective on datasets with many features, like financial or medical data.
• Effective in cases where the number of features is greater than the number of data points.
• Uses a subset of training points in the decision function (called support vectors), which makes it
memory efficient.
• Different kernel functions can be specified for the decision function. You can use common kernels, but
it's also possible to specify custom kernels.
• Cons:
• If the number of features is much larger than the number of data points, choosing kernel
functions and the regularization term carefully is crucial to avoid overfitting.
• SVMs don't directly provide probability estimates; these are calculated using an expensive
five-fold cross-validation (see the sketch below).
• Works best on small sample sets because of its high training time.
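A sketch of the last two points, kernels and probability estimates, using
scikit-learn's SVC on made-up data:

    # Kernel choice and (cross-validated) probability estimates in SVC.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.RandomState(0)
    X = rng.randn(40, 2)
    y = (X[:, 0] * X[:, 1] > 0).astype(int)    # a non-linear toy problem

    clf = SVC(kernel="rbf", probability=True)  # common kernel; a custom
    clf.fit(X, y)                              # callable kernel also works

    print(clf.predict_proba(X[:3]))            # from internal cross-validation
    print(len(clf.support_vectors_), "support vectors out of", len(X))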