Lecture - 7 Classification: Support Vector Machines (SVM)
Adama Science and Technology University
School of Electrical Engineering and Computing
Department of Computer Science & Engineering
Support vector machines are considered by some to be the best stock (off-the-shelf) classifier: used unmodified, they give low error rates.
Support vector machines make good decisions for data points that are outside the training set (i.e., the test set).
There are many implementations of support vector machines, but we’ll focus on one of the most popular: the Sequential Minimal Optimization (SMO) algorithm.
SMO breaks the optimization problem down into sub-problems that can be solved analytically (by calculating) rather than numerically (by searching or optimizing).
We’ll also see how to use kernels to extend SVMs to a wider range of datasets.
Support Vector Machines
The line used to separate the dataset is called a separating hyperplane.
For a dataset in three dimensions, the separating hyperplane is a two-dimensional plane.
For a dataset in 1,024 dimensions, the separating hyperplane has 1,023 dimensions.
We’d like to find the point closest to the separating hyperplane and make sure it is as far from the separating line as possible. This distance is known as the margin.
We want the greatest possible margin, because if we made a mistake or trained our classifier on limited data, we’d want it to be as robust as possible.
The points closest to the separating hyperplane are known as support vectors.
Since we’re trying to maximize the distance from the separating line to the support vectors, we need a way to frame this as an optimization problem.
How can we measure the line that best separates the data? (By the maximum margin: the perpendicular, or normal, distance from the line to the nearest points.)
The Maximal-Margin classifier is a hypothetical classifier that best explains how SVM works in practice.
For example, if you had two input variables, they would form a two-dimensional input space.
In two dimensions you can visualize this as a line, and let’s assume that all of our input points can be completely separated by this line. For example:
B0 + (B1 * X1) + (B2 * X2) = 0
Where:
The coefficients (B1 and B2), which determine the slope of the line, and the intercept (B0) are found by the learning algorithm, and
X1 and X2 are the two input variables.
Above the line, the equation returns a value greater than 0, and the point belongs to the first class (class 0).
Below the line, the equation returns a value less than 0, and the point belongs to the second class (class 1).
A point close to the line yields a value close to zero and may be difficult to classify.
If the magnitude of the value is large, the model may have more confidence in the prediction.
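As an illustration, here is a minimal Python sketch of this decision rule. The coefficients B0, B1, B2 are made-up values for illustration, not learned ones:

# Hypothetical coefficients for the separating line (assumed, not learned)
B0, B1, B2 = -3.0, 1.0, 1.0

def classify(x1, x2):
    # Evaluate B0 + B1*X1 + B2*X2: the sign picks the class,
    # the magnitude hints at the confidence of the prediction.
    value = B0 + B1 * x1 + B2 * x2
    return 0 if value > 0 else 1

print(classify(4.0, 2.0))  # value = 3.0  -> class 0 (above the line)
print(classify(1.0, 1.0))  # value = -1.0 -> class 1 (below the line)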
The distance between the line and the closest data points is referred to as the margin.
The best or optimal line separating the two classes is the line with the largest margin. This is called the Maximal-Margin hyperplane.
SVM : Soft Margin Classifier
Support Vector Machines (Kernels)
For example, the inner product of the vectors [2, 3] and [5, 6] is 2*5 + 3*6 = 28.
The equation for making a prediction for a new input, using the dot product between the input (x) and each support vector (xi), is calculated as follows:
f(x) = B0 + sum( ai * (x · xi) )
where B0 and the coefficients ai (one per support vector) must be estimated from the training data by the learning algorithm.
The dot product is the similarity measure used for a linear SVM (a linear kernel), because the distance is a linear combination of the inputs.
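A small sketch of this prediction rule follows; the support vectors, the coefficients ai, and the bias B0 are made-up values for illustration:

import numpy as np

# Hypothetical learned parameters (values assumed for illustration)
B0 = 0.5                                  # bias term
support_vectors = np.array([[2.0, 3.0],   # the support vectors xi
                            [5.0, 6.0]])
a = np.array([0.8, -0.4])                 # one coefficient ai per support vector

def predict(x):
    # f(x) = B0 + sum_i ai * (x . xi); the sign of f(x) gives the class
    return B0 + np.sum(a * support_vectors.dot(np.asarray(x)))

print(predict([1.0, 1.0]))  # 0.5 + 0.8*5.0 - 0.4*11.0 = 0.1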
Linear Kernel SVM
Other kernels can be used that transform the input space into higher dimensions, such as a Polynomial Kernel and a Radial Kernel. This is called the Kernel Trick.
Polynomial Kernel SVM
The polynomial kernel can be written as K(x, xi) = (1 + sum(x * xi))^d. When d = 1 this is the same as the linear kernel. The polynomial kernel allows for curved lines in the input space.
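A minimal sketch of this kernel as a Python function, assuming the (1 + x·xi)^d form given above:

import numpy as np

def polynomial_kernel(x, xi, d=2):
    # K(x, xi) = (1 + sum(x * xi))^d; with d = 1 this reduces to the linear case
    return (1.0 + np.dot(x, xi)) ** d

print(polynomial_kernel([2, 3], [5, 6], d=1))  # 1 + 28 = 29.0
print(polynomial_kernel([2, 3], [5, 6], d=2))  # 29^2 = 841.0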
Radial Kernel SVM
The radial (RBF) kernel can be written as K(x, xi) = exp(-gamma * sum((x - xi)^2)). A good default value for gamma is 0.1, and gamma often lies between 0 and 1.
The radial kernel is very local and can create complex regions within the feature space, like closed polygons in two-dimensional space.
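A matching sketch for the radial kernel:

import numpy as np

def radial_kernel(x, xi, gamma=0.1):
    # K(x, xi) = exp(-gamma * sum((x - xi)^2))
    diff = np.asarray(x) - np.asarray(xi)
    return np.exp(-gamma * np.sum(diff ** 2))

print(radial_kernel([2, 3], [5, 6]))  # exp(-0.1 * 18) ~ 0.165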
Data Preparation for SVM
Classification with SVMs
Outlier Sensitivity in SVMs
Linear SVM : the Syntax
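The code from the original slide is not reproduced here; a minimal scikit-learn sketch of the linear-SVM syntax (the toy dataset and parameter values are assumptions) would look like this:

from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# Toy two-class dataset (assumed for illustration)
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# C controls the softness of the margin: smaller C -> wider, more tolerant margin
model = LinearSVC(C=1.0)
model.fit(X, y)

print(model.coef_, model.intercept_)  # learned coefficients (B1, B2) and bias (B0)
print(model.predict(X[:5]))           # predicted class labels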
The Kernel Trick
SVM Gaussian Kernel
SVMs with Kernels : the Syntax
In practice, SVMs with RBF kernels are very slow to train with many features or many data points.
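A minimal scikit-learn sketch of the kernel-SVM syntax (the dataset and the gamma/C values are assumptions; scaling the inputs first is a common data-preparation step for SVMs):

from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Non-linearly separable toy data (assumed for illustration)
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Scale the inputs, then fit an SVM with a Gaussian (RBF) kernel;
# gamma sets the kernel width, C the softness of the margin
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=0.1, C=1.0))
model.fit(X, y)
print(model.predict(X[:5]))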
Question & Answer
Thank You !!!