Support Vector Machine (SVM)
Introduction
▪ Support Vector Machine (SVM) is one of the most popular supervised
learning algorithms; it is used for both classification and regression
problems.
Linear SVM
▪ Linear SVM is used for linearly separable data: if a dataset can be split
into two classes by a single straight line, the data is termed linearly
separable, and the classifier used is called a Linear SVM classifier. A
minimal code sketch follows.
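To make this concrete, here is a minimal sketch of fitting a linear SVM, assuming scikit-learn and NumPy are available; the two Gaussian blobs are invented toy data, not from the slides.

# Minimal linear SVM sketch (scikit-learn and NumPy assumed).
import numpy as np
from sklearn.svm import SVC

# Two linearly separable blobs in 2-D (toy data).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2],     # class +1
               rng.randn(20, 2) + [-2, -2]])  # class -1
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="linear")  # linear SVM classifier
clf.fit(X, y)

# The separating line is w . x + b = 0.
print("w =", clf.coef_[0], "b =", clf.intercept_[0])
print("training accuracy:", clf.score(X, y))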
Non-linear SVM
▪ Non-linear SVM is used for data that is not linearly separable: if a
dataset cannot be classified by using a straight line, the data is termed
non-linear, and the classifier used is called a Non-linear SVM classifier.
Hyper-plane
▪ There can be multiple lines/decision boundaries that segregate the classes
in n-dimensional space, but we need to find the best decision boundary for
classifying the data points. This best boundary is known as the hyperplane
of SVM.
▪ The dimension of the hyperplane depends on the number of features in the
dataset: with 2 features the hyperplane is a straight line, and with 3
features it is a 2-dimensional plane. We always choose the hyperplane with
the maximum margin, i.e. the maximum distance to the nearest data points of
either class; see the equation sketch below.
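In standard notation (added here for reference, not spelled out in the slides), a hyperplane is the set of points x satisfying

w^T x + b = 0,

where w is the normal vector to the hyperplane and b its offset; with 2 features this equation describes a line, and with 3 features a plane.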
Support Vectors
▪ The data points or vectors that are closest to the hyperplane, and which
affect its position, are termed support vectors; a short inspection sketch
follows.
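A sketch of inspecting them in code, again assuming scikit-learn; support_vectors_ is the fitted attribute scikit-learn exposes for this, and the toy blobs are invented.

# Sketch: inspecting support vectors (scikit-learn and NumPy assumed).
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(1)
X = np.vstack([rng.randn(30, 2) + [2, 2],
               rng.randn(30, 2) + [-2, -2]])
y = np.array([1] * 30 + [-1] * 30)

clf = SVC(kernel="linear").fit(X, y)

# Only the points closest to the hyperplane become support vectors;
# the rest of the training set could be discarded without changing
# the decision boundary.
print("training points:", len(X))
print("support vectors:", len(clf.support_vectors_))

Note how few of the training points end up as support vectors; this is also why SVM is memory efficient, as the pros section below mentions.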
Concept of SVM
▪ Suppose a straight line cannot separate the classes. SVM can still handle
this by adding a third feature z; since converting back with z = 1 yields a
circle of radius 1 (below), the added feature is z = x² + y². By adding this
third dimension, the sample space becomes linearly separable.
▪ SVM now divides the dataset into classes with a flat boundary: since we
are in 3-D space, the boundary looks like a plane parallel to the x-axis.
▪ If we convert it back to 2-D space by setting z = 1, the boundary becomes
a circle.
▪ Hence we get a circle of radius 1 as the decision boundary for this
non-linear data; the sketch below traces the same lift numerically.
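The lift can be traced in a few lines. The sketch below (NumPy assumed; the ring-shaped toy data is invented) adds z = x² + y² and shows that the flat plane z = 1 separates the classes, which is exactly the radius-1 circle back in 2-D.

# Sketch of the z = x^2 + y^2 lift described above (NumPy assumed).
import numpy as np

rng = np.random.RandomState(2)
# Inner class: points near the origin; outer class: points on a wider ring.
theta = rng.uniform(0, 2 * np.pi, 40)
inner = np.column_stack([0.5 * np.cos(theta[:20]), 0.5 * np.sin(theta[:20])])
outer = np.column_stack([2.0 * np.cos(theta[20:]), 2.0 * np.sin(theta[20:])])
X = np.vstack([inner, outer])

# Add the third dimension z = x^2 + y^2.
z = X[:, 0] ** 2 + X[:, 1] ** 2

# In (x, y, z) space the classes separate with the flat plane z = 1:
# inner points have z = 0.25 < 1, outer points have z = 4 > 1.
# Back in 2-D, z = 1 is the circle x^2 + y^2 = 1 of radius 1.
print("inner z values:", z[:20].round(2))
print("outer z values:", z[20:].round(2))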
SVM Pros/Cons
Pros of SVM
▪ SVM classifiers offer great accuracy and work well in high-dimensional
spaces. Because they use only a subset of the training points (the support
vectors, as in the sketch above), they are also very memory efficient.
How to Identify the Right Hyperplane
▪ Scenario-2: Here we have three hyperplanes (A, B, and C), and all
segregate the classes well. Now, how can we identify the right hyperplane?
▪ Here, maximizing the distance between the nearest data points and the
hyperplane helps us decide on the right hyperplane; this distance is called
the margin (formalized below).
▪ Among the three, hyperplane C has the largest margin, so C is selected as
the best hyperplane.
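For reference, the standard formalization of this rule (an addition, not spelled out in the slides): the distance from a training point x_i to the hyperplane, and the margin that SVM maximizes, are

\text{dist}(x_i) = \frac{|w^T x_i + b|}{\lVert w \rVert}, \qquad \text{margin} = \frac{2}{\lVert w \rVert},

so training amounts to solving

\min_{w, b} \; \tfrac{1}{2} \lVert w \rVert^2 \quad \text{subject to} \quad y_i (w^T x_i + b) \ge 1 \;\; \text{for all } i.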
How to Identify the Right Hyperplane
▪ Scenario-3: Here we have two hyperplanes (A and B). Use the rules
discussed above to identify the right hyperplane.
▪ SVM selects the hyperplane which classifies the classes accurately prior
to maximizing the margin.
▪ Hyperplane B has a classification error, while A has classified everything
accurately; therefore the right hyperplane is A.
How to Identify the Right Hyperplane
▪ Scenario-4: Below we are not able to segregate the two classes using a
straight line, as one of the stars lies in the territory of the circle
class.
▪ That star is effectively an outlier for the star class. The SVM algorithm
can ignore outliers and still find the hyperplane with the maximum margin;
hence SVM classification is robust to outliers. A soft-margin sketch
follows.
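In practice this tolerance comes from the soft-margin formulation; in scikit-learn it is controlled by the parameter C (a library detail, not mentioned in the slides). A sketch with invented toy data:

# Sketch: soft-margin SVM tolerating an outlier (scikit-learn assumed).
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(3)
X = np.vstack([rng.randn(20, 2) + [2, 2],
               rng.randn(20, 2) + [-2, -2],
               [[2.0, 2.0]]])               # one "star" deep in circle territory
y = np.array([1] * 20 + [-1] * 20 + [-1])   # outlier labeled with the far class

for C in (0.1, 1000.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    # Small C tolerates the outlier and keeps a wide margin (2 / ||w||);
    # very large C penalizes the error heavily and narrows the margin.
    print(f"C={C}: margin = {2 / np.linalg.norm(w):.3f}, "
          f"support vectors = {len(clf.support_vectors_)}")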
Non-Linear SVM
▪ Non-linear SVM handles data that is not linearly separable: no single
straight line (or flat hyperplane) can split the classes, so the data is
mapped into a higher-dimensional space where a linear separator exists.
Kernel Function
In machine learning, a kernel is a method that lets us apply linear
classifiers to non-linear problems by implicitly mapping the data into a
higher-dimensional space, without ever computing coordinates in that
higher-dimensional space explicitly.
Kernel Trick
The kernel trick replaces the explicit mapping φ(x) with a kernel function
K(x, y) = φ(x)·φ(y), so only inner products in the original space are ever
computed; a numeric check follows.
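A small numeric check of the trick: for K(x, y) = (x·y)² on 2-D inputs, the implicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²). The sketch below (NumPy assumed; the inputs are invented) confirms the kernel equals the dot product in that 3-D space without ever constructing it.

# Sketch: verifying the kernel trick numerically (NumPy assumed).
import numpy as np

def phi(v):
    # Explicit feature map for the kernel K(x, y) = (x . y)^2 in 2-D.
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def K(x, y):
    # Kernel: computed entirely in the original 2-D space.
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

print(K(x, y))                 # 121.0
print(np.dot(phi(x), phi(y)))  # 121.0 -- same value, no 3-D visit needed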
Types of Kernel Function
▪ Polynomial Kernel
▪ RBF Kernel
▪ Sigmoid Kernel
Polynomial Kernel
▪ The polynomial kernel is a kernel function commonly used with support
vector machines and other kernelized models; it represents the similarity
of vectors in a feature space over polynomials of the original variables.
▪ It is popular in image processing.
K(x_i, x_j) = (x_i^T x_j + 1)^d
RBF Kernel
K(x, y) = \exp\left( -\frac{\lVert x - y \rVert^2}{2\sigma^2} \right)
Sigmoid (Hyperbolic Tangent) Kernel
▪ Mainly used in neural networks.
K(x_i, x_j) = \tanh(k \, x_i^T x_j + c)
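As a closing sketch (scikit-learn assumed; the ring data comes from its make_circles helper), the three kernel types above map directly onto the kernel argument of SVC, with degree, coef0, and gamma playing the roles of d, the additive constants, and 1/(2σ²) in the formulas.

# Sketch: comparing the three kernels on ring-shaped data (scikit-learn assumed).
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

kernels = {
    "poly":    SVC(kernel="poly", degree=2, coef0=1),  # (x . y + 1)^2
    "rbf":     SVC(kernel="rbf", gamma=1.0),           # exp(-gamma ||x - y||^2)
    "sigmoid": SVC(kernel="sigmoid", coef0=0.0),       # tanh(gamma x . y + c)
}
for name, clf in kernels.items():
    clf.fit(X, y)
    print(f"{name:8s} training accuracy: {clf.score(X, y):.2f}")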