Unit-4 AI - SVM
Machine Learning
Support Vector Machine (SVM)
• Support Vector Machine, abbreviated as SVM, is one of the most popular supervised learning algorithms.
• SVM can be used for both regression and classification tasks.
• However, it is most widely used for classification problems in machine learning.
• The objective of SVM is to classify the data by constructing an N-dimensional hyperplane that optimally separates the data points into two categories.
• Vapnik proposed the theory of SVM.
Support Vector Machine (SVM)
• Hyperplanes are decision boundaries that help classify the data points.
• Data points falling on either side of the hyperplane can be attributed to
different classes.
• Also, the dimension of the hyperplane depends upon the number of
features.
• If the number of input features is 2, then the hyperplane is just a line.
• If the number of input features is 3, then the hyperplane becomes a two-
dimensional plane.
• It becomes difficult to imagine when the number of features exceeds 3.
• To separate the two classes of data points, there are many possible hyperplanes that could be chosen.
• The objective is to find the hyperplane that has the maximum margin, i.e., the maximum distance between data points of both classes.
• Maximizing the margin distance provides some reinforcement so that future data points can be classified with more confidence.
• Support vectors are the data points that are closest to the hyperplane and influence the position and orientation of the hyperplane.
• Using these support vectors, we maximize the margin of the classifier.
• Deleting the support vectors will change the position of the hyperplane.
• These are the points that help us build the SVM; the sketch below shows how they are obtained in practice.
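A minimal sketch (Python with scikit-learn; the toy blob data and the large C value are illustrative assumptions) showing how a fitted linear SVM exposes its support vectors, and how the margin width follows from the weight vector:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters -> a linearly separable toy problem
X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.6, random_state=0)

clf = SVC(kernel="linear", C=1000)   # large C approximates a hard margin
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("support vectors:\n", clf.support_vectors_)      # points lying on the margin
print("hyperplane: %.2f*x1 + %.2f*x2 + %.2f = 0" % (w[0], w[1], b))
print("margin width:", 2 / np.linalg.norm(w))          # margin = 2 / ||w||
```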
Hyperplane and Support Vectors:
Hyperplane: There can be multiple lines/decision boundaries to segregate
the classes in n-dimensional space, but we need to find out the best
decision boundary that helps to classify the data points. This best
boundary is known as the hyperplane of SVM.
• The dimensions of the hyperplane depend on the features present in the dataset, which means if there are 2 features, then the hyperplane will be a straight line, and if there are 3 features, then the hyperplane will be a two-dimensional plane.
• We always create the hyperplane that has the maximum margin, which means the maximum distance between the hyperplane and the nearest data points of either class.
Support Vectors:
• The data points or vectors that are closest to the hyperplane and which affect the position of the hyperplane are termed Support Vectors. Since these vectors support the hyperplane, they are called support vectors.
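The claim that only the support vectors determine the hyperplane can be checked directly. In this sketch (same illustrative toy data as above), the model is refitted on the support vectors alone and yields essentially the same boundary:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.6, random_state=0)

full = SVC(kernel="linear", C=1000).fit(X, y)
sv = full.support_                                   # indices of the support vectors
reduced = SVC(kernel="linear", C=1000).fit(X[sv], y[sv])

# The two hyperplanes coincide (up to solver tolerance): the non-support
# points had no influence on the solution.
print("full:   ", full.coef_[0].round(4), full.intercept_.round(4))
print("reduced:", reduced.coef_[0].round(4), reduced.intercept_.round(4))
```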
Working of SVM
Example:
• Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm.
• We will first train our model with lots of images of cats and dogs so that it can learn the different features of cats and dogs, and then we test it with this strange creature.
• The SVM creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors) of cats and dogs.
• On the basis of these support vectors, it will classify the creature as a cat.
Types of SVM:
SVM can be of two types:
• Linear SVM: Linear SVM is used for linearly separable data, which means if a dataset can be classified into two classes by using a single straight line, then such data is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
• Non-linear SVM: Non-linear SVM is used for non-linearly separable data, which means if a dataset cannot be classified by using a straight line, then such data is termed non-linear data, and the classifier used is called a Non-linear SVM classifier. A short comparison sketch follows.
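This sketch contrasts the two types (scikit-learn; the half-moons dataset and the gamma value are illustrative choices): a linear SVM struggles on non-linearly separable data, while an RBF-kernel SVM captures the curved boundary:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by a single straight line
X, y = make_moons(n_samples=300, noise=0.15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear = SVC(kernel="linear").fit(X_tr, y_tr)
rbf = SVC(kernel="rbf", gamma=2.0).fit(X_tr, y_tr)

print("Linear SVM accuracy:", linear.score(X_te, y_te))  # limited by a straight boundary
print("RBF SVM accuracy:   ", rbf.score(X_te, y_te))     # fits the curved boundary
```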
Linear SVM Classifier
• Data that is not linearly separable in 2-D space can be lifted to 3-D space by adding a third dimension such as z = x² + y².
• In this 3-D space, the decision boundary looks like a plane parallel to the x-axis.
• If we convert it back to 2-D space by setting z = 1, the boundary becomes a circle of radius 1.
• The initial space is called the input space and the transformed space is called the feature space; a sketch of this mapping follows.
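A sketch of this mapping (scikit-learn; the concentric-circles dataset is an illustrative choice): 2-D circular data is lifted to 3-D with z = x² + y², after which a plain linear SVM separates it:

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the 2-D input space
X, y = make_circles(n_samples=200, factor=0.4, noise=0.05, random_state=0)

# Input space (2-D) -> feature space (3-D): append z = x^2 + y^2
Z = np.c_[X, (X ** 2).sum(axis=1)]

clf = SVC(kernel="linear").fit(Z, y)
print("accuracy in feature space:", clf.score(Z, y))  # close to 1.0: now separable
```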
• The kernel functions that are widely used in a variety of applications include:
• polynomial function
• radial basis function (RBF)
• sigmoid function
• For the linearly separable case, a hyperplane separating the binary classes in the three-attribute situation can be represented as:
y = w0 + w1x1 + w2x2 + w3x3
Where,
y represents the outcome,
xi represents the attribute values, and
wi represents the weights learned during training.
• For the non-linear case, the dot product is replaced by a kernel, and the decision function takes the form:
y = b + Σi αi yi K(x(i), x)
Where,
K(x(i), x) is the kernel function, the sum runs over the support vectors x(i), and αi and b are parameters learned during training.
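This kernel-expansion form can be verified against a fitted model. In this sketch (scikit-learn, illustrative data), the decision function is recomputed by hand from the support vectors, the learned coefficients αi·yi (exposed as dual_coef_), and the bias (intercept_):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
clf = SVC(kernel="rbf", gamma=0.5).fit(X, y)

# K(x, x(i)) between every sample and every support vector
K = rbf_kernel(X, clf.support_vectors_, gamma=0.5)
manual = K @ clf.dual_coef_[0] + clf.intercept_[0]    # dual_coef_ holds alpha_i * y_i

print(np.allclose(manual, clf.decision_function(X)))  # True
```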
• Among the different kernels available, the model that minimizes the estimated generalization error is the best model to choose.
• Common kernel functions, as already stated, are:
• Polynomial kernel: K(x, y) = (x · y + 1)^d
Where,
d is the degree of the polynomial kernel
• Gaussian radial basis (RBF) kernel: K(x, y) = exp(−‖x − y‖² / 2σ²)
Where,
σ is the bandwidth of the Gaussian radial basis kernel function
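One common way to minimize the estimated error is cross-validation. This hedged sketch (scikit-learn; the parameter grid is an illustrative assumption) compares polynomial kernels of several degrees against an RBF kernel and reports the best cross-validated model:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

param_grid = [
    {"kernel": ["poly"], "degree": [2, 3, 4], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "gamma": [0.1, 1, 10], "C": [0.1, 1, 10]},
]
# 5-fold cross-validation estimates the error of each candidate model
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)

print("best model:", search.best_params_)
print("cross-validated accuracy:", search.best_score_)
```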
Advantages of SVM
• Effective in high dimensional spaces.
• Still effective in cases where the number of dimensions is greater than the number of samples.
• Uses a subset of training points in the decision function (called support
vectors), so it is also memory efficient.
• Versatile: different Kernel functions can be specified for the decision
function. Common kernels are provided, but it is also possible to specify
custom kernels.
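As a sketch of the custom-kernel point (scikit-learn; the mixed_kernel function below is a hypothetical example, not a standard kernel): SVC accepts any callable that returns the Gram matrix between two sets of samples:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def mixed_kernel(X1, X2):
    # Hypothetical kernel: a linear term plus an RBF term. Sums of valid
    # (positive semi-definite) kernels are themselves valid kernels.
    return X1 @ X2.T + rbf_kernel(X1, X2, gamma=0.5)

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
clf = SVC(kernel=mixed_kernel).fit(X, y)
print("training accuracy:", clf.score(X, y))
```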
Drawbacks of SVM
• If the number of features is much greater than the number of samples, avoiding over-fitting when choosing the kernel function and the regularization term is crucial.
• SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation (illustrated below).
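A brief illustration of the probability drawback (scikit-learn, illustrative data): probabilities are only available when probability=True, which triggers the extra internal cross-validation (Platt scaling) at fit time:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

clf = SVC(probability=True).fit(X, y)  # internal CV (Platt scaling) runs here
print(clf.predict_proba(X[:3]))        # calibrated class probabilities
print(clf.decision_function(X[:3]))    # the native, uncalibrated margin scores
```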
Applications of SVM
• The SVM algorithm can be used for:
• Face detection
• Image classification
• Text categorization, etc.
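A minimal text-categorization sketch (scikit-learn; the tiny two-category corpus is a toy assumption): TF-IDF features feeding a linear SVM:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = ["the striker scored a goal", "the court sentenced the accused",
        "a late goal won the match", "the judge adjourned the trial"]
labels = ["sports", "legal", "sports", "legal"]

# TF-IDF turns each document into a sparse feature vector for the SVM
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(docs, labels)
print(model.predict(["the referee allowed the goal"]))  # expected: ['sports']
```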
Support Vector Regression
• The method of Support Vector Classification can be extended to solve
regression problems. This method is called Support Vector Regression.
• The response variable is a quantitative variable (not a categorical variable).
• The objective of SVR is to do numerical prediction using a set of attributes.
• Regression analysis may be defined as a statistical method for determining
relationships between variables.
• Such analysis enables users to understand the causal effect of one variable upon another.
Example:
• Effect of inflation rate on economy
• Effect of price increase upon demand
• SVR was also proposed by V. Vapnik and other researchers.
SVR
• The cost function for building the model ignores any training data that are close (within a threshold ε) to the model prediction; this is known as the ε-insensitive loss.
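A sketch of this ε-insensitive behaviour (scikit-learn; the sine data and ε value are illustrative): points whose residual falls inside the tube contribute no loss, so only the remaining points become support vectors:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# Residuals smaller than epsilon incur zero loss (the "tube" around the fit)
svr = SVR(kernel="rbf", epsilon=0.2).fit(X, y)

print("support vectors:", len(svr.support_), "of", len(X), "points")
```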
Support Vector Regression
• Data regression may be used for modelling and analysis of numerical data that consists of
values of a dependent variable and one or more independent variables.
• Regression can also be used for prediction tasks such as forecasting, inference, and modelling of causal relationships.
• In SVM regression, the first step is to map the input x onto an N-dimensional feature space
using some fixed (non-linear) mapping.
• A linear model is then constructed in this feature space. The linear model, F(x, w), is represented as:
F(x, w) = Σi=1..N wi gi(x) + b
Where,
gi(x) denotes a set of non-linear transformations,
b represents the ‘bias’ term, and
w represents the weights.
• Since the data are often assumed to be zero mean, the bias term is usually dropped.
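A sketch of this construction (scikit-learn; the cubic toy data and the polynomial transformations gi(x) are illustrative assumptions): a fixed non-linear mapping followed by a linear SVR in the resulting feature space:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVR

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, (120, 1))
y = X.ravel() ** 3 - X.ravel() + rng.normal(scale=0.1, size=120)

# g_i(x): fixed non-linear transformations (here 1, x, x^2, x^3);
# the SVR then fits only the linear weights w and bias b on top of them.
model = make_pipeline(PolynomialFeatures(degree=3), StandardScaler(),
                      LinearSVR(max_iter=10000))
model.fit(X, y)
print("R^2 in feature space:", model.score(X, y))
```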