
Unit-4 (Part 2)

Machine Learning
Support Vector Machine (SVM)
• Support Vector Machine, abbreviated as SVM, is one of the most
popular supervised learning algorithms.
• SVM can be used for both regression and classification tasks.
• However, it is most widely used for classification in machine learning.
• The objective of SVM is to classify data by constructing a
hyperplane that optimally separates the data points into two categories.
• Vapnik proposed the theory of SVM.
Support Vector Machine (SVM)
• Hyperplanes are decision boundaries that help classify the data points.
• Data points falling on either side of the hyperplane can be attributed to
different classes.
• Also, the dimension of the hyperplane depends upon the number of
features.
• If the number of input features is 2, then the hyperplane is just a line.
• If the number of input features is 3, then the hyperplane becomes a two-
dimensional plane.
• It becomes difficult to imagine when the number of features exceeds 3.
• To separate the two classes of data points, there are many possible
hyperplanes that could be chosen.
• The objective is to find a hyperplane that has the maximum margin,
i.e., the maximum distance between data points of both classes.
• Maximizing the margin distance provides some reinforcement so that future data points can be classified with more
confidence.
• Support vectors are the data points that are closest to the hyperplane and influence the position and
orientation of the hyperplane.
• Using these support vectors, we maximize the margin of the classifier.
• Deleting the support vectors will change the position of the hyperplane.
• These are the points that help us build the SVM (see the sketch below).
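To make the margin and support-vector ideas concrete, here is a minimal sketch assuming scikit-learn and a hypothetical toy dataset; the slides themselves do not prescribe any library.

```python
# Minimal sketch: fit a linear SVM on a toy 2-D dataset and inspect the
# support vectors that fix the position of the hyperplane.
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two small clusters, one per class
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [6, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

print(clf.support_vectors_)           # the points closest to the hyperplane
print(clf.predict([[3, 2], [6, 7]]))  # classify two new points
```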
Hyperplane and Support Vectors:
Hyperplane: There can be multiple lines/decision boundaries to segregate
the classes in n-dimensional space, but we need to find out the best
decision boundary that helps to classify the data points. This best
boundary is known as the hyperplane of SVM.
• The dimensions of the hyperplane depend on the features present in the
dataset: if there are 2 features, the hyperplane will be a straight line,
and if there are 3 features, it will be a 2-dimensional plane.
• We always create a hyperplane that has the maximum margin, i.e., the
maximum distance to the nearest data points of the two classes.
Support Vectors:
• The data points or vectors that are closest to the hyperplane and
which affect its position are termed support vectors. Since these
vectors "support" the hyperplane, they are called support vectors.
Working of SVM
Example:
• Suppose we see a strange cat that also has some features of a dog. If we want a
model that can accurately identify whether it is a cat or a dog, such a model can be
created using the SVM algorithm.
• We first train the model with many images of cats and dogs so that it learns the
different features of cats and dogs, and then we test it on this strange creature.
• The SVM creates a decision boundary between the two classes (cat and dog) and
chooses the extreme cases (the support vectors) of each class.
• On the basis of these support vectors, it will classify the creature as a cat.
Types of SVM:
SVM can be of two types:
• Linear SVM: Linear SVM is used for linearly separable data. If a dataset
can be classified into two classes using a single straight line, it is
termed linearly separable data, and the classifier used is called a
linear SVM classifier.
• Non-linear SVM: Non-linear SVM is used for non-linearly separable data.
If a dataset cannot be classified using a straight line, it is termed
non-linear data, and the classifier used is called a non-linear SVM
classifier. (The sketch below contrasts the two on a toy dataset.)
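A hedged sketch of the distinction, assuming scikit-learn: the kernel argument selects a linear or a non-linear SVM, and a concentric-circles dataset shows why the linear one fails.

```python
# Sketch: linear vs. non-linear SVM on data that no straight line separates.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: a non-linearly separable toy dataset
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_clf = SVC(kernel="linear").fit(X, y)  # linear SVM
rbf_clf = SVC(kernel="rbf").fit(X, y)        # non-linear SVM

print("linear accuracy:", linear_clf.score(X, y))  # poor: no line separates circles
print("rbf accuracy:", rbf_clf.score(X, y))        # close to 1.0
```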
Linear SVM Classifier

• A linear SVM performs classification by constructing a hyperplane that
optimally separates the data points into two categories (or classes).
Example:
• Suppose we have a dataset with two tags (green and blue), and the dataset has two features, x1 and
x2.
• We want a classifier that can classify a pair (x1, x2) of coordinates as either green or blue.
• For a linearly separable dataset having n features (thereby needing n
dimensions for representation), a hyperplane is basically an (n – 1)
dimensional subspace used for separating the dataset into two sets,
each set containing data points belonging to a different class.
• For example, for a dataset having two features X and Y (therefore
lying in a 2-dimensional space), the separating hyperplane is a line (a
1-dimensional subspace).
• Similarly, for a dataset having 3-dimensions, we have a 2-dimensional
separating hyperplane, and so on.
• In machine learning, the Support Vector Machine (SVM) is a non-
probabilistic, linear, binary classifier used for classifying data by
learning a hyperplane separating the data. (The sketch below reads this
hyperplane off a fitted model.)
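A small sketch (scikit-learn assumed, data hypothetical) of reading the learned hyperplane from a fitted linear SVM:

```python
# Sketch: for a linear SVM, coef_ holds the weight vector w and intercept_
# holds b, so the learned hyperplane is w.x + b = 0.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [3, 3], [4, 4]])  # hypothetical 2-feature data
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear").fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

print("w =", w, "b =", b)  # the 1-D separating line in this 2-D space
print(np.sign(X @ w + b))  # sign of w.x + b gives the side of the hyperplane
```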
Non-linear SVM classifier
• As mentioned above, SVM is a linear classifier that learns an (n – 1)-dimensional
hyperplane for classifying data into two classes.
• However, it can also be used to classify a non-linear dataset.
• This can be done by projecting the dataset into a higher dimension in which it is linearly separable!
Example:
• If data is linearly arranged, then we can separate it by using a straight line, but for non-linear data, we
cannot draw a single straight line.
• In machine learning, a technique known as the "kernel trick" (a kernel function) is used to learn
a linear classifier that can classify a non-linear dataset.
• It transforms the linearly inseparable data into linearly separable data by projecting
it into a higher dimension.
• A kernel function is applied to each data instance to map the original non-linear data
points into some higher-dimensional space in which they become linearly separable.
• SVM then divides the dataset into classes with a linear boundary in that higher-dimensional space:

• In the 3-D feature space, the separating boundary looks like a plane parallel to the x-axis.
• Converting back to 2-D space at z = 1, the boundary becomes a circle around the inner class in the original view.
• The initial space is called the input space and the transformed space is called the feature space (made concrete in the sketch below).
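The projection idea can be shown with an explicit feature map rather than the kernel trick itself; a sketch assuming scikit-learn and the classic mapping z = x1² + x2²:

```python
# Sketch: circular 2-D data becomes linearly separable after adding the
# explicit third coordinate z = x1^2 + x2^2 (input space -> feature space).
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

z = (X ** 2).sum(axis=1)      # the new dimension
X3 = np.column_stack([X, z])  # 3-D feature space

clf = SVC(kernel="linear").fit(X3, y)  # a flat plane now separates the classes
print("accuracy in feature space:", clf.score(X3, y))
```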
• The kernel functions that are widely used in a variety of applications include (usage sketched after the list):
• Polynomial function
• Radial basis function (RBF)
• Sigmoid function
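In scikit-learn (an assumed library choice) these three kernels map directly onto the kernel parameter; a minimal sketch with hypothetical data:

```python
# Sketch: selecting each of the listed kernels; degree applies to the
# polynomial kernel, gamma to the RBF and sigmoid kernels.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [3.0, 1.0]])
y = np.array([0, 0, 1, 1])

for kernel in ("poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel, degree=3, gamma="scale").fit(X, y)
    print(kernel, clf.predict([[2.5, 0.5]]))
```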

• A caution: projecting into a very high-dimensional feature space increases the risk of over-fitting.
• For the linearly separable case, a hyperplane separating the binary classes in the
three-attribute situation can be represented as:

$y = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_3$

Where,
$y$ represents the outcome,
$x_i$ represents the attribute values,
$w_i$ represents the weights ($w_0$ is the bias),

$y = +1$ for class1
$y = -1$ for class2

(A numeric check of this equation follows.)
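A quick numeric check of the equation above; the weights and attribute values below are purely hypothetical:

```python
# Hypothetical numbers only: evaluate y = w0 + w1*x1 + w2*x2 + w3*x3 and
# read the class off the sign of the outcome.
w = [0.5, 1.0, -2.0, 0.25]  # w0 (bias), w1, w2, w3 -- illustrative values
x = [3.0, 1.5, 4.0]         # attribute values x1, x2, x3

y = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
print(y)                                 # 0.5 + 3.0 - 3.0 + 1.0 = 1.5
print("class1" if y >= 0 else "class2")  # positive outcome -> class1 (+1)
```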
• The maximum-margin hyperplane can be represented in terms of the support
vectors:

$y = b + \sum_i \alpha_i y_i \, K(\mathbf{x}_i, \mathbf{x})$

Where,
$K(\mathbf{x}_i, \mathbf{x})$ is the kernel function, the sum runs over the support vectors $\mathbf{x}_i$, $y_i$ are their class labels, and $\alpha_i$ and $b$ are parameters learned during training (verified against a fitted model in the sketch below).
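This expansion can be checked against a fitted model; a sketch assuming scikit-learn, whose dual_coef_ attribute stores the products α_i·y_i for the support vectors:

```python
# Sketch: rebuild the decision value b + sum_i alpha_i*y_i*K(x_i, x) from the
# fitted model's support vectors and compare with decision_function.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 1.0], [5.0, 5.0], [6.0, 6.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)

x_new = np.array([4.0, 4.0])
# With a linear kernel, K(x_i, x) is just the dot product.
manual = clf.intercept_[0] + np.sum(clf.dual_coef_[0] * (clf.support_vectors_ @ x_new))
print(manual, clf.decision_function([x_new])[0])  # the two values agree
```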
• Among the different kernels available, the model that minimizes the
error estimate is the best model to choose.
• The common kernel functions already stated are (transcribed into code below):
• Polynomial kernel:

$K(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i \cdot \mathbf{x}_j + 1)^d$

• Gaussian radial basis function:

$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\dfrac{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2}{2\sigma^2}\right)$

Where,
$d$ is the degree of the polynomial kernel,
$\sigma$ is the bandwidth of the Gaussian radial
basis kernel function.
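The two formulas transcribe directly into code; a sketch in which d and σ take illustrative default values:

```python
# Direct transcription of the two kernel formulas above.
import numpy as np

def polynomial_kernel(xi, xj, d=3):
    """K(xi, xj) = (xi . xj + 1)^d"""
    return (np.dot(xi, xj) + 1) ** d

def rbf_kernel(xi, xj, sigma=1.0):
    """K(xi, xj) = exp(-||xi - xj||^2 / (2 * sigma^2))"""
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))

a, b = np.array([1.0, 2.0]), np.array([2.0, 0.0])
print(polynomial_kernel(a, b))  # (1*2 + 2*0 + 1)^3 = 27
print(rbf_kernel(a, b))         # exp(-5/2)
```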
Advantages of SVM
• Effective in high dimensional spaces.
• Still effective in cases where number of dimensions is greater than the
number of samples.
• Uses a subset of training points in the decision function (called support
vectors), so it is also memory efficient.
• Versatile: different Kernel functions can be specified for the decision
function. Common kernels are provided, but it is also possible to specify
custom kernels.
Drawbacks of SVM
• If the number of features is much greater than the number of
samples, avoiding over-fitting when choosing kernel functions and
the regularization term is crucial.
• SVMs do not directly provide probability estimates; these are
calculated using an expensive five-fold cross-validation (sketched below).
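A sketch of this caveat in scikit-learn (an assumption): probability estimates must be requested at fit time, which triggers the extra cross-validation.

```python
# Sketch: probability=True makes fit() run internal cross-validation
# (Platt scaling), which is why it is comparatively expensive.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, random_state=0)

clf = SVC(kernel="linear", probability=True).fit(X, y)  # slower fit
print(clf.predict_proba([[2, 2]]))  # per-class probability estimates
```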
Applications of SVM
• The SVM algorithm can be used for:
• Face detection
• Image classification
• Text categorization, etc.
Support Vector Regression
• The method of support vector classification can be extended to solve
regression problems. This method is called Support Vector Regression (SVR).
• The response variable is a quantitative variable (not a categorical variable).
• The objective of SVR is to make numerical predictions using a set of attributes.
• Regression analysis may be defined as a statistical method for determining
relationships between variables.
• Such analysis enables users to understand the causal effect of one variable
upon another.

Example:
• Effect of the inflation rate on the economy
• Effect of a price increase upon demand

• SVR was also proposed by V. Vapnik and other researchers.
• The model produced by SVR is analogous to the model produced by
SVM for classification.
SVM
• The cost function for building the model does not care about training
points that lie beyond the margin.

SVR
• The cost function for building the model ignores any training data that
are close (within a threshold ε) to the model prediction (see the sketch below).
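A sketch of SVR in scikit-learn (assumed library), where the epsilon parameter is exactly the threshold mentioned above; the noisy sine data are hypothetical:

```python
# Sketch: SVR ignores residuals smaller than epsilon when fitting.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, 40)).reshape(-1, 1)  # toy 1-D inputs
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(40)

reg = SVR(kernel="rbf", epsilon=0.1).fit(X, y)  # errors < 0.1 are not penalized
print(reg.predict([[2.5]]))                     # numerical prediction
```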
Support Vector Regression
• Regression may be used for modelling and analysis of numerical data consisting of
values of a dependent variable and one or more independent variables.
• Regression can also be used for prediction, such as forecasting, inference, and modelling of
causal relationships.
• In SVM regression, the first step is to map the input x onto an N-dimensional feature space
using some fixed (non-linear) mapping.
• A linear model is then constructed in this feature space. The linear model, $F(\mathbf{x}, \mathbf{w})$, is
represented as:

$F(\mathbf{x}, \mathbf{w}) = \sum_{i=1}^{N} w_i \, g_i(\mathbf{x}) + b$

Where,
$g_i(\mathbf{x})$ denotes a set of non-linear transformations,
$b$ represents the 'bias' term,
$\mathbf{w}$ represents the weights.
• Since the data are often assumed to have zero mean, the bias term is usually dropped.
(A sketch of the related ε-insensitive loss follows.)
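To make the "ignores close points" idea concrete, here is a sketch of the ε-insensitive loss commonly used in SVR; the residual values are illustrative:

```python
# Sketch of the epsilon-insensitive loss: residuals inside the epsilon tube
# cost nothing; outside it, the cost grows linearly.
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, eps=0.1):
    """max(0, |y - F(x)| - eps)"""
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

y_true = np.array([1.0, 1.0])
y_pred = np.array([1.05, 1.50])                  # one inside, one outside the tube
print(epsilon_insensitive_loss(y_true, y_pred))  # -> [0.  0.4]
```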
