Machine Learning With Python - Machine Learning Algorithms-SVM
Introduction - SVM
Support vector machines (SVMs) are powerful yet flexible supervised machine learning
algorithms that are used for both classification and regression.
In practice, they are used mostly for classification problems. SVMs were first
introduced in the 1960s and were refined in the 1990s.
SVMs are implemented differently from most other machine learning
algorithms.
They remain popular because of their ability to handle multiple
continuous and categorical variables.
An SVM generates the hyperplane iteratively so that the classification error is minimized.
The goal of an SVM is to divide the dataset into classes by finding a maximum marginal hyperplane (MMH).
[Figure: Class A and Class B plotted against the X-axis and Y-axis, with the margin and the support vectors marked.]
· Support Vectors: Data points that are closest to the hyperplane are called support vectors.
The separating line is defined with the help of these data points.
· Hyperplane: As we can see in the above diagram, it is a decision plane or space that divides
a set of objects belonging to different classes.
· Margin: It may be defined as the gap between the two lines through the closest data points
of different classes. It can be calculated as the perpendicular distance from the line to the support
vectors. A large margin is considered a good margin and a small margin is considered a bad
margin.
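The margin definition above can be checked numerically. The following is a minimal sketch, assuming a small hand-made dataset (the six points are hypothetical values chosen for illustration) and a very large C so that the fit behaves like a hard-margin classifier. For a linear SVM with weight vector w, the full margin width is 2 / ||w||, and each support vector lies at half that distance from the hyperplane.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data (not from the original text).
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C is assumed here to approximate a hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset of the hyperplane

# Width of the gap between the two margin lines.
margin_width = 2.0 / np.linalg.norm(w)

# Perpendicular distance from each support vector to the hyperplane.
distances = np.abs(clf.support_vectors_ @ w + b) / np.linalg.norm(w)

print("support vectors:\n", clf.support_vectors_)
print("margin width:", margin_width)
print("distance of each support vector:", distances)
```

Each printed distance should be close to half of the margin width, which is what "perpendicular distance from the line to the support vectors" means in practice.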
The main goal of SVM is to divide the dataset into classes by finding a maximum marginal hyperplane
(MMH). This can be done in the following two steps:
· First, SVM generates hyperplanes iteratively that segregate the classes in the best way.
· Then, it chooses the hyperplane that separates the classes correctly with the largest margin.
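Next, we create a sample dataset of linearly separable data using
sklearn.datasets.samples_generator (in current scikit-learn releases, generators such as make_blobs are imported directly from sklearn.datasets) for classification using SVM: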
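A minimal sketch of this step, assuming the make_blobs generator is the one intended. The values n_samples=100 and centers=2 follow the text; random_state and cluster_std are assumed values picked only to make the example reproducible.

```python
import numpy as np
from sklearn.datasets import make_blobs

# 100 samples in 2 clusters, as described in the text.
# random_state and cluster_std are assumptions, not values from the source.
X, y = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=0.50)

print("feature matrix shape:", X.shape)
print("labels present:", np.unique(y))
print("samples per class:", np.bincount(y))
```

A scatter plot of X[:, 0] against X[:, 1], colored by y, would show the two clusters.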
The following would be the output after generating a sample dataset with 100
samples and 2 clusters:
plt.xlim(-1, 3.5);
Next, we will use Scikit-Learn's support vector classifier (SVC) to train an SVM model on this data. Here, we
use a linear kernel to fit the SVM as follows:
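A minimal sketch of fitting the classifier, assuming a two-cluster make_blobs dataset of 100 samples (random_state and cluster_std are assumed values) and a very large C, which is also an assumption, so that the fit behaves like a hard-margin classifier on separable data.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Assumed dataset settings; only n_samples=100 and centers=2 come from the text.
X, y = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=0.50)

# Linear kernel, as in the text. C=1e10 is an assumed value.
model = SVC(kernel="linear", C=1e10)
model.fit(X, y)

print("kernel:", model.kernel)
print("number of support vectors:", len(model.support_vectors_))
print("training accuracy:", model.score(X, y))
```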
cache_size=200: This specifies the size of the kernel cache (in MB). The kernel cache stores the results of kernel computations to
speed up the training process.
class_weight=None: This parameter allows you to set the weights for different classes. When set to None, all classes are treated as
equally important. You can set it to 'balanced' to automatically adjust weights inversely proportional to class frequencies.
coef0=0.0: This parameter is relevant when using polynomial or sigmoid kernels. It is the independent (constant) term in the kernel
function: in the polynomial kernel it controls how strongly higher-order terms are weighted relative to lower-order ones, and in the
sigmoid kernel it acts as an offset. Since you're using a linear kernel, coef0 is not relevant here.
decision_function_shape='ovr': This determines the strategy for multi-class classification. 'ovr' stands for "one-vs-rest," meaning
that the classifier fits one classifier per class, with each classifier separating one class from the rest. The alternative is 'ovo'
(one-vs-one).
degree=3: This parameter is relevant for polynomial kernels and specifies the degree of the polynomial function. Like coef0, it's
not relevant for a linear kernel.
gamma='auto_deprecated': The gamma parameter defines the influence of a single training example. 'auto_deprecated' was used in
older versions of scikit-learn to signify an automatically calculated gamma value based on the number of features. In newer versions,
you should use 'scale' or 'auto'.
kernel='linear': This specifies the type of kernel used by the SVC. In this case, it's linear, which means the classifier will try to find a
linear decision boundary between classes.
probability=False: When set to True, this parameter enables probability estimates by training an additional model with
cross-validation. It increases training time, so it's set to False by default.
random_state=None: This parameter sets the seed for the random number generator, which can ensure reproducibility of results. None
means the random number generator will be initialized randomly.
shrinking=True: This parameter enables the shrinking heuristic, which can speed up training by ignoring some training examples that
are unlikely to change the decision boundary.
tol=0.001: This sets the tolerance for the stopping criterion. The algorithm will stop iterating when the error improvement is below
this threshold.
verbose=False: This controls whether to print detailed information during training. False means no output will be printed.
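The parameters listed above can be inspected and changed programmatically. The sketch below uses the estimator methods get_params and set_params. The exact defaults printed depend on the installed scikit-learn version (gamma in particular), so the values in the list above should be read as one version's output.

```python
from sklearn.svm import SVC

model = SVC(kernel="linear")

# Dictionary of every constructor parameter and its current value.
params = model.get_params()
for name in sorted(params):
    print(f"{name} = {params[name]!r}")

# Parameters can be updated after construction, before fitting.
model.set_params(class_weight="balanced", decision_function_shape="ovo")
print(model.get_params()["class_weight"])
```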
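Now, for a better understanding, the following function plots the decision function for a 2D SVC:
def decision_function(model, ax=None, plot_support=True):
    if ax is None:
        ax = plt.gca()
    xlim = ax.get_xlim()
    ylim = ax.get_ylim()
    # For evaluating the model, we need to create a grid as follows (30 points per axis is an assumed resolution):
    x = np.linspace(xlim[0], xlim[1], 30)
    y = np.linspace(ylim[0], ylim[1], 30)
    Y, X = np.meshgrid(y, x)
    xy = np.vstack([X.ravel(), Y.ravel()]).T
    P = model.decision_function(xy).reshape(X.shape)
    # Draw the decision boundary (level 0) and the margins (levels -1 and 1); line styling is assumed.
    ax.contour(X, Y, P, colors='k', levels=[-1, 0, 1], alpha=0.5, linestyles=['--', '-', '--'])
    if plot_support:
        ax.scatter(model.support_vectors_[:, 0],
                   model.support_vectors_[:, 1],
                   s=300, linewidth=1, facecolors='none', edgecolors='k')  # marker styling is assumed
    ax.set_xlim(xlim)
    ax.set_ylim(ylim)
decision_function(model)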
model.support_vectors_
array([[0.5323772 , 3.31338909],
[2.11114739, 3.57660449],
[1.46870582, 1.86947425]])
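Besides the coordinates in support_vectors_, a fitted SVC also records which training rows became support vectors and how many belong to each class. A small sketch, assuming a two-cluster dataset of 100 samples like the one described earlier (random_state, cluster_std and C are assumed values):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Assumed dataset and C; only the sample and cluster counts come from the text.
X, y = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=0.50)
model = SVC(kernel="linear", C=1e10).fit(X, y)

# Row indices of the support vectors in the training data.
print("support indices:", model.support_)

# Number of support vectors contributed by each class.
print("support vectors per class:", model.n_support_)

# The indices point back at the same coordinates stored in support_vectors_.
same_points = np.allclose(X[model.support_], model.support_vectors_)
print("indices match coordinates:", same_points)
```

Only these few rows define the separating line; the remaining training points lie outside the margin.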
In practice, the SVM algorithm is implemented with a kernel that transforms the input data space
into the required form.
SVM uses a technique called the kernel trick, in which the kernel takes
a low-dimensional input space and transforms it into a higher-dimensional space.
In simple words, the kernel converts non-separable problems into separable problems by adding more
dimensions to them.
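The idea of adding a dimension can be illustrated explicitly. The sketch below is an illustration under assumptions, not the kernel trick itself: it uses make_circles (an assumed choice of dataset) to build two concentric rings that no straight line can separate, then adds the squared distance from the origin as a third feature. A linear SVM on the lifted data separates the rings, and an RBF kernel reaches a similar result on the original 2D data without building the extra feature by hand.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings; all generator settings are assumed values.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear SVM in the original 2D space cannot separate the rings.
acc_2d = SVC(kernel="linear").fit(X, y).score(X, y)

# Add one dimension: the squared distance of each point from the origin.
r2 = (X ** 2).sum(axis=1)
X_3d = np.column_stack([X, r2])

# In the lifted 3D space a plane separates the inner ring from the outer ring.
acc_3d = SVC(kernel="linear").fit(X_3d, y).score(X_3d, y)

# An RBF kernel works on the 2D data directly.
acc_rbf = SVC(kernel="rbf", gamma="scale").fit(X, y).score(X, y)

print("linear kernel, 2D features:", acc_2d)
print("linear kernel, 3D features:", acc_3d)
print("RBF kernel, 2D features:  ", acc_rbf)
```

The numbers printed are training accuracies, used here only to show whether the classes can be separated.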
This makes SVM more powerful, flexible and accurate. The following are some of the types of kernels used
by SVM:
Linear Kernel
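It can be used as a dot product between any two observations. The formula of the linear kernel is as follows:
$$K(x, x_i) = \sum_{j} x_j \, x_{ij}$$
From the above formula, we can see that the product between two vectors, say $x$ and $x_i$, is the sum of the
multiplication of each pair of input values.
Polynomial Kernel
The polynomial kernel can be written as:
$$K(x, x_i) = \Big(1 + \sum_{j} x_j \, x_{ij}\Big)^{d}$$
Here $d$ is the degree of the polynomial, which we need to specify manually in the learning algorithm.
Radial Basis Function (RBF) Kernel
The RBF kernel can be written as:
$$K(x, x_i) = \exp\Big(-\gamma \sum_{j} (x_j - x_{ij})^2\Big)$$
Here, gamma must be positive, and values between 0 and 1 are typical. We need to specify it manually in the learning algorithm. A good
default value of gamma is 0.1.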
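The kernel formulas can be evaluated by hand and compared with scikit-learn's pairwise kernel functions. This is a sketch under stated assumptions: the two vectors are hypothetical values, the polynomial kernel is taken in the form (1 + sum(x * xi)) ** d, and the RBF kernel in the form exp(-gamma * sum((x - xi) ** 2)). In polynomial_kernel, this corresponds to gamma=1 and coef0=1.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

# Two hypothetical observations.
x = np.array([1.0, 2.0, 3.0])
xi = np.array([0.5, -1.0, 2.0])
d = 3        # polynomial degree (assumed value)
gamma = 0.1  # the default suggested in the text

# Kernels written out directly from the formulas.
k_linear = np.sum(x * xi)
k_poly = (1.0 + np.sum(x * xi)) ** d
k_rbf = np.exp(-gamma * np.sum((x - xi) ** 2))

# The same kernels from scikit-learn, which expects 2D arrays of shape (n_samples, n_features).
x_row, xi_row = x.reshape(1, -1), xi.reshape(1, -1)
sk_linear = linear_kernel(x_row, xi_row)[0, 0]
sk_poly = polynomial_kernel(x_row, xi_row, degree=d, gamma=1.0, coef0=1.0)[0, 0]
sk_rbf = rbf_kernel(x_row, xi_row, gamma=gamma)[0, 0]

print("linear:    ", k_linear, sk_linear)
print("polynomial:", k_poly, sk_poly)
print("rbf:       ", k_rbf, sk_rbf)
```

Each pair of printed values should agree up to floating-point rounding.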
Just as we implemented SVM for linearly separable data, we can implement it in Python for data that is
not linearly separable. This can be done by using kernels.
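import numpy as np
from sklearn import svm, datasets
import matplotlib.pyplot as plt

# Use only the first two features (sepal length and sepal width).
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

# Build a grid that covers the data; the padding and the step size h are assumed values.
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
h = 0.02
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
X_plot = np.c_[xx.ravel(), yy.ravel()]

C = 1.0  # regularization parameter
svc_classifier = svm.SVC(kernel='linear', C=C).fit(X, y)

# Predict a class for every grid point and draw the resulting regions.
Z = svc_classifier.predict(X_plot)
Z = Z.reshape(xx.shape)
plt.figure(figsize=(15, 5))
plt.subplot(121)
plt.contourf(xx, yy, Z, cmap=plt.cm.tab10, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.title('Support Vector Classifier with linear kernel')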
Output
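# Assumption: the fitting step is not shown at this point in the source. An RBF kernel
# with gamma='auto' is used here so that this second plot contrasts with the linear kernel.
svc_classifier = svm.SVC(kernel='rbf', gamma='auto', C=C).fit(X, y)
Z = svc_classifier.predict(X_plot)
Z = Z.reshape(xx.shape)
plt.figure(figsize=(15, 5))
plt.subplot(121)
plt.contourf(xx, yy, Z, cmap=plt.cm.tab10, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Set1)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.title('Support Vector Classifier with rbf kernel')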
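The effect of the kernel choice can also be compared numerically instead of visually. The sketch below fits a linear and an RBF classifier on the same two iris features and prints the training accuracy of each. The use of gamma='auto' for the RBF kernel is an assumption, and training accuracy is used only as a quick comparison on the data the plots show.

```python
from sklearn import datasets, svm

# Same two features as in the plots: sepal length and sepal width.
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
C = 1.0

scores = {}
for kernel in ("linear", "rbf"):
    # gamma is ignored by the linear kernel; 'auto' is an assumed setting for RBF.
    clf = svm.SVC(kernel=kernel, gamma="auto", C=C).fit(X, y)
    scores[kernel] = clf.score(X, y)
    print(f"{kernel} kernel training accuracy: {scores[kernel]:.3f}")
```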