
Kernels in Support Vector Machine

The most interesting feature of SVM is that it can work even with a non-linear dataset, and for this we use the "kernel trick", which makes it easier to classify the points. Suppose we have a dataset in which we cannot draw a single line, or hyperplane, that classifies the points correctly. What we do is convert this lower-dimensional space into a higher-dimensional space using some non-linear mapping (a quadratic function, for example), which allows us to find a decision boundary that clearly divides the data points. The functions that help us do this are called kernels, and which kernel to use is determined largely by hyperparameter tuning.
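As a rough sketch of what choosing a kernel by hyperparameter tuning can look like in practice (assuming scikit-learn; the make_circles toy data is just a stand-in for your own dataset):

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# A toy non-linear dataset: two concentric circles, not separable by a line.
X, y = make_circles(n_samples=200, factor=0.5, noise=0.1, random_state=0)

# Let cross-validation pick the kernel (and its main regularization value).
param_grid = {
    "kernel": ["linear", "poly", "rbf", "sigmoid"],
    "C": [0.1, 1, 10],
}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)  # on this data, the RBF kernel is the likely winner
```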

Different Kernel Functions


Some kernel functions which you can use in SVM are given below:
1. Polynomial Kernel
Following is the formula for the polynomial kernel:

K(X₁, X₂) = (X₁ · X₂ + 1)^d
Here d is the degree of the polynomial, which we need to specify manually.
Suppose we have two features X₁ and X₂ and output variable Y. Using a polynomial kernel of degree 2, we can write the mapping as:

(X₁, X₂) → (X₁, X₂, X₁², X₂², X₁·X₂)

So we basically need to find X₁², X₂² and X₁·X₂, and now we can see that 2 dimensions got converted into 5 dimensions.
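As a quick numerical check of this trick (a sketch, assuming the form K(X₁, X₂) = (X₁ · X₂ + 1)^d with d = 2; the exact feature map for this kernel also carries a constant coordinate and √2 scaling factors on top of the five coordinates above):

```python
import numpy as np

a = np.array([2.0, 3.0])
b = np.array([1.0, 4.0])

# Kernel value computed directly in the original 2-D space.
k = (a @ b + 1) ** 2

# Explicit feature map for (a.b + 1)^2: the squared terms and the cross
# term mentioned above, plus a constant and sqrt(2) scaling factors.
def phi(x):
    return np.array([
        1.0,
        np.sqrt(2) * x[0],
        np.sqrt(2) * x[1],
        x[0] ** 2,
        x[1] ** 2,
        np.sqrt(2) * x[0] * x[1],
    ])

# The inner product in the mapped space matches the kernel value exactly.
print(k, phi(a) @ phi(b))  # 225.0 225.0
```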

2. Sigmoid Kernel
We can use it as a proxy for neural networks. The equation is:

K(X₁, X₂) = tanh(γ (X₁ · X₂) + r)

where γ and r are hyperparameters. It takes your input and maps it to values between −1 and 1 (the range of tanh), so that the classes can be separated by a simple straight line in the mapped space.

Image source: https://dataaspirant.com/svm-kernels/#t-1608054630725
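A minimal sketch of this kernel with scikit-learn, whose sigmoid_kernel implements tanh(γ·⟨x, y⟩ + coef0); the γ and coef0 values here are arbitrary:

```python
import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

X = np.array([[0.0, 1.0],
              [1.0, 2.0]])

# Library Gram matrix vs. the same formula written out by hand.
G = sigmoid_kernel(X, gamma=0.5, coef0=1.0)
G_manual = np.tanh(0.5 * (X @ X.T) + 1.0)

print(np.allclose(G, G_manual))   # True
print(G.min() > -1 and G.max() < 1)  # tanh keeps values in (-1, 1)
```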


3. RBF Kernel
What it actually does is create non-linear combinations of our features to lift the samples onto a higher-dimensional feature space, where we can use a linear decision boundary to separate the classes. It is the most used kernel in SVM classification; the following formula explains it mathematically:

K(X₁, X₂) = exp(−||X₁ − X₂||² / 2σ²)

where,
1. ‘σ’ is the variance and our hyperparameter
2. ||X₁ – X₂|| is the Euclidean Distance between two points X₁ and X₂
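A small sketch checking this formula against scikit-learn's implementation, which parameterizes the kernel as exp(−γ||X₁ − X₂||²), so γ = 1/(2σ²):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x1 = np.array([[1.0, 2.0]])
x2 = np.array([[2.0, 0.0]])

sigma = 1.0
gamma = 1.0 / (2 * sigma ** 2)  # sklearn's gamma written in terms of sigma

manual = np.exp(-np.linalg.norm(x1 - x2) ** 2 / (2 * sigma ** 2))
library = rbf_kernel(x1, x2, gamma=gamma)[0, 0]

print(np.isclose(manual, library))  # True
```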
4. Bessel function kernel
It is mainly used for eliminating the cross term in mathematical functions. A common form of the Bessel function kernel is:

K(X₁, X₂) = J_{v+1}(σ||X₁ − X₂||) / ||X₁ − X₂||^(−n(v+1))

where J is the Bessel function of the first kind and σ, v and n are hyperparameters.
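A rough sketch of this kernel as a Gram-matrix function (assuming n = 1, and using scipy.special.jv for the Bessel function of the first kind; note that dividing by d^(−(v+1)) is the same as multiplying by d^(v+1)):

```python
import numpy as np
from scipy.special import jv  # Bessel function of the first kind, J_v

def bessel_kernel(X, Y, sigma=1.0, v=0.0):
    # Pairwise Euclidean distances between rows of X and rows of Y.
    dist = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    # J_{v+1}(sigma * d) / d^(-(v+1))  ==  J_{v+1}(sigma * d) * d^(v+1)
    return jv(v + 1, sigma * dist) * dist ** (v + 1)

X = np.random.default_rng(0).normal(size=(4, 2))
print(bessel_kernel(X, X).shape)  # (4, 4) Gram matrix
```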

5. Anova Kernel
It performs well on multidimensional regression problems. A common form of this kernel function is:

K(X₁, X₂) = Σ_{k=1..n} exp(−σ (x₁ₖ − x₂ₖ)²)^d

where x₁ₖ denotes the k-th component of X₁, and σ and d are hyperparameters.
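A sketch of this kernel plugged into scikit-learn as a custom (callable) kernel for regression; the data here is synthetic and purely illustrative:

```python
import numpy as np
from sklearn.svm import SVR

def anova_kernel(X, Y, sigma=1.0, d=2):
    # Per-dimension squared differences, shape (n_X, n_Y, n_features).
    diff2 = (X[:, None, :] - Y[None, :, :]) ** 2
    # Sum over the feature dimensions k of exp(-sigma * diff^2) ** d.
    return np.sum(np.exp(-sigma * diff2) ** d, axis=2)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X[:, 0] ** 2 + np.sin(X[:, 1])  # a toy non-linear regression target

model = SVR(kernel=anova_kernel)  # SVR accepts a callable Gram function
model.fit(X, y)
print(model.predict(X[:3]))
```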

How to Choose the Right Kernel?


You may be wondering how to decide which kernel function will work efficiently for your dataset. It is necessary to choose a good kernel function because the performance of the model depends on it.

Choosing a kernel totally depends on what kind of dataset you are working on. If it is linearly separable, then you should opt for the linear kernel function, since it is very easy to use and its complexity is much lower compared to other kernel functions. I'd recommend you start with the hypothesis that your data is linearly separable and choose a linear kernel function.

You can then work your way up towards the more complex kernel functions. Usually, we use SVM with the RBF or the linear kernel function, because other kernels, such as the polynomial kernel, are rarely used due to poor efficiency. But what if linear and RBF both give approximately similar results? Which kernel do we choose then?
Example
Let's understand this with the help of an example. For simplicity, I'll take only 2 features, which means 2 dimensions. Below, a linear SVM is fit on 2 features of the iris dataset, and its decision boundary can be plotted and compared against an RBF SVM:

(Figure: decision boundary of a linear SVM on two features of the iris dataset.)
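A minimal sketch reproducing that setup (assuming scikit-learn's bundled iris data) and comparing the two kernels by cross-validated accuracy, one reasonable tie-breaker when their results look similar:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = X[:, :2]  # keep only 2 features so the decision boundary is 2-D

# When linear and RBF score similarly, the simpler linear kernel is the
# usual choice; cross-validation makes the comparison concrete.
for kernel in ("linear", "rbf"):
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    print(kernel, round(scores.mean(), 3))
```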

