Radial Basis Function (RBF) Kernel: The Go-To Kernel, by Sushanth Sreenivasa (Towards Data Science)
Fig 1: No worries! RBF got you covered. [Image Credits: Tenor (tenor.com)]
The RBF kernel is one of the most general-purpose and widely used kernels, in
part because it has the same form as the Gaussian distribution. For two points
X₁ and X₂, the RBF kernel function computes their similarity, i.e. how close
they are to each other. This kernel can be mathematically represented as follows:

K(X₁, X₂) = exp(-||X₁ - X₂||² / (2σ²))

where,
1. ‘σ’ is the width of the kernel (σ² is the variance) and is our hyperparameter
2. ||X₁ - X₂|| is the Euclidean (L₂-norm) Distance between two points X₁ and X₂
Let d₁₂ = ||X₁ - X₂|| be the distance between the two points X₁ and X₂. The
kernel can then be written in terms of d₁₂ as follows:

K(X₁, X₂) = exp(-d₁₂² / (2σ²))

The maximum value of the RBF kernel is 1, which occurs when d₁₂ = 0, i.e. when
the points are the same (X₁ = X₂).
1. When the points are the same, there is no distance between them, the kernel
is exactly 1 and the points are maximally similar
2. When the points are separated by a large distance, the kernel value is close
to 0 (it never reaches exactly 0), which means that the points are dissimilar
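The behaviour described above can be checked numerically. The following is a minimal sketch, assuming the usual Gaussian form exp(-||X₁ - X₂||² / (2σ²)); the function name `rbf_kernel` and the sample points are illustrative and not taken from the article.

```python
import math

def rbf_kernel(x1, x2, sigma=1.0):
    """RBF similarity exp(-||x1 - x2||^2 / (2 * sigma^2)) of two equal-length points."""
    # Squared Euclidean (L2) distance between the two points
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Illustrative points (assumed for this sketch)
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical points, distance 0
near = rbf_kernel([1.0, 2.0], [1.5, 2.0])   # distance 0.5
far = rbf_kernel([1.0, 2.0], [7.0, 10.0])   # distance 10

print(same, near, far)
```

Identical points give the maximum similarity, and the value shrinks towards 0 as the points move apart.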
It is important to find the right value of ‘σ’ to decide which points should be
considered similar. The effect of σ is illustrated by the following three cases.
a] σ = 1
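When σ = 1, σ² = 1 and the RBF kernel becomes K(X₁, X₂) = exp(-d₁₂² / 2).
The curve for this equation is given below. As the distance increases, the RBF
kernel decays rapidly (like a Gaussian in d₁₂) and is practically 0 for
distances greater than 4, where exp(-8) ≈ 0.0003.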
Fig 4: RBF Kernel for σ = 1 [Image by Author]
1. We can notice that when d₁₂ = 0, the similarity is 1, and as d₁₂ increases
beyond 4 units, the similarity is approximately 0
2. From the graph, we see that if the distance is below 4, the points can be
considered similar to some degree, and if the distance is greater than 4, the
points are dissimilar
b] σ = 0.1
When σ = 0.1, σ² = 0.01 and the RBF kernel becomes
K(X₁, X₂) = exp(-d₁₂² / 0.02) = exp(-50 d₁₂²).
The width of the Region of Similarity is minimal for σ = 0.1, and hence points
are considered similar only if they are extremely close.
1. We see that the curve is extremely peaked: it drops to exp(-2) ≈ 0.14 at a
distance of 0.2 and is practically 0 for distances greater than 0.4
2. The points are considered similar only if the distance is a few tenths of a
unit or less
c] σ = 10
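When σ = 10, σ² = 100 and the RBF kernel becomes K(X₁, X₂) = exp(-d₁₂² / 200).
The width of the Region of Similarity is large for σ = 10, because of which
points that are farther apart can still be considered similar.
1. The curve is broad: at a distance of 10 units the similarity is still
exp(-0.5) ≈ 0.61, and it becomes practically 0 only beyond about 40 units
2. The points are considered similar for distances of 10 units and more, and
become dissimilar only at distances of several tens of units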
It is evident from the above cases that the width of the Region of Similarity
grows with σ: a small σ gives a narrow region and a large σ gives a wide one.
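How the width depends on σ can be made explicit by solving exp(-d² / (2σ²)) = t for d, which gives d = σ · sqrt(-2 ln t). The sketch below assumes a hypothetical threshold t = exp(-8) below which the similarity is treated as "practically 0"; the threshold and the function name `cutoff_distance` are choices made for this illustration, not part of the article.

```python
import math

def cutoff_distance(sigma, threshold=math.exp(-8)):
    """Distance at which exp(-d^2 / (2 * sigma^2)) has fallen to `threshold`."""
    # Solve exp(-d^2 / (2 sigma^2)) = threshold for d (requires 0 < threshold < 1)
    return sigma * math.sqrt(-2.0 * math.log(threshold))

# The three values of sigma discussed in the cases above
cutoffs = {sigma: cutoff_distance(sigma) for sigma in (0.1, 1.0, 10.0)}
for sigma, d in cutoffs.items():
    print(f"sigma = {sigma:>4}: similarity reaches the threshold at d = {d:.2f}")
```

With any fixed threshold the cutoff distance is a constant multiple of σ, so multiplying σ by 10 widens the Region of Similarity by a factor of 10.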
Finding the right σ for a given dataset is important and can be done by using
hyperparameter tuning techniques like Grid Search Cross Validation and Random
Search Cross Validation.
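To make the idea of a grid search concrete, here is a minimal, dependency-free sketch. It is not an SVM and not a library implementation: it assumes a simple stand-in classifier (an RBF-weighted vote over the training points), a made-up two-class toy dataset, a hypothetical grid of σ values, and leave-one-out cross validation as the scoring scheme.

```python
import math

def rbf_kernel(x1, x2, sigma):
    """RBF similarity exp(-||x1 - x2||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def predict(train, query, sigma):
    """Stand-in classifier: the label with the largest summed similarity wins."""
    votes = {}
    for point, label in train:
        votes[label] = votes.get(label, 0.0) + rbf_kernel(point, query, sigma)
    # Ties are broken in favour of the smallest label
    return max(sorted(votes), key=votes.get)

def loo_accuracy(data, sigma):
    """Leave-one-out cross validation accuracy for a given sigma."""
    hits = 0
    for i, (point, label) in enumerate(data):
        train = data[:i] + data[i + 1:]   # hold out the i-th sample
        if predict(train, point, sigma) == label:
            hits += 1
    return hits / len(data)

# Toy dataset (assumed): two well-separated clusters with labels 0 and 1
data = [((0.0, 0.0), 0), ((0.5, 0.2), 0), ((0.2, 0.6), 0), ((1.0, 0.8), 0),
        ((3.0, 3.0), 1), ((3.5, 2.8), 1), ((2.9, 3.6), 1), ((4.0, 3.2), 1)]

sigma_grid = [0.01, 0.1, 1.0, 10.0]       # hypothetical search grid
scores = {s: loo_accuracy(data, s) for s in sigma_grid}
best_sigma = max(sigma_grid, key=scores.get)

for s in sigma_grid:
    print(f"sigma = {s:>5}: leave-one-out accuracy = {scores[s]:.2f}")
print("best sigma:", best_sigma)
```

The same loop structure (try each candidate, score it on held-out data, keep the best) is what grid search tools automate for a real SVM, usually over σ (or γ) and C together.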
The RBF kernel is popular because of its similarity to the K-Nearest Neighbors
(K-NN) algorithm. It has the advantages of K-NN and overcomes its space
complexity problem, as an RBF kernel Support Vector Machine only needs to store
the support vectors found during training and not the entire dataset.
Fig 6: RBF Kernel SVM for Iris Dataset [Image Credits: https://scikit-learn.org/]
From the figure, we can see that as γ increases, i.e. as σ decreases (the two
are related by γ = 1 / (2σ²)), the model tends to overfit for a given value of C.
Finding the right γ or σ along with the value of C is essential in order to
achieve the best bias-variance trade-off.
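The link between γ and σ can be verified directly. scikit-learn writes the RBF kernel as exp(-γ ||X₁ - X₂||²), which matches the σ form when γ = 1 / (2σ²). The sketch below is a small check of that identity; the helper names are illustrative and the distance of 1.5 is an arbitrary test value.

```python
import math

def sigma_to_gamma(sigma):
    """Convert the kernel width sigma to the equivalent gamma = 1 / (2 * sigma^2)."""
    return 1.0 / (2.0 * sigma ** 2)

def rbf_from_sigma(d, sigma):
    """Kernel value at distance d in the sigma parameterisation."""
    return math.exp(-d ** 2 / (2.0 * sigma ** 2))

def rbf_from_gamma(d, gamma):
    """Kernel value at distance d in the gamma parameterisation."""
    return math.exp(-gamma * d ** 2)

d = 1.5  # arbitrary distance used for the check
for sigma in (0.1, 1.0, 10.0):
    gamma = sigma_to_gamma(sigma)
    same = math.isclose(rbf_from_sigma(d, sigma), rbf_from_gamma(d, gamma))
    print(f"sigma = {sigma:>4} -> gamma = {gamma:g}, kernels agree: {same}")
```

A smaller σ maps to a larger γ, which is why increasing γ narrows the Region of Similarity and pushes the model towards overfitting.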