Kernel Functions
Tejumade Afonja
Jan 2, 2017 · 6 min read
Lately, I have been doing some reading up on machine learning, and kernels happen to be an interesting part of classification problems. Before I go further: this topic was inspired by a Medium post written by Alan, Do-it-Yourself NLP for Bot Developers. Thanks A.
So what exactly is “machine learning” (ML)? Well, it turns out that ML is actually a lot of things, but the overarching theme is best summed up by this oft-quoted statement made by Arthur Samuel way back in 1959:
“Machine Learning is the field of study that gives computers the ability to learn without
being explicitly programmed.”
Among the different types of ML tasks is what we call supervised learning (SL). This is a situation where you put in data you already have answers to. For example, to predict whether a dog is a particular breed, we load in millions of dog records with properties like type, height, skin color, body hair length, etc. In ML lingo, these properties are referred to as ‘features’. A single entry in this list of features is a data instance, while the collection of everything is the Training Data, which forms the basis of your prediction: if you know the skin color, body hair length, height, and so on of a particular dog, then you can predict the breed it will probably belong to.
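As a rough sketch of what such training data might look like in code (the feature values and breed labels below are invented purely for illustration):

```python
# A toy supervised-learning dataset: each row of X is one data instance,
# each column is a feature, and y holds the answers (breed labels).
# All values here are made up for illustration.
X = [
    # height_cm, hair_length_cm, weight_kg
    [60.0, 5.0, 30.0],
    [25.0, 8.0,  7.0],
    [58.0, 4.5, 28.0],
    [23.0, 7.5,  6.5],
]
y = ["labrador", "shih_tzu", "labrador", "shih_tzu"]
```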
Before we can jump into kernels, we need to understand what a support vector machine is. Support Vector Machines, or SVMs, are supervised learning models with associated learning algorithms that analyze data for classification (classification means knowing what belongs to what, e.g. ‘apple’ belongs to class ‘fruit’ while ‘dog’ belongs to class ‘animal’; see Fig. 1).
Fig. 1
In a support vector machine, it looks somewhat like Fig. 2 below :) where a straight line separates the blue balls from the red ones.
Fig. 2
For the earlier example of predicting the breed of a particular dog, the pipeline goes like this:
Data (all breeds of dog) → Features (skin color, hair, etc.) → Learning algorithm
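A minimal sketch of that pipeline with scikit-learn (assuming scikit-learn is available; the synthetic blob data stands in for the dog features):

```python
# Minimal Data -> Features -> Learning algorithm pipeline with a linear SVM.
# make_blobs generates two well-separated clusters, playing the role of
# the blue and red balls in Fig. 2.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)  # data + labels
clf = SVC(kernel="linear")  # a linear SVM separates the classes with a straight line
clf.fit(X, y)               # the learning step

print(clf.predict(X[:5]))   # classify a few points
```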
So why Kernels?
Fig. 3
Can you solve the problem above linearly, the way we did in Fig. 2?
NO!
The red and blue balls cannot be separated by a straight line, as they are randomly distributed, and this, in reality, is how the data in most real-life problems is distributed.
In machine learning, a “kernel” usually refers to the kernel trick, a method of using a linear classifier to solve a non-linear problem. It entails transforming linearly inseparable data (Fig. 3) into linearly separable data (Fig. 2). The kernel function is applied to each data instance to map the original non-linear observations into a higher-dimensional space in which they become separable.
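A quick way to see this in code (a sketch using scikit-learn’s make_circles, which produces concentric rings like Fig. 3 that no straight line can separate):

```python
# Linearly inseparable data (concentric circles, like Fig. 3):
# a linear SVM struggles, while an RBF-kernel SVM separates it easily,
# because the kernel implicitly maps the points to a higher-dimensional
# space where they become linearly separable.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

print(f"linear kernel accuracy: {linear_acc:.2f}")  # roughly chance level
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")     # near 1.0
```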
Using the dog breed prediction example again, kernels offer a better alternative. Instead of defining a slew of features, you define a single kernel function that computes the similarity between breeds of dog. You provide this kernel, together with the data and labels, to the learning algorithm, and out comes a classifier.
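scikit-learn lets you hand the learning algorithm a kernel directly as a Python callable. A sketch (the similarity function here is a standard polynomial kernel, not anything dog-specific):

```python
# Supplying your own kernel function to the learning algorithm.
# SVC accepts a callable that, given two data matrices, returns the
# matrix of pairwise similarities K(x, y).
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

def my_kernel(A, B):
    # A simple polynomial kernel: K(x, y) = (<x, y> + 1)^2
    return (A @ B.T + 1) ** 2

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
clf = SVC(kernel=my_kernel).fit(X, y)  # kernel + data + labels in, classifier out
print(f"accuracy: {clf.score(X, y):.2f}")
```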
Mathematical definition: K(x, y) = <f(x), f(y)>. Here K is the kernel function, x and y are n-dimensional inputs, f is a map from n-dimensional to m-dimensional space, and <x, y> denotes the dot product. Usually, m is much larger than n.
Intuition: normally, calculating <f(x), f(y)> requires us to compute f(x) and f(y) first and then take their dot product. These two steps can be quite expensive, as they involve manipulations in the m-dimensional space, where m can be a large number. But after all the trouble of going to the high-dimensional space, the result of the dot product is really a scalar: we come back to one-dimensional space again! Now, the question is: do we really need to go through all this trouble to get one number? Do we really have to go to the m-dimensional space? The answer is no, if you find a clever kernel.
Simple example: let x = (x1, x2, x3) and y = (y1, y2, y3). Then for the function f(x) = (x1x1, x1x2, x1x3, x2x1, x2x2, x2x3, x3x1, x3x2, x3x3), the kernel is K(x, y) = (<x, y>)².
Let’s plug in some numbers to make this more intuitive: suppose x = (1, 2, 3) and y = (4, 5, 6). Then:
f(x) = (1, 2, 3, 2, 4, 6, 3, 6, 9)
f(y) = (16, 20, 24, 20, 25, 30, 24, 30, 36)
<f(x), f(y)> = 16 + 40 + 72 + 40 + 100 + 180 + 72 + 180 + 324 = 1024
Using the kernel instead: K(x, y) = (4 + 10 + 18)² = 32² = 1024. Same answer, but we never had to leave the 3-dimensional space.
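A few lines of Python (a sketch using NumPy) confirm that both routes give the same number:

```python
# Verify the worked example: explicit mapping vs. the kernel shortcut.
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

f = lambda v: np.outer(v, v).ravel()  # f maps 3 dims -> 9 dims: (v1v1, v1v2, ..., v3v3)

explicit = f(x) @ f(y)   # dot product in 9-dimensional space
shortcut = (x @ y) ** 2  # kernel shortcut: K(x, y) = (<x, y>)^2

print(explicit, shortcut)  # both print 1024
```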
That’s about it for kernels. Good job! You just took the first step to becoming a Machine Learning Expert :)
Extra note: to learn more, you can check out how I predicted the stock market at Numerai, and what kernels are in machine learning and SVMs.
Pelumi Aboluwarin did a fantastic job reading the draft and suggesting this topic. Thank you!
If you enjoyed reading this as much as I enjoyed writing it, you know what to do ;) show it some love, and if you have suggestions on topics you would like me to write about, drop them in the comments section below. Thanks for reading :)
Extra Readings
1. https://en.wikipedia.org/wiki/Statistical_classification
2. https://en.wikipedia.org/wiki/Supervised_learning