KNN & Support Vector Machines
Department of Electrical and Electronics Engineering (EEE)
Dr. S. Vasantharathna
K-Nearest Neighbor (KNN) Algorithm for Machine Learning
• KNN is one of the simplest Machine Learning algorithms, based on the Supervised Learning
technique.
• The K-NN algorithm assumes similarity between the new case/data and the available cases
and puts the new case into the category that is most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point based on
similarity. This means that when new data appears, it can be easily classified into a
well-suited category using the K-NN algorithm.
• The K-NN algorithm can be used for Regression as well as for Classification, but it is
mostly used for Classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any assumption about the
underlying data.
• It is also called a lazy learner algorithm because it does not learn from the training set
immediately; instead, it stores the dataset and, at the time of classification, performs an
action on the dataset.
• At the training phase, the KNN algorithm just stores the dataset, and when it gets new
data, it classifies that data into the category that is most similar to the new data.
Why do we need a K-NN Algorithm?
• Suppose there are two categories, Category A and Category B.
• In which of these categories will a new data point x1 lie?
• To solve this type of problem, the K-NN algorithm is useful.
How does K-NN work?
• The working of K-NN can be explained on the basis of the algorithm below:
• Step-1: Select the number K of neighbors.
• Step-2: Calculate the Euclidean distance from the new data point to each training point.
• Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
• Step-4: Among these K neighbors, count the number of data points in each category.
• Step-5: Assign the new data point to the category for which the number of neighbors is
maximum.
• Step-6: Our model is ready (see the sketch below).
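A minimal sketch of these steps, assuming scikit-learn is available; the 2-D dataset and the choice K = 5 are illustrative assumptions, not values from the slides:

```python
# Minimal K-NN sketch following the six steps (illustrative data)
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Step-1: choose K
K = 5

# Small made-up dataset: two features, two categories (A = 0, B = 1)
X_train = np.array([[1, 2], [2, 1], [2, 3], [3, 2], [3, 3],
                    [6, 6], [7, 5], [7, 7], [8, 6], [8, 8]])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Steps 2-5: Euclidean distance, K nearest neighbors, majority vote
model = KNeighborsClassifier(n_neighbors=K, metric="euclidean")
model.fit(X_train, y_train)      # lazy learner: fit simply stores the dataset

# Step-6: classify a new point x1
x1 = np.array([[3, 4]])
print(model.predict(x1))         # category of the majority of the 5 nearest neighbors
```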
How does K-NN work?
• By calculating the Euclidean distance, the nearest neighbors are obtained: three of the
nearest neighbors fall in category A and two in category B, so the new data point is
assigned to category A.
Popular distance metrics for KNN (https://www.kdnuggets.com/2020/11/most-popular-distance-metrics-knn.html):
A) Euclidean
B) Manhattan
C) Minkowski
D) Tanimoto
E) Jaccard
F) Mahalanobis
• Advantages of KNN Algorithm:
• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective if the training data is large.
• Disadvantages of KNN Algorithm:
• It always needs the value of K to be determined, which may be complex at times.
• The computation cost is high because the distance to every training sample must be
calculated for each new data point.
• https://towardsdatascience.com/importance-of-distance-metrics-in-machine-learning-modelling-e51395ffe60d
Example: A new customer has height 161 cm and weight 61 kg. The Euclidean distance to the
first training record (158 cm, 58 kg) is SQRT((161-158)^2 + (61-58)^2) = 4.242641.

Height (in cms)   Weight (in kgs)   T-Shirt Size   Euclidean Distance
158               58                M              4.242641
158               59                M              3.605551
158               63                M              3.605551
160               59                M              2.236068
160               60                M              1.414214
163               60                M              2.236068
163               61                M              2
160               64                L              3.162278
163               64                L              3.605551
165               61                L              4
165               62                L              4.123106
165               65                L              5.656854
168               62                L              7.071068
168               63                L              7.28011
168               66                L              8.602325
170               63                L              9.219544
170               64                L              9.486833
170               68                L              11.40175
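The distance column above can be reproduced with a few lines of NumPy; a minimal sketch, assuming K = 5 (an illustrative choice, not stated on the slide):

```python
# Reproduce the Euclidean distances for the T-shirt example and vote among the K nearest
import numpy as np
from collections import Counter

# Training data: (height in cm, weight in kg) -> T-shirt size
X = np.array([[158, 58], [158, 59], [158, 63], [160, 59], [160, 60], [163, 60],
              [163, 61], [160, 64], [163, 64], [165, 61], [165, 62], [165, 65],
              [168, 62], [168, 63], [168, 66], [170, 63], [170, 64], [170, 68]])
y = np.array(["M"] * 7 + ["L"] * 11)

new_customer = np.array([161, 61])          # height 161 cm, weight 61 kg

# Euclidean distance to every training record (matches the table column)
distances = np.sqrt(((X - new_customer) ** 2).sum(axis=1))

K = 5                                       # illustrative choice of K
nearest = np.argsort(distances)[:K]         # indices of the K smallest distances
print(Counter(y[nearest]).most_common(1))   # majority T-shirt size among the 5 nearest
```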
Assumptions of KNN
1. Standardization: When the independent variables in the training data are measured in
different units, it is important to standardize the variables before calculating distances.
For example, if one variable is based on height in cm and the other on weight in kg, then
height will influence the distance calculation more. To make them comparable, the variables
need to be standardized, for example by z-score standardization or by min-max (0-1) scaling
(a small sketch follows below).
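A minimal sketch of z-score standardization before the distance calculation; scikit-learn's StandardScaler is one common way to do it, and the small data subset below is reused from the T-shirt example purely for illustration:

```python
# Standardize height and weight so both contribute comparably to the Euclidean distance
import numpy as np
from sklearn.preprocessing import StandardScaler

# Small subset of the T-shirt data (height in cm, weight in kg), for brevity
X = np.array([[158, 58], [160, 60], [163, 61], [165, 65], [170, 68]], dtype=float)

scaler = StandardScaler()                    # z-score: (x - mean) / standard deviation
X_std = scaler.fit_transform(X)

new_customer = scaler.transform([[161, 61]]) # scale new points with the same parameters
distances = np.sqrt(((X_std - new_customer) ** 2).sum(axis=1))
print(distances)
```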
           X1   X2   X3   CLASS   DISTANCE TO TEST 1
SAMPLE 1    5    4    3     1     SQRT[(4-5)^2 + (4-4)^2 + (2-3)^2] = 1.4
SAMPLE 2    1    2    2     2     3.6
SAMPLE 3    1    2    3     2     3.7
TEST 1      4    4    2   ? = 1
TEST 2      2    1    4     ?
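A short sketch of the same calculation for both test points; the class of TEST 2 is left as a question on the slide, and the code below simply computes it with 1-NN:

```python
# 1-nearest-neighbor classification for the small three-feature example
import numpy as np

samples = np.array([[5, 4, 3], [1, 2, 2], [1, 2, 3]], dtype=float)
classes = np.array([1, 2, 2])
tests   = np.array([[4, 4, 2], [2, 1, 4]], dtype=float)    # TEST 1 and TEST 2

for name, t in zip(["TEST 1", "TEST 2"], tests):
    d = np.sqrt(((samples - t) ** 2).sum(axis=1))          # Euclidean distances
    print(name, "distances:", np.round(d, 1), "-> class", classes[np.argmin(d)])
```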
Support Vector Machines (SVMs)
The advantages of support vector machines are:
• Effective in high dimensional spaces.
• Still effective in cases where the number of dimensions is greater than the number of
samples.
• Uses a subset of training points in the decision function (called support vectors), so it
is also memory efficient.
• Versatile: different kernel functions can be specified for the decision function. Common
kernels are provided, but it is also possible to specify custom kernels.
• The objective of the support vector machine algorithm is to find a hyperplane in an
N-dimensional space (N = the number of features) that distinctly classifies the data points.
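As a concrete illustration, a linear SVM can be fitted on a small made-up 2-D dataset using scikit-learn's SVC with a linear kernel; the data and parameters below are illustrative assumptions, not taken from the slides:

```python
# Fit a linear SVM and inspect the separating hyperplane w.x + b = 0
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters (illustrative data)
X = np.array([[1, 2], [2, 1], [2, 3], [3, 2],
              [6, 6], [7, 5], [7, 7], [8, 6]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print("w =", clf.coef_[0])               # normal vector of the hyperplane
print("b =", clf.intercept_[0])          # offset of the hyperplane
print("support vectors:", clf.support_vectors_)
print("prediction for (4, 4):", clf.predict([[4, 4]]))
```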
Types of SVM
• Linear SVM: used for linearly separable data, i.e., data that can be separated by a single
straight line (hyperplane).
• Non-linear SVM: used for non-linearly separable data, where a kernel maps the data to a
higher-dimensional space in which it becomes separable.
• Hyperplanes are decision boundaries that help classify the data points. Data points
falling on either side of the hyperplane can be attributed to different classes. Also, the
dimension of the hyperplane depends upon the number of features. If the number of
input features is 2, then the hyperplane is just a line. If the number of input features is
3, then the hyperplane becomes a two-dimensional plane. It becomes difficult to
imagine when the number of features exceeds 3.
Identify the right hyper-plane (Scenario-1): Here, we have three hyper-planes (A, B and C).
Now, identify the right hyper-plane to classify the stars and circles.
You need to remember a thumb rule to identify the right hyper-plane: “Select the hyper-plane
which segregates the two classes better”. In this scenario, hyper-plane “B” has performed
this job excellently.
Identify the right hyper-plane (Scenario-2): Here, we have three hyper-planes (A, B and C),
and all of them segregate the classes well. Now, how can we identify the right hyper-plane?
Here, maximizing the distance between the nearest data points (of either class) and the
hyper-plane helps us decide the right hyper-plane. This distance is called the Margin.
Above, you can see that the margin for hyper-plane C is higher compared to both A and B.
Hence, we name the right hyper-plane C. Another compelling reason for selecting the
hyper-plane with the higher margin is robustness: if we select a hyper-plane with a low
margin, there is a high chance of misclassification.
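For a fitted linear SVM, the margin can be read off the weight vector: the distance between the two margin boundaries is 2/||w||. A minimal sketch, with a tiny illustrative dataset that is not from the slides:

```python
# Margin of a linear SVM: the gap between the two margin boundaries is 2 / ||w||
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [6, 6], [7, 7]], dtype=float)   # tiny illustrative dataset
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)
w = clf.coef_[0]                                # normal vector of the separating hyperplane
print("margin width:", 2.0 / np.linalg.norm(w))
```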
Identify the right hyper-plane (Scenario-3): Hint: use the rules discussed in the previous
section to identify the right hyper-plane.
But here is the catch: SVM selects the hyper-plane which classifies the classes accurately
prior to maximizing the margin. Here, hyper-plane B has a classification error and A has
classified everything correctly. Therefore, the right hyper-plane is A.
Can we classify two classes (Scenario-4)? Below, we are unable to segregate the two classes
using a straight line, as one of the stars lies in the territory of the other (circle) class
as an outlier.
The one star at the other end is like an outlier for the star class. The SVM algorithm has a
feature to ignore outliers and find the hyper-plane that has the maximum margin.
Find the hyper-plane to segregate two classes (Scenario-5): In the scenario below, we can't
have a linear hyper-plane between the two classes, so how does SVM classify these two
classes? Till now, we have only looked at linear hyper-planes.
SVM can solve this problem by introducing an additional feature. Here, we will add a new
feature z = x^2 + y^2. Now, let's plot the data points on the x and z axes:
• All values of z will always be positive because z is the squared sum of both x and y.
• In the original plot, the red circles appear close to the origin of the x and y axes,
leading to lower values of z, while the stars lie relatively far from the origin, resulting
in higher values of z.
The SVM algorithm has a technique called the kernel trick. An SVM kernel is a function that
takes a low-dimensional input space and transforms it into a higher-dimensional space, i.e.,
it converts a non-separable problem into a separable problem. It is mostly useful in
non-linear separation problems. Simply put, it does some extremely complex data
transformations and then finds out how to separate the data based on the labels or outputs
defined.
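A minimal sketch of this idea on made-up circular data: first the explicit z = x^2 + y^2 feature, which makes the classes linearly separable, and then the equivalent shortcut of letting SVC use an RBF kernel. All data and parameters here are illustrative assumptions:

```python
# Kernel-trick intuition: an extra feature z = x^2 + y^2 makes a circular pattern separable
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Inner cluster (circles, label 0) and outer ring (stars, label 1) -- illustrative data
angles = rng.uniform(0, 2 * np.pi, 40)
inner = np.column_stack([0.5 * np.cos(angles[:20]), 0.5 * np.sin(angles[:20])])
outer = np.column_stack([2.0 * np.cos(angles[20:]), 2.0 * np.sin(angles[20:])])
X = np.vstack([inner, outer])
y = np.array([0] * 20 + [1] * 20)

# Explicit new feature z = x^2 + y^2: the classes become separable by a line in (x, z)
z = (X ** 2).sum(axis=1, keepdims=True)
clf_explicit = SVC(kernel="linear").fit(np.hstack([X, z]), y)

# Kernel trick: the RBF kernel performs an implicit mapping, no manual feature needed
clf_rbf = SVC(kernel="rbf").fit(X, y)
print(clf_rbf.predict([[0.1, 0.2], [1.9, 0.3]]))   # expected: inner -> 0, outer -> 1
```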
• Reference: https://www.javatpoint.com/machine-learning-support-vector-machine-algorithm