
CLASSIFICATION

Arvind Deshpande

K-Nearest Neighbours (KNN)


• Eager learning: the classification algorithm constructs a
classification model before receiving new data to classify.
• Lazy learning (instance-based learning): the training data is
simply stored (or given only minor processing), and the learner
waits until it is given a test tuple to classify.
• The KNN algorithm is an instance-based (lazy) learner.
• Unlike eager learning methods, lazy learners do less work when a
training tuple is presented and more work when making a
classification or numeric prediction.

KNN classifier
• KNN classifiers are based on learning by analogy, that is,
by comparing a given test tuple with training tuples that
are similar to it.
• The training tuples are described by n attributes.
• Each tuple represents a point in an n-dimensional space.
In this way, all the training tuples are stored in an n-
dimensional pattern space.
• When given an unknown tuple, a k-nearest-neighbor
classifier searches the pattern space for the k training
tuples that are closest to the unknown tuple.
• These k training tuples are the k “nearest neighbors” of
the unknown tuple.

KNN classifier
• “Closeness” is defined in terms of a distance metric, such as
Euclidean distance. The Euclidean distance between two points or
tuples, say, $X_1 = (x_{11}, x_{12}, \dots, x_{1n})$ and
$X_2 = (x_{21}, x_{22}, \dots, x_{2n})$, is

$$\mathrm{dist}(X_1, X_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2}$$

• For categorical attributes, if the two tuples have identical
values for the attribute, the difference is taken as 0; otherwise
the difference is 1.
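As an illustration of this distance computation over mixed attribute
types, here is a minimal Python sketch; the function name, the
attribute encoding, and the example values are illustrative
assumptions, not part of the original slides:

```python
import math

def mixed_distance(x1, x2, categorical):
    """Euclidean-style distance over mixed attributes: numeric attributes
    contribute squared differences, categorical attributes contribute
    0 on a match and 1 on a mismatch (illustrative sketch)."""
    total = 0.0
    for i, (a, b) in enumerate(zip(x1, x2)):
        if i in categorical:                # categorical attribute: match/mismatch
            total += 0.0 if a == b else 1.0
        else:                               # numeric attribute: squared difference
            total += (a - b) ** 2
    return math.sqrt(total)

# Example: the attribute at index 2 is categorical
print(mixed_distance([1.0, 2.0, "red"], [2.0, 0.5, "blue"], categorical={2}))
```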

KNN classifier
• Typically, we normalize the values of each attribute before
using the equation. This helps prevent attributes with
initially large ranges (e.g., income) from outweighing
attributes with initially smaller ranges (e.g., binary
attributes).
• Min-max normalization, for example, can be used to
transform a value v of a numeric attribute A to v’ in the
range [0, 1] by computing

$$v' = \frac{v - \min_A}{\max_A - \min_A}$$

where $\min_A$ and $\max_A$ are the minimum and maximum values of
attribute A.
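A minimal sketch of min-max normalization in Python, assuming the
attribute is not constant (max ≠ min); the function name and the
income values are illustrative:

```python
def min_max_normalize(values):
    """Rescale numeric attribute values to the [0, 1] range using
    min-max normalization (assumes max != min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Example: a wide income range is rescaled so it no longer dominates the distance
print(min_max_normalize([30000, 45000, 120000]))  # [0.0, 0.1666..., 1.0]
```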


KNN Algorithm
1. Calculate "d (x, xi)" i = 1, 2……, n where d denotes the
Euclidean distance between the points.
2. Arrange the calculated in Euclidean distances in non-
decreasing order.
3. Let k be a +ve integer, take the first k distances from
this sorted list.
4. Find those k-points corresponding to these k-distances.
5. Let ki denotes the number of points belonging to the ith
class among k points i.e. k > 0.
6. If ki > kj , i ≠ j then put x in class i.
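The steps above can be expressed as a short Python sketch; the
function names, the toy data, and the tie-breaking behaviour of
Counter.most_common are illustrative assumptions rather than part of
the original slides:

```python
from collections import Counter
import math

def euclidean(x1, x2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def knn_classify(x, train_points, train_labels, k):
    """Classify x by majority vote among its k nearest training points."""
    # Steps 1-2: compute the distance to every training point and sort
    distances = sorted(
        (euclidean(x, xi), label) for xi, label in zip(train_points, train_labels)
    )
    # Steps 3-4: keep the labels of the k closest points
    nearest_labels = [label for _, label in distances[:k]]
    # Steps 5-6: count class memberships and return the majority class
    return Counter(nearest_labels).most_common(1)[0][0]

# Example usage with toy 2-D data
X_train = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
y_train = ["A", "A", "B", "B"]
print(knn_classify((1.1, 0.9), X_train, y_train, k=3))  # expected: "A"
```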

Choosing the right value of K


• The KNN algorithm is run several times with different values of
K, and we choose the K that reduces the number of errors while
maintaining the algorithm’s ability to make accurate predictions on
data it has not seen before (see the sketch after this list).
• As we decrease the value of K to 1, our predictions become
less stable.
• Inversely, as we increase the value of K, our predictions
become more stable due to majority voting / averaging, and
thus, more likely to make more accurate predictions (up to a
certain point.)
• Eventually, we begin to witness an increasing number of errors.
It is at this point that we know we have pushed the value of K too
far.
• In cases where we are taking a majority vote (e.g. picking the
mode in a classification problem) among labels, we usually
make K an odd number to have a tiebreaker.
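A hedged sketch of this selection procedure: evaluate several
candidate values of K on a held-out validation set and keep the one
with the fewest errors. It reuses the knn_classify sketch shown
earlier; the candidate list and the train/validation split are
illustrative assumptions:

```python
def choose_k(X_train, y_train, X_val, y_val, candidate_ks=(1, 3, 5, 7, 9)):
    """Return the candidate K with the fewest validation errors,
    reusing the knn_classify sketch above (odd candidates help break ties)."""
    best_k, best_errors = None, float("inf")
    for k in candidate_ks:
        errors = sum(
            knn_classify(x, X_train, y_train, k) != y
            for x, y in zip(X_val, y_val)
        )
        if errors < best_errors:
            best_k, best_errors = k, errors
    return best_k
```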

Advantages
• The algorithm is simple and easy to implement.
• It is a lazy learning algorithm and therefore requires no
training prior to making real-time predictions.
• There is no need to build a model, tune several parameters, or
make additional assumptions. This makes KNN much faster to put to
work than algorithms that require a training phase, such as SVM or
linear regression.
• Since the algorithm requires no training before making
predictions, new data can be added seamlessly.
• Only two parameters are required to implement KNN: the value of
K and the distance function.
• The algorithm is versatile. It can be used for classification,
regression and search.

Disadvantages
• It does not work well with high-dimensional data, because with a
large number of dimensions it becomes difficult for the algorithm
to compute meaningful distances.
• KNN has a high prediction cost for large datasets, because the
distance between the new point and every existing point must be
calculated.
• KNN does not work well with categorical features, since it is
difficult to define a distance over such features.
• The algorithm gets significantly slower as the number of
examples and/or predictors (independent variables) increases.

Disadvantages
• When making a classification or numeric prediction, lazy
learners can be computationally expensive.
• They require efficient storage techniques and are well
suited to implementation on parallel hardware.
• They offer little explanation or insight into the data’s
structure.
• They can suffer from poor accuracy when given noisy or
irrelevant attributes.
• Nearest-neighbor classifiers can be extremely slow when
classifying test tuples.
