K-Nearest Neighbor Learning
Classification is the process of assigning data to predefined class labels. Clustering is similar to classification, but there are no predefined class labels. Classification is a form of supervised learning; clustering, by contrast, is known as unsupervised learning.
Nearest Neighbor Classifier
Nearest-neighbor classifiers are
based on learning by analogy, that
is, by comparing a given test tuple
with training tuples that are similar
to it.
The training tuples are described by
n attributes. Each tuple represents a
point in an n-dimensional space. In
this way, all of the training tuples
are stored in an n-dimensional
pattern space.
When given an unknown tuple, a k-
nearest-neighbor classifier searches
the pattern space for the k training
tuples that are closest to the
unknown tuple. These k training
tuples are the k “nearest neighbors”
of the unknown tuple.
Closeness is defined in terms of a distance metric, such as Euclidean distance. The Euclidean distance between two points or tuples, say, X1 = (x11, x12, …, x1n) and X2 = (x21, x22, …, x2n), is

d(X1, X2) = sqrt(sum for i = 1 to n of (x1i - x2i)^2)
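The distance formula above can be sketched in a few lines of Python (an illustrative snippet; the function name is my own, not from the slides):

```python
import math

def euclidean_distance(x1, x2):
    """Euclidean distance between two equal-length tuples of numbers."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

# Distance between (3, 7) and (7, 7): sqrt((3-7)^2 + (7-7)^2) = 4.0
print(euclidean_distance((3, 7), (7, 7)))  # 4.0
```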
Class Label
For k-nearest-neighbor
classification, the unknown tuple is
assigned the most common class
among its k nearest neighbors.
When k = 1, the unknown tuple is
assigned the class of the training
tuple that is closest to it in pattern
space.
[Figures: 1-nearest-neighbor and 3-nearest-neighbor decision examples]
K-Nearest Neighbor
An arbitrary instance x is represented by the feature vector
(a1(x), a2(x), a3(x), …, an(x))
where ar(x) denotes the value of the r-th feature of x.
The Euclidean distance between two instances xi and xj is
d(xi, xj) = sqrt(sum for r = 1 to n of (ar(xi) - ar(xj))^2)
For a continuous-valued target function, the prediction is the mean value of the k nearest training examples.
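The continuous-valued case, predicting the mean target of the k nearest training examples, can be sketched as follows (the function name and sample data are hypothetical):

```python
import math

def knn_regress(query, training, k):
    """Predict a continuous target as the mean of the k nearest
    training examples. `training` is a list of (features, target) pairs."""
    dist = lambda x1, x2: math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))
    nearest = sorted(training, key=lambda ex: dist(query, ex[0]))[:k]
    return sum(target for _, target in nearest) / k

# Hypothetical data: the target roughly tracks the single feature
data = [((1,), 1.0), ((2,), 2.0), ((3,), 3.0), ((10,), 10.0)]
print(knn_regress((2,), data, 3))  # mean of 2.0, 1.0, 3.0 -> 2.0
```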
K-Nearest Neighbor
Here is how to apply the K-nearest neighbor algorithm, step by step:
1. Determine the parameter K, the number of nearest neighbors.
2. Calculate the distance between the query instance and all the training samples.
3. Sort the distances and determine the nearest neighbors as the K samples with the smallest distances.
4. Gather the categories Y of those nearest neighbors.
5. Use the majority category of the nearest neighbors as the predicted class of the query instance.
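The five steps above can be sketched as a small Python function (a minimal illustration, assuming numeric feature tuples; the names are my own):

```python
import math
from collections import Counter

def knn_classify(query, training, k):
    """K-nearest-neighbor classification.
    `training` is a list of (features, label) pairs."""
    # Step 2: distance between the query instance and every training sample
    dist = lambda x1, x2: math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))
    distances = [(dist(query, x), y) for x, y in training]
    # Step 3: sort the distances and keep the k nearest neighbors
    neighbors = sorted(distances, key=lambda d: d[0])[:k]
    # Step 4: gather the categories of the nearest neighbors
    labels = [y for _, y in neighbors]
    # Step 5: majority vote decides the predicted class
    return Counter(labels).most_common(1)[0][0]
```

Note that ties (both in distance and in the vote) are broken arbitrarily here; a production implementation would need an explicit tie-breaking rule.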
Example
We have data with two attributes, used to classify whether a special tissue is ‘good’ or ‘bad’:

X1 = Acid Durability (seconds) | X2 = Strength (kg/square meter) | Y = Classification
7 | 7 | Bad
7 | 4 | Bad
3 | 4 | Good
1 | 4 | Good
Now the factory produces a new tissue that passes the laboratory test with X1 = 3 and X2 = 7. Without another expensive survey, can we guess what the classification of this new tissue is?
1. Determine the parameter K, the number of nearest neighbors. Suppose we use K = 3.
2. Calculate the distance between the query instance (X1 = 3, X2 = 7) and all the training samples. Steps 3 and 4 (ranking the distances and gathering the categories) are shown in the same table:

X1 = Acid Durability (seconds) | X2 = Strength (kg/square meter) | Squared distance to query | Rank | Included in 3 nearest neighbors? | Y = Category
7 | 7 | (7-3)^2 + (7-7)^2 = 16 | 3 | Yes | Bad
7 | 4 | (7-3)^2 + (4-7)^2 = 25 | 4 | No | -
3 | 4 | (3-3)^2 + (4-7)^2 = 9 | 1 | Yes | Good
1 | 4 | (1-3)^2 + (4-7)^2 = 13 | 2 | Yes | Good
5. Use the majority category of the nearest neighbors as the predicted class of the query instance. We have 2 Good and 1 Bad among the 3 nearest neighbors, so the new tissue is classified as Good.
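The whole worked example can be reproduced with a short self-contained script (variable names are illustrative):

```python
import math
from collections import Counter

# Training data from the tissue example
training = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]
query = (3, 7)  # the new tissue: X1 = 3, X2 = 7
k = 3

dist = lambda p, q: math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
# Sort training samples by distance to the query and keep the 3 nearest
neighbors = sorted(training, key=lambda ex: dist(query, ex[0]))[:k]
votes = Counter(label for _, label in neighbors)
print(votes)                       # Counter({'Good': 2, 'Bad': 1})
print(votes.most_common(1)[0][0])  # Good
```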
Exercise
Find the class label for the instance (Age = 26, Income = 18).