CPE412 Pattern Recognition (Week 6)
[Figure: computing the distance from a test record to the training records]
The K-NN algorithm is mostly used for
classification problems.
It is also called a lazy learner algorithm
because it does not learn from the training set
immediately; instead, it stores the dataset and
performs an action on it at the time of
classification.
At the training phase, the KNN algorithm simply
stores the dataset; when it receives new data, it
classifies that data into the category most
similar to it.
For a given instance T, get the top k dataset
instances that are "nearest" to T.
◦ Select a reasonable distance measure.
Inspect the categories (possibly more than one) of
these k instances, and choose the category C that
represents the most instances.
Conclude that T belongs to category C.
In practice, k is usually chosen to be odd to
avoid ties. A minimal sketch of these steps follows.
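Below is a minimal sketch of these steps in plain Python; the function name knn_classify and the toy data are illustrative, not part of the slides. Euclidean distance plays the role of the "reasonable distance measure."

import math
from collections import Counter

def knn_classify(T, dataset, k=3):
    """Classify instance T given (point, label) training pairs."""
    # Distance from T to every training instance (Euclidean).
    dists = [(math.dist(T, x), label) for x, label in dataset]
    # Keep the k nearest instances.
    nearest = sorted(dists)[:k]
    # Majority vote over the categories of the k neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy example (hypothetical data): two categories, A and B.
data = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
        ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B")]
print(knn_classify((2, 2), data, k=3))  # -> "A"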
Advantages of KNN Algorithm:
◦ It is simple to implement.
◦ It is robust to noisy training data.
◦ It can be more effective when the training data is large.
Disadvantages of KNN Algorithm:
◦ The value of K always needs to be determined, which
may sometimes be complex.
◦ The computation cost is high, because the distance
to every training sample must be calculated.
Suppose we have a new data point and we need to
put it in the required category. Consider the
image below:
First, we choose the
number of neighbors; here
we choose k = 5.
Next, we calculate the
Euclidean distance
between the data points.
The Euclidean distance is
the distance between two
points, familiar from
geometry. It can be
calculated as:
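For two points (x₁, y₁) and (x₂, y₂):

d = √((x₂ − x₁)² + (y₂ − y₁)²)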
By calculating the Euclidean distances we obtain the nearest
neighbors: three nearest neighbors in category A and two
nearest neighbors in category B. Consider the image below:
The K value indicates the count of nearest
neighbors. We have to compute the distance
between the test point and every labeled
training point. Recomputing these distances for
every new query is computationally expensive,
which is why KNN is called a lazy learning
algorithm. A vectorized sketch of this step is
shown below.
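As a sketch of that cost, the following assumes a NumPy array X_train of training points (the names and values are hypothetical, not from the slides) and computes the distance from one query to all of them; every training point is touched for every single query.

import numpy as np

# Hypothetical training set: n points with 2 features each.
X_train = np.array([[1.0, 1.0], [2.0, 1.5], [6.0, 6.0], [7.0, 5.5]])
query = np.array([2.0, 2.0])

# Euclidean distance from the query to every training point.
# This O(n) pass is repeated for each new query, which is the
# cost that makes KNN "lazy" rather than model-building.
dists = np.linalg.norm(X_train - query, axis=1)
print(dists.argsort()[:3])  # indices of the 3 nearest neighbors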
As you can verify from the
image, if we proceed with K=3,
then we predict that the test
input belongs to class B, and if
we continue with K=7, then we
predict that the test input
belongs to class A.
This shows how strong an effect
the K value has on KNN
performance. The example below
illustrates the same flip.
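To see that flip concretely, here is a hypothetical layout of points (ours, not taken from the figure) where the three closest neighbors of the query are class B but the next four are class A, reusing the knn_classify sketch from above.

# Hypothetical points: the 3 closest to the query are class B,
# the next 4 are class A (mirrors the situation in the figure).
data = [((0.0, 1.0), "B"), ((1.0, 0.5), "B"), ((0.5, 1.2), "B"),
        ((2.0, 0.0), "A"), ((0.0, 2.1), "A"),
        ((2.2, 0.5), "A"), ((0.5, 2.3), "A")]
query = (0.0, 0.0)
print(knn_classify(query, data, k=3))  # -> "B" (all 3 nearest are B)
print(knn_classify(query, data, k=7))  # -> "A" (4 A votes vs 3 B votes)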
The distance formula involves comparing the values of each feature. For
example, to calculate the distance between the tomato (sweetness = 6,
crunchiness = 4) and the green bean (sweetness = 3, crunchiness = 7),
we can use the formula as follows:
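Plugging these feature values into the Euclidean distance formula:

dist(tomato, green bean) = √((6 − 3)² + (4 − 7)²) = √(9 + 9) = √18 ≈ 4.2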
In the classifier, we might set k = 4: a common rule
of thumb is to choose k near the square root of the
number of training examples, and there were 15 example
ingredients in the training data, with √15 ≈ 3.87.
k-means clustering