CPE412 Pattern Recognition

Week 6

KNN (K-Nearest Neighbor)

Dr. Nehad Ramaha,
Computer Engineering Department
Karabük University
The class notes are a compilation and edition from many sources. The instructor does not claim intellectual property or ownership of the lecture notes.
 Basic idea:
◦ If it walks like a duck, quacks like a duck, then it's probably a duck.

[Figure: KNN classification flow. Training Records and a Test Record feed into a "Compute Distance" step; the k "nearest" records are then chosen.]
 K-Nearest Neighbor is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique.
 The K-NN algorithm measures the similarity between the new case/data and the available cases, and puts the new case into the category most similar to it.
 The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be assigned to a well-suited category using the K-NN algorithm.

 The K-NN algorithm is mostly used for Classification problems.
 It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs an action on it only at classification time.
 At the training phase, the KNN algorithm just stores the dataset; when it gets new data, it classifies that data into the category most similar to it (see the sketch below).
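To make the lazy-learning behavior concrete, here is a minimal usage sketch with scikit-learn's KNeighborsClassifier (the toy data points are invented for illustration; they are not from the slides). Note that fit() does little more than store the training set; the distance work happens at prediction time:

from sklearn.neighbors import KNeighborsClassifier

# Toy training set: 2 features per sample, two classes (invented for illustration).
X_train = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # cluster of class "A"
           [4.0, 4.2], [4.1, 3.9], [3.8, 4.0]]   # cluster of class "B"
y_train = ["A", "A", "A", "B", "B", "B"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)         # "training" = essentially storing the data
print(knn.predict([[1.1, 0.9]]))  # -> ['A']: its 3 nearest neighbors are all "A"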
 For a given instance T, get the top k dataset instances that are "nearest" to T.
◦ Select a reasonable distance measure.
 Inspect the categories (there may be more than one) of these k instances, and choose the category C that represents the most instances.
 Conclude that T belongs to category C.
 In practice, k is usually chosen to be odd, to avoid ties (see the sketch below).
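As a quick illustration of the tie problem (a hypothetical vote count, not from the slides):

from collections import Counter

# With an even k, the nearest labels can split evenly -- an ambiguous tie.
votes = Counter(["A", "B", "A", "B"])        # k = 4
print(votes.most_common())                   # [('A', 2), ('B', 2)]

# With an odd k, a two-class vote always produces a strict majority.
votes = Counter(["A", "B", "A", "B", "A"])   # k = 5
print(votes.most_common(1))                  # [('A', 3)] -> predict "A"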
[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor. The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.]
 Step-1: Select the number K of neighbors.
 Step-2: Calculate the Euclidean distance from the new point to every training point.
 Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
 Step-4: Among these K neighbors, count the number of data points in each category.
 Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
 Step-6: Our model is ready. (A from-scratch sketch of these steps follows below.)
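A minimal from-scratch sketch of Steps 1-5 in plain Python (function names and example points are invented for illustration):

import math
from collections import Counter

def euclidean(p, q):
    # Step-2: straight-line distance between two feature vectors
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(train_points, train_labels, new_point, k=5):
    # Step-1: k is the number of neighbors to consult
    # Step-3: sort training points by distance and keep the k nearest
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], new_point),
    )[:k]
    # Step-4 and Step-5: count categories and return the majority one
    labels = [label for _, label in nearest]
    return Counter(labels).most_common(1)[0][0]

# Example: three "A" points near (1, 1) and two "B" points near (4, 4)
points = [(1, 1), (1, 2), (2, 1), (4, 4), (5, 4)]
labels = ["A", "A", "A", "B", "B"]
print(knn_classify(points, labels, (1.5, 1.5), k=3))  # -> A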

 Advantages of KNN Algorithm:
◦ It is simple to implement.
◦ It is robust to noisy training data.
◦ It can be more effective if the training data is large.
 Disadvantages of KNN Algorithm:
◦ It always needs a value of K, which can be complex to determine.
◦ The computation cost is high, because the distance to every training sample must be calculated.

 Suppose we have a new data point and we need to put it in the required category. Consider the image below:

[Figure: a new data point plotted among points of Category A and Category B.]
 Firstly, we will choose the number of neighbors; we will choose k = 5.
 Next, we will calculate the Euclidean distance between the data points. The Euclidean distance is the distance between two points, which we have already studied in geometry. It can be calculated as:

d = √((x₂ − x₁)² + (y₂ − y₁)²)
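As a quick check, Python's standard library offers math.dist (available since Python 3.8); the two points below are made up:

import math

# √((4 − 1)² + (5 − 1)²) = √25 = 5.0
print(math.dist((1, 1), (4, 5)))  # -> 5.0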
 By calculating the Euclidean distance, we get the nearest neighbors: three nearest neighbors in category A and two nearest neighbors in category B. Consider the image below:

[Figure: the new data point with its 5 nearest neighbors marked: 3 in Category A and 2 in Category B.]


•As we can see, 3 of the 5 nearest neighbors are from category A; hence this new data point must belong to category A.

 The K value indicates the count of nearest neighbors. We have to compute the distance between each test point and every training point. Recomputing these distances for every new query is computationally expensive, and that is why KNN is called a lazy learning algorithm.

 As you can verify from the image, if we proceed with K=3, then we predict that the test input belongs to class B, and if we continue with K=7, then we predict that the test input belongs to class A.
 That is how you can see that the K value has a powerful effect on KNN performance; the sketch below reproduces this behavior.
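A small sketch reproducing this effect; the coordinates are hand-built (not from the slides) precisely so that K=3 and K=7 disagree:

from sklearn.neighbors import KNeighborsClassifier

# Three "B" points at distance 1 from the origin, four "A" points at distance 2.
X = [(1, 0), (0, 1), (-1, 0),            # class B (closest)
     (2, 0), (0, 2), (-2, 0), (0, -2)]   # class A (farther, but more of them)
y = ["B", "B", "B", "A", "A", "A", "A"]

for k in (3, 7):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(k, knn.predict([(0, 0)]))      # k=3 -> ['B'], k=7 -> ['A']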

 The distance formula involves comparing the values of each feature. For example, to calculate the distance between the tomato (sweetness = 6, crunchiness = 4) and the green bean (sweetness = 3, crunchiness = 7), we can use the formula as follows:

d = √((6 − 3)² + (4 − 7)²) = √(9 + 9) = √18 ≈ 4.24

 In the classifier, we might set k = 4, because there are 15 example ingredients in the training data, and the square root of 15 is 3.87 (rounded up to 4).
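Both numbers are easy to verify in Python (rounding 3.87 up to k = 4 is one common rule of thumb, not a strict requirement):

import math

# Distance between tomato (6, 4) and green bean (3, 7): √18 ≈ 4.24
print(math.dist((6, 4), (3, 7)))

# Rule of thumb for k: about the square root of the training-set size
n_train = 15
print(math.ceil(math.sqrt(n_train)))  # -> 4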

 k-means clustering
