
Chapter 4

Similarity Based Learning

8. Consider the following training dataset of 10 data instances shown in Table 4.12, which
describes the award performance of individual students based on GPA and the number of
projects done. The target variable is 'Award', a discrete-valued variable that takes two
values, 'Yes' or 'No'.
Table 4.12 Training Dataset

S.No.   GPA   No. of Projects done   Award
1.      9.5   5                      Yes
2.      8.0   4                      Yes
3.      7.2   1                      No
4.      6.5   5                      Yes
5.      9.5   4                      Yes
6.      3.2   1                      No
7.      6.6   1                      No
8.      5.4   1                      No
9.      8.9   3                      Yes
10.     7.2   4                      Yes

Given a test instance (GPA = 7.8, No. of projects done = 4), use the training set to classify
the test instance with each of the following classifiers. Choose k = 3.

a) k-Nearest Neighbor classifier


b) Weighted k-Nearest Neighbor classifier
c) Nearest Centroid Classifier
Solution:

a) k-Nearest Neighbor classifier:



Step 1: Calculate the Euclidean distance between the test instance (GPA = 7.8, No. of projects
done = 4) and each of the training instances, d = SQRT((GPA - 7.8)^2 + (Projects - 4)^2), as
shown in Table 1.

Table 1: Euclidean Distance

S.No.   GPA   No. of Projects done   Award   Euclidean Distance
1.      9.5   5                      Yes     SQRT((9.5 - 7.8)^2 + (5 - 4)^2) = 1.972308292
2.      8.0   4                      Yes     SQRT((8.0 - 7.8)^2 + (4 - 4)^2) = 0.2
3.      7.2   1                      No      SQRT((7.2 - 7.8)^2 + (1 - 4)^2) = 3.059411708
4.      6.5   5                      Yes     SQRT((6.5 - 7.8)^2 + (5 - 4)^2) = 1.640121947
5.      9.5   4                      Yes     SQRT((9.5 - 7.8)^2 + (4 - 4)^2) = 1.7
6.      3.2   1                      No      SQRT((3.2 - 7.8)^2 + (1 - 4)^2) = 5.491812087
7.      6.6   1                      No      SQRT((6.6 - 7.8)^2 + (1 - 4)^2) = 3.231098884
8.      5.4   1                      No      SQRT((5.4 - 7.8)^2 + (1 - 4)^2) = 3.841874542
9.      8.9   3                      Yes     SQRT((8.9 - 7.8)^2 + (3 - 4)^2) = 1.486606875
10.     7.2   4                      Yes     SQRT((7.2 - 7.8)^2 + (4 - 4)^2) = 0.6
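The distances in Table 1 can be verified with a short script. The following is a minimal sketch in plain Python (the variable names are illustrative, not from the text) that recomputes the Euclidean distance from the test instance (7.8, 4) to every training instance.

import math

# Training data from Table 4.12: (GPA, no. of projects done, award)
training_data = [
    (9.5, 5, "Yes"), (8.0, 4, "Yes"), (7.2, 1, "No"), (6.5, 5, "Yes"),
    (9.5, 4, "Yes"), (3.2, 1, "No"), (6.6, 1, "No"), (5.4, 1, "No"),
    (8.9, 3, "Yes"), (7.2, 4, "Yes"),
]
test = (7.8, 4)  # test instance: GPA = 7.8, projects = 4

# Euclidean distance between the test instance and every training instance
for i, (gpa, projects, award) in enumerate(training_data, start=1):
    dist = math.sqrt((gpa - test[0]) ** 2 + (projects - test[1]) ** 2)
    print(f"{i}: distance = {dist:.6f}, award = {award}")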



Step 2: Sort the distances in ascending order and select the 3 nearest training instances to
the test instance. The selected nearest neighbors are shown in Table 2.
Table 2 Nearest Neighbors

Instance   Euclidean distance   Class
2          0.2                  Yes
10         0.6                  Yes
9          1.487                Yes

Step 3: Predict the class of the test instance by majority voting.


The class for the test instance is predicted as "Yes".
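The whole of part (a) can be reproduced in a few lines. The sketch below (plain Python, illustrative names, not from the text) sorts the distances, keeps the k = 3 nearest neighbors and takes a majority vote over their class labels.

import math
from collections import Counter

training_data = [
    (9.5, 5, "Yes"), (8.0, 4, "Yes"), (7.2, 1, "No"), (6.5, 5, "Yes"),
    (9.5, 4, "Yes"), (3.2, 1, "No"), (6.6, 1, "No"), (5.4, 1, "No"),
    (8.9, 3, "Yes"), (7.2, 4, "Yes"),
]
test = (7.8, 4)
k = 3

# Pair each training instance with its distance to the test instance
distances = [
    (math.sqrt((gpa - test[0]) ** 2 + (proj - test[1]) ** 2), award)
    for gpa, proj, award in training_data
]
# Sort by distance and keep the k nearest neighbors
neighbors = sorted(distances)[:k]
# Majority vote over the neighbors' class labels
prediction = Counter(label for _, label in neighbors).most_common(1)[0][0]
print(neighbors, "->", prediction)   # expected: "Yes"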

b) Weighted k-Nearest Neighbor classifier

Step 1: Calculate the Euclidean distance between the test instance (GPA = 7.8, No. of projects
done = 4) and each of the training instances (the same distances as in part (a)), as shown in
Table 3.

Table 3: Euclidean Distance

S.No.   GPA   No. of Projects done   Award   Euclidean Distance
1.      9.5   5                      Yes     SQRT((9.5 - 7.8)^2 + (5 - 4)^2) = 1.972308292
2.      8.0   4                      Yes     SQRT((8.0 - 7.8)^2 + (4 - 4)^2) = 0.2
3.      7.2   1                      No      SQRT((7.2 - 7.8)^2 + (1 - 4)^2) = 3.059411708
4.      6.5   5                      Yes     SQRT((6.5 - 7.8)^2 + (5 - 4)^2) = 1.640121947
5.      9.5   4                      Yes     SQRT((9.5 - 7.8)^2 + (4 - 4)^2) = 1.7
6.      3.2   1                      No      SQRT((3.2 - 7.8)^2 + (1 - 4)^2) = 5.491812087
7.      6.6   1                      No      SQRT((6.6 - 7.8)^2 + (1 - 4)^2) = 3.231098884
8.      5.4   1                      No      SQRT((5.4 - 7.8)^2 + (1 - 4)^2) = 3.841874542
9.      8.9   3                      Yes     SQRT((8.9 - 7.8)^2 + (3 - 4)^2) = 1.486606875
10.     7.2   4                      Yes     SQRT((7.2 - 7.8)^2 + (4 - 4)^2) = 0.6

Step 2: Sort the distances in ascending order and select the 3 nearest training instances to
the test instance. The selected nearest neighbors are shown in Table 4.
Table 4 Nearest Neighbors

Instance   Euclidean distance   Class
2          0.2                  Yes
10         0.6                  Yes
9          1.487                Yes

Step 3: Predict the class of the test instance by the weighted voting technique using the 3
selected nearest instances.



a. Compute the inverse of each distance of the 3 selected nearest instances as shown
in Table 5.
Table 5 Inverse Distance

Instance   Euclidean distance   Inverse distance   Class
2          0.2                  5                  Yes
10         0.6                  1.667              Yes
9          1.487                0.672              Yes

b. Find the sum of the inverses.


Sum = 5+ 1.667 + 0.672 = 7.339

c. Compute the weight by dividing each inverse distance by the sum as shown in
Table 6.
Table 6 Weight Calculation

Instance   Euclidean distance   Inverse distance   Weight = Inverse distance / Sum   Class
2          0.2                  5                  0.681                             Yes
10         0.6                  1.667              0.227                             Yes
9          1.487                0.672              0.092                             Yes

d. Add the weights of the same class.


No = 0
Yes = 0.681+ 0.227 + 0.092 = 1
e. Predict the class by choosing the class with the maximum vote.

The class is predicted as "Yes".
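The weighted vote of part (b) can be sketched in the same way (plain Python, illustrative names): each of the 3 nearest neighbors votes with a weight proportional to the inverse of its distance, normalized so that the weights sum to 1.

import math
from collections import defaultdict

training_data = [
    (9.5, 5, "Yes"), (8.0, 4, "Yes"), (7.2, 1, "No"), (6.5, 5, "Yes"),
    (9.5, 4, "Yes"), (3.2, 1, "No"), (6.6, 1, "No"), (5.4, 1, "No"),
    (8.9, 3, "Yes"), (7.2, 4, "Yes"),
]
test = (7.8, 4)
k = 3

# k nearest neighbors as (distance, class label) pairs
nearest = sorted(
    (math.sqrt((gpa - test[0]) ** 2 + (proj - test[1]) ** 2), award)
    for gpa, proj, award in training_data
)[:k]

# Inverse-distance weights (all distances here are nonzero),
# divided by their sum so that the weights add up to 1
inverses = [1.0 / d for d, _ in nearest]
total = sum(inverses)
votes = defaultdict(float)
for (d, label), inv in zip(nearest, inverses):
    votes[label] += inv / total

print(dict(votes))                  # all weight goes to 'Yes' (approx. 1.0)
print(max(votes, key=votes.get))    # predicted class: "Yes"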



c) Nearest Centroid Classifier

Step 1: Compute the mean/centroid of each class. In this example there are two classes,
'Yes' and 'No'.
Centroid of class 'Yes' = (9.5 + 8.0 + 6.5 + 9.5 + 8.9 + 7.2, 5 + 4 + 5 + 4 + 3 + 4)/6 = (49.6, 25)/6 =
(8.27, 4.17)
Centroid of class 'No' = (7.2 + 3.2 + 6.6 + 5.4, 1 + 1 + 1 + 1)/4 = (22.4, 4)/4 = (5.6, 1)
Now given a test instance (7.8, 4) we can predict the class.

Step 2:
Calculate the Euclidean distance between the test instance (7.8, 4) and each of the centroids.
Euc_Dist[(7.8, 4); (8.27, 4.17)] = SQRT((7.8 - 8.27)^2 + (4 - 4.17)^2) = 0.49979996
Euc_Dist[(7.8, 4); (5.6, 1)] = SQRT((7.8 - 5.6)^2 + (4 - 1)^2) = 3.720215048

The test instance is closer to the centroid of class 'Yes'. Hence the class of this test instance
is predicted as 'Yes'.
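Part (c) can also be checked with a short sketch (plain Python, illustrative names): it averages the feature vectors of each class to obtain the centroids and assigns the test instance to the class whose centroid is nearest.

import math

training_data = [
    (9.5, 5, "Yes"), (8.0, 4, "Yes"), (7.2, 1, "No"), (6.5, 5, "Yes"),
    (9.5, 4, "Yes"), (3.2, 1, "No"), (6.6, 1, "No"), (5.4, 1, "No"),
    (8.9, 3, "Yes"), (7.2, 4, "Yes"),
]
test = (7.8, 4)

# Centroid (mean GPA, mean no. of projects) of each class
centroids = {}
for label in ("Yes", "No"):
    members = [(g, p) for g, p, a in training_data if a == label]
    centroids[label] = (
        sum(g for g, _ in members) / len(members),
        sum(p for _, p in members) / len(members),
    )

# Assign the test instance to the class with the nearest centroid
prediction = min(
    centroids,
    key=lambda lbl: math.hypot(test[0] - centroids[lbl][0],
                               test[1] - centroids[lbl][1]),
)
print(centroids)      # approx. {'Yes': (8.27, 4.17), 'No': (5.6, 1.0)}
print(prediction)     # "Yes"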

9. A COVID care centre decides to develop a case-based reasoning system to predict
whether a person will test positive or negative based on the symptoms. The table below
lists the possible symptoms and the results of previous cases. The training dataset
contains the instances shown in Table 4.13 below.



Table 4.13: Sample Set of Instances

S.No.  Fever  Dry Cough  Tiredness  Sore Throat  Diarrhea  Headache  Loss of Taste or Smell  Shortness of Breath  Chest Pain  Result
1.     Yes    Yes        Yes        Yes          Yes       Yes       Yes                     Yes                  Yes         Positive
2.     Yes    No         Yes        No           No        Yes       No                      No                   No          Negative
3.     No     No         No         No           No        No        No                      No                   No          Negative
4.     Yes    Yes        Yes        No           No        No        No                      No                   Yes         Negative
5.     Yes    Yes        Yes        Yes          No        No        Yes                     Yes                  Yes         Positive
6.     Yes    Yes        Yes        Yes          No        Yes       No                      No                   No          Positive
7.     Yes    Yes        Yes        Yes          No        No        Yes                     Yes                  No          Positive
8.     Yes    Yes        Yes        Yes          No        No        No                      No                   No          Positive
9.     Yes    Yes        Yes        Yes          No        No        No                      No                   No          Positive
10.    No     No         No         No           No        No        No                      No                   No          Negative

i. Determine k (the number of nearest neighbors) that gives a good prediction result.
ii. Increase the value of k and check the prediction. Is it better to have a smaller or a
larger value of k?
iii. Apply a proper similarity measure [asymmetric binary features] and predict the test
result of the instance [Fever = Yes, Dry Cough = Yes, Tiredness = Yes, Sore
Throat = Yes, Diarrhea = No, Headache = No, Loss of Taste or Smell = No,
Shortness of Breath = No, Chest Pain = No].

Solution:
Encode each symptom as 1 (Yes) and 0 (No); the test instance is [1, 1, 1, 1, 0, 0, 0, 0, 0].
For every training instance, mark a 1 wherever its value differs from the test instance, and
take the distance as the number of mismatching symptoms divided by 9.

S.No.   Mismatches with test instance   Result     No. of mismatches   Distance
1       0 0 0 0 1 1 1 1 1               Positive   5                   0.556
2       0 1 0 1 0 1 0 0 0               Negative   3                   0.333
3       1 1 1 1 0 0 0 0 0               Negative   4                   0.444
4       0 0 0 1 0 0 0 0 1               Negative   2                   0.222
5       0 0 0 0 0 0 1 1 1               Positive   3                   0.333
6       0 0 0 0 0 1 0 0 0               Positive   1                   0.111
7       0 0 0 0 0 0 1 1 0               Positive   2                   0.222
8       0 0 0 0 0 0 0 0 0               Positive   0                   0
9       0 0 0 0 0 0 0 0 0               Positive   0                   0
10      1 1 1 1 0 0 0 0 0               Negative   4                   0.444

k = 2
Instance 8    0        Positive
Instance 9    0        Positive                                class = "Positive"

k = 3
Instance 8    0        Positive
Instance 9    0        Positive
Instance 6    0.111    Positive                                class = "Positive"

k = 4
Instance 8    0        Positive
Instance 9    0        Positive
Instance 6    0.111    Positive
Instance 4    0.222    Negative                                class = "Positive"

k = 5
Instance 8    0        Positive
Instance 9    0        Positive
Instance 6    0.111    Positive
Instance 4    0.222    Negative
Instance 7    0.222    Positive                                class = "Positive"

k = 6
Instance 8    0        Positive
Instance 9    0        Positive
Instance 6    0.111    Positive
Instance 4    0.222    Negative
Instance 7    0.222    Positive
Instance 2    0.333    Negative (instance 5 is tied at 0.333)  class = "Positive"

For every value of k from 2 to 6 the predicted test result is "Positive", so the prediction is
stable over this range. A very small k makes the result sensitive to noisy or mislabeled cases,
while a very large k pulls in distant, dissimilar cases and blurs the class boundary; a moderate
value such as k = 3 or k = 5 is therefore a reasonable choice here.
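The mismatch calculation above can be scripted as well. The sketch below (plain Python, illustrative names, not from the text) encodes Yes/No as 1/0, uses the fraction of mismatching symptoms as the distance, and reports the majority class for k = 2 to 6.

from collections import Counter

# Symptom vectors (1 = Yes, 0 = No) for the 10 training cases in Table 4.13
cases = [
    ([1, 1, 1, 1, 1, 1, 1, 1, 1], "Positive"),
    ([1, 0, 1, 0, 0, 1, 0, 0, 0], "Negative"),
    ([0, 0, 0, 0, 0, 0, 0, 0, 0], "Negative"),
    ([1, 1, 1, 0, 0, 0, 0, 0, 1], "Negative"),
    ([1, 1, 1, 1, 0, 0, 1, 1, 1], "Positive"),
    ([1, 1, 1, 1, 0, 1, 0, 0, 0], "Positive"),
    ([1, 1, 1, 1, 0, 0, 1, 1, 0], "Positive"),
    ([1, 1, 1, 1, 0, 0, 0, 0, 0], "Positive"),
    ([1, 1, 1, 1, 0, 0, 0, 0, 0], "Positive"),
    ([0, 0, 0, 0, 0, 0, 0, 0, 0], "Negative"),
]
test = [1, 1, 1, 1, 0, 0, 0, 0, 0]

# Distance = fraction of the 9 symptoms on which the two cases disagree
def mismatch_distance(a, b):
    return sum(x != y for x, y in zip(a, b)) / len(a)

distances = sorted((mismatch_distance(v, test), label) for v, label in cases)

for k in range(2, 7):
    votes = Counter(label for _, label in distances[:k])
    print(f"k = {k}: {votes.most_common(1)[0][0]}")   # "Positive" for every k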

