M4-Similarity Based Learning
8. Consider the training data set of 10 data instances shown in Table 4.12, which
describes the award performance of individual students based on GPA and number of
projects done. The target variable is ‘Award’, a discrete-valued variable that
takes two values, ‘Yes’ or ‘No’.
Table 4.12 Training Dataset
Given a test instance (GPA = 7.8, No. of projects done = 4), use the training set to classify
the test instance. Choose k = 3.
Step 1: Calculate the Euclidean distance between the test instance (GPA = 7.8, No. of projects
done = 4) and each of the training instances, as shown in Table 3.
Step 2: Sort the distances in ascending order and select the 3 training instances nearest
to the test instance. The selected nearest neighbors are shown in Table 4.
Table 4 Nearest Neighbors
Instance Euclidean distance Class
2 0.2 Yes
10 0.6 Yes
9 1.487 Yes
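Steps 1 and 2 can be sketched in Python as follows. Table 4.12 is not reproduced in this section, so the `training` list is an assumption: it pairs the GPA and project values in the order in which they appear in the centroid sums of the nearest-centroid solution. The names `training`, `euclidean` and `k_nearest` are illustrative only.

```python
import math

# Assumed training data (GPA, projects, award). Table 4.12 is not reproduced
# here, so values are paired in the order they appear in the centroid sums.
training = [
    (9.5, 5, "Yes"), (8.0, 4, "Yes"), (6.5, 5, "Yes"),
    (9.5, 4, "Yes"), (8.9, 3, "Yes"), (7.2, 4, "Yes"),
    (7.2, 1, "No"), (3.2, 1, "No"), (6.6, 1, "No"), (5.4, 1, "No"),
]

def euclidean(p, q):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_nearest(test, data, k):
    """Return the k (distance, label) pairs closest to the test instance."""
    scored = [(euclidean(test, row[:2]), row[2]) for row in data]
    return sorted(scored, key=lambda pair: pair[0])[:k]

neighbors = k_nearest((7.8, 4), training, k=3)
for dist, label in neighbors:
    print(f"{dist:.3f} {label}")
```

Under this assumed pairing the three smallest distances round to 0.2, 0.6 and 1.487, all of class ‘Yes’, which agrees with Table 4.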
Step 3: Predict the class of the test instance using the weighted voting technique on the 3 selected
nearest instances.
c. Compute the weight of each neighbor by dividing its inverse distance by the sum of the
inverse distances, as shown in Table 6.
Table 6 Weight Calculation
Instance  Euclidean distance  Inverse distance  Weight = Inverse distance/Sum  Class
2         0.2                 5                 0.681                          Yes
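The weight calculation can be checked with a short Python sketch. It uses only the three distances and classes listed in Table 4; the function name `weighted_vote` is illustrative, and the sketch assumes no neighbor has a distance of exactly zero (the inverse would be undefined).

```python
from collections import defaultdict

# (Euclidean distance, class) of the 3 nearest neighbors, taken from Table 4.
neighbors = [(0.2, "Yes"), (0.6, "Yes"), (1.487, "Yes")]

def weighted_vote(neighbors):
    """Inverse-distance weighted vote; assumes every distance is non-zero."""
    inverse = [1.0 / d for d, _ in neighbors]
    total = sum(inverse)
    # Normalise so that the weights sum to 1.
    weights = [inv / total for inv in inverse]
    score = defaultdict(float)
    for w, (_, label) in zip(weights, neighbors):
        score[label] += w
    # The class with the largest accumulated weight wins.
    return weights, max(score, key=score.get)

weights, predicted = weighted_vote(neighbors)
print([round(w, 3) for w in weights], predicted)
```

The first weight comes out at about 0.681, matching the row for instance 2 in Table 6. Because all three neighbors belong to class ‘Yes’, that class receives the full weight.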
Step 1: Compute the mean (centroid) of each class. In this example there are two classes,
‘Yes’ and ‘No’.
Centroid of class ‘Yes’ = (9.5 + 8.0 + 6.5 + 9.5 + 8.9 + 7.2, 5 + 4 + 5 + 4 + 3 + 4)/6 = (49.6, 25)/6 =
(8.27, 4.17)
Centroid of class ‘No’ = (7.2 + 3.2 + 6.6 + 5.4, 1 + 1 + 1 + 1)/4 = (22.4, 4)/4 = (5.6, 1)
Now given a test instance (7.8, 4) we can predict the class.
Step 2:
Calculate the Euclidean distance between the test instance (7.8, 4) and each centroid.
Euc_Dist[(7.8, 4); (8.27, 4.17)] = SQRT(POWER((7.8-8.27),2)+POWER((4-4.17),2)) =
0.49979996
Euc_Dist[(7.8, 4); (5.6, 1)] = SQRT(POWER((7.8-5.6),2)+POWER((4-1),2)) = 3.720215048
The test instance is closer to the centroid of class ‘Yes’ (0.4998 < 3.7202). Hence the class of
this test instance is predicted as ‘Yes’.
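The two steps above can be sketched in Python. The per-class values are those listed in the centroid sums; how the GPA and project values are paired into points is an assumption, but it does not affect the centroids, which depend only on the per-coordinate sums. The names `by_class`, `centroid` and `nearest_centroid` are illustrative.

```python
import math

# Per-class (GPA, projects) values as listed in the centroid sums above.
by_class = {
    "Yes": [(9.5, 5), (8.0, 4), (6.5, 5), (9.5, 4), (8.9, 3), (7.2, 4)],
    "No": [(7.2, 1), (3.2, 1), (6.6, 1), (5.4, 1)],
}

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def nearest_centroid(test, by_class):
    """Return the label of the closest class centroid and all distances."""
    distances = {
        label: math.dist(test, centroid(points))
        for label, points in by_class.items()
    }
    return min(distances, key=distances.get), distances

predicted, distances = nearest_centroid((7.8, 4), by_class)
print(predicted, {label: round(d, 4) for label, d in distances.items()})
```

The sketch keeps the centroids unrounded, so the distance to class ‘Yes’ comes out near 0.4955 rather than the 0.4998 obtained from the rounded centroid (8.27, 4.17). The prediction is ‘Yes’ either way.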
Solution:
S.No  Yes  Yes  yes  yes  no  no  no  no  no  Result    Euclidean Distance
1     0    0    0    0    1   1   1   1   1   Positive  4  0.444444444
2     0    1    0    1    0   1   0   0   0   Negative  3  0.333333333
3     1    1    1    1    0   0   0   0   0   Negative  4  0.444444444
4     0    0    0    1    0   0   0   0   1   Negative  1  0.111111111
5     0    0    0    0    0   0   1   1   1   Positive  2  0.222222222
6     0    0    0    0    0   1   0   0   0   Positive  1  0.111111111
7     0    0    0    0    0   0   1   1   0   Positive  2  0.222222222
k=2
Instance 8 0 Positive
Instance 9 0 Positive
class = "Positive"
k=3
Instance 8 0 Positive
Instance 9 0 Positive
Instance 4 0.111 Negative
class = "Positive"
k=4
Instance 8 0 Positive
Instance 9 0 Positive
Instance 4 0.111 Negative
Instance 6 0.111 Positive
class = "Positive"
k=5
Instance 8 0 Positive
Instance 9 0 Positive
Instance 4 0.111 Negative
Instance 6 0.111 Positive
Instance 5 0.222 Positive
class = "Positive"
k=6
Instance 8 0 Positive
Instance 9 0 Positive
Instance 4 0.111 Negative
Instance 6 0.111 Positive
Instance 5 0.222 Positive
Instance 7 0.222 Positive
class = "Positive"
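The majority vote for each value of k can be reproduced with a short Python sketch. It takes the ranked neighbors exactly as listed above, including the order in which the tied distances appear; the names `ranked` and `majority_class` are illustrative.

```python
from collections import Counter

# (instance, distance, class) in the sorted order used in the lists above.
ranked = [
    (8, 0.0, "Positive"), (9, 0.0, "Positive"), (4, 0.111, "Negative"),
    (6, 0.111, "Positive"), (5, 0.222, "Positive"), (7, 0.222, "Positive"),
]

def majority_class(ranked, k):
    """Most frequent class among the first k ranked neighbors."""
    votes = Counter(label for _, _, label in ranked[:k])
    return votes.most_common(1)[0][0]

predictions = {k: majority_class(ranked, k) for k in range(2, 7)}
print(predictions)
```

For every k from 2 to 6 the ‘Positive’ votes outnumber the single ‘Negative’ vote from instance 4, so the predicted class does not change with k in this example.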