Decision Tree and KNN Assignment Two
College of Informatics
Department of Computer Science (Postgraduate program)
Machine Learning
Course code
Decision Tree and KNN Assignment Two
July 12/2022
Gondar, Ethiopia
Given the above training examples, predict the class for the following instances using a decision tree and KNN where k = 3.
X = (age=Senior, income=medium, student = yes, credit rating=excellent)
Z = (age=middle-aged, income=low, student = no, credit rating=fair)
R = (age=Youth, income=medium, student = no, credit rating=excellent).
1. Using Decision Tree
Given the training data (Buy Computer data), build a decision tree and predict the class of the
following new example (age=Senior, income=medium, student = yes, credit rating=excellent).
First we determine which attribute should be the root node by selecting the one with the highest information gain and splitting the training set on that attribute. We need to calculate the expected information required to classify the whole set and the entropy of the partition produced by each attribute; the information gain of an attribute is the total expected information minus that entropy.
The information of the total dataset with the two classes which are class-Yes and class-
No is calculated as
I (Yes, No) = -(yes/total data set) * log2 (yes/total data set) - (No/total data set) * log2 (No/total data set)
The given training set contains 14 examples with two classes, 9 of class yes and 5 of class no. From this the expected information is
I (Yes, No) = I (9, 5) = -9/14 log2 (9/14) – 5/14 log2 (5/14) =0.94.
Then we find the entropy of each attribute (age, income, student and credit rating), starting with age:
Age v=youth (Yes) = 2 age v=youth (No) = 3
Age v = middle-aged (Yes) = 4 age v =middle-aged (No) =0
Age v= Senior (Yes) =3 age v=senior (No) =2
We calculate the entropy of age (the residual information) and then the information gain:
Entropy (age) = 5/14 (-2/5 log2 (2/5)-3/5log2 (3/5)) + 4/14 (0) + 5/14 (-3/5log2 (3/5)-2/5log2 (2/5))
= 5/14(0.9709) + 0 + 5/14(0.9709)
= 0.6935
Gain (age) = 0.94 – 0.6935 = 0.2465
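The same calculation can be scripted. The minimal Python sketch below (the function and variable names are my own, not part of the original assignment) computes the expected information of the class labels and the information gain of each attribute over the 14 training examples, reproducing Gain (age) ≈ 0.247 and showing that the other attributes gain less.

from collections import Counter
from math import log2

# Training data: (age, income, student, credit_rating, class)
data = [
    ("youth",       "high",   "no",  "fair",      "no"),
    ("youth",       "high",   "no",  "excellent", "no"),
    ("middle-aged", "high",   "no",  "fair",      "yes"),
    ("senior",      "medium", "no",  "fair",      "yes"),
    ("senior",      "low",    "yes", "fair",      "yes"),
    ("senior",      "low",    "yes", "excellent", "no"),
    ("middle-aged", "low",    "yes", "excellent", "yes"),
    ("youth",       "medium", "no",  "fair",      "no"),
    ("youth",       "low",    "yes", "fair",      "yes"),
    ("senior",      "medium", "yes", "fair",      "yes"),
    ("youth",       "medium", "yes", "excellent", "yes"),
    ("middle-aged", "medium", "no",  "excellent", "yes"),
    ("middle-aged", "high",   "yes", "fair",      "yes"),
    ("senior",      "medium", "no",  "excellent", "no"),
]
ATTRIBUTES = ["age", "income", "student", "credit_rating"]

def entropy(labels):
    """Expected information I(p, n) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())

def info_gain(rows, attr_index):
    """Gain(A) = I(data set) - weighted entropy after splitting on attribute A."""
    total = len(rows)
    value_counts = Counter(row[attr_index] for row in rows)
    remainder = sum(
        (count / total) * entropy([row[-1] for row in rows if row[attr_index] == value])
        for value, count in value_counts.items()
    )
    return entropy([row[-1] for row in rows]) - remainder

for index, name in enumerate(ATTRIBUTES):
    print(name, round(info_gain(data, index), 3))
# age 0.247, income 0.029, student 0.152, credit_rating 0.048 -> age becomes the root

The same two helper functions can be reused on any partition of the data, for example on the five youth examples later on when choosing the attribute for that branch.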
For Income we have three values
Income v=high (Yes) = 2 income v=high (No) = 2
Income v =medium (Yes) = 4 income v =medium (No) =2
Income v= low (Yes) =3 income v=low (No) =1
Entropy (income) = 4/14 (-2/4 log2 (2/4) - 2/4 log2 (2/4)) + 6/14 (-4/6 log2 (4/6) - 2/6 log2 (2/6)) + 4/14 (-3/4 log2 (3/4) - 1/4 log2 (1/4)) = 0.911
Gain (income) = 0.94 - 0.911 = 0.029
Computing the remaining attributes in the same way gives Gain (student) = 0.152 and Gain (credit rating) = 0.048. Age has the highest information gain, so it is chosen as the root node with three branches: youth, middle-aged and senior. Since all records under the branch age = middle-aged are of class Yes, we can replace that branch with a leaf labeled Class = Yes.
Step 2: Decide which attribute to use at the next node for the left branch (age = youth).
The same splitting process has to happen for the two remaining branches. For the branch age = youth we still have the attributes income, student and Credit_Rating. Which one should be used to split this partition?
income student Credit-rating class
High No Fair No
High No Excellent No
Medium No Fair No
Low Yes Fair Yes
Medium Yes Excellent Yes
The information of the total dataset with the two classes which are class-Yes and class-No is calculated
as I (Yes, No) = -(yes/total data set) * log2 (yes/total data set) - (No/total data set) * log2 (No/total data set).
This partition contains 5 examples, 2 of class yes and 3 of class no, so its expected information is
I (yes, no) = I (2, 3) = -2/5 log2 (2/5) – 3/5 log2 (3/5) =0.97
For this partition Entropy (student) = 0, because every student in the youth group is of class Yes and every non-student is of class No, so Gain (student) = 0.97 - 0 = 0.97, the highest of the remaining attributes. Student is therefore selected to split the youth branch. Since the two new branches (student = yes and student = no) each contain a single class, we make them leaf nodes with their respective class as label.
Again, the same process is applied to the right branch of age, which is senior.
The information of the total dataset with the two classes which are class-Yes and class-
No is calculated as
I (Yes, No) = -(yes/total data set) * log2 (yes/total data set) - (No/total data set) * log2 (No/total data set)
This partition contains 5 examples, 3 of class yes and 2 of class no, so its expected information is
I (yes, no) = I (3, 2) = -3/5 log2 (3/5) - 2/5 log2 (2/5) = 0.97
For Student we have two values
Student v=yes (Yes) = 2 Student v=yes (No) = 1
Student v=no (Yes) = 1 Student v=no (No) = 1
Entropy (student) = 3/5 (-2/3 log2 (2/3) - 1/3 log2 (1/3)) + 2/5 (-1/2 log2 (1/2) - 1/2 log2 (1/2))
= 3/5 (0.918) + 2/5 (1)
= 0.95
Gain (student) = 0.97 - 0.95 = 0.02
For Credit_Rating we have two values
Credit_Rating v=fair (Yes) = 3 Credit_Rating v=fair (No) = 0
Credit_Rating v=excellent (Yes) = 0 Credit_Rating v=excellent (No) = 2
Entropy (Credit_Rating) = 0
Gain (Credit_Rating) = 0.97 - 0 = 0.97, the highest gain, so Credit_Rating is selected to split the senior branch: credit rating = fair becomes a leaf labeled Class = Yes and credit rating = excellent becomes a leaf labeled Class = No.
Based on this decision tree we can predict the class of new examples as follows.
X = (age=Senior, income=medium, student = yes, credit rating=excellent)
Following the branch age = senior and then credit rating = excellent, we predict Class = No, so this new instance does not buy a computer.
Z = (age=middle-aged, income=low, student = no, credit rating=fair)
Following the branch age = middle-aged, we predict Class = Yes, so this new instance buys a computer.
R = (age=Youth, income=medium, student = no, credit rating=excellent).
Following the branch age = youth and then student = no, we predict Class = No, so this new instance does not buy a computer.
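As a quick check, the finished tree can also be written directly as a set of rules. The short Python sketch below (my own formulation, not part of the original assignment) encodes the tree learned above, with age at the root, student under youth, credit rating under senior, and income never tested, and it reproduces the three predictions.

def predict_buys_computer(age, income, student, credit_rating):
    """Classify an instance with the decision tree learned above."""
    if age == "middle-aged":
        return "yes"                                   # middle-aged branch is a pure Yes leaf
    if age == "youth":
        return "yes" if student == "yes" else "no"     # youth branch splits on student
    return "yes" if credit_rating == "fair" else "no"  # senior branch splits on credit rating

# The three new instances from the assignment
print(predict_buys_computer("senior", "medium", "yes", "excellent"))   # X -> no
print(predict_buys_computer("middle-aged", "low", "no", "fair"))       # Z -> yes
print(predict_buys_computer("youth", "medium", "no", "excellent"))     # R -> no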
2. Using KNN where k = 3
Let us convert all the categorical attributes into numeric codes: age (youth = 0, middle-aged = 1, senior = 2), income (low = 0, medium = 1, high = 2), student (no = 0, yes = 1) and credit rating (fair = 0, excellent = 1).
RID age income student credit-rating class: buys computer
1 0 2 0 0 No
2 0 2 0 1 No
3 1 2 0 0 Yes
4 2 1 0 0 Yes
5 2 0 1 0 Yes
6 2 0 1 1 No
7 1 0 1 1 Yes
8 0 1 0 0 No
9 0 0 1 0 Yes
10 2 1 1 0 Yes
11 0 1 1 1 Yes
12 1 1 0 1 Yes
13 1 2 1 0 Yes
14 2 1 0 1 no
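The mapping used in this table can be written as small Python dictionaries; the sketch below (names are my own) converts a categorical record into the numeric vector used for the distance calculations, e.g. the first record (youth, high, no, fair) becomes (0, 2, 0, 0) and the query X becomes (2, 1, 1, 1).

AGE     = {"youth": 0, "middle-aged": 1, "senior": 2}
INCOME  = {"low": 0, "medium": 1, "high": 2}
STUDENT = {"no": 0, "yes": 1}
CREDIT  = {"fair": 0, "excellent": 1}

def encode(age, income, student, credit_rating):
    """Return the numeric feature vector for one record."""
    return [AGE[age], INCOME[income], STUDENT[student], CREDIT[credit_rating]]

print(encode("youth", "high", "no", "fair"))           # [0, 2, 0, 0] (record 1)
print(encode("senior", "medium", "yes", "excellent"))  # [2, 1, 1, 1] (query X)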
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15(new)
age 0 0 1 2 2 2 1 0 0 2 0 1 1 2 2
income 2 2 2 1 0 0 0 1 0 1 1 1 2 1 1
student 0 0 0 0 1 1 1 0 1 1 1 0 1 0 1
Credit-rating 0 1 0 0 0 1 1 0 0 0 1 1 0 1 1
class no no yes yes yes no yes no yes yes yes yes yes no ?
Calculate the Euclidean distance, d(a, b) = sqrt(sum over i of (ai - bi)^2), between the query instance X = (2, 1, 1, 1) and all the training examples. For the first training example (0, 2, 0, 0):
d = sqrt((0-2)^2 + (2-1)^2 + (0-1)^2 + (0-1)^2) = sqrt (7) = 2.65
This is the distance between the first training example and the new instance. Using this formula we find the distance to each training example.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15(new)
age 0 0 1 2 2 2 1 0 0 2 0 1 1 2 2
income 2 2 2 1 0 0 0 1 0 1 1 1 2 1 1
student 0 0 0 0 1 1 1 0 1 1 1 0 1 0 1
Credit-rating 0 1 0 0 0 1 1 0 0 0 1 1 0 1 1
class no no yes yes yes no yes no yes yes yes yes yes no ?
distance 2.65 2.45 2 1.41 1.41 1 1.41 2.45 2.45 1 2 1.41 1.7 1
From this we select the three neighbors closest to the new example, which are instances 6, 10 and 14 (each at distance 1), and predict the new class by the majority among these three. Their classes are no, yes and no, so the majority is class no and the predicted class for X is no.
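The whole distance-and-vote procedure fits in a few lines of Python. This sketch (helper names are my own) computes the Euclidean distances from a query to the 14 encoded training examples and takes the majority class among the k = 3 closest; ties at equal distance are broken by the order of the training records, which matches the hand calculation for X and Z but is only one reasonable choice for R, where four records tie at distance 1.

from collections import Counter
from math import sqrt

# Encoded training set: feature vector (age, income, student, credit_rating) and class
train = [
    ([0, 2, 0, 0], "no"),  ([0, 2, 0, 1], "no"),  ([1, 2, 0, 0], "yes"),
    ([2, 1, 0, 0], "yes"), ([2, 0, 1, 0], "yes"), ([2, 0, 1, 1], "no"),
    ([1, 0, 1, 1], "yes"), ([0, 1, 0, 0], "no"),  ([0, 0, 1, 0], "yes"),
    ([2, 1, 1, 0], "yes"), ([0, 1, 1, 1], "yes"), ([1, 1, 0, 1], "yes"),
    ([1, 2, 1, 0], "yes"), ([2, 1, 0, 1], "no"),
]

def knn_predict(query, k=3):
    """Majority class among the k training examples nearest to the query."""
    distance = lambda a, b: sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(train, key=lambda example: distance(query, example[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn_predict([2, 1, 1, 1]))   # X -> no  (nearest records 6, 10, 14)
print(knn_predict([1, 0, 0, 0]))   # Z -> yes (several records tie at distance 1.41)
print(knn_predict([0, 1, 0, 1]))   # R -> no  (records 2, 8, 11, 12 tie at distance 1)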
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15(new)
age 0 0 1 2 2 2 1 0 0 2 0 1 1 2 2
income 2 2 2 1 0 0 0 1 0 1 1 1 2 1 1
student 0 0 0 0 1 1 1 0 1 1 1 0 1 0 1
Credit-rating 0 1 0 0 0 1 1 0 0 0 1 1 0 1 1
class no no yes yes yes no yes no yes yes yes yes yes no no
distance 2.65 2.45 2 1.41 1.41 1 1.41 2.45 2.45 1 2 1.41 1.7 1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 16(new)
age 0 0 1 2 2 2 1 0 0 2 0 1 1 2 1
income 2 2 2 1 0 0 0 1 0 1 1 1 2 1 0
student 0 0 0 0 1 1 1 0 1 1 1 0 1 0 0
Credit-rating 0 1 0 0 0 1 1 0 0 0 1 1 0 1 0
class no no yes yes yes no yes no yes yes yes yes yes no ?
Calculate the Euclidean distance between the query instance Z = (1, 0, 0, 0) and all the training examples. For the first training example (0, 2, 0, 0):
d = sqrt((0-1)^2 + (2-0)^2 + (0-0)^2 + (0-0)^2) = sqrt (5) = 2.23
This is the distance between the first training example and the new instance. Using this formula we find the distance to each training example.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 16(new)
age 0 0 1 2 2 2 1 0 0 2 0 1 1 2 1
income 2 2 2 1 0 0 0 1 0 1 1 1 2 1 0
student 0 0 0 0 1 1 1 0 1 1 1 0 1 0 0
Credit-rating 0 1 0 0 0 1 1 0 0 0 1 1 0 1 0
class no no yes yes yes no yes no yes yes yes yes yes no ?
distance 2.23 2.45 2 1.41 1.41 1.7 1.41 1.41 1.41 1.7 2 1.41 2.23 1.7
From this we select the three neighbors closest to the new example. The smallest distance is 1.41, shared by instances 4, 5, 7, 8, 9 and 12; whichever three of these tied neighbors are chosen, at most one of them (instance 8) is of class no, so the majority is class yes and the predicted class for Z is yes.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 16(new)
age 0 0 1 2 2 2 1 0 0 2 0 1 1 2 1
income 2 2 2 1 0 0 0 1 0 1 1 1 2 1 0
student 0 0 0 0 1 1 1 0 1 1 1 0 1 0 0
Credit-rating 0 1 0 0 0 1 1 0 0 0 1 1 0 1 0
class no no yes yes yes no yes no yes yes yes yes yes no yes
distance 2.23 2.45 2 1.41 1.41 1.7 1.41 1.41 1.41 1.7 2 1.41 2.23 1.7
1 2 3 4 5 6 7 8 9 10 11 12 13 14 17(new)
age 0 0 1 2 2 2 1 0 0 2 0 1 1 2 0
income 2 2 2 1 0 0 0 1 0 1 1 1 2 1 1
student 0 0 0 0 1 1 1 0 1 1 1 0 1 0 0
Credit-rating 0 1 0 0 0 1 1 0 0 0 1 1 0 1 1
class no no yes yes yes no yes no yes yes yes yes yes no ?
Calculate the Euclidean distance between the query instance R = (0, 1, 0, 1) and all the training examples. For the first training example (0, 2, 0, 0):
d = sqrt((0-0)^2 + (2-1)^2 + (0-0)^2 + (0-1)^2) = sqrt (2) = 1.41
This is the distance between the first training example and the new instance. Using this formula we find the distance to each training example.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 17(new)
age 0 0 1 2 2 2 1 0 0 2 0 1 1 2 0
income 2 2 2 1 0 0 0 1 0 1 1 1 2 1 1
student 0 0 0 0 1 1 1 0 1 1 1 0 1 0 0
Credit-rating 0 1 0 0 0 1 1 0 0 0 1 1 0 1 1
class no no yes yes yes no yes no yes yes yes yes yes no no
distance 1.41 1 1.7 2.23 2.65 2.45 1.7 1 1.7 2.45 1 1 2 2
From this we select the three neighbors closest to the new example. The smallest distance is 1, shared by instances 2, 8, 11 and 12, whose classes are no, no, yes and yes. Taking the three lowest-numbered of these tied neighbors (2, 8 and 11), the majority is class no, therefore the predicted class for R is no.
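If scikit-learn is available, the same three predictions can be cross-checked with its built-in classifier. This is only a sanity check of the manual work above; note that with several training examples tied at the same distance, the library's tie-breaking need not pick the same three neighbors chosen by hand.

from sklearn.neighbors import KNeighborsClassifier

X_train = [
    [0, 2, 0, 0], [0, 2, 0, 1], [1, 2, 0, 0], [2, 1, 0, 0], [2, 0, 1, 0],
    [2, 0, 1, 1], [1, 0, 1, 1], [0, 1, 0, 0], [0, 0, 1, 0], [2, 1, 1, 0],
    [0, 1, 1, 1], [1, 1, 0, 1], [1, 2, 1, 0], [2, 1, 0, 1],
]
y_train = ["no", "no", "yes", "yes", "yes", "no", "yes",
           "no", "yes", "yes", "yes", "yes", "yes", "no"]

knn = KNeighborsClassifier(n_neighbors=3)   # Euclidean distance by default
knn.fit(X_train, y_train)
print(knn.predict([[2, 1, 1, 1],    # query X
                   [1, 0, 0, 0],    # query Z
                   [0, 1, 0, 1]]))  # query R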
RID Age Income Student Credit-rating Class: buys-computer
1 Youth High No Fair No
2 Youth High No Excellent No
3 Middle-aged High No Fair Yes
4 Senior Medium No Fair Yes
5 Senior Low Yes Fair Yes
6 Senior Low Yes Excellent No
7 Middle-aged Low Yes Excellent Yes
8 Youth Medium No Fair No
9 Youth Low Yes Fair Yes
10 Senior Medium Yes Fair Yes
11 Youth Medium Yes Excellent Yes
12 Middle-aged Medium No Excellent Yes
13 Middle-aged High Yes Fair Yes
14 Senior Medium No Excellent No
15 senior Medium yes excellent No
16 Middle-aged low no Fair yes
17 youth medium no excellent No