
University of Gondar

College of Informatics
Department of Computer Science (Postgraduate program)
Machine Learning
Course code
Decision Tree and KNN Assignment Two

By: Abaynesh Moges

To: Abdel Ahmed (PhD)

July 12/2022

Gondar, Ethiopia
Given the training examples (the Buys Computer dataset shown below), predict the class of the
following instances using a decision tree and KNN with k = 3.
X = (age=Senior, income=medium, student = yes, credit rating=excellent)
Z = (age=middle-aged, income=low, student = no, credit rating=fair)
R = (age=Youth, income=medium, student = no, credit rating=excellent).

1. Using Decision Tree
Given the training data (Buy Computer data), build a decision tree and predict the class of the
following new example (age=Senior, income=medium, student = yes, credit rating=excellent).

First, determine which attribute will be the root node by computing the information gain of each
attribute; the attribute with the highest gain is used to split the training set. We need the expected
information (entropy) of the whole set and the weighted entropy of each attribute's partitions. The
information gain of an attribute is the total entropy minus that attribute's residual entropy.
The information of the total dataset with the two classes which are class-Yes and class-
No is calculated as

I(Yes, No) = -(Yes/total) * log2(Yes/total) - (No/total) * log2(No/total)

In the given training set we have 14 examples in two classes: 9 class-yes and 5 class-no. The total
information is
I (Yes, No) = I (9, 5) = -9/14 log2(9/14) – 5/14 log2(5/14) = 0.94.
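As a sanity check, this calculation is easy to reproduce in Python. The sketch below is illustrative (the helper name entropy is mine, not part of the assignment); it computes the information of a class distribution from its raw counts.

from math import log2

def entropy(counts):
    """Information (entropy) of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 2))  # 0.94 -> I(9, 5) for the full training set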
Then we find the entropy of each attribute (age, income, student and credit rating).
For Age we have three values:
Age v=youth (Yes) = 2          Age v=youth (No) = 3
Age v=middle-aged (Yes) = 4    Age v=middle-aged (No) = 0
Age v=senior (Yes) = 3         Age v=senior (No) = 2

Calculate the entropy of age (the residual information), then the information gain.
Entropy (age) = 5/14 (-2/5 log2 (2/5)-3/5log2 (3/5)) + 4/14 (0) + 5/14 (-3/5log2 (3/5)-2/5log2 (2/5))
= 5/14(0.9709) + 0 + 5/14(0.9709)
= 0.6935
Gain (age) = 0.94 – 0.6935 = 0.2465
For Income we have three values
Income v=high (Yes) = 2 income v=high (No) = 2
Income v =medium (Yes) = 4 income v =medium (No) =2
Income v= low (Yes) =3 income v=low (No) =1

Entropy (income) = 4/14(-2/4 log2(2/4) - 2/4 log2(2/4)) + 6/14(-4/6 log2(4/6) - 2/6 log2(2/6))
                 + 4/14(-3/4 log2(3/4) - 1/4 log2(1/4))
= 4/14(1) + 6/14(0.918) + 4/14(0.811)
= 0.2857 + 0.3934 + 0.2317 = 0.9108
Gain (income) = 0.94 – 0.9108 = 0.0292
For Student we have two values
Student v=yes (Yes) = 6 Student v=yes (No) = 1
Student v =no (Yes) = 3 Student v =no (No) =4

Entropy (student) = 7/14(-6/7 log2(6/7) - 1/7 log2(1/7)) + 7/14(-3/7 log2(3/7) - 4/7 log2(4/7))
= 7/14(0.5917) + 7/14(0.9852)
= 0.2958 + 0.4926 = 0.7884
Gain (student) = 0.94 – 0.7884 = 0.1516
For Credit_Rating we have two values
Credit_Rating v=fair (Yes) = 6         Credit_Rating v=fair (No) = 2
Credit_Rating v=excellent (Yes) = 3    Credit_Rating v=excellent (No) = 3

Entropy (Credit_Rating) = 8/14(-6/8 log2(6/8) - 2/8 log2(2/8)) + 6/14(-3/6 log2(3/6) - 3/6 log2(3/6))
= 8/14(0.8113) + 6/14(1)
= 0.4636 + 0.4286 = 0.8922
Gain (Credit_Rating) = 0.94 – 0.8922 = 0.048
Gain (age) = 0.94 – 0.6935 = 0.2465
Gain (income) = 0.94 – 0.9108 = 0.0292
Gain (student) = 0.94 – 0.7884 = 0.1516
Gain (Credit_Rating) = 0.94 – 0.8920 = 0.048
Since Age has the highest Information Gain we start splitting the dataset using the age attribute
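Continuing the sketch above (and reusing the entropy helper), the information gain of every attribute can be computed in one pass and the root chosen automatically; the names data and split_entropy are mine, and the results match the hand calculation.

from collections import Counter

# The 14 training examples: (age, income, student, credit_rating, class).
data = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle-aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle-aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle-aged", "medium", "no", "excellent", "yes"),
    ("middle-aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]

def split_entropy(rows, idx):
    """Weighted entropy of the partitions induced by the attribute in column idx."""
    total = len(rows)
    result = 0.0
    for value in {r[idx] for r in rows}:
        subset = [r for r in rows if r[idx] == value]
        result += len(subset) / total * entropy(list(Counter(r[-1] for r in subset).values()))
    return result

base = entropy(list(Counter(r[-1] for r in data).values()))  # 0.940
for idx, name in enumerate(["age", "income", "student", "credit_rating"]):
    print(name, round(base - split_entropy(data, idx), 4))
# age 0.2467, income 0.0292, student 0.1518, credit_rating 0.0481 -> split on age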

Since all records under the branch age = middle-aged are of class Yes, that branch becomes a leaf
labeled Class = Yes.

Step 2: Decide which attribute to use for the age = youth branch (the left branch).
The same splitting process must happen for the two remaining branches.
For the branch age = youth we still have the attributes income, student and Credit_Rating. Which one
should be used to split this partition?

income   student   credit-rating   class
high     no        fair            no
high     no        excellent      no
medium   no        fair            no
low      yes       fair            yes
medium   yes       excellent      yes
The information of this partition, with classes Yes and No, is calculated with the same formula:
I(Yes, No) = -(Yes/total) * log2(Yes/total) - (No/total) * log2(No/total)

In this partition we have 5 examples in two classes: 2 class-yes and 3 class-no. The information is
I (yes, no) = I (2, 3) = -2/5 log2(2/5) – 3/5 log2(3/5) = 0.97

For Income we have three values


Income v=high (Yes) = 0 income v=high (No) = 2
Income v =medium (Yes) = 1 income v =medium (No) =1

Income v= low (Yes) =1 income v=low (No) =0


Entropy (income) = 2/5(0) + 2/5(-1/2 log2(1/2) - 1/2 log2(1/2)) + 1/5(0)
= 2/5(1) = 0.4
Gain (income) = 0.97 – 0.4 = 0.57
For Student we have two values
Student v=yes (Yes) = 2 Student v=yes (No) = 0
Student v =no (Yes) = 0 Student v =no (No) =3

Entropy (student) = 2/5(0) + 3/5(0) = 0


Gain (student) = 0.97 – 0 = 0.97
For Credit_Rating we have two values
Credit_Rating v=fair (Yes) = 1         Credit_Rating v=fair (No) = 2
Credit_Rating v=excellent (Yes) = 1    Credit_Rating v=excellent (No) = 1

Entropy (Credit_Rating) = 3/5(-1/3 log2(1/3) - 2/3 log2(2/3)) + 2/5(-1/2 log2(1/2) - 1/2 log2(1/2))
= 3/5(0.9183) + 2/5(1) = 0.95
Gain (Credit_Rating) = 0.97 – 0.95 = 0.02
Gain (income) = 0.97 – 0.4 = 0.57
Gain (student) = 0.97 – 0 = 0.97
Gain (Credit_Rating) = 0.97 – 0.95 = 0.02
We can then safely split on the student attribute without checking the others, since its information
gain is maximal.

Since these two new branches each contain records of a single class, we make them leaf nodes labeled
with their respective class.
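The branch-by-branch procedure above is just ID3 applied recursively. Here is a hedged sketch of the whole recursion (the function name id3 is mine), reusing data, entropy and split_entropy from the earlier snippets:

def id3(rows, attrs):
    """attrs maps attribute name -> column index; returns a nested dict or a class label."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:                      # pure partition -> leaf
        return labels[0]
    if not attrs:                                  # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    base = entropy(list(Counter(labels).values()))
    best = max(attrs, key=lambda a: base - split_entropy(rows, attrs[a]))
    rest = {k: v for k, v in attrs.items() if k != best}
    return {best: {v: id3([r for r in rows if r[attrs[best]] == v], rest)
                   for v in {r[attrs[best]] for r in rows}}}

tree = id3(data, {"age": 0, "income": 1, "student": 2, "credit_rating": 3})
# {'age': {'youth': {'student': {'no': 'no', 'yes': 'yes'}},
#          'middle-aged': 'yes',
#          'senior': {'credit_rating': {'fair': 'yes', 'excellent': 'no'}}}}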

The same process is applied to the right branch of age, which is senior.

income   student   credit-rating   class
medium   no        fair            yes
low      yes       fair            yes
low      yes       excellent      no
medium   yes       fair            yes
medium   no        excellent      no

The information of this partition, with classes Yes and No, is calculated as
I(Yes, No) = -(Yes/total) * log2(Yes/total) - (No/total) * log2(No/total)

In this partition we have 5 examples in two classes: 3 class-yes and 2 class-no. The information is
I (yes, no) = I (3, 2) = -3/5 log2(3/5) – 2/5 log2(2/5) = 0.97

For Income we have two values


Income v =medium (Yes) = 2 income v =medium (No) =1

Income v= low (Yes) =1 income v=low (No) =1


Entropy (income) = 3/5(-2/3 log2(2/3) - 1/3 log2(1/3)) + 2/5(-1/2 log2(1/2) - 1/2 log2(1/2))
= 3/5(0.9183) + 2/5(1) = 0.55 + 0.40 = 0.95
Gain (income) = 0.97 – 0.95 = 0.02
For Student we have two values
Student v=yes (Yes) = 2 Student v=yes (No) = 1
Student v =no (Yes) = 1 Student v =no (No) =1
Entropy (student) = 3/5(-2/3 log2(2/3) - 1/3 log2(1/3)) + 2/5(-1/2 log2(1/2) - 1/2 log2(1/2))

= 0.95
Gain (student) = 0.97 – 0.95 = 0.02
For Credit_Rating we have two values
Credit_Rating v=fair (Yes) = 3         Credit_Rating v=fair (No) = 0
Credit_Rating v=excellent (Yes) = 0    Credit_Rating v=excellent (No) = 2
Entropy (Credit_Rating) = 0

Gain (Credit_Rating) = 0.97 – 0 = 0.97


Gain (income) = 0.97 – 0.95 = 0.02
Gain (student) = 0.97 – 0.95 = 0.02
Gain (Credit_Rating) = 0.97 – 0 = 0.97
We then split on Credit_Rating. These splits give partitions whose records all belong to the same
class, so we make them leaf nodes with their class labels attached.
The final decision tree for the given dataset is as follows:

age?
├─ youth → student?
│    ├─ no  → Class = No
│    └─ yes → Class = Yes
├─ middle-aged → Class = Yes
└─ senior → credit_rating?
     ├─ fair      → Class = Yes
     └─ excellent → Class = No
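For prediction, this tree collapses to a few nested conditions. A minimal hand-coded sketch (the function name predict is illustrative, not part of the assignment; note that income is never tested by the tree):

def predict(age, income, student, credit_rating):
    """Hard-coded form of the decision tree derived above."""
    if age == "middle-aged":
        return "yes"
    if age == "youth":
        return "yes" if student == "yes" else "no"
    return "yes" if credit_rating == "fair" else "no"   # age == "senior"

print(predict("senior", "medium", "yes", "excellent"))   # X -> no
print(predict("middle-aged", "low", "no", "fair"))       # Z -> yes
print(predict("youth", "medium", "no", "excellent"))     # R -> no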

Based on this decision tree we can predict the class of new examples as follows.
X = (age=Senior, income=medium, student = yes, credit rating=excellent)
Following the branch age = senior, then credit rating = excellent, we predict Class = No. So this
instance will not buy a computer.
Z = (age=middle-aged, income=low, student = no, credit rating=fair)

Following the branch age = middle-aged, we immediately predict Class = Yes. So this instance will buy a computer.
R = (age=Youth, income=medium, student = no, credit rating=excellent).

Following the branch age = youth, then student = no, we predict Class = No. So this instance will not
buy a computer.

2. Using k-nearest neighbors


First, choose the value of k and a distance metric.
Here I use the Euclidean distance and k = 3.
RID   age           income   student   credit-rating   class: buys computer
1     youth         high     no        fair            no
2     youth         high     no        excellent       no
3     middle-aged   high     no        fair            yes
4     senior        medium   no        fair            yes
5     senior        low      yes       fair            yes
6     senior        low      yes       excellent       no
7     middle-aged   low      yes       excellent       yes
8     youth         medium   no        fair            no
9     youth         low      yes       fair            yes
10    senior        medium   yes       fair            yes
11    youth         medium   yes       excellent       yes
12    middle-aged   medium   no        excellent       yes
13    middle-aged   high     yes       fair            yes
14    senior        medium   no        excellent       no

Let us convert all the categorical attributes into numeric codes as follows.

attribute       encoding
age             youth = 0, middle-aged = 1, senior = 2
income          high = 2, medium = 1, low = 0
student         yes = 1, no = 0
credit-rating   fair = 0, excellent = 1
class (target)  kept as yes / no
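In code, this encoding is just a set of lookup tables. A minimal sketch (the dictionary and function names are mine); note that using Euclidean distance on such integer codes implicitly treats the attributes as ordinal, which is a simplification of this assignment rather than a requirement of KNN:

encode = {
    "age":           {"youth": 0, "middle-aged": 1, "senior": 2},
    "income":        {"high": 2, "medium": 1, "low": 0},
    "student":       {"yes": 1, "no": 0},
    "credit_rating": {"fair": 0, "excellent": 1},
}

def to_numeric(age, income, student, credit_rating):
    """Map one categorical example to the numeric tuple used for distance computation."""
    return (encode["age"][age], encode["income"][income],
            encode["student"][student], encode["credit_rating"][credit_rating])

print(to_numeric("senior", "medium", "yes", "excellent"))  # X -> (2, 1, 1, 1)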

RID   age   income   student   credit-rating   class: buys computer
1     0     2        0         0               no
2     0     2        0         1               no
3     1     2        0         0               yes
4     2     1        0         0               yes
5     2     0        1         0               yes
6     2     0        1         1               no
7     1     0        1         1               yes
8     0     1        0         0               no
9     0     0        1         0               yes
10    2     1        1         0               yes
11    0     1        1         1               yes
12    1     1        0         1               yes
13    1     2        1         0               yes
14    2     1        0         1               no

X = (age=Senior, income=medium, student = yes, credit rating=excellent)

RID             1    2    3    4    5    6    7    8    9    10   11   12   13   14   15(new)
age             0    0    1    2    2    2    1    0    0    2    0    1    1    2    2
income          2    2    2    1    0    0    0    1    0    1    1    1    2    1    1
student         0    0    0    0    1    1    1    0    1    1    1    0    1    0    1
credit-rating   0    1    0    0    0    1    1    0    0    0    1    1    0    1    1
class           no   no   yes  yes  yes  no   yes  no   yes  yes  yes  yes  yes  no   ?

Calculate the distance between the query-instance and all the training examples

d(X, RID 1) = Sqrt((2-0)^2 + (1-2)^2 + (1-0)^2 + (1-0)^2)
            = Sqrt(4 + 1 + 1 + 1)
            = Sqrt(7)
            = 2.65
This is the distance between the first training example and the new example.
Using the same formula we compute the distance to every training example.

RID             1     2     3    4     5     6    7     8     9     10   11   12    13   14   15(new)
age             0     0     1    2     2     2    1     0     0     2    0    1     1    2    2
income          2     2     2    1     0     0    0     1     0     1    1    1     2    1    1
student         0     0     0    0     1     1    1     0     1     1    1    0     1    0    1
credit-rating   0     1     0    0     0     1    1     0     0     0    1    1     0    1    1
class           no    no    yes  yes   yes   no   yes   no    yes   yes  yes  yes   yes  no   ?
distance        2.65  2.45  2    1.41  1.41  1    1.41  2.45  2.45  1    2    1.41  1.7  1
The three nearest neighbors of the new example are RIDs 6, 10 and 14, each at distance 1. Their
classes are no, yes and no, so by majority vote the predicted class of the new example X is No.
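The whole nearest-neighbor computation fits in a few lines. Below is a minimal sketch (the names train and knn are mine) that reproduces the distances and the vote for X; ties at equal distance are broken arbitrarily by the sort, which matters for Z and R below, where several examples share the smallest distance:

from collections import Counter
from math import sqrt

# Numeric training rows in RID order: ((age, income, student, credit_rating), class).
train = [
    ((0, 2, 0, 0), "no"),  ((0, 2, 0, 1), "no"),  ((1, 2, 0, 0), "yes"),
    ((2, 1, 0, 0), "yes"), ((2, 0, 1, 0), "yes"), ((2, 0, 1, 1), "no"),
    ((1, 0, 1, 1), "yes"), ((0, 1, 0, 0), "no"),  ((0, 0, 1, 0), "yes"),
    ((2, 1, 1, 0), "yes"), ((0, 1, 1, 1), "yes"), ((1, 1, 0, 1), "yes"),
    ((1, 2, 1, 0), "yes"), ((2, 1, 0, 1), "no"),
]

def knn(query, k=3):
    """Euclidean k-NN majority vote over the training rows."""
    neighbors = sorted((sqrt(sum((a - b) ** 2 for a, b in zip(x, query))), label)
                       for x, label in train)[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0], neighbors

print(knn((2, 1, 1, 1)))  # X -> 'no'  (RIDs 6, 10, 14 at distance 1)
print(knn((1, 0, 0, 0)))  # Z -> 'yes' (six examples tie at 1.41; 5 of 6 are yes)
print(knn((0, 1, 0, 1)))  # R -> 'no'  (four tie at 1; the vote depends on tie-breaking)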


Z = (age=middle-aged, income=low, student = no, credit rating=fair)

RID             1    2    3    4    5    6    7    8    9    10   11   12   13   14   16(new)
age             0    0    1    2    2    2    1    0    0    2    0    1    1    2    1
income          2    2    2    1    0    0    0    1    0    1    1    1    2    1    0
student         0    0    0    0    1    1    1    0    1    1    1    0    1    0    0
credit-rating   0    1    0    0    0    1    1    0    0    0    1    1    0    1    0
class           no   no   yes  yes  yes  no   yes  no   yes  yes  yes  yes  yes  no   ?

Calculate the distance between the query-instance and all the training examples

d(Z, RID 1) = Sqrt((1-0)^2 + (0-2)^2 + (0-0)^2 + (0-0)^2)
            = Sqrt(1 + 4 + 0 + 0)
            = Sqrt(5)
            = 2.24
This is the distance between the first training example and the new example.
Using the same formula we compute the distance to every training example.

RID             1     2     3    4     5     6    7     8     9     10   11   12    13    14   16(new)
age             0     0     1    2     2     2    1     0     0     2    0    1     1     2    1
income          2     2     2    1     0     0    0     1     0     1    1    1     2     1    0
student         0     0     0    0     1     1    1     0     1     1    1    0     1     0    0
credit-rating   0     1     0    0     0     1    1     0     0     0    1    1     0     1    0
class           no    no    yes  yes   yes   no   yes   no    yes   yes  yes  yes   yes   no   ?
distance        2.24  2.45  2    1.41  1.41  1.7  1.41  1.41  1.41  1.7  2    1.41  2.24  1.7
Six examples tie at the smallest distance, 1.41 (RIDs 4, 5, 7, 8, 9 and 12). Five of these are class
yes and one is class no, so whichever three of them we take as the k = 3 nearest neighbors, the
majority vote is yes. The predicted class for Z is therefore Yes.


R = (age=Youth, income=medium, student = no, credit rating=excellent).

RID             1    2    3    4    5    6    7    8    9    10   11   12   13   14   17(new)
age             0    0    1    2    2    2    1    0    0    2    0    1    1    2    0
income          2    2    2    1    0    0    0    1    0    1    1    1    2    1    1
student         0    0    0    0    1    1    1    0    1    1    1    0    1    0    0
credit-rating   0    1    0    0    0    1    1    0    0    0    1    1    0    1    1
class           no   no   yes  yes  yes  no   yes  no   yes  yes  yes  yes  yes  no   ?
Calculate the distance between the query-instance and all the training examples

d(R, RID 1) = Sqrt((0-0)^2 + (1-2)^2 + (0-0)^2 + (1-0)^2)
            = Sqrt(0 + 1 + 0 + 1)
            = Sqrt(2)
            = 1.41

This is the distance between the first training example and the new example.
Using the same formula we compute the distance to every training example.

RID             1     2    3    4     5     6     7    8    9    10    11   12   13   14   17(new)
age             0     0    1    2     2     2     1    0    0    2     0    1    1    2    0
income          2     2    2    1     0     0     0    1    0    1     1    1    2    1    1
student         0     0    0    0     1     1     1    0    1    1     1    0    1    0    0
credit-rating   0     1    0    0     0     1     1    0    0    0     1    1    0    1    1
class           no    no   yes  yes   yes   no    yes  no   yes  yes   yes  yes  yes  no   ?
distance        1.41  1    1.7  2.24  2.65  2.45  1.7  1    1.7  2.45  1    1    2    2
Four examples tie at the smallest distance, 1 (RIDs 2, 8, 11 and 12), with classes no, no, yes and
yes, so the k = 3 vote depends on which three of them we keep. Breaking the tie with the next-nearest
example (RID 1 at distance 1.41, class no) gives a majority of no, so the predicted class for R is No.

The full dataset with the three classified instances appended (RIDs 15–17) is:

RID   age           income   student   credit-rating   class: buys computer
1     youth         high     no        fair            no
2     youth         high     no        excellent       no
3     middle-aged   high     no        fair            yes
4     senior        medium   no        fair            yes
5     senior        low      yes       fair            yes
6     senior        low      yes       excellent       no
7     middle-aged   low      yes       excellent       yes
8     youth         medium   no        fair            no
9     youth         low      yes       fair            yes
10    senior        medium   yes       fair            yes
11    youth         medium   yes       excellent       yes
12    middle-aged   medium   no        excellent       yes
13    middle-aged   high     yes       fair            yes
14    senior        medium   no        excellent       no
15    senior        medium   yes       excellent       no
16    middle-aged   low      no        fair            yes
17    youth         medium   no        excellent       no
