
Assignment-Decision Tree

The document describes the process of building a decision tree model to predict whether someone buys a computer based on attributes like age, income, student status, and credit rating. It shows calculating information gain and Gini index for each attribute to determine the root node, then recursively builds out the tree by calculating metrics for the child nodes. The optimal tree was determined using gain ratio and has credit rating as the root node, with student status and age as additional nodes. The document concludes by presenting the rules generated from the final decision tree model.

Uploaded by

ThAnos n

Name: Areeb Ahmed

Roll no:17271519-157

Assignment 3
------------------------------------------------------------------------------------------------------------------------------------------

Decision tree using Information Gain

Step 1: Calculate entropy of the target:

Entropy (Buys_computer) = Entropy (9, 5)

= - (9/14) * log2 (9/14) - (5/14) * log2 (5/14)

= 0.94
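The entropy step can be checked with a few lines of Python. This is a minimal sketch, not part of the original assignment; the function name `entropy` is chosen here for illustration, and the counts are passed as (yes, no), as in Entropy (9, 5).

```python
import math

def entropy(*counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    # Zero counts are skipped: 0 * log2(0) is taken as 0 by convention.
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# 9 tuples with buys_computer = yes and 5 with buys_computer = no
print(round(entropy(9, 5), 2))
```

A pure partition such as Entropy (4, 0) evaluates to 0, which is why the 31...40 branch needs no further split later on.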

Step 2: Calculate the information gain of each attribute:

Entropy (Age, Buys_computer) = P (<=30) * Entropy (2, 3) + P (31...40) * Entropy (4, 0)

+ P (>40) * Entropy (3, 2)

= 5/14 * 0.97 + 4/14 * 0 + 5/14 * 0.97

= 0.346 + 0 + 0.346

= 0.69

Gain (Age, Buys_computer) = Entropy (Buys_computer) - Entropy (Age, Buys_computer)

= 0.940 - 0.694 = 0.246

Entropy (Income, Buys_computer) = P (high) * Entropy (2, 2) + P (medium) * Entropy (4, 2)

+ P (low) * Entropy (3, 1)

= 4/14 * 1 + 6/14 * 0.92 + 4/14 * 0.81

= 0.29 + 0.39 + 0.23

= 0.91

Gain (Income, Buys_computer) = Entropy (Buys_computer) - Entropy (Income, Buys_computer)

= 0.94 - 0.91 = 0.03

Entropy (Student, Buys_computer) = P (no) * Entropy (3, 4) + P (yes) * Entropy (6, 1)

= 7/14 * 0.99 + 7/14 * 0.59

= 0.79

Gain (Student, Buys_computer) = Entropy (Buys_computer) - Entropy (Student, Buys_computer)

= 0.94 - 0.79 = 0.15

Entropy (Credit_rating, Buys_computer) = P (fair) * Entropy (6, 2) + P (excellent) * Entropy (3, 3)

= 8/14 * 0.81 + 6/14 * 1

= 0.89

Gain (Credit_rating, Buys_computer) = Entropy (Buys_computer) - Entropy (Credit_rating, Buys_computer)

= 0.94 - 0.89 = 0.05

The attribute with the largest gain is Age, so Age is selected as the root node:

Age
    <=30      -> (to be split)
    31 to 40  -> Buy = Yes
    >40       -> (to be split)
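The root-node selection can be reproduced from the (yes, no) counts quoted in the calculations. The sketch below is illustrative only; the helper names `entropy` and `info_gain` and the `splits` dictionary are assumptions introduced here, not taken from the assignment.

```python
import math

def entropy(counts):
    """Entropy in bits of a tuple of class counts; zero counts are skipped."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def info_gain(parent, partitions):
    """Gain = entropy(parent) - weighted entropy of the partitions."""
    n = sum(parent)
    remainder = sum(sum(p) / n * entropy(p) for p in partitions)
    return entropy(parent) - remainder

# (yes, no) counts for each attribute value, as quoted in the worked example
splits = {
    "age": [(2, 3), (4, 0), (3, 2)],
    "income": [(2, 2), (4, 2), (3, 1)],
    "student": [(3, 4), (6, 1)],
    "credit_rating": [(6, 2), (3, 3)],
}

gains = {attr: info_gain((9, 5), parts) for attr, parts in splits.items()}
best = max(gains, key=gains.get)

for attr, g in gains.items():
    print(f"{attr}: {g:.3f}")
print("root:", best)
```

Because the script does not round intermediate entropies, its values can differ from the hand calculation in the third decimal place, but the ranking of the attributes is the same.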

Step 1: Calculate entropy of the target (when age <=30):

Entropy (Buys_computer) = Entropy (2, 3) = 0.97

Step 2: Calculate the information gain of each attribute:

Entropy (Income, Buys_computer) = P (high) * Entropy (0, 2) + P (medium) * Entropy (1, 1)

+ P (low) * Entropy (1, 0)

= 2/5 * 0 + 2/5 * 1 + 1/5 * 0

= 0.4

Gain (Income, Buys_computer) = Entropy (Buys_computer) - Entropy (Income, Buys_computer)

= 0.97 - 0.4 = 0.57

Entropy (Student, Buys_computer) = P (no) * Entropy (0, 3) + P (yes) * Entropy (2, 0)

= 3/5 * 0 + 2/5 * 0

= 0

Gain (Student, Buys_computer) = Entropy (Buys_computer) - Entropy (Student, Buys_computer)

= 0.97 - 0 = 0.97

Entropy (Credit_rating, Buys_computer) = P (fair) * Entropy (1, 2) + P (excellent) * Entropy (1, 1)

= 3/5 * 0.92 + 2/5 * 1

= 0.95

Gain (Credit_rating, Buys_computer) = Entropy (Buys_computer) - Entropy (Credit_rating, Buys_computer)

= 0.97 - 0.95 = 0.02

The attribute with the largest gain is Student, so Student is selected as the node for the age <=30 branch.

Step 1: Calculate entropy of the target (when age >40):

Entropy (Buys_computer) = Entropy (3, 2) = 0.97

Step 2: Calculate the information gain of each attribute:

Entropy (Income, Buys_computer) = P (medium) * Entropy (2, 1)

+ P (low) * Entropy (1, 1)

= 3/5 * 0.92 + 2/5 * 1

= 0.95

Gain (Income, Buys_computer) = Entropy (Buys_computer) - Entropy (Income, Buys_computer)

= 0.97 - 0.95 = 0.02

Entropy (Student, Buys_computer) = P (no) * Entropy (1, 1) + P (yes) * Entropy (2, 1)

= 2/5 * 1 + 3/5 * 0.92

= 0.95

Gain (Student, Buys_computer) = Entropy (Buys_computer) - Entropy (Student, Buys_computer)

= 0.97 - 0.95 = 0.02

Entropy (Credit_rating, Buys_computer) = P (fair) * Entropy (3, 0) + P (excellent) * Entropy (0, 2)

= 3/5 * 0 + 2/5 * 0

= 0

Gain (Credit_rating, Buys_computer) = Entropy (Buys_computer) - Entropy (Credit_rating, Buys_computer)

= 0.97 - 0 = 0.97

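The attribute with the largest gain is Credit_rating, so Credit_rating is selected as the node for the age >40 branch. In this branch all three fair tuples buy (Entropy (3, 0)) and neither of the two excellent tuples does (Entropy (0, 2)), which fixes the leaf labels:

Age
    <=30      -> Student
                     Yes -> Buy = Yes
                     No  -> Buy = No
    31 to 40  -> Buy = Yes
    >40       -> Credit_rating
                     Fair      -> Buy = Yes
                     Excellent -> Buy = No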
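The recursive procedure (pick the attribute with the largest gain, split, repeat on each impure branch) can be sketched as a small ID3-style builder. The assignment does not list the training tuples, so `ROWS` below is an assumption: it is the 14-tuple buys_computer table commonly used for this exercise, and its (yes, no) counts agree with every Entropy (a, b) term quoted in the calculations. The names `COLUMNS`, `ROWS`, `gain` and `build` are illustrative.

```python
import math
from collections import Counter

# ASSUMED data: not given in the source; chosen to match its quoted counts.
COLUMNS = ("age", "income", "student", "credit_rating")
ROWS = [  # (age, income, student, credit_rating, buys_computer)
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, col):
    """Information gain of splitting `rows` on column index `col`."""
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[-1] for r in rows]) - remainder

def build(rows, cols):
    """Return a leaf label, or a nested dict {attribute: {value: subtree}}."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1 or not cols:
        return Counter(labels).most_common(1)[0][0]  # pure node or majority vote
    best = max(cols, key=lambda c: gain(rows, c))
    rest = [c for c in cols if c != best]
    values = dict.fromkeys(r[best] for r in rows)  # distinct values, first-seen order
    return {COLUMNS[best]: {v: build([r for r in rows if r[best] == v], rest)
                            for v in values}}

tree = build(ROWS, list(range(len(COLUMNS))))
print(tree)
```

On the assumed table the builder places Age at the root, Student under age <=30 and Credit_rating under age >40, with fair leading to yes and excellent to no in that last branch.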

Rules set of this tree:

1. If age <=30 and student = no then buys_computer = no
2. If age <=30 and student = yes then buys_computer = yes
3. If age = 31 to 40 then buys_computer = yes
4. If age >40 and credit rating = excellent then buys_computer = no
5. If age >40 and credit rating = fair then buys_computer = yes
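Reading rules off a tree amounts to listing every root-to-leaf path. The sketch below shows one way to do that for a tree stored as nested dictionaries. The representation and the name `extract_rules` are assumptions made for illustration. The leaf labels in the age >40 branch follow the counts in the calculation: Entropy (3, 0) for fair means all three tuples buy, and Entropy (0, 2) for excellent means neither does.

```python
def extract_rules(tree, conditions=()):
    """Yield one 'If ... then ...' string per root-to-leaf path."""
    if not isinstance(tree, dict):  # leaf: a class label
        lhs = " and ".join(f"{a} = {v}" for a, v in conditions) or "true"
        yield f"If {lhs} then buys_computer = {tree}"
        return
    (attribute, branches), = tree.items()  # each internal node has exactly one key
    for value, subtree in branches.items():
        yield from extract_rules(subtree, conditions + ((attribute, value),))

# Hypothetical nested-dict encoding of the information-gain tree
tree = {"age": {
    "<=30": {"student": {"no": "no", "yes": "yes"}},
    "31..40": "yes",
    ">40": {"credit_rating": {"excellent": "no", "fair": "yes"}},
}}

rules = list(extract_rules(tree))
for i, rule in enumerate(rules, 1):
    print(f"{i}. {rule}")
```

Each internal node contributes one condition to the rule, so the number of rules always equals the number of leaves (five here).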

Decision tree using Gain ratio


Step 1: Calculate the gain ratio of each attribute:

SplitInfo_Age (<=30, 31...40, >40) = - (5/14) * log2 (5/14) - (4/14) * log2 (4/14) - (5/14) * log2 (5/14)

= 1.58

Gain Ratio (Age) = Gain (Age) / SplitInfo_Age

= 0.24 / 1.58 = 0.15

SplitInfo_Income (high, medium, low) = - (4/14) * log2 (4/14) - (6/14) * log2 (6/14) - (4/14) * log2 (4/14)

= 1.56

Gain Ratio (Income) = Gain (Income) / SplitInfo_Income

= 0.03 / 1.56 = 0.019

SplitInfo_Student (yes, no) = - (7/14) * log2 (7/14) - (7/14) * log2 (7/14)

= 1

Gain Ratio (Student) = Gain (Student) / SplitInfo_Student

= 0.79 / 1 = 0.79

SplitInfo_Credit_rating (fair, excellent) = - (8/14) * log2 (8/14) - (6/14) * log2 (6/14)

= 0.99

Gain Ratio (Credit_rating) = Gain (Credit_rating) / SplitInfo_Credit_rating

= 0.89 / 0.99 = 0.90

Here, the largest gain ratio attribute is Credit_rating

Selected Credit_rating as root node:

Credit_rating

Fair Excellent
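The stated formula, Gain Ratio = Gain / SplitInfo, can be evaluated directly from the (yes, no) counts. The sketch below is a hedged illustration with made-up helper names (`entropy`, `gain_ratio`, `splits`). It puts the information gain in the numerator, as the formula says. For Student and Credit_rating that means the gains 0.15 and 0.05 from the information-gain section, so the values it prints for those two attributes are smaller than the hand-worked 0.79 and 0.90, whose numerators are the weighted entropies.

```python
import math

def entropy(counts):
    """Entropy in bits of a sequence of counts; zero counts are skipped."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def gain_ratio(parent, partitions):
    """Information gain divided by split information for one attribute."""
    n = sum(parent)
    sizes = [sum(p) for p in partitions]
    remainder = sum(s / n * entropy(p) for s, p in zip(sizes, partitions))
    gain = entropy(parent) - remainder
    split_info = entropy(sizes)  # entropy of the partition sizes themselves
    return gain / split_info

# (yes, no) counts for each attribute value, as quoted in the worked example
splits = {
    "age": [(2, 3), (4, 0), (3, 2)],
    "income": [(2, 2), (4, 2), (3, 1)],
    "student": [(3, 4), (6, 1)],
    "credit_rating": [(6, 2), (3, 3)],
}

ratios = {attr: gain_ratio((9, 5), parts) for attr, parts in splits.items()}
for attr, r in ratios.items():
    print(f"{attr}: {r:.3f}")
```

Split information is simply the entropy of the branch sizes, which is why the same `entropy` helper serves both the numerator and the denominator.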

Step 1: Calculate the gain ratio of each attribute (when credit_rating = fair):

SplitInfo_Age (<=30, 31...40, >40) = - (3/8) * log2 (3/8) - (2/8) * log2 (2/8) - (3/8) * log2 (3/8)

= 1.56

Entropy (Buys_computer) = Entropy (6, 2) = 0.81

Step 2: Calculate the information gain of each attribute:

Entropy (Age, Buys_computer) = P (<=30) * Entropy (1, 2) + P (31...40) * Entropy (2, 0)

+ P (>40) * Entropy (3, 0)

= 3/8 * 0.92 + 2/8 * 0 + 3/8 * 0

= 0.345

Gain (Age, Buys_computer) = Entropy (Buys_computer) - Entropy (Age, Buys_computer)

= 0.81 - 0.345 = 0.465

Gain Ratio (Age) = Gain (Age) / SplitInfo_Age

= 0.465 / 1.56 = 0.298

SplitInfo_Income (high, medium, low) = 1.56

Entropy (Buys_computer) = Entropy (6, 2) = 0.81

Entropy (Income, Buys_computer) = P (high) * Entropy (2, 1) + P (medium) * Entropy (2, 1)

+ P (low) * Entropy (2, 0)

= 3/8 * 0.92 + 3/8 * 0.92 + 2/8 * 0

= 0.345 + 0.345 + 0 = 0.69

Gain (Income, Buys_computer) = Entropy (Buys_computer) - Entropy (Income, Buys_computer)

= 0.81 - 0.69 = 0.12

Gain Ratio (Income) = Gain (Income) / SplitInfo_Income

= 0.12 / 1.56 = 0.08

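SplitInfo_Student (yes, no) = 1

Entropy (Buys_computer) = Entropy (6, 2) = 0.81

Entropy (Student, Buys_computer) = P (no) * Entropy (2, 2) + P (yes) * Entropy (4, 0)

= 4/8 * 1 + 4/8 * 0

= 0.5

Gain (Student, Buys_computer) = Entropy (Buys_computer) - Entropy (Student, Buys_computer)

= 0.81 - 0.5 = 0.31

Gain Ratio (Student) = Gain (Student) / SplitInfo_Student

= 0.31 / 1 = 0.31

The attribute with the largest gain ratio in this branch is Student (0.31, against 0.298 for Age and 0.08 for Income), so Student is selected as the node for the credit_rating = fair branch.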
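The three gain ratios for the credit_rating = fair branch can be recomputed from the counts used above. In this sketch the helper `split_metrics` and the `fair` dictionary are illustrative names, not part of the assignment. It returns the gain, the split information and their ratio together, so each intermediate value can be compared with the hand calculation.

```python
import math

def entropy(counts):
    """Entropy in bits of a sequence of counts; zero counts are skipped."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def split_metrics(parent, partitions):
    """Return (gain, split_info, gain_ratio) for one candidate split."""
    n = sum(parent)
    sizes = [sum(p) for p in partitions]
    remainder = sum(s / n * entropy(p) for s, p in zip(sizes, partitions))
    gain = entropy(parent) - remainder
    split_info = entropy(sizes)
    return gain, split_info, gain / split_info

# (yes, no) counts inside the credit_rating = fair subset (6 yes, 2 no)
fair = {
    "age": [(1, 2), (2, 0), (3, 0)],
    "income": [(2, 1), (2, 1), (2, 0)],
    "student": [(2, 2), (4, 0)],
}

metrics = {attr: split_metrics((6, 2), parts) for attr, parts in fair.items()}
best = max(metrics, key=lambda a: metrics[a][2])

for attr, (g, si, gr) in metrics.items():
    print(f"{attr}: gain={g:.3f} split_info={si:.3f} ratio={gr:.3f}")
print("best:", best)
```

Age and Student end up close (roughly 0.30 against 0.31), so rounding intermediate entropies too aggressively could change which attribute wins in this branch.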

Credit_rating
    Fair       -> Student
                      Yes -> Buy = Yes
                      No  -> Age
                                 <=30     -> Buy = No
                                 31...40  -> Buy = Yes
                                 >40      -> Buy = Yes
    Excellent  -> Student
                      Yes -> Buy = Yes
                      No  -> Buy = No

Rules set of this tree:

1. If credit rating = fair and student = yes then buys_computer = yes
2. If credit rating = fair and student = no and age <=30 then buys_computer = no
3. If credit rating = fair and student = no and (age = 31...40 or age >40) then buys_computer = yes
4. If credit rating = excellent and student = no then buys_computer = no
5. If credit rating = excellent and student = yes then buys_computer = yes

Decision tree using Gini Index


Calculating the Gini Index for Age:

Gini index= 5/14*(1-((2/5) ^2 + (3/5) ^2)) +4/14*(1-((4/4) ^2 + (0/4) ^2)) +5/14*(1-((3/5) ^2 + (2/5) ^2)) =0.34

Calculating the Gini Index for Income:

Gini index= 4/14*(1-((2/4) ^2 + (2/4) ^2)) +6/14*(1-((4/6) ^2 + (2/6) ^2)) +4/14*(1-((3/4) ^2 + (1/4) ^2)) =0.44

Calculating the Gini Index for Student:

Gini index= 7/14*(1-((3/7) ^2 + (4/7) ^2)) +7/14*(1-((6/7) ^2 + (1/7) ^2)) =0.37

Calculating the Gini Index for Credit_rating:

Gini index= 8/14*(1-((6/8) ^2 + (2/8) ^2)) +6/14*(1-((3/6) ^2 + (3/6) ^2)) =0.43

The attribute with the smallest Gini index is Age, so Age is selected as the root node:

Age
    <=30      -> (to be split)
    31 to 40  -> Buy = Yes
    >40       -> (to be split)
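The weighted Gini index of each candidate split can be checked the same way. This is a minimal sketch using the (yes, no) counts from the formulas above; the names `gini`, `gini_split` and `splits` are introduced here for illustration.

```python
def gini(counts):
    """Gini impurity 1 - sum(p_i^2) of a tuple of class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_split(partitions):
    """Size-weighted average Gini impurity over the partitions of a split."""
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * gini(p) for p in partitions)

# (yes, no) counts for each attribute value, as quoted in the worked example
splits = {
    "age": [(2, 3), (4, 0), (3, 2)],
    "income": [(2, 2), (4, 2), (3, 1)],
    "student": [(3, 4), (6, 1)],
    "credit_rating": [(6, 2), (3, 3)],
}

scores = {attr: gini_split(parts) for attr, parts in splits.items()}
best = min(scores, key=scores.get)  # lower impurity is better

for attr, s in scores.items():
    print(f"{attr}: {s:.3f}")
print("root:", best)
```

Unlike information gain, the Gini criterion is minimised, so the selection uses `min` rather than `max`.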

Now, calculating Gini index when age <=30,

Calculating the Gini Index for Income:

Gini index= 2/5*(1-((0/2) ^2 + (2/2) ^2)) +2/5*(1-((1/2) ^2 + (1/2) ^2)) +1/5*(1-((1/1) ^2 + (0/1) ^2)) =0.2

Calculating the Gini Index for Student:

Gini index= 3/5*(1-((3/3) ^2 + (0/3) ^2)) +2/5*(1-((2/2) ^2 + (0/2) ^2)) =0

Calculating the Gini Index for Credit_rating:

Gini index= 3/5*(1-((1/3) ^2 + (2/3) ^2)) +2/5*(1-((1/2) ^2 + (1/2) ^2)) =0.47

The attribute with the smallest Gini index is Student, so Student is selected as the node for the age <=30 branch.

Now, calculating Gini index when age >40:

Calculating the Gini Index for Income:

Gini index= 3/5*(1-((2/3) ^2 + (1/3) ^2)) +2/5*(1-((1/2) ^2 + (1/2) ^2)) =0.47

Calculating the Gini Index for Student:

Gini index= 3/5*(1-((2/3) ^2 + (1/3) ^2)) +2/5*(1-((1/2) ^2 + (1/2) ^2)) =0.47

Calculating the Gini Index for Credit_rating:

Gini index= 3/5*(1-((3/3) ^2 + (0/3) ^2)) +2/5*(1-((2/2) ^2 + (0/2) ^2)) =0

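The attribute with the smallest Gini index is Credit_rating, so Credit_rating is selected as the node for the age >40 branch. In this branch all three fair tuples buy and neither of the two excellent tuples does, which fixes the leaf labels:

Age
    <=30      -> Student
                     Yes -> Buy = Yes
                     No  -> Buy = No
    31 to 40  -> Buy = Yes
    >40       -> Credit_rating
                     Fair      -> Buy = Yes
                     Excellent -> Buy = No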

Rules set of this tree:

1. If age <=30 and student = no then buys_computer = no
2. If age <=30 and student = yes then buys_computer = yes
3. If age = 31 to 40 then buys_computer = yes
4. If age >40 and credit rating = excellent then buys_computer = no
5. If age >40 and credit rating = fair then buys_computer = yes
