
Data Mining

Decision Tree
Name: Hafiz Muhammad Behzad
Roll no: 17271519-027
-------------------------------------------------------------------------------------------------------------------------------

Q1. Create a decision tree using Information Gain, Gain Ratio and Gini Index for the following
data set. Also define the rules set of each decision tree.

Information Gain:
1. Calculate entropy of the target

entropy (Buys_computer) = Entropy (9, 5) = -(9/14) log2 (9/14) - (5/14) log2 (5/14) = 0.94
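As a quick check, the entropy computation can be sketched in Python (a minimal helper, not part of the original answer):

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a node from its (yes, no) class counts."""
    total = pos + neg
    # skip empty classes: 0 * log2(0) is taken as 0
    return -sum(c / total * log2(c / total) for c in (pos, neg) if c)

print(round(entropy(9, 5), 2))  # 0.94
```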

2. Calculate the information gain of each attribute:

entropy (Age, Buys_computer) = P (<=30) * Entropy (2, 3) + P (31...40) * Entropy (4, 0) + P (>40) * Entropy (3, 2)

= 5/14 * 0.97 + 4/14 * 0 + 5/14 * 0.97 = 0.345 + 0 + 0.345 = 0.69

Gain (Age, Buys_computer) = Entropy (Buys_computer) - Entropy (Age, Buys_computer)

= 0.94 – 0.69 = 0.25

entropy (Income, Buys_computer) = P (high) * Entropy (2, 2) + P (medium) * Entropy (4, 2)

+ P (low) * Entropy (3, 1)

= 4/14 * 1 + 6/14 * 0.92 + 4/14 * 0.81 = 0.29 + 0.39 + 0.23 = 0.91

Gain (Income, Buys_computer) = Entropy (Buys_computer) - Entropy (Income, Buys_computer)

= 0.94 – 0.91 = 0.03

entropy (Student, Buys_computer) = P (no) * Entropy (3, 4) + P (yes) * Entropy (6, 1)

= 7/14 * 0.99 + 7/14 * 0.59 = 0.79

Gain (Student, Buys_computer) = Entropy (Buys_computer) - Entropy (Student, Buys_computer)

= 0.94 – 0.79 = 0.15

entropy (Credit_rating, Buys_computer) = P (fair) * Entropy (6, 2) + P (excellent) * Entropy (3, 3)

= 8/14 * 0.81 + 6/14 * 1 = 0.89

Gain (Credit_rating, Buys_computer) = Entropy (Buys_computer) - Entropy (Credit_rating, Buys_computer)

= 0.94 – 0.89 = 0.05

So, the largest gain attribute is Age.

Age is selected as the root node, with children <=30, 31...40 (-> buy = yes), and >40.
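The per-attribute gains above can be reproduced with a short sketch; the (yes, no) count pairs below are the tallies already used in the calculations (illustrative code, not part of the original answer):

```python
from math import log2

def entropy(pos, neg):
    total = pos + neg
    return -sum(c / total * log2(c / total) for c in (pos, neg) if c)

def info_gain(target, partitions):
    # gain = entropy of the target minus the size-weighted entropy of the split
    n = sum(target)
    remainder = sum((p + q) / n * entropy(p, q) for p, q in partitions)
    return entropy(*target) - remainder

target = (9, 5)  # buys_computer: 9 yes, 5 no
splits = {
    "Age":           [(2, 3), (4, 0), (3, 2)],  # <=30, 31...40, >40
    "Income":        [(2, 2), (4, 2), (3, 1)],  # high, medium, low
    "Student":       [(3, 4), (6, 1)],          # no, yes
    "Credit_rating": [(6, 2), (3, 3)],          # fair, excellent
}
for name, parts in splits.items():
    print(name, round(info_gain(target, parts), 3))
# Age comes out largest, matching the choice of root above.
```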

1. Calculate entropy of the target (when age <= 30)

entropy (Buys_computer) = Entropy (2, 3) = 0.97

2. Calculate the information gain of each attribute:

entropy (Income, Buys_computer) = P (high) * Entropy (0, 2) + P (medium) * Entropy (1, 1)

+ P (low) * Entropy (1, 0)

= 2/5 * 0 + 2/5 * 1 + 1/5 * 0 = 0.4

Gain (Income, Buys_computer) = Entropy (Buys_computer) - Entropy (Income, Buys_computer)

= 0.97 – 0.4 = 0.57

entropy (Student, Buys_computer) = P (no) * Entropy (0, 3) + P (yes) * Entropy (2, 0) = 0

Gain (Student, Buys_computer) = Entropy (Buys_computer) - Entropy (Student, Buys_computer)

= 0.97 – 0 = 0.97

entropy (Credit_rating, Buys_computer) = P (fair) * Entropy (1, 2) + P (excellent) * Entropy (1, 1)

= 3/5 * 0.92 + 2/5 * 1 = 0.95

Gain (Credit_rating, Buys_computer) = Entropy (Buys_computer) - Entropy (Credit_rating, Buys_computer)

= 0.97 – 0.95 = 0.02

So, the largest gain attribute is Student.

Student is selected as the splitting attribute for the age <= 30 branch.
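The same gain computation, run inside the age <= 30 branch, confirms the choice (a sketch; the count pairs are the branch tallies above):

```python
from math import log2

def entropy(pos, neg):
    total = pos + neg
    return -sum(c / total * log2(c / total) for c in (pos, neg) if c)

def info_gain(target, partitions):
    # gain = target entropy minus the size-weighted entropy of the partition
    n = sum(target)
    return entropy(*target) - sum((p + q) / n * entropy(p, q) for p, q in partitions)

# within age <= 30 the target counts are (2 yes, 3 no)
branch = (2, 3)
print(round(info_gain(branch, [(0, 2), (1, 1), (1, 0)]), 2))  # Income
print(round(info_gain(branch, [(0, 3), (2, 0)]), 2))          # Student (both children pure)
print(round(info_gain(branch, [(1, 2), (1, 1)]), 2))          # Credit_rating
```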

1. Calculate entropy of the target (when age > 40)

entropy (Buys_computer) = Entropy (3, 2) = 0.97

2. Calculate the information gain of each attribute:

entropy (Income, Buys_computer) = P (medium) * Entropy (2, 1) + P (low) * Entropy (1, 1)

= 3/5 * 0.92 + 2/5 * 1 = 0.95

Gain (Income, Buys_computer) = Entropy (Buys_computer) - Entropy (Income, Buys_computer)

= 0.97 – 0.95 = 0.02

entropy (Student, Buys_computer) = P (no) * Entropy (1, 1) + P (yes) * Entropy (2, 1)

= 2/5 * 1 + 3/5 * 0.92 = 0.95

Gain (Student, Buys_computer) = Entropy (Buys_computer) - Entropy (Student, Buys_computer)

= 0.97 – 0.95 = 0.02

entropy (Credit_rating, Buys_computer) = P (fair) * Entropy (3, 0) + P (excellent) * Entropy (0, 2)

= 3/5 * 0 + 2/5 * 0 = 0

Gain (Credit_rating, Buys_computer) = Entropy (Buys_computer) - Entropy (Credit_rating, Buys_computer)

= 0.97 – 0 = 0.97

So, the largest gain attribute is Credit_rating.

Credit_rating is selected as the splitting attribute for the age > 40 branch.

Rules set:

1. If age <= 30 and student = no then buys_computer = no
2. If age <= 30 and student = yes then buys_computer = yes
3. If age = 31...40 then buys_computer = yes
4. If age > 40 and credit_rating = fair then buys_computer = yes
5. If age > 40 and credit_rating = excellent then buys_computer = no
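The rule set reads off as nested conditionals; a minimal sketch (the function name and string encodings are illustrative, following the pure branches derived above: for age > 40, fair is all yes and excellent is all no):

```python
def buys_computer(age, student, credit_rating):
    """Classify one record; age is '<=30', '31...40', or '>40'."""
    if age == "<=30":
        # rules 1 and 2: the Student split
        return "yes" if student == "yes" else "no"
    if age == "31...40":
        return "yes"  # rule 3: this branch is pure
    # age > 40: the Credit_rating split (fair pure yes, excellent pure no)
    return "yes" if credit_rating == "fair" else "no"

print(buys_computer("<=30", "yes", "fair"))  # yes
```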

Gain Ratio:
1. Calculate the gain ratio of each attribute (the gains are those from the information gain section, carried to three decimals because Age and Student are close):

SplitInfo_age (<=30, 31...40, >40) = -(5/14) log2 (5/14) - (4/14) log2 (4/14) - (5/14) log2 (5/14) = 1.58

Gain Ratio (Age) = Gain (Age) / SplitInfo_age = 0.246 / 1.58 = 0.156

SplitInfo_income (high, medium, low) = -(4/14) log2 (4/14) - (6/14) log2 (6/14) - (4/14) log2 (4/14) = 1.56

Gain Ratio (Income) = Gain (Income) / SplitInfo_income = 0.03 / 1.56 = 0.019

SplitInfo_student (yes, no) = -(7/14) log2 (7/14) - (7/14) log2 (7/14) = 1

Gain Ratio (Student) = Gain (Student) / SplitInfo_student = 0.151 / 1 = 0.151

SplitInfo_credit_rating (fair, excellent) = -(8/14) log2 (8/14) - (6/14) log2 (6/14) = 0.99

Gain Ratio (Credit_rating) = Gain (Credit_rating) / SplitInfo_credit_rating = 0.05 / 0.99 = 0.05

So, the largest gain ratio attribute is Age (0.156, just ahead of Student at 0.151).

Age is selected as the root node, the same choice information gain made.

2. Recurse into each Age branch with the same procedure:

For age <= 30 (target Entropy (2, 3) = 0.97): Gain (Student) = 0.97 and SplitInfo_student = -(3/5) log2 (3/5) - (2/5) log2 (2/5) = 0.97, so Gain Ratio (Student) = 0.97 / 0.97 = 1, the largest; Student is selected.

For age > 40 (target Entropy (3, 2) = 0.97): Gain (Credit_rating) = 0.97 and SplitInfo_credit_rating = -(3/5) log2 (3/5) - (2/5) log2 (2/5) = 0.97, so Gain Ratio (Credit_rating) = 0.97 / 0.97 = 1, the largest; Credit_rating is selected.

The gain ratio tree is therefore identical to the information gain tree:

Age
|-- <=30 -> Student
|     |-- yes -> buy = yes
|     |-- no -> buy = no
|-- 31...40 -> buy = yes
|-- >40 -> Credit_rating
      |-- fair -> buy = yes
      |-- excellent -> buy = no

Rules set: the same five rules as for information gain.
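SplitInfo and gain ratio can be checked the same way (a sketch; the gains fed in are the information-gain values carried to three decimals, and the size lists are the branch sizes per attribute):

```python
from math import log2

def split_info(sizes):
    # entropy of the branch sizes themselves, independent of class labels
    n = sum(sizes)
    return -sum(s / n * log2(s / n) for s in sizes if s)

def gain_ratio(gain, sizes):
    return gain / split_info(sizes)

print(round(split_info([5, 4, 5]), 2))         # Age: 1.58
print(round(gain_ratio(0.246, [5, 4, 5]), 3))  # Age
print(round(gain_ratio(0.029, [4, 6, 4]), 3))  # Income
print(round(gain_ratio(0.151, [7, 7]), 3))     # Student
print(round(gain_ratio(0.048, [8, 6]), 3))     # Credit_rating
```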
