Decision Tree Example

ID3 (Iterative Dichotomiser 3) Decision Tree Classification Algorithm:
This decision tree algorithm uses information gain as the attribute selection measure.
Example:

Attribute: Outlook
Values(Outlook) = Sunny, Overcast, Rain
For the entire dataset S = (9+, 5-), which means 9 Yes and 5 No:
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.94
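
As a quick sanity check, the dataset entropy above can be reproduced with a few lines of Python. This is only an illustrative sketch of the formula, not part of the original worked example; the helper name entropy is my own.

import math

def entropy(pos, neg):
    # Entropy of a set with `pos` positive (Yes) and `neg` negative (No) examples
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # by convention, 0 * log2(0) is treated as 0
            p = count / total
            result -= p * math.log2(p)
    return result

print(round(entropy(9, 5), 4))  # 0.9403, i.e. approximately 0.94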

For Sunny, (2+, 3-), which means 2 Yes and 3 No:
Entropy(S_Sunny) = -(2/5) log2(2/5) - (3/5) log2(3/5) = 0.971

For Overcast, (4+, 0-), which means 4 Yes and 0 No:
Entropy(S_Overcast) = -(4/4) log2(4/4) - (0/4) log2(0/4) = 0   (taking 0 log2 0 = 0)

For Rain, (3+, 2-), which means 3 Yes and 2 No:
Entropy(S_Rain) = -(3/5) log2(3/5) - (2/5) log2(2/5) = 0.971

Gain(S, Outlook) = Entropy(S) - (5/14) Entropy(S_Sunny) - (4/14) Entropy(S_Overcast) - (5/14) Entropy(S_Rain)
Gain(S, Outlook) = 0.94 - (5/14)(0.971) - (4/14)(0) - (5/14)(0.971) = 0.2464
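
Continuing the sketch, the information gain of Outlook can be verified from the per-value counts listed above. The helper names are my own; the small difference from 0.2464 comes only from rounding the hand-computed entropies.

import math

def entropy(pos, neg):
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c)

def information_gain(total_counts, value_counts):
    # Gain(S, A) = Entropy(S) - sum over values v of |S_v|/|S| * Entropy(S_v)
    total = sum(total_counts)
    gain = entropy(*total_counts)
    for pos, neg in value_counts:
        gain -= (pos + neg) / total * entropy(pos, neg)
    return gain

# Outlook: Sunny (2+, 3-), Overcast (4+, 0-), Rain (3+, 2-)
print(round(information_gain((9, 5), [(2, 3), (4, 0), (3, 2)]), 4))  # ~0.2467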

Attribute: Temp
Values(Temp) = Hot, Mild, Cool
For the entire dataset S = (9+, 5-), which means 9 Yes and 5 No:
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.94

For Hot, (2+, 2-), which means 2 Yes and 2 No:
Entropy(S_Hot) = -(2/4) log2(2/4) - (2/4) log2(2/4) = 1.0

For Mild, (4+, 2-), which means 4 Yes and 2 No:
Entropy(S_Mild) = -(4/6) log2(4/6) - (2/6) log2(2/6) = 0.9183

For Cool, (3+, 1-), which means 3 Yes and 1 No:
Entropy(S_Cool) = -(3/4) log2(3/4) - (1/4) log2(1/4) = 0.8113

Gain(S, Temp) = Entropy(S) - (4/14) Entropy(S_Hot) - (6/14) Entropy(S_Mild) - (4/14) Entropy(S_Cool)
Gain(S, Temp) = 0.94 - (4/14)(1.0) - (6/14)(0.9183) - (4/14)(0.8113) = 0.0289

Attribute: Humidity
Values(Humidity) = High, Normal
For the entire dataset S = (9+, 5-), which means 9 Yes and 5 No:
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.94

For High, (3+, 4-), which means 3 Yes and 4 No:
Entropy(S_High) = -(3/7) log2(3/7) - (4/7) log2(4/7) = 0.9852

For Normal, (6+, 1-), which means 6 Yes and 1 No:
Entropy(S_Normal) = -(6/7) log2(6/7) - (1/7) log2(1/7) = 0.5916

Gain(S, Humidity) = Entropy(S) - (7/14) Entropy(S_High) - (7/14) Entropy(S_Normal)
Gain(S, Humidity) = 0.94 - (7/14)(0.9852) - (7/14)(0.5916) = 0.1516

Attribute: Wind
Values(Wind) = Strong, Weak
For the entire dataset S = (9+, 5-), which means 9 Yes and 5 No:
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.94

For Strong, (3+, 3-), which means 3 Yes and 3 No:
Entropy(S_Strong) = -(3/6) log2(3/6) - (3/6) log2(3/6) = 1.0

For Weak, (6+, 2-), which means 6 Yes and 2 No:
Entropy(S_Weak) = -(6/8) log2(6/8) - (2/8) log2(2/8) = 0.8113

Gain(S, Wind) = Entropy(S) - (6/14) Entropy(S_Strong) - (8/14) Entropy(S_Weak)
Gain(S, Wind) = 0.94 - (6/14)(1.0) - (8/14)(0.8113) = 0.0478
Information gain of all attributes with respect to the target attribute PlayTennis:

Gain(S, Outlook) = 0.2464
Gain(S, Temp) = 0.0289
Gain(S, Humidity) = 0.1516
Gain(S, Wind) = 0.0478
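
Putting the four calculations together, the short Python sketch below (again, an illustration rather than part of the original notes) recomputes every gain from the counts in the tables above and confirms that Outlook has the largest one. Small differences in the last digit come from rounding in the hand calculation.

import math

def entropy(pos, neg):
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c)

def gain(value_counts, total=(9, 5)):
    n = sum(total)
    return entropy(*total) - sum((p + q) / n * entropy(p, q) for p, q in value_counts)

# (Yes, No) counts for each attribute value, taken from the tables above
gains = {
    "Outlook":  gain([(2, 3), (4, 0), (3, 2)]),  # Sunny, Overcast, Rain
    "Temp":     gain([(2, 2), (4, 2), (3, 1)]),  # Hot, Mild, Cool
    "Humidity": gain([(3, 4), (6, 1)]),          # High, Normal
    "Wind":     gain([(3, 3), (6, 2)]),          # Strong, Weak
}
for name, value in gains.items():
    print(f"{name}: {value:.4f}")
print("Best attribute:", max(gains, key=gains.get))  # Outlook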
The attribute Outlook is selected as the root node because it has the highest information gain.
Using Outlook as the root node, we obtain the partial decision tree shown below.

For the Overcast branch, all samples are Yes, so Yes becomes the leaf node of that branch. However, the Sunny and Rain branches contain both Yes and No samples.
Hence we repeat the process for the Sunny and Rain subsets (a quick check for the Sunny subset is sketched below).
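
To see why Humidity wins on the Sunny branch, the same gain computation can be repeated on just the five Sunny examples. A minimal sketch follows; the five rows are the Sunny examples of the standard 14-instance play-tennis dataset, which is assumed here because the full table is not reproduced in these notes.

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, labels, attr_index):
    # Information gain of the attribute at position attr_index within this subset
    result = entropy(labels)
    for value in set(row[attr_index] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr_index] == value]
        result -= len(subset) / len(labels) * entropy(subset)
    return result

# Sunny subset: (Temp, Humidity, Wind) -> PlayTennis
sunny_rows = [("Hot", "High", "Weak"), ("Hot", "High", "Strong"),
              ("Mild", "High", "Weak"), ("Cool", "Normal", "Weak"),
              ("Mild", "Normal", "Strong")]
sunny_labels = ["No", "No", "No", "Yes", "Yes"]

for i, name in enumerate(["Temp", "Humidity", "Wind"]):
    print(name, round(gain(sunny_rows, sunny_labels, i), 4))
# Humidity has the highest gain (0.971), so it is chosen for the Sunny branch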
For the Sunny branch, Humidity is selected as the next best attribute because it has the highest information gain.
Hence the resultant decision tree is

Similarly, we repeat the process for the Rain branch, where Wind turns out to be the best splitting attribute.
The final decision tree is shown below.
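
For comparison, a similar tree can be grown with scikit-learn using entropy as the splitting criterion. The 14-row table below is the standard play-tennis dataset assumed throughout this example (it is not reproduced in the notes above), and the categorical features are one-hot encoded because scikit-learn trees require numeric inputs.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Standard 14-instance play-tennis dataset (assumed)
data = pd.DataFrame({
    "Outlook":  ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
                 "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"],
    "Temp":     ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
                 "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],
    "Humidity": ["High", "High", "High", "High", "Normal", "Normal", "Normal",
                 "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],
    "Wind":     ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
                 "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"],
    "Play":     ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                 "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

X = pd.get_dummies(data.drop(columns="Play"))  # one-hot encode the categorical features
clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, data["Play"])
print(export_text(clf, feature_names=list(X.columns)))

Because of the one-hot encoding, the printed tree makes binary splits on indicator columns (for example Outlook_Overcast) rather than one multiway split per attribute, so its shape differs slightly from the hand-built tree even though it classifies the same data.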

CART (Classification and Regression Tree) Decision Tree Classification Algorithm:
This algorithm uses the Gini index as the attribute selection measure.
Example:

Step 1: Calculate the Gini index of the dataset. The target attribute Job Offer has 7 instances of Yes and 3 instances of No.
Gini(S) = 1 - (7/10)^2 - (3/10)^2 = 1 - 0.49 - 0.09 = 0.42
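
A minimal Python sketch of this first step, using the 7 Yes / 3 No counts from the example (the helper name gini is my own):

def gini(counts):
    # Gini index: 1 minus the sum of squared class proportions
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(round(gini([7, 3]), 2))  # 1 - 0.49 - 0.09 = 0.42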

Step 2: Calculate the Gini index of each attribute and of each candidate subset (split) within that attribute.
Repeat the process for the remaining attributes, namely Interactiveness, Practical Knowledge and Communication Skills.
Similarly, calculate the Gini index for all the attributes (the generic split computation is sketched below).
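
The per-split calculation is the same for every attribute: compute the Gini index of each subset produced by the split and weight it by the subset's share of the records. The sketch below shows this generic computation; the (Yes, No) counts in the example call are hypothetical placeholders, since the full training table is not reproduced in these notes.

def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def gini_of_split(subset_class_counts):
    # Weighted Gini index of a split; one (yes, no) tuple per subset
    n = sum(sum(counts) for counts in subset_class_counts)
    return sum(sum(counts) / n * gini(counts) for counts in subset_class_counts)

# Hypothetical split sending (6 Yes, 1 No) to one branch and (1 Yes, 2 No) to the other
print(round(gini_of_split([(6, 1), (1, 2)]), 4))  # ~0.3048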

The subset split CGPA = {(>=9, >=8), <8} is the best splitting subset, with a low Gini index of 0.1755. Since CGPA and Communication Skills both have a low Gini index, we can take either attribute as the root node. Here we take CGPA as the root node.

Repeat the process for the CGPA branch {>=9, >=8}, which contains both Yes and No instances of Job Offer.
The Gini index of each remaining attribute within this CGPA = {>=9, >=8} subset is calculated as follows:

Here the Communication Skills attribute is selected as the next best splitting attribute because it has the lowest Gini index (equivalently, the largest reduction in Gini impurity). Hence the final decision tree is as shown below.
