Decision Tree

The document describes the steps to build a decision tree classifier to predict whether to play tennis based on weather data. It calculates the information gain of attributes and selects the one with the highest gain as the root node, then recursively calculates the gain of child nodes until reaching leaf nodes where a yes/no prediction is made.


Day Outlook Temperature Humidity Wind Play Tennis

1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No

I (p, n) = - (p / (p + n)) log2( p / (p + n) ) - (n / (p + n)) log2( n / (p + n) )

Entropy (A) = Σ [ (pi + ni) / (p + n) ] × I(pi, ni), summed over the values of attribute A

Gain (A) = I(p, n) – Entropy (A)

Here p and n are the counts of Yes and No examples in the current data, so I(p, n) is the information of the class label, and pi, ni are the Yes/No counts within the i-th value of attribute A.
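These quantities are easy to check numerically. Below is a minimal sketch in Python; the function names info, attribute_entropy, and gain are illustrative choices, not part of the original worked example.

import math

def info(p, n):
    """I(p, n): information of a set with p Yes and n No examples."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:
            fraction = count / total
            result -= fraction * math.log2(fraction)
    return result

def attribute_entropy(splits, p, n):
    """Entropy(A): weighted information over the subsets produced by attribute A.
    splits is a list of (pi, ni) pairs, one pair per value of A."""
    return sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in splits)

def gain(splits, p, n):
    """Gain(A) = I(p, n) - Entropy(A)."""
    return info(p, n) - attribute_entropy(splits, p, n)

For example, gain([(2, 3), (4, 0), (3, 2)], 9, 5) reproduces the Gain of Outlook computed in step 2 below (about 0.247).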


Steps –
1. Compute the information of the entire data on the class label (Play Tennis).
a. Count the number of YES (p) and NO (n): p = 9, n = 5.
b. Substitute these values in the formula:
I(9, 5) = - (9/14) log2(9/14) - (5/14) log2(5/14)
i. Calculating log2 on a scientific calculator: 9/14 = 0.643 and log2(0.643) = log(0.643) / log(2) = -0.637; likewise 5/14 = 0.357 and log2(0.357) = -1.485.
So I(9, 5) = (0.643)(0.637) + (0.357)(1.485) ≈ 0.940.
The information of the class label is 0.940.
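As a quick check, the info helper sketched above reproduces this value:

print(round(info(9, 5), 3))   # prints 0.94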

2. Compute the Gain of the attribute Outlook.

a. Outlook has three categories – Sunny, Overcast, and Rain. For each of these, compute I(pi, ni) using the formula.
Category Yes No I(pi, ni)
Sunny 2 3 0.970
Overcast 4 0 0
Rain 3 2 0.970

b. Compute the Entropy of Outlook using the formula.

Entropy (Outlook) = (5 / 14) (0.970) + (4 / 14) (0) + (5 / 14) (0.970) = 0.693
c. Compute the Gain on Outlook using the formula.
Gain (Outlook) = Information of the class label – Entropy of Outlook
Gain (Outlook) = 0.940 – 0.693 = 0.247
3. Similarly, compute the Gain on all the other attributes – Temperature, Humidity, and Wind.
Mark the attribute with the highest gain as the ROOT NODE.
Attribute Gain
Outlook 0.247
Temperature 0.029
Humidity 0.151
Wind 0.048
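Steps 2 and 3 can be automated. The sketch below continues the earlier helpers; the data layout and the splits_for name are my own choices for illustration.

from collections import Counter, defaultdict

# The 14 training examples as (Outlook, Temperature, Humidity, Wind, PlayTennis).
data = [
    ("Sunny","Hot","High","Weak","No"),        ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"),    ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Normal","Weak","Yes"),     ("Rain","Cool","Normal","Strong","No"),
    ("Overcast","Cool","Normal","Strong","Yes"), ("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Normal","Weak","Yes"),    ("Rain","Mild","Normal","Weak","Yes"),
    ("Sunny","Mild","Normal","Strong","Yes"),  ("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Normal","Weak","Yes"),  ("Rain","Mild","High","Strong","No"),
]
ATTRS = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Wind": 3}

def splits_for(rows, attr):
    """Return the (pi, ni) pairs for each value of attr, plus the overall p and n."""
    per_value = defaultdict(Counter)
    for row in rows:
        per_value[row[ATTRS[attr]]][row[-1]] += 1
    splits = [(c["Yes"], c["No"]) for c in per_value.values()]
    p = sum(pi for pi, _ in splits)
    n = sum(ni for _, ni in splits)
    return splits, p, n

# Score every attribute on the full data; Outlook comes out highest (~0.247),
# so it becomes the root node.
for attr in ATTRS:
    splits, p, n = splits_for(data, attr)
    print(attr, round(gain(splits, p, n), 3))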

Calculating the Next Node


4. For Outlook = Sunny, calculate the Gain on the attributes – Temperature, Humidity & Wind. The Sunny subset has p = 2, n = 3, so I(2, 3) = 0.970.
Day Outlook Temperature Humidity Wind Play Tennis
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
11 Sunny Mild Normal Strong Yes

a. Compute the Gain on Temperature. p = 2, n = 3.

Temperature pi ni I(pi, ni)
Hot 0 2 0
Mild 1 1 1
Cool 1 0 0

Entropy of Temperature = (2/5) (0) + (2/5) (1) + (1/5) (0) = 0.4
Gain (Sunny, Temperature) = 0.970 – 0.4 = 0.570

b. Compute the Gain on Humidity. p = 2, n = 3.

Humidity pi ni I(pi, ni)
High 0 3 0
Normal 2 0 0

Entropy of Humidity = (3/5) (0) + (2/5) (0) = 0
Gain (Sunny, Humidity) = 0.970 – 0 = 0.970

c. Compute the Gain on Wind. p = 2, n = 3.

Wind pi ni I(pi, ni)
Weak 1 2 0.918
Strong 1 1 1

Entropy of Wind = (3/5) (0.918) + (2/5) (1) = 0.951
Gain (Sunny, Wind) = 0.970 – 0.951 = 0.019

5. Gain on Temperature, Humidity & Wind with respect to Sunny.

Attribute Gain
Temperature 0.570
Humidity 0.970
Wind 0.019

a. Select the attribute with the highest gain as the next node. In this case it is Humidity.
b. Humidity has two categories – High & Normal – so the next nodes under Humidity are High & Normal.
c. Mark the Yes / No decision under each node – High & Normal – by referring to the data. (A code sketch of this step follows below.)
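The same scoring, restricted to the Sunny subset, can be sketched by reusing data, splits_for, and gain from the earlier sketch.

# Restrict to the rows where Outlook is Sunny, then score the remaining
# attributes inside that subset.
sunny_rows = [row for row in data if row[ATTRS["Outlook"]] == "Sunny"]
for attr in ("Temperature", "Humidity", "Wind"):
    splits, p, n = splits_for(sunny_rows, attr)
    print(attr, round(gain(splits, p, n), 3))
# Humidity scores highest (≈ 0.97), so it becomes the node under the Sunny branch.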

6. Calculating Nodes for Overcast.


a. Observe the data for Overcast: every row has the label Yes. Hence we simply mark the decision as Yes under Overcast.
Day Outlook Temperature Humidity Wind Play Tennis
3 Overcast Hot High Weak Yes
7 Overcast Cool Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes

7. Calculating Nodes for Rain. The Rain subset has p = 3, n = 2, so I(3, 2) = 0.970.


Day Outlook Temperature Humidity Wind Play Tennis
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
10 Rain Mild Normal Weak Yes
14 Rain Mild High Strong No

a. Compute Gain on Wind with respect to Rain.


Wind pi ni I(pi, ni)
Weak 3 0 0
Strong 0 2 0
Entropy for Wind: 0
Gain (Rain, Wind) = 0.970 – 0
Gain (Rain, Wind) = 0.970

b. Compute Gain on Temperature with respect to Rain.


Temperature pi ni I(pi, ni)
Hot 0 0 0
Mild 2 1 0.918
Cool 1 1 1
Entropy of Temperature = (0/5) (0) + (3/5) (0.918) + (2/5) (1) = 0.951
Gain (Rain, Temperature) = 0.970 – 0.951 = 0.019
Gain (Rain, Temperature) = 0.019
8. Compare the Gain on Wind & Temperature with respect to Rain, and select the attribute with the highest Gain as the next node under Rain. In this case it is Wind.
9. Under the Wind node, mark the two categories – Strong and Weak – as the next nodes.
10. Under the Strong & Weak nodes, mark Yes / No as the decision by observing the data. (A code sketch for the Rain branch follows below.)
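The Rain branch is the same recursive step on its own subset, again reusing the earlier sketch.

# Restrict to the rows where Outlook is Rain and compare the two attributes
# considered in steps 7 and 8.
rain_rows = [row for row in data if row[ATTRS["Outlook"]] == "Rain"]
for attr in ("Temperature", "Wind"):
    splits, p, n = splits_for(rain_rows, attr)
    print(attr, round(gain(splits, p, n), 3))
# Wind scores highest (≈ 0.97), so it becomes the node under the Rain branch.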

Outlook
├── Sunny → Humidity
│   ├── High → No
│   └── Normal → Yes
├── Overcast → Yes
└── Rain → Wind
    ├── Strong → No
    └── Weak → Yes
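The finished tree can be transcribed as a nested structure and used for prediction. A minimal sketch follows; the dict encoding and the predict name are my own choices, not part of the original document.

# The finished tree as a nested dict (a hand-written transcription of the
# diagram above).
tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def predict(node, example):
    """Walk the tree until a Yes/No leaf is reached. example maps attribute -> value."""
    while isinstance(node, dict):
        attribute = next(iter(node))           # the attribute tested at this node
        node = node[attribute][example[attribute]]
    return node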

Test Data

Day Outlook Temperature Humidity Wind Play Tennis


15 Sunny Hot High Weak
16 Sunny Mild Normal Weak
17 Overcast Hot High Weak
18 Rain Mild Normal Weak
19 Rain Mild High Strong
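Applying the predict sketch above to the five test days fills in the missing Play Tennis column; the values follow directly from the tree.

# The test rows from the table above, as dicts keyed by attribute name.
test_rows = [
    {"Day": 15, "Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak"},
    {"Day": 16, "Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "Normal", "Wind": "Weak"},
    {"Day": 17, "Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak"},
    {"Day": 18, "Outlook": "Rain",     "Temperature": "Mild", "Humidity": "Normal", "Wind": "Weak"},
    {"Day": 19, "Outlook": "Rain",     "Temperature": "Mild", "Humidity": "High",   "Wind": "Strong"},
]
for row in test_rows:
    print(row["Day"], predict(tree, row))
# Following the tree: day 15 -> No, 16 -> Yes, 17 -> Yes, 18 -> Yes, 19 -> No.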
