
Machine Learning Decision Tree – Solved Problem (ID3 algorithm)

Competition Description
Your goal is to predict, from next week's weather forecast, when people will play outside. You find that whether people
decide to play depends on the weather. The following table is the decision table for whether it is suitable for playing
outside.

Data Description

(The id column can be taken as S.No. or Day, so there are 14 days: D1, D2, D3, ..., D14.)

id outlook temperature humidity wind play

1 sunny hot high weak no
2 sunny hot high strong no
3 overcast hot high weak yes
4 rainy mild high weak yes
5 rainy cool normal weak yes
6 rainy cool normal strong no
7 overcast cool normal strong yes
8 sunny mild high weak no
9 sunny cool normal weak yes
10 rainy mild normal weak yes
11 sunny mild normal strong yes
12 overcast mild high strong yes
13 overcast hot normal weak yes
14 rainy mild high strong no

Course Design
Choose your own way and programming language to implement the decision tree algorithm (with code comments or notes).
Divide the data in the Data Description into a training set and a test set to get your answer.
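As a minimal starting point for this course requirement, the table can be stored as a list of dictionaries and divided into a training set and a test set. This is only a sketch: Python is my own choice of language, and the 70/30 split ratio and the random seed are arbitrary assumptions, not part of the problem. Note that the worked solution that follows uses all 14 rows for its entropy calculations.

import random

# The 14-row "play outside" table from the Data Description, one dictionary per day.
data = [
    {"id": 1,  "outlook": "sunny",    "temperature": "hot",  "humidity": "high",   "wind": "weak",   "play": "no"},
    {"id": 2,  "outlook": "sunny",    "temperature": "hot",  "humidity": "high",   "wind": "strong", "play": "no"},
    {"id": 3,  "outlook": "overcast", "temperature": "hot",  "humidity": "high",   "wind": "weak",   "play": "yes"},
    {"id": 4,  "outlook": "rainy",    "temperature": "mild", "humidity": "high",   "wind": "weak",   "play": "yes"},
    {"id": 5,  "outlook": "rainy",    "temperature": "cool", "humidity": "normal", "wind": "weak",   "play": "yes"},
    {"id": 6,  "outlook": "rainy",    "temperature": "cool", "humidity": "normal", "wind": "strong", "play": "no"},
    {"id": 7,  "outlook": "overcast", "temperature": "cool", "humidity": "normal", "wind": "strong", "play": "yes"},
    {"id": 8,  "outlook": "sunny",    "temperature": "mild", "humidity": "high",   "wind": "weak",   "play": "no"},
    {"id": 9,  "outlook": "sunny",    "temperature": "cool", "humidity": "normal", "wind": "weak",   "play": "yes"},
    {"id": 10, "outlook": "rainy",    "temperature": "mild", "humidity": "normal", "wind": "weak",   "play": "yes"},
    {"id": 11, "outlook": "sunny",    "temperature": "mild", "humidity": "normal", "wind": "strong", "play": "yes"},
    {"id": 12, "outlook": "overcast", "temperature": "mild", "humidity": "high",   "wind": "strong", "play": "yes"},
    {"id": 13, "outlook": "overcast", "temperature": "hot",  "humidity": "normal", "wind": "weak",   "play": "yes"},
    {"id": 14, "outlook": "rainy",    "temperature": "mild", "humidity": "high",   "wind": "strong", "play": "no"},
]

# Simple hold-out split: roughly 70% training, 30% test (an arbitrary choice).
random.seed(42)
shuffled = data[:]                        # copy so the original order is preserved
random.shuffle(shuffled)
split_point = int(0.7 * len(shuffled))    # 9 training rows, 5 test rows
train_set, test_set = shuffled[:split_point], shuffled[split_point:]
print(len(train_set), "training rows,", len(test_set), "test rows")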

Solution: I have followed the ID3 (Iterative Dichotomiser 3) algorithm.

We need to construct the decision tree to predict whether people will play outside or not.

The following dataset is given in the form of a table:

id outlook temperature humidity wind play


1 sunny hot high weak no
2 sunny hot high strong no
3 overcast hot high weak yes
4 rainy mild high weak yes
5 rainy cool normal weak yes
6 rainy cool normal strong no
7 overcast cool normal strong yes
8 sunny mild high weak no
9 sunny cool normal weak yes
10 rainy mild normal weak yes
11 sunny mild normal strong yes
12 overcast mild high strong yes
13 overcast hot normal weak yes
14 rainy mild high strong no

Step 1: Compute Entropy (H) for the entire Dataset

Number of samples = 14
Number of attributes = 4 (Outlook, Temp, Humidity, Wind)
Output variable = Play
Number of distinct outputs = 2 (Yes, No)

➢ Out of 14 samples, 9 samples belong to the "Yes" category
➢ Out of 14 samples, 5 samples belong to the "No" category

So, Number of "Yes" = 9
    Number of "No" = 5

Now the total entropy of the given dataset is H = Σ p(xᵢ) log₂(1/p(xᵢ)) = −Σ p(xᵢ) log₂ p(xᵢ), with the sum taken over i = 1, ..., L.

Here L = the number of symbols at the output of the DMS (Discrete Memoryless Source). Note that the output variable here
takes only two values (Yes/No), so it behaves like a binary source.

A discrete information source is a source that has only a finite set of symbols as possible outputs, i.e. it consists of a
discrete (countable) set of letters or symbols.

Let X have the alphabet {x1, x2, ..., xm}. Note that the set of source symbols is called the source alphabet.

A binary source is described by a list of 2 symbols and a probability assignment to these symbols.

Binary source outputs: x₁ = Yes, x₂ = No

Total Entropy H = Σ p(xᵢ) log₂(1/p(xᵢ)) = −Σ p(xᵢ) log₂ p(xᵢ)

p(x₁) = (No. of samples favourable to Yes) / (Total samples) = 9/14

p(x₂) = (No. of samples favourable to No) / (Total samples) = 5/14

∴ H = −{p(x₁) log₂ p(x₁) + p(x₂) log₂ p(x₂)}

    = −{(9/14) log₂(9/14) + (5/14) log₂(5/14)}

    = −{0.642857 log₂ 0.642857 + 0.357143 log₂ 0.357143}

    = −{0.642857 × (−0.637430) + 0.357143 × (−1.485427)}

    = −{−0.409776 − 0.530510}

    = 0.94028

log₂ 0.642857 = log₁₀ 0.642857 / log₁₀ 2 = −0.191885 / 0.301030 = −0.637430

log₂ 0.357143 = log₁₀ 0.357143 / log₁₀ 2 = −0.447158 / 0.301030 = −1.485427
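As a quick check of this Step 1 arithmetic, here is a small Python sketch (assuming the data list from the earlier snippet) that computes the same entropy from the class counts:

from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

play_labels = [row["play"] for row in data]    # 9 "yes" and 5 "no"
print(round(entropy(play_labels), 5))          # 0.94029, matching the hand calculation above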

Entropy concept

• Entropy measures the uncertainty present in the data
• Entropy measures randomness in the data
• It is used to decide how a decision tree can split the data
• Entropy is the measure of the disorder of a system
• Entropy tends to be maximum in the middle, with value 1, and minimum 0 (zero) at the ends
• The higher the entropy, the more the information content
• Entropy is the average information contained in a message

Entropy H(X) = Σ p(xᵢ) log₂(1/p(xᵢ)) = −Σ p(xᵢ) log₂ p(xᵢ), where X is a source and L = the number of symbols or
messages generated by the source. A binary source generates 2 symbols (for example: Yes and No).
[Figure: plot of entropy H(X) against probability. H = 0 at both ends and H = 1 at the middle point.]

Entropy tends to be maximum in the middle, with value 1, and minimum 0 (zero) at the ends. The xᵢ are the events,
symbols or messages.
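To see this "maximum in the middle, zero at the ends" behaviour numerically, the following small sketch evaluates the binary entropy H(p) at a few probabilities (the sample points are chosen only for illustration):

from math import log2

def binary_entropy(p):
    """Entropy of a binary source with P(Yes) = p and P(No) = 1 - p."""
    if p in (0.0, 1.0):                  # 0 * log2(0) is taken as 0 by convention
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p = {p:.1f}  H = {binary_entropy(p):.4f}")
# H is 0 at p = 0 and p = 1, and reaches its maximum of 1 bit at p = 0.5.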

Step 2: Calculations for every Attribute

Calculate the Entropy and Information Gain for each of these attributes.

In the given dataset, there are 4 attributes: Outlook, Temp, Humidity, Wind

(i) For Outlook attribute

Outlook has 3 different parameters: Sunny, Overcast, Rainy

Outlook = Sunny:    Yes (2), No (3)
Outlook = Overcast: Yes (4), No (0)
Outlook = Rainy:    Yes (3), No (2)

id outlook play
1 sunny no
2 sunny no
3 overcast yes
4 rainy yes
5 rainy yes
6 rainy no
7 overcast yes
8 sunny no
9 sunny yes
10 rainy yes
11 sunny yes
12 overcast yes
13 overcast yes
14 rainy no

Entropy for Sunny: H(Outlook = Sunny) = −Σ p(xᵢ) log₂ p(xᵢ)

= −{p(x₁) log₂ p(x₁) + p(x₂) log₂ p(x₂)}, where x₁ = Yes and x₂ = No

From the above data,

p(x₁) = (Number of samples favourable for Yes) / (Total samples) = 2/5

p(x₂) = (Number of samples favourable for No) / (Total samples) = 3/5

∴ H(Outlook = Sunny) = −{(2/5) log₂(2/5) + (3/5) log₂(3/5)}
= −{0.4 log₂ 0.4 + 0.6 log₂ 0.6}
= 0.4 × 1.321928 + 0.6 × 0.736966
= 0.9709

log₂ 0.4 = log₁₀ 0.4 / log₁₀ 2 = −0.397940 / 0.301030 = −1.321928

log₂ 0.6 = log₁₀ 0.6 / log₁₀ 2 = −0.221849 / 0.301030 = −0.736966

Entropy for Overcast: H(Outlook = Overcast) = −Σ p(xᵢ) log₂ p(xᵢ)

= −{p(x₁) log₂ p(x₁) + p(x₂) log₂ p(x₂)}, where x₁ = Yes and x₂ = No

From the above data,

p(x₁) = (Number of samples favourable for Yes) / (Total samples) = 4/4 = 1

p(x₂) = (Number of samples favourable for No) / (Total samples) = 0/4 = 0

∴ H(Outlook = Overcast) = −{1 log₂ 1 + 0 log₂ 0} = 0

Entropy for Rainy: H(Outlook = Rainy) = −Σ p(xᵢ) log₂ p(xᵢ)

= −{p(x₁) log₂ p(x₁) + p(x₂) log₂ p(x₂)}, where x₁ = Yes and x₂ = No

From the above data,

p(x₁) = (Number of samples favourable for Yes) / (Total samples) = 3/5
p(x₂) = (Number of samples favourable for No) / (Total samples) = 2/5

∴ H(Outlook = Rainy) = −{(3/5) log₂(3/5) + (2/5) log₂(2/5)}
= −{0.6 log₂ 0.6 + 0.4 log₂ 0.4}
= 0.6 × 0.736966 + 0.4 × 1.321928
= 0.9709
Now we have to find the Information Gain for the attribute Outlook.

Information Gain = Entropy of Total Dataset − Information (Outlook)

Information of the Outlook attribute is the weighted average and is given as:

I(Outlook) = Σ_{v ∈ (Sunny, Overcast, Rainy)} (|Hᵥ| / |H|) × Entropy(Hᵥ)

= (5/14) × 0.9709 + (4/14) × 0 + (5/14) × 0.9709
= 0.34675 + 0.34675 = 0.6935

∴ Information Gain (Outlook) = Total Entropy − I(Outlook)

= 0.94028 − 0.6935 = 0.24678
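The weighted-average step and the resulting gain can be checked with a short helper. This is only a sketch, assuming the data list and the entropy function from the earlier snippets; the name information_gain is my own:

def information_gain(rows, attribute, target="play"):
    """Entropy of the whole set minus the weighted entropy of each attribute-value subset."""
    total_entropy = entropy([r[target] for r in rows])
    n = len(rows)
    weighted = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        weighted += (len(subset) / n) * entropy(subset)
    return total_entropy - weighted

print(round(information_gain(data, "outlook"), 4))   # 0.2467 (the hand calculation gives 0.24678 with its rounding)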

(ii) For Temperature Attribute

Temp has 3 different parameters: Hot, Mild, Cool

Temp = Hot:  Yes (2), No (2)
Temp = Mild: Yes (4), No (2)
Temp = Cool: Yes (3), No (1)

id temperature play
1 hot no
2 hot no
3 hot yes
4 mild yes
5 cool yes
6 cool no
7 cool yes
8 mild no
9 cool yes
10 mild yes
11 mild yes
12 mild yes
13 hot yes
14 mild no

Entropy for Hot: H(Temp = Hot) = −Σ p(xᵢ) log₂ p(xᵢ)

= −{p(x₁) log₂ p(x₁) + p(x₂) log₂ p(x₂)}, where x₁ = Yes and x₂ = No

From the above data,

p(x₁) = (Number of samples favourable for Yes) / (Total samples) = 2/4
p(x₂) = (Number of samples favourable for No) / (Total samples) = 2/4

∴ H(Temp = Hot) = −{(2/4) log₂(2/4) + (2/4) log₂(2/4)}
= −{0.5 log₂ 0.5 + 0.5 log₂ 0.5}
= −{log₂ 0.5}
= −{log₁₀ 0.5 / log₁₀ 2}
= −{−0.301030 / 0.301030} = 1

Entropy for Mild: H(Temp = Mild) = −Σ p(xᵢ) log₂ p(xᵢ)

= −{p(x₁) log₂ p(x₁) + p(x₂) log₂ p(x₂)}, where x₁ = Yes and x₂ = No

From the above data,

p(x₁) = (Number of samples favourable for Yes) / (Total samples) = 4/6
p(x₂) = (Number of samples favourable for No) / (Total samples) = 2/6

∴ H(Temp = Mild) = −{(4/6) log₂(4/6) + (2/6) log₂(2/6)}
= −{(2/3) log₂(2/3) + (1/3) log₂(1/3)}
= (2/3) × 0.584963 + (1/3) × 1.584963
= 0.389975 + 0.528321 = 0.9183

log₂(2/3) = log₂ 0.666667 = log₁₀ 0.666667 / log₁₀ 2 = −0.176091 / 0.301030 = −0.584963

log₂(1/3) = log₂ 0.333333 = log₁₀ 0.333333 / log₁₀ 2 = −0.477121 / 0.301030 = −1.584963

Entropy for Cool: H(Temp = Cool) = −Σ p(xᵢ) log₂ p(xᵢ)

= −{p(x₁) log₂ p(x₁) + p(x₂) log₂ p(x₂)}, where x₁ = Yes and x₂ = No

From the above data,

p(x₁) = (Number of samples favourable for Yes) / (Total samples) = 3/4
p(x₂) = (Number of samples favourable for No) / (Total samples) = 1/4

∴ H(Temp = Cool) = −{(3/4) log₂(3/4) + (1/4) log₂(1/4)}
= (3/4) × 0.415038 + (1/4) × 2
= 0.311278 + 0.5 = 0.811278

log₂(3/4) = log₂ 0.75 = log₁₀ 0.75 / log₁₀ 2 = −0.124939 / 0.301030 = −0.415038

log₂(1/4) = log₂ 0.25 = log₁₀ 0.25 / log₁₀ 2 = −0.602060 / 0.301030 = −2

I(Temp) = Σ_{v ∈ (Hot, Mild, Cool)} (|Hᵥ| / |H|) × Entropy(Hᵥ)

= (4/14) × 1 + (6/14) × 0.9183 + (4/14) × 0.811278
= 0.285714 + 0.393557 + 0.231794 = 0.911065

∴ Information Gain (Temp) = Total Entropy − I(Temp)

= 0.94028 − 0.911065 = 0.029215
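Rather than repeating the arithmetic by hand for each attribute, the same helper can be reused (again a sketch, assuming the data list and the information_gain function from the earlier snippets):

for attribute in ("temperature", "humidity", "wind"):
    print(attribute, round(information_gain(data, attribute), 4))
# temperature ≈ 0.0292, humidity ≈ 0.1518, wind ≈ 0.0481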

(iii) For Humidity Attribute

Humidity has 2 different parameters: High, Normal

Humidity = High:   Yes (3), No (4)
Humidity = Normal: Yes (6), No (1)

id humidity play
1 high no
2 high no
3 high yes
4 high yes
5 normal yes
6 normal no
7 normal yes
8 high no
9 normal yes
10 normal yes
11 normal yes
12 high yes
13 normal yes
14 high no

Entropy for High: H(Humidity = High) = −Σ p(xᵢ) log₂ p(xᵢ)

= −{p(x₁) log₂ p(x₁) + p(x₂) log₂ p(x₂)}, where x₁ = Yes and x₂ = No

From the above data,

p(x₁) = (Number of samples favourable for Yes) / (Total samples) = 3/7
p(x₂) = (Number of samples favourable for No) / (Total samples) = 4/7

∴ H(Humidity = High) = −{(3/7) log₂(3/7) + (4/7) log₂(4/7)}
= (3/7) × 1.222392 + (4/7) × 0.807355
= 0.523882 + 0.461346
= 0.98523

log₂(3/7) = log₂ 0.428571 = log₁₀ 0.428571 / log₁₀ 2 = −0.367977 / 0.301030 = −1.222392

log₂(4/7) = log₂ 0.571429 = log₁₀ 0.571429 / log₁₀ 2 = −0.243038 / 0.301030 = −0.807355

Entropy for Normal: H(Humidity = Normal) = −Σ p(xᵢ) log₂ p(xᵢ)

= −{p(x₁) log₂ p(x₁) + p(x₂) log₂ p(x₂)}, where x₁ = Yes and x₂ = No

From the above data,

p(x₁) = (Number of samples favourable for Yes) / (Total samples) = 6/7
p(x₂) = (Number of samples favourable for No) / (Total samples) = 1/7

∴ H(Humidity = Normal) = −{(6/7) log₂(6/7) + (1/7) log₂(1/7)}
= (6/7) × 0.222392 + (1/7) × 2.807355
= 0.190622 + 0.401051
= 0.591673

log₂(6/7) = log₂ 0.857143 = log₁₀ 0.857143 / log₁₀ 2 = −0.066947 / 0.301030 = −0.222392

log₂(1/7) = log₂ 0.142857 = log₁₀ 0.142857 / log₁₀ 2 = −0.845098 / 0.301030 = −2.807355

I(Humidity) = Σ_{v ∈ (High, Normal)} (|Hᵥ| / |H|) × Entropy(Hᵥ)

= (7/14) × 0.98523 + (7/14) × 0.591673
= 0.492615 + 0.295836 = 0.788451

∴ Information Gain (Humidity) = Total Entropy − I(Humidity)

= 0.94028 − 0.788451 = 0.151829

(iv) For Wind Attribute

Wind has 2 different parameters: Weak, Strong

Wind = Weak:   Yes (6), No (2)
Wind = Strong: Yes (3), No (3)

id wind play
1 weak no
2 strong no
3 weak yes
4 weak yes
5 weak yes
6 strong no
7 strong yes
8 weak no
9 weak yes
10 weak yes
11 strong yes
12 strong yes
13 weak yes
14 strong no

Entropy for Weak: H(Wind = Weak) = −Σ p(xᵢ) log₂ p(xᵢ)

= −{p(x₁) log₂ p(x₁) + p(x₂) log₂ p(x₂)}, where x₁ = Yes and x₂ = No

From the above data,

p(x₁) = (Number of samples favourable for Yes) / (Total samples) = 6/8
p(x₂) = (Number of samples favourable for No) / (Total samples) = 2/8

∴ H(Wind = Weak) = −{(6/8) log₂(6/8) + (2/8) log₂(2/8)}

= −{(3/4) log₂(3/4) + (1/4) log₂(1/4)}

= 0.311278 + 0.5

= 0.811278

Entropy for Strong: H(Wind = Strong) = −Σ p(xᵢ) log₂ p(xᵢ)

= −{p(x₁) log₂ p(x₁) + p(x₂) log₂ p(x₂)}, where x₁ = Yes and x₂ = No

From the above data,

p(x₁) = (Number of samples favourable for Yes) / (Total samples) = 3/6
p(x₂) = (Number of samples favourable for No) / (Total samples) = 3/6

∴ H(Wind = Strong) = −{(3/6) log₂(3/6) + (3/6) log₂(3/6)}

= −{(1/2) log₂(1/2) + (1/2) log₂(1/2)} = 1

I(Wind) = Σ_{v ∈ (Weak, Strong)} (|Hᵥ| / |H|) × Entropy(Hᵥ)

= (8/14) × 0.811278 + (6/14) × 1
= 0.463587 + 0.428571 = 0.892159

∴ Information Gain (Wind) = Total Entropy − I(Wind)

= 0.94028 − 0.892159 = 0.048121

The information gains are reproduced below:

IG(Outlook)  = 0.24678   (highest gain)
IG(Temp)     = 0.029215
IG(Humidity) = 0.151829
IG(Wind)     = 0.048121

The best attribute (predictor variable) is the one that separates the dataset into different classes most effectively,
i.e. it is the feature that best splits the dataset. The attribute with the highest Information Gain is taken as the
ROOT NODE. Here the Outlook attribute has the highest information gain.
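Programmatically, choosing the root node is just an argmax over the information gains (a sketch assuming the information_gain helper and the data list from the earlier snippets):

attributes = ["outlook", "temperature", "humidity", "wind"]
root = max(attributes, key=lambda a: information_gain(data, a))   # attribute with the highest gain
print(root)   # "outlook"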
Drawing the Decision tree

Select Outlook as the Root node.

[Partial tree: Outlook at the root with three branches, Sunny (5 samples), Overcast (D3, D7, D12, D13) and Rainy
(5 samples). The Overcast branch is a leaf labelled Yes.]

Overcast consists of all Yes. So, on all of the days D3, D7, D12 and D13 people will play outside.

Sunny has 2 Yes (D9, D11) and 3 No (D1, D2, D8), so there is doubtfulness: we need to recalculate everything, that is,
we need to split the tree further down. Likewise, Rainy has 3 Yes (D4, D5, D10) and 2 No (D6, D14), so there is
doubtfulness and we need to recalculate and split that branch as well.

Take Outlook = Sunny and proceed with all the steps that we did for the original dataset. Outlook has already been taken
as the Root node, so there is no need to include the Outlook attribute in the table. Take the Sunny samples from the
original table and write them down as shown below:

id temperature humidity wind play


1 hot high weak no
2 hot high strong no
8 mild high weak no
9 cool normal weak yes
11 mild normal strong yes

For this table, we need to calculate EVERYTHING that we did for the original table.
Step 1: Compute Entropy of the new dataset given in the above table

Number of samples = 5
Number of attributes = 3 (Temp, Humidity, Wind)
Output variable = Play
Number of distinct outputs = 2 (Yes, No)

➢ Out of 5 samples, 2 samples belong to the "Yes" category
➢ Out of 5 samples, 3 samples belong to the "No" category

So, Number of "Yes" = 2
    Number of "No" = 3

Now the total entropy of the given dataset is H = Σ p(xᵢ) log₂(1/p(xᵢ)) = −Σ p(xᵢ) log₂ p(xᵢ)

p(x₁) = (No. of samples favourable to Yes) / (Total samples) = 2/5
p(x₂) = (No. of samples favourable to No) / (Total samples) = 3/5

∴ H = −{p(x₁) log₂ p(x₁) + p(x₂) log₂ p(x₂)}

= −{(2/5) log₂(2/5) + (3/5) log₂(3/5)}

log₂(2/5) = log₂ 0.4 = log₁₀ 0.4 / log₁₀ 2 = −0.397940 / 0.301030 = −1.321928

log₂(3/5) = log₂ 0.6 = log₁₀ 0.6 / log₁₀ 2 = −0.221849 / 0.301030 = −0.736966

∴ H = −{0.4 × (−1.32193) + 0.6 × (−0.7369657)}

= 0.528772 + 0.4421794 = 0.9709514

Step 2: Calculations for every Attribute

Calculate the Entropy and Information Gain for these different Attributes.

In the given dataset, there are 3 attributes: Temp, Humidity, Wind

(i) For Temp attribute

Temp has 3 different parameters: Hot, Mild, Cool

Temp = Hot:  Yes (0), No (2)
Temp = Mild: Yes (1), No (1)
Temp = Cool: Yes (1), No (0)

id temperature play
1 hot no
2 hot no
8 mild no
9 cool yes
11 mild yes

We can fill in the entropy values directly by remembering the properties of entropy; no mathematical calculations are
required.

Entropy H(Temp = Hot) = 0 (because all No)


Entropy H(Temp = Mild) = 1 (because equal number of Yes and No)
Entropy H(Temp = Cool) = 0 (because all Yes)

I(Temp) = (2/5) × 0 + (2/5) × 1 + (1/5) × 0 = 2/5 = 0.4

IG(Temp) = 0.9709514 − 0.4 = 0.5709514

(ii) For Humidity attribute

Humidity has 2 different parameters: High, Normal

Humidity = High:   Yes (0), No (3)
Humidity = Normal: Yes (2), No (0)

id humidity play
1 high no
2 high no
8 high no
9 normal yes
11 normal yes

We can fill in the entropy values directly by remembering the properties of entropy; no mathematical calculations are
required.

Entropy H(Humidity = High) = 0 (because all No)


Entropy H(Humidity = Normal) = 0 (because all Yes)

I(Humidity) = (3/5) × 0 + (2/5) × 0 = 0

IG(Humidity) = 0.9709514 − 0 = 0.9709514

(iii) For Wind attribute

Wind has 2 different parameters: Weak, Strong

Wind = Weak:   Yes (1), No (2)
Wind = Strong: Yes (1), No (1)

id wind play
1 weak no
2 strong no
8 weak no
9 weak yes
11 strong yes

Entropy H(Wind = Strong) = 1 (because equal number of Yes and No)

Entropy H(Wind = Weak) = −{(1/3) log₂(1/3) + (2/3) log₂(2/3)} = 0.528321 + 0.389975 = 0.9183

I(Wind) = (3/5) × 0.9183 + (2/5) × 1 = 0.95098

IG(Wind) = 0.9709514 − 0.95098 = 0.0199714

The information gains are reproduced below:

IG(Temp)     = 0.5709514
IG(Humidity) = 0.9709514   (highest gain)
IG(Wind)     = 0.0199714

The new (partial) Decision tree is shown below.

[Partial tree: Outlook at the root. The Overcast branch (D3, D7, D12, D13) is a leaf labelled Yes. The Sunny branch
splits on Humidity: High (D1, D2, D8) gives No, Normal (D9, D11) gives Yes. The Rainy branch (5 samples) is still to be
worked out.]

We need to do the same for Rainy also.

Now we should work on the Outlook = Rainy condition. Take the Rainy samples from the original table:

id temperature humidity wind play


4 mild high weak yes
5 cool normal weak yes
6 cool normal strong no
10 mild normal weak yes
14 mild high strong no

Step 1: Compute Entropy of the new dataset given in the above table.

Number of samples = 5
Number of attributes = 3 (Temp, Humidity, Wind)
Output variable = Play
Number of distinct outputs = 2 (Yes, No)

Out of 5 samples, 3 samples belong to the "Yes" category
Out of 5 samples, 2 samples belong to the "No" category

So, Number of "Yes" = 3
    Number of "No" = 2

p(x₁) = (No. of samples favourable to Yes) / (Total samples) = 3/5
p(x₂) = (No. of samples favourable to No) / (Total samples) = 2/5

∴ H = −{p(x₁) log₂ p(x₁) + p(x₂) log₂ p(x₂)}

= −{(3/5) log₂(3/5) + (2/5) log₂(2/5)} = 0.9709514

(iv) For Temp attribute

Temp has 3 different parameters: Hot, Mild, Cool

Temp = Hot:  Yes (0), No (0)
Temp = Mild: Yes (2), No (1)
Temp = Cool: Yes (1), No (1)

id temperature play
4 mild yes
5 cool yes
6 cool no
10 mild yes
14 mild no

We can fill in the entropy values directly by remembering the properties of entropy; no mathematical calculations are
required.

Entropy H(Temp = Hot) = 0 (there are no Hot samples in this subset, so it carries zero weight)
Entropy H(Temp = Mild) = −{(2/3) log₂(2/3) + (1/3) log₂(1/3)} = 0.9183
Entropy H(Temp = Cool) = 1 (because equal number of Yes and No)

I(Temp) = (0/5) × 0 + (3/5) × 0.9183 + (2/5) × 1 = 0.95098

IG(Temp) = 0.9709514 − 0.95098 = 0.0199714

(v) For Humidity attribute

Humidity has 2 different parameters: High, Normal

Humidity = High:   Yes (1), No (1)
Humidity = Normal: Yes (2), No (1)

id humidity play
4 high yes
5 normal yes
6 normal no
10 normal yes
14 high no

We can fill in the entropy values directly by remembering the properties of entropy; no mathematical calculations are
required.

Entropy H(Humidity = High) = 1 (because equal number of Yes and No)
Entropy H(Humidity = Normal) = −{(2/3) log₂(2/3) + (1/3) log₂(1/3)} = 0.9183

I(Humidity) = (2/5) × 1 + (3/5) × 0.9183 = 0.95098

IG(Humidity) = 0.9709514 − 0.95098 = 0.0199714

(vi) For Wind attribute

Wind has 2 different parameters: Weak, Strong

Wind = Weak:   Yes (3), No (0)
Wind = Strong: Yes (0), No (2)

id wind play
4 weak yes
5 weak yes
6 strong no
10 weak yes
14 strong no
We can fill in the entropy values directly by remembering the properties of entropy; no mathematical calculations are
required.

Entropy H(Wind = Weak) = 0 (because all Yes)
Entropy H(Wind = Strong) = 0 (because all No)

I(Wind) = (3/5) × 0 + (2/5) × 0 = 0

IG(Wind) = 0.9709514 − 0 = 0.9709514

The information gains are reproduced below:

IG(Temp)     = 0.0199714
IG(Humidity) = 0.0199714
IG(Wind)     = 0.9709514   (highest gain)
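The Rainy branch can be verified the same way (a sketch assuming the data list and the helpers from the earlier snippets):

rainy_rows = [r for r in data if r["outlook"] == "rainy"]          # days 4, 5, 6, 10, 14
for attribute in ("temperature", "humidity", "wind"):
    print(attribute, round(information_gain(rainy_rows, attribute), 4))
# temperature ≈ 0.02, humidity ≈ 0.02, wind ≈ 0.971 (highest), so Wind splits this branch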
The complete (final) Decision tree is shown below.

[Final tree: Outlook at the root. The Sunny branch splits on Humidity: High (D1, D2, D8) gives No, Normal (D9, D11)
gives Yes. The Overcast branch (D3, D7, D12, D13) gives Yes. The Rainy branch splits on Wind: Strong (D6, D14) gives No,
Weak (D4, D5, D10) gives Yes.]
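Finally, here is a compact recursive ID3 sketch that reproduces the whole tree above as a nested dictionary and classifies new samples. It assumes the data list and the information_gain helper from the earlier snippets; the names build_tree and predict are my own, and the order of the branches in the printed output may differ.

from collections import Counter

def build_tree(rows, attributes, target="play"):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                        # pure node: return the single class
        return labels[0]
    if not attributes:                               # no attributes left: return the majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    for value in set(r[best] for r in rows):
        subset = [r for r in rows if r[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = build_tree(subset, remaining, target)
    return tree

def predict(tree, sample):
    while isinstance(tree, dict):                    # walk down until a leaf (a class label) is reached
        attribute = next(iter(tree))
        tree = tree[attribute][sample[attribute]]
    return tree

tree = build_tree(data, ["outlook", "temperature", "humidity", "wind"])
print(tree)
# {'outlook': {'overcast': 'yes',
#              'sunny': {'humidity': {'high': 'no', 'normal': 'yes'}},
#              'rainy': {'wind': {'weak': 'yes', 'strong': 'no'}}}}
print(predict(tree, {"outlook": "sunny", "temperature": "mild",
                     "humidity": "high", "wind": "strong"}))      # 'no'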

