Machine Learning Decision Tree
Competition Description
Your goal is to predict whether people will play outside, using next week's weather forecast. You observe that the decision to play depends on the weather. The following table is the decision table for whether it is suitable to play outside.
Data Description
Course Design
Choose your own way and programming language to implement the decision tree algorithm (with code comments or notes).
Divide the data in Data Description into training and test sets to get your answer.
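For reference, here is a minimal Python sketch of the dataset, assuming the standard 14-sample play-tennis table (day IDs D1–D14) whose per-attribute Yes/No counts match the calculations in this solution; names such as `DATASET` and `ATTRIBUTES` are just illustrative:

```python
# Assumed 14-sample weather dataset (D1-D14); the Yes/No counts per attribute
# value agree with the entropy and gain calculations worked out below.
DATASET = [
    # (Day,  Outlook,    Temp,   Humidity,  Wind,     Play)
    ("D1",  "Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("D2",  "Sunny",    "Hot",  "High",   "Strong", "No"),
    ("D3",  "Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("D4",  "Rainy",    "Mild", "High",   "Weak",   "Yes"),
    ("D5",  "Rainy",    "Cool", "Normal", "Weak",   "Yes"),
    ("D6",  "Rainy",    "Cool", "Normal", "Strong", "No"),
    ("D7",  "Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("D8",  "Sunny",    "Mild", "High",   "Weak",   "No"),
    ("D9",  "Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("D10", "Rainy",    "Mild", "Normal", "Weak",   "Yes"),
    ("D11", "Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("D12", "Overcast", "Mild", "High",   "Strong", "Yes"),
    ("D13", "Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("D14", "Rainy",    "Mild", "High",   "Strong", "No"),
]
ATTRIBUTES = ["Outlook", "Temp", "Humidity", "Wind"]  # columns 1-4; column 0 is the day ID
```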
Solution: I have followed the ID3 (Iterative Dichotomiser 3) algorithm.
We need to construct a decision tree to predict whether people will play outside or not.
Number of samples = 14
Number of attributes = 4 (Outlook, Temp, Humidity, Wind)
Here L = the number of symbols at the output of the DMS (discrete memoryless source). Note that the decision output here is binary: Yes or No.
A discrete information source is a source that has only a finite set of symbols as possible outputs; it consists of a discrete (countable) set of letters or symbols. Let the source X have the alphabet {x_1, x_2, …, x_m}; the set of source symbols is called the source alphabet.
A binary source is described by a list of 2 symbols and a probability assignment to these symbols.
Binary source: x_1 = Yes, x_2 = No.
Step 1: Compute entropy of the total dataset

$$\text{Total Entropy } H = \sum_{i=1}^{L} p(x_i)\,\log_2\frac{1}{p(x_i)} = -\sum_{i=1}^{L} p(x_i)\,\log_2 p(x_i)$$

$$p(x_1) = \frac{\text{No. of samples favourable to Yes}}{\text{Total samples}} = \frac{9}{14}, \qquad p(x_2) = \frac{\text{No. of samples favourable to No}}{\text{Total samples}} = \frac{5}{14}$$

$$H = -\left\{\frac{9}{14}\log_2\frac{9}{14} + \frac{5}{14}\log_2\frac{5}{14}\right\} = -\{-0.40977637 - 0.53050957\} = 0.94028$$
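As a quick check, here is a minimal Python sketch of this entropy computation (the helper name `entropy` is illustrative):

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(entropy([9, 5]))  # ~0.9403, the total entropy of the 14-sample dataset
```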
Entropy concept
$$\text{Entropy } H(X) = \sum_{i=1}^{L} p(x_i)\,\log_2\frac{1}{p(x_i)} = -\sum_{i=1}^{L} p(x_i)\,\log_2 p(x_i)$$

where X is a source and L is the number of symbols or messages generated by the source. A binary source generates 2 symbols (for example, Yes and No).
Entropy is maximum (value 1) at the middle point, where the two symbols are equally likely, and minimum (zero) at the ends, where one symbol is certain. The x_i are the events, symbols, or messages.
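The following small sketch simply tabulates the binary entropy function to illustrate this property (purely illustrative):

```python
from math import log2

def binary_entropy(p):
    """Entropy of a binary source whose symbols have probabilities p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p = {p:.1f}  H = {binary_entropy(p):.4f}")
# H peaks at 1.0 when p = 0.5 and falls to 0 at p = 0 or p = 1.
```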
Step 2: Calculations for every attribute

(i) For Outlook attribute
Outlook = Sunny has 5 samples: 2 Yes and 3 No.

$$H(\text{Outlook} = \text{Sunny}) = -\left\{\frac{2}{5}\log_2\frac{2}{5} + \frac{3}{5}\log_2\frac{3}{5}\right\} = -\{0.4\log_2 0.4 + 0.6\log_2 0.6\}$$

where

$$\log_2 0.4 = \frac{\log_{10} 0.4}{\log_{10} 2} = \frac{-0.39794}{0.3010299} = -1.321928$$

$$\therefore H(\text{Outlook} = \text{Sunny}) = 0.4 \times 1.321928 + 0.6 \times 0.73697 = 0.9709$$

Similarly, Outlook = Overcast has 4 samples (all Yes), so H(Outlook = Overcast) = 0, and Outlook = Rainy has 5 samples (3 Yes, 2 No), so H(Outlook = Rainy) = 0.9709.
Now we have to find the information gain for the attribute Outlook:

Information Gain = Entropy of the total dataset − Information(Outlook)

The information of the Outlook attribute is the weighted average of the entropies of its values:

$$I(\text{Outlook}) = \sum_{v \in \{\text{Sunny},\,\text{Overcast},\,\text{Rainy}\}} \frac{|H_v|}{|H|}\,\text{Entropy}(H_v)$$

$$= \frac{5}{14}\times 0.9709 + \frac{4}{14}\times 0 + \frac{5}{14}\times 0.9709 = 0.34675 + 0.34675 = 0.6935$$

$$IG(\text{Outlook}) = 0.94028 - 0.6935 = 0.2468$$
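A minimal Python sketch of this weighted-average information and the resulting gain, reusing the `DATASET` and `entropy` helpers sketched above (all names are illustrative):

```python
from collections import Counter

def information(dataset, attr_index, label_index=-1):
    """Weighted average entropy of the class labels after splitting on one attribute."""
    total = len(dataset)
    by_value = {}
    for row in dataset:
        by_value.setdefault(row[attr_index], []).append(row[label_index])
    return sum(len(labels) / total * entropy(list(Counter(labels).values()))
               for labels in by_value.values())

def gain(dataset, attr_index, label_index=-1):
    """Information gain obtained by splitting the dataset on one attribute."""
    labels = [row[label_index] for row in dataset]
    return entropy(list(Counter(labels).values())) - information(dataset, attr_index, label_index)

print(information(DATASET, 1))  # Outlook is column 1 -> ~0.6935
print(gain(DATASET, 1))         # -> ~0.2468
```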
(ii) For Temperature attribute

Temp = Hot has 4 samples: 2 Yes and 2 No.

$$\therefore H(\text{Temp} = \text{Hot}) = -\left\{\frac{2}{4}\log_2\frac{2}{4} + \frac{2}{4}\log_2\frac{2}{4}\right\} = -\left\{\frac{\log_{10} 0.5}{\log_{10} 2}\right\} = -\left\{\frac{-0.30102995}{0.30102995}\right\} = 1$$
Temp = Mild has 6 samples: 4 Yes and 2 No.

$$H(\text{Temp} = \text{Mild}) = -\left\{\frac{4}{6}\log_2\frac{4}{6} + \frac{2}{6}\log_2\frac{2}{6}\right\} = -\left\{\frac{2}{3}\log_2\frac{2}{3} + \frac{1}{3}\log_2\frac{1}{3}\right\}$$

$$= \frac{2}{3}\times 0.5849626 + \frac{1}{3}\times 1.5849626 = 0.9183$$

Temp = Cool has 4 samples: 3 Yes and 1 No.

$$\therefore H(\text{Temp} = \text{Cool}) = -\left\{\frac{3}{4}\log_2\frac{3}{4} + \frac{1}{4}\log_2\frac{1}{4}\right\} = \frac{3}{4}\times 0.41503752 + \frac{1}{4}\times 2 = 0.81128$$
where

$$\log_2\frac{3}{4} = \log_2 0.75 = \frac{\log_{10} 0.75}{\log_{10} 2} = \frac{-0.1249387}{0.3010299} = -0.41503752$$

$$I(\text{Temp}) = \sum_{v \in \{\text{Hot},\,\text{Mild},\,\text{Cool}\}} \frac{|H_v|}{|H|}\,\text{Entropy}(H_v) = \frac{4}{14}\times 1 + \frac{6}{14}\times 0.9183 + \frac{4}{14}\times 0.81127814 = 0.9111$$

$$IG(\text{Temp}) = 0.94028 - 0.9111 = 0.0292$$
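Continuing the illustrative sketch above, the per-value entropies and the gain for Temp (column 2 of the assumed `DATASET`) can be checked as follows:

```python
from collections import Counter

for value in ("Hot", "Mild", "Cool"):
    labels = [row[-1] for row in DATASET if row[2] == value]
    counts = list(Counter(labels).values())
    print(value, Counter(labels), round(entropy(counts), 4))  # 1.0, 0.9183, 0.8113

print(round(information(DATASET, 2), 4))  # ~0.9111
print(round(gain(DATASET, 2), 4))         # ~0.0292
```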
(iii) For Humidity attribute

Humidity = High has 7 samples: 3 Yes and 4 No. From this data,

$$p(x_1) = \frac{\text{Number of samples favourable to Yes}}{\text{Total samples}} = \frac{3}{7}, \qquad p(x_2) = \frac{\text{Number of samples favourable to No}}{\text{Total samples}} = \frac{4}{7}$$

$$\therefore H(\text{Humidity} = \text{High}) = -\left\{\frac{3}{7}\log_2\frac{3}{7} + \frac{4}{7}\log_2\frac{4}{7}\right\} = \frac{3}{7}\times 1.2223926 + \frac{4}{7}\times 0.807355$$

$$= 0.5238825 + 0.46134574 = 0.98523$$
Humidity = Normal has 7 samples: 6 Yes and 1 No.

$$\therefore H(\text{Humidity} = \text{Normal}) = -\left\{\frac{6}{7}\log_2\frac{6}{7} + \frac{1}{7}\log_2\frac{1}{7}\right\} = \frac{6}{7}\times 0.22239245 + \frac{1}{7}\times 2.807355$$

$$= 0.190622 + 0.40105076 = 0.59167286$$

where

$$\log_2\frac{6}{7} = \log_2 0.857143 = \frac{\log_{10} 0.857143}{\log_{10} 2} = \frac{-0.0669467}{0.3010299} = -0.22239245$$
$$I(\text{Humidity}) = \sum_{v \in \{\text{High},\,\text{Normal}\}} \frac{|H_v|}{|H|}\,\text{Entropy}(H_v) = \frac{7}{14}\times 0.98523 + \frac{7}{14}\times 0.59167286 = 0.78845$$

$$IG(\text{Humidity}) = 0.94028 - 0.78845 = 0.1518$$
(iv) For Wind attribute

Wind = Weak has 8 samples: 6 Yes and 2 No.

$$\therefore H(\text{Wind} = \text{Weak}) = -\left\{\frac{6}{8}\log_2\frac{6}{8} + \frac{2}{8}\log_2\frac{2}{8}\right\} = -\left\{\frac{3}{4}\log_2\frac{3}{4} + \frac{1}{4}\log_2\frac{1}{4}\right\} = 0.311278 + 0.5 = 0.81128$$

Wind = Strong has 6 samples: 3 Yes and 3 No.

$$\therefore H(\text{Wind} = \text{Strong}) = -\left\{\frac{3}{6}\log_2\frac{3}{6} + \frac{3}{6}\log_2\frac{3}{6}\right\} = -\left\{\frac{1}{2}\log_2\frac{1}{2} + \frac{1}{2}\log_2\frac{1}{2}\right\} = 1$$

$$I(\text{Wind}) = \sum_{v \in \{\text{Weak},\,\text{Strong}\}} \frac{|H_v|}{|H|}\,\text{Entropy}(H_v) = \frac{8}{14}\times 0.81128 + \frac{6}{14}\times 1 = 0.8922$$

$$IG(\text{Wind}) = 0.94028 - 0.8922 = 0.0481$$
The best attribute (predictor variable) is the one that separates the dataset into the different classes most effectively, i.e. the feature that best splits the dataset. The attribute with the highest information gain is taken as the ROOT NODE. The information gains are reproduced below:
IG(Outlook) = 0.2468 ← highest gain
IG(Temp) = 0.0292
IG(Humidity) = 0.1518
IG(Wind) = 0.0481
Here the Outlook attribute has the highest information gain, so Outlook is taken as the root node.
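A short sketch, continuing the illustrative helpers above, that computes all four gains on the full dataset and picks the root attribute:

```python
# Gain of each attribute on the full dataset; columns 1-4 correspond to ATTRIBUTES.
gains = {name: gain(DATASET, idx) for idx, name in enumerate(ATTRIBUTES, start=1)}
print(gains)   # Outlook ~0.247, Temp ~0.029, Humidity ~0.152, Wind ~0.048

root = max(gains, key=gains.get)
print(root)    # 'Outlook' -> chosen as the root node
```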
Drawing the decision tree

Select Outlook as the root node:

Outlook
├── Sunny → ? (impure, needs further splitting)
├── Overcast → Yes (all 4 Overcast samples are Yes)
└── Rainy → ? (impure, needs further splitting)
Sunny has 2 Yes (D9, D11) and 3 No (D1, D2, D8), so there is still doubt; we need to recalculate everything for this branch, that is, we need to split the tree further down. Rainy has 3 Yes (D4, D5, D10) and 2 No (D6, D14), so there is still doubt; we need to recalculate everything for this branch, that is, we need to split the tree further down.
Take Outlook = Sunny and repeat all the steps that we did for the original dataset. Outlook has already been taken as the root node, so there is no need to include the Outlook attribute in the table. Take the Sunny samples from the original table and write them down as shown below:
For this table, we need to calculate EVERYTHING that we did for the original table.
Step 1: Compute the entropy of the new dataset given in the above table
Number of samples = 5
Number of attributes = 3 (Temp, Humidity, Wind)
Output variable = Play; number of distinct outputs = 2 (Yes, No)
➢ Out of 5 samples, 2 samples belong to the “Yes” category
➢ Out of 5 samples, 3 samples belong to the “No” category
$$\text{Total Entropy } H = -\sum_{i=1}^{L} p(x_i)\,\log_2 p(x_i) = -\left\{\frac{2}{5}\log_2\frac{2}{5} + \frac{3}{5}\log_2\frac{3}{5}\right\} = 0.9709514$$
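A sketch of this step with the illustrative helpers above, filtering the assumed `DATASET` down to the Outlook = Sunny rows:

```python
from collections import Counter

sunny = [row for row in DATASET if row[1] == "Sunny"]   # Outlook is column 1
labels = [row[-1] for row in sunny]
print(Counter(labels))                                  # Counter({'No': 3, 'Yes': 2})
print(entropy(list(Counter(labels).values())))          # ~0.9710
```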
Step 2: Calculations for every attribute

(i) For Temp attribute
On the Sunny subset, Temp = Hot has 2 No (entropy 0), Temp = Mild has 1 Yes and 1 No (entropy 1), and Temp = Cool has 1 Yes (entropy 0).
We can fill in these entropy values directly from the properties of entropy; no mathematical calculations are required.
$$I(\text{Temp}) = \frac{2}{5}\times 0 + \frac{2}{5}\times 1 + \frac{1}{5}\times 0 = \frac{2}{5} = 0.4$$

$$IG(\text{Temp}) = 0.9709514 - 0.4 = 0.5709514$$
(ii) For Humidity attribute
On the Sunny subset, Humidity = High has 3 No and Humidity = Normal has 2 Yes; both values are pure, so their entropies are 0 directly from the properties of entropy and no calculations are required.

$$I(\text{Humidity}) = \frac{3}{5}\times 0 + \frac{2}{5}\times 0 = 0$$

$$IG(\text{Humidity}) = 0.9709514 - 0 = 0.9709514$$
(iii) For Wind attribute

On the Sunny subset, Wind = Weak has 1 Yes and 2 No, and Wind = Strong has 1 Yes and 1 No (entropy 1).

$$\text{Entropy } H(\text{Wind} = \text{Weak}) = -\left\{\frac{1}{3}\log_2\frac{1}{3} + \frac{2}{3}\log_2\frac{2}{3}\right\} = 0.5283209 + 0.3899751 = 0.9183$$

$$I(\text{Wind}) = \frac{3}{5}\times 0.9183 + \frac{2}{5}\times 1 = 0.95098$$

$$IG(\text{Wind}) = 0.9709514 - 0.95098 = 0.0199714$$
The information gains on the Sunny branch are reproduced below:
IG(Temp) = 0.5709514
IG(Humidity) = 0.9709514 ← highest gain
IG(Wind) = 0.0199714
So Humidity is selected as the splitting attribute under the Sunny branch.
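Repeating the gain computation on the Sunny subset with the illustrative helpers (Outlook itself is excluded because it has already been used):

```python
sunny = [row for row in DATASET if row[1] == "Sunny"]
sunny_gains = {name: gain(sunny, idx)
               for idx, name in enumerate(ATTRIBUTES, start=1) if name != "Outlook"}
print(sunny_gains)   # Temp ~0.571, Humidity ~0.971, Wind ~0.020 -> Humidity wins
```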
Now take Outlook = Rainy and repeat the same steps, using the Rainy samples from the original table:
Number of samples = 5
Number of attributes = 3 (Temp, Humidity, Wind)
Output variable = Play; number of distinct outputs = 2 (Yes, No)
➢ Out of 5 samples, 3 samples belong to the “Yes” category
➢ Out of 5 samples, 2 samples belong to the “No” category

$$\text{Total Entropy } H = -\left\{\frac{3}{5}\log_2\frac{3}{5} + \frac{2}{5}\log_2\frac{2}{5}\right\} = 0.9709514$$
(iv) For Temp attribute

On the Rainy subset, Temp = Mild has 2 Yes and 1 No (entropy 0.9183), Temp = Cool has 1 Yes and 1 No (entropy 1), and Temp = Hot has no samples. We can fill in these entropy values directly from the properties of entropy; no mathematical calculations are required.
$$I(\text{Temp}) = \frac{3}{5}\times 0.9183 + \frac{2}{5}\times 1 = 0.95098 \qquad (\text{Hot has no samples and contributes } 0)$$

$$IG(\text{Temp}) = 0.9709514 - 0.95098 = 0.0199714$$
(v) For Humidity attribute
On the Rainy subset, Humidity = High has 1 Yes and 1 No (entropy 1), and Humidity = Normal has 2 Yes and 1 No (entropy 0.9183). We can fill in these entropy values directly from the properties of entropy; no mathematical calculations are required.

$$I(\text{Humidity}) = \frac{2}{5}\times 1 + \frac{3}{5}\times 0.9183 = 0.95098$$

$$IG(\text{Humidity}) = 0.9709514 - 0.95098 = 0.0199714$$
(vi) For Wind attribute

On the Rainy subset, Wind = Weak has 3 Yes and Wind = Strong has 2 No; both values are pure, so their entropies are 0.

$$I(\text{Wind}) = \frac{3}{5}\times 0 + \frac{2}{5}\times 0 = 0$$

$$IG(\text{Wind}) = 0.9709514 - 0 = 0.9709514$$
The information gains on the Rainy branch are reproduced below:
IG(Temp) = 0.0199714
IG(Humidity) = 0.0199714
IG(Wind) = 0.9709514 ← highest gain
So Wind is selected as the splitting attribute under the Rainy branch.
The complete decision tree is shown below:

Outlook
├── Sunny → Humidity
│     ├── High → No
│     └── Normal → Yes
├── Overcast → Yes
└── Rainy → Wind
      ├── Weak → Yes
      └── Strong → No
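Finally, here is a compact Python sketch of the whole ID3 procedure (a from-scratch illustration under the assumptions above, not the author's original code), reusing the `DATASET`, `ATTRIBUTES`, `entropy`, and `gain` helpers; it reproduces the tree shown above:

```python
from collections import Counter

def id3(rows, attr_indices, label_index=-1):
    """Recursively build an ID3 decision tree as nested dicts; leaves are class labels."""
    labels = [row[label_index] for row in rows]
    # Stop when the node is pure or no attributes remain; return the majority class.
    if len(set(labels)) == 1 or not attr_indices:
        return Counter(labels).most_common(1)[0][0]
    # Pick the attribute with the highest information gain.
    best = max(attr_indices, key=lambda idx: gain(rows, idx, label_index))
    name = ATTRIBUTES[best - 1]                      # column 0 of DATASET is the day ID
    remaining = [idx for idx in attr_indices if idx != best]
    tree = {name: {}}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        tree[name][value] = id3(subset, remaining, label_index)
    return tree

tree = id3(DATASET, attr_indices=[1, 2, 3, 4])
print(tree)
# {'Outlook': {'Overcast': 'Yes',
#              'Rainy': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}},
#              'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}}}}
```

Splitting the 14 samples into training and test sets (as the Course Design asks) and classifying held-out rows by walking this nested-dict tree is a straightforward extension of the same sketch.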