Decision Tree
Entropy
• Entropy is a measure of the uncertainty of a random variable; it
characterizes the impurity of an arbitrary collection of examples.
The higher the entropy, the higher the information content.
• Information gain
It is also known as the Kullback-Leibler divergence, denoted IG(S, A), and is the effective
change in entropy after deciding on a particular attribute A:
IG(S, A) = H(S) − H(S, A) = H(S) − Σ P(x) × H(Sₓ)
where the sum runs over the possible values x of attribute A, Sₓ is the subset of examples
with A = x, and P(x) is the fraction of examples with A = x.
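As a minimal sketch (not from the slides), the two definitions above can be written as plain Python helpers; the function names, the dict-per-row data format, and the target argument are assumptions made for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum over classes x of P(x) * log2 P(x)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """IG(S, A) = H(S) - sum over values x of A of P(x) * H(S_x)."""
    labels = [row[target] for row in rows]
    gain = entropy(labels)
    for value in set(row[attribute] for row in rows):
        subset = [row[target] for row in rows if row[attribute] == value]
        gain -= (len(subset) / len(rows)) * entropy(subset)
    return gain
```

With the dataset below stored as a list of such dicts, information_gain(rows, "Outlook", "Play Golf") would measure how much splitting on Outlook reduces uncertainty about the class.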
Dataset
Day Outlook Temperature Humidity Wind Play Golf
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild Normal Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
1. Play(Yes) = 9, Play(No) = 5, Total = 14
Entropy(S) = H(S) = −Σ P(x) log₂ P(x)
           = −(9/14) log₂(9/14) − (5/14) log₂(5/14) = 0.94
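This hand calculation can be double-checked with a couple of lines of Python (standard library only):

```python
# Quick check of the hand calculation above.
import math

p_yes, p_no = 9 / 14, 5 / 14
h_s = -p_yes * math.log2(p_yes) - p_no * math.log2(p_no)
print(round(h_s, 2))  # 0.94
```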
[Tree after the first split: Outlook gives the highest information gain and becomes the root; the Overcast branch is pure and becomes a Yes leaf, while the Sunny and Rain branches still need further splitting.]
• To further split the Sunny node, consider the Sunny examples only:
Temperature Humidity Wind Play
Hot High Weak No
Hot High Strong No
Mild High Weak No
Cool Normal Weak Yes
Mild Normal Strong Yes
On this subset, Humidity gives the highest information gain, so the Sunny node is split on Humidity. In the same way, S_rain gives Wind as the attribute with the highest information gain (a verification sketch follows below). So the decision tree becomes:
[Final tree: Outlook at the root; Overcast → Yes; Sunny → Humidity (High → No, Normal → Yes); Rain → Wind (Strong → No, Weak → Yes).]
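As a check on the two sub-splits claimed above, here is a small sketch (the helper names and the dict-based row format are my own, not from the slides) that recomputes the information gain of each remaining attribute on the Sunny and Rain subsets:

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target="Play"):
    labels = [r[target] for r in rows]
    gain = entropy(labels)
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        gain -= (len(subset) / len(rows)) * entropy(subset)
    return gain

sunny = [  # D1, D2, D8, D9, D11
    {"Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak",   "Play": "No"},
    {"Temperature": "Hot",  "Humidity": "High",   "Wind": "Strong", "Play": "No"},
    {"Temperature": "Mild", "Humidity": "High",   "Wind": "Weak",   "Play": "No"},
    {"Temperature": "Cool", "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},
    {"Temperature": "Mild", "Humidity": "Normal", "Wind": "Strong", "Play": "Yes"},
]
rain = [  # D4, D5, D6, D10, D14
    {"Temperature": "Mild", "Humidity": "High",   "Wind": "Weak",   "Play": "Yes"},
    {"Temperature": "Cool", "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},
    {"Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong", "Play": "No"},
    {"Temperature": "Mild", "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},
    {"Temperature": "Mild", "Humidity": "High",   "Wind": "Strong", "Play": "No"},
]
for name, subset in (("Sunny", sunny), ("Rain", rain)):
    gains = {a: round(information_gain(subset, a), 3)
             for a in ("Temperature", "Humidity", "Wind")}
    print(name, gains)
# Sunny: Humidity has the largest gain (0.971); Rain: Wind has the largest (0.971).
```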
ID3 Algorithm
1. Create a root node for the tree.
2. If all the examples are positive, return a leaf node labelled positive.
3. Else if all the examples are negative, return a leaf node labelled negative.
4. Calculate the entropy of the current state, H(S).
5. For each attribute x, compute the entropy with respect to x, H(S, x).
6. Select the attribute with the maximum IG(S, x) and split on it.
7. Remove the attribute that offered the highest IG from the set of candidate attributes.
8. Repeat until there are no attributes left or the decision tree consists entirely of leaf nodes.
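A compact sketch of this loop in Python (recursive form; the function names, the dict-based row format, and the "Play" target key are assumptions, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target="Play"):
    labels = [r[target] for r in rows]
    gain = entropy(labels)
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        gain -= (len(subset) / len(rows)) * entropy(subset)
    return gain

def id3(rows, attributes, target="Play"):
    labels = [r[target] for r in rows]
    # Steps 2-3: if the node is pure, return a leaf with that class.
    if len(set(labels)) == 1:
        return labels[0]
    # Step 8 stopping case: no attributes left, return the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Steps 4-6: pick the attribute with maximum information gain.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    # Step 7: remove the chosen attribute before recursing on each branch.
    remaining = [a for a in attributes if a != best]
    for value in set(r[best] for r in rows):
        branch_rows = [r for r in rows if r[best] == value]
        tree[best][value] = id3(branch_rows, remaining, target)
    return tree
```

Calling id3 on the 14-row golf dataset with attributes ["Outlook", "Temperature", "Humidity", "Wind"] reproduces the tree shown earlier: Outlook at the root, Humidity under Sunny, and Wind under Rain.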