What Is An ID3 Algorithm?
H(S) = ∑_{c ∈ C} −p(c) log2 p(c)
Where,
● S - The current dataset for which entropy is being calculated (it changes on every iteration of the ID3 algorithm).
● C - The set of classes in S (for example, C = {yes, no}).
● p(c) - The proportion of the number of elements in class c to the number of elements in set S.
In ID3, entropy is calculated for each remaining attribute. The attribute with the smallest
entropy is used to split the set S on that particular iteration.
Entropy = 0 implies the set is of a pure class, meaning all of its elements belong to the same category.
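To make the definition concrete, here is a minimal Python sketch of the entropy formula (the function name `entropy` is just for illustration):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(S) of a collection of class labels."""
    n = len(labels)
    return sum(-(count / n) * log2(count / n)
               for count in Counter(labels).values())

print(entropy(["yes", "yes", "yes"]))  # pure class -> 0.0
print(entropy(["yes", "no"]))          # 50/50 split -> 1.0
```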
Information Gain IG(A) tells us how much uncertainty in S was reduced after splitting set S
on attribute A. The mathematical representation of information gain is:
IG(A, S) = H(S) − ∑_{t ∈ T} p(t) H(t)
Where,
● H(S) - Entropy of set S.
● T - The subsets created from splitting set S by attribute A such that
S = ⋃_{t ∈ T} t
● p(t) - The proportion of the number of elements in t to the number of elements in set
S.
● H(t) - Entropy of subset t.
In ID3, information gain can be calculated (instead of entropy) for each remaining attribute.
The attribute with the largest information gain is used to split the set S on that particular
iteration.
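The same formula can be sketched in Python. The `information_gain` function below groups the labels by the value of attribute A and subtracts the weighted subset entropies from H(S) (the names are illustrative):

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy H(S) of a collection of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """IG(A, S) = H(S) - sum over subsets t of p(t) * H(t),
    where each subset t holds the labels sharing one value of attribute A."""
    subsets = defaultdict(list)
    for value, label in zip(values, labels):
        subsets[value].append(label)
    n = len(labels)
    remainder = sum(len(t) / n * entropy(t) for t in subsets.values())
    return entropy(labels) - remainder

# A perfectly informative attribute removes all uncertainty:
wind = ["weak", "weak", "strong", "strong"]
play = ["yes", "yes", "no", "no"]
print(information_gain(wind, play))  # -> 1.0
```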
Here, the dataset has binary classes (yes and no), where 9 out of 14 examples are "yes" and 5 out of 14 are "no".
The complete entropy of the dataset, using the formula H(S) = ∑_{c ∈ C} −p(c) log2 p(c), is:
H(S) = - p(yes) * log2(p(yes)) - p(no) * log2(p(no))
= - (9/14) * log2(9/14) - (5/14) * log2(5/14)
= 0.41 + 0.53
= 0.94
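The arithmetic above can be checked directly:

```python
from math import log2

# Entropy of the full dataset: 9 "yes" and 5 "no" out of 14 examples
h_s = -(9 / 14) * log2(9 / 14) - (5 / 14) * log2(5 / 14)
print(round(h_s, 2))  # -> 0.94
```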
For each attribute of the dataset, let's follow step 2 of the pseudocode:
First Attribute - Outlook
H(Outlook=sunny) = -(2/5)*log2(2/5) - (3/5)*log2(3/5) = 0.971
H(Outlook=overcast) = -(4/4)*log2(4/4) - 0 = 0
H(Outlook=rain) = -(3/5)*log2(3/5) - (2/5)*log2(2/5) = 0.971
Average entropy information for Outlook:
I(Outlook) = (5/14)*0.971 + (4/14)*0 + (5/14)*0.971
= 0.693
Information Gain = H(S) - I(Outlook)
= 0.94 - 0.693
= 0.247
Second Attribute - Temperature
H(Temperature=hot) = -(2/4)*log2(2/4) - (2/4)*log2(2/4) = 1
H(Temperature=mild) = -(4/6)*log2(4/6) - (2/6)*log2(2/6) = 0.918
H(Temperature=cool) = -(3/4)*log2(3/4) - (1/4)*log2(1/4) = 0.811
Average entropy information for Temperature:
I(Temperature) = (4/14)*1 + (6/14)*0.918 + (4/14)*0.811
= 0.9108
Information Gain = H(S) - I(Temperature)
= 0.94 - 0.9108
= 0.0292
Third Attribute - Humidity
H(Humidity=high) = -(3/7)*log2(3/7) - (4/7)*log2(4/7) = 0.985
H(Humidity=normal) = -(6/7)*log2(6/7) - (1/7)*log2(1/7) = 0.591
Average entropy information for Humidity:
I(Humidity) = (7/14)*0.985 + (7/14)*0.591
= 0.788
Information Gain = H(S) - I(Humidity)
= 0.94 - 0.788
= 0.152
Fourth Attribute - Wind
H(Wind=weak) = -(6/8)*log2(6/8) - (2/8)*log2(2/8) = 0.811
H(Wind=strong) = -(3/6)*log2(3/6) - (3/6)*log2(3/6) = 1
Average entropy information for Wind:
I(Wind) = (8/14)*0.811 + (6/14)*1
= 0.892
Information Gain = H(S) - I(Wind)
= 0.94 - 0.892
= 0.048
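The four gains can be reproduced programmatically. The sketch below assumes the classic 14-row play-tennis dataset, whose class counts match those used above; differences in the last decimal place come from rounding intermediate entropies by hand.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    subsets = defaultdict(list)
    for v, lab in zip(values, labels):
        subsets[v].append(lab)
    n = len(labels)
    return entropy(labels) - sum(len(t) / n * entropy(t) for t in subsets.values())

# Classic play-tennis dataset (rows 1-14): outlook, temperature, humidity, wind, play
rows = [
    ("sunny",    "hot",  "high",   "weak",   "no"),
    ("sunny",    "hot",  "high",   "strong", "no"),
    ("overcast", "hot",  "high",   "weak",   "yes"),
    ("rain",     "mild", "high",   "weak",   "yes"),
    ("rain",     "cool", "normal", "weak",   "yes"),
    ("rain",     "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny",    "mild", "high",   "weak",   "no"),
    ("sunny",    "cool", "normal", "weak",   "yes"),
    ("rain",     "mild", "normal", "weak",   "yes"),
    ("sunny",    "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high",   "strong", "yes"),
    ("overcast", "hot",  "normal", "weak",   "yes"),
    ("rain",     "mild", "high",   "strong", "no"),
]
labels = [r[4] for r in rows]
gains = {name: information_gain([r[i] for r in rows], labels)
         for i, name in enumerate(["Outlook", "Temperature", "Humidity", "Wind"])}
for name, ig in gains.items():
    print(f"{name}: {ig:.3f}")
# Outlook: 0.247, Temperature: 0.029, Humidity: 0.152, Wind: 0.048
```

Outlook has the largest gain, confirming the choice of root attribute.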
Here, the attribute with the maximum information gain is Outlook, so it becomes the root of the
decision tree.
When Outlook = overcast, the subset is a pure class (all "yes").
Now, we repeat the same procedure for the rows with Outlook = Sunny, and then for the rows with
Outlook = Rain.
Now, finding the best attribute for splitting the data with Outlook = Sunny (dataset rows
[1, 2, 8, 9, 11]):
Complete entropy of the Sunny subset (2 "yes", 3 "no"):
H(S_sunny) = -(2/5)*log2(2/5) - (3/5)*log2(3/5)
= 0.971
First Attribute - Temperature
H(Temperature=hot) = -0 - (2/2)*log2(2/2) = 0
H(Temperature=mild) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1
H(Temperature=cool) = -(1/1)*log2(1/1) - 0 = 0
Average entropy information for Temperature:
I(Temperature) = (2/5)*0 + (2/5)*1 + (1/5)*0
= 0.4
Information Gain = 0.971 - 0.4
= 0.571
Second Attribute - Humidity
H(Humidity=high) = -0 - (3/3)*log2(3/3) = 0
H(Humidity=normal) = -(2/2)*log2(2/2) - 0 = 0
Average entropy information for Humidity:
I(Humidity) = (3/5)*0 + (2/5)*0
= 0
Information Gain = 0.971 - 0
= 0.971
Third Attribute - Wind
H(Wind=weak) = -(1/3)*log2(1/3) - (2/3)*log2(2/3) = 0.918
H(Wind=strong) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1
Average entropy information for Wind:
I(Wind) = (3/5)*0.918 + (2/5)*1
= 0.9508
Information Gain = 0.971 - 0.9508
= 0.0202
Here, the attribute with the maximum information gain is Humidity, so the Sunny branch splits on
Humidity. When Outlook = Sunny and Humidity = High, the subset is a pure class of category "no".
And when Outlook = Sunny and Humidity = Normal, it is again a pure class, of category "yes".
Therefore, no further calculations are needed on this branch.
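Restricting the gain computation to the five Sunny rows confirms that Humidity wins on this branch (row values assume the classic play-tennis dataset):

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    subsets = defaultdict(list)
    for v, lab in zip(values, labels):
        subsets[v].append(lab)
    n = len(labels)
    return entropy(labels) - sum(len(t) / n * entropy(t) for t in subsets.values())

# Sunny subset (rows 1, 2, 8, 9, 11): temperature, humidity, wind, play
sunny = [
    ("hot",  "high",   "weak",   "no"),
    ("hot",  "high",   "strong", "no"),
    ("mild", "high",   "weak",   "no"),
    ("cool", "normal", "weak",   "yes"),
    ("mild", "normal", "strong", "yes"),
]
labels = [r[3] for r in sunny]
for i, name in enumerate(["Temperature", "Humidity", "Wind"]):
    print(f"{name}: {information_gain([r[i] for r in sunny], labels):.3f}")
# Temperature: 0.571, Humidity: 0.971, Wind: 0.020
```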
Now, finding the best attribute for splitting the data with Outlook = Rain (dataset rows
[4, 5, 6, 10, 14]):
Complete entropy of the Rain subset (3 "yes", 2 "no"):
H(S_rain) = -(3/5)*log2(3/5) - (2/5)*log2(2/5)
= 0.971
First Attribute - Temperature
H(Temperature=mild) = -(2/3)*log2(2/3) - (1/3)*log2(1/3) = 0.918
H(Temperature=cool) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1
Average entropy information for Temperature:
I(Temperature) = (2/5)*1 + (3/5)*0.918
= 0.9508
Information Gain = 0.971 - 0.9508
= 0.0202
Second Attribute - Humidity
H(Humidity=high) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1
H(Humidity=normal) = -(2/3)*log2(2/3) - (1/3)*log2(1/3) = 0.918
Average entropy information for Humidity:
I(Humidity) = (2/5)*1 + (3/5)*0.918
= 0.9508
Information Gain = 0.971 - 0.9508
= 0.0202
Third Attribute - Wind
H(Wind=weak) = -(3/3)*log2(3/3) - 0 = 0
H(Wind=strong) = -0 - (2/2)*log2(2/2) = 0
Average entropy information for Wind:
I(Wind) = (3/5)*0 + (2/5)*0
= 0
Information Gain = 0.971 - 0
= 0.971
Here, the attribute with the maximum information gain is Wind, so the Rain branch splits on Wind.
When Outlook = Rain and Wind = Strong, the subset is a pure class of category "no". And when
Outlook = Rain and Wind = Weak, it is again a pure class, of category "yes".
And this is our final desired tree for the given dataset.
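Putting the pieces together, the whole walkthrough can be reproduced with a short recursive sketch of ID3 (again assuming the classic play-tennis rows; a production version would also need majority-vote leaves for impure exhausted branches and explicit tie-breaking):

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    subsets = defaultdict(list)
    for v, lab in zip(values, labels):
        subsets[v].append(lab)
    n = len(labels)
    return entropy(labels) - sum(len(t) / n * entropy(t) for t in subsets.values())

ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]

# Classic play-tennis dataset (rows 1-14); last column is the class label.
rows = [
    ("sunny",    "hot",  "high",   "weak",   "no"),
    ("sunny",    "hot",  "high",   "strong", "no"),
    ("overcast", "hot",  "high",   "weak",   "yes"),
    ("rain",     "mild", "high",   "weak",   "yes"),
    ("rain",     "cool", "normal", "weak",   "yes"),
    ("rain",     "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny",    "mild", "high",   "weak",   "no"),
    ("sunny",    "cool", "normal", "weak",   "yes"),
    ("rain",     "mild", "normal", "weak",   "yes"),
    ("sunny",    "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high",   "strong", "yes"),
    ("overcast", "hot",  "normal", "weak",   "yes"),
    ("rain",     "mild", "high",   "strong", "no"),
]

def id3(rows, attrs):
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:      # pure subset -> leaf
        return labels[0]
    if not attrs:                  # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    # split on the attribute with the largest information gain
    best = max(attrs, key=lambda a: information_gain(
        [r[ATTRS.index(a)] for r in rows], labels))
    subsets = defaultdict(list)
    for r in rows:
        subsets[r[ATTRS.index(best)]].append(r)
    rest = [a for a in attrs if a != best]
    return {best: {value: id3(subset, rest) for value, subset in subsets.items()}}

tree = id3(rows, ATTRS)
print(tree)
# The root splits on Outlook; the sunny branch splits on Humidity,
# overcast is a "yes" leaf, and the rain branch splits on Wind.
```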