
UNIT-2 (C4.5)
What is C4.5?
• The C4.5 algorithm is an improvement over the ID3 algorithm: "C" indicates that the algorithm is written in C, and 4.5 specifies the version of the algorithm.
• The splitting criterion used by C4.5 is the normalized information gain, or gain ratio (information gain itself is the difference in entropy before and after a split).
• The attribute with the highest normalized information gain is chosen as the decision node.
• GainRatio(A) = Gain(A) / SplitInfo(A)
• SplitInfo(A) = -∑ (|Dj|/|D|) · log2(|Dj|/|D|), where attribute A splits the dataset D into partitions D1, ..., Dv (a short Python sketch of these formulas follows below).
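These formulas translate directly into a few lines of Python. This is a minimal illustrative sketch (the names entropy and gain_ratio are my own, not from the slides); it assumes class counts per partition have already been tabulated:

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts, e.g. [9, 5]."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gain_ratio(parent_counts, partitions):
    """GainRatio(A) = Gain(A) / SplitInfo(A).

    parent_counts: class counts before the split, e.g. [9, 5]
    partitions: one class-count list per value of attribute A,
                e.g. [[6, 2], [3, 3]] for Wind = Weak / Strong
    """
    total = sum(parent_counts)
    # Gain(A) = Entropy(D) - sum over j of (|Dj|/|D|) * Entropy(Dj)
    gain = entropy(parent_counts) - sum(
        (sum(p) / total) * entropy(p) for p in partitions)
    # SplitInfo(A) = -sum over j of (|Dj|/|D|) * log2(|Dj|/|D|),
    # i.e. the entropy of the partition sizes themselves
    split_info = entropy([sum(p) for p in partitions])
    return gain / split_info
```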
Example: consider a dataset of 14 instances with 9 "Yes" and 5 "No" decisions.
Entropy(Decision) = -∑ p(I) · log2 p(I) = -p(Yes) · log2 p(Yes) - p(No) · log2 p(No)
= -(9/14) · log2(9/14) - (5/14) · log2(5/14) = 0.940
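The entropy helper from the sketch above reproduces this value:

```python
print(entropy([9, 5]))  # ≈ 0.940 (9 Yes, 5 No out of 14)
```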
Here, we need to calculate gain ratios instead of gains.
GainRatio(A) = Gain(A) / SplitInfo(A)
SplitInfo(A) = -∑ (|Dj|/|D|) · log2(|Dj|/|D|)
Let's calculate the gain ratio for the Wind attribute. There are 8 instances with weak wind (6 Yes, 2 No) and 6 instances with strong wind (3 Yes, 3 No).
Gain(Decision, Wind) = Entropy(Decision) - ∑ p(Decision|Wind=v) · Entropy(Decision|Wind=v)
= Entropy(Decision) - [ p(Decision|Wind=Weak) · Entropy(Decision|Wind=Weak) + p(Decision|Wind=Strong) · Entropy(Decision|Wind=Strong) ]
Entropy(Decision|Wind=Weak) = -p(No) · log2 p(No) - p(Yes) · log2 p(Yes) = -(2/8) · log2(2/8) - (6/8) · log2(6/8) = 0.811
Entropy(Decision|Wind=Strong) = -(3/6) · log2(3/6) - (3/6) · log2(3/6) = 1
Gain(Decision, Wind) = 0.940 - (8/14)(0.811) - (6/14)(1) = 0.940 - 0.463 - 0.428 = 0.049
SplitInfo(Decision, Wind) = -(8/14) · log2(8/14) - (6/14) · log2(6/14) = 0.461 + 0.524 = 0.985
GainRatio(Decision, Wind) = Gain(Decision, Wind) / SplitInfo(Decision, Wind) = 0.049 / 0.985 = 0.049
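Plugging the Wind counts into the gain_ratio sketch above reproduces these numbers:

```python
parent = [9, 5]          # 9 Yes, 5 No overall
wind = [[6, 2], [3, 3]]  # Weak: 6 Yes / 2 No, Strong: 3 Yes / 3 No
print(gain_ratio(parent, wind))  # ≈ 0.049
```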
Similarly, calculate the gain ratios for the Outlook, Humidity, and Temperature attributes.

The attribute with the highest gain ratio will be selected as the root node.
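Root selection then reduces to taking the attribute with the maximum gain ratio. A sketch under the assumption that class counts per attribute value have been tabulated from the dataset (only the Wind partition appears in these slides; the others would be filled in the same way):

```python
# Class-count partitions per attribute value; only Wind is given in the slides.
partitions_by_attribute = {
    "Wind": [[6, 2], [3, 3]],
    # "Outlook": [...], "Humidity": [...], "Temperature": [...]
}

root = max(partitions_by_attribute,
           key=lambda a: gain_ratio([9, 5], partitions_by_attribute[a]))
print(root)  # the attribute with the highest gain ratio
```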


Thank you!
