The ID3 algorithm is a decision tree method introduced in 1986 that transforms raw data into rule-based decision trees. It calculates entropy and information gain to determine the best attributes for decision-making, using examples such as weather conditions to predict outcomes. The document also mentions an extended version, C4.5, which supports numerical features and utilizes gain ratio instead of information gain.
• Decision tree algorithms transform raw data into rule-based decision-making trees. ID3 is one of the most common decision tree algorithms. It was introduced in 1986, and the name stands for Iterative Dichotomiser 3.

| Day | Outlook | Temp. | Humidity | Wind | Decision |
|---|---|---|---|---|---|
| 1 | Sunny | Hot | High | Weak | No |
| 2 | Sunny | Hot | High | Strong | No |
| 3 | Overcast | Hot | High | Weak | Yes |
| 4 | Rain | Mild | High | Weak | Yes |
| 5 | Rain | Cool | Normal | Weak | Yes |
| 6 | Rain | Cool | Normal | Strong | No |
| 7 | Overcast | Cool | Normal | Strong | Yes |
| 8 | Sunny | Mild | High | Weak | No |
| 9 | Sunny | Cool | Normal | Weak | Yes |
| 10 | Rain | Mild | Normal | Weak | Yes |
| 11 | Sunny | Mild | Normal | Strong | Yes |
| 12 | Overcast | Mild | High | Strong | Yes |
| 13 | Overcast | Hot | Normal | Weak | Yes |
| 14 | Rain | Mild | High | Strong | No |

• We can summarize the ID3 algorithm as illustrated below
• Entropy(S) = ∑ – p(I) . log2p(I)
• Gain(S, A) = Entropy(S) – ∑ [ p(S|A) . Entropy(S|A) ]

Entropy
• We need to calculate the entropy first. The Decision column consists of 14 instances and includes two labels: yes and no. There are 9 decisions labeled yes and 5 decisions labeled no.
• Entropy(Decision) = – p(Yes) . log2p(Yes) – p(No) . log2p(No)
• Entropy(Decision) = – (9/14) . log2(9/14) – (5/14) . log2(5/14) = 0.940
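The entropy figure above can be checked with a few lines of Python. This is a minimal sketch rather than part of any ID3 library: the function name `entropy` and the list `decisions` are hypothetical, and the labels are simply the Decision column of the 14-day table typed in by hand.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Base-2 entropy of a sequence of class labels.

    Uses sum(p * log2(1/p)), which equals sum(-p * log2(p)).
    """
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


# Decision column for days 1..14 (9 "Yes", 5 "No"), copied from the table.
decisions = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
             "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

print(round(entropy(decisions), 3))
```

A column with a single label gives an entropy of 0, and a column split evenly between two labels gives 1, which are the two extremes used later in the wind calculation.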
Wind factor on decision
• Gain(Decision, Wind) = Entropy(Decision) – ∑ [ p(Decision|Wind) . Entropy(Decision|Wind) ]
• Wind attribute has two labels: weak and strong. We would reflect it to the formula.
• Gain(Decision, Wind) = Entropy(Decision) – [ p(Decision|Wind=Weak) . Entropy(Decision|Wind=Weak) ] – [ p(Decision|Wind=Strong) . Entropy(Decision|Wind=Strong) ]
• Now, we need to calculate Entropy(Decision|Wind=Weak) and Entropy(Decision|Wind=Strong) respectively.

Weak wind factor on decision

| Day | Outlook | Temp. | Humidity | Wind | Decision |
|---|---|---|---|---|---|
| 1 | Sunny | Hot | High | Weak | No |
| 3 | Overcast | Hot | High | Weak | Yes |
| 4 | Rain | Mild | High | Weak | Yes |
| 5 | Rain | Cool | Normal | Weak | Yes |
| 8 | Sunny | Mild | High | Weak | No |
| 9 | Sunny | Cool | Normal | Weak | Yes |
| 10 | Rain | Mild | Normal | Weak | Yes |
| 13 | Overcast | Hot | Normal | Weak | Yes |

1- Entropy(Decision|Wind=Weak) = – p(No) . log2p(No) – p(Yes) . log2p(Yes)
2- Entropy(Decision|Wind=Weak) = – (2/8) . log2(2/8) – (6/8) . log2(6/8) = 0.811

Strong wind factor on decision

| Day | Outlook | Temp. | Humidity | Wind | Decision |
|---|---|---|---|---|---|
| 2 | Sunny | Hot | High | Strong | No |
| 6 | Rain | Cool | Normal | Strong | No |
| 7 | Overcast | Cool | Normal | Strong | Yes |
| 11 | Sunny | Mild | Normal | Strong | Yes |
| 12 | Overcast | Mild | High | Strong | Yes |
| 14 | Rain | Mild | High | Strong | No |

Here, there are 6 instances for strong wind. Decision is divided into two equal parts.
1- Entropy(Decision|Wind=Strong) = – p(No) . log2p(No) – p(Yes) . log2p(Yes)
2- Entropy(Decision|Wind=Strong) = – (3/6) . log2(3/6) – (3/6) . log2(3/6) = 1

Now, we can turn back to the Gain(Decision, Wind) equation.
Gain(Decision, Wind) = Entropy(Decision) – [ p(Decision|Wind=Weak) . Entropy(Decision|Wind=Weak) ] – [ p(Decision|Wind=Strong) . Entropy(Decision|Wind=Strong) ] = 0.940 – [ (8/14) . 0.811 ] – [ (6/14) . 1 ] = 0.048

Other factors on decision
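Repeating the wind calculation for each column is mechanical, so it lends itself to a short script. The sketch below is only an illustration under stated assumptions: `COLUMNS`, `DATA`, `entropy` and `gain` are hypothetical names, and the rows are the 14-day table typed in by hand.

```python
from collections import Counter, defaultdict
from math import log2

COLUMNS = ["Outlook", "Temp", "Humidity", "Wind", "Decision"]

# Days 1..14 of the table, one row per line, in the column order above.
DATA = [line.split() for line in """
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
""".strip().splitlines()]


def entropy(labels):
    """Base-2 entropy of a sequence of class labels."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


def gain(rows, attribute, target="Decision"):
    """Information gain of `attribute` with respect to `target`."""
    a, t = COLUMNS.index(attribute), COLUMNS.index(target)
    # Group the target labels by the value the attribute takes in each row.
    groups = defaultdict(list)
    for row in rows:
        groups[row[a]].append(row[t])
    # Weighted entropy left over after splitting on the attribute.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[t] for row in rows]) - remainder


for name in COLUMNS[:-1]:
    print(name, round(gain(DATA, name), 3))
```

The script rounds to three decimals, so a printed value can differ by one in the last digit from a figure that was cut off after three decimals by hand.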
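• We have applied similar calculations to the other columns.
• 1- Gain(Decision, Outlook) = 0.246
• 2- Gain(Decision, Temperature) = 0.029
• 3- Gain(Decision, Humidity) = 0.151
• Outlook produces the highest score, so it is placed in the root node of the tree.

Overcast outlook on decision

| Day | Outlook | Temp. | Humidity | Wind | Decision |
|---|---|---|---|---|---|
| 3 | Overcast | Hot | High | Weak | Yes |
| 7 | Overcast | Cool | Normal | Strong | Yes |
| 12 | Overcast | Mild | High | Strong | Yes |
| 13 | Overcast | Hot | Normal | Weak | Yes |

All four instances with overcast outlook are labeled yes.

Sunny outlook on decision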
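| Day | Outlook | Temp. | Humidity | Wind | Decision |
|---|---|---|---|---|---|
| 1 | Sunny | Hot | High | Weak | No |
| 2 | Sunny | Hot | High | Strong | No |
| 8 | Sunny | Mild | High | Weak | No |
| 9 | Sunny | Cool | Normal | Weak | Yes |
| 11 | Sunny | Mild | Normal | Strong | Yes |

Here, there are 5 instances for sunny outlook. The decision is no with probability 3/5 and yes with probability 2/5.
1- Gain(Outlook=Sunny|Temperature) = 0.570
2- Gain(Outlook=Sunny|Humidity) = 0.970
3- Gain(Outlook=Sunny|Wind) = 0.019
Humidity produces the highest score, so we need to check the humidity attribute in the 2nd level if outlook were sunny.

Rain outlook on decision

| Day | Outlook | Temp. | Humidity | Wind | Decision |
|---|---|---|---|---|---|
| 4 | Rain | Mild | High | Weak | Yes |
| 5 | Rain | Cool | Normal | Weak | Yes |
| 6 | Rain | Cool | Normal | Strong | No |
| 10 | Rain | Mild | Normal | Weak | Yes |
| 14 | Rain | Mild | High | Strong | No |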
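The second-level scores are obtained the same way as the first-level ones, after first keeping only the rows that share one outlook value. The following is a minimal sketch under that assumption; `branch_gains`, `gain`, `entropy`, `COLUMNS` and `DATA` are hypothetical names, and the rows are the 14-day table typed in by hand.

```python
from collections import Counter, defaultdict
from math import log2

COLUMNS = ["Outlook", "Temp", "Humidity", "Wind", "Decision"]

# Days 1..14 of the table, one row per line, in the column order above.
DATA = [line.split() for line in """
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
""".strip().splitlines()]


def entropy(labels):
    """Base-2 entropy of a sequence of class labels."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


def gain(rows, attribute, target="Decision"):
    """Information gain of `attribute` with respect to `target`."""
    a, t = COLUMNS.index(attribute), COLUMNS.index(target)
    groups = defaultdict(list)
    for row in rows:
        groups[row[a]].append(row[t])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[t] for row in rows]) - remainder


def branch_gains(rows, outlook):
    """Gain of each remaining attribute inside one outlook branch."""
    subset = [row for row in rows if row[COLUMNS.index("Outlook")] == outlook]
    return {name: gain(subset, name) for name in ("Temp", "Humidity", "Wind")}


for outlook in ("Sunny", "Rain"):
    scores = branch_gains(DATA, outlook)
    best = max(scores, key=scores.get)
    print(outlook, {k: round(v, 3) for k, v in scores.items()}, "best:", best)
```

With this data the best attribute should come out as Humidity for the sunny branch and Wind for the rain branch. Every gain in the overcast branch is 0, because all of its decisions are already yes.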
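Here, wind produces the highest score if outlook were rain. That’s why we need to check the wind attribute in the 2nd level if outlook were rain.

So, it is revealed that the decision will always be yes if wind were weak and outlook were rain.

| Day | Outlook | Temp. | Humidity | Wind | Decision |
|---|---|---|---|---|---|
| 4 | Rain | Mild | High | Weak | Yes |
| 5 | Rain | Cool | Normal | Weak | Yes |
| 10 | Rain | Mild | Normal | Weak | Yes |

Likewise, the decision will always be no if wind were strong and outlook were rain.

| Day | Outlook | Temp. | Humidity | Wind | Decision |
|---|---|---|---|---|---|
| 6 | Rain | Cool | Normal | Strong | No |
| 14 | Rain | Mild | High | Strong | No |

What’s next?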
The extended version of ID3 is C4.5. It supports numerical features and uses gain ratio instead of information gain.
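Gain ratio is commonly defined as the information gain divided by the split information, that is, the entropy of the attribute's own value distribution. The sketch below illustrates only that ratio on the 14-day table. It is not an implementation of C4.5, it does not cover numerical features, and `gain_ratio`, `gain`, `entropy`, `COLUMNS` and `DATA` are hypothetical names.

```python
from collections import Counter, defaultdict
from math import log2

COLUMNS = ["Outlook", "Temp", "Humidity", "Wind", "Decision"]

# Days 1..14 of the table, one row per line, in the column order above.
DATA = [line.split() for line in """
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
""".strip().splitlines()]


def entropy(labels):
    """Base-2 entropy of a sequence of class labels."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


def gain(rows, attribute, target="Decision"):
    """Information gain of `attribute` with respect to `target`."""
    a, t = COLUMNS.index(attribute), COLUMNS.index(target)
    groups = defaultdict(list)
    for row in rows:
        groups[row[a]].append(row[t])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[t] for row in rows]) - remainder


def gain_ratio(rows, attribute, target="Decision"):
    """Information gain divided by the split information of `attribute`."""
    a = COLUMNS.index(attribute)
    # Split information: entropy of the attribute's own values.
    split_info = entropy([row[a] for row in rows])
    if split_info == 0:
        # Assumption: an attribute with a single value cannot split the data.
        return 0.0
    return gain(rows, attribute, target) / split_info


for name in COLUMNS[:-1]:
    print(name, round(gain_ratio(DATA, name), 3))
```

Dividing by the split information penalizes attributes that break the data into many small groups. Humidity splits the table into two equal halves, so its split information is 1 and its gain ratio equals its information gain.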