ID3 Algorithm

The ID3 algorithm is a decision tree method introduced in 1986 that transforms raw data into rule-based decision trees. It calculates entropy and information gain to determine the best attributes for decision-making, using examples such as weather conditions to predict outcomes. The document also mentions an extended version, C4.5, which supports numerical features and utilizes gain ratio instead of information gain.

Uploaded by

archanashrma6266

ID3 Algorithm

• Decision tree algorithms transform raw data into rule-based decision
trees. ID3 is one of the most common decision tree algorithms. It was
introduced in 1986, and its name stands for Iterative Dichotomiser 3.
• The data set below records the weather conditions on 14 days together
with the decision made on each day.
Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No
• We can summarize the ID3 algorithm with the two formulas below, where
p(I) is the proportion of instances in S that belong to class I.
• Entropy(S) = ∑ – p(I) . log2 p(I)
• Gain(S, A) = Entropy(S) – ∑ [ p(S|A) . Entropy(S|A) ]
Entropy
• We need to calculate the entropy first. The Decision column consists of
14 instances and includes two labels: yes and no. There are 9 decisions
labeled yes and 5 decisions labeled no.
• Entropy(Decision) = – p(Yes) . log2 p(Yes) – p(No) . log2 p(No)
• Entropy(Decision) = – (9/14) . log2(9/14) – (5/14) . log2(5/14) = 0.940
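The entropy calculation above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `entropy` and the list `decisions` (the Decision column for days 1 to 14) are illustrative choices.

```python
import math
from collections import Counter

def entropy(labels):
    """Base-2 entropy of a list of class labels: sum of -p(I) * log2 p(I)."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

# Decision column of the 14-day table, in day order (9 Yes, 5 No).
decisions = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
             "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

print(f"{entropy(decisions):.3f}")  # 0.940
```

A column with a single label has entropy 0, and a column split evenly between two labels has entropy 1, which matches the strong-wind case later in the walkthrough.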
Wind factor on decision

• Gain(Decision, Wind) = Entropy(Decision) – ∑ [ p(Decision|Wind) .
Entropy(Decision|Wind) ]
• The Wind attribute has two labels: weak and strong. Substituting them
into the formula gives:
• Gain(Decision, Wind) = Entropy(Decision) – [ p(Decision|Wind=Weak)
. Entropy(Decision|Wind=Weak) ] – [ p(Decision|Wind=Strong) .
Entropy(Decision|Wind=Strong) ]
• Now, we need to calculate Entropy(Decision|Wind=Weak)
and Entropy(Decision|Wind=Strong) respectively.
Weak wind factor on decision
Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
13 Overcast Hot Normal Weak Yes
1- Entropy(Decision|Wind=Weak) = – p(No) . log2 p(No) – p(Yes) . log2 p(Yes)
2- Entropy(Decision|Wind=Weak) = – (2/8) . log2(2/8) – (6/8) . log2(6/8) = 0.811
Strong wind factor on decision
Day Outlook Temp. Humidity Wind Decision
2 Sunny Hot High Strong No
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
14 Rain Mild High Strong No
Here, there are 6 instances for strong wind. Decision is divided into two
equal parts.
1- Entropy(Decision|Wind=Strong) = – p(No) . log2 p(No) – p(Yes) . log2 p(Yes)
2- Entropy(Decision|Wind=Strong) = – (3/6) . log2(3/6) – (3/6) . log2(3/6) = 1
Now, we can turn back to the Gain(Decision, Wind) equation.
Gain(Decision, Wind) = Entropy(Decision) – [ p(Decision|Wind=Weak) .
Entropy(Decision|Wind=Weak) ] – [ p(Decision|Wind=Strong) .
Entropy(Decision|Wind=Strong) ] = 0.940 – [ (8/14) . 0.811 ] – [ (6/14) . 1 ] = 0.048
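The Gain(Decision, Wind) calculation can be checked in code. The sketch below is illustrative only: `information_gain`, `wind` and `decision` are names chosen here, and the two lists hold the Wind and Decision columns for days 1 to 14.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Base-2 entropy of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def information_gain(values, labels):
    """Entropy of labels minus the weighted entropy of each attribute value's subset."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Wind and Decision columns of the 14-day table, in day order.
wind = ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
        "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"]
decision = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
            "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

print(f"{information_gain(wind, decision):.3f}")  # 0.048
```

Grouping the labels by attribute value gives exactly the two sub-tables above: 8 weak-wind rows and 6 strong-wind rows.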
Other factors on decision

• We have applied similar calculations to the other columns.
• 1- Gain(Decision, Outlook) = 0.246
• 2- Gain(Decision, Temperature) = 0.029
• 3- Gain(Decision, Humidity) = 0.151
• Outlook produces the highest gain (wind scored 0.048), so outlook
becomes the root node of the decision tree.
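Comparing the gains of all four columns is the step that selects the attribute to split on. As a rough sketch (the names `COLUMNS`, `ROWS` and `gain` are illustrative, and each row is the table row without the day number), the comparison could look like this:

```python
import math
from collections import Counter, defaultdict

COLUMNS = ["Outlook", "Temp.", "Humidity", "Wind"]
# (Outlook, Temp., Humidity, Wind, Decision) for days 1 to 14.
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def gain(rows, col):
    """Information gain of splitting rows on the column with index col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - weighted

gains = {name: gain(ROWS, i) for i, name in enumerate(COLUMNS)}
best = max(gains, key=gains.get)
print(best)  # Outlook
```

The computed values agree with the figures listed above up to the third decimal place; the slides appear to truncate rather than round.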
Overcast outlook on decision
Day Outlook Temp. Humidity Wind Decision
3 Overcast Hot High Weak Yes
7 Overcast Cool Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
Here, the decision is always yes if outlook is overcast, so this branch
needs no further split.
Sunny outlook on decision
Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
11 Sunny Mild Normal Strong Yes
Here, there are 5 instances for sunny outlook. The decision is no with
probability 3/5 and yes with probability 2/5.
1- Gain(Outlook=Sunny|Temperature) = 0.570
2- Gain(Outlook=Sunny|Humidity) = 0.970
3- Gain(Outlook=Sunny|Wind) = 0.019
Humidity produces the highest score if outlook is sunny, so we check the
humidity attribute at the 2nd level. The decision is always no if humidity
is high and always yes if humidity is normal.
Rain outlook on decision
Day Outlook Temp. Humidity Wind Decision
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
10 Rain Mild Normal Weak Yes
14 Rain Mild High Strong No
1- Gain(Outlook=Rain|Temperature) = 0.019
2- Gain(Outlook=Rain|Humidity) = 0.019
3- Gain(Outlook=Rain|Wind) = 0.970
Here, wind produces the highest score if outlook is rain. That is why we
need to check the wind attribute at the 2nd level if outlook is rain.
The decision is always yes if wind is weak and outlook is rain.
Day Outlook Temp. Humidity Wind Decision
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes

Likewise, the decision is always no if wind is strong and outlook is rain.
Day Outlook Temp. Humidity Wind Decision
6 Rain Cool Normal Strong No
14 Rain Mild High Strong No
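The whole walkthrough, picking the attribute with the highest gain and then repeating the calculation inside each branch, can be sketched as a short recursive function. This is an illustrative sketch rather than a reference implementation: the names `id3`, `gain`, `COLUMNS` and `ROWS` are chosen here, the tree is returned as nested dictionaries, and falling back to the majority label when no attributes remain is an assumption the slides do not cover.

```python
import math
from collections import Counter, defaultdict

COLUMNS = ["Outlook", "Temp.", "Humidity", "Wind"]
# (Outlook, Temp., Humidity, Wind, Decision) for days 1 to 14.
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def gain(rows, col):
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - weighted

def id3(rows, cols):
    """Return a label (leaf) or {attribute name: {value: subtree}}."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:      # pure subset: stop splitting
        return labels[0]
    if not cols:                   # assumption: majority vote if attributes run out
        return Counter(labels).most_common(1)[0][0]
    best = max(cols, key=lambda c: gain(rows, c))
    rest = [c for c in cols if c != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = id3(subset, rest)
    return {COLUMNS[best]: branches}

tree = id3(ROWS, list(range(len(COLUMNS))))
print(tree)
```

On this data set the result has Outlook at the root, a Yes leaf for overcast, a Humidity split under sunny and a Wind split under rain, which matches the sub-tables above.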
What’s next?

C4.5 is an extended version of ID3. It supports numerical features and
uses gain ratio instead of information gain.
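The slides do not define gain ratio. Under the commonly cited definition, which is an assumption here, it is the information gain divided by the split information, where split information is the entropy of the attribute's own value distribution. A minimal sketch for the Wind column, with illustrative names `gain_ratio`, `wind` and `decision`:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def information_gain(values, labels):
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

def gain_ratio(values, labels):
    """Information gain divided by split information (assumed C4.5 definition)."""
    split_info = entropy(values)   # entropy of the attribute values themselves
    if split_info == 0:            # attribute has a single value: nothing to split
        return 0.0
    return information_gain(values, labels) / split_info

# Wind and Decision columns of the 14-day table, in day order.
wind = ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
        "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"]
decision = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
            "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

print(f"{gain_ratio(wind, decision):.3f}")
```

Dividing by split information penalizes attributes with many distinct values, which plain information gain tends to favor. Handling of numerical features is not shown in this sketch.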
