C4.5 Algorithm decision tree

The document outlines the Decision Tree Algorithm (C4.5) using the PlayTennis dataset as an example. It details the steps to calculate entropy, information gain, and gain ratios for attributes like Outlook, Temperature, Humidity, and Wind to determine the best attribute for splitting the data. The final decision tree structure is derived based on these calculations to predict whether to play tennis or not.

Uploaded by Rohan Malik

Decision Tree Algorithm (C4.5)
Training Examples for PlayTennis
(Mitchell 1997)

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
Decision Tree for PlayTennis
(Mitchell 1997)

Outlook
├─ Sunny → Humidity
│    ├─ High → No
│    └─ Normal → Yes
├─ Overcast → Yes
└─ Rain → Wind
     ├─ Strong → No
     └─ Weak → Yes
Decision Tree Algorithm (C4.5)

All the equations of the C4.5 algorithm are as follows.

Calculate Info(S) (the entropy) to identify the class in the training set S:

    Info(S) = - Σ_{i=1..k} [freq(C_i, S) / |S|] * log2( freq(C_i, S) / |S| )

where |S| is the number of cases in the training set, C_i is a class, i = 1, 2, ..., k, k is the number of classes, and freq(C_i, S) is the number of cases belonging to class C_i.

(For PlayTennis: 9 Yes, 5 No.)
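As a quick sketch (not part of the original slides), the Info(S) formula can be computed in Python; the helper name `info` is my own:

```python
import math

def info(counts):
    """Info(S): entropy of a training set, given the number of cases
    freq(C_i, S) in each class as a list of counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total)
                for c in counts if c > 0)

# PlayTennis training set: 9 Yes and 5 No cases
print(round(info([9, 5]), 2))  # → 0.94
```

Zero-count classes contribute nothing to the sum, which is why the generator skips them rather than evaluating log2(0).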
Decision Tree Algorithm (C4.5)

Calculate the expected information value, Info_X(S), for feature X to partition S:

    Info_X(S) = Σ_{j=1..L} (|S_j| / |S|) * Info(S_j)

where L is the number of outcomes of feature X, S_j is the subset of S corresponding to the j-th outcome, and |S_j| is the number of cases in subset S_j.
Decision Tree Algorithm (C4.5)

Calculate the information gained after partitioning according to feature X:

    Gain(X) = Info(S) - Info_X(S)

Calculate the partition information value, SplitInfo(X), acquired for S partitioned into L subsets:

    SplitInfo(X) = - Σ_{j=1..L} (|S_j| / |S|) * log2( |S_j| / |S| )
Decision Tree Algorithm (C4.5)

Calculate the gain ratio of Gain(X) over SplitInfo(X):

    GainRatio(X) = Gain(X) / SplitInfo(X)

The gain ratio reduces the probability of choosing a node with many attribute values. If two attributes tie for the best gain ratio, either one can be chosen at random.
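Putting the four formulas together, here is a minimal Python sketch (my own illustration, not from the slides; all function names are assumptions). Each split is described as a list of per-class count lists, one list per outcome of X:

```python
import math

def info(counts):
    """Entropy of a class distribution, given per-class case counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total)
                for c in counts if c > 0)

def info_x(subsets):
    """Expected information Info_X(S) after splitting on feature X."""
    total = sum(sum(s) for s in subsets)
    return sum((sum(s) / total) * info(s) for s in subsets)

def gain(subsets):
    """Gain(X) = Info(S) - Info_X(S)."""
    overall = [sum(col) for col in zip(*subsets)]
    return info(overall) - info_x(subsets)

def split_info(subsets):
    """SplitInfo(X): the entropy of the subset sizes themselves."""
    return info([sum(s) for s in subsets])

def gain_ratio(subsets):
    """GainRatio(X) = Gain(X) / SplitInfo(X)."""
    return gain(subsets) / split_info(subsets)

# Outlook partitions PlayTennis into Sunny (2 Yes, 3 No),
# Overcast (4 Yes, 0 No) and Rain (3 Yes, 2 No):
outlook = [[2, 3], [4, 0], [3, 2]]
print(round(gain(outlook), 3))        # → 0.247
print(round(split_info(outlook), 3))  # → 1.577
print(round(gain_ratio(outlook), 3))
```

Computed without intermediate rounding, the gain ratio is about 0.156; dividing the rounded values 0.247 / 1.577 gives the 0.157 used later in the slides.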
Decision Tree Algorithm (C4.5)

Step 1

Decide which attribute to consider first: Outlook? Temperature? Humidity? Wind?

Use entropy. For a two-class set with proportions P+ and P-:

    Entropy = - P+ log2(P+) - P- log2(P-)

At first we have 9 Yes and 5 No, so the starting entropy is

    - 9/14 log2(9/14) - 5/14 log2(5/14) = 0.94

This is Info(S) = - Σ_{i=1..k} [freq(C_i, S) / |S|] * log2( freq(C_i, S) / |S| ).
Entropy (Info)

[Figure: the binary entropy curve. Entropy(S) on the vertical axis runs from 0.0 to 1.0, plotted against the proportion of positive examples from 0.0 to 1.0; it is 0 for a pure set and peaks at 1.0 when the classes are evenly split.]
Decision Tree Algorithm (C4.5)

Step 2
Compute the gain ratio for each attribute

Information gain = (entropy before the split) - (expected entropy after the split); we want the attribute that maximizes it.

[Figure: candidate splits Humidity (High / Normal) and Outlook (Sunny / Overcast / Rain).]
Decision Tree Algorithm (C4.5)

Step 2 (Cont'd)

Compute the gain ratio for each attribute.

[Figure: candidate splits Wind (Strong / Weak) and Temperature (Hot / Mild / Cool).]
Decision Tree Algorithm (C4.5)
Step 2 (Cont'd)

Compute the gain ratio for each attribute.

(Note: log2 X = log10 X / log10 2 = log10 X / 0.30103.)

Outlook splits S into Sunny (2 Yes, 3 No), Overcast (4 Yes, 0 No), and Rain (3 Yes, 2 No):

E_Sunny = -2/5 log2(2/5) - 3/5 log2(3/5)
        = -0.4 * (-1.3219) - 0.6 * (-0.7369)
        = 0.5288 + 0.4422 = 0.9710
E_Overcast = -4/4 log2(4/4) - 0 = -1 * log2(1) - 0 = 0
E_Rain = -3/5 log2(3/5) - 2/5 log2(2/5) = 0.9710

Info_Outlook(S) = 5/14 * 0.9710 + 4/14 * 0 + 5/14 * 0.9710 = 0.693

Gain(Outlook) = Info(S) - Info_Outlook(S) = 0.94 - 0.693 = 0.247

SplitInfo(Outlook) = -5/14 log2(5/14) - 4/14 log2(4/14) - 5/14 log2(5/14) = 1.577

Gain ratio = 0.247 / 1.577 = 0.157


Decision Tree Algorithm (C4.5)

Step 2 (Cont'd)

Compute the gain ratio for each attribute.

Humidity splits S into High (3 Yes, 4 No) and Normal (6 Yes, 1 No):

E_High = -3/7 log2(3/7) - 4/7 log2(4/7) = 0.985
E_Normal = -6/7 log2(6/7) - 1/7 log2(1/7) = 0.592

Info gain = 0.94 - 7/14 * 0.985 - 7/14 * 0.592 = 0.151
Split info = -7/14 log2(7/14) - 7/14 log2(7/14) = 1
Gain ratio = 0.151 / 1 = 0.151
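The Humidity arithmetic can be checked in a few lines of Python (my own sketch, not from the slides; `H` is a hypothetical helper for the entropy of a probability list):

```python
import math

def H(*probs):
    """Entropy of a probability distribution given as separate arguments."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Humidity: High has 3 Yes / 4 No, Normal has 6 Yes / 1 No
gain = H(9/14, 5/14) - (7/14) * H(3/7, 4/7) - (7/14) * H(6/7, 1/7)
split = H(7/14, 7/14)
print(round(gain, 2), round(split, 2))  # → 0.15 1.0
```

The exact gain is about 0.152; the slides' 0.151 comes from rounding E_High and E_Normal before taking the weighted sum.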
Decision Tree Algorithm (C4.5)

Step 2 (Cont'd)

Compute the gain ratio for each attribute.

Temperature splits S into Hot (2 Yes, 2 No), Mild (4 Yes, 2 No), and Cool (3 Yes, 1 No):

E_Hot = -2/4 log2(2/4) - 2/4 log2(2/4) = 1
E_Mild = -4/6 log2(4/6) - 2/6 log2(2/6) = 0.9182
E_Cool = -3/4 log2(3/4) - 1/4 log2(1/4) = 0.8112

Info gain = 0.94 - 4/14 * 1 - 6/14 * 0.9182 - 4/14 * 0.8112 = 0.0292
Split info = -4/14 log2(4/14) - 6/14 log2(6/14) - 4/14 log2(4/14) = 1.556
Gain ratio = 0.0292 / 1.556 = 0.01876
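The same quick check for Temperature (my own sketch; `H` is a hypothetical entropy helper as before):

```python
import math

def H(*probs):
    """Entropy of a probability distribution given as separate arguments."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Temperature: Hot 2 Yes / 2 No, Mild 4 Yes / 2 No, Cool 3 Yes / 1 No
gain = (H(9/14, 5/14) - 4/14 * H(2/4, 2/4)
        - 6/14 * H(4/6, 2/6) - 4/14 * H(3/4, 1/4))
split = H(4/14, 6/14, 4/14)
print(round(gain, 4), round(split, 3), round(gain / split, 4))
```

This prints roughly 0.0292, 1.557, and 0.0188; the slides' 1.556 and 0.01876 differ only in how intermediate values were rounded.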
Decision Tree Algorithm (C4.5)

Step 2 (Cont'd)

Compute the gain ratio for each attribute.

Wind splits S into Strong (3 Yes, 3 No) and Weak (6 Yes, 2 No):

E_Strong = -3/6 log2(3/6) - 3/6 log2(3/6) = 1
E_Weak = -6/8 log2(6/8) - 2/8 log2(2/8) = 0.8112

Info gain = 0.94 - 6/14 * 1 - 8/14 * 0.8112 = 0.048
Split info = -6/14 log2(6/14) - 8/14 log2(8/14) = 0.9852
Gain ratio = 0.048 / 0.9852 = 0.049
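And the check for Wind (my own sketch; `H` is the same hypothetical entropy helper):

```python
import math

def H(*probs):
    """Entropy of a probability distribution given as separate arguments."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Wind: Strong has 3 Yes / 3 No, Weak has 6 Yes / 2 No
gain = H(9/14, 5/14) - 6/14 * H(3/6, 3/6) - 8/14 * H(6/8, 2/8)
split = H(6/14, 8/14)
print(round(gain, 3), round(split, 4), round(gain / split, 3))  # → 0.048 0.9852 0.049
```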
Decision Tree Algorithm (C4.5)

Step 2 (Cont'd)

Summary of gain ratios:

Gain ratio (Outlook)     = 0.157
Gain ratio (Humidity)    = 0.151
Gain ratio (Wind)        = 0.049
Gain ratio (Temperature) = 0.01876

Outlook has the largest gain ratio, so the root node is Outlook.
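The whole of Step 2 can be reproduced directly from the raw PlayTennis table. This is my own sketch (function names and the `COLS` mapping are assumptions, the data are from the table above):

```python
import math
from collections import Counter

# (Day, Outlook, Temperature, Humidity, Wind, PlayTennis)
DATA = [
    ("D1", "Sunny", "Hot", "High", "Weak", "No"),
    ("D2", "Sunny", "Hot", "High", "Strong", "No"),
    ("D3", "Overcast", "Hot", "High", "Weak", "Yes"),
    ("D4", "Rain", "Mild", "High", "Weak", "Yes"),
    ("D5", "Rain", "Cool", "Normal", "Weak", "Yes"),
    ("D6", "Rain", "Cool", "Normal", "Strong", "No"),
    ("D7", "Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("D8", "Sunny", "Mild", "High", "Weak", "No"),
    ("D9", "Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("D10", "Rain", "Mild", "Normal", "Weak", "Yes"),
    ("D11", "Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("D12", "Overcast", "Mild", "High", "Strong", "Yes"),
    ("D13", "Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("D14", "Rain", "Mild", "High", "Strong", "No"),
]
COLS = {"Outlook": 1, "Temperature": 2, "Humidity": 3, "Wind": 4}

def info(counts):
    """Entropy of a class distribution given per-class case counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def gain_ratio(rows, attr):
    """GainRatio of splitting rows on the named attribute."""
    col = COLS[attr]
    n = len(rows)
    start = info(list(Counter(r[-1] for r in rows).values()))
    expected = split = 0.0
    for v in set(r[col] for r in rows):
        subset = [r for r in rows if r[col] == v]
        w = len(subset) / n
        expected += w * info(list(Counter(r[-1] for r in subset).values()))
        split += -w * math.log2(w)
    return (start - expected) / split

ratios = {a: round(gain_ratio(DATA, a), 3) for a in COLS}
print(ratios)
print(max(ratios, key=ratios.get))  # → Outlook
```

Exact arithmetic gives Outlook ≈ 0.156 and Humidity ≈ 0.152 rather than the slides' 0.157 and 0.151 (intermediate rounding), but the ranking, and hence the choice of root, is unchanged.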
Decision Tree Algorithm (C4.5)

Step 3

Decide the other attributes under the root node.

Temperature, Humidity, or Wind may be chosen under the Sunny and Rain branches. (The Overcast branch needs no further split: every Overcast case is Yes, so no information remains to be gained.)

Outlook
├─ Sunny → ?
├─ Overcast → Yes
└─ Rain → ?
Decision Tree Algorithm (C4.5)

Step 3 (Cont'd)

Look under Outlook = Sunny.

Temperature, Humidity, or Wind may be chosen under Outlook = Sunny (2 Yes, 3 No):

E(Outlook = Sunny) = -2/5 log2(2/5) - 3/5 log2(3/5) = 0.97
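The walkthrough continues by repeating Step 2 within each impure subset; the Sunny-subset entropy above can be verified in one line (my own check, not from the slides):

```python
import math

# Outlook = Sunny covers D1, D2, D8, D9, D11: 2 Yes and 3 No
e_sunny = -(2/5) * math.log2(2/5) - (3/5) * math.log2(3/5)
print(round(e_sunny, 2))  # → 0.97
```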
