Decision Tree - Using Entropy

Decision Tree based Learning

Example

Decision tree representation (PlayTennis)

The tree classifies the instance ⟨Outlook=Sunny, Temp=Hot, Humidity=High, Wind=Strong⟩ as No.
Decision trees expressivity
• Decision trees represent a disjunction of conjunctions of constraints
on the attribute values. For the PlayTennis tree (a small code sketch
follows below):
(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)

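To make this concrete, here is a minimal Python sketch (not from the original slides) that encodes the same disjunction of conjunctions; the function name play_tennis and its string-valued arguments are illustrative assumptions.

def play_tennis(outlook: str, humidity: str, wind: str) -> bool:
    """The PlayTennis concept written as a disjunction of conjunctions,
    mirroring the formula on the slide above."""
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

# The instance from the representation slide: Sunny, High humidity, Strong wind
print(play_tennis("Sunny", "High", "Strong"))     # False, i.e. PlayTennis = No
print(play_tennis("Overcast", "High", "Strong"))  # True, i.e. PlayTennis = Yes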
Top-down induction of Decision Trees
• ID3 (Quinlan, 1986) is a basic algorithm for building decision trees.
• Given a training set of examples, the algorithm searches the space of
possible decision trees.
• The tree is constructed top-down, and the algorithm is greedy.
• The fundamental question is "which attribute should be tested next?
Which attribute gives us the most information?"
• Select the best attribute.
• A descendant node is then created for each possible value of this
attribute, and the data set is partitioned according to that value.
• The process is repeated for each successor node until all the examples
are classified correctly or there are no attributes left (a recursive
sketch of this procedure follows below).

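As a rough illustration of this procedure, the following is a minimal recursive sketch in Python. It is not Quinlan's original code; the dictionary-based tree representation, the majority-label fallback, and the choose_attribute parameter (which would pick the attribute with the highest information gain, defined on the next slides) are assumptions made for illustration.

from collections import Counter

def id3(examples, attributes, target, choose_attribute):
    """Greedy top-down construction of a decision tree (skeleton of the ID3 idea).
    examples: list of dicts mapping attribute names (and the target) to values.
    attributes: attribute names still available for testing.
    choose_attribute: selection criterion, e.g. the attribute with the
    highest information gain, as defined on the following slides."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:          # all examples classified the same way
        return labels[0]
    if not attributes:                 # no attributes left: return the majority label
        return Counter(labels).most_common(1)[0][0]
    best = choose_attribute(examples, attributes, target)
    tree = {best: {}}
    for value in sorted({ex[best] for ex in examples}):   # one descendant per value
        subset = [ex for ex in examples if ex[best] == value]
        rest = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, rest, target, choose_attribute)
    return tree

# Tiny demo with a placeholder selector (ID3 itself would pick by information gain)
toy = [{"Outlook": "Sunny", "PlayTennis": "No"},
       {"Outlook": "Overcast", "PlayTennis": "Yes"}]
print(id3(toy, ["Outlook"], "PlayTennis", lambda exs, attrs, t: attrs[0]))
# -> {'Outlook': {'Overcast': 'Yes', 'Sunny': 'No'}}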
Which attribute is the best classifier?

• A statistical property called information gain measures how well a
given attribute separates the training examples.
• Information gain uses the notion of entropy, commonly used in
information theory.
• Information gain = expected reduction in entropy.

Entropy in binary classification
• Entropy measures the impurity of a collection of examples. It depends
on the distribution of the random variable p.
• S is a collection of training examples
• p+ is the proportion of positive examples in S
• p– is the proportion of negative examples in S

Entropy(S) ≡ – p+ log2 p+ – p– log2 p–            [0 log2 0 = 0]

Entropy([14+, 0–]) = – 14/14 log2(14/14) – 0 log2 0 = 0
Entropy([9+, 5–]) = – 9/14 log2(9/14) – 5/14 log2(5/14) = 0.94
Entropy([7+, 7–]) = – 7/14 log2(7/14) – 7/14 log2(7/14) = 1/2 + 1/2 = 1   [log2(1/2) = –1]

Note: the log of a number < 1 is negative; 0 ≤ p ≤ 1 and 0 ≤ entropy ≤ 1.
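A quick numeric check of the three entropy values above, written as a small Python sketch (the helper name entropy and the count-based interface are assumptions, not part of the slides):

from math import log2

def entropy(pos, neg):
    """Entropy(S) = -p+ log2 p+ - p- log2 p-, with 0 * log2(0) taken as 0."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        p = count / total
        if p > 0:
            h -= p * log2(p)
    return h

print(entropy(14, 0))           # 0.0
print(round(entropy(9, 5), 3))  # 0.94
print(entropy(7, 7))            # 1.0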
Entropy in general
• Entropy measures the amount of information in a random variable.

H(X) = – p+ log2 p+ – p– log2 p–                  X = {+, –}
for binary classification (a two-valued random variable)

H(X) = – Σ_{i=1..c} p_i log2 p_i = Σ_{i=1..c} p_i log2(1/p_i)     X = {1, …, c}
for classification into c classes

Example: rolling a die with 8 equally probable sides
H(X) = – Σ_{i=1..8} 1/8 log2(1/8) = – log2(1/8) = log2 8 = 3
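The same computation for c classes, as a short sketch; the function name entropy_general and the list-of-probabilities interface are assumptions. The 8-sided-die example reproduces the 3 bits computed above:

from math import log2

def entropy_general(probs):
    """H(X) = -sum_i p_i log2 p_i for a distribution over c classes."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy_general([1/8] * 8))               # 3.0 bits for the 8-sided die
print(round(entropy_general([9/14, 5/14]), 2))  # 0.94, the binary case above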
Information gain as entropy reduction
• Information gain is the expected reduction in entropy caused by
partitioning the examples on an attribute.
• The higher the information gain, the more effective the attribute is
at classifying the training data.
• Expected reduction in entropy from knowing A:

Gain(S, A) = Entropy(S) – Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)

where Values(A) is the set of possible values for A, and S_v is the
subset of S for which A has value v.
Example: expected information gain
• Let
  • Values(Wind) = {Weak, Strong}
  • S = [9+, 5–]
  • S_Weak = [6+, 2–]
  • S_Strong = [3+, 3–]
• Information gain due to knowing Wind:

Gain(S, Wind) = Entropy(S) – 8/14 Entropy(S_Weak) – 6/14 Entropy(S_Strong)
             = 0.94 – 8/14 × 0.811 – 6/14 × 1.00
             = 0.048
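A small Python sketch of Gain(S, A) computed from the partition an attribute induces, reproducing the Wind example above; the label-list representation and the function names are assumptions for illustration:

from math import log2

def entropy(labels):
    """Entropy of a collection of class labels."""
    total = len(labels)
    h = 0.0
    for c in set(labels):
        p = labels.count(c) / total
        h -= p * log2(p)
    return h

def gain(parent_labels, partition):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v),
    where `partition` maps each value v of attribute A to the labels of S_v."""
    total = len(parent_labels)
    return entropy(parent_labels) - sum(
        len(sub) / total * entropy(sub) for sub in partition.values())

# S = [9+, 5-], S_Weak = [6+, 2-], S_Strong = [3+, 3-]
s = ["+"] * 9 + ["-"] * 5
wind = {"Weak": ["+"] * 6 + ["-"] * 2, "Strong": ["+"] * 3 + ["-"] * 3}
print(round(gain(s, wind), 3))  # 0.048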
First step: which attribute should be tested at the root?
• Gain(S, Outlook) = 0.246
• Gain(S, Humidity) = 0.151
• Gain(S, Wind) = 0.048
• Gain(S, Temperature) = 0.029
• Outlook provides the best prediction for the target.
• Let's grow the tree:
  • add a successor node to the tree for each possible value of Outlook
  • partition the training examples according to the value of Outlook
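A numeric check of Gain(S, Outlook), assuming the standard PlayTennis partition by Outlook (Sunny = [2+, 3–], Overcast = [4+, 0–], Rain = [3+, 2–]), which is consistent with the subsets shown on the following slides:

from math import log2

def entropy(pos, neg):
    # Binary entropy from positive/negative counts, with 0 * log2(0) taken as 0
    total = pos + neg
    return -sum(c / total * log2(c / total) for c in (pos, neg) if c > 0)

# S = [9+, 5-] is split by Outlook into Sunny [2+, 3-], Overcast [4+, 0-], Rain [3+, 2-]
subsets = [(2, 3), (4, 0), (3, 2)]
gain_outlook = entropy(9, 5) - sum((p + n) / 14 * entropy(p, n) for p, n in subsets)
print(round(gain_outlook, 3))  # 0.247 (the slide's 0.246 rounds Entropy(S) to 0.94 first)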
After first step
Second step
• Working on the Outlook=Sunny node:

Gain(S_Sunny, Humidity) = 0.970 – 3/5 × 0.0 – 2/5 × 0.0 = 0.970
Gain(S_Sunny, Wind) = 0.970 – 2/5 × 1.0 – 3/5 × 0.918 = 0.019
Gain(S_Sunny, Temp) = 0.970 – 2/5 × 0.0 – 2/5 × 1.0 – 1/5 × 0.0 = 0.570

• Humidity provides the best prediction for the target.
• Let's grow the tree:
  • add a successor node to the tree for each possible value of Humidity
  • partition the training examples according to the value of Humidity
Second and third steps

Leaves of the finished tree:
  Outlook=Sunny, Humidity=High: {D1, D2, D8} → No
  Outlook=Sunny, Humidity=Normal: {D9, D11} → Yes
  Outlook=Rain, Wind=Weak: {D4, D5, D10} → Yes
  Outlook=Rain, Wind=Strong: {D6, D14} → No
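Finally, a check of the second-step value Gain(S_Sunny, Humidity) = 0.970, using only the leaf counts above ({D1, D2, D8} are the Humidity=High examples, all No; {D9, D11} are the Humidity=Normal examples, all Yes); a sketch, not part of the slides:

from math import log2

# S_Sunny = [2+, 3-]; splitting on Humidity gives two pure subsets,
# High -> [0+, 3-] and Normal -> [2+, 0-], each with entropy 0
entropy_sunny = -(2/5) * log2(2/5) - (3/5) * log2(3/5)
gain_humidity = entropy_sunny - (3/5) * 0.0 - (2/5) * 0.0
print(round(gain_humidity, 3))  # 0.971 (shown as 0.970 on the slide after rounding)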
Thanks
