Decision Tree
R. Akerkar
TMRF, Kolhapur, India
Introduction
Top-down strategy
Decision Tree
Example
Classification
In our tree, we can carry out the classification for an unknown record as follows.
Let us assume that, for this record, we know the values of the first four attributes, but we do not know the value of the class attribute.
The accuracy of the classifier is determined by the percentage of the test data set that is correctly classified.
We can see that for Rule 1 there are two records of the test data set
satisfying outlook = sunny and humidity <= 75, and only one of these
is correctly classified as play.
Thus, the accuracy of this rule is 0.5 (or 50%). Similarly, the
accuracy of Rule 2 is also 0.5 (or 50%). The accuracy of Rule 3 is
0.66.
RULE 1: If it is sunny and the humidity is not above 75%, then play.
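To make the arithmetic concrete, here is a minimal Python sketch of the rule-accuracy calculation. The rule_accuracy helper and the three test records are illustrative assumptions, not the slide's actual test set; the records are merely chosen so that, as stated above, two of them satisfy Rule 1's antecedent and only one of those is labelled play.

def rule_accuracy(records, antecedent, predicted_class):
    # Accuracy of a rule = fraction of test records satisfying its antecedent
    # whose actual class matches the class predicted by the rule.
    covered = [r for r in records if antecedent(r)]
    if not covered:
        return None  # the rule fires on no test record
    correct = sum(1 for r in covered if r["class"] == predicted_class)
    return correct / len(covered)

# Illustrative test records (hypothetical values, consistent with the counts above).
test_data = [
    {"outlook": "sunny", "humidity": 70, "class": "play"},
    {"outlook": "sunny", "humidity": 72, "class": "don't play"},
    {"outlook": "rain",  "humidity": 80, "class": "don't play"},
]

# RULE 1: if it is sunny and the humidity is not above 75%, then play.
rule1 = lambda r: r["outlook"] == "sunny" and r["humidity"] <= 75

print(rule_accuracy(test_data, rule1, "play"))  # 0.5, i.e. 50%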
Concept of Categorical Attributes
The figure gives a decision tree for the training data.
Advantages and Shortcomings of Decision Tree Classification
A decision tree construction process is concerned with identifying the splitting attributes and splitting criterion at every level of the tree.
Weaknesses are:
The process of growing a decision tree is computationally expensive. At each node, each candidate splitting field is examined before its best split can be found.
Some decision trees can only deal with binary-valued target classes.
Iterative Dichotomizer 3 (ID3)
Quinlan (1986)
Each node corresponds to a splitting attribute
Each arc is a possible value of that attribute.
Training Dataset
This follows an example from Quinlan’s ID3
Extracting Classification Rules from Trees
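The tree diagram for this slide is not reproduced here, so the following is a rough Python sketch of the general idea: trace every root-to-leaf path and turn it into one IF ... THEN rule. The nested-dict weather_tree is an assumed stand-in for the weather/play example (its sunny branch matches RULE 1 above); the traversal function is the point.

# Assumed weather/play tree: internal nodes map a splitting attribute to
# {branch value: subtree}; leaves are class labels.
weather_tree = {
    "outlook": {
        "sunny":    {"humidity": {"<= 75": "play", "> 75": "don't play"}},
        "overcast": "play",
        "rain":     {"windy": {"false": "play", "true": "don't play"}},
    }
}

def condition(attribute, value):
    # Numeric branches already carry their operator; categorical ones get "=".
    return f"{attribute} {value}" if value[0] in "<>" else f"{attribute} = {value}"

def extract_rules(tree, conditions=()):
    # Depth-first walk: at each leaf, emit the conjunction of branch conditions.
    if not isinstance(tree, dict):
        yield "IF " + " AND ".join(conditions) + f" THEN {tree}"
        return
    (attribute, branches), = tree.items()
    for value, subtree in branches.items():
        yield from extract_rules(subtree, conditions + (condition(attribute, value),))

for rule in extract_rules(weather_tree):
    print(rule)
# First rule printed: IF outlook = sunny AND humidity <= 75 THEN play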
Solution (Rules)
Algorithm for Decision Tree Induction
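As a companion to the algorithm (whose pseudocode is not reproduced here), this is a minimal Python sketch of the greedy, top-down induction loop: choose the best splitting attribute at each level, partition the data on its values, and recurse. Information gain is used as the splitting criterion, and the tree representation (dicts for internal nodes, class labels for leaves) is an assumption of the sketch.

import math
from collections import Counter

def entropy(labels):
    # Entropy of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    # Reduction in entropy obtained by splitting on attribute `attr`.
    n = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, attributes):
    # Pure node: return a leaf labelled with the single class.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left: return the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: choose the splitting attribute with the highest gain;
    # each branch key below is one value of that attribute.
    best = max(attributes, key=lambda a: info_gain(rows, labels, a))
    node = {best: {}}
    for value in set(row[best] for row in rows):
        sub = [(row, lab) for row, lab in zip(rows, labels) if row[best] == value]
        sub_rows, sub_labels = [r for r, _ in sub], [l for _, l in sub]
        node[best][value] = build_tree(sub_rows, sub_labels,
                                       [a for a in attributes if a != best])
    return node

# e.g. build_tree(rows, labels, ["age", "income", "student", "credit_rating"])
# on the training table shown in the information-gain computation below.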
Attribute Selection Measure: Information Gain (ID3/C4.5)
Entropy
The entropy for a completely pure set is 0, and it is 1 for a set with equal occurrences of both classes.
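For reference, this is the standard two-class expected-information (entropy) measure behind that statement, written in the I(p, n), E(A), Gain(A) notation used in the computation that follows (the slide's own formula is not reproduced here):

I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}

E(A) = \sum_i \frac{p_i + n_i}{p + n}\, I(p_i, n_i), \qquad \mathrm{Gain}(A) = I(p, n) - E(A)

It equals 0 for a completely pure set and 1 when p = n, as stated above.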
Attribute Selection by Information Gain Computation

Class P: buys_computer = "yes"
Class N: buys_computer = "no"
I(p, n) = I(9, 5) = 0.940

Training samples:

age   | income | student | credit_rating | buys_computer
------|--------|---------|---------------|--------------
<=30  | high   | no      | fair          | no
<=30  | high   | no      | excellent     | no
31…40 | high   | no      | fair          | yes
>40   | medium | no      | fair          | yes
>40   | low    | yes     | fair          | yes
>40   | low    | yes     | excellent     | no
31…40 | low    | yes     | excellent     | yes
<=30  | medium | no      | fair          | no
<=30  | low    | yes     | fair          | yes
>40   | medium | yes     | fair          | yes
<=30  | medium | yes     | excellent     | yes
31…40 | medium | no      | excellent     | yes
31…40 | high   | yes     | fair          | yes
>40   | medium | no      | excellent     | no

Compute the entropy for age:

age   | pi | ni | I(pi, ni)
------|----|----|----------
<=30  | 2  | 3  | 0.971
31…40 | 4  | 0  | 0
>40   | 3  | 2  | 0.971

E(age) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694

Here (5/14) I(2,3) means that "age <= 30" covers 5 out of the 14 samples, with 2 yes's and 3 no's. Hence

Gain(age) = I(p, n) - E(age) = 0.246

Similarly,
Gain(income) = 0.029
Gain(student) = 0.151
Gain(credit_rating) = 0.048

Since age has the highest information gain among the attributes, it is selected as the test attribute.
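A few lines of Python reproduce the arithmetic above (the function I simply mirrors the slide's notation; no new data is introduced):

import math

def I(p, n):
    # Expected information of a node holding p "yes" and n "no" samples.
    total = p + n
    return -sum((c / total) * math.log2(c / total) for c in (p, n) if c)

print(I(9, 5))                 # 0.9402..., the 0.940 above

E_age = 5/14 * I(2, 3) + 4/14 * I(4, 0) + 5/14 * I(3, 2)
print(E_age)                   # 0.6935..., the 0.694 above

print(I(9, 5) - E_age)         # 0.2467; the slide rounds the intermediates
                               # (0.940 - 0.694) and reports 0.246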
Exercise 1
The following table consists of training data from an employee
database.
Let status be the class attribute. Use the ID3 algorithm to construct a
decision tree from the given data.
Solution 1
Other Attribute Selection Measures
Gini Index (IBM IntelligentMiner)
If a data set T contains examples from n classes, the gini index gini(T) is defined as

gini(T) = 1 - Σ (pj)^2, summed over j = 1, ..., n

where pj is the relative frequency of class j in T.

If a data set T is split into two subsets T1 and T2 with sizes N1 and N2 respectively (N = N1 + N2), the gini index of the split data is defined as

gini_split(T) = (N1/N) gini(T1) + (N2/N) gini(T2)

The attribute that provides the smallest gini_split(T) is chosen to split the node (one needs to enumerate all possible splitting points for each attribute).
Exercise 2
Solution 2
SPLIT: Age <= 50
            | High | Low | Total
------------|------|-----|------
S1 (left)   |    8 |  11 |    19
S2 (right)  |   11 |  10 |    21
For S1: P(high) = 8/19 = 0.42 and P(low) = 11/19 = 0.58
For S2: P(high) = 11/21 = 0.52 and P(low) = 10/21 = 0.48
Gini(S1) = 1-[0.42x0.42 + 0.58x0.58] = 1-[0.18+0.34] = 1-0.52 = 0.48
Gini(S2) = 1-[0.52x0.52 + 0.48x0.48] = 1-[0.27+0.23] = 1-0.5 = 0.5
Gini-Split(Age<=50) = 19/40 x 0.48 + 21/40 x 0.5 = 0.23 + 0.26 = 0.49
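The same computation in a few lines of Python, using only the counts from the table above; the exact values differ slightly from the slide's because the slide rounds its intermediates:

def gini(high, low):
    # Gini index of a partition with `high` and `low` class counts.
    total = high + low
    return 1.0 - (high / total) ** 2 - (low / total) ** 2

g1 = gini(8, 11)                     # S1: 0.4875...  (0.48 on the slide)
g2 = gini(11, 10)                    # S2: 0.4988...  (0.5 on the slide)
gini_split = 19/40 * g1 + 21/40 * g2
print(gini_split)                    # 0.4934..., about 0.49, matching the slide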
Exercise 3
In the previous exercise, which of the two split points gives a better split of the data? Why?
Solution 3
Intuitively, Salary <= 65K is a better split point since it produces relatively "pure" partitions, as opposed to Age <= 50, which results in more mixed partitions (i.e., just look at the distribution of Highs and Lows in S1 and S2).
On the other hand, if the classes are totally mixed, i.e., both classes have equal probability, then
gini(S) = 1 - [0.5x0.5 + 0.5x0.5] = 1 - [0.25 + 0.25] = 0.5.
In other words, the closer the gini value is to 0, the better the partition is. Since Salary has the lower gini, it is the better split.
Avoid Overfitting in Classification
Overfitting: An induced tree may overfit the training data
Too many branches, some may reflect anomalies due to noise or outliers
Poor accuracy for unseen samples
Two approaches to avoid overfitting
Prepruning: Halt tree construction early, i.e., do not split a node if this would result in the goodness measure falling below a threshold (a minimal sketch follows below)
Difficult to choose an appropriate threshold
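A minimal sketch of the prepruning check described above, assuming information gain as the goodness measure and an arbitrary MIN_GAIN threshold; as noted, choosing that threshold well is the hard part.

from collections import Counter

MIN_GAIN = 0.05   # assumed threshold; too high underfits, too low barely prunes

def split_or_leaf(best_gain, best_attribute, labels):
    # Prepruning: halt tree construction early when the best available split's
    # goodness measure (here, information gain) falls below the threshold.
    if best_gain < MIN_GAIN:
        majority = Counter(labels).most_common(1)[0][0]
        return ("leaf", majority)          # stop growing; label with majority class
    return ("split", best_attribute)       # otherwise split as usual

labels = ["yes"] * 9 + ["no"] * 5
print(split_or_leaf(0.246, "age", labels))   # ('split', 'age')
print(split_or_leaf(0.010, "age", labels))   # ('leaf', 'yes')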