Decision Tree Learning
1 Introduction
Decision tree learning is one of the most widely used and practical methods for inductive inference. It is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree.
Decision trees can represent a rich variety of discrete-valued functions, for example:
• (A ∧ B) ∨ (C ∧ ¬D ∧ E)
• M of N (true whenever at least M of the N attribute conditions hold)
Decision trees represent a disjunction of conjunctions.
• Each path from root to a leaf is a conjunction of attribute tests.
• The tree itself is a disjunction of these conjunctions.
Figure 1 illustrates a typical learned decision tree. It corresponds to the expression

(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
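As a concrete (and purely illustrative) sketch, the Figure 1 tree can be encoded as nested dictionaries in Python, and an example is classified by following a single root-to-leaf path; the encoding and the classify function below are assumptions for illustration, not part of the lecture.

```python
# Illustrative only: the Figure 1 tree encoded as nested dicts of the form
# {attribute: {value: subtree_or_leaf_label, ...}}.
tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Weak": "Yes", "Strong": "No"}},
    }
}


def classify(node, example):
    """Follow one root-to-leaf path: a conjunction of attribute tests."""
    while isinstance(node, dict):
        attribute = next(iter(node))            # attribute tested at this node
        node = node[attribute][example[attribute]]
    return node                                 # leaf label


# This example satisfies the first conjunct of the disjunction above, so "Yes".
print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}))
```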
4 The Basic Decision Tree Learning Algorithm

ID3(Examples, Target-attribute, Attributes)
• Create a Root node for the tree;
• If all Examples are positive (or all negative), return the single-node tree Root with that label;
• If Attributes is empty, return the single-node tree Root with label = most common value of Target-attribute in Examples;
• Otherwise:
• A ← the attribute from Attributes that best (i.e., highest information gain) classifies Examples;
• The decision attribute for Root ← A;
• For each possible value vi of A:
– Add a new tree branch below Root, corresponding to the test A = vi;
– Let Examples(vi) be the subset of Examples that have value vi for A;
– If Examples(vi) is empty
∗ Then below this new branch add a leaf node with label = most common value of Target-attribute in Examples
∗ Else below this new branch add the subtree ID3(Examples(vi), Target-attribute, Attributes − {A})
• End
• Return Root
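To make the control flow concrete, here is a minimal, self-contained Python sketch of this procedure; it is only a sketch, and the function names, the dict-based tree encoding, and the tiny data set at the end are illustrative assumptions, not part of the lecture. Information gain is computed as defined in the next section.

```python
from collections import Counter
from math import log2


def entropy(examples, target):
    """Entropy of the collection with respect to the target classification."""
    counts = Counter(ex[target] for ex in examples)
    n = len(examples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def information_gain(examples, attribute, target):
    """Gain(S, A) = Entropy(S) - sum over values v of |S_v|/|S| * Entropy(S_v)."""
    n = len(examples)
    remainder = 0.0
    for v in set(ex[attribute] for ex in examples):
        subset = [ex for ex in examples if ex[attribute] == v]
        remainder += len(subset) / n * entropy(subset, target)
    return entropy(examples, target) - remainder


def id3(examples, target, attributes):
    """Return a tree: either a leaf label or {attribute: {value: subtree}}."""
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:          # all examples have the same label
        return labels[0]
    if not attributes:                 # no attributes left to test
        return Counter(labels).most_common(1)[0][0]
    # Choose the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    # Note: only values of `best` that occur in Examples get a branch here;
    # the pseudocode instead adds a most-common-label leaf for empty subsets.
    for v in set(ex[best] for ex in examples):
        subset = [ex for ex in examples if ex[best] == v]
        tree[best][v] = id3(subset, target, remaining)
    return tree


# Tiny illustrative data set (hypothetical, not the lecture's 14 examples).
toy = [
    {"Outlook": "Sunny", "Wind": "Weak", "PlayTennis": "No"},
    {"Outlook": "Sunny", "Wind": "Strong", "PlayTennis": "No"},
    {"Outlook": "Overcast", "Wind": "Weak", "PlayTennis": "Yes"},
    {"Outlook": "Rain", "Wind": "Weak", "PlayTennis": "Yes"},
    {"Outlook": "Rain", "Wind": "Strong", "PlayTennis": "No"},
]
print(id3(toy, "PlayTennis", ["Outlook", "Wind"]))
# e.g. {'Outlook': {'Sunny': 'No', 'Overcast': 'Yes',
#                   'Rain': {'Wind': {'Weak': 'Yes', 'Strong': 'No'}}}}
```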
We want to select the attribute that is most useful for classifying examples. To measure the worth of an attribute, a statistical property called information gain is defined, which measures how well a given attribute separates the training examples according to their target classification.
4.1 Entropy

In order to define information gain precisely, we begin by defining a measure called entropy, which characterizes the (im)purity of an arbitrary collection of examples; that is, it measures how homogeneous the examples are.
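For reference, the definition from Mitchell (1997): if S contains positive and negative examples of some target concept, with p⊕ the proportion of positive examples and p⊖ the proportion of negative examples, then

Entropy(S) ≡ −p⊕ log2 p⊕ − p⊖ log2 p⊖

(taking 0 log2 0 to be 0). More generally, if the target attribute takes c different values, Entropy(S) ≡ Σ_{i=1}^{c} −p_i log2 p_i, where p_i is the proportion of S belonging to class i.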
Given entropy, the information gain of an attribute A relative to a collection of examples S is the expected reduction in entropy caused by partitioning S according to A:

Gain(S, A) ≡ Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

where Values(A) is the set of possible values of attribute A and S_v is the subset of S for which A has value v.
In the first step of the algorithm, the topmost node of the decision tree is created. To determine which attribute should be tested first in the tree, the information gain of each candidate attribute (Outlook, Temperature, Humidity, and Wind) is computed. The computation of information gain for Humidity and Wind is shown below, followed by the resulting gains for all four attributes.
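A worked version of those two computations, using the class counts of the standard 14-example PlayTennis training set from Mitchell (1997), in which S contains 9 positive and 5 negative examples, so Entropy(S) ≈ 0.940:

• Humidity: S_High = [3+, 4−] with Entropy ≈ 0.985 and S_Normal = [6+, 1−] with Entropy ≈ 0.592, so
  Gain(S, Humidity) = 0.940 − (7/14)·0.985 − (7/14)·0.592 ≈ 0.151
• Wind: S_Weak = [6+, 2−] with Entropy ≈ 0.811 and S_Strong = [3+, 3−] with Entropy = 1.000, so
  Gain(S, Wind) = 0.940 − (8/14)·0.811 − (6/14)·1.000 ≈ 0.048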
Gain(S,Outlook) = 0.246
Gain(S,Humidity) = 0.151
Gain(S,Wind) = 0.048
Gain(S,Temperature) = 0.029
Figure 3 shows the partially learned decision tree resulting from this first step of ID3. The final decision tree learned by ID3 from the 14 training examples is shown in Figure 1.
Figure 3: The partially learned decision tree resulting from the first step of ID3.

5 Hypothesis Space Search in Decision Tree Learning
ID3 performs a simple-to-complex, greedy (hill-climbing) search through the space of possible decision trees, guided by the information gain measure. Notable properties of this search:

• No backtracking: once an attribute is selected it is never reconsidered, so ID3 can converge to a locally optimal solution (local minima).
• A preference for short trees over larger trees, and for trees that place attributes with high information gain near the root; this preference is the inductive bias of ID3.
Occam's razor: Prefer the simplest hypothesis that fits the data.
Why prefer short hypotheses?
Arguments in favor:
• Fewer short hypotheses than long hypotheses
– a short hypothesis that fits data is unlikely to be a coincidence
– a long hypothesis that fits data might be a coincidence
Arguments opposed:
• There are many ways to define small sets of hypotheses
• e.g., all trees with a prime number of nodes that use attributes beginning with "Z"
• What’s so special about small sets based on size of hypothesis?
7 Issues in Decision Tree Learning

ID3 outputs a decision tree h that is more complex than the original tree h0 of Figure 1. h fits the training examples perfectly, whereas the simpler h0 does not; yet h0 can be expected to perform better on subsequent, unseen examples. In other words, h overfits the training data.
Because the information gain measure favors attributes with many values, one alternative selection measure is the gain ratio, which penalizes such attributes by dividing the gain by the split information:

GainRatio(S, A) ≡ Gain(S, A) / SplitInformation(S, A)

SplitInformation(S, A) ≡ − Σ_{i=1}^{c} (|S_i| / |S|) log2 (|S_i| / |S|)

where S_i is the subset of S for which the c-valued attribute A has value v_i.
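A minimal Python sketch of the split-information and gain-ratio computation; the function names and the example numbers below are illustrative assumptions, not from the lecture.

```python
from collections import Counter
from math import log2


def split_information(values):
    """SplitInformation(S, A): entropy of S with respect to the values of A."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(gain, values):
    """GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)."""
    return gain / split_information(values)


# Illustrative example: an attribute that splits 14 examples into subsets of
# sizes 5, 4, and 5 (as Outlook does) has SplitInformation of about 1.58, so
# a gain of 0.246 yields a gain ratio of roughly 0.156.
outlook_values = ["Sunny"] * 5 + ["Overcast"] * 4 + ["Rain"] * 5
print(round(gain_ratio(0.246, outlook_values), 3))
```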
When attributes have differing measurement costs, lower-cost attributes can be preferred by replacing information gain with a cost-sensitive measure, for example:

• Tan and Schlimmer (1990):

Gain²(S, A) / Cost(A)

• Nunez (1988):

(2^Gain(S, A) − 1) / (Cost(A) + 1)^w

where w ∈ [0, 1] is a constant that determines the relative importance of cost versus information gain.
8 References
1. Mitchell, Tom. Machine Learning, Chapter 3: Decision Tree Learning. McGraw-Hill, 1997.
2. J. R. Quinlan. Induction of Decision Trees, August 1, 1985.
3. Online: Lecture notes on Decision Tree Learning (ID3 algorithm). http://www.comp.hkbu.edu.hk/~ymc/course/sci3790 0304/chapter4.pdf
4. Lecture slides for the textbook Machine Learning, © Tom M. Mitchell, McGraw-Hill, 1997.
5. Online: A simplified ID3 implementation in C++ and Java. http://www.ida.his.se/ida/kurser/ai-symbolsystem/kursmaterial/archive/assignments/assignment3/id3.html
6. Online: A complete Java implementation of the decision tree algorithm. http://www.cogs.susx.ac.uk/users/christ/crs/sai/lec15.html
7. Online: Machine Learning Algorithms in Java. www.aifb.uni-karlsruhe.de/Lehre/Winter2002-03/kdd/download/weka/Tutorial.ps
8. Online: C4.5 tutorial and implementation in C. http://www2.cs.uregina.ca/hamilton/courses/831/notes/ml/dtrees/c4.5/tutorial.html