Decision Tree Learning Algorithm
Wei Peng, Juhua Chen and Haiping Zhou
Project of Comp 9417: Machine Learning
University of New South Wales, School of Computer Science & Engineering,
Sydney, NSW 2032, Australia
[email protected]
Abstract
The decision tree learning algorithm has been used successfully in expert
systems to capture knowledge. The main task performed in these systems is
to apply inductive methods to the given attribute values of an unknown object
in order to determine an appropriate classification according to decision tree
rules. We examine the decision tree learning algorithm ID3 and implement it
in Java. We first implement the basic ID3, which deals with target functions
that have discrete output values. We then extend the domain of ID3 to
real-valued attributes, such as numeric data, and to discrete outcomes beyond
simple Boolean values. The Java applet provided in the last section offers a
simulation of the decision tree learning algorithm in various situations.
Some shortcomings of the approach are discussed as well.
1 Introduction
1.1 What is a Decision Tree?
A decision tree is a tree in which each branch node represents a choice
between a number of alternatives, and each leaf node represents a decision.
Decision trees are commonly used for gaining information for the purpose of
decision making. A decision tree starts with a root node at which users take
the first action. From this node, users split each node recursively according
to the decision tree learning algorithm. The final result is a decision tree
in which each branch represents a possible scenario of a decision and its
outcome.
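The structure described above can be sketched as a small Java class (a sketch of our own; the class and field names are not taken from the paper): a branch node tests one attribute and holds one subtree per attribute value, while a leaf node holds the final decision.

```java
import java.util.HashMap;
import java.util.Map;

public class TreeNode {
    String attribute;   // attribute tested at a branch node; null at a leaf
    String decision;    // decision stored at a leaf node; null at a branch
    Map<String, TreeNode> children = new HashMap<>();  // attribute value -> subtree

    // Classify an example by walking from this node down to a leaf,
    // following the branch that matches the example's value for each test.
    public String classify(Map<String, String> example) {
        if (decision != null) return decision;
        return children.get(example.get(attribute)).classify(example);
    }
}
```

For instance, a root node testing Outlook with a leaf decision under each of its values mirrors the kind of tree ID3 learns from the weather data below.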
The training examples, with the possible Decision values n (negative) and
p (positive):

Outlook    Temperature  Humidity  Windy  Decision
sunny      hot          high      false  n
sunny      hot          high      true   n
overcast   hot          high      false  p
rain       mild         high      false  p
rain       cool         normal    false  p
rain       cool         normal    true   n
overcast   cool         normal    true   p
sunny      mild         high      false  n
sunny      cool         normal    false  p
rain       mild         normal    false  p
sunny      mild         normal    true   p
overcast   mild         high      true   p
overcast   hot          normal    false  p
rain       mild         high      true   n
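ID3 selects the attribute with the highest information gain as the test at each node; on the table above the best root attribute is Outlook. A minimal sketch of the calculation (class and method names are our own; the positive/negative counts are read off the table):

```java
public class ID3Gain {
    // Entropy of a two-class (p positive, n negative) sample, in bits.
    static double entropy(int p, int n) {
        double total = p + n, e = 0.0;
        for (int c : new int[]{p, n})
            if (c > 0) e -= (c / total) * (Math.log(c / total) / Math.log(2));
        return e;
    }

    // Information gain of partitioning (p, n) examples into the given
    // subsets, each subset written as {positives, negatives}.
    static double gain(int p, int n, int[][] subsets) {
        double total = p + n, remainder = 0.0;
        for (int[] s : subsets)
            remainder += ((s[0] + s[1]) / total) * entropy(s[0], s[1]);
        return entropy(p, n) - remainder;
    }

    public static void main(String[] args) {
        // The table has 9 positive and 5 negative examples overall; Outlook
        // partitions them into sunny {2p,3n}, overcast {4p,0n}, rain {3p,2n}.
        System.out.printf("entropy = %.3f%n", entropy(9, 5));         // 0.940
        System.out.printf("gain(Outlook) = %.3f%n",
                gain(9, 5, new int[][]{{2, 3}, {4, 0}, {3, 2}}));     // 0.247
    }
}
```

With 9 positive and 5 negative examples the entropy is about 0.940 bits, and splitting on Outlook gains about 0.247 bits, more than any other attribute.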
The same training examples with Humidity recorded as a numeric attribute:

Outlook    Temperature  Humidity  Windy  Decision
sunny      hot          0.9       false  n
sunny      hot          0.87      true   n
overcast   hot          0.93      false  p
rain       mild         0.89      false  p
rain       cool         0.80      false  p
rain       cool         0.59      true   n
overcast   cool         0.77      true   p
sunny      mild         0.91      false  n
sunny      cool         0.68      false  p
rain       mild         0.84      false  p
sunny      mild         0.72      true   p
overcast   mild         0.49      true   p
overcast   hot          0.74      false  p
rain       mild         0.86      true   n
Sorting the 'sunny' examples by their Humidity value shows where the decision
changes:

Humidity  0.68  0.72  0.87  0.9  0.91
Decision  p     p     n     n    n
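The sorted humidity values above illustrate the standard way to split on a numeric attribute: sort the examples by the attribute's value and place candidate thresholds at midpoints between consecutive values whose class labels differ. A sketch of our own (class and method names are not from the paper):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ThresholdSplit {
    // Candidate thresholds for a numeric attribute: midpoints between
    // consecutive sorted values whose class labels differ.
    static List<Double> candidates(double[] values, String[] labels) {
        // Sort indices by attribute value so labels stay paired with values.
        Integer[] idx = new Integer[values.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> values[i]));
        List<Double> out = new ArrayList<>();
        for (int k = 0; k + 1 < idx.length; k++)
            if (!labels[idx[k]].equals(labels[idx[k + 1]]))
                out.add((values[idx[k]] + values[idx[k + 1]]) / 2.0);
        return out;
    }

    public static void main(String[] args) {
        // Humidity values and decisions of the five 'sunny' examples above.
        double[] humidity = {0.9, 0.87, 0.91, 0.68, 0.72};
        String[] decision = {"n", "n", "n", "p", "p"};
        System.out.println(candidates(humidity, decision));
    }
}
```

On the 'sunny' examples the class changes only between 0.72 and 0.87, so the single candidate threshold is their midpoint, 0.795.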
4 Shortcomings of ID3
4.1 Scenario 1
'A significant shortcoming of ID3 is that the space of legal splits at a node
is impoverished. A split is a partition of the instance space that results
from placing a test at a decision tree node. ID3 and its descendants only
allow testing a single attribute and branching on the outcome of that test'
(Utgoff & Brodley, 1990).
During our implementation of ID3, we found that sometimes we cannot obtain a
result using the split criteria provided by ID3. For example, when dealing
with 'titanic.db', we may find that we cannot obtain a result.
The databases read by our implementation are plain-text files: a dataset
name, a header declaring each attribute with its type, then one example per
line. The 'tennis' database, for instance, begins:

tennis
outlook symbolic temperature symbolic humidity symbolic windy symbolic decision symbolic
sunny hot high false n
sunny hot high true n
overcast hot high false p
rain mild high false p
rain cool normal false p
rain cool normal true n
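A reader for this file format might be sketched as follows (a hypothetical sketch of our own, not the paper's actual parser; it assumes the attribute header fits on one line):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DbReader {
    // Parse a database: name line, "attribute type" header, then examples.
    static List<Map<String, String>> read(BufferedReader in) throws IOException {
        in.readLine();  // skip the dataset name line, e.g. "tennis"
        String[] header = in.readLine().trim().split("\\s+");
        List<String> attrs = new ArrayList<>();
        for (int i = 0; i < header.length; i += 2)  // keep names, skip types
            attrs.add(header[i]);
        List<Map<String, String>> examples = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null && !line.isBlank()) {
            String[] vals = line.trim().split("\\s+");
            Map<String, String> ex = new LinkedHashMap<>();
            for (int i = 0; i < attrs.size(); i++) ex.put(attrs.get(i), vals[i]);
            examples.add(ex);
        }
        return examples;
    }
}
```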
The 'titanic.db' database records 2,201 passengers with the attributes Class,
Age and Sex and the decision Survive:

Passenger  Class   Age    Sex     Survive
p1         first   adult  male    yes
p2         first   adult  male    yes
:          :       :      :       :
p176       first   adult  female  yes
:          :       :      :       :
p493       second  adult  male    no
:          :       :      :       :
p2201      crew    adult  female  no
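A classic illustration of this shortcoming (the XOR example is our own, not from the paper): for the target function a XOR b, every single-attribute test has zero information gain, so ID3's split criterion cannot prefer any split, even though a test on both attributes together separates the classes perfectly.

```java
public class XorGain {
    // Entropy of a two-class (p positive, n negative) sample, in bits.
    static double entropy(int p, int n) {
        double t = p + n, e = 0.0;
        for (int c : new int[]{p, n})
            if (c > 0) e -= (c / t) * (Math.log(c / t) / Math.log(2));
        return e;
    }

    // Gain of splitting (p, n) examples into the given {p, n} subsets.
    static double gain(int p, int n, int[][] subsets) {
        double t = p + n, rem = 0.0;
        for (int[] s : subsets)
            rem += ((s[0] + s[1]) / t) * entropy(s[0], s[1]);
        return entropy(p, n) - rem;
    }

    public static void main(String[] args) {
        // XOR truth table: (0,0)->n, (0,1)->p, (1,0)->p, (1,1)->n.
        // Splitting on a alone (or b alone) yields subsets {1p,1n}, {1p,1n}:
        System.out.println(gain(2, 2, new int[][]{{1, 1}, {1, 1}}));  // 0.0
    }
}
```

Each branch of the single-attribute split is as mixed as the whole sample, so the gain is exactly zero for both attributes.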
Acknowledgements
We wish to thank Professor Claude Sammut and Associate Professor Achim
Hoffmann at UNSW for their guidance and support of our project in the
Machine Learning course. We also appreciate the other project groups.
References
1. Tom M. Mitchell (1997). Machine Learning. Singapore: McGraw-Hill.
2. Paul E. Utgoff and Carla E. Brodley (1990). 'An Incremental Method for
Finding Multivariate Splits for Decision Trees'. In Machine Learning:
Proceedings of the Seventh International Conference (p. 58). Palo Alto,
CA: Morgan Kaufmann.