Decision Trees 4
CSE 413
Presented by: Shahriar Parvej
Slides by: Jeff Storey
Overview
What is a Decision Tree
Sample Decision Trees
How to Construct a Decision Tree
Problems with Decision Trees
Decision Trees in Gaming
Summary
Classification: Definition
Given a collection of records (the training set), where each record contains a set of attributes and one of the attributes is the class.
Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
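As a concrete (if minimal) illustration of this workflow, the sketch below splits a toy data set into training and test portions, fits a classifier, and scores it on the held-out records. The slides do not prescribe a library; scikit-learn and the toy attribute values here are my own choices.

```python
# Minimal train/validate sketch (library and toy data are illustrative, not from the slides).
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Each row is one record's attribute values; y holds the class labels.
X = [[0, 125], [1, 100], [2, 70], [1, 120], [0, 95],
     [1, 60], [0, 220], [2, 85], [1, 75], [2, 90]]
y = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

# Divide the data: the training set builds the model, the test set validates it.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)                     # induction: learn the model
print(model.predict(X_test))                    # deduction: classify unseen records
print("accuracy:", model.score(X_test, y_test))
```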
Illustrating Classification Task

A learning algorithm performs induction on the training set to learn a model; the model is then applied (deduction) to the records of the test set.

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
Examples of Classification Task
Predicting tumor cells as benign or malignant

How to Construct a Decision Tree
We first list the attributes we can measure; for now, these attributes must be discrete
We then choose a target attribute that we want to predict
Then we create an experience table that lists what we have seen in the past
Sample Experience Table

Example  Hour   Weather  Accident  Stall  Commute (target)
D1       8 AM   Sunny    No        No     Long
D2       8 AM   Cloudy   No        Yes    Long
D3       10 AM  Sunny    No        No     Short
D4       9 AM   Rainy    Yes       No     Long
D5       9 AM   Sunny    Yes       Yes    Long
D6       10 AM  Sunny    No        No     Short
D7       10 AM  Cloudy   No        No     Short
D8       9 AM   Rainy    No        No     Medium
D9       9 AM   Sunny    Yes       No     Long
D10      10 AM  Cloudy   Yes       Yes    Long
D11      10 AM  Rainy    No        No     Short
D12      8 AM   Cloudy   Yes       No     Long
D13      9 AM   Sunny    No        No     Medium
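To make the later sketches concrete, here is the same experience table as plain Python data. The identifiers (EXAMPLES, ATTRIBUTES, TARGET) and the lower-case value spellings are my own; only the data itself comes from the table above.

```python
# The sample experience table as plain Python data (identifiers are my own).
ATTRIBUTES = ["hour", "weather", "accident", "stall"]
TARGET = "commute"

EXAMPLES = [
    {"hour": "8am",  "weather": "sunny",  "accident": "no",  "stall": "no",  "commute": "long"},
    {"hour": "8am",  "weather": "cloudy", "accident": "no",  "stall": "yes", "commute": "long"},
    {"hour": "10am", "weather": "sunny",  "accident": "no",  "stall": "no",  "commute": "short"},
    {"hour": "9am",  "weather": "rainy",  "accident": "yes", "stall": "no",  "commute": "long"},
    {"hour": "9am",  "weather": "sunny",  "accident": "yes", "stall": "yes", "commute": "long"},
    {"hour": "10am", "weather": "sunny",  "accident": "no",  "stall": "no",  "commute": "short"},
    {"hour": "10am", "weather": "cloudy", "accident": "no",  "stall": "no",  "commute": "short"},
    {"hour": "9am",  "weather": "rainy",  "accident": "no",  "stall": "no",  "commute": "medium"},
    {"hour": "9am",  "weather": "sunny",  "accident": "yes", "stall": "no",  "commute": "long"},
    {"hour": "10am", "weather": "cloudy", "accident": "yes", "stall": "yes", "commute": "long"},
    {"hour": "10am", "weather": "rainy",  "accident": "no",  "stall": "no",  "commute": "short"},
    {"hour": "8am",  "weather": "cloudy", "accident": "yes", "stall": "no",  "commute": "long"},
    {"hour": "9am",  "weather": "sunny",  "accident": "no",  "stall": "no",  "commute": "medium"},
]
```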
Example of a Decision Tree

Training data: each record has the categorical attributes Refund and Marital Status, the continuous attribute Taxable Income, and the class attribute Cheat.
The induced model is a decision tree whose splitting attributes are Refund, Marital Status (MarSt), and Taxable Income (TaxInc); the tree is shown on the next slide.
Once learned from the training set, the model is applied to the test set (e.g. records 11 and 15).
Apply Model to Test Data

Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?

Start from the root of the tree:

Refund?
  Yes -> NO
  No  -> MarSt?
           Single, Divorced -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES
           Married -> NO

Walking the tree with the test record: Refund = No takes the right branch to MarSt; Marital Status = Married leads straight to the leaf NO. Assign Cheat to "No".
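The same walk can be written as a short loop over a nested tree structure. The representation below, including the TaxInc<80K encoding of the income test, is my own sketch, not notation from the slides.

```python
# Hand-coded version of the tree above (representation is my own choice):
# an internal node is (attribute, {branch value: subtree}); a leaf is a class label.
TREE = ("Refund", {
    "Yes": "No",                          # Refund = Yes  -> Cheat = No
    "No": ("MarSt", {
        "Married": "No",                  # Married       -> Cheat = No
        "Single":   ("TaxInc<80K", {"Yes": "No", "No": "Yes"}),
        "Divorced": ("TaxInc<80K", {"Yes": "No", "No": "Yes"}),
    }),
})

def classify(tree, record):
    """Start at the root and follow the branch matching the record until a leaf."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[record[attribute]]
    return tree

# The test record from the slide: Refund = No, Married, Taxable Income = 80K.
record = {"Refund": "No", "MarSt": "Married", "TaxInc<80K": "No"}
print(classify(TREE, record))   # -> "No", i.e. assign Cheat to "No"
```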
Decision Tree Classification Task

The same workflow as before, with the model made concrete: a tree induction algorithm learns a decision tree from the training set (induction), and that decision tree is then applied to the test set (deduction).
Choosing Attributes
The experience table shown earlier has four attributes: hour, weather, accident, and stall
But the decision tree for this data uses only three of them: hour, accident, and stall
Why is that?
Choosing Attributes
Methods for selecting attributes (which
will be described later) show that
weather is not a discriminating
attribute
We use the principle of Occam’s
Razor: Given a number of competing
hypotheses, the simplest one is
preferable
Choosing Attributes
The basic structure of creating a
decision tree is the same for most
decision tree algorithms
The difference lies in how we select
the attributes for the tree
We will focus on the ID3 algorithm
developed by Ross Quinlan in 1975
Decision Tree Induction
Many Algorithms:
Hunt’s Algorithm
CART
ID3, C4.5
SLIQ, SPRINT
Decision Tree Algorithms
The basic idea behind any decision tree
algorithm is as follows (see the sketch after this list):
Choose the best attribute(s) to split the
remaining instances and make that attribute a
decision node
Repeat this process recursively for each child
Stop when:
All the instances have the same target attribute value
There are no more attributes
There are no more instances
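A minimal rendering of that recursion, with the attribute-selection step left pluggable. The function names and the dict-of-examples representation are my own; ID3's entropy-based chooser is sketched after the entropy slides.

```python
# Generic recursive tree construction (a sketch, not any particular published algorithm).
from collections import Counter

def majority_class(examples, target):
    """Most common target value among the remaining examples."""
    return Counter(e[target] for e in examples).most_common(1)[0][0]

def build_tree(examples, attributes, target, choose_attribute):
    classes = {e[target] for e in examples}
    if len(classes) == 1:              # stop: all instances share the target value
        return classes.pop()
    if not attributes:                 # stop: no attributes left to split on
        return majority_class(examples, target)
    best = choose_attribute(examples, attributes, target)   # pick the decision node
    branches = {}
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        remaining = [a for a in attributes if a != best]
        branches[value] = build_tree(subset, remaining, target, choose_attribute)
    return (best, branches)

# Usage (with the EXAMPLES table sketched earlier and a placeholder chooser that
# just takes the first attribute; ID3 replaces this with an entropy-based choice):
# tree = build_tree(EXAMPLES, ATTRIBUTES, TARGET, lambda ex, attrs, tgt: attrs[0])
```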
Identifying the Best Attributes
Refer back to our original decision tree:

Leave At?
  8 AM  -> Long
  9 AM  -> Accident?
             No  -> Medium
             Yes -> Long
  10 AM -> Stall?
             No  -> Short
             Yes -> Long

How did we know to split on Leave At first, then on Stall and Accident, and not on Weather?
ID3 Heuristic
To determine the best attribute, we
look at the ID3 heuristic
ID3 chooses which attribute to split on based on entropy.
Entropy is a measure of uncertainty (disorder) in the data…
Entropy
Entropy is minimized when all values of the
target attribute are the same.
If we know that commute time will always be
short, then entropy = 0
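In formula form (standard, though not shown on this slide): the entropy of a set S is H(S) = -Σ_c p_c log2 p_c over the target classes c, and ID3 splits on the attribute with the largest information gain, i.e. the largest expected drop in entropy. A minimal sketch over the experience-table representation introduced earlier (function names are my own):

```python
# Entropy and information gain as ID3 uses them (a minimal sketch).
from collections import Counter
from math import log2

def entropy(examples, target):
    """H(S) = -sum_c p_c * log2(p_c); 0 when every example has the same target value."""
    counts = Counter(e[target] for e in examples)
    total = len(examples)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def information_gain(examples, attribute, target):
    """Expected entropy reduction from splitting on `attribute`."""
    total = len(examples)
    remainder = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3_choose(examples, attributes, target):
    """The ID3 heuristic: split on the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(examples, a, target))

# On the commute table, information_gain is much higher for "hour" than for
# "weather", which is why the tree splits on the leave-at time and never on weather.
```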
Subtree Replacement

[Figure: before pruning, the root A has a subtree containing leaves 1, 2, and 3 and a sibling node C with leaves 4 and 5; after pruning, that subtree is replaced by the single leaf node 6.]

Node 6 replaced the subtree
Generalizes the tree a little more, but may increase accuracy
Subtree Raising

Entire subtree is raised onto another node

[Figure: node C, with leaves 1, 2, and 3, is raised to replace its parent directly under the root A; the examples from the former sibling leaves 4 and 5 are redistributed into the raised subtree.]

This was not discussed in detail, as it is not clear whether it is really worthwhile (it is very time consuming)
Problems with ID3
ID3 is not optimal
Uses expected entropy reduction, not
actual reduction
Must use discrete (or discretized)
attributes
What if we left for work at 9:30 AM?
We could break the attribute down into smaller, finer-grained values (a sketch follows below)…
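For example, the departure time could be bucketed to the nearest half hour before running ID3. The function below is purely illustrative; the bucket size and label format are my own assumptions, not something the slides specify.

```python
# Discretize a continuous departure time into half-hour buckets (illustrative only).
def half_hour_bucket(hour, minute):
    """Round a departure time down to a half-hour label, e.g. 9:30 -> '9:30 AM'."""
    half = 0 if minute < 30 else 30
    return f"{hour}:{half:02d} AM"

print(half_hour_bucket(9, 30))   # -> "9:30 AM", a new discrete value ID3 can split on
print(half_hour_bucket(8, 10))   # -> "8:00 AM"
```

The trade-off is that finer buckets mean more branches per split and fewer examples per branch.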
Problems with Decision Trees
While decision trees classify quickly, the time for building a tree may be higher than for another type of classifier
[Example from the commute data: departure times sorted with their commute classes (L = Long, M = Medium, S = Short), from which candidate split points can be chosen: 8:00 (L), 8:02 (L), 8:07 (M), 9:00 (S), 9:20 (S), 9:25 (S), 10:00 (S), 10:02 (M)]

ID3 in Black & White
[Figure: a decision tree learned from creature feedback; internal nodes test attributes such as Allegiance and Defense, and leaves hold feedback values such as -1.0]
Note that this decision tree does not even use the tribe attribute
ID3 in Black & White
Now suppose we don’t want the entire
decision tree, but we just want the 2
highest feedback values
We can create a Boolean expression, such as:
((Allegiance = Enemy) ^ (Defense = Weak)) v ((Allegiance = Enemy) ^ (Defense = Medium))
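In code, those two highest-feedback paths collapse into a single test over a creature record. The function and field names below are my own illustrative choices, not identifiers from Black & White.

```python
# The two highest-feedback tree paths expressed as one Boolean test (illustrative).
def high_feedback(creature):
    """((Allegiance = Enemy) ^ (Defense = Weak)) v ((Allegiance = Enemy) ^ (Defense = Medium))"""
    return (creature["Allegiance"] == "Enemy"
            and creature["Defense"] in ("Weak", "Medium"))

print(high_feedback({"Allegiance": "Enemy", "Defense": "Weak"}))     # True
print(high_feedback({"Allegiance": "Friendly", "Defense": "Weak"}))  # False
```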
Summary
Decision trees can be used to predict outcomes for previously unseen cases
The trees are easy to understand
Decision trees work more efficiently
with discrete attributes
The trees may suffer from error
propagation