Chapter 6. Decision Tree Classification
Decision Tree
Classification
Definition
Given a collection of records (training set):
– Each record contains a set of attributes; one of the attributes is the class.
Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
– A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
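The holdout procedure described above can be sketched in a few lines of Python: shuffle a labeled data set, split it into a training set and a test set, build a model from the training set only, and measure accuracy on the test set. The data are the 10 Refund / Marital Status / Taxable Income records that appear later in this chapter. The 30% test fraction, the fixed seed, and the majority-class "learner" (a trivial, hypothetical stand-in for a real classification algorithm such as a decision tree) are assumptions of this sketch.

```python
import random
from collections import Counter

# (Refund, Marital Status, Taxable Income in K) -> Cheat
RECORDS = [
    (("Yes", "Single", 125), "No"),
    (("No", "Married", 100), "No"),
    (("No", "Single", 70), "No"),
    (("Yes", "Married", 120), "No"),
    (("No", "Divorced", 95), "Yes"),
    (("No", "Married", 60), "No"),
    (("Yes", "Divorced", 220), "No"),
    (("No", "Single", 85), "Yes"),
    (("No", "Married", 75), "No"),
    (("No", "Single", 90), "Yes"),
]


def holdout_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and return (training set, test set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def train_majority(training_set):
    """Hypothetical stand-in learner: always predict the most frequent training class."""
    majority = Counter(label for _, label in training_set).most_common(1)[0][0]
    return lambda features: majority


def accuracy(model, test_set):
    """Fraction of test records whose predicted class equals the true class."""
    correct = sum(model(features) == label for features, label in test_set)
    return correct / len(test_set)


training_set, test_set = holdout_split(RECORDS)
model = train_majority(training_set)
print(len(training_set), len(test_set), accuracy(model, test_set))
```

The model never sees the test records while it is being built, which is what lets the test-set accuracy stand in for accuracy on previously unseen records.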
[Figure: Training Data is fed to Classification Algorithms, which produce a Classifier; the Classifier is checked on Testing Data and then applied to Unseen Data, e.g. (Jeff, Professor, 4): Tenured?]

Testing Data:
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes
March 5, 2008 Data Mining: Concepts and Techniques 7
Illustrating Classification Task
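[Figure: a model is learned from a Training Set with columns Tid, Attrib1, Attrib2, Attrib3, Class (e.g. Tid 6: No, Medium, 60K, No); the model is then applied to a Test Set whose Class values are unknown (e.g. Tid 11: No, Small, 55K, ?; Tid 15: No, Large, 67K, ?)]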
Classification Techniques
- Decision Tree based Methods
- Rule-based Methods
- Instance-based Methods
- Neural Networks
- Naïve Bayes and Bayesian Belief Networks
- Support Vector Machines
Decision Tree
Decision Tree Representation:
- Each internal node tests an attribute
- Each branch corresponds to an attribute value (or a range of values, e.g. Taxable Income < 80K)
- Each leaf node assigns a classification
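One way to hold this representation in code, as a sketch: an internal node stores the attribute it tests and one child per branch, and a leaf stores a class. The Leaf/Node classes, the branch keys "<" and ">=", and the choice to send a value exactly equal to the threshold down the ">=" branch are assumptions of this example (the figures only show < 80K and > 80K). The tree encoded here is the Refund-rooted tree used later in this chapter.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class Leaf:
    label: str  # class assigned to every record that reaches this leaf


@dataclass
class Node:
    attribute: str                     # attribute tested at this internal node
    branches: Dict[str, "Tree"]        # one child subtree per test outcome
    threshold: Optional[float] = None  # set only when the attribute is continuous


Tree = Union[Leaf, Node]


def branch_key(node: Node, record: dict) -> str:
    """Return the outcome of the node's test for this record."""
    value = record[node.attribute]
    if node.threshold is None:
        return value  # categorical attribute: the value itself names the branch
    # Continuous attribute: two-way split (a value equal to the threshold goes to ">=" here).
    return "<" if value < node.threshold else ">="


def classify(tree: Tree, record: dict) -> str:
    """Start at the root and follow one branch per test until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[branch_key(tree, record)]
    return tree.label


# Refund-rooted tree; Taxable Income is in thousands.
income_test = Node("TaxInc", {"<": Leaf("No"), ">=": Leaf("Yes")}, threshold=80)
tree = Node("Refund", {
    "Yes": Leaf("No"),
    "No": Node("MarSt", {
        "Single": income_test,    # "Single, Divorced" is a single branch in the figure,
        "Divorced": income_test,  # so both values point at the same subtree
        "Married": Leaf("No"),
    }),
})

print(classify(tree, {"Refund": "No", "MarSt": "Divorced", "TaxInc": 95}))  # Yes
```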
Tid  Refund       Marital Status  Taxable Income  Cheat
     categorical  categorical     continuous      class
1    Yes          Single          125K            No
2    No           Married         100K            No
3    No           Single          70K             No
4    Yes          Married         120K            No
5    No           Divorced        95K             Yes
6    No           Married         60K             No
7    Yes          Divorced        220K            No
8    No           Single          85K             Yes
9    No           Married         75K             No
10   No           Single          90K             Yes

Decision tree (Marital Status tested first):
MarSt = Married: NO
MarSt = Single, Divorced: test Refund
    Refund = Yes: NO
    Refund = No: test TaxInc
        TaxInc < 80K: NO
        TaxInc > 80K: YES

There could be more than one tree that fits the same data!
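To check the claim that more than one tree can fit the same data, the sketch below writes the tree above (Marital Status tested first) and the Refund-rooted tree used later in this chapter as nested if statements and lists the training records each one misclassifies. Taxable Income is in thousands; sending a value of exactly 80K to the YES side is an assumption, since the figures only show < 80K and > 80K. The function names are hypothetical.

```python
# Training records: (Tid, Refund, Marital Status, Taxable Income in K, Cheat)
TRAINING = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]


def tree_marst_root(refund, marst, income):
    """Tree with Marital Status at the root."""
    if marst == "Married":
        return "No"
    if refund == "Yes":
        return "No"
    return "No" if income < 80 else "Yes"


def tree_refund_root(refund, marst, income):
    """Alternative tree with Refund at the root."""
    if refund == "Yes":
        return "No"
    if marst == "Married":
        return "No"
    return "No" if income < 80 else "Yes"


for tree in (tree_marst_root, tree_refund_root):
    errors = [tid for tid, refund, marst, income, label in TRAINING
              if tree(refund, marst, income) != label]
    print(tree.__name__, "misclassified:", errors)  # both report []
```

Both trees classify all ten training records correctly even though they test the attributes in a different order.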
Output: A Decision Tree for “buys_computer”
[Figure: decision tree whose root tests age?, with branches <=30, 31..40 and >40; the leaves are labeled yes or no]
[Figure: a Decision Tree model is built from the Training Set (e.g. Tid 6: No, Medium, 60K, No) and then applied to the Test Set (deduction)]

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
Test Data:
Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Start from the root of the tree.

Refund = Yes: NO
Refund = No: test MarSt
    MarSt = Single, Divorced: test TaxInc
        TaxInc < 80K: NO
        TaxInc > 80K: YES
    MarSt = Married: NO
Following the test record down the tree: Refund = No leads to the MarSt node, and Marital Status = Married leads to the leaf NO.
Assign Cheat to “No”.
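The walkthrough above can be reproduced with a short Python sketch that follows the Refund-rooted tree and records each test outcome on the way to a leaf. The function name classify_with_path and the dictionary keys are hypothetical, and sending a Taxable Income of exactly 80K to the YES side is an assumption (the figure shows only < 80K and > 80K; the test record never reaches that node anyway).

```python
def classify_with_path(record):
    """Walk the Refund-rooted tree, recording each test outcome on the way to a leaf."""
    path = [f"Refund = {record['Refund']}"]
    if record["Refund"] == "Yes":
        return path, "No"
    path.append(f"MarSt = {record['MarSt']}")
    if record["MarSt"] == "Married":
        return path, "No"
    # Single or Divorced: test Taxable Income (in K); exactly 80 goes to YES here (assumption).
    if record["TaxInc"] < 80:
        path.append("TaxInc < 80K")
        return path, "No"
    path.append("TaxInc >= 80K")
    return path, "Yes"


test_record = {"Refund": "No", "MarSt": "Married", "TaxInc": 80}
path, cheat = classify_with_path(test_record)
print(" -> ".join(path), "=> Cheat =", cheat)
# Refund = No -> MarSt = Married => Cheat = No
```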
Rules:
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
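A minimal sketch of how rules R1 to R5 might be stored and matched: each rule is a name, an antecedent (attribute/value pairs that must all hold), and a consequent class, and a rule covers a record when every condition in its antecedent is satisfied. The data layout and function names are assumptions of this example; the hawk and grizzly bear attribute values come from the table that follows.

```python
# Each rule: (name, antecedent as {attribute: required value}, consequent class).
RULES = [
    ("R1", {"Give Birth": "no", "Can Fly": "yes"}, "Birds"),
    ("R2", {"Give Birth": "no", "Live in Water": "yes"}, "Fishes"),
    ("R3", {"Give Birth": "yes", "Blood Type": "warm"}, "Mammals"),
    ("R4", {"Give Birth": "no", "Can Fly": "no"}, "Reptiles"),
    ("R5", {"Live in Water": "sometimes"}, "Amphibians"),
]


def covers(antecedent, record):
    """A rule covers a record when every condition in its antecedent holds."""
    return all(record[attribute] == value for attribute, value in antecedent.items())


def triggered_rules(record):
    """Names and classes of all rules whose antecedent the record satisfies."""
    return [(name, label) for name, antecedent, label in RULES if covers(antecedent, record)]


hawk = {"Blood Type": "warm", "Give Birth": "no", "Can Fly": "yes", "Live in Water": "no"}
grizzly_bear = {"Blood Type": "warm", "Give Birth": "yes", "Can Fly": "no", "Live in Water": "no"}
print(triggered_rules(hawk))          # [('R1', 'Birds')]
print(triggered_rules(grizzly_bear))  # [('R3', 'Mammals')]
```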
Name          Blood Type  Give Birth  Can Fly  Live in Water  Class
hawk          warm        no          yes      no             ?
grizzly bear  warm        yes         no       no             ?
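(Status=Single) → No
Coverage = 40%, Accuracy = 50%
Coverage is the fraction of records that satisfy the rule's antecedent; accuracy is the fraction of covered records whose class matches the rule's consequent. In the 10-record Refund / Marital Status / Taxable Income table, 4 of the 10 records are Single (coverage 4/10), and 2 of those 4 have Cheat = No (accuracy 2/4).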
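The coverage and accuracy figures can be recomputed with a short Python sketch over the 10-record table from earlier in this chapter. The tuple layout and the function name rule_coverage_accuracy are assumptions of this example; the antecedent is passed as a predicate so any rule can be tested.

```python
# (Tid, Refund, Marital Status, Taxable Income in K, Cheat)
RECORDS = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]


def rule_coverage_accuracy(records, antecedent, consequent):
    """Coverage = covered / all records; accuracy = correct among covered / covered."""
    covered = [r for r in records if antecedent(r)]
    coverage = len(covered) / len(records)
    if not covered:
        return coverage, None  # accuracy is undefined when the rule covers nothing
    accuracy = sum(r[-1] == consequent for r in covered) / len(covered)
    return coverage, accuracy


coverage, accuracy = rule_coverage_accuracy(RECORDS, lambda r: r[2] == "Single", "No")
print(f"Coverage = {coverage:.0%}, Accuracy = {accuracy:.0%}")
# Coverage = 40%, Accuracy = 50%
```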
How Does a Rule-based Classifier Work?
Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
lemur          warm        yes         no       no             ?
turtle         cold        no          no       sometimes      ?
dogfish shark  cold        yes         no       yes            ?
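Checking these three animals against rules R1 to R5 shows the three possible outcomes: exactly one rule fires, several rules fire, or none does. The sketch below lists the triggered rules and also applies one common way of resolving them, chosen here as an assumption rather than taken from the slides: rules are tried in list order, the first match wins, and a default class (the hypothetical label "Unknown") is used when no rule fires.

```python
# Each rule: (name, antecedent as {attribute: required value}, consequent class).
RULES = [
    ("R1", {"Give Birth": "no", "Can Fly": "yes"}, "Birds"),
    ("R2", {"Give Birth": "no", "Live in Water": "yes"}, "Fishes"),
    ("R3", {"Give Birth": "yes", "Blood Type": "warm"}, "Mammals"),
    ("R4", {"Give Birth": "no", "Can Fly": "no"}, "Reptiles"),
    ("R5", {"Live in Water": "sometimes"}, "Amphibians"),
]


def matches(antecedent, record):
    return all(record[attribute] == value for attribute, value in antecedent.items())


def triggered_rules(record):
    """Names of every rule that covers the record."""
    return [name for name, antecedent, _ in RULES if matches(antecedent, record)]


def classify_ordered(record, default="Unknown"):
    """Assumed policy: first matching rule in list order wins; otherwise the default class."""
    for _, antecedent, label in RULES:
        if matches(antecedent, record):
            return label
    return default


animals = {
    "lemur": {"Blood Type": "warm", "Give Birth": "yes", "Can Fly": "no", "Live in Water": "no"},
    "turtle": {"Blood Type": "cold", "Give Birth": "no", "Can Fly": "no", "Live in Water": "sometimes"},
    "dogfish shark": {"Blood Type": "cold", "Give Birth": "yes", "Can Fly": "no", "Live in Water": "yes"},
}
for name, record in animals.items():
    print(name, triggered_rules(record), classify_ordered(record))
# lemur ['R3'] Mammals
# turtle ['R4', 'R5'] Reptiles
# dogfish shark [] Unknown
```

The turtle is covered by two rules with different classes, and the dogfish shark by none, so this rule set needs some conflict-resolution and default policy to classify every record.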
- Exhaustive rules
– Classifier has exhaustive coverage if it accounts for every
possible combination of attribute values
– Each record is covered by at least one rule
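For a small rule set, exhaustive coverage can be checked by brute force: enumerate every combination of attribute values and collect the combinations that no rule covers. This sketch applies the check to rules R1 to R5 and assumes each attribute's domain is just the values seen in the animal tables above; with a real schema the domains would come from the data definition.

```python
from itertools import product

# Each rule: (name, antecedent as {attribute: required value}, consequent class).
RULES = [
    ("R1", {"Give Birth": "no", "Can Fly": "yes"}, "Birds"),
    ("R2", {"Give Birth": "no", "Live in Water": "yes"}, "Fishes"),
    ("R3", {"Give Birth": "yes", "Blood Type": "warm"}, "Mammals"),
    ("R4", {"Give Birth": "no", "Can Fly": "no"}, "Reptiles"),
    ("R5", {"Live in Water": "sometimes"}, "Amphibians"),
]

# Assumed attribute domains: only the values that appear in the animal tables.
DOMAINS = {
    "Blood Type": ["warm", "cold"],
    "Give Birth": ["yes", "no"],
    "Can Fly": ["yes", "no"],
    "Live in Water": ["yes", "no", "sometimes"],
}


def uncovered_combinations(rules, domains):
    """Every combination of attribute values that no rule's antecedent matches."""
    attributes = list(domains)
    missing = []
    for values in product(*(domains[a] for a in attributes)):
        record = dict(zip(attributes, values))
        if not any(all(record[a] == v for a, v in antecedent.items())
                   for _, antecedent, _ in rules):
            missing.append(record)
    return missing


missing = uncovered_combinations(RULES, DOMAINS)
print(len(missing), "uncovered combinations")  # 4 uncovered combinations
for record in missing:
    print(record)
```

All four uncovered combinations have Give Birth = yes and Blood Type = cold with Live in Water other than "sometimes", which is why the dogfish shark triggers no rule: R1 to R5 do not have exhaustive coverage.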
Classification Rules

Decision tree:
Refund = Yes: NO
Refund = No: test Marital Status
    Marital Status = {Single, Divorced}: test Taxable Income
        Taxable Income < 80K: NO
        Taxable Income > 80K: YES
    Marital Status = {Married}: NO

Rules extracted from the tree:
(Refund=Yes) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No
(Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes
(Refund=No, Marital Status={Married}) ==> No

- Rules are mutually exclusive and exhaustive.
- The rule set contains as much information as the tree.
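Rules like these can be read off a tree mechanically: each root-to-leaf path becomes one rule whose antecedent is the conjunction of the tests along that path. A sketch in Python, with the tree written as nested tuples (a representation chosen for this example only; the name extract_rules is hypothetical):

```python
# A tree is either a class label (leaf) or (attribute, [(test, subtree), ...]).
TREE = ("Refund", [
    ("=Yes", "No"),
    ("=No", ("Marital Status", [
        ("={Single,Divorced}", ("Taxable Income", [
            ("<80K", "No"),
            (">80K", "Yes"),
        ])),
        ("={Married}", "No"),
    ])),
])


def extract_rules(tree, conditions=()):
    """Yield one (conditions, class) pair per root-to-leaf path."""
    if isinstance(tree, str):  # leaf: the tests collected so far form the antecedent
        yield conditions, tree
        return
    attribute, branches = tree
    for test, subtree in branches:
        yield from extract_rules(subtree, conditions + (attribute + test,))


for conditions, label in extract_rules(TREE):
    print("(" + ", ".join(conditions) + ") ==> " + label)
# (Refund=Yes) ==> No
# (Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No
# (Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes
# (Refund=No, Marital Status={Married}) ==> No
```

Because every record follows exactly one path from the root to a leaf, rules extracted this way are mutually exclusive and exhaustive.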
- Indirect Method:
Extract rules from other classification models (e.g., decision trees, neural networks).
Example: C4.5rules
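Example 1:
[Figure: decision tree whose root tests age?, with branches <=30, 31..40 and >40; one lower-level test has branches excellent and fair; the leaves are labeled yes or no]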
Example 2:
Exercise:
1. Determine the rules that can be derived from the following data: