7 - Classification - Concept - DecisionTree - Evaluation
Example of Classification Task
• Predicting tumor cells as benign or malignant
Apply Model to Test Data

Test data:

  Refund   Marital Status   Taxable Income   Cheat
  No       Married          80K              ?

Start from the root of the tree and, at each internal node, follow the branch that matches the record:

  Refund?
    Yes → NO
    No  → MarSt?
            Single, Divorced → TaxInc?
                                 < 80K → NO
                                 > 80K → YES
            Married → NO

Refund = No, so go to the MarSt node; Marital Status = Married, so the record reaches the leaf NO. Assign Cheat to "No".
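The traversal above can also be written directly as nested attribute tests. A minimal Python sketch (the record's field names are illustrative, not from the slides):

  # Hard-coded version of the tree above; returns the predicted Cheat label.
  def classify(record):
      if record["Refund"] == "Yes":
          return "No"                                   # Refund: Yes -> leaf NO
      if record["MaritalStatus"] == "Married":
          return "No"                                   # MarSt: Married -> leaf NO
      # MarSt: Single or Divorced -> test Taxable Income at the 80K threshold
      return "Yes" if record["TaxableIncome"] > 80_000 else "No"

  record = {"Refund": "No", "MaritalStatus": "Married", "TaxableIncome": 80_000}
  print(classify(record))                               # -> No (assign Cheat to "No")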
Decision Tree Classification Task
[Figure: a tree-induction algorithm learns a decision tree (the Model) from the labeled Training Set (Tid, Attrib1, Attrib2, Attrib3, Class; e.g. Tid 1: Yes, Large, 125K, No); the learned tree is then applied to the Test Set, whose Class labels are unknown (e.g. Tid 11: No, Small, 55K, ? and Tid 15: No, Large, 67K, ?)]
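This induce-then-apply workflow maps directly onto a library call. A minimal sketch using scikit-learn's DecisionTreeClassifier (the encoded toy data is made up for illustration, not taken from the slides):

  from sklearn.tree import DecisionTreeClassifier

  # Training set: categorical attributes encoded as integers, income in K.
  X_train = [[1, 2, 125], [0, 1, 100], [0, 0, 70], [1, 1, 120], [0, 1, 95]]
  y_train = ["No", "No", "No", "Yes", "Yes"]            # known Class labels

  model = DecisionTreeClassifier(criterion="gini")      # induction algorithm
  model.fit(X_train, y_train)                           # learn the tree

  # Test set: Class is "?"; the learned model predicts it.
  X_test = [[0, 0, 55], [0, 2, 67]]                     # e.g. Tid 11 and Tid 15
  print(model.predict(X_test))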
Decision Tree Induction
• Many Algorithms:
• Hunt’s Algorithm (one of the earliest)
• CART
• ID3, C4.5
• SLIQ, SPRINT
Tree Induction
• Greedy strategy
  • Split the records based on an attribute test that optimizes
    a certain criterion (see the sketch after this list).
• Issues
• Determine how to split the records
• How to specify the attribute test condition?
• How to determine the best split?
• Determine when to stop splitting
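A rough, self-contained sketch of this greedy loop in the spirit of Hunt's algorithm, assuming numeric attributes and a Gini splitting criterion (all names here are illustrative, not from the slides):

  from collections import Counter

  def gini(labels):
      n = len(labels)
      return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

  def best_split(rows, labels):
      """Greedy step: try every (attribute, threshold) pair and keep the
      one with the lowest weighted impurity of the two children."""
      best, best_score = None, float("inf")
      for attr in range(len(rows[0])):
          for thr in {row[attr] for row in rows}:
              left = [y for r, y in zip(rows, labels) if r[attr] <= thr]
              right = [y for r, y in zip(rows, labels) if r[attr] > thr]
              if not left or not right:
                  continue                              # degenerate split, skip
              score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
              if score < best_score:
                  best, best_score = (attr, thr), score
      return best

  def grow(rows, labels):
      split = best_split(rows, labels)
      # Stopping criteria: the node is pure, or no useful split exists.
      if len(set(labels)) == 1 or split is None:
          return Counter(labels).most_common(1)[0][0]   # leaf: majority class
      attr, thr = split
      left = [(r, y) for r, y in zip(rows, labels) if r[attr] <= thr]
      right = [(r, y) for r, y in zip(rows, labels) if r[attr] > thr]
      return {"test": (attr, thr),
              "left": grow([r for r, _ in left], [y for _, y in left]),
              "right": grow([r for r, _ in right], [y for _, y in right])}

  tree = grow([[125], [100], [70], [120]], ["No", "No", "No", "Yes"])
  print(tree)   # nested dict of tests with class-label leaves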
How to Specify Test Condition?
• Depends on attribute types
• Nominal
• Ordinal
• Continuous
[Figure: two test conditions on the continuous attribute Taxable Income: a binary split ("Taxable Income > 80K?" with branches Yes / No) and a multi-way split ("Taxable Income?" with range branches from < 10K up to > 80K)]
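For a binary split, a threshold such as the 80K above has to come from the data. A common approach (an assumption here; the slide does not spell it out) is to sort the observed values and take the midpoints between consecutive distinct values as candidate thresholds:

  # Illustrative Taxable Income values (in K); not the full slide data set.
  incomes = [60, 70, 75, 85, 90, 95, 100, 120, 125, 220]

  values = sorted(set(incomes))
  candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
  print(candidates)   # each midpoint defines one test, e.g. "Income > 80K?"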
Classification Metrics / Measures of Node Impurity
• Gini impurity / index: used by the CART (Classification And Regression
  Tree) algorithm. Gini impurity measures how often a randomly chosen
  element from the set would be incorrectly labeled if it were labeled
  randomly according to the distribution of labels in the subset.
• Information gain / entropy: used by the ID3, C4.5, and C5.0
  tree-generation algorithms. Information gain is based on the concept of
  entropy from information theory. Both measures are sketched below.
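A small Python sketch of the two measures for a single node, given the class counts at that node (Gini = 1 - sum_i p_i^2; entropy = -sum_i p_i log2 p_i):

  from math import log2

  def gini(counts):
      n = sum(counts)
      return 1.0 - sum((c / n) ** 2 for c in counts)

  def entropy(counts):
      n = sum(counts)
      # p * log2(1/p) equals -p * log2(p); zero counts are skipped
      return sum((c / n) * log2(n / c) for c in counts if c > 0)

  print(gini([5, 5]), entropy([5, 5]))    # 0.5 1.0  (maximally impure node)
  print(gini([10, 0]), entropy([10, 0]))  # 0.0 0.0  (pure node)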
Stopping Criteria for Tree Induction
• Stop expanding a node when all the records belong
to the same class
Practical Issues of Classification
• Underfitting and Overfitting
• Missing Values
• Costs of Classification
Underfitting and Overfitting (Example)

500 circular and 500 triangular data points.

Circular points: 0.5 ≤ sqrt(x1^2 + x2^2) ≤ 1

Triangular points: sqrt(x1^2 + x2^2) > 1 or sqrt(x1^2 + x2^2) < 0.5
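A sketch of generating such a data set (the uniform sampling window and the random seed are assumptions, not given on the slide):

  import numpy as np

  rng = np.random.default_rng(0)
  circles, triangles = [], []
  while len(circles) < 500 or len(triangles) < 500:
      x1, x2 = rng.uniform(-1.5, 1.5, size=2)
      r = np.hypot(x1, x2)                    # sqrt(x1^2 + x2^2)
      if 0.5 <= r <= 1 and len(circles) < 500:
          circles.append((x1, x2))            # circular class: inside the ring
      elif (r > 1 or r < 0.5) and len(triangles) < 500:
          triangles.append((x1, x2))          # triangular class: outside the ring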
Underfitting and Overfitting

[Figure: training and test error versus model complexity; the region where test error grows while training error keeps shrinking is marked "Overfitting"]

Underfitting: when the model is too simple, both training and test errors are large.
Overfitting due to Noise

[Figure: a decision boundary distorted by noisy (mislabeled) training points]

Overfitting due to Insufficient Examples

Lack of data points in the lower half of the diagram makes it difficult to correctly predict the class labels of that region:
• an insufficient number of training records in the region causes the decision tree to predict the test examples using other training records that are irrelevant to the classification task.
Notes on Overfitting
• Overfitting results in decision trees that are more complex
than necessary
Confusion matrix:

                          PREDICTED CLASS
                          Class=Yes   Class=No
  ACTUAL     Class=Yes    a (TP)      b (FN)
  CLASS      Class=No     c (FP)      d (TN)

  a: TP (true positive)     b: FN (false negative)
  c: FP (false positive)    d: TN (true negative)
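The standard evaluation metrics fall straight out of these four cells. A minimal sketch with made-up counts (the slide itself only defines the cells; precision and recall are standard additions, not from the slide):

  def metrics(tp, fn, fp, tn):
      accuracy = (tp + tn) / (tp + fn + fp + tn)   # (a + d) / (a + b + c + d)
      precision = tp / (tp + fp)                   # a / (a + c)
      recall = tp / (tp + fn)                      # a / (a + b)
      return accuracy, precision, recall

  print(metrics(tp=50, fn=10, fp=5, tn=35))        # (0.85, 0.909..., 0.833...)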
Example of a Decision Tree

[Figure: the tax-evasion training data (Tid, Refund, Marital Status, Taxable Income, Cheat; e.g. Tid 8: No, Single, 85K, Yes) next to the decision tree induced from it; the internal nodes Refund, MarSt, and TaxInc are the splitting attributes, with TaxInc branching at < 80K → NO and > 80K → YES]
UTS (midterm exam)
• Friday, 13 Oct 2017, 08:30