A08 Decision Trees 2up
Mehul Motani
Electrical & Computer Engineering
National University of Singapore
Email: [email protected]
Office: E4-05-18
Tel: 6516 6918
Decision Trees
• Use training data to build the decision tree.
• Use the decision tree to predict categories for new events.
[Figure: training events and their categories feed into building the decision tree; the decision tree then assigns a category to each new event.]
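As a concrete illustration of this build-then-predict loop, here is a minimal sketch using scikit-learn (not part of the original slides; the toy feature vectors and labels are made up):

```python
from sklearn.tree import DecisionTreeClassifier

# Training events (feature vectors) and their known categories
X_train = [[0, 1], [1, 1], [0, 0], [1, 0]]
y_train = ["A", "A", "B", "B"]

# Build the decision tree from the training data
tree = DecisionTreeClassifier()
tree.fit(X_train, y_train)

# Use the tree to predict categories for new events
X_new = [[1, 1], [0, 0]]
print(tree.predict(X_new))  # expected: ['A' 'B']
```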
[Figure: a decision tree fragment with tests such as near(stocking) and near(race), yes/no branches, and a leaf labeled run3.]
How do we choose the best attribute to split on?
Impurity is Uncertainty
• The key idea is to think of Impurity as Uncertainty
• We will use the counts at the leaves to define probability distributions, and use those distributions to measure uncertainty
[Figure: three groups of points. A very impure group has high uncertainty; a less impure group has less uncertainty; a pure group has no uncertainty.]
Worked example: information gain for a binary split. The parent node has 30 instances (14 of one class, 16 of the other); its children have 17 instances (split 13 vs. 4) and 13 instances (split 1 vs. 12).

1. Parent entropy: $-\frac{14}{30}\log_2\frac{14}{30} - \frac{16}{30}\log_2\frac{16}{30} = 0.996$
2. Child entropies: $-\frac{13}{17}\log_2\frac{13}{17} - \frac{4}{17}\log_2\frac{4}{17} = 0.787$ (17 instances) and $-\frac{1}{13}\log_2\frac{1}{13} - \frac{12}{13}\log_2\frac{12}{13} = 0.391$ (13 instances)
3. Average entropy of children: $\frac{17}{30} \times 0.787 + \frac{13}{30} \times 0.391 = 0.615$
4. Information gain: $0.996 - 0.615 = 0.38$
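The same arithmetic can be checked in a few lines of Python (a sketch using the class counts above; the `entropy` helper is defined here, not lecture code):

```python
import math

def entropy(counts):
    """Entropy in bits of the class distribution given by counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

parent = entropy([14, 16])                 # ~0.996 (parent, 30 instances)
child1 = entropy([13, 4])                  # ~0.787 (17 instances)
child2 = entropy([1, 12])                  # ~0.391 (13 instances)
average = (17 / 30) * child1 + (13 / 30) * child2   # ~0.615
print("information gain:", parent - average)        # ~0.38
```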
Example: evaluating splits with the Classification Error measure

X  Y  Z  C
1  1  1  I
1  1  0  I
0  0  1  II
1  0  0  II

Which attribute is best? Which is worst? Does it make sense?
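One way to answer the question under the classification-error measure is to count, for each attribute, how many training rows a majority-vote split would misclassify. A small sketch (the `split_error` helper is illustrative, not from the slides):

```python
from collections import Counter

# The four training rows from the table above
rows = [
    {"X": 1, "Y": 1, "Z": 1, "C": "I"},
    {"X": 1, "Y": 1, "Z": 0, "C": "I"},
    {"X": 0, "Y": 0, "Z": 1, "C": "II"},
    {"X": 1, "Y": 0, "Z": 0, "C": "II"},
]

def split_error(attr):
    """Rows misclassified if we split on attr and predict each leaf's majority class."""
    errors = 0
    for value in {r[attr] for r in rows}:
        leaf = [r["C"] for r in rows if r[attr] == value]
        errors += len(leaf) - Counter(leaf).most_common(1)[0][1]
    return errors

for attr in "XYZ":
    print(attr, split_error(attr))  # expected: X 1, Y 0, Z 2 -> Y best, Z worst
```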
Avoid Overfitting
• Occam's Razor
– "If two theories explain the facts equally well, then the simpler theory is to be preferred"
– There are fewer short hypotheses than long ones
– A short hypothesis that fits the data is unlikely to be a coincidence
– A long hypothesis that fits the data might be a coincidence
• Stop growing the tree when a split is not statistically significant
• Grow the full tree, then post-prune (sketched below)
– Prune the tree to reduce errors or improve accuracy
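As one concrete realization of grow-then-prune, scikit-learn's cost-complexity pruning (`ccp_alpha`) can be compared against the unpruned tree; the dataset and alpha value below are illustrative assumptions, not from the lecture:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Grow the full tree, then compare with a cost-complexity-pruned tree
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_tr, y_tr)

print("full  :", full.score(X_te, y_te), "leaves:", full.get_n_leaves())
print("pruned:", pruned.score(X_te, y_te), "leaves:", pruned.get_n_leaves())
```

A pruned tree typically has far fewer leaves with similar (often better) test accuracy, which is exactly the Occam's Razor argument above.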
Ensemble Learning
• An ensemble method combines the predictions of multiple machine learning models to make more accurate predictions than any individual model.
• Popular ensemble methods are Bagging, Boosting, and Stacking.
• Ensembling can reduce overfitting without sacrificing performance (see the bagging sketch below).
[Figure: individual predictors (gray lines) wiggle a lot and clearly overfit; the averaged ensemble predictor (red line) is more stable and overfits less.]
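A minimal bagging sketch (assuming scikit-learn; the dataset and hyperparameters are illustrative) comparing a single deep tree with an average of 100 trees fit on bootstrap samples:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One deep tree vs. an average of 100 trees fit on bootstrap samples
single = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                           random_state=0).fit(X_tr, y_tr)

print("single tree :", single.score(X_te, y_te))
print("bagged trees:", bagged.score(X_te, y_te))
```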
Bootstrapping: random sampling with replacement
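A one-line view of bootstrapping with NumPy (the data values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.array([10, 20, 30, 40, 50])

# One bootstrap sample: same size as the data, drawn with replacement,
# so some items repeat and others are left out
sample = rng.choice(data, size=len(data), replace=True)
print(sample)  # e.g. [50 30 20 10 10] -- duplicates are expected
```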
Bias-Variance Tradeoff
[Figure: 2×2 grid illustrating precision vs. accuracy. Low variance corresponds to high precision and high variance to low precision; low bias corresponds to high accuracy and high bias to low accuracy.]
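A small simulation can make the tradeoff concrete: deep trees have low bias but high variance across resampled training sets, while shallow stumps behave the opposite way. The setup below (sine target, noise level, depths) is an illustrative assumption, not from the lecture:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x_grid = np.linspace(0, 1, 200).reshape(-1, 1)
true_f = np.sin(2 * np.pi * x_grid).ravel()

def predictions(max_depth, n_runs=200):
    """Fit one tree per resampled training set; return all grid predictions."""
    preds = []
    for _ in range(n_runs):
        x = rng.uniform(0, 1, 50).reshape(-1, 1)
        y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.3, 50)
        preds.append(DecisionTreeRegressor(max_depth=max_depth).fit(x, y).predict(x_grid))
    return np.array(preds)

for depth in (1, 10):
    p = predictions(depth)
    bias2 = np.mean((p.mean(axis=0) - true_f) ** 2)   # squared bias of the mean prediction
    var = np.mean(p.var(axis=0))                      # average variance across runs
    print(f"depth={depth}: bias^2={bias2:.3f}  variance={var:.3f}")
```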