Camm 4e Ch09 PPT
© 2021 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Predictive Data Mining
Chapter 9
Introduction (Slide 1 of 2)
• An observation, or record, is the set of recorded values of variables
associated with a single entity.
• Supervised learning: Data mining methods for predicting an
outcome based on a set of input variables, or features.
• Supervised learning can be used for:
• Estimation of a continuous outcome.
• Classification of a categorical outcome.
Introduction (Slide 2 of 2)
The data mining process comprises the following steps:
1. Data sampling.
2. Data preparation.
3. Data partitioning.
4. Model construction.
5. Model assessment.
Data Sampling, Preparation, and Partitioning
Data Sampling, Preparation, and Partitioning
(Slide 1 of 7)
Data Sampling, Preparation, and Partitioning
(Slide 2 of 7)
Data Sampling, Preparation, and Partitioning
(Slide 3 of 7)
• Overfitting occurs when the analyst builds a model that does a great
job of explaining the sample of data on which it is based, but fails to
accurately predict outside the sample data.
• We can use the abundance of data to guard against the potential for
overfitting by splitting the data set into different subsets for:
• The training (or construction) of candidate models.
• The validation (or performance comparison) of candidate models.
• The testing (or assessment) of future performance of a selected
model.
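The code below is a minimal sketch of this three-way partition using scikit-learn; the 60/20/20 proportions and the variable names are illustrative assumptions, not prescribed by the chapter.

```python
# Hypothetical three-way partition: 60% training, 20% validation, 20% test.
from sklearn.model_selection import train_test_split

def partition(X, y, seed=42):
    # First hold out 20% of the observations as the test set.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.20, random_state=seed)
    # Split the remaining 80% into training and validation sets
    # (0.25 of 80% = 20% of the original data).
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.25, random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test
```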
Data Sampling, Preparation, and Partitioning
(Slide 4 of 7)
Data Sampling, Preparation, and Partitioning
(Slide 5 of 7)
k-Fold Cross-Validation
• k-Fold Cross-Validation: A robust procedure to train and validate models in which the observations used for training and validation are randomly divided into k subsets called folds. In each iteration, one fold is designated as the validation set and the remaining k − 1 folds are designated as the training set. The results of the iterations are then combined and evaluated.
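A minimal k-fold cross-validation sketch with scikit-learn; the choice of k = 5, the synthetic data, and logistic regression as the candidate model are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
folds = KFold(n_splits=5, shuffle=True, random_state=0)
# One accuracy score per iteration; each fold serves once as the validation set.
scores = cross_val_score(LogisticRegression(), X, y, cv=folds)
print(scores.mean())  # combined result across the 5 iterations
```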
Data Sampling, Preparation, and Partitioning
(Slide 6 of 7)
k-Fold Cross-Validation
• A special case of k-fold cross-validation is leave-one-out cross-validation.
• In this case, the number of folds equals the number of observations
in the combined training and validation data.
Data Sampling, Preparation, and Partitioning
(Slide 7 of 7)
Performance Measures
Evaluating the Classification of Categorical Outcomes
Evaluating the Estimation of Continuous Outcomes
Performance Measures (Slide 1 of 19)
Evaluating the Classification of Categorical Outcomes:
• By counting the classification errors on a sufficiently large validation set
and/or test set that is representative of the population, we will generate
an accurate measure of the model’s classification performance.
• Classification confusion matrix: Displays a model’s correct and incorrect
classifications.
Performance Measures (Slide 2 of 19)
Table 9.1: Confusion Matrix
Performance Measures (Slide 3 of 19)
Evaluating the Classification of Categorical Outcomes (cont.):
• One minus the overall error rate is often referred to as the accuracy of
the model.
• While the overall error rate conveys an aggregate measure of misclassification, it counts a false positive (misclassifying an actual Class 0 observation as Class 1) the same as a false negative (misclassifying an actual Class 1 observation as Class 0).
Performance Measures (Slide 4 of 19)
Evaluating the Classification of Categorical Outcomes (cont.):
• To account for the asymmetric costs in misclassification, we define the
error rate with respect to the individual classes:
• Class 1 error rate $= \frac{n_{10}}{n_{11} + n_{10}}$
• Class 0 error rate $= \frac{n_{01}}{n_{00} + n_{01}}$
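A minimal sketch of these computations in Python; the toy labels are illustrative.

```python
from sklearn.metrics import confusion_matrix

actual    = [1, 1, 0, 0, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1, 0, 0, 0]

# Rows are actual class, columns are predicted class:
# [[n00, n01], [n10, n11]] with labels ordered [0, 1].
(n00, n01), (n10, n11) = confusion_matrix(actual, predicted, labels=[0, 1])
print("Overall error rate:", (n01 + n10) / (n00 + n01 + n10 + n11))
print("Class 1 error rate:", n10 / (n11 + n10))  # false negatives
print("Class 0 error rate:", n01 / (n00 + n01))  # false positives
```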
Performance Measures (Slide 5 of 19)
Table 9.2: Classification Probabilities

Actual Class   Probability of Class 1      Actual Class   Probability of Class 1
1              1.00                        0              0.66
1              1.00                        0              0.65
0              1.00                        1              0.64
1              1.00                        0              0.62
0              1.00                        0              0.60
0              0.90                        0              0.51
1              0.90                        0              0.49
0              0.88                        0              0.49
0              0.88                        1              0.46
1              0.88                        0              0.46
Performance Measures (Slide 6 of 19)
Table 9.2: Classification Probabilities (cont.)

Actual Class   Probability of Class 1      Actual Class   Probability of Class 1
0              0.87                        1              0.45
0              0.87                        1              0.45
0              0.87                        0              0.45
0              0.86                        0              0.44
1              0.86                        0              0.44
0              0.86                        0              0.30
0              0.86                        0              0.28
0              0.85                        0              0.26
0              0.84                        1              0.24
0              0.84                        0              0.22
Performance Measures (Slide 7 of 19)
Table 9.2: Classification Probabilities (cont.)

Actual Class   Probability of Class 1      Actual Class   Probability of Class 1
0              0.83                        0              0.21
0              0.68                        0              0.04
0              0.67                        0              0.04
0              0.67                        0              0.01
0              0.67                        0              0.00
Performance Measures (Slide 8 of 19)
Table 9.3: Classification Confusion Matrices and Error Rates for Various Cutoff Values
Performance Measures (Slide 9 of 19)
Table 9.3: Classification Confusion Matrices and Error Rates for Various
Cutoff Values (cont.)
Performance Measures (Slide 10 of 19)
Table 9.3: Classification Confusion Matrices and Error Rates for Various
Cutoff Values (cont.)
Performance Measures (Slide 11 of 19)
Figure 9.1: Classification Error Rates vs. Cutoff Value
Performance Measures (Slide 12 of 19)
Evaluating the Classification of Categorical Outcomes (cont.):
• Cumulative lift chart: Compares the number of actual Class 1 observations identified when observations are considered in decreasing order of their estimated probability of being in Class 1 with the number identified if observations are selected at random.
• Decile-wise lift chart: Another way to view how much better a classifier is
at identifying Class 1 observations than random classification.
• Observations are ordered in decreasing probability of Class 1 membership
and then considered in 10 equal-sized groups.
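A minimal sketch of the cumulative-lift computation; the toy labels and estimated probabilities are illustrative.

```python
import numpy as np

actual = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
prob_class1 = np.array([0.9, 0.8, 0.75, 0.7, 0.6, 0.5, 0.45, 0.3, 0.2, 0.1])

order = np.argsort(-prob_class1)      # consider observations in decreasing probability
cum_hits = np.cumsum(actual[order])   # actual Class 1 observations identified so far
cum_random = actual.mean() * np.arange(1, len(actual) + 1)  # expected under random selection
print(cum_hits / cum_random)          # cumulative lift at each depth
```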
Performance Measures (Slide 13 of 19)
Figure 9.2: Cumulative and Decile-Wise Lift Charts
Performance Measures (Slide 14 of 19)
Evaluating the Classification of Categorical Outcomes (cont.):
• The ability to correctly predict Class 1 (positive) observations is
commonly expressed as sensitivity, or recall, and is calculated as:
Sensitivity $= \frac{n_{11}}{n_{11} + n_{10}} = 1 - \text{Class 1 error rate}$
Performance Measures (Slide 15 of 19)
Evaluating the Classification of Categorical Outcomes (cont.):
• Precision is the proportion of observations a classifier predicts to be in Class 1 that actually are in Class 1:

Precision $= \frac{n_{11}}{n_{11} + n_{01}}$
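A minimal sketch computing both measures from the counts, with scikit-learn's built-in scorers for comparison; the toy labels are illustrative.

```python
from sklearn.metrics import precision_score, recall_score

actual    = [1, 1, 0, 0, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1, 0, 0, 0]

n11 = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
n10 = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
n01 = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
print(n11 / (n11 + n10), recall_score(actual, predicted))     # sensitivity (recall)
print(n11 / (n11 + n01), precision_score(actual, predicted))  # precision
```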
Performance Measures (Slide 16 of 19)
Evaluating the Classification of Categorical Outcomes (cont.):
• The receiver operating characteristic (ROC) curve is an alternative
graphical approach for displaying the tradeoff between a classifier’s ability
to correctly identify Class 1 observations and its Class 0 error rate.
• In general, we can evaluate the quality of a classifier by computing the area
under the ROC curve, often referred to as the AUC.
• The greater the area under the ROC curve, i.e., the larger the AUC, the
better the classifier performs.
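A minimal sketch tracing the ROC curve and computing the AUC from estimated Class 1 probabilities; the toy data are illustrative.

```python
from sklearn.metrics import roc_auc_score, roc_curve

actual = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
prob_class1 = [0.9, 0.8, 0.75, 0.7, 0.6, 0.5, 0.45, 0.3, 0.2, 0.1]

# One (Class 0 error rate, sensitivity) point per candidate cutoff.
fpr, tpr, cutoffs = roc_curve(actual, prob_class1)
print(roc_auc_score(actual, prob_class1))  # the larger the AUC, the better
```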
Performance Measures (Slide 17 of 19)
Figure 9.3: Receiver Operating Characteristic (ROC) Curve
Performance Measures (Slide 18 of 19)
Evaluating the Estimation of Continuous Outcomes:
• The measures of accuracy are some function of the error in estimating an
outcome for an observation i.
• Two common measures, where $e_i$ is the error in estimating the outcome for observation $i$, are:
• Average error $= \frac{1}{n}\sum_{i=1}^{n} e_i$
• Root mean squared error (RMSE) $= \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^2}$
The average error estimates the bias in a model’s predictions:
• If the average error is negative, then the model tends to overestimate the value
of the outcome variable.
• If the average error is positive, the model tends to underestimate.
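A minimal sketch of both measures, assuming $e_i$ is defined as actual minus predicted, which is consistent with the bias interpretation above; the toy values are illustrative.

```python
import numpy as np

actual = np.array([1200.0, 300.0, 950.0, 410.0])
predicted = np.array([1100.0, 350.0, 1000.0, 400.0])

e = actual - predicted                    # estimation error per observation
print("Average error:", e.mean())         # negative => tends to overestimate
print("RMSE:", np.sqrt((e ** 2).mean()))  # penalizes large errors more heavily
```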
Performance Measures (Slide 19 of 19)
Table 9.4: Computed Error in Estimates of Average Balance for 10
Customers
Logistic Regression
Logistic Regression (Slide 1 of 8)
• Logistic regression attempts to classify a binary categorical outcome
(y = 0 or 1) as a linear function of explanatory variables.
• A linear regression model fails to appropriately explain a categorical
outcome variable.
Logistic Regression (Slide 2 of 8)
Figure 9.4: Scatter Chart and Simple Linear Regression Fit for Oscars Example
Logistic Regression (Slide 3 of 8)
Figure 9.5: Residuals for Simple
Linear Regression on Oscars Data
An unmistakable pattern of
systematic misprediction suggests
that the simple linear regression
model is not appropriate.
Logistic Regression (Slide 4 of 8)
Odds is a measure related to probability.
If an estimate of the probability of an event is $\hat{p}$, then the equivalent odds measure is $\hat{p}/(1 - \hat{p})$.
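For example, if $\hat{p} = 0.8$, the equivalent odds measure is $0.8/(1 - 0.8) = 4$, read as odds of 4 to 1 in favor of the event.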
Logistic Regression (Slide 5 of 8)
• Logistic regression model:
$\ln\!\left(\frac{\hat{p}}{1 - \hat{p}}\right) = b_0 + b_1 x_1 + \cdots + b_q x_q$
Logistic Regression (Slide 6 of 8)
Figure 9.6: Logistic S-Curve for Oscars Example
Logistic Regression (Slide 7 of 8)
• Logistic regression classifies an observation by using the logistic
function to compute the probability of an observation belonging to
Class 1 and then comparing this probability to a cutoff value.
• If the probability exceeds the cutoff value, the observation is classified as Class 1; otherwise, it is classified as Class 0.
• While a logistic regression model used for prediction should
ultimately be judged based on its classification accuracy on
validation and test sets, Mallow's $C_p$ statistic is a measure
commonly computed by statistical software that can be used to
identify models with promising sets of variables.
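A minimal sketch of this classify-by-cutoff logic with scikit-learn; the synthetic data and the 0.5 cutoff are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

prob_class1 = model.predict_proba(X)[:, 1]      # probabilities from the logistic S-curve
cutoff = 0.5
predicted = (prob_class1 > cutoff).astype(int)  # Class 1 if probability exceeds the cutoff
```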
Logistic Regression (Slide 8 of 8)
Table 9.5: Predicted Probabilities by Logistic Regression for Oscars
Example
k-Nearest Neighbors
Classifying Categorical Outcomes with k-Nearest Neighbors
Estimating Continuous Outcomes with k-Nearest Neighbors
k-Nearest Neighbors (Slide 1 of 7)
• k-Nearest Neighbors (k-NN): This method can be used either to
classify a categorical outcome or to estimate a continuous outcome.
• k-NN uses the k most similar observations from the training set,
where similarity is typically measured with Euclidean distance.
k-Nearest Neighbors (Slide 2 of 7)
Classifying Categorical Outcomes with k-Nearest Neighbors:
• A nearest-neighbor classifier is a “lazy learner” that directly uses the
entire training set to classify observations in the validation and test sets.
• The value of k can plausibly range from 1 to n, the number of
observations in the training set.
• If k = 1, then the classification of a new observation is set to be equal to the
class of the single most similar observation from the training set.
• If k = n, then the new observation’s class is naïvely assigned to the most
common class in the training set.
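A minimal sketch of a k-NN classifier fit to the Table 9.6 training data shown on the next slide; standardizing the features first is an assumption, made because Euclidean distance would otherwise be dominated by the scale of Average Balance.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X = [[49, 38], [671, 26], [772, 47], [136, 48], [123, 40],
     [36, 29], [192, 31], [6574, 35], [2200, 58], [2100, 30]]  # balance, age
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]                             # loan default

scaler = StandardScaler().fit(X)
knn = KNeighborsClassifier(n_neighbors=3).fit(scaler.transform(X), y)
# Majority class of the 3 nearest neighbors; Table 9.7 reports Class 0 for k = 3.
print(knn.predict(scaler.transform([[900, 28]])))
```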
k-Nearest Neighbors (Slide 3 of 7)
Table 9.6: Training Set Observations for k-NN Classifier
Observation Average Balance Age Loan Default
1 49 38 1
2 671 26 1
3 772 47 1
4 136 48 1
5 123 40 1
6 36 29 0
7 192 31 0
8 6,574 35 0
9 2,200 58 0
10 2,100 30 0
Average: 1,285 38.2
Standard Deviation: 2,029 10.2
k-Nearest Neighbors (Slide 4 of 7)
Figure 9.7: Scatter Chart for k-NN Classification
k-Nearest Neighbors (Slide 5 of 7)
Table 9.7: Classification of Observation with Average Balance = 900 and Age = 28 for Different Values of k

k    % of Class 1 Neighbors    Classification
1    1.00                      1
2    0.50                      1
3    0.33                      0
4    0.25                      0
5    0.40                      0
6    0.50                      1
7    0.57                      1
8    0.63                      1
9    0.56                      1
10   0.50                      1
k-Nearest Neighbors (Slide 6 of 7)
Estimating Continuous Outcomes with k-Nearest Neighbors:
• When k-NN is used to estimate a continuous outcome, a new
observation’s outcome value is predicted to be the average of the
outcome values of its k-nearest neighbors in the training set.
• The value of k can plausibly range from 1 to n, the number of
observations in the training set.
Figure 9.8: Scatter Chart for k-NN Estimation
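A minimal sketch of k-NN estimation using Age alone to estimate Average Balance for the Table 9.6 observations; the single-feature setup is an assumption for illustration.

```python
from sklearn.neighbors import KNeighborsRegressor

ages = [[38], [26], [47], [48], [40], [29], [31], [35], [58], [30]]
balances = [49, 671, 772, 136, 123, 36, 192, 6574, 2200, 2100]

for k in (1, 5, 10):
    knn = KNeighborsRegressor(n_neighbors=k).fit(ages, balances)
    # Prediction is the mean balance of the k nearest ages; k = 10 should
    # return the overall mean of $1,285, as in Table 9.8.
    print(k, knn.predict([[28]]))
```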
k-Nearest Neighbors (Slide 7 of 7)
Table 9.8: Estimation of Average Balance for Observation with Age = 28 for Different Values of k

k    Average Balance Estimate
1    $36
2    $936
3    $936
4    $750
5    $1,915
6    $1,604
7    $1,392
8    $1,315
9    $1,184
10   $1,285
Classification and Regression Trees
Classifying Categorical Outcomes with a Classification Tree
Estimating Continuous Outcomes with a Regression Tree
Ensemble Methods
Classification and Regression Trees
(Slide 1 of 20)
Classification and Regression Trees
(Slide 3 of 20)
Classification and Regression Trees
(Slide 4 of 20)
Classification and Regression Trees
(Slide 5 of 20)
Classification and Regression Trees
(Slide 6 of 20)
Classification and Regression Trees
(Slide 7 of 20)
Classification and Regression Trees
(Slide 8 of 20)
Classification and Regression Trees
(Slide 9 of 20)
Estimating Continuous Outcomes with a Regression Tree:
• A regression tree successively partitions observations of the training set
into smaller and smaller groups in a similar fashion as a classification tree.
• The differences are:
• A regression tree bases the impurity of a partition on the variance of the outcome value for the observations in the group.
• After a final tree is constructed, the estimated outcome value of an
observation is based on the mean outcome value of the partition in which the
new observation belongs.
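A minimal regression-tree sketch consistent with this description, using the squared-error (variance) split criterion and leaf-mean predictions; the synthetic data and depth limit are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=100, n_features=3, noise=10.0, random_state=0)
# Impurity of a partition is measured by the variance ("squared_error").
tree = DecisionTreeRegressor(criterion="squared_error",
                             max_depth=3, random_state=0).fit(X, y)
print(tree.predict(X[:5]))  # each estimate is the mean outcome of a leaf partition
```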
Classification and Regression Trees
(Slide 10 of 20)
Classification and Regression Trees
(Slide 11 of 20)
Ensemble Methods:
• In an ensemble method, predictions are made based on the combination
of a collection of models.
• Two necessary conditions for an ensemble to perform better than a
single model:
1. Individual base models are constructed independently of each other.
2. Individual models perform better than just randomly guessing.
Classification and Regression Trees
(Slide 12 of 20)
Ensemble Methods (cont.):
• Two primary steps to an ensemble approach:
1. The development of a committee of individual base models.
2. The combination of the individual base models’ predictions to form a composite
prediction.
• A classification or estimation method is unstable if relatively small changes in
the training set cause its predictions to fluctuate.
• Three different ways to construct an ensemble of classification or regression
trees:
• Bagging.
• Boosting.
• Random forests.
Classification and Regression Trees
(Slide 13 of 20)
Ensemble Methods (cont.):
• In the bagging approach, the committee of individual base models is
generated by first constructing multiple training sets by repeated random
sampling of the n observations in the original data with replacement.
Table 9.10: Original 10-Observation Training Data
Age            29   31   35   38   47   48   53   54   58   70
Loan default    0    0    0    1    1    1    1    0    0    0
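A minimal bagging sketch on the Table 9.10 data; scikit-learn's BaggingClassifier draws a bootstrap sample (random sampling with replacement) for each base model, and its default base model is a classification tree. The choice of 10 trees mirrors Table 9.11.

```python
from sklearn.ensemble import BaggingClassifier

X = [[29], [31], [35], [38], [47], [48], [53], [54], [58], [70]]  # age
y = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]                                # loan default

bag = BaggingClassifier(n_estimators=10, random_state=0).fit(X, y)
print(bag.predict([[37], [42]]))  # composite prediction by majority vote
```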
Classification and Regression Trees
(Slide 14 of 20)
Ensemble Methods (cont.):
• The boosting method generates its committee of individual base models by sampling multiple training sets.
• Boosting iteratively adapts how it samples the original data when
constructing a new training set based on the prediction error of the
models constructed on the previous training sets.
• Random forests can be viewed as a variation of bagging specifically
tailored for use with classification or regression trees.
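A minimal sketch contrasting a boosting ensemble with a random forest; AdaBoost is one common boosting variant, and the synthetic data and settings are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=1)
boost = AdaBoostClassifier(n_estimators=50, random_state=1).fit(X, y)         # adapts sampling to errors
forest = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)   # bagging-style variant
print(boost.score(X, y), forest.score(X, y))
```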
Classification and Regression Trees
(Slide 15 of 20)
Table 9.11: Bagging: Generation of 10 New Training Sets and Corresponding
Classification Trees
Classification and Regression Trees
(Slide 16 of 20)
Table 9.11: Bagging: Generation of 10 New Training Sets and Corresponding
Classification Trees (cont.)
Classification and Regression Trees
(Slide 17 of 20)
Table 9.11: Bagging: Generation of 10 New Training Sets and Corresponding
Classification Trees (cont.)
Classification and Regression Trees
(Slide 18 of 20)
Table 9.12: Classification of 10 Observations from Validation Set with Bagging Ensemble
Age            26   29   30   32   34   37   42   47   48   54   Overall Error Rate
Loan default    1    0    0    0    0    1    0    1    1    0
Tree 1          0    0    0    0    0    1    1    1    1    1    30%
Tree 2          0    0    0    0    0    0    0    0    0    0    40%
Tree 3          0    0    0    0    0    1    1    1    1    1    30%
Tree 4          0    0    0    0    0    1    1    1    1    1    30%
Tree 5          0    0    0    0    0    0    1    1    1    1    40%
Tree 6          1    1    1    1    1    1    1    1    1    0    50%
Tree 7          1    1    1    1    1    1    1    1    1    0    50%
Classification and Regression Trees
(Slide 19 of 20)
Table 9.12: Classification of 10 Observations from Validation Set with Bagging Ensemble (cont.)

Age                26   29   30   32   34   37   42   47   48   54   Overall Error Rate
Loan default        1    0    0    0    0    1    0    1    1    0
Tree 8              1    1    1    1    1    1    1    1    1    0    50%
Tree 9              1    1    1    1    1    1    1    1    1    0    50%
Tree 10             0    0    0    0    0    0    0    0    0    0    40%
Average Vote       0.4  0.4  0.4  0.4  0.4  0.7  0.8  0.8  0.8  0.4
Bagging Ensemble    0    0    0    0    0    1    1    1    1    0    20%
Classification and Regression Trees
(Slide 20 of 20)
Ensemble Methods (cont.):
• For most problems, the predictive accuracy of boosting ensembles exceeds the
predictive performance of bagging ensembles.
• Boosting achieves its performance advantage because:
• It evolves its committee of models by focusing on observations that are
mispredicted.
• The member models’ votes are weighted by their accuracy.
• Boosting is more computationally expensive than bagging.
• There is no adaptive feedback in a bagging approach, so all m training sets and their corresponding models can be constructed simultaneously.
• The random forests approach has performance similar to boosting but maintains the computational simplicity of bagging.
End of Chapter 9