
MODULE 3

PERFORMANCE AND EVALUATION


At the end of the topic, the learner should be able to:
• Learn and understand the different techniques used in evaluating predictive performance
• Identify and understand the differences among these techniques when evaluating the performance of predictive analytics models
Three main types of outcomes of interest are:
• Predicted numerical value: when the outcome variable is
numerical (e.g., house price)
• Predicted class membership: when the outcome variable
is categorical (e.g., buyer/nonbuyer)
• Propensity: the probability of class membership, when the
outcome variable is categorical (e.g., the propensity to
default)
For assessing prediction performance, several measures are used. In all cases the measures are based on the validation set, which serves as a more objective ground than the training set for assessing predictive accuracy.
Naïve Benchmark: The Average
• The benchmark criterion in prediction is using the average
outcome value. In other words, the prediction for a new
record is simply the average across the outcome values of
the records in the training set.
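A minimal sketch of this naïve benchmark, assuming a numeric outcome already split into training and validation sets (the values below are made up for illustration):

```python
import numpy as np

# Hypothetical outcome values (e.g., house prices in $1000s)
train_y = np.array([310, 279, 405, 350, 290], dtype=float)
valid_y = np.array([300, 330, 415], dtype=float)

# The naive benchmark predicts the training-set average for every new record
benchmark_pred = np.full(len(valid_y), train_y.mean())

# Any candidate model should beat this benchmark, e.g., on RMSE
benchmark_rmse = np.sqrt(np.mean((valid_y - benchmark_pred) ** 2))
print(benchmark_rmse)
```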
Prediction Accuracy Measures
The prediction error for record i is defined as the difference between its actual outcome value and its predicted outcome value: ei = yi − ŷi. A few popular numerical measures of predictive accuracy are listed below, followed by a short computational sketch:
• MAE (mean absolute error/deviation). This gives the magnitude of the
average absolute error.
• Mean Error. This measure is similar to MAE except that it retains the sign
of the errors, so that negative errors cancel out positive errors of the
same magnitude.
• MPE (mean percentage error). This gives the percentage score of how
predictions deviate from the actual values (on average), taking into
account the direction of the error.
• MAPE (mean absolute percentage error). This measure gives a
percentage score of how predictions deviate from the actual values.
• RMSE (root mean squared error). This is similar to the standard error of
estimate in linear regression, except that it is computed on the validation
data rather than on the training data.
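A minimal sketch of these five measures, computed on the validation set; the actual and predicted values below are placeholders:

```python
import numpy as np

y_actual = np.array([300, 330, 415, 280, 390], dtype=float)  # validation outcomes
y_pred   = np.array([320, 310, 400, 300, 370], dtype=float)  # model predictions

e = y_actual - y_pred                        # prediction errors e_i = y_i - yhat_i

mae  = np.mean(np.abs(e))                    # mean absolute error/deviation
me   = np.mean(e)                            # mean error (signs retained)
mpe  = 100 * np.mean(e / y_actual)           # mean percentage error
mape = 100 * np.mean(np.abs(e / y_actual))   # mean absolute percentage error
rmse = np.sqrt(np.mean(e ** 2))              # root mean squared error

print(mae, me, mpe, mape, rmse)
```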
• Errors that are based on the training set tell us about model fit, whereas
those that are based on the validation set (called “prediction errors”)
measure the model’s ability to predict new data (predictive performance).
• We expect training errors to be smaller than the validation errors
(because the model was fitted using the training set), and the more
complex the model, the greater the likelihood that it will overfit the
training data (indicated by a greater difference between the training and
validation errors).
• In an extreme case of overfitting, the training errors would be zero (perfect fit of the model to the training data), while the validation errors would be non-zero and non-negligible (see the sketch below).
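The gap can be illustrated with a quick experiment; this is a hedged sketch on synthetic data (scikit-learn assumed, not part of the module), where a fully grown decision tree drives training RMSE to zero while its validation RMSE stays non-negligible:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 1))
y = 2.0 * X.ravel() + rng.normal(0, 3, size=300)     # linear signal plus noise

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.4, random_state=1)

for depth in (2, None):                              # simple tree vs. fully grown tree
    model = DecisionTreeRegressor(max_depth=depth, random_state=1).fit(X_train, y_train)
    rmse_train = np.sqrt(mean_squared_error(y_train, model.predict(X_train)))
    rmse_valid = np.sqrt(mean_squared_error(y_valid, model.predict(X_valid)))
    # The unrestricted tree fits the training data perfectly (rmse_train ~ 0)
    # but its validation error stays noticeably above zero: overfitting.
    print(depth, round(rmse_train, 2), round(rmse_valid, 2))
```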
• A natural criterion for judging the performance of a classifier is
the probability of making a misclassification error.
• Misclassification means that the record belongs to one class
but the model classifies it as a member of a different class.
• A classifier that makes no errors would be perfect, but we do
not expect to be able to construct such classifiers in the real
world due to “noise” and not having all the information needed
to classify records precisely.
Figure 5.2 Cumulative gains chart (a) and decile lift chart (b) for continuous outcome variable (sales of Toyota cars)
• Benchmark: The Naïve Rule
• Class Separation
• The Confusion (Classification) Matrix
• Using the Validation Data
• Accuracy Measures
• Propensities and Cutoff for Classification
• A very simple rule for classifying a record into one of m classes,
ignoring all predictor information (x1, x2, …, xp) that we may have, is
to classify the record as a member of the majority class.
• In other words, “classify as belonging to the most prevalent class.”
• The naïve rule is used mainly as a baseline or benchmark for evaluating the performance of more complicated classifiers (a minimal sketch follows).
• Clearly, a classifier that uses external predictor information (on top of the class membership allocation) should outperform the naïve rule.
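A minimal sketch of the naïve rule as a benchmark, assuming a categorical outcome; the class labels below are illustrative:

```python
import pandas as pd

train_classes = pd.Series(["nonbuyer"] * 70 + ["buyer"] * 30)   # training outcomes
valid_classes = pd.Series(["nonbuyer"] * 35 + ["buyer"] * 15)   # validation outcomes

# The naive rule classifies every record as the most prevalent training class
majority_class = train_classes.mode()[0]

# Benchmark misclassification rate on the validation set
naive_error_rate = (valid_classes != majority_class).mean()
print(majority_class, naive_error_rate)   # 'nonbuyer', 0.30
```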
• If the classes are well separated by the predictor
information, even a small dataset will suffice in finding a
good classifier, whereas if the classes are not separated at
all by the predictors, even a very large dataset will not
help.
Figure 5.3 High (a) and low (b) levels of separation
between two classes, using two predictors
• This matrix summarizes the correct and incorrect
classifications that a classifier produced for a certain
dataset. Rows and columns of the confusion matrix
correspond to the predicted and true (actual) classes,
respectively.
• The confusion matrix gives estimates of the true
classification and misclassification rates.
Table 5.2 Confusion matrix based on 3000 records
and two classes
• Different accuracy measures can be derived from the classification matrix. Consider a two-class case with classes C1 and C2 (e.g., buyer/nonbuyer). The schematic confusion matrix in Table 5.3 uses the notation ni,j to denote the number of records that are class Ci members and were classified as Cj members. Of course, if i ≠ j, these are counts of misclassifications. The total number of records is n = n1,1 + n1,2 + n2,1 + n2,2.
• The main accuracy measure is the estimated misclassification rate, also called the overall error rate. It is given by

err = (n1,2 + n2,1) / n

where n is the total number of cases in the validation dataset. A computational sketch follows the table captions below.


Table 5.3 Confusion matrix: Meaning of Each Cell
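A minimal sketch of building a two-class confusion matrix and computing the overall error rate on a validation set; the actual and predicted labels below are illustrative:

```python
import pandas as pd

actual    = pd.Series(["C1", "C1", "C2", "C2", "C2", "C1", "C2", "C2"])
predicted = pd.Series(["C1", "C2", "C2", "C2", "C1", "C1", "C2", "C1"])

# Cell counts n_{i,j}: class Ci records classified as Cj
cm = pd.crosstab(actual, predicted, rownames=["actual"], colnames=["predicted"])
print(cm)

# Estimated misclassification (overall error) rate: (n_{1,2} + n_{2,1}) / n
n = len(actual)
error_rate = (actual != predicted).sum() / n
accuracy = 1 - error_rate
print(error_rate, accuracy)
```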
• Propensities are typically used either as an interim step for generating predicted class membership (classification), or for rank-ordering the records by their probability of belonging to a class of interest.
• If overall classification accuracy (involving all the classes) is of interest, the record can be assigned to the class with the highest probability. In many applications, however, a single class is of special interest, so we focus on that particular class and compare the propensity of belonging to that class to a cutoff value set by the analyst (a short sketch follows Figure 5.6 below).
• This approach can be used with two classes or more than two
classes, though it may make sense in such cases to
consolidate classes so that you end up with two: the class of
interest and all other classes.
Table 5.4 24 Records with Their Actual Class and the Probability (Propensity) of Them Being Class "owner" Members, as Estimated by a Classifier

Figure 5.6 Classification metrics based on cutoffs of 0.5, 0.25, and 0.75 for the Riding Mower data
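As a rough illustration of how the cutoff changes the resulting labels, here is a minimal sketch; the propensity values are made up (not the Riding Mower data), and "owner" stands in for the class of interest:

```python
import numpy as np

propensity = np.array([0.92, 0.76, 0.55, 0.48, 0.31, 0.12])   # P(record is an "owner")

for cutoff in (0.25, 0.50, 0.75):
    predicted = np.where(propensity >= cutoff, "owner", "nonowner")
    print(cutoff, predicted.tolist())

# Lowering the cutoff labels more records as the class of interest,
# catching more true positives at the cost of more false positives.
```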
Lift Curves
• Lift curves (also called lift charts, gain curves, or gain charts) are used to evaluate the predictive performance of models with categorical outcomes.
• The lift curve helps us determine how effectively we can “skim the cream” by selecting a relatively small number of cases and getting a relatively large portion of the responders (see the sketch after this list).
• It is often the case that the more rare events are the more interesting or
important ones: responders to a mailing, those who commit fraud,
defaulters on debt, and the like.
• This same stratified sampling procedure is sometimes called weighted
sampling or undersampling, the latter referring to the fact that the more
plentiful class is undersampled, relative to the rare class.
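A minimal sketch of the cumulative gains calculation underlying a lift curve: sort validation records by propensity and accumulate the responders; the data below are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "propensity": [0.95, 0.80, 0.72, 0.60, 0.45, 0.30, 0.22, 0.10],
    "actual":     [1,    1,    0,    1,    0,    0,    1,    0   ],   # 1 = responder
})

# Rank records from highest to lowest propensity ("skimming the cream")
df = df.sort_values("propensity", ascending=False).reset_index(drop=True)
df["cases_selected"] = np.arange(1, len(df) + 1)                  # x-axis of the gains chart
df["cum_responders"] = df["actual"].cumsum()                      # y-axis of the gains chart
df["naive_baseline"] = df["cases_selected"] * df["actual"].mean() # random-selection reference

print(df[["cases_selected", "cum_responders", "naive_baseline"]])
# Lift at a given depth = cum_responders / naive_baseline at that row.
```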
Step 1. The response and nonresponse data are separated into two distinct
sets, or strata
Step 2. Records are randomly selected for the training set from each
stratum. Typically, one might select half the (scarce) responders for the
training set, then an equal number of nonresponders.
Step 3. Remaining responders are put in the validation set
Step 4. Nonresponders are randomly selected for the validation set in sufficient numbers to maintain the original ratio of responders to nonresponders
Step 5. If a test is required, it can be taken randomly from the validation set.
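A hedged sketch of these five steps, assuming pandas; the DataFrame, column name, and sampling fractions are illustrative:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame({"response": rng.choice([0, 1], size=1000, p=[0.95, 0.05])})

# Step 1: separate responders and nonresponders into two strata
responders    = df[df["response"] == 1]
nonresponders = df[df["response"] == 0]

# Step 2: half the (scarce) responders plus an equal number of nonresponders
# form the training set
train_resp    = responders.sample(frac=0.5, random_state=7)
train_nonresp = nonresponders.sample(n=len(train_resp), random_state=7)
train = pd.concat([train_resp, train_nonresp])

# Step 3: remaining responders go to the validation set
valid_resp = responders.drop(train_resp.index)

# Step 4: enough remaining nonresponders to restore the original ratio
original_ratio = len(nonresponders) / len(responders)
valid_nonresp = nonresponders.drop(train_nonresp.index).sample(
    n=int(round(original_ratio * len(valid_resp))), random_state=7)
valid = pd.concat([valid_resp, valid_nonresp])

# Step 5: if a test set is required, draw it randomly from the validation set
test = valid.sample(frac=0.2, random_state=7)
print(len(train), len(valid), len(test))
```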
• Shmueli, G., et al. Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner, 2nd ed. John Wiley & Sons, Inc.
• Bruce, P., et al. Data Mining for Business Analytics: Concepts, Techniques and Applications. John Wiley & Sons, Inc., 2020.
