This document discusses techniques for evaluating predictive performance in predictive analytics. It describes three main types of predictive outcomes and several measures for assessing prediction accuracy based on a validation data set. These include measures like mean absolute error, mean percentage error, and root mean squared error. The document also discusses classification accuracy measures like misclassification rate that can be derived from a confusion matrix. It describes how propensity scores and cutoffs can be used to classify cases or rank them by probability of class membership. Benchmarking predictions against a naïve average model is also covered.
Performance Evaluation
MODULE 3
PERFORMANCE EVALUATION
Learning Objectives
At the end of the topic, the learner should be able to:
• Learn and understand the different techniques used in evaluating predictive performance
• Identify and understand the differences among these techniques for evaluating the performance of predictive analytics models

Three main types of outcomes of interest are:
• Predicted numerical value: when the outcome variable is numerical (e.g., house price)
• Predicted class membership: when the outcome variable is categorical (e.g., buyer/nonbuyer)
• Propensity: the probability of class membership, when the outcome variable is categorical (e.g., the propensity to default)

For assessing prediction performance, several measures are used. In all cases the measures are based on the validation set, which serves as a more objective ground than the training set for assessing predictive accuracy.

Naïve Benchmark: The Average
• The benchmark criterion in prediction is using the average outcome value. In other words, the prediction for a new record is simply the average across the outcome values of the records in the training set.

Prediction Accuracy Measures
The prediction error for record i is defined as the difference between its actual outcome value and its predicted outcome value: ei = yi − ŷi. A few popular numerical measures of predictive accuracy are listed below (a short computational sketch of these measures appears below):
• MAE (mean absolute error/deviation). This gives the magnitude of the average absolute error.
• Mean Error. This measure is similar to MAE except that it retains the sign of the errors, so that negative errors cancel out positive errors of the same magnitude.
• MPE (mean percentage error). This gives a percentage score of how predictions deviate from the actual values (on average), taking into account the direction of the error.
• MAPE (mean absolute percentage error). This measure gives a percentage score of how predictions deviate from the actual values, regardless of the direction of the error.
• RMSE (root mean squared error). This is similar to the standard error of estimate in linear regression, except that it is computed on the validation data rather than on the training data.
• Errors that are based on the training set tell us about model fit, whereas those that are based on the validation set (called “prediction errors”) measure the model’s ability to predict new data (predictive performance).
• We expect training errors to be smaller than the validation errors (because the model was fitted using the training set), and the more complex the model, the greater the likelihood that it will overfit the training data (indicated by a greater difference between the training and validation errors).
• In an extreme case of overfitting, the training errors would be zero (a perfect fit of the model to the training data), while the validation errors would be non-zero and non-negligible.
• A natural criterion for judging the performance of a classifier is the probability of making a misclassification error.
• Misclassification means that the record belongs to one class but the model classifies it as a member of a different class.
• A classifier that makes no errors would be perfect, but we do not expect to be able to construct such classifiers in the real world, due to “noise” and to not having all the information needed to classify records precisely.
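As a minimal illustration (not part of the original module), the following Python sketch computes the prediction accuracy measures listed above on a handful of made-up validation values and compares a fitted model's predictions against the naïve average benchmark; the function name, the numbers, and the assumed training-set mean are all illustrative.

```python
import numpy as np

def prediction_accuracy(y_actual, y_pred):
    """Compute common prediction accuracy measures on a validation set."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_actual - y_pred                              # e_i = y_i - yhat_i

    return {
        "ME":   errors.mean(),                              # mean error (signed)
        "MAE":  np.abs(errors).mean(),                      # mean absolute error
        "MPE":  100 * (errors / y_actual).mean(),           # mean percentage error
        "MAPE": 100 * np.abs(errors / y_actual).mean(),     # mean absolute percentage error
        "RMSE": np.sqrt((errors ** 2).mean()),              # root mean squared error
    }

# Hypothetical validation outcomes and model predictions (illustrative values)
y_valid = [230, 180, 310, 250, 199]
y_hat   = [215, 190, 290, 260, 210]

# Naïve benchmark: predict the training-set average for every validation record
y_train_mean = 240.0                     # assumed average of the training outcomes
naive_pred = [y_train_mean] * len(y_valid)

print("Model:          ", prediction_accuracy(y_valid, y_hat))
print("Naive benchmark:", prediction_accuracy(y_valid, naive_pred))
```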
Figure 5.2 Cumulative gains chart (a) and decile lift chart (b) for a continuous outcome variable (sales of Toyota cars)

Judging Classifier Performance
• Benchmark: The Naïve Rule
• Class Separation
• The Confusion (Classification) Matrix
• Using the Validation Data
• Accuracy Measures
• Propensities and Cutoff for Classification

Benchmark: The Naïve Rule
• A very simple rule for classifying a record into one of m classes, ignoring all predictor information (x1, x2, …, xp) that we may have, is to classify the record as a member of the majority class.
• In other words, “classify as belonging to the most prevalent class.”
• The naïve rule is used mainly as a baseline or benchmark for evaluating the performance of more complicated classifiers.
• Clearly, a classifier that uses external predictor information (on top of the class membership allocation) should outperform the naïve rule.

Class Separation
• If the classes are well separated by the predictor information, even a small dataset will suffice in finding a good classifier, whereas if the classes are not separated at all by the predictors, even a very large dataset will not help.

Figure 5.3 High (a) and low (b) levels of separation between two classes, using two predictors

The Confusion (Classification) Matrix
• This matrix summarizes the correct and incorrect classifications that a classifier produced for a certain dataset. Rows and columns of the confusion matrix correspond to the predicted and true (actual) classes, respectively.
• The confusion matrix gives estimates of the true classification and misclassification rates.

Table 5.2 Confusion matrix based on 3000 records and two classes

Accuracy Measures
• Different accuracy measures can be derived from the confusion matrix. Consider a two-class case with classes C1 and C2 (e.g., buyer/nonbuyer). The schematic confusion matrix in Table 5.3 uses the notation ni,j to denote the number of records that are class Ci members and were classified as Cj members. Of course, if i ≠ j, these are counts of misclassifications. The total number of records is n = n1,1 + n1,2 + n2,1 + n2,2.
• The main accuracy measure is the estimated misclassification rate, also called the overall error rate. It is given by
err = (n1,2 + n2,1) / n, where n is the total number of cases in the validation dataset.
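To make the notation concrete, here is a minimal sketch (using hypothetical actual and predicted labels, not the 3000 records of Table 5.2) that tabulates a two-class confusion matrix and computes the overall error rate; pandas is used only for the cross-tabulation.

```python
import pandas as pd

# Hypothetical actual and predicted class labels for a validation set
actual    = ["C1", "C1", "C2", "C2", "C1", "C2", "C2", "C1", "C2", "C2"]
predicted = ["C1", "C2", "C2", "C2", "C1", "C1", "C2", "C1", "C2", "C1"]

# Confusion matrix: rows = predicted class, columns = actual class
# (matching the layout described above)
confusion = pd.crosstab(
    pd.Series(predicted, name="Predicted"),
    pd.Series(actual, name="Actual"),
)
print(confusion)

# Overall error rate: err = (n1,2 + n2,1) / n
n = len(actual)
misclassified = sum(a != p for a, p in zip(actual, predicted))
error_rate = misclassified / n
accuracy = 1 - error_rate
print(f"overall error rate = {error_rate:.2f}, accuracy = {accuracy:.2f}")
```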
Table 5.3 Confusion matrix: meaning of each cell

Propensities and Cutoff for Classification
• Propensities are typically used either as an interim step for generating predicted class membership (classification), or for rank-ordering the records by their probability of belonging to a class of interest.
• If overall classification accuracy (involving all the classes) is of interest, the record can be assigned to the class with the highest probability.
• In many cases, a single class is of special interest, so we will focus on that particular class and compare the propensity of belonging to that class to a cutoff value set by the analyst.
• This approach can be used with two classes or with more than two classes, though in the latter case it may make sense to consolidate classes so that you end up with two: the class of interest and all other classes. (A minimal cutoff-classification sketch follows Table 5.4.)

Table 5.4 24 records with their actual class and the probability (propensity) of them being class “owner” members, as estimated by a classifier
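The following minimal sketch (with made-up propensities rather than the 24 records of Table 5.4) shows how a cutoff turns propensities into class assignments, and how accuracy changes as the cutoff moves; the class labels "owner"/"nonowner" follow the Riding Mower example, while the numbers are illustrative.

```python
import numpy as np

# Hypothetical propensities (estimated probabilities of belonging to the class
# of interest, "owner"); these values are illustrative, not those in Table 5.4.
propensity = np.array([0.98, 0.85, 0.61, 0.52, 0.47, 0.33, 0.25, 0.04])
actual     = np.array(["owner", "owner", "owner", "nonowner",
                       "owner", "nonowner", "nonowner", "nonowner"])

def classify(propensity, cutoff=0.5):
    """Assign the class of interest when the propensity meets the cutoff."""
    return np.where(propensity >= cutoff, "owner", "nonowner")

# Lowering the cutoff catches more members of the class of interest
# at the price of more false positives, and vice versa.
for cutoff in (0.25, 0.5, 0.75):
    predicted = classify(propensity, cutoff)
    accuracy = (predicted == actual).mean()
    print(f"cutoff={cutoff:.2f}  predicted={list(predicted)}  accuracy={accuracy:.2f}")
```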
Figure 5.6 Classification metrics based on cutoffs of 0.5, 0.25, and 0.75 for the Riding Mower data

Lift Curves
• Lift curves (also called lift charts, gain curves, or gain charts) are used with models involving categorical outcomes.
• The lift curve helps us determine how effectively we can “skim the cream” by selecting a relatively small number of cases and getting a relatively large portion of the responders.
• It is often the case that the rarer events are the more interesting or important ones: responders to a mailing, those who commit fraud, defaulters on debt, and the like.

Oversampling
• When the class of interest is rare, the training set is often enriched with extra records from that class. This stratified sampling procedure is sometimes called weighted sampling or undersampling, the latter referring to the fact that the more plentiful class is undersampled, relative to the rare class. (A minimal sketch of the procedure appears at the end of the module, after the references.)
Step 1. The response and nonresponse data are separated into two distinct sets, or strata.
Step 2. Records are randomly selected for the training set from each stratum. Typically, one might select half the (scarce) responders for the training set, then an equal number of nonresponders.
Step 3. The remaining responders are put in the validation set.
Step 4. Nonresponders are randomly selected for the validation set in sufficient numbers to maintain the original ratio of responders to nonresponders.
Step 5. If a test set is required, it can be taken randomly from the validation set.

References
• Shmueli, G., et al. Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner, 2nd ed. John Wiley & Sons, Inc.
• Bruce, P., et al. Data Mining for Business Analytics: Concepts, Techniques and Applications. John Wiley & Sons, Inc., 2020.
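As a closing sketch (not part of the module itself), the function below implements Steps 1–4 of the stratified sampling procedure above, assuming a pandas DataFrame with a binary response column; the column name, the 50% responder fraction, and the commented usage lines are illustrative assumptions.

```python
import pandas as pd

def stratified_split(df, response_col="response", rare_value=1,
                     frac_rare_train=0.5, seed=1):
    """Split data into training and validation sets, oversampling the rare
    (responder) class in the training set, following Steps 1-4 above."""
    # Step 1: separate responders and nonresponders into two strata
    rare = df[df[response_col] == rare_value]
    common = df[df[response_col] != rare_value]

    # Step 2: take half of the (scarce) responders for training,
    # plus an equal number of nonresponders
    rare_train = rare.sample(frac=frac_rare_train, random_state=seed)
    common_train = common.sample(n=len(rare_train), random_state=seed)
    train = pd.concat([rare_train, common_train])

    # Step 3: remaining responders go to the validation set
    rare_valid = rare.drop(rare_train.index)

    # Step 4: add nonresponders so the validation set keeps the original
    # ratio of responders to nonresponders
    original_ratio = len(common) / len(rare)
    n_common_valid = min(int(round(original_ratio * len(rare_valid))),
                         len(common) - len(common_train))
    common_valid = common.drop(common_train.index).sample(n=n_common_valid,
                                                          random_state=seed)
    valid = pd.concat([rare_valid, common_valid])

    # Step 5 (drawing a test set out of the validation set) is omitted here.
    return train, valid

# Example usage (hypothetical file and column name):
# df = pd.read_csv("mailing.csv")
# train, valid = stratified_split(df, response_col="response")
```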