Evaluation

Instructor: Junghye Lee

School of Management Engineering


[email protected]
Evaluation of test set
Generalized Evaluation
Evaluation on “Small” Data
• The holdout method reserves a certain amount for
testing and uses the remainder for training
• Usually, one third for testing, the rest for training
• For small or “unbalanced” datasets, samples might
not be representative
• For instance, few or no instances of some classes
• Stratified sample
• Advanced version of balancing the data
• Make sure that each class is represented with approximately equal proportions in both subsets (see the sketch below)
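A minimal sketch of a stratified holdout split with scikit-learn, assuming a feature matrix X and label vector y (the iris data is just a stand-in, and the one-third test fraction follows the slide):

```python
# Stratified holdout: one third reserved for testing, the rest for training.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # stand-in for your own data

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=1/3,    # one third for testing
    stratify=y,       # keep class proportions roughly equal in both subsets
    random_state=0,
)
```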
Repeated Holdout Method
• Holdout estimate can be made more reliable by
repeating the process with different subsamples
• In each iteration, a certain proportion is randomly
selected for training (possibly with stratification)
• The error rates on the different iterations are averaged
to yield an overall error rate
• This is called the repeated holdout method
• Still not optimal: the different test sets overlap
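A sketch of the repeated holdout idea, assuming a generic scikit-learn classifier; the error rates from several random stratified splits are averaged:

```python
# Repeated holdout: average the error rate over several random stratified splits.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
errors = []
for seed in range(10):                            # 10 different subsamples
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=1/3, stratify=y, random_state=seed)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    errors.append(1 - model.score(X_te, y_te))    # error rate on this split
print("repeated-holdout error estimate:", np.mean(errors))
```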
Cross-Validation
• Avoids overlapping test sets
• First step: data is split into k subsets of equal size
• Second step: each subset in turn is used for testing and
the remainder for training
• This is called k-fold cross-validation
• Often the subsets are stratified before the cross-
validation is performed
• The error estimates are averaged to yield an overall
error estimate
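A minimal k-fold cross-validation sketch with scikit-learn (stratified folds as suggested above; the logistic regression model is only a placeholder):

```python
# k-fold cross-validation: each fold is used once for testing, the rest for training.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print("mean accuracy:", scores.mean(), "-> error estimate:", 1 - scores.mean())
```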
More on Cross-Validation
• Standard method for evaluation
• Stratified ten-fold cross-validation
• Why ten? Extensive experiments have shown that
this is the best choice to get an accurate estimate
• Stratification reduces the estimate’s variance
• Even better: repeated stratified cross-validation
• e.g. ten-fold cross-validation is repeated ten times and
results are averaged (reduces the variance)
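Repeated stratified cross-validation can be sketched the same way with RepeatedStratifiedKFold (10 folds repeated 10 times, as in the example above):

```python
# Ten-fold cross-validation repeated ten times; the 100 results are averaged.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print("mean accuracy over 100 train/test runs:", scores.mean())
```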
Leave-One-Out Cross-Validation
• It is a particular form of cross-validation
• Set number of folds to number of training instances
• i.e., for n training instances, build classifier n times
• Makes best use of the data
• Involves no random subsampling
• Very computationally expensive
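A leave-one-out sketch: with n training instances the classifier is fit n times, so this is only practical for small datasets:

```python
# Leave-one-out cross-validation: n folds for n instances.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=LeaveOneOut())
print("LOOCV accuracy:", scores.mean())   # one held-out prediction per instance
```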
Evaluation criteria
• Predictive accuracy: this refers to the ability of the
model to correctly predict the target of new or
previously unseen data
• Time & Memory: this refers to the computation
costs involved in generating and using the model
• Robustness: this is the ability of the model to make
correct predictions given noisy data or data with
missing values
• Scalability: this refers to the ability to construct the
model efficiently given a large amount of data
Evaluation criteria
• Interpretability: this refers to the level of
understanding and insight that is provided by the
model
• Simplicity:
• decision tree size
• rule compactness

• Domain-dependent quality indicators


Prediction Model

• Regression

• Classification
Evaluation of Prediction Model
• BIAS - The arithmetic mean of the errors
BIAS = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)}{n} = \frac{\sum_{i=1}^{n} error_i}{n}
• n is the number of test samples.

• Mean Absolute Deviation - MAD


MAD = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{n} = \frac{\sum_{i=1}^{n} |error_i|}{n}
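A small NumPy sketch of BIAS and MAD, assuming arrays of test-set targets and predictions (the numbers are illustrative only):

```python
import numpy as np

y_true = np.array([10.0, 12.0, 15.0, 9.0])   # illustrative test targets
y_pred = np.array([11.0, 11.5, 14.0, 10.0])  # illustrative model predictions

error = y_true - y_pred
bias = error.mean()           # BIAS: arithmetic mean of the errors
mad = np.abs(error).mean()    # MAD: mean absolute deviation
print("BIAS:", bias, "MAD:", mad)
```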
Evaluation of Prediction Model
• Mean Square Error – MSE (most popular)
MSE = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n} = \frac{\sum_{i=1}^{n} error_i^2}{n}
• The standard error is the square root of the MSE, also known as the root mean squared error (RMSE)

• Mean Absolute Percentage Error – MAPE

MAPE = \frac{\sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\%}{n}
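The same arrays give MSE, RMSE and MAPE; a sketch in plain NumPy (scikit-learn's mean_squared_error could be used instead):

```python
import numpy as np

y_true = np.array([10.0, 12.0, 15.0, 9.0])
y_pred = np.array([11.0, 11.5, 14.0, 10.0])

error = y_true - y_pred
mse = np.mean(error ** 2)                      # mean square error
rmse = np.sqrt(mse)                            # root mean square error
mape = np.mean(np.abs(error / y_true)) * 100   # mean absolute percentage error, in %
print("MSE:", mse, "RMSE:", rmse, "MAPE:", mape)
```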
Evaluation of Prediction Model
• Root relative squared error - RRSE

RRSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

• In general, the lower the error measure (BIAS, MAD, MSE, MAPE, RRSE), or the higher R² or r, the better the forecasting model
Which measure?
• Best to look at all of them
• Often it doesn’t matter
• Example: see the illustrative sketch below
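As an illustrative stand-in, the sketch below computes all of the measures above for two hypothetical models on the same small test set:

```python
import numpy as np

y_true  = np.array([100.0, 50.0, 80.0, 120.0, 60.0])   # illustrative test targets
model_a = np.array([102.0, 55.0, 78.0, 118.0, 66.0])   # hypothetical model A
model_b = np.array([ 90.0, 50.0, 80.0, 120.0, 60.0])   # hypothetical model B

def report(y, yhat):
    e = y - yhat
    return {
        "BIAS": e.mean(),
        "MAD":  np.abs(e).mean(),
        "MSE":  np.mean(e ** 2),
        "RMSE": np.sqrt(np.mean(e ** 2)),
        "MAPE": np.mean(np.abs(e / y)) * 100,
        "RRSE": np.sqrt(np.sum(e ** 2) / np.sum((y - y.mean()) ** 2)),
    }

print("A:", report(y_true, model_a))
print("B:", report(y_true, model_b))
```

In this made-up case MAD favors model B while MSE favors model A, which is why it is best to look at several measures rather than a single one.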
Prediction Model

• Regression

• Classification
Evaluation of Prediction Model
• Two-class case (yes, no)

• Four different outcomes


• true positive, true negative, false positive, false negative

• We display these outcomes in the following confusion matrix:

                  Predicted: yes     Predicted: no
  Actual: yes     true positive      false negative
  Actual: no      false positive     true negative

Evaluation of Prediction Model
• Accuracy = \frac{TP + TN}{TP + FP + TN + FN}
• Sensitivity = Recall = \frac{TP}{TP + FN}
• Specificity = \frac{TN}{TN + FP}
• Precision = \frac{TP}{TP + FP}
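A two-class sketch with scikit-learn, assuming y_true holds the actual labels and y_pred the model's predictions (1 = yes, 0 = no; the labels are made up for illustration):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])  # actual classes
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])  # predicted classes

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy    = (tp + tn) / (tp + fp + tn + fn)
sensitivity = tp / (tp + fn)        # = recall
specificity = tn / (tn + fp)
precision   = tp / (tp + fp)
print(accuracy, sensitivity, specificity, precision)
```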
AUC
• Stands for "Area Under the Receiver Operating Characteristic curve"
• It shows the trade-off between true positives and false positives (the ROC curve originates from signal detection over noisy channels)
• The ROC curve plots sensitivity against (1 − specificity)
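A sketch of computing AUC from predicted probabilities with scikit-learn; roc_curve returns the (1 − specificity, sensitivity) pairs that make up the ROC curve (scores are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = np.array([1, 1, 0, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.1])  # predicted P(yes)

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # fpr = 1 - specificity, tpr = sensitivity
print("AUC:", roc_auc_score(y_true, y_score))
```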
F-Measure
• It captures the trade-off between precision and recall in a single number

F_\beta = \frac{1}{\frac{\beta^2}{\beta^2 + 1} \cdot \frac{1}{Recall} + \frac{1}{\beta^2 + 1} \cdot \frac{1}{Precision}} = \frac{(\beta^2 + 1) \, P \cdot R}{\beta^2 P + R}

• The most popular measure is


F_1 = \frac{2PR}{P + R}
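A sketch with scikit-learn's f1_score and fbeta_score, reusing the hypothetical labels from the confusion-matrix example:

```python
import numpy as np
from sklearn.metrics import f1_score, fbeta_score, precision_score, recall_score

y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

P = precision_score(y_true, y_pred)
R = recall_score(y_true, y_pred)
print("F1 (direct):", f1_score(y_true, y_pred))
print("F1 (from P and R):", 2 * P * R / (P + R))
print("F2 (recall weighted more heavily):", fbeta_score(y_true, y_pred, beta=2))
```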
Cross-validation and AUC, F1
• Collect probabilities for instances in the test folds
• Sort instances according to these probabilities
• Generate an AUC or an F1 for each fold
• Average them
• Generate an AUC or an F1 for each repetition
• Average them
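A sketch of per-fold AUCs averaged over a repeated stratified cross-validation, using scoring="roc_auc" (a binary problem is assumed; the breast-cancer data and scaled logistic regression are just stand-ins):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=0)

# One AUC per test fold (computed from the fold's predicted probabilities);
# the mean over folds and repetitions is the reported estimate.
aucs = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print("mean AUC over folds and repetitions:", aucs.mean())
```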
