
Evaluation

Wei Li

Fundamentals of Machine Learning for Predictive Data Analytics


Chapter 8: Evaluation
Sections 8.1, 8.3, 8.4.1, 8.4.2
Content
• Big idea
• Hold-out test set
• Misclassification accuracy and confusion matrix
• Precision, recall and F1 measure
• More test set selection approaches
Big idea
The purpose of evaluation is threefold:

1. To determine which model is the most suitable for a task.


2. To estimate how the model will perform.
3. To convince users that the model will meet their needs.

Training
Data Test
Data
Model
Standard Approach: Measuring
Misclassification Rate on a
Hold-out Test Set
Misclassification Rate on a Hold-out Test Set

Figure: The process of building and evaluating a model using a


hold-out test set.
Figure: Hold-out sampling can divide the full data into training, validation, and test sets, e.g. (a) a 50:20:30 split or (b) a 40:20:40 split.
Figure: Using a validation set to avoid overfitting in iterative machine learning algorithms (misclassification rate on the training set and on the validation set, plotted against training iteration).
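As a rough illustration of hold-out sampling, here is a minimal Python sketch (assuming NumPy; the function name, the 50:20:30 proportions, and the dataset size are illustrative, not from the chapter):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def holdout_split(n_instances, fractions=(0.5, 0.2, 0.3)):
    """Shuffle the instance indices and slice them into training, validation, and test sets."""
    indices = rng.permutation(n_instances)
    n_train = int(fractions[0] * n_instances)
    n_valid = int(fractions[1] * n_instances)
    train_idx = indices[:n_train]
    valid_idx = indices[n_train:n_train + n_valid]
    test_idx = indices[n_train + n_valid:]
    return train_idx, valid_idx, test_idx

train_idx, valid_idx, test_idx = holdout_split(1000)
print(len(train_idx), len(valid_idx), len(test_idx))  # 500 200 300
```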
Misclassification accuracy and
confusion matrix
Table: A sample test set with model predictions of email (ham/spam).
ID Target Pred. Outcome ID Target Pred. Outcome
1 spam ham FN 11 ham ham TN
2 spam ham FN 12 spam ham FN
3 ham ham TN 13 ham ham TN
4 spam spam TP 14 ham ham TN
5 ham ham TN 15 ham ham TN
6 spam spam TP 16 ham ham TN
7 ham ham TN 17 ham spam FP
8 spam spam TP 18 spam spam TP
9 spam spam TP 19 ham ham TN
10 spam spam TP 20 ham spam FP

misclassification rate = (number of incorrect predictions) / (total predictions)

misclassification rate = 5 / 20 = 0.25



In this example spam is the positive class and ham is the negative class.
For binary prediction problems there are 4 possible outcomes:
1. True Positive (TP): Target = positive AND Pred = positive
2. True Negative (TN): Target = negative AND Pred = negative
3. False Positive (FP): Target = negative AND Pred = positive
4. False Negative (FN): Target = positive AND Pred = negative

Table: The structure of a confusion matrix.

                        Prediction
                    positive   negative
Target   positive      TP         FN
         negative      FP         TN

Error:

misclassification rate = (FP + FN) / (TP + TN + FP + FN)          (2)

misclassification rate = (2 + 3) / (6 + 9 + 2 + 3) = 0.25

Accuracy:

classification accuracy = (TP + TN) / (TP + TN + FP + FN)          (3)

classification accuracy = (6 + 9) / (6 + 9 + 2 + 3) = 0.75
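As a concrete check of the formulas above, here is a minimal Python sketch (not from the chapter) that counts the confusion-matrix cells for the 20-instance spam/ham test set and reproduces the misclassification rate and classification accuracy:

```python
targets = ["spam", "spam", "ham", "spam", "ham", "spam", "ham", "spam", "spam", "spam",
           "ham", "spam", "ham", "ham", "ham", "ham", "ham", "spam", "ham", "ham"]
preds   = ["ham",  "ham",  "ham", "spam", "ham", "spam", "ham", "spam", "spam", "spam",
           "ham",  "ham",  "ham", "ham",  "ham", "ham",  "spam", "spam", "ham", "spam"]

# spam is treated as the positive class and ham as the negative class
tp = sum(t == "spam" and p == "spam" for t, p in zip(targets, preds))  # 6
tn = sum(t == "ham"  and p == "ham"  for t, p in zip(targets, preds))  # 9
fp = sum(t == "ham"  and p == "spam" for t, p in zip(targets, preds))  # 2
fn = sum(t == "spam" and p == "ham"  for t, p in zip(targets, preds))  # 3

misclassification_rate  = (fp + fn) / (tp + tn + fp + fn)   # 5 / 20 = 0.25
classification_accuracy = (tp + tn) / (tp + tn + fp + fn)   # 15 / 20 = 0.75
print(tp, tn, fp, fn, misclassification_rate, classification_accuracy)
```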
Performance Measures for
Categorical Targets

Precision, Recall and F1 measure


Accuracy Metrics
• Widely used metrics:
  – Precision: number of correct predictions for a class / number of predictions of that class
  – Recall: number of correct predictions for a class / size of the reference data for that class
  – F1: the harmonic mean of precision and recall
Precision, recall and F1 measure
• For the positive class:

precision = TP / (TP + FP)

recall = TP / (TP + FN)

For the sample test set:

precision = 6 / (6 + 2) = 0.75

recall = 6 / (6 + 3) = 0.667
• Recall and precision are often in tension with each other:

precision = TP / (TP + FP)

recall = TP / (TP + FN)

• Extreme cases:
  – FP = 0: Precision = 100%, Recall = 10%
  – FN = 0: Precision = 60%, Recall = 100%

The F1 measure balances the two as their harmonic mean:

F1-measure = 2 × (precision × recall) / (precision + recall)

For the extreme cases above:
  – Precision = 100%, Recall = 10%: F1-measure = 18%
  – Precision = 60%, Recall = 100%: F1-measure = 75%

For the spam example:

precision = 6 / (6 + 2) = 0.75

recall = 6 / (6 + 3) = 0.667

F1-measure = 2 × (0.75 × 0.667) / (0.75 + 0.667) = 0.71
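A minimal Python sketch of the same calculations, reusing the counts TP = 6, FP = 2, FN = 3 from the sample test set (illustrative code, not from the chapter):

```python
tp, fp, fn = 6, 2, 3  # counts for the positive (spam) class in the sample test set

precision = tp / (tp + fp)                             # 6 / 8  = 0.75
recall    = tp / (tp + fn)                             # 6 / 9  = 0.667
f1 = 2 * (precision * recall) / (precision + recall)   # about 0.71
print(round(precision, 3), round(recall, 3), round(f1, 2))
```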
Making the most of the data

• Generally, the larger the training data, the better the classifier.
• The larger the test data, the more accurate the error estimate.
• What can we do to make our evaluation more reliable?
Stratification
• The hold-out method reserves a certain amount of data for testing and uses the remainder for training.
• For an unbalanced dataset, the samples might not be representative:
  – Few or no instances of some classes.
• Stratified sampling: an advanced way of balancing the data.
  – Make sure that each class is represented with approximately equal proportions in both subsets (a minimal sketch follows this list).
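A minimal Python sketch of a stratified hold-out split (assuming NumPy; the function name, the 30:70 toy class balance, and the 70:30 train/test proportions are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def stratified_holdout(labels, test_fraction=0.3):
    """Split instance indices so each class keeps roughly the same proportion in both subsets."""
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        cls_idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(test_fraction * len(cls_idx)))
        test_idx.extend(cls_idx[:n_test])
        train_idx.extend(cls_idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

labels = ["spam"] * 30 + ["ham"] * 70   # an unbalanced toy dataset
train_idx, test_idx = stratified_holdout(labels)
print(len(train_idx), len(test_idx))    # 70 30, with about 30% spam in each subset
```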
Repeated Hold-out Method
• The hold-out estimate can be made more reliable by repeating the process with different subsamples:
  – In each iteration, a certain proportion is randomly selected for training (possibly with stratification).
  – The error rates from the different iterations are averaged to yield an overall error rate.
• This is called the repeated hold-out method (a minimal sketch follows this list).
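A minimal Python sketch of the repeated hold-out method, building on the stratified split above; train_and_evaluate is a hypothetical callback that trains a model on the training indices and returns its misclassification rate on the test indices:

```python
import numpy as np

def repeated_holdout(labels, train_and_evaluate, n_repeats=10, test_fraction=0.3):
    """Average the error rate over several random (stratified) hold-out splits."""
    error_rates = []
    for _ in range(n_repeats):
        # stratified_holdout is the sketch from the Stratification section above
        train_idx, test_idx = stratified_holdout(labels, test_fraction)
        error_rates.append(train_and_evaluate(train_idx, test_idx))
    return np.mean(error_rates)
```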
Repeated Hold-out Method (Cont.)
• Still not optimal: the different test sets overlap, but we would like every instance in the data to be tested at least once.

• Can we prevent overlapping?


K-Fold Cross Validation
• K-fold cross-validation avoids
overlapping test sets:
– First step: data is split into k subsets of
equal size;
– Second step: each subset in turn is used
for testing and the remainder for
training.
• The subsets are stratified before the cross-validation is performed.
• The k estimates are averaged to yield an overall estimate (a minimal sketch follows the figure below).

Figure: In each of the k iterations, one fold is used as the test set and the remaining folds are used for training.
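A minimal Python sketch of (non-stratified) k-fold cross-validation; as before, train_and_evaluate is a hypothetical callback, and the fold count and seed are illustrative:

```python
import numpy as np

def kfold_cross_validation(n_instances, train_and_evaluate, k=10, seed=42):
    """Split the instances into k folds; each fold is used once for testing, the rest for training."""
    indices = np.random.default_rng(seed).permutation(n_instances)
    folds = np.array_split(indices, k)
    error_rates = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        error_rates.append(train_and_evaluate(train_idx, test_idx))
    return np.mean(error_rates)  # the overall (averaged) error estimate
```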


More on Cross-Validation
• Standard method for evaluation: stratified 10-fold
cross-validation.
• Why 10?
– Extensive experiments have shown that this is the best
choice to get an accurate estimate.
• Stratification reduces the estimates’ variance.
• Even better: repeated stratified cross-validation.
  – E.g. 10-fold cross-validation is repeated 10 times and the results are averaged (further reducing the variance).
Leave-One-Out Cross-Validation
• Leave-One-Out is a particular form of cross-validation:
  – Set the number of folds to the number of training instances.
• For n training instances, the classifier is built n times.
• Makes the best use of the data.
• Involves no random sub-sampling.
• Very computationally expensive (see the fragment after this list).
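Because leave-one-out is just k-fold cross-validation with k equal to the number of instances, the k-fold sketch above can be reused directly; this fragment assumes the hypothetical train_and_evaluate callback and the kfold_cross_validation function from the earlier sketches:

```python
# Leave-one-out: every instance is the test set exactly once, so the model is built n times.
n = 100  # number of training instances (illustrative)
loo_error = kfold_cross_validation(n_instances=n, train_and_evaluate=train_and_evaluate, k=n)
```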
Leave-One-Out Cross-Validation and Stratification
• A disadvantage of Leave-One-Out Cross-Validation is that stratification is not possible:
  – It guarantees a non-stratified sample because there is only one instance in the test set!
