Machine Learning Evaluation Metrics Lecturer
University of Bergamo
Performance metrics
Outline
1. Metrics
4. Worked example
Metrics
It is extremely important to use quantitative metrics for evaluating a machine learning
model
Until now, we relied on the cost function value for regression and classification
Other metrics can be used to better evaluate and understand the model
For classification
Accuracy/Precision/Recall/F1-score, ROC curves,…
For regression
Normalized RMSE, Normalized Mean Absolute Error (NMAE),…
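Although the worked examples later in the lecture use MATLAB, a minimal Python sketch of the two regression metrics mentioned above may help; it assumes the common convention of normalizing by the range of the true values (normalizing by their mean is another option):

```python
import math

def nrmse(y_true, y_pred):
    """Root Mean Squared Error, normalized by the range of the true values."""
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return rmse / (max(y_true) - min(y_true))

def nmae(y_true, y_pred):
    """Mean Absolute Error, normalized by the range of the true values."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return mae / (max(y_true) - min(y_true))

# Made-up regression targets and predictions, purely for illustration
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(nrmse(y_true, y_pred), nmae(y_true, y_pred))
```

Normalizing makes the error scale-free, so models trained on targets with different units can be compared.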
Accuracy
Accuracy is a measure of how close the predictions of our model are to the true values.
If a classifier makes 10 predictions and 9 of them are correct, the accuracy is 90%.
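As a minimal sketch of this computation (in Python, with made-up labels reproducing the 9-out-of-10 example):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# 10 predictions, 9 of them correct -> accuracy 0.9
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]  # only the last prediction is wrong
print(accuracy(y_true, y_pred))  # 0.9
```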
Classification case: metrics for skewed classes
Disease dichotomic classification example
Suppose we find a small error on the test set (i.e., mostly correct diagnoses).
However, if we use a classifier that always assigns the observations to the majority class, we can still obtain a very high accuracy!
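A quick Python sketch of this trap, with an assumed disease prevalence of 0.5% (the numbers are purely illustrative, not taken from the slide):

```python
# Illustrative skewed dataset: 995 healthy (0) and 5 diseased (1) patients
y_true = [0] * 995 + [1] * 5

# A "dumb" classifier that always predicts the majority class (healthy)
y_pred = [0] * 1000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.995 -- high accuracy, yet every diseased patient is missed
```

This is exactly why, with skewed classes, accuracy alone is a misleading metric.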
Precision and recall
Suppose we are in the presence of a rare class that we want to detect (the positive class)
                       Estimated class
                    1 (p)              0 (n)
Actual  1 (p)   true positives    false negatives
class   0 (n)   false positives   true negatives

Precision = TP / (TP + FP), Recall = TP / (TP + FN)
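A minimal Python sketch of the two metrics, using the standard definitions Precision = TP/(TP+FP) and Recall = TP/(TP+FN); the labels below are made up:

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP/(TP+FP); Recall = TP/(TP+FN) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 3 rare-class positives; the model finds 2 of them and raises 1 false alarm
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(precision_recall(y_true, y_pred))
```

Unlike accuracy, both numbers stay low when the rare positive class is handled poorly.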
F1-score
It is usually better to compare models by means of one number only. The F1-score, the harmonic mean of precision and recall, F1 = 2·P·R / (P + R), can be used to combine precision and recall
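A small sketch of the combination, assuming the standard definition of the F1-score as the harmonic mean of precision and recall:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# The harmonic mean is dominated by the smaller of the two values:
# a model with perfect recall but 50% precision gets F1 = 2/3, not 0.75
print(f1_score(0.5, 1.0))
```

Because the harmonic mean penalizes imbalance, a model cannot score a high F1 by maximizing only one of the two metrics.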
Summaries of the confusion matrix
Different metrics can be computed from the confusion matrix, depending on the class of
interest (https://en.wikipedia.org/wiki/Precision_and_recall)
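As an illustration, every such summary can be derived from the four counts of the confusion matrix; a self-contained Python sketch (with made-up labels):

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) counts for a binary problem."""
    c = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    tp = c[(True, True)]
    fp = c[(False, True)]
    fn = c[(True, False)]
    tn = c[(False, False)]
    return tp, fp, fn, tn

y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)

precision   = tp / (tp + fp)
recall      = tp / (tp + fn)  # a.k.a. sensitivity, true positive rate
specificity = tn / (tn + fp)  # true negative rate
print(tp, fp, fn, tn)
```

Which summary matters depends on the class of interest, exactly as the Wikipedia page linked above illustrates.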
Ranking instead of classifying
Classifiers such as logistic regression can output a probability of belonging to a class (or
something similar)
We can use this to rank the different instances and take actions on the cases at the top of
the list
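A minimal Python sketch of this idea, with hypothetical instance names and scores:

```python
# Hypothetical scores from a probabilistic classifier (e.g., logistic regression)
instances = [("case A", 0.23), ("case B", 0.91), ("case C", 0.55), ("case D", 0.78)]

# Rank instances by decreasing score and act on the top of the list
ranked = sorted(instances, key=lambda pair: pair[1], reverse=True)
top_2 = [name for name, score in ranked[:2]]
print(top_2)  # ['case B', 'case D']
```

Ranking defers the choice of a decision threshold: we simply act on as many top cases as our budget allows.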
Ranking instead of classifying
[Figure: a list of instances ranked by decreasing classifier score (0.99, 0.85, 0.80, 0.70, …) with their true class Y/N, together with the confusion-matrix counts (p/n) obtained by placing the decision threshold at different points of the list]
Ranking instead of classifying
ROC curves are a very general way to represent and compare the performance of
different models (on a binary classification task)
[Figure: ROC curve with Recall (True Positive Rate) on the vertical axis; the top-left corner is perfection, the origin corresponds to a model that classifies always negative, and the observed models are plotted as points]
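A ROC curve is traced by sweeping the decision threshold over the scores; a self-contained Python sketch (made-up labels and scores, AUC by the trapezoidal rule):

```python
def roc_points(y_true, scores):
    """Sweep the threshold over the scores and collect (FPR, TPR) points."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= thr)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# A classifier whose scores separate the two classes perfectly -> AUC = 1
y_true = [1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.1]
print(auc(roc_points(y_true, scores)))  # 1.0
```

An AUC of 0.5 corresponds to random guessing, while 1.0 corresponds to the "perfection" corner of the figure.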
Breast cancer detection
Breast cancer is the most common cancer amongst women in the world.
It accounts for 25% of all cancer cases, and affected over 2.1 million people in 2015
alone.
It starts when cells in the breast begin to grow out of control. These cells usually
form tumors that can be seen via X-ray or felt as lumps in the breast area.
The key challenge in its detection is how to classify tumors into malignant
(cancerous) or benign (non-cancerous).
Goal: classifying these tumors using machine learning and the Breast Cancer
Wisconsin (Diagnostic) Dataset.
Breast cancer Wisconsin dataset
This dataset is taken from Kaggle.
Output: Class 4 stands for malignant cancer; Class 2 stands for benign cancer.
id_num | Clump Thickness | Uniformity of Cell Size | Uniformity of Cell Shape | Marginal Adhesion | Single Epithelial Cell Size | Bare Nuclei | Bland Chromatin | Normal Nucleoli | Mitoses | Class
1041801 5 3 3 3 2 3 4 4 1 4
1043999 1 1 1 1 2 3 3 1 1 2
1044572 8 7 5 10 7 9 5 5 4 4
1047630 7 4 6 4 6 1 4 3 1 4
1048672 4 1 1 1 2 1 2 1 1 2
1049815 4 1 1 1 2 1 3 1 1 2
1050670 10 7 7 6 4 10 4 1 2 4
Breast cancer detection
We will use the dataset to compare different logistic regression models by means
of the ROC curve associated with each of them.
To this aim we will work with 4 different datasets (plus an extra one)
Extra: after learning the model of CASE 1, take only the features with the smallest
p-value.
Class 4 stands for malignant cancer and it is for us the positive output: we set it to 1.
Class 2 stands for benign cancer and it is for us the negative output: we set it to 0.

%% Load and clean data
Phi = table2array(data(:,1:end-1));
y = table2array(data(:,end));
y(y==4) = 1; % in the original data 4 stands for malignant cancer
y(y==2) = 0; % in the original data 2 stands for benign cancer
% Set up the data matrix appropriately, and add ones for the intercept term
[N, d] = size(Phi);
Phi = [ones(N, 1) Phi]; % Add intercept term
%% Train and test data
[X,Y,T,AUC] = perfcurve(y, scores, 1);
Results
Results
Comparison of case 1, case 4 and the best model
Pneumonia detection
Suppose we have at our disposal X-ray images of lungs: healthy people and COVID-19
patients
Acknowledgments
The COVID-19 X-ray image dataset is curated by Dr. Joseph Cohen, a postdoctoral fellow at
the University of Montreal, see https://josephpcohen.com/w/public-covid19-dataset/
The previous data contain only X-ray images of people with a disease. To collect
images of healthy people, we can download another X-ray dataset on the platform
Kaggle https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
Pneumonia detection
We want to use a classifier to perform classification:
Healthy patients: class
Patients with a disease: class
For these computer vision tasks, the state-of-the-art algorithms are Convolutional
Neural Networks:
we can use them to classify the images into healthy and diseased
Pneumonia detection
[Figure: example chest X-rays with their true label and the estimated covid / healthy label]
Pneumonia detection
Classification results on test set
[Figure: confusion matrix on the test set, actual class vs. estimated class, 1 (p) / 0 (n)]
Sensitivity (recall, true positive rate) = TP / (TP + FN)
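With hypothetical test-set counts (the numbers below are invented for illustration; only the 100% specificity reflects the result reported in these slides), sensitivity and specificity are computed as:

```python
# Hypothetical test-set counts, purely for illustration
tp, fn = 18, 2   # COVID-19 patients: correctly / wrongly classified
tn, fp = 40, 0   # healthy patients:  correctly / wrongly classified

sensitivity = tp / (tp + fn)  # recall, true positive rate
specificity = tn / (tn + fp)  # true negative rate
print(sensitivity, specificity)  # 0.9 1.0
```

Note that the two summaries answer different questions: sensitivity looks only at the sick patients, specificity only at the healthy ones.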
Pneumonia detection
Classification results on test set
Specificity = TN / (TN + FP): of the patients that do not have COVID-19 (i.e., the true
negatives), our model accurately identifies them as “COVID-19 negative” 100% of the time.
Pneumonia detection
Classification results on test set
Being able to accurately detect healthy patients with 100% accuracy is great: we do
not want to quarantine someone for nothing
…but we don’t want to classify someone as “healthy” when they are “COVID-19
positive”, since they could infect other people without knowing it
Summary
Balancing sensitivity and specificity is incredibly challenging when it comes to medical
applications