
Classification Metrics in Scikit Learn
Metrics in Predictive Modelling
• One major area of predictive modeling in data science is classification.
Classification consists of trying to predict which class a particular sample from a
population comes from.
• For example, if we are trying to predict whether a particular patient will be re-hospitalized, the two possible classes are hospitalized (positive) and not hospitalized (negative).
• The classification model then tries to predict if each patient will be hospitalized or
not hospitalized.
• In other words, classification is simply trying to predict which bucket (predicted
positive vs predicted negative) a particular sample from the population should be
placed as seen below.
Metrics in Predictive Modelling
• True Positives: people that are hospitalized that you predict will be
hospitalized
• True Negatives: people that are NOT hospitalized that you predict will
NOT be hospitalized
• False Positives: people that are NOT hospitalized that you predict will
be hospitalized
• False Negatives: people that are hospitalized that you predict will
NOT be hospitalized
Metrics for Evaluating Performance of the Models
1. Accuracy Score Metric
The accuracy_score function, imported as
• from sklearn.metrics import accuracy_score
returns the "accuracy classification score", i.e. it calculates how accurate the classification is.
• It is the most common metric for classification: the fraction of samples predicted correctly, as shown below.
Metrics for Evaluating Performance of the Models

• Classification accuracy is the number of correct predictions divided by the total number of predictions, presented as a percentage by multiplying the result by 100:

classification accuracy = correct predictions / total predictions * 100

• Classification accuracy can also easily be turned into a misclassification rate, or error rate, by inverting the value:

error rate = (1 - (correct predictions / total predictions)) * 100
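These two formulas are easy to check directly. A minimal sketch in Python (the counts below are made-up values for illustration, not from the slides):

# Minimal sketch of the two formulas above; the counts are illustrative only.
correct_predictions = 7
total_predictions = 10

classification_accuracy = correct_predictions / total_predictions * 100    # 70.0
error_rate = (1 - correct_predictions / total_predictions) * 100           # 30.0

print("Accuracy (%):", classification_accuracy)
print("Error rate (%):", error_rate)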
Metrics for Evaluating Performance of the Models
We can obtain the accuracy score from scikit-learn's accuracy_score function, which takes the actual labels and the predicted labels as inputs:
• from sklearn.metrics import accuracy_score
• accuracy_score(df.actual_label.values, df.predicted_RF.values)

• This returns a value such as 0.6705165630156111.
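The slides do not show how df is built. A minimal sketch, assuming a pandas DataFrame with an actual_label column and a predicted_RF column of random-forest predictions, might look like this:

# Minimal sketch; the DataFrame and its columns are assumptions made for
# illustration, since the slides do not define them.
import pandas as pd
from sklearn.metrics import accuracy_score

df = pd.DataFrame({
    "actual_label": [1, 0, 1, 1, 0, 1],   # ground-truth classes (illustrative)
    "predicted_RF": [1, 0, 0, 1, 1, 1],   # model predictions (illustrative)
})

print(accuracy_score(df.actual_label.values, df.predicted_RF.values))   # 0.666...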


Classification Accuracy limitations
• Classification accuracy alone can be misleading if you have an unequal number of observations in each class or if you have more than two classes in your dataset.

from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
# Make predictions on validation dataset
knn = KNeighborsClassifier()
knn.fit(X_train, Y_train)
predictions = knn.predict(X_validation)
print(predictions)
print("Accuracy Score :", accuracy_score(Y_validation, predictions))

['Iris-virginica' 'Iris-versicolor' 'Iris-setosa' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-setosa' 'Iris-versicolor' 'Iris-versicolor' 'Iris-setosa' 'Iris-versicolor'
 'Iris-virginica' 'Iris-versicolor' 'Iris-setosa' 'Iris-virginica' 'Iris-setosa'
 'Iris-versicolor' 'Iris-virginica' 'Iris-virginica' 'Iris-setosa' 'Iris-setosa'
 'Iris-versicolor' 'Iris-virginica' 'Iris-versicolor' 'Iris-versicolor' 'Iris-virginica'
 'Iris-virginica' 'Iris-versicolor' 'Iris-versicolor' 'Iris-virginica' 'Iris-virginica']
Accuracy Score : 0.9
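To see why accuracy alone can mislead, consider a deliberately imbalanced example (not from the slides): a classifier that always predicts the majority class still scores 95% accuracy while never detecting a single positive case.

# Illustration (not from the slides): accuracy looks good on imbalanced data
# even though the model never finds a positive example.
from sklearn.metrics import accuracy_score, recall_score

y_true = [0] * 95 + [1] * 5     # 95 negatives, 5 positives
y_pred = [0] * 100              # always predict the majority (negative) class

print(accuracy_score(y_true, y_pred))   # 0.95
print(recall_score(y_true, y_pred))     # 0.0 -- none of the positives were found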
Metrics for Evaluating Performance of the Models
2. Confusion Matrix
• A clean and unambiguous way to present the summary of prediction
results of a classifier.
• Calculating a confusion matrix can give you a better idea of what your
classification model is getting right and what types of errors it is
making.
• The numbers of correct and incorrect predictions are summarized with count values and broken down by each class.
• The confusion matrix shows the ways in which your classification
model is confused when it makes predictions.
Metrics for Evaluating Performance of the Models
Process for calculating a confusion Matrix
• You need a test dataset or a validation dataset with expected
outcome values.
• Make a prediction for each row in your test dataset.
• From the expected outcomes and predictions count:
• The number of correct predictions for each class.
• The number of incorrect predictions for each class, organized by the class that
was predicted.
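A minimal sketch of this counting step (the expected and predicted labels here are placeholders, not data from the slides):

# Minimal sketch of the counting step above; the labels are placeholders.
from collections import Counter

expected  = ["man", "man", "woman", "man", "woman"]
predicted = ["woman", "man", "woman", "man", "man"]

# Each (actual, predicted) pair is one cell of the confusion matrix.
counts = Counter(zip(expected, predicted))
for (actual, pred), n in sorted(counts.items()):
    print(f"actual={actual}, predicted={pred}: {n}")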
Metrics for Evaluating Performance of the Models
• These numbers are then organized into a table, or a matrix as follows:
• Predicted down the side: each row of the matrix corresponds to a predicted class.
• Expected across the top: each column of the matrix corresponds to an actual class.
• The counts of correct and incorrect classifications are then filled into the table.
2-Class Confusion Matrix Case Study
• Let’s pretend we have a two-class classification problem of predicting
whether a photograph contains a man or a woman.
• We have a test dataset of 10 records with expected outcomes and a
set of predictions from our classification algorithm.
Expected, Predicted
man, woman
man, man
woman, woman
man, man
woman, man
woman, woman
woman, woman
man, man
man, woman
woman, woman
2-Class Confusion Matrix
• Let's start by calculating the classification accuracy for this set of predictions.
• The algorithm made 7 of the 10 predictions correctly, giving an accuracy of 70%:
• accuracy = total correct predictions / total predictions made * 100
• accuracy = 7 / 10 * 100 = 70%
• But what type of errors were made?
• Let’s turn our results into a confusion matrix.
• First, we must calculate the number of correct predictions for each
class.
2-Class Confusion Matrix
• men classified as men: 3
• women classified as women: 4
• We can now arrange these values into the 2-class confusion matrix:
                    men (actual)   women (actual)
men (predicted)            3              1
women (predicted)          2              4

• The total number of actual men in the dataset is the sum of the values in the men column (3 + 2).
• The total number of actual women in the dataset is the sum of the values in the women column (1 + 4).
• The correct predictions lie on the diagonal from top left to bottom right of the matrix (3 + 4).
• More errors were made by predicting men as women (2) than by predicting women as men (1).
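The same counts can be reproduced in code from the ten records above. Note that scikit-learn's confusion_matrix puts the actual classes in rows and the predicted classes in columns, so its output is the transpose of the table above:

# Sketch reproducing the 2-class example with scikit-learn. Rows are actual
# classes and columns are predicted classes, i.e. the transpose of the table above.
from sklearn.metrics import confusion_matrix

expected  = ["man", "man", "woman", "man", "woman",
             "woman", "woman", "man", "man", "woman"]
predicted = ["woman", "man", "woman", "man", "man",
             "woman", "woman", "man", "woman", "woman"]

print(confusion_matrix(expected, predicted, labels=["man", "woman"]))
# [[3 2]
#  [1 4]]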
3 Class Confusion Matrix
• Sometimes it may be desirable to select a model with a lower accuracy because it has greater predictive power on the problem.

from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
# Make predictions on validation dataset
knn = KNeighborsClassifier()
knn.fit(X_train, Y_train)
predictions = knn.predict(X_validation)
print(predictions)
print("Accuracy Score :", accuracy_score(Y_validation, predictions))
print("Confusion Matrix : \n", confusion_matrix(Y_validation, predictions))

Confusion Matrix :
[[ 7  0  0]
 [ 0 11  1]
 [ 0  2  9]]

• The matrix is easier to read with the class labels attached (actual classes down the side, predicted classes across the top; R's caret package prints a similarly labelled table via confusionMatrix):

                   Iris-setosa  Iris-versicolor  Iris-virginica
Iris-setosa                  7                0               0
Iris-versicolor              0               11               1
Iris-virginica               0                2               9
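One way to print such a labelled table directly is sketched below, assuming the Y_validation and predictions variables from the code above and that pandas is available:

# Sketch only: assumes the Y_validation and predictions variables from above.
import pandas as pd
from sklearn.metrics import confusion_matrix

labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
cm = confusion_matrix(Y_validation, predictions, labels=labels)
print(pd.DataFrame(cm,
                   index=["actual " + l for l in labels],
                   columns=["predicted " + l for l in labels]))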
Metrics for Evaluating Performance of the Models
3. Recall Score Metric: out of all the positive examples there were, what fraction did the classifier pick up?
• Recall (also known as sensitivity) is the fraction of positive events that were predicted correctly:

recall = TP / (TP + FN)

• from sklearn.metrics import recall_score
• recall_score(df.actual_label.values, df.predicted_RF.values)
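A small worked check of this formula (the labels are made up for illustration):

# Illustrative check: recall = TP / (TP + FN); the labels are made up.
from sklearn.metrics import recall_score

actual    = [1, 1, 1, 0, 0, 1]   # 4 actual positives
predicted = [1, 0, 1, 0, 1, 1]   # 3 of the 4 positives were picked up

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)   # 3
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)   # 1
print(tp / (tp + fn))                    # 0.75
print(recall_score(actual, predicted))   # 0.75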
Metrics for Evaluating Performance of the Models
4. Precision Score Metric
• Precision answers the following question: out of all the examples the classifier labeled as positive, what fraction were correct?
• Precision is the fraction of predicted positive events that are actually positive:

precision = TP / (TP + FP)

• from sklearn.metrics import precision_score
• precision_score(df.actual_label.values, df.predicted_RF.values)
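The same kind of worked check for precision (same made-up labels as above):

# Illustrative check: precision = TP / (TP + FP); the labels are made up.
from sklearn.metrics import precision_score

actual    = [1, 1, 1, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 1]   # 4 predicted positives, 3 of them correct

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)   # 3
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)   # 1
print(tp / (tp + fp))                       # 0.75
print(precision_score(actual, predicted))   # 0.75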
Metrics for Evaluating Performance of the Models
5. F1 Score Metric
• The F1 score is the harmonic mean of recall and precision, with a higher score indicating a better model. The F1 score is calculated using the following formula:

f1 = 2 * (precision * recall) / (precision + recall)

• from sklearn.metrics import f1_score
• f1_score(df.actual_label.values, df.predicted_RF.values)
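Continuing the same made-up example, the harmonic-mean formula agrees with f1_score:

# Illustrative check: the F1 score is the harmonic mean of precision and recall.
from sklearn.metrics import precision_score, recall_score, f1_score

actual    = [1, 1, 1, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 1]

p = precision_score(actual, predicted)   # 0.75
r = recall_score(actual, predicted)      # 0.75
print(2 * p * r / (p + r))               # 0.75
print(f1_score(actual, predicted))       # 0.75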
Classification Report

from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
# Make predictions on validation dataset
knn = KNeighborsClassifier()
knn.fit(X_train, Y_train)
predictions = knn.predict(X_validation)
print(predictions)
print("Accuracy Score :", accuracy_score(Y_validation, predictions))
print("Classification Report :\n", classification_report(Y_validation, predictions))

Classification Report :
                  precision    recall  f1-score   support
Iris-setosa            1.00      1.00      1.00         7
Iris-versicolor        0.85      0.92      0.88        12
Iris-virginica         0.90      0.82      0.86        11
Conclusion
• If the classifier makes no mistakes, then precision = recall = 1.0, but this is difficult to achieve in practice.
• In predictive analytics, when deciding between two models it is important to pick a single performance metric.
• As you can see, there are many to choose from (accuracy, recall, precision, F1 score, etc.).
• Ultimately, you should use the performance metric that is most suitable for the business problem at hand.
Titanic Project
• https://www.ritchieng.com/machine-learning-project-titanic-survival/
