Confusion Matrix
By
R Soujanya
Assistant Professor,
CSE, GRIET
CONFUSION MATRIX
A metric is a technique to evaluate the performance of a model.
The metrics we will cover are listed below:
1. Accuracy
2. Null Accuracy
3. Precision
4. Recall
5. F1 score
6. ROC-AUC
Accuracy: This is the most naive and commonly used metric in the context of classification. It is
just the mean of correct predictions.
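As a quick illustration, here is a minimal sketch of accuracy as the mean of correct predictions; the label lists are made up purely for this example.

```python
# Minimal sketch: accuracy as the mean of correct predictions.
# The labels below are illustrative, not from the slides.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

correct = [int(t == p) for t, p in zip(y_true, y_pred)]
accuracy = sum(correct) / len(correct)
print(accuracy)  # 0.75 here: 6 of the 8 predictions are correct
```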
CONFUSION MATRIX
A confusion matrix or error matrix is used for summarizing the performance of a classification
algorithm.
A confusion matrix is used to find the accuracy of a classification model or classifier.
It is applied to classification problems rather than regression problems.
The given dataset is divided into two parts:
1. Training data (80%) ---- the model is built on the training data.
2. Test data (20%) ------ the accuracy of the model is calculated using the test data.
To find the accuracy of a given model, a confusion matrix is used.
Calculating a confusion matrix can give you an idea of where the classification model is right and what
types of errors it is making.
A confusion matrix is used to check the performance of a classification model on a set of test data for
which the true values are known.
Performance measures such as precision and recall are calculated from the confusion matrix.
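A minimal sketch of this workflow follows, assuming scikit-learn is available; the built-in breast cancer dataset and logistic regression classifier are stand-ins chosen for illustration, not part of the original slides.

```python
# Sketch: 80/20 train/test split, fit a classifier on the training data,
# then evaluate it with a confusion matrix and accuracy on the test data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score

X, y = load_breast_cancer(return_X_y=True)

# 80% training data, 20% test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)  # model built on training data
y_pred = model.predict(X_test)

print(confusion_matrix(y_test, y_pred))  # 2 x 2 matrix for a binary target
print(accuracy_score(y_test, y_pred))    # accuracy measured on the test data
```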
How To Use Confusion Matrix:
Consider a small test set of six records, each with an actual label and a predicted label (the columns are read here as record number, name, age, actual value, and predicted value):

No. | Name | Age | Actual | Predicted
 1  |  A   | 45  |  YES   |   YES
 2  |  B   | 50  |  NO    |   YES
 3  |  C   | 57  |  YES   |   NO
 4  |  D   | 68  |  YES   |   YES
 5  |  E   | 30  |  NO    |   NO
 6  |  F   | 76  |  NO    |   YES
If the target variable has 2 categories, the confusion matrix is of order 2, i.e. 2 x 2.
In general, the confusion matrix is of order n x n, where n is the number of categories in the target
variable.
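For instance, a small sketch with hypothetical three-class labels shows that scikit-learn's confusion_matrix returns an n x n array when the target has n categories.

```python
# Sketch: with 3 target categories, the confusion matrix is 3 x 3.
from sklearn.metrics import confusion_matrix

y_true = ["cat", "dog", "bird", "dog", "cat", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "bird"]

cm = confusion_matrix(y_true, y_pred, labels=["bird", "cat", "dog"])
print(cm.shape)  # (3, 3): one row and one column per category
print(cm)
```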
CONFUSION MATRIX
A confusion matrix provides a summary of the predictive results in a classification problem.
Correct and incorrect predictions are summarized in a table with their values and broken down by each class.
Fig: Confusion Matrix for Binary Classification
CONFUSION MATRIX
We cannot rely on a single accuracy value in classification when the classes are imbalanced.
For example, suppose we have a dataset of 100 patients in which 5 have diabetes and 95 are healthy.
If our model simply predicts the majority class, i.e. that all 100 people are healthy, it still achieves a
classification accuracy of 95%. Therefore, we need a confusion matrix.
We have a total of 10 cats and dogs, and our model predicts whether each animal is a cat or not.
Actual values = [‘dog’, ‘cat’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘cat’, ‘dog’]
Predicted values = [‘dog’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’, ‘cat’, ‘cat’, ‘cat’]
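These two lists can be turned into a confusion matrix directly. A minimal sketch using scikit-learn follows, treating 'cat' as the positive class (the library choice is an assumption; the labels are exactly the ones listed above).

```python
# Sketch: confusion matrix for the 10-animal example above.
from sklearn.metrics import confusion_matrix

actual    = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog']
predicted = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat']

# Put the positive class ('cat') first so the layout is [[TP, FN], [FP, TN]].
cm = confusion_matrix(actual, predicted, labels=['cat', 'dog'])
print(cm)
# [[3 1]
#  [2 4]]  -> TP = 3, FN = 1, FP = 2, TN = 4
```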
CONFUSION MATRIX
True Positive: You predicted positive and it's true. You predicted that an animal is a cat and it actually is.
True Negative: You predicted negative and it's true. You predicted that an animal is not a cat and it actually is
not (it's a dog).
False Positive (Type 1 Error): You predicted positive and it's false. You predicted that an animal is a cat but it
actually is not (it's a dog).
False Negative (Type 2 Error): You predicted negative and it's false. You predicted that an animal is not a cat
but it actually is.
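Continuing the sketch above, the four counts can be read off by comparing the two lists pair by pair, again treating 'cat' as the positive class.

```python
# Sketch: count TP, TN, FP, FN for the cat/dog example ('cat' is the positive class).
actual    = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog']
predicted = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat']

tp = sum(1 for a, p in zip(actual, predicted) if a == 'cat' and p == 'cat')  # predicted cat, is a cat
tn = sum(1 for a, p in zip(actual, predicted) if a == 'dog' and p == 'dog')  # predicted not cat, is not a cat
fp = sum(1 for a, p in zip(actual, predicted) if a == 'dog' and p == 'cat')  # predicted cat, is a dog (Type 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 'cat' and p == 'dog')  # predicted not cat, is a cat (Type 2)

print(tp, tn, fp, fn)  # 3 4 2 1
```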
CONFUSION MATRIX
Recall (aka Sensitivity):
Recall is defined as the ratio of the total number of correctly classified positive classes divided by the
total number of positive classes. In other words, out of all the positive classes, how much we have
predicted correctly. Recall should be high.
Precision:
Precision is defined as the ratio of the total number of correctly classified positive classes divided by the
total number of predicted positive classes. In other words, out of all the predicted positive classes, how
much we predicted correctly. Precision should be high.
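In code, both definitions reduce to simple ratios over the counts found above; a minimal sketch:

```python
# Sketch: recall and precision from the TP/FP/FN counts of the cat/dog example.
tp, fp, fn = 3, 2, 1

recall = tp / (tp + fn)     # correctly classified positives / all actual positives
precision = tp / (tp + fp)  # correctly classified positives / all predicted positives

print(round(recall, 2), round(precision, 2))  # 0.75 0.6
```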
CONFUSION MATRIX
Classification Accuracy:
Accuracy = (TP + TN) / (TP + TN + FP + FN), i.e. the fraction of all predictions that are correct.
F-score or F1-score:
It is difficult to compare two models with different Precision and Recall values. So, to make them
comparable, we use the F-score, which is the Harmonic Mean of Precision and Recall. Compared to the
Arithmetic Mean, the Harmonic Mean punishes extreme values more. The F-score should be high.
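A small sketch showing why the harmonic mean punishes extreme values more than the arithmetic mean; the precision and recall values here are illustrative extremes, not from the example above.

```python
# Sketch: harmonic mean (F1) vs arithmetic mean for an extreme precision/recall pair.
precision, recall = 1.0, 0.01  # illustrative extreme values

arithmetic = (precision + recall) / 2
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(round(arithmetic, 3))  # 0.505 -- looks deceptively good
print(round(f1, 3))          # 0.02  -- heavily penalised by the poor recall
```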
CONFUSION MATRIX
Specificity:
Specificity determines the proportion of actual negatives that are correctly identified.
For the cat/dog example above, TP = 3, FN = 1, FP = 2 and TN = 4, so:
Recall: Recall tells us, when the actual value is yes (a cat), how often the model predicts yes.
Recall = TP / (TP + FN) = 3/(3+1) = 0.75
Precision: Precision tells us, when the model predicts yes (a cat), how often it is correct.
Precision = TP / (TP + FP) = 3/(3+2) = 0.60
F-score:
F-score = (2*Recall*Precision)/(Recall + Precision) = (2*0.75*0.60)/(0.75+0.60) = 0.67
Specificity:
Specificity = TN / (TN + FP) = 4/(4+2) = 0.67
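These hand calculations can be cross-checked in a short sketch; scikit-learn is assumed here, and since it has no dedicated specificity helper, specificity is computed from the confusion matrix.

```python
# Sketch: verify recall, precision, F-score and specificity for the cat/dog example.
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

actual    = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog']
predicted = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat']

print(recall_score(actual, predicted, pos_label='cat'))     # 0.75
print(precision_score(actual, predicted, pos_label='cat'))  # 0.6
print(f1_score(actual, predicted, pos_label='cat'))         # ~0.67

tn, fp = confusion_matrix(actual, predicted, labels=['dog', 'cat'])[0]
print(tn / (tn + fp))                                       # ~0.67 (specificity)
```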
CONFUSION MATRIX
Summary:
Precision is how certain you are of your true positives. Recall is how certain you are that you are not
missing any positives.
Choose Recall if the occurrence of false negatives is unacceptable/intolerable. For example, in the
case of diabetes screening, you would rather have some extra false positives (false alarms) than miss
some false negatives (actual patients).
Choose Precision if you want to be more confident of your true positives. For example, in the case of
spam emails, you would rather have some spam emails in your inbox than some regular emails in your
spam folder. You would like to be extra sure that email X is spam before putting it in the spam
folder.
Choose Specificity if you want to cover all true negatives, i.e. you do not want any false alarms or
false positives. For example, in the case of a drug test in which all people who test positive
immediately go to jail, you would not want anyone drug-free going to jail.
CONFUSION MATRIX
We can conclude that:
An accuracy of 70% means that 3 of every 10 predictions are incorrect and 7 are correct.
A precision of 60% means that 4 of every 10 animals labeled as cats are actually not cats (i.e. dogs), and 6 are cats.
A recall of 75% means that 1 of every 4 actual cats is missed by our model and 3 are correctly
identified as cats.
A specificity of 67% means that roughly 1 of every 3 dogs (i.e. non-cats) is mislabeled as a cat
and 2 are correctly labeled as dogs.
CONFUSION MATRIX FOR 3 X 3
Consider the following example with three classes C1, C2, and C3:
                PREDICTED VALUES
                C1   C2   C3
ACTUAL    C1     2    1    0
VALUES    C2     1    2    0
          C3     0    0    2
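A short sketch computing per-class precision and recall directly from this 3 x 3 matrix (rows are actual values, columns are predicted values, as above); numpy is assumed for the array handling.

```python
# Sketch: per-class precision and recall from the 3 x 3 confusion matrix above.
import numpy as np

cm = np.array([[2, 1, 0],   # actual C1
               [1, 2, 0],   # actual C2
               [0, 0, 2]])  # actual C3

for i, name in enumerate(['C1', 'C2', 'C3']):
    tp = cm[i, i]
    precision = tp / cm[:, i].sum()  # column sum = everything predicted as this class
    recall = tp / cm[i, :].sum()     # row sum = every actual member of this class
    print(name, round(precision, 2), round(recall, 2))

# Overall accuracy: diagonal / total = 6/8 = 0.75
print(np.trace(cm) / cm.sum())
```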