
Simple guide to confusion matrix terminology

A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known. The confusion matrix itself is relatively simple to understand, but the related terminology can be confusing.

I wanted to create a "quick reference guide" for confusion matrix terminology because I couldn't find an existing resource that suited my requirements: compact in presentation, using numbers instead of arbitrary variables, and explained both in terms of formulas and sentences.

Let's start with an example confusion matrix for a binary classifier (though it can easily be extended to the case of more than two classes):

    n = 165         Predicted: NO    Predicted: YES
    Actual: NO           50                10
    Actual: YES           5               100

What can we learn from this matrix?

- There are two possible predicted classes: "yes" and "no". If we were predicting the presence of a disease, for example, "yes" would mean they have the disease, and "no" would mean they don't have the disease.
- The classifier made a total of 165 predictions (e.g., 165 patients were being tested for the presence of that disease).
- Out of those 165 cases, the classifier predicted "yes" 110 times, and "no" 55 times.
- In reality, 105 patients in the sample have the disease, and 60 patients do not.

Let's now define the most basic terms, which are whole numbers (not rates):

- true positives (TP): These are cases in which we predicted yes (they have the disease), and they do have the disease.
- true negatives (TN): We predicted no, and they don't have the disease.
- false positives (FP): We predicted yes, but they don't actually have the disease. (Also known as a "Type I error.")
- false negatives (FN): We predicted no, but they actually do have the disease. (Also known as a "Type II error.")

I've added these terms to the confusion matrix, and also added the row and column totals:

    n = 165         Predicted: NO    Predicted: YES
    Actual: NO         TN = 50          FP = 10          60
    Actual: YES        FN = 5           TP = 100        105
                         55               110
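
If you'd like to reproduce this matrix in code, here's a minimal Python sketch using scikit-learn's confusion_matrix. It rebuilds the 165 cases as label vectors from the four counts above (the variable names are mine), and it relies on scikit-learn's convention of putting actual classes in rows and predicted classes in columns, in the order of the labels you pass:

    from sklearn.metrics import confusion_matrix

    # Rebuild 165 (actual, predicted) label pairs from the four counts above:
    # 100 TP, 5 FN, 10 FP, 50 TN.
    y_actual    = ["yes"] * 100 + ["yes"] * 5 + ["no"] * 10 + ["no"] * 50
    y_predicted = ["yes"] * 100 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 50

    # Rows are actual classes, columns are predicted classes.
    print(confusion_matrix(y_actual, y_predicted, labels=["no", "yes"]))
    # [[ 50  10]     actual no:  TN = 50, FP = 10
    #  [  5 100]]    actual yes: FN = 5,  TP = 100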

This is a list of rates that are often computed from a confusion matrix for a binary classifier (a short Python sketch computing them from the example counts follows the list):

- Accuracy: Overall, how often is the classifier correct?
  - (TP+TN)/total = (100+50)/165 = 0.91
- Misclassification Rate: Overall, how often is it wrong?
  - (FP+FN)/total = (10+5)/165 = 0.09
  - equivalent to 1 minus Accuracy
  - also known as "Error Rate"
- True Positive Rate: When it's actually yes, how often does it predict yes?
  - TP/actual yes = 100/105 = 0.95
  - also known as "Sensitivity" or "Recall"
- False Positive Rate: When it's actually no, how often does it predict yes?
  - FP/actual no = 10/60 = 0.17
- Specificity: When it's actually no, how often does it predict no?
  - TN/actual no = 50/60 = 0.83
  - equivalent to 1 minus False Positive Rate
- Precision: When it predicts yes, how often is it correct?
  - TP/predicted yes = 100/110 = 0.91
- Prevalence: How often does the yes condition actually occur in our sample?
  - actual yes/total = 105/165 = 0.64
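
To make the arithmetic concrete, here's a minimal Python sketch (plain Python, no libraries) that computes each of these rates directly from the four counts in the example; the expected values appear as comments:

    # Counts from the example confusion matrix above.
    TP, TN, FP, FN = 100, 50, 10, 5
    total         = TP + TN + FP + FN   # 165
    actual_yes    = TP + FN             # 105
    actual_no     = TN + FP             # 60
    predicted_yes = TP + FP             # 110

    accuracy               = (TP + TN) / total    # ≈ 0.91
    misclassification_rate = (FP + FN) / total    # ≈ 0.09 (= 1 - accuracy)
    true_positive_rate     = TP / actual_yes      # ≈ 0.95 (sensitivity, recall)
    false_positive_rate    = FP / actual_no       # ≈ 0.17
    specificity            = TN / actual_no       # ≈ 0.83 (= 1 - false positive rate)
    precision              = TP / predicted_yes   # ≈ 0.91
    prevalence             = actual_yes / total   # ≈ 0.64

    print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, recall={true_positive_rate:.2f}")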

A couple of other terms are also worth mentioning (a short sketch computing Cohen's Kappa and the F score follows this list):

- Positive Predictive Value: This is very similar to precision, except that it takes prevalence into account. In the case where the classes are perfectly balanced (meaning the prevalence is 50%), the positive predictive value (PPV) is equivalent to precision.
- Null Error Rate: This is how often you would be wrong if you always predicted the majority class. (In our example, the null error rate would be 60/165 = 0.36, because if you always predicted yes, you would only be wrong for the 60 "no" cases.) This can be a useful baseline metric to compare your classifier against. However, the best classifier for a particular application will sometimes have a higher error rate than the null error rate, as demonstrated by the Accuracy Paradox.
- Cohen's Kappa: This is essentially a measure of how well the classifier performed as compared to how well it would have performed simply by chance. In other words, a model will have a high Kappa score if there is a big difference between the accuracy and the null error rate.
- F Score: This is the harmonic mean of the true positive rate (recall) and precision.
- ROC Curve: This is a commonly used graph that summarizes the performance of a classifier over all possible thresholds. It is generated by plotting the True Positive Rate (y-axis) against the False Positive Rate (x-axis) as you vary the threshold for assigning observations to a given class.
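
Here's a minimal Python sketch that works out Cohen's Kappa and the F1 score (the most common F score) by hand from the same four counts, so the chance-accuracy step inside the kappa formula stays visible:

    TP, TN, FP, FN = 100, 50, 10, 5
    total = TP + TN + FP + FN   # 165

    # Cohen's Kappa: (observed accuracy - chance accuracy) / (1 - chance accuracy).
    observed_accuracy = (TP + TN) / total
    # Chance accuracy: how often the actual and predicted labels would agree if the
    # predictions were random but kept the same overall yes/no frequencies.
    chance_yes = ((TP + FN) / total) * ((TP + FP) / total)
    chance_no  = ((TN + FP) / total) * ((TN + FN) / total)
    chance_accuracy = chance_yes + chance_no
    kappa = (observed_accuracy - chance_accuracy) / (1 - chance_accuracy)   # ≈ 0.80

    # F1 score: harmonic mean of precision and recall.
    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)                      # ≈ 0.93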

And finally, for those of you from the world of Bayesian statistics, here's a quick
summary of these terms from Applied Predictive Modeling:

In relation to Bayesian statistics, the sensitivity and specificity are the conditional
probabilities, the prevalence is the prior, and the positive/negative predicted values
are the posterior probabilities.
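
To make that concrete, here's a minimal sketch that recovers the positive predictive value from sensitivity, specificity, and prevalence via Bayes' theorem. With the in-sample prevalence it reproduces the precision (100/110 ≈ 0.91); the second calculation uses an assumed 5% prevalence (purely for illustration) to show how PPV drops when the condition is rarer while sensitivity and specificity stay fixed:

    sensitivity = 100 / 105   # P(predict yes | actually yes)
    specificity = 50 / 60     # P(predict no  | actually no)
    prevalence  = 105 / 165   # P(actually yes) -- the prior

    # Bayes' theorem: P(actually yes | predict yes)
    ppv = (sensitivity * prevalence) / (
        sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    )
    print(round(ppv, 2))        # 0.91 -- matches the precision computed earlier

    # With a rarer condition (an assumed 5% prevalence), the same test has a much lower PPV.
    rare = 0.05
    ppv_rare = (sensitivity * rare) / (
        sensitivity * rare + (1 - specificity) * (1 - rare)
    )
    print(round(ppv_rare, 2))   # ≈ 0.23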

Precision and recall are then defined as:

Precision = TP / (TP + FP)
For example, for a text search on a set of documents, precision is the number of correct results divided by the number of all returned results.

Recall = TP / (TP + FN)
For example, for a text search on a set of documents, recall is the number of correct results divided by the number of results that should have been returned.
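
In practice you would usually compute these from vectors of actual and predicted labels rather than by hand; here's a minimal scikit-learn sketch built from the example counts (with "yes" as the positive label):

    from sklearn.metrics import precision_score, recall_score

    # 100 TP, 5 FN, 10 FP, 50 TN, as in the example confusion matrix.
    y_actual    = ["yes"] * 105 + ["no"] * 60
    y_predicted = ["yes"] * 100 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 50

    print(precision_score(y_actual, y_predicted, pos_label="yes"))  # 100/110 ≈ 0.91
    print(recall_score(y_actual, y_predicted, pos_label="yes"))     # 100/105 ≈ 0.95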
