Confusion Matrix


CONFUSION MATRIX

By
R Soujanya
Assistant Professor,
CSE, GRIET
CONFUSION MATRIX
 A metric is a technique to evaluate the performance of a model.
 The metrics we will cover:
1. Accuracy
2. Null Accuracy
3. Precision
4. Recall
5. F1 score
6. ROC-AUC
 Accuracy: This is the most commonly used (and most naive) metric in the context of classification. It is simply the mean (proportion) of correct predictions.
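
As a quick illustration of accuracy as the mean of correct predictions, here is a minimal Python sketch (the labels below are made up purely for illustration):

# Accuracy as the mean of correct predictions (toy labels, for illustration only)
actual    = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(accuracy)  # 0.8 -> 8 of the 10 predictions are correct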
CONFUSION MATRIX
 A confusion matrix, or error matrix, is used for summarizing the performance of a classification algorithm.
 A confusion matrix is used to find the accuracy of a classification model (classifier).
 It applies to classification problems rather than regression.
 The given data set is divided into two parts:
1. Training data (80%): the model is built on the training data.
2. Test data (20%): the accuracy of the model is calculated on the test data.
 The confusion matrix is used to find the accuracy of a given model on the test data.
 Calculating a confusion matrix gives you an idea of where the classification model is right and what types of errors it is making.
 A confusion matrix is used to check the performance of a classification model on a set of test data for which the true values are known.
 Performance measures such as precision and recall are calculated from the confusion matrix.
How To Use Confusion Matrix:

 A confusion matrix is always a square matrix.


 For example:

 S.NO   PATIENT NAME   AGE   ACTUAL VALUE (DIABETES)   PREDICTED VALUE
 1      A              45    YES                       YES
 2      B              50    NO                        YES
 3      C              57    YES                       NO
 4      D              68    YES                       YES
 5      E              30    NO                        NO
 6      F              76    NO                        YES

 If the target variable has 2 categories, the confusion matrix is of order 2, i.e. 2 x 2.
 In general, the confusion matrix is of order n x n, where n is the number of categories in the target variable.
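
A minimal sketch (assuming pandas is available) that cross-tabulates the actual and predicted values from the patient table above into a 2 x 2 confusion matrix:

import pandas as pd

# The six patients from the example table above
df = pd.DataFrame({
    'patient':   ['A', 'B', 'C', 'D', 'E', 'F'],
    'actual':    ['YES', 'NO', 'YES', 'YES', 'NO', 'NO'],
    'predicted': ['YES', 'YES', 'NO', 'YES', 'NO', 'YES'],
})

# Rows = actual values, columns = predicted values
print(pd.crosstab(df['actual'], df['predicted']))
# predicted  NO  YES
# actual
# NO          1    2
# YES         1    2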
CONFUSION MATRIX
 A confusion matrix provides a summary of the predictive results in a classification problem.
 Correct and incorrect predictions are summarized in a table with their values and broken down by each class.

Fig: Confusion Matrix for Binary Classification
CONFUSION MATRIX
 We cannot rely on a single accuracy value in classification when the classes are imbalanced.

 For example, we have a dataset of 100 patients in which 5 have diabetes and 95 are healthy.

 If our model simply predicts the majority class, i.e. that all 100 people are healthy, it still achieves a classification accuracy of 95% while missing every diabetic patient. Therefore, we need a confusion matrix.
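
A minimal sketch of this scenario (toy data, for illustration only):

# Toy illustration: 100 patients, 5 diabetic ('YES') and 95 healthy ('NO')
actual = ['YES'] * 5 + ['NO'] * 95

# A "model" that always predicts the majority class
predicted = ['NO'] * 100

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(accuracy)  # 0.95 -> looks good, yet every diabetic patient is missed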

 Calculating a 2 x 2 confusion matrix:

 We have a total of 10 animals (cats and dogs), and our model predicts whether each one is a cat or not.

 Actual values = [‘dog’, ‘cat’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘cat’, ‘dog’]
Predicted values = [‘dog’, ‘dog’, ‘dog’, ‘cat’, ‘dog’, ‘dog’, ‘cat’, ‘cat’, ‘cat’, ‘cat’]
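
A minimal sketch (assuming scikit-learn is available) that builds the confusion matrix for these two lists, treating 'cat' as the positive class:

from sklearn.metrics import confusion_matrix

actual    = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog']
predicted = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat']

# labels=['cat', 'dog'] fixes the order: rows = actual values, columns = predicted values
cm = confusion_matrix(actual, predicted, labels=['cat', 'dog'])
print(cm)
# [[3 1]   -> actual cat: 3 predicted as cat (TP), 1 predicted as dog (FN)
#  [2 4]]  -> actual dog: 2 predicted as cat (FP), 4 predicted as dog (TN)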
CONFUSION MATRIX

Remember, we describe predicted values as Positive/Negative and actual values as True/False.
CONFUSION MATRIX
 Definition of the Terms:

 True Positive: You predicted positive and it’s true. You predicted that an animal is a cat and it actually is.
 True Negative: You predicted negative and it’s true. You predicted that the animal is not a cat and it actually is not (it’s a dog).
 False Positive (Type 1 Error): You predicted positive and it’s false. You predicted that the animal is a cat but it actually is not (it’s a dog).
 False Negative (Type 2 Error): You predicted negative and it’s false. You predicted that the animal is not a cat but it actually is.
 A confusion matrix is used to check the performance of a classification model on a set of test data for which
the true values are known.
 Most performance measures, such as precision and recall, are calculated from the confusion matrix.
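
A minimal sketch that counts these four cases directly on the cat/dog lists from the previous slide (treating 'cat' as the positive class):

actual    = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog']
predicted = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat']

tp = sum(a == 'cat' and p == 'cat' for a, p in zip(actual, predicted))  # predicted cat, is cat
tn = sum(a == 'dog' and p == 'dog' for a, p in zip(actual, predicted))  # predicted dog, is dog
fp = sum(a == 'dog' and p == 'cat' for a, p in zip(actual, predicted))  # predicted cat, is dog
fn = sum(a == 'cat' and p == 'dog' for a, p in zip(actual, predicted))  # predicted dog, is cat
print(tp, tn, fp, fn)  # 3 4 2 1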
CONFUSION MATRIX
Recall (aka Sensitivity)
 Recall is defined as the ratio of the total number of correctly classified positive classes divided by the total number of positive classes. In other words: out of all the actual positive classes, how many did we predict correctly? Recall should be high.

Precision
 Precision is defined as the ratio of the total number of correctly classified positive classes divided by the total number of predicted positive classes. In other words: out of all the predicted positive classes, how many did we predict correctly? Precision should be high.
CONFUSION MATRIX
Classification Accuracy:

 Classification Accuracy is given by the relation:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

F-score or F1-score:

 It is difficult to compare two models with different Precision and Recall values. To make them comparable, we use the F-score, which is the Harmonic Mean of Precision and Recall. Compared to the Arithmetic Mean, the Harmonic Mean punishes extreme values more. The F-score should be high.

F-score = (2 x Precision x Recall) / (Precision + Recall)
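
A minimal sketch translating these formulas into plain Python; the function names are ours, for illustration only:

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_score(p, r):
    # harmonic mean of precision and recall
    return 2 * p * r / (p + r)

# Counts from the cat/dog example worked through on the next slides: TP=3, TN=4, FP=2, FN=1
p, r = precision(3, 2), recall(3, 1)
print(accuracy(3, 4, 2, 1), p, r, f_score(p, r))  # 0.7 0.6 0.75 0.666...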
CONFUSION MATRIX
 Specificity:
Specificity determines the proportion of actual negatives that are correctly identified.

Specificity = TN / (TN + FP)

 Example to interpret confusion matrix:


Let’s calculate the confusion matrix using the cat and dog example above:
CONFUSION MATRIX

Fig: Confusion matrix for the cat and dog example (TP = 3, FN = 1, FP = 2, TN = 4)
CONFUSION MATRIX
 Classification Accuracy:
Accuracy = (TP + TN) / (TP + TN + FP + FN) = (3+4)/(3+4+2+1) = 0.70

 Recall: Recall tells us, when the actual value is yes, how often the model predicts yes.
Recall = TP / (TP + FN) = 3/(3+1) = 0.75

 Precision: Precision tells us, when the model predicts yes, how often it is correct.
Precision = TP / (TP + FP) = 3/(3+2) = 0.60

 F-score:
F-score = (2*Recall*Precision)/(Recall + Precision) = (2*0.75*0.60)/(0.75+0.60) = 0.67

 Specificity:
Specificity = TN / (TN + FP) = 4/(4+2) = 0.67
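
These numbers can be reproduced with scikit-learn (assuming it is installed); specificity is simply the recall of the negative class:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

actual    = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog']
predicted = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat']

print(accuracy_score(actual, predicted))                    # 0.7
print(recall_score(actual, predicted, pos_label='cat'))     # 0.75
print(precision_score(actual, predicted, pos_label='cat'))  # 0.6
print(f1_score(actual, predicted, pos_label='cat'))         # 0.666...
print(recall_score(actual, predicted, pos_label='dog'))     # 0.666... (specificity)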
CONFUSION MATRIX
Summary:
 Precision is how certain you are of your true positives. Recall is how certain you are that you are not
missing any positives.
 Choose Recall if the occurrence of false negatives is unacceptable/intolerable. For example, in the case of diabetes you would rather have some extra false positives (false alarms) than miss some true cases (false negatives).
 Choose Precision if you want to be more confident of your true positives. For example, in the case of spam emails, you would rather have some spam emails in your inbox than some regular emails in your spam box. You would like to be extra sure that email X is spam before you put it in the spam box.
 Choose Specificity if you want to cover all true negatives, i.e. you do not want any false alarms or false positives. For example, in the case of a drug test in which all people who test positive will immediately go to jail, you would not want anyone drug-free going to jail.
CONFUSION MATRIX
We can conclude that:
 An accuracy of 70% means that 3 of every 10 predictions are incorrect, and 7 are correct.
 A precision of 60% means that 4 of every 10 animals labeled as cats are actually not cats (i.e. dogs), and 6 are cats.
 A recall of 75% means that 1 of every 4 actual cats is missed by our model, and 3 are correctly identified as cats.
 A specificity of 67% means that 2 of every 6 dogs (i.e. not cats) are mislabeled as cats, and 4 are correctly labeled as dogs.
CONFUSION MATRIX FOR 3 X 3
 Consider the following clusters:

              PREDICTED VALUES
              C1   C2   C3
ACTUAL   C1    2    1    0
VALUES   C2    1    2    0
         C3    0    0    2

 1. TP = True Positives = (actual C1 predicted as C1) + (actual C2 predicted as C2) + (actual C3 predicted as C3)
       = 2 + 2 + 2 = 6
CONFUSION MATRIX FOR 3 X 3
 Consider the following clusters:

              PREDICTED VALUES
              C1   C2   C3
ACTUAL   C1    2    1    0
VALUES   C2    1    2    0
         C3    0    0    2

 2. TN = True Negatives = (actual non-C1 predicted as non-C1) + (actual non-C2 predicted as non-C2) + (actual non-C3 predicted as non-C3)
       = 4 + 4 + 6 = 14
CONFUSION MATRIX FOR 3 X 3
 Consider the following clusters:

              PREDICTED VALUES
              C1   C2   C3
ACTUAL   C1    2    1    0
VALUES   C2    1    2    0
         C3    0    0    2

 3. FP = False Positives = (actual non-C1 predicted as C1) + (actual non-C2 predicted as C2) + (actual non-C3 predicted as C3)
       = 1 + 1 + 0 = 2
CONFUSION MATRIX FOR 3 X 3
 Consider the following clusters:

              PREDICTED VALUES
              C1   C2   C3
ACTUAL   C1    2    1    0
VALUES   C2    1    2    0
         C3    0    0    2

 4. FN = False Negatives = (actual C1 predicted as non-C1) + (actual C2 predicted as non-C2) + (actual C3 predicted as non-C3)
       = 1 + 1 + 0 = 2
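
A minimal sketch (assuming NumPy is available) that derives these four counts from the 3 x 3 matrix above; the variable names are ours, for illustration only:

import numpy as np

# Rows = actual C1..C3, columns = predicted C1..C3 (the matrix from the slides)
cm = np.array([[2, 1, 0],
               [1, 2, 0],
               [0, 0, 2]])

diag = np.diag(cm)
tp = diag.sum()                              # diagonal: 2 + 2 + 2 = 6
fp = (cm.sum(axis=0) - diag).sum()           # column totals minus diagonal = 2
fn = (cm.sum(axis=1) - diag).sum()           # row totals minus diagonal    = 2
tn = sum(cm.sum() - cm[i, :].sum() - cm[:, i].sum() + cm[i, i]
         for i in range(cm.shape[0]))        # per-class true negatives: 4 + 4 + 6 = 14
print(tp, tn, fp, fn)                        # 6 14 2 2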
CONFUSION MATRIX
We can conclude that:
 Accuracy = (TP+TN) / (TP+FP+FN+TN) = (6+14 ) / (6+14+2+2) = 0.833

 Precision = TP/ (TP+FP) = 6/ (6+2) = 0.75.

 Recall = TP / (TP+FN) = 6/ (6+2) = 0.75

 F-measure = (2*P*R)/(P+R) =( 2* 0.75 * 0.75)/(0.75 + 0.75) = 0.75.

 Purity: the sum of correctly assigned objects divided by N (the total number of objects).
