
Session 1

Model Evaluation

PRATHEESH KUMAR N

 The most important task in building any machine learning model is to evaluate its performance. So the question arises: how would one measure the success of a machine learning model? How would we know when to stop training and evaluating the model and call it good?

 Evaluation metrics are tied to machine learning tasks. There are different metrics for classification and regression tasks, and some metrics, like precision and recall, are useful for multiple tasks. Classification and regression are examples of supervised learning, which constitutes a majority of machine learning applications. Using different metrics for performance evaluation, we should be able to improve our model’s overall predictive power before we roll it out for production on unseen data. Relying only on accuracy, without a proper evaluation of the machine learning model using different evaluation metrics, can lead to problems when the model is deployed on unseen data and may end in poor predictions.
Classification Metrics in Machine Learning

 Classification is about predicting the class labels given input data. In binary classification, there are only two possible output classes (i.e., a dichotomy). In multiclass classification, more than two possible classes can be present.

 A very common example of binary classification is spam detection, where the input data could include the email text and metadata (sender, sending time), and the output label is either “spam” or “not spam” (see figure). Sometimes people also use other names for the two classes: “positive” and “negative,” or “class 1” and “class 0.”
Email spam detection is a binary classification problem

Classification Metrics

 There are many ways of measuring classification performance. Accuracy, confusion matrix, log-loss, and AUC-ROC are some of the most popular metrics. Precision and recall are also widely used metrics for classification problems.
Accuracy

 Accuracy simply measures how often the classifier predicts correctly. We can define accuracy as the ratio of the number of correct predictions to the total number of predictions.
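A minimal sketch in Python of how accuracy is computed; the labels and predictions below are hypothetical, and scikit-learn is assumed to be available.

```python
from sklearn.metrics import accuracy_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # correct labels (hypothetical)
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]   # model predictions (hypothetical)

# Accuracy = number of correct predictions / total number of predictions
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(manual)                              # 0.7
print(accuracy_score(y_true, y_pred))      # 0.7 (same value via scikit-learn)
```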

 When a model gives an accuracy rate of 99%, you might think that the model is performing very well, but this is not always true and can be misleading in some situations. Let us explain this with the help of an example.
 Consider a binary classification problem, where a model can achieve only two results: it gives either a correct or an incorrect prediction. Now imagine we have a classification task to predict whether an image is of a dog or a cat. In a supervised learning algorithm, we first fit / train a model on training data, then test the model on testing data. Once we have the model’s predictions from the X test data, we compare them to the true y values (the correct labels).


 We feed the image of a dog into the trained model. Suppose the model predicts that this is a dog; we compare the prediction to the correct label, and it is correct. If the model predicts that this image is a cat, we again compare it to the correct label, and it would be incorrect.

We repeat this process for all images in the X test data. Eventually, we’ll have a count of correct and incorrect matches. But in reality, it is very rare that all correct or incorrect matches hold equal value. Therefore, one metric won’t tell the entire story.

 Accuracy is useful when the target classes are well balanced, but it is not a good choice for unbalanced classes. Imagine a scenario where we had 99 images of dogs and only 1 image of a cat in our training data. A model that always predicts “dog” would then get 99% accuracy. In reality, data is often imbalanced, for example in spam email detection, credit card fraud, and medical diagnosis. Hence, if we want to do a better model evaluation and have a full picture of the model’s performance, other metrics such as recall and precision should also be considered.
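A small hypothetical illustration of this point: with 99 dog images and 1 cat image, a model that always predicts “dog” reaches 99% accuracy while never detecting the cat.

```python
# Hypothetical imbalanced dataset: label 1 = dog, label 0 = cat.
y_true = [1] * 99 + [0]
y_pred = [1] * 100                      # a model that always predicts "dog"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
cat_recall = sum(t == p == 0 for t, p in zip(y_true, y_pred)) / y_true.count(0)

print(accuracy)     # 0.99 -> looks impressive
print(cat_recall)   # 0.0  -> the single cat is never detected
```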
Confusion Matrix

A confusion matrix is a performance measurement for machine learning classification problems where the output can be two or more classes. It is a table with combinations of predicted and actual values.

A confusion matrix is defined as a table that is often used to describe the performance of a classification model on a set of test data for which the true values are known.
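A minimal sketch of building a confusion matrix with scikit-learn (the labels are hypothetical; scikit-learn is assumed to be installed).

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# Rows are actual classes, columns are predicted classes (class order: 0, 1).
print(confusion_matrix(y_true, y_pred))
# [[4 2]    -> TN = 4, FP = 2
#  [1 3]]   -> FN = 1, TP = 3
```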



Link for confusion matrix

 https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62

 https://www.guru99.com/confusion-matrix-machine-learning-example.html
Let’s try to understand TP, FP, FN, and TN with a pregnancy analogy


 True Positive: We predicted positive and it’s true. In the
image, we predicted that a woman is pregnant and she
actually is.
 True Negative: We predicted negative and it’s true. In the
image, we predicted that a man is not pregnant and he
actually is not.
 False Positive (Type 1 Error)- We predicted positive and it’s
false. In the image, we predicted that a man is pregnant but
he actually is not.
 False Negative (Type 2 Error)- We predicted negative and it’s
false. In the image, we predicted that a woman is not
pregnant but she actually is.
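Below is a small, purely illustrative Python helper (hypothetical, not from the original slides) that names the outcome of a single prediction, mirroring the four cases above.

```python
def outcome(predicted_positive: bool, actually_positive: bool) -> str:
    """Name the confusion-matrix cell for one prediction."""
    if predicted_positive and actually_positive:
        return "True Positive"
    if not predicted_positive and not actually_positive:
        return "True Negative"
    if predicted_positive and not actually_positive:
        return "False Positive (Type 1 Error)"
    return "False Negative (Type 2 Error)"

print(outcome(predicted_positive=True, actually_positive=False))
# False Positive (Type 1 Error)
```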
1. Precision —

Precision explains how many of the cases predicted as positive actually turned out to be positive. Precision is useful in cases where False Positives are a higher concern than False Negatives. Precision is important in music or video recommendation systems, e-commerce websites, etc., where wrong results could lead to customer churn, and this could be harmful to the business.

Precision for a label is defined as the number of true positives divided by the number of predicted positives.
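A minimal sketch using the same hypothetical labels as before: with TP = 3 and FP = 2, precision is 3 / (3 + 2) = 0.6.

```python
from sklearn.metrics import precision_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# precision = TP / (TP + FP) = 3 / (3 + 2)
print(precision_score(y_true, y_pred))   # 0.6
```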


2. Recall (Sensitivity)

 Recall explains how many of the actual positive cases we were able to predict correctly with our model. It is a useful metric in cases where a False Negative is of higher concern than a False Positive. It is important in medical cases, where it doesn’t matter whether we raise a false alarm, but the actual positive cases should not go undetected!
Recall for a label is defined as the number of
true positives divided by the total number of
actual positives.
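A minimal sketch using the same hypothetical labels: with TP = 3 and FN = 1, recall is 3 / (3 + 1) = 0.75.

```python
from sklearn.metrics import recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# recall = TP / (TP + FN) = 3 / (3 + 1)
print(recall_score(y_true, y_pred))   # 0.75
```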

3. F1 Score —

 It gives a combined idea about
Precision and Recall metrics. It is
maximum when Precision is equal to
Recall.
F1 Score is the harmonic mean
of precision and recall.
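A minimal sketch using the same hypothetical labels: with precision = 0.6 and recall = 0.75, the harmonic mean gives F1 ≈ 0.667.

```python
from sklearn.metrics import f1_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# F1 = 2 * (precision * recall) / (precision + recall) = 2 * (0.6 * 0.75) / (0.6 + 0.75)
print(round(f1_score(y_true, y_pred), 3))   # 0.667
```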

Q1. What are the classification
metrics?

 Classification metrics are evaluation measures used to
assess the performance of a classification model. Common
metrics include accuracy (proportion of correct
predictions), precision (true positives over total predicted
positives), recall (true positives over total actual
positives), F1 score (harmonic mean of precision and
recall), and area under the receiver operating
characteristic curve (AUC-ROC).
Q2. What are the 4 metrics for evaluating
classifier performance?

 The four commonly used metrics for evaluating classifier
performance are:
1. Accuracy: The proportion of correct predictions out of the
total predictions.
2. Precision: The proportion of true positive predictions out of
the total positive predictions (precision = true positives / (true
positives + false positives)).
3. Recall (Sensitivity or True Positive Rate): The proportion of
true positive predictions out of the total actual positive instances
(recall = true positives / (true positives + false negatives)).
4. F1 Score: The harmonic mean of precision and recall,
providing a balance between the two metrics (F1 score = 2 *
((precision * recall) / (precision + recall))).
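A short sketch implementing the four formulas above directly from confusion-matrix counts; the counts (TP = 3, FP = 2, FN = 1, TN = 4) are hypothetical.

```python
tp, fp, fn, tn = 3, 2, 1, 4   # hypothetical counts

accuracy  = (tp + tn) / (tp + tn + fp + fn)                    # 0.7
precision = tp / (tp + fp)                                     # 0.6
recall    = tp / (tp + fn)                                     # 0.75
f1        = 2 * ((precision * recall) / (precision + recall))  # ~0.667

print(accuracy, precision, recall, round(f1, 3))
```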
EVALUATING AN AI MODEL

Aim of the evaluation model:
To measure the performance of AI through some evaluation metrics.
To calculate a performance score, on the basis of which the efficiency of an AI model is determined.
Purpose of Evaluation Metrics

Provide a mathematical estimate of how far we are from making correct predictions.

 If the model performs well on unseen new real-life data, the deployment stage is started, where it is put to use in real-life applications.
Model Evaluation
Metrics

Standard way of measurement to
assess something for accuracy and
performance.
Types of metrics

Classification metrics
Regression metrics
Deep learning related metrics
These metrics calculate some score which
indicates how correct the AI model’s prediction
is
The higher the score, the better our model is.
Classification metrics

Used for evaluating classification-based AI models.
Classification means identifying the class of an input value.
Classification problems are based on non-continuous (discrete) data.
What are the classification
models in AI?

Classification models include:
Logistic regression
Decision tree
Random forest
Gradient-boosted tree
Multilayer perceptron
One-vs-rest
Naive Bayes
What are types of classification
models in machine learning?

 There are perhaps four main types of classification
tasks that you may encounter; they are:

Binary Classification.
Multi-Class Classification.
Multi-Label Classification.
Imbalanced Classification.
What is an example of a classifier model?

 For example, a classification model might
be trained on a dataset of images labeled as
either dogs or cats and then used to predict
the class of new, unseen images of dogs or
cats based on their features such as color,
texture, and shape.
Why use a classification
model?

It helps in categorizing data into different classes and has a broad array of applications, such as email spam detection, medical diagnostic tests, fraud detection, image classification, and speech recognition, among others.
How many types of classification
algorithms are there?

Classification algorithms are used to categorize data into a class or category. Classification can be performed on both structured and unstructured data, and it can be of three types: binary classification, multiclass classification, and multilabel classification.

Regression metrics

 Used for evaluating Regression based AI models.
Deep learning related
metrics

 Used for evaluating deep-learning-based AI models.
Why are classification metrics mostly used in AI evaluation?

Simple and easy to evaluate.
Classification Metrics

Confusion Matrix
Accuracy
Precision
Recall
F1 Score
What is a confusion matrix?

A technique using a chart or table that summarizes the performance of a classification-based AI model by listing the predicted values of the AI model and the actual / correct outcome values in a confusion table.

The actual value (True / False) represents the real observed or measured outcome (the ground truth).
The predicted value (Positive / Negative) is the outcome produced by the AI model on the basis of its algorithm and learning.
TRUE POSITIVE

An instance for which
both Predicted value of
the AI model and actual
value are positive.
TRUE NEGATIVE

An instance for which both
Predicted value of the AI
model and actual value are
Negative
FALSE POSITIVE

An instance for which
Predicted value of the AI
model is Positive but actual
value is Negative
FALSE NEGATIVE

An instance for which
Predicted value of the AI
model is Negative but actual
value is Positive
Classification Matrix Format

                            PREDICTED VALUES
                            POSITIVE              NEGATIVE
ACTUAL    POSITIVE (1)      No. of TRUE           No. of FALSE
VALUES                      POSITIVES (TP)        NEGATIVES (FN)
          NEGATIVE (0)      No. of FALSE          No. of TRUE
                            POSITIVES (FP)        NEGATIVES (TN)
Link for Classification Metrics

https://www.analyticsvidhya.com/blog/2021/07/metrics-to-evaluate-your-classification-model-to-take-the-right-decisions/
How can we evaluate AI model
using confusion matrices?

By calculating the following values:-

 ACCURACY RATE
 PRECISION RATE
 RECALL
 F1 SCORE
ACCURACY RATE


Percentage of predictions, out of all the observations, that are correct.

Accuracy = Number of correct predictions (TP + TN) / Total number of predictions made (TP + TN + FP + FN) x 100%
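For example, with hypothetical counts TP = 40, TN = 45, FP = 5, FN = 10:
Accuracy = (40 + 45) / (40 + 45 + 5 + 10) x 100% = 85 / 100 x 100% = 85%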
Precision rate

Rate at which the positive predictions turn out to be correct (True Positives out of all predicted positives).

Precision Rate = TP / (TP + FP)

In %, Precision Rate = TP / (TP + FP) x 100%
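With the same hypothetical counts (TP = 40, FP = 5):
Precision Rate = 40 / (40 + 5) = 40 / 45 ≈ 0.889, i.e. about 88.9%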


RECALL

Rate of correct positive predictions to
the overall number of positive instances
in the dataset.
Recall = Correctly predicted positives / Actual positive values in the dataset

Recall = TP / (TP + FN)

In %, Recall = TP / (TP + FN) x 100 %
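With the same hypothetical counts (TP = 40, FN = 10):
Recall = 40 / (40 + 10) = 40 / 50 = 0.80, i.e. 80%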


F1 score

A measure of the balance between Precision and Recall. It is computed as per the following formula:

F1 = 2 x (Precision x Recall) / (Precision + Recall)
F1 score (alternative form)

The same F1 score can also be computed directly from the confusion matrix counts, as per the following formula:

F1 = TP / (TP + ½ (FP + FN))
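With the same hypothetical counts (TP = 40, FP = 5, FN = 10), both forms of the formula agree:
F1 = 2 x (0.889 x 0.80) / (0.889 + 0.80) ≈ 0.842
F1 = 40 / (40 + ½ (5 + 10)) = 40 / 47.5 ≈ 0.842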
