Session 1 Evaluation Model
Model Evaluation
PRATHEESH KUMAR N
The most important task in building any machine learning model is evaluating its performance: how do we measure it, and when do we call it good?
Evaluation metrics are tied to machine learning tasks. There are
different metrics for the tasks of classification and regression, and some
metrics, like precision-recall, are useful for multiple tasks.
Classification and regression are examples of supervised learning,
which constitutes a majority of machine learning applications. By using
different metrics for performance evaluation, we can
improve our model's overall predictive power before we roll it out
for production on unseen data. Evaluating a machine learning model
with accuracy alone, instead of a range of evaluation metrics,
can lead to problems when the model is deployed on unseen data
and may end in poor predictions.
Classification Metrics in Machine Learning
Classification is about predicting class labels given input
data. In a binary classification problem, there are only two possible
output classes (i.e., a dichotomy). In a multiclass classification
problem, more than two possible classes can be present.
Classification Metrics
There are many ways of measuring classification
performance. Accuracy, confusion matrix, log-loss,
and AUC-ROC are some of the most popular
metrics. Precision-recall is also a widely used metric for
classification problems.
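As an illustration (not from the slides), here is a minimal Python sketch computing the metrics named above, assuming scikit-learn and made-up toy labels and probabilities:

# Toy values for illustration only.
from sklearn.metrics import accuracy_score, log_loss, roc_auc_score

y_true = [0, 0, 1, 1, 1]             # true class labels
y_prob = [0.1, 0.4, 0.35, 0.8, 0.9]  # model's predicted probability of class 1
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # threshold at 0.5

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Log-loss:", log_loss(y_true, y_prob))
print("AUC-ROC :", roc_auc_score(y_true, y_prob))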
Accuracy
Accuracy simply measures how often the classifier
predicts correctly. We can define accuracy as the
ratio of the number of correct predictions to the
total number of predictions.
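Computed directly from that definition, with made-up toy labels:

# Accuracy = correct predictions / total predictions.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(accuracy)  # 5 correct out of 6 -> 0.833...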
When a model gives an accuracy rate of 99%, you might
think that the model is performing very well, but this is not
always true and can be misleading in some situations. I am
going to explain this with the help of an example.
Consider a binary classification problem, where a model can
achieve only two results: it gives either a correct or an
incorrect prediction. Now imagine we have a
classification task to predict whether an image is of a dog or a cat, as
shown in the image. In a supervised learning algorithm, we
first fit/train a model on training data, then test the model
on testing data. Once we have the model's predictions on
the X test data, we compare them to the true y values (the
correct labels).
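A sketch of that fit/train-then-test workflow, assuming scikit-learn and synthetic features standing in for the image data (all names here are illustrative):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic features standing in for dog/cat image features.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)            # fit/train on training data
y_pred = model.predict(X_test)         # predict on the X test data
print(accuracy_score(y_test, y_pred))  # compare predictions to the true y values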
We feed the image of the dog into the trained
model. Suppose the model predicts that this is a dog;
we then compare the prediction to the correct
label, and it is correct. If the model predicts that this image is a cat,
we again compare it to the correct label, and
it would be incorrect.
We repeat this process for all images in the X test
data. Eventually, we'll have a count of correct
and incorrect matches. But in reality, it is very
rare that all incorrect or correct matches
hold equal value. Therefore one metric won't
tell the entire story.
Accuracy is useful when the target classes are well balanced, but it is not
a good choice for unbalanced classes. Imagine a scenario
where we had 99 images of dogs and only 1 image of a cat
in our training data. Then a model that always predicts
"dog" would get 99% accuracy. In reality, data is
often imbalanced: spam email, credit card fraud, and
medical diagnosis are typical examples. Hence, if we want a better model
evaluation and a full picture of the model's performance, other
metrics such as recall and precision should also be considered.
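The 99-dogs / 1-cat scenario above, made concrete with scikit-learn (toy labels, with 1 for dog and 0 for cat):

from sklearn.metrics import accuracy_score, recall_score

y_true = [1] * 99 + [0]   # 99 dogs (1) and 1 cat (0)
y_pred = [1] * 100        # a model that always predicts "dog"

print(accuracy_score(y_true, y_pred))             # 0.99, looks great
print(recall_score(y_true, y_pred, pos_label=0))  # recall for "cat" is 0.0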
Confusion Matrix
A confusion matrix is a performance
measurement for machine learning
classification problems where the output can
be two or more classes. It is a table with
combinations of predicted and actual values:
each row and column is indexed by a class
label (1 and 0 in the binary case).
Links for confusion matrix:
https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62
https://www.guru99.com/confusion-matrix-machine-learning-example.html
Let's try to understand TP, FP, FN, and TN
with the help of a pregnancy analogy.
True Positive: We predicted positive and it's true. In the
image, we predicted that a woman is pregnant and she
actually is.
True Negative: We predicted negative and it's true. In the
image, we predicted that a man is not pregnant and he
actually is not.
False Positive (Type 1 Error): We predicted positive and it's
false. In the image, we predicted that a man is pregnant but
he actually is not.
False Negative (Type 2 Error): We predicted negative and it's
false. In the image, we predicted that a woman is not
pregnant but she actually is.
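Counting TP, FN, FP, and TN with scikit-learn's confusion matrix (toy labels; 1 = pregnant, 0 = not pregnant, following the analogy):

from sklearn.metrics import confusion_matrix

y_true = [1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0, 1]

# With labels=[1, 0] the matrix is laid out as [[TP, FN], [FP, TN]].
tp, fn, fp, tn = confusion_matrix(y_true, y_pred, labels=[1, 0]).ravel()
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")  # TP=3 FN=1 FP=1 TN=3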
1. Precision
Precision explains how many of the cases predicted as positive
actually turned out to be positive: Precision = TP / (TP + FP).
Precision is useful in cases where a False Positive is a higher
concern than a False Negative. Precision matters in music or
video recommendation systems, e-commerce websites, etc.,
where wrong results could lead to customer churn, and this
could be harmful to the business.
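Continuing the toy labels from the confusion-matrix sketch above, precision can be computed as follows:

from sklearn.metrics import precision_score

y_true = [1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0, 1]

# Precision = TP / (TP + FP) = 3 / (3 + 1)
print(precision_score(y_true, y_pred))  # 0.75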
Types of classification tasks:
Binary Classification
Multi-Class Classification
Multi-Label Classification
Imbalanced Classification
What is an example of a classifier model?
For example, a classification model might
be trained on a dataset of images labeled as
either dogs or cats and then used to predict
the class of new, unseen images of dogs or
cats based on their features such as color,
texture, and shape.
Why use a classification
model?
It helps in categorizing data into different
classes and has a broad array of applications,
such as email spam detection, medical diagnostic
tests, fraud detection, image classification, and
speech recognition, among others.
How many types of classification
algorithms are there?
Classification algorithms are used to categorize
data into a class or category. Classification can be performed on
both structured and unstructured data, and it
can be of three types: binary classification,
multiclass classification, and multilabel classification.
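A small illustration of how the targets differ across the three types (made-up toy labels, not from the slides):

binary_y     = [0, 1, 1, 0]                   # two possible classes
multiclass_y = ["dog", "cat", "bird", "dog"]  # more than two classes, one per sample
multilabel_y = [[1, 0, 1],                    # several labels can be on at once
                [0, 1, 0],
                [1, 1, 0]]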
Regression metrics
Used for evaluating regression-based AI models.
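The slides do not name specific regression metrics; as an illustration, a few widely used ones (MAE, MSE, R²) computed with scikit-learn on toy values:

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [3.0, 5.0, 2.5, 7.0]  # made-up true values
y_pred = [2.5, 5.0, 3.0, 8.0]  # made-up predictions

print("MAE:", mean_absolute_error(y_true, y_pred))
print("MSE:", mean_squared_error(y_true, y_pred))
print("R^2:", r2_score(y_true, y_pred))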
Deep learning related metrics
Used for evaluating deep learning based AI models.
Why are classification metrics used in most AI evaluations?
                     Predicted POSITIVE (1)        Predicted NEGATIVE (0)
Actual POSITIVE (1)  No. of TRUE POSITIVES (TP)    No. of FALSE NEGATIVES (FN)
Actual NEGATIVE (0)  No. of FALSE POSITIVES (FP)   No. of TRUE NEGATIVES (TN)
Link for Classification Metrics
https://www.analyticsvidhya.com/blog/2021/07/metrics-to-evaluate-your-classification-model-to-take-the-right-decisions/
How can we evaluate an AI model using a confusion matrix?
By calculating the following values:
ACCURACY RATE
PRECISION RATE
RECALL
F1 SCORE
ACCURACY RATE
Percentage of predictions that are correct out of all
the observations:
Accuracy = (TP + TN) / (TP + TN + FP + FN)

F1 SCORE
A measure of balance between Precision and
Recall, computed as their harmonic mean:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
Equivalently, in terms of confusion-matrix counts:
F1 = TP / (TP + ½ (FP + FN))
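Checking these formulas with concrete confusion-matrix counts (TP, TN, FP, FN are made-up numbers for illustration):

tp, tn, fp, fn = 40, 45, 5, 10

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)
f1_alt    = tp / (tp + 0.5 * (fp + fn))   # the equivalent TP-based form

print(accuracy, precision, recall, f1, f1_alt)
# accuracy = 0.85, precision ≈ 0.889, recall = 0.8, and both F1 forms ≈ 0.842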