
Evaluation in AI

Model Evaluation is a crucial part of the AI project cycle, following the stages of Problem Scoping, Data Acquisition, Data Exploration, and Modelling. It helps determine the reliability of an AI model by assessing its performance on a test dataset. This evaluation process is essential for understanding how well the chosen model will generalize to new, unseen data and make accurate predictions in real-world scenarios.

by Silvia Thomas Padiyath


LEARNING OBJECTIVES
1. Understand the role of evaluation in the development and implementation
of AI systems.
2. Learn various Model Evaluation terminologies.
3. Learn to make a confusion matrix for a given scenario.
4. Learn about the different types of evaluation techniques in AI, such as
Accuracy, Precision, Recall and F1 Score, and their significance.
Understanding Evaluation
Evaluation involves feeding a test dataset into the AI model
and comparing its outputs with the actual answers. This
process helps determine the model's accuracy and reliability.
Different evaluation techniques exist, depending on the type
and purpose of the model. It's important to note that using the
same data used to build the model for evaluation can lead to
overfitting, where the model simply memorizes the training
set and doesn't generalize well to new data.
Underfitting & Overfitting - Explained
https://www.youtube.com/watch?v=o3DztvnfAJg
Key Terminologies in
Model Evaluation
When evaluating AI models, several key terminologies come
into play. These terms help us understand the model's
performance and its ability to make accurate predictions. Let's
explore these terminologies using a scenario involving a forest
fire prediction model.
The Forest Fire Scenario
Imagine an AI model deployed in a forest prone to forest fires.
The model's objective is to predict whether a fire has broken
out. To evaluate its efficiency, we need to compare its
predictions with the actual reality in the forest. This comparison
involves two key conditions: Prediction and Reality. The
prediction is the model's output, while reality represents the
actual situation in the forest.
Understanding True
Positives and True
Negatives
Let's consider the following scenarios: If a forest fire has broken
out, and the model correctly predicts "Yes," indicating a fire,
this is termed a True Positive. Conversely, if there is no fire in
the forest, and the model correctly predicts "No," this is termed
a True Negative.
False Positives and False
Negatives
Now, let's consider the cases where the model's prediction
doesn't match reality. If there is no fire in the forest, but the
model incorrectly predicts "Yes," indicating a fire, this is termed
a False Positive. On the other hand, if a fire has broken out,
but the model incorrectly predicts "No," this is termed a False
Negative.
The Confusion Matrix
The confusion matrix is a table that summarizes the results of
comparing the model's predictions with the actual reality. It helps us
understand the model's performance by visualizing the different
combinations of True Positives, True Negatives, False Positives, and
False Negatives.
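To make the four terms concrete, here is a minimal Python sketch that tallies them by comparing each prediction with reality. The example lists and variable names are purely illustrative and not taken from the slides.

# Tally TP, TN, FP and FN by comparing each prediction with reality.
# "Yes" means a fire is predicted / present, "No" means no fire.
predictions = ["Yes", "No", "No", "Yes", "No"]   # model outputs (made-up example)
reality     = ["Yes", "No", "Yes", "No", "No"]   # actual situations (made-up example)

counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
for pred, real in zip(predictions, reality):
    if pred == "Yes" and real == "Yes":
        counts["TP"] += 1   # True Positive: fire predicted, fire present
    elif pred == "No" and real == "No":
        counts["TN"] += 1   # True Negative: no fire predicted, no fire present
    elif pred == "Yes" and real == "No":
        counts["FP"] += 1   # False Positive: false alarm
    else:
        counts["FN"] += 1   # False Negative: fire missed

print(counts)   # {'TP': 1, 'TN': 2, 'FP': 1, 'FN': 1}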
CONFUSION MATRIX ACTIVITY (PART 1)
Evaluating Model Performance
Now that we understand the different combinations of predictions and reality, let's explore how we can use
these conditions to evaluate the model's performance. One common evaluation metric is accuracy, which is
defined as the percentage of correct predictions out of all observations. Accuracy is calculated as (TP + TN) /
(TP + TN + FP + FN) * 100%.
ACCURACY contd.
Let us go back to the Forest Fire example. Assume that the model always
predicts that there is no fire, but in reality there is a 2% chance of a forest
fire breaking out. Out of 100 cases, the model will be right for the 98 cases
with no fire, but for the 2 cases in which a forest fire actually broke out, the
model still predicted no fire.
Here,
True Positives = 0
True Negatives = 98
False Positives = 0
False Negatives = 2
Total cases = 100
Therefore, accuracy becomes: (98 + 0) / 100 = 98%
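As a rough illustration, the same accuracy calculation can be written in Python; the counts below simply restate the example above.

TP, TN, FP, FN = 0, 98, 0, 2                      # the "always predicts no fire" example
accuracy = (TP + TN) / (TP + TN + FP + FN) * 100  # (98 + 0) / 100 * 100%
print(accuracy)                                   # 98.0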
Precision and Recall
While accuracy provides a general measure of performance, it doesn't
always tell the whole story. For example, in the forest fire scenario, a
model that always predicts "No" might achieve high accuracy if fires are
rare. However, this model would fail to detect actual fires, leading to
potentially disastrous consequences. To address this, we introduce two
additional metrics: precision and recall.

1 Precision
Precision measures the percentage of true positive cases out of
all cases where the prediction is positive. It is calculated as TP /
(TP + FP) * 100%.

2 Recall
Recall measures the percentage of true positive cases out of all
actual positive cases. It is calculated as TP / (TP + FN) * 100%.
Precision
Precision is defined as the percentage of true positive cases out of all the cases
where the prediction is positive. That is, it takes into account the True Positives and
False Positives.

Going back to the Forest Fire example, in this case, assume that the model always
predicts that there is a forest fire irrespective of the reality. In this case, all the
Positive conditions would be taken into account that is, True Positive (Prediction =
Yes and Reality = Yes) and False Positive (Prediction = Yes and Reality = No). In this
case, the firefighters will check for the fire all the time to see if the alarm was
True or False.

You might recall the story of the boy who falsely cries out that there are wolves
every time and so when they actually arrive, no one comes to his rescue.
Let us consider that a model has 100% precision. This means that whenever the
machine says there is a fire, there is a fire (True Positive). In the same model, there
can be a rare exceptional case where there was an actual fire but the system could
not detect it. This is a False Negative condition. The precision value would not be
affected by it, because precision does not take FN into account. Is precision then a
good parameter for model performance?

Similarly, here if the Precision is low (which means there are more false alarms than
actual fires), then the firefighters would get complacent and might not go and check
every time, considering it could be a false alarm.

This makes Precision an important evaluation criterion. If Precision is high, the True
Positive cases are more, giving fewer false alarms.
Recall
Another parameter for evaluating the model's performance is Recall. It can be defined as the fraction of positive cases that are correctly
identified. It takes into account the cases where, in reality, there was a fire and the machine either detected it correctly or missed it. That is, it
considers True Positives (there was a forest fire in reality and the model predicted a forest fire) and False Negatives (there was a forest fire and
the model did not predict it).

Now, as we notice, the numerator in both Precision and Recall is the same: True Positives. But in the denominator, Precision counts the False
Positives while Recall takes the False Negatives into consideration.
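A brief Python sketch of both formulas, using hypothetical counts purely for illustration:

def precision(tp, fp):
    # Precision = TP / (TP + FP): of all "Yes" predictions, how many were real fires.
    return tp / (tp + fp) * 100

def recall(tp, fn):
    # Recall = TP / (TP + FN): of all real fires, how many the model caught.
    return tp / (tp + fn) * 100

# Hypothetical counts: 8 fires caught, 4 false alarms, 2 fires missed.
print(precision(8, 4))   # about 66.7%
print(recall(8, 2))      # 80.0%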

Which Metric is Important?


Choosing between Precision and Recall depends on the condition in which the model has been deployed. In a case like Forest Fire, a False
Negative can cost us a lot and is risky too. Imagine no alert being given even when there is a Forest Fire. The whole forest might burn down.
Another case where a False Negative can be dangerous is Viral Outbreak. Imagine a deadly virus has started spreading and the model which is
supposed to predict a viral outbreak does not detect it. The virus might spread widely and infect a lot of people.
On the other hand, there can be cases in which the False Positive condition costs us more than False Negatives. One such case is Mining. Imagine
a model telling you that there exists treasure at a point and you keep on digging there but it turns out that it is a false alarm. Here, False Positive
case (predicting there is treasure but there is no treasure) can be very costly.
Similarly, consider a model that predicts whether a mail is spam or not. If the model always predicts that the mail is spam, people would not
look at it and eventually might lose important information. Here too, the False Positive condition (predicting the mail as spam while the mail is
not spam) would have a high cost.
To conclude, if we want to know whether our model's performance is good, we
need both measures: Recall and Precision. In some cases, you might have high
Precision but low Recall, or low Precision but high Recall. Since both measures are
important, we need a parameter which takes both Precision and Recall into
account.

F1 Score
Both precision and recall are important metrics, but they often
provide conflicting information. A model might have high
precision but low recall, or vice versa. To address this, we
introduce the F1 score, which combines precision and recall
into a single metric.
The F1 score is calculated as 2 * (Precision * Recall) / (Precision + Recall).

Take a look at the formula and think: when can we get a perfect F1 score?
An ideal situation would be when we have a value of 1 (that is, 100%) for both Precision and Recall. In that case,
the F1 score would also be an ideal 1 (100%), known as the perfect value for the F1 Score. As the values of both
Precision and Recall range from 0 to 1, the F1 score also ranges from 0 to 1.
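A small sketch of the F1 calculation; the second pair of numbers is made up to show how a low Recall pulls the score down.

def f1_score(precision, recall):
    # Harmonic mean of Precision and Recall; both are given as fractions from 0 to 1.
    return 2 * precision * recall / (precision + recall)

print(f1_score(1.0, 1.0))   # 1.0 -> the perfect F1 Score
print(f1_score(0.8, 0.5))   # about 0.62 -> pulled toward the lower value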
CONFUSION MATRIX ACTIVITY (PART 2)
HOMEWORK QUESTIONS TO TRY:
Find out Accuracy, Precision, Recall and F1 Score for the given problems.

Scenario 1:
In schools, a lot of times it happens that there is no water to drink. At a few places, cases of water shortage in schools are very
common and prominent. Hence, an AI model is designed to predict if there is going to be a water shortage in the school in the near
future or not. The confusion matrix for the same is:

Scenario 2:
Nowadays, the problem of floods has worsened in some parts of the country. Not only does it damage the whole place but it
also forces people to move out of their homes and relocate. To address this issue, an AI model has been created which can
predict if there is a chance of floods or not. The confusion matrix for the same is:
Challenge
Find out Accuracy, Precision, Recall and F1 Score for the given problems.

Scenario 3:
A lot of times people face the problem of sudden downpour. People wash clothes and put them out to dry but due to
unexpected rain, their work gets wasted. Thus, an AI model has been created which predicts if there will be rain or not. The
confusion matrix for the same is:

Scenario 4:
Traffic Jams have become a common part of our lives nowadays. Living in an urban area means you have to face
traffic each and every time you get out on the road. Mostly, school students opt for buses to go to school. Many times the
bus gets late due to such jams and students are not able to reach their school on time. Thus, an AI model is created to
predict if there would be a traffic jam on their way to school or not. The confusion matrix for the same is:
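When working through the scenarios above, a small helper like the one below may be handy; pass in the four counts read from each confusion matrix. The function name and example counts are only illustrative.

def evaluate(tp, tn, fp, fn):
    # Returns Accuracy, Precision and Recall as percentages, and F1 Score from 0 to 1.
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
    precision = tp / (tp + fp) * 100
    recall = tp / (tp + fn) * 100
    f1 = 2 * (precision / 100) * (recall / 100) / ((precision / 100) + (recall / 100))
    return accuracy, precision, recall, f1

# Made-up counts; replace them with the values from each scenario's matrix.
print(evaluate(tp=40, tn=30, fp=20, fn=10))   # (70.0, 66.66..., 80.0, 0.7272...)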
