
Evaluation in AI

Model Evaluation is a crucial part of the AI project cycle, following the stages of Problem Scoping, Data Acquisition, Data Exploration, and Modelling. It helps determine the reliability of an AI model by assessing its performance on a test dataset. This evaluation process is essential for understanding how well the chosen model will generalize to new, unseen data and make accurate predictions in real-world scenarios.

by Silvia Thomas Padiyath


LEARNING OBJECTIVES
1. Understand the role of evaluation in the development and implementation
of AI systems.
2. Learn various Model Evaluation terminologies.
3. Learn to make a confusion matrix for a given scenario.
4. Learn about the different types of evaluation techniques in AI, such as
Accuracy, Precision, Recall and F1 Score, and their significance.
Understanding Evaluation
Evaluation involves feeding a test dataset into the AI model
and comparing its outputs with the actual answers. This
process helps determine the model's accuracy and reliability.
Different evaluation techniques exist, depending on the type
and purpose of the model. It's important to note that using the
same data used to build the model for evaluation can lead to
overfitting, where the model simply memorizes the training
set and doesn't generalize well to new data.
Underfitting & Overfitting - Explained
https://www.youtube.com/watch?v=o3DztvnfAJg
Key Terminologies in
Model Evaluation
When evaluating AI models, several key terminologies come
into play. These terms help us understand the model's
performance and its ability to make accurate predictions. Let's
explore these terminologies using a scenario involving a forest
fire prediction model.
The Forest Fire Scenario
Imagine an AI model deployed in a forest prone to forest fires.
The model's objective is to predict whether a fire has broken
out. To evaluate its efficiency, we need to compare its
predictions with the actual reality in the forest. This comparison
involves two key conditions: Prediction and Reality. The
prediction is the model's output, while reality represents the
actual situation in the forest.
Understanding True
Positives and True
Negatives
Let's consider the following scenarios: If a forest fire has broken
out, and the model correctly predicts "Yes," indicating a fire,
this is termed a True Positive. Conversely, if there is no fire in
the forest, and the model correctly predicts "No," this is termed
a True Negative.
False Positives and False
Negatives
Now, let's consider the cases where the model's prediction
doesn't match reality. If there is no fire in the forest, but the
model incorrectly predicts "Yes," indicating a fire, this is termed
a False Positive. On the other hand, if a fire has broken out,
but the model incorrectly predicts "No," this is termed a False
Negative.
The Confusion Matrix
The confusion matrix is a table that summarizes the results of
comparing the model's predictions with the actual reality. It helps us
understand the model's performance by visualizing the different
combinations of True Positives, True Negatives, False Positives, and
False Negatives.
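To make the four terms concrete, here is a minimal Python sketch that tallies them by comparing each prediction with reality. The example lists and variable names are purely illustrative and not taken from the slides.

# Tally TP, TN, FP and FN by comparing each prediction with reality.
# "Yes" means a fire is predicted / present, "No" means no fire.
predictions = ["Yes", "No", "No", "Yes", "No"]   # model outputs (made-up example)
reality     = ["Yes", "No", "Yes", "No", "No"]   # actual situations (made-up example)

counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
for pred, real in zip(predictions, reality):
    if pred == "Yes" and real == "Yes":
        counts["TP"] += 1   # True Positive: fire predicted, fire present
    elif pred == "No" and real == "No":
        counts["TN"] += 1   # True Negative: no fire predicted, no fire present
    elif pred == "Yes" and real == "No":
        counts["FP"] += 1   # False Positive: false alarm
    else:
        counts["FN"] += 1   # False Negative: fire missed

print(counts)   # {'TP': 1, 'TN': 2, 'FP': 1, 'FN': 1}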
CONFUSION MATRIX ACTIVITY (PART 1)
Evaluating Model Performance
Now that we understand the different combinations of predictions and reality, let's explore how we can use
these conditions to evaluate the model's performance. One common evaluation metric is accuracy, which is
defined as the percentage of correct predictions out of all observations. Accuracy is calculated as (TP + TN) /
(TP + TN + FP + FN) * 100%.
ACCURACY contd.
Let us go back to the Forest Fire example. Assume that the model always
predicts that there is no fire, but in reality there is a 2% chance of a forest
fire breaking out. Out of 100 cases, the model will be right for the 98 cases
with no fire, but for the 2 cases in which a forest fire actually broke out, the
model still predicted no fire.
Here,
True Positives = 0
True Negatives = 98
False Positives = 0
False Negatives = 2
Total cases = 100
Therefore, accuracy becomes: (98 + 0) / 100 = 98%
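As a rough illustration, the same accuracy calculation can be written in Python; the counts below simply restate the example above.

TP, TN, FP, FN = 0, 98, 0, 2                      # the "always predicts no fire" example
accuracy = (TP + TN) / (TP + TN + FP + FN) * 100  # (98 + 0) / 100 * 100%
print(accuracy)                                   # 98.0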
Precision and Recall
While accuracy provides a general measure of performance, it doesn't
always tell the whole story. For example, in the forest fire scenario, a
model that always predicts "No" might achieve high accuracy if fires are
rare. However, this model would fail to detect actual fires, leading to
potentially disastrous consequences. To address this, we introduce two
additional metrics: precision and recall.

1 Precision
Precision measures the percentage of true positive cases out of
all cases where the prediction is positive. It is calculated as TP /
(TP + FP) * 100%.

2 Recall
Recall measures the percentage of true positive cases out of all
actual positive cases. It is calculated as TP / (TP + FN) * 100%.
Precision
Precision is defined as the percentage of true positive cases out of all the cases
where the prediction is positive. That is, it takes into account the True Positives and
False Positives.

Going back to the Forest Fire example, in this case, assume that the model always
predicts that there is a forest fire irrespective of the reality. In this case, all the
Positive conditions would be taken into account that is, True Positive (Prediction =
Yes and Reality = Yes) and False Positive (Prediction = Yes and Reality = No). In this
case, the firefighters will check for the fire all the time to see if the alarm was
True or False.

You might recall the story of the boy who falsely cries out that there are wolves
every time and so when they actually arrive, no one comes to his rescue.
Let us consider that a model has 100% precision. This means that whenever the
machine says there is a fire, there is a fire (True Positive). In the same model, there
can be a rare exceptional case where there was an actual fire but the system could
not detect it. This is a False Negative condition. The precision value would not be
affected by it, because precision does not take FN into account. Is precision then a
good parameter for model performance?

Similarly, here if the Precision is low (which means there are more false alarms than
actual fires), then the firefighters would get complacent and might not go and check
every time, considering it could be a false alarm.

This makes Precision an important evaluation criterion. If Precision is high, the True
Positive cases are more, giving fewer false alarms.
Recall
Another parameter for evaluating the model's performance is Recall. It can be defined as the fraction of positive cases that are correctly
identified. It takes into account the cases where, in reality, there was a fire and the machine either detected it correctly or missed it. That is, it
considers True Positives (there was a forest fire in reality and the model predicted a forest fire) and False Negatives (there was a forest fire and
the model did not predict it).

Now, as we notice, the numerator in both Precision and Recall is the same: True Positives. But in the denominator, Precision counts the False
Positives while Recall takes the False Negatives into consideration.
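A brief Python sketch of both formulas, using hypothetical counts purely for illustration:

def precision(tp, fp):
    # Precision = TP / (TP + FP): of all "Yes" predictions, how many were real fires.
    return tp / (tp + fp) * 100

def recall(tp, fn):
    # Recall = TP / (TP + FN): of all real fires, how many the model caught.
    return tp / (tp + fn) * 100

# Hypothetical counts: 8 fires caught, 4 false alarms, 2 fires missed.
print(precision(8, 4))   # about 66.7%
print(recall(8, 2))      # 80.0%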

Which Metric is Important?


Choosing between Precision and Recall depends on the condition in which the model has been deployed. In a case like Forest Fire, a False
Negative can cost us a lot and is risky too. Imagine no alert being given even when there is a Forest Fire. The whole forest might burn down.
Another case where a False Negative can be dangerous is Viral Outbreak. Imagine a deadly virus has started spreading and the model which is
supposed to predict a viral outbreak does not detect it. The virus might spread widely and infect a lot of people.
On the other hand, there can be cases in which the False Positive condition costs us more than False Negatives. One such case is Mining. Imagine
a model telling you that there exists treasure at a point and you keep on digging there but it turns out that it is a false alarm. Here, False Positive
case (predicting there is treasure but there is no treasure) can be very costly.
Similarly, consider a model that predicts whether a mail is spam or not. If the model always predicts that the mail is spam, people would not
look at it and eventually might lose important information. Here too, the False Positive condition (predicting the mail as spam while the mail is
not spam) would have a high cost.
To conclude, if we want to know whether our model's performance is good, we
need both measures: Recall and Precision. In some cases, you might have high
Precision but low Recall, or low Precision but high Recall. Since both measures are
important, we need a parameter which takes both Precision and Recall into
account.

F1 Score
Both precision and recall are important metrics, but they often
provide conflicting information. A model might have high
precision but low recall, or vice versa. To address this, we
introduce the F1 score, which combines precision and recall
into a single metric.
The F1 score is calculated as 2 * (Precision * Recall) / (Precision + Recall).

Take a look at the formula and think: when can we get a perfect F1 score?
An ideal situation would be when we have a value of 1 (that is, 100%) for both Precision and Recall. In that case,
the F1 score would also be an ideal 1 (100%), known as the perfect value for the F1 Score. As the values of both
Precision and Recall range from 0 to 1, the F1 score also ranges from 0 to 1.
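A small sketch of the F1 calculation; the second pair of numbers is made up to show how a low Recall pulls the score down.

def f1_score(precision, recall):
    # Harmonic mean of Precision and Recall; both are given as fractions from 0 to 1.
    return 2 * precision * recall / (precision + recall)

print(f1_score(1.0, 1.0))   # 1.0 -> the perfect F1 Score
print(f1_score(0.8, 0.5))   # about 0.62 -> pulled toward the lower value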
CONFUSION MATRIX ACTIVITY (PART 2)
HOMEWORK QUESTIONS TO TRY:
Find out Accuracy, Precision, Recall and F1 Score for the given problems.

Scenario 1:
In schools, a lot of times it happens that there is no water to drink. At a few places, cases of water shortage in schools are very
common and prominent. Hence, an AI model is designed to predict if there is going to be a water shortage in the school in the near
future or not. The confusion matrix for the same is:

Scenario 2:
Nowadays, the problem of floods has worsened in some parts of the country. Not only does it damage the whole place but it
also forces people to move out of their homes and relocate. To address this issue, an AI model has been created which can
predict if there is a chance of floods or not. The confusion matrix for the same is:
Challenge
Find out Accuracy, Precision, Recall and F1 Score for the given problems.

Scenario 3:
A lot of times people face the problem of sudden downpour. People wash clothes and put them out to dry but due to
unexpected rain, their work gets wasted. Thus, an AI model has been created which predicts if there will be rain or not. The
confusion matrix for the same is:

Scenario 4:
Traffic Jams have become a common part of our lives nowadays. Living in an urban area means you have to face
traffic each and every time you get out on the road. Mostly, school students opt for buses to go to school. Many times the
bus gets late due to such jams and students are not able to reach their school on time. Thus, an AI model is created to
predict if there would be a traffic jam on their way to school or not. The confusion matrix for the same is:
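When working through the scenarios above, a small helper like the one below may be handy; pass in the four counts read from each confusion matrix. The function name and example counts are only illustrative.

def evaluate(tp, tn, fp, fn):
    # Returns Accuracy, Precision and Recall as percentages, and F1 Score from 0 to 1.
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
    precision = tp / (tp + fp) * 100
    recall = tp / (tp + fn) * 100
    f1 = 2 * (precision / 100) * (recall / 100) / ((precision / 100) + (recall / 100))
    return accuracy, precision, recall, f1

# Made-up counts; replace them with the values from each scenario's matrix.
print(evaluate(tp=40, tn=30, fp=20, fn=10))   # (70.0, 66.66..., 80.0, 0.7272...)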
