
Unit 7

Evaluation
Evaluation
• Evaluation is the method used to understand the reliability of an AI model on the basis of the output it provides when test data is given.
• It is a measurement of the model's reliability and performance as per the requirement.
• This is done by comparing the outputs generated by the model (Predictions) with the actual outputs (Reality).
• For evaluation, the testing data is very crucial.
Characteristics of Testing Data
• It is a completely different, unique and new set of data compared to the training data.
• The data is prepared carefully by trained professionals after exploring big data.
• It is recommended that the testing data be kept separate from the entire training data set so that the model can be evaluated without any bias.
Scenario
• A scenario is the problem area for which the model has been deployed.
• It is the source of the real data which is fed into the model for processing.

Real time scenario
• Non-stop, critical, emergent life threats or terrible situations.
• Ex: floods, earthquakes, online transactions, monitoring patients in critical care, monitoring air, road, sea and rail traffic.

Regular interval scenario
• Less critical problem area.
• No emergent threat.
• Ex: pollution monitoring, monitoring of diet & fitness for a sports person, study on cancer patients, monitoring student performance & their study habits, etc., done daily/weekly/monthly etc.
The two aspects (prediction and reality) are compared, giving one of the following two possibilities:
1. The prediction matches the reality.
2. The prediction does not match the reality.
NOTE: The predictions are labelled as 'Yes' or 'No' / 'True' or 'False' / 'Positive' or 'Negative'.
Case Study: A system that identifies if it is
raining or not.
Prediction: YES / TRUE / POSITIVE    Reality: YES / TRUE / POSITIVE    →  TRUE POSITIVE
Prediction: NO / FALSE / NEGATIVE    Reality: NO / FALSE / NEGATIVE    →  TRUE NEGATIVE
Prediction: YES / TRUE / POSITIVE    Reality: NO / FALSE / NEGATIVE    →  FALSE POSITIVE
Prediction: NO / FALSE / NEGATIVE    Reality: YES / TRUE / POSITIVE    →  FALSE NEGATIVE
Confusion Matrix / Error Matrix
• The outcome of the comparison between the prediction & reality can be recorded in a tabular form called the Confusion Matrix or Error Matrix.
• It helps in visualizing the performance of the algorithm & the model.
• It is mostly used with supervised learning models.
• It is not an evaluation metric by itself but a record which can help in evaluation.

TP • TRUE POSITIVE: Prediction and Reality match (TRUE); the prediction is True (POSITIVE).
TN • TRUE NEGATIVE: Prediction and Reality match (TRUE); the prediction is False (NEGATIVE).
FP • FALSE POSITIVE: Prediction and Reality DO NOT match (FALSE); the prediction is True (POSITIVE).
FN • FALSE NEGATIVE: Prediction and Reality DO NOT match (FALSE); the prediction is False (NEGATIVE).
The Confusion Matrix

                      REALITY: YES            REALITY: NO
PREDICTION: YES       True Positive (TP)      False Positive (FP)
PREDICTION: NO        False Negative (FN)     True Negative (TN)
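The same comparison can be tallied programmatically. Below is a minimal Python sketch (the label lists are made-up values, in the spirit of the rain case study above) that counts the four cells of the confusion matrix:

```python
# Tally the four confusion-matrix cells from paired prediction/reality labels.
# The label lists below are made-up values for illustration only.
predictions = ["YES", "NO", "YES", "YES", "NO", "NO", "YES", "NO"]
reality     = ["YES", "NO", "NO",  "YES", "YES", "NO", "YES", "NO"]

tp = tn = fp = fn = 0
for pred, real in zip(predictions, reality):
    if pred == "YES" and real == "YES":
        tp += 1   # True Positive: prediction matches reality, prediction is YES
    elif pred == "NO" and real == "NO":
        tn += 1   # True Negative: prediction matches reality, prediction is NO
    elif pred == "YES" and real == "NO":
        fp += 1   # False Positive: prediction is YES but reality is NO
    else:
        fn += 1   # False Negative: prediction is NO but reality is YES

print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)   # TP: 3 TN: 3 FP: 1 FN: 1
```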
Evaluation Methods
• Accuracy
• Precision
• Recall
• F1 Score
Accuracy
• Accuracy counts the total number of correct predictions made by a model.
• It is the percentage of correct predictions out of all the observations, i.e. accuracy determines how many of the model's predictions were correct.
• Correct predictions are those where the prediction matches reality, i.e. the 'True Positive' and 'True Negative' cases.
• Formula:
Accuracy = (Correct Predictions / Total Cases) X 100 %
Accuracy = ((TP + TN) / (TP + TN + FP + FN)) X 100 %
Precision:
• Precision is defined as the percentage of true positive cases out of all the cases where the prediction is positive (true).
• That is, it takes into account the True Positives and the False Positives.
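In terms of the confusion matrix, the standard Precision formula, written in the same style as the Accuracy formula above, is:
Precision = (TP / (TP + FP)) X 100 %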
Recall
• It can be described as the percentage of actually positive cases that the model correctly detected as positive.
• The scenarios where a fire actually existed in reality, whether correctly or incorrectly recognised by the machine, are the ones considered here.
• That is, it takes into account both the False Negatives (there was a forest fire but the model didn't predict it) and the True Positives (there was a forest fire in reality and the model predicted a forest fire).
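Written in the same style, the standard Recall formula is:
Recall = (TP / (TP + FN)) X 100 %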
F1 Score:
• F1 Score is a weighted average (harmonic mean) of precision and recall.
• Since precision takes false positives into account and recall takes false negatives into account, the F1 Score considers both of them.
• The F1 Score is usually more useful than accuracy, especially if you have an uneven class distribution.
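The standard F1 Score formula, in terms of Precision and Recall, is:
F1 Score = 2 X (Precision X Recall) / (Precision + Recall)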
Which Metric is Important?

• Depending on the situation in which the model has been deployed, we may need to choose between Precision and Recall.
• A False Negative can cost us a lot of money and put us in danger in a situation
like a forest fire.
• Viral Outbreak is another situation in which a False Negative might be harmful.
• Consider a scenario in which a fatal virus has begun to spread but is not being
detected by the model used to forecast viral outbreaks. The virus may infect
numerous people and spread widely.

• To conclude the argument, we must say that if we want to know if our model's performance is good, we need these two measures: Recall and Precision.
F1 Score
• Since both the measures (Precision & Recall) are important, there is a need
for a parameter that takes both Precision and Recall into account which is
called the F1 Score.
• An ideal situation would be when we have a value of 100% for both Precision and Recall. In that case, the F1 Score would also be an ideal 100%.
• In conclusion, we can say that a model has good performance if the F1 Score for that model is high.
Calculate Accuracy, Precision, Recall and F1 Score for the following Confusion
Matrix on Heart Attack Risk. Also suggest which metric would not be a good
evaluation parameter here and why?
