
EVALUATION - Class 10

Artificial Intelligence (417)

1. What is Evaluation?
Ans : Evaluation is the process of understanding the reliability of an AI model by feeding a test dataset into the model and comparing the model's outputs with the actual answers. Its purpose is to make judgments about a program, to improve its effectiveness, and/or to inform programming decisions.

2. Why is Evaluation important? Explain.

Ans : Evaluation is a process that critically examines a program by collecting and analyzing information about the program's activities, characteristics and outcomes. The advantages of Evaluation are as follows :
i. Evaluation ensures that the model is operating correctly and optimally.
ii. Evaluation is an initiative to understand how well the model achieves its goals.
iii. Evaluation helps to determine what works well and what could be improved in a program.

3. What is meant by Overfitting of Data?

Ans : Overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably".

OR

A model that is tested on the same training dataset it learned from will always produce correct outputs. This is known as overfitting.

4. What are Prediction & Reality in relation to Evaluation?

Ans : Prediction – It is the output given by the AI model using a Machine Learning algorithm.
Reality – It is the real scenario of the situation for which the prediction has been made.

5. Differentiate between Prediction and Reality.

Ans :

a) Prediction is the output given by the machine, whereas Reality is the real scenario of the situation for which the prediction has been made.

b) A prediction may or may not match reality; the two cannot be used interchangeably.

c) Comparing the prediction with the reality is what allows the model to be evaluated.

6. Terminologies of Model Evaluation

The Scenario

Let's imagine that we have an AI-based prediction model which has been deployed to identify a football (soccer ball).

The objective of the model is to predict whether the given/shown figure is a football. To understand the efficiency of this model, we need to check whether the predictions it makes are correct or not. So we need to compare the Prediction with the Reality.

Case 1 :

a) Prediction = Yes
b) Reality = Yes

 The predicted value matches the actual value.

 Here, the Prediction is positive and matches Reality. Hence, this condition is termed as True Positive.

Case 2 :

a) Prediction = No
b) Reality = No

 The predicted value matches the actual value.

 Here, the Prediction is negative and matches Reality. Hence, this condition is termed as True Negative.

Case 3 :

a) Prediction = Yes
b) Reality = No

 The predicted value does not match the actual value.

 Here, the Prediction is positive and does not match Reality. Hence, this
condition is termed as False Positive.

 This is also known as Type 1 Error.

Case 4 :

a) Prediction = No
b) Reality = Yes

 The predicted value does not match the actual value.

 Here, the Prediction is negative and does not match Reality. Hence, this condition is termed as False Negative.

 This is also known as Type 2 Error.


7. What is Confusion Matrix?

Ans : A Confusion Matrix is a tabular structure which helps in measuring the performance of an AI model using the test data. The result of the comparison between the prediction and the reality is recorded in the confusion matrix. It is a record that helps in evaluation.
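To make the four cases above concrete, here is a minimal Python sketch (not part of the original notes; the prediction and reality lists are made up) that counts the confusion-matrix cells for the football example:

```python
# A minimal sketch with made-up data: count the four confusion-matrix cells
# (TP, TN, FP, FN) by comparing each prediction with the corresponding reality.

predictions = ["Yes", "No", "Yes", "No", "Yes"]   # model outputs (assumed)
reality     = ["Yes", "No", "No", "Yes", "Yes"]   # actual answers (assumed)

tp = tn = fp = fn = 0
for pred, real in zip(predictions, reality):
    if pred == "Yes" and real == "Yes":
        tp += 1   # True Positive  (Case 1)
    elif pred == "No" and real == "No":
        tn += 1   # True Negative  (Case 2)
    elif pred == "Yes" and real == "No":
        fp += 1   # False Positive (Case 3, Type 1 Error)
    else:
        fn += 1   # False Negative (Case 4, Type 2 Error)

print("TP =", tp, "TN =", tn, "FP =", fp, "FN =", fn)
```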

8. Parameters to Evaluate a Model

Ans : The parameters used to evaluate an AI model are Accuracy, Precision, Recall and F1 Score, which are explained below.

9. What is Accuracy? Mention its formula.

Ans : Accuracy is defined as the percentage of correct predictions out of all the observations. A prediction is said to be correct if it matches reality. Here we have two conditions in which the Prediction matches the Reality, i.e., True Positive and True Negative. Therefore, the formula for Accuracy is –

Accuracy = ((TP + TN) / (TP + TN + FP + FN)) * 100%

Where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives.

10. What is Precision? Mention its formula.

Ans : Precision is defined as the percentage of true positive cases versus all the cases where the prediction is positive. That is, it takes into account the True Positives and the False Positives.

Precision = (TP / (TP + FP)) * 100%

11. What is Recall? Mention its formula.

Ans : Recall is defined as the fraction of positive cases that are correctly identified. That is, it takes into account the True Positives and the False Negatives.

Recall = TP / (TP + FN)

12. How do you suggest which evaluation metric is more important for any case ?

Ans : The F1 score is the most useful single metric in most cases, because it maintains a balance between the precision and the recall of the classifier. If the precision is low, the F1 score is low, and if the recall is low, the F1 score is again low. The F1 score is a number between 0 and 1 and is the harmonic mean of precision and recall:

F1 Score = 2 * (Precision * Recall) / (Precision + Recall)

When we have a value of 1 (that is, 100%) for both Precision and Recall, the F1 score is also an ideal 1 (100%), which is known as the perfect value for the F1 score. As the values of both Precision and Recall range from 0 to 1, the F1 score also ranges from 0 to 1.

A model is said to have good performance if its F1 score is high. That said, the choice of metric still depends on the case at hand: where False Negatives are costlier, Recall matters more, and where False Positives are costlier, Precision matters more (see Questions 15 and 16).
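As a sketch (not from the notes), the four metrics from Questions 9–12 can be written as small Python helper functions that take the confusion-matrix counts as input; multiplying accuracy or precision by 100 gives the percentage form used in the worked examples below.

```python
# Minimal helper functions for the evaluation metrics defined above.

def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions out of all observations."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    """Fraction of predicted-positive cases that are actually positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actually-positive cases that the model identifies."""
    return tp / (tp + fn)

def f1_score(p, r):
    """Harmonic mean of precision and recall; 1 is the perfect value."""
    return 2 * (p * r) / (p + r)
```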
13. Give an example where High Accuracy is not usable.
Ans : SCENARIO: An expensive robotic chicken crosses a very busy road a
thousand times per day. An ML model evaluates traffic patterns and
predicts when this chicken can safely cross the street with an accuracy of
99.99%.
Explanation: A 99.99% accuracy value on a very busy road strongly suggests
that the ML model is far better than chance. In some settings, however, the
cost of making even a small number of mistakes is still too high. 99.99%
accuracy means that the expensive chicken will need to be replaced, on
average, every 10 days. (The chicken might also cause extensive damage to
cars that it hits.)
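A quick check of the replacement-rate arithmetic in this scenario (a sketch, assuming exactly 1,000 crossings per day as stated):

```python
# 99.99% accuracy on 1000 crossings/day -> expected mistakes per day.
crossings_per_day = 1000
error_rate = 1 - 0.9999                              # 0.0001
mistakes_per_day = crossings_per_day * error_rate    # 0.1 mistakes per day
days_per_mistake = 1 / mistakes_per_day              # one mistake every 10 days, on average
print(days_per_mistake)                              # 10.0
```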
14. Give an example where High Precision is not usable.
Ans : Example: “Predicting a mail as Spam or Not Spam”
False Positive: a mail is predicted as “spam” but it is “not spam”.
False Negative: a mail is predicted as “not spam” but it is “spam”.
A filter can have high Precision (very few important mails wrongly marked as spam) and still let through too many False Negatives, which makes the spam filter ineffective. Hence high Precision alone is not usable here; Recall also has to be considered.

15. Which evaluation metric would be crucial in the following cases? Justify.

 In a case like Forest Fire, a False Negative can cost us a lot and is risky too. Imagine no alert being given even when there is a Forest Fire. The whole forest might burn down. Hence Recall is the crucial metric here.

 Another case where a False Negative can be dangerous is a Viral Outbreak. Imagine a deadly virus has started spreading and the model which is supposed to predict a viral outbreak does not detect it. The virus might spread widely and infect a lot of people. Here too, Recall is the crucial metric.

 On the other hand, there can be cases in which the False Positive condition costs us more than the False Negatives. One such case is Mining. Imagine a model telling you that there exists treasure at a point and you keep on digging there, but it turns out to be a false alarm. Here, the False Positive case (predicting there is treasure when there is none) can be very costly, so Precision is the crucial metric.

 Similarly, let's consider a model that predicts whether a mail is spam or not. If the model wrongly marks an important mail as spam, people would not look at it and might eventually lose important information. Here also, the False Positive condition (predicting the mail as spam while the mail is not spam) has a high cost, so Precision is the crucial metric.

16. Cases of High FN Cost and High FP Cost

Cases of High FN Cost : Forest Fire, Viral Outbreak

Cases of High FP Cost : Spam Filtering, Mining
17. Calculate Accuracy, Precision, Recall and F1 Score for the following
Confusion Matrix on Heart Attack Risk. Also suggest which metric would be a
good evaluation parameter here and why?

Where True Positive (TP) = 50, True Negative (TN) = 20, False Positive (FP) = 20 and
False Negative (FN) = 10.
Accuracy

=((50+20) / (50+20+20+10))*100%

= (70/100) * 100%

= 0.7 * 100% = 70%

Precision:

Precision is defined as the percentage of true positive cases versus all the cases

where the prediction is true.

= (50 / (50 + 20)) * 100%

= (50/70)*100%

= 0.714 *100% = 71.4%

Recall: It is defined as the fraction of positive cases that are correctly identified.

= 50 / (50 + 10)
= 50 / 60
= 0.83

F1 Score:

F1 score is defined as the measure of balance between precision and recall.


= 2 * (0.714 *0.83) / (0.714 + 0.83)

= 2 * (0.592 / 1.544)

= 2* (0.383) = 0.766

Therefore,

Accuracy= 0.7 Precision=0.714 Recall=0.83 F1 Score=0.766

Here there is a tradeoff between Precision and Recall, but Recall is the better evaluation metric for this case, and it is the metric that needs the most improvement.

Because,

False Positive (impacts Precision): a person is predicted as high risk but does not have a heart attack.

False Negative (impacts Recall): a person is predicted as low risk but does have a heart attack. False Negatives miss actual heart patients, hence the Recall metric needs more improvement.

In this case, False Negatives are more dangerous than False Positives.
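The arithmetic above can be re-checked with a short, self-contained Python sketch (values taken from the confusion matrix in this question; the small difference in the F1 value comes from rounding Precision and Recall before combining them in the notes):

```python
# Heart Attack Risk example: TP = 50, TN = 20, FP = 20, FN = 10.
tp, tn, fp, fn = 50, 20, 20, 10

accuracy  = (tp + tn) / (tp + tn + fp + fn)          # 0.70
precision = tp / (tp + fp)                           # 0.714...
recall    = tp / (tp + fn)                           # 0.833...
f1 = 2 * precision * recall / (precision + recall)   # 0.769 (0.766 above, after rounding)

print(accuracy, round(precision, 3), round(recall, 3), round(f1, 3))
```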

18. Calculate Accuracy, Precision, Recall and F1 Score for the following Confusion Matrix on Water Shortage in Schools. Also suggest which metric would not be a good evaluation parameter here and why?

Where True Positive (TP) = 75, True Negative (TN) = 15, False Positive (FP) = 5 and False Negative (FN) = 5.

Accuracy

Accuracy is defined as the percentage of correct predictions out of all the

observations

= ((75+15) / (75+15+5+5))*100%

= (90 / 100) *100%

=0.9 *100% = 90%

Precision:

Precision is defined as the percentage of true positive cases versus all the cases

where the prediction is true.

= (75 / (75+5))*100%
= (75 / 80)*100%
= 0.9375 * 100% = 93.75%
Recall:

It is defined as the fraction of positive cases that are correctly identified.

= 75 / (75+5)

= 75 /80

= 0.9375

F1 Score:

F1 score is defined as the measure of balance between precision and recall.

= 2 * ((0.9375 * 0.9375) / (0.9375 + 0.9375))

= 2 * (0.8789 / 1.875)
= 2 * 0.46875 = 0.9375

Accuracy = 90%   Precision = 93.75%   Recall = 0.9375   F1 Score = 0.9375

Here Precision, Recall and F1 Score come out equal (0.9375), and Accuracy (0.90) is also very close, so all the metrics give roughly the same picture for this confusion matrix.

19. Calculate Accuracy, Precision, Recall and F1 Score for the following Confusion Matrix on SPAM FILTERING. Also suggest which metric would not be a good evaluation parameter here and why?

Where True Positive (TP) = 10, True Negative (TN) = 25, False Positive (FP) = 55 and False Negative (FN) = 10.

Accuracy

Accuracy is defined as the percentage of correct predictions out of all the observations.

= ((10 + 25) / (10+25+55+10))*100%

= (35 / 100)*100%

= 0.35 * 100% = 35%

Precision:

Precision is defined as the percentage of true positive cases versus all the cases
where the prediction is true.

= (10 / (10 +55))*100%


= (10 /65) *100%
= 0.15 *100% = 15%

Recall:

= 10/(10+10)
= 10/20
= 0.5

F1 Score

F1 score is defined as the measure of balance between precision and recall.

= 2 * ((0.15 * 0.5) / (0.15 + 0.5))

= 2 * (0.075 / 0.65)

= 2 * 0.115

= 0.23

Accuracy = 35%   Precision = 15%   Recall = 0.5   F1 Score = 0.23

Here within the test there is a tradeoff. The Precision value (15%) is very low, so Precision is the metric that needs the most improvement.

Because,

False Positive (impacts Precision): a mail is predicted as “spam” but it is not spam.

False Negative (impacts Recall): a mail is predicted as “not spam” but it is spam.

Too many False Negatives will make the spam filter ineffective, but False Positives may cause important mails to be missed. Hence, Precision is more important to improve.
