EVALUATION - Notes
Artificial Intelligence (417)
1. What is Evaluation?
Ans : Evaluation is the process of understanding the reliability of an AI model. It is done by feeding a test dataset into the model and comparing the model's outputs with the actual answers. Its purpose is to make judgments about a program, to improve its effectiveness, and/or to inform programming decisions.
Note: The training dataset should not be used for evaluation. A model tested on the data it was trained on will give near-perfect results, which does not reflect its performance on new data. This situation is known as overfitting.
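To illustrate this point, here is a minimal Python sketch (assuming scikit-learn is installed; the iris dataset and decision-tree classifier are only placeholders) showing why a model is evaluated on a separate test dataset rather than on its own training data:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load a sample dataset and hold back a portion purely for evaluation.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Train on the training split only.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# The training-data score is misleadingly high (the overfitting effect noted above);
# the test-data score is the honest estimate of the model's reliability.
print("Accuracy on training data:", accuracy_score(y_train, model.predict(X_train)))
print("Accuracy on unseen test data:", accuracy_score(y_test, model.predict(X_test)))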
Ans : The prediction is the output which is given by the machine, and the reality is the real scenario about which the prediction has been made. For example:
Let’s imagine that we have an AI-based prediction model which has been
deployed to identify a Football or a soccer ball.
The objective of the model is to predict whether the given/shown figure is a football. To understand the efficiency of this model, we need to check whether the predictions it makes are correct. So we need to compare Prediction and Reality. Four cases are possible:
Case 1 :
a) Prediction = Yes
b) Reality = Yes
Here, the Prediction matches Reality and both are positive. Hence, this condition is termed as True Positive.
Case 2 :
a) Prediction = No
b) Reality = No
Here, the Prediction matches Reality and both are negative. Hence, this condition is termed as True Negative.
Case 3 :
a) Prediction = Yes
b) Reality = No
Here, the Prediction is positive and does not match Reality. Hence, this
condition is termed as False Positive.
Case 4 :
a) Prediction = No
b) Reality = Yes
Here, the Prediction is negative and does not match Reality. Hence, this
condition is termed as False Negative.
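These four cases can also be counted in code. Below is a small Python sketch (the Yes/No lists are made-up examples, one pair per case above) that tallies True Positives, True Negatives, False Positives and False Negatives:

# Made-up prediction/reality pairs, matching Cases 1-4 above.
predictions = ["Yes", "No", "Yes", "No"]
realities   = ["Yes", "No", "No",  "Yes"]

tp = tn = fp = fn = 0
for pred, real in zip(predictions, realities):
    if pred == "Yes" and real == "Yes":
        tp += 1   # Case 1: True Positive
    elif pred == "No" and real == "No":
        tn += 1   # Case 2: True Negative
    elif pred == "Yes" and real == "No":
        fp += 1   # Case 3: False Positive
    else:
        fn += 1   # Case 4: False Negative

print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)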
Ans : Precision is defined as the percentage of true positive cases versus all the cases where the prediction is true. That is, it takes into account the True Positives and False Positives: Precision = (TP / (TP + FP)) * 100%.
12. How do you suggest which evaluation metric is more important for any case?
Ans :
In a case like Forest Fire, a False Negative can cost us a lot and is risky
too. Imagine no alert being given even when there is a Forest Fire. The whole
forest might burn down.
On the other hand, there can be cases in which the False Positive
condition costs us more than False Negatives. One such case is Mining.
Imagine a model telling you that there exists treasure at a point and you keep
on digging there but it turns out that it is a false alarm. Here, the False
Positive case (predicting there is a treasure but there is no treasure) can be
very costly.
15. Which evaluation metric would be crucial in the following cases? Justify.
Ans :
a) Forest Fire: Recall, because a False Negative (no alert even though there is a fire) is the costliest outcome, as explained above.
b) Viral Outbreak: Recall, because missing an actually infected case (a False Negative) lets the outbreak spread further.
c) Spam: Precision, because a False Positive (an important mail wrongly marked as spam) may cause the user to miss it.
d) Mining: Precision, because a False Positive (predicting treasure where there is none) wastes digging effort and cost, as explained above.
17. Calculate Accuracy, Precision, Recall and F1 Score for the following
Confusion Matrix on Heart Attack Risk. Also suggest which metric would be a
good evaluation parameter here and why?
Where True Positive (TP) = 50, True Negative (TN) = 20, False Positive (FP) = 20 and
False Negative (FN) = 10.
Accuracy:
= ((50+20) / (50+20+20+10)) * 100%
= (70/100) * 100%
= 70%
Precision:
Precision is defined as the percentage of true positive cases versus all the cases where the prediction is true.
= (50 / (50+20)) * 100%
= (50/70) * 100%
= 0.714 * 100% = 71.4%
Recall: It is defined as the fraction of positive cases that are correctly identified.
= 50 / (50 + 10)
= 50 / 60
= 0.83
F1 Score:
= 2 * (Precision * Recall) / (Precision + Recall)
= 2 * ((0.714 * 0.83) / (0.714 + 0.83))
= 2 * (0.592 / 1.544)
= 2 * 0.383
= 0.766
Therefore, Accuracy = 70%, Precision = 71.4%, Recall = 0.83 and F1 Score = 0.766.
Here within the test there is a tradeoff, but Recall is a good evaluation metric here.
Because,
False Positive (impacts Precision): A person is predicted as high risk but does not have a heart attack.
False Negative (impacts Recall): A person is predicted as low risk but does have a heart attack.
Therefore, False Negatives miss actual heart patients; hence the Recall metric needs more improvement.
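The figures worked out above can be verified with a few lines of Python (plain arithmetic only; the small differences in the last decimal place come from the rounding used in the worked solution):

# Confusion-matrix counts from Question 17 (Heart Attack Risk).
tp, tn, fp, fn = 50, 20, 20, 10

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * (precision * recall) / (precision + recall)

print(f"Accuracy : {accuracy:.2%}")    # 70.00%
print(f"Precision: {precision:.2%}")   # 71.43%
print(f"Recall   : {recall:.2%}")      # 83.33%
print(f"F1 Score : {f1:.3f}")          # about 0.769 (0.766 with the rounded values above)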
18. Calculate Accuracy, Precision, Recall and F1 Score for the following Confusion Matrix on Water Shortage in Schools. Also suggest which metric would not be a good evaluation parameter here and why?
Observations:
Where True Positive (TP) = 75, True Negative (TN) = 15, False Positive (FP) = 5 and False Negative (FN) = 5.
Accuracy:
= ((75+15) / (75+15+5+5)) * 100%
= (90/100) * 100%
= 90%
Precision:
Precision is defined as the percentage of true positive cases versus all the cases where the prediction is true.
= (75 / (75+5)) * 100%
= (75/80) * 100%
= 0.9375 * 100% = 93.75%
Recall:
= 75 / (75+5)
= 75 /80
= 0.9375
F1 Score:
= 2 * (Precision * Recall) / (Precision + Recall)
= 2 * ((0.9375 * 0.9375) / (0.9375 + 0.9375))
= 2 * (0.8789 / 1.875)
= 2 * 0.4688
= 0.9375
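As a quick cross-check, the same formulas can be wrapped in a small reusable Python helper and applied to the counts used above (TP = 75, TN = 15, FP = 5, FN = 5):

def metrics(tp, tn, fp, fn):
    # Return accuracy, precision, recall and F1 score for one confusion matrix.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

print(metrics(75, 15, 5, 5))
# {'accuracy': 0.9, 'precision': 0.9375, 'recall': 0.9375, 'f1': 0.9375}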
19. Calculate Accuracy, Precision, Recall and F1 Score for the following Confusion Matrix on SPAM FILTERING. Also suggest which metric would not be a good evaluation parameter here and why?
Observations
Where True Positive (TP) = 10, True Negative (TN) = 25, False Positive (FP) = 55 and
False Negative (FN) = 10.
Accuracy:
= ((10+25) / (10+25+55+10)) * 100%
= (35/100) * 100%
= 35%
Precision:
Precision is defined as the percentage of true positive cases versus all the cases where the prediction is true.
= (10 / (10+55)) * 100%
= (10/65) * 100%
≈ 0.15 * 100% = 15%
Recall:
= 10/(10+10)
= 10/20
= 0.5
F1 Score:
= 2 * (Precision * Recall) / (Precision + Recall)
= 2 * ((0.15 * 0.5) / (0.15 + 0.5))
= 2 * (0.075 / 0.65)
= 2 * 0.115
= 0.23
Here within the test there is a tradeoff, and the Precision obtained is not a good value here (only about 15%).
Because,
False Positive (impacts Precision): A mail is predicted as “spam” but it is not spam.
False Negative (impacts Recall): A mail is predicted as “not spam” but it is spam.
Too many False Negatives will make the Spam Filter ineffective, but False Positives send important mails to the spam folder where they may be missed. Hence, Precision is the metric that needs the most improvement here.
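Finally, the spam-filtering numbers can be cross-checked with scikit-learn (assuming it is installed). The label lists below are simply reconstructed from the counts TP = 10, TN = 25, FP = 55, FN = 10, with 1 meaning “spam” and 0 meaning “not spam”:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Reality and prediction lists rebuilt from the confusion-matrix counts.
y_true = [1] * 10 + [0] * 25 + [0] * 55 + [1] * 10   # reality
y_pred = [1] * 10 + [0] * 25 + [1] * 55 + [0] * 10   # model's prediction

print("Accuracy :", accuracy_score(y_true, y_pred))    # 0.35
print("Precision:", precision_score(y_true, y_pred))   # about 0.15
print("Recall   :", recall_score(y_true, y_pred))      # 0.5
print("F1 Score :", f1_score(y_true, y_pred))          # about 0.23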