6 Model Evaluation

The document discusses model evaluation techniques in classification, focusing on metrics such as accuracy, precision, recall, and F1 score, along with their limitations. It also explains cross-validation methods, including Holdout Validation, Leave-One-Out Cross Validation, Stratified Cross-Validation, and K-Fold Cross Validation, emphasizing their importance in preventing overfitting and ensuring robust model performance. Each method has its advantages and drawbacks, particularly in relation to dataset size and class distribution.


Classification -- Model evaluations

Confusion Matrix
• TP – True Positive ; FP – False Positive
• FN – False Negative; TN – True Negative
                       Predicted Class
                       Class = Yes    Class = No
Actual   Class = Yes   a (TP)         b (FN)
Class    Class = No    c (FP)         d (TN)

a d TP 
Accuracy 
 a b c d TP  TN  FP 
FN

Classification -- Model evaluations
• Given a set of records containing positive and negative results, the computer classifies each record as positive or negative.

• Positive: the computer classifies the result as positive
• Negative: the computer classifies the result as negative
• True: the computer’s classification is correct
• False: the computer’s classification is incorrect

Classification -- Model evaluations
• Limitation of Accuracy
• Consider a 2-class problem
• Number of Class 0 examples = 9990
• Number of Class 1 examples = 10
• If a “stupid” model predicts everything to be class 0, its accuracy is 9990/10000 = 99.9%

• The accuracy is misleading because the model does not detect a single example in class 1
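To make this concrete, here is a minimal plain-Python sketch (the labels are synthetic, mirroring the counts above) showing how a model that always predicts class 0 reaches 99.9% accuracy while detecting nothing in class 1:

```python
# Hypothetical imbalanced dataset: 9990 examples of class 0, 10 of class 1.
y_true = [0] * 9990 + [1] * 10

# A "stupid" model that predicts class 0 for everything.
y_pred = [0] * len(y_true)

correct = sum(t == p for t, p in zip(y_true, y_pred))
print(f"Accuracy: {correct / len(y_true):.1%}")  # 99.9%

# Yet not a single class-1 example is detected.
detected = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
print(f"Class-1 examples detected: {detected}")  # 0
```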

Classification -- Model evaluations
• Cost-sensitive measures
                       Predicted Class
                       Class = Yes    Class = No
Actual   Class = Yes   a (TP)         b (FN)
Class    Class = No    c (FP)         d (TN)

Precision (p) = TP / (TP + FP) = a / (a + c)

Recall (r) = TP / (TP + FN) = a / (a + b)

F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)

F is the harmonic mean of Precision and Recall (why not just the average?)
How to understand
• Accuracy
• Accuracy = (TP+TN)/(TP+FP+FN+TN)
• How many people did we correctly label out of all the people?

• Precision
• Precision = TP/(TP+FP)
• How many of those who we labeled as diabetic are actually diabetic?

• Recall (sensitivity)
• Recall = TP/(TP+FN)
• Of all the people who are diabetic, how many did we correctly predict?

• F1 Score = 2 * (Recall * Precision) / (Recall + Precision)

• The harmonic mean of precision and recall
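As an illustration, here is a minimal Python sketch of the four formulas, using the confusion-matrix cells a = TP, b = FN, c = FP, d = TN from the earlier slides (the example counts are placeholders, not from the original text):

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix cells."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # undefined when the model never predicts positive
    recall = tp / (tp + fn)      # a.k.a. sensitivity
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of p and r
    return accuracy, precision, recall, f1

# Placeholder counts for illustration: 80 TP, 20 FN, 10 FP, 90 TN.
print(classification_metrics(tp=80, fn=20, fp=10, tn=90))
```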
Which to choose
• Accuracy
• A great measure
• But only when you have symmetric datasets (FN & FP counts are close)
• Also, only when FN & FP have similar costs
• F1 score
• Use it if the costs of FP and FN are different
• F1 is best if you have an uneven class distribution
• Recall
• If a FP is far preferable to a FN, or if FNs are unacceptable/intolerable
• We’d accept some extra FPs (false alarms) to avoid missing FNs
• E.g. diabetes: we’d rather have some healthy people labeled diabetic than leave a diabetic person labeled healthy
• Precision
• Want to be more confident of your TPs
• E.g. spam email: we’d rather have some spam emails in the inbox than some regular emails in the spam box

Example
• Given 30 human photographs, a computer predicts 19 to be male, 11
to be female. Among the 19 male predictions, 3 predictions are not
correct. Among the 11 female predictions, 1 prediction is not
correct.

                   Predicted Class
                   Male           Female
Actual   Male      a = TP = 16    b = FN = 1
Class    Female    c = FP = 3     d = TN = 10

Example
                   Predicted Class
                   Male           Female
Actual   Male      a = TP = 16    b = FN = 1
Class    Female    c = FP = 3     d = TN = 10

• Accuracy = (16 + 10) / (16 + 3 + 1 + 10) = 0.867

• Precision = 16 / (16 + 3) = 0.842
• Recall = 16 / (16 + 1) = 0.941
• F-measure = 2 (0.842)(0.941) / (0.842 + 0.941) = 0.889
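The same numbers can be reproduced with scikit-learn’s metric functions (assuming scikit-learn is available; any equivalent library would do). Labels are encoded as 1 = male (the positive class) and 0 = female:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Reconstruct the 30 photographs from the confusion matrix above.
y_true = [1] * 16 + [1] * 1 + [0] * 3 + [0] * 10   # TP, FN, FP, TN groups
y_pred = [1] * 16 + [0] * 1 + [1] * 3 + [0] * 10

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.3f}")   # 0.867
print(f"Precision: {precision_score(y_true, y_pred):.3f}")  # 0.842
print(f"Recall:    {recall_score(y_true, y_pred):.3f}")     # 0.941
print(f"F-measure: {f1_score(y_true, y_pred):.3f}")         # 0.889
```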
Discussion
• “In a specific case, precision cannot be computed.” Is the statement true? Why?
• If the statement is true, can F-measure be computed in that case?
a b c Classified as
a TP FN FN a: positive
b FP TN TN b: negative
c FP TN TN c: negative

• How about if b is positive and a and c are negative, or if c is positive and a and b are negative?

Cross Validation in Machine Learning
In machine learning, we cannot simply fit a model to the training data and assume it will work accurately on real data. We must ensure that the model has picked up the correct patterns from the data and is not fitting too much noise. For this purpose, we use the cross-validation technique. In this article, we’ll delve into the process of cross-validation in machine learning.
What is Cross-Validation?
Cross-validation is a technique used in machine learning to evaluate the performance of a model on unseen data.
It involves dividing the available data into multiple folds or subsets, using one of these folds as a validation set, and training the model on the remaining folds. This process is repeated multiple times, each time using a different fold as the validation set. Finally, the results from each validation step are averaged to produce a more robust estimate of the model’s performance. Cross-validation is an important step in the machine learning process and helps to ensure that the model generalizes well to new data.
What is cross-validation used for?
The main purpose of cross-validation is to detect overfitting, which occurs when a model fits the training data too closely and performs poorly on new, unseen data. By evaluating the model on multiple validation sets, cross-validation provides a more realistic estimate of the model’s generalization performance, i.e., its ability to perform well on new, unseen data.
Types of Cross-Validation
There are several types of cross-validation techniques, including k-fold cross-validation, leave-one-out cross-validation, holdout validation, and stratified cross-validation. The choice of technique depends on the size and nature of the data, as well as the specific requirements of the problem.
1. Holdout Validation
In holdout validation, we train on 50% of the given dataset and use the remaining 50% for testing. It’s a simple and quick way to evaluate a model. The major drawback is that when we train on only 50% of the dataset, the remaining 50% may contain important information that the model never sees during training, which can lead to higher bias.
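A minimal holdout sketch using scikit-learn’s train_test_split; the iris dataset and logistic regression are placeholder choices for illustration, not part of the original text:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Holdout validation: train on 50% of the data, test on the remaining 50%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Holdout accuracy: {model.score(X_test, y_test):.3f}")
```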
2. LOOCV (Leave One Out Cross Validation)
In this method, we train on the whole dataset except for a single data point, and we iterate so that every data point is left out exactly once. In LOOCV, the model is trained on n−1 samples and tested on the one omitted sample, repeating this process for each data point in the dataset. It has advantages as well as disadvantages.
An advantage of this method is that we make use of all data points, so the bias is low.
The major drawback is that it leads to higher variance in the test estimate, because each test set is a single data point; if that point is an outlier, the variation is even higher. Another drawback is execution time: the procedure iterates as many times as there are data points.
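A minimal LOOCV sketch, again assuming scikit-learn with placeholder data and model; note that it fits the model once per data point, which is what makes it slow on large datasets:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)

# LOOCV: n iterations, each training on n-1 samples and testing on the one left out.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=LeaveOneOut())
print(f"LOOCV accuracy over {len(scores)} folds: {scores.mean():.3f}")
```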
3. Stratified Cross-Validation
It is a technique used in machine learning to ensure that each fold of the
cross-validation process maintains the same class distribution as the
entire dataset. This is particularly important when dealing with
imbalanced datasets, where certain classes may be underrepresented. In this method:
1. The dataset is divided into k folds while maintaining the proportion of classes in each fold.
2. During each iteration, one fold is used for testing, and the remaining folds are used for training.
3. The process is repeated k times, with each fold serving as the test set exactly once.
Stratified Cross-Validation is essential when dealing with classification
problems where maintaining the balance of class distribution is crucial for
the model to generalize well to unseen data.
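A minimal stratified k-fold sketch (scikit-learn’s StratifiedKFold, with placeholder data and model); each fold keeps the same class proportions as the full dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Each of the k=5 folds preserves the class proportions of the full dataset.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=skf)
print(f"Stratified 5-fold accuracies: {scores.round(3)}")
print(f"Mean accuracy: {scores.mean():.3f}")
```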
4. K-Fold Cross Validation
In k-fold cross-validation, we split the dataset into k subsets (known as folds), then train on k−1 of the subsets and leave one subset out for evaluating the trained model. We iterate k times, reserving a different subset for testing each time.
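A minimal k-fold sketch with scikit-learn’s KFold (placeholder data and model, k = 5); each iteration trains on k−1 folds and evaluates on the held-out fold, and the fold scores are averaged:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# k=5 folds: each iteration trains on 4 folds (k-1) and tests on the held-out fold.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)
print(f"5-fold accuracies: {scores.round(3)}")
print(f"Mean accuracy: {scores.mean():.3f}")
```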
