Classification - Performance Evaluation
PERFORMANCE EVALUATION
APPLIED DATA SCIENCE WITH PYTHON
True positives (TP): These refer to the positive tuples that were correctly labeled by the classifier. Let TP be the number of true positives.
True negatives (TN): These are the negative tuples that were correctly labeled by the classifier. Let TN be the number of true negatives.
False positives (FP): These are the negative tuples that were incorrectly labeled as positive (e.g., tuples of class buys_computer = no for which the classifier predicted buys_computer = yes). Let FP be the number of false positives.
False negatives (FN): These are the positive tuples that were mislabeled as negative (e.g., tuples of class buys_computer = yes for which the classifier predicted buys_computer = no). Let FN be the number of false negatives.
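These four counts can be read directly off a confusion matrix. A minimal sketch using scikit-learn's confusion_matrix; the label vectors here are invented purely for illustration:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth and predicted labels (1 = positive, 0 = negative)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")  # TP=3, TN=3, FP=1, FN=1
```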
Classifier Evaluation Metrics:
Precision, Recall, and F-measures
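The defining formulas, as given in the Han, Kamber, and Pei reference cited below, are:

```latex
\mathit{precision} = \frac{TP}{TP + FP}
\qquad
\mathit{recall} = \frac{TP}{TP + FN}

F = \frac{2 \times \mathit{precision} \times \mathit{recall}}{\mathit{precision} + \mathit{recall}}
\qquad
F_{\beta} = \frac{(1 + \beta^{2}) \times \mathit{precision} \times \mathit{recall}}{\beta^{2} \times \mathit{precision} + \mathit{recall}}
```

F is the harmonic mean of precision and recall; F-beta weights recall beta times as much as precision.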
Classifier Evaluation Metrics: Example
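As a worked illustration with hypothetical counts (TP = 90, FP = 10, FN = 60, TN = 840, i.e., 1,000 tuples in all):

```latex
\mathit{precision} = \frac{90}{90 + 10} = 0.90
\qquad
\mathit{recall} = \frac{90}{90 + 60} = 0.60
\qquad
F = \frac{2 \times 0.90 \times 0.60}{0.90 + 0.60} = 0.72
```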
Classifier Evaluation Metrics
Ref: Jiawei Han, Micheline Kamber, Jian Pei, "Data Mining: Concepts and Techniques", Morgan Kaufmann, Third Edition, 2011
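In terms of the four counts, with P = TP + FN the number of positive tuples and N = TN + FP the number of negative tuples, the summary metrics defined in that reference are:

```latex
\mathit{accuracy} = \frac{TP + TN}{P + N}
\qquad
\mathit{error\ rate} = \frac{FP + FN}{P + N}

\mathit{sensitivity} = \frac{TP}{P}
\qquad
\mathit{specificity} = \frac{TN}{N}
```

Sensitivity is the true positive (recognition) rate and coincides with recall; specificity is the true negative rate.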
Evaluating Classifier Accuracy:
Cross-Validation Method
Cross-validation (k-fold, where k = 10 is most popular)
◦ Randomly partition the data into k mutually exclusive subsets D1, D2, ..., Dk, each of approximately equal size
◦ At the i-th iteration, use Di as the test set and the remaining partitions collectively as the training set; that is, in the first iteration, subsets D2, ..., Dk collectively serve as the training set and D1 as the test set (see the sketch below)
◦ *Stratified cross-validation*: folds are stratified so that the class distribution in each fold is approximately the same as that in the initial data
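A minimal sketch of stratified 10-fold cross-validation in scikit-learn; the Iris data and decision-tree classifier are placeholder choices for illustration, not taken from the slides:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # placeholder dataset

# Stratified folds keep each fold's class distribution close to the full data's
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=cv)

# Accuracy estimate: mean accuracy over the 10 held-out folds
print(scores.mean())
```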
Evaluating Classifier Accuracy: Bootstrap
Bootstrap
◦ Works well with small data sets
◦ Samples the given training tuples uniformly with replacement
◦ i.e., each time a tuple is selected, it is equally likely to be selected again and re-added to the training set (a minimal sampling sketch follows below)
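A minimal sketch of drawing one bootstrap sample with NumPy; n and the seed are arbitrary illustrative values. Tuples never selected form the out-of-bag set, which can serve as the test set (on average about 63.2% of distinct tuples land in the sample, which is the basis of the .632 bootstrap):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100  # hypothetical number of training tuples

# Draw n indices uniformly with replacement: a tuple may appear several times
boot_idx = rng.choice(n, size=n, replace=True)

# Tuples never drawn form the out-of-bag (test) set, ~36.8% of tuples on average
oob_idx = np.setdiff1d(np.arange(n), boot_idx)
print(len(np.unique(boot_idx)), len(oob_idx))  # the two counts sum to n
```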
THANK YOU
August 2021