
CLASSIFIER – PERFORMANCE EVALUATION
APPLIED DATA SCIENCE WITH PYTHON



Model Evaluation and Selection
Use a validation (test) set of class-labeled tuples, instead of the training set, when assessing accuracy.
Methods for estimating a classifier's accuracy:
◦ Holdout method, random subsampling
◦ Cross-validation
◦ Bootstrap
Comparing classifiers:
◦ Confidence intervals
◦ Cost-benefit analysis and ROC curves



Classifier Evaluation Metrics: Confusion Matrix
Confusion Matrix:
Actual class \ Predicted class    C1                      ¬C1
C1                                True Positives (TP)     False Negatives (FN)
¬C1                               False Positives (FP)    True Negatives (TN)

True positives (TP): the positive tuples that were correctly labeled by the classifier. Let TP be the number of true positives.
True negatives (TN): the negative tuples that were correctly labeled by the classifier. Let TN be the number of true negatives.
False positives (FP): the negative tuples that were incorrectly labeled as positive (e.g., tuples of class buys_computer = no for which the classifier predicted buys_computer = yes). Let FP be the number of false positives.
False negatives (FN): the positive tuples that were mislabeled as negative (e.g., tuples of class buys_computer = yes for which the classifier predicted buys_computer = no). Let FN be the number of false negatives.
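The slides show no code, but for concreteness here is a minimal sketch of how such a 2×2 confusion matrix can be computed in Python, assuming scikit-learn is available (the library choice and the toy label vectors are illustrative assumptions, not part of the course material):

```python
# Minimal sketch (not from the slides): a 2x2 confusion matrix for a
# binary classifier using scikit-learn. The label vectors are made-up
# toy data for illustration only.
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 0, 1]   # actual classes (1 = positive C1, 0 = negative)
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]   # classes predicted by some classifier

# With labels=[1, 0] the layout matches the slide:
# rows = actual class (C1, ¬C1), columns = predicted class (C1, ¬C1)
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
tp, fn = cm[0]
fp, tn = cm[1]
print(cm)
print(f"TP={tp}, FN={fn}, FP={fp}, TN={tn}")
```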



Classifier Evaluation Metrics: Confusion Matrix
Example of Confusion Matrix:
Actual class \ Predicted class    buys_computer = yes    buys_computer = no    Total
buys_computer = yes               6954                   46                    7000
buys_computer = no                412                    2588                  3000
Total                             7366                   2634                  10000

Given m classes (where m ≥ 2), a confusion matrix is a table of at least size m by m.
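As a quick sanity check of the table above, the counts can be reproduced in Python; this sketch assumes NumPy and simply re-enters the slide's numbers:

```python
# Sketch: the buys_computer confusion matrix from the slide as a NumPy array.
import numpy as np

# rows = actual class (yes, no), columns = predicted class (yes, no)
cm = np.array([[6954,   46],
               [ 412, 2588]])

tp, fn = cm[0]                       # actual yes row
fp, tn = cm[1]                       # actual no  row
print(cm.sum(axis=1))                # row totals:    [7000 3000]
print(cm.sum(axis=0))                # column totals: [7366 2634]
print(cm.sum())                      # 10000 tuples in all
```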



Classifier Evaluation Metrics: Accuracy, Error Rate, Sensitivity and Specificity

A \ P    C     ¬C
C        TP    FN    P
¬C       FP    TN    N
         P'    N'    All

Classifier accuracy, or recognition rate: percentage of test set tuples that are correctly classified
◦ Accuracy = (TP + TN) / All
Error rate: 1 – accuracy, or
◦ Error rate = (FP + FN) / All
If we were to use the training set (instead of a test set) to estimate the error rate of a model, this quantity is known as the resubstitution error.

Class Imbalance Problem:
◦ One class may be rare, e.g., fraud or HIV-positive
◦ Significant majority of the negative class and minority of the positive class
◦ Sensitivity: true positive recognition rate
  Sensitivity = TP / P
◦ Specificity: true negative recognition rate
  Specificity = TN / N
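A small sketch tying the four formulas together (basic_rates is a hypothetical helper name introduced here for illustration; the buys_computer counts come from the earlier slide):

```python
# Sketch: the four slide formulas computed from confusion-matrix counts.
# basic_rates is a hypothetical helper, not from the course material.
def basic_rates(tp, fn, fp, tn):
    p = tp + fn                      # actual positives (P)
    n = fp + tn                      # actual negatives (N)
    total = p + n                    # All
    accuracy    = (tp + tn) / total  # recognition rate
    error_rate  = (fp + fn) / total  # = 1 - accuracy
    sensitivity = tp / p             # true positive recognition rate
    specificity = tn / n             # true negative recognition rate
    return accuracy, error_rate, sensitivity, specificity

# buys_computer counts from the earlier confusion-matrix slide:
print(basic_rates(tp=6954, fn=46, fp=412, tn=2588))
# (0.9542, 0.0458, 0.9934..., 0.8626...)
```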

Classifier Evaluation Metrics: Precision and Recall, and F-measures

Precision (exactness): what % of tuples that the classifier labeled as positive are actually positive?
◦ Precision = TP / (TP + FP)
Recall (completeness): what % of positive tuples did the classifier label as positive?
◦ Recall = TP / (TP + FN) = TP / P
◦ A perfect score is 1.0
◦ There is an inverse relationship between precision and recall
F measure (F1 or F-score): harmonic mean of precision and recall
◦ F1 = 2 × Precision × Recall / (Precision + Recall)
Fβ: weighted measure of precision and recall
◦ Fβ = (1 + β²) × Precision × Recall / (β² × Precision + Recall)
◦ β is a non-negative number; Fβ assigns β times as much weight to recall as to precision. Commonly used Fβ measures are F2 (which weights recall twice as much as precision) and F0.5 (which weights precision twice as much as recall).
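These metrics are also available as ready-made functions; a sketch assuming scikit-learn and reusing the toy labels from the earlier confusion-matrix example:

```python
# Sketch (scikit-learn assumed): precision, recall, F1 and F-beta on the
# same toy labels used in the earlier confusion-matrix sketch.
from sklearn.metrics import precision_score, recall_score, f1_score, fbeta_score

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

print(precision_score(y_true, y_pred))        # TP / (TP + FP)
print(recall_score(y_true, y_pred))           # TP / (TP + FN)
print(f1_score(y_true, y_pred))               # harmonic mean of precision and recall
print(fbeta_score(y_true, y_pred, beta=2))    # F2: recall weighted twice as much
print(fbeta_score(y_true, y_pred, beta=0.5))  # F0.5: precision weighted twice as much
```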

Classifier Evaluation Metrics: Example

Actual class \ Predicted class    cancer = yes    cancer = no    Total    Recognition (%)
cancer = yes                      90              210            300      30.00 (sensitivity)
cancer = no                       140             9560           9700     98.56 (specificity)
Total                             230             9770           10000    96.50 (accuracy)

◦ Precision = 90/230 = 39.13%    Recall = 90/300 = 30.00%
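A short sketch verifying the example's numbers directly from the counts:

```python
# Sketch: verifying the cancer example from its confusion-matrix counts.
tp, fn = 90, 210          # actual cancer = yes row
fp, tn = 140, 9560        # actual cancer = no  row

print(tp / (tp + fp))                   # precision   = 90/230    = 0.3913
print(tp / (tp + fn))                   # recall      = 90/300    = 0.30 (sensitivity)
print(tn / (fp + tn))                   # specificity = 9560/9700 = 0.9856
print((tp + tn) / (tp + fn + fp + tn))  # accuracy    = 9650/10000 = 0.9650
```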

Classifier Evaluation Metrics

Ref: Jiawei Han, Micheline Kamber, Jian Pei, "Data Mining: Concepts and Techniques", Morgan Kaufmann, Third Edition, 2011.



Classifier Performance
In addition to accuracy-based measures, classifiers can also be compared with respect
to the following additional aspects:
Speed: This refers to the computational costs involved in generating and using the given classifier.
Robustness: This is the ability of the classifier to make correct predictions given noisy data or data with missing values. Robustness is typically assessed with a series of synthetic data sets representing increasing degrees of noise and missing values.
Scalability: This refers to the ability to construct the classifier efficiently given large amounts of data. Scalability is typically assessed with a series of data sets of increasing size.
Interpretability: This refers to the level of understanding and insight that is provided by the classifier or predictor. Interpretability is subjective and therefore more difficult to assess.



Evaluating Classifier Accuracy: Holdout Method

Holdout method
◦ Given data is randomly partitioned into two independent sets
◦ Training set (e.g., 2/3) for model construction
◦ Test set (e.g., 1/3) for accuracy estimation
◦ The training set is used to derive the model; the model's accuracy is then estimated with the test set
Random subsampling: a variation of holdout
◦ Repeat holdout k times; accuracy = average of the accuracies obtained (see the sketch below)
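A minimal sketch of holdout and random subsampling, assuming scikit-learn; the breast-cancer dataset and the decision-tree classifier are arbitrary illustrative choices, not part of the slides:

```python
# Sketch: holdout evaluation with a 2/3 - 1/3 split, repeated k times
# (random subsampling) and averaged.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)   # any class-labeled data works

k = 10
accuracies = []
for seed in range(k):
    # random 2/3 training, 1/3 test partition
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=seed)
    model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    accuracies.append(accuracy_score(y_te, model.predict(X_te)))

print("holdout accuracies:", np.round(accuracies, 3))
print("random subsampling estimate:", np.mean(accuracies))
```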

Evaluating Classifier Accuracy: Cross-Validation Method

Cross-validation (k-fold, where k = 10 is most popular)
◦ Randomly partition the data into k mutually exclusive subsets (folds) D1, ..., Dk, each of approximately equal size
◦ At the i-th iteration, partition Di is reserved as the test set and the remaining partitions are collectively used to train the model. That is, in the first iteration, subsets D2, ..., Dk collectively serve as the training set to obtain a first model, which is tested on D1; and so on. (See the sketch below.)
Special cases of k-fold cross-validation
◦ Leave-one-out: k folds where k = # of tuples, for small-sized data. That is, only one sample is "left out" at a time for the test set.
◦ Stratified cross-validation: folds are stratified so that the class distribution in each fold is approximately the same as that in the initial data
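A minimal sketch of 10-fold, stratified, and leave-one-out cross-validation, again assuming scikit-learn and the same illustrative dataset and classifier as in the holdout sketch:

```python
# Sketch: 10-fold and stratified 10-fold cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, KFold, StratifiedKFold, LeaveOneOut
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# Plain k-fold: the data is split into k = 10 mutually exclusive folds;
# each fold serves once as the test set while the rest train the model.
kfold = KFold(n_splits=10, shuffle=True, random_state=0)
print("10-fold accuracy:", cross_val_score(model, X, y, cv=kfold).mean())

# Stratified k-fold keeps each fold's class distribution close to the
# class distribution of the full data set.
skfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
print("stratified 10-fold accuracy:", cross_val_score(model, X, y, cv=skfold).mean())

# Leave-one-out is the special case k = number of tuples (slow on large data):
# cross_val_score(model, X, y, cv=LeaveOneOut())
```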

Evaluating Classifier Accuracy: Bootstrap

Bootstrap
◦ Works well with small data sets
◦ Samples the given training tuples uniformly with replacement
◦ i.e., each time a tuple is selected, it is equally likely to be selected again and re-added to the training set
Several bootstrap methods exist; a common one is the .632 bootstrap
◦ A data set with d tuples is sampled d times, with replacement, resulting in a training set of d samples. The data tuples that did not make it into the training set end up forming the test set. About 63.2% of the original data ends up in the bootstrap sample, and the remaining 36.8% forms the test set (since (1 – 1/d)^d ≈ e^(–1) ≈ 0.368).
◦ Repeat the sampling procedure k times; the overall accuracy of the model is estimated by combining, for each iteration, the test-set accuracy (weight 0.632) and the training-set accuracy (weight 0.368), averaged over the k iterations, as sketched below.
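A sketch of the .632 bootstrap as described above, with the resampling written out by hand (NumPy and scikit-learn are assumed, and the 0.632/0.368 weighting with averaging over k iterations follows the slide's description, not code from the course):

```python
# Sketch: .632 bootstrap accuracy estimate.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
d = len(X)
rng = np.random.default_rng(0)

k = 20
estimates = []
for _ in range(k):
    # sample d tuples uniformly with replacement -> bootstrap training set
    train_idx = rng.integers(0, d, size=d)
    # tuples never drawn form the test set (~36.8% of the data on average)
    test_idx = np.setdiff1d(np.arange(d), train_idx)

    model = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    acc_test  = accuracy_score(y[test_idx],  model.predict(X[test_idx]))
    acc_train = accuracy_score(y[train_idx], model.predict(X[train_idx]))

    # .632 estimate for this iteration
    estimates.append(0.632 * acc_test + 0.368 * acc_train)

print("bootstrap .632 accuracy estimate:", np.mean(estimates))
```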

THANK YOU

Dr. S. Sridevi, M.E., Ph.D.


Associate Professor,
Department of Information Technology

August 2021

