8c - Model Evaluation and Selection

This document discusses model evaluation and selection for classification algorithms. It introduces evaluation metrics for measuring a classifier's performance, including accuracy, precision, recall, F-measure, sensitivity, and specificity. These metrics are calculated from the confusion matrix, which records true positives, true negatives, false positives, and false negatives. The document also discusses the trade-off between precision and recall, and the weighted F-measures that combine them. For a fair assessment, model accuracy should be evaluated on a validation test set rather than the training set.


CS423 – DATA WAREHOUSING AND DATA MINING

Chapter 8c
Classification – Model Evaluation and Selection

Dr. Hammad Afzal
[email protected]
Department of Computer Software Engineering
National University of Sciences and Technology (NUST)
CHAPTER 8. CLASSIFICATION: BASIC CONCEPTS
• Classification: Basic Concepts
• Decision Tree Induction
• Bayes Classification Methods
• Rule-Based Classification
• Model Evaluation and Selection
• Techniques to Improve Classification Accuracy: Ensemble Methods
• Summary
USING IF-THEN RULES
• Represent the knowledge in the form of IF-THEN rules
  R: IF age = youth AND student = yes THEN buys_computer = yes
• Rule antecedent/precondition (the IF part) vs. rule consequent (the THEN part)
• Assessment of a rule: coverage and accuracy (a short sketch follows)
  • ncovers = # of tuples covered by R
  • ncorrect = # of tuples correctly classified by R
  coverage(R) = ncovers / |D|
  accuracy(R) = ncorrect / ncovers
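To make the two measures concrete, here is a minimal Python sketch. The four-tuple dataset D and the antecedent helper are invented for illustration; they are not the slide's data.

```python
# Invented toy dataset: each tuple is a dict of attribute values plus the class label.
D = [
    {"age": "youth",  "student": "yes", "buys_computer": "yes"},
    {"age": "youth",  "student": "yes", "buys_computer": "no"},
    {"age": "youth",  "student": "no",  "buys_computer": "no"},
    {"age": "senior", "student": "yes", "buys_computer": "yes"},
]

def antecedent(t):
    # IF part of R: age = youth AND student = yes
    return t["age"] == "youth" and t["student"] == "yes"

covered = [t for t in D if antecedent(t)]                      # tuples covered by R
correct = [t for t in covered if t["buys_computer"] == "yes"]  # consequent also holds

print("coverage(R) =", len(covered) / len(D))        # ncovers / |D|      -> 0.5
print("accuracy(R) =", len(correct) / len(covered))  # ncorrect / ncovers -> 0.5
```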
USING IF-THEN RULES
• If more than one rule is triggered, we need conflict resolution:
  • Size ordering: assign the highest priority to the triggering rule that has the "toughest" requirement (i.e., the most attribute tests)
  • Class-based ordering: classes are sorted in decreasing order of prevalence or misclassification cost, and rules for the highest-ranked class fire first
  • Rule-based ordering (decision list): rules are organized into one long priority list, according to some measure of rule quality or by experts
RULE EXTRACTION FROM A DECISION TREE
• Rules are easier to understand than large trees
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction; the leaf holds the class prediction
• Rules are mutually exclusive and exhaustive

[Figure: buys_computer decision tree. The root tests age?; the <=30 branch tests student? (no -> no, yes -> yes), the 31..40 branch is a yes leaf, and the >40 branch tests credit_rating? (excellent -> no, fair -> yes).]

• Example: rule extraction from our buys_computer decision tree (a code sketch follows the rules):
  IF age = young AND student = no THEN buys_computer = no
  IF age = young AND student = yes THEN buys_computer = yes
  IF age = mid-age THEN buys_computer = yes
  IF age = old AND credit_rating = excellent THEN buys_computer = no
  IF age = old AND credit_rating = fair THEN buys_computer = yes
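As a hedged illustration of the same idea in code, scikit-learn's export_text prints one root-to-leaf path per leaf, i.e., one IF-THEN rule each. The tiny integer-encoded dataset below is invented; it only mimics the shape of the tree above, not the slide's actual training data.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented encoding: age 0=young, 1=mid-age, 2=old; student 0=no, 1=yes;
# credit_rating 0=fair, 1=excellent.
X = [[0, 0, 0], [0, 1, 0], [1, 0, 1], [2, 0, 1], [2, 1, 0]]
y = ["no", "yes", "yes", "no", "yes"]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Each printed root-to-leaf path corresponds to one extracted IF-THEN rule.
print(export_text(tree, feature_names=["age", "student", "credit_rating"]))
```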
MODEL EVALUATION AND SELECTION
• Evaluation metrics: How can we measure accuracy? What other metrics should we consider?
• Use a validation test set of class-labeled tuples, rather than the training set, when assessing accuracy (a sketch of this follows below)
• Some of the measures are:
  • Accuracy – suitable when class tuples are evenly distributed
  • Precision – suitable when class tuples are not evenly distributed
  • Recall – also known as sensitivity
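A minimal sketch of the held-out evaluation point above, assuming scikit-learn is available; the iris data merely stands in for any set of class-labeled tuples.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = DecisionTreeClassifier().fit(X_train, y_train)  # fit on training tuples only

print("training accuracy:", clf.score(X_train, y_train))  # optimistic estimate
print("held-out accuracy:", clf.score(X_test, y_test))    # fair assessment
```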
CLASSIFIER EVALUATION METRICS: CONFUSION MATRIX
Confusion Matrix:

Actual class\Predicted class | Yes                  | No
Yes                          | True Positives (TP)  | False Negatives (FN)
No                           | False Positives (FP) | True Negatives (TN)

The same matrix in class notation (C1 vs. ¬C1):

Actual class\Predicted class | C1                   | ¬C1
C1                           | True Positives (TP)  | False Negatives (FN)
¬C1                          | False Positives (FP) | True Negatives (TN)

• Given m classes, an entry CM_{i,j} in a confusion matrix indicates the # of tuples in class i that were labeled by the classifier as class j
• May have extra rows/columns to provide totals
CLASSIFIER EVALUATION METRICS: CONFUSION MATRIX
• True Positives (TP): positive tuples correctly classified as positive.
• True Negatives (TN): negative tuples correctly classified as negative.
• False Positives (FP): negative tuples incorrectly classified as positive.
• False Negatives (FN): positive tuples incorrectly classified as negative.
CLASSIFIER EVALUATION METRICS: CONFUSION MATRIX
Example of a confusion matrix with totals (a sketch of the counting rule follows below):

Actual class\Predicted class | buy_computer = yes | buy_computer = no | Total
buy_computer = yes           | 6954               | 46                | 7000
buy_computer = no            | 412                | 2588              | 3000
Total                        | 7366               | 2634              | 10000
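The table above is just the CM_{i,j} counting rule applied to 10,000 tuples: each cell counts the tuples of actual class i that the classifier labeled as class j. A small sketch with invented label lists:

```python
from collections import Counter

# Invented actual and predicted labels for six tuples.
actual    = ["yes", "yes", "yes", "no", "no", "no"]
predicted = ["yes", "no",  "yes", "no", "yes", "no"]

# cm[(i, j)] = number of class-i tuples the classifier labeled as class j.
cm = Counter(zip(actual, predicted))

for i in ("yes", "no"):
    print(i, [cm[(i, j)] for j in ("yes", "no")])
# yes [2, 1]
# no  [1, 2]
```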
ACCURACY, ERROR RATE, SENSITIVITY AND SPECIFICITY

A\P | C  | ¬C |
C   | TP | FN | P
¬C  | FP | TN | N
    | P' | N' | All

(P and N are the numbers of actual positive and negative tuples; P' and N' are the numbers of tuples predicted positive and negative.)

• Classifier accuracy, or recognition rate: percentage of test set tuples that are correctly classified
  Accuracy = (TP + TN) / All
• Error rate: 1 – accuracy, or
  Error rate = (FP + FN) / All
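As a worked check, plugging the buys_computer matrix from the earlier slide into both formulas:

```python
# From the buys_computer confusion matrix: TP=6954, FN=46, FP=412, TN=2588.
TP, FN, FP, TN = 6954, 46, 412, 2588
All = TP + FN + FP + TN  # 10000

accuracy   = (TP + TN) / All  # (6954 + 2588) / 10000 = 0.9542
error_rate = (FP + FN) / All  # (412 + 46) / 10000    = 0.0458

assert abs(error_rate - (1 - accuracy)) < 1e-12  # error rate = 1 - accuracy
print(accuracy, error_rate)
```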
ACCURACY, ERROR RATE, SENSITIVITY AND SPECIFICITY
• Class Imbalance Problem:
  • One class may be rare, e.g., fraud or HIV-positive
  • A significant majority of tuples belong to the negative class and only a minority to the positive class
• Sensitivity: true positive recognition rate
  Sensitivity = TP / P
• Specificity: true negative recognition rate
  Specificity = TN / N
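Applying these two formulas to the same buys_computer matrix:

```python
TP, FN, FP, TN = 6954, 46, 412, 2588
P = TP + FN  # 7000 actual positive tuples
N = FP + TN  # 3000 actual negative tuples

print("sensitivity =", TP / P)  # 6954 / 7000 ~ 0.9934
print("specificity =", TN / N)  # 2588 / 3000 ~ 0.8627
```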
PRECISION AND RECALL, AND F-MEASURES
• Precision (exactness): what % of the tuples that the classifier labeled as positive are actually positive?
  Precision = TP / (TP + FP)
• Recall (completeness): what % of the positive tuples did the classifier label as positive?
  Recall = TP / (TP + FN)
• A perfect score for either measure is 1.0
CLASSIFIER EVALUATION METRICS: PRECISION AND RECALL, AND F-MEASURES
• There is an inverse relationship between precision and recall
• F measure (F1 or F-score): the harmonic mean of precision and recall
  F1 = (2 × precision × recall) / (precision + recall)
• Fβ: a weighted measure of precision and recall
  Fβ = ((1 + β²) × precision × recall) / (β² × precision + recall)
  • assigns β times as much weight to recall as to precision (a sketch of both follows below)
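A short sketch of both formulas; with β = 1 the weighted form reduces to the harmonic mean F1. The precision and recall values are taken from the example on the next slide.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted F-measure; beta=1 gives the harmonic-mean F1."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 90 / 230, 90 / 300  # precision = 0.3913, recall = 0.30

print("F1 =", f_beta(p, r))           # ~0.3396
print("F2 =", f_beta(p, r, beta=2))   # recall-weighted, ~0.3147
```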
CLASSIFIER EVALUATION METRICS: EXAMPLE

Actual class\Predicted class | cancer = yes | cancer = no | Total | Recognition (%)
cancer = yes                 | 90           | 210         | 300   | 30.00 (sensitivity)
cancer = no                  | 140          | 9560        | 9700  | 98.56 (specificity)
Total                        | 230          | 9770        | 10000 | 96.50 (accuracy)

• Precision = 90/230 = 39.13%
• Recall = 90/300 = 30.00%
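Recomputing the slide's figures from the four matrix cells confirms them:

```python
# Cancer example: TP=90, FN=210, FP=140, TN=9560.
TP, FN, FP, TN = 90, 210, 140, 9560

precision   = TP / (TP + FP)                   # 90 / 230    = 0.3913
recall      = TP / (TP + FN)                   # 90 / 300    = 0.3000 (= sensitivity)
specificity = TN / (FP + TN)                   # 9560 / 9700 = 0.9856
accuracy    = (TP + TN) / (TP + FN + FP + TN)  # 9650 / 10000 = 0.9650

print(precision, recall, specificity, accuracy)
```

Despite 96.50% accuracy, the classifier recognizes only 30% of the actual cancer cases; this is precisely the class-imbalance situation in which sensitivity, precision, and recall are more informative than accuracy alone.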
