8c - Model Evaluation and Selection

This document discusses model evaluation and selection for classification algorithms. It introduces evaluation metrics for measuring a classifier's performance, including accuracy, precision, recall, F-measure, sensitivity, and specificity. These metrics are calculated from the confusion matrix, which records true positives, true negatives, false positives, and false negatives. The document also discusses the trade-off between precision and recall, and the weighted F-measures that combine them. For a fair assessment, model accuracy should be evaluated on a validation test set rather than the training set.


CS423 – DATA WAREHOUSING AND DATA MINING

Chapter 8c
Classification – Model Evaluation and Selection

Dr. Hammad Afzal
[email protected]
Department of Computer Software Engineering
National University of Sciences and Technology (NUST)
CHAPTER 8. CLASSIFICATION: BASIC CONCEPTS
• Classification: Basic Concepts
• Decision Tree Induction
• Bayes Classification Methods
• Rule-Based Classification
• Model Evaluation and Selection
• Techniques to Improve Classification Accuracy: Ensemble Methods
• Summary
USING IF-THEN RULES
• Represent the knowledge in the form of IF-THEN rules
  R: IF age = youth AND student = yes THEN buys_computer = yes
• Rule antecedent/precondition (the IF part) vs. rule consequent (the THEN part)
• Assessment of a rule: coverage and accuracy (a short sketch follows)
  • ncovers = # of tuples covered by R
  • ncorrect = # of tuples correctly classified by R
  coverage(R) = ncovers / |D|
  accuracy(R) = ncorrect / ncovers
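To make the two measures concrete, here is a minimal Python sketch. The four-tuple dataset D and the antecedent helper are invented for illustration; they are not the slide's data.

```python
# Invented toy dataset: each tuple is a dict of attribute values plus the class label.
D = [
    {"age": "youth",  "student": "yes", "buys_computer": "yes"},
    {"age": "youth",  "student": "yes", "buys_computer": "no"},
    {"age": "youth",  "student": "no",  "buys_computer": "no"},
    {"age": "senior", "student": "yes", "buys_computer": "yes"},
]

def antecedent(t):
    # IF part of R: age = youth AND student = yes
    return t["age"] == "youth" and t["student"] == "yes"

covered = [t for t in D if antecedent(t)]                      # tuples covered by R
correct = [t for t in covered if t["buys_computer"] == "yes"]  # consequent also holds

print("coverage(R) =", len(covered) / len(D))        # ncovers / |D|      -> 0.5
print("accuracy(R) =", len(correct) / len(covered))  # ncorrect / ncovers -> 0.5
```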
USING IF-THEN RULES
• If more than one rule is triggered, we need conflict resolution:
  • Size ordering: assign the highest priority to the triggering rule that has the "toughest" requirement (i.e., the most attribute tests)
  • Class-based ordering: classes are sorted in decreasing order of prevalence or misclassification cost, and rules for the highest-ranked class fire first
  • Rule-based ordering (decision list): rules are organized into one long priority list, according to some measure of rule quality or by experts
RULE EXTRACTION FROM A DECISION TREE
• Rules are easier to understand than large trees
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction; the leaf holds the class prediction
• Rules are mutually exclusive and exhaustive

[Figure: buys_computer decision tree. The root tests age?; the <=30 branch tests student? (no -> no, yes -> yes), the 31..40 branch is a yes leaf, and the >40 branch tests credit_rating? (excellent -> no, fair -> yes).]

• Example: rule extraction from our buys_computer decision tree (a code sketch follows the rules):
  IF age = young AND student = no THEN buys_computer = no
  IF age = young AND student = yes THEN buys_computer = yes
  IF age = mid-age THEN buys_computer = yes
  IF age = old AND credit_rating = excellent THEN buys_computer = no
  IF age = old AND credit_rating = fair THEN buys_computer = yes
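As a hedged illustration of the same idea in code, scikit-learn's export_text prints one root-to-leaf path per leaf, i.e., one IF-THEN rule each. The tiny integer-encoded dataset below is invented; it only mimics the shape of the tree above, not the slide's actual training data.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented encoding: age 0=young, 1=mid-age, 2=old; student 0=no, 1=yes;
# credit_rating 0=fair, 1=excellent.
X = [[0, 0, 0], [0, 1, 0], [1, 0, 1], [2, 0, 1], [2, 1, 0]]
y = ["no", "yes", "yes", "no", "yes"]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Each printed root-to-leaf path corresponds to one extracted IF-THEN rule.
print(export_text(tree, feature_names=["age", "student", "credit_rating"]))
```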
MODEL EVALUATION AND SELECTION
• Evaluation metrics: How can we measure accuracy? What other metrics should we consider?
• Use a validation test set of class-labeled tuples, rather than the training set, when assessing accuracy (a sketch of this follows below)
• Some of the measures are:
  • Accuracy – suitable when class tuples are evenly distributed
  • Precision – suitable when class tuples are not evenly distributed
  • Recall – also known as sensitivity
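A minimal sketch of the held-out evaluation point above, assuming scikit-learn is available; the iris data merely stands in for any set of class-labeled tuples.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = DecisionTreeClassifier().fit(X_train, y_train)  # fit on training tuples only

print("training accuracy:", clf.score(X_train, y_train))  # optimistic estimate
print("held-out accuracy:", clf.score(X_test, y_test))    # fair assessment
```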
CLASSIFIER EVALUATION METRICS: CONFUSION MATRIX
Confusion Matrix:

Actual class\Predicted class | Yes                  | No
Yes                          | True Positives (TP)  | False Negatives (FN)
No                           | False Positives (FP) | True Negatives (TN)

The same matrix in class notation (C1 vs. ¬C1):

Actual class\Predicted class | C1                   | ¬C1
C1                           | True Positives (TP)  | False Negatives (FN)
¬C1                          | False Positives (FP) | True Negatives (TN)

• Given m classes, an entry CM_{i,j} in a confusion matrix indicates the # of tuples in class i that were labeled by the classifier as class j
• May have extra rows/columns to provide totals
CLASSIFIER EVALUATION METRICS: CONFUSION MATRIX
• True Positives (TP): positive tuples correctly classified as positive.
• True Negatives (TN): negative tuples correctly classified as negative.
• False Positives (FP): negative tuples incorrectly classified as positive.
• False Negatives (FN): positive tuples incorrectly classified as negative.
CLASSIFIER EVALUATION METRICS: CONFUSION MATRIX
Example of a confusion matrix with totals (a sketch of the counting rule follows below):

Actual class\Predicted class | buy_computer = yes | buy_computer = no | Total
buy_computer = yes           | 6954               | 46                | 7000
buy_computer = no            | 412                | 2588              | 3000
Total                        | 7366               | 2634              | 10000
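The table above is just the CM_{i,j} counting rule applied to 10,000 tuples: each cell counts the tuples of actual class i that the classifier labeled as class j. A small sketch with invented label lists:

```python
from collections import Counter

# Invented actual and predicted labels for six tuples.
actual    = ["yes", "yes", "yes", "no", "no", "no"]
predicted = ["yes", "no",  "yes", "no", "yes", "no"]

# cm[(i, j)] = number of class-i tuples the classifier labeled as class j.
cm = Counter(zip(actual, predicted))

for i in ("yes", "no"):
    print(i, [cm[(i, j)] for j in ("yes", "no")])
# yes [2, 1]
# no  [1, 2]
```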
ACCURACY, ERROR RATE, SENSITIVITY AND SPECIFICITY

A\P | C  | ¬C |
C   | TP | FN | P
¬C  | FP | TN | N
    | P' | N' | All

(P and N are the numbers of actual positive and negative tuples; P' and N' are the numbers of tuples predicted positive and negative.)

• Classifier accuracy, or recognition rate: percentage of test set tuples that are correctly classified
  Accuracy = (TP + TN) / All
• Error rate: 1 – accuracy, or
  Error rate = (FP + FN) / All
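As a worked check, plugging the buys_computer matrix from the earlier slide into both formulas:

```python
# From the buys_computer confusion matrix: TP=6954, FN=46, FP=412, TN=2588.
TP, FN, FP, TN = 6954, 46, 412, 2588
All = TP + FN + FP + TN  # 10000

accuracy   = (TP + TN) / All  # (6954 + 2588) / 10000 = 0.9542
error_rate = (FP + FN) / All  # (412 + 46) / 10000    = 0.0458

assert abs(error_rate - (1 - accuracy)) < 1e-12  # error rate = 1 - accuracy
print(accuracy, error_rate)
```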
ACCURACY, ERROR RATE, SENSITIVITY AND SPECIFICITY
• Class Imbalance Problem:
  • One class may be rare, e.g., fraud or HIV-positive
  • A significant majority of tuples belong to the negative class and only a minority to the positive class
• Sensitivity: true positive recognition rate
  Sensitivity = TP / P
• Specificity: true negative recognition rate
  Specificity = TN / N
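Applying these two formulas to the same buys_computer matrix:

```python
TP, FN, FP, TN = 6954, 46, 412, 2588
P = TP + FN  # 7000 actual positive tuples
N = FP + TN  # 3000 actual negative tuples

print("sensitivity =", TP / P)  # 6954 / 7000 ~ 0.9934
print("specificity =", TN / N)  # 2588 / 3000 ~ 0.8627
```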
PRECISION AND RECALL, AND F-MEASURES
• Precision (exactness): what % of the tuples that the classifier labeled as positive are actually positive?
  Precision = TP / (TP + FP)
• Recall (completeness): what % of the positive tuples did the classifier label as positive?
  Recall = TP / (TP + FN)
• A perfect score for either measure is 1.0
CLASSIFIER EVALUATION METRICS: PRECISION AND RECALL, AND F-MEASURES
• There is an inverse relationship between precision and recall
• F measure (F1 or F-score): the harmonic mean of precision and recall
  F1 = (2 × precision × recall) / (precision + recall)
• Fβ: a weighted measure of precision and recall
  Fβ = ((1 + β²) × precision × recall) / (β² × precision + recall)
  • assigns β times as much weight to recall as to precision (a sketch of both follows below)
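A short sketch of both formulas; with β = 1 the weighted form reduces to the harmonic mean F1. The precision and recall values are taken from the example on the next slide.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted F-measure; beta=1 gives the harmonic-mean F1."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 90 / 230, 90 / 300  # precision = 0.3913, recall = 0.30

print("F1 =", f_beta(p, r))           # ~0.3396
print("F2 =", f_beta(p, r, beta=2))   # recall-weighted, ~0.3147
```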
CLASSIFIER EVALUATION METRICS: EXAMPLE

Actual class\Predicted class | cancer = yes | cancer = no | Total | Recognition (%)
cancer = yes                 | 90           | 210         | 300   | 30.00 (sensitivity)
cancer = no                  | 140          | 9560        | 9700  | 98.56 (specificity)
Total                        | 230          | 9770        | 10000 | 96.50 (accuracy)

• Precision = 90/230 = 39.13%
• Recall = 90/300 = 30.00%
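Recomputing the slide's figures from the four matrix cells confirms them:

```python
# Cancer example: TP=90, FN=210, FP=140, TN=9560.
TP, FN, FP, TN = 90, 210, 140, 9560

precision   = TP / (TP + FP)                   # 90 / 230    = 0.3913
recall      = TP / (TP + FN)                   # 90 / 300    = 0.3000 (= sensitivity)
specificity = TN / (FP + TN)                   # 9560 / 9700 = 0.9856
accuracy    = (TP + TN) / (TP + FN + FP + TN)  # 9650 / 10000 = 0.9650

print(precision, recall, specificity, accuracy)
```

Despite 96.50% accuracy, the classifier recognizes only 30% of the actual cancer cases; this is precisely the class-imbalance situation in which sensitivity, precision, and recall are more informative than accuracy alone.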
