
Presentation on

Classification (2)
Section 8.5.1 – 8.5.3

K Saiveer – 30121101
8.5.1 Metrics for Evaluating Classifier Performance

• Focus on the predictive capability of a model,
• rather than on speed (how fast it builds or uses the model), scalability, etc.
• Model evaluation metrics:
• Accuracy (also known as recognition rate)
• Sensitivity (or recall)
• Specificity
• Precision
• F-measure
• F1 and Fβ
Confusion Matrix: a table that is often used to describe the
performance of a classification model (or classifier) on a set of test data
for which the true values are known.

• A confusion matrix is a table of size m by m (for m classes).

• An entry CMi,j in the first m rows and m columns
indicates the number of tuples of class i that were
labeled by the classifier as class j.
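As a minimal sketch (plain Python, with made-up label lists), such an m-by-m matrix can be tallied from true and predicted class labels like this:

```python
def confusion_matrix(y_true, y_pred, classes):
    """Tally an m-by-m matrix: entry [i][j] counts tuples of true class i
    that the classifier labeled as class j."""
    index = {c: k for k, c in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]
    for actual, predicted in zip(y_true, y_pred):
        cm[index[actual]][index[predicted]] += 1
    return cm

# Made-up labels for a two-class ("yes" = positive, "no" = negative) problem
y_true = ["yes", "yes", "no", "no", "yes", "no"]
y_pred = ["yes", "no", "no", "no", "yes", "yes"]
print(confusion_matrix(y_true, y_pred, classes=["yes", "no"]))
# [[2, 1], [1, 2]]  -> rows are actual classes, columns are predicted classes
```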
Before we discuss the various measures, we need
to become comfortable with some terminology.
• True Positives (TP): Positive tuples that were correctly labelled by the
classifier.
• True Negatives (TN) : Negative tuples that were correctly labelled by
the classifier
• False Positives (FP) : Negative tuples that were incorrectly labelled as
positive
• False negatives (FN) : Positive tuples that were mislabelled as negative
• Most widely used metric:

• Classifier accuracy, or recognition rate: the percentage of test set tuples
that are correctly classified:
Accuracy = (TP + TN) / (P + N),
where P = TP + FN is the number of positive tuples and N = FP + TN is the number of negative tuples.

• Error rate, or misclassification rate:
Error rate = 1 – accuracy = (FP + FN) / (P + N)
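A rough sketch of these two formulas in code, assuming the four counts TP, TN, FP, FN are already available (the numbers below are illustrative only):

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of test tuples that are correctly classified."""
    return (tp + tn) / (tp + tn + fp + fn)

def error_rate(tp, tn, fp, fn):
    """Fraction of test tuples that are misclassified (1 - accuracy)."""
    return (fp + fn) / (tp + tn + fp + fn)

# Illustrative counts
print(accuracy(tp=90, tn=9560, fp=140, fn=210))    # 0.965
print(error_rate(tp=90, tn=9560, fp=140, fn=210))  # 0.035
```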


Limitation of Accuracy
• Consider a 2-class problem:
• Number of class 0 examples = 9990
• Number of class 1 examples = 10

• If the model predicts everything to be class 0:
• Accuracy is 9990/10000 = 99.9%.
• Accuracy is misleading because the model does not detect any class 1 example.

• Class Imbalance Problem:
• One class may be rare, e.g., fraud or cancer.
• Significant majority of the negative class and minority of the positive class.
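The arithmetic above can be reproduced with a tiny sketch: a classifier that always predicts class 0 on this hypothetical test set scores 99.9% accuracy but has zero recall on class 1.

```python
# Always-predict-class-0 "classifier" on an imbalanced test set
y_true = [0] * 9990 + [1] * 10
y_pred = [0] * len(y_true)

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

print(acc)             # 0.999 -- looks excellent
print(tp / (tp + fn))  # 0.0   -- but no class 1 tuple is detected
```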
• Sensitivity is also referred to as the true positive (recognition) rate (TPR).
• The proportion of positive tuples that are correctly identified:
Sensitivity = TP / P

• Specificity is the true negative rate (TNR).
• The proportion of negative tuples that are correctly identified:
Specificity = TN / N

• It can be shown that accuracy is a function of sensitivity and specificity:
Accuracy = Sensitivity × P / (P + N) + Specificity × N / (P + N)

• Precision: can be thought of as a measure of exactness.
• What percentage of tuples labelled as positive are actually positive?
Precision = TP / (TP + FP)

• Recall: a measure of completeness.
• What percentage of positive tuples are labelled as positive?
Recall = TP / (TP + FN) = TP / P

• A perfect score is 1.0.
• There is an inverse relationship between precision and recall.
• If recall seems familiar, that is because it is the same as sensitivity (or the true positive rate).
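These definitions translate directly into code; the sketch below uses the four confusion-matrix counts, with illustrative numbers rather than data from any real classifier:

```python
def sensitivity(tp, fn):
    """True positive rate (recall): TP / P, where P = TP + FN."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: TN / N, where N = TN + FP."""
    return tn / (tn + fp)

def precision(tp, fp):
    """Exactness: fraction of tuples labeled positive that really are positive."""
    return tp / (tp + fp)

# Recall is simply sensitivity under another name.
recall = sensitivity

# Illustrative counts
tp, tn, fp, fn = 90, 9560, 140, 210
print(sensitivity(tp, fn))  # 0.30
print(specificity(tn, fp))  # ~0.986
print(precision(tp, fp))    # ~0.391
```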
• F measure (F1 or F-score): the harmonic mean of precision and recall.
• It gives equal weight to precision and recall:
F = 2 × precision × recall / (precision + recall)

• Fβ: a weighted measure of precision and recall.
• It assigns β times as much weight to recall as to precision:
Fβ = (1 + β²) × precision × recall / (β² × precision + recall)

• Commonly used Fβ measures are:
• F2, which weights recall twice as much as precision
• F0.5, which weights precision twice as much as recall
Classifier Evaluation Metrics Example
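As a stand-in for the example, the sketch below works through all of the metrics above for one hypothetical confusion matrix (the counts are invented for illustration):

```python
# Hypothetical confusion-matrix counts for a binary (e.g. cancer = yes/no) classifier
tp, fn, fp, tn = 90, 210, 140, 9560
p, n = tp + fn, fp + tn            # actual positives and negatives

accuracy    = (tp + tn) / (p + n)  # 0.965
error_rate  = (fp + fn) / (p + n)  # 0.035
sensitivity = tp / p               # 0.300  (recall / TPR)
specificity = tn / n               # ~0.986 (TNR)
precision   = tp / (tp + fp)       # ~0.391

f1 = 2 * precision * sensitivity / (precision + sensitivity)

def f_beta(prec, rec, beta):
    """F-beta: weights recall beta times as much as precision."""
    return (1 + beta**2) * prec * rec / (beta**2 * prec + rec)

print(f1)                                        # ~0.340
print(f_beta(precision, sensitivity, beta=2))    # weights recall more
print(f_beta(precision, sensitivity, beta=0.5))  # weights precision more
```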
8.5.2 Holdout Method and Random Subsampling
• Holdout Method:
• This is a basic method for estimating a classifier's accuracy.

• Given a dataset, it is partitioned into two disjoint sets, called the training set and the testing set.

• The classifier is learned from the training set and evaluated on the testing set.

• The proportion of training to testing data is at the discretion of the analyst, typically 1:1 or 2:1, and
there is a trade-off between the sizes of these two sets.

• If the training set is too large, the model may be good, but the accuracy estimate is less
reliable because the testing set is small, and vice versa.
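A minimal sketch of a holdout split in plain Python; the 2:1 training-to-testing ratio, the seed, and the toy dataset are placeholders:

```python
import random

def holdout_split(data, train_fraction=2/3, seed=0):
    """Shuffle the data and partition it into disjoint training and testing sets."""
    rng = random.Random(seed)
    shuffled = data[:]             # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Placeholder dataset of 12 tuples
dataset = list(range(12))
train_set, test_set = holdout_split(dataset)
print(len(train_set), len(test_set))   # 8 4  -> roughly a 2:1 split
```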
Random Subsampling
• It is a variation of the holdout method that reduces the risk of over-representing a class in one set
(and thus under-representing it in the other) in a single split.

• In this method, the holdout method is repeated k times, and each time the two
disjoint sets are chosen at random with predefined sizes.

• The overall estimate is taken as the average of the estimates obtained from the
iterations.
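A sketch of random subsampling; `evaluate` is a hypothetical callback that would train a classifier on the training set and return its accuracy on the testing set:

```python
import random

def random_subsampling(data, evaluate, k=10, train_fraction=2/3, seed=0):
    """Repeat the holdout method k times and average the k accuracy estimates.

    `evaluate(train_set, test_set)` is a placeholder callback that trains a
    classifier on train_set and returns its accuracy on test_set.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(k):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        estimates.append(evaluate(shuffled[:cut], shuffled[cut:]))
    return sum(estimates) / k

# Toy usage: a fake evaluator that just reports the test-set size ratio
dataset = list(range(30))
print(random_subsampling(dataset, lambda tr, te: len(te) / len(dataset)))
```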
8.5.3 Cross-Validation
• The main drawback of random subsampling is that it has no control over the number of times
each tuple is used for training and testing.

• Cross-validation is proposed to overcome this problem.

• There are two variations of the cross-validation method:

• k-fold cross-validation

• N-fold cross-validation
k-fold Cross-Validation
• A dataset consisting of N tuples is divided into k (usually 5 or 10) equal,
mutually exclusive parts or folds; if N is not divisible by k, the last
fold has fewer tuples than the other (k-1) folds.

• A series of k runs is carried out with this decomposition: in the ith iteration, fold i is
used as test data and the other folds as training data.
• Thus, each tuple is used the same number of times for training and exactly once for testing.

• The overall estimate is taken as the average of the estimates obtained from the k
iterations.
[Figure: the dataset is partitioned into folds D1, ..., Di, ..., Dk; in each run, one fold is held out as test data while the remaining folds are passed to the learning technique, and the resulting classifier's accuracy/performance is recorded.]
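A minimal sketch of k-fold cross-validation along the same lines (the `evaluate` callback is again a placeholder for training and testing a classifier):

```python
import math

def k_fold_cross_validation(data, evaluate, k=10):
    """Split data into k folds; in run i, fold i is the test set and the rest train.

    `evaluate(train_set, test_set)` is a placeholder that trains a classifier
    and returns its accuracy on the test set.
    """
    fold_size = math.ceil(len(data) / k)   # last fold may have fewer tuples
    estimates = []
    for i in range(k):
        test_fold = data[i * fold_size:(i + 1) * fold_size]
        train_folds = data[:i * fold_size] + data[(i + 1) * fold_size:]
        estimates.append(evaluate(train_folds, test_fold))
    return sum(estimates) / k

# Toy usage with a fake evaluator that reports the test-fold size
dataset = list(range(23))
print(k_fold_cross_validation(dataset, lambda tr, te: len(te), k=5))  # 4.6
```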
N-fold Cross-Validation
• In the k-fold cross-validation method, only part of the given data is used for training
in each of the k tests.

• N-fold cross-validation is an extreme case of k-fold cross-validation, often
known as "leave-one-out" cross-validation.

• Here, the dataset is divided into as many folds as there are instances; thus, almost
all of the tuples form each training set, and N classifiers are built.

• In this method, therefore, each of the N classifiers is built from N-1 instances and
is used to classify the single held-out test instance.

• The test sets are mutually exclusive and together cover the entire dataset (in
sequence). The effect is as if the model were trained on the entire dataset and also tested on
the entire dataset.

• The overall estimate is then the average of the results of the N classifiers.
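Leave-one-out is simply the k-fold procedure with k = N; a minimal sketch, where the hypothetical `evaluate` callback builds a classifier from the N-1 training tuples and returns 1 or 0 depending on whether the single held-out tuple is classified correctly:

```python
def leave_one_out_cross_validation(data, evaluate):
    """N-fold (leave-one-out) cross-validation: each tuple is the test set once.

    `evaluate(train_set, test_tuple)` is a placeholder that builds a classifier
    from the N-1 training tuples and returns 1 if it classifies the held-out
    tuple correctly, else 0.
    """
    results = []
    for i, test_tuple in enumerate(data):
        train_set = data[:i] + data[i + 1:]   # all tuples except the i-th
        results.append(evaluate(train_set, test_tuple))
    return sum(results) / len(data)           # overall accuracy estimate

# Toy usage: a fake evaluator that "predicts correctly" for even-valued tuples
dataset = list(range(10))
print(leave_one_out_cross_validation(dataset, lambda tr, x: 1 if x % 2 == 0 else 0))
# 0.5
```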


N-fold Cross-Validation: Issues
• As far as the estimation of a classifier model's accuracy and performance is
concerned, N-fold cross-validation is comparable to the other methods we have
just discussed.

• The drawback of the N-fold cross-validation strategy is that it is
computationally expensive, as the run has to be repeated N times; this
is particularly costly when the dataset is large.

• In practice, the method is most beneficial with very small datasets,
where as much data as possible needs to be used to train the classifier.
Thank
you.
