
Machine Learning

EE514 – CS535

Analysis and Evaluation of Classifier Performance
and Multi-class Classification

Zubair Khalid

School of Science and Engineering


Lahore University of Management Sciences

https://www.zubairkhalid.org/ee514_2023.html
Outline

- Classification Accuracy (0/1 Loss)


- TP, TN, FP and FN
- Confusion Matrix
- Sensitivity, Specificity, Precision Trade-offs, ROC, AUC
- F1-Score and Matthews Correlation Coefficient
- Multi-class Classification, Evaluation, Micro, Macro Averaging
Evaluation of Classification Performance
Classification Accuracy, Misclassification Rate (0/1 Loss):

- For each test-point, the loss is either 0 or 1, depending on whether the prediction is correct or incorrect.
- Averaged over n data-points, this loss gives the 'Misclassification Rate'.

Interpretation:
- Misclassification Rate: Estimate of the probability that a point is incorrectly classified.
- Accuracy = 1- Misclassification rate

Issue:
- Not meaningful when the classes are imbalanced or skewed.
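A minimal sketch in Python (the labels below are hypothetical, not from the slides): the misclassification rate is simply the average 0/1 loss over the test set, and accuracy is its complement.

```python
# 0/1 loss per test point, averaged into the misclassification rate.
y_true = [1, 1, 1, 0, 1, 0, 1, 1, 1, 1]   # hypothetical test labels
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]   # a model that always predicts class 1

losses = [0 if yp == yt else 1 for yp, yt in zip(y_pred, y_true)]
misclassification_rate = sum(losses) / len(losses)   # 2 errors / 10 points = 0.2
accuracy = 1 - misclassification_rate                # 0.8

print(misclassification_rate, accuracy)
```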
Evaluation of Classification Performance
Classification Accuracy (0/1 Loss):
Example:
- Predict if a bowler will not bowl a no-ball?
- Assuming 15 no-balls in an innings, a model that always answers 'Yes' (not a no-ball) achieves roughly 95% accuracy.
- Using accuracy as the performance metric, this model appears very accurate, yet it is of no practical use.

Why?
- Total balls: 315 (assuming all other balls are legal ☺)
- No-ball label: Class 0 (4.76% of the data)
- Not a no-ball label: Class 1 (95.24% of the data)
- The classes are imbalanced.
Evaluation of Classification Performance
TP, TN, FP and FN:
- Consider a binary classification problem.
Evaluation of Classification Performance
TP, TN, FP and FN:
Evaluation of Classification Performance
TP, TN, FP and FN:
Example:
- Predict if a bowler will not bowl a no-ball?
- 15 no-balls in an inning (Total balls: 315)
- Bowl no-ball (Class 0), Bowl regular ball (Class 1)
- Model(*) predicted 10 no-balls (8 correct predictions, 2 incorrect)

* Assume you have a model that has been observing the bowlers for the last 15 years
and used these observations for learning.
Evaluation of Classification Performance
Confusion Matrix (Contingency Table):
- The counts (TP, TN, FP, FN) are usefully summarized in a table referred to as the confusion matrix:
- the rows correspond to the predicted class (ŷ),
- and the columns to the true class (y).

                                Actual Labels
                                1 (Positive)    0 (Negative)    Total
Predicted    1 (Positive)       TP              FP              Predicted Total Positives
Labels       0 (Negative)       FN              TN              Predicted Total Negatives
             Total              P = TP + FN     N = FP + TN
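As a small sketch (Python; the labels and the helper function below are our own, not from the slides), the four counts can be tallied directly and arranged with rows = predicted class and columns = actual class, as in the table above.

```python
# Tally TP, FP, FN, TN for binary labels (1 = positive, 0 = negative).
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p == 0 and t == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if p == 0 and t == 0)
    return tp, fp, fn, tn

y_true = [1, 1, 0, 1, 0, 1, 0, 1]   # hypothetical actual labels
y_pred = [1, 0, 0, 1, 1, 1, 0, 1]   # hypothetical predictions

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print([[tp, fp],    # row: predicted positive
       [fn, tn]])   # row: predicted negative
```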
Evaluation of Classification Performance
Confusion Matrix:
Example: Disease Detection
- Given pathology reports and scans, predict heart disease (Yes: 1, No: 0).

                                Actual Labels
                                1 (Positive)    0 (Negative)    Total
Predicted    1 (Positive)       TP = 100        FP = 10         110
Labels       0 (Negative)       FN = 5          TN = 50         55
             Total              P = 105         N = 60

Interpretation:

Out of 165 cases,
- Predicted: "Yes" 110 times, and "No" 55 times
- In reality: "Yes" 105 times, and "No" 60 times


Evaluation of Classification Performance
Confusion Matrix:
Example:
- Predict if a bowler will not bowl a no-ball?

                                Actual Labels
                                1 (Positive)    0 (Negative)    Total
Predicted    1 (Positive)       TP = 298        FP = 7          305
Labels       0 (Negative)       FN = 2          TN = 8          10
             Total              P = 300         N = 15

Interpretation:
Out of 315 balls, we had 15 no-balls.
- The model predicted 305 regular balls and 10 no-balls (8 correct no-ball predictions, 2 incorrect).
Evaluation of Classification Performance
Confusion Matrix:
Metrics using Confusion Matrix:

- Accuracy: Overall, how frequently is the classifier correct?
  Accuracy = (TP + TN) / (TP + TN + FP + FN)

- Misclassification or Error Rate: Overall, how frequently is it wrong?
  Error Rate = (FP + FN) / (TP + TN + FP + FN) = 1 − Accuracy

- Sensitivity, Recall or True Positive Rate (TPR): When it is actually positive, how often does it predict positive?
  TPR = TP / (TP + FN)
Evaluation of Classification Performance
Confusion Matrix:
Metrics using Confusion Matrix:

- False Positive Rate (FPR): When it is actually negative, how often does it predict positive?
  FPR = FP / (FP + TN)

- Specificity or True Negative Rate (TNR): When it is actually negative, how often does it predict negative?
  TNR = TN / (TN + FP) = 1 − FPR

- Precision: When it predicts positive, how often is it correct?
  Precision = TP / (TP + FP)


Evaluation of Classification Performance
Confusion Matrix Metrics:

- Negative Predictive Value (NPV): When it predicts negative, how often is it correct?
  NPV = TN / (TN + FN)


Evaluation of Classification Performance
Confusion Matrix:
Metrics using Confusion Matrix (Example: Disease Prediction):

- Accuracy: Disease/Healthy prediction accuracy

= (100+50)/165 = 0.91

- Misclassification or Error Rate: How often is the Disease/Healthy prediction wrong?

= (10+5)/165 = 0.09

- Sensitivity or Recall or True Positive Rate (TPR): When the patient actually has the disease, how often does the model detect it?

= 100/105 = 0.95
Evaluation of Classification Performance
Confusion Matrix:
Metrics using Confusion Matrix (Example: Disease Prediction):

- False Positive Rate: When the patient is actually healthy, how often does the model predict disease?

= 10/60 = 0.17

- Specificity or True Negative Rate (TNR): When the patient is actually healthy, how often does the model predict healthy?
= 50/60 = 0.83

- Precision: When it predicts disease, how often is it correct?

= 100/110 = 0.91
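For completeness, a short Python sketch (variable names are ours) recomputing the metrics above from the disease-detection confusion matrix:

```python
# Disease-detection example: TP = 100, FP = 10, FN = 5, TN = 50.
TP, FP, FN, TN = 100, 10, 5, 50
total = TP + FP + FN + TN                # 165 cases

accuracy    = (TP + TN) / total          # 150/165 ≈ 0.91
error_rate  = (FP + FN) / total          # 15/165  ≈ 0.09
sensitivity = TP / (TP + FN)             # 100/105 ≈ 0.95  (recall, TPR)
fpr         = FP / (FP + TN)             # 10/60   ≈ 0.17
specificity = TN / (TN + FP)             # 50/60   ≈ 0.83  (TNR)
precision   = TP / (TP + FP)             # 100/110 ≈ 0.91

print(accuracy, error_rate, sensitivity, fpr, specificity, precision)
```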
Evaluation of Classification Performance
Confusion Matrix:
Metrics using Confusion Matrix:
When to use which?

- Disease detection: we do not want false negatives (missing a diseased patient is costly), so recall/sensitivity matters most.

- Fraud detection: we do not want false positives (flagging a legitimate transaction is costly), so precision matters most.


Outline

- Classification Accuracy (0/1 Loss)


- TP, TN, FP and FN
- Confusion Matrix
- Sensitivity, Specificity, Precision Trade-offs, ROC, AUC
- F1-Score and Matthews Correlation Coefficient
- Multi-class Classification, Evaluation, Micro, Macro Averaging
Evaluation of Classification Performance
Confusion Matrix:
Precision and Sensitivity (Recall) Trade-off:
Sensitivity (Recall) = TP / (TP + FN)          Precision = TP / (TP + FP)

- Disease Detection:

- Recall or Sensitivity (Se): how good we are at detecting diseased people.

- Precision: of those diagnosed as unhealthy, how many actually are unhealthy.

- If we diagnose everyone as unhealthy, Se = 1 (all unhealthy people are detected correctly), but precision may be low (TN = 0, so every healthy person becomes a false positive).

- We want high Precision and high Se (ideally both equal to 1).

- We should combine precision and sensitivity into a single measure to evaluate the performance of the classifier:
- F1-Score
Evaluation of Classification Performance
Confusion Matrix:
Sensitivity and Specificity Trade-off:
Sensitivity (Recall) = TP / (TP + FN)          Specificity = TN / (TN + FP)
- Disease Detection:

- Sp and Se; how good we are at detecting healthy and diseased people, respectively.

- If we have diagnosed everyone healthy, Sp=1 (diagnose all healthy people correctly) but
Se=0 (diagnose all unhealthy people incorrectly)

- Ideally, we want Sp = Se = 1 (perfect sensitivity and specificity), but this is usually unrealistic.
Evaluation of Classification Performance
Confusion Matrix:
Sensitivity and Specificity Trade-off:
How optimal is a given pair of sensitivity and specificity values?
- Is Sp = 0.8, Se = 0.7 better than Sp = 0.7, Se = 0.8?

[Figure: sweeping the decision threshold trades sensitivity against specificity, from Se = 1 at one extreme to Sp = 1 at the other.]

- The answer depends on the application.

- In disease diagnosis, we are happy to reduce Sp in order to increase Se.

- In other applications, we may have different requirements.

- Trade-off is better explained by ROC curve and AUC.


Evaluation of Classification Performance
Confusion Matrix:
ROC (Receiver Operating Characteristic) Curve:
- Plot of TPR (Sensitivity) against FPR (1 – Specificity)
for different values of threshold.

- Also referred to as the Sensitivity vs (1 − Specificity) plot.

- At a threshold of 0.0, every case is diagnosed as positive.


- Se= TPR = 1
- FPR = 1
- Sp= 0

- At a threshold of 1.0, every case is diagnosed as negative.


- Se= TPR = 0
- FPR = 0
- Sp= 1
Evaluation of Classification Performance
Confusion Matrix:
ROC Curve and AUC:

- TPR (Sensitivity): how many correct positive results


occur among all positive samples.

- FPR (1 – Specificity): how many incorrect positive


results occur among all negative samples.

- The best possible prediction method: Se = Sp = 1 (the upper-left corner of the ROC space).

- Random guessing: a point along the diagonal line (the so-called line of no discrimination), i.e., no discriminative power.

- The Area Under the ROC Curve (AUC) quantifies the discriminative power of the classifier.
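A hedged sketch of how an ROC curve and its AUC might be computed in practice, assuming scikit-learn (not prescribed by the slides) and synthetic scores:

```python
# ROC curve (TPR vs FPR over a threshold sweep) and AUC with scikit-learn.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y_true  = rng.integers(0, 2, size=200)            # hypothetical binary labels
y_score = y_true * 0.5 + 0.8 * rng.random(200)    # noisy classifier scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one (FPR, TPR) pair per threshold
auc = roc_auc_score(y_true, y_score)               # area under the ROC curve

print(auc)   # 1.0 = perfect ranking, 0.5 = no discriminative power
```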
Outline

- Classification Accuracy (0/1 Loss)


- TP, TN, FP and FN
- Confusion Matrix
- Sensitivity, Specificity, Precision Trade-offs, ROC, AUC
- F1-Score and Matthews Correlation Coefficient
- Multi-class Classification, Evaluation, Micro, Macro Averaging
Evaluation of Classification Performance
F1-Score:
- We observed trade-off between recall and precision.

- Higher levels of recall may be obtained at the price of lower values of precision.

- We need to define a single measure that combines recall and precision or other
metrics to evaluate the performance of a classifier.

- Some combined measures:


- F1 Score
- Matthews Correlation Coefficient
- 11-point average precision
- The Breakeven point
Evaluation of Classification Performance
F1 Score:

- One measure that assesses the recall-precision trade-off is the weighted harmonic mean (HM) of recall and precision, that is,

  F = 1 / ( α · (1/Precision) + (1 − α) · (1/Recall) ),

  which for equal weights (α = 1/2) gives the F1-score: F1 = 2 · Precision · Recall / (Precision + Recall).
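A minimal Python sketch of this weighted harmonic mean; the helper name and the example numbers are our own:

```python
# Weighted harmonic mean of precision and recall; alpha = 0.5 gives the F1-score.
def f_measure(precision, recall, alpha=0.5):
    return 1.0 / (alpha / precision + (1 - alpha) / recall)

precision, recall = 0.91, 0.95
print(f_measure(precision, recall))         # F1 ≈ 0.93
print(f_measure(precision, recall, 0.75))   # weights precision more heavily
```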
Evaluation of Classification Performance
F1 Score:
Why harmonic mean?
- We could also use arithmetic mean (AM) or geometric mean (GM).

- HM is preferred because it penalizes the model the most; it is a conservative average, that is, for two positive real numbers a and b,

  min(a, b) ≤ HM ≤ GM ≤ AM ≤ max(a, b).

- Since HM lower-bounds GM and AM, a high HM guarantees that GM and AM are at least as high.

[Figure: the different means (HM, GM, AM), along with the minimum and maximum, plotted against precision, with recall fixed at 70%.]
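A small numerical sketch (Python, with illustrative precision values of our choosing) echoing the figure: with recall fixed at 0.7, the harmonic mean is always the most conservative of the three averages.

```python
# Compare arithmetic, geometric and harmonic means with recall fixed at 0.7.
import math

recall = 0.7
for precision in [0.1, 0.3, 0.5, 0.7, 0.9]:
    am = (precision + recall) / 2
    gm = math.sqrt(precision * recall)
    hm = 2 * precision * recall / (precision + recall)
    print(f"P={precision:.1f}  AM={am:.2f}  GM={gm:.2f}  HM={hm:.2f}")
```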
Evaluation of Classification Performance
Matthews Correlation Coefficient (MCC):
- Precision, Recall and F1-score are asymmetric: they give a different result if the positive and negative classes are switched.

- The Matthews correlation coefficient measures the correlation between the true class and the predicted class. The higher the correlation between true and predicted values, the better the prediction.

- Defined as

  MCC = (TP · TN − FP · FN) / sqrt( (TP + FP)(TP + FN)(TN + FP)(TN + FN) )

- MCC = 1 when FP = FN = 0 (perfect classification)
- MCC = −1 when TP = TN = 0 (perfect misclassification)
- MCC = 0: the classifier performs no better than a random classifier (coin flip)
- MCC is symmetric by design
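A short Python sketch of MCC computed directly from the four counts; the function name is ours and the example numbers are taken from the disease-detection example above:

```python
# Matthews correlation coefficient from the confusion-matrix counts.
import math

def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:          # degenerate case (a whole row or column is zero)
        return 0.0
    return (tp * tn - fp * fn) / denom

print(mcc(tp=100, tn=50, fp=10, fn=5))   # ≈ 0.80 for the disease example
```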
Evaluation of Classification Performance
11-point Average Precision:
- Adjust the threshold of the classifier so that recall takes the following 11 values: 0.0, 0.1, …, 0.9, 1.0.

- For each value of the recall, determine the precision and find the average value of precision,
referred to as average precision (AP).

- This is simply uniformly spaced sampling of the precision-recall curve, followed by averaging.

The Breakeven Point:


- Compute precision as a function of recall for different values of thresholds.

- The point where Precision = Recall is called the breakeven point.

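A possible sketch of 11-point average precision, assuming numpy and scikit-learn's precision_recall_curve (neither is specified in the slides) and hypothetical scores; at each recall level r the precision is taken as the best precision achievable at recall ≥ r (the usual interpolated form):

```python
# 11-point (interpolated) average precision from a precision-recall curve.
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true  = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 1])                         # hypothetical labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.45, 0.7, 0.9, 0.2, 0.6, 0.55])  # hypothetical scores

precision, recall, _ = precision_recall_curve(y_true, y_score)

# Best precision at recall >= r, averaged over r = 0.0, 0.1, ..., 1.0.
ap_11 = np.mean([precision[recall >= r].max() for r in np.linspace(0.0, 1.0, 11)])
print(ap_11)
```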

Outline

- Classification Accuracy (0/1 Loss)


- TP, TN, FP and FN
- Confusion Matrix
- Sensitivity, Specificity, Precision Trade-offs, ROC, AUC
- F1-Score and Matthews Correlation Coefficient
- Multi-class Classification, Evaluation, Micro, Macro Averaging
Multi-Class Classification
Formulation:
- Given training data {(x_i, y_i)}, i = 1, …, n, learn a classifier f : X → {1, 2, …, C} that assigns each input to one of C > 2 classes.

Examples:
- Emotion Detection.

- Vehicle type, make, model and color of a vehicle from images streamed by safe-city cameras.

- Speaker Identification from Speech Signal.

- State (rest, ramp-up, normal, ramp-down) of the process machine in the plant.

- Sentiment Analysis (Categories: Positive, Negative, Neutral), Text Analysis.

- Take an image of the sky and determine the pollution level (healthy, moderate, hazardous).

- Record Home WiFi signals and identify the type of appliance being operated.
Multi-Class Classification
Implementation (possible options using binary classifiers):
Option 1: Build a one-vs-all (OvA), also called one-vs-rest (OvR), classifier: train one binary classifier per class, separating that class from all the others.

Option 2: Build an all-vs-all (AvA), also called one-vs-one (OvO), classifier: train one binary classifier for every pair of classes.

There can be other options… A sketch of the first two options is shown below.
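As referenced above, a brief sketch of both options using scikit-learn's wrappers (an assumption; the slides do not prescribe any library), with logistic regression as the underlying binary learner:

```python
# One-vs-rest and one-vs-one multi-class classification from binary learners.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

X, y = load_iris(return_X_y=True)   # 3-class toy dataset

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)  # C binary models
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)   # C(C-1)/2 models

print(ovr.predict(X[:5]))
print(ovo.predict(X[:5]))
```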


Evaluation of Classification Performance
Multiclass Classification:

- How do we define the measures for the evaluation of the performance of multi-class classifier?

- Macro-averaging: We compute performance for each class and then average.

- Micro-averaging: Compute confusion matrix after collecting decisions for all classes and then
evaluate.
Evaluation of Classification Performance
Multiclass Classification:
Confusion Matrix:
- Predict if a bowler will bowl a no-ball, a wide ball, or a regular ball?
- 15 no-balls and 20 wide-balls in an innings (total balls: 335)
- Model Predictions:

                              Actual
                              No-ball    Wide-ball    Regular ball
Classifier    No-ball             8           5            20
Output        Wide-ball           2          10            10
              Regular ball        5           5           270

(Precision is computed along each row of the classifier output; recall along each column of the actual classes.)
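As a sketch (Python with numpy; the variable names are ours), the per-class recall and precision can be read directly off this matrix:

```python
# Per-class recall/precision for the no-ball / wide-ball / regular-ball example.
import numpy as np

# Rows = classifier output, columns = actual class.
C = np.array([[  8,   5,  20],
              [  2,  10,  10],
              [  5,   5, 270]])

recall    = np.diag(C) / C.sum(axis=0)   # per actual class: 8/15, 10/20, 270/300
precision = np.diag(C) / C.sum(axis=1)   # per predicted class: 8/33, 10/22, 270/280
accuracy  = np.trace(C) / C.sum()        # 288/335 ≈ 0.86

print(recall, precision, accuracy)
```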
Evaluation of Classification Performance
Multiclass Classification:
Confusion Matrix – Recall and Precision:

Recall
- For the i-th class, recall is the fraction of data-points of class i that are classified correctly, that is,
  Recall_i = C_ii / Σ_k C_ki   (diagonal entry divided by the column total of class i)

Precision
- For the i-th class, precision is the fraction of data-points predicted to be in class i that are actually in the i-th class, that is,
  Precision_i = C_ii / Σ_k C_ik   (diagonal entry divided by the row total of class i)

Accuracy
- Fraction of data-points classified correctly, that is,
  Accuracy = Σ_i C_ii / Σ_i Σ_j C_ij
Evaluation of Classification Performance
Multiclass Classification:
Confusion Matrix – Macro-Averaging:
- We compute performance for each class and then average.

Confusion Matrix – Each Class:

No-ball class:
                              Actual No-ball    Actual Not No-ball
Classifier    No-ball               8                  25
Output        Not No-ball           7                 295

Wide class:
                              Actual Wide       Actual Not Wide
Classifier    Wide                 10                  12
Output        Not Wide             10                 303

Regular class:
                              Actual Regular    Actual Not Regular
Classifier    Regular             270                  10
Output        Not Regular          30                  25

Per-class Recall: 8/15 ≈ 0.53, 10/20 = 0.50, 270/300 = 0.90

Macro-average Recall = (0.53 + 0.50 + 0.90) / 3 ≈ 0.64
Evaluation of Classification Performance
Multiclass Classification:
Confusion Matrix – Micro-Averaging:
- Compute a pooled confusion matrix after collecting the decisions for all classes, and then evaluate.

Pooled confusion matrix (sums of the per-class TP, FP, FN, TN):
                              Actual Positive    Actual Negative
Predicted Positive                 288                 47
Predicted Negative                  47                623

Micro-average Recall = 288 / (288 + 47) ≈ 0.86

Confusion Matrix – Each Class: the same per-class matrices as on the macro-averaging slide above.
Evaluation of Classification Performance
Multiclass Classification:
Micro-Averaging vs Macro Averaging:
- Note: Micro-average recall = Micro-average precision = Micro F1-Score = Accuracy (computed from the pooled confusion matrix).
- Micro-averaging is therefore a global metric.
- Consequently, it is not a good measure when the classes are imbalanced, since the frequent classes dominate the average.

- Macro-averaging is relatively better in this respect, as we see a zoomed-in, per-class picture before averaging.

- Note, however, that macro-averaging does not take class imbalance into account (every class gets equal weight).

- Weighted averaging: similar to macro-averaging, but takes a weighted mean instead, where the weight of each class is the number of data-points of that class (normalized by the total).

Weighted-average Recall = (15 · (8/15) + 20 · (10/20) + 300 · (270/300)) / 335 = 288 / 335 ≈ 0.86
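Finally, a sketch (Python with numpy) of the macro-, micro- and weighted-average recall for this example; note that for recall the weighted average coincides with the micro average, since each class's recall multiplied by its count is just the corresponding diagonal entry.

```python
# Macro-, micro- and weighted-average recall from the 3x3 confusion matrix.
import numpy as np

C = np.array([[  8,   5,  20],   # rows = classifier output
              [  2,  10,  10],   # columns = actual class
              [  5,   5, 270]])

per_class_recall = np.diag(C) / C.sum(axis=0)   # [0.53, 0.50, 0.90]
class_counts     = C.sum(axis=0)                # [15, 20, 300]

macro_recall    = per_class_recall.mean()                              # ≈ 0.64
micro_recall    = np.trace(C) / C.sum()                                # 288/335 ≈ 0.86
weighted_recall = (per_class_recall * class_counts).sum() / C.sum()    # ≈ 0.86

print(macro_recall, micro_recall, weighted_recall)
```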
Evaluation of Classification Performance
References:

- KM 5.7.2
