
Lectures 3-5

This document discusses different machine learning paradigms and evaluation metrics. It describes supervised learning, unsupervised learning, and reinforcement learning. It then discusses classification accuracy, precision, recall, sensitivity, and specificity as common evaluation metrics. Precision measures the fraction of retrieved instances that are relevant, while recall measures the fraction of relevant instances that are retrieved. Sensitivity measures the proportion of actual positives correctly identified.


ML Paradigms and Evaluation Metrics
Types of ML Paradigms

• Supervised learning
– Classification
– Regression
• Unsupervised learning
• Reinforcement learning
Supervised Learning
• Given: training data + desired outputs (labels)
• (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
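A minimal sketch (not from the slides) of what "learning f" can mean in practice: the hypothetical snippet below predicts y for a new x using a 1-nearest-neighbour rule over the labeled pairs; the data values are made up for illustration.

    # Apply f(x): predict y as the label of the closest training example
    def predict(x, training_pairs):
        nearest_x, nearest_y = min(training_pairs, key=lambda pair: abs(pair[0] - x))
        return nearest_y

    pairs = [(1.0, "cat"), (1.2, "cat"), (3.8, "dog"), (4.1, "dog")]  # (x_i, y_i)
    print(predict(3.5, pairs))  # -> "dog"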

(Figures: supervised classification examples, images of cats vs. dogs.)
Unsupervised Learning
• Given: training data (without desired outputs)
• x1, x2, ..., xn (without labels)
• Output the hidden structure behind the x’s, e.g., group them by similarity (clustering)
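As a hedged illustration of grouping by similarity (not from the slides), the sketch below runs a tiny 1-D k-means with k = 2; the data points are invented and no labels are used.

    # Group unlabeled 1-D points into two clusters by repeatedly assigning each
    # point to its nearest centroid and recomputing the centroids
    def kmeans_1d(xs, iters=10):
        c0, c1 = min(xs), max(xs)                               # initialise centroids
        for _ in range(iters):
            g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]  # nearest to c0
            g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]   # nearest to c1
            c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)       # update centroids
        return g0, g1

    print(kmeans_1d([1.0, 1.1, 0.9, 5.0, 5.2, 4.8]))  # two groups emerge without labels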
Reinforcement Learning

• Rewards from a sequence of actions
Reinforcement Learning
• Playing games: Atari, Chess, Checkers
• Robot navigation
• Talent acquisition
Applications of Learning Paradigms
Supervised
• Person identification
• Object recognition
• Stock prediction

Reinforcement
• Game playing
• Credit assignment

Unsupervised
• Social network analysis
• Dimensionality reduction
• Market segmentation
More Recently…
• Combinations of these paradigms are being explored:
– Semi-supervised learning
– Self-supervised learning
– Supervised + reinforcement
– ….
History of ML
• 1950s
– Samuel’s checker player
– Selfridge’s Pandemonium
• 1960s:
– Neural networks: Perceptron
– Pattern recognition
– Learning in the limit theory
– Minsky and Papert prove limitations of Perceptron
• 1970s:
– Symbolic concept induction
– Winston’s arch learner
– Expert systems and the knowledge acquisition bottleneck
– Quinlan’s ID3
– Michalski’s AQ and soybean diagnosis
– Scientific discovery with BACON
– Mathematical discovery with AM
History of ML
• 2000s
– Support vector machines & kernel methods
– Graphical models
– Statistical relational learning
– Transfer learning
– Sequence labeling
– Collective classification and structured outputs
– Computer Systems Applications (Compilers, Debugging, Graphics, Security)
– E-mail management
– Personalized assistants that learn
– Learning in robotics and vision
• 2010s
– Deep learning systems
– Learning for big data
– Bayesian methods
– Multi-task & lifelong learning
– Applications to vision, speech, social networks, learning to read, etc.
• 2020s
– Deep learning
– ???
Based on slide by Ray Mooney
Evaluation Metrics:
Let us design a simple classification algorithm.
Laptop bag vs. Purse: Design a classifier
• Features:
– Width
– Height
– Weight

• Classifier: threshold
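A rough sketch (not from the slides) of such a threshold classifier; the feature values and the weight threshold below are invented purely for illustration.

    # Hypothetical rule: laptop bags tend to be heavier than purses
    def classify(width_cm, height_cm, weight_kg, weight_threshold=1.5):
        return "laptop bag" if weight_kg >= weight_threshold else "purse"

    print(classify(width_cm=40, height_cm=30, weight_kg=2.0))  # -> laptop bag
    print(classify(width_cm=25, height_cm=15, weight_kg=0.6))  # -> purse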
Evaluation Metrics
Let the problem be classifying purses vs. bags. Purses are labeled as the positive class and bags as the negative class.

                      Predicted Negative      Predicted Positive
Actual Negative       A (true negative)       C (false positive)
Actual Positive       D (false negative)      B (true positive)

Term             Meaning                    Example
True positive    Correct classification     Purse identified as purse
False positive   Incorrect classification   Bag identified as purse
True negative    Correct classification     Bag identified as bag
False negative   Incorrect classification   Purse identified as bag
Source: http://www.cs.rpi.edu/~leen/misc-publications/SomeStatDefs.html
Evaluation Metrics
Example 1 (balanced classes):
                               Predicted Negative        Predicted Positive
Actual Negative (50 samples)   A (true negative) = 40    C (false positive) = 10
Actual Positive (50 samples)   D (false negative) = 20   B (true positive) = 30

Example 2 (imbalanced classes):
                               Predicted Negative        Predicted Positive
Actual Negative (95 samples)   A (true negative) = 90    C (false positive) = 5
Actual Positive (5 samples)    D (false negative) = 3    B (true positive) = 2

Metric Formula
Average classification accuracy [(TN + TP) / (TN + FP + TP + FN)]
Class-wise classification accuracy [TN / (TN + FP) + TP / (TP + FN)]/2
Type I error (false positive rate) FP / (TN + FP)
Type II error (false negative rate) FN / (FN + TP)
True positive rate TP / (TP + FN)
True negative rate TN / (TN + FP)
Source: http://www.cs.rpi.edu/~leen/misc-publications/SomeStatDefs.html
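As a worked illustration of the formulas above (a sketch, not part of the slides), the snippet below evaluates both confusion matrices; note how average accuracy looks good on the imbalanced example while class-wise accuracy reveals the weak positive class.

    # TN, FP, FN, TP correspond to cells A, C, D, B of the confusion matrix
    def metrics(tn, fp, fn, tp):
        return {
            "average accuracy": (tn + tp) / (tn + fp + fn + tp),
            "class-wise accuracy": (tn / (tn + fp) + tp / (tp + fn)) / 2,
            "false positive rate": fp / (tn + fp),
            "false negative rate": fn / (fn + tp),
            "true positive rate": tp / (tp + fn),
            "true negative rate": tn / (tn + fp),
        }

    print(metrics(tn=40, fp=10, fn=20, tp=30))  # balanced: avg acc 0.70, class-wise 0.70
    print(metrics(tn=90, fp=5, fn=3, tp=2))     # imbalanced: avg acc 0.92, class-wise ~0.67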
Evaluation Metrics
Metric Formula
Average classification accuracy (TN + TP) / (TN+TP+FN+FP)
Type I error (false positive rate) FP / (TN + FP)
Type II error (false negative rate) FN / (FN + TP)
True positive rate TP / (TP + FN)
True negative rate TN / (TN + FP)

• Type I error or false positive rate: the chance that a (randomly selected) actually negative sample is incorrectly classified as positive
• Type II error or false negative rate: the chance that a (randomly selected) actually positive sample is incorrectly classified as negative
These error rates are prevalent in computer vision and image processing related classification problems.
Evaluation Metrics
Metric Formula
Precision TP / (TP + FP)
Recall TP / (TP + FN)

Precision: Fraction of retrieved instances that are relevant

Recall: Fraction of relevant instances that are retrieved
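A brief worked example using the first confusion matrix shown earlier (TP = 30, FP = 10, FN = 20): precision = 30 / (30 + 10) = 0.75, while recall = 30 / (30 + 20) = 0.60.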

Evaluation Metrics
Metric      Formula
Precision   TP / (TP + FP)
Recall      TP / (TP + FN)

Precision: Probability that a (randomly selected) retrieved document is relevant

Recall: Probability that a (randomly selected) relevant document is retrieved in a search

These probabilistic definitions are more prevalent in the information retrieval domain.
Evaluation Metrics
Metric Formula
Sensitivity TP / (TP + FN)
Specificity TN / (TN + FP)
Predictive value for a positive result (PV+) TP / (TP + FP)
Predictive value for a negative result (PV-) TN / (TN + FN)

Sensitivity: Proportion of actual positives which are correctly identified

Specificity: Proportion of actual negatives which are correctly identified

Sensitivity: The chance of correctly identifying positive samples. A sensitive test helps rule out disease (when the result is negative).

Specificity: The chance of correctly classifying negative samples. A very specific test rules in disease with a higher degree of confidence.

These terms are more prevalent in research related to the medical sciences.
Evaluation Metrics
Metric Formula
Sensitivity TP / (TP + FN)
Specificity TN / (TN + FP)
Predictive value for a positive result (PV+) TP / (TP + FP)
Predictive value for a negative result (PV-) TN / (TN + FN)

Predictive value of a positive result (PV+): If the test is positive, what is the probability that the patient actually has the disease?

Predictive value of a negative result (PV-): If the test is negative, what is the probability that the patient does not have the disease?
F1 Score
• The F1 score is the harmonic mean of precision and recall:
  F1 = 2 × (precision × recall) / (precision + recall)
• Unlike the regular (arithmetic) mean, the harmonic mean gives more weight to low values.
• Therefore, the classifier’s F1 score is only high if both recall and precision are high.
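A brief worked example (values chosen only for illustration): with precision = 0.9 and recall = 0.1, the arithmetic mean is 0.5, but F1 = 2 × 0.9 × 0.1 / (0.9 + 0.1) = 0.18, so a single weak metric drags the score down.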
Performance Evaluation
• Classification is of two types:
– Authentication / verification (1:1 Matching)
• Is she Richa?
• Is this an image of a helicopter?

– Identification (1:n matching)
• Whose photo is this?
• This image belongs to which class?

Performance Evaluation
• Receiver operating characteristics (ROC) curve
– For authentication/verification
– False positive rate vs true positive rate
– False accept rate vs true accept rate
• Detection error-tradeoff (DET) curve
– False positive rate vs false negative rate
– False accept rate vs false reject rate
• Cumulative match curve (CMC)
– Rank vs identification accuracy
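All of these curves come from sweeping a decision threshold. As a rough sketch (not from the slides; the scores and labels below are made up), each threshold setting yields one (false positive rate, true positive rate) point on an ROC curve.

    # Sweep the decision threshold over classifier scores; 1 = positive class
    scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
    labels = [0, 0, 1, 0, 1, 0, 1, 1]

    def roc_points(scores, labels):
        pos = sum(labels)
        neg = len(labels) - pos
        points = []
        for t in sorted(set(scores)) + [1.1]:  # one point per threshold
            tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
            fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
            points.append((fp / neg, tp / pos))  # (FPR, TPR)
        return points

    print(roc_points(scores, labels))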
ROC Curve
(Figure: example ROC curve, false positive rate vs. true positive rate.)
Laptop bag vs. Purse: Design a classifier
• Features:
– Ratio of Height to Width

• Classifier: threshold
(Figure: distributions of the height-to-width ratio for purses and bags, separated by a threshold of 1.)
Area Under the Curve
(Figures: area under the curve illustrated at threshold = 1.)
CMC
Gallery: Kitten A, Kitten B, Kitten C
Four test query images (1-4) are matched against the gallery:

Query   Top result   Result 2   Result 3
1       A            B          C
2       B            C          A
3       A            B          C
4       C            B          A

What are the rank accuracies?
Recap: CMC

(Same gallery, queries, and retrieval results as above.)

Rank 1: 1/4 predicted correctly: 25%
Rank 2: 3/4 predicted correctly: 75%
Rank 3: 4/4 predicted correctly: 100%
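A hedged sketch (not from the slides) of how rank-k accuracies are computed; the ground-truth identities below are hypothetical, chosen only so that the output reproduces the 25% / 75% / 100% figures above.

    # ranked_results[i]: gallery identities returned for query i, best match first
    ranked_results = [["A", "B", "C"], ["B", "C", "A"], ["A", "B", "C"], ["C", "B", "A"]]
    ground_truth = ["A", "C", "B", "A"]  # hypothetical labels for the four queries

    def rank_accuracy(ranked_results, ground_truth, k):
        hits = sum(1 for ranks, truth in zip(ranked_results, ground_truth) if truth in ranks[:k])
        return hits / len(ground_truth)

    for k in (1, 2, 3):
        print(f"Rank {k}: {rank_accuracy(ranked_results, ground_truth, k):.0%}")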

CMC Curve
(Figure: example CMC curve, rank vs. identification accuracy.)
Evaluating ML Systems
• Assumption: Building a model for population
• Reality: Population is not available
• We work with a sample database, which is not necessarily a true representation of the population

• What to do?
– Should we use the entire available database for training the model?
• High accuracy on the training data
• Lower accuracy on the testing data
• This is called overfitting

Evaluating ML Systems

(Figures: overfitting, good fit, and underfitting on the same data.)

• Underfitting: The learning algorithm had the opportunity to learn more from the training data, but didn’t
• Overfitting: The learning algorithm paid too much attention to idiosyncrasies of the training data; the resulting model doesn’t generalize
Cross Validation
• “Cross-Validation is a statistical method of evaluating
and comparing learning algorithms.”
• The data is divided into two parts:
– Training: to learn or train a model
– Testing: to validate the model

Training database: Kitten A, Kitten B, Kitten C
Testing database: query images 1-4

Cross Validation
• It is used for
– Performance evaluation: Evaluate the
performance of a classifier using the given data
– Model selection: Compare the performance of two or more algorithms (e.g., a decision tree classifier and a neural network) to determine the best algorithm for the given data
– Tuning model parameters: Compare the
performance of two variants of a parametric
model

Type of Cross Validation
• Resubstitution Validation
• Hold-Out Validation
• K-Fold Cross-Validation
• Leave-One-Out Cross-Validation
• Repeated K-Fold Cross-Validation

Type of Cross Validation…
• Resubstitution Validation
– All the available data is used for training and the
same data is used for testing
• Does not provide any information about generalizability

Type of Cross Validation…
• Hold-Out Validation
– The database is partitioned into two non-overlapping parts, one for training and the other for testing
• The results depend a lot on the partition and may be skewed if the test set is too easy or too difficult

Type of Cross Validation…
• K-Fold Cross-Validation
– Data is partitioned into k equal folds (partitions); k-1 folds are used for training and one fold for testing
– The procedure is repeated k times

• Across multiple folds, report:


– Average error or accuracy
– Standard deviation or variance

Fold 1:   Test    Train   Train   Train
Fold 2:   Train   Test    Train   Train
Fold 3:   Train   Train   Test    Train
Fold 4:   Train   Train   Train   Test

4-fold cross-validation
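A minimal sketch (not from the slides) of the k-fold procedure: split the sample indices into k folds, train on k-1 folds, test on the held-out fold, and report the mean and standard deviation across folds. The evaluate() callback is a hypothetical stand-in for training and scoring a model.

    import random
    import statistics

    def k_fold_indices(n, k, seed=0):
        idx = list(range(n))
        random.Random(seed).shuffle(idx)
        return [idx[i::k] for i in range(k)]  # k roughly equal folds

    def cross_validate(n, k, evaluate):
        folds = k_fold_indices(n, k)
        scores = []
        for i, test_fold in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(evaluate(train, test_fold))  # train on k-1 folds, test on one
        return statistics.mean(scores), statistics.stdev(scores)

    # Dummy evaluate() that just reports the held-out fraction, to show the plumbing
    print(cross_validate(20, 4, lambda train, test: len(test) / 20))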


Type of Cross Validation…
• Repeated K-Fold Cross-Validation
– Repeat k-fold cross validation multiple times
• Leave-One-Out Cross-Validation
– Special case of k-fold cross validation where
k=number of instances in the data
– Testing is performed on a single instance and the
remaining are used for training

• Across multiple folds, report:


– Average error or accuracy
– Standard deviation or variance
Comparing Cross-Validation Methods
• Resubstitution
  – Advantage: simple
  – Disadvantage: overfitting
• Hold-out validation
  – Advantage: independent training and testing sets
  – Disadvantage: reduced data for training and testing
• K-fold cross-validation
  – Advantage: accurate performance estimation
  – Disadvantage: small sample for performance estimation; underestimated performance variance or overestimated degrees of freedom for comparison
• Leave-one-out cross-validation
  – Advantage: unbiased performance estimation
  – Disadvantage: very large variance
• Repeated k-fold cross-validation
  – Advantage: large number of performance estimates
  – Disadvantage: overlapping training and test data between rounds; underestimated performance variance or overestimated degrees of freedom for comparison
