
Lectures 3-5

This document discusses different machine learning paradigms and evaluation metrics. It describes supervised learning, unsupervised learning, and reinforcement learning. It then discusses classification accuracy, precision, recall, sensitivity, and specificity as common evaluation metrics. Precision measures the fraction of retrieved instances that are relevant, while recall measures the fraction of relevant instances that are retrieved. Sensitivity measures the proportion of actual positives correctly identified.


ML Paradigms and Evaluation Metrics
Types of ML Paradigms

• Supervised learning
– Classification
– Regression
• Unsupervised learning
• Reinforcement learning
Supervised Learning
• Given: training data + desired outputs (labels)
• (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
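A minimal sketch (not from the slides) of what "learning f" can mean in practice: the hypothetical snippet below predicts y for a new x using a 1-nearest-neighbour rule over the labeled pairs; the data values are made up for illustration.

    # Apply f(x): predict y as the label of the closest training example
    def predict(x, training_pairs):
        nearest_x, nearest_y = min(training_pairs, key=lambda pair: abs(pair[0] - x))
        return nearest_y

    pairs = [(1.0, "cat"), (1.2, "cat"), (3.8, "dog"), (4.1, "dog")]  # (x_i, y_i)
    print(predict(3.5, pairs))  # -> "dog"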

(Figures: supervised classification examples, images of cats vs. dogs.)
Unsupervised Learning
• Given: training data (without desired outputs)
• x1, x2, ..., xn (without labels)
• Output the hidden structure behind the x’s, e.g., group them by similarity (clustering)
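As a hedged illustration of grouping by similarity (not from the slides), the sketch below runs a tiny 1-D k-means with k = 2; the data points are invented and no labels are used.

    # Group unlabeled 1-D points into two clusters by repeatedly assigning each
    # point to its nearest centroid and recomputing the centroids
    def kmeans_1d(xs, iters=10):
        c0, c1 = min(xs), max(xs)                               # initialise centroids
        for _ in range(iters):
            g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]  # nearest to c0
            g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]   # nearest to c1
            c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)       # update centroids
        return g0, g1

    print(kmeans_1d([1.0, 1.1, 0.9, 5.0, 5.2, 4.8]))  # two groups emerge without labels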
Reinforcement Learning

• Rewards from a sequence of actions
Reinforcement Learning
• Playing games: Atari, Chess, Checkers
• Robot navigation
• Talent acquisition
Applications of Learning Paradigms
Supervised
• Person identification
• Object recognition
• Stock prediction

Reinforcement
• Game playing
• Credit assignment

Unsupervised
• Social network analysis
• Dimensionality reduction
• Market segmentation
More Recently…
• Combinations of these paradigms are being explored:
– Semi-supervised learning
– Self-supervised learning
– Supervised + reinforcement
– ….
History of ML
• 1950s
– Samuel’s checker player
– Selfridge’s Pandemonium
• 1960s:
– Neural networks: Perceptron
– Pattern recognition
– Learning in the limit theory
– Minsky and Papert prove limitations of Perceptron
• 1970s:
– Symbolic concept induction
– Winston’s arch learner
– Expert systems and the knowledge acquisition bottleneck
– Quinlan’s ID3
– Michalski’s AQ and soybean diagnosis
– Scientific discovery with BACON
– Mathematical discovery with AM
History of ML
• 2000s
– Support vector machines & kernel methods
– Graphical models
– Statistical relational learning
– Transfer learning
– Sequence labeling
– Collective classification and structured outputs
– Computer Systems Applications (Compilers, Debugging, Graphics, Security)
– E-mail management
– Personalized assistants that learn
– Learning in robotics and vision
• 2010s
– Deep learning systems
– Learning for big data
– Bayesian methods
– Multi-task & lifelong learning
– Applications to vision, speech, social networks, learning to read, etc.
• 2020s
– Deep learning
– ???
Based on slide by Ray Mooney
Evaluation Metrics:
Let us design a simple classification algorithm.
Laptop bag vs. Purse: Design a classifier
• Features:
– Width
– Height
– Weight

• Classifier: threshold
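A rough sketch (not from the slides) of such a threshold classifier; the feature values and the weight threshold below are invented purely for illustration.

    # Hypothetical rule: laptop bags tend to be heavier than purses
    def classify(width_cm, height_cm, weight_kg, weight_threshold=1.5):
        return "laptop bag" if weight_kg >= weight_threshold else "purse"

    print(classify(width_cm=40, height_cm=30, weight_kg=2.0))  # -> laptop bag
    print(classify(width_cm=25, height_cm=15, weight_kg=0.6))  # -> purse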
Evaluation Metrics
Let the problem be classifying purses vs. bags. Purses are labeled as the positive class and bags as the negative class.

                      Predicted Negative      Predicted Positive
Actual Negative       A (true negative)       C (false positive)
Actual Positive       D (false negative)      B (true positive)

Term             Meaning                    Example
True positive    Correct classification     Purse identified as purse
False positive   Incorrect classification   Bag identified as purse
True negative    Correct classification     Bag identified as bag
False negative   Incorrect classification   Purse identified as bag
Source: http://www.cs.rpi.edu/~leen/misc-publications/SomeStatDefs.html
Evaluation Metrics
Example 1 (balanced classes):
                               Predicted Negative        Predicted Positive
Actual Negative (50 samples)   A (true negative) = 40    C (false positive) = 10
Actual Positive (50 samples)   D (false negative) = 20   B (true positive) = 30

Example 2 (imbalanced classes):
                               Predicted Negative        Predicted Positive
Actual Negative (95 samples)   A (true negative) = 90    C (false positive) = 5
Actual Positive (5 samples)    D (false negative) = 3    B (true positive) = 2

Metric Formula
Average classification accuracy [(TN + TP) / (TN + FP + TP + FN)]
Class-wise classification accuracy [TN / (TN + FP) + TP / (TP + FN)]/2
Type I error (false positive rate) FP / (TN + FP)
Type II error (false negative rate) FN / (FN + TP)
True positive rate TP / (TP + FN)
True negative rate TN / (TN + FP)
Source: http://www.cs.rpi.edu/~leen/misc-publications/SomeStatDefs.html
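As a worked illustration of the formulas above (a sketch, not part of the slides), the snippet below evaluates both confusion matrices; note how average accuracy looks good on the imbalanced example while class-wise accuracy reveals the weak positive class.

    # TN, FP, FN, TP correspond to cells A, C, D, B of the confusion matrix
    def metrics(tn, fp, fn, tp):
        return {
            "average accuracy": (tn + tp) / (tn + fp + fn + tp),
            "class-wise accuracy": (tn / (tn + fp) + tp / (tp + fn)) / 2,
            "false positive rate": fp / (tn + fp),
            "false negative rate": fn / (fn + tp),
            "true positive rate": tp / (tp + fn),
            "true negative rate": tn / (tn + fp),
        }

    print(metrics(tn=40, fp=10, fn=20, tp=30))  # balanced: avg acc 0.70, class-wise 0.70
    print(metrics(tn=90, fp=5, fn=3, tp=2))     # imbalanced: avg acc 0.92, class-wise ~0.67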
Evaluation Metrics
Metric Formula
Average classification accuracy (TN + TP) / (TN+TP+FN+FP)
Type I error (false positive rate) FP / (TN + FP)
Type II error (false negative rate) FN / (FN + TP)
True positive rate TP / (TP + FN)
True negative rate TN / (TN + FP)

• Type I error or false positive rate: the chance that a (randomly selected) actually negative sample is incorrectly classified as positive
• Type II error or false negative rate: the chance that a (randomly selected) actually positive sample is incorrectly classified as negative
These error rates are prevalent in computer vision and image processing related classification problems.
Evaluation Metrics
Metric Formula
Precision TP / (TP + FP)
Recall TP / (TP + FN)

Precision: Fraction of retrieved instances that are relevant

Recall: Fraction of relevant instances that are retrieved
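A brief worked example using the first confusion matrix shown earlier (TP = 30, FP = 10, FN = 20): precision = 30 / (30 + 10) = 0.75, while recall = 30 / (30 + 20) = 0.60.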

Evaluation Metrics
Metric      Formula
Precision   TP / (TP + FP)
Recall      TP / (TP + FN)

Precision: Probability that a (randomly selected) retrieved document is relevant

Recall: Probability that a (randomly selected) relevant document is retrieved in a search

These probabilistic definitions are more prevalent in the information retrieval domain.
Evaluation Metrics
Metric Formula
Sensitivity TP / (TP + FN)
Specificity TN / (TN + FP)
Predictive value for a positive result (PV+) TP / (TP + FP)
Predictive value for a negative result (PV-) TN / (TN + FN)

Sensitivity: Proportion of actual positives which are correctly identified

Specificity: Proportion of actual negatives which are correctly identified

Sensitivity: The chance of correctly identifying positive samples. A sensitive test helps rule out disease (when the result is negative).

Specificity: The chance of correctly classifying negative samples. A very specific test rules in disease with a higher degree of confidence.

These terms are more prevalent in research related to the medical sciences.
Evaluation Metrics
Metric Formula
Sensitivity TP / (TP + FN)
Specificity TN / (TN + FP)
Predictive value for a positive result (PV+) TP / (TP + FP)
Predictive value for a negative result (PV-) TN / (TN + FN)

Predictive value of a positive result (PV+): If the test is positive, what is the probability that the patient actually has the disease?

Predictive value of a negative result (PV-): If the test is negative, what is the probability that the patient does not have the disease?
F1 Score
• The F1 score is the harmonic mean of precision and recall:
  F1 = 2 × (precision × recall) / (precision + recall)
• Unlike the regular (arithmetic) mean, the harmonic mean gives more weight to low values.
• Therefore, the classifier’s F1 score is only high if both recall and precision are high.
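A brief worked example (values chosen only for illustration): with precision = 0.9 and recall = 0.1, the arithmetic mean is 0.5, but F1 = 2 × 0.9 × 0.1 / (0.9 + 0.1) = 0.18, so a single weak metric drags the score down.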
Performance Evaluation
• Classification is of two types:
– Authentication / verification (1:1 Matching)
• Is she Richa?
• Is this an image of a helicopter?

– Identification (1:n matching)
• Whose photo is this?
• This image belongs to which class?

Performance Evaluation
• Receiver operating characteristics (ROC) curve
– For authentication/verification
– False positive rate vs true positive rate
– False accept rate vs true accept rate
• Detection error-tradeoff (DET) curve
– False positive rate vs false negative rate
– False accept rate vs false reject rate
• Cumulative match curve (CMC)
– Rank vs identification accuracy
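All of these curves come from sweeping a decision threshold. As a rough sketch (not from the slides; the scores and labels below are made up), each threshold setting yields one (false positive rate, true positive rate) point on an ROC curve.

    # Sweep the decision threshold over classifier scores; 1 = positive class
    scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
    labels = [0, 0, 1, 0, 1, 0, 1, 1]

    def roc_points(scores, labels):
        pos = sum(labels)
        neg = len(labels) - pos
        points = []
        for t in sorted(set(scores)) + [1.1]:  # one point per threshold
            tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
            fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
            points.append((fp / neg, tp / pos))  # (FPR, TPR)
        return points

    print(roc_points(scores, labels))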
ROC Curve
(Figure: example ROC curve, false positive rate vs. true positive rate.)
Laptop bag vs. Purse: Design a classifier
• Features:
– Ratio of Height to Width

• Classifier: threshold
(Figure: distributions of the height-to-width ratio for purses and bags, separated by a threshold of 1.)
Area Under the Curve
(Figures: area under the curve illustrated at threshold = 1.)
CMC
Gallery: Kitten A, Kitten B, Kitten C
Four test query images (1-4) are matched against the gallery:

Query   Top result   Result 2   Result 3
1       A            B          C
2       B            C          A
3       A            B          C
4       C            B          A

What are the rank accuracies?
Recap: CMC

(Same gallery, queries, and retrieval results as above.)

Rank 1: 1/4 predicted correctly: 25%
Rank 2: 3/4 predicted correctly: 75%
Rank 3: 4/4 predicted correctly: 100%
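A hedged sketch (not from the slides) of how rank-k accuracies are computed; the ground-truth identities below are hypothetical, chosen only so that the output reproduces the 25% / 75% / 100% figures above.

    # ranked_results[i]: gallery identities returned for query i, best match first
    ranked_results = [["A", "B", "C"], ["B", "C", "A"], ["A", "B", "C"], ["C", "B", "A"]]
    ground_truth = ["A", "C", "B", "A"]  # hypothetical labels for the four queries

    def rank_accuracy(ranked_results, ground_truth, k):
        hits = sum(1 for ranks, truth in zip(ranked_results, ground_truth) if truth in ranks[:k])
        return hits / len(ground_truth)

    for k in (1, 2, 3):
        print(f"Rank {k}: {rank_accuracy(ranked_results, ground_truth, k):.0%}")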

CMC Curve
(Figure: example CMC curve, rank vs. identification accuracy.)
Evaluating ML Systems
• Assumption: Building a model for population
• Reality: Population is not available
• We work with a sample database, which is not necessarily a true representation of the population

• What to do?
– Should we use the entire available database for training the model?
• High accuracy on the training data
• Lower accuracy on the testing data
• This is called overfitting

Evaluating ML Systems

(Figures: overfitting, good fit, and underfitting on the same data.)

• Underfitting: The learning algorithm had the opportunity to learn more from the training data, but didn’t
• Overfitting: The learning algorithm paid too much attention to idiosyncrasies of the training data; the resulting model doesn’t generalize
Cross Validation
• “Cross-Validation is a statistical method of evaluating
and comparing learning algorithms.”
• The data is divided into two parts:
– Training: to learn or train a model
– Testing: to validate the model

Training database: Kitten A, Kitten B, Kitten C
Testing database: query images 1-4

Cross Validation
• It is used for
– Performance evaluation: Evaluate the
performance of a classifier using the given data
– Model selection: Compare the performance of two or more algorithms (e.g., a decision tree classifier and a neural network) to determine the best algorithm for the given data
– Tuning model parameters: Compare the
performance of two variants of a parametric
model

Type of Cross Validation
• Resubstitution Validation
• Hold-Out Validation
• K-Fold Cross-Validation
• Leave-One-Out Cross-Validation
• Repeated K-Fold Cross-Validation

Type of Cross Validation…
• Resubstitution Validation
– All the available data is used for training and the
same data is used for testing
• Does not provide any information about generalizability

Type of Cross Validation…
• Hold-Out Validation
– The database is partitioned into two non-overlapping parts, one for training and the other for testing
• The results depend a lot on the partition and may be skewed if the test set is too easy or too difficult

Type of Cross Validation…
• K-Fold Cross-Validation
– Data is partitioned into k equal folds (partitions); k-1 folds are used for training and one fold for testing
– The procedure is repeated k times

• Across multiple folds, report:


– Average error or accuracy
– Standard deviation or variance

Fold 1:   Test    Train   Train   Train
Fold 2:   Train   Test    Train   Train
Fold 3:   Train   Train   Test    Train
Fold 4:   Train   Train   Train   Test

4-fold cross-validation
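A minimal sketch (not from the slides) of the k-fold procedure: split the sample indices into k folds, train on k-1 folds, test on the held-out fold, and report the mean and standard deviation across folds. The evaluate() callback is a hypothetical stand-in for training and scoring a model.

    import random
    import statistics

    def k_fold_indices(n, k, seed=0):
        idx = list(range(n))
        random.Random(seed).shuffle(idx)
        return [idx[i::k] for i in range(k)]  # k roughly equal folds

    def cross_validate(n, k, evaluate):
        folds = k_fold_indices(n, k)
        scores = []
        for i, test_fold in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(evaluate(train, test_fold))  # train on k-1 folds, test on one
        return statistics.mean(scores), statistics.stdev(scores)

    # Dummy evaluate() that just reports the held-out fraction, to show the plumbing
    print(cross_validate(20, 4, lambda train, test: len(test) / 20))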


Type of Cross Validation…
• Repeated K-Fold Cross-Validation
– Repeat k-fold cross validation multiple times
• Leave-One-Out Cross-Validation
– Special case of k-fold cross validation where
k=number of instances in the data
– Testing is performed on a single instance and the
remaining are used for training

• Across multiple folds, report:


– Average error or accuracy
– Standard deviation or variance
Comparing Cross-Validation Methods
• Resubstitution
  – Advantage: simple
  – Disadvantage: overfitting
• Hold-out validation
  – Advantage: independent training and testing sets
  – Disadvantage: reduced data for training and testing
• K-fold cross-validation
  – Advantage: accurate performance estimation
  – Disadvantage: small sample for performance estimation; underestimated performance variance or overestimated degrees of freedom for comparison
• Leave-one-out cross-validation
  – Advantage: unbiased performance estimation
  – Disadvantage: very large variance
• Repeated k-fold cross-validation
  – Advantage: large number of performance estimates
  – Disadvantage: overlapping training and test data between rounds; underestimated performance variance or overestimated degrees of freedom for comparison
