Lectures 3-5
Metrics
Types of ML Paradigms
• Supervised learning
– Classification
– Regression
• Unsupervised learning
• Reinforcement learning
Supervised Learning
• Given: training data + desired outputs (labels)
• (x1, y1), (x2, y2), ..., (xn, yn)
• Learn a function f(x) to predict y given x
[Figure: example images labeled as cats vs. dogs]
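As a minimal sketch of this setup (the toy feature vectors, the cat/dog label encoding, and the scikit-learn classifier are all illustrative assumptions, not from the slides):

```python
# Supervised learning: learn f from labeled pairs (x1, y1), ..., (xn, yn),
# then predict y for a new x. Data here is invented for illustration.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[4.0, 30.0], [4.5, 32.0], [9.0, 60.0], [10.0, 65.0]]  # inputs x
y_train = [0, 0, 1, 1]                                           # labels y (0 = cat, 1 = dog)

f = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)    # learn f
print(f.predict([[4.2, 31.0]]))  # -> [0]: nearest labeled example is a cat
```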
Unsupervised Learning
• Given: training data (without desired outputs)
• x1, x2, ..., xn (without labels)
• Output the hidden structure behind the x’s
– Group them by similarity (clustering)
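A minimal clustering sketch (the toy points and the use of scikit-learn's KMeans are assumptions for illustration):

```python
# Unsupervised learning: group unlabeled x's by similarity.
from sklearn.cluster import KMeans

X = [[1.0, 1.1], [0.9, 1.0], [5.0, 5.2], [5.1, 4.9]]  # x1..xn, no labels
groups = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(groups)  # e.g., [1 1 0 0]: two clusters recovered from similarity alone
```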
Reinforcement Learning
• Given: an agent interacting with an environment, receiving rewards
• Learn a policy that maximizes cumulative reward
Unsupervised Learning: Applications
• Social network analysis
• Dimensionality reduction
• Market segmentation
More Recently…
• Combinations of these paradigms are being
explored:
– Semi-supervised learning
– Self-supervised learning
– Supervised + reinforcement
– ….
History of ML
• 1950s:
– Samuel’s checker player
– Selfridge’s Pandemonium
• 1960s:
– Neural networks: Perceptron
– Pattern recognition
– Learning in the limit theory
– Minsky and Papert prove limitations of Perceptron
• 1970s:
– Symbolic concept induction
– Winston’s arch learner
– Expert systems and the knowledge acquisition bottleneck
– Quinlan’s ID3
– Michalski’s AQ and soybean diagnosis
– Scientific discovery with BACON
– Mathematical discovery with AM
History of ML
• 2000s
– Support vector machines & kernel methods
– Graphical models
– Statistical relational learning
– Transfer learning
– Sequence labeling
– Collective classification and structured outputs
– Computer Systems Applications (Compilers, Debugging, Graphics, Security)
– E-mail management
– Personalized assistants that learn
– Learning in robotics and vision
• 2010s
– Deep learning systems
– Learning for big data
– Bayesian methods
– Multi-task & lifelong learning
– Applications to vision, speech, social networks, learning to read, etc.
• 2020s
– Deep learning
– ???
Based on slide by Ray Mooney
Evaluation Metrics
Consider the problem of classifying purses vs. bags, where purses are
labeled as the positive class and bags as the negative class.

                      Predicted Negative      Predicted Positive
Actual Negative       A (true negative)       C (false positive)
Actual Positive       D (false negative)      B (true positive)
http://www.cs.rpi.edu/~leen/misc-publications/SomeStatDefs.html
Evaluation Metrics
Balanced test set (50 negatives, 50 positives):

                           Predicted Negative         Predicted Positive
Actual Negative - 50       A (true negative) - 40     C (false positive) - 10
Actual Positive - 50       D (false negative) - 20    B (true positive) - 30

Imbalanced test set (95 negatives, 5 positives):

                           Predicted Negative         Predicted Positive
Actual Negative - 95       A (true negative) - 90     C (false positive) - 5
Actual Positive - 5        D (false negative) - 3     B (true positive) - 2
Metric                                Formula
Average classification accuracy       (TN + TP) / (TN + FP + TP + FN)
Class-wise classification accuracy    [TN / (TN + FP) + TP / (TP + FN)] / 2
Type I error (false positive rate)    FP / (TN + FP)
Type II error (false negative rate)   FN / (FN + TP)
True positive rate                    TP / (TP + FN)
True negative rate                    TN / (TN + FP)
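To make the contrast between the two tables concrete, here is a small sketch (Python assumed; the counts are taken from the tables above) that applies the formulas. On the imbalanced set, average accuracy looks high even though only 2 of the 5 positives are caught; class-wise accuracy exposes this.

```python
# Apply the metric formulas above to the two confusion matrices.
def metrics(tn, fp, fn, tp):
    return {
        "average accuracy":    (tn + tp) / (tn + fp + tp + fn),
        "class-wise accuracy": (tn / (tn + fp) + tp / (tp + fn)) / 2,
        "type I error (FPR)":  fp / (tn + fp),
        "type II error (FNR)": fn / (fn + tp),
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
    }

print(metrics(tn=40, fp=10, fn=20, tp=30))  # balanced: both accuracies 0.70
print(metrics(tn=90, fp=5,  fn=3,  tp=2))   # imbalanced: 0.92 avg vs ~0.67 class-wise
```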
Evaluation Metrics
Metric      Formula
Precision   TP / (TP + FP)
Recall      TP / (TP + FN)
Evaluation Metrics
Metric                                         Formula
Sensitivity                                    TP / (TP + FN)
Specificity                                    TN / (TN + FP)
Predictive value for a positive result (PV+)   TP / (TP + FP)
Predictive value for a negative result (PV-)   TN / (TN + FN)
F1 Score
• The F1 score is the harmonic mean of
precision and recall:
F1 = 2 · (Precision · Recall) / (Precision + Recall)
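As a worked example, reusing the imbalanced confusion matrix from above (TP = 2, FP = 5, FN = 3) purely as illustration:

```python
# F1 as the harmonic mean of precision and recall, using the
# imbalanced example above (TP = 2, FP = 5, FN = 3).
tp, fp, fn = 2, 5, 3
precision = tp / (tp + fp)                          # 2/7 ~ 0.286
recall    = tp / (tp + fn)                          # 2/5 = 0.400
f1 = 2 * precision * recall / (precision + recall)  # ~ 0.333
print(precision, recall, f1)
```

The harmonic mean is pulled toward the smaller of the two values, so a classifier cannot earn a high F1 by excelling at only one of precision or recall.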
Performance Evaluation
• Receiver operating characteristic (ROC) curve
– For authentication/verification
– False positive rate vs true positive rate
– False accept rate vs true accept rate
• Detection error tradeoff (DET) curve
– False positive rate vs false negative rate
– False accept rate vs false reject rate
• Cumulative match curve (CMC)
– Rank vs identification accuracy
ROC Curve
[Figure: ROC curve plotting true positive rate against false positive rate]
Laptop Bag vs. Purse: Design a Classifier
• Features:
– Ratio of height to width
• Classifier: threshold
[Figure: height-to-width ratios of purses and bags along one axis, with the decision threshold at 1]
Area Under the Curve
[Figure: ROC point obtained at threshold = 1; sweeping the threshold traces the full ROC curve, and the area under it (AUC) summarizes performance across all thresholds]
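A minimal sketch of how the ROC curve and its AUC arise from sweeping the threshold of the height-to-width classifier; the ratio values and labels below are invented for illustration:

```python
# Sweep the threshold of the height/width classifier to trace the ROC
# curve, then integrate it to get the AUC. Feature values are made up;
# purses (label 1) are assumed taller relative to width than bags.
import numpy as np

ratios = np.array([1.4, 1.2, 1.1, 0.9, 1.0, 0.8, 0.7, 0.6])  # height/width
labels = np.array([1,   1,   1,   1,   0,   0,   0,   0])    # 1 = purse, 0 = bag

points = []
for t in np.append(np.sort(ratios), ratios.max() + 1):  # candidate thresholds
    pred = ratios >= t                                  # predict "purse" at this t
    tpr = np.mean(pred[labels == 1])                    # true positive rate
    fpr = np.mean(pred[labels == 0])                    # false positive rate
    points.append((fpr, tpr))

points.sort()                                           # order along the FPR axis
auc = sum((f2 - f1) * (t1 + t2) / 2                     # trapezoidal rule
          for (f1, t1), (f2, t2) in zip(points, points[1:]))
print(auc)                                              # 0.9375 for this toy data
```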
CMC
Gallery: Kitten A, Kitten B, Kitten C

Query   Top result   Result 2   Result 3
1       A            B          C
2       B            C          A
3       A            B          C
Recap: CMC
Gallery: Kitten A, Kitten B, Kitten C

Query   Top result   Result 2   Result 3
1       A            B          C
2       B            C          A
4       C            B          A
CMC Curve
[Figure: CMC curve plotting identification accuracy against rank]
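A sketch of how a CMC curve is computed from ranked retrieval results; the ranked lists and ground-truth identities below are assumptions for illustration, not the slides' data:

```python
# CMC(k) = fraction of queries whose correct identity appears within
# the top-k results. Ranked lists and ground truth are assumed here.
ranked = {1: ["A", "B", "C"], 2: ["B", "C", "A"], 3: ["C", "B", "A"]}
truth  = {1: "A",             2: "C",             3: "A"}

hits = [ranked[q].index(truth[q]) + 1 for q in ranked]          # rank of match
cmc = [sum(r <= k for r in hits) / len(hits) for k in (1, 2, 3)]
print(cmc)  # ~ [0.33, 0.67, 1.0]: accuracy never decreases with rank
```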
Evaluating ML Systems
• Assumption: we are building a model for the population
• Reality: the population is not available
• We work with a sample database – not necessarily a true
representation of the population
• What to do?
– Should we use the entire available database for training the
model?
• High accuracy on the training data
• Lower accuracy on the testing data
• This is called overfitting
Evaluating ML Systems
[Figure: the available database split into a training database and a testing database]
Cross Validation
• It is used for:
– Performance evaluation: evaluate the
performance of a classifier using the given data
– Model selection: compare the performance of
two or more algorithms (e.g., a decision tree
classifier and a neural network) to determine
the best algorithm for the given data
– Tuning model parameters: compare the
performance of two variants of a parametric
model
Types of Cross-Validation
• Resubstitution Validation
• Hold-Out Validation
• K-Fold Cross-Validation
• Leave-One-Out Cross-Validation
• Repeated K-Fold Cross-Validation
Types of Cross-Validation…
• Resubstitution Validation
– All the available data is used for training, and the
same data is used for testing
• Does not provide any information about generalizability
Types of Cross-Validation…
• Hold-Out Validation
– The database is partitioned into two non-
overlapping parts, one for training and the other
for testing
• The results depend heavily on the particular partition
and may be skewed if the test set is too easy or too
difficult
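A minimal hold-out sketch (scikit-learn usage and toy data assumed); changing random_state changes the partition, and with it the measured accuracy, which is exactly the sensitivity noted above:

```python
# Hold-out validation: one non-overlapping train/test partition.
from sklearn.model_selection import train_test_split

X = [[i] for i in range(10)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# 70% train / 30% test; the result depends on this single split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
print(len(X_tr), len(X_te))  # 7 3
```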
Types of Cross-Validation…
• K-Fold Cross-Validation
– Data is partitioned into k equal folds (partitions); k-1 folds are
used for training and one fold for testing
– The procedure is repeated k times
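A minimal k-fold sketch (scikit-learn's KFold assumed) showing that every point lands in a test fold exactly once across the k repetitions:

```python
# K-fold cross-validation: k-1 folds train, 1 fold tests, repeated k
# times so each point appears in a test fold exactly once.
from sklearn.model_selection import KFold

X = list(range(10))
for fold, (train_idx, test_idx) in enumerate(
        KFold(n_splits=5, shuffle=True, random_state=0).split(X)):
    print(f"fold {fold}: train={list(train_idx)} test={list(test_idx)}")
```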