
COMPX310-19A Machine Learning Chapter 3: Classification

Uploaded by

Natch Sadindum

COMPX310-19A

Machine Learning
Chapter 3: Classification
An introduction using Python, Scikit-Learn, Keras, and Tensorflow

Unless otherwise indicated, all images are from Hands-on Machine Learning with
Scikit-Learn, Keras, and TensorFlow by Aurélien Géron, Copyright © 2019 O’Reilly Media
Housekeeping
• Outline

03/08/2021 COMPX310 2
MNIST: the “hello world” of ML
• Scikit-learn provides access to several benchmark datasets.
• In this case, the data is fetched from www.openml.org.
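A minimal sketch of how the data might be fetched and split. The dataset name "mnist_784", the helper names `load_mnist` and `split_train_test`, and the 60,000/10,000 split are assumptions for illustration and are not taken from the slides.

```python
def load_mnist():
    """Download MNIST from OpenML (needs a network connection on first use)."""
    # Imported here so that defining the helper does not require scikit-learn.
    from sklearn.datasets import fetch_openml

    mnist = fetch_openml("mnist_784", version=1, as_frame=False)
    X = mnist.data                    # one row of 784 pixel values per image
    y = mnist.target.astype("uint8")  # labels arrive as strings, convert to ints
    return X, y


def split_train_test(X, y, n_train=60_000):
    """Use the first n_train rows for training and the rest for testing."""
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]
```

With NumPy arrays, a binary "is it a 5?" target can then be derived as `y_train_5 = (y_train == 5)`.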


Handwritten digits

Preparing Y

Train/test, binary class

Yet another learner: SGD

Cross-validation

Cross-validation
• Cross-validation is an alternative to a single train/validation split.
• The training set is split into k equal-sized folds (k = 10 is a common choice; scikit-learn's own default is cv=5 since version 0.22).
• Train on k-1 folds combined and validate on the remaining fold.
• Repeat this k times, holding out a different fold each time, which gives k results.
• Report the mean and standard deviation of the k results.
• Optionally repeat the whole procedure with different random seeds to reduce the variance of the estimate.
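The procedure above can be sketched in plain Python. The names `k_fold_indices`, `cross_validate` and the `train_and_score` callback are hypothetical; the callback stands in for "fit a model on the training indices and return its score on the validation indices".

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(train_and_score, n_samples, k=10, seed=0):
    """Run k-fold cross-validation and return (mean, standard deviation)."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        # All folds except fold i form the new training set.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(train_and_score(train_idx, val_idx))
    return statistics.mean(scores), statistics.stdev(scores)
```

Calling `cross_validate` several times with different `seed` values corresponds to the optional repetition with different random seeds.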

Cross-validation
• Cross-validation is a workhorse in ML, so scikit-learn supports it directly.
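A minimal sketch of that support, using `cross_val_score` from scikit-learn. To keep it small and offline, it assumes the bundled 8x8 `load_digits` dataset as a stand-in for MNIST; the variable names, 3 folds and the random seed are also assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

# Small built-in digits dataset, used here instead of the full MNIST.
X, y = load_digits(return_X_y=True)
y_is_5 = (y == 5)  # binary target: "is this digit a 5?"

sgd_clf = SGDClassifier(random_state=42)

# One accuracy value per fold; cv=3 means 3-fold cross-validation.
scores = cross_val_score(sgd_clf, X, y_is_5, cv=3, scoring="accuracy")
print("mean accuracy:", scores.mean(), "std:", scores.std())
```

Accuracy should be read with care here: roughly one digit in ten is a 5, so a classifier that always answers "not 5" would already reach about 90% accuracy.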

Are we really that good?

Getting predictions from CV

Compare to perfection

Precision and Recall
Precision: the fraction of predicted 5s that really are 5s, TP / (TP + FP)

Recall: the fraction of real 5s that we actually find, TP / (TP + FN)
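As a small illustration of these two definitions, the sketch below counts them directly from lists of true and predicted digits. The function name and the toy labels are made up for the example.

```python
def precision_recall_for_digit(y_true, y_pred, digit=5):
    """Precision and recall when `digit` is treated as the positive class."""
    true_positives = sum(
        1 for t, p in zip(y_true, y_pred) if t == digit and p == digit
    )
    predicted = sum(1 for p in y_pred if p == digit)  # all predicted 5s
    actual = sum(1 for t in y_true if t == digit)     # all real 5s
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Toy labels: five real 5s, four predicted 5s, three of them correct.
y_true = [5, 5, 5, 3, 8, 5, 0, 5]
y_pred = [5, 3, 5, 5, 8, 5, 0, 6]
precision, recall = precision_recall_for_digit(y_true, y_pred)
print(precision, recall)  # 0.75 0.6
```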

TN, TP, FN, FP and the confusion matrix
Confusion matrix for a binary problem:
[[5, 1],
 [2, 3]]
TN=5, FP=1, FN=2, TP=3
Rows: row 0 holds the examples of actual class 0, …
Columns: column 0 holds the examples predicted as class 0, …
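A minimal sketch of how such a matrix is counted, with rows indexed by the actual class and columns by the predicted class. The helper name and the toy label lists are assumptions chosen to reproduce the numbers above.

```python
def binary_confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for 0/1 labels."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        # Row index = actual class, column index = predicted class.
        matrix[int(actual)][int(predicted)] += 1
    return matrix


# Six actual negatives followed by five actual positives.
y_true = [0] * 6 + [1] * 5
y_pred = [0] * 5 + [1] + [0] * 2 + [1] * 3

matrix = binary_confusion_matrix(y_true, y_pred)
(tn, fp), (fn, tp) = matrix
print(matrix)  # [[5, 1], [2, 3]]
```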

F1: harmonic mean of recall & precision
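F1 = 2 · precision · recall / (precision + recall)
Example: [[5, 1], [2, 3]] gives TN=5, FP=1, FN=2, TP=3, so precision = 3/4, recall = 3/5 and F1 = 2/3.
Rows: row 0 holds the examples of actual class 0, …
Columns: column 0 holds the examples predicted as class 0, …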
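The sketch below computes F1 for the example counts and checks that it matches the harmonic mean of precision and recall. The function name `f1_from_counts` is hypothetical.

```python
import statistics


def f1_from_counts(tp, fp, fn):
    """F1 score from true positives, false positives and false negatives."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


tp, fp, fn = 3, 1, 2
precision = tp / (tp + fp)  # 0.75
recall = tp / (tp + fn)     # 0.6

f1 = f1_from_counts(tp, fp, fn)
harmonic = statistics.harmonic_mean([precision, recall])
print(f1, harmonic)  # both are 2/3, about 0.667
```

Because the harmonic mean is dominated by the smaller value, F1 is only high when both precision and recall are high.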

Some results

Thresholds: precision/recall trade-off

Classifiers return numeric scores
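A small sketch of how a threshold turns numeric scores into predictions, and how moving it trades recall for precision. The scores, labels and function names are invented for illustration; treating precision as 1.0 when nothing is predicted positive is a convention assumed here.

```python
def predict_with_threshold(scores, threshold):
    """Predict positive whenever the score exceeds the threshold."""
    return [score > threshold for score in scores]


def precision_recall(y_true, y_pred):
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 1.0  # assumed convention
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


y_true = [False, False, True, False, True, True]
scores = [-3.0, -1.0, -0.5, 0.5, 1.0, 2.5]

results = {}
for threshold in (-2.0, 0.0, 2.0):
    y_pred = predict_with_threshold(scores, threshold)
    results[threshold] = precision_recall(y_true, y_pred)
    print(threshold, results[threshold])
```

In this toy data, raising the threshold from -2 to 2 lifts precision from 0.6 to 1.0 while recall drops from 1.0 to 1/3. Recall can never increase when the threshold is raised; precision usually rises but is not guaranteed to.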

Precision recall curves

Precision recall curves

Recall @ precision == 0.9

Precision-recall curve

Alternative: ROC curve

Alternative: ROC curve
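Plot the true positive rate (TPR) against the false positive rate (FPR) for all possible thresholds.

The best point is (0, 1): no false positives, all positives found.
The diagonal corresponds to a random classifier.

Area under the curve (AUC) is 1.0 for the best possible classifier and 0.5 for a random classifier.

AUC is very popular and does not need a threshold. It is unaffected by the class ratio, but when the positive class is rare it can look overly optimistic, and the precision-recall curve is then usually more informative.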
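To make the construction concrete, here is a plain-Python sketch that sweeps the threshold over all distinct scores, collects the (FPR, TPR) points and integrates them with the trapezoidal rule. The function names and toy data are assumptions; in practice scikit-learn's `roc_curve` and `roc_auc_score` would normally be used.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by lowering the threshold step by step."""
    n_pos = sum(1 for t in y_true if t)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal rule over consecutive curve points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = area_under_curve(roc_points(y_true, scores))
print(auc)  # 0.75
```

A classifier that gives every example the same score produces only the points (0, 0) and (1, 1), which is the diagonal with AUC 0.5.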

Compare to Random Forest

Compare to Random Forest

Compare to Random Forest

Multi-class classification

Multi-class classification

One-vs-One for Multiclass

Random forest for multi-class

Error analysis: confusion matrix from CV

Error analysis: confusion matrix from CV

Error analysis: confusion matrix from CV

Multilabel: more than one binary target

Multilabel: cross-validation

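“Macro”: compute F1 for each label separately, then average over all labels

“Micro”: pool the true positives, false positives and false negatives over all labels and examples, then compute one F1 from the totals

“Samples”: compute F1 over the labels of each example, then average over all examples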
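A plain-Python sketch contrasting macro and micro averaging as scikit-learn defines them for `average="macro"` and `average="micro"`: macro averages the per-label F1 values, micro computes one F1 from counts pooled over all labels. The helper names and the two-label toy data are assumptions.

```python
def label_counts(Y_true, Y_pred, label):
    """TP, FP, FN for one label column of a multilabel problem."""
    tp = sum(1 for t, p in zip(Y_true, Y_pred) if t[label] and p[label])
    fp = sum(1 for t, p in zip(Y_true, Y_pred) if not t[label] and p[label])
    fn = sum(1 for t, p in zip(Y_true, Y_pred) if t[label] and not p[label])
    return tp, fp, fn


def f1(tp, fp, fn):
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def macro_and_micro_f1(Y_true, Y_pred):
    n_labels = len(Y_true[0])
    counts = [label_counts(Y_true, Y_pred, k) for k in range(n_labels)]
    # Macro: one F1 per label, then the unweighted mean.
    macro = sum(f1(*c) for c in counts) / n_labels
    # Micro: add up the counts over all labels, then a single F1.
    micro = f1(*(sum(column) for column in zip(*counts)))
    return macro, micro


# Each row is one example, each column one binary label.
Y_true = [[1, 0], [1, 1], [0, 1], [1, 0]]
Y_pred = [[1, 0], [1, 0], [0, 1], [0, 1]]
macro, micro = macro_and_micro_f1(Y_true, Y_pred)
print(macro, micro)  # 0.65 and about 0.667
```

Here the per-label F1 values are 0.8 and 0.5, so macro gives 0.65, while the pooled counts (TP=3, FP=1, FN=2) give a micro F1 of 2/3.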

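Multioutput: multiple multiclass targets
• E.g. reconstruct an image from a corrupted version: every pixel is one multiclass target

[Figure: corrupted input image (X) and clean target image (y)]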
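A minimal sketch of the idea, under several assumptions: the small bundled `load_digits` images stand in for MNIST, the noise is uniform integer noise, and a k-nearest-neighbours classifier is used as one model that accepts a 2D multioutput target. None of these choices is confirmed by the slides.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier

# 8x8 digit images, flattened to 64 pixel values in the range 0..16.
X, _ = load_digits(return_X_y=True)
y_clean = X.astype(int)  # target: the original pixel intensities

# Input: the same images with random integer noise (0..4) added to every pixel.
rng = np.random.default_rng(42)
X_noisy = X + rng.integers(0, 5, size=X.shape)

X_train, y_train = X_noisy[:1500], y_clean[:1500]
X_test, y_test = X_noisy[1500:], y_clean[1500:]

# One classifier predicts all 64 pixel targets at once.
knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_train)

restored = knn_clf.predict(X_test[:1])  # cleaned-up version of one noisy image
print(restored.reshape(8, 8))
```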

Adding noise, train & predict
