TensorFlow Classification
Overview
Whales: Fish or Mammals?
Members of the infraorder Cetacea look like fish, swim like fish, and move with fish.
Two possible labels: Mammals or Fish.
ML-based Classifier
Training: feed in a large corpus of correctly classified data.
Prediction: use it to classify new instances which it has not seen before.
Training the ML-based Classifier
Corpus → ML-based Classifier → Classification
Feedback: the loss function (or cost function) scores each classification against the correct label, and this feedback improves the model parameters.
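The feedback loop above can be sketched without any framework. This is a minimal sketch under stated assumptions: a hypothetical one-feature toy corpus, logistic regression as the classifier, binary cross-entropy as the loss, and plain gradient descent as the update rule. None of these choices come from the slides, and the code deliberately uses no TensorFlow API.

```python
import math

# Hypothetical toy corpus: one numeric feature per example and a 0/1 label.
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(w, b):
    """Mean binary cross-entropy of the model p = sigmoid(w*x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)

def train(epochs=500, lr=0.1):
    w, b = 0.0, 0.0  # model parameters, initialised to zero
    losses = []
    for _ in range(epochs):
        # Feedback: gradient of the mean loss with respect to each parameter.
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        # Improve the parameters by stepping against the gradient.
        w -= lr * grad_w / len(xs)
        b -= lr * grad_b / len(xs)
        losses.append(log_loss(w, b))
    return w, b, losses

w, b, losses = train()
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

Each pass computes the loss gradient (the feedback) and nudges `w` and `b`; the recorded loss shrinks as the parameters improve.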
An algorithm might have high accuracy but still be a poor machine learning model.
Example: medical reports. Most reports are “normal”, so a model that always classifies a report as No Cancer is usually right, yet it never detects a single cancer.
True Positive, False Negative, False Positive, True Negative
                       Predicted: Cancer    Predicted: No Cancer
Actual: Cancer         10 (TP)              4 (FN)
Actual: No Cancer      5 (FP)               1000 (TN)
TP: cancer correctly detected. FN: cancer missed.
FP: normal report wrongly flagged as cancer. TN: normal report correctly classified.
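To make the four cells concrete, here is a small sketch that counts them from paired actual/predicted labels. The function name `confusion_counts` is hypothetical, and the label lists are synthetic, built only so that they reproduce the cancer example's counts (TP = 10, FN = 4, FP = 5, TN = 1000).

```python
from collections import Counter

def confusion_counts(actual, predicted, positive="Cancer"):
    """Count TP, FN, FP, TN for a binary classifier."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    return {k: counts[k] for k in ("TP", "FN", "FP", "TN")}

# Synthetic label lists that reproduce the example counts (illustrative only).
actual = ["Cancer"] * 14 + ["No Cancer"] * 1005
predicted = (["Cancer"] * 10 + ["No Cancer"] * 4        # the 14 actual Cancer cases
             + ["Cancer"] * 5 + ["No Cancer"] * 1000)   # the 1005 actual No Cancer cases
print(confusion_counts(actual, predicted))
# {'TP': 10, 'FN': 4, 'FP': 5, 'TN': 1000}
```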
Accuracy
$$\text{Accuracy} = \frac{TP + TN}{\text{Num Instances}} = \frac{10 + 1000}{1019} = \frac{1010}{1019} \approx 99.12\%$$
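Accuracy = 99.12%
But… the 1000 true negatives dominate the total, so accuracy barely reflects how the model handles the 14 actual cancer cases.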
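A quick check of why accuracy misleads here: a minimal sketch (the `accuracy` helper is hypothetical) comparing the example model with the “always No Cancer” classifier, which turns every one of the 14 cancer cases into a false negative.

```python
def accuracy(tp, fn, fp, tn):
    """Fraction of all instances that were classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)

model = accuracy(tp=10, fn=4, fp=5, tn=1000)
# "Always No Cancer": no positives are ever predicted.
always_negative = accuracy(tp=0, fn=14, fp=0, tn=1005)
print(f"model: {model:.2%}, always No Cancer: {always_negative:.2%}")
# model: 99.12%, always No Cancer: 98.63%
```

A classifier that never detects cancer still scores 98.63%, less than one percentage point below the example model.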
Precision
$$\text{Precision} = \frac{TP}{TP + FP} = \frac{10}{15} \approx 66.67\%$$
Precision = 66.67%: 1 in 3 cancer diagnoses is incorrect.
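The precision formula as a small sketch (the `precision` helper is hypothetical). It returns `None` when the model makes no positive predictions, because TP / (TP + FP) is then 0 / 0 and undefined.

```python
def precision(tp, fp):
    """Share of positive (Cancer) predictions that are correct."""
    if tp + fp == 0:
        return None  # no positive predictions: precision is undefined
    return tp / (tp + fp)

p = precision(tp=10, fp=5)
print(f"precision = {p:.2%}, wrong diagnoses = {1 - p:.2%}")
# precision = 66.67%, wrong diagnoses = 33.33%
```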
Recall
$$\text{Recall} = \frac{TP}{TP + FN} = \frac{10}{14} \approx 71.43\%$$
Recall = 71.43%: 2 in 7 cancer cases are missed.
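The recall formula as a sketch, using the example counts (the `recall` helper is hypothetical). The missed fraction, FN / (TP + FN) = 4 / 14, is exactly 2 in 7.

```python
def recall(tp, fn):
    """Share of actual positives (Cancer cases) that the model finds."""
    if tp + fn == 0:
        return None  # no actual positives: recall is undefined
    return tp / (tp + fn)

r = recall(tp=10, fn=4)
print(f"recall = {r:.2%}, missed = {1 - r:.2%}")
# recall = 71.43%, missed = 28.57%
```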
Choosing a Machine Learning Model
Corpus → ML-based Binary Classifier
Applying Logistic Regression
[Figure: logistic regression outputs the probability of an animal being a fish, e.g. 95% for an animal that lives in water, breathes with gills, and lays eggs; another example scores 60%]
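A minimal sketch of how logistic regression turns features into a probability and how Pthreshold turns that probability into a label. The weights, bias, and helper names are hypothetical, hand-picked so that an animal with all three fish features scores about 95%; a whale-like animal (lives in water, no gills, no eggs) scores about 60%.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights for three binary features:
# lives_in_water, breathes_with_gills, lays_eggs (1 = yes, 0 = no).
WEIGHTS = [1.0, 1.5, 1.0]
BIAS = -0.6

def p_fish(features):
    """Logistic regression: probability that an animal is a fish."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return sigmoid(z)

def classify(features, p_threshold=0.5):
    """Predict 'fish' when the probability reaches the decision threshold."""
    return "fish" if p_fish(features) >= p_threshold else "mammal"

print(round(p_fish([1, 1, 1]), 2))             # 0.95
print(classify([1, 0, 0]))                     # 'fish' at the default 0.5
print(classify([1, 0, 0], p_threshold=0.7))    # 'mammal' with a stricter threshold
```

With `p_threshold=1.0` nothing is ever labelled a fish, and with `0.0` everything is: the “always negative” and “always positive” extremes of the decision threshold.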
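“Conservativeness” of Decision Threshold
“Always Negative” (Pthreshold = 1)
                       Predicted: Cancer    Predicted: No Cancer
Actual: Cancer         0 (TP)               14 (FN)
Actual: No Cancer      0 (FP)               1005 (TN)
Recall = 0%
Precision = 0 / 0, undefined (no case is ever predicted as Cancer)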
“Always Positive” (Pthreshold = 0)
                       Predicted: Cancer    Predicted: No Cancer
Actual: Cancer         14 (TP)              0 (FN)
Actual: No Cancer      1005 (FP)            0 (TN)
Recall = 100%
Precision = 14 / 1019 ≈ 1.37%
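Precision-Recall Tradeoff
[Figure: precision and recall (0 to 1.0) plotted against the “conservativeness” of the decision threshold]
A more conservative (higher) Pthreshold generally raises precision and lowers recall; a less conservative one does the opposite.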
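The tradeoff can be seen by sweeping Pthreshold over a set of model scores. This sketch assumes hypothetical scores (predicted probability of Cancer) and labels for ten cases; `precision_recall_at` is likewise a hypothetical helper, not a library function.

```python
def precision_recall_at(scores, labels, p_threshold):
    """Precision and recall when predicting positive iff score >= p_threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= p_threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= p_threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < p_threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else None  # undefined with no positive predictions
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall

# Hypothetical model scores and true labels (1 = Cancer, 0 = No Cancer).
scores = [0.95, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, precision_recall_at(scores, labels, t))
```

Recall falls from 1.0 at Pthreshold = 0 to 0.0 at Pthreshold = 1, while precision climbs from 0.5 towards 0.67 and becomes undefined once no case is predicted positive.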
Heuristics to Choose a Model
ROC Curve: plot a curve to maximize true positives and minimize false positives.
F1 Score: harmonic mean of precision and recall.
F1 Score - Harmonic mean of precision, recall
$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
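The F1 formula applied to the cancer example, as a sketch (the `f1_score` helper here is hypothetical, not a library call). With precision 10/15 and recall 10/14 it gives 20/29 ≈ 68.97%, the same as 2TP / (2TP + FP + FN).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

p, r = 10 / 15, 10 / 14  # precision and recall from the cancer example
print(f"F1 = {f1_score(p, r):.2%}")  # F1 = 68.97%
```

Because it is a harmonic mean, F1 stays low whenever either input is low: the “always positive” classifier (precision ≈ 1.37%, recall 100%) scores an F1 of only about 2.7%.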
Choosing Pthreshold
ROC Curve (Receiver Operating Characteristic): True Positive Rate (y-axis) plotted against False Positive Rate (x-axis).
True Positive Rate = TP / (TP + FN), the same as recall: should be as high as possible.
False Positive Rate = FP / (FP + TN): should be as low as possible.
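The two ROC axes computed from the cancer example's counts (TP = 10, FN = 4, FP = 5, TN = 1000); `rates` is a hypothetical helper name.

```python
def rates(tp, fn, fp, tn):
    """True Positive Rate and False Positive Rate from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # identical to recall
    fpr = fp / (fp + tn)  # share of actual negatives wrongly flagged
    return tpr, fpr

tpr, fpr = rates(tp=10, fn=4, fp=5, tn=1000)
print(f"TPR = {tpr:.2%}, FPR = {fpr:.2%}")  # TPR = 71.43%, FPR = 0.50%
```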
Try different values of Pthreshold (hyperparameter tuning); each value gives one (False Positive Rate, True Positive Rate) point.
Fit the ROC curve through the points obtained from the different values of Pthreshold.
ROC Curve
Pick the point closest to the top-left corner as Pthreshold.
Why? It maximises the True Positive Rate while minimising the False Positive Rate.
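A sketch of the threshold search: compute one ROC point per candidate Pthreshold, then pick the point nearest the top-left corner (FPR = 0, TPR = 1). Measuring “closest” by Euclidean distance is an assumption, one common way to make the heuristic precise; the scores, labels, and helper names are hypothetical.

```python
import math

def roc_points(scores, labels, thresholds):
    """(threshold, FPR, TPR) for each candidate Pthreshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points

def closest_to_top_left(points):
    """Pick the point whose (FPR, TPR) is nearest to the corner (0, 1)."""
    return min(points, key=lambda p: math.hypot(p[1], 1 - p[2]))

# Hypothetical model scores and true labels (1 = positive, 0 = negative).
scores = [0.95, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]

points = roc_points(scores, labels, [0.0, 0.25, 0.5, 0.75, 1.0])
best = closest_to_top_left(points)
print(best)  # (0.5, 0.4, 0.8)
```

Pthreshold = 0 and Pthreshold = 1 land on the ROC corners (1, 1) and (0, 0); the intermediate value 0.5 gives the best balance here.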
ROC of Perfect Classifier
The curve reaches the top-left corner: True Positive Rate = 100% with False Positive Rate = 0%.
ROC of Random Classifier
The curve follows the diagonal: True Positive Rate = False Positive Rate.
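A small simulation contrasting the two reference cases, assuming synthetic data: random 0/1 labels, a “perfect” scorer that puts every positive above every negative, and a “random” scorer whose scores ignore the label. The helper `tpr_fpr` is hypothetical.

```python
import random

def tpr_fpr(scores, labels, p_threshold):
    """True Positive Rate and False Positive Rate at one threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= p_threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= p_threshold and y == 0)
    pos = sum(labels)
    return tp / pos, fp / (len(labels) - pos)

rng = random.Random(0)  # fixed seed so the run is repeatable
labels = [rng.randint(0, 1) for _ in range(10_000)]

# Perfect classifier: every positive scores above every negative.
perfect = [0.9 if y == 1 else 0.1 for y in labels]
# Random classifier: scores carry no information about the label.
random_scores = [rng.random() for _ in labels]

print(tpr_fpr(perfect, labels, 0.5))  # (1.0, 0.0): the top-left corner
for t in (0.25, 0.5, 0.75):
    print(t, tpr_fpr(random_scores, labels, t))  # TPR and FPR both close to 1 - t
```

For the random scorer, TPR and FPR track each other at every threshold, so its ROC points sit on the diagonal.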