MLS 2 - Classification
Classification
2. Gaussian Models
3. Logistic Regression
4. Performance Assessments
5. K-Nearest Neighbors
Confusion matrix (rows = Predicted Values, columns = Actual Values):

                         Actual Positive    Actual Negative
Predicted Positive (1)         TP                 FP
Predicted Negative (0)         FN                 TN

1. Accuracy: Proportion of correct predictions to all the
predictions, i.e., how often the model is right overall
Accuracy = (TP+TN)/(TP+FP+FN+TN)
2. Precision: Proportion of true positives to all the
predicted positives, i.e., how valid the predictions are
Precision = (TP)/(TP+FP)
3. Recall: Proportion of true positives to all the actual
positives, i.e., how complete the predictions are
Recall = (TP)/(TP+FN)
Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
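The three formulas above can be checked with a small sketch; the confusion-matrix counts below are made-up illustrative numbers, not data from the slides:

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # guard: no predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0     # guard: no actual positives
    return accuracy, precision, recall

# Hypothetical counts: 40 TP, 10 FP, 20 FN, 30 TN
acc, prec, rec = classification_metrics(40, 10, 20, 30)
print(acc, prec, rec)  # 0.7, 0.8, 0.666...
```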
Why accuracy is not always a good performance measure

Accuracy is simply the overall % of correct predictions and can be high even for very useless models.

Example: Out of 100 patients, the cancer rate is 2%. A model that predicts that no one has cancer achieves 98% accuracy while missing out on the 2 critical patients having cancer.

● Here, accuracy will be 98%, even if we simply predict that every patient does not have cancer.
● In this case, Recall should be used as a measure of model performance; high recall implies fewer false negatives.
● Fewer false negatives implies a lower chance of ‘missing’ a cancer patient, i.e., predicting a cancer patient as one not having cancer.
● This is where we need other metrics to evaluate model performance.
● The other important metrics are Recall and Precision:
  ○ Recall - What % of actual 1s did the model capture in prediction?
  ○ Precision - What % of predicted 1s are actual 1s?
● There is a tradeoff - as you try to increase the Recall, the Precision will reduce and vice versa.
● This tradeoff can be used to figure out the right threshold to use for the model.
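The tradeoff can be seen by sweeping the classification threshold on a toy set of predicted scores; the scores and labels below are made-up for illustration:

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when predicting 1 for every score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical model scores and true labels
scores = [0.9, 0.7, 0.6, 0.4]
labels = [1, 1, 0, 1]

# High threshold: very precise, but misses two actual positives
print(precision_recall(scores, labels, 0.8))  # (1.0, 0.333...)
# Low threshold: captures every positive, but picks up a false positive
print(precision_recall(scores, labels, 0.3))  # (0.75, 1.0)
```

Lowering the threshold raised recall from 1/3 to 1 while precision fell from 1.0 to 0.75, which is exactly the tradeoff used to pick an operating threshold.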
● F1 Score is a measure that takes into account both Precision and Recall.
● The F1 Score is the harmonic mean of Precision and Recall. Therefore, this score takes both false positives and false negatives into account.
● The highest possible value of the F1 score is 1, indicating perfect precision and recall, and the lowest possible value is 0.
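As a quick sketch, the harmonic-mean definition can be written out directly; the precision/recall values passed in below are made-up examples:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(1.0, 1.0))  # 1.0 -- perfect precision and recall
print(f1_score(0.9, 0.0))  # 0.0 -- zero recall drags the score to 0
```

Because the harmonic mean is dominated by the smaller of the two values, a model cannot score well on F1 by being good at only one of precision or recall.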
LDA:
● It is a linear classifier but much less flexible than QDA.
● It assumes a common covariance matrix for all the classes.
● It is preferred when the training set only has a few observations.

QDA:
● It is a non-linear classifier but more flexible than LDA.
● It assumes that each class has its own covariance matrix.
● It is preferred when the training set is very large.
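To make the shared vs per-class covariance distinction concrete, here is a minimal 1-D Gaussian classifier sketch: with pooled=True it mimics LDA's common-variance assumption, and with pooled=False each class keeps its own variance, as in QDA. The data and function names are illustrative, not from any library:

```python
import math

def fit_gaussians(xs, ys):
    """Estimate per-class mean, variance, and prior for a 1-D feature."""
    stats = {}
    for c in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == c]
        mu = sum(pts) / len(pts)
        var = sum((p - mu) ** 2 for p in pts) / len(pts)
        stats[c] = (mu, var, len(pts) / len(xs))
    return stats

def predict(x, stats, pooled=False):
    """Classify x by maximum Gaussian log-likelihood plus log-prior.

    pooled=True replaces every class variance with the prior-weighted
    average (LDA-style common covariance); pooled=False keeps the
    per-class variances (QDA-style).
    """
    if pooled:
        shared = sum(var * prior for (_, var, prior) in stats.values())
        stats = {c: (mu, shared, prior) for c, (mu, _, prior) in stats.items()}
    def score(mu, var, prior):
        return (-0.5 * math.log(2 * math.pi * var)
                - (x - mu) ** 2 / (2 * var)
                + math.log(prior))
    return max(stats, key=lambda c: score(*stats[c]))

# Two toy classes centred at 1.0 and 3.0
xs = [0.8, 1.0, 1.2, 2.5, 3.0, 3.5]
ys = [0, 0, 0, 1, 1, 1]
stats = fit_gaussians(xs, ys)
print(predict(1.0, stats, pooled=True))   # 0
print(predict(3.0, stats, pooled=False))  # 1
```

In one dimension the pooled-variance rule gives a single linear cut-point between the class means, while per-class variances allow the boundary to bend toward the tighter class, which is the 1-D analogue of LDA's linear vs QDA's quadratic decision boundaries.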