Lec5 Classification
Intro to classification
CSCI-P 556
ZORAN TIGANJ
u Let’s simplify the problem for now and only try to identify one digit—for
example, the number 5.
u This “5-detector” will be an example of a binary classifier, capable of
distinguishing between just two classes, 5 and not 5.
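As a concrete illustration, here is a minimal sketch of such a 5-detector in Scikit-Learn. The choice of SGDClassifier and the variable names (X_train, y_train_5) are assumptions for illustration, not taken from the slides:

```python
# Hypothetical 5-detector sketch: load MNIST, build a binary target, train a
# linear classifier. SGDClassifier is an illustrative choice of model.
from sklearn.datasets import fetch_openml
from sklearn.linear_model import SGDClassifier

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X_train, y_train = X[:60000], y[:60000]

y_train_5 = (y_train == "5")   # True for all 5s, False for every other digit

sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)

sgd_clf.predict(X_train[:1])   # does the model think the first image is a 5?
```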
8
Performance Measures
u Let’s do cross-validation:
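A sketch of what that evaluation might look like, reusing sgd_clf, X_train, and y_train_5 from the sketch above:

```python
# 3-fold cross-validation of the 5-detector using plain accuracy.
from sklearn.model_selection import cross_val_score

cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy")
# The accuracy usually looks very high here (well above 90%).
```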
u Well, before you get too excited, let’s look at a very dumb classifier that
just classifies every single image in the “not-5” class:
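One possible sketch of such a classifier (the class name is made up for illustration):

```python
# A classifier that ignores its input and always predicts "not a 5".
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import cross_val_score

class Never5Classifier(BaseEstimator):
    def fit(self, X, y=None):
        return self
    def predict(self, X):
        return np.zeros((len(X),), dtype=bool)

cross_val_score(Never5Classifier(), X_train, y_train_5, cv=3, scoring="accuracy")
# Roughly 90% accuracy, simply because only about 10% of the images are 5s.
```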
u Accuracy is generally not a good performance measure for classifiers when the data is skewed (i.e., when some classes are much more frequent than others).
u A better way to evaluate a classifier is to look at the confusion matrix, which counts how often instances of each actual class were classified as each predicted class.
u The confusion matrix gives you a lot of information, but sometimes you may prefer a more concise metric.
u An interesting one to look at is the accuracy of the positive predictions; this is called the precision of the classifier.
u Precision is typically used along with another metric named recall, also called sensitivity or the true positive rate (TPR): the ratio of positive instances that are correctly detected by the classifier.
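A sketch of how these metrics might be computed with Scikit-Learn, using out-of-fold predictions; cross_val_predict and the variable names are assumptions carried over from the earlier sketches:

```python
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Cross-validated predictions for every training instance.
y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)

confusion_matrix(y_train_5, y_train_pred)   # rows = actual class, columns = predicted class

precision_score(y_train_5, y_train_pred)    # precision = TP / (TP + FP)
recall_score(y_train_5, y_train_pred)       # recall (TPR) = TP / (TP + FN)
f1_score(y_train_5, y_train_pred)           # F1 = harmonic mean of precision and recall
```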
u The F1 score favors classifiers that have similar precision and recall.
u This is not always what you want: in some contexts you mostly care about
precision, and in other contexts you really care about recall.
u For example, if you trained a classifier to detect videos that are safe for kids, you
would probably prefer a classifier that rejects many good videos (low recall) but
keeps only safe ones (high precision).
u On the other hand, suppose you train a classifier to detect shoplifters in
surveillance images: it is probably fine if your classifier has only 30% precision as
long as it has 99% recall (sure, the security guards will get a few false alerts, but
almost all shoplifters will get caught).
u Unfortunately, you can’t have it both ways: increasing precision reduces
recall, and vice versa. This is called the precision/recall trade-off.
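A sketch of how the trade-off can be explored by working with decision scores and a hand-picked threshold instead of predict(); the threshold value below is arbitrary and only illustrative:

```python
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_curve

# Decision scores instead of hard predictions.
y_scores = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3,
                             method="decision_function")

# Precision and recall for every possible threshold.
precisions, recalls, thresholds = precision_recall_curve(y_train_5, y_scores)

# Raising the threshold trades recall for precision (and vice versa).
threshold = 3000                        # arbitrary illustrative value
y_pred_high_precision = (y_scores >= threshold)
```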
The receiver operating characteristic (ROC) curve
u The receiver operating characteristic (ROC) curve
is another common tool used with binary classifiers.
u It is very similar to the precision/recall curve, but
instead of plotting precision versus recall, the ROC
curve plots the true positive rate (another name
for recall) against the false positive rate (FPR).
u The FPR is the ratio of negative instances that are
incorrectly classified as positive. It is equal to 1 –
the true negative rate (TNR), which is the ratio of
negative instances that are correctly classified as
negative.
u The TNR is also called specificity. Hence, the ROC
curve plots sensitivity (recall) versus 1 – specificity.
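A sketch of how the ROC curve might be plotted for the 5-detector, reusing the cross-validated decision scores (y_scores) from the earlier sketch:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

fpr, tpr, thresholds = roc_curve(y_train_5, y_scores)

plt.plot(fpr, tpr, label="5-detector")
plt.plot([0, 1], [0, 1], "k--", label="purely random classifier")  # chance diagonal
plt.xlabel("False positive rate (1 - specificity)")
plt.ylabel("True positive rate (recall / sensitivity)")
plt.legend()
plt.show()
```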
Area under the curve (AUC)
u One way to compare classifiers is to measure the area under the curve
(AUC).
u A perfect classifier will have a ROC AUC equal to 1, whereas a purely
random classifier will have a ROC AUC equal to 0.5.
u Scikit-Learn provides a function to compute the ROC AUC:
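Presumably something along these lines, using the labels and decision scores from the earlier sketches:

```python
from sklearn.metrics import roc_auc_score

roc_auc_score(y_train_5, y_scores)   # 1.0 = perfect classifier, 0.5 = random guessing
```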
Comparing different classifiers using ROC
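One way to compare classifiers is to overlay their ROC curves and compare their ROC AUC scores. The RandomForestClassifier below is only an illustrative second model, not necessarily the one shown on the slide; since it has no decision_function, the probability of the positive class is used as its score, and fpr/tpr for the 5-detector are reused from the earlier sketch:

```python
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_curve, roc_auc_score

forest_clf = RandomForestClassifier(random_state=42)
y_probas_forest = cross_val_predict(forest_clf, X_train, y_train_5, cv=3,
                                    method="predict_proba")
y_scores_forest = y_probas_forest[:, 1]          # probability of the positive class

fpr_forest, tpr_forest, _ = roc_curve(y_train_5, y_scores_forest)

plt.plot(fpr, tpr, label="SGD (5-detector)")
plt.plot(fpr_forest, tpr_forest, label="Random forest")
plt.plot([0, 1], [0, 1], "k--")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()

roc_auc_score(y_train_5, y_scores_forest)
```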
OvR (one-vs-rest) vs. OvO (one-vs-one)