ROC curves
Precision = positive predictive value (PPV) = TP / P̂, where P̂ = TP + FP is the number of predicted positives
Sensitivity = recall = true positive rate = hit rate = TP / P = 1 - FNR
Specificity = TN / N = 1 - FPR
TPR = p(ŷ = 1 | y = 1)
FPR = p(ŷ = 1 | y = 0)
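
A minimal Python sketch of these rates computed from confusion-matrix counts (the function name and the toy counts are illustrative, not from the slides):

def rates(tp, fp, fn, tn):
    """Rates defined above, from confusion-matrix counts."""
    p = tp + fn       # actual positives, P
    n = fp + tn       # actual negatives, N
    p_hat = tp + fp   # predicted positives, P-hat
    return {
        "precision (PPV)": tp / p_hat,
        "recall / TPR": tp / p,        # = 1 - FNR
        "FPR": fp / n,                 # = 1 - specificity
        "specificity (TNR)": tn / n,
    }

print(rates(tp=4, fp=1, fn=1, tn=3))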
Example

Classifier 1 (scores rank all positives above all negatives):

i              1    2    3    4    5    6    7    8    9
y_i            1    1    1    1    1    0    0    0    0
p(y_i=1|x_i)   0.9  0.8  0.7  0.6  0.5  0.4  0.3  0.2  0.1
ŷ_i (τ = 0)    1    1    1    1    1    1    1    1    1
ŷ_i (τ = 0.5)  1    1    1    1    1    0    0    0    0
ŷ_i (τ = 1)    0    0    0    0    0    0    0    0    0

Classifier 2 (scores rank one negative above one positive):

i              1    2    3    4    5    6    7    8    9
y_i            1    1    1    1    1    0    0    0    0
p(y_i=1|x_i)   0.9  0.8  0.7  0.6  0.2  0.6  0.3  0.2  0.1
ŷ_i (τ = 0)    1    1    1    1    1    1    1    1    1
ŷ_i (τ = 0.5)  1    1    1    1    0    1    0    0    0
ŷ_i (τ = 1)    0    0    0    0    0    0    0    0    0
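
A short Python sketch that reproduces the ŷ_i rows above by thresholding the scores at τ (variable names are illustrative):

y        = [1, 1, 1, 1, 1, 0, 0, 0, 0]
scores_1 = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]  # classifier 1
scores_2 = [0.9, 0.8, 0.7, 0.6, 0.2, 0.6, 0.3, 0.2, 0.1]  # classifier 2

def predict(scores, tau):
    # y-hat_i = 1 iff p(y_i = 1 | x_i) >= tau
    return [int(s >= tau) for s in scores]

for tau in (0.0, 0.5, 1.0):
    print(tau, predict(scores_1, tau), predict(scores_2, tau))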
Performance measures
EER - equal error rate / crossover error rate: the operating point where false positive rate = false negative rate; smaller is better
AUC - area under the (ROC) curve; larger is better
Accuracy = (TP + TN) / (P + N)
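
A hedged sketch of computing AUC for the examples above, by sweeping the threshold and applying the trapezoid rule to the resulting (FPR, TPR) points (a simplified illustration, not a library implementation):

def roc_points(y, scores):
    # (FPR, TPR) pairs as the threshold sweeps from high to low
    p = sum(y)
    n = len(y) - p
    pts = [(0.0, 0.0)]
    for tau in sorted(set(scores), reverse=True):
        yhat = [int(s >= tau) for s in scores]
        tp = sum(yh for yh, yi in zip(yhat, y) if yi == 1)
        fp = sum(yh for yh, yi in zip(yhat, y) if yi == 0)
        pts.append((fp / n, tp / p))
    return pts + [(1.0, 1.0)]

def auc(pts):
    # trapezoid rule over consecutive ROC points
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

y = [1, 1, 1, 1, 1, 0, 0, 0, 0]
print(auc(roc_points(y, [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1])))  # 1.0 for classifier 1
print(auc(roc_points(y, [0.9, 0.8, 0.7, 0.6, 0.2, 0.6, 0.3, 0.2, 0.1])))  # ~0.85 for classifier 2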
Precision-recall curves
Useful when the notion of a negative (and hence FPR) is not well defined, or when negatives vastly outnumber positives (rare event detection).
Recall = of those that exist, how many did you find?
Precision = of those that you found, how many were correct?
F-score is the harmonic mean of precision and recall:
F = 2 / (1/P + 1/R) = 2PR / (P + R)
prec = p(y = 1 | ŷ = 1)
recall = p(ŷ = 1 | y = 1)
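
The F-score identity above in a small Python sketch (the function name and counts are illustrative):

def f_score(tp, fp, fn):
    prec = tp / (tp + fp)   # of those you found, how many were correct
    rec = tp / (tp + fn)    # of those that exist, how many you found
    # harmonic mean: F = 2 / (1/P + 1/R) = 2PR / (P + R)
    return 2 * prec * rec / (prec + rec)

print(f_score(tp=4, fp=1, fn=1))  # P = R = 0.8, so F = 0.8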
Word of caution
Consider binary classifiers A, B, C with the following joint distributions p(ŷ, y):

A      y=1   y=0
ŷ=1    0.9   0.0
ŷ=0    0.1   0.0

B      y=1   y=0
ŷ=1    0.8   0.1
ŷ=0    0.0   0.1

C      y=1   y=0
ŷ=1    0.78  0.12
ŷ=0    0.0   0.1
Mutual information between prediction and truth:

MI(ŷ, y) = Σ_{ŷ=0..1} Σ_{y=0..1} p(ŷ, y) log [ p(ŷ, y) / (p(ŷ) p(y)) ]

     Accuracy  Recall  Precision  F-score  MI
A    0.9       0.9     1.0        0.947    0
B    0.9       1.0     0.888      0.941    0.1865
C    0.88      1.0     0.8667     0.9286   0.1735

By accuracy and F-score A looks best, yet its predictions are statistically independent of the true label (MI = 0); by mutual information B > C > A.
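
A sketch that reproduces the table above from the joint distributions (natural log for MI; the dictionary layout is my own):

from math import log

# p(yhat, y) for each classifier; keys are (yhat, y)
joints = {
    "A": {(1, 1): 0.9,  (1, 0): 0.0,  (0, 1): 0.1, (0, 0): 0.0},
    "B": {(1, 1): 0.8,  (1, 0): 0.1,  (0, 1): 0.0, (0, 0): 0.1},
    "C": {(1, 1): 0.78, (1, 0): 0.12, (0, 1): 0.0, (0, 0): 0.1},
}

for name, p in joints.items():
    tp, fp, fn, tn = p[1, 1], p[1, 0], p[0, 1], p[0, 0]
    acc = tp + tn
    rec = tp / (tp + fn)
    prec = tp / (tp + fp)
    f = 2 * prec * rec / (prec + rec)
    # MI(yhat, y) = sum of p(yhat, y) log p(yhat, y) / (p(yhat) p(y)) over nonzero cells
    mi = sum(pxy * log(pxy / ((p[yh, 1] + p[yh, 0]) * (p[1, yv] + p[0, yv])))
             for (yh, yv), pxy in p.items() if pxy > 0)
    print(name, acc, rec, prec, round(f, 4), round(mi, 4))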