
Performance Measures

• The most common measure is accuracy


• Summed squared error
• Mean squared error
• Classification accuracy
• Precision, Recall, F-score
• ROC

Binary Classification
                                  Predicted Output
                                  1                       0
True Output (Target)   1    True Positive (TP)      False Negative (FN)
                             (Hits)                  (Misses)
                        0    False Positive (FP)     True Negative (TN)
                             (False Alarms)          (Correct Rejections)

Accuracy = (TP+TN)/(TP+TN+FP+FN)
Precision = TP/(TP+FP)
Recall = TP/(TP+FN)
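
As a quick illustration of the three formulas above (not part of the original slides; the function name and the example counts are made up), they can be computed directly from the four confusion-matrix cells:

```python
def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # guard against no predicted positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # guard against no actual positives
    return accuracy, precision, recall

# Hypothetical counts: 40 hits, 45 correct rejections, 5 false alarms, 10 misses
print(confusion_metrics(tp=40, tn=45, fp=5, fn=10))  # (0.85, 0.888..., 0.8)
```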
Precision
(Confusion matrix as on the previous slide.)

Precision = TP/(TP+FP)
The percentage of examples predicted positive
that are actually positive (target positives)
Recall
(Confusion matrix as on the previous slide.)

Recall = TP/(TP+FN)
The percentage of actual positives (target positives)
that were predicted as positive
Other Measures - Precision vs. Recall
• Considering precision and recall lets us choose an ML approach that
maximizes what we are most interested in (precision or recall), not
just accuracy.
• Tradeoff - ML parameters can also be adjusted to suit the goal of the
application.
• Break-even point: precision = recall
• F1 or F-score = 2 * (precision * recall) / (precision + recall) - the harmonic
mean of precision and recall (see the sketch after this list)
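
A minimal sketch of the F-score computation (the helper name is hypothetical; the example values reuse the hypothetical counts from the earlier sketch, not data from the slides):

```python
def f1_score(precision, recall):
    """F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(precision=0.889, recall=0.8))  # about 0.842
```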

ROC Curves and Area Under the Curve
• Receiver Operating Characteristic
• Developed in WWII to statistically model false positive and false
negative detections of radar operators
• Standard measure in medicine and biology
• True positive rate (sensitivity) vs. false positive rate (1 - specificity)
• True positive rate (probability of predicting true when it is true):
P(Pred:T|T) = Sensitivity = Recall = TP/P = TP/(TP+FN)
• False positive rate (probability of predicting true when it is false):
P(Pred:T|F) = FP/N = FP/(TN+FP) = 1 - Specificity,
where Specificity (true negative rate) = TN/N = TN/(TN+FP)
• Want to maximize TPR and minimize FPR (both rates are computed in the sketch after this list)
• How would you do each independently?
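
A small sketch of the two rates, assuming the same hypothetical confusion-matrix counts used earlier (the function name is made up):

```python
def roc_rates(tp, tn, fp, fn):
    """True positive rate (sensitivity/recall), false positive rate, and specificity."""
    tpr = tp / (tp + fn)           # P(Pred:T | T)
    fpr = fp / (fp + tn)           # P(Pred:T | F) = 1 - specificity
    specificity = tn / (tn + fp)   # true negative rate
    return tpr, fpr, specificity

print(roc_rates(tp=40, tn=45, fp=5, fn=10))  # (0.8, 0.1, 0.9)
```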

ROC Curves and ROC Area
• Want to find the right balance
• But the right balance/threshold can differ for each task considered
• How do we know which algorithms are robust and accurate across
many different thresholds? – ROC curve
• Each point on the ROC curve represents a different tradeoff (cost
ratio) between true positive rate and false positive rate
• The standard measures show accuracy for only one setting of the
cost-ratio threshold, whereas the ROC curve shows accuracy across all
settings and thus lets us compare how robust one algorithm is to
different thresholds relative to another

(The accompanying ROC curve figure, with points labeled .3, .5, and .8, is not reproduced here.)
• Assume thresholds:
• Threshold = 1 (0,0): all outputs are 0, so
TPR = P(T|T) = 0, FPR = P(T|F) = 0
• Threshold = 0 (1,1): all outputs are 1, so
TPR = 1, FPR = 1
• Threshold = .8 (.2,.2):
TPR = .38, FPR = .02 - better precision
• Threshold = .5 (.5,.5):
TPR = .82, FPR = .18 - better accuracy
• Threshold = .3 (.7,.7):
TPR = .95, FPR = .43 - better recall
Accuracy is maximized at the point closest to the top left corner.
Note that sensitivity = recall, and the lower the false positive rate, the higher the precision.
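
A hedged sketch of how such (FPR, TPR) points can be generated by sweeping a decision threshold over predicted scores; the labels, scores, and function name below are hypothetical and are not the data behind the slide's numbers:

```python
import numpy as np

def roc_points(y_true, scores, thresholds):
    """Classify score >= threshold as positive; return one (FPR, TPR) point per threshold."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    points = []
    for t in thresholds:
        pred = scores >= t
        tp = np.sum(pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        fn = np.sum(~pred & (y_true == 1))
        tn = np.sum(~pred & (y_true == 0))
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points

# Hypothetical labels and scores; thresholds 1, .8, .5, .3, 0 mirror the slide's sweep
y = [1, 1, 0, 1, 0, 0, 1, 0]
s = [0.9, 0.85, 0.7, 0.6, 0.4, 0.3, 0.8, 0.2]
print(roc_points(y, s, thresholds=[1.0, 0.8, 0.5, 0.3, 0.0]))
```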
ROC Properties

• Area Properties
• 1.0 - Perfect prediction
• .9 - Excellent
• .7 - Moderate
• .5 - Random
• ROC area (AUC) represents performance over all possible thresholds (approximated numerically in the sketch after this list)
• If two ROC curves do not intersect then one method dominates over
the other
• If they do intersect then one method is better for some thresholds,
and is worse for others
• In the accompanying figure (not reproduced here), the blue algorithm is better for precision, the yellow algorithm for recall, and the red for neither
• Can choose method and balance based on goals
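
As an illustration of how the ROC area could be estimated numerically (not from the slides), one can apply the trapezoidal rule to the curve's (FPR, TPR) points; the function name is made up, and the points below are the ones from the earlier threshold example:

```python
import numpy as np

def auc_trapezoid(fpr, tpr):
    """Approximate ROC area with the trapezoidal rule over points sorted by FPR."""
    order = np.argsort(fpr)
    return float(np.trapz(np.asarray(tpr)[order], np.asarray(fpr)[order]))

# (FPR, TPR) points from the earlier example: thresholds 1, .8, .5, .3, 0
fpr = [0.0, 0.02, 0.18, 0.43, 1.0]
tpr = [0.0, 0.38, 0.82, 0.95, 1.0]
print(auc_trapezoid(fpr, tpr))  # about 0.88
```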

Performance Measurement Summary
• Some of these measures (ROC, F-score) are gaining popularity
• They allow you to look at a range of thresholds
• However, they do not extend to multi-class situations, which are very
common
• On the other hand, medicine, finance, etc. have lots of two-class problems
• A multi-class problem could always be cast as a set of two-class problems, but that can be
inconvenient
• Accuracy handles multi-class outputs and is still the most common
measure, but it is often combined with other measures such as ROC

