Introduction to Forecast Verification - Fowler, Jensen and Brown
Verification
Tressa L. Fowler, Tara L. Jensen, Barbara G. Brown
– Administrative purpose
• Monitoring performance
• Choice of model or model configuration
(has the model improved?)
– Scientific purpose
• Identifying and correcting model flaws
• Forecast improvement
– Economic purpose
• Improved decision making
• “Feeding” decision models or decision support systems
Identifying verification goals
What questions do we want to answer?
• Examples:
– In what locations does the model have the best performance?
– Are there regimes in which the forecasts are better or worse?
– Is the probability forecast well calibrated (i.e., reliable)?
– Do the forecasts correctly capture the natural variability of the weather?
Other examples?
Identifying verification goals (cont.)
• What forecast performance attribute should be
measured?
• Related to the question as well as the type of forecast
and observation
[Figures: pairs of forecast (F) and observed (O) precipitation objects, e.g. over a watershed ("If I'm a water manager for this watershed, it's a pretty bad forecast…") and along a flight route from A to B]
Observations might be garbage if…
• Ensemble
  – Multiple iterations of a continuous or categorical forecast
    • May be transformed into a probability distribution
  – Observations may be continuous, dichotomous or multi-category
[Figure: ECMWF 2-m temperature meteogram for Helsinki]
Matching forecasts and observations
• May be the most difficult part of the verification
process!
• Many factors need to be taken into account
– Identifying observations that represent the forecast event
  • Example: precipitation accumulation over an hour at a point
– For a gridded forecast there are many options for the matching process (see the sketch below):
  • Point-to-grid
    – Match obs to the closest gridpoint
  • Grid-to-point
    – Interpolate?
    – Take the largest value?
  • Point-to-grid and grid-to-point
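As a rough illustration of these options, the sketch below (not from the lecture) matches a station observation to a forecast grid either by taking the nearest gridpoint or by bilinear interpolation. The regular, ascending latitude/longitude arrays and all names are assumptions.

import numpy as np

def nearest_gridpoint(fcst, grid_lats, grid_lons, obs_lat, obs_lon):
    # Grid-to-point by nearest neighbour on a regular lat/lon grid.
    i = np.argmin(np.abs(grid_lats - obs_lat))
    j = np.argmin(np.abs(grid_lons - obs_lon))
    return fcst[i, j]

def bilinear(fcst, grid_lats, grid_lons, obs_lat, obs_lon):
    # Interpolate the four surrounding gridpoints to the observation location.
    # Assumes ascending coordinate arrays and a point strictly inside the grid.
    i = np.searchsorted(grid_lats, obs_lat) - 1
    j = np.searchsorted(grid_lons, obs_lon) - 1
    wy = (obs_lat - grid_lats[i]) / (grid_lats[i + 1] - grid_lats[i])
    wx = (obs_lon - grid_lons[j]) / (grid_lons[j + 1] - grid_lons[j])
    return ((1 - wy) * (1 - wx) * fcst[i, j] + (1 - wy) * wx * fcst[i, j + 1]
            + wy * (1 - wx) * fcst[i + 1, j] + wy * wx * fcst[i + 1, j + 1])

lats = np.arange(50.0, 61.0)                      # illustrative 1-degree grid
lons = np.arange(10.0, 31.0)
field = np.random.default_rng(0).random((lats.size, lons.size))
print(nearest_gridpoint(field, lats, lons, 55.3, 17.8))
print(bilinear(field, lats, lons, 55.3, 17.8))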
Final point:
• Why not??
• Issue: Non-independence!!
How…
– …do you need/want to present results (e.g.,
stratification/aggregation)?
Which…
– …methods and metrics are appropriate?
– …methods are required (e.g., bias, event frequency, sample size)?
[Diagram: overlapping forecast and observation areas, illustrating hits (H), misses (M) and false alarms (F)]
Categorical Verification
Tara Jensen
Contributions from Matt Pocernich, Eric Gilleland,
Tressa Fowler, Barbara Brown and others
Finley Tornado Data
(1884)
Forecast answering the question: YES or NO
Observation answering the question: YES or NO
             Observed
           Yes     No   Total
Forecast
  Yes       28     72     100
  No        23   2680    2703
Total       51   2752    2803
Contingency Table
A Success?
             Observed
           Yes     No   Total
Forecast
  Yes       28     72     100
  No        23   2680    2703
Total       51   2752    2803
Percent Correct = (28+2680)/2803 = 96.6% !!!!
What if the forecaster never forecasted a tornado?

             Observed
           Yes     No   Total
Forecast
  Yes        0      0       0
  No        51   2752    2803
Total       51   2752    2803
Percent Correct = (0+2752)/2803 = 98.2% !!!!
Maybe accuracy is not the most informative statistic
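A small sketch (in Python, not part of the original slides) reproducing the two percent-correct values above from the Finley counts:

def percent_correct(hits, false_alarms, misses, correct_negatives):
    n = hits + false_alarms + misses + correct_negatives
    return 100.0 * (hits + correct_negatives) / n

print(percent_correct(28, 72, 23, 2680))   # Finley's forecasts: about 96.6 %
print(percent_correct(0, 0, 51, 2752))     # never forecasting a tornado: about 98.2 %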
The cells of the table are often labeled:

                 Observed
               Yes                No                  Total
Forecast
  Yes          Hit                False Alarm         Forecast Yes
  No           Miss               Correct Negative    Forecast No
Total          Obs. Yes           Obs. No             Total

In symbols (a = hits, b = false alarms, c = misses, d = correct negatives):

             Observed
           Yes     No   Total
Forecast
  Yes        a      b     a+b
  No         c      d     c+d
Total      a+c    b+d       n

Another Example:

               Observed
           <= 0C   > 0C   Total
Forecast
  <= 0C      a       b      a+b
  > 0C       c       d      c+d
Total      a+c     b+d        n

Base Rate (aka sample climatology) = (a+c)/n
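Building such a table from paired continuous values is just thresholding. A minimal sketch, with the 0 °C event and illustrative numbers assumed (not from the slides):

import numpy as np

def contingency_counts(fcst, obs, threshold=0.0):
    # Event = value at or below the threshold (here "temperature <= 0 C").
    f_yes = np.asarray(fcst) <= threshold
    o_yes = np.asarray(obs) <= threshold
    a = int(np.sum(f_yes & o_yes))        # hits
    b = int(np.sum(f_yes & ~o_yes))       # false alarms
    c = int(np.sum(~f_yes & o_yes))       # misses
    d = int(np.sum(~f_yes & ~o_yes))      # correct negatives
    return a, b, c, d

fcst = np.array([-2.0, 1.5, 0.0, 3.2, -0.5])      # illustrative temperatures (C)
obs = np.array([-1.0, -0.5, 1.0, 2.8, -0.2])
print(contingency_counts(fcst, obs))              # (a, b, c, d)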
Alternative Perspective on
Contingency Table
[Diagram: forecast and observed areas drawn as overlapping regions; the overlap holds the hits, the area outside both holds the correct negatives]
Conditioning to form a statistic
• Considers the probability of one event given another event
• Notation: p(X|Y=1) is the probability of X occurring given Y=1, in other words given Y = yes
Conditioning on Observations
Hit Rate = p(f=1|o=1) = a/(a+c): close to 1 is good
[aka Probability of Detection Yes (PODy)]
Fraction of misses = p(f=0|o=1) = c/(a+c): close to 0 is good
Examples of Categorical Scores
(most based on conditioning)
             Observed
           Yes     No   Total
Forecast
  Yes       28     72     100
  No        23   2680    2703
Total       51   2752    2803
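A minimal sketch (not the R verification package) computing conditional scores from the a, b, c, d counts of the table above. PODY, POFD and base rate appear in the slides; the false alarm ratio, frequency bias and threat score are standard companions added here, and the function name is an assumption.

def categorical_scores(a, b, c, d):
    n = a + b + c + d
    return {
        "base rate": (a + c) / n,
        "PODY (hit rate)": a / (a + c),
        "POFD (false alarm rate)": b / (b + d),
        "FAR (false alarm ratio)": b / (a + b),
        "frequency bias": (a + b) / (a + c),
        "CSI (threat score)": a / (a + b + c),
        "percent correct": (a + d) / n,
    }

# Finley tornado counts: a = 28, b = 72, c = 23, d = 2680
for name, value in categorical_scores(28, 72, 23, 2680).items():
    print(f"{name}: {value:.3f}")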
How to construct an empirical ROC:
• Bin your data
• Calculate PODY and POFD by moving through the bins, thus changing the definitions of a-d
• Plot PODY versus POFD as a scatter plot
Example data: probability that winds will be below cut-out speed, binned (bin mid-points), with counts per bin:

Bin      Count   Count
0-3      146     14
4-6       16      8
7-9       12      3
10-12     10     10
13-15     15      5
16-18      4      9
19-21      7      9
22-24      2      8
25-28      7      8
> 29       6     32
Calculation of Empirical ROC
Used to determine how well the forecast discriminates between event and non-event.
At each threshold, recompute a-d and the corresponding PODY and POFD, then plot hit rate versus false alarm rate.
The forecast does not need to be a probability! It does not need to be calibrated!
[Table: example a, b, c, d counts and the resulting PODY/POFD pairs at successive thresholds]
Empirical ROC Curve
[Figure: empirical ROC curve; a perfect forecast reaches the upper-left corner (POFD = 0, PODY = 1)]
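A minimal sketch of the empirical ROC construction described above, using synthetic data (the wind-speed counts are not reused); the threshold sweep and all names are assumptions.

import numpy as np

def empirical_roc(fcst_values, obs_events, thresholds):
    points = []
    for t in thresholds:
        yes = fcst_values >= t                    # forecast "yes" at this threshold
        a = np.sum(yes & obs_events)              # hits
        b = np.sum(yes & ~obs_events)             # false alarms
        c = np.sum(~yes & obs_events)             # misses
        d = np.sum(~yes & ~obs_events)            # correct negatives
        points.append((b / (b + d), a / (a + c))) # (POFD, PODY)
    return points

rng = np.random.default_rng(0)
obs = rng.random(500) < 0.3                       # synthetic observed events
fcst = 0.6 * obs + 0.4 * rng.random(500)          # synthetic forecast values
for pofd, pody in empirical_roc(fcst, obs, np.linspace(0.0, 1.0, 11)):
    print(f"POFD = {pofd:.2f}   PODY = {pody:.2f}")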
Presented by
Barbara G. Brown
• Statistics
– Bias
– Error statistics
– Robustness
– Comparisons
Exploratory methods:
joint distribution
Scatter-plot: plot of
observation versus
forecast values
For a perfect forecast (forecast = obs), points should lie on the 45° diagonal
Provides information on:
bias, outliers, error
magnitude, linear
association, peculiar
behaviours in extremes,
misses and false alarms
(link to contingency table)
Exploratory methods:
marginal distribution
Quantile-quantile plots:
OBS quantile versus the
corresponding FRCS quantile
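A minimal sketch of these two exploratory plots for synthetic temperature data; matplotlib is assumed to be available and the numbers are illustrative, not from the examples that follow.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
obs = rng.normal(20.0, 5.0, 300)                  # synthetic observed temperatures
fcst = obs - 2.0 + rng.normal(0.0, 3.0, 300)      # synthetic forecasts with a cold bias

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

ax1.scatter(fcst, obs, s=8)                       # scatter: observation vs forecast
ax1.plot([obs.min(), obs.max()], [obs.min(), obs.max()])   # 45-degree diagonal
ax1.set_xlabel("forecast"); ax1.set_ylabel("observation"); ax1.set_title("scatter plot")

q = np.linspace(0.01, 0.99, 99)                   # q-q plot: matched quantiles
ax2.plot(np.quantile(fcst, q), np.quantile(obs, q), marker=".")
ax2.plot([obs.min(), obs.max()], [obs.min(), obs.max()])
ax2.set_xlabel("forecast quantile"); ax2.set_ylabel("observed quantile"); ax2.set_title("q-q plot")

plt.tight_layout()
plt.show()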
Scatter-plot and qq-plot: example 1
Q: is there any bias? Positive (over-forecast) or
negative (under-forecast)?
Scatter-plot and qq-plot: example 2
Describe the peculiar behaviour of low temperatures
Scatter-plot: example 3
Describe how the error varies as the
temperatures grow
[Figure: scatter plot with an outlier marked]
Scatter-plot and
Contingency Table
Does the forecast correctly detect temperatures above 18 degrees? Does the forecast correctly detect temperatures below 10 degrees?
Example Box (and Whisker) Plot
• Spread:

Standard deviation: $s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$

Inter Quartile Range: $\mathrm{IQR} = q_{0.75} - q_{0.25}$

          MEAN    MEDIAN   STDEV   IQR
  OBS     20.71   20.25    5.18    8.52
  FRCS    18.62   17.00    5.99    9.75
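The table's statistics can be reproduced for any paired sample; a minimal sketch with synthetic data (the values will not match the table exactly):

import numpy as np

def summary(x):
    return {"mean": np.mean(x),
            "median": np.median(x),
            "stdev": np.std(x),                             # 1/n form
            "IQR": np.quantile(x, 0.75) - np.quantile(x, 0.25)}

rng = np.random.default_rng(2)
obs = rng.normal(20.7, 5.2, 200)                            # synthetic observations
fcst = rng.normal(18.6, 6.0, 200)                           # synthetic forecasts
print("OBS ", summary(obs))
print("FRCS", summary(fcst))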
Exploratory methods:
conditional distributions
Conditional histogram and
conditional box-plot
Exploratory methods:
conditional qq-plot
Continuous scores: linear bias
Attribute: measures the bias

$\text{linear bias} = \text{Mean Error} = \frac{1}{n}\sum_{i=1}^{n}\left(f_i - o_i\right) = \bar{f} - \bar{o}$
Mean Error = average of the errors = difference between the
means
It indicates the average direction of the error: positive bias indicates over-forecast, negative bias indicates under-forecast (f = forecast, o = observation)
Does not indicate the magnitude of the error (positive and negative errors can cancel out)
Bias correction: misses (false alarms) improve at the expense of false alarms (misses). Q: If I correct the bias in an over-forecast, do false alarms grow or decrease? And the misses?
Good practice rules: the sample used for evaluating a bias correction should be consistent with the sample corrected (e.g. winter separated from summer); for fair validation, cross-validation should be adopted for bias-corrected forecasts.
Mean Absolute Error
Attribute: measures accuracy

$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|f_i - o_i\right|$

$\mathrm{MAD} = \mathrm{median}\left\{\left|f_i - o_i\right|\right\}$
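A minimal sketch computing the three measures just defined for paired forecast/observation arrays; the sample values are illustrative.

import numpy as np

def bias_and_accuracy(fcst, obs):
    err = np.asarray(fcst) - np.asarray(obs)
    return {"ME (linear bias)": np.mean(err),               # direction of error
            "MAE": np.mean(np.abs(err)),                    # accuracy
            "MAD": np.median(np.abs(err))}                  # robust accuracy

fcst = np.array([2.1, 0.0, 5.4, 3.3, 7.8])                  # illustrative values
obs = np.array([1.5, 0.2, 4.0, 3.5, 6.9])
print(bias_and_accuracy(fcst, obs))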
[Figure: score plotted against forecast lead time (24, 48, 72, 96, 120 h)]
Continuous scores: linear correlation
Attribute: measures (linear) association

$r_{XY} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)\left(x_i - \bar{x}\right)}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}} = \frac{\mathrm{cov}(Y, X)}{s_Y s_X}$

For forecasts f and observations x:

$\rho_{fx} = \frac{\mathrm{Cov}(f, x)}{\sqrt{\mathrm{Var}(f)\,\mathrm{Var}(x)}}, \qquad r_{fx} = \frac{\sum_{i=1}^{n}\left(f_i - \bar{f}\right)\left(x_i - \bar{x}\right)}{(n-1)\, s_f s_x}$
Continuous scores:
anomaly correlation
• Correlation calculated on anomalies.
• An anomaly is the difference between what was forecast (or observed) and climatology.
• Centered or uncentered
versions.
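A minimal sketch contrasting ordinary correlation with the (centered) anomaly correlation; the sinusoidal "climatology" is an assumption chosen to show how a shared seasonal cycle inflates the ordinary correlation.

import numpy as np

def correlation(fcst, obs):
    return np.corrcoef(fcst, obs)[0, 1]

def anomaly_correlation(fcst, obs, climatology):
    # Centered version: correlate departures from climatology.
    return np.corrcoef(fcst - climatology, obs - climatology)[0, 1]

rng = np.random.default_rng(3)
clim = 15.0 + 10.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, 100))   # assumed seasonal cycle
obs = clim + rng.normal(0.0, 3.0, 100)
fcst = clim + rng.normal(0.0, 3.0, 100)           # forecast skill comes only from climatology
print(correlation(fcst, obs))                     # high: shared seasonal cycle
print(anomaly_correlation(fcst, obs, clim))       # near zero: no skill beyond climatology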
MSE and bias correction
$\mathrm{MSE} = \left(\bar{f} - \bar{o}\right)^2 + s_f^2 + s_o^2 - 2\, s_f s_o r_{fo}$

$\mathrm{MSE} = \mathrm{ME}^2 + \mathrm{var}(f - o)$
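A minimal numerical check of the two decompositions above, with synthetic data and population (1/n) variances assumed.

import numpy as np

rng = np.random.default_rng(4)
obs = rng.normal(10.0, 3.0, 1000)
fcst = obs + 1.5 + rng.normal(0.0, 2.0, 1000)     # biased, noisy synthetic forecast

err = fcst - obs
mse = np.mean(err ** 2)
me = np.mean(err)
sf, so = np.std(fcst), np.std(obs)                # population (1/n) standard deviations
r = np.corrcoef(fcst, obs)[0, 1]

print(mse)
print(me ** 2 + np.var(err))                                              # ME^2 + var(f - o)
print((fcst.mean() - obs.mean()) ** 2 + sf**2 + so**2 - 2 * sf * so * r)  # second decomposition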
R Verification package.
www.cran.r-project.org/web/packages/verification.index.html
References:
Jolliffe and Stephenson (2003): Forecast Verification: A Practitioner's Guide, Wiley & Sons, 240 pp.
Wilks (2011): Statistical Methods in the Atmospheric Sciences, Academic Press, 467 pp.
Stanski, Burrows, Wilson (1989): Survey of Common Verification Methods in Meteorology.
http://www.eumetcal.org.uk/eumetcal/verification/www/english/courses/msgcrs/index.htm
http://www.cawcr.gov.au/projects/verification/verif_web_page.html