
1268 JOURNAL OF HYDROMETEOROLOGY VOLUME 13

Evaluation of Global Flood Detection Using Satellite-Based Rainfall and a Hydrologic Model

HUAN WU AND ROBERT F. ADLER


Earth System Science Interdisciplinary Center, University of Maryland, College Park, College Park, and NASA
Goddard Space Flight Center, Greenbelt, Maryland

YANG HONG
School of Civil Engineering and Environmental Sciences, and Atmospheric Radar Research Center,
University of Oklahoma, Norman, Oklahoma

YUDONG TIAN
Earth System Science Interdisciplinary Center, University of Maryland, College Park, College Park,
and NASA Goddard Space Flight Center, Greenbelt, Maryland

FRITZ POLICELLI
NASA Goddard Space Flight Center, Greenbelt, Maryland

(Manuscript received 22 July 2011, in final form 2 April 2012)

ABSTRACT

A new version of a real-time global flood monitoring system (GFMS) driven by Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA) rainfall has been developed and implemented using a physically based hydrologic model. The purpose of this paper is to evaluate the performance of this new version of the GFMS in terms of flood event detection against flood event archives to establish a baseline of performance and directions for improvement. This new GFMS is quantitatively evaluated in terms of flood event detection during the TRMM era (1998–2010) using a global retrospective simulation (3-hourly and 1/8° spatial resolution) with the TMPA 3B42V6 rainfall. Four methods were explored to define flood thresholds from the model results, including three percentile-based statistical methods and a Log Pearson type-III flood frequency curve method. The evaluation showed the GFMS detection performance improves [increasing probability of detection (POD)] with longer flood durations and larger affected areas. The impact of dams was detected in the validation statistics, with the presence of dams tending to result in more false alarms and greater false-alarm duration. The GFMS validation statistics for flood durations >3 days and for areas without dams vary across the four methods, but center around a POD of ~0.70 and a false-alarm rate (FAR) of ~0.65. The generally positive results indicate the value of this approach for monitoring and researching floods on a global scale, but also indicate limitations and directions for improvement of such approaches. These directions include improving the rainfall estimates, utilizing higher resolution in the runoff-routing model, taking into account the presence of dams, and improving the method for flood identification.
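
The POD and FAR values quoted in the abstract are categorical verification scores computed from a 2 × 2 contingency table, using the formulas given in section 3. As a minimal illustration, the Python sketch below evaluates those formulas. The function name and the example counts are hypothetical; the counts were chosen only so that the scores land near the quoted values and are not taken from the paper.

```python
def categorical_scores(a: int, b: int, c: int) -> dict:
    """Return POD, FAR, and CSI from contingency-table counts.

    a: GFMS yes, reported yes (hits)
    b: GFMS yes, reported no  (false alarms)
    c: GFMS no,  reported yes (misses)
    """
    if a + c == 0 or a + b == 0:
        raise ValueError("need at least one reported and one simulated event")
    return {
        "POD": a / (a + c),      # probability of detection
        "FAR": b / (a + b),      # false-alarm ratio
        "CSI": a / (a + b + c),  # critical success index
    }


# Hypothetical counts for illustration only.
scores = categorical_scores(a=70, b=130, c=30)
print(scores)
```

The count d (GFMS no, reported no) does not enter any of the three scores, which is why it is not an argument here.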

Corresponding author address: Huan Wu, Earth System Science Interdisciplinary Center, University of Maryland, College Park, 5825 University Court, Suite 4001 College Park, MD 20740-3823.
E-mail: [email protected]

DOI: 10.1175/JHM-D-11-087.1

© 2012 American Meteorological Society

1. Introduction

Floods are a leading natural disaster, common and costly, and responsible for about one-third of natural catastrophes (Smith and Ward 1998). Losses caused by floods have been rising rapidly because of extreme weather conditions, urbanization, and inadequate disaster response. Hydrologic model–based flood forecasting systems have been regarded as the most effective way for flood early warning and monitoring and subsequent hazard mitigation and management (e.g., Dutta et al. 2000; Al-Sabhan et al. 2003; Hong et al. 2007; Reed et al. 2007; Yilmaz et al. 2010; among many others). However, almost all these existing flood forecasting systems are established at local or regional scales (e.g., Reed et al.


AUGUST 2012 WU ET AL. 1269

2007; Cloke and Pappenberger 2009; Pappenberger and Buizza 2009; Voisin et al. 2011), usually in developed regions, where sufficient resources are available, while many remote, ungauged regions and regions with transboundary basins remain without such systems. Ongoing improvements in global remote sensing data for estimating precipitation and delineating land surface characteristics (e.g., land cover, vegetation, topography, and hydrography) have augmented hydrological simulations on a wide range of scales, including the global scale. Developing global flood forecasting systems based on hydrologic models driven by remote sensing data at relatively high spatial and temporal resolution is now practical and has the potential for providing useful information for flood estimation and management, especially for underdeveloped or remote regions. However, challenges remain in accurate precipitation estimation, globally distributed parameterization for hydrological modeling, etc. But with improved accuracy, coverage, and resolution from satellite-based rainfall estimation (Adler et al. 2003), these products have been used in many hydrologic modeling applications with positive performance (e.g., Hong et al. 2006; Artan et al. 2007; Shrestha et al. 2008; Su et al. 2008; Pan et al. 2010; Su et al. 2011; among others). One such satellite rainfall product, the National Aeronautics and Space Administration (NASA) Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA; Huffman et al. 2007), has been used extensively and provides quasi-global (50°S–50°N) precipitation analyses at 3-hourly, 0.25° latitude–longitude resolution, with all satellite estimates calibrated or adjusted to the information from the TRMM satellite itself, which carries both a radar and passive microwave sensor.

Using the real-time version of the TMPA rainfall information, an experimental global flood monitoring system (GFMS) was developed (Hong et al. 2007) and has been running routinely for the last few years with results being displayed at the NASA TRMM website (http://trmm.gsfc.nasa.gov/). In this original GFMS, a simplified hydrologic infiltration module using a curve number (CN) approach and an antecedent precipitation index method as soil moisture proxy is used to partition rainfall and a linear slope–based flow speed and direction scheme is used to route runoff in order to predict potential floods over the quasi-globe in near real time. This original GFMS was evaluated in terms of detecting flood events by Yilmaz et al. (2010), which showed that the simplified CN-based hydrologic approach has some skill in detecting floods, especially during the early stages of flood events, but has low performance in flood event detection metrics (e.g., probability of detection) and delineation (e.g., flood evolution in the river network). Both studies concluded that a relatively more physically based hydrologic model may improve the GFMS performance (Hong et al. 2007; Yilmaz et al. 2010). The Coupled Routing and Excess Storage (CREST) hydrologic model, later developed (Wang et al. 2011) for this purpose, is the subject of the evaluation in this paper.

The purpose of this paper is to evaluate the performance of the new version of the GFMS in flood detection against available flood event archives to indicate the skill and limitations of the system. This paper is organized as follows. In section 2, we describe the method used in this study. The results of the evaluation are described and discussed in section 3, including the GFMS flood detection performance at various scales and the impacts of dams on the results. Conclusions are in section 4 and future work is discussed in section 5.

2. Methodology

The GFMS combines the satellite-based estimates of precipitation, runoff generation, runoff routing, and flood identification. A unified algorithm for flood event identification and matching between modeled results and reported floods was developed. Four different flood threshold definition methods using GFMS output were developed and utilized to evaluate the sensitivity of the results to this variation.

a. Hydrologic model and data

The new (current) version of the GFMS uses the CREST model (Wang et al. 2011) to simulate the spatial and temporal variation of land surface and subsurface water fluxes and storages by cell-to-cell simulation, considering canopy interception, infiltration, and evapotranspiration processes. However, there are no cool season processes (e.g., snow or frost) considered in the model at this time. The CREST model calculates infiltration and surface runoff using the variable infiltration capacity curve similar to the Xinanjiang model (Zhao and Liu 1995) and the Variable Infiltration Capacity (VIC) model (Liang et al. 1994, 1996). It employs a vertical, parallel, multilinear reservoir module adapted from the Xinanjiang model (Zhao and Liu 1995) coupled with a simplified cell-to-cell routing scheme with high computing efficiency. The CREST model main inputs include rainfall (e.g., TMPA), potential evapotranspiration (Famine Early Warning Systems Network; http://igskmncnwb015.cr.usgs.gov/global/) and hydrography. The hydrography data include 1/8° resolution flow direction and drainage area derived by the hierarchical dominant river tracing (DRT) algorithm by Wu et al. (2011) using 30 arc-second-resolution Hydrological Data and Maps Based on Shuttle Elevation Derivatives at


Multiple Scales (HydroSHEDS; Lehner et al. 2008) as baseline fine-resolution hydrography inputs. For this exercise and for the current real-time application we do not calibrate the CREST model because of the difficulty of doing so across the globe at 1/8° resolution; more importantly, we assume the hydrologic model still has skill in ranking events at locations, even though the model-simulated flood magnitudes may be locally biased relative to observed data (Reed et al. 2007). All the model parameters were either directly estimated from input data, or used as a priori parameters (see detailed parameter estimation and description by Wang et al. 2011).

We performed the evaluation based on the retrospective simulation results of the CREST model forced by the TMPA version 6 (V6) research quality data from 1998 to 2010 at 3-hourly time resolution and 1/8° latitude–longitude spatial resolution over the TRMM quasi-global domain. The TMPA (V6) rainfall, which is only available about a month after observation time, is used for this study because of its consistency during the 13-yr TRMM period used, as compared to the real-time version of the product (TMPA RT), which has changed significantly over that period. The current version of the real-time TMPA (RT) rainfall, used for the real-time GFMS, uses monthly and regional climatological adjustments to produce real-time estimates close to the after-the-fact V6, which includes monthly rain gauge information used for bias adjustments of the satellite rainfall estimates (Huffman et al. 2009). While the model outputs major hydrologic variables including discharge (m³ s⁻¹), routed runoff (mm), evapotranspiration and soil water (mm), etc., the evaluation was performed mainly using the routed runoff variable (depth of water in each grid cell at each time), which represents the total amount of water stored in each grid cell surface at each time interval, routed from its upstream drainage area. The routed runoff variable was chosen for the evaluation because it directly represents the magnitude (depth) of water stored on the dry land surface for each grid cell regardless of flow conditions of inbank or overbank. The routed runoff and discharge can be calculated from one another at each grid cell. The simulated routed runoff was stored for each grid cell at every 3-h time interval for the simulated 13 years. These routed runoff results were then used to determine the flood definition statistics for each grid cell.

There are several global flood event databases available for comparison with the model. Most of them are online resources, for example, the Emergency Events Database (EM-DAT) by the Centre for Research on the Epidemiology of Disasters (CRED) (http://www.emdat.be/), Global Identifier Number (GLIDE) disaster database [Asian Disaster Reduction Center (ADRC); http://www.glidenumber.net/], Financial Tracking Service (FTS) global, real-time database [U.N. Office for Coordination of Humanitarian Affairs (OCHA); http://fts.unocha.org/], Dartmouth Flood Observatory (DFO) (http://floodobservatory.colorado.edu/), European Commission Joint Research Center (JRC) Global Disaster Alert and Coordination System (GDACS) (http://www.gdacs.org/flooddetection/), and the International Flood Network (IFNET) (http://www.internationalfloodnetwork.org/). However, although most databases record flood date, duration, and country, more detailed information such as geographical location (latitude and longitude) and river basin are often not recorded. This type of information is critically needed for the evaluation in this study. Since 2006, the DFO has begun to record geographical locations of flood events based on the center of a polygon enclosing the inundated area. A longer-period global flood inventory (GFI) based on DFO, EM-DAT, FTS, and IFNET was compiled for 11 years (1998–2008), coinciding with the availability of TRMM precipitation products (Adhikari et al. 2010). Geographical locations of flood events in GFI were mainly taken from DFO (for 2006–2008) with additional reports, aerial photographs, and remote sensing images used through tedious verification and cross-checking processes with Google Earth (Adhikari et al. 2010). We employed the DFO (2006–2010) and GFI (1998–2008) flood event databases (referred to as the flood database) as the reference for the quantitative evaluation of the GFMS performance in flood detection, as both provide flood location and duration. Affected areas of flood events are also available from the DFO flood database. There are 929 and 2672 reported flood events within the study domain (50°S–50°N) by DFO and GFI, respectively, after removal of flood events caused by dam failure and snowmelt, which are not represented in the current GFMS formulation. A combined flood database was created using GFI (1998–2008) and DFO (2009–2010) for the evaluation.

b. Flood threshold definition

The Log Pearson type-III (LP3) distribution presented in Bulletin 17B (B17) (IACWD 1982) is the method currently recommended by United States federal agencies for flood frequency analysis. The LP3 distribution is recommended to fit the observed flood flow data using three sample moments (mean, standard deviation, and skew) calculated from the logarithmic transformed data. Magnitudes can be derived from the analytical LP3 distribution fit for floods with various return periods, and these magnitudes can be used as thresholds for flood definition. Although B17 tries to use LP3 to promote a consistent, uniform approach to flood frequency determination (Chow


TABLE 1. The definition of threshold values to define flood from the four methods. The unit for u is mm and for FAC is km².

Methods     Thresholds
Method 1    Log Pearson type-III 2-yr return flood
Method 2    P95 + 30 (mm)
Method 3    P95 + σ + u(FAC)
Method 4    P98

u = 6 (FAC ≤ 3000); u = 10 (3000 < FAC ≤ 1.0 × 10⁶); u = 20 (FAC ≥ 1.0 × 10⁶)
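
The percentile-based rows of Table 1 can be written down compactly. The following Python sketch is one possible reading of methods 2–4 for a single grid cell, assuming the per-cell statistics (P95, P98, and the standard deviation term of method 3) are already available. All function names and the example numbers are hypothetical. Because Table 1 lists FAC = 1.0 × 10⁶ km² in both the middle and upper bands, the sketch arbitrarily assigns that exact value to the middle band.

```python
def u_from_fac(fac_km2: float) -> float:
    """Additional threshold u (mm) for an upstream area FAC (km^2), per Table 1."""
    if fac_km2 <= 3000.0:
        return 6.0
    if fac_km2 <= 1.0e6:  # assumption: exactly 1.0e6 km^2 falls in the middle band
        return 10.0
    return 20.0


def percentile_thresholds(p95: float, p98: float, sigma: float, fac_km2: float) -> dict:
    """Flood thresholds (mm of routed runoff) for methods 2, 3, and 4."""
    return {
        "method2": p95 + 30.0,
        "method3": p95 + sigma + u_from_fac(fac_km2),
        "method4": p98,
    }


def is_flooding(routed_runoff_mm: float, threshold_mm: float) -> bool:
    """A cell is flagged as flooding when routed runoff exceeds its threshold."""
    return routed_runoff_mm > threshold_mm


# Hypothetical statistics for one grid cell.
thresholds = percentile_thresholds(p95=40.0, p98=55.0, sigma=12.0, fac_km2=5000.0)
flags = {name: is_flooding(65.0, value) for name, value in thresholds.items()}
print(thresholds, flags)
```

With these made-up inputs, a routed runoff of 65 mm exceeds the method 3 and method 4 thresholds but not the method 2 threshold, which illustrates how the choice of method can change whether a cell is flagged.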

et al. 1988), it also indicates that flood events do not fit any one specific, known statistical distribution (IACWD 1982). With only historic streamflow, it is difficult to derive thresholds for identifying floods (Hirsch 1987), and no single probability distribution is the best to fit flood events under all situations in terms of variations in space and time. To define generalized reliable thresholds for flood identification for global-scale applications is even more challenging. However, in this study we are focused on the problem of flood detection but not flood intensity. Instead, in this evaluation of flood detection we are mainly focused on differentiating flood flow (either overbank or even more severe) from normal flow (below fullbank) given historic hydrologic data (from the simulations), regardless of the length of the return period or magnitude of an identified flood. This makes the flood definition problem in this study relatively easier. In addition to using the LP3 method, we also performed a series of experiments using a statistical percentile-based method to determine alternate thresholds for flood identification.

1) LP3 DISTRIBUTION METHOD

The LP3 distribution has been widely used for hydrological data analysis in many applications. The third parameter of LP3 (skew) permits the fitting of asymmetric distribution. When the coefficient of skewness is zero, the LP3 becomes identical to the lognormal distribution. Flood magnitudes estimated by the LP3 distribution are very sensitive to the value of coefficient of skewness. Because the coefficient of skewness is very sensitive to the size of the sample and difficult to accurately estimate from small samples, B17 recommends a generalized estimator for coefficient of skewness by combining the station skew with a regional skew generalized from annual maximum streamflow using the inverse of their mean square errors as weights (IACWD 1982). As the generalized global map for skew is not available and difficult to use, we simply derived the LP3 for each grid cell over the globe from each grid cell's corresponding 13-yr annual maximum routed runoff (converted to discharge in units of m³ s⁻¹), following procedures by Chow et al. (1988) as described in Eqs. (1)–(6):

$$X_T = \mu + K_T \sigma, \tag{1}$$

$$K_T = z + (z^2 - 1)k + \frac{1}{3}(z^3 - 6z)k^2 - (z^2 - 1)k^3 + z k^4 + \frac{1}{3}k^5, \tag{2}$$

$$k = C_s/6, \tag{3}$$

$$z = w - \frac{2.515517 + 0.802853w + 0.010328w^2}{1 + 1.432788w + 0.189269w^2 + 0.001308w^3}, \tag{4}$$

$$w = \left[\ln\left(\frac{1}{p^2}\right)\right]^{1/2}, \quad \text{and} \tag{5}$$

$$p = 1/T, \tag{6}$$

where X_T is the magnitude (logarithmic transformed) of a flood flow with return period of T years, μ is the mean and σ is the standard deviation and C_s is the skew calculated from the annual maximum discharges (converted from routed runoff and 10-based log transformed), K_T is a frequency factor approximated by Eq. (2), z is the standard normal variable approximated by Eq. (4), and p is the exceedance probability. The X_T related to the 2-yr return period, after being logarithmic back transformed and converted back to routed runoff in units of depth (mm), was selected to define the threshold to define flood in this study. On average, rivers are fullbank about every 2 years (Carpenter et al. 1999; Reed et al. 2007). Therefore, the magnitude of flood corresponding to a 2-yr return period was selected from the LP3 method as the threshold to define floods (Table 1). We used the 2-yr return period flood threshold estimated from the 13 years of data to define all floods. The LP3 method was also adopted by Reed et al. (2007) to estimate flood frequency using an 8-yr simulation for flash flood forecasting. However, we used the LP3 only as a binary indicator of flood occurrence tuned against the reported flood inventory, which should be reliable. Hereafter, the LP3 method is referred to as method 1.

2) PERCENTILE-BASED METHOD

Model-derived routed runoff absolute values are strongly determined by model assumptions and calibration, but relative values such as percentile statistics (probability of exceedance) can be used, especially for extreme


FIG. 1. Quasi-global 95th percentile routed runoff (mm) map derived from 13-yr retrospective
simulation.
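
A map such as the one in Fig. 1 amounts to taking, for every grid cell, the 95th percentile of that cell's simulated routed runoff time series. The sketch below shows this per-cell operation in plain Python. The paper does not state which percentile interpolation rule was used, so linear interpolation between order statistics is an assumption here, and the function names, the dictionary layout, and the toy series are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics.

    The interpolation rule is an assumption; the source does not specify one.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("empty series")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def p95_map(runoff_by_cell):
    """Map each (row, col) cell to the P95 of its routed runoff series (mm)."""
    return {cell: percentile(series, 95.0) for cell, series in runoff_by_cell.items()}


# Toy input: two cells with made-up 3-hourly routed runoff values (mm).
toy = {
    (0, 0): [float(v) for v in range(101)],  # values 0, 1, ..., 100
    (0, 1): [2.0] * 10,                      # a cell with constant runoff
}
p95 = p95_map(toy)
print(p95)
```

The same helper with q = 98 would give the P98 value used by method 4.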

events, to effectively compare simulated and reported flood events. Brakenridge et al. (2007) developed a methodology for satellite-based flood detection by thresholding the Advanced Microwave Scanning Radiometer for Earth Observing System (AMSR-E) passive microwave signal of water surface change using 95th percentile values. In the evaluation of the initial GFMS by Yilmaz et al. (2010), five different geographical zones were defined considering hydroclimatic variations and the runoff threshold for each zone was defined as the 0.98 exceedance probability of 3-hourly runoff in each zone during the 1-yr study time period. As percentile value represents the relative rank of routed runoff for each grid cell, spatially distributed percentile values can be used to determine thresholds to differentiate flood flow from normal flow status. We assume the 95th percentile routed runoff (referred to as P95) represents the streamflow within bank, while some higher percentile of routed runoff value represents the river in the fullbank status. We will determine a grid cell is flooding when its routed runoff is greater than a threshold value, which represents a river in fullbank status. Because of spatial heterogeneity of climate and landscape characteristics across the globe and their effects on hydrological response, using a uniform percentile (e.g., 95th and 98th) based threshold to define flood over the globe may not be suitable. Instead of seeking a spatially distributed percentile map to define floods, we employed the 95th percentile routed runoff value of each grid cell as the starting point to define thresholds for flood identification; that is, we use the 95th percentile routed runoff value plus an additional threshold value to approximate fullbank routed runoff value for each grid cell (Table 1). The 95th percentile routed runoff values derived for each grid cell over the globe (Fig. 1) show a distributed spatial pattern generally consistent with the natural river network, with increasing value along river flow path. Remember the routed runoff variable is the depth of water from dry ground (river bottom) at the 1/8° scale.

In method 2, we used the P95 plus a constant value (i.e., 30 mm for this study) to represent fullbank status over the globe (Table 1). The 30-mm value was chosen based on experiments as to what value subjectively gave a reasonable number of flood events as compared to the DFO flood database (2006–10). As seen in Fig. 1, headwater and overland areas of basins in the map of P95 are separated from more downstream portions of rivers. However, the P95, even with the constant 30 mm added as in method 2, cannot effectively separate rivers with high interannual or seasonal variations from rivers with low variations. To account for these variations in method 3, additional parameters are added. These additional parameters are used to take into account river hydrograph variations and are needed to increase the P95 to represent the fullbank status. Generally, a smaller (larger) range of difference between the P95 and routed runoff value at which a river is fullbank is expected for rivers with less (more) interannual or seasonal variations. Standard deviation (σ) of the routed runoff over the 13 years represents the variation or dispersion from the mean and can be used to measure the interannual and seasonal variability of streamflow. Larger rivers (in terms of magnitude of streamflow) tend to require relatively larger absolute additional routed runoff threshold value above the P95 to reach the fullbank status than smaller rivers. Similar to P95, σ usually increases as the mean increases with a distributed spatial pattern generally consistent with the natural river network with increasing value along river flow path (not shown). Therefore, although σ has a very high correlation coefficient to P95 from the global statistics, we used σ locally at each grid cell to form the


FIG. 2. The difference of thresholds derived by methods 3 and 1 (method 3 − method 1).
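
For the method 1 side of the comparison in Fig. 2, the LP3 magnitude for one grid cell can be sketched directly from Eqs. (1)–(6). The Python code below is a minimal illustration under stated assumptions: the sample estimators (n − 1 standard deviation and the usual bias-corrected sample skew) are assumed, since the text only says the moments are calculated from the log-transformed annual maxima; the conversion between routed runoff and discharge is omitted; and the function names and example values are hypothetical. The sketch is restricted to T ≥ 2 so that Eq. (5) is applied only for p ≤ 0.5.

```python
import math


def frequency_factor(T: float, cs: float) -> float:
    """Frequency factor K_T from Eqs. (2)-(6) for return period T (years)."""
    if T < 2.0:
        raise ValueError("sketch covers T >= 2 (p <= 0.5) only")
    p = 1.0 / T                                    # Eq. (6)
    w = math.sqrt(math.log(1.0 / p ** 2))          # Eq. (5)
    z = w - (2.515517 + 0.802853 * w + 0.010328 * w ** 2) / (
        1.0 + 1.432788 * w + 0.189269 * w ** 2 + 0.001308 * w ** 3
    )                                              # Eq. (4)
    k = cs / 6.0                                   # Eq. (3)
    return (z + (z ** 2 - 1.0) * k + (z ** 3 - 6.0 * z) * k ** 2 / 3.0
            - (z ** 2 - 1.0) * k ** 3 + z * k ** 4 + k ** 5 / 3.0)  # Eq. (2)


def lp3_magnitude(annual_max, T: float = 2.0) -> float:
    """Back-transformed LP3 magnitude for return period T from annual maxima.

    Needs at least three positive values that are not all identical.
    """
    logs = [math.log10(q) for q in annual_max]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / (n - 1))  # assumed estimator
    cs = n * sum((x - mu) ** 3 for x in logs) / ((n - 1) * (n - 2) * sigma ** 3)
    x_t = mu + frequency_factor(T, cs) * sigma     # Eq. (1)
    return 10.0 ** x_t


# Made-up annual maxima whose logs are symmetric, so the skew is zero.
two_year = lp3_magnitude([10.0, 100.0, 1000.0], T=2.0)
print(two_year)
```

With zero skew and T = 2, the frequency factor is close to zero, so the 2-yr magnitude sits near the geometric mean of the annual maxima, consistent with the lognormal limit mentioned in the text.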

additional threshold in method 3. Because σ is too small for some rivers with low streamflow (e.g., rivers in up basins or arid areas) or low seasonality (i.e., with a flatter monthly hydrograph), an upstream flow accumulation area (or upstream basin area, referred to as FAC, km²) dependent additional threshold (u) was also added in method 3. Because it is very difficult to derive an appropriate analytic relation between the additional thresholds and the FACs, arbitrary values for three FAC bands (Table 1) were adopted to define the u in method 3 by which subjectively a reasonable number of flood events were defined as compared to the DFO flood database (2006–10). Thresholds directly using the 98th percentile routed runoff value as used by Yilmaz et al. (2010) were also investigated in this study and this higher percentile method is referred to as method 4 (Table 1).

The four methods were employed to define flood occurrence from the simulated results for each grid cell. A grid cell is determined as flooding at a time interval when the routed runoff for this time is greater than the threshold at the grid cell defined by the method in question. The thresholds derived by each method (Table 1) are spatially distributed, with method 1 having the highest spatial variability while methods 2 and 4 have the lowest. The differences between thresholds derived by the methods are mainly reflected in up–low basin areas and wet–dry areas. There are large spatial variations in various thresholds, while there is no method that consistently produces the largest or least threshold across the study domain. Figure 2 shows differences between the thresholds defined by methods 1 and 3 and indicates that method 3 generally has higher thresholds for wet areas and lower ones for dry areas. In most of the study domain, the difference in thresholds between methods 1 and 3 ranges from −50 to +50 mm (green and yellow in Fig. 2). However, in stem rivers of the Amazon basin and the Nile basin, method 1 derives larger thresholds than method 3 and the difference tends to be larger toward the river mouth. In many downstream areas of basins, the threshold values from methods 1 and 3 are much larger than those of method 2. There are differences among the finally determined threshold values of the methods because they deviate from the exact "fullbank" or 2-yr return period when tuned (e.g., the additional threshold in percentile-based methods) against the flood database to obtain better detection performance of the system. However, instead of adjusting a single method, we explored the four methods to see the sensitivity of the flood detection results to the differences among the thresholding methods. However, the evaluation of these methods is not the primary focus of this paper.

c. Flood matching between simulated and archived databases

Although estimated flood events can be calculated for each grid cell from the retrospective simulated results according to the method in section 2b and there are locations (latitude and longitude) reported in both flood event databases, matching the flood events between these simulated and reported events based on a single grid cell is not appropriate. Both flood databases consist mainly of news reports and the assigned locations and days of the reported floods are not always accurate (Yilmaz et al. 2010). To make the evaluation more meaningful, we further developed the flood event identification method by Yilmaz et al. (2010), who used a 2.25° × 2.25° moving spatial window based on the reported flood location and a 1-day (±24 h) buffer surrounding the reported flood duration for matching the simulated and reported flood events. For this study, a spatial window (yellow area in Fig. 3) was defined for matching a simulated flood to


a reported flood according to the reported flood location and drainage network. The spatial window was defined to be composed of all grid cells in the upstream drainage area within a limited flow distance (i.e., ~200 km) according to the reported location (red dot in Fig. 3). We also extended the spatial window definition by including the grid cells in the downstream stem river of the basin/subbasin below the reported location within a limited distance (i.e., ~100 km). In some cases the reported locations of floods are not located in rivers (i.e., with FAC < 2), and for these cases we moved the reported location downstream along the flow path a distance of two grid cells (~30 km) within the river basin. On the simulation side, we mark the entire area defined by the spatial window described above as simulated flooding when there are more than three grid cells flooding (according to the method in section 2b) within the spatial window for two continuous 3-h time intervals. The advantage of the spatial window definition is that the flood matching can be constrained in the same basin; that is, the simulated (reported) floods in neighboring basins and subbasins will not be incorrectly matched to the flood event reported (simulated) in the basin of interest. We assume the reported flood locations are located in the correct basin, even though they may not be recorded with precisely correct latitude and longitude coordinates. If a flood is reported at a location in a stem river just downstream of a confluence while the flood actually occurred in the subbasin just upstream of the confluence, the flood identification algorithm we developed will check the upstream drainage area within a distance according to the reported location that contains the subbasin where the flood actually happened. In the other situation, if a flood is reported at a location within a subbasin just upstream from a confluence, while the flood actually occurred in the stem river where the confluence is, the flood identification algorithm will not miss the match because it also checks the river segments downstream to the reported location for an extended 100 km. Therefore, the algorithm will check the stem river where the flood actually happened.

FIG. 3. Definition of spatial window for matching between simulated and reported flood events.

3. Results and discussion

Using the four flood definition methods, simulated floods for each 3-h time interval were derived globally and compared to the flood inventory data. Subjective evaluation of the results indicates that the model results often capture flood occurrence and general flood evolution reasonably well, responding to rainfall events with the start, development, and recession of flooding along the drainage networks (http://trmm.gsfc.nasa.gov/). Statistical results of the evaluation are presented in the following sections. To quantitatively evaluate the GFMS performance in flood event detection, we calculated three classic categorical verification metrics, namely probability of detection [POD; a/(a + c)], false-alarm ratio [FAR; b/(a + b)], and critical success index [CSI; a/(a + b + c)], based on a 2 × 2 contingency table (a = GFMS yes, reported yes; b = GFMS yes, reported no; c = GFMS no, reported yes; d = GFMS no, reported no).

a. Model flood detection performance

An algorithm was developed to search the flood events in the simulated results according to the thresholds and method discussed in sections 2b and 2c to attempt to match with reported flood events in the flood databases. We determine that a reported flood event is hit by the GFMS if a reported flood event can be found in the simulated results within the spatial–temporal window associated with the reported flood event. The global PODs were calculated using the four flood definition methods for the two global flood databases separately and combined (Table 2). The calculation of POD by each method used the same flood event matching rules except for the four different threshold values. Results in Table 2 indicate that the POD values are basically independent of which reported flood inventory is used and therefore the two databases can be combined for the overall evaluation. Figure 4 shows the global flood events detected by the GFMS using method 3 (with a POD of 0.59) during 1998–2010 against the combined flood database, which indicates a reasonable geographic distribution and overlap of simulated and reported floods.

The reported and estimated number of floods decreases as a function of flood duration (Fig. 5a) and the POD of the GFMS increases with longer duration floods (Fig. 5b), with a gradual increase to an asymptote at 10–20 days, depending on the method used in identifying floods in the simulations. In the combined flood database,

there are 1032 (35% of total 2949) short-term floods with flood duration ≤3 days. The GFMS has difficulty detecting these short-term floods, with PODs of 0.38, 0.26, 0.42, and 0.79 (Table 2) for methods 1, 2, 3, and 4, respectively. However, the PODs increase to 0.77, 0.67, 0.78, and 0.95 (Table 2) for all floods with reported duration >3 days. The POD model performance for flood detection also steadily increases as the flood-affected area increases (Fig. 6). These relations are almost certainly related to the limitations in the satellite rainfall data. The TMPA has a 3-h time resolution and 0.25° spatial resolution, and these resolutions will certainly limit the definition of small-scale rain events. However, random sampling errors will decrease with spatial and temporal averaging, and this tends to translate into better hydrologic model and flood calculations for larger (and longer) events. Larger floods also have a higher possibility of meeting the flood definition and the matching rules discussed in sections 2b and 2c, with longer durations (larger temporal window) and more affected grid cells (more potential flooding individual grid cells). Larger floods are relatively easier to detect, while thresholds defined by the different methods may have difficulty detecting smaller floods. Both Fig. 5 and Fig. 6 showed relatively larger differences of the POD performance between the methods (especially method 2 and other methods) for short-term floods or smaller affected areas, while the difference steadily decreases as the flood scale increases with longer duration or larger affected area. There are 35% (934 out of total 2672) of flood events in the GFI flood database that are reported as short-term (duration ≤ 3 days) floods, compared to 25% (231 out of total 929) in the DFO flood database. This leads to consistently higher PODs for DFO as compared to the GFI database in Table 2.

TABLE 2. The POD performance by the four methods based on global statistics.

Flood database   Time                 No.    M1     M2     M3     M4
GFI              1998–2008            2672   0.55   0.42   0.57   0.90
DFO              2006–2010            929    0.60   0.44   0.62   0.96
Combined         1998–2010            2949   0.56   0.42   0.59   0.90
Combined         Duration ≤ 3 days           0.38   0.26   0.42   0.79
Combined         Duration > 3 days           0.77   0.67   0.78   0.95

There is a relatively sharp peak in the number (i.e., 156) of reported floods with duration of 15 days (Fig. 5a) in the combined flood database. However, this 15-day flood event peak only appears in the GFI flood database. As discussed in section 2c, to calculate the POD, the simulated floods are searched to match the record only when a flood is recorded in the flood event database. Therefore, the number of simulated floods for 15-day floods is found to increase for each method in Fig. 5a. However, the POD performance decreases for all four methods for that specific duration of flood. The reason for the peak in flood events of this duration is not known, but may be related to human estimates tending to peak at 2 weeks (14 days) or one-half of a month. Similarly, the DFO flood database showed relatively more floods reported in some flood-affected-area ranges; for example, 131 out of total 929 flood events have an affected area of 200 000–300 000 km² (Fig. 6a). The reason for this is also unknown, but may be related to the analysts preferentially picking a certain size of event.

Using the combined 13-yr flood database, a relatively large range of POD values from 0.42 to 0.90 is noted

FIG. 4. Global flood events detected by the GFMS using method 3 during 1998–2010 against
combined flood database. The dark balls are reported flood events in the database. When the
model successfully hits a reported flood event, the dark ball turns to gray. The gray shaded part
of the map is the TRMM-based study domain.
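
The simulation-side rule used for the hits mapped in Fig. 4 (a spatial window is marked as flooding when enough grid cells exceed their threshold for two consecutive 3-h steps) can be sketched as follows. This is a minimal illustration, not the GFMS code: the function name, the boolean input layout, and the parameter names are assumptions. The cell count is left as a parameter because section 3a states "more than three" cells while section 3b states "at least three".

```python
from typing import List, Sequence


def window_flood_flags(flooded: Sequence[Sequence[bool]],
                       min_cells: int = 4,
                       min_steps: int = 2) -> List[bool]:
    """Flag the time steps at which a spatial window counts as flooding.

    flooded[t][c] is True when grid cell c of the window exceeds its
    flood threshold at 3-hourly step t (hypothetical input layout).
    A step is flagged when it belongs to a run of at least `min_steps`
    consecutive steps that each have at least `min_cells` flooding cells.
    min_cells=4 encodes "more than three grid cells".
    """
    enough = [sum(1 for f in step if f) >= min_cells for step in flooded]
    flags = [False] * len(enough)
    run_start = None
    # A trailing False acts as a sentinel that closes a run at the end.
    for t, ok in enumerate(enough + [False]):
        if ok and run_start is None:
            run_start = t
        elif not ok and run_start is not None:
            if t - run_start >= min_steps:
                for k in range(run_start, t):
                    flags[k] = True
            run_start = None
    return flags


# Five 3-hourly steps for a window of five cells (toy data).
example = [
    [True, True, True, True, False],
    [True, True, True, True, False],
    [True, True, True, False, False],
    [True, True, True, True, True],
    [False, False, False, False, False],
]
print(window_flood_flags(example))
```

Isolated single steps are dropped by the run-length test, which is one way a persistence rule like this can suppress very brief exceedances.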

(Table 2), with the values increasing and the range of values narrowing somewhat when only floods of greater than 3 days are included. However, a complete evaluation must include other statistics (e.g., FAR).

FIG. 5. The GFMS performance of flood detection in terms of flood duration against the combined flood database using the four flood definition methods.

FIG. 6. The GFMS performance of flood detection in terms of affected area against DFO flood database using the four flood definition methods.

b. Model false-alarm performance

The false alarms in the predictions are of equal importance as the successful model hits, as they determine the flood forecast reliability and efficiency. A higher POD performance can be achieved by using lower thresholds or larger temporal and spatial windows to match the simulated and reported flood events. However, for a specific flood definition method, high POD usually comes with a larger number of false alarms.

Although the same four methods were used to define flood thresholds, the algorithm used to evaluate false alarms is different from the one for POD calculation, because FAR cannot be calculated straightforwardly like the calculation of POD by directly searching for flood events in the three-dimensional (latitude, longitude, and time) simulated results according to each reported flood event. To derive the FAR, flood events had to be identified first in the simulated results, and then those identified simulated flood events were used to compare with reported flood events. Many floods not only occur in local subbasins but also move downstream along river networks, creating a larger affected area within the entire river basin, while the location of a reported flood is a specific point with only a latitude and longitude coordinate available in the flood databases. The reported location for a flood event may not be precise; for example, the reported location could be adjacent to the actual place where the flood actually happened, according to the news report. Given a specific subbasin or local river reach, the reported flood events also have errors in assigned times. Furthermore, floods are likely underreported in both the GFI and DFO archives, because floods tend to be reported in high-population areas while underreported in remote areas, for example, the Amazon basin (Fig. 7e). In addition, larger floods causing more damage tend to be reported, while smaller floods tend to be missed. Therefore, in order to evaluate the model performance in FAR as objectively as possible, we calculated the FAR by comparing the simulated flood events to reported floods in 53 selected well-reported areas (WRA) over the globe (yellow areas in Fig. 7). The WRA are defined according to the 13-yr combined flood database by the following procedure: 1) the same method for definition of the spatial window (section 2c) was applied to each reported flood location in the combined flood database; 2) if there are multiple reported flood events in the spatial window, the reported location with the largest upstream drainage area was selected and used to define a new spatial window; and 3) if there are more than six flood events reported during the 13 years in the new spatial window, we determine the new spatial window as a WRA. All the WRAs are located in wet and/or high-population regions (Fig. 7). A large proportion of the well-reported areas are located in South Asia (Fig. 7c) and Africa (Fig. 7b). The numbers of well-reported areas for each continent are 24 (Asia), 16 (Africa), 6 (North America), 4 (Europe), and 3 (South

FIG. 7. The spatial distribution of the 53 well-reported areas (according to the combined flood database) over the TRMM global domain,
with 5 regions selected to zoom in. The background image is the mean annual runoff (precipitation minus evapotranspiration) from
NASA’s Modern-Era Retrospective Analysis for Research and Applications (MERRA) reanalysis data for the satellite era (Bosilovich
et al. 2006).
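
The false-alarm bookkeeping that section 3b describes for these well-reported areas can be sketched as below: flagged time steps are grouped into simulated events (neighboring events count as independent only when they are 2 days apart), and each event is then compared with the reported floods by time overlap. This is a simplified sketch under stated assumptions. The function names and the (start, end) interval representation in days are hypothetical, each reported flood is counted as a hit at most once (the paper records every match for long reported floods), and the 15-day splitting rule is omitted because the text notes it was never triggered.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[float, float]  # (start_day, end_day), inclusive


def merge_into_events(flagged_times: Iterable[float],
                      min_gap_days: float = 2.0) -> List[Interval]:
    """Group flagged model times (in days) into simulated flood events.

    A new event starts only when the gap to the previous flagged time
    is at least `min_gap_days`; shorter gaps extend the current event.
    """
    events: List[Interval] = []
    for t in sorted(flagged_times):
        if events and t - events[-1][1] < min_gap_days:
            events[-1] = (events[-1][0], t)
        else:
            events.append((t, t))
    return events


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two closed time intervals share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]


def classify(simulated: List[Interval],
             reported: List[Interval]) -> Tuple[int, int, int]:
    """Return (hits, misses, false_alarms) for one well-reported area."""
    hits = sum(1 for r in reported
               if any(overlaps(r, s) for s in simulated))
    misses = len(reported) - hits
    false_alarms = sum(1 for s in simulated
                       if not any(overlaps(s, r) for r in reported))
    return hits, misses, false_alarms


# Toy example: flagged 3-hourly steps expressed in days since a start date.
sim_events = merge_into_events([0.0, 0.125, 0.25, 1.0, 5.0, 5.125, 20.0])
rep_events = [(0.5, 3.0), (30.0, 32.0)]
print(sim_events, classify(sim_events, rep_events))
```

Because every WRA is a single spatial window, overlap in space is implicit here and only overlap in time is tested.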

America). There are a total of 490 flood events reported for the 53 WRAs from 1998 to 2010. The number of reported flood events ranges from 6 to 25 with a mean value of 9.

To calculate the FAR, the simulated floods were identified by checking each grid cell in a selected WRA for every modeling time step. A simulated flood event is identified for each time step for which there are at least three grid cells flooding at each time step (same as the POD calculation in section 3a). Then, if the flood duration of a simulated flood event overlaps with the duration of a reported flood in the same selected WRA, we determine the reported flood is successfully detected by the GFMS. All simulated floods having no overlap in time and space with any reported floods are regarded as false alarms. By this method the number of hits, misses, and false alarms and simulated flood durations were derived for each WRA. When a reported flood had a long duration and it was hit by the model-based result multiple times, each match is recorded. However, two neighboring (in time) simulated events were considered independent events only when they were 2 days apart. When a simulated flood event had a long time period and overlapped with more than one reported flood, it was simply divided into events by 15-day periods. However, this type of case did not happen in this evaluation.

All three verification metrics by all methods vary from one WRA to another because of the small number of cases (6–25) in each area (Figs. 8 and 9). The mean PODs over the 53 WRAs are 64%, 54%, 70%, and 89% by methods 1, 2, 3, and 4, respectively, based on the combined flood database (Table 3). The PODs by methods using absolute magnitude thresholds (i.e., methods 1, 2, and 3) from the WRAs are higher than those for the whole globe (Table 2), because the WRAs are mostly located in wet areas, where the flood identification is relatively

easier than in dry regions. In arid or up-basin areas where the routed runoff is smaller, the additional threshold (u) in methods 2 and 3 tends to reduce the identification of floods in these areas, while method 4, which uses a relative rank, was not affected significantly.

FIG. 8. The GFMS flood detection performance against the combined flood database for floods with all durations (≥1 day) over the 53 well-reported areas. The WRAs with identification from 1 to 28 (left of the vertical dashed line) have no dams and the WRAs with identification >28 (right of the vertical dashed line) have dams.

FIG. 9. As in Fig. 8, but for longer-term floods (duration > 3 days).

There are a number of factors that can lead to false alarms in the model results, including errors in the precipitation estimation, impacts of flow control structures (e.g., dams and levees), missing reports, limits in the flood definition methods, and errors in the hydrologic model. As all these factors are probably contributing, the FAR statistics appear poor at first glance. The mean FARs over the 53 WRAs for all floods with duration ≥ 1 day are 87%, 89%, 93%, and 95% by methods 1, 2, 3, and 4, respectively (Table 3), with only a few areas showing lower FAR values (Fig. 8b). However, 35% of floods in the reported flood databases are short-term floods (172/490 floods with duration ≤ 3 days over the 53 WRAs). When these short-term floods are removed from the analysis, the GFMS has significantly better performance, with lower FARs, and also higher PODs (Fig. 9 and Table 3).

Three of the techniques have similar CSIs of 22%–23% for floods greater than 3-day durations. Method 4 (the 98th percentile method) has a very high POD (95%) for the longer floods, with a FAR of 78%. The greatly

TABLE 3. Flood detection verification against the combined flood database over the 53 well-reported areas by the four methods.

Metrics   M1     M2     M3     M4

Metrics averaged over total 53 WRAs for all floods with duration ≥ 1 day
POD       0.64   0.54   0.70   0.89
FAR       0.87   0.89   0.93   0.95
CSI       0.12   0.08   0.07   0.05

Metrics averaged over the 53 WRAs for floods with duration ≤ 3 days
POD       0.42   0.35   0.52   0.77
FAR       0.93   0.95   0.97   0.98
CSI       0.06   0.04   0.03   0.02

Metrics averaged over the 53 WRAs for floods with duration > 3 days
POD       0.74   0.60   0.78   0.95
FAR       0.70   0.80   0.74   0.78
CSI       0.23   0.16   0.23   0.22
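
For reference, the three scores in Table 3 can be computed from per-area counts of hits, misses, and false alarms as sketched below. The sketch assumes the standard contingency-table definitions (POD = hits/(hits + misses), FAR = false alarms/(hits + false alarms), CSI = hits/(hits + misses + false alarms)); the paper's own definitions are given in an earlier section that is not reproduced here. The function names and the dictionary layout are illustrative only.

```python
import math
from typing import Dict, Iterable, Tuple


def _ratio(num: float, den: float) -> float:
    """Safe division: NaN when the score is undefined (zero denominator)."""
    return num / den if den > 0 else float("nan")


def verification_metrics(hits: int, misses: int,
                         false_alarms: int) -> Dict[str, float]:
    """POD, FAR and CSI for one area from its contingency counts."""
    return {
        "POD": _ratio(hits, hits + misses),
        "FAR": _ratio(false_alarms, hits + false_alarms),
        "CSI": _ratio(hits, hits + misses + false_alarms),
    }


def mean_over_areas(counts: Iterable[Tuple[int, int, int]]) -> Dict[str, float]:
    """Average each score over areas, skipping areas where it is undefined."""
    per_area = [verification_metrics(*c) for c in counts]
    means = {}
    for name in ("POD", "FAR", "CSI"):
        vals = [m[name] for m in per_area if not math.isnan(m[name])]
        means[name] = _ratio(sum(vals), len(vals))
    return means


# Toy counts (hits, misses, false alarms) for two hypothetical areas.
print(verification_metrics(7, 3, 14))
print(mean_over_areas([(7, 3, 14), (1, 1, 0)]))
```

Averaging the per-area scores, rather than pooling the counts first, mirrors the "metrics averaged over the 53 WRAs" wording of the table; the two choices generally give different numbers.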

TABLE 4. Flood detection verification against the combined flood database over the 28 WRAs without dams and the 25 WRAs with dams by the four methods.

Metrics   M1     M2     M3     M4

Metrics averaged over the 25 WRAs with dams for floods with duration ≤ 3 days
POD       0.51   0.40   0.54   0.79
FAR       0.96   0.97   0.97   0.98
CSI       0.04   0.02   0.02   0.01

Metrics averaged over the 28 WRAs without dams for floods with duration ≤ 3 days
POD       0.35   0.31   0.50   0.75
FAR       0.91   0.93   0.96   0.97
CSI       0.07   0.05   0.04   0.03

Metrics averaged over the 25 WRAs with dams for floods with duration > 3 days
POD       0.80   0.69   0.83   0.98
FAR       0.81   0.89   0.85   0.88
CSI       0.16   0.10   0.14   0.12

Metrics averaged over the 28 WRAs without dams for floods with duration > 3 days
POD       0.68   0.51   0.74   0.93
FAR       0.60   0.71   0.63   0.68
CSI       0.30   0.23   0.32   0.31
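
As a reading aid for Tables 3 and 4, the three scores are linked for any single set of counts. Assuming the standard contingency-table definitions, with H hits, M misses, and F false alarms (symbols introduced here for illustration only), CSI follows from POD and FAR:

```latex
\mathrm{POD} = \frac{H}{H+M}, \qquad
\mathrm{FAR} = \frac{F}{H+F}, \qquad
\mathrm{CSI} = \frac{H}{H+M+F}

\frac{1}{\mathrm{POD}} + \frac{1}{1-\mathrm{FAR}} - 1
  = \frac{H+M}{H} + \frac{H+F}{H} - 1
  = \frac{H+M+F}{H}
  = \frac{1}{\mathrm{CSI}}

% Worked check, method 1, areas without dams, duration > 3 days:
% POD = 0.68, FAR = 0.60
\frac{1}{0.68} + \frac{1}{0.40} - 1 \approx 2.97
\quad\Rightarrow\quad \mathrm{CSI} \approx 0.34
```

The tabulated value for that row is 0.30 rather than 0.34. A gap of this kind is to be expected, because the table entries are averages of per-WRA scores and the identity holds only for one set of counts at a time.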

increased POD and decreased FAR values for longer-term flood detection indicate that the GFMS is more reliable for larger-scale floods, which is not surprising considering the resolutions of the precipitation data and the hydrologic model. Uncertainties in the data (especially the rainfall) and the model may produce noisiness in the model flood identification, leading to a large number of small-scale false alarms. For example, among the 5759 simulated floods identified for the 53 WRAs by method 3, 73% are short-term floods (<3 days), so that many of the false alarms are associated with small-scale events. Part of the reason for the high false alarms for short-term floods may be related to a greater likelihood for missed reports for smaller events.

c. Impact of dams on false-alarm validation statistics

In Fig. 9, which is for floods lasting greater than 3 days, the distribution of FAR values (Fig. 9b) is very different than the distribution for all floods in Fig. 8b. This variation in FAR among the WRAs is related to the presence of dams, and becomes clearest when the short-term floods are ignored in the analysis. A global large dam database (Vörösmarty et al. 1997, 2003; http://wwdrii.sr.unh.edu/download.html) was employed to investigate the dam effects on the false alarm over the 53 WRAs. To investigate the effects of dams on the false-alarm statistics, we divided the 53 WRAs into two groups. The first group consists of the 28 WRAs with no dams (left of the vertical dashed lines in Figs. 8 and 9), referred to as the no-dam group. The second group consists of the 25 WRAs with dams (right of the vertical dashed lines in Figs. 8 and 9), referred to as the dam group. One can see immediately in Fig. 9b that the lower FAR values tend to be associated with the WRAs in the no-dam group, whereas the areas with dams tend to have high FAR values, indicating a clear relation between the presence of large dams and the false-alarm statistics. The FAC values of the two groups vary over a similar range, which also indicates that the comparison is valid. Table 4 shows the flood detection verification metrics based on short-term and long-term floods derived for the no-dam group and dam group separately by averaging over the WRAs in the two groups. For the short-term floods (top half of table) there is only a slight difference between the dam and no-dam groups. For longer-term floods (>3 days; bottom half of Table 4) the FAR values are much lower on average for the no-dam group, although the POD values are somewhat lower also. The resulting CSI values are generally higher for the no-dam areas. The higher PODs in the dam group may reflect that larger floods are relatively easier to detect for the GFMS and the reported floods in the dam group may be relatively larger (because of the presence of dams).

The results in Table 4 indicate that the GFMS has better performance in areas without dams, which is as expected since the hydrologic model does not include a reservoir module to represent dam operations. The GFMS shows very good performance in detecting floods with duration > 3 days in nondam situations, with relatively high POD and low FAR leading to relatively higher CSI (Table 4). This result indicates that dam effects on the GFMS flood detection ability highly depend on the flood scale; that is, dams prevent many small simulated floods from actually occurring and being reported, leading to lower GFMS performance metrics, while the statistics are better for longer duration events, even for WRAs with dams, because the larger rainfall events can still produce actual floods, even in areas with dams, though the dams

FIG. 10. The accumulated flood duration changes with upstream basin area in natural (by model) and regulated
(reported) scenarios.
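
The quantities plotted in Fig. 10 can be illustrated with a small sketch: accumulated flood duration (AFD) per grid cell, the per-cell percentile threshold used by method 4, and the averaging of AFD within FAC bands. All names, the nearest-rank percentile rule, and the toy series are assumptions made for illustration; the paper does not state how its percentiles were computed. The toy data show why a uniform percentile yields a spatially uniform AFD: every cell is flagged for the same fraction of time regardless of its flow magnitude.

```python
from typing import List, Optional, Sequence, Tuple

STEP_DAYS = 3.0 / 24.0  # one 3-hourly model step, in days


def percentile_threshold(series: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of one cell's time series (pct in 1..100)."""
    ordered = sorted(series)
    rank = max(1, -(-pct * len(ordered) // 100))  # integer ceiling
    return ordered[rank - 1]


def accumulated_flood_duration(series: Sequence[float], threshold: float,
                               step_days: float = STEP_DAYS) -> float:
    """Total time (days) during which the series exceeds the threshold."""
    return sum(1 for v in series if v > threshold) * step_days


def mean_afd_by_fac_band(cells: Sequence[Tuple[float, float]],
                         edges: Sequence[float]) -> List[Optional[float]]:
    """Average AFD of the cells whose FAC falls in each [lo, hi) band.

    `cells` holds (fac, afd) pairs; empty bands give None.
    """
    means: List[Optional[float]] = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        vals = [afd for fac, afd in cells if lo <= fac < hi]
        means.append(sum(vals) / len(vals) if vals else None)
    return means


# Toy series: the same 1000-step pattern at two very different magnitudes.
base = [(i * 37) % 1000 for i in range(1000)]  # a permutation of 0..999
upstream = [1.0 * v for v in base]
downstream = [50.0 * v for v in base]

afd_up = accumulated_flood_duration(upstream, percentile_threshold(upstream, 98))
afd_down = accumulated_flood_duration(downstream, percentile_threshold(downstream, 98))
print(afd_up, afd_down)  # identical: 2% of the steps in both cells
```

With a threshold that also carries an absolute magnitude, as in methods 1 to 3, the flagged fraction would differ between the two toy cells, which is the behavior the FAC-dependent AFD curves reflect.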

would likely decrease flood peaks and damages. The GFMS, using relatively coarse-resolution rainfall information and hydrologic modeling, and without any method to take into account the effect of dams, should be expected to have reasonable statistical results for events of at least a few days’ duration in areas not affected by large dams. Results summarized in Table 4 (fourth panel) indicate that this is the case. For flood duration greater than 3 days in areas without large dams the POD is ~0.7, the FAR is ~0.6, and the CSI is ~0.3. These are good results for this stage of GFMS development.

d. Flood duration statistics

Accumulated flood duration (AFD) during the 13 TRMM-era years was calculated for each grid cell from the simulated results for each flood definition method. The simulated (natural, no dams) and reported (regulated, including basins with dams) AFD histograms based on FAC were derived respectively from the simulated results and the 2949 reported flood events in the combined flood database. The AFD in each histogram column was calculated as the average of the AFDs from all grid cells with their FAC values falling into the FAC band indicated. Natural floods progress from upstream to downstream along a drainage network and thus increase the flood duration in lower parts of river basins. By methods 1, 2, and 3, the simulated AFD generally increases downstream along the drainage network with a similar spatial pattern to FAC and basin drainage network (gray in Figs. 10a–c). This indicates that the GFMS generally maintains the natural spatial pattern of AFD with the hydrologic model in which only natural processes are considered. Of course, the current routing scheme does not consider the presence of dams. However, this type of AFD curve was not generated by method 4, which uses the 98th percentile uniformly for each grid cell and derives a spatially uniform AFD (gray in Fig. 10d). The uniform 98th percentile applied to each grid cell determines the same number of time intervals (2% of the time) flooding for each grid cell, resulting in the uniform AFD. Independently defining floods for each single grid cell solely using a uniform percentile threshold value cannot take into account the fact that floods in upstream rivers add more flood risk in downstream areas

as floods propagate along the drainage network. However, percentile plus an additional threshold (e.g., 30 mm by method 2) defines floods not only in a relative manner, but also by an absolute threshold. In this way floods are defined only when the runoff is accumulated with a large enough magnitude, which significantly reduces the number of flood identifications in upstream basins and in relatively drier areas, leading to, on average, larger AFD values in areas with larger FAC magnitudes.

From Fig. 10, method 1 derives relatively smaller AFD in up basins and method 4 derives the most, probably mostly contributed by short-term floods. Simulated AFD by method 2 increases relatively more steadily toward downstream than that of the other methods, while in downstream areas it is higher than that of the other methods, which are relatively closer to the AFD magnitudes based on the reported floods. This is consistent with the comparisons between verification metrics (e.g., Table 3), which indicate that methods 1, 3, and 4 have closer and better performance for long-term floods (tending to occur in downstream basins) than method 2. The 30-mm additional threshold used in method 2 is too large for representing bank-full status in up-basin areas and too small for many downstream basins (Fig. 10b), but it generally captures the spatial pattern of flood duration for natural scenarios. As floods are probably largely underreported, it is difficult to draw a strong conclusion on which method derives the closest AFD to reality. However, if reliable reported flood duration information is available, the relation between AFD and FAC (Fig. 10) can provide a useful reference to find more reasonable thresholds for flood definition.

The AFD is well related to FAC based on the global statistics from the simulated results by all the methods, except method 4. Unlike the simulated results (except by method 4), there is no strong increase in the reported AFD as FAC increases and the variability range in reported AFD is much larger than the simulated. This could be partly caused by the bias in the reported flood duration. However, the good relation between the AFD and FAC may exist only in a natural scenario. When dams stop floods and change the flood duration, the AFD spatial pattern could be changed. If the bias in the reported flood duration does not significantly change the relation between the AFD and FAC in reality, Fig. 10 may cast another hint of dam impacts on floods, leading to more false alarms. Dams and artificial structures decrease the possibility of flooding in their downstream areas, while they might also increase the possibility of flooding in upstream areas; thus flood duration in upstream (downstream) basins might also increase (decrease). However, dam effects are difficult to quantify without a reservoir module in the hydrologic model.

e. Duration of false alarms

POD and FAR statistics represent how well the technique detects individual flood events, no matter what the durations of the actual and estimated floods. Another measure of the quality and usefulness of the model-based flood estimates in this study is the mean duration of the false alarms. Therefore, as a further evaluation of the overall model-based technique, the four methods were evaluated in terms of the lengths of the false alarms. The evaluation showed that method 1 has the longest average false-alarm flood duration (9.7 days) based on all the simulated floods with durations >3 days from all WRAs, while methods 3 and 4 derived the least (6.4 days). On average, based on floods with durations >3 days, the false flood duration for each WRA per year is 22.8, 19.9, 14.4, and 20.5 days, while the average number of false alarms for each WRA per year is 2.3, 2.6, 2.2, and 3.2 by methods 1, 2, 3, and 4, respectively. For short-term floods, all methods showed similar average flood duration of 1.5–1.7 days per event. Although there needs to be additional analysis in this area, the different flood identification thresholding approaches used in the four methods all give reasonable results. However, the type of approach used for method 3, taking into account basin size through the FAC parameter and the seasonal and interannual variability through the use of the flood variance parameter (s), seems to provide the best framework for future work in this area.

4. Conclusions

This paper describes an evaluation of a new version of a global flood monitoring system (GFMS) using an improved hydrologic model (the CREST model) driven by Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA) rainfall. The new GFMS was quantitatively evaluated on flood event detection during the TRMM era (1998–2010) based on a global retrospective simulation (3-hourly and 1/8° spatial resolution) using the satellite rainfall for that period. Four methods were explored to define flood thresholds from the simulated results to compare against the flood events in the reported archives, including three statistical percentile-based methods and a log Pearson type-III flood frequency curve–based flood definition. The GFMS performance was evaluated with regard to flood occurrence using three classic categorical verification metrics (POD, FAR, and CSI). Balanced POD and FAR results are necessary for this type of system to be useful in applications. Flood duration statistics as related to false-alarm rates were also examined in relation to the utility of the simulated results.

In this study, flood matching rules (e.g., the spatial–temporal window) remain the same for all the methods. Therefore the differences in GFMS flood detection performance are caused by the threshold values and the spatial distribution defined by these methods. The verification metrics vary across WRAs (Figs. 8 and 9) with all the flood detection methods showing roughly similar results. The evaluation of the GFMS in this study showed two key results independent of the specific flood identification method used. First, the statistics clearly showed that the results improve with flood duration. That is, both POD and FAR improve when the evaluation is confined to longer-term floods, in this case, >3 day durations. This result is reasonable considering the time resolution of the satellite rainfall data (~3 h), the spatial resolution of the hydrologic model (1/8°), and the limitations of the flood inventory data used for comparison. The GFMS is therefore best utilized for floods of over a day or a few days’ duration and should not be expected to consistently detect shorter-term floods (e.g., floods with duration <1 day).

Second, the impact of dams can be detected in the validation statistics, with areas without dams showing a much lower FAR, as one would expect. The hydrologic model used treats the water flow in a strictly natural mode, following the terrain, without taking into account man-made structures or water management. More dams tend to result in more false alarms and false-alarm duration. However, dam effects highly depend on flood scale with more negative effects on detection for short-term floods. Global comparison of accumulated flood duration between natural (by model) and regulated (reported) flood events also indicates dam and artificial structures play important roles leading to more false alarms and false-alarm duration. Therefore, the GFMS statistics for flood durations >3 days and for areas without dams give a good estimate of the overall status of the approach at this time. The statistics vary across the four identification methods, but center around a POD of ~0.7 and a FAR of ~0.6.

The evaluation of the current system, both subjective and quantitative, indicates an improvement over the earlier, simpler system evaluated by Yilmaz et al. (2010). However, although this evaluation of the earlier flood identification technique was done in a similar manner (with an overlap time period: April 2007–July 2008), full and direct comparison is difficult because of differences in techniques used, spatial resolution, and the shorter length of record used. But, in terms of POD the current GFMS seems to have somewhat higher values (0.9 for method 4 versus 0.38 for the same threshold technique used by Yilmaz). The other flood detection methods in this study have lower PODs (~0.7) that are still higher than the earlier technique. However, FAR statistics were not calculated by Yilmaz et al., although they noted significant regions of numerous false alarms. Subjectively, the new GFMS seems to improve both the flood detection performance and the presentation of flood evolution (start, development, and recession) in the drainage network. This overall better flood detection performance in the current version of GFMS is probably due to both the hydrologic model and the flood identification algorithms. The precipitation input is identical, so it does not contribute to the difference. However, the key conclusion is that the current system performs in an understandable fashion and reasonably well against global flood event information. These important results allow us to proceed to further improvements and more detailed evaluation and validation. The new GFMS has replaced the old one and is operationally available at http://oas.gsfc.nasa.gov/CREST/global.

5. Future work

This model development and evaluation provides a pathway forward for continued improvement in the future. First, the improvements brought by the new hydrologic model encourage us to use more physically based hydrologic models to potentially achieve better flood forecasting capability and performance in future endeavors, though very likely with much higher computational cost. The NASA Land Information System (LIS; Kumar et al. 2006; Peters-Lidard et al. 2007) provides a series of state-of-the-art large-scale land surface process models and therefore gives a good opportunity for efforts in this direction. Second, to realize the potential of global flood monitoring systems, simple and robust flow routing schemes that contain minimal calibration parameters wherever possible are needed (Yilmaz et al. 2010), in addition to the a priori parameters. Although the routing scheme in the CREST hydrologic model used in this study has advantages in computing efficiency, it requires additional efforts in model regional calibration. Improved routing techniques taking into account within-cell routing will be implemented in the near future. Third, the evaluation of the effects of dams on flood detection indicates the limitations of the current GFMS in flood detection without accounting for dams and levees. Without an explicit module for representing the function of flood control by reservoir operations in the hydrologic model, the effects of the spatial distribution of dams (in upstream stem river and/or tributaries) and large reservoirs on the false alarms remain unknown. Implementation of a reservoir module in the routing scheme should also have a high priority in future work. Furthermore, the continuation and improvement of multisatellite precipitation observations through

NASA’s Global Precipitation Measurement (GPM) mission will provide the GFMS with more accurate precipitation analyses utilizing space–time interpolations and improvements for shallow, orographic rainfall systems, and snow. A more precise and detailed flood observation database is also very desirable for future evaluations. Once acceptable performance in POD and FAR is achieved, the GFMS can also be used to reconstruct historical flood events for climate variation studies. Thus, the next stage of the GFMS development will focus on precisely quantifying flood properties including flood timing, magnitude, stage, inundation depth, extent, etc.

Acknowledgments. This work was supported by NASA’s Applied Sciences Program (Michael Goodman) and NASA’s Precipitation Measurement Missions (PMM) Program (Ramesh Kakar).

and Monte Carlo assessment of the error propagation into hydrologic response. Water Resour. Res., 42, W08421, doi:10.1029/2005WR004398.
——, R. F. Adler, F. Hossain, S. Curtis, and G. J. Huffman, 2007: A first approach to global runoff simulation using satellite rainfall estimation. Water Resour. Res., 43, W08502, doi:10.1029/2006WR005739.
Huffman, G. J., and Coauthors, 2007: The TRMM Multisatellite Precipitation Analysis (TMPA): Quasi-global, multiyear, combined-sensor precipitation estimates at fine scales. J. Hydrometeor., 8, 38–55.
——, R. F. Adler, D. T. Bolvin, and E. J. Nelkin, 2009: The TRMM Multi-Satellite Precipitation Analysis (TMPA). Satellite Rainfall Applications for Surface Hydrology, M. Gebremichael and F. Hossain, Eds., Springer Verlag, 3–22.
IACWD, 1982: Guidelines for determining flood flow frequency. Interagency Advisory Committee on Water Data, Hydrology Subcommittee Bulletin 17-B (revised and corrected), 194 pp.
Kumar, S. V., and Coauthors, 2006: Land information system: An interoperable framework for high resolution land surface modeling. Environ. Modell. Software, 21, 1402–1415.
Lehner, B., K. Verdin, and A. Jarvis, 2008: New global hydrogra-
REFERENCES phy derived from spaceborne elevation data. Eos, Trans.
Adhikari, P., Y. Hong, K. R. Douglas, D. B. Kirschbaum, J. J. Amer. Geophys. Union, 89, 93–94.
Gourley, R. F. Adler, and G. R. Brakenridge, 2010: A digitized Liang, X., D. P. Lettenmaier, E. F. Wood, and S. J. Burges, 1994: A
global flood inventory (1998–2008): Compilation and pre- simple hydrologically based model of land surface water and
liminary results. Nat. Hazards, 55, 405–422, doi:10.1007/s11069- energy fluxes for general circulation models. J. Geophys. Res.,
010-9537-2. 99 (D7), 14 415–14 428.
Adler, R. F., and Coauthors, 2003: The Version-2 Global Pre- ——, E. F. Wood, and D. P. Lettenmaier, 1996: Surface soil
cipitation Climatology Project (GPCP) Monthly Precipitation moisture parameterization of the VIC-2L model: Evaluation
Analysis (1979–present). J. Hydrometeor., 4, 1147–1167. and modifications. Global Planet. Change, 13, 195–206.
Al-Sabhan, W., M. Mulligan, and G. A. Blackburn, 2003: A real- Pan, M., H. Li, and E. Wood, 2010: Assessing the skill of satellite-
time hydrological model for flood prediction using GIS and based precipitation estimates in hydrologic applications. Wa-
the WWW. Comput. Env. Urban Syst., 27, 9–32. ter Resour. Res., 46, W09535, doi:10.1029/2009WR008290.
Artan, G., H. Gadain, J. Smith, K. Asante, C. J. Bandaragoda, and Pappenberger, F., and R. Buizza, 2009: The skill of ECMWF
J. Verdin, 2007: Adequacy of satellite derived rainfall data for precipitation and temperature predictions in the Danube
streamflow modeling. Nat. Hazards, 43, 167–185, doi:10.1007/ basin as forcings of hydrological models. Wea. Forecasting,
s11069-007-9121-6. 24, 749–766.
Bosilovich, M., and Coauthors, 2006: NASA’s Modern Era Peters-Lidard, C. D., and Coauthors, 2007: High-performance Earth
Retrospective-Analysis for Research and Applications system modeling with NASA/GSFC’s Land Information Sys-
(MERRA). U.S. CLIVAR Variations, Vol. 4, No. 2, U.S. tem. Innovations Syst. Software Eng., 3, 157–165.
CLIVAR Project Office, Washington, DC, 5–8. Reed, S., J. Schaake, and Z. Zhang, 2007: A distributed hydrologic
Brakenridge, G. R., S. V. Nghiem, E. Anderson, and R. Mic, 2007: model and threshold frequency-based method for flash flood
Orbital microwave measurement of river discharge and ice forecasting at ungauged locations. J. Hydrol., 337 (3–4), 402–
status. Water Resour. Res., 43, W04405, doi:10.1029/2006WR 420, doi:10.1016/j.jhydrol.2007.02.015.
005238. Shrestha, M. S., G. A. Artan, S. R. Bajracharya, and R. R. Sharma,
Carpenter, T. M., J. A. Spersflage, K. P. Georgakakos, T. Sweeney, 2008: Using satellite-based rainfall estimates for streamflow
and D. L. Fread, 1999: National threshold runoff estimation modelling: Bagmati Basin. J. Flood Risk Manage., 1, 89–99,
utilizing GIS in support of operational flash flood warning doi:10.1111/j.1753-318X.2008.00011.x.
systems. J. Hydrol., 224, 21–44. Smith, K., and R. Ward, 1998: Floods: Physical Processes and
Chow, V. T., D. R. Maidment, and L. W. Mays, 1988: Applied Human Impacts. Wiley, 394 pp.
Hydrology. McGraw-Hill, 572 pp. Su, F. G., Y. Hong, and D. P. Lettenmaier, 2008: Evaluation of
Cloke, H. L., and F. Pappenberger, 2009: Ensemble flood fore- TRMM Multisatellite Precipitation Analysis (TMPA) and its
casting: A review. J. Hydrol., 375 (3–4), 613–626, doi:10.1016/ utility in hydrologic prediction in La Plata basin. J. Hydro-
j.jhydrol.2009.06.005. meteor., 9, 622–640.
Dutta, D., S. Herath, and K. Musiake, 2000: Flood inundation ——, H. Gao, G. J. Huffman, and D. P. Lettenmaier, 2011: Po-
simulation in a river basin using a physically based distributed tential utility of the real-time TMPA-RT precipitation esti-
hydrologic model. Hydrol. Processes, 14, 497–519. mates in streamflow prediction. J. Hydrometeor., 12, 444–455.
Hirsch, R. M., 1987: Probability plotting position formulas for flood Voisin, N., F. Pappenberger, D. P. Lettenmaier, R. Buizza, and
records with historical information. J. Hydrol., 96, 185–199. J. C. Schaake, 2011: Application of a medium-range global
Hong, Y., K. Hsu, H. Moradkhani, and S. Sorooshian, 2006: Un- hydrologic probabilistic forecast scheme to the Ohio River
certainty quantification of satellite precipitation estimation basin. Wea. Forecasting, 26, 425–446.

Unauthenticated | Downloaded 10/28/24 08:14 PM UTC


1284 JOURNAL OF HYDROMETEOROLOGY VOLUME 13

Vörösmarty, C. J., K. Sharma, B. Fekete, A. H. Copeland, Wu, H., J. S. Kimball, N. Mantua, and J. Stanford, 2011: Auto-
J. Holden, J. Marble, and J. A. Lough, 1997: The storage and mated upscaling of river networks for macroscale hydrolog-
aging of continental runoff in large reservoir systems of the ical modeling. Water Resour. Res., 47, W03517, doi:10.1029/
world. Ambio, 26, 210–219. 2009WR008871.
——, M. Meybeck, B. Fekete, K. Sharma, P. Green, and J. Syvitski, Yilmaz, K. K., R. F. Adler, Y. Tian, Y. Hong, and H. F. Pierce,
2003: Anthropogenic sediment retention: Major global-scale 2010: Evaluation of a satellite-based global flood monitoring
impact from the population of registered impoundments. system. Int. J. Remote Sens., 31, 3763–3782, doi:10.1080/
Global Planet. Change, 39, 169–190. 01431161.2010.483489.
Wang, J., and Coauthors, 2011: The Coupled Routing And Excess Zhao, R. J., and X. R. Liu, 1995: The Xinanjiang model. Computer
Storage (CREST) distributed hydrological model. Hydrol. Sci. Models of Watershed Hydrology, V. P. Singh, Ed., Water
J., 56, 84–98. Resources Publications, 215–232.
