Estimating Extreme River Discharges in Europe Through A Bayesian Network
a few studies applied them on a continental or global scale. One series of publications (Dankers and Feyen, 2008; Rojas et al., 2012; Alfieri et al., 2014 and others) presented calculations using the LISFLOOD model. The simulation was set up for Europe with a 5 km resolution. Many different datasets of rainfall amount were analysed, including historical observations and future climate simulations, deriving daily discharge data for most of the continent. Another group of studies (Ward et al., 2013; Winsemius et al., 2013) have introduced a global hydrological model GLOFRIS. This model has a much coarser resolution than LISFLOOD, as its rainfall–runoff module uses a 0.5° grid (ca. 50–80 km resolution over Europe). The aforementioned studies used the modelling results to perform an extreme value analysis of river discharges. Some also continued the research with flood-hazard estimation. The main drawback of this approach is the computational expense, which necessitates a reduction in resolution. Additionally, only a limited number of rivers are included in the models. For example, LISFLOOD-based studies used a threshold of 1000 km² catchment size, later reduced to 500 km², while GLOFRIS was prepared only for rivers with Strahler order 6 or above, which only accounts for about a third of the river length included in the aforementioned European model.

The second approach is to use statistical methods, of which a large variety exists. Several statistical models rely on the fact that catchments close to each other share many characteristics. River basins are therefore pooled into groups based on geographical proximity alone or also based on catchment size, climate data, terrain or soil type. However, the studies employing such techniques mostly covered a limited domain, typically single countries (Meigh et al., 1997; Salinas et al., 2013). The first global analysis was recently presented by Smith et al. (2015). The study applied regional frequency analysis (RFA) for all continents for the first time. Here, after clustering catchments based on size, climate type and average rainfall, a probability distribution of discharges is calculated for each region. Estimates of extreme discharges for a given ungauged catchment were derived by first assigning them to a proper region and then using data on catchment size and rainfall together with region-specific coefficients to solve a simple regression equation, in order to obtain an estimate of the mean of annual maxima of discharges in the catchment. Finally, a generalized extreme-value (GEV) probability distribution with region-specific parameters is used to calculate return periods of discharges. Flood scenarios (peak discharges) obtained through this method were then used in a global flood-hazard analysis by Sampson et al. (2015).

There are also several statistical methods that rely solely on the geographical characteristics of catchments to estimate discharges. Many of them are simple equations that can be easily applied to quickly solve practical problems in engineering, such as estimating dike heights or calculating necessary channel or culvert capacity. Moreover, they are typically only applicable in small areas for which they were prepared. Usually, they are a variation of the rational equation, which states that river discharges can be calculated by multiplying the catchment area by the rainfall intensity and runoff coefficient (Chow, 1988; Sando, 1998). The first two elements are used in virtually all methods, but the remaining element is either left out due to the difficulty of estimating it, or is derived from a model table of coefficients, or additional factors are added as proxies. For instance, Stach and Fal (1986) developed an equation to calculate 100-year discharge in catchments above 50 km² in Poland which incorporates seven factors: catchment area, extreme rainfall (100-year return period), soil type, catchment slope, river slope, lake area and marsh area. However, it also requires incorporating an additional empirical coefficient for each physio-geographic region of the country, while different return periods than the default 100 years are obtained by multiplying discharge by a region-specific factor, similar to the RFA method. Another example is the preliminary flood risk assessment in Norway (Peereboom et al., 2011), which utilized a simple regression between catchment area and 500-year water level. An envelope curve approach was then applied, in which a curve is constructed in such a manner that it contains all (or almost all) observations. This concept was long used to make crude estimations of maximum possible floods, also on a continental scale (e.g. Padi et al., 2011, applied it to Africa). Some attempts have also been made to apply multiple linear regressions on a global scale (Herold and Mouton, 2011).

This paper presents a new statistical method to calculate extreme river discharges under present and future climate in Europe. It was devised as an alternative to existing physical and statistical models; its purpose was to provide boundary conditions for hydraulic modelling that could be used in a pan-European flood-hazard analysis. The method is based on Bayesian networks (BNs) that combine probability theory and graph theory in order to build and operate a joint distribution. A BN is used to analyse and represent the dependencies between different environmental variables, including river discharges. In this paper, we present the quantification of the model based on a large dataset of river-gauge observations and pan-European spatial datasets. The model shows good performance across regions of Europe at different time periods. We also present a comparison of this new approach with other methods, both physical and statistical. Lastly, we apply it over the entire domain to obtain a large database of extreme discharges, and analyse the influence of climate change on their return periods.

An early and preliminary variant of the method was originally reported in Paprotny and Morales Nápoles (2015). The BN presented there is superseded by an improved version described herein. Also, the work is part of a bigger effort to create pan-European meteorological and hydrological hazard maps under the "Risk analysis of infrastructure networks in response to extreme weather" (RAIN) project. This influenced the choice of the domain and input data, which is explained in Sect. 2, although this does not limit the applicability of the method outside of the European domain.

Figure 1. Schematic workflow of obtaining extreme river discharges from catchment characteristics. Q_AMAX = annual maxima of discharges. Roman numerals refer to the text.

The basic elements of the procedure to derive extreme discharge estimates through a BN are presented in Fig. 1. The first step was to identify available data on annual maxima (Q_AMAX) of daily river discharge (I), and also the catchments which contribute to locations where the measurements were made (Sect. 2.2), i.e. gauged catchments (II). Then, several large-scale (pan-European or global) spatial datasets were compiled (III), providing information on the most important
variables influencing extreme river flow behaviour (Sect. 2.3) both for gauged and ungauged catchments (IV). The dependence between those variables and river discharges was analysed through copulas and BNs (Sect. 2.4) (V). After extensive testing of different configurations, an optimal model was constructed (Sect. 2.5) that had the highest performance in validation in terms of the underlying statistical model and prediction capability (VI; Sect. 2.7 and 3.1). The output of the model is annual maxima of daily river discharges (VII), which were then fitted to a probability distribution in order to obtain return periods (Sect. 2.6). After the method was ready, it was applied for all catchments (IV) in the domain to create a database of discharges (VIII). Using frequency analysis, return periods of discharges under present and future climate in Europe (Sect. 3.2) were obtained (IX). The accuracy of the BN model was also contrasted with alternate methods (X; Sect. 3.1 and 4.1).

2.2 River discharge data

Discharge data from measurement stations were collected over a domain covering most of Europe (Fig. 2). The study area includes the entire continent, plus Cyprus as a European Union (EU) member, with two exceptions. Out of the territory of the former Soviet Union, only river basins that are at least partially located within the EU were included. Also, the outlying regions of Madeira, the Azores and the Canary Islands were omitted because they are outside the EURO-CORDEX climate models' domain.

In total, data series for 1841 stations were compiled, not including a few dozen available stations whose tributaries could not be unequivocally identified and were therefore excluded from the analysis. The data were collected from five sources, as follows:

- 1186 stations from the Global Runoff Data Centre (2016)
- 82 stations from the Norwegian Water Resources and Energy Directorate (2015)
- 284 stations from the Swedish Meteorological and Hydrological Institute (2016)
- 239 stations from Centro de Estudios Hidrográficos (2012)
- 50 stations from Fal (2000)

The data collected were daily discharges observed between 1950 and 2013, though of primary interest were data up to 2005, since it was the maximum range of EURO-CORDEX climate models' historical scenario runs. All datasets were quality-checked by the providers; only a few cases of misplaced decimals in daily series were identified in the data after inspection. Daily discharges were transformed into annual maxima (Q_AMAX) for each calendar year, except for the last group of 50 stations, as Fal (2000) only reported the extreme and mean values. The total number of Q_AMAX values for the years 1950–2005 in the database was 74 757. The stations represent 37 countries and 439 different river basins (78 % of the domain's area of 5.67 million km²). However, the south-eastern part of Europe is substantially under-represented, with most stations concentrated in Scandinavia and western Europe. France has the highest number of Q_AMAX values in the database (14 %), followed closely by Spain, Sweden and the United Kingdom (UK), as can be seen in Table 1. However, the largest density of stations is in Switzerland, Austria and the UK. The catchment sizes
span from 1.4 to 807 000 km², with 43 % of them being in the 100–1000 km² range.

Figure 2. Measurement stations used in the work ("long data series" indicates stations with sufficient data for calculating return periods) and river basins included in the domain.

Long data series, i.e. at least three full decades of uninterrupted data (1951–1980, 1961–1990 or 1971–2000) were available for 1125 stations. These observations were used to validate the accuracy of the model in estimating mean Q_AMAX and return periods, while the complete database was used to quantify the BN model.

2.3 Spatial datasets

Several large-scale spatial datasets were collected for this study, even though not all of them were used in the final setup of the model. Nevertheless, all were useful for testing different configurations of the BN. The most important dataset was a map of the river network and catchments, which was derived from the pan-European CCM River and Catchment Database v2.1, or CCM2 (Vogt et al., 2007; de Jager and Vogt, 2010). It was created by calculating flow direction and accumulation on a 100 m resolution digital elevation model (DEM), combined with land-cover information, satellite imagery and national GIS databases. CCM2 was utilized to delimit the domain used in this paper. In total that area covers 831 125 river sections (almost 2 million km in length) in 70 638 river basins. Each river-gauge station was connected with a corresponding river section in CCM2. Each river section belongs to one primary catchment, whose attributes include the identifier of the next downstream catchment. Using this information, the whole tributary of a gauge station, or
any other point in the domain, could be delimited. For each catchment, various statistics were calculated in GIS. A few indicators could be derived from this dataset alone: catchment area, river network density (total river length divided by catchment area) and catchment circularity (catchment area divided by the area of a circle that has the same perimeter as the catchment), whereas others were derived using the datasets described below.

The next most relevant source of information is climate data, both historical and future projections. Two datasets for the former were analysed. E-OBS is a spatial interpolation of observations made by weather stations covering the years 1950–2015 (Haylock et al., 2008), while ERA-Interim is a complete climate reanalysis for 1979–2015 (Dee et al., 2011). However, E-OBS has gaps in spatial coverage and includes few variables, whereas ERA-Interim has a relatively coarse resolution (0.75°). In effect, slightly better performance of the model was recorded using high-resolution control runs of a climate model under the EURO-CORDEX framework (Jacob et al., 2014); the results of this analysis can be found in Supplement 2. EURO-CORDEX uses regional climate models (RCMs) for Europe, where boundary conditions are obtained from global-scale general circulation models (GCMs). In this work, we utilize simulations for the historical run (1950–2005) and two climate-change scenarios (RCP 4.5 and RCP 8.5 for 2006–2100). The necessary variables (precipitation, snowmelt and runoff) and resolution (0.11°) were included in a total of 14 model runs; of these, 8 model runs start in 1950. Of the model runs, one was made using GCM boundary conditions which came from a 12-member ensemble.

This model run, which was selected to carry out this study, was made by the Climate Limited-area Modelling Community utilizing the EC-Earth general circulation model (run by ICHEC) with the COSMO_4.8_clm17 regional climate model (Rockel et al., 2008), realization r12i1p1. This RCM also has relatively good model performance when estimating extreme precipitation in comparison with others (Kotlarski et al., 2014). No bias correction was performed, even though it is often considerable for extreme precipitation (Rojas et al., 2011). For the sake of simplicity and universality of the method, we opted to use all input data unaltered. However, as an additional check on the method's performance, a different GCM-RCM combination was analysed, and the results have been added to Supplement 2. From this dataset four variables were derived: total precipitation, snowmelt, near-surface temperature and total runoff. All data were daily values on a 0.11° rotated grid (spatial resolution of about 12 km).

Meteorological factors are the driving force behind floods, but more factors influence the runoff: terrain, land use and soils. Information on terrain was obtained from two DEMs. Most of the domain is available from EU-DEM, a dataset produced for the European Environment Agency. It was created by merging two sources of satellite altimetry data: Shuttle Radar Topography Mission (SRTM) and ASTER GDEM. It has a 25 m resolution and covers 39 countries (DHI GRAS, 2014), including areas north of 60° N, which are missing from SRTM-only datasets. For eastern Europe and some Atlantic islands which are not covered by EU-DEM, SRTM data were used instead (Farr et al., 2007). SRTM has a 3 arcsec resolution (ca. 100 m over Europe) and there are several versions available. The one used here is a void-filled derivative obtained from Viewfinder Panoramas (2014). Both datasets were resampled to a common 100 m grid matching the CCM2 dataset. The variables calculated from the DEMs included average elevation, average river slope and average catchment slope. The latter was derived either by averaging all slopes in the DEM or by calculating the slope $S$ with the following equation:

$$S = \frac{H_{\max} - H_{\min}}{\sqrt{A}}, \qquad (1)$$

where $H_{\max}$ is the maximum, and $H_{\min}$ the minimum, elevation in the catchment and $A$ is the catchment area. Another variable, the time of concentration, which is a measure of water circulation speed in the catchment, was calculated based on Gericke and Smithers (2014). Finally, we tested a terrain classification similar to one used in the FLEX-Topo hydrological model (Savenije, 2010). In this approach, all grid cells in the DEM are classified based on height above nearest drainage, slope inclination and absolute elevation (Gharari et al., 2011; Gao et al., 2014). Three classes (wetlands, hillslopes and mountains) were calculated as a percentage of total catchment area.

Land-use statistics for catchments were mainly based on CORINE Land Cover (CLC), another dataset produced by the European Environment Agency (2014a). In this study, CLC 2000 edition, version 17 (12/2013), in raster format (100 m resolution) was used. It includes 44 land-cover classes with a minimum mapping unit of 25 ha and covers 39 countries. The main source material were Landsat 7 satellite images from the years 1999–2001 (European Environment Agency, 2007). Similar to EU-DEM, the dataset does not cover some catchments in eastern Europe and in a few other areas. Missing information was supplemented using the Global Land Cover 2000 dataset, produced by the Joint Research Centre using algorithmic processing of SPOT 4 satellite images (Bartalev et al., 2003). This product has a 30 arcsec resolution and includes 22 land-cover classes. The different classifications were synchronized to derive the area covered by forests, croplands (total and irrigated), marshes, lakes, glaciers, bare land and artificial surfaces. However, the data were only available for a single year for the whole domain, even though CLC was also produced for 2006, 2012 and, in some countries, for 1990. In contrast to terrain or soils, land use is dynamic and could influence the analysis for early time periods. Yet, some historical land-use reconstructions and projections (e.g. Klein Goldewijk et al., 2011) do not have the necessary resolution or thematic coverage for use in this analysis. Therefore, fixed values of land-use percentages were used for all years, including the future climate-change scenarios.

Last but not least, soil property data were analysed. Occurrence of peat, unconsolidated and aeolian deposits, average water content, and soil texture were derived from the European Soil Database v2.0 (Panagos et al., 2012), developed on a 1 : 1 000 000 scale, and Harmonized World Soil Database v1.2 (FAO/IIASA/ISRIC/ISS-CAS/JRC, 2012), available at 30 arcsec resolution. Soil sealing (i.e. area covered by artificial impervious surfaces) was obtained from Revised Soil Sealing 2006, a dataset based on satellite imagery with a 100 m resolution (European Environment Agency, 2014b). Grain-size structure of the soil (gravel, sand, silt or clay) was calculated from the SoilGrids1km database (Hengl et al., 2014).

2.4 Bayesian networks

As noted in the introduction, BNs are graphical, probabilistic models (Pearl, 1988; Kurowicka and Cooke, 2006). They have several advantages when compared against other methods, for the application described in this paper. For one, their graphical nature makes the dependence configuration explicit, as evidenced in Fig. 3 in the next section. A BN takes into account, for example, dependencies between different environmental variables, which are not easily modelled with regression methods. Also, they can capture the often non-linear nature of those dependencies. The class of BNs used in this research includes several elements, whose specifics need to be explained before the actual hydrological model is presented.

First of all, consider a set of random variables $(X_1, X_2, \dots, X_n)$, which could be discrete, continuous or both. This distinction defines the different types of BNs. In this paper, we build a continuous BN, since our environmental data are continuous. Furthermore, discrete BNs are only efficient for small models, whose variables have a limited number of states because of the way the (conditional) probabilities are calculated, as we explain later on. The random variables are represented as nodes of the BN, while the dependencies between them are represented as arcs joining different nodes. An arc represents the (conditional) correlation between two variables, and has a defined direction. The node whose arc points in the direction of another node is known as the parent, while the node on the receiving end of the arc is its child. A set of nodes and arcs forms the eponymous network of the BN. The arcs have to connect the nodes in such a manner that the graph is acyclic, i.e. if we choose any node and strictly follow the direction of all arcs in a path, we will not end up at the same node. Each variable is conditionally independent of all its predecessors given its parents. Therefore, each variable has a conditional probability function given its parents, and the joint probability can be expressed as follows:

$$f_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} f_{X_i \mid \mathrm{Pa}(X_i)}\left(x_i \mid x_{\mathrm{Pa}(X_i)}\right), \qquad (2)$$

where $\mathrm{Pa}(X_i)$ is the set of parent nodes of $X_i$, with $i = 1, \dots, n$. Naturally, if there are no parents, $f_{X_i \mid \mathrm{Pa}(X_i)} = f_{X_i}$.

We already see that one of the purposes of BNs, perhaps the main one, is updating the probability distributions of subsets of nodes, when evidence (observations) of a different subset becomes available. Hence, it is important not only to properly set up the network with nodes and arcs, but also to choose a good method to describe the dependencies. In the case of a discrete network, this is done using
conditional probability tables. In our model, node Max discharge has 7 parents. In this case, if each continuous node was to be discretized into 5 states, a probability table with $5^8 = 390\,625$ conditional probabilities would be required. Of these, only $5^7 = 78\,125$ may be estimated by difference, as probabilities must add to 1. Thus, 312 500 probabilities would need to be specified. Similarly, if we were to discretize each node into 10 states, 90 000 000 probabilities would need to be specified. Even a discretization into 5 states for each node in our model would make the quantification prohibitive given the data available. Considering other nodes (node Buildup has 4 continuous parents) would make it even more restrictive for the use of discrete BNs. Thus, in this paper we apply a continuous non-parametric BN to avoid the use of probability tables.

Figure 3. Bayesian network for river discharges in Europe. The nodes are presented as histograms, with numbers indicating the means and standard deviations of the variables. Values on the arcs are the (conditional) rank correlation coefficients.

By using a non-parametric continuous BN, we only need to specify an empirical marginal distribution for each variable and a rank correlation for each arc (Hanea et al., 2015). We use the usual estimator of the cumulative probability distribution:

$$F(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{\{x_i \le x\}}, \qquad (3)$$

where $(x_1, \dots, x_n)$ are the samples of a random variable, while $\mathbf{1}_{\{x_i \le x\}} = 1$ over the set $\{x_i \le x\}$ and is zero elsewhere. Spearman's rank correlations are used to parameterize one-parameter (conditional) copulas. A copula is, loosely, a joint distribution on the unit hypercube with uniform [0,1] margins. There are many types of copulas, described in detail by Joe (2014). Here, we use bivariate Gaussian copulas, an assumption that was tested against alternate distributions (Clayton and Gumbel copulas). Details of this calculation and the validation of the whole BN can be found in Supplement 1. The bivariate Gaussian copula $C_\rho$ has the following cumulative distribution function:

$$C_\rho(u, v) = \Phi_\rho\left(\Phi^{-1}(u), \Phi^{-1}(v)\right), \quad (u, v) \in [0, 1]^2, \qquad (4)$$

where $\Phi$ is the standard normal distribution, $\Phi^{-1}$ is its inverse and $\Phi_\rho$ is the bivariate Gaussian cumulative distribution with (conditional) product moment correlation $\rho$ between the two marginal uniform variates $u$ and $v$ in the interval [0,1]. In contrast to the copula specification, the non-parametric BN we apply in this study is parameterized by (conditional) rank correlations. This is because they are algebraically independent; hence, any number in the interval [−1,1] assigned to the arcs of the BN will guarantee a positive definite correlation matrix. The rank correlation (denoted by $r$) of two random variables $X_i$ and $X_j$ with cumulative distribution functions $F_{X_i}$ and $F_{X_j}$ is the usual Pearson's product moment correlation $\rho$ computed with the ranks of $X_i$ and $X_j$:

$$r\left(X_i, X_j\right) = \rho\left(F_{X_i}(X_i), F_{X_j}(X_j)\right). \qquad (5)$$

Conditional rank correlations are calculated as shown in Eq. (5), except that the conditional distributions are used inside the arguments to the right of the equal sign. For the Gaussian copula, conditional correlations are equal to partial correlations and these are constant. For one-parameter bivariate copulas, Eq. (5) becomes the following:

$$r\left(X_i, X_j\right) = 12 \int_0^1 \int_0^1 C_\theta(u, v)\, \mathrm{d}u\, \mathrm{d}v - 3, \qquad (6)$$
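The estimators in Eqs. (3) and (5) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the simulated data are hypothetical, and the closed-form link between the product moment correlation of a Gaussian copula and its rank correlation, r = (6/π) arcsin(ρ/2), is a standard result that is assumed here rather than taken from the text.

```python
import bisect
import math
import random


def empirical_cdf(sample):
    """Eq. (3): F(x) = (1/n) * number of samples x_i <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        # bisect_right returns how many ordered values are <= x
        return bisect.bisect_right(ordered, x) / n

    return cdf


def pearson(a, b):
    """Product moment correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def rank_correlation(x, y):
    """Eq. (5): product moment correlation of F_X(X) and F_Y(Y)."""
    cdf_x, cdf_y = empirical_cdf(x), empirical_cdf(y)
    return pearson([cdf_x(v) for v in x], [cdf_y(v) for v in y])


def gaussian_rho_to_rank(rho):
    """Rank correlation implied by a Gaussian copula with correlation rho."""
    return 6.0 / math.pi * math.asin(rho / 2.0)


def gaussian_rank_to_rho(r):
    """Inverse of gaussian_rho_to_rank."""
    return 2.0 * math.sin(math.pi * r / 6.0)


def simulate_pairs(rho, n, seed=0):
    """Hypothetical test data: pairs joined by a Gaussian copula."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(math.exp(z1))  # monotone transform leaves the ranks unchanged
        ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return xs, ys


if __name__ == "__main__":
    x, y = simulate_pairs(rho=0.7, n=5000)
    print(rank_correlation(x, y), gaussian_rho_to_rank(0.7))
```

The exponential transform applied to one variable shows why rank correlations suit non-parametric margins: the estimate depends only on the ordering of the observations, not on their marginal distribution.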
The conditional rank correlation of $X_i$ and $X_j$ given the random vector $Z = z$ is the rank correlation calculated in the conditional distribution of $(X_i, X_j \mid Z = z)$. For each variable $X_i$ with $m$ parents $\mathrm{Pa}_1(X_i), \dots, \mathrm{Pa}_m(X_i)$, the arc $\mathrm{Pa}_j(X_i) \rightarrow X_i$ is associated with the rank correlation:

$$\begin{cases} r\left(X_i, \mathrm{Pa}_j(X_i)\right), & j = 1 \\ r\left(X_i, \mathrm{Pa}_j(X_i) \mid \mathrm{Pa}_1(X_i), \dots, \mathrm{Pa}_{j-1}(X_i)\right), & j = 2, \dots, m \end{cases}$$

is a proxy for terrain characteristics that influence the speed with which the water from rainfall moves down the slopes (Savenije, 2010).

The climate model from EURO-CORDEX framework delivered two variables to the BN. First is the annual maximum of daily precipitation and snowmelt (MaxEvent) in millimetres (mm). Both factors are relevant, though melting often occur concurrently (as evidenced in a list of large Eu-
[Figure 4: percentile (%) against discharge (m³ s⁻¹; logarithmic scale). Curves: unconditional; conditional on 2 parents (area and steepness); conditional on all parents.]

Figure 4. Cumulative probability distribution of river discharge: unconditional and conditionalized on two and seven nodes using values for Basel station in Switzerland (river Rhine, year 2005).
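The effect of conditionalization shown in Fig. 4 can be sketched for a single arc. The following Python example is a simplified illustration under stated assumptions, not the model used in the paper: it conditions one child node on one parent through a bivariate Gaussian copula, whereas the BN conditions discharge on seven parents jointly. The function names and the discharge sample are hypothetical, and the conversion from rank correlation to product moment correlation, ρ = 2 sin(πr/6), is the standard Gaussian-copula relation.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def conditional_levels(parent_level, rank_corr, probs):
    """Quantile levels of the child given the parent's quantile level.

    parent_level: observed parent value expressed as a probability in (0, 1).
    rank_corr: rank correlation on the arc, strictly between -1 and 1.
    probs: conditional probabilities in (0, 1) for which levels are wanted.
    """
    rho = 2.0 * math.sin(math.pi * rank_corr / 6.0)  # rank -> product moment
    z_parent = STANDARD_NORMAL.inv_cdf(parent_level)
    # Child's normal score given the parent's score is N(rho * z, 1 - rho^2)
    conditional = NormalDist(mu=rho * z_parent,
                             sigma=math.sqrt(1.0 - rho ** 2))
    return [STANDARD_NORMAL.cdf(conditional.inv_cdf(p)) for p in probs]


def empirical_quantile(sample, level):
    """Map a quantile level back to the child's empirical margin."""
    ordered = sorted(sample)
    index = min(len(ordered) - 1, int(level * len(ordered)))
    return ordered[index]


if __name__ == "__main__":
    # Hypothetical annual maxima (m3/s) standing in for the discharge margin
    discharges = [120, 150, 180, 210, 260, 300, 340, 410, 520, 700]
    levels = conditional_levels(parent_level=0.95, rank_corr=0.8,
                                probs=[0.1, 0.5, 0.9])
    print([empirical_quantile(discharges, lv) for lv in levels])
```

With a positive rank correlation, evidence that the parent is in its upper tail shifts the whole conditional distribution of the child upwards, which is the qualitative behaviour of the curves in Fig. 4.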
is higher in colder areas), lakes and marshes (less space available for construction).

In order to estimate river discharge in an ungauged catchment, the BN is updated, that is, the value of the node or set of nodes (other than discharge) is defined based on the observations corresponding to that particular catchment, i.e. new evidence. Fig. 4 shows the effects of updating on the example of Basel station in Switzerland (meteorological data pertain to the year 2005). Conditionalizing on only two variables (catchment area and steepness) changed the mean of the distribution from 341 to 1740 m³ s⁻¹. Knowing all seven variables that are parents of the river discharge node, we obtain an estimate of river discharge of 2819 m³ s⁻¹. In this case, the estimate is fairly accurate, as discharge of 3212 m³ s⁻¹ was actually measured. The same procedure was applied to all rivers in the domain. Additional examples of conditionalization of the BN can be found in Supplement 1. It should be noted that the discharge in each river section was estimated independently from another section in the same river using data for the entire upstream area.

In addition to validating the method, we apply it to model the influence of future climate predictions from EC-EARTH-COSMO_4.8_clm17 (Fig. 8) and EC-HadGEM2-ES-RACMO22E (Fig. S9 in the Supplement) models. As noted before, land-cover statistics are fixed in time, and therefore only the climate variables change over time in the prediction. Future changes were calculated for two climate scenarios: RCP 4.5 and RCP 8.5. Those representative concentration pathways indicate changes in future physical and socio-economic environments that would cause, by 2100, an increase in radiative forcing of 4.5 or 8.5 W m⁻² (Moss et al., 2010).

2.6 Return periods of discharges

Annual maxima of daily river discharges calculated by the BN were used to perform a frequency analysis. Only stations with long data series were used, i.e. those with at least 30 years of discharge observations. To find an optimal model for estimating the marginal probability distribution of annual maxima of discharges, we used the Akaike information criterion (AIC) measure (Mutua, 1994). AIC values varied significantly between stations. On average, the AIC value was the lowest for the GEV distribution, indicating that it was the best fit over 15 other tested distributions, such as generalized Pareto, gamma, lognormal or Weibull distributions. This three-parameter distribution, however, gave very large errors for some stations. Therefore, to avoid completely unrealistic estimates in the database, we decided to use the two-parameter Gumbel distribution, which is essentially the GEV distribution with the shape parameter equal to zero. This distribution was previously used in several large-scale flood-hazard studies (Dankers and Feyen, 2008; Hirabayashi et al., 2013; Winsemius et al., 2013; Alfieri et al., 2014). In order to calculate discharge $Q$ with probability of occurrence $p$, the following equation is used:

$$Q_p = \mu - \sigma \ln\left(-\ln(1 - p)\right), \qquad (8)$$

where $\mu$ is the location parameter and $\sigma$ is the scale parameter. Parameters were fitted using maximum likelihood estimation (Katz et al., 2002; Gelman et al., 2013). The extreme
[Figure 5 panels, simulated against observed discharge in m³ s⁻¹ on logarithmic axes: (a) mean annual maximum discharge (Q_MAMX), R² = 0.92, INSE = 0.92; (b) 1000-year discharge (Q_1000), R² = 0.87, INSE = 0.74; (c) 100-year discharge (Q_100), R² = 0.89, INSE = 0.80; (d) 10-year discharge (Q_10), R² = 0.91, INSE = 0.88.]

Figure 5. Simulated and observed average annual maxima of daily river discharges (a) and annual maxima fitted to Gumbel distribution to calculate 1000-, 100- and 10-year return periods (b–d), for 1125 stations. 30-year periods of annual maxima were used (the most recent available out of 1971–2000, 1961–1990 or 1951–1980).
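The return-period calculation behind panels (b) to (d) of Fig. 5 follows the Gumbel quantile function of Eq. (8). The Python sketch below is illustrative only: the function names are hypothetical, and the parameters are estimated with the method of moments as a simple stand-in for the maximum likelihood estimation described in the text.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_quantile(mu, sigma, p):
    """Eq. (8): discharge with annual probability of occurrence p."""
    return mu - sigma * math.log(-math.log(1.0 - p))


def fit_gumbel_moments(annual_maxima):
    """Method-of-moments estimates (an assumption; the paper uses MLE).

    For a Gumbel distribution the standard deviation is sigma * pi / sqrt(6)
    and the mean is mu + EULER_GAMMA * sigma.
    """
    sigma = statistics.stdev(annual_maxima) * math.sqrt(6.0) / math.pi
    mu = statistics.mean(annual_maxima) - EULER_GAMMA * sigma
    return mu, sigma


def return_levels(annual_maxima, periods=(10, 100, 1000)):
    """Discharges for the given return periods (years), with p = 1 / T."""
    mu, sigma = fit_gumbel_moments(annual_maxima)
    return {t: gumbel_quantile(mu, sigma, 1.0 / t) for t in periods}


if __name__ == "__main__":
    # Hypothetical 30-year series of annual maxima (m3/s)
    series = [310, 450, 280, 390, 520, 340, 410, 300, 610, 370,
              330, 480, 295, 360, 550, 420, 315, 440, 385, 500,
              345, 290, 470, 405, 325, 590, 355, 430, 375, 460]
    for period, discharge in return_levels(series).items():
        print(period, round(discharge, 1))
```

Because a 30-year series is extrapolated to 100- and 1000-year events, small changes in the fitted scale parameter translate into large changes in the upper return levels.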
while the 1000-year (Q_1000) discharge noticeably deviates from the 1 : 1 line, mainly for very large rivers. It should be also remembered that the return periods were based only on 30-year series, and therefore a 100- or 1000-year discharge includes the uncertainty of extrapolation of the return periods. However, the INSE value is still good, and R² changes moderately. The R² drops to 0.52 for Q_MAMX when considering specific river discharge and 0.44 for 100-year discharge, with INSE at 0.43 in both cases. Again, performance is slightly higher for 10-year discharge and drops approaching 1000-year discharge. It is also interesting to notice that the rank correlations for all four cases discussed previously (Q_MAMX, Q_1000, Q_100 and Q_10) are in the order of 0.8 and their bivariate distribution does not present large asymmetries (Fig. S5 in Supplement 2). This could be an indication that a method based on copulas could also be used for bias correction; however, further investigation of this observation is outside of the scope of this paper.

Performance of the model by time period, region or catchment area was also analysed in more detail (Table 2). For four different time periods, where availability of stations varies, the results of the validation are almost identical. Only for 1981–2010 is it slightly lower because it is partially outside the timespan of the historical scenario of EURO-CORDEX; for 2006–2010, data from RCP 4.5 climate-change scenario run had to be used to fill the missing information. Much more variation in the quality of the simulations is observed when dividing the results by geographical regions (their definitions correspond to the regionalization of the CCM2 catchment database). Western Europe (comprising mainly France,
[Figure 6: four scatter plots (axes from 0.0 to 2.0 m³ s⁻¹ km⁻²) of simulated against observed specific discharge. (a) Mean annual maximum discharge (QMAMX): R² = 0.52, INSE = 0.41. (b) 1000-year discharge (Q1000): R² = 0.43, INSE = 0.41. (c) 100-year discharge (Q100). (d) 10-year discharge (Q10).]
Figure 6. The same as Fig. 5, but for specific discharge, i.e. divided by catchment area.
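The two scores reported throughout the validation, R² and INSE, can be computed for either absolute or specific discharge. The sketch below assumes that R² is the squared Pearson correlation and that INSE is the standard Nash-Sutcliffe efficiency; the function names and the example values are hypothetical.

```python
def r_squared(observed, simulated):
    """Squared Pearson correlation between observed and simulated values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov * cov / (var_o * var_s)


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, below 0 is worse than
    predicting the observed mean everywhere."""
    mean_o = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - sse / spread


def specific_discharge(discharges, areas_km2):
    """Discharge divided by catchment area [m3/s per km2]."""
    return [q / a for q, a in zip(discharges, areas_km2)]


# Hypothetical stations: observed and simulated Q100 [m3/s], catchment area [km2].
observed = [120.0, 950.0, 40.0, 3100.0]
simulated = [150.0, 880.0, 65.0, 2900.0]
areas = [310.0, 5200.0, 95.0, 41000.0]

print(r_squared(observed, simulated), nash_sutcliffe(observed, simulated))
print(r_squared(specific_discharge(observed, areas), specific_discharge(simulated, areas)))
```

Dividing by area removes most of the spread that catchment size alone explains, which is one reason scores for specific discharge are lower than those for absolute discharge. Unlike R², the Nash-Sutcliffe efficiency also penalizes systematic bias.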
Belgium, the Netherlands and the Rhine river basin) had particularly good results for QMAMX, followed by the Danube
river basin and Scandinavia (roughly defined as Sweden and Norway). The lowest correlation for QMAMX was observed in
the Iberian Peninsula (Spain and Portugal), while Central Europe (mainly Poland, Lithuania, Denmark and north-east
Germany) had the highest INSE values. Iberia had the lowest performance for Q100, while western Europe recorded the
highest correlation, and Scandinavia had the best score in INSE and IRSR. For Central European and Scandinavian
stations, the error was lower and INSE values higher for the 100-year return period than for QMAMX. No region dropped
below acceptable levels (i.e. R² or INSE value of 0.5, according to Moriasi et al., 2007), although stations in Iberia
and in "other regions" have noticeably lower performance. In the case of Spain, to which almost all stations collected
for the Iberian Peninsula belong, discharges tend to be overestimated, which may point to the influence of reservoirs
on river flow. Indeed, many Spanish stations with large errors were found to be just downstream of large dams. Finally,
"other regions" is a grouping of a small number of stations scattered around Europe, mainly from Finland, Italy and
Iceland. Those areas, containing many rivers in both arid and polar climates, are under-represented in the
quantification of the BN, which may explain their lower performance.

In Fig. 5 it can be seen that the amount of scatter in the plot increases for rivers with smaller discharges. Detailed
results in Table 2 show that the performance of the model drops for the smallest catchments, especially for those below
100 km² (177 catchments). For others, above 500 km², the R² and INSE values are mostly in the range of 0.5–0.6 for
specific discharges, as when considering all stations. Additionally, to validate the robustness of the method, we did a
split-sample test. Stations were randomly divided into two sets. Data from 917 stations were used to quantify the BN in
order to simulate discharges in the remaining 924 stations. Of the latter,
Table 2. Validation results for simulated and observed average annual maxima of daily river discharges (QMAMX) and annual maxima with a
100-year return period (Q100).
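The split-sample test described in the text randomly divides the stations into a set used to quantify the BN (917 stations) and a held-out set (924 stations). A minimal sketch of such a split is given below; the station identifiers, the seed and the function name are hypothetical, and the authors' actual randomization procedure is not described in this section.

```python
import random


def split_sample(station_ids, n_quantification, seed=0):
    """Randomly split stations into a quantification set and a held-out set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(station_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_quantification], shuffled[n_quantification:]


# Hypothetical identifiers for the 917 + 924 = 1841 stations.
stations = [f"station_{i:04d}" for i in range(1841)]
quantification, held_out = split_sample(stations, 917)
print(len(quantification), len(held_out))
```

Only held-out stations with sufficiently long records can then be compared with the simulated discharges, which is why fewer stations than were held out enter the reported validation.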
586 stations had at least three full decades of discharge observations, which allowed us to make a comparison with
simulated discharge. The validation result was almost identical to those reported for the full quantification, and even
better results (R² = 0.94 and INSE = 0.93) were observed for QMAMX, while for Q100 the same value of INSE was
calculated and R² equalled 0.90. Still, performance at individual stations varies. A selection of observed and
simulated discharges, both annual maxima and those fitted to Gumbel distribution, is presented in Fig. 7. At some
stations, there is a very close fit, while at others, either the discharge is overestimated or the distributions have
different shapes. This is, however, not atypical even for more local studies.

The final analysis in this section is the comparison of the BN model and RFA. Using RFA, estimates of extreme discharge
were obtained for all 1125 stations with long records and compared to discharges in Fig. 9. In the case of Q100,
Gumbel-distributed discharges were used, as the performance with GEV distribution was slightly lower. The performance
of both BN and RFA models is visually similar, though the BN recorded higher correlation and less bias than the RFA.
Less scatter can be observed in upper and lower ranges of discharges, with similar performance in the middle. Using
specific river discharges (Fig. 10), the performance of both methods was lower, but still much better for the BN: INSE,
for example, was negative for both QMAMX and Q100 when using RFA, in contrast to a value of 0.43 for the BN. RFA was
devised as a global method instead of a regional one, but at the same time it is in fact a set of 82 regional
approximations of hydrological processes. Here, we analyse contributing factors of extreme discharges all together,
achieving comparable or even better results.

3.2 River discharges in Europe

Calculation of river discharges utilizing data from EURO-CORDEX climate simulations was done for the years 1950–2100,
and the results are presented here in three time slices: 1971–2000, 2021–2050 and 2071–2100. The first period is from
the historical control run, while the other two were analysed for two emission scenarios: RCP 4.5 and RCP 8.5.
Projected trends calculated from the data are presented in Fig. 8. For
Figure 7. Simulated and observed annual maxima of daily river discharges fitted to Gumbel distribution at selected stations. Data refer to
1971–2000, except for panel (h), which refers to 1961–1990.
the sake of clarity, only rivers with catchment area above 500 km² are presented in the picture; full-scale maps of
discharges have been included in the Supplement. Aggregate statistics by region and catchment size were included in
Table 3. In the description we focus on 100-year discharge, but the trends are also representative of other return
periods.

The projected trends in Europe are very diversified. For Europe as a whole, there is a slight 4–7 % increase in
discharges with a 100-year return period (Q100), with the biggest change observed in the 2021–2050 RCP 8.5 scenario.
Along 34–44 % of river length in Europe, Q100 is projected to increase by at least 10 %, depending on scenario. Yet,
along 16–21 % of river length a decrease by more than 10 % is expected, with only small changes (±10 %) for the
remaining 35–49 %. In RCP 8.5 both increases and decreases of Q100 are more prominent than in RCP 4.5. In effect, Q100
in the 2071–2100 RCP 8.5 scenario is projected to correspond to 176-year discharge under present climate (1971–2000) if
we take the median value. This value is slightly lower in mid-century and in end-century for RCP 4.5, with the smallest
change compared to present climate in the 2021–2050 RCP 4.5 scenario.

Between regions, by mid-century, the largest average increases in extreme discharges are expected in the Iberian
Peninsula and Danube basin (RCP 4.5), while Q100 in Central Europe (i.e. mainly the Elbe, Oder and Vistula river
basins) is projected to surge even more in RCP 8.5. By the end of the century, however, southern Europe (comprising
mostly Italy) will experience the biggest average increase. Conversely, Q100 is projected to decrease on average in the
British Isles in all four scenarios, in north-east Europe (Finland, north-west Russia and the Baltics) in three
scenarios, in Scandinavia in two and in south-east Europe (mainly Greece) in one. Those discrepancies are the result of
several trends, namely changes in extreme precipitation, snowmelt and runoff coefficient. The first is projected to
increase across the continent, while the other two decrease at the same time, with some exceptions. Decline in
snowmelt, a consequence of thinner snow cover, will contribute to lower extreme discharges in parts of Scandinavia and
Scotland. However, in most of Sweden, Finland and other areas, less snowmelt will be offset by more rainfall. Lower
precipitation is expected only in small, scattered patches of Europe, most noticeably in southern Spain. At the same
time, an increase of the runoff coefficient could be observed in predictions for the Iberian Peninsula and western
Europe, with decreases in the remainder of the continent. Higher temperatures and less soil moisture are contributing
factors to those trends.

In Table 3 projected trends in Q100 were also provided per catchment size. The differences in average increase of
discharges are very small and partially caused by their uneven distribution in Europe. Median return periods show more
diversity, since the relative increase in discharge by a certain increment of return period typically gets smaller as
the river
Figure 8. Predicted trends in daily river discharge with a 100-year return period (Gumbel distribution) under climate-change scenarios
RCP 4.5 and RCP 8.5 (rivers with catchment area above 500 km² only). Projections based on EC-EARTH-COSMO_4.8_clm17 climate
model run.
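The statement that a future Q100 corresponds to a 176-year discharge under present climate rests on evaluating a future return level against the present-day Gumbel distribution. The sketch below illustrates that conversion for a single river section; the Gumbel parameters are hypothetical values chosen for illustration and the function names are not taken from the paper.

```python
import math


def gumbel_return_level(mu, beta, return_period):
    """Discharge with an annual exceedance probability of 1 / return_period."""
    return mu - beta * math.log(-math.log(1.0 - 1.0 / return_period))


def gumbel_return_period(mu, beta, discharge):
    """Return period (years) of a given discharge under Gumbel(mu, beta)."""
    reduced = math.exp(-(discharge - mu) / beta)
    exceedance = -math.expm1(-reduced)  # 1 - CDF, computed stably
    return 1.0 / exceedance


# Hypothetical Gumbel parameters [m3/s] for one river section.
present = (500.0, 120.0)  # 1971-2000
future = (540.0, 130.0)   # a future time slice under one scenario

q100_future = gumbel_return_level(*future, 100)
equivalent_period = gumbel_return_period(*present, q100_future)
print(f"Future Q100 = {q100_future:.0f} m3/s, "
      f"a {equivalent_period:.0f}-year discharge under present climate")
```

Repeating this for every river section and taking the median of the resulting return periods would give a regional summary of the kind reported in Table 3.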
grows in size. Most importantly, this breakdown shows that the method is able to detect trends in discharge in both
large and small rivers.

4 Discussion

The results presented in the previous section, however encouraging on their own, have to be compared to other existing
studies. Such analysis is presented in Sect. 4.1. Section 4.2 presents a discussion of the limitations of the method
and the uncertainties in the model's setup and results. Finally, in Sect. 4.3, ongoing and planned developments of the
BN are presented.

4.1 Comparison with other models

The accuracy of the BN model of extreme river discharges can be compared, directly or indirectly, with results of other
statistical and physical models. In the case of the former, a comparison with the RFA method was shown in Sect. 3.1. For
the latter, reported values of R² and INSE from several studies were obtained.

Studies with measures of model performance comparable with this analysis were summarized in Table 4. All of the
publications were based on the LISFLOOD model forced by a large variety of climate models. The validation of those
hydrological models was mainly based on Global Runoff Data Centre discharge data, similarly to this study. The
correlation between observed and simulated mean annual maxima of daily discharges (QMAMX), measured by R², was between
[Figure 9: two log-log scatter plots (axes from 0.1 to 10 000 m³ s⁻¹) of simulated against observed discharge. (a) Mean annual maximum discharge (QMAMX): BN R² = 0.92, INSE = 0.92; RFA R² = 0.78, INSE = 0.71. (b) 100-year discharge (Q100): BN R² = 0.89, INSE = 0.80; RFA R² = 0.70, INSE = 0.44.]
Figure 9. Simulated and observed average annual maxima of daily river discharges and 100-year discharge for 476 stations; Bayesian
network model in red, regional frequency analysis in green. 30-year periods of annual maxima were used (the most recent available out of
1971–2000, 1961–1990 or 1951–1980).
[Figure 10: two scatter plots (axes from 0 to 4 m³ s⁻¹ km⁻²) of simulated against observed specific discharge. (a) Mean annual maximum discharge (QMAMX): BN R² = 0.52, INSE = 0.43; RFA R² = 0.28, INSE = -0.02. (b) 100-year discharge (Q100): BN R² = 0.45, INSE = 0.43; RFA R² = 0.17, INSE = -1.35.]
Figure 10. As Fig. 9, but for specific discharge, i.e. divided by catchment area.
0.86 and 0.94. The corresponding value in this study is within this range. Only one other study (Dankers and Feyen,
2008) reported R² for discharge with different return periods (Q20, Q50 and Q100). Compared with those values, the
results of the BN model are slightly higher. It should be noted that in the aforementioned analysis, using Gumbel
distribution (as in this study) yielded better correlation than GEV distribution. Only two studies reported INSE
values. Most interestingly, Rojas et al. (2011) show that the performance of the hydrological model changed
significantly depending on how climate data were treated. The authors noted large biases in modelled precipitation
data, and made a correction based on observational datasets. This modification of climate data output slightly improved
the correlation, but most importantly the INSE went from a negative value, indicating poor performance, to a value
close to a perfect fit with the 1 : 1 line. In this study, no modifications to climate data were made and yet INSE
values for our statistical model are in the range of a physical model forced by bias-corrected climate data. Of course,
the reported validation results are not perfectly comparable with this analysis, since the described studies focussed
on relatively large rivers (those with more than ca. 1000 km² catchment area) and used ENSEMBLES regional climate
simulations, which are several years older than the CORDEX simulations employed herein. Additionally, R² and INSE are
not the only measures available. Dankers and
Table 3. Projected change in 100-year river discharge (Q100) relative to 1971–2000, and return periods of discharge equal to Q100 in
1971–2000 for two emission scenarios, RCP 4.5 and RCP 8.5. Predictions based on EC-EARTH-COSMO_4.8_clm17 climate model run.
Table 4. Reported validation results for extreme discharge simulations for Europe.
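Besides R² and INSE, Sect. 4.1 compares models by the share of stations at which the error in simulated QMAMX exceeds 50 % or 100 %. A minimal sketch of that measure follows, assuming that "error" means the absolute difference relative to the observed value; this definition, the function name and the data are assumptions made for illustration.

```python
def share_exceeding(observed, simulated, threshold):
    """Percentage of stations whose relative error exceeds `threshold`.

    The relative error is |simulated - observed| / observed, so a threshold
    of 0.5 corresponds to an error larger than 50 %.
    """
    errors = [abs(s - o) / o for o, s in zip(observed, simulated)]
    return 100.0 * sum(e > threshold for e in errors) / len(errors)


# Hypothetical observed and simulated QMAMX [m3/s] at five stations.
observed = [120.0, 300.0, 150.0, 900.0, 210.0]
simulated = [130.0, 520.0, 140.0, 850.0, 450.0]

print(share_exceeding(observed, simulated, 0.5))  # share with error > 50 %
print(share_exceeding(observed, simulated, 1.0))  # share with error > 100 %
```

Such shares complement R² and INSE because they describe how often a single station is badly wrong, independently of how large its discharge is.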
Feyen (2008) report that the error in simulating QMAMX was bigger than 50 % in 24–25 % of stations and more than 100 %
in 6–8 %. In this study, for comparable river size, i.e. with extreme discharge of ca. 100 m³ s⁻¹ and more, those
values are 34 and 11 %. Still, overall the performance of the BN can be described as similar to the LISFLOOD model in
estimating annual extremes.

4.2 Limitations and uncertainties

The BN model, despite its overall high performance, has lower accuracy over certain regions. Some of the uncertainties
and limitations of the model are inherent properties of large-scale hydrological simulations, while others are specific
to how the method was conceived and what assumptions and data were included. One of the foremost aspects belonging to
the first category is that the method assumes natural flow in the catchment. Hydraulic structures, such as large dams,
can have profound influence on extreme discharges, as many were developed as a flood-reducing measure. As mentioned in
the results section, flows in Spanish rivers were generally overestimated and reservoirs may provide a likely
explanation. Continental- or global-scale models routinely
omit this aspect, as there is not enough information available to incorporate the existence of reservoirs or their
operation. The BN model includes reservoirs only indirectly; they count as lakes and contribute to the percentage of
the catchment covered by water bodies and have a negative influence on extreme discharge. In total, 326 large dams are
within the catchments of the stations used in this study, according to the GRanD database (Lehner et al., 2011).
Additionally, the conditions in the catchment may change over the timespan of the analysis of discharge data
(1950–2005), due to reservoir construction or river regulation, or simply because of land-use developments. Currently a
single snapshot of European land cover is used (from around the year 2000), but the area covered by lakes, marshes and
particularly artificial surfaces is dynamic. In our analysis there was very little difference in performance between
different time periods, but this aspect could be relevant locally.

The configuration of the BN presented here was the best one we found, but may not be the only solution possible, or the
best one there could be. In Paprotny and Morales Nápoles (2015), the setup of the model was slightly different, with
unconsolidated deposits (calculated as a fraction of all soil types in a catchment) used instead of the runoff
coefficient. It can be noticed that despite several soil datasets being mentioned in the methodology (Sect. 2.3), none
made the final configuration of the model. Low resolution and limited thematic accuracy of global soil data are likely
the cause. Several other variables describing terrain, climate or land cover mentioned in Sect. 2.3 were not included,
as adding them did not improve the model. However, one alternative configuration worth mentioning is a BN incorporating
terrain classification based on height above nearest drainage. Replacing lake and marsh cover with wetlands and
hillslopes identified in the DEM (see Gao et al., 2014) caused only a fractional drop in performance. Given that
land-cover data for Europe have very high resolution and good accuracy, this approach may give better results in areas
with less satisfactory data such as the developing countries.

Some issues are related to the datasets used. Discharge data are daily values, rather than absolute peak flows, as that
variable was the only one available from the main source of information, i.e. the Global Runoff Data Centre. Yet,
Polish data were only available as sub-daily maxima, which did not affect the accuracy for Poland or Europe much, but
is nonetheless a slight inconsistency. More crucially, daily discharge is not adequate to model flash floods, floods of
short duration or floods in small catchments. Flash floods can occur in a matter of minutes and outside of river beds.
Also, the model utilizes daily precipitation and snowmelt, which may also not be accurate for large catchments, where
the biggest floods are caused by rainfalls lasting many days. Potential incorporation of different timespans of
flood-inducing meteorological events is yet to be analysed. In some regions the amount of river-gauge station data was
very limited, mainly in south-eastern Europe, while in others (northern and western Europe) it was abundant, making the
sample less representative. The river-gauge observations might still contain errors, even though they were
quality-checked by the providers; they could also be systematically inaccurate due to, for example, outdated rating
curves.

Further concerns are related to the river and catchment dataset CCM2. It has lower accuracy in areas with low relief
energy, otherwise known as plains. Slight inaccuracies in the DEM result in improper delimitation of catchments in such
regions. Large numbers of lakes in post-glacial parts of Europe can also result in sometimes substantial errors. For
instance, the INSE value for QMAMX for mountainous Norway is 0.90, while for Sweden, with its lake-filled landscape, it
drops to 0.71. River-gauge stations for which there was a significant difference between catchment area in CCM2 and the
corresponding value in the station's metadata were removed from the sample. The improperly divided basins still exist
in our final database of simulated extreme discharges, though. This also involves omission of most artificial channels
and all cases of bifurcation, river deltas included.

Climate data from CORDEX have the highest resolution available, yet biases in representing rainfall, snowmelt and
runoff could influence the results. As addressed in Sect. 4.1, bias-correction of precipitation significantly improved
performance of the LISFLOOD hydrological model, leaving room for further enhancements of the method. Another issue is
related to climate-change scenarios used to construct the database of discharges. The difference between RCP 4.5 and
RCP 8.5 scenarios is sometimes very large, as witnessed in Fig. 8. This alone illustrates major uncertainty related to
future projections of climate. For the historical period, the use of an alternative CORDEX model and a climate
reanalysis has shown (Supplement 2) that the BN model's performance depends on the climate model used, yet it is still
considerably better than the RFA.

Finally, the underlying dependence structure requires further investigation since some of the bivariate distributions
of variables indicate that a non-Gaussian copula could be a better model (see Supplement 1 for details). Other copulas
could potentially be used since, for some distributions, tail dependence and other asymmetries may be present, even
though the normal copula works well most of the time. Skewness, for example, may be modelled by copulas based on
mixture distributions. This would correspond to copulas with more than two parameters (Joe, 2014).

4.3 Applications and further developments

The method was originally conceived to provide extreme discharge estimates that could be used for pan-European hazard
mapping. As shown in the previous sections, the BN provides similar results when compared to existing hydrological
models, yet it is much faster. For hydrodynamic modelling of water levels (Paprotny et al., 2017), catchments with area
greater than 100 km² were selected, both to fur-
ther reduce calculation time and due to limited applicability of the BN model to very small catchments. The calculation
of annual maximum discharge for 151 years, including 95 years in two climate-change scenarios, in a domain of almost
156 000 river sections above the threshold, together with obtaining return periods of flood events, takes less than a
day on a desktop PC. The exact value depends on the number of samples used when conditionalizing the BN and the number
of samples used to quantify the BN. Nevertheless, the method can reduce time needed to perform a flood-hazard analysis,
both continental-scale and local, as long as annual extremes are relevant for a particular study.

The results of this study (extreme discharges with certain return periods under present and future climate for all
river sections in the domain) are publicly available online (Paprotny and Morales Nápoles, 2016). The dataset was
formatted in GIS in such a manner that it can be easily combined with the CCM2 river and catchment database. The files
include a total of 10 different return periods of discharges (2–1000 years) and 5 scenarios, the same as described in
Sect. 3.2. Additionally, for each future scenario, change in the return period of discharge compared to 1971–2000 was
calculated and included in the dataset. Flood-hazard maps that utilized those results are also accessible, but further
discussion about them is outside the scope of this paper. This is definitely a line for future research recommended by
the authors, with the first application presented in Paprotny et al. (2017). We should note, however, that all the
databases were published with the intention of analysing them on a European scale, and users should be careful applying
them on a local scale, especially for small and medium catchments (with an area of less than 500 km²).

Thus far, the model's domain has been limited to Europe, but investigation is also ongoing into applying the method to
other regions, globally. Currently, data from the United States and Mexico are being analysed. There is a very large
number of river-gauge observations available for the contiguous US, while for its southern neighbour the number and
quality of historical records is limited. These case studies provide interesting challenges when compared to Europe.
Mexico lies mostly within tropical and arid climate zones, which is in stark contrast to Europe. The United States is
geographically diversified and its biggest river system, the Mississippi–Missouri basin, is almost four times larger
than the Danube basin. For these countries, global spatial datasets will be used which have a lower resolution than
those applied in this study. It is possible, for instance, to quantify the BN model with those datasets and analyse its
performance relative to the European quantification presented in this paper, as well as to combine those data. In this
way, the model's configuration with seven variables can be challenged, as the risk is that the method is overfitting
the data from Europe. But again, this could only be definitely resolved by testing the model in other geographical
areas of the world. As a first check, Couasnon (2017) applied the model for the contiguous United States, indicating
that the European quantification performed generally well, though much less accuracy was observed for arid and
hurricane-influenced parts of the country than in those with temperate climate. Quantification based on US or combined
(US–Europe) data performed less well, though for any variant the results were better than when using RFA, which was
originally validated for that area by Smith et al. (2015). Finally, the model could be potentially evaluated not only
using all variables, but conditionalized only on some of them, as observations for all variables might not be available
in a given location.

5 Conclusions

In this paper we presented a first attempt to model extreme river discharges in Europe using BNs. The method revisits
the old concept of estimating discharges using only geographical properties of catchments, but employing an entirely
new approach. Instead of a usual regression analysis, we determine the (conditional) correlations between different
variables describing the catchments with copulas and a non-parametric BN. We show that the model has comparable
accuracy to other large-scale hydrological models in simulating mean annual maxima and return periods of daily
discharges and better performance than an RFA. The data necessary to apply the method can be obtained from pan-European
(or global) databases for any location in the continent (or other locations where global data are available). In this
sense, the method can be used to create basic flood scenarios at any ungauged location where data for these variables
are available. For that reason it was used to provide estimates of extreme river discharges for both present and future
climate in all rivers in a domain covering most of the continent. However, the accuracy at different ungauged locations
varies to some degree. The best performance was found in Scandinavia, western Europe and the Danube basin, while the
lowest was observed in southern Europe, especially in the Iberian Peninsula. Trends in discharges were found to be very
diversified, while the database itself will be applied to delimiting flood-hazard zones in a separate study. Further
research regarding discharge estimates with our model is recommended, especially for future climate scenarios.

There are several advantages of our approach. It has low computational expense, the method is flexible as its
configuration could be easily modified, and the model can be used even if not all variables for a given location are
available. At the same time it allows for sensitivity analysis of different variables on extreme discharges, as well as
easy incorporation of changes in climate or land use over time. It relies purely on the statistical distributions and
statistical dependence of catchment descriptors, without any empirical modifiers or clustering typical for other
statistical methods. The model also has a graphical nature, which makes its formulation explicit. The aim was to make
the method universal and,
even though so far it was only comprehensively tested for Europe, its overall performance is encouraging. The accuracy
of the model changes relatively little between regions and time periods, as well as when a split-sample test is
applied.

The disadvantages are mostly typical for other large-scale models, such as the assumption of natural flow conditions in
the rivers and lower performance in smaller catchments. Validation has shown that for catchments smaller than 500 km²,
and especially than 100 km², performance is significantly lower than for larger ones due to increasing influence of
local factors. The method was also crafted only for annual maxima of discharges, with the purpose of accurately
estimating return periods rather than discharges in a particular year. But again, this is the most relevant parameter
in flood-hazard analysis. The method will be further developed and tested in other parts of the world.

Data availability. This work relied entirely on public data as inputs, which are available from the providers cited in
Sect. 2.2 and 2.3. Results of the work can be downloaded from an online repository (Paprotny and Morales Nápoles, 2016).

The Supplement related to this article is available online at https://doi.org/10.5194/hess-21-2615-2017-supplement.

Competing interests. The authors declare that they have no conflict of interest.

Acknowledgements. This work was supported by the European Union's Seventh Framework Programme under the "Risk analysis
of infrastructure networks in response to extreme weather" (RAIN) project, grant no. 608166. The authors would like to
thank the Global Runoff Data Centre in Koblenz, Germany, for kindly providing a large part of river-gauge data used in
this study. The work described herein benefited from useful comments by S. N. Jonkman, H. H. G. Savenije, A. Sebastian,
A. Sikorska and two anonymous reviewers.

Edited by: J. Seibert
Reviewed by: A. E. Sikorska and two anonymous referees

References

Alfieri, L., Salamon, P., Bianchi, A., Neal, J., Bates, P., and Feyen, L.: Advances in pan-European flood hazard
mapping, Hydrol. Process., 28, 4067–4077, https://doi.org/10.1002/hyp.9947, 2014.
Alfieri, L., Feyen, L., Dottori, F., and Bianchi, A.: Ensemble flood risk assessment in Europe under high end climate
scenarios, Global Environ. Chang., 35, 199–210, https://doi.org/10.1016/j.gloenvcha.2015.09.004, 2015.
Barredo, J. I.: Major flood disasters in Europe: 1950–2005, Nat. Hazards, 42, 125–148,
https://doi.org/10.1007/s11069-006-9065-2, 2007.
Bartalev, S. A., Belward, A. S., Erchov, D. V., and Isaev, A. S.: A new SPOT4-Vegetation derived land cover map of
Northern Eurasia, Int. J. Remote Sens., 24, 1977–1982, https://doi.org/10.1080/0143116031000066297, 2003.
Centro de Estudios Hidrográficos: Anuario de aforos 2011–2012, available at:
http://ceh-flumen64.cedex.es/anuarioaforos/default.asp (last access: 27 January 2016), 2012.
Chow, V. T.: Applied hydrology, McGraw-Hill, New York, USA, 1988.
Couasnon, A. A. O.: Characterizing flood hazard at two spatial scales with the use of stochastic models: an application
to the contiguous United States of America and the Houston Ship Channel, MSc thesis, TU Delft, Delft, the Netherlands,
2017.
Dankers, R. and Feyen, L.: Climate change impact on flood hazard in Europe: An assessment based on high resolution
climate simulations, J. Geophys. Res., 113, D19105, https://doi.org/10.1029/2007JD009719, 2008.
Dankers, R. and Feyen, L.: Flood hazard in Europe in an ensemble of regional climate scenarios, J. Geophys. Res., 114,
D16108, https://doi.org/10.1029/2008JD011523, 2009.
Dee, D. P., Uppala, S. M., Simmons, A. J., Berrisford, P., Poli, P., Kobayashi, S., Andrae, U., Balmaseda, M. A.,
Balsamo, G., Bauer, P., Bechtold, P., Beljaars, A. C. M., van de Berg, L., Bidlot, J., Bormann, N., Delsol, C.,
Dragani, R., Fuentes, M., Geer, A. J., Haimberger, L., Healy, S. B., Hersbach, H., Hólm, E. V., Isaksen, L., Kållberg,
P., Köhler, M., Matricardi, M., McNally, A. P., Monge-Sanz, B. M., Morcrette, J.-J., Park, B.-K., Peubey, C., de
Rosnay, P., Tavolato, C., Thépaut, J.-N., and Vitart, F.: The ERA-Interim reanalysis: configuration and performance of
the data assimilation system, Q. J. Roy. Meteor. Soc., 137, 553–597, https://doi.org/10.1002/qj.828, 2011.
De Jager, A. L. and Vogt, J. V.: Development and demonstration of a structured hydrological feature coding system for
Europe, Hydrolog. Sci. J., 55, 661–675, https://doi.org/10.1080/02626667.2010.490786, 2010.
DHI GRAS: EU-DEM Statistical Validation Report, European Environment Agency, Copenhagen, Denmark, 2014.
European Environment Agency: CLC2006 technical guidelines, EEA Technical report No. 17/2007, European Environment
Agency, Copenhagen, Denmark, 2007.
European Environment Agency: Corine Land Cover 2000 raster data, available at:
http://www.eea.europa.eu/data-and-maps/data/corine-land-cover-2000-raster-3 (last access: 29 January 2016), 2014a.
European Environment Agency: EEA Fast Track Service Precursor on Land Monitoring - Degree of soil sealing, available at:
http://www.eea.europa.eu/data-and-maps/data/eea-fast-track-service-precursor-on-land-monitoring-degree-of-soil-sealing
(last access: 29 January 2016), 2014b.
Fal, B.: Przepływy charakterystyczne głównych rzek polskich w latach 1951-1995, Materiały Badawcze Instytut
Meteorologii i Gospodarki Wodnej. Hydrologia i Oceanologia 26, IMGW, Warsaw, Poland, 137 pp., 2000.
FAO/IIASA/ISRIC/ISS-CAS/JRC: Harmonized World Soil Database (version 1.2), FAO, Rome, Italy and IIASA, Laxenburg,
Austria, 2012.
Farr, T. G., Rosen, P. A., Caro, E., Crippen, R., Duren, R., Hensley, S., Kobrick, M., Paller, M., Rodriguez, E., Roth, L., Seal, D., Shaffer, S., Shimada, J., Umland, J., Werner, M., Oskin, M., Burbank, D., and Alsdorf, D.: The Shuttle Radar Topography Mission, Rev. Geophys., 45, RG2004, https://doi.org/10.1029/2005RG000183, 2007.
Feyen, L., Dankers, R., Bódis, K., Salamon, P., and Barredo, J. I.: Fluvial flood risk in Europe in present and future climates, Climatic Change, 112, 47–62, https://doi.org/10.1007/s10584-011-0339-7, 2012.
Gao, H., Hrachowitz, M., Fenicia, F., Gharari, S., and Savenije, H. H. G.: Testing the realism of a topography-driven model (FLEX-Topo) in the nested catchments of the Upper Heihe, China, Hydrol. Earth Syst. Sci., 18, 1895–1915, https://doi.org/10.5194/hess-18-1895-2014, 2014.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B.: Bayesian data analysis, 3rd ed., Chapman & Hall/CRC, London, UK, 2013.
Gericke, O. J. and Smithers, J. C.: Review of methods used to estimate catchment response time for the purpose of peak discharge estimation, Hydrol. Sci. J., 59, 1935–1971, https://doi.org/10.1080/02626667.2013.866712, 2014.
Gharari, S., Hrachowitz, M., Fenicia, F., and Savenije, H. H. G.: Hydrological landscape classification: investigating the performance of HAND based landscape classifications in a central European meso-scale catchment, Hydrol. Earth Syst. Sci., 15, 3275–3291, https://doi.org/10.5194/hess-15-3275-2011, 2011.
Global Runoff Data Centre: BfG – The GRDC, available at: http://www.bafg.de/GRDC/EN/Home/homepage_node.html, last access: 27 January 2016.
Hanea, A. M., Kurowicka, D., and Cooke, R. M.: Hybrid Method for Quantifying and Analyzing Bayesian Belief Nets, Qual. Reliab. Eng. Int., 22, 709–729, https://doi.org/10.1002/qre.808, 2006.
Hanea, A. M., Morales Nápoles, O., and Ababei, D.: Non-parametric Bayesian networks: Improving theory and reviewing applications, Reliab. Eng. Syst. Safe., 144, 265–284, https://doi.org/10.1016/j.ress.2015.07.027, 2015.
Haylock, M. R., Hofstra, N., Klein Tank, A. M. G., Klok, E. J., Jones, P. D., and New, M.: A European daily high-resolution gridded dataset of surface temperature and precipitation, J. Geophys. Res., 113, D20119, https://doi.org/10.1029/2008JD010201, 2008.
Hengl, T., de Jesus, J. M., MacMillan, R. A., Batjes, N. H., Heuvelink, G. B. M., Ribeiro, E., Samuel-Rosa, A., Kempen, B., Leenaars, J. G. B., Walsh, M. G., and Gonzalez, M. R.: SoilGrids1km – Global Soil Information Based on Automated Mapping, PLoS ONE, 9, e105992, https://doi.org/10.1371/journal.pone.0105992, 2014.
Herold, C. and Mouton, F.: Global flood hazard mapping using statistical peak flow estimates, Hydrol. Earth Syst. Sci. Discuss., 8, 305–363, https://doi.org/10.5194/hessd-8-305-2011, 2011.
Hirabayashi, Y., Mahendran, R., Koirala, S., Konoshima, L., Yamazaki, D., Watanabe, S., Kim, H., and Kanae, S.: Global flood risk under climate change, Nat. Clim. Change, 3, 816–821, https://doi.org/10.1038/nclimate1911, 2013.
Jacob, D., Petersen, J., Eggert, B., Alias, A., Christensen, O. B., Bouwer, L. M., Braun, A., Colette, A., Déqué, M., Georgievski, G., Georgopoulou, E., Gobiet, A., Menut, L., Nikulin, G., Haensler, A., Hempelmann, N., Jones, C., Keuler, K., Kovats, S., Kröner, N., Kotlarski, S., Kriegsmann, A., Martin, E., van Meijgaard, E., Moseley, C., Pfeifer, S., Preuschmann, S., Radermacher, C., Radtke, K., Rechid, D., Rounsevell, M., Samuelsson, P., Somot, S., Soussana, J.-F., Teichmann, C., Valentini, R., Vautard, R., Weber, B., and Yiou, P.: EURO-CORDEX: new high-resolution climate change projections for European impact research, Reg. Environ. Change, 14, 563–578, https://doi.org/10.1007/s10113-013-0499-2, 2014.
Joe, H.: Dependence Modeling with Copulas, Chapman & Hall/CRC, London, UK, 2014.
Katz, R. W., Parlange, M. B., and Naveau, P.: Statistics of extremes in hydrology, Adv. Water Resour., 25, 1287–1304, https://doi.org/10.1016/S0309-1708(02)00056-8, 2002.
Klein Goldewijk, K., Beusen, A., de Vos, M., and van Drecht, G.: The HYDE 3.1 spatially explicit database of human induced land use change over the past 12 000 years, Global Ecol. Biogeogr., 20, 73–86, https://doi.org/10.1111/j.1466-8238.2010.00587.x, 2011.
Kotlarski, S., Keuler, K., Christensen, O. B., Colette, A., Déqué, M., Gobiet, A., Goergen, K., Jacob, D., Lüthi, D., van Meijgaard, E., Nikulin, G., Schär, C., Teichmann, C., Vautard, R., Warrach-Sagi, K., and Wulfmeyer, V.: Regional climate modeling on European scales: a joint standard evaluation of the EURO-CORDEX RCM ensemble, Geosci. Model Dev., 7, 1297–1333, https://doi.org/10.5194/gmd-7-1297-2014, 2014.
Kottek, M., Grieser, J., Beck, C., Rudolf, B., and Rubel, F.: World Map of the Köppen-Geiger climate classification updated, Meteorol. Z., 15, 259–263, https://doi.org/10.1127/0941-2948/2006/0130, 2006.
Kurowicka, D. and Cooke, R.: Uncertainty analysis with high dimensional dependence modelling, John Wiley & Sons Ltd, Chichester, UK, 2006.
Lehner, B., Liermann, C. R., Revenga, C., Vörösmarty, C., Fekete, B., Crouzet, P., Döll, P., Endejan, M., Frenken, K., Magome, J., Nilsson, C., Robertson, J. C., Rödel, R., Sindorf, N., and Wisser, D.: High resolution mapping of the world's reservoirs and dams for sustainable river flow management, Front. Ecol. Environ., 9, 494–502, https://doi.org/10.1890/100125, 2011.
Meigh, J. R., Farquharson, F. A. K., and Sutcliffe, J. V.: A worldwide comparison of regional flood estimation methods and climate, Hydrol. Sci. J., 42, 225–244, https://doi.org/10.1080/02626669709492022, 1997.
Morales Nápoles, O., Worm, D., van den Haak, P., Hanea, A., Courage, W., and Miraglia, S.: Reader for course: Introduction to Bayesian Networks, TNO-060-DTM-2013-01115, TNO, Delft, the Netherlands, 2013.
Moriasi, D., Arnold, J., Van Liew, M., Binger, R., Harmel, R., and Veith, T.: Model evaluation guidelines for systematic quantification of accuracy in watershed simulations, T. ASABE, 50, 885–900, 2007.
Moss, R. H., Edmonds, J. A., Hibbard, K. A., Manning, M. R., Rose, S. K., van Vuuren, D. P., Carter, T. R., Emori, S., Kainuma, M., Kram, T., Meehl, G. A., Mitchell, J. F. B., Nakicenovic, N., Riahi, K., Smith, S. J., Stouffer, R. J., Thomson, A. M., Weyant, J. P., and Wilbanks, T. J.: The next generation of scenarios for climate change research and assessment, Nature, 463, 747–756, https://doi.org/10.1038/nature08823, 2010.
Mutua, F. M.: The use of the Akaike Information Criterion in the identification of an optimum flood frequency model, Hydrolog. Sci. J., 39, 235–244, https://doi.org/10.1080/02626669409492740, 1994.
Norwegian Water Resources and Energy Directorate: Historiske vannføringsdata til produksjonsplanlegging, available at: https://www.nve.no/hydrologi/hydrologiske-data/historiske-data/historiske-vannfoeringsdata-til-produksjonsplanlegging/ (last access: 27 January 2016), 2015.
Padi, P. T., Baldassarre, G. D., and Castellarin, A.: Floodplain management in Africa: Large scale analysis of flood data, Phys. Chem. Earth, 36, 292–298, https://doi.org/10.1016/j.pce.2011.02.002, 2011.
Panagos, P., Van Liedekerke, M., Jones, A., and Montanarella, L.: European Soil Data Centre: Response to European policy support and public data requirements, Land Use Policy, 29, 329–338, https://doi.org/10.1016/j.landusepol.2011.07.003, 2012.
Paprotny, D. and Morales Nápoles, O.: A Bayesian Network for extreme river discharges in Europe, in: Safety and Reliability of Complex Engineered Systems, edited by: Podofillini, L., Sudret, B., Stojadinovic, B., Zio, E., and Kröger, W., CRC Press/Balkema, Leiden, the Netherlands, 4303–4311, 2015.
Paprotny, D. and Morales Nápoles, O.: Pan-European data sets of river flood probability of occurrence under present and future climate, TU Delft, dataset, https://doi.org/10.4121/uuid:968098ce-afe1-4b21-a509-dedaf9bf4bd5, 2016.
Paprotny, D., Morales-Nápoles, O., and Jonkman, S. N.: Efficient pan-European river flood hazard modelling through a combination of statistical and physical models, Nat. Hazards Earth Syst. Sci. Discuss., https://doi.org/10.5194/nhess-2017-4, in review, 2017.
Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, San Mateo, California, USA, 1988.
Peereboom, I. O., Waag, O. S., and Myhre, M.: Preliminary Flood Risk Assessment in Norway – An example of a methodology based on a GIS-approach, Report no. 7/2011, Norwegian Water Resources and Energy Directorate, Oslo, Norway, 2011.
Rockel, B., Will, A., and Hense, A.: Special issue regional climate modelling with COSMO-CLM (CCLM), Meteorol. Z., 17, 347–348, 2008.
Rojas, R., Feyen, L., Dosio, A., and Bavera, D.: Improving pan-European hydrological simulation of extreme events through statistical bias correction of RCM-driven climate simulations, Hydrol. Earth Syst. Sci., 15, 2599–2620, https://doi.org/10.5194/hess-15-2599-2011, 2011.
Rojas, R., Feyen, L., Bianchi, A., and Dosio, A.: Assessment of future flood hazard in Europe using a large ensemble of bias-corrected regional climate simulations, J. Geophys. Res., 117, D17109, https://doi.org/10.1029/2012JD017461, 2012.
Salinas, J. L., Laaha, G., Rogger, M., Parajka, J., Viglione, A., Sivapalan, M., and Blöschl, G.: Comparative assessment of predictions in ungauged basins – Part 2: Flood and low flow studies, Hydrol. Earth Syst. Sci., 17, 2637–2652, https://doi.org/10.5194/hess-17-2637-2013, 2013.
Sampson, C. C., Smith, A. M., Bates, P. D., Neal, J. C., Alfieri, L., and Freer, J. E.: A high-resolution global flood hazard model, Water Resour. Res., 51, 7358–7381, https://doi.org/10.1002/2015WR016954, 2015.
Sando, S. K.: Techniques for Estimating Peak-Flow Magnitude and Frequency Relations for South Dakota Streams, Water-Resources Investigations Report 98-4055, U.S. Geological Survey, Denver, USA, 1998.
Savenije, H. H. G.: HESS Opinions "Topography driven conceptual modelling (FLEX-Topo)", Hydrol. Earth Syst. Sci., 14, 2681–2692, https://doi.org/10.5194/hess-14-2681-2010, 2010.
Smith, A., Sampson, C., and Bates, P.: Regional flood frequency analysis at the global scale, Water Resour. Res., 51, 539–553, https://doi.org/10.1002/2014WR015814, 2015.
Stach, J. and Fal, B.: Zasady obliczania maksymalnych przepływów prawdopodobnych, Prace Instytutu Badawczego Dróg i Mostów, 34, 91–147, 1986.
Swedish Meteorological and Hydrological Institute: Vattenweb – Mätningar, available at: http://vattenweb.smhi.se/station/, last access: 27 January 2016.
Viewfinder Panoramas: Digital elevation data, available at: http://viewfinderpanoramas.org/dem3.html (last access: 28 January 2016), 2014.
Vogt, J. V., Soille, P., de Jager, A., Rimaviciute, E., Mehl, W., Foisneau, S., Bodis, K., Dusart, J., Paracchini, M. L., Haastrup, P., and Bamps, C.: A pan-European River and Catchment Database, Report EUR 22920 EN, European Commission-Joint Research Centre, Luxembourg, 120 pp., 2007.
Ward, P. J., Jongman, B., Sperna Weiland, F., Bouwman, A., and van Beek, R.: Assessing flood risk at the global scale: model setup, results, and sensitivity, Environ. Res. Lett., 8, 044019, https://doi.org/10.1088/1748-9326/8/4/044019, 2013.
Whitfield, P.: Floods in future climates: a review, J. Flood Risk Manag., 5, 336–365, https://doi.org/10.1111/j.1753-318X.2012.01150.x, 2012.
Winsemius, H. C., Van Beek, L. P. H., Jongman, B., Ward, P. J., and Bouwman, A.: A framework for global river flood risk assessments, Hydrol. Earth Syst. Sci., 17, 1871–1892, https://doi.org/10.5194/hess-17-1871-2013, 2013.
Wrede, S., Seibert, J., and Uhlenbrook, S.: Distributed conceptual modelling in a Swedish lowland catchment: a multi-criteria model assessment, Hydrol. Res., 44, 318–333, https://doi.org/10.2166/Nh.2012.056, 2013.