Frequency Analysis
4 authors, including:
Bernard Bobée
Institut National de la Recherche Scientifique
All content following this page was uploaded by Eghbal Ehsanzadeh on 17 November 2014.
Abstract: Statistical criteria used to evaluate the best distribution fit give large weights to the center of distributions. This is, however, not consistent with the objective of frequency analysis, which is to estimate quantiles with large return periods. In this study, the usefulness of a recently proposed decision support system (DSS), which defines the class of distributions with respect to the tail behavior of the sample data prior to model selection, was investigated using three large hydroclimatic databases [Reference Hydrometric Basin Network (RHBN), precipitation, and UNESCO]. According to the DSS, a considerable majority of RHBN flood sample data belonged to Class C (regularly varying distributions), whereas a slight majority of UNESCO discharge sample data and a large majority of annual precipitation sample data belonged to Class D (subexponential distributions). This difference in classification is attributed to the nature of the studied variables: RHBN sample data represent extreme events with heavy tails (Class C), whereas UNESCO and especially precipitation sample data come from relatively lighter-tailed processes and therefore belong mostly to Class D distributions. The impact of classification on model selection was largest for the RHBN and smallest for the precipitation sample data. This confirms that discriminating between classes of competing models prior to model selection is critical when the sample data come from extreme events. The observed inconsistency in model selection for the RHBN database resulted in an underestimation of quantiles in more than 2/3 of the cases, regardless of the class of distributions. For the UNESCO and precipitation data, however, inappropriate model selection resulted equally in over- and underestimation in Class C, whereas it resulted in underestimation of quantiles in Class D in the majority of observed inconsistencies. It can be concluded that an inappropriate model selection caused by choosing the wrong class of distributions leads, in the majority of cases, to an underestimation of the variable under study, which carries a higher socioeconomic risk than an overestimation of a given quantile. It was also observed that model selection using the Bayesian Information Criterion (compared to the Akaike Information Criterion) is more consistent with the tail behavior of the natural processes.
DOI: 10.1061/(ASCE)HE.1943-5584.0000261
CE Database subject headings: Decision support systems; Frequency analysis; Climatic changes; Hydrology.
Author keywords: Decision support system; AIC; BIC; RHBN; Frequency analysis; Quantile.
Downloaded 19 Oct 2010 to 137.122.27.41. Redistribution subject to ASCE license or copyright. Visithttp://www.ascelibrary.org
events). Following the work of Werner and Upper (2002), five nested classes of distributions were introduced: Class A (stable distributions), Class B (Pareto-type tail), Class C (regularly varying distributions), Class D (subexponential distributions), and Class E (exponential distributions with nonexistent exponential moments). El Adlouni et al. (2008) proposed, in addition, a set of graphical criteria developed in extreme value theory (Embrechts et al. 2003) to select the class of distributions that adequately represents the sample extremes. In a second step, and inside each class, classical tests and criteria can be used to select the most adequate distribution.

The objective of this study is to investigate the usefulness of the proposed approach using three large hydroclimatic databases with considerably long record lengths. To do so, the frequency analysis is performed by adopting the following procedure:
• First, the sample data in the hydroclimatic databases are fitted using a number of probability distribution functions commonly used in hydrology. Conventional criteria such as AIC and BIC are used to identify the statistical distribution that fits the sample data with minimum departure from the true density function of the observations.
• Second, the new approach of data classification with respect to tail properties is used to identify the proper class of distributions, which gives a better estimation of high return period events. Model selection using AIC and BIC is then repeated inside each class of distribution functions, making use of the same models used prior to classification.
• Finally, the fitted models before and after classification (incorporating the DSS) are compared to investigate the level of inconsistency associated with model selection that ignores the class of distributions. The risk of an inappropriate model selection is then investigated by comparing the proportion of over- and underestimated quantiles for a variety of return periods.

It is also important in frequency analysis to identify the parameter estimation method that provides the most accurate results. Indeed, an inefficient parameter estimation method can lead to a poor estimate of the extreme quantile (El Adlouni et al. 2008). In this study, each candidate model is fitted to the sample data using several of the parameter estimation methods available for that model. The number of sample data series fitted by each parameter estimation method is used as a criterion to identify the optimum parameter estimation method.

Data Used in the Study

For the purpose of this study, three databases were used to investigate the suitability of the proposed methodology (based on the DSS) in frequency analysis of hydroclimatic processes. These databases, which contain hydrological as well as meteorological variables collected from different regions of the world, are characterized by considerable diversity in space and time and can therefore serve as good measures for evaluating the performance of the DSS over a wide range of applications (hydrology, climatology, atmosphere, etc.). A brief description of the three databases is presented here; the interested reader is referred to the corresponding report by Ehsanzadeh et al. (2009) for a full description of the sample data and study area.

RHBN Database

One of the databases used in this study is the Reference Hydrometric Basin Network (RHBN). Established in the mid-1990s, the RHBN is a 250-station subset of Canada's national hydrometric network of more than 2,400 active stations, identified by a national group of hydrological experts for use in the detection, monitoring, and assessment of climate change (Harvey et al. 1999; Pilon and Kuylenstierna 2000). The last year of record for the whole network is 2003; however, at some stations the records terminate one or more years before 2003 because of a lack of observations or a large number of missing data. The longest records belong to two hydrometric stations in Alberta (05BB001 and 08MG005), with 92 years of records. Given the nature of this study, and because some statistical tests require a minimum record length, only stations with at least 30 years of observations were retained; that is, 155 RHBN stations were selected for further analysis. The maximum annual flows (floods) were extracted from the mean daily flow observations. Since in most cases the missing data occurred in low-flow periods, which practically do not affect the maximum annual flow, frequency analysis was performed ignoring the missing data.

Precipitation Database

The second database used in this study consists of time series of annual cumulative precipitation at 388 gauging stations around the world, supplied by Pierre Hubert (personal communication, 2005). Most records in this database end in the 1990s, with the last observation recorded in 1997 at Amarillo station (USA). The longest record belongs to a station in Great Britain, with 298 years of observation (1697–1995). Of the 388 stations, 387 have at least 40 years of observations, whereas 30 and 7 stations have at least 150 and 200 years of records, respectively. The nature of the observations in this database differs from that of the RHBN database, as annual cumulative precipitation does not represent extreme values.

UNESCO Discharge Database

The UNESCO sample data, obtained from the UNESCO website (http://portal.unesco.org/en/ev.php-URL_ID=29008&URL_DO=DO_TOPIC&URL_SECTION=201.html), are annual maximum discharges calculated from the mean monthly flows of 28 hydrometric stations around the world. One and three of these stations are located in Asia and Africa, respectively, while the rest are located in Europe (11 stations) and North America (13 stations, of which 11 are in Canada). The observation period for the UNESCO data varies from 55 years for Dire station (Mali) to 178 years for Vanersborg (Sweden), and the majority of UNESCO records end in the 1980s. The average record length in this database is 94 years.

Exploratory Data Analysis

As a first step in performing frequency analysis, the three databases were tested for independent and identically distributed (iid) conditions (stationarity, homogeneity, and independence). The nonparametric Mann-Kendall (MK) test (Mann 1945; Kendall 1975) was used to assess the stationarity of the sample data. A brief description of the MK test is presented in the Appendix. The Wald and Wolfowitz (1943) independence test was used to verify the independence of the observations (see Appendix), and the Wilcoxon rank sum test was used to evaluate the homogeneity of the time series. The latter test, proposed initially by Wilcoxon (1945) for equal sample sizes and extended by Mann and Whitney (1947), is equivalent to the Mann-Whitney U test and is described in more detail in the Appendix.

Out of 155 tested RHBN stations, 114 satisfied the iid conditions (i.e., passed all three tests) at the 5% significance level. Furthermore, 170 of the 388 precipitation time series and 9 of the 28 UNESCO stations passed the iid tests. The resulting databases, consisting of 114, 170, and 9 stations for the RHBN, precipitation, and UNESCO sample data, respectively, were used to perform the frequency analysis described in the rest of this work.

Methodology

Candidate Models and Distribution Selection Criteria

The distribution of hydroclimatic data is not known, but it is assumed that the data come from one of the known distribution functions. The most commonly used probability distribution functions in hydrology are extreme value Type 1 (EV1), also known as Gumbel; extreme value Type 2 (EV2), also known as Frechet; Halphen Type A (HA); Halphen Type B (HB); Halphen inverse Type B (HIB); two-parameter lognormal (LN2); gamma (G); inverse gamma (IG); and log-Pearson Type 3 (LP3). These were therefore selected to perform the frequency analysis in this study.

Goodness-of-fit tests are used to verify whether the data come from an assumed distribution function. However, conventional goodness-of-fit statistics, such as the chi-square and Kolmogorov-Smirnov tests, lack power because of the skewness typically encountered in hydrological sample data, which in turn leads to variability in the estimated design events for given return periods (Mutua 1994). Moreover, goodness-of-fit tests often accept more than one of the competing models. Therefore, the usefulness of conventional goodness-of-fit tests for identifying the optimum model is debatable (Mutua 1994). Several alternatives exist for determining which distribution is the most adequate model for a hydroclimatic process given limited data. Some of the most commonly used alternatives for model selection are the AIC (Akaike 1977), the BIC (Schwarz 1978), and the Hannan-Quinn (HQ) information criterion (Hannan and Quinn 1979). In this study, the AIC and BIC were used to identify the model that fits the observational data with the lowest uncertainty. The AIC is derived by minimizing the Kullback-Leibler distance between the proposed model and the true one and is estimated using the following equation (Akaike 1977):

$$\mathrm{AIC} = -2\log L(\hat{\Theta}) + 2k \tag{1}$$

where $\hat{\Theta}$ = maximum likelihood (ML) estimator of the unknown parameter vector $\Theta$; $k$ = number of unknown parameters; and $L(\hat{\Theta})$ = maximized likelihood. The first term on the right-hand side of Eq. (1) measures the lack of fit of the chosen model, while the second term measures the increased unreliability of the chosen model due to the increased number of model parameters. The best approximating model is the one that achieves the minimum AIC among the competing models. It is well known that the AIC is biased for small sample sizes (Cahill 2003). The AIC has been used in various fields of statistics, engineering, hydrology, and numerical analysis (Akaike 1970, 1976; Otomo et al. 1972; Otsu et al. 1976; Sakamoto and Akaike 1977; Tanabe 1974; Salas et al. 1980).

The BIC, also known as Schwarz's criterion (Schwarz 1978), is an asymptotically optimal method for identifying the best model using only sample estimates (Almpanidis and Kotropoulos 2008). It can be viewed as a penalized ML technique because it imposes a penalty for including too many terms in a regression model in order to avoid overfitting. The BIC is derived from the selection of the most probable model a posteriori when the observations are assumed to come from the generalized exponential family. The BIC is defined as

$$\mathrm{BIC} = -2\log L(\hat{\Theta}) + k\log(n) \tag{2}$$

where $n$ = sample size, and the other notation is as for the AIC. For sufficiently large $n$, the best model for the data is the one that minimizes the BIC (Almpanidis and Kotropoulos 2008). The BIC has become a popular criterion for model selection in recent years. It has, however, some important drawbacks that are not widely recognized. First, Bayes factors depend on prior beliefs about the expected distribution of parameter values, and there is no guarantee that the Bayes factor implied by the BIC will be close to the one calculated from a prior distribution that an observer would actually regard as appropriate. Second, to obtain the Bayes factors that follow from the BIC, investigators would have to vary their prior distributions depending on the marginal distributions of the variables and the nature of the hypothesis. Such variations seem unjustifiable in principle and, in practice, tend to make the BIC favor very simple models (Weakliem 1999).

DSS

The most popular distributions used in hydrology can be classified in nested classes as follows (Werner and Upper 2002; El Adlouni et al. 2008):

$$A \subset B \subset C \subset D \subset E \tag{3}$$

The distributions in Class E (exponential) are characterized by $E(e^{X}) = \infty$. The normal distribution has a lighter tail than Class E distributions; the latter are therefore heavy tailed with respect to the normal distribution. Class D contains the subexponential distributions defined by the following equation (Werner and Upper 2002; Embrechts et al. 2003):

$$\lim_{x \to \infty} \frac{\bar{F}(x)}{e^{-\varepsilon x}} = \infty, \quad \forall\, \varepsilon > 0 \tag{4}$$

where $\bar{F}(x) = 1 - F(x)$ is the probability of exceedance and $F$ = cumulative distribution function. Note that the denominator $e^{-\varepsilon x}$ ($\varepsilon > 0$) represents the tail of the exponential distribution. Thus, Class D contains all distributions with tails that decrease more slowly than any exponential distribution. Class C contains the regularly varying distributions, defined such that

$$\lim_{t \to \infty} \frac{\bar{F}(tx)}{\bar{F}(t)} = x^{-\alpha} \tag{5}$$

This relationship states that, asymptotically, the tail of distributions in Class C (also called regularly varying distributions) declines according to a power function, which decreases more
slowly than that of distributions in Class D. Distributions in Class B (with exact Pareto tails) and in Class A (stable distributions) are not the focus of this study, and therefore no further description is provided for these classes.

A schematic presentation of the different classes/distributions based on their tail lightness/heaviness is shown in Fig. 1. This figure presents subexponential, regularly varying, and stable distributions (upper squares), ordered from light tailed (left) to heavy tailed (right), together with the limiting cases at the class boundaries. The tail of Class C distributions is heavier than that of Class D distributions, which in turn is heavier than that of Class E.

Fig. 1. Distributions commonly used in hydrology, classed with respect to their tail behavior (from El Adlouni et al. 2008)

This classification emphasizes the need to develop techniques to discriminate between Class C of regularly varying distributions and Class D of the other subexponential distributions. The generating mechanisms of the lognormal and power-law distributions are often closely connected; i.e., small effects in the generative process of the lognormal distribution may result in a power-law tail (Champernowne 1953; Mandelbrot 1997, 2003; Turcotte 1997; El Adlouni et al. 2008). In the present work, the approach of classifying distributions with respect to their tails developed by El Adlouni et al. (2008) is used; it identifies the best class of distributions and then, within that class, the probability distribution function that models the process under study with minimum departures from the sample data characteristics. The tools developed in the DSS to identify the most adequate class of distributions are as follows:
1. The log-log plot, used to discriminate between Class C on the one hand and Classes D and E on the other;
2. The mean excess function (MEF), used to discriminate between Classes D and E; and
3. Two statistics, Hill's ratio and the modified Jackson statistic, used for confirmatory analysis of the conclusions drawn from the previous two tools.

A brief description of these tools is presented here; the interested reader is referred to El Adlouni et al. (2008) for further details. The log-log plot is based on the fact that for an exponential tail with mean $\mu$, $\bar{F}(u) = P(X > u) = e^{-u/\mu}$, and for a power-law tail with tail index $\alpha > 1$, $\bar{F}$ is equivalent (for large quantiles) to

$$\bar{F}(u) \approx c\,u^{-\alpha} \tag{6}$$

where $c$ is a positive constant. Hence, for exponential distributions

$$\log[P(X > u)] \approx -u/\mu \tag{7}$$

and for power-law distributions

$$\log[P(X > u)] \approx \log c - \alpha \log u \tag{8}$$

This suggests that, on the log-log plot, the tail probability is represented by a straight line for power-law (regularly varying, Class C) distributions but not for the other subexponential or exponential distributions (Classes D or E).

The MEF method is based on the function

$$e(u) = E[X - u \mid X > u] \tag{9}$$

This function is constant for exponential-tail distributions [$e(u) = \mu$], while for a power law with tail index $\alpha > 1$

$$e(u) = \frac{u}{\alpha - 1} \tag{10}$$

This suggests the following when plotting the empirical value of $e(u)$ against $u$:
• If the plot is linear and the slope is equal to zero, it suggests an exponential type.
• If the plot is linear, the slope is greater than zero, and the intercept is zero, it suggests a subexponential distribution.

The generalized Hill method is an estimation method that can be used to characterize distributions of Class C and thus to discriminate between Classes C and D. Let

$$a_n(x_n) = \frac{\sum_{i=1}^{n} I(X_i > x_n)}{\sum_{i=1}^{n} \log(X_i/x_n)\, I(X_i > x_n)} \tag{11}$$

This method is based on the fact that $a_n$ is a consistent estimator of $\alpha$ if the tail is regularly varying (Class C) with tail index $\alpha$ (Hill 1975). In Eq. (11), $x_n$ is chosen to be large such that $P(X > x_n) \to 0$ and $nP(X > x_n) \to \infty$, and $I$ is the indicator function. In practice, one plots $a_n(x_n)$ as a function of $x_n$ and looks for a stable region from which $a_n(x_n)$ can be taken as an estimator of $\alpha$.

The modified version of the Jackson statistic (Beirlant et al. 2006) used in the DSS for iid random variables $X_1, \ldots, X_n$ is given by

$$T^{*}_{k} = \frac{\frac{1}{k}\sum_{j=1}^{k} C_{k-j+1,k}\, Z_j}{H_{k,n}} \tag{13}$$

where
$$C_{k-j+1,k} = 1 - \log\left(\frac{j+1}{k+1}\right) \tag{16}$$

One plots $T^{*}_{k}$ against $k$ and looks for a stable region [i.e., $T^{*}_{k}$ converges to its mean (2) for large values of $k$]. This method allows the characterization of Class B (Pareto-type tail) and thus the discrimination between distributions of Class C (which asymptotically have the same tail as Class B distributions) and the rest of Class D, i.e., subexponential distributions that do not have a power-law-type tail.

Fig. 2 illustrates the procedure for identifying the class of distributions to which sample data belong using the DSS [incorporated in the Hyfran-Plus software (2008)]. In this algorithm, a log-log plot is first drawn from the sample data to investigate whether they belong to Class C. If the data set is categorized in Class C, a confirmatory analysis is carried out using Hill's ratio and the Jackson statistic. If the tested sample data do not belong to Class C, the MEF method is applied to determine whether the data series belongs to a Class D (subexponential) or Class E (exponential) distribution type. Finally, a confirmatory analysis using Hill's ratio and the Jackson statistic is performed to assess the validity of the decision made with the MEF method. For quantiles of nonexceedance probability $p = 1 - 1/T$ estimated by distributions of Classes C–E, the following relation holds:

$$Q_T(E) < Q_T(D) < Q_T(C) \tag{17}$$

Statistical Analysis

[…] fitted using ML for the RHBN, precipitation, and UNESCO databases, respectively. These findings form the basis for the parameter estimation methods used for the candidate distribution functions throughout this paper. For the HB, HA, LN2, HIB, and IG distributions, ML, which is the optimal method (with minimum variance), was used as the parameter estimation method, and no alternatives were investigated. For the LP3 distribution, no single method among the sundry averages method (SAM), BOB, and Water Resources Council (WRC) parameter estimation methods (Bobée and Ashkar 2001) outperformed the others for all three databases. However, since the BOB method is more commonly used than the other two, BOB was selected as the parameter estimation method for the LP3 distribution in this study. Results of model selection using the selected parameter estimation methods are presented in the following subsections.

RHBN Database

Table 2 presents the number of RHBN sample data series fitted by each distribution function based on AIC and BIC. The ranking of selected models for the RHBN sample data is identical under AIC and BIC. According to this table, G, followed by IG, has the highest rank, and LN2 takes the third rank after these two distributions. HB also ranks higher than HA.

Table 2. Results of Distribution Selection for the RHBN Database Based on AIC and BIC
Rank                            1    2    3    4    5    6    7    8    9
AIC
Distribution function           G    IG   LN2  EV1  EV2  HB   HIB  LP3  HA
Number of fitted sample data    38   28   21   17   6    2    1    1    0
BIC
Distribution function           G    IG   LN2  EV1  EV2  HB   HIB  LP3  HA
Number of fitted sample data    39   29   21   17   5    2    1    0    0

Precipitation Sample Data

Table 3 presents the rank of selected models for the precipitation sample data based on the number of sample data series fitted by each candidate model. Similar to the RHBN sample data, G was selected as the best model for the precipitation sample data, with more than 50% and 60% of the data series modeled using this distribution based on AIC and BIC, respectively. With far fewer fitted data series, LN2 takes the second rank in modeling the precipitation sample data under both AIC and BIC. Unlike for RHBN, a significant number of precipitation sample data are modeled using HB, which takes the third and fourth ranks based on AIC and BIC, respectively. The EV1 distribution takes the fourth and third ranks based on AIC and BIC, respectively. Table 3 also shows that the ranking of the lower-ranked candidate models is not affected by the model selection criterion: IG, EV2, LP3, and HIB take the fifth to eighth ranks under both AIC and BIC, and no sample data are modeled using the HA distribution.

Table 3. Results of Fitting Distributions to Precipitation Database Based on AIC and BIC
Rank                            1    2    3    4    5    6    7    8    9
AIC
Distribution function           G    LN2  HB   EV1  IG   EV2  LP3  HIB  HA
Number of fitted sample data    90   25   18   12   8    9    6    2    0
BIC
Distribution function           G    LN2  EV1  HB   IG   EV2  LP3  HIB  HA
Number of fitted sample data    109  25   12   9    9    4    1    1    0

UNESCO Database

The results of model selection for the UNESCO discharge sample data are presented in Table 4. Although EV1 and LN2 take the first and second ranks in modeling the UNESCO sample data based on both AIC and BIC, the ranking of the lower-ranked models differs between AIC and BIC, perhaps because of the limited number of sample data. For example, LP3 and IG take the third and fourth ranks based on AIC but the fifth and third ranks based on BIC, respectively.

Table 4. Results of Fitting Distributions to UNESCO Database Based on AIC and BIC
Rank                            1    2    3    4    5    6    7    8    9
AIC
Distribution function           EV1  LN2  LP3  IG   EV2  HIB  G    HB   HA
Number of fitted sample data    2    2    2    1    1    1    0    0    0
BIC
Distribution function           EV1  LN2  IG   G    LP3  EV2  HIB  HB   HA
Number of fitted sample data    3    2    2    1    1    0    0    0    0

The results of frequency analysis performed on the three databases show that, regardless of the distribution selection criterion, the most frequently selected distributions (e.g., G and LN2) are two-parameter models. This can be attributed in part to the fact that both AIC and BIC favor two-parameter distributions because of the penalty term for the number of parameters [see Eqs. (1) and (2)]. This may bias the selection toward two-parameter distributions over those with more parameters.

Model Selection Incorporating DSS

Classification of Sample Data

The approach of data classification with respect to tail properties outlined in the DSS section and incorporated in the Hyfran-Plus software (2008) is used to identify the proper class of distributions. For demonstration purposes, the procedure is described for two RHBN sample data series (08MG005 and 05BB001) as case studies.

Case Study. For the log-log plot (cf. Fig. 2), the tail probability is represented by a straight line in the case of a power law (Class C) but not for subexponential or exponential distributions (Class D or E, respectively). To check the linearity of the curve in the log-log diagram, a test on the associated correlation coefficient is applied. If the hypothesis of linearity is rejected,
the software suggests the use of the MEF plot. Figs. 3(a and b) illustrate the log-log plot evaluation of stations 08MG005 and 05BB001, respectively. The observations at station 08MG005 follow a straight line more closely than those at station 05BB001. Therefore, the DSS of the Hyfran-Plus software suggests the use of a Class C distribution (regularly varying distributions) for this station.

For station 05BB001, however, the observations do not follow a straight line, and the software therefore suggests the use of the MEF plot (cf. Fig. 2). This means that the 05BB001 sample data do not belong to Class C and might belong to either the subexponential (Class D) or the exponential (Class E) distributions. If the MEF plot is linear and the slope is equal to zero, it suggests an exponential type; if the plot is linear but the slope is greater than zero and the intercept is zero, it suggests a subexponential distribution. Fig. 3(c) shows that the MEF plot for station 05BB001 is almost linear with a slope greater than zero, which suggests the use of a Class D (subexponential) distribution for this station (cf. Fig. 2), i.e., HA, EV1, G, Pearson Type 3, or HB.

Fig. 3. Log-log plot and MEF plot for stations 08MG005 and 05BB001 (RHBN)

Fig. 4 shows Hill's ratio and Jackson statistic plots for stations 08MG005 and 05BB001. These tools are used in the DSS (El Adlouni et al. 2008) to confirm the choice suggested by the first two diagrams (i.e., whether the distribution belongs to Class C, D, or E). If the curve in Hill's ratio plot converges to a nonzero constant value, the most adequate distribution belongs to Class C. This is the case for station 08MG005 [see Fig. 4(a)], which confirms that its sample data can be modeled more adequately by a Class C distribution. However, if the curve decreases to zero, the distribution is subexponential (Class D) or exponential (Class E). This is the case for station 05BB001 [see Fig. 4(b)], which confirms the decision based on the log-log plot to place its sample data in Class D or E (Fig. 3).

For the Jackson statistic, if the curve clearly converges to 2, the studied distribution belongs to Class C. This is true for station 08MG005 [see Fig. 4(c)], which was already categorized in Class C based on the log-log plot criterion and Hill's ratio. If the curve presents some irregularities in the distribution tail, the subexponential (Class D) or exponential (Class E) class is suggested. This is the case for station 05BB001 [see Fig. 4(d)], which is consistent with the decision, made using the log-log plot and Hill's ratio, to categorize this station in Class D or E.

Fig. 4. Hill ratio plot and Jackson statistic plot for stations 08MG005 and 05BB001

Following the procedure outlined in the case study, the sample data of the three databases were classified into Classes C–E. The results of this classification are presented in Table 5: 67, 35, and 12 of the 114 RHBN sample data series were categorized in Classes C, D, and E, respectively; 16, 150, and 4 of the 170 precipitation data series belonged to Classes C, D, and E, respectively; and applying the DSS to the UNESCO discharge data placed 2, 5, and 2 of the 9 sample data series in Classes C, D, and E, respectively. No further statistical analysis was performed on the stations categorized in Class E in this study. This decision was made because the exponential distribution is not consistent with the underlying mechanisms of the natural processes described in this paper, and the selection of this distribution could be attributed to a variety of contributing factors, including uncertainties associated with measurement errors and distribution selection criteria.

Table 5. Classification of Sample Data in Different Databases Using

Model Selection within Predefined Classes of Sample Data

Model selection based on AIC and BIC was repeated inside the predefined classes using four (LP3, EV2, HIB, and IG) and five (LN2, EV1, HB, HA, and G) distribution functions for Classes C and D, respectively. It is noteworthy that LN2 belongs to neither Class C nor Class D and is considered a limiting case between them [$Q_T(C) > Q_T(\mathrm{LN}) > Q_T(D)$]. That is, the LN tail is lighter than the tail of distributions in Class C and heavier than those in Class D. If the parent distribution is regularly varying (Class C) and the LN2 distribution is used for the fit, the estimated quantile for a fixed return period will be lower than the true value, and there is a risk of underestimating this quantile. Using LN2, on the other hand, may result in an overestimation if the true distribution is in Class D. For the purpose of this study, LN2 was considered a member of Class D because this gives more conservative results in quantile estimation (overestimation and consequently lower risk, but higher costs). The results of model selection for the three databases are presented in the following sections.

RHBN Database. Table 6 presents the results of model selection for RHBN stations inside Class C. IG provides the best fit for the majority of RHBN sample data in Class C (more than 74%) based on both AIC and BIC. After IG, EV2, LP3, and HIB take the second to fourth ranks, respectively.

Table 7 presents the number of sample data in Class D modeled using the candidate distributions in this class. The G distribution provides the best fit for the majority of the stations in Class D (almost 70%) based on AIC and BIC. This table also shows that LN2 takes the second rank among the candidate distributions, whereas EV1 and HB are the next potential models. No RHBN discharge sample data series was modeled using the HA distribution.

Precipitation Database. Precipitation sample data were examined to define the best model for each data series categorized in Classes C and D based on AIC and BIC. The results of model selection for the stations in Class C are presented in Table 8.
DSS
Similar to the RHBN stations, a large majority 共over 69%兲 of the
Class sample data in Class C are modeled using IG distribution based
Database Class C Class D Class E
on both AIC and BIC. This table also shows that LP3, HIB, and
EV2 take the second to the fourth ranks, respectively.
RHBN 67/114 35/114 12/114 Table 9 presents the number of sample data in Class D mod-
Precipitation 16/170 150/170 4/170 eled using each of candidate distribution functions. According to
UNESCO 2/9 5/9 2/9
Table 6. Selected Models for the RHBN Sample Data in Class C Table 8. Selected Models for the Precipitation Sample Data in Class C
Model Model
Criterion IG EV2 LP3 HIB Sum Criterion IG LP3 HIB EV2 Sum
AIC 53 9 3 2 67 AIC 11 2 2 1 16
BIC 56 7 3 1 67 BIC 13 1 1 1 16
Table 7. Selected Models for the RHBN Sample Data in Class D Table 9. Selected Models for the Precipitation Sample Data in Class D
Model Model
Criterion G LN2 EV1 HB HA Sum Criterion G HB LN2 EV1 HA Sum
AIC 25 7 2 1 0 35 AIC 95 25 21 9 0 150
BIC 25 7 2 1 0 35 BIC 111 9 21 9 0 150
Downloaded 19 Oct 2010 to 137.122.27.41. Redistribution subject to ASCE license or copyright. Visithttp://www.ascelibrary.org
Table 10. Selected Models for the UNESCO Sample Data in Class C Table 12. Comparison of Estimated Quantiles for the RHBN Database
before and after Classification by DSS
Model
Quantile
Criterion IG HIB EV2 LP3 Sum
Variable Q200 Q100 Q50 Q10
AIC 2 0 0 0 2
BIC 1 1 0 0 2 AIC
Total Discrepancies 34 34 34 34
Overestimation 11 10 10 8
this table, 58 and 70% of precipitation sample data in Class D Underestimation 23 24 24 26
were modeled using G distribution based on AIC and BIC, re-
spectively. The second and the third selected distributions in this
Class C Discrepancies 33 33 33 33
class are HB 共LN2兲 and LN2 共HB兲 based on AIC 共BIC兲, respec-
Overestimation 10 10 10 8
tively. While EV1 takes the fourth rank based on both AIC and
Underestimation 23 23 23 25
BIC, no station is modeled using HA in this class.
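The two graphical checks applied above to stations 08MG005 and 05BB001 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Hyfran-Plus DSS code: the log-log plot is taken here as ln of the empirical exceedance probability (Weibull plotting position i/(n + 1)) against ln x, the MEF as the empirical mean excess e(u) = mean(x − u | x > u) with the order statistics as thresholds, and straightness is judged by an ordinary least-squares fit. All function and variable names are hypothetical.

```python
import math

# Minimal sketch of the log-log and MEF tail checks (not the Hyfran-Plus DSS code).

def linear_fit(xs, ys):
    """Least-squares slope, intercept and R^2 of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r2 = sxy * sxy / (sxx * syy) if syy > 0 else 1.0
    return slope, my - slope * mx, r2

def log_log_points(sample):
    """(ln x, ln P(X > x)) with exceedance from Weibull positions i/(n + 1).
    A regularly varying (Class C) tail plots as a straight line."""
    xs = sorted(sample)
    n = len(xs)
    return [(math.log(x), math.log(1.0 - (i + 1) / (n + 1))) for i, x in enumerate(xs)]

def mean_excess_points(sample):
    """(u, e(u)) with e(u) = mean(x - u | x > u), using order statistics as thresholds.
    Linear with zero slope suggests Class E; positive slope and zero intercept, Class D."""
    xs = sorted(sample)
    pts = []
    for k, u in enumerate(xs[:-1]):
        excess = [x - u for x in xs[k + 1:] if x > u]
        if excess:
            pts.append((u, sum(excess) / len(excess)))
    return pts

# Synthetic samples built from exact quantiles at plotting positions i/(n + 1).
PARETO = [(1 - (i + 1) / 101) ** -0.5 for i in range(100)]   # Pareto tail, alpha = 2
EXPO = [-math.log(1 - (i + 1) / 201) for i in range(200)]      # unit-mean exponential

if __name__ == "__main__":
    slope, _, r2 = linear_fit(*zip(*log_log_points(PARETO)))
    print(f"Pareto log-log: slope {slope:.3f}, R^2 {r2:.6f}")
    print(f"Exponential log-log R^2: {linear_fit(*zip(*log_log_points(EXPO)))[2]:.3f}")
    print(f"Exponential e(u) at the lowest threshold: {mean_excess_points(EXPO)[0][1]:.3f}")
```

With these synthetic samples, the Pareto tail (regularly varying, Class C) plots as an exact straight line of slope −2 on the log-log plot, whereas the exponential sample (Class E) gives a curved log-log plot and an MEF that starts close to its theoretical constant value of 1.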
Table 13. Comparison of Estimated Quantiles for Precipitation Sample Data before and after Classification by DSS

Variable                 Q200   Q100   Q50   Q10
AIC
  Total discrepancies     29     29     29    29
  Overestimation           7      8      8    16
  Underestimation         22     21     21    13
BIC
  Total discrepancies     19     19     19    19
  Overestimation           6      6      6     6
  Underestimation         13     13     13    13

Table 14. Comparison of Estimated Quantiles Using the Two Approaches for the UNESCO Sample Data

Variable                 Q200   Q100   Q50   Q10
AIC
  Total discrepancies      4      4      4     4
  Overestimation           1      1      1     1
  Underestimation          3      3      3     3
BIC
  Total discrepancies      2      2      2     2
  Overestimation           1      1      1     0
  Underestimation          1      1      1     2

Precipitation Sample Data

Selected models and estimated quantiles for precipitation sample data with and without the DSS are compared in Table 13. In 29 out of 166 precipitation stations (17%), the selected models were different when the DSS was used to classify the stations before applying AIC. Moreover, the badly chosen models in 7 (22), 8 (21), 8 (21), and 16 (13) of these 29 stations resulted in over- (under-) estimation of Q200, Q100, Q50, and Q10, respectively. These figures indicate that badly chosen models affect the over/underestimation of the corresponding quantiles quite differently. For example, while inappropriate model selection resulted in overestimation of Q10 in more than 55% of cases, it had the opposite impact for longer return periods, leading to underestimation of Q200 in 76% of cases. Table 13 also shows that 10 (19) of the 29 observed inconsistencies belonged to sample data categorized in Class C (D). Regardless of the quantile under study, poorly chosen models in Class C resulted in over- (under-) estimation in 50% of cases. In Class D, the inadequately selected models resulted in over- (under-) estimation at 2 (17), 3 (16), 3 (16), and 11 (8) stations for Q200, Q100, Q50, and Q10, respectively. The variability in the rates of over/underestimation across quantiles is similar to that observed for the whole database irrespective of the class of distributions.

The discrepancies in model selection based on BIC before and after classification are smaller than those based on AIC: in 19 out of 166 stations (11%), the models selected using the two approaches were not in agreement. Based on this criterion, in 6 (13) of the 19 observed inconsistencies the improper model selection led to over- (under-) estimation of an estimated quantile. A (roughly) similar percentage of poorly selected models belonged to Classes C and D, where the observed inconsistency results in over- (under-) estimation of a quantile in 40% (60%) and 22% (78%) of cases for the corresponding classes, respectively.

UNESCO Sample Data

Based on the models selected using AIC/BIC before and after classification, the quantiles for different return periods were estimated for the UNESCO sample data, and the results are compared in Table 14. According to this table, based on AIC, the models selected before and after classification were not in agreement in 4 out of 7 stations (57%). Improper model selection resulted in over- and underestimation of quantiles in 1 (25%) and 3 (75%) of these stations, respectively. This table also shows that 1 of the 4 inadequately selected models belonged to Class C and the rest (three models) belonged to Class D. The inconsistency in the model selected in Class C resulted in overestimation of the estimated quantiles except for Q10. In Class D, however, poor model selection resulted in underestimation of the estimated quantiles except for Q10, where it resulted in underestimation in two cases and overestimation in one.

Based on BIC, two of the UNESCO stations (28%) were modeled using different distributions when the DSS was used prior to fitting the candidate models. Improper model selection led equally to over- and underestimation of quantiles except for Q10, where it led to underestimation in both cases. The inadequately selected models belonged equally to Classes C and D; however, the poorly chosen model in Class C resulted in overestimation of all quantiles, whereas that in Class D led to underestimation of all quantiles.
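The percentages quoted for the precipitation database follow directly from the AIC counts in Table 13 together with the statement that the 10 Class C discrepancies split evenly between over- and underestimation; the Class D counts are then the totals minus the Class C counts. A small bookkeeping check, with constants copied from the text and Table 13 (the variable and function names are illustrative only):

```python
# Bookkeeping check for the precipitation database (AIC), with counts from Table 13.
QUANTILES = ("Q200", "Q100", "Q50", "Q10")
TOTAL_OVER = {"Q200": 7, "Q100": 8, "Q50": 8, "Q10": 16}
TOTAL_UNDER = {"Q200": 22, "Q100": 21, "Q50": 21, "Q10": 13}
STATIONS = 166                      # precipitation stations in Classes C and D (Table 5)
CLASS_C_OVER = CLASS_C_UNDER = 5    # 10 Class C discrepancies split 50/50 (text)

def shares(over, under):
    """Percentages of over- and underestimation among the discrepancies."""
    total = over + under
    return 100.0 * over / total, 100.0 * under / total

def class_d_counts(q):
    """Class D (over, under) counts = totals minus the Class C counts."""
    return TOTAL_OVER[q] - CLASS_C_OVER, TOTAL_UNDER[q] - CLASS_C_UNDER

if __name__ == "__main__":
    n_disc = TOTAL_OVER["Q10"] + TOTAL_UNDER["Q10"]
    print(f"discrepancies: {n_disc}/{STATIONS} = {100.0 * n_disc / STATIONS:.1f}%")
    for q in QUANTILES:
        over_pct, under_pct = shares(TOTAL_OVER[q], TOTAL_UNDER[q])
        print(f"{q}: over {over_pct:.1f}%, under {under_pct:.1f}%, "
              f"Class D (over, under) = {class_d_counts(q)}")
```

Each Class D pair sums to 19, matching the number of Class D inconsistencies reported for AIC.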
Discussion

Comparison of Inconsistencies in Model Selection/Quantile Estimation for Different Databases

A comparison of the results obtained for the different databases, in terms of the impact of classification on model selection and quantile estimation and the ultimate effect of inconsistencies on over/underestimation of quantiles, shows some similarities as well as dissimilarities. Based on AIC and BIC, overall 33% of RHBN data series were modeled using a different distribution function when classification using the DSS was performed prior to model selection. The observed discrepancies for the precipitation database were 17 and 11% based on the AIC and BIC model selection criteria, respectively. For the UNESCO sample data, the models selected before and after classification differed in 57 and 28% of cases using AIC and BIC, respectively. A comparison of the percentages of discrepancies for the different databases reveals that classification of time series based on tail behavior prior to model selection had the largest impact on model selection in the UNESCO and RHBN databases, respectively, whereas it had a lower impact on model selection for the precipitation time series. It is noteworthy that, due to the small number of sample data in the UNESCO database, the observed percentage of inconsistency in model selection is subject to considerable uncertainty. However, it is evident that incorporating the DSS affected model selection the least for the precipitation database. This suggests that discriminating between the classes of distributions using the DSS is of substantial importance and may make a great difference when it comes to the tail behavior of rare events (extremes). It is not surprising, therefore, that the DSS is most effective when dealing with the RHBN data series, which are extreme annual discharges recorded at the corresponding hydrometric stations. This is also true, though with lower certainty, for the UNESCO sample data, which are annual maxima but obtained from monthly mean discharges. It should also be noted that the precipitation sample data have the longest records of the three databases. This may play an important role in the smaller inconsistencies between the two approaches for the precipitation database, as AIC and BIC perform much better with longer samples. This again underlines the usefulness of the DSS when dealing with small samples, which are frequently encountered in hydrology.

A great majority (97%) of the observed inconsistencies for the RHBN data were due to inadequate model selection by AIC/BIC inside Class C. For the precipitation data, however, 34 and 66% of badly selected models belonged to Classes C and D, respectively, based on AIC. Based on BIC, 53 and 47% of poorly chosen models belonged to Classes C and D, respectively. For the UNESCO time series, 25 and 75% of the observed discrepancies belonged to Classes C and D, respectively, based on AIC, whereas based on BIC the inconsistencies for this database were split equally between Classes C and D. It can be hypothesized that the DSS is most effective in Class C of observations (heavy-tailed sample data) when dealing with extremes (e.g., RHBN data), regardless of the model selection criterion. For relatively light-tailed sample data (e.g., precipitation or UNESCO observations), on the other hand, using the DSS is more efficient for Class D of observations. Another issue that draws attention is the impact of the model selection criterion. In the case of the RHBN data series, where discrepancies belong mostly to Class C, there is almost no difference in discrepancies based on AIC or BIC. Nevertheless, the model selection criterion plays a significant role in the percentage of discrepancies in general, and inside each class in particular, when the sample data come from relatively light-tailed distributions (e.g., the precipitation and UNESCO databases). That is, the percentage of total inconsistency is significantly smaller when model selection is performed using BIC rather than AIC.

For the RHBN data, where discrepancies were mostly observed in Class C of distributions, poorly selected models resulted in underestimation in 3/4–2/3 of the observed inconsistencies for Q10–Q200, respectively. There is no relationship between the model selection criterion (AIC/BIC) and the percentage of over/underestimation. For the precipitation database, however, the impact of inadequate model selection varies depending on the model selection criterion and the class of observations. For example, there is no significant difference in the percentage of over/underestimation for the discrepancies observed in Class C using AIC or BIC. Further, there is no link between the percentage of over/underestimation and the quantile under study in this class. For the model selection inconsistencies observed in Class D, however, the percentage of over/underestimation varies significantly with the quantile and the model selection criterion. The same arguments (made for the precipitation data) apply more or less to the UNESCO data as well.

One should note that the average record length for the RHBN data series is 40 years, whereas the average observation periods for the precipitation and UNESCO time series are 100 and 80 years, respectively. As a consequence, the performance of the selected models needs to be judged on appropriate quantiles. For example, while Q10 might be an appropriate measure to evaluate the adequacy of a selected model for the RHBN database, it does not seem to be a robust measure of the impact of improper model selection for the precipitation and UNESCO databases. On the other hand, assessing the impact of wrongly selected models for the RHBN database using Q200 may not be appropriate, as all record periods in this database are shorter than 100 years.

Conclusion and Recommendations

Irrespective of the model selection criterion, classification of the sample data based on tail behavior prior to model selection using the DSS had the largest impact for the RHBN and the smallest impact for the precipitation sample data. This finding confirms that discriminating between classes of competing models prior to model selection is critical when the sample data come from extreme events with relatively short record lengths. While 97% of the discrepancies between the models selected for RHBN data using the two approaches belonged to sample data categorized in Class C of distributions, the percentages of discrepancy observed in Classes C and D for the precipitation and UNESCO databases vary depending on the model selection criterion.

Observed inconsistency in model selection for the RHBN database resulted in an underestimation of quantiles in more than 2/3 of the cases regardless of return period, class of distributions, and model selection criterion. This was not the case for the precipitation and UNESCO data, where the percentage of over/underestimation varies depending on the class of distributions, quantile, and model selection criterion; i.e., inappropriate model selection resulted in over- and underestimation equally in Class C, whereas it resulted in underestimation of quantiles in Class D in the majority of observed inconsistencies. According to these findings, an inappropriate model selection due to choosing a wrong class of distributions leads, in the majority of cases, to an underestimation of the
quantity of the variable under study for a defined return period, which is associated with a higher socioeconomic risk than an overestimation of a specific quantile.

Another conclusion that can be drawn from this study is that model selection using BIC is more consistent with the tail behavior of natural processes, as the percentage of discrepancy between the models selected using this criterion before and after classification is smaller than that of AIC.

One should note that LN2 is neither a member of Class C nor of Class D. In this study, however, it was considered in Class D for the reasons discussed in the previous sections (see the Model Selection within Predefined Classes of Sample Data section). In fact, there is so far no methodology to discriminate between distributions in Classes C and D on the one hand and LN2 on the other. An interesting extension of this study (research in progress) is to develop a methodology to identify and separate the sample data for which LN2 is the underlying distribution prior to classifying the sample data in Classes C and D. We intend to incorporate the results of this work in the DSS of HYFRAN-PLUS.

Appendix

Test for Stationarity

The nonparametric MK statistical test (Mann 1945; Kendall 1975) is used to assess the stationarity of sample data in this study. The main reason for using nonparametric statistical tests is that no assumption is needed on the normality of the tested data set, which is important when studying hydrometeorological time series. The MK test is preferred to other nonparametric tests because of its unbiased estimation of population parameters. The null and alternative hypotheses of the MK test are as follows:

…

$$S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \operatorname{sgn}(x_j - x_i) \tag{19}$$

where $x_j$ and $x_i$ are the data values in years $j$ and $i$, respectively, with $j > i$, and $\operatorname{sgn}(x_j - x_i)$ is the sign function:

$$\operatorname{sgn}(x_j - x_i) = \begin{cases} 1 & \text{if } x_j - x_i > 0 \\ 0 & \text{if } x_j - x_i = 0 \\ -1 & \text{if } x_j - x_i < 0 \end{cases} \tag{20}$$

The distribution of the $S$ statistic can be approximated by a normal distribution for large $n$, with mean $\mu_s$ and standard deviation $\sigma_s$ given by

$$\mu_s = 0 \tag{21}$$

$$\sigma_s = \sqrt{\frac{n(n-1)(2n+5) - \sum_{i=1}^{m} t_i\, i\,(i-1)(2i+5)}{18}} \tag{22}$$

Eq. (22) estimates the standard deviation of the $S$ statistic with the correction for ties in the data ($t_i$ denotes the number of ties of extent $i$). For $n$ larger than 10, the standard normal test statistic $Z_s$ for hypothesis testing is

$$Z_s = \begin{cases} \dfrac{S-1}{\sigma_s} & \text{if } S > 0 \\[4pt] 0 & \text{if } S = 0 \\[4pt] \dfrac{S+1}{\sigma_s} & \text{if } S < 0 \end{cases} \tag{23}$$

$Z_s$ has a standard normal distribution. Local (at-site) significance levels (p values) for each trend test can be obtained as follows (Douglas et al. 2000):

$$p = 2\left[1 - \Phi(|Z_s|)\right] \tag{24}$$

where $\Phi$ is the standard normal cumulative distribution function:

$$\Phi(|Z_s|) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{|Z_s|} e^{-t^2/2}\, dt \tag{25}$$

If the p value is large enough, the sample data are considered to be stationary. At the significance level of 0.05, if p > 0.05, the time series is assumed to be statistically stationary.

Test for Independence

The Wald and Wolfowitz (1943) independence test is used to verify the independence of a data set. For the data set $x_1, x_2, \ldots, x_N$, the statistic $R$ is calculated from

$$R = \sum_{i=1}^{N-1} x_i x_{i+1} + x_1 x_N \tag{26}$$

When the elements of the sample data are independent, $R$ follows a normal distribution with mean and variance given by

…

where $s_r = N m'_r$; $N$ = sample size; and $m'_r$ = $r$th moment of the sample about the origin. The statistic $u = (R - \bar{R})/[\operatorname{var}(R)]^{1/2}$ is approximately normally distributed with mean zero and variance unity and is used to test the hypothesis of independence at significance level $\alpha$ by comparing $u$ with the standard normal variate $u_{\alpha/2}$ corresponding to a probability of exceedance of $\alpha/2$.

Test for Homogeneity

The Wilcoxon rank sum test was used in this study to evaluate the homogeneity of the time series. This test, proposed initially by Wilcoxon (1945) for equal sample sizes and extended by Mann and Whitney (1947), is equivalent to the Mann-Whitney U test, and it is virtually identical to performing an ordinary parametric two-sample t test on the data after ranking over the combined samples. The test is used to assess whether two samples of observations come from the same distribution. The null hypothesis is that the two samples are drawn from a single population, and therefore their probability distributions are identical. The test requires the two samples to be independent and the observations to be ordinal or continuous measurements. This formulation requires
the additional assumption that the distributions of the two populations are identical except for possibly a shift [i.e., $f(x) = f(y + \delta)$]. The Mann-Whitney U statistic for x and y sample data is defined as

$$U = n_1 n_2 + \frac{n_2(n_2+1)}{2} - \sum_{i=n_1+1}^{n} R_i \tag{29}$$

where $n_1$ and $n_2$ are the sample sizes, $n = n_1 + n_2$, and $R_i$ is the rank of observation $i$ in the combined sample. U can be interpreted as the number of times observations in one sample precede observations in the other sample in the ranking. In most circumstances a two-sided test is required; here the alternative hypothesis is that x values tend to be distributed differently from y values. For a lower-side test, the alternative hypothesis is that x values tend to be smaller than y values. For an upper-side test, the alternative hypothesis is that x values tend to be larger than y values. The Mann-Whitney test statistic (U) is compared to a table of critical values for U based on the sample size of each group. If U is more extreme than the tabulated critical value at the chosen significance level (usually 0.05; with the common tables of lower critical values, this means U is less than or equal to the tabulated value), there is evidence to reject the null hypothesis of homogeneity in favor of the alternative hypothesis.

References

Akaike, H. (1970). “Statistical predictor identification.” Ann. Statis. Math., 22, 203–217.
Akaike, H. (1976). “Canonical correlation analysis of time series and the use of an information criterion.” Systems identification, R. K. Mehra and D. G. Lainiotis, eds., Academic, New York, 27–96.
Akaike, H. (1977). “On entropy maximisation principle.” Proc., Symp. on Applications of Statistics, P. R. Krishnaiah, ed., Amsterdam, The Netherlands, 27–47.
Almpanidis, G., and Kotropoulos, C. (2008). “Phonemic segmentation using the generalised Gamma distribution and small sample Bayesian information criterion.” Speech Commun., 50(1), 38–55.
Beirlant, J., de Wet, T., and Goegebeur, Y. (2006). “A goodness-of-fit statistic for Pareto-type behavior.” J. Comput. Appl. Math., 186, 99–116.
Bobée, B., and Ashkar, F. (2001). The Gamma family and derived distributions applied in hydrology, Water Resources, Littleton, Colo.
Cahill, A. T. (2003). “Significance of AIC differences for precipitation intensity distributions.” Adv. Water Resour., 26, 457–464.
Champernowne, D. (1953). “A model of income distribution.” Econom. J., 63, 318–351.
Douglas, E. M., Vogel, R. M., and Kroll, C. N. (2000). “Trends in floods and low flows in the United States: Impact of spatial correlation.” J. Hydrol., 240, 90–105.
Ehsanzadeh, E., El Adlouni, S., and Bobée, B. (2009). “Frequency analysis incorporating a decision support system (DSS).” Internal Rep., Hydro-Quebec, Quebec.
El Adlouni, S., Bobée, B., and Ouarda, T. B. M. J. (2008). “On the tails of extreme event distributions in hydrology.” J. Hydrol., 355(1–4), 16–33.
Embrechts, P., Klüppelberg, C., and Mikosch, T. (2003). Modelling extremal events for insurance and finance, applications of mathematics, Vol. 33, Springer, New York.
Hannan, E. J., and Quinn, B. G. (1979). “The determination of the order of an autoregression.” J. R. Stat. Soc. Ser. B (Methodol.), 41, 190–195.
Harvey, K. D., Pilon, P. J., and Yuzyk, T. R. (1999). “Canada’s reference hydrometric basin network (RHBN). Partnerships in water resources management.” Proc., CWRA 51st Annual Conf., Nova Scotia, Canada.
Hill, B. M. (1975). “A simple general approach to inference about the tail of a distribution.” Ann. Stat., 3, 1163–1174.
Hyfran-Plus Software. (2008). INRS, Quebec, <http://www.wrpllc.com/books/hyfran.html> (March 2009).
Kendall, M. G. (1975). Rank correlation methods, Griffin, London.
Mandelbrot, B. (1997). Fractales, Hasard et Finances, Flammarion, Coll. Champs, Paris.
Mandelbrot, B. (2003). “Multifractal power law distributions: Negative and critical dimensions and other ‘anomalies,’ explained by a simple example.” J. Stat. Phys., 110(3–6), 739–774.
Mann, H. B. (1945). “Nonparametric tests against trend.” Econometrica, 13, 245–259.
Mann, H. B., and Whitney, D. R. (1947). “On a test of whether one of two random variables is stochastically larger than the other.” Ann. Math. Stat., 18, 50–60.
Mutua, F. M. (1994). “The use of the Akaike information criterion in the identification of an optimum flood frequency model.” Hydrol. Sci. J., 39, 235–244.
Otomo, T., Nakagawa, T., and Akaike, H. (1972). “Statistical approach to computer control of cement rotary kiln.” Automatica, 8, 35–48.
Otsu, K., Horigome, M., and Kitagawa, G. (1976). “On the prediction and stochastic control of ship’s motion.” Proc., 2nd IFAC/IFAP Symp., M. Pitkin, J. J. Roche, and T. J. Williams, eds., Washington, D.C.
Pilon, P. J., and Kuylenstierna, J. K. (2000). “Pristine river basins and relevant hydrological indices: Essential ingredients for climate-change studies.” WMO Bulletin, 49(3), 248–255.
Sakamoto, Y., and Akaike, H. (1977). “Analysis of cross-classified data by AIC.” Ann. Inst. Stat. Math., 30B, 1–30.
Salas, J. P., Delleur, J. W., Yevjevich, V., and Lane, W. L. (1980). Applied modeling of hydrology time series, Water Resources, Fort Collins, Colo.
Schwarz, G. (1978). “Estimating the dimension of a model.” Ann. Stat., 6, 461–464.
Tanabe, K. (1974). “Statistical regularisation of a noisy ill-conditioned system of linear equations by Akaike information criterion.” Research Memo No. 60, The Institute of Statistical Mathematics, Tokyo.
Turcotte, D. L. (1997). Fractals and chaos in geology and geophysics, 2nd Ed., Cambridge University Press, Cambridge, MA.
Wald, A., and Wolfowitz, J. (1943). “An exact test for randomness in the nonparametric case based on serial correlation.” Ann. Math. Stat., 14, 378–388.
Weakliem, D. (1999). “A critique of the Bayesian information criterion for model selection.” Sociolog. Methods Res., 27(3), 359–397.
Werner, T., and Upper, C. (2002). “Time variation in the tail behaviour of bund futures returns.” Working Paper No. 199, European Central Bank, Frankfurt, Germany.
Wilcoxon, F. (1945). “Individual comparisons by ranking methods.” Biometrics, 1, 80–83.
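The stationarity test of the Appendix, Eqs. (19)–(25), can be sketched in a few lines of Python. This is a minimal illustration, not the HYFRAN-PLUS implementation: tie groups are formed by exact equality of values, $\Phi$ is evaluated as the standard normal cumulative distribution function through `math.erf` (so that Eq. (24) yields a two-sided p value), and the function names are hypothetical.

```python
import math

def mann_kendall(xs):
    """Mann-Kendall trend test; returns (S, sigma_s, Z_s, p) as in Eqs. (19)-(25)."""
    n = len(xs)
    # Eqs. (19)-(20): S = sum over all pairs i < j of sgn(x_j - x_i)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = xs[j] - xs[i]
            s += (d > 0) - (d < 0)
    # Eq. (22): each tie group of extent e contributes e(e - 1)(2e + 5)
    extents = {}
    for x in xs:
        extents[x] = extents.get(x, 0) + 1
    tie_term = sum(e * (e - 1) * (2 * e + 5) for e in extents.values() if e > 1)
    sigma = math.sqrt((n * (n - 1) * (2 * n + 5) - tie_term) / 18.0)
    # Eq. (23): continuity-corrected standard normal statistic (intended for n > 10)
    if s > 0:
        z = (s - 1) / sigma
    elif s < 0:
        z = (s + 1) / sigma
    else:
        z = 0.0
    # Eqs. (24)-(25): Phi taken as the standard normal CDF, giving a two-sided p value
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return s, sigma, z, 2.0 * (1.0 - phi)

def is_stationary(xs, alpha=0.05):
    """Stationarity is retained when p > alpha (0.05 in the text)."""
    return mann_kendall(xs)[3] > alpha

if __name__ == "__main__":
    s, sigma, z, p = mann_kendall(list(range(1, 13)))   # strictly increasing series
    print(f"S={s} sigma_s={sigma:.3f} Z_s={z:.3f} p={p:.2e}")
```

A strictly increasing series of 12 values gives S = 66 and a p value far below 0.05, so it is flagged as nonstationary; a series with S = 0 or |S| = 1 gives Z_s = 0 and p = 1.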
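The independence test of the Appendix can be sketched in the same way. The mean and variance equations for R do not survive in this text, so the sketch assumes the standard permutation moments written with $s_r = N m'_r = \sum x_i^r$: $E(R) = (s_1^2 - s_2)/(N-1)$ and $\operatorname{var}(R) = (s_2^2 - s_4)/(N-1) - E(R)^2 + (s_1^4 - 4s_1^2 s_2 + 4 s_1 s_3 + s_2^2 - 2 s_4)/[(N-1)(N-2)]$. A brute-force enumeration over all orderings of a small sample is included to check these assumed moments. Function names, the sample values, and the default α = 0.05 are illustrative.

```python
import itertools
import math
from statistics import NormalDist

def wald_wolfowitz(xs):
    """Return (R, E(R), var(R), u) for xs; N >= 4 and a non-constant series assumed."""
    n = len(xs)
    # Eq. (26): lag-one products plus the wrap-around term x_1 * x_N
    r = sum(xs[i] * xs[i + 1] for i in range(n - 1)) + xs[0] * xs[-1]
    # s_r = N m'_r, i.e., the raw power sums of the sample
    s1, s2, s3, s4 = (sum(x ** k for x in xs) for k in (1, 2, 3, 4))
    # Assumed standard permutation moments of R (the text's equations are missing)
    mean_r = (s1 ** 2 - s2) / (n - 1)
    var_r = ((s2 ** 2 - s4) / (n - 1) - mean_r ** 2
             + (s1 ** 4 - 4 * s1 ** 2 * s2 + 4 * s1 * s3 + s2 ** 2 - 2 * s4)
             / ((n - 1) * (n - 2)))
    u = (r - mean_r) / math.sqrt(var_r)
    return r, mean_r, var_r, u

def is_independent(xs, alpha=0.05):
    """Two-sided test: independence is retained when |u| < u_{alpha/2}."""
    return abs(wald_wolfowitz(xs)[3]) < NormalDist().inv_cdf(1 - alpha / 2)

def permutation_moments(xs):
    """Brute-force mean and variance of R over all orderings (small N only)."""
    rs = [wald_wolfowitz(list(p))[0] for p in itertools.permutations(xs)]
    mean = sum(rs) / len(rs)
    return mean, sum((r - mean) ** 2 for r in rs) / len(rs)

SAMPLE = [3.0, 1.0, 4.0, 1.5, 5.0]   # illustrative values only

if __name__ == "__main__":
    r, mean_r, var_r, u = wald_wolfowitz(SAMPLE)
    print(f"R={r:.2f}  E(R)={mean_r:.4f}  var(R)={var_r:.4f}  u={u:.3f}")
    print("exact permutation moments:", permutation_moments(SAMPLE))
    print("independent at 5%:", is_independent(SAMPLE))
```

For small samples the enumeration reproduces the closed-form moments exactly, which makes it a convenient check; for records of 40 to 100 years only the closed-form moments are practical.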