Statistical Modeling of Extreme Values PDF

This document provides an introduction to statistical modeling of extreme values and its application to the calculation of extreme wind speeds. It discusses (1) how extreme value theory can be used to estimate rare events, such as the highest wind speeds or sea levels over long time periods, from shorter observational records, and (2) the three classical extreme value distributions (Gumbel, Fréchet and Weibull) that describe the possible limits for normalized maximum values. It also introduces (3) the generalized extreme value distribution, which unifies the three classical distributions into a single model to facilitate the statistical analysis of extremes.


An introduction to statistical modeling of extreme values

An introduction to statistical modelling of extreme values. Application to the calculation of extreme wind speeds.

Fermín Mallor and Eulalia Nualart. Public University of Navarre.

Edward Omey. Hogeschool Universiteit Brussel.

1. INTRODUCTION

High wind speeds pose a threat to the integrity of structures such as wind turbines. An accurate estimation of the occurrence of extreme wind speeds is an important factor in achieving a correct balance between safety and the cost of “over-design”. This design problem also arises in many other engineering areas, such as ocean engineering (wave height), hydraulic engineering (floods), structural engineering (earthquakes), meteorology (temperatures, rainfall, etc.) and fatigue strength (workloads). What all these applications have in common is that the interest lies not in the average behaviour of the phenomena analysed but in their extreme behaviour. Thus, the distinguishing feature of an extreme value analysis is that its objective is to describe not the usual behaviour of a stochastic phenomenon but its unusual, rarely observed events.

For example, suppose that a sea-wall is going to be built to protect the coast against all sea-levels that are likely to occur within its projected life span (for example, 100 years). Accurate estimation of the highest sea-level in 100 years is necessary in order to balance economic and safety goals. The problem that statistical methods face is that sea-level records may span much shorter periods, say 15 years. The challenge is to estimate what sea-levels might occur over the next 100 years given 15 years of data.


Motivation of extreme wind speed analysis: the calculation of Vref

Extreme wind speed estimates are used to determine the critical design loads that a turbine must withstand during its lifetime. According to the International Standard IEC 61400-1 for Wind Turbine Generator Systems, the extreme wind speed Vref is a basic parameter for wind turbine classes and is therefore strongly related to the design of wind turbines. Vref is defined as the extreme 10-min average wind speed with a recurrence period of 50 years. In general, Vref has to be determined statistically on the basis of on-site measurements.

Figure 2: Classification of wind turbine generators according to Vref

The statistical theory developed to deal with these problems and this type of data is known as Extreme Value Theory. The presentation of its main results and its application to the analysis of extreme wind speeds are the two main purposes of this monograph.
http://www.youtube.com/watch?v=oAWMpxX60KM&feature=player_embedded
http://www.youtube.com/watch_popup?v=CqEccgR0q-o
http://www.youtube.com/watch?v=b43lAoovqd8&feature=fvw


2. CLASSICAL EXTREME VALUE THEORY

2.1. Model formulation

The core of extreme value theory is the study of the statistical behaviour of

$$M_n = \max\{X_1, \ldots, X_n\},$$

where $X_1, \ldots, X_n$ is a sequence of independent random variables having a common distribution function $F$.

In applications, the variables $X_i$ usually represent values of a process measured on a regular time scale, for example the 10-minute average (or maximum) wind speed. Then $M_n$ is the maximum of the observed process over $n$ time units.

The distribution function of $M_n$ satisfies

$$\Pr\{M_n \le z\} = \Pr\{X_1 \le z, \ldots, X_n \le z\} = \Pr\{X_1 \le z\} \times \cdots \times \Pr\{X_n \le z\} = F(z)^n.$$

Thus, one way to study $M_n$ is to estimate $F$ from the available data (for example, the 10-minute wind speed records measured during a certain time interval) and then to substitute this estimate into the previous formula to obtain the distribution of $M_n$.

The problem with this approach is that small deviations in the estimate of $F$ lead to large discrepancies in $F^n$.
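A quick numerical illustration of this sensitivity, using hypothetical values of $F(z)$ rather than the wind data: with hourly records a year contains $n = 8760$ observations, and changing $F(z)$ only in the fourth decimal place more than halves the probability that the annual maximum stays below $z$.

```python
def prob_max_below(F_z: float, n: int) -> float:
    """Pr{M_n <= z} = F(z)**n for n independent, identically distributed observations."""
    return F_z ** n


# Hypothetical setting: hourly records, so one year contains n = 365 * 24 = 8760 values.
n = 365 * 24
for F_z in (0.9999, 0.9998):
    print(f"F(z) = {F_z}:  Pr{{M_n <= z}} = F(z)^n = {prob_max_below(F_z, n):.3f}")
```

With these values the two probabilities come out at about 0.42 and 0.17.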

An alternative approach is to estimate $F^n$ directly from the extreme data. This idea is similar to the one used to approximate the distribution of the sample mean by the central limit theorem. Following this route, it is necessary to study the behaviour of $F^n$ as $n$ tends to infinity. In this case, however, this alone is not enough, because for any $z < z_{\sup}$

$$F^n(z) \to 0 \quad \text{as } n \to \infty,$$

where $z_{\sup}$ is the smallest value of $z$ such that $F(z) = 1$ (possibly $+\infty$).

To overcome this difficulty and reach a limit different from 0, the following linear normalization of $M_n$ is used:

$$M_n^* = \frac{M_n - b_n}{a_n},$$

where $\{a_n > 0\}$ and $\{b_n\}$ are sequences of constants.


The objective now is to find the possible limit distributions of $M_n^*$ for appropriate choices of $\{a_n\}$ and $\{b_n\}$.

The following theorem states that, whenever $M_n^*$ converges in distribution to a non-degenerate limit, the limit distribution function belongs to one of three families.

Theorem 1: Extremal types theorem. If there exist sequences of constants $\{a_n > 0\}$ and $\{b_n\}$ such that

$$\Pr\{(M_n - b_n)/a_n \le z\} \to G(z) \quad \text{as } n \to \infty,$$

where $G$ is a non-degenerate distribution function, then $G$ belongs to one of the following families:

I. (Gumbel)

$$G(z) = \exp\left\{-\exp\left[-\left(\frac{z-b}{a}\right)\right]\right\}, \quad -\infty < z < \infty;$$

II. (Fréchet)

$$G(z) = \begin{cases} 0, & z \le b,\\ \exp\left\{-\left(\dfrac{z-b}{a}\right)^{-\alpha}\right\}, & z > b; \end{cases}$$

III. (Weibull)

$$G(z) = \begin{cases} \exp\left\{-\left[-\left(\dfrac{z-b}{a}\right)\right]^{\alpha}\right\}, & z < b,\\ 1, & z \ge b, \end{cases}$$

for parameters $a > 0$, $b$ and, in the case of families II and III, $\alpha > 0$.

These three classes are known as the extreme value distributions of types I, II and III, or as the Gumbel, Fréchet and Weibull families, respectively.

Observe that these three types of distributions are the only possible limits for the distributions of the normalized maxima, regardless of the population distribution $F$.
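The theorem can be checked numerically in a case where $F^n$ is available in closed form. As an assumption chosen for illustration, take $X_i \sim \text{Exp}(1)$ with $a_n = 1$ and $b_n = \log n$; then $\Pr\{(M_n - b_n)/a_n \le z\} = (1 - e^{-z}/n)^n$, which approaches the Gumbel form (family I with $a = 1$, $b = 0$). The sketch measures the largest gap on a grid of $z$ values.

```python
import math


def normalized_max_cdf(z: float, n: int) -> float:
    """Exact Pr{(M_n - b_n)/a_n <= z} for n i.i.d. Exp(1) variables with
    a_n = 1 and b_n = log(n), i.e. F(z + log n)**n with F(x) = 1 - exp(-x)."""
    x = z + math.log(n)
    if x <= 0:
        return 0.0
    return (1.0 - math.exp(-x)) ** n


def gumbel_cdf(z: float) -> float:
    """Type I limit with a = 1, b = 0: G(z) = exp(-exp(-z))."""
    return math.exp(-math.exp(-z))


def max_gap(n: int) -> float:
    """Largest |F^n - G| over the grid z = -2.0, -1.9, ..., 5.0."""
    grid = [k / 10 for k in range(-20, 51)]
    return max(abs(normalized_max_cdf(z, n) - gumbel_cdf(z)) for z in grid)


for n in (10, 100, 10_000):
    print(f"n = {n:>6}: max |F^n - G| = {max_gap(n):.5f}")
```

The gap shrinks roughly like $1/n$ here; for other parent distributions the convergence can be much slower, which is one reason the limit is treated only as an approximation.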


The three limit types have different forms of tail behaviour. The upper end-point $z_{\sup}$ is finite for the Weibull distribution ($z_{\sup} = b$ in the notation of Theorem 1), while $z_{\sup} = \infty$ for the Fréchet and Gumbel distributions. However, the density of the Gumbel distribution decays exponentially, whereas the density of the Fréchet distribution decays polynomially. Many common distributions, such as the normal, lognormal, exponential and gamma, belong to the domain of attraction of the Gumbel type. The Fréchet type has a heavy tail: $E[X^r] = \infty$ for $r \ge 1/\xi$, where $\xi = 1/\alpha$ is the shape parameter of the generalized extreme value distribution of Section 2.2 (which means that it has infinite variance if $\xi \ge 1/2$).

2.2. The generalized extreme value distribution

In the past it was usual to adopt one of the three families and then estimate the parameters of that model. This approach has a weakness: one of the three models has to be chosen and assumed correct, and the uncertainty implied by this choice is not taken into account in subsequent inferences. A better analysis combines the three models into a single family, the generalized extreme value (GEV) distribution:

$$G(z) = \exp\left\{-\left[1 + \xi\left(\frac{z-\mu}{\sigma}\right)\right]^{-1/\xi}\right\},$$

defined on $\{z : 1 + \xi(z-\mu)/\sigma > 0\}$, with parameters

location: $-\infty < \mu < \infty$,
scale: $\sigma > 0$,
shape: $-\infty < \xi < \infty$.

The type II (Fréchet) distribution is obtained when $\xi > 0$.

The type III (Weibull) distribution is obtained when $\xi < 0$.

The type I (Gumbel) distribution is obtained in the limit $\xi \to 0$.

This unification facilitates the statistical analysis: the uncertainty in the estimate of $\xi$ measures the lack of certainty about which of the three models is appropriate.
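A minimal sketch of the GEV distribution function, written so that the three types can be compared directly. Treating $|\xi|$ below a small tolerance `eps` as the Gumbel limit is a numerical choice of this sketch, not part of the theory; the parameter values in the usage lines are hypothetical.

```python
import math


def gev_cdf(z: float, mu: float, sigma: float, xi: float, eps: float = 1e-12) -> float:
    """GEV distribution function G(z; mu, sigma, xi) with sigma > 0.

    xi > 0: Frechet type, xi < 0: Weibull type, |xi| < eps: Gumbel limit.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s = (z - mu) / sigma
    if abs(xi) < eps:
        return math.exp(-math.exp(-s))          # Gumbel limit
    t = 1.0 + xi * s
    if t <= 0:
        # outside the support: below the lower end-point (xi > 0)
        # or above the upper end-point mu - sigma/xi (xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


# Usage: the same point z = 1 under each of the three types (hypothetical parameters).
for xi, name in ((0.0, "Gumbel"), (0.3, "Fréchet"), (-0.3, "Weibull")):
    print(f"{name:8s} (xi = {xi:+.1f}): G(1.0) = {gev_cdf(1.0, 0.0, 1.0, xi):.4f}")
```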

The extremal types theorem can now be restated as follows.

Theorem 2. If there exist sequences of constants $\{a_n > 0\}$ and $\{b_n\}$ such that

$$\Pr\{(M_n - b_n)/a_n \le z\} \to G(z) \quad \text{as } n \to \infty,$$

where $G$ is a non-degenerate distribution function, then $G$ is a member of the GEV family

$$G(z) = \exp\left\{-\left[1 + \xi\left(\frac{z-\mu}{\sigma}\right)\right]^{-1/\xi}\right\},$$

defined on $\{z : 1 + \xi(z-\mu)/\sigma > 0\}$, with location $-\infty < \mu < \infty$, scale $\sigma > 0$ and shape $-\infty < \xi < \infty$.

The difficulty that the normalizing constants are unknown is easily solved in practice: if $\Pr\{(M_n - b_n)/a_n \le z\} \approx G(z)$ for large $n$, then

$$\Pr\{M_n \le z\} \approx G\left(\frac{z - b_n}{a_n}\right) = G^*(z),$$

where $G^*$ is also a member of the GEV family.
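The claim that $G^*$ is again a GEV distribution can be checked by direct substitution. The short derivation below (added for illustration) shows how the unknown constants are absorbed into the location and scale parameters, which is why the GEV can be fitted directly to the raw maxima.

```latex
G\!\left(\frac{z-b_n}{a_n}\right)
  = \exp\left\{-\left[1+\xi\left(\frac{(z-b_n)/a_n-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}
  = \exp\left\{-\left[1+\xi\left(\frac{z-(b_n+a_n\mu)}{a_n\sigma}\right)\right]^{-1/\xi}\right\},
\qquad\text{so}\qquad
\mu^* = b_n + a_n\mu,\quad \sigma^* = a_n\sigma,\quad \xi^* = \xi .
```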

2.3. Practical implementation

The above results lead to the following approach for modelling the extremes of a series of independent and identically distributed observations $X_1, X_2, \ldots$. The first step consists of blocking the data into sequences of $n$ observations, with $n$ large enough. Then the maximum $M_i$ of each block $i$ is calculated and, finally, the GEV distribution is fitted to the series of block maxima $M_1, M_2, \ldots$.


In environmental applications the blocks are usually one year long, so that each $M_i$ represents the annual maximum of year $i$.
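The blocking step can be sketched as below. Dropping an incomplete trailing block is an assumption of this sketch; with real records one would normally block by calendar year rather than by a fixed count.

```python
def block_maxima(series, block_size):
    """Maxima of consecutive, non-overlapping blocks of `block_size` observations.

    A trailing block with fewer than `block_size` values is dropped.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n_blocks = len(series) // block_size
    return [max(series[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]


# Usage: with hourly data, annual blocks would use block_size = 365 * 24.
print(block_maxima([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], block_size=3))  # [4, 9, 6]
```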

Once the GEV distribution has been fitted, say to the annual maxima, the quantile function $z_p$ of the annual maximum distribution can be calculated as

$$z_p = \begin{cases} \mu - \dfrac{\sigma}{\xi}\left[1 - \{-\log(1-p)\}^{-\xi}\right], & \xi \ne 0,\\[4pt] \mu - \sigma \log\{-\log(1-p)\}, & \xi = 0. \end{cases}$$

Observe that $G(z_p) = 1 - p$; in applied terminology, $z_p$ is the return level associated with the return period $1/p$. That is, $z_p$ is the level expected to be exceeded on average once every $1/p$ years or, equivalently, the level exceeded by the annual maximum in any particular year with probability $p$.

By defining $y_p = -\log(1-p)$, this quantile function can be expressed as

$$z_p = \begin{cases} \mu - \dfrac{\sigma}{\xi}\left[1 - y_p^{-\xi}\right], & \xi \ne 0,\\[4pt] \mu - \sigma \log y_p, & \xi = 0. \end{cases}$$

Then, if $z_p$ is plotted against $\log y_p$, the plot is linear in the case $\xi = 0$. If $\xi < 0$, the curve grows more slowly than linearly as $p \to 0$ and approaches the asymptotic limit $\mu - \sigma/\xi$; if $\xi > 0$, it grows faster than linearly and has no finite bound.

This graph is called a return level plot, and it is useful both as a validation tool and as a way of presenting the fitted model.
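A sketch of the return level formula and of the coordinates of the return level plot. The fitted parameters in the usage lines are hypothetical (annual maxima of 10-minute mean wind speed in m/s); the 50-year level is shown because it mirrors the 50-year recurrence period in the definition of Vref.

```python
import math


def gev_return_level(p, mu, sigma, xi, eps=1e-12):
    """Return level z_p with G(z_p) = 1 - p (return period 1/p blocks)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie in (0, 1)")
    y_p = -math.log(1.0 - p)
    if abs(xi) < eps:                       # Gumbel case
        return mu - sigma * math.log(y_p)
    return mu - (sigma / xi) * (1.0 - y_p ** (-xi))


def return_level_curve(return_periods, mu, sigma, xi):
    """Points (log y_p, z_p) of the return level plot, with p = 1 / period."""
    points = []
    for period in return_periods:
        p = 1.0 / period
        points.append((math.log(-math.log(1.0 - p)), gev_return_level(p, mu, sigma, xi)))
    return points


# Hypothetical annual-maximum fit (m/s).
mu, sigma, xi = 20.0, 2.5, -0.1
print(f"50-year return level: {gev_return_level(1 / 50, mu, sigma, xi):.2f} m/s")
for x, z in return_level_curve([2, 10, 50, 100, 1000], mu, sigma, xi):
    print(f"log y_p = {x:7.3f}   z_p = {z:6.2f}")
```

With $\xi = -0.1$ the levels stay below the upper end-point $\mu - \sigma/\xi = 45$ m/s however long the return period.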

Inference for the GEV distribution

The choice of block length implies a trade-off between bias and variance. When blocks are short, the approximation of the distribution of block maxima by its limit is poor, leading to bias in estimation and extrapolation, whereas long blocks generate few maxima, leading to large estimation variance.

The most commonly used method to estimate the parameters is maximum likelihood. One difficulty with this approach is that the regularity conditions for its application are not satisfied by the GEV distribution, because the end-point of the distribution depends on the parameter values. This violation means that the standard asymptotic likelihood results are not automatically applicable. This problem has been studied in detail (Smith, 1985), with the following results:

• When $\xi > -0.5$, maximum likelihood estimators have the usual asymptotic properties.
• When $-1 < \xi < -0.5$, maximum likelihood estimators are generally obtainable, but they do not have the standard asymptotic properties.
• When $\xi < -1$, maximum likelihood estimators are unlikely to be obtainable.

Observe that the case $\xi \le -0.5$ corresponds to distributions with a very short bounded upper tail, which is rarely encountered in applications of extreme value modelling.

Denoting by $Z_1, \ldots, Z_m$ the block maxima and assuming that they are independent variables with a GEV distribution, the log-likelihood for the GEV parameters when $\xi \ne 0$ is

$$\ell(\mu, \sigma, \xi) = -m \log \sigma - \left(1 + \frac{1}{\xi}\right)\sum_{i=1}^{m} \log\left[1 + \xi\left(\frac{z_i - \mu}{\sigma}\right)\right] - \sum_{i=1}^{m}\left[1 + \xi\left(\frac{z_i - \mu}{\sigma}\right)\right]^{-1/\xi},$$

provided that $1 + \xi(z_i - \mu)/\sigma > 0$ for $i = 1, \ldots, m$. When this condition is not satisfied, the likelihood is zero and the log-likelihood is $-\infty$.

In the Gumbel case ($\xi = 0$), the log-likelihood is

$$\ell(\mu, \sigma) = -m \log \sigma - \sum_{i=1}^{m}\left(\frac{z_i - \mu}{\sigma}\right) - \sum_{i=1}^{m}\exp\left\{-\left(\frac{z_i - \mu}{\sigma}\right)\right\}.$$
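A direct transcription of the two log-likelihoods above into a single function, as a sketch. Switching to the Gumbel form when $|\xi|$ is below a small tolerance, and using `math.log1p` for accuracy near $\xi = 0$, are implementation choices of this sketch; the sample values are hypothetical.

```python
import math


def gev_loglik(data, mu, sigma, xi, eps=1e-12):
    """Log-likelihood of block maxima `data` under GEV(mu, sigma, xi).

    Uses the Gumbel form when |xi| < eps and returns -inf outside the support,
    i.e. when sigma <= 0 or some 1 + xi * (z - mu) / sigma <= 0.
    """
    if sigma <= 0:
        return -math.inf
    m = len(data)
    ll = -m * math.log(sigma)
    if abs(xi) < eps:                         # Gumbel case
        for z in data:
            s = (z - mu) / sigma
            ll -= s + math.exp(-s)
        return ll
    for z in data:
        u = xi * (z - mu) / sigma
        if u <= -1.0:                         # 1 + xi * (z - mu) / sigma <= 0
            return -math.inf
        log_t = math.log1p(u)                 # log[1 + xi * (z - mu) / sigma]
        ll -= (1.0 + 1.0 / xi) * log_t + math.exp(-log_t / xi)
    return ll


# Usage on a small hypothetical sample of annual maxima:
sample = [21.3, 19.8, 24.1, 22.7, 20.5, 23.9, 26.2, 21.0]
for xi in (-0.2, 0.0, 0.2):
    print(f"xi = {xi:+.1f}: log-likelihood = {gev_loglik(sample, 22.0, 2.0, xi):.3f}")
```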

Maximizing the previous log-likelihood functions gives the maximum likelihood estimates $(\hat\mu, \hat\sigma, \hat\xi)$. The maximization is carried out with numerical optimization algorithms.
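As a sketch of the numerical step, the Gumbel case ($\xi = 0$) can be solved with a one-dimensional root search: setting the derivatives of the Gumbel log-likelihood to zero eliminates $\mu$ and leaves a single equation in $\sigma$. Bisection is a choice made here for simplicity. The sample is a hypothetical, noise-free set of Gumbel(10, 2) quantiles, so the estimates should land close to those values.

```python
import math


def fit_gumbel_mle(data, tol=1e-10, max_iter=200):
    """Maximum likelihood estimates (mu, sigma) of the Gumbel model (xi = 0).

    Setting the derivatives of the Gumbel log-likelihood to zero gives
      sigma = mean(z) - sum(z * w) / sum(w),  with w = exp(-z / sigma),
      mu    = -sigma * log(mean(exp(-z / sigma))),
    so sigma is found by bisection on the first equation and mu follows.
    """
    m = len(data)
    if m < 2 or max(data) == min(data):
        raise ValueError("need at least two distinct observations")
    z_min, z_bar = min(data), sum(data) / m

    def weights(sigma):
        # exp(-(z - z_min) / sigma) <= 1 avoids overflow; the shift cancels in ratios
        return [math.exp(-(z - z_min) / sigma) for z in data]

    def score(sigma):
        w = weights(sigma)
        return sigma - z_bar + sum(wi * z for wi, z in zip(w, data)) / sum(w)

    lo, hi = 1e-6 * (max(data) - z_min), max(data) - z_min + 1.0
    if score(lo) >= 0 or score(hi) <= 0:
        raise RuntimeError("could not bracket the Gumbel MLE")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    sigma = 0.5 * (lo + hi)
    mu = z_min - sigma * math.log(sum(weights(sigma)) / m)
    return mu, sigma


# Usage on a hypothetical, noise-free sample: quantiles of Gumbel(mu=10, sigma=2).
m = 500
sample = [10.0 - 2.0 * math.log(-math.log((i - 0.5) / m)) for i in range(1, m + 1)]
mu_hat, sigma_hat = fit_gumbel_mle(sample)
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```

For the full three-parameter GEV there is no such reduction; a common route is to minimize the negative GEV log-likelihood with a general-purpose optimizer (for example `scipy.optimize.minimize`), which is not shown here.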

Provided that $\xi > -0.5$, the classical theory of maximum likelihood estimation establishes that the distribution of $(\hat\mu, \hat\sigma, \hat\xi)$ is approximately multivariate normal with mean $(\mu, \sigma, \xi)$ and variance-covariance matrix equal to the inverse of the observed information matrix evaluated at the maximum likelihood estimate. Confidence intervals are obtained from this approximate normality of the estimator.

Example. Hourly average wind data from Schiphol in the Netherlands. We consider the records of the hourly average wind speed at Schiphol, Netherlands (lat. 52.330° N, lon. 4.738° E). The data were recorded by the Royal Netherlands Meteorological Institute (KNMI) through the KNMI HYDRA PROJECT from March 1, 1950 to December 31, 2005. The measuring height was 10 meters.


Figures 1, 2, 3 and 4 show the original data and the daily, monthly and yearly maxima, respectively.


Inference for return levels

The maximum likelihood estimate of the $1/p$ return level $z_p$, for $0 < p < 1$, is

$$\hat z_p = \begin{cases} \hat\mu - \dfrac{\hat\sigma}{\hat\xi}\left[1 - y_p^{-\hat\xi}\right], & \hat\xi \ne 0,\\[4pt] \hat\mu - \hat\sigma \log y_p, & \hat\xi = 0, \end{cases}$$

where $y_p = -\log(1-p)$. Confidence intervals can be obtained from the normal approximation to the distribution of the estimator, but caution is required in their interpretation, especially for return levels corresponding to long return periods, because the normal approximation may be poor. A better approximation is generally obtained from the profile likelihood function.

Graphical model checking

Although it is impossible to check the validity of an extrapolation based on the GEV model, the model can be assessed with reference to the observed data.

Probability plot. A probability plot compares the empirical and fitted distribution functions. The empirical distribution function evaluated at the $i$-th ordered block maximum $z_{(i)}$ is $\tilde G(z_{(i)}) = i/(m+1)$, and the fitted distribution function at the same point is

$$\hat G(z_{(i)}) = \exp\left\{-\left[1 + \hat\xi\left(\frac{z_{(i)} - \hat\mu}{\hat\sigma}\right)\right]^{-1/\hat\xi}\right\}.$$

If the model is good, $\tilde G(z_{(i)}) \approx \hat G(z_{(i)})$, so the points $\{(\tilde G(z_{(i)}), \hat G(z_{(i)})), i = 1, \ldots, m\}$ should lie close to the unit diagonal. However, because both functions necessarily approach 1 as $z$ increases, the plot is least informative in this region. The quantile plot, described next, avoids this deficiency.

Quantile plot. The quantile plot represents the points $\{(\hat G^{-1}(i/(m+1)), z_{(i)}), i = 1, \ldots, m\}$, where

$$\hat G^{-1}\left(\frac{i}{m+1}\right) = \hat\mu - \frac{\hat\sigma}{\hat\xi}\left[1 - \left\{-\log\left(\frac{i}{m+1}\right)\right\}^{-\hat\xi}\right].$$

As with the probability plot, departures from linearity in the quantile plot indicate model failure.
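The coordinates of the probability and quantile plots described above can be computed as follows. This sketch assumes $\hat\xi \ne 0$ and leaves the plotting itself to whatever graphics tool is at hand; the fitted parameters and maxima in the usage lines are hypothetical.

```python
import math


def gev_cdf(z, mu, sigma, xi):
    """Fitted GEV distribution function G_hat (xi != 0 assumed in this sketch)."""
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0:
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_quantile(q, mu, sigma, xi):
    """G_hat^{-1}(q) for 0 < q < 1 (xi != 0)."""
    return mu - (sigma / xi) * (1.0 - (-math.log(q)) ** (-xi))


def diagnostic_points(data, mu, sigma, xi):
    """Probability-plot and quantile-plot coordinates for ordered block maxima."""
    z_sorted = sorted(data)
    m = len(z_sorted)
    empirical = [i / (m + 1) for i in range(1, m + 1)]      # G~(z_(i)) = i/(m+1)
    prob_plot = [(e, gev_cdf(z, mu, sigma, xi)) for e, z in zip(empirical, z_sorted)]
    quant_plot = [(gev_quantile(e, mu, sigma, xi), z) for e, z in zip(empirical, z_sorted)]
    return prob_plot, quant_plot


# Usage: points far from the diagonal y = x signal lack of fit.
maxima = [21.3, 19.8, 24.1, 22.7, 20.5, 23.9, 26.2, 21.0]
pp, qq = diagnostic_points(maxima, mu=21.5, sigma=1.8, xi=0.1)
for (e, g), (q, z) in zip(pp, qq):
    print(f"PP: ({e:.3f}, {g:.3f})   QQ: ({q:6.2f}, {z:6.2f})")
```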

Return level plot. The return level plot represents the points $\{(\log y_p, \hat z_p), 0 < p < 1\}$. Confidence intervals are usually added to this plot to make it more informative. Return periods matter in engineering because the return period is used as a design criterion. Furthermore, to use this plot as a model diagnostic, empirical estimates of the return level function are also added. For a suitable model, the model-based curve and the empirical estimates should be in reasonable agreement.


3. THRESHOLD MODELS

Modelling only block maxima wastes a lot of data if a detailed record of the phenomenon under study is available. We now propose an alternative analysis that uses the data more efficiently. The approach considers as extreme those observations that exceed a threshold level $u$, and then studies the stochastic behaviour of these excesses over $u$.

More formally, given a sequence $X_1, \ldots, X_n$ of independent and identically distributed random variables with distribution function $F$, we are interested in the conditional probability

$$\Pr\{X > u + y \mid X > u\} = \frac{1 - F(u + y)}{1 - F(u)}, \quad y > 0.$$

The following result gives an approximation to this probability for high values of the threshold $u$.

Theorem 3. Let $X_1, \ldots, X_n$ be a sequence of independent and identically distributed random variables with common distribution function $F$, and let $M_n = \max\{X_1, \ldots, X_n\}$ satisfy the conditions for approximation by a GEV distribution, that is, for large $n$,

$$\Pr\{M_n \le z\} \approx G(z), \quad \text{where } G(z) = \exp\left\{-\left[1 + \xi\left(\frac{z-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}.$$

Then, for large enough $u$, the distribution function of $(X - u)$, conditional on $X > u$, is approximately the generalized Pareto distribution (GPD)

$$H(y) = 1 - \left(1 + \frac{\xi y}{\tilde\sigma}\right)^{-1/\xi},$$

defined on $\{y : y > 0 \text{ and } 1 + \xi y/\tilde\sigma > 0\}$, where $\tilde\sigma = \sigma + \xi(u - \mu)$.

This result relates the two approximations for studying the distribution of the maximum: the parameters of the generalized Pareto distribution (GPD) are uniquely determined by those of the associated GEV distribution of block maxima. In particular, if the block size in the GEV analysis changes, the parameter $\xi$ remains unchanged, while the parameters $\mu$ and $\sigma$ change in a compensating way that keeps $\tilde\sigma$ fixed.

As for the GEV distribution, the parameter $\xi$ dominates the qualitative behaviour of the GPD:

• If $\xi < 0$, the excesses are bounded above by $-\tilde\sigma/\xi$ (so $X$ itself is bounded above by $u - \tilde\sigma/\xi$).
• If $\xi > 0$, the distribution is unbounded.
• If $\xi = 0$, the distribution is also unbounded; it is the exponential distribution with parameter $1/\tilde\sigma$.
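A sketch of the GPD distribution function, its inverse (used later for quantile plots) and the link $\tilde\sigma = \sigma + \xi(u - \mu)$ to a GEV fit. The tolerance `eps` for the exponential case is a numerical choice of this sketch, and the parameter values in the usage lines are hypothetical.

```python
import math


def gpd_cdf(y, sigma, xi, eps=1e-12):
    """Generalized Pareto distribution function H(y) for excesses y = x - u."""
    if y <= 0:
        return 0.0
    if abs(xi) < eps:                     # exponential case
        return 1.0 - math.exp(-y / sigma)
    t = 1.0 + xi * y / sigma
    if t <= 0:                            # beyond the upper end-point -sigma/xi (xi < 0)
        return 1.0
    return 1.0 - t ** (-1.0 / xi)


def gpd_quantile(p, sigma, xi, eps=1e-12):
    """Inverse H^{-1}(p) for 0 <= p < 1."""
    if abs(xi) < eps:
        return -sigma * math.log(1.0 - p)
    return (sigma / xi) * ((1.0 - p) ** (-xi) - 1.0)


def gpd_scale_from_gev(mu, sigma, xi, u):
    """sigma_tilde = sigma + xi * (u - mu), linking the GEV and GPD scales."""
    return sigma + xi * (u - mu)


# Usage: a hypothetical GEV fit (mu=20, sigma=2.5, xi=-0.2) and threshold u = 22.
s_tilde = gpd_scale_from_gev(20.0, 2.5, -0.2, 22.0)
print(f"sigma_tilde = {s_tilde:.2f}, excess end-point = {-s_tilde / -0.2:.2f}")
print(f"H(2.0) = {gpd_cdf(2.0, s_tilde, -0.2):.4f}")
```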

Threshold selection

Let $\{x_1, \ldots, x_n\}$ be the original data and consider as extreme events those that exceed a threshold $u$, say $x_{(1)}, \ldots, x_{(k)}$. We denote the excesses over the threshold by $y_j = x_{(j)} - u$. By the previous theorem, when the threshold $u$ is large enough, the values $y_j$ can be viewed as independent realizations of a variable distributed according to a GPD, whose parameters have to be estimated and the model then validated.

The choice of threshold is similar to the choice of block size, in the sense that both imply a balance between bias and variance. A threshold that is too low causes the asymptotic approximation underlying the model to fail, while one that is too high leaves few observations and hence high variance.

A method to help choose the threshold is based on the mean of the GPD: if $Y$ is a random variable following a GPD with parameters $\sigma$ and $\xi$, then $E(Y) = \sigma/(1-\xi)$ when $\xi < 1$; otherwise the mean is infinite.

If the model is valid for a threshold $u_0$, then it is also valid for all thresholds $u > u_0$. The means in both cases are

$$E(X - u_0 \mid X > u_0) = \frac{\sigma_{u_0}}{1 - \xi},$$

$$E(X - u \mid X > u) = \frac{\sigma_u}{1 - \xi} = \frac{\sigma_{u_0} + \xi(u - u_0)}{1 - \xi}.$$

Thus, $E(X - u \mid X > u)$ is a linear function of $u$. Based on this result, the procedure to choose the threshold is as follows:

• Build the mean residual life plot by representing the points
$$\left\{\left(u, \frac{1}{n_u}\sum_{i=1}^{n_u}\left(x_{(i)} - u\right)\right) : u < x_{\max}\right\},$$
where $x_{(1)}, \ldots, x_{(n_u)}$ are the $n_u$ observations exceeding $u$ and $x_{\max}$ is the maximum observation in the data set.
• Choose as threshold the value above which the plot is approximately linear in $u$. Plotting confidence intervals can help to determine this point.
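The mean residual life points can be computed as below. The confidence band, mean $\pm 1.96\,\mathrm{sd}/\sqrt{n_u}$, is an assumption of this sketch (a simple normal approximation), offered as one way to draw the intervals mentioned above; the data are hypothetical.

```python
import math
import statistics


def mean_residual_life(data, thresholds, z=1.96):
    """Mean residual life plot points (u, mean excess, lower, upper).

    For each threshold u below max(data): the mean of the excesses x - u over
    the n_u observations exceeding u, with an approximate normal band
    mean +/- z * sd / sqrt(n_u) (nan when n_u < 2).
    """
    points = []
    for u in thresholds:
        excesses = [x - u for x in data if x > u]
        n_u = len(excesses)
        if n_u == 0:                      # u >= max(data): no excesses
            continue
        mean = sum(excesses) / n_u
        if n_u >= 2:
            half = z * statistics.stdev(excesses) / math.sqrt(n_u)
            points.append((u, mean, mean - half, mean + half))
        else:
            points.append((u, mean, math.nan, math.nan))
    return points


# Usage: scan thresholds and look for where the mean excess becomes linear in u.
data = [3.1, 4.7, 2.2, 8.9, 5.5, 6.1, 12.4, 7.3, 3.8, 9.6]
for u, mean, lo, hi in mean_residual_life(data, [2, 4, 6, 8, 10]):
    print(f"u = {u:>4}: mean excess = {mean:.2f}  [{lo:.2f}, {hi:.2f}]")
```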

Parameter estimation

Once the threshold has been chosen, the next step is to estimate the parameters of the GPD, for example by maximum likelihood. Denoting by $y_1, \ldots, y_k$ the $k$ excesses over the threshold, the log-likelihood function for $\xi \ne 0$ is

$$\ell(\sigma, \xi) = -k \log \sigma - \left(1 + \frac{1}{\xi}\right)\sum_{i=1}^{k}\log\left(1 + \frac{\xi y_i}{\sigma}\right),$$

provided that $1 + \xi y_i/\sigma > 0$ for $i = 1, \ldots, k$; otherwise $\ell(\sigma, \xi) = -\infty$.

In the case $\xi = 0$ the log-likelihood is

$$\ell(\sigma) = -k \log \sigma - \frac{1}{\sigma}\sum_{i=1}^{k} y_i.$$
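A transcription of the GPD log-likelihood as a sketch, with the exponential ($\xi = 0$) form used when $|\xi|$ is below a small tolerance. In the exponential case the maximizer is simply the sample mean of the excesses, which the usage lines confirm with a coarse grid search; the excesses are hypothetical.

```python
import math


def gpd_loglik(excesses, sigma, xi, eps=1e-12):
    """GPD log-likelihood of threshold excesses y_1..y_k.

    Returns -inf if sigma <= 0 or some 1 + xi * y / sigma <= 0; uses the
    exponential (xi = 0) form when |xi| < eps.
    """
    if sigma <= 0:
        return -math.inf
    k = len(excesses)
    if abs(xi) < eps:
        return -k * math.log(sigma) - sum(excesses) / sigma
    total = 0.0
    for y in excesses:
        u = xi * y / sigma
        if u <= -1.0:                     # outside the support
            return -math.inf
        total += math.log1p(u)            # log(1 + xi * y / sigma)
    return -k * math.log(sigma) - (1.0 + 1.0 / xi) * total


# Usage: for xi = 0 the maximizer is sigma_hat = mean(y); a coarse grid confirms it.
excesses = [0.4, 1.7, 0.9, 3.2, 0.2, 1.1, 2.5, 0.6]
grid = [0.5 + 0.01 * j for j in range(300)]
best = max(grid, key=lambda s: gpd_loglik(excesses, s, 0.0))
print(f"grid maximizer: {best:.2f}, sample mean: {sum(excesses) / len(excesses):.4f}")
```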

Return levels

To calculate return levels, we first need an expression for the unconditional distribution of $X$. Denoting $\zeta_u = \Pr\{X > u\}$, from the conditional distribution

$$\Pr\{X > x \mid X > u\} = \left[1 + \xi\left(\frac{x - u}{\sigma}\right)\right]^{-1/\xi}$$

we obtain

$$\Pr\{X > x\} = \zeta_u\left[1 + \xi\left(\frac{x - u}{\sigma}\right)\right]^{-1/\xi}.$$

Hence, the level $x_m$ that is exceeded on average once every $m$ observations is the solution of

$$\zeta_u\left[1 + \xi\left(\frac{x_m - u}{\sigma}\right)\right]^{-1/\xi} = \frac{1}{m},$$

which is

$$x_m = u + \frac{\sigma}{\xi}\left[(m\zeta_u)^{\xi} - 1\right].$$

This expression is valid for values of $m$ large enough that $x_m > u$.

In the case $\xi = 0$ the return level is $x_m = u + \sigma \log(m\zeta_u)$, again for $m$ large enough.

Estimating these return levels requires substituting the parameters by their estimates. For the probability $\zeta_u = \Pr\{X > u\}$, the maximum likelihood estimator is the sample proportion of observations above the threshold $u$, that is, $\hat\zeta_u = k/n$.
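A sketch of the threshold-model return level with $\hat\zeta_u = k/n$ plugged in. The counts and GPD parameters in the usage lines are hypothetical; for hourly data, a 50-year level corresponds to $m = 50 \times 365 \times 24$ observations.

```python
import math


def gpd_return_level(m, u, sigma, xi, zeta_u, eps=1e-12):
    """Level x_m exceeded on average once every m observations (threshold model).

    Valid only when m * zeta_u > 1, which is equivalent to x_m > u.
    """
    if m * zeta_u <= 1.0:
        raise ValueError("m * zeta_u must exceed 1 so that x_m > u")
    if abs(xi) < eps:
        return u + sigma * math.log(m * zeta_u)
    return u + (sigma / xi) * ((m * zeta_u) ** xi - 1.0)


# Hypothetical fit to hourly data: n observations, k of them above u.
n, k = 100_000, 500
zeta_hat = k / n                           # sample proportion above u
m_50_years = 50 * 365 * 24                 # hourly observations in 50 years
x50 = gpd_return_level(m_50_years, u=15.0, sigma=2.0, xi=-0.05, zeta_u=zeta_hat)
print(f"zeta_hat = {zeta_hat}, 50-year level = {x50:.2f}")
```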

Another tool to help in the choice of threshold u

As noted before, when the GPD is a valid model for a threshold $u_0$, it is also a valid model for any $u > u_0$. For both thresholds the parameter $\xi$ is the same, and the scale parameters are related by $\sigma_u = \sigma_{u_0} + \xi(u - u_0)$. Thus, the modified scale parameter $\sigma^* = \sigma_u - \xi u$ is constant with respect to $u$. Consequently, estimates of $\sigma^*$ and $\xi$ should be roughly constant above $u_0$ when $u_0$ is a valid threshold. This argument leads to plotting $\hat\sigma^*$ and $\hat\xi$ against $u$, together with their confidence intervals, and selecting $u_0$ as the lowest value of $u$ for which the estimates remain near-constant.

Model checking

Probability plots, quantile plots and return level plots are used to assess the quality of a fitted generalized Pareto model. Assume a threshold $u$, ordered excesses $y_{(1)}, \ldots, y_{(k)}$ and an estimated GPD $\hat H$. Then:

Probability plot. It represents the points $\{(i/(k+1), \hat H(y_{(i)})), i = 1, \ldots, k\}$.

Quantile plot. It represents the points $\{(\hat H^{-1}(i/(k+1)), y_{(i)}), i = 1, \ldots, k\}$.

When the model is valid, the points in both plots lie close to the unit diagonal.

When $\hat\xi \ne 0$ the estimated functions are

$$\hat H(y) = 1 - \left(1 + \frac{\hat\xi y}{\hat\sigma}\right)^{-1/\hat\xi}, \qquad \hat H^{-1}(p) = \frac{\hat\sigma}{\hat\xi}\left[(1-p)^{-\hat\xi} - 1\right].$$

When $\hat\xi = 0$ they are

$$\hat H(y) = 1 - \exp\left(-\frac{y}{\hat\sigma}\right), \qquad \hat H^{-1}(p) = -\hat\sigma \log(1-p).$$

Return level plot. It represents the points $\{(m, \hat x_m)\}$ where, as seen before,

$$\hat x_m = u + \frac{\hat\sigma}{\hat\xi}\left[(m\hat\zeta_u)^{\hat\xi} - 1\right] \quad \text{for } \hat\xi \ne 0,$$

and $\hat x_m = u + \hat\sigma \log(m\hat\zeta_u)$ for $\hat\xi = 0$.

Recall that $\hat x_m$ is the estimated level that is exceeded on average once every $m$ observations.


4. EXTREMES OF DEPENDENT SEQUENCES

In the models studied so far, the observations are assumed to come from a sequence of independent random variables. In real applications this assumption is often unrealistic, because some dependence over time is observed. For example, wind speed records naturally show high positive correlation between consecutive hourly observations. The next figure shows the autocorrelation of the Schiphol wind series (hourly average wind) used in the previous sections.
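The serial dependence shown in such a figure is usually measured with the sample autocorrelation function. A minimal sketch using the standard (biased) estimator; the short series in the usage lines is hypothetical, not the Schiphol data.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation r_k for lags k = 0..max_lag (standard biased estimator)."""
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("need 0 <= max_lag < len(series)")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev) / n      # lag-0 autocovariance
    if c0 == 0:
        raise ValueError("series is constant")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]


# Usage on a hypothetical, slowly varying hourly series: neighbouring hours correlate strongly.
hourly = [5, 6, 7, 8, 9, 9, 8, 7, 6, 5, 5, 6, 7, 8, 9, 9, 8, 7, 6, 5]
print([round(r, 2) for r in autocorrelation(hourly, 3)])
```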

We now study a generalization of a sequence of independent random variables: a stationary series. Stationarity corresponds to a series whose stochastic behaviour is homogeneous through time but whose variables may be mutually dependent.

The theoretical results on which the analysis of extremes of stationary sequences is based usually assume a condition that limits the extent of long-range dependence at extreme levels, in the sense that the events $\{X_i > u\}$ and $\{X_j > u\}$ are approximately independent when the threshold $u$ is high enough and the time points $i$ and $j$ are far apart. Many physical phenomena satisfy this property. In our wind speed example, it means that a high wind today might influence the probability of an extreme wind tomorrow, perhaps because both are due to the passage of the same storm, but it is unlikely to influence the probability of an extreme wind one month later.


The following condition formalizes the notion of extreme events being near-independent if they are sufficiently distant in time.

$D(u_n)$ condition. A stationary series $X_1, X_2, \ldots$ is said to satisfy the $D(u_n)$ condition if, for all $i_1 < \cdots < i_p < j_1 < \cdots < j_q$ with $j_1 - i_p > k$,

$$\left|\Pr\{X_{i_1} \le u_n, \ldots, X_{i_p} \le u_n, X_{j_1} \le u_n, \ldots, X_{j_q} \le u_n\} - \Pr\{X_{i_1} \le u_n, \ldots, X_{i_p} \le u_n\}\Pr\{X_{j_1} \le u_n, \ldots, X_{j_q} \le u_n\}\right| \le \alpha(n, k),$$

where $\alpha(n, k_n) \to 0$ for some sequence $k_n$ such that $k_n/n \to 0$ as $n \to \infty$.

Observe that for independent sequences the difference is always 0. For the following result, the condition needs to be satisfied only for a threshold $u_n$ that increases with $n$. In this way, near-independence is ensured for extreme observations that are sufficiently far apart.

Theorem 4. Let $X_1, X_2, \ldots$ be a stationary process and $M_n = \max\{X_1, \ldots, X_n\}$. If there exist sequences of constants $\{a_n > 0\}$ and $\{b_n\}$ such that

$$\Pr\{(M_n - b_n)/a_n \le z\} \to G(z) \quad \text{as } n \to \infty,$$

where $G$ is a non-degenerate distribution function, and the $D(u_n)$ condition is satisfied with $u_n = a_n z + b_n$ for every real $z$, then $G$ is a member of the generalized extreme value family of distributions.

Observe that this result implies that, when a stationary series has limited long-range dependence at extreme levels, its maxima follow the same limit laws as in the case of independent series. Furthermore, there is a relationship between the two limit distributions.

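Theorem 5. Let $X_1, X_2, \ldots$ be a stationary process and $X_1^*, X_2^*, \ldots$ a sequence of independent variables with the same marginal distribution. Denote $M_n = \max\{X_1, \ldots, X_n\}$ and $M_n^* = \max\{X_1^*, \ldots, X_n^*\}$ (in this section $M_n^*$ is the maximum of the independent sequence, not the normalized maximum of Section 2.1). Then, under suitable regularity conditions, for sequences of constants $\{a_n > 0\}$ and $\{b_n\}$,

$$\Pr\{(M_n^* - b_n)/a_n \le z\} \to G_1(z) \quad \text{as } n \to \infty$$

if and only if

$$\Pr\{(M_n - b_n)/a_n \le z\} \to G_2(z) \quad \text{as } n \to \infty,$$

where $G_2(z) = G_1^{\theta}(z)$ for a constant $0 < \theta \le 1$.

From this relationship it follows readily that both limits have the same shape parameter $\xi$. If $G_1$ is the GEV distribution with parameters $(\mu, \sigma, \xi)$, then $G_2$ is the GEV distribution with parameters $(\mu^*, \sigma^*, \xi)$, where

when $\xi \ne 0$: $\mu^* = \mu - \dfrac{\sigma}{\xi}\left(1 - \theta^{\xi}\right)$ and $\sigma^* = \sigma\theta^{\xi}$;

when $\xi = 0$: $\mu^* = \mu + \sigma\log\theta$ and $\sigma^* = \sigma$.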
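The parameter relations above can be checked numerically by comparing $G_1(z)^\theta$ with the GEV distribution function evaluated at $(\mu^*, \sigma^*, \xi)$. The fitted values and $\theta = 0.4$ in the usage lines are hypothetical.

```python
import math


def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function (xi != 0 assumed in this sketch)."""
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0:
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def dependent_gev_params(mu, sigma, xi, theta, eps=1e-12):
    """Parameters (mu*, sigma*, xi) of G2 = G1**theta when G1 = GEV(mu, sigma, xi)."""
    if not 0.0 < theta <= 1.0:
        raise ValueError("theta must lie in (0, 1]")
    if abs(xi) < eps:
        return mu + sigma * math.log(theta), sigma, xi
    return mu - (sigma / xi) * (1.0 - theta ** xi), sigma * theta ** xi, xi


# Check G1(z)**theta == G2(z) at a few points for a hypothetical fit and theta = 0.4.
mu, sigma, xi, theta = 20.0, 2.5, 0.15, 0.4
mu_s, sigma_s, _ = dependent_gev_params(mu, sigma, xi, theta)
for z in (18.0, 22.0, 30.0):
    print(f"z = {z}: {gev_cdf(z, mu, sigma, xi) ** theta:.6f} vs {gev_cdf(z, mu_s, sigma_s, xi):.6f}")
```

Since $\theta < 1$ lowers the location (and, for $\xi > 0$, the scale), clustering makes the maximum of $n$ dependent observations stochastically smaller than that of $n$ independent ones.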

The quantity $\theta$ is called the extremal index. It can be interpreted in terms of the propensity of the process to cluster at extreme levels. Loosely,

$$\theta = (\text{limiting mean cluster size})^{-1},$$

where "limiting" is in the sense of clusters of exceedances of increasingly high thresholds.

Models for block maxima. When the $D(u_n)$ condition is satisfied, the distribution of block maxima falls in the same family of distributions as it would if the series were independent. This means that the dependence in the data can be ignored and the block maxima modelled as in the independent case. The only caveat is that, because $M_n$ has statistical properties similar to those of $M^*_{n\theta}$ (the maximum of $n\theta$ independent observations), the effective number of observations per block is reduced and the quality of the GEV approximation to the distribution of block maxima is diminished.

Threshold models. The generalized Pareto distribution remains appropriate for threshold excesses, but some changes are necessary because extremes tend to cluster, which violates the assumption of independence among individual excesses.

The most widely used method for dealing with dependent exceedances in the threshold exceedance model is declustering. This process filters the dependent observations to obtain a set of threshold excesses that are approximately independent:

[Figure: clusters of exceedances C1 to C6 above the threshold, with the gap between clusters marked]

• Use an empirical rule to define clusters of exceedances.
• Identify the maximum excess within each cluster.
• Assume the cluster maxima to be independent, with conditional excess distribution given by the generalized Pareto distribution.
• Fit the generalized Pareto distribution to the cluster maxima.
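The empirical rule for defining clusters is not specified here. As an assumption, the sketch below uses the common runs method: a cluster ends once at least `r` consecutive observations fall at or below `u` (the gap between clusters), and the run length `r` is a user choice. The short series in the usage lines is hypothetical.

```python
def runs_declustering(series, u, r):
    """Group exceedances of u into clusters; a cluster ends when at least r
    consecutive observations fall at or below u.

    Returns (cluster_maxima, n_exceedances).
    """
    cluster_maxima = []
    n_exceed = 0
    current_max = None      # maximum of the cluster currently open, if any
    gap = 0                 # consecutive non-exceedances since the last exceedance
    for x in series:
        if x > u:
            n_exceed += 1
            current_max = x if current_max is None else max(current_max, x)
            gap = 0
        elif current_max is not None:
            gap += 1
            if gap >= r:                 # a run of r values at or below u closes the cluster
                cluster_maxima.append(current_max)
                current_max = None
                gap = 0
    if current_max is not None:          # close a cluster still open at the end
        cluster_maxima.append(current_max)
    return cluster_maxima, n_exceed


# Usage on a short hypothetical series with threshold u = 4 and run length r = 2.
series = [0, 5, 6, 0, 7, 0, 0, 0, 8, 0]
maxima, n_u = runs_declustering(series, u=4, r=2)
excesses = [x - 4 for x in maxima]       # cluster-maximum excesses to be fitted by the GPD
print(maxima, n_u, excesses)             # [7, 8] 4 [3, 4]
```

Here four exceedances form two clusters; a shorter run length (`r = 1`) would split them into three.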


The return level is then estimated by

$$x_m = u + \frac{\sigma}{\xi}\left[(m\zeta_u\theta)^{\xi} - 1\right],$$

where $\sigma$ and $\xi$ are the parameters of the threshold excess generalized Pareto distribution, $\zeta_u$ is the probability of an exceedance of $u$, and $\theta$ is the extremal index.

Denoting by $n_u$ the number of exceedances above the threshold $u$ and by $n_c$ the number of clusters obtained above $u$, the parameters $\zeta_u$ and $\theta$ are estimated as

$$\hat\zeta_u = \frac{n_u}{n} \quad \text{and} \quad \hat\theta = \frac{n_c}{n_u}.$$
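Putting the estimates $\hat\zeta_u = n_u/n$ and $\hat\theta = n_c/n_u$ into the return level formula gives the sketch below. The counts and GPD parameters in the usage lines are hypothetical; $\sigma$ and $\xi$ would come from the GPD fitted to the cluster maxima.

```python
import math


def extremal_index_return_level(m, u, sigma, xi, n, n_u, n_c, eps=1e-12):
    """Return level x_m for a declustered threshold model.

    zeta_u is estimated by n_u / n and the extremal index theta by n_c / n_u;
    returns (x_m, zeta_hat, theta_hat).
    """
    if not 0 < n_c <= n_u <= n:
        raise ValueError("need 0 < n_c <= n_u <= n")
    zeta_hat = n_u / n
    theta_hat = n_c / n_u
    scale = m * zeta_hat * theta_hat
    if scale <= 1.0:
        raise ValueError("m * zeta_u * theta must exceed 1 so that x_m > u")
    if abs(xi) < eps:
        x_m = u + sigma * math.log(scale)
    else:
        x_m = u + (sigma / xi) * (scale ** xi - 1.0)
    return x_m, zeta_hat, theta_hat


# Hypothetical counts for hourly data: 500 exceedances forming 125 clusters.
x50, zeta_hat, theta_hat = extremal_index_return_level(
    m=50 * 365 * 24, u=15.0, sigma=2.0, xi=-0.05, n=100_000, n_u=500, n_c=125)
print(f"zeta_hat = {zeta_hat}, theta_hat = {theta_hat}, 50-year level = {x50:.2f}")
```

For a fixed fitted GPD, a smaller $\hat\theta$ (stronger clustering) gives a lower $m$-observation return level than treating every exceedance as independent.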


Conclusions

Extreme value theory is a statistical discipline focused on describing the unusual rather than the usual. Its objective is to quantify the stochastic behaviour of a process at unusually large levels. By definition, extreme values are observed very infrequently. Furthermore, the objective of an extreme value analysis is to estimate the probabilities of events more extreme than any that have already been observed.

The extreme value paradigm. Model extrapolation is based on using mathematical limits as finite-level approximations. One main objection is that it is implicitly assumed that the underlying stochastic mechanism of the process being modelled is smooth enough to enable extrapolation to unobserved levels.

However, this is the most credible alternative proposed to date.

It is important to be aware of the limitations implied by adopting the extreme value paradigm:

1. Models are developed using asymptotic arguments, so care is needed in treating them as exact results for finite samples.

2. The models are derived under idealized circumstances.

3. The models can lead to a waste of information when implemented in practice.

Although the GEV model is supported by mathematical argument, its use in extrapolation is based on unverifiable assumptions, and measures of uncertainty on return levels should properly be regarded as lower bounds that could be much greater if the uncertainty due to model correctness were taken into account.



In this talk, the classical block maxima models for extremes, as well as threshold excess models, are introduced and illustrated using real wind data.
