
Springer Series in Operations Research

and Financial Engineering

Michael Falk

Multivariate
Extreme
Value Theory
and D-Norms
Springer Series in Operations Research
and Financial Engineering

Series Editors
Thomas V. Mikosch
Sidney I. Resnick
Stephen M. Robinson
More information about this series at http://www.springer.com/series/3182
Michael Falk

Multivariate Extreme Value


Theory and D-Norms

Michael Falk
Fakultät für Mathematik und Informatik
Universität Würzburg
Würzburg, Germany

ISSN 1431-8598 ISSN 2197-1773 (electronic)


Springer Series in Operations Research and Financial Engineering
ISBN 978-3-030-03818-2 ISBN 978-3-030-03819-9 (eBook)
https://doi.org/10.1007/978-3-030-03819-9

Library of Congress Control Number: 2018965201

Mathematics Subject Classification: 60E05, 60G70, 62H05

© Springer Nature Switzerland AG 2019


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, express or implied, with respect to the material contained herein or for any errors
or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims
in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
‘We do not want to calculate,
we want to reveal structures.’

- David Hilbert, 1930 -

This book is dedicated to my teacher, Rolf Reiss.


. . . you were the one who had made it so clear. . .
George Harrison, All Those Years Ago
Preface

Multivariate extreme value theory (MEVT) is the appropriate toolbox for analyzing several extremal events simultaneously. However, MEVT is by no means easy to access; its key results are formulated in a measure-theoretic setup in which a common thread is not visible.

Writing the 'angular measure' in MEVT in terms of a random vector, however, provides the missing common thread: every result in MEVT, every relevant probability distribution, be it a max-stable one or a generalized Pareto distribution, every relevant copula, every tail dependence coefficient, etc., can be formulated using a particular kind of norm on the multivariate Euclidean space, called a D-norm. Deep results, such as Takahashi's characterizations of multivariate max-stable distributions with independent or completely dependent margins, turn out to be elementary and easily seen properties of D-norms.

Norms are introduced in every basic course on mathematics as soon as the multivariate Euclidean space is introduced. The definition of an arbitrary D-norm requires only the additional knowledge of random variables and their expectations. But D-norms do not only constitute the common thread through MEVT; they are also of mathematical interest in their own right.

D-norms were first mentioned in Falk et al. (2004, equation (4.25)) and elaborated on in Falk et al. (2011, Section 4.4). However, it was recognized only later that D-norms are actually the skeleton of MEVT and that they simultaneously provide a mathematical topic that can be studied independently. This book fills that gap by compiling contemporary knowledge about D-norms and, at the same time, offers a relaxed tour through the essentials of MEVT by means of the D-norm approach.

In Chapter 1, we introduce the theory of D-norms in detail and compile contemporary knowledge. Chapter 2 presents the first links with MEVT: multivariate generalized Pareto distributions and multivariate max-stable distributions are introduced via D-norms. The particular role that copulas play in MEVT is investigated in detail in Chapter 3. D-norms can also be defined on functional spaces, as in Section 1.10. This enables a smooth approach to functional extreme value theory in Chapter 4, in particular to generalized Pareto processes and max-stable processes.

Further applications of D-norms, such as the max-characteristic function, multivariate order statistics, or multivariate records, are given in Chapter 5.

Parts of the present text were presented during a Winter Course, organized by the CRoNoS COST Action, and at a tutorial preceding the 8th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics 2015), December 2015, Senate House, University of London, UK. The tutorial was mainly directed at PhD students and Early-Stage Career Investigators.

This text was also used as a basis for several courses taught at the University of Würzburg. The material in Chapter 1 can be used to give an independent semester-long course on D-norms. The mathematical prerequisites are modest; a basic knowledge of analysis and probability theory, including Fubini's theorem, should be sufficient in most cases. Possible applications of the theory of D-norms can be taken from Chapter 5.

A one-semester course on MEVT can be based on a combination of auxiliary results for D-norms, in particular from Sections 1.1, 1.2, 1.3, 1.6, 1.7, and 1.10, with the introduction to univariate extreme value theory in Section 2.1 and the material on MEVT as developed in Sections 2.2, 2.3, and 3.1. An introduction to functional extreme value theory as provided in Sections 4.1 and 4.2 can be added. Possible applications, such as generalized max-linear models or multivariate records and champions, can be taken from Sections 4.3 and 5.3.

Views on D-norms from a functional analysis perspective and from a stochastic geometry perspective are provided in Sections 1.11 and 1.12, respectively. These mathematically more advanced sections underline in particular the aim of this book, which is to reveal mathematical structures. It is not a book on statistics.

The author greatly appreciates the constructive feedback given by his students; in particular, Emily Geske and Simon Kolb deserve to be mentioned here for their extraordinarily careful reading of the manuscript and their many helpful suggestions. The author is grateful to his PhD students Stefan Aulbach, Timo Fuller, Daniel Hofmann, Martin Hofmann, René Michel, Florian Wisheckel, and Maximilian Zott for their cooperation and numerous scientific contributions. The collaboration with Gilles Stupfler on the max-characteristic function is gratefully acknowledged.

Last, but not least, the author is particularly indebted to Sidney Resnick for pushing him very gently but persistently to write this book.

Würzburg, Germany Michael Falk


September 10, 2018
Contents

1 D-Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Norms and D-Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Examples of D-Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Takahashi’s Characterizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4 Convexity of the Set of D-Norms . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.5 When Is an Arbitrary Norm a D-Norm? . . . . . . . . . . . . . . . . . . . . 17
1.6 The Dual D-Norm Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.7 Normed Generators Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.8 Metrization of the Space of D-Norms . . . . . . . . . . . . . . . . . . . . . . . 33
1.9 Multiplication of D-Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
1.10 The Functional D-Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
1.11 D-Norms from a Functional Analysis Perspective . . . . . . . . . . . . 62
1.12 D-Norms from a Stochastic Geometry Perspective . . . . . . . . . . . 78

2 D-Norms & Multivariate Extremes . . . . . . . . . . . . . . . . . . . . . . . . 99


2.1 Univariate Extreme Value Theory . . . . . . . . . . . . . . . . . . . . . . . . . 99
2.2 Multivariate Generalized Pareto Distributions . . . . . . . . . . . . . . . 102
2.3 Multivariate Max-Stable Distributions . . . . . . . . . . . . . . . . . . . . . 107
2.4 How to Generate Max-Stable rvs . . . . . . . . . . . . . . . . . . . . . . . 120
2.5 Covariances, Range, etc. of Standard Max-Stable rvs . . . . . . . . . 125
2.6 Max-Stable Random Vectors as Generators of D-Norms . . . . . . 130

3 Copulas & Multivariate Extremes . . . . . . . . . . . . . . . . . . . . . . . . . 135


3.1 Characterizing Multivariate Domain of Attraction . . . . . . . . . . . 135
3.2 Multivariate Piecing-Together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
3.3 Copulas Not in the Domain of Attraction . . . . . . . . . . . . . . . . . . . 158


4 An Introduction to Functional Extreme Value Theory . . . . . 161


4.1 Generalized Pareto Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
4.2 Max-Stable Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
4.3 Generalized Max-Linear Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

5 Further Applications of D-Norms to Probability


& Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
5.1 Max-Characteristic Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
5.2 Multivariate Order Statistics: The Intermediate Case . . . . . . . . 205
5.3 Multivariate Records and Champions . . . . . . . . . . . . . . . . . . . . . . 213

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
1
D-Norms

This chapter is devoted to the theory of D-norms, a topic that is of mathematical interest in its own right. It aims at compiling contemporary knowledge on D-norms. For a survey of the various aspects dealt with in Chapter 1, we simply refer the reader to the table of contents of this book.

1.1 Norms and D-Norms


We start with a general definition of a norm on the $d$-dimensional Euclidean space $\mathbb{R}^d := \{x = (x_1, \dots, x_d) : x_i \in \mathbb{R},\ 1 \le i \le d\}$.

Definition 1.1.1 A function $f : \mathbb{R}^d \to [0,\infty)$ is a norm if, for all $x, y \in \mathbb{R}^d$ and $\lambda \in \mathbb{R}$, it satisfies
\[ f(x) = 0 \iff x = 0 \in \mathbb{R}^d, \tag{1.1} \]
\[ f(\lambda x) = |\lambda|\, f(x), \tag{1.2} \]
\[ f(x+y) \le f(x) + f(y). \tag{1.3} \]

Condition (1.2) is called homogeneity, and condition (1.3) is called the triangle inequality or $\Delta$-inequality, for short. A norm $f : \mathbb{R}^d \to [0,\infty)$ is typically denoted by
\[ \|x\| = f(x), \qquad x \in \mathbb{R}^d. \]
Each norm on $\mathbb{R}^d$ defines a distance, or metric, on $\mathbb{R}^d$ via
\[ d(x,y) = \|x - y\|, \qquad x, y \in \mathbb{R}^d. \]
Well-known examples of norms are the sup-norm
\[ \|x\|_\infty := \max_{1 \le i \le d} |x_i| \]
and the $L_1$-norm or Manhattan norm
\[ \|x\|_1 := \sum_{i=1}^d |x_i|, \qquad x = (x_1, \dots, x_d) \in \mathbb{R}^d. \]

The Logistic Norm

Less obvious is that
\[ \|x\|_p := \left( \sum_{i=1}^d |x_i|^p \right)^{1/p}, \qquad 1 \le p < \infty, \]
actually defines a norm for each $p \ge 1$. This is the family of logistic norms. The particular case $p = 2$ is commonly called the Euclidean norm.

Although condition (1.1) and homogeneity (1.2) are obvious, the proof of the corresponding $\Delta$-inequality is a little challenging. The inequality
\[ \left( \sum_{i=1}^d |x_i + y_i|^p \right)^{1/p} \le \left( \sum_{i=1}^d |x_i|^p \right)^{1/p} + \left( \sum_{i=1}^d |y_i|^p \right)^{1/p} \]
is known as the Minkowski inequality. We do not establish this inequality here, as it follows from the fact that each logistic norm $\|\cdot\|_p$ is a D-norm; see Proposition 1.2.1.

The next result shows that the $p$-norms are pointwise decreasing in $p$, with pointwise limit the sup-norm $\|\cdot\|_\infty$ as $p \to \infty$. As a consequence, $\|\cdot\|_\infty$ can be added to the family of logistic norms, which then satisfies pointwise
\[ \|\cdot\|_\infty \le \|\cdot\|_p \le \|\cdot\|_1. \]
We see in (1.4) that these inequalities are maintained by the set of D-norms.

Lemma 1.1.2 We have, for $1 \le p \le q \le \infty$ and $x \in \mathbb{R}^d$,
(i) $\|x\|_p \ge \|x\|_q$,
(ii) $\lim_{p\to\infty} \|x\|_p = \|x\|_\infty$.

Proof. (i) This inequality is obvious for $q = \infty$: $\|x\|_\infty \le \left( \sum_{i=1}^d |x_i|^p \right)^{1/p}$. Consider now $1 \le p \le q < \infty$ and choose $x \ne 0 \in \mathbb{R}^d$. Put $S := \|x\|_p$. Then, we have
\[ \left\| \frac{x}{S} \right\|_p = 1, \]
and we have to establish
\[ \left\| \frac{x}{S} \right\|_q \le 1. \]
From
\[ \frac{|x_i|}{S} \in [0,1] \]
and, thus,
\[ \left( \frac{|x_i|}{S} \right)^q \le \left( \frac{|x_i|}{S} \right)^p, \qquad 1 \le i \le d, \]
we obtain
\[ \left\| \frac{x}{S} \right\|_q = \left( \sum_{i=1}^d \left( \frac{|x_i|}{S} \right)^q \right)^{1/q} \le \left( \sum_{i=1}^d \left( \frac{|x_i|}{S} \right)^p \right)^{1/q} = \left\| \frac{x}{S} \right\|_p^{p/q} = 1^{p/q} = 1, \]
which is (i).
(ii) We have, moreover, for $x \ne 0 \in \mathbb{R}^d$ and $p \in [1,\infty)$,
\[ \|x\|_\infty \le \|x\|_p = \|x\|_\infty \left( \sum_{i=1}^d \left( \frac{|x_i|}{\|x\|_\infty} \right)^p \right)^{1/p} \le d^{1/p}\, \|x\|_\infty \to_{p\to\infty} \|x\|_\infty, \]
which implies (ii). □
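Both parts of Lemma 1.1.2 are easy to observe numerically. The following is a minimal sketch, assuming NumPy is available; the function name is ours, not the book's:

```python
import numpy as np

def p_norm(x, p):
    """Logistic (p-)norm ||x||_p for finite p >= 1."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -1.0, 2.0])
for p in [1, 2, 4, 8, 16, 64]:
    print(p, p_norm(x, p))             # decreases in p ...
print("sup-norm:", np.max(np.abs(x)))  # ... towards ||x||_inf = 3
```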


Norms by Quadratic Forms

Let $A = (a_{ij})_{1\le i,j\le d}$ be a positive definite $d \times d$ matrix, i.e., the matrix $A$ is symmetric, $A = A^\top = (a_{ji})_{1\le i,j\le d}$, and satisfies
\[ x^\top A x = \sum_{1\le i,j\le d} x_i a_{ij} x_j > 0, \qquad x \in \mathbb{R}^d,\ x \ne 0 \in \mathbb{R}^d. \]
Then,
\[ \|x\|_A := \left( x^\top A x \right)^{1/2}, \qquad x \in \mathbb{R}^d, \]
defines a norm on $\mathbb{R}^d$. With
\[ A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \]
we obtain, for example, $\|x\|_A = (x_1^2 + x_2^2)^{1/2} = \|x\|_2$.

Conditions (1.1) and (1.2) are obviously satisfied. Write $A = A^{1/2} A^{1/2}$, where $A^{1/2}$ is the symmetric root of $A$. The ordinary Cauchy–Schwarz inequality $(x^\top y)^2 \le (x^\top x)(y^\top y)$, $x, y \in \mathbb{R}^d$, implies
\[ \left( x^\top A y \right)^2 = \left( (A^{1/2}x)^\top (A^{1/2}y) \right)^2 \le \left( x^\top A x \right)\left( y^\top A y \right). \]
The $\Delta$-inequality is a consequence:
\begin{align*}
\|x+y\|_A^2 &= (x+y)^\top A (x+y) \\
&= x^\top A x + y^\top A x + x^\top A y + y^\top A y \\
&\le x^\top A x + 2\left( x^\top A x \right)^{1/2} \left( y^\top A y \right)^{1/2} + y^\top A y \\
&= \left( \left( x^\top A x \right)^{1/2} + \left( y^\top A y \right)^{1/2} \right)^2.
\end{align*}

Definition of D-Norms

The following result introduces D-norms.

Lemma 1.1.3 (D-Norms) Let $Z = (Z_1, \dots, Z_d)$ be a random vector (rv) whose components satisfy
\[ Z_i \ge 0, \qquad E(Z_i) = 1, \qquad 1 \le i \le d. \]
Then,
\[ \|x\|_D := E\left( \max_{1\le i\le d} \left( |x_i|\, Z_i \right) \right), \qquad x \in \mathbb{R}^d, \]
defines a norm, called a D-norm, and $Z$ is called a generator of this D-norm $\|\cdot\|_D$.

Proof. The homogeneity condition (1.2) is obviously satisfied. Further, we have the bounds
\[ \|x\|_\infty = \max_{1\le i\le d} |x_i| = \max_{1\le i\le d} E(|x_i|\, Z_i) \le E\left( \max_{1\le i\le d} (|x_i|\, Z_i) \right) \le E\left( \sum_{i=1}^d |x_i|\, Z_i \right) = \sum_{i=1}^d |x_i|\, E(Z_i) = \|x\|_1, \qquad x \in \mathbb{R}^d, \]
i.e.,
\[ \|x\|_\infty \le \|x\|_D \le \|x\|_1, \qquad x \in \mathbb{R}^d. \tag{1.4} \]
This implies condition (1.1). The $\Delta$-inequality is easily seen:
\begin{align*}
\|x+y\|_D &= E\left( \max_{1\le i\le d} (|x_i + y_i|\, Z_i) \right) \\
&\le E\left( \max_{1\le i\le d} ((|x_i| + |y_i|)\, Z_i) \right) \\
&\le E\left( \max_{1\le i\le d} (|x_i|\, Z_i) + \max_{1\le i\le d} (|y_i|\, Z_i) \right) \\
&= E\left( \max_{1\le i\le d} (|x_i|\, Z_i) \right) + E\left( \max_{1\le i\le d} (|y_i|\, Z_i) \right) \\
&= \|x\|_D + \|y\|_D. \qquad \square
\end{align*}
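Since a D-norm is an expectation, it can be approximated by Monte Carlo simulation whenever the generator $Z$ can be sampled. The following is a minimal sketch, assuming NumPy; all names are illustrative, not from the book:

```python
import numpy as np

def d_norm_mc(x, sample_generator, n=100_000, rng=None):
    """Monte Carlo estimate of ||x||_D = E(max_i |x_i| Z_i).

    sample_generator(n, rng) must return an (n, d) array whose rows are
    independent copies of a generator Z (componentwise >= 0, mean 1).
    """
    rng = rng or np.random.default_rng(0)
    Z = sample_generator(n, rng)                 # shape (n, d)
    return np.mean(np.max(np.abs(x) * Z, axis=1))

# Example: iid standard exponential components form a valid generator
# (each Z_i >= 0 with E(Z_i) = 1).
gen = lambda n, rng: rng.exponential(size=(n, 3))
x = np.array([1.0, 2.0, 3.0])
print(d_norm_mc(x, gen))  # lies between ||x||_inf = 3 and ||x||_1 = 6, cf. (1.4)
```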

Basic Properties of D-Norms

Denote by $e_j = (0, \dots, 0, 1, 0, \dots, 0) \in \mathbb{R}^d$ the $j$-th unit vector in $\mathbb{R}^d$, $1 \le j \le d$. Each D-norm satisfies
\[ \|e_j\|_D = E\left( \max_{1\le i\le d} (\delta_{ij} Z_i) \right) = E(Z_j) = 1, \]
where $\delta_{ij} = 1$ if $i = j$ and zero elsewhere, i.e., each D-norm is standardized.

Each D-norm is monotone as well, i.e., for $0 \le x \le y$, where this inequality is taken componentwise, we have
\[ \|x\|_D = E\left( \max_{1\le i\le d} (x_i Z_i) \right) \le E\left( \max_{1\le i\le d} (y_i Z_i) \right) = \|y\|_D. \]

Note that there are norms that are not monotone: choose, for example,
\[ A = \begin{pmatrix} 1 & \delta \\ \delta & 1 \end{pmatrix} \]
with $\delta \in (-1, 0)$. The matrix $A$ is positive definite, but the norm $\|x\|_A = (x^\top A x)^{1/2} = (x_1^2 + 2\delta x_1 x_2 + x_2^2)^{1/2}$ is not monotone; put, for example, $\delta = -1/2$ and set $x_1 = 1$, $x_2 = 0$, $y_1 = 1$, and $y_2 = 1/2$. Then, $x \le y$, but
\[ \|x\|_A = 1 > \|y\|_A = \sqrt{3}/2. \]

Each D-norm is obviously radially symmetric, i.e., changing the sign of arbitrary components of $x \in \mathbb{R}^d$ does not alter the value of $\|x\|_D$. This means that the values of a D-norm are completely determined by its values on the subset $\{x \in \mathbb{R}^d : x \ge 0\}$. The above norm $\|\cdot\|_A$, for example, is in general not radially symmetric.
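The non-monotonicity of $\|\cdot\|_A$ for $\delta = -1/2$ can be checked directly; a quick numerical sketch (NumPy, illustrative only):

```python
import numpy as np

A = np.array([[1.0, -0.5],
              [-0.5, 1.0]])     # positive definite, delta = -1/2

def norm_A(x):
    return float(np.sqrt(x @ A @ x))

x = np.array([1.0, 0.0])
y = np.array([1.0, 0.5])        # x <= y componentwise
print(norm_A(x), norm_A(y))     # 1.0 > sqrt(3)/2 ~ 0.866: not monotone
```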
1.2 Examples of D-Norms

Choose the constant generator $Z := (1, 1, \dots, 1)$. Then,
\[ \|x\|_D = E\left( \max_{1\le i\le d} (|x_i|\, Z_i) \right) = \max_{1\le i\le d} |x_i| = \|x\|_\infty, \]
i.e., the sup-norm is a D-norm.

Let $X \ge 0$ be an rv with $E(X) = 1$ and put $Z := (X, X, \dots, X)$. Then, $Z$ is a generator of the D-norm
\[ \|x\|_D = E\left( \max_{1\le i\le d} (|x_i|\, Z_i) \right) = E\left( \max_{1\le i\le d} (|x_i|\, X) \right) = \max_{1\le i\le d} (|x_i|)\, E(X) = \|x\|_\infty. \tag{1.5} \]
This example shows that the generator of a D-norm is in general not uniquely determined; even its distribution is not.

Now let $Z$ be a random permutation of $(d, 0, \dots, 0) \in \mathbb{R}^d$ with equal probability $1/d$, i.e.,
\[ Z_i = \begin{cases} d & \text{with probability } 1/d, \\ 0 & \text{with probability } 1 - 1/d, \end{cases} \qquad 1 \le i \le d, \]
and $Z_1 + \dots + Z_d = d$. In what follows, we use $1(Z_j = d)$ to denote the indicator function of the event $Z_j = d$, i.e., $1(Z_j = d) = 1$ if $Z_j = d$ and $1(Z_j = d) = 0$ elsewhere. The rv $Z$ is the generator of a D-norm:
\begin{align*}
\|x\|_D &= E\left( \max_{1\le i\le d} (|x_i|\, Z_i) \right) = E\left( \max_{1\le i\le d} (|x_i|\, Z_i) \sum_{j=1}^d 1(Z_j = d) \right) = E\left( \sum_{j=1}^d \max_{1\le i\le d} (|x_i|\, Z_i)\, 1(Z_j = d) \right) \\
&= E\left( \sum_{j=1}^d |x_j|\, d\, 1(Z_j = d) \right) = \sum_{j=1}^d |x_j|\, d\, E(1(Z_j = d)) = \sum_{j=1}^d |x_j|\, d\, P(Z_j = d) = \sum_{j=1}^d |x_j| = \|x\|_1,
\end{align*}
i.e., $\|\cdot\|_1$ is a D-norm as well.

Inequality (1.4) shows that the sup-norm $\|\cdot\|_\infty$ is the smallest D-norm and that $\|\cdot\|_1$ is the largest D-norm.
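The random-permutation generator of $\|\cdot\|_1$ is straightforward to simulate: draw the index of the nonzero component uniformly. A minimal sketch (NumPy, illustrative):

```python
import numpy as np

d, n = 4, 200_000
rng = np.random.default_rng(1)

# Z is a random permutation of (d, 0, ..., 0): one uniformly chosen
# component equals d, all others are 0, so E(Z_i) = 1 for each i.
J = rng.integers(0, d, size=n)
Z = np.zeros((n, d))
Z[np.arange(n), J] = d

x = np.array([0.5, 1.0, 2.0, 4.0])
est = np.mean(np.max(np.abs(x) * Z, axis=1))
print(est, np.sum(np.abs(x)))   # both close to ||x||_1 = 7.5
```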

Each Logistic Norm is a D-Norm

Using $\Gamma(s) = \int_0^\infty t^{s-1} \exp(-t)\, dt$, $s > 0$, we denote the usual gamma function.

Proposition 1.2.1 Each logistic norm $\|x\|_p = \left( \sum_{i=1}^d |x_i|^p \right)^{1/p}$, $1 \le p \le \infty$, is a D-norm. For $1 < p < \infty$, a generator is given by
\[ Z^{(p)} = \left( Z_1^{(p)}, \dots, Z_d^{(p)} \right) := \left( \frac{X_1^{1/p}}{\Gamma(1 - p^{-1})}, \dots, \frac{X_d^{1/p}}{\Gamma(1 - p^{-1})} \right), \]
where $X_1, \dots, X_d$ are independent and identically distributed (iid) standard Fréchet rvs, i.e.,
\[ P(X_i \le x) = \exp\left( -x^{-1} \right), \qquad x > 0, \quad i = 1, \dots, d, \]
with $E\left( X_i^{1/p} \right) = \Gamma\left( 1 - p^{-1} \right)$, $1 \le i \le d$.

From Lemma 1.1.2, we know that $\|\cdot\|_p \to_{p\to\infty} \|\cdot\|_\infty$ pointwise. We have, moreover, $\Gamma(1 - p^{-1}) \to_{p\to\infty} \Gamma(1) = 1$ and, consequently, we also have pointwise convergence almost surely (a.s.):
\[ Z^{(p)} = \left( \frac{X_1^{1/p}}{\Gamma(1 - p^{-1})}, \dots, \frac{X_d^{1/p}}{\Gamma(1 - p^{-1})} \right) \to_{p\to\infty} (1, \dots, 1) \quad \text{a.s.}, \]
where the constant $Z = (1, \dots, 1) \in \mathbb{R}^d$ is a generator of the sup-norm $\|\cdot\|_\infty$. Note that, on the other hand, $\Gamma(1 - p^{-1}) \to_{p\downarrow 1} \infty$ and, thus,
\[ Z^{(p)} = \left( Z_1^{(p)}, \dots, Z_d^{(p)} \right) \to_{p\downarrow 1} 0 \in \mathbb{R}^d, \]
which is not a generator of a D-norm. This is an example where pointwise convergence of D-norms does not entail convergence of their generators. However, the reverse implication is correct; see Corollary 5.1.13.

Before we prove Proposition 1.2.1, we state a tool that is often used in this book.

Lemma 1.2.2 Let $X$ be an rv with $X \ge 0$ a.s. Then, its (possibly infinite) expectation can be written as
\[ E(X) = \int_0^\infty P(X > t)\, dt = \int_0^\infty 1 - F(t)\, dt, \]
where $F$ is the distribution function (df) that corresponds to $X$. If $X \le 0$ a.s., then
\[ E(X) = -\int_{-\infty}^0 F(t)\, dt. \]
Proof. Suppose $X \ge 0$ a.s. Using $1_A(t)$, we denote the indicator function of a set $A$, i.e., $1_A(t) = 1$ if $t \in A$ and zero elsewhere. Fubini's theorem implies
\begin{align*}
E(X) &= \int_0^\infty x\, F(dx) \\
&= \int_0^\infty \int_0^\infty 1_{[0,x)}(t)\, dt\, F(dx) \\
&= \int_0^\infty \int_0^\infty 1_{[0,x)}(t)\, F(dx)\, dt \\
&= \int_0^\infty E\left( 1_{[0,X)}(t) \right) dt = \int_0^\infty P(X > t)\, dt.
\end{align*}
If $X \le 0$ a.s., then $-X \ge 0$ a.s. and
\[ E(X) = -E(-X) = -\int_0^\infty P(-X > t)\, dt = -\int_{-\infty}^0 F(t)\, dt \]
by elementary arguments. □

An immediate consequence of the preceding result is, for example, the following conclusion: if the rv $X$ satisfies $X \ge 0$ a.s. and $E(X) = 0$, then $X = 0$ a.s. This conclusion is often used in what follows as well.

Proof (of Proposition 1.2.1). Check that $\mu := E\left( X_1^{1/p} \right) = \Gamma(1 - p^{-1})$ (or see equation (2.23)). From Lemma 1.2.2, we obtain
\begin{align*}
E\left( \max_{1\le i\le d} (|x_i|\, Z_i) \right) &= \int_0^\infty P\left( \max_{1\le i\le d} (|x_i|\, Z_i) > t \right) dt \\
&= \int_0^\infty 1 - P\left( \max_{1\le i\le d} (|x_i|\, Z_i) \le t \right) dt \\
&= \int_0^\infty 1 - P\left( Z_i \le \frac{t}{|x_i|},\ 1 \le i \le d \right) dt \\
&= \int_0^\infty 1 - \prod_{i=1}^d P\left( Z_i \le \frac{t}{|x_i|} \right) dt \\
&= \int_0^\infty 1 - \prod_{i=1}^d \exp\left( -\left( \frac{|x_i|}{t\mu} \right)^p \right) dt \\
&= \int_0^\infty 1 - \exp\left( - \frac{\sum_{i=1}^d |x_i|^p}{(t\mu)^p} \right) dt.
\end{align*}
The substitution $t \mapsto t \left( \sum_{i=1}^d |x_i|^p \right)^{1/p} / \mu$ implies that the integral above equals
\begin{align*}
\frac{\left( \sum_{i=1}^d |x_i|^p \right)^{1/p}}{\mu} \int_0^\infty 1 - \exp\left( -\frac{1}{t^p} \right) dt
&= \frac{\|x\|_p}{E\left( X_1^{1/p} \right)} \int_0^\infty P\left( X_1^{1/p} > t \right) dt \\
&= \frac{\|x\|_p}{E\left( X_1^{1/p} \right)}\, E\left( X_1^{1/p} \right) \\
&= \|x\|_p. \qquad \square
\end{align*}
The Hüsler–Reiss D-Norm

Let the rv $X = (X_1, \dots, X_d)$ follow a multivariate normal distribution with mean vector zero, i.e., $E(X_i) = 0$, $1 \le i \le d$, and covariance matrix $\Sigma = (\sigma_{ij})_{1\le i,j\le d} = (E(X_i X_j))_{1\le i,j\le d}$. Then, $\exp(X_i)$ follows a log-normal distribution with mean $\exp(\sigma_{ii}/2)$, $1 \le i \le d$, and thus,
\[ Z = (Z_1, \dots, Z_d) := \left( \exp\left( X_1 - \frac{\sigma_{11}}{2} \right), \dots, \exp\left( X_d - \frac{\sigma_{dd}}{2} \right) \right) \tag{1.6} \]
is the generator of a D-norm, called a Hüsler–Reiss D-norm. This norm depends only on the covariance matrix $\Sigma$ and is therefore denoted by $\|\cdot\|_{HR_\Sigma}$.

In the special case where $X$ is a finite-dimensional margin of a Brownian motion $(B_t)_{t\ge 0}$, i.e., $X = (B_{t_1}, \dots, B_{t_d})$, $0 \le t_1 < \dots < t_d$, we compute the bivariate projections
\[ \|(x, y)\|_{HR_{\Sigma_{ij}}} := E(\max(|x|\, Z_i, |y|\, Z_j)), \qquad x, y \in \mathbb{R}, \]
in Lemma 1.10.6. For more general computations, we refer to Krupskii et al. (2018) and the literature cited therein.

Recall that an arbitrary positive semidefinite $d \times d$ matrix $\Sigma$ is the covariance matrix of a multivariate normally distributed rv $X = (X_1, \dots, X_d)$ with zero means. With the corresponding Hüsler–Reiss D-norm $\|\cdot\|_{HR_\Sigma}$, we have thus defined a mapping from the set of positive semidefinite $d \times d$ matrices into the set of D-norms on $\mathbb{R}^d$. This mapping is, however, not one-to-one: denote by $E$ the $d \times d$ matrix with constant entry one. Then, the two matrices $\Sigma$ and $\Sigma + \lambda E$, with an arbitrary number $\lambda > 0$, generate the same Hüsler–Reiss D-norm $\|\cdot\|_{HR_\Sigma}$; see Example 1.9.2.
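Sampling the Hüsler–Reiss generator (1.6) only requires multivariate normal simulation. A minimal sketch (NumPy; the covariance matrix below is an arbitrary illustrative choice, not from the book):

```python
import numpy as np

rng = np.random.default_rng(3)
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])             # illustrative covariance

n = 200_000
X = rng.multivariate_normal(mean=np.zeros(2), cov=Sigma, size=n)
Z = np.exp(X - np.diag(Sigma) / 2)         # generator (1.6); E(Z_i) = 1

print(Z.mean(axis=0))                      # ~ (1, 1)
x = np.array([1.0, 1.0])
print(np.mean(np.max(x * Z, axis=1)))      # MC value of ||(1,1)||_{HR_Sigma}
```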
Generators with Uniquely Determined Distribution

We have already seen that neither the generator of a D-norm nor its distribution is uniquely determined. Actually, if $Z = (Z_1, \dots, Z_d)$ is the generator of a D-norm, then $ZX := (Z_1 X, Z_2 X, \dots, Z_d X)$ generates the same D-norm if $X$ is an rv with $X \ge 0$ and $E(X) = 1$ that is independent of $Z$. This is easy to see and closely related to the multiplication of D-norms as defined in Section 1.9.

The distribution of a generator $Z = (Z_1, \dots, Z_d)$ whose first component is the constant one, however, is uniquely determined. This is an obvious consequence of Lemma 5.1.1. An inspection of its proof shows that this conclusion remains valid if $Z_i = 1$ for some $i \in \{1, \dots, d\}$, i.e., we have the following result. We state it here for the sake of completeness.

Lemma 1.2.3 Let $Z^{(1)} = \left( Z_1^{(1)}, \dots, Z_d^{(1)} \right)$ and $Z^{(2)} = \left( Z_1^{(2)}, \dots, Z_d^{(2)} \right)$ be generators of the same D-norm $\|\cdot\|_D$ on $\mathbb{R}^d$ with the property that, for some index $i \in \{1, \dots, d\}$, we have $Z_i^{(1)} = 1 = Z_i^{(2)}$. Then, the distributions of $Z^{(1)}$ and $Z^{(2)}$ coincide.

1.3 Takahashi's Characterizations

The following result shows that the two extremal D-norms $\|\cdot\|_\infty$ and $\|\cdot\|_1$ are already characterized by their values at just one point.

Theorem 1.3.1 (Takahashi (1987, 1988)) Let $\|\cdot\|_D$ be an arbitrary D-norm on $\mathbb{R}^d$. Then, we have the equivalences
(i) $\|\cdot\|_D = \|\cdot\|_1 \iff \exists\, y > 0 \in \mathbb{R}^d : \|y\|_D = \|y\|_1$,
(ii) $\|\cdot\|_D = \|\cdot\|_\infty \iff \|\mathbf{1}\|_D = 1$.

Corollary 1.3.2 For an arbitrary D-norm $\|\cdot\|_D$ on $\mathbb{R}^d$, we have
\[ \|\cdot\|_D = \begin{cases} \|\cdot\|_\infty \\ \|\cdot\|_1 \end{cases} \iff \|\mathbf{1}\|_D = \begin{cases} 1, \\ d. \end{cases} \]

Proof. To prove Theorem 1.3.1, we only have to show the implication "⇐". Let $(Z_1, \dots, Z_d)$ be a generator of $\|\cdot\|_D$.
(i) Suppose we have $\|y\|_D = \|y\|_1$ for some $y > 0 \in \mathbb{R}^d$, i.e.,
\[ E\left( \max_{1\le i\le d} (y_i Z_i) \right) = \sum_{i=1}^d y_i = \sum_{i=1}^d y_i E(Z_i) = E\left( \sum_{i=1}^d y_i Z_i \right). \]
This entails
\[ E\left( \sum_{i=1}^d y_i Z_i \right) - E\left( \max_{1\le i\le d} (y_i Z_i) \right) = E\Bigg( \underbrace{\sum_{i=1}^d y_i Z_i - \max_{1\le i\le d} (y_i Z_i)}_{\ge 0} \Bigg) = 0 \]
\[ \Rightarrow\ \sum_{i=1}^d y_i Z_i - \max_{1\le i\le d} (y_i Z_i) = 0 \quad \text{a.s.} \qquad \Rightarrow\ \sum_{i=1}^d y_i Z_i = \max_{1\le i\le d} (y_i Z_i) \quad \text{a.s.} \]
Recall that $y_i > 0$ for all $i$. Hence, $Z_i > 0$ for some $i \in \{1, \dots, d\}$ implies $Z_j = 0$ for all $j \ne i$. Thus, we have, for arbitrary $x \ge 0 \in \mathbb{R}^d$,
\[ \sum_{i=1}^d x_i Z_i = \max_{1\le i\le d} (x_i Z_i) \quad \text{a.s.} \quad \Rightarrow\ E\left( \sum_{i=1}^d x_i Z_i \right) = E\left( \max_{1\le i\le d} (x_i Z_i) \right) \quad \Rightarrow\ \|x\|_1 = \|x\|_D. \]
(ii) We have the following list of conclusions:
\begin{align*}
\|(1, \dots, 1)\|_D = 1 &\Rightarrow E\left( \max_{1\le i\le d} Z_i \right) = E(Z_j), \quad 1 \le j \le d, \\
&\Rightarrow E\Bigg( \underbrace{\max_{1\le i\le d} Z_i - Z_j}_{\ge 0} \Bigg) = 0, \quad 1 \le j \le d, \\
&\Rightarrow \max_{1\le i\le d} Z_i - Z_j = 0 \ \text{a.s.}, \quad 1 \le j \le d, \\
&\Rightarrow Z_1 = Z_2 = \dots = Z_d = \max_{1\le i\le d} Z_i \ \text{a.s.} \\
&\Rightarrow E\left( \max_{1\le i\le d} (|x_i|\, Z_i) \right) = E\left( \max_{1\le i\le d} (|x_i|)\, Z_1 \right) = \|x\|_\infty\, E(Z_1) = \|x\|_\infty, \qquad x \in \mathbb{R}^d. \qquad \square
\end{align*}
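Corollary 1.3.2 can be illustrated numerically: $\|\mathbf{1}\|_D$ equals $1$ exactly for $\|\cdot\|_\infty$, equals $d$ exactly for $\|\cdot\|_1$, and lies strictly in between otherwise. A quick sketch (NumPy; generators as in Section 1.2):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 3, 300_000

def one_vector_norm(Z):
    return np.mean(np.max(Z, axis=1))       # ||1||_D = E(max_i Z_i)

Z_const = np.ones((n, d))                   # generates ||.||_inf
J = rng.integers(0, d, size=n)              # permutation generator of ||.||_1
Z_perm = np.zeros((n, d)); Z_perm[np.arange(n), J] = d
Z_exp = rng.exponential(size=(n, d))        # iid exponential components

print(one_vector_norm(Z_const))             # = 1      -> ||.||_D = ||.||_inf
print(one_vector_norm(Z_perm))              # = d = 3  -> ||.||_D = ||.||_1
print(one_vector_norm(Z_exp))               # ~ 1.83, strictly between 1 and 3
```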



Sequences of D-Norms

Theorem 1.3.1 can easily be generalized to sequences of D-norms.

Theorem 1.3.3 Let $\|\cdot\|_{D_n}$, $n \in \mathbb{N}$, be a sequence of D-norms on $\mathbb{R}^d$.
(i) $\forall x \in \mathbb{R}^d : \|x\|_{D_n} \to_{n\to\infty} \|x\|_1 \iff \exists\, y > 0 : \|y\|_{D_n} \to_{n\to\infty} \|y\|_1$,
(ii) $\forall x \in \mathbb{R}^d : \|x\|_{D_n} \to_{n\to\infty} \|x\|_\infty \iff \|\mathbf{1}\|_{D_n} \to_{n\to\infty} 1$.

Corollary 1.3.2 carries over.


 
Proof. Let $\left( Z_1^{(n)}, \dots, Z_d^{(n)} \right)$ be a generator of $\|\cdot\|_{D_n}$. Again, we only need to show the implication "⇐".
(i) We suppose $\|y\|_1 - \|y\|_{D_n} \to_{n\to\infty} 0$ for some $y > 0 \in \mathbb{R}^d$. With the notation $M_j := \left\{ y_j Z_j^{(n)} = \max_{1\le i\le d} \left( y_i Z_i^{(n)} \right) \right\}$ and $1_{M_j}$ denoting the indicator function of the event $M_j$, we get for every $j = 1, \dots, d$
\begin{align*}
\|y\|_1 - \|y\|_{D_n} &= E\Bigg( \underbrace{\sum_{i=1}^d y_i Z_i^{(n)} - \max_{1\le i\le d} \left( y_i Z_i^{(n)} \right)}_{\ge 0} \Bigg) \\
&\ge E\left( \left( \sum_{i=1}^d y_i Z_i^{(n)} - \max_{1\le i\le d} \left( y_i Z_i^{(n)} \right) \right) 1_{M_j} \right) \\
&= E\Bigg( \sum_{\substack{i=1 \\ i \ne j}}^d y_i Z_i^{(n)}\, 1_{M_j} \Bigg) = \sum_{\substack{i=1 \\ i \ne j}}^d y_i\, E\left( Z_i^{(n)} 1_{M_j} \right) \to_{n\to\infty} 0,
\end{align*}
since the left-hand side of this inequality converges to zero by assumption and the right-hand side is non-negative. As $y_i > 0$ for all $i = 1, \dots, d$, we have
\[ E\left( Z_i^{(n)} 1_{M_j} \right) \to_{n\to\infty} 0 \tag{1.7} \]
for all $i \ne j$. Choose an arbitrary $x \in \mathbb{R}^d$. From inequality (1.4) we know that
\begin{align*}
0 \le \|x\|_1 - \|x\|_{D_n} &= E\Bigg( \underbrace{\sum_{i=1}^d |x_i|\, Z_i^{(n)} - \max_{1\le i\le d} \left( |x_i|\, Z_i^{(n)} \right)}_{\ge 0} \Bigg) \\
&\le E\left( \sum_{j=1}^d \left( \sum_{i=1}^d |x_i|\, Z_i^{(n)} - \max_{1\le i\le d} \left( |x_i|\, Z_i^{(n)} \right) \right) 1_{M_j} \right) \\
&= \sum_{j=1}^d E\left( \left( \sum_{i=1}^d |x_i|\, Z_i^{(n)} - \max_{1\le i\le d} \left( |x_i|\, Z_i^{(n)} \right) \right) 1_{M_j} \right) \\
&\le \sum_{j=1}^d \sum_{\substack{i=1 \\ i \ne j}}^d |x_i|\, E\left( Z_i^{(n)} 1_{M_j} \right) \to_{n\to\infty} 0 \quad \text{by (1.7)},
\end{align*}
which implies
\[ \|x\|_{D_n} \to_{n\to\infty} \|x\|_1, \qquad x \in \mathbb{R}^d. \]
(ii) We use inequality (1.4) and obtain
\begin{align*}
0 \le \|x\|_{D_n} - \|x\|_\infty &= E\left( \max_{1\le i\le d} \left( |x_i|\, Z_i^{(n)} \right) \right) - \max_{1\le i\le d} |x_i| \\
&\le \max_{1\le i\le d} |x_i|\, E\left( \max_{1\le i\le d} Z_i^{(n)} \right) - \max_{1\le i\le d} |x_i| \\
&= \|x\|_\infty \left( E\left( \max_{1\le i\le d} Z_i^{(n)} \right) - 1 \right) = \|x\|_\infty \left( \|\mathbf{1}\|_{D_n} - 1 \right) \to_{n\to\infty} 0. \qquad \square
\end{align*}
Characterizations by Bivariate Projections

Let $\|\cdot\|_D$ be an arbitrary D-norm on $\mathbb{R}^d$ with generator $Z = (Z_1, \dots, Z_d)$. Then,
\[ \|(x, y)\|_{D_{ij}} := E\left( \max(|x|\, Z_i, |y|\, Z_j) \right) = \|x e_i + y e_j\|_D \]
defines a D-norm on $\mathbb{R}^2$, for $1 \le i < j \le d$, with generator $(Z_i, Z_j)$. Recall that $e_i$ denotes the $i$-th unit vector in $\mathbb{R}^d$, $1 \le i \le d$.

The preceding results provide characterizations of the extremal D-norms $\|\cdot\|_\infty$ and $\|\cdot\|_1$ by their values at a single point in $\mathbb{R}^d$. Both can also be characterized by the corresponding families of bivariate projections $\{\|e_i + e_j\|_D : 1 \le i < j \le d\}$. This is the message of the subsequent results. The convergence of norms is meant pointwise, as in Theorem 1.3.3.

Theorem 1.3.4 Let $\|\cdot\|_{D_n}$, $n \in \mathbb{N}$, be a sequence of D-norms on $\mathbb{R}^d$. We have the equivalences
(i) $\|\cdot\|_{D_n} \to_{n\to\infty} \|\cdot\|_1 \iff \forall\, 1 \le i < j \le d : \|e_i + e_j\|_{D_n} \to_{n\to\infty} 2$,
(ii) $\|\cdot\|_{D_n} \to_{n\to\infty} \|\cdot\|_\infty \iff \exists\, i \in \{1, \dots, d\}\ \forall j \ne i : \|e_i + e_j\|_{D_n} \to_{n\to\infty} 1$.

Proof. (i) For all $1 \le i < j \le d$, we have
\begin{align*}
2 - \|e_i + e_j\|_{D_n} &= E\left( Z_i^{(n)} + Z_j^{(n)} \right) - E\left( \max\left( Z_i^{(n)}, Z_j^{(n)} \right) \right) \\
&= E\left( Z_i^{(n)} + Z_j^{(n)} - \max\left( Z_i^{(n)}, Z_j^{(n)} \right) \right) \\
&\ge E\left( \left( Z_i^{(n)} + Z_j^{(n)} - \max\left( Z_i^{(n)}, Z_j^{(n)} \right) \right) 1_{\left\{ Z_j^{(n)} = \max_{1\le k\le d} Z_k^{(n)} \right\}} \right) \\
&= E\left( Z_i^{(n)}\, 1_{\left\{ Z_j^{(n)} = \max_{1\le k\le d} Z_k^{(n)} \right\}} \right) \ge 0.
\end{align*}
Therefore, $E\left( Z_i^{(n)}\, 1_{\{ Z_j^{(n)} = \max_{1\le k\le d} Z_k^{(n)} \}} \right) \to_{n\to\infty} 0$, which is (1.7). We can repeat the steps of the preceding proof and get the desired assertion.
(ii) For our given value of $i$, we have
\begin{align*}
0 \le \|\mathbf{1}\|_{D_n} - 1 &= E\left( \max_{1\le k\le d} Z_k^{(n)} - Z_i^{(n)} \right) \\
&\le \sum_{j=1}^d E\left( \left( \max_{1\le k\le d} Z_k^{(n)} - Z_i^{(n)} \right) 1_{\left\{ Z_j^{(n)} = \max_{1\le k\le d} Z_k^{(n)} \right\}} \right) \\
&= \sum_{j=1}^d E\left( \left( \max\left( Z_i^{(n)}, Z_j^{(n)} \right) - Z_i^{(n)} \right) 1_{\left\{ Z_j^{(n)} = \max_{1\le k\le d} Z_k^{(n)} \right\}} \right) \\
&\le \sum_{j=1}^d E\left( \max\left( Z_i^{(n)}, Z_j^{(n)} \right) - Z_i^{(n)} \right) = \sum_{\substack{1\le j\le d \\ j \ne i}} \left( \|e_i + e_j\|_{D_n} - 1 \right) \to_{n\to\infty} 0,
\end{align*}
which proves the assertion according to part (ii) of Theorem 1.3.3. □

The following consequence of the preceding theorem is obvious by putting $\|\cdot\|_{D_n} = \|\cdot\|_D$.
Corollary 1.3.5 Let $\|\cdot\|_D$ be an arbitrary D-norm on $\mathbb{R}^d$. We have the characterizations:
(i) $\|\cdot\|_D = \|\cdot\|_1 \iff \forall\, 1 \le i < j \le d : \|e_i + e_j\|_D = 2 = \|e_i + e_j\|_1$,
(ii) $\|\cdot\|_D = \|\cdot\|_\infty \iff \exists\, i \in \{1, \dots, d\}\ \forall j \ne i : \|e_i + e_j\|_D = 1$.

1.4 Convexity of the Set of D-Norms

A convex combination of two D-norms is again a D-norm.

Proposition 1.4.1 The set of D-norms on $\mathbb{R}^d$ is convex, i.e., if $\|\cdot\|_{D_1}$ and $\|\cdot\|_{D_2}$ are D-norms, then
\[ \|\cdot\|_{\lambda D_1 + (1-\lambda) D_2} := \lambda \|\cdot\|_{D_1} + (1-\lambda) \|\cdot\|_{D_2} \]
is a D-norm as well for each $\lambda \in [0,1]$.

Take, for example, a convex combination of $\|\cdot\|_\infty$ and $\|\cdot\|_1$:
\[ \|x\|_{D_\lambda} := \lambda \|x\|_\infty + (1-\lambda) \|x\|_1 = \lambda \max_{1\le i\le d} |x_i| + (1-\lambda) \sum_{i=1}^d |x_i|. \tag{1.8} \]
This is the Marshall–Olkin D-norm with parameter $\lambda \in [0,1]$.


Proof (of Proposition 1.4.1). Let $\xi$ be an rv with $P(\xi = 1) = \lambda = 1 - P(\xi = 2)$ that is independent of $Z^{(1)}$ and $Z^{(2)}$, where $Z^{(1)}$ and $Z^{(2)}$ are generators of $\|\cdot\|_{D_1}$ and $\|\cdot\|_{D_2}$. Then, $Z := Z^{(\xi)}$ is a generator of $\|\cdot\|_{\lambda D_1 + (1-\lambda) D_2}$, as we have, for $x \ge 0 \in \mathbb{R}^d$,
\begin{align*}
E\left( \max_{1\le i\le d} x_i Z_i^{(\xi)} \right) &= E\left( \sum_{j=1}^2 \max_{1\le i\le d} \left( x_i Z_i^{(\xi)} \right) 1_{\{\xi = j\}} \right) \\
&= \sum_{j=1}^2 E\left( \max_{1\le i\le d} \left( x_i Z_i^{(j)} \right) 1_{\{\xi = j\}} \right) \\
&= \sum_{j=1}^2 E\left( \max_{1\le i\le d} \left( x_i Z_i^{(j)} \right) \right) E\left( 1_{\{\xi = j\}} \right) \\
&= \lambda\, E\left( \max_{1\le i\le d} x_i Z_i^{(1)} \right) + (1-\lambda)\, E\left( \max_{1\le i\le d} x_i Z_i^{(2)} \right).
\end{align*}
By putting $x_i = 1$ and $x_j = 0$ for $j \ne i$, we obtain in particular $E\left( Z_i^{(\xi)} \right) = 1$, $1 \le i \le d$. This completes the proof. □
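The proof is constructive: flip a $\lambda$-coin and use the generator of the first or the second D-norm accordingly. A minimal sketch for the Marshall–Olkin D-norm (1.8) (NumPy, illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, lam = 3, 300_000, 0.4

# Generator of ||.||_inf and generator of ||.||_1 (cf. Section 1.2).
Z1 = np.ones((n, d))
J = rng.integers(0, d, size=n)
Z2 = np.zeros((n, d)); Z2[np.arange(n), J] = d

# Z := Z^(xi) with P(xi = 1) = lam picks a generator at random.
xi = rng.uniform(size=n) < lam
Z = np.where(xi[:, None], Z1, Z2)

x = np.array([1.0, 2.0, 3.0])
mc = np.mean(np.max(np.abs(x) * Z, axis=1))
exact = lam * np.max(np.abs(x)) + (1 - lam) * np.sum(np.abs(x))
print(mc, exact)                 # both ~ ||x||_{D_lambda} = 4.8
```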

A Bayesian Type of D-Norm

The preceding convexity of the set of D-norms can be viewed as a special case of a Bayesian type of D-norm, as illustrated by the following example. Consider the family $\left\{ \|\cdot\|_p : p \ge 1 \right\}$ of logistic D-norms as introduced in Section 1.1 (see Proposition 1.2.1). Let $f$ be a probability density on $[1,\infty)$, i.e., $f \ge 0$ and $\int_1^\infty f(p)\, dp = 1$. Then,
\[ \|x\|_f := \int_1^\infty \|x\|_p\, f(p)\, dp, \qquad x \in \mathbb{R}^d, \]
defines a D-norm on $\mathbb{R}^d$. This can easily be seen as follows. Let $X$ be an rv on $[1,\infty)$ with this probability density $f(\cdot)$, and suppose that $X$ is independent of each generator $Z^{(p)}$ of $\|\cdot\|_p$, $p \ge 1$. Then,
\[ Z := Z^{(X)} \]
generates the D-norm $\|\cdot\|_f$:
\[ E(Z) = \int_1^\infty E\left( Z^{(X)} \mid X = p \right) f(p)\, dp = \int_1^\infty E\left( Z^{(p)} \right) f(p)\, dp = \mathbf{1} \]
and
\begin{align*}
E\left( \max_{1\le i\le d} |x_i|\, Z_i^{(X)} \right) &= \int_1^\infty E\left( \max_{1\le i\le d} |x_i|\, Z_i^{(X)} \,\Big|\, X = p \right) f(p)\, dp \\
&= \int_1^\infty E\left( \max_{1\le i\le d} |x_i|\, Z_i^{(p)} \right) f(p)\, dp \\
&= \int_1^\infty \|x\|_p\, f(p)\, dp.
\end{align*}
If we take, for instance, the Pareto density $f_\lambda(p) := \lambda p^{-(1+\lambda)}$, $p \ge 1$, with parameter $\lambda > 0$, we obtain
\[ \|x\|_{f_\lambda} = \int_1^\infty \left( \sum_{i=1}^d |x_i|^p \right)^{1/p} \lambda p^{-(1+\lambda)}\, dp, \qquad x \in \mathbb{R}^d. \]
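The Pareto-mixture D-norm $\|x\|_{f_\lambda}$ has no closed form, but the one-dimensional integral is easy to evaluate numerically. A sketch assuming SciPy's quad is available (names ours):

```python
import numpy as np
from scipy.integrate import quad

def pareto_mixture_norm(x, lam):
    """||x||_{f_lambda} = int_1^inf ||x||_p * lam * p^(-(1+lam)) dp."""
    x = np.abs(np.asarray(x, dtype=float))
    m = x.max()
    def integrand(p):
        # ||x||_p computed as m * ||x/m||_p to avoid overflow for large p
        return m * np.sum((x / m) ** p) ** (1.0 / p) * lam * p ** (-(1.0 + lam))
    val, _ = quad(integrand, 1.0, np.inf)
    return val

x = [1.0, 2.0, 3.0]
print(pareto_mixture_norm(x, lam=2.0))  # between ||x||_inf = 3 and ||x||_1 = 6
```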
1.5 When Is an Arbitrary Norm a D-Norm?

The obvious question When is an arbitrary norm $\|\cdot\|$ a D-norm? is answered by Theorem 2.3.3: if and only if (iff) the norm $\|\cdot\|$ is standardized, i.e., $\|e_i\| = 1$, $1 \le i \le d$, and
\[ G(x) := \exp\left( - \left\| \left( \min(x_i, 0) \right)_{i=1}^d \right\| \right) = \exp(-\|\min(x, 0)\|), \qquad x = (x_1, \dots, x_d) \in \mathbb{R}^d, \]
defines a df on $\mathbb{R}^d$. If $G$ is a df, then each univariate margin is the standard negative exponential df $\exp(x)$, $x \le 0$.

The function $G$ defines a df iff it is $\Delta$-monotone: for $a \le b \in \mathbb{R}^d$,
\[ \Delta_a^b G := \sum_{m \in \{0,1\}^d} (-1)^{d - \sum_{j\le d} m_j}\, G\left( b_1^{m_1} a_1^{1-m_1}, \dots, b_d^{m_d} a_d^{1-m_d} \right) \ge 0. \]
The remaining properties that $G$ has to satisfy to be a multivariate df, such as its continuity from the right, are automatically satisfied by its definition; see, e.g., Reiss (1989, (2.2.19)). Hofmann (2009) established the following characterization.

Theorem 1.5.1 (Hofmann, 2009) Let $\|\cdot\|$ be an arbitrary norm on $\mathbb{R}^d$. Then, the function $G(x) = \exp(-\|x\|)$, $x \le 0 \in \mathbb{R}^d$, defines a multivariate df iff the norm satisfies
\[ \sum_{m \in \{0,1\}^d :\ m_i = 1,\ i \in K} (-1)^{d + 1 - \sum_{j\le d} m_j} \left\| \left( b_1^{m_1} a_1^{1-m_1}, \dots, b_d^{m_d} a_d^{1-m_d} \right) \right\| \ge 0 \tag{1.9} \]
for every $K \subset \{1, \dots, d\}$, $K \ne \{1, \dots, d\}$, and $-\infty < a_j \le b_j \le 0$, $1 \le j \le d$.

An extension of the previous characterization, with the norm $\|\cdot\|$ replaced by a homogeneous function, was established by Ressel (2013, Theorem 6).

The Bivariate Case

Putting $K = \{1\}$ and $K = \{2\}$, condition (1.9) reduces in the case $d = 2$ to
\[ \|(b_1, b_2)\| \le \min\left( \|(b_1, a_2)\|, \|(a_1, b_2)\| \right), \qquad a \le b \le 0, \]
which, in turn, is equivalent to
\[ \|b\| \le \|a\|, \qquad a \le b \le 0, \]
i.e., the monotonicity of $\|\cdot\|$. With $K = \emptyset$, inequality (1.9) becomes
\[ \|(a_1, a_2)\| + \|(b_1, b_2)\| \le \|(a_1, b_2)\| + \|(b_1, a_2)\|. \]
But this is true for every norm on $\mathbb{R}^2$, as we show next. Suppose that $a \ne b$. Put
\[ \alpha := \frac{a_1 b_2 - b_1 b_2}{a_1 a_2 - b_1 b_2}, \qquad \beta := \frac{b_1 a_2 - b_1 b_2}{a_1 a_2 - b_1 b_2}, \qquad \gamma := \frac{a_1 a_2 - a_1 b_2}{a_1 a_2 - b_1 b_2}, \qquad \delta := \frac{a_1 a_2 - b_1 a_2}{a_1 a_2 - b_1 b_2}. \]
Then, $\alpha, \beta, \gamma, \delta \ge 0$, $\alpha + \gamma = 1 = \beta + \delta$,
\[ a = \gamma (b_1, a_2) + \delta (a_1, b_2), \qquad b = \alpha (b_1, a_2) + \beta (a_1, b_2), \]
and, hence, the triangle inequality implies
\begin{align*}
\|a\| + \|b\| &= \|\gamma (b_1, a_2) + \delta (a_1, b_2)\| + \|\alpha (b_1, a_2) + \beta (a_1, b_2)\| \\
&\le \gamma \|(b_1, a_2)\| + \delta \|(a_1, b_2)\| + \alpha \|(b_1, a_2)\| + \beta \|(a_1, b_2)\| \\
&= \|(b_1, a_2)\| + \|(a_1, b_2)\|.
\end{align*}
We thus obtain from Theorem 1.5.1 the following characterization in the bivariate case.

Lemma 1.5.2 Take an arbitrary norm $\|\cdot\|$ on $\mathbb{R}^2$. Then,
\[ G(x) = \exp(-\|x\|), \qquad x \le 0, \]
defines a df on $\mathbb{R}^2$ iff the norm $\|\cdot\|$ is monotone.

The following lemma entails that in the bivariate case, $G(x) = \exp(-\|x\|)$, $x \le 0 \in \mathbb{R}^2$, defines a df with standard negative exponential margins iff the norm $\|\cdot\|$ satisfies $\|x\|_\infty \le \|x\| \le \|x\|_1$, $x \ge 0$.

Lemma 1.5.3 Let $\|\cdot\|$ be a norm on $\mathbb{R}^d$. If $\|\cdot\|$ is monotone and standardized, then we have, for $0 \le x \in \mathbb{R}^d$,
\[ \|x\|_\infty \le \|x\| \le \|x\|_1. \]
For $d = 2$, the converse statement is also true.

The following characterization of a D-norm on $\mathbb{R}^2$ is a consequence of the previous results.

Corollary 1.5.4 A radially symmetric norm $\|\cdot\|$ on $\mathbb{R}^2$ is a D-norm iff, for $0 \le x \in \mathbb{R}^2$,
\[ \|x\|_\infty \le \|x\| \le \|x\|_1. \]

The preceding equivalence in $\mathbb{R}^2$ is not true for a general dimension $d$.

Proof (of Lemma 1.5.3). Let $0 \le x = (x_1, \dots, x_d) \in \mathbb{R}^d$. Since the norm is standardized, we have by the triangle inequality
\[ \|(x_1, \dots, x_d)\| \le \|(x_1, 0, \dots, 0)\| + \dots + \|(0, \dots, 0, x_d)\| = x_1 + \dots + x_d = \|(x_1, \dots, x_d)\|_1. \]
Furthermore, we obtain from the monotonicity of $\|\cdot\|$
\[ \|(x_1, \dots, x_d)\| \ge \|(0, \dots, 0, x_i, 0, \dots, 0)\| = x_i \|e_i\| = x_i, \qquad i \le d, \]
and, thus, $\|x\| \ge \max(x_1, \dots, x_d) = \|x\|_\infty$. Overall, we have $\|x\|_\infty \le \|x\| \le \|x\|_1$.

Now let $d = 2$ and suppose that the norm satisfies $\|x\|_\infty \le \|x\| \le \|x\|_1$ for $0 \le x$. Then, we have
\[ 1 = \|e_i\|_\infty \le \|e_i\| \le \|e_i\|_1 = 1 \]
and, thus, the norm is standardized.

Take $a = (a_1, a_2)$, $b = (b_1, b_2) \in \mathbb{R}^2$ with $0 \le a \le b$ and $0 < b$. The condition $\|x\|_\infty \le \|x\|$ implies $b_i \le \max(b_1, b_2) = \|b\|_\infty \le \|b\|$ for $i = 1, 2$. From the triangle inequality, we obtain
\begin{align*}
\|(a_1, b_2)\| &= \left\| \frac{b_1 - a_1}{b_1} (0, b_2) + \frac{a_1}{b_1} (b_1, b_2) \right\| \\
&\le \frac{b_1 - a_1}{b_1} \underbrace{\|(0, b_2)\|}_{= b_2 \le \|b\|} + \frac{a_1}{b_1} \|(b_1, b_2)\| \\
&\le \left( \frac{b_1 - a_1}{b_1} + \frac{a_1}{b_1} \right) \|b\| = \|b\|
\end{align*}
and
\begin{align*}
\|a\| = \|(a_1, a_2)\| &= \left\| \frac{b_2 - a_2}{b_2} (a_1, 0) + \frac{a_2}{b_2} (a_1, b_2) \right\| \\
&\le \frac{b_2 - a_2}{b_2} \underbrace{\|(a_1, 0)\|}_{= a_1 \le b_1 \le \|b\|} + \frac{a_2}{b_2} \underbrace{\|(a_1, b_2)\|}_{\le \|b\|, \text{ see above}} \\
&\le \left( \frac{b_2 - a_2}{b_2} + \frac{a_2}{b_2} \right) \|b\| = \|b\|.
\end{align*}
Therefore, the norm is monotone. □

1.6 The Dual D-Norm Function

A D-norm is defined by $\|x\|_D = E\left( \max_{1\le i\le d} (|x_i|\, Z_i) \right)$. Replacing max by min in this definition yields the dual D-norm function. It plays a particular role when computing survival probabilities in multivariate extreme value theory (MEVT). In this section, we collect elementary properties.
There is a close relation between the maximum and the minimum of real numbers, which in particular leads to the inclusion–exclusion principle in Corollary 1.6.2. In what follows, we denote the number of elements in a set $T$ by $|T|$.

Representation of Maxima by Minima and Vice Versa

The following lemma can easily be proved by induction using the equation
\[ \min(\max(a_1, \dots, a_n), a_{n+1}) = \max(\min(a_1, a_{n+1}), \dots, \min(a_n, a_{n+1})). \]
The case $n = 2$ is obvious:
\[ \max(a_1, a_2) = a_1 + a_2 - \min(a_1, a_2). \]

Lemma 1.6.1 Arbitrary numbers $a_1, \dots, a_n \in \mathbb{R}$ satisfy the equations
\[ \max(a_1, \dots, a_n) = \sum_{\emptyset \ne T \subset \{1, \dots, n\}} (-1)^{|T|-1} \min_{i \in T} a_i, \]
\[ \min(a_1, \dots, a_n) = \sum_{\emptyset \ne T \subset \{1, \dots, n\}} (-1)^{|T|-1} \max_{i \in T} a_i. \]

By choosing $a_1 = \dots = a_n = 1$, the preceding result implies in particular
\[ 1 = \sum_{\emptyset \ne T \subset \{1, \dots, n\}} (-1)^{|T|-1}. \tag{1.10} \]

The inclusion–exclusion principle turns out to be a straightforward consequence of Lemma 1.6.1.

Corollary 1.6.2 (Inclusion–Exclusion Principle) Let $A_1, \dots, A_n$ be measurable subsets of a probability space $(\Omega, \mathcal{A}, P)$. Then,
\[ P\left( \bigcup_{i=1}^n A_i \right) = \sum_{\emptyset \ne T \subset \{1, \dots, n\}} (-1)^{|T|-1}\, P\left( \bigcap_{i \in T} A_i \right) \]
and
\[ P\left( \bigcap_{i=1}^n A_i \right) = \sum_{\emptyset \ne T \subset \{1, \dots, n\}} (-1)^{|T|-1}\, P\left( \bigcup_{i \in T} A_i \right). \]
Proof. Choose $\omega \in \Omega$ and set $a_i := a_i(\omega) := 1_{A_i}(\omega)$, $1 \le i \le n$. From Lemma 1.6.1, we obtain
\[ \max_{1\le i\le n} 1_{A_i}(\omega) = \sum_{\emptyset \ne T \subset \{1, \dots, n\}} (-1)^{|T|-1} \min_{i \in T} 1_{A_i}(\omega). \]
Taking expectations on both sides yields
\[ E\left( \max_{1\le i\le n} 1_{A_i} \right) = \sum_{\emptyset \ne T \subset \{1, \dots, n\}} (-1)^{|T|-1}\, E\left( \min_{i \in T} 1_{A_i} \right), \]
by the linearity of expectation. The first equation in Corollary 1.6.2 is now a consequence of the equalities
\[ \max_{1\le i\le n} 1_{A_i}(\omega) = \begin{cases} 1 & \text{if } \omega \in \bigcup_{i=1}^n A_i, \\ 0 & \text{if } \omega \notin \bigcup_{i=1}^n A_i, \end{cases} \qquad \min_{i \in T} 1_{A_i}(\omega) = \begin{cases} 1 & \text{if } \omega \in \bigcap_{i \in T} A_i, \\ 0 & \text{if } \omega \notin \bigcap_{i \in T} A_i, \end{cases} \]
yielding
\[ E\left( \max_{1\le i\le n} 1_{A_i} \right) = P\left( \bigcup_{i=1}^n A_i \right), \qquad E\left( \min_{i \in T} 1_{A_i} \right) = P\left( \bigcap_{i \in T} A_i \right). \]
Repeating the preceding arguments implies the second equation in Corollary 1.6.2 as well. □
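Lemma 1.6.1 is a finite combinatorial identity and can be verified by brute force over all nonempty subsets. A short sketch (Python standard library only; names ours):

```python
from itertools import combinations

def max_via_mins(a):
    """max(a) = sum over nonempty T of (-1)^(|T|-1) * min_{i in T} a_i."""
    n = len(a)
    total = 0.0
    for k in range(1, n + 1):
        for T in combinations(range(n), k):
            total += (-1) ** (k - 1) * min(a[i] for i in T)
    return total

a = [3.1, -2.0, 5.4, 0.7]
print(max_via_mins(a), max(a))   # both 5.4
```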

Corollary 1.6.3 If $Z^{(1)}, Z^{(2)}$ generate the same D-norm, then, for each $x \in \mathbb{R}^d$,
\[ E\left( \min_{1\le i\le d} \left( |x_i|\, Z_i^{(1)} \right) \right) = E\left( \min_{1\le i\le d} \left( |x_i|\, Z_i^{(2)} \right) \right). \]

Proof. Corollary 1.6.3 can be seen as follows:
\begin{align*}
E\left( \min_{1\le i\le d} |x_i|\, Z_i^{(1)} \right) &= E\left( \sum_{\emptyset \ne T \subset \{1, \dots, d\}} (-1)^{|T|-1} \max_{j \in T} \left( |x_j|\, Z_j^{(1)} \right) \right) \\
&= \sum_{\emptyset \ne T \subset \{1, \dots, d\}} (-1)^{|T|-1}\, E\left( \max_{j \in T} \left( |x_j|\, Z_j^{(1)} \right) \right) \\
&= \sum_{\emptyset \ne T \subset \{1, \dots, d\}} (-1)^{|T|-1} \left\| \sum_{j \in T} |x_j|\, e_j \right\|_D \\
&= \sum_{\emptyset \ne T \subset \{1, \dots, d\}} (-1)^{|T|-1}\, E\left( \max_{j \in T} \left( |x_j|\, Z_j^{(2)} \right) \right) \\
&= E\left( \sum_{\emptyset \ne T \subset \{1, \dots, d\}} (-1)^{|T|-1} \max_{j \in T} \left( |x_j|\, Z_j^{(2)} \right) \right) \\
&= E\left( \min_{1\le i\le d} |x_i|\, Z_i^{(2)} \right). \qquad \square
\end{align*}

Definition of the Dual D-Norm Function

Let $\|\cdot\|_D$ be an arbitrary D-norm on $\mathbb{R}^d$ with arbitrary generator $Z = (Z_1, \dots, Z_d)$. Put
\[ \rfloor x \lfloor_D := E\left( \min_{1\le i\le d} (|x_i|\, Z_i) \right), \qquad x \in \mathbb{R}^d, \tag{1.11} \]
called the dual D-norm function corresponding to $\|\cdot\|_D$. By Corollary 1.6.3, it is independent of the particular generator $Z$, but the mapping
\[ \|\cdot\|_D \mapsto \rfloor \cdot \lfloor_D \]
is not one-to-one. Consider, for example, the generator $Z = (Z_1, Z_2, Z_3)$, which attains each of the two values $(2, 0, 1)$, $(0, 2, 1)$ with probability $1/2$. As a consequence, $\min_{1\le i\le 3} Z_i = 0$. The corresponding D-norm $\|\cdot\|_D$ has dual function $\rfloor \cdot \lfloor_D = 0$, but $\|\cdot\|_D$ does not coincide with $\|\cdot\|_1$, which also has dual function $\rfloor \cdot \lfloor_1 = 0$:
\[ \|(2, 1, 3)\|_D = E(\max(2 Z_1, Z_2, 3 Z_3)) = 7/2 \ne \|(2, 1, 3)\|_1 = 6. \]

Clearly,
\[ \rfloor \cdot \lfloor_D = 0 \tag{1.12} \]
is the least dual D-norm function, corresponding, for example, to $\|\cdot\|_D = \|\cdot\|_1$ if $d \ge 2$, whereas
\[ \rfloor x \lfloor_D = \min_{1\le i\le d} |x_i| = \rfloor x \lfloor_\infty, \qquad x \in \mathbb{R}^d, \tag{1.13} \]
is the largest dual D-norm function, corresponding to $\|\cdot\|_D = \|\cdot\|_\infty$.
This follows from the inequality
\[ |x_k| = E(|x_k|\, Z_k) \ge E\left( \min_{1\le i\le d} (|x_i|\, Z_i) \right), \qquad 1 \le k \le d. \tag{1.14} \]
Thus, for an arbitrary dual D-norm function and $d \ge 2$, we have the bounds
\[ 0 = \rfloor \cdot \lfloor_1 \le \rfloor \cdot \lfloor_D \le \rfloor \cdot \lfloor_\infty. \]
The dual D-norm function is obviously homogeneous of order one, i.e.,
\[ \rfloor \lambda x \lfloor_D = |\lambda| \rfloor x \lfloor_D, \qquad \lambda \in \mathbb{R},\ x \in \mathbb{R}^d. \]
Choose an arbitrary D-norm on $\mathbb{R}^d$ with generator $Z = (Z_1, \dots, Z_d)$, and denote by $\|\cdot\|_{D_T}$ that D-norm on $\mathbb{R}^{|T|}$ which is generated by $Z_T := (Z_i)_{i \in T}$, $\emptyset \ne T \subset \{1, \dots, d\}$. Then, the mapping
\[ \|\cdot\|_D \mapsto \left\{ \rfloor \cdot \lfloor_{D_T} : \emptyset \ne T \subset \{1, \dots, d\} \right\} \]
is one-to-one according to Lemma 1.6.1.

Example 1.6.4 (Marshall–Olkin Model) The Marshall–Olkin D-norm $\|\cdot\|_{D_\lambda}$ in (1.8) with parameter $\lambda \in [0,1]$ has, for $d \ge 2$, the dual D-norm function
\[ \rfloor x \lfloor_{D_\lambda} = \lambda \min_{1\le i\le d} |x_i|, \qquad x = (x_1, \dots, x_d) \in \mathbb{R}^d. \]
This is an immediate consequence of the fact that $\|\cdot\|_{D_\lambda}$ has the generator $Z^{(\xi)}$ as in the proof of Proposition 1.4.1, together with $\rfloor x \lfloor_\infty = \min_{1\le i\le d} |x_i|$ and $\rfloor x \lfloor_1 = 0$.

Example 1.6.5 (Weibull Model) We can define a generator $Z = (Z_1, \dots, Z_d)$ by taking independent and identically Weibull distributed rvs $\tilde Z_1, \dots, \tilde Z_d$, i.e., $P(\tilde Z_1 > t) = \exp(-t^p)$, $t > 0$, $p \ge 1$, and putting $Z_i := \tilde Z_i / \Gamma(1 + p^{-1})$. It is easy to see that the corresponding dual D-norm function, for $x \in \mathbb{R}^d$ with $x_i \ne 0$, $i = 1, \dots, d$, is given by
\[ \rfloor x \lfloor_{W_p} = \frac{1}{\|1/x\|_p}, \]
where $1/x := (1/x_1, \dots, 1/x_d)$ and
\[ \|x\|_p = \left( \sum_{i=1}^d |x_i|^p \right)^{1/p}, \qquad x = (x_1, \dots, x_d) \in \mathbb{R}^d. \]
Hence, according to Lemma 1.6.1, the corresponding D-norm, for those $x$ whose components are all different from zero, is given by
\[ \|x\|_{W_p} = \sum_{\emptyset \ne T \subset \{1, \dots, d\}} (-1)^{|T|-1} \frac{1}{\left\| (1/x_i)_{i \in T} \right\|_p}. \]
1.7 Normed Generators Theorem

In this section, we establish in Theorem 1.7.1 the surprising fact that for any D-norm $\|\cdot\|_D$ on $\mathbb{R}^d$ and an arbitrary norm $\|\cdot\|$ on $\mathbb{R}^d$, there exists a generator $Z$ of $\|\cdot\|_D$ with the additional property $\|Z\| = \text{const}$. The distribution of this generator is uniquely determined, which enables in particular the metrization of the set of D-norms in Section 1.8.

The results in this section actually laid the groundwork for MEVT, as in Balkema and Resnick (1977), de Haan and Resnick (1977), and Vatan (1985).

Existence of Normed Generators

The main Theorem 1.7.1 follows from a sequence of auxiliary results, which we establish below.

Theorem 1.7.1 (Normed Generators) Let $\|\cdot\|$ be an arbitrary norm on $\mathbb{R}^d$. For any D-norm $\|\cdot\|_D$ on $\mathbb{R}^d$, there exists a generator $Z$ with the additional property $\|Z\| = \text{const}$. The distribution of this generator is uniquely determined.

Corollary 1.7.2 For any D-norm $\|\cdot\|_D$ on $\mathbb{R}^d$, there exist generators $Z^{(1)}, Z^{(2)}$ with the properties $\sum_{i=1}^d Z_i^{(1)} = d$ and $\max_{1\le i\le d} Z_i^{(2)} = \text{const} = \|(1, \dots, 1)\|_D$.

Proof. Choose $\|\cdot\| = \|\cdot\|_1$ in Theorem 1.7.1. Then,
\[ \text{const} = \left\| Z^{(1)} \right\|_1 = \sum_{i=1}^d Z_i^{(1)}. \]
Taking expectations on both sides yields
\[ \text{const} = \sum_{i=1}^d E\left( Z_i^{(1)} \right) = d. \]
Choose $\|\cdot\| = \|\cdot\|_\infty$ for the second assertion. □



Example 1.7.3 Put $Z^{(1)} := (1, \dots, 1)$ and $Z^{(2)} := (X, \dots, X)$, where $X \ge 0$ is an rv with $E(X) = 1$. Both generate the D-norm $\|\cdot\|_\infty$, but only $Z^{(1)}$ satisfies $\left\| Z^{(1)} \right\|_1 = d$.

Example 1.7.4 Let $V_1, \dots, V_d$ be independent and identically gamma distributed rvs with density $\gamma_\alpha(x) := x^{\alpha-1} \exp(-x)/\Gamma(\alpha)$, $x > 0$, $\alpha > 0$. Then, the rv $\tilde Z \in \mathbb{R}^d$ with components
\[ \tilde Z_i := \frac{V_i}{V_1 + \dots + V_d}, \qquad i = 1, \dots, d, \]
follows a symmetric Dirichlet distribution $\mathrm{Dir}(\alpha)$ on the closed simplex $\tilde S_d = \{u \ge 0 \in \mathbb{R}^d : \sum_{i=1}^d u_i = 1\}$; see Ng et al. (2011, Theorem 2.1). We obviously have $E(\tilde Z_i) = 1/d$, and thus
\[ Z := d \tilde Z \tag{1.15} \]
is a generator of a D-norm $\|\cdot\|_{D(\alpha)}$ on $\mathbb{R}^d$, which we call the Dirichlet D-norm with parameter $\alpha$. We have in particular $\|Z\|_1 = d$.

It is well known that, for a general $\alpha > 0$, the rv $\left( V_i / \sum_{j=1}^d V_j \right)_{i=1}^d$ and the sum $\sum_{j=1}^d V_j$ are independent; see, for example, the proof of Theorem 2.1 in Ng et al. (2011). Since $E(V_1 + \dots + V_d) = d\alpha$, we obtain, for $x = (x_1, \dots, x_d) \in \mathbb{R}^d$,
\begin{align*}
\|x\|_{D(\alpha)} &= E\left( \max_{1\le i\le d} (|x_i|\, Z_i) \right) = d\, E\left( \frac{\max_{1\le i\le d} (|x_i|\, V_i)}{V_1 + \dots + V_d} \right) \\
&= \frac{1}{\alpha}\, E(V_1 + \dots + V_d)\, E\left( \frac{\max_{1\le i\le d} (|x_i|\, V_i)}{V_1 + \dots + V_d} \right) \\
&= \frac{1}{\alpha}\, E\left( \max_{1\le i\le d} (|x_i|\, V_i) \right),
\end{align*}
where the last step uses the independence of the ratio and the sum. Therefore, a generator of $\|\cdot\|_{D(\alpha)}$ is also given by $\alpha^{-1}(V_1, \dots, V_d)$.
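The Dirichlet generator (1.15) is easy to simulate via normalized gamma variables; the alternative generator $\alpha^{-1}(V_1, \dots, V_d)$ gives the same D-norm. A minimal sketch (NumPy, illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
alpha, d, n = 2.0, 3, 400_000

V = rng.gamma(shape=alpha, scale=1.0, size=(n, d))
Z = d * V / V.sum(axis=1, keepdims=True)    # generator (1.15), ||Z||_1 = d
Z_alt = V / alpha                           # alternative generator

x = np.array([1.0, 1.0, 2.0])
print(np.mean(np.max(x * Z, axis=1)),       # two MC values of ||x||_{D(alpha)}
      np.mean(np.max(x * Z_alt, axis=1)))   # agree up to simulation error
```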

In view of the preceding results, one may ask: given a generator $Z$ of a D-norm $\|\cdot\|_D$, does there always exist a norm $\|\cdot\|$ with the property that $\|Z\| = \text{const}$? The answer is no. Take, for example, an rv $X$ that follows the standard exponential distribution on $\mathbb{R}$. Then, $Z := (X, \dots, X) \in \mathbb{R}^d$ generates the sup-norm $\|\cdot\|_\infty$. Suppose that there exists a norm $\|\cdot\|$ with the property $\|Z\| = \text{const}$. This would imply that $\|Z\| = X \|(1, \dots, 1)\| = \text{const}$ a.s., which is obviously not true.
The Derivation of Theorem 1.7.1

The derivation of Theorem 1.7.1 is achieved in a sequence of single steps. Throughout the rest of this section, let $\|\cdot\|$ be an arbitrary norm on $\mathbb{R}^d$ and let $\|\cdot\|_D$ be a D-norm on $\mathbb{R}^d$ with an arbitrary generator $Z = (Z_1, \dots, Z_d)$. Put
\[ S_c := \left\{ s \ge 0 \in \mathbb{R}^d : \|s\| = c \right\}, \]
where $c := E(\|Z\|)$. First, we show that $c$ is a finite positive number. The triangle inequality implies
\[ \|Z\| = \left\| \sum_{i=1}^d Z_i e_i \right\| \le \sum_{i=1}^d \|Z_i e_i\| = \sum_{i=1}^d Z_i \|e_i\| \]
and, thus,
\[ c = E(\|Z\|) \le \sum_{i=1}^d E(Z_i) \|e_i\| = \sum_{i=1}^d \|e_i\| < \infty. \]
We have, moreover,
\[ c = E(\|Z\|) > 0. \]
Otherwise, $E(\|Z\|) = 0$ would imply $\|Z\| = 0$ a.s., as the integrand is non-negative, and thus $Z = 0$ a.s. But this clearly contradicts the condition $E(Z_i) = 1$, $1 \le i \le d$.

The $\sigma$-field of the Borel subsets of $S_c$ is given by
\[ \mathcal{B}_{S_c} := \left\{ B \cap S_c : B \in \mathcal{B}^d \right\}, \]
where $\mathcal{B}^d$ denotes the common Borel $\sigma$-field in $\mathbb{R}^d$.

Introducing the Angular Measure

Put $E := [0, \infty)^d \setminus \{0\} \subset \mathbb{R}^d$ and $\lambda B := \{\lambda b : b \in B\}$ for an arbitrary set $B \subset E$ and $\lambda > 0$.

Lemma 1.7.5 For $A \in \mathcal{B}_{S_c}$, put
\[ \Phi(A) := \frac{1}{c}\, E\left( 1_{\mathbb{R}^+ \cdot A}(Z)\, \|Z\| \right), \]
where $\mathbb{R}^+ \cdot A := \{\lambda a : \lambda > 0,\ a \in A\}$. Then, $\Phi(\cdot)$ is a probability measure on $S_c$, equipped with the $\sigma$-field $\mathcal{B}_{S_c}$. It is commonly called the angular measure.

Proof. The set function $\Phi$ is obviously non-negative, with
\[ \Phi(S_c) = \frac{1}{c}\, E\left( 1_{\mathbb{R}^+ \cdot S_c}(Z)\, \|Z\| \right) = \frac{1}{c}\, E\left( 1_E(Z)\, \|Z\| \right) = \frac{1}{c}\, E\left( 1(Z \ne 0)\, \|Z\| \right) = \frac{1}{c}\, E(\|Z\|) = 1. \]
It remains to show the $\sigma$-additivity of $\Phi$. Let $A_i$, $i \in \mathbb{N}$, be a sequence of pairwise disjoint sets in $\mathcal{B}_{S_c}$ and put $B_n := \bigcup_{i=1}^n A_i$, $n \in \mathbb{N}$, $B_\infty := \bigcup_{i \in \mathbb{N}} A_i$. Then, $\sum_{i=1}^n 1_{A_i} = 1_{B_n}$, $n \in \mathbb{N}$, is a monotone increasing sequence of functions on $S_c$ with $\lim_{n \in \mathbb{N}} 1_{B_n} = 1_{B_\infty}$. The monotone convergence theorem implies
\[ c\, \Phi(B_\infty) = E\left( 1_{\mathbb{R}^+ \cdot B_\infty}(Z)\, \|Z\| \right) = E\left( \lim_{n \in \mathbb{N}} 1_{\mathbb{R}^+ \cdot B_n}(Z)\, \|Z\| \right) = \lim_{n \in \mathbb{N}} E\left( 1_{\mathbb{R}^+ \cdot B_n}(Z)\, \|Z\| \right) = \lim_{n \in \mathbb{N}} E\left( 1_{\cup_{i=1}^n \mathbb{R}^+ \cdot A_i}(Z)\, \|Z\| \right). \]
Note that the sets $\mathbb{R}^+ \cdot A_1, \mathbb{R}^+ \cdot A_2, \dots$ are pairwise disjoint as well: suppose that $\lambda_1 a_i = \lambda_2 a_j$ for some $\lambda_1, \lambda_2 > 0$ and $a_i \in A_i$, $a_j \in A_j$. Then,
\[ a_i = \frac{\lambda_2}{\lambda_1} a_j \]
and, thus,
\[ c = \|a_i\| = \frac{\lambda_2}{\lambda_1} \|a_j\| = \frac{\lambda_2}{\lambda_1} c, \]
i.e., $\lambda_1 = \lambda_2$ and, hence, $a_i = a_j$, i.e., $i = j$. We therefore obtain
\[ c\, \Phi(B_\infty) = \lim_{n \in \mathbb{N}} E\left( \sum_{i=1}^n 1_{\mathbb{R}^+ \cdot A_i}(Z)\, \|Z\| \right) = \lim_{n \in \mathbb{N}} \sum_{i=1}^n E\left( 1_{\mathbb{R}^+ \cdot A_i}(Z)\, \|Z\| \right) = \lim_{n \in \mathbb{N}} \sum_{i=1}^n c\, \Phi(A_i) = c \sum_{i \in \mathbb{N}} \Phi(A_i), \]
i.e., $\Phi$ is a probability measure on $\mathcal{B}_{S_c}$. □

Lemma 1.7.6 The angular measure $\Phi$ in Lemma 1.7.5 satisfies, for $x = (x_1, \dots, x_d) \ge 0 \in \mathbb{R}^d$,
\[ \int_{S_c} \max_{1\le j\le d} (x_j s_j)\, \Phi(ds) = E\left( \max_{1\le j\le d} (x_j Z_j) \right) = \|x\|_D. \]

Proof. For a fixed $x \ge 0 \in \mathbb{R}^d$, the function $f : S_c \to [0, \infty)$, defined by $f(s) := \max_{1\le j\le d} (x_j s_j)$, is continuous and non-negative. It is in particular Borel-measurable, and thus, there is a sequence of elementary functions
\[ f_n = \sum_{i=1}^{m(n)} \alpha_{i,n}\, 1_{A_{i,n}}, \qquad n \in \mathbb{N}, \]
with $A_{i,n} \in \mathcal{B}_{S_c}$ and $\alpha_{i,n} > 0$, $1 \le i \le m(n)$, $n \in \mathbb{N}$, such that $f_n(s) \uparrow_{n \in \mathbb{N}} f(s)$, $s \in S_c$.

By applying the monotone convergence theorem twice, we obtain
\begin{align*}
\int_{S_c} \max_{1\le j\le d} (x_j s_j)\, \Phi(ds) &= \int_{S_c} f\, d\Phi = \lim_{n \in \mathbb{N}} \int_{S_c} f_n\, d\Phi = \lim_{n \in \mathbb{N}} \sum_{i=1}^{m(n)} \alpha_{i,n}\, \Phi(A_{i,n}) \\
&= \lim_{n \in \mathbb{N}} \frac{1}{c} \sum_{i=1}^{m(n)} \alpha_{i,n}\, E\left( \|Z\|\, 1_{\mathbb{R}^+ \cdot A_{i,n}}(Z) \right) \\
&= \lim_{n \in \mathbb{N}} \frac{1}{c}\, E\left( \|Z\| \sum_{i=1}^{m(n)} \alpha_{i,n}\, 1_{\mathbb{R}^+ \cdot A_{i,n}}(Z) \right) \\
&= \lim_{n \in \mathbb{N}} \frac{1}{c}\, E\left( \|Z\| \sum_{i=1}^{m(n)} \alpha_{i,n}\, 1_{A_{i,n}}\!\left( \frac{Z c}{\|Z\|} \right) \right) \\
&= \frac{1}{c}\, E\left( \|Z\|\, f\!\left( \frac{Z c}{\|Z\|} \right) \right) \\
&= E\left( \max_{1\le j\le d} (x_j Z_j) \right) = \|x\|_D. \qquad \square
\end{align*}


The following consequence of the two preceding auxiliary results is obvious.

Corollary 1.7.7 Let $Y$ be an rv with the distribution $\Phi$. Then, $Y$ generates the D-norm $\|\cdot\|_D$ and satisfies $\|Y\| = c$ a.s.

The preceding result ensures the existence of a normed generator in Theorem 1.7.1.

The Distribution of a Normed Generator Is Unique

Next, we establish the fact that the distribution of a normed generator in Theorem 1.7.1 is uniquely determined. To achieve this, the following lemma is helpful.
Lemma 1.7.8 There is a unique measure $\nu$ on the Borel $\sigma$-field of $E = [0, \infty)^d \setminus \{0\}$ with
\[ \nu((\lambda, \infty) \cdot A) = \frac{1}{\lambda}\, \Phi(A), \qquad \lambda > 0,\ A \in \mathcal{B}_{S_c}. \tag{1.16} \]

Proof. Use polar coordinates to identify the set $E$ with $(0, \infty) \cdot S_c$, and identify $\nu$ in these coordinates with the product measure $\mu \times \Phi$ on $(0, \infty) \times S_c$, where the measure $\mu$ on $(0, \infty)$ is defined by $\mu((\lambda, \infty)) = 1/\lambda$, $\lambda > 0$. Precisely, define the one-to-one function $T : (0, \infty) \times S_c \to E$ by
\[ T(\lambda, a) := \lambda a \]
and put, for an arbitrary Borel subset $B$ of $E$,
\[ \nu(B) := (\mu \times \Phi)(T^{-1}(B)). \tag{1.17} \]
Then, $\nu$ is a measure on the Borel $\sigma$-field of $E$ with
\[ \nu((\lambda, \infty) \cdot A) = (\mu \times \Phi)(T^{-1}((\lambda, \infty) \cdot A)) = (\mu \times \Phi)((\lambda, \infty) \times A) = \mu((\lambda, \infty))\, \Phi(A) = \frac{1}{\lambda}\, \Phi(A), \qquad \lambda > 0,\ A \in \mathcal{B}_{S_c}. \]
Since the set $\mathcal{M} := \{(\lambda, \infty) \cdot A : \lambda > 0,\ A \in \mathcal{B}_{S_c}\}$ generates the Borel $\sigma$-field of $E$ (use the function $T$ to prove this) and $\mathcal{M}$ is stable with respect to intersection, i.e., if $M_1, M_2 \in \mathcal{M}$, then $M_1 \cap M_2 \in \mathcal{M}$ (check this as well), the measure $\nu$ is uniquely determined by property (1.16). But this is the assertion. □

Lemma 1.7.9 The measure $\nu$ from Lemma 1.7.8 satisfies
\[ \nu\left( [0, x]^\complement \right) = \left\| \frac{1}{x} \right\|_D, \qquad x > 0 \in \mathbb{R}^d, \]
where $1/x := (1/x_1, \dots, 1/x_d)$.

Proof. We have the equations
\begin{align*}
\nu\left( [0, x]^\complement \right) &= \int_{\mathbb{R}^+ \cdot S_c} 1_{[0,x]^\complement}(y)\, \nu(dy) \\
&= \int_{S_c} \int_{\mathbb{R}^+} 1_{[0,x]^\complement}(\lambda s)\, \mu(d\lambda)\, \Phi(ds) \\
&= \int_{S_c} \mu(\lambda > 0 : \lambda s_i > x_i \text{ for some } 1 \le i \le d)\, \Phi(ds) \\
&= \int_{S_c} \mu\left( \lambda > 0 : \lambda > \min_{1\le i\le d} \frac{x_i}{s_i} \right) \Phi(ds) \\
&= \int_{S_c} \frac{1}{\min_{1\le i\le d} (x_i / s_i)}\, \Phi(ds) \\
&= \int_{S_c} \max_{1\le i\le d} \frac{s_i}{x_i}\, \Phi(ds) = \left\| \frac{1}{x} \right\|_D. \qquad \square
\end{align*}

Corollary 1.7.10 The distribution of a generator $\tilde Z$ of $\|\cdot\|_D$ with $\|\tilde Z\| = c$ is uniquely determined.

Proof. Let $Z^{(1)}, Z^{(2)}$ be two generators of $\|\cdot\|_D$ with $\left\| Z^{(1)} \right\| = \left\| Z^{(2)} \right\| = c$. For $A \in \mathcal{B}_{S_c}$ and $i = 1, 2$, put
\[ \Phi_i(A) := \frac{1}{c}\, E\left( 1_{\mathbb{R}^+ \cdot A}\left( Z^{(i)} \right) \left\| Z^{(i)} \right\| \right) = E\left( 1_{\mathbb{R}^+ \cdot A}\left( Z^{(i)} \right) \right). \]
Then, we obtain
\[ \Phi_i(A) = E\left( 1_A\left( Z^{(i)} \right) \right) = P\left( Z^{(i)} \in A \right), \]
i.e., $\Phi_i$ is the distribution of $Z^{(i)}$, $i = 1, 2$. As the measure $\nu$ defined in Lemma 1.7.8 is, according to Lemma 1.7.9, uniquely determined by the D-norm $\|\cdot\|_D$, we obtain
\[ P\left( Z^{(1)} \in A \right) = \Phi_1(A) = \nu((1, \infty) \cdot A) = \Phi_2(A) = P\left( Z^{(2)} \in A \right), \]
which completes the proof. □

The Expected Norm of Any Generator Is a Constant

The preceding results comprise the Normed Generators Theorem 1.7.1, but they yield the following consequence as well.

Corollary 1.7.11 Let $\|\cdot\|_D$ be an arbitrary D-norm and let $\|\cdot\|$ be an arbitrary norm on $\mathbb{R}^d$. Then, there is a constant $\text{const} > 0$ such that, for every generator $Z$ of $\|\cdot\|_D$,
\[ E(\|Z\|) = \text{const}. \]
Proof. Let $Z^{(1)}, Z^{(2)}$ be two generators of $\|\cdot\|_D$. For $i = 1, 2$, put $c_i := E\left( \left\| Z^{(i)} \right\| \right)$, $S_i := \left\{ s \ge 0 \in \mathbb{R}^d : \|s\| = c_i \right\}$, and define $\Phi_i$ as in Lemma 1.7.5. We have $S_2 = (c_2/c_1) S_1$. Since the measure $\nu$ defined in Lemma 1.7.8 depends only on $\|\cdot\|_D$ according to Lemma 1.7.9, we obtain the equations
\[ \Phi_2(S_2) = \nu((1, \infty) \cdot S_2) = \nu\left( \left( \frac{c_2}{c_1}, \infty \right) \cdot S_1 \right) = \frac{c_1}{c_2}\, \nu((1, \infty) \cdot S_1) = \frac{c_1}{c_2}\, \Phi_1(S_1). \]
But $1 = \Phi_1(S_1) = \Phi_2(S_2)$ and, thus, $c_1/c_2 = 1$, which completes the proof. □

For example, for $\|\cdot\|_D = \|\cdot\|_\infty$ and an arbitrary norm $\|\cdot\|$ on $\mathbb{R}^d$, we obtain that each generator $Z$ of $\|\cdot\|_\infty$ satisfies
\[ E(\|Z\|) = \|(1, \dots, 1)\|. \]
This is immediate from the fact that, in particular, $Z = (1, \dots, 1) \in \mathbb{R}^d$ generates $\|\cdot\|_\infty$.

An arbitrary generator $Z$ of $\|\cdot\|_D = \|\cdot\|_1$ satisfies
\[ E(\|Z\|) = \sum_{i=1}^d \|e_i\|. \]
This follows from the fact that a random permutation of $(d, 0, \dots, 0) \in \mathbb{R}^d$ with equal probability $1/d$ generates the D-norm $\|\cdot\|_1$. In this case, we have
\[ E(\|Z\|) = \sum_{i=1}^d \|d e_i\|\, \frac{1}{d} = \sum_{i=1}^d \|e_i\|. \]

Extending the Normed Generators Theorem

The Normed Generators Theorem 1.7.1 can be extended significantly by replacing the norm $\|\cdot\|$ with a general radial function $R : [0, \infty)^d \to [0, \infty)$, as follows.

Definition 1.7.12 A Borel subset $S$ of $[0, \infty)^d$ is a complete angular set if, for every $x \in [0, \infty)^d$, $x \ne 0 \in \mathbb{R}^d$, there exist a uniquely determined vector $s \in S$ and a uniquely determined number $r > 0$ such that $x = rs$.

For $x = rs$ with $s \in S$ and $r > 0$, put
\[ R(x) := r \]
and $R(0) := 0$. Note that the radial function $R$ is homogeneous of order one, i.e., $R(\lambda x) = \lambda R(x)$, $\lambda \ge 0$, $x \in [0, \infty)^d$: if $x = rs$ with $s \in S$, then $\lambda x = \lambda r s$ and, thus,
\[ R(\lambda x) = \lambda r = \lambda R(x). \]
Repeating the arguments in the derivation of Theorem 1.7.1, the conclusion of this result can be generalized as follows.

Theorem 1.7.13 (Extended Normed Generators Theorem) Let $S$ be a complete angular set in $\mathbb{R}^d$ with a corresponding Borel-measurable radial function $R$. For every D-norm $\|\cdot\|_D$ on $\mathbb{R}^d$, there exists a generator $Z$ such that $R(Z) = \text{const}$. The distribution of this generator is uniquely determined.

The above result is actually a generalization of Theorem 1.7.1, since not every complete angular set $S$ can be generated by a norm $\|\cdot\|$ through $S = \{s \in [0, \infty)^d : \|s\| = c\}$ with some $c > 0$. For example, put
\[ S := \left\{ \left( u, (1-u)^2 \right) : u \in [0, 1] \right\}, \]
which is a complete angular set in $\mathbb{R}^2$. Suppose there is a norm $\|\cdot\|$ on $\mathbb{R}^2$ such that
\[ S = \left\{ (s_1, s_2) \in [0, 1]^2 : \|(s_1, s_2)\| = c \right\} \]
for some $c > 0$. Note that $(1, 0)$ and $(0, 1)$ are both elements of $S$, and thus, by the triangle inequality and the homogeneity of order one satisfied by every norm, we have for every $v \in [0, 1]$
\[ \|(v, 1-v)\| = \|v (1, 0) + (1-v)(0, 1)\| \le v \|(1, 0)\| + (1-v) \|(0, 1)\| = vc + (1-v)c = c. \]
Choose $v \in (0, 1)$. Then, there exist $\lambda > 1$ and $u \in (0, 1)$ such that
\[ (v, 1-v) = \lambda \left( u, (1-u)^2 \right). \]
As a consequence, we obtain
\[ \|(v, 1-v)\| = \lambda \left\| \left( u, (1-u)^2 \right) \right\| = \lambda c > c \ge \|(v, 1-v)\|, \]
which is a contradiction.

An example of an unbounded complete angular set in $\mathbb{R}^2$ is
\[ S := \{(u, 1/u) : u > 0\} \cup \{(0, 1), (1, 0)\}. \]
Angular sets that are not necessarily complete are introduced in Definition 1.11.2.
1.8 Metrization of the Space of D-Norms

Denote by $\mathcal{Z}_{\|\cdot\|_D}$ the set of all generators of a given D-norm $\|\cdot\|_D$ on $\mathbb{R}^d$. Theorem 1.7.1 and Corollary 1.7.2 imply the following result.

Lemma 1.8.1 Each set $\mathcal{Z}_{\|\cdot\|_D}$ contains a generator $Z$ with the additional property $\|Z\|_1 = d$. The distribution of this $Z$ is uniquely determined.

Let $\mathcal{P}$ be the set of all probability measures on the simplex $S_d := \{x \ge 0 \in \mathbb{R}^d : \|x\|_1 = d\}$. Using the preceding lemma, we can identify the set $\mathcal{D}$ of D-norms on $\mathbb{R}^d$ with the subset $\mathcal{P}_D$ of those probability distributions $P \in \mathcal{P}$ that satisfy the additional condition $\int_{S_d} x_i\, P(dx) = 1$, $i = 1, \dots, d$.

Introducing the Wasserstein Metric

Denote by $d_W(P, Q)$ the Wasserstein metric between two probability distributions on $S_d$, i.e.,
\[ d_W(P, Q) := \inf\{ E(\|X - Y\|_1) : X \text{ has distribution } P,\ Y \text{ has distribution } Q \}. \]
As $S_d$, equipped with an arbitrary norm $\|\cdot\|$, is a complete separable metric space, the metric space $(\mathcal{P}, d_W)$ is complete and separable as well; see, for example, Bolley (2008).

Lemma 1.8.2 The subspace $(\mathcal{P}_D, d_W)$ of $(\mathcal{P}, d_W)$ is also separable and complete.

Proof. It is sufficient to show that $(\mathcal{P}_D, d_W)$ is a separable and closed subspace of $(\mathcal{P}, d_W)$. We start by showing that it is closed. Let $P_n$, $n \in \mathbb{N}$, be a sequence of probability measures in $\mathcal{P}_D$ that converges to $P \in \mathcal{P}$ with respect to $d_W$. We show that $P \in \mathcal{P}_D$. Let the rv $X$ have distribution $P$, and let $X^{(n)}$ have distribution $P_n$, $n \in \mathbb{N}$. Then,
\begin{align*}
\sum_{i=1}^d \left| \int_{S_d} x_i\, P(dx) - 1 \right| &= \sum_{i=1}^d \left| \int_{S_d} x_i\, P(dx) - \int_{S_d} x_i\, P_n(dx) \right| \\
&= \sum_{i=1}^d \left| E\left( X_i - X_i^{(n)} \right) \right| \le E\left( \sum_{i=1}^d \left| X_i - X_i^{(n)} \right| \right) = E\left( \left\| X - X^{(n)} \right\|_1 \right), \qquad n \in \mathbb{N}.
\end{align*}
As a consequence, we obtain
\[ \sum_{i=1}^d \left| \int_{S_d} x_i\, P(dx) - 1 \right| \le d_W(P, P_n) \to_{n\to\infty} 0 \]
and, thus, $P \in \mathcal{P}_D$.

The separability of $\mathcal{P}_D$ can be seen as follows. Let $\mathcal{P}^*$ be a countable and dense subset of $\mathcal{P}$. Identify each distribution $P$ in $\mathcal{P}^*$ with an rv $Y = (Y_1, \dots, Y_d)$ on $S_d$ which follows this distribution $P$, i.e., each component $Y_i$ is non-negative, and we have $Y_1 + \dots + Y_d = d$. Without loss of generality (wlog), we can assume that $E(Y_i) > 0$ for each component. This can be seen as follows. Let $T \subset \{1, \dots, d\}$ be the set of those indices $i$ with $E(Y_i) = 0$. Suppose that $T \ne \emptyset$. As $Y_i \ge 0$, this implies $Y_i = 0$ a.s. for $i \in T$. For $n \in \mathbb{N}$, put
\[ Y_i^{(n)} := \begin{cases} \left( 1 - \frac{1}{n} \right) Y_i & \text{if } i \notin T, \\ \frac{d}{n |T|} & \text{if } i \in T. \end{cases} \]
Then, $\sum_{i=1}^d Y_i^{(n)} = d$ and, with $Y^{(n)} := \left( Y_1^{(n)}, \dots, Y_d^{(n)} \right)$,
\[ E\left( \left\| Y - Y^{(n)} \right\|_1 \right) = \sum_{i \in T} \frac{d}{n |T|} + \frac{1}{n} \sum_{i \notin T} E(Y_i) = 2 \frac{d}{n}. \]
Therefore, the sequence $Y^{(n)}$, $n \in \mathbb{N}$, approximates $Y$ arbitrarily closely. We substitute $Y \in \mathcal{P}^*$ by the sequence $Y^{(n)}$, $n \in \mathbb{N}$, and, as $\mathcal{P}^*$ is countable, we can assume wlog that each component $Y_i$ of each $Y \in \mathcal{P}^*$ has positive expectation.

Finally, put $Z = Y / E(Y)$. This yields a countable subset of $\mathcal{P}_D$, which is dense. □

We can now define the distance between two D-norms $\|\cdot\|_{D_1}$ and $\|\cdot\|_{D_2}$ on $\mathbb{R}^d$ by
\[ d_W\left( \|\cdot\|_{D_1}, \|\cdot\|_{D_2} \right) := \inf\left\{ E\left( \left\| Z^{(1)} - Z^{(2)} \right\|_1 \right) : Z^{(i)} \text{ generates } \|\cdot\|_{D_i},\ \left\| Z^{(i)} \right\|_1 = d,\ i = 1, 2 \right\}. \]
The space $\mathcal{D}$ of D-norms on $\mathbb{R}^d$, equipped with the distance $d_W$, is, according to Lemma 1.8.2, a complete and separable metric space.

Convergence of D-Norms and Weak Convergence of Generators

For the rest of this section, we restrict ourselves to generators $Z$ of D-norms on $\mathbb{R}^d$ that satisfy $\|Z\|_1 = d$. By $\to_D$, we denote ordinary convergence in distribution.

Proposition 1.8.3 Let $\|\cdot\|_{D_n}$, $n \in \mathbb{N} \cup \{0\}$, be a sequence of D-norms on $\mathbb{R}^d$ with corresponding generators $Z^{(n)}$, $n \in \mathbb{N} \cup \{0\}$. Then, we have the equivalence
\[ d_W\left( \|\cdot\|_{D_n}, \|\cdot\|_{D_0} \right) \to_{n\to\infty} 0 \iff Z^{(n)} \to_D Z^{(0)}. \]

We see in Corollary 5.1.13 that weak convergence $Z^{(n)} \to_D Z^{(0)}$ of arbitrary generators, whose components do not necessarily add up to $d$, implies pointwise convergence of the corresponding D-norms.

Proof. Convergence of probability measures $P_n$ to $P_0$ with respect to the Wasserstein metric is equivalent to weak convergence together with convergence of the moments
\[ \int_{S_d} \|x\|_1\, P_n(dx) \to_{n\to\infty} \int_{S_d} \|x\|_1\, P_0(dx); \]
see, for example, Villani (2009). But since for each probability measure $P \in \mathcal{P}_D$ we have
\[ \int_{S_d} \|x\|_1\, P(dx) = \int_{S_d} d\, P(dx) = d, \]
the convergence of the moments is automatically satisfied. □



Convergence of D-norms with respect to the Wasserstein metric implies pointwise convergence, which is uniform on compact subsets of $\mathbb{R}^d$. This is a consequence of the following auxiliary result, which provides in particular the fact that the pointwise limit of a sequence of D-norms is a D-norm; see Corollary 1.8.5.

Lemma 1.8.4 For two arbitrary D-norms $\|\cdot\|_{D_1}, \|\cdot\|_{D_2}$ on $\mathbb{R}^d$, we have the bound
\[ \|x\|_{D_1} \le \|x\|_{D_2} + \|x\|_\infty\, d_W\left( \|\cdot\|_{D_1}, \|\cdot\|_{D_2} \right) \]
and thus, for $r \ge 0$,
\[ \sup_{x \in \mathbb{R}^d,\ \|x\|_\infty \le r} \left| \|x\|_{D_1} - \|x\|_{D_2} \right| \le r\, d_W\left( \|\cdot\|_{D_1}, \|\cdot\|_{D_2} \right). \]

Proof. Let $Z^{(i)}$ be a generator of $\|\cdot\|_{D_i}$, $i = 1, 2$. We have
\begin{align*}
\|x\|_{D_1} &= E\left( \max_{1\le i\le d} \left( |x_i|\, Z_i^{(1)} \right) \right) = E\left( \max_{1\le i\le d} \left( |x_i| \left( Z_i^{(2)} + Z_i^{(1)} - Z_i^{(2)} \right) \right) \right) \\
&\le E\left( \max_{1\le i\le d} \left( |x_i|\, Z_i^{(2)} \right) \right) + \|x\|_\infty\, E\left( \max_{1\le i\le d} \left| Z_i^{(1)} - Z_i^{(2)} \right| \right) \\
&\le \|x\|_{D_2} + \|x\|_\infty\, E\left( \left\| Z^{(1)} - Z^{(2)} \right\|_1 \right),
\end{align*}
which implies the assertion. □


The Pointwise Limit of D-Norms is a D-Norm


Lemma 1.8.4 entails that the pointwise limit of a sequence of D-norms is again
a D-norm.

Corollary 1.8.5 Let ·Dn , n ∈ N, be a sequence of D-norms on Rd


such that
lim xDn =: f (x)
n→∞

exists in [0, ∞) for x ∈ Rd . Then, f (·) is a D-norm on Rd .

Choose an arbitrary norm · on Rd and put S · := {u ∈ Rd : u ≥


0, u = 1}. By using polar coordinates and writing x = x (x/ x), x ≥ 0,
x
= 0, it is sufficient in the previous result to require the limit

lim uDn = f (u)


n→∞

to exist for u ∈ S · . Then,

xD := x f (|x| / x), x ∈ Rd \ {0} ,

defines a D-norm on Rd , where |x| = (|x1 | , . . . , |xd |) is meant componentwise.

Proof. From Corollary 1.7.2 we know that, for every D-norm ·Dn , there

exists a generator Z (n) that realizes in Sd := x ≥ 0 ∈ Rd : x1 = d . The
simplex Sd is a compact subset of Rd , and thus, the sequence Z (n) , n ∈ N,
is tight, i.e., for each ε > 0 there exists a compact set K in Rd such that
P Z (n) ∈ K > 1 − ε for n ∈ N; just choose K = Sd . But this implies
that the sequence is relatively compact , i.e., there exists a subsequence Z (m) ,
m = m(n) that converges in distribution to some rv Z = (Z1 , . . . , Zd ); see,
for example, Billingsley (1999, Prokhorov’s theorem).
One readily finds that this limit Z realizes in Sd as well, and that each of
its components has expected value equal to one. The Portmanteau Theorem
implies  
lim sup P Z (m) ∈ Sd ≤ P (Z ∈ Sd )
n∈N
1.8 Metrization of the Space of D-Norms 37
 
as Sd is a closed subset of Rd . But P Z (m) ∈ Sd = 1 for each m, and thus,
(m)
P (Z ∈ Sd ) = 1 as well. The sequence of components Zi , n ∈ N, is uniformly
(m)
integrable for each i ∈ {1, . . . , d} as 
Zi realizes
 in [0, d], and thus, weak
(m) (m)
convergence Zi →D Zi implies E Zi →n→∞ E(Zi ); see Billingsley
 
(m)
(1999, Theorem 5.4). But E Zi = 1, and thus, we obtain E(Zi ) = 1 as
well for each i ∈ {1, . . . , d}.
The rv Z is, therefore, the generator of a D-norm ·D . From Proposi-
tion 1.8.3, we obtain that dW ·Dm , ·D →n→∞ 0. Lemma 1.8.4 implies
that xDm → xD , x ∈ Rd , and, thus, f (·) = ·D .

The Topology of Pointwise Convergence


We can define a topology on the set of D-norms on Rd by defining convergence
as
·Dn →n→∞ ·D0 : ⇐⇒ ∀ x ∈ Rd : xDn →n→∞ xD0 .
This generates the topology of pointwise convergence. This topology is ac-
tually metrized by the Wasserstein metric dW (·, ·), as the following result
shows.

Proposition 1.8.6 For a sequence ·Dn , n ∈ N ∪ {0}, of D-norms on


Rd , we have the equivalence
 
dW ·Dn , ·D0 →n→∞ 0 ⇐⇒ ∀ x ∈ Rd : xDn →n→∞ xD0 .

Metrization of the set of D-norms is crucial in Section 1.11, where we


take a look at them from a functional analysis perspective. In particular, in
equation (1.38), we define another metric, which, according to Lemma 1.11.15,
metrizes the topology of pointwise convergence as well.

Proof (of Proposition 1.8.6). The implication “⇒” is immediate from


Lemma 1.8.4 as follows. For arbitrary x ∈ Rd , x
= 0, we have
!    !!
! ! !   
!x − x ! = x !! x  −  x  !!
∞ !  
Dn D0
x∞ Dn x∞ D0 !
 
≤ x∞ dW ·Dn , ·D0 →n→∞ 0. (1.18)

Next, we establish the reverse implication. Suppose the sequence of D-


norms ·Dn , n ∈ N, satisfies xDn →n→∞ xD0 for each x ∈ Rd , but
 
dW ·Dn , ·D0 does not converge to zero as n tends to infinity. Then,
there is a subsequence ·Dk(n) , n ∈ N, with
38 1 D-Norms
 
dW ·Dk(n) , ·D0 ≥ ε, n ∈ N, (1.19)

for some ε > 0.


By Corollary 1.7.2, every D-norm ·Dn has a generator Z (n) , n ∈ N∪{0},

that realizes in Sd = x ≥ 0 ∈ Rd : x1 = d . By repeating the arguments
in the proof of Corollary 1.8.5, there is a further subsequence Z (m(n)) , n ∈ N,
of Z (k(n)) , n ∈ N, which converges in distribution to
 the generatorZ ∈ Sd of
a D-norm ·D . Proposition 1.8.3 implies that dW ·Dm(n) , ·D →n→∞ 0
and, thus, xDm(n) →n→∞ xD for each x ∈ Rd by (1.18). This implies
·D = ·D0 and
 
dW ·Dm(n) , ·D0 →n→∞ 0,

which contradicts (1.19).


1.9 Multiplication of D-Norms


The set of D-norms can be equipped with a commutative multiplication-type
operation, making it a semigroup with an identity element. This multiplication
leads to idempotent D-norms. We characterize the set of idempotent D-norms.
Iterating the multiplication provides a track of D-norms, whose limit exists
and is again a D-norm. If this iteration is repeatedly done on the same D-
norm, then the limit of the track is idempotent.

Multiplying Two D-Norms


Choose two generators Z (1) , Z (2) with corresponding D-norms ·D1 , ·D2
on Rd , and suppose that Z (1) and Z (2) are independent. Then,

Z := Z (1) Z (2)

is again a generator of a D-norm, which we denote by ·D1 D2 . Recall that


all operations on vectors, such as the above multiplication, are meant compo-
(1) (2)
nentwise. The components satisfy Zi Zi ≥ 0 and,by the independence
 of
(1) (2) (1) (2) (1) (2)
Zi and Zi , we also have E Zi Zi = E Zi E Zi = 1.
Clearly, the multiplication is commutative, ·D1 D2 = ·D2 D1 . The D-
norm ·D1 D2 does not depend on the particular choice of generators.

Lemma 1.9.1 The D-norm ·D1 D2 does not depend on the particular
choice of generators Z (1) , Z (2) , provided that they are independent.
1.9 Multiplication of D-Norms 39

Proof. By P ∗ Z we denote the distribution (P ∗ Z)(·) = P (Z ∈ ·) of an rv


Z. For x ∈ Rd , Fubini’s theorem implies
 

(1) (2)
xD1 D2 = E max |xi | Zi Zi
1≤i≤d
  
   
(1) (2) (1) (1)
= E max |xi | zi Zi P ∗ Z (1) d z1 , . . . , zd
1≤i≤d
      

   
= xz (1)  P ∗ Z (1) dz (1) = E xZ (1)  , (1.20)
D2 D2

i.e., xD1 D2 is independent of the particular choice Z (2) . Repeating the above
arguments and conditioning on Z (2) , we obtain the equation
 

 
xD1 D2 = E xZ (2)  , (1.21)
D1

i.e., xD1 D2 is independent of the particular choice Z (1) as well.


The Sup-Norm is the Identity Element


For instance, let ·D2 = ·∞ . We can choose the constant generator Z (2) =
(1, . . . , 1), which is independent of any generator Z (1) , and clearly obtain

xD1 D2 = xD1 , x ∈ Rd , (1.22)

i.e., ·D1 D2 = ·D1 . The sup-norm ·∞ is, therefore, the identity element
within the set of D-norms, equipped with the above multiplication. There is
no other D-norm with this property.
Equipped with this commutative multiplication, the set of D-norms on Rd
is, therefore, a semigroup with an identity element.

Example 1.9.2 The set of Hüsler–Reiss D-norms as defined in (1.6)


is closed with respect to multiplication, i.e., the product of two Hüsler–
Reiss D-norms is again a Hüsler–Reiss D-norm. This is an immedi-
ate consequence of the convolution theorem for (multivariate) normal
distribution: let X = (X1 , . . . , Xd ) and Y = (Y1 , . . . , Yd ) be indepen-
dent multivariate normal rv with zero means and covariance matrices
Σ = (σij )1≤i,j≤d , Λ = (λij )1≤i,j≤d . Then, the product of the corre-
sponding Hüsler–Reiss D-norms ·HRΣ and ·HRΛ has the generator

(1) (2) σ11 + λ11 σdd + λdd


Z Z = exp X1 + Y1 − , . . . , Xd + Yd − .
2 2

The rv
X + Y = (X1 + Y1 , . . . , Xd + Yd )
40 1 D-Norms

is again multivariate normal, with mean vector 0 ∈ Rd and covariance


matrix Σ + Λ = (σij + λij )1≤i,j≤d , and thus, the product is again a
Hüsler–Reiss D-norm. In short notation we have

·HRΣ HRΛ = ·HRΣ+Λ .

In particular, choose Y = (Y, . . . , Y ), where Y is a normal rv on the


real line with mean zero and variance λ > 0. Then, Y is multivariate
normal with mean vector zero and constant covariance matrix Λ = λE.
Equation (1.5) implies
·HRλE = ·∞
thus we obtain from (1.22)

·HRΣ HRλE = ·HRΣ+λE = ·HRΣ .

On the other hand, Σ/n = (σij /n)1≤i,j≤d


√ is, for an √ arbitrary √ inte-
ger n ∈ N, the covariance matrix of X/ n = (X1 / n, . . . , Xd / n).
Therefore, the Hüsler–Reiss D-norm is multiplication stable, i.e.,

·(HRΣ/n )n = ·HRΣ , n ∈ N.

This example is continued in Example 1.9.7.

The Absorbing D-Norm


Take Z (2) as a generator of the D-norm ·1 . Then, we obtain from equation
(1.20)
   
d   d
  (1)
xD1 D2 = E xZ (1)  = |xi | E Zi = |xi | , x ∈ Rd ,
1
i=1 i=1

i.e., ·D1 D2 = ·1 . Multiplication with the norm ·1 yields ·1 again, and
thus, ·1 is an absorbing element among the set of D-norms. There is clearly
no other D-norm with this property.

Idempotent D-Norms
The maximum-norm ·∞ and the norm ·1 both satisfy
·D2 := ·DD = ·D .
Such a D-norm is called idempotent. Naturally, the question of how to charac-
terize the set of idempotent D-norms arises. This is achieved in what follows.
It turns out that in the bivariate case, ·∞ and ·1 are the only idempotent
D-norms, whereas in higher dimensions, each idempotent D-norm is a certain
combination of ·∞ and ·1 .
1.9 Multiplication of D-Norms 41

The following auxiliary result is crucial for the characterization of idem-


potent D-norms. This characterization may be of interest of its own.

Lemma 1.9.3 Let X be an rv with E(X) = 0, and let Y be an inde-


pendent copy of X. If

E(|X + Y |) = E(|X|),

then either X = 0 or X ∈ {−m, m} a.s. with P (X = −m) = P (X =


m) = 1/2 for some m > 0. The reverse implication is true as well.

Proof. Suppose that P (X = −m) = P (X = m) = 1/2 for some m > 0. Then,


obviously,
E(|X|) = m = E(|X + Y |).
Next, we establish the other implication. Suppose that X is not a.s. the con-
stant zero. Denote by F the df of X. We can assume wlog the representation
X = F −1 (U1 ), Y = F −1 (U2 ), where U1 , U2 are independent, on (0, 1) uni-
formly distributed rv, and F −1 (q) := inf {t ∈ R : F (t) ≥ q}, q ∈ (0, 1), is the
generalized inverse of F . The well-known equivalence
F −1 (q) ≤ t ⇐⇒ q ≤ F (t), q ∈ (0, 1), t ∈ R,
(see, for example, Reiss (1989, equation (1.2.9))) together with Fubini’s the-
orem imply
! !
E(|X + Y |) = E !F −1 (U1 ) + F −1 (U2 )!
 1 1
! −1 !
= !F (u) + F −1 (v)! du dv
0 0
 F (0)  F (0)
=− F −1 (u) + F −1 (v) du dv
0 0
 1  1
+ F −1 (u) + F −1 (v) du dv
F (0) F (0)
 
F (0) 1 ! −1 !
+2 !F (u) + F −1 (v)! du dv
0 F (0)
   
F (0) F (0)
−1 −1
=− F (0)F (v) + F (u) du dv
0 0
   
1 1
+ (1 − F (0)) F −1 (v) + F −1 (u) du dv
F (0) F (0)
 
F (0) 1 ! −1 !
+2 !F (u) + F −1 (v)! du dv
0 F (0)
 F (0)
= −2F (0) F −1 (v) dv
0
42 1 D-Norms
 1
+ 2(1 − F (0)) F −1 (v) dv
F (0)
 
F (0) 1 ! −1 !
+2 !F (u) + F −1 (v)! du dv
0 F (0)

and  
F (0) 1
E(|X|) = − F −1 (u) du + F −1 (u) du.
0 F (0)

From the assumption E(|X + Y |) = E(|X|), we thus obtain the equation


 F (0)
0 = (1 − 2F (0)) F −1 (v) dv
0
 1
+ (1 − 2F (0)) F −1 (v) dv
F (0)
 
F (0) 1 ! −1 !
+2 !F (u) + F −1 (v)! du dv
0 F (0)

or
 1
0 = (1 − 2F (0)) F −1 (v) dv
0
 
F (0) 1 ! −1 !
+2 !F (u) + F −1 (v)! du dv.
0 F (0)
1
The assumption 0 = E(X) = 0
F −1 (v) dv now yields
 
F (0) 1 ! −1 !
!F (u) + F −1 (v)! du dv = 0
0 F (0)

thus,
F −1 (u) + F −1 (v) = 0 (1.23)
for λ-a.e. (u, v) ∈ [0, F (0)] × [F (0), 1], where λ denotes the Lebesgue measure
on [0, 1].
If F (0) = 0, then P (X > 0) = 1, and thus, E(X) > 0, which would be a
contradiction. If F (0) = 1, then P (X < 0) > 0 unless P (X = 0) = 1, which
we have excluded, and thus, E(X) < 0, which would again be a contradiction.
Consequently, we have established 0 < F (0) < 1.
As the function F −1 (q), q ∈ (0, 1), is in general continuous from the left
(see, e.g., Reiss (1989, Lemma A.1.2)), equation (1.23) implies that F −1 (v) is
a constant function on (0, F (0)] and on (F (0), 1), precisely,

−1 −m, v ∈ (0, F (0)],
F (v) =
m, v ∈ (F (0), 1),
1.9 Multiplication of D-Norms 43

for some m > 0. Note that the representation X = F −1 (U1 ), together with the
assumption that X is not a.s. the constant zero, implies m
= 0. The condition
 F (0)  1
0 = E(X) = F −1 (v) dv + F −1 (v) dv = m(1 − 2F (0))
0 F (0)

implies F (0) = 1/2 and, thus,



m, U1 > 12 ,
X = F −1 (U1 ) =
−m, U1 ≤ 12 ,

which is the assertion.


Idempotent D-Norms on R2
The next result characterizes bivariate idempotent D-norms.

Proposition 1.9.4 A D-norm ·D on R2 is idempotent iff ·D ∈


{·1 , ·∞ }.

Proof. It suffices to establish the implication

·D2 = ·D , ·D


= ·∞ =⇒ ·D = ·1 .
   
(1) (1) (2) (2)
Let Z (1) = Z1 , Z2 and Z (2) = Z1 , Z2 be independent and
identically distributed generators of ·D . According to Corollary 1.7.2, we can
(1) (1) (2) (2) (1) (2)
assume that Z1 + Z2 = 2 = Z1 + Z2 . Put X := Z1 − 1, Y := Z1 − 1.
Then, X and Y are independent and identically distributed with X ∈ [−1, 1],
E(X) = 0. From the equation max(a, b) = (a + b)/2 + |b − a| /2, which is valid
for arbitrary numbers a, b ∈ R, we obtain the representation
  
(1) (2) (1) (2)
E max Z1 Z1 , Z2 Z2
 
(1) (2) (1) (2)
1 !! (1) (2) !
Z1 Z1 Z2 Z2 (1) (2) !
=E + + E !Z1 Z1 − Z2 Z2 !
2 2 2
! !
! (1) (2) !
= 1 + E !Z1 − 1 + Z1 − 1!
= 1 + E(|X + Y |)

as well as   
(1) (1)
E max Z1 , Z2 = 1 + E(|X|).

Lemma 1.9.3 now implies that P (X = m) = P (X = −m) = 1/2 for some


m ∈ (0, 1]. It remains to show that m = 1.
44 1 D-Norms

Set x = 1 and y = a, where 0 < a < 1 satisfies a(1 + m) > 1 − m. Then,


a(1 + m)2 > (1 − m)2 as well, and we obtain
    
(1) (2) (1) (2)
(x, y)D2 = E max Z1 Z1 , a 2 − Z1 2 − Z1
1  
= max (1 − m)2 , a(1 + m)2
4
1  
+ max (1 + m)2 , a(1 − m)2
4
1  
+ max 1 − m2 , a(1 − m2 )
2
1 1 1
= a(1 + m)2 + (1 + m)2 + (1 − m2 )
4 4 2
1 1
= (1 + m)2 (1 + a) + (1 − m2 )
4 2
and
   
(1) (1)
(x, y)D = E max Z1 , a 2 − Z1
1
= max(1 + m, a(1 − m))
2
1
+ max(1 − m, a(1 + m))
2
1 1
= (1 + m) + a(1 + m)
2 2
1
= (1 + m)(1 + a).
2
From the equality (x, y)D2 = (x, y)D and the fact that 1 + m > 0, we
thus obtain
1 1 1
(1 + m)(1 + a) + (1 − m) = (1 + a)
4 2 2
⇐⇒ (m − 1)(a − 1) = 0
⇐⇒ m = 1,
which completes the proof.

Idempotent D-Norms on Rd
Next, we extend Proposition 1.9.4 to arbitrary dimensions d ≥ 2. Denote
again by ei the i-th unit vector in Rd , 1 ≤ i ≤ d, and let ·D be an arbitrary
D-norm on Rd . Recall that, for 1 ≤ i < j ≤ d,
(x, y)Di,j = xei + yej D , (x, y) ∈ R2 ,

defines a D-norm on R2 , called the bivariate projection of ·D . If Z =


(Z1 , . . . , Zd ) is a generator of ·D , then (Zi , Zj ) generates ·Di,j .
1.9 Multiplication of D-Norms 45

Proposition 1.9.5 Let ·D be a D-norm on Rd such that each bi-


variate projection ·Di,j is different from the bivariate sup-norm ·∞ .
Then, ·D is idempotent iff ·D = ·1 .

Proof. If ·D is idempotent, then each bivariate projection is an idempotent


D-norm on R2 ; thus, each bivariate projection is necessarily the bivariate D-
norm ·1 by Proposition 1.9.4. In other words, ei + ej D = 2 for 1 ≤ i <
j ≤ d. Corollary 1.3.5 now implies ·D = ·1 .

Complete Dependence Frame of a D-Norm


If we allow bivariate complete dependence, i.e., ·Di,j = ·∞ (see the com-
ments after Theorem 2.3.3), then we obtain the complete class of idempotent
D-norms on Rd as mixtures of lower-dimensional ·∞ - and ·1 -norms. To
this end, we first introduce the complete dependence frame of a D-norm.
Let ·D be an arbitrary D-norm on Rd such that at least one bivariate
projection ·Di,j equals ·∞ on R2 . Then, there exist non-empty disjoint
subsets A1 , . . . , AK of {1, . . . , d}, 1 ≤ K < d, |Ak | ≥ 2, 1 ≤ k ≤ K, such that
 
 
 
 xi ei  = max |xi | , x ∈ Rd , 1 ≤ k ≤ K,
  i∈Ak
i∈Ak D
 
and no other projection  i∈B xi ei D , B ⊂ {1, . . . , d}, |B| ≥ 2, B
= Ak ,
1 ≤ k ≤ K, is the sup-norm ·∞ on R|B| . We call A1 , . . . , AK the complete
dependence frame (CDF) of ·D . If there is no completely dependent bivariate
projection of ·D , then we say that its CDF is empty.
The next result characterizes the set of idempotent D-norms with at least
one completely dependent bivariate projection.

Theorem 1.9.6 Let ·D be an idempotent D-norm with non-empty


CDF A1 , . . . , AK . Then, we have


K 
xD = max |xi | + |xi | , x ∈ Rd .
i∈Ak
k=1 i∈{1,...,d}\∪K
k=1 Ak

On the other hand, the above equation defines an idempotent D-norm


on Rd with CDF A1 , . . . , AK for each set of non-empty disjoint subsets
A1 , . . . , AK of {1, . . . , d} with |Ak | ≥ 2, 1 ≤ k ≤ K < d.

Proof. The easiest way to establish this result is to use the fact that G(x) :=
exp(− xD ), x ≤ 0 ∈ Rd , defines a df for an arbitrary D-norm ·D on Rd .
This is the content of Theorem 2.3.3 later in this book.
46 1 D-Norms

Therefore, let η = (η1 , . . . , ηd ) be an rv with this df G. Then, for x ≤ 0 ∈


Rd , we have
G(x) = exp (− xD )
= P (ηi ≤ xi , 1 ≤ i ≤ d)

 K 
= P ηk∗ ≤ min xi , 1 ≤ k ≤ K; ηj ≤ xj , j ∈ ∪k=1 Ak ,
i∈Ak


where k ∈ Ak is an arbitrary but fixed element of Ak for each k ∈ {1, . . . , K}.
 
The rv η ∗ with joint components ηk∗ , 1 ≤ k ≤ K, and ηj , j ∈ ∪K k=1 Ak , is
an rv of a dimension less than d, and η ∗ has no pair of completely dependent
components. The rv η ∗ may be viewed as the rv η after having removed the
copies of the completely dependent components. Its corresponding D-norm is,
of course, still idempotent. From Proposition 1.9.5, we obtain its df, i.e.,
⎛ ⎞
K !
 ! 
⎜ ! ! ⎟
G(x) = exp ⎝− ! min xi ! − |xj |⎠
! !
i∈Ak
k=1 
j∈(∪K
k=1 Ak )
⎛ ⎞
⎜ 
K  ⎟
= exp ⎝− max |xi | − |xj |⎠ , x ≤ 0 ∈ Rd ,
i∈Ak
k=1 
j∈(∪K
k=1 Ak )

which is the first part of the assertion.


On the other hand, take an rv U that is uniformly distributed
! on the set
  ! K  !!
of integers {k ∗ : 1 ≤ k ≤ K} ∪ ∪K A
k=1 k . Put m := K + ∪
! k=1 k !
A , and
set for i = 1, . . . , d
m, i ∈ Ak ,
Zi :=
0 otherwise,
if U = k ∗ , 1 ≤ k ≤ K, and

m, i = j,
Zi :=
0 otherwise,
 
if U = j ∈ ∪Kk=1 Ak . Then, E(Zi ) = 1, 1 ≤ i ≤ d, and

E max (|xi | Zi )
1≤i≤d


= E max (|xi | Zi ) 1(U = j)


1≤i≤d

j∈{k∗ : 1≤k≤K}∪(∪K
k=1 Ak )


K 
= max |xi | + |xj | , x ∈ Rd .
i∈Ak
k=1 
j∈(∪K
k=1 Ak )
1.9 Multiplication of D-Norms 47

It is easy to see that this D-norm is idempotent, and thus, the proof is com-
plete.

The set of all idempotent trivariate D-norms is, for example, given by the
following five cases:


⎪max(|x| , |y| , |z|),




⎨max(|x| , |y|) + |z| ,
(x, y, z)D = max(|x| , |z|) + |y| ,


⎪max(|y| , |z|) + |x| ,



⎩|x| + |y| + |z| ,

where the three mixed versions are just permutations of the arguments and
may be viewed as equivalent.

Tracks of D-Norms
The multiplication of D-norms D1 , D2 , . . . on Rd can obviously be iterated:
·n+1 Di := ·Dn+1 n Di , n ∈ N.
i=1 i=1

In what follows we investigate such D-norm tracks ·n Di , n ∈ N. We


i=1
show in particular that each track converges to an idempotent D-norm if
·Di = ·D , i ∈ N, with an arbitrary ·D -norm D on Rd .

Example 1.9.7 (Continuation of Example 1.9.2) Recall that any


positive semidefinite d × d-matrix Σ is the covariance matrix of a multi-
variate normal rv with mean vector zero, and thus it defines the Hüsler–
Reiss D-norm ·HRΣ . By the principal axis theorem, there exists an
orthogonal d × d-matrix O, i.e., the transpose O of O is the inverse
O−1 of O, such that
⎛ ⎞
λ1 0
⎜ ⎟
Σ = O ⎝ . . . ⎠ O ,
0 λd

where λ1 ≥ · · · ≥ λd ≥ 0 are the eigenvalues of Σ; see, for example,


Lang (1987). This equation implies the spectral decomposition of Σ

Σ = λ1 r1 r1 + · · · + λd rd rd ,

where ri ∈ Rd is the i-th column of the matrix O, 1 ≤ i ≤ d. Clearly,


each product ri ri is a positive semidefinite d × d matrix, and thus the
Hüsler–Reiss D-norm ·HRΛ is the product of the norms ·HR 
,
λi ri r
i
1 ≤ i ≤ d.
48 1 D-Norms

In short notation, we have

·HRΛ = ·d HRλ 


.
i=1 i ri ri

One may call this representation the spectral decomposition of the


Hüsler–Reiss D-norm ·HRΛ .

We establish several auxiliary results in what follows. The first one shows
that the multiplication of two D-norms is increasing.

Lemma 1.9.8 For two arbitrary D-norms ·D1 , ·D2 on Rd , we have


the inequality  
·D1 D2 ≥ max ·D1 , ·D2 .

Proof. Let Z (1) and Z (2) be independent generators of ·D1 and ·D2 . By
equation (1.21), we have for x ∈ Rd
 

 (2) 
xD1 D2 = E xZ  . (1.24)
D1

Note that      
   
xD1 = xE Z (2)  = E xZ (2)  (1.25),
D1 D1
 (2) 
where
  the expectation  of an rv is meant componentwise, i.e., E Z
 =
(2) (2)
E Z1 , . . . , E Zd , etc. Put

T (x) := xD1 , x ∈ Rd .
Check that T is a convex function by the triangle inequality and the homo-
geneity satisfied by any norm. Jensen’s inequality states that a convex function
T : Rd → R entails T (E(Y1 ), . . . , E(Yd )) ≤ E(T (Y1 , . . . , Yd )) for arbitrary in-
tegrable rv Y1 , . . . , Yd . We thus obtain from Jensen’s inequality together with
equations (1.24) and (1.25)
 

 (2) 
xD1 D2 = E xZ 
D1
  
= E T xZ (2)
  
≥ T E xZ (2)
  
 
= E xZ (2) 
D1
= xD1 .
Exchanging Z (1) and Z (2) completes the proof.

1.9 Multiplication of D-Norms 49

The Limit of a Track is a D-Norm


Proposition 1.9.9 Let ·Dn , n ∈ N, be a sequence of arbitrary D-
norms on Rd . Then, the limit of the track

lim xn Di =: f (x)


n→∞ i=1

exists for each x ∈ Rd and is a D-norm, i.e., f (·) = ·D .

Proof. From Lemma 1.9.8, we know that, for each x ∈ Rd and each n ∈ N,

xn Di ≤ xn+1 Di .
i=1 i=1

As each D-norm is bounded by the norm ·1 , we have xn Di ≤ x1


i=1
for each n ∈ N. Consequently, the sequence xn Di , n ∈ N, is monotone
i=1
increasing and bounded; thus, the limit

lim xn Di =: f (x)


n→∞ i=1

exists in [0, ∞). As the pointwise limit of a sequence of D-norms is a D-norm


by Corollary 1.8.5, f (·) is a D-norm as well.

Idempotent Limit of a Track


If we set ·Dn for each n ∈ N equal to a fixed but arbitrary D-norm ·D ,
then the limit in Proposition 1.9.9 is an idempotent D-norm.

Theorem 1.9.10 Let ·D be an arbitrary D-norm on Rd . Then, the


limit
lim xDn =: xD∗ , x ∈ Rd ,
n→∞

is an idempotent D-norm on Rd .

Proof. We know from Proposition 1.9.9 that ·D∗ is a D-norm on Rd . Let Z ∗


be a generator of this D-norm, and let Z (1) , Z (2) , . . . be independent copies
of the generator Z of ·D , independent of Z ∗ as well. Then, for each x ∈ Rd ,
we have  n  
  
 (i) 
xDn = E x Z  ↑n→∞ xD∗
 
i=1 ∞
according to Lemma 1.9.8 and Proposition 1.9.9, as well as for each k ∈ N
50 1 D-Norms
⎛  ⎞
  
 k (i)  n

xDn ⎝
= E x Z (j)  ⎠
Z 
 i=1 j=k+1 

⎛  ⎞
    
 k n

= E x ⎝  z (i)
Z (j)  ⎠
 i=1 j=k+1 

     
P ∗ Z (1) , . . . , Z (k) d z (1) , . . . , z (k)
   k

      
 
→n→∞ x z (i)  P ∗ Z (1) , . . . , Z (k) d z (1) , . . . , z (k)
  ∗
i=1
  D
 
k 
 ∗ 
= E xZ Z (i) 
 
i=1 ∞

by the monotone convergence theorem. We thus have


  
 
k 
 ∗ (i) 
xD∗ = E xZ Z 
 
i=1 ∞

for each k ∈ N. By letting k tend to infinity and repeating the above argu-
ments, we obtain
  
 k 
 ∗ (i) 
xD∗ = E xZ Z  ↑k→∞ E (xZ ∗ D∗ ) = xD∗ D∗ ,
 
i=1 ∞

which completes the proof.



Recall that the multiplication of D-norms is, according to Lemma 1.9.8,
a monotone increasing operation. Together with the fact that ·∞ is the
smallest D-norm (see equation (1.4)), we can draw the following conclusions.
If the initial D-norm ·D has no complete dependence structure among its
margins, i.e., if its CDF is empty, then the limiting D-norm in Theorem 1.9.10
is, by Proposition 1.9.5, the independence D-norm ·1 . Otherwise, the limit
has the same CDF as ·D .
The limit of an arbitrary track ·n Di , n ∈ N, is not necessarily idempo-
i=1
tent. Take, for example, an arbitrary D-norm ·D1 , which is not idempotent,
and set ·Di = ·∞ , i ≥ 2.

An Application to Copulas
Let the rv U = (U1 , . . . , Ud ) follow a copula, i.e., each component Ui is uni-
1
formly distributed on (0, 1). As E(Ui ) = 0 u du = 1/2, the rv Z := 2U
generates a D-norm; see also the discussion on page 152. The following result
is an immediate consequence of the previous considerations.
1.9 Multiplication of D-Norms 51

Corollary 1.9.11 Let U (1) , U (2) , . . . be independent copies of the rv


U that follows an arbitrary copula on Rd . Suppose that no pair Ui , Uj ,
i
= j of the components of U = (U1 , . . . , Ud ) satisfies Ui = Uj a.s.
Then, for each x ∈ Rd ,
  
 n
(i)

d
lim 2 E max |xj |
n
Uj = |xj | .
n→∞ 1≤j≤d
i=1 j=1

With x = (1, . . . , 1), we obtain


  

n
(i)
n
lim 2 E max Uj = d.
n→∞ 1≤j≤d
i=1

An Application to Multivariate Normal Random


Vectors
Let the rv X = (X1 , . . . , Xd ) follow a multivariate normal distribution with
E(Xi ) = 0 and covariance matrix Σ = (σij )1≤i,j≤d . Then,
  σ11   σdd 
Z = (Z1 , . . . , Zd ) = exp X1 − , . . . , exp Xd −
2 2
generates the Hüsler–Reiss D-norm ·HRΣ . In what follows, we require that,

for i
= j, each correlation coefficient ρij = E(Xi Xj )/ σii σjj is strictly less
than one.

Corollary 1.9.12 Let X (1) , X (2) , . . . be independent copies of X. For


each x ∈ Rd , we obtain
   n 
 (i) nσjj
E exp max Xj − + log(|xj |)
1≤j≤d 2
i=1
  
n  σjj 
(i)
= E max |xj | exp Xj −
1≤j≤d 2
i=1

d
→n→∞ |xj | .
j=1

With x = (1, . . . , 1) and identical variances σ11 = · · · = σdd = σ 2 ,


we obtain in particular
   
n (i)
E exp max1≤j≤d i=1 X j
→n→∞ d.
exp(nσ 2 /2)
52 1 D-Norms

1.10 The Functional D-Norm


This section extends D-norms to function spaces. In particular, this entails
an appealing approach to functional extreme value theory in Chapter 4.

Some Basic Definitions


By C [0, 1] := {g : [0, 1] → R, g is continuous}, we denote the well-known set
of continuous functions from the interval [0, 1] to the real line. By E[0, 1] we
denote the lesser known set of those bounded functions f : [0, 1] → R with
only a finite number of discontinuities. Note that E[0, 1] is a linear space: if
f1 , f2 ∈ E[0, 1] and x1 , x2 ∈ R, then x1 f1 + x2 f2 ∈ E[0, 1] as well.
We introduce the set E[0, 1] because it allows the incorporation of finite
dimensional marginal distributions of a stochastic process with a proper choice
of f ∈ E[0, 1], as we see later in equation (1.29).
Let Z = (Zt )t∈[0,1] be a stochastic process on [0, 1], i.e., Zt is an rv for
each t ∈ [0, 1]. We require each sample path of (Zt )t∈[0,1] to be a continuous
function on [0, 1], Z ∈ C [0, 1] for short. We also require that

Zt ≥ 0, E(Zt ) = 1, t ∈ [0, 1] ,

and

E sup Zt < ∞.
0≤t≤1

Lemma 1.10.1 Under the above conditions on the process Z =


(Zt )t∈[0,1] ,

f D := E sup (|f (t)| Zt ) , f ∈ E [0, 1] ,


0≤t≤1

defines a norm on E[0, 1].

Proof. We, obviously, have f D ≥ 0 and


f D = E sup (|f (t)| Zt )


0≤t≤1
  
≤E sup |f (t)| sup Zt
t∈[0,1] t∈[0,1]
   
= sup |f (t)| E sup Zt < ∞.
t∈[0,1] t∈[0,1]
1.10 The Functional D-Norm 53

Let f D = 0. We have to show that f = 0. Suppose that there exists some


t0 ∈ [0, 1] with f (t0 )
= 0. Then,

0 = f D
 
=E sup (|f (t)| Zt )
t∈[0,1]

≥ E(|f (t0 )| Zt0 )


= |f (t0 )| E(Zt0 )
= |f (t0 )| > 0,

which is a clear contradiction. We have thus established the implication

f D = 0 =⇒ f = 0.

The reverse implication is obvious. Homogeneity is obvious as well: for f ∈


E[0, 1] and λ ∈ R, we have

λf D = E sup (|λf (t)| Zt )


0≤t≤1

= E |λ| sup (|f (t)| Zt )


0≤t≤1

= |λ| E sup (|f (t)| Zt )


0≤t≤1

= |λ| f D .

The triangle inequality for ·D follows from the triangle inequality for real
numbers |x + y| ≤ |x| + |y|, x, y ∈ R:

f1 + f2 D = E sup (|f1 (t) + f2 (t)| Zt )


0≤t≤1

≤E sup (|f1 (t)| Zt + |f2 (t)| Zt )


0≤t≤1

≤E sup (|f1 (t)| Zt ) + sup (|f2 (t)| Zt )


0≤t≤1 0≤t≤1

=E sup (|f1 (t)| Zt ) + E sup (|f2 (t)| Zt )


0≤t≤1 0≤t≤1

= f1 D + f2 D , f1 , f2 ∈ E [0, 1] .



54 1 D-Norms

Measurability of the Integrand


Note that, for each f ∈ E[0, 1], (f (t)Zt )t∈[0,1] is a stochastic process whose
sample paths have only a finite number of discontinuities, namely those of the
function f . The finite set of discontinuities of the process (f (t)Zt )t∈[0,1] is, con-
sequently, non-random. This entails the measurability of supt∈[0,1] (|f (t)| Zt ):
we can find a sequence of increasing index sets Tn = {t1 , . . . tn } ⊂ [0, 1], n ∈ N,
containing all discontinuities of f for n large enough, such that

sup (|f (t)| Zt ) = lim max (|f (ti )| Zti ) .


t∈[0,1] n→∞ 1≤i≤n

As max1≤i≤n (|f (ti )| Zti ) is an rv for each n ∈ N, the limit of this sequence, i.e.,
supt∈[0,1] (|f (t)| Zt ), is an rv as well. We can therefore compute its expectation,
which is finite by the bound
sup (|f (t)| Zt ) =: f Z∞ ≤ sup (|f (t)|) sup Zt = f ∞ Z∞
t∈[0,1] t∈[0,1] t∈[0,1]

and taking expectations. Recall that each function f ∈ E[0, 1] is by the def-
inition of E[0, 1] bounded. The process Z = (Zt )t∈[0,1] is again called the
generator of the D-norm ·D .

The Functional Sup-Norm is a D-Norm


The functional sup-norm f ∞ = supt∈[0,1] (|f (t)|), f ∈ E[0, 1], is a functional
D-norm. Just choose an rv X ≥ 0 with E(X) = 1, and set Z = (Zt )t∈[0,1]
with Zt = X, t ∈ [0, 1]. Then, clearly, the corresponding functional D-norm is
 
f D = E sup (|f (t)| Zt ) = E(f ∞ X) = f ∞ , f ∈ E[0, 1].
t∈[0,1]

This example shows that the generator of a D-norm is also not uniquely
determined in the functional setup.
The functional sup-norm ·∞ is again the smallest D-norm
f ∞ ≤ f D , f ∈ E[0, 1],
(see Lemma 1.10.2 below), but unlike the multivariate case, there is no in-
dependence D-norm in the functional setup. Suppose there exists a D-norm
with generator Z = (Zt )t∈[0,1] such that


d
xDt := E max (|xi | Zti ) = |xi |
1 ,...,td 1≤i≤d
i=1

for any choice 0 ≤ t1 < · · · < td ≤ 1 of indices and x = (x1 , . . . , xd ) ∈ Rd , d ∈


N. Then, by the continuity of Z = (Zt )t∈[0,1] and the dominated convergence
theorem, we obtain for the constant function 1
1.10 The Functional D-Norm 55
 
1D = E sup Zt
t∈[0,1]

=E lim max Zi/n


n→∞ 1≤i≤n

= lim E max Z i/n


n→∞ 1≤i≤n

n
= lim 1 = ∞,
n→∞
i=1

thus, a functional independence D-norm does not exist. Furthermore, no func-


tional Lp -norm is a D-norm; see Corollary 1.10.4 below.

Bounds for the Functional D-Norm


All norms on Rd are equivalent, i.e., for two arbitrary norms ·1 , ·2 on Rd
there exists a constant K > 0, such that

x1 ≤ K x2 , x ∈ Rd .

This is no longer true for arbitrary norms on E[0, 1]. But it turns out that
each functional D-norm is equivalent to the sup-norm f ∞ = supt∈[0,1] |f (t)|
on E[0, 1].

Lemma 1.10.2 Each functional D-norm is equivalent to the sup-norm


·∞ , precisely,

f ∞ ≤ f D ≤ f ∞ 1D , f ∈ E[0, 1].

Proof. Let Z = (Zt )t∈[0,1] be a generator of ·D . For each t0 ∈ [0, 1] and
f ∈ E[0, 1], we have

|f (t0 )| = E (|f (t0 )| Zt0 )


 
≤E sup (|f (t)| Zt )
t∈[0,1]

= f D
≤ E (f ∞ Z∞ ) = f ∞ 1D ,

which proves the lemma.



56 1 D-Norms

Corollary 1.10.3 For f, g ∈ E[0, 1], we have the bound


! !
! f  − g ! ≤ f − g 1 .
D D ∞ D

Proof. As ·D is a norm, it satisfies the triangle inequality

f D ≤ f − gD + gD .

Lemma 1.10.2 now implies

f D − gD ≤ f − gD ≤ f − g∞ 1D .

Exchanging f and g implies the assertion.


Functional Lp -Norms Are Not D-Norms


Different than the multivariate case, a functional logistic norm is not a func-
tional D-norm.
 1/p
1 p
Corollary 1.10.4 No norm f p := 0 |f (t)| dt with p ∈ [1, ∞)
is a D-norm.

Proof. Choose ε ∈ (0, 1) and put fε (·) := 1[0,ε] (·) ∈ E[0, 1]. Then, fε ∞ =
1 > ε1/p = fε p . The Lp -norm, therefore, does not satisfy the first inequality
in Lemma 1.10.2.

A Functional Version of Takahashi’s Theorem


The next consequence of Lemma 1.10.2 is obvious. This is a functional ver-
sion of Takahashi’s Theorem 1.3.1, part (ii). Note that there cannot exist an
extension of part (i) to the functional case, as ·1 is not a functional D-norm
according to the preceding result.

Corollary 1.10.5 A functional D-norm ·D is the sup-norm ·∞ iff


1D = 1.

Example: The Brown–Resnick Process


A nice example of a generator process is the geometric Brownian motion

t
Zt := exp Bt − , t ∈ [0, 1], (1.26)
2
1.10 The Functional D-Norm 57

where B := (Bt )t≥0 is a standard Brownian motion on [0, ∞). The corre-
sponding max-stable process is a Brown–Resnick process (Brown and Resnick
(1977)); see Section 4.2.
The characteristic properties of a standard Brownian motion B are that
it realizes in C[0, 1], B0 = 0 and that the increments Bt − Bs are independent
and normal N (0, t − s) distributed rv with mean zero and variance t − s,
formulated a little loosely. As a consequence, each Bt with t > 0 follows the
normal distribution N (0, t) with mean zero and variance t. We have, therefore,

Zt > 0, t ∈ [0, 1],

and, for t > 0,


t
E(Zt ) = exp − E(exp(Bt ))
2

t 1/2 Bt
= exp − E exp t
2 t1/2

 ∞ 2

t 1/2 1 x
= exp − exp(t x) 1/2
exp − dx
2 −∞ (2π) 2
 ∞

1 (x − t1/2 )2
= 1/2
exp − dx
−∞ (2π) 2
= 1, (1.27)
 
as exp −(x − t1/2 )2 /2 /(2π)1/2 , x ∈ R, is the density of the normal
N (t1/2 , 1)-distribution.
It is well known that, for x ≥ 0,
 
P sup Bt > x = 2P (B1 > x),
t∈[0,1]

which is called the reflection principle for the standard Brownian motion; see,
for example, Revuz and Yor (1999, Proposition 3.7). From this equation and
the representation of the expectation of an rv in Lemma 1.2.2, we obtain
   
E sup Zt ≤E sup (exp (Bt ))
t∈[0,1] t∈[0,1]
  
=E exp sup Bt
t∈[0,1]
    

= P exp sup Bt >x dx
0 t∈[0,1]
  

≤1+ P sup Bt > log(x) dx
1 t∈[0,1]
58 1 D-Norms
 ∞
=1+2 P (B1 > log(x)) dx
1 ∞
≤1+2 P (exp(B1 ) > x) dx
0
= 1 + 2E(exp(B1 ))
< ∞,

as exp(B1 ) is standard lognormal distributed with expectation exp(1/2).


The exact value of the complete D-norm f D is unknown for arbitrary
f , but we can compute the bivariate D-norm

(x, y)Ds,t = E (max(|x| Zs , |y| Zt )) , x, y ∈ R, 0 ≤ s < t ≤ 1.

This knowledge is sufficient, for example, to reconstruct a Brown–Resnick


process by means of a max-linear model, as in Section 4.3.

Lemma 1.10.6 In the case of the Brown–Resnick standard max-stable


process with a standard geometric Brownian generator process, we have

t − s log (|x| / |y|)


(x, y)Ds,t = |x| Φ + √
2 t−s

t − s log (|y| / |x|)


+ |y| Φ + √
2 t−s
for x, y ∈ R and 0 ≤ s < t ≤ 1, where Φ denotes the standard normal
df on R.

An inspection of the proof of Lemma 1.10.6 shows that the restriction


s, t ≤ 1 can be dropped.
Note that ·Ds,t in the preceding lemma equals the bivariate Hüsler–Reiss
D-norm ·HRΣ , with covariance matrix

Var(Bs ) Cov(Bs , Bt ) ss
Σ= = .
Cov(Bs , Bt ) Var(Bt ) st

Lemma 1.10.6 can be extended to zero means Gaussian processes with sta-
tionary increments; see Kabluchko et al. (2009, Remark 24). For the trivariate
case, we refer to Huser and Davison (2013).

Proof (of Lemma 1.10.6). We provide quite an elementary proof, which uses
the independence of the increments of a Brownian motion. We have, for 0 ≤
s < t and x, y > 0,

(x, y)Ds,t
= E(max(xZs , yZt ))
1.10 The Functional D-Norm 59


s t
= E max x exp Bs − , y exp Bt −
2 2
 s
t s

= x exp − E exp(Bs )1 Bt − + log(y) ≤ Bs − + log(x)


2 2 2

t t s
+ y exp − E exp(Bt )1 Bt − + log(y) > Bs − + log(x)
2 2 2
 s

t−s x
= x exp − E exp(Bs )1 Bt − Bs ≤ + log
2 2 y

t t−s x
+ y exp − E exp(Bt )1 Bt − Bs > + log
2 2 y
 s

t
=: x exp − I + y exp − II.
2 2

Recall that the increments Bt − Bs , Bs = Bs − B0 of a standard Brownian


motion are independent and normal distributed with means zero and variances
t − s and s. As a consequence, we obtain by equation (1.27)

t−s x
I = E (exp(Bs )) E 1 Bt − Bs ≤ + log
2 y
s t−s

x
= exp P Bt − Bs ≤ + log
2 2 y
 s   (t−s)/2+log(x/y)

1 u
= exp √ ϕ √ du
2 −∞ t−s t−s

 s   ((t−s)/2+log(x/y))/ t−s
= exp ϕ(u) du
2 −∞
 s  √t − s log(x/y)

= exp Φ + √ ,
2 2 t−s
where ϕ denotes the standard normal density on the real line. By repeating
the above arguments, we obtain

t−s x
II = E exp(Bs ) exp(Bt − Bs )1 Bt − Bs > + log
2 y

t−s x
= E(exp(Bs ))E exp(Bt − Bs )1 Bt − Bs > + log
2 y
s ∞

1 u
= exp exp(u) √ ϕ √ du.
2 (t−s)/2+log(x/y) t−s t−s

The equation

u 1 u2
exp(u)ϕ √ = √ exp(u) exp −
t−s 2π 2(t − s)
60 1 D-Norms
2

1 u − 2(t − s)u + (t − s)2 t−s


= √ exp − +
2π 2(t − s) 2
2

1 (u − (t − s)) t−s
= √ exp − exp
2π 2(t − s) 2

t−s u − (t − s)
= exp ϕ √
2 t−s
implies

 ∞

t 1 u − (t − s)
II = exp √ ϕ √ du
2 (t−s)/2+log(x/y) t−s t−s

 ∞
t
= exp √
ϕ(u) du
2 (log(x/y)−(t−s)/2)/ t−s

t log(x/y) t−s
= exp 1−Φ √ −
2 t−s 2

t t − s log(y/x)
= exp Φ + √
2 2 t−s
by appropriate elementary substitutions and the equation 1 − Φ(u) = Φ(−u),
u ∈ R. The assertion is now a consequence of the equation
 s

t
x exp − I + y exp − II
2 2

t − s log(x/y) t − s log(y/x)
= xΦ + √ + yΦ + √ .
2 t−s 2 t−s

Dual D-Norm Function


We can also extend the multivariate dual D-norm function in (1.11) to func-
tional spaces by setting

 f D := E inf (|f (t)| Zt ) , f ∈ E[0, 1]. (1.28)


t∈[0,1]

As in the multivariate case in Corollary 1.6.3, the value of  f D does not
depend on the particular process Z = (Zt )t∈[0,1] , that generates the functional
D-norm ·D .

Lemma 1.10.7 Let Z = (Zt )t∈[0,1] and Z̃ = (Z̃t )t∈[0,1] be two genera-
tors of the functional D-norm ·D . Then,

E inf (|f (t)| Zt ) = E inf (|f (t)| Z̃t ) , f ∈ E[0, 1].


t∈[0,1] t∈[0,1]
1.10 The Functional D-Norm 61

Proof. Choose f ∈ E[0, 1]. As before, we can find a sequence of increasing


index sets Tn := {t1 , . . . , tn } ⊂ [0, 1], n ∈ N, such that

inf (|f (t)| Zt ) = lim min (|f (ti )| Zti )


t∈[0,1] n→∞ 1≤i≤n

and
 
inf (|f (t)| Z̃t ) = lim min |f (ti )| Z̃ti .
t∈[0,1] n→∞ 1≤i≤n

 n
But for each n ∈ N, (Zti )ni=1 and Z̃ti are generators of the same
i=1
D-norm ·Dt on R , as they satisfy for x = (x1 , . . . , xn ) ∈ Rn
n
1 ,...,tn


 
E max (|xi | Zti ) = E sup (|fx (t)| Zt )
1≤i≤n t∈[0,1]
 
 
=E sup |fx (t)| Z̃t
t∈[0,1]
 

=E max |xi | Z̃ti ,


1≤i≤n

with

xi , if t = ti 
n
fx (t) := = xi 1{ti } (t), t ∈ [0, 1], (1.29)
0 elsewhere i=1

which defines a function in E[0, 1].


Since  x Dt1 ,...,tn does not depend on the generator of ·Dt ,...,tn by
1
Corollary 1.6.3, we have, for each n ∈ N and x = (x1 , . . . , xn ) ∈ Rn ,

 

E min (|xi | Zti ) = E min |xi | Z̃ti .


1≤i≤n 1≤i≤n

The monotone convergence theorem now implies



E inf (|f (t)| Zt ) = lim E min (|f (ti )| Zti )


t∈[0,1] n→∞ 1≤i≤n
 

= lim E min |f (ti )| Z̃ti


n→∞ 1≤i≤n
 

=E inf |f (t)| Z̃t .


t∈[0,1]



62 1 D-Norms

It is easy to construct a generator of a functional D-norm such that the


corresponding dual D-norm function is zero for each f ∈ E[0, 1]. Choosing
Z = (Zt )t∈[0,1] as the constant function 1, we obtain from the arguments in
(1.14) that
 f ∞ = min |f (t)| , f ∈ E[0, 1],
t∈[0,1]

is the largest functional D-norm, i.e., we have for an arbitrary functional


D-norm the bounds

0 ≤  f D ≤  f ∞ = min |f (t)| , f ∈ E[0, 1].


t∈[0,1]

A Normed Generators Theorem


We have established in Theorem 1.7.1 the fact that, for any D-norm ·D on
Rd and for any norm · on Rd , there exists a generator Z of ·D with the
additional property that Z = const. The following result can be viewed as
a functional analog of this normed generators theorem. For a proof, we refer
to de Haan and Ferreira (2006, equation (9.4.9)).

Theorem 1.10.8 (De Haan and Ferreira) For an arbitrary func-


tional D-norm ·D , there exists a generator Z = (Zt )t∈[0,1] with the
additional property supt∈[0,1] Zt = const for some const ≥ 1.

1.11 D-Norms from a Functional Analysis Perspective


In this section, seminorms play a crucial role.

Definition 1.11.1 A function ·s from Rd to [0, ∞) is a seminorm if


it is homogeneous of order one and if it satisfies the triangle inequality,
i.e., if it satisfies conditions (1.2) and (1.3). Different than a norm,
condition (1.1) is not required.

We can generate a seminorm by means of an rv Z = (Z1 , . . . , Zd ) ≥ 0 ∈ Rd


with E(Zi ) < ∞, 1 ≤ i ≤ d, by defining

xS := E max (|xj | Zj ) , x = (x1 , . . . , xd ) ∈ Rd . (1.30)


1≤j≤d

Note that we use a capital letter S in the index for such a seminorm, which
is defined by a generator Z. The above definition is quite close to that of a
D-norm in Lemma 1.1.3; the difference is that xS = 0 does not necessarily
imply x = 0 in (1.30), as we allow E(Zj ) = 0 for some j ∈ {1, . . . , d}, i.e., Zj =
0 a.s. In this case, we obtain for the unit vector ej = (0, . . . , 0, 1, 0, . . . , 0) ∈ Rd
1.11 D-Norms from a Functional Analysis Perspective 63

ej S = E(Zj ) = 0.
The seminorm ·S is, consequently, a norm iff E(Zj ) > 0 for all j = 1, . . . , d,
with the special case of it being a D-norm iff E(Zj ) = 1 for all j.
Ressel (2013, Theorem 1) characterized the set  of seminorms as defined
in (1.30) when the generators Z realize in S = x ≥ 0 ∈ Rd : x∞ = 1 .
This characterization is achieved in the setup of functional analysis. The set
of seminorms turns out to be a Bauer simplex, whose extremal elements are
the seminorms with a constant generator. The aim of this section is to extend
this characterization in Theorem 1.11.19 to the case where the generators Z
all realize in an angular set, defined below.
As a consequence of our considerations we show in particular in Propo-
sition 1.11.20 that the set ofD-norms, whose generators follow a discrete
distribution on the set Sd = x ≥ 0 ∈ Rd : x1 = d , is a dense subset of
the set of all D-norms.
Before we can present the results, we have to introduce various definitions
and auxiliary results.

Definition 1.11.2 A subset S ⊂ [0, ∞)d is an angular set if it satisfies


the following conditions:
(i) S
= ∅ is compact,
(ii) 0 ∈
S,
(iii) For any x ∈ S and any λ ∈ R, we have λs ∈ S iff λ = 1.


An angular set in R2 is, for example, S1 := (u, 1 − u2 ) : u ∈ [0, 1] ; see
the discussion after Theorem 1.7.13. The set
S2 := {(u, 1/u) : u > 0} ∪ {(0, 1), (1, 0)}
is not an angular set in R2 in the sense of Definition 1.11.2 as it is not compact.
The set
S3 := {(u, 1 − u) : u ∈ [0, 1/2]}
2
is an angular set in R , but not a complete one as in Definition 1.7.12, since
not every (x, y) > 0 ∈ R2 can be represented as (x, y) = λ(u, 1 − u), with
some λ > 0 and some u ∈ [0, 1/2]. The vector (3/4, 1/4), for example, cannot
be represented this way.

Introducing the Relevant Space of Seminorms


Throughout the rest of this section we suppose that S ⊂ [0, ∞)d is an angular
set as in Definition 1.11.2. By KS we denote the set of seminorms on Rd
generated by means of S:

KS := ·S : there is a rv Z = (Z1 , . . . , Zd ) that realizes in S, with
  
xS = E max (|xj | Zj ) for all x ∈ Rd . (1.31)
1≤j≤d
64 1 D-Norms

Repeating the arguments in the proof of Proposition 1.4.1 yields the con-
vexity of the set KS .

Lemma 1.11.3 The set KS is convex, i.e., if ·S,1 , ·S,2 are semi-
norms in KS , then λ ·S,1 + (1 − λ) ·S,2 ∈ KS for any λ ∈ [0, 1] as
well.

Choose an arbitrary seminorm ·S in KS , i.e., there is a generator Z that


realizes in the angular set S. Next, we establish the fact that the distribution
of Z is uniquely determined. This parallels Corollary 1.7.10.

Lemma 1.11.4 The distribution of a generator Z ∈ S of a seminorm


·S ∈ KS is uniquely determined, i.e., if Z (1) , Z (2) are generators of
this seminorm in KS , which both realize in S, then we have for any
Borel subset B ⊂ S
   
P Z (1) ∈ B = P Z (2) ∈ B .

By the preceding result, we can identify the set KS of seminorms generated


by means of S with the set of probability measures on S.

Proof. Put E := [0, ∞)d \ {0} ⊂ Rd and λB := {λb : b ∈ B} for an arbitrary


set B ⊂ Rd and λ > 0. Set

ν(E\([0, ∞) · S)) := 0

and, for all Borel subsets B of S and λ > 0,


1
ν((λ, ∞) · B) := P (Z ∈ B). (1.32)
λ
One readily finds that this defines a measure ν on the Borel σ-field of
E, which is uniquely determined by the distribution of Z; see the proof of
Lemma 1.7.8. Repeating the arguments in the proof of Lemma 1.7.9, we obtain
for x = (x1 , . . . , xd ) > 0 ∈ Rd
 

 
1 1

ν [0, x] = E max Zj = 
x . (1.33)
1≤j≤d xj S

The measure ν is by equation (1.33) uniquely determined by the seminorm


·S . Let Z (1) and Z (2) be two generators of ·S , which both realize in S.
Then, by equation (1.32), we obtain for an arbitrary Borel subset B of S
   
P Z (1) ∈ B = ν((1, ∞) · B) = P Z (2) ∈ B ,

which is the assertion.



1.11 D-Norms from a Functional Analysis Perspective 65

Convex Hull and Extremal Set


In what follows, each vector space V is defined over R, i.e., if x1 , x2 ∈ V , then
λ1 x1 + λ2 x2 ∈ V for λ1 , λ2 ∈ R.
A subset K of V is convex if λx1 + (1 − λ)x2 ∈ K for each x1 , x2 ∈ K
and each λ ∈ [0, 1]. For the sake of completeness, we establish the following
well-known characterization of convexity.

Lemma 1.11.5 The set K ⊂ V is convex iff



n
λi xi ∈ K, x1 , . . . , xn ∈ K, (1.34)
i=1
n
for any n ∈ N and λ1 , . . . , λn ∈ [0, 1] with i=1 λi = 1.

Proof. Clearly, we only have to prove the implication “⇒.” It can be seen by
induction as follows. Suppose equation (1.34) is true for n ≥ 2. It is true for
n+1
n = 2 by the convexity of K. Choose λ1 , . . . , λn+1 ∈
[0, 1] with i=1 λi = 1
n
and x1 , . . . , xn+1 ∈ K. We can assume wlog that i=1 λi > 0. Then, we
obtain

n+1 
n
λi xi = λn+1 xn+1 + λi xi
i=1 i=1
⎛ ⎞

n 
n
λ
= λn+1 xn+1 + ⎝ λj ⎠ n i xi
j=1 i=1 j=1 λj

n
λ
= λn+1 xn+1 + (1 − λn+1 ) n i xi ∈ K
i=1 j=1 λj

by induction.

For any subset X of a vector space V



conv(X) := x ∈ V : there exist n ∈ N, x1 , . . . , xn ∈ X and λ1 , . . . , λn ∈ [0, 1]

n 
n 
with λi = 1, such that x = λi xi
i=1 i=1

is the convex hull of X. For any convex K ⊂ V

ex(K) := {x ∈ K : if x = λx1 + (1 − λ)x2 for some x1 , x2 ∈ K


and some λ ∈ [0, 1], then x1 = x2 }
66 1 D-Norms

Subset of R2 Convex Hull


{x1 , . . . , xd } convex polyhedron with at most d corners
arbitrary convex set the set itself
the grid Z2 of integers R2

Table 1.1: Examples of subsets of R2 and their convex hulls

is the set of extremal points of K or the extremal set of K. Here is a list of


examples in V = R2 (Tables 1.1 and 1.2).
For every convex subset K ⊂ V , we have
conv(ex(K)) ⊂ conv(K) = K. (1.35)

Convex Set K Extremal Set ex(K)


convex polygon corner points
closed unit ball unit sphere
open
 unit ball  ∅
(x, y) ∈ R2 : y ≥ 0 ∅
2
[0, ∞) {(0, 0)}

Table 1.2: Examples of convex sets in R2 and their extremal sets.

While the equality conv(K) = K is an immediate consequence of the


convexity of K, the inclusion in (1.35) follows from the fact that ex(K) ⊂ K.
The Krein–Milman theorem for finite-dimensional normed vector spaces
states in particular that the reverse inclusion in (1.35) is true as well if K is
convex and compact. Here compactness is meant with respect to the usual
Euclidean topology, generated by an arbitrary norm on Rd .

Theorem 1.11.6 (Krein–Milman, finite dimensions) Let K ⊂


Rd be convex and compact. Then, ex(K) is compact as well, and
K = conv(ex(K)).

The condition that K is compact cannot be dropped. For example, let


K
= ∅ be an open ball in Rd . Then, ex(K) = ∅ = conv(ex(K)). For a proof
of Theorem 1.11.6, we refer to Phelps (2001, Section 1).

Barycentric Coordinates
Let K ⊂ Rd be a convex and compact set, whose extremal set ex(K) =
{x1 , . . . , xn } is a set of n distinct vectors in Rd . For any x ∈ K there exists
according to Theorem 1.11.6  a vector w = (w1 , . . . , wn ) = w(x) of weights
w1 , . . . , wn ∈ [0, 1] with ni=1 wi = 1, such that
1.11 D-Norms from a Functional Analysis Perspective 67


n
x= wi xi .
i=1

The vector (w1 , . . . , wn ) is called a vector of generalized barycentric coordi-


nates of x.
The vector w(x) is, in general, not uniquely determined. Take, for instance,
the unit square K = [0, 1] × [0, 1] in R2 . Its extremal set is the set of its four
corners {(0, 0), (1, 0), (1, 1), (0, 1)} = ex(K). The center (1/2, 1/2) of K can
be represented in two different ways:

1 1 1 1 1 1
, = (0, 0) + (1, 1) = (0, 1) + (1, 0).
2 2 2 2 2 2
In this example, we have two vectors of generalized barycentric coordinates
of (1/2, 1/2): (1/2, 0, 1/2, 0) and (0, 1/2, 0, 1/2).
Each vector of barycentric coordinates w = w(x) = (w1 , . . . , wn ) for a
fixed x ∈ K with corresponding extremal points x1 , . . . , xn can be interpreted
as a discrete probability measure Qw on the set ex(K) of the extremal points
of K:
n
Qw (B) = wi εxi (B), B ⊂ ex(K),
i=1
where εz (·) is the Dirac measure or point measure with mass one at z, i.e.,
εz (B) = 1 if z ∈ B and zero elsewhere.
As a consequence, we can write for any linear affine functional f : K → R

f (x) = f (x ) Qw (dx );
ex(K)

recall that x is kept fixed. This representation can easily be seen as follows.
Each linear affine functional f : K → R can be written as f (·) = (·) + b,
where  is a linear function and b ∈ R is a fixed real number. We therefore
obtain
 n 

f (x) =  wi xi + b
i=1

n
= wi (xi ) + b
i=1
n
= wi ((xi ) + b)
i=1
n
= wi f (xi )
i=1

= f (x ) Qw (dx ). (1.36)
ex(K)
68 1 D-Norms

Definition 1.11.7 Let K be a convex and compact subset of Rd . If,


for any x ∈ K, the weight vector w(x) = (w1 , . . . , wn ) of generalized
barycentric coordinates with corresponding extremal points x1 , . . . , xn
is uniquely determined, then K is called a simplex, and w(x) is the
vector of barycentric coordinates.

Choose, for example, v1 , . . . , vn ∈ Rd such that the vectors v2 −


v1 , . . . , vn − v1 are linearlyindependent. Then,  their convex hull K :=
conv({v1 , . . . , vn }) = {x = ni=1 wi vi : wi ≥ 0, ni=1 wi = 1} is convex and
compact. It is easy to see that its extremal points are v1 , . . . , vn , and that
the vector w(x) of weights of x ∈ K is uniquely determined by the in-
dependence  of v2 − v1 , . . . , vn − v1 . In other words, K is a simplex. The
set K = x ≥ 0 ∈ Rd : x1 ≤ 1 is an example, being the convex hull of
{0, e1 , . . . , ed } ⊂ Rd .

Locally Convex Spaces


The general Krein–Milman theorem is formulated for a locally convex vector
space V of arbitrary dimension. This is a vector space equipped with a topol-
ogy, such that for any neighborhood U of each vector x ∈ V , there is a convex
neighborhood Uc ⊂ U of x.
Local convexity of a vector space V can be characterized in terms of semi-
norms ·s . For a proof of the following characterization, we refer to Jarchow
(1981, Section 7.5).

Lemma 1.11.8 An arbitrary vector space V , equipped with a topology,



is locally convex iff there is a family of seminorms ·s,i : i ∈ I ,
indexed by some index set I, such that for each x ∈ V and an arbitrary
sequence xk ∈ V , k ∈ N, we have

xk →k→∞ x ⇐⇒ ∀ i ∈ I : xk − xs,i →k→∞ 0.

If the topology of V is generated by a norm · on V , then we can obviously


choose I = {1} with ·s,1 = ·. As a consequence we obtain that each
normed vector space is locally convex. However, we do not need Lemma 1.11.8
to see this.
Example 1.11.9 The set of denumerable sequences of real numbers

V := RN = {x = (x1 , x2 , . . . ) : xi ∈ R, i ∈ N}

is a vector space, with addition x + y and multiplication cx, c ∈ R,


meant componentwise. By defining xs,i := |xi |, where xi is the i-
 
th component of x ∈ V , i ∈ N, we obtain a family ·s,i : i ∈ N
1.11 D-Norms from a Functional Analysis Perspective 69

of seminorms on V , indexed by I = N. We define convergence of a


sequence xk ∈ V , k ∈ N, to x ∈ V by

xk →k→∞ x : ⇐⇒ ∀ i ∈ N : xk − xs,i →k→∞ 0.

This yields the topology of element-wise convergence on V . If we put, for


example, ek := (0, . . . , 0, 1, 0, . . . ) ∈ V with 1 being the k-th component,
k ∈ N, then we obviously obtain ek →k→∞ 0 = (0, 0, . . . ) ∈ V . We even
obtain kek →k→∞ 0 = (0, 0, . . . ). Equipped with this topology, V is a
locally convex vector space according to Lemma 1.11.8.


Example 1.11.10 The space Vd := f : Rd → R of real valued func-
tions on Rd is a vector space, equipped with the usual componentwise
operations. By defining for x ∈ Rd

f s,x := |f (x)| , f ∈ Vd ,
 
we obtain a family ·s,x : x ∈ Rd of seminorms on Vd , indexed by
I = Rd .
We define convergence of a sequence fk , k ∈ N, to f in V by

fk →k→∞ f : ⇐⇒ ∀ x ∈ Rd : fk − f s,x = |fk (x) − f (x)| →k→∞ 0,

which generates the topology of pointwise convergence on V . Equipped


with this topology, Vd , is according to Lemma 1.11.8, a locally convex
vector space.
Note that the set of seminorms KS , derived from an angular set
S ⊂ [0, ∞)d as in (1.31), is a subset of Vd .

Krein–Milman Theorem in Arbitrary Dimensions


We are now ready to state the Krein–Milman theorem for a general locally
convex space V , not necessarily a finite dimensional one. By Ā we denote the
topological closure of a subset A ⊂ V , i.e., Ā is the intersection of all closed
subsets of V that contain the set A. For a proof of the following result, we
refer to Jarchow (1981, Section 7.5).

Theorem 1.11.11 (Krein–Milman, Arbitrary Dimensions) Let


K ⊂ V be a compact and convex subset of a locally convex real vector
space V . Then, we have

K = conv(ex(K)).

Note that, because of the closure in the previous result, it is not guaranteed
that every x ∈ K is the convex combination of extremal elements of K.
70 1 D-Norms

Choquet–Bishop–de Leeuw Theorem and Bauer


Simplex
The following result generalizes representation (1.36) of a linear affine func-
tional f (x) in terms of barycentric coordinates of x in a finite dimensional
vector space to arbitrary dimension. For a proof, we refer to Phelps (2001,
Section 4) or Lax (2002, Section 13.4).

Theorem 1.11.12 (Choquet–Bishop–de Leeuw) Let K ⊂ V be a


compact and convex subset of a locally convex real vector space V . For
every x ∈ K, there exists a probability measure Qx on ex(K), equipped
with the induced Borel σ-field of V , such that for every linear affine and
continuous functional f : K → R,

f (x) = f (v) Qx (dv).
ex(K)

Recall that according to the Krein–Milman theorem 1.11.11 in arbitrary


dimensions, the set K in the previous result equals conv(ex(K)). A functional
f : K → R is in this setup, therefore, automatically defined on ex(K) ⊂
conv(ex(K)). We are now ready to define a Bauer simplex.

Definition 1.11.13 Let V be a locally convex vector space. A convex


and compact subset K ⊂ V is a Bauer simplex if ex(K) = ex(K) and
if, for every x ∈ K, the probability measure Qx in Theorem 1.11.12 is
uniquely determined.

If we put V = Rd and equip it with an arbitrary norm ·, then each


compact and convex subset K ⊂ V is a Bauer simplex iff it is a simplex in
the sense of Definition 1.11.7. This is a consequence of Theorem 1.11.6 and
the fact that (Rd , ·) is locally convex.

Back to Seminorms from an Angular Set


In what follows, we apply the preceding results to the set KS of seminorms,
defined by generators in an angular set in [0, ∞)d as in (1.31).
We consider KS as a subset of the vector space Vd of functions from Rd to
R, equipped with the topology of pointwise convergence as in Example 1.11.10.
According to Lemma 1.11.8, Vd is a locally convex vector space.
In Lemma 1.11.3, we established the fact that KS is convex. Next, we
show that it is sequentially compact, i.e., every sequence ·S,n , n ∈ N, of
seminorms in KS contains a subsequence ·S,m(n) , n ∈ N, which converges
to a seminorm ·S in KS :
1.11 D-Norms from a Functional Analysis Perspective 71

·S,m(n) →n→∞ ·S . (1.37)


We have equipped the vector space Vd with the topology of pointwise conver-
gence, and thus, the convergence in (1.37) is meant componentwise, i.e.,
lim xS,m(n) →n→∞ xS , x ∈ Rd .
n→∞
But the fact that KS is, in this sense, sequentially compact is immediately
seen by repeating the arguments in the proof of Corollary 1.8.5, i.e., we have
the following result:

Lemma 1.11.14 The set KS is sequentially compact with respect to


the topology of pointwise convergence.

Metrizing the Set of Seminorms


Next, we prove that the topology of pointwise convergence on the set KS can
be metrized. It is well known that sequential compactness is equivalent to
compactness in a metric space; thus, we obtain as a consequence that the set
KS is compact
 as well.
Let x(1) , x(2) , . . . = Qd be an enumeration of the countable set Qd of
points in Rd with rational components. It is easily seen that, with ·S,1 ,
·S,2 ∈ KS ,
!    !!
!
   1 !x(i) S,1 − x(i) S,2 !
d ·S,1 , ·S,2 := !  !!
2i 1 + !!   (1.38)
i∈N
x (i)  − x (i)  !
S,1 S,2

defines a metric on KS . Pointwise convergence of seminorms in KS can be


metrized by this metric d(·, ·). This is the content of our next result.

Lemma 1.11.15 For seminorms ·S,n , n ∈ N ∪ {0}, in KS , we have


 
d ·S,n , ·S,0 →n→∞ 0 ⇐⇒ ∀ x ∈ Rd : xS,n →n→∞ xS,0 .

Proof. The implication “⇐” is easily seen and


 left to the reader.
 The implica-
tion “⇒” can be seen as follows. Note that d ·S,n , ·S,0 →n→∞ 0 implies
yS,n →n→∞ yS,0 for any y ∈ Qd . The angular set S is a compact subset
of [0, ∞)d , and thus, it is in particular bounded, i.e., there exists a number
c > 0 such that S ⊂ [0, c]d . As a consequence, each seminorm ·S in KS with
corresponding generator Z = (Z1 , . . . , Zd ) ∈ S satisfies

1S = E max Zj ≤ c.
1≤j≤d
72 1 D-Norms

For every x ∈ Rd and for every ε > 0 there exists y ∈ Qd with x − y∞ < ε.
As a consequence we obtain by the triangle inequality for any n ∈ N

xS,n ≤ yS,n + x − yS,n ≤ yS,n + εc,

xS,n ≥ yS,n − x − yS,n ≥ yS,n − εc,

as well as
yS,0 − εc ≤ xS,0 ≤ yS,0 + εc.
But this yields ! !
! !
lim sup !xS,n − xS,0 ! ≤ 2εc.
n→∞

Since ε > 0 was arbitrary, this implies


! !
! !
lim sup !xS,n − xS,0 ! = 0
n→∞

and, thus, the assertion.


The preceding considerations imply that the topology of pointwise conver-


gence on KS can be metrized:

$$ \|\cdot\|_{S,n}\to_{n\to\infty}\|\cdot\|_{S,0} \iff \forall\, x\in\mathbb{R}^d : \|x\|_{S,n}\to_{n\to\infty}\|x\|_{S,0} \iff d\bigl(\|\cdot\|_{S,n}, \|\cdot\|_{S,0}\bigr)\to_{n\to\infty} 0. \tag{1.39} $$
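To make the metric (1.38) concrete, here is a minimal numerical sketch in Python (not part of the formal development). The enumeration of $\mathbb{Q}^d$ is replaced by a fixed finite grid of rational points and the series is truncated, so the function below only approximates $d(\cdot,\cdot)$; the seminorms used are those with constant generators $z \in S$, i.e., $\|x\|_{S,z} = \max_j |x_j|\, z_j$. All names are illustrative.

```python
import itertools
import numpy as np

def seminorm_const(z):
    """Seminorm with constant generator z: ||x||_{S,z} = max_j |x_j| z_j."""
    z = np.asarray(z, dtype=float)
    return lambda x: float(np.max(np.abs(np.asarray(x, dtype=float)) * z))

def metric_d(sn1, sn2, d=2, n_terms=200):
    """Truncated version of the metric (1.38): a finite lattice of rational
    points stands in for the enumeration of Q^d."""
    grid = [np.array(p) / 2.0
            for p in itertools.product(range(-4, 5), repeat=d)]
    total = 0.0
    for i, x in enumerate(grid[:n_terms], start=1):
        diff = abs(sn1(x) - sn2(x))
        total += 2.0 ** (-i) * diff / (1.0 + diff)
    return total

sn_a = seminorm_const([1.0, 1.0])   # generator z = (1, 1)
sn_b = seminorm_const([2.0, 0.5])   # generator z = (2, 1/2)
print(metric_d(sn_a, sn_b))         # strictly positive: the seminorms differ
print(metric_d(sn_a, sn_a))         # 0.0
```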

Because sequential compactness is equivalent to compactness in a metric


space, Lemma 1.11.14 has the following consequence.

Lemma 1.11.16 The set $K_S$ of seminorms defined by the angular set $S\subset[0,\infty)^d$ is a compact subset of $V_d = \{f : \mathbb{R}^d\to\mathbb{R}\}$, equipped with the topology of pointwise convergence.

The set KS of seminorms is, therefore, a convex and compact subset of


the locally convex vector space Vd . We can now apply the Choquet–Bishop–
de Leeuw theorem 1.11.12. The following result identifies the extremal set
ex(KS ).

Theorem 1.11.17 A seminorm in KS is extremal iff it has a constant


generator Z = z ∈ S.

Proof. We first show that a seminorm ·S with a constant generator Z =


z = (z1 , . . . , zd ) ∈ S is an extremal element. Suppose there are seminorms
·S,1 , ·S,2 in KS with generators Z (1) , Z (2) ∈ S, and λ ∈ (0, 1) such that

xS = λ xS,1 + (1 − λ) xS,2 , x ∈ Rd .



This is equivalent to
$$ \max_{1\le j\le d}\bigl(|x_j|\, z_j\bigr) = E\Bigl(\max_{1\le j\le d} |x_j|\, Z_j^{(\xi)}\Bigr), \qquad x\in\mathbb{R}^d, $$

where ξ ∈ {1, 2} is an rv with P (ξ = 1) = λ = 1 − P (ξ = 2), which is inde-


pendent of Z (1) and Z (2) (see the proof of Proposition 1.4.1). Note that Z (ξ)
realizes in S, and thus, it is a generator of ·S as well. But the distribution
of a generator on S is uniquely determined by Lemma 1.11.4, and thus, we
obtain Z (ξ) = z a.s. This is equivalent to

Z (1) = z = Z (2) a.s.,

and thus, a seminorm ·S with a constant generator in S is extremal.


Next, we show that each extremal seminorm in KS has a constant gen-
erator Z = z ∈ S. We establish this fact by showing that each seminorm in
KS , whose generator Z ∈ S satisfies P (Z = z) < 1 for each z ∈ S, is not
extremal. Let Z be such a generator. Then, we can find two disjoint Borel
subsets A and B of S with A ∪ B = S and P (Z ∈ A) > 0, P (Z ∈ B) > 0.
Otherwise, we would readily derive a contradiction.
Let the rv Z (1) follow the elementary conditional distribution P (Z ∈ · |
Z ∈ A), and let the rv Z (2) follow the elementary conditional distribution
P (Z ∈ · | Z ∈ B). Note that Z (1) , Z (2) ∈ S are both generators of seminorms
in KS . For x ∈ Rd , we have

$$\begin{aligned}
E\Bigl(\max_{1\le j\le d}(|x_j|\, Z_j)\Bigr)
&= P(Z\in A)\, E\Bigl(\max_{1\le j\le d}(|x_j|\, Z_j)\,\Big|\, Z\in A\Bigr) + P(Z\in B)\, E\Bigl(\max_{1\le j\le d}(|x_j|\, Z_j)\,\Big|\, Z\in B\Bigr)\\
&= \lambda\, E\Bigl(\max_{1\le j\le d}\bigl(|x_j|\, Z_j^{(1)}\bigr)\Bigr) + (1-\lambda)\, E\Bigl(\max_{1\le j\le d}\bigl(|x_j|\, Z_j^{(2)}\bigr)\Bigr),
\end{aligned}$$

with λ := P (Z ∈ A) ∈ (0, 1). Note that the distributions of Z (1) and Z (2) are
different. The seminorm generated by Z is, therefore, not extremal.

Introducing a Homeomorphism
The functional T : S → ex(KS ), which maps each z ∈ S onto the seminorm
·S,z ∈ KS with constant generator Z = z = (z1 , . . . , zd ) ∈ S, i.e.,

$$ \|x\|_{S,z} = \max_{1\le j\le d}\bigl(|x_j|\, z_j\bigr), \qquad x=(x_1,\dots,x_d)\in\mathbb{R}^d,
$$

is, obviously, one-to-one. We also have for a sequence zn ∈ S, n ∈ N,



zn →n→∞ z ⇐⇒ ·S,zn →n→∞ ·S,z ,

i.e., the functional T , as well as its inverse functional, is continuous. The func-
tional T is, therefore, a homeomorphism. It maps the Euclidean topology on
S one-to-one onto the topology defined on ex(KS ), which is the topology of
pointwise convergence. It can be metrized as in (1.39). We state this relation-
ship explicitly.

Lemma 1.11.18 The angular set S ⊂ [0, ∞)d is homeomorphic to


ex(KS ) with the homeomorphism T .

As S is a compact subset of Rd , the set T (S) = ex(KS ) is compact as well.


In particular, it is closed, i.e.,

$$ \overline{\mathrm{ex}(K_S)} = \mathrm{ex}(K_S). \tag{1.40} $$

The next result was established by Ressel (2013, Theorem 1) for the complete angular set $S = \{x\in[0,1]^d : \|x\|_\infty = 1\}$. Its extension to an arbitrary angular set was proved by Fuller (2016).

Theorem 1.11.19 For an angular set S, the set KS is a Bauer sim-


plex. The extremal elements are the seminorms ·S,z with a constant
generator Z = z ∈ S.

Proof. The set KS is, according to Lemmas 1.11.3 and 1.11.16, a convex and
compact subset of Vd , which is a locally convex vector space, as shown in
Example 1.11.10. According to equation (1.40), the set ex(KS ) is closed. In
order to prove that KS is a Bauer simplex, it remains to show that, for every
element ·S ∈ KS , the probability measure Q · S on ex(KS ), defined in the
Choquet–Bishop–de Leeuw theorem 1.11.12, is uniquely determined.
Choose $\|\cdot\|_S\in K_S$ and let $Q_{\|\cdot\|_S}$ be a probability measure on $\overline{\mathrm{ex}(K_S)} = \mathrm{ex}(K_S)$ that satisfies
$$ f\bigl(\|\cdot\|_S\bigr) = \int_{\mathrm{ex}(K_S)} f\bigl(\|\cdot\|_{S,z}\bigr)\, Q_{\|\cdot\|_S}\bigl(d\,\|\cdot\|_{S,z}\bigr) $$

for every linear affine and continuous functional f : KS → R.


Note that f (·S ) := xS with x ∈ Rd kept fixed defines a linear and
continuous functional on KS ; see Lemma 1.11.3 and Example 1.11.10. Thus,
for every x ∈ Rd , we obtain the representation
  
$$ \|x\|_S = \int_{\mathrm{ex}(K_S)} \|x\|_{S,z}\, Q_{\|\cdot\|_S}\bigl(d\,\|\cdot\|_{S,z}\bigr). \tag{1.41} $$

According to Lemma 1.11.18, we can identify ex(KS ) with S and their topolo-
gies as well. As a consequence, the probability measure Q · S on the Borel
σ-field of ex(KS ) can be identified with a probability measure σ on the Borel
σ-field of S. Equation (1.41), therefore, becomes

$$ \|x\|_S = \int_S \|x\|_{S,z}\,\sigma(dz) = \int_S \max_{1\le j\le d}\bigl(|x_j|\, z_j\bigr)\,\sigma(dz) = E\Bigl(\max_{1\le j\le d}\bigl(|x_j|\, Z_j\bigr)\Bigr), \tag{1.42} $$

where Z = (Z1 , . . . , Zd ) is an rv in S with distribution σ(·). As Z is the


generator of the seminorm ·S by (1.42), its distribution is, according to
Lemma 1.11.4, uniquely determined. This completes the proof.

According to the preceding result, for every seminorm $\|\cdot\|_S\in K_S$, the uniquely determined distribution of its generator in $S$ can be identified with a probability distribution on the extremal set $\mathrm{ex}(K_S)$, i.e., roughly, with its barycentric coordinates.

By Corollary 1.7.2 we know that, for any D-norm $\|\cdot\|_D$ on $\mathbb{R}^d$, there exists a generator $Z=(Z_1,\dots,Z_d)$ with the property $\|Z\|_1 = \sum_{i=1}^d Z_i = d$. If we put $S_d := \{x\ge 0\in\mathbb{R}^d : \|x\|_1 = d\}$, then $S_d$ is a complete angular set as in Definition 1.7.12, and the family $K_D$ of D-norms on $\mathbb{R}^d$ is a subset of the Bauer simplex $K_{S_d}$ of seminorms generated by the set $S_d$.
According to Corollary 1.8.5, we know that the pointwise limit of a se-
quence of D-norms is again a D-norm, and thus, KD is a closed subset of KS .
As KS is metrizable according to Lemma 1.11.15 and compact according to
Lemma 1.11.16, KD is compact as well. According to Proposition 1.4.1, it is
also convex.

A Dense Subset of the Set of D-Norms


A subset KM of the set KD of all D-norms on Rd is dense in KD , if for each
·D ∈ KD there exists a sequence ·D,n , n ∈ N, of D-norms in KM that
converges to ·D pointwise, i.e.,

lim xD,n = xD , x ∈ Rd .


n→∞

Our final result in this section provides a dense subset of KD .

Proposition 1.11.20 The set of D-norms whose generators follow a discrete distribution on $S_d = \{x\ge 0\in\mathbb{R}^d : \|x\|_1 = d\}$ is dense in the set of D-norms.

Proof. The extremal elements in KS are, according to Theorem 1.11.17,


those seminorms that have a constant generator. The Krein–Milman theo-
rem 1.11.11 implies that, for any D-norm ·D on Rd , there exists a sequence
of convex combinations


$$ \|\cdot\|_{S,n} := \sum_{i=1}^{m(n)} w_{i,n}\, \|\cdot\|_{S,i,n}, \qquad n\in\mathbb{N}, $$

of seminorms ·S,i,n ∈ ex(KS ) with constant generator Zi,n := zi,n ∈ S,


1 ≤ i ≤ m(n), n ∈ N, such that

xS,n →n→∞ xD , x ∈ Rd . (1.43)


The weights $w_{i,n}\ge 0$, $1\le i\le m(n)$, which add up to one, i.e., $\sum_{i=1}^{m(n)} w_{i,n} = 1$, together with the vectors $z_{1,n},\dots,z_{m(n),n}$ define a discrete probability measure on $S$ for each $n\in\mathbb{N}$ via
$$ P_n(\cdot) := \sum_{i=1}^{m(n)} w_{i,n}\, \varepsilon_{z_{i,n}}(\cdot). $$
 
Let $Z^{(n)} := \bigl(Z_1^{(n)},\dots,Z_d^{(n)}\bigr)\in S$ be an rv which follows this discrete probability measure $P_n$ with support $\{z_{1,n},\dots,z_{m(n),n}\}$, $n\in\mathbb{N}$. The rv $Z^{(n)}$ generates the seminorm $\|\cdot\|_{S,n}$, since for every $x\in\mathbb{R}^d$, with $z_{i,n} = (z_{i,n,1},\dots,z_{i,n,d})$, we have
$$ E\Bigl(\max_{1\le j\le d}\bigl(|x_j|\, Z_j^{(n)}\bigr)\Bigr) = \sum_{i=1}^{m(n)} \max_{1\le j\le d}\bigl(|x_j|\, z_{i,n,j}\bigr)\; P\bigl(Z^{(n)} = z_{i,n}\bigr) = \sum_{i=1}^{m(n)} \|x\|_{S,i,n}\, w_{i,n} = \|x\|_{S,n}. $$

However, the seminorm $\|\cdot\|_{S,n}$ is in general not a D-norm, as the condition $E\bigl(Z_j^{(n)}\bigr) = 1$, $1\le j\le d$, on its generator is not generally satisfied. From (1.43), however, we conclude
$$ \beta_j^{(n)} := E\bigl(Z_j^{(n)}\bigr) = \|e_j\|_{S,n} \to_{n\to\infty} \|e_j\|_D = 1, \qquad j=1,\dots,d, \tag{1.44} $$

where ej = (0, . . . , 0, 1, 0, . . . , 0) ∈ Rd denotes the j-th unit vector in Rd .


Note that
$$ \sum_{j=1}^d \beta_j^{(n)} = E\Bigl(\sum_{j=1}^d Z_j^{(n)}\Bigr) = E(d) = d; $$
thus, $B_n := \max_{1\le j\le d}\beta_j^{(n)} \ge 1$ as well as $B_n\to_{n\to\infty} 1$ by (1.44). We have
$$ \frac{1}{B_n} + \sum_{j=1}^d \frac{B_n - \beta_j^{(n)}}{d\, B_n} = \frac{1}{B_n} + \frac{d\, B_n - d}{d\, B_n} = 1, $$

and each summand on the left-hand side of the above equation is non-negative. Let $\delta_j$ be the probability measure on $S_d$ that puts mass one on the vector $d\, e_j$, $1\le j\le d$. Then,
$$ Q_n := \frac{1}{B_n}\, P_n + \sum_{j=1}^d \frac{B_n - \beta_j^{(n)}}{d\, B_n}\, \delta_j, \qquad n\in\mathbb{N}, $$
defines a sequence of discrete probability measures on $S_d$ with
$$ |Q_n(B) - P_n(B)| \le 1 - \frac{1}{B_n} + \sum_{j=1}^d \frac{B_n - \beta_j^{(n)}}{d\, B_n} \to_{n\to\infty} 0 \tag{1.45} $$

for any Borel subset $B$ of $\mathbb{R}^d$. Moreover, for each $j\in\{1,\dots,d\}$ and $n\in\mathbb{N}$, we have
$$ \int_{S_d} x_j\, Q_n(dx) = \frac{1}{B_n}\,\beta_j^{(n)} + d\,\frac{B_n - \beta_j^{(n)}}{d\, B_n} = 1. $$
Let the rv $\tilde Z^{(n)} = \bigl(\tilde Z_1^{(n)},\dots,\tilde Z_d^{(n)}\bigr)\in S_d$ follow this distribution $Q_n$. We have
$$ E\bigl(\tilde Z_j^{(n)}\bigr) = \int_{S_d} x_j\, Q_n(dx) = 1 $$

for each $j\in\{1,\dots,d\}$, and therefore, $\tilde Z^{(n)}$ generates a D-norm, say, $\|\cdot\|_{D,n}$. The convergence in (1.45), together with Lemma 1.2.2, implies for $x\in\mathbb{R}^d$
$$\begin{aligned}
\|x\|_{D,n} &= E\Bigl(\max_{1\le j\le d}\bigl(|x_j|\,\tilde Z_j^{(n)}\bigr)\Bigr)\\
&= \int_0^\infty 1 - P\bigl(|x_j|\,\tilde Z_j^{(n)}\le t,\ 1\le j\le d\bigr)\,dt\\
&= \int_0^{d\max_{1\le j\le d}|x_j|} 1 - P\bigl(|x_j|\,\tilde Z_j^{(n)}\le t,\ 1\le j\le d\bigr)\,dt\\
&= \int_0^{d\max_{1\le j\le d}|x_j|} 1 - P\bigl(|x_j|\, Z_j^{(n)}\le t,\ 1\le j\le d\bigr)\,dt + o(1)\\
&= E\Bigl(\max_{1\le j\le d}\bigl(|x_j|\, Z_j^{(n)}\bigr)\Bigr) + o(1)\\
&= \|x\|_{S,n} + o(1) \to_{n\to\infty} \|x\|_D,
\end{aligned}$$

which completes the proof.
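Proposition 1.11.20 suggests a simple computational recipe: a D-norm whose generator is supported on finitely many points of $S_d$ can be evaluated exactly as a finite weighted sum. The following Python sketch (names and the two example generators are ours, purely for illustration) does this for two extreme cases on $S_d$: the vertices $d\, e_1,\dots,d\, e_d$ with equal weights, which reproduce $\|\cdot\|_1$, and the constant generator $(1,\dots,1)$, which reproduces $\|\cdot\|_\infty$.

```python
import numpy as np

def dnorm_discrete(support, weights, x):
    """Exact D-norm ||x||_D = E(max_j |x_j| Z_j) for a generator Z with
    finite support {z_1, ..., z_m} in S_d and weights w_i = P(Z = z_i)."""
    support = np.asarray(support, dtype=float)   # shape (m, d)
    weights = np.asarray(weights, dtype=float)   # shape (m,)
    assert np.allclose(weights @ support, 1.0)   # E(Z_j) = 1 for all j
    vals = np.max(np.abs(np.asarray(x, dtype=float)) * support, axis=1)
    return float(weights @ vals)

d = 3
x = np.array([1.0, -2.0, 0.5])
# generator on the vertices d*e_1, ..., d*e_d of S_d, equal weights:
print(dnorm_discrete(d * np.eye(d), np.full(d, 1.0 / d), x))  # ||x||_1 = 3.5
# constant generator (1, ..., 1), the barycenter of S_d:
print(dnorm_discrete(np.ones((1, d)), np.array([1.0]), x))    # ||x||_inf = 2.0
```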




1.12 D-Norms from a Stochastic Geometry Perspective


Each D-norm on Rd can be characterized by a particular convex and compact
subset of [0, ∞)d called max-zonoid , which was observed by Molchanov (2008).
This characterization, which is essentially Corollary 1.12.17 below, is achieved
within the framework of stochastic geometry. We list only a few auxiliary re-
sults and tools, which we need for the derivation of this characterization. As
an application, we can considerably extend the well-known Hölder’s inequal-
ity to D-norms; see Theorems 1.12.22 and 1.12.24. For a thorough study of
stochastic geometry, we refer the reader to the book by Molchanov (2005) and
the literature cited therein.

Orthogonal Projection onto a Line


For arbitrary vectors $x=(x_1,\dots,x_d)$ and $y=(y_1,\dots,y_d)$ in $\mathbb{R}^d$, put
$$ \langle x, y\rangle := \sum_{i=1}^d x_i y_i \in\mathbb{R}, $$

which is the usual scalar product or inner product on $\mathbb{R}^d$. This is obviously a bilinear map:
$$ \langle s x_1 + t x_2,\, y\rangle = s\langle x_1, y\rangle + t\langle x_2, y\rangle, \qquad \langle x,\, s y_1 + t y_2\rangle = s\langle x, y_1\rangle + t\langle x, y_2\rangle, \qquad s,t\in\mathbb{R}. $$

Note that
$$ \sqrt{\langle x, x\rangle} = \Bigl(\sum_{i=1}^d x_i^2\Bigr)^{1/2} = \|x\|_2 $$

is the Euclidean norm on Rd .


Fix $x\in\mathbb{R}^d$ with unit length $\|x\|_2 = 1$. Mapping $y\in\mathbb{R}^d$ onto $\langle x, y\rangle\in\mathbb{R}$ has the following geometric interpretation (see Figure 1.1 for an illustration). First, recall that $y$ sits by definition at a right angle to $x$ iff $\langle x, y\rangle = 0$. The vector $x$ defines the line $L_x = \{sx : s\in\mathbb{R}\}$ in $\mathbb{R}^d$. Projecting the vector $y$ orthogonally onto this line means that the projection $s_0 x$ sits at a right angle to the vector $y - s_0 x$, i.e., $\langle x,\, y - s_0 x\rangle = 0$ or, equivalently,
$$ \langle x, y\rangle = \langle x, s_0 x\rangle = s_0 \langle x, x\rangle = s_0 \|x\|_2^2 = s_0. $$
We see that the inner product $\langle x, y\rangle$ of $x$ and $y$ is just the coordinate $s_0$ of the orthogonal projection of $y$ onto the line $L_x$. If $x$ has arbitrary length $\|x\|_2 > 0$, then $\langle x, y\rangle = s_0 \|x\|_2^2$.

Fig. 1.1: Orthogonal projection of $y$ onto the line $L_x$. (The figure shows $L_x$, the projection $s_0 x = \langle x, y\rangle\, x$, and the orthogonal residual $y - s_0 x$.)
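As a quick numerical check of this geometry (illustrative code only, not from the text): for a unit vector $x$, the residual $y - \langle x, y\rangle\, x$ is numerically orthogonal to $x$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)
x /= np.linalg.norm(x)            # unit length: ||x||_2 = 1
y = rng.normal(size=4)

s0 = np.dot(x, y)                 # coordinate of the projection onto L_x
residual = y - s0 * x
print(np.isclose(np.dot(x, residual), 0.0))   # True: residual orthogonal to x
```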

Introducing the Support Function


Let $L\subset\mathbb{R}^d$ be a non-empty compact set. For $x=(x_1,\dots,x_d)\in\mathbb{R}^d$, put
$$ h(L,x) := \sup\{\langle y, x\rangle : y\in L\} = \sup\Bigl\{\sum_{i=1}^d y_i x_i : (y_1,\dots,y_d)\in L\Bigr\}, $$

which defines the support function h(L, ·) of L. The support function is one
of the most central basic concepts in convex geometry.
A convex and compact set L ⊂ Rd is uniquely determined by its support
function h(L, ·). This is a consequence of the next result. Put, for x ∈ Rd ,

$$ H_L(x) := \bigl\{\, y\in\mathbb{R}^d : \langle y, x\rangle \le h(L,x) \,\bigr\}, $$

which is the half space of Rd that corresponds to L and x.

Lemma 1.12.1 Let $\emptyset\ne L\subset[0,\infty)^d$ be compact and convex. Then,
$$ L = \bigcap_{x\in\mathbb{R}^d} H_L(x). $$

Proof. Each $y\in L$ satisfies $\langle y, x\rangle \le h(L,x)$, and thus, $L\subset H_L(x)$ for each $x\in\mathbb{R}^d$, i.e.,
$$ L \subset \bigcap_{x\in\mathbb{R}^d} H_L(x). $$
Choose $z\in\mathbb{R}^d$, $z\notin L$. It is well known that $z$ and $L$ can be separated in the following way: we can find $x\in\mathbb{R}^d$, $x\ne 0$, such that, for all $y\in L$,
$$ \langle y, x\rangle < \langle z, x\rangle. $$

This is the hyperplane separation theorem; see, for example, Rockafellar (1970, Corollary 11.4.2). As $g(\cdot) := \langle\cdot, x\rangle$ is a continuous function on $\mathbb{R}^d$ and $L\subset\mathbb{R}^d$ is compact, the supremum $\sup\{g(y) : y\in L\}$ is attained, i.e., there exists $y_0\in L$ with
$$ g(y_0) = \sup\{g(y) : y\in L\} = h(L,x) < \langle z, x\rangle. $$
This shows that $z\notin H_L(x)$, which implies the assertion.

The following consequence of Lemma 1.12.1 is obvious: a non-empty convex and compact set in $[0,\infty)^d$ is uniquely determined by its support function.

Corollary 1.12.2 Let $L_1 \ne L_2$ be two non-empty convex and compact sets in $[0,\infty)^d$. Then, $h(L_1,\cdot)\ne h(L_2,\cdot)$, i.e., there exists $x\in\mathbb{R}^d$ with $h(L_1,x)\ne h(L_2,x)$.
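Support functions of polytopes are particularly easy to evaluate: since $\langle\cdot, x\rangle$ is linear, the supremum over $K = \mathrm{conv}(\text{vertices})$ is attained at a vertex. A minimal Python sketch (the function name and example set are ours, for illustration only):

```python
import numpy as np

def support_function(vertices, x):
    """h(K, x) = sup{<y, x> : y in K} for the polytope K = conv(vertices);
    a linear functional over a polytope attains its sup at a vertex."""
    V = np.asarray(vertices, dtype=float)
    return float(np.max(V @ np.asarray(x, dtype=float)))

square = [(0, 0), (1, 0), (0, 1), (1, 1)]      # the unit square [0,1]^2
print(support_function(square, (2.0, 3.0)))    # 5.0, attained at (1, 1)
print(support_function(square, (-1.0, 0.5)))   # 0.5, attained at (0, 1)
```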

Each Support Function Provides a Seminorm


Lemma 1.12.3 Let $K$ be a non-empty compact and convex subset of $[0,\infty)^d$. Put $|x| = (|x_1|,\dots,|x_d|)$ for $x=(x_1,\dots,x_d)\in\mathbb{R}^d$. Then,
$$ \|x\|_S := h(K, |x|) = \sup\Bigl\{\sum_{i=1}^d y_i |x_i| : y=(y_1,\dots,y_d)\in K\Bigr\} \tag{1.46} $$
defines a seminorm as in Definition 1.11.1. This seminorm is monotone, i.e., $0\le x^{(1)}\le x^{(2)}$ implies $\|x^{(1)}\|_S \le \|x^{(2)}\|_S$.

Proof. We obviously have xS ≥ 0, 0S = 0 as well as λxS = |λ| xS


for λ ∈ R. It remains to show that ·S satisfies the triangle inequality (1.3).
Choose $x^{(1)}, x^{(2)}\in\mathbb{R}^d$ and $y\in K$; note that $y\ge 0\in\mathbb{R}^d$. The ordinary triangle inequality for the absolute value of a real number implies
$$ \bigl\langle y, |x^{(1)} + x^{(2)}|\bigr\rangle = \sum_{i=1}^d y_i \bigl|x_i^{(1)} + x_i^{(2)}\bigr| \le \sum_{i=1}^d \bigl( y_i |x_i^{(1)}| + y_i |x_i^{(2)}| \bigr) = \bigl\langle y, |x^{(1)}|\bigr\rangle + \bigl\langle y, |x^{(2)}|\bigr\rangle $$
and, thus, the triangle inequality
$$ \|x^{(1)} + x^{(2)}\|_S \le \|x^{(1)}\|_S + \|x^{(2)}\|_S. $$
The monotonicity of ·S is obvious; recall that K ⊂ [0, ∞)d .




A Support Function Can Provide a Norm


The preceding result clearly raises the question: When is the seminorm ·S
a norm? The answer is well known; see Rockafellar (1970, Theorem 15.2).
Adapting this to our purposes, we explicitly state the characterization.

Lemma 1.12.4 Let $\emptyset\ne K\subset[0,\infty)^d$ be compact and convex. The seminorm defined in (1.46) is a norm iff $K\cap(0,\infty)^d\ne\emptyset$.

Proof. Suppose first that $K\cap(0,\infty)^d\ne\emptyset$, i.e., there exists $y=(y_1,\dots,y_d)\in K$ with $y_i>0$, $1\le i\le d$. Suppose $\|x\|_S = 0$. We have to show that $x=0\in\mathbb{R}^d$. But this follows immediately from the inequality
$$ 0 = \|x\|_S \ge \sum_{i=1}^d y_i |x_i| \ge 0; $$
thus, $\|x\|_1 = 0$, or $x=0\in\mathbb{R}^d$.

Suppose next that $\|\cdot\|_S$ is a norm on $\mathbb{R}^d$. We have to show $K\cap(0,\infty)^d\ne\emptyset$. As $\|\cdot\|_S$ is a norm, we have $\|e_j\|_S > 0$ for each $j=1,\dots,d$, i.e., there exists $y_j\in K$ whose $j$-th component is strictly positive. Since $K$ is convex, the vector $y := \sum_{j=1}^d y_j/d$ is in $K$ as well, and it is in $(0,\infty)^d$.

Let $\|\cdot\|$ be an arbitrary radially symmetric norm on $\mathbb{R}^d$, i.e., changing the sign of any component of $x\in\mathbb{R}^d$ does not alter the value of $\|x\|$. In this case, $\|\cdot\|$ is determined by its values on $[0,\infty)^d$. One may conjecture that $\|x\|$ might equal $h(K, |x|)$, where $K = \{x\ge 0\in\mathbb{R}^d : \|x\|\le 1\}$. The following result shows in particular that this conjecture is not true.

Lemma 1.12.5 Let $p, q\in[1,\infty]$ with
$$ \frac{1}{p} + \frac{1}{q} = 1, \quad\text{i.e.,}\quad q := \begin{cases} \frac{p}{p-1}, & \text{if } p\in(1,\infty),\\ 1, & \text{if } p=\infty,\\ \infty, & \text{if } p=1. \end{cases} $$
Then, for the family of logistic norms on $\mathbb{R}^d$ as in Proposition 1.2.1, we have
$$ \|x\|_p = h(K_q, |x|) = \sup\Bigl\{\sum_{i=1}^d y_i |x_i| : y\ge 0\in\mathbb{R}^d,\ \|y\|_q\le 1\Bigr\}, \tag{1.47} $$
i.e., the convex and compact set that generates the norm $\|\cdot\|_p$ as in equation (1.46) is $K_q = \{y\ge 0\in\mathbb{R}^d : \|y\|_q\le 1\}$.

Proof. In what follows, we assume wlog $x=(x_1,\dots,x_d)\ge 0\in\mathbb{R}^d$, $x\ne 0$. With $p=1$ and $q=\infty$, we obtain for every $y\ge 0\in\mathbb{R}^d$, $\|y\|_\infty\le 1$,
$$ \sum_{i=1}^d x_i y_i \le \sum_{i=1}^d x_i = \|x\|_1, $$
and equality holds for $y=(1,\dots,1)\in\mathbb{R}^d$. This proves (1.47) for the combination $p=1$, $q=\infty$.
For $p=\infty$ and $q=1$, we obtain for every $y\ge 0\in\mathbb{R}^d$ with $\|y\|_1\le 1$
$$ \sum_{i=1}^d x_i y_i \le \|x\|_\infty \sum_{i=1}^d y_i = \|x\|_\infty \|y\|_1 \le \|x\|_\infty, $$
and equality holds for the choice $y = e_{i^*}$, where $i^*\in\{1,\dots,d\}$ is that index with $x_{i^*} = \max(x_1,\dots,x_d) = \|x\|_\infty$. This proves (1.47) for the combination $p=\infty$, $q=1$.
Finally, we consider $p, q\in(1,\infty)$ with $p^{-1} + q^{-1} = 1$. We obtain with $x^* := (x_1,\dots,x_d)/\|x\|_p$
$$\begin{aligned}
\sup\Bigl\{\sum_{i=1}^d x_i y_i : y\ge 0\in\mathbb{R}^d,\ \|y\|_q\le 1\Bigr\}
&= \|x\|_p \sup\Bigl\{\sum_{i=1}^d \frac{x_i}{\|x\|_p}\, y_i : y\ge 0\in\mathbb{R}^d,\ \|y\|_q\le 1\Bigr\}\\
&= \|x\|_p \sup\Bigl\{\sum_{i=1}^d x_i^* y_i : y\ge 0\in\mathbb{R}^d,\ \|y\|_q\le 1\Bigr\}.
\end{aligned}$$

Hölder’s inequality implies for $y\ge 0\in\mathbb{R}^d$ with $\|y\|_q\le 1$
$$ \sum_{i=1}^d x_i^* y_i \le \|x^*\|_p \|y\|_q = \|y\|_q \le 1; $$
therefore, it is sufficient to find $y\in K_q$ such that $\sum_{i=1}^d x_i^* y_i = 1$. We have equality in Hölder’s inequality if
$$ (x_i^*)^p = y_i^q, \qquad 1\le i\le d. $$

Therefore, put
$$ y_i := (x_i^*)^{p/q}, \qquad 1\le i\le d. $$
Then, we obtain
$$ \sum_{i=1}^d x_i^* y_i = \|x^*\|_p \|y\|_q = \|y\|_q $$
with
$$ \|y\|_q = \Bigl(\sum_{i=1}^d (x_i^*)^p\Bigr)^{1/q} = \|x^*\|_p^{\,p/q} = 1, $$
which completes the proof.
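Equation (1.47) can also be probed numerically: a crude Monte Carlo search over $K_q$ gives a lower bound for $h(K_q, |x|)$ that approaches $\|x\|_p$. The following sketch is illustrative only; all names are ours.

```python
import numpy as np

def h_Kq(x, q, n_points=200_000, seed=1):
    """Crude Monte Carlo lower bound for h(K_q, |x|): maximize <y, |x|>
    over random points y >= 0 pushed onto the sphere ||y||_q = 1."""
    rng = np.random.default_rng(seed)
    y = rng.random((n_points, x.size)) + 1e-12   # avoid an all-zero row
    y /= np.linalg.norm(y, ord=q, axis=1, keepdims=True)
    return float(np.max(y @ np.abs(x)))

x = np.array([3.0, -1.0, 2.0])
p = 3.0
q = p / (p - 1.0)
print(np.linalg.norm(x, ord=p))   # ||x||_p
print(h_Kq(x, q))                 # approximately the same value, from below
```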


The Symmetric Extension


Let $K\subset[0,\infty)^d$ be a convex and compact set. By
$$ L(K) := \{\, y\in\mathbb{R}^d : |y|\in K \,\}, $$
we denote the symmetric extension of $K$; recall that the absolute value $|y|$ of a vector is taken componentwise. The symmetric set $L(K)$ is compact as well, but not necessarily convex: just set $K := \{\lambda(1,\dots,1) : \lambda\in[0,1]\}$, which is a line segment in $[0,1]^d$. Its symmetric extension $L(K)$ is, for $d=2$, a cross and, thus, not a convex set.
The support function that corresponds to $L(K)$ satisfies, for $x\in\mathbb{R}^d$,
$$ h(L(K), x) = \sup\Bigl\{\sum_{i=1}^d y_i x_i : |y|\in K\Bigr\} = \sup\Bigl\{\sum_{i=1}^d |y_i|\,|x_i| : |y|\in K\Bigr\} = h(K, |x|). $$

The Norm Induced by the Symmetric Extension


Let $K\subset[0,\infty)^d$ be compact and convex with $K\cap(0,\infty)^d\ne\emptyset$. Then, according to Lemma 1.12.4,
$$ h(L(K), x) = h(K, |x|) = \|x\|_S, \qquad x\in\mathbb{R}^d, \tag{1.48} $$
defines a norm on $\mathbb{R}^d$, which we denote by $\|\cdot\|_K$ in what follows. We say that it is generated by the set $K$.
Put $K_c := \{\lambda(1,\dots,1) : c\le\lambda\le 1\}\subset\mathbb{R}^d$, which is, for every $c\in[0,1)$, a line segment in $[0,1]^d$. For $x\in\mathbb{R}^d$ we have
$$ h(K_c, |x|) = \sup\Bigl\{\sum_{i=1}^d y_i |x_i| : y\in K_c\Bigr\} = \sum_{i=1}^d |x_i| = \|x\|_1. $$
As a consequence, we obtain for each $c\in[0,1)$
$$ h(L(K_c), x) = \|x\|_1, \qquad x\in\mathbb{R}^d. $$

This shows that a convex and compact subset of $[0,\infty)^d$ whose support function generates a norm is not uniquely determined by this norm. Note that $L(K_c) = \{y\in\mathbb{R}^d : |y| = \lambda(1,\dots,1),\ c\le\lambda\le 1\}$ is not a convex set for any $c\in[0,1)$ if $d\ge 2$. If we put, however, $K := [0,1]^d$, then $K$ is convex and compact, $K\cap(0,\infty)^d\ne\emptyset$, and $L(K) = [-1,1]^d$ is convex as well. The norm $\|\cdot\|_K$ generated by $K$ is again $\|\cdot\|_1$, but now $K$ is uniquely determined: it is the only convex and compact subset of $[0,\infty)^d$ with $K\cap(0,\infty)^d\ne\emptyset$ generating $\|\cdot\|_1$ such that $L(K)$ is convex. This is a consequence of our preceding considerations, summarized in the next result.

Lemma 1.12.6 Let $K\subset[0,\infty)^d$ be a convex and compact set with $K\cap(0,\infty)^d\ne\emptyset$. If $L(K)$ is a convex set, then $K$ is uniquely determined by the generated norm $\|\cdot\|_K$.

Proof. For $x\in\mathbb{R}^d$, we have
$$ h(L(K), x) = \|x\|_K. $$
This equation identifies the set $L(K)$ according to Corollary 1.12.2. But $L(K) = \{y\in\mathbb{R}^d : |y|\in K\}$ identifies the set $K$.

Cross-Polytopes
For $z=(z_1,\dots,z_d)\ge 0\in\mathbb{R}^d$, put
$$ \Delta_z := \mathrm{conv}\bigl(\{0, z_1 e_1,\dots, z_d e_d\}\bigr) = \Bigl\{\sum_{i=1}^d \lambda_i z_i e_i : \lambda_1,\dots,\lambda_d\ge 0,\ \sum_{i=1}^d \lambda_i\le 1\Bigr\}, \tag{1.49} $$
which is the convex hull of the vectors $0, z_1 e_1,\dots, z_d e_d\in\mathbb{R}^d$. The set $\Delta_z$ is a compact and convex set in $[0,\infty)^d$. It is called a cross-polytope.

Lemma 1.12.7 The symmetric extension L(Δz ) of a cross-polytope


Δz with z ≥ 0 ∈ Rd is convex.

Proof. Choose $x, y\in L(\Delta_z)$, i.e.,
$$ x = \sum_{i=1}^d \lambda_i z_i e_i, \qquad y = \sum_{i=1}^d \kappa_i z_i e_i, $$
with $\sum_{i=1}^d |\lambda_i|\le 1$, $\sum_{i=1}^d |\kappa_i|\le 1$. We obtain for $\vartheta\in[0,1]$
$$ \vartheta x + (1-\vartheta) y = \sum_{i=1}^d \bigl(\vartheta\lambda_i + (1-\vartheta)\kappa_i\bigr) z_i e_i $$
with
$$ \sum_{i=1}^d \bigl|\vartheta\lambda_i + (1-\vartheta)\kappa_i\bigr| \le \vartheta\sum_{i=1}^d |\lambda_i| + (1-\vartheta)\sum_{i=1}^d |\kappa_i| \le 1, $$
i.e., $|\vartheta x + (1-\vartheta) y|\in\Delta_z$, which completes the proof.


Introducing Max-Zonoids
Let $Z=(Z_1,\dots,Z_d)\ge 0\in\mathbb{R}^d$ be an rv with the property $E(Z_i)\in(0,\infty)$, $1\le i\le d$. Then,
$$ \|x\|_Z := E\Bigl(\max_{1\le i\le d}\bigl(|x_i|\, Z_i\bigr)\Bigr), \qquad x=(x_1,\dots,x_d)\in\mathbb{R}^d, $$
defines a norm on $\mathbb{R}^d$; see the proof of Lemma 1.1.3. It is a D-norm iff $E(Z_i) = 1$ for $i=1,\dots,d$.
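The norm $\|\cdot\|_Z$ can be approximated by plain Monte Carlo for any generator that can be simulated. A small illustrative Python sketch (the exponential generator is our ad hoc choice; since its components have mean one, $\|\cdot\|_Z$ is then a D-norm and must lie between $\|\cdot\|_\infty$ and $\|\cdot\|_1$):

```python
import numpy as np

def norm_Z_mc(x, sample_Z, n=200_000, seed=2):
    """Monte Carlo estimate of ||x||_Z = E(max_i |x_i| Z_i)."""
    rng = np.random.default_rng(seed)
    Z = sample_Z(rng, n)                          # shape (n, d), Z >= 0
    return float(np.mean(np.max(np.abs(x) * Z, axis=1)))

x = np.array([1.0, -0.5, 2.0])
# iid standard exponential components: E(Z_i) = 1, so ||.||_Z is a D-norm
expo = lambda rng, n: rng.exponential(size=(n, x.size))
est = norm_Z_mc(x, expo)
print(np.max(np.abs(x)), est, np.sum(np.abs(x)))  # ||x||_inf <= ||x||_Z <= ||x||_1
```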

Definition 1.12.8 (Max-Zonoids) Let $K\subset[0,\infty)^d$ be a compact and convex set with $K\cap(0,\infty)^d\ne\emptyset$, whose symmetric extension $L(K)$ is convex as well. If the norm $\|\cdot\|_K$ on $\mathbb{R}^d$, which is generated by $K$, satisfies
$$ \|\cdot\|_K = \|\cdot\|_Z $$
for some $Z=(Z_1,\dots,Z_d)\ge 0\in\mathbb{R}^d$ with $E(Z_i)\in(0,\infty)$, $1\le i\le d$, then $K$ is called a max-zonoid.

A max-zonoid generates a D-norm ·K , if E(Zi ) = 1, 1 ≤ i ≤ d. In this


case, we call K a D-max-zonoid . It is also known as a dependency set.

Remark 1.12.9 A max-zonoid K ⊂ [0, ∞)d is uniquely determined by


the norm ·Z . This is just a reformulation of Lemma 1.12.6.

Example 1.12.10 Each logistic norm $\|\cdot\|_p$, $p\in[1,\infty]$, is, according to Proposition 1.2.1, a D-norm. Lemma 1.12.5 shows that each $\|\cdot\|_p$ is generated by the D-max-zonoid $K_q = \{y\ge 0\in\mathbb{R}^d : \|y\|_q\le 1\}$, where $1/p + 1/q = 1$.

A Random Cross-Polytope
The obvious question When is a convex and compact set K a max-zonoid?
was answered by Molchanov (2008). The answer is given within the framework
of stochastic geometry.
Let $Z=(Z_1,\dots,Z_d)\ge 0\in\mathbb{R}^d$ be an rv with $E(Z_i)\in(0,\infty)$, $1\le i\le d$. Then,
$$ \Delta_Z = \mathrm{conv}\bigl(\{0, Z_1 e_1,\dots, Z_d e_d\}\bigr) = \Bigl\{\sum_{i=1}^d \lambda_i Z_i e_i : \lambda_1,\dots,\lambda_d\ge 0,\ \sum_{i=1}^d \lambda_i\le 1\Bigr\}, \tag{1.50} $$
which is the convex hull of the vectors $0, Z_1 e_1,\dots, Z_d e_d\in\mathbb{R}^d$, is a random compact and convex set in $[0,\infty)^d$. It is a random cross-polytope.

The Support Function of a Random Cross-Polytope


The support function of a random cross-polytope $\Delta_Z$ is, for $x=(x_1,\dots,x_d)\ge 0\in\mathbb{R}^d$,
$$\begin{aligned}
h(\Delta_Z, x) &= \sup\{\langle y, x\rangle : y\in\Delta_Z\}\\
&= \sup\Bigl\{\Bigl\langle\sum_{i=1}^d \lambda_i Z_i e_i,\, x\Bigr\rangle : \lambda_1,\dots,\lambda_d\ge 0,\ \sum_{i=1}^d \lambda_i\le 1\Bigr\}\\
&= \sup\Bigl\{\sum_{i=1}^d \lambda_i Z_i x_i : \lambda_1,\dots,\lambda_d\ge 0,\ \sum_{i=1}^d \lambda_i\le 1\Bigr\}\\
&= \max_{1\le i\le d}\bigl(x_i Z_i\bigr);
\end{aligned}$$
thus,
$$ E\bigl(h(\Delta_Z, x)\bigr) = E\Bigl(\max_{1\le i\le d}\bigl(x_i Z_i\bigr)\Bigr) = \|x\|_Z. $$

As a consequence, we obtain for the support function of the symmetric extension $L(\Delta_Z)$
$$ E\bigl(h(L(\Delta_Z), x)\bigr) = E\bigl(h(\Delta_Z, |x|)\bigr) = \|x\|_Z, \qquad x\in\mathbb{R}^d; \tag{1.51} $$

see equation (1.48). Note that the set L(ΔZ ) is, according to Lemma 1.12.7,
convex as well.
The preceding observation raises the idea that the random cross-polytopes
ΔZ play a major role when answering the question When is a convex and
compact set a max-zonoid? posed earlier in this section. This is actually true;
see Corollary 1.12.17, which characterizes max-zonoids.

The Expectation of a Random Set


The following definition describes a rather flexible and useful concept of a
random closed set, see Molchanov (2005, Section 1.1.1).

Definition 1.12.11 (Random Closed Set) Let $(\Omega, \mathcal{A}, P)$ be a probability space, i.e., $\Omega$ is a non-empty set equipped with a $\sigma$-field $\mathcal{A}$, and $P$ is a probability measure on $\mathcal{A}$. A map $X : \Omega\to\mathcal{F} :=$ set of closed subsets of $\mathbb{R}^d$ is called a random closed set if, for every compact subset $K\subset\mathbb{R}^d$,
$$ \{\omega\in\Omega : X(\omega)\cap K\ne\emptyset\}\in\mathcal{A}. $$

Let $X$ be a random closed set. We suppose that
$$ \|X\|_1 := \sup\{\|x\|_1 : x\in X\} $$
has finite expectation, i.e., $E(\|X\|_1) < \infty$. A random closed set $X$ with this property is called integrably bounded.
At this point, we ignore the precise definition of a proper σ-field on F
such that X1 is a Borel-measurable rv. Instead, we refer to Molchanov
(2005, Section 1.2.1).
If X is a random closed set that is integrably bounded, then X is bounded
with probability one, and thus, it is compact with probability one. In what
follows, we assume that X is an integrably bounded closed and convex subset
of [0, ∞)d ; thus, it is in particular compact with probability one.
The proper definition of the expectation E(X) of a random set X, given
below, is crucial.

Definition 1.12.12 (Selection of a Random Set) We call an rv


ξ = (ξ1 , . . . , ξd ) ∈ [0, ∞)d a selection of X, if ξ ∈ X a.s. The family of
selections of X is denoted by S(X).

We have, for any $\xi\in S(X)$,
$$ \|\xi\|_1 = \sum_{i=1}^d \xi_i \le \|X\|_1; $$
thus,
$$ E(\|\xi\|_1) = \sum_{i=1}^d E(\xi_i) \le E(\|X\|_1) < \infty, $$
i.e., each component $\xi_i$ of $\xi\in S(X)$ has finite expectation $E(\xi_i)<\infty$. Recall that $\xi_i\ge 0$. The selection expectation of $X$ is now the set
$$ E(X) := \overline{\{E(\xi) : \xi\in S(X)\}} \subset \bigl[0, E(\|X\|_1)\bigr]^d. $$

Recall that Ā denotes the topological closure of a set A ⊂ Rd , i.e., Ā is the


intersection of all closed subsets of Rd that contain A. By the expectation
E(ξ) of an rv ξ = (ξ1 , . . . , ξd ), we denote the vector of the componentwise
expectations:
E(ξ) = (E(ξ1 ), . . . , E(ξd )).
For the sake of completeness we remark at this point that, actually, we
do not have to take the closure in the definition of E(X), as {E(ξ) : ξ ∈
S(X)} is already a closed set in our framework, called an Aumann integral ;
see Molchanov (2005, Theorem 1.1.24).

Lemma 1.12.13 E(X) is a compact and convex subset of [0, ∞)d .

Proof. Since $E(X)$ is a closed and bounded subset of $[0,\infty)^d$, it is compact. It remains to show that it is convex as well. For each $y^{(1)}, y^{(2)}\in E(X)$, there exist sequences $\xi_n^{(1)}, \xi_n^{(2)}\in S(X)$, $n\in\mathbb{N}$, with $\lim_{n\to\infty} E(\xi_n^{(1)}) = y^{(1)}$, $\lim_{n\to\infty} E(\xi_n^{(2)}) = y^{(2)}$. The convexity of $X$ implies that $\lambda\xi_n^{(1)} + (1-\lambda)\xi_n^{(2)}\in S(X)$ for each $\lambda\in[0,1]$ as well; thus,
$$ \lambda y^{(1)} + (1-\lambda) y^{(2)} = \lim_{n\to\infty} E\bigl(\lambda\xi_n^{(1)} + (1-\lambda)\xi_n^{(2)}\bigr)\in E(X); $$
recall that $E(X)$ is by definition a closed set.


Lemma 1.12.14 The symmetric extension L(E(X)) of E(X) satisfies

L(E(X)) ⊂ E(L(X)).

If E(X) satisfies the additional condition

0 ≤ y ≤ z for some z ∈ E(X) =⇒ y ∈ E(X), (1.52)

then we also have E(L(X)) ⊂ L(E(X)) and, thus, the equality

L(E(X)) = E(L(X)).

Condition (1.52) is satisfied, for example, for a random cross-polytope


X = ΔZ , i.e., we obtain

L (E (ΔZ )) = E (L (ΔZ )) ;

see also Example 1.12.15.



Proof. Choose $y=(y_1,\dots,y_d)\in L(E(X))$, i.e., $|y|\in E(X)$. There exists a sequence $\xi^{(n)}\in S(X)$, $n\in\mathbb{N}$, with $\lim_{n\to\infty} E(\xi^{(n)}) = |y|$. Multiplying each component $\xi_i^{(n)}$ of $\xi^{(n)} = (\xi_1^{(n)},\dots,\xi_d^{(n)})$ with the sign of $y_i$, $1\le i\le d$, provides a sequence $\eta^{(n)}$, $n\in\mathbb{N}$, of rvs in $S(L(X))$ with $\lim_{n\to\infty} E(\eta^{(n)}) = y$. Consequently, $y\in E(L(X))$ and, thus, $L(E(X))\subset E(L(X))$.

Let $y\in E(L(X))$. Then, there exists a sequence $\xi^{(n)}$, $n\in\mathbb{N}$, of rvs in $S(L(X))$ with $y = \lim_{n\to\infty} E(\xi^{(n)})$. This implies $|y| = \lim_{n\to\infty} |E(\xi^{(n)})|$. As $\xi^{(n)}\in S(L(X))$, we have $|\xi^{(n)}|\in S(X)$ and, by the usual inequality for expectations,
$$ 0 \le \bigl|E\bigl(\xi^{(n)}\bigr)\bigr| \le E\bigl(|\xi^{(n)}|\bigr) \in E(X). $$
Condition (1.52) now implies that $|E(\xi^{(n)})|\in E(X)$ as well for each $n\in\mathbb{N}$. Since $E(X)$ is a closed set, this entails $|y|\in E(X)$ and, thus, $y\in L(E(X))$.

Example 1.12.15 Let $\Delta_Z = \mathrm{conv}(\{0, Z_1 e_1,\dots, Z_d e_d\})$ be a random cross-polytope as defined in (1.50). Check that $X = \Delta_Z$ is a random closed set in the sense of Definition 1.12.11. It is obviously a compact and convex subset of $[0,\infty)^d$ with
$$ \|X\|_1 = \sup\Bigl\{\Bigl\|\sum_{i=1}^d \lambda_i Z_i e_i\Bigr\|_1 : \lambda_1,\dots,\lambda_d\ge 0,\ \sum_{i=1}^d \lambda_i\le 1\Bigr\} = \sup\Bigl\{\sum_{i=1}^d \lambda_i Z_i : \lambda_1,\dots,\lambda_d\ge 0,\ \sum_{i=1}^d \lambda_i\le 1\Bigr\} \le \sum_{i=1}^d Z_i; $$
thus, $E(\|X\|_1) \le E(\|Z\|_1) < \infty$. We have, moreover, for arbitrary numbers $\lambda_1,\dots,\lambda_d\ge 0$ with $\sum_{i=1}^d \lambda_i\le 1$,
$$ \xi := \sum_{i=1}^d \lambda_i Z_i e_i \in X = \Delta_Z. $$
This implies
$$ E(\xi) = \bigl(\lambda_1 E(Z_1),\dots,\lambda_d E(Z_d)\bigr) \in E(\Delta_Z). $$
On the other hand, any $\xi = (\xi_1,\dots,\xi_d)\in S(\Delta_Z)$ satisfies $\xi_i\le Z_i$; thus, $E(\xi_i)\le E(Z_i)$, $1\le i\le d$. Consequently, we obtain for the cross-polytope
$$ \Delta_{E(Z)} \subset E(\Delta_Z) \subset [0, E(Z)], \tag{1.53} $$



where ΔE(Z) = conv({0, E(Z1 )e1 , . . . , E(Zd )ed }) and [0, E(Z)] =
[0, E(Z1 )] × · · · × [0, E(Zd )]. Lemma 1.12.7, together with Lem-
mas 1.12.13 and 1.12.14, implies that the symmetric extension
L(E(ΔZ )) = E(L(ΔZ )) is a convex set.

Characterizing the Selection Expectation


The next characterization of the selection expectation is crucial. Its proof is
taken from Molchanov (2005, Theorem 1.1.22).

Theorem 1.12.16 (Selection Expectation Characterization)


Let X ⊂ Rd be an integrably bounded closed and convex random
set. Its selection expectation is the unique convex and compact subset
E(X) ⊂ Rd satisfying

E(h(X, x)) = h(E(X), x), x ∈ Rd .

Proof. For each $u\in E(X)$, there exists a sequence of rvs $\xi_n$, $n\in\mathbb{N}$, in $S(X)$ with $\lim_{n\to\infty} E(\xi_n) = u$. Thus, for $x\in\mathbb{R}^d$, we obtain
$$ \langle u, x\rangle = \Bigl\langle\lim_{n\to\infty} E(\xi_n),\, x\Bigr\rangle = \lim_{n\to\infty} E\bigl(\langle\xi_n, x\rangle\bigr) \le E\bigl(\sup\{\langle y, x\rangle : y\in X\}\bigr) = E\bigl(h(X,x)\bigr). $$
As $u\in E(X)$ was arbitrary, this implies
$$ h(E(X), x) = \sup\{\langle u, x\rangle : u\in E(X)\} \le E\bigl(h(X,x)\bigr). $$

Next, we establish the reverse inequality. Choose $x\in\mathbb{R}^d$ and put, for $\varepsilon > 0$,
$$ X_\varepsilon := \{y\in X : \langle y, x\rangle \ge h(X,x) - \varepsilon\}. $$
Then, $X_\varepsilon$ is a non-empty random closed subset of $X$. Therefore, it has a selection $\xi_\varepsilon$ (Molchanov (2005, Theorem 1.2.13)) and, thus,
$$ \langle\xi_\varepsilon, x\rangle \ge h(X,x) - \varepsilon. $$
Taking expectations yields
$$ \langle E(\xi_\varepsilon), x\rangle = E\bigl(\langle\xi_\varepsilon, x\rangle\bigr) \ge E\bigl(h(X,x)\bigr) - \varepsilon; $$
thus,
$$ E\bigl(h(X,x)\bigr) - \varepsilon \le h(E(X), x). $$

Letting ε ↓ 0 implies E(h(X, x)) ≤ h(E(X), x) for x ∈ Rd and, hence, equal-


ity.
According to Corollary 1.12.2, the compact and convex set E(X) is
uniquely determined by its support function h(E(X), ·), and thus, E(X) is
the unique compact and convex set with

h(E(X), x) = E(h(X, x)), x ∈ Rd .

This completes the proof.


When is a Given K a Max-Zonoid?


You may be asking yourself, When is a non-empty compact and convex set a
max-zonoid? The answer is a consequence of the previous considerations.

Corollary 1.12.17 (Characterization of Max-Zonoids) A compact and convex set $K\subset[0,\infty)^d$ with $K\cap(0,\infty)^d\ne\emptyset$, whose symmetric extension $L(K)$ is convex as well, is a max-zonoid iff $K = E(\Delta_Z)$, where $Z=(Z_1,\dots,Z_d)\ge 0\in\mathbb{R}^d$ is an rv with $E(Z_i)\in(0,\infty)$, $1\le i\le d$. In this case, $\|\cdot\|_K = \|\cdot\|_Z$.

Proof. Suppose that K = E(ΔZ ), where Z = (Z1 , . . . , Zd ) ≥ 0 ∈ Rd with


E(Zi ) ∈ (0, ∞), 1 ≤ i ≤ d. Example 1.12.15, together with equation (1.51),
show that E(ΔZ ) is a max-zonoid with ·K = ·Z .
Suppose, on the other hand, that K is a max-zonoid with corresponding
rv Z. Then, for x ∈ Rd , we have

xK = h(L(K), x)
= xZ
= E(h(L(ΔZ ), x)) by equation (1.51)
= h(E(L(ΔZ )), x) according to Theorem 1.12.16
= h(L(E(ΔZ )), x) according to Lemma 1.12.14.

The set L(E(ΔZ )) is convex, as shown in Example 1.12.15, and the set L(K) is
convex according to the assumption that K is a max-zonoid. Theorem 1.12.16
now implies that these sets coincide, because they provide identical support
functions as shown above. But this yields E(ΔZ ) = K, completing the proof.

Together with Example 1.12.10, Corollary 1.12.17 implies, for instance,


that ΔE(Z) is a strict subset of E (ΔZ ) in general.
Each D-norm $\|\cdot\|_D$ can be identified by Corollary 1.12.17 with the set $E(\Delta_Z)$, where $Z$ is an arbitrary generator of $\|\cdot\|_D$. For example, the logistic norm $\|\cdot\|_p$, with $p\in[1,\infty]$, can be identified by Lemma 1.12.5 with $K_q = \{y\ge 0\in\mathbb{R}^d : \|y\|_q\le 1\}$, where $1/p + 1/q = 1$.
Each max-zonoid K satisfies
ΔE(Z) ⊂ K ⊂ [0, E(Z)]
for some rv Z ≥ 0 ∈ Rd with E(Zi ) ∈ (0, ∞), 1 ≤ i ≤ d; see equation (1.53).
This is a characterization of a max-zonoid in dimension d = 2.

Lemma 1.12.18 A convex and compact set K ⊂ [0, ∞)d with

Δz ⊂ K ⊂ [0, z]

for some z > 0 ∈ Rd is, in the case d = 2, a max-zonoid; it is a


D-max-zonoid iff z = (1, 1). For d ≥ 3, this conclusion is not true.

Proof. Check that L(K) is a convex set if d = 2. The norm ·K , generated by
K, is monotone; see Lemma 1.12.3. Corollary 1.5.4 implies that ·K = ·Z
for some rvs Z ≥ 0 ∈ R2 with E(Zi ) ∈ (0, ∞), i = 1, 2. In this case z = E(Z).
Set $d\ge 3$ and put $z := (1,\dots,1)\in\mathbb{R}^d$, $K := \mathrm{conv}(\{0, e_1,\dots,e_d, y\})\subset\mathbb{R}^d$, where $y$ has constant entry $3/4$. Then, $\Delta_z\subset K\subset[0,1]^d$, but $L(K)$ is not convex:
$$ \frac{1}{2}\, y + \frac{1}{2}\Bigl(-\frac{3}{4}, \frac{3}{4},\dots,\frac{3}{4}\Bigr) = \Bigl(0, \frac{3}{4},\dots,\frac{3}{4}\Bigr)\notin K. $$

Dual Norm of a D-Norm


Hölder’s inequality states that
$$ \sum_{i=1}^d |x_i y_i| \le \|x\|_p \|y\|_q, \qquad x, y\in\mathbb{R}^d, \tag{1.54} $$
where $\|\cdot\|_p$, $\|\cdot\|_q$ are logistic norms with $p, q\in[1,\infty]$ such that $1/p + 1/q = 1$. Both are D-norms according to Proposition 1.2.1. In what follows, we show that this inequality can be extended to D-norms and their dual norms.

Definition 1.12.19 Let $\|\cdot\|_D$ be an arbitrary D-norm on $\mathbb{R}^d$. A radially symmetric norm $\|\cdot\|$ is called the dual norm of $\|\cdot\|_D$ if the D-max-zonoid $K = E(\Delta_Z)$, which pertains to $\|\cdot\|_D$, is the unit ball with respect to $\|\cdot\|$, i.e.,
$$ K = \{y\ge 0\in\mathbb{R}^d : \|y\|\le 1\}. $$
We denote the dual norm by $\|\cdot\|_{(D)}$.

In equations (1.55) and (1.56), we show that a dual norm always exists. Its
uniqueness is shown below. We do not require ·(D) to be a D-norm itself.
This is actually true in dimension d = 2; see Proposition 1.12.26.
Prominent examples are the logistic norms ·p and ·q with 1/p + 1/q =
1, which are dual to one another according to Lemma 1.12.5. This symmetric
duality does not hold in the general case, i.e., if ·(D) is the dual norm of
·D , then ·D is generally not the dual norm of ·(D) .

Lemma 1.12.20 The dual norm ·(D) is uniquely determined.

This is a consequence of the following lemma.

Lemma 1.12.21 Let $\|\cdot\|_{(1)}$ and $\|\cdot\|_{(2)}$ be two radially symmetric norms on $\mathbb{R}^d$ such that
$$ \{y\ge 0\in\mathbb{R}^d : \|y\|_{(1)}\le 1\} = \{y\ge 0\in\mathbb{R}^d : \|y\|_{(2)}\le 1\}. $$
Then, $\|\cdot\|_{(1)} = \|\cdot\|_{(2)}$.

Proof. Choose $y\ge 0\in\mathbb{R}^d$, $y\ne 0$, and put $y^* := y/\|y\|_{(2)}$. Then, $\|y^*\|_{(2)} = 1$ and, thus, $\|y^*\|_{(1)}\le 1$. But this is $\|y^*\|_{(1)}\le\|y^*\|_{(2)}$, which implies $\|y\|_{(1)}\le\|y\|_{(2)}$. Interchanging both norms implies equality.

Hölder’s Inequality for D-Norms


The next result implies, in particular, Hölder’s inequality (1.54).

Theorem 1.12.22 (Hölder’s Inequality for D-Norms) Let $\|\cdot\|_{(D)}$ be the dual norm of $\|\cdot\|_D$. Then, we have
$$ \sum_{i=1}^d |x_i y_i| \le \|x\|_D\, \|y\|_{(D)}, \qquad x, y\in\mathbb{R}^d. $$

Clearly, x and y can be interchanged on the right-hand side of the pre-


ceding inequality.
Proof. We can assume wlog $x, y\ge 0\in\mathbb{R}^d$, $y\ne 0$. Put $y^* := y/\|y\|_{(D)}$. We obtain
$$ \sum_{i=1}^d x_i y_i = \langle x, y\rangle = \|y\|_{(D)}\, \langle x, y^*\rangle. $$
Clearly, $\|y^*\|_{(D)} = 1$ and, thus, by assumption, $y^*\in K = E(\Delta_Z)$, where $Z$ is a generator of $\|\cdot\|_D$. This implies
$$\begin{aligned}
\langle x, y^*\rangle &\le \sup\{\langle x, y\rangle : y\in K\}\\
&= h(K, x)\\
&= h\bigl(E(\Delta_Z), x\bigr)\\
&= E\bigl(h(\Delta_Z, x)\bigr) \quad\text{according to Theorem 1.12.16}\\
&= \|x\|_Z \quad\text{by equation (1.51)}\\
&= \|x\|_D.
\end{aligned}$$
Together, we obtain
$$ \sum_{i=1}^d x_i y_i \le \|x\|_D\, \|y\|_{(D)}, $$
which is the assertion.
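For the logistic pair $p, q$ the extended inequality reduces to the classical Hölder inequality (1.54), which is easy to sanity-check numerically. An illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
p = 1.5
q = p / (p - 1.0)
for _ in range(5):
    x, y = rng.normal(size=4), rng.normal(size=4)
    lhs = np.sum(np.abs(x * y))
    rhs = np.linalg.norm(x, ord=p) * np.linalg.norm(y, ord=q)
    print(bool(lhs <= rhs + 1e-12))   # True in every trial
```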


Specifying the Dual D-Norm


In what follows, we specify the dual norm of an arbitrary D-norm. Recall that each D-max-zonoid $K\subset[0,\infty)^d$ is a convex and compact set that satisfies, according to equation (1.53),
$$ \Delta_{(1,\dots,1)} \subset K \subset [0,1]^d. $$
Its symmetric extension $L(K) = \{y\in\mathbb{R}^d : |y|\in K\}$ is convex and compact as well.
as well.
For $x\in\mathbb{R}^d$, $x\ne 0$, put
$$ \|x\|_{(K)} := \frac{1}{\max\{t>0 : t x\in L(K)\}} \tag{1.55} $$
and $\|0\|_{(K)} := 0$. This defines a radially symmetric norm on $\mathbb{R}^d$, i.e., $\|x\|_{(K)} = \bigl\||x|\bigr\|_{(K)}$; see Lemma 1.12.23 below; it is called a gauge in Rockafellar (1970, Chapter 15). In particular, $L(K)$ is obviously the unit ball with respect to this norm, or, equivalently,
$$ K = \{y\ge 0\in\mathbb{R}^d : \|y\|_{(K)}\le 1\}. \tag{1.56} $$

Lemma 1.12.23 Let K ⊂ [0, ∞)d be a D-max-zonoid. Then, ·(K) is


a radially symmetric norm on Rd with

x∞ ≤ x(K) ≤ x1 , x ∈ Rd .



Proof. It is obvious that ·(K) is radially symmetric and that it satisfies


conditions (1.1) and (1.2). We have to establish the triangle inequality (1.3).
Recall that the symmetric extension L(K) of K is convex according to the
definition of a max-zonoid.
Choose $x, y\in\mathbb{R}^d$, both different from zero. We can suppose $x+y\ne 0$ as well. Put $t_1 := 1/\|x\|_{(K)}$, $t_2 := 1/\|y\|_{(K)}\in(0,\infty)$. As $L(K)$ is compact, we have $t_1 x\in L(K)$, $t_2 y\in L(K)$, and, by the convexity of $L(K)$, $\lambda t_1 x + (1-\lambda) t_2 y\in L(K)$ for $\lambda\in[0,1]$. With the particular choice
$$ \lambda := \frac{1/t_1}{1/t_1 + 1/t_2}, $$

we obtain
$$ \lambda t_1 x + (1-\lambda) t_2 y = \frac{1}{1/t_1 + 1/t_2}\,(x+y)\in L(K). $$
But this implies
$$ \max\{t>0 : t(x+y)\in L(K)\} \ge \frac{1}{1/t_1 + 1/t_2}; $$
hence,
$$ \|x+y\|_{(K)} = \frac{1}{\max\{t>0 : t(x+y)\in L(K)\}} \le \Bigl(\frac{1}{1/t_1 + 1/t_2}\Bigr)^{-1} = \frac{1}{t_1} + \frac{1}{t_2} = \|x\|_{(K)} + \|y\|_{(K)}, $$
i.e., $\|\cdot\|_{(K)}$ defines a radially symmetric norm on $\mathbb{R}^d$.


Choose $x\ge 0\in\mathbb{R}^d$, $x\ne 0$. Next, we establish the inequalities $\|x\|_\infty \le \|x\|_{(K)} \le \|x\|_1$. We have the following list of implications:
$$\begin{aligned}
\Bigl\|\frac{x}{\|x\|_1}\Bigr\|_1 = 1
&\implies \frac{x}{\|x\|_1}\in\Delta_{(1,\dots,1)}\\
&\implies \frac{x}{\|x\|_1}\in K \quad\text{by (1.53)}\\
&\implies \max\{r>0 : r x\in K\} \ge \frac{1}{\|x\|_1}\\
&\implies \|x\|_{(K)} \le \|x\|_1,
\end{aligned}$$
which is one of the two inequalities we want to establish. The fact that $K\subset[0,1]^d$ by (1.53) implies
$$ r_1 := \max\{r>0 : r x\in K\} \le r_2 := \max\bigl\{r>0 : r x\in[0,1]^d\bigr\}. $$
Note that $r_1 = 1/\|x\|_{(K)}$ and that $r_2$ satisfies $r_2\|x\|_\infty = 1$. We obtain the following implications:
$$ r_1\|x\|_{(K)} = 1 = r_2\|x\|_\infty \implies r_1\|x\|_{(K)} = r_2\|x\|_\infty \implies \|x\|_{(K)} \ge \|x\|_\infty, $$
since $r_1\le r_2$. This completes the proof.
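Numerically, the gauge (1.55) is easy to evaluate by bisection along the ray $\{tx : t > 0\}$ whenever membership in $L(K)$ can be tested. A small illustrative sketch, using $K = K_q$ from Lemma 1.12.5, so that by (1.47) and (1.56) the gauge must reproduce $\|\cdot\|_q$ (all names are ours):

```python
import numpy as np

def gauge(x, in_LK, t_max=1e6, tol=1e-12):
    """||x||_(K) = 1 / max{t > 0 : t x in L(K)}, located by bisection;
    in_LK(y) tests membership of y in L(K)."""
    lo, hi = 0.0, t_max
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if in_LK(mid * np.asarray(x, dtype=float)):
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * max(hi, 1.0):
            break
    return 1.0 / lo

q = 3.0
in_q_ball = lambda y: np.sum(np.abs(y) ** q) <= 1.0   # L(K_q): the q-unit ball
x = np.array([1.0, -2.0, 0.5])
print(gauge(x, in_q_ball))           # approximately ||x||_q
print(np.linalg.norm(x, ord=q))
```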


The following result is, therefore, an obvious consequence of Lemma 1.12.21,


Theorem 1.12.22, and equation (1.56).
Theorem 1.12.24 Let $\|\cdot\|_D$ be an arbitrary D-norm on $\mathbb{R}^d$ with corresponding D-max-zonoid $K$. Then, $\|\cdot\|_{(D)} := \|\cdot\|_{(K)}$ is the uniquely determined dual norm, and we obtain
$$ \sum_{i=1}^d |x_i y_i| \le \|x\|_D\, \|y\|_{(D)}, \qquad x, y\in\mathbb{R}^d. $$

Example 1.12.25 From Proposition 1.4.1, we know that the convex


combination ·D := λ ·p1 + (1 − λ) ·p2 of two logistic norms ·p1 ,
·p2 on Rd with p1 , p2 ∈ [1, ∞], λ ∈ (0, 1), is again a D-norm. Let
Z (1) , Z (2) be generators of ·p1 , ·p2 and let ξ ∈ {1, 2} be an rv with
P (ξ = 1) = λ = 1 − P (ξ = 2), which is also independent of Z (1) and
Z (2) . Then, Z (ξ) is a generator of ·D .
The dual norms are ·q1 , ·q2 with 1/pi + 1/qi = 1, i = 1, 2,
 
and corresponding D-max-zonoids Kqi = y ≥ 0 ∈ Rd : yqi ≤ 1 =
E (ΔZ (i) ), i = 1, 2. Check that the D-max-zonoid, which corresponds
to ·D , is, according to Theorem 1.12.16,

E (ΔZ (ξ) ) = λKq1 + (1 − λ)Kq2 .

The dual norm of ·D = λ ·p1 + (1 − λ) ·p2 is, consequently,

$$ \|x\|_{(D)} := \frac{1}{\sup\{t>0 : t\,|x|\in\lambda K_{q_1} + (1-\lambda) K_{q_2}\}}. $$

Note that, with the particular choice p1 = ∞, p2 = 1, the convex


combination xD = λ x∞ + (1 − λ) x1 is the Marshall-Olkin D-
norm.

We close this section with a characterization of bivariate dual norms.

Proposition 1.12.26 The dual norm of an arbitrary D-norm on R2 is


a D-norm. On the other hand, each D-norm on R2 is a dual D-norm.

The mapping ·D → ·(D) between the set of D-norms and the set of
dual norms on R2 is, consequently, one-to-one.

Proof. From Corollary 1.5.4, we know that in dimension d = 2, the radially


symmetric norm ·(K) is a D-norm iff it satisfies ·∞ ≤ ·(K) ≤ ·1 . But
this was established in Lemma 1.12.23 for general dimension d ≥ 2. As a
consequence, we obtain that any dual norm on R2 is a D-norm.
Choose, on the other hand, an arbitrary D-norm $\|\cdot\|_D$ on $\mathbb{R}^2$ and put $K := \{x\ge 0\in\mathbb{R}^2 : \|x\|_D\le 1\}$. The set $K\subset[0,\infty)^2$ is compact and convex,
and therefore, according to Lemma 1.12.18, we only have to show that

Δ(1,1) ⊂ K ⊂ [0, 1]2 .

But this follows from the general inequalities ·∞ ≤ ·D ≤ ·1 in (1.4): we
have, for x ∈ K,
x∞ ≤ xD ≤ 1
thus, x ∈ [0, 1]2 . On the other hand, for arbitrary x = λ1 e1 + λ2 e2 ∈ Δ(1,1) ,
λ1 , λ2 ≥ 0, λ1 + λ2 ≤ 1, we have

λ1 e1 + λ2 e2 D ≤ λ1 e1 + λ2 e2 1 ≤ λ1 + λ2 ≤ 1,

thus, x ∈ K. This completes the proof.



2
D-Norms & Multivariate Extremes

This chapter provides a smooth introduction to MEVT via D-norms. Stan-


dard references to MEVT are Balkema and Resnick (1977); de Haan and
Resnick (1977); Resnick (1987); Vatan (1985); Beirlant et al. (2004); de Haan
and Ferreira (2006), and Falk et al. (2011), among others. For the sake of com-
pleteness and for easier reference, we list some basics, starting with univariate
extreme value theory.

2.1 Univariate Extreme Value Theory


Let X be an R-valued rv and suppose that we are only interested in large
values of X, where we call a realization of X large if it exceeds a given high
threshold t ∈ R. In this case, we choose the data window A = (t, ∞) or, better
adapted to our purposes, we put t ∈ R on a linear scale and define
An = (an t + bn , ∞)
for some norming constants an > 0, bn ∈ R. We are, therefore, only interested
in values of X ∈ An .
Denote by F the df of X. The elementary conditional df of X, given that
X exceeds the threshold an t + bn , satisfies
$$ P\bigl(X\le a_n(t+s)+b_n \mid X > a_n t + b_n\bigr) = 1 - \frac{1 - F(a_n(t+s)+b_n)}{1 - F(a_n t + b_n)}, \qquad s\ge 0. $$
We let the threshold $a_n t + b_n$ increase with $n\in\mathbb{N}$; thus, we are facing the problem: what is the limiting behavior of
$$ \frac{1 - F(a_n(t+s)+b_n)}{1 - F(a_n t + b_n)} \longrightarrow_{n\to\infty}\ ? \tag{2.1} $$


Univariate Extreme Value Distributions


Let $X_1, X_2,\dots$ be independent copies of $X$. Suppose that there exist constants $a_n > 0$, $b_n\in\mathbb{R}$, such that, for $x\in\mathbb{R}$,
$$ P\Bigl(\frac{\max_{1\le i\le n} X_i - b_n}{a_n}\le x\Bigr) = P\bigl(X_i\le a_n x + b_n,\ 1\le i\le n\bigr) = F^n(a_n x + b_n) \to_{n\to\infty} G(x) \tag{2.2} $$

for some non-degenerate limiting df G, i.e., there is no x0 ∈ R such that


G(x0 ) = 1 and G(x) = 0 for x < x0 . Then, we say that F belongs to the
domain of attraction of G, denoted by F ∈ D(G).
If $F\in\mathcal{D}(G)$, we deduce from the Taylor expansion $\log(1+\varepsilon) = \varepsilon + O(\varepsilon^2)$ for $\varepsilon\to 0$ that $\log(1+\varepsilon)/\varepsilon\to_{\varepsilon\to 0} 1$; thus, we have the equivalences
$$\begin{aligned}
F^n(a_n x + b_n)\to_{n\to\infty} G(x)
&\iff n\log\bigl(1 - (1 - F(a_n x + b_n))\bigr)\to_{n\to\infty}\log(G(x))\\
&\iff n\bigl(1 - F(a_n x + b_n)\bigr)\to_{n\to\infty} -\log(G(x)),
\end{aligned}$$
if $0 < G(x)\le 1$; note that $1 - F(a_n x + b_n)\to_{n\to\infty} 0$. Hence, we obtain
$$ \frac{1 - F(a_n(t+s)+b_n)}{1 - F(a_n t + b_n)} \to_{n\to\infty} \frac{\log(G(t+s))}{\log(G(t))}, \tag{2.3} $$
if $0 < G(t) < 1$.


According to the classical article by Gnedenko (1943) (see also de Haan (1975); Galambos (1987), or Resnick (1987)), we know that $F\in\mathcal{D}(G)$ only if $G\in\{G_\alpha : \alpha\in\mathbb{R}\}$, with
$$ G_\alpha(x) = \begin{cases} \exp\bigl(-(-x)^\alpha\bigr), & x\le 0,\\ 1, & x>0, \end{cases} \quad\text{for }\alpha>0, \qquad G_\alpha(x) = \begin{cases} 0, & x\le 0,\\ \exp(-x^\alpha), & x>0, \end{cases} \quad\text{for }\alpha<0, $$
and
$$ G_0(x) := \exp\bigl(-e^{-x}\bigr), \qquad x\in\mathbb{R}, \tag{2.4} $$
being the family of reverse Weibull, Fréchet, and Gumbel distributions. Note
that G1 (x) = exp(x), x ≤ 0, is the standard negative exponential df.
The assumption F ∈ D(G) is quite a mild one, practically satisfied by
any textbook df F . We refer, for example, to Galambos (1987, Section 2.3)
or Resnick (1987, Chapter 1), where the condition F ∈ D(G) is characterized
and the choice of the constants an > 0, bn ∈ R, is specified.

For the df $F(x) = x$, $x\in[0,1]$, of the uniform distribution on $(0,1)$, we obtain, for example, with $a_n = 1/n$, $b_n = 1$, $n\in\mathbb{N}$,
$$ \lim_{n\to\infty} F^n\Bigl(1 + \frac{x}{n}\Bigr) = \lim_{n\to\infty}\Bigl(1 + \frac{x}{n}\Bigr)^n = \exp(x) = G_1(x), \qquad x\le 0. $$
For counterexamples $F\notin\mathcal{D}(G)$, we refer to Galambos (1987, Corollary 2.4.1 and Example 2.6.1).
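As a quick simulation check of this example (illustrative code, not from the text): with $a_n = 1/n$, $b_n = 1$, the empirical df of the normalized maximum of $n$ uniforms is close to $G_1$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 500, 10_000
u = rng.random((reps, n))
m = (np.max(u, axis=1) - 1.0) * n   # (max - b_n) / a_n with a_n = 1/n, b_n = 1
x = -1.0
print(np.mean(m <= x), np.exp(x))   # empirical df vs. G_1(x) = exp(x)
```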
The above different representations of $G_\alpha$ can be unified for $\beta\in\mathbb{R}$ by putting
$$ F_\beta(x) := \exp\bigl(-(1+\beta x)^{-1/\beta}\bigr), \qquad 1+\beta x > 0, $$
with the convention
$$ F_0(x) := \lim_{\beta\to 0} F_\beta(x) = \exp\bigl(-e^{-x}\bigr), \qquad x\in\mathbb{R}. $$
Note that the signs of $\beta$ and $\alpha$ in these two representations are flipped about zero, i.e., $F_\beta$ with $\beta>0$ corresponds to $G_\alpha$ with $\alpha<0$, etc. With this particular parametrization, the set of univariate distributions $\{F_\beta : \beta\in\mathbb{R}\}$ is commonly called the family of generalized extreme value distributions.

Max-Stability of Extreme Value Distributions


The characteristic property of the class of extreme value distributions (EVDs)
{Gα : α ∈ R} is their max-stability, i.e., for each α ∈ R and each n ∈ N,
there exist constants an > 0, bn ∈ R, depending on α, such that

Gn (an x + bn ) = G(x), x ∈ R. (2.5)

For $G_1(x) = \exp(x)$, $x\le 0$, for example, we have $a_n = 1/n$, $b_n = 0$, $n\in\mathbb{N}$:
$$ G_1^n\Bigl(\frac{x}{n}\Bigr) = \Bigl(\exp\Bigl(\frac{x}{n}\Bigr)\Bigr)^n = \exp(x) = G_1(x). $$
Let $\eta^{(1)}, \eta^{(2)},\dots$ be independent copies of an rv $\eta$ that follows the df $G_\alpha$ with arbitrary $\alpha\in\mathbb{R}$. In terms of rvs, equation (2.5) means
$$ P\Bigl(\frac{\max_{1\le i\le n}\eta^{(i)} - b_n}{a_n}\le x\Bigr) = P(\eta\le x), \qquad x\in\mathbb{R}. $$

This is the reason why Gα is called a max-stable df, and the set {Gα : α ∈ R}
collects all univariate max-stable distributions, which are non-degenerate; see,
for example, Galambos (1987, Theorem 2.4.1).

Univariate Generalized Pareto Distributions


Suppose that $F\in\mathcal{D}(G_\alpha)$. Then, we obtain from equation (2.3)
$$\begin{aligned}
P\Bigl(\frac{X-b_n}{a_n}\le t+s \,\Big|\, \frac{X-b_n}{a_n} > t\Bigr)
&= 1 - \frac{1 - F(a_n(t+s)+b_n)}{1 - F(a_n t + b_n)}\\
&\to_{n\to\infty} 1 - \frac{\log(G_\alpha(t+s))}{\log(G_\alpha(t))}\\
&= \begin{cases} H_\alpha\bigl(-\bigl(1+\frac{s}{t}\bigr)\bigr), & \text{if }\alpha>0,\\ H_\alpha\bigl(1+\frac{s}{t}\bigr), & \text{if }\alpha<0,\\ H_0(s), & \text{if }\alpha=0, \end{cases} \qquad s\ge 0, \end{aligned}\tag{2.6}$$
provided 0 < Gα (t) < 1. The family of df

$$ H_\alpha(x) := 1 + \log\bigl(G_\alpha(x)\bigr) = \begin{cases} 1 - (-x)^\alpha, \quad -1\le x\le 0, & \text{if }\alpha>0,\\ 1 - x^\alpha, \quad x\ge 1, & \text{if }\alpha<0,\\ 1 - \exp(-x), \quad x\ge 0, & \text{if }\alpha=0, \end{cases} \tag{2.7} $$

parametrized by α ∈ R, is the class of (univariate) generalized Pareto df


(GPD) associated with the family of EVD. Note that Hα with α < 0 is a
Pareto distribution, H1 is the uniform distribution on (−1, 0), and H0 is the
standard exponential distribution.
The following consequence is obvious: suppose that your data are realiza-
tions from iid observations, whose common df is in the domain of attraction
of an extreme value df. Modeling the distribution of exceedances above a high
threshold by a GPD is, consequently, a straightforward option. As described
by van Dantzig (1956), for example, floods that exceed some high thresh-
old approximately follow an exponential df. This approach is known as the
peaks-over-threshold (POT) method; see, for example, Beirlant et al. (2004,
Section 5.3) or Reiss and Thomas (2007, Chapter 5).
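The POT idea is easy to see in a simulation. For standard exponential data, the excesses over a threshold are again standard exponential (the memoryless property, i.e., the case $\alpha=0$ in (2.7)). A minimal illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(size=200_000)
t = 2.0
excess = x[x > t] - t                # excesses over the threshold t
# excesses of exponential data are again standard exponential (H_0):
print(np.mean(excess), np.mean(excess <= 1.0), 1 - np.exp(-1.0))
```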

2.2 Multivariate Generalized Pareto Distributions


In this section, we introduce multivariate GPD. They will be particularly
useful for a smooth derivation of multivariate max-stable df in Section 2.3.
Let Z = (Z1 , . . . , Zd ) be a generator of an arbitrary D-norm ·D on Rd
with the additional property

Zi ≤ c, 1 ≤ i ≤ d, (2.8)

for some constant c ≥ 1. According to Corollary 1.7.2, such a generator always


exists. Let U be an rv that is uniformly distributed on (0, 1) and that is
independent of Z.
Put
$$ V = (V_1,\dots,V_d) := \frac{1}{U}(Z_1,\dots,Z_d) =: \frac{Z}{U}. \tag{2.9} $$
Note that, for $x\ge 1$,
$$ P\Bigl(\frac{1}{U}\le x\Bigr) = P\Bigl(\frac{1}{x}\le U\Bigr) = 1 - \frac{1}{x}, $$
i.e., $1/U$ follows a standard Pareto distribution (with parameter 1).


According to Fubini’s theorem, we have, moreover, for x > c ≥ 1 and
1 ≤ i ≤ d,

1 Zi
P Zi ≤ x = P ≤U
U x
= E (1 (Zi /x ≤ U ))

= 1 (z/x ≤ u) (P ∗ (U, Zi ))d(u, z)
[0,1]×[0,c]

= 1 (z/x ≤ u) ((P ∗ U ) × (P ∗ Zi ))d(u, z)
[0,1]×[0,c]
 c  1

= 1 (z/x ≤ u) (P ∗ U )du (P ∗ Zi )dz


0 0
 c z 
= P ≤U
(P ∗ Zi ) dz
x
0 c
z
= 1 − (P ∗ Zi ) dz
0 x

1 c
=1− z (P ∗ Zi ) dz
x 0
1
= 1 − E (Zi )
x
1
=1− , (2.10)
x
where P ∗ (X, Y ) = (P ∗ X) × (P ∗ Y ) denotes the product measure when the
rvs X and Y are independent.
The product Zi /U , therefore, follows a standard Pareto distribution in its
upper tail. The special case Zi = 1 yields the standard Pareto distribution ev-
erywhere. We call the distribution of V = Z/U a d-variate simple generalized
Pareto distribution (simple GPD).
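Simulating a simple GPD is straightforward from the construction (2.9): draw a bounded generator and an independent uniform rv and divide. The following illustrative sketch uses the ad hoc generator $Z = (2U', 2(1-U'))$ with $U'$ uniform, which is bounded by $c=2$ and has unit means, and checks the upper-tail Pareto margin from (2.10):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
Up = rng.random(n)
Z = np.column_stack([2 * Up, 2 * (1 - Up)])   # bounded generator: Z_i <= 2, E(Z_i) = 1
U = rng.random(n)
V = Z / U[:, None]                            # simple GPD rv V = Z / U
x = 5.0                                       # any x > c = 2
print(np.mean(V[:, 0] <= x), 1 - 1 / x)       # Pareto upper tail, cf. (2.10)
```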

The Distribution Function of a Simple GPD


By repeating the arguments in the derivation of equation (2.10), we obtain for $x\ge(c,\dots,c) = c$
$$\begin{aligned}
P(V\le x) &= P\Bigl(\frac{Z_i}{U}\le x_i,\ 1\le i\le d\Bigr)\\
&= P\Bigl(\frac{Z_i}{x_i}\le U,\ 1\le i\le d\Bigr)\\
&= \int_{[0,c]^d} P\Bigl(U\ge\frac{z_i}{x_i},\ 1\le i\le d\Bigr)\,(P*Z)\,d(z_1,\dots,z_d)\\
&= \int_{[0,c]^d} P\Bigl(U\ge\max_{1\le i\le d}\frac{z_i}{x_i}\Bigr)\,(P*Z)\,d(z_1,\dots,z_d)\\
&= \int_{[0,c]^d}\Bigl(1 - \max_{1\le i\le d}\frac{z_i}{x_i}\Bigr)\,(P*Z)\,d(z_1,\dots,z_d)\\
&= 1 - E\Bigl(\max_{1\le i\le d}\frac{Z_i}{x_i}\Bigr)\\
&= 1 - \Bigl\|\frac{1}{x}\Bigr\|_D, \end{aligned}\tag{2.11}$$
i.e., the (multivariate) df of $V$ is, in its upper tail, i.e., for $x\ge c$, given by $1 - \|1/x\|_D$.

The Survival Function of a Simple GPD


By repeating the arguments in the derivation of equation (2.11), we obtain for $x\ge c$
$$\begin{aligned}
P(V\ge x) &= P\Bigl(U\le\frac{Z_i}{x_i},\ 1\le i\le d\Bigr)\\
&= \int_{[0,c]^d} P\Bigl(U\le\frac{z_i}{x_i},\ 1\le i\le d\Bigr)\,(P*Z)\,d(z_1,\dots,z_d)\\
&= \int_{[0,c]^d} P\Bigl(U\le\min_{1\le i\le d}\frac{z_i}{x_i}\Bigr)\,(P*Z)\,d(z_1,\dots,z_d)\\
&= \int_{[0,c]^d}\min_{1\le i\le d}\frac{z_i}{x_i}\,(P*Z)\,d(z_1,\dots,z_d)\\
&= E\Bigl(\min_{1\le i\le d}\frac{Z_i}{x_i}\Bigr) = ⦀1/x⦀_D, \end{aligned}\tag{2.12}$$
i.e., the survival function of $V$ is equal to the dual D-norm function $⦀1/x⦀_D$, for $x\ge c$.
As an obvious consequence of (2.12), we obtain the equation
$$ P(V\ge tx \mid V\ge x) = \frac{P(V\ge tx)}{P(V\ge x)} = \frac{⦀1/(tx)⦀_D}{⦀1/x⦀_D} = \frac{1}{t}, \qquad t\ge 1, \tag{2.13} $$
independent of $x\ge c$, provided $⦀1/x⦀_D > 0$. Note that $1/t$, $t\ge 1$, is the survival function of the univariate simple Pareto distribution. This is a first example demonstrating excursion stability or POT stability of a (multivariate simple) GPD. Actually, excursion stability is a characteristic property of a GPD in the univariate case as well as in the multivariate case; see, for example, Falk et al. (2011, Section 5.3) and Proposition 3.1.2.
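The excursion stability (2.13) can be checked by simulation as well, here with the same ad hoc bounded generator as in the sketch above (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2_000_000
Up = rng.random(n)
Z = np.column_stack([2 * Up, 2 * (1 - Up)])   # bounded generator as before
V = Z / rng.random(n)[:, None]
x = np.array([3.0, 4.0])                      # x >= c = (2, 2)
t = 2.0
above_x = np.all(V >= x, axis=1)
above_tx = np.all(V >= t * x, axis=1)
print(above_tx.sum() / above_x.sum(), 1 / t)  # excursion stability (2.13)
```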

Application to Risk Assessment


Suppose that the joint random losses of a portfolio consisting of $d$ assets are modeled by the rv $V$. According to equation (2.12), the probability that the $d$ losses jointly exceed the vector $x > c$ is given by
$$ P(V\ge x) = E\Bigl(\min_{1\le i\le d}\frac{Z_i}{x_i}\Bigr) = ⦀1/x⦀_D. $$
Next, we apply different models for the underlying D-norm $\|\cdot\|_D$. If we choose $\|\cdot\|_D = \|\cdot\|_\infty$, then we know from (1.13) that $⦀x⦀_\infty = \min_{1\le i\le d}|x_i|$; thus,
$$ P(V\ge x) = \min_{1\le i\le d}\frac{1}{x_i} = \frac{1}{\max_{1\le i\le d} x_i}, \qquad x\ge(1,\dots,1). $$
If we choose $\|\cdot\|_D = \|\cdot\|_1$, then we know from (1.12) that $⦀\cdot⦀_1 = 0$; thus,
$$ P(V\ge x) = 0, \qquad x\ge(d,\dots,d). $$

This example shows that assessing the risk of a portfolio is highly sensitive
to the choice of the stochastic model. For x = (d, . . . , d) and ·D = ·∞ ,
the probability of the event that the losses jointly exceed the value d is 1/d,
whereas for ·D = ·1 , it is zero. In dimension d = 2, this means a joint loss
probability of 1/2 versus a joint loss probability of 0.

Standard Multivariate GPDs


Let Z = (Z1 , . . . , Zd ) again be a generator of the arbitrary D-norm ·D
on Rd , with the additional property Zi ≤ c, 1 ≤ i ≤ d, for some constant
c ≥ 1, and let the rv U be uniformly distributed on (0, 1) and independent of
Z. The distribution of Z/U is a simple GPD. But now we want to consider

$U/Z$; however, in this case, we may divide by zero. To overcome this problem, choose a number $K<0$ and put
$$ W := (W_1,\dots,W_d) := \Bigl(\max\Bigl(-\frac{U}{Z_1},\, K\Bigr),\dots, \max\Bigl(-\frac{U}{Z_d},\, K\Bigr)\Bigr). \tag{2.14} $$
The additional constant $K$ avoids division by zero. Repeating the arguments in equation (2.11), we obtain
$$ P(W\le x) = 1 - \|x\|_D, \qquad x_0\le x\le 0\in\mathbb{R}^d, $$
where $x_0 < 0\in\mathbb{R}^d$ depends on $K$ and $c$. We call a df $H$ on $\mathbb{R}^d$ a standard GPD if there exists $x_0 < 0\in\mathbb{R}^d$ such that
$$ H(x) = 1 - \|x\|_D, \qquad x_0\le x\le 0\in\mathbb{R}^d. \tag{2.15} $$
From (1.4), we obtain $0\le H(x_0) = 1 - \|x_0\|_D \le 1 - \|x_0\|_\infty$ and, thus, $\max_{1\le i\le d}|x_{0i}|\le 1$, or $x_{0i}\ge -1$, $1\le i\le d$. As a consequence, the $i$-th marginal df $H_i$ of $H$ is given by
$$ H_i(x) = 1 - \|x e_i\|_D = 1 - |x|\,\|e_i\|_D = 1 + x, $$
for $x_{0i}\le x\le 0$, $1\le i\le d$, which coincides on $[x_{0i}, 0]$ with the uniform df on $(-1,0)$.
A repetition of the arguments used in the derivation of equation (2.12) provides the survival function of a standard multivariate GPD:
$$ P(W\ge x) = ⦀x⦀_D, \qquad x_0\le x\le 0\in\mathbb{R}^d. \tag{2.16} $$

General Multivariate GPDs


Let $W=(W_1,\dots,W_d)$ follow a standard multivariate GPD with corresponding D-norm $\|\cdot\|_D$. Choose parameters $\alpha_1,\dots,\alpha_d\in\mathbb{R}$. Put $\psi_\alpha(x) := \log(G_\alpha(x))$, $0 < G_\alpha(x) < 1$, where $G_\alpha$ is a univariate max-stable df from (2.4). Note that $\psi_\alpha$ is a strictly monotone and continuous function whose range is $(-\infty, 0)$. Then, by definition,
$$ Y = (Y_1,\dots,Y_d) := \bigl(\psi_{\alpha_1}^{-1}(W_1),\dots,\psi_{\alpha_d}^{-1}(W_d)\bigr) \tag{2.17} $$
follows a general multivariate GPD.


For $x<0$, we have
$$ \psi_\alpha^{-1}(x) = \begin{cases} -(-x)^{1/\alpha}, & \alpha>0,\\ (-x)^{1/\alpha}, & \alpha<0,\\ -\log(-x), & \alpha=0. \end{cases} $$
 
Note that $\psi_1^{-1}(x) = x$ on $(-\infty,0)$, and thus, $\bigl(\psi_1^{-1}(W_1),\dots,\psi_1^{-1}(W_d)\bigr) = (W_1,\dots,W_d)$ follows a standard GPD. With the choice $\alpha_1 = \dots = \alpha_d = -1$, we obtain a simple GPD:
$$\begin{aligned}
\bigl(\psi_{-1}^{-1}(W_1),\dots,\psi_{-1}^{-1}(W_d)\bigr)
&= \Bigl(\Bigl(-\max\Bigl(-\frac{U}{Z_1},\, K\Bigr)\Bigr)^{-1},\dots, \Bigl(-\max\Bigl(-\frac{U}{Z_d},\, K\Bigr)\Bigr)^{-1}\Bigr)\\
&= \Bigl(\frac{1}{\min(U/Z_1,\, -K)},\dots, \frac{1}{\min(U/Z_d,\, -K)}\Bigr)\\
&= \frac{1}{U}(Z_1,\dots,Z_d) = V
\end{aligned}$$
if $U/Z_i\le -K$ or $Z_i/U\ge -1/K$ for $1\le i\le d$. With $\alpha_1 = \dots = \alpha_d = 0$ and $Z_i/U\ge -1/K$ for $1\le i\le d$, we obtain
$$ \bigl(\psi_0^{-1}(W_1),\dots,\psi_0^{-1}(W_d)\bigr) = \bigl(-\log(U/Z_1),\dots,-\log(U/Z_d)\bigr) = \bigl(\log(Z_1)-\log(U),\dots,\log(Z_d)-\log(U)\bigr), $$
where $-\log(U)$ follows the standard exponential distribution on $(0,\infty)$.


As mentioned earlier, the characteristic property of a GPD in the uni-
variate case as well as in the multivariate case is its excursion stability or
POT stability; see, for example, Falk et al. (2011, Section 5.3). The df of Y is
given by
 
$$ P(Y\le x) = P\bigl((W_1,\dots,W_d)\le(\psi_{\alpha_1}(x_1),\dots,\psi_{\alpha_d}(x_d))\bigr) = 1 - \bigl\|(\psi_{\alpha_1}(x_1),\dots,\psi_{\alpha_d}(x_d))\bigr\|_D $$
if $(\psi_{\alpha_1}(x_1),\dots,\psi_{\alpha_d}(x_d))\ge x_0$; for such $x$, its survival function follows from equation (2.16):
$$ P(Y\ge x) = P\bigl((W_1,\dots,W_d)\ge(\psi_{\alpha_1}(x_1),\dots,\psi_{\alpha_d}(x_d))\bigr) = ⦀(\psi_{\alpha_1}(x_1),\dots,\psi_{\alpha_d}(x_d))⦀_D. $$

An alternative but equivalent definition of a general multivariate GPD,


in terms of its copula, together with its univariate margins, is given in Re-
mark 3.1.3.

2.3 Multivariate Max-Stable Distributions


In complete accordance with the univariate case, we call a non-degenerate df
G on Rd max-stable if, for every n ∈ N, there exist vectors an > 0, bn ∈ Rd ,
such that

Gn (an x + bn ) = G(x), x ∈ Rd . (2.18)


Recall that all operations on vectors, such as addition, multiplication, etc.,
are meant componentwise. The preceding equation can again be formulated
in terms of componentwise maxima of independent copies η (1) , η (2) , . . . of an
rv η = (η1 , . . . , ηd ) that realizes in Rd , and that follows the df G:

$$ P\Bigl(\frac{\max_{1\le i\le n}\eta^{(i)} - b_n}{a_n}\le x\Bigr) = P(\eta\le x), \qquad x\in\mathbb{R}^d. $$

Note that both the maximum function and division are taken componentwise.
Different from the univariate case in (2.4), the class of multivariate max-
stable distributions or multivariate extreme value distributions, also abbrevi-
ated as EVD, is no longer a parametric one, indexed by some α ∈ R. This is
obviously necessary for the univariate margins of G. Instead, a non-parametric
part occurs, which can be best described in terms of D-norms, as is shown in
what follows.

Simple Multivariate Max-Stable Distributions


Definition 2.3.1 An EVD G on Rd is called simple max-stable if each
univariate marginal df Gi of G is the Fréchet df with parameter one, or
unit Fréchet df for short, i.e.,

$$ G_i(x) = \exp\Bigl(-\frac{1}{x}\Bigr), \qquad x>0,\ 1\le i\le d. $$

Next, we show that such simple EVDs actually exist. Choose an arbitrary D-norm $\|\cdot\|_D$ on $\mathbb{R}^d$. Let $V^{(1)} = (V_1^{(1)},\dots,V_d^{(1)})$, $V^{(2)} = (V_1^{(2)},\dots,V_d^{(2)}),\dots$ be independent copies of the rv $V = Z/U$, where $Z=(Z_1,\dots,Z_d)$ is a generator of $\|\cdot\|_D$ with the additional property that it is bounded by some $c\ge 1\in\mathbb{R}^d$, and the rv $U$ is uniformly distributed on $(0,1)$. The rvs $Z$ and $U$ are assumed to be independent as well; thus, the rv $V$ follows a simple GPD. For the vector of the componentwise maxima,
$$ \max_{1\le i\le n} V^{(i)} := \Bigl(\max_{1\le i\le n} V_1^{(i)},\ \max_{1\le i\le n} V_2^{(i)},\dots,\ \max_{1\le i\le n} V_d^{(i)}\Bigr), $$
we obtain from equation (2.11), for $x>0$ and $n$ large enough such that $nx > c$,
$$\begin{aligned}
P\Bigl(\frac{1}{n}\max_{1\le i\le n} V^{(i)}\le x\Bigr)
&= P\bigl(V^{(i)}\le nx,\ 1\le i\le n\bigr)\\
&= \prod_{i=1}^n P\bigl(V^{(i)}\le nx\bigr)\\
&= P(V\le nx)^n\\
&= \Bigl(1 - \Bigl\|\frac{1}{nx}\Bigr\|_D\Bigr)^n\\
&= \Bigl(1 - \frac{1}{n}\Bigl\|\frac{1}{x}\Bigr\|_D\Bigr)^n\\
&\to_{n\to\infty} \exp\Bigl(-\Bigl\|\frac{1}{x}\Bigr\|_D\Bigr) =: G(x), \end{aligned}\tag{2.19}$$
where $1/x$ is meant componentwise.
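The convergence (2.19) can be watched numerically: scaled componentwise maxima of simple GPD observations settle down to $\exp(-\|1/x\|_D)$, where $\|1/x\|_D$ itself is estimated by Monte Carlo. A rough illustrative sketch, with the same ad hoc bounded generator as before:

```python
import numpy as np

rng = np.random.default_rng(8)
n, reps = 200, 5_000
Up = rng.random((reps, n))
Z1, Z2 = 2 * Up, 2 * (1 - Up)                  # bounded generator, E(Z_i) = 1
U = rng.random((reps, n))
M1 = np.max(Z1 / U, axis=1) / n                # scaled componentwise maxima
M2 = np.max(Z2 / U, axis=1) / n
x = np.array([1.5, 2.0])
emp = np.mean((M1 <= x[0]) & (M2 <= x[1]))
# Monte Carlo value of ||1/x||_D for this generator:
W = rng.random(1_000_000)
dn = np.mean(np.maximum(2 * W / x[0], 2 * (1 - W) / x[1]))
print(emp, np.exp(-dn))                        # close to exp(-||1/x||_D), cf. (2.19)
```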


Suppose that at least one component of x is equal to zero, say component
i0 . Then,

P (V ≤ nx) ≤ P (Vi0 ≤ nxi0 )


Zi0
=P ≤0
U
= P (Zi0 ≤ 0)
= P (Zi0 = 0) < 1

by the fact that E(Zi0 ) = 1. As a consequence, we obtain in this case


1 (i) n n
P max V ≤ x = P (V ≤ nx) ≤ P (Zi0 = 0) →n→∞ 0.
n 1≤i≤n

Hence, we have

1
P max V (i) ≤ x →n→∞ G(x), x ∈ Rd ,
n 1≤i≤n

where
$$ G(x) = \begin{cases} \exp\bigl(-\|\tfrac{1}{x}\|_D\bigr), & \text{if } x>0,\\ 0 & \text{elsewhere.} \end{cases} \tag{2.20} $$
Since $P\bigl(n^{-1}\max_{1\le i\le n} V^{(i)}\le\cdot\bigr)$, $n\in\mathbb{N}$, is a sequence of dfs on $\mathbb{R}^d$, one easily checks that its limit $G(\cdot)$ is a df itself; see, for example, Reiss (1989, (2.2.19)). It is obvious that the df $G$ satisfies
$$ G^n(nx) = \exp\Bigl(-n\,\Bigl\|\frac{1}{nx}\Bigr\|_D\Bigr) = \exp\Bigl(-\Bigl\|\frac{1}{x}\Bigr\|_D\Bigr) = G(x), \qquad x>0\in\mathbb{R}^d, $$
and, thus,
$$ G^n(nx) = G(x), \qquad x\in\mathbb{R}^d,\ n\in\mathbb{N}, $$
which is the max-stability of $G$. Let the rv $\xi\in\mathbb{R}^d$ have df $G$. By keeping $x_i>0$ fixed and letting $x_j$ tend to infinity for $j\ne i$, we obtain the marginal distribution of $G$:

$$\begin{aligned}
G_i(x_i) &= P(\xi_i\le x_i)\\
&= \lim_{x_j\to\infty,\ j\ne i} P\bigl(\xi_i\le x_i,\ \xi_j\le x_j,\ j\ne i\bigr)\\
&= \lim_{x_j\to\infty,\ j\ne i} G(x)\\
&= \lim_{x_j\to\infty,\ j\ne i} \exp\Bigl(-\Bigl\|\Bigl(\frac{1}{x_1},\dots,\frac{1}{x_i},\dots,\frac{1}{x_d}\Bigr)\Bigr\|_D\Bigr)\\
&= \exp\Bigl(-\Bigl\|\Bigl(0,\dots,0,\frac{1}{x_i},0,\dots,0\Bigr)\Bigr\|_D\Bigr)\\
&= \exp\Bigl(-\frac{1}{x_i}\,\|e_i\|_D\Bigr)\\
&= \exp\Bigl(-\frac{1}{x_i}\Bigr)
\end{aligned}$$
by the fact that each D-norm is standardized. Each univariate marginal df $G_i$ of $G$ is, consequently, the unit Fréchet df
$$ G_i(x) = \exp\Bigl(-\frac{1}{x}\Bigr), \qquad x>0. $$
This proves that simple EVDs actually exist. We see later on in Theorem 2.3.4
that, actually, each simple EVD can be represented by means of a D-norm as
in (2.20).

Standard Multivariate Max-Stable Distributions


The simple max-stable df is the standard approach in the literature on mul-
tivariate extreme value analysis, but the standard max-stable df turns out to
be simpler.
Definition 2.3.2 A df $G$ on $\mathbb{R}^d$ is a multivariate standard max-stable (SMS) df, or standard EVD, iff it is max-stable in the sense of equation (2.18) and has standard negative exponential margins:
$$G_i(x)=\exp(x),\qquad x\le0,\ 1\le i\le d.$$

In what follows, we show that an SMS df $G$ exists as well. Let the rv $\xi\in\mathbb{R}^d$ follow a multivariate simple max-stable df as in (2.20), i.e., $P(\xi\le x)=\exp\bigl(-\|1/x\|_D\bigr)$, $x>0\in\mathbb{R}^d$. Put
$$\eta=-\frac1\xi=-\Bigl(\frac{1}{\xi_1},\dots,\frac{1}{\xi_d}\Bigr)$$
and note that $P(\xi_i\le0)=0$, $1\le i\le d$. Then, for $x<0\in\mathbb{R}^d$, we obtain
$$P(\eta\le x)=P\Bigl(-\frac{1}{\xi_i}\le x_i,\ 1\le i\le d\Bigr)=P\Bigl(\xi_i\le-\frac{1}{x_i},\ 1\le i\le d\Bigr)=P\Bigl(\xi\le-\frac1x\Bigr)=\exp(-\|x\|_D)=:G_D(x).$$
By putting
$$G_D(x):=\exp\bigl(-\|(\min(x_1,0),\dots,\min(x_d,0))\|_D\bigr)\tag{2.21}$$
for $x=(x_1,\dots,x_d)\in\mathbb{R}^d$, we obtain a df on $\mathbb{R}^d$, which is max-stable as well:
$$G_D^n\Bigl(\frac xn\Bigr)=G_D(x),\qquad x\in\mathbb{R}^d,\ n\in\mathbb{N}.$$
$G_D^n(\cdot/n)$ is the df of $n\max_{1\le i\le n}\eta^{(i)}$, where $\eta^{(1)},\eta^{(2)},\dots,\eta^{(n)}$ are independent copies of $\eta$.

Note that each univariate margin of $G_D$ is the standard negative exponential df:
$$P(\eta_i\le x)=P(\eta\le xe_i)=\exp(-\|xe_i\|_D)=\exp(-|x|\,\|e_i\|_D)=\exp(x),\qquad x\le0.$$
This shows that SMS dfs actually exist.
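In code, the passage from a simple to a standard max-stable observation is just this componentwise reciprocal; a two-line sketch reusing `max_V_over_n` from the snippet above:

```python
xi  = max_V_over_n()   # approximately simple max-stable (unit Fréchet margins)
eta = -1.0 / xi        # approximately SMS: P(eta_i <= x) ~ exp(x), x <= 0
```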

Characterization of SMS dfs

We are going to show that any SMS df or standard EVD can be represented as in (2.21), i.e., the theory of D-norms allows a mathematically elegant characterization of the family of SMS dfs, presented in the next result. It comes from results found in Balkema and Resnick (1977); de Haan and Resnick (1977); Pickands (1981), and Vatan (1985).

Theorem 2.3.3 A df $G$ on $\mathbb{R}^d$ is an SMS df iff there exists a D-norm $\|\cdot\|_D$ on $\mathbb{R}^d$ such that
$$G(x)=\exp(-\|x\|_D),\qquad x\le0\in\mathbb{R}^d.$$

Let the rv $\eta=(\eta_1,\dots,\eta_d)$ follow an SMS df $G(x)=\exp(-\|x\|_D)$, $x\le0\in\mathbb{R}^d$. Then, the components $\eta_1,\dots,\eta_d$ are independent iff the D-norm $\|\cdot\|_D$ is $\|\cdot\|_1$, whereas the components are completely dependent (i.e., $\eta_1=\dots=\eta_d$ a.s.) iff $\|\cdot\|_D=\|\cdot\|_\infty$. This is the reason why we call $\|\cdot\|_1$ the independence D-norm and $\|\cdot\|_\infty$ the complete dependence D-norm.
Proof. The implication "⇐" in the previous result is a straightforward consequence of the fact that the function $G(x):=\exp(-\|x\|_D)$, $x\le0\in\mathbb{R}^d$, is the limit of a sequence of dfs. Repeating the arguments in equation (2.19), one obtains, for $x\le0\in\mathbb{R}^d$,
$$\lim_{n\to\infty}P\Bigl(n\max_{1\le i\le n}W^{(i)}\le x\Bigr)=\lim_{n\to\infty}\Bigl(1-\frac{\|x\|_D}{n}\Bigr)^n=\exp(-\|x\|_D),\tag{2.22}$$
where $W^{(1)},W^{(2)},\dots$ are independent copies of the rv $W$, which follows a standard GPD as in (2.14). Note that for any D-norm on $\mathbb{R}^d$, there exists a bounded generator according to Theorem 1.7.1; the construction of the rv $W$ requires this condition. It is, therefore, readily seen that $G(x)=\exp(-\|x\|_D)$, $x\le0\in\mathbb{R}^d$, defines a multivariate df; see, for example, Reiss (1989, (2.2.19)).

Next, we prove the implication "⇒." Let $G$ be an arbitrary max-stable df with standard negative exponential margins, i.e., there exists a max-stable rv $\eta=(\eta_1,\dots,\eta_d)$ whose df is $G$. Then, with $c\in(0,1)$ and $\alpha:=1/c$,
$$\xi^{(c)}:=\bigl(\xi_1^{(c)},\dots,\xi_d^{(c)}\bigr):=\Bigl(\frac{1}{|\eta_1|^c},\dots,\frac{1}{|\eta_d|^c}\Bigr)$$
follows a max-stable df, say $G_c$, with Fréchet margins
$$P\bigl(\xi_i^{(c)}\le x\bigr)=\exp\Bigl(-\frac{1}{x^\alpha}\Bigr)=:F_\alpha(x),\qquad x>0,\ 1\le i\le d.$$
Its expectation is $\Gamma(1-c)=:\mu_c$, which can be seen by applying elementary rules of integration as follows; note that the density of $F_\alpha$ is $\exp(-x^{-\alpha})x^{-\alpha-1}\alpha$, $x>0$. We have
$$\begin{aligned}
E\Bigl(\frac{1}{|\eta_i|^c}\Bigr)&=\int_0^\infty x\exp(-x^{-\alpha})x^{-\alpha-1}\alpha\,dx
=\alpha\int_0^\infty x^{-\alpha}\exp(-x^{-\alpha})\,dx\\
&=\int_0^\infty x^{-1/\alpha}\exp(-x)\,dx
=\int_0^\infty x^{(1-1/\alpha)-1}\exp(-x)\,dx\\
&=\Gamma\Bigl(1-\frac1\alpha\Bigr)=\Gamma(1-c)
\end{aligned}\tag{2.23}$$
by proper substitution. The rv
$$Z^{(c)}:=\bigl(Z_1^{(c)},\dots,Z_d^{(c)}\bigr):=\frac{1}{\mu_c}\,\xi^{(c)}$$
now satisfies $Z_i^{(c)}\ge0$ and $E\bigl(Z_i^{(c)}\bigr)=1$, $1\le i\le d$, i.e., $Z^{(c)}$ is the generator of a D-norm, say $\|\cdot\|_{D_c}$.

The fact that $G$ is max-stable means that, for any $n\in\mathbb{N}$ and $x\in\mathbb{R}^d$,
$$G^n\Bigl(\frac xn\Bigr)=G(x)\quad\text{or}\quad G^n(x)=G(nx).$$
This implies that, for any $n,m\in\mathbb{N}$ and $x\in\mathbb{R}^d$,
$$G^n\Bigl(\frac mn x\Bigr)=G(mx)=G^m(x)\quad\text{or}\quad G^{n/m}\Bigl(\frac mn x\Bigr)=G(x).$$
Note that $G$ is continuous because each univariate margin is a continuous df; see, for example, Reiss (1989, Lemma 2.2.6). Letting $n$ and $m$ tend to infinity such that $n/m\to t>0$, we obtain by the continuity of $G$ that max-stability of $G$ is equivalent to
$$G^t\Bigl(\frac xt\Bigr)=G(x)\quad\text{or}\quad G(tx)=G^t(x)$$
for any $t>0$, $x\in\mathbb{R}^d$. The final equation clearly implies that $G_c$ satisfies
$$G_c(tx)=G_c(x)^{1/t^\alpha},\qquad t>0,\ x\in\mathbb{R}^d.$$

In the following proof, we use the fact that $G_c(x)>0$ for $x>0$. Otherwise, the max-stability of $G_c$ would imply
$$0=G_c(x)=G_c(x)^{1/t^\alpha}=G_c(tx)$$
for some $x>0\in\mathbb{R}^d$ and each $t>0$; letting $t$ converge to infinity would obviously produce a contradiction. Using Lemma 1.2.2, we obtain, for $x>0\in\mathbb{R}^d$,
$$\begin{aligned}
\|x\|_{D_c}&=E\Bigl(\max_{1\le i\le d}x_iZ_i^{(c)}\Bigr)
=\frac{1}{\mu_c}\int_0^\infty P\Bigl(\max_{1\le i\le d}x_i\xi_i^{(c)}>t\Bigr)dt\\
&=\frac{1}{\mu_c}\int_0^\infty 1-G_c\Bigl(\frac1x\,t\Bigr)dt
=\frac{1}{\mu_c}\int_0^\infty 1-G_c\Bigl(\frac1x\Bigr)^{1/t^\alpha}dt\\
&=\frac{1}{\mu_c}\int_0^\infty 1-\exp\Bigl(\frac{1}{t^\alpha}\log G_c\Bigl(\frac1x\Bigr)\Bigr)dt\\
&=\Bigl(-\log G_c\Bigl(\frac1x\Bigr)\Bigr)^{1/\alpha}\,\frac{1}{\mu_c}\int_0^\infty 1-\exp\Bigl(-\frac{1}{t^\alpha}\Bigr)dt
=\Bigl(-\log G_c\Bigl(\frac1x\Bigr)\Bigr)^{1/\alpha}
\end{aligned}$$
by the substitution $t\mapsto\bigl(-\log(G_c(1/x))\bigr)^{1/\alpha}t$; note that $\int_0^\infty 1-\exp(-1/t^\alpha)\,dt=\mu_c$. Observe that, for $x>0\in\mathbb{R}^d$,
$$G_c\Bigl(\frac1x\Bigr)=G(-x^\alpha);$$
thus, we have for $x\in\mathbb{R}^d$
$$\|x\|_{D_c}=\bigl(-\log\bigl(G\bigl(-|x|^\alpha\bigr)\bigr)\bigr)^{1/\alpha},\tag{2.24}$$
where $|x|^\alpha$ is also meant componentwise. This yields
$$\lim_{c\to1}\|x\|_{D_c}=-\log(G(-|x|)).$$
As the pointwise limit of a sequence of D-norms is a D-norm by Corollary 1.8.5, we have proved that
$$\|x\|_D:=-\log(G(-|x|)),\qquad x\in\mathbb{R}^d,$$
defines a D-norm on $\mathbb{R}^d$, or
$$G(-|x|)=\exp(-\|x\|_D),\qquad x\in\mathbb{R}^d,$$
which completes the proof.

Characterization of an Arbitrary Max-Stable Distribution

Any multivariate max-stable distribution can be represented by a D-norm together with transformations of the univariate margins.
Theorem 2.3.4 Any multivariate max-stable df $G_{(\alpha_1,\dots,\alpha_d)}$ with univariate margins $G_{\alpha_1},\dots,G_{\alpha_d}$ can be represented as
$$G_{(\alpha_1,\dots,\alpha_d)}(x)=G\bigl(\psi_{\alpha_1}(x_1),\dots,\psi_{\alpha_d}(x_d)\bigr)=\exp\bigl(-\|(\psi_{\alpha_1}(x_1),\dots,\psi_{\alpha_d}(x_d))\|_D\bigr),\qquad x\in\mathbb{R}^d,$$
where $G(x)=\exp(-\|x\|_D)$, $x\le0\in\mathbb{R}^d$, is a standard EVD and
$$\psi_{\alpha_i}(x)=\log(G_{\alpha_i}(x)),\qquad 0<G_{\alpha_i}(x)<1,\ 1\le i\le d,$$
$$=\begin{cases}-(-x)^{\alpha_i}, & x<0,\ \text{if }\alpha_i>0,\\ -x^{\alpha_i}, & x>0,\ \text{if }\alpha_i<0,\\ -\exp(-x), & x\in\mathbb{R},\ \text{if }\alpha_i=0.\end{cases}$$

Note that $\psi_{\alpha_i}$ is a strictly monotone increasing and continuous function, whose range is $(-\infty,0)$. The reverse implication in the preceding result is true as well. Possible location and scale shifts in each component can clearly be incorporated.

Theorem 2.3.4 is an obvious consequence of Theorem 2.3.3 and the next lemma.
Lemma 2.3.5 Suppose that the rv $X=(X_1,\dots,X_d)$ has df $G_{(\alpha_1,\dots,\alpha_d)}$. Put
$$\eta_i:=\psi_{\alpha_i}(X_i),\qquad 1\le i\le d.$$
Then $\eta=(\eta_1,\dots,\eta_d)$ follows the SMS df $G_{(1,\dots,1)}$.

On the other hand, suppose that the rv $\eta=(\eta_1,\dots,\eta_d)$ follows the SMS df $G_{(1,\dots,1)}$. Then
$$X=(X_1,\dots,X_d)=\bigl(\psi_{\alpha_1}^{-1}(\eta_1),\dots,\psi_{\alpha_d}^{-1}(\eta_d)\bigr)$$
has df $G_{(\alpha_1,\dots,\alpha_d)}$.

Proof. We establish the first part; the second part is obvious. Since $G_{\alpha_i}(X_i)$ is uniformly distributed on $(0,1)$, it is clear that $\eta_i=\log(G_{\alpha_i}(X_i))$ has df $\exp(x)=G_1(x)$, $x<0$. It remains to show that the df of the rv $\eta$, say $H$, is max-stable. But this follows from the fact that $G_{(\alpha_1,\dots,\alpha_d)}$ is max-stable with $G_{\alpha_i}^n\bigl(\psi_{\alpha_i}^{-1}(x/n)\bigr)=G_{\alpha_i}\bigl(\psi_{\alpha_i}^{-1}(x)\bigr)$: for $x_i<0$, $1\le i\le d$, we have
$$\begin{aligned}
H(x_1/n,\dots,x_d/n)^n&=P(\eta_i\le x_i/n,\ 1\le i\le d)^n\\
&=P\bigl(X_i\le\psi_{\alpha_i}^{-1}(x_i/n),\ 1\le i\le d\bigr)^n\\
&=P\bigl(X_i\le\psi_{\alpha_i}^{-1}(x_i),\ 1\le i\le d\bigr)\\
&=H(x_1,\dots,x_d).
\end{aligned}$$

Lemma 2.3.5 can also be formulated as
$$G_{(\alpha_1,\dots,\alpha_d)}\bigl(\psi_{\alpha_1}^{-1}(x_1),\dots,\psi_{\alpha_d}^{-1}(x_d)\bigr)=G_{(1,\dots,1)}(x_1,\dots,x_d),$$
for $x_i<0$, $i\le d$. The max-stability of $G_{(\alpha_1,\dots,\alpha_d)}$ is therefore preserved by transforming each margin onto the standard negative exponential distribution.

Example 2.3.6 With the bivariate D-norm $\|(x,y)\|_{D_{s,t}}$ as defined in Lemma 1.10.6, we obtain from Theorem 2.3.3 that
$$G(x,y):=\exp\bigl(-\|(x,y)\|_{D_{s,t}}\bigr),\qquad x,y\le0,$$
defines a bivariate SMS df. Let the rv $\eta=(\eta_1,\eta_2)$ follow this df. Then, by Lemma 2.3.5, the rv
$$X=(X_1,X_2):=\bigl(\psi_0^{-1}(\eta_1),\psi_0^{-1}(\eta_2)\bigr)$$
follows a bivariate max-stable df with Gumbel margins. Precisely, we have
$$\begin{aligned}
P(X_1\le x,X_2\le y)&=P(\eta_1\le\psi_0(x),\,\eta_2\le\psi_0(y))\\
&=\exp\bigl(-\|(\psi_0(x),\psi_0(y))\|_{D_{s,t}}\bigr)\\
&=\exp\bigl(-\|(\exp(-x),\exp(-y))\|_{D_{s,t}}\bigr)\\
&=\exp\Bigl(-\exp(-x)\,\Phi\Bigl(\frac{\sqrt{t-s}}{2}+\frac{y-x}{\sqrt{t-s}}\Bigr)-\exp(-y)\,\Phi\Bigl(\frac{\sqrt{t-s}}{2}+\frac{x-y}{\sqrt{t-s}}\Bigr)\Bigr)\\
&=G_{(0,0)}(x,y),\qquad x,y\in\mathbb{R},
\end{aligned}$$
which is the bivariate Hüsler–Reiss distribution, with parameter $\lambda=\sqrt{t-s}/2$; see, for example, Falk et al. (2011, Example 4.1.4).
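A small sketch evaluating this bivariate Hüsler–Reiss df in the $\lambda$-parametrization (so $\sqrt{t-s}=2\lambda$); `norm_cdf` is a helper built from `math.erf`, and the function names are ours:

```python
import math

def norm_cdf(z: float) -> float:          # standard normal df Phi
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def husler_reiss(x: float, y: float, lam: float) -> float:
    """Bivariate Hüsler-Reiss df with Gumbel margins, parameter lam > 0."""
    a = math.exp(-x) * norm_cdf(lam + (y - x) / (2.0 * lam))
    b = math.exp(-y) * norm_cdf(lam + (x - y) / (2.0 * lam))
    return math.exp(-(a + b))

# lam -> 0 approaches complete dependence, lam -> infinity independence
print(husler_reiss(1.0, 2.0, 0.5))
```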

Min-Stable Distributions

Let $X^{(1)},X^{(2)},\dots$ be independent copies of the rv $X$ on $\mathbb{R}^d$. The rv $X$ (or its distribution) is called min-stable if there are constants $a_n>0\in\mathbb{R}^d$, $b_n\in\mathbb{R}^d$, $n\in\mathbb{N}$, such that, for each $n\in\mathbb{N}$,
$$P\Bigl(\frac{\min_{1\le i\le n}X^{(i)}+b_n}{a_n}\ge x\Bigr)=P(X\ge x),\qquad x\in\mathbb{R}^d.\tag{2.25}$$
Multiplying both sides by $-1$, equation (2.25) becomes
$$P\Bigl(\frac{\max_{1\le i\le n}\bigl(-X^{(i)}\bigr)-b_n}{a_n}\le-x\Bigr)=P(-X\le-x),\qquad x\in\mathbb{R}^d,\tag{2.26}$$
i.e., the rv $X$ is min-stable iff $-X$ is max-stable. As a consequence of Theorem 2.3.4, we obtain the representation
$$P(X\ge x)=P(-X\le-x)=\exp\bigl(-\|(\psi_{\alpha_1}(-x_1),\dots,\psi_{\alpha_d}(-x_d))\|_D\bigr)\tag{2.27}$$
with some D-norm $\|\cdot\|_D$ on $\mathbb{R}^d$ and $\alpha_1,\dots,\alpha_d\in\mathbb{R}$. If, in particular, $\alpha_1=\dots=\alpha_d=1$, then we obtain
$$P(X\ge x)=\exp(-\|x\|_D),\qquad x\ge0\in\mathbb{R}^d,$$
which has standard exponential margins $P(X_i\ge x)=\exp(-x)$, $x\ge0$. Representation (2.27) provides the complete family of min-stable distributions, in arbitrary dimension $d\in\mathbb{N}$.

Takahashi Revisited
We can now present the original version of Takahashi's characterizations in terms of multivariate max-stable dfs. In what follows, let the rv $\eta=(\eta_1,\dots,\eta_d)$ have the SMS df
$$P(\eta\le x)=G(x)=\exp(-\|x\|_D),\qquad x\le0\in\mathbb{R}^d,$$
with an arbitrary D-norm $\|\cdot\|_D$ on $\mathbb{R}^d$. Theorem 1.3.1 can now be formulated as follows.

Theorem 2.3.7 With $\eta$ as above, we have the equivalences

(i) $\eta_1,\dots,\eta_d$ are independent
$$\iff\ \exists\,y<0\in\mathbb{R}^d:\quad P(\eta_i\le y_i,\ 1\le i\le d)=\prod_{i=1}^dP(\eta_i\le y_i).$$

(ii) $\eta_1=\eta_2=\dots=\eta_d$ a.s.
$$\iff\ P(\eta_1\le-1,\,\eta_2\le-1,\dots,\eta_d\le-1)=P(\eta_1\le-1).$$

Proof. If $\eta_1,\dots,\eta_d$ are independent, its df can be written explicitly:
$$P(\eta_1\le x_1,\dots,\eta_d\le x_d)=\prod_{i=1}^dP(\eta_i\le x_i)=\prod_{i=1}^d\exp(x_i)=\exp\Bigl(\sum_{i=1}^dx_i\Bigr)=\exp(-\|x\|_1),\qquad x\le0\in\mathbb{R}^d,$$
i.e., the D-norm corresponding to $\eta=(\eta_1,\dots,\eta_d)$ is $\|\cdot\|_1$. If $\eta_1=\eta_2=\dots=\eta_d$ a.s., then
$$P(\eta_1\le x_1,\dots,\eta_d\le x_d)=P(\eta_1\le\min(x_1,\dots,x_d))=\exp(\min(x_1,\dots,x_d))=\exp(-\|x\|_\infty),\qquad x\le0\in\mathbb{R}^d.$$
The assertion now follows from Theorem 1.3.1.

The next characterization is an immediate consequence of Theorem 1.3.4. Note that, for arbitrary $1\le i<j\le d$,
$$P(\eta_i\le-1,\,\eta_j\le-1)=P(\eta_i\le-1,\,\eta_j\le-1,\,\eta_k\le0,\ k\notin\{i,j\})=\exp\bigl(-\|e_i+e_j\|_D\bigr).$$
Part (ii) in the next result is obviously trivial. We list it for the sake of completeness.

Theorem 2.3.8 With $\eta$ as above, we have the equivalences

(i) $\eta_1,\dots,\eta_d$ are independent iff $\eta_1,\dots,\eta_d$ are pairwise independent.

(ii) $\eta_1=\eta_2=\dots=\eta_d$ a.s. iff $\eta_1,\dots,\eta_d$ are pairwise completely dependent.

According to Theorem 2.3.4, the distribution of an arbitrary d-variate max-stable rv can be represented by means of an SMS rv $\eta$ together with a proper non-random transformation of each component $\eta_i$, $1\le i\le d$. The preceding characterizations, therefore, carry over to an arbitrary multivariate max-stable rv.

Note that pairwise independence of rvs in general does not imply complete independence. Take, for example, an rv $U$ that realizes in the set $\{1,2,3,4\}$ and attains each element with equal probability $1/4$. Put $X_1:=1_{\{1,2\}}(U)$, $X_2:=1_{\{1,3\}}(U)$, $X_3:=1_{\{2,3\}}(U)$. Then, $X_1,X_2,X_3$ are pairwise independent, but they are not completely independent:
$$P(X_1=1,\,X_2=1,\,X_3=1)=0\ne P(X_1=1)P(X_2=1)P(X_3=1)=\frac18.$$
8

Pickands Dependence Function

Take an arbitrary D-norm on $\mathbb{R}^d$. Obviously, for $x\ne0\in\mathbb{R}^d$, we can write
$$\|x\|_D=\|x\|_1\,\Bigl\|\frac{x}{\|x\|_1}\Bigr\|_D=:\|x\|_1\,A\Bigl(\frac{x}{\|x\|_1}\Bigr),$$
where $A(\cdot)$ is a function on the unit sphere $S=\bigl\{y\in\mathbb{R}^d:\|y\|_1=1\bigr\}$. It is evident that it suffices to define the function $A(\cdot)$ on $S_+:=\bigl\{u\ge0\in\mathbb{R}^{d-1}:\sum_{i=1}^{d-1}u_i\le1\bigr\}$ by putting
$$A(u):=\Bigl\|\Bigl(u_1,\dots,u_{d-1},\,1-\sum_{i=1}^{d-1}u_i\Bigr)\Bigr\|_D.$$
The function $A(\cdot)$ is known as the Pickands dependence function, and, according to Theorem 2.3.3, we can represent any SMS df $G$ as
$$G(x)=\exp(-\|x\|_D)=\exp\Bigl(\Bigl(\sum_{i=1}^dx_i\Bigr)A\Bigl(\frac{x_1}{\sum_{i=1}^dx_i},\dots,\frac{x_{d-1}}{\sum_{i=1}^dx_i}\Bigr)\Bigr).$$
An arbitrary max-stable df can be represented correspondingly.

In particular, in the case $d=2$, we obtain, for $u\in[0,1]$,
$$A(u)=\|(u,1-u)\|_D=E\bigl(\max(uZ_1,(1-u)Z_2)\bigr),$$
with $A(0)=A(1)=1$ and $\max(u,1-u)\le A(u)\le u+(1-u)=1$. According to the normed generators theorem, there exists a generator $(Z_1,Z_2)$ of $\|\cdot\|_D$ with $Z_2=2-Z_1$; see Corollary 1.7.2. This enables a more refined analysis of the function $A(\cdot)$ in the bivariate case. For a further investigation we refer to Falk et al. (2011, Chapter 6).
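For the logistic D-norm $\|\cdot\|_p$ in dimension two, the Pickands function has the closed form $A(u)=(u^p+(1-u)^p)^{1/p}$. A Monte Carlo sketch, assuming the Fréchet-based generator of Proposition 1.2.1 ($Z_i=X_i/\Gamma(1-1/p)$ with $X_i$ iid Fréchet($p$)), compares the two:

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(1)
p, u, n = 3.0, 0.3, 1_000_000

# Fréchet(p) via inverse-cdf sampling, normalized to expectation one
X = (-np.log(rng.uniform(size=(n, 2)))) ** (-1.0 / p)
Z = X / gamma(1.0 - 1.0 / p)

A_mc = np.mean(np.maximum(u * Z[:, 0], (1 - u) * Z[:, 1]))
print(A_mc, (u**p + (1 - u)**p) ** (1.0 / p))   # Monte Carlo vs. closed form
```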

The Extremal Coefficient

To measure the dependence among the univariate margins by just one number, Smith (1990) introduced the extremal coefficient as that constant $\varepsilon>0$ which satisfies
$$G^*(x,\dots,x)=H^\varepsilon(x),\qquad x\in\mathbb{R},$$
where $G^*$ is an arbitrary d-dimensional max-stable df with identical univariate margins $G_1^*=\dots=G_d^*=:H$. If we have independence of the margins, then $\varepsilon=d$, and if $\varepsilon=1$, we have complete dependence.

Two questions naturally occur: Can we characterize this $\varepsilon$? Does it exist at all?

According to Lemma 2.3.5, we can transform wlog the margins of $G^*$ to the standard negative exponential distribution $\exp(x)$, $x\le0$, thus obtaining an SMS df $G$. For this, we have according to Theorem 2.3.3 the representation $G(x)=\exp(-\|x\|_D)$, $x\le0\in\mathbb{R}^d$, with some D-norm $\|\cdot\|_D$ on $\mathbb{R}^d$. As an immediate consequence, we obtain
$$G(x,\dots,x)=\exp(-\|(x,\dots,x)\|_D)=\exp(x\,\|\mathbf{1}\|_D)=\exp(x)^{\|\mathbf{1}\|_D},\qquad x\le0,$$
yielding
$$\varepsilon=\|\mathbf{1}\|_D\in[1,d],\tag{2.28}$$
according to the general inequalities $\|\cdot\|_\infty\le\|\cdot\|_D\le\|\cdot\|_1$ in (1.4). The extremal coefficient is, therefore, the D-norm of the vector $\mathbf{1}$.

For the family of logistic D-norms $\|x\|_p=\bigl(\sum_{i=1}^d|x_i|^p\bigr)^{1/p}$, $p\in[1,\infty]$, we obtain, for example,
$$\|\mathbf{1}\|_p=\begin{cases}d, & \text{if }p=1,\\ d^{1/p}, & \text{if }p\in(1,\infty),\\ 1, & \text{if }p=\infty.\end{cases}$$

From Takahashi's characterization in Corollary 1.3.2, we already know that for an arbitrary D-norm $\|\cdot\|_D$
$$\|\cdot\|_D=\begin{cases}\|\cdot\|_\infty\\ \|\cdot\|_1\end{cases}\iff\|\mathbf{1}\|_D=\begin{cases}1,\\ d.\end{cases}$$

Suppose a df $F$ is in the domain of attraction of an arbitrary multivariate EVD $G^*$ as defined in (3.1). The df $G^*$ has corresponding D-norm $\|\cdot\|_D$ by Theorem 2.3.4; thus, $\varepsilon=\|\mathbf{1}\|_D$ is a measure of the tail dependence of $F$. A refined tail dependence coefficient, which measures the dependence between tail independent margins, is defined in equation (5.21).
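As a quick numerical illustration (assuming the same Fréchet-based generator sketched earlier for the logistic D-norm), the extremal coefficient $\varepsilon=\|\mathbf{1}\|_D=E(\max_{1\le i\le d}Z_i)$ can be estimated by Monte Carlo and checked against the closed form $d^{1/p}$:

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(2)
d, p, n = 4, 3.0, 1_000_000
Z = (-np.log(rng.uniform(size=(n, d)))) ** (-1.0 / p) / gamma(1.0 - 1.0 / p)
print(Z.max(axis=1).mean(), d ** (1.0 / p))   # E(max_i Z_i) vs. d^(1/p)
```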

2.4 How to Generate Max-Stable rvs


In this section, we show how a max-stable rv can be generated by means of
independent copies of a generator of the corresponding D-norm. A well-known
representation of order statistics from the uniform distribution on (0, 1) will
be a crucial tool. As an application, we obtain a sharp lower bound for the
survival function of an SMS rv.

A Crucial Representation of Order Statistics


In what follows, we denote by Z (1) , Z (2) , . . . independent copies of a gen-
erator Z of a D-norm ·D on Rd . Let U1 , U2 , . . . be independent and on
(0, 1) uniformly distributed rvs, which are also independent of the sequence
Z (1) , Z (2) , . . .
Denote by
$$U_{1:n}\le U_{2:n}\le\dots\le U_{n:n}$$
the ordered values of $U_1,\dots,U_n$, $n\in\mathbb{N}$, or order statistics, for short. It is well known that
$$(U_{i:n})_{i=1}^n=_D\Bigl(\frac{\sum_{k=1}^iE_k}{\sum_{k=1}^{n+1}E_k}\Bigr)_{i=1}^n,\tag{2.29}$$
where $E_1,E_2,\dots$ are iid standard exponential rvs; see, for example, Reiss (1989, Corollary 1.6.9). In what follows, we suppose that the sequence $E_1,E_2,\dots$ is independent of the sequence $Z^{(1)},Z^{(2)},\dots$ as well.

Generation of Standard Max-Stable rvs

The next result shows how to generate an rv with df $G(x)=\exp(-\|x\|_D)$, $x\le0\in\mathbb{R}^d$, by means of the two sequences $E_1,E_2,\dots$ and $Z^{(1)},Z^{(2)},\dots$

Proposition 2.4.1 Put $V_i:=1/\sum_{k=1}^iE_k$, $i\in\mathbb{N}$. Then, the rv
$$\eta:=-\frac{1}{\sup_{i\in\mathbb{N}}\bigl(V_iZ^{(i)}\bigr)}$$
has the df $G(x)=\exp(-\|x\|_D)$, $x\le0\in\mathbb{R}^d$.

Clearly, a simple max-stable rv $\xi$ with df $P(\xi\le x)=\exp\bigl(-\|1/x\|_D\bigr)$, $x>0\in\mathbb{R}^d$, is generated by putting
$$\xi:=-\frac1\eta=\sup_{i\in\mathbb{N}}\bigl(V_iZ^{(i)}\bigr).$$

Proof. Adopting the arguments in (2.11) and (2.19), we obtain, for $x>0\in\mathbb{R}^d$, even without the assumption that $Z$ is bounded,
$$P\Bigl(\frac1n\max_{1\le i\le n}\frac{1}{U_i}Z^{(i)}\le x\Bigr)\to_{n\to\infty}\exp\Bigl(-\Bigl\|\frac1x\Bigr\|_D\Bigr)=P(\xi\le x),\tag{2.30}$$
where $\xi$ is a simple max-stable rv.

Clearly, for each $n\in\mathbb{N}$, we have the equality
$$P\Bigl(\frac1n\max_{1\le i\le n}\frac{1}{U_i}Z^{(i)}\le x\Bigr)=P\Bigl(\frac1n\max_{1\le i\le n}\frac{1}{U_{i:n}}Z^{(i)}\le x\Bigr),$$
owing to the independence of the sequences $U_1,U_2,\dots$ and $Z^{(1)},Z^{(2)},\dots$ From representation (2.29), we obtain
$$P\Bigl(\frac1n\max_{1\le i\le n}\frac{1}{U_{i:n}}Z^{(i)}\le x\Bigr)=P\Bigl(\frac{\sum_{k=1}^{n+1}E_k}{n}\,\max_{1\le i\le n}\Bigl(\frac{1}{\sum_{k=1}^iE_k}Z^{(i)}\Bigr)\le x\Bigr).$$
The law of large numbers implies $\sum_{k=1}^{n+1}E_k/n\to_{n\to\infty}1$ a.s. Moreover,
$$\max_{1\le i\le n}\Bigl(\frac{1}{\sum_{k=1}^iE_k}Z^{(i)}\Bigr)\to_{n\to\infty}\sup_{i\in\mathbb{N}}\Bigl(\frac{1}{\sum_{k=1}^iE_k}Z^{(i)}\Bigr)=\sup_{i\in\mathbb{N}}\bigl(V_iZ^{(i)}\bigr)=:\xi,$$
where we know from equation (2.30) that $\xi$ has the df
$$P(\xi\le x)=\exp\Bigl(-\Bigl\|\frac1x\Bigr\|_D\Bigr),\qquad x>0\in\mathbb{R}^d.$$
Putting
$$\eta:=-\frac1\xi$$
completes the proof.
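A sketch of this construction with the infinite supremum truncated at $m$ terms; since $V_i$ decays like $1/i$, the truncation error is negligible for bounded generators and moderate $m$. Here `gen_Z` is a hypothetical callable returning a $(k,d)$ array of independent generator copies:

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_sms(gen_Z, m: int = 10_000) -> np.ndarray:
    V = 1.0 / np.cumsum(rng.exponential(size=m))     # V_i = 1/(E_1+...+E_i)
    xi = (V[:, None] * gen_Z(m)).max(axis=0)         # sup_i V_i Z^(i), truncated
    return -1.0 / xi                                 # approx. df exp(-||x||_D)
```

For instance, `gen_Z = lambda k: np.full((k, 2), 1.0)`, the constant generator of $\|\cdot\|_\infty$, yields completely dependent margins.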

Survival Probability of Standard Max-Stable rvs


The representation of an SMS rv in Proposition 2.4.1 enables the derivation
of a lower bound for its survival probability.
Lemma 2.4.2 Let $\eta$ be an SMS rv on $\mathbb{R}^d$ with corresponding D-norm $\|\cdot\|_D$, generated by $Z$; recall the dual D-norm function $⦀x⦀_D=E\bigl(\min_{1\le i\le d}(|x_i|Z_i)\bigr)$. We have, for $x<0\in\mathbb{R}^d$,

(i) $P(\eta>x)\ge1-\exp(-⦀x⦀_D)$,

(ii) $\displaystyle\lim_{s\downarrow0}\frac{P(\eta>sx)}{s}=⦀x⦀_D$.

Proof. We suppose the representation
$$\eta=-\frac{1}{\sup_{i\in\mathbb{N}}\bigl(V_iZ^{(i)}\bigr)}$$
from Proposition 2.4.1. Using the notation in its proof, we obtain
$$P(\eta>x)=P\Bigl(\sup_{i\in\mathbb{N}}\bigl(V_iZ^{(i)}\bigr)>\frac{1}{|x|}\Bigr)=1-P\Bigl(\bigcup_{j=1}^d\bigcap_{i\in\mathbb{N}}\Bigl\{V_iZ_j^{(i)}\le\frac{1}{|x_j|}\Bigr\}\Bigr)$$
with
$$P\Bigl(\bigcup_{j=1}^d\bigcap_{i\in\mathbb{N}}\Bigl\{V_iZ_j^{(i)}\le\frac{1}{|x_j|}\Bigr\}\Bigr)$$

(here comes the crucial inequality)
$$\le P\Bigl(\bigcap_{i\in\mathbb{N}}\bigcup_{j=1}^d\Bigl\{V_iZ_j^{(i)}\le\frac{1}{|x_j|}\Bigr\}\Bigr)$$

(continuity from above of probability measures implies)
$$\begin{aligned}
&=\lim_{n\to\infty}P\Bigl(\bigcap_{i=1}^n\bigcup_{j=1}^d\Bigl\{V_iZ_j^{(i)}\le\frac{1}{|x_j|}\Bigr\}\Bigr)\\
&=\lim_{n\to\infty}P\Bigl(\bigcap_{i=1}^n\bigcup_{j=1}^d\Bigl\{\sum_{k=1}^iE_k\ge|x_j|\,Z_j^{(i)}\Bigr\}\Bigr)\\
&=\lim_{n\to\infty}P\Bigl(\bigcap_{i=1}^n\bigcup_{j=1}^d\Bigl\{\frac{\sum_{k=1}^iE_k}{\sum_{k=1}^{n+1}E_k}\ge\frac{|x_j|\,Z_j^{(i)}}{n\bigl(\sum_{k=1}^{n+1}E_k/n\bigr)}\Bigr\}\Bigr)
\end{aligned}$$

(the law of large numbers implies)
$$\begin{aligned}
&=\lim_{n\to\infty}P\Bigl(\bigcap_{i=1}^n\bigcup_{j=1}^d\Bigl\{\frac{\sum_{k=1}^iE_k}{\sum_{k=1}^{n+1}E_k}\ge\frac{|x_j|\,Z_j^{(i)}}{n}\Bigr\}\Bigr)\\
&=\lim_{n\to\infty}P\Bigl(\bigcap_{i=1}^n\bigcup_{j=1}^d\Bigl\{U_{i:n}\ge\frac{|x_j|\,Z_j^{(i)}}{n}\Bigr\}\Bigr)
\end{aligned}$$

(by representation (2.29))
$$\begin{aligned}
&=\lim_{n\to\infty}P\Bigl(\bigcap_{i=1}^n\Bigl\{U_{i:n}\ge\frac1n\min_{1\le j\le d}\bigl(|x_j|\,Z_j^{(i)}\bigr)\Bigr\}\Bigr)\\
&=\lim_{n\to\infty}P\Bigl(\bigcap_{i=1}^n\Bigl\{U_i\ge\frac1n\min_{1\le j\le d}\bigl(|x_j|\,Z_j^{(i)}\bigr)\Bigr\}\Bigr)
\end{aligned}$$

(by the independence of the sequences $U_1,U_2,\dots$ and $Z^{(1)},Z^{(2)},\dots$)
$$=\lim_{n\to\infty}P\Bigl(U\ge\frac1n\min_{1\le j\le d}(|x_j|\,Z_j)\Bigr)^n$$

(where $U$ is uniformly distributed on $(0,1)$ and independent of $Z$)
$$=\lim_{n\to\infty}\Bigl(1-\frac1nE\Bigl(\min_{1\le j\le d}(|x_j|\,Z_j)\Bigr)+o\Bigl(\frac1n\Bigr)\Bigr)^n=\exp\Bigl(-E\Bigl(\min_{1\le j\le d}(|x_j|\,Z_j)\Bigr)\Bigr)=\exp(-⦀x⦀_D),$$
which is part (i).

Part (ii) follows from the inclusion–exclusion principle (see Corollary 1.6.2), together with (1.10):
$$\begin{aligned}
P(\eta>sx)&=1-P\Bigl(\bigcup_{i=1}^d\{\eta_i\le sx_i\}\Bigr)\\
&=1-\sum_{\emptyset\ne T\subset\{1,\dots,d\}}(-1)^{|T|-1}P(\eta_i\le sx_i,\ i\in T)\\
&=\sum_{\emptyset\ne T\subset\{1,\dots,d\}}(-1)^{|T|-1}\bigl(1-P(\eta_i\le sx_i,\ i\in T)\bigr).
\end{aligned}$$
But
$$\begin{aligned}
1-P(\eta_i\le sx_i,\ i\in T)&=1-\exp\Bigl(-E\Bigl(\max_{i\in T}(s|x_i|\,Z_i)\Bigr)\Bigr)\\
&=1-\exp\Bigl(-sE\Bigl(\max_{i\in T}(|x_i|\,Z_i)\Bigr)\Bigr)\\
&=sE\Bigl(\max_{i\in T}(|x_i|\,Z_i)\Bigr)+o(s)
\end{aligned}$$
according to the Taylor expansion $\exp(\varepsilon)=1+\varepsilon+o(\varepsilon)$ as $\varepsilon\to0$; thus,
$$\begin{aligned}
P(\eta>sx)&=s\sum_{\emptyset\ne T\subset\{1,\dots,d\}}(-1)^{|T|-1}E\Bigl(\max_{i\in T}(|x_i|\,Z_i)\Bigr)+o(s)\\
&=sE\Bigl(\sum_{\emptyset\ne T\subset\{1,\dots,d\}}(-1)^{|T|-1}\max_{i\in T}(|x_i|\,Z_i)\Bigr)+o(s)\\
&=sE\Bigl(\min_{1\le i\le d}(|x_i|\,Z_i)\Bigr)+o(s)=s\,⦀x⦀_D+o(s)
\end{aligned}$$
according to Lemma 1.6.1. This completes the proof of Lemma 2.4.2.


2.5 Covariances, Range, etc. of Standard Max-Stable rvs


Let η = (η1 , . . . , ηd ) be an SMS rv. In this section, we compute the covariance
Cov(ηi , ηj ) between its components, their L1 -distance E(|ηi − ηj |) as well as
the expected range E (max1≤i≤d ηi − min1≤i≤d ηi ). The latter is particularly
useful within the framework of functional extreme value theory.

The Covariance of a Bivariate SMS rv


The D-norm approach offers appealing representations of both the covariance of the components of a bivariate SMS rv and their $L_1$-distance. Clearly, this result immediately carries over to the components $\eta_i,\eta_j$ of an arbitrary SMS rv $\eta=(\eta_1,\dots,\eta_d)$ on $\mathbb{R}^d$ with D-norm $\|\cdot\|_D$. In this case, the D-norm corresponding to the bivariate SMS rv $(\eta_i,\eta_j)$, $i<j$, is given by
$$\|(x,y)\|_{D_{ij}}=E(\max(|x|\,Z_i,|y|\,Z_j))=\|xe_i+ye_j\|_D,\qquad x,y\in\mathbb{R},$$
where $Z=(Z_1,\dots,Z_d)$ is a generator of $\|\cdot\|_D$.

Lemma 2.5.1 Let $\eta=(\eta_1,\eta_2)$ follow a bivariate SMS df. According to Theorem 2.3.3 there exists a D-norm $\|\cdot\|_D$ on $\mathbb{R}^2$ with $P(\eta\le x)=\exp(-\|x\|_D)$, $x\le0\in\mathbb{R}^2$. Then,
$$E(\eta_1\eta_2)=\int_0^\infty\frac{1}{\|(1,t)\|_D^2}\,dt.$$
As $E(\eta_1)=E(\eta_2)=-1$, $\operatorname{Var}(\eta_1)=\operatorname{Var}(\eta_2)=1$, the covariance and the correlation coefficient of $\eta_1,\eta_2$ are consequently given by
$$\operatorname{Cov}(\eta_1,\eta_2)=\int_0^\infty\frac{1}{\|(1,t)\|_D^2}\,dt-1=\rho(\eta_1,\eta_2).$$

The proof of the preceding lemma is based on an ingenious representation


of general covariances, called Hoeffding’s identity. It shows in particular that,
in a certain sense, the covariance is a measure of dependence. For the sake of
completeness, we state it explicitly, along with a proof.
Lemma 2.5.2 (Hoeffding's Identity) Let $X,Y$ be square integrable rvs on $\mathbb{R}$. Then,
$$\operatorname{Cov}(X,Y)=E(XY)-E(X)E(Y)=\int_{-\infty}^\infty\int_{-\infty}^\infty P(X\le x,Y\le y)-P(X\le x)P(Y\le y)\,dx\,dy.$$

Proof. Let $(X_1,Y_1),(X_2,Y_2)$ be independent copies of $(X,Y)$. Then,
$$E((X_1-X_2)(Y_1-Y_2))=2\operatorname{Cov}(X,Y).$$
We can write $(X_1-X_2)(Y_1-Y_2)$ as the product of two integrals:
$$\begin{aligned}
&(X_1-X_2)(Y_1-Y_2)\\
&=\int_{-\infty}^\infty 1_{(x,\infty)}(X_1)-1_{(x,\infty)}(X_2)\,dx\ \int_{-\infty}^\infty 1_{(y,\infty)}(Y_1)-1_{(y,\infty)}(Y_2)\,dy\\
&=\int_{-\infty}^\infty 1_{(-\infty,x]}(X_2)-1_{(-\infty,x]}(X_1)\,dx\ \int_{-\infty}^\infty 1_{(-\infty,y]}(Y_2)-1_{(-\infty,y]}(Y_1)\,dy\\
&=\int_{-\infty}^\infty\int_{-\infty}^\infty\bigl(1_{(-\infty,x]}(X_2)-1_{(-\infty,x]}(X_1)\bigr)\bigl(1_{(-\infty,y]}(Y_2)-1_{(-\infty,y]}(Y_1)\bigr)\,dx\,dy,
\end{aligned}$$
where we have used the equality $1_{(x,\infty)}(X_1)=1-1_{(-\infty,x]}(X_1)$ etc. As a consequence, by using Fubini's theorem, we obtain
$$\begin{aligned}
&E((X_1-X_2)(Y_1-Y_2))\\
&=\int_{-\infty}^\infty\int_{-\infty}^\infty E\Bigl(\bigl(1_{(-\infty,x]}(X_2)-1_{(-\infty,x]}(X_1)\bigr)\bigl(1_{(-\infty,y]}(Y_2)-1_{(-\infty,y]}(Y_1)\bigr)\Bigr)\,dx\,dy\\
&=\int_{-\infty}^\infty\int_{-\infty}^\infty 2P(X\le x,Y\le y)-2P(X\le x)P(Y\le y)\,dx\,dy,
\end{aligned}$$
where this time we have used the equality $E\bigl(1_{(-\infty,x]}(X_2)1_{(-\infty,y]}(Y_2)\bigr)=P(X\le x,Y\le y)$ etc. This completes the proof.

Proof (of Lemma 2.5.1). From Lemma 2.5.2 and Lemma 1.2.2, we obtain
$$\begin{aligned}
\operatorname{Cov}(\eta_1,\eta_2)&=\int_{-\infty}^\infty\int_{-\infty}^\infty P(\eta_1\le x,\eta_2\le y)-P(\eta_1\le x)P(\eta_2\le y)\,dx\,dy\\
&=\int_{-\infty}^0\int_{-\infty}^0P(\eta_1\le x,\eta_2\le y)-P(\eta_1\le x)P(\eta_2\le y)\,dx\,dy\\
&=\int_{-\infty}^0\int_{-\infty}^0P(\eta_1\le x,\eta_2\le y)\,dx\,dy-E(\eta_1)E(\eta_2)\\
&=\int_{-\infty}^0\int_{-\infty}^0P(\eta_1\le x,\eta_2\le y)\,dx\,dy-1\\
&=E(\eta_1\eta_2)-1.
\end{aligned}$$
But
$$\begin{aligned}
\int_{-\infty}^0\int_{-\infty}^0P(\eta_1\le x,\eta_2\le y)\,dx\,dy
&=\int_{-\infty}^0\int_{-\infty}^0\exp(-\|(x,y)\|_D)\,dy\,dx\\
&=\int_{-\infty}^0\int_{-\infty}^0\exp\Bigl(x\Bigl\|\Bigl(1,\frac yx\Bigr)\Bigr\|_D\Bigr)\,dy\,dx\\
&=-\int_{-\infty}^0x\int_0^\infty\exp\bigl(x\,\|(1,y)\|_D\bigr)\,dy\,dx\\
&=-\int_0^\infty\int_{-\infty}^0\frac{x}{\|(1,y)\|_D^2}\exp(x)\,dx\,dy\\
&=\int_0^\infty\frac{1}{\|(1,y)\|_D^2}\,dy
\end{aligned}$$
by substituting first $y\mapsto xy$ and then $x\mapsto x/\|(1,y)\|_D$; we also used the equation $\int_{-\infty}^0x\exp(x)\,dx=-1$. This completes the proof.

Example 2.5.3 In accordance with the characterization of the independence and complete dependence cases in terms of D-norms, we obtain for $\|\cdot\|_D=\|\cdot\|_1$
$$\operatorname{Cov}(\eta_1,\eta_2)=\int_0^\infty\frac{1}{(t+1)^2}\,dt-1=0,$$
and in the case of $\|\cdot\|_D=\|\cdot\|_\infty$
$$\operatorname{Cov}(\eta_1,\eta_2)=\int_0^\infty\frac{1}{(\max(t,1))^2}\,dt-1=1.$$
In addition to this, by substituting $t\mapsto t^{1/p}$, we obtain for a general logistic D-norm $\|\cdot\|_p$ with parameter $p\in[1,\infty)$
$$\operatorname{Cov}(\eta_1,\eta_2)=\int_0^\infty\frac{1}{(t^p+1)^{2/p}}\,dt-1=\frac1p\int_0^\infty\frac{t^{1/p-1}}{(t+1)^{2/p}}\,dt-1=\frac1pB\Bigl(\frac1p,\frac1p\Bigr)-1,$$
where
$$B(x,y)=\int_0^1t^{x-1}(1-t)^{y-1}\,dt=\int_0^\infty\frac{t^{y-1}}{(1+t)^{x+y}}\,dt,\qquad x,y>0,$$
denotes the beta function. (Apply the substitution $t\mapsto1/(1+t)$ to the first integral to obtain the final equation.)
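A numerical cross-check of this formula (SciPy is assumed for the quadrature; the function name is ours):

```python
from math import gamma, inf
from scipy.integrate import quad

def cov_logistic(p: float) -> float:
    # Cov(eta_1, eta_2) = B(1/p, 1/p)/p - 1, with B the beta function
    return gamma(1 / p) ** 2 / gamma(2 / p) / p - 1.0

p = 2.0
integral, _ = quad(lambda t: (t**p + 1.0) ** (-2.0 / p), 0.0, inf)
print(cov_logistic(p), integral - 1.0)   # both ~ pi/2 - 1 ~ 0.5708 for p = 2
```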

The $L_1$-Distance Between Standard Max-Stable rvs

It is easy to compute the $L_1$-distance between the components of an SMS rv.

Lemma 2.5.4 Let $\eta=(\eta_1,\eta_2)$ be as in the previous Lemma 2.5.1 and let $Z=(Z_1,Z_2)$ be an arbitrary generator of the D-norm. Then, we have the representation
$$E(|\eta_1-\eta_2|)=2\Bigl(1-\frac{1}{\|(1,1)\|_D}\Bigr)=\frac{E(|Z_1-Z_2|)}{\|(1,1)\|_D}.$$
As $\|(1,1)\|_D$ lies between one and two according to equation (1.4), the preceding equation implies the bounds
$$\frac{E(|Z_1-Z_2|)}{2}\le E(|\eta_1-\eta_2|)\le E(|Z_1-Z_2|).$$

The preceding result implies that $E(|\eta_1-\eta_2|)=0$ and, thus, $\eta_1=\eta_2$ a.s. iff $\|(1,1)\|_D=1$. This coincides with Takahashi's characterization (see Corollary 1.3.2). These arguments also entail that the sup-norm $\|\cdot\|_\infty$ can only be generated by a generator $Z$ with a.s. identical components, because it is only in this case that we have $E(|Z_1-Z_2|)=0$.

Proof (of Lemma 2.5.4). From the equation
$$\max(a,b)=\frac{a+b}{2}+\frac{|b-a|}{2},\tag{2.31}$$
which holds for arbitrary numbers $a,b\in\mathbb{R}$, we obtain
$$E(|\eta_1-\eta_2|)=2E(\max(\eta_1,\eta_2))-E(\eta_1+\eta_2)=2(1+E(\max(\eta_1,\eta_2))).$$
From Lemma 1.2.2, we obtain
$$\begin{aligned}
E(\max(\eta_1,\eta_2))&=-\int_{-\infty}^0P(\max(\eta_1,\eta_2)\le t)\,dt\\
&=-\int_{-\infty}^0P(\eta\le(t,t))\,dt\\
&=-\int_{-\infty}^0\exp(-\|(t,t)\|_D)\,dt\\
&=-\int_{-\infty}^0\exp(t\,\|(1,1)\|_D)\,dt=-\frac{1}{\|(1,1)\|_D}
\end{aligned}$$
by using the substitution $t\mapsto t/\|(1,1)\|_D$ and the fact that $\int_{-\infty}^0\exp(t)\,dt=1$. This proves the first equation in Lemma 2.5.4.

Applying equation (2.31) again, this time to $\max(Z_1,Z_2)$, we obtain
$$\|(1,1)\|_D=E(\max(Z_1,Z_2))=\frac{E(Z_1+Z_2)}{2}+\frac{E(|Z_1-Z_2|)}{2}=1+\frac{E(|Z_1-Z_2|)}{2}$$
and, thus,
$$2\Bigl(1-\frac{1}{\|(1,1)\|_D}\Bigr)=2\,\frac{\|(1,1)\|_D-1}{\|(1,1)\|_D}=\frac{E(|Z_1-Z_2|)}{\|(1,1)\|_D},$$
which completes the proof of Lemma 2.5.4.


The Range of the Components of a Max-Stable rv

The following result extends the upper bound on the $L_1$-distance of the components of a bivariate SMS rv in Lemma 2.5.4 to an arbitrary dimension.

Lemma 2.5.5 Let $\eta=(\eta_1,\dots,\eta_d)$ be an SMS rv on $\mathbb{R}^d$ with corresponding D-norm $\|\cdot\|_D$. We have
$$E\Bigl(\max_{1\le i,j\le d}|\eta_i-\eta_j|\Bigr)=E\Bigl(\max_{1\le i\le d}\eta_i-\min_{1\le j\le d}\eta_j\Bigr)\le\frac{1}{⦀\mathbf{1}⦀_D}-\frac{1}{\|\mathbf{1}\|_D}.$$
Recall that $⦀\mathbf{1}⦀_D$ can be zero, in which case the preceding upper bound is infinity and less helpful.

Proof. From Lemma 1.2.2, we obtain
$$\begin{aligned}
E\Bigl(\max_{1\le i\le d}\eta_i\Bigr)&=-\int_{-\infty}^0P\Bigl(\max_{1\le i\le d}\eta_i\le t\Bigr)\,dt\\
&=-\int_{-\infty}^0P(\eta\le t\mathbf{1})\,dt\\
&=-\int_{-\infty}^0\exp(-\|t\mathbf{1}\|_D)\,dt\\
&=-\int_{-\infty}^0\exp(t\,\|\mathbf{1}\|_D)\,dt\\
&=-\frac{1}{\|\mathbf{1}\|_D}\int_{-\infty}^0\exp(t)\,dt=-\frac{1}{\|\mathbf{1}\|_D},
\end{aligned}$$
using the substitution $t\mapsto t/\|\mathbf{1}\|_D$. In complete analogy, we obtain
$$\begin{aligned}
E\Bigl(\min_{1\le i\le d}\eta_i\Bigr)&=-\int_{-\infty}^0P\Bigl(\min_{1\le i\le d}\eta_i\le t\Bigr)\,dt\\
&=-\int_{-\infty}^0 1-P\Bigl(\min_{1\le i\le d}\eta_i>t\Bigr)\,dt\\
&=\int_{-\infty}^0P(\eta>t\mathbf{1})-1\,dt\\
&\ge-\int_{-\infty}^0\exp(t\,⦀\mathbf{1}⦀_D)\,dt=-\frac{1}{⦀\mathbf{1}⦀_D},
\end{aligned}$$
using the bound $P(\eta>x)\ge1-\exp(-⦀x⦀_D)$, $x<0\in\mathbb{R}^d$, from Lemma 2.4.2 and the homogeneity $⦀tx⦀_D=|t|\,⦀x⦀_D$. Thus, we have established
$$E\Bigl(\max_{1\le i\le d}\eta_i\Bigr)-E\Bigl(\min_{1\le i\le d}\eta_i\Bigr)\le\frac{1}{⦀\mathbf{1}⦀_D}-\frac{1}{\|\mathbf{1}\|_D}.$$

For the Marshall–Olkin D-norm with parameter $\lambda\in[0,1]$ as in (1.8), we obtain from Example 1.6.4
$$\|\mathbf{1}\|_{D_\lambda}=\lambda+d(1-\lambda),\qquad ⦀\mathbf{1}⦀_{D_\lambda}=\lambda,$$
and, thus, the upper bound
$$\frac{1}{⦀\mathbf{1}⦀_{D_\lambda}}-\frac{1}{\|\mathbf{1}\|_{D_\lambda}}=\frac1\lambda-\frac{1}{\lambda+d(1-\lambda)}.$$
It is interesting to note that this upper bound converges to $1/\lambda$ as the dimension $d$ tends to infinity.

2.6 Max-Stable Random Vectors as Generators of D-Norms

In this section, we pick up an idea from the proof of Theorem 2.3.3 and use max-stable rvs as generators of D-norms. As an example, we obtain a generator of the logistic D-norm $\|\cdot\|_p$, $1<p<\infty$, as in Proposition 1.2.1.

Let the rv $\eta=(\eta_1,\dots,\eta_d)$ follow the SMS df
$$G(x)=P(\eta\le x)=\exp(-\|x\|_D),\qquad x\le0\in\mathbb{R}^d.$$

Choose $c\in(0,1)$. Then, the rv $|\eta_i|^{-c}$ has the df
$$\begin{aligned}
P\Bigl(\frac{1}{|\eta_i|^c}\le x\Bigr)&=P\Bigl(\frac1x\le|\eta_i|^c\Bigr)
=P\Bigl(\frac{1}{x^{1/c}}\le-\eta_i\Bigr)
=P\Bigl(-\frac{1}{x^{1/c}}\ge\eta_i\Bigr)\\
&=\exp\Bigl(-\frac{1}{x^{1/c}}\Bigr),\qquad x>0,\ 1\le i\le d,
\end{aligned}$$
i.e., $|\eta_i|^{-c}$ follows the Fréchet df $F_\alpha(x)=\exp(-x^{-\alpha})$, $x>0$, with parameter $\alpha=1/c$; note that $P(\eta_i=0)=0$. Its expectation is, by (2.23), $\mu_c=\Gamma(1-c)$. The rv
$$Z^{(c)}=\bigl(Z_1^{(c)},\dots,Z_d^{(c)}\bigr):=\frac{1}{\mu_c}\Bigl(\frac{1}{|\eta_1|^c},\dots,\frac{1}{|\eta_d|^c}\Bigr)\tag{2.32}$$
now satisfies $Z_i^{(c)}\ge0$ and $E\bigl(Z_i^{(c)}\bigr)=1$, $1\le i\le d$, i.e., $Z^{(c)}$ is the generator of a D-norm. Can we specify it?

Note that the rv $\bigl(|\eta_1|^{-c},\dots,|\eta_d|^{-c}\bigr)$ follows a max-stable df with Fréchet margins:
$$\begin{aligned}
H(x)&=P\Bigl(\frac{1}{|\eta_i|^c}\le x_i,\ 1\le i\le d\Bigr)
=P\Bigl(\eta_i\le-\frac{1}{x_i^{1/c}},\ 1\le i\le d\Bigr)\\
&=\exp\Bigl(-\Bigl\|\Bigl(\frac{1}{x_1^{1/c}},\dots,\frac{1}{x_d^{1/c}}\Bigr)\Bigr\|_D\Bigr),\qquad x>0\in\mathbb{R}^d,
\end{aligned}$$
and, for each $n\in\mathbb{N}$,
$$H^n(n^cx)=\exp\Bigl(-n\Bigl\|\Bigl(\frac{1}{(n^cx_1)^{1/c}},\dots,\frac{1}{(n^cx_d)^{1/c}}\Bigr)\Bigr\|_D\Bigr)=\exp\Bigl(-\frac nn\Bigl\|\Bigl(\frac{1}{x_1^{1/c}},\dots,\frac{1}{x_d^{1/c}}\Bigr)\Bigr\|_D\Bigr)=H(x),\qquad x>0\in\mathbb{R}^d.$$

Now, we specify the D-norm generated by $Z^{(c)}$ as defined in (2.32).

Proposition 2.6.1 The D-norm $\|\cdot\|_{D^{(c)}}$ corresponding to the generator $Z^{(c)}$ defined in (2.32) is, for $x\in\mathbb{R}^d$, given by
$$\|x\|_{D^{(c)}}=E\Bigl(\max_{1\le i\le d}|x_i|\,Z_i^{(c)}\Bigr)=\bigl\|\bigl(|x_1|^{1/c},\dots,|x_d|^{1/c}\bigr)\bigr\|_D^c.\tag{2.33}$$

If $\eta_1,\dots,\eta_d$ in the preceding result are independent, i.e., if the underlying D-norm $\|\cdot\|_D$ is $\|\cdot\|_1$, Proposition 2.6.1 yields that $Z^{(c)}$ generates the logistic norm
$$\|x\|_{D^{(c)}}=\bigl\|\bigl(|x_1|^{1/c},\dots,|x_d|^{1/c}\bigr)\bigr\|_1^c=\Bigl(\sum_{i=1}^d|x_i|^{1/c}\Bigr)^c=\|x\|_{1/c}.$$
This was already observed in Proposition 1.2.1.


It is clearly a purely mathematical question,
 but, nevertheless,
c an obvious
 1/c 1/c 
and interesting one: what is the limit of  |x1 | , . . . , |xd |  for c → 1,
D
or for c → 0, if it exists?
The answer is actually easy: clearly,
 c
 1/c 1/c 
lim  |x1 | , . . . , |xd |  = xD .
c→1 D

On the other hand, from the fact that each D-norm is larger than the sup-
norm ·∞ and smaller than the norm ·1 (see (1.4)), we obtain
 c  
c
 1/c 1/c  1/c
|x
 1 | , . . . , |xd |  ≥ max |xi | = max (|xi |) = x∞
D 1≤i≤d 1≤i≤d

and  d c
 c 
 1/c 1/c  1/c
 |x1 | , . . . , |xd |  ≤ |xi | →c→0 x∞
D
i=1

by Lemma 1.1.2. Hence, we have


 c
 
lim  |x1 |1/c , . . . , |xd |1/c  = x∞ .
c→0 D

Proof (of Proposition 2.6.1). From Lemma 1.2.2, we obtain that, for $x>0\in\mathbb{R}^d$,
$$\begin{aligned}
E\Bigl(\max_{1\le i\le d}x_iZ_i^{(c)}\Bigr)&=\frac{1}{\mu_c}E\Bigl(\max_{1\le i\le d}\frac{x_i}{|\eta_i|^c}\Bigr)\\
&=\frac{1}{\mu_c}\int_0^\infty P\Bigl(\max_{1\le i\le d}\frac{x_i}{|\eta_i|^c}>t\Bigr)\,dt\\
&=\frac{1}{\mu_c}\int_0^\infty 1-P\Bigl(\max_{1\le i\le d}\frac{x_i}{|\eta_i|^c}\le t\Bigr)\,dt\\
&=\frac{1}{\mu_c}\int_0^\infty 1-P\Bigl(\frac{x_i}{|\eta_i|^c}\le t,\ 1\le i\le d\Bigr)\,dt\\
&=\frac{1}{\mu_c}\int_0^\infty 1-P\Bigl(\frac{1}{|\eta_i|^c}\le\frac{t}{x_i},\ 1\le i\le d\Bigr)\,dt\\
&=\frac{1}{\mu_c}\int_0^\infty 1-\exp\Bigl(-\Bigl\|\Bigl(\frac{1}{(t/x_1)^{1/c}},\dots,\frac{1}{(t/x_d)^{1/c}}\Bigr)\Bigr\|_D\Bigr)\,dt\\
&=\frac{1}{\mu_c}\int_0^\infty 1-\exp\Bigl(-\frac{1}{t^{1/c}}\bigl\|\bigl(x_1^{1/c},\dots,x_d^{1/c}\bigr)\bigr\|_D\Bigr)\,dt\\
&=\frac{1}{\mu_c}\bigl\|\bigl(x_1^{1/c},\dots,x_d^{1/c}\bigr)\bigr\|_D^c\int_0^\infty 1-\exp\Bigl(-\frac{1}{t^{1/c}}\Bigr)\,dt
\end{aligned}$$
by the substitution $t\mapsto\bigl\|\bigl(x_1^{1/c},\dots,x_d^{1/c}\bigr)\bigr\|_D^c\,t$. The integral $\int_0^\infty 1-\exp\bigl(-1/t^{1/c}\bigr)\,dt$ equals $E(Y)$ according to Lemma 1.2.2, where $Y$ follows a Fréchet distribution with parameter $1/c$. It was shown in (2.23) that $E(Y)=\mu_c$, which completes the proof.

Iterating the Sequence of Generators

Taking this new D-norm $\|\cdot\|_{D^{(c)}}$ in (2.33) as the initial D-norm $\|\cdot\|_D$ and proceeding as before leads to the D-norm
$$\|(x_1,\dots,x_d)\|_{D^{(2)}}:=\bigl\|\bigl(x_1^{1/c^2},\dots,x_d^{1/c^2}\bigr)\bigr\|_D^{c^2},$$
$x\ge0\in\mathbb{R}^d$. We can iterate this procedure and obtain in the $n$-th step
$$\|(x_1,\dots,x_d)\|_{D^{(n)}}:=\bigl\|\bigl(x_1^{1/c^n},\dots,x_d^{1/c^n}\bigr)\bigr\|_D^{c^n}.$$
This begs the question: does this sequence of D-norms converge?

Note: if we choose $\|\cdot\|_D=\|\cdot\|_\infty$, then we obtain, for $x\ge0\in\mathbb{R}^d$,
$$\bigl\|\bigl(x_1^{1/c^n},\dots,x_d^{1/c^n}\bigr)\bigr\|_\infty^{c^n}=\Bigl(\max_{1\le i\le d}x_i^{1/c^n}\Bigr)^{c^n}=\max_{1\le i\le d}x_i=\|(x_1,\dots,x_d)\|_\infty.$$
This may raise the conjecture that the sequence of D-norms converges to the sup-norm $\|\cdot\|_\infty$, if it converges at all. This is actually true and can easily be seen as follows.

Recall that $\|\cdot\|_\infty\le\|\cdot\|_D\le\|\cdot\|_1$ for an arbitrary D-norm and that $c\in(0,1)$. As a consequence, we obtain
$$\|(x_1,\dots,x_d)\|_{D^{(n)}}=\bigl\|\bigl(x_1^{1/c^n},\dots,x_d^{1/c^n}\bigr)\bigr\|_D^{c^n}\le\Bigl(\sum_{i=1}^dx_i^{1/c^n}\Bigr)^{c^n}\to_{n\to\infty}\|(x_1,\dots,x_d)\|_\infty,\qquad x\ge0\in\mathbb{R}^d,$$
according to Lemma 1.1.2; hence,
$$\|(x_1,\dots,x_d)\|_{D^{(n)}}\to_{n\to\infty}\|(x_1,\dots,x_d)\|_\infty.$$


3
Copulas & Multivariate Extremes

This chapter reveals the crucial role that copulas play in MEVT. The D-norm
approach again proves to be quite a helpful tool. In particular, it turns out
that a multivariate df F is in the domain of attraction of a multivariate EVD
iff this is true for the univariate margins of F together with the condition
that the copula of F in its upper tail is close to that of a generalized Pareto
copula. As a consequence, MEVT actually means extreme value theory for
copulas.

3.1 Characterizing Multivariate Domain of Attraction

In complete analogy to the univariate case in (2.2), we say that a multivariate df $F$ on $\mathbb{R}^d$ is in the domain of attraction of an arbitrary multivariate EVD $G$, again denoted by $F\in\mathcal{D}(G)$, if there are vectors $a_n>0$, $b_n\in\mathbb{R}^d$, $n\in\mathbb{N}$, such that
$$F^n(a_nx+b_n)\to_{n\to\infty}G(x),\qquad x\in\mathbb{R}^d.\tag{3.1}$$
Recall that all operations on vectors are meant componentwise. See Theorem 2.3.4 for the family of possible limits $G$.

Sklar’s Theorem
A copula is a multivariate df with the particular property that each univariate
margin is the uniform distribution on (0, 1). For an exhaustive account of
copulas we refer to Nelsen (2006). Sklar’s theorem plays a major role in the
characterization of F ∈ D(G) for a general df F on Rd .


Theorem 3.1.1 (Sklar (1959, 1996)) For every df $F$ on $\mathbb{R}^d$ with univariate margins $F_1,\dots,F_d$ there exists a copula $C$ such that
$$F(x)=C(F_1(x_1),\dots,F_d(x_d)),\qquad x=(x_1,\dots,x_d)\in\mathbb{R}^d.$$

If $F$ is continuous, then $C$ is uniquely determined and given by
$$C(u)=F\bigl(F_1^{-1}(u_1),\dots,F_d^{-1}(u_d)\bigr),\qquad u=(u_1,\dots,u_d)\in(0,1)^d,$$
where $F_i^{-1}(u)=\inf\{t\in\mathbb{R}:F_i(t)\ge u\}$, $u\in(0,1)$, is the generalized inverse of $F_i$. The copula of an rv $Y=(Y_1,\dots,Y_d)$ is meant to be the copula of its df.

If $Y=(Y_1,\dots,Y_d)$ is an rv such that, for each $i\in\{1,\dots,d\}$, the df $F_i$ of $Y_i$ is in its upper tail a continuous function, then the copula $C$ of $Y$ is, for $u=(u_1,\dots,u_d)$ close to $1\in\mathbb{R}^d$, uniquely determined and given by
$$C(u)=P(F_1(Y_1)\le u_1,\dots,F_d(Y_d)\le u_d).$$

Sklar's theorem can be formulated in terms of rvs as follows. Let $Y=(Y_1,\dots,Y_d)$ be an arbitrary rv and denote by $F_i$ the df of $Y_i$, $1\le i\le d$. There exists an rv $U=(U_1,\dots,U_d)$, which follows a copula, i.e., each $U_i$ follows the uniform distribution on $(0,1)$, such that
$$Y=_D\bigl(F_1^{-1}(U_1),\dots,F_d^{-1}(U_d)\bigr).$$
We frequently use this version.

Introducing Generalized Pareto Copulas

By (2.15), the copula $C$ of an rv $Y=(Y_1,\dots,Y_d)$, which follows an arbitrary multivariate GPD as in (2.17) with corresponding D-norm $\|\cdot\|_D$, satisfies the equation
$$\begin{aligned}
C(u)&=P\bigl((\psi_{\alpha_1}(Y_1)+1,\dots,\psi_{\alpha_d}(Y_d)+1)\le u\bigr)\\
&=P(W_1\le u_1-1,\dots,W_d\le u_d-1)\\
&=1-\|(1-u_1,\dots,1-u_d)\|_D,\qquad u_0\le u\le1\in\mathbb{R}^d,
\end{aligned}$$
for some $u_0<1\in\mathbb{R}^d$. A copula $C$ on $\mathbb{R}^d$ with such an expansion
$$C(u)=1-\|1-u\|_D,\qquad u_0\le u\le1\in\mathbb{R}^d,\tag{3.2}$$
for some $u_0<1$, is called a generalized Pareto copula (GPC). These copulas turn out to be a key to MEVT; see, for example, Propositions 3.1.5 and 3.1.10.

Note that any marginal distribution of a GPC $C$ is a lower-dimensional GPC as well: if the rv $U=(U_1,\dots,U_d)$ follows the GPC $C$ on $\mathbb{R}^d$, then the rv $U_T:=(U_{i_1},\dots,U_{i_m})$ follows a GPC on $\mathbb{R}^m$ for each non-empty subset $T=\{i_1,\dots,i_m\}\subset\{1,\dots,d\}$.
The characteristic property of a GPC is its excursion stability, as formulated in the next result. The conclusion "⇒" follows from the proof of Lemma 3.1.13 later in this chapter. The reverse implication is just a reformulation of Falk and Guillou (2008, Proposition 6).

Proposition 3.1.2 Let the rv $U=(U_1,\dots,U_d)$ follow a copula $C$. Then, $C$ is a GPC iff for an arbitrary non-empty subset $T=\{i_1,\dots,i_m\}$ of $\{1,\dots,d\}$ the rv $U_T=(U_{i_1},\dots,U_{i_m})$ is exceedance stable, i.e.,
$$P(U_T\ge1-tu)=tP(U_T\ge1-u),\qquad t\in[0,1],\tag{3.3}$$
for $u$ close to $0\in\mathbb{R}^m$.

If $P(U_T\ge1-u)>0$, then (3.3) clearly becomes
$$P(U_T\ge1-tu\mid U_T\ge1-u)=t,\qquad t\in[0,1].$$
But $P(U_T\ge1-u)$ can be equal to zero for all $u$ close to $0\in\mathbb{R}^m$. For example, this is the case when the underlying D-norm $\|\cdot\|_D$ is $\|\cdot\|_1$. Then, $⦀\cdot⦀_D=0$ (see equation (1.12)), and thus, $P(U_T\ge1-u)=0$ for $u$ close to $0\in\mathbb{R}^m$, unless $m=1$; see Lemma 3.1.13.

Different from the definition of the family of univariate GPDs as in (2.7), the definition of a multivariate GPD is not unique in the literature; there are different approaches (see, e.g., Rootzén and Tajvidi (2006); Falk et al. (2011) or (2.17)). The following suggestion may help to conclude the debate.

Remark 3.1.3 Proposition 3.1.2 offers another way to define an arbitrary multivariate GPD: an rv $Y=(Y_1,\dots,Y_d)$ follows a multivariate GPD iff its copula is excursion stable, in which case it is a GPC, and each component $Y_i$ follows a univariate GPD in its upper tail as in (2.7). Note that this coincides with the definition of a multivariate GPD as in (2.17).

Example 3.1.4 Let $X=(X_1,\dots,X_d)$ be an rv whose corresponding copula $C$ is a GPC, and let each component follow the standard Pareto distribution, i.e., $P(X_i\le x)=1-x^{-1}=:F(x)$, $x\ge1$. Note that
$$1-F(tx)=\frac1t(1-F(x)),\qquad t,x\ge1.$$
By Sklar's theorem 3.1.1, we can assume the representation
$$X=\bigl(F^{-1}(U_1),\dots,F^{-1}(U_d)\bigr),$$
where $U=(U_1,\dots,U_d)$ follows the copula $C$. For $x=(x_1,\dots,x_d)$ large enough, we obtain
$$\begin{aligned}
P(X\ge tx\mid X\ge x)&=\frac{P(X\ge tx)}{P(X\ge x)}\\
&=\frac{P(U_i\ge F(tx_i),\ 1\le i\le d)}{P(U_i\ge F(x_i),\ 1\le i\le d)}\\
&=\frac{P(U_i\ge1-(1-F(tx_i)),\ 1\le i\le d)}{P(U_i\ge1-(1-F(x_i)),\ 1\le i\le d)}\\
&=\frac{P\bigl(U_i\ge1-\frac1t(1-F(x_i)),\ 1\le i\le d\bigr)}{P(U_i\ge1-(1-F(x_i)),\ 1\le i\le d)}\\
&=\frac1t,\qquad t\ge1,
\end{aligned}$$
by equation (3.3), provided $P(U\ge u)>0$ for all $u\in[0,1)^d$ close to $1\in\mathbb{R}^d$. The preceding result can easily be extended to arbitrary univariate generalized Pareto margins as given in (2.7).

The previous result shows that if one wants to model the copula of multivariate exceedances above high thresholds, then a GPC is a first option.
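A simulation sketch of exceedance stability (3.3), assuming the construction $W=-U/Z$ with a bounded generator, whose df coincides with that of a standard GPD in its upper tail, so that $1+W$ has a GPC there; the bivariate generator $Z=(2V,2-2V)$, $V$ uniform on $(0,1)$, is one convenient choice:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000_000
V = rng.uniform(size=n)
Z = np.column_stack((2 * V, 2 - 2 * V))   # generator: Z_i >= 0, E(Z_i) = 1
U = rng.uniform(size=(n, 1))
GPC = 1.0 - U / Z                          # df is a GPC in its upper tail

u, t = np.array([0.05, 0.03]), 0.5
p_u  = np.mean((GPC >= 1 - u).all(axis=1))
p_tu = np.mean((GPC >= 1 - t * u).all(axis=1))
print(p_tu / p_u, t)                       # ratio ~ t by (3.3)
```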

Domain of Attraction for Copulas

The df of the uniform distribution on $(0,1)$ is $H(u):=u$, $u\in[0,1]$. We obtain, therefore, with $a_n=1/n$, $b_n=1$, $n\in\mathbb{N}$, for $x\le0$ and large $n$,
$$H^n(a_nx+b_n)=H^n\Bigl(\frac xn+1\Bigr)=\Bigl(1+\frac xn\Bigr)^n\to_{n\to\infty}\exp(x),$$
i.e., each univariate margin of an arbitrary copula is automatically in the domain of attraction of the univariate SMS df $G(x)=\exp(x)$, $x\le0$.

The following conclusion is a consequence: if a copula $C$ on $\mathbb{R}^d$ is in the domain of attraction of an EVD $G$,
$$C^n\Bigl(1+\frac xn\Bigr)\to_{n\to\infty}G(x),\qquad x\le0\in\mathbb{R}^d,$$
then $G$ necessarily has standard exponential margins, i.e., $G$ is a (multivariate) SMS df. According to Theorem 2.3.3, there exists a D-norm $\|\cdot\|_D$ such that $G(x)=\exp(-\|x\|_D)$, $x\le0\in\mathbb{R}^d$. This underlines the particular role of SMS dfs. The next result characterizes the condition $C\in\mathcal{D}(G)$. It turns out that $C\in\mathcal{D}(G)$ iff its upper tail is close to that of a GPC.

Proposition 3.1.5 A copula $C$ on $\mathbb{R}^d$ satisfies $C\in\mathcal{D}(G)$, where $G(x)=\exp(-\|x\|_D)$, $x\le0\in\mathbb{R}^d$, iff the copula $C$ satisfies the expansion
$$C(u)=1-\|1-u\|_D+o(\|1-u\|)$$
as $u\to1$, uniformly for $u\in[0,1]^d$.

The uniformity in the preceding result is meant as follows: for each $\varepsilon>0$, there exists $\delta>0$ such that
$$\frac{|C(u)-(1-\|1-u\|_D)|}{\|1-u\|}\le\varepsilon,\qquad\text{if }u\in[1-\delta,1]^d,\ u\ne1.$$
The norm $\|\cdot\|$ in the denominator can be arbitrarily chosen, due to the fact that all norms on $\mathbb{R}^d$ are equivalent.

As an example, we show in Corollary 3.1.15 that an Archimedean copula $C_\varphi$ on $\mathbb{R}^d$, whose generator function $\varphi$ satisfies condition (3.11) below, is in the domain of attraction of an SMS df with corresponding logistic D-norm.
Proof (of Proposition 3.1.5). The implication "⇐" is obvious: we have for $x\le0\in\mathbb{R}^d$
$$C^n\Bigl(1+\frac xn\Bigr)=\Bigl(1-\frac1n\|x\|_D+o\Bigl(\frac1n\Bigr)\Bigr)^n\to_{n\to\infty}\exp(-\|x\|_D)=:G(x),$$
where $G(\cdot)$ defines a standard max-stable df by Theorem 2.3.3.

Next, we establish the implication "⇒." Suppose that $C\in\mathcal{D}(G)$. We have, consequently,
$$C^n\Bigl(1+\frac xn\Bigr)\to_{n\to\infty}G(x),$$
where the norming constants are prescribed by the univariate margins of $C$. Repeating the arguments in the proof of Theorem 2.3.3, one derives from the above convergence
$$C^t\Bigl(1+\frac xt\Bigr)\to_{t\to\infty}G(x),\qquad x\in\mathbb{R}^d.$$
As the limiting df $G(x)=\exp(-\|x\|_D)$, $x\le0\in\mathbb{R}^d$, is continuous, the above convergence is uniform in $x$, i.e.,
$$\sup_{x\le0}\Bigl|C^t\Bigl(1+\frac xt\Bigr)-\exp(-\|x\|_D)\Bigr|\to_{t\to\infty}0.$$
Taking logarithms, this implies for $t\ge2$
$$\sup_{-1\le x\le0}\Bigl|t\log\Bigl(C\Bigl(1+\frac xt\Bigr)\Bigr)+\|x\|_D\Bigr|\to_{t\to\infty}0.$$
The Taylor expansion $\log(1+\varepsilon)=\varepsilon+O(\varepsilon^2)$ for $\varepsilon\to0$ yields, uniformly for $-1\le x\le0\in\mathbb{R}^d$ and $t\ge2$,
$$\log\Bigl(C\Bigl(1+\frac xt\Bigr)\Bigr)=\log\Bigl(1+C\Bigl(1+\frac xt\Bigr)-1\Bigr)=C\Bigl(1+\frac xt\Bigr)-1+O\Bigl(\Bigl(C\Bigl(1+\frac xt\Bigr)-1\Bigr)^2\Bigr).$$
The lower Fréchet bound for a multivariate df (see, for example, Galambos (1987, Theorem 5.1.1)) provides, for $x=(x_1,\dots,x_d)\le0$, the inequality
$$0\ge C\Bigl(1+\frac xt\Bigr)-1\ge\frac1t\sum_{i=1}^dx_i.$$
If we have in addition $-1\le x\le0$, this yields
$$\Bigl(C\Bigl(1+\frac xt\Bigr)-1\Bigr)^2\le\frac{d^2}{t^2}.$$
As a consequence, we obtain
$$\sup_{-1\le x\le0}\Bigl|t\Bigl(C\Bigl(1+\frac xt\Bigr)-1\Bigr)+\|x\|_D\Bigr|\to_{t\to\infty}0.$$
Putting $u:=1+x/t$, the preceding equation becomes
$$\sup_{1-\frac1t\mathbf{1}\le u\le\mathbf{1}}|t(C(u)-1)+t\,\|1-u\|_D|=t\sup_{1-\frac1t\mathbf{1}\le u\le\mathbf{1}}|C(u)-(1-\|1-u\|_D)|=:r(1/t)\to_{t\to\infty}0.\tag{3.4}$$
Choose $u\in[0,1]^d$ with $\|1-u\|_D\le1/2$. The preceding equation with $t:=1/\|1-u\|_\infty$ implies
$$\frac{|C(u)-(1-\|1-u\|_D)|}{\|1-u\|_\infty}\le r(\|1-u\|_\infty);\tag{3.5}$$
note that we can apply (3.4) with these choices of $u$ and $t$, since
$$1-\frac1t\mathbf{1}\le u\le\mathbf{1}\iff 0\le\frac{1-u}{\|1-u\|_\infty}\le\mathbf{1},$$
which is obviously true.

Equation (3.5) implies for $u\in[0,1]^d$, $u\ne\mathbf{1}$,
$$\frac{|C(u)-(1-\|1-u\|_D)|}{\|1-u\|_\infty}\to_{\|1-u\|_\infty\to0}0,\tag{3.6}$$
which is the expansion
$$C(u)=1-\|1-u\|_D+o(\|1-u\|_\infty)\tag{3.7}$$
as $u\to\mathbf{1}$.

As described above, uniformity in $u$ in the above expansion means that for all $\varepsilon>0$, there exists $\delta>0$ such that the remainder term satisfies $|o(\|1-u\|_\infty)|\le\varepsilon\|1-u\|_\infty$ if $\|1-u\|_\infty\le\delta$. We prove this by contradiction. Suppose this uniformity is not valid. Then there exists $\varepsilon^*>0$ such that, for all $\delta>0$, there exists $u_\delta\in[0,1]^d$ with $\|1-u_\delta\|_\infty\le\delta$ and $|o(\|1-u_\delta\|_\infty)|>\varepsilon^*\|1-u_\delta\|_\infty$. But this clearly contradicts equation (3.6).

Since all norms on $\mathbb{R}^d$ are equivalent, the remainder term $o(\|1-u\|_\infty)$ in expansion (3.7) can be substituted by $o(\|1-u\|)$ with an arbitrary norm on $\mathbb{R}^d$. This completes the proof of Proposition 3.1.5.

The following consequence of Proposition 3.1.5 provides a handy characterization of the condition $C\in\mathcal{D}(G)$.

Corollary 3.1.6 A copula $C$ on $\mathbb{R}^d$ satisfies $C\in\mathcal{D}(G)$, with $G(x)=\exp(-\|x\|_D)$, $x\le0\in\mathbb{R}^d$, iff for all $x\le0\in\mathbb{R}^d$, the limit
$$\lim_{t\downarrow0}\frac{1-C(1+tx)}{t}=:\ell(x)\tag{3.8}$$
exists in $[0,\infty)$. In this case $\ell(x)=\|x\|_D$.

The limit $\ell(\cdot)$ is also known as the stable tail dependence function of $C$ (Huang (1991)). The fact that each stable tail dependence function is actually a D-norm opens the way to estimating an underlying D-norm by using estimators of the stable tail dependence function.

Proof. We know from Proposition 3.1.5 that the condition $C\in\mathcal{D}(G)$ is equivalent to the expansion
$$C(u)=1-\|1-u\|_D+o(\|1-u\|)$$
as $u$ converges to $\mathbf{1}$, uniformly in $[0,1]^d$. For $x\le0\in\mathbb{R}^d$ and $t>0$, this readily implies
$$1-C(1+tx)=\|tx\|_D+o(\|tx\|)=t\,\|x\|_D+o(t)$$
and, thus, the implication "⇒."

Suppose next that the limit in equation (3.8) exists. The Hoeffding–Fréchet bounds for a multivariate df (see, for example, Galambos (1987, Theorem 5.1.1)) imply the bounds
$$\sum_{i=1}^d(1+tx_i)-d+1\le C(1+tx)\le\min_{1\le i\le d}(1+tx_i);$$
thus,
$$t\sum_{i=1}^d|x_i|\ge1-C(1+tx)\ge t\max_{1\le i\le d}|x_i|.$$
The limit $\ell(x)$ therefore satisfies
$$\|x\|_\infty\le\ell(x)\le\|x\|_1,\qquad x\le0\in\mathbb{R}^d.\tag{3.9}$$
This implies in particular that $\ell(x)\to_{x\to0}0$ and $\ell(x)\to\infty$ if one component of $x$ decreases to $-\infty$.

The Taylor expansion $\log(1+\varepsilon)=\varepsilon+O(\varepsilon^2)$ for $\varepsilon\to0$ implies for $x\le0\in\mathbb{R}^d$ and $n\in\mathbb{N}$ large,
$$\begin{aligned}
C^n\Bigl(1+\frac xn\Bigr)&=\exp\Bigl(n\log\Bigl(C\Bigl(1+\frac xn\Bigr)\Bigr)\Bigr)\\
&=\exp\Bigl(n\Bigl(C\Bigl(1+\frac xn\Bigr)-1+O\Bigl(\Bigl(C\Bigl(1+\frac xn\Bigr)-1\Bigr)^2\Bigr)\Bigr)\Bigr)\\
&=\exp\Bigl(-\frac{1-C\bigl(1+\frac xn\bigr)}{\frac1n}+O\Bigl(\frac1n\Bigr)\Bigr)\\
&\to_{n\to\infty}\exp(-\ell(x)).
\end{aligned}$$
Note that
$$C^n\Bigl(1+\frac xn\Bigr)=P\Bigl(n\Bigl(\max_{1\le i\le n}U^{(i)}-1\Bigr)\le x\Bigr),\qquad n\in\mathbb{N},$$
is a sequence of dfs, which converges by the above result to $\exp(-\ell(x))=:G(x)$, $x\le0\in\mathbb{R}^d$. From Helly's selection theorem (see, for example, Billingsley (1999)) and equation (3.9), we conclude that $G(\cdot)$ defines a df on $\mathbb{R}^d$. We have to establish its max-stability: for $m\in\mathbb{N}$, we have
$$G\Bigl(\frac xm\Bigr)=\lim_{n\in\mathbb{N}}C^n\Bigl(1+\frac{x}{nm}\Bigr)=\lim_{n\in\mathbb{N}}\Bigl(C^{nm}\Bigl(1+\frac{x}{nm}\Bigr)\Bigr)^{1/m}=G(x)^{1/m}$$
or
$$G\Bigl(\frac xm\Bigr)^m=G(x),\qquad x\le0\in\mathbb{R}^d,$$
which is the max-stability of $G$. Finally, we obviously have according to (3.9)
$$\ell(xe_i)=|x|,\qquad x\le0,\ 1\le i\le d,$$
and thus, $G$ has standard negative exponential margins. Theorem 2.3.3 implies $G(x)=\exp(-\|x\|_D)$, $x\le0\in\mathbb{R}^d$, which completes the proof.

Remark 3.1.7 Equation (3.9) in the preceding proof reveals why $\|\cdot\|_\infty$ and $\|\cdot\|_1$ are the smallest and the largest D-norms: this is actually due to the Hoeffding–Fréchet bounds for a multivariate df.

Example 3.1.8 The Ali–Mikhail–Haq family of bivariate copulas is defined by
$$C_\vartheta(u,v):=\frac{uv}{1-\vartheta(1-u)(1-v)},\qquad 0\le u,v\le1,\ \vartheta\in[-1,1];$$
see Nelsen (2006, Section 3.3.2). It satisfies the equation
$$\frac{1-C_\vartheta(u,v)}{C_\vartheta(u,v)}=\frac{1-u}{u}+\frac{1-v}{v}+(1-\vartheta)\,\frac{1-u}{u}\,\frac{1-v}{v}.$$
As a consequence, we obtain for $x_1,x_2\le0$ and $t>0$ small enough
$$\begin{aligned}
\frac{1-C_\vartheta(1+tx_1,1+tx_2)}{t}
&=\frac{C_\vartheta(1+tx_1,1+tx_2)}{t}\Bigl(\frac{-tx_1}{1+tx_1}+\frac{-tx_2}{1+tx_2}+(1-\vartheta)\,\frac{-tx_1}{1+tx_1}\,\frac{-tx_2}{1+tx_2}\Bigr)\\
&\to_{t\downarrow0}|x_1|+|x_2|.
\end{aligned}$$
In this case, we obtain $\ell(x_1,x_2)=\|(x_1,x_2)\|_1$ and, thus, $C_\vartheta\in\mathcal{D}(G)$, $G((x_1,x_2))=\exp(-\|(x_1,x_2)\|_1)$, $x_1,x_2\le0$, which has independent components.
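A quick numerical illustration of this limit (the helper name is ours):

```python
# Stable tail dependence function (3.8) of the Ali-Mikhail-Haq copula;
# the limit should be |x1| + |x2| for any admissible theta.
def amh(u: float, v: float, theta: float) -> float:
    return u * v / (1.0 - theta * (1.0 - u) * (1.0 - v))

x1, x2, theta = -1.0, -2.0, 0.7
for t in (1e-2, 1e-4, 1e-6):
    print((1.0 - amh(1 + t * x1, 1 + t * x2, theta)) / t)   # -> 3.0
```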

Extreme Value Copulas

An extreme value copula on $\mathbb{R}^d$ is the copula of an arbitrary d-variate max-stable df $G^*$. According to Theorem 2.3.4, it has, for $u\in(0,1]^d$, the representation
$$C_{G^*}(u)=\exp\bigl(-\|(\log(u_1),\dots,\log(u_d))\|_D\bigr).\tag{3.10}$$
On the other hand, the right-hand side of this equation defines an extreme value copula for any D-norm $\|\cdot\|_D$. This is a consequence of Theorem 2.3.3.

Example 3.1.9 The bivariate Hüsler–Reiss distribution with parameter $\lambda>0$ as given in Example 2.3.6, i.e.,
$$G_{HR_\lambda}(x,y)=\exp\Bigl(-\exp(-x)\,\Phi\Bigl(\lambda+\frac{y-x}{2\lambda}\Bigr)-\exp(-y)\,\Phi\Bigl(\lambda+\frac{x-y}{2\lambda}\Bigr)\Bigr),\qquad x,y\in\mathbb{R},$$
has, according to equation (3.10) and Lemma 1.10.6, where the D-norm is explicitly given, the copula
$$C_{HR_\lambda}(u,v)=\exp\Bigl(\log(u)\,\Phi\Bigl(\lambda+\frac{\log(\log(u)/\log(v))}{2\lambda}\Bigr)+\log(v)\,\Phi\Bigl(\lambda+\frac{\log(\log(v)/\log(u))}{2\lambda}\Bigr)\Bigr),\qquad u,v\in(0,1).$$

For a discussion of parametric families of extreme value copulas and their statistical analysis we refer the reader to Genest and Nešlehová (2012).
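A sketch building an extreme value copula from the logistic D-norm via (3.10) (this is the Gumbel copula) and checking the characteristic max-stability property $C(u^t)=C(u)^t$:

```python
import numpy as np

def ev_copula_logistic(u: np.ndarray, p: float) -> float:
    lu = np.log(u)                                  # componentwise log
    return np.exp(-np.sum(np.abs(lu) ** p) ** (1.0 / p))

u = np.array([0.7, 0.9, 0.85])
for t in (2.0, 7.5):
    print(ev_copula_logistic(u ** t, 3.0), ev_copula_logistic(u, 3.0) ** t)
```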

Domain of Attraction for a General df

The next result goes back to Deheuvels (1984) and Galambos (1987). Instead of the D-norm expansion of $C_F(u)$ below, they use in their original formulation the limit of $C^n(u^{1/n})$ as in Corollary 3.1.12.

Proposition 3.1.10 A d-variate df $F$ satisfies $F\in\mathcal{D}(G)$ iff this is true for the univariate margins of $F$, together with the condition that the copula $C_F$ of $F$ satisfies the expansion
$$C_F(u)=1-\|1-u\|_D+o(\|1-u\|)$$
as $u\to1$, uniformly for $u\in[0,1]^d$, where $\|\cdot\|_D$ is the D-norm on $\mathbb{R}^d$ which corresponds to $G$ in the sense of Theorem 2.3.4.

A consequence of the preceding result is that multivariate extreme value theory actually means extreme value theory for copulas.

Proof. Suppose that $F\in\mathcal{D}(G)$. According to Theorem 2.3.4, $G$ can be represented by a D-norm $\|\cdot\|_D$ together with the functions $\psi_i(x)=\log(G_i(x))$, $1\le i\le d$, where $G_i$ denotes the $i$-th univariate margin of $G$. In particular, we have that each univariate margin $F_i$ of $F$ satisfies $F_i\in\mathcal{D}(G_i)$, i.e.,
$$F_i^n(a_{ni}x+b_{ni})\to_{n\to\infty}G_i(x),\qquad x\in\mathbb{R}.$$
Taking the logarithm on both sides and applying the Taylor expansion $\log(1+\varepsilon)=\varepsilon+O(\varepsilon^2)$ for $\varepsilon\to0$, one obtains for $x\in\mathbb{R}$ with $G_i(x)>0$
$$n\log(F_i(a_{ni}x+b_{ni}))\to_{n\to\infty}\log(G_i(x))$$
or
$$n(F_i(a_{ni}x+b_{ni})-1)\to_{n\to\infty}\log(G_i(x))=\psi_i(x).$$
Using Sklar's theorem and repeating the preceding arguments, we obtain
$$\begin{aligned}
&F^n(a_nx+b_n)\to_{n\to\infty}G(x)\\
&\iff C_F^n\bigl((F_i(a_{ni}x_i+b_{ni}))_{i=1}^d\bigr)\to_{n\to\infty}G(x)\\
&\iff C_F^n\Bigl(\Bigl(1+\frac{n(F_i(a_{ni}x_i+b_{ni})-1)}{n}\Bigr)_{i=1}^d\Bigr)\to_{n\to\infty}G(x)\\
&\;\Rightarrow\;C_F^n\Bigl(\Bigl(1+\frac{\psi_i(x_i)}{n}\Bigr)_{i=1}^d\Bigr)\to_{n\to\infty}G(x)\\
&\;\Rightarrow\;C_F^n\Bigl(\Bigl(1+\frac{y_i}{n}\Bigr)_{i=1}^d\Bigr)\to_{n\to\infty}G\bigl(\bigl(\psi_i^{-1}(y_i)\bigr)_{i=1}^d\bigr)=\exp\bigl(-\bigl\|(y_i)_{i=1}^d\bigr\|_D\bigr)
\end{aligned}$$
for $(y_1,\dots,y_d)<0\in\mathbb{R}^d$ according to Theorem 2.3.4. Proposition 3.1.5 now yields the implication "⇒" in Proposition 3.1.10.

The reverse implication is easily seen: for $x=(x_1,\dots,x_d)$ with $0<G_i(x_i)\le1$, $1\le i\le d$, we have
$$\begin{aligned}
F^n(a_nx+b_n)&=C_F^n\Bigl(\Bigl(1+\frac{n(F_i(a_{ni}x_i+b_{ni})-1)}{n}\Bigr)_{i=1}^d\Bigr)\\
&=\Bigl(1-\frac1n\bigl\|(n(F_i(a_{ni}x_i+b_{ni})-1))_{i=1}^d\bigr\|_D+o\Bigl(\frac1n\Bigr)\Bigr)^n\\
&\to_{n\to\infty}\exp\bigl(-\|(\psi_1(x_1),\dots,\psi_d(x_d))\|_D\bigr)\\
&=G(x_1,\dots,x_d)
\end{aligned}$$
according to Theorem 2.3.4.

Remark 3.1.11 The original formulation of the preceding characterization by Deheuvels (1984) and Galambos (1987) is as follows: $F\in\mathcal{D}(G)$ iff this is true for the univariate margins of $F$ together with convergence of the copulas:
$$C_F^n\bigl(u^{1/n}\bigr)\to_{n\to\infty}C_G(u)=G\bigl(\bigl(G_i^{-1}(u_i)\bigr)_{i=1}^d\bigr),$$
$u=(u_1,\dots,u_d)\in(0,1)^d$, where $G_i$ denotes the $i$-th margin of the general EVD $G$, $1\le i\le d$; see the following Corollary 3.1.12.

Note that $C_F^n\bigl(u^{1/n}\bigr)$ is the copula of $\max_{1\le i\le n}U^{(i)}$, where $U^{(1)},U^{(2)},\dots$ are iid rvs that follow the copula $C_F$: put
$$H(u):=P\Bigl(\max_{1\le i\le n}U^{(i)}\le u\Bigr)=C_F^n(u).$$
Each univariate margin of $H$ is
$$H_j(u)=P\Bigl(\max_{1\le i\le n}U_j^{(i)}\le u\Bigr)=u^n,\qquad u\in[0,1];$$
thus, the copula $C_H$ corresponding to the continuous df $H$ is
$$C_H(u_1,\dots,u_d)=H\bigl(H_1^{-1}(u_1),\dots,H_d^{-1}(u_d)\bigr)=C_F^n\bigl(u_1^{1/n},\dots,u_d^{1/n}\bigr),$$
$u=(u_1,\dots,u_d)\in[0,1]^d$.


The next result completes the list of characterizations of a copula that
belongs to the domain of attraction of a max-stable distribution.
Corollary 3.1.12 A copula C on Rd satisfies C ∈ D(G), G(x) =
exp(− xD ), x ≤ 0 ∈ Rd , iff for any u ∈ (0, 1)d
   d 
C n u1/n →n→∞ CG (u) = G G−1 i (u i ) i=1
= exp(− log(u)D ),

where log(u) is meant componentwise.

Proof. We know from Proposition 3.1.5 that the condition $C\in\mathcal{D}(G)$ is equivalent to the expansion
$$C(u)=1-\|1-u\|_D+o(\|1-u\|)$$
as $u\to1$, uniformly for $u\in[0,1]^d$. As a consequence, we obtain
$$C^n\bigl(u^{1/n}\bigr)=\Bigl(1-\bigl\|1-u^{1/n}\bigr\|_D+o\bigl(\bigl\|1-u^{1/n}\bigr\|\bigr)\Bigr)^n.$$
Choose $u_i=\exp(x_i)$, $x_i\le0$, $1\le i\le d$. Then, the Taylor expansion $1-\exp(\varepsilon)=-\varepsilon+O(\varepsilon^2)$ as $\varepsilon\to0$ implies, with $x=(x_1,\dots,x_d)$ and $\exp(x)$ also meant componentwise,
$$\begin{aligned}
C^n\bigl(u^{1/n}\bigr)&=C^n\Bigl(\exp\Bigl(\frac xn\Bigr)\Bigr)\\
&=\Bigl(1-\Bigl\|1-\exp\Bigl(\frac xn\Bigr)\Bigr\|_D+o\Bigl(\Bigl\|1-\exp\Bigl(\frac xn\Bigr)\Bigr\|\Bigr)\Bigr)^n\\
&=\Bigl(1-\Bigl\|\frac xn+O\Bigl(\frac{1}{n^2}\Bigr)\Bigr\|_D+o\Bigl(\frac1n\Bigr)\Bigr)^n\\
&=\Bigl(1-\frac1n\Bigl\|x+O\Bigl(\frac1n\Bigr)\Bigr\|_D+o\Bigl(\frac1n\Bigr)\Bigr)^n\\
&\to_{n\to\infty}\exp(-\|x\|_D)\\
&=\exp(-\|\log(u)\|_D),
\end{aligned}$$
which completes the proof of the implication "⇒."

Suppose on the other hand that, for $u\in(0,1)^d$,
$$C^n\bigl(u^{1/n}\bigr)\to_{n\to\infty}C_G(u)=\exp(-\|\log(u)\|_D).$$
With $u:=\exp(x)$, $x\le0\in\mathbb{R}^d$, we obtain
$$C^n\Bigl(\exp\Bigl(\frac xn\Bigr)\Bigr)\to_{n\to\infty}\exp(-\|x\|_D).$$
Writing $\exp(x/n)=1+x/n+o(1/n)$, this becomes
$$C^n\Bigl(1+\frac xn+o\Bigl(\frac1n\Bigr)\Bigr)\to_{n\to\infty}\exp(-\|x\|_D).$$
But
$$C^n\Bigl(1+\frac xn+o\Bigl(\frac1n\Bigr)\Bigr)=C^n\Bigl(1+\frac xn\Bigr)+o(1)$$
as $n\to\infty$, which follows from the general bound
$$|F(x)-F(y)|\le\sum_{i=1}^d|F_i(x_i)-F_i(y_i)|$$
for an arbitrary df $F$ on $\mathbb{R}^d$ with univariate margins $F_i$ (see, for example, Reiss (1989, Lemma 2.2.6)), together with the fact that the copula $C$ has uniform margins. We have thus established
$$C^n\Bigl(1+\frac xn\Bigr)\to_{n\to\infty}\exp(-\|x\|_D)=G(x),$$
$x\le0\in\mathbb{R}^d$, which completes the proof.

Expansion of Survival Copula via Dual D-Norm Function

If a copula $C$ satisfies $C\in\mathcal{D}(G)$, $G(x)=\exp(-\|x\|_D)$, $x\le0\in\mathbb{R}^d$, the survival function of $C$ can be approximated by the dual D-norm function corresponding to $\|\cdot\|_D$. This is the content of our next result.

We have to clarify some notation first. Let $\|\cdot\|_D$ be an arbitrary D-norm on $\mathbb{R}^d$ with generator $Z=(Z_1,\dots,Z_d)$. Choose a non-empty subset $T\subset\{1,\dots,d\}$, i.e., $T=\{i_1,\dots,i_m\}$ with $1\le i_1<\dots<i_m\le d$, $m=|T|$. Then,
$$\|y\|_{D_T}:=\Bigl\|\sum_{j=1}^my_je_{i_j}\Bigr\|_D=E\Bigl(\max_{1\le j\le m}|y_j|\,Z_{i_j}\Bigr),\qquad y\in\mathbb{R}^m,$$
is a D-norm on $\mathbb{R}^m$; it is the projection of $\|\cdot\|_D$ onto the components in $T$. Its dual D-norm function is
$$⦀y⦀_{D_T}=E\Bigl(\min_{1\le j\le m}|y_j|\,Z_{i_j}\Bigr),\qquad y\in\mathbb{R}^m.$$

Lemma 3.1.13 Let $G$ be an SMS df with corresponding D-norm $\|\cdot\|_D$ on $\mathbb{R}^d$, and let $U=(U_1,\dots,U_d)$ be an rv that follows a copula $C$. Then, we have $C\in\mathcal{D}(G)$ iff for every non-empty subset $T=\{i_1,\dots,i_m\}\subset\{1,\dots,d\}$
$$P\bigl(U_{i_j}\ge u_j,\ 1\le j\le m\bigr)=⦀1-u⦀_{D_T}+o(\|1-u\|)$$
as $u=(u_1,\dots,u_m)\to1\in\mathbb{R}^m$, uniformly for $u\in[0,1]^m$.

The proof of the preceding lemma shows that, for every $T=\{i_1,\dots,i_m\}\subset\{1,\dots,d\}$,
$$P\bigl(U_{i_j}\ge u_j,\ 1\le j\le m\bigr)=⦀1-u⦀_{D_T}$$
for $u$ close to $1\in\mathbb{R}^m$, if $C$ is a GPC.

The uniformity condition on $u$ in the preceding result can be dropped for the reverse implication "⇐."

Note that the survival probability $P(U\ge u)$ of an rv $U$ that follows a copula $C$, also known as a survival copula, is not a copula itself.

Proof. We first establish the implication "⇒." We can assume wlog that $T=\{1,\dots,d\}$. From Proposition 3.1.5, we obtain the expansion
$$C(u)=1-\|1-u\|_D+o(\|1-u\|)$$
as $u\to1$, uniformly for $u\in[0,1]^d$, if $C\in\mathcal{D}(G)$, $G(x)=\exp(-\|x\|_D)$, $x\le0\in\mathbb{R}^d$. The inclusion–exclusion principle in Corollary 1.6.2 implies
$$\begin{aligned}
P(U\ge1-v)&=1-P\Bigl(\bigcup_{i=1}^d\{U_i\le1-v_i\}\Bigr)\\
&=1-\sum_{\emptyset\ne T\subset\{1,\dots,d\}}(-1)^{|T|-1}P(U_i\le1-v_i,\ i\in T)\\
&=1-\sum_{\emptyset\ne T\subset\{1,\dots,d\}}(-1)^{|T|-1}\Bigl(1-\Bigl\|\sum_{i\in T}v_ie_i\Bigr\|_D+o\Bigl(\Bigl\|\sum_{i\in T}v_ie_i\Bigr\|\Bigr)\Bigr)\\
&=\sum_{\emptyset\ne T\subset\{1,\dots,d\}}(-1)^{|T|-1}\Bigl\|\sum_{i\in T}v_ie_i\Bigr\|_D+o(\|v\|)
\end{aligned}$$
as $v\to0$, uniformly for $v\in[0,1]^d$; recall that, by equation (1.10),
$$\sum_{\emptyset\ne T\subset\{1,\dots,d\}}(-1)^{|T|-1}=1.$$
Choose a generator $Z=(Z_1,\dots,Z_d)$ of $\|\cdot\|_D$. From Lemma 1.6.1, we obtain
$$\begin{aligned}
\sum_{\emptyset\ne T\subset\{1,\dots,d\}}(-1)^{|T|-1}\Bigl\|\sum_{i\in T}v_ie_i\Bigr\|_D
&=\sum_{\emptyset\ne T\subset\{1,\dots,d\}}(-1)^{|T|-1}E\Bigl(\max_{i\in T}(v_iZ_i)\Bigr)\\
&=E\Bigl(\sum_{\emptyset\ne T\subset\{1,\dots,d\}}(-1)^{|T|-1}\max_{i\in T}(v_iZ_i)\Bigr)\\
&=E\Bigl(\min_{1\le i\le d}(v_iZ_i)\Bigr)=⦀v⦀_D.
\end{aligned}$$
Replacing $v$ by $1-u$ yields the assertion.

The reverse implication can be seen as follows. Choose $x\ge0\in\mathbb{R}^d$ and $s>0$. The inclusion–exclusion principle in Corollary 1.6.2 implies
$$\begin{aligned}
\frac{1-C(1-sx)}{s}&=\frac{1-P(U_i\le1-sx_i,\ 1\le i\le d)}{s}\\
&=\frac{P\bigl(\bigcup_{i=1}^d\{U_i\ge1-sx_i\}\bigr)}{s}\\
&=\sum_{\emptyset\ne T\subset\{1,\dots,d\}}(-1)^{|T|-1}\,\frac{P(U_i\ge1-sx_i,\ i\in T)}{s}\\
&\to_{s\downarrow0}\sum_{\emptyset\ne T\subset\{1,\dots,d\}}(-1)^{|T|-1}E\Bigl(\min_{i\in T}(x_iZ_i)\Bigr)\\
&=E\Bigl(\sum_{\emptyset\ne T\subset\{1,\dots,d\}}(-1)^{|T|-1}\min_{i\in T}(x_iZ_i)\Bigr)\\
&=E\Bigl(\max_{1\le i\le d}(x_iZ_i)\Bigr)=\|x\|_D
\end{aligned}$$
and thus, $C\in\mathcal{D}(G)$ by Corollary 3.1.6.

The following example is established in Charpentier and Segers (2009, Theorem 4.1). With $\|x\|_p=\bigl(\sum_{i=1}^d|x_i|^p\bigr)^{1/p}$, $p\in[1,\infty]$, we denote again the family of logistic norms. These are D-norms, as seen in Section 1.2, with $\lim_{p\to\infty}\|x\|_p=\|x\|_\infty$; see Lemma 1.1.2.

Example 3.1.14 Take an arbitrary Archimedean copula on $\mathbb{R}^d$
$$C_\varphi(u)=\varphi^{-1}(\varphi(u_1)+\dots+\varphi(u_d)),$$
where $\varphi$ is a continuous and strictly decreasing function from $(0,1]$ to $[0,\infty)$ with $\varphi(1)=0$ (see, for example, McNeil and Nešlehová (2009, Theorem 2.2)). Let $U=(U_1,\dots,U_d)$ follow this copula $C_\varphi$. Suppose that
$$p:=-\lim_{s\downarrow0}\frac{s\varphi'(1-s)}{\varphi(1-s)}\tag{3.11}$$
exists in $[1,\infty]$. Then, for $x\ge0\in\mathbb{R}^d$, the survival copula satisfies
$$\lim_{s\downarrow0}\frac{P(U_i\ge1-sx_i,\ 1\le i\le d)}{s}=\begin{cases}⦀x⦀_1=0, & \text{if }p=1,\\ ⦀x⦀_p, & \text{if }1<p<\infty,\\ ⦀x⦀_\infty=\min\{x_1,\dots,x_d\}, & \text{if }p=\infty.\end{cases}$$
If $p=1$, then the margins of $C_\varphi$ are tail independent. This concerns both the Clayton copula and the Frank copula, with generators $\varphi_\lambda(t)=(t^{-\lambda}-1)/\lambda$, $\lambda\ge0$, and $\varphi_\lambda(t)=-\log((\exp(-\lambda t)-1)/(\exp(-\lambda)-1))$, $\lambda\in\mathbb{R}\setminus\{0\}$, respectively, but not the Gumbel copula with generator $\varphi_\lambda(t)=(-\log(t))^\lambda$, $\lambda>1$, in which case $p=\lambda$.

The preceding example gives rise to the conjecture that $C_\varphi\in\mathcal{D}(G_p)$ under condition (3.11). This conjecture can easily be established.

Corollary 3.1.15 Let $C_\varphi$ be an arbitrary Archimedean copula on $\mathbb{R}^d$ with generator $\varphi$ that satisfies condition (3.11). Then, $C_\varphi\in\mathcal{D}(G_p)$, where $G_p$ is the standard max-stable df with D-norm $\|\cdot\|_p$, $p\in[1,\infty]$.

Proof. Suppose that the rv $U=(U_1,\dots,U_d)$ follows the Archimedean copula $C_\varphi$. The distribution of an arbitrary subset $(U_{i_1},\dots,U_{i_m})$, $m\le d$, with different indices, is an Archimedean copula as well, but this time on $\mathbb{R}^m$:
$$P(U_{i_1}\le u_{i_1},\dots,U_{i_m}\le u_{i_m})=P(U_{i_1}\le u_{i_1},\dots,U_{i_m}\le u_{i_m};\ U_i\le1,\ 1\le i\le d)=\varphi^{-1}(\varphi(u_{i_1})+\dots+\varphi(u_{i_m}))$$
as $\varphi(1)=0$. Since condition (3.11) does not depend on the dimension $d$, the preceding Example 3.1.14 also entails that, for $x=(x_1,\dots,x_m)\ge0\in\mathbb{R}^m$,
$$\lim_{s\downarrow0}\frac{P(U_{i_1}\ge1-sx_1,\dots,U_{i_m}\ge1-sx_m)}{s}=\begin{cases}⦀x⦀_1=0, & \text{if }p=1,\\ ⦀x⦀_p, & \text{if }1<p<\infty,\\ ⦀x⦀_\infty=\min\{x_1,\dots,x_m\}, & \text{if }p=\infty,\end{cases}\tag{3.12}$$
where these dual D-norm functions are defined on $\mathbb{R}^m$. Lemma 3.1.13 now implies the assertion.
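A numerical check of condition (3.11) for the Gumbel generator, where $p=\lambda$ is expected (the helper names are ours):

```python
import math

# Gumbel generator phi(t) = (-log t)^lambda and its derivative
def phi(t: float, lam: float) -> float:
    return (-math.log(t)) ** lam

def dphi(t: float, lam: float) -> float:
    return -lam * (-math.log(t)) ** (lam - 1) / t

lam = 2.5
for s in (1e-2, 1e-4, 1e-6):
    print(-s * dphi(1 - s, lam) / phi(1 - s, lam))   # -> lambda = 2.5
```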

Example 3.1.16 (Continuation of Example 3.1.14) Let $C_\varphi$ be an Archimedean copula on $\mathbb{R}^d$ whose generator function $\varphi:(0,1]\to[0,\infty)$ satisfies, with some $s_0\in(0,1)$,
$$-\frac{s\varphi'(1-s)}{\varphi(1-s)}=p,\qquad s\in(0,s_0],\tag{3.13}$$
with $p\in[1,\infty)$. Then, $C_\varphi$ is a GPC; precisely,
$$C_\varphi(u)=1-\|1-u\|_p,\qquad u\in[1-s_0,1]^d,$$
where $\|\cdot\|_p$ is the usual logistic norm; see Proposition 1.2.1.

This is readily seen as follows. Condition (3.13) is equivalent to the equation
$$\bigl(\log(\varphi(1-s))\bigr)'=\frac ps,\qquad s\in(0,s_0].$$
Integrating both sides implies
$$\log(\varphi(1-s))-\log(\varphi(1-s_0))=p\log(s)-p\log(s_0)$$
or
$$\log\Bigl(\frac{\varphi(1-s)}{\varphi(1-s_0)}\Bigr)=\log\Bigl(\Bigl(\frac{s}{s_0}\Bigr)^p\Bigr),\qquad s\in(0,s_0],$$
which yields
$$\varphi(1-s)=\frac{\varphi(1-s_0)}{s_0^p}\,s^p,\qquad s\in[0,s_0],$$
i.e.,
$$\varphi(s)=c(1-s)^p,\qquad s\in[1-s_0,1],$$
with $c:=\varphi(1-s_0)/s_0^p$. But this implies
$$C_\varphi(u)=\varphi^{-1}(\varphi(u_1)+\dots+\varphi(u_d))=1-\Bigl(\sum_{i=1}^d(1-u_i)^p\Bigr)^{1/p},\qquad u\in[1-s_0,1]^d.$$

There Are Strictly More D-Norms than Copulas


Let the rv U = (U1 , . . . , Ud ) follow a copula, i.e., each component Ui is uni-
1
formly distributed on (0, 1). Since E(Ui ) = 0 u du = 1/2, the rv Z := 2U
generates a D-norm.
Sklar’s theorem 3.1.1 may promote the idea that every D-norm can be
generated this way. But this is not true. Take, for example, d = 2 and
(x, y)1 = |x| + |y|. Suppose that there exists an rv U = (U1 , U2 ) follow-
ing a copula such that

(x, y)1 = 2E (max(|x| U1 , |y| U2 )) , x, y ∈ R.

Putting x = y = 1, we obtain
 
2 = 2E max(U1 , U2 ) ,

or  
E 1 − max(U1 , U2 ) = 0
  
∈[0,1]

and, thus,
P (max(U1 , U2 ) = 1) = 1.
But

P (max(U1 , U2 ) = 1) = P ({U1 = 1} ∪ {U2 = 1})


≤ P (U1 = 1) + P (U2 = 1) = 0.

Moreover, it is obvious that ·1 on Rd with d ≥ 3 cannot be generated


by 2U , as (1, . . . , 1)1 = d > 2E (max1≤i≤d Ui ). There are, consequently,
strictly more D-norms than copulas. (Note that this is not meant in a strict
mathematical sense.)

3.2 Multivariate Piecing-Together


It is by no means obvious to find a copula C that does not satisfy F ∈
D(G) for some SMS df G. Counterexamples are provided in Section 3.3. As a
consequence of the considerations in Section 3.1, we obtain that a copula C(u)
can reasonably be approximated for u close to 1 ∈ Rd only by 1 − 1 − uD ,
with some D-norm ·D .
This message has the following implication: if you want to model the copula
underlying multivariate data above some high threshold u0 , a GPC is a first
option, given in its upper tail by

Q(u) = 1 − 1 − uD , u 0 ≤ u ≤ 1 ∈ Rd . (3.14)


3.2 Multivariate Piecing-Together 153

This idea is investigated in what follows. It turns out that it is actually possible
to cut off the upper tail of a given copula C and to impute a GPC Q in such
a way that the result is again a copula.
Note that

Q̃(u) := max(1 − 1 − uD , 0), 0 ≤ u ≤ 1,

defines a copula only in dimension d = 2; for details, we refer the reader to


Falk et al. (2011, Section 5.2).

Univariate Peaks-Over-Threshold Approach


As shown in (2.6), the upper tail of a univariate df F can reasonably be
approximated only by that of a GPD, which leads to the (univariate) peaks-
over-threshold approach (POT): For a univariate rv X with df F , set

F (x) − F (x0 )
F [x0 ] (x) = P (X ≤ x | X > x0 ) = , x ≥ x0 ,
1 − F (x0 )

where we require F (x0 ) < 1. The univariate POT is the approximation of the
upper tail of F by that of a GPD H

F (x) = (1 − F (x0 ))F [x0 ] (x) + F (x0 )


≈POT (1 − F (x0 ))Hα,μ,σ (x) + F (x0 ), x ≥ x0 ,

where α, μ, and σ are shape, location and scale parameters of the GPD
H respectively. Recall that the family of univariate standardized GPDs is
given by


⎨1 − (−x) , −1 ≤ x ≤ 0, if α > 0,
α

Hα (x) = 1 − xα , x ≥ 1, if α < 0,


1 − exp(−x), x ≥ 0, if α = 0.

The preceding considerations lead to the univariate piecing-together ap-


proach (PT), by which the underlying df F is replaced by

F (x), x < x0 ,
Fx∗0 (x) = (3.15)
(1 − F (x0 ))Hα,μ,σ (x) + F (x0 ), x ≥ x0 ,

typically in a continuous manner. This approach is aimed at an investigation


of the upper end of F beyond observed data. Replacing F in (3.15) by the
empirical df of the data provides in particular a semiparametric approach to
the estimation of high quantiles; see, for example, Reiss and Thomas (2007,
Section 2.3).
154 3 Copulas & Multivariate Extremes

Multivariate Piecing-Together
A multivariate extension of the univariate PT approach was developed in
Aulbach et al. (2012a) and, for illustration, applied to operational loss data.
This approach is based on the idea that a multivariate df F can be decomposed
by Sklar’s theorem 3.1.1 into its copula C and its marginal df. The multivariate
PT approach then consists of the two steps:
(i) The upper tail of the given d-dimensional copula C is cut off and sub-
stituted by a GPC in a continuous manner, such that the result is again
a copula, called a PT copula. Figure 3.1 illustrates this approach in the
bivariate case: the copula C is replaced in the upper right rectangle of
the unit square by a GPC Q; the lower part of C is kept in the lower left
rectangle, whereas the other two rectangles are needed for a continuous
transition from C to Q.
(ii) Univariate df F1∗ , . . . , Fd∗ are injected into the resulting copula.

(0, 1) (1, 1)

(0, 0) (1, 0)

Fig. 3.1: Multivariate piecing-together.

Taken as a whole, this approach provides a multivariate df with prescribed


margins Fi∗ , whose copula coincides in its lower and central parts with C and
in its upper tail with a GPC.
3.2 Multivariate Piecing-Together 155

Let U = (U1 , . . . , Ud ) follow an arbitrary copula C and V = (V1 , . . . , Vd )


follow a GPC. Let Z be a generator of the corresponding D-norm ·D . We
suppose that U and V are independent.
Choose a threshold u = (u1 , . . . , ud ) ∈ [u0 , 1] and put for 1 ≤ i ≤ d

Yi := Ui 1(Ui ≤ ui ) + (ui + (1 − ui )Vi )1(Ui > ui ). (3.16)

The rv Y = (Y1 , . . . , Yd ) actually follows a GPC; the following result provides


a precise characterization of the corresponding D-norm as well.
Theorem 3.2.1 Suppose that P (U > u) > 0. The rv Y defined
through (3.16) follows a GPC, which coincides with C on [0, u] ⊂ [0, 1]d,
and whose D-norm is given by

1(Uj > uj )
xD = E max |xj | Zj , x ∈ Rd ,
1≤j≤d 1 − uj

where Z and U are independent.

Note that Z 2 := (Z21 , . . . , Z


2d ), with Z
2j := Zj 1(Uj > uj )/(1 − uj ) is a
generator of a D-norm, due to the independence of Z and U . As Xj := 1(Uj >
uj )/(1 − uj ) is non-negative and has expectation one, X = (X1 , . . . , Xd ) is
the generator of a D-norm itself, and consequently, Z̃ is the generator of the
product of the D-norms, as investigated in Section 1.9.

Proof. Elementary computations yield

P (Yi ≤ x) = x, 0 ≤ x ≤ 1,

i.e., Y follows a copula. We have, moreover, for 0 ≤ x ≤ u,

P (Y ≤ x)
  
= P Y ≤ x; Uk ≤ uk , k ∈ K; Uj > uj , j ∈ K 
K⊂{1,...,d}
 
= P Ui 1(Ui ≤ ui ) + (ui + (1 − ui )Vi )1(Ui > ui ) ≤ xi , 1 ≤ i ≤ d;
K⊂{1,...,d}

Uk ≤ uk , k ∈ K; Uj > uj , j ∈ K 
= P (Ui ≤ xi , 1 ≤ i ≤ d)
= C(x)

and, for u < x ≤ 1,


156 3 Copulas & Multivariate Extremes

P (Y ≤ x)
  
= P Y ≤ x; Uk ≤ uk , k ∈ K; Uj > uj , j ∈ K 
K⊂{1,...,d}
  
= P Uk ≤ uk , k ∈ K; uj + (1 − uj )Vj ≤ xj , Uj > uj , j ∈ K 
K⊂{1,...,d}
  

 xj − uj 
= P Uk ≤ uk , k ∈ K; Uj > uj , j ∈ K P Vj ≤ ,j ∈ K
1 − uj
K⊂{1,...,d}
⎛  ⎞
  
= E⎝ 1(Uk ≤ uk ) 1(Uj > uj ) ⎠
K⊂{1,...,d} k∈K j∈K 

xj − uj
×P Vj ≤ , j ∈ K .
1 − uj

If x < 1 is close enough to 1, then we have, for K 


= ∅,

! !

xj − uj ! xj − uj !
P Vj ≤ ,j ∈ K 
= 1 − E max ! ! − 1!! Zj
1 − uj j∈K  1 − uj

|xj − 1|
= 1 − E max Zj
j∈K  1 − uj

and, thus, using the independence of U and Z,

P (Y ≤ x)
= P (Uk ≤ uk , 1 ≤ k ≤ d)
⎛  ⎞
  
+ E⎝ 1(Uk ≤ uk ) 1(Uj > uj ) ⎠
K⊂{1,...,d} k∈K j∈K 
K  =∅

|xj − 1|
× 1 − E max Zj
j∈K  1 − uj
  
  
=1− E 1(Uk ≤ uk ) 1(Uj > uj )
K⊂{1,...,d} k∈K j∈K 
K  =∅


|xj − 1|
× max Zj
j∈K  1 − uj
   
  
=1−E 1(Uk ≤ uk ) 1(Uj > uj )
K⊂{1,...,d} k∈K j∈K 
K  =∅


|xj − 1|
× max Zj
j∈K  1 − uj
3.2 Multivariate Piecing-Together 157
   
  
=1−E 1(Uk ≤ uk ) 1(Uj > uj )
K⊂{1,...,d} k∈K j∈K 
K  =∅


|xj − 1|
× max Zj 1(Uj > uj )
1≤j≤d 1 − uj


1(Uj > uj )
=1−E max |xj − 1| Zj
1≤j≤d 1 − uj
  
  
× 1(Uk ≤ uk ) 1(Uj > uj )
K⊂{1,...,d} k∈K j∈K 
K  =∅


1(Uj > uj )
= 1 − E max |xj − 1| Zj (1 − 1(Uj ≤ uj , 1 ≤ j ≤ d))
1≤j≤d 1 − uj

1(Uj > uj )
= 1 − E max |xj − 1| Zj
1≤j≤d 1 − uj
= 1 − x − 1D ,

where we have used the identity


  
  
1(Uk ≤ uk ) 1(Uj > uj ) = 1.
K⊂{1,...,d} k∈K j∈K 

This completes the proof of Theorem 3.2.1.


The following result justifies the use of the multivariate PT approach, as


it shows that the PT vector Y , suitably standardized, approximately follows
the distribution of U close to one.
Proposition 3.2.2 Suppose that U = (U1 , . . . , Ud ) follows a copula
C ∈ D(G) with corresponding D-norm ·D generated by Z. If the rv
V in the definition (3.16) of the PT vector Y has this generator Z as
well, then we have

P (U > v) = P (Yj > uj + vj (1 − uj ), 1 ≤ j ≤ d | U > u) + o(1 − v)


= P (V > v) + o(1 − v),

uniformly for v ∈ [u, 1] ⊂ Rd .

The term o(1 − v) can be dropped in the preceding result if C is a GPC
itself, precisely if C(v) = 1 − 1 − vD , v ∈ [u, 1] ⊂ Rd .
158 3 Copulas & Multivariate Extremes

Proof. From Lemma 3.1.13, we obtain the expansion


P (U > v) = E min ((1 − vj )Zj ) + o(1 − v)


1≤j≤d

uniformly for v ∈ [0, 1]d . On the other hand, we have for v close enough to 1,

P (Yj > uj + vj (1 − uj ), 1 ≤ j ≤ d | U > u) = P (Vj > vj , 1 ≤ j ≤ d)


= E min ((1 − vj )Zj ) ,


1≤j≤d

where the final equation follows from (2.16). This completes the proof.

If the copula C is not known, the preceding PT approach can be modi-


fied by replacing C with the empirical copula; see Aulbach et al. (2012b) for
details.

3.3 Copulas Not in the Domain of Attraction


It is by no means obvious to find a copula C that does not satisfy C ∈ D(G)
for some SMS df G. An example is given in Kortschak and Albrecher (2009).
The following result provides a parametric family of bivariate rvs that are
easy to simulate. Each member of this family, whose parameter is different
from zero, has the property that its corresponding copula does not satisfy the
extreme value condition (3.8). These bivariate copulas can easily be used to
construct copulas in arbitrary dimension that are not in the domain of an
SMS df; just add independent components.
 
Lemma 3.3.1 Let 3 V√be an√rv4 with df Hλ (u) := u 1 + λ sin(log(u)) ,
0 ≤ u ≤ 1, λ ∈ −1/ 2, 1/ 2 . Note that Hλ (0) = 0, Hλ (1) = 1, and
Hλ (u) ≥ 0 for 0 < u < 1. Furthermore, let the rv U be independent of
V and uniformly distributed on (0, 1). Put S1 := U =: 1 − S2 . Then,
the copula Cλ corresponding to the bivariate rv

V 1 1
X := − , ∈ (−∞, 0]2 (3.17)
2 S1 S2

is not in the domain of attraction of a multivariate EVD if λ


= 0,
whereas C0 ∈ D(G) with corresponding D-norm

|x1 | |x2 |
xD = x1 −
x1

for x = (x1 , x2 )
= 0.
3.3 Copulas Not in the Domain of Attraction 159

Denote by Fλ the df of −V /S1 =D −V /S2 . Elementary computations yield


1
1 
+ λ5 , if x ≤ −1,
Fλ (x) =
|x|  2
 
1
1 − |x| 2 + 5 2 sin(log |x|) − cos(log |x|) , if −1 < x < 0,
λ

thus, Fλ is continuous and strictly increasing on (−∞, 0].

Proof (of Lemma 3.3.1). We show that

1 − Cλ (1 − t, 1 − t)
lim
t↓0 t
3 √ √ 4
does not exist for λ ∈ −1/ 2, 1/ 2 \{0}. Since Cλ coincides with the copula
of 2X, we obtain
   
1 − Cλ Fλ (s), Fλ (s) 1 − P −V /S1 ≤ s, −V /S2 ≤ s
=  
1 − Fλ (s) 1 − P −V /S1 ≤ s
 
1 − P V ≥ |s| max(U, 1 − U )
= 
1 − P V ≥ |s| U )
1  
P V ≤ |s| max(u, 1 − u) du
= 0 1  
0
P V ≤ |s| u du
 1/2   1  
0 Hλ |s| (1 − u) du + 1/2 Hλ |s| u du
= 1  
0 Hλ |s| u du
1  
H |s| u du
1/2 λ
= 2 1   .
0
Hλ |s| u du

The substitution u → u/ |s| yields


   |s|
1 1 − Cλ Fλ (s), Fλ (s) |s|/2 Hλ (u) du
1− = 1 −  |s|
2 1 − Fλ (s)
0 Hλ (u) du
 |s|/2
Hλ (u) du
= 0 |s| ,
0
H λ (u) du

where we have for each 0 < c ≤ 1


 c  c
c2
Hλ (u) du = +λ u sin(log(u)) du
0 2 0

and  c
1 c2  
u2 sin(log(u)) du = 2 sin(log(c)) − cos(log(c)) ,
0 u 5
160 3 Copulas & Multivariate Extremes

which can be seen by applying integration by parts twice. Hence, we obtain


 |s|/2
0
Hλ (u) du
 |s|
0 Hλ (u) du
 
1 12 + λ5 2 sin(log |s| − log(2)) − cos(log |s| − log(2))
= 1
  ,
2 + 5 2 sin(log |s|) − cos(log |s|)
4 λ

3 √ √ 4
whose limit does not exist for s ↑ 0 if λ ∈ 1/ 2, 1/ 2 \ {0}; consider, e.g.,
(1)   (2)  
the sequences sn = − exp (1 − 2n)π and sn = − exp (1/2 − 2n)π as
n → ∞.
On the other hand, elementary computations for x = (x1 , x2 ) ∈ (−∞, 0]2 \
{0} show

1 − C0 (1 + tx) |x1 | |x2 |


lim = 2E(max(|x1 | S1 , |x2 | S2 )) = x1 − .
t↓0 t x1

Corollary 3.1.6 now implies that C0 ∈ D(G), with the corresponding D-norm
being the above limit.

4
An Introduction to Functional Extreme Value
Theory

The extension of D-norms to functional spaces in Section 1.10 provides a


smooth approach to functional extreme value theory, in particular to general-
ized Pareto processes and max-stable processes. Multivariate max-stable dfs
were introduced in Section 2.3 by means of generalized Pareto distributions.
We repeat this approach and introduce max-stable processes via generalized
Pareto processes. In Section 4.3, we show how to generate max-stable pro-
cesses via SMS rvs. This approach, which generalizes the max-linear model
established by Wang and Stoev (2011), entails the prediction of max-stable
processes in space, not in time. The Brown–Resnick process is a prominent
example.

4.1 Generalized Pareto Processes


In this section, we extend the simple multivariate generalized Pareto distri-
bution as defined in equation (2.9) in an obvious way to functional space.

Defining a Simple Generalized Pareto Process


Let Z = (Zt )t∈[0,1] be the generator of a functional D-norm ·D on E[0, 1]
with the additional property

Zt ≤ c, t ∈ [0, 1] , (4.1)

for some constant c ≥ 1. For each functional D-norm, there exists a generator
with this additional property; see Theorem 1.10.8. Let U be an rv that is
uniformly distributed on (0, 1) and that is independent of Z. Put
1 1
V := (Vt )t∈[0,1] := (Zt )t∈[0,1] =: Z. (4.2)
U U

© Springer Nature Switzerland AG 2019 161


M. Falk, Multivariate Extreme Value Theory and D-Norms,
Springer Series in Operations Research and Financial Engineering,
https://doi.org/10.1007/978-3-030-03819-9 4
162 4 An Introduction to Functional Extreme Value Theory

Denote using [0, c][0,1] := {f : [0, 1] → [0, c]} the set of all functions from the
interval [0, 1] to the interval [0, c]. Repeating the arguments in equation (2.11),
we obtain, for g ∈ E[0, 1] with g(t) ≥ c, t ∈ [0, 1],

Zt
P (V ≤ g) = P U ≥ , t ∈ [0, 1]
g(t)


zt  
= P U≥ , t ∈ [0, 1] (P ∗ Z) d (zt )t∈[0,1]
[0,c][0,1] g(t)
 

zt  
= P U ≥ sup (P ∗ Z) d (zt )t∈[0,1]
[0,c][0,1]
t∈[0,1] g(t)
 

zt  
= 1 − P U ≤ sup (P ∗ Z) d (zt )t∈[0,1]
[0,c][0,1] t∈[0,1] g(t)


zt  
=1− sup (P ∗ Z) d (zt )t∈[0,1]
[0,c][0,1] t∈[0,1] g(t)


Zt
= 1 − E sup
t∈[0,1] g(t)

= 1 − 1/gD , (4.3)

i.e., the functional df of the process V is given by 1 − 1/gD if g is pointwise


larger than c. We have, moreover,
1
P (Vt ≤ x) = 1 − , x ≥ c, t ∈ [0, 1],
x
i.e., each marginal df of the process V is equal to the standard Pareto distri-
bution in its upper tail. Therefore, we call the process V a simple generalized
Pareto process (GPP); see Ferreira and de Haan (2014) and Dombry and
Ribatet (2015) for detailed discussions.

Survival Function of a Simple GPP


The following result extends the survival function of a multivariate GPD as
in equation (2.12) to simple GPP. The dual D-norm function corresponding
to a functional D-norm was introduced in (1.28).
Proposition 4.1.1 Let Z = (Zt )t∈[0,1] be the generator of a functional
D-norm ·D with the additional property Z∞ ≤ c for some constant
c ≥ 1. Then, for g ∈ E[0, 1] with g(t) ≥ c, t ∈ [0, 1], we obtain

Zt
P (V ≥ g) = P (V > g) = E inf =  1/g D .
t∈[0,1] g(t)
4.1 Generalized Pareto Processes 163

Proof. Repeating the arguments in equation (4.3), we obtain




zt  
P (V > g) = P U< , t ∈ [0, 1] (P ∗ Z) d (zt )t∈[0,1]
[0,c][0,1] g(t)


zt  
= P U ≤ inf (P ∗ Z) d (zt )t∈[0,1]
[0,c][0,1] t∈[0,1] g(t)


zt  
= inf (P ∗ Z) d (zt )t∈[0,1]
[0,c][0,1] t∈[0,1] g(t)

Zt
=E inf .
t∈[0,1] g(t)

Excursion Stability of a Simple GPP


The following result extends the excursion stability of a multivariate simple
GPD in equation (2.13) to a simple GPP.
Corollary 4.1.2 Under
 the conditions
 of Proposition 4.1.1 and the ad-
ditional condition E inf t∈[0,1] Zt > 0, we obtain

1
P (V ≥ tg | V ≥ g) = , t ≥ 1.
t

Proof. We have

P (V ≥ tg, V ≥ g)
P (V ≥ tg | V ≥ g) =
P (V ≥ g)
P (V ≥ tg)
=
P (V ≥ g)
 1/(tg) D 1
= = .
 1/g D t

The conditional excursion probability P (V ≥ tg|V ≥ g) = 1/t, t ≥ 1,


does not depend on g. We therefore call the process V excursion stable.

Sojourn Time of a Simple GPP


The expected sojourn time of a simple GPP provides another example of its
excursion stability. The time that the simple GPP V = (Vt )t∈[0,1] spends
above the function g ∈ E[0, 1], g ≥ c ≥ 1, called sojourn time above g, is
164 4 An Introduction to Functional Extreme Value Theory
 1
ST (g) = 1(g(t),∞) (Vt ) dt.
0

From Fubini’s theorem, we obtain


 1

E (ST (g)) = E 1(g(t),∞) (Vt ) dt


0
 1  
= E 1(g(t),∞) (Vt ) dt
0
 1
= P (Vt > g(t)) dt
0
 1
1
= dt.
0 g(t)

Recall that P (Vt ≤ x) = 1 − 1/x, x ≥ c, t ∈ [0, 1].


By choosing the constant function g(t) := s ≥ c, we obtain for the expected
sojourn time of the process V above the constant s
 1

1
E(ST (s)) = E 1(s,∞) (Vt ) dt = .
0 s

Given that the sojourn time ST (s) is positive, this implies for the conditional
expectation of the sojourn time the equation

E(ST (s))
E(ST (s) | ST (s) > 0) =
1 − P (ST (s) = 0)
1/s
=
1 − P (Vt ≤ s, t ∈ [0, 1])
1
= , (4.4)
1D

independent of s ≥ c. Different than the multivariate case, where we denote


the vector (1, . . . , 1) ∈ Rd with constant entry 1 using boldface type 1, the
constant function, with value 1 on [0, 1], is denoted using regular type 1.
Interestingly, the number 1D is introduced in equation (2.28) in the
multivariate case as a measure of tail dependence between the margins of the
underlying multivariate df. The tail dependence increases if 1D decreases,
with 1D = 1 being its minimum, attained for ·D = ·∞ .
From Lemma 1.10.2, we know that in the functional case

1D ≥ 1∞ = 1,

and thus, E(ST (s) | ST (s) > 0) increases with decreasing 1D ; its maximum
value is one in the case 1D = 1, which characterizes the functional sup-norm
·∞ by the functional version of Takahashi’s Theorem in Corollary 1.10.5.
4.2 Max-Stable Processes 165

4.2 Max-Stable Processes


Let V (1) , V (2) , . . . be a sequence of independent copies of V = Z/U , where
the generator Z satisfies the additional boundedness condition (4.1). We ob-
tain, for g ∈ E[0, 1], g > 0, and large n ∈ N,

 
1
P max V (i) ≤ g = P V (i) ≤ ng, 1 ≤ i ≤ n
n 1≤i≤n

n  
= P V (i) ≤ ng
i=1
n
= P (V ≤ ng)
 
n
 1 
= 1− 
 ng 
D 

1
→n→∞ exp −  
g  ,
D
(n)
where the mathematical operations max1≤i≤n Vi , etc., are taken compo-
nentwise. The above reasoning is strict if inf t∈[0,1] g(t) > 0. Otherwise, check
that the above convergence is still true with the limit exp(− 1/gD ) = 0.
Next, we ask: Is there a stochastic process ξ = (ξt )t∈[0,1] on [0, 1] with
 

1
P (ξ ≤ g) = exp −  g
 , g ∈ E[0, 1], g > 0?
D

If ξ actually exists: Does it have continuous sample paths?


If such ξ exists, it is a max-stable process: let ξ (1) , ξ (2) , . . . be a sequence of
independent copies of the process ξ. Then we obtain, for arbitrary g ∈ E[0, 1],
g > 0, and any n ∈ N,

1 (i) (i)
P max ξ ≤ g = P max ξ ≤ ng
n 1≤i≤n 1≤i≤n
 
= P ξ (i) ≤ ng, 1 ≤ i ≤ n

n  
= P ξ (i) ≤ ng
i=1
n
= P (ξ ≤ ng)
 
n
 1 
= exp −  
 ng 
  D

1

= exp −   
g D
= P (ξ ≤ g).
Such processes ξ actually exist, see equation (4.7).
166 4 An Introduction to Functional Extreme Value Theory

Standard Max-Stable Processes


We denote using E − [0, 1] := {f ∈ E[0, 1] : f ≤ 0} the set of those functions
in E[0, 1], that attain only non-positive values.
Definition 4.2.1 Let η = (ηt )t∈[0,1] be a stochastic process in C[0, 1],
with the additional property that each component ηt follows the stan-
dard negative exponential distribution exp(x), x ≤ 0. Let η (1) , η (2) , . . .
be independent copies of η. We call the process η an SMS process if,
for arbitrary f ∈ E − [0, 1] and any n ∈ N,

(i)
P n max η ≤ f = P (η ≤ f ).
1≤i≤n

A proper choice of f , as in (1.29), shows that each finite dimensional


margin (ηt1 , . . . , ηtd ), 0 ≤ t1 < · · · < td ≤ d, d ∈ N, of an SMS process η
follows an SMS df as in Definition 2.3.2.
The following result, which goes back to Giné et al. (1990), can now be
formulated in terms of the functional D-norm. We do not provide a proof
here, but refer the reader to Giné et al. (1990) instead.
Theorem 4.2.2 A process η in C[0, 1] is an SMS process iff there exists
a D-norm ·D on E[0, 1], such that

P (η ≤ f ) = exp (− f D ) , f ∈ E − [0, 1].

The preceding result immediately entails, for example, that the rv X :=


supt∈[0,1] ηt follows a negative exponential distribution:

P (X ≤ x) = P (ηt ≤ x, t ∈ [0, 1])


= P (η ≤ x1)
= exp (− x1D )
= exp (x 1D ) , x ≤ 0,

i.e., the rv X = supt∈[0,1] ηt is negative exponentially distributed

P (X ≤ x) = exp(x/ϑ), x ≤ 0,

with parameter ϑ = 1/ 1D . Recall that η has continuous sample paths; as


a consequence, we obtain in particular
 
P (ηt = 0 for some t ∈ [0, 1]) = P sup ηt = 0
t∈[0,1]

= 1 − P (X < 0)
= 1 − P (X ≤ 0) = 0. (4.5)
4.2 Max-Stable Processes 167

This observation can be extended considerably as follows. Choose f ∈


E − [0, 1]. Then

P (ηt = f (t) for some t ∈ [0, 1], η ≤ f ) = 0. (4.6)

With f = 0 we obtain equation (4.5) again. Equation (4.6) is an immediate


consequence of the next result, by observing that

P (ηt = f (t) for some t ∈ [0, 1], η ≤ f ) = P ({η ≤ f } \ {η < f })


= P (η ≤ f ) − P (η < f ) = 0.

Lemma 4.2.3 For an arbitrary SMS process η with corresponding D-


norm ·D , we have

P (η < f ) = P (η ≤ f ) = exp (− f D ) , f ∈ E − [0, 1].

Proof. Choose f ∈ E − [0, 1]. Corollary 1.10.3 implies that


 
 
f − 1  →n→∞ f 
 n D D

thus, we obtain from the continuity of probability theorem


 +
* 1
P (η < f ) ≥ P η≤f−
n
n∈N

1
= lim P η ≤ f −
n∈N n
 

 1
= lim exp − f − 

n∈N n D
= exp (− f D ) .

Because
P (η < f ) ≤ P (η ≤ f ) = exp (− f D ) ,
the assertion follows.

Simple Max-Stable Process


As P (η < 0) = 1 by (4.5), we can put
1
ξ := − . (4.7)
η
The process ξ = (ξt )t∈[0,1] has continuous sample paths, each margin ξt is
standard Fréchet distributed
168 4 An Introduction to Functional Extreme Value Theory

1 1
P (ξt ≤ y) = P ηt ≤ − = exp − , y > 0,
y y
and, for g ∈ E[0, 1], g > 0, we have

 

1 1
P (ξ ≤ g) = P η ≤ − = exp −  
g  .
g D

The process ξ is, consequently, max-stable as well. It is called simple max-


stable in the literature.

Generation of SMS Processes


In Proposition 2.4.1 we showed how to generate an SMS rv on Rd via inde-
pendent copies of a generator of the corresponding D-norm. This approach
can be repeated to generate an SMS process.
Proposition 4.2.4 Let Z (i) , i ∈ N, be independent copies of a bounded
generator process Z = (Zt )t∈[0,1] of a functional D-norm ·D , and let
E1 , E2 , . . . be iid standard exponential
i rvs, also independent of the se-
quence Z (1) , Z (2) , . . . Put Vi := 1/ k=1 Ek , i ∈ N. Then, the stochastic
process
⎛ ⎞
1 1
η := (ηt )t∈[0,1] := −   = −⎝  ⎠
supi∈N Vi Z (i) sup Vi Z
(i)
i∈N t
t∈[0,1]

is an SMS process with

P (η ≤ f ) = exp (− f D ) , f ∈ E − [0, 1].

The condition that Z is bounded can be dropped in the preceding result,


see the proof of equation (9.4.6) in de Haan and Ferreira (2006).
Proof. If we know already that the process η has continuous sample paths,
then we obtain, for f ∈ E − [0, 1], using the continuity of probability theorem
again,
   n 
 
P (η ≤ f ) = P {ηti ≤ f (ti )} = lim P {ηti ≤ f (ti )} ,
n→∞
i∈N i=1

where {t1 , t2 , . . . } is a dense subset of [0, 1] that also contains the finitely
many points t ∈ [0, 1] at which the function f is discontinuous. From Propo-
sition 2.4.1, we obtain that
⎛ ⎞d
1
ηt1 ,...,td := (ηt1 , . . . , ηtd ) = − ⎝  ⎠
(i)
supi∈N Vi Ztj
j=1
4.2 Max-Stable Processes 169

is an SMS rv in Rd for arbitrary indices 0 ≤ t1 < · · · < td ≤ 1 and d ∈ N


with df
 
P (ηt1 ,...,td ≤ x) = exp − xDt ,...,t
1 d

=: Gt1 ,...,td (x), x ≤ 0 ∈ Rd .

The D-norm ·Dt on Rd is generated by Zt1 ,...,td := (Zt1 , . . . , Ztd ), i.e.,


1 ,...,td

 
xDt =E max |xj | Ztj , x ∈ Rd .
1 ,...,td 1≤j≤d

The dominated convergence theorem, together with the fact that Z has
continuous sample paths, implies
 n 
  
P {ηti ≤ f (ti )} = P ηt1 ,...,td ≤ (f (tj ))dj=1
i=1
 
= exp − (f (t1 ), . . . , f (td ))Dt ,...,t
1

 
= exp −E max |f (tj )| Ztj
1≤j≤d
  
→n→∞ exp −E sup (|f (t)| Zt )
t∈[0,1]

= exp (− f D ) .

Therefore, we have established

P (η ≤ f ) = exp (− f D ) , f ∈ E − [0, 1].

It remains
 to show that η has continuous sample paths. Put ξ(t) :=
(i)
supi∈N Vi Zt , t ∈ [0, 1]. We show

lim inf ξ(t) ≥ ξ(t0 ),


t→t0

and
lim sup ξ(t) ≤ ξ(t0 )
t→t0

for each t0 ∈ [0, 1] with probability one. This implies pathwise continuity of
the process ξ = (ξ(t))t∈[0,1] .
Recall that we require boundedness of Z, i.e., supt∈[0,1] Zt ≤ c for some
number c ≥ 1. For any M ∈ N, we have
 
  Zt
(i)
(i)
ξ(t) = max max Vi Zt , sup i
1≤i≤M i≥M+1 k=1 Ek
170 4 An Introduction to Functional Extreme Value Theory
⎧  
⎨≤ max1≤i≤M Vi Zt(i) + c
M +1 ,
  k=1 Ek
⎩≥ max1≤i≤M Vi Z (i) .
t

(i)
The continuity of each Zt implies
 
(i)
lim inf ξ(t) ≥ max Vi Zt0
t→t0 1≤i≤M

for each M ∈ N; thus,


 
(i)
lim inf ξ(t) ≥ sup Vi Zt0 .
t→t0 i∈N

On the other hand,


  c
(i)
lim sup ξ(t) ≤ max Vi Zt0 + M+1
t→t0 1≤i≤M Ek
k=1

thus, by the law of large numbers,


 
(i)
lim sup ξ(t) ≤ sup Vi Zt0
t→t0 i∈N

for each t0 ∈ [0, 1], with probability one. This shows that the process ξ has
continuous sample paths and, therefore, the process η = −1/ξ as well. Note
that P (ξ > 0) = 1, which can easily be seen using the fact that ξ is a max-
stable process, as in the proof of equation (9.4.6) in de Haan and Ferreira
(2006).

Survival Probability of SMS Process


We can easily extend the bounds in Lemma 2.4.2 for the survival probability
of an SMS rv in Rd to an SMS process.
Lemma 4.2.5 Let η be an SMS process with a corresponding functional
D-norm ·D . For f ∈ E − [0, 1], we have
(i) P (η > f ) ≥ 1 − exp (−  f D ),
P (η>sf )
(ii) lims↓0 s =  f D .

Proof. Choose f ∈ E − [0, 1], and let {t1 , t2 , . . . } be a dense set in [0, 1], which
also contains the finitely many points t ∈ [0, 1] at which f is discontinuous.
We can assume wlog that supt∈[0,1] f (t) =: K < 0; otherwise, the prob-
ability P (η > f ) would be zero and parts (i) and (ii) of Lemma 4.2.5 are
obviously true as  f D = 0. From Lemma 2.4.2 and the continuity of η, we
obtain, for ε ∈ (0, |K|),
4.2 Max-Stable Processes 171
 

P (η > f ) ≥ P {ηti > f (ti ) + ε}
i∈N
 

n
= lim P {ηti > f (ti ) + ε}
n∈N
i=1

≥ lim inf 1 − exp −E min (|f (ti ) + ε| Zti )


n∈N 1≤i≤n

= 1 − exp −E inf (|f (t) + ε| Zt ) .


t∈[0,1]

Letting ε converge to zero, we obtain from the dominated convergence theorem


P (η > f ) ≥ 1 − exp −E inf (|f (t)| Zt ) ,


t∈[0,1]

which is part (i) of Lemma 4.2.5.


Next, we establish the inequality

P (η > sf )
lim sup ≤ E min (|f (ti )| Zti ) , n ∈ N.
s↓0 s 1≤i≤n

The inclusion–exclusion principle implies


 n 

P (η > sf ) ≤ P {ηti > sf (ti )}
i=1
 

n
=1−P {ηti ≤ sf (ti )}
i=1
  
=1− (−1)|T |−1 P ηtj ≤ f (tj ), j ∈ T
∅=T ⊂{1,...,n}


 
=1− (−1)|T |−1 exp −sE max |f (tj )| Ztj
j∈T
∅=T ⊂{1,...,n}

=: 1 − H(s) = H(0) − H(s)

by equation (1.10).
The function H is differentiable; thus,

P (η > sf ) H(s) − H(0)


lim sup ≤ − lim
s↓0 s s↓0 s
= −H  (0)


|T |−1
 
= (−1) E max |f (tj )| Ztj
j∈T
∅=T ⊂{1,...,n}
172 4 An Introduction to Functional Extreme Value Theory
⎛ ⎞
  
=E⎝ (−1)|T |−1 max |f (tj )| Ztj ⎠
j∈T
∅=T ⊂{1,...,n}

 
=E min |f (tj )| Ztj
1≤j≤n

according to Lemma 1.6.1. Letting n tend to infinity, the dominated conver-


gence theorem implies

P (η > sf )
lim sup ≤E inf (|f (t)| Zt ) =  f D .
s↓0 s t∈[0,1]

The Taylor expansion exp(x) = 1 + x + o(x) for x → 0, together with the


lower bound in part (i), implies
P (η > sf ) 1 − exp (−  sf D )
lim inf ≥ lim inf
s↓0 s s↓0 s
1 − exp (−s  f D )
= lim
s↓0 s
=  f D ,

which completes the proof of part (ii) and, thus, of Lemma 4.2.5.

It is easy to find an SMS process η and f ∈ E − [0, 1] with a strict inequality
in part (i) of Lemma 4.2.5; see the next example. This construction of an
SMS process is a particular example of a max-linear model discussed and
generalized in Section 4.3.
Example 4.2.6 (Simple Max-Linear Model) Take two indepen-
dent and identically standard negative exponentially distributed rvs η0 ,
η1 , and put, for t ∈ (0, 1),

η0 η1
ηt := max , .
1−t t

Then, the process η := (ηt )t∈[0,1] is continuous and satisfies for f ∈


E − [0, 1]
P (η ≤ f ) = exp (− f D ) , (4.8)
where the functional D-norm ·D is generated by the process

Z = (Zt )t∈[0,1] := (max ((1 − t)Z0 , tZ1 ))t∈[0,1] ,

with Z0 ∈ {0, 2}, P (Z0 = 0) = P (Z0 = 2) = 1/2 and Z1 := 2 − Z0 .


We have min(Z0 , Z1 ) = 0 and, consequently,  f D = 0, f ∈ E[0, 1].
The lower bound in part (i) of Lemma 4.2.5 is, therefore, zero, which is
less helpful:
4.2 Max-Stable Processes 173

P (η > f ) ≥ 1 − exp (−  f D ) = 1 − exp(0) = 0.

Put 3 4
1
− 1−t , t ∈ 0, 12 ,
f (t) :=  4
− 1t , t ∈ 12 , 1 .
The function f is negative and continuous, and we obtain

η0 η1
P (η > f ) = P max , > f (t), t ∈ [0, 1]
1−t t

η0 η1
≥P > f (t), t ∈ [0, 1/2] ; > f (t), t ∈ (1/2, 1]
1−t t
= P (η0 > −1, η1 > −1)
= P (η0 > −1)2
= exp(−2) > 0.

Although it is a bit uncommon, we add a proof of the preceding example.

Proof (of Example 4.2.6). The process Z is non-negative, pointwise not larger
than 2, and satisfies for each t ∈ [0, 1]

E(Zt ) = 2(1 − t)P (Z0 = 2) + 2tP (Z0 = 0) = 1.

It is, therefore, the generator of a functional D-norm ·D . We have, for


f ∈ E − [0, 1],
 
f D = E sup (|f (t)| Zt )
t∈[0,1]
   
=E sup (|f (t)| Zt ) 1(Z0 = 2) +E sup (|f (t)| Zt ) 1(Z0 = 2)
t∈[0,1] t∈[0,1]
   
=2 sup ((1 − t) |f (t)|) P (Z0 = 2) + 2 sup (t |f (t)|) P (Z0 = 0)
t∈[0,1] t∈[0,1]

= sup ((1 − t) |f (t)|) + sup (t |f (t)|)


t∈[0,1] t∈[0,1]

=− inf ((1 − t)f (t)) + inf (tf (t)) .


t∈[0,1] t∈[0,1]

We have, moreover,

P (η ≤ f ) = P (η0 ≤ (1 − t)f (t), η1 ≤ tf (t), t ∈ [0, 1])


= P η0 ≤ inf ((1 − t)f (t)), η1 ≤ inf (f (t))


t∈[0,1] t∈[0,1]
174 4 An Introduction to Functional Extreme Value Theory

=P η0 ≤ inf ((1 − t)f (t)) P η1 ≤ inf (f (t))


t∈[0,1] t∈[0,1]

= exp inf ((1 − t)f (t)) exp inf (f (t))


t∈[0,1] t∈[0,1]

= exp inf ((1 − t)f (t)) + inf (f (t))


t∈[0,1] t∈[0,1]

= exp (− f D ) ,
which proves equation (4.8).

The Range of the Components of an SMS Process


By repeating the arguments in the proof of Lemma 2.5.5 word for word, we
can extend it to a functional version.

Lemma 4.2.7 Let η = (ηt )t∈[0,1] be an SMS process with corresponding


D-norm generated by the process Z = (Zt )t∈[0,1] . For 0 ≤ a < b ≤ 1,
we have the bound
 

E sup |ηt − ηs | = E max ηt − E min ηt


s,t∈[a,b] t∈[a,b] t∈[a,b]

1 1
≤  −  .
E mint∈[a,b] Zt E maxt∈[a,b] Zt

With a = 0 and b = 1, this bound becomes


 
1 1
E sup |ηt − ηs | ≤ − .
s,t∈[0,1]  1 D 1D

Example 4.2.6 shows that  1 D can be zero, in which case the preceding
upper bound is not helpful.
Clearly, the process η has continuous sample paths in our setup. But it is
worth mentioning, on the other hand, that the upper bound in Lemma 4.2.7
implies continuity in probability of η, i.e., P (|ηt − ηs | ≥ ε) →t→s 0, for each
s ∈ [0, 1]: the pathwise continuity of Z, together with the dominated conver-
gence theorem, yields

lim E min Zt = E(Zs ) = 1 = lim E max Zt ,


b−a↓0 t∈[a,b] b−a↓0 t∈[a,b]

if a ≤ s ≤ b. Markov’s inequality, together with Lemma 4.2.7, then implies


   
1
P sup |ηt − ηs | ≥ ε ≤ E sup |ηt − ηs | →b−a↓0 0,
t∈[a,b] ε t∈[a,b]

if a ≤ s ≤ b.
4.3 Generalized Max-Linear Models 175

4.3 Generalized Max-Linear Models


We propose a way how to generate an SMS process in C[0, 1] from an SMS
rv in Rd by generalizing the max-linear model established by Wang and
Stoev (2011). For this purpose, an interpolation technique that preserves max-
stability is proposed. It turns out that if the rv follows some finite dimensional
distribution of some initial SMS process, the approximating processes con-
verge uniformly to the original process and the pointwise mean squared error
can be represented in a closed form. This method enables the reconstruction
of the initial process only from a finite set of observation points, and thus,
reasonable prediction of max-stable processes in space, not in time, becomes
possible. The Brown–Resnick process is a prominent example.

The Generalized Max-Linear Model


Let X = (X0 , . . . , Xd ) be an SMS rv with pertaining D-norm ·D0,...,d on
Rd+1 generated by Z = (Z0 , . . . , Zd ), d ∈ N, i.e.,
 

P (X ≤ x) = exp − xD0,...,d = exp −E max |xi | Zi ,


i=0,...,d

x = (x0 , . . . , xd ) ≤ 0. Choose arbitrary deterministic functions g0 , . . . , gd ∈


C + [0, 1] := {g ∈ C[0, 1] : g ≥ 0} with the property

(g0 (t), . . . , gd (t))D0,...,d = 1, t ∈ [0, 1]. (4.9)

For instance, in the case of independent margins of X, we have ·D0,...,d =


·1 , and condition (4.9) becomes


d
gi (t) = 1, t ∈ [0, 1],
i=0

i.e., gi (t), i = 0, . . . , d, defines a probability distribution on the set {0, . . . , d}


for each t ∈ [0, 1]. This is the setup in the max-linear model introduced by
Wang and Stoev (2011). An example is given by the binomial distribution

d i
gi (t) := t (1 − t)d−i , i = 0, . . . , d, t ∈ [0, 1].
i

Let ·D0,...,d be an arbitrary D-norm on Rd+1 . Choose arbitrary functions



h0 , . . . , hd ∈ C + [0, 1], which satisfy di=0 hi (t) > 0 for t ∈ [0, 1]. Then,

(h0 (t), . . . , hd (t))


(g0 (t), . . . , gd (t)) := , t ∈ [0, 1],
(h0 (t), . . . , hd (t))D0,...,d
176 4 An Introduction to Functional Extreme Value Theory

satisfies condition (4.9). Particularly helpful functions g0∗ , . . . , gd∗ are defined
in (4.12).
Now, for t ∈ [0, 1], put
Xi
ηt := max . (4.10)
i=0,...,d gi (t)

The model (4.10) is called the generalized max-linear model. It defines an SMS
process, as the next lemma shows.
Lemma 4.3.1 The stochastic process η = (ηt )t∈[0,1] in (4.10) defines
an SMS process with generator process Ẑ = (Ẑt )t∈[0,1] given by

Ẑt = max (gi (t)Zi ) , t ∈ [0, 1].


i=0,...,d

In model (4.10) we have not made any further assumptions on the D-norm
·D0,...,d , that is, on the dependence structure of the rv X0 , . . . , Xd . The
special case ·D0,...,d = ·1 characterizes the independence of X0 , . . . , Xd .
This is the regular max-linear model, Wang and Stoev (2011).
On the contrary, ·D0,...,d = ·∞ provides the case of complete depen-
dence X0 = · · · = Xd a.s., with the constant generator Z0 = · · · = Zd = 1.
Thus, condition (4.9) becomes maxi=0,...,d gi (t) = 1, t ∈ [0, 1]; therefore,

Ẑt = max (gi (t)Zi ) = max gi (t) = 1, t ∈ [0, 1].


i=0,...,d i=0,...,d

Proof (of Lemma 4.3.1). At first, we verify that the process Ẑ is indeed a
generator process. It is obvious that the sample paths of Ẑ are in C + [0, 1],
owing to the continuity of each gi . Furthermore, for each t ∈ [0, 1], we have
by construction
 
E Ẑt = (g0 (t), . . . , gd (t))D0,...,d = 1.

As ·∞ ≤ ·D for an arbitrary D-norm, we have (g0 (t), . . . , gd (t))∞ ≤ 1,


t ∈ [0, 1], and, thus, Ẑt ≤ maxi=0,...,d Zi , t ∈ [0, 1].
In addition, we have for f ∈ E − [0, 1]
P (η ≤ f )
= P (Xi ≤ gi (t)f (t), i = 0, . . . , d, t ∈ [0, 1])

= P Xi ≤ inf (gi (t)f (t)), i = 0, . . . , d


t∈[0,1]

= P Xi ≤ − sup (gi (t) |f (t)|), i = 0, . . . , d


t∈[0,1]
⎛  ⎞



 
= exp ⎝−  sup (g0 (t) |f (t)|), . . . , sup (gd (t) |f (t)|)  ⎠
 t∈[0,1] t∈[0,1] 
D0,...,d
4.3 Generalized Max-Linear Models 177

= exp −E max sup gi (t) |f (t)| Zi


i=0,...,d t∈[0,1]



= exp −E sup |f (t)| max (gi (t)Zi )
t∈[0,1] i=0,...,d
 
 

= exp −E sup |f (t)| Ẑt ,


t∈[0,1]

which completes the proof.


If Condition (4.9) is Dropped


Condition (4.9) ensures that the univariate margins ηt , t ∈ [0, 1], of the process
η in model (4.10) follow the standard negative exponential distribution P (ηt ≤
x) = exp(x), x ≤ 0. If we drop this condition, we still obtain a max-stable
process: for n ∈ N, take iid copies η (1) , . . . , η (n) of η, defined in (4.10). For
f ∈ E − [0, 1], we have

P n max η (k) ≤ f
1≤k≤n


n
gi (t)f (t)
= P Xi ≤ inf , i = 0, . . . , d
t∈[0,1] n
⎛  ⎞


 
= exp ⎝−  sup (g0 (t) |f (t)|) , . . . , sup (gd (t) |f (t)|)  ⎠
 t∈[0,1] t∈[0,1] 
D0,...,d

= P (η ≤ f ).

The univariate margins of η are now given by


 
P (ηt ≤ x) = exp (g0 (t), . . . , gd (t))D0,...,d x , (4.11)

for x ≤ 0 and t ∈ [0, 1].

Reconstruction of SMS Process


The preceding approach offers a way to reconstruct an SMS process in such
a way that the reconstruction is again an SMS process. Let η = (ηt )t∈[0,1]
be an SMS process with generator process Z = (Zt )t∈[0,1] and D-norm ·D .
Choose a grid 0 =: s0 < s1 < · · · < sd−1 < sd := 1 of points within [0, 1].
Then, X := (ηs0 , . . . , ηsd ) is an SMS rv in Rd+1 with pertaining D-norm
·D0,...,d generated by (Zs0 , . . . , Zsd ).
In what follows, we define an SMS process η̂ = (η̂t )t∈[0,1] for which η̂si =
ηsi , i = 0, . . . , d, holds, i.e., η̂ interpolates the finite dimensional projections
178 4 An Introduction to Functional Extreme Value Theory

(ηs0 , . . . , ηsd ) of the original SMS process η in an appropriate way. This is


done by means of a special case of the generalized max-linear model, i.e., by a
particular choice of the functions gi in equation (4.10). We show that this way
of predicting η in space is reasonable, as the pointwise mean squared error
   2

(d) (d)
MSE η̂t := E ηt − η̂t vanishes for all t ∈ [0, 1] as d increases.
Moreover, we establish uniform convergence of the “predictive” processes and
the corresponding generator processes to the original ones.

Proper Choice of Auxiliary Functions


As shown in Lemma 4.3.1, the stochastic process η̂ = (η̂t )t∈[0,1] , defined
through its margins by
ηsi
η̂t = max , t ∈ [0, 1],
i=0,...,d gi (t)

is an SMS process with generator process Ẑ = (Ẑt )t∈[0,1] , given by

Ẑt = max (gi (t)Zsi ) , t ∈ [0, 1],


i=0,...,d

for arbitrary functions g0 , . . . , gd in C + [0, 1] that satisfy condition (4.9). We


are going to specify these auxiliary functions now.
Denote by ·Di−1,i the D-norm pertaining to the bivariate rv (ηsi−1 , ηsi ),
i = 1, . . . , d. Put
⎧ s1 − t
⎨ , t ∈ [0, s1 ],

g0 (t) := (s 1 − t, t)D0,1

0, else,
⎧ t − si−1

⎪ , t ∈ [si−1 , si ],

⎪ (s − t, t − si−1 )Di−1,i
⎨ i
gi∗ (t) := si+1 − t
⎪ , t ∈ [si , si+1 ], i = 1, . . . , d − 1,

⎪ (s − t, t − si )Di,i+1


i+1
0, else,
⎧ t − s
⎨ d−1
, t ∈ [sd−1 , sd ],
gd∗ (t) := (sd − t, t − sd−1 )Dd−1,d (4.12)

0, else.

Clearly, g0∗ , . . . , gd∗ ∈ C + [0, 1]: the fact that a D-norm is standardized implies
si − si−1 si+1 − si
lim gi∗ (t) = =1= = lim gi∗ (t).
t↑si (0, si − si−1 )Di−1,i (si+1 − si , 0)Di−1,i t↓si

Moreover, we have, for t ∈ [si−1 , si ], i = 1, . . . , d,


4.3 Generalized Max-Linear Models 179
 ∗ 
(g0∗ (t), . . . , gd∗ (t))D0,...,d =  gi−1 (t), gi∗ (t) Di−1,i = 1.

Hence, the functions g0∗ , . . . , gd∗ are suitable for the generalized max-linear
model (4.10). In addition, they have the following property:
Lemma 4.3.2 The functions g0∗ , . . . , gd∗ defined above satisfy

gi∗ ∞ = gi∗ (si ) = 1, i = 0, . . . , d.

In view of their properties described above, the functions gi∗ work as ker-
nels in non-parametric kernel density estimation. Each function gi∗ (t) has max-
imum value 1 at t = si , and, with the distance between t and si increasing,
the value gi∗ (t) shrinks to zero.
Proof (of Lemma 4.3.2). From the fact that a D-norm is monotone and stan-
dardized, we obtain, for i = 1, . . . , d − 1 and t ∈ [si−1 , si ),
t − si−1 1 1
gi∗ (t) = = 

 
 ≤ = 1,
(si − t, t − si−1 )Di−1,i si −t
 t−s , 1  (0, 1)Di−1,i
i−1
Di−1,i

and for t ∈ [si , si+1 )


si+1 − t 1 1
gi∗ (t) = =

 
 ≤ = 1.
(si+1 − t, t − si )Di,i+1  1, si+1 −t
t−s i
 (1, 0)Di,i+1
Di,i+1

Analogously, we have g0∗ ≤ 1 and gd∗ ≤ 1. The assertion now follows since
gi∗ (si ) = 1, i = 0, . . . , d.

The SMS Process with these Auxiliary Functions


The SMS process η̂ = (η̂t )t∈[0,1] that is generated by the generalized max-
linear model with these particular functions g0∗ , . . . , gd∗ is given by

ηsi−1 ηsi
η̂t = max ,
∗ (t) g ∗ (t)
gi−1 i

ηsi−1 ηsi
= (si − t, t − si−1 )Di−1,i max , , (4.13)
si − t t − si−1
for t ∈ [si−1 , si ], i = 1, . . . , d. Note that ηsi < 0 a.s., i = 0, . . . , d. This implies
that the maximum, taken over d+1 points in (4.10), goes down to a maximum
taken over only two points in (4.13), since all except two of the gi vanish in
t ∈ [si−1 , si ], i = 1, . . . , d. We have, moreover,
η̂si = ηsi , i = 0, . . . , d,
thus, the above process interpolates the rv (ηs0 , . . . , ηsd ). In summary, we have
established the following result.
180 4 An Introduction to Functional Extreme Value Theory

Corollary 4.3.3 Let η = (ηt )t∈[0,1] be an SMS process with generator


Z = (Zt )t∈[0,1] , and let 0 := s0 < s1 <, . . . , < sd−1 < sd := 1 be a
grid of points in the interval [0, 1]. The process η̂ = (η̂t )t∈[0,1] defined in
(4.13) is an SMS process with generator process Ẑ = (Ẑt )t∈[0,1] , where
 
max (si − t)Zsi−1 , (t − si−1 )Zsi
Ẑt = , t ∈ [si−1 , si ], i = 1, . . . , d.
(si − t, t − si−1 )Di−1,i
(4.14)
The processes η̂ and Ẑ interpolate the rv (ηs0 , . . . , ηsd ) and (Zs0 , . . . , Zsd )
respectively.

We call η̂ the discretized version of η and Ẑ the discretized version of Z,


both with grid {s0 , . . . , sd }.

Max and Min of Discretized Versions


We show that the discretized version of the underlying SMS process converges
to this very process in a strong sense. We need the following two lemmas, which
provide technical insight into the structure of the chosen max-linear model.
Lemma 4.3.4 The SMS process defined in (4.13) fulfills, for i =
1, . . . , d,  
sup η̂t = max ηsi−1 , ηsi ,
t∈[si−1 ,si ]

and  
inf η̂t = − (ηsi−1 , ηsi )D .
t∈[si−1 ,si ] si−1 ,si

This minimum is attained for t = (si−1 ηsi−1 + si ηsi )/(ηsi−1 + ηsi ).


Proof. We know from Lemma 4.3.2 that gi−1 (t), gi∗ (t) ≤ 1 for an arbitrary
i = 1, . . . , d and t ∈ [si−1 , si ]. Hence,

ηsi−1 ηsi
η̂t = max ∗ (t) , g ∗ (t) ≤ max(ηsi−1 , ηsi )
gi−1 i


for i = 1, . . . , d and t ∈ [si−1 , si ]. The fact that gi−1 (si−1 ) = 1 = gi∗ (si )
yields the first part of the assertion. Recall that ηsi < 0 with probability one,
i = 0 . . . , d. Moreover, for t ∈ (si−1 , si ), we have
ηsi−1 ηsi si − t ηs si−1 ηsi−1 + si ηsi
≤ ⇐⇒ ≤ i−1 ⇐⇒ t ≥ ,
si − t t − si−1 t − si−1 ηsi ηsi−1 + ηsi

where equality in one of these expressions occurs iff it does in the other two.
In this case of equality, we have
4.3 Generalized Max-Linear Models 181
ηsi  
η̂t = (si − t, t − si−1 )Di−1,i = − (ηsi−1 , ηsi )Di−1,i .
t − si−1
On the other hand, the monotonicity of a D-norm implies, for every t ∈
(si−1 , si ), with t ≥ (si−1 ηsi−1 + si ηsi )/(ηsi−1 + ηsi ),
ηsi
η̂t ≥ (si − t, t − si−1 )Di−1,i
t − si−1


 si − t 
= t − si−1 , 1 
 ηsi
Di−1,i


 ηsi−1   
≥ ,1  ηsi = − (ηsi−1 , ηsi )Di−1,i .
ηsi 
Di−1,i

The case t ≤ (si−1 ηsi−1 + si ηsi )/(ηsi−1 + ηsi ) works analogously.


As an immediate consequence of the preceding result, we obtain, for x ≤ 0,

η̂ ≤ x ⇐⇒ max (ηs0 , . . . , ηsd ) ≤ x,


 
η̂ > x ⇐⇒ max  ηsi−1 , ηsi Di−1,i < −x.
1≤i≤d

The next lemma is on the structure of the underlying generator processes.


It is shown by repeating the arguments in the proof of Lemma 4.3.2.
Lemma 4.3.5 The generator process defined in (4.14) fulfills, for i =
1, . . . , d,  
sup Ẑt = max Zsi−1 , Zsi .
t∈[si−1 ,si ]

Moreover, for i = 1, . . . , d,
⎧ −1
⎨  
(1/Zsi−1 , 1/Zsi )
Di−1,i
, if Zsi−1 , Zsi > 0,
inf Ẑt =
t∈[si−1 ,si ] ⎩0 else.
 
The minimum is attained for t = (si−1 Zsi + si Zsi−1 )/ Zsi−1 + Zsi in
the first case.

In (2.28), we introduced the extremal coefficient 1D as a measure of


tail dependence.
 The preceding
 result shows in particular that the extremal
coefficient E supt∈[0,1] Ẑt of the SMS process η̂ coincides with the extremal
coefficient E(maxi=0,...,d Zsi ), which corresponds to the SMS rv (ηs0 , . . . , ηsd ).

Uniform Convergence of Discretized Versions


So far, we have only considered a fixed discretized version of an SMS process.
The next step is to examine a sequence of discretized versions with certain
182 4 An Introduction to Functional Extreme Value Theory

grids, whose diameter converges to zero. It turns out that such a sequence
converges to the initial SMS process in the function space C[0, 1] equipped
with the sup-norm. Thus, our method is suitable for reconstructing the initial
process.
Let
(d) (d) (d) (d) (d) (d)
Gd := {s0 , s1 , . . . , sd }, 0 =: s0 < s1 < · · · < sd := 1, d ∈ N,

be a sequence of grids in [0, 1] with diameter


 
(d) (d)
κd := max si − si−1 →d→∞ 0.
i=1,...,d

(d)
Let η̂ (d) = (η̂t )t∈[0,1] be the discretized version
  of an SMS process η =
(d) (d)
(ηt )t∈[0,1] with grid Gd . Denote using Ẑ = Ẑt and Z = (Zt )t∈[0,1]
t∈[0,1]
the generator processes pertaining to η̂ (d) and η respectively. Uniform con-
vergence of η̂ (d) to η and of Ẑ (d) to Z, as d tends to infinity, is established in
the next result.
Theorem 4.3.6 The processes η̂ (d) and Ẑ (d) , d ∈ N, converge
 (d)
 
uniformly
 to η and Z pathwise, i.e., η̂ − η ∞ →d→∞ 0 and
 (d) 
Ẑ − Z  →d→∞ 0 with probability one.

Proof. Denote by [t]d , d ∈ N, the left neighbor of t ∈ [0, 1] among Gd , and


by td , d ∈ N, the right neighbor of t ∈ [0, 1] among Gd . Choose a sequence
of numbers s(d) ∈ [0, 1], d ∈ N, with s(d) →d∈N s ∈ [0, 1]. Then, obviously
[s(d) ]d →d→∞ s and s(d) d →d→∞ s. Hence, we obtain by Lemma 4.3.4, and
the continuity of the process η
(d)  
η̂s(d) ≤ max η̂s(d) = max η[s(d) ]d , ηs(d) d →d→∞ ηs ,
s∈[[s(d) ]d ,s(d) d ]

as well as
 
η̂s(d) = −  η[s(d) ]d , ηs(d) d D
(d)
η̂s(d) ≥ min →d→∞ ηs ,
s∈[[s(d) ]d ,s(d) d ] [s(d) ]d , s(d)
d

 
where ·D denotes the D-norm pertaining to η[s(d) ]d , ηs(d) d .
[s(d) ]d , s(d)
d
Hence, the first part of the assertion is proven.
Now, we show that Ẑ (d) →d→∞ Z in (C[0, 1], ·∞ ). If Zs
= 0, the con-
tinuity of Z implies Z[s(d) ]d
= 0
= Zs(d) d for sufficiently large values of d.
Repeating the above arguments, the assertion now follows from Lemma 4.3.5.
If Zs = 0, the continuity of Z implies
(d)  
Ẑs(d) ≤ 2 max Z[s(d) ]d , Zs(d) d →d→∞ 2Zs = 0,
4.3 Generalized Max-Linear Models 183

which completes the proof. Check that


 
 (d) 
(s d − t, t − [s(d) ]d ) ≥ 1/2,
D[s(d) ] (d)

d , s d

since every D-norm is monotone and standardized.


Interpolating a Brown–Resnick Process


A nice example is the SMS Brown–Resnick process η, which is defined via the
standard geometric Brownian motion

t
Zt := exp Bt − , t ∈ [0, 1],
2

as in (1.26), i.e., we have


 
P (η ≤ f ) = E sup (|f (t)| Zt ) = exp (− f D ) , f ∈ E − [0, 1].
t∈[0,1]

The complete D-norm f D is unknown, but in Lemma 1.10.6, we have com-


puted the bivariate D-norm

t − s log (|x| / |y|)


(x, y)Ds,t = |x| Φ + √
2 t−s

t − s log (|y| / |x|)


+ |y| Φ + √
2 t−s
for x, y ∈ R and 0 ≤ s < t ≤ 1, which is a bivariate Hüsler–Reiss D-norm.
(d)
Writing si instead of si , etc., to ease notation, the interpolating SMS
process defined in (4.13), now becomes

si − si−1 log ((si − t)/(t − si−1 ))


η̂t = (si − t)Φ + √
2 si − si−1

si − si−1 log (t − si−1 /(si − t))


+ (t − si−1 )Φ + √
2 si − si−1

ηsi−1 ηsi
× max , , t ∈ [si−1 , si ], 1 ≤ i ≤ d.
si − t t − si−1

What if the Underlying D-Norm is Unknown?


The preceding Theorem 4.3.6 is the main reason why we consider the dis-
cretized version η̂ of an SMS process η a reasonable predictor of this pro-
cess, where the prediction is done in space, not in time. The predictions
η̂t of the points ηt , t ∈ [0, 1], only depend on the multivariate observations
184 4 An Introduction to Functional Extreme Value Theory

(ηs0 , . . . , ηsd ). More precisely, the only additional thing we need to know to
make these predictions is the set of the adjacent bivariate marginal distribu-
tions of (ηs0 , . . . , ηsd ), that is, the bivariate D-norms ·Di−1,i , i = 1, . . . , d.
However, this may be a restrictive condition in practice and suggests the prob-
lem of how to fit models of bivariate D-norms to data, which is beyond the
scope of the present book. The Brown–Resnick process, including additional
parameters, may serve as a parametric model to start with.
The following results, however, are obvious. Let η̂t be a point of the dis-
cretized version defined in (4.13) and define a defective discretized version
via

ηsi−1 ηsi
η̃t := (si − t, t − si−1 )D̃i max , ,
si − t t − si−1
for t ∈ [si−1 , si ], i = 1, . . . , d, where ·D̃i is an arbitrary D-norm on R2 ,
which we call the defective norm. Then, for every t ∈ [si−1 , si ], i = 1, . . . , d,
! !
! !
|η̂t − η̃t | = !(si − t, t − si−1 )Di−1,i − (si − t, t − si−1 )D̃i !

−ηsi−1 −ηsi
× min , .
si − t t − si−1
In particular, we have η̃si = η̂si = ηsi , i = 0, . . . , d. This means that we obtain
an interpolating process even if we replace the D-norm ·Di−1,i with the de-
fective norm ·D̃i . Furthermore, the defective discretized version still defines
a max-stable process with sample paths in C − [0, 1] = {f ∈ C[0, 1] : f ≤ 0}.
Check that its univariate marginal distributions are given by
 
(si − t, t − si−1 )Di−1,i
P (η̃t ≤ x) = exp x , x ≤ 0,
(si − t, t − si−1 )D̃i
for t ∈ [si−1 , si ], i = 1, . . . , d. These are still negative exponential distribu-
tions, but not standard ones, as they are with the discretized version given
in (4.13). In addition to this, the assertions in Lemma 4.3.4 also hold for the
defective discretized version, since each defective norm ·D̃i is monotone and
standardized. Repeating the arguments in the proof of Theorem 4.3.6 now
shows that the uniform convergence toward the original process η is retained
if we replace the norms ·Di−1,i with arbitrary monotone and standardized
norms ·D̃i . Note that Lemma 1.5.2 implies that these two properties al-
ready imply that the bivariate norm ·D̃i is a D-norm. In that case, the
only property of the discretized version that we lose is the standardization
of the univariate margins, i.e., the resulting process is no longer a standard
max-stable process.

Mean Squared Errors of Discretized Versions


To calculate the mean squared error of the predictor η̂t , we have to determine
the mixed moment E(ηt η̂t ). We could apply Lemma 2.5.1 if we knew that the
4.3 Generalized Max-Linear Models 185

vector (ηt , η̂t ) was standard max-stable itself. This is verified in the following
result.
Lemma 4.3.7 Let η = (ηt )t∈[0,1] be an SMS process and denote by
η̂ = (η̂t )t∈[0,1] its discretized version with grid {s0 , . . . , sd }. Then, the
bivariate rv (ηt , η̂t ) is an SMS rv for every t ∈ [0, 1] with corresponding
D-norm of the two-dimensional marginal
 
 ∗ 
(x, y)Dt :=  x, gi−1 (t)y, gi∗ (t)y  , t ∈ [si−1 , si ], i = 1, . . . , d,
Dt,i−1,i

where ·Dt,i−1,i is the D-norm pertaining to (ηt , ηsi−1 , ηsi ).

Proof. For every t ∈ [si−1 , si ], x, y ≤ 0 and i = 1, . . . , d, we have

P (ηt ≤ x, η̂t ≤ y)
 ∗

= P ηt ≤ x, ηsi−1 ≤ gi−1 (t)y, ηsi ≤ gi∗ (t)y
   

= exp − E max |x| Zt , gi−1 (t) |y| Zsi−1 , gi∗ (t) |y| Zsi
    ∗ 
= exp − E max |x| Zt , |y| max gi−1 (t)Zsi−1 , gi∗ (t)Zsi .

The vector   ∗ 
Zt , max gi−1 (t)Zsi−1 , gi∗ (t)Zsi

defines a generator for every t ∈ [si−1 , si ], i = 1, . . . , d, as for all such t


  ∗   ∗ 
E max gi−1 (t)Zsi−1 , gi∗ (t)Zsi =  gi−1 (t), gi∗ (t) D = 1.
i−1,i

Let us recall the sequence of processes that we discussed in Theorem 4.3.6.


Suppose that η is an SMS process and choose a sequence of grids Gd of the
interval [0, 1] with diameter κd →d→∞ 0. Denote using η̂ (d) , d ∈ N, the
sequence of discretized versions of η with grid Gd . Denote further using ·D(d)
t
(d)
the D-norm as in Lemma 4.3.7, pertaining to (ηt , η̂t ), t ∈ [0, 1], d ∈ N.

Theorem 4.3.8 Let η and η̂ (d) , d ∈ N, be as above. The mean squared


(d)
error of η̂t is given by
   2

(d) (d)
MSE η̂t := E ηt − η̂t
⎛ ⎞
 ∞
1
= 2 ⎝2 − 2 du⎠ →d→∞ 0.
0 (1, u)D(d)
t
186 4 An Introduction to Functional Extreme Value Theory

Proof. The second moment of a standard negative exponentially distributed


rv is two, and therefore, we obtain from Lemma 2.5.1 and Lemma 4.3.7
     2

(d)   (d) (d)


MSE η̂t = E ηt2 − 2E ηt η̂t +E η̂t
 ∞
1
=4−2 2 du.
0 (1, u)D(d)
t

Next, we show ·D(d) →d→∞ ·∞ pointwise for all t ∈ [0, 1]. Denote using
t

Z and Ẑ (d) , d ∈ N, the generator processes of η and η̂ (d) , d ∈ N. Define


 
supt∈[0,1] Zt
m := E sup Zt < ∞ and Z̃ := .
t∈[0,1] m

Then, E(Z̃) = 1, and thus, (Zt , Z̃), define a generator of a D-norm ·D̃ on
(d)
R2 for all t ∈ [0, 1]. Lemma 4.3.5 implies Ẑt ≤ mZ̃ for all d ∈ N. Therefore,
for arbitrary x, y ∈ R, d ∈ N and t ∈ [0, 1], we have
   
(d)
max |x| Zt , |y| Ẑt ≤ max |x| Zt , |my| Z̃t ,

where   
E max |x| Zt , |my| Z̃t = (x, my)D̃ < ∞.
Hence, we can apply  the dominated convergence theorem to the sequence
(d) (d)
max |x| Zt , |y| Ẑt , d ∈ N. Together with the fact that Ẑt →d→∞ Zt for
all t ∈ [0, 1] by Theorem 4.3.6, we obtain for x, y ∈ R
  
(d)
(x, y)D(d) = E max |x| Zt , |y| Ẑt
t

→d→∞ E (max (|x| Zt , |y| Zt )) = (x, y)∞ .


∞ −2
In Example 2.5.3, we already calculated 0 (1, u)∞ du = 2. Since ·∞
is the smallest D-norm, we have, for all d ∈ N and t ∈ [0, 1],
1 1
2 ≤ 2 ,
(1, u)D(d) (1, u)∞
t

therefore, by applying the dominated convergence theorem again, we obtain


that  ∞  ∞
1 1
2 du →d→∞ du = 2,
0 (1, u)D(d) 0 (1, u)2∞
t

which completes the proof.



The generalized max-linear model, as considered in this section for SMS
processes, carries over to GPP in a straightforward manner. For details, we
refer the reader to Falk et al. (2015).
5
Further Applications of D-Norms
to Probability & Statistics

5.1 Max-Characteristic Function


This section introduces max-characteristic functions (max-CFs), which are
an offspring of D-norms. A max-CF characterizes the distribution of an rv in
Rd , whose components are non-negative and have finite expectation. Pointwise
convergence of a max-CF is shown to be equivalent to convergence with respect
to the Wasserstein metric. An inversion formula for max-CF is established as
well.
As discussed in Section 1.2, neither the generator of a D-norm nor its
distribution is uniquely determined. However, given a generator Z of a D-
norm on Rd , we can design a D-norm on Rd+1 in a simple fashion so that it
characterizes the distribution of Z: consider the D-norm on Rd+1

(t, x)D := E (max(|t| , |x1 | Z1 , . . . , |xd | Zd )) , t ∈ R, x ∈ Rd .

Then, it turns out that the knowledge of this D-norm fully identifies the
distribution of Z; it is actually enough to know this D-norm when t = 1, as
the following Lemma 5.1.1 shows, and this shall be the basis for our definition
of a max-CF. By =D we mean equality of the distributions.
Lemma 5.1.1 Let X = (X1 , . . . , Xd ) ≥ 0, Y = (Y1 , . . . , Yd ) ≥ 0 be
rvs with E(Xi ), E(Yi ) < ∞, 1 ≤ i ≤ d. If we have, for each x > 0 ∈ Rd ,

E (max(1, x1 X1 , . . . , xd Xd )) = E (max(1, x1 Y1 , . . . , xd Yd )) ,

then X =D Y .

Proof. From Lemma 1.2.2, for arbitrary x > 0 ∈ Rd and c > 0, we obtain the
equation

© Springer Nature Switzerland AG 2019 187


M. Falk, Multivariate Extreme Value Theory and D-Norms,
Springer Series in Operations Research and Financial Engineering,
https://doi.org/10.1007/978-3-030-03819-9 5
188 5 Further Applications of D-Norms to Probability & Statistics

 ∞

X1 Xd X1 Xd
E max 1, ,..., = 1 − P max 1, ,..., ≤ t dt
cx1 cxd cx1 cxd
0 ∞
= 1 − P (1 ≤ t, Xi ≤ tcxi , 1 ≤ i ≤ d) dt
0
 ∞
=1+ 1 − P (Xi ≤ tcxi , 1 ≤ i ≤ d) dt.
1

The substitution t → t/c yields that the right-hand side above equals

1 ∞
1+ 1 − P (Xi ≤ txi , 1 ≤ i ≤ d) dt.
c c

Repeating the preceding arguments with Yi in place of Xi , we obtain, for all


c > 0 from the assumption in Lemma 5.1.1, the equality
 ∞  ∞
1 − P (Xi ≤ txi , 1 ≤ i ≤ d) dt = 1 − P (Yi ≤ txi , 1 ≤ i ≤ d) dt.
c c

Taking right derivatives with respect to c, we obtain, for c > 0,

1 − P (Xi ≤ cxi , 1 ≤ i ≤ d) = 1 − P (Yi ≤ cxi , 1 ≤ i ≤ d)

and, thus, the assertion.


The following characterization extends Lemma 5.1.1.


Lemma 5.1.2 Let X = (X1 , . . . , Xd ) ≥ 0, Y = (Y1 , . . . , Yd ) ≥ 0 be rvs
with finite expectations in each component. Suppose there is a number
a ∈ R such that, for all x > 0 ∈ Rd ,

E (max (1, x1 X1 , . . . xd Xd )) = E (max (a, x1 Y1 , . . . , xd Yd )) . (5.1)

Then, a = 1 and X =D Y .

Proof. For a < 0 and x > 0 ∈ Rd , we clearly have

E (max (a, x1 Y1 , . . . , xd Yd )) = E (max (x1 Y1 , . . . , xd Yd )) .

Thus, we can suppose a ≥ 0. Multiplying both sides of equation (5.1) with an


arbitrary number c > 0, we obtain the working assumption

ϕc,X (x) := E (max(c, x1 X1 , . . . , xd Xd ))


= E (max(ac, x1 Y1 , . . . , xd Yd ))

=: ϕca,Y (x), c > 0, x > 0 ∈ Rd . (5.2)


5.1 Max-Characteristic Function 189

Now, according to Lemma 1.2.2, we have, for any c > 0 and any x > 0,

    ϕ_{c,X}(1/x) = ∫_0^∞ 1 − P(max(c, X_1/x_1, …, X_d/x_d) ≤ t) dt
                 = c + ∫_c^∞ 1 − P(X_i ≤ tx_i, 1 ≤ i ≤ d) dt.

By the same arguments, we obtain

    ϕ_{ca,Y}(1/x) = ca + ∫_{ca}^∞ 1 − P(Y_i ≤ tx_i, 1 ≤ i ≤ d) dt.

If a = 0, the latter identity becomes

    ϕ_{0,Y}(1/x) = ∫_0^∞ 1 − P(Y_i ≤ tx_i, 1 ≤ i ≤ d) dt,

so that equation (5.2) becomes

    c + ∫_c^∞ 1 − P(X_i ≤ tx_i, 1 ≤ i ≤ d) dt = ∫_0^∞ 1 − P(Y_i ≤ tx_i, 1 ≤ i ≤ d) dt

for each c > 0 and x > 0 ∈ ℝ^d. Taking right derivatives with respect to c yields

    P(X_i ≤ cx_i, 1 ≤ i ≤ d) = 0.

Letting c → ∞ clearly produces a contradiction.
Suppose next that a > 0. We have

    c + ∫_c^∞ 1 − P(X_i ≤ tx_i, 1 ≤ i ≤ d) dt = ca + ∫_{ca}^∞ 1 − P(Y_i ≤ tx_i, 1 ≤ i ≤ d) dt.

Taking right derivatives with respect to c again entails

    P(X_i ≤ cx_i, 1 ≤ i ≤ d) = a − 1 + P(Y_i ≤ cax_i, 1 ≤ i ≤ d).

Letting c → ∞ gives 1 = a − 1 + 1 and, therefore, a = 1. Our basic assumption is, thus, equivalent to the one in Lemma 5.1.1, which implies X =_D Y.

Definition of Max-Characteristic Function


Let Z = (Z_1, …, Z_d) be an rv whose components are non-negative and integrable. Then, we call

    ϕ_Z(x) := E(max(1, x_1 Z_1, …, x_d Z_d)),

with x = (x_1, …, x_d) ≥ 0 ∈ ℝ^d, the max-characteristic function pertaining to Z. Lemma 5.1.1 shows that the distribution of a non-negative and integrable rv Z is uniquely determined by its max-CF. Note: although we use the notation Z for a non-negative rv here and in what follows, we do not require it to be the generator of a D-norm, i.e., we do not require that its components have expectation one. We only require them to have finite expectation.
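Since a max-CF is a plain expectation, it can be approximated by simple Monte Carlo averaging. The following Python sketch illustrates this; the exponential generator below is an arbitrary choice made only for illustration, and numpy is assumed to be available.

    import numpy as np

    rng = np.random.default_rng(0)

    def max_cf(x, Z):
        # Monte Carlo estimate of phi_Z(x) = E(max(1, x_1 Z_1, ..., x_d Z_d));
        # x has shape (d,), Z has shape (n, d) with non-negative entries
        return np.mean(np.maximum(1.0, Z * x).max(axis=1))

    # arbitrary example: iid standard exponential components
    # (non-negative, integrable, expectation one)
    Z = rng.exponential(size=(100_000, 3))
    x = np.array([0.5, 1.0, 2.0])
    print(max_cf(x, Z))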
The definition of a max-CF can be extended to arbitrary, not necessarily
non-negative rvs X = (X1 , . . . , Xd ) by putting Z := (exp(X1 ), . . . , exp(Xd )),
provided that E(exp(Xi )) < ∞, 1 ≤ i ≤ d. The distribution of Z clearly iden-
tifies that of X. A prominent example is an rv X that follows a multivariate
normal distribution. In this case, each component Z_i = exp(X_i) is log-normally distributed; see also the definition of the Hüsler–Reiss D-norm in (1.6).

Basic Properties of Max-CF


Some obvious properties of ϕ_Z are ϕ_Z(0) = 1, ϕ_Z(x) ≥ 1 for all x, and, with r ∈ [0, ∞),

    ϕ_Z(rx) ≤ rϕ_Z(x) if r ≥ 1,   ϕ_Z(rx) ≥ rϕ_Z(x) if r ≤ 1.
We will show next that any max-CF is a convex function; thus, it is continu-
ous and differentiable almost everywhere (see, for example, Rockafellar (1970,
Theorem 25.5)); furthermore, its derivative from the right exists everywhere.
This fact is used in Proposition 5.1.18, where we establish an inversion formula
for max-CFs.
When Z has bounded components, we obviously have ϕZ (x) = 1 in a
neighborhood of the origin by the definition of ϕZ (x).
Lemma 5.1.3 Any max-CF ϕZ is a convex function.

Proof. Let ϕ_Z be the max-CF of the rv Z = (Z_1, …, Z_d) on ℝ^d. We can assume wlog that no component Z_i is a.s. equal to zero. We have to show that, for any λ ∈ [0, 1],

    ϕ_Z(λx + (1 − λ)y) ≤ λϕ_Z(x) + (1 − λ)ϕ_Z(y),   x, y ≥ 0 ∈ ℝ^d.

Put

    X := (X_0, …, X_d) := (1, Z_1, …, Z_d).

A repetition of the arguments in the proof of Lemma 1.1.3 yields that

    ‖x‖_X := E(max_{0≤i≤d}(|x_i| X_i)),   x = (x_0, …, x_d) ∈ ℝ^{d+1},

defines a norm on ℝ^{d+1}; but ‖·‖_X is not necessarily a D-norm, as we only require E(X_i) ∈ (0, ∞) for i = 1, …, d.
Each norm ‖·‖ is a convex function, which is an obvious consequence of the triangle inequality and its homogeneity:

    ‖λx + (1 − λ)y‖ ≤ ‖λx‖ + ‖(1 − λ)y‖ = λ‖x‖ + (1 − λ)‖y‖,   λ ∈ [0, 1].

As a consequence, we obtain, for x, y ≥ 0 ∈ ℝ^d and λ ∈ [0, 1],

    ϕ_Z(λx + (1 − λ)y)
      = E(max(1, (λx_1 + (1 − λ)y_1)Z_1, …, (λx_d + (1 − λ)y_d)Z_d))
      = E(max(λ + (1 − λ), (λx_1 + (1 − λ)y_1)Z_1, …, (λx_d + (1 − λ)y_d)Z_d))
      = ‖(λ + (1 − λ), λx_1 + (1 − λ)y_1, …, λx_d + (1 − λ)y_d)‖_X
      ≤ λ‖(1, x_1, …, x_d)‖_X + (1 − λ)‖(1, y_1, …, y_d)‖_X
      = λϕ_Z(x) + (1 − λ)ϕ_Z(y),

which proves the convexity of ϕ_Z.


Lemma 5.1.4 The set of max-CFs is convex, i.e., if ϕ_1 and ϕ_2 are max-CFs on [0, ∞)^d, then

    ϕ_λ := λϕ_1 + (1 − λ)ϕ_2

is a max-CF for each λ ∈ [0, 1].

The proof of Lemma 5.1.4 repeats the arguments in the proof of Propo-
sition 1.4.1, which states that the set of D-norms is convex. It provides in
particular an rv Zλ , whose max-CF is given by ϕλ .
Proof. Let Z^(1), Z^(2) be rvs with corresponding max-CFs ϕ_1, ϕ_2. Take an rv ξ that attains only the values one and two, with probability P(ξ = 1) = λ = 1 − P(ξ = 2), and suppose that ξ is independent of Z^(1) and of Z^(2). Note that

    Z^(ξ) := (Z_1^(ξ), …, Z_d^(ξ))

is an rv in ℝ^d, whose components are non-negative and have finite expectation. Precisely, for x = (x_1, …, x_d) ≥ 0, we obtain

    E(max(1, x_1 Z_1^(ξ), …, x_d Z_d^(ξ)))
      = Σ_{i=1}^2 E(max(1, x_1 Z_1^(ξ), …, x_d Z_d^(ξ)) 1(ξ = i))
      = Σ_{i=1}^2 E(max(1, x_1 Z_1^(i), …, x_d Z_d^(i)) 1(ξ = i))
      = Σ_{i=1}^2 E(max(1, x_1 Z_1^(i), …, x_d Z_d^(i))) E(1(ξ = i))
      = Σ_{i=1}^2 E(max(1, x_1 Z_1^(i), …, x_d Z_d^(i))) P(ξ = i)
      = λϕ_1(x) + (1 − λ)ϕ_2(x),

where the third equality uses the independence of ξ and (Z^(1), Z^(2)). This completes the proof.


The Max-CF in the Univariate Case


When d = 1, the max-CF of a non-negative and integrable rv Z is, according to Lemma 1.2.2,

    ϕ_Z(x) = E(max(1, xZ))
           = 1 + ∫_1^∞ P(xZ > t) dt
           = 1 + x ∫_{1/x}^∞ P(Z > t) dt
           = 1 + x E((Z − 1/x) 1(Z > 1/x)).

The latter expression is connected to the expected shortfall of Z (see Embrechts et al. (1997)). Indeed, if q_Z(u) := inf{t ≥ 0 : P(Z ≤ t) ≥ u}, u ∈ (0, 1), is the quantile function of Z, then the expected shortfall of Z for α ∈ (0, 1) is defined by

    ES_Z(α) := E(Z | Z > q_Z(α)) = E(Z 1(Z > q_Z(α))) / P(Z > q_Z(α)).

Putting x_α := 1/q_Z(α), we obtain

    ϕ_Z(x_α) = 1 + x_α E((Z − q_Z(α)) 1(Z > q_Z(α)))
             = 1 + x_α P(Z > q_Z(α))(ES_Z(α) − q_Z(α))
             = 1 + x_α SP_Z(α),

where

    SP_Z(α) = P(Z > q_Z(α))(ES_Z(α) − q_Z(α))

is the stop-loss premium risk measure of Z; see Embrechts et al. (1997).
The preceding remarks suggest that max-CFs are closely connected to well-known elementary objects such as conditional expectations and risk measures; a particular consequence is that computing a max-CF may, in certain cases, be much easier than computing a standard characteristic function (CF), i.e., a Fourier transform. The following example illustrates this idea.

Example 5.1.5 Let Z be an rv that has the GPD with location parameter μ ≥ 0, scale parameter σ > 0 and shape parameter ξ ∈ (0, 1), whose df is

    P(Z ≤ z) = 1 − (1 + ξ(z − μ)/σ)^{−1/ξ},   z ≥ μ.

The expression of the CF of this distribution is a fairly involved one, which depends on the Gamma function evaluated in the complex plane. However, it is straightforward to show that, for all x > 0,

    ∫_x^∞ P(Z > z) dz = E(Z) − x = μ − x + σ/(1 − ξ)        if x < μ,
                      = (σ/(1 − ξ))(1 + ξ(x − μ)/σ)^{1−1/ξ}   if x ≥ μ.

Hence, the max-CF of Z is

    ϕ_Z(x) = x E(Z) = x(μ + σ/(1 − ξ))                              if x > 1/μ,
           = 1 + (σx/(1 − ξ))(1 + ξ(1 − μx)/(σx))^{1−1/ξ}           if x ≤ 1/μ.

The next example is a consequence of the Pickands–de Haan–Resnick representation of a max-stable df; see Theorem 2.3.4.

Example 5.1.6 Let G be a d-dimensional max-stable df with identical univariate Fréchet margins G_i(x) = exp(−x^{−α}), x > 0, α > 1. Then, there exists a D-norm ‖·‖_D on ℝ^d such that G(x) = exp(−‖1/x^α‖_D), x > 0 ∈ ℝ^d. The max-CF ϕ_G of G is, for x ≥ 0 ∈ ℝ^d,

    ϕ_G(x) = 1 + ∫_1^∞ 1 − exp(−‖x^α‖_D / y^α) dy
           = 1 + ‖x^α‖_D^{1/α} ∫_{1/‖x^α‖_D^{1/α}}^∞ 1 − exp(−y^{−α}) dy.

Characterizing Pointwise Convergence of Max-CFs


Denote by d_W(P, Q) the Wasserstein metric between two probability distributions on ℝ^d with finite first moments; see Section 1.8. Recall that convergence of probability measures P_n to P_0 with respect to the Wasserstein metric is equivalent to weak convergence together with convergence of the moments

    ∫_{ℝ^d} ‖x‖_1 P_n(dx) →_{n→∞} ∫_{ℝ^d} ‖x‖_1 P_0(dx);

see, for example, Villani (2009). Let X and Y be integrable rvs in ℝ^d with distributions P and Q. By d_W(X, Y) := d_W(P, Q) we denote the Wasserstein distance between X and Y. The next result says precisely that pointwise convergence of max-CFs is equivalent to convergence with respect to the Wasserstein metric.
Theorem 5.1.7 Let Z, Z^(n), n ∈ ℕ, be non-negative and integrable rvs in ℝ^d with pertaining max-CFs ϕ_Z, ϕ_{Z^(n)}, n ∈ ℕ. Then ϕ_{Z^(n)} →_{n→∞} ϕ_Z pointwise iff d_W(Z^(n), Z) →_{n→∞} 0.

Proof. Suppose that d_W(Z^(n), Z) →_{n→∞} 0. Then, we can find versions Z^(n) and Z with E(‖Z^(n) − Z‖_1) →_{n→∞} 0. For x = (x_1, …, x_d) ≥ 0, this implies

    ϕ_{Z^(n)}(x) = E(max(1, x_1(Z_1 + (Z_1^(n) − Z_1)), …, x_d(Z_d + (Z_d^(n) − Z_d))))
      ≤ E(max(1, x_1 Z_1, …, x_d Z_d)) + ‖x‖_∞ E(‖Z^(n) − Z‖_1),
      ≥ E(max(1, x_1 Z_1, …, x_d Z_d)) − ‖x‖_∞ E(‖Z^(n) − Z‖_1),

i.e., ϕ_{Z^(n)}(x) = ϕ_Z(x) + o(1).
Suppose next that ϕ_{Z^(n)} →_{n→∞} ϕ_Z pointwise. For t > 0 and x = (x_1, …, x_d) ≥ 0, we have

    tϕ_{Z^(n)}(x/t) = t + ∫_t^∞ 1 − P(x_i Z_i^(n) ≤ y, 1 ≤ i ≤ d) dy,
    tϕ_Z(x/t)      = t + ∫_t^∞ 1 − P(x_i Z_i ≤ y, 1 ≤ i ≤ d) dy.    (5.3)

Putting t = ε and x = e_i, for ε > 0 and 1 ≤ i ≤ d, these equations imply

    E(Z_i^(n)) = ∫_0^∞ 1 − P(Z_i^(n) ≤ y) dy
               = ∫_ε^∞ 1 − P(Z_i^(n) ≤ y) dy + O(ε)
               →_{n→∞} ∫_ε^∞ 1 − P(Z_i ≤ y) dy + O(ε)
               = E(Z_i) + O(ε).

Letting ε tend to zero entails convergence of E(Z_i^(n)) to E(Z_i). Therefore, we have to establish weak convergence of Z^(n) to Z. From equation (5.3), we obtain, for 0 < s < t and x = (x_1, …, x_d) ≥ 0,
    tϕ_{Z^(n)}(x/t) − sϕ_{Z^(n)}(x/s) = ∫_s^t P(x_i Z_i^(n) ≤ y, 1 ≤ i ≤ d) dy
      →_{n→∞} tϕ_Z(x/t) − sϕ_Z(x/s) = ∫_s^t P(x_i Z_i ≤ y, 1 ≤ i ≤ d) dy.    (5.4)

Let x = (x_1, …, x_d) ≥ 0 be a point of continuity of the df of Z. Suppose first that x > 0. Then, we have

    P(Z^(n) ≤ x) = P((1/x_i) Z_i^(n) ≤ 1, 1 ≤ i ≤ d).

If

    lim sup_{n→∞} P((1/x_i) Z_i^(n) ≤ 1, 1 ≤ i ≤ d) > P((1/x_i) Z_i ≤ 1, 1 ≤ i ≤ d)

or

    lim inf_{n→∞} P((1/x_i) Z_i^(n) ≤ 1, 1 ≤ i ≤ d) < P((1/x_i) Z_i ≤ 1, 1 ≤ i ≤ d),

then equation (5.4) readily produces a contradiction by putting s = 1 and t = 1 + ε, or t = 1 and s = 1 − ε with a small ε > 0. Thus, we have

    P(Z^(n) ≤ x) →_{n→∞} P(Z ≤ x)    (5.5)

for each point of continuity x = (x_1, …, x_d) of the df of Z with strictly positive components.
Suppose next that x_j = 0 for j ∈ T ⊂ {1, …, d}, x_i > 0 for i ∉ T, T ≠ ∅. In this case, we have

    P(Z ≤ x) = P(Z_i ≤ x_i, i ∉ T; Z_j ≤ 0, j ∈ T) = 0

by the assumed continuity from the left of the df of Z at x. Thus, we have to establish

    lim sup_{n→∞} P(Z^(n) ≤ x) = lim sup_{n→∞} P(Z_i^(n) ≤ x_i, i ∉ T; Z_j^(n) ≤ 0, j ∈ T) = 0.

Suppose that

    lim sup_{n→∞} P(Z_i^(n) ≤ x_i, i ∉ T; Z_j^(n) ≤ 0, j ∈ T) = c > 0.

Choose a point of continuity y > x. Then, we obtain

    0 < c ≤ lim sup_{n→∞} P(Z^(n) ≤ y) = P(Z ≤ y)

by equation (5.5). Letting y converge to x, we obtain P(Z ≤ x) ≥ c > 0 and, thus, a contradiction. This completes the proof of Theorem 5.1.7.


Some Applications to Multivariate Extremes


Convergence of a sequence of max-CFs is therefore stronger than convergence of the corresponding standard CFs: choose a sequence of real-valued rvs Z_n, n ∈ ℕ, such that

    P(Z_n = e^n) = 1/n   and   P(Z_n = 0) = 1 − 1/n,   n ∈ ℕ.

Then, Z_n converges to zero in probability and, therefore, in distribution, but the sequence of moments does not converge to zero as well, as E(Z_n) = e^n/n →_{n→∞} ∞.
The following Corollary 5.1.8, which is obtained by simply rewriting The-
orem 5.1.7, is tailored to applications in multivariate extreme value theory.
Corollary 5.1.8 Let X^(n), n ∈ ℕ, be independent copies of an rv X in ℝ^d that is non-negative and integrable in each component. Let ξ = (ξ_1, …, ξ_d) be a max-stable rv with Fréchet margins P(ξ_i ≤ x) = exp(−1/x^{α_i}), x > 0, α_i > 1, 1 ≤ i ≤ d. Then, from Theorem 5.1.7, we obtain the equivalence

    d_W(max_{1≤i≤n} X^(i) / a^(n), ξ) →_{n→∞} 0

for some norming sequence 0 < a^(n) ∈ ℝ^d iff

    ϕ_n →_{n→∞} ϕ_ξ pointwise,

where ϕ_n denotes the max-CF of max_{1≤i≤n} X^(i)/a^(n), n ∈ ℕ.

The following example shows a nice application of the use of max-CFs to the convergence of the componentwise maxima of independent generalized Pareto rvs.

Example 5.1.9 Let U be an rv that is uniformly distributed on (0, 1), and let Z = (Z_1, …, Z_d) be the generator of a D-norm ‖·‖_D with the additional property that each Z_i is bounded, i.e., Z_i ≤ c, 1 ≤ i ≤ d, for some constant c ≥ 1. We require that U and Z are independent.
Then, the rv

    V := (V_1, …, V_d) := (Z_1^{1/α}, …, Z_d^{1/α}) / U^{1/α}

with α > 0 follows a multivariate GPD; see Section 2.2. Precisely, for x ≥ (c^{1/α}, …, c^{1/α}) ∈ ℝ^d, we have

    P(V ≤ x) = 1 − ‖1/x^α‖_D.

Now, let V^(1), V^(2), … be independent copies of V and put

    Y^(n) := n^{−1/α} max_{1≤i≤n} V^(i).

Then, we have, for x > 0 ∈ ℝ^d and n large,

    P(Y^(n) ≤ x) = (1 − ‖1/(nx^α)‖_D)^n →_{n→∞} exp(−‖1/x^α‖_D) = P(ξ ≤ x),    (5.6)

where ξ is a max-stable rv with identical Fréchet margins P(ξ_i ≤ x) = exp(−1/x^α), x > 0. Choose α > 1; in this case, the components of V and ξ have finite expectations. By writing

    ϕ_{Y^(n)}(x) = 1 + ∫_1^∞ 1 − P(Y^(n) ≤ t/x) dt

and using equation (5.6), elementary arguments, such as a Taylor expansion, make it possible to show that the sequence of max-CFs ϕ_{Y^(n)} converges pointwise to the max-CF ϕ_ξ of ξ. Since convergence with respect to the Wasserstein metric is equivalent to convergence in distribution together with convergence of the moments, we obtain from Theorem 5.1.7 that, in this example, we actually have both Y^(n) →_D ξ and E(Y_i^(n)) →_{n→∞} E(ξ_i) = Γ(1 − 1/α) for 1 ≤ i ≤ d.
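A small simulation may illustrate Example 5.1.9. The bounded generator below, Z = (2U′, 2(1 − U′)) with U′ uniform on (0, 1), is an arbitrary choice made for this sketch; both of its components have expectation one and are bounded by c = 2. The empirical means of the normalized componentwise maxima approach Γ(1 − 1/α).

    import numpy as np
    from math import gamma

    rng = np.random.default_rng(0)
    alpha, n, reps = 2.0, 200, 5000

    Uprime = rng.uniform(size=(reps, n))
    Z = np.stack([2 * Uprime, 2 * (1 - Uprime)], axis=-1)  # bounded generator, E(Z_i) = 1
    U = rng.uniform(size=(reps, n, 1))
    V = Z ** (1 / alpha) / U ** (1 / alpha)                # multivariate GPD rvs

    Y = V.max(axis=1) * n ** (-1 / alpha)                  # normalized componentwise maxima
    print(Y.mean(axis=0), gamma(1 - 1 / alpha))            # both entries close to Gamma(1/2)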

Example 5.1.10 Let U^(1), U^(2), … be independent copies of the rv U = (U_1, …, U_d), which follows a copula on ℝ^d, i.e., each U_i is uniformly distributed on (0, 1). From Proposition 3.1.10, we know that there exists an SMS rv η = (η_1, …, η_d) in ℝ^d such that

    V^(n) := n(max_{1≤j≤n} U^(j) − 1) →_D η

iff there exists a D-norm ‖·‖_D on ℝ^d such that, for x ≤ 0 ∈ ℝ^d,

    P(V^(n) ≤ x) →_{n→∞} exp(−‖x‖_D) =: G(x),

or iff there exists a D-norm ‖·‖_D on ℝ^d such that

    C(u) = 1 − ‖1 − u‖_D + o(‖1 − u‖)

as u → 1, uniformly for u ∈ [0, 1]^d.
For 1 ≤ i ≤ d, we have convergence of the first moments

    E(n(1 − max_{1≤j≤n} U_i^(j))) = n/(n + 1) →_{n→∞} 1 = E(−η_i);

thus, we obtain from Theorem 5.1.7 the characterization

    V^(n) →_D η
      ⟺ d_W(V^(n), η) →_{n→∞} 0
      ⟺ ϕ_{−V^(n)} →_{n→∞} ϕ_{−η} pointwise.

For instance, when d = 2, straightforward computations yield that −η is a weak limit above iff it has a max-CF of the form

    ϕ_{−η}(x) = 1 + x_1 exp(−1/x_1) + x_2 exp(−1/x_2) − (1/‖1/x‖_D) exp(−‖1/x‖_D).

Convergence of D-Norms, Their Generators, and Max-CFs

The following consequences of Theorem 5.1.7 supplement Proposition 1.8.3, which is formulated for generators of D-norms whose components add up to the constant d. Now, we drop this condition.
Corollary 5.1.11 Let Z, Z^(n), n ∈ ℕ, be generators of D-norms on ℝ^d. Then ϕ_{Z^(n)} →_{n→∞} ϕ_Z pointwise iff Z^(n) →_D Z.

Interestingly, the convergence of a sequence of max-CFs of generators of D-norms also implies pointwise convergence of the related D-norms. We denote by ‖·‖_{D,Z} the D-norm generated by Z.

Corollary 5.1.12 Let Z, Z^(n), n ∈ ℕ, be generators of D-norms in ℝ^d with pertaining max-CFs ϕ_Z, ϕ_{Z^(n)}, n ∈ ℕ. Then, pointwise convergence ϕ_{Z^(n)} →_{n→∞} ϕ_Z implies ‖·‖_{D,Z^(n)} →_{n→∞} ‖·‖_{D,Z} pointwise.
Proof. For x ≥ 0 and proper versions of Z^(n) and Z, we have

    ‖x‖_{D^(n)} = E(max_{1≤i≤d}(x_i Z_i^(n)))
                = E(max_{1≤i≤d}(x_i Z_i + x_i(Z_i^(n) − Z_i)))
                = E(max_{1≤i≤d}(x_i Z_i)) + O(E(‖Z^(n) − Z‖_1))
                →_{n→∞} E(max_{1≤i≤d}(x_i Z_i)) = ‖x‖_D.
The following consequence of Corollaries 5.1.11 and 5.1.12 is obvious.
Corollary 5.1.13 Let Z^(0), Z^(n) be arbitrary generators of the D-norms ‖·‖_{D_0}, ‖·‖_{D_n} on ℝ^d, n ∈ ℕ. If Z^(n) →_D Z^(0), then ‖·‖_{D_n} →_{n→∞} ‖·‖_{D_0} pointwise.

The reverse implication in the preceding result is not true; just put Z^(0) := (1, …, 1) ∈ ℝ^d and Z^(n) := (X, …, X), where X ≥ 0 is an rv with E(X) = 1. Both generate the sup-norm ‖·‖_∞, but, clearly, Z^(n) does not converge in distribution to Z^(0) unless X = 1 a.s.

A Central Limit Theorem for D-Norms

The following example may be viewed as a central limit theorem for D-norms. It is closely related to the multiplication stability of the Hüsler–Reiss D-norm in Example 1.9.2; the multiplication of D-norms is introduced in Section 1.9.

Example 5.1.14 Let Y = (Y_1, …, Y_d) be an rv such that E(Y_i) = 0 and 0 < μ_i := E(exp(Y_i)) < ∞ for 1 ≤ i ≤ d. Then,

    Z := (exp(Y_1)/μ_1, …, exp(Y_d)/μ_d)
       = (exp(Y_1 − log(μ_1)), …, exp(Y_d − log(μ_d)))

is the generator of a D-norm.
Suppose that E(Y_i²) < ∞, 1 ≤ i ≤ d, and let Y^(1), Y^(2), … be independent copies of Y. Then, the multivariate central limit theorem is applicable, i.e.,

    (1/√n) Σ_{j=1}^n Y^(j) →_D N(0, Σ),
with covariance matrix Σ = (σ_{ij})_{1≤i,j≤d} = (E(Y_i Y_j))_{1≤i,j≤d}. The rv

    Z_{j,n} := (exp(Y_1^(j)/√n) / E(exp(Y_1^(j)/√n)), …, exp(Y_d^(j)/√n) / E(exp(Y_d^(j)/√n))),

with 1 ≤ j ≤ n and n ∈ ℕ, defines a triangular array of generators of D-norms ‖·‖_{D_{j,n}} that are identical in each row, i.e., ‖·‖_{D_{j,n}} =: ‖·‖_{D_n}, 1 ≤ j ≤ n, n ∈ ℕ. Note that Z_{1,n}, …, Z_{n,n} are independent rvs for each n ≥ 2, and thus,

    Z^(n) := Π_{j=1}^n Z_{j,n}
           = (exp((1/√n) Σ_{j=1}^n Y_1^(j) − n log E(exp(Y_1/√n))),
              …, exp((1/√n) Σ_{j=1}^n Y_d^(j) − n log E(exp(Y_d/√n))))

generates the product D-norm ‖·‖_{D_n^n}. Note that E(exp(Y/√n)) < ∞ for n ∈ ℕ if E(exp(Y)) < ∞.
Using the Taylor expansion exp(x) = 1 + x + exp(ϑx)x²/2, with some 0 < ϑ < 1, x ∈ ℝ, and log(1 + ε) = ε + O(ε²) as ε → 0, it is easy to see that

    n log E(exp(Y_j/√n)) →_{n→∞} E(Y_j²)/2 = σ_{jj}/2,   1 ≤ j ≤ d.

The multivariate central limit theorem, together with the continuous mapping theorem, implies

    Z^(n) →_D Z^(0) := exp(X_1 − σ_{11}/2, …, X_d − σ_{dd}/2),    (5.7)

where the rv X = (X_1, …, X_d) follows the multivariate normal distribution N(0, Σ).
The norm generated by Z^(0) is the Hüsler–Reiss D-norm ‖·‖_{HR_Σ}, introduced in (1.6). As a consequence, we obtain from (5.7) and Corollary 5.1.13 the pointwise convergence of the product D-norms

    ‖·‖_{D_n^n} →_{n→∞} ‖·‖_{HR_Σ}.


Bounding a Max-CF by the D-Norm

The following lemma relates a D-norm to the max-CF of its generator.

Lemma 5.1.15 If ϕ_Z is the max-CF of a generator Z of a D-norm ‖·‖_{D,Z}, then, for all x ∈ [0, ∞)^d,

    max(1, ‖x‖_{D,Z}) ≤ ϕ_Z(x) ≤ 1 + ‖x‖_{D,Z}.

Especially, if 𝒵 denotes the set of all generators of D-norms, then

    sup_{Z∈𝒵} |ϕ_Z(x)/‖x‖_{D,Z} − 1| → 0   for x ∈ [0, ∞)^d with ‖x‖_∞ → ∞.

Proof. The lower bound is obtained by noting that

    1 ≤ max(1, x_1 Z_1, …, x_d Z_d)

and

    max(x_1 Z_1, …, x_d Z_d) ≤ max(1, x_1 Z_1, …, x_d Z_d)

and taking expectations. The upper bound is a consequence of the inequality max(a, b) ≤ a + b, valid when a, b ≥ 0. Finally, the uniform convergence result is obtained by writing

    1 ≤ ϕ_Z(x)/‖x‖_{D,Z} ≤ 1 + 1/‖x‖_{D,Z}

for all Z ∈ 𝒵 and all x ∈ ℝ_+^d \ {0}. Because ‖·‖_{D,Z} ≥ ‖·‖_∞, this entails

    sup_{Z∈𝒵} |ϕ_Z(x)/‖x‖_{D,Z} − 1| ≤ 1/‖x‖_∞,

from which the conclusion follows.

It is worth noting that the inequalities of Lemma 5.1.15 are sharp in the sense that, for Z = (1, …, 1), ϕ_Z(x) = max(1, ‖x‖_∞) = max(1, ‖x‖_{D,Z}). Therefore, the leftmost inequality is in fact an equality in this case, whereas an inequality of the form ϕ_Z(x) ≤ a + b‖x‖_{D,Z} can only hold with a, b ≥ 1 because of the leftmost inequality.
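Both bounds of Lemma 5.1.15 are easily visualized by Monte Carlo. In the sketch below, the generator with iid standard exponential components is again an arbitrary example; the ratio ϕ_Z(x)/‖x‖_{D,Z} approaches one as ‖x‖_∞ grows.

    import numpy as np

    rng = np.random.default_rng(0)
    Z = rng.exponential(size=(200_000, 2))      # a generator: E(Z_i) = 1

    def phi(x):                                 # Monte Carlo max-CF
        return np.mean(np.maximum(1.0, Z * x).max(axis=1))

    def d_norm(x):                              # Monte Carlo D-norm E(max_i x_i Z_i)
        return np.mean((Z * x).max(axis=1))

    for s in (0.1, 1.0, 10.0, 100.0):
        x = s * np.array([1.0, 0.7])
        nx = d_norm(x)
        print(s, max(1.0, nx), phi(x), 1.0 + nx)   # lower bound <= phi(x) <= upper bound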
Lemma 5.1.15 has the following consequence.

Corollary 5.1.16 No constant function can be the max-CF of a generator of a D-norm.

Such a result is, of course, not true for max-CFs of arbitrary rvs since, for instance, the max-CF of the constant rv zero is the constant function one. The next result supplements Lemma 5.1.15.
Lemma 5.1.17 Let Z = (Z_1, …, Z_d) be a generator of a D-norm ‖·‖_D on ℝ^d, and let ϕ_Z be its max-CF. Then, we obtain

    lim_{x→∞} (ϕ_Z(x) − ‖x‖_D) = P(Z = 0),

where the convergence x → ∞ means that each component x_i of x converges to infinity.

Proof. From Lemma 1.2.2, we obtain, for x = (x_1, …, x_d) > 0,

    ϕ_Z(x) − ‖x‖_D
      = E(max(1, x_1 Z_1, …, x_d Z_d)) − E(max_{1≤i≤d}(x_i Z_i))
      = ∫_0^∞ P(max(1, x_1 Z_1, …, x_d Z_d) > t) dt − ∫_0^∞ P(max_{1≤i≤d}(x_i Z_i) > t) dt
      = ∫_0^1 1 dt + ∫_1^∞ P(max_{1≤i≤d}(x_i Z_i) > t) dt − ∫_0^∞ P(max_{1≤i≤d}(x_i Z_i) > t) dt
      = 1 − ∫_0^1 P(max_{1≤i≤d}(x_i Z_i) > t) dt
      = ∫_0^1 1 − P(max_{1≤i≤d}(x_i Z_i) > t) dt
      = ∫_0^1 P(max_{1≤i≤d}(x_i Z_i) ≤ t) dt
      = ∫_0^1 P(Z_i ≤ t/x_i, 1 ≤ i ≤ d) dt
      →_{x→∞} ∫_0^1 P(Z_i = 0, 1 ≤ i ≤ d) dt
      = P(Z = 0)

by the dominated convergence theorem.


An Inversion Formula for the Max-CF

It is clear from the bound

    ‖x‖_{D,Z} ≤ ϕ_Z(x) ≤ 1 + ‖x‖_{D,Z},   x ≥ 0 ∈ ℝ^d,

in Lemma 5.1.15 that the D-norm ‖·‖_{D,Z} can be deduced from ϕ_Z, since

    lim_{t→∞} ϕ_Z(tx)/t = ‖x‖_{D,Z},   x ≥ 0 ∈ ℝ^d.

In the next result, we establish a direct inversion formula for a non-negative and componentwise integrable rv, not necessarily the generator of a D-norm.
We have seen in Lemma 5.1.3 that each max-CF is a convex function; thus, it is continuous and differentiable almost everywhere (see, for example, Rockafellar (1970, Theorem 25.5)); besides, its derivative from the right exists everywhere.
The next result contains both an inversion formula for a max-CF and a criterion for a function to be a max-CF.
Proposition 5.1.18 Let Z be a non-negative and integrable rv with max-CF ϕ_Z.
(i) For all x = (x_1, …, x_d) > 0, we have

    P(Z ≤ x) = (∂_+/∂t)(t ϕ_Z(1/(tx)))|_{t=1}
             = lim_{h↓0} (1/h)[(1 + h) ϕ_Z(1/((1 + h)x)) − ϕ_Z(1/x)],

where ∂_+ denotes the right derivative.
(ii) If ψ is a continuously differentiable function such that, for all x = (x_1, …, x_d) > 0,

    (∂/∂t)(t ψ(1/(tx)))|_{t=1} = P(Z ≤ x)

and

    lim_{t→∞} t(ψ(1/(tx)) − 1) = 0,

then ψ = ϕ_Z on (0, ∞)^d.

The proof of the preceding result yields in particular that

    lim_{t→∞} t(ϕ(1/(tx)) − 1) = 0,   x = (x_1, …, x_d) > 0,

for any max-CF ϕ.

Proof. Notice first that, for all t > 0,

    t ϕ_Z(1/(tx)) = t E(max(1, Z_1/(tx_1), …, Z_d/(tx_d)))
                  = E(max(t, Z_1/x_1, …, Z_d/x_d)).

This gives

    t ϕ_Z(1/(tx)) = ∫_0^∞ P(max(t, Z_1/x_1, …, Z_d/x_d) > y) dy
                  = t + ∫_t^∞ P(max(Z_1/x_1, …, Z_d/x_d) > y) dy
                  = t + ∫_t^∞ 1 − P(Z_j ≤ yx_j, 1 ≤ j ≤ d) dy.

This representation yields in particular that lim_{t→∞} t(ϕ_Z(1/(tx)) − 1) = 0.
To show (i), notice that taking right derivatives with respect to t yields

    (∂_+/∂t)(t ϕ_Z(1/(tx))) = P(Z_j ≤ tx_j, 1 ≤ j ≤ d).

Setting t = 1 concludes the proof of (i). To prove (ii), note that, for all t > 0,

    (∂/∂t)(t ψ(1/(tx))) = ψ(1/(tx)) − (1/t) Σ_{j=1}^d (1/x_j) ∂_j ψ(1/(tx)),    (5.8)

where ∂_j ψ denotes the partial derivative of ψ with respect to its j-th component. In particular, because

    P(Z_j ≤ x_j, 1 ≤ j ≤ d) = (∂/∂t)(t ψ(1/(tx)))|_{t=1}
                            = ψ(1/x) − Σ_{j=1}^d (1/x_j) ∂_j ψ(1/x),    (5.9)

we obtain from equation (5.8), by replacing x with tx in equation (5.9), that, for all t > 0,

    (∂/∂t)(t ψ(1/(tx))) = P(Z_j ≤ tx_j, 1 ≤ j ≤ d).

Now write

    t ψ(1/(tx)) = t − ∫_t^∞ [(∂/∂y)(y ψ(1/(yx))) − 1] dy
                = t + ∫_t^∞ 1 − P(Z_j ≤ yx_j, 1 ≤ j ≤ d) dy
                = t ϕ_Z(1/(tx))

to conclude the proof of (ii).



Example 5.1.19 Let G be a d-dimensional max-stable df with identical univariate Fréchet margins G_i(x) = exp(−x^{−α}), x > 0, α > 1. According to Example 5.1.6, the max-CF of G is given by

    ϕ_G(x) = 1 + ‖x^α‖_D^{1/α} ∫_{1/‖x^α‖_D^{1/α}}^∞ 1 − exp(−y^{−α}) dy,   x > 0 ∈ ℝ^d,

with some D-norm ‖·‖_D on ℝ^d. The inversion formula immediately yields

    (∂_+/∂t)(t ϕ_G(1/(tx)))|_{t=1} = exp(−‖1/x^α‖_D) = G(x)

for x > 0 ∈ ℝ^d.
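For d = 1 and a standard exponential Z, one has ϕ_Z(x) = 1 + x exp(−1/x), and the right derivative in Proposition 5.1.18 (i) can be evaluated by a difference quotient. The following sketch recovers the df P(Z ≤ x) = 1 − exp(−x) in this way; the step size h is an arbitrary numerical choice.

    import numpy as np

    def phi(x):
        # max-CF of a standard exponential rv (d = 1): 1 + x * exp(-1/x)
        return 1.0 + x * np.exp(-1.0 / x)

    def df_from_phi(x, h=1e-6):
        # difference quotient from Proposition 5.1.18 (i)
        return ((1 + h) * phi(1.0 / ((1 + h) * x)) - phi(1.0 / x)) / h

    for x in (0.5, 1.0, 2.0):
        print(x, df_from_phi(x), 1.0 - np.exp(-x))   # recovers P(Z <= x)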

For additional results on max-CFs, we refer to Falk and Stupfler (2017).

5.2 Multivariate Order Statistics: The Intermediate Case

Asymptotic normality of intermediate order statistics, which are taken from


univariate iid rvs, is well known. We generalize this result to rvs in arbitrary
dimensions, where the order statistics are taken componentwise. D-norms turn
out to be quite helpful again.

Introducing Multivariate Order Statistics


   
(1) (1) (n) (n)
Let X (1) = X1 , . . . , Xd , . . . , X (n) = X1 , . . . , Xd be independent
copies of an rv X = (X1 , . . . , Xd ) that realizes in Rd . Using

X1:n,i ≤ X2:n,i ≤ · · · ≤ Xn:n,i

we denote the ordered values of the i-th components of X (1) , . . . , X (n) , 1 ≤


i ≤ d. Then, (Xj1 :n,1 , . . . , Xjd :n,d ), with 1 ≤ j1 , . . . , jd ≤ n, is an rv of order
statistics (os) in each component. We call it a multivariate os.
The univariate case d = 1 is, clearly, well investigated; standard refer-
ences are the books by David (1981); Reiss (1989); Galambos (1987); David
and Nagaraja (2005); Arnold et al. (2008), among others. In the multivariate
case d ≥ 2, the focus has been on the investigation of the rvs of component-
wise maxima (Xn:n,1 , . . . , Xn:n,d ); see Chapter 2 and 3 of this book and the
references given therein.
Much less is known in the extremal case (Xn−k1 :n,1 , . . . , Xn−kd :n,d ) with
k1 , . . . , kd ∈ N fixed; one reference is Galambos (1975). Asymptotic normality
of the rv (Xj1 :n,1 , . . . , Xjd :n,d ) in the case of central os is established in Reiss
(1989, Theorem 7.1.2). In this case, the indices ji = ji (n) depend on n and
have to satisfy ji (n)/n →n→∞ qi ∈ (0, 1), 1 ≤ i ≤ d.

Intermediate Order Statistics


In the case of intermediate os, we require ji = ji (n) = n − ki , where ki =
ki (n) →n→∞ ∞ with ki /n →n→∞ 0. Asymptotic normality of intermediate os
in the univariate case under fairly general von Mises conditions was established
in Falk (1989). Balkema and de Haan (1978a) and Balkema and de Haan
(1978b, Theorem 7.1) proved that for particular underlying df F , Xn−k+1:n
may have any limiting distribution if it is suitably standardized and if the
sequence k is chosen appropriately.
As pointed out by Smirnov (1967), a non-degenerate limiting distribution
of Xn−k+1:n , different from the normal one, can only occur if k has an exact
preassigned asymptotic behavior. Assuming only k →n→∞ ∞, k/n →n→∞
0, Smirnov (1967) gave necessary and sufficient conditions for F such that
Xn−k+1:n is asymptotically normal, and he specified the appropriate norming
constants; see condition (5.14) below.
Smirnov’s result was extended to multivariate intermediate os by Cheng
et al. (1997). They identify the class of limiting distributions of the rv
(Xn−k1 :n,1 , . . . , Xn−kd :n,d ) after suitable normalizing and centering, and give
necessary and sufficient conditions for weak convergence.
Cooil (1985) established multivariate extensions of the univariate case
by considering vectors of intermediate os (Xn−k1 +1:n , . . . , Xn−kd +1:n ) taken
from the same sample of univariate os X1:n ≤ · · · ≤ Xn:n , but with pair-
wise different k1 , . . . , kd . Barakat (2001) investigated the limit distribution of
bivariate os in all nine possible combinations of central, intermediate, and
extreme os.
According to Sklar’s theorem 3.1.1, the df of X = (X1 , . . . , Xd ) can be
decomposed into a copula and the df Fi of each component Xi , 1 ≤ i ≤ d.
We establish in what follows asymptotic normality of the vector of multivari-
ate os (Xn−k1 :n,1 , . . . , Xn−kd :n,d ) in the intermediate case. This is achieved
under the condition that the copula corresponding to X is in the max-
domain of attraction of a multivariate extreme value df, together with the
assumption that each univariate marginal df Fi satisfies a von Mises con-
dition and that the norming constants satisfy Smirnov’s condition (5.14)
below.

Main Results: Copula Case


We consider first the case that the df of the rv X is a copula, say C, on ℝ^d. We require C to be in the max-domain of attraction of an SMS df G, as in Section 3.1. In this case, according to Theorem 2.3.3, there exists a D-norm ‖·‖_D on ℝ^d such that

    G(x) = exp(−‖x‖_D),   x ≤ 0 ∈ ℝ^d.

From Proposition 3.1.5, we know that C ∈ D(G) is equivalent to the expansion

    C(u) = 1 − ‖1 − u‖_D + o(‖1 − u‖)    (5.10)

as u → 1, uniformly for u ∈ [0, 1]^d.


We are now ready to state asymptotic normality of the vector of multivari-
ate os in the intermediate case following a copula. The proof of Theorem 5.2.1
is postponed.

Theorem 5.2.1 (The Copula Case) Suppose that X = (X_1, …, X_d) follows a copula C, which satisfies expansion (5.10) with some D-norm ‖·‖_D on ℝ^d. Let k = k(n) = (k_1, …, k_d) ∈ {1, …, n − 1}^d, n ∈ ℕ, satisfy k_i/k_j → k_{ij}² ∈ (0, ∞) for all pairs of components 1 ≤ i, j ≤ d, k → ∞ and k/n → 0 as n → ∞. Then, the rv of componentwise intermediate os is asymptotically normal:

    ((n/√k_i)(X_{n−k_i:n,i} − (n − k_i)/n))_{i=1}^d →_D N(0, Σ),

where the d × d-covariance matrix Σ is given by

    Σ = (σ_{ij}) = 1                                              if i = j,
                 = k_{ij} + k_{ji} − ‖k_{ij} e_i + k_{ji} e_j‖_D    if i ≠ j.

If, for example, the underlying D-norm ‖·‖_D is the logistic norm ‖x‖_p = (Σ_{i=1}^d |x_i|^p)^{1/p}, p ≥ 1, then σ_{ij} = k_{ij} + k_{ji} − (k_{ij}^p + k_{ji}^p)^{1/p}, i ≠ j.
Note that σ_{ij} = 0, i ≠ j, if ‖·‖_D = ‖·‖_1, which is the case for independent margins of G(x) = exp(−‖x‖_D) = Π_{i=1}^d exp(x_i), x ≤ 0 ∈ ℝ^d. Then, the components of X = (X_1, …, X_d) are tail independent. The reverse implication is true as well, i.e., the preceding result entails that the componentwise intermediate os X_{n−k_1:n,1}, …, X_{n−k_d:n,d} are asymptotically independent iff they are pairwise asymptotically independent. But this is one of Takahashi's characterizations of ‖·‖_D = ‖·‖_1; see Corollary 1.3.5 and Theorem 2.3.8.
Note that σ_{ij} ≥ 0 for each pair i, j, i.e., the componentwise os are asymptotically non-negatively correlated. This is an obvious consequence of the fact that each D-norm ‖·‖_D is pointwise less than ‖·‖_1; see (1.4).
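Theorem 5.2.1 can be checked by simulation for a copula whose D-norm is known. The sketch below is our own illustrative choice: it samples the Cuadras–Augé copula C(u, v) = min(u, v)^{1/2}(uv)^{1/2}, a Marshall–Olkin-type copula, via U_i = max(W_0, W_i)² with iid uniform W's. For this copula ‖e_1 + e_2‖_D = 3/2, so with k_1 = k_2 = k the predicted limit correlation is σ_{12} = 2 − 3/2 = 1/2.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k, reps = 1000, 50, 2000

    W = rng.uniform(size=(reps, 3, n))
    U1 = np.maximum(W[:, 0], W[:, 1]) ** 2     # uniform margins, Cuadras-Auge dependence
    U2 = np.maximum(W[:, 0], W[:, 2]) ** 2

    T1 = np.sort(U1, axis=1)[:, n - k - 1]     # intermediate os X_{n-k:n,1}
    T2 = np.sort(U2, axis=1)[:, n - k - 1]
    S1 = n / np.sqrt(k) * (T1 - (n - k) / n)
    S2 = n / np.sqrt(k) * (T2 - (n - k) / n)
    print(np.corrcoef(S1, S2)[0, 1])           # close to the predicted 1/2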

Corollary 5.2.2 If we choose identical k_i in the preceding result, i.e., k_1 = ⋯ = k_d = k, then, under the conditions of Theorem 5.2.1, we obtain

    ((n/√k)(X_{n−k:n,i} − (n − k)/n))_{i=1}^d →_D N(0, Σ)

with

    Σ = (σ_{ij}) = 1                      if i = j,
                 = 2 − ‖e_i + e_j‖_D      if i ≠ j.

Let U_{1:n} ≤ U_{2:n} ≤ ⋯ ≤ U_{n:n} denote the os of n independent and uniformly on (0, 1) distributed rvs U_1, …, U_n. It is well known that

    (U_{i:n})_{i=1}^n =_D (Σ_{j=1}^i E_j / Σ_{j=1}^{n+1} E_j)_{i=1}^n,

where E_1, …, E_{n+1} are iid standard exponential rvs; see equation (2.29).
Let ξ_1, ξ_2, …, ξ_{2(n+1)} be iid standard normal distributed rvs. From the fact that (ξ_1² + ξ_2²)/2 follows the standard exponential distribution on (0, ∞), we obtain the representation

    (U_{i:n})_{i=1}^n =_D (Σ_{j=1}^{2i} ξ_j² / Σ_{j=1}^{2(n+1)} ξ_j²)_{i=1}^n.    (5.11)

Corollary 5.2.2 now opens up a way of tackling a multivariate extension of the above representation (5.11), at least partially and asymptotically.

Corollary 5.2.3 Suppose that the d × d-matrix Λ with entries

    λ_{ij} = σ_{ij}^{1/2} = 1                            if i = j,
                          = (2 − ‖e_i + e_j‖_D)^{1/2}    if i ≠ j,

is positive semidefinite, and let ξ^(1), ξ^(2), … be independent copies of the rv ξ = (ξ_1, …, ξ_d), which follows the normal distribution N(0, Λ) on ℝ^d. Then, under the conditions of Corollary 5.2.2, we obtain

    sup_{x∈ℝ^d} | P((X_{n−k:n,i})_{i=1}^d ≤ x)
                  − P((Σ_{j=1}^{2(n−k)} (ξ_i^(j))² / Σ_{j=1}^{2(n+1)} (ξ_i^(j))²)_{i=1}^d ≤ x) | →_{n→∞} 0.
Note that the univariate margins Σ_{j=1}^{2(n−k)} (ξ_i^(j))² / Σ_{j=1}^{2(n+1)} (ξ_i^(j))², i = 1, …, d, in the above result have identical distributions, due to equation (5.11).
The d × d-matrix (λ_{ij}²) = (σ_{ij}) is by Corollary 5.2.2 positive definite. However, if a matrix with non-negative entries is positive semidefinite, the matrix of the square roots of its entries is not necessarily positive semidefinite again. Take, for example, the 3 × 3-matrix

    A = ( 1  0  a )
        ( 0  1  a )
        ( a  a  1 ).

This matrix is positive definite for a = 1/3^{1/2}, but not for a = 1/3^{1/4}. This is the reason why we require the extra condition in Corollary 5.2.3 that the matrix Λ is positive semidefinite. The matrix Λ is, for example, positive semidefinite if the value of ‖e_i + e_j‖_D does not depend on the pair of indices i ≠ j, in which case Λ satisfies the compound symmetry condition.
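The claim about the 3 × 3-matrix A is quickly verified numerically (det A = 1 − 2a², so A fails to be positive semidefinite as soon as a > 1/√2):

    import numpy as np

    def A(a):
        return np.array([[1.0, 0.0, a],
                         [0.0, 1.0, a],
                         [a, a, 1.0]])

    for a in (3 ** -0.5, 3 ** -0.25):
        print(a, np.linalg.eigvalsh(A(a)))   # all eigenvalues > 0 only for a = 3**-0.5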

Proof (of Corollary 5.2.3). From Corollary 5.2.2, we obtain that

    ((n/√k)(X_{n−k:n,i} − (n − k)/n))_{i=1}^d →_D N(0, Σ).

The assertion follows if we establish

    ((n/√k)(Σ_{j=1}^{2(n−k)} (ξ_i^(j))² / Σ_{j=1}^{2(n+1)} (ξ_i^(j))² − (n − k)/n))_{i=1}^d →_D N(0, Σ)

as well. But this follows from the central limit theorem and elementary arguments, using the fact that Cov(X², Y²) = 2c² if (X, Y) is bivariate normal with Cov(X, Y) = c.

The Proof of Theorem 5.2.1


The proof of Theorem 5.2.1 requires a suitable multivariate central limit the-
orem for arrays. To ease its reference we state it explicitly here. It follows
from the univariate version based on Lindeberg’s condition, together with the
Cramér–Wold device; see, for example, Billingsley (1999, 2012). Recall that
all operations on vectors are meant componentwise.
Lemma 5.2.4 (Multivariate Central Limit Theorem for Arrays) Let X_n^(1), …, X_n^(n) be iid rvs with mean zero for each n ∈ ℕ, and bounded by some constant m = (m_1, …, m_d) > 0 ∈ ℝ^d. Suppose there is a sequence c^(n), n ∈ ℕ, in ℝ^d with nc_i^(n) →_{n→∞} ∞ for i = 1, …, d, such that the covariance matrix of X_n^(1) can be written as Cov(X_n^(1)) = C^(n) Σ^(n) C^(n), n ∈ ℕ, where C^(n) = diag(√(c^(n))) and the matrices Σ^(n), n ∈ ℕ, satisfy Σ^(n) →_{n→∞} Σ (meant element-wise). Then,

    (1/√(nc^(n))) Σ_{i=1}^n X_n^(i) →_D N(0, Σ).

Proof (of Theorem 5.2.1). Choose x = (x_1, …, x_d) ∈ ℝ^d. Elementary arguments yield

    P((n/√k_i)(X_{n−k_i:n,i} − (n − k_i)/n) ≤ x_i, 1 ≤ i ≤ d)    (5.12)
      = P(X_{n−k_i:n,i} ≤ (√k_i/n) x_i + (n − k_i)/n, 1 ≤ i ≤ d)
      = P(Σ_{j=1}^n 1(X_i^(j) ∈ [0, (√k_i/n) x_i + (n − k_i)/n]) ≥ n − k_i, 1 ≤ i ≤ d)
      = P(((1/√k_i) Σ_{j=1}^n [(√k_i/n) x_i + (n − k_i)/n
            − 1(X_i^(j) ∈ [0, (√k_i/n) x_i + (n − k_i)/n])])_{i=1}^d ≤ x).

Now, put

    Y^(n) := (Y_1^(n), …, Y_d^(n)) := (1(X_i ∈ [0, (√k_i/n) x_i + (n − k_i)/n]))_{i=1}^d

with values in {0, 1}^d. The entries of its covariance matrix Σ^(n) = (σ_{ij}^(n)) for i ≠ j are given by

    σ_{ij}^(n) = E(Y_i^(n) Y_j^(n)) − E(Y_i^(n)) E(Y_j^(n))
               = P(Y_i^(n) = Y_j^(n) = 1) − P(Y_i^(n) = 1) P(Y_j^(n) = 1)
               = P(X_i ≤ (√k_i/n) x_i + (n − k_i)/n, X_j ≤ (√k_j/n) x_j + (n − k_j)/n)
                 − P(X_i ≤ (√k_i/n) x_i + (n − k_i)/n) P(X_j ≤ (√k_j/n) x_j + (n − k_j)/n)
               = C_{ij}((√k_i/n) x_i + (n − k_i)/n, (√k_j/n) x_j + (n − k_j)/n)
                 − ((√k_i/n) x_i + (n − k_i)/n)((√k_j/n) x_j + (n − k_j)/n)

if n is large, where

    C_{ij}(u, v) := P(X_i ≤ u, X_j ≤ v) = C(1 − (1 − u) e_i − (1 − v) e_j),   u, v ∈ [0, 1].

Expansion (5.10) now implies, for the case i ≠ j,

    σ_{ij}^(n) = 1 − ‖(k_i/n − (√k_i/n) x_i) e_i + (k_j/n − (√k_j/n) x_j) e_j‖_D
                 − ((√k_i/n) x_i − k_i/n + 1)((√k_j/n) x_j − k_j/n + 1) + o(√(k_i k_j)/n)
               = −‖(k_i/n − (√k_i/n) x_i) e_i + (k_j/n − (√k_j/n) x_j) e_j‖_D
                 + (k_i + k_j)/n + o(√(k_i k_j)/n)
               = (√(k_i k_j)/n)(k_{ij} + k_{ji} − ‖k_{ij} e_i + k_{ji} e_j‖_D + o(1)).

For i = j, one deduces

    σ_{ii}^(n) = (k_i/n)(1 + o(1)).

The asymptotic normality N(0, Σ)(−∞, x] of the final term in equation (5.12) now follows from Lemma 5.2.4.

Main Results: General Case

Let F be a df on ℝ^d with univariate margins F_1, …, F_d. From Sklar's theorem 3.1.1, we know that there exists a copula C on ℝ^d such that F(x) = C(F_1(x_1), …, F_d(x_d)) for each x = (x_1, …, x_d) ∈ ℝ^d.
Let X^(1), X^(2), … be independent copies of the rv X, which follows this df F. We can assume the representation

    X = (F_1^{−1}(U_1), …, F_d^{−1}(U_d)),

where U = (U_1, …, U_d) follows the copula C and F_i^{−1}(u) = inf{t ∈ ℝ : F_i(t) ≥ u}, u ∈ (0, 1), is the generalized inverse of F_i, 1 ≤ i ≤ d. Equally, we can assume the representation

    X^(j) = (F_1^{−1}(U_1^(j)), …, F_d^{−1}(U_d^(j))),   j ∈ ℕ,

where U^(1), U^(2), … are independent copies of U.
Put ω(F_i) := sup{x ∈ ℝ : F_i(x) < 1} ∈ (−∞, ∞], which is the upper endpoint of the support of F_i, and suppose that the derivative F_i′ = f_i exists and is positive throughout some left neighborhood of ω(F_i). Let k_i = k_i(n) ∈ {1, …, n − 1} satisfy k_i →_{n→∞} ∞, k_i/n →_{n→∞} 0. It follows from Falk (1989, Theorem 2.1) that, under appropriate von Mises-type conditions on F_i, stated below, we have convergence in distribution

    (X_{n−k_i:n,i} − d_{ni}) / c_{ni} →_D N(0, 1)

for any sequences c_{ni} > 0, d_{ni} ∈ ℝ that satisfy

    lim_{n→∞} c_{ni}/a_{ni} = 1   and   lim_{n→∞} (d_{ni} − b_{ni})/a_{ni} = 0,    (5.13)

where

    b_{ni} := F_i^{−1}(1 − k_i/n),   a_{ni} := √k_i / (n f_i(b_{ni})),   1 ≤ i ≤ d.

Theorem 1 of Smirnov (1967) shows that the distribution of c_n^{−1}(X_{n−k_i:n} − d_n) converges weakly to N(0, 1) for some choice of constants c_n > 0, d_n ∈ ℝ, iff, for any x ∈ ℝ,

    lim_{n→∞} (k_i + n(F_i(c_n x + d_n) − 1)) / √k_i = x.    (5.14)

Von Mises-Type Conditions

Next, we state the three von Mises-type conditions under which we have asymptotic normality for intermediate multivariate os in the general case:

(von Mises (1))  ω(F_i) ∈ (−∞, ∞] and

    lim_{x↑ω(F_i)} f_i(x) ∫_x^{ω(F_i)} (1 − F_i(t)) dt / (1 − F_i(x))² = 1;

(von Mises (2))  ω(F_i) = ∞ and there exists α_i > 0 such that

    lim_{x→∞} x f_i(x) / (1 − F_i(x)) = α_i;

(von Mises (3))  ω(F_i) < ∞ and there exists α_i > 0 such that

    lim_{x↑ω(F_i)} (ω(F_i) − x) f_i(x) / (1 − F_i(x)) = α_i.

The standard normal df, as well as the standard exponential df, satisfies condition (1); the Pareto df F_α(x) = 1 − x^{−α}, x ≥ 1, α > 0, satisfies condition (2); and the triangular df on (−1, 1), with density f(x) = 1 − |x|, x ∈ (−1, 1), satisfies condition (3) with α = 2, for example. For a discussion of these well-studied and general conditions, each of which ensures that F_i is in the domain of attraction of a univariate EVD, see, for example, Falk (1989).

Asymptotic Normality: The General Case

The following generalization of Theorem 5.2.1 can now easily be established.

Proposition 5.2.5 Let the rv X have df F. Suppose that the copula C of F satisfies condition (5.10), i.e., C is in the max-domain of attraction of an SMS df, and suppose that each univariate margin F_i of F satisfies one of the von Mises-type conditions (1), (2), or (3).
Let k = k(n) ∈ {1, …, n − 1}^d, n ∈ ℕ, satisfy k_i/k_j →_{n→∞} k_{ij}² ∈ (0, ∞) for all pairs of components i, j = 1, …, d, k →_{n→∞} ∞ and k/n →_{n→∞} 0. Then, the vector of intermediate multivariate os satisfies

    ((X_{n−k_i:n,i} − d_{ni}) / c_{ni})_{i=1}^d →_D N(0, Σ),

with Σ as in Theorem 5.2.1, for any sequences c_{ni} > 0, d_{ni} ∈ ℝ that satisfy (5.13).

Proof. For x = (x_1, …, x_d) ∈ ℝ^d, we have

    P((X_{n−k_i:n,i} − d_{ni}) / c_{ni} ≤ x_i, 1 ≤ i ≤ d)
      = P(F_i^{−1}(U_{n−k_i:n,i}) ≤ c_{ni} x_i + d_{ni}, 1 ≤ i ≤ d)
      = P((n/√k_i)(U_{n−k_i:n,i} − (n − k_i)/n) ≤ (k_i + n(F_i(c_{ni} x_i + d_{ni}) − 1))/√k_i, 1 ≤ i ≤ d).

The assertion is now a consequence of Theorem 5.2.1 and Smirnov's condition (5.14).

5.3 Multivariate Records and Champions

Records among a sequence of iid rvs X^(1), X^(2), … on the real line have been investigated extensively over the past few decades. A record is defined as an rv X^(n) such that X^(n) > max(X^(1), …, X^(n−1)). Trying to generalize this concept to the case of random vectors, or even stochastic processes with continuous sample paths, gives rise to the question of how to define records in higher dimensions. We consider two different concepts: a simple record is meant to be an rv X^(n) that is larger than X^(1), …, X^(n−1) in at least one component, whereas a complete record has to be larger than its predecessors in all components. In addition to this sequential approach, we say that a set of rvs X^(1), …, X^(n) contains a champion if there is an index i ∈ {1, …, n} with X^(i) > X^(j), j ≠ i. In this case, X^(i) is called the champion among X^(1), …, X^(n).

Terminology

Let X, X^(1), X^(2), … be iid rvs in ℝ^d with continuous df F. We call X^(n) a simple record if X^(n) ≰ max_{i=1,…,n−1} X^(i), and we call it a complete record if X^(n) > max_{i=1,…,n−1} X^(i) (Figures 5.1–5.3). We further define

    π_n(X) := P(X^(n) is a simple record),
    π̄_n(X) := P(X^(n) is a complete record).

[Fig. 5.1: Data set at time n − 1.]

By definition, the first observation X^(1) is always a record; thus, we demand π_1(X) = π̄_1(X) = 1. In the univariate case, where X, X^(1), X^(2), … are rvs on the real line, records are much easier to handle, and clearly π_n(X) = π̄_n(X) = 1/n: with probability one there is a single strictly largest observation among X^(1), …, X^(n) by the continuity of F, i.e.,

    1 = P(∪_{j=1}^n {X^(j) > max_{1≤i≤n, i≠j} X^(i)}) = Σ_{j=1}^n P(X^(j) > max_{1≤i≤n, i≠j} X^(i)).
[Fig. 5.2: Data set at time n: a simple record.]

[Fig. 5.3: Data set at time n + 1: a complete record.]

The iid assumption on X^(1), …, X^(n) provides

    P(X^(n) > max_{1≤i≤n−1} X^(i)) = P(X^(j) > max_{1≤i≤n, i≠j} X^(i)),   j = 1, …, n − 1.

These two equations together imply

    P(X^(n) > max_{1≤i≤n−1} X^(i)) = 1/n.

There is much detailed work on univariate records; see, for example,


Galambos (1987, Sections 6.2 and 6.3), Resnick (1987, Chapter 4), and Arnold
et al. (1998). Results on the limiting distribution of joint records have been
recently derived by Barakat and Abd Elgawad (2017) and Falk et al. (2018).
Multivariate records have not been discussed that extensively, yet they have
been approached by Goldie and Resnick (1989, 1995) and Arnold et al. (1998,
Chapter 8), among others. For supplementary material on multivariate and
functional records, we refer the reader to Zott (2016) and Dombry et al. (2018).

It's the Copula

According to Sklar's theorem 3.1.1, the df F has the representation

    F(x) = C(F_1(x_1), …, F_d(x_d)),   x = (x_1, …, x_d),

where C is a copula on ℝ^d and F_1, …, F_d are the univariate margins of F. Therefore, we can assume the representation

    X^(i) = (F_1^{−1}(U_1^(i)), …, F_d^{−1}(U_d^(i))),   i = 1, 2, …,

where

    U^(i) = (U_1^(i), …, U_d^(i)),   i = 1, 2, …,

are iid rvs that follow the copula C.
Recall that since F is continuous, the margins are continuous as well, and in this case, C is uniquely determined by

    C(u) = F(F_1^{−1}(u_1), …, F_d^{−1}(u_d)),   u = (u_1, …, u_d) ∈ (0, 1)^d.

Being a record (or a champion) depends on U^(n), not on the df F, if this is a continuous function. This is the message of the next lemma.
Lemma 5.3.1 If the underlying df F is continuous, then we have with probability one, for each n ∈ ℕ,

    X^(n) is a (simple/complete) record or a champion
      ⟺ U^(n) is a (simple/complete) record or a champion.

Proof. We make use of the general equivalence

    F_i^{−1}(u) ≤ x ⟺ u ≤ F_i(x),   u ∈ (0, 1), x ∈ ℝ,

or, equivalently,

    F_i^{−1}(u) > x ⟺ u > F_i(x),   u ∈ (0, 1), x ∈ ℝ,

which can be established by elementary arguments. For a continuous df F_i, we have F_i(F_i^{−1}(u)) = u, u ∈ (0, 1). It is also well known that the multivariate df F is continuous iff the univariate margins are; see, for example, Reiss (1989, Lemma 2.2.6). Thus, we obtain with probability one, for i ≠ k,

    X_j^(i) > X_j^(k) ⟺ F_j^{−1}(U_j^(i)) > F_j^{−1}(U_j^(k))
                      ⟺ U_j^(i) > F_j(F_j^{−1}(U_j^(k))) = U_j^(k).

This proves the assertion.

Concurrency of Extremes and Champions

A concept that is closely related to the field of complete records is the so-called concurrency of extremes, which is due to Dombry et al. (2017). We say that X^(1), …, X^(n) are sample concurrent if

    max_{i=1,…,n} X^(i) = X^(k)   for some k ∈ {1, …, n}.

In that case, we call X^(k) the champion among X^(1), …, X^(n). Note that, different from univariate iid observations X_1, …, X_n with a continuous df F, there is not necessarily a champion for multivariate observations.
We denote the sample concurrence probability by p_n(X) and, due to the iid property, obtain as before

    p_n(X) = P(∪_{i=1}^n {X^(i) > max_{1≤j≤n, j≠i} X^(j)})
           = Σ_{i=1}^n P(X^(i) > max_{1≤j≤n, j≠i} X^(j))
           = nP(X^(n) > max_{j=1,…,n−1} X^(j))
           = nπ̄_n(X).    (5.15)

If the limit lim_{n→∞} p_n(X) exists in [0, 1], we call it the extremal concurrence probability.
Unlike records, the concept of multivariate and functional champions is very recent; it was established in the work of Dombry et al. (2017), who derive the limit sample concurrence probability for iid rvs X^(1), …, X^(n) in ℝ^d and also provide many results on statistical inference. The D-norm approach provides an elegant formulation of their results; see below.
According to Lemma 5.3.1, we can assume wlog that the observed iid rvs follow a copula, say C. To emphasize this assumption, in what follows, we use the notation U instead of X.
Theorem 5.3.2 Let U^(1), U^(2), … be independent copies of the rv U, which follows a copula C on ℝ^d satisfying C ∈ D(G), where G is an SMS df with corresponding D-norm ‖·‖_D. Then,

    p_n(U) = nπ̄_n(U) →_{n→∞} E(⦀η⦀_D),

where the rv η has df G and ⦀·⦀_D denotes the dual D-norm function pertaining to ‖·‖_D.

Proof. The condition C ∈ D(G) implies

    M^(n) := n(max_{i=1,…,n−1} U^(i) − 1) →_D η.
Conditioning on M^(n) = x ≤ 0 ∈ ℝ^d yields

    nπ̄_n(U) = ∫_{(−∞,0]^d} nP(n(U − 1) > x) (P ∗ M^(n))(dx)
             =: ∫_{(−∞,0]^d} g_n(x) (P ∗ M^(n))(dx),

since M^(n) and U are independent. Setting Y_n := g_n(M^(n)), we need to show

    nπ̄_n(U) = E(Y_n) →_{n→∞} E(⦀η⦀_D).

It is enough to verify (Billingsley (1968, p. 32)):
(i) Y_n →_D ⦀η⦀_D.
(ii) There exists ε > 0 with sup_{n∈ℕ} E(|Y_n|^{1+ε}) < ∞.
Note that (ii) implies the uniform integrability of the sequence (Y_n)_{n∈ℕ}.
We first show (i). From Lemma 3.1.13, we obtain

    g_n(x_n) →_{n→∞} ⦀x⦀_D

for x_n, x ≤ 0 ∈ ℝ^d with ‖x_n − x‖_∞ →_{n→∞} 0. Now noticing that M^(n) →_D η, the assertion is immediate from the extended continuous mapping theorem (Billingsley (1968, Theorem 5.5)).
Now we prove part (ii). Elementary calculations show that, for all n ≥ 2,

    E(Y_n²) = ∫_{(−∞,0]^d} n² P(n(U − 1) > x)² (P ∗ M^(n))(dx)
            ≤ ∫_{(−∞,0]} n² P(n(U_1 − 1) > x)² (P ∗ M_1^(n))(dx)
            = ∫_{(−∞,0]} n² P(U_1 > 1 + x/n)² (P ∗ M_1^(n))(dx)
            = ∫_{[−n,0]} x² (P ∗ M_1^(n))(dx)
            = E((M_1^(n))²)
            = 2n/(n + 1) ≤ 2,

which completes the proof of Theorem 5.3.2.

The preceding result clearly implies an expansion of the expected number


of complete records as the sample size n increases.
Corollary 5.3.3 Denote by R(n) := Σ_{i=1}^n 1(U^(i) > max_{1≤j<i} U^(j)) the number of complete records among U^(1), …, U^(n). Then, under the conditions of Theorem 5.3.2,

    E(R(n))/log(n) →_{n→∞} E(⦀η⦀_D).

Proof. We have

    E(R(n)) = Σ_{i=1}^n P(U^(i) > max_{1≤j<i} U^(j))
            = Σ_{i=1}^n π̄_i(U)
            = Σ_{i=1}^n (1/i)(i π̄_i(U)).

It is well known that (Σ_{i=1}^n 1/i)/log(n) →_{n→∞} 1. The assertion follows from Theorem 5.3.2 and equation (5.15) together with elementary arguments.
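In the completely dependent case of Example 5.3.5 below, where ‖·‖_D = ‖·‖_∞ and E(⦀η⦀_D) = E(|η_1|) = 1, a complete record of U^(i) = (V_i, V_i) is simply a univariate record, so the growth rate of Corollary 5.3.3 can be observed directly. The following sketch does this; the construction with duplicated components is our illustrative choice, and the convergence is slow, with a bias of order 1/log(n) coming from Euler's constant.

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 5000, 500

    counts = np.empty(reps)
    for r in range(reps):
        V = rng.uniform(size=n)
        running_max = np.maximum.accumulate(V)
        # V[i] is a (complete) record iff it exceeds all previous observations
        records = np.concatenate(([True], V[1:] > running_max[:-1]))
        counts[r] = records.sum()

    print(counts.mean() / np.log(n))   # tends to 1 as n grows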
The following lemma provides an alternative representation of the extremal concurrence probability.

Lemma 5.3.4 Let η = (η_1, …, η_d) be an SMS rv with corresponding D-norm ‖·‖_D, and let Z = (Z_1, …, Z_d) be a generator of ‖·‖_D. Then, we have

    E(⦀η⦀_D) = E((1/‖1/Z‖_D) 1(Z > 0)).

Further, for x = (x_1, …, x_d) ≤ 0 ∈ ℝ^d, we have

    E(⦀max(η, x)⦀_D)
      = E((1/‖1/Z‖_D)[1 − exp(‖1/Z‖_D max_{1≤i≤d}(x_i Z_i))] 1(Z > 0)).

Proof. Wlog, we can choose a generator Z of ‖·‖_D that is independent of η. Then, by conditioning on η,

    E(min_{1≤i≤d}(|η_i| Z_i)) = ∫_{(−∞,0]^d} E(min_{1≤i≤d}(|x_i| Z_i)) (P ∗ η)(dx)
                              = ∫_{(−∞,0]^d} ⦀x⦀_D (P ∗ η)(dx)
                              = E(⦀η⦀_D).

Lemma 1.2.2 and the fact that η and Z are independent entail
    E(min_{1≤i≤d}(|η_i| Z_i)) = ∫_0^∞ P(min_{1≤i≤d}(|η_i| Z_i) > t) dt
      = ∫_0^∞ P(η_i < −t/Z_i, 1 ≤ i ≤ d) dt
      = ∫_0^∞ ∫_{[0,∞)^d} P(η_i < −t/z_i, 1 ≤ i ≤ d) (P ∗ Z)(dz) dt
      = ∫_0^∞ ∫_{[0,∞)^d} exp(−‖t/z‖_D) (P ∗ Z)(dz) dt
      = ∫_{[0,∞)^d} (∫_0^∞ exp(−t ‖1/z‖_D) dt) 1(z > 0) (P ∗ Z)(dz)
      = ∫_{[0,∞)^d} (1/‖1/z‖_D) 1(z > 0) (P ∗ Z)(dz)
      = E((1/‖1/Z‖_D) 1(Z > 0)),

which is the first assertion. The second assertion can be shown by repeating the above arguments.

Example 5.3.5 (Independence and Perfect Dependence) A generator of the special D-norm ‖·‖_D = ‖·‖_∞, which characterizes the complete dependence of the univariate margins of η, is given by the constant Z = 1. In that case, Lemma 5.3.4 implies that the extremal concurrence probability is one, i.e., p_n(U) = nπ̄_n(U) →_{n→∞} 1.
In contrast to that, we have

    E((1/‖1/Z‖_D) 1(Z > 0)) = 0 ⟺ min_{1≤i≤d} Z_i = 0 a.s.    (5.16)

In particular, this is the case when at least two components η_i, η_j, i ≠ j, are independent. This is due to the fact that the bivariate D-norm corresponding to (η_i, η_j) is then the bivariate logistic norm ‖·‖_1, with generator (Z_i, Z_j). But ⦀·⦀_1 = 0, and thus min(Z_i, Z_j) = 0 a.s. by (1.12) and Corollary 1.6.3.

Asymptotic Distribution of Complete Records


Having established the extremal concurrence probability, we can now derive
the limit survival function of a complete record. We have to restrict ourselves
to the case where P (Z > 0) > 0, which is equivalent to the condition that
the extremal concurrence probability is positive; see (5.16). Just as before, we
consider the copula case first.
Proposition 5.3.6 In addition to the assumptions of Theorem 5.3.2, suppose that the generator fulfills P(Z > 0) > 0. Then, for x ≤ 0 ∈ ℝ^d,

    H̄_n(x) := P(n(U^(n) − 1) > x | U^(n) is a complete record)
      →_{n→∞} H̄_D(x) := E(⦀max(η, x)⦀_D) / E(⦀η⦀_D),

where η = (η_i)_{1≤i≤d} is an SMS rv with corresponding D-norm ‖·‖_D.

Note that we avoid division by zero in the preceding formula by the assumption P(Z > 0) > 0.

Proof. We have

    H̄_n(x) = Π_n(x)/π̄_n
            := P(n(U − 1) > x, U > max_{i=1,…,n−1} U^(i)) / P(U > max_{i=1,…,n−1} U^(i)).

According to Theorem 5.3.2, it remains to show that, for each x ∈ ℝ^d,

    nΠ_n(x) = nP(n(U − 1) > max(x, M^(n))) →_{n→∞} E(⦀max(η, x)⦀_D),

where M^(n) := n(max_{i=1,…,n−1} U^(i) − 1). This can be done by repeating the arguments of the proof of Theorem 5.3.2.

Another representation of H̄_D(x) is given by

    H̄_D(x) = 1 − E((1/‖1/Z‖_D) exp(‖1/Z‖_D max_{1≤i≤d}(x_i Z_i)) 1(Z > 0))
                  / E((1/‖1/Z‖_D) 1(Z > 0)),    (5.17)

where Z is a generator of ‖·‖_D. This is due to Lemma 5.3.4.
Example 5.3.7 For the Marshall–Olkin D-norm

    ‖x‖_{D_λ} = λ‖x‖_∞ + (1 − λ)‖x‖_1,   x ∈ ℝ^d, λ ∈ (0, 1),

we obtain

    H̄_λ(x) = 1 − exp(‖1‖_{D_λ} max_{i=1,…,d} x_i),   x ≤ 0,

which is the survival function of the max-stable rv (η, …, η)/‖1‖_{D_λ}, where η is standard negative exponentially distributed and ‖1‖_{D_λ} = λ + d(1 − λ). Note that this rv has completely dependent components.
Although it is not common to prove an example, we provide a proof of the preceding Example 5.3.7 as well.

Proof. A generator of the Marshall–Olkin D-norm ‖·‖_{D_λ} is given by

    Z := ξ(1, …, 1) + (1 − ξ)Z*,

where ξ is an rv with P(ξ = 1) = λ = 1 − P(ξ = 0), and ξ is independent of Z*, which is a random permutation of the vector (d, 0, …, 0) with equal probability 1/d. Obviously, P(Z > 0, ξ = 0) = 0. On the other hand, ξ = 1 implies Z = 1. Thus, we obtain from (5.17), for all x ≤ 0 ∈ ℝ^d,

    H̄_λ(x) = 1 − E((1/‖1/Z‖_{D_λ}) exp(‖1/Z‖_{D_λ} max_{i=1,…,d}(x_i Z_i)) 1(Z > 0, ξ = 1))
                  / E((1/‖1/Z‖_{D_λ}) 1(Z > 0, ξ = 1))
            = 1 − exp(‖1‖_{D_λ} max_{i=1,…,d} x_i),

which completes the proof.


Simple Records

So far, we have investigated the (normalized) probability of a complete record and, in particular, its limit, the extremal concurrence probability. Now, we repeat this procedure, this time for the simple record probability. Unlike before, where we were actually dealing with the probability of having a champion, normalizing the record probability with the factor n does not yield an interpretation in terms of a probability in the simple record case.
The following result is the equivalent of Theorem 5.3.2 and Proposition 5.3.6 in the context of multivariate simple records. Let X, X^(1), X^(2), … be iid rvs in ℝ^d with common continuous df F. Recall that X^(n) is a simple record if

    X^(n) ≰ max_{1≤i≤n−1} X^(i),

and π_n(X) denotes the probability of X^(n) being a simple record within the iid sequence X^(1), X^(2), …
Theorem 5.3.8 Let U^(1), U^(2), … be independent copies of an rv U ∈ ℝ^d following a copula C. Suppose that C ∈ D(G), G(x) = exp(−‖x‖_D), x ≤ 0 ∈ ℝ^d. Let η be an rv with this df G. Then,

    nπ_n(U) →_{n→∞} E(‖η‖_D),

and

    P(n(U^(n) − 1) ≤ x | U^(n) is a simple record)
      →_{n→∞} H_D(x) := (E(‖min(x, η)‖_D) − ‖x‖_D) / E(‖η‖_D),   x ≤ 0 ∈ ℝ^d.

In the one-dimensional case d = 1, we obtain H_D(x) = exp(x), x ≤ 0. Note, however, that H_D is not a probability df in general. For instance, take ‖·‖_D = ‖·‖_1, which is the largest D-norm. In this case, the components η_1, …, η_d of η are independent, and we obtain, for x = (x_1, …, x_d) ≤ 0 ∈ ℝ^d,

    H_1(x) = Σ_{i=1}^d (E(|min(x_i, η_i)|) − |x_i|) / Σ_{i=1}^d E(|η_i|) = (1/d) Σ_{i=1}^d exp(x_i).

This is in general not a probability df on (−∞, 0]^d since, for example, H_1(x) does not converge to zero if only one component x_i converges to −∞.
On the other hand, take ‖·‖_D = ‖·‖_∞, which is the least D-norm. In this case, the components η_1, …, η_d of η are completely dependent, i.e., η_1 = η_2 = ⋯ = η_d a.s.; thus,

    H_∞(x) = E(‖(min(x_i, η_1))_{i=1}^d‖_∞) − ‖x‖_∞
           = E(max(‖x‖_∞, |η_1|)) − ‖x‖_∞
           = exp(−‖x‖_∞),   x = (x_1, …, x_d) ≤ 0 ∈ ℝ^d,

which is an SMS df according to Theorem 2.3.3.

Proof (of Theorem 5.3.8). Let Z be a generator of ‖·‖_D, independent of η. Theorem 5.3.2, the inclusion–exclusion principle in Corollary 1.6.2, as well as Lemma 1.6.1 yield

    nπ_n(U) = nP(U ≰ max_{1≤i≤n−1} U^(i))
      = nP(∪_{j=1}^d {U_j > max_{1≤i≤n−1} U_j^(i)})
      = Σ_{∅≠T⊂{1,…,d}} (−1)^{|T|−1} nP(U_j > max_{1≤i≤n−1} U_j^(i), j ∈ T)
      →_{n→∞} Σ_{∅≠T⊂{1,…,d}} (−1)^{|T|−1} E(min_{j∈T}(|η_j| Z_j))
      = E(max_{1≤j≤d}(|η_j| Z_j))
      = E(‖η‖_D).
Similarly, one can use Proposition 5.3.6 to show that, for x ≤ 0 ∈ ℝ^d,

    nP(n(U − 1) ≰ min(x, M_n)) →_{n→∞} E(‖min(x, η)‖_D),

where M_n := n(max_{i=1,…,n−1} U^(i) − 1) →_D η.
From Proposition 3.1.5, we obtain, for x ≤ 0 ∈ ℝ^d,

    nP(U ≰ 1 + x/n) = n(1 − P(U ≤ 1 + x/n)) = ‖x‖_D + o(1)

as n increases.
In summary, we obtain

    nP(U ≤ 1 + x/n, U ≰ max_{1≤i≤n−1} U^(i))
      = nP(n(U − 1) ≰ min(x, M_n)) − nP(U ≰ 1 + x/n)
      →_{n→∞} E(‖min(x, η)‖_D) − ‖x‖_D,

which completes the proof of Theorem 5.3.8.


In Corollary 5.3.3, we investigated the expected number of complete


records as the sample size went to infinity. This can be done analogously
for simple records. Its proof carries over.

Corollary 5.3.9 Let X (1) , X (2) , . . . be iid rvs in Rd with a continuous


df F . Suppose that the copula corresponding to F is in the domain of
n xD ), x ≤ 0 ∈ R .
d
attraction of G(x) = exp(− 
Denote by m(n) := i=1 1 X (i)
≤ max1≤j<i X (j) the number of
simple records among X (1) , . . . , X (n) . Then, we have

E(m(n))
→n→∞ E (ηD ) ,
log(n)

where η follows the df G.

The arguments in the proof of Theorem 5.3.8 can easily be repeated to extend it to the case of a general rv X ∈ ℝ^d whose df is in the domain of attraction of a max-stable df. Denote again by

    C_F(u) := F(F_1^{−1}(u_1), …, F_d^{−1}(u_d)),   u = (u_1, …, u_d) ∈ [0, 1]^d,

the copula of a continuous df F on ℝ^d, where F_i is the i-th univariate marginal df.
Corollary 5.3.10 Let X^(1), X^(2), … be independent copies of an rv X ∈ ℝ^d, whose df F is continuous and whose copula C_F satisfies C_F ∈ D(G), G(x) = exp(−‖x‖_D), x ≤ 0 ∈ ℝ^d. We require in addition that each univariate margin F_i of F is in the domain of attraction of a univariate max-stable df G_i, i.e., there are constants a_{ni} > 0, b_{ni} ∈ ℝ, n ∈ ℕ, such that, for i = 1, …, d,

    n(1 − F_i(a_{ni} x + b_{ni})) →_{n→∞} −log(G_i(x)) =: −ψ_i(x),   G_i(x) > 0.

Then, with a_n := (a_{n1}, …, a_{nd}), b_n := (b_{n1}, …, b_{nd}) and ψ(x) := (ψ_1(x_1), …, ψ_d(x_d)), x = (x_1, …, x_d), G_i(x_i) > 0, i = 1, …, d, we obtain

    P((X^(n) − b_n)/a_n ≤ x | X^(n) is a simple record) →_{n→∞} H_D(ψ(x)).

Note that in the case d = 1,

    H_D(ψ(x)) = exp(ψ(x)) = G(x),   G(x) > 0.

Note, moreover, that the assumptions on the df F in the preceding corollary are equivalent to the condition F ∈ D(G), where G is a d-dimensional max-stable df, together with the condition that F is continuous; see Proposition 3.1.10.

Proof (of Corollary 5.3.10). Assume the representation

    X = (F_1^{−1}(U_1), …, F_d^{−1}(U_d)),

where U = (U_1, …, U_d) follows the copula C_F of X. Repeating the arguments in the proof of Theorem 5.3.8 now implies the assertion.

(Simple) Record Times

We denote by N(n), n ≥ 1, the (simple) record times, i.e., those subsequent random indices at which a simple record occurs. Precisely, N(1) = 1, as X^(1) is clearly a record, and, for n ≥ 2,

    N(n) := min{j : j > N(n − 1), X^(j) ≰ max_{1≤i≤N(n−1)} X^(i)}.

As the df F is continuous, the distribution of N(n) does not depend on the univariate margins of F; therefore, we assume wlog in what follows that F is a copula C on ℝ^d, i.e., each component of X^(i) is uniformly distributed on (0, 1). To emphasize this fact, we again use the notation U = (U_1, …, U_d) instead of X in what follows.
226 5 Further Applications of D-Norms to Probability & Statistics

Expectation of Record Time


Conditioning on $U^{(1)} = u$, we obtain for $j \ge 2$
$$
P(N(2) = j) = P\bigl(U^{(2)} \le U^{(1)},\dots,U^{(j-1)} \le U^{(1)},\, U^{(j)} \not\le U^{(1)}\bigr)
= \int_{[0,1]^d} C(u)^{j-2}\bigl(1 - C(u)\bigr)\,C(du).
$$

Solving the geometric series, we get
$$
E(N(2)) = \sum_{j=2}^\infty jP(N(2) = j) = \int_{[0,1]^d} \frac{1}{1 - C(u)}\,C(du) + 1. \qquad (5.18)
$$

Suppose now that $d = 1$. Then we have $u = u \in [0,1]$, $C(u) = u$, and
$$
E(N(2)) = \int_0^1 \frac{u}{1-u}\,du + 2 = \infty,
$$
which is well known (Galambos (1987, Theorem 6.2.1)). Because $N(n) \ge N(2)$, $n \ge 2$, we have $E(N(n)) = \infty$ for $n \ge 2$ as well.
Suppose next that $d \ge 2$ and that the margins of $C$ are independent, i.e.,
$$
C(u) = \prod_{i=1}^d u_i, \qquad u = (u_1,\dots,u_d) \in [0,1]^d.
$$

Then we obtain
$$
\int_{[0,1]^d} \frac{C(u)}{1 - C(u)}\,C(du)
= \int_0^1\!\cdots\!\int_0^1 \frac{\prod_{i=1}^d u_i}{1 - \prod_{i=1}^d u_i}\,du_1\cdots du_d < \infty
$$
by elementary arguments and, thus, $E(N(2)) < \infty$. This observation raises the problem of characterizing those copulas $C$ on $[0,1]^d$, $d \ge 2$, for which $E(N(2))$ is finite. Note that $E(N(2)) = \infty$ if the components of $C$ are completely dependent.
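For the bivariate independence copula, (5.18) can even be evaluated in closed form: expanding $1/(1 - u_1u_2) = \sum_{k\ge 0}(u_1u_2)^k$ and integrating term by term gives $E(N(2)) = \sum_{k\ge 1} k^{-2} + 1 = \pi^2/6 + 1 \approx 2.6449$. The following sketch (my own check, not from the book; sample_N2 is a hypothetical helper) simulates $N(2)$ directly:

    import numpy as np

    rng = np.random.default_rng(2)

    def sample_N2(rng):
        u1 = rng.uniform(size=2)                  # U^(1), always a record
        j = 1
        while True:
            j += 1
            if np.any(rng.uniform(size=2) > u1):  # U^(j) not dominated by U^(1)
                return j                          # so N(2) = j

    draws = [sample_N2(rng) for _ in range(200_000)]
    print(np.mean(draws), np.pi**2 / 6 + 1)

The sample mean converges only slowly, since $N(2)$ has infinite variance here: its tail $P(N(2) - 1 > m) = 1/(m+1)^2$ is of the order $m^{-2}$.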

Characterization of Finite Expectation

The next result characterizes the case $E(N(2)) < \infty$. It requires no further condition on the underlying copula $C$; in particular, we do not require $C \in D(G)$ for some SMS df $G$. Its proof only uses the Hoeffding–Fréchet bounds for a multivariate df (see, for example, Galambos (1987, Theorem 5.1.1)).

Proposition 5.3.11 We have $E(N(2)) < \infty$ iff
$$
\int_0^1 \frac{P(U_i \ge u,\, 1\le i\le d)}{(1-u)^2}\,du < \infty. \qquad (5.19)
$$

Condition (5.19) is trivially satisfied in the case of independent components $U_1,\dots,U_d$ and $d \ge 2$. Below we see that it is, roughly speaking, satisfied in general whenever there are at least two components that are tail independent.
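To make the independent case explicit (a one-line computation of mine, directly from (5.19)): with independent components,
$$
P(U_i \ge u,\, 1\le i\le d) = (1-u)^d,
$$
so the integrand in (5.19) equals $(1-u)^{d-2}$, and
$$
\int_0^1 (1-u)^{d-2}\,du < \infty \iff d \ge 2.
$$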
Proof. The Hoeffding–Fréchet bounds for a copula $C$ are, for $u = (u_1,\dots,u_d) \in [0,1]^d$,
$$
\max\Bigl(1 - d + \sum_{i=1}^d u_i,\, 0\Bigr) \le C(u) \le \min(u_1,\dots,u_d). \qquad (5.20)
$$
Due to the upper bound in (5.20), we obtain from Lemma 1.2.2
$$
\begin{aligned}
E(N(2)) - 1 &= \int_{[0,1]^d} \frac{1}{1 - C(u)}\,C(du) = E\Bigl(\frac{1}{1 - C(U)}\Bigr) \\
&= \int_1^\infty P\Bigl(C(U) > 1 - \frac{1}{t}\Bigr)\,dt
\le \int_1^\infty P\Bigl(U_i > 1 - \frac{1}{t},\, 1\le i\le d\Bigr)\,dt.
\end{aligned}
$$
On the other hand, the lower bound in (5.20) yields
$$
\begin{aligned}
E(N(2)) - 1 &= \int_1^\infty P\Bigl(C(U) > 1 - \frac{1}{t}\Bigr)\,dt \\
&\ge \int_1^\infty P\Bigl(\sum_{i=1}^d (1 - U_i) < \frac{1}{t}\Bigr)\,dt \\
&\ge \int_1^\infty P\Bigl(1 - U_i < \frac{1}{dt},\, 1\le i\le d\Bigr)\,dt \\
&= \frac{1}{d}\int_d^\infty P\Bigl(1 - U_i < \frac{1}{t},\, 1\le i\le d\Bigr)\,dt \\
&= \frac{1}{d}\int_d^\infty P\Bigl(U_i > 1 - \frac{1}{t},\, 1\le i\le d\Bigr)\,dt.
\end{aligned}
$$
As a consequence, we have established the equivalence
$$
E(N(2)) < \infty \iff \int_1^\infty P\Bigl(U_i > 1 - \frac{1}{t},\, 1\le i\le d\Bigr)\,dt < \infty.
$$

Substituting $t \mapsto 1/(1-t)$ yields
$$
\int_1^\infty P\Bigl(U_i > 1 - \frac{1}{t},\, 1\le i\le d\Bigr)\,dt
= \int_0^1 \frac{P(U_i \ge u,\, 1\le i\le d)}{(1-u)^2}\,du,
$$
which completes the proof of Proposition 5.3.11.


Infinite Expectation of Record Time

The next result provides a criterion for the case $E(N(2)) = \infty$ in terms of D-norms. It requires the additional condition that the underlying copula is in the domain of attraction of an SMS df.

Proposition 5.3.12 Suppose that $C \in D(G)$, where the D-norm corresponding to $G$ has dual D-norm function satisfying $⦀1⦀_D > 0$. Then $E(N(2)) = \infty$.

Proof. Let $U = (U_1,\dots,U_d)$ be an rv that follows the copula $C$. From Lemma 3.1.13 and the homogeneity of the dual D-norm function $⦀\cdot⦀_D$, we obtain
$$
\frac{P(U_i \ge u,\, 1\le i\le d)}{1-u} \longrightarrow_{u\uparrow 1} ⦀1⦀_D.
$$
As a consequence, there exists $\varepsilon \in (0,1)$ such that
$$
\frac{P(U_i \ge u,\, 1\le i\le d)}{1-u} \ge \frac{⦀1⦀_D}{2}
$$
for $u \in [1-\varepsilon, 1)$. This implies
$$
\int_0^1 \frac{P(U_i \ge u,\, 1\le i\le d)}{(1-u)^2}\,du
\ge \int_{1-\varepsilon}^1 \frac{P(U_i \ge u,\, 1\le i\le d)}{(1-u)^2}\,du
\ge \frac{⦀1⦀_D}{2}\int_{1-\varepsilon}^1 \frac{1}{1-u}\,du = \infty,
$$
which, in view of Proposition 5.3.11, completes the proof of Proposition 5.3.12.


Another Tail Dependence Coefficient

Suppose that $C \in D(G)$. According to Proposition 5.3.12, a finite expectation $E(N(2)) < \infty$ can only occur if the dual D-norm function satisfies $⦀1⦀_D = 0$, which is true, for instance, if $G$ has at least two independent margins. Let $U$ follow the copula $C$. Next, we show that $E(N(2))$ is typically finite if $U$ has at least two components $U_j$, $U_k$ that are tail independent, i.e.,
$$
\lim_{u\uparrow 1} P(U_k > u \mid U_j > u) = 0.
$$

Within the class of (bivariate) copulas that are tail independent,
$$
\bar\chi := \lim_{u\uparrow 1} \frac{2\log(1-u)}{\log\bigl(P(U_1 > u, U_2 > u)\bigr)} - 1
$$
is a popular measure of the strength of tail dependence, provided that this limit exists (Coles et al. (1999); Heffernan (2000)). In this case, we have $\bar\chi \in [-1,1]$ (Beirlant et al. (2004, (9.83))). For a bivariate normal copula with coefficient of correlation $\rho \in (-1,1)$, it is, for instance, well known that $\bar\chi = \rho$.
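This value can be reproduced, at least roughly, by simulation. The sketch below is my own illustration (the parameters rho, n, and u are choices of mine); since the limit $u \uparrow 1$ is attained very slowly for the normal copula, a fixed high threshold gives only an indication.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(3)
    rho, n, u = 0.5, 5_000_000, 0.99

    z1 = rng.standard_normal(n)
    z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    u1, u2 = norm.cdf(z1), norm.cdf(z2)      # (u1, u2) follows the normal copula

    p_hat = np.mean((u1 > u) & (u2 > u))     # empirical P(U1 > u, U2 > u)
    print(2 * np.log(1 - u) / np.log(p_hat) - 1)   # roughly rho = 0.5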
Note that the next result does not require C ∈ D(G). It requires only
the existence of the above tail dependence coefficient for at least one pair of
components.
Proposition 5.3.13 Let $U = (U_1,\dots,U_d)$ follow a copula $C$. Suppose that there exist indices $k \ne j$ such that
$$
\bar\chi_{k,j} = \lim_{u\uparrow 1} \frac{2\log(1-u)}{\log\bigl(P(U_k > u, U_j > u)\bigr)} - 1 \in [-1, 1). \qquad (5.21)
$$
Then we have $E(N(2)) < \infty$.

Proof. According to Proposition 5.3.11, we have to show
$$
\int_0^1 \frac{P(U_i \ge u,\, 1\le i\le d)}{(1-u)^2}\,du < \infty.
$$
But, obviously,
$$
\int_0^1 \frac{P(U_i \ge u,\, 1\le i\le d)}{(1-u)^2}\,du \le \int_0^1 \frac{P(U_k \ge u,\, U_j \ge u)}{(1-u)^2}\,du.
$$

Therefore, we only have to find $\varepsilon \in (0,1)$ such that
$$
\int_{1-\varepsilon}^1 \frac{P(U_k \ge u,\, U_j \ge u)}{(1-u)^2}\,du < \infty.
$$

Since
$$
\frac{2\log(1-u)}{\log\bigl(P(U_k > u, U_j > u)\bigr)} - 1 \longrightarrow_{u\uparrow 1} \bar\chi_{k,j} \in [-1, 1),
$$
there exist $\varepsilon > 0$ and $c < 1/2$ such that
$$
1 - \frac{\log\bigl(P(U_k \ge u,\, U_j \ge u)\bigr)}{2\log(1-u)} \le c, \qquad u \in [1-\varepsilon, 1).
$$
Writing the integrand in exponential form then yields
$$
\begin{aligned}
\int_{1-\varepsilon}^1 \frac{P(U_k \ge u,\, U_j \ge u)}{(1-u)^2}\,du
&= \int_{1-\varepsilon}^1 \exp\Bigl(\log\frac{P(U_k \ge u,\, U_j \ge u)}{(1-u)^2}\Bigr)\,du \\
&= \int_{1-\varepsilon}^1 \exp\Bigl(-2\log(1-u)\Bigl(1 - \frac{\log\bigl(P(U_k \ge u,\, U_j \ge u)\bigr)}{2\log(1-u)}\Bigr)\Bigr)\,du \\
&\le \int_{1-\varepsilon}^1 \exp\bigl(-2c\log(1-u)\bigr)\,du \\
&= \int_{1-\varepsilon}^1 \frac{1}{(1-u)^{2c}}\,du < \infty,
\end{aligned}
$$
as $2c < 1$. This completes the proof of Proposition 5.3.13.


Corollary 5.3.14 We have $E(N(2)) < \infty$ for multivariate normal rvs, unless all components are completely dependent, i.e., unless all bivariate coefficients of correlation equal one.
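A concluding sketch of my own, illustrating Corollary 5.3.14 (the helpers draw and sample_N2 and the choice $\rho = 0.5$ are mine): for a bivariate normal rv with $\rho = 0.5 < 1$, the simulated record times $N(2)$ have a sample mean that settles at a finite value, in contrast to the univariate case, where $E(N(2)) = \infty$. Since record events are invariant under increasing transformations of the margins, the normal margins can be used directly.

    import numpy as np

    rng = np.random.default_rng(4)
    rho = 0.5

    def draw(rng):
        z1 = rng.standard_normal()
        z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.standard_normal()
        return np.array([z1, z2])

    def sample_N2(rng):
        x1, j = draw(rng), 1
        while True:
            j += 1
            if np.any(draw(rng) > x1):   # not dominated by the first observation
                return j                 # N(2) = j

    draws = [sample_N2(rng) for _ in range(100_000)]
    print(np.mean(draws))                # finite, but convergence is slow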
References

Arnold, B. C., Balakrishnan, N., and Nagaraja, H. N. (1998).


Records. Wiley Series in Probability and Statistics. Wiley, New York.
doi:10.1002/9781118150412.
Arnold, B. C., Balakrishnan, N., and Nagaraja, H. N. (2008). A First
Course in Order Statistics. Society for Industrial and Applied Mathematics,
Philadelphia. doi:10.1137/1.9780898719062.
Aulbach, S., Bayer, V., and Falk, M. (2012a). A multivariate piecing-
together approach with an application to operational loss data. Bernoulli
18, 455–475. doi:10.3150/10-BEJ343.
Aulbach, S., Falk, M., and Hofmann, M. (2012b). The multivariate
piecing-together approach revisited. J. Multivariate Anal. 110, 161–170.
doi:10.1016/j.jmva.2012.02.002.
Balkema, A. A., and de Haan, L. (1978a). Limit distributions for order statistics I. Theory Probab. Appl. 23, 77–92. doi:10.1137/1123006.
Balkema, A. A., and de Haan, L. (1978b). Limit distributions for order statistics II. Theory Probab. Appl. 23, 341–358. doi:10.1137/1123036.
Balkema, A. A., and Resnick, S. I. (1977). Max-infinite divisibility. J.
Appl. Probab. 14, 309–319. doi:10.2307/3213001.
Barakat, H. M. (2001). The asymptotic distribution theory of
bivariate order statistics. Ann. Inst. Stat. Math. 53, 487–497.
doi:10.1023/A:101466081.
Barakat, H. M., and Abd Elgawad, M. A. (2017). Asymptotic behavior
of the joint record values, with applications. Statist. Probab. Lett. 124,
13–21. doi:10.1016/j.spl.2016.12.020.
Beirlant, J., Goegebeur, Y., Segers, J., and Teugels, J. (2004). Statis-
tics of Extremes: Theory and Applications. Wiley Series in Probability and
Statistics. Wiley, Chichester, UK. doi:10.1002/0470012382.


Billingsley, P. (1968). Convergence of Probability Measures. Wiley Series


in Probability and Mathematical Statistics, 1st ed. Wiley, New York.
Billingsley, P. (1999). Convergence of Probability Measures. Wi-
ley Series in Probability and Statistics, 2nd ed. Wiley, New York.
doi:10.1002/9780470316962.
Billingsley, P. (2012). Probability and Measure. Wiley Series in Probability
and Statistics, Anniversary ed. Wiley, New York.
Bolley, F. (2008). Separability and completeness for the Wasserstein dis-
tance. In Séminaire de Probabilités XLI (C. Donati-Martin, M. Émery,
A. Rouault, and C. Stricker, eds.), Lecture Notes in Mathematics, vol. 1934,
371–377. Springer, Berlin. doi:10.1007/978-3-540-77913-1 17.
Brown, B. M., and Resnick, S. I. (1977). Extreme values of independent
stochastic processes. J. Appl. Probab. 14, 732–739. doi:10.2307/3213346.
Charpentier, A., and Segers, J. (2009). Tails of multivari-
ate Archimedean copulas. J. Multivariate Anal. 100, 1521–1537.
doi:10.1016/j.jmva.2008.12.015.
Cheng, S., de Haan, L., and Yang, J. (1997). Asymptotic distributions of
multivariate intermediate order statistics. Theory Probab. Appl. 41, 646–
656. doi:10.1137/S0040585X97975733.
Coles, S. G., Heffernan, J. E., and Tawn, J. A. (1999). De-
pendence measure for extreme value analyses. Extremes 2, 339–365.
doi:10.1023/A:1009963131610.
Cooil, B. (1985). Limiting multivariate distributions of intermediate order
statistics. Ann. Probab. 13, 469–477. doi:10.1214/aop/1176993003.
van Dantzig, D. (1956). Economic decision problems for flood prevention.
Econometrica 24, 276–287. http://www.jstor.org/stable/1911632.
David, H. (1981). Order Statistics. Wiley Series in Probability and Mathematical Statistics, 2nd ed. John Wiley & Sons, New York.
David, H., and Nagaraja, H. (2005). Order Statistics. Wiley Series in Probability and Mathematical Statistics, 3rd ed. John Wiley & Sons, New York. doi:10.1002/0471722162.
Deheuvels, P. (1984). Probabilistic aspects of multivariate extremes. In
Statistical Extremes and Applications (J. Tiago de Oliveira, ed.), 117–130.
D. Reidel, Dordrecht. doi:10.1007/978-94-017-3069-3 9.
Dombry, C., Falk, M., and Zott, M. (2018). On func-
tional records and champions. Journal of Theoretical Probability.
doi:10.1007/s10959-018-0811-7.
Dombry, C., and Ribatet, M. (2015). Functional regular variations, Pareto
processes and peaks over threshold. In Special Issue on Extreme Theory
and Application (Part II) (Y. Wang and Z. Zhang, eds.), Statistics and Its
Interface, vol. 8, 9–17. doi:10.4310/SII.2015.v8.n1.a2.
Dombry, C., Ribatet, M., and Stoev, S. (2017). Probabilities of
concurrent extremes. Journal of the American Statistical Association.
doi:10.1080/01621459.2017.1356318.

Embrechts, P., Klüppelberg, C., and Mikosch, T. (1997). Modelling


Extremal Events for Insurance and Finance, Applications of Mathematics
- Stochastic Modelling and Applied Probability, vol. 33. Springer, Berlin.
doi:10.1007/978-3-642-33483-2.
Falk, M. (1989). A note on uniform asymptotic normality of intermediate
order statistics. Ann. Inst. Stat. Math. 41, 19–29.
Falk, M., and Guillou, A. (2008). Peaks-over-threshold stability of multi-
variate generalized Pareto distributions. J. Multivariate Anal. 99, 715–734.
doi:10.1016/j.jmva.2007.03.009.
Falk, M., Hofmann, M., and Zott, M. (2015). On generalized max-linear
models and their statistical interpolation. J. Appl. Probab. 52, 736–751.
doi:10.1239/jap/1445543843.
Falk, M., Hüsler, J., and Reiss, R.-D. (2004). Laws of Small
Numbers: Extremes and Rare Events. 2nd ed. Birkhäuser, Basel.
doi:10.1007/978-3-0348-7791-6.
Falk, M., Hüsler, J., and Reiss, R.-D. (2011). Laws of Small
Numbers: Extremes and Rare Events. 3rd ed. Birkhäuser, Basel.
doi:10.1007/978-3-0348-0009-9.
Falk, M., Khorrami Chokami, A., and Padoan, S. (2018). Some
results on joint record events. Statist. Probab. Lett. 135, 11–19.
doi:10.1016/j.spl.2017.11.011.
Falk, M., and Stupfler, G. (2017). An offspring of multivariate extreme
value theory: The max-characteristic function. J. Multivariate Anal. 154,
85–95. doi:10.1016/j.jmva.2016.10.007.
Ferreira, A., and de Haan, L. (2014). The generalized Pareto process;
with a view towards application and simulation. Bernoulli 20, 1717–1737.
doi:10.3150/13-BEJ538.
Fuller, T. (2016). An Approach to the D-Norms with Functional Analysis.
Master’s thesis, University of Würzburg, Germany.
Galambos, J. (1975). Order statistics of samples from multivariate distri-
butions. Journal of the American Statistical Association 70, 674–680.
Galambos, J. (1987). The Asymptotic Theory of Extreme Order Statistics.
2nd ed. Krieger, Malabar.
Genest, C., and Nešlehová, J. (2012). Copula modeling for extremes. In
Encyclopedia of Environmetrics (A. El-Shaarawi and W. Piegorsch, eds.),
vol. 2, 530–541. Wiley, Chichester. doi:10.1002/9780470057339.vnn018.
Giné, E., Hahn, M., and Vatan, P. (1990). Max-infinitely divisible and
max-stable sample continuous processes. Probab. Theory Related Fields
87, 139–165. doi:10.1007/BF01198427.
Gnedenko, B. (1943). Sur la distribution limite du terme maximum d’une
série aléatoire. Ann. of Math. (2) 44, 423–453. doi:10.2307/1968974.
Goldie, C. M., and Resnick, S. I. (1989). Records in a partially ordered
set. Ann. Probab. 17, 678–699. doi:10.1214/aop/1176991421.
Goldie, C. M., and Resnick, S. I. (1995). Many multivariate records.
Stochastic Process. Appl. 59, 185–216. doi:10.1016/0304-4149(95)00047-B.

de Haan, L. (1975). On regular variation and its application to the weak convergence of sample extremes, MC Tracts, vol. 32. 3rd ed. Centrum voor Wiskunde en Informatica, Amsterdam. http://persistent-identifier.org/?identifier=urn:nbn:nl:ui:18-18567.
de Haan, L., and Ferreira, A. (2006). Extreme Value Theory: An Introduction. Springer Series in Operations Research and Financial Engineering. Springer, New York. doi:10.1007/0-387-34471-3. See http://people.few.eur.nl/ldehaan/EVTbook.correction.pdf and http://home.isa.utl.pt/~anafh/corrections.pdf for corrections and extensions.
de Haan, L., and Resnick, S. (1977). Limit theory for multivari-
ate sample extremes. Probab. Theory Related Fields 40, 317–337.
doi:10.1007/BF00533086.
Heffernan, J. E. (2000). A directory of coefficients of tail dependence.
Extremes 3, 279–290. doi:10.1023/A:1011459127975.
Hofmann, D. (2009). Characterization of the D-Norm Corresponding to a Multivariate Extreme Value Distribution. Ph.D. thesis, University of Würzburg. http://opus.bibliothek.uni-wuerzburg.de/volltexte/2009/4134/.
Huang, X. (1991). Statistics of Bivariate Extreme Values. Ph.D. thesis,
Tinbergen Institute Research Series.
Huser, R., and Davison, A. C. (2013). Composite likelihood es-
timation for the Brown-Resnick process. Biometrika 100, 511–518.
doi:10.1093/biomet/ass089.
Jarchow, H. (1981). Locally Convex Spaces. Teubner, Stuttgart.
doi:10.1007/978-3-322-90559-8.
Kabluchko, Z., Schlather, M., and de Haan, L. (2009). Stationary
max-stable fields associated to negative definite functions. Ann. Probab.
37, 2042–2065. doi:10.1214/09-AOP455.
Kortschak, D., and Albrecher, H. (2009). Asymptotic results for the
sum of dependent non-identically distributed random variables. Methodol.
Comput. Appl. Probab. 11, 279–306. doi:10.1007/s11009-007-9053-3.
Krupskii, P., Joe, H., Lee, D., and Genton, M. G. (2018). Extreme-
value limit of the convolution of exponential and multivariate normal dis-
tributions: Links to the Hüsler-Reiß distribution. J. Multivariate Anal. 163,
80–95. doi:10.1016/j.jmva.2017.10.006.
Lang, S. (1987). Linear Algebra. 3rd ed. Springer, New York.
doi:10.1007/978-1-4757-1949-9.
Lax, P. D. (2002). Functional Analysis. Wiley, New York.
McNeil, A. J., and Nešlehová, J. (2009). Multivariate Archimedean copulas, d-monotone functions and ℓ1-norm symmetric distributions. Ann. Statist. 37, 3059–3097. doi:10.1214/07-AOS556.
Molchanov, I. (2005). Theory of Random Sets. Probability and Its Appli-
cations. Springer, London. doi:10.1007/1-84628-150-4.
Molchanov, I. (2008). Convex geometry of max-stable distributions. Ex-
tremes 11, 235–259. doi:10.1007/s10687-008-0055-5.

Nelsen, R. B. (2006). An Introduction to Copulas. Springer Series in Statis-


tics, 2nd ed. Springer, New York. doi:10.1007/0-387-28678-0.
Ng, K. W., Tian, G.-L., and Tang, M.-L. (2011). Dirichlet and Related
Distributions. Theory, Methods and Applications. Wiley Series in Probabil-
ity and Statistics. Wiley, Chichester, UK. doi:10.1002/9781119995784.
Phelps, R. R. (2001). Lectures on Choquet’s Theorem. 2nd ed. Springer,
Berlin-Heidelberg. doi:10.1007/b76887.
Pickands, J., III (1981). Multivariate extreme value distributions. Proc. 43rd Session ISI (Buenos Aires) 859–878.
Reiss, R.-D. (1989). Approximate Distributions of Order Statistics: With
Applications to Nonparametric Statistics. Springer Series in Statistics.
Springer, New York. doi:10.1007/978-1-4613-9620-8.
Reiss, R.-D., and Thomas, M. (2007). Statistical Analysis of Extreme Val-
ues: with Applications to Insurance, Finance, Hydrology and Other Fields.
3rd ed. Birkhäuser, Basel. doi:10.1007/978-3-7643-7399-3.
Resnick, S. I. (1987). Extreme Values, Regular Variation, and
Point Processes, Applied Probability, vol. 4. Springer, New York.
doi:10.1007/978-0-387-75953-1. First Printing.
Ressel, P. (2013). Homogeneous distributions - and a spectral represen-
tation of classical mean values and stable tail dependence functions. J.
Multivariate Anal. 117, 246–256. doi:10.1016/j.jmva.2013.02.013.
Revuz, D., and Yor, M. (1999). Continuous Martingales and Brownian
Motion. Grundlehren der mathematischen Wissenschaften, 3rd ed. Springer,
London. doi:10.1007/978-3-662-21726-9.
Rockafellar, R. T. (1970). Convex Analysis. Princeton University Press,
New Jersey.
Rootzén, H., and Tajvidi, N. (2006). Multivariate generalized Pareto dis-
tributions. Bernoulli 12, 917–930. doi:10.3150/bj/1161614952.
Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges.
Pub. Inst. Stat. Univ. Paris 8, 229–231.
Sklar, A. (1996). Random variables, distribution functions, and copulas – a
personal look backward and forward. In Distributions with fixed marginals
and related topics (L. Rüschendorf, B. Schweizer, and M. D. Taylor, eds.),
Lecture Notes – Monograph Series, vol. 28, 1–14. Institute of Mathematical
Statistics, Hayward, CA. doi:10.1214/lnms/1215452606.
Smirnov, N. V. (1967). Some remarks on limit laws for order statistics. Theory Probab. Appl. 12, 337–339.
Smith, R. L. (1990). Max-stable processes and spatial extremes. Preprint, Univ. North Carolina. http://www.stat.unc.edu/faculty/rs/papers/RLS_Papers.html.
Takahashi, R. (1987). Some properties of multivariate extreme value dis-
tributions and multivariate tail equivalence. Ann. Inst. Stat. Math. 39,
637–647. doi:10.1007/BF02491496.
Takahashi, R. (1988). Characterizations of a multivariate extreme value
distribution. Adv. in Appl. Probab. 20, 235–236. doi:10.2307/1427279.

Vatan, P. (1985). Max-infinite divisibility and max-stability in infinite di-


mensions. In Probability in Banach Spaces V: Proceedings of the Interna-
tional Conference held in Medford, USA, July 16, 1984 (A. Beck, R. Dudley,
M. Hahn, J. Kuelbs, and M. Marcus, eds.), Lecture Notes in Mathematics,
vol. 1153, 400–425. Springer, Berlin. doi:10.1007/BFb0074963.
Villani, C. (2009). Optimal Transport. Old and New, Grundlehren
der mathematischen Wissenschaften, vol. 338. Springer, Berlin.
doi:10.1007/978-3-540-71050-9.
Wang, Y., and Stoev, S. A. (2011). Conditional sampling for spectrally discrete max-stable random fields. Adv. in Appl. Probab. 43, 461–483. http://projecteuclid.org/euclid.aap/1308662488.
Zott, M. (2016). Extreme Value Theory in Higher Dimensions. Max-Stable Processes and Multivariate Records. Ph.D. thesis, University of Würzburg. https://opus.bibliothek.uni-wuerzburg.de/opus4-wuerzburg/frontdoor/index/index/docId/13661.
Index

1(·)  indicator function, 6
=_D  equality of distributions, 121, 187
Aᵀ  transpose of matrix, 3
C[0,1]  set of continuous functions on [0,1], 52
C⁺[0,1]  set of non-negative continuous functions on [0,1], 175
C⁻[0,1]  set of continuous and non-positive functions on [0,1], 184
E[0,1]  subset of functions on [0,1], 52
E⁻[0,1]  subset of non-positive functions in E[0,1], 166
F ∈ D(G)  F is in the domain of attraction of G, 100, 135
F⁻¹  generalized inverse of df, 136, 212
P∗Z  distribution of rv Z, 39
[0,c]^[0,1]  set of functions from [0,1] to [0,c], 162
Δ-inequality  triangle inequality, 1
Δ-monotone  Delta-monotone, 17
Γ  Gamma function, 6
|T|  number of elements in set T, 20
Ā  topological closure of set A, 69
e_j  j-th unit vector, 5
⦀·⦀_D  dual D-norm function, 22
‖·‖  norm, 1
∂₊  right derivative of a function, 203
E  matrix with constant entry one, 9
→_D  convergence in distribution, 34
ε_z  Dirac measure, 67
1_A(t)  indicator function of set A, 8

Absorbing D-norm, 40
Angular measure, 26
Angular set, 63
Aumann integral, 88
a.s. (almost surely), 7
Barycentric coordinates, 68, 75
Bauer simplex, 70
Beta function, 127
Bilinear map, 78
Bivariate projection of D-norm, 44
Brown–Resnick process, 57, 183
  interpolation of, 183
Cauchy–Schwarz inequality, 3
CDF (complete dependence frame), 45
Central limit theorem for D-norms, 199
Central order statistics, 205
CF (characteristic function), 192
Champion, 214, 217
Choquet–Bishop–de Leeuw theorem, 70, 72
Coefficient of tail dependence, 120, 229
Complete angular set, 31
Complete dependence frame of D-norm, 45
Complete record
  expected number, 219
  survival function, 221
Compound symmetry condition, 209
Concurrence probability, 219
Concurrency of extremes, 217
Continuity of probability theorem, 123, 167
Continuous mapping theorem, 218
Convex hull, 65, 84, 86
Convex set, 65
Copula, 50, 135, 152
  Ali–Mikhail–Haq family, 143
  Archimedean, 150
  Clayton, 150
  empirical, 158
  extreme value, 143
  Frank, 150
  generalized Pareto, 136
  Gumbel, 150
  not in domain of attraction, 158
  PT-, 154
Correlation of standard max-stable rv, 125
Covariance of standard max-stable rv, 125
Cross-polytope, 84, 86
Defective discretized version of SMS process, 184
Defective norm, 184
Dense subset of D-norms, 75
Dependency set, 85
df (distribution function), 7
Dirac measure, 67
Dirichlet D-norm, 25
Discretized version
  of generator process, 180
  of SMS process, 180
    mean squared error, 185
    uniform convergence, 182
Distance, 1
Distribution
  arbitrary generalized Pareto, 137
  binomial, 175
  Dirichlet, 25
  exponential, 102
  extreme value, 101, 108
  Fréchet, 7, 100, 108, 131
  Gamma, 25
  generalized Pareto, 102, 103
  Gumbel, 100
  Hüsler–Reiss, 116, 143
  log-normal, 9
  max-stable, 101, 107
  multivariate generalized Pareto, 106, 196
  negative exponential, 100
  normal, 57
  Pareto, 102, 103, 162
  reverse Weibull, 100
  simple max-stable, 108
  standard generalized Pareto, 106
  standard max-stable, 110
  standard negative exponential, 111
  uniform, 102
D-norm, 4
  complete dependence, 112
  independence, 112
Domain of attraction, 100, 135
  for copulas, 138
Dual D-norm function, 22
  functional version, 60
Dual norm of a D-norm, 92, 96
EVD (extreme value distribution), 101
Excursion stability, 105, 107, 163
  of generalized Pareto copula, 137
Expected shortfall, 192
Extremal coefficient, 119
Extremal concurrence probability, 217
Extremal point, 66
Extremal set, 66
Functional distribution function, 162
Functional D-norm, 166
Gamma function, 6
Gauge, 94
Generalized barycentric coordinates, 67
Generalized extreme value distribution, 101
Generalized inverse of a distribution function, 136
Generalized max-linear model, 176
Generalized Pareto process, 162
Generator, 4, 54
Geometric Brownian motion, 183
GPC (generalized Pareto copula), 136
GPD (generalized Pareto distribution), 102
GPP (generalized Pareto process), 162
Half space, 79
Helly's selection theorem, 142
Hoeffding–Fréchet bounds for a multivariate df, 141
Hoeffding's identity, 125
Hofmann's characterization, 17
Hölder's inequality, 82, 92
  for D-norms, 93
Homeomorphism, 74
Homogeneity, 1
Hüsler–Reiss D-norm, 9, 39, 47, 51, 58, 200
Hyperplane separation theorem, 79
Idempotent D-norm, 40, 43, 47, 49
Identity element, 39
iff (if and only if), 17
iid (independent and identically distributed), 7
Inclusion-exclusion principle, 20
Inner product, 78
Integrably bounded random set, 87
Intermediate order statistics, 206
Jensen's inequality, 48
Krein–Milman theorem
  in arbitrary dimension, 69
  in finite dimensions, 66
Linear affine functional, 67
Locally convex vector space, 68
Marshall–Olkin D-norm, 15, 221
Max-CF (max-characteristic function), 189, 190
  inversion formula, 203
Max-linear model, 172, 175, 176
Max-stability, 101
Max-stable process, 165
Max-zonoid, 78, 85
  D-, 85
Metric, 1
  Wasserstein, 33
MEVT (multivariate extreme value theory), VII
Minkowski inequality, 2
Min-stable distribution, 116
Multiplication of D-norms, 38
Multiplication stability of Hüsler–Reiss D-norm, 40
Multivariate central limit theorem for arrays, 210
Non-degenerate distribution function, 100
Norm, 1
  Euclidean, 2, 78
  generated by a set, 83
  L1-, 2
  logistic, 2, 7, 149
  Manhattan, 2
  monotone, 5, 18
  radially symmetric, 5
  standardized, 18
  sup-, 2
Orthogonal projection, 78
os (order statistics), 121, 205
  multivariate, 205
Pickands dependence function, 119
Pointwise limit of D-norms, 36
Portmanteau theorem, 36
POT (peaks-over-threshold), 102, 153
  stability, 105, 107
  univariate case, 153
Predictor of SMS process, 183
Principal axis theorem, 47
Projection of a D-norm, 147
Prokhorov's theorem, 36
PT (piecing-together), 153
  multivariate case, 152
  univariate case, 153
Quantile function, 192
Random closed set, 87
Random set, 86
Reconstruction of SMS process, 177
Record
  complete, 214
  simple, 214
Record time, 225
  characterization of finite expectation, 227
  expectation, 226
  finite expectation, 229
  infinite expectation, 228
  for multivariate normal observations, 230
Reflection principle for Brownian motion, 57
Relative compact sequence of rv, 36
rv (random vector, random variable), 4
Scalar product, 78
Selection of a random set, 87
Selection expectation of a random set, 87
Semigroup, 39
Seminorm, 62, 80
Simple max-stable process, 168
Simple record
  asymptotic distribution, 225
  expected number, 224
Simplex, 68
Sklar's theorem, 136
SMS (standard max-stable), 110
Sojourn time, 163
Spectral decomposition
  of Hüsler–Reiss D-norm, 48
  of positive semidefinite matrix, 47
Stable tail dependence function, 141
Standard max-stable process, 166
Stochastic geometry, 78
Stop-loss premium risk measure, 192
Support function, 79
Survival copula, 148
Survival function, 105, 147
Survival probability of standard max-stable rv, 122
Symmetric root of positive definite matrix, 3
Tail dependence, 120
Tail independence, 207, 228
Takahashi's characterizations, 10, 117, 207
Threshold, 99
Tight sequence of rv, 36
Topology of element-wise convergence, 69
Topology of pointwise convergence, 37, 69
Track of D-norms, 47
Triangle inequality, 1
Uniform integrability, 218
Uniform integrable sequence of rv, 37
Wasserstein
  distance, 194
  metric, 193
wlog (without loss of generality), 34
