
Probability Surveys
Vol. 17 (2020) 677–753
ISSN: 1549-5787
https://doi.org/10.1214/19-PS336

The infinite extendibility problem for exchangeable real-valued random vectors

Jan-Frederik Mai

XAIA Investment GmbH
Sonnenstr. 19, 80331 München, Germany
e-mail: [email protected]
Abstract: We survey known solutions to the infinite extendibility problem
for (necessarily exchangeable) probability laws on Rd , which is:

Can a given random vector X = (X1 , . . . , Xd ) be represented in distribution as the first d members of an infinite exchangeable sequence of random variables?

This is the case if and only if X has a stochastic representation that is “conditionally iid” according to the seminal theorem of de Finetti. Of particular interest are cases in which the original motivation behind the model X is not one of conditional independence. After an introduction and some general theory, the survey covers the traditional cases when X takes values in {0, 1}d , has a spherical law, a law with ℓ1-norm symmetric survival function, or a law with ℓ∞-norm symmetric density. The solutions in all these cases constitute analytical characterizations of mixtures of iid sequences drawn from popular, one-parametric probability laws on R, like the Bernoulli, the normal, the exponential, or the uniform distri-
bution. The survey further covers the less traditional cases when X has a
Marshall-Olkin distribution, a multivariate wide-sense geometric distribu-
tion, a multivariate extreme-value distribution, or is defined as a certain
exogenous shock model including the special case when its components are
samples from a Dirichlet prior. The solutions in these cases correspond to
iid sequences drawn from random distribution functions defined in terms of
popular families of non-decreasing stochastic processes, like a Lévy subor-
dinator, a random walk, a process that is strongly infinitely divisible with
respect to time, or an additive process. The survey finishes with a list of
potentially interesting open problems. In comparison to former literature
on the topic, this survey purposely dispenses with generalizations to the
related and larger concept of finite exchangeability or to more general state
spaces than R. Instead, it aims to constitute an up-to-date comprehensive
collection of known and compelling solutions of the real-valued extendibility
problem, accessible for both applied and theoretical probabilists, presented
in a lecture-like fashion.
MSC2020 subject classifications: Primary 60G09, 60E05; secondary
62H99.
Keywords and phrases: Exchangeability, conditionally iid, multivariate
probability distributions.

Received July 2019.



Contents

1 Introduction and general background . . . . . . . . . . . . . . . . . . . 678


1.1 General notation . . . . . . . . . . . . . . . . . . . . . . . . . . . 678
1.2 Motivation and mathematical preliminaries . . . . . . . . . . . . 681
1.3 Canonical probability spaces . . . . . . . . . . . . . . . . . . . . . 688
1.3.1 Laws with positive components . . . . . . . . . . . . . . . 690
1.4 General properties of conditionally iid models . . . . . . . . . . . 691
1.4.1 Positive dependence . . . . . . . . . . . . . . . . . . . . . 691
1.4.2 Further properties . . . . . . . . . . . . . . . . . . . . . . 695
1.5 A general (abstract) solution to Problem 1.1 . . . . . . . . . . . . 698
2 Binary sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 701
2.1 Hausdorff’s moment problem . . . . . . . . . . . . . . . . . . . . 701
2.2 Extendibility of exchangeable binary sequences . . . . . . . . . . 703
3 Classical results for static factor models . . . . . . . . . . . . . . . . . 706
3.1 Spherical laws (aka ℓ2-norm symmetric laws) . . . . . . . . . . . 707
3.2 ℓ1-norm symmetric laws . . . . . . . . . . . . . . . . . . . . . . . 711
3.3 ℓ∞-norm symmetric laws . . . . . . . . . . . . . . . . . . . . . . . 716
4 The multivariate lack-of-memory property . . . . . . . . . . . . . . . . 721
4.1 Marshall-Olkin and multivariate geometric distributions . . . . . 723
4.2 Infinite divisibility and Lévy subordinators . . . . . . . . . . . . 724
4.3 Analytical characterization of exchangeability and conditionally iid . . 725
5 Max-/ min-stable laws and extreme-value copulas . . . . . . . . . . . . 729
5.1 Max-/ min-stability and multivariate extreme-value theory . . . 729
5.2 Analytical characterization of conditionally iid . . . . . . . . . . 731
6 Exogenous shock models . . . . . . . . . . . . . . . . . . . . . . . . . . 736
6.1 Exchangeability and the extendibility problem . . . . . . . . . . 737
6.2 The Dirichlet prior and radial symmetry . . . . . . . . . . . . . . 740
6.3 The Sato-frailty model and self-decomposability . . . . . . . . . . 742
7 Related open problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 743
7.1 Extendibility-problem for further families . . . . . . . . . . . . . 743
7.2 Testing for conditional independence . . . . . . . . . . . . . . . . 744
7.3 Combination of one-factor models to multi-factor models . . . . . 744
7.4 Parameter estimation with uncertainty . . . . . . . . . . . . . . . 745
7.5 Quantification of diversity of possible extensions . . . . . . . . . 746
7.6 Characterization of stochastic objects via multivariate probabil-
ity laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 746
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 747

1. Introduction and general background

1.1. General notation

Before we start, let us clarify some general notation used throughout, while
some section-specific notations are introduced where they appear.
Table 1. Summary of main results surveyed. Whereas most appearing notations are introduced in the main body of the article, here L[X] denotes the Laplace transform of a random variable X ≥ 0, and ⟨x, y⟩ := Σ_{k=1}^d xk yk.

First part (latent factor given as random distribution function H = {Ht}):

law of X: arbitrary in M_+^1(Rd); conditionally iid with Ht = arbitrary in M_+^1(H); analytically: sup_g |E[g(X)]| / sup_Y |E[g(Y)]| ≤ 1, g bounded, Yk iid; see Theorem 1.2.
law of X: arbitrary in M_+^1({0, 1}d); conditionally iid with Ht = (1 − M) 1_{t≥0} + M 1_{t≥1}; analytically: P(X = x) = ∇^{d−‖x‖1} b_{‖x‖1}, bk = E[M^k]; see Theorem 2.1.
law of X: spherical law; conditionally iid with Ht = Φ(M t); analytically: E[exp{i ⟨x, X⟩}] = ϕ(‖x‖2), ϕ = L[M]; see Theorem 3.1.
law of X: ℓ∞-norm symmetric density; conditionally iid with Ht = max{0, min{t/M, 1}}; analytically: fX(x) = gd(x[d]), gd(x) = E[M^{−d} 1_{M>x}]; see Theorem 3.3.
law of X: ranks of Dirichlet prior samples; conditionally iid with Ht = DP(c, {t}_{t∈[0,1]}); analytically: P(X ≤ x) = x[1] ∏_{k=2}^d (c x[k] + k − 1)/(c + k − 1); see Lemma 6.1.

Second part (latent factor given as non-decreasing process Z = {Zt}, survival function P(X > x)):

law of X: ℓ1-norm symmetric survival function; conditionally iid with Zt = M t; P(X > x) = ϕ(‖x‖1), ϕ = L[M]; see Theorem 3.2.
law of X: Marshall–Olkin exponential law; conditionally iid with Zt = Lévy subordinator; P(X > x) = exp(−Σ_{k=1}^d Ψ(k) ∇x[d−k]), e^{−Ψ} = L[Z1]; see Theorem 4.1(E).
law of X: multivariate wide-sense geometric distribution; conditionally iid with Zt = Σ_{n=1}^{⌊t⌋} Yn, Yk iid; P(X > x) = ∏_{k=1}^d E[exp(−k Y1)]^{x[d−k+1] − x[d−k]}; see Theorem 4.1(G).
law of X: min-stable multivariate exponential law; conditionally iid with Zt = b t − c Σ_{n≥1} log G^{(n)}_{(ηn/t)−}, G^{(n)} iid ∼ γ; P(X > x) = exp(−b Σ_{k=1}^d xk − c ∫_{H+,1} … γ(dG)); see Theorem 5.1.
law of X: exogenous shock model; conditionally iid with Zt = additive subordinator; P(X > x) = ∏_{k=1}^d exp(−Ψ_{x[d−k+1]}(k)) / exp(−Ψ_{x[d−k+1]}(k − 1)), e^{−Ψt} = L[Zt]; see Theorem 6.1.
law of X: Sato-frailty model; conditionally iid with Zt = self-similar additive subordinator; P(X > x) = ∏_{k=1}^d exp(−Ψ(k x[d−k+1])) / exp(−Ψ((k − 1) x[d−k+1])), e^{−Ψ} = L[Z1]; see Lemma 6.2.

Some very general mathematical definitions:


We denote by N = {1, 2, 3, . . .} the set of natural numbers and N0 := N ∪ {0}, by
R the set of real numbers, by Rd the set of d-dimensional row vectors with entries
in R, for d ∈ N. For n ∈ N0 we denote by f^(n) the n-th derivative of a function
f : R → R, provided it exists. For d numbers x1 , . . . , xd ∈ R we denote by
x[1] ≤ x[2] ≤ . . . ≤ x[d] an ordered list. For x ∈ R we denote by ⌈x⌉ the smallest
integer greater or equal to x (ceiling function), and by ⌊x⌋ the largest integer
less than or equal to x (floor function). We denote by det[A] the determinant
of a square matrix A ∈ Rd×d . We denote elements x = (x1 , . . . , xd ) ∈ Rd by
bold letters in comparison to (one-dimensional) elements x ∈ R. Expressions
like x > y for x, y ∈ Rd are meant component-wise, i.e. xk > yk for each
k = 1, . . . , d. We furthermore use the notation ‖x‖p := (|x1 |^p + . . . + |xd |^p )^{1/p}
for the ℓp-norm of x ∈ Rd , p ≥ 1.
Some general definitions regarding probability spaces:
All random objects to be introduced are formally defined on some probability
space (Ω, F, P) with σ-algebra F and probability measure P, and the expected
value of a random variable X is denoted by E[X]. As usual, the argument
ω ∈ Ω of some random variable X : Ω → R will always be omitted. The symbol
=_d denotes equality in distribution and the symbol ∼ means “is distributed
according to”. We recall that (X1 , . . . , Xd ) =_d (Y1 , . . . , Yd ) for random vectors
means E[g(X1 , . . . , Xd )] = E[g(Y1 , . . . , Yd )] for all bounded, continuous func-
tions g : Rd → R, where the expectation values E are taken on the respective
probability spaces of (X1 , . . . , Xd ) and (Y1 , . . . , Yd ), which might be different.
Equality in law for two stochastic processes X = {Xt } and Y = {Yt } means that
(Xt1 , . . . , Xtd ) =_d (Yt1 , . . . , Ytd ) for arbitrary d ∈ N and t1 , t2 , . . . , td . Through-
out, the abbreviation iid stands for independent and identically distributed.
The index t of a real-valued random variable ft that belongs to some stochastic
process f = {ft }t∈T is purposely written as a sub-index, in order to distinguish
it from the value f (t) of some (non-random) function f : T → R. If F is the
distribution function of some random variable taking values in R, we denote by

F −1 (y) := inf{x ∈ R : F (x) ≥ y}, y ∈ [0, 1],

its generalized inverse, see [25] for background. Any distribution function C
of a random vector U = (U1 , . . . , Ud ) whose components Uk are uniformly dis-
tributed on [0, 1] is called a copula, see [75] for a textbook treatment. We further
recall that an arbitrary survival function F̄ of some d-variate random vector X
can always be written1 as F̄ (x) = Ĉ(P(X1 > x1 ), . . . , P(Xd > xd )), where Ĉ
is a copula, called a survival copula for F̄ , and it is uniquely determined in
case the random variables X1 , . . . , Xd have continuous distribution functions.
This is the survival analogue of the so-called Theorem of Sklar, due to [101].
The Theorem of Sklar itself states that the distribution function F of X can
be written as F (x) = C(P(X1 ≤ x1 ), . . . , P(Xd ≤ xd )) for a copula C, called a
copula for F . The relationship between a copula C and its survival copula Ĉ is

1 See [81, p. 195–196].
A notation of specific interest in the present survey:
We denote by H the set of all distribution functions of real-valued random
variables, and by H+ the subset containing all elements F such that x < 0 implies
F (x) = 0, i.e. distribution functions of non-negative random variables. Elements
F ∈ H are right-continuous, and we denote by F (x−) := lim_{t↑x} F (t) their left-
continuous versions. If X is some Hausdorff space, we denote by M_+^1(X) the set of
all probability measures on the measurable space (X, B(X)), where B(X) denotes
the Borel-σ-algebra of X. This notation is borrowed from [8]. Now recall that
H is metrizable (hence in particular Hausdorff) when topologized with the so-
called Lévy metric that induces weak convergence of the associated probability
distributions on R, see [100]. Consequently, we denote by M_+^1(H) the set of all
probability measures on H. A random element H = {Ht }t∈R ∼ γ ∈ M_+^1(H)
is almost surely a càdlàg stochastic process, and a common way of treating
probability laws of such objects works via the so-called Skorohod metric on the
space of càdlàg paths. However, even though the Skorohod topology and the
topology induced by the Lévy metric are not identical, see [88, p. 327-328], their
induced Borel-σ-algebras on the set H can indeed be shown to coincide, so that
our viewpoint is equivalent.

1.2. Motivation and mathematical preliminaries

Throughout, we denote by X = (X1 , . . . , Xd ) a random vector taking values
in Rd . Since we are only interested in the probability distribution of X, we
identify X with its probability law in the sense that we often say X has some
property if and only if its probability distribution has this property. The central
theme of the present survey deals with the following formal definition, using the
nomenclature in [24].
Definition 1.1 (Conditionally iid). We say that a probability measure μ ∈
M_+^1(Rd ) is conditionally iid if there exists a probability measure γ ∈ M_+^1(H)
such that the equality

μ((−∞, x1 ] × . . . × (−∞, xd ]) = ∫_H h(x1 ) · · · h(xd ) γ(dh)

holds for all x1 , . . . , xd ∈ R.

We say that a random vector X = (X1 , . . . , Xd ) taking values in Rd is condi-
tionally iid if its probability distribution is conditionally iid. This is equivalent
to the existence of a random element H ∈ H such that almost surely

P(X1 ≤ x1 , . . . , Xd ≤ xd | H) = Hx1 · · · Hxd ,  x1 , . . . , xd ∈ R,

where conditioning on H means conditioning on σ(H). Given a family of prob-
ability distributions M ⊂ M_+^1(Rd ), we introduce the notation

M∗ = {μ ∈ M : μ is conditionally iid}.

Consider a random vector X = (X1 , . . . , Xd ) on a probability space (Ω, F, P)


that is defined via

Xk := f (Uk , H), k = 1, . . . , d,

with some2 measurable “functional” f , an iid sequence of random “objects”


U1 , . . . , Ud , and some independent random object H that is measurable with
respect to the sub-σ-algebra H = σ(H) ⊂ F that it generates itself. Such X is
always conditionally iid and the probability distribution of the stochastic pro-
cess Ht := P(X1 ≤ t | H), t ∈ R, plays the role of the probability measure γ in
Definition 1.1. The object H, sometimes called a latent (dependence-inducing)
factor, then induces dependence between the components, which are iid condi-
tioned on H. This is a Bayesian viewpoint, based on a two-step construction:
first simulate an instance of H, then simulate X1 , . . . , Xd iid given H. However,
it is important to be aware that a random vector that is conditionally iid ac-
cording to our definition does not necessarily have to be defined by a stochastic
model that relies on such a two-step construction. Our definition only requires
that such a construction exists, possibly on another probability space. In fact,
typical cases of interest are such that μ is defined in terms of a stochastic model
or probabilistic property which is a priori unrelated to the concept of conditional
independence, as we will see.
Throughout, we are interested in a solution to the following problem:
Problem 1.1 (Motivating problem). Given a collection M ⊂ M_+^1(Rd ) and
μ ∈ M, provide necessary and sufficient conditions ensuring that μ ∈ M∗ .
Remark 1.1 (Nomenclature). In the literature, elements of M∗ are not always
called conditionally iid, but other names have been given. For instance, [98] calls
them positive dependent by mixture (PDM), [103, 40, 60] call them infinitely
extendible, and [75, Definition 1.10, p. 43] call them simply extendible. The
nomenclature “(infinite) extendibility” refers to the fact that conditionally iid
random vectors can always be thought of as finite margins of infinite condition-
ally iid sequences, as will be explained below. The nomenclature “PDM” becomes
intuitive from Lemmata 1.2, 1.3 and 1.4 below, but is rather unusual.
The investigation of conditionally iid random vectors is closely related to the
concept of exchangeability. Recall that the probability distribution of a random
vector X is called exchangeable if it is invariant under an arbitrary permutation
of the components of X. The following observation is immediate but important.
Lemma 1.1 (Exchangeability). If X is conditionally iid, it is also exchangeable.
Proof. Let X ∼ μ with γ as in Definition 1.1. If π is an arbitrary permutation
of {1, . . . , d} we observe that

P(X1 ≤ x1 , . . . , Xd ≤ xd ) = ∫_H h(x1 ) · · · h(xd ) γ(dh)
= ∫_H h(xπ(1) ) · · · h(xπ(d) ) γ(dh) = P(Xπ−1(1) ≤ x1 , . . . , Xπ−1(d) ≤ xd ).

Since π was arbitrary, this shows that the distribution function (hence law) of
X is invariant with respect to permutations of its components.

2 As will later be explained in more detail, we can without loss of generality choose f (·, h) =
h−1 (·), h ∈ H, so that f (·, h) is an indexed family of quantile functions.
Exchangeability is a property which is convenient to investigate by means
of Analysis, whereas the notion “conditionally iid”, in which we are interested,
is a priori purely probabilistic and more difficult to investigate. Unfortunately,
exchangeability is only a necessary, but not a sufficient, condition for the solution
of our problem. For instance, a bivariate normal distribution is obviously ex-
changeable if and only if the two means and variances are identical, also for
negative correlation coefficients. However, Example 1.1 and Lemma 1.2 below
show that conditionally iid random vectors necessarily have non-negative cor-
relation coefficients. One can show in general that the correlation coefficient,
if existent, between two components of an exchangeable random vector on Rd
is bounded from below by −1/(d − 1), see, e.g., [1, p. 7]. As the dimension d
tends to infinity, this lower bound becomes zero. Even better, the difference
between exchangeability and a conditionally iid structure vanishes completely as
the dimension d tends to infinity, which is the content of de Finetti’s Theorem.
Theorem 1.1 (de Finetti’s Theorem). Let {Xk }k∈N be an infinite sequence of
random variables on some probability space (Ω, F, P). The sequence {Xk }k∈N is
exchangeable, meaning that each finite subvector is exchangeable, if and only if
it is iid conditioned on some σ-field H ⊂ F. In this case, H equals almost surely
the tail-σ-field of {Xk }k∈N , which is given by ∩n≥1 σ(Xn , Xn+1 , . . .).
Proof. Originally due to [19]. We refer to [1] for a proof based on the reversed
martingale convergence theorem, which is briefly sketched. Of course, we only
need to verify that exchangeability implies conditionally iid, as the converse
follows from Lemma 1.1. For the sake of a more convenient notation we assume
the infinite sequence {Xk }k∈N0 is indexed by N0 = N ∪ {0}, and we define σ-
algebras F−n := σ(Xn , Xn+1 , . . .) for n ∈ N0 . The tail-σ-field of the sequence is
H := ∩n≤0 Fn . In order to establish the claim, three auxiliary observations are
helpful with an arbitrary bounded, measurable function g fixed:
(i) Exchangeability implies (X0 , X1 , . . .) =_d (X0 , Xn+1 , . . .) for arbitrary n ∈
N. This implies E[g(X0 ) | F−1 ] =_d E[g(X0 ) | F−(n+1) ], n ∈ N.
(ii) The sequence Yn := E[g(X0 ) | Fn ], n ≤ 0, is easily checked to be a reversed
martingale. The reversed martingale convergence theorem implies that Yn
converges almost surely and in L1 to E[g(X0 ) | H]. See [23, p. 264 ff] for
background on reversed martingales (convergence).
(iii) Letting n → ∞ in (i), we observe from (ii) that Y−1 =_d E[g(X0 ) | H]. We
can further replace this equality in law by an almost sure equality, since
H ⊂ F−1 and the second moments of Y−1 and E[g(X0 ) | H] coincide. Thus,
the sequence {Y−n }n∈N is almost surely a constant sequence.
With these auxiliary observations we may now finish the argument. On the one
hand, exchangeability implies (X0 , Xn+1 , . . .) =_d (Xn , Xn+1 , . . .), which gives the
almost sure equality E[g(X0 ) | F−(n+1) ] = E[g(Xn ) | F−(n+1) ]. Taking E[. | H]
on both sides of this equation implies with the tower property of conditional
expectation that E[g(X0 ) | H] = E[g(Xn ) | H]. Since g was arbitrary, X0 and
Xn are identically distributed conditioned on H, and since n was arbitrary all
members of the sequence are identically distributed conditioned on H. To verify
conditional independence, let g1 , g2 be two bounded, measurable functions. For
n ≥ 1 arbitrary, using (iii) in the third equality below, we compute

E[g1 (X0 ) g2 (Xn ) | H] = E[E[g1 (X0 ) g2 (Xn ) | F−n ] | H]


= E[g2 (Xn ) E[g1 (X0 ) | F−n ] | H]
= E[g2 (Xn ) E[g1 (X0 ) | H] | H]
= E[g2 (Xn ) | H] E[g1 (X0 ) | H].

Precisely the same tower property argument inductively also implies

E[ ∏_{j=1}^k gj (Xij ) | H ] = ∏_{j=1}^k E[gj (Xij ) | H]

for arbitrary 0 ≤ i1 < . . . < ik and bounded measurable functions g1 , . . . , gk .


Thus, the random variables X0 , X1 , . . . are independent conditioned on H.
Which topics are covered in the present survey?
The present article surveys known answers to Problem 1.1 for families of mul-
tivariate probability distributions M that are well known in the statistical lit-
erature, and/ or have proven useful as a mathematical model for specific appli-
cations. While several traditional results of the theory have been studied in the
last century, some significant achievements have been accomplished only within
the last decade, so the present author feels that this is a good point in time to recap
what has been achieved, hence to write this overview article. One goal of the
present survey is to collect the numerous results under one common umbrella
in a reader-friendly summary to make them accessible for a broader audience of
applied and theoretical probabilists, and in order to inspire others to join this
interesting strand of research in the future. Proofs, or at least proof sketches,
are presented for most results in order to (a) demonstrate how solutions to
Problem 1.1 often unravel surprising links between seemingly different fields of
mathematics/probability theory, and (b) render this document a useful basis
for the use as lecture notes in an advanced course on multivariate statistics or
probability theory.
Which topics are not covered in the present survey?
The scope of former literature on the topic is often wider, in particular the
references [1, 57, 2] are very popular surveys on the topic with wider scope.
On the one hand, many references on the topic discuss conditionally iid mod-
els under the umbrella of exchangeability, which has been mentioned to be a
weaker notion for finite random vectors. The characterization of the (finitely)
exchangeable subfamily of M is often easier than the characterization of the

(in general) smaller set M∗ , and is typically an important first step towards
a solution to Problem 1.1. However, the second (typically harder) step from
(finite) exchangeability to conditionally iid is usually the more important and
more interesting step from both a theoretical and practical perspective. The
algebraic structure of a general theory on (finite) exchangeability is naturally
of a different, often more combinatorial character, whereas “conditionally iid”
by virtue of de Finetti’s Theorem naturally is the concept of an infinite limit
(of exchangeability) so that techniques from Analysis enter the scene. Thus, we
feel it is useful to provide an account with a more narrow scope on conditionally
iid, even though for some of the presented examples we are well aware that an
interesting (finite) exchangeable theory is also viable. On the other hand, many
references consider the case when the components of X take values in more
general spaces than R, for instance in Rn (i.e. lattices instead of vectors) or
even function spaces. In particular, de Finetti’s Theorem 1.1 can be generalized
in this regard, seminal references are [30, 92]. Research in this direction is by
nature more abstract and thus maybe less accessible for a broader audience,
or for more practically oriented readers. One goal of the present survey is to
provide an account that is not exclusively geared towards theorists but also to
practitioners of the theory, and in particular to point out relationships to classical
statistical probability laws on Rd . We believe that a limitation of this survey’s
scope to the real-valued case is still rich enough to provide a solid basis for an
interesting and accessible theory. In fact, we seek to demonstrate that Prob-
lem 1.1 has been solved satisfactorily in quite a number of highly interesting
cases, and the solutions contain interesting links to different probabilistic top-
ics. Of course, it might be worthwhile to ponder about generalizations of some
of the presented results to more abstract settings in the future (unless already
done) – but purposely these lie outside the present survey.
Why is Problem 1.1 interesting at all?
Broadly speaking, because of two reasons: (a) conditionally iid models are conve-
nient for applications, and (b) solutions to the extendibility problem sometimes
rely on compelling relationships between a priori different theories.

(a) Roughly speaking, conditionally iid models allow one to model (strong and
weak) dependence between random variables in a way that features many
desirable properties which are tailor-made for applications, in particular
when the dimension d is large. Firstly, a conditionally iid random vector is
“dimension-free” in the sense that components can be added or removed
from X without altering the basic structure of the model, which simply
follows from the fact that an iid sequence remains an iid sequence after
addition or removal of certain members. This may be very important in
applications that require a regular change of dimension, e.g. the readjust-
ment of a large credit portfolio in a bank, when old credits leave and new
credits enter the portfolio frequently. Secondly, if X has a distribution
from a parametric family, the parameters of this family are typically de-
termined by the parameters of the underlying latent probability measure
γ, irrespective of the dimension d. Consequently, the number of param-

eters does usually not grow significantly with the dimension d and may
be controlled at one’s personal taste. This is an enormous advantage for
model design in practice, in particular since the huge degree of freedom/
huge number of parameters in a high-dimensional dependence model is
often more bane than boon. Thirdly, fundamental statistical theorems re-
lying on the iid assumption, like the law of large numbers, may still be
applied in a conditionally iid setting, making such models very tractable.
Last but not least, in dependence modeling a “factor-model way of think-
ing” is very intuitive, e.g. it is well-established in the multivariate normal
case (thinking of principal component analysis etc.). On a high level, if one
wishes to design a multi-factor dependence model within a certain family
of distributions M, an important first step is to determine the one-factor
subfamily M∗ . Having found a conditionally iid stochastic representation
of M∗ , the design of multi-factor models is sometimes obvious from there,
see also paragraph 7.3.
(b) The solution to Problem 1.1 is often mathematically challenging and
compelling. It naturally provides an interesting connection between the
“static” world of random vectors and the “dynamic” cosmos of (one-
dimensional) stochastic processes. The latter enter the scene because the
latent factor being responsible for the dependence in a conditionally iid
model for X may canonically be viewed as a non-decreasing stochastic
process (a random distribution function), which is further explained in
Section 1.3 below. In particular, for some classical families M of multi-
variate laws from the statistical literature the family M∗ in Problem 1.1
is conveniently described in terms of a well-studied family of stochastic
processes like Lévy subordinators, Sato subordinators, or processes which
are infinitely divisible with respect to time. Moreover, in order to formally
establish the aforementioned link between these two seemingly different
fields of research the required mathematical techniques involve classical
theories from Analysis like Laplace transforms, Bernstein functions, and
moment problems.

Before we start, can we please study a first simple example?


It is educational to end this motivating paragraph by demonstrating the motivat-
ing problem with a simple example that all readers are familiar with. Denoting
by N (μ, Σ) the multivariate normal law with mean vector μ = (μ1 , . . . , μd ) ∈ Rd
and covariance matrix Σ ∈ Rd×d , Example 1.1 provides the solution for Prob-
lem 1.1 in the case when M consists of all multivariate normal laws.

Example 1.1 (The multivariate normal law). We want to solve Problem 1.1
for the family

M = {N (μ, Σ) : μ ∈ Rd , Σ ∈ Rd×d symmetric, positive definite}.

We claim that M∗ equals the set of all multivariate normal distributions satisfying

μ = (μ, . . . , μ),   Σk,k = σ² for all k,   Σi,j = ρ σ² for i ≠ j,   with μ ∈ R, σ > 0, ρ ∈ [0, 1],

i.e. Σ is the equicorrelation matrix with common variance σ² and common non-negative correlation ρ.
Proof. Consider X = (X1 , . . . , Xd ) ∼ N (μ, Σ) on a probability space (Ω, F, P)
for μ = (μ1 , . . . , μd ) ∈ Rd , and Σ = (Σi,j ) ∈ Rd×d a positive definite matrix.
If we assume that the law of X is in M∗ , it follows that there is a sub-σ-
algebra H ⊂ F such that the components X1 , . . . , Xd are iid conditioned on H.
Consequently,
μk = E[Xk ] = E[E[Xk | H]] = E[E[X1 | H]] = E[X1 ] = μ1 , (1.1)
irrespectively of k = 1, . . . , d. The analogous reasoning also holds for the second
moment of Xk , which implies Σk,k = Σ1,1 for all k. Moreover,
E[Xi Xj ] = E[E[Xi | H] E[Xj | H]] = E[E[X1 | H]²] ≥ μ1² , (1.2)
for arbitrary components i ≠ j, where we used the conditional iid structure and
Jensen’s inequality. This finally implies that all off-diagonal elements of Σ are
identical and non-negative.
Conversely, let μ ∈ R, σ > 0, and ρ ∈ [0, 1]. Consider a probability space on
which d + 1 iid standard normally distributed random variables M, M1 , . . . , Md
are defined. We define
Xk := μ + σ (√ρ M + √(1 − ρ) Mk ),  k = 1, . . . , d. (1.3)
It is readily observed that X = (X1 , . . . , Xd ) has a multivariate normal law
with pairwise correlation coefficients all being equal to ρ, and all components
having mean μ and variance σ 2 . Notice in particular that the non-negativity
of ρ is important in the construction (1.3) because the square root is not well-
defined otherwise. The components of X are obviously conditionally iid given
the σ-algebra H generated by M . Hence the law of X is in M∗ .
There are already some interesting remarks to be made about this simple ex-
ample. First of all, it is observed that the family M∗ is always three-parametric,
irrespective of the dimension d. This stands in glaring contrast to M, which has
d + d (d + 1)/2 parameters in dimension d. Second, in general a Monte Carlo
simulation of a d-dimensional normal random vector X requires a Cholesky de-
composition of the matrix Σ, which typically has computational complexity of
order d3 , see [75, Algorithm 4.3, p. 182]. In contrast, the simulation of X with
law in M∗ according to (1.3) has only linear complexity in the dimension d.
Especially in large dimensions this can be a critical improvement of computa-
tional speed. Third, the proof above shows that each random vector with law in
M∗ may actually be viewed as the first d components of an infinite sequence of
conditionally iid random variables such that arbitrary finite n-margins have a
multivariate normal law. Thus, we have actually solved the refined Problem 1.3
to be introduced in the upcoming paragraph.
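The remark on simulation cost can be made tangible by a minimal Python sketch (all function names and parameter values below are illustrative choices, not notation from the text): it draws from a law in M∗ via the one-factor construction (1.3), with cost linear in d and without any Cholesky decomposition, and checks that all empirical pairwise correlations are close to the prescribed ρ.

```python
import numpy as np

def sample_exchangeable_normal(n_samples, d, mu=0.0, sigma=1.0, rho=0.3, rng=None):
    """Draw n_samples vectors from the conditionally iid normal law of
    Example 1.1 via the one-factor construction (1.3); cost is O(d) per
    sample, no Cholesky decomposition of the d x d covariance matrix."""
    rng = np.random.default_rng(rng)
    M = rng.standard_normal((n_samples, 1))      # latent common factor
    Mk = rng.standard_normal((n_samples, d))     # idiosyncratic terms
    return mu + sigma * (np.sqrt(rho) * M + np.sqrt(1.0 - rho) * Mk)

if __name__ == "__main__":
    X = sample_exchangeable_normal(100_000, d=5, mu=1.0, sigma=2.0, rho=0.3, rng=42)
    # all off-diagonal entries should be close to rho = 0.3
    print(np.corrcoef(X, rowvar=False).round(2))
```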

1.3. Canonical probability spaces

We have mentioned earlier that a conditionally iid random vector X is usually


constructed as Xk := f (Uk , H), k = 1, . . . , d, from an iid sequence U1 , . . . , Ud ,
some independent stochastic object H, and some functional f . Clearly, this
general model is inconvenient because neither the law of U1 , nor the nature of
the stochastic object H or the functional f are given explicitly. However, there
is a canonical choice for all three entities, which we are going to consider in the
sequel. By definition, conditionally iid means that conditioned on the object H
the random variables X1 , . . . , Xd are iid, distributed according to a univariate
distribution function F , which may depend on H. A univariate distribution
function F is nothing but a non-decreasing, right-continuous function F : R →
[0, 1] with limt→−∞ F (t) = 0 and limt→∞ F (t) = 1, see [12, Theorem 12.4,
p. 176]. Without loss of generality we may assume that the random object
H = {Ht }t∈R already is the conditional distribution function itself, i.e. is a
random variable in the space of distribution functions – or, in other words, a
non-decreasing, right-continuous stochastic process with limt→−∞ Ht = 0 and
limt→∞ Ht = 1. In other words, H ∼ γ ∈ M_+^1(H). In this case, a canonical
choice for the law of U1 is the uniform distribution on [0, 1] and the functional
f may be chosen as

Xk = f (Uk , H) := inf{t ∈ R : Ht > Uk } = H^{−1}_{Uk} ,  k = 1, . . . , d. (1.4)

Recall here that H is interpreted as a random distribution function and H^{−1}
denotes its generalized inverse. In particular, Xk ≤ x if and only if Uk ≤ Hx .
Indeed, one verifies that X1 , . . . , Xd are iid conditioned on H := σ({Ht }t∈R ),
with common univariate distribution function H, since

P(X1 ≤ t1 , . . . , Xd ≤ td | H) = P(U1 ≤ Ht1 , . . . , Ud ≤ Htd | H) = Ht1 Ht2 · · · Htd ,

for all t1 , . . . , td ∈ R. Every random vector which is conditionally iid can be
constructed like this, i.e. there is a one-to-one relation between such models
and random variables in the space of (one-dimensional) distribution functions,
as already adumbrated in Definition 1.1. For each given H = {Ht }t∈R ∼ γ ∈
M_+^1(H), and a given dimension d ∈ N, the canonical construction (1.4) induces
a multivariate probability distribution on Rd , and we denote this mapping from
M_+^1(H) to a subset of M_+^1(Rd ) by Θd throughout, this subset being precisely
the conditionally iid probability laws on Rd . It is implicit that Θd (γ) depends
on the law of H ∼ γ via

Θd (γ)((−∞, t1 ] × . . . × (−∞, td ]) = E[Ht1 · · · Htd ].
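As a concrete sketch of the canonical construction (1.4), the following Python snippet (illustrative only; the choice of a random exponential rate Λ ∼ Gamma(2, 1) as latent factor, i.e. Ht = 1 − e^{−Λ t}, and all names are assumptions made here for the example) samples X by evaluating the generalized inverse H^{−1} at iid uniforms and exhibits the positive dependence induced by mixing over Λ.

```python
import numpy as np

def sample_canonical(n_samples, d, rng=None):
    """Canonical construction (1.4): X_k = H^{-1}(U_k) with U_k iid uniform
    and H an independent random distribution function; here H_t = 1 - exp(-Lam*t)
    with a random rate Lam ~ Gamma(2, 1), so X is conditionally iid exponential."""
    rng = np.random.default_rng(rng)
    lam = rng.gamma(shape=2.0, scale=1.0, size=(n_samples, 1))   # latent factor
    U = rng.uniform(size=(n_samples, d))
    # generalized inverse of H_t = 1 - exp(-lam*t) is H^{-1}(u) = -log(1-u)/lam
    return -np.log1p(-U) / lam

if __name__ == "__main__":
    X = sample_canonical(200_000, d=2, rng=0)
    joint = np.mean((X[:, 0] <= 1.0) & (X[:, 1] <= 1.0))
    marg = np.mean(X[:, 0] <= 1.0)
    # mixing over Lam makes the joint probability exceed the product of margins
    print(round(joint, 3), round(marg**2, 3))
```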

Given M ⊂ M_+^1(Rd ) we denote the pre-image of M∗ under Θd in M_+^1(H) by
Θd^{−1}(M∗ ). In words, it equals the subset of M_+^1(H) which consists of all prob-
ability laws γ of stochastic processes {Ht }t∈R such that X of the canonical
construction (1.4) has a law in M, hence in M∗ . From this equivalent viewpoint
our motivating Problem 1.1 becomes
Problem 1.2 (Motivating problem reformulated). For a given family of d-
dimensional probability distributions M, determine the family of stochastic pro-
cesses Θd^{−1}(M∗ ) ⊂ M_+^1(H).

Admittedly, this reformulation in terms of the stochastic process H might


appear quite artificial at this point, but we will see later that in some cases we
obtain interesting correspondences between classical probability distributions
on Rd and families of stochastic processes. On a high level, the problem of de-
termining the intersection of a given family M of distributions with the family
of conditionally iid distributions may also be re-phrased as the problem of find-
ing an increasing stochastic process whose stochastic nature induces the given
multivariate distribution when inserted into a canonical stochastic model.

Obviously, in the stochastic model (1.4) it is possible to let d tend to infinity,


since the Uk are iid. Thus, we may without loss of generality think of a con-
ditionally iid random vector X as the first d members of an infinite sequence
{Xk }k∈N on (Ω, F, P) such that conditioned on some σ-algebra H ⊂ F the se-
quence {Xk }k∈N is iid. De Finetti’s Theorem thus allows us to view conditionally
iid random vectors X = (X1 , . . . , Xd ) as the first d members of an infinite ex-
changeable sequence {Xk }k∈N . More clearly, a probability law μ ∈ M_+^1(Rd ) is
conditionally iid if and only if there exists an infinite exchangeable sequence
{Xk }k∈N on some probability space such that X = (X1 , . . . , Xd ) ∼ μ. At this
point it is important to highlight that we deal with a fixed dimension d. In gen-
eral, it is possible that two truly different probability laws γ1 ≠ γ2 are mapped
onto the same element μ ∈ M_+^1(Rd ) under the mapping Θd . By virtue of de
Finetti’s Theorem, this ambiguity vanishes if we let d tend to infinity in (1.4),
i.e. the probability laws of H and {Xk }k∈N stand in a one-to-one correspon-
dence, see also Lemma 1.8 below. In many cases of interest, we are actually not
only interested in finding some γ that is mapped onto a given μ ∈ M under Θd ,
but actually wish to find such γ which is mapped onto a probability law with a
desired property for arbitrary n ≥ 1 by Θn . In order to formalize this idea, we
introduce the following definition.

Definition 1.2 (Conditionally iid respecting (P)). Let (P) be some property

which makes sense in arbitrary dimension n ≥ 1, and define the sets

Mn,(P) := {μ ∈ M_+^1(Rn ) : μ has property (P)} ⊂ M_+^1(Rn ).

We say that μ ∈ M = Md,(P) is conditionally iid respecting (P) if there exists


a stochastic process H ∼ γ ∈ M_+^1(H) whose probability law is mapped onto μ
under Θd and, in addition, is mapped to an element of Mn,(P) for arbitrary
n ≥ 1 under Θn . We furthermore introduce the notation

M∗∗ := {μ ∈ M : μ is conditionally iid respecting (P)}.

Now we refine Problem 1.1.


Problem 1.3 (Motivating problem refined). Let (P) be a property that makes
sense in any dimension, and consider M = Md,(P) . For μ ∈ M provide necessary
and sufficient conditions ensuring that μ ∈ M∗∗ .
In the situation of Problem 1.3 we have M∗∗ ⊂ M∗ , and the inclusion can
be proper in general, although this is unusual in cases of interest. A non-trivial
example for the situation M∗∗ ≠ M∗ is presented in Example 3.3 in Section 3.2
below. A typical example for (P) is the property of “being a multivariate normal
distribution (in some dimension)”. For a given d-variate multivariate normal law
it is a priori unclear whether there exists an infinite exchangeable sequence with
d-margins being equal to the given multivariate normal law and such that all
n-margins are multivariate normal as well for n > d. This is indeed the case
and we have M∗∗ = M∗ in this particular situation, as can be inferred from
Example 1.1. The typical questions in the theory deal with subsets of M_+^1(Rd ) of
the form M = Md,(P) for a property (P) that makes sense in arbitrary dimension,
so most results presented are actually solutions to Problem 1.3 rather than to
Problem 1.1, see also paragraph 7.5 below for a further discussion related to
this subtlety.

1.3.1. Laws with positive components

If the given family M consists only of probability laws on [0, ∞)d , it is con-
venient to slightly reformulate the stochastic model (1.4). Clearly, if we have
non-negative components, necessarily Ht = 0 for all t < 0 almost surely. There-
fore, without loss of generality we may assume that H = {Ht }t≥0 is indexed
by t ∈ [0, ∞). Moreover, applying the substitution z = − log(1 − F ) it trivially
holds true that

H+ = { t ↦ 1 − exp(−z(t)) : z : [0, ∞) → [0, ∞] non-decreasing, right-continuous, with z(0) ≥ 0 and lim_{t→∞} z(t) = ∞ }.

One may therefore rewrite the canonical construction (1.4) as

Xk := inf{t ≥ 0 : Zt > εk },  k = 1, . . . , d, (1.5)

where the εk := − log(1 − Uk ), k = 1, . . . , d, are now iid exponential random
variables with unit mean, and Z = {Zt }t≥0 is now no longer a distribution
function, but instead a non-decreasing, right-continuous process with Z0 ≥ 0
and limt→∞ Zt = ∞, related to H via the substitution Zt = − log(1 − Ht ).
Figure 1 illustrates one realization of the simulation mechanism (1.5).

Fig 1. Illustration of one simulation of the canonical construction (1.5) in dimension d = 4.


One observes that the process Z = {Zt }t≥0 in this particular illustration has jumps, and
therefore there is a positive probability that two components of X take the identical value.
This does not happen if Z is a continuous process, see also Lemma 1.5 below.
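A minimal Python sketch of the mechanism (1.5) follows (illustrative only; the choice of Z as a compound Poisson subordinator with drift, and all names and parameter values, are assumptions made for this example). Because the chosen Z has jumps, several components of the resulting X can coincide, in line with the figure caption and Lemma 1.5 below.

```python
import numpy as np

def first_passage(level, drift, jump_times, jump_sizes):
    """inf{t >= 0 : Z_t > level} for Z_t = drift*t + sum of jumps up to t (drift > 0)."""
    t, z = 0.0, 0.0
    for T, J in zip(jump_times, jump_sizes):
        if z + drift * (T - t) > level:
            return t + (level - z) / drift      # crossed by the drift before the next jump
        z += drift * (T - t) + J                # move to just after the jump at time T
        t = T
        if z > level:
            return t                            # crossed by the jump itself
    return t + (level - z) / drift              # crossed by the drift after the last jump

def sample_via_subordinator(d, drift=0.2, jump_rate=1.0, jump_mean=1.0, rng=None):
    """One draw of X from (1.5): X_k = inf{t >= 0 : Z_t > eps_k}, eps_k iid unit exponential."""
    rng = np.random.default_rng(rng)
    eps = rng.exponential(size=d)
    horizon = eps.max() / drift + 1.0           # by this time Z has surely crossed max(eps)
    n_jumps = rng.poisson(jump_rate * horizon)
    jump_times = np.sort(rng.uniform(0.0, horizon, size=n_jumps))
    jump_sizes = rng.exponential(jump_mean, size=n_jumps)
    return np.array([first_passage(e, drift, jump_times, jump_sizes) for e in eps])

if __name__ == "__main__":
    print(sample_via_subordinator(d=4, rng=7))  # ties among entries occur with positive probability
```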

1.4. General properties of conditionally iid models

In this section we briefly collect some general properties of conditionally iid
models. To this end, throughout this section we assume that M = M_+^1(Rd )
denotes the family of all d-dimensional probability laws on Rd and we collect
general properties of M∗ .

1.4.1. Positive dependence

If the law of X is in M∗ , the covariance matrix of X, provided it exists, cannot
have negative entries.
Lemma 1.2 (Non-negative correlations). If the law of X is in M∗ and the
covariance matrix of X exists, then all its entries are non-negative.
Proof. This follows from precisely the same computations that have been carried
out already in (1.1) and (1.2) for the particular example of the multivariate
normal distribution.
Correlation coefficients are sometimes inappropriate dependence measure-
ments outside the Gaussian paradigm, see [81, 105]. For instance, their existence

depends on the existence of second moments, or we might have a correlation co-


efficient that is strictly less than one despite the fact that one component of the
random vector is a monotone function of the other, since correlation coefficients
depend on the marginal distributions as well. For these reasons, several alterna-
tive dependence measurements have been developed. One popular among them
is the concordance measurement Kendall’s Tau. Recall that x, y ∈ R2 are called
concordant if (x1 − y1 ) (x2 − y2 ) > 0 and discordant if (x1 − y1 ) (x2 − y2 ) < 0.
In words, concordance means that one of the two points lies north-east of the
other, while discordance means that one of the two lies north-west of the other.
Kendall’s Tau for a bivariate random vector X is defined as the difference be-
tween the probability of concordance and the probability of discordance for two
independent copies of X. If X is conditionally iid, Kendall’s Tau is necessarily
non-negative.
Lemma 1.3 (Non-negative Kendall’s Tau). If the law of X = (X1 , X2 ) is in
M∗ , then Kendall’s Tau is necessarily non-negative.
Proof. Let X^{(1)} and X^{(2)} be two independent copies of X, both defined on
some common probability space. By assumption we find a σ-algebra H such
that conditioned on H all four random variables X1^{(1)}, X2^{(1)}, X1^{(2)}, X2^{(2)} are in-
dependent with respective distribution functions H^{(1)} (for X1^{(1)}, X2^{(1)}) and H^{(2)}
(for X1^{(2)}, X2^{(2)}). Notice that H^{(1)} and H^{(2)} are iid. We compute

P((X1^{(1)} − X1^{(2)}) (X2^{(1)} − X2^{(2)}) > 0)
= E[ P((X1^{(1)} − X1^{(2)}) (X2^{(1)} − X2^{(2)}) > 0 | H) ]
= E[ P(X1^{(1)} > X1^{(2)}, X2^{(1)} > X2^{(2)} | H) + P(X1^{(1)} < X1^{(2)}, X2^{(1)} < X2^{(2)} | H) ]
= E[ ( ∫ H^{(2)}_{x−} dH^{(1)}_x )² ] + E[ ( ∫ H^{(1)}_{x−} dH^{(2)}_x )² ]

and analogously

P((X1^{(1)} − X1^{(2)}) (X2^{(1)} − X2^{(2)}) < 0)
= E[ P((X1^{(1)} − X1^{(2)}) (X2^{(1)} − X2^{(2)}) < 0 | H) ]
= 2 E[ ∫ H^{(2)}_{x−} dH^{(1)}_x · ∫ H^{(1)}_{x−} dH^{(2)}_x ],

so that Kendall’s Tau equals

E[ ( ∫ H^{(2)}_{x−} dH^{(1)}_x − ∫ H^{(1)}_{x−} dH^{(2)}_x )² ] ≥ 0,

establishing the claim.
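For a quick empirical illustration of Lemma 1.3, the following Python sketch (illustrative only; it uses the exchangeable normal law of Example 1.1 and the routine scipy.stats.kendalltau) estimates Kendall's Tau for a conditionally iid pair and confirms that it is positive.

```python
import numpy as np
from scipy.stats import kendalltau

# Monte Carlo check of Lemma 1.3 for the one-factor normal law of Example 1.1.
rng = np.random.default_rng(0)
n, rho = 50_000, 0.3
M = rng.standard_normal((n, 1))
X = np.sqrt(rho) * M + np.sqrt(1 - rho) * rng.standard_normal((n, 2))

tau, _ = kendalltau(X[:, 0], X[:, 1])
print(round(tau, 3))   # close to (2/pi) * arcsin(rho) > 0 for rho > 0
```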


The next lemma is less intuitive on first glimpse, but like Lemmata 1.2 and 1.3
it qualitatively states that laws in M∗ exhibit some sort of “positive” depen-
dence. In order to understand it, it is useful to recall the notion of majorization,

see [80] for a textbook account on the topic. A vector a = (a1 , . . . , ad ) is said
to majorize a vector b = (b1 , . . . , bd ) if

Σ_{k=n}^d a[k] ≥ Σ_{k=n}^d b[k] ,  n = 2, . . . , d,   and   Σ_{k=1}^d ak = Σ_{k=1}^d bk .

Intuitively, the entries of b are “closer to each other” than the entries of a,
even though the sum of all entries is identical for both vectors. For instance,
the vector (1, 0, . . . , 0) majorizes the vector (1/2, 1/2, 0, . . . , 0), which majorizes
(1/3, 1/3, 1/3, 0, . . . , 0), and so on.
Lemma 1.4 (A link to majorization). Consider X with law in M∗ . Further, let
Y = (Y1 , . . . , Yd ) be a random vector with components that are iid and satisfy
Y1 =_d X1 . We denote FZ (x) := P(Z ≤ x) for real-valued Z and x ∈ R.

(a) For arbitrary x ∈ R the vector (F_{Y[1]}(x), . . . , F_{Y[d]}(x)) majorizes the vector
(F_{X[1]}(x), . . . , F_{X[d]}(x)).
(b) For any measurable, real-valued function g which is monotone on the
support of X1 the vector (E[g(Y[1] )], . . . , E[g(Y[d] )]) majorizes the vector
(E[g(X[1] )], . . . , E[g(X[d] )]).

Proof. This is [98, Theorem 2.2 and Corollary 2.3]. By definition,

Σ_{k=1}^d F_{X[k]}(x) = Σ_{k=1}^d F_{Xk}(x) = d F_{X1}(x) = d F_{Y1}(x) = Σ_{k=1}^d F_{Yk}(x) = Σ_{k=1}^d F_{Y[k]}(x).

Since F_{X[1]}(x) ≥ . . . ≥ F_{X[d]}(x) and F_{Y[1]}(x) ≥ . . . ≥ F_{Y[d]}(x), for part (a) we
have to show that

Σ_{k=1}^n F_{X[k]}(x) ≤ Σ_{k=1}^n F_{Y[k]}(x),  n = 1, . . . , d − 1.

First, it is not difficult to verify that

h_{n,d}(p) := Σ_{k=1}^n Σ_{i=k}^d (d choose i) p^i (1 − p)^{d−i} ,  p ∈ [0, 1],

is concave for arbitrary 1 ≤ n ≤ d. Second, concavity implies that

Σ_{k=1}^n F_{X[k]}(x) = Σ_{k=1}^n E[ P(X[k] ≤ x | H) ] = E[h_{n,d}(Hx )]
≤ h_{n,d}(E[Hx ]) = h_{n,d}(P(Y1 ≤ x)) = Σ_{k=1}^n F_{Y[k]}(x),

where Jensen’s inequality has been used. Making use of the relation
E[Z] = ∫_0^∞ (1 − FZ (z)) dz − ∫_{−∞}^0 FZ (z) dz for real-valued random variables Z, part (b)
is obtained from (a) for the case g(x) = x. For the general case, one simply
has to observe that the law of (g(X1 ), . . . , g(Xd )) is also in M∗ and due to
monotonicity of g we have either g(X[1] ) ≤ . . . ≤ g(X[d] ) in the non-decreasing
case or g(X[d] ) ≤ . . . ≤ g(X[1] ) in the non-increasing case.
Intuitively, statement (b) in case g(x) = x states that the expected values
of the order statistics E[X[k] ], k = 1, . . . , d, are closer to each other than the
respective values if the components of X were iid (and not only conditionally
iid). In this sense, the components of a conditionally iid random vector X are less spread out than the components of a ran-
dom vector with iid components. Thus, Lemmata 1.2, 1.3 and 1.4 show that
dependence models built from a conditionally iid setup can only capture the
situation of components being “more clustered” than independence, which is
loosely interpreted as “positive dependence”. Generally speaking, negative de-
pendence concepts are more complicated than positive dependence concepts in
dimensions d ≥ 3, the interested reader is referred to [89] for a nice overview
and references dealing with such concepts.
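The effect described in Lemma 1.4(b) with g(x) = x can be seen numerically in a small Python sketch (illustrative only; it reuses the exchangeable normal law of Example 1.1, and all names and parameter values are choices made here): the expected order statistics of the conditionally iid vector are less spread out than those of an iid vector with the same margins, and the latter vector majorizes the former.

```python
import numpy as np

def mean_order_stats(X):
    """Monte Carlo estimate of (E[X_[1]], ..., E[X_[d]])."""
    return np.sort(X, axis=1).mean(axis=0)

rng = np.random.default_rng(0)
n, d, rho = 200_000, 5, 0.5
M = rng.standard_normal((n, 1))
# conditionally iid standard normal components (Example 1.1 with mu=0, sigma=1)
X = np.sqrt(rho) * M + np.sqrt(1 - rho) * rng.standard_normal((n, d))
# iid components with the same one-dimensional margins
Y = rng.standard_normal((n, d))

print(mean_order_stats(X).round(2))   # roughly (-0.82, ..., 0.82): less spread out
print(mean_order_stats(Y).round(2))   # roughly (-1.16, ..., 1.16): majorizes the above
```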
Whereas Lemmata 1.2, 1.3 and 1.4 provide three particular quantifications
for positive dependence of a conditionally iid probability law, many other pos-
sible concepts of positive dependence can be found in the literature, a textbook
account on the topic is [86]. [95, Theorem 4] claims that if X = (X1 , . . . , Xd ) is
conditionally iid and x ↦ P(X1 ≤ x) is continuous, then

P(X ≤ x) ≥ ∏_{k=1}^d P(Xk ≤ xk ),  x ∈ Rd ,

a positive dependence property called positive lower orthant dependency. How-


ever, here is a counterexample showing that [95, Theorem 4] is not correct and
conditionally iid random vectors need not exhibit positive lower orthant depen-
dency in general.
Example 1.2 (Conditionally iid ⇏ positive lower orthant dependency). Let
M be uniformly distributed on [0, 1/2]. Conditioned on M let X = (X1 , X2 ) be
a vector of two iid random variables which have distribution function

Ht = (1/2) 1_{−M + 1/2 ≤ t < M + 1/2} + 1_{t ≥ M + 1/2} ,  t ∈ R.

It is not difficult to compute that

P(X ≤ x) = E[Hx1 Hx2 ] = (1/2) x[1] + (1/2) max{0, x1 + x2 − 1},  x1 , x2 ∈ [0, 1],

and the distribution function of X is a copula, i.e. has standard uniform one-
dimensional marginals. In particular,

P(X1 ≤ 1/4, X2 ≤ 3/4) = 1/8 < 3/16 = P(X1 ≤ 1/4) P(X2 ≤ 3/4),

contradicting positive lower orthant dependency. Notice that Kendall’s Tau for
X is exactly equal to zero, and also the correlation coefficient between the compo-
nents of X equals zero. Figure 2 depicts a scatter plot of 1000 samples from X.
In contrast to Example 1.2, [24] prove that the weaker property P(X1 ∈
A, . . . , Xd ∈ A) ≥ P(X1 ∈ A)d holds indeed true for conditionally iid X and
an arbitrary measurable set A ⊂ R. This makes clear that a decisive point in
Example 1.2 is that the considered xi are different.

Fig 2. 1000 samples of (X1 , X2 ) from Example 1.2.
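The counterexample of Example 1.2 is easily verified by simulation; the following Python sketch (illustrative only; the sampler and all names are constructed here from the description of the example) reproduces the probabilities in question.

```python
import numpy as np

def sample_example_1_2(n, rng=None):
    """Simulate (X1, X2) from Example 1.2: M ~ Uniform[0, 1/2]; given M, the
    components are iid with H_t = 1/2 on [1/2 - M, 1/2 + M) and H_t = 1 above."""
    rng = np.random.default_rng(rng)
    M = rng.uniform(0.0, 0.5, size=(n, 1))
    U = rng.uniform(size=(n, 2))
    # generalized inverse of H: uniforms below 1/2 map to 1/2 - M, above to 1/2 + M
    return np.where(U < 0.5, 0.5 - M, 0.5 + M)

if __name__ == "__main__":
    X = sample_example_1_2(1_000_000, rng=1)
    joint = np.mean((X[:, 0] <= 0.25) & (X[:, 1] <= 0.75))
    print(round(joint, 4))      # close to 1/8 = 0.125
    print(0.25 * 0.75)          # 3/16 = 0.1875 > joint: no lower orthant dependency
```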

1.4.2. Further properties

Even though it is obvious, we find it educational to point out explicitly that


path continuity of H corresponds to the absence of a singular component in the
law of X.
Lemma 1.5 (Path continuity of H). Let H ∼ γ ∈ M_+^1(H) and consider the
random vector X = (X1 , . . . , Xd ) constructed in Equation (1.4) for arbitrary
d ≥ 2. Then P(X1 = X2 ) = 0 if and only if the paths of H are almost surely
continuous.
Proof. Conditioned on the σ-algebra H generated by H, the random variables
X1 , X2 are iid with distribution function H. Since two iid random variables take

exactly the same value with positive probability if and only if their common
distribution function has at least one jump, the claim follows.

The following result is shown in [98, Proposition 4.2], but we present a slightly
different proof.
Lemma 1.6 (Closure under convergence in distribution). If X (n) are condition-
ally iid and converge in distribution to X, then the law of X is also conditionally
iid.
Proof. Since we only deal with a statement in distribution, we are free to as-
sume that each X^{(n)} is represented as in (1.4) from some stochastic process
H^{(n)} = {H^{(n)}_t}_{t∈R} , and all objects are defined on the same probability space
(Ω, F, P). The random objects H (n) take values in the set of distribution func-
tions of random variables taking values in [−∞, ∞]. This set is compact by
Helly’s Selection Theorem and Hausdorff when equipped with the topology of
pointwise convergence at all continuity points of the limit, see [100]. Thus, the
probability measures on this set are a compact set by [3, Corollary II.4.2, p.
104]. This implies that we find a convergent subsequence {nk }k∈N ⊂ N such
that H (nk ) converges in distribution to some limiting stochastic process H,
which takes itself values in the set of distribution functions of random variables
taking values in [−∞, ∞]. It is now not difficult to see that
P(X1 ≤ x1 , . . . , Xd ≤ xd ) = lim_{k→∞} P(X1^{(nk)} ≤ x1 , . . . , Xd^{(nk)} ≤ xd )
= lim_{k→∞} E[H^{(nk)}_{x1} · · · H^{(nk)}_{xd}] = E[Hx1 · · · Hxd ],

where bounded convergence is used in the last equality. This implies that the
law of X can be constructed canonically like in (1.4), hence X is conditionally
iid. Finally, since X is assumed to take values in Rd , necessarily H is almost
surely the distribution function of a random variable taking values in R (instead
of [−∞, ∞]).

Recall that a random vector (X1 , . . . , Xd ) is called radially symmetric if there


exist μ1 , . . . , μd ∈ R such that
(X1 − μ1 , . . . , Xd − μd ) =_d (μ1 − X1 , . . . , μd − Xd ).

If (X1 , . . . , Xd ) is constructed as in Equation (1.4), then radial symmetry can


be translated into a symmetry property of the random distribution function H,
which is the content of the following lemma.
Lemma 1.7 (Radial symmetry). Let H ∼ γ ∈ M_+^1(H). The random vector
(X1 , . . . , Xd ) constructed in Equation (1.4) is radially symmetric if and only if
there is some μ ∈ R such that

{Hμ−t }t∈R =_d {1 − H(t+μ)− }t∈R .

Proof. On the one hand, we observe

P(μ − X1 ≤ x1 , . . . , μ − Xd ≤ xd ) = P(H(μ+x1)− ≤ U1 , . . . , H(μ+xd)− ≤ Ud )
= E[(1 − H(μ+x1)− ) · · · (1 − H(μ+xd)− )].

On the other hand, we have

P(X1 − μ ≤ x1 , . . . , Xd − μ ≤ xd ) = E[Hμ−x1 · · · Hμ−xd ],

from where the claimed equivalence can now be deduced easily. Notice that the
conditionally iid structure implies that d can be chosen arbitrarily and the law
of H is determined uniquely by the law of an infinite exchangeable sequence
{Xk }k∈N constructed as in (1.4) with d → ∞.
Example 1.3 (The multivariate normal law, again). The most prominent ra-
dially symmetric distribution is the multivariate normal law. Recalling Exam-
ple 1.1, it follows from (1.3) that N (μ, Σ)∗ , the conditionally iid normal laws,
are induced by the stochastic process {Ht }t∈R given by

Ht = Φ( ((t − μ)/σ − √ρ M) / √(1 − ρ) ),  t ∈ R, (1.6)

for some μ ∈ R, σ > 0, and ρ ∈ [0, 1], and a random variable M ∼ Φ =
distribution function of a standard normal law. The reader may check herself
that this random distribution function H satisfies the property of Lemma 1.7.
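One way to carry out this check (a short verification added here for convenience, in the notation of (1.6) with μ = 0 absorbed into the shift) is the following. Using the continuity of t ↦ Ht and the symmetry Φ(−x) = 1 − Φ(x) of the standard normal distribution function,

Hμ−t = Φ( (−t/σ − √ρ M) / √(1 − ρ) )   and   1 − H(t+μ)− = 1 − Φ( (t/σ − √ρ M) / √(1 − ρ) ) = Φ( (−t/σ + √ρ M) / √(1 − ρ) ).

The two processes thus differ only by the sign of M , and since M =_d −M for a standard normal random variable, they are equal in law, as required by Lemma 1.7.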
An immediate but quite useful property of a conditionally iid model is the
following corollary to the classical Glivenko-Cantelli Theorem.
Lemma 1.8 (Conditional Glivenko-Cantelli). Let {Xk }k∈N be an infinite ex-
changeable sequence defined by the canonical construction (1.4) from an in-
finite iid sequence {Uk }k∈N and an independent random distribution function
H ∼ γ ∈ M_+^1(H). It holds almost surely and uniformly in t ∈ R that

(1/d) Σ_{k=1}^d 1_{Xk ≤ t} −→ Ht ,  as d → ∞.

Proof. Follows immediately from the classical Glivenko-Cantelli Theorem, which


is applied in the second equality below:

P( lim_{d→∞} sup_{t∈R} | (1/d) Σ_{k=1}^d 1_{Xk ≤ t} − Ht | = 0 )
= E[ P( lim_{d→∞} sup_{t∈R} | (1/d) Σ_{k=1}^d 1_{Xk ≤ t} − Ht | = 0 | H ) ] = E[1] = 1.

The claim is established.



The stochastic nature of the process {Ht }t∈R clearly determines the law of X.
Conversely, Lemma 1.8 tells us that the law of the d-dimensional vector X does
not determine the law of the underlying latent factor {Ht }t∈R in general, but
accomplishes this in the limit as d → ∞. Given some infinite exchangeable
sequence of random variables {Xk }k∈N , it shows how we can recover its latent
random distribution function H.
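A small Python sketch makes this recovery concrete (illustrative only; it uses the random normal distribution function (1.6) with μ = 0, σ = 1, and scipy for the normal cdf; all names and parameter values are choices made here): from one long exchangeable sequence, the empirical distribution function approximates the realized latent H.

```python
import numpy as np
from scipy.stats import norm

# Sketch of Lemma 1.8: the empirical distribution function of one long
# exchangeable sequence recovers the latent random distribution function H.
rho, d = 0.5, 200_000
rng = np.random.default_rng(3)
M = rng.standard_normal()                                   # one draw of the latent factor
X = np.sqrt(rho) * M + np.sqrt(1 - rho) * rng.standard_normal(d)

t = np.linspace(-3.0, 3.0, 13)
empirical = (X[:, None] <= t).mean(axis=0)                  # (1/d) * sum of 1{X_k <= t}
H_t = norm.cdf((t - np.sqrt(rho) * M) / np.sqrt(1 - rho))   # the realized limit H_t from (1.6)
print(np.abs(empirical - H_t).max())                        # small for large d
```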
A rather obvious property of the set M∗ is convexity.
Lemma 1.9 (M∗ is convex with extremal boundary the product measures).
If μ1 , μ2 ∈ M∗ and α ∈ (0, 1), then α μ1 + (1 − α) μ2 ∈ M∗ . Furthermore, if
μ ∈ M∗ is extremal, meaning that μ = α μ1 + (1 − α) μ2 for some α ∈ (0, 1) and
μ1 , μ2 ∈ M∗ necessarily implies μ = μ1 = μ2 , then μ is a product measure3 .
Proof. The convexity of M∗ is an immediate transfer from the (obvious) con-
vexity of M_+^1(H) under the mapping Θd , as the reader can readily check herself.
That product measures are extremal is also obvious. Finally, consider an ex-
tremal element μ ∈ M∗ . Since μ is conditionally iid, there is a probability
measure γ ∈ M_+^1(H) such that μ((−∞, x]) = ∫_H h(x1 ) · · · h(xd ) γ(dh). We
choose a Borel set A ∈ H with γ(A) > 0. If γ(A) = 1 is the only possible
choice, γ is actually a Dirac measure at some element h ∈ H and μ is a prod-
uct measure, as claimed. Let us derive a contradiction otherwise, in which case
γ = γ(A) γ(· | A) + γ(Ac ) γ(· | Ac ) and both γ(· | A) and γ(· | Ac ) are elements of
M_+^1(H). We obtain a convex combination of μ, to wit

μ((−∞, x]) = γ(A) ∫_H h(x1 ) · · · h(xd ) γ(dh | A) + (1 − γ(A)) ∫_H h(x1 ) · · · h(xd ) γ(dh | Ac ).

Since μ is extremal and γ(· | A) and γ(· | Ac ) are different by definition, we obtain
the desired contradiction.
For the sake of completeness, the following remark gives two equivalent con-
ditions for exchangeability of an infinite sequence of random variables.
Remark 1.2 (Conditions equivalent to infinite exchangeability). A result due
to [93] states that an infinite sequence {Xk }k∈N of random variables is exchange-
able (or, equivalently, conditionally iid by de Finetti’s Theorem) if and only if
the law of the infinite sequence {Xnk }k∈N is invariant with respect to the choice
of (increasing) subsequence {nk }k∈N ⊂ N. Another equivalent condition to ex-
changeability is that {Xk }k∈N =_d {Xτ+k }k∈N for an arbitrary finite stopping time
τ with respect to the filtration Fn := σ(X1 , . . . , Xn ), n ∈ N, see [52].

1.5. A general (abstract) solution to Problem 1.1


1
[58] solve Problem 1.1 on an abstract level for the whole family M = M+ (Rd )
3 Meaning that the components of X ∼ μ are iid.
The infinite extendibility problem 699

of all probability laws on Rd . Their result is formulated in the next theorem in


our notation4 .
Theorem 1.2 (General solution to Problem 1.1). The law of X = (X1 , . . . , Xd )
is conditionally iid if and only if
 |E[g(X)]| 
sup ≤ 1,
g=0 sup |E[g(Y )]|
Y

where the outer supremum is taken over all (non-zero) bounded, measurable
functions g : Rd → R, and the inner supremum in the denominator is taken
over all random vectors Y = (Y1 , . . . , Yd ) with iid components.
Proof. The proof of sufficiency is the difficult part, relying on functional analytic
methods, and we refer the interested reader to [58, Theorem 5.1], but provide
some intuition below. Necessity of the condition in Theorem 1.2 is the easy part,
as will briefly be explained. Without loss of generality we may assume that X
is represented by (1.4) with some stochastic process H ∼ γ ∈ M+ 1
(H) and an
independent sequence of iid variates U1 , . . . , Ud uniformly distributed on [0, 1].
For arbitrary bounded and measurable g we observe

|E[g(X)]| = |E[E[g(X) | H]]| = |E[g(HU−1 , . . . , HU−1 )]|


  1
d
≤ sup E g G (U1 ), . . . , G (Ud )  = sup |E[g(Y )]|.
−1 −1
G(.)∈H Y

Regarding the intuition of the sufficiency of the condition in Theorem 1.2, we


provide one demonstrating example. With X standard normal, we have already
seen in Example 1.1 that the random vector X = (X, −X) is not conditionally
iid, since it is bivariate normal with negative correlation coefficient. So how does
this random vector violate the condition? Considering the bounded measurable
function g(x1 , x2 ) = 1{x1 <0<x2 } , we readily observe that E[g(X)] = P(X < 0) =
1/2. If Y = (Y1 , Y2 ) is an arbitrary vector with iid components, we observe that

E[g(Y )] = P(Y1 < 0) P(Y2 > 0) ≤ P(Y1 ≤ 0) 1 − P(Y1 ≤ 0)


!" # !" #
≤P(Y1 ≤0) =P(Y1 >0)

≤ sup {p (1 − p)} = 1/4.


p∈[0,1]

Consequently, the supremum over all such Y is bounded from above by 1/4,
hence the supremum over all g in the condition of Theorem 1.2 is at least two,
hence larger than one. The intuition behind this counterexample is that we
have found one particular bounded measurable g that addresses a distributional
property of X that sets it apart from any iid sequence. Indeed, the proof of
[58] relies on the Hahn-Banach Theorem and thus on a separation argument,
4 In addition to Theorem 1.2, [58] even consider more abstract spaces than R, and also pro-

vide a necessary and sufficient criterion for finite extendibility of the law of X = (X1 , . . . , Xd )
to an exchangeable law on Rn for n > d arbitrary.
700 J.-F. Mai

since the set of conditionally iid laws can be viewed as a closed convex subset
1
of M+ (Rd ) with extremal boundary comprising the laws with iid components,
see Lemma 1.9.
On the one hand, Theorem 1.2 is clearly a milestone with regards to the
present survey as it solves Problem 1.1 in the general case. On the other hand,
it is difficult to apply the derived condition in particular cases of Problem 1.1,
when the family M is some (semi-)parametric family of interest – simply because
the involved suprema are hard to evaluate, see also Example 1.4 below. On a
high level, Theorem 1.2 solves Problem 1.1 but not the refined Problem 1.3,
which depends on an additional dimension-independent property (P). However,
the most compelling results of the theory deal precisely with certain dimension-
independent properties (P) of interest, see the upcoming sections as well as
paragraph 7.5 for a further discussion. This is because the additional structure
provided by some property (P) and the search for structure-preserving exten-
sions is in many cases a more natural and more interesting problem than to
simply find some extension. We will see that the algebraic structure of this
problem is highly case-specific in general, i.e. heavily dependent on (P).
The following example shows that the supremum condition of Theorem 1.2
can lead to an NP-hard problem in general.

Example 1.4 (In general, the extendibility problem is difficult). If a random


vector X = (X1 , X2 ) takes values in {x1 , . . . , xn }2 ⊂ R2 , its joint probabil-
ity distribution is fully described in terms of the matrix A ∈ [0, 1]n×n defined
via Aij := P(X1 = xi , X2 = xj ), 1 ≤ i, j ≤ n. The probability law of X is
exchangeable if and only if A = AT , and the law of X is conditionally iid if
5
m if there
and only are (row vectors) λ ∈ Sm and x1 , . . . , xm ∈ Sn such that
A = i=1 λi xTi xi . Up to normalization, which is only due to the fact that we
deal with a probabilistic interpretation, this property is called complete positiv-
ity. A completely positive matrix A is necessarily also doubly non-negative,
meaning that it is symmetric, element-wise non-negative and positive semi-
definite, and its elements sum up to one. The set of completely positive matrices
is a proper subset of doubly non-negative matrices in dimensions d ≥ 5, and to
decide for a given matrix A whether or not it is completely positive is known
to be NP-hard, see [21]. Theorem 1.2 implies that X, given in terms of A, is
conditionally iid if and only if
 
  n Gij Aij  
i,j=1
sup ≤ 1.
G∈Rn×n \{0} supy∈Sn |y T G y|

Notice that the denominator is equal to the absolute value of the maximal eigen-
value of G, the so-called spectral radius of G. As outlined before, this optimiza-
tion problem must be NP-hard, unless P=NP.

5 We denote by Sm := {y ∈ [0, 1]m : y1 = 1} the m-dimensional unit simplex.


The infinite extendibility problem 701

2. Binary sequences

We study probability laws on {0, 1}d , i.e. on the set of finite binary sequences. We
start with a short digression on the little moment problem, because it occupies
a commanding role, not only in this section but also in Section 4 below. For a
further discussion between the little moment problem and de Finetti’s Theorem,
the interested reader is also referred to [17].

2.1. Hausdorff ’s moment problem

If (b0 , . . . , bd ) is a finite sequence of real numbers, we write ∇bk = bk − bk+1 for


k = 0, . . . , d − 1. The (reversed) difference operator ∇ may be iterated, yielding
∇2 bk = ∇(∇bk ) = ∇bk − ∇bk+1 for k = 0, . . . , d − 2, and so on. In general we
obtain the formula
j  
i j
∇ bk :=
j
(−1) bk+i , 0 ≤ j + k ≤ d,
i=0
i

with ∇ = ∇1 and ∇0 the identity.


Definition 2.1 (d-monotone sequences). For d ∈ N, we say that a finite
sequence (b0 , b1 , . . . , bd ) ∈ [0, ∞)d+1 is d-monotone if ∇d−k bk ≥ 0 for k =
0, 1, . . . , d. An infinite sequence {bk }k∈N0 with positive members is said to be
completely monotone if (b0 , . . . , bd ) is d-monotone for each d ≥ 2.
If (b0 , . . . , bd ) is d-monotone, then ∇j bk ≥ 0 for all 0 ≤ j + k ≤ d. In par-
ticular, if for d ≥ 2 the sequence (b0 , . . . , bd ) is d-monotone, then the shorter
sequences (b0 , . . . , bd−1 ) and (b1 , . . . , bd ) are both (d − 1)-monotone. Intuitively,
when viewing (b0 , . . . , bd ) as a function {0, . . . , d} → [0, ∞), then (−1)j ∇j bk
is something like the j-th derivative at k. With this interpretation in mind, d-
monotonicity means that the higher-order derivatives alternate in sign, i.e. first
derivative is non-positive, second derivative is non-negative, third derivative is
non-positive, and so on. For instance, a 2-monotone sequence is non-increasing
(bk ≥ bk+1 ) and “convex” (bk+1 is smaller or equal than the arithmetic mean of
its neighbors bk and bk+2 ). The set of all d-monotone sequences starting with
b0 = 1 will be denoted by Md in the sequel. Similarly, M∞ denotes the set of
completely monotone sequences starting with b0 = 1.
Finite sequences in Md arise quite naturally in the context of certain discrete
probability laws, as will briefly be explained. Consider a probability distribution
on the power set (including the empty set) of {1, . . . , d} with the property that
subsets with the same cardinality are equally likely outcomes. Concretely, the
probability of some subset I ⊂ {1, . . . , d} only depends on the cardinality |I| of
I, and there are only d + 1 possible cardinalities. Denote the probability of a
subset with cardinality k by pk , k = 0, . . . , d. Then p0 , . . . , pd are non-negative
numbers satisfying
d   
d
pk = p|I| = 1. (2.1)
k
k=0 I⊂{1,...,d}
702 J.-F. Mai

Defining the sequence



d−k
d−k

bk := pd−i , k = 0, . . . , d, (2.2)
i=0
i

it follows that ∇d−k bk = pk ≥ 0 for k = 0, . . . , d. In particular, b0 = 1, so


(b0 , . . . , bd ) ∈ Md . Furthermore, the construction (2.2) can be inverted, i.e. is
general enough to construct all elements of Md . To wit, if (b0 , . . . , bd ) is an ar-
bitrary element in Md , then the vector of non-negative numbers (p0 , . . . , pd ) =
(∇d b0 , ∇d−1 b1 , . . . , ∇0 bd ) satisfies (2.1), i.e. defines a probability law on the
power set of {1, . . . , d} with the aforementioned property. Thus, these probabil-
ity laws on the power set of {1, . . . , d} and Md stand in a one-to-one correspon-
dence. Of course, the power set of {1, . . . , d} can naturally be identified with
{0, 1}d , when identifying x ∈ {0, 1}d with the subset I = {k : xk = 1}. This
explains the occurrence of d-monotonicity in the present section.
The so-called Hausdorff moment problem (also known as little moment prob-
lem) states that the sequences M∞ stand in one-to-one correspondence with the
moment sequences of random variables taking values on the unit interval [0, 1].
Concretely, the sequence {bk }k∈N0 with b0 = 1 is completely monotone if and
only if there is a random variable M taking values in [0, 1] such that bk = E[M k ],
k ∈ N0 . Furthermore, the sequence {bk }k∈N0 uniquely determines the probabil-
ity law of M . This result is originally due to [44, 45]. See also [32, p. 225] for a
proof. Uniqueness of the probability law of M relies heavily on the boundedness
of the interval [0, 1] and is due to the fact that polynomials are dense in the
space of continuous functions on a bounded interval (Stone-Weierstrass).
It is important to observe that not every d-monotone sequence can be ex-
tended to a completely monotone sequence. Being given a d-monotone sequence
(b0 , . . . , bd ), to check whether there exists an extension bd+1 , bd+2 , . . . to an infi-
nite completely monotone sequence {bk }k∈N0 is a purely analytical, highly non-
trivial problem, and luckily already solved. This problem is known as the trun-
cated Hausdorff moment problem. Its solution, due to [53], states that (b0 , . . . , bd )
with b0 = 1 can be extended to an element in M∞ if and only if the Hankel
determinants Ĥ1 , Ȟ1 , . . . , Ĥd−1 , Ȟd−1 are all non-negative, which are defined
as
⎡ ⎤ ⎡ ⎤
b0 . . . b ∇b1 . . . ∇b
⎢ .. ⎥ , Ȟ := det ⎢ .. ⎥
Ĥ2  := det ⎣ ... . ⎦ 2 ⎣ .
..
. ⎦,
b ... b2  ∇b ... ∇b2 −1
⎡ ⎤ ⎡ ⎤
b1 ... b+1 ∇b0 ... ∇b
⎢ .. .. ⎥ ⎢ .. .. ⎥ ,
Ĥ2 +1 := det ⎣ . . ⎦, Ȟ2 +1 := det ⎣ . . ⎦
b+1 ... b2 +1 ∇b ... ∇b2 
(2.3)
for all  ∈ N0 with 2  ≤ d, respectively 2  + 1 ≤ d. To provide an example, the
sequence (1, 1/2, ) is 2-monotone for all  ∈ [0, 1/2], but can only be extended
to a completely monotone sequence if  ∈ [1/4, 1/2].
The infinite extendibility problem 703

2.2. Extendibility of exchangeable binary sequences

Actually, before Bruno de Finetti published his seminal Theorem 1.1 in 1937, he
first published in [18] the same result for the simpler case of binary sequences. In
fact, he showed that there is a one-to-one correspondence between exchangeable
1
probability laws on infinite binary sequences and the set M+ ([0, 1]) of probability
laws on [0, 1].
We start with a random vector X = (X1 , . . . , Xd ) taking values in {0, 1}d . We
know from Lemma 1.1 that X needs to be exchangeable in order to possibly be
conditionally iid, so we concentrate on the exchangeable case. Let 1m , 0m denote
m-dimensional row vectors with all entries equal to one and zero, respectively,
and define

pk := P X = (1k , 0d−k ) , k = 0, . . . , d.

Exchangeability implies that P(X = x) = px1 for arbitrary x ∈ {0, 1}d .


Consequently, the probability law of X is fully determined by p0 , . . . , pd .
Theorem 2.1 (Extendibility of exchangeable binary sequences). Let X be an
exchangeable random vector taking values in {0, 1}d . We denote

pk := P X = (1k , 0d−k ) , k = 0, . . . , d.

The following statements are equivalent:


(a) X is conditionally iid.
(b) There is a random variable M taking values in [0, 1] such that

pk = ∇d−k bk , k = 0, . . . , d,

where bk := E[M k ] for k = 0, . . . , d.


(c) The Hankel determinants in (2.3) are all non-negative, for all  ∈ N0 with
2  ≤ d, respectively 2  + 1 ≤ d, where


d−k
d−k

bk := pd−i , k = 0, . . . , d.
i=0
i

If one (hence all) of these conditions are satisfied, and U = (U1 , . . . , Ud ) is


an iid sequence of random variables that are uniformly distributed on [0, 1],
independent of M in part (b), then
d
X = (1{U1 ≤M } , . . . , 1{Ud ≤M } ).

Proof. The equivalence of (c) and (b) relies on the truncated Hausdorff moment
problem and the identities

E[M k (1 − M )d−k ] = ∇d−k bk = pk , k = 0, . . . , d,


704 J.-F. Mai

which are all readily verified. To show that (b) implies (a) works precisely along
the stochastic model with U as claimed, which is easily checked. To verify the
essential part (a) =⇒ (b) we may simply apply de Finetti’s Theorem 1.1 in
the special case of a binary sequence6 : (a) implies that we may without loss of
generality assume that the given random vector equals the first d members of
an infinite exchangeable binary sequence {Xk }k∈N . De Finetti’s Theorem 1.1,
and as a corollary Lemma 1.8, give us a random variable H ∼ γ ∈ M+ 1
(H).
Since each Xk takes values only in {0, 1}, necessarily almost every path of H
has only one value different from {0, 1}, which is Ht for t ∈ [0, 1). So we define
M := 1 − H1/2 and observe that conditioned on M , the random variables Xk
are iid Bernoulli with success probability M . This implies the claim.
In words, the canonical stochastic model for conditionally iid X with values
in {0, 1}d is a sequence of d independent coin tosses with success probability
M which is identical for all coin tosses, but simulated once before the first coin
toss. We end this section with two examples of particular interest.
Example 2.1 (Pólya’s urn). Let r ∈ N and b ∈ N denote the numbers of red
and blue balls in an urn. Define a random vector X ∈ {0, 1}d as follows:
(i) Set k := 1.
(ii) Draw a ball at random from the urn.
(iii) Set Xk := 1 if the ball is red, and Xk := 0 otherwise.
(iv) Put the ball back into the urn with 1 additional ball of the same color.
(v) Increment k := k + 1.
(vi) If k = d + 1, stop, otherwise go to step (ii).
It is not difficult to observe that X is exchangeable, since
x1 −1 d−x −1
(r + k) k=0 1 (b + k)
P(X = x) = k=0
d−1 , x ∈ {0, 1}d ,
k=0 (r + b + k)
depends on x only through x 1 . Like in Theorem 2.1 we denote by pk the
probability P(X = x) if x 1 = k, k = 0, . . . , d. Using induction over k =
d, d − 1, . . . , 0 in order to verify (∗) below and knowledge about the moments of
the Beta-distribution7 in (∗∗) below, we observe that
 d − k 
d−k  d − k  (r + b − 1)! (r + d − i − 1)! (b + i − 1)!
d−k
bk := pd−i =
i=0
i i=0
i (r − 1)! (b − 1)! (r + b + d − 1)!
(∗) (r + k − 1)! (r + b − 1)! Γ(r + k) Γ(r + b) (∗∗)
= = = E[M k ],
(r − 1)! (b + r + k − 1)! Γ(r) Γ(r + b + k)
where M is a random variable with Beta-distribution whose density is given by
Γ(r + b) r−1
fM (x) = x (1 − x)b−1 , 0 < x < 1.
Γ(r) Γ(b)
6 Alternatively, one may construct a completely monotone sequence {b }
k k∈N from an infi-
nite extension of X, as demonstrated in [17, Equation (1)], and then make use of Hausdorff’s
moment problem to obtain M .
7 See, e.g., [29, p. 35].
The infinite extendibility problem 705

Thus, the probability law of X has a conditionally iid representation like in


Theorem 2.1. This is one of the traditional examples, in which the conditionally
iid structure is a priori not easy to guess from the original motivation of X –
in this case a simple urn replacement model.
Example 2.2 (Ferromagnetic Curie-Weiss Ising model). Motivated by several
models in statistical mechanics, [60] study random vectors which admit a density
with respect to the law of a vector with iid components which is the exponential
of a quadratic form. Concretely, they consider the situation
1 12 d 2

P(X ∈ dx) = e k=1 xk


P(Y ∈ dx), (2.4)
cd
where Y = (Y1 , . . . , Yd ) is a vector with iid components and Y1 is assumed to
satisfy
   1 d 2

ψ(v) := E ev Y1 < ∞ for all v ∈ R, cd := E e 2 k=1 Yk < ∞. (2.5)

Of particular interest are cases in which Y1 takes only finitely many different
values. Especially if Y1 ∈ {0, 1}, the vector X is a binary sequence like in the
present section.
A prominent model motivating the investigation of [60] is the so-called Curie-
Weiss Ising model. In probabilistic terms, this model is a probability law on
{−1, 1}d with two parameters J, h ∈ R, and the components of a random vector
Z with this probability law models the so-called spins at d different sites. These
spins can either have the value −1 or 1 (so X := (1{Z1 >0} , . . . , 1{Zd >0} ) is
a transformation from {−1, 1}d to {0, 1}d ). We denote for n ∈ {−1, 1}d by
N (n) the number of 1’s in n, so that d − N (n) equals the number of −1’s. For
n ∈ {−1, 1}d we define
2
2 N (n)−d + J2
eh 2 N (n)−d
P(Z = n) = d d J
, n ∈ {−1, 1}d , (2.6)
eh (2 k−d)+ 2 (2 k−d)2
k=0 k

which is an exchangeable probability law on {−1, 1}d . The exponent of the nu-
merator can be re-written as

J 2 
d
J 
d d
h 2 N (n) − d + 2 N (n) − d =h nk + nk ni
2 2 i=1
k=1 k=1

and is called the Hamilton operator of the model. The parameter h determines
the external magnetic field and the parameter J denotes a coupling constant.
If J ≥ 0 the model is called ferromagnetic, and for J < 0 it is called antifer-
romagnetic. The
√ √ √ (2.4), if Y1 takes
ferromagnetic case arises as special case of
values
√ in {− J, J} with respective probabilities P(Y1 = √J) = 1 − P(Y1 =
− J) = exp(h)/(exp(h) + exp(−h)). Then the law of Z/ J on {−1, 1}d is
precisely given by the Curie-Weiss Ising model in (2.6) with J ≥ 0. Notice that
for the antiferromagnetic case J < 0 this construction is impossible.
706 J.-F. Mai

[60, Theorem 1.2] shows that X as defined in (2.4) is conditionally iid. More
concretely, conditioned on a random variable M with density8
x2
ψ(x) e− 2
fM (v) := √ , x ∈ R,
cd 2π
the components of X are iid with common distribution
eM x
P(Xk ∈ dx | M ) = P(Y1 ∈ dx), k = 1, . . . , d,
ψ(M )
as can easily be checked. In particular, this shows that the aforementioned fer-
romagnetic Curie-Weiss Ising model is conditionally iid, a result originally due
to [87].

3. Classical results for static factor models

Besides the seminal de Finetti’s Theorem 1.1, the most popular results in
the theory on conditionally iid models concern latent factor processes H of
a very special form to be discussed in the present section. To this end, we con-
sider a popular one-parametric family of one-dimensional distribution functions
x → Fm (x) on the real line and put a prior distribution on the parameter m ∈ R.
Then define H = {Ht }t∈R in the canonical construction (1.4) by Ht = FM (t),
where M is some random variable taking values in the set of admissible values
for the parameter m. For some prominent families, for example the zero mean
normal law or the exponential law, the resulting distribution of the random vec-
tor X belongs to a prominent multivariate family of distributions M, and in fact
defines the subset M∗ ⊂ M. Of particular interest is the case when the subset
M∗ of M admits a convenient analytical description within the framework of the
analytical description of the larger family M. By construction, in this method
of generating conditionally iid laws the dependence-inducing latent factor pro-
cess H is fully determined already by a single random parameter M , so that it
appears unnatural to formulate the model in terms of a “stochastic process” H
at all. Since we investigate situations in which this appears to be more natural
in later sections, we purposely do this anyway in order to present all results
of the present article under one common umbrella. The “single-parameter con-
struction” just described can then be classified as some kind of “static” process
1
within the realm of all possible processes with laws in M+ (H).
More rigorously, let {Ht }t≥0 be the stochastic process from the canonical
stochastic representation (1.4) of some multivariate law in M∗ ⊂ M. Equiv-
alently, we view this probability law as a d-dimensional marginal law of some
infinite exchangeable sequence of random variables {Xk }k∈N , and define {Ht }t≥0
 d
according to Lemma 1.8 as the uniform limit of k=1 1{Xk ≤ t} /d t≥0 as d →
∞. We call the probability law of X = (X1 , . . . , Xd ) static, if the natural filtra-
tion generated by {Ht }t≥0 , i.e. Ht := σ(Hs | s ≤ t), t ∈ R, is trivial, meaning
8 Completing the square shows that fM defines a proper density function on R.
The infinite extendibility problem 707

that there is some T ∈ [−∞, ∞) such that Ht = {∅, Ω} for t ≤ T (“zero infor-
mation before T ”) and Ht = H for t > T (“total information after T ”). The
present section reviews well-known families of distributions M, for which the set
M∗ consists only of static laws. As already mentioned, this situation typically
occurs when the random distribution function H ∼ γ ∈ M+ 1
(H) is itself given by
Ht = FM (t), for a popular family Fm of one-dimensional distribution functions
and a single random variable M representing a random parameter pick.
Example 3.1 (The multivariate normal law revisited). It follows from Exam-
ples 1.1 and 1.3 that N (μ, Σ)∗ , the conditionally iid normal laws, are static.
The random distribution function H as given by (1.6) obviously satisfies H =
σ(Ht : t ∈ R) = σ(M ) = Ht for arbitrary t ∈ R.
Example 3.2 (Binary sequences revisited). If one (hence all) of the conditions
of Theorem 2.1 is satisfied, the law of the binary sequence X ∈ {0, 1}d is static.
Using the notation in Theorem 2.1, the random distribution function H equals
Ht := (1 − M ) 1{t≥0} + M 1{t≥1} . Obviously, H = σ(Ht : t ∈ R) = σ(M ) = Ht
for arbitrary t ≥ 0.
In the remaining section we treat the mixture of iid zero mean normals in
paragraph 3.1 and the mixture of iid exponentials in paragraph 3.2, since these
are the best-studied cases of the theory with nice analytical characterizations.
The interested reader is also referred to [20, 90] who additionally study mixtures
of iid geometric variables, iid Poisson variables, and iid uniform variables. Mix-
tures of uniform random variables are discussed in more detail also in Section 3.3
below.

3.1. Spherical laws (aka 2 -norm symmetric laws)

A random vector X ∈ Rd is called spherical if its probability distribution re-


mains invariant under unitary transformations, such as rotations or reflections,
d
i.e. X = X O for an arbitrary orthogonal matrix O ∈ Rd×d . A spherical random
vector X has a canonical stochastic representation
d
X = R S, (3.1)
where R is a non-negative random variable and the random vector S is inde-
pendent of R and uniformly distributed on the Euclidean unit sphere {x ∈ Rd :
x 2 = 1}, see [31, Chapter 2]. Hence, realizations of spherical laws must be
thought of as being the result of a two-step simulation algorithm: first draw
one completely random point on the unit d-sphere, and then scale this point
according to some one-dimensional probability distribution on the positive half-
axis. In analytical terms, spherical laws are most conveniently treated via their
(multivariate) characteristic functions. In particular, it is not difficult to see
that X has a spherical law if and only if there exists a real-valued function
ϕ : [0, ∞) → R in one variable such that
 
2
E ei (u1 X1 +...+ud Xd ) = ϕ( u 2 ), u = (u1 , . . . , ud ) ∈ Rd ,
708 J.-F. Mai

see, e.g., [75, Lemma 4.1, p. 161]. The function ϕ is called the characteristic
generator. If the components of X are conditionally iid, the function ϕ is of a
very special form, see Schoenberg’s Theorem 3.1 below.
If the components of Y = (Y1 , . . . , Yd ) are iid standard normally distributed,
and M ∈ (0, ∞) is an independent random variable, the random vector X =
M Y is spherical, because Y O is a vector of iid standard normal components for
any orthogonal matrix O. Furthermore, the components of X are iid conditioned
on the σ-algebra generated by the mixture variable M . Schoenberg’s Theorem
states that the converse is true as well, i.e. all conditionally iid spherical laws
are mixtures of zero-mean normals.
Theorem 3.1 (Schoenberg’s Theorem). Let M be the family of d-dimensional
spherical laws, and let the law of X be in M, and assume X is not identically
equal to a vector of zeros. The following are equivalent
(a) The law of X lies in M∗ .
(b) There are iid standard normal random variables Y1 , . . . , Yd and an inde-
pendent positive random variable M ∈ (0, ∞) such that
d
X = M (Y1 , . . . , Yd ).

In other words, this means that X has a stochastic representation as in


(1.4) with Ht := Φ(t/M ), where Φ denotes the distribution function of a
standard normally distributed random variable.
(c) There is a random variable Z with χ2 -law with d degrees of freedom, a
positive random variable M ∈ (0, ∞), and S uniformly distributed on the
Euclidean unit sphere, all three being mutually independent, such that
d √
X=M Z S.

In other words, the random variable R of the general representation (3.1)


d √
is of the special form R = M Z.
(d) The (multivariate) characteristic function of X has the form
 
2
E ei (u1 X1 +...+ud Xd ) = ϕ( u 2 ), u = (u1 , . . . , ud ) ∈ Rd ,

where ϕ is the Laplace transform ϕ of some positive random variable.


Proof. Named after [97], see also [56] or [1, p. 22] for further references. An
alternative proof is also given in [20]. Statement (c) is only included in order to
highlight how the random radius R must be chosen in the canonical represen-
tation (3.1) such that the law of X is in M∗ , see also Remark 3.1 below; the
interested reader can find a proof for the equivalence (b) ⇔ (c) in [75, Lemma
4.2, p. 166]. Similarly, the equivalence (b) ⇔ (d) is obvious, and ϕ in (d) equals
the Laplace transform of the positive random variable M 2 /2 with M from (b).
Trivially, (b) implies (a). We only verify the non-obvious implication (a) ⇒ (b),
and the proof consists of two steps, following the lines of [1, p. 22].
The infinite extendibility problem 709

(i) As a first step we show Maxwell’s Theorem, i.e. if X1 , . . . , Xd are inde-


pendent and (X1 , . . . , Xd ) is spherically symmetric, then all components
Xk are actually iid sharing a normal distribution with mean zero. Since
(X1 , . . . , Xd ) is spherically symmetric, its characteristic function can be
written as
 
2
E ei (u1 X1 +...+ud Xd ) =: ϕ( u 2 ), u = (u1 , . . . , ud ) ∈ Rd ,

for some function ϕ in one variable, see, e.g., [75, Lemma 4.1, p. 161]. De-
noting the characteristic function of Xk by fk , k = 1, . . . , d, independence
2
of the components implies that ϕ( u 2 ) = f1 (u1 ) . . . fd (ud ). Taking the
2
derivative9 w.r.t. uk and dividing by ϕ( u 2 ) on both sides of the last
equation implies for arbitrary k = 1, . . . , d that

fk (uk ) ϕ ( u 2 )
2
= 2 . (3.2)
fk (uk ) 2 uk ϕ( u 2 )

Let u, y ∈ R arbitrary. Plugging u = (u, . . . , u) into (3.2) shows that

fk (u) ϕ ( u 2 )
2
fj (u)
= 2 = , arbitrary 1 ≤ k, j ≤ d. (3.3)
fk (u) 2 u ϕ( u 2 ) fj (u) 2 u

Plugging some u which has u as its k-th and y as its j-th component into
(3.2), we observe

fk (u) ϕ ( u 2 )
2
fj (y) (3.3) fk (y)
= 2 = = .
fk (u) 2 u ϕ( u 2 ) fj (y) 2 y fk (y) 2 y

Since u, y were arbitrary, the functions x → fk (u)/(fk (u) 2 u) are therefore
shown to equal some constant c independent of k. Since fk (0) = 1, solving
the resulting ordinary differential equation implies that fk (u) = exp(c u2 ).
Left to show is now only that c ≤ 0, because this would imply that fk
equals the characteristic function of a zero-mean normal. Since fk is a
characteristic function and as such must be positive semi-definite, the
inequality
$ %
fk (0 − 0) fk (0 − 1)
det = fk (0)2 − fk (1) fk (−1) = 1 − e2 c ≥ 0
fk (1 − 0) fk (1 − 1)

must hold. Clearly, this is only possible for c ≤ 0. The case c = 0 is ruled
out by the assumption that X is not identical to a vector of zeros.
(ii) If the law of X lies in M∗ we can without loss of generality assume that X
equals the first d members of an infinite exchangeable sequence {Xk }k∈N .
Conditioned on the tail-σ-field H := ∩n≥1 σ(Xn , Xn+1 , . . .) the random
9 Notice that characteristic functions are differentiable.
710 J.-F. Mai

variables X1 , . . . , Xd are iid according to de Finetti’s Theorem 1.1. We


observe for an arbitrary orthogonal matrix O ∈ Rd×d that
d
(X O, Xd+1 , Xd+2 , . . .) = (X, Xd+1 , Xd+2 , . . .),

since X is spherical. Since H does not depend on X (but only on the tail
of the infinite sequence), this implies that the conditional distribution of
X and X O given H are identical. As O was arbitrary, X conditioned on
H is spherical. Maxwell’s Theorem now implies that X conditioned on H
is an iid sequence of zero mean normals. Thus, only the standard deviation
may still be a H-measurable random variable, which we denote by M .
If (P) in Problem 1.3 is the property of “having a spherical law (in some di-
mension)”, then Schoenberg’s Theorem 3.1 also implies that M∗ = M∗∗ , which
follows trivially from the equivalence of (a) and (b), since the stochastic con-
struction in (b) clearly works for arbitrary n > d as well. Furthermore, it is
observed that the random distribution function Ht = Φ(t/M ) in part (b) satis-
fies the condition in Lemma 1.7 with μ = 0, so conditionally iid spherical laws
are radially symmetric. In fact, (arbitrary) spherical laws are always radially
d
symmetric, since (X1 , . . . , Xd ) = (−X1 , . . . , −Xd ) follows immediately from the
definition.
Remark 3.1 (Realization of uniform law on Euclidean unit sphere). Denoting
Y = (Y1 , . . . , Yd ), the equivalence (b) ⇔ (c) in Theorem 3.1 implies
 Y Yd 
d 1
S= ,..., ,
Y 2 Y 2

which shows how to generate realizations of the uniform law on the Euclidean
unit sphere from a list of iid standard normals.
Remark 3.2 (Elliptical laws). Spherical laws are always exchangeable, which is
easy to see. A popular method to enrich the family of spherical laws to obtain a
larger family beyond the exchangeable paradigm is linear transformation. To wit,
for X ∈ Rk spherical with characteristic generator ϕ, A ∈ Rk×d some matrix
with Σ := A A ∈ Rd×d and rank of Σ equal to k ≤ d, and with b = (b1 , . . . , bd )
some real-valued row vector, the random vector

Z = (Z1 , . . . , Zd ) = X A + b (3.4)

is said to have an elliptical law with parameters (ϕ, Σ, b). This generalization
from spherical laws to elliptical laws is especially well-behaved from an ana-
lytical viewpoint, since the apparatus of linear algebra gets along perfectly well
with the definition of spherical laws. The most prominent elliptical law is the
multivariate normal distribution, which is obtained in the special case when
ϕ(x) = exp(−x/2) is the Laplace transform of the constant 1/2. The case when
2
E[ X 2 ] < ∞ is of most prominent importance, since the random vector Z then
2
has existing covariance matrix given by E[ X 2 ] Σ /k.
The infinite extendibility problem 711

Since the normal distribution special case occupies a commanding role when
deciding whether or not a spherical law is conditionally iid according to Theo-
rem 3.1(b), and since we have also solved our motivating Problem 1.1 for the
multivariate normal law in Example 1.1, it is not difficult to decide when an el-
liptical law is conditionally iid as well. To wit, in the most important case when
2
E[ X 2 ] < ∞ the random vector Z in (3.4) has a stochastic representation
d
that is conditionally iid if and only if b1 = . . . = bd , and Z = R Y + b with R
some positive random variable with finite second moment and Y = (Y1 , . . . , Yd )
multivariate normal with zero mean vector and covariance matrix such as in Ex-
ample 1.1, i.e. with identical diagonal elements σ 2 > 0 and identical off-diagonal
elements ρ σ 2 ≥ 0.

3.2. 1 -norm symmetric laws

According to [83], a random vector X ∈ [0, ∞)d is called 1 -norm symmetric if


it has a stochastic representation
d
X = R S,

where R is a non-negative random variable and the random vector S is indepen-


dent of R and uniformly distributed on the unit simplex Sd = {x ∈ [0, ∞)d :
x 1 = 1}. Comparing this representation to (3.1), the only difference is that
S is now uniformly distributed on the unit sphere with respect to the 1 -norm
(restricted to the positive orthant [0, ∞)d ), rather than on the unit sphere with
respect to the Euclidean norm. Consequently, quite similar to spherical laws,
realizations of 1 -norm symmetric distributions must be thought of as being
the result of the following two-step simulation algorithm: first draw one com-
pletely random point on the d-dimensional unit simplex, and then scale this
point according to some one-dimensional probability distribution on the posi-
tive half-axis.
Remark 3.1 points out an important relationship between the (univariate)
standard normal distribution and the uniform law on the Euclidean unit sphere
(w.r.t. the Euclidean norm . 2 ). It is not difficult to observe that the (univari-
ate) standard exponential law plays the analogous role for the uniform law on
the unit simplex (w.r.t. the 1 -norm . 1 ). More precisely, if the components of
E = (E1 , . . . , Ed ) are iid exponentially distributed with unit mean, then
 E Ed 
1
S := ,...,
E 1 E 1

is uniformly distributed on the unit simplex, cf. [75, Lemma 2.2(2), p. 77] or
[31, Theorem 5.2(2), p. 115]. An arbitrary 1 -norm symmetric random vector
X is represented as
 E Ed 
d 1
X=R ,..., (3.5)
E 1 E 1
712 J.-F. Mai

with independent R and E. With the analogy to the spherical case in mind,
heuristic reasoning suggests that X is extendible if and only if R is chosen
such that it “cancels” out the denominator of S in distribution. Since E 1
has a unit-scale Erlang distribution with parameter d, this would imply that R
should be chosen as R = Z/M for some positive random variable M and an
independent random variable Z with Erlang distribution and parameter d. This
is precisely the case, as Theorem 3.2 below shows.
Generally speaking, it follows from the canonical stochastic representation
(3.5) that
 x     
Ei , R > x = E e− R−x i=k Ei 1{R>x}
x
P(Xk > x) = P Ek >
R−x
i=k
  x d−1 
= E max 1 − , 0 =: ϕd,R (x), k = 1, . . . , d,
R
where the last equality uses knowledge about the Laplace transform of the
Erlang-distributed random variable i=k Ei . This means that the marginal
survival functions of the components Xk are given by the so-called Williamson
d-transform ϕd,R of R. It has been studied in [106], who shows in particular
that the law of R is uniquely determined by ϕd,R . A similar computation as
above shows that the joint survival function of X is given by

P(X > x) = ϕd,R ( x 1 ), x = (x1 , . . . , xd ) ∈ [0, ∞)d .

Theorem 3.2 solves Problem 1.3 for the property (P) of “having an 1 -norm
symmetric law (in some dimension)”.
Theorem 3.2 (Conditionally iid 1 -norm symmetric laws). Let ϕ : [0, ∞) →
[0, 1] be a function in one variable. The following statements are equivalent:
(a) There is an infinite sequence of random variables {Xk }k∈N such that for
arbitrary d ∈ N we have

P(X > x) = ϕ( x 1 ), x ∈ [0, ∞)d .

(b) The function ϕ equals the Laplace transform of some positive random vari-
able M , i.e. ϕ(x) = E[exp(−x M )].
In this case, for arbitrary d ∈ N we have

d 1 d 1
X = (X1 , . . . , Xd ) = ZS = E,
M M
where X as in (a), M as in (b), S uniformly distributed on the unit simplex,
E = (E1 , . . . , Ed ) a vector of iid unit exponentials, and Z a unit-scale Erlang
distributed variate with parameter d, all mutually independent. In other words,
X has a stochastic representation as in (1.5) with Zt := M t, in particular is
conditionally iid.
The infinite extendibility problem 713

Proof. The implication (b) ⇒ (a) works precisely along the stochastic model
claimed, and is readily observed. The implication (a) ⇒ (b) is known as Kim-
berling’s Theorem, see [54]. We provide a proof sketch in the sequel. From d = 1
we observe that ϕ is the survival function of some positive random variable.
Consequently, due to Bernstein’s Theorem10 , it is sufficient to prove that ϕ is
completely monotone, meaning that (−1)d ϕ(d) ≥ 0 for all d ∈ N0 . To this end,
recall that
d  
 d
d
(−1) ϕ (d)
(x) = Δdh [ϕ](x) + O(h), Δdh [ϕ](x) := (−1)d−k ϕ(x − k h),
k
k=0

so that it is sufficient to show that Δdh [ϕ](x) ≥ 0 for arbitrary d ∈ N0 and


x, h such that 0 ≤ x − d h. To this end, we consider the infinite sequence of
random variables {Uk }k∈N with Uk := ϕ(Xk ), k ∈ N, and with α := ϕ(x/d) and
β := ϕ(x/d − h) > α define the events
   
AI := ∩j∈I {Uj ≤ α} ∩ ∩j ∈I / {U j ≤ β} , I ⊂ {1, . . . , d}.

A lengthy but straightforward computation, with one application of the inclusion


exclusion principle, shows that
 
Δdh [ϕ](x) = . . . = P A∅ \ ∪dk=1 {Uk ≤ α} ≥ 0,

which implies the claim.


Remark 3.3 (On involved probability transforms). In Theorem 3.2, the func-
tion ϕ in part (b) equals the Laplace transform of the random variable M . Fur-
thermore, the survival function of any element in M has the form as claimed
in (a), only the parameterizing function ϕ needs not be a Laplace transform in
general. Instead, ϕ always equals the Williamson d-transform of some positive
random variable (namely of R). The Williamson d-transform of some random
variable is also a Williamson (d + 1)-transform (of some other random vari-
able), and Laplace transforms can be viewed as a proper subset of Williamson
d-transforms given by
&
{Laplace transforms} = {Williamson d-transforms}.
d∈N

The most important example for a Williamson d-transform, which is not a


Laplace transform (in fact, not even a Williamson (d + 1)-transform), is given
by ϕ(x) = ϕ(x; d, r) = (1 − x/r)d−1
+ , with a constant r > 0. In fact, [106] shows
that the set of Williamson d-transforms is a simplex with extremal boundary
given by {ϕ(.; d, r)}r>0 , which is just another way to say that the function ϕd,R
determines the probability law of the positive random variable R uniquely. Sim-
ilarly, Laplace transforms form a simplex with extremal boundary given by the
10 The original reference is [10], a detailed proof can be found in [8] or [96].
714 J.-F. Mai

functions x → exp(−m x) for m > 0, which is just another way to say that
the function ϕ(x) = E[exp(−x M )] determines the law of the positive random
variable M uniquely. Typical parametric examples for Laplace transforms in the
context of 1 -norm symmetric distributions are ϕ(x) = (1+x)−θ with θ > 0, cor-
responding to a Gamma distribution of M , or ϕ(x) = exp(−xθ ) with θ ∈ (0, 1),
corresponding to a stable distribution of M .
Remark 3.4 (Archimedean copulas). Considering X = (X1 , . . . , Xd ) with 1 -
norm symmetric law associated with the Williamson d-transform ϕ = ϕd,R , the
random vector (U1 , . . . , Ud ) := ϕ(X1 ), . . . , ϕ(Xd ) has distribution function

Cϕ (u1 , . . . , ud ) := P(U1 ≤ u1 , . . . , Ud ≤ ud ) = ϕ ϕ−1 (u1 ) + . . . + ϕ−1 (ud ) ,

for u1 , . . . , ud ∈ [0, 1]. Recall that ϕ−1 denotes the generalized inverse of ϕ.
The function Cϕ is called an Archimedean copula and the study of 1 -norm
symmetric distributions can obviously be translated into an analogous study of
Archimedean copulas. In the statistical and applied literature Archimedean cop-
ulas have received considerably more attention. For instance, nested and hier-
archical extensions of (exchangeable) Archimedean copulas have become quite
popular, see, e.g. [16, 47, 49, 82, 107, 65].
Remark 3.5 (Extension to Liouville distributions). Analyzing the analogy be-
tween spherical laws (aka 2 -norm symmetric laws) and 1 -norm symmetric
laws, there is one common mathematical fact on which the analytical treatment
of both families relies. To wit, for both families the uniform distribution on the
d-dimensional unit sphere can be represented as the normalized vector of iid
random variables. In the spherical case the normalized vector Y / Y 2 of d iid
standard normals Y = (Y1 , . . . , Yd ) is uniform on the . 2 -sphere, whereas in
the 1 -norm symmetric case the normalized vector E/ E 1 of d iid standard
exponentials E = (E1 , . . . , Ed ) is uniform on the . 1 -sphere restricted to the
positive orthant [0, ∞)d . Furthermore, in both cases the normalization can be
“canceled out” in distribution, that is
√ Y d E d
Z =Y, R = E,
Y 2 E 1

√ d
where Z = Y 2 is independent of Y and Z has a χ2 -law with d degrees of
d
freedom and R = E 1 is independent of E and has an Erlang distribution
with parameter d. The so-called Lukacs Theorem, due to [63], states that the
exponential distribution of the Ek in the last distributional equality can be gen-
eralized to a Gamma distribution (but no other law on (0, ∞) is possible). More
precisely, if G = (G1 , . . . , Gd ) are independent random variables with Gamma
G
distributions with the same scale parameter, then G 1 is independent of G ,
1
which means that
G d d
R = G, where R = G 1 is independent of G. (3.6)
G 1
The infinite extendibility problem 715

The random vector S := G/ G 1 on the unit simplex is not uniformly dis-


tributed unless the Gk happen to be iid exponential. In general, the law of S is
called Dirichlet distribution, parameterized by the d values α = (α1 , . . . , αd ),
where the Gamma density of Gk is given by

fk (x) = xαk −1 e−x /Γ(αk ), x > 0, k = 1, . . . , d. (3.7)

Notice that the scale parameter of this Gamma distribution is without loss of
generality set to one, since it has no influence on the law of S. A d-parametric
generalization of 1 -norm symmetric laws is obtained by replacing the uniform
law of S on the unit simplex (which is obtained for α1 = . . . = αd ) with a
Dirichlet distribution (with arbitrary αk > 0). One says that the random vector
X = R S with R some positive random variable and S an independent Dirichlet-
distributed random vector on the unit simplex, follows a Liouville distribution.
It is precisely the property (3.6) that implies that the generalization to Liouville
distributions is still analytically quite convenient to work with, see [84] for a
detailed study. Analogous to the 1 -norm symmetric case, the components of X
d d
are conditionally iid if α1 = . . . = αd and R satisfies R = Z/M with Z = G 1
and M some independent positive random variable.
Having at hand the apparatus of Archimedean copulas, we are now in the
position to provide a non-trivial example for the situation M∗∗  M∗ .
Example 3.3 (In general, M∗∗  M∗ ). Consider the family M ⊂ M+ 1
([0, 1]2 )
defined by the property (P) of “having an Archimedean copula as distribution
function and being radially symmetric”. It is well-known since [35, Theorem 4.1]
that the set M comprises precisely Frank’s copula family, that is the bivariate
distribution function of an element in M is either given by C−∞ (u1 , u2 ) :=
max{u1 + u2 − 1, 0}, by C0 (u1 , u2 ) := u1 u2 , by C∞ (u1 , u2 ) := min{u1 , u2 }, or
by

1  e−θ u1 − 1 e−θ u2 − 1 
Cθ (u1 , u2 ) := − log 1 + , u1 , u2 ∈ [0, 1],
θ e−θ − 1

for some parameter θ ∈ (−∞, 0) ∪ (0, ∞). Since Kendall’s Tau of the copula Cθ
is negative in the case θ < 0, Lemma 1.3 implies that the subset M∗ can at best
contain the elements corresponding to θ ∈ [0, ∞]. Indeed, the cases θ ∈ {0, ∞}
are obviously contained in M∗∗ ⊂ M∗ , and for θ ∈ (0, ∞) membership in M∗
follows via the canonical construction (1.4) with the choice H ∼ γ ∈ M+ 1
(H+ ),
given by
 1 − e−θ t M
Ht = , t ∈ [0, 1], (3.8)
1 − e−θ
for a random variable M with logarithmic distribution P(M = m) = (1 −
exp(−θ))m /(m θ), m ∈ N. Furthermore, we can deduce from Theorem 3.2 that
the property of “having an Archimedean copula as distribution function (in ar-
bitrary dimension)” implies that potential elements in M∗∗ must necessarily
716 J.-F. Mai

be induced by a stochastic process of the form (3.8) with some positive random
variable M , which must necessarily be logarithmic in the radially symmetric case
by the result of Frank. The only thing left to check is whether the multivariate
Archimedean copula derived from the canonical construction via H defined by
(3.8) with logarithmic M is radially symmetric in arbitrary dimension d ≥ 2.
According to Lemma 1.7 this is the case if and only if
 1 − e−θ 2 −t
1
M 
  = {H 1 −t }  
1 − e−θ t∈ − 12 , 12 2 t∈ − 12 , 12

  −θ t+ 12 M 
= {1 − Ht+ 12 } 
d  = 1− 1−e  .
t∈ − 12 , 12 1 − e−θ t∈ − 12 , 12

This statement is false, however, as will briefly be explained. Assuming it was


true, then in particular for t = 0 we observe that the law of the random variable
H 12 was symmetric about 12 . In particular, this symmetry would imply
 1 3  3 1
0 = E H 12 − = [. . .] = ϕθ 3 ϕ−1
θ (1/2) − ϕθ 2 ϕ−1
θ (1/2) + ,
2 2 4
with ϕθ (x) = − log e−x (e−θ − 1) + 1 /θ. Numerically, it is easily verified that
the last equality does not hold for any θ ∈ (0, ∞), since the right-hand side is
strictly smaller than zero. Thus, we see that M∗∗ consists only of two elements,
namely those corresponding to {0, ∞}. Thus, M∗∗  M∗  M, since {0, ∞} 
[0, ∞]  [−∞, ∞].

3.3. ∞ -norm symmetric laws

[40, Theorem 2] studies random vectors X = (X1 , . . . , Xd ) which are absolutely


continuous with density given by
fX (x) = gd (x[d] ), x ∈ (0, ∞)d , (3.9)
with some measurable function gd : (0, ∞) → [0, ∞). Recall that
x[d] := max{x1 , . . . , xd } = x ∞

equals the ∞ -norm of x ∈ (0, ∞) . Since fX is invariant with respect to per-


d

mutations of the components of x, the random vector X is exchangeable. But


whether or not it is conditionally iid depends on the choice of gd . First of all,
since fX is a probability density,

1= gd (x[d] ) dx = d gd (x) xd−1 dx, (3.10)
(0,∞)d 0

constituting a necessary and sufficient integrability condition on gd such that fX


defines a proper probability density. Furthermore, lower-dimensional margins of
X have a density of the same structural form, since

fX (x1 , . . . , xd−1 , x) dx = gd−1 (x[d−1] ), x1 , . . . , xd−1 > 0,
0
The infinite extendibility problem 717


where gd−1 (x) := gd (u) du + x gd (x), (3.11)
x

and the function gd−1 is easily checked to satisfy (3.10) in dimension d − 1, that

is 1 = (d − 1) 0 gd−1 (x)xd−2 dx. It is further not difficult to verify that gd is
given in terms of gd−1 as

gd−1 (x) gd−1 (u)
gd (x) = − du. (3.12)
x x u2
If M denotes the family of all laws with density of the form (3.9), i.e. with a func-
tion gd satisfying (3.10), the following result provides necessary and sufficient
conditions on gd to define a law in M∗ .
Theorem 3.3 (Conditionally iid ∞ -norm symmetric densities). Let M be the
family of probability laws on (0, ∞)d with densities of the form (3.9) with a
measurable function gd : (0, ∞) → [0, ∞) satisfying (3.10). For X with law in
M, the following statements are equivalent:
(a) The law of X lies in M∗ .
(b) gd is non-increasing.
(c) For a vector U = (U1 , . . . , Ud ) whose components are iid uniform on [0, 1]
and an independent, positive random variable M we have
d
X = M U.

Proof. This is [40, Theorem 2]. Clearly, (c) ⇒ (a) is obvious. In order to see (b)
⇒ (c), we first conclude from (3.10) that
1 1
0 = lim gd (x) xd = lim gd . (3.13)
x→∞ x→∞ x xd
By non-increasingness, we may without loss of generality assume that gd is
right-continuous (otherwise, change to its right-continuous version, which does
not change the density fX essentially) and apply integration by parts, then
(3.13) and (3.10) imply
∞ ∞
xd d − gd (x) = d gd (x) xd−1 dx = 1.
0 0
x
Consequently, x → 0 y d d − gd (y) defines the distribution function of a posi-
tive random variable M , and we see that

yd
E[1{M >x} M −d ] = d − gd (y) = gd (x).
x yd
Now let U as claimed be independent of M . Conditioned on M , the density of
M U is

d
1{0<xk <M } 1
x → = 1{0<x[d] <M } d .
M M
k=1
718 J.-F. Mai

Integrating out M , the density of M U is found to be



1
1{x[d] <m} dP(M ≤ m) = E[1{M >x[d] } M −d ] = gd (x[d] ),
0 md

which shows (c). The hardest part is (a) ⇒ (b). Fix  > 0 arbitrary. Due to
measurability of gd , Lusin’s Theorem guarantees continuity of gd on a set C
whose complement has Lebesgue measure less than . Without loss of generality
we may assume that all points t in C are density points, i.e. satisfy

λ(C ∩ [t − δ, t + δ])
lim = 1,
δ0 2δ

where λ denotes Lebesgue measure. Let {Xk }k∈N an infinite exchangeable se-
quence such that d-margins have the density fX . Fix t ≥ s arbitrary. We define
the sequence of random variables {ξk }k∈N by

1
ξk := 1{Xk ∈As } − 1{Xk ∈At } , k ∈ N,

where Ax := C ∩ [x − δ, x + δ] for x ∈ {s, t}. Notice that the ξk are square-
integrable and

0 ≤ E[(ξ1 + . . . + ξd )2 ] = d E[ξ12 ] + d (d − 1) E[ξ1 ξ2 ].

If we divide by d2 and let d → ∞, it follows that E[ξ1 ξ2 ] ≥ 0. Denoting by g2


the marginal density of (X1 , X2 ), we observe

1 
0 ≤ E[ξ1 ξ2 ] = E[1{X1 ,X2 ∈As } ] + E[1{X1 ,X2 ∈At } ] − 2 E[1{X1 ∈At ,X2 ∈As } ]
4 δ2
1  
= g (x
2 [2] ) dx + g (x
2 [2] ) dx − 2 g (x
2 [2] ) dx
4 δ2 As ×As At ×At At ×As
= g2 (ηs ) + g2 (ηt ) − 2 g2 (η̃t ),

for certain values s − δ ≤ ηs ≤ s + δ and t − δ ≤ ηt , η̃t ≤ t + δ by the mean value


theorem for Lebesgue integration. As δ  0, we thus observe that g2 (s) ≥ g2 (t),
i.e. g2 is non-increasing. Making use of (3.12) and integrating by parts, we
observe that

1 
g3 (x) = d − g2 (u) ,
x u

which implies that g3 is non-increasing as well. Inductively, the same argument


implies that g4 , . . . , gd are all non-increasing.

From the equivalence of (a) and (c) in Theorem 3.3 we observe easily that
M∗ = M∗∗ , when considering the property (P) of “having a density of the form
(3.9) (in some dimension d ∈ N)” in Problem 1.3. Notice furthermore that the
The infinite extendibility problem 719

law of M U is static in the sense defined in the beginning of this section, and
we have
  t 
Ht := P(Xk ≤ t | M ) = max 0, min 1, , t ∈ R,
M
for Xk := M Uk as defined in part (c) of Theorem 3.3.
Remark 3.6 (Common umbrella of p -norm symmetry results). Theorem 3.3
on ∞ -norm symmetric densities is very similar in nature to Schoenberg’s The-
orem 3.1 on 2 -norm symmetric characteristic functions and Theorem 3.2 on
1 -norm symmetric survival functions, which makes it a beautiful result with
regards to the present survey. The reference [90] considers all these three cases
under one common umbrella, and even manages to generalize them in some
meaningful sense to the case of arbitrary p -norm, with p ∈ [1, ∞] arbitrary11 .
More precisely, it is shown that an infinite exchangeable sequence {Xk }k∈N of
the form Xk := M Yk , k ∈ N, with M > 0 and an independent iid sequence
{Yk }k∈N of positive random variables is p -norm symmetric in some meaningful
sense12 if and only if the random variables Yk have density fp given by
1
p1− p − xpp
fp (x) := e , 0 < p, x < ∞, f∞ (x) := 1{x∈(0,1)} .
Γ(1/p)
Notice that f1 , f2 , and f∞ are the densities of the unit exponential law, the
absolute value of a standard normal law, and the uniform law on [0, 1], respec-
tively. This parametric family in the parameter p is further investigated, and
might for instance be characterized by the fact that fp for p < ∞ has maxi-
mal entropy among all densities on (0, ∞) with p-th moment equal to one, and
f∞ has maximal entropy among all densities with support (0, 1), which is [90,
Theorem 3.5].
An analogous result to Theorem 3.3 on mixtures of the form M U , when
the components of U are iid uniform on [−1, 1], is also presented in [40]. The
resulting densities depend on two arguments, x[1] and x[d] . Furthermore, [90,
Corollary 4.3] prove that an infinite exchangeable sequence {Xk }k∈N satisfies
d
{Xk }k∈N = {M Uk }k∈N , U1 , U2 , . . . iid uniform on [0, 1], M > 0 independent,
if and only if for arbitrary d ∈ N and almost all s > 0 the law of X =
(X1 , . . . , Xd ) conditioned on the event { X ∞ = s} is uniformly distributed
on the sphere {x ∈ (0, ∞)d : x ∞ = s}. This provides an alternative charac-
terization of densities that are ∞ -norm symmetric and conditionally iid.
Remark 3.7 (Relation to non-homogeneous pure birth processes). [99] provide
an interesting interpretation of ∞ -norm symmetric densities, which is briefly
explained. Every non-negative function gd satisfying (3.10) is of the form
x
gd (x) = cd rd (x) e− 0
rd (u) du

11 The authors even allow for p ∈ (0, 1), but in this case .p is no longer a norm.
12 See [90] for details.
720 J.-F. Mai


for some non-negative function rd satisfying 0
rd (x) dx = ∞, and some nor-
malizing constant cd > 0. To wit, a function
gd (x)
rd (x) := ∞ , x > 0, (3.14)
cd x
gd (u) du
for some normalizing constant cd > 0 does the job, as can readily be checked.
From such a function rd we iteratively define functions rd−1 , . . . , r1 by solving
the equations
e−Rk+1 (x)
rk (x) e−Rk (x) = ∞ −R , k = d − 1, . . . , 1, (3.15)
0
e k+1 (u) du
x
where Rk (x) := 0 rk (u) du for k = 1, . . . , d. Notice that rk is related to the
right-hand side of (3.15) exactly in the same way as rd is related to gd , so the
solution (3.14) shows how the rk look like in terms of rk+1 . We define inde-
pendent positive random variables E1 , . . . , Ed with survival functions P(Ek >
x) = exp(−Rk (x)), k = 1, . . . , d, x ≥ 0. Independently, let Π be a random
permutation of {1, . . . , d} with P(Π = π) = 1/d! for each permutation π of
{1, . . . , d}, i.e. Π is uniformly distributed on the set of all d! permutations. We
consider the increasing sequence of random variables T1 < T2 < . . . < Td de-
fined by Tk := E1 + . . . + Ek . Then the (obviously exchangeable) random vector
X = (X1 , . . . , Xd ) := (TΠ(1) , . . . , TΠ(d) ) has density (3.9). If E1 , E2 , . . . is an
arbitrary sequence of independent, absolutely continuous, positive random vari-
ables the counting process

Nt := 1{E1 +...+Ek ≤t} , t ≥ 0,
k≥1

is called non-homogeneous pure birth process with intensity rate functions given
by rk (x) := − ∂x

log{P(Ek > x)}, k ≥ 1. A random permutation of the first d
jump times Tk := E1 + . . . + Ek , k = 1, . . . , d, of a pure birth process N thus has
an ∞ -norm symmetric density if the intensities r1 , . . . , rd−1 can be retrieved
recursively from rd via (3.15). The case of arbitrary intensities r1 , . . . , rd hence
provides a natural generalization of the family of ∞ -norm symmetric densities.
It appears to be an interesting open problem to determine necessary and suf-
ficient conditions on r1 , . . . , rd such that the respective exchangeable density is
conditionally iid, see also paragraph 7.1 below.
Example 3.4 (Pareto mixture of uniforms). Let M in Theorem 3.3 have sur-
vival function P(M > x) = min{1, x−α } for some α > 0. The associated func-
tion gd generating the ∞ -norm symmetric density is given by

α
gd (x) = E[1{M >x} M −d ] = α u−d−1−α du = max{1, x}−d−α .
max{x,1} d+α
The components Xk of X have the following one-dimensional distribution func-
tion G(x) := P(Xk ≤ x), and respective inverse G−1 , given by
'
α
x , if x < 1
G(x) = 1+α 1 −α ,
1 − 1+α x , if x ≥ 1
The infinite extendibility problem 721
'
1+α α
−1 α y , if 0 < y < 1+α
G (y) = −α
1 .
(1 − y) (1 + α) , if α
1+α ≤y<1

This induces the one-parametric bivariate copula family defined by

Cα (u1 , u2 ) := P G(X1 ) ≤ u1 , G(X2 ) ≤ u2



⎪ (1+α)2
⎪ α (α+2) u1 u2 , if u1 , u2 ≤ α
⎨ 1+α
1+1/α
= u[1] − (1+α) u[1] (1 − u[2] )1+1/α , if u[1] ≤ α
≤ u[2] .

⎪ 2+α 1+α
⎩u − α (1 − u )−1/α (1 − u )1+1/α , else
[1] 2+α [1] [2]

Scatter plots from this copula for different values of α are depicted in Figure 3,
visualizing the dependence structure behind pairs of X. The dependence de-
creases with α, and the limiting cases α = 0 and α = ∞ correspond to perfect
positive association and independence, respectively. One furthermore observes
that the dependence is highly asymmetric, i.e. large values of G(X1 ), G(X2 ) are
more likely jointly close to each other than small values, which behave like in-
dependence. This effect can be quantified in terms of the so-called upper- and
lower-tail dependence coefficients, given by
2
lim P(X1 > x | X2 > x = , lim P(X1 ≤ x | X2 ≤ x = 0,
x→∞ 2+α x0

respectively.

4. The multivariate lack-of-memory property

A random vector X = (X1 , . . . , Xd ) with non-negative components is said to


satisfy the (multivariate) lack-of-memory property if for arbitrary 1 ≤ i1 < . . . <
in ≤ d we have that

P(Xi1 > ti1 + t, . . . , Xin > tin + t | Xi1 > t, . . . , Xin > t)
= P(Xi1 > ti1 , . . . , Xin > tin ),

with the t, t1 , . . . , td either in (0, ∞) (continuous support case) or in N0 (discrete


support case). The lack-of-memory property is very intuitive when the k-th com-
ponent Xk of X is interpreted as the future time point at which the k-th compo-
nent in a system of d components fails. In words, it means that conditioned on
the survival of an arbitrary sub-system (i1 , . . . , in ) until time t, the residual life-
times of the components i1 , . . . , in are identical in distribution to the lifetimes
at inception of the system. Needless to say, such an intuitive property
occupies a commanding role in reliability theory, see [6] for a textbook treat-
ment, but is also important in other contexts such as financial risk management,
e.g., [28, 62].

Fig 3. Top: 5000 samples of (G(X1), G(X2)) for α = 0.1 in Example 3.4. Bottom: 5000 samples of (G(X1), G(X2)) for α = 1 in Example 3.4.

An alternative way to formulate the multivariate lack-of-memory property, due to [13], is the following. For 1 ≤ i1 < . . . < in ≤ d we denote by
Zi1,...,in(t) := (1{Xi1 >t}, . . . , 1{Xin >t}), t ≥ 0, the stochastic process which indi-
cates for each of the n components i1 , . . . , in whether it is still working or already
dysfunctional. The random vector X has the lack-of-memory property if and
only if Zi1 ,...,in is a continuous-time Markov chain for all 1 ≤ i1 < . . . < in ≤ d.
From a theoretical point of view, studying the (multivariate) lack-of-memory
property is also natural as it generalizes very popular one-dimensional prob-
ability distributions to the multivariate case. Indeed, if d = 1 we abbreviate
X := X1 and recall the following classical characterizations.
Lemma 4.1 (Characterization of lack-of-memory for d = 1).
(E) If the support of X equals [0, ∞), then X satisfies the lack-of-memory
property if and only if X has an exponential distribution, that is P(X >
t) = exp(−λ t) for some λ > 0.
(G) If the support of X equals N, then X satisfies the lack-of-memory property
if and only if X has a geometric distribution, that is P(X > n) = (1 − p)n
for some p ∈ (0, 1).

Proof. In the geometric case, inductively we see that F̄ (n) := P(X > n) satisfies
F̄ (n) = F̄ (1)n , n ∈ N0 , and the claim follows with p := 1 − F̄ (1). Notice
that F̄ (1) ∈ {0, 1} is ruled out by the assumption of support equal to N. The
exponential case follows similarly, see [12, p. 190].

4.1. Marshall-Olkin and multivariate geometric distributions

The well known characterizations of univariate lack-of-memory in Lemma 4.1


have been lifted to the multivariate case in [79] and [4, 77], respectively, which is
briefly recalled. First of all, we introduce the multivariate exponential models of
[79] and [4]. To this end, we denote by E(λ) the univariate exponential law with
rate λ > 0, and by G(p) the univariate geometric distribution with parameter
p ∈ (0, 1), i.e. with survival function P(X > n) = (1 − p)n . In order to include
boundary cases, we denote by E(0), G(0) the probability law of a degenerate
random variable that is identically equal to infinity, and by G(1) the probability
law of a degenerate random variable that is identically equal to one.
Example 4.1 (Probability laws with multivariate lack-of-memory).

(E) For each non-empty I ⊂ {1, . . . , d} let λI ≥ 0 with Σ_{I : k∈I} λI > 0 for
each k = 1, . . . , d. With EI ∼ E(λI) a list of 2^d − 1 independent random
variables, we define X via

Xk := min{EI : k ∈ I}, k = 1, . . . , d.

Then X satisfies the multivariate lack-of-memory property, which is easy
to see while noticing that the survival function of X equals

F̄(x) = P(X > x) = exp( − Σ_{∅≠I⊂{1,...,d}} λI max_{k∈I}{xk} ).

(G) For each (possibly empty) I ⊂ {1, . . . , d} let pI ∈ [0, 1] with Σ_{I : k∉I} pI < 1
for each k = 1, . . . , d and Σ_I pI = 1. The probabilities pI define a
probability law on the power set of {1, . . . , d}. Let S1, S2, . . . be an iid
sequence drawn from this law and denote by GI the smallest n ∈ N such
that Sn = I. Notice that GI ∼ G(pI). We define the random vector X
with values in N^d by
with values in Nd by

Xk := min{GI : k ∈ I}, k = 1, . . . , d.

Then X satisfies the multivariate lack-of-memory property. Furthermore,
the survival function of X equals

F̄(n) = P(X > n) = Π_{k=1}^d ( Σ_{I : {σn(k),...,σn(d)}∩I=∅} pI )^{n[k] − n[k−1]},

where σn denotes a permutation of {1, . . . , d} such that nσn(1) ≤ . . . ≤ nσn(d), and n[0] := 0, for n = (n1, . . . , nd) ∈ N0^d.
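Both constructions of Example 4.1 are straightforward to simulate. The following Python sketch implements them for small d; the concrete parameter choices (all exponential rates equal, a four-point law on subsets) are purely illustrative and not part of the example.

```python
import numpy as np
from itertools import combinations

def nonempty_subsets(d):
    return [frozenset(c) for r in range(1, d + 1)
            for c in combinations(range(1, d + 1), r)]

def sample_marshall_olkin(lam, d, size=1, rng=None):
    """Example 4.1(E): X_k = min{ E_I : k in I } with independent E_I ~ E(lam_I);
    lam_I = 0 is read as E_I = +infinity, as in the text."""
    rng = np.random.default_rng(rng)
    X = np.empty((size, d))
    for n in range(size):
        E = {I: rng.exponential(1.0 / lam[I]) if lam[I] > 0 else np.inf for I in lam}
        X[n] = [min(E[I] for I in E if k in I) for k in range(1, d + 1)]
    return X

def sample_wide_sense_geometric(p, d, size=1, rng=None):
    """Example 4.1(G): X_k is the first n such that the n-th iid draw S_n from
    the law (p_I) on subsets of {1,...,d} contains k (= min{G_I : k in I})."""
    rng = np.random.default_rng(rng)
    subsets, probs = list(p.keys()), np.array([p[I] for I in p])
    X = np.zeros((size, d), dtype=int)
    for row in X:
        n = 0
        while not row.all():                     # some component not yet hit
            n += 1
            S = subsets[rng.choice(len(subsets), p=probs)]
            for k in S:
                if row[k - 1] == 0:
                    row[k - 1] = n
    return X

d = 3
lam = {I: 0.5 for I in nonempty_subsets(d)}       # illustrative exchangeable rates
print(sample_marshall_olkin(lam, d, size=3, rng=1))
p = {frozenset(): 0.4, frozenset({1}): 0.2, frozenset({2}): 0.2, frozenset({1, 2}): 0.2}
print(sample_wide_sense_geometric(p, 2, size=3, rng=1))
```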

The probability distribution in part (E) of Example 4.1 is called Marshall-


Olkin distribution. It is named after [79]. The probability distribution in part
(G) of Example 4.1 is called wide-sense geometric distribution. The stochastic
model has been introduced in [4]. The presented form of the survival function is
computed in [77]. The following lemma shows that the multivariate stochastic
models in Example 4.1 define precisely the multivariate analogues of the uni-
variate exponential and geometric laws, when defined via the lack-of-memory
property. Thus, it constitutes a multivariate extension of Lemma 4.1.
Lemma 4.2 (Characterization of lack-of-memory for d ≥ 1).
(E) The d-variate Marshall-Olkin distribution is the only probability law with
support [0, ∞)d satisfying the lack-of-memory property.
(G) The d-variate wide-sense geometric distribution is the only probability law
with support Nd satisfying the lack-of-memory property.
Proof. Part (E) is due to the original reference [79], while part (G) is shown in
[77].

Example 4.2 (Narrow-sense geometric law). If Y has a Marshall-Olkin distri-


bution and we define X := (⌈Y1⌉, . . . , ⌈Yd⌉), then X is said to have the narrow-
sense geometric distribution. As the nomenclature suggests, the narrow-sense
geometric family is a subfamily of the wide-sense geometric family in
dimensions d ≥ 2 (and the two coincide for d = 1), which is easy to see from the
characterizing lack-of-memory property of the Marshall-Olkin law. Not every
wide-sense geometric law can be constructed like this, i.e. the narrow-sense fam-
ily defines a proper subset of the wide-sense family. This indicates that for d ≥ 2
the structure of the discrete lack-of-memory property is more delicate than the
structure of its continuous counterpart. For example, while two components of a
random vector with Marshall-Olkin distribution or narrow-sense geometric dis-
tribution cannot be negatively correlated, two components of a random vector
with wide-sense geometric distribution can be, see [77] for details.

4.2. Infinite divisibility and Lévy subordinators

The concept of infinite divisibility is of fundamental importance in the present


section, but also in Sections 5 and 6 below. Thus, we briefly recall the required
background in the present paragraph. For an elaborate textbook treatment we
refer to [94]. The concept of a Lévy subordinator plays an essential role when
studying the conditionally iid subfamily of the Marshall-Olkin distribution, a re-
sult first discovered in [72]. Recall that a càdlàg stochastic process Z = {Zt }t≥0
with Z0 = 0 is called a Lévy process if it has stationary and independent incre-
ments, which means that:
(i) The law of Zt+h − Zt does not depend on t ≥ 0 for each h ≥ 0, i.e. Zt+h − Zt has the same distribution as Zh.
(ii) Zt2 − Zt1 , . . . , Ztn − Ztn−1 are independent for 0 ≤ t1 < . . . < tn .

Hence, Lévy processes are the continuous-time equivalents of discrete-time ran-


dom walks. A non-decreasing Lévy process is called a Lévy subordinator. How-
ever, there is one fundamental difference between a random walk and a Lévy
process: the probability law of the increments in a random walk is arbitrary on
R, whereas the law of the increments in a Lévy process needs to satisfy a certain
compatibility condition with respect to time, as increments of arbitrarily large
time span can be considered. Concretely, it is immediate from the definition of
a Lévy process Z = {Zt }t≥0 that the probability law of Z1 is infinitely divisible.
Recall that a random variable X is called infinitely divisible if for each n ∈ N
there exist iid random variables X1^(n), . . . , Xn^(n) such that X has the same distribution as X1^(n) + . . . + Xn^(n).
Furthermore, if X has an infinitely divisible probability law, there exists a Lévy
process Z = {Zt}t≥0, which is uniquely determined in law, such that Z1 has the same distribution as X.
As a consequence, a Lévy subordinator Z is uniquely determined in distribution
by the law of Z1 , or analytically by the function Ψ(x) := − log(E[exp(−x Z1 )]),
x ≥ 0. One calls Ψ the Laplace exponent of the infinitely divisible random vari-
able Z1 (or of the Lévy subordinator Z). The function Ψ is a so-called Bernstein
function, which means that it is infinitely often differentiable on (0, ∞) and the
derivative Ψ^(1) is completely monotone, i.e. (−1)^{k+1} Ψ^(k) ≥ 0 for all k ≥ 1, see
[8, 96] for textbook treatments on the topic. The value Ψ(0) is by definition
equal to zero, but we might have a jump at zero, meaning that Ψ(x) > ε > 0 for
all x > 0 is possible. Intuitively, this is the case if and only if P(Zt = ∞) > 0 for
t > 0, and in this case one sometimes also speaks of a killed Lévy subordinator.

4.3. Analytical characterization of exchangeability and conditionally iid

By Lemma 1.1 a random vector X with either Marshall-Olkin distribution


or wide-sense geometric distribution can only be conditionally iid if it is ex-
changeable. An elementary computation shows that the Marshall-Olkin distri-
bution (resp. wide-sense geometric distribution) is exchangeable if and only if
its parameters λI (resp. pI ) depend on the indexing subsets I only through
their cardinality |I|. In this exchangeable case, we denote these parameters by
λ1 , . . . , λd (resp. p0 , p1 , . . . , pd ), with subindices denoting the possible cardinal-
ities, i.e. λk := λ{1,...,k} and pk := p{1,...,k} , and combinatorial computations
show that the survival function F̄ of X takes the convenient algebraic form

F̄(x) = Π_{k=1}^d bk^{x[d−k+1] − x[d−k]},   F̄(n) = Π_{k=1}^d bk^{n[d−k+1] − n[d−k]},   (4.1)

for either x1, . . . , xd ∈ [0, ∞) with x[0] := 0 (in the Marshall-Olkin case) or
n1, . . . , nd ∈ N0 with n[0] := 0 (in the wide-sense geometric case), and with13

bk := exp( − Σ_{i=1}^k Σ_{j=0}^{d−i} \binom{d−i}{j} λ_{j+1} ),   (Marshall-Olkin case)   (4.2)

13 The empty product is conveniently defined to be equal to one, i.e. Π_{i=1}^0 := 1.


bk := Σ_{i=0}^{d−k} \binom{d−k}{i} pi,   (wide-sense geometric case),   (4.3)

for k = 0, . . . , d. While the parameters λk (resp. pk ) are intuitive since they


allow for the probabilistic interpretations according to Example 4.1, the re-
parameterization in terms of the new parameters bk is more convenient with
regards to finding an answer to the question: when is X conditionally iid? The
main result in this regard is stated in Theorem 4.1 below, which requires the
notion of d-monotone sequences and log-d-monotone sequences. The concept of
d-monotonicity as well as the notations Md and M∞ have already been intro-
duced in paragraph 2.1, the related concept of log-d-monotonicity is introduced
in the following definition.
Definition 4.1 (Log-monotone sequences). For d ∈ N, a finite sequence
(b0, b1, . . . , bd) ∈ (0, ∞)^{d+1} is said to be log-d-monotone if ∇^{d−k} log(bk) ≥ 0
for k = 0, 1, . . . , d − 1. An infinite sequence {bk }k∈N0 with positive members is
said to be completely log-monotone if (b0 , . . . , bd ) is log-d-monotone for each
d ≥ 1.
The notion of a log-d-monotone sequence is less intuitive than that of a
d-monotone sequence. First notice that, in contrast to the definition of a d-
monotone sequence in paragraph 2.1, log(bd ) ≥ 0 needs not hold for a log-d-
monotone sequence, which is explained by the following useful relationship be-
tween (d − 1)-monotonicity and log-d-monotonicity. It helps to transform state-
ments involving log-d-monotonicity into statements involving only the simpler
notion of (d − 1)-monotonicity14 :

(b0, . . . , bd−1) is (d − 1)-monotone
⇔ (1, e^{−b0}, e^{−(b0+b1)}, . . . , e^{−(b0+...+bd−1)}) is log-d-monotone.   (4.4)
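The following small numerical sketch illustrates Definition 4.1 and the equivalence (4.4). It assumes that the difference operator ∇ from paragraph 2.1 is the usual one for Hausdorff moment sequences, ∇bk := bk − bk+1, and that d-monotonicity means ∇^{d−k} bk ≥ 0 for k = 0, . . . , d; the test sequence (moments of a Uniform(0,1) variable) is an illustrative choice.

```python
import numpy as np
from math import comb

def nabla(b, j, k):
    """Iterated difference (nabla^j b)_k with (nabla b)_k = b_k - b_{k+1}."""
    return sum((-1) ** i * comb(j, i) * b[k + i] for i in range(j + 1))

def is_d_monotone(b):
    d = len(b) - 1
    return all(nabla(b, d - k, k) >= -1e-12 for k in range(d + 1))

def is_log_d_monotone(b):
    d = len(b) - 1
    logb = np.log(np.asarray(b, dtype=float))
    return all(nabla(logb, d - k, k) >= -1e-12 for k in range(d))

# the moment sequence of a Uniform(0,1) variable, b_k = 1/(k+1), is d-monotone
b = [1.0 / (k + 1) for k in range(5)]
print(is_d_monotone(b))
# equivalence (4.4): (b_0,...,b_{d-1}) is (d-1)-monotone iff the sequence
# (1, e^{-b_0}, e^{-(b_0+b_1)}, ..., e^{-(b_0+...+b_{d-1})}) is log-d-monotone
c = np.exp(-np.concatenate(([0.0], np.cumsum(b[:-1]))))
print(is_log_d_monotone(c))
```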

The set of all log-d-monotone sequences starting with b0 = 1 will be denoted


by LMd in the following. Similarly, LM∞ denotes the sets of completely log-
monotone sequences starting with b0 = 1. [77, Proposition 4.4] shows that
{bk}k∈N ∈ LM∞ if and only if {bk^t}k∈N ∈ M∞ for arbitrary t > 0. In par-
ticular, LM∞ ⊂ M∞ . Theorem 4.1 below provides a second result, besides
Theorem 2.1, showing that whether or not a (log-) d-monotone sequence can be
extended to a completely (log-) monotone sequence plays an important role in
the context of the present survey.
In order to better understand the following theorem it is helpful to know
that the Laplace exponent Ψ of a Lévy subordinator Z is already completely
determined by its values on N, i.e. by the sequence {Ψ(k)}k∈N0 . Furthermore, the
sequence {exp(−Ψ(k))}k∈N0 equals the moment sequence of the random variable
exp(−Z1 ), so lies in M∞ by the little moment problem, see paragraph 2.1. Since
14 This statement simply follows from the fact that log(1) = 0 and ∇^{d−k−1} bk = ∇^{d−k} log(b̃k) with b̃k := exp(− Σ_{i=0}^{k−1} bi) for k = 0, . . . , d, with an empty sum being conveniently defined as zero.

for arbitrary t > 0 even the sequence {exp(−t Ψ(k))}k∈N0 lies in M∞ as the
moment sequence of exp(−Zt ), the sequence {exp(−Ψ(k))}k∈N0 even lies in the
smaller set LM∞ of completely log-monotone sequences. The subset LM∞ ⊊ M∞ corresponds precisely to the infinitely divisible laws on [0, ∞], which is the
discrete analogue of the well known statement that exp(−t Ψ) is a completely
monotone function for arbitrary t > 0 if and only if Ψ(1) is completely monotone.
With this information and the information of paragraph 4.2 as background the
following theorem is now quite intuitive.
Theorem 4.1 solves Problem 1.3 for the property (P) of “satisfying the mul-
tivariate lack-of-memory property”.
Theorem 4.1 (Lack-of-memory, exchangeability and conditionally iid).
(E) The function (4.1) is a survival function (of some X) with support [0, ∞)d
if and only if we have (b0 , . . . , bd ) ∈ LMd . Furthermore, the associated ex-
changeable Marshall-Olkin distribution admits a stochastic representation
that is conditionally iid if there exist bd+1 , bd+2 , . . . such that {bk }k∈N0 ∈
LM∞ . To wit, in this case there exists a (possibly killed) Lévy subordina-
tor Z = {Zt}t≥0, determined in law via

bk := E[ e^{−k Z1} ],   k ∈ N0,   (4.5)

such that X has the same distribution as the vector defined in (1.5).
(G) The function (4.1) is a survival function (of some X) with support Nd
if and only if we have (b0 , b1 , . . . , bd ) ∈ Md . Furthermore, the associated
exchangeable wide-sense geometric distribution admits a stochastic rep-
resentation that is conditionally iid if there exist bd+1 , bd+2 , . . . such that
{bk }k∈N0 ∈ M∞ . To wit, in this case there exists an iid sequence Y1 , Y2 , . . .
of random variables taking values in [0, ∞], determined in law via

bk := E[ e^{−k Y1} ],   k ∈ N0,   (4.6)

such that X has the same distribution as the vector defined in (1.5) when

Zt := Y1 + Y2 + . . . + Yt , t ≥ 0.

Proof. Part (E) is due to [71, 72], while part (G) is due to [77].
First, we observe that once the correspondence between Md and the wide-
sense geometric law is established, the correspondence between LMd and the
narrow-sense geometric law (or, algebraically equivalent, its continuous counter-
part the Marshall-Olkin law) follows from (4.4) together with (4.2) and (4.3).
This is because the λj in (4.2) are arbitrary non-negative numbers, and the pi in
(4.3) are also arbitrary non-negative up to scaling (i.e. with an additional scale
factor c > 0 we have that c (p0 , . . . , pd−1 ) and (λ1 , . . . , λd ) both run through all
of [0, ∞)d \ {(0, . . . , 0)}, noticing that pd is determined by p0 , . . . , pd−1 ). Con-
cretely, by the correspondence between Md and the wide-sense geometric law,
we obtain a correspondence between Md and [0, ∞)d \ {(0, . . . , 0)} up to scaling

in (4.3). In particular, the property of being d-monotone is not affected by c.


Replacing the λj in (4.2) by c pj−1 and making use of (4.4), we then end up with
the correspondence between LMd and the Marshall-Olkin law. To establish the
correspondence between Md and the wide-sense geometric law is really only a
tedious algebraic computation, see [77] for details. Essentially, d-monotonicity
enters the scene precisely for the same reason as in paragraph 2.1.
Regarding the conditionally iid subfamily, the crucial insight is that M∞
stands in one-to-one relation with the set of probability measures on [0, ∞] via
(4.6), which is exactly the well-known statement of the little moment problem,
only formulated for the compact interval [0, ∞] instead of the more usual interval
[0, 1] via the transformation − log. That the (discrete) random walk construction
in part (G) can only be “made continuous” in case Y1 is infinitely divisible is
very intuitive, and the Lévy subordinator in part (E) is simply the continuous
analogue of the discrete random walk in that case.
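To make the interplay of (4.1), (4.5) and Theorem 4.1(E) concrete, the following sketch evaluates the exchangeable Marshall-Olkin survival function for bk = exp(−Ψ(k)), where Ψ is the Laplace exponent of a compound Poisson subordinator with drift and exponential jumps. This choice of Bernstein function and all parameter values are illustrative; by construction the resulting sequence {bk}k∈N0 lies in LM∞.

```python
import numpy as np

def Psi(x, drift=0.1, lam=1.0, eta=2.0):
    """Illustrative Bernstein function: Laplace exponent of a compound Poisson
    subordinator with drift and Exp(eta) jumps, Psi(x) = drift*x + lam*x/(x+eta)."""
    return drift * x + lam * x / (x + eta)

def mo_survival(x, Psi=Psi):
    """Exchangeable Marshall-Olkin survival function (4.1) with b_k = exp(-Psi(k))
    as in (4.5); x is an arbitrary point in [0, infty)^d."""
    x = np.sort(np.asarray(x, dtype=float))           # x_[1] <= ... <= x_[d]
    d = len(x)
    x_ord = np.concatenate(([0.0], x))                # x_[0] := 0
    return np.prod([np.exp(-Psi(k)) ** (x_ord[d - k + 1] - x_ord[d - k])
                    for k in range(1, d + 1)])

print(mo_survival([0.5, 1.0, 2.0]))
# sanity check: the univariate margin is exponential with rate Psi(1)
print(mo_survival([1.5, 0.0, 0.0]), "vs", np.exp(-Psi(1) * 1.5))
```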

Since the narrow-sense geometric law of Example 4.2 is a special case of


the wide-sense geometric law, it follows that LMd ⊊ Md, which in fact is not
an obvious statement. Furthermore, X in part (G) of Theorem 4.1 happens
to be narrow-sense geometric if and only if the random variable Y1 is infinitely
divisible. In fact, the elements of LM∞ stand in one-to-one correspondence with
the family of infinitely divisible laws on [0, ∞] via (4.5), whereas the elements
of the larger set M∞ stand in one-to-one correspondence with the family of
arbitrary probability laws on [0, ∞] via (4.6), which is just a slight re-formulation
of the little moment problem.
Remark 4.1 (Analytical criterion for conditionally iid). Given an exchangeable
random vector X with lack-of-memory property and parameters (b0 , . . . , bd ),
Theorem 4.1 implies that X has a stochastic representation that is conditionally
iid if (b0 , . . . , bd ) can be extended to a completely (log-) monotone sequence.
Using (4.4), an element (b0 , . . . , bd ) ∈ LMd is extendible to an element in LM∞
if and only if the (d − 1)-monotone sequence (− log(b1 /b0 ), . . . , − log(bd /bd−1 ))
is extendible to a completely monotone sequence. Thus, we can concentrate on
the completely monotone case. Deciding whether a d-monotone sequence can be
extended to a completely monotone sequence is the truncated Hausdorff moment
problem again, see Section 2.1. This means that an effective analytical criterion
for extendibility is known.
The following example demonstrates how a parameter sequence {bk }k∈N0 for
some wide-sense geometric law is conveniently defined via the link to the little
moment problem, setting bk := E[X k ], k ∈ N0 , where X is some arbitrary
random variable taking values in [0, 1].
Example 4.3 (A two-parametric family based on the Beta distribution). Con-
sider a random variable X with density

fX(x) = (Γ(p + q)/(Γ(p) Γ(q))) x^{p−1} (1 − x)^{q−1},   0 < x < 1,

with parameters p, q > 0, which is a Beta distribution. The moment sequence is


known to be15

E[X^k] = ∫_0^1 fX(x) x^k dx = Γ(p + k) Γ(p + q) / (Γ(p) Γ(p + q + k)),   k ∈ N0,

so that a two-parametric family of d-variate wide-sense geometric survival func-


tions (for arbitrary d ≥ 1) is given by
F̄p,q(n) = (Γ(p + q)/Γ(p))^{n[d]} Π_{k=1}^d (Γ(p + k)/Γ(p + q + k))^{n[d−k+1] − n[d−k]},   n ∈ N0^d.

The associated probability distribution of Y1 in Theorem 4.1(G) is that of
− log(X), i.e. the logarithm of the reciprocal of the Beta-distributed random variable in concern.
Similarly, making use of (4.4), a two-parametric family of d-variate Marshall-
Olkin survival functions (for arbitrary d ≥ 1) is given by

F̄p,q(x) = exp( − (Γ(p + q)/Γ(p)) Σ_{k=1}^d [ Σ_{i=0}^{k−1} Γ(p + i)/Γ(p + q + i) ] (x[d−k+1] − x[d−k]) )
         = exp( − (Γ(p + q)/Γ(p)) Σ_{k=1}^d (Γ(p + k − 1)/Γ(p + q + k − 1)) x[d−k+1] ),   x ∈ [0, ∞)^d.
k=1

In the special case when q = 2, the Lévy subordinator in Theorem 4.1(E) is of


compound Poisson type with intensity p + 1 and jumps that are exponentially
distributed with parameter p.
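The conditionally iid representation of Example 4.3 can be simulated directly: draw Yi = − log(Bi) with Bi iid Beta(p, q), and read off Xk as the first index n at which the random walk Y1 + . . . + Yn exceeds an independent unit exponential threshold εk (one concrete reading of the random-walk construction in Theorem 4.1(G)). The sketch below, with illustrative parameter values and a truncation of the random walk, compares the Monte Carlo estimate of P(X > n) with F̄p,q(n).

```python
import numpy as np
from math import gamma

def sample_beta_geometric(p, q, d=3, size=50_000, n_max=100, rng=None):
    """Conditionally iid sketch for Example 4.3 / Theorem 4.1(G):
    X_k = min{ n >= 1 : Y_1 + ... + Y_n >= eps_k }, with Y_i = -log(Beta(p, q))
    iid and eps_k iid unit exponential, independent of the Y_i.
    n_max truncates the random walk (exceedance beyond n_max is negligible)."""
    rng = np.random.default_rng(rng)
    Z = np.cumsum(-np.log(rng.beta(p, q, size=(size, n_max))), axis=1)
    eps = rng.exponential(size=(size, d))
    # first index n with Z_n >= eps_k; argmax returns the first True
    return 1 + np.argmax(Z[:, :, None] >= eps[:, None, :], axis=1)

def survival_beta_geometric(n, p, q):
    """F_bar_{p,q}(n) from Example 4.3, with b_k the Beta(p, q) moments."""
    n = np.sort(np.asarray(n))                        # n_[1] <= ... <= n_[d]
    d = len(n)
    n_ord = np.concatenate(([0], n))
    b = [gamma(p + k) * gamma(p + q) / (gamma(p) * gamma(p + q + k))
         for k in range(d + 1)]
    return np.prod([b[k] ** (n_ord[d - k + 1] - n_ord[d - k])
                    for k in range(1, d + 1)])

p, q, n = 2.0, 1.5, np.array([1, 2, 3])
X = sample_beta_geometric(p, q, d=3, rng=0)
print(np.mean(np.all(X > n, axis=1)), "vs", survival_beta_geometric(n, p, q))
```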

5. Max-/ min-stable laws and extreme-value copulas

Throughout this paragraph, for the sake of a more compact notation we implic-
itly make excessive use of the abbreviations f(0) := lim_{x↓0} f(x) and f(∞) :=
limx→∞ f (x) for functions f : (0, ∞) → (0, ∞), provided the respective limits
exist in [0, ∞].

5.1. Max-/ min-stability and multivariate extreme-value theory

Definition 5.1 (Max- and min-stability). We denote by F (resp. F̄ ) the d-


variate distribution function (resp. survival function) of some d-dimensional
random vector Y = (Y1 , . . . , Yd ) (resp. X = (X1 , . . . , Xd )).
(a) (The probability law of ) Y is said to be max-stable if for arbitrary t > 0
there are αi (t) > 0, βi (t) ∈ R such that

F(x)^t = F( α1(t) x1 + β1(t), . . . , αd(t) xd + βd(t) ).


15 See, e.g., [29, p. 35].

In this case, we also say that F is max-stable. In words, F^t is again a


distribution function and equals F modulo a linear transformation of its
arguments.
(b) (The probability law of ) X is said to be min-stable if for arbitrary t > 0
there are αi (t) > 0, βi (t) ∈ R such that

F̄(x)^t = F̄( α1(t) x1 + β1(t), . . . , αd(t) xd + βd(t) ).

In this case, we also say that F̄ is min-stable. In words, F̄^t is again


a survival function and equals F̄ modulo a linear transformation of its
arguments.
If Y is max-stable and Y^(i) are independent copies of Y, then for arbitrary
n ∈ N we observe

Y =d ( (max_{i=1,...,n} Y1^(i) − β1(1/n)) / α1(1/n), . . . , (max_{i=1,...,n} Yd^(i) − βd(1/n)) / αd(1/n) ).

Similarly, if X is min-stable this means

X =d ( (min_{i=1,...,n} X1^(i) − β1(1/n)) / α1(1/n), . . . , (min_{i=1,...,n} Xd^(i) − βd(1/n)) / αd(1/n) ).

In words, the component-wise re-scaled maxima of iid copies of Y (resp. minima


of iid copies of X) have the same distribution as Y (resp. X).
Max and min-stability play a central role in multivariate extreme-value the-
ory, as will briefly be explained. If V (i) are independent copies of some random
vector V = (V1 , . . . , Vd ), one is interested in the probability law of the vectors
of component-wise maxima, that is
( max_{i=1,...,n} V1^(i), . . . , max_{i=1,...,n} Vd^(i) ),   n ∈ N.

If one can find sequences α1 (n), . . . , αd (n) > 0 and β1 (n), . . . , βd (n) ∈ R such
that the re-scaled vector
( (max_{i=1,...,n} V1^(i) − β1(n)) / α1(n), . . . , (max_{i=1,...,n} Vd^(i) − βd(n)) / αd(n) )

converges in distribution to some Y = (Y1 , . . . , Yd ) as n → ∞, then one says


that Y has a multivariate extreme-value distribution. A classical result in mul-
tivariate extreme-value theory states that Y has a multivariate extreme-value
distribution if and only if Y is max-stable, see, e.g., [50, pp. 172-174].
Since Y is max-stable if and only if −Y is min-stable (obviously), max- and
min-stability can be studied jointly by focusing on one of the two concepts.
Classical extreme-value theory textbooks typically focus on max-stability and
further subdivide the study of the probability law of max-stable Y into two
sub-studies:

(i) By the Fisher-Tippett-Gnedenko Theorem, the univariate distribution


function Fk of each component Yk necessarily belongs to either the Gum-
bel, the Fréchet or the Weibull family, see [7, Chapter 2, p. 45 ff] for
background.
(ii) Having understood the univariate distribution functions F1 , . . . , Fd of the
components according to (i), the distribution function F of Y necessarily
takes the form
F(x) = C( F1(x1), . . . , Fd(xd) ),

for a copula C : [0, 1]^d → [0, 1] with the characterizing property that
C(u)^t = C(u1^t, . . . , ud^t) for each t > 0, a so-called extreme-value copula.
In order to focus on a deeper understanding of extreme-value copulas it is conve-
nient to normalize the margins F1 , . . . , Fd . In classical extreme-value theory, it
is standard to normalize to standardized Fréchet distributions, i.e. Fk (x) =
exp(−λk /x) 1{x>0} for some λk > 0. Furthermore, we observe that X :=
(1/Y1 , . . . , 1/Yd ) is well-defined, Xk is exponential with rate λk , and X is min-
stable (since x → 1/x is strictly decreasing, so max-stability of Y is flipped to
min-stability of X). The vector X is thus called min-stable multivariate expo-
nential and has survival function
F̄(x) = P(X > x) = P(Y < 1/x) = C( e^{−λ1 x1}, . . . , e^{−λd xd} ),

with extreme-value copula C. The survival function F̄ is min-stable, satisfying

F̄(x)^t = F̄(t x),   t > 0.   (5.1)
The analytical property (5.1) characterizes the concept of min-stable multivari-
ate exponentiality on the level of survival functions, and serves as a convenient
starting point to study the conditionally iid subfamily (of extreme-value copulas,
resp. min-stable multivariate exponential distributions). For a given extreme-
value copula C it further turns out convenient to consider its so-called stable
tail dependence function

ℓ(x) := − log C( e^{−x1}, . . . , e^{−xd} ),   x ∈ [0, ∞)^d,

which satisfies ℓ(t x) = t ℓ(x). Clearly, ℓ determines C and C determines ℓ, so


that investigating ℓ instead of C is just a matter of convenience. Wrapping up,
a min-stable multivariate exponential distribution is fully determined by the
rates (λ1, . . . , λd) specifying the one-dimensional exponential margins, and by a
stable tail dependence function ℓ which stands in a one-to-one relationship with
the associated extreme-value copula C.

5.2. Analytical characterization of conditionally iid

In the sequel, we are interested in the question: when is a min-stable multi-


variate exponential vector X, i.e. one whose survival function satisfies (5.1),
conditionally iid? We start with two important examples.

Example 5.1 (Independent exponentials). If the components X1 , . . . , Xd of X


are iid, then we only need to consider the law of X1 . By definition, X1 must
have an exponential law, so there is some λ > 0 such that for arbitrary t > 0
we have

F̄(x)^t = ( Π_{k=1}^d e^{−λ xk} )^t = e^{−t λ Σ_{k=1}^d xk} = Π_{k=1}^d e^{−λ t xk} = F̄(t x).

Consequently, X is min-stable multivariate exponential. The associated stable
tail dependence function is ℓ(x) = x1 + . . . + xd = ‖x‖1.
For arbitrary c ≥ 0 we introduce the notation H+,c ⊂ H+ for distribution
functions of non-negative random variables with mean equal to c. For G ∈ H+

we further denote by MG := ∫_0^∞ (1 − G(x)) dx ∈ [0, ∞] its mean.
Example 5.2 (An important semi-parametric family). Let G ∈ H+ with 0 <
MG < ∞. With an iid sequence of unit exponentials η1 , η2 , . . . we consider the
stochastic process
Zt := − Σ_{n≥1} log G( (η1 + . . . + ηn)/t − ),   t ≥ 0,

taking values in [0, ∞]. It is not difficult to see that H := 1 − exp(−Z) takes
values in H+ . Consequently, we may define a conditionally iid random vector
X via the canonical stochastic model (1.5) from this process H. Conditioned
on H, the components of X are iid with distribution function H. It turns out
that X is min-stable multivariate exponential. To see this, we recall that the
increasing sequence {η1 + . . . + ηn }n≥1 equals the enumeration of the points of a
Poisson random measure on [0, ∞) with intensity measure equal to the Lebesgue
measure. This implies with the help of [91, Proposition 3.6] in (∗) below that the
survival function F̄ of X is given by
F̄(x) = P(Zx1 ≤ ε1, . . . , Zxd ≤ εd) = E[ e^{−Σ_{k=1}^d Zxk} ]

      = E[ exp( − Σ_{n≥1} Σ_{k=1}^d − log G( (η1 + . . . + ηn)/xk − ) ) ]

      (∗)= exp( − ∫_0^∞ ( 1 − Π_{k=1}^d G(u/xk) ) du ).

We introduce the notation


ℓG(x) := − log(F̄(x/MG)) = (1/MG) ∫_0^∞ ( 1 − Π_{k=1}^d G(u/xk) ) du,

and we observe by substitution that t ℓG(x) = ℓG(t x) for arbitrary t > 0. This
implies F̄(x)^t = F̄(t x), so X is min-stable multivariate exponential. The func-
tion ℓG is the stable tail dependence function of X. The constant MG equals the
exponential rate of the exponential random variables X1, . . . , Xd.
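Example 5.2 can be simulated exactly whenever G has bounded support, because then only finitely many Poisson points contribute to the series defining Z. The sketch below takes G to be the distribution function of a Uniform(0, 2) random variable (an illustrative choice with unit mean) and compares a Monte Carlo estimate of F̄(x) = E[exp(−Σk Zxk)] with the closed-form expression exp(−∫_0^∞ [1 − Πk G(u/xk)] du); all names and parameters are hypothetical.

```python
import numpy as np

def G(u):
    """Distribution function of Uniform(0, 2): unit mean, bounded support."""
    return np.clip(np.asarray(u, dtype=float) / 2.0, 0.0, 1.0)

def mc_survival(x, size=20_000, rng=None):
    """Monte Carlo estimate of F_bar(x) = E[exp(-(Z_{x_1}+...+Z_{x_d}))] for
    Z_t = -sum_{n>=1} log G((eta_1+...+eta_n)/t) as in Example 5.2. Since G has
    support [0, 2], only Poisson points below T = 2*max(x) contribute."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    T = 2.0 * x.max()
    total = 0.0
    for _ in range(size):
        points = rng.uniform(0.0, T, size=rng.poisson(T))   # Poisson points on [0, T]
        Z = -np.log(G(points[:, None] / x)).sum(axis=0)     # (Z_{x_1}, ..., Z_{x_d})
        total += np.exp(-Z.sum())
    return total / size

def closed_form_survival(x):
    """exp(-int_0^infty [1 - prod_k G(u/x_k)] du), evaluated by the trapezoid rule."""
    x = np.asarray(x, dtype=float)
    u = np.linspace(0.0, 2.0 * x.max(), 4001)
    integrand = 1.0 - np.prod(G(u[:, None] / x), axis=1)
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(u))
    return np.exp(-integral)

x = np.array([0.5, 1.0, 1.5])
print(mc_survival(x, rng=0), "vs", closed_form_survival(x))
```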

The main theorem in this section states that Examples 5.1 and 5.2 are gen-
eral enough to understand the structure of the set of all infinite exchangeable
sequences {Xk }k∈N whose finite-dimensional margins are both min-stable multi-
variate exponential and conditionally iid. Concretely, Theorem 5.1 solves Prob-
lem 1.3 for the property (P) of “having a min-stable multivariate exponential
distribution (in some dimension)”. In analytical terms, it states that the stable
tail dependence function associated with the extreme-value copula of a condi-
tionally iid min-stable multivariate exponential random vector is a convex mix-
ture of stable tail dependence functions having the structural form as presented
in Examples 5.1 and 5.2.
Theorem 5.1 (Which min-stable laws are conditionally iid?). Let {Xk }k∈N be
an infinite exchangeable sequence of positive random variables such that X =
(X1 , . . . , Xd ) is min-stable multivariate exponential for all d ∈ N. Assume that
{Xk }k∈N is not iid, i.e. not given as in Example 5.1. Then there exists a unique
triplet (b, c, γ) of two constants b ≥ 0, c > 0 and a probability measure γ on
H+,1 , such that Xk is exponential with rate b + c for each k ∈ N and the stable
tail dependence function of X equals
ℓ(x) := − log F̄( x/(b+c) ) = (b/(b+c)) ‖x‖1 + (c/(b+c)) ∫_{H+,1} ℓG(x) γ(dG).
In probabilistic terms, the random distribution function H, defined as the limit
of empirical distribution functions of the {Xk }k∈N as in Lemma 1.8, necessarily
satisfies H =d 1 − exp(−Z) with

Zt = b t + Σ_{n≥1} − log G^(n)( (η1 + . . . + ηn)/(c t) − ),   t ≥ 0,   (5.2)

where {G^(n)}n∈N is an iid sequence drawn from the probability measure γ, independent
of the iid unit exponentials η1, η2, . . ..
Proof. A proof consists of three steps, which have been accomplished in the
three references [74, 59, 66], respectively, and which are sketched in the sequel.
(i) For Z = − log(1 − H) with H as defined in Lemma 1.8 from the sequence
{Xk }k∈N , [74, Theorem 5.3] shows that

Z =d { Σ_{i=1}^n Z^(i)_{t/n} }_{t≥0},   n ∈ N,   (5.3)

where Z (i) are independent copies of Z. Conversely, it is shown that if


Z is non-decreasing and satisfies (5.3), then 1 − exp(−Z) is an element
of Θ_d^{−1}(M∗∗), when M∗∗ is as in Problem 1.3 and (P) is the property of
“having a min-stable multivariate exponential distribution”.
(ii) [59] show that a non-negative stochastic process Z satisfying (5.3) admits
a series representation of the form
{Zt}t≥0 =d { b t + Σ_{n≥1} f^(n)_{t/(η1+...+ηn)} }_{t≥0},

where f (n) are iid copies of some càdlàg stochastic process f with f0 = 0
satisfying some integrability condition, and b ∈ R.
(iii) [66] proves that in the series representation in (ii) necessarily b ≥ 0 and f
is almost surely non-decreasing. Furthermore, the integrability condition
on f can be re-phrased to say that t → G̃t := exp(− lims↓t f1/s ) defines
almost surely the distribution function of some random variable with fi-

nite mean MG̃ = ∫_0^∞ (1 − G̃t) dt > 0. Finally, the distribution function
t → Gt := G̃MG̃ t has unit mean, and the claimed representation for  is
obtained when c := E[MG̃ ] and γ is defined as the probability law of G
after an appropriate measure change. That (b, c, γ) is unique follows from
the normalization to unit mean of G (for each single realization).
Stochastic processes with property (5.3) are said to be strongly infinitely
divisible with respect to time (strong IDT). Particular examples of strong IDT
processes have been studied in [78, 27, 43], with an emphasis on the associated
multivariate min-stable laws also in [74, 9, 64, 76].
Every Lévy process is strong IDT, but the converse need not hold. For in-
stance, if Z = {Zt }t≥0 is a non-trivial Lévy subordinator and a > b > 0, then
the stochastic process {Za t + Zb t }t≥0 is strong IDT, but not a Lévy subordi-
nator. The probability law γ in Theorem 5.1 in case of a Lévy subordinator is
specified as the probability law of
 
Gt = e^{−M} + (1 − e^{−M}) 1{1−e^{−M} ≥ 1/t},   t ≥ 0,   (5.4)

with an arbitrary random variable M taking values in (0, ∞]. The Lévy measure
of Z and the probability law of M stand in one-to-one relation. We know from
the preceding section that if Z is a Lévy subordinator, the associated element
in M∗∗ is a d-variate Marshall-Olkin distribution. Indeed, the Marshall-Olkin
distribution is one of the most important examples of min-stable multivariate
exponential distributions. Two further examples are presented in the sequel.
Example 5.3 (The (negative) logistic model). If we reconsider Example 5.2
with the Fréchet distribution function G(x) = exp(−{Γ(1 − θ) x}−1/θ ) for θ ∈
(0, 1), then we observe


d
1

G (x) = xkθ = x 1 .
θ
k=1

This is the so-called logistic model. It is particularly convenient to be looked at


from the perspective of conditionally iid models, since the associated strong IDT
process Z takes a very simple form, to wit

Z =d { S t^{1/θ} }_{t≥0},   S a θ-stable random variable, i.e. E[ e^{−x S} ] = e^{−x^θ}.

In particular, the resulting extreme-value copula, named Gumbel copula after


[41, 42], is also an Archimedean copula, see Remark 3.4. In fact, it is the only

copula that is both Archimedean and of extreme-value kind, a result first discov-
ered in [39].
A related example is obtained, if we choose the Weibull distribution function
G(x) = 1 − exp(−{Γ(θ + 1) x}1/θ ), which implies


ℓG(x) = Σ_{j=1}^d (−1)^{j+1} Σ_{1≤i1<...<ij≤d} ( Σ_{k=1}^j x_{ik}^{−θ} )^{−1/θ}.

This is the so-called negative logistic model. The associated extreme-value cop-
ula is named Galambos copula after [36]. There exist many analogies between
logistic and negative logistic models, the interested reader is referred to [37] for
background. In particular, the Galambos copula is the most popular representa-
tive of the family of so-called reciprocal Archimedean copulas as introduced in
[38], see also paragraph 7.1 below.

Example 5.4 (A rich parametric family). For G ∈ H+,1 the function ΨG(z) := ∫_0^∞ (1 − G(t)^z) dt defines a Bernstein function with ΨG(1) = 1, see [64, Lemma 3].
This implies for z ∈ (0, ∞) that Gz ∈ H+,1, where Gz(x) := G(x ΨG(z))^z. Con-
sequently, if M is a positive random variable, we may define γ ∈ M_+^1(H+,1) as
the law of GM. The associated stable tail dependence function equals ℓ(x) :=
E[ℓ_{GM}(x)]. Many parametric models from the literature are comprised by this
construction. In particular, Example 5.2 corresponds to the case M ≡ 1, and if
G(x) = exp(−1) + (1 − exp(−1)) 1{1−exp(−1)≥1/x} we observe that GM equals
the random distribution function (5.4) corresponding to the Marshall-Olkin sub-
family. See [76] for a detailed investigation and applications of this parametric
family.

Remark 5.1 (Extension to laws with exponential minima). We have seen that
the Marshall-Olkin distribution is a subfamily of min-stable multivariate ex-
ponential laws. The seminal reference [26] treats both families as multivariate
extensions of the univariate exponential law and in the process introduces the
even larger family of laws with exponential minima. A random vector X is said
to have exponential minima if min{Xi1 , . . . , Xik } has a univariate exponential
law for arbitrary 1 ≤ i1 < . . . ik ≤ d. Obviously, a min-stable multivariate expo-
nential law has exponential minima, but the converse needs not hold in general.
It is shown in [74] that if Z = {Zt }t≥0 is a right-continuous, non-decreasing
process such that E[exp(−x Zt )] = exp(−t Ψ(x)) for some Bernstein function
Ψ, then X as defined in (1.5) has exponential minima. The process Z is said
to be weakly infinitely divisible with respect to time (weak IDT), and – as the
nomenclature suggests – every strong IDT process is also weak IDT. However,
there exist weak IDT processes which are not strong IDT. Notice in particular
that a Lévy subordinator is uniquely determined in law by the law of Z1 (or
equivalently the Bernstein function Ψ), but neither strong nor weak IDT pro-
cesses are determined in law by the law of Z1 . If one takes two independent,
but different, strong IDT processes Z^(1), Z^(2) subject to Z1^(1) =d Z1^(2), then the

stochastic process
Zt := Zt^(1) if B = 1,   Zt := Zt^(2) if B = 0,   t ≥ 0,   B an independent Bernoulli(1/2)-distributed random variable,

is weak IDT, but not strong IDT. On the level of X this means that the mixture
of two min-stable multivariate exponential random vectors always has exponen-
tial minima, but need not be min-stable anymore.
Remark 5.2 (Archimax copulas). The study of min-stable multivariate expo-
nentials is analogous to the study of extreme-value copulas. From this perspec-
tive, Theorem 5.1 gives us a canonical stochastic model for all conditionally iid
extreme-value copulas. Another family of copulas for which we understand the
conditionally iid subfamily pretty well is Archimedean copulas, related to ℓ1-norm
symmetric distributions and mentioned in Remark 3.4. The family of so-called
Archimax copulas is a superclass of both extreme-value and Archimedean cop-
ulas. It has been studied in [14, 15] with the intention to create a rich copula
family that comprises well-known subfamilies. An extreme-value copula C is con-
veniently described in terms of its stable tail dependence function. Recall that
Theorem 5.1 is formulated in terms of the stable tail dependence function and
gives an analytical criterion for C to be conditionally iid. An Archimax copula
C is a multivariate distribution function of the functional form
 
Cℓ,ϕ(u1, . . . , ud) = ϕ( ℓ( ϕ^{−1}(u1), . . . , ϕ^{−1}(ud) ) ).

It is recognized that if ℓ(x1, . . . , xd) = ‖x‖1 then Cℓ,ϕ is an Archimedean copula,
and if ϕ(x) = exp(−x), then Cℓ,ϕ is an extreme-value copula. By combining our
knowledge from Theorems 3.2 and 5.1 about Archimedean and extreme-value
copulas, it is immediate to show that
 
{ γ̃ ∈ M_+^1(H+) : {1 − e^{−Z_{M t}}}_{t≥0} ∼ γ̃, M > 0 a positive random variable, and
{Zt}t≥0 non-decreasing strong IDT } ⊂ Θ_d^{−1}( M∗∗ ),   (5.5)

when M denotes the family of all probability laws with the property (P) of “having a survival function of the functional form ϕ ◦ ℓ with ℓ some stable tail dependence function”. In this case, the function ϕ equals the Laplace transform of M and ℓ is given in terms of a triplet (b, c, γ) such as in Theorem 5.1, associated with the strong IDT process Z, and b + c = 1. Notice that each stable tail dependence function ℓ equals the restriction of an orthant-monotonic norm to [0, ∞)^d, see [85], so that survival functions of the form ϕ ◦ ℓ are precisely the survival functions that are symmetric with respect to this norm.

6. Exogenous shock models

The present section studies a family M of multivariate distribution functions


that have a stochastic representation according to the following exogenous shock

model: We consider some system consisting of d components and interpret the


k-th component of our random vector X = (X1 , . . . , Xd ) with law in M as
the lifetime of the k-th component in our system. A component lives until it
is affected by an exogenous shock, and the arrival times of these exogenous
shocks are modeled stochastically. For each non-empty subset I ⊂ {1, . . . , d} of
components, we denote by EI a non-negative random variable. We assume that
all EI are independent and interpret EI as the arrival time of an exogenous
shock affecting all components of our random vector which are indexed by I.
This means that we define
Xk := min{EI : k ∈ I}, k = 1, . . . , d. (6.1)
Such exogenous shock models are popular in reliability theory, insurance risk,
and portfolio credit risk. Recall from Example 4.1(E) that this model is a gen-
eralization of the Marshall-Olkin distribution, which arises as special case if all
the EI are exponentially distributed, see also Example 6.1 below. For the sake
of clarity, we formally introduce the following definition.
Definition 6.1 (Exogenous shock model). A probability measure μ ∈ M_+^1(R^d)
is said to define an exogenous shock model if on some probability space there
exists a random vector X with stochastic representation (6.1) such that X ∼ μ.

6.1. Exchangeability and the extendibility problem

We are interested in a solution of Problem 1.3 for the property (P) of “defin-
ing an exogenous shock model”. By Lemma 1.1 exchangeability is a necessary
requirement on X, and we observe immediately from (6.1) that this implies
that the distribution function of EI is allowed to depend on the subset I only
through its cardinality |I|. Some simple algebraic manipulations, see the proof
of Theorem 6.1 below, reveal that the survival function of X necessarily must
be given as the product of its arguments after being ordered and idiosyncrat-
ically distorted. Already the characterization of the exchangeable subfamily in
analytical terms is an interesting problem, the interested reader is referred to
[68] for its solution.
The conditionally iid subfamily M∗∗ is also investigated in [68]. One major
finding is that when the increments of the factor process Z in the canonical
construction (1.5) are independent, then one ends up with an exogenous shock
model. Recall that a càdlàg stochastic process Z = {Zt }t≥0 with independent
increments is called additive, see [94] for a textbook treatment. For our purpose,
it is sufficient to be aware that the probability law of a non-decreasing additive
process Z = {Zt }t≥0 with Z0 = 0 can be described uniquely in terms of a
family {Ψt }t≥0 of Bernstein functions defined by Ψt (x) := − log(E[exp(−x Zt )]),
x ≥ 0, i.e. Ψt equals the Laplace exponent of the infinitely divisible random
variable Zt . The independent increment property implies for 0 ≤ s ≤ t that
Ψt − Ψs is also a Bernstein function and equals the Laplace exponent of the
infinitely divisible random variable Zt − Zs . The easiest example for a non-
decreasing additive process is a Lévy subordinator, in which case Ψt = t Ψ1 ,

i.e. the probability law is described completely in terms of just one Bernstein
function Ψ1 (due to the defining property that the increments are not only
independent but also identically distributed). Two further compelling examples
of (non-Lévy) additive processes are presented in subsequent paragraphs.
Theorem 6.1 (Additive subordinators and exogenous shock models). Let M
denote the family of probability laws with the property (P) of “defining an ex-
ogenous shock model”. A random vector X has law in M and is exchangeable if
and only if it admits a survival copula of the functional form


Ĉ(u1, . . . , ud) = u[1] Π_{k=2}^d gk(u[k]),   u1, . . . , ud ∈ [0, 1],   (6.2)

with certain functions gk : [0, 1] → [0, 1]. Furthermore,


  
{ γ ∈ M_+^1(H+) : {1 − e^{−Zt}}_{t≥0} ∼ γ, {Zt}t≥0 additive process } = Θ_d^{−1}( M∗∗ ).

Proof. A proof for the inclusion “⊃” has been accomplished only recently and
can be found in [102]. A proof sketch for the inclusion “⊂” works as follows, see
[68] for details. The survival function of the random vector X defined by (6.1)
can be written in terms of the one-dimensional survival functions of the EI as

P(X > x) = Π_{∅≠I} P(EI > max{xk : k ∈ I}).

Exchangeability of X implies that the probability law of EI depends on I only


via its cardinality |I| ∈ {1, . . . , d}. If we denote the survival function of EI with
|I| = m by H̄m , we observe that


P(X > x) = Π_{m=1}^d Π_{I : |I|=m} H̄m( max{xk : k ∈ I} )

          = Π_{m=1}^d Π_{k=1}^{d−m+1} H̄m(x[d−k+1])^{\binom{d−k}{m−1}} = Π_{k=1}^d Π_{m=1}^{d−k+1} H̄m(x[d−k+1])^{\binom{d−k}{m−1}}.   (6.3)

Noting for x = (x, 0, . . . , 0) that x[d] = x and x[1] = . . . = x[d−1] = 0, we observe


that the one-dimensional margins are


P(Xk > x) = Π_{m=1}^d H̄m(x)^{\binom{d−1}{m−1}} =: F̄1(x),   k = 1, . . . , d.

That (6.3) can be written as Ĉ( P(X1 > x1), . . . , P(Xd > xd) ) with Ĉ as in (6.2)
follows by a tedious yet straightforward computation with the gk defined as

gk := ( Π_{m=1}^{d−k+1} H̄m^{\binom{d−k}{m−1}} ) ◦ F̄1^{−1},   k = 2, . . . , d,

where F̄1^{−1} denotes the generalized inverse of the non-increasing function F̄1,
which is defined analogously to the generalized inverse of a distribution function as

F̄1^{−1}(x) := inf{ t > 0 : F̄1(t) ≤ x }.

Now assume Z is an additive process with associated family of Bernstein func-


tions {Ψt }t≥0 . The survival copula of the random vector X of Equation (1.5)
can be computed in closed form using the independent increment property of Z.
It is easily shown to be of the structural form (6.2), when

gk(u) := exp( − Ψ_{F̄1^{−1}(u)}(k) + Ψ_{F̄1^{−1}(u)}(k − 1) ),   k = 2, . . . , d,

with F̄1(x) := exp(−Ψx(1)), x ≥ 0.
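In the special case of a Lévy subordinator, Ψt = t Ψ1, the last formula simplifies considerably: F̄1(x) = exp(−x Ψ1(1)) and gk(u) = u^{(Ψ1(k)−Ψ1(k−1))/Ψ1(1)}, so the survival copula (6.2) can be evaluated in closed form. The following sketch does exactly that for one illustrative Bernstein function (general additive subordinators would require the whole family {Ψt}t≥0).

```python
import numpy as np

def Psi(x, drift=0.2, lam=1.0, eta=1.0):
    """Illustrative Bernstein function, Psi(x) = drift*x + lam*x/(x+eta)."""
    return drift * x + lam * x / (x + eta)

def survival_copula(u, Psi=Psi):
    """Survival copula (6.2) of the conditionally iid exogenous shock model
    driven by a Levy subordinator (Psi_t = t*Psi_1), for which
    g_k(u) = u ** ((Psi(k) - Psi(k-1)) / Psi(1))."""
    u = np.sort(np.asarray(u, dtype=float))           # u_[1] <= ... <= u_[d]
    d = len(u)
    g = [(Psi(k) - Psi(k - 1)) / Psi(1) for k in range(2, d + 1)]
    return u[0] * np.prod([u[k - 1] ** g[k - 2] for k in range(2, d + 1)])

# the copula lies between independence and comonotonicity
u = [0.3, 0.6, 0.8]
print(np.prod(u), "<=", survival_copula(u), "<=", min(u))
```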


Remark 6.1 (Related literature). Interestingly, there exists quite some litera-
ture that proposes the use of the random distribution function H = 1 − exp(−Z)
with Z additive as a prior distribution when estimating the probability law of
observed samples X1 , . . . , Xd – without noticing the relation to exogenous shock
models. H is sometimes called a neutral-to-the-right prior in these references.
The use of additive Z in this particular application is mainly explained by ana-
lytical convenience, because it implies that prior and posterior distributions have
a similar algebraic structure. To provide some examples, the interested reader
is referred to the papers [51, 48, 61] and the references mentioned therein. Fur-
thermore, [55] calls the random probability measures defined via the random
distribution function H completely random measures, and characterizes them
by the property that the values they take on disjoint subsets are independent.
There are some interesting subfamilies of exogenous shock models that are
worth mentioning with respect to their conditionally iid substructure. The first
of them is a well known friend from previous sections, re-visited once again in
the following example.
Example 6.1 (The Marshall–Olkin law revisited). If all random variables EI
in the exogenous shock model construction (6.1) are exponentially distributed,
we are in the special situation of Example 4.1(E). Indeed, it has already been
shown in the original reference [79] that every Marshall–Olkin distribution can
be constructed like this. Hence, we already know from Theorem 4.1(E) that an ex-
ogenous shock model with exponential arrival times is obtained via the canonical
conditionally iid model (1.5) if the associated stochastic process H = {Ht }t≥0 ∼
γ ∈ M_+^1(H+) is such that Zt := − log(1 − Ht), t ≥ 0, defines a Lévy subordinator,
which is a special additive subordinator.
Example 6.2 (A simple global shock model). A special case of copulas of the
form (6.2) is considered in [22], namely g2 = . . . = gd , which we briefly put in
context with the additive process construction. To this end, let g2 be a strictly
increasing and continuous distribution function of some random variable taking
values in [0, 1], assuming x → g2 (x)/x is non-increasing on (0, 1]. The function
FM (x) := x/g2 (x) then is a distribution function on [0, 1], and we let M be a

random variable with this distribution function. Independently, let W1 , W2 , . . . be


an iid sequence drawn from g2 . We consider the infinite exchangeable sequence
{Xk }k∈N with Xk := min{E{k} , E{1,2,...} }, k ∈ N, where

E{k} := − log( FM(Wk) ), k ∈ N,   E{1,2,...} := − log( FM(M) ).


By definition, each finite d-margin has an exogenous shock model representation,
and the survival copula Ĉ is easily seen to be of the form (6.2) with g2 = . . . = gd ,
for arbitrary d ≥ 2. The conditional distribution function is static in the sense
of Section 3, and given by H = 1 − exp(−Z) with
Zt := − log( g2( FM^{−1}(e^{−t}) ) ), if t < E{1,2,...};   Zt := ∞, if t ≥ E{1,2,...};   t ≥ 0.

The random variable E{1,2,...} is unit exponential, and Z is additive with asso-
ciated family of Bernstein functions Ψt(x) = − log(E[exp(−x Zt)]) given by

Ψt(x) = t 1{x>0} + x ( − log( g2( FM^{−1}(e^{−t}) ) ) ),   x, t ≥ 0.

Notice that for each fixed t > 0 this corresponds to an infinitely divisible distri-
bution of Zt that is concentrated on the two-point set { − log( g2( FM^{−1}(e^{−t}) ) ), ∞ }.

The case g2(x) = x^α with α ∈ [0, 1] implies that Z is a killed Lévy subordinator
that grows linearly before it jumps to infinity, X has a Marshall-Olkin law, and
the EI are exponential. In the general case, Z need not grow linearly before it
gets killed.
Two further examples are studied in greater detail in the following two para-
graphs, since they give rise to nice characterization results.

6.2. The Dirichlet prior and radial symmetry

In the two landmark papers [33, 34], T.S. Ferguson introduces the so-called
Dirichlet prior and shows that it can be constructed by means of an addi-
tive process. More clearly, let c > 0 be a model parameter and let G ∈ H+ ,
continuous and strictly increasing. Consider a non-decreasing additive process
Z = {Zt }t∈[G−1 (0),G−1 (1)] whose probability law is determined by a family of
Bernstein functions {Ψt }t∈(G−1 (0),G−1 (1)) , which are given by

Ψt(x) = ∫_0^∞ ( 1 − e^{−x u} ) ( e^{−u c (1−G(t))} − e^{−u c} ) / ( u (1 − e^{−u}) ) du,   x ≥ 0.
The random distribution function H = {Ht }t∈[G−1 (0),G−1 (1)] defined by Ht :=
1 − exp(−Zt ) satisfies the following property: For arbitrary G−1 (0) < t1 < . . . <
td < G−1 (1) the random vector
( Ht1, Ht2 − Ht1, . . . , Htd − Htd−1, 1 − Htd )

has a Dirichlet distribution16 with parameters

c ( G(t1), G(t2) − G(t1), . . . , G(td) − G(td−1), 1 − G(td) ),

and H is called Dirichlet prior with parameters (c, G), denoted DP (c, G) in
the sequel. The probability distribution of (X1 , . . . , Xd ) in (1.5), when H =
DP (c, G) for some G with support [G−1 (0), G−1 (1)] := [0, ∞], is given by

P(X1 > x1, . . . , Xd > xd) = Ĉc( 1 − G(x1), . . . , 1 − G(xd) ),   with

Ĉc(u1, . . . , ud) = u[1] Π_{k=2}^d (c u[k] + k − 1)/(c + k − 1).   (6.4)

It is insightful to remark that for c ↓ 0 the copula Ĉc converges to the so-called
upper Fréchet–Hoeffding copula Ĉ0(u) = u[1], and for c ↑ ∞ to the copula
Ĉ∞(u) = Π_{k=1}^d uk associated with independence. The intuition of the Dirichlet
prior model is that all components of X have distribution function G, but one is
uncertain whether G is really the correct distribution function. So the parameter
c models an uncertainty about G in the sense that the process H must be viewed
as a “distortion” of G. For c ↑ ∞ we obtain H = G, while for c ↓ 0 the process
H is maximally chaotic (in some sense) and does not resemble G at all.
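Samples (X1, . . . , Xd) that are conditionally iid from a Dirichlet prior can be generated without simulating the process H itself, e.g. by means of the Blackwell–MacQueen urn, a standard representation of samples from a Dirichlet process which is used here as an outside ingredient (it is not introduced in the text): X1 ∼ G, and given X1, . . . , Xk, the next coordinate repeats one of the previous values with probability 1/(c + k) each and is a fresh draw from G with probability c/(c + k). The sketch below (illustrative choices of c and G) compares, for d = 2, the empirical law of (G(X1), G(X2)) with the copula (6.4), which by Lemma 6.1 is also the ordinary copula.

```python
import numpy as np

def sample_dirichlet_prior(c, d, base_sampler, size=1, rng=None):
    """(X_1, ..., X_d) conditionally iid from H ~ DP(c, G) via the
    Blackwell-MacQueen urn: each new coordinate repeats an old value with
    probability 1/(c+k) each, or is a fresh draw from G with probability c/(c+k)."""
    rng = np.random.default_rng(rng)
    X = np.empty((size, d))
    for n in range(size):
        X[n, 0] = base_sampler(rng)
        for k in range(1, d):
            if rng.uniform() < c / (c + k):
                X[n, k] = base_sampler(rng)            # fresh draw from G
            else:
                X[n, k] = X[n, rng.integers(k)]        # repeat a previous value
    return X

c = 2.0                                                # illustrative parameter
X = sample_dirichlet_prior(c, 2, lambda r: r.exponential(), size=100_000, rng=0)
U = 1.0 - np.exp(-X)                                   # U_k = G(X_k), G = Exp(1) cdf
u1, u2 = 0.3, 0.7
empirical = np.mean((U[:, 0] <= u1) & (U[:, 1] <= u2))
theoretical = min(u1, u2) * (c * max(u1, u2) + 1) / (c + 1)   # copula (6.4), d = 2
print(empirical, "vs", theoretical)
```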
Interestingly, if the probability law dG is symmetric about its median μ :=
G−1 (0.5), then the random vector (X1 , . . . , Xd ) is radially symmetric, which
can be verified using Lemma 1.7. One can furthermore show that there ex-
ists no other conditionally iid exogenous shock model satisfying this property,
see the following lemma. To this end, recall that a copula C is called radially
symmetric if C = Ĉ, i.e. it equals its own survival copula, which means that
U = (U1, . . . , Ud) =d (1 − U1, . . . , 1 − Ud) for U ∼ C.
Lemma 6.1 (Radial symmetry in exchangeable exogenous shock models). A
copula of the structural form (6.2) is radially symmetric if and only if the func-
tions gk are linear, k = 2, . . . , d, which is the case if and only if there is a
c ∈ [0, ∞] such that the copula takes the form (6.4).
Proof. This is [69, Theorem 3.5]. In order to prove necessity, the principle of
inclusion and exclusion can be used to express the survival copula of Ĉ as an
alternating sum of lower-dimensional margins of Ĉ. By radial symmetry, this
expression equals Ĉ, and on both sides of the equation one may now take the
derivatives with respect to all d arguments. A lengthy and tedious computation
then shows that the gk must all be linear, which implies the claim. Sufficiency
is proved using the Dirichlet prior construction. The defining properties of the
Dirichlet prior imply that the assumptions of Lemma 1.7 are satisfied, which
implies the claim.
16 Recall from Remark 3.5 that S = (S1, . . . , Sd) has a Dirichlet distribution with parameters α = (α1, . . . , αd) if S =d G/‖G‖1 for a vector G of independent unit-scale Gamma-distributed random variables.

6.3. The Sato-frailty model and self-decomposability

A real-valued random variable X is called self-decomposable if for arbitrary


c ∈ (0, 1) there exists a random variable Y , independent of X, such that X =d c X + Y.
It can be shown that a self-decomposable X is infinitely divisible, so self-
decomposable laws are special cases of infinitely divisible laws. In particular,
if X takes values in (0, ∞) and is infinitely divisible with Laplace exponent
given by the Bernstein function Ψ, then X is self-decomposable if and only
if the function x → x Ψ(1) (x) is again a Bernstein function, see [104, Theo-
rem 2.6, p. 227]. Now let Ψ be the Bernstein function associated with a self-
decomposable law on (0, ∞), and consider a family of Bernstein functions de-
fined by Ψt (x) := Ψ(x t), t ≥ 0. One can show that there exists an additive
subordinator Z = {Zt }t≥0 which is uniquely determined in law by {Ψt }t≥0
via Ψt (x) = − log(E[exp(−x Zt )]), x, t ≥ 0, called Sato subordinator. If we use
this process in (1.5), the conditionally iid random vector X obtained by this
construction has survival function given by

P(X > x) = exp( − Σ_{k=1}^d [ Ψ( (d − k + 1) x[k] ) − Ψ( (d − k) x[k] ) ] ),   x ∈ [0, ∞)^d.   (6.5)

The following lemma characterizes self-decomposability analytically in terms


of multivariate probability laws given by (6.5).
Lemma 6.2 (Characterization of self-decomposable Bernstein functions). Let
Ψ : [0, ∞) → [0, ∞) be some function. The d-variate function (6.5) defines a
proper survival function on [0, ∞)d for all d ≥ 2 if and only if Ψ equals the
Bernstein function of a self-decomposable probability law on (0, ∞).
Proof. Sufficiency is an instance of the general Theorem 6.1, as demonstrated
above. Necessity, i.e. that self-decomposability can actually be characterized in
terms of the multivariate survival functions (6.5), is shown in [70] and relies on
some purely analytical, technical computations.
Example 6.3 (A one-parametric, multivariate Pareto distribution). Let Ψ(x) =
α log(1 + x) be the Bernstein function associated with a Gamma distribution17
with parameter α > 0. The Gamma distribution is self-decomposable and the
survival function (6.5) takes the explicit, one-parametric form


P(X > x) = Π_{k=1}^d ( ( (d − k) x[k] + 1 ) / ( (d − k + 1) x[k] + 1 ) )^α.

The one-dimensional marginal survival functions are given by F̄1 (x) = (1+x)−α .
Notice that this equals the survival function of Y − 1, when Y has a Pareto
distribution with scale parameter (aka left-end point of support) equal to one
17 This is precisely the Gamma distribution with density (3.7) for α = αk .

and tail index α. Thus, the random vector X + 1 := (X1 + 1, . . . , Xd + 1)


might be viewed as a multivariate extension of the Pareto distribution with scale
parameter equal to one and tail index α.
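A small numerical sketch for Example 6.3: the function below evaluates the survival function, checks that the univariate margin is the shifted Pareto one, and verifies on random rectangles in dimension d = 2 that the induced measure is non-negative, i.e. that (6.5) indeed defines a proper survival function in line with Lemma 6.2 (parameter values and the grid of rectangles are illustrative).

```python
import numpy as np

def pareto_survival(x, alpha):
    """Survival function of Example 6.3, i.e. (6.5) with Psi(x) = alpha*log(1+x)."""
    x = np.sort(np.asarray(x, dtype=float))            # x_[1] <= ... <= x_[d]
    d = len(x)
    k = np.arange(1, d + 1)
    return np.prod((((d - k) * x + 1.0) / ((d - k + 1) * x + 1.0)) ** alpha)

alpha = 1.5
# margin check: P(X_1 > x) = (1 + x)**(-alpha)
print(pareto_survival([2.0, 0.0], alpha), "vs", (1 + 2.0) ** (-alpha))

# rectangle check in d = 2: P(a1 < X1 <= b1, a2 < X2 <= b2) >= 0
rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    a = rng.uniform(0.0, 3.0, size=2)
    b = a + rng.uniform(0.0, 3.0, size=2)
    mass = (pareto_survival([a[0], a[1]], alpha) - pareto_survival([b[0], a[1]], alpha)
            - pareto_survival([a[0], b[1]], alpha) + pareto_survival([b[0], b[1]], alpha))
    ok = ok and (mass >= -1e-12)
print("all rectangle masses non-negative:", ok)
```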

7. Related open problems

7.1. Extendibility-problem for further families

The present article surveys solutions to Problems 1.1 and 1.3 for several families
M of interest. One goal of the survey is to encourage others to solve the problem
also for other families. We provide examples that we find compelling:
(i) The family of min-stable laws in Section 5 can be generalized to min-
infinitely divisible laws. Generalizing (5.1), a multivariate survival func-
tion F̄ is called min-infinitely divisible if for each t > 0 there is a sur-
vival function F̄t such that F̄(x)^t = F̄t(x). Like min-stability is analogous
to max-stability, the concept of min-infinite divisibility is equivalent to
the concept of max-infinite divisibility, on which [91] provides a textbook
treatment. It is pretty obvious that non-decreasing infinitely divisible pro-
cesses occupy a commanding role with regards to the conditionally iid
subfamily, but to work out a convenient analytical treatment of these in
relation with the associated min-infinitely divisible laws appears to be a
promising direction for further research. Notice that the family of recip-
rocal Archimedean copulas, introduced in [38], is one particular special
case of max-infinitely divisible distribution functions, and in this special
case the conditionally iid subfamily is determined similarly as in the case
of Archimedean copulas, see [38, Section 7]. This might serve as a good
motivating example for the aforementioned generalization.
(ii) Theorem 3.3 studies d-variate densities of the form gd (x[d] ), and [40] also
considers a generalization to densities of the form g(x[1] , x[d] ), depending
on x[1] and x[d] . From a purely algebraic viewpoint it is tempting to investi-
gate whether exchangeable densities of the structural form Π_{k=1}^d gk(x[k])
allow for a nice theory as well. When are these conditionally iid? This
generalization of the ℓ∞-norm symmetric case is motivated by a relation
to non-homogeneous pure birth processes, as already explained in Re-
mark 3.7. Such processes are of interest in reliability theory, as explained
in [99], see also [11] and [67, Section 4] for related investigations.
(iii) On page 721 it was mentioned that the Marshall-Olkin distribution is
characterized by the property that for all subsets of components the re-
spective “survival indicator process” is a continuous-time Markov chain.
This property may naturally be weakened to the situation when only the
survival indicator process Z_t := (1_{X_1 > t}, . . . , 1_{X_d > t}) of all components
is a continuous-time Markov chain. On the level of multivariate distribu-
tions, one generalizes the Marshall-Olkin distribution to a more general
family of multivariate laws that has been shown to be interesting in math-
ematical finance in [46]. Furthermore, it is a subfamily of the even larger
family of so-called multivariate phase-type distributions, see [5]. Which members of these families of distributions are conditionally iid? Presumably, this research direction requires generalizing the Lévy subordinator
in the Marshall–Olkin case to more general non-decreasing Markov pro-
cesses.
7.2. Testing for conditional independence

If a specific d-variate law in some family M is given, do we have a practically useful, analytical criterion to decide whether or not this law is in M∗, resp. M∗∗? According to Theorem 1.2, in general this requires checking whether a supremum over bounded measurable functions is bounded from above, which in practice is rather inconvenient – at least at first glance. For certain families,
however, there is hope to find more useful criteria. For instance, for Marshall-
Olkin distributions the link to the truncated moment problem in Remark 4.1 is
helpful in this regard, like it is for binary sequences. For Archimedean copulas
(resp. ℓ_1-norm symmetric survival functions) this boils down to checking whether
a d-monotone function is actually completely monotone, i.e. a Laplace transform.
However, it is an open problem for the family of extreme-value copulas. Of
course, Theorem 5.1 tells us which stable tail dependence functions correspond
to conditionally iid laws. But given some specific stable tail dependence function,
how can we tell effectively whether or not this given function has the desired
form? Even for dimension d = 2, in which case the problem is presumably
easier due to the fact that the 2-dimensional unit simplex is one-dimensional,
this problem is non-trivial and open. Given that we find such an effective analytical criterion for some family M, is it even possible to build a useful statistical test based on it, i.e. can we test the hypothesis that the law is conditionally iid?
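As a small illustration of the moment-problem flavour of such criteria, the following sketch (our own illustrative helper, not taken from the surveyed literature) checks the finite-difference conditions (−1)^j (Δ^j b)_k ≥ 0 on a finite sequence b_0, . . . , b_n. These are necessary conditions for b to be the beginning of a completely monotone sequence, i.e. of a moment sequence of a measure on [0, 1] (of a probability law when b_0 = 1), which is the type of criterion relevant for the binary-sequence case mentioned above.

```python
import numpy as np

def hausdorff_differences_ok(b, tol=1e-12):
    """Check (-1)^j (Delta^j b)_k >= 0 for all j, k with j + k <= n on the
    finite sequence b = (b_0, ..., b_n), where Delta b_k = b_{k+1} - b_k.
    Necessary conditions for extendibility to a completely monotone sequence."""
    diffs = np.asarray(b, dtype=float)
    sign = 1.0
    while diffs.size > 0:
        if np.any(sign * diffs < -tol):
            return False
        diffs = np.diff(diffs)   # raise the difference order by one
        sign = -sign             # alternate the required sign
    return True

# Moments b_k = 1 / (k + 1) of the uniform law on [0, 1] pass the check,
# whereas a non-convex sequence fails it.
print(hausdorff_differences_ok([1.0 / (k + 1) for k in range(6)]))   # True
print(hausdorff_differences_ok([1.0, 0.9, 0.85, 0.2]))               # False
```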
7.3. Combination of one-factor models to multi-factor models

This is probably the most obvious application of the presented concepts. The
idea works as follows. According to our notation, the dependence-inducing la-
tent factor in a conditionally iid model is H. Depending on the stochastic properties of H ∼ γ ∈ M_+^1(H), it may be possible to construct H from a pair (H^{(1)}, H^{(2)}) ∼ γ_1 ⊗ γ_2 ∈ M_+^1(H) × M_+^1(H) of two independent processes of the same structural form, say H = f(H^{(1)}, H^{(2)}). For example, if H^{(1)} and H^{(2)} are two strong IDT processes, see Section 5, then so is their sum H = H^{(1)} + H^{(2)}. In this situation, we may define dependent processes H^{(1,1)}, . . . , H^{(1,J)} from J + 1 independent processes H^{(0)}, . . . , H^{(J)} as H^{(1,j)} = f(H^{(0)}, H^{(j)}). The conditionally iid vectors X^{(1)}, . . . , X^{(J)} defined via (1.4) from H^{(1,1)}, . . . , H^{(1,J)} are then dependent, so that the combined vector X = (X^{(1)}, . . . , X^{(J)}) has a
hierarchical dependence structure. Such structures break out of the – sometimes
undesired and limited – exchangeable cosmos and have the appealing property
that the lowest-level groups are conditionally iid, so the whole structure can be
sized up, i.e. is dimension-free to some degree. Of particular interest is the situation when the random vector (X_1^{(1)}, . . . , X_1^{(J)}), composed of one component from each of the J different groups, is conditionally iid and its latent factor process equals H^{(0)} in distribution. In this particular situation, an understanding of the whole dependence structure of the hierarchical model X is retrieved from an understanding of the conditionally iid sub-models based on the H^{(j)}. In other
words, the conditionally iid model can be nested to construct highly tractable,
non-exchangeable, multi-factor dependence models from simple building blocks.
For instance, hierarchical elliptical laws, Archimedean copulas (see also the many references in Remark 3.4), and min-stable laws can be constructed based on the presented one-factor building blocks, see
[73] for an overview. For these and other families, the design, estimation, and
efficient simulation of such hierarchical structures is an active area of research
or even an unsolved problem.
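To illustrate the nesting idea in one concrete special case, the following sketch (illustrative only; function names and parameter choices are ours) builds a two-level hierarchy in the Lévy-frailty/Marshall-Olkin setting: group j is driven by the subordinator Z^{(0)} + Z^{(j)}, where Z^{(0)}, Z^{(1)}, . . . , Z^{(J)} are independent compound Poisson subordinators with unit-mean exponential jumps, and each component is the first-passage time of its group subordinator over an independent unit exponential threshold, in line with (1.4) for H = 1 − exp(−Z). Components within a group are conditionally iid given their subordinator, while the groups are dependent through the common factor Z^{(0)}.

```python
import numpy as np

rng = np.random.default_rng(1)

def cpp_jumps_until(level, rate, jump_mean, rng):
    """Jump times and sizes of a compound Poisson subordinator (intensity `rate`,
    Exp(jump_mean) jump sizes), simulated until the path exceeds `level`."""
    times, sizes, t, total = [], [], 0.0, 0.0
    while total <= level:
        t += rng.exponential(1.0 / rate)       # waiting time until the next jump
        s = rng.exponential(jump_mean)
        times.append(t); sizes.append(s); total += s
    return np.array(times), np.array(sizes)

def first_passage(times, sizes, thresholds):
    """First time at which the pure-jump non-decreasing path crosses each threshold."""
    order = np.argsort(times)
    times, cums = times[order], np.cumsum(sizes[order])
    return times[np.searchsorted(cums, thresholds)]

def sample_hierarchical_mo(group_sizes, rate0=1.0, ratej=1.0, rng=rng):
    """One sample of the hierarchical vector: group j uses Z^(0) + Z^(j)."""
    E = [rng.exponential(1.0, size=n) for n in group_sizes]   # unit exponential triggers
    level = max(e.max() for e in E)                           # simulate far enough
    t0, s0 = cpp_jumps_until(level, rate0, 1.0, rng)          # global factor Z^(0)
    sample = []
    for Ej in E:
        tj, sj = cpp_jumps_until(level, ratej, 1.0, rng)      # group-specific Z^(j)
        Xj = first_passage(np.concatenate([t0, tj]), np.concatenate([s0, sj]), Ej)
        sample.append(Xj)
    return sample

print(sample_hierarchical_mo([3, 2]))   # one observation of X = (X^(1), X^(2))
```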
7.4. Parameter estimation with uncertainty

The classical statistical parameter estimation problem is to estimate the (true)
parameter m of a one-parametric distribution function Fm from iid observations
X1 , . . . , Xd ∼ Fm . A parameter estimate is then a function m̂ = m̂(X1 , . . . , Xd )
of the observations into the set of admissible parameters. This classical prob-
lem relies on the hypothesis that there is a “true” parameter m, from which
the observations are drawn. But what if we are uncertain whether or not the
observations are actually drawn from some Fm ? The Dirichlet prior has been
introduced in [33, 34] with the motivation to model uncertainty about the hy-
pothesis that observations are drawn from some Fm . Instead, it is assumed that
they are drawn from DP (c, Fm ) with an uncertainty parameter c > 0. On a
high level, this amounts to observing one sample X = (X1 , . . . , Xd ), with large
d, from a parametric conditionally iid model. Optimal estimates for m based
on the observations can then be derived due to the convenient Dirichlet prior
setting, see [33, 34] for details. But this question can clearly also be posed for
other conditionally iid models. Let us provide a second motivation that appears
to be natural: let X1 , . . . , Xd be observed time points of company bankruptcy
filings within the last 10 years. An iid assumption for X1 , . . . , Xd is well known
to be inappropriate. Instead, a popular model for such time points is a Marshall-
Olkin distribution, see [28]. If we assume in addition – now for mathematical
convenience – that X = (X1 , . . . , Xd ) is conditionally iid, we know from Theo-
rem 4.1(E) and Lemma 1.8 that the empirical distribution function of X1 , . . . , Xd
is approximately equal to 1 − exp(−Z) for a Lévy subordinator Z. Depending
on a specific parametric model for Z, it may well be possible to estimate the
parameters based on the observed empirical distribution function. For exam-
ple, if Z is a compound Poisson process with constant jump size m, then huge
(small) jumps in the empirical distribution function apparently indicate a large
(small) value of m. Such parameter estimation problems based on one (large) sample X = (X_1, . . . , X_d) from a conditionally iid model appear to be very model-specific and thus possibly interesting, and the two examples above indicate that natural motivations for them can be found.
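The following sketch (a heuristic illustration, not an estimator from the surveyed literature) implements exactly this idea for a compound Poisson subordinator with constant jump size m: since the empirical distribution function F̂ of a large conditionally iid Marshall-Olkin sample is approximately 1 − exp(−Z), the increments of −log(1 − F̂) between consecutive distinct observed values should cluster around m, and their median provides a crude estimate.

```python
import numpy as np

def estimate_jump_size(sample):
    """Heuristic estimate of the constant jump size m of a compound Poisson
    subordinator Z from one large sample whose empirical distribution function
    is approximately 1 - exp(-Z) (illustrative sketch)."""
    x = np.asarray(sample, dtype=float)
    t, counts = np.unique(x, return_counts=True)     # distinct observed values
    F_hat = np.cumsum(counts) / x.size               # empirical df at these values
    Lambda_hat = -np.log1p(-F_hat[:-1])              # drop the last point, where F_hat = 1
    increments = np.diff(np.concatenate([[0.0], Lambda_hat]))
    return np.median(increments)                     # robust against occasionally merged jumps

# Quick self-check on simulated data: Z has unit intensity and jump size m = 0.8,
# and X_i is the first-passage time of Z over an independent unit exponential E_i.
rng = np.random.default_rng(7)
m, d = 0.8, 5000
E = rng.exponential(1.0, size=d)
n_jumps = int(np.ceil(E.max() / m)) + 1
jump_times = np.cumsum(rng.exponential(1.0, size=n_jumps))
X = jump_times[np.searchsorted(m * np.arange(1, n_jumps + 1), E)]
print(estimate_jump_size(X))   # should be close to 0.8
```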
7.5. Quantification of diversity of possible extensions

All of the presented theorems solve Problem 1.3, but only in some cases (to wit, Example 1.1, Schoenberg's Theorem 3.1, and Theorem 3.3) is the solution set M∗∗ shown to coincide with the in general larger solution set M∗
in Problem 1.1. Can one show that M∗ = M∗∗ in the other presented solutions
of Problem 1.3? To provide one concrete example, from Theorem 4.1(G) we know
that (b0 , b1 , b2 ) ∈ M2 determines a three-dimensional, exchangeable wide-sense
geometric law. However, this exchangeable probability distribution is only in
M∗∗ if there exist b3 , b4 , . . . such that {bk }k∈N0 ∈ M∞ . Could it be that the
last extension property does fail, but the three-dimensional, exchangeable wide-
sense geometric law associated with (b0 , b1 , b2 ) is still conditionally iid? If so,
then necessarily there is some n > 3 and an infinite exchangeable sequence
{Xk }k∈N such that (X1 , X2 , X3 ) has the given wide-sense geometric law but
(X1 , . . . , Xn ) is not wide-sense geometric.
A related question concerns only elements in M∗∗. There might be two infinite exchangeable sequences {X_k^{(1)}}_{k∈N} and {X_k^{(2)}}_{k∈N} which are not equal in distribution as sequences, but such that (X_1^{(1)}, . . . , X_d^{(1)}) and (X_1^{(2)}, . . . , X_d^{(2)}) are equal in distribution for some d ∈ N. To provide an example,
related to Theorems 2.1 and 4.1, the vector (1, b_1) with b_1 ∈ [0, 1] can always be extended to a sequence {b_k}_{k∈N} that is completely monotone, for example set b_k = b_1^k. In case of Theorem 4.1(G), all the different possible extensions imply
different exchangeable sequences {Xk }k∈N such that 2-margins follow the asso-
ciated wide-sense geometric law with parameters (1, b1 ). But all these extensions
in Theorem 4.1(G) have in common that arbitrary d-margins are always wide-
sense geometric. Can one quantify how different such extensions are allowed to
be? A similar question is: Is the “⊂” in (5.5) actually a “=”? Notice that the
proof ideas in [20, 90], who study such issues in the case of some static laws,
might help to approach such questions.
7.6. Characterization of stochastic objects via multivariate probability laws

As a general rule, for an infinite exchangeable sequence {Xk }k∈N defined via
(1.4) the probability law of the random distribution function H is uniquely
determined by its mixed moments
\[
E\big[ H_{t_1} \cdots H_{t_d} \big] = P(X_1 \le t_1, \ldots, X_d \le t_d), \qquad d \in \mathbb{N}, \; t_1, \ldots, t_d \in \mathbb{R}.
\]
This often implies interesting analytical characterizations of the stochastic object H in terms of the multivariate distribution functions t → P(X ≤ t). In particular, if H is of the form H = 1 − exp(−Z) like in (1.5), then the mixed moments above become
\[
E\Big[ e^{-\sum_{k=1}^{d} Z_{t_k}} \Big] = P(X_1 > t_1, \ldots, X_d > t_d), \qquad d \in \mathbb{N}, \; t_1, \ldots, t_d \ge 0,
\]
that is, the survival functions t → P(X_1 > t_1, . . . , X_d > t_d) stand in one-to-one relation with the Laplace transforms of finite-dimensional margins of the
non-decreasing process Z. This general relationship explains the close connection between conditionally iid probability laws and moment problems/Laplace transforms encountered several times in this survey. For instance, Theorem 3.2 shows that ϕ is a Laplace transform if and only if x → ϕ(‖x‖_1) is a survival
function for all d ≥ 1, or Theorem 4.1 characterizes Lévy subordinators in terms
of multivariate survival functions, or Lemma 6.2 characterizes self-decomposable
Bernstein functions via multivariate survival functions. Can further character-
izations be found? Is there a compelling application for such characterizations
in terms of multivariate survival functions?
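As a simple numerical illustration of the last display (our own sketch, with an ad-hoc choice of Z that is non-decreasing but not a Lévy subordinator), take Z_t = M·t with M a unit exponential random variable. Then the left-hand side equals 1/(1 + t_1 + · · · + t_d), while the right-hand side can be estimated by Monte Carlo from X_i = E_i/M with E_1, . . . , E_d iid unit exponentials independent of M.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.array([0.4, 1.0, 2.5])                # thresholds (t_1, ..., t_d)
n = 200_000                                  # Monte Carlo sample size

M = rng.exponential(1.0, size=n)             # random slope, Z_s = M * s
E = rng.exponential(1.0, size=(n, t.size))   # unit exponential triggers
X = E / M[:, None]                           # X_i = inf{s >= 0 : Z_s >= E_i}

lhs = 1.0 / (1.0 + t.sum())                  # E[exp(-(Z_{t_1}+...+Z_{t_d}))] in closed form
rhs = np.mean(np.all(X > t, axis=1))         # Monte Carlo estimate of P(X_1>t_1,...,X_d>t_d)
print(lhs, rhs)                              # the two numbers should nearly agree
```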
References

[1] Aldous, D.J. (1985). Exchangeability and related topics. Springer, École
d’Été de Probabilités de Saint-Flour XIII-1983, Lecture Notes in Mathe-
matics 1117, 1–198. MR0883646
[2] Aldous, D.J. (1985). More uses of exchangeability: representations of
complex random structures. In: Probability and Mathematical Genetics –
papers in honour of Sir John Kingman, Cambridge University Press 35–63.
MR2744234
[3] Alfsen, E.M. (1971). Compact convex sets and boundary integrals.
Springer, Berlin. MR0445271
[4] Arnold, B.C. (1975). A characterization of the exponential distribution
by multivariate geometric compounding. Sankhyā: The Indian Journal of
Statistics 37:1 164–173. MR0440792
[5] Assaf, D. and Langberg, N.A. and Savits, T.H. and Shaked, M.
(1984). Multivariate phase-type distributions. Operations Research 32:3
688–702. MR0756014
[6] Barlow, R.E. and Proschan, F. (1975). Statistical theory of reliability
and life testing. Rinehart and Winston, New York. MR0438625
[7] Beirlant, J. and Goegebeur, Y. and Teugels, J. and Segers, J.
(2004). Statistics of extremes: theory and applications. John Wiley & Sons,
Chichester. MR2108013
[8] Berg, C. and Christensen, J.P.R. and Ressel, P. (1984). Harmonic
analysis on semigroups. Springer, Berlin. MR0747302
[9] Bernhart, G. and Mai, J.-F. and Scherer, M. (2015). On the con-
struction of low-parametric families of min-stable multivariate exponen-
tial distributions in large dimensions. Dependence Modeling 3 29–46.
MR3418655
[10] Bernstein, S. (1929). Sur les fonctions absolument monotones. Acta Mathematica 52 1–66. MR1555269
[11] Bezgina, E. and Burkschat, M. (2019). On total positivity of ex-
changeable random variables obtained by symmetrization, with applica-
tions to failure-dependent lifetimes. Journal of Multivariate Analysis 169
95–109. MR3875589
[12] Billingsley, P. (1995). Probability and measure. Wiley Series in Proba-
bility and Statistics, Wiley, New York. MR1324786
[13] Brigo, D. and Mai, J.-F. and Scherer, M. (2016). Markov multi-
variate survival indicators for default simulation as a new characterization
of the Marshall–Olkin law Statistics and Probability Letters 114 60–66.
MR3491973
[14] Capéraà, P. and Fougères, A.-L. and Genest, C. (2000). Bivariate
distributions with given extreme value attractor. Journal of Multivariate
Analysis 72 30–49. MR1747422
[15] Charpentier, A. and Fougères, A.-L. and Genest, C. and
Nešlehová, J.G. (2014). Multivariate Archimax copulas. Journal of
Multivariate Analysis 126 118–136. MR3173086
[16] Cossette, H. and Gadoury, S.-P. and Marceau, E. and Mta-
lai, I. (2017). Hierarchical Archimedean copulas through multivariate
compound distributions. Insurance: Mathematics and Economics 76 1–
13. MR3698183
[17] Daboni, L. (1982). Exchangeability and completely monotone functions.
In: Exchangeability in Probability and Statistics, edited by G. Koch and
F. Spizzichino, North-Holland Publishing Company 39–45. MR0675963
[18] de Finetti, B. (1931). Funzione caratteristica di un fenomeno aleatorio.
Atti della R. Academia Nazionale dei Lincei, Serie 6. Memorie, Classe di
Scienze Fisiche, Mathematica e Naturale 4 251–299.
[19] de Finetti, B. (1937). La prévision: ses lois logiques, ses sources subjec-
tives. Annales de l’Institut Henri Poincaré 7 1–68. MR1508036
[20] Diaconis, P. and Freedman, D. (1987). A dozen de Finetti-style results
in search of a theory. Annales de l'Institut Henri Poincaré 23 397–423.
MR0898502
[21] Dickinson, P.J.C. and Gijben, L. (2014). On the computational com-
plexity of membership problems for the completely positive cone and
its dual. Computational Optimization and Applications 57:2 403–415.
MR3165055
[22] Durante, F. and Quesada-Molina, J.J. and Úbeda-Flores, M.
(2007). A method for constructing multivariate copulas. In: New Dimen-
sions in Fuzzy Logic and Related Technologies – Proceedings of the 5th
EUSFLAT Conference, volume 1, edited by M. Štěpnička et al. 191–195.
[23] Durrett, R. (2010). Probability: theory and examples, 4th edition. Cam-
bridge University Press, Cambridge. MR2722836
[24] Dykstra, R.L. and Hewett, J.E. and Thompson, Jr., W.A. (1973).
Events which are almost independent. Annals of Statistics 1:4 674–681.
MR0397815
[25] Embrechts, P. and Hofert, M. (2013). A note on generalized inverses. Mathematical Methods of Operations Research 77 423–432. MR3072795
[26] Esary, J.D. and Marshall, A.W. (1974). Multivariate distributions
with exponential minimums. Annals of Statistics 2 84–98. MR0362704
[27] Es-Sebaiy, K. and Ouknine, Y. (2008). How rich is the class of processes
which are infinitely divisible with respect to time. Statistics and Probability
Letters 78 537–547. MR2400867
[28] Giesecke, K. (2003). A simple exponential model for dependent defaults.
Journal of Fixed Income 13:3 74–83.
[29] Gupta, A.K. and Nadarajah, S. (2004). Handbook of beta distributions
and its applications. Marcel Dekker, New York. MR2079703
[30] Hewitt, E. and Savage, L.J. (1955). Symmetric measures on Cartesian
products. Transactions of the American Mathematical Society 80 470–501.
MR0076206
[31] Fang, K.-T. and Kotz, S. and Ng, K.-W. (1990). Symmetric multivari-
ate and related distributions. Chapman and Hall, London. MR1071174
[32] Feller, W. (1966). An introduction to probability theory and its applica-
tions, volume II, 2nd edition. John Wiley and Sons, Inc., Hoboken.
[33] Ferguson, T.S. (1973). A Bayesian analysis of some nonparametric
problems. Annals of Statistics 1 209–230. MR0350949
[34] Ferguson, T.S. (1974). Prior distributions on spaces of probability mea-
sures. Annals of Statistics 2 615–629. MR0438568
[35] Frank, M.J. (1979). On the simultaneous associativity of F (x, y) and
x + y − F (x, y). Aequationes Mathematicae 19 194–226. MR0556722
[36] Galambos, J. (1975). Order statistics of samples from multivariate dis-
tributions. Journal of the American Statistical Association 70 674–680.
MR0405714
[37] Genest, C. and Nešlehová, J.G. (2017). When Gumbel met Galambos.
In: Copulas and Dependence Models With Applications: Contributions
in Honor of Roger B. Nelsen (M. Úbeda Flores, E. de Amo Artero, F.
Durante, J. Fernández Sánchez, Eds.), Springer, 83–93. MR3822198
[38] Genest, C. and Nešlehová, J.G. and Rivest, L.-P. (2018). The class
of multivariate max-id copulas with ℓ_1-norm symmetric exponent measure.
Bernoulli 24 3751–3790. MR3788188
[39] Genest, C. and Rivest, L.-P. (1989). Characterization of Gumbel’s
family of extreme value distributions. Statistics and Probability Letters 8
207–211. MR1024029
[40] Gnedin, A.V. (1995). On a class of exchangeable sequences. Statistics
and Probability Letters 25 351–355. MR1363235
[41] Gumbel, E.J. (1960). Bivariate exponential distributions. Journal of the
American Statistical Association 55 698–707. MR0116403
[42] Gumbel, E.J. (1961). Bivariate logistic distributions. Journal of the
American Statistical Association 56 335–349. MR0158451
[43] Hakassou, A. and Ouknine, Y. (2013). IDT processes and associated
Lévy processes with explicit constructions. Stochastics 85:6 1073–1111.
MR3176501
[44] Hausdorff, F. (1921). Summationsmethoden und Momentfolgen I. Mathematische Zeitschrift 9:3-4 74–109. MR1544453
[45] Hausdorff, F. (1923). Momentenproblem für ein endliches Intervall.
Mathematische Zeitschrift 16 220–248. MR1544592
[46] Herbertsson, A. and Rootzén, H. (2008). Pricing kth-to-default swaps
under default contagion: the matrix-analytic approach. Journal of Com-
putational Finance 12 49–72. MR2504900
[47] Hering, C. and Hofert, M. and Mai, J.-F. and Scherer, M. (2010).
Constructing hierarchical Archimedean copulas with Lévy subordinators.
Journal of Multivariate Analysis 101 1428–1433. MR2609503
[48] Hjort, N.L. (1990). Nonparametric Bayes estimators based on beta pro-
cesses in models for life history data. Annals of Statistics 18:3 1259–1294.
MR1062708
[49] Hofert, M. and Scherer, M. (2011). CDO pricing with nested
Archimedean copulas. Quantitative Finance 11 775–787. MR2800641
[50] Joe, H. (1997). Multivariate models and dependence concepts. Chapman &
Hall/CRC, Boca Raton. MR1462613
[51] Kalbfleisch, J.D. (1978). Non-parametric Bayesian analysis of survival
time data. Journal of the Royal Statistical Society Series B 40:2 214–221.
MR0517442
[52] Kallenberg, O. (1982). A dynamical approach to exchangeability. In:
Exchangeability in Probability and Statistics, edited by G. Koch and F.
Spizzichino, North-Holland Publishing Company, 87–96. MR0675967
[53] Karlin, S. and Shapley, L.S. (1953). Geometry of moment spaces.
Memoirs of the American Mathematical Society 12:93. MR0059329
[54] Kimberling, C.H. (1974). A probabilistic interpretation of complete
monotonicity. Aequationes Mathematicae 10 152–164. MR0353416
[55] Kingman, J.F.C. (1967). Completely random measures. Pacific Journal
of Mathematics 21:1 59–78. MR0210185
[56] Kingman, J.F.C. (1972). On random sequences with spherical symmetry.
Biometrika 59 492–494. MR0343420
[57] Kingman, J.F.C. (1978). Uses of exchangeability. Annals of Probability
6:2 183–197. MR0494344
[58] Konstantopoulos, T. and Yuan, L. (2019). On the extendibility of
finitely exchangeable probability measures. Transactions of the American
Mathematical Society 371 7067–7092. MR3939570
[59] Kopp, C. and Molchanov, I. (2018). Series representations of time-
stable stochastic processes. Probability and Mathematical Statistics 38:2
299–315. MR3896713
[60] Liggett, T.M. and Steiff, J.E. and Tóth, B. (2007). Statistical me-
chanical systems on complete graphs, infinite exchangeability, finite exten-
sions and a discrete finite moment problem. Annals of Probability 35:3
867–914. MR2319710
[61] Lijoi, A. and Prünster, I. and Walker, S.G. (2008). Posterior anal-
ysis for some classes of nonparametric models. Journal of Nonparametric
Statistics 20:5 447–457. MR2424252
[62] Lindskog, F. and McNeil, A.J. (2003). Common Poisson shock models:
applications to insurance and credit risk modelling. ASTIN Bulletin 33:2
209–238. MR2035051
[63] Lukacs, E. (1955). A characterization of the gamma distribution. Annals
of Mathematical Statistics 26 319–324. MR0069408
[64] Mai, J.-F. (2018). Extreme-value copulas associated with the expected
scaled maximum of independent random variables. Journal of Multivariate
Analysis 166 50–61. MR3799634
[65] Mai, J.-F. (2019). Simulation of hierarchical Archimedean copulas be-
yond the completely monotone case. Dependence Modeling 7 202–214.
MR3977499
[66] Mai, J.-F. (2020). Canonical spectral representation for exchangeable
max-stable sequences. Extremes 23 151–169. MR4064608
[67] Mai, J.-F. (2020). The de Finetti structure behind some norm-symmetric
multivariate densities with exponential decay. Dependence Modeling 8
210–220. MR4156799
[68] Mai, J.-F. and Schenk, S. and Scherer, M. (2016). Exchangeable
exogenous shock models. Bernoulli 22 1278–1299. MR3449814
[69] Mai, J.-F. and Schenk, S. and Scherer, M. (2016). Analyzing model
robustness via a distortion of the stochastic root: a Dirichlet prior ap-
proach. Statistics and Risk Modeling 32 177–195. MR3507979
[70] Mai, J.-F. and Schenk, S. and Scherer, M. (2017). Two novel char-
acterizations of self-decomposability on the positive half-axis. Journal of
Theoretical Probability 30 365–383. MR3615092
[71] Mai, J.-F. and Scherer, M. (2009). Lévy-frailty copulas. Journal of
Multivariate Analysis 100 1567–1585. MR2514148
[72] Mai, J.-F. and Scherer, M. (2011). Reparameterizing Marshall–Olkin
copulas with applications to sampling. Journal of Statistical Computation
and Simulation 81 59–78. MR2747378
[73] Mai, J.-F. and Scherer, M. (2012). H-extendible copulas. Journal of
Multivariate Analysis 110 151–160. MR2927515
[74] Mai, J.-F. and Scherer, M. (2014). Characterization of extendible dis-
tributions with exponential minima via processes that are infinitely divis-
ible with respect to time. Extremes 17 77–95. MR3179971
[75] Mai, J.-F. and Scherer, M. (2017). Simulating copulas, 2nd edition.
World Scientific Publishing, Singapore. MR3729417
[76] Mai, J.-F. and Scherer, M. (2019). Subordinators which are in-
finitely divisible w.r.t. time: construction, properties, and simulation of
max-stable sequences and infinitely divisible laws. ALEA: Latin Amer-
ican Journal of Probability and Mathematical Statistics 16:2 977–1005.
MR3999795
[77] Mai, J.-F. and Scherer, M. and Shenkman, N. (2013). Multivariate
geometric laws, (logarithmically) monotone sequences, and infinitely di-
visible laws. Journal of Multivariate Analysis 115 457–480. MR3004570
[78] Mansuy, R. (2005). On processes which are infinitely divisible with re-
spect to time. Working paper, arXiv:math/0504408.
[79] Marshall, A.W. and Olkin, I. (1967). A multivariate exponential
distribution. Journal of the American Statistical Association 62 30–44.
MR0215400
[80] Marshall, A.W. and Olkin, I. (1979). Inequalities: theory of majoriza-
tion and its applications. Academic Press, New York. MR0552278
[81] McNeil, A.J. and Frey, R. and Embrechts, P. (2005). Quantitative
risk management. Princeton University Press, Princeton. MR2175089
[82] McNeil, A.J. (2008). Sampling nested Archimedean copulas. Journal of
Statistical Computation and Simulation 78 567–581. MR2516827
[83] McNeil, A.J. and Nešlehová, J. (2009). Multivariate Archimedean
copulas, d-monotone functions and ℓ_1-norm symmetric distributions. An-
nals of Statistics 37:5B 3059–3097. MR2541455
[84] McNeil, A.J. and Nešlehová, J. (2010). From Archimedean to
Liouville copulas. Journal of Multivariate Analysis 101 1772–1790.
MR2651954
[85] Molchanov, I. (2008). Convex geometry of max-stable distributions.
Extremes 11:3 235–259. MR2429906
[86] Müller, A. and Stoyan, D. (2002). Comparison methods for stochastic
models and risks. John Wiley and Sons, Chichester (2002). MR1889865
[87] Papangelou, F. (1989). On the Gaussian fluctuations of the critical
Curie-Weiss model in statistical mechanics. Probability Theory and Re-
lated Fields 83 265–278. MR1012501
[88] Pestman, W.R. (2009). Mathematical statistics, 2nd edition. De Gruyter,
Berlin. MR2516478
[89] Puccetti, G. and Wang, R. (2015). Extremal dependence concepts.
Statistical Science 30:4 485–517. MR3432838
[90] Rachev, S.T. and Rüschendorf, L. (1991). Approximate indepen-
dence of distributions on spheres and their stability properties. Annals
of Probability 19 1311–1337. MR1112418
[91] Resnick, S.I. (1987). Extreme values, regular variation and point pro-
cesses. Springer-Verlag, Berlin. MR0900810
[92] Ressel, P. (1985). de Finetti type theorems: an analytical approach.
Annals of Probability 13 898–922. MR0799427
[93] Ryll-Nardzewski, C. (1957). On stationary sequences of random vari-
ables and the de Finetti equivalence. Colloquium Mathematicum 4 149–
156. MR0088823
[94] Sato, K.-I. (1999). Lévy processes and infinitely divisible distributions.
Cambridge University Press, Cambridge. MR1739520
[95] Scarsini, M. (1985). Lower bounds for the distribution function of a k-
dimensional n-extendible exchangeable process. Statistics and Probability
Letters 3 57–62. MR0792789
[96] Schilling, R. and Song, R. and Vondracek, Z. (2010). Bernstein
functions. De Gruyter, Berlin. MR2978140
[97] Schoenberg, I.J. (1938). Metric spaces and positive definite func-
tions. Transactions of the American Mathematical Society 44 522–536.
MR1501980
[98] Shaked, M. (1977). A concept of positive dependence for exchangeable
random variables. Annals of Statistics 5 505–515. MR0436414
[99] Shaked, M. and Spizzichino, F. and Suter, F. (2002). Nonhomoge-
neous birth processes and ℓ_∞-spherical densities, with applications in reli-
ability theory. Probability in the Engineering and Informational Sciences
16 271–288. MR1914427
[100] Sibley, D.A. (1971). A metric for weak convergence of distribution func-
tions. Rocky Mountain Journal of Mathematics 1:3 427–430. MR0314089
[101] Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs
marges. Publ. Inst. Statist. Univ. Paris 8 229–231. MR0125600
[102] Sloot, H. (2020). The deFinetti representation of generalised Marshall–
Olkin sequences. Dependence Modeling 8:1 107–118. MR4121354
[103] Spizzichino, F. (1982). Extendibility of symmetric probability distribu-
tions and related bounds. In: Exchangeability in Probability and Statis-
tics, edited by G. Koch and F. Spizzichino, North-Holland Publishing
Company, 313–320. MR0675986
[104] Steutel, F.W. and van Harn, K. (2003). Infinite divisibility of proba-
bility distributions on the real line. CRC Press, Boca Raton. MR2011862
[105] Taleb, N.N. (2020). Statistical consequences of fat tails. STEM Academic
Press.
[106] Williamson, R.E. (1956). Multiply monotone functions and their
Laplace transforms. Duke Mathematical Journal 23 189–207. MR0077581
[107] Zhu, W. and Wang, C.-W. and Tan, K.S. (2016). Structure and esti-
mation of Lévy subordinated hierarchical Archimedean copulas (LSHAC):
theory and empirical tests. Journal of Banking and Finance 69 20–36.