Before we start, let us clarify some general notation used throughout, while
some section-specific notations are introduced where they appear.
Table 1. Summary of main results surveyed. Whereas most appearing notations are introduced in the main body of the article, here L[X] denotes the Laplace transform of a random variable X ≥ 0, and ⟨x, y⟩ := ∑_{k=1}^d x_k y_k.
its generalized inverse, see [25] for background. Any distribution function C
of a random vector U = (U1 , . . . , Ud ) whose components Uk are uniformly dis-
tributed on [0, 1] is called a copula, see [75] for a textbook treatment. We further
recall that an arbitrary survival function F̄ of some d-variate random vector X
can always be written1 as F̄(x) = Ĉ( P(X1 > x1), . . . , P(Xd > xd) ), where Ĉ
is a copula, called a survival copula for F̄ , and it is uniquely determined in
case the random variables X1 , . . . , Xd have continuous distribution functions.
This is the survival analogue of the so-called Theorem of Sklar, due to [101].
The Theorem of Sklar itself states that the distribution function F of X can
be written as F(x) = C( P(X1 ≤ x1), . . . , P(Xd ≤ xd) ) for a copula C, called a
copula for F . The relationship between a copula C and its survival copula Ĉ is
that if (U1 , . . . , Ud ) ∼ C then (1 − U1 , . . . , 1 − Ud ) ∼ Ĉ.
A notation of specific interest in the present survey:
We denote by H the set of all distribution functions of real-valued random
variables, and by H+ the subset containing all elements F such that x < 0 implies
F (x) = 0, i.e. distribution functions of non-negative random variables. Elements
F ∈ H are right-continuous, and we denote by F(x−) := lim_{t↑x} F(t) their left-continuous versions. If X is some Hausdorff space, we denote by M_+^1(X) the set of all probability measures on the measurable space (X, B(X)), where B(X) denotes
the Borel-σ-algebra of X. This notation is borrowed from [8]. Now recall that
H is metrizable (hence in particular Hausdorff) when topologized with the so-
called Lévy metric that induces weak convergence of the associated probability
distributions on R, see [100]. Consequently, we denote by M_+^1(H) the set of all probability measures on H. A random element H = {Ht}_{t∈R} ∼ γ ∈ M_+^1(H)
is almost surely a càdlàg stochastic process, and a common way of treating
probability laws of such objects works via the so-called Skorohod metric on the
space of càdlàg paths. However, even though the Skorohod topology and the
topology induced by the Lévy metric are not identical, see [88, p. 327-328], their
induced Borel-σ-algebras on the set H can indeed be shown to coincide, so that
our viewpoint is equivalent.
Xk := f (Uk , H), k = 1, . . . , d,
Since π was arbitrary, this shows that the distribution function (hence law) of
X is invariant with respect to permutations of its components.
Exchangeability is a property which is convenient to investigate by means
of Analysis, whereas the notion “conditionally iid”, in which we are interested,
is a priori purely probabilistic and more difficult to investigate. Unfortunately, exchangeability is only a necessary but not a sufficient condition for the solution of our problem. For instance, a bivariate normal distribution is obviously exchangeable if and only if the two means and the two variances are identical, even when the correlation coefficient is negative. However, Example 1.1 and Lemma 1.2 below
show that conditionally iid random vectors necessarily have non-negative cor-
relation coefficients. One can show in general that the correlation coefficient,
if existent, between two components of an exchangeable random vector on Rd
is bounded from below by −1/(d − 1), see, e.g., [1, p. 7]. As the dimension d
tends to infinity, this lower bound becomes zero. Even better, the difference
between exchangeability and a conditionally iid structure vanishes completely as
the dimension d tends to infinity, which is the content of de Finetti’s Theorem.
Theorem 1.1 (de Finetti’s Theorem). Let {Xk }k∈N be an infinite sequence of
random variables on some probability space (Ω, F, P). The sequence {Xk }k∈N is
exchangeable, meaning that each finite subvector is exchangeable, if and only if
it is iid conditioned on some σ-field H ⊂ F. In this case, H equals almost surely
the tail-σ-field of {Xk }k∈N , which is given by ∩n≥1 σ(Xn , Xn+1 , . . .).
Proof. Originally due to [19]. We refer to [1] for a proof based on the reversed
martingale convergence theorem, which is briefly sketched. Of course, we only
need to verify that exchangeability implies conditionally iid, as the converse
follows from Lemma 1.1. For the sake of a more convenient notation we assume
the infinite sequence {Xk }k∈N0 is indexed by N0 = N ∪ {0}, and we define σ-
algebras F_{−n} := σ(X_n, X_{n+1}, . . .) for n ∈ N_0. The tail-σ-field of the sequence is
H := ∩n≤0 Fn . In order to establish the claim, three auxiliary observations are
helpful with an arbitrary bounded, measurable function g fixed:
(i) Exchangeability implies (X_0, X_1, . . .) =_d (X_0, X_{n+1}, . . .) for arbitrary n ∈ N. This implies E[g(X_0) | F_{−1}] =_d E[g(X_0) | F_{−(n+1)}], n ∈ N.
(ii) The sequence Yn := E[g(X0 ) | Fn ], n ≤ 0, is easily checked to be a reversed
martingale. The reversed martingale convergence theorem implies that Yn
converges almost surely and in L1 to E[g(X0 ) | H]. See [23, p. 264 ff] for
background on reversed martingales (convergence).
(iii) Letting n → ∞ in (i), we observe from (ii) that Y_{−1} =_d E[g(X_0) | H]. We
can further replace this equality in law by an almost sure equality, since
H ⊂ F−1 and the second moments of Y−1 and E[g(X0 ) | H] coincide. Thus,
the sequence {Y−n }n∈N is almost surely a constant sequence.
With these auxiliary observations we may now finish the argument. On the one
hand, exchangeability implies (X_0, X_{n+1}, . . .) =_d (X_n, X_{n+1}, . . .), which gives the
almost sure equality E[g(X0 ) | F−(n+1) ] = E[g(Xn ) | F−(n+1) ]. Taking E[. | H]
on both sides of this equation implies with the tower property of conditional
expectation that E[g(X0 ) | H] = E[g(Xn ) | H]. Since g was arbitrary, X0 and
Xn are identically distributed conditioned on H, and since n was arbitrary all
members of the sequence are identically distributed conditioned on H. To verify
conditional independence, let g1 , g2 be two bounded, measurable functions. For
n ≥ 1 arbitrary, using (iii) in the third equality below, we compute
(in general) smaller set M∗ , and is typically an important first step towards
a solution to Problem 1.1. However, the second (typically harder) step from
(finite) exchangeability to conditionally iid is usually the more important and
more interesting step from both a theoretical and practical perspective. The
algebraic structure of a general theory on (finite) exchangeability is naturally
of a different, often more combinatorial character, whereas “conditionally iid”
by virtue of de Finetti’s Theorem naturally is the concept of an infinite limit
(of exchangeability) so that techniques from Analysis enter the scene. Thus, we
feel it is useful to provide an account with a more narrow scope on conditionally
iid, even though for some of the presented examples we are well aware that an
interesting (finite) exchangeable theory is also viable. On the other hand, many
references consider the case when the components of X take values in more
general spaces than R, for instance in R^n (i.e. matrices instead of vectors) or
even function spaces. In particular, de Finetti’s Theorem 1.1 can be generalized
in this regard, seminal references are [30, 92]. Research in this direction is by
nature more abstract and thus maybe less accessible for a broader audience,
or for more practically oriented readers. One goal of the present survey is to
provide an account that is not exclusively geared towards theorists but also to
practitioners of the theory, and in particular to point out relationships to classical
statistical probability laws on Rd . We believe that a limitation of this survey’s
scope to the real-valued case is still rich enough to provide a solid basis for an
interesting and accessible theory. In fact, we seek to demonstrate that Prob-
lem 1.1 has been solved satisfactorily in quite a number of highly interesting
cases, and the solutions contain interesting links to different probabilistic top-
ics. Of course, it might be worthwhile to ponder about generalizations of some
of the presented results to more abstract settings in the future (unless already
done) – but purposely these lie outside the present survey.
Why is Problem 1.1 interesting at all?
Broadly speaking, because of two reasons: (a) conditionally iid models are conve-
nient for applications, and (b) solutions to the extendibility problem sometimes
rely on compelling relationships between a priori different theories.
(a) Roughly speaking, conditionally iid models allow one to model (strong and
weak) dependence between random variables in a way that features many
desirable properties which are tailor-made for applications, in particular
when the dimension d is large. Firstly, a conditionally iid random vector is
“dimension-free” in the sense that components can be added or removed
from X without altering the basic structure of the model, which simply
follows from the fact that an iid sequence remains an iid sequence after
addition or removal of certain members. This may be very important in
applications that require a regular change of dimension, e.g. the readjust-
ment of a large credit portfolio in a bank, when old credits leave and new
credits enter the portfolio frequently. Secondly, if X has a distribution
from a parametric family, the parameters of this family are typically de-
termined by the parameters of the underlying latent probability measure
γ, irrespective of the dimension d. Consequently, the number of param-
eters does usually not grow significantly with the dimension d and may
be controlled at one’s personal taste. This is an enormous advantage for
model design in practice, in particular since the huge degree of freedom/
huge number of parameters in a high-dimensional dependence model is
often more bane than boon. Thirdly, fundamental statistical theorems re-
lying on the iid assumption, like the law of large numbers, may still be
applied in a conditionally iid setting, making such models very tractable.
Last but not least, in dependence modeling a “factor-model way of think-
ing” is very intuitive, e.g. it is well-established in the multivariate normal
case (thinking of principal component analysis etc.). On a high level, if one
wishes to design a multi-factor dependence model within a certain family
of distributions M, an important first step is to determine the one-factor
subfamily M∗ . Having found a conditionally iid stochastic representation
of M∗ , the design of multi-factor models is sometimes obvious from there,
see also paragraph 7.3.
(b) The solution to Problem 1.1 is often mathematically challenging and
compelling. It naturally provides an interesting connection between the
“static” world of random vectors and the “dynamic” cosmos of (one-
dimensional) stochastic processes. The latter enter the scene because the
latent factor being responsible for the dependence in a conditionally iid
model for X may canonically be viewed as a non-decreasing stochastic
process (a random distribution function), which is further explained in
Section 1.3 below. In particular, for some classical families M of multi-
variate laws from the statistical literature the family M∗ in Problem 1.1
is conveniently described in terms of a well-studied family of stochastic
processes like Lévy subordinators, Sato subordinators, or processes which
are infinitely divisible with respect to time. Moreover, in order to formally
establish the aforementioned link between these two seemingly different
fields of research the required mathematical techniques involve classical
theories from Analysis like Laplace transforms, Bernstein functions, and
moment problems.
Example 1.1 (The multivariate normal law). We want to solve Problem 1.1
for the family
We claim that M∗ equals the set of all multivariate normal distributions satis-
fying
μ = (μ, . . . , μ),   Σ =
⎡ σ²    ρσ²   · · ·   ρσ² ⎤
⎢ ρσ²   σ²            ρσ² ⎥
⎢  ⋮            ⋱      ⋮  ⎥
⎣ ρσ²   ρσ²   · · ·   σ²  ⎦ ,   μ ∈ R, σ > 0, ρ ∈ [0, 1].
Proof. Consider X = (X1 , . . . , Xd ) ∼ N (μ, Σ) on a probability space (Ω, F, P)
for μ = (μ1 , . . . , μd ) ∈ Rd , and Σ = (Σi,j ) ∈ Rd×d a positive definite matrix.
If we assume that the law of X is in M∗ , it follows that there is a sub-σ-
algebra H ⊂ F such that the components X1 , . . . , Xd are iid conditioned on H.
Consequently,
μk = E[Xk ] = E[E[Xk | H]] = E[E[X1 | H]] = E[X1 ] = μ1 , (1.1)
irrespectively of k = 1, . . . , d. The analogous reasoning also holds for the second
moment of Xk , which implies Σk,k = Σ1,1 for all k. Moreover,
E[X_i X_j] = E[ E[X_i | H] E[X_j | H] ] = E[ E[X_1 | H]² ] ≥ μ_1²,   (1.2)
for arbitrary components i = j, where we used the conditional iid structure and
Jensen’s inequality. This finally implies that all off-diagonal elements of Σ are
identical and non-negative.
Conversely, let μ ∈ R, σ > 0, and ρ ∈ [0, 1]. Consider a probability space on
which d + 1 iid standard normally distributed random variables M, M1 , . . . , Md
are defined. We define
X_k := μ + σ ( √ρ M + √(1 − ρ) M_k ),   k = 1, . . . , d.   (1.3)
It is readily observed that X = (X1 , . . . , Xd ) has a multivariate normal law
with pairwise correlation coefficients all being equal to ρ, and all components
having mean μ and variance σ 2 . Notice in particular that the non-negativity
of ρ is important in the construction (1.3) because the square root is not well-
defined otherwise. The components of X are obviously conditionally iid given
the σ-algebra H generated by M . Hence the law of X is in M∗ .
There are already some interesting remarks to be made about this simple ex-
ample. First of all, it is observed that the family M∗ is always three-parametric,
irrespective of the dimension d. This stands in glaring contrast to M, which has
d + d (d + 1)/2 parameters in dimension d. Second, in general a Monte Carlo
simulation of a d-dimensional normal random vector X requires a Cholesky de-
composition of the matrix Σ, which typically has computational complexity of
order d³, see [75, Algorithm 4.3, p. 182]. In contrast, the simulation of X with
law in M∗ according to (1.3) has only linear complexity in the dimension d.
Especially in large dimensions this can be a critical improvement of computa-
tional speed. Third, the proof above shows that each random vector with law in
M∗ may actually be viewed as the first d components of an infinite sequence of
conditionally iid random variables such that arbitrary finite n-margins have a
multivariate normal law. Thus, we have actually solved the refined Problem 1.3
to be introduced in the upcoming paragraph.
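To make the complexity remark concrete, the following short Python sketch (ours, not part of the article; numpy is assumed and the function name is illustrative) simulates X via the one-factor construction (1.3) in O(d) time and checks mean, variance and pairwise correlation empirically.

import numpy as np

def sample_conditionally_iid_normal(d, mu, sigma, rho, rng):
    """Sample X = (X_1, ..., X_d) from the one-factor normal model (1.3).

    Conditioned on the global factor M, the components are iid; the cost is
    O(d) and no Cholesky decomposition of a d x d matrix is needed.
    """
    assert 0.0 <= rho <= 1.0
    M = rng.standard_normal()          # latent factor, drawn once
    Mk = rng.standard_normal(d)        # idiosyncratic iid factors
    return mu + sigma * (np.sqrt(rho) * M + np.sqrt(1.0 - rho) * Mk)

rng = np.random.default_rng(0)
X = np.array([sample_conditionally_iid_normal(1000, mu=1.0, sigma=2.0, rho=0.3, rng=rng)
              for _ in range(5000)])
# empirical mean ~ 1, variance ~ 4, pairwise correlation ~ 0.3
print(X.mean(), X.var(), np.corrcoef(X[:, 0], X[:, 1])[0, 1])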
Given M ⊂ M_+^1(R^d) we denote the pre-image of M∗ under Θ_d in M_+^1(H) by
Θ_d^{−1}(M∗). In words, it equals the subset of M_+^1(H) which consists of all prob-
ability laws γ of stochastic processes {Ht }t∈R such that X of the canonical
construction (1.4) has a law in M, hence in M∗ . From this equivalent viewpoint
our motivating Problem 1.1 becomes
Definition 1.2 (Conditionally iid respecting (P)). Let (P) be some property
M_{n,(P)} := {μ ∈ M_+^1(R^n) : μ has property (P)} ⊂ M_+^1(R^n).
If the given family M consists only of probability laws on [0, ∞)d , it is con-
venient to slightly reformulate the stochastic model (1.4). Clearly, if we have
non-negative components, necessarily Ht = 0 for all t < 0 almost surely. There-
fore, without loss of generality we may assume that H = {Ht }t≥0 is indexed
by t ∈ [0, ∞). Moreover, applying the substitution z = − log(1 − F ) it trivially
holds true that
H_+ = { t ↦ 1 − exp(−z(t)) : z : [0, ∞) → [0, ∞] non-decreasing, right-continuous, with z(0) ≥ 0 and lim_{t→∞} z(t) = ∞ }.
general properties of M∗ .
= E[ P( X_1^{(1)} > X_1^{(2)}, X_2^{(1)} > X_2^{(2)} | H ) + P( X_1^{(1)} < X_1^{(2)}, X_2^{(1)} < X_2^{(2)} | H ) ]
= E[ ( ∫ H^{(2)}_{x−} dH^{(1)}_x )² ] + E[ ( ∫ H^{(1)}_{x−} dH^{(2)}_x )² ]

and analogously

P( (X_1^{(1)} − X_1^{(2)}) (X_2^{(1)} − X_2^{(2)}) < 0 )
= E[ P( (X_1^{(1)} − X_1^{(2)}) (X_2^{(1)} − X_2^{(2)}) < 0 | H ) ]
= 2 E[ ∫ H^{(2)}_{x−} dH^{(1)}_x · ∫ H^{(1)}_{x−} dH^{(2)}_x ],
see [80] for a textbook account on the topic. A vector a = (a_1, . . . , a_d) is said to majorize a vector b = (b_1, . . . , b_d) if

∑_{k=n}^d a_{[k]} ≥ ∑_{k=n}^d b_{[k]},   n = 2, . . . , d,    and    ∑_{k=1}^d a_k = ∑_{k=1}^d b_k.
Intuitively, the entries of b are “closer to each other” than the entries of a,
even though the sum of all entries is identical for both vectors. For instance,
the vector (1, 0, . . . , 0) majorizes the vector (1/2, 1/2, 0, . . . , 0), which majorizes
(1/3, 1/3, 1/3, 0, . . . , 0), and so on.
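As a small illustration of this definition, here is a hypothetical Python helper (ours, assuming numpy) that checks whether a majorizes b by comparing partial sums of the largest entries, which is equivalent to the displayed condition.

import numpy as np

def majorizes(a, b):
    """Check whether the vector a majorizes the vector b.

    Equivalent to the displayed condition: the sum of the m largest entries of a
    dominates that of b for every m, and the total sums coincide.
    """
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    if a.shape != b.shape:
        raise ValueError("vectors must have the same length")
    if not np.isclose(a.sum(), b.sum()):
        return False
    tail_a = np.cumsum(a[::-1])[:-1]   # sums of the 1, 2, ..., d-1 largest entries
    tail_b = np.cumsum(b[::-1])[:-1]
    return bool(np.all(tail_a >= tail_b - 1e-12))

print(majorizes([1, 0, 0, 0], [0.5, 0.5, 0, 0]))          # True
print(majorizes([0.5, 0.5, 0, 0], [1/3, 1/3, 1/3, 0]))    # True
print(majorizes([1/3, 1/3, 1/3, 0], [1, 0, 0, 0]))        # False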
Lemma 1.4 (A link to majorization). Consider X with law in M∗ . Further, let
Y = (Y1 , . . . , Yd ) be a random vector with components that are iid and satisfy
Y_1 =_d X_1.

∑_{k=1}^d F_{X[k]}(x) = ∑_{k=1}^d F_{X_k}(x) = d F_{X_1}(x) = d F_{Y_1}(x) = ∑_{k=1}^d F_{Y_k}(x) = ∑_{k=1}^d F_{Y[k]}(x).
Since F_{X[1]}(x) ≥ . . . ≥ F_{X[d]}(x) and F_{Y[1]}(x) ≥ . . . ≥ F_{Y[d]}(x), for part (a) we have to show that

∑_{k=1}^n F_{X[k]}(x) ≤ ∑_{k=1}^n F_{Y[k]}(x),   n = 1, . . . , d − 1.
h_{n,d}(p) := ∑_{k=1}^n ∑_{i=k}^d (d choose i) p^i (1 − p)^{d−i},   p ∈ [0, 1],
where Jensen’s inequality has been used. Making use of the relation E[Z] =
∞ 0
0
1 − FZ (z) dz − −∞ FZ (z) dz for real-valued random variables Z, part (b)
is obtained from (a) for the case g(x) = x. For the general case, one simply
has to observe that the law of g(X1 ), . . . , g(Xd ) is also in M∗ and due to
monotonicity of g we have either g(X[1] ) ≤ . . . ≤ g(X[d] ) in the non-decreasing
case or g(X[d] ) ≤ . . . ≤ g(X[1] ) in the non-increasing case.
Intuitively, statement (b) in case g(x) = x states that the expected values
of the order statistics E[X[k] ], k = 1, . . . , d, are closer to each other than the
respective values if the components of X were iid (and not only conditionally
iid). Intuitively, the components of a random vector X with components that
are conditionally iid are thus less spread out than the components of a ran-
dom vector with iid components. Thus, Lemmata 1.2, 1.3 and 1.4 show that
dependence models built from a conditionally iid setup can only capture the
situation of components being “more clustered” than independence, which is
loosely interpreted as “positive dependence”. Generally speaking, negative de-
pendence concepts are more complicated than positive dependence concepts in
dimensions d ≥ 3, the interested reader is referred to [89] for a nice overview
and references dealing with such concepts.
Whereas Lemmata 1.2, 1.3 and 1.4 provide three particular quantifications
for positive dependence of a conditionally iid probability law, many other pos-
sible concepts of positive dependence can be found in the literature, a textbook
account on the topic is [86]. [95, Theorem 4] claims that if X = (X1 , . . . , Xd ) is
conditionally iid and x → P(X1 ≤ x) is continuous, then
P(X ≤ x) ≥ ∏_{k=1}^d P(X_k ≤ x_k),   x ∈ R^d,
contradicting positive lower orthant dependency. Notice that Kendall’s Tau for
X is exactly equal to zero, and also the correlation coefficient between the compo-
nents of X equals zero. Figure 2 depicts a scatter plot of 1000 samples from X.
In contrast to Example 1.2, [24] prove that the weaker property P(X1 ∈ A, . . . , Xd ∈ A) ≥ P(X1 ∈ A)^d does hold true for conditionally iid X and an arbitrary measurable set A ⊂ R. This makes clear that a decisive point in Example 1.2 is that the considered x_i are different.
exactly the same value with positive probability if and only if their common
distribution function has at least one jump, the claim follows.
The following result is shown in [98, Proposition 4.2], but we present a slightly
different proof.
Lemma 1.6 (Closure under convergence in distribution). If X (n) are condition-
ally iid and converge in distribution to X, then the law of X is also conditionally
iid.
Proof. Since we only deal with a statement in distribution, we are free to as-
sume that each X^(n) is represented as in (1.4) from some stochastic process H^(n) = {H_t^(n)}_{t∈R}, and all objects are defined on the same probability space
(Ω, F, P). The random objects H (n) take values in the set of distribution func-
tions of random variables taking values in [−∞, ∞]. This set is compact by
Helly’s Selection Theorem and Hausdorff when equipped with the topology of
pointwise convergence at all continuity points of the limit, see [100]. Thus, the
probability measures on this set are a compact set by [3, Corollary II.4.2, p.
104]. This implies that we find a convergent subsequence {nk }k∈N ⊂ N such
that H (nk ) converges in distribution to some limiting stochastic process H,
which takes itself values in the set of distribution functions of random variables
taking values in [−∞, ∞]. It is now not difficult to see that
P(X_1 ≤ x_1, . . . , X_d ≤ x_d) = lim_{k→∞} P(X_1^{(n_k)} ≤ x_1, . . . , X_d^{(n_k)} ≤ x_d)
where bounded convergence is used in the last equality. This implies that the
law of X can be constructed canonically like in (1.4), hence X is conditionally
iid. Finally, since X is assumed to take values in Rd , necessarily H is almost
surely the distribution function of a random variable taking values in R (instead
of [−∞, ∞]).
from where the claimed equivalence can now be deduced easily. Notice that the
conditionally iid structure implies that d can be chosen arbitrarily and the law
of H is determined uniquely by the law of an infinite exchangeable sequence
{Xk }k∈N constructed as in (1.4) with d → ∞.
Example 1.3 (The multivariate normal law, again). The most prominent ra-
dially symmetric distribution is the multivariate normal law. Recalling Exam-
ple 1.1, it follows from (1.3) that N (μ, Σ)∗ , the conditionally iid normal laws,
are induced by the stochastic process {Ht}_{t∈R} given by

H_t = Φ( ( (t − μ)/σ − √ρ M ) / √(1 − ρ) ),   t ∈ R,   (1.6)
(1/d) ∑_{k=1}^d 1{X_k ≤ t} −→ H_t,   as d → ∞.
The stochastic nature of the process {Ht }t∈R clearly determines the law of X.
Conversely, Lemma 1.8 tells us that the law of the d-dimensional vector X does
not determine the law of the underlying latent factor {Ht }t∈R in general, but
accomplishes this in the limit as d → ∞. Given some infinite exchangeable
sequence of random variables {Xk }k∈N , it shows how we can recover its latent
random distribution function H.
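The following Python sketch (ours; it assumes numpy and scipy and re-uses the setting of Examples 1.1 and 1.3) illustrates Lemma 1.8 numerically: for one fixed realization of the latent factor M, the empirical distribution function of X_1, . . . , X_d approaches H_t from (1.6) as d grows.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu, sigma, rho, d = 0.0, 1.0, 0.5, 100_000

M = rng.standard_normal()                              # latent factor, drawn once
X = mu + sigma * (np.sqrt(rho) * M + np.sqrt(1 - rho) * rng.standard_normal(d))

t = np.linspace(-3, 3, 7)
empirical_H = (X[:, None] <= t).mean(axis=0)           # (1/d) sum_k 1{X_k <= t}
latent_H = norm.cdf(((t - mu) / sigma - np.sqrt(rho) * M) / np.sqrt(1 - rho))  # (1.6)

print(np.max(np.abs(empirical_H - latent_H)))          # small for large d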
A rather obvious property of the set M∗ is convexity.
Lemma 1.9 (M∗ is convex with extremal boundary the product measures).
If μ1, μ2 ∈ M∗ and ε ∈ (0, 1), then ε μ1 + (1 − ε) μ2 ∈ M∗. Furthermore, if
μ ∈ M∗ is extremal, meaning that μ = ε μ1 + (1 − ε) μ2 for some ε ∈ (0, 1) and
μ1, μ2 ∈ M∗ necessarily implies μ = μ1 = μ2, then μ is a product measure3.
Proof. The convexity of M∗ is an immediate transfer from the (obvious) convexity of M_+^1(H) under the mapping Θ_d, as the reader can readily check herself.
That product measures are extremal is also obvious. Finally, consider an ex-
tremal element μ ∈ M∗. Since μ is conditionally iid, there is a probability measure γ ∈ M_+^1(H) such that μ((−∞, x]) = ∫_H h(x_1) · · · h(x_d) γ(dh). We
choose a Borel set A ∈ H with γ(A) > 0. If γ(A) = 1 is the only possible
choice, γ is actually a Dirac measure at some element h ∈ H and μ is a prod-
uct measure, as claimed. Let us derive a contradiction otherwise, in which case
γ = γ(A) γ(· | A) + γ(A^c) γ(· | A^c) and both γ(· | A) and γ(· | A^c) are elements of M_+^1(H). We obtain a convex combination of μ, to wit
Since μ is extremal and γ(. | A) and γ(. | Ac ) are different by definition, we obtain
the desired contradiction.
For the sake of completeness, the following remark gives two equivalent con-
ditions for exchangeability of an infinite sequence of random variables.
Remark 1.2 (Conditions equivalent to infinite exchangeability). A result due
to [93] states that an infinite sequence {Xk }k∈N of random variables is exchange-
able (or, equivalently, conditionally iid by de Finetti’s Theorem) if and only if
the law of the infinite sequence {Xnk }k∈N is invariant with respect to the choice
of (increasing) subsequence {n_k}_{k∈N} ⊂ N. Another equivalent condition to exchangeability is that {X_k}_{k∈N} =_d {X_{τ+k}}_{k∈N} for an arbitrary finite stopping time
τ with respect to the filtration Fn := σ(X1 , . . . , Xn ), n ∈ N, see [52].
where the outer supremum is taken over all (non-zero) bounded, measurable
functions g : Rd → R, and the inner supremum in the denominator is taken
over all random vectors Y = (Y1 , . . . , Yd ) with iid components.
Proof. The proof of sufficiency is the difficult part, relying on functional analytic
methods, and we refer the interested reader to [58, Theorem 5.1], but provide
some intuition below. Necessity of the condition in Theorem 1.2 is the easy part,
as will briefly be explained. Without loss of generality we may assume that X
is represented by (1.4) with some stochastic process H ∼ γ ∈ M_+^1(H) and an
independent sequence of iid variates U1 , . . . , Ud uniformly distributed on [0, 1].
For arbitrary bounded and measurable g we observe
Consequently, the supremum over all such Y is bounded from above by 1/4,
hence the supremum over all g in the condition of Theorem 1.2 is at least two,
hence larger than one. The intuition behind this counterexample is that we
have found one particular bounded measurable g that addresses a distributional
property of X that sets it apart from any iid sequence. Indeed, the proof of
[58] relies on the Hahn-Banach Theorem and thus on a separation argument,
4 In addition to Theorem 1.2, [58] even consider more abstract spaces than R, and also pro-
vide a necessary and sufficient criterion for finite extendibility of the law of X = (X1 , . . . , Xd )
to an exchangeable law on Rn for n > d arbitrary.
since the set of conditionally iid laws can be viewed as a closed convex subset
of M_+^1(R^d) with extremal boundary comprising the laws with iid components,
see Lemma 1.9.
On the one hand, Theorem 1.2 is clearly a milestone with regards to the
present survey as it solves Problem 1.1 in the general case. On the other hand,
it is difficult to apply the derived condition in particular cases of Problem 1.1,
when the family M is some (semi-)parametric family of interest – simply because
the involved suprema are hard to evaluate, see also Example 1.4 below. On a
high level, Theorem 1.2 solves Problem 1.1 but not the refined Problem 1.3,
which depends on an additional dimension-independent property (P). However,
the most compelling results of the theory deal precisely with certain dimension-
independent properties (P) of interest, see the upcoming sections as well as
paragraph 7.5 for a further discussion. This is because the additional structure
provided by some property (P) and the search for structure-preserving exten-
sions is in many cases a more natural and more interesting problem than to
simply find some extension. We will see that the algebraic structure of this
problem is highly case-specific in general, i.e. heavily dependent on (P).
The following example shows that the supremum condition of Theorem 1.2
can lead to an NP-hard problem in general.
Notice that the denominator is equal to the absolute value of the maximal eigen-
value of G, the so-called spectral radius of G. As outlined before, this optimiza-
tion problem must be NP-hard, unless P=NP.
2. Binary sequences
We study probability laws on {0, 1}d , i.e. on the set of finite binary sequences. We
start with a short digression on the little moment problem, because it occupies
a commanding role, not only in this section but also in Section 4 below. For a
further discussion of the relation between the little moment problem and de Finetti's Theorem, the interested reader is also referred to [17].
Actually, before Bruno de Finetti published his seminal Theorem 1.1 in 1937, he
first published in [18] the same result for the simpler case of binary sequences. In
fact, he showed that there is a one-to-one correspondence between exchangeable
probability laws on infinite binary sequences and the set M_+^1([0, 1]) of probability laws on [0, 1].
We start with a random vector X = (X1 , . . . , Xd ) taking values in {0, 1}d . We
know from Lemma 1.1 that X needs to be exchangeable in order to possibly be
conditionally iid, so we concentrate on the exchangeable case. Let 1m , 0m denote
m-dimensional row vectors with all entries equal to one and zero, respectively,
and define
p_k := P( X = (1_k, 0_{d−k}) ),   k = 0, . . . , d.

p_k = ∇^{d−k} b_k,   k = 0, . . . , d,

b_k := ∑_{i=0}^{d−k} (d−k choose i) p_{d−i},   k = 0, . . . , d.
Proof. The equivalence of (c) and (b) relies on the truncated Hausdorff moment
problem and the identities
which are all readily verified. To show that (b) implies (a) works precisely along
the stochastic model with U as claimed, which is easily checked. To verify the
essential part (a) =⇒ (b) we may simply apply de Finetti’s Theorem 1.1 in
the special case of a binary sequence6 : (a) implies that we may without loss of
generality assume that the given random vector equals the first d members of
an infinite exchangeable binary sequence {Xk }k∈N . De Finetti’s Theorem 1.1,
and as a corollary Lemma 1.8, give us a random variable H ∼ γ ∈ M_+^1(H).
Since each Xk takes values only in {0, 1}, necessarily almost every path of H takes at most one value outside {0, 1}, namely H_t for t ∈ [0, 1). So we define M := 1 − H_{1/2} and observe that conditioned on M, the random variables Xk
are iid Bernoulli with success probability M . This implies the claim.
In words, the canonical stochastic model for conditionally iid X with values
in {0, 1}d is a sequence of d independent coin tosses with success probability
M which is identical for all coin tosses, but simulated once before the first coin
toss. We end this section with two examples of particular interest.
Example 2.1 (Pólya’s urn). Let r ∈ N and b ∈ N denote the numbers of red
and blue balls in an urn. Define a random vector X ∈ {0, 1}d as follows:
(i) Set k := 1.
(ii) Draw a ball at random from the urn.
(iii) Set Xk := 1 if the ball is red, and Xk := 0 otherwise.
(iv) Put the ball back into the urn with 1 additional ball of the same color.
(v) Increment k := k + 1.
(vi) If k = d + 1, stop, otherwise go to step (ii).
It is not difficult to observe that X is exchangeable, since
P(X = x) = ( ∏_{k=0}^{‖x‖_1 − 1} (r + k) · ∏_{k=0}^{d − ‖x‖_1 − 1} (b + k) ) / ∏_{k=0}^{d−1} (r + b + k),   x ∈ {0, 1}^d,
depends on x only through ‖x‖_1. Like in Theorem 2.1 we denote by p_k the probability P(X = x) if ‖x‖_1 = k, k = 0, . . . , d. Using induction over k =
d, d − 1, . . . , 0 in order to verify (∗) below and knowledge about the moments of
the Beta-distribution7 in (∗∗) below, we observe that
b_k := ∑_{i=0}^{d−k} (d−k choose i) p_{d−i}
     = ∑_{i=0}^{d−k} (d−k choose i) (r+b−1)! (r+d−i−1)! (b+i−1)! / [ (r−1)! (b−1)! (r+b+d−1)! ]
 (∗) = (r+k−1)! (r+b−1)! / [ (r−1)! (r+b+k−1)! ]
     = Γ(r+k) Γ(r+b) / [ Γ(r) Γ(r+b+k) ]
(∗∗) = E[M^k],
where M is a random variable with Beta-distribution whose density is given by
f_M(x) = ( Γ(r + b) / (Γ(r) Γ(b)) ) x^{r−1} (1 − x)^{b−1},   0 < x < 1.
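A small Monte Carlo sketch (ours, assuming numpy; parameter values are arbitrary) comparing Pólya's urn with the equivalent conditionally iid model: draw M from the Beta(r, b) density above and toss d iid coins with success probability M; the distribution of the number of red balls should agree for both procedures.

import numpy as np

rng = np.random.default_rng(0)
r, b, d, n_sim = 2, 3, 5, 200_000

def polya_urn(r, b, d, rng):
    """One run of Polya's urn: draw d balls, reinforcing the drawn color."""
    red, blue, x = r, b, []
    for _ in range(d):
        is_red = rng.random() < red / (red + blue)
        x.append(int(is_red))
        red, blue = red + is_red, blue + (not is_red)
    return x

urn = np.array([polya_urn(r, b, d, rng) for _ in range(n_sim)])

M = rng.beta(r, b, size=n_sim)                       # latent success probability
mixture = (rng.random((n_sim, d)) < M[:, None]).astype(int)

# compare the distribution of the number of red balls ||X||_1
print(np.bincount(urn.sum(axis=1), minlength=d + 1) / n_sim)
print(np.bincount(mixture.sum(axis=1), minlength=d + 1) / n_sim)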
6 Alternatively, one may construct a completely monotone sequence {b_k}_{k∈N} from an infinite extension of X, as demonstrated in [17, Equation (1)], and then make use of Hausdorff's moment problem to obtain M.
7 See, e.g., [29, p. 35].
Of particular interest are cases in which Y1 takes only finitely many different
values. Especially if Y1 ∈ {0, 1}, the vector X is a binary sequence like in the
present section.
A prominent model motivating the investigation of [60] is the so-called Curie-
Weiss Ising model. In probabilistic terms, this model is a probability law on
{−1, 1}d with two parameters J, h ∈ R, and the components of a random vector
Z with this probability law models the so-called spins at d different sites. These
spins can either have the value −1 or 1 (so X := (1{Z1 >0} , . . . , 1{Zd >0} ) is
a transformation from {−1, 1}d to {0, 1}d ). We denote for n ∈ {−1, 1}d by
N (n) the number of 1’s in n, so that d − N (n) equals the number of −1’s. For
n ∈ {−1, 1}d we define
P(Z = n) = exp( h (2 N(n) − d) + (J/2) (2 N(n) − d)² ) / ∑_{k=0}^d (d choose k) exp( h (2 k − d) + (J/2) (2 k − d)² ),   n ∈ {−1, 1}^d,   (2.6)
which is an exchangeable probability law on {−1, 1}d . The exponent of the nu-
merator can be re-written as
h (2 N(n) − d) + (J/2) (2 N(n) − d)² = h ∑_{k=1}^d n_k + (J/2) ∑_{k=1}^d ∑_{i=1}^d n_k n_i
and is called the Hamilton operator of the model. The parameter h determines
the external magnetic field and the parameter J denotes a coupling constant.
If J ≥ 0 the model is called ferromagnetic, and for J < 0 it is called antifer-
romagnetic. The ferromagnetic case arises as a special case of (2.4), if Y_1 takes values in {−√J, √J} with respective probabilities P(Y_1 = √J) = 1 − P(Y_1 = −√J) = exp(h)/(exp(h) + exp(−h)). Then the law of Z/√J on {−1, 1}^d is
precisely given by the Curie-Weiss Ising model in (2.6) with J ≥ 0. Notice that
for the antiferromagnetic case J < 0 this construction is impossible.
[60, Theorem 1.2] shows that X as defined in (2.4) is conditionally iid. More
concretely, conditioned on a random variable M with density8
f_M(x) := ψ(x)^d e^{−x²/2} / ( c_d √(2π) ),   x ∈ R,
the components of X are iid with common distribution
P(X_k ∈ dx | M) = ( e^{M x} / ψ(M) ) P(Y_1 ∈ dx),   k = 1, . . . , d,
as can easily be checked. In particular, this shows that the aforementioned fer-
romagnetic Curie-Weiss Ising model is conditionally iid, a result originally due
to [87].
Besides the seminal de Finetti’s Theorem 1.1, the most popular results in
the theory on conditionally iid models concern latent factor processes H of
a very special form to be discussed in the present section. To this end, we con-
sider a popular one-parametric family of one-dimensional distribution functions
x → Fm (x) on the real line and put a prior distribution on the parameter m ∈ R.
Then define H = {Ht }t∈R in the canonical construction (1.4) by Ht = FM (t),
where M is some random variable taking values in the set of admissible values
for the parameter m. For some prominent families, for example the zero mean
normal law or the exponential law, the resulting distribution of the random vec-
tor X belongs to a prominent multivariate family of distributions M, and in fact
defines the subset M∗ ⊂ M. Of particular interest is the case when the subset
M∗ of M admits a convenient analytical description within the framework of the
analytical description of the larger family M. By construction, in this method
of generating conditionally iid laws the dependence-inducing latent factor pro-
cess H is fully determined already by a single random parameter M , so that it
appears unnatural to formulate the model in terms of a “stochastic process” H
at all. Since we investigate situations in which this appears to be more natural
in later sections, we purposely do this anyway in order to present all results
of the present article under one common umbrella. The “single-parameter con-
struction” just described can then be classified as some kind of “static” process
within the realm of all possible processes with laws in M_+^1(H).
More rigorously, let {Ht }t≥0 be the stochastic process from the canonical
stochastic representation (1.4) of some multivariate law in M∗ ⊂ M. Equiv-
alently, we view this probability law as a d-dimensional marginal law of some
infinite exchangeable sequence of random variables {Xk }k∈N , and define {Ht }t≥0
according to Lemma 1.8 as the uniform limit of { ∑_{k=1}^d 1{X_k ≤ t} / d }_{t≥0} as d →
∞. We call the probability law of X = (X1 , . . . , Xd ) static, if the natural filtra-
tion generated by {Ht }t≥0 , i.e. Ht := σ(Hs | s ≤ t), t ∈ R, is trivial, meaning
8 Completing the square shows that fM defines a proper density function on R.
that there is some T ∈ [−∞, ∞) such that Ht = {∅, Ω} for t ≤ T (“zero infor-
mation before T ”) and Ht = H for t > T (“total information after T ”). The
present section reviews well-known families of distributions M, for which the set
M∗ consists only of static laws. As already mentioned, this situation typically
occurs when the random distribution function H ∼ γ ∈ M+ 1
(H) is itself given by
Ht = FM (t), for a popular family Fm of one-dimensional distribution functions
and a single random variable M representing a random parameter pick.
Example 3.1 (The multivariate normal law revisited). It follows from Exam-
ples 1.1 and 1.3 that N (μ, Σ)∗ , the conditionally iid normal laws, are static.
The random distribution function H as given by (1.6) obviously satisfies H =
σ(Ht : t ∈ R) = σ(M ) = Ht for arbitrary t ∈ R.
Example 3.2 (Binary sequences revisited). If one (hence all) of the conditions
of Theorem 2.1 is satisfied, the law of the binary sequence X ∈ {0, 1}d is static.
Using the notation in Theorem 2.1, the random distribution function H equals
Ht := (1 − M ) 1{t≥0} + M 1{t≥1} . Obviously, H = σ(Ht : t ∈ R) = σ(M ) = Ht
for arbitrary t ≥ 0.
In the remaining section we treat the mixture of iid zero mean normals in
paragraph 3.1 and the mixture of iid exponentials in paragraph 3.2, since these
are the best-studied cases of the theory with nice analytical characterizations.
The interested reader is also referred to [20, 90] who additionally study mixtures
of iid geometric variables, iid Poisson variables, and iid uniform variables. Mix-
tures of uniform random variables are discussed in more detail also in Section 3.3
below.
see, e.g., [75, Lemma 4.1, p. 161]. The function ϕ is called the characteristic
generator. If the components of X are conditionally iid, the function ϕ is of a
very special form, see Schoenberg’s Theorem 3.1 below.
If the components of Y = (Y1 , . . . , Yd ) are iid standard normally distributed,
and M ∈ (0, ∞) is an independent random variable, the random vector X =
M Y is spherical, because Y O is a vector of iid standard normal components for
any orthogonal matrix O. Furthermore, the components of X are iid conditioned
on the σ-algebra generated by the mixture variable M . Schoenberg’s Theorem
states that the converse is true as well, i.e. all conditionally iid spherical laws
are mixtures of zero-mean normals.
Theorem 3.1 (Schoenberg’s Theorem). Let M be the family of d-dimensional
spherical laws, and let the law of X be in M, and assume X is not identically
equal to a vector of zeros. The following are equivalent
(a) The law of X lies in M∗ .
(b) There are iid standard normal random variables Y1 , . . . , Yd and an inde-
pendent positive random variable M ∈ (0, ∞) such that
X =_d M (Y_1, . . . , Y_d).
for some function ϕ in one variable, see, e.g., [75, Lemma 4.1, p. 161]. De-
noting the characteristic function of Xk by fk , k = 1, . . . , d, independence
of the components implies that ϕ(‖u‖_2²) = f_1(u_1) · · · f_d(u_d). Taking the derivative9 w.r.t. u_k and dividing by ϕ(‖u‖_2²) on both sides of the last equation implies for arbitrary k = 1, . . . , d that

f_k'(u_k) / ( f_k(u_k) 2 u_k ) = ϕ'(‖u‖_2²) / ϕ(‖u‖_2²).   (3.2)
f_k'(u) / ( f_k(u) 2 u ) = ϕ'(‖u‖_2²) / ϕ(‖u‖_2²) = f_j'(u) / ( f_j(u) 2 u ),   arbitrary 1 ≤ k, j ≤ d.   (3.3)
Plugging some u which has u as its k-th and y as its j-th component into
(3.2), we observe
f_k'(u) / ( f_k(u) 2 u ) = ϕ'(‖u‖_2²) / ϕ(‖u‖_2²) = f_j'(y) / ( f_j(y) 2 y ) = f_k'(y) / ( f_k(y) 2 y ),

where the last equality is (3.3).
Since u, y were arbitrary, the functions u → f_k'(u)/(f_k(u) 2u) are therefore shown to equal some constant c independent of k. Since f_k(0) = 1, solving the resulting ordinary differential equation implies that f_k(u) = exp(c u²).
Left to show is now only that c ≤ 0, because this would imply that fk
equals the characteristic function of a zero-mean normal. Since fk is a
characteristic function and as such must be positive semi-definite, the
inequality

det ( f_k(0 − 0)   f_k(0 − 1) ; f_k(1 − 0)   f_k(1 − 1) ) = f_k(0)² − f_k(1) f_k(−1) = 1 − e^{2c} ≥ 0
must hold. Clearly, this is only possible for c ≤ 0. The case c = 0 is ruled
out by the assumption that X is not identical to a vector of zeros.
(ii) If the law of X lies in M∗ we can without loss of generality assume that X
equals the first d members of an infinite exchangeable sequence {Xk }k∈N .
Conditioned on the tail-σ-field H := ∩n≥1 σ(Xn , Xn+1 , . . .) the random
9 Notice that characteristic functions are differentiable.
since X is spherical. Since H does not depend on X (but only on the tail
of the infinite sequence), this implies that the conditional distribution of
X and X O given H are identical. As O was arbitrary, X conditioned on
H is spherical. Maxwell’s Theorem now implies that X conditioned on H
is an iid sequence of zero mean normals. Thus, only the standard deviation
may still be an H-measurable random variable, which we denote by M.
If (P) in Problem 1.3 is the property of “having a spherical law (in some di-
mension)”, then Schoenberg’s Theorem 3.1 also implies that M∗ = M∗∗ , which
follows trivially from the equivalence of (a) and (b), since the stochastic con-
struction in (b) clearly works for arbitrary n > d as well. Furthermore, it is
observed that the random distribution function Ht = Φ(t/M ) in part (b) satis-
fies the condition in Lemma 1.7 with μ = 0, so conditionally iid spherical laws
are radially symmetric. In fact, (arbitrary) spherical laws are always radially
symmetric, since (X_1, . . . , X_d) =_d (−X_1, . . . , −X_d) follows immediately from the definition.
Remark 3.1 (Realization of uniform law on Euclidean unit sphere). Denoting
Y = (Y1 , . . . , Yd ), the equivalence (b) ⇔ (c) in Theorem 3.1 implies
S =_d ( Y_1/‖Y‖_2 , . . . , Y_d/‖Y‖_2 ),
which shows how to generate realizations of the uniform law on the Euclidean
unit sphere from a list of iid standard normals.
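A minimal sketch (ours, assuming numpy) of the recipe in Remark 3.1 and of Theorem 3.1(b): normalize iid standard normals to obtain a uniform point on the Euclidean unit sphere, and multiply a vector of iid standard normals by an independent radial factor M to obtain a conditionally iid spherical vector; the choice of M below is arbitrary.

import numpy as np

rng = np.random.default_rng(42)

def uniform_on_sphere(d, rng):
    """Uniform law on the Euclidean unit sphere via normalized iid normals."""
    y = rng.standard_normal(d)
    return y / np.linalg.norm(y)

def conditionally_iid_spherical(d, sample_M, rng):
    """X = M * (Y_1, ..., Y_d) with iid standard normal Y and independent M > 0."""
    return sample_M(rng) * rng.standard_normal(d)

s = uniform_on_sphere(3, rng)
x = conditionally_iid_spherical(3, lambda r: np.sqrt(r.chisquare(df=4)), rng)
print(np.linalg.norm(s), x)   # the first number equals 1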
Remark 3.2 (Elliptical laws). Spherical laws are always exchangeable, which is
easy to see. A popular method to enrich the family of spherical laws to obtain a
larger family beyond the exchangeable paradigm is linear transformation. To wit,
for X ∈ Rk spherical with characteristic generator ϕ, A ∈ Rk×d some matrix
with Σ := A^⊤ A ∈ R^{d×d} and rank of Σ equal to k ≤ d, and with b = (b_1, . . . , b_d)
some real-valued row vector, the random vector
Z = (Z1 , . . . , Zd ) = X A + b (3.4)
is said to have an elliptical law with parameters (ϕ, Σ, b). This generalization
from spherical laws to elliptical laws is especially well-behaved from an ana-
lytical viewpoint, since the apparatus of linear algebra gets along perfectly well
with the definition of spherical laws. The most prominent elliptical law is the
multivariate normal distribution, which is obtained in the special case when
ϕ(x) = exp(−x/2) is the Laplace transform of the constant 1/2. The case when
E[‖X‖_2²] < ∞ is of most prominent importance, since the random vector Z then has existing covariance matrix given by E[‖X‖_2²] Σ / k.
Since the normal distribution special case occupies a commanding role when
deciding whether or not a spherical law is conditionally iid according to Theo-
rem 3.1(b), and since we have also solved our motivating Problem 1.1 for the
multivariate normal law in Example 1.1, it is not difficult to decide when an el-
liptical law is conditionally iid as well. To wit, in the most important case when
E[‖X‖_2²] < ∞ the random vector Z in (3.4) has a stochastic representation that is conditionally iid if and only if b_1 = . . . = b_d and Z =_d R Y + b, with R
some positive random variable with finite second moment and Y = (Y1 , . . . , Yd )
multivariate normal with zero mean vector and covariance matrix such as in Ex-
ample 1.1, i.e. with identical diagonal elements σ 2 > 0 and identical off-diagonal
elements ρ σ 2 ≥ 0.
is uniformly distributed on the unit simplex, cf. [75, Lemma 2.2(2), p. 77] or
[31, Theorem 5.2(2), p. 115]. An arbitrary ℓ1-norm symmetric random vector X is represented as

X =_d R ( E_1/‖E‖_1 , . . . , E_d/‖E‖_1 )   (3.5)
with independent R and E. With the analogy to the spherical case in mind,
heuristic reasoning suggests that X is extendible if and only if R is chosen
such that it “cancels” out the denominator of S in distribution. Since ‖E‖_1 has a unit-scale Erlang distribution with parameter d, this would imply that R
should be chosen as R = Z/M for some positive random variable M and an
independent random variable Z with Erlang distribution and parameter d. This
is precisely the case, as Theorem 3.2 below shows.
Generally speaking, it follows from the canonical stochastic representation
(3.5) that
P(X_k > x) = P( E_k > (x/(R−x)) ∑_{i≠k} E_i , R > x ) = E[ e^{−(x/(R−x)) ∑_{i≠k} E_i} 1{R>x} ]
           = E[ max(1 − x/R, 0)^{d−1} ] =: ϕ_{d,R}(x),   k = 1, . . . , d,
where the last equality uses knowledge about the Laplace transform of the Erlang-distributed random variable ∑_{i≠k} E_i. This means that the marginal
survival functions of the components Xk are given by the so-called Williamson
d-transform ϕd,R of R. It has been studied in [106], who shows in particular
that the law of R is uniquely determined by ϕd,R . A similar computation as
above shows that the joint survival function of X is given by
Theorem 3.2 solves Problem 1.3 for the property (P) of “having an ℓ1-norm symmetric law (in some dimension)”.
Theorem 3.2 (Conditionally iid ℓ1-norm symmetric laws). Let ϕ : [0, ∞) →
[0, 1] be a function in one variable. The following statements are equivalent:
(a) There is an infinite sequence of random variables {Xk }k∈N such that for
arbitrary d ∈ N we have
(b) The function ϕ equals the Laplace transform of some positive random vari-
able M , i.e. ϕ(x) = E[exp(−x M )].
In this case, for arbitrary d ∈ N we have
X = (X_1, . . . , X_d) =_d (1/M) Z S =_d (1/M) E,
where X as in (a), M as in (b), S uniformly distributed on the unit simplex,
E = (E1 , . . . , Ed ) a vector of iid unit exponentials, and Z a unit-scale Erlang
distributed variate with parameter d, all mutually independent. In other words,
X has a stochastic representation as in (1.5) with Zt := M t, in particular is
conditionally iid.
Proof. The implication (b) ⇒ (a) works precisely along the stochastic model
claimed, and is readily observed. The implication (a) ⇒ (b) is known as Kim-
berling’s Theorem, see [54]. We provide a proof sketch in the sequel. From d = 1
we observe that ϕ is the survival function of some positive random variable.
Consequently, due to Bernstein’s Theorem10 , it is sufficient to prove that ϕ is
completely monotone, meaning that (−1)^d ϕ^{(d)} ≥ 0 for all d ∈ N_0. To this end, recall that

(−1)^d ϕ^{(d)}(x) = Δ_h^d[ϕ](x) + O(h),   Δ_h^d[ϕ](x) := ∑_{k=0}^d (−1)^{d−k} (d choose k) ϕ(x − k h),
functions x → exp(−m x) for m > 0, which is just another way to say that
the function ϕ(x) = E[exp(−x M )] determines the law of the positive random
variable M uniquely. Typical parametric examples for Laplace transforms in the
context of ℓ1-norm symmetric distributions are ϕ(x) = (1 + x)^{−θ} with θ > 0, corresponding to a Gamma distribution of M, or ϕ(x) = exp(−x^θ) with θ ∈ (0, 1), corresponding to a stable distribution of M.
Remark 3.4 (Archimedean copulas). Considering X = (X_1, . . . , X_d) with ℓ1-norm symmetric law associated with the Williamson d-transform ϕ = ϕ_{d,R}, the random vector (U_1, . . . , U_d) := (ϕ(X_1), . . . , ϕ(X_d)) has distribution function

C_ϕ(u_1, . . . , u_d) = ϕ( ϕ^{−1}(u_1) + . . . + ϕ^{−1}(u_d) )

for u_1, . . . , u_d ∈ [0, 1]. Recall that ϕ^{−1} denotes the generalized inverse of ϕ.
The function Cϕ is called an Archimedean copula and the study of 1 -norm
symmetric distributions can obviously be translated into an analogous study of
Archimedean copulas. In the statistical and applied literature Archimedean cop-
ulas have received considerably more attention. For instance, nested and hier-
archical extensions of (exchangeable) Archimedean copulas have become quite
popular, see, e.g. [16, 47, 49, 82, 107, 65].
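The frailty representation X =_d E/M of Theorem 3.2 translates directly into a sampling routine for Archimedean copulas. The sketch below (ours, assuming numpy; function names are illustrative) uses the Gamma choice of M mentioned above, i.e. the Clayton-type generator ϕ(x) = (1 + x)^{−θ}, and then maps X to the copula scale via U_k = ϕ(X_k).

import numpy as np

rng = np.random.default_rng(7)

def sample_l1_symmetric(d, theta, n, rng):
    """Conditionally iid l1-norm symmetric vectors X = E / M,

    with E iid unit exponentials and M Gamma(theta)-distributed, so that the
    marginal survival function is the Laplace transform phi(x) = (1+x)^(-theta).
    """
    M = rng.gamma(shape=theta, scale=1.0, size=(n, 1))
    E = rng.exponential(size=(n, d))
    return E / M

def to_archimedean_copula_scale(X, theta):
    """U_k = phi(X_k) has the associated Archimedean copula as its law."""
    return (1.0 + X) ** (-theta)

X = sample_l1_symmetric(d=3, theta=2.0, n=100_000, rng=rng)
U = to_archimedean_copula_scale(X, theta=2.0)
print(U.mean(axis=0))   # close to 1/2: the U_k are uniform on [0, 1]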
Remark 3.5 (Extension to Liouville distributions). Analyzing the analogy between spherical laws (aka ℓ2-norm symmetric laws) and ℓ1-norm symmetric laws, there is one common mathematical fact on which the analytical treatment of both families relies. To wit, for both families the uniform distribution on the d-dimensional unit sphere can be represented as the normalized vector of iid random variables. In the spherical case the normalized vector Y/‖Y‖_2 of d iid standard normals Y = (Y_1, . . . , Y_d) is uniform on the ‖·‖_2-sphere, whereas in the ℓ1-norm symmetric case the normalized vector E/‖E‖_1 of d iid standard exponentials E = (E_1, . . . , E_d) is uniform on the ‖·‖_1-sphere restricted to the positive orthant [0, ∞)^d. Furthermore, in both cases the normalization can be
“canceled out” in distribution, that is

√Z · Y/‖Y‖_2 =_d Y,        R · E/‖E‖_1 =_d E,

where √Z =_d ‖Y‖_2 is independent of Y and Z has a χ²-law with d degrees of freedom, and R =_d ‖E‖_1 is independent of E and has an Erlang distribution with parameter d.
with parameter d. The so-called Lukacs Theorem, due to [63], states that the
exponential distribution of the Ek in the last distributional equality can be gen-
eralized to a Gamma distribution (but no other law on (0, ∞) is possible). More
precisely, if G = (G_1, . . . , G_d) are independent random variables with Gamma distributions with the same scale parameter, then ‖G‖_1 is independent of G/‖G‖_1, which means that

R · G/‖G‖_1 =_d G,   where R =_d ‖G‖_1 is independent of G/‖G‖_1.   (3.6)
Notice that the scale parameter of this Gamma distribution is without loss of
generality set to one, since it has no influence on the law of S. A d-parametric
generalization of 1 -norm symmetric laws is obtained by replacing the uniform
law of S on the unit simplex (which is obtained for α1 = . . . = αd ) with a
Dirichlet distribution (with arbitrary αk > 0). One says that the random vector
X = R S with R some positive random variable and S an independent Dirichlet-
distributed random vector on the unit simplex, follows a Liouville distribution.
It is precisely the property (3.6) that implies that the generalization to Liouville
distributions is still analytically quite convenient to work with, see [84] for a
detailed study. Analogous to the ℓ1-norm symmetric case, the components of X are conditionally iid if α_1 = . . . = α_d and R satisfies R =_d Z/M with Z =_d ‖G‖_1 and M some independent positive random variable.
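A short sketch (ours; it assumes numpy's Dirichlet and Gamma samplers, and the unit-exponential choice of M is an arbitrary illustration) of the conditionally iid sub-case of the Liouville construction X = R S just described.

import numpy as np

rng = np.random.default_rng(3)

def sample_liouville_cond_iid(d, alpha, n, rng):
    """Liouville vectors X = R * S with S ~ Dirichlet(alpha, ..., alpha).

    Here R = Z / M, where Z ~ Gamma(d * alpha) has the law of ||G||_1 and
    M > 0 is an independent mixing variable (unit exponential, an arbitrary
    choice), i.e. the conditionally iid case alpha_1 = ... = alpha_d.
    """
    S = rng.dirichlet(np.full(d, alpha), size=n)
    Z = rng.gamma(shape=d * alpha, scale=1.0, size=(n, 1))
    M = rng.exponential(size=(n, 1))
    return (Z / M) * S

X = sample_liouville_cond_iid(d=4, alpha=2.5, n=10, rng=rng)
print(X)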
Having at hand the apparatus of Archimedean copulas, we are now in the
position to provide a non-trivial example for the situation M∗∗ ⊊ M∗.
Example 3.3 (In general, M∗∗ ⊊ M∗). Consider the family M ⊂ M_+^1([0, 1]²)
defined by the property (P) of “having an Archimedean copula as distribution
function and being radially symmetric”. It is well-known since [35, Theorem 4.1]
that the set M comprises precisely Frank’s copula family, that is the bivariate
distribution function of an element in M is either given by C−∞ (u1 , u2 ) :=
max{u1 + u2 − 1, 0}, by C0 (u1 , u2 ) := u1 u2 , by C∞ (u1 , u2 ) := min{u1 , u2 }, or
by

C_θ(u_1, u_2) := −(1/θ) log( 1 + (e^{−θ u_1} − 1)(e^{−θ u_2} − 1) / (e^{−θ} − 1) ),   u_1, u_2 ∈ [0, 1],
for some parameter θ ∈ (−∞, 0) ∪ (0, ∞). Since Kendall’s Tau of the copula Cθ
is negative in the case θ < 0, Lemma 1.3 implies that the subset M∗ can at best
contain the elements corresponding to θ ∈ [0, ∞]. Indeed, the cases θ ∈ {0, ∞}
are obviously contained in M∗∗ ⊂ M∗ , and for θ ∈ (0, ∞) membership in M∗
follows via the canonical construction (1.4) with the choice H ∼ γ ∈ M_+^1(H_+), given by

H_t = ( (1 − e^{−θ t}) / (1 − e^{−θ}) )^M,   t ∈ [0, 1],   (3.8)
for a random variable M with logarithmic distribution P(M = m) = (1 − exp(−θ))^m / (m θ), m ∈ N. Furthermore, we can deduce from Theorem 3.2 that
the property of “having an Archimedean copula as distribution function (in ar-
bitrary dimension)” implies that potential elements in M∗∗ must necessarily
be induced by a stochastic process of the form (3.8) with some positive random
variable M , which must necessarily be logarithmic in the radially symmetric case
by the result of Frank. The only thing left to check is whether the multivariate
Archimedean copula derived from the canonical construction via H defined by
(3.8) with logarithmic M is radially symmetric in arbitrary dimension d ≥ 2.
According to Lemma 1.7 this is the case if and only if
{ ( (1 − e^{−θ(1/2 − t)}) / (1 − e^{−θ}) )^M }_{t ∈ [−1/2, 1/2]} = {H_{1/2 − t}}_{t ∈ [−1/2, 1/2]}
   =_d {1 − H_{t + 1/2}}_{t ∈ [−1/2, 1/2]} = { 1 − ( (1 − e^{−θ(t + 1/2)}) / (1 − e^{−θ}) )^M }_{t ∈ [−1/2, 1/2]}.
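The construction (3.8) can be turned into a sampling routine for the Frank copula: draw the logarithmic frailty M and apply the inverse of H to iid uniforms. The sketch below is ours and assumes numpy, whose logseries sampler with parameter p = 1 − e^{−θ} has exactly the probability mass function P(M = m) = (1 − e^{−θ})^m/(m θ).

import numpy as np

rng = np.random.default_rng(11)

def sample_frank(d, theta, n, rng):
    """Conditionally iid sampling of the Frank copula, theta > 0.

    Draw the logarithmic frailty M, then, given M, draw d iid variables with
    distribution function H_t = ((1 - exp(-theta t)) / (1 - exp(-theta)))^M,
    t in [0, 1], cf. (3.8).
    """
    p = 1.0 - np.exp(-theta)
    M = rng.logseries(p, size=(n, 1)).astype(float)
    V = rng.random((n, d))
    # inverse of H applied to iid uniforms V
    return -np.log1p(-p * V ** (1.0 / M)) / theta

U = sample_frank(d=2, theta=4.0, n=100_000, rng=rng)
print(U.mean(axis=0))                       # both close to 1/2 (uniform margins)
print(np.corrcoef(U[:, 0], U[:, 1])[0, 1])  # positive dependence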
where g_{d−1}(x) := ∫_x^∞ g_d(u) du + x g_d(x),   (3.11)

and the function g_{d−1} is easily checked to satisfy (3.10) in dimension d − 1, that is 1 = (d − 1) ∫_0^∞ g_{d−1}(x) x^{d−2} dx. It is further not difficult to verify that g_d is given in terms of g_{d−1} as

g_d(x) = g_{d−1}(x)/x − ∫_x^∞ g_{d−1}(u)/u² du.   (3.12)
If M denotes the family of all laws with density of the form (3.9), i.e. with a func-
tion gd satisfying (3.10), the following result provides necessary and sufficient
conditions on gd to define a law in M∗ .
Theorem 3.3 (Conditionally iid ℓ∞-norm symmetric densities). Let M be the
family of probability laws on (0, ∞)d with densities of the form (3.9) with a
measurable function gd : (0, ∞) → [0, ∞) satisfying (3.10). For X with law in
M, the following statements are equivalent:
(a) The law of X lies in M∗ .
(b) gd is non-increasing.
(c) For a vector U = (U1 , . . . , Ud ) whose components are iid uniform on [0, 1]
and an independent, positive random variable M we have X =_d M U.
Proof. This is [40, Theorem 2]. Clearly, (c) ⇒ (a) is obvious. In order to see (b)
⇒ (c), we first conclude from (3.10) that
0 = lim_{x→∞} g_d(x) x^d = lim_{x→∞} g_d(1/x) (1/x^d).   (3.13)
By non-increasingness, we may without loss of generality assume that gd is
right-continuous (otherwise, change to its right-continuous version, which does
not change the density fX essentially) and apply integration by parts, then
(3.13) and (3.10) imply
∫_0^∞ x^d d(−g_d(x)) = d ∫_0^∞ g_d(x) x^{d−1} dx = 1.
Consequently, x ↦ ∫_0^x y^d d(−g_d(y)) defines the distribution function of a positive random variable M, and we see that

E[1{M > x} M^{−d}] = ∫_x^∞ y^{−d} y^d d(−g_d(y)) = g_d(x).
Now let U as claimed be independent of M . Conditioned on M , the density of
M U is

x ↦ ∏_{k=1}^d 1{0 < x_k < M} / M = 1{0 < x_{[d]} < M} (1/M^d).
which shows (c). The hardest part is (a) ⇒ (b). Fix ε > 0 arbitrary. Due to measurability of g_d, Lusin's Theorem guarantees continuity of g_d on a set C whose complement has Lebesgue measure less than ε. Without loss of generality
we may assume that all points t in C are density points, i.e. satisfy
lim_{δ↓0} λ(C ∩ [t − δ, t + δ]) / (2δ) = 1,
where λ denotes Lebesgue measure. Let {X_k}_{k∈N} be an infinite exchangeable sequence such that d-margins have the density f_X. Fix t ≥ s arbitrary. We define the sequence of random variables {ξ_k}_{k∈N} by

ξ_k := (1/(2δ)) ( 1{X_k ∈ A_s} − 1{X_k ∈ A_t} ),   k ∈ N,
where A_x := C ∩ [x − δ, x + δ] for x ∈ {s, t}. Notice that the ξ_k are square-integrable and

0 ≤ E[ξ_1 ξ_2] = (1/(4δ²)) ( E[1{X_1, X_2 ∈ A_s}] + E[1{X_1, X_2 ∈ A_t}] − 2 E[1{X_1 ∈ A_t, X_2 ∈ A_s}] )
              = (1/(4δ²)) ( ∫_{A_s × A_s} g_2(x_{[2]}) dx + ∫_{A_t × A_t} g_2(x_{[2]}) dx − 2 ∫_{A_t × A_s} g_2(x_{[2]}) dx )
              = g_2(η_s) + g_2(η_t) − 2 g_2(η̃_t),
From the equivalence of (a) and (c) in Theorem 3.3 we observe easily that
M∗ = M∗∗ , when considering the property (P) of “having a density of the form
(3.9) (in some dimension d ∈ N)” in Problem 1.3. Notice furthermore that the
law of M U is static in the sense defined in the beginning of this section, and
we have
H_t := P(X_k ≤ t | M) = max{0, min{1, t/M}},   t ∈ R,
for Xk := M Uk as defined in part (c) of Theorem 3.3.
Remark 3.6 (Common umbrella of ℓp-norm symmetry results). Theorem 3.3 on ℓ∞-norm symmetric densities is very similar in nature to Schoenberg's Theorem 3.1 on ℓ2-norm symmetric characteristic functions and Theorem 3.2 on ℓ1-norm symmetric survival functions, which makes it a beautiful result with regards to the present survey. The reference [90] considers all these three cases under one common umbrella, and even manages to generalize them in some meaningful sense to the case of arbitrary ℓp-norm, with p ∈ [1, ∞] arbitrary11.
More precisely, it is shown that an infinite exchangeable sequence {Xk }k∈N of
the form Xk := M Yk , k ∈ N, with M > 0 and an independent iid sequence
{Yk}_{k∈N} of positive random variables is ℓp-norm symmetric in some meaningful
sense12 if and only if the random variables Yk have density fp given by
f_p(x) := ( p^{1 − 1/p} / Γ(1/p) ) e^{−x^p/p},   0 < p, x < ∞,    f_∞(x) := 1{x ∈ (0, 1)}.
Notice that f1 , f2 , and f∞ are the densities of the unit exponential law, the
absolute value of a standard normal law, and the uniform law on [0, 1], respec-
tively. This parametric family in the parameter p is further investigated, and
might for instance be characterized by the fact that fp for p < ∞ has maxi-
mal entropy among all densities on (0, ∞) with p-th moment equal to one, and
f∞ has maximal entropy among all densities with support (0, 1), which is [90,
Theorem 3.5].
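As a quick numerical sanity check (ours, assuming scipy) of the family f_p: the densities integrate to one and have p-th moment equal to one, the property used in the entropy characterization above.

import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def f_p(x, p):
    """Density f_p from Remark 3.6 (case p < infinity)."""
    return p ** (1.0 - 1.0 / p) / gamma(1.0 / p) * np.exp(-x ** p / p)

for p in (1.0, 2.0, 5.0):
    total = quad(lambda x: f_p(x, p), 0, np.inf)[0]
    p_th_moment = quad(lambda x: x ** p * f_p(x, p), 0, np.inf)[0]
    print(p, round(total, 6), round(p_th_moment, 6))   # both equal 1 for every p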
An analogous result to Theorem 3.3 on mixtures of the form M U , when
the components of U are iid uniform on [−1, 1], is also presented in [40]. The
resulting densities depend on two arguments, x[1] and x[d] . Furthermore, [90,
Corollary 4.3] proves that an infinite exchangeable sequence {Xk }k∈N satisfies
\[
\{X_k\}_{k\in\mathbb{N}} \stackrel{d}{=} \{M\,U_k\}_{k\in\mathbb{N}}, \qquad U_1, U_2, \ldots \text{ iid uniform on } [0,1],\ M > 0 \text{ independent},
\]
if and only if for arbitrary d ∈ N and almost all s > 0 the law of X = (X1, . . . , Xd) conditioned on the event {‖X‖∞ = s} is uniformly distributed on the sphere {x ∈ (0, ∞)d : ‖x‖∞ = s}. This provides an alternative characterization of densities that are ℓ∞-norm symmetric and conditionally iid.
Remark 3.7 (Relation to non-homogeneous pure birth processes). [99] provides an interesting interpretation of ℓ∞-norm symmetric densities, which is briefly explained in the following. Every non-negative function gd satisfying (3.10) is of the form
\[
g_d(x) = c_d\, r_d(x)\, e^{-\int_0^x r_d(u)\,du}
\]
11 The authors even allow for p ∈ (0, 1), but in this case ‖·‖p is no longer a norm.
12 See [90] for details.
for some non-negative function rd satisfying $\int_0^\infty r_d(x)\,dx = \infty$, and some normalizing constant cd > 0. To wit, the function
\[
r_d(x) := \frac{g_d(x)}{c_d \int_x^\infty g_d(u)\,du}, \qquad x > 0, \tag{3.14}
\]
for some normalizing constant cd > 0 does the job, as can readily be checked.
From such a function rd we iteratively define functions rd−1 , . . . , r1 by solving
the equations
\[
r_k(x)\, e^{-R_k(x)} = \frac{e^{-R_{k+1}(x)}}{\int_0^\infty e^{-R_{k+1}(u)}\,du}, \qquad k = d-1, \ldots, 1, \tag{3.15}
\]
where $R_k(x) := \int_0^x r_k(u)\,du$ for k = 1, . . . , d. Notice that rk is related to the right-hand side of (3.15) exactly in the same way as rd is related to gd, so the solution (3.14) shows how the rk look in terms of rk+1. We define inde-
pendent positive random variables E1 , . . . , Ed with survival functions P(Ek >
x) = exp(−Rk (x)), k = 1, . . . , d, x ≥ 0. Independently, let Π be a random
permutation of {1, . . . , d} with P(Π = π) = 1/d! for each permutation π of
{1, . . . , d}, i.e. Π is uniformly distributed on the set of all d! permutations. We
consider the increasing sequence of random variables T1 < T2 < . . . < Td de-
fined by Tk := E1 + . . . + Ek . Then the (obviously exchangeable) random vector
X = (X1 , . . . , Xd ) := (TΠ(1) , . . . , TΠ(d) ) has density (3.9). If E1 , E2 , . . . is an
arbitrary sequence of independent, absolutely continuous, positive random vari-
ables, the counting process
\[
N_t := \sum_{k \ge 1} \mathbf{1}_{\{E_1 + \ldots + E_k \le t\}}, \qquad t \ge 0,
\]
is called a non-homogeneous pure birth process with intensity rate functions given by $r_k(x) := -\frac{\partial}{\partial x} \log\{P(E_k > x)\}$, k ≥ 1. A random permutation of the first d
jump times Tk := E1 + . . . + Ek , k = 1, . . . , d, of a pure birth process N thus has an ℓ∞-norm symmetric density if the intensities r1 , . . . , rd−1 can be retrieved recursively from rd via (3.15). The case of arbitrary intensities r1 , . . . , rd hence provides a natural generalization of the family of ℓ∞-norm symmetric densities.
It appears to be an interesting open problem to determine necessary and suf-
ficient conditions on r1 , . . . , rd such that the respective exchangeable density is
conditionally iid, see also paragraph 7.1 below.
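As an illustrative sketch of the generalization just described (with hypothetically chosen constant intensities rk ≡ λk, which in general will not satisfy (3.15) and hence need not produce an ℓ∞-norm symmetric density), the following Python code simulates the exchangeable vector X = (TΠ(1), . . . , TΠ(d)) from the first d jump times of a pure birth process; all names are ad hoc.

    import numpy as np

    def sample_permuted_birth_times(d, rates, n_samples, rng):
        # E_k exponential with rate rates[k-1] (constant intensity r_k), T_k = E_1 + ... + E_k,
        # and X is a uniformly random permutation of (T_1, ..., T_d)
        e = rng.exponential(scale=1.0 / np.asarray(rates), size=(n_samples, d))
        t = np.cumsum(e, axis=1)                      # jump times T_1 < ... < T_d
        x = np.empty_like(t)
        for i in range(n_samples):
            x[i] = t[i, rng.permutation(d)]           # exchangeable vector X
        return x

    rng = np.random.default_rng(1)
    x = sample_permuted_birth_times(d=3, rates=[3.0, 2.0, 1.0], n_samples=100_000, rng=rng)
    # exchangeability check: all components share the same marginal distribution
    print(np.mean(x, axis=0))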
Example 3.4 (Pareto mixture of uniforms). Let M in Theorem 3.3 have survival function P(M > x) = min{1, x−α} for some α > 0. The associated function gd generating the ℓ∞-norm symmetric density is given by
\[
g_d(x) = E\big[\mathbf{1}_{\{M>x\}}\, M^{-d}\big] = \alpha \int_{\max\{x,1\}}^{\infty} u^{-d-1-\alpha}\, du = \frac{\alpha}{d+\alpha}\, \max\{1, x\}^{-d-\alpha}.
\]
The components Xk of X have the following one-dimensional distribution function G(x) := P(Xk ≤ x), and respective inverse G−1, given by
\[
G(x) = \begin{cases} \frac{\alpha}{1+\alpha}\, x, & \text{if } x < 1,\\[4pt] 1 - \frac{1}{1+\alpha}\, x^{-\alpha}, & \text{if } x \ge 1, \end{cases}
\qquad
G^{-1}(y) = \begin{cases} \frac{1+\alpha}{\alpha}\, y, & \text{if } 0 < y < \frac{\alpha}{1+\alpha},\\[4pt] \big((1-y)\,(1+\alpha)\big)^{-\frac{1}{\alpha}}, & \text{if } \frac{\alpha}{1+\alpha} \le y < 1. \end{cases}
\]
Scatter plots from this copula for different values of α are depicted in Figure 3,
visualizing the dependence structure behind pairs of X. The dependence de-
creases with α, and the limiting cases α = 0 and α = ∞ correspond to perfect
positive association and independence, respectively. One furthermore observes
that the dependence is highly asymmetric, i.e. large values of G(X1 ), G(X2 ) are
more likely jointly close to each other than small values, which behave like in-
dependence. This effect can be quantified in terms of the so-called upper- and
lower-tail dependence coefficients, given by
\[
\lim_{x\to\infty} P(X_1 > x \mid X_2 > x) = \frac{2}{2+\alpha}, \qquad \lim_{x\downarrow 0} P(X_1 \le x \mid X_2 \le x) = 0,
\]
respectively.
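The stated upper-tail limit can be verified by a simple Monte Carlo experiment (an illustrative sketch with ad hoc helper names): simulate X = M·(U1, U2) and estimate P(X1 > x | X2 > x) at a high marginal threshold, which should be close to 2/(2 + α).

    import numpy as np

    def sample_pareto_mixture(n, d, alpha, rng):
        # X = M * (U_1, ..., U_d) with P(M > x) = min(1, x^(-alpha)) and U_k iid uniform(0,1)
        m = rng.uniform(size=n) ** (-1.0 / alpha)     # inverse transform for the Pareto law on [1, inf)
        u = rng.uniform(size=(n, d))
        return m[:, None] * u

    rng = np.random.default_rng(2)
    alpha = 1.0
    x = sample_pareto_mixture(n=1_000_000, d=2, alpha=alpha, rng=rng)
    threshold = np.quantile(x[:, 1], 0.995)           # high marginal threshold
    estimate = np.mean(x[x[:, 1] > threshold, 0] > threshold)
    print(f"P(X1 > x | X2 > x) near x = {threshold:.2f}: {estimate:.3f}; "
          f"limit 2/(2+alpha) = {2 / (2 + alpha):.3f}")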
\[
P(X_{i_1} > t_{i_1} + t, \ldots, X_{i_n} > t_{i_n} + t \mid X_{i_1} > t, \ldots, X_{i_n} > t) = P(X_{i_1} > t_{i_1}, \ldots, X_{i_n} > t_{i_n}),
\]
Fig 3. Top: 5000 samples of (G(X1 ), G(X2 )) for α = 0.1 in Example 3.4. Bottom: 5000
samples of (G(X1 ), G(X2 )) for α = 1 in Example 3.4.
$Z_{i_1,\ldots,i_n}(t) := \big(\mathbf{1}_{\{X_{i_1} > t\}}, \ldots, \mathbf{1}_{\{X_{i_n} > t\}}\big)$, t ≥ 0, the stochastic process which indicates for each of the n components i1, . . . , in whether it is still working or already dysfunctional. The random vector X has the lack-of-memory property if and only if $Z_{i_1,\ldots,i_n}$ is a continuous-time Markov chain for all 1 ≤ i1 < . . . < in ≤ d.
From a theoretical point of view, studying the (multivariate) lack-of-memory
property is also natural as it generalizes very popular one-dimensional prob-
ability distributions to the multivariate case. Indeed, if d = 1 we abbreviate
X := X1 and recall the following classical characterizations.
Lemma 4.1 (Characterization of lack-of-memory for d = 1).
(E) If the support of X equals [0, ∞), then X satisfies the lack-of-memory
property if and only if X has an exponential distribution, that is P(X >
t) = exp(−λ t) for some λ > 0.
(G) If the support of X equals N, then X satisfies the lack-of-memory property
if and only if X has a geometric distribution, that is $P(X > n) = (1-p)^n$
for some p ∈ (0, 1).
Proof. In the geometric case, inductively we see that F̄ (n) := P(X > n) satisfies
$\bar F(n) = \bar F(1)^n$, n ∈ N0 , and the claim follows with p := 1 − F̄ (1). Notice
that F̄ (1) ∈ {0, 1} is ruled out by the assumption of support equal to N. The
exponential case follows similarly, see [12, p. 190].
Xk := min{EI : k ∈ I}, k = 1, . . . , d.
Xk := min{GI : k ∈ I}, k = 1, . . . , d.
\[
b_k := \sum_{i=0}^{d-k} \binom{d-k}{i}\, p_i, \qquad \text{(wide-sense geometric case)}, \tag{4.3}
\]
Whereas for arbitrary t > 0 the sequence {exp(−t Ψ(k))}k∈N0 lies in M∞ as the moment sequence of exp(−Zt), the sequence {exp(−Ψ(k))}k∈N0 even lies in the smaller set LM∞ of completely log-monotone sequences. The subset LM∞ ⊂ M∞ corresponds precisely to the infinitely divisible laws on [0, ∞], which is the discrete analogue of the well-known statement that exp(−t Ψ) is a completely monotone function for arbitrary t > 0 if and only if Ψ(1), the first derivative of Ψ, is completely monotone. With this information and the information of paragraph 4.2 as background, the following theorem is now quite intuitive.
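Before stating it, the sequence conditions can be made tangible numerically. The following Python sketch (purely illustrative, not part of the cited references) checks membership in M∞, i.e. complete monotonicity of a sequence in the Hausdorff sense, by verifying non-negativity of all alternating finite differences.

    import numpy as np

    def is_completely_monotone(b, tol=1e-12):
        # check (-1)^j (Delta^j b)_k >= 0 for all orders j, with (Delta b)_k = b_{k+1} - b_k
        diffs = np.asarray(b, dtype=float)
        for j in range(1, len(b)):
            diffs = np.diff(diffs)                    # j-th forward difference
            if np.any((-1.0) ** j * diffs < -tol):
                return False
        return True

    k = np.arange(20)
    b_good = 1.0 / (1.0 + k)     # E[exp(-k Y)] for Y unit exponential, i.e. moments of a uniform law
    b_bad = np.cos(k / 3.0)      # not a moment sequence
    print(is_completely_monotone(b_good), is_completely_monotone(b_bad))   # True False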
Theorem 4.1 solves Problem 1.3 for the property (P) of “satisfying the mul-
tivariate lack-of-memory property”.
Theorem 4.1 (Lack-of-memory, exchangeability and conditionally iid).
(E) The function (4.1) is a survival function (of some X) with support [0, ∞)d
if and only if we have (b0 , . . . , bd ) ∈ LMd . Furthermore, the associated ex-
changeable Marshall-Olkin distribution admits a stochastic representation
that is conditionally iid if there exist bd+1 , bd+2 , . . . such that {bk }k∈N0 ∈
LM∞ . To wit, in this case there exists a (possibly killed) Lévy subordina-
tor Z = {Zt }t≥0 , determined in law via
\[
b_k := E\big[e^{-k\, Z_1}\big], \qquad k \in \mathbb{N}_0, \tag{4.5}
\]
such that X has the same distribution as the vector defined in (1.5).
(G) The function (4.1) is a survival function (of some X) with support Nd
if and only if we have (b0 , b1 , . . . , bd ) ∈ Md . Furthermore, the associated
exchangeable wide-sense geometric distribution admits a stochastic rep-
resentation that is conditionally iid if there exist bd+1 , bd+2 , . . . such that
{bk }k∈N0 ∈ M∞ . To wit, in this case there exists an iid sequence Y1 , Y2 , . . .
of random variables taking values in [0, ∞], determined in law via
\[
b_k := E\big[e^{-k\, Y_1}\big], \qquad k \in \mathbb{N}_0, \tag{4.6}
\]
such that X has the same distribution as the vector defined in (1.5) when
Zt := Y1 + Y2 + . . . + Yt , t ≥ 0.
Proof. Part (E) is due to [71, 72], while part (G) is due to [77].
First, we observe that once the correspondence between Md and the wide-
sense geometric law is established, the correspondence between LMd and the
narrow-sense geometric law (or, algebraically equivalent, its continuous counter-
part the Marshall-Olkin law) follows from (4.4) together with (4.2) and (4.3).
This is because the λj in (4.2) are arbitrary non-negative numbers, and the pi in
(4.3) are also arbitrary non-negative up to scaling (i.e. with an additional scale
factor c > 0 we have that c (p0 , . . . , pd−1 ) and (λ1 , . . . , λd ) both run through all
of [0, ∞)d \ {(0, . . . , 0)}, noticing that pd is determined by p0 , . . . , pd−1 ). Con-
cretely, by the correspondence between Md and the wide-sense geometric law,
we obtain a correspondence between Md and [0, ∞)d \ {(0, . . . , 0)} up to scaling
\[
f_X(x) = \frac{\Gamma(p+q)}{\Gamma(p)\, \Gamma(q)}\, x^{p-1}\, (1-x)^{q-1}, \qquad 0 < x < 1,
\]
The associated probability distribution of Y1 in Theorem 4.1(G) is given by $Y_1 \stackrel{d}{=} -\log(X)$, i.e. the logarithm of the reciprocal of the Beta-distributed random variable in question.
Similarly, making use of (4.4), a two-parametric family of d-variate Marshall-Olkin survival functions (for arbitrary d ≥ 1) is given by
\[
\bar F_{p,q}(x) \;=\; \exp\bigg(-\sum_{k=1}^{d} \big(x_{[d-k+1]}-x_{[d-k]}\big) \sum_{i=0}^{k-1} \frac{\Gamma(p+q)\,\Gamma(p+i)}{\Gamma(p)\,\Gamma(p+q+i)}\bigg)
\;=\; \exp\bigg(-\sum_{k=1}^{d} x_{[d-k+1]}\, \frac{\Gamma(p+q)\,\Gamma(p+k-1)}{\Gamma(p)\,\Gamma(p+q+k-1)}\bigg), \qquad x \in [0,\infty)^d.
\]
Throughout this paragraph, for the sake of a more compact notation we implic-
itly make excessive use of the abbreviations $f(0) := \lim_{x\downarrow 0} f(x)$ and $f(\infty) :=$
limx→∞ f (x) for functions f : (0, ∞) → (0, ∞), provided the respective limits
exist in [0, ∞].
If one can find sequences α1 (n), . . . , αd (n) > 0 and β1 (n), . . . , βd (n) ∈ R such
that the re-scaled vector
\[
\bigg(\frac{\max_{i=1}^n\{V_1^{(i)}\} - \beta_1(n)}{\alpha_1(n)}, \;\ldots,\; \frac{\max_{i=1}^n\{V_d^{(i)}\} - \beta_d(n)}{\alpha_d(n)}\bigg)
\]
for a copula C : [0, 1]d → [0, 1] with the characterizing property that $C(u)^t = C(u_1^t, \ldots, u_d^t)$ for each t > 0, a so-called extreme-value copula.
In order to focus on a deeper understanding of extreme-value copulas it is conve-
nient to normalize the margins F1 , . . . , Fd . In classical extreme-value theory, it
is standard to normalize to standardized Fréchet distributions, i.e. Fk (x) =
exp(−λk /x) 1{x>0} for some λk > 0. Furthermore, we observe that X :=
(1/Y1 , . . . , 1/Yd ) is well-defined, Xk is exponential with rate λk , and X is min-
stable (since x → 1/x is strictly decreasing, so max-stability of Y is flipped to
min-stability of X). The vector X is thus called min-stable multivariate expo-
nential and has survival function
\[
\bar F(x) = P(X > x) = P(Y < 1/x) = C\big(e^{-\lambda_1 x_1}, \ldots, e^{-\lambda_d x_d}\big),
\]
with extreme-value copula C. The survival function F̄ is min-stable, satisfying
\[
\bar F(x)^t = \bar F(t\, x), \qquad t > 0. \tag{5.1}
\]
The analytical property (5.1) characterizes the concept of min-stable multivari-
ate exponentiality on the level of survival functions, and serves as a convenient
starting point to study the conditionally iid subfamily (of extreme-value copulas,
resp. min-stable multivariate exponential distributions). For a given extreme-
value copula C it further turns out convenient to consider its so-called stable
tail dependence function
\[
\ell(x) := -\log C\big(e^{-x_1}, \ldots, e^{-x_d}\big), \qquad x \in [0,\infty)^d,
\]
taking values in [0, ∞]. It is not difficult to see that H := 1 − exp(−Z) takes
values in H+ . Consequently, we may define a conditionally iid random vector
X via the canonical stochastic model (1.5) from this process H. Conditioned
on H, the components of X are iid with distribution function H. It turns out
that X is min-stable multivariate exponential. To see this, we recall that the
increasing sequence {η1 + . . . + ηn }n≥1 equals the enumeration of the points of a
Poisson random measure on [0, ∞) with intensity measure equal to the Lebesgue
measure. This implies with the help of [91, Proposition 3.6] in (∗) below that the
survival function F̄ of X is given by
\begin{align*}
\bar F(x) &= P\big(Z_{x_1} \le \epsilon_1, \ldots, Z_{x_d} \le \epsilon_d\big) = E\Big[e^{-\sum_{k=1}^{d} Z_{x_k}}\Big]\\
&= E\bigg[\exp\bigg(-\sum_{n\ge 1}\sum_{k=1}^{d} -\log G\Big(\frac{\eta_1+\ldots+\eta_n}{x_k}-\Big)\bigg)\bigg]\\
&\stackrel{(*)}{=} \exp\bigg(-\int_0^\infty 1 - \prod_{k=1}^{d} G\Big(\frac{u}{x_k}\Big)\, du\bigg).
\end{align*}
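This identity is easy to check numerically. The sketch below (illustrative only; the unit-mean exponential distribution function is a hypothetical choice for G) evaluates the survival function by quadrature and verifies the min-stability property (5.1).

    import numpy as np
    from scipy import integrate

    def survival(x, G):
        # F_bar(x) = exp( - int_0^inf ( 1 - prod_k G(u / x_k) ) du ) for positive x
        x = np.asarray(x, dtype=float)
        value, _ = integrate.quad(lambda u: 1.0 - np.prod(G(u / x)), 0.0, np.inf)
        return np.exp(-value)

    G = lambda t: np.where(t > 0, 1.0 - np.exp(-t), 0.0)   # unit-mean exponential distribution function

    x = np.array([0.5, 1.0, 2.0])
    for t in (0.5, 2.0, 3.0):
        print(f"t={t}: F_bar(x)^t = {survival(x, G) ** t:.6f}, F_bar(t*x) = {survival(t * x, G):.6f}")
    # the two values agree for each t, illustrating the min-stability property (5.1)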
The main theorem in this section states that Examples 5.1 and 5.2 are gen-
eral enough to understand the structure of the set of all infinite exchangeable
sequences {Xk }k∈N whose finite-dimensional margins are both min-stable multi-
variate exponential and conditionally iid. Concretely, Theorem 5.1 solves Prob-
lem 1.3 for the property (P) of “having a min-stable multivariate exponential
distribution (in some dimension)”. In analytical terms, it states that the stable
tail dependence function associated with the extreme-value copula of a condi-
tionally iid min-stable multivariate exponential random vector is a convex mix-
ture of stable tail dependence functions having the structural form as presented
in Examples 5.1 and 5.2.
Theorem 5.1 (Which min-stable laws are conditionally iid?). Let {Xk }k∈N be
an infinite exchangeable sequence of positive random variables such that X =
(X1 , . . . , Xd ) is min-stable multivariate exponential for all d ∈ N. Assume that
{Xk }k∈N is not iid, i.e. not given as in Example 5.1. Then there exists a unique
triplet (b, c, γ) of two constants b ≥ 0, c > 0 and a probability measure γ on
H+,1 , such that Xk is exponential with rate b + c for each k ∈ N and the stable
tail dependence function of X equals
\[
\ell(x) := -\log \bar F\Big(\frac{x}{b+c}\Big) = \frac{b}{b+c}\, \|x\|_1 + \frac{c}{b+c} \int_{H_{+,1}} \ell_G(x)\, \gamma(dG).
\]
In probabilistic terms, the random distribution function H, defined as the limit
of empirical distribution functions of the {Xk }k∈N as in Lemma 1.8, necessarily
satisfies $H \stackrel{d}{=} 1 - \exp(-Z)$ with
\[
Z_t = b\, t + c \sum_{n\ge 1} -\log G^{(n)}\Big(\frac{\eta_1+\ldots+\eta_n}{t}-\Big), \qquad t \ge 0, \tag{5.2}
\]
where G(k) is an iid sequence drawn from the probability measure γ, independent
of the iid unit exponentials η1 , η2 , . . ..
Proof. A proof consists of three steps, which have been accomplished in the
three references [74, 59, 66], respectively, and which are sketched in the sequel.
(i) For Z = − log(1 − H) with H as defined in Lemma 1.8 from the sequence
{Xk }k∈N , [74, Theorem 5.3] shows that
\[
Z \stackrel{d}{=} \Big\{\sum_{i=1}^{n} Z^{(i)}_{t/n}\Big\}_{t\ge 0}, \qquad n \in \mathbb{N}, \tag{5.3}
\]
where $f^{(n)}$ are iid copies of some càdlàg stochastic process f with $f_0 = 0$ satisfying some integrability condition, and b ∈ R.
(iii) [66] proves that in the series representation in (ii) necessarily b ≥ 0 and f is almost surely non-decreasing. Furthermore, the integrability condition on f can be re-phrased to say that $t \mapsto \tilde G_t := \exp(-\lim_{s\downarrow t} f_{1/s})$ defines almost surely the distribution function of some random variable with finite mean $M_{\tilde G} = \int_0^\infty 1 - \tilde G_t\, dt > 0$. Finally, the distribution function $t \mapsto G_t := \tilde G_{M_{\tilde G}\, t}$ has unit mean, and the claimed representation for ℓ is obtained when $c := E[M_{\tilde G}]$ and γ is defined as the probability law of G after an appropriate measure change. That (b, c, γ) is unique follows from the normalization to unit mean of G (for each single realization).
Stochastic processes with property (5.3) are said to be strongly infinitely
divisible with respect to time (strong IDT). Particular examples of strong IDT
processes have been studied in [78, 27, 43], with an emphasis on the associated
multivariate min-stable laws also in [74, 9, 64, 76].
Every Lévy process is strong IDT, but the converse need not hold. For instance, if Z = {Zt }t≥0 is a non-trivial Lévy subordinator and a > b > 0, then the stochastic process $\{Z_{a\,t} + Z_{b\,t}\}_{t\ge 0}$ is strong IDT, but not a Lévy subordinator. The probability law γ in Theorem 5.1 in case of a Lévy subordinator is specified as the probability law of
\[
G_t = e^{-M} + \big(1 - e^{-M}\big)\, \mathbf{1}_{\{1 - e^{-M} \ge 1/t\}}, \qquad t \ge 0, \tag{5.4}
\]
with an arbitrary random variable M taking values in (0, ∞]. The Lévy measure
of Z and the probability law of M stand in one-to-one relation. We know from
the preceding section that if Z is a Lévy subordinator, the associated element
in M∗∗ is a d-variate Marshall-Olkin distribution. Indeed, the Marshall-Olkin
distribution is one of the most important examples of min-stable multivariate
exponential distributions. Two further examples are presented in the sequel.
Example 5.3 (The (negative) logistic model). If we reconsider Example 5.2
with the Fréchet distribution function $G(x) = \exp(-\{\Gamma(1-\theta)\, x\}^{-1/\theta})$ for θ ∈ (0, 1), then we observe
\[
\ell_G(x) = \Big(\sum_{k=1}^{d} x_k^{1/\theta}\Big)^{\theta} = \|x\|_{1/\theta}.
\]
The associated extreme-value copula is the well-known Gumbel copula, which is the only copula that is both Archimedean and of extreme-value kind, a result first discovered in [39].
A related example is obtained if we choose the Weibull distribution function $G(x) = 1 - \exp(-\{\Gamma(\theta+1)\, x\}^{1/\theta})$, which implies
\[
\ell_G(x) = \sum_{j=1}^{d} (-1)^{j+1} \sum_{1 \le i_1 < \ldots < i_j \le d} \Big(\sum_{k=1}^{j} x_{i_k}^{-\theta}\Big)^{-\frac{1}{\theta}}.
\]
This is the so-called negative logistic model. The associated extreme-value cop-
ula is named Galambos copula after [36]. There exist many analogies between
logistic and negative logistic models, the interested reader is referred to [37] for
background. In particular, the Galambos copula is the most popular representa-
tive of the family of so-called reciprocal Archimedean copulas as introduced in
[38], see also paragraph 7.1 below.
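Both stable tail dependence functions are straightforward to implement. The following Python sketch (ad hoc code, not taken from [36, 37, 39]) evaluates them and checks numerically that the induced copulas C(u) = exp(−ℓ(−log u1, . . . , −log ud)) satisfy the extreme-value property C(u)^t = C(u1^t, . . . , ud^t), which holds because both functions are homogeneous of order one.

    import itertools
    import numpy as np

    def ell_logistic(x, theta):
        # logistic (Gumbel) stable tail dependence function: ( sum_k x_k^(1/theta) )^theta
        return np.sum(np.asarray(x) ** (1.0 / theta)) ** theta

    def ell_negative_logistic(x, theta):
        # negative logistic (Galambos): inclusion-exclusion over all non-empty index subsets
        x = np.asarray(x, dtype=float)
        total = 0.0
        for j in range(1, len(x) + 1):
            for subset in itertools.combinations(range(len(x)), j):
                total += (-1.0) ** (j + 1) * np.sum(x[list(subset)] ** (-theta)) ** (-1.0 / theta)
        return total

    def ev_copula(u, ell, theta):
        # extreme-value copula induced by a stable tail dependence function
        return np.exp(-ell(-np.log(np.asarray(u)), theta))

    u = np.array([0.3, 0.7, 0.9])
    for ell in (ell_logistic, ell_negative_logistic):
        for t in (0.5, 2.0):
            print(ell.__name__, t, round(ev_copula(u, ell, 0.5) ** t, 6), round(ev_copula(u ** t, ell, 0.5), 6))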
Example 5.4 (A rich parametric family). For G ∈ H+,1 the function $\Psi_G(z) := \int_0^\infty 1 - G(t)^z\, dt$ defines a Bernstein function with ΨG(1) = 1, see [64, Lemma 3]. This implies for z ∈ (0, ∞) that Gz ∈ H+,1, where $G_z(x) := G(x\,\Psi_G(z))^z$. Consequently, if M is a positive random variable, we may define $\gamma \in M_+^1(H_{+,1})$ as the law of GM. The associated stable tail dependence function equals $\ell(x) := E[\ell_{G_M}(x)]$. Many parametric models from the literature are covered by this construction. In particular, Example 5.2 corresponds to the case M ≡ 1, and if $G(x) = \exp(-1) + (1 - \exp(-1))\, \mathbf{1}_{\{1-\exp(-1)\ge 1/x\}}$ we observe that GM equals the random distribution function (5.4) corresponding to the Marshall-Olkin subfamily. See [76] for a detailed investigation and applications of this parametric family.
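A quick numerical sanity check of this transformation (a sketch with a hypothetical choice of G, namely the unit-mean exponential distribution function): compute ΨG(z) by quadrature and verify that Gz(x) := G(x ΨG(z))^z indeed has unit mean.

    import numpy as np
    from scipy import integrate

    G = lambda t: 1.0 - np.exp(-max(t, 0.0))     # hypothetical element of H_{+,1}: unit-mean exponential

    def psi_G(z):
        # Psi_G(z) = int_0^inf ( 1 - G(t)^z ) dt
        value, _ = integrate.quad(lambda t: 1.0 - G(t) ** z, 0.0, np.inf)
        return value

    def mean_of_G_z(z):
        # int_0^inf ( 1 - G_z(x) ) dx with G_z(x) = G(x * Psi_G(z))^z; should equal one
        psi = psi_G(z)
        value, _ = integrate.quad(lambda x: 1.0 - G(x * psi) ** z, 0.0, np.inf)
        return value

    for z in (0.3, 1.0, 4.0):
        print(f"z={z}: Psi_G(z) = {psi_G(z):.4f}, mean of G_z = {mean_of_G_z(z):.4f}")
    # Psi_G(1) = 1 and every reported mean equals 1, in line with G_z being an element of H_{+,1}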
Remark 5.1 (Extension to laws with exponential minima). We have seen that
the Marshall-Olkin distribution is a subfamily of min-stable multivariate ex-
ponential laws. The seminal reference [26] treats both families as multivariate
extensions of the univariate exponential law and in the process introduces the
even larger family of laws with exponential minima. A random vector X is said
to have exponential minima if min{Xi1 , . . . , Xik } has a univariate exponential
law for arbitrary 1 ≤ i1 < . . . ik ≤ d. Obviously, a min-stable multivariate expo-
nential law has exponential minima, but the converse needs not hold in general.
It is shown in [74] that if Z = {Zt }t≥0 is a right-continuous, non-decreasing
process such that E[exp(−x Zt )] = exp(−t Ψ(x)) for some Bernstein function
Ψ, then X as defined in (1.5) has exponential minima. The process Z is said
to be weakly infinitely divisible with respect to time (weak IDT), and – as the
nomenclature suggests – every strong IDT process is also weak IDT. However,
there exist weak IDT processes which are not strong IDT. Notice in particular
that a Lévy subordinator is uniquely determined in law by the law of Z1 (or
equivalently the Bernstein function Ψ), but neither strong nor weak IDT pro-
cesses are determined in law by the law of Z1. If one takes two independent, but different, strong IDT processes $Z^{(1)}, Z^{(2)}$ subject to $Z^{(1)}_1 \stackrel{d}{=} Z^{(2)}_1$, then the
stochastic process
\[
Z_t := \begin{cases} Z^{(1)}_t & \text{if } B = 1,\\ Z^{(2)}_t & \text{if } B = 0, \end{cases} \qquad B \text{ independent Bernoulli}\big(\tfrac{1}{2}\big)\text{-distributed}, \quad t \ge 0,
\]
is weak IDT, but not strong IDT. On the level of X this means that the mixture
of two min-stable multivariate exponential random vectors always has exponen-
tial minima, but need not be min-stable anymore.
Remark 5.2 (Archimax copulas). The study of min-stable multivariate expo-
nentials is analogous to the study of extreme-value copulas. From this perspec-
tive, Theorem 5.1 gives us a canonical stochastic model for all conditionally iid
extreme-value copulas. Another family of copulas for which we understand the
conditionally iid subfamily pretty well is Archimedean copulas, related to ℓ1-norm symmetric distributions and mentioned in Remark 3.4. The family of so-called
Archimax copulas is a superclass of both extreme-value and Archimedean cop-
ulas. It has been studied in [14, 15] with the intention to create a rich copula
family that comprises well-known subfamilies. An extreme-value copula C is con-
veniently described in terms of its stable tail dependence function. Recall that
Theorem 5.1 is formulated in terms of the stable tail dependence function and
gives an analytical criterion for C to be conditionally iid. An Archimax copula
C is a multivariate distribution function of the functional form
\[
C_{\ell,\varphi}(u_1, \ldots, u_d) = \varphi\Big(\ell\big(\varphi^{-1}(u_1), \ldots, \varphi^{-1}(u_d)\big)\Big).
\]
when M denotes the family of all probability laws with the property (P) of “having a survival function of the functional form ϕ ◦ ℓ with some stable tail dependence function ℓ”. In this case, the function ϕ equals the Laplace transform of M and is given in terms of a triplet (b, c, γ) such as in Theorem 5.1, associated with the strong IDT process Z, and b + c = 1. Notice that each stable tail dependence function equals the restriction of an orthant-monotonic norm to [0, ∞)d, see [85], so that survival functions of the form ϕ ◦ ℓ are precisely the survival functions that are symmetric with respect to this norm.
We are interested in a solution of Problem 1.3 for the property (P) of “defin-
ing an exogenous shock model”. By Lemma 1.1 exchangeability is a necessary
requirement on X, and we observe immediately from (6.1) that this implies
that the distribution function of EI is allowed to depend on the subset I only
through its cardinality |I|. Some simple algebraic manipulations, see the proof
of Theorem 6.1 below, reveal that the survival function of X necessarily must
be given as the product of its arguments after being ordered and idiosyncrat-
ically distorted. Already the characterization of the exchangeable subfamily in analytical terms is an interesting problem; the interested reader is referred to [68] for its solution.
The conditionally iid subfamily M∗∗ is also investigated in [68]. One major
finding is that when the increments of the factor process Z in the canonical
construction (1.5) are independent, then one ends up with an exogenous shock
model. Recall that a càdlàg stochastic process Z = {Zt }t≥0 with independent
increments is called additive, see [94] for a textbook treatment. For our purpose,
it is sufficient to be aware that the probability law of a non-decreasing additive
process Z = {Zt }t≥0 with Z0 = 0 can be described uniquely in terms of a
family {Ψt }t≥0 of Bernstein functions defined by Ψt (x) := − log(E[exp(−x Zt )]),
x ≥ 0, i.e. Ψt equals the Laplace exponent of the infinitely divisible random
variable Zt . The independent increment property implies for 0 ≤ s ≤ t that
Ψt − Ψs is also a Bernstein function and equals the Laplace exponent of the
infinitely divisible random variable Zt − Zs . The easiest example for a non-
decreasing additive process is a Lévy subordinator, in which case Ψt = t Ψ1 ,
i.e. the probability law is described completely in terms of just one Bernstein
function Ψ1 (due to the defining property that the increments are not only
independent but also identically distributed). Two further compelling examples
of (non-Lévy) additive processes are presented in subsequent paragraphs.
Theorem 6.1 (Additive subordinators and exogenous shock models). Let M
denote the family of probability laws with the property (P) of “defining an ex-
ogenous shock model”. A random vector X has law in M and is exchangeable if
and only if it admits a survival copula of the functional form
\[
\hat C(u_1, \ldots, u_d) = u_{[1]} \prod_{k=2}^{d} g_k(u_{[k]}), \qquad u_1, \ldots, u_d \in [0,1], \tag{6.2}
\]
Proof. A proof for the inclusion “⊃” has been accomplished only recently and
can be found in [102]. A proof sketch for the inclusion “⊂” works as follows, see
[68] for details. The survival function of the random vector X defined by (6.1)
can be written in terms of the one-dimensional survival functions of the EI as
\[
P(X > x) = \prod_{\emptyset \ne I} P\big(E_I > \max\{x_k : k \in I\}\big).
\]
Since by exchangeability the law of EI depends on the subset I only through its cardinality |I|, we may denote by H̄m the survival function of EI for |I| = m and obtain
\[
P(X > x) = \prod_{m=1}^{d} \prod_{I :\, |I| = m} \bar H_m\big(\max\{x_k : k \in I\}\big)
= \prod_{m=1}^{d} \prod_{k=1}^{d-m+1} \bar H_m\big(x_{[d-k+1]}\big)^{\binom{d-k}{m-1}}
= \prod_{k=1}^{d} \prod_{m=1}^{d-k+1} \bar H_m\big(x_{[d-k+1]}\big)^{\binom{d-k}{m-1}}. \tag{6.3}
\]
In particular, the one-dimensional marginal survival functions are given by
\[
P(X_k > x) = \prod_{m=1}^{d} \bar H_m(x)^{\binom{d-1}{m-1}} =: \bar F_1(x), \qquad k = 1, \ldots, d.
\]
That (6.3) can be written as Ĉ P(X1 > x1 ), . . . , P(Xd > xd ) with Ĉ as in (6.2)
follows by a tedious yet straightforward computation with the gk defined as
\[
g_k := \bigg(\prod_{m=1}^{d-k+1} \bar H_m^{\binom{d-k}{m-1}}\bigg) \circ \bar F_1^{-1}, \qquad k = 2, \ldots, d,
\]
where $\bar F_1^{-1}$ denotes the generalized inverse of the non-increasing function $\bar F_1$, which is defined analogously to the generalized inverse of a distribution function.
Notice that for each fixed t > 0 this corresponds to an infinitely divisible distribution of Zt that is concentrated on the set
\[
\big[-\log g_2\big(F_M^{-1}(e^{-t})\big),\, \infty\big].
\]
The case $g_2(x) = x^\alpha$ with α ∈ [0, 1] implies that Z is a killed Lévy subordinator that grows linearly before it jumps to infinity, X has a Marshall-Olkin law, and the EI are exponential. In the general case, Z need not grow linearly before it
gets killed.
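For concreteness, the following Python sketch simulates an exchangeable exogenous shock model of the form (6.1) with a hypothetical choice of shock distributions whose law depends on a subset I only through its cardinality (here: exponential shocks with one rate per cardinality), and empirically confirms that the margins of X coincide, as they must under exchangeability.

    import itertools
    import numpy as np

    def sample_exogenous_shock_model(d, rate_by_cardinality, n_samples, rng):
        # X_k = min{ E_I : k in I } with independent shocks E_I, where E_I is exponential
        # with a rate that depends only on the cardinality |I|
        x = np.full((n_samples, d), np.inf)
        for j in range(1, d + 1):
            for subset in itertools.combinations(range(d), j):
                e = rng.exponential(scale=1.0 / rate_by_cardinality[j], size=n_samples)
                for k in subset:
                    x[:, k] = np.minimum(x[:, k], e)
        return x

    rng = np.random.default_rng(3)
    x = sample_exogenous_shock_model(d=3, rate_by_cardinality={1: 1.0, 2: 0.5, 3: 0.25},
                                     n_samples=200_000, rng=rng)
    print(np.mean(x, axis=0))                    # identical margins up to Monte Carlo error
    print(np.corrcoef(x, rowvar=False)[0, 1])    # positive dependence induced by the joint shocks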
Two further examples are studied in greater detail in the following two para-
graphs, since they give rise to nice characterization results.
In the two landmark papers [33, 34], T.S. Ferguson introduces the so-called
Dirichlet prior and shows that it can be constructed by means of an addi-
tive process. More clearly, let c > 0 be a model parameter and let G ∈ H+ ,
continuous and strictly increasing. Consider a non-decreasing additive process
Z = {Zt }t∈[G−1 (0),G−1 (1)] whose probability law is determined by a family of
Bernstein functions {Ψt }t∈(G−1 (0),G−1 (1)) , which are given by
\[
\Psi_t(x) = \int_0^\infty \big(1 - e^{-x\, u}\big)\, \frac{e^{-u\, c\, (1-G(t))} - e^{-u\, c}}{u\, (1 - e^{-u})}\, du, \qquad x \ge 0.
\]
The random distribution function H = {Ht }t∈[G−1 (0),G−1 (1)] defined by Ht :=
1 − exp(−Zt ) satisfies the following property: For arbitrary G−1 (0) < t1 < . . . <
td < G−1 (1) the random vector
\[
\big(H_{t_1},\; H_{t_2} - H_{t_1},\; \ldots,\; H_{t_d} - H_{t_{d-1}},\; 1 - H_{t_d}\big)
\]
has a Dirichlet distribution16 with parameters $c\,\big(G(t_1),\, G(t_2)-G(t_1),\, \ldots,\, 1 - G(t_d)\big)$, and H is called Dirichlet prior with parameters (c, G), denoted DP (c, G) in
the sequel. The probability distribution of (X1 , . . . , Xd ) in (1.5), when H =
DP (c, G) for some G with support [G−1 (0), G−1 (1)] := [0, ∞], is given by
It is insightful to remark that for c ↓ 0 the copula Ĉc converges to the so-called upper Fréchet–Hoeffding copula $\hat C_0(u) = u_{[1]}$, and for c ↑ ∞ to the copula $\hat C_\infty(u) = \prod_{k=1}^d u_k$ associated with independence. The intuition of the Dirichlet
prior model is that all components of X have distribution function G, but one is
uncertain whether G is really the correct distribution function. So the parameter
c models an uncertainty about G in the sense that the process H must be viewed
as a “distortion” of G. For c ↑ ∞ we obtain H = G, while for c ↓ 0 the process
H is maximally chaotic (in some sense) and does not resemble G at all.
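This intuition is easy to visualize by simulation. The sketch below draws H ~ DP(c, G) via the well-known stick-breaking representation (an equivalent construction, not the additive-process construction described above; all names are ad hoc) and compares H with G on a small grid: for large c the draw closely tracks G, for small c it is dominated by a few atoms.

    import numpy as np

    def sample_dirichlet_prior(c, G_inverse, n_atoms, rng):
        # stick-breaking: H = sum_i w_i * delta_{theta_i}, theta_i iid from G,
        # w_i = V_i * prod_{j<i} (1 - V_j) with V_i iid Beta(1, c); truncated after n_atoms atoms
        v = rng.beta(1.0, c, size=n_atoms)
        w = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1])))
        theta = G_inverse(rng.uniform(size=n_atoms))
        return w, theta

    rng = np.random.default_rng(4)
    G = lambda t: 1.0 - np.exp(-t)                    # base distribution function G: unit exponential
    G_inverse = lambda u: -np.log(1.0 - u)
    t_grid = np.array([0.5, 1.0, 2.0])
    for c in (0.5, 100.0):
        w, theta = sample_dirichlet_prior(c, G_inverse, n_atoms=2000, rng=rng)
        H_on_grid = [np.sum(w[theta <= t]) for t in t_grid]   # random distribution function H_t
        print(c, np.round(H_on_grid, 3), np.round(G(t_grid), 3))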
Interestingly, if the probability law dG is symmetric about its median μ :=
G−1 (0.5), then the random vector (X1 , . . . , Xd ) is radially symmetric, which
can be verified using Lemma 1.7. One can furthermore show that there ex-
ists no other conditionally iid exogenous shock model satisfying this property,
see the following lemma. To this end, recall that a copula C is called radially
symmetric if C = Ĉ, i.e. it equals its own survival copula, which means that
$U = (U_1, \ldots, U_d) \stackrel{d}{=} (1 - U_1, \ldots, 1 - U_d)$ for U ∼ C.
Lemma 6.1 (Radial symmetry in exchangeable exogenous shock models). A
copula of the structural form (6.2) is radially symmetric if and only if the func-
tions gk are linear, k = 2, . . . , d, which is the case if and only if there is a
c ∈ [0, ∞] such that the copula takes the form (6.4).
Proof. This is [69, Theorem 3.5]. In order to prove necessity, the principle of
inclusion and exclusion can be used to express the survival copula of Ĉ as an
alternating sum of lower-dimensional margins of Ĉ. By radial symmetry, this
expression equals Ĉ, and on both sides of the equation one may now take the
derivatives with respect to all d arguments. A lengthy and tedious computation
then shows that the gk must all be linear, which implies the claim. Sufficiency
is proved using the Dirichlet prior construction. The defining properties of the
Dirichlet prior imply that the assumptions of Lemma 1.7 are satisfied, which
implies the claim.
16 Recall from Remark 3.5 that S = (S1 , . . . , Sd ) has a Dirichlet distribution with parameters α = (α1 , . . . , αd ) if $S \stackrel{d}{=} G/\|G\|_1$ for a vector G of independent unit-scale Gamma-distributed random variables.
\[
P(X > x) = \exp\bigg(-\sum_{k=1}^{d} \Big[\Psi\big((d-k+1)\, x_{[k]}\big) - \Psi\big((d-k)\, x_{[k]}\big)\Big]\bigg), \qquad x \in [0,\infty)^d. \tag{6.5}
\]
\[
P(X > x) = \prod_{k=1}^{d} \bigg(\frac{(d-k)\, x_{[k]} + 1}{(d-k+1)\, x_{[k]} + 1}\bigg)^{\alpha}.
\]
The one-dimensional marginal survival functions are given by $\bar F_1(x) = (1+x)^{-\alpha}$.
Notice that this equals the survival function of Y − 1, when Y has a Pareto
distribution with scale parameter (aka left-end point of support) equal to one
17 This is precisely the Gamma distribution with density (3.7) for α = αk .
The present article surveys solutions to Problems 1.1 and 1.3 for several families
M of interest. One goal of the survey is to encourage others to solve the problem
also for other families. We provide examples that we find compelling:
(i) The family of min-stable laws in Section 5 can be generalized to min-
infinitely divisible laws. Generalizing (5.1), a multivariate survival func-
tion F̄ is called min-infinitely divisible if for each t > 0 there is a sur-
vival function F̄t such that $\bar F(x)^t = \bar F_t(x)$. Just as min-stability is analogous to max-stability, the concept of min-infinite divisibility is equivalent to the concept of max-infinite divisibility, on which [91] provides a textbook
treatment. It is pretty obvious that non-decreasing infinitely divisible pro-
cesses occupy a commanding role with regards to the conditionally iid
subfamily, but to work out a convenient analytical treatment of these in
relation with the associated min-infinitely divisible laws appears to be a
promising direction for further research. Notice that the family of recip-
rocal Archimedean copulas, introduced in [38], is one particular special
case of max-infinitely divisible distribution functions, and in this special
case the conditionally iid subfamily is determined similarly as in the case
of Archimedean copulas, see [38, Section 7]. This might serve as a good
motivating example for the aforementioned generalization.
(ii) Theorem 3.3 studies d-variate densities of the form gd (x[d] ), and [40] also
considers a generalization to densities of the form g(x[1] , x[d] ), depending
on x[1] and x[d] . From a purely algebraic viewpoint it is tempting to investigate whether exchangeable densities of the structural form $\prod_{k=1}^{d} g_k(x_{[k]})$ allow for a nice theory as well. When are these conditionally iid? This generalization of the ℓ∞-norm symmetric case is motivated by a relation
to non-homogeneous pure birth processes, as already explained in Re-
mark 3.7. Such processes are of interest in reliability theory, as explained
in [99], see also [11] and [67, Section 4] for related investigations.
(iii) On page 721 it was mentioned that the Marshall-Olkin distribution is
characterized by the property that for all subsets of components the re-
spective “survival indicator process” is a continuous-time Markov chain.
This property may naturally be weakened to the situation when only the
survival indicator process Zt := (1{X1 >t} , . . . , 1{Xd >t} ) of all components
is a continuous-time Markov chain. On the level of multivariate distribu-
tions, one generalizes the Marshall-Olkin distribution to a more general
family of multivariate laws that has been shown to be interesting in math-
ematical finance in [46]. Furthermore, it is a subfamily of the even larger
This is probably the most obvious application of the presented concepts. The
idea works as follows. According to our notation, the dependence-inducing latent factor in a conditionally iid model is H. Depending on the stochastic properties of $H \sim \gamma \in M_+^1(H)$, it may be possible to construct H from a pair $(H^{(1)}, H^{(2)}) \sim \gamma_1 \otimes \gamma_2 \in M_+^1(H) \times M_+^1(H)$ of two independent processes of the same structural form, say $H = f(H^{(1)}, H^{(2)})$. For example, if $H^{(1)}$ and $H^{(2)}$ are two strong IDT processes, see Section 5, then so is their sum $H = H^{(1)} + H^{(2)}$.
In this situation, we may define dependent processes H (1,1) , . . . , H (1,J) from
J + 1 independent processes H (0) , . . . , H (J) as H (1,j) = f (H (0) , H (j) ). The con-
ditionally iid vectors X (1) , . . . , X (J) defined via (1.4) from H (1,1) , . . . , H (1,J)
are then dependent, so that the combined vector X = (X (1) , . . . , X (J) ) has a
hierarchical dependence structure. Such structures break out of the – sometimes
undesired and limited – exchangeable cosmos and have the appealing property
that the lowest-level groups are conditionally iid, so the whole structure can be
sized up, i.e. is dimension-free to some degree. Of particular interest is the situation when the random vector $(X^{(1)}_1, \ldots, X^{(J)}_1)$ composed of one component from
each of the J different groups is conditionally iid and its latent factor process
equals H (0) in distribution. In this particular situation, an understanding of the
whole dependence structure of the hierarchical model X is retrieved from an
understanding of the conditionally iid sub-models based on the H (j) . In other
words, the conditionally iid model can be nested to construct highly tractable,
non-exchangeable, multi-factor dependence models from simple building blocks.
For instance, hierarchical elliptical laws, Archimedean copulas18 , and min-stable
laws can be constructed based on the presented one-factor building blocks, see
[73] for an overview. For these and other families, the design, estimation, and
efficient simulation of such hierarchical structures is an active area of research
or even an unsolved problem.
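A minimal illustration of this nesting idea (a sketch under simplifying assumptions, not a construction taken from [73]): combine a global Gamma factor with group-specific Gamma factors and, given the resulting group factor, draw conditionally iid exponential components within each group. Components within a group then share more latent information than components across groups, which is reflected in the correlations.

    import numpy as np

    def sample_hierarchical_frailty(n_samples, group_sizes, shape0, group_shapes, rng):
        # two-level frailty: Lambda_j = Lambda_0 + Lambda_j' with independent Gamma factors;
        # given Lambda_j, the components of group j are iid exponential with rate Lambda_j
        lam0 = rng.gamma(shape0, size=n_samples)
        groups = []
        for j, d_j in enumerate(group_sizes):
            lam_j = lam0 + rng.gamma(group_shapes[j], size=n_samples)
            groups.append(rng.exponential(size=(n_samples, d_j)) / lam_j[:, None])
        return np.concatenate(groups, axis=1)

    rng = np.random.default_rng(5)
    x = sample_hierarchical_frailty(200_000, group_sizes=(2, 2), shape0=2.0,
                                    group_shapes=(1.0, 1.0), rng=rng)
    corr = np.corrcoef(x, rowvar=False)
    print(round(corr[0, 1], 3), round(corr[0, 2], 3))  # within-group correlation exceeds cross-group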
model-specific and thus possibly interesting, and the two motivating examples
above indicate that one might find natural motivations for these.
All of the presented theorems solve Problem 1.3, but only in some cases19 is the solution set M∗∗ shown to coincide with the (in general larger) solution set M∗ in Problem 1.1. Can one show that M∗ = M∗∗ in the other presented solutions
of Problem 1.3? To provide one concrete example, from Theorem 4.1(G) we know
that (b0 , b1 , b2 ) ∈ M2 determines a three-dimensional, exchangeable wide-sense
geometric law. However, this exchangeable probability distribution is only in
M∗∗ if there exist b3 , b4 , . . . such that {bk }k∈N0 ∈ M∞ . Could it be that the
last extension property does fail, but the three-dimensional, exchangeable wide-
sense geometric law associated with (b0 , b1 , b2 ) is still conditionally iid? If so,
then necessarily there is some n > 3 and an infinite exchangeable sequence
{Xk }k∈N such that (X1 , X2 , X3 ) has the given wide-sense geometric law but
(X1 , . . . , Xn ) is not wide-sense geometric.
A related question concerns only elements in M∗∗. There might be two infinite exchangeable sequences $\{X^{(1)}_k\}_{k\in\mathbb{N}}$ and $\{X^{(2)}_k\}_{k\in\mathbb{N}}$ with $\{X^{(1)}_k\}_{k\in\mathbb{N}} \overset{d}{\ne} \{X^{(2)}_k\}_{k\in\mathbb{N}}$ but $(X^{(1)}_1, \ldots, X^{(1)}_d) \stackrel{d}{=} (X^{(2)}_1, \ldots, X^{(2)}_d)$ for some d ∈ N. To provide an example,
related to Theorems 2.1 and 4.1, the vector (1, b1 ) with b1 ∈ [0, 1] can always be
extended to a sequence {bk }k∈N that is completely monotone, for example set
$b_k = b_1^k$. In case of Theorem 4.1(G), all the different possible extensions imply
different exchangeable sequences {Xk }k∈N such that 2-margins follow the asso-
ciated wide-sense geometric law with parameters (1, b1 ). But all these extensions
in Theorem 4.1(G) have in common that arbitrary d-margins are always wide-
sense geometric. Can one quantify how different such extensions are allowed to
be? A similar question is: Is the “⊂” in (5.5) actually a “=”? Notice that the
proof ideas in [20, 90], who study such issues in the case of some static laws,
might help to approach such questions.
As a general rule, for an infinite exchangeable sequence {Xk }k∈N defined via
(1.4) the probability law of the random distribution function H is uniquely
determined by its mixed moments
References
[1] Aldous, D.J. (1985). Exchangeability and related topics. Springer, École
d’Été de Probabilités de Saint-Flour XIII-1983, Lecture Notes in Mathe-
matics 1117, 1–198. MR0883646
[2] Aldous, D.J. (1985). More uses of exchangeability: representations of
complex random structures. In Probability and Mathematical Genetics –
papers in honour of Sir John Kingman, Cambridge University Press 35–63.
MR2744234
[3] Alfsen, E.M. (1971). Compact convex sets and boundary integrals.
Springer, Berlin. MR0445271
[4] Arnold, B.C. (1975). A characterization of the exponential distribution
by multivariate geometric compounding. Sankhyā: The Indian Journal of
Statistics 37:1 164–173. MR0440792
[5] Assaf, D. and Langberg, N.A. and Savits, T.H. and Shaked, M.
(1984). Multivariate phase-type distributions. Operations Research 32:3
688–702. MR0756014
[6] Barlow, R.E. and Proschan, F. (1975). Statistical theory of reliability
and life testing. Rinehart and Winston, New York. MR0438625
[7] Beirlant, J. and Goegebeur, Y. and Teugels, J. and Segers, J.
(2004). Statistics of extremes: theory and applications. John Wiley & Sons,
Chichester. MR2108013
[8] Berg, C. and Christensen, J.P.R. and Ressel, P. (1984). Harmonic
analysis on semigroups. Springer, Berlin. MR0747302
[9] Bernhart, G. and Mai, J.-F. and Scherer, M. (2015). On the con-
struction of low-parametric families of min-stable multivariate exponen-
tial distributions in large dimensions. Dependence Modeling 3 29–46.
MR3418655
[62] Lindskog, F. and McNeil, A.J. (2003). Common Poisson shock models:
applications to insurance and credit risk modelling. ASTIN Bulletin 33:2
209–238. MR2035051
[63] Lukacs, E. (1955). A characterization of the gamma distribution. Annals
of Mathematical Statistics 26 319–324. MR0069408
[64] Mai, J.-F. (2018). Extreme-value copulas associated with the expected
scaled maximum of independent random variables. Journal of Multivariate
Analysis 166 50–61. MR3799634
[65] Mai, J.-F. (2019). Simulation of hierarchical Archimedean copulas be-
yond the completely monotone case. Dependence Modeling 7 202–214.
MR3977499
[66] Mai, J.-F. (2020). Canonical spectral representation for exchangeable
max-stable sequences. Extremes 23 151–169. MR4064608
[67] Mai, J.-F. (2020). The de Finetti structure behind some norm-symmetric
multivariate densities with exponential decay. Dependence Modeling 8
210–220. MR4156799
[68] Mai, J.-F. and Schenk, S. and Scherer, M. (2016). Exchangeable
exogenous shock models. Bernoulli 22 1278–1299. MR3449814
[69] Mai, J.-F. and Schenk, S. and Scherer, M. (2016). Analyzing model
robustness via a distortion of the stochastic root: a Dirichlet prior ap-
proach. Statistics and Risk Modeling 32 177–195. MR3507979
[70] Mai, J.-F. and Schenk, S. and Scherer, M. (2017). Two novel char-
acterizations of self-decomposability on the positive half-axis. Journal of
Theoretical Probability 30 365–383. MR3615092
[71] Mai, J.-F. and Scherer, M. (2009). Lévy-frailty copulas. Journal of
Multivariate Analysis 100 1567–1585. MR2514148
[72] Mai, J.-F. and Scherer, M. (2011). Reparameterizing Marshall–Olkin
copulas with applications to sampling. Journal of Statistical Computation
and Simulation 81 59–78. MR2747378
[73] Mai, J.-F. and Scherer, M. (2012). H-extendible copulas. Journal of
Multivariate Analysis 110 151–160. MR2927515
[74] Mai, J.-F. and Scherer, M. (2014). Characterization of extendible dis-
tributions with exponential minima via processes that are infinitely divis-
ible with respect to time. Extremes 17 77–95. MR3179971
[75] Mai, J.-F. and Scherer, M. (2017). Simulating copulas, 2nd edition.
World Scientific Publishing, Singapore. MR3729417
[76] Mai, J.-F. and Scherer, M. (2019). Subordinators which are in-
finitely divisible w.r.t. time: construction, properties, and simulation of
max-stable sequences and infinitely divisible laws. ALEA: Latin Amer-
ican Journal of Probability and Mathematical Statistics 16:2 977–1005.
MR3999795
[77] Mai, J.-F. and Scherer, M. and Shenkman, N. (2013). Multivariate
geometric laws, (logarithmically) monotone sequences, and infinitely di-
visible laws. Journal of Multivariate Analysis 115 457–480. MR3004570
[78] Mansuy, R. (2005). On processes which are infinitely divisible with re-
spect to time. Working paper, arXiv:math/0504408.