Asymptotic Analysis of Random Walks
This series is devoted to significant topics or themes that have wide application in
mathematics or mathematical science and for which a detailed development of the
abstract theory is less important than a thorough and concrete exploration of the
implications and applications.
Books in the Encyclopedia of Mathematics and Its Applications cover their subjects
comprehensively. Less important results may be summarized as exercises at the ends of
chapters. For technicalities, readers can be referred to the bibliography, which is expected
to be comprehensive. As a result, volumes are encyclopedic references or manageable
guides to major subjects.
ENCYCLOPEDIA OF MATHEMATICS AND ITS APPLICATIONS
All the titles listed below can be obtained from good booksellers or from Cambridge University Press. For a
complete series listing, visit www.cambridge.org/mathematics
A. A. BOROVKOV
Sobolev Institute of Mathematics, Novosibirsk
Translated by
V. V. ULYANOV
Lomonosov Moscow State University
and HSE University, Moscow
M. V. ZHITLUKHIN
Steklov Institute of Mathematics, Moscow
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906
www.cambridge.org
Information on this title: www.cambridge.org/9781107074682
DOI: 10.1017/9781139871303
© A. A. Borovkov 2020
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2020
A catalogue record for this publication is available from the British Library.
Library of Congress Cataloging-in-Publication Data
Names: Borovkov, A. A. (Aleksandr Alekseevich), 1931– author.
Title: Asymptotic analysis of random walks : light-tailed distributions /
A.A. Borovkov, Sobolev Institute of Mathematics, Novosibirsk ;
translated by V.V. Ulyanov, Higher School of Economics, Mikhail
Zhitlukhin, Steklov Institute of Mathematics, Moscow.
Description: Cambridge, United Kingdom ; New York, NY : Cambridge
University Press, 2020. | Series: Encyclopedia of mathematics and its
applications | Includes bibliographical references and index.
Identifiers: LCCN 2020022776 (print) | LCCN 2020022777 (ebook) |
ISBN 9781107074682 (hardback) | ISBN 9781139871303 (epub)
Subjects: LCSH: Random walks (Mathematics) | Asymptotic expansions. |
Asymptotic distribution (Probability theory)
Classification: LCC QA274.73 .B6813 2020 (print) | LCC QA274.73 (ebook) |
DDC 519.2/82–dc23
LC record available at https://lccn.loc.gov/2020022776
LC ebook record available at https://lccn.loc.gov/2020022777
ISBN 978-1-107-07468-2 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
Contents
Introduction page ix
1 Preliminary results 1
1.1 Deviation function and its properties in the one-dimensional case 1
1.2 Deviation function and its properties in the multidimensional case 13
1.3 Chebyshev-type exponential inequalities for sums of random vectors 18
1.4 Properties of the random variable γ = Λ(ξ) and its deviation function 25
1.5 The integro-local theorems of Stone and Shepp and Gnedenko's local theorem 32
2 Approximation of distributions of sums of random variables 39
2.1 The Cramér transform. The reduction formula 39
2.2 Limit theorems for sums of random variables in the Cramér deviation zone. The asymptotic density 43
2.3 Supplement to section 2.2 50
2.4 Integro-local theorems on the boundary of the Cramér zone 65
2.5 Integro-local theorems outside the Cramér zone 70
2.6 Supplement to section 2.5. The multidimensional case. The class of distributions ER 75
2.7 Large deviation principles 81
2.8 Limit theorems for sums of random variables with non-homogeneous terms 90
2.9 Asymptotics of the renewal function and related problems. The second deviation function 101
2.10 Sums of non-identically distributed random variables in the triangular array scheme 116
Introduction
This book is devoted mainly to the study of the asymptotic behaviour of the
probabilities of rare events (large deviations) for trajectories of random walks. By
random walks we mean the sequential sums of independent random variables or
vectors, and also processes with independent increments. It is assumed that those
random variables or vectors (jumps of random walks or increments of random
processes) have distributions which are ‘rapidly decreasing at infinity’. The last
term means distributions which satisfy Cramér’s moment condition (see below).
The book, in some sense, continues the monograph [42], where more or less
the same scope of problems was considered but it was assumed that the jumps of
random walks have distributions which are ‘slowly decreasing at infinity’, i.e. do
not satisfy Cramér’s condition. Such a division of the objects of study according to
the speed of decrease of the distributions of jumps arises because for rapidly and
slowly decreasing distributions those objects form two classes of problems which
essentially differ, both in the methods of study that are required and also in the
nature of the results obtained.
Each of these two classes of problems requires its own approach, to be devel-
oped, and these approaches have little in common. So, the present monograph,
being a necessary addition to the book [42], hardly intersects with the latter in
its methods and results. In essence, this is the second volume (after [42]) of a
monograph with the single title Asymptotic Analysis of Random Walks.
The asymptotic analysis of random walks for rapidly decreasing distributions
and, in particular, the study of the probabilities of large deviations have become
one of the main subjects in modern probability theory. This can be explained as
follows.
• A random walk is a classical object of probability theory, which presents huge
theoretical interest and is a mathematical model for many important applications
in mathematical statistics (sequential analysis), insurance theory (risk theory),
queuing theory and many other fields.
• Asymptotic analysis and limit theorems (under the unbounded growth of some
parameters, for example the number of random terms in a sum) form the chief
method of research in probability theory. This is due to the nature of the main
laws in probability theory (they have the form of limit theorems), as well as
the fact that explicit formulas or numerical values for the characteristics under
investigation in particular problems, generally, do not exist and one has to find
approximations for them.
• Probabilities of large deviations present a considerable interest from the math-
ematical point of view as well as in many applied problems. Finding the
probabilities of large deviations allows one, for example, to find the small
error probabilities in statistical hypothesis testing (error probabilities should
be small), the small probabilities of the bankruptcy of insurance companies
(they should be small as well), the small probabilities of the overflow of
bunkers in queuing systems and so on. The so-called ‘rough’ theorems about the
probabilities of large deviations (i.e. about the asymptotics of the logarithms of
those probabilities; see Chapters 4 and 5) have found application in a series of
related fields such as statistical mechanics (see, for example, [83], [89], [178]).
• Rapidly decreasing distributions deserve attention because the first classical
results about the probabilities of large deviations of sums of random variables
were obtained for rapidly decreasing distributions (i.e. distributions satisfying
Cramér’s condition). In many problems in mathematical statistics (especially
those related to the likelihood principle), the condition of rapid decrease turns
out to be automatically satisfied (see, for example, sections 6.1 and 6.2 below).
A rapid (in particular, exponential) decrease in distributions often arises in
queuing theory problems (for example, the Poisson order flow is widely used),
in risk theory problems and in other areas. Therefore, the study of problems with
rapidly decreasing jump distributions undoubtedly presents both theoretical and
applied interest. Let us add that almost all the commonly used distributions in
theory and applications, such as the normal distribution, the Poisson distribu-
tion, the Γ-distribution, the distribution in the Bernoulli scheme, the uniform
distribution, etc. are rapidly decreasing at infinity.
Let ξ, ξ1, ξ2, . . . be a sequence of independent identically distributed random
variables or vectors. Put S0 = 0 and
Sn := Σ_{k=1}^{n} ξk,   n = 1, 2, . . .
The sequence {Sn; n ≥ 0} is called a random walk. As has been noted, a random
walk is a classical object of probability theory. Let us mention the following
fundamental results related to random walks.
• The strong law of large numbers, on the convergence Sn/n → Eξ almost surely
as n → ∞.
• The functional central limit theorem, on the convergence in distribution of the
process ζn (t), t ∈ [0, 1], with values (Sk − Ak )/Bn at the points t = k/n,
k = 0, 1, . . . , n, to a stable process, where Ak , Bn are appropriate normalising
constants. For example, in the case Eξ = 0, Eξ² = σ² < ∞, the polygonal
line ζn(t) for Ak = 0, Bn² = nσ² converges in distribution to a standard
Wiener process (the invariance principle; for details, see e.g. [11], [39] or the
introduction in [42]).
• The law of the iterated logarithm, which establishes upper and lower bounds for
the trajectories of {Sk }.
None of the above results describes the asymptotic behaviour of the probabil-
ities of large deviations of trajectories of {Sk }. We mention the following main
classes of problems.
(a) The study of the probabilities of large deviations of sums of random variables
(or vectors); for example, the study of the asymptotics (for Eξ = 0, Eξ² < ∞
in the one-dimensional case) of the probabilities
P(Sn ≥ x) for x ≫ √n, n → ∞.   (0.0.1)
(b) The study of the probabilities of large deviations in boundary crossing prob-
lems. For example, to this class belongs the problem concerning the asymptotics
of the probabilities
P( max_{t∈[0,1]} [ ζn(t) − (x/√n) g(t) ] ≥ 0 )   (0.0.2)
for an arbitrary function g(t) on [0, 1] in the case x ≫ √n as n → ∞ (the
process ζn(t) is defined above).
(c) The study of the more general problem about the asymptotics of the probabil-
ities
P( ζn(·) ∈ (x/√n) B ),   x ≫ √n,   (0.0.3)
where B is an arbitrary measurable set in one or another space of the functions
on [0, 1].
This monograph is largely devoted to the study of problems concerning prob-
abilities (0.0.1)–(0.0.3) and other closely related problems, under the assumption
that Cramér’s moment condition
[C]   ψ(λ) := E e^{λξ} < ∞
is satisfied for some λ ≠ 0 (its statement here is given for a scalar ξ). This condition
means (see subsection 1.1.1) that at least one of the 'tails' P(ξ ≥ x) or P(ξ < −x)
of the distribution of the variable ξ decreases, as x → ∞, faster than some exponential function.
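Whether condition [C] holds for a given distribution can be checked directly from the defining integral. The following minimal numerical sketch (assuming a Python environment with NumPy and SciPy; the two distributions are illustrative choices, not taken from the text) evaluates ψ(λ) = E e^{λξ} by quadrature for the standard normal law, for which ψ(λ) = e^{λ²/2} is finite for every λ, and shows that for a pure power-law density the integrand grows without bound, so ψ(λ) = ∞ for every λ > 0 and the right tail is not rapidly decreasing.

```python
# A numerical look at Cramér's condition [C]: psi(lambda) = E exp(lambda*xi).
# Illustrative sketch only; the two distributions below are example choices.
import numpy as np
from scipy.integrate import quad

def psi_normal(lam):
    # psi(lambda) for the standard normal law, computed by quadrature;
    # the closed form exp(lambda**2 / 2) is finite for every lambda.
    integrand = lambda x: np.exp(lam * x) * np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    val, _ = quad(integrand, -np.inf, np.inf)
    return val

for lam in (0.5, 1.0, 2.0):
    print(f"normal: lambda={lam}, quadrature={psi_normal(lam):.6f}, "
          f"closed form={np.exp(lam**2 / 2):.6f}")

# Pareto-type density f(x) = 2*x**-3, x >= 1: for any lambda > 0 the integrand
# exp(lambda*x)*f(x) tends to infinity, so psi(lambda) = infinity and the
# right-sided Cramér condition fails.
lam = 0.1
for x in (10.0, 100.0, 1000.0):
    print(f"pareto-type integrand at x={x:>6}: {np.exp(lam * x) * 2 * x**-3:.3e}")
```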
In the monograph [42], it was assumed that condition [C] is not satisfied but the
tails of the distribution of ξ behave in a sufficiently regular manner.
As we noted above, the first general results for problem (a) about the asymp-
totics of the probabilities of large deviations of sums of random variables go
back to the paper of Cramér. Essential contributions to the development of this
direction were also made in the papers of V.V. Petrov [149], R.R. Bahadur and
R. Ranga Rao [5], C. Stone [175] and others. One should also mention the papers
of B.V. Gnedenko [95], E.A. Rvacheva [163], C. Stone [174] and L.A. Shepp [166]
on integro-local limit theorems for sums of random variables in the zone of normal
deviations, which played an important role in extending such theorems to the zone
of large deviations and forming the most adequate approach (in our opinion) to
problems concerning large deviations for sums Sn . This approach is presented in
Chapter 2 (see also the papers of A.A. Borovkov and A.A. Mogul’skii [56], [57],
[58], [59]).
The first general results about the joint distribution of Sn and S̄n := max_{k≤n} Sk
(this is a particular case of the boundary crossing problem (b); see (0.0.2)) in
the zone of large deviations were obtained by A. A. Borovkov in the papers
[15] and [16], using analytical methods based on solving generalised Wiener–
Hopf equations (in terms of Stieltjes-type integrals) for the generating function
of the joint distribution of Sn and S̄n. The solution was obtained in the form
of double transforms of the required joint distribution, expressed in terms of
factorisation components of the function 1 − zψ(λ). The double transforms, as
functions of the variables λ and z, can be inverted if one knows the poles of those
transforms as a function of the variable λ and applies modifications of the steepest
descent methods in the variable z. Those results allowed A.A. Borovkov in
[17]–[19] to find a solution to a more general problem about the asymptotics of the
probabilities (0.0.2). Later in the work by A. A. Borovkov and A. A. Mogul’skii
[53], the asymptotics of the probability (0.0.2) were found for some cases
using direct probability methods without the factorisation technique (see
Chapter 3).
The first general results on the rough asymptotics in problem (c), i.e. on
the logarithmic asymptotics of the probability of the general form (0.0.3) for
arbitrary sets B in the one-dimensional case (these results constitute the large
deviation principle) were obtained in the papers of S.R.S. Varadhan [177] (for
a special case and x n) and A.A. Borovkov [19] (for x = O(n)). In the
paper of A.A. Mogul’skii [131], the results of the paper [19] were transferred
to the multidimensional case. The large deviation principle was later extended
to a number of other objects (see, for example, [83], [178], [179]). However,
the large deviation principle in the papers [177], [19], [131] was established
under a very restrictive version of Cramér’s condition, that ψ(λ) < ∞ for all
λ, and only for continuous trajectories ζn (t) with Ak = 0, Bn = n. In a more
recent series of papers by A.A. Borovkov and A.A. Mogul’skii [60]–[68] (see also
Chapter 4) substantial progress in the study of the large deviation principle was
made: Cramér’s condition, mentioned above in its strong form, was weakened
or totally removed and the space of trajectories was extended up to the space of
functions without discontinuities of the second kind. At the same time the form of
the results changed, so it became necessary to introduce the notion of the ‘extended
large deviation principle’ (see Chapter 4).
Let us mention that in the paper of A.A. Borovkov [19] the principle
of moderately large deviations was established as well for ζn (t) as x = o(n),
n → ∞. After that A.A. Mogul’skii [131] extended those results to the multi-
dimensional case. In a recent paper [67] those results were strengthened (see
Chapter 5 for details).
Also, note that a sizable literature has been devoted to the large deviation
principle for a wide class of random processes, mainly Markov processes and
processes arising in statistical mechanics (see, for example, [83], [89], [93], [178],
[179]). Those publications have little in common with the present monograph in
their methodology and the nature of the results obtained, so we will not touch upon
them.
The above overview of the results does not in any way pretend to be complete.
Our goal here is simply to identify the main milestones in the development of the
limit theorems in probability theory that are presented in this monograph. A more
detailed bibliography will be provided in the course of the presentation.
1 Preliminary results
The condition [C] will be used to mark the fulfilment of at least one of the
conditions [C±]:
[C] = [C+] ∪ [C−].
We denote the intersection of the conditions [C±] by
[C0] = [C+] ∩ [C−].
Condition [C0] means, evidently, that
ψ(λ) < ∞ for all sufficiently small |λ|.
If
λ+ := sup{λ : ψ(λ) < ∞},   λ− := inf{λ : ψ(λ) < ∞},
then correspondingly the conditions [C±], [C], [C0] can be written in the forms
λ± ≷ 0,   λ+ − λ− > 0,   |λ±| > 0.
These conditions, which are called Cramér's conditions, characterise the decay
rate of the 'tails' F±(t) of the distribution of the random variable ξ. When the
condition [C+] is met, by virtue of Chebyshev's exponential inequality we have
F+(t) := P(ξ ≥ t) ≤ e^{−λt} ψ(λ)   for λ ∈ (0, λ+), t > 0,
and, therefore, F+(t) decreases exponentially as t → ∞. Conversely, if F+(t) <
c e^{−μt} for some c < ∞, μ > 0 and for all t > 0, then for λ ∈ (0, μ) we have
∫_{−∞}^{0} e^{λt} P(ξ ∈ dt) ≤ 1 − F+(0),
∫_{0}^{∞} e^{λt} P(ξ ∈ dt) = −∫_{0}^{∞} e^{λt} dF+(t) = F+(0) + λ ∫_{0}^{∞} e^{λt} F+(t) dt
   ≤ F+(0) + cλ ∫_{0}^{∞} e^{(λ−μ)t} dt = F+(0) + cλ/(μ − λ) < ∞,
ψ(λ) ≤ 1 + cλ/(μ − λ) < ∞.
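As a small numerical sketch of the bound F+(t) ≤ e^{−λt} ψ(λ) (assuming Python with NumPy/SciPy; the exponential distribution is an illustrative choice), the code below optimizes the right-hand side over λ ∈ (0, λ+) for ξ exponentially distributed with unit mean, where ψ(λ) = 1/(1 − λ) and the exact tail is e^{−t}, and compares the two.

```python
# Chebyshev (Chernoff-type) exponential bound for a single variable:
# F_plus(t) = P(xi >= t) <= exp(-lambda*t) * psi(lambda), lambda in (0, lambda_plus).
# Sketch for xi ~ Exp(1): psi(lambda) = 1/(1 - lambda), lambda_plus = 1,
# exact tail exp(-t).
import numpy as np
from scipy.optimize import minimize_scalar

def psi(lam):
    return 1.0 / (1.0 - lam)          # Laplace transform of Exp(1), lambda < 1

def chernoff_bound(t):
    # minimize exp(-lambda*t)*psi(lambda) over lambda in (0, 1)
    res = minimize_scalar(lambda lam: np.exp(-lam * t) * psi(lam),
                          bounds=(1e-6, 1 - 1e-6), method="bounded")
    return res.fun

for t in (2.0, 5.0, 10.0):
    exact = np.exp(-t)
    bound = chernoff_bound(t)
    print(f"t={t:4.1f}  exact tail={exact:.3e}  optimized bound={bound:.3e}  "
          f"ratio={bound / exact:.2f}")
```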
There is a similar connection between the decay rate of F−(t) = P(ξ ≤ −t) as
t → ∞ and the finiteness of ψ(λ) under condition [C−].
It is clear that condition [C0] implies the exponential decay of F+(t) + F−(t) =
P(|ξ| ≥ t), and vice versa.
Hereafter we also use the conditions
[C∞±] = {λ± = ±∞}
and the condition
[C∞] = {|λ±| = ∞} = [C∞+] ∩ [C∞−].
It follows from the above that the condition [C∞+] ([C∞−]) is equivalent to the
fact that the tail F+(t) (respectively F−(t)) decreases faster than any exponential function as t increases.
It is clear that, for instance, an exponential distribution satisfies the condition [C+] ∩ [C∞−],
while a normal distribution satisfies condition [C∞].
Along with Cramér’s conditions we will also assume that the random variable
ξ is not degenerate, i.e. ξ ≡ const. (or D ξ > 0, which is the same).
The properties of the Laplace transform ψ(λ) of a distribution of random
variable ξ are set forth in various textbooks; see e.g. [39]. Let us mention the
following three properties, which we are going to use further on.
(Ψ1) The functions ψ(λ) and ln ψ(λ) are strictly convex; the ratio ψ′(λ)/ψ(λ)
strictly increases on (λ−, λ+).
The analyticity property of ψ(λ) in a strip Re λ ∈ (λ− , λ+ ) can be supplemented
with the following ‘extended’ continuity property on a segment [λ− , λ+ ] (on the
strip Re λ ∈ [λ− , λ+ ]).
(Ψ2) The function ψ(λ) is continuous 'from within' the segment [λ−, λ+]; i.e.
ψ(λ± ∓ 0) = ψ(λ± ) (the cases ψ(λ± ) = ∞ are not excluded).
The continuity on the whole line might fail if, for instance, λ+ < ∞, ψ(λ+) <
∞, ψ(λ+ + 0) = ∞, which is the case for the distribution of a random variable ξ
with density f(x) = cx^{−3} e^{−λ+ x} for x ≥ 1, c = const.
(Ψ3) If E|ξ|^k < ∞ and the right-sided Cramér condition [C+] is met,
then the function ψ is k times right-differentiable at the point λ = 0,
ψ^{(k)}(0) = E ξ^k =: ak,
and, as λ ↓ 0,
ψξ(λ) = 1 + Σ_{j=1}^{k} aj λ^j / j! + o(λ^k).
The meaning of the name will become clear later. In classical convex analysis
the right-hand side of (1.1.3) is known as the Legendre transform of the function
A(λ) := ln ψ(λ).
Consider the function Λ(α, λ) := αλ − A(λ) appearing under the sup sign
in (1.1.3). The function −A(λ) is strictly concave (see property (Ψ1)), so the
function Λ(α, λ) is strictly concave in λ as well (note also that Λ(α, λ) = − ln ψα(λ), where ψα(λ) =
e^{−λα} ψ(λ) is the Laplace transform of the distribution of the random variable
ξ − α and, therefore, from the 'qualitative' point of view, Λ(α, λ) possesses all the
properties of the function −A(λ)). It follows from what has been said that there
always exists a unique point λ = λ(α) on the 'extended' real line [−∞, ∞],
and the convex envelope S of the support of the distribution of ξ. It is clear that
the values λ± are bounds for the set A. The right and left bounds α±, s± for the
sets A, S are evidently given by
α± = A′(λ± ∓ 0) = ψ′(λ± ∓ 0)/ψ(λ± ∓ 0);
s+ = sup{t : P(ξ ≤ t) < 1},   s− = inf{t : P(ξ ≤ t) > 0}.
The equation
ψ′(λ)/ψ(λ) = α   (1.1.4)
always has a unique solution λ(α) on the segment [λ−, λ+] (the values of λ± can
be infinite). This solution λ(α), as the inverse function to the function ψ′(λ)/ψ(λ)
(see (1.1.4)), which is analytic and strictly increasing on (λ−, λ+), is also analytic
and strictly increasing on (α−, α+).
Since λ(a1) = Λ(a1) = 0 (which follows from (1.1.4) and (1.1.6)), in particular
for α0 = a1 we have
Λ(α) = ∫_{a1}^{α} λ(v) dv.   (1.1.8)
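Equation (1.1.4) and the integral representation (1.1.8) can be verified numerically for a concrete distribution. The sketch below (an illustrative computation assuming Python with NumPy/SciPy; it uses the exponential distribution with unit mean, for which Λ(α) = α − 1 − ln α in closed form) solves (1.1.4) for λ(α) and recovers Λ(α) both from the supremum defining the deviation function and from (1.1.8).

```python
# lambda(alpha) from equation (1.1.4) psi'(lambda)/psi(lambda) = alpha, and
# Lambda(alpha) via the Legendre sup (1.1.3) and via the integral (1.1.8),
# for xi ~ Exp(1): psi(lambda) = 1/(1 - lambda), a1 = E xi = 1,
# closed form Lambda(alpha) = alpha - 1 - ln(alpha).
import numpy as np
from scipy.optimize import brentq, minimize_scalar
from scipy.integrate import quad

def log_psi(lam):
    return -np.log(1.0 - lam)          # ln psi(lambda), lambda < 1

def lam_of_alpha(alpha):
    # solve (ln psi)'(lambda) = 1/(1 - lambda) = alpha   (equation (1.1.4))
    return brentq(lambda lam: 1.0 / (1.0 - lam) - alpha, -50.0, 1.0 - 1e-9)

def Lambda_sup(alpha):
    # Lambda(alpha) = sup_lambda { alpha*lambda - ln psi(lambda) }
    res = minimize_scalar(lambda lam: -(alpha * lam - log_psi(lam)),
                          bounds=(-50.0, 1.0 - 1e-9), method="bounded")
    return -res.fun

def Lambda_int(alpha):
    # Lambda(alpha) = integral from a1 = 1 to alpha of lambda(v) dv   (formula (1.1.8))
    val, _ = quad(lam_of_alpha, 1.0, alpha)
    return val

for alpha in (0.5, 1.5, 3.0):
    closed = alpha - 1.0 - np.log(alpha)
    print(f"alpha={alpha:3.1f}  sup={Lambda_sup(alpha):.6f}  "
          f"integral={Lambda_int(alpha):.6f}  closed form={closed:.6f}")
```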
ψ(λ) ∼ p+ e^{λ s+},
αλ − ln ψ(λ) = αλ − ln p+ − λs+ + o(1) = (α − α+)λ − ln p+ + o(1).
Thus, for α ≥ α+ the function λ(α) is constant, and Λ(α) grows linearly.
Moreover, the relations (1.1.7), (1.1.8) remain valid.
If α+ = ∞, then α < α+ for all finite α ≥ α−, and again we are dealing
with the 'regular' situation considered before (see (1.1.7), (1.1.8)). Since λ(α) is
non-decreasing, those relations imply the convexity of Λ(α).
In sum, we can formulate the next property.
(Λ3) The functions λ(α), Λ(α) may have discontinuities only at the points s± in
the case P(ξ = s±) > 0. These points separate the domain (s−, s+) of finiteness
and continuity (in the extended sense) of the function Λ from the domain α ∉
[s−, s+], where Λ(α) = ∞. On [s−, s+] the function Λ is convex. (If one defines
convexity in the 'extended' sense, i.e. allowing infinite values, then Λ is convex on
the whole line.) The function Λ is analytic on the interval (α−, α+) ⊂ (s−, s+). If
λ+ < ∞, α+ < ∞, then the function Λ(α) is linear on (α+, ∞) with slope
λ+; at the boundary point α+ the continuity of the first derivative persists. If
λ+ = ∞, then Λ(α) = ∞ on (α+, ∞). An analogous property is valid for the
function Λ(α) on (−∞, α−).
If λ− = 0, then α− = a1 and λ(α) = Λ(α) = 0 for α ≤ a1.
In fact, since λ(a1) = 0 and ψ(λ) = ∞ for λ < λ− = 0 = λ(a1), as α decreases
to α− = a1, the point λ(α), having reached 0, stops, and
λ(α) = 0 for α ≤ α− = a1. It follows from this and from the first identity in
(1.1.6) that Λ(α) = 0 for α ≤ a1.
If λ− = λ+ = 0 (the condition [C] is not satisfied), then λ(α) = Λ(α) ≡ 0
for all α. This is evident, since the value under the sup sign in (1.1.3) is equal to
−∞ for all λ ≠ 0. In this case the limit theorems stated in the subsequent sections
would not be informative.
By summarising the properties of Λ, we can conclude, in particular, that on the
whole line the function Λ is
(a) convex: for α, β ∈ R, p ∈ [0, 1],
Λ(pα + (1 − p)β) ≤ pΛ(α) + (1 − p)Λ(β);   (1.1.11)
(b) lower semicontinuous:
lim inf_{α→α0} Λ(α) ≥ Λ(α0),   α0 ∈ R.   (1.1.12)
Properties (a), (b) are known as general properties of the Legendre transform of
a convex lower-semicontinuous function A(λ) (see e.g. [159]).
We will also need the following properties of the function Λ.
(Λ4) Under natural conventions about notation, for independent random variables
ξ and η we have
Λ^{(ξ+η)}(α) = sup_λ { αλ − A^{(ξ)}(λ) − A^{(η)}(λ) } = inf_γ { Λ^{(ξ)}(γ) + Λ^{(η)}(α − γ) },
Λ^{(cξ+b)}(α) = sup_λ { αλ − λb − A^{(ξ)}(λc) } = Λ^{(ξ)}( (α − b)/c ).
It is clear that the infimum inf_γ in the first relation is attained at a point γ such
that λ^{(ξ)}(γ) = λ^{(η)}(α − γ). If ξ and η are identically distributed, then γ = α/2
and, therefore,
Λ^{(ξ+η)}(α) = Λ^{(ξ)}(α/2) + Λ^{(η)}(α/2) = 2Λ^{(ξ)}(α/2).
It is also evident that for all n ≥ 2
Λ^{(Sn)}(α) = sup_λ { αλ − nA^{(ξ)}(λ) } = n sup_λ { (α/n)λ − A^{(ξ)}(λ) } = nΛ^{(ξ)}(α/n).
(Λ5) The function Λ(α) attains its minimal value, which is equal to 0, at the point
α = Eξ = a1. For definiteness, let α+ > 0. If a1 = 0, E|ξ|^k < ∞, then
λ(0) = Λ(0) = Λ′(0) = 0,   Λ′′(0) = 1/γ2,   Λ′′′(0) = −γ3/γ2²,  . . .   (1.1.13)
(in the case α− = 0, right derivatives are meant). As α ↓ 0, the next representa-
tion takes place:
Λ(α) = Σ_{j=2}^{k} (Λ^{(j)}(0)/j!) α^j + o(α^k).   (1.1.14)
Since λ(α) on (α−, α+) is the function inverse to (ln ψ(λ))′ (see (1.1.4)), for
λ ∈ (λ−, λ+) the equation (1.1.16) has the evident solution
α = a(λ) := (ln ψ(λ))′.   (1.1.17)
Taking into account that λ(a(λ)) ≡ λ, we obtain
T(λ) = λ a(λ) − Λ(a(λ)),
T′(λ) = a(λ) + λ a′(λ) − λ(a(λ)) a′(λ) = a(λ).
Since a(0) = a1 and T(0) = −Λ(a1) = 0,
T(λ) = ∫_{0}^{λ} a(u) du = ln ψ(λ).   (1.1.18)
The statement is proved, as well as another inversion formula (the last equality in
(1.1.18)); this expresses ln ψ(λ) in terms of the integral of the function a(λ), which
is inverse to λ(α).
By virtue of Chebyshev’s exponential inequality, for all n 1, λ 0, x 0,
we have
P (Sn x) e−λx ψ(λ) = exp −λx + n ln ψ(λ) . (1.1.19)
Since λ(α) 0 for α a1 , by setting α = x/n and by substituting λ = λ(α) 0
in (1.1.19) we obtain the property
(8) For all n 1 and α = x/n a1 ,
P(Sn x) e−n(α) .
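Property (Λ8) is easy to test by simulation. The following Monte Carlo sketch (illustrative parameters, assuming Python with NumPy) takes exponential jumps with unit mean, for which Λ(α) = α − 1 − ln α, and compares the simulated frequency of {Sn ≥ x} with the bound e^{−nΛ(x/n)}.

```python
# Monte Carlo check of property (Lambda 8): P(S_n >= x) <= exp(-n*Lambda(x/n))
# for alpha = x/n >= a1.  Sketch for xi ~ Exp(1), Lambda(alpha) = alpha - 1 - ln(alpha).
import numpy as np

rng = np.random.default_rng(0)

def Lambda(alpha):
    return alpha - 1.0 - np.log(alpha)

n, alpha = 30, 1.5
x = n * alpha
reps = 200_000
sums = rng.exponential(scale=1.0, size=(reps, n)).sum(axis=1)
freq = np.mean(sums >= x)
bound = np.exp(-n * Lambda(alpha))
print(f"n={n}, x={x}: simulated P(S_n >= x) ~ {freq:.4f}, bound exp(-n*Lambda) = {bound:.4f}")
```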
The next property may be regarded as an exponential version of the
Kolmogorov–Doob inequality.
(Λ9) Theorem 1.1.2. (i) For all n ≥ 1, x ≥ 0 and λ ≥ 0, one has
P(S̄n ≥ x) ≤ e^{−λx} max{1, ψ^n(λ)},   (1.1.20)
where S̄n := max_{k≤n} Sk.
(ii) Let Eξ < 0, λ1 := max{λ : ψ(λ) ≤ 1}. Then, for all n ≥ 1 and x ≥ 0, one
has
P(S̄n ≥ x) ≤ e^{−λ1 x}.   (1.1.21)
If λ+ > λ1 then ψ(λ1) = 1, Λ(α) ≥ λ1 α for all α and Λ(α1) = λ1 α1, where
α1 := arg{λ(α) = λ1} = ψ′(λ1)/ψ(λ1),   (1.1.22)
so that the line y = λ1 α is tangent to the convex function y = Λ(α) at the
point (α1, λ1 α1). In addition, along with (1.1.21), the next inequality holds
for α := x/n:
P(S̄n ≥ x) ≤ e^{−nΛ1(α)},   (1.1.23)
where
Λ1(α) = λ1 α for α ≤ α1,   Λ1(α) = Λ(α) for α > α1.
If α ≤ α1 then the inequality (1.1.23) coincides with (1.1.21); for α > α1 it
is stronger than (1.1.21).
(iii) Let Eξ ≥ 0, α = x/n ≥ Eξ. Then, for all n ≥ 1,
P(S̄n ≥ x) ≤ e^{−nΛ(α)}.   (1.1.24)
. . . = Σ_{k=1}^{n} e^{λx} ψ^{n−k}(λ) P(η(x) = k) ≥ e^{λx} min{1, ψ^n(λ)} P(η(x) ≤ n).
Hence we obtain statement (i).
(ii) Inequality (1.1.21) follows directly from (1.1.20) if one takes λ = λ1. Now
let λ+ > λ1. Then, obviously, ψ(λ1) = 1 and from the definition of the function
Λ(α) it follows that
Λ(α) ≥ λ1 α − ln ψ(λ1) = λ1 α.
Furthermore,
Λ(α1) = λ1 α1 − ln ψ(λ1) = λ1 α1,
so that the curves y = λ1 α and y = Λ(α) are tangent to each other at the point
(α1, λ1 α1).
Next, it is clear that ψ(λ(α)) ≥ 1 for α ≥ α1. For α = x/n, the optimal choice
of λ in (1.1.20) would be λ = λ(α). For such a λ(α), i.e. for α = x/n, we obtain
P(S̄n ≥ x) ≤ e^{−nΛ(α)}.
Together with (1.1.21) this proves (1.1.23). It is also clear that Λ(α) > λ1 α for
α > α1, which proves the last statement of (ii).
(iii) Since λ(a1) = 0 and λ(α) is non-decreasing, λ(α) ≥ 0 for α ≥ a1. In the
case a1 ≥ 0 one has ψ(λ) ≥ 1 for λ ≥ 0. Thus ψ(λ(α)) ≥ 1 for α ≥ a1. By
substituting λ = λ(α) in (1.1.20) for α ≥ a1, one obtains (1.1.24). Theorem 1.1.2
is proved.
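Part (ii) of Theorem 1.1.2 can likewise be checked by simulation. In the sketch below (illustrative parameters, assuming Python with NumPy/SciPy) the jumps are ξ = η − 3/2 with η exponential of unit mean, so that Eξ = −1/2 and ln ψ(λ) = −1.5λ − ln(1 − λ); the value λ1 is found as the positive root of ln ψ(λ) = 0, and the simulated frequency of {S̄n ≥ x}, with S̄n = max_{k≤n} Sk, is compared with e^{−λ1 x}.

```python
# Numerical check of Theorem 1.1.2(ii): P(max_{k<=n} S_k >= x) <= exp(-lambda_1 * x),
# where lambda_1 = max{lambda : psi(lambda) <= 1} and E xi < 0.
# Sketch for xi = eta - 1.5 with eta ~ Exp(1): ln psi(lambda) = -1.5*lambda - ln(1 - lambda).
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)

def log_psi(lam):
    return -1.5 * lam - np.log(1.0 - lam)

lam1 = brentq(log_psi, 0.1, 0.95)          # positive root of ln psi(lambda) = 0

n, x, reps = 50, 8.0, 100_000
steps = rng.exponential(scale=1.0, size=(reps, n)) - 1.5
walk_max = np.cumsum(steps, axis=1).max(axis=1)   # max_{1<=k<=n} S_k for each path
freq = np.mean(walk_max >= x)
print(f"lambda_1 = {lam1:.4f}")
print(f"simulated P(max S_k >= x) ~ {freq:.5f}, bound exp(-lambda_1*x) = {np.exp(-lam1 * x):.5f}")
```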
The probabilistic sense of the deviation function is clarified by the following
statement. Let (α)ε = (α − ε, α + ε) be an ε-neighbourhood of α. For any set
B ⊂ R denote
Λ(B) = inf_{α∈B} Λ(α).
ψ(λ) = e^{λ²/2},   |λ±| = |α±| = ∞,   λ(α) = α,   Λ(α) = α²/2.
Example 1.1.6. For the Bernoulli scheme with parameter p ∈ (0, 1) we have
Λ(α) = α ln(α/p) + (1 − α) ln((1 − α)/(1 − p)) for α ∈ [0, 1], and Λ(α) = ∞ otherwise.
M = ‖E ξ⁰(i) ξ⁰(j)‖,   ξ⁰(i) = ξ(i) − Eξ(i),   i, j = 1, . . . , d.
If the vector ξ has a normal distribution with mean a1 and covariance matrix M
then it is easy to verify that
Λ(α) = (1/2)(α − a1) M^{−1} (α − a1)^T.
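The last formula can be checked numerically by computing the multidimensional Legendre transform directly. The sketch below (an illustrative computation assuming Python with NumPy/SciPy; the mean vector and covariance matrix are arbitrary choices) maximizes ⟨α, λ⟩ − ln ψ(λ) with ln ψ(λ) = ⟨a1, λ⟩ + (1/2) λMλ^T and compares the result with (1/2)(α − a1)M^{−1}(α − a1)^T.

```python
# Deviation function of a normal vector: Lambda(alpha) = 0.5*(alpha-a1) M^{-1} (alpha-a1)^T.
# Direct check via the Legendre transform of ln psi(lambda) = <a1,lambda> + 0.5*lambda M lambda^T.
# The mean a1 and covariance matrix M are arbitrary illustrative choices.
import numpy as np
from scipy.optimize import minimize

a1 = np.array([0.5, -1.0])
M = np.array([[2.0, 0.6],
              [0.6, 1.0]])

def log_psi(lam):
    return a1 @ lam + 0.5 * lam @ M @ lam

def Lambda_numeric(alpha):
    # sup over lambda of <alpha, lambda> - ln psi(lambda)
    res = minimize(lambda lam: -(alpha @ lam - log_psi(lam)), x0=np.zeros(2))
    return -res.fun

alpha = np.array([1.5, 0.5])
closed = 0.5 * (alpha - a1) @ np.linalg.solve(M, alpha - a1)
print(f"numeric Legendre transform: {Lambda_numeric(alpha):.6f}")
print(f"closed form 0.5*(alpha-a1) M^-1 (alpha-a1)^T: {closed:.6f}")
```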
(Λ3) The function Λ(α) is convex: for α, β ∈ R^d, p ∈ [0, 1] one has (1.1.11).
The sets A and Λ≤v = {α : Λ(α) ≤ v} for all v ≥ 0 are convex.
(Λ4) The function Λ(α) is lower semicontinuous everywhere:
Thus, as in the one-dimensional case, the deviation function Λ(α) under [C]
uniquely defines the distribution of ξ.
The next property can be regarded as a consistency property. Under natural
conventions about notation one has
(Λ7)
Λ(α) ≥ c1|α| − c2.
Lemma 1.2.2. Let (Y, ρ) be an arbitrary linear metric space, and let a function
J : Y → [0, ∞] be lower semicontinuous and convex, y ∈ [J<∞ ], y0 ∈ J<∞ . Then
the following properties hold.
(i) (Lower continuity)
lim_{ε→0} J((y)ε) = J(y).   (1.2.5)
For a set B ⊂ Rd we denote by (B) and [B] the interior and the closure of B
respectively.
Corollary 1.2.3. Let B ⊂ Rd be a convex set and let the function J be lower
semicontinuous and convex, J((B)) < ∞. Then
J([B]) = J(B) = J((B)). (1.2.7)
Proof of Lemma 1.2.2. (i) Property (1.2.5) is a consequence of the lower semi-
continuity of J (cf. (1.2.3)) and the relation (1.2.6). Indeed, by virtue of the lower
semicontinuity, we have on the one hand
lim_{ε→0} J((y)ε) ≥ J(y).
On the other hand, choose y0 ∈ J<∞ and let yp := (1 − p)y0 + py, so that
y = y1 . According to the properties of a linear metric space (see [112], p. 23)
we have yp → y1 as p → 1. Hence for every ε > 0 there exists an arbitrarily large
p = p(ε) < 1 such that yp(ε) ∈ (y)ε and p(ε) → 1 as ε → 0. Therefore, by virtue
of (1.2.6),
lim_{ε→0} J((y)ε) ≤ lim_{ε→0} J(y_{p(ε)}) = J(y).
Both the points y0 and y1 belong to the convex set [B] ∩ J<∞. Hence, yp also
belongs to that set for all p ∈ [0, 1] and, in order to prove (1.2.8), we have to
exclude the possibility yp ∈ ∂B for p ∈ [0, 1). But if yp ∈ ∂B then there exists a
sequence of points v → yp such that v ∉ [B]. In this case,
w := (v − p y1)/(1 − p) → y0 as v → yp.
Since y0 ∈ (B), w ∈ (B) when v is close enough to yp. Since v = p y1 + (1 − p)w and
y1 ∈ [B], v ∈ [B]. The derived contradiction proves (1.2.8).
(ii) Now let us turn to the direct proof of (1.2.7). Statement (1.2.8) implies the
inequality
Since y0 , y1 belong to J<∞ , by virtue of statement (ii) of Lemma 1.2.2, the right-
hand side of (1.2.9) converges to J(y1 ) as p → 1. Therefore,
J((B)) ≤ J(y1).
J((B)) ≤ J([B]),
P(ξ ∈ B) ≤ e^{−Λ([Bcon])},   (1.3.4)
is attained. It is clear that the point αB lies on the boundaries of the sets Λ≤vB and
[B], and that the sets touch each other: αB ∈ Λ≤vB ∩ [B] ≠ ∅.
If only condition [C] is met then, in the case where the set B is unbounded, the
sets Λ≤vB and [B] may 'touch' each other 'at infinity'.
If Λ(B) = ∞ then the set B does not intersect the convex set Λ<∞.
The symbol e will be used for unit vectors.
Definition 1.3.3. Let Λ(B) < ∞. In this case, the set B will be called Λ-separable
if there exists a hyperplane Π = Π(e, b) := {α : ⟨e, α⟩ = b} (going through the point αB
if the latter exists) that separates the sets B and Λ≤vB = {α : Λ(α) ≤ vB} in the
following sense:
B ⊂ Π> := {α : ⟨e, α⟩ > b},   Λ≤vB ⊂ Π≤ := {α : ⟨e, α⟩ ≤ b}.   (1.3.5)
B ⊂ Π>,   Λ<∞ ⊂ Π≤.
Theorem 1.3.4. If B is a Λ-separable set then
Theorems 1.3.2 and 1.3.4 do not provide any further meaningful bounds. Using
a somewhat different approach, we can consider the random variable γ := Λ(ξ)
and an 'iterated' deviation function Λ^{(γ)}, i.e. the deviation function for the
random variable γ, which is equal to the value of the deviation function Λ = Λ^{(ξ)}(α) at
the point α = ξ. Properties of the random variable γ and the deviation function Λ^{(γ)}
are studied in section 1.4.
Theorem 1.3.5. If Λ(B) ≥ Eγ then
P(ξ ∈ B) ≤ e^{−Λ^{(γ)}(Λ(B))}.
(i) Let the set B be Λ-separable and Λ(B) = ∞. Then Π> ∩ Λ<∞ = ∅ and so
Λ(Π>) = ∞. In other words, Λ(α) = ∞ for all α ∈ Π>. It follows from this and
(1.3.8) that Λ^{(β)}(v) = ∞ for all v > b and therefore Λ^{(β)}((b, ∞)) = ∞.
Further, by virtue of (1.3.5), one has
But the sets Λ≤vB and Π> are disjoint. Hence Λ(α) > vB for any α ∈ Π>. This
means that Λ(Π>) ≥ vB. Inequality (1.3.6) is established and Theorem 1.3.4 is
proved.
Note that if the set B is not Λ-separable then, generally speaking, the equality
Λ([B]) = ∞ does not imply that P(ξ ∈ B) = 0 (recall that the set Λ<∞ coincides
up to its boundary with the convex envelope of the support of the distribution of
ξ). This is demonstrated by the following example.
Example 1.3.6. Let the random vector ξ have a uniform distribution on the unit
sphere in R^d, d ≥ 2, and let B be the closure of the exterior of that sphere. Then
the sphere is contained in B, Λ([B]) = ∞ and P(ξ ∈ B) = 1.
Proof of Theorem 1.3.1. Since the sets Λ≤v, Λ<∞ are convex, according to the
Hahn–Banach theorem (see e.g. [110], p. 137) the open convex set B is Λ-
separable and so by Theorem 1.3.4 one has (1.3.6) for that set. It remains to make
use of Corollary 1.2.3, by virtue of which the right-hand side of (1.3.6) coincides
with the right-hand side of (1.3.3). Theorem 1.3.1 is proved.
Proof of Theorem 1.3.2. Since B ⊂ Bcon, it suffices to verify that (1.3.4) holds for
any convex set B = Bcon.
The set BN := {α ∈ B : |α| ≤ N} is convex, along with B. The ε-neighbourhood
(BN)ε of the set BN is also convex. Therefore, by Theorem 1.3.1,
P(ξ ∈ B) ≤ P(ξ ∈ (BN)ε) + PN ≤ e^{−Λ((BN)ε)} + PN,   PN := P(|ξ| > N).
Now let ε = εk → 0 as k → ∞, and let αk be
a sequence of points from (BN)εk ⊂ [(BN)εk] such that Λ(αk) ≤ Λ((BN)εk) +
1/k. Assuming without loss of generality that the αk converge to some α0 ∈ [BN] as k →
∞, we obtain from the lower semicontinuity of the function Λ that
P(ξ ∈ B) ≤ lim_{k→∞} e^{−Λ((BN)εk)} + PN ≤ lim_{k→∞} e^{−Λ(αk)+1/k} + PN
   ≤ e^{−Λ(α0)} + PN ≤ e^{−Λ([BN])} + PN.   (1.3.11)
Since Λ([BN]) ↓ Λ([B]) as N ↑ ∞, passing to the limit in (1.3.11) as N ↑ ∞ we
obtain (1.3.6). Theorem 1.3.2 is proved.
Proof of Theorem 1.3.5. Since B ⊂ {α : Λ(α) ≥ Λ(B)}, one has, for Λ(B) ≥ Eγ,
that
P(ξ ∈ B) ≤ P(Λ(ξ) ≥ Λ(B)) = P(γ ≥ Λ(B)) ≤ e^{−Λ^{(γ)}(Λ(B))}.
The theorem is proved.
Put
ζn := Sn/n,   A^{(ζn)}(λ) := ln E e^{⟨λ, ζn⟩} = nA(λ/n).
Then the deviation function Λ^{(ζn)}(α) for ζn, by virtue of property (Λ8), is equal to
Λ^{(ζn)}(α) = nΛ(α).   (1.3.12)
Thus, the following assertion immediately follows from Theorems 1.3.1 and 1.3.2.
Corollary 1.3.7. (i) For an arbitrary open convex set B one has the inequality
P(ζn ∈ B) ≤ e^{−nΛ(B)}.   (1.3.13)
(ii) For an arbitrary measurable set B, one has the inequality
P(ζn ∈ B) ≤ e^{−nΛ([Bcon])},
where Bcon is the convex envelope of B.
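Corollary 1.3.7(i) can be illustrated by simulation. In the sketch below (illustrative parameters, assuming Python with NumPy) the jumps are standard normal vectors in R², the open convex set is the half-plane B = {α : α(1) + α(2) > b}, for which Λ(α) = |α|²/2 and Λ(B) = b²/4, and the simulated frequency of {ζn ∈ B} is compared with e^{−nΛ(B)}.

```python
# Monte Carlo check of Corollary 1.3.7(i): P(zeta_n in B) <= exp(-n*Lambda(B))
# for an open convex set B.  Sketch: xi ~ N(0, I_2), B = {alpha: alpha_1 + alpha_2 > b},
# for which Lambda(alpha) = |alpha|^2 / 2 and Lambda(B) = b^2 / 4.
import numpy as np

rng = np.random.default_rng(2)

n, b, reps = 10, 1.2, 200_000
xi = rng.standard_normal(size=(reps, n, 2))
zeta = xi.mean(axis=1)                       # zeta_n = S_n / n for each replication
freq = np.mean(zeta.sum(axis=1) > b)
bound = np.exp(-n * b**2 / 4.0)
print(f"simulated P(zeta_n in B) ~ {freq:.5f}, bound exp(-n*Lambda(B)) = {bound:.5f}")
```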
From the results contained in this monograph (see below), it follows that the
bounds in Corollary 1.3.7(i) are ‘exponentially’ unimprovable. Such exact bounds
for the probability P(ζn ∈ B) for arbitrary sets B cannot be found. However, the
following assertion holds true. As before, we let γ = Λ(ξ).
Corollary 1.3.8. If Λ(B) ≥ Eγ then
P(ζn ∈ B) ≤ e^{−nΛ^{(γ)}(Λ(B))}.   (1.3.14)
Inequality (1.3.14) is not 'exponentially' unimprovable owing to the 'losses' in
the first inequality in (1.3.15) (see below). This is also indicated by the inequality
Λ^{(γ)}(v) < v − Eγ (see (1.4.8) below), which holds for random variables ξ that
are unbounded from above. However, for large v one has Λ^{(γ)}(v) ∼ v, and so
inequalities (1.3.13), (1.3.14), in a certain sense, converge (i.e. Λ^{(γ)}(Λ(B)) ∼
Λ(B) for large Λ(B)).
Proof of Corollary 1.3.8. By virtue of the convexity of the deviation function
Λ(α), one has
nΛ(ζn) ≤ Σ_{k=1}^{n} γk,   γk := Λ(ξk).
where μ1² and μd² are respectively the largest and the smallest eigenvalues of the
matrix
M^{−1} = ‖ ∂²Λ(α)/(∂α(i) ∂α(j)) ‖ at α = Eξ,
cu n^{d/2−1} e^{−un}.
Comparison with (1.3.16) shows that the error of inequality (1.3.16) is of order
√n as n → ∞ – the same as for the exact inequalities in Theorems 1.3.1–1.3.4.
i.e. the value of γ does not change under a shift of ξ. Similarly, we get
Λ^{(ξH)}(ξH) = Λ^{(ξ)}(ξHH^{−1}) = γ,
i.e. the value of γ does not change under 'rotation and contraction' of the vector
ξ. Thus, a linear transformation of the vector ξ does not affect the value of γ.
Λ^{(γ)}(v) ∼ v as v → ∞
Λ(α) = v > 0.
If v ≤ Λ∗ then, by virtue of the convexity of the function Λ(α) and its continuity
inside [s−, s+], there exist two solutions α±(v) of that equation; α+(v) > Eξ
and α−(v) < Eξ. If v ∈ (Λ∗, Λ*] then there exists only one solution: α+(v) (if
Λ∗ = Λ−) or α−(v) (if Λ∗ = Λ+). In that case, we will introduce the second
'solution', which does not exist, by setting it equal to ∓∞. If v > Λ* then there
are no solutions, and we set α±(v) = ±∞. It follows that {Λ(ξ) ≥ v} is the union
of two disjoint events {ξ ≤ α−(v)}, {ξ ≥ α+(v)} (these events may be empty).
Therefore, for γ = Λ(ξ) we obtain
(ii) The second assertion of the theorem follows from the first and the equality
Eγ = ∫_{0}^{∞} P(γ ≥ v) dv.
Λ(ζn) ≤ (1/n) Σ_{k=1}^{n} Λ(ξk) = θn,   (1.4.3)
and therefore
For the sake of definiteness, let Λ+ = ∞. Then the function Λ(α) continuously
increases from 0 to ∞ on (Eξ, ∞) and therefore Λ(α+(v) + 0) = Λ(α+(v)) = v.
By the large deviation principle for ζn (see Theorem 1.1.4) it is not hard to obtain
(see also Theorem 2.2.2)
lim sup_{n→∞} (1/n) ln P(θn ≥ v) ≥ lim inf_{n→∞} (1/n) ln P(θn ≥ v)
   ≥ lim inf_{n→∞} (1/n) ln P(ζn ≥ α+(v)) ≥ −Λ(α+(v) + 0) = −v.   (1.4.4)
It follows from (1.4.2) and (1.4.4) that
Λ^{(γ)}(v) ≤ v.   (1.4.5)
Further, by the property (Λ1) (see (1.1.8)), the deviation function Λ^{(γ)}(v)
for the random variable γ (and for any other random variable as well) allows a
representation of the form
Λ^{(γ)}(v) = ∫_{Eγ}^{v} λ^{(γ)}(u) du,   (1.4.6)
where λ^{(γ)}(u) is the value of λ at which the supremum is attained in the definition
Λ^{(γ)}(u) = sup_λ {λu − ln E e^{λγ}}. Moreover, λ^{(γ)}(u) ↑ λ+^{(γ)} := sup{λ : E e^{λγ} < ∞}
as u ↑ ∞. From that it follows that there exists the limit
lim_{v→∞} Λ^{(γ)}(v)/v = λ+^{(γ)}.   (1.4.7)
Now the required inequality λ+^{(γ)} ≤ 1 follows from (1.4.5) and (1.4.7), which
proves that λ+^{(γ)} = 1.
If Λ* = Λ+ < ∞ then the assertion of the theorem follows in an obvious way
from the equality
Λ+ = − ln P(ξ = s+)
(see (Λ2ii)). Theorem 1.4.1 is proved.
Relations (1.4.7) for λ+^{(γ)} = 1 mean that, under broad assumptions,
− ln P(γ ≥ v) ∼ v as v → ∞
(see also subsection 1.4.3). This is, in a certain sense, an analogue of the relation
P(F(ξ) < t) = P(ξ < F^{(−1)}(t)) = F(F^{(−1)}(t)) ≡ t,
which is true under some assumptions on the distribution function F(t) = P(ξ <
t). Also observe that, by virtue of the relation (1.4.6) and the equality λ+^{(γ)} = 1,
one has the inequality
Λ^{(γ)}(v) < v − Eγ for v > Eγ.   (1.4.8)
Example 1.4.2. Suppose that ξ has a normal distribution. By virtue of subsec-
tion 1.4.1, one can assume that Eξ = 0, Eξ² = 1. Then Λ(α) = α²/2 (see
Example 1.1.5), so that γ = ξ²/2, Eγ = 1/2,
P(γ ≥ v) = 2P(ξ ≥ √(2v)) = (2/√(2π)) ∫_{√(2v)}^{∞} e^{−u²/2} du ∼ (1/√(πv)) e^{−v} as v → ∞.
Further,
E e^{λγ} = E e^{λξ²/2} = (1/√(2π)) ∫_{−∞}^{∞} e^{λx²/2 − x²/2} dx = 1/√(1 − λ)   for λ < 1.
Therefore the equation for the point λ^{(γ)}(v) takes the form
∂/∂λ [ λv + (1/2) ln(1 − λ) ] = v − 1/(2(1 − λ)) = 0
and it has the unique solution λ^{(γ)}(v) = 1 − 1/(2v). Hence we find that
Λ^{(γ)}(v) = ∫_{Eγ}^{v} (1 − 1/(2u)) du = v − (1/2)(ln v + ln 2 + 1).
It is also clear that, in this example, one has s± = ±∞, Λ± = ∞.
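The expressions obtained in this example can be confirmed numerically. The sketch below (an illustrative computation assuming Python with NumPy/SciPy) evaluates Λ^{(γ)}(v) as the Legendre transform of ln E e^{λγ} = −(1/2) ln(1 − λ) and compares it with v − (1/2)(ln v + ln 2 + 1); it also estimates P(γ ≥ v) by simulation against the asymptotic form e^{−v}/√(πv).

```python
# Numerical check of Example 1.4.2 (gamma = xi^2/2 for standard normal xi):
# Lambda^(gamma)(v) = v - (ln v + ln 2 + 1)/2, obtained as the Legendre transform
# of ln E exp(lambda*gamma) = -0.5*ln(1 - lambda), lambda < 1.
import numpy as np
from scipy.optimize import minimize_scalar

def Lambda_gamma_numeric(v):
    obj = lambda lam: -(lam * v + 0.5 * np.log(1.0 - lam))
    res = minimize_scalar(obj, bounds=(-10.0, 1.0 - 1e-9), method="bounded")
    return -res.fun

for v in (1.0, 3.0, 10.0):
    closed = v - 0.5 * (np.log(v) + np.log(2.0) + 1.0)
    print(f"v={v:4.1f}  numeric={Lambda_gamma_numeric(v):.6f}  closed form={closed:.6f}")

# Tail of gamma versus the asymptotic exp(-v)/sqrt(pi*v).
rng = np.random.default_rng(3)
gamma = rng.standard_normal(2_000_000) ** 2 / 2.0
for v in (3.0, 5.0):
    print(f"v={v}: simulated P(gamma >= v) = {np.mean(gamma >= v):.2e}, "
          f"asymptotic = {np.exp(-v) / np.sqrt(np.pi * v):.2e}")
```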
Example 1.4.3. Suppose that ξ follows the exponential distribution. By virtue of
subsection 1.4.1, one can assume
P(ξ ≥ t) = e^{−t} for t ≥ 0.
Then (see Example 1.1.7)
Λ(α) = α − 1 − ln α for α ≥ 0,
so that γ = ξ − 1 − ln ξ. The equation Λ(α) = v has the solution α+(v) =
v + ln v + 1 + O(ln v/v) as v → ∞. Therefore,
P(γ ≥ v) = P(ξ ≥ v + ln v + 1 + O(ln v/v))
   = e^{−(v + ln v + 1 + O(ln v/v))} = (1/v) e^{−v−1} (1 + O(ln v/v)) as v → ∞.
In this example, s− = 0, s+ = ∞, Λ± = ∞.
Example 1.4.4. For a Bernoulli random variable ξ (P(ξ = 1) = p = 1 − P(ξ =
0)) one has (see Example 1.1.6)
γ = Λ(ξ) = ξ ln(ξ/p) + (1 − ξ) ln((1 − ξ)/(1 − p)),   Eγ = −p ln p − (1 − p) ln(1 − p) ≤ ln 2.
Hence
γ = − ln p with probability p,   γ = − ln(1 − p) with probability 1 − p.
If p = 1/2 then the variable γ degenerates to the constant ln 2. For p < 1/2, the
variable γ is given by an affine transformation of ξ:
γ = (ξ + a)/(b − a) with a = ln(1 − p), b = ln(1 − p) − ln p,
so that
Λ^{(γ)}(v) = Λ(v(b − a) − a).
The case p > 1/2 is dealt with in a similar way. In this example we have s− = 0,
s+ = 1, Λ− = − ln(1 − p), Λ+ = − ln p.