(Mathematics and Its Applications 26) Shiryayev A. N. - Selected Works of A.N. Kolmogorov. Volume II - Probability Theory and Mathematical Statistics-Springer (1992)
Kolmogorov
Mathematics and Its Applications (Soviet Series)
Managing Editor:
M. HAZEWINKEL
Centre for Mathematics and Computer Science, Amsterdam, The Netherlands
Editorial Board:
Volume 26
Selected Works of
A. N. Kolmogorov
Volume II
Probability Theory and
Mathematical Statistics
edited by
A. N. Shiryayev
SPRINGER-SCIENCE+BUSINESS MEDIA, B.V.
Library of Congress Cataloging-in-Publication Data
Kolmogorov, A.N. (Andrei Nikolaevich), 1903-
[Teoriya veroyatnostei i matematicheskaya statistika. English]
Probability theory and mathematical statistics / edited by A.N. Shiryayev ; translated from the Russian by G. Lindquist.
p. cm. -- (Selected works of A.N. Kolmogorov ; v. 2) (Mathematics and its applications (Soviet series) ; v. 26)
Translation of: Teoriya veroyatnostei i matematicheskaya statistika.
ISBN 978-94-010-5003-6    ISBN 978-94-011-2260-3 (eBook)
DOI 10.1007/978-94-011-2260-3
1. Probabilities. 2. Mathematical statistics. I. Shiryaev, Al'bert Nikolaevich. II. Title. III. Series: Kolmogorov, A.N. (Andrei Nikolaevich), 1903- . Selections. English. 1991 ; v. 2. IV. Series: Mathematics and its applications (Kluwer Academic Publishers). Soviet series ; 26.
QA3.K792513 1991 vol. 2
[QA273.18]
510 s--dc20
[519.2] 91-14856
CIP
SERIES EDITOR'S PREFACE

One service mathematics has rendered the human race. It has put common sense back where it belongs, on the topmost shelf next to the dusty canister labelled 'discarded nonsense'.
    Eric T. Bell

'Et moi, ..., si j'avais su comment en revenir, je n'y serais point allé.' ('And I, ..., had I known how to come back from it, I would never have gone.')
    Jules Verne

The series is divergent; therefore we may be able to do something with it.
    O. Heaviside
Mathematics is a tool for thought. A highly necessary tool in a world where both feedback and non-
linearities abound. Similarly, all kinds of parts of mathematics serve as tools for other parts and for
other sciences.
Applying a simple rewriting rule to the quote on the right above one finds such statements as:
'One service topology has rendered mathematical physics ...'; 'One service logic has rendered computer
science ...'; 'One service category theory has rendered mathematics ...'. All arguably true. And
all statements obtainable this way form part of the raison d'être of this series.
This series, Mathematics and Its Applications, started in 1977. Now that over one hundred
volumes have appeared it seems opportune to reexamine its scope. At the time I wrote
''Growing specialization and diversification have brought a host of monographs and
textbooks on increasingly specialized topics. However, the 'tree' of knowledge of
mathematics and related fields does not grow only by putting forth new branches. It
also happens, quite often in fact, that branches which were thought to be completely
disparate are suddenly seen to be related. Further, the kind and level of sophistication
of mathematics applied in various sciences has changed drastically in recent years:
measure theory is used (non-trivially) in regional and theoretical economics; algebraic
geometry interacts with physics; the Minkowski lemma, coding theory and the structure
of water meet one another in packing and covering theory; quantum fields, crystal
defects and mathematical programming profit from homotopy theory; Lie algebras are
relevant to filtering and prediction; and electrical engineering can use Stein spaces. And
in addition to this there are such new emerging subdisciplines as 'experimental
mathematics', 'CFD', 'completely integrable systems', 'chaos, synergetics and large-scale
order', which are almost impossible to fit into the existing classification schemes. They
draw upon widely different sections of mathematics."
By and large, all this still applies today. It is still true that at first sight mathematics seems rather
fragmented and that to find, see, and exploit the deeper underlying interrelations more effort is
needed and so are books that can help mathematicians and scientists do so. Accordingly MIA will
continue to try to make such books available.
If anything, the description I gave in 1977 is now an understatement. To the examples of
interaction areas one should add string theory where Riemann surfaces, algebraic geometry, modu-
lar functions, knots, quantum field theory, Kac-Moody algebras, monstrous moonshine (and more)
all come together. And to the examples of things which can be usefully applied let me add the topic
'finite geometry'; a combination of words which sounds like it might not even exist, let alone be
applicable. And yet it is being applied: to statistics via designs, to radar/sonar detection arrays (via
finite projective planes), and to bus connections of VLSI chips (via difference sets). There seems to
be no part of (so-called pure) mathematics that is not in immediate danger of being applied. And,
accordingly, the applied mathematician needs to be aware of much more. Besides analysis and
numerics, the traditional workhorses, he may need all kinds of combinatorics, algebra, probability,
and so on.
In addition, the applied scientist needs to cope increasingly with the nonlinear world and the
extra mathematical sophistication that this requires. For that is where the rewards are. Linear
models are honest and a bit sad and depressing: proportional efforts and results. It is in the non-
linear world that infinitesimal inputs may result in macroscopic outputs (or vice versa). To appreci-
ate what I am hinting at: if electronics were linear we would have no fun with transistors and com-
puters; we would have no TV; in fact you would not be reading these lines.
There is also no safety in ignoring such outlandish things as nonstandard analysis, superspace
and anticommuting integration, p-adic and ultrametric space. All three have applications in both
electrical engineering and physics. Once, complex numbers were equally outlandish, but they fre-
quently proved the shortest path between 'real' results. Similarly, the first two topics named have
already provided a number of 'wormhole' paths. There is no telling where all this is leading -
fortunately.
Thus the original scope of the series, which for various (sound) reasons now comprises five sub-
series: white (Japan), yellow (China), red (USSR), blue (Eastern Europe), and green (everything
else), still applies. It has been enlarged a bit to include books treating of the tools from one subdiscipline
which are used in others. Thus the series still aims at books dealing with:
- a central concept which plays an important role in several different mathematical and/or
scientific specialization areas;
- new applications of the results and ideas from one area of scientific endeavour into another;
- influences which the results, problems and concepts of one field of enquiry have, and have had,
on the development of another.
The roots of much that is now possible using mathematics, the stock it grows on, much of that goes
back to A.N. Kolmogorov, quite possibly the finest mathematician of this century. He solved outstanding
problems in established fields, and created whole new ones; the word 'specialism' did not
exist for him.
A main driving idea behind this series is the deep interconnectedness of all things mathematical
(of which much remains to be discovered). Such interconnectedness can be found in specially writ-
ten monographs, and in selected proceedings. It can also be found in the work of a single scientist,
especially one like A.N. Kolmogorov in whose mind the dividing lines between specialisms did not
even exist.
The present volume is the second of a three volume collection of selected scientific papers of A.N.
Kolmogorov with added commentary by the author himself, and additional surveys by others on the
many developments started by Kolmogorov. His papers are scattered far and wide over many different
journals and they are in several languages; many have not been available in English before. If
you can, as Abel recommended, read and study the masters themselves; this collection makes that
possible in the case of one of the masters, A.N. Kolmogorov.
The shortest path between two truths in the real domain passes through the complex domain.
    J. Hadamard

Never lend books, for no one ever returns them; the only books I have in my library are books that other folk have lent me.
    Anatole France

'La physique ne nous donne pas seulement l'occasion de résoudre des problèmes... elle nous fait pressentir la solution.' ('Physics does not merely give us the occasion to solve problems... it makes us sense the solution in advance.')
    H. Poincaré

The function of an expert is not to be more right than other people, but to be wrong for more sophisticated reasons.
    David Butler
Comments . . . . . . . . . . . . . . . . . . . . . . . . 520
A.N. Kolmogorov. On the papers on probability theory and mathe-
matical statistics . . . . . . . . . . . . . . . . . . . . . 520
Analytical methods in probability theory (No. 9) (A.D. Ventsel') 522
Markov processes with a countable number of states (No. 10)
(B.A. Sevast'yanov) . . . . . . . . . . . . . . . . . . 528
Homogeneous random processes (No. 13) (V.M. Zolotarev) . 528
Homogeneous Markov processes (No. 39) (A.A. Yushkevich) 530
Branching processes (Nos. 25, 32, 33, 46) (B.A. Sevast'yanov) 538
Stationary sequences (No. 27) (Yu.A. Rozanov) . . 539
Stationary processes (No. 48) (V.A. Statulyavichus) . . . . . 542
Statistics of processes (No. 50) (A.N. Shiryaev) . . . . . . . 544
Spectral theory of stationary processes (No. 34) (A.M. Yaglom) 545
Spectral representation of random processes (Nos. 47, 49)
(Yu.G. Balasanov and I.G. Zhurbenko) . . . . . . . . 551
Brownian motion (Nos. 14, 19, 24) (A.M. Yaglom) 554
Markov chains with a countable number of states (No. 23)
(A.A. Yushkevich) . . . . . . . . . . 559
Wald identities (No. 35) (A.A. Novikov) . . . . 567
S-Convergence (No. 42) (A.V. Skorokhod) . . . 569
Uniform limit theorems (Nos. 43, 51) (T. V. Arak) 570
Concentration functions (No. 45) (V.M. Kruglov) 571
Empirical distributions (No. 15) (E. V. Khmaladze) 574
The method of least squares (Nos. 30, 31) (M.B. Malyutov) 583
Unbiased estimators (No. 38) (Yu.K. Belyaev and Ya.P. Lumel'skii) 585
Statistical prediction (No. 18) (A.M. Yaglom) 587
On inter-bed washout (No. 37) (A.B. Vistelius) . . . . . . . . . 591
From the Publishers of the Russian Edition
In accordance with the decision of the Praesidium of the USSR Academy of
Sciences, the first book of the selected works of Academician A.N. Kolmogorov
"Mathematics and Mechanics" ("Nauka", Moscow) came out in 1985. As for
the second book of Kolmogorov's works planned for publication, the editorial
board decided to divide it into two parts: the first contains the papers on
probability theory and mathematical statistics, and the second those on infor-
mation theory and the theory of algorithms. So the articles marked with two
asterisks in the list of references of the first book should be naturally divided
into those given in the present second book and those prepared for publication
in the third book.
The second and third books of the selected works of Kolmogorov were
prepared for publication by Yu.V. Prokhorov and A.N. Shiryaev.
A few words about A.N. Kolmogorov*
The remarkably broad creative interests of A.N. Kolmogorov, the wide range
and variety of fields of mathematics he worked in through different periods of his
life - all this makes Andrei Nikolaevich distinguished among mathematicians
in our country and all over the world. This diversity of interests makes him
unique among the mathematicians of our time. In many fields of mathematics
he obtained truly fundamental and principally important results. The problems
were often most difficult to solve and required great creative endeavour. This
is true for the results obtained by Andrei Nikolaevich in his young years on the
theory of sets and functions, both the descriptive and the metrical theories; for
example, the theory of operations on sets developed by him and his celebrated
example of a divergent Fourier series.
This was followed by papers on general measure theory, both the abstract,
that is, "the general theory proper", and the geometric theory. After that
Kolmogorov started his fundamental work in various branches of probability
theory, work that made him, beyond any doubt, the most outstanding among
the researchers in this field all over the world.
Along with this, Kolmogorov wrote his first papers on mathematical logic
and the foundations of mathematics. Later, these were supplemented by studies
in information theory.
Andrei Nikolaevich made a very important contribution to topology. It
suffices to say that simultaneously with the outstanding American topologist
J.W. Alexander and quite independently of him, Kolmogorov came to the notion
of cohomology and laid the foundation of the theory of cohomology operations,
obtaining the results that transformed topology as a whole. The deep [...]

[...] and outlook. Even when our opinions on some question differed, we always
respected each other's viewpoint and never lost our deep sympathy for each
other.
As already mentioned, Kolmogorov has a lot of students in various fields of
mathematics, and some of them have become famous in their areas. The older
students of Kolmogorov are Sergei Mikhailovich Nikolskii (b. 1905) and the late
Anatoli Ivanovich Mal'tsev (b. 1910), both Academicians. Next in age are Boris
Vladimirovich Gnedenko (b. 1912), Academician of the Ukrainian Academy
of Sciences and expert in probability theory of world-wide recognition, Acad.
Mikhail Dmitrievich Millionshchikov (1913-1973) and Acad. Izrael' Moiseevich
Gel'fand (b. 1913), elected as a foreign member of the USA National Academy
of Sciences and Paris Academy of Sciences. Much younger, though also belonging
to the older generation of Kolmogorov's students, are Acad. Aleksander
Mikhailovich Obukhov (b. 1918) and Corresponding Member Andrei Sergeevich
Monin (b. 1921).
They are followed by Vladimir Andreevich Uspenskii, Vladimir Mikhailovich
Tikhomirov, Vladimir Mikhailovich Alekseev, Yakov Grigor'evich Sinai
and Vladimir Igorevich Arnol'd, Corresponding Member since 1984.
The largest group of Kolmogorov's students works in probability theory
and mathematical statistics. It includes Acad. Yurii Vasil'evich Prokhorov,
Corresponding Member Login Nikolaevich Bol'shev, Acad. of the Uzbek Academy
of Sciences Sagdi Khasanovich Sirazhdinov, Acad. of the Ukrainian Academy
of Sciences and Corresponding Member of the USSR Academy of Sciences
Boris Aleksandrovich Sevast'yanov, Yurii Anatol'evich Rozanov, Al'bert Nikolaevich
Shiryaev and Igor' Georgievich Zhurbenko. Of course, this list is by no
means complete; from the very title of my note it is clear that this is neither a
jubilee review of the life and activities of A.N. Kolmogorov, nor a traditional
"jubilee article"; so it does not claim to be complete in any respect.
P.S. Aleksandrov
1. ON CONVERGENCE OF SERIES WHOSE TERMS
ARE DETERMINED BY RANDOM EVENTS *
Jointly with A.Ya. Khinchin
Consider a series
$$y_1 + y_2 + \cdots + y_n + \cdots \eqno(1)$$
whose terms are random variables; denote the values taken by $y_n$ (their number
is finite or, possibly, countable) by $y_n^{(1)}, y_n^{(2)}, \ldots, y_n^{(i)}, \ldots$, and the corresponding
probabilities by $p_n^{(1)}, p_n^{(2)}, \ldots, p_n^{(i)}, \ldots$, with $\sum_i p_n^{(i)} = 1$. Further, denote by
$$a_n = \sum_i y_n^{(i)} p_n^{(i)}, \qquad b_n = \sum_i \{y_n^{(i)} - a_n\}^2\, p_n^{(i)}.$$
§1
Let us reduce the problem to the function-theoretic form more suitable for us.
For this, we construct a system of functions
* Über Konvergenz von Reihen, deren Glieder durch den Zufall bestimmt werden',
Mat. Sb. 32 (1925), 668-677.
1 Math. Ann. 87 (1922), 135.
for all $n$ and $i$; divide the interval $0 \le x \le 1$ from left to right into parts of
lengths
$$p_1^{(1)}, p_1^{(2)}, \ldots, p_1^{(i)}, \ldots$$
and assume that on each of these subintervals $\phi_1(x)$ equals a constant, namely
$y_1^{(i)} - a_1$ on the $i$-th; then
$$\int_0^1 \phi_1(x)\,dx = 0, \qquad \int_0^1 \{\phi_1(x)\}^2\,dx = b_1.$$
If the function $\phi_{n-1}(x)$ is already determined, then the function $\phi_n(x)$ is
determined as follows: each interval on which $\phi_{n-1}(x)$ is constant is divided
into intervals whose lengths (from left to right) are proportional to the values
$$p_n^{(1)}, p_n^{(2)}, \ldots, p_n^{(i)}, \ldots$$
Then
$$\int_0^1 \phi_n(x)\,dx = 0, \qquad \int_0^1 \{\phi_n(x)\}^2\,dx = b_n.$$
The values of the functions $\phi_n(x)$ at the ends of the subintervals are unimportant
and can be taken arbitrarily.
Assuming that $\sum a_n$ converges, the probability that (1) converges is equal
to that of the convergence of
$$\phi_1(x) + \phi_2(x) + \cdots + \phi_n(x) + \cdots \eqno(2)$$
which in our case is given by the (Lebesgue) measure of the set on which this
series converges. Hence, our main result can be interpreted as follows:² convergence
of $\sum b_n$ is a sufficient condition for the almost everywhere convergence
of (2).
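To see the construction at work, consider the special case treated by Rademacher (an illustration added here; the identification of the $\phi_n$ with the Rademacher functions is the standard one and is not spelled out in the text). Take $y_n = \pm c_n$, each value with probability $\tfrac12$; then

```latex
a_n = 0, \qquad b_n = c_n^2, \qquad
\phi_n(x) = c_n\, r_n(x), \quad r_n(x) = \operatorname{sign}\sin(2^n \pi x),
```

where $r_n$ is the $n$-th Rademacher function, so that (2) becomes the series $\sum c_n r_n(x)$, and the main result reduces to Rademacher's: if $\sum c_n^2 < \infty$, then $\sum \pm c_n$ converges for almost every choice of signs.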
2 Clearly, this statement is the same as that stated in the introduction. Gen-
eral considerations of this kind of relation can be found in Steinhaus's work
(Fund. Math. 4 (1923), 286-310).
The proof given below generalizes Rademacher's proof for a special case. 3
Set
$$\pi_n(x) = \prod_{k=1}^{n} \{1 + \psi_k(x)\}.$$
We have
Let $\delta = (a, b)$ be an interval of constancy of the function $\psi_n(t)$, taken to be
$a \le x < b$. Clearly
and, consequently,
$$\sum_i z_n^{(i)} p_n^{(i)} = 0 \qquad (n = 1, 2, \ldots)$$
we obtain
$$|z_n^{(j)} p_n^{(j)}| = \Big|\sum_{i \ne j} z_n^{(i)} p_n^{(i)}\Big| \qquad (n, j = 1, 2, \ldots)$$
and the Schwarz inequality implies
$$|z_n^{(j)}| \le \sqrt{b_n}\,\sqrt{\frac{1 - p_n^{(j)}}{p_n^{(j)}}} \qquad (n, j = 1, 2, \ldots).$$
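The Schwarz step can be filled in as follows (an added detail, assuming, as the surrounding formulas suggest, that $b_n = \sum_i (z_n^{(i)})^2 p_n^{(i)}$):

```latex
\Big(z_n^{(j)} p_n^{(j)}\Big)^2
  = \Big(\sum_{i \ne j} z_n^{(i)} p_n^{(i)}\Big)^2
  \le \Big(\sum_{i \ne j} (z_n^{(i)})^2 p_n^{(i)}\Big)\Big(\sum_{i \ne j} p_n^{(i)}\Big)
  \le \big(b_n - (z_n^{(j)})^2 p_n^{(j)}\big)\big(1 - p_n^{(j)}\big),
```

which rearranges to $(z_n^{(j)})^2\, p_n^{(j)} \le b_n\,(1 - p_n^{(j)})$, i.e. exactly the bound stated above.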
Hence, if on (a, b)
then
(4)
3 See footnote 1.
and
then by our hypotheses the inequality $p_{n+1}^{(j_{n+1})} \ge \tfrac12$ holds; therefore a fortiori
we have
$$\Big|\int_a^x \phi_{n+1}(t)\,dt\Big| < p_1^{(j_1)} \cdots p_n^{(j_n)} \sqrt{b_{n+1}}\,
\sqrt{\frac{1 - p_{n+1}^{(j_{n+1})}}{p_{n+1}^{(j_{n+1})}}}, \eqno(6)$$
$$\Big|\int_a^x \phi_{n+1}(t)\,dt\Big| \le 2\sqrt{b_{n+1}}\;
p_1^{(j_1)} p_2^{(j_2)} \cdots p_n^{(j_n)} \sqrt{1 - p_{n+1}^{(j_{n+1})}}, \eqno(7)$$
since this follows from (6) for $p_{n+1}^{(j_{n+1})} \ge \tfrac12$ and from (5) for $p_{n+1}^{(j_{n+1})} < \tfrac12$.
Now, (3), (4) and (7) imply
If we set
$$\prod_{k=1}^{\infty} (1 + b_k) = B,$$
then
$$|\Delta_n(x)| < B b_{n+1} + 2 B v_{n+1} \prod_{k=1}^{n} (1 - v_k). \eqno(8)$$
The right-hand side of (8) contains the general term of a convergent series.
Hence, the limit
$$\lim_{n \to \infty} f_n(x) = f(x)$$
exists.
In the special case considered by Rademacher, f(x) turned out to be a
monotone function, which is not necessarily so for the general case considered
here.
However, monotonicity was used only to prove differentiability of $f(x)$, so
it suffices to demonstrate that $f(x)$ is a function of bounded variation. This is
easily proved as follows. In any case the total variation of $f(x)$ is not greater
than the upper bound of the total variations of all the $f_n(x)$ and, consequently,
it is not greater than
But
where the admissibility of changing the order of taking the product and integrating
follows directly from the particular structure of the functions $\phi_k(x)$.
This gives the desired statement. Thus, the derivative $f'(x)$ exists almost
everywhere. Let $x$ be an arbitrary point of $(0,1)$, other than a subdivision
point, at which $f'(x)$ exists, and let $(a_n, b_n)$ denote the interval containing $x$
on which $\phi_n(x)$ is constant; since $f(a_n) = f_n(a_n)$, $f(b_n) = f_n(b_n)$, we obtain
since⁴
$$\frac{f_n(b_n) - f_n(a_n)}{b_n - a_n} = \pi_n(x)$$
because $\pi_n(x)$ is constant on $(a_n, b_n)$.
Hence, $\lim_{n \to \infty} \pi_n(x)$ exists almost everywhere. On the other hand, the series
§2
If the series (2) diverges at every point of some set of positive measure,
then there exists a set $E$ of positive measure $mE$ and a positive constant $A$
such that
$$|s_p(x) - s_n(x)| > A$$
for each positive integer $n$ and each $x \in E$, for a suitable $p = p(x, n) > n$.
Since $s_p(x) - s_n(x)$ is constant on every interval on which $\psi_p(x)$ is constant,
by construction of our functions, we can find a finite number of non-overlapping
intervals $\delta$ such that:
1) $\sum \delta$ (the sum of the lengths of all $\delta$'s) is greater than $\tfrac12\, mE$;
2) Each $\delta$ is an interval on which $\psi_p(x)$ ($p > n$) is constant and
consequently,
$$\sum_{p=n+1}^{k} b_p = \int_0^1 \{s_k(x) - s_n(x)\}^2\,dx \ge A^2 \sum \delta > \tfrac12 A^2\, mE.$$
§3
then
$$\iint_{E_{k+1} E_{k+1}} \{s_{k+1}(x_1) - s_{k+1}(x_2)\}^2\,dx_1\,dx_2 =
\iint_{E_k E_k} \{s_k(x_1) - s_k(x_2)\}^2\,dx_1\,dx_2\,+$$
$$+ \iint_{E_k E_k} \{\psi_{k+1}(x_1) - \psi_{k+1}(x_2)\}^2\,dx_1\,dx_2 -
\iint_{F_k F_k} \{s_{k+1}(x_1) - s_{k+1}(x_2)\}^2\,dx_1\,dx_2,$$
$$\sum_{k=0}^{n-1} \{2\, mE_{k+1} \cdot mF_k + (mF_k)^2\} \le 1.$$
As a result we have
$$\iint_{E_n E_n} \{s_n(x_1) - s_n(x_2)\}^2\,dx_1\,dx_2 \ge
\sum_{k=0}^{n-1} 2 b_{k+1} (mE_k)^2 - (K + 2M)^2,$$
$$\sum_{k=0}^{n-1} 2 b_{k+1} (mE_k)^2 < 2(K + 2M)^2$$
and a fortiori,
$$(mE)^2 \sum_{k=0}^{n-1} b_{k+1} < (K + 2M)^2,$$
5 The uniform boundedness of the $y_n$ clearly implies that of the $a_n$ and consequently
that of the $\psi_n(x)$.
When $\sum r_n$ converges we call the series (1) and (10) equivalent.⁶ Clearly, in
this case $Q = P$.
Now we claim that a necessary and sufficient condition for $Q = 1$ to hold
for (10) is the existence of an equivalent series (1) for which $\sum a_n$ and $\sum b_n$
converge. Clearly, only the necessity of the condition needs to be proved.
Thus we suppose that $Q = 1$ and define the series (1) as follows. Let $y_n =
u_n$ if $|u_n| < 1$ and $y_n = 0$ otherwise. Hence, $r_n$ coincides with the probability
of $|u_n(x)| \ge 1$. Since $Q = 1$, this immediately implies the convergence of $\sum r_n$,
so that (1) and (10) are in fact equivalent. Thus, the constructed series (1) is
already uniformly bounded, therefore according to §3 the corresponding series
$\sum a_n$, $\sum b_n$ converge, and the required result is proved.
Note also that our proof not only justifies the existence of (1) but also
gives an extremely simple rule for constructing this series.
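A minimal worked instance of this criterion (added for illustration): take independent $u_n = \pm 1/n$ for $n \ge 2$, each sign with probability $\tfrac12$. The truncation at 1 changes nothing, and

```latex
a_n = 0, \qquad b_n = \frac{1}{n^2}, \qquad \sum_n b_n < \infty,
```

so $Q = 1$: the series $\sum u_n$ converges with probability 1. For $u_n = \pm 1/\sqrt{n}$, on the contrary, $\sum b_n = \sum 1/n$ diverges, and according to the results of this paper the series then diverges with probability 1, i.e. $Q = 0$.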
Moscow, 3 December 1925
2. ON THE LAW OF LARGE NUMBERS *

of the trial $E_k$. However, the theorem to be proved holds also in the general
case.
Let $F_n$ be a variable which depends on the first $n$ trials. If for any positive
$\epsilon$ the probability of
(2)
* 'Sur la loi des grands nombres', C. R. Acad. Sci. Paris 185 (1927), 917-919.
Presented by E. Borel.
$$\max |X_n| = o(\sqrt{n}). \eqno(5)$$
The above is a generalization of Chebyshev's case, where $F_n$ is the sum of
independent random variables, since it concerns sums of dependent variables.
Very often the summands here are functions of certain other independent variables.
We think that in this case the above method for arbitrary functions in
independent variables would be very natural.
31 October 1927
3. ON A LIMIT FORMULA OF A. KHINCHIN *
Let
$$s_n = \sum_{k=1}^{n} z_k, \qquad B_n = \sum_{k=1}^{n} b_k. \eqno(1)$$
A.Ya. Khinchin [1] suggested that under broad assumptions the probability
that
$$\limsup \frac{s_n}{\sqrt{2 B_n \log\log B_n}} = 1 \eqno(2)$$
is 1. Khinchin himself proved this formula for some important special cases [2].
Our aim is to formulate sufficiently general conditions for this formula to hold.
These conditions are as follows:
(I) $B_n \to \infty$,
Clearly the first condition is necessary, except for the case $z_n = 0$. If the
$z_n$ are uniformly bounded, then the second condition follows from the first; in
this case this condition is necessary and sufficient.
If we want to use only probabilistic relations that can be effectively observed,
the meaning of formula (2) can be explained as follows:
* 'Sur une formule limite de M.A. Khintchine', C. R. Acad. Sci. Paris 186 (1928),
824-825. Presented by J. Hadamard.
2°. For any $\eta$, $\delta$ and $m$, there exists an integer $P$ such that the probability
that all the inequalities
(3)
References
1. A.Ya. Khinchin, Basic laws of probability theory, Moscow, 1927 (in Russian).
2. A.Ya. Khinchin, Math. Ann. 99 (1928), 152.
4. ON SUMS OF INDEPENDENT RANDOM VARIABLES *

§1. Notation
$\mathbf{P}(A)$ denotes the probability of an event $A$;
$\mathbf{P}_B(A)$ or $\mathbf{P}(A|B)$ denotes the conditional probability of event $A$ with
respect to an event $B$;
$\mathbf{E}\xi$ denotes the expectation of a random variable $\xi$;
$\mathbf{E}_B\xi$ denotes the conditional expectation of a random variable $\xi$
over $B$;
$\mathbf{D}\xi = \mathbf{E}(\xi - \mathbf{E}\xi)^2$ denotes the variance of a random variable $\xi$.
If $\xi_1, \ldots, \xi_n$ are random variables, we set
$$S_k = \sum_{i=1}^{k} \xi_i, \qquad S = S_n;$$
(We assume that $\mathbf{E}\xi_k$ and $\mathbf{E}\xi_k^2$ exist; $1 \le k \le n$.) Clearly $\mathbf{E}\zeta_k = 0$, $T_k =
S_k - \mathbf{E}S_k$. If $\xi_1, \ldots, \xi_k$ are independent, then
$$\mathbf{D}S_k = \sum_{i=1}^{k} \mathbf{D}\xi_i.$$
* 'Über die Summen durch den Zufall bestimmter unabhängiger Grössen', Math.
Ann. 99 (1928), 309-319; Bemerkungen ... - Math. Ann. 102 (1929), 484-488.
R=fD+M.
Then
Theorem 4.
$$\mathbf{P}\{T > 0\} \ge \frac{1}{16}\,\frac{D - M}{D + M}.$$
Theorem 5.
$$\mathbf{P}\{T > -3M\} \ge \frac{1}{48}.$$
Theorem 6.
Since the variables $\zeta_i$ with $i > k$ do not occur in the definition of $A_k$, the
hypothesis that $\xi_1, \ldots, \xi_n$ are independent implies
$$\mathbf{E}(T^2; A_k) = \mathbf{E}(T_k^2; A_k) + \sum_{i=k+1}^{n} \mathbf{E}(\zeta_i^2; A_k)
\ge \mathbf{E}(T_k^2; A_k) \ge R^2\,\mathbf{P}(A_k),$$
and
$$A_i = \sum_{k=1}^{n} A_{ik}.$$
k=l
On the set $A_{ik}$ we have
Clearly
$$\mathbf{P}(A_{i+1}|A_{ik}) = \mathbf{P}\{u \ge (i+1)R\} \le
\mathbf{P}\Big\{\max_{p > k} \Big|\sum_{j=k+1}^{p} \zeta_j\Big| \ge \epsilon D\Big\}.$$
As in the proof of Theorem 1, we note that the variables $\zeta_j$ for $j > k$ do not
occur in the definition of $A_{ik}$. Therefore,
and consequently
$$B_k = \{|S_i| \le R,\ i \le k\}, \qquad B = B_n = \{v \le R\}, \qquad
C_k = B_{k-1} \setminus B_k.$$
Clearly
Put
Remark. If we take $z_k$ instead of $\zeta_k$, the statement of the theorem will be valid
with $v$ replaced by $u$.
Clearly
$$B_0 = \{T > 0\}.$$
By Theorem 2,
$$\cdots \le R\big[\mathbf{P}(B_0) + \tfrac{3}{32}\big].$$
On the other hand, it is easy to prove that
$$\mathbf{E}(T; B_0) \ge \tfrac14 D.$$
Thus
as required.
Then by Theorem 4
$$\mathbf{P}\{T > 0\} \ge \frac{1}{16}\,\frac{D - M}{D + M} \ge \frac{1}{48}.$$
If
$$D \ge 2M,$$
then by Theorem 1,
$$W = \max_k \Big|S - \sum_{i=k}^{n} \zeta_i\Big|.$$
Below, necessary and sufficient conditions are considered for the convergence of
series of independent random variables, previously established by other methods
in our joint paper with A.Ya. Khinchin (see paper 1 of this volume).
Consider two sequences of random variables
$$\sum_{i=1}^{\infty} \eta_i \eqno(5)$$
is 1 if there is a sequence $\bar{\eta} = (\bar{\eta}_n)$ equivalent to $\eta = (\eta_n)$ such that the series
$$\sum_{i=1}^{\infty} \mathbf{E}\bar{\eta}_i \eqno(6)$$
and
(7)
Since the series (7) converges, for every $\epsilon > 0$ we can find some (possibly large)
$m$ such that for $n \ge m$ the right-hand side of (8) is smaller than $\epsilon$.
By definition,
$$\mathbf{P}\{\cdots\} \eqno(9)$$
is 1. Since the series (6) converges, the same is true for the series
$$\sum_{n=1}^{\infty} \bar{\eta}_n. \eqno(10)$$
2) We set
$$\bar{\eta}_n = \begin{cases} \eta_n, & \text{if } |\eta_n| \le 1, \\ 0, & \text{if } |\eta_n| > 1. \end{cases} \eqno(11)$$
First we assume that $\eta = (\eta_n)$ and $\bar{\eta} = (\bar{\eta}_n)$ are equivalent, that is,
In this case one of the series (6) or (7) should diverge. If the series (7) diverges,
then by Theorem 3 we see that for each $n$,
$$\mathbf{P}\Big\{\max_{n \le k \le N} |\eta_k| \le 1\Big\} =
\prod_{k=n}^{N} \big[1 - \mathbf{P}\{|\eta_k| > 1\}\big] \to 0, \qquad N \to \infty,$$
denoted by $\|\eta_{nk}\|$ for short. Suppose that in every row the variables are independent
of one another, whereas in different rows they may be dependent.
We say that the means
are stable if there exists a sequence of numbers $d_1, d_2, \ldots$, such that for any
$\epsilon > 0$,
We will give a necessary and sufficient condition for the stability of the means.
We say that two systems $\|\eta_{nk}\|$ and $\|\bar{\eta}_{nk}\|$ are equivalent if
and
Clearly for equivalent systems the means are simultaneously either stable or
unstable.
(13)
(14)
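Written out in formulas (a sketch of the standard definitions, with $\sigma_n = (\eta_{n1} + \cdots + \eta_{n m_n})/m_n$ assumed as the $n$-th row mean; the displays (13), (14) themselves are presumed to express the same thing), stability of the means says

```latex
\mathbf{P}\{|\sigma_n - d_n| > \epsilon\} \to 0 \qquad (n \to \infty)
\quad \text{for every } \epsilon > 0,
```

while equivalence of two systems $\|\eta_{nk}\|$ and $\|\bar{\eta}_{nk}\|$ amounts to $\sum_{k=1}^{m_n} \mathbf{P}\{\eta_{nk} \ne \bar{\eta}_{nk}\} \to 0$; replacing $\eta_{nk}$ by $\bar{\eta}_{nk}$ then changes $\sigma_n$ only on an event of vanishing probability, which is why stability is preserved under equivalence.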
Proof of the necessity. Suppose that the means $\sigma_n$, $n \ge 1$, are stable. For any
random variable $\eta_{nk}$ there exists a constant $f_{nk}$ such that
(15)
as $n \to \infty$.
Now set
$$\bar{\eta}_{nk} = \begin{cases} \eta_{nk}, & \text{if } |\eta_{nk} - f_{nk}| \le m_n, \\
f_{nk}, & \text{if } |\eta_{nk} - f_{nk}| > m_n; \end{cases} \eqno(17)$$
$$\bar{\eta}_{nk}(\epsilon) = \begin{cases} \eta_{nk}, & \text{if } |\eta_{nk} - f_{nk}| \le m_n \epsilon, \\
f_{nk}, & \text{if } |\eta_{nk} - f_{nk}| > m_n \epsilon. \end{cases}$$
By (16) the systems $\|\eta_{nk}\|$, $\|\bar{\eta}_{nk}\|$ and $\|\bar{\eta}_{nk}(\epsilon)\|$ are equivalent, and the means
$\bar{\sigma}_n$ and $\bar{\sigma}_n(\epsilon)$ are stable. Note that
Therefore by Theorem 6,
$$\mathbf{P}\{|\bar{\sigma}_n(\epsilon) - d_n| > \epsilon\} =
\mathbf{P}\Big\{\Big|\sum_{k=1}^{m_n} \bar{\eta}_{nk}(\epsilon) - m_n d_n\Big| > \epsilon m_n\Big\} \ge
\frac{1}{48}\Big[1 - \frac{324\, m_n^2 \epsilon^2}{\sum_{k=1}^{m_n} \mathbf{E}\big(\bar{\zeta}_{nk}^2(\epsilon)\big)}\Big].$$
For any $\epsilon > 0$ the left-hand side of this inequality converges to zero. Hence,
(18)
For $\epsilon \le 1$,
and, by (16),
Remark. If there exists a system $\|\bar{\eta}_{nk}\|$ satisfying the condition of the theorem,
then the system determined by (17) and (15) is such a system.
We say that the means $\sigma_n$, $n \ge 1$, have the property of normal stability if
for any $\epsilon > 0$,
The proof of the theorem follows directly from (14) which was established
for any stable system.
Remark. For the case of normal stability, instead of $\|\bar{\eta}_{nk}\|$ determined by (17),
(15), one can take a system
(19)
it is necessary and sufficient that there exist a system $\|\bar{\eta}_{nk}\|$ such that
1. $$\sum_{k=1}^{n} \mathbf{P}\{\bar{\eta}_{nk} \ne \eta_{nk}\} \to 0.$$
2. $$\frac{1}{n} \sum_{k=1}^{n} \big[\mathbf{E}\bar{\eta}_{nk} - \mathbf{E}\eta_{nk}\big] \to 0.$$
3.
$$\bar{\eta}_{nk} = \begin{cases} \eta_k, & \text{if } |\zeta_k| \le n, \\
\mathbf{E}\eta_k, & \text{if } |\zeta_k| > n. \end{cases}$$
2. $$\frac{1}{n} \sum_{k=1}^{n} \mathbf{E}\big(\zeta_k;\ |\zeta_k| \le n\big) \to 0.$$
3.
REMARKS
The purpose of these remarks is, on the one hand, to refine and improve certain
results (§§1, 2) and on the other hand, to study an important special case of
the law of large numbers (§3).
§1
Theorem 5*.
as required.
If IMSI < 2D, then we introduce the event
On the set
$$E' = \sum_{m=0}^{5} E_m$$
we have
$$|T| \le 18D, \qquad |S| \le 20D.$$
Hence
Since
$$\mathbf{E}(S^2; E) \le \tfrac14 D^2,$$
it follows that
whence
$$\mathbf{P}(E) \ge \frac{1}{1600},$$
which proves Theorem 4*.
Proof of Theorem 5*. If $M > D$ or $R > \tfrac12 D$, then the required inequality holds,
since then its right-hand side is negative.
If, however, $M \le D$ and $R \le \tfrac12 D$, then the required inequality follows
from Theorem 4*.
§2
Theorem 6*. Let $E_1, \ldots, E_n$ be a sequence of independent events and let $U$
be an event such that
Then
then
as required.
Now for all $k$ let
and let $F_i$ be the complementary event. Since $F_i$ and $E_i$ are independent, for
$i \le k$ we have
$$\mathbf{P}(U) \ge \sum_{i=1}^{k} \mathbf{P}(U F_i E_i) \ge \tfrac12 u \sum_{i=1}^{k} \mathbf{P}(E_i)
\ge \tfrac12 u\,\mathbf{P}(F_k) \ge \tfrac12 u^2,$$
as required.
$$F_k = \Big\{\Big|\Big(\sum_{i \ne k} \eta_{ni} + f_{nk}\Big)\Big/ m_n - d_n\Big| \le \frac{\epsilon}{2}\Big\}.$$
Clearly
P(U) ~ t,
hence
If the event $E_k$ takes place and $F_k$ does not, then $U$ takes place. Therefore
$$\mathbf{P}(F_k|E_k) = \mathbf{P}(F_k) \le \tfrac12, \qquad \mathbf{P}(U|E_k) \ge \tfrac12.$$
By Theorem 6*
$$\mathbf{P}(U) \ge \frac{1}{36}\,\mathbf{P}^2(E_1 + \cdots + E_{m_n}),$$
as desired.
§3
Here we study a special case of the law of large numbers, namely the case
when the independent random variables $\eta_1, \eta_2, \ldots$ have the same distribution
$\mathbf{P}\{\eta_1 < x\} = F(x)$. Then we have
Theorem 12. The means $\sigma_n = (\eta_1 + \cdots + \eta_n)/n$, $n \ge 1$, are stable if and
only if
$$n\,\mathbf{P}\{|\eta_1| > n\} \to 0, \qquad n \to \infty.$$
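By way of illustration (an added example, not part of the original text): for the standard Cauchy distribution the condition of Theorem 12 fails, and indeed the means are not stable. One has

```latex
\mathbf{P}\{|\eta_1| > n\} = 1 - \frac{2}{\pi}\arctan n \sim \frac{2}{\pi n},
\qquad n\,\mathbf{P}\{|\eta_1| > n\} \to \frac{2}{\pi} \ne 0,
```

and in fact $\sigma_n = (\eta_1 + \cdots + \eta_n)/n$ again has the standard Cauchy distribution for every $n$, so $\mathbf{P}\{|\sigma_n - d_n| > \epsilon\}$ stays bounded away from zero for any choice of the constants $d_n$.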
Proof. Set
$$\bar{\eta}_{nk} = \begin{cases} \eta_n, & \text{if } |\eta_n| \le K, \\
0, & \text{if } |\eta_n| > K. \end{cases}$$
It is easy to see that if $n\,\mathbf{P}\{|\eta_1| > n\} \to 0$ as $n \to \infty$, then the systems
$(\bar{\eta}_{nk})$ and $(\eta_n)$ are equivalent and
$$\frac{1}{n^2} \sum_{k=1}^{n} \mathbf{E}\bar{\zeta}_{nk}^2 \le
\frac{1}{n^2} \sum_{k=1}^{n} \mathbf{E}\bar{\eta}_{nk}^2 \to 0, \qquad n \to \infty.$$
mP{I1]l - I1 ~ w} - 0,
It can be shown that in this case we have normal stability. Thus the
following theorem holds:
Theorem 13. The stability of the means $\sigma_n = (\eta_1 + \cdots + \eta_n)/n$, $n \ge 1$, of a
sequence of independent identically distributed random variables is normal if² and
only if the expectation exists.
The latter statement was earlier proved by A.Ya. Khinchin (C.R. Acad.
Sci. Paris 188 (1929), 477).
8 February 1929
2 The 'only if' part is trivial and follows from the fact that the definition of normal
stability makes sense only if the expectation exists.
5. ON THE LAW OF THE ITERATED LOGARITHM *
Following A.Ya. Khinchin we say¹ that the sequence (1) obeys the law of the
iterated logarithm if the probability that
$$\limsup \frac{S_n}{\sqrt{2 B_n \ln\ln B_n}} = 1 \eqno(2)$$
is 1. In one important special case this law was established by Khinchin himself;²
we will prove that it is applicable under the following conditions:
I. $B_n \to \infty$,
* 'Über das Gesetz des iterierten Logarithmus', Math. Ann. 101 (1929), 126-135.
1 See his book Basic laws of probability theory, Moscow, 1927 (in Russian).
2 Math. Ann. 99 (1928), 152.
§1. Lemmas
Let
Lemma I. If $xM \le B$, then
$$W(x) < \exp\Big[-ax + \frac{a^2 B}{2}\Big(1 + \frac{aM}{2}\Big)\Big]. \eqno(5)$$
Lemma IV. If
then
$$W(x) > e^{-(x^2/2B)(1+\epsilon)}.$$
We further set
$$a = \frac{x}{B(1 - \delta)}.$$
Clearly,
$$1 + u > e^{u(1-u)},$$
(16)
(17)
$$\mathbf{E} e^{aS} > \exp\Big[\frac{a^2 B}{2}\Big(1 - \frac{\delta^2}{4}\Big)\Big]. \eqno(18)$$
-1: 1:
On the other hand we have
Ee os = eOlldW(y) =a eOIlW(y)dy =
=a ({O
1-00
+ l
0
OB (1-6) + l OB (1+6)
oB(1-6)
1
+ 80B
oB(1+6)
+ (>0) =
180B
(19)
$$a J_1 \le a \int_{-\infty}^{0} e^{ay}\,dy \le 1. \eqno(20)$$
Therefore we obtain
$$a J_5 < a \int_{8aB}^{\infty} e^{-ay}\,dy < 1. \eqno(21)$$
Since by (18), (9) and (15),
$$\mathbf{E} e^{aS} > 8,$$
(22)
To estimate $aJ_2$ and $aJ_4$ we apply Lemma I. Because of (6) and (8), for $y \le 8aB$
we have the inequalities
$$0 < w < \tfrac18 \delta^2, \qquad
W(y) < \exp\Big[-\frac{y^2}{2B}\Big(1 - \frac{\delta^2}{8}\Big)\Big].$$
Therefore we obtain
$$a(J_2 + J_4) < a\Big(\int_{0}^{aB(1-\delta)} + \int_{aB(1+\delta)}^{8aB}\Big)
\exp\Big[a y - \frac{y^2}{2B}\Big(1 - \frac{\delta^2}{8}\Big)\Big]\,dy.$$
u(y) = ay - -y2 ( 1 - _6 2 1)
2B 8
does not exceed
Hence we have
we finally obtain
(24)
(26)
Since, along with (23) and for similar reasons, the formula

W(x) > exp[−(x²/2B)(1 + 4δ)] > exp[−(z²/(2B(1 − δ)²))(1 + 4δ)] >
Since the z_i with i > k do not occur in the definition of the event E_k, we
have

P_{E_k}{S ≥ u − √(2B)} ≥ 1/2,

W(z − √(2B)) = P{S ≥ z − √(2B)} ≥
4 The + sign is used instead of ∪ in order to stress the fact that the events in
question are pairwise incompatible.
M_n √(ln ln B_n) / B_n < δ/4,   (31)
After no, nl, ... , nk-l have been defined, we choose nk so that
(32)
(33)
implies
(36)
(38)
while according to Lemma V, the latter probability does not exceed
so that we have
and by Lemma I,
where, by (31),
θ = M_{n_k} √(ln ln B_{n_k}) / (2B_{n_k}) < δ/8.
This implies that
As before, suppose that (27) holds. Conditions I and II and the first part of
the main theorem just proved imply that there exists an n₀ such that, first, (28)
holds, and secondly, for every n ≥ n₀,
(39)
(42)
Set
(45)
Since S_{n_k} = σ_k + S_{n_{k−1}}, it follows that if (40) holds, then the inequality
(46)
(47)
Now we prove that for sufficiently large p the probability that at least one
of the inequalities (46) holds for k = 1, 2, ..., p is greater than 1 − η/2. This
clearly implies that the corresponding probability for the inequalities (47) is
greater than 1 − η, which proves the second part of the main theorem.
Since the σ_k are mutually independent, for our purpose it suffices to prove
the divergence of the series

Σ_{k=1}^{∞} v_k.   (48)
x = χ(β_k)(1 − δ/4).

lim_{k→∞} χ²(β_k)/(2β_k) ≥ lim_{k→∞} (½ ln ln B_{n_k}) = ∞.
From the latter formulas we can see that ε tends to zero as k → ∞. Thus
for sufficiently large k we have

v_k > exp[−(χ²(β_k)/(2β_k))(1 − δ/4)] = exp[−ln ln β_k (1 − δ/4)] =
§1. For an accurate formulation of the problem, consider the sequence of real
numbers 3
Let us agree to say that X n is stable 5 or that X n obeys the law o/large numbers,
if there is a sequence of constants
(1)
* 'Sur la loi des grands nombres', Atti Accad. Naz. Lincei Rend. 9 (1929), 470-
474. Presented by G. Castelnuovo on 3 March 1929.
1 See: S.N. Bernshtein, Probability theory, p.142, where a more general definition
is given.
2 C.R. Acad. Sci. Paris 185 (1927), 917.
3 Similar results can be obtained by considering vectors. V. Glivenko has informed
me that similar arguments hold even for vectors in Hilbert space of functions.
4 We assume throughout that these trials are independent.
5 Concerning the definition of stability, see my paper: 'Über die Summen durch
den Zufall bestimmter unabhängiger Grössen', Math. Ann. 99 (1928), 309 (Paper
4 in this volume.)
44 ON THE LAW OF LARGE NUMBERS
(2)
for every positive η, then we say that X_n has normal stability. It can be proved
that if X n - E(Xn ) is uniformly bounded, then (2) follows from (1), that is, in
this case the stability can only be normal.
According to Chebyshev we have
(3)
where
therefore, setting
we have
X n - E(Xn ) = Znl + Zn2 + ... + Znn.
More generally, denoting by E_k(Y) the expectation of Y given that
the results of the trials ℰ₁⁽ⁿ⁾, ℰ₂⁽ⁿ⁾, ..., ℰ_k⁽ⁿ⁾ are known, we have
Since Zni is constant for i < k, if the results of the first k - 1 trials are fixed,
we also have
and finally,
B_n² = β²_{n1} + β²_{n2} + ⋯ + β²_{nn}.   (7)
§3. Now we consider the case when the trials ℰ_k⁽ⁿ⁾ determining the value of X_n
are independent. Denote by Ē_k(Y) the expectation of Y when the results of
6 In my article (see footnote 5) this formula is proved only for the case of inde-
pendent trials.
7 It is assumed that the trials ℰ_k⁽ⁿ⁾ are carried out sequentially in accordance with
the order of their indices k.
all the trials except ℰ_k⁽ⁿ⁾ are known. In the case under consideration, that is,
when the trials are independent, we have
(9)
(10)
(11)
is sufficient.
Formula (9) can be proved as follows:

E_{k−1}[E_k(X_n)] = E_{k−1}(X_n),

E_k[X_n − E_k(X_n)]² ≥ {E_k(X_n) − E_k[E_k(X_n)]}² = [E_k(X_n) − E_{k−1}(X_n)]²,
Thus, α_nk is the maximal deviation of X_n if only the result of the trial ℰ_k⁽ⁿ⁾ is
unknown. We see that
(12)
(13)
(14)
8 This condition is especially interesting, since the definition of the moments α_nk
does not involve the order of the trials ℰ_k⁽ⁿ⁾.
where x_k depends on the trial ℰ_k⁽ⁿ⁾ and these trials are independent, we have
GENERAL MEASURE THEORY AND PROBABILITY CALCULUS 49
Axiom I.
M(E) ~ o.
Secondly, we assume that if two sets do not intersect, then the measure of their
sum is equal to the sum of their measures.
Axiom 11. If
then
Here we assume that the existence of measures for two of the sets involved
implies the existence of a measure for the third set.
Based on this axiom the general formula
(1)
can be proved. One should not assume, however, that the existence of measures
of two intersecting sets implies the existence of a measure for their sum or
difference: there are certain important measures without this property.
Since the product of two sets can be defined as
(2)
(1) can be used for deriving all the various relations among sums, differences
and products of a finite number of sets.
For convenience we include the empty set in the domain of sets to be
considered, and denote it by 0. If for a certain measure at least one set has a
measure, then Axiom II implies that
M(O) = O. (3)
Finally, the third axiom introduces a quite arbitrary restriction on the measures
considered: the measure of the whole space is 1.
Axiom III.
M(A) = 1.
This greatly simplifies the discussion and the general case can be easily
derived from this particular case. In many applications, in particular in probability calculus, this restriction is due to the essential nature of the subject.
Let us now consider some examples of measures satisfying these axioms.
1) The Lebesgue measure of point sets in an n-dimensional cube with
side equal to 1.
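As a toy illustration of Axioms I-III (a sketch of my own; the four-point space and the uniform weights are illustrative choices, not from the text), the axioms can be verified exhaustively on a finite space:

```python
from fractions import Fraction
from itertools import combinations

A = frozenset(range(4))  # the whole space: a 4-point set

def M(E):
    """Uniform measure: each point carries mass 1/4."""
    return Fraction(len(E), len(A))

def subsets(s):
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

# Axiom I: non-negativity.
assert all(M(E) >= 0 for E in subsets(A))
# Axiom II: additivity on non-intersecting sets.
for E in subsets(A):
    for F in subsets(A):
        if not (E & F):
            assert M(E | F) == M(E) + M(F)
# Axiom III: the whole space has measure 1.
assert M(A) == 1
# Consequence (3): the empty set has measure 0.
assert M(frozenset()) == 0
```

Here every subset has a measure; the interesting measures discussed below are precisely those for which this fails.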
II. CLOSEDNESS OF A MEASURE
§1. A given measure M determines the system (E) of sets E having a measure.
Suppose we know about this system that it includes the empty set 0 and the
whole space A, and also that if
and two of these sets belong to (E), then so does the third one. Clearly, the
complement of a set belonging to (E) also belongs to (E) since
E + Ē = A,   E · Ē = 0.
§3. We say that a measure M contains a measure M' given on the same set A
if (E) contains (E') and both measures coincide on the sets belonging to (E').
One should not assume that any measure is contained in some complete
one; counterexamples are elementary. Still, for none of the measures usually
employed in mathematics has it been proved that it is impossible to consider
it as contained in a complete measure. On the contrary, using the axiom of
choice, S. Banach proved that linear Lebesgue measure is contained in a certain
complete measure. 1 In view of the metric equivalence of spaces of arbitrary
dimension the same is true for n-dimensional Lebesgue measure. But already
1 Fund. Math. 4.
§5. In most cases, however, the system (E) can be assumed to possess certain
closedness properties. We say that a system (E) is finitely closed if it contains
all sums of pairs of sets belonging to this system. Clearly, since
(4)
a closed system contains all the differences, and because of (2), it also contains
the products of sets belonging to this system.
§6. For each system (E) there is a certain minimal system F(E) that is closed
and contains (E). If one can assign to all sets of the system F(E) a measure
coinciding with the measure given for the sets of (E) and satisfying our three
axioms, then we say that the initial measure is closable.
It is not known whether every measure is closable. If closure is possible,
then it is not necessarily possible in only one way; examples of the latter are
elementary.
It would seem that it is very difficult to find a measure that closes the
measure given above under 3).
It is also doubtful if a measure connected with some problem in probability
calculus need be closed.
2 My results in C. R. Aead. Sei. Paris (1925) point out the similarity of these two
problems.
§7. A system (E) is called countably closed if it contains all possible countable
sums of its elements. A countably closed system contains countable products
of its elements, since

∏_n E_n = A − Σ_n Ē_n   (n = 1, 2, ...).   (5)

§8. A measure M is called normal if

E = Σ_n E_n   (n = 1, 2, ...)

implies

M(E) = Σ_n M(E_n).
Of the measures mentioned above, those in 2) and 3) are not normal.
Measures corresponding to problems in probability calculus also need not be
normal.
§9. For a finitely closed normal measure it is easy to prove the formulas

M(Σ_n E_n) = Σ_n M(E_n − Σ_{k=1}^{n−1} E_k),   (6)

M(∏_n E_n) = lim_n M(∏_{k=1}^{n} E_k)   (n = 1, 2, ...),   (7)
which are true under the single condition that the sets E n and either their sum
or product have a measure.
§10. Moreover, for finitely closed normal measures the following Proposition
on countable coverings holds: if

E ⊂ Σ_n E_n,

then

M(E) ≤ Σ_n M(E_n)   (n = 1, 2, ...).
Indeed, set

E′_n = E_n − Σ_{k=1}^{n−1} E_k.

Then clearly

E′_n · E′_m = 0,   n ≠ m;

such that

E ⊂ Σ_n E_n,   Ē ⊂ Σ_n E′_n,
§12. Thus we obtain a certain new measure L(M) which measures any set
that is measurable with respect to M. If M is finitely closed and normal, then
L(M) satisfies Axioms I-III, is normal and countably closed. The methods for
proving this are the same as those used for studying Lebesgue measure. Thus,
for any normal finitely closed measure there exists a countably closed measure
containing it.
§13. The measure L(M) is not the minimal countably closed measure con-
taining M. It is not too difficult to determine the minimal such measure B (M)
contained in L(M).
However, the measure L(M) has another remarkable property: a set that is
measurable with respect to M has the same measure with respect to any normal
measure containing M as with respect to L(M).
§16. The cube of arbitrary dimension with side equal to 1 and with Lebesgue
measure on its subsets is metrically equivalent to an interval of length 1, also
with Lebesgue measure. The proof of this is elementary.
§17. Two sets E₁ and E₂ in a metric space have the same metric type if

M(E₁ − E₂) = M(E₂ − E₁) = 0.
In many cases, when we can neglect sets of measure zero, only the metric
types, and not the sets themselves, need be involved in the argument. 3
If a measure is finitely closed, then the type of the sum, product or differ-
ence of two sets with a measure depends only on their types and consequently
we can speak of the sums, products and differences of the types themselves.
If a measure is countably closed and normal, then the same is true of the countable
sums and products.
§18. Two finitely closed spaces are called isometric if a one-to-one correspondence can be established between the metric types of their subsets with
measures in such a way that the sum of two types corresponds to the sum of
the corresponding types, and the measures of corresponding types are equal.
§19. A measure has a countable basis if the system (E) has a countable subsystem (I) (a basis) such that any set of (E) is measurable for the basis (I),
that is, this set and its complement each have a covering by sets of (I) in which
the sum of the measures of the covering sets exceeds their measure by an
arbitrarily small value. Of course, this definition is interesting only when the
space is normal.
The Lebesgue measure on an interval has a countable basis consisting
of segments with rational endpoints. Normal spaces with the cardinality
of the continuum and without a countable basis are not known (except those
determined using the axiom of choice). Such a space of cardinality 2^c can be
easily constructed.
§20. A measure M that is normal, finitely closed and has a countable basis
is isometric to the measure M₁, which is the Lebesgue measure of the interval
(0, 1).
§21. In conclusion: since many properties of purely metric spaces depend only
on relations between the metric types of their subsets, which can be identical
for spaces that are quite different, even for spaces of different cardinalities, it
would be worth trying to set forth the theory of such spaces considered as
systems of metric types that can be added, multiplied, etc., without assuming
the existence of "elements" of the space.
§22. We will consider only functions that take real values at all points of a
purely metric space. Such a function is called measurable if the set of points at
which it takes values belonging to an interval is always measurable.
§23. Clearly, the set on which a function takes a certain value a is also measurable, since

E[b < f < a] + E[f = a] + E[a < f < c] = E[b < f < c].
If the measure in the space considered is countably closed, then the set
on which a measurable function takes a (B)-measurable set of values has a
measure, since it can be obtained by countable additions and multiplications
of sets that obviously have a measure. But one cannot claim that the same
is true for an (L)-measurable set of values of a function, even if the measure
considered is of the type L(M).
lim_{n→∞} Σ_{m=−∞}^{+∞} (m/n) M(E[(m/n) < f ≤ (m+1)/n]),
provided that the series converges absolutely and the limit exists. When con-
sidering problems connected with the notion of integral we assume the space
to be normal.
§26. It can be proved in the usual way that for a bounded measurable function
the integral exists.
§27. For an integral defined in such a way the following relations hold:
§28. The notion of expectation, which is close to that of the Lebesgue integral, has long been used in probability theory. If probability is considered as
a measure on the set of elementary events, then the expectation of a random
variable Z is

D(Z) = ∫_A Z,

and its expectation under a hypothesis E is

D_E(Z) = (1/M(E)) ∫_E Z.
then
m=-QO
VI. INDEPENDENCE
§30. Two partitions of a space A into non-intersecting parts

A = Σ F′,   A = Σ F″

are independent if, for any measurable sets E′ and E″ composed from the elements of certain sets F′ and F″ respectively,

M(E′ · E″) = M(E′) M(E″).
§31. The (finite or infinite) product of partitions [F] is the partition of the
space into the products of elements of the given partitions, with one element
taken out of every partition.
A finite or infinite number of partitions [F] are mutually independent if
any two products of these partitions without common elements are independent. Pairwise independence of partitions does not imply their mutual
independence.
§33. Every function defines a partition of the space into the sets at which
it takes a certain value. Functions are called mutually independent if their
corresponding partitions are independent.
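A minimal finite sketch of the definition in §30 (the two-coin space is my own illustrative choice, not from the text): the partitions induced by the two coordinates of a product space satisfy the product rule M(E′ · E″) = M(E′) M(E″) on their generating elements, and hence, by additivity, on all sets composed from them.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair coin tosses; each point carries measure 1/4.
A = list(product((0, 1), (0, 1)))
M = lambda E: Fraction(len(E), 4)

# Partition F' groups points by the first toss, F'' by the second.
F1 = [[w for w in A if w[0] == v] for v in (0, 1)]
F2 = [[w for w in A if w[1] == v] for v in (0, 1)]

# Independence: M(E' . E'') = M(E') * M(E'') for elements of the partitions.
for Ea in F1:
    for Eb in F2:
        inter = [w for w in Ea if w in Eb]
        assert M(inter) == M(Ea) * M(Eb)
```

Checking the rule on the elements of the partitions suffices, since any set composed from them is a disjoint sum of such elements.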
Let

σ_n = s_n/n = (x₁ + x₂ + ⋯ + x_n)/n.

The sequence (1) obeys the strong law of large numbers (SLLN) if the probability that σ_n → 0 as n → ∞
is equal to 1.
We will now prove that the SLLN holds if the second moments E(x_n²) = b_n
exist and the series

Σ_{n=1}^{∞} b_n/n²   (2)
converges. This condition cannot be replaced by any weaker one: if for some
sequence of constants bn (2) diverges, then we can construct a sequence (1)
of independent random variables satisfying E(x n ) = 0, E(x~) = bn but not
satisfying the SLLN.
To prove this, we can use the lemma expressed by the formula

P{ max_{1≤k≤n} |s_k| ≥ ε } ≤ (b₁ + b₂ + ⋯ + b_n)/ε².   (3)
ON THE STRONG LAW OF LARGE NUMBERS 61
Denote by P_m the probability that |σ_n| > ε for at least one n with 2^m ≤ n < 2^{m+1}. Then by (3)

P = P{lim sup |σ_n| > ε} ≤ Σ_{m=0}^{∞} P_m ≤ (1/ε²) Σ_{m=0}^{∞} (2^m)^{−2} Σ_{n<2^{m+1}} b_n ≤

≤ (1/ε²) Σ_{i=0}^{∞} (2^{i−1})^{−2} Σ_{n=2^i}^{2^{i+1}−1} b_n ≤ (16/ε²) Σ_{n=1}^{∞} b_n/n².   (4)
But P does not change if a finite number of first terms in (1) are replaced
by 0. If (2) converges, then the last term in (4) can be made arbitrarily small.
Hence P = 0. Since this is true for any ε > 0, in the case under consideration,
the SLLN holds.
To prove the second part of our statement, assume that (2) diverges.
For b_n/n² ≤ 1 let z_n = n, z_n = −n, z_n = 0 with probabilities b_n/2n²,
b_n/2n², 1 − b_n/n² respectively, while for b_n/n² ≥ 1 we set z_n = √b_n and
z_n = −√b_n with probabilities ½ and ½. It can easily be seen that E(z_n) = 0,
E(z_n²) = b_n and the SLLN does not hold.
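The convergent case of the criterion is easy to watch numerically. In the sketch below (Gaussian summands, the sample size and seed are illustrative choices of mine) b_n = 1 for all n, so Σ b_n/n² converges and σ_n = s_n/n should be close to 0 for large n.

```python
import math
import random

def running_means(variances, seed=0):
    """Sample independent x_n with E x_n = 0, E x_n^2 = b_n and
    return the sequence of means sigma_n = s_n / n."""
    rng = random.Random(seed)
    s, means = 0.0, []
    for n, b in enumerate(variances, start=1):
        s += rng.gauss(0.0, math.sqrt(b))
        means.append(s / n)
    return means

N = 50_000
tail = running_means([1.0] * N)[-1]  # b_n = 1: sum b_n / n^2 converges
```

With b_n growing so fast that (2) diverges (for instance b_n = n²), the same experiment shows σ_n fluctuating without settling, in line with the second part of the statement.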
9. ON ANALYTICAL METHODS IN PROBABILITY THEORY*
INTRODUCTION
process. These purely deterministic processes also include processes when the
state y is not completely determined by giving a state z at a single moment of
time t, but also essentially depends on the pattern of variation of this state z
prior to t. However, usually it is preferred to avoid such a dependence on the
preceding behaviour of the system, and to do this the notion of the state of the
system at time t is generalized by introducing new parameters. 1
Outside the realm of classical mechanics, along with the schemes of purely
deterministic processes, one often considers schemes in which the state z of the
system at a certain time t o only determines a certain probability of a possible
event y to occur at a certain subsequent moment t > to. If for any given
to, t> to, and z there exists a certain probability distribution for the states y,
we say that our scheme is a scheme of a stochastically determined process. In
the general case this distribution function can be represented in the form

P(t₀, z, t, 𝔈),

where 𝔈 denotes a certain set of states y, and P is the probability that
at time t one of the states y belonging to this set will be realized. Here
we face a complication: in general, this probability cannot be determined for
all sets 𝔈. A rigorous definition of a stochastically determined process which
enables one to avoid this complication is given in §1.
As in the case of a purely deterministic process, we could also have consid-
ered here schemes in which the probability P essentially depends not only on
the state z but also on the past behaviour of the system. Still, this influence
of the past behaviour of the process can be bypassed using the same method
as in the scheme of a purely deterministic process.
Note also that the possibility of applying a scheme of either a purely de-
terministic or a stochastically determined process to the study of some real
processes is in no way linked with the question whether this process is deter-
ministic or random.
t 1 ,t 2 , •.. ,tn , ... which form a discrete series. As far as I know, Bachelier 2
was the first to make a systematic study of schemes in which the probability
P(t₀, x, t, 𝔈) varies continuously with time t. We will return to the cases studied
by Bachelier in §16 and in the Conclusion. Here we note only that Bachelier's
constructions are by no means mathematically rigorous.
Starting from Chapter II of this paper we mainly consider the above-mentioned
schemes that are continuous with respect to time. From the mathematical
point of view these schemes have an important advantage: they allow one to
introduce differential equations for P with respect to time and lead to simple
analytic expressions which in the usual theory can be derived only as asymptotic
formulas. As for the applications, first the new schemes can be directly applied
to real processes, and secondly, from the solutions of differential equations
for processes continuous with respect to time, new asymptotic formulas for
discontinuous schemes can be derived, as will be shown later in §12.
CONTENTS
Chapter I. Generalities:
§1. General scheme of a stochastically determined process.
§2. The operator F₁(x, 𝔈) * F₂(x, 𝔈).
§3. Classification of particular cases.
§4. The ergodic principle.
2 I. 'Théorie de la spéculation', Ann. École Norm. Supér. 17 (1900), 21; II. 'Les
probabilités à plusieurs variables', Ann. École Norm. Supér. 27 (1910), 339; III.
Calcul des probabilités, Paris, 1912.
§7. Examples.
Conclusion.
CHAPTER I. GENERALITIES
uncountable, the assumption that 𝔉 contains all the subsets of 𝔄 does not hold
for any of the schemes known at present.
Of course we assume that
(1)
We further assume that P(t₁, x, t₂, 𝔈) is additive as a function of 𝔈, that is,
for any decomposition of 𝔈 into a finite or countable number of non-intersecting
summands 𝔈_n the following identity holds:

P(t₁, x, t₂, 𝔈) = Σ_n P(t₁, x, t₂, 𝔈_n).   (2)
Now let f(x) be measurable with respect to 𝔉 and bounded, and let φ(𝔈)
denote a non-negative additive function defined on 𝔉; then, as is known, the
sum

Σ_m (m/n) φ(𝔈[(m/n) ≤ f(x) < (m+1)/n])

tends to a well-defined limit as n → ∞. This limit will be called the integral

∫_𝔈 f(x) φ(d𝔈).
3 Concerning these notions, as well as additive systems of sets, etc., see, for example, M. Fréchet, 'Sur l'intégrale d'une fonctionnelle étendue à un ensemble
abstrait', Bull. Soc. Math. France 43 (1915), 248.
This notation differs from the usual one only in the specification of the
variable of integration and the place of the differential inside the parentheses.
In what follows we assume that P(t₁, x, t₂, 𝔈), as a function of the state x,
is measurable with respect to the system 𝔉. Finally, P(t₁, x, t₂, 𝔈) must satisfy
the fundamental equation

P(t₁, x, t₃, 𝔈) = ∫_𝔄 P(t₂, y, t₃, 𝔈) P(t₁, x, t₂, d𝔄)   (t₁ < t₂ < t₃),   (3)

and on the right-hand side we have the expression for the total probability
P(t₁, x, t₃, 𝔈); therefore (3) is satisfied in this case. In case 𝔄 is uncountable
we take (3) as a new axiom.
The above requirements completely define a stochastically determined process: the elements x, y, z, ... of an arbitrary set 𝔄 can be considered as characteristics of a state of a certain system, and an arbitrary function P(t₁, x, t₂, 𝔈)

F(𝔄) = 1.   (4)

We clearly have

Formula (5) is considered as the definition of Q(t, 𝔈), not as a new requirement imposed on 𝔖. Note, however, that (5) implies (3) as a particular
case.
It is easy to see that F(x, 𝔈) satisfies the same conditions of measurability and
additivity as F₁(x, 𝔈) and F₂(x, 𝔈), and (4) also holds:
(9)
Now we define the unit function ε(x, 𝔈), which equals 1 when x belongs to 𝔈 and 0 otherwise; for any normal distribution function F(x, 𝔈) it satisfies

ε * F(x, 𝔈) = ∫_𝔄 F(y, 𝔈) ε(x, d𝔄) = F(x, 𝔈),

F * ε(x, 𝔈) = ∫_𝔄 ε(y, 𝔈) F(x, d𝔄) = ∫_𝔈 F(x, d𝔄) = F(x, 𝔈).   (10)
The probability P(t₁, x, t₂, 𝔈) has been defined so far only for t₂ > t₁; now
set for any t

P(t, x, t, 𝔈) = ε(x, 𝔈).   (11)
In view of (10), this new definition does not contradict the fundamental equa-
tion (3), since (3) can be written as
(12)
If the changes in the state of the system 𝔖 take place only at certain moments

t₁ < t₂ < ⋯ < t_n < ⋯,

which form a discrete series, then obviously
(13)
we have
(16)
Hence in this case the process of change of 𝔖 is totally determined by the
elementary distribution functions P_n(x, 𝔈).
Now let P₁(x, 𝔈), P₂(x, 𝔈), ..., P_n(x, 𝔈), ... be arbitrary normal distribution
functions which are assumed to be measurable as functions of x; further, let
t₀ < t₁ < ... < t_n < ... be a certain sequence of moments of time. Defining
P_mn(x, 𝔈) and P(t′, x, t″, 𝔈) by (16), (14) and (13), we also obtain normal
distribution functions which satisfy the equations

P_mn(x, 𝔈) * P_np(x, 𝔈) = P_mp(x, 𝔈)   (m < n < p),   (17)

P(t′, x, t″, 𝔈) * P(t″, x, t‴, 𝔈) = P(t′, x, t‴, 𝔈)   (t′ < t″ < t‴).

But this latter equation is none other than the fundamental equation (12) or
(13). Thus we see that every sequence of arbitrary normal distribution functions P_n(x, 𝔈), measurable as functions of x, characterizes a certain stochastically determined process.
The schemes with discrete time defined above are those usually considered
in probability theory. If all the distribution functions P_n(x, 𝔈) coincide,
we have a homogeneous scheme with discrete time; in this case (16) yields

P_{n,n+p}(x, 𝔈) = P(x, 𝔈) * P(x, 𝔈) * ⋯ * P(x, 𝔈)   (p times)   = Pᵖ(x, 𝔈).   (19)
As far back as 1900 Bachelier considered stochastic processes continuous
in time. 4 There are good grounds for giving schemes with continuous time
a central place in probability theory. It seems that most important here are
schemes homogeneous in time, in which P(t, x, t + τ, 𝔈) depends only on the
difference τ:

P(t, x, t + τ, 𝔈) = P(τ, x, 𝔈).   (21)
Without special assumptions on the set 𝔄 of all possible states z, we can only
prove several general theorems, namely those dealing with the ergodic principle.
We say that a stochastic process obeys the ergodic principle if for any t⁽⁰⁾, z, y
and 𝔈

lim_{t→∞} [P(t⁽⁰⁾, z, t, 𝔈) − P(t⁽⁰⁾, y, t, 𝔈)] = 0.   (22a)
For a scheme with discrete time (22a) is clearly equivalent to the following:

lim_{n→∞} [P_mn(z, 𝔈) − P_mn(y, 𝔈)] = 0.   (22b)

(23)

(24)

If the series

Σ_{n=1}^{∞} λ_n

diverges, then the ergodic principle (22b) holds and the limit in (22b) is uniform
with respect to z, y and 𝔈.
Proof. Let

sup_z P_kn(z, 𝔈) = M_kn(𝔈),

and, similarly,

inf_z P_kn(z, 𝔈) = m_kn(𝔈);   (26)

hence by (25),
The right-hand side of (29) tends to zero as n ~ 00; this proves the
theorem.
For a homogeneous scheme with discontinuous time the following holds:
Proof. We have

M_{n,n+p}(𝔈) = sup_z Pᵖ(z, 𝔈) = M_p(𝔈),

m_{n,n+p}(𝔈) = inf_z Pᵖ(z, 𝔈) = m_p(𝔈),

λ_n = λ,

and, by (29),

M_p(𝔈) − m_p(𝔈) ≤ (1 − λ)ᵖ.   (31)
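For a finite homogeneous scheme the bound (31) can be checked directly. In this sketch (the 3×3 transition matrix is an illustrative choice of mine, and I take λ = Σ_j min_i p_ij, which is positive whenever all p_ij are positive) the spread M_p − m_p over singleton sets 𝔈 decays at least as fast as (1 − λ)ᵖ:

```python
def mat_mul(P, Q):
    """Multiply two square matrices given as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
# lambda: total mass guaranteed regardless of the starting state.
lam = sum(min(P[i][j] for i in range(3)) for j in range(3))

Pp = P  # Pp holds P^p
for p in range(1, 11):
    # Spread M_p - m_p for each singleton target set, maximized over targets.
    spread = max(max(Pp[i][j] for i in range(3)) -
                 min(Pp[i][j] for i in range(3)) for j in range(3))
    assert spread <= (1 - lam) ** p + 1e-12
    Pp = mat_mul(Pp, P)
```

The loop also illustrates (19): the p-step probabilities of the homogeneous scheme are exactly the entries of the matrix power Pᵖ.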
But (25) and (26) imply that for q > p,
therefore,
(34)
Q(𝔈) = ∫_𝔄 P(z, 𝔈) Q(d𝔄).   (35)
and if the series Σ_{n=1}^{∞} λ_n diverges, then the ergodic principle (22a) holds and
the convergence in (22a) is uniform with respect to z, y, 𝔈.
If
then, in the same way as in the proof of Theorem 1, we obtain the formula
analogous to (29):

M(t, 𝔈) − m(t, 𝔈) ≤ ∏_{k=m+1}^{n} (1 − λ_k).
we can confine ourselves to the probabilities p_ij(t₁, t₂). The fundamental equation (3) now takes the form

p_ij(t₁, t₃) = Σ_k p_ik(t₁, t₂) p_kj(t₂, t₃),   (40)

and the probabilities are normalized by

Σ_j p_ij(t₁, t₂) = 1.   (41)
Any non-negative functions p_ij(t₁, t₂) satisfying the conditions (40) and (41)
determine some stochastically determined process of variation of the system
𝔖.
In this case the operator is defined as follows:
(43)
(44)
and, conversely, arbitrary non-negative values p_ij^{(p)} satisfying (44) can be considered as the corresponding values of the probabilities of a certain stochastically determined process.
The probabilities p_ij^{(p+q)} can be calculated by the formula
(45)
If all the p_ij are positive, then obviously the conditions of Theorem 2 (§4)
hold, hence the p_ij^{(q)} tend to a certain limit Q_j as q → ∞. The integral equation
(35) transforms in our case into the system of equations
By (11) we have

p_ii(t, t) = 1,   p_ij(t, t) = 0,   i ≠ j.   (47)
If the variations of our system 𝔖 are possible at any time t, then it is natural
to suppose that
that is, for small time intervals the probability of a change in the state of
the system is small. This assumption is contained in the hypothesis of the
continuity of the functions Pij(t 1 , t2) with respect to t 1 and t 2.
Now assume that the functions p_ij(s, t) are continuous and differentiable
with respect to t and s for t ≠ s. We do not require differentiability of these
functions at t = s. It would be imprudent to assume a priori the existence of
a derivative at these special points. 7
For t > s we have

If the determinant

S = |p_ij(s, t)|

is non-zero, then the equations can be solved:

(p_kk(t, t + Δ) − 1)/Δ = A_kk/S,   (49)
6 See Footnote 5.
7 Compare with the functions F(s, x, t, y) considered in Chapter 4, which necessarily have points of discontinuity at t = s.
Since by (48) the α_ik tend to the limit values ∂p_ik(s, t)/∂t as Δ → 0, the values
(49) tend to well-defined limits 8

lim_{Δ→0} (p_kk(t, t + Δ) − 1)/Δ = A_kk(t),   (50a)

lim_{Δ→0} p_jk(t, t + Δ)/Δ = A_jk(t),   j ≠ k.   (50b)

In fact it is evident from the relation

lim_{s→t} S = 1,   (51)

which holds by (47) and the continuity of S, that S may be non-zero under a
proper choice of s < t.
From (48) and (50) we immediately obtain the first system of differential
equations for the functions p_ik(s, t):

∂p_ik(s, t)/∂t = Σ_j A_jk(t) p_ij(s, t) = p_ik(s, t) * A_ik(t).   (52)
In this case, by (47) and (50),

A_jk(t) = [∂p_jk(t, u)/∂u]_{u=t}.   (53)
= lim_{Δ→0} (1/Δ) [p_ik(s + Δ, t) − Σ_j p_ij(s, s + Δ) p_jk(s + Δ, t)] =

= −lim_{Δ→0} [(p_ii(s, s + Δ) − 1)/Δ] p_ik(s + Δ, t) −

− lim_{Δ→0} Σ_{j≠i} [p_ij(s, s + Δ)/Δ] p_jk(s + Δ, t),
8 We could equally well have taken the opposite approach: to assume a priori that
the conditions (47a) and (50) hold and to derive from this the continuity and
differentiability of the function p_ij(s, t) with respect to t.
∂p_ik(s, t)/∂s = −Σ_j A_ij(s) p_jk(s, t) = −A_ik(s) * p_ik(s, t).   (57)
If the functions A_ij(s) are continuous, then clearly the equations (57) are also
true for s = t.
Now assume that at t₀ we know the distribution function
If the functions A_ik(t) are continuous, then the functions p_ik(s, t) form
the unique system of solutions of (52) satisfying the initial conditions (47);
consequently, the stochastic process under consideration is totally determined by all the
A_ik(t). The real meaning of the functions A_ik(t) can be illustrated in the
following way: for i ≠ k, A_ik(t)dt is the probability of passing from the state z_i
to the state z_k during the time from t to t + dt, whereas

A_ii(t) = −Σ_{j≠i} A_ij(t).
It can also be shown that if we have any continuous functions A_ik(t) satisfying
the conditions (54) and (55), then the solutions p_ik(s, t) of the differential
equations (52) under the initial conditions (47) are non-negative and satisfy the
conditions (40) and (41); in other words, they determine a stochastic process.
Indeed, by (52) and (55) we have
and, by (47),
D⁺ψ(t) = ∂p_ik(s, t)/∂t,   p_ik(s, t) = ψ(t),

≥ Σ_{j≠k} A_jk(t) ψ(t) = R(t) ψ(t).
Since ψ(s) = 0, ψ(t) is clearly greater than any negative solution of the
equation

dy/dt = R(t) y,
and therefore it cannot be negative itself.
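The claim of §6 can be exercised numerically: integrating the first system (52) from the initial conditions (47), with coefficients A_ik satisfying (54)-(55) (here constant and illustratively chosen, not from the text), keeps the solutions non-negative with unit row sums. A minimal Euler sketch:

```python
# Generator: off-diagonal A_jk >= 0 and A_ii = -sum_{j != i} A_ij,
# so every row sums to zero (the numerical values are made up).
A = [[-1.0, 0.7, 0.3],
     [0.4, -0.9, 0.5],
     [0.2, 0.8, -1.0]]

def step(P, h):
    """One Euler step of the first system (52): dP/dt = P A."""
    n = len(P)
    return [[P[i][k] + h * sum(P[i][j] * A[j][k] for j in range(n))
             for k in range(n)] for i in range(n)]

# Initial condition (47): P(s, s) is the identity matrix.
P = [[1.0 if i == k else 0.0 for k in range(3)] for i in range(3)]
for _ in range(10_000):
    P = step(P, 1e-3)

# The solution stays a stochastic matrix: non-negative rows summing to 1.
for row in P:
    assert all(x >= -1e-9 for x in row)
    assert abs(sum(row) - 1.0) < 1e-6
```

The row sums are preserved exactly by each Euler step because the rows of A sum to zero, mirroring how (41) follows from (52) and (55) in the text.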
§7. Examples
In schemes homogeneous in time the coefficients A_ik(t) are independent
of the time t; in this case the process is completely determined by the n²
constants A_ik. Equations (52) now take the form

dp_ik(t)/dt = Σ_j A_jk p_ij(t),   (62)
and solving these equations is not difficult. If all the A_ik are non-zero, then
the conditions of Theorem 4 (§4) hold and consequently p_ik(t) tends to a limit
Q_k as t → ∞. The quantities Q_k satisfy the equations
n = 2,   A₁₂ = A₂₁ = A,   A₁₁ = A₂₂ = −A,

that is, the probabilities of transition from the state x₁ to the state x₂ and of the
reverse transition from x₂ to x₁ are the same. The differential equations (62)
in our case give

p₁₂(t) = p₂₁(t) = ½(1 − e^{−2At}),

p₁₁(t) = p₂₂(t) = ½(1 + e^{−2At}).

We see that p_ik(t) tends to the limit Q_k = ½ as t → ∞.
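The closed form above is easy to confirm against the matrix exponential of the generator (the Taylor-series exponential below is a minimal sketch; the values of A and t are arbitrary illustrative choices):

```python
import math

def transition_matrix(A, t, terms=60):
    """exp(tQ) for the generator Q = [[-A, A], [A, -A]], via its Taylor series."""
    Q = [[-A, A], [A, -A]]
    P = [[1.0, 0.0], [0.0, 1.0]]      # accumulates I + tQ + (tQ)^2/2! + ...
    term = [[1.0, 0.0], [0.0, 1.0]]   # current series term (tQ)^n / n!
    for n in range(1, terms):
        term = [[sum(term[i][k] * Q[k][j] for k in range(2)) * t / n
                 for j in range(2)] for i in range(2)]
        P = [[P[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return P

A, t = 0.8, 1.5
P = transition_matrix(A, t)
closed = 0.5 * (1.0 - math.exp(-2.0 * A * t))  # p12(t) from the text
assert abs(P[0][1] - closed) < 1e-9
assert abs(P[0][0] - (1.0 - closed)) < 1e-9
```

Both entries approach ½ as t grows, matching the limit Q_k = ½.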
The following example shows that the approach to the limit can be accompanied by oscillations damping out with time:

n = 3,   A₁₂ = A₂₃ = A₃₁ = A,

p₁₂(t) = p₂₃(t) = p₃₁(t) = e^{−3At/2} ((1/√3) sin at − (1/3) cos at) + 1/3,

p₂₁(t) = p₃₂(t) = p₁₃(t) = −e^{−3At/2} ((1/√3) sin at + (1/3) cos at) + 1/3,

a = (√3/2) A.
Similar damped oscillations for schemes with discontinuous time were
found by Romanovskii.
all the notation and results of §5 of Chapter II remain valid. The convergence
of the series

Σ_k p_ik(t₁, t₂) = 1

is assumed, and from this we derive convergence of the series (40), (42), (46);
by contrast, we do not require that the series
should converge.
We now make a few remarks on schemes with discontinuous time, in par-
ticular homogeneous ones. The conditions of our theorems concerning ergodic
principles for schemes with a countable set of states fail in most cases, but
nevertheless the principle itself often appears to be satisfied.
Consider, for example, a game studied recently by S.N. Bernshtein: in any
separate trial a gambler wins one rouble with probability A and loses one
with probability B (B > A, A + B ≤ 1); the latter, however, only provided
that his cash is non-zero; otherwise he does not lose anything.
If we denote by x_n the state in which the cash of our gambler is n − 1
roubles, then the conditions of the game can be written as follows:
lim_{p→∞} p_ij^{(p)} = A_j;
It can be shown that always A ≤ 1 and that for A < 1 the ergodic principle
fails.
If all the A_j exist and are zero, then there arises the question of the asymptotic expression for p_ij^{(p)} as p → ∞. If such an expression exists independently
of i:

p_ij^{(p)} = A_j^{(p)} + o(A_j^{(p)}),

then we say that the local ergodic principle holds. This principle seems to be
of great significance in the case of a countable set of possible states.
Now let all possible states x be enumerated by the integers (-∞ < n < +∞). All the notation and formulas of §5 are then true, but now the sums run over all the integers. We consider the case
Pij = p_{j-i}
in more detail. Clearly in this case we have
If the series
a = Σk k pk,  b² = Σk k² pk
are absolutely convergent, then there arises the question of the conditions of applicability of the generalized Laplace formula
Pk^p = (1/(b√(2πp))) exp[-(k - pa)²/(2pb²)] + o(1/√p).   (63)
Po = 1 - A, P1 = A, (64)
and the other Pk vanish. Lyapunov's theorem is of no help for our problem, as
is clear from the following example:
p₊₁ = p₋₁ = ½,  pk = 0, k ≠ ±1,
where (63) is inapplicable. In order for (63) to hold, it is necessary⁹ that for any integer m there exists k such that
Note also that only for a = 0 does formula (63) actually give an asymptotic expression for Pk^p for a given k. In this case it follows from (63) that for a given
(65)
As in §6, we assume that the functions Pij(s, t) are continuous and have derivatives with respect to t and s for t ≠ s. In the case of a countable set of possible states, formulas (48) and (56) still remain valid; but to prove the possibility of changing the order of the sum and the limit in these formulas, and thus arrive at the differential equations (52) and (57), we have to introduce new restrictions, namely:
A) the existence of limit values in (50);
B) uniform convergence in (50b) with respect to j for a given k;
C) uniform convergence of the series
with respect to j (the fact that this series converges follows immediately from (41)).
In §6, for a finite number of states we deduced condition A) from the differentiability of Pij(s, t) for t ≠ s; by contrast, in the case of a countable set
84 ON ANALYTICAL METHODS IN PROBABILITY THEORY
of states this condition does not seem to follow from this property of Pij . With
regard to condition B), note that uniform convergence in (50b) with respect to
k for a given j follows from the obvious inequality
Note, further, that we do not require uniform convergence in (50b) for any j and k, nor do we require uniform convergence in (50a) with respect to k; these requirements would have been inconvenient for applications.
Since the factors Pij(s, t) in (48) form an absolutely convergent series, we can, in view of conditions A) and B), change the order of the signs lim and Σ in this formula and obtain (52). Then the variables Ajk(t) clearly satisfy the formulas of the last condition; moreover, since the factors Pjk(s + Δ, t) are uniformly bounded, we can change the order of the sum and limit signs in (56), which suffices for deducing (57).
(69)

Σj |Ajk| = Bk^{(1)},   Σj Bj^{(1)} |Ajk| = Bk^{(2)},   …,   Σj Bj^{(n)} |Ajk| = Bk^{(n+1)},   (70)

Σn (Bk^{(n)}/n!) x^n,   k = 1, 2, …,   |x| ≤ θ (θ > 0),   (71)

(72)
hold, then the equations (69) have a unique system of solutions Pik(t) satisfying the conditions of our problem.
Indeed, since always Pij(t) ≤ 1, (69) and (70) imply
(73)
(74)
From (73) and the assumption of convergence of the series (71) it follows that the functions Pik are analytic. Further, by (69) and (74) we find that
(75)
(76)
which implies that the analytic functions Pik(t) are uniquely determined by the constants Aik. Formulas (76) and (75) serve also for calculating the solutions of the system (69) using Taylor series.
For example, if
Ai,i+1 = A,  Aii = -A,
Aij = 0 otherwise,
then we easily obtain
Pmn(t) = ((At)^{n-m}/(n-m)!) e^{-At},  n ≥ m,
Pmn(t) = 0,  m > n,
that is, the formula of the Poisson distribution: for k = n - m, p = t the resulting formula coincides with (67).
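These Poisson transition probabilities are easy to verify numerically (our check, not in the original; the rate A = 2 and the times are arbitrary): each row of Pmn(t) sums to 1, and the fundamental composition Pmn(t + s) = Σj Pmj(t) Pjn(s) holds exactly.

```python
import math

A = 2.0   # illustrative jump rate

def P(m, n, t):
    """Transition probability of the pure-birth scheme above."""
    if n < m:
        return 0.0
    k = n - m
    return (A * t) ** k / math.factorial(k) * math.exp(-A * t)

t, s = 0.9, 0.4
row_sum = sum(P(0, n, t) for n in range(80))          # should be 1
composed = sum(P(0, j, t) * P(j, 5, s) for j in range(6))
direct = P(0, 5, t + s)                               # Chapman-Kolmogorov check
print(row_sum, composed, direct)
```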
Qn = (1 - A/B)(A/B)^{n-1}.
Q_{n+1} = (1/n!)(A/B)^n e^{-A/B},
Suppose now that the state of the system considered is determined by the values of a certain real parameter x; in this case we denote by x both the state of the system and the value of the parameter corresponding to this state. If Ey is the set of all states x for which x ≤ y, then we set
(78)
For the function F(t1, x, t2, y) the fundamental equation (3) transforms into:
(79)
F(-∞) = 0,  F(+∞) = 1
is called a normal distribution function. If F 1 (x, y) and F2 (x, y), as functions
of x, are Borel measurable and, as functions of y, are normal distribution
functions, then the same is true for the function
This operator ⊕, like *, obeys the associative law; using this law, the fundamental equation (79) can be expressed as
(81)
where
(83)
The associative law also holds for the operator ∘, while for normal distribution functions the commutative law holds as well; if V1(x) and V2(x) are considered as distribution functions of two independent random variables X1 and X2, then V1(x) ∘ V2(x), as is known, is the distribution function of the sum¹¹ X = X1 + X2.
If F(t1, x, t2, y) is absolutely continuous as a function of y, then we have
(84)
with respect to x and y and satisfies
∫_{-∞}^{∞} f(t1, x, t2, y) dy = 1,   (85)
Conversely, if (85), (86) hold for f(t1, x, t2, y), then the function F(t1, x, t2, y) defined by (84) satisfies (78) and (79); hence, such a function determines the scheme of a stochastic process. This function f(t1, x, t2, y) will be called the differential distribution function for the random variable y.
Note also that the following mixed formulas hold:
F(t1, x, t3, z) = ∫_{-∞}^{∞} F(t2, y, t3, z) f(t1, x, t2, y) dy,   (87)
If
then, in addition, we have
fm,n+1(x, z) = ∫_{-∞}^{∞} fmn(x, y) fn+1(y, z) dy,   (91)
As we noted in §3, probability theory usually deals only with schemes that are discontinuous in time. For these schemes, the main problem is to find approximate expressions for the distributions Fmn(x, y) for large n - m or, what is essentially the same, to construct asymptotic formulas for Fmn(x, y) as n → ∞.
Fn(x, y) = Vn(y - x),
Fmn(x, y) = Φ((y - x)/Bmn) + o(1)
12 Math. Z. 15 (1922), 211.
Φ(z) = (1/√(2π)) ∫_{-∞}^{z} e^{-z²/2} dz.
t0 = 0,  tn = B0n²,
F̄mn(x, y) = F(tm, x, tn, y),  F̄n(x, y) = F̄n-1,n(x, y).
Clearly we have
F̄n(x, y) = Φ((y - x)/bn),
ān(x) = ∫_{-∞}^{∞} (y - x) dF̄n(x, y) = 0,
b̄n²(x) = ∫_{-∞}^{∞} (y - x)² dF̄n(x, y) = bn².
The first and second moments ān(x) and b̄n²(x) of the distribution F̄n(x, y) coincide with the corresponding moments an(x) and bn²(x) of the distribution Fn(x, y). From this Lindeberg deduced that
after which the Laplace–Lyapunov theorem follows directly from the obvious identity
F̄mn(x, y) = Φ((y - x)/Bmn).
In the general case of arbitrary functions Fn(x, y) we can only apply Lindeberg's method if we know a function F(t′, x, t″, y) characterizing a continuous stochastic process, and which, for a certain sequence of instants of time
gives the moments ān(x), b̄n²(x) which coincide with an(x), bn²(x), or are close to them. A general method for constructing such functions F is obtained by using differential equations of continuous processes, considered in the sections below. To pass from F to F̄ we can use the following:
stochastic processes with discontinuous time. If
∫_{-∞}^{∞} (y - x) dFn(x, y) and ∫_{-∞}^{∞} (y - x) dF̄n(x, y)
(94)
R(x) = 0 for x ≤ 0,
R(x) = 1 for 1 ≤ x,
and for
(98)
the inequalities
|∂³Ukn(x, z)/∂x³| ≤ Kn^{(3)}  (k = 0, 1, …, n),   (99)
hold, where
In applying this theorem to the case when the moments a(x), b(x), c(x) are
unbounded as x increases, it is often possible to eliminate this unboundedness
by introducing a new properly chosen variable x′ = φ(x).
= ∫_{-∞}^{∞} [Ukn(x, y) + (∂Ukn(x, y)/∂x)(z - x) + (∂²Ukn(x, y)/∂x²)(z - x)²/2 + (∂³Ukn(ξ, y)/∂x³)(z - x)³/6] dF̄k(x, z) =
= Ukn(x, y) + (∂Ukn(x, y)/∂x) āk(x) + (∂²Ukn(x, y)/∂x²) b̄k²(x)/2 + θKn^{(3)} c̄k(x)/6.   (102)
Setting
(103)
Vk-1,n(x, y) = Ukn(x, y) + (∂Ukn(x, y)/∂x) ak(x) +
From (102) and (104) and using (96) and (99) it follows that
Now let
≤ ∫_{-∞}^{∞} |Vk-1,n(z, y) - Uk-1,n(z, y)| dF0,k-1(x, z) ≤
But
and
W0n(x, y) = F0n(x, y) ⊕ R(y - x) = ∫_{-∞}^{∞} R(y - z) dF0n(x, z).
W0n(x, y + 1) ≤ ∫_{-∞}^{y+1} dF0n(x, z) = F0n(x, y + 1),
Formula (100) now follows immediately from (107) and (108). The details of
the proof can be found in Lindeberg's paper referred to above.
holds, at least for the first three moments m⁽¹⁾, m⁽²⁾ and m⁽³⁾. A general study
of the possibilities that arise under these assumptions is of great interest; some
remarks to this end will be given below in §19.
In the following sections we also assume that the following important con-
dition holds:
(111)
This condition will certainly hold if, in the definition of m⁽³⁾(t, x, Δ) via (110), only infinitesimally small differences y - x play a significant role for infinitesimally small Δ or, more precisely, if
(112)
Strictly speaking, only in this case is our process continuous in time. Formula (111) also implies that
m⁽²⁾(t, x, Δ)/m⁽¹⁾(t, x, Δ) → 0 as Δ → 0.
Finally, we will also assume that for s ≠ t all the partial derivatives of the function F(s, x, t, y) up to the fourth order exist, and that these derivatives for constant t, y are uniformly bounded with respect to s and x for t - s > k > 0. From (78) and (110) we conclude that for s = t the function F(s, x, t, y) is, by contrast, discontinuous. The function
f(s, x, t, y) = ∂F(s, x, t, y)/∂y,   (113)
clearly satisfies (84)–(86) and, at given t, y, has for t - s > k > 0 derivatives up to the third order that are uniformly bounded with respect to s and x. All further calculations are made for this differential distribution function f(s, x, t, y).
Set
a(t, x, Δ) = ∫_{-∞}^{∞} (y - x) f(t, x, t + Δ, y) dy,   (114)
b²(t, x, Δ) = ∫_{-∞}^{∞} (y - x)² f(t, x, t + Δ, y) dy = m⁽²⁾(t, x, Δ),   (115)
By (85) and (86) we have
= ∫_{-∞}^{∞} f(s, x, s + Δ, z) [f(s + Δ, x, t, y) + (∂f(s + Δ, x, t, y)/∂x)(z - x) + (∂²f(s + Δ, x, t, y)/∂x²)(z - x)²/2 + (∂³f(s + Δ, ξ, t, y)/∂ξ³)(z - x)³/6] dz =
= f(s + Δ, x, t, y) + (∂f(s + Δ, x, t, y)/∂x) a(s, x, Δ) + (∂²f(s + Δ, x, t, y)/∂x²) b²(s, x, Δ)/2 + θ c(s, x, Δ)/6,  |θ| < C,   (117)
[f(s + Δ, x, t, y) - f(s, x, t, y)]/Δ =
= -(∂f(s + Δ, x, t, y)/∂x) a(s, x, Δ)/Δ - (∂²f(s + Δ, x, t, y)/∂x²) b²(s, x, Δ)/(2Δ) - θ c(s, x, Δ)/(6Δ).   (118)
does not vanish identically for any t′, y′, t″, y″, then the equations
λ(Δ) ∂f(s + Δ, x, t′, y′)/∂x + μ(Δ) ∂f(s + Δ, x, t″, y″)/∂x = 0,   (120)
λ(Δ) ∂²f(s + Δ, x, t′, y′)/∂x² + μ(Δ) ∂²f(s + Δ, x, t″, y″)/∂x² = 1
have a unique solution. In this case λ(Δ) and μ(Δ) tend to λ(0) and μ(0) as Δ → 0. Further, by (118) we obtain
lim_{Δ→0} [(∂f(s, x, t, y)/∂x) a(s, x, Δ)/Δ] = -∂f(s, x, t, y)/∂s - (∂²f(s, x, t, y)/∂x²) B²(s, x).
Since ∂f(s, x, t, y)/∂x does not vanish identically for any t and y, the following limit also exists:
∂f(s, x, t, y)/∂s = -A(s, x) ∂f(s, x, t, y)/∂x - B²(s, x) ∂²f(s, x, t, y)/∂x².   (125)
When the determinant D(s, x, t′, y′, t″, y″) vanishes for all t′, y′, t″, y″, then the limits A(s, x) and B²(s, x) do not in general exist, as is clear from the following example:
It can be shown, however, that these singular points (s, x) form a nowhere dense set in the (s, x)-plane.
The practical significance of these very important quantities A(s, x) and B(s, x) is as follows: A(s, x) is the mean rate of variation of the parameter x over an infinitesimally small time interval and B(s, x) is the differential variance of the process. The variance of the difference y - x for the time interval Δ is
∫_a^b (∂f(s, x, t, y)/∂t) R(y) dy = (∂/∂t) ∫_a^b f(s, x, t, y) R(y) dy =
= lim_{Δ→0} (1/Δ) { ∫_{-∞}^{∞} R(y) ∫_{-∞}^{∞} f(s, x, t, z) f(t, z, t + Δ, y) dz dy - ∫_{-∞}^{∞} f(s, x, t, y) R(y) dy } =
= lim_{Δ→0} (1/Δ) ∫_{-∞}^{∞} f(s, x, t, z) [R′(z) a(t, z, Δ) + R″(z) b²(t, z, Δ)/2 + θ c(t, z, Δ)/6] dz =
= ∫_a^b f(s, x, t, y) [R′(y) A(t, y) + R″(y) B²(t, y)] dy,   (129)
∫_a^b f(s, x, t, y) R′(y) A(t, y) dy = -∫_a^b (∂[f(s, x, t, y) A(t, y)]/∂y) R(y) dy.   (130)
∫_a^b f(s, x, t, y) R″(y) B²(t, y) dy = ∫_a^b (∂²[f(s, x, t, y) B²(t, y)]/∂y²) R(y) dy,   (131)
∫_a^b (∂f(s, x, t, y)/∂t) R(y) dy = ∫_a^b { -∂[A(t, y) f(s, x, t, y)]/∂y + ∂²[B²(t, y) f(s, x, t, y)]/∂y² } R(y) dy.   (132)
However, since the function R(y) can be chosen arbitrarily only if the above conditions are fulfilled, we easily see that for points (t, y) at which the determinant D(t, y, u′, z′, u″, z″) does not vanish identically the second fundamental differential equation also holds:
∂f(s, x, t, y)/∂t = -∂[A(t, y) f(s, x, t, y)]/∂y + ∂²[B²(t, y) f(s, x, t, y)]/∂y².   (133)
This second equation could also have been obtained without using the
first one, using the methods described in §13 directly; then, however, new and
= f(s, x, t - Δ, y) (1/Δ) [∫_{-∞}^{∞} f(t - Δ, z, t, y) dz - 1] +
+ (∂f(s, x, t - Δ, y)/∂y) (1/Δ) ∫_{-∞}^{∞} f(t - Δ, z, t, y)(z - y) dz +
+ (∂²f(s, x, t - Δ, y)/∂y²) (1/2Δ) ∫_{-∞}^{∞} f(t - Δ, z, t, y)(z - y)² dz +
+ (θ/6Δ) ∫_{-∞}^{∞} f(t - Δ, z, t, y) |z - y|³ dz.   (134)
lim_{Δ→0} (1/Δ) ∫_{-∞}^{∞} f(t - Δ, z, t, y) |z - y|³ dz = 0
and that the limits
lim_{Δ→0} (1/2Δ) ∫_{-∞}^{∞} f(t - Δ, z, t, y)(z - y)² dz = B̄²(t, y),   (135)
lim_{Δ→0} (1/Δ) ∫_{-∞}^{∞} f(t - Δ, z, t, y)(z - y) dz = Ā(t, y),   (136)
lim_{Δ→0} (1/Δ) ( ∫_{-∞}^{∞} f(t - Δ, z, t, y) dz - 1 ) = N̄(t, y)   (137)
exist. Thus we would have obtained our second equation in the following form
∂f(s, x, t, y)/∂t = N̄(t, y) f(s, x, t, y) + Ā(t, y) ∂f(s, x, t, y)/∂y + B̄²(t, y) ∂²f(s, x, t, y)/∂y².   (138)
To show the identity of this equation with the one derived before, we would have to prove that
N̄(t, y) = ∂Ā(t, y)/∂y + ∂²B̄²(t, y)/∂y².   (141)
∫_{-∞}^{∞} f(s, x, t, y) dy = 1,   (142)
∫_{-∞}^{∞} (y - x)² f(s, x, t, y) dy → 0 as t → s.   (143)
In this case, clearly, A(s, x) and B(s, x) depend only on s, so that the differential equations (125) and (133) are now expressed as
∂f/∂s = -A(s) ∂f/∂x - B²(s) ∂²f/∂x²,   (145)
For the function v(s, t, z), we obtain from (145) and (146):
Equation (148) was found by Bachelier, 13 but strictly speaking, was not
proved.
If we have A(t) = 0 and B(t) = 1 identically, then (133) (respectively (146)) turns into the heat equation
(149)
for which the only non-negative solution satisfying (142), (143) is given, as is well known, by Laplace's formula
(150)
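In the form used here, the kernel given by Laplace's formula is f(s, x, t, y) = (4π(t - s))^{-1/2} exp(-(y - x)²/(4(t - s))) (our reconstruction of (150); the numerical values below are arbitrary). The Python sketch below checks that this kernel reproduces itself under composition over an intermediate instant, as the fundamental equation (79) requires:

```python
import math

def kernel(x, y, t):
    """Fundamental solution of f_t = f_yy after an elapsed time t."""
    return math.exp(-(y - x) ** 2 / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

t1, t2, x, y = 0.3, 0.5, 0.2, 1.1   # illustrative elapsed times and endpoints

# Compose over the intermediate state z by the trapezoidal rule on a wide grid.
h, L = 0.01, 12.0
zs = [-L + h * i for i in range(int(2 * L / h) + 1)]
vals = [kernel(x, z, t1) * kernel(z, y, t2) for z in zs]
composed = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
direct = kernel(x, y, t1 + t2)
print(composed, direct)
```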
x′ = x - ∫_0^s A(u) du,  y′ = y - ∫_0^t A(u) du,
s′ = ∫_0^s B²(u) du,  t′ = ∫_0^t B²(u) du.
and the conditions (142), (143) retain the same form in the new variables s′, x′, t′, y′ as in the variables s, x, t, y. Hence in the general case, the function
(151)
Let
∂f′/∂t′ = -∂[A′f′]/∂y′ + ∂²[B′²f′]/∂y′².   (154)
With the help of the above transformation the solutions of (133) can be
obtained for many new types of coefficients A(t, y) and B 2(t, y). For example,
let
A(t, y) = a(t)y + b(t),  B²(t, y) = c(t);   (156)
we set
φ(t) = ∫ c(t) e^{-2∫a(t)dt} dt,
and obtain in the new variables s′, x′, t′, y′, f′ the simplest heat equation:
(158)
In this case the initial conditions (142) and (143) remain valid for f′(s′, x′, t′, y′) as well; therefore the formula
together with (157) and (152) gives the unique solution f(s, x, t, y) of (133) with coefficients of the form (156) satisfying our conditions. It is easy to see that in this case the function f(s, x, t, y) is of the form
(1/√(4πβ)) e^{-(y-α)²/(4β)},   (160)
we again obtain for f′(s′, x′, t′, y′) equation (158) for which the solution (159) is already known. Note that here it suffices to consider only the values x > c,
y > c, since as x or y varies from c to +00 the variable x' (hence y') runs through
all the values from -00 to +00. Certain complications arising in connection
with this when transferring the conditions (142) and (143) to f' can easily be
eliminated.
In particular, for
determined for any t > t0 by the formula
∂g/∂t = -∂[A(t, y)g]/∂y + ∂²[B²(t, y)g]/∂y².   (167)
We now assume that the coefficients A(t, y) and B²(t, y) depend only on y (the process is homogeneous in time) and study the functions g(t, y) which in this case do not change with time. It is clear that for such functions we have
(168)
If we assume that g and g′ tend to 0 so rapidly as y → ±∞ that the entire left-hand side of (168) tends to 0, then clearly C = 0 and we have
(169)
(170)
where E(z) = 0 for z < 0 and E(z) = 1 for z ≥ 0, and u(z) is a continuous non-negative function for which
∫_{-∞}^{∞} u(z) dz = 1
and the moments
∫_{-∞}^{∞} u(z) |z|^i dz  (i = 1, 2, 3)
are finite. It can easily be shown that the function F(s, x, t, y) satisfies (78) and (79), as well as (110).
This scheme can be interpreted as follows: during an infinitely small time interval (t, t + dt) the parameter y either remains constant with probability 1 - a dt, or takes a value y′, z < y′ < z + dz, with probability a u(z) dt dz. Thus a jump is possible in any time interval, and the distribution function of the values of the parameter after the jump does not depend on the values of this parameter prior to the jump.
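The scheme just described can be simulated directly (our sketch; the jump rate a = 1.5, the horizon t = 0.8 and the uniform post-jump density u on [0, 1] are arbitrary illustrative choices). The fraction of paths with at least one jump before t is compared with its exact value 1 - e^{-at}:

```python
import math
import random

random.seed(1)
a, t, trials = 1.5, 0.8, 200_000

jumped = 0
for _ in range(trials):
    # Waiting time to the first jump is exponential with rate a.
    if random.expovariate(a) < t:
        jumped += 1
        # The value after the jump is drawn from u independently of the
        # value before it; here u is the uniform density on [0, 1].
        new_value = random.random()

estimate = jumped / trials
exact = 1.0 - math.exp(-a * t)
print(estimate, exact)
```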
This scheme could also be generalized in the following way: imagine that,
during an infinitely small time interval (t, t + dt) the parameter y retains its
former value with probability 1 - a(t, y) dt and turns into y′, z < y′ < z + dz,
with probability u(t, y, z) dt dz. Clearly we assume that
∫_{-∞}^{∞} g(t, z) u(t, z, y) dz   (175)
should hold.
If we wish to consider not only jumps but also continuous changes in y,
then it is natural to expect that g(t, y) satisfies
∫_{-∞}^{∞} g(t, z) u(t, z, y) dz -
- ∂[A(t, y) g(t, y)]/∂y + ∂²[B²(t, y) g(t, y)]/∂y²,   (176)
provided (174) holds and the coefficients A(t, y) and B 2 (t, y) are as indicated
in §13.
CONCLUSION
Σ_{i=1}^{n} Ai(s, z1, …, zn) ∂/∂zi - Σ_{i=1}^{n} Σ_{j=1}^{n} Bij(s, z1, …, zn) ∂²/(∂zi ∂zj),   (177)
For the case when Ai(t, y1, …, yn) and Bij(t, y1, …, yn) depend only on t, these equations were discovered and solved by Bachelier.¹⁴ In this case the solutions satisfying the conditions of our problem have the form
§1. The purpose of this paper is to show several applications of the general equations which I studied in my memoir submitted to Mathematische Annalen¹ (see No. 9 of the present publication). For this I give a new solution to the "waiting problem" dealt with in an extensive memoir by Polyachek.²
The essence of the problem is as follows. (Naturally, the same mathematical problem can arise when studying other real phenomena; therefore such expressions as "telephone lines" or "conversation" are used here only for the sake of illustration.)
Assume that, in a telephone station, there are n lines over which telephone
conversations can take place. At any moment there are m clients that either
have a conversation or await their turn; the latter takes place only for m > n,
and the length of the waiting line is m - n. Theoretically, m can take all
non-negative integer values:
m = 0, 1,2, ....
We denote by
Σ_{m=0}^{∞} Qm(t) = 1.   (1)
(2)
110 THE WAITING PROBLEM
§2. We proceed from the following two assumptions on the character of the
random events studied.
I. At any infinitely small time interval (t, t + dt) a new client arrives at the station with probability α(t)dt. More precisely, at any interval (t, t + Δ) the probability of a new client arriving at the station is
α(t)Δ + o(Δ);
D = ∫_0^∞ βt e^{-βt} dt = 1/β.   (4)
§3. Denote by Qmp(t, t + Δ) the conditional probability that at time t + Δ there are p clients at the station if at time t there were m clients. The probability that within the interval (t, t + Δ) more than one client will arrive, or more than one on-going conversation will be finished, is o(Δ). Therefore
therefore
Σ_{p=0}^{∞} Qmp(t, t + Δ) = 1.
Substituting the value of Qmp(t, t + Δ) from (5)–(10) into (11) we prove that the limit
Qp′(t) = lim_{Δ→0} [Qp(t + Δ) - Qp(t)]/Δ   (12)
always exists and the following equations hold:
Thus we obtain for the functions Qi(t) a countably infinite system of differential equations. It suffices to find a solution Qp^{(m)}(t) with initial conditions of type (2). A general solution can be found with the help of the formula for the total probability,
Qp(t) = Σ_{m=0}^{∞} Qm(t0) Qp^{(m)}(t).
§4. Consider the simplest and most important case when the function α(t) is constant:
α(t) = α.   (17)
In this case the equations (13)–(15) have constant coefficients. In the paper mentioned above I proved the existence and uniqueness of the solution for these systems (under the assumption that the Qm(t) are non-negative and satisfy (1), which holds in our problem). Therefore the functions Qm(t) are uniquely determined by the equations (13)–(15) and their initial values Qm(t0). It is not difficult to find approximate solutions for these equations; one method for obtaining such a solution is given in my paper mentioned above. But from the practical viewpoint it is preferable to go directly to the limit solution.
This solution only exists if
nβ > α.   (20)
Qi = wi Q0,
wi = (1/(n! n^{i-n})) (α/β)^i,  n ≤ i,   (25)
(26)
Ω = Σ_{i=0}^{∞} wi,   (27)
(28)
where
(29)
In this case we see that if on the average two out of three lines are busy, then there is a free line with probability R2 = 5/9, whereas the probability that m > 6, that is, the probability that more than three clients are awaiting their turn, is equal to 1 - R6 = 64/729 ≈ 1/11.
§5. If the limit solution is taken as the exact solution, then it is simple to calculate the expectation E of the waiting time. For this, the average length of the waiting line
u = Σ_{m>n} (m - n) Qm = (1/(Ω n! n)) (α/β)^{n+1} (1 - α/(nβ))^{-2}   (31)
should be divided by nβ:
E = u/nβ.   (32)
Thus, in our example, for n = 3, α/β = 2 we find
u = 8/9,  E = 8/(9·3β) = 8/(27β) = (8/27) D.
This means that the average waiting time is 8/27 of the average length of a conversation.
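The worked example can be reproduced directly from the weights wi and the formulas above (our sketch; only the example's values n = 3 and α/β = 2 are used):

```python
import math

n, rho = 3, 2.0    # three lines, offered load rho = alpha/beta
K = 200            # truncation of the infinite sums; the tail is geometric

def w(i):
    if i <= n:
        return rho ** i / math.factorial(i)
    return rho ** i / (math.factorial(n) * n ** (i - n))

Omega = sum(w(i) for i in range(K))
Q = [w(i) / Omega for i in range(K)]

free_line = sum(Q[:n])              # probability that some line is free (R_2)
excess = 1.0 - sum(Q[:7])           # P(m > 6): more than three clients waiting
u = sum((m - n) * Q[m] for m in range(n + 1, K))   # mean length of the queue
print(free_line, excess, u)
```

The three printed values agree with 5/9, 64/729 and 8/9 as quoted in the text.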
§6. To be able to calculate the distribution law for the waiting time we have only to solve the following problem: under the assumption that at time t0 there are m clients at the station and a new client A appears, we have to calculate the distribution law of the waiting time for this client. We are interested only in the case m ≥ n, since otherwise there are free lines available. Set m = n + p and denote by Pq(t), q = 0, 1, …, p, the probability that in the interval (t0, t) precisely q conversations finish; denote by P(t) the probability that this number exceeds p. Clearly, at time t0,
P0′(t) = -nβP0(t),
P′(t) = nβPp(t).
Thus we have found a finite system of linear equations, which can be easily solved. Hence we obtain the probability P(t) that the waiting time for client A is at most t - t0.
Paris, 24 November 1930
11. THE METHOD OF THE MEDIAN IN THE THEORY OF ERRORS*
Under the assumption that the error distribution is a normal law, the method of the arithmetic mean, as is well known, is the best method for calculating the true value of an observable. The method of the median in this case is less effective, though not much less, as Haag has shown. However, if the hypothesis of normal distribution does not hold, then the problem arises of finding the best method for the given distribution law. In particular, in many cases when it is considered necessary to rule out "abnormal observations" it would be methodologically better to study a general distribution law and to find a more appropriate method for calculating the true value.
In this paper I show how, knowing the law of error distribution, one can determine the degree of accuracy of the method of the median and compare it with the accuracy of the method of the arithmetic mean. Which of the two methods is preferable depends on the nature of the distribution law adopted; however, by Theorem 2, when the distribution law is unknown and can deviate markedly from the normal law, it is safer to use the method of the median.
The method of study, in its essential features, is due to Haag [1] (who applied it, however, only to the normal distribution law).
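For the normal law the comparison can be made by a quick Monte Carlo experiment (ours, not in the original; the sample size, number of repetitions and seed are arbitrary). The standard deviation of the sample median exceeds that of the sample mean by a factor close to √(π/2) ≈ 1.25, the modest loss of efficiency mentioned above:

```python
import math
import random

random.seed(7)
n, reps = 101, 4000    # odd sample size n = 2k + 1, repeated experiments

means, medians = [], []
for _ in range(reps):
    sample = sorted(random.gauss(0.0, 1.0) for _ in range(n))
    means.append(sum(sample) / n)
    medians.append(sample[n // 2])   # the (k+1)-th order statistic

def sd(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

ratio = sd(medians) / sd(means)
print(ratio, math.sqrt(math.pi / 2.0))
```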
Let the probability that the error of a single observation lies in (z, z + dz) be f(z)dz. We shall assume that f(z) is continuous and f(m) ≠ 0 for the median m:
∫_{-∞}^{m} f(z) dz = 1/2.
Furthermore, let
a = ∫_{-∞}^{∞} z f(z) dz
be the expectation of the error. Denote by z1, z2, …, zn the results of n successive observations, let an be their arithmetic mean and mn their median. We restrict ourselves to the case of n = 2k + 1, when mn = zk+1, assuming that the zi are numbered in increasing order. The differences (an - a) and (mn - m) are (as we shall see) of order O(1/√n), so that it is natural to set
with variance u.
The probability that mn lies in (t, t + dt) is
where
θ(t) = 2 ∫_m^t f(s) ds.
with variance
σm = 1/(2f(m)).
Indeed,
φn(μ) = (√(2k+1) (2k)!/((k!)² 4^k)) {1 - θ²(t)}^k f(t).
By Stirling's formula
Theorem 2. For distribution laws with one maximum the ratio λ can take any value in the interval
0 < λ < √3,
but it cannot exceed √3.
The maximum value √3 is attained if
and suppose that the ratios dn/bn are smaller than a certain fixed constant.
(1)
Set
and R(tn, μ) → 0 uniformly with respect to μ if tn is greater than some constant T.
Thus, for fixed n we have a formula for the asymptotic behaviour of Sn.
Consider the following problem. Let a(t) and b(t) be functions of the parameter t. What is the probability that all the inequalities
(3)
hold? Assume that a(t) and b(t) are continuously differentiable and that
A GENERALIZATION OF THE LAPLACE-LYAPUNOV THEOREM 119
K1 contains those sums for which there exists a k such that all inequalities
x < Sn < y.
Clearly
The inequalities
t > 0, a(t) < s < b(t)
single out a region G in the (s, t)-plane. Denote by g(so, to; s, t) the Green's
function for the heat equation
(4)
in G and set
g(s, t) = g(0, 0; s, t),  ∂g(s, t)/∂s = u(s, t),
v1(t) = -u[a(t), t],  v2(t) = u[b(t), t].
Theorem. The following asymptotic formulas hold
(6)
P^{(2)} = ∫_0^t v2(t) dt + θ2 R2(μ),   (7)
then a similar result holds, which can be easily obtained from the above by passing to the limit.
Göttingen, 20 January 1931
13. ON THE GENERAL FORM OF A HOMOGENEOUS
STOCHASTIC PROCESS *
(The problem of Bruno de Finetti)
(1)
are finite.
For non-specialists, we note that Φa(x) is non-decreasing and left continuous and that Φa(-∞) = 0, Φa(∞) = 1.
Let
* This article was presented on 6 March 1932 by the member of the Italian Academy of Sciences, G. Castelnuovo in the journal Atti R. Accad. Naz. Lincei. Ser. Sesta. Rend. and in the same year published in two parts in Italian. The first part was called 'Sulla forma generale di un processo stocastico omogeneo' (15:10, 805–808). The translation of this part is the title of the present article. The second part was called 'Ancora sulla forma generale di un processo omogeneo' (15:12, 866–869).
I See, for example, my paper: 'Über die analytischen Methoden in der Wahrschein-
lichkeitsrechnung', Math. Ann. 104 (1931), 415-458. (Paper 9 in this volume.)
² See: B. de Finetti, 'Le funzioni caratteristiche di legge istantanea', Atti Accad. Naz. Lincei. Rend. 12 (1930), 278–282.
121
be the characteristic function of ΦΔ(x). From the well-known properties of characteristic functions we have
(2)
Moreover,
(1/Δ)[ψΔ(t) - 1] = (1/Δ) ∫_{-∞}^{∞} (e^{itx} - 1) dΦΔ(x) =
= (1/Δ) [ it ∫_{-∞}^{∞} x dΦΔ(x) + ∫_{-∞}^{∞} (e^{itx} - 1 - itx) dΦΔ(x) ],
where
ρ(x, t) = (e^{itx} - 1 - itx)/x² for x ≠ 0,   (4)
ρ(x, t) = -t²/2 for x = 0.
Since for any given t the function ρ(x, t) is finite and continuous (including at x = 0), we have
We now choose a sequence of positive numbers Δ1, Δ2, …, Δn, … tending to zero. As is well known, given a sequence of functions
F(-∞) ≥ lim FΔ(-∞) = 0,   (5)
F(+∞) ≤ lim FΔ(+∞) = σ²   (as Δ → 0).
Taking into account the fact that ρ(x, t) → 0 for fixed t as x → ±∞, we finally obtain
log ψ1(t) = lim (1/Δ)[ψΔ(t) - 1] = lim [ it mΔ + ∫_{-∞}^{∞} ρ(x, t) dFΔ(x) ]  (as Δ → 0);
implying that
F(+∞) - F(-∞) = σ²;
∫_{-∞}^{∞} ((1 - e^{-itx})/it) ψa(t) dt   (7)
I:
clia(x). Indeed, the integral
82
-p(x
8t 2 t) - _e itx ,
,-
1
we have
∫_{-∞}^{∞} e^{itx} dF(x) = -∂² log ψ1(t)/∂t² = χ(t),
F(x) - F(0) = (1/2π) ∫_{-∞}^{∞} ((1 - e^{-itx})/it) χ(t) dt.
From the latter equality we can determine F(x) to within an additive constant which, in turn, is determined by the condition F(-∞) = 0.
As already mentioned, from any sequence FΔn(x) with Δn → 0 we can always select a subsequence FΔnk(x) converging to F(x). Hence, FΔ(x) → F(x) as Δ → 0 and
(8)
Further (8) implies that at each point of continuity x < 0 of F(x) we have
(9)
(10)
Clearly, these formulas, as well as all the others in which P1(x) and P2(x) occur, hold only when F(x) is continuous.
Thus, the behaviour of F(z) outside z = 0 depends only on the probability distribution of the jumps of X(Δ). At the same time, the discontinuity of F(z) at the origin is associated with the continuous variation of X(Δ). To make this clear we set
Θ(z) = F(z),  z ≤ 0,
Θ(z) = F(z) - σ0²,  z > 0,
where σ0² denotes the jump of F(z) at z = 0. Clearly, dF(z) = dΘ(z) outside a neighbourhood of the origin. We can write (11) and (12) as follows
Θ(z) = ∫_{-∞}^{z} y² dP1(y),  z < 0,   (13)
σ0² - Θ(z) = ∫_z^{∞} y² dP2(y),  z > 0,   (14)
and hence Θ(z) is completely determined by P1(z) and P2(z). We have
only at a finite number of points x1, x2, …, xn. Denote by σ² = w1 + w2 + … + wn the sum of these jumps (it is constant on every interval that does not contain any of the xk). Suppose also that x = 0 is not one of the points xk. Then⁵
∫_{-∞}^{∞} ρ(x, t) dT(x) = Σk pk (e^{itxk} - 1 - itxk),
ψ(t) = exp[ Σk pk (e^{itxk} - 1 - itxk) ];
any function F(x) by a step function T(x) so that the inequality
holds for all |t| < t0. We now form a sequence Tn(x) of functions of the form T(x) such that the difference in (16) tends to 0 for all t, uniformly on every bounded interval. Then the corresponding functions ψ^{(n)}(t) converge to ψΔ(t), which implies that ψΔ(t), being a limit of characteristic functions, is itself a characteristic function.
In this paper we solve the following problem posed by S.I. Vavilov. Determine the expectation of the area covered during a given time by the projection onto a plane of a moving Brownian particle of finite size. The parts of the area covered by this projection several times must be counted only once. This problem will be reduced to the following more general problem: determine the probability that an infinitesimal Brownian particle moving in a region G and situated at t = 0 at a given point (x, y) hits the boundary R of this region at least once during the time t. In §1 we explain a method for solving this problem, and in §§2, 3 this method is applied to computing the mean Brownian area. §§1, 2 of this paper are written by A.N. Kolmogorov, §3 by M.A. Leontovich.
PL(x, y; t) ≤ P(x, y; t).
Now let p(x, y; ξ, η; t) dξ dη be the probability that the particle which was at the point (x, y) at the moment t = 0 is in the region (ξ, ξ + dξ; η, η + dη) at time t and does not hit the boundary during time t. Then, clearly,
ON COMPUTING THE MEAN BROWNIAN AREA 129
where |θ| ≤ M, |θ′| ≤ M′, M and M′ being independent of ξ and η (but may depend on x, y and t);
lim_{τ→0} { (1/2τ) ∫∫_G p(x, y; ξ, η; τ)(ξ - x)(η - y) dξ dη } = B12(x, y),   (12)
= P(x, y; τ)/τ + (1/τ) ∫∫_G p(x, y; ξ, η; τ) P(ξ, η; t) dξ dη - P(x, y; t)/τ =
= P(x, y; τ)/τ + (1/τ) { ∫∫_G p(x, y; ξ, η; τ) dξ dη - 1 } P(x, y; t) +
+ (1/2τ) ∫∫_G p(x, y; ξ, η; τ)(ξ - x)² dξ dη · ∂²P(x, y; t)/∂x² +
+ (1/2τ) ∫∫_G p(x, y; ξ, η; τ)(ξ - x)(η - y) dξ dη · ∂²P(x, y; t)/∂x∂y +
+ (1/2τ) ∫∫_G p(x, y; ξ, η; τ)(η - y)² dξ dη · ∂²P(x, y; t)/∂y² +
the second term on the right-hand side of (15) also tends to 0 as τ → 0. The
coefficients of the derivatives of P with respect to x and y have the limits
A₁, A₂, B_11, 2B_12, B_22; therefore the right-hand side of (15) tends to the right-hand
side of (14) as τ → 0. The left-hand side of (15) also has the limit ∂P/∂t.
This proves (14).
Equation (14) satisfied by P is adjoint to the Fokker equation. This equation can also be derived under much weaker assumptions (cf. [1]–[4]).
We now assume that, in addition to I–III, the following two conditions hold:
IV. For fixed t (where t > 0) the probability P(x, y; t) tends to 1 as (x, y)
approaches the boundary of G, and P(x, y; t) ≡ 0 at t = 0 for any interior point
(x, y) of G;
V. For fixed t (where t > 0) the probability P_L(x, y; t) tends to 1 as (x, y)
tends to an interior point of the part L of the boundary and tends to 0 as (x, y)
tends to a point of the boundary outside L, and P_L(x, y; t) ≡ 0 at t = 0 for
every interior point (x, y) of G.
Taking into account that both P and P_L are bounded (namely, they are
non-negative and not greater than 1), it can easily be shown that both P and
P_L are uniquely defined by (14) and the conditions IV and V.
∂P/∂t = D(∂²P/∂x² + ∂²P/∂y²).
where E denotes the expectation. It can easily be seen that D coincides with
the analogous constant for a three-dimensional Brownian particle.
Our problem is to determine the expectation of the area covered by successive positions of a particle at times from t = 0 to a given t. As the origin we
take the point that coincides with the particle's centre at t = 0. Denote by
W(x, y; t) the probability that during time t a given point (x, y) is at least once
covered by this particle. When the distance from (x, y) to the origin is at most
1, then clearly, W(x, y; t) = 1. If this distance exceeds 1, then (x, y) is covered
by the particle at least once if and only if the route of the particle's centre falls
at least once inside the disk with radius 1 and centre at (x, y). By symmetry
considerations the probability of such an event is also equal to the probability
that the centre of the Brownian particle situated at (x, y) for t = 0 will during
time t touch the boundary of the disk S of radius 1 with centre at the origin at
least once. This probability can be computed using the considerations of §1.
Thus, inside S we have
W(x, y; t) = 1,   (16)
while outside this disk W(x, y; t) must satisfy the differential equation
∂W/∂t = D(∂²W/∂x² + ∂²W/∂y²).   (17)
Moreover, on the boundary of S the boundary condition

W = 1   (18)

must hold, together with the initial condition

W(x, y; 0) = 0.   (19)
E{δ(x, y; t)} = W(x, y; t),

and the Brownian area F can be defined by the relation

F = ∫∫ δ(x, y; t) dx dy.
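The quantities W and E(F) lend themselves to direct simulation. Below is a minimal Monte Carlo sketch (illustrative, not part of the paper): the centre performs a random walk approximating plane Brownian motion with D = 1 (per-coordinate increments of variance 2D dt), and the covered region is approximated by the grid cells whose centres fall within distance 1 of the path.

```python
import math
import random

def mean_covered_area(t=1.0, n_steps=200, n_paths=60, cell=0.25, seed=1):
    # Estimate E(F): the mean area swept by a disk of radius 1 whose
    # centre performs plane Brownian motion with D = 1 up to time t.
    rng = random.Random(seed)
    dt = t / n_steps
    sd = math.sqrt(2.0 * dt)          # per-coordinate increment std dev
    reach = int(1.0 // cell) + 2      # cells possibly within distance 1
    total = 0.0
    for _ in range(n_paths):
        x = y = 0.0
        pts = [(0.0, 0.0)]
        for _ in range(n_steps):
            x += rng.gauss(0.0, sd)
            y += rng.gauss(0.0, sd)
            pts.append((x, y))
        covered = set()
        for px, py in pts:
            i0, j0 = int(px // cell), int(py // cell)
            for i in range(i0 - reach, i0 + reach + 1):
                for j in range(j0 - reach, j0 + reach + 1):
                    cx, cy = (i + 0.5) * cell, (j + 0.5) * cell
                    if (cx - px) ** 2 + (cy - py) ** 2 <= 1.0:
                        covered.add((i, j))
        total += len(covered) * cell * cell
    return total / n_paths

area = mean_covered_area()
print(area)  # exceeds pi = 3.14159..., the area of the motionless disk
```

At t = 0 the estimate reduces to the area π of the disk itself; the excess over π is the quantity computed in closed form in §3.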
§3. To solve the desired problem with boundary conditions we pass to polar
coordinates (r, φ). Clearly, W depends only on the position vector r, and not
on the angle φ. We now choose the time unit in such a way that D takes the
value 1. In this case our problem with boundary conditions for W(r, t) takes
the form
∂W/∂t = ∂²W/∂r² + (1/r) ∂W/∂r.   (21)
U(r, v) = ∫₀^∞ W(r, t) e^{−vt} dt   (24)
d²U/dr² + (1/r) dU/dr − vU = 0.   (25)
The boundary conditions (22) for W yield the following conditions for U:
U(r, v) = (1/v) K(r√v)/K(√v),

where

K(x) = (iπ/2) H₀⁽¹⁾(ix),

and H₀⁽¹⁾ is the first Hankel function of order zero (see [5], §17.71).
By (24), in order to find W(r, t) we have to solve the linear integral equation of the first kind

∫₀^∞ W(r, t) e^{−vt} dt = (1/v) K(r√v)/K(√v).   (26)
satisfies this condition is well known (see, for example, [7], §6.7), and is of the
following form:

W(r, t) = (1/2πi) ∫_{a−i∞}^{a+i∞} (e^{vt}/v) K(r√v)/K(√v) dv,   (27)
where the integration is taken over a straight line parallel to the imaginary axis
and a > O. This gives the solution of our problem with boundary conditions.
Equation (27) holds for r > 1, whereas if r ≤ 1, then by (16) we have

W(r, t) = 1.   (27′)
E(F) = 2π ∫₀^∞ W(r, t) r dr = π + 2π ∫₁^∞ W(r, t) r dr =
= π − 2π ∫₀^t (∂W(1, τ)/∂r) dτ,   (28)
where γ = e^C, C is Euler's constant (that is, γ = 1.7810…), and P(x) is an
entire function.
Hence,

K′(x)/K(x) = 1/(x ln(γx/2)) + x R(x);

R(x) = [i ln(γx/2) J₀′(ix)/2 − 2P(x) − x P′(x) + P(x)/ln(γx/2)] / [J₀(ix) ln(γx/2) − x² P(x)].
Since K′(x)/K(x) does not have singularities for |arg x| < π/2 (except x = 0)
and tends to 0 as |x| → ∞ (which follows from the well-known asymptotic
formulas K(x) ~ (π/2)^{1/2} e^{−x}/x^{1/2} and K′(x) ~ −(π/2)^{1/2} e^{−x}/x^{1/2}), R(x)
does not have other singularities in the same region, except at x = 2/γ. As
x → 0,

R(x) → −1/2 + A/ln(γx/2) + B/ln²(γx/2);

hence |R(x)| is bounded.
By (28′) we may set

where

I₂ = ∫_{a−i∞}^{a+i∞} (e^{vt}/v) R(√v) dv.
I₂ = ∫_{a−i∞}^{a+i∞} (e^{vt}/v) R(√v) dv = ∫_{at−i∞}^{at+i∞} (e^ξ/ξ) R(√(ξ/t)) dξ
is bounded for any t. This can easily be seen by applying the Second Mean
Value theorem to its real and imaginary parts.
Thus we see that
(30)
I₀ = O(1).   (31)
1/(z² ln z) = e^{−2 ln z}/ln z = −2 ∫₀¹ e^{−2a ln z} da + 1/ln z,
we obtain
Fig. 1
In the term proportional to ∫ (e^z/ln z) dz the integration can be carried out along
the imaginary axis. It is then easy to show (for example, by applying the Second
Mean Value theorem to the real and imaginary parts of the integral) that this
term is of order O(1). In the expression ∫ e^{−2a ln z + z} dz the integration path
can be deformed so that it coincides with the loop C encircling the negative
part of the real axis (see Fig. 1). This expression can then be transformed in
the following way:
and therefore
(32)
If times and lengths are measured in arbitrary units, then t must be replaced
by Dt/σ², and F by F/σ², where σ is the particle's radius. Thus, finally we
obtain
References
§1. Let X₁, X₂, …, Xₙ be the results of n mutually independent observations,
ordered increasingly, that is, X₁ ≤ X₂ ≤ … ≤ Xₙ, and let

F(x) = P{X ≤ x}

Hence, nFₙ(x) is the number of the Xₖ not exceeding x. A natural question
is: does Fₙ(x) approach F(x) for large n? A theorem related to this question
was formulated by von Mises [1] and is called the ω²-method. However, the
fundamental statement that the probability of the inequality
140 ON THE EMPIRICAL DETERMINATION OF A DISTRIBUTION LAW
tends to

Φ(λ) = Σ_{k=−∞}^{+∞} (−1)ᵏ e^{−2k²λ²}   (1)

uniformly in λ as n → ∞.
Below a table of some values of Φ(λ) is given; these were calculated by
N. Kozhevnikov.
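The limit function Φ(λ) = Σ_{k=−∞}^{+∞} (−1)ᵏ exp(−2k²λ²) is easy to evaluate numerically; the table can be reproduced with a short routine of this form (an illustrative sketch, not from the paper):

```python
import math

def Phi(lam, terms=100):
    # Phi(lam) = sum over k of (-1)^k * exp(-2 k^2 lam^2); Phi(0) = 0.
    if lam <= 0:
        return 0.0
    return sum((-1) ** k * math.exp(-2.0 * k * k * lam * lam)
               for k in range(-terms, terms + 1))

for lam in (0.5, 1.0, 1.5, 2.0):
    print(lam, round(Phi(lam), 4))
```

For small λ the terms decay slowly (one needs of the order of 1/λ of them), which matches the remark below that the series converges very slowly in that range.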
If λ is small, then (1) converges very slowly; in this case the following
§2. Lemma. The probability function Φₙ(λ) does not depend on the distribution function F(x) if the latter is assumed to be continuous.
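The lemma can be rehearsed by simulation: the statistic Dₙ = sup_x |Fₙ(x) − F(x)| has one and the same distribution for every continuous F. An illustrative sketch (not part of the paper) comparing uniform and exponential observations:

```python
import math
import random

def ks_stat(sample, cdf):
    # D_n = sup |F_n(x) - F(x)|; for continuous F the supremum is
    # attained at the order statistics.
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))

rng = random.Random(42)
n, reps = 50, 2000
d_unif = [ks_stat([rng.random() for _ in range(n)], lambda x: x)
          for _ in range(reps)]
d_expo = [ks_stat([rng.expovariate(1.0) for _ in range(n)],
                  lambda x: 1.0 - math.exp(-x)) for _ in range(reps)]
mean_u = sum(d_unif) / reps
mean_e = sum(d_expo) / reps
print(mean_u, mean_e)  # the two means nearly coincide
```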
If Fₙ(x) and Fₙ⁽⁰⁾(x) are the empirical distribution laws for X and Y for n
observations, then
(3)
The values of Fₙ(x) are multiples of 1/n; for example, let Fₙ(x) = i/n and
x = j/n + ε (0 ≤ ε < 1/n). Taking into account the fact that Fₙ(x), being a
distribution function, is monotone, we immediately obtain

Fₙ(x) − x = (i − j)/n − ε,

Fₙ(j/n) − j/n ≤ Fₙ(x) − (x − ε) = (i − j)/n,

Fₙ((j + 1)/n) − (j + 1)/n ≥ Fₙ(x) − (x + (1/n − ε)) = (i − j − 1)/n.
|Fₙ(x) − x| ≥ μ/n

holds if and only if at least one of the following inequalities holds:

Fₙ(j/n) − j/n ≥ μ/n,    Fₙ((j + 1)/n) − (j + 1)/n ≤ −μ/n.
P₀₀ = 1,  P_{i0} = 0 (i ≠ 0).   (7)
More generally,
P_{ik} = 0   (8)

for |i| ≥ μ, since in this case the inequalities (5) are contradictory. Furthermore,
where Q_{ji}^{(k)} denotes the probability of E_{i,k+1} under the condition that E_{j,k} holds,
that is, the probability of
(10)
ON THE EMPIRICAL DETERMINATION OF A DISTRIBUTION LAW 143
Relation (11) means that, of the results of the n observations X₁, X₂, …, Xₙ,
exactly n − k − j belong to the interval k/n < x ≤ 1; (10) can therefore be true
only when i − j + 1 of the results of these n − k − j observations belong to the
interval k/n < x ≤ (k + 1)/n.
Provided that Xₘ is uniformly distributed, we obtain the following expression for our desired probability:
Q_{ji}^{(k)} = C(n − k − j, i − j + 1) (1 − 1/(n − k))^{n−k−i−1} (1/(n − k))^{i−j+1},   (12)

where C(·, ·) denotes the binomial coefficient.
Formulas (7)–(9), (12) and (6) enable us to find the probability Φₙ(λ) for the
case λ = μ/√n.
These formulas can be replaced by other, more convenient ones. For this
we set
R_{ik} = [(n − k − i)! n^{n−k} / ((n − k)^{n−k−i} n!)] e P_{ik}.   (13)
(16)
(17)
Formulas (14)–(17) also enable us to find Φₙ(λ) for the case λ = μ/√n.
§3. Now let Y₁, Y₂, …, Yₙ be a sequence of independent random variables with
distribution law given by the formula

P{Yₖ = (i − 1)/μ} = e⁻¹/i!,  i = 0, 1, 2, ….   (18)
Setting
(19)
hold simultaneously satisfies the same conditions (14)–(16) that R_{ik} does; in
other words, R̃_{ik} = R_{ik}. This allows us to give an asymptotic expression for
R_{ik} as n → ∞. For this we state the following general theorem.
Let

Sₙ = iℓ

hold simultaneously, and denote by u(σ, τ, s, t) the Green's function of the heat equation;
then we have

R_{in} = ℓ{u(0, 0, iℓ, tₙ) + Δ},
c) there exists a constant K > 0 such that for any k there exists an iₖ
for which
Apart from these restrictions, the Yₖ, as well as their number n and the
integer i, can depend on ℓ in arbitrary fashion.
This theorem falls into the same scope of ideas as the one given in [3].
However, the assertion of the above theorem is stronger: the theorem in
[3] allows us only to assert that
Σ_{i=p}^{q} R_{in} = ∫_{pℓ}^{qℓ} u(0, 0, z, tₙ) dz + Δ′,
where Δ′ tends to 0 as ℓ → 0 under the conditions a) and b). The condition
c), which is essential in our new theorem, had already been used by von Mises
in similar considerations.
In our case
ℓ = 1/μ,

E(Yₖ) = 0,  E(Yₖ²) = 2bₖ = 1/μ²,  E(Yₖ³) = dₖ = C/μ³,

dₖ/bₖ = C/μ = Cℓ,  tₙ = n/(2μ²) = 1/(2λ²),  a(t) = −1,  b(t) = +1,

u(0, 0, s, t) = (1/(2√(πt))) Σ_{k=−∞}^{+∞} (−1)ᵏ exp[−(s − 2k)²/(4t)],

… = Σ_{k=−∞}^{+∞} (−1)ᵏ e^{−2k²λ²} + R = Φ(λ) + R.
Thus, Theorem 1 is proved for values λ of the form μ/√n, provided that
λ > λ₀. Since the limit function Φ(λ) is continuous and its limit value is
Φ(0) = 0, it is easy to see that these restrictions are not essential. We prove
Theorem 2 elsewhere.
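Theorem 1 can be illustrated numerically: already for moderate n the probability that √n·Dₙ ≤ λ lies close to Φ(λ). A Monte Carlo sketch (illustrative; uniform observations, λ = 1, and sample sizes chosen for the example):

```python
import math
import random

def phi_limit(lam, terms=50):
    # Phi(lam) = sum over k of (-1)^k exp(-2 k^2 lam^2)
    return sum((-1) ** k * math.exp(-2.0 * k * k * lam * lam)
               for k in range(-terms, terms + 1))

def d_n(n, rng):
    # D_n = sup |F_n(x) - x| for a uniform sample
    xs = sorted(rng.random() for _ in range(n))
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(7)
n, reps, lam = 200, 4000, 1.0
freq = sum(math.sqrt(n) * d_n(n, rng) <= lam for _ in range(reps)) / reps
print(freq, phi_limit(lam))  # both close to 0.73
```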
References
As is well known, the fundamental ideas of P. Laplace [1] and P.L. Chebyshev
[2] were subsequently developed in papers by A.A. Markov [3] and A.M. Lya-
punov [4], culminating in a very general statement (the so-called Lyapunov
theorem) on the limit of probability distributions for sums of a large number
of small independent random variables. Further studies by Markov [5] and
S.N. Bernshtein [6] have demonstrated that in many cases a similar statement
also holds for sums of independent variables. These generalizations are of spe-
cial importance for applied purposes, but in principle they do not go much
further; in all cases studied by these authors only summands that are near
to each other are strongly dependent, whereas if the sum is decomposed into
sufficiently long partial sums, the latter will be almost independent. Of
much greater consequence is the two- (and multi-)dimensional generalization
of Lyapunov's theorem to the case of sums of random vectors, which was first
rigorously proved by Bernshtein [6].
What still remains after all these studies is to determine the limits of
various kinds of probabilities related to the whole set of partial sums of a given
sequence of random variables: these limits are unknown even in the simplest
case of independent summands. In any case, the early results of Laplace and
Poisson [7] on the probability of a gambler's ruin belong to this field; they
were further elaborated by P. Lévy [8]. Recently I published a general statement
of this kind [9]. In the meantime some partial two-dimensional problems of
similar type were studied by R. Lüneburg [10]. Both these papers clarify the
connection between these problems and differential heat equations [11].
Several months ago I.G. Petrovskii in Moscow found a general method of
reducing problems in probability theory on sums of small random variables to
corresponding differential equations in a very general situation. In what follows
this method is applied to proving Lyapunov's theorem (§1) and my theorem
mentioned above (§§2, 3). The applications to random walk problems can be
found in a yet unpublished work by Petrovskii, where the setting given by
Lüneburg is considerably generalized.
Apparently this method can also be used when the distribution law for each
term xₖ₊₁ depends on the sum of all previous terms sₖ = x₁ + x₂ + ⋯ + xₖ. This
148 ON THE LIMIT THEOREMS OF PROBABILITY THEORY
§1. Let
and suppose that the quotients dₖ : bₖ are uniformly bounded by a fixed constant

(1)

By the well-known inequality for moments, bₖ³ ≤ dₖ², (1) implies that

bₖ ≤ μ².   (2)
We now set
x₁ + x₂ + ⋯ + xₖ = sₖ,  sₙ = S,
b₁ + b₂ + ⋯ + bₖ = tₖ,  tₙ = T.
Let T be fixed (and n variable); then Lyapunov's theorem asserts that
where P denotes the probability of the inequality in parentheses and R(μ) tends
to 0 together with μ.
To prove this, consider the probability
so that a < S < b under the assumption sₖ = s. Clearly, the desired probability
P{a < S < b} coincides with P₀(0); Pₖ(s) satisfies the equation¹

Pₖ(s) = ∫ Pₖ₊₁(s + z) dTₖ₊₁(z),  k = 0, 1, 2, …, n − 1,   (4)
1 Formula (4) is proved in the same way as that for the distribution function of
the sum of two independent variables; see, for example, P. Lévy, Calcul des
probabilités, Paris, 1927, p. 187.
where Tₖ₊₁ is the distribution function of xₖ₊₁, and the integral is taken from
−∞ to +∞. Moreover,
The initial conditions (5) and equations (4) uniquely define Pₖ(s) for all k
(0 ≤ k ≤ n).
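Equation (4) with the terminal condition (5) determines Pₖ(s) by backward recursion. A minimal sketch (illustrative: symmetric discrete steps xₖ = ±h with probability 1/2 each are assumed, so that the integral in (4) becomes a two-point sum; the paper allows general distribution functions Tₖ):

```python
from functools import lru_cache

n, h = 30, 1.0       # number of summands and step size (illustrative)
a, b = -6.0, 6.0     # the interval for the total sum S

@lru_cache(maxsize=None)
def P(k, s):
    # P_k(s): probability that a < s + x_{k+1} + ... + x_n < b,
    # computed by the backward recursion (4).
    if k == n:
        return 1.0 if a < s < b else 0.0
    return 0.5 * (P(k + 1, s + h) + P(k + 1, s - h))

print(P(0, 0.0))  # P{a < S < b}; the exact binomial value is about 0.638
```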
We call Pₖ(s) an upper function if the inequalities

(8)

hold also for k < n. Lower functions are defined similarly. We now wish
to construct a function u*(s, t) which for any sufficiently small μ leads to the
upper function

(9)
For this we first determine a four times continuously differentiable function
v(s) satisfying the following inequalities:

v(s) = 0 for s ≤ a − ε and s ≥ b + ε;  0 ≤ v(s) ≤ 1 for a − ε ≤ s ≤ a and b ≤ s ≤ b + ε;
v(s) = 1 for a < s < b.   (10)
Now let
where ε > 0 in both (10) and (12). The function ū(s, t) satisfies (for t ≤ T) the
differential equation

∂ū/∂t + (1/2) ∂²ū/∂s² = 0   (13)
and its derivatives up to the fourth order with respect to s and the second
order with respect to t are bounded. Let M be the least upper bound of these
derivatives. If Pₖ*(s) is defined by (9), then we obtain
But since

ū(s, tₖ) − ū(s, tₖ₊₁) = −(∂/∂t)ū(s, tₖ₊₁)(tₖ₊₁ − tₖ) + θ″M(tₖ₊₁ − tₖ)² =
= ξbₖ₊₁ + θⁱᵛMbₖ₊₁μ + θ‴Mbₖ₊₁μ² = bₖ₊₁(ξ + θⁱᵛMμ + θ‴Mμ²),  |θⁱᵛ| ≤ 1,   (15)

where ξ = −(∂/∂t)ū(s, tₖ₊₁) = (1/2)(∂²/∂s²)ū(s, tₖ₊₁), the difference considered is non-negative,
which proves (6). Therefore Pₖ*(s) is an upper function for a sufficiently small
μ and
§2. Now let a(t) and b(t) be four times continuously differentiable functions
of t satisfying the conditions
(22)
(23)
(24)
(25)
and
Pₙ*(s) ≥ 1 for a(T) < s < b(T),
(26)
Pₖ*(s) ≥ 0 for s ≤ a(tₖ) and, correspondingly, s ≥ b(tₖ).
As in §1, we prove that for k < n the inequality
holds.
To construct an upper function we need a function ū(s, t) which is defined
in G as a solution of the differential equation (13) with the following boundary
conditions:

ū(s, T) = 1,  a(T) < s < b(T),
ū{a(t), t} = v(t),  0 ≤ t ≤ T,   (27)
ū{b(t), t} = v(t),  0 ≤ t ≤ T,

where v(t) is a four times continuously differentiable function such that

v(t) = 0,  0 ≤ t ≤ T − ε,
0 ≤ v(t) ≤ 1,  T − ε ≤ t ≤ T,   (28)
v(T) = 1,  v′(T) = v″(T) = 0.
Since ū(s, t) has continuous derivatives up to the fourth order (including on the
boundary of G), this function can be extended beyond G so that the derivatives
with respect to s up to the fourth order are bounded on the whole (s, t)-plane
and the inequality

ū(s, t) > −ε   (29)
Apart from obvious changes, the proof of the fact that Pₖ*(s) is an upper
function for any sufficiently small μ is the same as in §1. Thus for the indicated
μ we obtain

P = P₀(0) ≤ P₀*(0) = u*(0, 0)
§3. Now let P⁽¹⁾ (respectively, P⁽²⁾) denote the probability of the existence
of k such that
and
respectively. Finally, let P(x, y) be the probability that all the inequalities
where R⁽¹⁾(μ), R⁽²⁾(μ) and R(μ) tend to 0 together with μ, and where
are bounded solutions of (13) determined from the following boundary conditions:
u⁽¹⁾(s, T) = 0,  a < s < b,
u⁽¹⁾{a(t), t} = 1,  0 ≤ t < T,   (37)
u⁽¹⁾{b(t), t} = 0,  0 ≤ t < T,
u⁽²⁾{b(t), t} = 1,  0 ≤ t < T,
u_{x,y}(s, T) = 0,  y < s < b,
The proof is the same as in §2. In [9] other expressions² are derived for the
same values u⁽¹⁾(0, 0), u⁽²⁾(0, 0) and u_{x,y}(0, 0).
References
2 In [9] and [14] multiple differentiability of a(t) and b(t) is not assumed. If a(t)
and b(t) are continuously differentiable only once, then in the proof one must
approximate them by four times continuously differentiable functions. The proof
of the fact that the remainders in (3), (21), (34)–(36) tend to zero uniformly with
respect to T when T is greater than some fixed constant is simple if the random
variable is multiplied by √(T₀/T); in this way the case of arbitrary T reduces
to the case T = T₀. A simple argument then shows that in (34) and (35) the
remainders tend to zero together with μ even without assuming that T > T₀.
where dV_y denotes the volume element. Here f(t′, x, t″, y) has to satisfy the
following fundamental equations:

∫_ℜ f(t′, x, t″, y) dV_y = 1,   (2)
The integral equation (3) was studied by Smolukhovskii and then by other
authors.¹ In the paper 'Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung'² I have proved that, under certain additional conditions,
f(t′, x, t″, y) satisfies certain differential equations of parabolic type.³ But in
A.M. there was no answer to the question⁴ as to what extent f(t′, x, t″, y) is
uniquely determined by the coefficients A(t, x) and B(t, x). In this paper the
theory is developed in the general case of a Riemannian manifold ℜ and the
question of uniqueness is answered affirmatively for a closed manifold ℜ.
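The fundamental equation (3) — the Smolukhovskii equation — can be checked numerically on a concrete kernel. An illustrative sketch using the Gaussian transition density of Brownian motion on the line (an assumption made for the example, not the general manifold case treated in the paper):

```python
import math

def f(t1, x, t2, y):
    # Gaussian transition density of Brownian motion on the line.
    var = t2 - t1
    return math.exp(-(y - x) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Smolukhovskii equation: f(t', x, t'', y) = integral of
# f(t', x, t, z) f(t, z, t'', y) dz over an intermediate time t
tp, t, ts = 0.0, 0.7, 1.5
x, y = 0.3, -0.8
dz = 0.01
lhs = f(tp, x, ts, y)
rhs = sum(f(tp, x, t, -8.0 + i * dz) * f(t, -8.0 + i * dz, ts, y)
          for i in range(1601)) * dz
print(lhs, rhs)  # agree to several decimal places
```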
ON THE THEORY OF CONTINUOUS RANDOM PROCESSES 157
we assume that f(t′, x, t″, y) has continuous derivatives up to the third order
with respect to all the arguments (t′, t″ and the coordinates x₁, x₂, …, xₙ,
y₁, y₂, …, yₙ of the points x and y) and satisfies the continuity condition

(∂/∂xᵢ) f(s, x, t, y),

of y and t (for fixed s and x) are linearly independent, that is, that t₁, y₁, t₂, y₂, …
…, tₖ, yₖ, …, t_N, y_N can be chosen so that the determinant

(9)

is non-zero.⁶
5 See A.M., §13, formula (112).
6 See A.M., §13, determinant (119).
In ℜ − 𝔄 we have

p²(x, z) = θ′ p³(x, z),  |θ′| ≤ C′,

and hence

β(s, x, Δ) = ∫_ℜ f(s, x, s + Δ, z) p²(x, z) dV_z =
= ∫_𝔄 f(s, x, s + Δ, z) p²(x, z) dV_z +
+ ∫_{ℜ−𝔄} f(s, x, s + Δ, z) θ′ p³(x, z) dV_z =
= Σ_{ij} g_{ij} b_{ij}(s, x, Δ) + θ″ ν(s, x, Δ),  |θ″| ≤ C″.   (10)

β(s, x, Δ)/ν(s, x, Δ) → +∞ as Δ → 0,   (11)
(12)
f(s + Δ, z, t, y) − f(s + Δ, x, t, y) = Σᵢ (zᵢ − xᵢ)(∂/∂xᵢ) f(s + Δ, x, t, y) + …
f(s, x, t, y) = ∫_ℜ f(s, x, s + Δ, z) f(s + Δ, z, t, y) dV_z =
= ∫_ℜ f(s, x, s + Δ, z) f(s + Δ, x, t, y) dV_z +
+ ∫_{ℜ−𝔄} f(s, x, s + Δ, z){f(s + Δ, z, t, y) − f(s + Δ, x, t, y)} dV_z = …   (14)
By (2),
I₂ = ∫_ℜ f(s, x, s + Δ, z) f(s + Δ, x, t, y) dV_z = …
… + (1/2) Σ_{ij} (zᵢ − xᵢ)(zⱼ − xⱼ)(∂²/∂xᵢ∂xⱼ) f(s + Δ, x, t, y) + …
Then
I₃ = ∫_{ℜ−𝔄} f(s, x, s + Δ, z){f(s + Δ, z, t, y) − f(s + Δ, x, t, y)} dV_z =
= ∫_{ℜ−𝔄} f(s, x, s + Δ, z) θ′ p³(x, z) dV_z,  |θ′| ≤ C′ = K.   (17)
Substituting (15)-(17) into (14) we finally obtain
f(s, x, t, y) = f(s + Δ, x, t, y) + Σᵢ aᵢ(s, x, Δ)(∂/∂xᵢ) f(s + Δ, x, t, y) +
+ (1/2) Σ_{ij} b_{ij}(s, x, Δ)(∂²/∂xᵢ∂xⱼ) f(s + Δ, x, t, y) +
+ ∫_ℜ f(s, x, s + Δ, z) θ″ p³(x, z) dV_z,  |θ″| ≤ C″.   (18)
If we also take into account the obvious equality
[f(s + Δ, x, t, y) − f(s, x, t, y)]/Δ = −Σᵢ [aᵢ(s, x, Δ)/Δ](∂/∂xᵢ) f(s + Δ, x, t, y) −
− Σ_{ij} [b_{ij}(s, x, Δ)/2Δ](∂²/∂xᵢ∂xⱼ) f(s + Δ, x, t, y) − θ″ ν(s, x, Δ)/Δ.   (19)

The left-hand side in (19) tends to ∂f(s, x, t, y)/∂s as Δ → 0.
Suppose that the determinant D_N(s, x) is non-zero for t₁, y₁, t₂, y₂, …
…, t_N, y_N. Then D_N(s + Δ, x) ≠ 0 for sufficiently small Δ. Hence, there
(20)
(22)
= Σₖ λₖ(0)(∂/∂s) f(s, z, tₖ, yₖ)   (23)

as Δ → 0.
In particular, if we set σᵢ = 0, σ_{ij} = g_{ij}, then
By (12), the second term in (24) is infinitesimally small as compared with the
first one (since the λₖ(Δ) are bounded). Hence we have
(25)
ν(s, z, Δ)/Δ → 0 as Δ → 0.   (26)
Aᵢ(s, z) = lim_{Δ→0} aᵢ(s, z, Δ)/Δ,   (27)
exist and do not depend on the choice⁷ of Δ. Then (27), (28), (26) and (19)
immediately imply the first differential equation

∂f(s, x, t, y)/∂s = −Σᵢ Aᵢ(s, x)(∂/∂xᵢ) f(s, x, t, y) − Σ_{ij} B_{ij}(s, x)(∂²/∂xᵢ∂xⱼ) f(s, x, t, y).   (29)
Certainly the condition that D_N(s, x) does not vanish identically can be
replaced by the direct requirement that the limits (27) and (28) exist, since
(28) implies the existence of a finite limit (25) and therefore of (26).
At certain exceptional points the limits (27) and (28) need not exist. This
was illustrated in A.M.⁸ by the following example: ℜ is the ordinary number
axis and

f(s, x, t, y) = [3y²/(2√(π(t − s)))] exp[−(y³ − x³)²/(4(t − s))];   (30)
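One can check numerically that (30) really is a transition density — it is the law of a process whose cube performs a Brownian motion, so it integrates to 1 in y. An illustrative sketch with s = 0, t = 1, x = 0.5 (values chosen only for the example):

```python
import math

def f30(s, x, t, y):
    # Example (30):
    # f = 3 y^2 / (2 sqrt(pi (t-s))) * exp(-(y^3 - x^3)^2 / (4 (t-s)))
    dt = t - s
    return (3.0 * y * y / (2.0 * math.sqrt(math.pi * dt))
            * math.exp(-(y ** 3 - x ** 3) ** 2 / (4.0 * dt)))

s, t, x = 0.0, 1.0, 0.5
dy = 0.002
# midpoint rule over [-4, 4]; the Gaussian factor in y^3 is negligible outside
total = sum(f30(s, x, t, -4.0 + (i + 0.5) * dy) * dy for i in range(4000))
print(total)  # close to 1
```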
Assume now that in a neighbourhood 𝔄 of the point y₀ for a given t the limits
Aᵢ(t, y) and B_{ij}(t, y) exist uniformly and that ν(t, y, Δ)/Δ tends uniformly to
0 in 𝔄. Suppose further that R(y) is a non-negative function vanishing outside
𝔄 with bounded derivatives up to the third order. Then for y ∈ 𝔄, z ∈ 𝔄 we
have
R(y) = R(z) + Σᵢ (yᵢ − zᵢ)(∂/∂zᵢ) R(z) +
+ (1/2) Σ_{ij} (yᵢ − zᵢ)(yⱼ − zⱼ)(∂²/∂zᵢ∂zⱼ) R(z) + …,

while outside 𝔄

R(y) = 0.   (33)
∫_ℜ R(y)(∂/∂t) f(s, x, t, y) dV_y =
= (∂/∂t) ∫_ℜ R(y) f(s, x, t, y) dV_y = lim_{Δ→0} (1/Δ) { ∫_ℜ R(y) f(s, x, t + Δ, y) dV_y −
− ∫_ℜ R(z) f(s, x, t, z) dV_z } =
= lim_{Δ→0} (1/Δ) { ∫_ℜ ∫_ℜ f(s, x, t, z)[R(z) + Σᵢ (yᵢ − zᵢ)(∂/∂zᵢ)R(z) +
+ (1/2) Σ_{ij} (yᵢ − zᵢ)(yⱼ − zⱼ)(∂²/∂zᵢ∂zⱼ)R(z)] f(t, z, t + Δ, y) dV_y dV_z +
+ ∫_ℜ f(s, x, t, z) ∫_ℜ θ‴ p³(y, z) f(t, z, t + Δ, y) dV_y dV_z −
− ∫_ℜ R(z) f(s, x, t, z) dV_z } = lim_{Δ→0} (1/Δ) { ∫_ℜ f(s, x, t, z) R(z) dV_z +
+ ∫_ℜ f(s, x, t, z)[Σᵢ aᵢ(t, z, Δ)(∂/∂zᵢ)R(z) + (1/2) Σ_{ij} b_{ij}(t, z, Δ)(∂²/∂zᵢ∂zⱼ)R(z)] dV_z +
+ … − ∫_ℜ R(z) f(s, x, t, z) dV_z }
Now assume that Aᵢ(t, z) and B_{ij}(t, z) are twice continuously differentiable
in 𝔄. Then we set

Q(t, y) = |g_{ij}(t, y)|
and after integration by parts, we obtain
Double integration by parts (since all the derivatives vanish on the boundary
of~) yields
∫_ℜ f(s, x, t, y) B_{ij}(t, y)(∂²/∂yᵢ∂yⱼ) R(y) dV_y =
= ∫_ℜ (∂²/∂yᵢ∂yⱼ)[f(s, x, t, y) B_{ij}(t, y) Q(t, y)] R(y) dy₁ dy₂ … dyₙ.   (36)
Since R(y) is arbitrary, apart from the above conditions, it is easy to conclude
that at interior points of 𝔄 the second differential equation
Q(t, y)(∂/∂t) f(s, x, t, y) = −Σᵢ (∂/∂yᵢ)[Aᵢ(t, y) Q(t, y) f(s, x, t, y)] +
+ Σ_{ij} (∂²/∂yᵢ∂yⱼ)[B_{ij}(t, y) Q(t, y) f(s, x, t, y)]   (37)
also holds.
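On the line with Aᵢ = 0, constant B and Q = 1, equation (37) reduces to the classical heat equation ∂g/∂t = B ∂²g/∂y², and its fundamental solution can be checked by finite differences. An illustrative sketch (the constant-coefficient setting is an assumption for the example):

```python
import math

B = 0.5  # constant diffusion coefficient (illustrative)

def g(t, y):
    # Fundamental solution of dg/dt = B d^2 g / dy^2
    return math.exp(-y * y / (4.0 * B * t)) / math.sqrt(4.0 * math.pi * B * t)

t0, y0, h = 1.0, 0.7, 1e-3
lhs = (g(t0 + h, y0) - g(t0 - h, y0)) / (2.0 * h)               # dg/dt
rhs = B * (g(t0, y0 + h) - 2.0 * g(t0, y0) + g(t0, y0 - h)) / (h * h)
print(lhs, rhs)  # the two sides of the equation agree
```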
If at time t₀ the differential function of the probability distribution is given,
that is, a non-negative function g(t₀, y) of y satisfying the condition

then for arbitrary t > t₀ the distribution function g(t, y) is given by the formula
(40)
§3. Uniqueness
Under a change of the coordinate system the coefficients Ai (s, x) and Bij (s, x)
are transformed in the following way:
(41)
(42)
B_{ii} = lim_{Δ→0} b_{ii}(s, x, Δ)/2Δ = lim_{Δ→0} (1/2Δ) ∫_ℜ f(s, x, s + Δ, z)(zᵢ − xᵢ)² dV_z ≥ 0.   (43)
Uniqueness Theorem 1. If ℜ is closed, then (40) has at most one solution
g(t, y) with given continuous initial condition g(t₀, y) = g(y).
Proof. Clearly it suffices to consider the initial condition g(t₀, y) = 0 and prove
that g(t, y) = 0 also for t > t₀. We can transform (40) into the form
(45)
Now set
v(t, y) = g(t, y)e^{−ct}.
∂v/∂t = Σ_{ij} B_{ij} ∂²v/∂yᵢ∂yⱼ + Σᵢ Sᵢ ∂v/∂yᵢ + Tv − cv.   (46)
T(t, y) − c < 0

for all y and t, t₀ ≤ t ≤ t₁. Under these conditions v(t, y) cannot have a
positive maximum at any point (t, y), t₀ < t < t₁, since at such a maximum

∂v/∂yᵢ = 0,  Σ_{ij} B_{ij} ∂²v/∂yᵢ∂yⱼ ≤ 0,  (T − c)v < 0,

so that ∂v/∂t < 0 there, which is impossible at a maximum. Hence

g(t, y) = 0.
10 See: E. Rothe, 'Über die Wärmeleitungsgleichung', Math. Ann. 104 (1931),
353-354 (uniqueness proof).
g₂(t, y) = ∫_ℜ g(x) f₂(s, x, t, y) dV_x

are also different. By (2) and (47), g₁(t, y) and g₂(t, y) tend to g(y) as t → s.
Since the functions g₁(t, y) and g₂(t, y) satisfy (40), this contradicts Uniqueness
Theorem 1.
§4. An example
Then general ergodic theorems¹¹ imply the existence of the limit probability
distribution. In other words, for any distribution g(t, y) determined by (38)
and (39) and any region 𝔄 the relation

holds, where P(𝔄) does not depend on g(t₀, y). It can easily be proved that
g(t, y) is uniformly continuous for large t. From this we deduce that¹²
−Σᵢ (∂/∂yᵢ)[Aᵢ(y) Q(y) g(y)] + Σ_{ij} (∂²/∂yᵢ∂yⱼ)[B_{ij}(y) Q(y) g(y)] = 0,   (53)

∫_ℜ g(y) dV_y = 1.   (53a)
Setting g(t₀, y) = g(y) it can easily be seen that g(t, y) = g(y) also for t > t₀
(see (40) and Uniqueness Theorem 1). From this we deduce that the solution
of (53) and (53a) (if it exists) is uniquely determined and coincides with the
limit function g(y).
As a particular case, (52) implies
far only linear formulas have been used for this purpose:
(1)
By choosing appropriate factors x₁, x₂, …, xₖ, whose number is somewhere
between three and seven, and coefficients a₁, a₂, …, aₖ it can be shown that in
many cases the correlation coefficient between the actually observed Δy and
its value computed by formula (1), calculated from the data during the same
30–50 years that served for deriving the formula, reaches 0.60–0.75. However,
when verification of the formula was based on the observations over years other
than the years used in deriving the formula (Vize), the correlation coefficient
between the computed and the observed Δy turned out to be 0.30–0.40. This
correlation coefficient has no practical value, especially since forecasts of this
degree of reliability can be obtained in a much simpler way.
The theory for obtaining regression equations of type (1) is based on the
following assumptions. It is assumed that y, x₁, x₂, …, xₖ are random variables
with a certain distribution law w(y, x₁, x₂, …, xₖ) which does not change over
the years. It is assumed further that the probabilities that y, x₁, x₂, …, xₖ take
certain values in a given year do not depend on the values taken by these variables in previous years. Without the first condition (of stability) the problem of
formulating a regression equation has no meaning at all. If the first condition is
fulfilled, then there exist determinate coefficients a₁, a₂, …, aₖ that minimize
the expectation

The relation

δ = [E(Δy²) − E(u²)]/E(Δy²)
170 FORECASTING FORMULAS FOUND BY STATISTICAL METHODS
is called the true correlation coefficient. The second condition (of independence) is essential for proving that the coefficients a₁, a₂, …, aₖ of the empirical
regression equation (1) computed from the observations during a sufficiently
large number of years n are close to the theoretical coefficients a₁, a₂, …, aₖ,
and also for proving that the empirical correlation coefficient R is close to the
theoretical one, δ, also for sufficiently large n.
There can certainly be some doubts as to whether these two conditions
can be applied to meteorological phenomena. Moreover, if we consider the
existence of secular or periodic multi-year climate fluctuations an established
fact, the first condition is clearly wrong. This fact is often used for explaining
the discrepancy mentioned above: it is supposed that the regression equation (1) does indeed reflect the regularities that took place during the period
under study with great accuracy, but these regularities themselves are subject
to change as a result of long-term climatic fluctuations. I am going to prove
that these excessively high empirical correlation coefficients are quite explicable also under the assumption of complete stability and independence of the
studied factors from year to year; in other words, that the methods used by
the above researchers inevitably lead to a certain "blow-up" of correlation coefficients. This is discussed in §2. In §3 we consider whether stability and
independence in meteorological series are sufficient to make statistical determination of regression equations possible and useful; in §4 we give some ideas
on the techniques for finding regression equations.
§2. The mathematical apparatus needed for solving our problem was developed quite recently by Fisher [1] in his study on the distribution law for the
correlation coefficient under multiple correlation (see also the review by Rider
[2]). It is assumed that the distribution law w(y, x₁, x₂, …, xₖ) is normal.
Suppose that we are given a regression equation for a certain variable y.
Assume that y is related to x₁, x₂, …, xₖ in such a way that the true correlation
coefficient is δ. Fisher's analysis makes it possible to compute the distribution
law for the empirical correlation coefficient R from δ and the number of observations n (number of years during which observation took place). However,
this distribution law by itself still gives no answer to our question. Indeed,
x₁, x₂, …, xₖ can be chosen so that the correlation coefficient δ is as large as
possible; then although at most 5–7 values are introduced into the regression
equation, the stock of values from which these 5–7 values can be chosen is very
large.
We assume therefore that there are i groups of values
of k values each. Assume for simplicity that y is related to each of these groups
with true correlation coefficient δ. For any A Fisher's formulas allow us to
compute the probability that in each individual case the empirical correlation
coefficient R exceeds A. Let this probability be p. It is natural to assume that
with probability

P = 1 − (1 − p)ⁱ   (2)

the inequality R > A holds for at least one of the groups: this is so under the
assumption of independence of the deviations of R from δ corresponding to
different groups.¹ If, for example, i = 14 and p = 1/20, then

P = 1 − (1 − 1/20)¹⁴ ≈ 1/2.
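The arithmetic of the example can be reproduced directly (an illustrative computation):

```python
# Formula (2): probability that at least one of i groups yields an
# empirical correlation coefficient R above the threshold, when each
# group does so independently with probability p.
i, p = 14, 1 / 20
P = 1 - (1 - p) ** i
print(round(P, 3))  # about 1/2
```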
m = n − k − 1,  β = √m tanh⁻¹ δ,

then find B from β and m via Fisher's tables² and, finally, find A by the formula

B = √m tanh⁻¹ A.
1 This assumption is quite arbitrary, but in reality the number of groups from
which to choose might be even greater than the i = 14 taken for the calculation in
the example; this, I hope, justifies the arbitrariness of our assumption.
2 Tables on p.665 in [1].
§3. The established fact of "blowing up" of the correlation coefficient does
not necessarily mean that these statistical regression formulas cannot be used.
We must make the number k of values involved in the regression equation and
the stock of values from which they are chosen as low as possible; in this case
the risk of obtaining an artificially "blown-up" correlation coefficient can be
significantly diminished.
More essential for evaluating future prospects for the statistical establishment of forecasting equations is to find out to what extent meteorological series satisfy the stability and independence conditions. The very existence of secular changes or periodic fluctuations does not in itself preclude
the use of studies stemming from assumptions on stability and independence,
if the role of these secular or periodic fluctuations in forming the deviations
Δy, Δx₁, Δx₂, …, Δxₖ is insignificant.
The simplest method for checking the stability of the series and the independence of its terms is the following: let Δy(i) be the value of the deviation Δy
for the i-th year. We need to compute, over a long period of n years, the correlation coefficients between Δy(i) and Δy(i+1), between Δy(i) and Δy(i+2), etc. If
these correlation coefficients deviate from zero within the bounds corresponding to theoretical computations made under the assumption of independence
and stability, this confirms our hypothesis. Indeed, secular change, or periodic fluctuation with period of four years or more, inevitably leads to positive
correlation between Δy(i) and Δy(i+4), whereas short fluctuations during two
or three years lead to positive correlation between Δy(i) and Δy(i+2) or Δy(i)
and Δy(i+3). In the case of stability and independence the expectation of the
square of each of the above correlation coefficients is approximately 1/(n − 3),
for n not too small.
Such computations were made for the average monthly temperatures in
Leningrad. For each month the correlation coefficients between Δy(i) and
Δy(i+1), Δy(i+2) and Δy(i+3) were computed from the data over a hundred
years. The mean square of these three correlation coefficients over twelve
months turned out to be 0.065, 0.064 and 0.110, which is in good agreement
with the theoretical value:

1/√(n − 3) = 1/√97 ≈ 0.102.
Now consider two series y(i) and x(i). If y is related to x with positive correlation, then ΔyΔx has positive expectation. To study the stability of the correlation between y(i) and x(i) we must form the correlation coefficients between Δy(i) − Δx(i) and Δy(i+1) − Δx(i+1), Δy(i+2) − Δx(i+2), etc. If the double series is stable and the pairs y(i), x(i) relating to different years are independent, then the square of this correlation coefficient has expectation 1/(n − 3), as in the first case. In this way the stability of the correlation between average temperatures of two successive months in Leningrad was studied (it is known to be a significant positive correlation): the mean square of the correlation coefficient between Δy(i) − Δx(i) and Δy(i+1) − Δx(i+1) over twelve pairs of adjacent months was 0.071, which is also in good agreement with the theoretical value 0.102. Similar treatment can be given to the stability of the variability of y by forming the correlation coefficients between (Δy(i))² and (Δy(i+1))², (Δy(i+2))², etc.
Only such systematic studies of the stability of meteorological series can solve the problem, whereas separate remarks to the effect that a certain correlation coefficient appeared to be unstable can only distort our ideas on the real state of affairs.
§4. We now assume that the requirements of stability and independence for our series are fulfilled. If the number of observation years is large enough, then under our assumption even quite complex regression equations selected on the basis of these observations would reflect with sufficient accuracy the regularities that are true for the entire series. However, for n = 30 or even 50 or even 100 the situation changes considerably: if too many values are introduced from which we make our choice for entering in our equation, then the danger is that nice combinations appear at random whose use for forecasting would be quite unjustified.
On the contrary, if serious theoretical observations show that the value y
to be forecast can to a considerable degree be determined, for example, by three
values x1, x2, x3, then the computation of the regression formula (1) which relates Δy to Δx1, Δx2 and Δx3, based on 50 years' observations, should be considered reliable. Here we mean theoretical considerations dynamically justified, and not "theories" of purely statistical origin. However, even purely statistical
techniques might give a hint of where to look for non-random correlation rela-
tions. Such an attempt on a broad scale was made by Baur [3]. He computed
correlation coefficients between average monthly temperatures in Iceland and
atmospheric pressure in previous months at various points of the Earth. It ap-
peared that for the stations on the Southern hemisphere the absolute values of
the correlation coefficients do not exceed on the average (over different stations
and twelve months) the values predicted theoretically under the assumption of independence. Here it is convenient to use the fact that, under the transformation
z = tanh⁻¹ r,
the distribution of the correlation coefficient r turns into the normal law for z with centre at the origin and standard deviation
σ = 1/√(n − 3),
where n is the number of observations.
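This normal approximation can likewise be verified numerically (a sketch with illustrative values of n = 50 pairs and 3000 trials; the two series are generated independently, so the true correlation is zero):

```python
import math
import random

def corr(xs, ys):
    """Sample correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(2)
n, trials = 50, 3000
zs = []
for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    ys = [random.gauss(0.0, 1.0) for _ in range(n)]
    zs.append(math.atanh(corr(xs, ys)))   # z-transform of the sample correlation
mean = sum(zs) / trials
std = math.sqrt(sum((z - mean) ** 2 for z in zs) / trials)
print(std)    # close to 1/sqrt(n - 3) = 1/sqrt(47)
```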
References
of q' and q̇' for any t' > t. We assume, moreover, that G does not depend on the behaviour of the system before time t.
It is natural to assume that²
where Δt = t' − t and E is the expectation symbol. Relations (1) and (2) imply
RANDOM MOTIONS 177
Under the assumptions (2)-(8), which are quite natural from the point of view of physics, it is clear that G is a fundamental solution of the following differential equation of Fokker-Planck type³
(9)
If f and k are constants, then the fundamental solution of (10) can be represented by
G = (√3/(2πk(t' − t)²)) exp{ −(q̇' − q̇ − f(t' − t))²/(4k(t' − t)) − 3(q' − q − ½(q̇' + q̇)(t' − t))²/(k(t' − t)³) }. (11)
Clearly Δq̇ is of the order (Δt)^(1/2), that is, it behaves like Δx in the case of a general continuous process. However, for Δq we obtain⁴
Δq = q̇Δt + o(Δt). (12)
It can be proved that the latter relation also holds in the case of the general
equation (9).
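The two orders of magnitude can be illustrated by simulating the process with f = 0 and the normalization E(Δq̇)² = 2kΔt (an illustrative convention; the constants are not fixed by the excerpt):

```python
import math
import random

random.seed(3)
k = 1.0   # diffusion constant, illustrative normalization E(dq')^2 = 2k*dt

def one_step(dt, substeps=64):
    """Integrate dq' = sqrt(2k) dW, dq = q' dt over an interval of length dt,
    starting from q = q' = 0, so the returned x equals Δq - q̇Δt."""
    h = dt / substeps
    v = x = 0.0
    for _ in range(substeps):
        v += math.sqrt(2 * k * h) * random.gauss(0.0, 1.0)
        x += v * h
    return v, x

trials, dt = 4000, 0.1
sv = sx = 0.0
for _ in range(trials):
    v, x = one_step(dt)
    sv += v * v
    sx += x * x
mean_v2 = sv / trials   # ~ 2k*dt:        Δq' is of order (Δt)^(1/2)
mean_x2 = sx / trials   # ~ (2/3)k*dt^3:  Δq - q̇Δt is of order (Δt)^(3/2)
```

The second moment of Δq − q̇Δt is of order (Δt)³, consistent with (12).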
References
4 This result means that for any constant K, however large, the probability of |Δq − q̇Δt| ≥ KΔt is smaller than any fixed ε > 0 uniformly in Δt.
20. DEVIATIONS FROM HARDY'S FORMULAS
UNDER PARTIAL ISOLATION *
The profound mathematical studies by R. Fisher [1] and S. Wright [2] deal
with the evolution of gene concentration in a population in which free cross-
ing dominates. The purpose of this paper is to give a method for obtaining
similar results for a population consisting of a large number of partial popu-
lations weakly connected with each other. Mathematical analysis was applied
to the following scheme: a population with a constant number of individuals
N consisting of s partial populations of n individuals each (N = sn) with free
crossing in each partial population and in which in every generation on the av-
erage k "wandering" individuals are isolated from every population; regardless
of their origin, wandering individuals randomly join any of the partial popula-
tions, where they take part in creating the next generation. This scheme was
indicated as a possible one by N.P. Dubinin and D.D. Romashov. A number of
other, not less interesting, schemes of restricted crossing do not yet succumb
to mathematical treatment.
180 DEVIATIONS FROM HARDY'S FORMULAS UNDER PARTIAL ISOLATION
differential equation
(1/2) ∂²(Bu)/∂p² − ∂(Au)/∂p = 0.
Solving this equation we obtain
u(p) = p^(4kp̄−1) q^(4kq̄−1) / B(4kp̄, 4kq̄). (1)
Wright obtained (1) using a different method. It can be proved that (1) not only gives the probability distribution for the concentration p but, for a sufficiently large number s of partial populations, the actually observed distribution of partial populations over concentrations p will also be given by (1).
Under these assumptions Hardy's formula holds for every partial population, that is, the concentrations of individuals of the types AA, Aa and aa are equal to q², 2pq and p². The concentrations of individuals of the types AA, Aa and aa in a large population can be computed by the formulas:
AA = ∫₀¹ q² u(p) dp,
Aa = 2 ∫₀¹ pq u(p) dp,
aa = ∫₀¹ p² u(p) dp.
The computations give
AA = (4k/(4k + 1)) q̄² + (1/(4k + 1)) q̄,
Aa = 2 (4k/(4k + 1)) p̄q̄, (2)
aa = (4k/(4k + 1)) p̄² + (1/(4k + 1)) p̄.
Formulas (2) give the solution to the problem raised in the title of the
paper.
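Formulas (2) can be checked numerically against the density (1) (a sketch with the illustrative values k = 1 and p̄ = q̄ = 1/2, for which u is the Beta(2, 2) density):

```python
import math

def beta(a, b):
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

k, p_bar = 1.0, 0.5
q_bar = 1 - p_bar
a, b = 4 * k * p_bar, 4 * k * q_bar          # exponents in the density (1)

def u(p):
    return p ** (a - 1) * (1 - p) ** (b - 1) / beta(a, b)

def simpson(f, lo, hi, n=2000):
    """Composite Simpson quadrature on [lo, hi]."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

AA = simpson(lambda p: (1 - p) ** 2 * u(p), 0, 1)
Aa = simpson(lambda p: 2 * p * (1 - p) * u(p), 0, 1)
aa = simpson(lambda p: p ** 2 * u(p), 0, 1)
# closed forms from (2) with these values: AA = 0.3, Aa = 0.4, aa = 0.3
```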
Due to the absence of selection, the expectation E(Δp) of the increment of the total gene concentration (in the large population) over one generation is 0, while the intensity of random concentration fluctuations in the large population is E(Δp)². This latter value is computed by the formula
E(Δp)² = (1/s) ∫₀¹ E(Δp)² u(p) dp = (1/(2sn)) ∫₀¹ pq u(p) dp.
Computations give
E(Δp)² = (4k/(4k + 1)) p̄q̄/(2N). (3)
In the case of free crossing we have, according to Fisher and Wright, E(Δp)² = p̄q̄/(2N).
§2. The effect of selection. Formula (1), Hardy's formula for partial populations and its substitute (2) for a large population remain valid in the presence of selection when the selection coefficient a is much less than 1/n. We consider only this case: accordingly, the formulas below are merely asymptotic formulas, true as na → 0. Of special interest is the case of a recessive gene. In this case the mean increment of gene concentration due to selection is ap²q for any partial population. Therefore, for the total concentration we have
E(Δp) = a ∫₀¹ p²q u(p) dp,
or
E(Δp) = a (4k/((4k + 2)(4k + 1))) p̄q̄ (4kp̄ + 1). (4)
Formula (4) can be used to confirm, for the above scheme, the general
statement on the existence of an optimum of partial isolation for the selec-
tion of recessive genes, which was suggested and quantitatively justified by
A.A. Malinovskii. It is easy to compute the k corresponding to the highest rate of selection. For small p̄ this optimal k is
k0 = (1/4)√2 ≈ 0.35.
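This value can be confirmed by maximizing the selection-rate factor from (4) on a grid (the grid bounds are illustrative):

```python
import math

# for small mean concentration p̄ the rate of selection in (4) is
# proportional to g(k) = 4k / ((4k + 2)(4k + 1))
def g(k):
    return 4 * k / ((4 * k + 2) * (4 * k + 1))

ks = [i / 10000 for i in range(1, 20001)]   # grid over (0, 2]
k_best = max(ks, key=g)
print(k_best)   # close to sqrt(2)/4 ~ 0.354
```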
Mathematical Institute of Moscow State University
18 June 1935
References
The considerations given below, though simple, are, I believe, new and interesting for certain physical applications, especially in the analysis of reversibility of statistical laws of nature made by Schrödinger¹ for one particular case. In what follows it is a matter of indifference which of the two following assumptions is made: either the time variable t runs through all real values, or only through the integers. The classical understanding of Markov chains corresponds to the second possibility.
Consider a physical system which at any given time t can be in one of the states of a finite set E1, E2, ..., EN. Assume that for any pair of states Ei and Ej and every pair of moments t and s, t ≤ s, the conditional probability Pij(t, s) that the state Ej takes place at time s under the assumption that at t the system was in state Ei, is defined. A significant, but not always explicitly stated assumption is that the conditional probability Pij(t, s) is independent of the prehistory of the system before t. This assumption holds in deducing the fundamental equation of the theory of Markov chains,
Pij(t, s) = Σk Pik(t, u) Pkj(u, s), t ≤ u ≤ s. (2)
Moreover,
Σj Pij(t, s) = 1, (3)
Pij(t, t) = δij. (4)
So far we have only considered conditional transition probabilities Pij(t, s). There
arises the question whether it is possible, knowing the transition probabilities
* 'Zur Theorie der Markoffschen Ketten', Math. Ann. 112 (1936), 155-160.
1 Berliner Berichte (1931), 144.
ON THE THEORY OF MARKOV CHAINS 183
(5)
To prove this theorem we first note that for each t0, as we have already seen, there exists at least one system Qk^(t0)(t) determined for all t ≥ t0 and satisfying (5) and (6) for s ≥ t ≥ t0. Using a diagonal process, we can extract from the sequence
t0 = −1, −2, −3, ...
a subsequence
λ1, λ2, ..., λn, ...
such that λn → −∞ and for each k and each integer t the quantities Qk^(λn)(t), defined at each fixed t for all sufficiently large n, tend to a certain
limit Qk(t) as n → ∞. From (6) it is easy to deduce that for all real non-integer t there also exist limit values Qk(t). These limit values, as can be proved by passing to the limit, satisfy (5) and (6). This proves our theorem.
Of special interest is the case when the absolute probabilities are uniquely defined by the transition probabilities Pik(t, s). A necessary and sufficient condition for such uniqueness is the following:
For arbitrary fixed k and s, Pik(t, s) tends to a certain limit Qk(s), independent of i, as t → −∞. If this condition holds, then it is precisely these limits Qk(s) that form the desired unique system of absolute probabilities.
Let us first prove sufficiency. Let Qk(t) be certain absolute probabilities compatible with the transition probabilities Pik(t, s). Then
Qk(s) = Σi Qi(t) Pik(t, s), (5)
but since Pik(t, s) → Qk(s) as t → −∞, the right-hand side of (5) tends to
Σi Qi(t) Qk(s) = Qk(s).
Now suppose that our condition fails. Then we can choose i1 and i2 and two sequences
t'1 > t'2 > ... > t'n > ... → −∞,
t''1 > t''2 > ... > t''n > ... → −∞,
such that Pi1k(t'n, s) and Pi2k(t''n, s) tend to limits Q'k(s) and Q''k(s) respectively, as n → +∞, for arbitrary k and s; furthermore, Q'k(s) and Q''k(s) are not identically equal for all s. Then it is easy to prove that both Q'k(s) and Q''k(s) can be taken as absolute probabilities, implying the necessity of the condition.
3. Inverse probabilities
Σk Πik(s, t) = 1, (3*)
Πik(t, t) = δik, (4*)
We now assume that the transition probabilities Pik(s, t) depend only on the difference t − s:
Pik(s, t) = Pik(t − s).
As is known, in this case there exists at least one system of absolute probabil-
ities independent of time t:
(8)
Since
(9)
In the discrete case (when t runs only through the integers) it suffices to require
(10) only for r = 1. The proof is elementary, and we leave it to the reader.
In particular, if the transition probabilities are symmetric:
(11)
then (10) holds. Hence, the symmetry condition (11) is sufficient for (8).
5. Conclusion
These almost trivial facts find many physical applications. Here we confine ourselves to an example which is different from Schrödinger's original example. Suppose that a circle is divided into a very large number M of identical intervals. A large number L of mobile particles move along this circle so that each particle, independently of the others, passes at every step into the neighbouring interval, either to the right or the left, each possibility with probability 1/2. There are M^L = N various possible patterns of L particles on M intervals. The absolute probabilities Qk corresponding to these N possibilities are all equal to each other and to 1/N. The transition probabilities Pik(r) are clearly symmetric, and therefore the invertibility equation (8) holds. If we consider the "macroscopic" distribution of the particles along the circle, then with probability very close to one it will be uniform. If it is known (though a priori this is highly unlikely) that at a certain time t0 a considerable deviation from this uniform distribution takes place, then with probability very close to 1 we can assert that this non-uniformity will die out for t > t0 approximately in accordance with the diffusion differential equation. Formula (8) now implies the same behaviour for t < t0, with the same differential equation but with the opposite sign of the time variable.
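The example can be verified directly for one particle (L = 1), where the states are the M intervals themselves (an illustrative reduction of the scheme): the transition matrix is symmetric and the uniform distribution is stationary, which is exactly the situation covered by conditions (8) and (11).

```python
M = 8                                  # number of intervals on the circle
P = [[0.0] * M for _ in range(M)]
for i in range(M):
    P[i][(i - 1) % M] = 0.5            # step to the left neighbour
    P[i][(i + 1) % M] = 0.5            # step to the right neighbour

Q = [1.0 / M] * M                      # uniform absolute probabilities

# symmetry P_ik = P_ki and stationarity of Q under P
symmetric = all(abs(P[i][j] - P[j][i]) < 1e-12
                for i in range(M) for j in range(M))
QP = [sum(Q[i] * P[i][j] for i in range(M)) for j in range(M)]
```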
Mathematical Institute of Moscow State University
20 May 1935
22. ON THE STATISTICAL THEORY OF METAL CRYSTALLIZATION *
This paper gives a rigorous solution to the problem of the rate of a crystallization process under certain schematic, but still sufficiently general assumptions.
The study of the process of crystal growth after random formation of crystallization centres is of significant importance for metallurgy. In this connection it is fairly difficult to take account of collisions between the crystal grains appearing around separate crystallization centres. These collisions disturb the grain form by preventing the growth of crystals in certain directions. In papers by F. Göler and G. Sachs [1], G. Tammann [2], B.V. Stark, I.L. Mirkin and A.N. Romanovskii [3], and others only rough approximation formulas for the growth of crystal matter are given. In this paper I give, under rather wide assumptions, an exact formula for the probability p(t) that a randomly taken point P of the volume filled with a crystallized substance is inside the crystal body within the crystallization period. To a sufficient approximation it can be considered that the amount of substance crystallized over time t is also p(t). In conclusion I determine the number of crystallization centres formed throughout the entire crystallization process.
I extend my gratitude to I.L. Mirkin, who interested me in this problem and kindly provided me with all the necessary material.
and that of more than one centre is o(Δt), where o(Δt) is an infinitesimal with respect to Δt. These probabilities do not depend on the distribution of the crystallization centres formed prior to time t provided that it can be guaranteed (see later) that at the time t there is no crystal bulk in V'.
* Izv. Akad. Nauk SSSR Ser. Mat. 3 (1937), 355-360. Presented by S.N. Bernshtein.
ON THE STATISTICAL THEORY OF METAL CRYSTALLIZATION 189
c(t, n) = k(t)c(n),
depending on time t and direction n. We assume that the ends of the vectors of
length c( n) measured in the direction n from the origin form a convex surface.
Under these conditions an essential restriction is that although c(t, n) may
depend on n, this dependence should be the same at all points. In other
words, the formulas given below hold either under the simplifying assumption
of uniform growth in all directions, or for crystals of arbitrary shape similarly
oriented in space.
c̄³ = (1/4π) ∫_S c³(n) dσ,
where we integrate over the surface of the unit sphere S with centre at the origin. Clearly, at t > t0 the volume of the crystal growing freely round a centre formed at the moment t0 is given by
(4π/3) c̄³ (∫_{t0}^{t} k(τ) dτ)³.
max_n c(n) ∫_{t'}^{t} k(τ) dτ
and
c(n) ∫_{t'}^{t} k(τ) dτ,
where n is the direction P'P. For a fixed t' the volume occupied by the points P' satisfying our conditions is
α(t')V'(t')Δt' + o(Δt'),
and the probability that this does not happen is
1 − α(ti)V'(ti)Δt' + o(Δt').
Therefore the probability that the point P does not belong to the crystallized bulk at the moment t is
q(t) = Π_{i=1}^{s} {1 − α(ti)V'(ti)Δt'} + o(1), (1)
where t = sΔt', ti = iΔt', and o(1) is infinitesimal if Δt' is infinitesimal.
Taking the logarithm of (1) we obtain
ln q(t) = −(4π/3) c̄³ ∫₀ᵗ α(t') (∫_{t'}^{t} k(τ) dτ)³ dt'. (2)
p(t) = 1 − q(t),
p(t) = 1 − exp{−(4π/3) c̄³ Ω}, (3)
where
Ω = ∫₀ᵗ α(t') (∫_{t'}^{t} k(τ) dτ)³ dt'. (4)
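The reasoning behind (1)-(4) can be illustrated by a one-dimensional analogue (an illustrative sketch, not part of the paper): centres appear on a line as a space-time Poisson process of constant rate α per unit length and time, and each grows at speed c in both directions; the analogue of (3)-(4) then gives q(t) = exp(−αct²) for the probability that a fixed point is still uncovered at time t.

```python
import math
import random

random.seed(7)
alpha, c, t, L = 1.0, 1.0, 1.0, 2.0    # rate, growth speed, time, half-window

def poisson(mean):
    """Knuth's Poisson sampler (adequate for small means)."""
    l, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= random.random()
        if p < l:
            return k
        k += 1

trials, covered = 20000, 0
for _ in range(trials):
    n = poisson(alpha * 2 * L * t)          # centres in [-L, L] x [0, t]
    for _ in range(n):
        x = random.uniform(-L, L)
        s = random.uniform(0.0, t)
        if abs(x) <= c * (t - s):           # grain born at (x, s) reaches origin
            covered += 1
            break
q_est = 1 - covered / trials
print(q_est)    # close to exp(-alpha * c * t**2) = exp(-1)
```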
§3. Conclusions
When V is large enough as compared with the sizes of the individual grains, we may set V1(t) = V p(t) or, in view of (3),
V1(t) = V (1 − exp{−(4π/3) c̄³ Ω}), (5)
where Ω is defined by (4). The formula (5) for the volume V1(t) of the substance crystallized over the time t gives a solution of the first problem posed in the introduction. If α(t) = α and c(t, n) do not depend on time, then we can set k = 1, so that (4) gives Ω = αt⁴/4.
For a sufficiently large volume V the following formula holds for the number N(t) of crystallization centres formed over the period t:
N(t) = V ∫₀ᵗ α(τ) q(τ) dτ, (6)
or
(6a)
where
For t = +∞, (6) and (6a) give the total number of crystallization centres throughout the whole process. In particular, for constant α(t) = α and k = 1 we obtain
(7a)
Note also the special case when all the crystallization centres are formed at the very beginning, with an average of β centres per unit volume. The corresponding formulas are derived from the general formulas by passage to the limit. Instead of (4) we obtain
Ω = β (∫₀ᵗ k(τ) dτ)³, (4b)
formula (5) still holds and (6) is replaced for any t > 0 by the trivial identity N = Vβ. If we assume, moreover, that k = 1 (that is, c(t, n) is independent of t), then we obtain
Ω = βt³. (4c)
References
§1. Notation
Denote by Ei the various possible states of the system under study, where i runs through all positive integers. We note that the following account is also applicable to the case when i takes a finite number of values; in this case a number of the statements can be simplified. The probabilities pij of transition from Ei to Ej in one step are, as usual, assumed to be subject to the conditions
pij ≥ 0, (1)
Σj pij = 1, (2)
pij^(0) = δij, (3)
where
δij = 1 for i = j,
δij = 0 for i ≠ j.
194 MARKOV CHAINS WITH A COUNTABLE NUMBER OF POSSIBLE STATES
Lij = Σ_{n=1}^{∞} Kij^(n). (6)
Clearly, Lij ≤ 1. If Lij = 1 then, starting from Ei, the system will inevitably, sooner or later, visit the state Ej. The expectation of the number of steps needed in this case for the transition from Ei to Ej is
Mij = Σ_{n=1}^{∞} n Kij^(n). (7)
A state Ei is called inessential if there exist a j and an n such that Pij^(n) > 0 and Pji^(m) = 0 for all m, that is, there is a transition from Ei to Ej without return to Ei. All other states are called essential. Clearly, if two states Ei and Ej are essential and if there exists an n with Pij^(n) > 0, then there also exists an m with Pji^(m) > 0. If there are such n and m, then the essential states Ei and Ej are called communicating. If Ei communicates with Ej and Ej communicates with Ek, then Ei also communicates with Ek. Therefore all essential states fall into classes S^(α), such that states belonging to one class communicate and those belonging to different classes do not communicate. It is also clear that for an essential state Ei and an inessential state Ej, Pij^(n) is always 0. Thus, once having fallen into a state of S^(α), our system can never leave this class of states.
Consider now an essential state Ei. Let 𝔐i be the set of indices n for which Pii^(n) > 0. Since Ei is essential, the set 𝔐i is non-empty. If n and m occur in 𝔐i, then so does n + m. Let di be the greatest common divisor of all numbers in 𝔐i. The set 𝔐i consists only of multiples of di. It can easily be shown that all sufficiently large multiples of di occur in 𝔐i. The number di is called the period of the state Ei.
It can easily be shown that all states belonging to the same class S^(α) have the same period, which we denote by d^(α) and call the period of S^(α). Indeed, for two states Ei and Ej of the same class S^(α) let there exist n and m such that Pij^(n) > 0 and Pji^(m) > 0 (such n and m do exist, as indicated above). Then Pjj^(kdj) > 0 for sufficiently large k. Hence, for sufficiently large k,
Pii^(kdj + n + m) ≥ Pij^(n) Pjj^(kdj) Pji^(m) > 0,
that is, all sufficiently large numbers of the form kdj + n + m occur in 𝔐i, which is only possible if dj is divisible by di. Conversely, di is also divisible by dj and therefore di = dj.
For two states Ei and Ej belonging to the same class S^(α) we simultaneously have Pi0i^(n) > 0 and Pi0i^(m) > 0 only if n ≡ m (mod d^(α)). Therefore, having chosen a certain state Ei0 of class S^(α), we obtain for any state Ei of the same class a well-defined number β(Ei) = 1, 2, ..., d^(α) such that Pi0i^(n) > 0 is possible only for n ≡ β(Ei) (mod d^(α)). All the states Ej with given β(Ej) belong to the subclass S_β^(α). Thus, S^(α) is divided into d^(α) subclasses S_β^(α). With every step our system inevitably goes from the states of S_β^(α) into one of the states of S_{β+1}^(α), and in the case β = d^(α) into one of the states of S_1^(α). Thus if Ei and Ej belong respectively to subclasses S_β^(α) and S_γ^(α), then Pij^(n) ≠ 0 only if n ≡ γ − β (mod d^(α)). On the other hand, for sufficiently large n satisfying the latter congruence we actually have Pij^(n) > 0.
Apart from the probability Lij that, having started from the state Ei, we visit at least once, sooner or later, the state Ej, we introduce the probability ρij that, starting from the state Ei, we visit the state Ej an infinite number of times. Clearly,
(8)
Proof. Denote by ri^(k) the probability that, starting from a state Ei, we return to it no fewer than k times. Clearly, ri^(k) = (Lii)^k for all k.
Theorem 1b. Within one class either all the Lii < 1 or all the Lii = 1.
It should be noted that when all the Lii < 1, there might still be some Lij = 1 (i ≠ j).
If all the Lij = 1 and all the ρij = 1, then the class is called recurrent. If, conversely, all the Lii < 1 and all the ρij < 1, then the class is called non-recurrent. It is easy to see that if a state Ej is of a non-recurrent class and Ei is arbitrary, then
lim_{n→+∞} Pij^(n) = 0.
§4. Positive and zero classes
πij^(n) = (1/n)(Pij^(1) + Pij^(2) + ... + Pij^(n)). (10)
Lemma 2a. For any Ei from a recurrent class with finite Mii,
lim_{n→+∞} πii^(n) = 1/Mii.
Proof. Starting from Ei we return to this state an infinite number of times with probability 1. Let the first return to Ei take place at the n1th step, the second return at the n2th step and the kth return at the nkth step. The differences Zk = nk − nk−1 form a sequence of random variables independent of each other with the same distribution law: Zk = s with probability Kii^(s). Clearly, Mii is none other than the expectation of each of the Zk.
Assume first that the expectation Mii of the random variables Zk is finite. Then, according to Khinchin's theorem [3], the sequence {Zk} satisfies the law of large numbers, that is, for any ε > 0 there exists a k0 such that for k ≥ k0 the probability of the inequality
|(1/k) Σ_{j=1}^{k} Zj − Mii| = |nk/k − Mii| ≥ εMii
is less than ε.
It can easily be seen that k' ≥ k0, k'' ≥ k0. Therefore, with probability greater than 1 − ε we can assert that nk' < n, and with probability greater than 1 − ε that nk'' > n. Thus if n ≥ n0, then with probability greater than 1 − 2ε we have the inequality k' ≤ ψn ≤ k'', that is, the number ψn of returns to the state Ei within the first n steps is between k' and k''. It can easily be seen that the expectation of the frequency ψn/n of returns to Ei within the first n steps is πii^(n). Since for n ≥ n0 we have k'/n < ψn/n < k''/n with probability greater than 1 − 2ε:
k'/n = (1 − ε)/Mii < ψn/n < (1 + ε)/Mii = k''/n,
and since always 0 ≤ ψn/n ≤ 1 and Mii ≥ 1, we finally obtain for n ≥ n0
|πii^(n) − 1/Mii| < 3ε.
Now consider the case Mii = +∞. Then, whatever M < +∞ and ε > 0, there exists a k0 such that for k ≥ k0 the probability of the inequality
(1/k) Σ_{j=1}^{k} Zj = nk/k < M
is less than ε. Arguing as before, we obtain
lim_{n→+∞} πii^(n) = 0.
Theorem 2. Within one class either all the Mii are infinite or all finite.
Proof. For any two states Ei and Ej of the same class there exist k and m such that Pij^(k) > 0, Pji^(m) > 0. Furthermore it is clear that for any n,
Pjj^(n+k+m) ≥ Pji^(m) Pii^(n) Pij^(k).
Hence, the limits of πii^(n) for all i (corresponding to a given class) are either all zero or all positive. Therefore, according to Lemma 2a, the Mii are also either all infinite or all finite.
It only remains to prove that the finiteness of all the Mii implies the finiteness of all the Mij. We denote by Rij^(n) the probability that, starting from Ei, we visit Ej (j ≠ i) in n steps without visiting Ei meanwhile. Then¹
Mii = Σ_{m=1}^{n} m Kii^(m) + Σ_{j≠i} Rij^(n) (Mji + n).
But within one class for any i and j we can find an n for which Rij^(n) > 0. Hence Mji = +∞ would imply Mii = +∞, which proves our theorem.
The classes with all the Mii finite are called positive, and those with all the Mii = +∞ are called zero. Note that in zero classes some of the Mij (i ≠ j) may be finite.
Theorem 3. In a zero class Pij^(n) → 0 as n → +∞, for any Ei, Ej in the given class.
To prove Theorem 3 we need the following:
Lemma. The quantity
πjj^(n,m) = (1/m)(Pjj^(n+1) + Pjj^(n+2) + ... + Pjj^(n+m)) (11)
tends to 0 as m → +∞, uniformly in n.
We decompose the probabilities entering (11) according to whether or not there was a return to Ej within the first n steps. Then
(12)
Choose an r0 such that for r ≥ r0 we always have πjj^(r) < ε. (This is possible by Lemma 2a.) Choose m0 > r0/ε and let m ≥ m0. Then for m − s ≤ r0 we have (m − s)/m < ε, while for m − s ≥ r0 we have πjj^(m−s) < ε.
Since always (m − s)/m ≤ 1 and πjj^(m−s) ≤ 1, it follows that for all m ≥ m0 we have
((m − s)/m) πjj^(m−s) < ε.
Hence we see from (12) that πjj^(n,m) < ε + 1/m as soon as m ≥ m0. Since ε > 0 is arbitrary and does not depend on n, our lemma is proved.
Proof of Theorem 3. First note that to prove this theorem it suffices to prove it for i = j, that is, to prove that the probabilities Pjj^(n) tend to 0 as n → +∞. To see this, we choose an m such that Pij^(m) > 0. Then, clearly,
For any ε > 0 there exists n0 such that n ≥ n0 implies Pjj^(n) ≤ A + ε. For some δ > 0 choose m0 ≥ a such that
Σ_{m>m0} Kjj^(m) < δ.
Then
n ≥ n0 + m0 and Pjj^(n) ≥ A − η (η > 0)
imply that
In fact,
a ≤ m < m0
implies that
Pjj^(n−a) ≥ A − (η + ε + δ)/A = A − η1,
where
η < η1 < η2 < ... < ηs.
Now, for any s we can choose η > 0, ε > 0, δ > 0 so that ηs < A/2. By choosing n0 and m0 suitably we obtain for all n ≥ n0 + m0 + sa such that Pjj^(n) > A − η the inequalities
Theorem 4a. In a positive class S^(α), for any Ei from a subclass S_β^(α) and Ej from a subclass S_γ^(α), the probability Pij^(n) tends to a limit
Pj = d^(α)/Mjj,
which is independent of i, when n → ∞ runs through the values n ≡ γ − β (mod d^(α)).
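Theorem 4a can be observed numerically for a small illustrative chain with d = 1, where Pij^(n) → 1/Mjj and 1/Mjj equals the stationary probability of Ej (the transition matrix below is an arbitrary example, not from the text):

```python
# illustrative 3-state aperiodic chain (period d = 1)
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

Pn = P
for _ in range(60):          # P^61: rows converge to the limits P_j
    Pn = matmul(Pn, P)

# stationary distribution here is (1/4, 1/2, 1/4), so M11 = 4, M22 = 2, M33 = 4
```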
Lemma 3. In a positive class for any i, j and ε > 0 there exists m such that for any n the probability that, starting from Ei, we visit Ej at least once between the nth and the (n + m)th step is greater than 1 − ε.
Proof. The probability that, starting from Ei, we do not visit Ej within the time interval indicated does not exceed
Σ_{p=m}^{∞} Kij^(p) + Σ_{p=m+1}^{∞} p Kjj^(p) = u(m).
But if
Mjj = Σ_{p=1}^{∞} p Kjj^(p)
is finite, then u(m) → 0 as m → +∞. Since u(m) does not depend on n, our Lemma is proved.
lim inf_{n→+∞} Pij^(n) > 0.
Proof. If the class consists of one subclass, then there exists k0 such that always Pjj^(k) > 0 for k ≥ k0. In accordance with Lemma 3, we choose m such that for given i and j the number ε from Lemma 3 can be taken equal to 1/2. We set
λ = inf{Pjj^(k0), Pjj^(k0+1), ..., Pjj^(k0+m)}.
Clearly, λ > 0. Now let n' > m + k0. Set n' = n + m + k0. The probability that, starting from Ei, we visit Ej between the nth and the (n + m)th step is greater than 1/2.
Suppose that the first arrival into Ej between the nth and the (n + m)th step takes place after the (n + s)th step (s < m). Then the conditional probability of revisiting Ej in n' = n + m + k0 steps is Pjj^(k0+m−s) ≥ λ. This inequality holds for any s (1 ≤ s ≤ m). Therefore, the total probability Pij^(n') is at least λ/2, which proves the lemma.
Lemma 4b. In a positive class consisting of a single subclass,
lim_{n→+∞} Pii^(n) = 1/Mii.
Proof. First we prove the existence of the limit of Pii^(n) as n → +∞. Let
a = lim inf_{n→+∞} Pii^(n), b = lim sup_{n→+∞} Pii^(n).
Let n > m + k0 be such that Pii^(n) < a + ε, and n' > n + k0 such that Pii^(n') > b − ε (such n and n' always exist). Put n' − n = k. Then
Pii^(n') = Pii^(k) Pii^(n) + A^(1) Pii^(n−1) + A^(2) Pii^(n−2) + ... + A^(n) Pii^(0),
where A^(s) is the probability that, starting from Ei, we visit Ei after the kth step for the first time only in (k + s) steps. Clearly,
Pii^(k) + Σ_{s=1}^{n} A^(s) ≤ 1.
Moreover, by Lemma 3,
Pii^(k) + Σ_{s=1}^{m} A^(s) > 1 − ε.
Note also that for s ≤ m we have n − s > k, and hence Pii^(n−s) < b + ε. Therefore,
Pii^(n') = Pii^(k) Pii^(n) + Σ_{s=1}^{m} A^(s) Pii^(n−s) + Σ_{s=m+1}^{n} A^(s) Pii^(n−s) ≤
≤ Pii^(k)(a + ε) + (1 − Pii^(k))(b + ε) + ε = b + 2ε − Pii^(k)(b − a).
Bearing in mind that Pii^(k) > a − ε, Pii^(n') > b − ε and b − a ≤ 1, we obtain
b − ε < b + 2ε − (a − ε)(b − a).
Since a > 0 and ε > 0 is arbitrary, it follows that b − a = 0. This proves the existence of the limit of Pii^(n), equal to b = a. Lemma 2a directly implies that this limit is 1/Mii.
Proof of Theorem 4a. Consider, together with the given Markov chain, a new chain determined by the elementary transition probabilities
P̃ij = Pij^(d),
where d is the period of the class considered. Clearly, for all states of our class
P̃ij^(n) = Pij^(nd).
With respect to the new Markov chain our class of states forms one single subclass. Therefore, according to Lemma 4b,
Pjj^(nd) → d/Mjj as n → +∞.
Thus Theorem 4a is proved for i = j. To prove it for the general case, let q be the minimal number of steps in which we can pass from Ei to Ej (clearly q ≡ γ − β (mod d)). Then
Pij^(nd+q) = Σ_{m=0}^{n} Kij^(md+q) Pjj^(nd−md).
In this case,
Σ_{m=0}^{∞} Kij^(md+q) = Lij = 1,
and Pjj^(nd−md) tends to d/Mjj for constant m and n → +∞. This implies that Pij^(nd+q) also tends to d/Mjj.
Theorem 4b. In a positive class the sum of the limits Pj over all states in a subclass is 1 for every subclass.
Theorem 4b follows directly from the next lemma:
Lemma 5. In a positive class there exists for any ε > 0 a finite system of states Ei1, Ei2, ..., Eik such that for any Ei from the same class and for all sufficiently large n,
Σ_{s=1}^{k} P_{i is}^(n) > 1 − ε.
We will prove that this system of states satisfies the conditions of the Lemma. For this we take some fixed i and choose q such that
Σ_{t=1}^{q} K_{i i0}^(t) > 1 − ε/3.
Now let n > m + q. Set n = q' + m, q' > q. With probability greater than 1 − ε/3, starting from Ei we visit Ei0 in the first q' steps. No matter at what step ≤ q' we first visit Ei0, with probability greater than 1 − ε/3 we return to Ei0 between the q'th and the (q' + m)th step. If this happens at some (q' + m − r)th step, then with probability greater than 1 − ε/3 we are in one of the chosen states Eis after q' + m steps. Thus with probability greater than (1 − ε/3)³ > 1 − ε, having started from Ei we arrive at one of our chosen states Eis in n = q' + m steps. This proves the Lemma and with it Theorem 4b.
Remark. Theorem 4a implies that not only does πii^(n) tend to 1/Mii (Lemma 2a), but also that for any Ej from the same class as Ei, the πji^(n) tend to the same limit. By Theorem 4b, the sum Σ(1/Mii) taken over the states of one subclass is equal to 1/d (where d is the period of the class), and the same sum taken over all the states of one class is equal to 1.
since, having once arrived in one of the states of the class $S^{(\alpha)}$, it is impossible to leave this class. In the case when we enter the class $S^{(\alpha)}$ at the initial state $E_i$, we denote by $n_0$ the number of steps before the first visit to one of the states $E_j$ of the class $S^{(\alpha)}$ and by $\beta_0$ the number of the subclass $S_{\beta_0}^{(\alpha)}$ to which this first state $E_j$ belongs. Now let $N_{i,\gamma}^{(\alpha)}$ be the probability that, given the
208 MARKOV CHAINS WITH A COUNTABLE NUMBER OF POSSIBLE STATES
initial state $E_i$, we visit the class $S^{(\alpha)}$ in such a way that $n_0 \equiv \beta_0 + \gamma \pmod{d^{(\alpha)}}$. Clearly,
$$\sum_{\gamma=1}^{d^{(\alpha)}} N_{i,\gamma}^{(\alpha)} = N_i^{(\alpha)}.$$
References
Consider an $n$-dimensional manifold $R$. Let $f(t,x,y)\,dy_1\,dy_2\cdots dy_n$ be the probability of transition in time $t > 0$ from a point $x$ to a point $\eta$ with coordinates $\eta_i$, $i = 1,2,\ldots,n$, such that $y_i < \eta_i < y_i + dy_i$. Assume that $f(t,x,y)$ is differentiable up to a certain sufficiently high order and satisfies the following conditions:
$$f(t,x,y) \ge 0, \qquad (1)$$
$$\int\cdots\int_R f(t,x,y)\,dy_1\,dy_2\cdots dy_n = 1, \qquad (2)$$
$$f(s+t,x,y) = \int\cdots\int_R f(s,x,z)\,f(t,z,y)\,dz_1\,dz_2\cdots dz_n, \qquad (3)$$
$$\int\cdots\int_G f(t,x,y)\,dy_1\,dy_2\cdots dy_n \to 1 \quad\text{as } t \to 0, \qquad (4)$$
where $x$ is an interior point of a domain $G$ (cf. [1], [2]). Given $f(t,x,y)$, $p(x)$ determines a stationary probability distribution compatible with $f(t,x,y)$ if and only if
$$p(x) \ge 0, \qquad (5)$$
$$\int\cdots\int_R p(x)\,dx_1\,dx_2\cdots dx_n = 1, \qquad (6)$$
$$p(y) = \int\cdots\int_R p(x)\,f(t,x,y)\,dx_1\,dx_2\cdots dx_n. \qquad (7)$$
210 ON THE REVERSIBILITY OF THE STATISTICAL LAWS OF NATURE
This paper deals with the special case in which the function $f(t,x,y)$ satisfies the following Fokker-Planck equation:
$$\frac{\partial f}{\partial t} = -\sum_i \frac{\partial}{\partial y_i}\{A^i(y)\,f\} + \sum_i\sum_j \frac{\partial^2}{\partial y_i\,\partial y_j}\{B^{ij}(y)\,f\} \qquad (11)$$
(see [2]).

For simplicity we will further confine ourselves to the case of a closed manifold $R$. In this case, (11) implies that every stationary distribution $p(y)$ satisfies the equation

¹ In [3] this problem is discussed for the case of Markov chains with a finite number of states; cf. also [4].
Now we consider a more general case than that discussed in §1. Namely, we now assume that the transitions from $x$ to $y$ in the time between the moments $s$ and $t > s$ have probability distribution described by the probability density $f(s,t,x,y)$. In this case $f(s,t,x,y)$ must satisfy the following conditions (cf. [1], [2]):
$$f(s,t,x,y) \ge 0, \qquad (14)$$
$$\int\cdots\int_R f(s,t,x,y)\,dy_1\,dy_2\cdots dy_n = 1, \qquad (15)$$
$$f(s,t,x,y) = \int\cdots\int_R f(s,u,x,z)\,f(u,t,z,y)\,dz_1\,dz_2\cdots dz_n, \quad s < u < t, \qquad (16)$$
$$\int\cdots\int_G f(s,t,x,y)\,dy_1\,dy_2\cdots dy_n \to 1 \quad\text{as } t \to s. \qquad (17)$$
$$-\frac{\partial f}{\partial s} = \sum_i A^i(s,x)\,\frac{\partial f}{\partial x_i} + \sum_i\sum_j B^{ij}(s,x)\,\frac{\partial^2}{\partial x_i\,\partial x_j}\,f(s,t,x,y), \qquad (18)$$
$$\frac{\partial f}{\partial t} = -\sum_i \frac{\partial}{\partial y_i}\{A^i(t,y)\,f(s,t,x,y)\} + \sum_i\sum_j \frac{\partial^2}{\partial y_i\,\partial y_j}\{B^{ij}(t,y)\,f(s,t,x,y)\}. \qquad (19)$$
The coefficients $B^{ij}(s,x)$ form a contravariant tensor of rank two, whereas the coefficients $A^i(s,x)$ transform according to the following more complex law:
$$\bar A^i = \frac{\partial \bar x^i}{\partial x^k}\,A^k + B^{km}\,\frac{\partial^2 \bar x^i}{\partial x^k\,\partial x^m}.$$
(From now on the summation sign is omitted.)
Let us assume that the quadratic form $B^{ij}(s,x)$ is positive definite everywhere and for all $s$, and choose it to be the principal metric form on $R$ (note that our metric depends on $s$). We then set
$$a^i = A^i + \Gamma^i_{jk}\,B^{jk}, \qquad (20)$$
where $\Gamma^i_{jk}$ is the Christoffel symbol corresponding to $B^{ij}(s,x)$. The contravariant vector $a^i$ coincides with $A^i$ in every geodesic coordinate system (at $x$). In a geodesic coordinate system chosen at the point $x$, (18) may be written in the following way:
$$-\frac{\partial f}{\partial s} = a^i(s,x)\,\Delta_i^{(x)} f + \Delta^{(x)} f, \qquad (21)$$
where $(x)$ indicates the argument over which the derivative is taken, $\Delta_i$ is the covariant derivative, and $\Delta$ is the Laplace operator, that is, $\Delta = \Delta^i\Delta_i$.
The latter equation, by virtue of its invariance, must hold also in any other coordinate system. When the coordinates $x$ change, $f(s,t,x,y)$ does not vary; if, however, the coordinates $y$ change, this function transforms as a scalar density.

Now set
$$f(s,t,x,y) = \sqrt{|B_{ij}(t,y)|}\,\psi(s,t,x,y). \qquad (22)$$
Clearly, $\psi(s,t,x,y)$ is now invariant with respect to changes in both $x$ and $y$; it also satisfies (21):
(23)
$$\frac{\partial \psi}{\partial t} = -\Delta_i^{(y)}\{a^i(t,y)\,\psi\} + \Delta^{(y)}\psi + \frac{1}{2}\,\frac{\partial \log|B_{ij}(t,y)|}{\partial t}\,\psi. \qquad (24)$$
Then to find the conditions when (9) holds, we merely have to find the conditions under which the equivalent relation

holds. In view of the results of §2, $\psi(t,x,y)$ and $\pi(x)$ satisfy the equations

Since, moreover,
$$\psi(t,x,y) = \phi(t,x,y)\,\pi(x)/\pi(y),$$
(29) can be rewritten as

Assume that (29) (and hence (33)) holds. Interchanging $x$ and $y$ and using (31), we obtain the following equation for $\phi(t,y,x)$:
(34)
(Hereafter all the derivatives are taken either with respect to $t$ or $x$, but never with respect to $y$, and $\alpha^i$ depends only on $x$.) The same equation must be satisfied by $\psi(t,x,y)$ and therefore by $\psi(t,x,y)\,\pi(y) = \phi(t,x,y)\,\pi(x)$:
(35)
or
$$\pi\,\frac{\partial\phi}{\partial t} = -\pi\alpha^i\,\Delta_i\phi - \phi\,\Delta_i(\alpha^i\pi) + \pi\,\Delta\phi + 2(\Delta^i\pi)(\Delta_i\phi) + \phi\,\Delta\pi. \qquad (36)$$
Multiplying (32) by $-\phi$ and (30) by $-\pi$ and adding the resulting equalities to (36), we have

or
(37)
It can easily be seen that $\Delta_i\phi$ does not vanish identically (that is, for any $t > 0$) in any domain of $R$. Therefore, on an everywhere dense set of points $x$ and hence, by continuity, everywhere,
(38)
that is,
$$\alpha_i = \Delta_i \log \pi. \qquad (39)$$
Putting $\log \pi = P$ we see that $P$ is a potential of $\alpha$. This proves the necessity of the condition stated in §1.
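The role of the potential can be checked in the simplest one-dimensional setting. The sketch below is our own illustration (the Ornstein-Uhlenbeck process with drift $\alpha = -x$, potential $P = -x^2/2$, and its classical explicit transition density are standard material, not part of this paper); it verifies the reversibility relation $\pi(x)f(t,x,y) = \pi(y)f(t,y,x)$:

```python
import math

def f(t, x, y):
    """Ornstein-Uhlenbeck transition density for dX = -X dt + sqrt(2) dW."""
    rho = math.exp(-t)
    var = 1.0 - rho * rho
    return math.exp(-(y - rho * x) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def pi(x):
    """Stationary density proportional to e^P with potential P = -x^2/2."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Reversibility (detailed balance) pi(x) f(t,x,y) = pi(y) f(t,y,x):
for (x, y) in [(0.3, -1.1), (1.7, 0.2), (-0.5, 2.4)]:
    lhs = pi(x) * f(0.7, x, y)
    rhs = pi(y) * f(0.7, y, x)
    print(abs(lhs - rhs) < 1e-12)
```

For a drift without a potential (e.g. a rotation in two dimensions) the two sides would differ, which is exactly the dichotomy the theorem describes.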
Assume now that, on the contrary, there exists $P$ such that
we clearly have
(40)
where
$$J = \int\cdots\int_R e^P\,\sqrt{|B_{ij}(x)|}\,dx_1\,dx_2\cdots dx_n.$$
Now (40) successively implies (39), (38), (37), (36) and (35), while (35) implies that $\psi(t,x,y)$ satisfies
(41)
that is, the equation that was earlier denoted by (34) holds for $\phi(t,y,x)$ as well. The initial values for $\phi(t,y,x)$ and $\psi(t,x,y)$ clearly coincide at $t = 0$. Hence, $\phi(t,y,x) = \psi(t,x,y)$, which proves our theorem.³
Moscow, 1 July 1936
References
R.A. Fisher [1] gave an interesting application of iteration theory to the laws of breeding a new gene in an unbounded population. Recently J.F. Steffenson [2] gave a detailed exposition of the probability that all offspring from an individual die out. Both these questions are, however, mathematically identical. After giving a brief statement of the problem and recapitulating the known results,¹ we will give several additions to the results of Fisher and Steffenson in certain directions.
The following assumptions are essential for our mathematical discussion. Let $F_0, F_1, \ldots, F_n, \ldots$ be a sequence of generations and let $F_0$ have $K_0$ individuals with a feature $M$ ($M$-individuals); we assume that crossing between $M$-individuals is impossible; each $M$-individual from $F_{n+1}$ is assumed to be an offspring of a certain $M$-individual in $F_n$. We are given the probabilities $p_k$ that an $M$-individual from generation $F_n$ has exactly $k$ $M$-offspring in $F_{n+1}$, and these probabilities are independent of the fate of other branches of $M$-offspring.

Our goal is to determine the probabilities $P_k^{(n)}$ ($n \ge 1$) that the number $K_n$ of $M$-individuals in the $n$th generation is equal to $k$. Here $P_0^{(n)}$ is the probability that all the $M$-offspring die out. Clearly, $P_0^{(n)}$ can only increase with $n$. The main result of this paper is an asymptotic formula for $P_0^{(n)}$ for large $n$.

We assume that the first three factorial moments
$$a = \sum_k k\,p_k, \qquad b = \sum_k k(k-1)\,p_k, \qquad c = \sum_k k(k-1)(k-2)\,p_k$$
are finite.
2. Biological explanations
* Izv. NII Mat. Mekh. Tomsk. Univ. 2:1 (1938), 7-12 (in Russian).
¹ References to previous works can be found in Steffenson's paper.
SOLUTION OF A BIOLOGICAL PROBLEM 217
to be prescribed, take into account the effect of selection. If, for instance, a grown-up $M$-individual always has a very large number $m$ of $M$-offspring in the next generation which, however, have a very insignificant probability $a/m$ of reaching the age set for determining $K_n$, then we obtain
$$p_k = \frac{a^k}{k!}\,e^{-a}. \qquad (1)$$
We should distinguish three cases which are essentially different from each other: $a$ greater than, smaller than, or equal to 1.

If $a < 1$, then it is clear that the $M$-offspring will finally disappear. In what follows we will show that this also holds when $a = 1$, $b > 0$. If $a > 1$, then the probability $P_0$ of the $M$-offspring dying out is less than 1 ($P_0 = \lim P_0^{(n)}$ as $n \to \infty$). The probability that the offspring do not die out but propagate unrestrictedly is $1 - P_0$.

The exception $a = 1$, $b = 0$ is only possible when $p_1 = 1$, $p_k = 0$ ($k \ne 1$). In this case, clearly, $K_n$ always equals $K_0$ and consequently, $P_0^{(n)}$ is always 0.
A specific problem that led to Fisher's studies is the following: in a very
large stationary population consisting only of individuals of BB type there
appears a small number K o of individuals of Bb type (generation Fa). If in
the further generations FI, F2, . .. Bb-offspring remain comparatively small in
number, then Bb x Bb crossing is virtually excluded. Under this assumption
our scheme can be applied to Bb-offspring.
If $a = 1$, then $Bb$-individuals are as viable as $BB$-individuals. However, due to random fluctuations of $K_n$ they should surely die out (here the case $a = 1$, $b > 0$ applies).
The main result of Fisher and Steffenson consists in reducing the computation of the probabilities $P_k^{(n)}$ to determining the coefficients of known power series. For this purpose they assume that
$$q(x) = \sum_k p_k x^k, \qquad Q^{(n)}(x) = \sum_k P_k^{(n)} x^k.$$
Since $p_k \le 1$, $P_k^{(n)} \le 1$, it follows that $q(x)$ and $Q^{(n)}(x)$ are analytic functions for $|x| < 1$. Let $q_n(x)$ be the $n$th iterate of $q(x)$:
$$q_1(x) = q(x), \qquad q_{n+1}(x) = q(q_n(x)). \qquad (4)$$
In particular,

and consequently,

The limit
$$A = \lim_{n\to\infty} q_n(0)$$
exists and satisfies
$$q(A) - A = 0. \qquad (7)$$
It can further be proved that for $a > 1$ the required smallest root $A$ is smaller than 1. Therefore in this case we have a positive probability

that the $M$-offspring propagate unrestrictedly. If, on the other hand, $a \le 1$ (and if $a = 1$, then $b$ is also greater than zero), then $A = 1$ and consequently, $P_0 = 1$. In this case the $M$-offspring must necessarily die out. Therefore it is desirable to study the asymptotic behaviour of the probability

that the offspring will survive up to the $n$th generation. Neither Fisher nor Steffenson did this under general assumptions.
$$R^{(n)} = 1 - P_0^{(n)} = 1 - \bigl(q_n(0)\bigr)^{K_0}, \qquad (8)$$
$$q'(1) = a,$$
³ See [2].
while for $0 \le x \le 1$,
(9)
(10)
(11)
Thus we see that for $a < 1$ the probability $R^{(n)}$ decreases asymptotically as a geometric progression with common ratio $a$.
Let $a = q'(1) = 1$. Then
(12)
Finally, by (8),
$$R^{(n)} \sim 2K_0/nb. \qquad (14)$$
Thus we see that for $a = 1$, $R^{(n)}$ tends to zero considerably more slowly. Formula (14) shows that for $a = 1$ and large $n$ the probability $R^{(n)}$ of survival until the $n$th generation is inversely proportional to the second moment $b$. In particular, if the probabilities $p_k$ satisfy the Poisson formula (1), then $b = a^2 = 1$ and consequently,
$$R^{(n)} \sim 2K_0/n.$$
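The asymptotic formula is easy to check by iterating the generating function. The sketch below is our own illustration (the critical Poisson case $a = 1$, $b = 1$, $K_0 = 1$ is chosen only as an example); it computes $R^{(n)} = 1 - q_n(0)$ and compares it with $2/n$:

```python
import math

def survival(n):
    """R^(n) = 1 - q_n(0) for Poisson(1) offspring, where q(s) = exp(s - 1)."""
    s = 0.0
    for _ in range(n):
        s = math.exp(s - 1.0)   # iterate q_n(0) = q(q_{n-1}(0))
    return 1.0 - s

n = 2000
# The ratio R^(n) / (2/n) should be close to 1 for large n:
print(survival(n) * n / 2)
```

The convergence is slow (the relative error decays only like $\log n / n$), which is consistent with the statement that for $a = 1$ extinction sets in "considerably more slowly".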
5. An approximate formula
$$w = a - 1,$$
$$A \approx 1 - 2w/b, \qquad (16)$$
$$1 - P_0 \approx 2K_0 w/b,$$
References
In the discussion on genetics that took place in the autumn of 1939 much attention was paid to checking whether or not Mendel's laws were really true. In the basic discussion on the validity of the entire concept of Mendel, it was quite reasonable and natural to concentrate on the simplest case, which, according to Mendel, results in splitting in the ratio 3:1. For this simplest case of crossing $Aa \times Aa$, with the feature $A$ dominating over the feature $a$, it is well known that Mendel's concept leads to the conclusion that in a sufficiently numerous progeny (no matter whether it consists of one family or involves many separate families resulting from various pairs of heterozygous parents of type $Aa$), the ratio between the number of individuals with the feature $A$ (that is, the individuals of the type $AA$ or $Aa$) to the number of individuals with the feature $a$ ($aa$ type) should be close to the ratio 3:1. T.K. Enin [1], [2], N.I. Ermolaeva [3] and E. Kol'man [4] have concentrated on checking this simplest consequence of Mendel's concept. However, Mendel's concept not only results in this simplest conclusion on the approximate ratio 3:1 but also makes it possible to predict the average deviations from this ratio. Owing to this it is the statistical analysis of deviations from the ratio 3:1 that gives a new, more subtle and exhaustive way of proving Mendel's ideas on feature splitting. In this paper we will try to indicate what we think to be the most rational methods of such checking and to illustrate these methods on the material of the paper by N.I. Ermolaeva [3]. In contrast to the opinion of Ermolaeva herself, this material proved to be a brilliant new confirmation of Mendel's laws.¹
ON A NEW CONFIRMATION OF MENDEL'S LAWS 223
and $\beta$ gametes
$$\beta_1, \beta_2, \ldots, \beta_{k_2}$$
in
$$s = \frac{k_1!}{n!\,(k_1-n)!}\cdot\frac{k_2!}{n!\,(k_2-n)!}$$
different ways. In accordance with the above, our further study stems from the assumption that for each of these possible choices there is only a certain probability of its actual fulfillment as determined by the biological factors.

The derivation of Mendel's laws is based on the simplest assumption that the probabilities corresponding to any of these $s$ possible choices are equal (and, consequently, all equal to $1/s$) (see §2). From the biological viewpoint this assumption implies the same viability of gametes, the absence of selective fertilization, and equal viability (at least up to the moment of counting the offspring) of the individuals resulting from any pair combination of gametes $(\alpha_i, \beta_j)$. For simplicity, we call this the independence hypothesis (the probability of obtaining some set of gametes used for producing the progeny is assumed to be independent of the biological features of these gametes).
Like any other hypothesis on the independence of some phenomenon from certain other ones, our hypothesis taken as an absolute dogma not allowing any corrections is wrong: there are a number of well-known examples of deviations from this hypothesis; some are quantitatively insignificant, while others are quite considerable.

It is quite clear that a viewpoint that totally rejects the role of biological external random events in the selection of the gametes that take part in
Let us now return to the more special Mendelian assumption for the case of crossing $Aa \times Aa$ and domination of the feature $A$. In this case we assume that each of the parents forms as many type $A$ gametes as those of type $a$; gamete pairs of $AA$ and $Aa$ types give offspring with feature $A$, and gamete pairs of $aa$ type, offspring with feature $a$. These assumptions, together with the assumption that $k_1$ and $k_2$ are much greater than $n$, and the independence hypothesis, imply the following:

1. The probability that there are exactly $m$ individuals with feature $A$ in a group of $n$ offspring (all the rest of them having feature $a$) is
$$P_n(m) = \frac{n!}{m!\,(n-m)!}\left(\frac{3}{4}\right)^m\left(\frac{1}{4}\right)^{n-m}. \qquad (1)$$
individuals, of which

respectively have the feature $A$. The question is how one can best check whether or not this result is in agreement with Mendel's assumption.

If the number of individuals in every family is very small (for example, less than 10), then it is reasonable to check (1) directly using Pearson's $\chi^2$-criterion.

If each of the families is sufficiently numerous, then it is better to apply another method. In this case (1) implies the following:
$$\Delta = (m/n - 3/4) : \sigma_n,$$
where

approximately obey the Gaussian law with variance 1, that is, the probability of

is approximately equal to

Here $\sigma_n = \sqrt{3}/(4\sqrt{n})$ is the mean square deviation of the frequency $m/n$ from 3/4. We see that this mean square deviation is proportional to $1/\sqrt{n}$. Hence it is only for very large families that Mendel's theory predicts that $m/n$ is very close to 3/4. For example, we can affirm that
² The summary Table 1 in Ermolaeva's paper [3] presents some figures other than those given here, since she takes into account certain families (2 from the first series and 4 from the second series) that, for reasons we do not know, were not included in her Tables 4 and 6. Our summary Table includes only the families represented in Tables 4 and 6. However, the conclusions given below remain the same if drawn from the data given in the summary Table 1 of Ermolaeva.
[Table: number of families in each series, with the observed splitting (in colour) of the flower and leaf axil, the splitting (in colour) of the cotyledons, and the theoretical percentages.]
For this number of families in the series the correspondence with the theory should be considered quite good. Due to some strange misunderstanding, Ermolaeva herself claims in her work that the presence of a noticeable percentage of families with $|\Delta| > 1$ disproves Mendel's theory.
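The prediction at issue is easy to reproduce by simulation. Since under the independence hypothesis $\Delta$ is approximately standard Gaussian, a fraction $\approx 0.317$ of families should show $|\Delta| > 1$. The sketch below is our own illustration (the family size $n = 400$ and the number of families are arbitrary choices, not Ermolaeva's data):

```python
import numpy as np

rng = np.random.default_rng(0)
n, families = 400, 100_000                     # illustrative sizes
m = rng.binomial(n, 0.75, size=families)       # offspring with feature A per family

sigma_n = np.sqrt(3) / (4 * np.sqrt(n))        # mean square deviation of m/n from 3/4
delta = (m / n - 0.75) / sigma_n               # normalized deviations

frac = np.mean(np.abs(delta) > 1)
print(frac)   # near the Gaussian value P(|Z| > 1) ~ 0.317
```

So a sizeable percentage of families with $|\Delta| > 1$ is exactly what Mendel's theory predicts, rather than evidence against it.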
In a similar way we could verify the coincidence of the percentage of families for which $|\Delta| \le a$ for $a \ne 1$. For example, the theory predicts that approximately 50% of the families should have $|\Delta| \le 0.674$. However, it is best just to check whether the actually observed distribution of the deviations $\Delta$ is close to the theoretical Gaussian distribution. To do this, we draw on the same plot the theoretical curve $y = P(x)$ in accordance with (2) and the empirical step curve
$$y = q(x)/r,$$
where $r$ denotes the total number of families in a given series and $q(x)$ is the number of families in the series for which $\Delta \le x$. The results of this verification for the two series from Ermolaeva's paper are shown in Figs. 1 and 2. It is clear that in both cases the coincidence between the theoretical and empirical curves is sufficiently good. To evaluate whether the observed discrepancy between these two curves is admissible for the given size of the series we should use the formulas derived earlier (see [5]). Using these formulas for the cases given in Figs. 1 and 2, we find that
$$\lambda_1 = 0.82, \qquad \lambda_2 = 0.75,$$
[Figs. 1 and 2: theoretical and empirical distribution curves of the deviations $\Delta$ for the two series.]
References
1. T.K. Enin, Dokl. Akad. Nauk SSSR 24 (1939), 176-178 (in Russian).
2. T.K. Enin, Dokl. VASKhNIL 6 (1939), (in Russian).
3. N.I. Ermolaeva, Yarovizatsiya 2:23 (1939), 79-86 (in Russian).
4. E. Kol'man, Yarovizatsiya 3:24 (1939), 70-73 (in Russian).
5. V. Romanovskii, Mathematical statistics, Moscow-Leningrad, 1939 (in
Russian).
27. STATIONARY SEQUENCES IN HILBERT SPACE*
Introduction
do not depend on t.
Definition 2. Two stationary sequences {x(t)} and {y(t)} are called jointly
stationary if the scalar products
do not depend on t.
The definitions of $B_{yx}(k)$ and $B_{xx}(k)$ immediately imply that
* Byull. Moskov. Gos. Univ. Mat. 2:6 (1941), 1-40 (in Russian).
be stationary and pairwise jointly stationary, and let $H_{x_1 x_2 \ldots x_n}$ be the smallest closed linear subspace of $H$ containing all the elements of these sequences. Then the equations
$$Ux_\mu(t) = x_\mu(t+1), \qquad \mu = 1,2,\ldots,n, \quad -\infty < t < +\infty,$$
$$z_1, z_2, \ldots, z_n, \ldots. \qquad (1.2)$$
We eliminate from this sequence all the elements $z_n$ that depend linearly on the preceding $z_k$. Then we have a sequence
(1.3)
The set $R$ of all elements of the form (1.4) contains all elements of the sequence (1.2) and is everywhere dense in $H$. Since the elements of $R$ can be uniquely represented in the form (1.4) (by the linear independence of the elements in (1.3)), the formula
(1.5)
Lemma 2. A sequence
(1.6)
exists on the space $H$ if and only if for any $k, m_1, m_2, \ldots, m_k$ the matrix $\|c_{m_i m_j}\|$ is Hermitian and non-negative, that is, if
$$S = \sum_{i,j=1}^{k} c_{m_i m_j}\,\xi_i\bar\xi_j \ge 0. \qquad (1.7)$$
$$S_n = \sum_{i,j=1}^{n} c_{ij}\,\xi_i\bar\xi_j \ge 0.$$
We find a transformation
$$\eta_i = \sum_{j=1}^{n} a_{ij}^{(n)}\,\xi_j$$
such that
$$S_n = \sum_{i=1}^{n} |\eta_i|^2,$$
and set
$$u_i^{(n)} = \sum_{j=1}^{m} a_{ij}^{(n)}\,z_j, \qquad i = 1,2,\ldots,n.$$
1) We set

2) assuming $u_1, u_2, \ldots, u_N$ to be already known and to satisfy the requirement (1.6) for all $m, n \le N$, we set

Clearly, the elements $u_1, u_2, \ldots, u_{N+1}$ satisfy (1.6) for all $m, n \le N+1$.
Let
$$U_x = \int_{-\pi}^{\pi} e^{i\lambda}\,dE_x(\lambda) \qquad (2.2)$$
be the spectral representation² of $U_x$. For $-\pi \le \lambda \le +\pi$ we set
$$F_{xx}(\lambda) = (E_x(\lambda)x(0), x(0)). \qquad (2.3)$$
After Stone ([3], Chapter 6, §1), we denote by $L_x^2$ the class of all complex functions $\phi$, measurable with respect to $F_{xx}(\lambda)$, defined on the segment $-\pi \le \lambda \le \pi$ and such that the integral
$$\int_{-\pi}^{\pi} |\phi(\lambda)|^2\,dF_{xx}(\lambda) \qquad (2.4)$$
is finite. Two functions $\phi_1$ and $\phi_2$ from $L_x^2$ are considered to be identical if they coincide almost everywhere with respect to $F_{xx}$.

whose domain $D(\phi)$ consists of all the elements $z$ of $H_x$ for which the integral
(2.6)
is finite. The finiteness of (2.4) implies that $x(0)$ belongs to $D(\phi)$ for any $\phi$ from $L_x^2$. Therefore to each $\phi$ in $L_x^2$ there corresponds a certain element
$$z_\phi = T(\phi)x(0) = \int_{-\pi}^{\pi} \phi(\lambda)\,dE_x(\lambda)\,x(0) \qquad (2.7)$$
of the space $H_x$.
These definitions allow us to state the following two Lemmas, which are almost immediate corollaries of Theorems 6.1 and 6.2 of Stone [3].

and the class $T_x$ of all operators $T$ representable in the form (2.5), where $\phi$ belongs to $L_x^2$.

Under this map
(3c)
(3d)
(3e)
(4b)
(4e)
$$(z_{\phi_1}, z_{\phi_2}) = \int_{-\pi}^{\pi} \phi_1(\lambda)\,\overline{\phi_2(\lambda)}\,dF_{xx}(\lambda). \qquad (4f)$$
The properties (3a), (3b) and (3c) of the map $\phi \to T(\phi)$ follow from Theorem 6.2 of Stone [3]. The properties (3d) and (3e) can be derived by comparing (2.5) with the formulas
$$U_x^t = \int_{-\pi}^{\pi} e^{it\lambda}\,dE_x(\lambda), \qquad (2.8)$$
The fact that the map $\phi \to T(\phi)$ is one-to-one follows from the fact that for $\phi_1 \ne \phi_2$ (in the sense of equality in $L_x^2$) the difference

has norm
and consequently
(2.10)
Since $e^{it\lambda}$ belongs to $L_x^2$, the set $\mathfrak{M}$ contains all the $x(t)$, which implies that
$$= \int_{-\pi}^{\pi} \phi_1(\lambda)\,\overline{\phi_2(\lambda)}\,dF_{xx}(\lambda).$$
This proves Lemmas 3 and 4.
Theorem 2. For any stationary sequence $\{x(t)\}$ the values $B_{xx}(k)$ can be represented as
$$B_{xx}(k) = \int_{-\pi}^{\pi} e^{ik\lambda}\,dF_{xx}(\lambda), \qquad (3.1)$$
where $F_{xx}$ is a real non-decreasing function, continuous from the right, defined on the interval $-\pi \le \lambda \le \pi$, and such that
(3.2)
Here we note also that $F_{xx}(\lambda)$, which in what follows will be called the spectral function of the sequence $\{x(t)\}$, coincides with the function defined by (2.3), as will be clear from the text below.
Theorem 3. If two stationary sequences $\{x(t)\}$ and $\{y(t)\}$ are jointly stationary, then
$$B_{xy}(k) = \int_{-\pi}^{\pi} e^{ik\lambda}\,dF_{xy}(\lambda), \qquad (3.3)$$
where $F_{xy}(\lambda)$ is an (in general complex) function continuous from the right, of bounded variation, defined on the interval $-\pi \le \lambda \le \pi$, for which
(3.4)
To prove Theorem 3, we consider the spectral representation
(3.7)
implies that

Finally,
$$B_{xy}(k) = \int_{-\pi}^{\pi} e^{ik\lambda}\,d(E(\lambda)x(0), y(0)) = \int_{-\pi}^{\pi} e^{ik\lambda}\,dF_{xy}(\lambda).$$
Thus, $F_{xy}(\lambda)$ defined by (3.6) satisfies all the conditions of the theorem.

For the particular case of one sequence $\{x(t)\} = \{y(t)\}$ the function
(3.8)
is real and non-decreasing (see Stone [3], p. 189). This justifies the corresponding statement of Theorem 2.

Elementary properties of Fourier series for functions of bounded variation (see, for example, A. Zygmund [4], p. 13) imply that
(3.9)
where
$$W_{xy}(\lambda) = B_{xy}(0)\,\lambda - \sum_{k \ne 0} \frac{B_{xy}(k)}{ik}\,e^{-ik\lambda}, \qquad (3.10)$$
and the constant $C$ can be found from the condition
(3.11)
Formulas (3.9) and (3.10) show that $F_{xy}(\lambda)$ is uniquely defined by $B_{xy}$. Thus, Theorems 2 and 3 are completely proved.
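When the spectral function is absolutely continuous, the representation (3.1) can be checked by direct integration. The sketch below is our own example (the moving-average sequence $x(t) = \xi(t) + \theta\,\xi(t-1)$ with orthonormal $\xi$, and its spectral density, are standard material not discussed in the paper); it integrates $e^{ik\lambda}$ against the density and recovers $B_{xx}(k)$:

```python
import numpy as np

theta = 0.6
N = 4096
lam = -np.pi + 2 * np.pi * np.arange(N) / N   # uniform grid over one period

# Spectral density of x(t) = xi(t) + theta*xi(t-1): dF/dlam = |1 + theta e^{-i lam}|^2 / (2 pi)
dens = np.abs(1 + theta * np.exp(-1j * lam)) ** 2 / (2 * np.pi)

def B(k):
    # Rectangle rule over a full period is exact for trigonometric polynomials.
    return ((np.exp(1j * k * lam) * dens).sum() * (2 * np.pi / N)).real

print(B(0))   # ~ 1 + theta^2 = 1.36
print(B(1))   # ~ theta = 0.6
print(B(2))   # ~ 0
```

The recovered values $B_{xx}(0) = 1+\theta^2$, $B_{xx}(1) = \theta$, $B_{xx}(k) = 0$ for $|k| \ge 2$ are exactly the covariances of the moving-average sequence, as the theorem asserts.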
Note also that (3.8) directly implies that
(3.12)
are stationary and jointly stationary, then for any $-\pi \le \alpha \le \beta \le \pi$ the increments
(3.13)
which are continuous from the right and are defined on $-\pi \le \lambda \le \pi$, satisfy the conditions
$$\sum_{\mu,\nu=1}^{n} \Delta F_{\mu\nu}\,\xi_\mu\bar\xi_\nu \ge 0, \qquad (3.15)$$
(3.16)
for which
(3.17)
To prove this first part of the theorem we consider the spectral representation of the operator $U$ and the element
$$z = \sum_{\mu=1}^{n} \xi_\mu\,x_\mu(0).$$
Then
$$(E(\lambda)z, z) = \sum_{\mu,\nu=1}^{n} F_{x_\mu x_\nu}(\lambda)\,\xi_\mu\bar\xi_\nu.$$
Since this function of $\lambda$ is real and monotone non-decreasing (see Stone [3], p. 189), it follows that for $\alpha < \beta$

for any $r$, $\mu_p$, $t_p$, $\xi_p$. This is always the case since, setting

where the sum $\sum^{(\mu)}$ is taken over all $p$ for which $\mu_p = \mu$, we have, by (3.15),
are stationary and jointly stationary. It is easy to see that in this case

Theorem 6. If the sequences $\{x(t)\}$ and $\{y(t)\}$ are stationary and jointly stationary, then for any $-\pi \le \alpha < \beta \le \pi$,
(3.18)
$\lambda \le \pi$ are continuous from the right and satisfy the conditions
(3.19)
then there exist stationary and jointly stationary sequences $\{x(t)\}$ and $\{y(t)\}$ for which $F_{xx}(\lambda) = F_{11}(\lambda)$, $F_{yy}(\lambda) = F_{22}(\lambda)$, $F_{xy}(\lambda) = F_{12}(\lambda)$.

In what follows we require the following strengthened version of Theorem 6:

Theorem 7. If $F_{11}(\lambda)$, $F_{22}(\lambda)$ and $F_{12}(\lambda)$ satisfy all the conditions of the second part of Theorem 6, the stationary sequence $\{x(t)\}$ is such that
$$Tx^*(t) = x(t).$$
It can easily be verified that $T$ is an isometry on the set $\{x^*(t)\}$. By Lemma 1, $T$ extends as an isometry onto the whole space $H_{x^*}$. Then clearly,
$$Ty^*(t) = y(t).$$
It is easy to see that the sequence $\{y(t)\}$ has all the properties required by Theorem 7.
a unique function $\phi_y^{(x)}(\lambda)$ of class $L_x^2$ for which
$$F_{yy}(\lambda) = \int_{-\pi}^{\lambda} |\phi_y^{(x)}(\mu)|^2\,dF_{xx}(\mu), \qquad (4.3)$$
The map
(4.5)
(4.6)
We will prove the existence of functions $\phi_y^{(x)}(\lambda)$ for any sequence $\{y(t)\}$ from the class $Y_x$ using Lemma 4 in §2. If $\{y(t)\}$ belongs to $Y_x$, then $y(0)$ belongs to $H_x$, and according to Lemma 4, there exists a function $\phi_y^{(x)}(\lambda)$ of class $L_x^2$ such that
$$y(0) = z_{\phi_y^{(x)}}. \qquad (4.7)$$
By (3.6) and (4g) of Lemma 4 we have
$$F_{y_1 y_2}(\lambda) = (E_x(\lambda)y_1(0), y_2(0)) = \int_{-\pi}^{\lambda} \phi_{y_1}^{(x)}(\mu)\,\overline{\phi_{y_2}^{(x)}(\mu)}\,dF_{xx}(\mu)$$
for any sequences $\{y_1(t)\}$ and $\{y_2(t)\}$ in $Y_x$, which proves (4.5). In the particular case $y_1 = y$, $y_2 = x$ we obtain (4.4), while for $y_1 = y_2 = y$ we obtain (4.3).
Theorem 9. If two sequences $\{x(t)\}$ and $\{y(t)\}$ are stationary and jointly stationary, then $\{y(t)\}$ is subordinate to $\{x(t)\}$ if and only if there exists a function $\phi(\lambda)$ in $L_x^2$ for which
$$F_{yy}(\lambda) = \int_{-\pi}^{\lambda} |\phi(\mu)|^2\,dF_{xx}(\mu), \qquad (4.8)$$
Therefore

By Lemma 1 we can determine an isometry $T$ on $H_{xy^*}$ such that for any integer $t$,
$$Tx(t) = x(t), \qquad Ty^*(t) = y(t).$$
Clearly
(4.10)
(4.11)
1°. We prove the necessity of the condition. Assume that $\{y(t)\}$ is subordinate to $\{x(t)\}$, and $\{x(t)\}$ is subordinate to $\{y(t)\}$. Then it follows from (4.3) and the formula
(4.13)
obtained by interchanging the indices, that $F_{xx}(\lambda)$ and $F_{yy}(\lambda)$ are absolutely continuous with respect to each other. Therefore the notions "almost everywhere with respect to $F_{xx}$" and "almost everywhere with respect to $F_{yy}$" coincide. By (4.3) and (4.13),
(4.14)
Formula (4.14) shows that $\phi_y^{(x)}(\lambda) \ne 0$ almost everywhere with respect to $F_{xx}$, as required.

2°. We now prove the sufficiency of the condition. Assume that $\{y(t)\}$ is subordinate to $\{x(t)\}$ and $\phi_y^{(x)}(\lambda) \ne 0$ almost everywhere with respect to $F_{xx}$. Since, according to (4.3), $F_{yy}(\lambda)$ is absolutely continuous with respect to $F_{xx}(\lambda)$, the function $\phi_y^{(x)}(\lambda)$ is defined almost everywhere with respect to $F_{yy}$ as well, and
$$\int_{-\pi}^{\lambda} \frac{dF_{yy}(\mu)}{|\phi_y^{(x)}(\mu)|^2} = \int_{-\pi}^{\lambda} \frac{|\phi_y^{(x)}(\mu)|^2}{|\phi_y^{(x)}(\mu)|^2}\,dF_{xx}(\mu) = \int_{-\pi}^{\lambda} dF_{xx}(\mu) = F_{xx}(\lambda). \qquad (4.15)$$
Hence the function
$$\phi(\lambda) = \frac{1}{\phi_y^{(x)}(\lambda)} \qquad (4.17)$$
satisfies the conditions
$$F_{xx}(\lambda) = \int_{-\pi}^{\lambda} |\phi(\mu)|^2\,dF_{yy}(\mu), \qquad (4.18)$$
Definition 5. Two sequences $\{y_1(t)\}$ and $\{y_2(t)\}$ are called mutually orthogonal if
(5.1)
(5.2)
or, equivalently, if
(5.3)
Theorem 11. If two sequences $\{y_1(t)\}$ and $\{y_2(t)\}$ are stationary and mutually orthogonal, then:
a) the sequence $\{x(t)\} = \{y_1(t) + y_2(t)\}$ is stationary;
b) $\{y_1(t)\}$ and $\{y_2(t)\}$ are jointly stationary both with respect to each other and with respect to $\{x(t)\}$:
(c₁) $B_{xx}(k) = B_{y_1y_1}(k) + B_{y_2y_2}(k)$,
(c₂) $B_{y_1x}(k) = B_{y_1y_1}(k)$,
(c₃) $B_{y_2x}(k) = B_{y_2y_2}(k)$,
(d₁) $F_{xx}(\lambda) = F_{y_1y_1}(\lambda) + F_{y_2y_2}(\lambda)$,
(d₂) $F_{y_1x}(\lambda) = F_{y_1y_1}(\lambda)$,
(d₃) $F_{y_2x}(\lambda) = F_{y_2y_2}(\lambda)$.
Theorem 11 is proved by simple computations based on (0.2), (3.9)-(3.11).

In the conditions of Theorem 11, by (d₁) and the non-negativeness of the increments $\Delta F_{xx}$, $\Delta F_{y_1y_1}$, $\Delta F_{y_2y_2}$ (for non-negative $\Delta\lambda$) we have:
(5.4)
(5.5)
are bounded and uniquely defined (in the sense adopted in [3]) by the formulas
$$F_{y_1y_1}(\lambda) = \int_{-\pi}^{\lambda} \psi_{y_1}^{(x)}(\mu)\,dF_{xx}(\mu), \qquad F_{y_2y_2}(\lambda) = \int_{-\pi}^{\lambda} \psi_{y_2}^{(x)}(\mu)\,dF_{xx}(\mu). \qquad (5.6)$$
Theorem 12. Under the conditions of Theorem 11, either both sequences $\{y_1(t)\}$ and $\{y_2(t)\}$ are subordinate to $\{x(t)\}$ or neither is. The first of these two cases takes place if and only if
(5.7)
(5.8)
but if we take into account (d₂) and (d₃), then comparing (5.6) with (4.4), we see that almost everywhere with respect to $F_{xx}$,
$$\psi_{y_1}^{(x)} = \phi_{y_1}^{(x)}, \qquad \psi_{y_2}^{(x)} = \phi_{y_2}^{(x)}. \qquad (5.9)$$
Comparing (5.8) and (5.9), we conclude that $\psi_{y_1}^{(x)}$ and $\psi_{y_2}^{(x)}$ are equal to 0 or 1 almost everywhere with respect to $F_{xx}$. Since, moreover, almost everywhere with respect to $F_{xx}$
$$\psi_{y_1}^{(x)} + \psi_{y_2}^{(x)} = \frac{dF_{y_1y_1}}{dF_{xx}} + \frac{dF_{y_2y_2}}{dF_{xx}} = \frac{dF_{xx}}{dF_{xx}} = 1, \qquad (5.10)$$
(5.11)
$$F_{y_1x}(\lambda) = \int_{-\pi}^{\lambda} \phi_{y_1}^{(x)}(\mu)\,dF_{xx}(\mu), \qquad F_{y_2y_2}(\lambda) = \int_{-\pi}^{\lambda} |\phi_{y_2}^{(x)}(\mu)|^2\,dF_{xx}(\mu), \qquad (5.12)$$
$$F_{y_2x}(\lambda) = \int_{-\pi}^{\lambda} \phi_{y_2}^{(x)}(\mu)\,dF_{xx}(\mu).$$
By Theorem 9, the existence of $\phi_{y_1}^{(x)}$ and $\phi_{y_2}^{(x)}$ satisfying (5.12) implies that the sequences $\{y_1(t)\}$ and $\{y_2(t)\}$ are subordinate to $\{x(t)\}$.
Theorem 13. Suppose that for a stationary sequence $\{x(t)\}$, the orthogonal complement $H \ominus H_x$ of the space $H_x$ in $H$ is infinite dimensional. Then corresponding to any representation of $F_{xx}(\lambda)$ as a sum
(5.13)
of two real non-decreasing functions $F_1(\lambda)$ and $F_2(\lambda)$ continuous from the right and for which $F_1(-\pi) = F_2(-\pi) = 0$, there is at least one representation of $\{x(t)\}$ as
$$x(t) = y_1(t) + y_2(t), \qquad (5.14)$$
where $\{y_1(t)\}$ and $\{y_2(t)\}$ are mutually orthogonal and have spectral functions
(5.15)
The functions $F_{11}(\lambda)$, $F_{22}(\lambda)$ and $F_{12}(\lambda)$ satisfy the conditions of the second part of Theorem 6. Therefore, by Theorem 7 there exists in $H$ a sequence $\{y_1(t)\}$ that is stationary and jointly stationary with respect to $\{x(t)\}$ and is such that

Setting
$$y_2(t) = x(t) - y_1(t)$$
and using simple computations we see that $\{y_1(t)\}$ and $\{y_2(t)\}$ are mutually orthogonal and satisfy (5.15).
Theorem 12 gives conditions that should be imposed on $F_1(\lambda)$ and $F_2(\lambda)$ in the representation (5.13) in order to make $\{y_1(t)\}$ and $\{y_2(t)\}$ subordinate to $\{x(t)\}$, that is, to ensure that they lie in $H_x$. Under these additional conditions the requirement that the dimension of the complement $H \ominus H_x$ be infinite is unnecessary.
Theorem 14. For any stationary sequence $\{x(t)\}$ and for each representation of the spectral function $F_{xx}(\lambda)$ in the form

where $F_1(\lambda)$ and $F_2(\lambda)$ are real non-decreasing functions of $\lambda$, continuous from the right with $F_1(-\pi) = F_2(-\pi) = 0$, and almost everywhere with respect to $F_{xx}$,
$$\frac{dF_1}{dF_{xx}}\cdot\frac{dF_2}{dF_{xx}} = 0, \qquad (5.16)$$
there corresponds one representation of $\{x(t)\}$ in the form

for which $\{y_1(t)\}$ and $\{y_2(t)\}$ are mutually orthogonal and
$$z = \sum_{t=-\infty}^{+\infty} c_t\,u(t), \qquad (6.4)$$
where
$$\sum_{t=-\infty}^{+\infty} |c_t|^2 < \infty.$$
The necessity of the condition of the theorem has already been proved. If this condition holds, then every element $x(t)$ of the sequence $\{x(t)\}$ belongs to $H_u$ and, by (6.4) and (6.6), is representable in the form
Formula (6.9) shows that $\phi_x^{(u)}(\lambda)$ is a square integrable function. It follows from (3.3) and (6.10) that
$$B_{xu}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{in\lambda}\,\phi_x^{(u)}(\lambda)\,d\lambda. \qquad (6.11)$$
Thus the $B_{xu}(n)$ are the Fourier coefficients of $\phi_x^{(u)}(\lambda)$, that is,
$$\phi_x^{(u)}(\lambda) = \sum_{n=-\infty}^{+\infty} B_{xu}(n)\,e^{-in\lambda}.$$
With regard to the above formulas we note that, by (6.3), the words "almost everywhere with respect to $F_{uu}$" mean "excluding a set of Lebesgue measure zero" and the class $L_u^2$ coincides with the class $L_2$ of square integrable functions in the usual sense of Lebesgue.
The necessity of the condition follows from (6.9). To prove its sufficiency we consider an arbitrary function $\gamma(\lambda)$ in $L_2$ for which, almost everywhere on $-\pi \le \lambda \le \pi$,
(6.13)
and define $F_{11}(\lambda)$, $F_{22}(\lambda)$ and $F_{12}(\lambda)$ by the formulas

It is easy to see that these functions satisfy the conditions of the second part of Theorem 6. By Theorem 7, if $H \ominus H_x$ is infinite dimensional, then there exists a sequence $\{u(t)\}$, stationary and jointly stationary with $\{x(t)\}$, for which
(6.14)
(6.15)
(6.16)
(6.18)
$$\frac{\lambda+\pi}{2\pi} = F_{uu}(\lambda) = \int_{-\pi}^{\lambda} |\phi_u^{(x)}(\mu)|^2\,dF_{xx}(\mu), \qquad (6.19)$$
(6.20)
Hence by Theorem 10, $\{u(t)\}$ is subordinate to $\{x(t)\}$ (that is, it actually belongs to $H_x$). Thus, we have proved the sufficiency of the conditions of the Theorem and with it, the additional assertion A.
A sequence {u(t)} subordinate to {x(t)} is fundamental if and only if
(6.21)
This proves C.
For a stationary sequence {x(t)} we denote by H_x(t) the smallest closed linear
subspace of H_x containing all the x(s) for s ≤ t, and let S_x be the intersection
of all the H_x(t). Clearly,
(7.3)
(7.4)
for any t. Conversely, if (7.4) is true for some t, then {x(t)} is singular. Indeed,
if (7.4) holds for one t, then by (7.1), it also holds for all other t, whence it
follows that S_x = H_x.
Let s_x(t) be the projection of x(t) onto S_x. It is easy to see that {s_x(t)}
is a singular stationary sequence subordinate to {x(t)} for which
(7.5)
where η(t) belongs to H_x(t − 1) and ξ(t) ≠ 0 is orthogonal to H_x(t − 1). Set
Since x(t) belongs to H_x(t), it follows that x(t) can be uniquely represented in
the form

x(t) = s(t) + \sum_{n=0}^{\infty} c_n u(t-n),   (7.9)
and if {s(t)} and {u(t)} and the coefficients c_n satisfy the conditions W1)–W5),
then {x(t)} is non-singular and
Proof. Since x(t') and all the u(t' − n) belong to H_x(t') for n = 0, 1, 2, ... (by
W3)), then so does

s(t') = x(t') - \sum_{n=0}^{\infty} c_n u(t'-n).
Comparing (7.13) with (7.7) and taking W5) into account we obtain

c_n = c_n^{(x)}.   (7.16)

Comparing (7.8) with (7.11) and taking into account (7.15) and (7.16) we finally
obtain

s(t) = s_x(t).
s_x(t) = 0.   (8.2)
Let us prove the necessity of (8.2). For this purpose note that if there
exists a representation (8.1), then the space H_x(t) is contained in the space
H_u(t) and S_x in S_u. Therefore, if the equality

S_u = 0   (8.4)

x(t) ≠ 0,  s_x(t) = 0.
\sum_{n=0}^{\infty} |c_n|^2 < \infty,

\varphi_x^{(u)}(\lambda) = \sum_{n=0}^{\infty} c_n e^{-in\lambda},   (8.7)

\Gamma_x^{(u)}(\zeta) = \sum_{n=0}^{\infty} c_n \zeta^n   (8.8)
represents an analytic function in the disk |ζ| < 1. By (8.7), its boundary
values on |ζ| = 1 are given by the formula
(8.9)
The condition x(t) ≠ 0 implies that Γ_x^{(u)}(ζ) cannot vanish identically. Therefore,⁶ almost everywhere on |ζ| = 1,
(8.10)
Since by (6.9)
(8.11)
Theorem 20. If a regular sequence {x(t)} can be represented in the form (8.1)
with a fundamental sequence {u(t)}, then {u(t)} is subordinate to {x(t)}.
coincides almost everywhere on the circle |ζ| = 1 with the boundary values of
the analytic function Γ_x^{(u)}(ζ), defined for |ζ| < 1 by
(8.14)
If this condition holds, then the coefficients c_n of (8.1) can be determined from
(8.8).
The first part of the theorem, the necessity of the conditions in the second
part of the theorem, and the assertion on the form of dependence between the
coefficients c_n and the function Γ_x^{(u)}(ζ) have been proved above.
It remains to prove the sufficiency of the conditions of the second part of
the theorem.
If these conditions hold, then for the boundary values of Γ_x^{(u)}(ζ) we have
the Fourier series expansion
\sum_{n=0}^{\infty} c_n e^{-in\lambda}.   (8.15)
Comparing (8.15) with (6.12) and (6.7) we conclude that x(t) can be represented
in the form (8.1) with coefficients c_n determined from (8.15) or, what is the
same, from (8.8). This completes the proof of Theorem 20.
\Gamma_u^{(u_x)}(\zeta) = \cdots   (8.18)
Now

f_{u_x u_x}(\lambda) = \frac{d}{d\lambda} F_{u_x u_x}(\lambda) = \frac{d}{d\lambda}\,\frac{\lambda+\pi}{2\pi} = \frac{1}{2\pi}.

Therefore, according to assertion C of Theorem 17, (8.18) implies that {u(t)}
is a fundamental sequence.
By (4.5),

\varphi_s^{(u)}(\lambda) = \frac{dF_{su}(\lambda)}{dF_{uu}(\lambda)} = 2\pi\,\frac{dF_{su}(\lambda)}{d\lambda} = \varphi_s^{(u_x)}(\lambda)\,\overline{\varphi_u^{(u_x)}(\lambda)}.   (8.20)
By (8.18),
(8.21)
(8.23)
(8.24)
for |ζ| = 1.
Since ζ₀ is a zero of Γ_x(ζ), it follows that Γ_x^{(u)}(ζ) is analytic for |ζ| < 1.
By Theorem 20 this implies that⁷

x(t) = \sum_{n=0}^{+\infty} c_n u(t-n)   (8.25)
coincides almost everywhere on the circle |ζ| = 1 with the boundary values of
the function

\Gamma_u^{(u_x)}(\zeta) = \frac{\zeta - \zeta_0}{1 - \bar{\zeta}_0 \zeta},   (8.26)

which is analytic in the disk |ζ| < 1. Therefore, according to Theorem 20,

u(t) = \sum_{n=0}^{\infty} d_n u_x(t-n),   (8.27)

where the d_n are the coefficients of the Taylor series of the function Γ_u^{(u_x)}(ζ).
Formula (8.27) shows that u(t) belongs to H_{u_x}(t). Since by property W3) of
the sequence {u_x(t)} (see §7), H_{u_x}(t) lies in H_x(t), we see that u(t) also lies
in H_x(t).
This implies that for n > 0 all the u(t − n) belong to H_x(t − n). Therefore,
comparing (8.25) and (7.6) we obtain

c_0 u(t) = ξ(t),
It can easily be verified that the sequence {u(t)} defined by the formula
(8.28)
Theorem 22. A stationary sequence {x(t)} is regular if and only if the following conditions hold on −π ≤ λ ≤ π:
1) F_xx(λ) is absolutely continuous;
2) f_xx(λ) is positive almost everywhere;
3) log f_xx(λ) is summable.
If all these conditions hold, then
(8.29)
\log \frac{\Gamma_x(0)}{\sqrt{2\pi}} = \log \frac{c_0^{(x)}}{\sqrt{2\pi}}.
Since Γ_x(ζ) has no zeros, the function Q_x(ζ) is uniquely defined at all
points of the disk |ζ| < 1. For the real part of Q_x(ζ) we have, by (8.29),
(8.32)
Let Re⁺Q_x(ζ) be the function equal to Re Q_x(ζ) for Re Q_x(ζ) > 0 and
equal to 0 otherwise. Since (8.32) implies
Since
we find that
of Re Q_x(e^{−iλ}) are integrable with respect to λ. This proves the necessity of
the integrability of log f_xx(λ), that is, condition 3 of our theorem.
Before passing to the proof of the sufficiency of the conditions of the theorem,
let us show that for any regular sequence {x(t)} the function Γ_x(ζ) is
determined by (8.29)–(8.31). Since, according to what was proved above, the
function log f_xx(λ) is integrable for a regular sequence {x(t)}, it can be expanded
in a Fourier series (8.31). Formulas (8.33) and (8.31) imply that
Since
(8.35)
where the coefficients a_n^{(x)} and b_n^{(x)} are determined from the expansion (8.31).
Then⁸

(8.39)

where

P_\rho(\theta) = \frac{1}{2\pi}\,\frac{1-\rho^2}{1+\rho^2-2\rho\cos\theta}.   (8.40)
The inequality between the geometric mean and the arithmetic mean⁹ implies
that
(8.41)
⁹ If

m = \int_a^b P(x) f(x)\, dx, \qquad \log s = \int_a^b P(x)\log f(x)\, dx, \qquad \int_a^b P(x)\, dx = 1,

P(x) ≥ 0, f(x) ≥ 0, then s ≤ m.
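As a quick numerical illustration of footnote 9 (the discrete weights and values below are invented, not from the text), the weighted geometric mean s never exceeds the weighted arithmetic mean m:

```python
import math

# Discrete check of the weighted AM-GM inequality: with weights P summing
# to 1 and non-negative values f, exp(sum P*log f) <= sum P*f.
P = [0.2, 0.5, 0.3]          # invented "density", sums to 1
f = [1.0, 4.0, 0.25]         # invented non-negative values

m = sum(p * v for p, v in zip(P, f))                      # arithmetic mean
s = math.exp(sum(p * math.log(v) for p, v in zip(P, f)))  # geometric mean
assert s <= m
```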
The boundedness of the integral on the left-hand side of (8.42) thus established
guarantees¹⁰ that Γ(ζ) can be represented as the Cauchy integral of its
boundary values

Γ(e^{−iλ}) = φ(λ).
Formulas (8.38) and (8.31) imply that
Therefore,

\int_{-\pi}^{\pi} \frac{dF_{xx}(\lambda)}{|\varphi(\lambda)|^2} = 1,

and consequently 1/φ(λ) belongs to the class L_F^2. Let us find a sequence {u(t)}
subordinate to {x(t)} for which
(8.43)
Equation (8.43) and Theorem 17C imply that {u(t)} is a fundamental sequence
equivalent to {x(t)}. By Theorem 10,
Thus, {u(t)} satisfies all the conditions of the second part of Theorem 20.
Hence {x(t)} can be represented in terms of {u(t)} in the form (8.1), which
proves the sufficiency of the conditions of Theorem 22. It is easy to see that
the sequence {u(t)} so constructed in fact coincides with {u_x(t)}.
In conclusion we note the formula¹¹
where

r_x(t) = \sum_{n=0}^{\infty} c_n^{(x)} u_x(t-n)   (9.2)

and the coefficients c_n^{(x)} in (9.2) are determined from (8.16), where Γ_x(ζ) is
defined by (8.29)–(8.31).
(9.6)
12 Since the sequence identically equal to zero is singular, but not regular, the
singular component of a regular sequence equals 0 and a singular sequence has
no regular component.
(9.8)
Therefore, every set of positive Lebesgue measure has positive measure also
with respect to F_xx(λ).
Since {s_x(t)} and {r_x(t)} are subordinate to {x(t)}, Theorem 12 implies
that

\frac{dF_{s_x s_x}(\lambda)}{dF_{xx}(\lambda)} \cdot \frac{dF_{r_x r_x}(\lambda)}{dF_{xx}(\lambda)} = 0

almost everywhere with respect to F_xx(λ), and hence, in the sense of Lebesgue.
Since by (9.8) almost everywhere
that is,
(9.9)
(9.10)
Formula (9.10) and the facts that f_{r_x r_x}(λ) is positive almost everywhere
and log f_{r_x r_x}(λ) is integrable imply that for a non-singular sequence case 3)
always holds.
Let us show that, conversely, in case 3) {x(t)} is non-singular. According
to Theorem 14, in case 3) {x(t)} can be uniquely represented as a sum

x(t) = s(t) + \sum_{n=0}^{\infty} c_n^{(x)} u_r(t-n).   (9.11)
Formula (9.11) shows that in this case H_x(t) is contained in the space¹³
S_x ⊕ H_r(t). Representing x(t + 1) as

x(t + 1) = α + β,

where

\beta = \sum_{n=0}^{\infty} c_n^{(x)} u_r(t-n),

it is easy to see that β belongs to S_x ⊕ H_{u_r}(t) and α ≠ 0 is orthogonal to this
space. Therefore x(t + 1) does not lie in S_x ⊕ H_{u_r}(t) and hence does not lie in
H_x(t). This implies that {x(t)} is non-singular.
We have proved (9.10) for any non-singular {x(t)}. Therefore Γ_x(ζ) determined
in accordance with (8.29)–(8.31) coincides with Γ_{r_x}(ζ). To prove that
the coefficients c_n^{(x)} can be obtained from the expansion (8.16) of Γ_x(ζ), it only
remains to establish that
(9.12)
For this it suffices to show that (9.2) satisfies W1)–W5) (with zero singular
component). Clearly W1), W2), W4) and W5) hold. Let us prove that W3)
holds for (9.2). For this we note that by (9.8) and Theorem 17A, {u_x(t)} is
subordinate to {r_x(t)}. From (9.1) and (9.2) it is clear that H_x(t) is contained
in S_x ⊕ H_{r_x}(t). Since u_x(t) belongs to H_x(t) and is orthogonal to S_x (see §7),
u_x(t) belongs to H_{r_x}(t), which means that (9.2) satisfies W3).
For a non-singular sequence, (9.5) follows immediately from (9.1), (9.10)
and the absolute continuity of F_{r_x r_x} (the latter is ensured by the regularity of
{r_x(t)}, see Theorem 16).
(10.1)
only one of two cases is possible: either all the H_x(t) coincide with H_x, or all
the H_x(t) are different from H_x.
(10.2)
d_x = ‖δ(t)‖.   (10.4)
\int_{-\pi}^{\pi} \frac{d\lambda}{f_{xx}(\lambda)}   (10.6)

is finite.
If these conditions hold, then

φ(λ) = 0,
By (3.3), for s ≠ t,
the space H_x, which contains y(t), cannot coincide with H_x(t). This proves the
minimality of {x(t)}.
To prove the necessity of the conditions of the theorem, assume that {x(t)}
is minimal. It is easy to see that {δ(t)} is stationary and subordinate to {x(t)}.
Then
(10.10)
(10.11)
(10.12)
(10.13)
\int_{-\pi}^{\pi} |\varphi_\delta^{(x)}(\lambda)|^2\, dF_{xx}(\lambda) = \int_{-\pi}^{\pi} \Big|\frac{1}{2\pi f_{xx}(\lambda)}\Big|^2 dF_{xx}(\lambda) = \Big(\frac{1}{2\pi}\Big)^2 \int_{-\pi}^{\pi} \frac{d\lambda}{f_{xx}(\lambda)}   (10.14)
is finite. This proves the necessity of the conditions of the theorem. Noting
that
(10.15)
References
Spectral conditions are established for the possibility of extrapolating and in-
terpolating stationary random sequences by a sufficiently large number of terms
with any prescribed accuracy.
Introduction
For each integer t (−∞ < t < +∞) let x(t) be a real random variable whose
square has finite expectation. The sequence of random variables x(t) will be
called stationary if the expectations¹
m = Ex(t)
and
B(k) = E[(x(t + k) - m)(x(t) - m)]
do not depend on t. Without loss of generality we may set
m = Ex(t) = 0.   (1)
Then
B(k) = E[x(t + k)x(t)]. (2)
Since
B(-k) = B(k), (3)
it suffices to consider the second moments B(k) only for k ≥ 0.
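The definitions above can be illustrated with a short sketch (the moving-average sequence below is an invented example, not from the paper): empirical estimates of the second moments B(k) from a sample path respect the symmetry (3) by construction, so only k ≥ 0 matters.

```python
import random

# Estimate B(k) = E[x(t+k) x(t)] from one sample path of a toy stationary
# sequence (a moving average of white noise, mean zero).
random.seed(1)
T = 10000
e = [random.gauss(0.0, 1.0) for _ in range(T + 1)]
x = [e[t] + 0.5 * e[t + 1] for t in range(T)]

def B_hat(k, x):
    k = abs(k)                 # B(-k) = B(k) for a real sequence, as in (3)
    n = len(x) - k
    return sum(x[t + k] * x[t] for t in range(n)) / n

# For this MA(1) example the true values are B(0)=1.25, B(1)=0.5, B(2)=0.
print(B_hat(0, x), B_hat(1, x), B_hat(2, x))
```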
The problem of linear extrapolation of a stationary sequence satisfying (1)
is to select, for given n > 0 and m ≥ 0, real coefficients a_s for which the linear
combination

L = \sum_{s=1}^{n} a_s x(t-s)

of random variables
INTERPOLATION OF STATIONARY RANDOM SEQUENCES 273
gives the closest possible approximation to the random variable x(t + m). It is
natural to take the expectation

\sigma^2 = \mathsf{E}(x(t+m) - L)^2 = B(0) - 2\sum_{s=1}^{n} B(m+s)\,a_s + \sum_{p=1}^{n}\sum_{q=1}^{n} B(p-q)\,a_p a_q

as the measure of accuracy of such an approximation.
If the second moments B(k) are known, then it is easy to find the coefficients
a_s for which σ² takes the smallest value. This smallest value of σ² will
be denoted by σ_E²(n, m).
Clearly, σ_E²(n, m) cannot increase with n. Therefore the limit
(5)
exists.
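The finite minimization can be sketched concretely (the covariance B(k) = 0.6^|k| below is an invented example, not from the paper): differentiating σ² in each a_s gives the normal equations Σ_q B(p − q) a_q = B(m + p), and the minimal value is σ_E²(n, m) = B(0) − Σ_s B(m + s) a_s, which cannot increase with n.

```python
# Solve the normal equations for the best linear extrapolation coefficients
# and evaluate the minimal mean square error sigma_E^2(n, m).
def B(k):                      # invented covariance of a Markov-type sequence
    return 0.6 ** abs(k)

def solve(A, b):               # tiny Gaussian elimination (A is positive definite)
    n = len(b)
    A = [row[:] for row in A]; b = b[:]
    for i in range(n):
        for j in range(i + 1, n):
            f = A[j][i] / A[i][i]
            for k in range(i, n):
                A[j][k] -= f * A[i][k]
            b[j] -= f * b[i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

def sigma2(n, m):
    A = [[B(p - q) for q in range(1, n + 1)] for p in range(1, n + 1)]
    rhs = [B(m + p) for p in range(1, n + 1)]
    a = solve(A, rhs)
    return B(0) - sum(B(m + s) * a[s - 1] for s in range(1, n + 1))

# Non-increasing in n, so the limit (5) exists; constant here because the
# invented B(k) is of Markov type.
print([round(sigma2(n, 1), 6) for n in (1, 2, 4, 8)])
```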
Our second goal is to determine σ_I². The solution to both these problems
was announced in my paper [1].² It uses notions relating to the spectral theory
of stationary random processes.
² There is a misprint in formula (1) of [1]. The correct form of (1) is:

\lim_{n\to\infty} \sigma_I^2(n) = \pi \Big/ \int_0^{\pi} \frac{d\lambda}{w(\lambda)}.
Theorem 1. For any stationary sequence {x(t)} the second moments B(k)
can be represented in the form
The derivative

w(\lambda) = dW(\lambda)/d\lambda

of the non-decreasing function W(λ) exists almost everywhere, is non-negative
and summable. Since

\log w(\lambda) \le w(\lambda),

it follows from the summability of w(λ) that the integral
P = \frac{1}{\pi}\int_0^{\pi} \log w(\lambda)\, d\lambda   (8)

is either finite or equal to −∞.⁴ Further, we prove the following:
R = \frac{1}{\pi}\int_0^{\pi} \frac{d\lambda}{w(\lambda)},   (12)

\sigma_I^2 = 1/R.   (13)
Our discussion will be based on the axioms and the construction of the basic
notions suggested in my book [5], with one difference: our random variables
can take not only real, but also complex values.⁶
Consider the set H of all random variables of a certain Borel probability
field (F, P) with finite expectation of the squared absolute value, regarding
equivalent random variables (that is, random variables that differ from each
other with probability zero) as identical. We introduce in H the scalar product
(15)
the additional condition (1). Then, by (2) and the fact that x(t) is real, we
have

B(k) = \mathsf{E}[x(t+k)x(t)] = (x(t+k), x(t)).

Since, by definition, B(k) does not depend on t, {x(t)} is a stationary sequence
of elements of H in the sense of [4].
In [4] I consider stationary sequences in a Hilbert space, that is, in the
space satisfying not only the Axioms A, B and E, but also C and D from [6].
However, this restriction is inessential. Indeed, denoting by H_x the smallest
closed linear subspace of H containing all elements of {x(t)}, it is easy to show
that H_x is separable, that is, satisfies Axiom D. A separable unitary space is
either a Hilbert space itself (that is, satisfies not only Axioms A, B and E, but
also Axiom C), or is finite dimensional, and in this latter case can be extended
to a Hilbert space H.
Thus, all the results obtained in [4] may be applied to {x(t)}, setting
(17)
(19)
It is clear from (18) and (3.9) and Theorem 2 of [4] that W_xx(λ) is a
non-decreasing real function. Together with (18) and (20) this shows that the
function

W(λ) = W_xx(λ)
satisfies the requirements of Theorem 1.
In the general case of stationary sequences in the sense of [4] we take (22) as
the definition of σ_E²(m).
If {x(t)} is singular, then H_x(t − 1) = H_x and hence
(23)
⁷ As is known, ‖δ(t − 1, m)‖ equals "the distance" of the point x(t + m) from the
space H_x(t − 1), that is, the greatest lower bound of the distances ‖x(t + m) − y‖
for all y in H_x(t − 1). Since the elements of the form
are everywhere dense in H_x(t − 1), ‖δ(t − 1, m)‖ also equals the greatest lower
bound of the distances
(*)
If all the x(s) are real random variables, then the greatest lower bound of (*)
does not change when we consider only real coefficients a_k, in which case it
clearly coincides with σ_E(m).
x(t+m) = s_x(t+m) + \sum_{n=0}^{\infty} c_n^{(x)} u_x(t+m-n).   (24)

Since s_x(t+m) and u_x(t+m−n) belong to H_x(t−1) for n > m and the u_x(t+m−n)
are orthogonal to H_x(t−1) for n ≤ m, comparing (21) and (24) we obtain
Since the elements u_x(t+i) are pairwise orthogonal and normalized, (25) implies
that

\sigma_E^2(m) = \|\delta(t-1, m)\|^2 = (c_0^{(x)})^2 + (c_1^{(x)})^2 + \cdots + (c_m^{(x)})^2.   (26)
It is easy to derive Theorem 2 from (26) for stationary sequences of real random
variables. This is carried out in this section.
By (3.9) we have
(27)
\int_{-\pi}^{+\pi} \log f_{xx}(\lambda)\, d\lambda = 2\int_0^{\pi} \log w(\lambda)\, d\lambda - 2\pi\log 2\pi.   (29)
Together with Theorem 23 from [4], (29) shows that the equality

P = \frac{1}{\pi}\int_0^{\pi} \log w(\lambda)\, d\lambda = -\infty

is necessary and sufficient for the singularity of the sequence {x(t)}. We have
already seen in §3 that in this and only in this case

\sigma_E^2(m) = 0.   (30)
(c_0^{(x)})^2 = 2\pi \exp\Big(\frac{1}{2\pi}\int_{-\pi}^{+\pi} \log f_{xx}(\lambda)\, d\lambda\Big) = \exp\Big(\frac{1}{\pi}\int_0^{\pi} \log w(\lambda)\, d\lambda\Big) = e^P.   (31)
Under the same assumption, namely if {x(t)} is non-singular,⁸ (8.31), (27)
and (28) imply that

\Gamma_x(\zeta) = c_0^{(x)} \exp\Big(\sum_{k=1}^{\infty} a^{(x)}(k)\,\zeta^k\Big).

Setting

\exp\Big(\sum_{k=1}^{\infty} a^{(x)}(k)\,\zeta^k\Big) = 1 + r_1\zeta + r_2\zeta^2 + \cdots,   (34)

we obtain

c_n^{(x)}/c_0^{(x)} = r_n.   (35)
Formulas (30), (31) and (36) now complete the proof of Theorem 2.
§5. Definition of σ_I²
After [4], we denote by H_x(t) the smallest closed linear subspace of H_x containing
the elements
where y(t) belongs to H_x(t) and δ(t) is orthogonal to H_x(t). It is easy to show
that for a stationary sequence of real random variables
(37)
(38)
\sigma_I^2 = d_x^2 = \pi \Big/ \int_0^{\pi} \frac{d\lambda}{w(\lambda)} = \frac{1}{R}.
References
In a recent paper [1] N.K. Razumovskii indicates many cases when the logarithms
of particle sizes (gold grits in gold placers, rock particles under grinding,
etc.) obey approximately the Gauss distribution law. The aim of this paper
is to give a fairly general scheme of the random process of particle grinding,
for which in the limit (when grinding does not stop) the Gauss law for the
logarithms of particle sizes can be established theoretically. Perhaps similar
considerations will help to explain also why the Gauss distribution is applicable
to the logarithms of mineral contents in separate samples (Razumovskii's
paper is mainly devoted to this question).
Let us study the general number of particles N(t) and their distribution
in size at successive times t = 0, 1, 2, ....
Let N(r, t) denote the number of particles with sizes ρ ≤ r at time t (in
what follows it is immaterial whether ρ denotes diameter, weight or any other
characteristic of a particle's size, provided the size of each particle obtained
after grinding a particle of size r does not exceed r).
We denote by Q(k) the expectation of the number of particles of size ρ ≤ kr
formed during the period from t to t + 1 from one particle which at time t had
size r. We set

B^2 = \frac{1}{Q(1)}\int_0^1 (\log k - A)^2\, dQ(k).   (2)
Under certain assumptions given below it can be proved that for sufficiently
large t the ratio
N(e^z, t)/N(t)   (3)

is arbitrarily close to

\frac{1}{\sqrt{2\pi t}\,B}\int_{-\infty}^{z} \exp\Big\{-\frac{(\xi - At)^2}{2B^2 t}\Big\}\, d\xi.   (4)
282 LOG-NORMAL DISTRIBUTION OF PARTICLE SIZES
The most essential assumption used to derive this relation is that the
probability that a particle is ground into a certain number of parts of certain
relative sizes per unit time does not depend on the size of the initial particle.
To formulate rigorously the required assumptions we introduce our notation.
Let p_n be the probability of obtaining exactly n particles from one particle
during the period between t and t + 1 and let
be the conditional distribution law for the ratios k_i = r_i/r of the sizes of
the resulting n particles to the size of the original particle. The n particles
resulting from grinding are supposed to be enumerated in increasing order of
size: r_1 ≤ r_2 ≤ r_3 ≤ ... ≤ r_n.
In accordance with this, F_n(a_1, a_2, ..., a_n) is defined only for 0 ≤ a_1 ≤
a_2 ≤ ... ≤ a_n ≤ 1.
Clearly,
is finite;
d) at initial time t = 0 there is a certain number of particles N(O) with
arbitrary size distribution N(r,O).
Under these assumptions:
1) The expectation of the total number of particles N(t) at the moment t
is

\overline{N}(t) = N(0)\,Q(1)^t,

and

\frac{N(e^z, t)}{\overline{N}(t)} = \frac{N(e^z, t)}{N(0)\,Q(1)^t} = T(z, t).   (7)
(8)
Setting

Q(k) = Q(1)\,S(\log k),   (9)

from (7) and (8) we obtain
It is easy to deduce from (7) and (9) that S(z) and T(z, 0) = N(e^z, 0)/N(0)
satisfy all the requirements for distribution functions.² In view of the recurrence
relation (10), the same is true for the functions³ T(z, t) for any integer
t > 0. Condition (c) implies that
(11)
T(z, t) \to \frac{1}{\sqrt{2\pi t}\,B}\int_{-\infty}^{z} \exp\Big\{-\frac{(\xi - At)^2}{2B^2 t}\Big\}\, d\xi   (12)
¹ This point should not be dismissed as being trivial. For N(t) and N(r, t) taken
separately, the corresponding statement, that is, that the ratios N(t)/\overline{N}(t) and
N(r, t)/\overline{N}(r, t) are close to 1, would be wrong.
² In this case we set for z > 0,
S(z) = S(0) = 1.
Therefore in all integrals involving dS the upper limit 0 can be replaced by +∞
without changing the value of the integral.
3 In our problem S(z) and T(z, t) are not probability distributions, but rather
express certain expectations. This does not prevent us from invoking Lyapunov's
theorem, considered as a theorem in pure analysis.
are clearly the same as the A and B defined by (1) and (2).
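The mechanism behind (12) can be illustrated by a simulation sketch (all parameters below are invented): when the splitting law does not depend on the particle's size, the log-size of a fragment followed down the generations is a sum of independent, identically distributed terms log k, so by the central limit theorem its mean and variance grow like At and B²t.

```python
import math
import random

# Follow one fragment lineage: at each step the particle splits and we track
# the piece of relative size U, uniform on (0, 1); then log r(t) is a sum of
# t i.i.d. copies of log U, with E log U = -1 and Var log U = 1.
random.seed(7)
t = 400
logs = []
for _ in range(2000):
    z = 0.0                    # log of the fragment's size, r(0) = 1
    for _ in range(t):
        z += math.log(random.random())
    logs.append(z)

mean = sum(logs) / len(logs)
var = sum((z - mean) ** 2 for z in logs) / len(logs)
print(mean / t, var / t)       # close to A = -1 and B^2 = 1
```

This tracks a single lineage per particle, which is enough to exhibit the Gaussian limit for log-sizes; counting all fragments simultaneously would reproduce the expectations T(z, t) themselves.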
It would be interesting to study mathematical schemes in which the rate
of grinding decreases (or increases) with the decrease of particle size. Then it
would be natural to consider first the cases when the grinding rate is propor-
tional to a certain power of particle size. If this power is non-zero, then the
logarithmic normal law is no longer applicable.
Steklov Mathematical Institute, 17 December 1940
References
1. N.K. Razumovskii, Dokl. Akad. Nauk SSSR 28:8 (1940) (in Russian).
30. JUSTIFICATION OF THE METHOD OF LEAST SQUARES *
From the purely practical viewpoint, the standard literature on the method
of least squares has one essential drawback: it does not give any indications
on the use of the Student distribution and χ² laws for evaluating the reliability
of results obtained (for computing the probability that the errors exceed
certain limits). On the other hand, the use of the Gauss law for a small num-
ber of observations results in a very high, and in practice, quite noticeable,
overestimation of this reliability (see below §§9, 10).
In addition, the standard handbooks do not explain adequately the techniques
involved in the method of least squares; this is in evidence in the teaching
courses in universities and pedagogical institutes where the students are supposed
to have a good knowledge of linear algebra. The point is that usually all
the basic results of the method of least squares are obtained in a very clumsy
purely computational way, whereas the use of proper general methods of modern
linear algebra (for example, the notion of orthogonality) gives the same results
much more transparently. The presentation is most lucid when it is performed
with the help of notions of n-dimensional vector geometry.
The aim of this paper is to indicate by means of an example of a very
simple problem on the method of least squares, under the assumption that all
observations are equally valid, how one may obviate both these drawbacks. The
reader is supposed to have a knowledge of linear algebra in geometric vector
presentation and the fundamentals of probability theory. Greek characters,
excluding π and Γ, denote random variables.
y = \sum_{j=1}^{n} a_j x_j.   (I)
We assume that the a_j are uniquely determined by the values of x_jr and y_r,
that is, the rank of the matrix ‖x_jr‖ is greater than or equal to n. This
implies that N ≥ n. In the experimental determination of the y_r certain errors
are inevitable. Instead of the true values of y_r we experimentally obtain the
values

η_r = y_r + Δ_r.   (III)

Given x_ir and η_r, we have to determine the best approximate values
α_i of the coefficients a_i.
Let
n
(VI)
(IX)
where s is the mean square error of the experimental values of y_r (it is assumed
to be independent of the number r) and the q_ik are determined from the equations

\sum_{j=1}^{n} [x_i x_j]\, q_{jk} = e_{ik}, \qquad i, k = 1, 2, \ldots, n.   (X)

Here

e_{ik} = 0 for i ≠ k, \qquad e_{ik} = 1 for i = k.
Formula (IX) is a particular case (for i = j) of the formula
(XI)
(XIII)
(XIV)
The latter formula shows that for large N − n the ratio σ : s is indeed close to 1
with probability close to 1. Together with formula (IX) this makes it possible
to consider q_jj σ² as an approximate value of D a_j:
[u] = 0.
In this case the denominator on the right-hand side of (XII) vanishes and σ is
undefined.
Formulas (VIII), (IX), (XIII) and (XIV) give but a rough indication of the size
of the error resulting from the replacement of a_j by α_j and s by σ. The final
solution of this problem should have consisted in producing distribution laws
for the deviations α_j − a_j and σ − s. Gauss did this (under the assumption
that the errors Δ_r are independent and obey the Gauss distribution law with
mean equal to zero) for α_j − a_j. Namely, he found that
(XVI)
The deviations of a_j from α_j can be evaluated more accurately with the help
of the distribution law
(XVII)
where

H_m(h) = \frac{1}{2^{(m-2)/2}\,\Gamma(m/2)}\int_0^h h^{m-1} e^{-h^2/2}\, dh.   (XVIII)
Tables of the function H_m and its inverse are widely used. In §10 we point
out a possible modification in these tables which seems to be desirable from
the viewpoint of practical application of the method of least squares.
The error resulting from replacing a_j by α_j for an unknown s is estimated
using the theorem stating that

P\Big\{\frac{\alpha_j - a_j}{\sqrt{q_{jj}}\,\sigma} < t\Big\} = S_{N-n}(t),   (XIX)

where¹

S_m(t) = \frac{\Gamma((m+1)/2)}{\sqrt{m\pi}\,\Gamma(m/2)}\int_{-\infty}^{t}\Big(1 + \frac{t^2}{m}\Big)^{-(m+1)/2}\, dt.   (XX)
Regarding

y = \sum_{j=1}^{n} a_j x_j,   (1)

\eta = y + \Delta,   (2)
1 At the end of this paper I give the table of the function inverse to Sm(t) which
is a reduced version of the table published by E.N. Pomerantseva. I believe it
satisfies the basic needs for a practical use of the method of least squares (see
§9).
Since the choice of the α_i is at our disposal, (3) merely asserts that

η* ∈ L,   (5)

where

L = L(x_1, x_2, \ldots, x_n).

These are the normal equations (VII) for determining the α_j. The corresponding
determinant
(9)
is the Gram determinant of the vectors x_1, x_2, ..., x_n. Since, by the assumption
of §1, ‖x_jr‖ has rank n, the vectors x_i are linearly independent and G ≠ 0.
Therefore (8) uniquely determine the α_j.
in L that are biorthogonal to the system x_1, x_2, ..., x_n; this means that
Since each of the systems {x_1, x_2, ..., x_n} and {u_1, u_2, ..., u_n} is a basis
in L, it follows that

u_i = \sum_{k=1}^{n} q_{ik} x_k, \qquad x_i = \sum_{k=1}^{n} c_{ik} u_k.

Taking the scalar product of the first of these equalities and u_j, and the scalar
product of the second and x_j, and using (10) we obtain
(12)
that is,

u_i = \sum_{j=1}^{n} q_{ij} x_j = \sum_{j=1}^{n} [u_i u_j]\, x_j,   (13)

x_i = \sum_{j=1}^{n} [x_i x_j]\, u_j.   (14)

By (13)-(14), ‖q_ij‖ is the inverse to ‖[x_i x_j]‖, that is, the q_ij are indeed
determined from (X).
Taking the scalar product of (1) and u_i we obtain

a_i = [y\,u_i].   (15)
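Formulas (13)–(15) can be illustrated numerically (the data below are invented): building the u_i from the inverse of the Gram matrix ‖[x_i x_j]‖ yields the biorthogonality [x_i u_j] = e_ij, and for exact data y the scalar products [y u_i] recover the true coefficients.

```python
# Two invented basis vectors x_1, x_2 in 4-dimensional space and exact data
# y = 2*x_1 - 0.5*x_2; the biorthogonal vectors u_i = sum_j q_ij x_j with
# q = Gram^(-1) satisfy [x_i u_j] = e_ij and [y u_i] = a_i.
x = [[1.0, 1.0, 1.0, 1.0],
     [0.0, 1.0, 2.0, 3.0]]
a = [2.0, -0.5]
y = [a[0] * x[0][r] + a[1] * x[1][r] for r in range(4)]

def dot(z, w):
    return sum(p * q for p, q in zip(z, w))

G = [[dot(x[i], x[j]) for j in range(2)] for i in range(2)]   # Gram matrix
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
q = [[G[1][1] / det, -G[0][1] / det],
     [-G[1][0] / det, G[0][0] / det]]                          # q = G^(-1)
u = [[sum(q[i][j] * x[j][r] for j in range(2)) for r in range(4)]
     for i in range(2)]

for i in range(2):
    for j in range(2):
        assert abs(dot(x[i], u[j]) - (1.0 if i == j else 0.0)) < 1e-12
print(dot(y, u[0]), dot(y, u[1]))   # recovers a_1 = 2.0, a_2 = -0.5
```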
(17)
(18)
(19)
(20)
Formulas (19) and (20) are none other than formulas (VIII) and (XI). As has
already been mentioned, (XI) gives (IX) for i = j.
We set

\Delta^* = \eta^* - y,   (21)

\Delta = \Delta^* + \varepsilon.   (22)

Setting
(23)
we obtain
(24)

\Delta^* = \sum_{r=1}^{n} \bar{\Delta}_r b_r,   (25)

\varepsilon = \sum_{r=n+1}^{N} \bar{\Delta}_r b_r.   (26)
Clearly, by C) and D),
for r = r',
(27)
for r ≠ r'.
Formula (26) and B) imply that

\mathsf{E}\varepsilon = 0,   (28)
Formula (29) and (XII) (this formula merely serves as a definition of σ) immediately
imply (XIII).
For our further derivations, the assumptions B), C) and D) given at the beginning
of §6 should be replaced by the following stronger ones:
G) The errors Δ_r obey the Gauss distribution
(30)
(31)
Formula (31) gives the probability density of a random vector d with
(32)

\bar{t}_r = [t\, b_r],   (33)

\bar{\Delta}_r = [d\, b_r],   (34)

(35)
(32')
Therefore in this new coordinate system f(t) can also be considered as the
probability density of d. In other words, the probability density of the random
variables \bar{\Delta}_1, \bar{\Delta}_2, \ldots, \bar{\Delta}_N is of the form
(30')
Formula (30') shows that the random variables \bar{\Delta}_r are independent and
obey the same distribution law

P\{\bar{\Delta}_r < t\} = \frac{1}{\sqrt{2\pi}\,s}\int_{-\infty}^{t} e^{-t^2/2s^2}\, dt,   (36)

as do the Δ_r.
It is important to be aware of the fact that the independence of the \bar{\Delta}_r is
derived from both G) and U). This conclusion no longer holds when the
Gaussian distribution for the errors of the first kind is replaced by some other
one.
(37)
(39)
The independence of the random variables \bar{\Delta}_r implies that of the random
variables \bar{\Delta}_r/s. By (36) each of these latter variables obeys the normal
distribution law with probability density \frac{1}{\sqrt{2\pi}}e^{-t^2/2}. Therefore their
(N − n)-dimensional distribution law is characterized by the probability density

(2\pi)^{-(N-n)/2} \exp\Big\{-\frac{1}{2}\sum_{r=n+1}^{N} t_r^2\Big\}.
χ < h
Thus we obtain²

\mathsf{E}\chi^2 = N - n,   (41)

\mathsf{D}\chi^2 = \mathsf{E}\{\chi^2 - (N-n)\}^2 = 2(N-n).   (42)

By (37), formulas (41) and (42) are equivalent to (XIII) and (XIV).
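Formulas (41)–(42) are easy to check by simulation (a sketch with invented sample sizes, not from the text): a sum of N − n squared standard normal variables has mean N − n and variance 2(N − n).

```python
import random

# Simulate chi^2 with m = N - n degrees of freedom as a sum of m squared
# independent standard normal variables.
random.seed(3)
m = 5
trials = 40000
samples = [sum(random.gauss(0.0, 1.0) ** 2 for _ in range(m))
           for _ in range(trials)]
mean = sum(samples) / trials
var = sum((s - mean) ** 2 for s in samples) / trials
print(mean, var)               # close to m = 5 and 2m = 10
```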
It only remains to consider the distribution law for

\tau_j = \gamma_j/\chi,   (44)

where

\gamma_j = \sqrt{N-n}\,(\alpha_j - a_j)/(\sqrt{q_{jj}}\,s).   (45)

We now note that by (15), (16), (21) and (25),

\alpha_j - a_j = [\Delta^* u_j] = \sum_{r=1}^{n} u_{jr}\,\bar{\Delta}_r,   (46)
2 For the derivation of (40) it should be noted that the area of the surface of a
sphere of radius R in m-dimensional space is
where
(47)
Comparing (46) with (37)-(38) we see that the \bar{\Delta}_r entering in the expression
for α_j − a_j are only those \bar{\Delta}_r that do not occur in the expression for χ².
Therefore, the independence of the \bar{\Delta}_r implies that α_j − a_j is independent of
χ. By (45), γ_j and χ are also independent. In accordance with (XVI), the
probability density of γ_j is

\frac{1}{\sqrt{2\pi(N-n)}}\, e^{-c^2/2(N-n)}.

Together with (40) this implies that the 2-dimensional probability density of
γ_j and χ is

\frac{1}{\sqrt{\pi(N-n)}}\cdot\frac{h^{N-n-1}}{2^{(N-n-1)/2}\,\Gamma((N-n)/2)} \exp\Big\{-\frac{c^2}{2(N-n)} - \frac{h^2}{2}\Big\}.
Integrating this probability density over the region where c/h < t, we
obtain

P\{\tau_j < t\} = P\Big\{\frac{\gamma_j}{\chi} < t\Big\} = \frac{\Gamma((N-n+1)/2)}{\sqrt{\pi(N-n)}\,\Gamma((N-n)/2)}\int_{-\infty}^{t}\Big(1 + \frac{t^2}{N-n}\Big)^{-(N-n+1)/2}\, dt.   (49)
Formulas (XIX) and (XX) are nothing but another way of expressing (49).
The values of t corresponding to various w are given in the second last line
(m = 00) of Table 1.
If s is known, then in accordance with (XV), it is customary to consider
σ as an approximate value of s. However, for the confidence limits
(53)
to satisfy (50), t should be defined not by (52) but, in accordance with (XIX),
(XX), by
SN-n(t) = (1 + w)/2. (54)
The values of t corresponding to various w according to (54) for different values
of N − n = m are given in the same Table 1.
3 See [1] or [2].
Table 1. Values of t satisfying the equation Sm(t)=P
m    P=0.75  P=0.90  P=0.95  P=0.975  P=0.99  P=0.995  P=0.9975  P=0.999  P=0.9995
1 1.000 3.078 6.314 12.706 31.821 63.657 127.321 318.309 636.619
2 0.816 1.886 2.920 4.303 6.965 9.925 14.089 22.327 31.600
3 0.765 1.638 2.353 3.182 4.541 5.841 7.453 10.214 12.922
4 0.741 1.533 2.132 2.776 3.747 4.604 5.597 7.173 8.610
5 0.727 1.476 2.015 2.571 3.365 4.032 4.773 5.893 6.869
6 0.718 1.440 1.943 2.447 3.143 3.707 4.317 5.208 5.959
7 0.711 1.415 1.895 2.365 2.998 3.499 4.029 4.785 5.408
8 0.706 1.397 1.860 2.306 2.896 3.355 3.833 4.501 5.041
9 0.703 1.383 1.833 2.262 2.821 3.250 3.690 4.297 4.781
10 0.700 1.372 1.812 2.228 2.764 3.169 3.581 4.144 4.587
11 0.697 1.363 1.796 2.201 2.718 3.106 3.497 4.025 4.437
12 0.695 1.356 1.782 2.179 2.681 3.055 3.428 3.930 4.318
13 0.694 1.350 1.771 2.160 2.650 3.012 3.372 3.852 4.221
14 0.692 1.345 1.761 2.145 2.624 2.977 3.326 3.787 4.140
15 0.691 1.341 1.753 2.131 2.602 2.947 3.286 3.733 4.073
16 0.690 1.337 1.746 2.120 2.583 2.921 3.252 3.686 4.015
17 0.689 1.333 1.740 2.110 2.567 2.898 3.222 3.646 3.965
18 0.688 1.330 1.734 2.101 2.552 2.878 3.197 3.610 3.922
19 0.688 1.328 1.729 2.093 2.539 2.861 3.174 3.579 3.883
20 0.687 1.325 1.725 2.086 2.528 2.845 3.153 3.552 3.849
21 0.686 1.323 1.721 2.080 2.518 2.831 3.135 3.527 3.819
22 0.686 1.321 1.717 2.074 2.508 2.819 3.119 3.505 3.792
23 0.685 1.319 1.714 2.069 2.500 2.807 3.104 3.486 3.768
24 0.685 1.318 1.711 2.064 2.492 2.797 3.092 3.467 3.745
25 0.684 1.316 1.708 2.060 2.485 2.787 3.078 3.450 3.725
26 0.684 1.315 1.706 2.056 2.479 2.779 3.067 3.435 3.707
27 0.684 1.314 1.703 2.052 2.473 2.771 3.057 3.421 3.690
28 0.683 1.313 1.701 2.048 2.467 2.763 3.047 3.408 3.674
29 0.683 1.311 1.699 2.045 2.462 2.756 3.038 3.396 3.659
30 0.683 1.310 1.697 2.042 2.457 2.750 3.030 3.386 3.646
∞ 0.67449 1.21855 1.64485 1.95996 2.32634 2.57582 2.80703 3.09023 3.29053
w 0.5 0.8 0.9 0.95 0.98 0.99 0.995 0.998 0.999
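A few entries of Table 1 can be spot-checked in closed form: for m = 1 the Student law S_1(t) is the Cauchy distribution function 1/2 + arctan(t)/π, and the m = ∞ row is the normal distribution function.

```python
import math

def S1(t):
    # Student's law with m = 1 degree of freedom (Cauchy distribution)
    return 0.5 + math.atan(t) / math.pi

def Phi(t):
    # standard normal distribution function, the m = infinity row
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

print(S1(1.000), S1(6.314), S1(12.706))   # 0.75, ~0.95, ~0.975
print(Phi(1.95996), Phi(2.57582))         # ~0.975, ~0.995
```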
The equality
(55)
(58)
m    P=0.01  P=0.99    m    P=0.01  P=0.99    m    P=0.01  P=0.99
1 0.388 79.750 8 0.631 2.204 40 0.792 1.343
2 0.466 9.975 9 0.644 2.076 50 0.810 1.297
3 0.514 5.111 10 0.657 1.977 60 0.824 1.265
4 0.549 3.669 15 0.708 1.694 70 0.835 1.241
5 0.576 3.003 20 0.730 1.556 80 0.844 1.222
6 0.597 2.623 25 0.751 1.473 90 0.852 1.207
7 0.615 2.377 30 0.768 1.416 100 0.858 1.195
(59)
(60)
alone. For example, to determine the mean square error made when replacing
the a_i by the α_i in the expression
where the coefficients c_i are given, knowledge of the variances D(a_s) is not
enough, but it is quite sufficient to know the covariance matrix s_ij. Setting
we have
\mathsf{E}\{\beta - b\}^2 = \sum_i \sum_j s_{ij} c_i c_j.
Usually the diagonal elements of the matrix ‖q_ij‖ are introduced as the reciprocals
of the "weights".
This points to the essential role of the entire matrix ‖q_ij‖.
References
This paper refines certain estimates from Gauss's theory of the method of least
squares.
In §39 of "Theoria combinationis observationum erroribus minimis obnoxiae",
Gauss computes the mean square error made in replacing m by
(1)
In §40, Gauss derives from (1) simple estimates of the mean square error.
He derives these estimates from the inequalities
(2)
Gauss overlooked the fact that the upper estimate in (2) can be essentially
sharpened, and (2) may be replaced by
Therefore the conclusions of §40 are unexpectedly weak: the lower estimate
suggested by Gauss for the mean square error is sometimes even negative.
Our aim is to prove the inequalities (3) and to show that they cannot be
improved.
Let us first recapitulate the problem in modern notation close to that in
Kolmogorov's paper [1].
Let y_r, a_i and x_ir, where r = 1, 2, ..., N; i = 1, 2, ..., n; n < N, satisfy
the N equations

y_r = Σ_{i=1}^{n} a_i x_ir.  (4)
304 A FORMULA OF GAUSS IN THE METHOD OF LEAST SQUARES
η_r = y_r + Δ_r,  (5)
(6)
Σ_{r=1}^{N} (η_r − Σ_{i=1}^{n} a_i x_ir)² = min.  (7)
The coefficients u_ir in (8) have the following geometric meaning: in
N-dimensional space the vectors
which means that the vectors u₁, u₂, ..., u_n are uniquely determined by the
following conditions: they belong to the linear n-dimensional subspace L
spanned by x₁, x₂, ..., x_n and satisfy the biorthogonality conditions

(u_i, x_j) = 1 for i = j,  (u_i, x_j) = 0 for i ≠ j.  (9)
Setting

σ̂² = (1/(N − n)) Σ_{r=1}^{N} ε̂_r²,  (10)
where

(11)

we have

(12)

(13)

(14)

where

Θ = Σ_{r=1}^{N} (Σ_{i=1}^{n} z_ir u_ir)².  (15)
§2. Estimation of Θ
Indeed, the projection z″ of any vector z onto L can be written in the form

z″ = Σ_{i=1}^{n} c_i z_i.
e′_r = Σ_{i=1}^{n} x_ir u_i,  (19)

e′_rk = Σ_{i=1}^{n} x_ir u_ik.  (20)
Σ_{k=1}^{N} e′²_rk ≤ Σ_{k=1}^{N} e²_rk = 1,  (22)
Θ ≤ Σ_{r=1}^{N} Σ_{k=1}^{N} e′²_rk = n.
The inequality
(23)
(24)
an equality can be attained for any N and n < N, as has been established by
A.I. Mal'tsev [2] at my request.
§3. Conclusions
(25)
If f₄ ≥ 3s₄, then

(f₄ − s₄)/(N − n) − (n/N) · (f₄ − 3s₄)/(N − n) ≤ D(σ̂²) ≤ (f₄ − s₄)/(N − n),

while if f₄ ≤ 3s₄, then

(f₄ − s₄)/(N − n) ≤ D(σ̂²) ≤ (f₄ − s₄)/(N − n) + (n/N) · (3s₄ − f₄)/(N − n),  (26)
and the bounds in (26) can be attained for any N and n < N in certain
particular cases.
If f₄ − 3s₄ = 0, then from (26) we obtain the formula
(27)
which is well known for the case of the normal Gaussian distribution of the
errors ξ_r.
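These bounds can be illustrated numerically (a sketch: the design, sample size and error law below are my own choices, and the closed-form expression for D(σ̂²) used here is the standard variance of a residual quadratic form under independent identically distributed errors, restated rather than quoted from the paper):

```python
# Model: y_r = a * x_r + xi_r with one unknown (n = 1) and N = 5
# observations; errors xi_r uniform on [-sqrt(3), sqrt(3)], so that
# s4 = sigma^4 = 1 and f4 = E(xi^4) = 9/5 < 3 * s4 (the case of (26)).
N, n = 5, 1
x = [1.0, 2.0, 3.0, 4.0, 5.0]
s4, f4 = 1.0, 9.0 / 5.0

# Diagonal of the residual projection I - P for this design.
norm2 = sum(t * t for t in x)
m_diag = [1.0 - t * t / norm2 for t in x]
S = sum(m * m for m in m_diag)          # sum of squared diagonal entries

# Exact variance of sigma_hat^2 = (residual sum of squares) / (N - n).
D = ((f4 - 3.0 * s4) * S + 2.0 * s4 * (N - n)) / (N - n) ** 2

lower = (f4 - s4) / (N - n)
upper = (f4 - s4) / (N - n) + (n / N) * (3.0 * s4 - f4) / (N - n)
assert lower < D < upper
```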
(28)
(29)
5 May 1947
References
Consider a set of objects (for example, molecules) of n types T₁, T₂, ..., T_n
and assume that with probability P_k^α(t₁, t₂) = P(T_k → S_α | t₁, t₂), during
the time interval (t₁, t₂), one object of type T_k turns into the set S_α
consisting of α₁ objects of the first type, α₂ objects of the second type, α_i
objects of type i, etc. A random process consisting of this kind of
transformation is called a branching process if the probabilities P_k^α(t₁, t₂)
are uniquely determined by the times t₁ < t₂, the number k of the original
type, k = 1, 2, ..., n, and the n-dimensional vector α = (α₁, α₂, ..., α_n)
with integer components, α_i = 0, 1, 2, ....
It is essential here that we assume the probabilities to be independent of:
1) how the original object of type T_k appeared; it is only assumed that it
exists at time t₁;
2) the fate of the other objects of types T₁, T₂, ..., T_n different from T_k
at time t₁ and the objects originating from them at times t > t₁.
This probability-theoretic scheme finds numerous applications in biology,
chemistry and elementary particle physics. In particular, in chemistry it can
be used to describe the initial stages of the most varied chemical reactions.
At the initial stage of a chemical reaction the concentrations of certain
types T′₁, T′₂, ..., T′_m of molecules may be considered high but approximately
constant, whereas the concentrations of other types T″₁, T″₂, ..., T″_m are
variable but very low. Under these assumptions a meeting between two molecules
of the types T″ is virtually impossible, while the results of a meeting between
one molecule of a type T″ with one or several molecules of the types T′
approximately obey the above requirements as regards the number of resulting
molecules of the types T″.
In chemical and physical questions it is natural to use a "continuous time"
version of this scheme, assuming that the probabilities P_k^α(t₁, t₂) are
differentiable with respect to t₁ and t₂. The differential equations derived under
310 BRANCHING RANDOM PROCESSES
By the general principles of probability theory, the P_k^α(t₁, t₂) satisfy the
conditions

(I)

Σ_α P_k^α(t₁, t₂) = 1.  (II)

P_k^α(t, t) = E_k^α = { 1 if α_k = 1, α_i = 0 for i ≠ k;  0 otherwise }.  (III)
Finally, in view of the above assumption, for any
where

Σ_i Σ_s β(i, s) = β.  (IVb)
F_k(t₁, t₂; x₁, x₂, ..., x_n) = Σ_α P_k^α(t₁, t₂) x₁^{α₁} x₂^{α₂} ··· x_n^{α_n}.  (2)
¹ For α_i = 0 the corresponding product in (IVa) and (IVb) is set to be 1, and
the corresponding sum 0.
of the vector argument x = (x₁, x₂, ..., x_n). The reason for introducing the
function F(t₁, t₂; x) is that by means of this function (IV) can be expressed
in the following way:

(A)

Apart from the basic functional equation (A) for F(t₁, t₂; x), (III) also gives
the boundary value

F(t, t; x) = x.  (B)
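In the discrete, one-type, time-homogeneous case the functional equation (A) is just the semigroup property of generating-function iteration, F_{t+s}(x) = F_t(F_s(x)), where F₁ = f is the one-step offspring generating function. A minimal sketch (the offspring law is an arbitrary illustration):

```python
def f(x):
    """One-step offspring generating function: 0, 1 or 2 descendants."""
    return 0.25 + 0.35 * x + 0.40 * x ** 2

def F(t, x):
    """t-generation generating function: f iterated t times."""
    for _ in range(t):
        x = f(x)
    return x

x = 0.7
for t, s in [(1, 1), (2, 3), (4, 2)]:
    assert abs(F(t + s, x) - F(t, F(s, x))) < 1e-12   # property (A)
assert abs(F(5, 1.0) - 1.0) < 1e-12                   # probabilities sum to 1
```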
(3)
(5)
Setting t₁ = t′, t₂ = t′ + Δ, t₃ = t″ in (A) and passing to the limit as
Δ → 0, we obtain for t′ < t″ the following differential equation for
F(t′, t″; x):
This differential equation together with the boundary value (B) can serve as
a basis for computing processes of this type. Indeed, the p_k^α can usually be
determined directly from the conditions of the problem, while the purpose
of the mathematical theory is to determine the probabilities P_k^α(t₁, t₂) and,
sometimes, their asymptotic behaviour as t₂ → +∞. Since the f_k can easily be
found from the given p_k^α and the desired P_k^α are obtained from the power
series expansions of the F_k in the x_i, to determine the probabilities P_k^α
and study their asymptotic behaviour we only have to solve (C) with the
boundary value (B) and investigate the asymptotic behaviour of the solutions.
For a process homogeneous in time the probability densities p_k^α are
constants and the F_k(t; x₁, x₂, ..., x_n) are related to the functions
by the equations
This method is often much more efficient than directly dealing with the
infinite systems of differential equations of Markov processes with a countable
set of states derived from (1) under assumptions similar to (V). Here we will
confine ourselves to one simple example of using this method (the same problem
has been solved by N. Arley [5] using infinite systems):
n = 1:

dF / [(1 − F)(a − bF)] = dt,    F = (a + c e^{(a−b)t}) / (b + c e^{(a−b)t}),

F(0; x) = (a + c)/(b + c) = x,    c = (bx − a)/(1 − x),

F(t; x) = [a(1 − x) + (bx − a)e^{(a−b)t}] / [b(1 − x) + (bx − a)e^{(a−b)t}]
        = P₁⁰(t) + P₁¹(t)x + P₁²(t)x² + ...,

P₁⁰(t) = (a/b) · (1 − e^{(a−b)t}) / (1 − (a/b)e^{(a−b)t}),

P₁^k(t) = (1 − a/b)² e^{(a−b)t} (1 − e^{(a−b)t})^{k−1} / [1 − (a/b)e^{(a−b)t}]^{k+1},  k ≥ 1.
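The displayed solution can be verified numerically (a sketch; the parameter values are my own illustration, and the check confirms that the stated coefficients P₁^k(t) really are the power-series coefficients of F(t; x)):

```python
import math

a, b, t, x = 1.0, 2.0, 0.7, 0.3       # illustrative parameter values
E = math.exp((a - b) * t)

F = (a * (1 - x) + (b * x - a) * E) / (b * (1 - x) + (b * x - a) * E)

# Coefficients P_1^k(t) from the displayed formulas.
P0 = (a / b) * (1 - E) / (1 - (a / b) * E)
def Pk(k):
    return ((1 - a / b) ** 2 * E * (1 - E) ** (k - 1)
            / (1 - (a / b) * E) ** (k + 1))

series = P0 + sum(Pk(k) * x ** k for k in range(1, 200))
assert abs(series - F) < 1e-12                        # expansion matches F
assert abs(P0 + sum(Pk(k) for k in range(1, 2000)) - 1.0) < 1e-9
```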
References
1. M.A. Leontovich, 'Basic equations of kinetic gas theory from the point of
view of the theory of random processes', ZhETF, 5 (1935), 211-231 (in
Russian).
2. R.A. Fisher, The genetical theory of natural selection, Oxford, 1930.
3. J.F. Steffensen, 'Deux problèmes du calcul des probabilités', Ann. Inst. H.
Poincaré 3 (1933), 331-334.
4. A.N. Kolmogorov, 'Solution of a biological problem', Izv. NII Mat. Mekh.
Tomsk Univ. 2:1 (1938), 7-12 (No. 25 in this volume).
5. N. Arley, On the theory of stochastic processes and their application to
the theory of cosmic radiation, Dissertation, Copenhagen, 1943.
33. COMPUTATION OF FINAL PROBABILITIES
FOR BRANCHING RANDOM PROCESSES *
Jointly with B.A. Sevastyanov
The terminology and notation in this paper are close to those in [1]. In what
follows, we consider discrete schemes homogeneous in time t: t runs only
through 1, 2, 3, ... and can be interpreted as the "generation number" of the
particle in question. Accordingly,
(1)
is the probability that one particle of type T_k gives α₁, α₂, ..., α_n
particles of the types T₁, T₂, ..., T_n, respectively, in t generations. All
further computations are based on the generating functions
(2)
of the probabilities
p_k^α = P_k^α(1)  (3)
(5)
In what follows we assume that (7) is already satisfied for the initial system
of types T₁, T₂, ..., T_n and functions f₁, f₂, ..., f_n.
A group of types T_{k₁}, T_{k₂}, ..., T_{k_m} is called closed if a particle of
any type of the group produces only particles belonging to the types of the
group. The system of all types T₁, T₂, ..., T_n is decomposable if it can be
divided into two closed groups. It is natural to confine ourselves (and we
shall do so) to indecomposable systems.
A group of types is called final if a) it is closed; b) each particle of any
type in the group always produces exactly one particle; and c) it does not
contain any smaller group with the properties a) and b).
It is easy to see that two final groups do not have common elements.
Therefore, in general, the entire system T₁, T₂, ..., T_n consists of a certain
number of final groups W_r = {T_{r1}, T_{r2}, ..., T_{rn_r}}, r = 1, 2, ..., s,
and a certain number of types T_{01}, T_{02}, ..., T_{0n₀} which do not belong
to final groups. Clearly, n₀ + n₁ + ... + n_s = n.
The process is considered to be completed if there only remain particles of
the types from final groups. This way of understanding is quite natural, since
the progeny of a particle of a type belonging to a final group consists, in any
future generation, of one particle of a type belonging to the same final group,
and transitions from one type to another within a final group are governed by
the well known law of Markov chains in their simplest form, corresponding to
the assumption that all "states" are "essential" and form one "class" (see, for
example, [2]).
Denote by
(9)
the probability that the evolution of the progeny from one particle of type T_k
eventually terminates in the final state, where β_r particles belonging to the
types of each of the final groups W_r remain;

(10)

is the total probability that the evolution of the progeny from one particle
eventually terminates (in the above sense).
We introduce the generating functions
(11)
When some of the variables attain the value 1, analyticity may be lost, but
continuity remains. In particular,
(13)
(14)
Equations of the system (14) for which the number k corresponds to the
type of a certain final group are corollaries of (15). Therefore, to finally
determine the φ_k we have the system of equations

(16)

φ_rm = u_r,  m = 1, 2, ..., n_r,  r = 1, 2, ..., s.
The junior author of this paper has proved the following theorem on
uniqueness of the solution of (16).
Remark 2. All these considerations are also applicable to computing the final
probabilities q_k^α for branching processes with continuous time. It suffices
to count not over time literally, but over "generations" of particles,
introducing, for the types that do not undergo any further transformation,
additional ghost transformations of their particles into themselves (with
probability 1). This remark will be clarified with an example (see below).
p₁^{(2,0)} = P{T₁ → 2T₁} = p̄₁^{(2,0)} / (p̄₁^{(2,0)} + p̄₁^{(0,1)}) = p,

p₁^{(0,1)} = P{T₁ → T₂} = p̄₁^{(0,1)} / (p̄₁^{(2,0)} + p̄₁^{(0,1)}) = 1 − p,
and additionally
(18)
When 0 ≤ u₁ < 1, 0 ≤ φ₁ < 1 there remains only one branch of the curve
(18) (that with the minus sign).
Let us complete the computations for p = 1/2. In this case we obtain for the
coefficients of the expansion φ₁(u₁) = q₁^{(0)} + q₁^{(1)}u₁ + q₁^{(2)}u₁² + ...

q₁^{(0)} = 0,  q₁^{(1)} = 1/2,  q₁^{(2)} = 1/8,  ...,
q₁^{(m)} = (1·3·5···(2m − 3)) / (2^m · m!),  m ≥ 2,
that is, the process will inevitably terminate, while the expectation
E₁ = Σ_m m q₁^{(m)} of the number of particles of type T₂ obtained from one
particle of type T₁ is infinite. This causes a peculiar phenomenon of
non-stability of the number of particles of type T₂ generated by a given,
though perhaps very large, number of particles of type T₁. To make this clear,
we denote by μ_n the number of particles of type T₂ generated by n particles
of type T₁. Clearly, μ_n = κ₁ + κ₂ + ... + κ_n, where κ_i denotes the number
of particles of type T₂ generated by the ith particle of type T₁. The variables
κ_i are independent and have the probability distribution
P{κ_i = k} = q₁^{(k)}.
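For p = 1/2 the displayed coefficients are the Taylor coefficients of the admissible branch φ₁(u₁) = 1 − √(1 − u₁) of the quadratic φ₁ = ½φ₁² + ½u₁ (this reduction of (18) for p = 1/2 is assumed here). A sketch of the check:

```python
import math

M = 200000
q = [0.0] * M
q[1] = 0.5
for m in range(1, M - 1):
    # ratio of consecutive coefficients of 1 - sqrt(1 - u):
    # q_{m+1} = q_m * (2m - 1) / (2m + 2)
    q[m + 1] = q[m] * (2 * m - 1) / (2 * m + 2)

assert abs(q[2] - 1 / 8) < 1e-15 and abs(q[3] - 1 / 16) < 1e-15

u = 0.4
phi = sum(c * u ** k for k, c in enumerate(q[:120]))
assert abs(phi - (1 - math.sqrt(1 - u))) < 1e-12      # matches the branch
assert abs(phi - (0.5 * phi ** 2 + 0.5 * u)) < 1e-12  # fixed point of (18)

# q_1^(m) ~ const * m^(-3/2): termination is certain (the sum tends to 1),
# although E_1 = sum m * q_1^(m) diverges.
assert sum(q) > 0.995
```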
This together with (19) implies that (see [3]) the distribution law for
(20)
is S_n(x) = P{ζ_n < x}, which tends to a certain limit distribution law

S(x) = ∫₀^x s(u) du.  (21)
The limit distribution law (21) can be found from the logarithm of its
characteristic function,

log χ(t) = log ∫₀^∞ s(x) e^{itx} dx = (1/(2√π)) ∫₀^∞ (e^{iut} − 1) u^{−3/2} du.  (22)
Thus for large n, μ_n is of order n², but the ratio μ_n/n² varies from case
to case.
Moscow, 12 April 1947
References
ω, ω/2, ω/3, ω/4, ....
A natural generalization of periodic oscillations are almost periodic
oscillations of the form

ξ_r(t) ≈ Σ_n a_r^{(n)} e^{iλ_n t}.  (2)
(3)
or, in a completely general form combining the discrete (2) and continuous (3)
cases, to represent oscillatory processes by Stieltjes integrals

(4)
* In: Collected papers on the 30th anniversary of the Great October Socialist
Revolution, Vol. 1, Akad. Nauk SSSR, Moscow-Leningrad, 1947, pp. 242-252.
322 STATISTICAL THEORY OF OSCILLATIONS WITH CONTINUOUS SPECTRUM
cillations in the form (4). Apparently, this justification is general enough
for practical applications. Still, even in Khinchin's concept, the
representation of non-damped oscillations in the form (3) has no rigorous
mathematical justification. I believe that such a state of affairs should be
considered final: there seems
2. All the very simple and complete results stated below are obtained by a
radical change in viewpoint: the ξ_r(t) are considered as random variables in
the sense of probability theory. Thus, the main object of study is not a
certain individual oscillatory process, but the law of the probability
distribution in the function space of various possible versions of such a
process.
Formally, the theory may be presented starting from the following defini-
tions. 4
An s-dimensional random process is a collection of complex random variables
ξ_r(t) given for all real t and r = 1, 2, ..., s.
It is assumed that the expectations
are finite and that the process is stochastically continuous, which means that
as Δ → 0.
A random process {ξ₁(t), ξ₂(t), ..., ξ_s(t)} is called stationary if for any
t, t₁, t₂, ..., t_n, the sn-dimensional distribution law of the sn random
variables

(5)
4 The notions of probability theory used in what follows are rigorously introduced,
for example, in my book [20].
⁵ In (5), ξ̄ denotes the complex conjugate of ξ.
(6)
This case was studied by E.E. Slutskii [3]. He showed that under the
assumption (7) there exists an expansion
where the α_r^{(n)} are certain random variables uniquely⁶ determined by the
given ξ_r(t). In this case,

(9)

In particular,

(9')
6 Uniqueness is to within equivalent variables, that is, variables that are equal
with probability 1.
where the Φ_r(Δλ) are the increments of the spectral function occurring in the
representation of ξ_r(t) in the form (4).
This assumption appears to be correct. The further development of Khinchin's
theory of stationary random processes follows almost automatically from its
reduction to the spectral theory of one-parameter groups of unitary operators,
as is indicated in [4]-[7] and presented in the following section.
3. The random variables ξ_r(t) may be considered as elements of a Hilbert
space in which the scalar product is given by the formula

(11)

It can easily be shown that any stationary process {ξ₁(t), ξ₂(t), ..., ξ_s(t)}
in the corresponding Hilbert space generates a one-parameter group of unitary
operators {U^t} satisfying the relation

(12)

for all real t and all r = 1, 2, ..., s. As is well known, the operators U^t
can be represented as

U^t = ∫ e^{iλt} dE(λ),  (13)
and jumps
at distinct points λ. Naturally, Φ_r(Δλ) and a_r(λ) are random variables, as
is ξ_r(t), but, unlike ξ_r(t), they do not depend on the time t. If for some λ
the jump a_r(λ) is non-zero, then ξ_r(t) contains a strictly periodic component

(15)
a_r(0) = lim_{T→∞} (1/(2T)) ∫_{−T}^{T} ξ_r(t) dt.  (16)
The real meaning of the increments Φ_r(Δλ) for sufficiently short intervals Δλ
is similar. Namely, the component

(17)

corresponding to the interval Δλ of the spectrum can be arbitrarily closely
approximated by

(17')

are not strictly periodic for any finite fixed interval Δλ (unless they are
identically zero). Their time behaviour on short intervals Δλ is similar to
the oscillations of a pendulum with weak damping generated by chaotically
distributed small random pushes.
The components ξ_r(t, Δλ) for non-intersecting intervals Δ'λ and Δ"λ are
not correlated with each other, that is,

(18)

(19)

In particular,

(19')

that is, F_rr(Δλ) is just the average value (with respect to probability) of
the square of the spectral component ξ_r(t, Δλ) of ξ_r(t).
4. The use of the abstract tool of operators in Hilbert space could have
produced the impression on the reader that the spectral components ξ_r(t, Δλ)
are some mathematical fiction far from possible direct experiment. This is
not true: with any desired degree of approximation they can be isolated from
any statistically stationary oscillatory process by means of appropriate
filters. Namely, with any desired accuracy a device can, in principle, be
constructed which associates with any given stationary random function ξ₁(t)
the stationary random function

ξ₂(t) = ∫₀^∞ ξ₁(t − τ) S(τ) dτ,  (20)
If we set

(21)

is arbitrarily close to

(22)

where

L = c₀ + c₁ d/dt + c₂ d²/dt² + ... + c_n dⁿ/dtⁿ  (23)

and all the eigenvalues of L have negative real part. Under these assumptions,
if ξ₁(t) is statistically stationary, then ξ₂(t) tends to a statistically
stationary function as t → +∞. This statistically stationary limit behaviour
of ξ₂(t) is determined by the formula
Formula (24) is of interest in its own right and not merely for the
experimental determination of the spectral components ξ_r(t, Δλ). It implies
that

(25)

(26)

Formulas (24), (25) and (26) generalize the usual resonance theory to the case
of an arbitrary statistically stationary external force. Usually, a
non-statistical presentation of this theory can be applied only to the case
when the external force has a discrete spectrum, that is, when ξ₁(t) is of the
form (2). In the physical literature, however, for the case of a continuous
spectrum this kind of reasoning is widely used, though, of course, without
rigorous mathematical justification.
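As a numerical sketch of this resonance picture (the operator and the numbers below are my own illustration, a special case of (23) with eigenvalues of negative real part): for the damped oscillator L = d²/dt² + 2h d/dt + ω₀², the stationary response to a spectral component e^{iλt} has amplitude |1/L(iλ)|, which peaks sharply near the resonance frequency when the damping h is weak.

```python
h, w0 = 0.05, 1.0     # weak damping, unit natural frequency (illustrative)

def gain(lam):
    """Response amplitude |1 / L(i*lambda)| of L = d^2/dt^2 + 2h d/dt + w0^2."""
    return abs(1.0 / ((1j * lam) ** 2 + 2 * h * (1j * lam) + w0 ** 2))

lams = [k * 0.001 for k in range(1, 3000)]
peak = max(lams, key=gain)

# The gain is maximal at lambda^2 = w0^2 - 2h^2, essentially at w0,
# and off-resonance spectral components are strongly attenuated.
assert abs(peak ** 2 - (w0 ** 2 - 2 * h ** 2)) < 1e-2
assert gain(peak) > 20 * gain(2.0)
```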
(27)
(28)
The simplest and most important case in applications is that in which the
distribution laws of any finite number of the variables ξ_r(t) are Gaussian.
In this case so are the Φ_r(Δλ). In the simplest and most typical case of a
continuous spectrum (the applicability of (27) and (28) and Gaussian
distributions) the random functions Φ_r(λ) are not differentiable and the
passage from (4) to formulas of type (3) seems impossible. On the intervals of
the λ-axis where the spectral density f_rr(λ) is continuous and non-zero, the
nature of the variation of Φ_r(λ) is the same as the time dependence of the
coordinates of a Brownian particle, when neglecting inertial forces, that is,
for small Δ = λ₂ − λ₁ the increments of Φ_r(λ) are of order √Δ. Here is a new
case where continuous nowhere differentiable functions of Weierstrass type
intrude into mathematical physics. For Brownian motion, more refined arguments
taking into account the
References
Wald and a number of other American authors have given interesting theorems
concerning the sums
where the number ν of terms is a random variable (see [1]-[3], where references
to earlier literature can be found). In their method of proof these theorems
go back to the work of one of the authors of the present paper [4], where for
estimating the probability
he considered sums ζ_ν with index ν equal to the first number n for which
The inequality proved in [4] (see also [5], p. 154) can easily be derived from
Theorem 5 of the present paper.
Below we give very simple proofs of theorems of Wald type relating to the
first and second moments. Our conditions for the applicability of the basic
identities are somewhat broader than those of Wald and Wolfowitz. This
generalization of the conditions for their applicability is important for
certain applications.
In what follows, ν denotes a random variable that can take only non-negative
integer values, and

n = 0, 1, 2, 3, ....
332 ON SUMS OF A RANDOM NUMBER OF RANDOM TERMS
p_n = P(S_n).

Moreover, we set

P̄_n = P{ν ≥ n} = Σ_{m=n}^∞ p_m.
Expectations of random variables,
will be understood in the sense of the abstract Lebesgue integral over the set
of elementary events U. Accordingly, the expectations, when they exist, are
always finite, and the existence of E(η) implies the existence of E(|η|). The
conditional probability distributions and conditional expectations are
understood in the sense explained in [6].
Of basic importance for all theorems of Wald type is the assumption
(w) For n > m the random variable ξ_n and the event S_m are independent.
According to [6], (w) means that for n > m the conditional distribution of
ξ_n under the condition S_m coincides with the unconditional distribution
E(ν) = Σ_{n=1}^∞ n p_n = Σ_{n=1}^∞ P̄_n.
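The identity E(ν) = Σ n p_n = Σ_{n≥1} P̄_n can be checked by direct enumeration (the distribution below is illustrative, not from the paper):

```python
p = [0.1, 0.2, 0.3, 0.25, 0.15]      # p_n = P{nu = n}, nu in {0, ..., 4}
assert abs(sum(p) - 1.0) < 1e-12

expectation = sum(n * pn for n, pn in enumerate(p))
tail_sum = sum(sum(p[n:]) for n in range(1, len(p)))   # sum of P{nu >= n}
assert abs(expectation - tail_sum) < 1e-12
```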
Theorem 1 is an obvious consequence of the following more general statement:
Proof of Theorem 2. In view of (I), the series on the right-hand side of (II)
converges absolutely. Applying the Abel transformation to it, we obtain

The event {ν ≥ n} is complementary to the event {ν < n} which, by condition
(w), is independent of ξ_n. But the independence of an event implies the
independence of its complementary event. Therefore, denoting by E(η|A) the
conditional expectation of η under condition A, we obtain
(3)
Since the fact that ξ_n is independent of the event {ν ≥ n} implies that |ξ_n|
is also independent of this event, it follows from (I) that

Σ_{n=1}^∞ P{ν ≥ n} E(|ξ_n| | ν ≥ n) = Σ_{n=1}^∞ P̄_n E(|ξ_n|) < ∞.  (4)
(6)
As for the latter equality, we note that the "necessary event" U is the union
of the events S_m, including S₀. However, the omission of the term with m = 0
on the left-hand side of (6) is inessential since, in accordance with the
accepted rule, we consider the sum ζ₀ of an empty set of terms to be
identically equal to zero.
Comparing (3), (5) and (6), we obtain (II) as required.
When considering the second moments we will assume that
is a vector with two components ξ'_n and ξ"_n. Condition (w) is now understood
in the sense that for n > m the two-dimensional conditional distribution of
ξ_n under the condition S_m coincides with the unconditional distribution of
the same vector. In addition to (w) we also assume the condition
(z) The vectors ξ₁, ξ₂, ξ₃, ... are independent.
By contrast, the dependence between the components of the same vector
can be arbitrary. We now consider the particular case when ξ'_n ≡ ξ"_n:
(III)

exists.
Naturally, in (III) ζ_ν² denotes
B_n^{ij} = b_1^{ij} + b_2^{ij} + ... + b_n^{ij}.

(8)

(9)
we have identically
(10)
From (8), (9) and the independence of ξ_n from ζ_{n−1} (which follows from
(z)) we obtain

(11)
336 ON SUMS OF A RANDOM NUMBER OF RANDOM TERMS
≤ √(b_n¹¹ b_n²²) + √(b_n¹¹ B_n²²) + √(b_n²² B_n¹¹) ≤ 2(√(b_n¹¹ B_n²²) + √(b_n²² B_n¹¹))  (14)

for E(|φ_n|). Comparing (7) and (14) we see that (13) holds, which completes
the proof of Theorem 4.
In the particular case ξ'_n ≡ ξ"_n we can abandon the vector notation and
write

ξ'_n = ξ"_n = ξ_n,   A_n = a₁ + a₂ + ... + a_n,
Theorem 5. If (w) and (z) hold for a sequence of random variables ξ_n, the
expectations E(ξ_n) = a_n, E(ξ_n − a_n)² = b_n exist and the series
Σ_{n=1}^∞ p_n √(b_n B_n) converges, then
References
This paper studies the limiting distributions of the numbers of visits to
various states for a Markov chain with a constant transition probability
matrix.
Consider a classical Markov chain, that is, a random Markov process with
discrete time and a finite number of states (s > 1)
(1.1)
(1.2)
e₁ = (1, 0, ..., 0),
e₂ = (0, 1, ..., 0),
. . . . . . . . . . .
e_s = (0, 0, ..., 1)  (1.3)

in s-dimensional coordinate vector space, then the components
of the vector

μ(t) = ε(1) + ε(2) + ... + ε(t)  (1.4)
A LOCAL LIMIT THEOREM FOR CLASSICAL MARKOV CHAINS 339
(1.7)
Thus the function W_α(m) of the vector argument m comprises the conditional
distributions, for ε(0) = e_α, of all the vectors μ(t) for t = 1, 2, 3, ....
This is possible because, by virtue of the relation

(1.9)
(A) For any two states e_α and e_β there exists a sequence of states
(e_α, e_γ₁, e_γ₂, ..., e_γk, e_β) along which all the transition probabilities
p_α^{γ₁}, p_{γ₁}^{γ₂}, ..., p_{γ(k−1)}^{γk}, p_{γk}^{β} are positive.
The most general case can be reduced to the case (A). This is done in §7.
The following Lemmas 1-3 hold only under condition (A). In the statement
of these lemmas we denote by E_α the conditional expectations under the
(1.10)

Lemma 2.

A_α^γ(t) ≡ E_γ μ_α(t) = t q_α + O(1)  (1.11)

as t → ∞.

(1.12)
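Lemma 2 can be illustrated on a small example (a hypothetical two-state chain of my own choosing): the expected visit counts E_γ μ_α(t), computed exactly by iterating the transition matrix, differ from t q_α only by a bounded quantity.

```python
P = [[0.9, 0.1],
     [0.4, 0.6]]                     # illustrative transition matrix
q = [0.8, 0.2]                       # its stationary distribution: qP = q
assert abs(sum(q[i] * P[i][0] for i in range(2)) - q[0]) < 1e-12

dist = [1.0, 0.0]                    # start in state e_1
expected_visits = [0.0, 0.0]
t = 2000
for _ in range(t):
    dist = [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]
    expected_visits = [expected_visits[j] + dist[j] for j in range(2)]

# E_gamma mu_alpha(t) = t * q_alpha + O(1): the remainder stays bounded.
for j in range(2):
    assert abs(expected_visits[j] - t * q[j]) < 1.0
```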
W~ßI = 0, (1.18)
and all the principal minors of the matrix ||b_αβ|| are equal to one another:

(1.19)

(1.20)

Δ > 0.  (1.21)
Under condition (B) the form b(x) has an inverse form c(x) in the space N
which can, for example, be expressed as

               | b₁₁        b₁₂        ...  b_{1,s−1}    x¹      |
               | b₂₁        b₂₂        ...  b_{2,s−1}    x²      |
c(x) = −(1/Δ)  | .......................................................|  (1.22)
               | b_{s−1,1}  b_{s−1,2}  ...  b_{s−1,s−1}  x^{s−1} |
               | x¹         x²         ...  x^{s−1}      0       |

or in a similar alternative way, choosing some other index γ instead of s − 1.
Setting

p(x) = (1/√(s(2π)^{s−1}Δ)) e^{−c(x)/2},  (1.23)
Theorem 1. Under conditions (A) and (B), for any γ and any rectifiable
domain G of the space N,

as t → ∞, where
(1.26)
(1.27)
(1.28)
We then say that an s-dimensional vector z is cyclic if there exists a cycle
(1.27) such that

(1.29)

(1.30)

where z₁, z₂, ..., z_n are cyclic vectors and a₁, a₂, ..., a_n are arbitrary
integers (the number of terms n is also arbitrary). We denote the fundamental
lattice by Z. Clearly Z consists exclusively of vectors with integer
components and forms a group with respect to addition. We now state our
additional condition.

(C) The fundamental lattice Z coincides with the set Q of all integer vectors
of s-dimensional coordinate space.
The results of §§6, 7 imply that both conditions (A) and (C) are necessary
for the limiting behaviour of the probabilities W_γ(m) to be independent of
the indices γ. Thus, the case when both conditions hold is the only case when
we can count on obtaining a local limit theorem in an ideally simple
formulation. This makes us regard the results of §3 and §5 as definitive in a
certain sense:
Theorem 3 (§5). If (A) and (C) hold, then whatever the choice of the
indices γ,

(1.31)
A very interesting question is the following: are (B) and (C) equivalent
if (A) holds? If the answer is yes, then the conditions of applicability of
the Local Theorem 3 and of the Integral Theorem given above would coincide.²
In §6 we give a full analysis of the complications which arise when (C) does
not hold while (A) still holds. As mentioned earlier, in §7 we consider the
case when (A) does not hold. The results of this section are stated in a
somewhat more complicated way, though the question of the limiting behaviour
of the probabilities W_γ(m) is essentially solved for the most general case as
fully as for the case when (A) and (C) hold.
Local theorems, which are the main subject of this paper, will be derived in §5
and §6 using a certain strengthening of the method developed by Doeblin for
¹ Unlike (1.23), in (1.31) there is no factor s under the root sign, since in
the (s − 1)-dimensional space of vectors m with given m̄ (for example, in the
space N) integer points are distributed with density 1/√s. Note also that in
the text of §5 the statement of Theorem 3 differs from that given here in that
the way in which the probabilities tend to their asymptotic expression is
indicated more precisely.
² Let L be the linear closure of the fundamental lattice Z, that is, the set
of all vectors representable in terms of cyclic ones by (1.30) with arbitrary
real coefficients a_k. In §3 (see Corollary 2 of Lemma 10) it is proved that
under condition (A), requirement (B) is equivalent to (B'):
(B') The space L coincides with the entire s-dimensional vector space R.
The question whether (B) and (C) are equivalent would be solved if we could
show that
(*) under condition (A) the fundamental lattice Z always coincides with the
set of all integer points from L.
Assumption (*) is very likely to be true. If it were proved, this would lead
to a certain improvement in the results of §6.
(Assumption (*) was later proved by Rosenknop in Moscow and Chulanovskii in
Leningrad.)
proving the integral limit theorem for the case of an infinite number of
states [4]. To make clear how this method was developed, in this section we
briefly present Doeblin's method in its original form, which is only suitable
for proving integral theorems.
The presentation in this section will be brief, since the results given
here are essentially known. Here, as in §§3-6, (A) will be assumed to hold
without any special indication. Moreover, throughout this section γ will be
assumed to be fixed and

ε(0) = e_γ.  (2.1)
Let

0 = τ(0) < τ(1) < τ(2) < ... < τ(n) < ...

be the sequence of all times t at which the state e_γ is observed. For n ≥ 1
we set
For n = 0 set

λ(0) = 0.  (2.4)

The components δ_α(n) of δ(n) denote the numbers of visits to the state e_α at
times t satisfying the inequalities
that is, between the (n − 1)th and the nth returns to the original state e_γ
(including the very moment of the nth return). Clearly, we always have

δ_γ(n) = 1,  (2.5)
(2.7)
(2.8)
and since

τ(n) = λ̄(n),  (2.9)
Lemma 5. There exist constants C and D > 0 such that for any k and for
δ̄(n) = τ(n) − τ(n − 1) the following inequality holds:

(2.11)
Lemma 5 implies
(2.12)
are finite.
(2.13)
Let τ_t denote the smallest of the numbers τ(n) that are ≥ t and let ν_t
be the corresponding number n.
Setting

λ_t = λ(ν_t) = μ(τ_t) = Σ_{n=1}^{ν_t} δ(n),  (2.14)

we obtain

Lemma 6. There exist constants C and D > 0 such that for any k,

(2.17)
Since always

(2.18)

(2.19)

The well known Wald identities (see [5] concerning the conditions of their
applicability) are applicable to the following sums with random upper limit
ν_t:

λ_t = Σ_{n=1}^{ν_t} δ(n).
(2.22)
By Lemma 6,

E_γ τ_t = t + O(1)  (2.23)
(2.26)

with accuracy to within O(1) as t → ∞. Together with (2.24) this leads to the
formulas (1.13), where

b_αβ = q^γ [ b'_αβ − Σ_ψ (q_β b'_αψ + q_α b'_ψβ) + Σ_{ψ,χ} q_α q_β b'_ψχ ].  (2.29)
Thus we have proved Lemma 3 from §1, at the same time showing the
connection between the coefficients b_αβ and the moments b'_αβ (the assertion
that b_αβ is independent of the index γ should be proved separately, but
because of (A) this is quite simple).
Note here the inversion³ of (2.29) (although we shall not need it in what
follows):

(2.30)
η(n) = √(q^γ/n) [λ(n) − λ̄(n)q] = (1/√n) Σ_{k=1}^{n} θ(k),  (2.31)

where

θ(n) = √(q^γ) [δ(n) − δ̄(n)q].  (2.32)
Since the θ(n) are independent and identically distributed, by (2.33) and
(2.34) the vectors η(n) have, as n → ∞, a Gaussian distribution with matrix of
second moments ||b_αβ||. Applying to the sums

Σ_{k=n'}^{n"} θ(k)
³ Formula (2.30) can be proved directly in a way similar to that given for
(2.29), starting not from (2.27) but from the identity

(2.35)
as n → ∞.
In order to pass from the vectors η(n) to the vectors
(2.36)
as t → ∞.
In the proof of Lemma 7 we can pass from ξ(t) to η(t) via the vectors ξ(τ_t)
and
(2.37)
(2.38)
(2.40)
Finally, by Lemma 6,
(2.41)
Apart from the probabilities W_γ(m) we shall also need the probabilities
(3.1)
W^α_γ(m) is the conditional probability, under the hypothesis ξ(0) = e_γ, of the
combination of the two events:
1) visiting the states e_β (β = 1, 2, …, s) m^β times at the times t =
1, 2, …, m respectively; and
2) visiting the state e_α at the final moment t = m.
For m = 0 we set

$$W_\gamma^\alpha(0) = \begin{cases} 1 & \text{for } \gamma = \alpha,\\ 0 & \text{for } \gamma \ne \alpha. \end{cases} \qquad(3.2)$$
The probabilities W^α_γ(m) are of special interest when the upper and lower
indices are equal. They can be used to express the probability distributions for
the vectors Λ(n) considered in the previous section,
(3.4)
Since always
(3.5)
the distribution of Λ(n) under the hypothesis ξ(0) = e_γ is completely deter-
mined by the values W^γ_γ(m) with m^γ = n, and for any n ≥ 0

$$\sum_{m^\gamma = n} W_\gamma^\gamma(m) = 1. \qquad(3.6)$$
Clearly, for any cyclic vector z for which z^γ > 0, we have
Clearly, the converse is also true: if (3.7) holds for some γ, then z is cyclic.⁴
Especially important for us are the cyclic vectors with z^γ = 1. They are
the only values of the vectors δ(n) which (naturally, provided ξ(0) = e_γ) have
positive probability. They are considered in
Lemma 8. The minimal additive vector group containing all cyclic vectors
with z'Y = 1 coincides with the fundamental lattice Z.
To prove this lemma it suffices to establish that any cyclic vector z can be
represented as a linear combination with integer coefficients of cyclic vectors z
with z^γ = 1. To this end we consider three cases:
1) if z^γ = 1, then our assertion is already proved;
2) if z^γ > 1, then the cycle generating z can be divided into z^γ cycles,
from each visit to e_γ up to the next (along the cycle) visit to e_γ, and
then z can be represented as

$$z = z_1 + z_2 + \cdots + z_{z^\gamma},$$

where the z_i correspond to the partial cycles;
3) if z^γ = 0, that is, the z-generating cycle
does not contain e_γ at all, then by (A), we can find chains
so that they contain e_γ as indicated: the first chain as first element, and
the second chain as last element. Then the chains
are cycles, and for the corresponding cyclic vectors z₁ and z₂ we obtain

$$z_1 - z_2 = z, \qquad z_1^\gamma = 1, \qquad z_2^\gamma = 1.$$
⁴ Note here also the interesting identity (although we shall not need it in what
follows):
Lemma 9. The space ℒ is equal to the linear span of the values of the vectors
δ(n) which have (under the hypothesis ξ(0) = e_γ) positive probability, while the
space ℒ₀ is equal to the linear span of the possible values of the vectors
The first part of Lemma 9 follows directly from Lemma 8. The second
part can be proved in the following way:
1) Lemma 4 and the first part of Lemma 9 imply that q belongs to ℒ.
2) It can easily be shown that Δ̄(n) = 0. Therefore the possible values of
Δ(n) belong to ℒ.
3) Since the possible values of

$$\delta(n) = \frac{1}{\sqrt{q^\gamma}}\,\Delta(n) + \delta^\gamma(n)\,q$$

generate the whole space ℒ, obtained by attaching to ℒ₀ the vector q that does
not belong to ℒ₀, the possible values of Δ(n) generate the entire space ℒ₀.
Since

$$E_\gamma \Delta(n) = 0, \qquad(2.33)$$

$$E_\gamma \Delta^\alpha(n)\,\Delta^\beta(n) = b_{\alpha\beta}, \qquad(2.34)$$
Lemma 10.
for x ∈ ℒ₀,
(3.8)
for x orthogonal to ℒ₀.
Lemma 10 brings us the following conclusions (some of them were already
mentioned in §1):
$$u_{\gamma_0}^{\gamma_1} = u_{\gamma_1}^{\gamma_2} = \cdots = u_{\gamma_{k-1}}^{\gamma_0} = 1. \qquad(3.10)$$
A cycle will be called simple if it contains each state at most once. Simple
cyclic vectors corresponding to simple cycles are characterized by the fact that
all their components are at most 1:
(3.11)
Since there are finitely many simple cycles and all of them can easily
be found, the following lemma gives a very effective way of determining the
fundamental lattice.
Lemma 11. All the vectors in Z are representable in the form (1.30) with
simple cyclic vectors Zi.
For the proof it suffices to note that any cycle in which a certain state occurs
more than once can be divided into two cycles. Repeating this division,
any cycle can be divided into simple cycles.
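The decomposition into simple cycles is easy to mechanize for a small chain: one can enumerate all simple cycles of the transition digraph by brute force and collect their cyclic vectors (state-visit counts). A minimal sketch; the 3-state digraph below is a hypothetical example, not one taken from the text:

```python
from itertools import permutations

def simple_cyclic_vectors(edges, s):
    """Collect the cyclic vectors (visit counts, each component 0 or 1)
    of all simple cycles of a digraph on states 0..s-1 given as an edge set."""
    found = set()
    for k in range(1, s + 1):
        for cyc in permutations(range(s), k):
            if cyc[0] != min(cyc):
                continue  # fix the rotation so each cycle is generated once
            # check that consecutive edges exist, closing back to the start
            if all((cyc[i], cyc[(i + 1) % k]) in edges for i in range(k)):
                found.add(tuple(int(a in cyc) for a in range(s)))
    return sorted(found)

# hypothetical 3-state chain in which every transition between distinct states is possible
edges = {(a, b) for a in range(3) for b in range(3) if a != b}
vectors = simple_cyclic_vectors(edges, 3)
```

For this digraph the simple cyclic vectors are the three 2-cycles and the full 3-cycle; the lattice Z of Lemma 12 is the additive group they generate.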
Let
be the system of all simple cyclic vectors. Then the above considerations im-
mediately give the following result:
Lemma 12. The lattice Z is the minimal additive group generated by the
vectors l₁, l₂, …, l_h, while the space ℒ is the linear span of this system of
vectors. The rank r of the Markov process equals the rank of the matrix
By Lemma 10 the distribution of the vectors Δ(n) and η(n) is totally concen-
trated on the space ℒ. Since the form b(x) is positive on this space, it follows
that as n → ∞, η(n) obeys a Gaussian distribution which is non-degenerate
on ℒ₀ and corresponds to b(x). We denote by p(x) the probability density of
this Gaussian distribution. Then by Lemma 7 we have the following result.
Theorem 2. Under condition (A), for any γ and any domain G whose boundary
intersects ℒ₀ in a set of measure 0 in ℒ₀, we have
In the particular case when (B) holds, ℒ₀ coincides with N and Theorem
2 gives Theorem 1 stated in §1.
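The content of this Gaussian limit can be illustrated numerically: simulate the chain, form the centered and scaled visit-count vector (m − nq)/√n, and check that its average over many runs is near zero. The transition matrix below is a hypothetical example, and q is obtained by power iteration:

```python
import random, math

random.seed(0)
P = [[0.0, 0.7, 0.3], [0.2, 0.0, 0.8], [0.5, 0.5, 0.0]]  # hypothetical chain
s = len(P)

q = [1.0 / s] * s                  # stationary distribution by power iteration
for _ in range(500):
    q = [sum(q[a] * P[a][b] for a in range(s)) for b in range(s)]

def step(a):
    """One transition of the chain from state a."""
    u, acc = random.random(), 0.0
    for b in range(s):
        acc += P[a][b]
        if u < acc:
            return b
    return s - 1

n, trials = 400, 2000
mean = [0.0] * s                   # average of the centered visit-count vector
for _ in range(trials):
    counts, a = [0] * s, 0
    for _ in range(n):
        a = step(a)
        counts[a] += 1
    for b in range(s):
        mean[b] += (counts[b] - n * q[b]) / math.sqrt(n) / trials
```

The centering by nq makes each component O(1) in distribution, so the empirical mean over 2000 runs should be close to the zero vector.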
The sum on the right-hand side extends over all integer vectors l with non-
negative components satisfying the conditions
(5.2)

$$l^\gamma = 0. \qquad(5.3)$$
It is easily checked that the sum of the probabilities W_γ(l) over all vectors
l satisfying (5.3) is given by

$$\sum_{l^\gamma = 0} W_\gamma(l) = E_\gamma\delta(n) = \frac{1}{q^\gamma}. \qquad(5.4)$$
Under the additional restriction (5.2) this sum may become somewhat smaller,
but because of the absolute convergence of (5.4) this reduced sum tends to 1/q^γ
as the m^α tend to infinity.
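The value 1/q^γ in (5.4) is the familiar mean recurrence time of the state e_γ, which is easy to check by simulation. A sketch with a hypothetical 3-state chain, the stationary vector q found by power iteration:

```python
import random

random.seed(2)
P = [[0.1, 0.5, 0.4], [0.3, 0.2, 0.5], [0.6, 0.2, 0.2]]  # hypothetical chain
s = len(P)

q = [1.0 / s] * s                  # stationary distribution by power iteration
for _ in range(500):
    q = [sum(q[a] * P[a][b] for a in range(s)) for b in range(s)]

def step(a):
    """One transition of the chain from state a."""
    u, acc = random.random(), 0.0
    for b in range(s):
        acc += P[a][b]
        if u < acc:
            return b
    return s - 1

trials, total = 20000, 0
for _ in range(trials):
    a, t = step(0), 1              # first step away from (or back to) state 0
    while a != 0:
        a, t = step(a), t + 1
    total += t
mean_return = total / trials       # should approximate 1/q[0]
```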
If (A) and (C) hold, then by Lemma 8, the minimal additive group gen-
erated by the values of the vectors δ(n) with positive probability (under the
hypothesis ξ(0) = e_γ) consists of all integer vectors, for any γ. For the differ-
ences of the values of the vectors δ(n) having positive probability, m^γ is always
zero (since for the possible values of δ(n) themselves m^γ is always 1), that is,
they belong to the group Q_γ of all integer vectors m with m^γ = 0. It is easy
to see that in fact they generate the entire group: if the vectors
where m₀, m₁, …, m_k, … are all the possible values of δ(n), did not generate
the whole group Q_γ, but only a proper subgroup of it, then after adding the one
vector m₀ they could not generate the whole group Q, whereas the group
generated by the vectors
that is, in view of (C), with the whole group Q of integer vectors. As a result,
we have the following:
Lemma 13. Under conditions (A) and (C) the minimal group generated by
the differences of the values of the vectors δ(n) coincides with the group Q_γ of
all integer vectors m with m^γ = 0.
From Lemma 4, the corollary of Lemma 5 and Lemma 13, and using the
limit theorem [6] applied to the sums
we obtain
p_γ(y) is the Gaussian probability density in the space Q_γ corresponding to zero
mean values and covariance matrix ‖b'_{αβ}‖, and the estimate of the remainder
term is uniform for
(5.7)
(5.8)
are arbitrarily close to 1. Then from (5.1), (5.4) and (5.8) we obtain
(5.9)
(5.10)
$$x = \frac{m - m^\gamma q}{\sqrt{m^\gamma}} = \frac{1}{\sqrt{q^\gamma}}\,\bigl(y - y^\gamma q\bigr), \qquad(5.11)$$
(5.13)
Formula (5.13) holds uniformly provided that (5.10) holds. Taking this into
account and keeping in mind the integral Theorem 1, we obtain⁵
(5.14)
Theorem 3. If conditions (A) and (C) hold, then as m → ∞ and for any γ
(5.15)
In this section we retain condition (A), but discard (C). Since now the funda-
mental lattice Z does not coincide with the lattice Q of all integer vectors, it
is natural to consider residues of Q modulo Z. In more detail, this means the
following. Two vectors m₁ and m₂ will be considered congruent modulo Z if
(6.1)
All vectors with integer components are divided into classes of congruent vec-
tors modulo Z. These classes are residue classes modulo Z.
⁵ Of course, (5.13) can be proved directly from the relations between the moments
b'_{αβ} and b_{αβ}. Then the proof of the local Theorem 3 stated below would become
independent of integral theorems. Such a presentation, more consistent from the
algebraic viewpoint, would however be somewhat cumbersome. The factor √s
in (5.14), as mentioned in §1, is connected with the fact that the integer points
in N are distributed with density 1/√s.
Lemma 15. All vectors m satisfying (6.2) belong to the same residue class
modulo Z. This residue class will be denoted by D^α_γ.
To prove Lemma 15, we assume that W^α_γ(m₁) > 0 and W^α_γ(m₂) > 0. Then
there exist chains
for which
By condition (A) there exists a chain e_β, e_{χ₁}, e_{χ₂}, …, e_{χ_k} = e_α. Clearly,

(e_α, e_{φ₁}, e_{φ₂}, …, e_{φ_i} = e_β, e_{χ₁}, e_{χ₂}, …, e_{χ_k} = e_α),
(e_α, e_{ψ₁}, e_{ψ₂}, …, e_{ψ_j} = e_β, e_{χ₁}, e_{χ₂}, …, e_{χ_k} = e_α)
Clearly, always⁶

$$D_\gamma^\gamma = Z, \qquad(6.3)$$

$$D_\gamma^\alpha + D_\alpha^\beta = D_\gamma^\beta. \qquad(6.4)$$
In complete analogy with Lemma 14, the general local limit theorem [6]
gives
for m ∉ Z,
(6.5)
for m ∈ Z,

$$y = \frac{m - m^\gamma q/q^\gamma}{\sqrt{m^\gamma}}, \qquad(5.6)$$
as m^γ → ∞, the estimate of the remainder is uniform for
(5.7)
⁶ Residue classes are added in accordance with the usual algebraic rules.
According to (5.4),

$$\sum_{D} f_\gamma(D) = \frac{1}{q^\gamma}. \qquad(6.8)$$
Since
(3.3)
it follows by Lemma 15 that f_γ(D) is positive only for D that coincide with
some D^α_γ, that is, only for a finite number of residue classes D.
Treating (6.6), (5.1) and (6.7) in the same way as (5.9), we see that, as
m^γ → ∞, when (5.7) holds and m lies in D,
where

$$y_* = \frac{m_* - m_*^\gamma q/q^\gamma}{\sqrt{m_*^\gamma}}, \qquad(6.10)$$

$$m_* = m - u_D. \qquad(6.11)$$
Theorem 4. If (A) holds and r > 1, then as m → ∞ and for m in the residue
class D we have
where 7
(6.13)
W is the density of the points of Z in the space ℒ₀, p(x) is the Gaussian density
in ℒ₀ corresponding to mean value 0 and matrix of second moments ‖b_{αβ}‖, and
the estimate O(1) is uniform under the condition
(5.10)
In the most general case the set of states e₁, e₂, …, e_s splits into a certain num-
ber of "classes" K₁, K₂, …, K_n of "essential" states and a set R of "inessential"
states (see [1]). Condition (A) holds within each class K_j; transitions from a
state e_α ∈ K_i to a state e_β ∈ K_j for i ≠ j are impossible, as are transitions
from a state belonging to one of the K_j to a state in R; on the other hand,
there is always the possibility of passing in a certain number of steps from a
state e_α ∈ R to a state in at least one of the K_j. Clearly transitions of the
latter kind are irreversible: having reached a state of class K_j, our system
cannot leave the states of this class.
Let K be the union of all classes K_j. For e_α ∈ K we denote by O_α the set
of integer vectors m with non-negative components satisfying the conditions
Since from any state e_γ ∈ R a transition to some state e_β ∈ K eventually occurs,
the following lemma holds:

$$\sum_{\alpha\in K}\;\sum_{m\in O_\alpha} W_\gamma^\alpha(m) = 1. \qquad(7.2)$$
⁷ The definition of x_* depends on the choice of the vectors u_D in (5.7), but since
f_γ(D) > 0 only for a finite number of residue classes D, this arbitrariness is
immaterial in the limit theorems.
Lemma 18. There exist constants C and D > 0 such that for any γ and any
m ∈ M′,
(7.3)
We can now easily complete the study of the limit behaviour of W_γ(m) as
m → ∞:
(7.4)
(7.5)
where m₁ ∈ M′ and m₂ has components m₂^α > 0 only for e_α ∈ K_i. Then
(7.6)
It is easy to study the limit behaviour of the probabilities W_α(m₂ − e_α) using
Theorem 4, since within the class K condition (A) holds.
15 March 1949
References
----------------------------
~.......~~~~-. "0_1
~ ----------------- "'---l---i----!-- hn
ffr
gz
Fig.l
SOLUTION OF A PROBABILISTIC PROBLEM ON BED FORMATION 365
$$G(x) = \int_{-\infty}^{x} g(t)\,dt.$$
The third assumption could have been omitted, at the cost of further
complicating the analytical tools. But in view of the applied significance of the
problem, we decided to make the presentation less cumbersome by avoiding the
use of Stieltjes integrals. The second assumption guarantees¹ that the sums

$$\zeta_n^{(r)} = \delta_n + \delta_{n+1} + \cdots + \delta_{n+r}$$

tend to +∞ as the second index tends to +∞;
therefore the greatest lower bounds ψ_n = inf(ζ_n^{(0)}, ζ_n^{(1)}, …, ζ_n^{(r)}, …) are finite
and are attained at a certain (random) finite index r.
It can easily be seen that if ψ_n ≤ 0 the nth bed is washed out completely,
while if ψ_n > 0, it retains some final thickness ψ_n. Therefore the problem is
reduced to determining the probability p = P{ψ_n > 0} and the conditional
distribution of ψ_n under the hypothesis ψ_n > 0. It is clear from assumption 1)
that neither p nor this distribution depends on n.
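Both ψ_n and p are easy to approximate by simulation. A minimal sketch assuming, purely for illustration, normally distributed increments with positive mean (the text does not fix a particular distribution g):

```python
import random

random.seed(3)

def psi(mean=0.5, sd=1.0, r_max=4000):
    """Sample psi = inf of the partial sums delta_n + ... + delta_{n+r}.
    With positive drift the infimum is attained early, so truncation is safe."""
    total, low = 0.0, float("inf")
    for _ in range(r_max):
        total += random.gauss(mean, sd)
        low = min(low, total)
        if total > low + 50 * sd:   # running minimum can no longer change, practically
            break
    return low

trials = 5000
p_hat = sum(psi() > 0 for _ in range(trials)) / trials  # estimate of P{psi > 0}
```

The estimate p_hat lies strictly between 0 and 1: some beds are washed out (first increment negative already forces ψ ≤ 0 with positive probability), while the positive drift keeps a positive fraction of beds intact.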
Assumption 3) implies that the distribution of the random variables ψ_n
is continuous, that is, it can be characterized by a certain probability density
f(x). Clearly, p = ∫₀^∞ f(x)dx and the conditional distribution of ψ_n under the
¹ By a well-known theorem of A.Ya. Khinchin, 1) and 2) imply that our sums obey
the strong law of large numbers, that is,

$$P\Bigl\{\lim_{r\to\infty}\frac{\zeta_n^{(r)}}{r+1} = M\Bigr\} = 1.$$
To find f*(x) and p it suffices to know the function s(x) = f(x)/p. Then
f* is determined by (1) and p is computed by the formula
Theorem. Under the assumptions 1), 2), 3) the function s(x) = f(x)/p
(−∞ < x < +∞) is the unique solution of the integral equation
The proof of (4) is as follows. From the definition of ψ_n it can easily be
derived that
where f₁(x) and f₂(x) are the conditional densities for ψ_n under the hypotheses
ψ_{n+1} > 0 and ψ_{n+1} < 0 respectively; noting that the conditional density
for ψ_{n+1} under the hypothesis ψ_{n+1} < 0 is²
² Here we assume p < 1. When p = 1, g(x) = 0 for x < 0, f(x) = g(x) and (4) holds
trivially.
we obtain

$$f_1(x) = g(x), \qquad(8)$$

$$f_2(x) = \int_{-\infty}^{\infty} g(x-y)\,\tilde f(y)\,dy = \frac{1}{1-p}\int_{-\infty}^{0} g(x-y)\,f(y)\,dy. \qquad(9)$$
Formula (4) now follows immediately from (6), (8) and (9).
To complete the proof of the theorem we now only need to establish the
uniqueness of the solution to (3). Consider the iterated kernels

$$K_r(x,y) = \int_{-\infty}^{\infty} K_1(x,z)\,K_{r-1}(z,y)\,dz.$$
S₀(x) = g(x),

$$S_r(x) = \idotsint\limits_{\substack{u_1+u_2+\cdots+u_k<0\\ k=1,2,\dots,r}} g(u_1)\cdots g(u_r)\,g(x-u_1-\cdots-u_r)\,du_1\cdots du_r$$
To see that it really is a solution, and the unique one, it suffices to prove
the convergence of the series Σ_{r=0}^∞ I_r, where
Clearly,
P{A_n^{(r)}} =
Since the events A_n^{(r)}, r = 1, 2, …, form (to within probability 0) a com-
plete system of pairwise incompatible events, it follows that
(12)
This essentially solves the problem posed at the beginning of the article:
formula (10) gives the function s(x). Using this function we can compute f*(x):
while the probability p is given by (12). We can estimate the remainder terms
of the series (10) and (12). The use of these series for computing numerical
results is somewhat cumbersome but entirely possible.
Fig.2
Figure 2 plots f(x) and f*(x) for the case
P(A) = P{z ∈ A}
for a random point z in a certain space X, and a functional f(P) defined on 𝔓.

$$E^P\varphi = f(P). \qquad(1)$$
f(P) = a
370 UNBIASED ESTIMATORS
while
φ = x̄² − 1/n
f(P) = a²,
etc.
We shall see that in many important cases there are no unbiased estima-
tors. In these cases the following definition is helpful.
Definition 2. Functions φ⁺(x) and φ⁻(x) are called the upper and lower
estimators for f(P) respectively if for any distribution P from 𝔓
c₁ + c₂ + ⋯ + c_n = 1
can serve as an unbiased estimator for a.
The excessive diversity of unbiased estimators can be significantly de-
creased if we confine ourselves to unbiased estimators that are expressed in
terms of properly chosen sufficient statistics of the problem. To formulate the
appropriate general theorems, I shall have to give a somewhat generalized def-
inition of a sufficient statistic. Since in this definition a sufficient statistic can
be not only scalar, but also vector-valued, we do not need to introduce the
notion of "a system of sufficient statistics": in our general definition such a
system of statistics (χ₁, χ₂, …, χ_l) can be considered as a single statistic
$$E_h^P\varphi = \int \varphi(x)\,P(dx\mid h) \qquad(5)$$
Theorem 1. If χ(x) is a sufficient statistic for 𝔓, then for any φ(x) whose
expectation E^Pφ is finite for all P ∈ 𝔓, the conditional expectations E_h^Pφ do not
depend on P ∈ 𝔓.
As with Definition 3, the exact meaning of Theorem 1 needs to be clarified.
It is as follows: for any function φ(x) whose expectation E^Pφ is finite for all P
in 𝔓, there exists a function M(h) that for any P ∈ 𝔓 can be taken for E_h^Pφ.
2 The integrals in (5) should be understood in the sense of (10) and (11) of §4,
Chapter 5 of my book [1].
where Q(Alh) is taken from the explanation to Definition 3 and the integral is
understood in the sense of the footnote to (5).
The following two theorems give a generalization of Blackwell's results [3].
Theorem 3. For any P ∈ 𝔓 for which the variance D^Pφ exists, under the
conditions of Theorem 2 we have the inequality
(8)
which holds for any functions φ(x) and χ(x) provided that Eφ and Dφ exist.
Theorems 2 and 3 can be considered as a substantiation of the natural ten-
dency to use only unbiased estimators that are expressed via sufficient statis-
tics of the problem: Theorem 2 shows that in doing this we do not restrict the
number of problems for which there exist unbiased estimators, and Theorem 3
demonstrates that when we pass from an unbiased estimator φ to the averaged
estimator φ* expressed in terms of the statistic χ, we can only decrease the variance.
It can be shown that the estimator φ* is always "no worse" than the estimator
φ generating it, even when other methods for comparing the "quality" of
estimators are used.
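The improvement described by Theorems 2 and 3 can be verified exactly in a tiny Bernoulli model (a standard illustration, not an example from the text): φ = x₁ is unbiased for the mean, and averaging it over the sufficient statistic Σx_i gives φ* = x̄ with strictly smaller variance.

```python
from itertools import product

def exact_mean_var(est, n, p):
    """Exact mean and variance of an estimator over all Bernoulli(p)^n outcomes."""
    outcomes = list(product((0, 1), repeat=n))
    probs = [p**sum(xs) * (1 - p)**(n - sum(xs)) for xs in outcomes]
    mean = sum(pr * est(xs) for pr, xs in zip(probs, outcomes))
    var = sum(pr * (est(xs) - mean) ** 2 for pr, xs in zip(probs, outcomes))
    return mean, var

n, p = 4, 0.3
m1, v1 = exact_mean_var(lambda xs: xs[0], n, p)        # phi  = first observation
m2, v2 = exact_mean_var(lambda xs: sum(xs) / n, n, p)  # phi* = E(phi | sufficient statistic)
```

Both estimators have mean p, while the variances are p(1−p) and p(1−p)/n respectively, so the averaged estimator is strictly better.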
³ The superscript P to E in (7) is omitted, since by Theorem 1 φ* can be chosen
independent of P.
for the possible values of the estimator φ. The condition that φ is an unbiased
estimator of f(θ) is then expressed as
Formula (1) shows that in the case when θ runs through an infinite number
of values, only few functions f(θ), namely those expressed by the linear forms (1),
allow unbiased estimators φ. If the p_k(θ) are linearly independent, then for
each function f(θ) which allows unbiased estimators there exists only one such
estimator.
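For instance, in the binomial case p_x(θ) = C(n, x)θ^x(1 − θ)^{n−x} the function f(θ) = θ² admits the unbiased estimator φ(x) = x(x − 1)/(n(n − 1)) (a standard textbook example, not taken from the text), which can be checked by direct summation:

```python
from math import comb

def expectation(est, n, q):
    """E_q est(x) for x binomial(n, q), by direct summation."""
    return sum(comb(n, x) * q**x * (1 - q)**(n - x) * est(x) for x in range(n + 1))

n = 6
phi = lambda x: x * (x - 1) / (n * (n - 1))  # unbiased for q^2
errors = [abs(expectation(phi, n, q) - q * q) for q in (0.1, 0.37, 0.8)]
```

Since E[x(x − 1)] = n(n − 1)q² for the binomial law, the errors vanish up to rounding for every q.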
⁴ In the case of several parameters θ₁, θ₂, …, θ_n, we shall denote by θ the vector
(θ₁, θ₂, …, θ_n).
$$q_{hk} \ge 0, \qquad \sum_{k} q_{hk} = 1,$$

$$p_h(\theta) \ge 0, \qquad \sum_{h=1}^{m} p_h(\theta) = 1.$$
In case (2), in accordance with Theorems 2 and 3 we obtain from the unbiased
estimator
for any θ.
The proof of Theorems 2 and 3, which in the general case, though ex-
tremely brief and simple, was based on a somewhat difficult general theory of
conditional probabilities and expectations, becomes purely arithmetic in the
special case under consideration, since the expectations and variances
involved can be expressed by the known formulas:
p(x|θ).
In this case the condition for φ(x) to be an unbiased estimator for f(θ) is
written in the form

$$\int \varphi(x)\,p(x\mid\theta)\,dx = f(\theta), \qquad(1)$$

that is, the determination of an unbiased estimator φ(x) for a given function
f(θ) reduces to the solution of an integral equation.
Suppose further that a function χ(x) is given with values in a space H.
This function divides X into subsets V_h on which χ takes constant values,
χ(x) = h.
$$q(x) \ge 0, \qquad \int_{V_h} q(x)\,dx = 1,$$

$$p(h\mid\theta) \ge 0, \qquad \int p(h\mid\theta)\,dh = 1.$$
with variance
(4)
for any θ, this variance depending only on χ(x):

$$\varphi^* = s[\chi(x)].$$
P(A) = P{(z, y) ∈ A}
for the pairs consisting of an "observable" point z ∈ H and a random variable y
that cannot be observed directly, a function φ(z) is called an unbiased estimator
of y for all P in 𝔓 if
(1)
The question of finding such unbiased estimators for random variables does not
involve new difficulties, since it is equivalent to finding unbiased estimators for
f(P) = E^P y;
only now, to characterize the accuracy of the estimator φ for y it is natural to
consider the expression
(2)
and since D^P y does not depend on the choice of φ, the problem of finding an
unbiased estimator φ for y with the least value of E^P(φ − y)² is equivalent to finding
an unbiased estimator for E^P y with minimal variance D^Pφ.
data, we consider two problems from this field in this section and in §7. In this
paper we shall not go into details on the practical application of the suggested
method. In this section we assume that testing destroys the item
and therefore, in principle, this testing can only be of a sample
character. The inspection system under consideration is as follows.
Random sampling of n items is made from a batch containing N items.
These n selected items are tested and the number x of "defective" items in the
sample is found. If x ≤ c, then the N − n items not included in the sample are
accepted. If x ≥ d = c + 1, then the whole batch is rejected.
If there are y defective items in the batch before testing, then the number
of accepted defective items will be

$$y^* = \begin{cases} y - x & \text{for } x \le c,\\ 0 & \text{for } x \ge d. \end{cases}$$
We set
q=y/N, q*=y*/N.
The aim of inspection is to guarantee that the q* are small enough, without
causing unnecessary (that is, for small q, too frequent) rejection of the entire
batch, and without excessively increasing the sample size n.
To satisfy all these requirements, appropriate values of n and c must be
chosen. The main characteristic of the inspection system with given n and c is
the conditional probability
of obtaining x :::; c for a given q, that is, of accepting the batch. To compute
L(q) we use the conditional probabilities
$$P_m(q) = P\{x = m \mid q\} = \frac{n!\,(N-n)!\,N^n}{m!\,(n-m)!\,N!}\;q\Bigl(q - \frac{1}{N}\Bigr)\times\cdots$$
If it is necessary to accept batches with q < q₀ and to reject those with
q > q₀, then the ideal operative characteristic L(q) would be the function

$$L(q) = \begin{cases} 1 & \text{for } q < q_0,\\ 0 & \text{for } q > q_0, \end{cases}$$
given in Figure 1. This form of L(q) can be achieved only for n = N, which,
in case the tests are of a destructive character, would make the entire operation
meaningless. However, we come close to the ideal characteristic of Figure 1 if
we choose a sufficiently large c and take
n ≈ c/q₀.
If N is very large, this does not lead to an excessive increase in the ratio n/N
which, of course, should remain sufficiently small. Figure 2 gives operative
characteristics for the case⁵
Fig. 1    Fig. 2
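The operative characteristic L(q) is straightforward to compute from the hypergeometric probabilities. A sketch with hypothetical values of N, n, c chosen so that n ≈ c/q₀ with q₀ = 0.02 (Python's `math.comb` returns 0 when k exceeds n, which makes the sum safe near the boundaries):

```python
from math import comb

def L(q, N, n, c):
    """Probability of accepting (x <= c) a batch of N items with defective
    fraction q when n items are sampled without replacement."""
    y = round(q * N)                       # number of defective items in the batch
    return sum(comb(y, m) * comb(N - y, n - m)
               for m in range(0, c + 1)) / comb(N, n)

N, n, c = 1000, 100, 2                     # hypothetical inspection plan
curve = [L(q, N, n, c) for q in (0.0, 0.005, 0.02, 0.05, 0.10)]
```

The resulting curve equals 1 at q = 0 and falls off steeply around q₀, approximating the ideal step characteristic of Figure 1.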
runs smoothly enough this proportion is often much smaller than the proportion
of defective items q₀ which has been used for computing the inspection system.
Subsequent estimation of the proportion of defective items in the products
to be checked, based on test results, and of the proportion of defective items that
were accepted using this inspection system, is certainly very important also in
the case of quite satisfactory operative characteristics. This is the problem we
will solve now.
Assume that the inspection system with given n and c is applied to a large
number s of batches. The ratios q and q* introduced above will be denoted by
q_r and q*_r for the rth batch. Moreover, s_m will be the number of batches with
x = m (where m is the number of defective items found in a batch). Clearly,
s = s₀ + s₁ + ⋯ + s_n,
while
s′ = s₀ + s₁ + ⋯ + s_c
denotes the number of accepted batches. The total number of items in the s batches
is
R = sN,
while the total number of defective items is

$$Y = \sum_{r=1}^{s} y_r = N\sum_{r=1}^{s} q_r,$$
while the number of accepted defective items is
Since
where

$$q^*_{\mathrm{mean}} = \frac{Y'}{R} = \frac{1}{s}\sum_{r=1}^{s} q^*_r,$$
while the ratio R/R′ becomes known after acceptance and is close to 1 for any
normal production run, we can take it that our problem is to estimate q_mean
and q*_mean.
If the number s of batches is large enough, and if φ(x) and φ*(x) are
unbiased estimators of q and q* respectively with respect to x, then by the
law of large numbers, q_mean and q*_mean can be estimated using the approximate
formulas
(3)
(4)
$$q_{\mathrm{mean}} - \phi_{\mathrm{mean}} = \frac{1}{s}\sum_{r=1}^{s}\bigl[q_r - \phi(x_r)\bigr],$$
$$q_{\mathrm{mean}} \approx \phi_{\mathrm{mean}} = \frac{1}{ns}\sum_{m=1}^{n} m\,s_m \qquad(6)$$

for q_mean.
Since always⁶
(7)
it follows that q_mean can also serve as an upper estimator (see Definition 2
of §1) for q*_mean. However, this well-known estimator does not indicate the
decrease in the number of defective items as a result of inspection. To obtain an
idea about more efficient estimators of q*_mean we consider the general question:
which functions f(x) have unbiased estimators with respect to the number of
defective items x in a sample of n items. This question is solved very simply
in accordance with §2. For any function φ(x),
we can obtain upper and lower estimators φ*₊(x) and φ*₋(x) respectively for q*.
⁶ More precisely, $q^*_{\mathrm{mean}} \le q_{\mathrm{mean}} - \frac{1}{sN}\sum_{r=1}^{s} x_r.$
This method promises acceptable practical results for many cases in which
the approximate formulas of the next section are inapplicable.
$$q^* = \begin{cases} q & \text{for } x \le c,\\ 0 & \text{for } x \ge d, \end{cases}$$
in a sufficiently good approximation, and replace (1), (2), (9) of §5 by
$$P_m(q) = \frac{n!}{m!\,(n-m)!}\,q^m(1-q)^{n-m}, \qquad(1)$$
of the estimator (5) of the previous section, we obtain the unbiased estimator

$$\psi^2 = \frac{x(n-x)}{n^2(n-1)}, \qquad(4)$$

found by Girshick, Mosteller and Savage [5]. From this we obtain the unbiased
estimator

$$\Delta^2 = \frac{1}{s^2 n^2(n-1)}\sum_{m=1}^{n-1} m(n-m)\,s_m \qquad(5)$$

for the variance of φ_mean.
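That ψ² = x(n − x)/(n²(n − 1)) is unbiased for the variance q(1 − q)/n of x/n under the binomial law can be checked by direct summation (the values of n and q below are arbitrary):

```python
from math import comb

def expectation(est, n, q):
    """E_q est(x) for x binomial(n, q), by direct summation."""
    return sum(comb(n, x) * q**x * (1 - q)**(n - x) * est(x) for x in range(n + 1))

n = 10
psi2 = lambda x: x * (n - x) / (n**2 * (n - 1))  # Girshick-Mosteller-Savage estimator
errs = [abs(expectation(psi2, n, q) - q * (1 - q) / n) for q in (0.05, 0.3, 0.9)]
```

The identity behind it is E[x(n − x)] = n(n − 1)q(1 − q), so the errors vanish up to rounding for every q.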
For m small compared with n the probabilities Pm(q) are non-zero only
for small q, and (1) can be replaced by the simple approximate formula
(6)
Since, by (6),

$$q\,P_m(q) = \frac{m+1}{n}\,P_{m+1}(q),$$
under the assumption that c is small compared with n, (2) and (3) give

$$Q(q) = \frac{1}{n}\sum_{m\le d} m\,P_m(q). \qquad(7)$$
Thus when we can use (3), (6) and (7), the function Q(q) is a linear
combination of the Pm(q) and, in accordance with §2, we are able to construct
an unbiased estimator for Q(q) (and consequently, for q*) with respect to x.
This estimator has the form
(9)
*
4Jmean - 4Jmean w mSm
= -1 '"'
ns
(10)
m>d
is approximately equal to q_mean − q*_mean (when R/R′ is close to 1), that is, to the
decrease of the proportion of defective items in the process
of inspection.
We can quite effectively estimate the accuracy of the approximate estima-
tor (9). The independence of the samples from different batches (this indepen-
dence is implicit in the notion of a "random sample") implies that
(11)
Since

$$(q^* - \phi^*)^2 = \begin{cases} (q - x/n)^2 & \text{for } x \le c,\\ (x/n)^2 & \text{for } x = d = c+1,\\ 0 & \text{for } x \ge d+1, \end{cases} \qquad(12)$$
it follows that
m      0    1   2   3   4   5   6   7   8   9   10   11
s_m  143   27  12   9   3   1   2   1   1   1
If a test does not affect the items, we may adopt an inspection system different
from that considered in §5.
A random sample of n items is made from a batch consisting of N items.
The n items chosen are tested, and the number z of "defective" items in the
sample is found. If z ≤ c, then the z detected defective items are replaced by
good ("non-defective") ones, and the whole batch is accepted. If z ≥ d = c + 1,
then the whole batch is checked, all defective items thus detected are replaced
by good ones, and only after that is the entire batch accepted.
As in §5,

$$q^* = \begin{cases} q - z/N & \text{for } z \le c,\\ 0 & \text{for } z \ge d, \end{cases}$$
but now q* has a simpler meaning: it is the proportion of defective items
remaining in the batch after the above-described procedure. Formulas (1)-(9)
of §5 hold. Here Q(q) has the new meaning of the expectation of the proportion
of defective items in an accepted product* under the assumption that prior
to inspection the proportion of defective items in each batch was equal to q.
q. Therefore we can now make an a priori of the worst possible
es~imate
z- {o for z :5 c,
qN-z for z ~ d,
* Here a product can also be a set of batches (Translator's note).
⁷ Grant, Statistical Quality Control, 1946, p. 353.
Fig. 3
$$u = x + z = \begin{cases} \;\dots & \text{for } x \le c,\\ \;\dots & \text{for } x \ge d \end{cases}$$
Since for q = 0, 1/N, …, c/N the factor 1 − L(q) in (2) vanishes, for u ≤ c
the function φ(u) is uniquely determined by the system of d equations
In particular, for f(q) = Q(q) formulas (3), (4) give an unbiased estimator for
Q(q), that is, for q*.
Let x₁, x₂, …, x_n be independent and obey the normal distribution with prob-
ability density
(1)
(2)
is a sufficient statistic for the problem. The probability density for x̄ can be
expressed in the form

$$p(\bar x\mid a, \sigma) = G(\bar x - a, T), \qquad(3)$$

where
In our case (for unbiased estimators of the form φ(x̄) for the function f(a)) the
main equation (1) of §3 can be expressed in the form
For t > −T we set
Clearly,

$$\varphi(z, 0) = f(z). \qquad(8)$$
and, under the above assumption, is uniquely defined for t > −T by its values
f(z) at t = 0. By (9) this implies that φ(z) is uniquely determined by f(z).⁸
Thus, if the problem of finding unbiased estimators for f(a) is solvable,
then, under restrictions 1) and 2), its solution is unique.
By (8)-(10) the problem of finding unbiased estimators for f(a) is reduced
to "the inverse heat conduction problem", which is considered, for example,
in [6]-[8]. In what follows we note only the following in this regard. If at
a certain time T₀ > T, f(z) is already representable as
then the unbiased estimator φ(x̄) is given by the formula
at some fixed point x we obtain an unbiased estimator for n > 1 in the form

$$\varphi_x(\bar x) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\,e^{-(\bar x - x)^2/2\sigma_0^2}, \qquad \sigma_0^2 = \Bigl(1 - \frac{1}{n}\Bigr)\sigma^2. \qquad(14)$$
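The unbiasedness of (14) reflects the convolution identity σ₀² + σ²/n = σ²: averaging φ_x(x̄) over x̄ ~ N(a, σ²/n) returns the N(a, σ²) density at x. A Monte Carlo check with hypothetical parameter values:

```python
import random, math

random.seed(4)
a, sigma, n, x = 0.0, 1.0, 5, 0.7          # hypothetical parameters and point x
sigma0 = sigma * math.sqrt(1 - 1 / n)

def phi_x(xbar):
    """The estimator (14) evaluated at the sample mean xbar."""
    return math.exp(-(xbar - x) ** 2 / (2 * sigma0 ** 2)) / (math.sqrt(2 * math.pi) * sigma0)

trials = 200_000
mc = sum(phi_x(random.gauss(a, sigma / math.sqrt(n))) for _ in range(trials)) / trials
target = math.exp(-(x - a) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
```

The Monte Carlo average mc should match the true density value target to within sampling error.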
Integrating (14) over any fixed set A on the x-axis, we obtain for
Formula (16) is applicable for n > 1. When n = 1, we can use instead the
unbiased estimator for P(A) with respect to x̄ = x₁ of the form

$$\varphi(x_1) = \begin{cases} 1 & \text{for } x_1 \in A,\\ 0 & \text{for } x_1 \notin A. \end{cases} \qquad(17)$$
Similarly, for n > 1 we can also give for P(A) an unbiased estimator:
(18)
P(A) = P{z ∈ A}

$$p(x_1, x_2, \dots, x_n \mid a, \sigma) = \frac{1}{(\sqrt{2\pi}\,\sigma)^n}\exp\Bigl[-\frac{n\{(\bar x - a)^2 + s^2\}}{2\sigma^2}\Bigr]$$
for the n-dimensional probability density, it is clear that there are two sufficient
statistics for the problem, namely x̄ and
(2)
(3)
where

$$K_{n-2} = \frac{2\pi^{(n-1)/2}}{\Gamma((n-1)/2)} \qquad(4)$$

is the (n − 2)-dimensional volume of the sphere of radius 1 in (n − 1)-dimensional
space. The main equation (1) of §3 now takes the form
We will confine ourselves to finding, with the help of Theorem 2 of §1, the
solution of (5) for the case

$$\varphi(x_1) = \begin{cases} 1 & \text{for } x_1 \in A,\\ 0 & \text{for } x_1 \notin A. \end{cases}$$
To obtain an unbiased estimator for P(A) of the form φ(x̄, s) it remains
to compute the integral
(7)
where V_{x̄,s} is the set of points (x₁, x₂, …, x_n) in n-dimensional space satisfying
the equations

$$\frac{x_1 + x_2 + \cdots + x_n}{n} = \bar x, \qquad \rho = \sqrt{n}\,s; \qquad(8)$$

(9)
For n > 2,
(10)
where φ_x is an unbiased estimator with respect to x̄ and s for the probability
density p(x|a, σ) at x. To determine φ_x we consider the volume of the annular
zone
x₁ = x, x₁ = x + dx.
When
its radius is
(11)
its width is

$$\delta = \sqrt{\frac{n}{n-1}}\,dx,$$

and

$$\varphi_x = \frac{K_{n-3}}{K_{n-2}}\,\sqrt{\frac{n}{n-1}}\;\frac{1}{s}\left\{1 - \frac{1}{n-1}\Bigl(\frac{x - \bar x}{s}\Bigr)^2\right\}^{(n-4)/2},$$
$$\theta = \lambda\theta_1 + (1 - \lambda)\theta_2.$$
distributions θ on the number axis for which the absolute moment
where s is a positive integer, is finite, then the central moment
if and only if
s ≤ n.
However, Halmos went even further than these evident results and proved
that, under certain not too restrictive limitations imposed on the system 𝔓₀,
the symmetric unbiased estimator is unique (see [4]).
30 March 1950
References
The transition probabilities p_α^β(t) we are interested in are defined for all real
t ≥ 0 and satisfy the relations

$$p_\alpha^\beta(t) \ge 0, \qquad(\mathrm{I})$$

$$\sum_\beta p_\alpha^\beta(t) = 1, \qquad(\mathrm{II})$$

$$p_\alpha^\beta(0) = \begin{cases} 0 & \text{for } \alpha \ne \beta,\\ 1 & \text{for } \alpha = \beta, \end{cases} \qquad(\mathrm{III})$$

$$\sum_\beta p_\alpha^\beta(t)\,p_\beta^\gamma(t') = p_\alpha^\gamma(t + t'). \qquad(\mathrm{IV})$$
In addition to these relations of an algebraic character we will assume that the
continuity condition holds:
$$a_\alpha = -\left[\frac{d}{dt}\,p_\alpha^\alpha(t)\right]_{t=0} = \lim_{t\to 0}\frac{1 - p_\alpha^\alpha(t)}{t}, \qquad(1)$$

$$a_\alpha^\beta = \left[\frac{d}{dt}\,p_\alpha^\beta(t)\right]_{t=0} = \lim_{t\to 0}\frac{p_\alpha^\beta(t)}{t} \qquad\text{for } \beta \ne \alpha, \qquad(2)$$

(3)
395
396 DIFFERENTIABILITY OF MARKOV TRANSITION PROBABILITIES
and have the well-known probabilistic meaning of probability densities of leav-
ing the αth state (for the a_α) and probability densities of transition from the
αth state to the βth state (for the a_α^β). The transition probabilities p_α^β(t) them-
selves are uniquely determined by the transition densities a_α^β (in terms of which
the a_α are defined by (3)) as solutions of either of the two systems of differential
equations
(5)
(7)
(G) there are examples of two different systems of functions ~(t) satis-
fying (I)-(V) with identical and finite aa and a~ satisfying (3).
Of the above statements, only (D) remains conjectural. The others were
partially proved by Doob (see [3], [4]), and are established completely in the
present paper.
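As a numerical companion to relations (I)-(IV) and the limits (1)-(2) (not part of the original paper): for a finite number of states the transition probabilities can be written p_α^β(t) = [e^{tQ}]_{α,β}, where Q carries the transition densities a_α^β off the diagonal and −a_α on the diagonal. The three-state generator below is an arbitrary illustrative choice.

```python
import numpy as np

def expm(A, terms=60):
    """Matrix exponential via a truncated Taylor series (adequate for small t*Q)."""
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

# Hypothetical 3-state generator Q: off-diagonal entries are the transition
# densities a_alpha^beta, diagonal entries are -a_alpha (rows sum to zero).
Q = np.array([[-1.0, 0.6, 0.4],
              [ 0.5, -0.5, 0.0],
              [ 0.2, 0.3, -0.5]])

def P(t):
    return expm(t * Q)  # p_alpha^beta(t) = [e^{tQ}]_{alpha,beta}

t, tp = 0.3, 0.5
# (I) nonnegativity, (II) rows sum to 1, (IV) the semigroup property
assert (P(t) >= -1e-12).all()
assert np.allclose(P(t).sum(axis=1), 1.0)
assert np.allclose(P(t) @ P(tp), P(t + tp))
# (1)-(2): difference quotients at small t recover a_alpha and a_alpha^beta
h = 1e-6
D = (P(h) - np.eye(3)) / h
assert np.allclose(D, Q, atol=1e-4)
```

The last assertion is exactly the differentiability discussed in this paper, which for a finite generator is automatic; the paper's subject is when it can fail.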
Proof of (A). The existence of finite or infinite limits was proved by Doob. We
shall give another, quite elementary, proof (Doob's proof makes use of measures
in function spaces).
Set
    ā_α = liminf_{t→0} (1 − p_α^α(t))/t.  (8)
Clearly, ā_α ≥ 0. If ā_α = ∞, then
    a_α = lim_{t→0} (1 − p_α^α(t))/t = ∞.  (9)
If ā_α is finite, we shall show that
    limsup_{t→0} (1 − p_α^α(t))/t ≤ ā_α.  (10)
Indeed, take arbitrary t > 0 and ε > 0 and choose h such that
    t = nh + s,
    p_α^α(t) ≥ {1 − n(ā_α + ε)h}(1 − εt) ≥ {1 − (ā_α + ε)t}(1 − εt) ≥ 1 − (ā_α + 2ε)t,  (13)
Since ε > 0 is arbitrary, (13) implies (10). Formulas (10) and (8) imply
(9). Thus the existence of the limit a_α = lim_{t→0} (1 − p_α^α(t))/t is also proved for finite
ā_α.
Whether the case a_α = ∞ is possible remains an open question in Doob's
work. The affirmative answer is given by the following:

Lemma. Suppose that for α ≥ 2 the a_α > 0 are finite and are such that
    Σ_{α=2}^∞ 1/a_α < ∞.
Then there exists a Markov process satisfying (I)-(V), with given a_α for α ≥ 2,
    a₁ = ∞,
    a_1^β = 1 for β ≥ 2,
    a_α^1 = a_α,  a_α^β = 0 for α ≥ 2, α ≠ β ≥ 2.
The requirement
    Σ_β p_1^β(t) = p_1^1(t) + Σ_{β≥2} p_1^β(t) = 1
(17)
    k(τ) = Σ_{β=2}^∞ e^{−a_β τ},
    ∫_0^∞ k(τ) dτ = Σ_{β=2}^∞ 1/a_β.
Therefore,1 (17) has a continuous solution φ(t) for t ≥ 0, which can be obtained
in the usual way via the Fourier transformation. It can easily be shown that
this solution is continuous and satisfies the conditions
Proof of (B).
Lemma. Suppose that for t ≤ H,
Then for
    nh ≤ t ≤ H,
Proof. Let3
    r_i ≠ β,  i = 1, 2, …, k − 1.  (23)
Finally,
n
3 To understand the meaning of inequalities (20), (22) and (24) the reader should
remember the probabilistic meaning of P_k and Q_k. These inequalities, however,
can also easily be proved in a purely algebraic way, based on (I)-(IV).
Now assume that the conditions of the lemma are satisfied for a given H
and consider an h ≤ H. Let n be the integer part of H/h and let t = nh. Then
    n ≥ H/2h
and, by (19),
    1 ≥ p_α^β(t) ≥ n p_α^β(h)(1 − 3ε),
that is,
    p_α^β(h)/h ≤ 1/(nh(1 − 3ε)) ≤ 2/(H(1 − 3ε)).
Hence
    ā_α^β = limsup_{t→0} p_α^β(t)/t < ∞.  (25)
For any t ≤ H choose h such that
    nh ≤ t(1 − ε)
For any ε such that 1 > ε > 0, (28) is proved for t ≤ H, where H is sufficiently
small. This, together with (25), implies that
    lim_{t→0} p_α^β(t)/t = ā_α^β = a_α^β.
Proof of (C). Inequality (6) is elementary and can be found in the work of Doob
[3], [4].
Here is an example where (7) holds for α = 1, though all the a_α are finite.
1) For α ≥ 3 the probabilities p_α^α(t), p_α^β(t) for 3 ≤ β < α and p_α^2(t) are
determined from the differential equations
    d/dt p_α^α(t) = −a_α p_α^α(t),
    d/dt p_α^β(t) = a_{β+1} p_α^{β+1}(t) − a_β p_α^β(t),  β = α − 1, α − 2, …, 3,
    d/dt p_α^2(t) = a_3 p_α^3(t)
(with the usual initial values (III)), where the a_α > 0 are chosen so that
    Σ_{α=3}^∞ 1/a_α < ∞.  (29)
where
    p^β(t) = lim_{α→∞} p_α^β(t),
it follows that
    Σ_{β≠1} a_1^β < a_1.
It can be verified that (I)-(V) hold in this example.
Statement (E) was proved by Doob (see [4], VIII). For the above example
equation (4) fails. Indeed, in this example a₁ = 1 and a_1^β = 0 for β ≠ 1.
Therefore for α = 1, the equations (4) take the form
    d/dt p_1^β(t) = 0
Proof of (F) and (G). This can be found in Doob's work (see [4], Theorem 2.2).
References
where P* is some expression that is easier to compute or use than the precise
formula for P. In what follows we indicate conditions under which (2) becomes precise
in the limit, meaning that
    Σ_{m=0}^n |p_m − π_m| → 0.  (3)
a = np, (5)
σ² = np(1 − p)(N − n)/(N − 1), (6)
p=M/N. (7)
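Formulas (5)-(7) are the mean and variance of the number of marked items in a sample drawn without replacement (the hypergeometric law). A quick sanity check by exhaustive enumeration (the values of N, M, n are illustrative, not from the paper):

```python
from math import comb

# Sample n items without replacement from a population of N containing
# M marked items; mu = number of marked items in the sample.
N, M, n = 20, 8, 5
p = M / N                                                       # (7)
pmf = [comb(M, m) * comb(N - M, n - m) / comb(N, n) for m in range(n + 1)]

a = sum(m * q for m, q in enumerate(pmf))                       # mean
var = sum((m - a) ** 2 * q for m, q in enumerate(pmf))          # variance

assert abs(sum(pmf) - 1.0) < 1e-12
assert abs(a - n * p) < 1e-12                                   # (5)
assert abs(var - n * p * (1 - p) * (N - n) / (N - 1)) < 1e-12   # (6)
```

The factor (N − n)/(N − 1) in (6) is what distinguishes sampling without replacement from the binomial case, and it disappears as N → ∞ with n fixed.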
GENERALIZATION OF POISSON'S FORMULA FOR A FINITE SAMPLE 405
The condition λ ≤ λ₀ < 1 is not restrictive since for λ ≥ ½ we can consider
the "complementary" sample from the remaining
    n′ = N − n
where
    w = −[log(1 − λ)]/λ.  (11)
Approximation (10) is also correct under more general conditions. Namely,
for its applicability in the sense of (3) it suffices that
    p → 0.  (12)
(15)
λ → 0. (16)
41. SOME RECENT WORK ON LIMIT THEOREMS
IN PROBABILITY THEORY*
Introduction
In the middle of the 1940's it seemed that the topic of limit theorems of clas-
sical type (that is, problems of the limiting behaviour of distributions of sums
of a large number of terms that are either independent or connected in Markov
chains) was basically exhausted. The monograph written by me together with
B.V. Gnedenko [1] was intended to summarize the results of the previous years.
In reality, however, starting from the late 1940's more papers in these classi-
cal fields appeared. This can be explained by several circumstances. First, it
became clear that from a practical viewpoint the accuracy of remainder terms
obtained so far was far from sufficient. Secondly, certain problems that were
solved earlier only under complicated and very restrictive conditions unexpect-
edly obtained very simple complete solutions (in the sense of necessary and
sufficient conditions). These include, for example, the problem of "localiza-
tion" of limit theorems, which turn out to have an exhaustive solution both
for the case of identically distributed independent summands and for the case
of the distribution of the number of separate states visited in a homogeneous
Markov chain.
Naturally, these results have been a stimulus for seeking similar complete
results for a number of other cases. Finally, the very statements of the problems
have become more refined and transparent, owing to the introduction of suitable
distances between distributions and the ideas of computing least upper bounds
of residue terms borrowed from the theory of best approximation.
This paper briefly presents the results of some recent papers that illustrate
these new tendencies. The results given in the monograph [1] are supposed to
be known to the reader, though sometimes they are mentioned in order to make
the presentation more coherent.
where the supremum is taken over all A for which the probabilities are defined.
In the one-dimensional case, ρ₁(P₁, P₂) is equal to half the total variation of
the difference F l - F2 :
where the supremum is taken over all intervals Δ. However, it is easy to prove
that
Clearly,
For a long time general limit theorems were being found "by feel". Only after
the limiting distribution for a particular case was computed was the problem
raised of finding general conditions for convergence to this distribution. The
first example of a different approach I am aware of is Levy's theorem on limiting
distributions of the successive sums
of independent identically distributed terms (see [1], §33). The theorem states
that the distributions of the variables
can only converge weakly to a stable distribution, and to any such distribution
(under an appropriate choice of the distribution of the terms ξ_n). It was only
much later that W. Doeblin and B.V. Gnedenko fully studied conditions under
which this convergence holds at all or there is convergence to a certain definite
stable law (see [1], §35).
Even more important is A.Ya. Khinchin's theorem, which characterizes a
class of possible limit laws for the distributions of sums
4 The fact that we do not include in the expression for ζ_n the non-random term
A_n (cf. (1) on p. 119 in [1]) does not make any difference: Khinchin's theorem
holds in this form as well.
where each ξ_{nk} takes only two values, 0 and 1, and the variables are connected within each
sequence in a simple homogeneous Markov chain with matrix of transition
probabilities
    | p_n   1 − p_n |
    | q_n   1 − q_n |
depending only on the number n of the sequence, but not on the number k of
the trial in the sequence. If the ξ_{nk} are independent (that is, when p_n = q_n)
then only the degenerate distributions, the normal distribution, the Poisson
distribution and distributions obtainable from these three by linear transformations5
can serve as limit distributions for
For arbitrary Pn and qn the problem becomes much more difficult and in order
to enumerate all possible limit distributions of η_n a number of new ingenious
considerations are required.
The study of similar questions for the case of Markov chains with any finite
number of states (not necessarily homogeneous) was started by Koopman (see,
for instance [14]). It would be very interesting to obtain as complete results as
those of R.L. Dobrushin for s = 2, at least for the homogeneous case with an
arbitrary number of states s.
In estimating the proximity of distributions with the help of some distance
p(P1 , P2 ) between distributions we can raise the question of convergence not to
an individual distribution, but to a whole class of distributions. For instance,
we can ask if the distribution of the sum
5 This result by P.A. Kozulyaev is contained in the general theorems of §26 and §27
in [1].
More exactly, the question is whether for any ε > 0 there exists an N such that
for any n ≥ N there exists an infinitely divisible distribution S such that for
the distribution P of the sum ζ we have
    ρ(P, S) < ε.
So far there are few papers in which problems of this sort are solved in
a setting essentially non-reducible to limit theorems of the usual type. Note
that in Dobrushin's work mentioned above, the question of approximating the
distributions of the sums ζ_n is solved (for the case he considered) also in the
sense of this kind of uniform approximation (as n → ∞) with respect to the
distance ρ₁.
Dobrushin's result is quite complicated because of case-by-case checking
of many particular distributions. The principle underlying his result can be
clarified by the example of a particular case of independent variables ξ_{nk} considered
earlier by Yu.V. Prokhorov [3]. In this case we speak about uniform
approximation in the sense of the distance ρ₁ of the binomial distribution B_{np},
given by the formula
and ρ₂ is taken for the distance, these theorems on uniform approximation are
not essentially new (see §4). However, a deliberate search for families sufficient
for uniform approximation in more complicated cases has been started quite
recently, and there still remains much to be done in this direction.
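The kind of approximation discussed here can be made concrete for the binomial case: the sketch below (not from the paper; parameters are illustrative) computes ρ₁, i.e. half the total variation, between B_{np} and the Poisson law with the same mean. The bound np² invoked in the assertion is the classical Le Cam-type estimate, cited here as an outside fact.

```python
from math import comb, exp, factorial

def rho1(p_pmf, q_pmf, support):
    """rho_1: half the total variation of the difference of two discrete laws."""
    return 0.5 * sum(abs(p_pmf(m) - q_pmf(m)) for m in support)

n, p = 200, 0.02                  # small p: the Poisson regime
lam = n * p

def binom(m):
    return comb(n, m) * p**m * (1 - p)**(n - m) if 0 <= m <= n else 0.0

def poisson(m):
    return exp(-lam) * lam**m / factorial(m)

d = rho1(binom, poisson, range(60))   # support truncated where mass is negligible
assert 0.0 < d < n * p * p            # Le Cam-type bound n p^2
```

As n grows with np² → 0, the distance tends to zero uniformly, which is the flavour of the uniform approximation results described in the text.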
where p(x) is the corresponding probability density and Ψ(A) is the singular
part of the distribution. If Ψ ≡ 0, then P is continuous. If at least one of the
distributions P₁ and P₂ is continuous, then
is equivalent to convergence in mean of the densities:
    ∫_{−∞}^{∞} |p_n − p| dx → 0.
6 See [1], §46 and §47 and the Hungarian translation of this book, in which the
formulations of local theorems are improved.
Xk = kh+ a.
This operation can be performed, for example, by ascribing the probability
The theorems of Prokhorov and Gnedenko are limited to the case of iden-
tically distributed summands. But in any case they show that more general
and complete results on "the localization of limit theorems" are possible than
it seemed before. To conclude this section, note also the papers on arithmetic
local theorems [6]-[10]. The first three of them, [6]-[8], deal with multidimen-
sional generalizations of Gnedenko's result mentioned above. The fourth, [9],
gives an exhaustive answer to the question of approximating in variation the
distribution of the number of distinct states visited in a homogeneous Markov
chain with a finite number of states. The fifth, [10], contains quite deep, though
less complete, results for inhomogeneous Markov chains.
where
A= (~) 1/3 ~(1 + 4e- 3 / 2 )2/3 e-l/6 = 0.42 ....
Thus, in terms of the variational distance he found an asymptotically precise
estimate of the largest deviation (for given n and variable p) of the binomial
distribution from the most suitable of the approximations L np , Pnp and Pnp .
This somewhat narrow result can serve as an example of the technique for
estimating remainder terms that have taken shape in the papers of the last
decade. There are no complete results of the same kind for sums of arbitrarily
distributed independent summands. Below we consider problems of this kind
connected with the so-called Lyapunov ratio.
For simplicity we assume that the sum
It is well known that as L → 0 the following limit relation for the distribution
function F(z) of the sum ζ holds:
After Berry's work [15], where a more particular problem was considered,
Esseen [16] proved that
    1/√(2π) ≤ c ≤ 15/2.
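For symmetric ±1 summands the Lyapunov ratio is L = 1/√n, so the constant c can be probed numerically. The sketch below (illustrative, not from the paper) computes sup_z |F(z) − Φ(z)| exactly for the normalized symmetric binomial and compares the ratio with Esseen's bounds.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def sup_deviation(n):
    """sup_z |F(z) - Phi(z)| for the normalized sum of n symmetric +-1 terms.
    The sup is attained next to an atom, so it suffices to check the left and
    right limits of F at each atom of the discrete distribution."""
    pmf = [comb(n, k) * 0.5**n for k in range(n + 1)]
    sup, cdf = 0.0, 0.0
    for k in range(n + 1):
        z = (2 * k - n) / sqrt(n)
        sup = max(sup, abs(cdf - normal_cdf(z)))   # left limit at the atom
        cdf += pmf[k]
        sup = max(sup, abs(cdf - normal_cdf(z)))   # right limit at the atom
    return sup

n = 400
L = 1 / sqrt(n)                 # Lyapunov ratio for +-1 summands
ratio = sup_deviation(n) / L
# Esseen's bounds: 1/sqrt(2*pi) = 0.3989... <= c <= 15/2
assert 0.1 < ratio < 7.5
```

For this symmetric case the ratio comes out close to 1/√(2π), which is consistent with the conjecture discussed below.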
Of great interest is the question whether more precise estimates or computations
of c and c* are possible, especially for the functions
    c(z) = sup_F |F(z) − Φ(z)|/L,
    c* = 1/√(2π).
Perhaps these formulas hold under the single restriction of symmetry of the
distributions of the ξ_k. The first of these formulas might also be true in the
general case. Even the conjecture
    c = 1/√(2π)
References
1. B.V. Gnedenko and A.N. Kolmogorov, Limit distributions for sums of
independent random variables, Gostekhizdat, Moscow-Leningrad, 1949 (in
Russian).
2. R.L. Dobrushin, Izv. Akad. Nauk SSSR Ser. Mat. 17 (1953), 285 (in Russian).
3. Yu.V. Prokhorov, Uspekhi Mat. Nauk 8:3 (1953), 135-142 (in Russian).
4. Yu.V. Prokhorov, Dokl. Akad. Nauk SSSR 83 (1952), 797-800 (in Russian).
5. B.V. Gnedenko, Dokl. Akad. Nauk SSSR 71 (1950), 425-428 (in Russian).
6. D.G. Metzler, O.S. Parasyuk and E.L. Rvacheva, Dokl. Akad. Nauk SSSR
60 (1948), 1127-1128 (in Russian).
7. D.G. Metzler, O.S. Parasyuk and E.L. Rvacheva, Ukr. Mat. Zh. (1949),
9-20 (in Russian).
8. E.L. Rvacheva, Proc. Inst. Math. Mekh. Uzbek. Akad. Nauk No. 10, Part I
(1953), 106-121 (in Russian).
9. A.N. Kolmogorov, Izv. Akad. Nauk SSSR Ser. Mat. 13 (1949), 281-300
(in Russian).
10. Yu.V. Linnik and N.A. Sapogov, Izv. Akad. Nauk SSSR Ser. Mat. 13
(1949), 533-566 (in Russian).
11. Yu.V. Linnik, Izv. Akad. Nauk SSSR Ser. Mat. 11 (1947), 111-138 (in
Russian).
12. Yu.V. Prokhorov, Izv. Akad. Nauk SSSR Ser. Mat. 16 (1952), 281-292 (in
Russian).
13. S.Kh. Sirazhdinov, Dokl. Akad. Nauk SSSR 84 (1952), 1143-1146 (in Russian).
14. B.O. Koopman, Trans. Amer. Math. Soc. 70 (1951), 277-290.
15. A.C. Berry, Trans. Amer. Math. Soc. 49 (1941), 122-136.
16. C.G. Esseen, Acta Math. 77 (1945), 1-125.
42. ON A.V. SKOROKHOD'S CONVERGENCE *
When describing the course of a real process in time with the help of a function
f(t) of time t that takes values in an appropriate "phase space" X, it is often
natural and correct to assume that f has only discontinuities of the first kind
(jumps). To study such processes in detail it is useful to introduce in a set D
of functions with discontinuities of the first kind (for the sake of being specific
we will consider such functions on the unit interval 0 ≤ t ≤ 1) an appropriate
topology.
The topology of uniform convergence, natural when studying continuous
processes, appears to be too strong when studying processes with discontinu-
ities of the first kind. For example, it is natural to require that the sequence
of functions
    f_n(t) = x₁ for t < t_n,  f_n(t) = x₂ for t > t_n,
where t_n → t₀ as n → ∞, converge to the function
    f(t) = x₁ for t < t₀,  f(t) = x₂ for t > t₀,
since f_n differs from f for large n only by a small shift of the jump from the
state x₁ to the state x₂. As is known, this convergence does not hold in the
topology of uniform convergence for x₁ ≠ x₂.
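The failure of uniform convergence for a shifted jump, and the time-change idea that repairs it, can be seen in a few lines. This is only a crude illustration of the idea behind Skorokhod's topology, not his actual metric; the grid and jump locations are chosen as exact binary fractions so that every comparison below is exact.

```python
def step(jump, lo, hi):
    """Right-continuous step function: lo before `jump`, hi from `jump` on."""
    return lambda t: lo if t < jump else hi

def sup_dist(f, g, grid):
    """Uniform (sup) distance sampled on a grid."""
    return max(abs(f(t) - g(t)) for t in grid)

x1, x2, t0 = 0.0, 1.0, 0.5
f = step(t0, x1, x2)
grid = [i / 1024 for i in range(1025)]

for tn in [0.625, 0.5625, 0.53125]:        # t_n -> t0
    fn = step(tn, x1, x2)
    # The uniform distance stays |x1 - x2| however close t_n is to t0 ...
    assert sup_dist(fn, f, grid) == abs(x1 - x2)
    # ... but composing fn with the time change t -> t + (t_n - t0), which
    # moves the jump back to t0, makes the two functions coincide.
    shifted = lambda t, fn=fn, tn=tn: fn(t + (tn - t0))
    assert sup_dist(shifted, f, grid) == 0.0
```

In Skorokhod's topology the allowed time changes must themselves be uniformly close to the identity, so f_n → f precisely because the required shift t_n − t₀ tends to zero.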
On the other hand, the topology in D should not be too weak, since we
wish to retain the most essential properties of f ∈ D when passing to the limit.
For example, we wish that under the assumption that f_n converges to f, the
conditions
imply that
    f(t + 0) − f(t − 0) = c.
Even for these two assumptions we need a new topology in D, adapted
especially to the needs of studying processes with discontinuities of the first
kind. This kind of topology is suggested in the papers [1], [2] by A.V. Skoro-
khod. In these papers Skorokhod defines a certain convergence in D (we will
call it S-convergence) for the case of real functions f(t) (that is, when X is the
real line) and notes that this convergence turns the set D into a topological
Hausdorff space. Further, in §2, a new definition of S-topology is given which
can be used under the assumption that X is an arbitrary metric space and it is
shown that the topological space SD arising in this way is a metric space. The
distance s(f, g) introduced in §2 for this purpose and the corresponding uniform
topology seem to be quite natural and convenient in applications, despite
the fact that even for a complete space X the metric space sD is not complete.
At the end of §2 a simple necessary and sufficient condition for S-compactness
is given.
The results of §3 (as well as Theorem IV of §2) are essentially based on the
assumption that the phase space X is complete (for simplicity we will assume
this from the very beginning).
In §3 the completion of the metric space sD is given a specific interpretation
using D-curves in the space R = T × X. This interpretation is used for
proving the possibility of introducing into D another distance s*(f, g) inducing
the same Skorokhod convergence and topology, but turning D into a complete
metric space s*D. When presenting the results of this paper at the seminar on
probability theory at Moscow University, I raised the problem of the simplest
possible explicit construction of the distance s*(f, g). This problem was solved
by Yu.V. Prokhorov [3].
§1 contains some prerequisites necessary for the considerations of §3. The
contents of §2 could as well be presented without introducing the set Θ, using
instead a useful technique for representing a process with discontinuities of the
first kind, for example, using functions of a real variable t that are continuous
from the right:
    f⁺(t) = f(t + 0).
if for any ε > 0 there exists n(ε) such that for n ≥ n(ε),
    t − ε + 0 ≤ θ_n ≤ t − 0,
and that
if for any ε > 0 there exists an n(ε) such that for n ≥ n(ε),
This convergence turns the set Θ into a compact topological space (see [5]).
The fact that this space is not metrizable is no hindrance to its use in further
constructions.
By definition, the class D_X consists of functions f(θ) defined on Θ, with
values in a metric space X and continuous in the sense that
implies
    f⁺(t) = f(t + 0)
is defined for all t in the half-interval 0 ≤ t < 1, is right continuous and has a
left limit,
    f⁺(t − 0) = f(t − 0),
It is easy to prove
    w_f(t_{k−1} + 0, t_k − 0] < ε.
    f ~_ε g,
    f_n ~_ε f.
In what follows S-convergence is denoted by
    f_n →_S f.
This is the convergence Skorokhod introduced for the case when X is the real
line.
We now set
    s(f, g) = inf_{f ~_ε g} ε.
    s(f, f) = 0.
It is fairly easy to show that
    s(f, g) = 0
Lemma. If
    f ~_ε g,  g ~_{ε′} h,
then
    s(f_n, f) → 0.
If X contains two different points x₁ ≠ x₂, then the metric space sD is
not complete, since for
    t_n < t₀ < 1,  t_n → t₀
the sequence of functions
    f_n(θ) = x₁ for 0 + 0 ≤ θ ≤ t_n − 0,
    f_n(θ) = x₂ for t_n + 0 ≤ θ ≤ t₀ − 0,
    f_n(θ) = x₁ for t₀ + 0 ≤ θ ≤ 1 − 0
satisfies the Cauchy criterion but does not converge. In this connection the
following theorem is interesting.
where
    w*_f[θ, θ′] = inf_t max{ w_f[θ, t − 0], w_f(t + 0, θ′] }.
It should be noted that the operations w*_f are similar to those already
used by E.B. Dynkin in [4] for studying the functions of the class D.
Theorem 4 is closely connected with
    w_f[0 + 0, δ − 0] < ε,  w_f[1 − δ + 0, 1 − 0] < ε,
    T = {t : 0 ≤ t ≤ 1}
and
    s(f, g) = ρ(f̃, g̃).
Theorem 6 gives a new interpretation of the distance s(f, g) and S-convergence:
S-convergence of a sequence of functions f_n to f is equivalent to convergence
in the sense of the distance ρ(f̃_n, f̃) of the curves f̃_n to the curve
f̃.
In what follows, it is useful to introduce the notations τ_φ(θ), ξ_φ(θ) for the
components of the function
    φ(θ) = (τ_φ(θ), ξ_φ(θ)).
if and only if
(Λ₁) τ_φ(θ) maps Θ onto the whole set T;
(Λ₂) θ < θ′ implies that τ_φ(θ) ≤ τ_φ(θ′).
By (Λ₁), (Λ₂), for a function φ ∈ f̃ ∈ Λ̃ the sets τ_φ^{−1}(t) of those θ for which
    τ_φ(θ) = t
When the parameter θ runs over τ_φ^{−1}(t), the value τ_φ(θ) remains constant, while
ξ_φ(θ) runs over some sequence of points of the phase space X. Therefore the
curves f̃ ∈ Λ̃ may be considered as the graphs of special generalized processes
with discontinuities of the first kind whose behaviour at time t can, in general,
be more complex than a simple transition from the state f(t − 0) to the
state f(t + 0). Such generalized processes can naturally appear as limits in the absence of
restrictions similar to condition (*) of Theorem 4 that prevent the accumulation
of several vanishingly small jumps at one point t. Possibly, introducing this
kind of generalized process will appear useful in certain special studies on limit
properties of random processes. Without developing this idea any further, I use
the analysis of the construction of Λ̃ only as an auxiliary means for proving the
possibility of introducing, in the space SD, a distance s*(f, g) different from
s(f, g) and turning SD into a complete metric space. For this purpose I will
prove the following lemma:
Proof. It can easily be shown that for given t ∈ T and f̃ ∈ Λ̃ the value of
is the same for all φ ∈ f̃, that is, it characterizes the properties of the curve
f̃, not of its parametric representation φ; we denote this value by w*_{f̃}(t). The
condition
(1)
n
The set Λ̃ is closed by definition. It can be proved that the sets Λ̃_n are
also closed. Our lemma now follows immediately from the closedness of
the sets Λ̃ and Λ̃_n and (1).
By a well-known theorem of P.S. Aleksandrov this lemma implies that we
can introduce a new distance ρ*(f̃, g̃) in Λ̃ that is topologically equivalent to
the distance ρ(f̃, g̃) but turns it into a complete metric space.
Setting
    s*(f, g) = ρ*(f̃, g̃)
and keeping Theorem 6 in mind, we can see that the following is true:
References
1. A.V. Skorokhod, Dokl. Akad. Nauk SSSR 104 (1955), 364-367 (in Rus-
sian).
2. A.V. Skorokhod, Dokl. Akad. Nauk SSSR 106 (1956), 781-784 (in Rus-
sian).
3. Yu.V. Prokhorov, Teor. Veroyatnost. i Primenen. (Probability Theory and
its Applications) 1:2 (1956), 175-237 (in Russian).
4. E.B. Dynkin, Izv. Akad. Nauk SSSR Ser. Mat. 16 (1952), 563-572 (in
Russian).
5. P.S. Urysohn, Works on topology and other fields of mathematics, Gostekhizdat,
Moscow, 1951, Vol. 2 (in Russian).
43. TWO UNIFORM LIMIT THEOREMS FOR SUMS OF
INDEPENDENT TERMS *
In what follows
where
    ζ = ξ₁ + ξ₂ + ⋯ + ξ_n
and the random variables ξ_k are independent. Let ℰ be the family of degenerate
distributions of the form
    E(x) = 0 for x ≤ a,  E(x) = 1 for x > a,
and let 𝒟 be the family of infinitely divisible distributions. The purpose of the
present paper is to prove the following two theorems:
hold for all ε > 0, L > 2l > 0, where E_k ∈ ℰ, k = 1, 2, …, n, then there exists
Φ ∈ 𝒟 for which
where
The proof of each theorem makes use of a number of ideas borrowed from
papers of P. Lévy (see [1], §48), W. Doeblin [2] and Yu.V. Prokhorov [3], and
is based on the following lemmas, in which
Lemma 1. If
where1
    s = Σ_k [1 − Q_{F_k}(τ)],
then
Lemma 2.
Lemma 3.
Lemma 4. If
then
    F * G_{0,σ}(x − l) − η ≤ F(x) ≤ F * G_{0,σ}(x + l) + η.
Lemma 5. Let
1 Throughout, the index k in the sums Σ_k and products Π_k or Π*_k runs through
all integers 1 ≤ k ≤ n.
where
    p_m^{(k)} = 1 − p_k for m = 0,  p_m^{(k)} = p_k for m = 1,  p_m^{(k)} = 0 for m > 1,
    q_m^{(k)} = (p_k^m / m!) e^{−p_k},  m = 0, 1, …
If 0 ≤ p_k ≤ 1 then2
    Σ_{m₁…m_n} |p_{m₁…m_n} − q_{m₁…m_n}| ≤ C₅ Σ_k p_k².
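The one-dimensional content of Lemma 5 can be checked directly: the law of a sum of independent 0-1 variables with probabilities p_k against the Poisson law with mean Σ p_k. The assertion below uses the classical Le Cam-type bound 2 Σ p_k² for the sum of absolute differences, cited as an outside fact rather than as the constant of the lemma; the p_k are illustrative.

```python
from math import exp, factorial

def convolve(a, b):
    """Convolution of two pmfs given as lists indexed from 0."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

ps = [0.1, 0.05, 0.2, 0.15, 0.08]        # illustrative p_k in [0, 1]

dist = [1.0]                             # law of the sum of Bernoulli(p_k)
for p in ps:
    dist = convolve(dist, [1 - p, p])

lam = sum(ps)
poisson = [exp(-lam) * lam**m / factorial(m) for m in range(len(dist))]
tail = 1.0 - sum(poisson)                # Poisson mass outside {0, ..., n}

tv = sum(abs(a - b) for a, b in zip(dist, poisson)) + tail
assert abs(sum(dist) - 1.0) < 1e-12
assert tv <= 2 * sum(p * p for p in ps)  # Le Cam-type bound
```

The bound is quadratic in the p_k, which is why the Poisson approximation becomes uniform when the individual probabilities are small even if their sum is not.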
Lemma 6. If 0 ≤ p ≤ 1, then
Lemma 7. Let
2 Here and in what follows, Σ_{m₁…m_n} denotes summation over all sets of positive
integers m₁, …, m_n.
3 See: A. Kolmogorov, 'Sur les propriétés des fonctions de concentration de
M.P. Lévy', Ann. Inst. H. Poincaré 16:1 (1958), 27-34. No. 45 in this book.
1. Proof of Theorem 1, general part. It suffices to prove the theorem for the
case of continuous and strictly increasing functions F_k(x). Indeed, assume that
F_k, ε, l and L are fixed so that the conditions of the theorem hold. Choose l′
and L′ so that
    L > L′ > 2l′ > 2l
and set ε′ = 2ε. Using only the qualitative part of Lemma 4, it is easy to
establish that for a sufficiently small s the distributions
satisfy the conditions of the theorem with ε, l, L replaced by ε′, l′, L′. It is easy
to prove that the F_k(x) are continuous strictly increasing functions. If for the
corresponding function Φ′ = Φ * G_{s√n} we prove the inequality
    Ψ(x − L′) − δ′ ≤ Φ′(x) ≤ Ψ(x + L′) + δ′,
where
    δ′ = C′ max( ε′L′/(l′√log(L′/l′)), ε′^{1/5} ),
then, using Lemma 4 with σ² = ns² and l = L − L′, we easily find that
for sufficiently small s and C = 2C′ the original function Φ(x) satisfies the
inequality stated in the theorem.
In accordance with this, we take y = F_k(x) continuous and strictly increasing,
which allows us to consider the inverse functions x = F_k^{-1}(y), which
are uniquely defined for all y in the interval 0 < y < 1, continuous, strictly
increasing, and running through all real values of x.
We set
    Δ_k(p) = F_k^{-1}(1 − p) − F_k^{-1}(p).
Under these assumptions the Δ_k(p) are continuous and strictly decreasing.
They run through all positive values Δ, 0 < Δ < ∞, when p runs through
0 < p < ½. The inverse functions
are also continuous and strictly decreasing: when λ runs through 0 < λ < ∞,
they run through all values of p in the interval ½ > p > 0. Therefore, provided
that
    Σ_k l_k(2l) > 2ε^{4/5}
(here l and ε are the quantities occurring in the conditions of the theorem and
we assume without loss of generality that ε ≤ 1), the equation
    Σ_k l_k(λ₀) = 2ε^{4/5}
    σ² = Σ_k (1 − 2l_k)σ_k²,
    Σ_{m₁…m_n} p_{m₁…m_n} Π*_k [F_k^{(1)}]^{m_k} * Π*_k [F_k^{(0)}]^{1−m_k},
Now, expressing the q_{m₁…m_n} in terms of p_k = 2l_k according to the formulas of
the same Lemma 5, we set
where
    E₀(z) = 0 for z ≤ 0,  E₀(z) = 1 for z > 0,
and the powers are understood in the sense of convolution:
    F⁰ = E₀, F¹ = F, F² = F * F, etc.
Clearly, by passing from ξ_k to ξ_k − a_k we may assume that
    a_k = 0, k = 1, …, n;  a = 0.
We now set 4
Noting that
    Φ₂ = Σ_{m₁…m_n} p_{m₁…m_n} Φ*_{m₁…m_n} * G_{σ_{m₁…m_n}} * G_{σ₀},
(5)
(7)
we have
(8)
Therefore
(9)
where Σ′ is the sum of those p_{m₁…m_n} for which (7) does not hold. Clearly,
(10)
where
    ρ² = Σ_k (1 − l_k)σ_k².
(13)
Since always σ₀ ≤ λ and ε^{3/5} ≤ ε^{1/5}, (9) and (13) imply
(14)
By Lemma 7 and the fact that |ξ_k| ≤ λ for μ_k = 1, we have
(15)
(17)
where
    C* = 8C₅ + C₃ + 4.
2. Case A). The end of the proof of Theorem 1 differs according to cases A)
and B). Let us first consider case A). In this case, by (A.2), (17) turns into
(18)
(21)
    S = Σ_{k=1}^n [1 − Q_{F_k}(λ)]
is bounded by
(22)
    Q_Φ(τ₀) ≤ C₁τ₀/(λ√n) ≤ 2C₁ε^{1/5}.  (23)
(25)
Formula (25) now immediately implies (for case A)) the conclusion of the
theorem, and the shifts by L are not needed; they are needed only for case B).
3. Case B). In this case, by (B.2) inequality (17) takes the form
(27)
we have
(28)
Now set
    C = C* + 2^{3/2}C₇ + 2C₁C₂ + C₄/log 2.
Since L > 2l, log(L/l) > log 2, it follows from (26) and (28) that
where
    δ = C max( εL/(l√log(L/l)), ε^{1/5} ).
5 It is easy to verify that by (A.2), (22) and under the assumption that ε ≤ 1, the
additional condition R² ≥ r² log s holds for R = σ₀ and r = λ.
As we see from (25), inequality (29) proved here for case B), holds also for
case A). This completes the proof of Theorem 1.
f = ein.
Naturally, l_k and p_k = 2l_k do not now depend on k. Since
is now superfluous and would even have somewhat complicated the achievement
of the final result. Instead of referring to Lemma 5 it is better to refer to the
simpler Lemma 6. Since these simplifications are possible, we now give a proof
of Theorem 2 that is independent of Nos. 1, 2. We denote by F(x) = P{ξ_k < x}
the general distribution function of the terms ξ_k. The conditions
    Φ₃ = Σ_k C_n^k p^k (1 − p)^{n−k} F^k * G_{σ₀}.
By Lemma 6
(30)
Lemma 3, under the assumption that
    |k/n − p| ≤ n^{−4/5},  (31)
implies that
(32)
therefore
(33)
where Σ′ is the sum of those C_n^k p^k (1 − p)^{n−k} for which (31) fails. Clearly
(35)
Clearly,
    Q_F(λ/2) ≥ 1 − p/2.
(37)
(38)
References
Without claiming to present an exhaustive review of the literature on the question,
we intend to give an outline of the first steps of the theory of random
functions, possible basic methods for a systematic construction of this theory
and basic problems concerning functional methods in limit theorems; in
addition, we report certain comparatively new results.
We will proceed from the axiomatics of probability theory given in [1]. A family
{Ω, Φ, P} consisting of
1) a set Ω whose elements ω are called "elementary events";
2) a σ-algebra Φ of subsets of Ω;
3) a measure P(A) defined on Φ and satisfying the additional requirement
    P(Ω) = 1,
will be called a probability field (the terminology corresponds to [2]). Sometimes
it is also useful to assume that
(ω′) the measure P(A) is complete, that is, P(A) = 0, B ⊆ A implies that
B ∈ Φ.
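A minimal finite instance of the triple {Ω, Φ, P} with the listed requirements can be written down explicitly (the three-point Ω and the weights below are an arbitrary illustration, not taken from the text):

```python
from itertools import chain, combinations

Omega = frozenset({'a', 'b', 'c'})

def powerset(s):
    """All subsets of s, as frozensets."""
    s = sorted(s)
    return {frozenset(c)
            for c in chain.from_iterable(combinations(s, r)
                                         for r in range(len(s) + 1))}

Phi = powerset(Omega)                    # here: the full power set of Omega
weights = {'a': 0.5, 'b': 0.3, 'c': 0.2}

def P(A):
    return sum(weights[w] for w in A)

assert abs(P(Omega) - 1.0) < 1e-12                      # P(Omega) = 1
assert all(frozenset(Omega - A) in Phi for A in Phi)    # closed under complement
# additivity on disjoint events
assert abs(P(frozenset({'a', 'b'})) - P(frozenset({'a'})) - P(frozenset({'b'}))) < 1e-12
```

In the finite case every measure is automatically complete and perfect; the conditions (ω′) and (ω″) discussed below only acquire force for infinite Ω.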
RANDOM FUNCTIONS AND LIMIT THEOREMS 443
belongs to Φ for any real a, a (real) random variable (see [1]).
An arbitrary function ξ_ω with values in a set X determines a probability
field {X, Φ_ξ, P_ξ}, where Φ_ξ consists of all A ⊆ X for which
Since in any problem in probability theory the basic set Ω is considered to
be quite definite while its elements ω are implicit in the formulations of more
specific problems, the index ω in ξ_ω is usually omitted, and we speak about a
"random element ξ of the set X". Note that under the assumption (ω′) the
condition
(ξ1) if P_ξ(A) = 0, B ⊆ A, then B ∈ Φ_ξ
holds automatically.
If X is a topological space, then it is natural to require for a random
element ξ ∈ X that certain topologically very simple sets A ⊆ X are contained
in Φ_ξ. If X is a metric space (this space is the main one in what follows), then
these requirements are undoubtedly quite sensible. Namely, it is reasonable to
confine ourselves to random variables satisfying the condition
(ξ2) any open set G ⊆ X belongs to Φ_ξ.
As is well known, (ξ2) implies that any Borel set A ⊆ X belongs to Φ_ξ.
If X is the real axis, then (ξ2) is equivalent to the usual requirement in the
definition of random variables. We can further simplify the problem if, in
addition,
(ξ3) for any A ∈ Φ_ξ, P_ξ(A) = inf_{G ⊇ A} P_ξ(G),
where the infimum is taken over all open sets G containing A.
For (ξ3) to hold for any random element ξ in a separable complete metric
space X satisfying (ξ2), it is necessary and sufficient that
(ω″) the measure P on Ω is perfect (see [3]).
The question whether it is worth giving a systematic construction of probability
theory using perfect measures remains debatable until such a systematic
For the sake of definiteness we will consider complex functions x(t) of an argu-
ment t E T, where T is, in general, an arbitrary set. (Mostly studied here are
"random processes" where T is the real axis.) Two approaches to introducing
the notion of a "random function" of this kind are possible.
to a certain function space (for example, in the case T = [a, b]) as belonging to
the space C[a, b] of continuous complex functions with the metric
The question posed above can now be interpreted in the following way: Given
a random function ξ₁(t) in the sense of (I) and a space of Borel measurable sets
(I), does there exist a random function ξ₂(t) of type ~ for which (2) holds for
any t ∈ T? From the point of view of applications, this approach, developed
by E.E. Slutskii (see [12], [13]), is apparently quite sufficient and allows us to
avoid a number of complications of set theory which will be discussed below
in §3. From the practical viewpoint the most interesting are the questions:
Is it possible to consider a random function as a) continuous, b) having only
discontinuities of the first kind? In the first direction the following result
of A.N. Kolmogorov (see [12]) has long been known: in order that a random
function ξ(t) in the sense of (I) given on T = [a, b] be equivalent to a random
function ξ*(t) of type C[a, b], it suffices that
E|ξ(t₂) − ξ(t₁)|^a ≤ K|t₂ − t₁|^α (3)
for some a > 0, α > 1 and K. In this case ξ*(t) satisfies with probability 1 a
Lipschitz condition
|ξ*(t₂) − ξ*(t₁)| ≤ K*|t₂ − t₁|^((α−1)/a − δ)
for any δ > 0, where K* is a random variable. (Note also that for any ε > 0
there exists K*_ε, depending only on ε, δ, α, a, K, such that P{K* > K*_ε} < ε.
This will be used in §6.)
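Condition (3) can be checked numerically against the Wiener process itself: an increment over a step h is N(0, h), so E|ΔW|⁴ = 3h², and (3) holds with a = 4, α = 2 > 1. A minimal Monte Carlo sketch (a modern illustration; all function names are ours, not the paper's):

```python
import math
import random

random.seed(0)

def fourth_moment_of_increment(h, n_samples=200_000):
    """Monte Carlo estimate of E|W(t+h) - W(t)|^4 for a Wiener process.

    The increment is N(0, h), so the exact value is 3 h^2; this
    illustrates condition (3) with a = 4, alpha = 2 > 1.
    """
    s = 0.0
    for _ in range(n_samples):
        dw = random.gauss(0.0, math.sqrt(h))
        s += dw ** 4
    return s / n_samples

for h in (0.1, 0.01):
    print(h, fourth_moment_of_increment(h), 3 * h * h)
```

The estimates track 3h² over the two decades of h, which is the polynomial moment bound the continuity theorem requires.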
The conditions for the representability of ξ(t) as a function with discontinuities
of the first kind only are especially essential when ξ(t) represents a
Markov process. Among the recent papers on this question we point out those
by E.B. Dynkin [14] and Kinney [15].2
As is well known, the elements of a number of important function spaces are
not individual functions, but metric types x(t) of these functions with respect
to the measure μ introduced on T (a metric type is a class of all functions
that differ from a certain fixed function x(t) only on a set of μ-measure 0).
We introduce the notion of asymptotic continuity of x(t) at t₀ so that for the
metric type x(t) the "true values" of x(t) are uniquely determined at almost all
points t₀ ∈ T by asymptotic continuity, and the space 𝔛 is such that the true
values of a random function of type 𝔛 at the points where it is defined are with
probability 1 random variables. In this setting the same problem on the relation
between the two notions of a random function, (I) and (II), may be raised and
solved, perhaps in a different form. Now, using the approach (I), we naturally
require that the random variables ξ(t) be defined only almost everywhere on T,
and regard two random functions (in the sense of (I)) ξ₁(t) and ξ₂(t) as being
equivalent if (2) holds for almost all t. Then corresponding to each random
function ξ*(t) in the sense of (II) there is a random function ξ(t) in the sense
of (I) which is defined to within equivalence and is equal to the true value of
ξ*(t) at t for almost all t with probability 1. The complete solution of the
inverse problem under conditions when a random function in the sense of (I) is
equivalent to a measurable function was given in the paper by Ambrose [16] for
the case when T is the real axis and μ is Lebesgue measure. A necessary and
sufficient condition is for ξ(t) to be asymptotically stochastically continuous
almost everywhere (with respect to μ). If this condition is satisfied, we can
further consider our problem in the space 𝔛 of measurable functions (more
exactly, in the space of metric types) and without any difficulties in principle
compute the probability of the summability of ξ(t), its square summability,
"essential boundedness", etc.
Let us return to random functions ξ(t) in the sense of (I) defined for all t ∈ T.
2 See also a later work by N.N. Chentsov, Teor. Veroyatnost. i Primenen. (Trans-
lated as Theory Probab. Appl.) 1:1 (1956), 155-161 (in Russian). (Note of Rus-
sian editor.)
For fixed t₁, t₂, …, t_n this mapping assigns to each random function ξ(t) a
random vector
The values P_ξ(B) for B ∈ 𝔅_T are uniquely defined by the finite-dimensional
distributions. In [1] it was proved that, conversely, arbitrary "compatible"
finite-dimensional distributions P_{t₁,t₂,…,t_n}, given for all finite systems of
elements t₁, t₂, …, t_n in T and defined on the Borel sets of an n-dimensional
complex vector space, give rise to a probability field {Ω_T, 𝔅_T, P*}. The following
formulation is a slight variation of this result: If the distributions P_{t₁,t₂,…,t_n}
are defined for all finite systems of elements of T, are compatible and satisfy
(ξ2) and (ξ3), then there exist random functions ξ(t) in the sense of (I) with
of random elements from X, there arises the question of the convergence of the
corresponding distributions P_{ξ_n} to a distribution P on X. Here convergence
naturally means "weak convergence" of distributions in the usual functional-
analytic sense. Recall that a sequence P_n of distributions on X weakly con-
verges to a distribution P if for any function f(x), x ∈ X, that is continuous
and bounded on X,
∫_X f(x) dP_n → ∫_X f(x) dP.
If X is the real line, then weak convergence appears to be equivalent to
the well-known convergence "in general" of the corresponding distribution func-
tions. Denote by P_n ⇒ P the weak convergence of distributions. The meaning
of this convergence is clarified by the following criteria:
I) P_n ⇒ P if and only if P_n(A) → P(A)
for any set A whose boundary has P-measure 0 (that is, for any continuity set
of the distribution P);
II) P_n ⇒ P if and only if for any functional f(x) continuous almost every-
where with respect to P, the sequence of distribution functions
(5)
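The definition of weak convergence can be illustrated by a classical case: the standardized binomial distribution converges weakly to the normal law, so E f of the standardized sum approaches E f(Z) for any bounded continuous f. A small exact computation (a modern Python sketch; names and the test function f = arctan are our own choices):

```python
import math

def binom_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def E_f_standardized_binomial(n, p, f):
    """E f((S_n - np)/sqrt(np(1-p))) computed exactly from the binomial pmf."""
    mu, sd = n * p, math.sqrt(n * p * (1 - p))
    return sum(binom_pmf(n, p, k) * f((k - mu) / sd) for k in range(n + 1))

def E_f_standard_normal(f, grid=20_000, lim=10.0):
    """E f(Z), Z ~ N(0,1), by a simple midpoint quadrature."""
    h = 2 * lim / grid
    total = 0.0
    for i in range(grid):
        x = -lim + (i + 0.5) * h
        total += f(x) * math.exp(-x * x / 2) * h
    return total / math.sqrt(2 * math.pi)

f = math.atan  # bounded and continuous on the line
for n in (10, 100, 1000):
    print(n, E_f_standardized_binomial(n, 0.3, f))
print("limit:", E_f_standard_normal(f))
```

For the skewed case p = 0.3 the expectations are nonzero for finite n and approach the limiting value (here 0, by symmetry of the normal law) as n grows.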
Further, let
η; η₁, η₂, …, η_n, …
[0, 1], x(0) = 0, is compact if and only if there exists a function φ(τ), tending
to zero as τ → 0, for which
P_n(A) → P(A), A ∈ S,
implies that
P_n ⇒ P.
Examples illustrating the use of these general results can be found in the
last section of the present paper.
If ξ is a random element from a Banach space X,4 then for the corresponding
distribution P_ξ we can define an analogue of the usual characteristic function.
For any element f of the space X* dual to X we define the value of the char-
acteristic functional H by the formula
then
E(f₁, f₂, …, f_n) = ∫_X f₁ f₂ ⋯ f_n dP_ξ
for all n.
The question of sufficient conditions for a functional H(f) to be character-
istic acquires new aspects in the infinite-dimensional case. Here we only make
the following remark. In accordance with [18] it is natural to call a distribution
P in X normal if its characteristic functional has the form
H(f) = e^{i(m,f) − ½(f,Sf)},
where m is an element of X and S is a symmetric bounded linear operator
replacing the matrix of second moments.
However, the condition that
(f, Sf) ≥ 0
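In finite dimensions the normal characteristic functional H(f) = e^{i(m,f) − ½(f,Sf)} can be verified directly by Monte Carlo: sample X = m + LZ with LLᵀ = S and compare E e^{i f(X)} with the closed form. A self-contained sketch (all names and the particular m, S are our own illustrative choices):

```python
import cmath
import math
import random

random.seed(1)

m = [1.0, -0.5]                   # mean element m (illustrative)
S = [[2.0, 1.0], [1.0, 2.0]]      # symmetric covariance operator S (illustrative)

# Cholesky factor of the 2x2 matrix S, done by hand
l11 = math.sqrt(S[0][0])
l21 = S[1][0] / l11
l22 = math.sqrt(S[1][1] - l21 * l21)

def sample():
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    return [m[0] + l11 * z1, m[1] + l21 * z1 + l22 * z2]

def H_empirical(f, n=200_000):
    """Monte Carlo estimate of H(f) = E exp(i f(X))."""
    acc = 0j
    for _ in range(n):
        x = sample()
        acc += cmath.exp(1j * (f[0] * x[0] + f[1] * x[1]))
    return acc / n

def H_formula(f):
    """Closed form exp(i (m,f) - (1/2)(f, S f))."""
    mf = f[0] * m[0] + f[1] * m[1]
    fSf = sum(f[i] * S[i][j] * f[j] for i in range(2) for j in range(2))
    return cmath.exp(1j * mf - 0.5 * fSf)

f = [0.4, -0.3]
print(H_empirical(f), H_formula(f))
```

The two complex numbers agree to Monte Carlo accuracy, which is exactly the finite-dimensional content of the formula above.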
be random variables with zero expectations Eξ_{n,k} and finite variances Dξ_{n,k}
such that

We denote by

the "accumulated sums" of the random variables ξ_{n,k}, and by t_{n,k} the quantities
(8)
as n → ∞, where ξ(t) is a Wiener random process, that is, a process continuous
with probability 1 and satisfying the conditions:
1) ξ(0) = 0 with probability 1;
we obtain for a ≥ 0,
P{max_{0≤t≤1} ξ_n(t) > a} = P{max_{1≤k≤n} S_{n,k} > a} → P{max_{0≤t≤1} ξ(t) > a} = 2P{ξ(1) > a}.
Given continuous functions a(t) and b(t) on [0, 1] such that a(0) < 0 < b(0)
and a(t) ≤ b(t), consider the functional f(x) equal to 1 if a(t) < x(t) < b(t) for
all t and 0 otherwise. We find that
P{for all t, a(t) < ξ_n(t) < b(t)} → P{for all t, a(t) < ξ(t) < b(t)}
and hence
P{for all k, a(t_{n,k}) ≤ S_{n,k} ≤ b(t_{n,k})} → P{for all t, a(t) < ξ(t) < b(t)}.
The latter result was obtained by A.N. Kolmogorov in 1931 under somewhat
different assumptions (see [28], [29]). This remark allows us to obtain similar
limiting relations for the joint distribution of any finite number of functionals,
for example
P{min_{1≤k≤n} S_{n,k} < x, max_{1≤k≤n} S_{n,k} < y} → P{min_{0≤t≤1} ξ(t) < x, max_{0≤t≤1} ξ(t) < y}.
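The convergence of the maximum functional can be watched numerically: for a ±1 random walk, the probability that the scaled maximum exceeds a approaches 2P{ξ(1) > a} = 2(1 − Φ(a)), the value given by the reflection principle for the Wiener process. A minimal sketch (modern Python; parameters are our own):

```python
import math
import random

random.seed(2)

def p_max_exceeds(a, n=400, trials=10_000):
    """Empirical P{ max_k S_k / sqrt(n) > a } for a +-1 random walk of n steps."""
    hits = 0
    for _ in range(trials):
        s, smax = 0, 0
        for _ in range(n):
            s += 1 if random.random() < 0.5 else -1
            if s > smax:
                smax = s
        if smax > a * math.sqrt(n):
            hits += 1
    return hits / trials

def p_limit(a):
    """2 P{xi(1) > a} = 2(1 - Phi(a)), the Wiener-process limit."""
    return 2 * (1 - 0.5 * (1 + math.erf(a / math.sqrt(2))))

a = 1.0
print(p_max_exceeds(a), p_limit(a))
```

For n = 400 the empirical value already sits within a few hundredths of the limit 2(1 − Φ(1)) ≈ 0.317, a concrete instance of statement (8).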
Statement (8) contains theorems of Erdős, Kac and Donsker (see [31]–[33])
as particular cases. According to these theorems, the distributions of functionals
of "accumulated sums" converge to the distributions of the corresponding func-
tionals of the Wiener process. The method of Erdős and Kac, in its final form
given by Donsker [33], is first to establish the convergence of
(9)
for "strips", that is, for the sets A whose elements satisfy a(t) < x(t) < b(t) for
all t, where a(t) and b(t) are step functions (see Figure).
[Figure: a path y(t) confined to the strip between the step functions a(t) and b(t).]
Remark. It is known that by using the method of upper and lower functions
(see [30]) for the case when the process ξ_n(t) is constructed from the accumulated
sums of terms that are independent or connected in a Markov chain, and when the
process ξ(t) is a properly chosen Markov process satisfying the Fokker–Planck
differential equations, we can obtain a statement of the following type: for any
piecewise smooth functions a(t) and b(t)
P{for all t, a(t) < ξ_n(t) < b(t)} → P{for all t, a(t) < ξ(t) < b(t)}. (10)
Clearly, using Theorem III of §4, we can infer from the convergence (10)
that P_{ξ_n} ⇒ P_ξ. Nowadays the method of upper and lower functions is suc-
cessfully used by I.I. Gikhman [34]–[39], who has applied it to a number of
problems concerning sums of random variables and certain problems in math-
ematical statistics (see below).
Denote by ξ(t) the Gaussian process (that is, the process all finite-dimen-
sional distributions of which are Gaussian) that is continuous with probability 1
and has correlation function
References
28. A.N. Kolmogorov, Izv. Akad. Nauk SSSR Ser. Fiz.-Mat. (1931), 959-962
(in Russian).
29. A.N. Kolmogorov, Izv. Akad. Nauk SSSR Ser. Fiz.-Mat. (1933), 363-372
(in Russian).
30. A.Ya. Khinchin, Asymptotic laws of probability theory, ONTI, Moscow-
Leningrad, 1936 (in Russian).
31. P. Erdös and M. Kac, Bull. Amer. Math. Soc. 52 (1946), 292-302.
32. P. Erdös and M. Kac, Bull. Amer. Math. Soc. 53 (1947), 1011-1020.
33. M. Donsker, Mem. Amer. Math. Soc. 6 (1951), 1-12.
34. I.I. Gikhman, Ukrain. Mat. Zh. 6:1 (1954), 28-36 (in Russian).
35. I.I. Gikhman, Mat. Sb. Kiev. Gos. Univ. 1953, No. 7, 8-15 (in Russian).
36. I.I. Gikhman, Dokl. Akad. Nauk SSSR 82:6 (1952), 837-840 (in Russian).
37. I.I. Gikhman, Dokl. Akad. Nauk SSSR 56 (1947), 961-964 (in Russian).
38. I.I. Gikhman, Ukrain. Mat. Zh. 5 (1953), 413-433 (in Russian).
39. I.I. Gikhman, Dokl. Akad. Nauk SSSR 91:4 (1953), 1003-1006 (in Rus-
sian).
40. M. Kac, In: Proc. Second Berkeley Sympos. 1951, pp. 189-215.
41. G. Maruyama, Nat. Sci. Rep. Ochanomizu Univ. 1 (1953).
42. A.N. Kolmogoroff, G. Ist. Ital. Attuar. 4 (1933), 83-91.
43. N.V. Smirnov, Rev. Math. Moscow 6 (1939), 3-26.
44. N.V. Smirnov, Byull. Moskov. Gos. Univ. Ser. A 2:2 (1939), 3-14 (in
Russian).
45. M. Donsker, Ann. Math. Statist. 23 (1952), 277-281.
46. T. Anderson and D. Darling, Ann. Math. Statist. 23 (1952), 193-212.
45. ON THE PROPERTIES OF P. LEVY'S
CONCENTRATION FUNCTIONS * 1
In what follows
implies
(1)
where
s = Σ_{k=1}^n (1 − Q_k(1)),
imply that
Q(L) ≤ CL/(l√s). (2)
* 'Sur les proprietes des fonctions de concentrations de M.P. Levy', Ann. Inst.
H. Poincare 16:1 (1958), 27-34.
1 In two issues of the Proc. Inst. Statistics Paris Univ. articles in honour of Paul
Levy have been published. This paper was submitted too late to be included in
these issues.
2 Here and in what follows logarithms to the base 2 are considered.
460 ON THE PROPERTIES OF P. LEVY'S CONCENTRATION FUNCTIONS
the inequality
implies
Q(6L) ≤ β. (3)
It would be interesting to find inequalities that include (2) and (3) as par-
ticular cases. In general, I would like to point out that the further development
of elementary methods for direct probabilistic computations, so brilliantly de-
veloped in France by P. Levy and W. Doeblin, seems to remain as urgent as the
development of classical or functional analytic methods. In any case, I could
not prove the results of [1] without these elementary direct methods. It is quite
possible that mathematicians who have a better command of the subtle prop-
erties of characteristic functions will sooner or later be able to prove and even
generalize the theorems of [1] using purely analytical methods, as has already
happened with results initially proved by direct methods of probability theory.
It seems, however, that today we are still in a period when the competition
between these two trends is bringing about the best results. While P. Levy,
who masters these two techniques equally well, makes use of both kinds
of method in his own work, it is very desirable that, after the premature death
of Doeblin, the younger generation of probabilists should not forget the direct
methods despite their admiration (admittedly, quite justified) for the power of
the methods which make use of distributions in function spaces.
We will use the following notation:
L ≥ l√(log n)   (n ≥ 2)
imply that
(6)
The sum of the n′ numbers n_r satisfying (7) is not greater than

n Σ_{r=1}^∞ 4^{−r} = n/3.

Since the sum of all the n_r equals n, the sum of the n″ numbers n_r satis-
fying (8) is not smaller than ⅔n. Clearly, among all these n_r there are at least
2 log n non-zero numbers. Since the number of r satisfying

is smaller than log n, there exists some number, not less than log n, of numbers
r for which
Taking from these r either only even or only odd numbers in increasing
order, we obtain the sequence
for which

A more or less elementary argument, which is left to the reader, shows that the
sum ξ′ can belong to an interval of length L only for one single quite specific
combination (if n > 1) of the signs of the terms ξ_{k_δ}; in other words, with
probability at most 2^{−s} ≤ n^{−1}.
Hence, in the third case, because of the condition L ≥ l we have
Comparing (5), (6) and (9) in these three cases we see that Lemma 2 is
proved.
Now let us prove the theorem. The relations
(10)
whose definition must be made suitably precise for all y corresponding to dis-
continuity points of F_k(x). If we assume that the random variables η_k are
uniformly distributed on (0, 1) and are independent, then ξ_k = u_k(η_k) are also
independent and have the distributions F_k(x). Clearly, it may be assumed without
loss of generality that the initially given random variables ξ_k are expressed in
this way.
It may be assumed that all the Q_k(1) are smaller than 1, since by excluding
from our discussion those terms for which Q_k(1) = 1 and obtaining (2) for the
sum of all the other terms, we can include the excluded terms again, and this will
not enlarge Q(L). We set
4ε_k = 1 − Q_k(1),
x′_k = u_k(ε_k),   x″_k = u_k(1 − ε_k).
Clearly,
x″_k − x′_k > l.
Denote by k₁, k₂, …, k_m the indices k for which either η_k < ε_k or η_k >
1 − ε_k.
We fix the quantities
if η_{k_r} < ε_{k_r},
if η_{k_r} > 1 − ε_{k_r}.
It is easy to see that the joint conditional distribution of the random variables
ξ_{k_i} is such that they remain independent and each of them has the distribution
where
x_r = ½[u_{k_r}(1 − z_r) − u_{k_r}(z_r)],
a_r = ½[u_{k_r}(1 − z_r) + u_{k_r}(z_r)].
Applying (for fixed k_r and z_r) Lemma 2 to the ξ′_r = ξ_{k_r} − a_r, for which |ξ′_r| ≥
l′ = ½l, we obtain for L ≥ l′√(log m) the inequality
(11)
p = 1 − P{½s ≤ m ≤ s}.
Q(L) ≤ CL/(l√s),
where C = 8 + 4C₂.
References
BRANCHING PROCESSES, DIFFUSION PROCESSES & GENETIC PROBLEMS 467
of certain results of Fisher which were tackled by Feller from the diffusion
viewpoint.
Suppose that during one step of the process one particle turns into k particles
with probability p_k(N).
with singularity at τ = 0, x = 0. In a certain sense it describes the behaviour
of one particle that appeared at time τ = 0, meaning that if for τ₀ > 0 we
choose c so that

∫₀^∞ u(τ₀, x) dx = 1,
then, provided that ξ(τ₀) > 0, the conditional distribution of ξ(τ) for τ > τ₀
converges to the distribution

as N → ∞.
u(x) = c₁/x
has the following statistical meaning: if at each step a new particle appears
independently of time with probability p, and ρ(x₁, x₂) is the number of those
initial particles whose progeny at a certain fixed time t satisfies

as N → ∞.
The solution
c₁/x + c₂/(1 − x)
of the stationary equation
(x(1 − x)u)_{xx} = 0
in Fisher's theory can be interpreted in a similar way. This interpretation was
even verified in an experiment and appeared to be true.
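That c₁/x + c₂/(1 − x) solves the stationary equation can be seen by a one-line computation — the combination x(1 − x)u is linear in x, so its second derivative vanishes:

```latex
x(1-x)\Bigl(\frac{c_1}{x} + \frac{c_2}{1-x}\Bigr)
  = c_1(1-x) + c_2 x
  = c_1 + (c_2 - c_1)\,x ,
\qquad\text{hence}\qquad
\bigl(x(1-x)u\bigr)_{xx} = 0 .
```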
18 November 1958
References
48. ON CONDITIONS OF STRONG MIXING OF A GAUSSIAN
STATIONARY PROCESS*
Jointly with Yu.A. Rozanov
As is well known, two σ-algebras of events 𝔐′ and 𝔐″ are said to be indepen-
dent if for any A′ ∈ 𝔐′, A″ ∈ 𝔐″, P(A′A″) = P(A′)P(A″).
M. Rosenblatt [1] has suggested a natural measure of dependence between
two σ-algebras of events:

α(𝔐′, 𝔐″) = sup_{A′∈𝔐′, A″∈𝔐″} |P(A′A″) − P(A′)P(A″)|.

For a stationary random process ξ(t) the measure α(𝔐^t_{−∞}, 𝔐^∞_{t+τ}) (where
𝔐^t_s denotes the σ-algebra of events generated by ξ(u), s ≤ u ≤ t) depends
only on τ and will be denoted by α(τ). If α(τ) → 0 as τ → ∞, then it is said
that the process ξ(t) has the strong mixing property. In this paper we indicate
which properties of the spectral function F(λ) of the process guarantee the
strong mixing condition for Gaussian processes.
1. For any two systems {ξ} = Ξ′ and {η} = Ξ″ with finite second moments
we introduce the index

ρ(Ξ′, Ξ″) = sup |E(ξ − Eξ)(η − Eη)| / (√Dξ √Dη),

the supremum being taken over ξ ∈ Ξ′, η ∈ Ξ″. If Ξ′ and Ξ″ are, respectively, the families of all variables with finite second
moments measurable with respect to the σ-algebras 𝔐′ and 𝔐″, then, by def-
inition (see [2]), ρ(𝔐′, 𝔐″) = ρ(Ξ′, Ξ″) is the maximal correlation coefficient
between 𝔐′ and 𝔐″.
Clearly,
(1)
Now let {ξ} and {η} be two sets of random variables which have (for
any finite set ξ₁, …, ξ_m, η₁, …, η_n) Gaussian joint distributions, and let 𝔐_ξ
and 𝔐_η be the σ-algebras generated by the events (ξ ∈ Γ′) and (η ∈ Γ″)
respectively, where Γ′ and Γ″ are arbitrary Borel sets on the line, and let H_ξ
and H_η be the closed (with respect to the mean square) linear spans of {ξ} and
{η}.
* Teor. Veroyatnost. i Primenen. (Probability theory and its applications) 5:2
(1960), 222-227 (in Russian).
472 CONDITIONS OF STRONG MIXING OF A GAUSSIAN STATIONARY PROCESS
Theorem 1. 1
(2)
Proof of Theorem 1. Clearly, we may restrict ourselves to the case when {ξ}
and {η} consist of a finite number of variables.
Further, in H_ξ and H_η we can choose ξ₁, …, ξ_m and η₁, …, η_n so that only
those ξ_k and η_k with the same indices are dependent, and each ξ ∈ {ξ}, η ∈ {η}
is a function of ξ₁, …, ξ_m and η₁, …, η_n respectively. We may also assume that
Eξ_k = 0, Eη_j = 0, Dξ_k = Dη_j = 1, k = 1, …, m, j = 1, …, n.
Then the quantities f = f(ξ₁, …, ξ_m) and g = g(η₁, …, η_n) may be rep-
resented as
f = Σ_{k=1}^m f_k,   g = Σ_{j=1}^n g_j,
where
f_k = E(f | ξ₁, …, ξ_k) − E(f | ξ₁, …, ξ_{k−1}),
g_j = E(g | η₁, …, η_j) − E(g | η₁, …, η_{j−1}).
Note that for k ≤ j,

which implies that Ef_k g_j = 0 for k ≠ j and, taking m ≤ n for the sake of
definiteness,
Efg = Σ_{k=1}^m Ef_k g_k = Σ_{k=1}^m E[E(f_k g_k | ξ₁, …, ξ_{k−1}, η₁, …, η_{k−1})].
The variables ξ_k and η_k, having Gaussian distributions, do not depend on ξ₁, …, ξ_{k−1},
η₁, …, η_{k−1}, and Sarmanov's result [2] implies that

where

Hence,
|Efg| ≤ ρ Σ_{k=1}^m E a_k b_k ≤ ρ,
Proof of Theorem 2.2 Take an arbitrary ε > 0 and ξ_ε ∈ H_ξ, η_ε ∈ H_η with
Eξ_ε = Eη_ε = 0, Dξ_ε = Dη_ε = 1 such that r = Eξ_ε η_ε > ρ − ε. Consider the
events
A_ε = {ξ_ε > 0} ∈ 𝔐_ξ and B_ε = {η_ε > 0} ∈ 𝔐_η;
clearly,
(1/2π) sin⁻¹ r = P(A_ε B_ε) − P(A_ε)P(B_ε) ≤ α.
Further, if α > ¼, then the inequality ρ ≤ 2πα is trivial. If, on the other hand,
α ≤ ¼, then
ρ − ε ≤ r ≤ sin 2πα,   ρ ≤ 2πα + ε,   ρ ≤ 2πα,
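The identity used in the proof — for jointly Gaussian X, Y with zero means, unit variances and correlation r, P{X > 0, Y > 0} − P{X > 0}P{Y > 0} = (1/2π) sin⁻¹ r — is easy to confirm by simulation. A minimal sketch (modern Python; names are ours):

```python
import math
import random

random.seed(3)

def orthant_excess(r, n=400_000):
    """Empirical P{X>0, Y>0} - P{X>0}P{Y>0} for standard Gaussian X, Y
    with correlation r (Y built as r X + sqrt(1-r^2) Z, Z independent)."""
    hits = 0
    c = math.sqrt(1 - r * r)
    for _ in range(n):
        x = random.gauss(0, 1)
        y = r * x + c * random.gauss(0, 1)
        if x > 0 and y > 0:
            hits += 1
    return hits / n - 0.25

r = 0.6
print(orthant_excess(r), math.asin(r) / (2 * math.pi))
```

Both numbers agree to Monte Carlo accuracy, which is the quantitative link between the dependence measure α and the maximal correlation ρ for Gaussian systems.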
2. Let ξ(t) be a process that is stationary in the wide sense and let
(4)
where inf_φ is taken over all functions φ(z) that can be analytically continued
inside the unit disc; for continuous time
(4′)
where inf_φ is taken over all functions φ(z) that can be analytically continued
into the lower half-plane.
The proof of this theorem is based on a general lemma from func-
tional analysis.
Proof. Since (h* − h⁰)(h) = h*(h), it follows that h*(h) ≤ ‖h* − h⁰‖ for h ∈
H, ‖h‖ = 1; therefore sup_{h∈H, ‖h‖=1} h*(h) ≤ inf_{h⁰∈H⁰} ‖h* − h⁰‖. Further, according
where
p_j(λ) = Σ_{t_k ≤ 0} c_k^{(j)} e^{−iλt_k},   ∫ |p_j|² f(λ) dλ ≤ 1,
and the integration is carried out from −π to π for integer time, and from
−∞ to ∞ for continuous time. Using certain properties of boundary values of
analytic functions it can be shown that in fact
(6)
where
p(λ) = Σ_{t_k ≤ 0} c_k e^{−iλt_k},   ∫ |p(λ)|² f(λ) dλ ≤ 1.
Let us take for the space L the space of functions h(λ) that are integrable
with weight f(λ), that is, ‖h‖ = ∫ |h(λ)| f(λ) dλ < ∞, and as the subspace H the
linear closure of the functions p(λ) of the form
p(λ) = Σ_{t_k ≤ 0} c_k e^{−iλt_k}.
h*(h) = ∫ h*(λ) h(λ) f(λ) dλ,

unit disc for integer time, and for continuous time φ(λ) = h(λ)f(λ) may be
analytically continued into the lower half-plane.
Taking as h* the linear functional corresponding to the function h*(λ) =
e^{−iλτ}, we obtain (4) and (4′) from (5).
Theorem 4. If there exists φ₀(z) that has an analytic continuation inside the
unit disc for integer time (to the lower half-plane for the case of continuous
time), with boundary value φ₀(e^{−iλ}) (respectively, φ₀(λ)), such that the ratio
f/φ₀ is a uniformly continuous function of λ with |f/φ₀| ≥ ε > 0 for almost
all λ, then
ρ(τ) → 0 (7)
as τ → ∞. If there exists an analytic function φ₀(z) such that |f/φ₀| ≥ ε > 0
and the derivative (f/φ₀)^{(k)} is uniformly bounded, then
ρ(τ) ≤ cτ^{−k}.
Proof. Let φ(z) be a polynomial of degree at most [τ/2] for integer time (or
an analytic function of exponential type at most τ/2 for continuous
time). We have

continuous over the whole line, does not vanish, and for sufficiently large λ
satisfies the inequality
m/λ^k ≤ f(λ) ≤ M/λ^{k−1} (9)
for some positive m, M and integer k > 0 (for the case of continuous time).
It then follows from [5], [6] that if ρ(τ) → 0 as τ → ∞, then the spec-
tral density cannot vanish "too strongly": namely, it must be positive almost
everywhere and satisfy the inequality
∫ (log f(λ) / (1 + λ²)) dλ > −∞. (10)
Apparently, the strong mixing condition might fail if the spectral den-
sity f(λ) has a discontinuity (even if it is everywhere greater than a positive
constant, for integer t).
The authors thank Yu.V. Prokhorov for his remarks on the manuscript,
which have undoubtedly improved it.
References
50. AN ESTIMATE OF THE PARAMETERS OF A COMPLEX
STATIONARY GAUSSIAN MARKOV PROCESS *
Jointly with M. Arato and Ya.G. Sinai
C(τ) = A(τ) + iB(τ) = E[ζ(t) ζ̄(t + τ)] = σ² exp(−λ|τ| − iωτ), (2)
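The parameters λ and ω in a correlation function of the form (2) can be recovered from sampled correlations: the ratio C(1)/C(0) has modulus e^{−λ} and argument −ω. A minimal sketch using a discrete-time analogue of the complex stationary Gaussian Markov process (modern Python illustration; the particular parameter values, names and sign conventions are our own assumptions):

```python
import cmath
import math
import random

random.seed(4)

lam, om, sigma2 = 0.25, 1.3, 1.0   # assumed lambda, omega, sigma^2

# Discrete-time analogue: zeta_{t+1} = q zeta_t + noise, with q chosen so that
# C(tau) = E zeta(t) * conj(zeta(t+tau)) = sigma^2 exp(-lam*tau - i*om*tau).
q = cmath.exp(complex(-lam, om))
s_noise = math.sqrt(sigma2 * (1 - abs(q) ** 2) / 2)

def cgauss(s):
    return complex(random.gauss(0, s), random.gauss(0, s))

N = 200_000
z = [cgauss(math.sqrt(sigma2 / 2))]       # start in the stationary law
for _ in range(N):
    z.append(q * z[-1] + cgauss(s_noise))

C0 = sum(abs(v) ** 2 for v in z) / len(z)
C1 = sum(z[t] * z[t + 1].conjugate() for t in range(N)) / N
c = C1 / C0
lam_hat, om_hat = -math.log(abs(c)), -cmath.phase(c)
print(lam_hat, om_hat)
```

The recovered values agree with the assumed λ = 0.25, ω = 1.3 to sampling accuracy; this is the elementary idea behind reading off the period 2π/ω from the empirical correlation spiral described below.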
480 PARAMETERS OF A COMPLEX STATIONARY GAUSSIAN MARKOV PROCESS
In the expression for r the integration is over the angle defined from
The figure shows the empirical correlation function for Chandler variations
of the coordinates of the earth's pole. 1
dP/dV = Cλ exp[−((λ² + ω²)/(2a)) T s₁² − (λ/a) s₂² + λT + (ω/a) T r̄], (4)

where C is a constant. Formula (4) shows that the system of three statistics
s₁², s₂², r̄ is a sufficient system of statistics for the problem. Differentiating

L = log dP/dV = c′ + log λ − ((λ² + ω²)/(2a)) T s₁² − (λ/a) s₂² + λT + (ω/a) T r̄,
1 The instantaneous axis of the earth's rotation moves with respect to the small
axis of the earth's ellipsoid (so-called free nutation). These movements have a
periodic component with a period of one year. After eliminating this component
there remain the Chandler movements, with a tendency to fluctuate with a period of
about 14 months, but which are not strictly periodic and have large and mainly
smooth variations of the amplitudes (waves of about 10-20 years). The figure
shows that the Chandler component of the pole movement is in good agreement with
the hypothesis at the beginning of this paper.
The figure was obtained by processing the data of Table 6 from the book by
A.Ya. Orlov [1]. The component with a one-year period is singled out from the
coordinates x(t), y(t) of Table 6, and the remainder is taken to be ξ(t) and η(t).
The nodes on the figure indicate the points corresponding to increments of τ
of 0.1 year. The figure shows straight away that the period 2π/ω approximately
equals 14 months. The regular pattern of the spiral obtained might suggest that
the parameter λ can also be estimated very precisely. This, however, is not true,
as will be explained at the end of our paper.
∂L/∂ω = −(ω/a) T s₁² + (T/a) r̄ = 0, (5)
∂L/∂λ = 1/λ − (λ/a) T s₁² − s₂²/a + T = 0 (6)
for determining the maximum likelihood estimates ω̂ and λ̂. From (5) we obtain
Clearly,
(8)
We have initiated computations of K_α(k) for α = 0.1, 0.05, 0.025, 0.01,
0.005, 0.001, 0.9, 0.95, 0.975, 0.99, 0.995, 0.999. The results thus obtained
will be published when the computations are completed.
For small k, (8) is equivalent to the relation

that is, k/K has a χ² distribution with two degrees of freedom. For large k, (8)
is equivalent to
(10)
(11)
§3. For the case mentioned at the beginning of the paper, that is, the movement
of the earth's pole, we have 2
2 The introduction of the Wiener processes φ and ψ, that is, of perturbations of "white
noise" type, in (1) is, of course, a crude idealization in the case of the movement
of the earth's poles. It would be more correct to write
However, the data from [1] show that the values f(t) and g(t) at times t separated by
several years are actually independent, so the replacement of f and g by "equiv-
alent white noise" is possible. Apparently, the error in determining the intensity
References
1. A.Ya. Orlov, Service of latitude, Akad. Nauk SSSR, Moscow, 1958 (in
Russian).
2. Ch. Striebel, Ann. Math. Statist. 30:2 (1959), 559.
3. M. Arato, Dokl. Akad. Nauk SSSR 145:1 (1962) (in Russian).
4. E.P. Fedorov, Proc. Poltava Gravimetric Observatory 2:3 (1948) (in Rus-
sian).
5. N.I. Panchenko, In: Proc. 14th Astronomical Conference of the USSR,
Akad. Nauk SSSR, Moscow (in Russian).
6. W.H. Munk and G.J.F. Macdonald, The rotation of the earth, Cambridge,
1960. (Monographs on Mechanics and Applied Mathematics.)
7. H. Jeffreys, Monthly Notices Roy. Astronom. Soc. Geophys. Suppl. 100
(1942), 139.
8. A.M. Walker and A. Young, Monthly Notices Roy. Astronom. Soc. Geophys.
Suppl. 115 (1955), 443; 117 (1957), 119.
a of this equivalent white noise is small enough not to affect the estimates of
λ substantially. The value of ω̂ is computed via the discrete analogue of for-
mula (*) obtained using the maximum likelihood method for the "discrete-time
scheme".
Concerning the estimation of the parameters λ and ω for the earth's movement,
see also [6]. The results of [6] are close to ours: λ = 1/15; 2π/ω = 1.193. Close
values were given by Jeffreys [7], but [5], [8] give sharply different values: λ = 0.3
and λ = 0.01.
51. ON THE APPROXIMATION OF DISTRIBUTIONS OF SUMS OF
INDEPENDENT TERMS BY INFINITELY DIVISIBLE DISTRIBUTIONS *
Introduction
E(x) = G₀(x) = { 0 for x ≤ 0, 1 for x > 0 },
Theorem 1. There exists C₁ such that for identically distributed ξ_k, there
exists for any F(x) = F_k(x), k = 1, 2, …, n, a distribution D ∈ 𝔇 such that
(0.1)
for all x.
Theorem 2. There exists C₂ such that for any ε > 0 and L > 2l > 0, the
inequalities
(0.2)
* Trudy Moskov. Mat. Obshch. (Proc. Moscow Math. Soc.) 12 (1963), 437-451 (in
Russian).
APPROXIMATION BY INFINITELY DIVISIBLE DISTRIBUTIONS 485
1. It follows from the closedness (with respect to weak convergence) of the class
of infinitely divisible distributions introduced by Bruno de Finetti [2] that if
the distributions of sums
(0.5)
whose terms are independent and identically distributed within each series,
converge weakly, then the limit distribution is infinitely divisible.
It is tempting to interpret this result as follows: the distribution of the sum
of a large number of identically distributed independent terms is close to an in-
finitely divisible distribution. However, prior to my work [1] this interpretation
was not quite convincing.
Even for a sequence
ξ^(n) = ξ₁ + ξ₂ + ⋯ + ξ_n

as n → ∞. Still, the work left open the question of whether the convergence in
(0.6) is uniform with respect to the distribution function F(x) of the variables
ξ_n.
In terms of the supremum metric

tends to 0 as n → ∞. The answer to this question was given in my work [1]:
it was proved that
(0.7)
(0.8)
(0.9)
It was natural to try to estimate ψ(n) from below. Such estimates were
made by Prokhorov's student I.P. Tsaregradskii, by Prokhorov himself, and by
L.D. Meshalkin. The latest result, due to Meshalkin [6], is
(0.10)
2. For sums
Later this result was reported by F.M. Kagan at the Meeting on Probability
Theory and Mathematical Statistics in Fergana (September 1962).
all terms of which are independent within each series but have different dis-
tributions, A.Ya. Khinchin in 1937 [7] established a sufficient condition for
the limit distribution of ξ^(k) to be infinitely divisible. This is the condition of
infinitesimal terms: there exist

for which the distributions F_{ki} of the ξ_{ki} satisfy the condition

Corollary. If
sup_i ρ_L(F_i, E) ≤ ν,
then
3. As is well known, the most powerful means of proving limit theorems on the
distributions of sums of a large number of independent terms is the apparatus
of characteristic functions. Nowadays "direct" probabilistic methods in this area
can very seldom compete with the possibilities of the analytic apparatus of
characteristic functions.
Our Theorems 1 and 2 give an interesting example of another state of
affairs. An essential element in the proof of these theorems is Lemma 1, which
refers to the "concentration functions" introduced by P. Levy. I strengthened
the theorems of Levy and Doeblin on the properties of concentration functions
([8]) specifically in order to prove the new versions of Theorems 1 and 2 given in [1].
we have
(0.11)
3 This result of Rogozin (now known as the Kolmogorov-Rogozin inequality) was
proved by Esseen in 1966 by the method of characteristic functions (Editor's
note).
4 The first of these steps was taken by F.M. Kagan somewhat before this paper
was written (see footnote 2).
Following P. Levy, we introduce for any distribution function F(x) its "concentra-
tion function"
Q_F(l) = sup_x [F(x + l + 0) − F(x)].
Lemma 4. If

then
Σ_{r=−∞}^{∞} sup_{rh ≤ x ≤ (r+1)h} |F(x) − E(x)| ≤ C₈.
Lemma 8. Let 5 0 ≤ p_k ≤ 1,
p_k(m) = 1 − p_k for m = 0,  p_k for m = 1,  0 for m > 1;
q_k(m) = (p_k^m / m!) e^{−p_k}.
Then
Σ_m |p(m) − q(m)| ≤ C₁₂ Σ_k p_k².
(Chebyshev's inequality).
Lemmas 5 and 6 are close to the known estimate
(1.1)
(1.2)
Lemma 6 may be derived from (1.2) if the additional normal term with
variance σ₀² is represented as a sum of a large number of terms with sufficiently
small variances.
The proof 6 of Lemma 5 is somewhat more difficult.
1. In what follows we consider n > 1. It can easily be seen that this requirement
is inessential.
3. We set
p = n^{−1/3},
μ_k = 0 for p/2 < η_k < 1 − p/2,  μ_k = 1 otherwise,
ξ′_k = ξ_k − a,
6 Lemma 5 can easily be derived from the following estimate due to A. Bikyalis:
where C is an absolute constant (see Litov. Mat. Sb. 6:3 (1966), 323-346) (Edi-
tor's note).
7 The functions F_k^{−1} must be suitably defined. This is left to the reader.
only a′ = 0 appears instead of a, and the functions A(x) and B(x) are replaced
by
A′(x) = A(x + a),   B′(x) = B(x + a).
4. In the decomposition

of length

while the support of the distribution B lies outside this interval, with each
of the rays (−∞, x⁻] and [x⁺, ∞) having probability ½ in the distribution B.
Lemma 1 may be applied to the distributions B^m (throughout, the powers
of a distribution are understood in the sense of convolution), which gives the
estimate
(2.1)
D = e^{np(B−E)} = Σ_m ((np)^m / m!) e^{−np} B^m,
H₁ = Σ_m C_n^m p^m (1 − p)^{n−m} B^m.
According to Lemma 7,
(2.2)
~ QBm(-') Er sup
r>'~!I~(r+l)>'
IAn-m(y) - E(y)1 ~ C5 CS m- 1 / 2 • (2.3)
Therefore
where
(2.4)
Noting that
Eμ = np = n^{2/3},   (2.5)

Dμ = np(1 − p) ≤ n^{2/3},   (2.6)
Case B. We set
(2.8)
≤ C_5 (√n σ / λ) m^{−1/2} C_9 (λ / (√n σ)) = C_5 C_9 m^{−1/2}.
where
|(n − m) / (n(1 − p)) − 1| > … C_7
Using Chebyshev's inequality we obtain from (2.5) and (2.6) for n > 1 and an
appropriate choice of C 16 the estimate
∑ … ≤ C_17 n^{−1/3},
which leads to
(2.10)
with arbitrary F_k(x) satisfying (0.2). Let L > 2l. Choose l' and L' such that
By Lemma 2 we can choose σ_0 to be so small that for any distribution function
F(x) we have
where
λ = l' − l,   Λ = L − L',

C ≥ max[L' (log l')^{1/2}, …],   δ' = (2ε)^{1/3}.
Set
Since the functions F~ are continuous and strictly increasing, there exists an
infinitely divisible distribution D' for which
Noting that
is also continuous and strictly decreasing. It takes all values in the interval n > S > 0. Therefore for 0 < ε < 1 there exists a unique solution λ_0 of the equation

S(λ) = ε^{−2/3}.
4. We set

λ = { λ_0,  if λ_0 ≥ l,
      l,    if λ_0 < l,

s = s(λ) = ∑_k p_k.
k
for x_k⁻ < ξ_k < x_k⁺,
otherwise,

a_k = E{ξ_k | μ_k = 0},   σ_k² = D{ξ_k | μ_k = 0},
Setting
we represent F_k(x) as
where the support of the distribution A_k lies in the interval [x_k⁻, x_k⁺], while that of the distribution B_k lies outside this interval; here the probability of each of the rays (−∞, x_k⁻] and [x_k⁺, ∞) is 1/2.
Using the notation of Lemma 8 and setting 9
B(m) = ∏*_k B_k^{m_k},   A(m) = ∏*_k A_k^{1−m_k},
we obtain
Case:   A             B             C
λ:      λ = λ_0       λ = λ_0       λ = l
s:      s = ε^{−2/3}  s = ε^{−2/3}
(3.5)
9 The B(m) are defined for any non-negative m_k, while the A(m) are defined only when the m_k take the values 0 and 1.
This is the only instance in our proof when we use condition (0.2) of the theo-
rem. Since the definition of p_k(λ) and all the other variables essential for our constructions is invariant with respect to the shifts
r is equal to the number of variables ξ_k such that x_k⁻ < ξ_k < x_k⁺. It is easily checked that

Er = s,   Dr = ∑_k p_k(1 − p_k) < s.

P{|r − s| ≥ C} = ∑_{|t(m)−s| ≥ C} p(m) < s/C².   (3.6)
This is the sum of those ξ_k for which x_k⁻ < ξ_k < x_k⁺. In view of the assumption that the a_k = 0, for any m

Eζ = 0,   E(ζ | μ = m) = 0.
In what follows we shall be interested in the conditional variance
Dζ = ∑_k p_k(1 − p_k) σ_k².
Since

∑_k (1 − p_k) σ_k² = σ²,
Therefore

∑_{|σ²(m)−σ²| ≥ c} p(m) ≤ σ² λ² ε / (4c²).   (3.7)
is approximated by
Therefore
where

∑' = ∑_{t(m) < t ε^{2/3}} p(m).
(3.12)
(3.13)
m m
The inequality
(3.14)
(3.15)
Exactly as we obtained (3.12) from (3.10) in case A we now obtain from (3.15)
(3.16)
if
Using (3.7) and taking into account the fact that now λ < σ, we obtain the estimate

∑'' = ∑_{…} p(m) ≤ C_20 ε^{1/3}.   (3.17)
Therefore in the same way that we derived (2.10) in the proof of Theorem 1
we obtain
|H_1 − H_2| ≤ (C_19 + C_20) ε^{1/3}.   (3.18)
Inequalities (3.12) and (3.18) show that in cases A and B the estimate
(0.3), which is the essence of Theorem 2, may be replaced by the following
stronger one:
(3.19)
Case C. In this case we have not managed to obtain an estimate of the type
(3.19). We set
σ_0 = (1 / (√2 L)) (log (L/l))^{−1/2}.   (3.20)
By Lemma 2, for
(3.21)
we have
H'(z − L) − η ≤ H(z) ≤ H'(z + L) + η.   (3.22)
H'_1 = ∑_m p(m) B(m) * G_{σ̃²(m)},   σ̃²(m) = σ²(m) + σ_0²,

H'_2 = ∑_m p(m) B(m) * G_{σ̃²}.
The inequality
(3.24)
is proved in the same way as (3.9) and (3.14) in cases A and B, only now
By Lemma 3,
(3.25)
where ∑''' runs over those p(m) for which
(3.26)
(3.27)
whence we have
|H' − H'_2| = √2 C_10 2l (log (L/l))^{1/2}.   (3.28)
Formulas (3.24), (3.27) and (3.28) immediately imply (3.23). This completes
the proof of Theorem 2.
Steamboat "Sergei Kirov"
Red Sea-Persian Gulf
13-24 March 1962
References
l' = √2 l (log (L/l))^{1/2} σ_0 ≤ σ_0.
The paper deals with the asymptotic behaviour of efficient statistics of a spectral density as the sample size increases. A certain new statistic introduced by A.N. Kolmogorov is compared with other well-known statistics of a general density, as well as with asymptotically optimal statistics with respect to the mean square deviation. The influence on various statistics of high peaks of the spectral density at neighbouring frequencies is discussed. A new class of statistics is obtained by applying the shift operator to a periodogram calculated from smoothed data.
Let X(t), t = 0, ±1, ..., be a stationary random process with expectation EX(t) = 0, covariance function C(t) and spectral density f(λ), where f(λ), −∞ < λ < ∞, is a function with period 2π. For dimensional reasons it is natural to consider all quadratic forms

(1/(2πN)) ∑_{s,t=1}^{N} b_{s,t}^{(N)} X(s) X(t)   (1)
(2)
(3)
where
B_N(t) = (1/N) ∑_{s=1}^{N−|t|} X(s) X(s + |t|).
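In code, B_N(t) is the usual biased sample covariance (our own transcription of the formula above, with 0-based indexing):

```python
# Sample covariance B_N(t) = (1/N) sum_{s=1}^{N-|t|} X(s) X(s+|t|)
# (our transcription; 0-based indexing, divisor N rather than N-|t|).
def b_n(x, t):
    n, t = len(x), abs(t)
    return sum(x[s] * x[s + t] for s in range(n - t)) / n

x = [1.0, -1.0, 1.0, -1.0]        # an alternating "pure oscillation" sample
print(b_n(x, 0))   # -> 1.0
print(b_n(x, 1))   # -> -0.75
```

The divisor N (not N − |t|) is what makes the resulting covariance sequence non-negative definite, which matters when it is fed into a spectral estimator.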
* Proc. 2nd European Conference on Statistics, Oslo, August 14-18, 1978. First publication.
506 ESTIMATORS OF SPECTRAL FUNCTIONS OF RANDOM PROCESSES
(4)
where
if for any μ and certain given λ, 0 < α ≤ 2, C ≥ 0, C_1 > 0, one of the inequalities

|f(λ + μ) − f(λ)| ≤ C|μ|^α,   0 < α ≤ 1,
and the 4th order spectral density f_4(x_1, x_2, x_3, x_4) of the process X(t) is bounded:
(4a)
N → ∞, such that

∫_{−∞}^{∞} Φ_N(x) Φ_N(x + y + z) dx =
(5)
(6)
case is the requirement of smoothness of the spectral density f(λ) at one single point as compared to the uniform smoothness
Theorem 1. If X(t) ∈ K(λ, α, C, C_1), Φ_N(x) ∈ F, then for the statistic f̂_N(λ) we have, as N → ∞
(7)
where

η(λ) = { 1,  λ ≡ 0 (mod π),
         0,  λ ≢ 0 (mod π).    (8)
(9)
where
|x| ≤ 1,
|x| > 1,
and
A_N = ( a_N C² / (π f²(λ)(1 + α)(1 + 2α)) )^{1/(1+2α)}.   (10)
Proof. Isolating the main term of the mean square deviation V f̂_N(λ) we obtain, in accordance with Lemmas 1 and 2 in [3], the variational problem of finding the minimum of the functional
which implies that one should look for a solution of the form (9). Taking into account the initial conditions we obtain (10) and with it (7) and (8).

Theorem 1 gives an explicit form of the optimal window in the sense of the mean square deviation of the estimator f̂_N(λ) for a given smoothness of the spectral density f(λ) under study at the point λ.
In order to compare asymptotic properties of various periodogram estima-
tors of the form (2) and (4) we use the following
Theorem 2. Let Φ_N(x) ∈ F be defined by (5) and suppose that G(x) satisfies (6) for some α such that 0 < α ≤ 2. Then for X(t) ∈ K, we have

inf_{A_N} sup_{X(t)∈K} V f̂_N(λ) ≈ f²(λ) (C² / f²(λ))^{1/(1+2α)} N^{−2α/(1+2α)} g(α)   (11)

as N → ∞, where
g(α) = ((1 + 2α)/(2α)) (2αV_1)^{2/(1+2α)} (2πV_2)^{2α/(1+2α)},   (12)

V_1 = ∫_{−∞}^{∞} |x|^α G(x) dx,   V_2 = ∫_{−∞}^{∞} G²(x) dx,
(12a)
one of the authors (A.N. Kolmogorov) suggested that these drawbacks could
be eliminated by using a "time window" followed by averaging periodograms
over various time intervals.
Let a_M(t), t = 0, ±1, ..., be a non-negative function vanishing outside [0, M]. From a sample {X(Q), ..., X(Q + M)} we construct the function
We determine the statistic f̄_N(λ) of the spectral density f(λ) of the random process X(t) in the following way:

f̄_N(λ) = (1/T) ∑_{k=0}^{T−1} |W_{Q_k}(λ)|².
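The construction — window the data in time, form a periodogram on each segment, and average over segments — can be sketched as follows (our simplification, not the paper's exact estimator: disjoint segments, a flat hypothetical window a_M(t) ≡ 1, and a standard periodogram normalization; the overlapping layout and the coefficients (13) are not reproduced):

```python
# Time-window averaged periodogram (simplified sketch, assumptions as above).
import cmath
import math

def averaged_periodogram(x, m, lam):
    seg_len = m + 1
    t_count = len(x) // seg_len           # number of full segments
    total = 0.0
    for k in range(t_count):
        seg = x[k * seg_len:(k + 1) * seg_len]
        # windowed finite Fourier transform of the segment at frequency lam
        w = sum(seg[t] * cmath.exp(-1j * lam * t) for t in range(seg_len))
        total += abs(w) ** 2
    return total / (t_count * 2 * math.pi * seg_len)

x = [1.0] * 8                              # a constant "process"
print(averaged_periodogram(x, 3, 0.0))     # all spectral mass at frequency 0
```

Averaging over segments trades frequency resolution for a reduced variance of the estimate, which is the point of the time-window approach.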
where M = K(P-1) and the coefficients CK,P are determined from the relation
Since
it follows from (13) that the "spectral window" of the estimator in question is given by
|φ_{K,P}(x)|² = c(K,P) (K(P² − 1))^{1/2} (sin²(Px/2) / (P² sin²(x/2)))^K,   (13a)
(14)
where Π = [−π, π],
Theorem 3. If X(t) ∈ K, then for the statistic f̄_N(λ) for λ ≢ 0 (mod π) with coefficients defined by (13) we have
L / N^{1/(1+2α)} = O(1),   N^{(1−2α)/(1+2α)} P = O(1),   P / N^{1/(1+2α)} = O(1),
The main term of V f̄_N(λ) gives an expression which, using the method of steepest descent, can be reduced to the following form:
The difference between (18) and the second term of (15) is estimated by

|I_1 − I_2| ≤ (C / L^{1+α}) … exp{−K(P² − 1)/L²} (1 + O(1/K)),   α < 1.
Setting K(P² − 1) = νN^… and finding the minimum of V f̄_N(λ), we obtain the asymptotic equality (16). This proves our theorem.
As has been shown by calculations (see [3], [7]), the mean square deviation appears to be closest to the optimal statistic in the case of Kolmogorov's statistic f̄_N(λ) as compared to the estimators of Tukey, Parzen, Bartlett, Abel and others. As follows from Theorem 3, the statistic f̄_N(λ), by comparison with other statistics, is insensitive to changes in the parameters, which can be chosen in broad ranges. The degree of dependence on high frequencies of this statistic is of order N^{−2K/(1+2α)}, where K can be arbitrarily large as N → ∞.
This enables us to carry out a spectral analysis in the required frequency strip
in the presence of strong noises and non-stationary processes concentrated at
other frequencies, for example, in the presence of a trend. At the same time it
is easy to check stationarity in the frequency strip under study.
We now consider the effect of strong peaks in the spectral density f(x) at a frequency λ + Δ close to λ. Assume that
where f_0(x) is the spectral density of the process X_0(t) ∈ K, and f_δ(x) is defined by an equation of the form

f_δ(x) = δ·δ*(x − λ − Δ) + δ·δ*(x + λ + Δ),   Δ ≢ 0 (mod 2π),   (19)

where δ > 0 is a real number and the function δ*(x) is given by (15a). We say that X(t) ∈ K(λ, Δ, δ) if the spectral density of X(t) is determined by (19), the semi-invariant spectral density of the 4th order is bounded and EX(t) = 0.
The asymptotics of the mean square deviation of the statistic with optimal choice of parameters in the presence of a chosen δ at frequency λ + Δ is described by the following theorem (see [5]-[7]):
Theorem 4. Let X(t) ∈ K(λ, Δ, δ), let the kernels Φ_N(x) ∈ F be determined by (5), let G(x) satisfy (6) for α ≤ 2 and let the parameter A_N of the statistic f̂_N(λ) be chosen in accordance with (12a). Then as N → ∞

sup_{X(t)∈K(λ,Δ,δ)} V f̂_N(λ) = f²(λ) (C² / f²(λ))^{1/(1+2α)} N^{−2α/(1+2α)} ×
    × (1 + η(λ))^{2α/(1+2α)} g(α)(1 + o(1)) +
where

η(λ) = { 1,  λ ≡ 0 (mod π),
         0,  λ ≢ 0 (mod π)     (21)
and the functions φ*_N(x) and g(α) are defined by (4a) and (12a) respectively.
Theorem 5. Let X(t) ∈ K(λ, Δ, δ) and let the statistic f̄_N(λ) be determined by the coefficients a_M(t) found via (13) under the conditions (16) as N = L(T − 1) + M(P − 1) + 1 → ∞. Then

sup_{X(t)∈K(λ,Δ,δ)} V f̄_N(λ) = K(α) f²(λ) (C / f²(λ))^{1/(1+2α)} ×
    × N^{−2α/(1+2α)} (1 + η(λ))^{2α/(1+2α)} (1 + o(1)) + 2δ²(|φ_M(Δ)|² +
where |φ_N(x)|², K(α) and η(λ) are defined by (13a), (16a) and (21) respectively.
The proofs of Theorems 4 and 5 are given in [5]-[7].
Note that the order of the remainder terms in (20) for the statistic f̂_N(λ) is not less than
(23)
if Δ ≥ N^{−c/((1+2α)(1+c))}. For f̄_N(λ) the order of the remainder terms in (22) is
not less than
which can be taken arbitrarily small with respect to (23) for comparatively small K as N → ∞. Under these conditions
According to (23) the statistics f̂_N(λ) have fixed order of dependence on the choice of δ, which cannot be diminished by choosing a suitable spectral window Φ_N(x).
References
When we talk about random events in the everyday sense of the word, we
mean phenomena in which we do not find any regularities that would allow us
to predict their behaviour. Generally speaking, there are no reasons to suggest
that events random in this sense obey any probability laws. Consequently we
should distinguish between randomness in the wider sense (the absence of any
regularity) and stochastic random events (which are the subject of probability
theory).
The problem is to describe the reasons why mathematical probability theory can be applied at all to phenomena of the real world. My first attempt to answer questions of this kind was made in [1] (published in an edition of methodological character).
Since a random event is defined as the absence of regularity, we should
first define the notion of regularity. The natural means for this is given by the
theory of algorithms and recursive functions. The first attempt to apply it to
probability theory was made by Church [2].
The purpose of my report is to familiarize the readers with this field, at
least in a first approximation.
Paying tribute to tradition, I begin with the classical definition of probability as the ratio

P = m/n

of the number of favourable outcomes m to the total number of outcomes n. This definition reduces the problem of calculating the probability to combinatorial problems.
However, this definition cannot be applied in many practical situations.
This brought to life the so-called statistical definition of probability:
p ≈ μ/N,   (1)
516 ON THE LOGICAL FOUNDATIONS OF PROBABILITY THEORY
The first attempt to make definition (1) more exact was made by R. von Mises. But before we describe his approach, let us discuss (from the viewpoint of the classical definition of probability) why stability of frequencies is so often observed in natural phenomena.
Consider all 0-1 sequences of length n containing exactly m ones and as-
sume that all such sequences are equally probable. Suppose that some method
of dividing any sequence of length n into two subsequences has been chosen.
Then for each sequence it is important to compare the frequencies of ones in
both subsequences by calculating the difference
where n_1 and n_2 are the lengths of the subsequences and μ_1 and μ_2 the number of ones in them, so that n_1 + n_2 = n, μ_1 + μ_2 = m. We would like to think that this difference would "almost always be small" in the sense that for any t > 0

P_class{ |μ_1/n_1 − μ_2/n_2| < t } → 1  as  n_1, n_2 → ∞.

1) the limit

lim_{n→∞} (1/n) ∑_{j≤n} x_j = p

exists;
2) this limit is preserved under transition to a subsequence chosen with the help of an admissible rule:

lim_{m→∞} (1/m) ∑_{j≤m} x_{n_j} = p.
Von Mises gave only general characteristics and several examples of admissible rules. His instructions basically mean that the choice of every next
term of the subsequence should not depend on the value of the term but only
on the values of the previous terms. Of course, this definition is not exact, but
no rigorous definition should be expected since the very notion of "rule" did not have a mathematical definition at that time. The situation changed essentially when the notions of algorithm and recursive function appeared, used by Church [2] to clarify von Mises' definition. In [3] I proposed a broader class of admissible selection rules than that given by Church. According to [3], a
selection rule is given by an algorithm (or, if you like, a Turing machine). The
next term of the sequence is chosen as follows: the input information consists of a finite sequence of numbers n_1, n_2, ..., n_k and values x_{n_1}, x_{n_2}, ..., x_{n_k} of the corresponding terms of the original sequence. The output consists, first, of the number n_{k+1} of the next considered element of the sequence (it should not coincide with any of the n_1, n_2, ..., n_k, on whose order no restrictions are imposed), and secondly, indications as to whether the element x_{n_{k+1}} should be chosen only for inspection or is to be included in the chosen subsequence. At the next step the input of the algorithm consists of the longer sequence n_1, n_2, ..., n_{k+1}; naturally, the algorithm starts working from the empty input.
In comparison with [2] the class of admissible selection rules is wider, since the order of terms in the subsequence need not coincide with their order in the initial sequence. Another, even more important, difference is in the strictly finitary character of the entire concept mentioned above and in the quantitative estimate of the stability of the frequencies.
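The finitary selection procedure just described is easy to make concrete. Below is a toy illustration (entirely ours; `after_ones` is a hypothetical rule): the rule sees only the indices and values already inspected, and decides which index to examine next and whether to include it, never using the value being selected.

```python
# Toy admissible selection rule in the sense described above (our illustration).
def select(seq, rule):
    """Apply a selection rule to a 0-1 sequence.
    rule(history) -> (next_index, include?) where history is the list of
    (index, value) pairs already inspected."""
    history, chosen = [], []
    while True:
        idx, include = rule(history)
        if idx >= len(seq) or any(idx == i for i, _ in history):
            break                         # stop at the end; no index revisited
        if include:
            chosen.append(seq[idx])       # decision made before seeing seq[idx]
        history.append((idx, seq[idx]))
    return chosen

# Hypothetical rule: scan left to right, include an element exactly when the
# previously inspected element had value 1.
def after_ones(history):
    nxt = len(history)
    include = bool(history) and history[-1][1] == 1
    return (nxt, include)

seq = [1, 0, 1, 1, 0, 1]
print(select(seq, after_ones))   # -> [0, 1, 0]
```

In a "collective" in von Mises' sense, the subsequence selected by any such rule should exhibit the same frequency of ones as the whole sequence.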
Passing to finite sequences inevitably leads to restrictions on the complex-
ity of the selection algorithm. An exact definition of the complexity of a finite
object and examples of its use in probability theory were proposed in [3], [6].
The results of frequency and complexity approaches are compared in [4].
Now let us return to the original idea according to which "randomness"
consists in the absence of "regularity" and see how the notion of complexity of
a finite object enables us to make this idea precise. The notion of complexity
is dealt with in a number of papers that may be divided into two groups: on
the complexity of calculations and on the complexity of definitions. We will
deal with the second group.
We take the definition of complexity from [6]. We define the conditional complexity of a constructible object X with respect to a certain algorithm A under the condition that the constructible object Y is known. More precisely,
Here l(p) is the length of the sequence of zeros and ones encoding the programme. There exists an "optimal" algorithm A, that is, one such that for any algorithm A_1 there exists a constant C such that for all X and Y
where |M| is the number of elements of M. The objects from M that are Δ-random for relatively small Δ's will be called random in M. We obtain a definition of a random finite object which can be regarded as definitive.

Taking as M the set D_n of all 0-1 sequences of length n we come to the condition
It can be proved that a sequence satisfying this condition for sufficiently small Δ possesses the property of stability of frequencies when passing to subsequences. Hence, von Mises' requirements on random sequences can be considered as a particular case of our requirements.
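The condition referred to above was lost in this edition's typesetting; in standard modern notation (our rendering, not a facsimile of the original print) it reads:

```latex
% Delta-randomness (modern rendering): an element x of a finite set M is
% called \Delta-random if
K_A(x \mid M) \ge \log_2 |M| - \Delta ;
% for M = D_n, the set of all 0-1 sequences of length n, \log_2 |M| = n, so
K_A(x \mid n) \ge n - \Delta .
```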
Further results in this direction may be found in [5], [7]-[12].
References
II. Axiomatics and logical foundations of probability theory. Paper No. 7.
Paper No. 7, published in 1929, was the first result of my reflections on
the logical structure of probability theory. Here probability theory is presented
as one of the fields of application of general measure theory. But the concept
developed had not yet revealed the set-theoretic implications of conditional probability, a notion that is fundamental in probability theory. Only after this
difficulty was overcome and the theory of distributions in infinite products was
constructed did it become possible to speak of a set-theoretic justification of
probability theory as a whole which was given in my monograph "Fundamen-
tal notions of probability theory" that came out in 1933 in German and in
1936 in Russian. A modernized presentation of these concepts, developed by
A.N. Shiryaev and myself, is given in the second edition of this monograph
(Nauka, Moscow, 1974).
PAPERS ON PROBABILITY THEORY AND MATHEMATICAL STATISTICS 521
IV. Markov processes. Papers Nos. 9, 10, 13, 14, 17, 19, 24, 39.

In 1929 I focused attention on the theory of Markov processes with continuous time. In No. 9 (one-dimensional case) and in No. 17 (multidimensional case) this theory was developed in classical terms without explicitly using trajectory spaces. The first versions of the modern concept of a Markov process with continuous time were developed by J. Doob and E.B. Dynkin (see the comments by A.D. Ventsel). Papers Nos. 14, 19, 24 (see the comments by A.M. Yaglom) deal with various applications of the diffusion-type theory of Markov processes. Paper No. 39 (see the comments by A.A. Yushkevich) discusses the case of a discrete set of states.
VI. Stationary processes. Papers Nos. 27, 28, 47, 48, 50, 52.

I became interested in the spectral theory of stationary random processes after the appearance of the work of A.Ya. Khinchin and E.E. Slutskii. A first idea on the range of problems involved can be obtained from Paper No. 34. The hypothesis on representing oscillatory processes by Stieltjes integrals (see (4) in
522 COMMENTS
No. 34) was completely confirmed during the following 40 years. Apparently,
even in teaching (in particular, to engineers) it should be more explicitly stated
that the spectral decomposition of a process does not allow, in general, a more
concrete explanation.
VIII. Various applications. Papers Nos. 14, 18, 22, 26, 29, 37.
Paper No. 14 solves a problem posed by S.I. Vavilov. In this paper §§1 and 2 are written by me, and §3 by M.A. Leontovich. As for paper No. 18, the considered "blow-up" of the empirical correlation coefficients under a small number of observations is quite typical in many applied works (see also the comments by A.M. Yaglom).
IX. Mathematical statistics. Papers Nos. 11, 15, 30, 31, 38, 50.
See the comments by E.V. Khmaladze, E.V. Malyutov and A.N. Shiryaev.
The title of this article, which in modern language would rather read "Analytical methods in the theory of Markov processes", may serve as the title for an entire branch of this theory. The essence of this branch is that Markov processes come into consideration only for the purposes of translating the problem in question into the language of transition probabilities P(s, x, t, E) of a Markov process or other related analytical objects; after that the problem is solved as
a purely analytical one. In paper No. 9 of this volume, random processes as
ensembles of realizations (trajectories) or as objects described by a system of
finite-dimensional distributions are not considered explicitly but only as mo-
tivations of certain definitions and assumptions. Thus, having introduced the
ANALYTICAL METHODS IN PROBABILITY THEORY 523
in certain works.
The central problem in the analytical trend in the theory of Markov pro-
cesses with continuous time is to obtain differential characteristics of processes
and the mechanisms, realizing the connection between them and the processes
themselves. Let us review the development of methods of answering this ques-
tion and some achievements in the analytical trend of the theory of Markov
processes.
In No. 9 differential characteristics are introduced by formulas (50) for processes with a discrete state space and by formulas (114), (124), (115), (122) for continuous (diffusion) processes on the real line. The existence of the limits (50), (122), (124) (which are in fact differential characteristics for s = t) is established under the assumption of differentiability of the transition probabilities p_{ij}(s, t) (or the transition probability densities f(s, x, t, y)) at s < t (additionally, provided that the determinant (119) does not vanish). In both cases differential equations are derived: inverse ones, by differentiating with respect to the first time argument s (formulas (57), (125)), and direct ones, by differentiating with respect to the second argument t (formulas (52), (133)). The existence and uniqueness problem is raised for a solution satisfying natural conditions at s = t and the conditions (1), (3) (particular cases are (40), (41) and (85), (86)) required for the existence of the corresponding process. In the continuous case it is solved only when the equation reduces to the classical heat equation. In §19 differential characteristics for pure jump Markov processes are given and a direct differential equation is suggested (175) (as well as (176) for a process with diffusion between the jumps).
The problem of finding transition probability densities for a diffusion pro-
cess in its analytical formulation is the problem of finding the fundamental
solution of a parabolic differential equation. Existence and uniqueness theo-
rems were obtained under general conditions already in the 1930's (Feller [5]);
they were also obtained for pure jump processes and processes with diffusion
between jumps.
The differential characteristics of Markov processes in No. 9 are introduced
separately for different particular classes of processes; these characteristics are
functions of the corresponding arguments. The next step was to consider a
linear (sometimes unbounded) operator, instead of a set of functions, as a
differential characteristic of a Markov process. This step was taken when the
act on the functions

T_t f(x) = ∫ P_0(t, x, dy) f(y),
operator coincides with the infinitesimal one), but rather a new aspect of the
connection between this operator and a Markov process.
New methods made it possible to solve the problem of giving a full de-
scription of all one-dimensional diffusion processes homogeneous in time (that
is, strong Markov processes with continuous trajectories). At the same time
a fruitful setting of new problems was found, in particular, on the behaviour
of a Markov process given inside a domain, after going out onto its boundary
(analytically it reduces to finding boundary values that restrict the given linear
operator to an infinitesimal operator of the semigroup of contracting operators
preserving positivity).
A further development of analytical methods, following the work of Hunt
[7], was based on considering, on the one hand, general non-negative additive
functionals of Markov processes, and on the other hand, from the analytical
viewpoint, excessive functions (non-negative functions superharmonic with re-
spect to a given semigroup of operators). By considering the extremal points of
the set of these functions, ways were found of constructing an ideal boundary of
the domain corresponding to a given Markov process (the Martin boundary).
A new approach to the question of obtaining differential characteristics of
a Markov process was started by Fukusima (see [8]). Instead of an infinites-
imal operator, he considers the corresponding bilinear Dirichlet form as such
a characteristic. However, a complete and closed theory is only obtained for
the case when the semigroup consists of operators symmetrie with respect to
a certain measure (in terms of probability theory, for Markov processes invert-
ible in time). Important results were obtained dealing with possible extensions
concerning the exit of a Markov process given inside a domain (in particular,
Brownian motion with "reflection in the normal" at an arbitrary, non-smooth,
boundary has been considered).
In the 1960-1970's, especially after the works by Stroock and Varadhan [9],
the approach to the connection between linear operators and Markov processes
based on the notion of martingale and the "martingale problem" became quite popular. Instead of the strong Markov property, it uses the preservation of the martingale property for Markov random moments (the strong Markov property appears to be an automatic consequence of the uniqueness of the solution to the
martingale problem). In papers by Krylov [10] and Stroock and Varadhan [9]
the existence and uniqueness problem for a diffusion process corresponding to
given diffusion and transition coefficients posed in No. 9 was solved practically
without any limitations.
Perhaps the analytical trend in the theory of Markov processes will now be
developed on a new, higher level based on considering, as main analytical ob-
jects, distributions in function spaces, rather than transition probabilities and
related operators. This was started in papers by Stroock and Varadhan, and
others. Analytical methods give especially wide possibilities for establishing
various limit theorems for random processes.
References
The paper is divided into two parts (the first ends at formula (7)) because of a formal restriction on the length of a paper in a volume of the journal Atti della Reale Accademia ....
A brief history preceding this paper is as follows. In 1929-1930 several papers of the well-known Italian mathematician B. de Finetti came out, in which he initiated the study of properties of random processes X(λ) that were later called homogeneous processes with independent increments. It turns out that

ψ(t, λ) = E exp{itX(λ)} = [ψ(t, 1)]^λ,
where γ, σ, c are real constants and F(x) is a distribution function.
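For a concrete check of the identity ψ(t, λ) = [ψ(t, 1)]^λ one can take X(λ) to be Poisson with mean λ, the simplest homogeneous process with independent increments (our numerical illustration, not from the paper):

```python
# Numerical check of E exp{itX(lam)} = [E exp{itX(1)}]^lam for X(lam) ~ Poisson(lam),
# i.e. exp{lam (e^{it} - 1)} = [exp{e^{it} - 1}]^lam (our illustration).
import cmath
from math import exp, factorial

def poisson_cf(t, a, terms=80):
    """Characteristic function of Poisson(a), summed term by term."""
    return sum(exp(-a) * a ** m / factorial(m) * cmath.exp(1j * t * m)
               for m in range(terms))

t, a = 0.7, 2.5
lhs = poisson_cf(t, a)
rhs = poisson_cf(t, 1.0) ** a      # the de Finetti identity, numerically
assert abs(lhs - rhs) < 1e-9
```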
HOMOGENEOUS RANDOM PROCESSES 529
References
1. P. Levy, 'Sur les integrales dont les elements sont des variables aleatoires
This paper by A.N. Kolmogorov, together with those by Doob and Levy quoted
below, laid the foundations of the theory of Markov processes dealing with
homogeneous processes with a countable number of states. The peculiar effects
taking place in these processes made them a separate branch of the general
theory. At the same time, studies of the countable case helped to work out
important concepts having a wider scope of application, such as the strong
Markov property and boundary conditions for Markov processes.
The densities a_α^β of transition from α to β and the densities a_α = −a_α^α of leaving α, as well as the direct and inverse systems of differential equations P'(t) = P(t)A and P'(t) = AP(t) for the transition probabilities p_{αβ}(t) of a homogeneous Markov process with a finite or countable set of states E were introduced by Kolmogorov in his fundamental work [1] (No. 9 of the present volume), in which he assumed differentiability of the p_{αβ}(t) and, when E is infinite, certain conditions of uniform convergence (A and P(t) denote the matrices with entries a_α^β and p_{αβ}(t), t ≥ 0). If E is finite, it is easy to show that a_α = ∑_{β≠α} a_α^β, and for the initial value P(0) = I (I is the identity matrix) each of the systems of Kolmogorov differential equations has the unique solution P(t) = e^{At}. After Doeblin [2] established that for finite E differentiability of the transition probabilities follows from stochastic continuity of the process, that is, from the condition P(+0) = I, the case of a finite set of states was essentially exhausted.
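For finite E this is easy to see numerically (our sketch; the two-state generator below is a made-up example): with A having non-negative off-diagonal entries and zero row sums, P(t) = e^{At} is a stochastic matrix solving both Kolmogorov systems.

```python
# P(t) = e^{At} for a finite-state generator A (our illustration; the
# two-state rates 1 and 2 are arbitrary).  Rows of A sum to 0, so rows of
# P(t) sum to 1, and P(s+t) = P(s)P(t) (Chapman-Kolmogorov).
def mat_mul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(a, t, terms=60):
    """e^{At} by Taylor series (adequate for small matrices and times)."""
    n = len(a)
    at = [[a[i][j] * t for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = mat_mul(term, at)
        term = [[v / k for v in row] for row in term]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

A = [[-1.0, 1.0], [2.0, -2.0]]            # two-state chain, rates 1 and 2
P = mat_exp(A, 0.5)
assert all(abs(sum(row) - 1.0) < 1e-9 for row in P)   # rows are probabilities
```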
Processes with a countable state space proved to be more difficult. The
analysis of their infinitesimal characteristics, smoothness of transition func-
tions, behaviour of trajectories, possible "pathologies", etc., accounted for a
whole trend notable in its time.
HOMOGENEOUS MARKOV PROCESSES 531
In the papers [3], [4], which came out earlier and were related to his notion of separable process, Doob studied the transition matrix P(t) based on an examination of the jumps in the trajectories. Assuming that P(+0) = I (this condition is considered to hold throughout) Doob took the following steps: 1) he proved the existence of finite or infinite derivatives a_α^β of the functions p_{αβ}(t) at the origin; 2) he showed that the inverse system holds when there exists a first jump after 0, that is, when a_α = ∑_{β≠α} a_α^β < ∞ (we then say that the matrix A is conservative) and found a similar connection between the direct system and the last jump before t; and 3) by continuing the process in different ways after accumulating the jumps, he obtained examples of non-uniqueness of a process with a given matrix A.
In No. 39 of [5], by making use of remarkably simple and concise purely algebraic or analytical techniques (in terms of which probabilistic considerations can be reflected), Kolmogorov 1) proved the existence of finite densities a_α^β for β ≠ α; 2) proved the existence of the densities a_α ≤ ∞; 3) constructed an example of a process where a_α = ∞; and 4) constructed an example of a process where ∑_{β≠α} a_α^β < a_α < ∞ (we will denote these examples by K1 and K2). Further, Kolmogorov conjectured [5] that the p_{αβ}(t) are always differentiable at t > 0.
In a paper that came out at the same time, Levy [6], who was interested primarily in the asymptotics of p_{αβ}(t) as t → ∞, classified the processes considered from the viewpoint of the behaviour of trajectories, treating them on a less formal level than Doob. In particular, Levy established the effects observed in the examples K1 and K2. Levy suggested calling the states α with a_α < ∞ stable, and those with a_α = ∞ instantaneous ones. The theorem [6] stating that not all states are instantaneous turned out to be erroneous. Levy continued his analysis in [7], [8].
Further studies of countable homogeneous Markov processes were considerably
influenced by Kolmogorov's works and ideas. At Moscow University this
subject was discussed at the seminars of Dynkin and Kolmogorov. Kolmogorov
suggested the study of the differentiability of p_α^β(t) as the subject of diploma
work to Yushkevich, who managed, with the help of trajectory analysis, to prove
the continuous differentiability of p_α^β(t) under the assumption that at least one
of the states α or β is stable, and to construct an example of a process with
infinite second derivative of p_α^β(t) at certain t > 0 (the idea of the example was
532 COMMENTS
suggested by A.N. Kolmogorov). This paper [9] was published six years later.
Yushkevich also noticed the gap in the proof of the above-mentioned theorem
of Lévy on instantaneous states, which was connected with the absence of a
definite value of the trajectory at the moment of a discontinuity of the second
kind (see [28]).
These questions were discussed at the Mathematical Congress in Amsterdam,
where Kendall and Reuter [10] gave a detailed analysis of examples
K1 and K2 from the viewpoint of semigroups. Kolmogorov was present at the
congress. From conversations, some participants got the wrong impression that
in [9] an example is constructed that refutes the hypothesis of differentiability
of p_α^β(t) [12]. Soon Austin [11], [12] published a purely analytical proof of
the continuous differentiability of p_α^β(t) for t > 0 in the case of a stable state
α. Then Chung [13] proved the same for t ≥ 0 using a probabilistic method
close to [9]. Later Austin generalized his proof to the case of a stable state β
[14]. Improvements of this proof were proposed by Reuter [15], who further
developed the semigroup approach, as well as by Jurkat [16] and Chung [17],
[18]. Finally, a complete purely analytical proof of Kolmogorov's conjecture was
found by Ornstein [19]. Smith [20] gave a negative answer to Chung's question
of whether or not the derivative of p_α^α(t) tends, as t → 0, to its value −∞ at
t = 0 for an instantaneous state α. On the other hand, Orey [21] showed that
the total probability p_α(t) of the state α at time t need not be differentiable for
a suitable initial distribution and time t > 0. Subtle questions on the structure
of the functions p_α^β(t) as t → 0 were also dealt with by Kendall [22], Blackwell
and Freedman [23] and Reuter [24]. Studies on the class of possible functions
f(t) = p_α^α(t) were one of the factors that stimulated Kingman to develop the
theory of regenerative events [25].
Soon after No. 39 came out, Kendall partially extended Kolmogorov's
results on the existence of the densities a_α and a_α^β to Markov jump processes with
an arbitrary set of states [26] and, by combining the ideas of examples K1 and K2,
constructed an example of a process with a_1 = ∞ and a_1^β = 0 for β ≠ 1 [27]. A
positive answer to the intriguing question of whether there exists a process whose
states are all instantaneous was given independently by Dobrushin [28] and
Feller and McKean [29]; other examples were soon suggested by Kendall [30]
and Blackwell [31]. Later Reuter returned to the study of K1 and showed that
in this example the infinitesimal matrix A uniquely determines the transition
HOMOGENEOUS MARKOV PROCESSES 533
of transition during a finite number of jumps form the minimal solution for
this system. The problem arises of constructing possible continuations of the
minimal process beyond an accumulation point of the jumps, preserving the
Markov property and the infinitesimal matrix A, so that the process does not
terminate. After Feller [44], the preliminary question as to when accumulation
of jumps (in other words, escape to infinity) can take place over a finite
time was studied by Dobrushin for the countable case [45]. The further construction
requires constructing exit boundaries and entrance boundaries (if any)
and "gluing" the exits to the entrances, or to the initial states, by means
of boundary conditions. Without mentioning the vast range of corresponding
research for diffusion-type processes and their generalizations, we merely note
that for a countable state space such a programme was sketched by Dynkin
in his talk at the All-Union Meeting on Probability Theory in Leningrad in
1955. Apparently, it could only be realized for the case of a finite number of
exits. Without attempting to give a complete review, we mention the relevant
papers by Dobrushin [46], Feller [47], Reuter [15], [48], Neveu [49], Chung [50],
Williams [51], [52] and Dynkin [53].
References
1. A.N. Kolmogoroff, 'Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung', Math. Ann. 104 (1931), 415-457.
2. W. Doeblin, 'Sur l'équation matricielle A(t+s) = A(t)A(s) et ses applications aux probabilités en chaîne', Bull. Sci. Math. 62 (1938), 21-32; 64 (1940), 35-37.
3. J.L. Doob, 'Topics in the theory of Markoff chains', Trans. Amer. Math. Soc. 52 (1942), 37-64.
4. J.L. Doob, 'Markoff chains - denumerable case', Trans. Amer. Math. Soc. 58 (1945), 455-473.
5. A.N. Kolmogorov, 'On differentiability of transition probabilities in time-homogeneous Markov processes with a countable number of states', Uch. Zap. Moskov. Gos. Univ. Mat. 148:4 (1951), 53-59 (in Russian).
6. P. Lévy, 'Systèmes markoviens et stationnaires. Cas dénombrable', Ann. Sci. École Norm. Supér. 68 (1951), 327-381.
7. P. Lévy, 'Processus markoviens et stationnaires du cinquième type (infinité dénombrable des états possibles, paramètre continu)', C. R. Acad. Sci. Paris 236 (1953), 1630-1632.
39. E.B. Dynkin and A.A. Yushkevich, 'Strict Markov processes', Teor. Veroyatnost. i Primenen. 1 (1956), 149-155 (in Russian). (Translated as Theory Probab. Appl.)
40. G.A. Hunt, 'Some theorems concerning Brownian motion', Trans. Amer. Math. Soc. 81 (1956), 294-319.
41. D. Ray, 'Stationary Markov processes with continuous paths', Trans. Amer. Math. Soc. 82 (1956), 452-493.
42. R.M. Blumenthal, 'An extended Markov property', Trans. Amer. Math. Soc. 85 (1957), 52-72.
43. J.L. Doob, 'Compactification of the discrete state space of a Markov process', Z. Wahrscheinlichkeitstheorie 10 (1968), 236-251.
44. W. Feller, 'On the integro-differential equations of purely discontinuous Markoff processes', Trans. Amer. Math. Soc. 48 (1940), 488-515; Errata - Ibid. 58 (1945), 474.
45. R.L. Dobrushin, 'On the regularity conditions for time-homogeneous Markov processes with a countable number of possible states', Uspekhi Mat. Nauk 7:6 (1952), 185-191 (in Russian).
46. R.L. Dobrushin, 'Some classes of homogeneous countable Markov processes', Teor. Veroyatnost. i Primenen. 2 (1957), 377-380 (in Russian). (Translated as Theory Probab. Appl.)
47. W. Feller, 'On boundaries and lateral conditions for the Kolmogoroff differential equations', Ann. of Math. 65 (1957), 527-570.
48. G.E.H. Reuter, 'Denumerable Markov processes. II, III', J. London Math. Soc. 34 (1959), 81-91.
49. J. Neveu, 'Sur les états d'entrée et les états fictifs d'un processus de Markoff', Ann. Inst. H. Poincaré 17 (1962), 324-337.
50. K.L. Chung, 'On the boundary theory for Markov chains', Acta Math. 110 (1963), 19-77; 115 (1966), 111-163.
51. D. Williams, 'The process extended to the boundary', Z. Wahrscheinlichkeitstheorie 2 (1962), 332-339.
52. D. Williams, 'On the construction problem for Markov chains', Z. Wahrscheinlichkeitstheorie 3 (1964), 227-246.
53. E.B. Dynkin, 'General boundary conditions for Markov processes with a countable set of states', Teor. Veroyatnost. i Primenen. 12 (1967), 222-257 (in Russian). (Translated as Theory Probab. Appl.)
BRANCHING PROCESSES (Nos. 25, 32, 33, 46)
(B.A. Sevast'yanov)
The general notion of, and the very term, "branching random process", which
immediately became commonly accepted, were first explicitly coined by A.N.
Kolmogorov at his seminar at Moscow University in 1946-1947. Various problems
relating to simple models of branching processes had been considered previously.
In particular, one of these problems was solved by A.N. Kolmogorov in
his earlier work, No. 25. However, it was the publication of papers Nos. 32, 33
that triggered an intensive development of the theory of branching processes.
By now there are several monographs that deal with branching processes (books
by Harris [1], Sevast'yanov [2], Athreya and Ney [3], and others). The model
of branching processes described in papers Nos. 32, 33 is that of Markov branching
processes with several types of particles and with continuous or discrete time.
This model has proved to be very efficient, both in terms of the number of results
obtained and in its possible applications to biology, chemistry, physics and
technology. A special case of this model is a branching process with immigration,
when, in addition to the multiplying particles, immigrating particles
also appear (see, for example, [2]).
Later on, more complex models of branching processes were developed
which take into account the following: dependence of multiplication
on the age of particles (the Bellman-Harris process), a particle's position in a
certain space, a particle's dependence on energy, random environment, etc.
The asymptotic formulas for the probability of continuation of the process obtained
in No. 25, as well as the related limit theorems proved by Yaglom [4], were
later generalized by other authors to other models, including non-Markov ones.
Papers [5]-[7] deal with the convergence of branching processes to diffusion processes,
which is discussed in the review paper No. 46.
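The extinction problem underlying the asymptotic formulas of No. 25 can be illustrated with the simplest (Galton-Watson) model: the extinction probability is the smallest non-negative fixed point of the offspring generating function, and iterating the function from 0 converges to it. A hedged sketch with an illustrative offspring law of our own choosing:

```python
# Extinction probability of a Galton-Watson branching process as the smallest
# non-negative fixed point of the offspring generating function
#   f(s) = sum_k p_k s^k.
# Illustrative offspring law (our choice, not from the commentary):
# p_0 = 1/4, p_2 = 3/4, so f(s) = 1/4 + (3/4) s^2 and q solves q = f(q).
def extinction_probability(pgf, iterations=200):
    q = 0.0
    for _ in range(iterations):
        q = pgf(q)  # iterating from 0 converges monotonically to the smallest fixed point
    return q

f = lambda s: 0.25 + 0.75 * s * s
q = extinction_probability(f)
assert abs(q - 1.0 / 3.0) < 1e-9  # the smaller root of 0.75 q^2 - q + 0.25 = 0
```

For a supercritical law such as this one (mean offspring number 3/2 > 1) the extinction probability is strictly less than 1, in line with the dichotomy behind the asymptotics discussed above.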
References
STATIONARY SEQUENCES 539
Paper No. 27 has by now become classical; it opened the main road in the
theory of stationary (and related) random processes, giving a proper setting
for prediction problems and solving them for stationary processes with discrete
time (stationary sequences). This work is remarkable because of its deep
connections with various questions of approximation theory, the spectral theory of
operators in Hilbert space and the theory of analytic functions. The two central
notions in this theory are the regularity of a random process and the subordination
of one process to another.
A stationary sequence can be regarded as a sequence
criterion: F(dλ) is absolutely continuous and the density f(λ) = F(dλ)/dλ
satisfies the condition

∫ log f(λ) dλ > −∞.  (*)
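The role of this logarithmic condition can be made concrete through the Kolmogorov-Szegő formula for the one-step prediction error, σ² = 2π exp((1/2π)∫_{−π}^{π} log f(λ) dλ): the error is positive (the sequence is non-deterministic) exactly when the log-integral is finite. A numerical sketch (the MA(1)-type density and all parameters are our illustrative choices, not from the commentary):

```python
import math

# Kolmogorov-Szego formula: for a stationary sequence with spectral density
# f(lambda) on [-pi, pi], the one-step prediction error is
#   sigma^2 = 2*pi * exp( (1/(2*pi)) * integral of log f(lambda) ),
# positive exactly when the logarithmic integral is > -infinity.
# Illustrative MA(1)-type density with unit innovation variance (our choice):
#   f(lambda) = |1 - theta*e^{i*lambda}|^2 / (2*pi), so sigma^2 should equal 1.
theta = 0.5

def f(lam):
    # |1 - theta*e^{i*lam}|^2 = 1 - 2*theta*cos(lam) + theta^2
    return (1.0 - 2.0 * theta * math.cos(lam) + theta * theta) / (2.0 * math.pi)

n = 20000
h = 2.0 * math.pi / n
log_integral = sum(math.log(f(-math.pi + (k + 0.5) * h)) for k in range(n)) * h
sigma2 = 2.0 * math.pi * math.exp(log_integral / (2.0 * math.pi))
assert abs(sigma2 - 1.0) < 1e-6  # the criterion holds and the error is 1
```

A density vanishing on a whole interval would make the log-integral diverge to −∞ and the prediction error vanish, i.e. the sequence would be deterministic.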
y(t) = Ax(t)

with the operator

A = ∫ ψ(λ) E(dλ)

in L² with F(dλ) = dλ the answer is the following: H_y(0) = H_x(0) holds if
and only if

∫_{−∞}^{∞} log f(λ)/(1 + λ²) dλ > −∞,

where the so-called inner function ψ ∈ H² gives a fundamental sequence for y.
Later Krein found the connection between the prediction problem and the
inverse spectral problem for the string equation. A direct generalization of the
regularity criterion to the non-degenerate multidimensional case was given in
1941 by Zasukhin. However, a generalized version of (*)
(where c > 0 is a constant and I the identity operator) does not guarantee the
required regularity of f(λ). A general criterion for regularity was proposed by
Rozanov.
References
The authors solve the problem for a Gaussian stationary process X(t) under
the strong mixing (SM) condition introduced by Rosenblatt [1]. It is proved
that in this case the SM function α(τ) is equivalent to the maximal correlation
coefficient ρ(τ) (α(τ) ≤ ρ(τ) ≤ 2πα(τ)), and the representation (4) for ρ(τ) is
STATIONARY PROCESSES 543
found in terms of the spectral density f(λ) of the process, which exists in view
of the SM condition. Thus, the problem is solved completely. For example, in
the case of integer time the process X_t possesses the SM property if and only
if
where T is the transition operator of the random walk. A.N. Kolmogorov also
posed the problem of finding an effective criterion for complete regularity of
a Gaussian stationary process; it was solved by Volkonskiĭ and Rozanov [7].
A student of A.N. Kolmogorov, Leonov [8], studied various SM conditions of
random processes using higher semi-invariants.
References
6. I.H. More and L.B. Page, 'The class W of operator valued weight functions', J. Math. Mech. 19 (1970), 1011-1017.
7. V.N. Volkonskii and Yu.A. Rozanov, 'Some limit theorems for random functions. I, II', Teor. Veroyatnost. i Primenen. 4:2 (1959), 186-207; 6:2 (1961), 202-214 (in Russian). (Translated as Theory Probab. Appl.)
8. V.P. Leonov, Some applications of higher semi-invariants to the theory of stationary random processes, Nauka, Moscow, 1964 (in Russian).
In this work the estimators λ̂_T and ω̂_T for the parameters λ (damping coefficient)
and ω (frequency) in a complex stationary Gaussian Markov process
are constructed using the maximum likelihood method. It turned out that the
normalized ratio
ω̂_T − ω
References
Paper No. 34 by A.N. Kolmogorov, published in 1947, is the first popular review
of the spectral theory of stationary random processes, one of the most important
sections of the mathematical theory of random functions, which had been
developed shortly before (with an active contribution by Kolmogorov himself)
but at that time was hardly known to anyone outside the narrow circle of experts.
The first similar review of this theory outside the Soviet Union appeared two
years later (see [1]). The editions in Russian dealing with this theory include
a scientific monograph [2], the rather elementary book [3] and large sections in
many textbooks (see, for example, [4]-[7]).
Kolmogorov starts his review by giving the classical results of Khinchin [1],
who was the first to define a stationary random process ξ(t) and proved that
its correlation function B(τ) = E{ξ(t + τ)ξ(t)} can always be represented as
a Fourier-Stieltjes integral (see formula (6), which relates to the somewhat
more general case of a multidimensional process ξ(t) = {ξ₁(t), ξ₂(t), …, ξ_s(t)}).
However, he paid main attention to the question of substantiating the possibility
of representing the stationary process ξ(t) itself as a Fourier-Stieltjes integral of the
form (4) and to the physical meaning of such a representation. He remarks
that the spectral representation (4) can easily be derived from the well-known
theorem in functional analysis on the spectral decomposition of one-parameter
groups of unitary operators in Hilbert space, and that the first deduction of this
type was given in [6]-[7] in 1940. However, for simplicity he does not stress
the point that these works solved a more general (and more complex) problem
on the spectral representation not only of stationary processes, but of the
wider class of processes with stationary increments.
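Khinchin's representation can be checked numerically on a concrete example: for an AR(1) sequence both the correlation function and the spectral density are known in closed form, and the former is recovered as the Fourier transform of the latter. A sketch (the AR(1) model and its parameters are our illustrative choices, not from the review):

```python
import math

# Khinchin's theorem: the correlation function B(tau) of a stationary sequence
# is the Fourier(-Stieltjes) transform of its spectral measure. Illustrative
# check on an AR(1) sequence X_t = r*X_{t-1} + eps_t with unit innovation
# variance, for which
#   B(tau) = r^|tau| / (1 - r^2),
#   f(lambda) = 1 / (2*pi * |1 - r*e^{i*lambda}|^2).
r = 0.5

def f(lam):
    return 1.0 / (2.0 * math.pi * (1.0 - 2.0 * r * math.cos(lam) + r * r))

def B_from_spectrum(tau, n=4096):
    # B(tau) = integral over [-pi, pi] of e^{i*lambda*tau} f(lambda) dlambda;
    # f is even, so only the cosine part contributes (midpoint rule).
    h = 2.0 * math.pi / n
    return sum(math.cos((-math.pi + (k + 0.5) * h) * tau) *
               f(-math.pi + (k + 0.5) * h) for k in range(n)) * h

for tau in range(4):
    assert abs(B_from_spectrum(tau) - r**tau / (1.0 - r * r)) < 1e-9
```

Because the integrand is smooth and periodic, the midpoint rule converges extremely fast here; the check recovers B(τ) essentially to machine precision.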
A random process ξ(t) is called a process with stationary increments (in
the wide sense) if for any t, τ, s, τ₁, τ₂ the expectations
stationary increments need not be stationary.) The results of the paper [6]²
imply that any process with stationary increments ξ(t) admits a spectral representation
of the form
where ξ₀, ξ₁ are random variables and the random function Φ(λ) is such that
its increments Φ(Δ′) = Φ(λ′₂) − Φ(λ′₁) and Φ(Δ″) = Φ(λ″₂) − Φ(λ″₁) on non-intersecting
intervals Δ′ = [λ′₁, λ′₂] and Δ″ = [λ″₁, λ″₂] are uncorrelated with each
other (that is, E{Φ(Δ′)Φ̄(Δ″)} = 0), while the real monotone non-decreasing
function F(λ) of the argument λ given by E{|Φ(Δ_λ)|²} = F(Δ_λ) satisfies, for
any ε > 0, the condition
(3)
In the particular case of a stationary process ξ(t) (considered, for discrete time,
in Kolmogorov's papers [4], [5]), ξ₁ = 0 and, furthermore, not only
must (3) hold, but also the more restrictive condition
therefore (2) may be rewritten as

ξ(t) = ∫ e^{iλt} dΦ₁(λ)

(where Φ₁(λ) = Φ(λ) + (ξ₀ − ξ′₀)K(λ) and K(λ) is a "jump function" equal to
0 for λ < 0 and to 1 for λ ≥ 0). Applied to stationary processes this brings us
back to (4) and to the results (applied to the multidimensional process ξ(t))
of the paper No. 34 by Kolmogorov.
The spectral theory of random processes with stationary increments developed
in [6]-[7] can be applied to the more general class of random processes
ξ(t) with stationary increments of order n > 1 (see [9], [10], where one can also
² The results of [6] dealing with the geometry of Hilbert space were obtained a little
later also by von Neumann and Schoenberg [8], who noted, however, that the
same results can at the same time be interpreted as certain facts concerning
random processes ξ(t) with stationary increments.
SPECTRAL THEORY OF STATIONARY PROCESSES 547
find precise definitions of this class of random processes). For these processes
the spectral representation is
where ξ₀, ξ₁, …, ξ_n are random variables, and the increments of the random
function Φ(λ) on non-intersecting intervals Δ′ and Δ″ are again uncorrelated
with each other, whereas the function F(λ) determined by E{|Φ(Δ_λ)|²} = F(Δ_λ)
satisfies, for any ε > 0, the condition
(6)
be associated with a generalized process
for any T. For simplicity we suppose that E{ξ(φ)} = m(φ) = 0 for all φ. In this
case a generalized stationary random process ξ(φ) has a spectral representation
of the form
(9)
where φ̃(λ) = ∫_{−∞}^{∞} e^{iλt} φ(t) dt is the Fourier transform of φ(t), and the random
function Φ(λ) has the same properties as the Φ(λ) featuring in the spectral decomposition
(4) of an ordinary random process ξ(t), with the only difference that
now F(λ) is such that E{|Φ(Δ_λ)|²} = F(Δ_λ) satisfies the condition

∫_{−∞}^{∞} dF(λ)/(1 + λ²)^m < ∞   (10)

for some integer m ≥ 0 (for ordinary stationary processes ξ(t) we always have
∫_{−∞}^{∞} dF(λ) < ∞, that is, m = 0). It can easily be seen that in the case
when the process ξ(φ) is ordinary (that is, it is given by (7)), (9) immediately
implies also the more usual spectral representation of the process ξ(t) of the
form (4).
Generalized random processes are in certain respects simpler than ordinary
processes; in particular, whereas an ordinary random process ξ(t) has a derivative
dξ(t)/dt = ξ′(t) only under certain special conditions, a generalized process
ξ(φ) is always differentiable, its derivative ξ′(φ) being ξ′(φ) = −ξ(φ′) (if a process
ξ(φ) is given by (7) and ξ′(t) exists, then clearly ξ′(φ) = ∫_{−∞}^{∞} ξ′(t)φ(t) dt).
Since generalized processes are differentiable, a generalized random process
with stationary increments of given order n can be defined simply as a process
ξ(φ) (non-stationary in general) whose nth derivative ξ⁽ⁿ⁾(φ) is a generalized
stationary process. Using this definition, it is easy to derive from (9) and (10)
the general spectral representation of a generalized random process with stationary
increments of order n, which includes (5) and (6) as particular cases
(relating to processes ξ(φ) of the form (7)).
(12)
are uncorrelated, and F(Δ_k) = E{|Φ(Δ_k)|²} is integrable over the whole space
(that is, ∫_{R^n} F(dk) < ∞). We note that as far back as 1941 Obukhov [14],
[15] used the spectral decomposition (12) of a homogeneous random field ξ(x)
in his important work on the statistical theory of turbulence, referring to the paper
[6] by Kolmogorov. A rigorous proof of this formula and some of its generalizations
(concerning, in particular, fields with homogeneous increments and
generalized homogeneous fields) can now be found, for example, in [16] (see
also [13], §III.5). A number of further examples of "generalized spectral representations"
of various classes of random functions related to the representation
of stationary random processes ξ(t) as Fourier-Stieltjes integrals (4) is given,
in particular, in [17], [18].
A precise formulation of Zasukhin's results [13] mentioned by Kolmogorov
and a rigorous proof of these results can be found in Rozanov's book [2], Chapter
II. For a modern presentation and further development of Slutskii's results
[14] see the papers by Moran [19], [20] and Kendall and Stuart [21], §47.15.
References
1. J.L. Doob, 'Time series and harmonic analysis', In: Proc. First Berkeley Sympos. Math. Statist. Probab., 1949, pp. 303-344.
2. Yu.A. Rosanov, Stationary random processes, Fizmatgiz, Moscow, 1963 (in Russian).
and S^(k) the class of processes ξ(t) ∈ T^(k) such that for all 1 ≤ l ≤ k, −∞ <
τ < ∞,
Then the example constructed in the commented papers shows that although
S^(2) ⊂ Φ^(2), for k > 2 there exist processes of class S^(k) that are not of class
Φ^(k).
where
instead of the moment spectral measure M(λ₁, …, λ_k) (defined by the relation
(see [1], [2], [9])
In particular, the classes Φ^(k) are characterized by the fact that for ξ(t) ∈ Φ^(k)
the semi-invariant measures F^(l), 1 ≤ l ≤ k, are absolutely continuous with
respect to Lebesgue measure on the sets λ₁ + … + λ_l ≡ 0 (mod 2π). The study
of the classes Φ^(k) was then continued in [2]. A generalization of Φ^(k) to vector
random processes is obtained in [3]. Similar questions for random fields were
considered in [4].
The work by A.N. Kolmogorov in which the classes Φ^(k) were
defined and studied initiated a series of studies in a new branch, called the
theory of higher spectra of stationary random processes and their statistical
analysis. The results of this branch are now of fundamental character and are
used extensively for solving various applied problems in astronomy, geophysics,
studies of liquid and gas turbulence, etc. (see, for example, [5]-[8]).
A fundamental contribution to the further development of higher-order
spectral analysis was made in [9], where the mathematical apparatus of higher
moments and semi-invariants necessary for further research was developed
(see [10], [11], etc.). Higher-order spectral theory for homogeneous random
fields was considered by Yaglom [12]-[14]. Upper estimates for higher spectral
densities and their derivatives under different mixing conditions can be found
in [11] and are essential in statistical analysis.
Numerous works dealt with constructing and studying estimators of higher spectral
densities (for example, [15]-[17], etc.). However, until recently, all the statistics
had the same essential drawback: they did not allow one to construct an estimator
of the higher semi-invariant spectral density for all arguments. Though
the semi-invariant density of nth order is defined at all points λ₁ + … + λ_n ≡
0 (mod 2π), the statistics proposed gave no answer on the subsets λ_{k₁} + …
+ λ_{k_p} ≡ 0 (mod 2π), 1 ≤ k_i ≤ n, i = 1, …, p, for p < n. Naturally,
statistical estimators were unstable in a neighbourhood of these sets. This
drawback has recently been eliminated by using statistics constructed by the
time translation method [18].
SPECTRAL REPRESENTATION OF RANDOM PROCESSES 553
References
15. M.A. Rosenblatt and J.S. Van Ness, 'Estimation of the bispectrum', Ann. Math. Statist. 36 (1965), 1120-1136.
16. D.R. Brillinger and M.A. Rosenblatt, 'Asymptotic theory of estimates of kth order spectra', In: Advanced Seminar on Spectral Analysis of Time Series, Wiley, New York, 1967, pp. 189-232.
17. I.G. Zhurbenko and N.N. Trush, 'On estimators of spectral densities of stationary processes', Litov. Mat. Sb. 19:1 (1979), 65-82 (in Russian).
18. I.G. Zhurbenko, 'Statistical estimators of higher spectra', Teor. Veroyatnost. i Primenen. 30:1 (1985), 66-77 (in Russian). (Translated as Theory Probab. Appl.)
Papers Nos. 14, 19, 24 are closely related to the important papers Nos. 9
and 17, dealing mainly with the general theory of continuous Markov ("stochastically
determined" in Kolmogorov's terms) random processes. Starting from
the well-known works by Einstein and Smoluchowski [1], [2], processes of this
kind have been widely used in physics to describe the Brownian motion of both
individual particles and systems with many degrees of freedom; we recall in
this connection that, as it turned out, the physicists Fokker and Planck, who
studied Brownian motion, came to some of the conclusions given in No. 9 even
earlier. It was therefore natural to expect that the results of No. 17 could also be
directly applied to many problems concerning Brownian motion. Some concrete
examples of these applications are given in Nos. 14, 19, 24.
Paper No. 14, written together with M.A. Leontovich and published in a
physics journal, is more related to physics than the other two. It solves the
following problem in the theory of Brownian motion posed by S.I. Vavilov: to
find the mean E(F) of the area F on the plane (x, y) covered by the projection
of a spherical particle of fixed radius a in the process of Brownian motion over
time t. Based on the partial differential equations for the transition probability
densities of a multidimensional continuous Markov process found in No. 17, the
authors obtain for E(F) a very precise asymptotic formula (applicable when
Dt/a² ≫ 1, where D is the corresponding diffusion coefficient, which, due to
Einstein's results [1], [2] for a spherical particle of radius a immersed in a liquid
BROWNIAN MOTION 555
motion over very small time intervals Δt. For the simplest case of Brownian
motion of a free particle a refined theory taking into account the particle's
inertia was developed in 1930 by Uhlenbeck and Ornstein [8] (see also Doob
[9] and Chandrasekhar [10], Chapter II); in this refined theory the trajectories of
particles are differentiable (but do not have a second derivative, so that a
particle's acceleration is infinite). In fact, the same generalization of the classical
theory of Brownian motion is discussed in No. 19, which, however, considers
the general case of Brownian motion with n degrees of freedom instead of the
particular case of motion of a free particle. According to Kolmogorov, inertia is
taken into account by considering the state of the system as given by the values
of the n positions q₁, …, q_n and their derivatives with respect to time (velocities)
q̇₁, …, q̇_n. Brownian motion is modelled here by a continuous Markov process
in the 2n-dimensional phase space of positions and velocities. The main equation
for the transition probability density of this Markov process (equation (9)
of No. 19) turns out in this case to be a degenerate parabolic-type equation
(since its right-hand side contains only second derivatives with respect to velocities,
but not with respect to two position coordinates or a position and a
velocity coordinate). Soon after No. 19 was published, Kolmogorov's student
Piskunov began a study of the mathematical theory of such equations [11].
The Uhlenbeck-Ornstein theory of one-dimensional Brownian motion of a free
particle is derived from Kolmogorov's general theory in the particular case
n = 1 (so that the basic equation of the theory acquires the form (10), see
No. 19), where f = −αq̇ (and α = β/m, where m is the particle's mass and β
the viscous friction coefficient, equal to 6πaη for a spherical particle of radius
a), and k = const = k₀T/mβ.
Finally, paper No. 24 discusses the interesting question of the meaning
and conditions of statistical reversibility of Brownian motion. It is well known
that in the thermodynamical sense processes of Brownian motion (or diffusion)
are irreversible: in the presence of a large number of diffusing particles these
processes always result in a levelling out of the distribution of the particles
as t increases, whereas the distribution becomes more and more inhomogeneous
as t decreases. Apparently Schrödinger [12] was the first to draw
attention to the fact that an essential role is played here by the presence of
a well-defined initial condition at t = 0 and the assumption (always made)
that the distribution of particles tends to a homogeneous one (in general, to a
there that the matrix ‖B_ij(z)‖ of the coefficients of the second derivatives in the
equations for the transition probability density is strictly positive definite for all
z. At the end of §1 the author points out that the case of degenerate ‖B_ij(z)‖ is
in fact of considerable physical importance, since it appears, in particular, when
considering Brownian motion in the phase space of positions and velocities of
a physical system (see No. 19). But he immediately adds that the degenerate
case is not considered in this paper. Later Kolmogorov asked a postgraduate
of his, the author of the present comments, the question of the conditions of
statistical reversibility of Brownian motion in the phase space of positions and
velocities. In this case the very definition of statistical reversibility differs from
that given in No. 24, since reversing the motion of a physical system entails
changing the sign of all the velocities (see [13]). However, this difference is
very simple and has little effect. To obtain necessary and sufficient conditions
of statistical reversibility, we must now rewrite the main equations for the
transition probability (equation (9) of No. 19 and the adjoint equation) in
invariant tensor form. (This acquires an especially clear physical meaning when
the coefficients of the metric quadratic form in the position space are taken
to be the coefficients of the quadratic form of the velocities that gives the
system's kinetic energy.) If we now confine ourselves to the most interesting
case, when the coefficients of the second derivatives with respect to the velocities
in the equation for the transition probability densities depend only on the
positions and the forces affecting the system in question depend linearly on
the velocities, we can show (see [13]) that for a Brownian motion in the phase
space to be statistically reversible it is only necessary that the corresponding
stationary probability distribution have the same form as the canonical Gibbs
distribution. Thus, in this case the statistical reversibility condition has a clear
physical meaning. If we then pass to the limit corresponding to letting the
system's inertia tend to zero (that is, let all coefficients of the quadratic form
of the velocities, which gives the kinetic energy, tend to zero), then we again come
to the Einstein-Smoluchowski model of Brownian motion as a Markov process
in the space of positions only; then the conditions for statistical reversibility of
the Brownian motion in the phase space found in [13], by passing to the limit,
again revert to the conditions for statistical reversibility found in No. 24 by
Kolmogorov. The latter circumstance sheds additional light on the physical
meaning of these conditions.
MARKOV CHAINS WITH A COUNTABLE NUMBER OF STATES 559
References
A.N. Kolmogorov's note [1] and its enlarged version [2] (No. 23 of the present
edition) contain the foundations of the theory of countable homogeneous Markov
chains; these works marked the beginning of the systematic study of Markov
chains with a countable and, in general, infinite set of states. To obtain a
For the zero class, when the limits in (1) are 0, it is interesting to compare
the orders of convergence to zero of the transition probabilities for different
states, and also to compare future distributions for various initial states. Doeblin
[9] established that within one class the finite non-zero limit

lim_{n→∞} ( Σ_{m=1}^{n} p_ij^(m) ) / ( Σ_{m=1}^{n} p_kl^(m) )   (2)

always exists (Doeblin's result is actually important for the zero recurrent class,
since for a positive class it follows from (1), while for a non-recurrent class it
follows from the convergence of the series obtained by replacing n by ∞ in (2)).
Chung [10] found that for a non-recurrent class the limit (2) is μ_j μ_l^{-1}, where
μ_i is the average number of visits to i during one excursion from a fixed state
h to h, and Derman [11] showed that under the same conditions μ = {μ_i} is a
σ-finite invariant measure which is unique to within a constant factor. These
and related results are discussed in detail in the monograph by Chung [12].
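Doeblin's ratio limit (2) is easy to observe numerically for a finite positive-recurrent chain, where the limit is the ratio of the corresponding stationary probabilities. A sketch with a toy two-state chain of our own choosing:

```python
# Doeblin's ratio limit (2) for a finite, positive-recurrent chain: the ratio
#   sum_{m<=n} p_ij^(m) / sum_{m<=n} p_kl^(m)
# converges to pi_j / pi_l, the ratio of stationary probabilities.
# Toy two-state illustration (our example, not from the commentary):
P = [[0.9, 0.1],
     [0.2, 0.8]]  # stationary distribution pi = (2/3, 1/3)

def step(row, P):
    # one application of the transition matrix to a distribution row vector
    return [sum(row[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]

row = [1.0, 0.0]  # start in state 0; row = (p_00^(m), p_01^(m)) after m steps
s = [0.0, 0.0]    # partial sums over m = 1..n
for _ in range(5000):
    row = step(row, P)
    s = [s[j] + row[j] for j in range(2)]

ratio = s[0] / s[1]  # should approach pi_0 / pi_1 = 2
assert abs(ratio - 2.0) < 0.01
```

The partial sums grow like n·π_j plus a bounded correction, so the ratio converges at rate O(1/n); with i = k, as in the text, the limit depends only on the target states.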
Orey [13] showed that in a recurrent class under different initial states
future distributions converge to each other in variation:
(3)
Another proof was given by Blackwell and Freedman [14] (in a positive class
(3) follows directly from Kolmogorov's result (4)).
Kingman and Orey [15] considerably strengthened Doeblin's result on the
limit (2) by proving an "individual" limit theorem for ratios: if in a non-periodic
recurrent class

Σ_{n=1}^{N} p_ii^(n) > ε  for all i,   (4)
As was pointed out by Molchanov [16], condition (4) is equivalent to the simpler
condition Pi\M) > 8 for any i. Molchanov generalized (5) to a-recurrent chains,
that is, chains such that for some a ~ 1,
00
",np.(~)
"L.J.... I)
= 00 v., J. ,
w;
n=1
but for each positive $\beta < \alpha$ the corresponding series converges; if in this chain
$p_{ii}^{(N)} > \varepsilon p_{ii}^{(M)}$ for some $N > M$, $\varepsilon > 0$ and all $i$, then the limit in (5) exists
and equals $\alpha^{-m}\mu_j\psi_i\bigl(\sum_l \mu_l\psi_l\bigr)^{-1}$, where $\mu = \alpha\mu P$ is (to within a constant factor)
a unique $\alpha$-harmonic measure, and $\psi = \alpha P\psi$ a unique positive function. The
proof of Molchanov's theorem, which makes use of the Martin boundary, can
be found in [17].
The asymptotics of the transition probabilities $p_{ij}(t)$ of a homogeneous
Markov process with a countable state space and continuous time was studied
by Lévy [18].
For an uncountable phase space $E$, Kolmogorov's subdivision of $E$ into the
set of inessential states, classes and subclasses was made by Doeblin [19] and
improved by Doob [20], Chung [21] and Orey [22]. The well-known condition
of Doeblin assumed in these works guarantees the presence of a stationary
distribution and the analogy with the case of positive classes. A definition of
recurrence for a general space $E$ was proposed by Harris [23]. In chains that
are recurrent in Harris's sense there exists a unique $\sigma$-finite measure, which
is invariant to within a constant factor. For these chains an analogue of the
result on the existence (at $i = k$) of the limit (2), equal to $\mu_j\mu_l^{-1}$, holds (the
Chacon-Ornstein theorem [24], later improved by a number of authors). These
problems are discussed in detail in the monograph by Revuz [25].
Kolmogorov's papers [1], [2] have stimulated not only further studies on
the classification of states and the asymptotics of transition probabilities, but
to a significant extent, also the development of the whole theory of Markov
chains with a countable or arbitrary phase space.
The first of these deals with the recurrent, primarily positive case and con-
sists in extending the known results on sums of independent random variables
or finite chains to additive functionals of the Markov chains considered. This
includes the law of large numbers and related ergodic theorems, the central
limit theorem and its refinements and corresponding asymptotic expansions,
the multidimensional limit theorem for the number of visits to a given set of
states, the law of the iterated logarithm, convergence of normalized increasing
sums to a Wiener process (also called the invariance principle), convergence to
non-Gaussian limit laws, etc. The ergodic theorem for the ratio of two func-
tionals is proved for recurrent chains, while the other results require stronger
assumptions of an ergodic character such as finiteness of the second moments
References
28. R.L. Dobrushin, 'A central limit theorem for non-homogeneous Markov
chains', Teor. Veroyatnost. i Primenen. 1:1 (1956), 72-89 (in Russian).
(Translated as Theory Probab. Appl.)
29. V.A. Statulyavichus, 'Local limit theorems and asymptotic expansions for
non-homogeneous Markov chains', Litov. Mat. Sb. 1:1 (1961), 231-314 (in
Russian).
30. S.Kh. Sirazhdinov and Sh.K. Formanov, Limit theorems for sums of ran-
dom vectors connected into a Markov chain, Fan, Tashkent, 1979 (in Rus-
sian).
31. D. Freedman, Markov chains, Holden-Day, San Francisco, 1971.
32. G.A. Hunt, 'Markoff processes and potentials', Ill. J. Math. 1 (1957), 44-
93; 316-369; 2 (1958), 151-213.
33. E.B. Dynkin, Markov processes, Fizmatgiz, Moscow, 1963 (in Russian).
34. P.L. Hennequin and A. Tortrat, Probability theory and some of its applica-
tions, Mir, Moscow, 1974.
35. W. Feller, 'Boundaries induced by non-negative matrices', Trans. Amer.
Math. Soc. 83 (1956), 19-54.
36. J.L. Doob, 'Discrete potential theory and boundaries', J. Math. Mech. 8
(1959), 433-458.
37. T. Watanabe, 'On the theory of Martin boundaries induced by countable
Markov processes', Mem. Colloq. Sci. Univ. Kyoto (A) 33 (1960), 39-108.
38. R.S. Martin, 'Minimal positive harmonic functions', Trans. Amer. Math.
Soc. 49 (1941), 137-172.
39. G. A. Hunt, 'Markoff chains and Martin boundaries', Ill. J. Math. 4 (1960),
313-340.
40. E.B. Dynkin, 'The boundary theory of Markov processes (Discrete case)',
Uspekhi Mat. Nauk 24:2 (1969), 3-42 (in Russian). (Translated as Russian
Math. Surveys)
41. J.G. Kemeny, J.L. Snell and A.W. Knapp, Denumerable Markov chains,
Princeton Univ. Press, 1966; New York, 1976.
42. A.N. Shiryaev, Probability, Nauka, Moscow, 1980 (in Russian).
WALD IDENTITIES (No. 35)
(A.A. Novikov)
Wald [1], [2] established, for sums of independent identically distributed random
variables $\xi_k$ and the stopping time $\nu$ of his sequential procedure, the identities

$$\mathsf{E}(\xi_1 + \cdots + \xi_\nu) = \mathsf{E}\nu \cdot \mathsf{E}\xi_1, \qquad (1)$$

$$\mathsf{E}(\xi_1 + \cdots + \xi_\nu - \nu\,\mathsf{E}\xi_1)^2 = \mathsf{E}\nu \cdot \mathsf{D}\xi_1. \qquad (2)$$
These relations enabled him to obtain approximate formulas for the average
observation time in the problem of sequential testing of two simple alternative
hypotheses.
In paper No. 35 the author tried to get rid of the condition of identical
distribution of the random variables $\xi_k$, which was quite important in Wald's
method [1], [2]. But much more important was the generalization of (1) and
(2) for arbitrary stopping times $\nu$. Later, results of this type (with arbitrary
stopping times) acquired a fundamental role in various problems of statistical
sequential analysis [3], the theory of controlled random processes [4], boundary
problems for random processes [5], [6], etc.
Interestingly, paper No. 35 did not use the notion of stopping times (or
Markov times, see [3]) for a sequence of random variables, and Theorems 1
and 2 hold for some non-Markov times. However, in the proof of Theorems 3,
4 and 5 a condition close to the Markov property of the time $\nu$ was assumed
implicitly. This was pointed out by Seits and Winkelbauer in [7] (in which also
a refined formulation of Theorems 3, 4 and 5 is given).
When $\nu$ is a Markov stopping time with respect to the family of $\sigma$-algebras
$\mathcal F_n = \sigma(\xi_1, \ldots, \xi_n)$, $\mathcal F_0 = \{\varnothing, \Omega\}$ (that is, the event $\{\nu = n\} \in \mathcal F_n$ for any $n =
1, 2, \ldots$), the results of Kolmogorov's paper have the following generalizations.
(3)

$$\mathsf{E} S_\nu = \mathsf{E}\sum_{k=1}^{\nu}\mathsf{E}(\xi_k \mid \mathcal F_{k-1}), \qquad (4)$$

$$\mathsf{E}\Bigl[S_\nu - \sum_{k=1}^{\nu}\mathsf{E}(\xi_k \mid \mathcal F_{k-1})\Bigr]^2 = \mathsf{E}\sum_{k=1}^{\nu}\mathsf{E}\bigl[(\xi_k - \mathsf{E}(\xi_k \mid \mathcal F_{k-1}))^2 \mid \mathcal F_{k-1}\bigr], \qquad (5)$$

where $S_\nu = \xi_1 + \cdots + \xi_\nu$.
Identity (4) was proved by Burkholder and Gundy [8] under condition (3)
with $\alpha = 2$; its generalizations for the case $1 \le \alpha \le 2$ are given in the book
by Chow and Teicher [9]. Similar results for the case of continuous time were
obtained by Novikov [10], [11]. Identities with moments of $\xi_n$ of all orders are
considered in the work by Hall [12] and in [10], [11].
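For identically distributed summands, identity (1) is easy to check by simulation. The sketch below makes illustrative choices not taken from the papers cited: $\xi_k$ uniform on $(0,1)$ (so $\mathsf{E}\xi_1 = 1/2$) and the first-passage stopping time $\nu = \min\{n : S_n \ge b\}$ with $b = 5$.

```python
import random

# Monte Carlo check of Wald's identity E S_nu = E nu * E xi_1 (identity (1))
# for i.i.d. summands and the stopping time nu = min{n : S_n >= b}.
random.seed(1)

def one_run(b=5.0):
    """Run the partial sums until they first reach b; return (S_nu, nu)."""
    s, n = 0.0, 0
    while s < b:
        s += random.random()
        n += 1
    return s, n

N = 100_000
tot_s = tot_n = 0.0
for _ in range(N):
    s, n = one_run()
    tot_s += s
    tot_n += n

mean_s, mean_n = tot_s / N, tot_n / N
print(mean_s, 0.5 * mean_n)  # the two sides of (1) nearly coincide
```

Note that $\nu$ here depends on the $\xi_k$, so (1) is not a consequence of independence of $\nu$ and the summands; it is exactly the stopping-time structure that makes the identity hold.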
Wald identities (1) and (2) for the case of identically distributed summands
are trivial consequences of (4) and (5). However, as follows from (3), the first
of these relations holds only if
References
6. A.A. Novikov, 'A martingale approach to problems of the first hitting time
on non-linear boundaries', Trudy Mat. Inst. Steklov 158 (1981), 130-152
(in Russian). (Translated as Proc. Steklov Inst. Math.)
7. I. Seits and K. Winkelbauer, 'A note to the article by Kolmogorov and
Prokhorov "On sums of a random number of random summands"', Czecho-
slovak Math. J. 3 (1953), 89-91.
8. D.L. Burkholder and R.F. Gundy, 'Extrapolation and interpolation of
quasilinear operators on martingales', Acta Math. 124 (1970), 249-304.
9. Y.S. Chow and H. Teicher, Probability theory, Springer-Verlag, New York,
1978.
10. A.A. Novikov, 'On the stopping times of a Wiener process', Teor. Veroyat-
nost. i Primenen. 16:3 (1971), 458-465 (in Russian). (Translated as Theory
Probab. Appl.)
11. A.A. Novikov, 'On discontinuous martingales', Teor. Veroyatnost. i Prime-
nen. 20:1 (1976), 13-28 (in Russian). (Translated as Theory Probab. Appl.)
12. W.J. Hall, 'On Wald's equation in continuous time', J. Appl. Probab. 8:1
(1971), 59-68.
In paper No. 42 [1] the following, more convenient metric, equivalent to the
metric $S$ introduced in §2, was proposed:

$$D(f, g) = \inf_{\lambda\in\Lambda}\ \sup_{0\le t\le 1}\bigl[\rho\bigl(f(t), g(\lambda(t))\bigr) + |\lambda(t) - t|\bigr],$$
References
The main theorems of paper No. 51 are now known as the Kolmogorov uniform
limit theorems. As is mentioned in the introduction, Theorem 1 (or, equivalently,
relation (0.9)) is the third refinement of the corresponding theorem from
No. 43. Later, in 1965, Le Cam [1] showed that in Theorem 1 under suitable
centralization we can take $D = \exp(n(F - E))$, $C_1 = 132$. In 1973 Ibragimov
and Presman [2] gave a new proof with $C_1 = 8$. In parallel with estimates
uniform over the set of all distributions, estimates for various narrower classes
of distributions of summands were studied. In particular, Meshalkin proved
that

$$\sup_{0\le p\le 1}\rho(B_n, D) = O(n^{-2/3}),$$
References
$$K_Q = \inf\Bigl\{K : \frac{Q(x)}{Q(y)} \le K\,\frac{x}{y},\ y \ge x > 0\Bigr\}.$$

for any positive $l_1, \ldots, l_n$, $L$, $2L \ge \max l_j$. Here $\xi$ and $\xi_j$ denote random vectors
with values in $\mathbf R^d$; the convex set $E$ belongs to a special class of sets which, in
particular, includes the sets satisfying the condition

$$p_E(x) = \sum_{i=1}^{m} |A_i x|, \quad x \in \mathbf R^d,$$
References
Discussing the "possible deviations" of $\omega_n^2$ for large $n$, von Mises writes (see, for
example, p. 320 in [2]) that the theoretical interval of values for $\omega_n^2$ (theoretische
Wert) is the interval
How many terms of this sum are essential? To answer this question, Cramér
suggested that one should consider the integrals
Pearson [1] had chosen the coefficients $1/\Delta F(t_i)$ so that the limit distribution
of this quadratic form does not depend on the probabilities $\Delta F(t_i) = F(t_{i+1}) -
F(t_i)$ and, consequently, on the division points $t_i$ and the distribution function
$F$. Subsequently many papers appeared that established that these statistics
were asymptotically normal. Clearly it suffices to normalize these statistics to
EMPIRICAL DISTRIBUTIONS 577
obtain random variables with the standard limit distribution. In 1931, perhaps
under the influence of this trend, von Mises [2] considered the statistics
and discussed only two weight functions $\lambda$: $\lambda_1(t) = n/f(t)$, where $f$ is the
density of a distribution function $F$, and $\lambda_2(t) = 1/\mathsf{E}[\int (F_n(t) - F(t))^2\,dt]$. The
choice of $\lambda_1$ was regarded as being analogous to the $\chi^2$ statistic but unsuitable,
since the integral in $\omega_n^2(\lambda_1)$ "usually" diverges, whereas $\lambda_2$ was considered suitable,
since it standardized the means of the $\omega^2$ statistics. The limit distribution
of $\omega_n^2(\lambda_2)$, however, clearly depends on $F$.
Then, in 1933, the above-mentioned lemma was formulated, and in 1937
Smirnov in [10], with reference to Glivenko, finally introduced the statistics
$\omega_n^2(\lambda)$, where $\lambda(t) = \psi[F(t)]f(t)$, with distribution independent of $F$. Since
then, or perhaps since 1952, after the papers by Anderson and Darling [11],
this principle became standard in constructing non-parametric tests.
When a statistical hypothesis fixes a certain family $\mathcal F$ of distribution functions
instead of a specific distribution function $F$, the realization of this principle
becomes somewhat difficult. A brief description of papers dealing with
this problem and a very incomplete bibliography, to which Gikhman's paper
[12] should be added, can be found, for example, in [13], §2.1. In the same §2 it
is described how to construct functionals of $F_n$ and $\mathcal F$ whose limit distribution
does not depend on $F$.
When $F$ is a continuous distribution in the multidimensional space $\mathbf R^m$,
$m > 1$, the lemma does not hold: it is false that the random variable $F(X_1, \ldots, X_m)$
has the same distribution for all $F$ if the random vector $(X_1, \ldots, X_m)$
has continuous distribution function $F$. Therefore there seems to be no standard
way of constructing non-parametric goodness-of-fit tests for distributions
in multidimensional space, unlike the one-dimensional case. Still, we point out
two modern [14], [15] and one old [16] paper on this subject.
6. The convolution (16) and the recurrence formula (9) of paper No. 15
were later extensively used for calculations for finite $n$. In particular, in 1950
Massey [17] was the first to use (16) to calculate small tables for the distribution
function of the Kolmogorov statistic for $5 < n < 80$. Quite recently (see [18])
a recurrence formula similar to (9) was used for calculating the probability of
$F_n$ remaining within domains with various curvilinear boundaries.
Let

$$P_n(\lambda, h) = \mathsf{P}\Bigl\{\forall t \in [0,1]:\ |F_n(t) - t| \le \frac{\lambda}{\sqrt n}\,h(t)\Bigr\},$$

$$\Phi_{n,K}(\lambda) = \mathsf{P}\{K(\nu_n) \le \lambda\}.$$

Clearly, $P_n(\lambda, h) = \Phi_{n,K}(\lambda)$, that is, the calculation of the probability that
the empirical process remains within a certain boundary is equivalent to calculating
the distribution functions for the functionals $K(\nu_n)$. These problems
were solved in different ways, however.
The first problem, namely, that of studying the probability of remaining
within a certain domain, is reviewed in the well-known papers by Gikhman and
Gnedenko [19] and Borovkov and Korolyuk [20]. In particular, they describe
Gikhman's work on limit theorems for the probability of remaining within the
boundary for processes converging to diffusion processes (these works are very
close in their ideas to Kolmogorov's paper) and the papers by Gnedenko and
Korolyuk that make use of random walk methods.
We recall the development of the second problem. In No. 15 the limit for
the distribution function of the Kolmogorov statistic was established: $\Phi_{n,K} \to \Phi$
for $h \equiv 1$. However, the probabilistic "construction" of the Kolmogorov
distribution function $\Phi$, namely the equality

$$\Phi(\lambda) = \mathsf{P}\Bigl\{\sup_{0\le t\le 1}|\nu(t)| \le \lambda\Bigr\}, \qquad (1)$$

remained unclear for a long time. In any case, it remained unclear of what use
such a "construction" could be.
With time other statistics were discovered which, as was noticed later,
can be conveniently represented as functionals of an empirical process. In
particular, in 1939, Wald and Wolfowitz [22] considered the so-called weighted
Kolmogorov statistic, that is, the functional $K(\nu_n)$ with weight function $h \not\equiv 1$,
while in 1937 Smirnov [10] considered the $\omega^2$ statistic, that is, the functional

$$\omega^2(\nu_n) = \int_0^1 [\nu_n(t)]^2\,dt. \qquad (2)$$
It was natural to single out what the convergence of the distributions of various
functionals of $\nu_n$ had in common, namely, convergence of $\nu_n$ to $\nu$ in distribution.
The discovery of the fact that it is natural to represent various statistics
as functionals of an empirical process was only one important feature of the
matter. Another was to foresee that general efficient methods could be worked
out for proving convergence in distribution of the processes $\nu_n$ to a process $\nu$ and,
in general, for proving weak convergence in function spaces.
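For a concrete sample the functional (2) applied to $\nu_n$ is the Cramér-von Mises statistic, which admits the standard closed form $1/(12n) + \sum_i (u_{(i)} - (2i-1)/(2n))^2$. The sketch below (synthetic data, arbitrary seed) checks that closed form against direct numerical integration of the step function.

```python
import random

# omega^2(nu_n) = integral of nu_n(t)^2 dt, with nu_n(t) = sqrt(n)(F_n(t) - t),
# for a uniform sample: closed form vs. direct numerical integration.
random.seed(3)
n = 50
u = sorted(random.random() for _ in range(n))

# Closed-form value of the Cramer-von Mises statistic (0-based index i).
omega2 = 1 / (12 * n) + sum((u[i] - (2 * i + 1) / (2 * n)) ** 2
                            for i in range(n))

def F_n(t):
    """Empirical distribution function of the sample u."""
    return sum(x <= t for x in u) / n

# Midpoint rule on a fine grid for the integral of n * (F_n(t) - t)^2.
m = 20000
integral = sum(n * (F_n((j + 0.5) / m) - (j + 0.5) / m) ** 2
               for j in range(m)) / m

print(omega2, integral)  # nearly equal
```

This is exactly the representation Smirnov used in 1949: the statistic is a continuous functional of the empirical process, so its limit distribution is that of the same functional of the limit process $\nu$.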
Both methods were ripe in the late 1940's: on 30 November 1948 (as reported
in Uspekhi Matematicheskikh Nauk (Russian Mathematical Surveys),
4:2 (1949), p. 173), A.N. Kolmogorov gave a lecture for the Moscow Mathematical
Society called "Measures and distributions in function spaces", which
discussed, among other matters, the problems of determining weak convergence
of measures in function spaces. On 29 January 1949, Smirnov gave a talk on
the "Cramér-von Mises test" [23] (see also [5], p. 200), where again, after an
interval of 12 years, he obtained the limit distribution of $\omega^2$ statistics, this time
using the representation (2) and the corresponding Parseval identity. Finally,
in September 1949, Doob's well-known paper [24] appeared. Together with the
description of the general viewpoint, to the effect that the convergence $\xi_n \Rightarrow \xi$
is a consequence of the convergence of $\nu_n$ to $\nu$ in distribution, he also proved
(1).
This started the useful development of the theory of weak convergence in
function spaces that presently constitutes the basis for the limit theorems in
non-parametric statistics.
I recommend the small monograph by Durbin [25] and the review article
by Gaenssler and Stute [26] for an acquaintance with the state of affairs in this
field.
8. Let us make several general concluding remarks.
A. Kolmogorov's statistic, as well as the statistic $K(\nu_n)$ (see No. 7),
stands apart from the other statistics of non-parametric goodness-of-fit tests in
that it leads to confidence sets that are easy to visualize. Thus, for example,
Here $F_{1n_1}$ and $F_{2n_2}$ are the empirical distribution functions constructed from
two independent sequences of independent identically distributed random variables.
It is interesting that both statements (3), which are the best known statements
in non-parametric statistics, were given as simple consequences of general
theorems on the number of intersections of the curves $F_n$ and $F + \lambda/\sqrt n$ and
the curves $F_{1n_1}$ and $F_{2n_2} \pm \lambda/\sqrt n$ respectively.
In 1944 Smirnov [6] gave the distribution of the statistic $D_n^+$ for finite $n$.
In 1955 Korolyuk [28] obtained the distribution of the statistic $D_{n_1 n_2}$ for finite
$n_1$ and $n_2$, and in 1962 Borovkov [29] established the asymptotic expansions for
References
1. K. Pearson, 'On the criterion that a given system of deviations from the
probable in the case of a correlated system of variables is such that it can
be reasonably supposed to have arisen from random sampling', Phil. Mag.
5:50 (1900), 157.
2. R. von Mises, Wahrscheinlichkeitsrechnung und ihre Anwendungen in der
Statistik und theoretischen Physik, Franz Deuticke, Leipzig, Vienna, 1931.
3. H. Cramér, 'On the composition of elementary errors. Second paper: sta-
tistical applications', Skand. Aktuarietidskr. 1/2 (1928), 141-180.
4. V. Glivenko, 'Sulla determinazione empirica delle leggi di probabilità', G.
Ist. Ital. Attuar. 4:1 (1933), 92-99.
5. N.V. Smirnov, Probability theory and mathematical statistics: Collected
works, Nauka, Moscow, 1970 (in Russian).
6. N.V. Smirnov, 'Approximation of distribution laws of random variables by
empirical data', Uspekhi Mat. Nauk No. 10 (1944), 179-206 (in Russian).
7. N.V. Smirnov, 'An estimate of the discrepancy between empirical distribu-
tion curves in two independent samples', Byull. Moskov. Gos. Univ. Math.
and Mech. 2 (1939), 3-14 (in Russian).
8. N.V. Smirnov, 'Tables for estimating the goodness of fit of empirical dis-
tribution', Ann. Math. Statist. 19:2 (1948), 279-281.
9. L.N. Bolshev and N.V. Smirnov, Tables ofmathematical statistics, Nauka,
Moscow, 1983 (in Russian).
10. N.V. Smirnov, 'On the distribution of von Mises' ω²-test', Mat. Sb. 2:1
(1937), 973-993 (in Russian).
11. T.W. Anderson and D.A. Darling, 'Asymptotic theory of certain "goodness
of fit" criteria based on a stochastic process', Ann. Math. Statist. 23:1
(1952), 193.
12. I.I. Gikhman, 'Remarks on Kolmogorov's goodness of fit test', Dokl. Akad.
Nauk SSSR 91:4 (1953), 715-718 (in Russian).
13. E.V. Khmaladze, 'Certain applications of martingale theory in statistics',
Uspekhi Mat. Nauk 37:6 (1983), 194-212 (in Russian). (Translated as Rus-
sian Math. Surveys)
14. P.J. Bickel and L. Breiman, 'Sums of functions of nearest neighbour dis-
tances, moments bounds, limit theorems and a goodness of fit test', Ann.
Probab. 11 (1983), 185-214.
15. M.F. Schilling, 'Goodness of fit testing in R m based on the weighted em-
pirical distribution of certain nearest neighbour statistics', Ann. Statist.
11:1 (1983), 1-12.
16. M. Rosenblatt, 'Remarks on multivariate transformations', Ann. Math.
Statist. 23:3 (1952), 470-472.
17. F.J. Massey, Jr., 'A note on the estimation of a distribution by confidence
limits', Ann. Math. Statist. 21:1 (1950), 116-118.
18. V.F. Kotel'nikova and E.V. Khmaladze, 'Calculating the probability of an
empirical process remaining within a curvilinear boundary', Teor. Veroyat-
nost. i Primenen. 27:3 (1982), 599-607 (in Russian). (Translated as Theory
Probab. Appl.)
19. I.I. Gikhman and B.V. Gnedenko, 'Mathematical statistics', In: Mathemat-
ics in the USSR during forty years: 1917-1957, Vol. 1, Fizmatgiz, Moscow,
1959 (in Russian).
20. A.A. Borovkov and V.S. Korolyuk, 'On results of asymptotic analysis con-
cerning boundary problems', Teor. Veroyatnost. i Primenen. 10:2 (1965),
255-266 (in Russian). (Translated as Theory Probab. Appl.)
21. V.S. Korolyuk, 'Asymptotic analysis of the distribution of maximal devi-
ations in a Bernoulli scheme', Teor. Veroyatnost. i Primenen. 4:4 (1959),
369-397 (in Russian). (Translated as Theory Probab. Appl.)
22. A. Wald and J. Wolfowitz, 'Confidence limits for continuous distribution
functions', Ann. Math. Statist. 10:2 (1939), 105-118.
23. N.V. Smirnov, 'On the Cramér-von Mises test', Uspekhi Mat. Nauk 4:4
(1949), 196-197 (in Russian).
24. J. Doob, 'Heuristic approach to the Kolmogorov-Smirnov theorems', Ann.
Math. Statist. 20:3 (1949), 393-403.
25. J. Durbin, 'Distribution theory for tests based on the sample distribution
function', Regional Conf. Ser. in Appl. Math., 9th issue, SIAM, 1973.
26. P. Gaenssler and W. Stute, 'Empirical processes: A survey of results for
independent and identically distributed random variables', Ann. Probab.
7:2 (1979), 193-243.
THE METHOD OF LEAST SQUARES 583
27. N.V. Smirnov, 'On the deviations of the empirical distribution curve', Mat.
Sb. 6 (48):1 (1939),3-24 (in Russian).
28. V.S. Korolyuk, 'On the divergence of empirical distributions for the case of
two independent samples', Izv. Akad. Nauk SSSR Ser. Mat. 19:1 (1955),
81-96 (in Russian).
29. A.A. Borovkov, 'On the problem of two samples', Izv. Akad. Nauk SSSR
Ser. Mat. 26:4 (1962), 605-624 (in Russian).
30. F.J. Massey, Jr., 'Distribution tables for the deviation between two sample
cumulatives', Ann. Math. Statist. 23:3 (1952), 435-441.
31. A.A. Borovkov, N.P. Markov and N.M. Sycheva, Tables for N. V. Smirnov's
test of uniformity of two samples, Izdat. Sibirsk. Otdel. Akad. Nauk SSSR,
Novosibirsk, 1964 (in Russian).
32. A.N. Kolmogorov, 'On a new proof of Mendel's laws', Dokl. Akad. Nauk
SSSR 27 (1940), 38-42 (in Russian).
The method of least squares is one of the most popular statistical methods
for parameter estimation in applications. Suffice it to say that specialists in
geodesy, navigation, artillery, etc., even nowadays write their own textbooks
on this method (which do not always meet the standards of modern statistics).
At the same time, before No. 30 was published, the mathematical presentations
of this method were not based on modern geometrical ideas of higher-dimensional
geometry and were essentially no different from the initial method
of Gauss. It is in No. 30 that the connection of the method with orthogonal projection
onto a subspace in $\mathbf R^N$ was explicitly used and it was discovered which
properties of the method depend on the assumption of normality of the distribution
of measurements and which are true for orthogonal observations. For
applications it is important to present in detail the confidence intervals for the
parameters and for the unknown variance of measurements and to supplement
them with the tables of Student and $\chi^2$-distributions.
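The geometric fact underlying No. 30 — that the least-squares fit is the orthogonal projection of the observation vector onto a subspace of $\mathbf R^N$ — can be verified directly. A minimal sketch on synthetic data (all names and dimensions are arbitrary):

```python
import numpy as np

# Least squares as orthogonal projection in R^N: the fitted vector
# X @ beta is the projection of y onto the column space of X, so the
# residual is orthogonal to every column of X.
rng = np.random.default_rng(0)
N, k = 30, 3
X = rng.normal(size=(N, k))   # design matrix (N observations, k parameters)
y = rng.normal(size=N)        # observation vector

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = y - X @ beta

# Orthogonality of the residual to the subspace spanned by the columns of X.
print(X.T @ residual)  # numerically the zero vector
```

The orthogonality relation $X^\top(y - X\beta) = 0$ is exactly the normal equations of Gauss, which is why the projection picture changes no computations but clarifies which properties are purely geometric.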
Paper No. 30 played an important role in developing the mathematical
theory of the method of least squares in the USSR and putting into some
order the applications of this method. In particular, its influence can be seen
in the books by Linnik [1] and Romanovskii [2]. The latter, for instance,
References
1. Yu.V. Linnik, The method of least squares and the foundations of the
mathematical and statistical theory of observation processing, Fizmatgiz,
Moscow, 1952 (in Russian).
2. V.I. Romanovskii, Mathematical statistics, Vol. 2, Izdat. Akad. Nauk
UzSSR, 1963.
3. H. Scheffé, Analysis of variance, Wiley, New York, 1959.
4. H. Drygas, 'Asymptotic confidence intervals for the variance in the lin-
ear regression model', In: Statistics and probability: Essays in honour of
C.R. Rao, North-Holland, Amsterdam, 1982, pp. 233-239.
UNBIASED ESTIMATORS (No. 38)
(Yu.K. Belyaev and Ya.P. Lumel'skii)
References
The first attempts at the statistical prediction of future values of certain meteorological
parameters using linear regression equations to give a future value
of the quantity $\Delta y$ of interest in the form of a simple linear combination of
the $\Delta x_1, \ldots, \Delta x_k$ known from observations made in the past or present were
started in the 1920's. (In addition to Bauer's paper of 1925 referred to by
A.N. Kolmogorov, we can also refer to later works [1]-[6] containing a number
of additional references.) At first sight this prediction method seems to
be quite simple: it requires only preliminary estimates of a certain number of
correlation coefficients which determine the unknown coefficients $a_1, \ldots, a_k$ in
the regression equations and does not require cumbersome calculations such
as those involved in "dynamic weather forecasting" based on numerical solution
of partial differential equations approximately describing the dynamics
of the atmosphere. Here, the only problem is to choose appropriate predictors
$\Delta x_1, \ldots, \Delta x_k$, that is, atmospheric characteristics in the past and present
whose values are used for predicting $\Delta y$. It turns out, however, that to choose
appropriate predictors is not that simple.
Indeed, it might seem that it is better to take as many predictors as possible,
since in that case the forecast uses very broad initial information and will
be accurate with high probability. Unfortunately, the empirical correlation coefficients
$r_i$ and $r_{ij}$ between $\Delta y$ and $\Delta x_i$, $i = 1, \ldots, k$, and between the pairs
$\Delta x_i$ and $\Delta x_j$, $j = 1, \ldots, k$, used for calculations are not exact and depend
on the volume (and quality) of the available empirical data. Therefore, the
coefficients $a_1, \ldots, a_k$ for which the linear combination $a_1\Delta x_1 + \cdots + a_k\Delta x_k$
corresponding to these data gives the best approximation of $\Delta y$ are not exact,
that is, they are not the best for the whole sample of random variables
$(\Delta x_1, \ldots, \Delta x_k, \Delta y)$, and applied to the subsequent independent sample, these
values may be appreciably less appropriate. To avoid this, the choice of predictors
should satisfy a number of special conditions. Here we will only consider
the two most important of them.
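Both conditions guard against the same overfitting effect. A minimal simulation (all numbers arbitrary, not taken from the works cited) of the "blow-up of the maximal empirical correlation coefficient" discussed below: even when $\Delta y$ and every candidate predictor are independent noise, the best of many tried predictors shows a sizable sample correlation.

```python
import random, statistics

# y and all candidate predictors are independent noise, yet the best of
# many tried predictors exhibits a sizable *sample* correlation with y.
random.seed(2)

def sample_corr(a, b):
    """Ordinary sample correlation coefficient of two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = sum((x - ma) ** 2 for x in a) ** 0.5
    db = sum((y - mb) ** 2 for y in b) ** 0.5
    return num / (da * db)

n, n_candidates = 40, 200   # sample volume; number of predictor systems tried
y = [random.gauss(0, 1) for _ in range(n)]
best = max(abs(sample_corr(y, [random.gauss(0, 1) for _ in range(n)]))
           for _ in range(n_candidates))
print(best)  # far from the true correlation, which is 0
```

On a fresh independent sample the selected predictor's correlation with $y$ would of course collapse back toward zero, which is exactly the phenomenon Kolmogorov warned about.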
First, in order that the accuracy of the determination of the coefficients
$a_1, \ldots, a_k$ should not be too low, the total number $k$ of predictors should be
rather small as compared to the volume of the sample used for determining
the coefficients $r_i$, $r_{ij}$ and $a_i$. Thus, for example, if the order of the sample's
volume is one or two hundred, then $k$ should not exceed several units. This
condition was intuitively clear even to the first researchers of statistical weather
forecasting (see, for example, Kolmogorov's remark to the effect that when
using a sample with 30-50 average annual values, $k$ is usually chosen to vary
from 3 to 7); later, the well-known American meteorologist Lorenz called it "the
taboo of statistical forecasting". However, apart from this, there is another
essential requirement, which is clarified in No. 18 and which is by no means
always taken into account, even now. This second requirement is to forbid
searching among a large number of systems of predictors (even if each of these
systems contains only a small number of values) in order to choose the best of
these systems. The point is that if we try a large number of various systems
of predictors, then with high probability at least in one of them the empirical
value of the cumulative correlation coefficient between these predictors and $\Delta y$
is much larger than its true value. In this case it is quite possible that this
system of predictors will be chosen for forecasting. However, as applied to a
subsequent independent sample the empirical correlation coefficient of $\Delta y$ and
$\Delta u = a_1\Delta x_1 + \cdots + a_k\Delta x_k$ will most probably be essentially smaller than
that of the initial sample. Kolmogorov had good reason to assert that this
"blow-up of the maximal empirical correlation coefficient" will very often take
place when searching among a large number of systems of predictors, which readily
STATISTICAL PREDICTION 589
ysis (cf. [7], Chap. 11). In this method, first a linear combination $u_1$ of the
predictors with the largest variability (that is, variance) is chosen from the
initial set of all possible "virtual predictors"; then only linear combinations of
predictors uncorrelated with $u_1$ are considered, and the combination $u_2$
with greatest variance is chosen, etc. Usually after selecting a small number $k$
of linear combinations $u_1, u_2, \ldots, u_k$ the variance of all the others is so small
that they might as well not be considered. In cases when the original number
of admissible predictors is comparatively small (as, for example, in Obukhov's
work [10], where the number is 5), this method results in a notable simplification
of the calculations (for example, allowing one to confine oneself to considering
only the two most significant linear combinations of the five variables). If, however,
we first consider a large number of predictors (for example, all the values
of all or several meteorological fields for a large network of stations or at the
nodes of a regular network), then we can again apply Kolmogorov's considerations,
which suggest that the combinations are optimal only for the sample for
which they were calculated, and on passing to an independent sample the results
become much worse. It is quite possible that this accounts for some of the
disappointing conclusions on statistical forecasting using empirical orthogonal
functions contained in [4].
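The successive maximal-variance construction just described is, in modern terms, principal component analysis. A minimal sketch on synthetic data (not Obukhov's five variables; the sample covariance eigendecomposition replaces the step-by-step maximization, to which it is equivalent):

```python
import numpy as np

# "Empirical orthogonal functions": successive uncorrelated linear
# combinations of the predictors with maximal variance, obtained from the
# eigendecomposition of the sample covariance matrix.
rng = np.random.default_rng(5)
n, p = 200, 5                                          # observations, predictors
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))  # correlated predictors
Xc = X - X.mean(axis=0)

cov = (Xc.T @ Xc) / (n - 1)
eigval, eigvec = np.linalg.eigh(cov)           # eigenvalues in ascending order
order = np.argsort(eigval)[::-1]
components = Xc @ eigvec[:, order[:2]]         # the two leading combinations u1, u2

# u1 and u2 are uncorrelated by construction.
print(np.corrcoef(components.T)[0, 1])  # numerically zero
```

Note that the caveat in the text applies unchanged: the eigenvectors are computed from one sample, so with many original predictors the leading combinations are themselves subject to the same selection effect.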
In conclusion, we should emphasize that Kolmogorov's work clearly demon-
strates the need for a thorough study of the selective properties of systems of
predictors used for developing methods of statistical weather forecasting. This
study was started by A.N. Kolmogorov in 1933; unfortunately, many problems
remain open even now.
References
The problem solved by A.N. Kolmogorov in this paper was posed while studying
productive rock-masses (strata) of the Apsheronskii peninsula and red-rock
oil-bearing sediments on Cheleken. These strata are composed exclusively of
terrigenous formations. They include, on the Apsheronskii peninsula, sparse
conglomerate beds, while at Cheleken the strata consist completely of sands,
aleurites and clays. The depressions in lower beds are filled by the shelves of the
higher ones. Moreover, the bases of thick sand beds often contain fragments
composed of clays that are identical in their appearance to the clays of the
clay bed underlying the sand beds. These fragments are rounded, and they are
considered by all researchers as the traces of washout of the lower bed when
the upper one was formed. This pattern is especially typical of Cheleken.
These features of the strata leave no doubt that when they were formed, the
lower beds were washed out during the formation of the upper ones ("interbed
washout"). So there were sound reasons for posing the problem solved
by Kolmogorov, and it has retained its importance to this day. At the same
time one should bear in mind that when Kolmogorov's paper was published, a
number of geologists believed that the thicknesses of beds in a profile do not
depend on the material of the beds. In general, this opinion turned out to be
false. Moreover, when the paper was published, geologists did not operate with
random variables, probability distribution functions and sequences of values of
a random variable. It was the period in which the foundations of a number of
geological disciplines had just been laid, based on the notion of the stochastic
character of the values studied. This fundamental restructuring later gave rise
to mathematical geology, and it was largely a result of Kolmogorov's paper, as
well as of his personal advice and remarks during 1945-1950.
For the application of No. 37 to the solution of the problem relating to the
mechanism of bed formation, three requirements must be met:
a) numerical solutions of Kolmogorov's equation must be found;
b) the applicability ofKolmogorov's axioms (which, from the modern view-
point, does not have a universal character at aIl) to specific profiles should be
confirmed;
c) a model of bed-formation that gives G( x) should be available. To obtain
reasonable results, this model should be derived from bed-formation conditions
and should allow testing based on data of observations.
Condition a) is in no way restrictive. Kolmogorov's equation coincides
with the Wiener-Hopf equation, and there is an extensive literature on numerical
methods for solving the latter. In any case, even in the late 1940s there
were quite a number of methods for solving this problem. An appropriate
algorithm was chosen immediately after Kolmogorov's paper was published,
and it was used to compute a number of examples. These data were not
published, since no function could be adopted for G(x).
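To make condition a) concrete: the exact form of Kolmogorov's equation is not reproduced in this comment, so the sketch below solves a generic Wiener-Hopf-type equation f(x) = g(x) + ∫_0^∞ k(x − y) f(y) dy by Nyström (trapezoidal) discretization on a truncated interval. The kernel, forcing term, grid size and truncation length are all illustrative assumptions, not data from the paper.

```python
import numpy as np

def solve_wiener_hopf(g, k, x_max=20.0, n=400):
    """Nystrom (trapezoidal) solver for a Wiener-Hopf-type equation
    f(x) = g(x) + int_0^inf k(x - y) f(y) dy, truncated to [0, x_max].
    g, k are callables; the kernel is assumed to decay fast enough that
    truncation at x_max is harmless."""
    x = np.linspace(0.0, x_max, n)
    h = x[1] - x[0]
    w = np.full(n, h)
    w[0] = w[-1] = h / 2                          # trapezoidal weights
    K = k(x[:, None] - x[None, :]) * w[None, :]   # discretized integral operator
    f = np.linalg.solve(np.eye(n) - K, g(x))      # solve (I - K) f = g
    return x, f

# Toy data, purely illustrative: the kernel integrates to 1/2 over the
# whole line, which keeps I - K well-conditioned.
lam = 3.0
x, f = solve_wiener_hopf(lambda t: np.exp(-t),
                         lambda t: 0.25 * lam * np.exp(-lam * np.abs(t)))
```

Since the kernel is positive with total mass below one, the Neumann series converges and the discrete solution stays positive and finite.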
The axioms adopted by Kolmogorov (condition b)) are as follows:
1) the random variables δn and δn+1 are independent, and all the δn have
the same distribution law P{δn < x} = G(x);
2) the expectations Eδn = ∫_{-∞}^{+∞} x dG(x) are positive;
3) the distribution of δn is continuous, that is, it can be expressed in terms
of a corresponding probability density g(x) by the formula G(x) = ∫_{-∞}^{x} g(y) dy.
Recall that δn is the difference between the thickness of the sediment
and the subsequent depth of the washout of this sediment directly after its
accumulation is finished. The profile may contain certain beds whose
thicknesses belong to the sequence δn. It is not clear how to identify
the layers with thicknesses from the sequence δn during field work.
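The mechanism just described can be sketched in code. This is a minimal simulation under assumed distributions (exponential deposit and washout depths, with means chosen so that E(deposit − washout) > 0, as axiom 2 requires); it is not Kolmogorov's construction itself, merely an illustration of how a washout can truncate or entirely destroy earlier beds.

```python
import random

def simulate_profile(n_cycles=1000, mean_deposit=2.0, mean_washout=1.0, seed=1):
    """Simulate interbed washout: each cycle deposits a bed of random
    thickness, then a washout of random depth erodes the top of the pile,
    possibly destroying several earlier beds. The exponential laws are
    assumptions made purely for illustration."""
    rng = random.Random(seed)
    beds = []  # surviving bed thicknesses, bottom to top
    for _ in range(n_cycles):
        beds.append(rng.expovariate(1.0 / mean_deposit))  # deposit a new bed
        erosion = rng.expovariate(1.0 / mean_washout)     # washout depth
        while erosion > 0 and beds:
            if beds[-1] <= erosion:      # bed destroyed entirely
                erosion -= beds.pop()
            else:                        # bed partially eroded
                beds[-1] -= erosion
                erosion = 0.0
    return beds

profile = simulate_profile()
```

The list `profile` plays the role of the observed field section: only beds that survived all later washouts remain, which is exactly why some of the δn are unobservable.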
Items 2) and 3) of these axioms do not give rise to any doubts from a
geological viewpoint. However, in analyzing item 1), the following facts, discov-
ered after the publication of No. 37, have to be taken into account. Apparently, it is
reasonable to assume that: 1) in specific profiles the beds that had undergone
a washout and belong to the sequence δn can dominate; 2) when layers of
sand beds are deposited, the values ηn play a decisive role in the final deter-
mination of the thickness of a bed. For a sequence of clay beds, in many cases
ηn = 0.
If this is true, then by studying the strata profiles during field work we can
estimate to what extent item 1) of these axioms is true. The following relevant
data were obtained by studying the profiles of red-rock and productive strata
and the flysch of north-western Caucasus, Kakhetia and Southern Urals.
For beds of different compositions the average bed thicknesses are usu-
ally different. Thus, for red-rock strata in Cheleken it was found that the
sand layers have an average thickness of 84.11 cm with standard error 188.34
and asymmetry +5.0 (for a number of beds n = 318); for aleurites 15.22 with
16.93 and +2.7 (n = 471), and for clays 27.66 with 53.23 and +8.0 (n = 792),
respectively.
Sample correlation functions for sequences of bed thicknesses are divided
into two types, B and F (Vistelius [2]). For type B all autocorrelation coeffi-
cients are positive and decrease monotonically as the distance s between the beds
(measured in the number of beds) increases. For type F, the odd autocorrelation
coefficients are negative and the even ones are positive; both the sequence of even
and that of odd autocorrelation coefficients decrease rapidly and monotonically
in absolute value as s increases. Hence, if the beds with thicknesses from the
sequence δn really dominate in the profile, then the assumption of the independence
of the δn can be questioned.
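The two correlogram types can be stated operationally. The sketch below combines a standard sample autocorrelation estimator with a crude sign-pattern check for types B and F; the function names and the toy alternating sequence are illustrative, not data from the profiles discussed here.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation coefficients r_1, ..., r_max_lag of a sequence."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-s], x[s:]) / denom
                     for s in range(1, max_lag + 1)])

def correlogram_type(r):
    """Crude classification into Vistelius's types B and F by sign pattern."""
    if np.all(r > 0):
        return "B"
    # odd lags (r_1, r_3, ...) negative, even lags (r_2, r_4, ...) positive
    if np.all(r[0::2] < 0) and np.all(r[1::2] > 0):
        return "F"
    return "other"

# A strictly alternating thick/thin sequence gives the type F sign pattern.
r = acf([5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1], max_lag=4)
```

For the alternating toy sequence the odd-lag coefficients come out negative and the even-lag ones positive, so `correlogram_type(r)` returns `"F"`.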
It is interesting to note the following. Let

E(x_s x_{s+r} | a_s = i, a_{s+r} = j) − E(x_s | a_s = i, a_{s+r} = j) E(x_{s+r} | a_s = i, a_{s+r} = j) = 0,

where a denotes bed composition, i, j denote elements of the set of bed compo-
sitions, x denotes the bed thickness, s denotes the number of the bed starting
from the foot of the profile, and r denotes the distance between beds. In this
case, for simple Markov chains consisting of two states (for example, sand and
clay), each of which corresponds to a random value (bed thickness) with dis-
tinct expectations for each state (that is, the average thickness of sand beds
differs from that of clay beds), only three types of correlograms are possible:
type B, type F, or identically zero (Vistelius [4]).
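The statement above can be checked by simulation. Assuming, hypothetically, a two-state chain that tends to alternate between compositions with very different mean thicknesses, the thickness sequence should show the alternating-sign, type F pattern; the transition probability, the means, and the exponential thickness law are all assumptions made for illustration.

```python
import random

def simulate_thicknesses(n=5000, p_switch=0.8, means=(5.0, 1.0), seed=7):
    """Two-state Markov chain of bed compositions (say 0 = sand, 1 = clay),
    each state emitting a random thickness with its own mean. p_switch is
    the probability of changing state at each step; all values are
    illustrative assumptions."""
    rng = random.Random(seed)
    state, out = 0, []
    for _ in range(n):
        out.append(rng.expovariate(1.0 / means[state]))
        if rng.random() < p_switch:
            state = 1 - state
    return out

def acf1(x, lag):
    """Sample autocorrelation coefficient at a single lag."""
    m = sum(x) / len(x)
    d = [v - m for v in x]
    denom = sum(v * v for v in d)
    return sum(d[i] * d[i + lag] for i in range(len(d) - lag)) / denom

x = simulate_thicknesses()
r1, r2 = acf1(x, 1), acf1(x, 2)  # expect r1 < 0 < r2 (type F pattern)
```

With p_switch above 1/2 the chain tends to alternate, so odd-lag coefficients of the thickness sequence are negative and even-lag ones positive, in line with Vistelius's type F.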
The latter statement shows that in the sequences of beds that were washed
out only once after sedimentation, and without total destruction, the sequence
of bed compositions is a simple Markov chain, and the correlogram is of type
F. Hence for these sequences, bed sections could be found in which successive
beds have the same g(x) and the δn are independent. Something similar was
observed in a flysch profile in the Southern Urals. Here a sequence of 1530
layers (described by Bezhaev) is close to a homogeneous Markov chain, and its
correlogram is of type F (Vistelius [3]; the sample autocorrelation coefficients
given at this point are not reproduced). The table below gives estimates of the
correlation coefficients between bed thicknesses next to each other (s + 1) and
every second one (s + 2) for fixed bed compositions.
                    Composition of the     Composition of the
 Composition        (s + 1)th bed          (s + 2)th bed
 of the sth bed     π      σ      γ        π      σ      γ
 [table entries not reproduced]

where π denotes sand, σ aleurite and γ clay beds, whose compositions are given
by the table's entries. The numerators of the entries give the autocorrelation coef-
ficients between the beds of the corresponding compositions, and the denominators
the number of bed pairs observed. Dashes mean that the number of bed
pairs compared was less than 5.
It can be seen from the table that, to a first approximation, the beds com-
posed of sands and aleurites make it possible to take a sample in such a way
that the δn are independent, provided such beds dominate in the profile.
Summing up, it can be said that when embarking on the analysis of a bed
sequence we should very thoroughly check condition 1) of Kolmogorov's axioms.
Here we may encounter both cases satisfying the axioms (especially if we use
a special testing procedure and study not all the beds, but only some special
types of them, for example sands) and cases contradicting them. One should
also study the robustness of the model, that is, how violations of the axioms
propagate to the final geological conclusions.
While the problem of verifying the axioms ultimately reduces to that of the
robustness of the model and the necessity of developing special procedures for
selecting beds that satisfy Kolmogorov's conditions, the question of choosing
g(x) in specific studies is much more difficult. Since there are non-observable
values among the δn, g(x) can only be found from a model of layer formation
based on appropriate lithological (sedimentological) assumptions. Attempts to
construct such a model have not resulted in anything really worthwhile so far.
Kolmogorov's paper was ahead of its time and only in 1962, when geol-
ogists became aware of the importance of mathematics for their science and
mathematical geology took its proper place, did this work draw the attention
of researchers. This paper was referred to in the first book on the application
of mathematical methods in geology (Miller and Kahn [7]), and was mentioned
later in all serious publications on this subject (Agterberg [1]). Unfortunately,
these textbooks gave only general references to this paper, without analyzing it.
In 1975 the monograph by Schwarzacher [8] was published. It was the first to
analyze Kolmogorov's work. Schwarzacher considered the corresponding model
for a discrete case using the method of random walks. He also imposed stronger
independence conditions.
The literature on specific studies based on Kolmogorov's paper can be
subdivided into two types. In one of them no analysis of the correspondence
References