
ON THE THEORY OF STOCHASTIC PROCESSES, WITH PARTICULAR REFERENCE TO APPLICATIONS

W. FELLER
CORNELL UNIVERSITY

(ONR Project for Research in Probability)


1. Introduction
Since Kolmogoroff's famous paper of 1931, "On Analytical Methods in the
Theory of Probability," the theory of stochastic processes has been developed
and it has been shown that it can successfully be applied to practical problems
and used to describe empirical phenomena. However, the theory is new and
the most appropriate mathematical techniques have yet to be discovered. It
is therefore reasonable to expect that the usefulness of the theory will increase
when more pertinent mathematical problems are solved. On the other hand,
these new problems are of interest also in pure analysis beyond the theory of
stochastic processes. In the past, pure mathematics has always derived great
benefits from the interplay with physical theories, and many parts of purest
mathematics owe their origin to physical problems. Now we shall see that our
theory leads to integrodifferential equations of a type never studied before:
they contain as simplest special cases a surprisingly great variety of familiar
and unfamiliar functional equations (sec. 7). It seems probable that our
methods of deriving solutions and adjoint equations could be utilized also for
many functional equations without probabilistic meaning.1 Another example
of a problem of general interest on which we shall touch briefly (sec. 10) is
connected with the fact that an empirical phenomenon can often be described
in several ways, say by a system of infinitely many ordinary differential equa-
tions or by a partial differential equation of the diffusion type. This seems to
indicate connections which have yet to be explored.
As for practical usefulness, it should be borne in mind that for a mathe-
matical theory to be applicable it is by no means necessary that it be able to
provide accurate models of observed phenomena. Very often in applications the
constructive role of mathematical theories is less important than the economy
of thought and experimentation resulting from the ease with which qualita-
tively reasonable working hypotheses can be eliminated by mathematical
arguments. Perhaps even more important is the constant interpretation of
observations in the light of theory and of theory in the light of observations;
in this way mathematical theory can become an indispensable guide not only
to a better understanding, but even to a proper formulation of scientific problems.

1 The integrodifferential equations mentioned in the text contain as a special case, among others, infinite systems of ordinary differential equations, which will be studied in section 4. The theory of these ordinary differential equations has been generalized also to cases which are void of probabilistic meaning [cf. Arley and Borchsenius (1945)].

For example, in geology we are confronted with random processes
which have been going on for millions of years, some of them covering the
surface of the earth. We observe that certain species go through a period of
prosperity and steady increase, only to die out suddenly and without apparent
reason. Is it really necessary to introduce new hypotheses for each new ob-
servation, to assume cataclysms working one-sidedly against certain species,
or to find other explanations? The Volterra-Lotka theory of struggle for ex-
istence teaches us that even under constant conditions situations are bound to
arise which would appear to the naive observer exactly like many of the
cataclysms of geology. Similarly, although it is impossible to give an accurate
mathematical theory of evolution, even the simplest mathematical model of
a stochastic process, together with observations of age, geographical distri-
bution, and sizes of various genera and species, makes it possible to deduce
valuable information concerning the influence on evolution of various factors
such as selection, mutation, and the like.2 In this way indecisive qualitative
arguments are supplemented by a more convincing quantitative analysis.
In the mathematical literature stochastic processes are usually treated in a
formal and general way which does not clearly show the practical meaning and
applicability. On the contrary, practical problems leading to stochastic proc-
esses are usually treated by special methods and under various disguises so
that the connection with the general theory does not become apparent. It
seems therefore desirable to begin (sec. 2) by explaining the very simplest and
most intuitive examples of stochastic processes which have been treated in
the literature. They do not require new mathematical tools, as they all lead
to systems of ordinary differential equations which, though infinite, are of a
very simple form. We shall then pass to more general theories, but it should
be understood that we shall not in this paper consider the most general type
of stochastic processes such as occurs in the analysis of time series. Instead,
we shall restrict the considerations to what is now generally called Markov
processes, that is to say to processes where all future probability relations are
fully determined by the present state, in the same way as in classical me-
chanics the present state uniquely determines the future development of a
system. Furthermore, we shall focus our attention on the so-called discon-
tinuous type of Markov processes where the changes occur in jumps: the
system remains for a time unchanged, then undergoes a sudden change into
another state. This type of process has found extensive and important appli-
cations in the theory of automatic telephone exchanges and in the theory of risk
for insurance companies.3 The continuous type of process is best exemplified
by diffusion processes which are continuous in the sense that some change
occurs during any time interval, however small, but in small time intervals
the changes are small. This type of process is of special importance in physics, and two excellent accounts have recently been given.4 The connection between the two types will be touched upon in the last section of the present paper.

2 For details, cf. Yule (1924). It is evident that at that time Yule could not use the formalism of stochastic processes (which would have simplified some of his considerations).
3 Elaborate applications of the theory of stochastic processes to telephone exchanges will be found in C. Palm's book of 1943. The so-called collective theory of risk has been initiated by F. Lundberg and continued by H. Cramér and his collaborators. An account will be found in Segerdahl's book (1939). These by no means simple mathematical theories have found extensive applications in practice.
2. The simplest examples
(1) The Poisson process.-This is quite familiar to physicists, who often
refer to it as "random events" and occasionally name the Poisson distribution
after Bateman. Here we consider this example as a point of departure for
various generalizations and also to emphasize that the all-important Poisson
distribution appears in its own rights and not merely as an approximation to
the binomial distribution.
Consider, in order to fix ideas, events occurring in time, such as telephone
calls, radioactive disintegrations, impacts of particles (cosmic rays), and the
like. Let it be assumed that (i) the probability of an event in any time interval
of length dt is, asymptotically, η dt, where η is a positive constant; (ii) the
probability of more than one event in a time interval dt is of smaller order of
magnitude than dt, in symbols o(dt); (iii) the numbers of events in non-over-
lapping time intervals represent independent random variables.
Denote, then, by Pn(t) the probability of having exactly n events in a time
interval of length t. We want to compare P_n(t) with P_n(t + dt). Now n ≥ 1
events can occur in the interval (0, t + dt) in one of several ways: either n
events occur in (0, t) and none in (t, t + dt); or n-1 events occur in (0, t) and
one in (t, t + dt); or, finally, less than n - 1 events occur in (0, t) and more
than one in (t, t + dt). Writing down the corresponding probabilities we find
P_n(t + dt) = (1 − η dt) P_n(t) + η dt P_{n−1}(t) + o(dt),    (1)
and similarly
P_0(t + dt) = (1 − η dt) P_0(t) + o(dt).    (1')
Rearranging these equations and passing to the limit we find easily that our
probabilities satisfy the system of differential equations
P_0'(t) = −η P_0(t),
P_n'(t) = −η P_n(t) + η P_{n−1}(t),   n ≥ 1.    (2)
The initial conditions are obviously
P_0(0) = 1,   P_n(0) = 0,   n ≥ 1.    (3)
Fortunately, in this case, the differential equations are of a recursive char-
acter and can be solved successively. The required solution is given by the
familiar Poisson distribution

P_n(t) = e^{−ηt} (ηt)^n / n!,   n ≥ 0,  t > 0.    (4)


4 Cf. Chandrasekhar (1943); Wang and Uhlenbeck (1945).
If N_t denotes the number of events during (0, t), then the expected value E(N_t) = M(t) and the variance E(N_t²) − M²(t) = σ²(t) satisfy differential equations which can be deduced directly from (2). For comparison we quote here the familiar fact that

M(t) = ηt,   σ²(t) = ηt.    (5)
It is important to notice that the applicability of the Poisson distribution
(4) is far wider than would appear on the surface. It describes adequately
many phenomena which play in space rather than in time; in such cases the
parameter t stands for volume, area, or length, instead of time. With such an
interpretation of t it will be seen that our assumptions which led to the dif-
ferential equations (2) hold for many "events," such as the distribution of
stars in space, flaws in a material, raisins in a cake, misprints in a book. In
such cases we shall refer to the parameter t as "operational time." We shall
see that many stochastic processes do not necessarily play in time, but that
the operational time may be anything, such as depth of penetration, force,
and the like.
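The construction behind (1)-(4) is easy to check numerically. The following sketch (not part of the original paper) simulates the process by drawing independent exponential waiting times with mean 1/η, a construction equivalent to assumptions (i)-(iii), and compares the empirical frequencies and moments with (4) and (5); the parameter values and the use of Python/numpy are arbitrary choices.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
eta, t, runs = 2.0, 3.0, 200_000      # arbitrary intensity, horizon and sample size

# Count how many exponential waiting times (mean 1/eta) fit into (0, t).
counts = np.empty(runs, dtype=int)
for i in range(runs):
    clock, n = 0.0, 0
    while True:
        clock += rng.exponential(1.0 / eta)
        if clock > t:
            break
        n += 1
    counts[i] = n

# Empirical P_n(t) against the Poisson law (4), and the moments (5).
for n in range(5):
    exact = math.exp(-eta * t) * (eta * t) ** n / math.factorial(n)
    print(n, (counts == n).mean(), exact)
print("mean", counts.mean(), "theory", eta * t)
print("var ", counts.var(), "theory", eta * t)
```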
Before passing to other examples we shall introduce the terminology which
is most convenient for the general theory, although it sounds somewhat arti-
ficial in the special case of the Poisson distribution. Instead of saying that
during the time interval (0, t) exactly n events have occurred, we shall say that "the system is in state E_n." Instead of saying that at time t an "event" has occurred, we shall say that "the system has passed from state E_n to E_{n+1}."
(2) Radioactive disintegrations.-The terminology is well illustrated by the successive transitions of a radioactive atom. We may say that the atom passes from one state to another, in symbols E_0 → E_1 → ⋯ → E_N. Of course this
time we have only a finite number of possible states. Moreover, in the Poisson
case the probabilities of transition were the same for all states. According to
accepted theories, if the atom is in state Ei it has a probability asymptotically
equal to vi dt to disintegrate in the following time interval of length dt; this
probability does not depend on the age or history of the atom, but changes
from state to state. Clearly this case can be treated exactly as the Poisson case,
only that this time the factors η will depend on the state. The corresponding
differential equations now take on the form
P_0'(t) = −η_0 P_0(t),
P_n'(t) = −η_n P_n(t) + η_{n−1} P_{n−1}(t),   1 ≤ n ≤ N;  η_N = 0.    (6)
The initial conditions are of course the same as before. The system (6) is fre-
quently found in the literature and the explicit solution has been given (and
is being given) independently by many authors. If we suppose that no two
among the η_i are equal, this solution can be written in the form

P_n(t) = η_0 η_1 ⋯ η_{n−1} Σ_{k=0}^{n} e^{−η_k t} / [(η_0 − η_k) ⋯ (η_{k−1} − η_k)(η_{k+1} − η_k) ⋯ (η_n − η_k)].    (7)
These formulas are rather unwieldy and it is usually simpler to obtain per-
tinent information directly from the differential equations (6). For example,
it is easy to see that for 1 ≤ n ≤ N − 1 the function P_n(t) first increases monotonically to a unique maximum and then decreases steadily to zero. If t_n denotes the place of this maximum, then t_1 < t_2 < ⋯ < t_{N−1}. For t < t_n one has η_n P_n(t) < η_{n−1} P_{n−1}(t), whereas for t > t_n the reverse inequality holds.
These facts check and explain experimental observations.
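These properties can be checked against formula (7). The sketch below (not from the paper) simulates the chain E_0 → E_1 → ⋯ by summing independent exponential sojourn times with the intensities η_0, ..., η_{N−1} and compares the empirical occupation probabilities with (7); the intensities used are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
etas = np.array([1.0, 0.6, 0.3, 0.0])   # eta_0, ..., eta_N with eta_N = 0 (here N = 3); arbitrary
N, t, runs = len(etas) - 1, 2.0, 200_000

def p_formula(n, t):
    """P_n(t) according to formula (7), valid for distinct intensities."""
    total = 0.0
    for k in range(n + 1):
        denom = np.prod([etas[j] - etas[k] for j in range(n + 1) if j != k])
        total += np.exp(-etas[k] * t) / denom
    return np.prod(etas[:n]) * total

# Monte Carlo: the state at time t is the number of completed exponential sojourns.
states = np.zeros(runs, dtype=int)
for i in range(runs):
    clock, n = 0.0, 0
    while n < N:
        clock += rng.exponential(1.0 / etas[n])
        if clock > t:
            break
        n += 1
    states[i] = n

for n in range(N + 1):
    print(n, (states == n).mean(), p_formula(n, t))
```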
(3) Growth without absorption.-The differential equations (6) and their
solution (7) retain sense also in the case N = ∞. This case is met with in practice in connection with all kinds of physical or biological growth processes; the state E_n denotes then the number of "individuals" in the "population." Since only transitions E_n → E_{n+1} are possible, individuals can only multiply, not die or be absorbed. The factors η_n determine the intensity of multiplication peculiar to the state E_n. The simplest case is that where the individuals multiply independently of one another and each individual has in any time interval of length dt a chance η dt to split into two. Because of the assumed independence we have then

η_n = nη,    (8)

and (6) becomes

P_n'(t) = −nη P_n(t) + (n − 1)η P_{n−1}(t),   n ≥ 1,    (9)

usually with the natural initial condition

P_1(0) = 1,   P_n(0) = 0,   n > 1.    (10)

The solution follows either from (7) or by direct integration:

P_n(t) = e^{−ηt}(1 − e^{−ηt})^{n−1}.    (11)
This type of process has been used (in an indirect way) by Yule in his mathe-
matical theory of evolution, to which we have referred in the introduction.
There the population consists of the species within a genus, or of the various
animal or vegetable genera. The multiplication is due to mutation, each species
or genus having at any time a (constant) probability of throwing a new species
or genus. The theory is used to study relations between the age of genera, the
number of composing species, their geographical distribution, and the like.
The conclusions permit us to confirm or reject various assumptions concerning
evolution. The same type of process has been used by Furry in the theory of
cosmic ray showers; Arley (1943) calls (11) the Furry process. The population
there consists of electrons or of photons, and the multiplication is actually a
complex event (each electron can emit a photon, each photon can be absorbed
emitting two electrons; the differential equations then refer only to each
second generation). The same equations have been used by Feller (1939) to
study fluctuations in biological growth. In all these cases it would be more natural to take into account the possibility of an individual dying at any time.
The mathematical expectation of the size of the population, that is,
M(t) = Σ_n n P_n(t),    (12)

can be obtained directly from (9): multiplying by n and adding, one sees that M'(t) = η M(t), or

M(t) = e^{ηt}.    (13)

For the variance

σ²(t) = Σ_n n² P_n(t) − M²(t),    (14)

one obtains similarly

σ²(t) = M²(t) − M(t),    (15)

a relation which permits comparison with actual observations [cf. Arley (1943)].
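A direct simulation makes (11), (13) and (15) concrete. In the hypothetical sketch below (not in the original) the population is advanced by exponential waiting times with total rate nη, in accordance with (8); the numerical values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
eta, t, runs = 0.7, 2.0, 100_000      # arbitrary splitting intensity and horizon

sizes = np.empty(runs, dtype=int)
for i in range(runs):
    clock, n = 0.0, 1                              # one individual at t = 0, cf. (10)
    while True:
        clock += rng.exponential(1.0 / (n * eta))  # total splitting rate n*eta, cf. (8)
        if clock > t:
            break
        n += 1
    sizes[i] = n

m = np.exp(eta * t)
print("mean", sizes.mean(), " theory (13)", m)
print("var ", sizes.var(), " theory (15)", m * m - m)
for n in (1, 2, 3):
    print(n, (sizes == n).mean(), " theory (11)",
          np.exp(-eta * t) * (1 - np.exp(-eta * t)) ** (n - 1))
```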
Clearly our process is the statistical counterpart of the deterministic process
of growth described by the equation
x'(t) = η x(t).    (16)
In the same way it is possible to translate other differential equations describ-
ing a deterministic growth process into an analogous random process. For
example, to the differential equation
x'(t) = η x²(t)    (17)

there corresponds a random process described by the system (6) with η_n = n²η.
Now a population growing according to (17) would increase indefinitely within
a finite time interval; the solution
x(t) = (c − ηt)^{−1}    (18)

"explodes" at t = c/η. One should therefore assume that a differential equa-
tion of type (17) could not naturally describe a growth process. Accordingly,
it is of interest to study the corresponding anomaly for the system (6) and to
find out under what conditions this system can describe a regular growth. It
turns out that every time the series

Σ_n η_n^{−1}    (19)

converges the solutions P_n(t) of (6) have the undesirable property that

Σ_n P_n(t) < 1.    (20)
This can be interpreted by saying that there is a finite probability that in-
finitely many jumps will occur during a finite time interval, or that the
system "explodes" in a way similar to the singularity of (18). On the con-
trary,5 if the series (19) diverges, the sum (20) equals unity and the probability
distribution behaves properly in every respect. Similar situations arise in
connection with more general processes.
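The "explosion" statement can be illustrated numerically. Assuming η_n = n²η as in the example above, the total time spent in the states E_1, E_2, ... is a sum of independent exponential variables whose expectations form the convergent series (19); the sketch below (not in the paper) truncates this sum at a large index and estimates the probability that infinitely many jumps occur before time t, which is the defect 1 − Σ_n P_n(t) in (20).

```python
import numpy as np

rng = np.random.default_rng(3)
eta, t, runs, terms = 1.0, 2.0, 50_000, 2_000   # arbitrary; 'terms' truncates the infinite sum

# With eta_n = n^2 * eta the series (19) converges; the total time spent in
# E_1, E_2, ... is a sum of independent Exp(n^2 * eta) sojourns, finite almost surely.
T = np.zeros(runs)
for n in range(1, terms + 1):
    T += rng.exponential(1.0 / (eta * n * n), size=runs)

p_explode = (T <= t).mean()
print("P(infinitely many jumps before t) ~", p_explode)
print("hence sum_n P_n(t) ~", 1 - p_explode, "< 1, illustrating (20)")
```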
Equations (9) and (16) correspond to a growth where there is no interaction
whatsoever between the individuals. The so-called logistic law of growth as-
sumes that the size of the population has an adverse influence on the repro-
duction rate of the individual, and accordingly one assumes that the intensity
functions η_n are of the form

η_n = nη − n²τ,    (21)

where η and τ are positive constants. It is generally believed that the corresponding deterministic equation

x'(t) = η x(t) − τ x²(t)    (22)
gives the expected value of the process (6) in the same way as (16) really gives
the expected value of the undisturbed growth (9). In reality this is not true
and the latter quantity is somewhat smaller than the solution x(t) of (22).
The fact that (16) actually gives the expected value for the process (9) must
be looked upon as a coincidence (which is very fortunate for the theory of
radioactive decay). Similar remarks hold for the more refined problems of the
Volterra-Lotka theory.
(4) Growth with absorption.-The possibility of an individual dying or of a
particle being absorbed is taken into account by assuming that not only transi-
tions E_n → E_{n+1} are possible, but also the reverse transitions E_n → E_{n−1}. Let us, for example, suppose that in any small time interval of length dt any individual has a chance γ dt + o(dt) of being absorbed, and that the intensity γ does not depend on the actual state of the system. An obvious change in the argument leading to the Poisson distribution and to (9) shows that the probabilities P_n(t) now satisfy the system of infinitely many equations

P_0'(t) = γ P_1(t),
P_n'(t) = −n(η + γ) P_n(t) + (n − 1)η P_{n−1}(t) + (n + 1)γ P_{n+1}(t),   n > 0.    (23)

This system presents a novelty, inasmuch as it is not of recursive type and its integration therefore is not obvious. It can be accomplished by introducing the generating function

P(t, x) = Σ_n P_n(t) x^n,

for which it is easy to find a partial differential equation of the first order.

5 Feller (1940). Recent results of Doob (1945) reveal the curious fact that, whenever (19) converges and therefore (20) holds, the differential equations have infinitely many solutions which are meaningful as probabilities. They correspond to a ghostlike return of the system into a finite state after an "explosion" has taken place.

If
at the beginning the population consists of a single individual, that is, if
P_1(0) = 1, the solution is given6 by

P_0(t) = γσ,   P_n(t) = (1 − ησ)(1 − γσ)(ησ)^{n−1},   n ≥ 1,    (24)

where for abbreviation

σ = {1 − e^{(γ−η)t}} / {η − γ e^{(γ−η)t}}.    (25)

In the particular case γ = η, equations (24) have to be replaced by

P_0(t) = ηt {1 + ηt}^{−1},   P_n(t) = (ηt)^{n−1} {1 + ηt}^{−(n+1)}.    (26)

It is interesting to note that the probability of extinction is

lim_{t→∞} P_0(t) = 1, if γ ≥ η;   lim_{t→∞} P_0(t) = γ/η, if γ < η,    (27)

whereas for every fixed n > 0 the probability P_n(t) → 0. Thus, roughly speaking, for large values of t the population will either have died out or be exceedingly large, whereas there is no probability for moderate sizes. The same phenomenon is known in connection with the problem of survival of family names. For the expectation M(t) and variance σ²(t) [cf. (13) and (15)] we obtain easily

M(t) = e^{(η−γ)t},   σ²(t) = {M²(t) − M(t)} (η + γ)/(η − γ).    (28)
It is seen that absorption increases the magnitude of the probable random
fluctuations.
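The behaviour described by (27) and (28) is easily reproduced by simulation. The sketch below (an illustration added here, not Feller's) runs the process (23) by drawing exponential waiting times with total rate n(η + γ) and choosing a birth or an absorption with probabilities η/(η + γ) and γ/(η + γ); the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
eta, gamma, t, runs = 1.0, 0.6, 4.0, 50_000   # arbitrary birth and absorption intensities

finals = np.empty(runs, dtype=int)
for i in range(runs):
    clock, n = 0.0, 1                          # a single initial individual
    while n > 0:
        clock += rng.exponential(1.0 / (n * (eta + gamma)))
        if clock > t:
            break
        n += 1 if rng.random() < eta / (eta + gamma) else -1
    finals[i] = n

print("P(extinct by t)", (finals == 0).mean(), "  limit in (27): gamma/eta =", gamma / eta)
m = np.exp((eta - gamma) * t)
print("mean", finals.mean(), "  theory (28)", m)
print("var ", finals.var(), "  theory (28)", (m * m - m) * (eta + gamma) / (eta - gamma))
```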
(5) The Arley process for cosmic rays.-The simplified model (23) does not
adequately describe the phenomenon of cosmic ray showers. With (23) the
expected number of particles would either steadily increase or steadily de-
crease, whereas actually the number of electrons first rapidly increases but
then decreases, since with increasing depth of penetration the particles lose
energy and are more rapidly absorbed. Also, as Arley remarked, the actual
fluctuations are much larger than would correspond to (23). The actual prob-
abilities of multiplication and absorption for each particle depend on its
energy; to describe this situation adequately, more complicated models of
stochastic processes are required, where the state of the system is no longer
described by a simple integer. However, as Arley (1943) has shown, satisfactory results can be achieved in the following simple manner.

6 Equations (23) have been considered by Feller (1939), by Arley (1943), and (in an indirect way) by Yule (1924). The solution (24) is due to C. Palm and has been communicated without proof by Arley and Borchsenius (1945).

We assume that the probability for absorption for each particle increases linearly in time,
whereas the probabilities for multiplication remain constant. Then the in-
tensities η_n and γ_n assume the forms nη and nγt, respectively, and (23) is replaced by

P_n'(t) = −n(η + γt) P_n(t) + (n − 1)η P_{n−1}(t) + (n + 1)γt P_{n+1}(t).    (29)
This is the first example of a non-stationary process. For the expected value of the number of particles, one obtains easily

M(t) = exp(ηt − γt²/2),    (30)

with a maximum at t = η/γ. The method of the generating function described above [cf. subsec. (4)] leads also to the following explicit solution:7

P_0(t) = 1 − {A + exp(−ηt + γt²/2)}^{−1},
P_n(t) = exp(−ηt + γt²/2) A^{n−1} {A + exp(−ηt + γt²/2)}^{−(n+1)},   n ≥ 1,    (31)

where

A = A(t) = η ∫_0^t exp(γs²/2 − ηs) ds.    (32)

7 Not published.
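Since the intensities in (29) depend on time, a direct simulation needs a device such as thinning: events are proposed at the constant bounding rate n(η + γT) on a horizon (0, T) and accepted with probability n(η + γt)/n(η + γT). The sketch below (not from the paper) uses this construction to check the mean formula (30); parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
eta, gamma, T, runs = 2.0, 0.8, 3.0, 20_000   # arbitrary intensities and horizon

def arley_population(T):
    """One path of (29): per-particle birth rate eta, per-particle absorption rate gamma*t."""
    t, n = 0.0, 1
    while n > 0 and t < T:
        bound = n * (eta + gamma * T)          # dominates the true rate n*(eta + gamma*t) on [0, T]
        t += rng.exponential(1.0 / bound)
        if t >= T:
            break
        rate = n * (eta + gamma * t)
        if rng.random() < rate / bound:        # accept a real event (thinning)
            if rng.random() < eta / (eta + gamma * t):
                n += 1
            else:
                n -= 1
    return n

samples = np.array([arley_population(T) for _ in range(runs)])
print("mean", samples.mean(), "  theory (30)", np.exp(eta * T - 0.5 * gamma * T * T))
```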

(6) The Pólya process.-It is well known that Pólya has devised a scheme of urns which in the simplest way illustrates the occurrence of contagious events, in the sense that with Pólya's scheme each favorable event increases the probability of succeeding favorable events. Pólya's probability distribution has proved exceedingly useful and serves as the prototype of contagious distributions. O. Lundberg (1940) has shown that the full power and flexibility of the Pólya distribution appears only if one passes from discrete drawings from an urn to a stochastic process with a continuous time parameter.
In the original scheme, drawings are made from an urn containing Np white
and Nq black balls (p + q = 1). After each drawing the ball is replaced and,
in addition, Nδ balls of the color last drawn are added to the urn. If X is the
number of white balls in n successive drawings, then

Pr(X = k) = (n choose k) p(p + δ)(p + 2δ) ⋯ (p + [k − 1]δ) q(q + δ) ⋯ (q + [n − k − 1]δ) / {1·(1 + δ)(1 + 2δ) ⋯ (1 + [n − 1]δ)},    (33)

and an easy computation shows that
E(X) = np,   σ²(X) = npq(1 + nδ)(1 + δ)^{−1}.    (34)
If we assume that in the first n drawings exactly k white balls have been
drawn (that is, that X = k), the probability of obtaining a white ball in the
next drawing is
(p + kδ) / (1 + nδ).    (35)
One could pass from these formulas to the limit much in the same way as is
customary with the derivation of the Poisson formula from the binomial dis-
tribution. One would imagine that within a finite time interval (O,t) a very
large number, n, of drawings is made but that the numbers p and δ are very small. In order to obtain a reasonable passage to the limit one puts

lim np = t,   lim nδ = td,    (36)

where d is the new parameter of contagion. We obtain then from (33) the following distribution, depending on the continuous time parameter t and on the parameter of contagion d,

P_0(t) = (1 + dt)^{−1/d},
P_k(t) = (t / (1 + dt))^k (1 + d)(1 + 2d) ⋯ (1 + [k − 1]d) / k! · P_0(t).    (37)

Now it seems simpler and more natural to start directly with this process
and to derive its differential equations in the same way as we have obtained
the Poisson distribution, without a passage to the limit. Let the "state" de-
note the number of events during (O,t). Then only passages Ek -*- Ek+l are
possible, and the process is determined by some differential equations of the
form (6). Formula (35) suggests putting

η_k = (1 + kd) / (1 + td).    (38)

It is readily verified that the probability distribution (37) is really the solution of (6) with (38) and thus represents the continuous limiting case of the Pólya scheme.
It is clear that (38) has been chosen in accordance with the original Pólya scheme but represents only one of the many possible choices for the intensity functions. From a purely abstract point of view there is nothing remarkable in the Pólya scheme, but this does not detract from its value. It has been selected judiciously as the simplest scheme of contagion that lends itself to practical purposes. One of the reasons why it is so practicable is the fact that
with (37) the first and the second moment can be fitted separately to empirical observations. Interesting material illustrating this fact will be found in O. Lundberg's book, where the distribution (37) has been derived and applied
to accident and sickness statistics. Arley (1943) used the same distribution
as an approximation to theoretical distributions, which are less usable in
practice.
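That (37) really solves (6) with the intensities (38) can also be confirmed numerically. The sketch below (added here for illustration) integrates the truncated system with a small Euler step and compares the result with the closed form; the contagion parameter, truncation level and step size are arbitrary.

```python
import math
import numpy as np

d, t_end, K, dt = 0.4, 3.0, 60, 1e-4      # arbitrary contagion parameter; K truncates the states

P = np.zeros(K + 1)
P[0] = 1.0                                 # at time zero no event has occurred
steps = int(round(t_end / dt))
for i in range(steps):
    t = i * dt
    eta = (1 + np.arange(K + 1) * d) / (1 + t * d)   # intensities (38)
    dP = -eta * P
    dP[1:] += eta[:-1] * P[:-1]
    P += dt * dP

def closed_form(k, t):
    """The Polya distribution (37)."""
    coeff = np.prod(1 + d * np.arange(1, k)) / math.factorial(k) if k else 1.0
    return (t / (1 + d * t)) ** k * coeff * (1 + d * t) ** (-1.0 / d)

for k in range(5):
    print(k, P[k], closed_form(k, t_end))
```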
3. The nature of contagion
Pólya's original urn scheme (as described in the last section) clearly shows a true contagion, and it has become customary to assume that a kind of actual contagion exists in nature whenever Pólya's or a similar distribution shows an excellent fit to observations. As O. Lundberg has pointed out, there is no
justification for this assumption. The argument actually implies that the
distribution (37) is "contagious" simply because it satisfies the equations (6)
with non-constant coefficients (38), or, in other words, because the transition
probabilities depend on previous happenings. In the same sense all the prob-
ability distributions derived above should be (and usually are) classified as
"contagious." However, the equations for our growth processes have been
derived from the fundamental hypothesis that there is no contagion whatso-
ever, each individual being absolutely independent of the others and of the
past. It is interesting to notice that even the Pólya distribution (37) admits
of an interpretation which excludes any notion of contagion. In fact, it has in
this form been anticipated by Greenwood and Yule (1920) in their studies on
industrial accident statistics.
Suppose, in order to fix the ideas, that each individual in a large popula-
tion is liable to accidents: within any short time interval of length dt each
individual has a probability asymptotically equal to η dt to sustain an accident. The parameter η is characteristic for the individual and remains constant in time. The number of accidents of any particular individual during (0, t) is then a random variable distributed according to the Poisson distribution. Now different individuals are characterized by different parameters η, that is to say, they exhibit a variable proneness to accidents. Accordingly, the parameter η becomes itself a random variable, and the probability that an individual taken at random from the entire population should have k accidents during (0, t) is given by

P_k(t) = ∫_0^∞ e^{−ηt} (ηt)^k / k! dU(η),    (39)

where U(η) denotes the distribution function of the random variable η. Greenwood and Yule assumed in particular that η has a Pearson Type III distribution:

dU(η) = c^r η^{r−1} e^{−cη} / Γ(r) dη,   r, c, η > 0.    (40)

In that case, (39) reduces to the Pólya distribution (37).8


8 Cf. O. Lundberg (1940); also Feller (1943).

With the "contagious" interpretation of (37) one is led to believe that for
each individual each accident increases the probability of another accident.
With the new interpretation this probability remains absolutely constant,
an accident in no way influencing the probability of further accidents. It is
not now known which of the models of stochastic processes admit of a similar
double interpretation and which (if any) necessarily mean true contagion. It
seems that none of the distributions which are now used in the literature
permits the conclusion that the phenomenon described is contagious.
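The Greenwood-Yule mechanism (39)-(40) is easily imitated by simulation. In the sketch below (not part of the paper) the individual intensity η is drawn from a gamma law with shape 1/d and scale d — a choice of the constants r and c in (40), assumed here, that makes the mixture coincide with the Pólya distribution (37) — and the number of accidents is then Poisson with mean ηt.

```python
import math
import numpy as np

rng = np.random.default_rng(6)
d, t, runs = 0.4, 3.0, 500_000            # arbitrary contagion parameter and horizon

# Variable proneness: eta has a Pearson Type III (gamma) law, cf. (40); shape 1/d and
# scale d give mean one and make the mixture (39) agree with the Polya distribution (37).
etas = rng.gamma(shape=1.0 / d, scale=d, size=runs)
accidents = rng.poisson(etas * t)

def polya(k):
    coeff = np.prod(1 + d * np.arange(1, k)) / math.factorial(k) if k else 1.0
    return (t / (1 + d * t)) ** k * coeff * (1 + d * t) ** (-1.0 / d)

for k in range(5):
    print(k, (accidents == k).mean(), polya(k))
```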
As an amusing example let us consider football, where the observed num-
ber of goals in individual games shows a clearly "contagious" distribution.
This could be interpreted as true contagion, where each success increases the
probability of further successes; or by assuming that for each team the prob-
abilities remain constant but that the skill (= parameter in a Poisson dis-
tribution) varies from team to team. In most cases a combination of the two
interpretations would probably come nearest to the truth. Incidentally, the
treatment of statistical problems connected with the game of cricket has been
suggested by Seal, in his discussion of the paper by Elderton (1945).
4. Markov processes leading to ordinary differential equations
The considerations of the second section can easily be generalized to the
case where not only transitions from a state Ek to the neighboring states
Ek+l and Ek-l are possible but also transitions from any Ei to any Ek. For
example, in the statistical theory of strength of material one considers9
bundles of threads under tension. The strength of an individual thread is a
random variable. In state Ek the bundle consists of k threads, and it is as-
sumed that each carries the same load. The load plays the role of opera-
tional time and will be denoted by t. As t increases, a moment will be reached
where the weakest among the k constituent threads will break. Now this does
not necessarily mean a transition E_k → E_{k−1}. In fact, if k − 1 threads re-
mained, each of them would now carry the load t/(k - 1), and it is possible
that the weakest among them is too weak to carry this load. It may happen
that among the k - 1 remaining threads one is too weak to carry the load
t/(k - 1), another too weak to carry t/(k - 2), but that there are k - 3
threads able to support the load t/(k - 3). In this case the bundle will pass
from state E_k to E_{k−3}, and similarly the passage to any E_j with j < k is theo-
retically possible. In connection with a discussion of Pareto's law of income
distribution we shall find it natural to consider the changes of income of an
individual as a random process. If the possible incomes are divided into classes
E_1, E_2, ⋯, any transition E_i → E_j becomes imaginable.
For the general theory it is convenient to change the notation slightly and to denote by P_{νk}(τ, t) the (conditional) probability of the system to be at time t in state E_k if at the time τ < t it has been in state E_ν. Except for the last two, all examples of the second section were stationary in time, so that their transition probabilities P_{νk}(τ, t) would depend only on the difference t − τ. With the present notations the P_n of these examples would be denoted by P_{0n} or P_{1n}, respectively.
9 Cf. Daniels (1945).
A process of random transitions from one of the states to another is called
a Markov process if the transitions in non-overlapping time intervals are
statistically independent (so that corresponding probabilities multiply). Con-
sider then an arbitrary moment s (τ < s < t). At time s the system is in some state E_j, and a transition E_ν → E_j → E_k has, by assumption, probability P_{νj}(τ, s) P_{jk}(s, t). Therefore

P_{νk}(τ, t) = Σ_j P_{νj}(τ, s) P_{jk}(s, t).    (41)

This is the fundamental identity which may be taken as the description of


Markov processes with a simple sequence of states; it is generally known as
the Chapman-Kolmogoroff equation.
Now we can proceed as before and deduce from (41) differential equations for P_{νk}(τ, t). For that purpose we have again to assume that certain elementary probabilities are known: (i) If at time t the system is in state E_k it will be assumed that the probability for any change during (t, t + dt) is p_k(t) dt + o(dt). The function p_k(t) will be referred to as the intensity for the state E_k. (ii) If at time t the system is in E_k and if a change occurs, the (conditional) probability that this change will take the system into E_n will be denoted by Π_{kn}(t). Obviously the conditions

p_k(t) ≥ 0,   Π_{kn}(t) ≥ 0,   Π_{kk}(t) = 0,   Σ_n Π_{kn}(t) = 1    (42)

must be satisfied. Otherwise the p_k(t) and the Π_{kn}(t) can be chosen arbitrarily. The two assumptions can be combined to read

P_{kk}(t, t + dt) = 1 − p_k(t) dt + o(dt),
P_{kn}(t, t + dt) = p_k(t) Π_{kn}(t) dt + o(dt),   n ≠ k.    (43)

Following Kolmogoroff (1931), it is now easy to deduce two systems of differential equations for the P_{νk}(τ, t). In the "forward equations" both ν and τ are fixed and only k and t are the variables. In a formal way they are obtained from (41), putting s = t − dt, substituting from (43), and passing to the limit. One obtains the ordinary differential equations

∂P_{νk}(τ, t)/∂t = −p_k(t) P_{νk}(τ, t) + Σ_j p_j(t) Π_{jk}(t) P_{νj}(τ, t).    (44)

The parameters ν and τ appear only in the initial conditions, which are obviously

P_{νk}(τ, τ) = 1, if k = ν;   P_{νk}(τ, τ) = 0, if k ≠ ν.    (45)
Conversely, the "backward equations" contain k and t only as parameters and refer to ν and τ as variables. They follow from (41) on putting s = τ + dτ and passing formally to the limit:

∂P_{νk}(τ, t)/∂τ = p_ν(τ) {P_{νk}(τ, t) − Σ_j Π_{νj}(τ) P_{jk}(τ, t)}.    (46)

The initial condition is again (45), with τ replaced by t. The forward system
appears more natural from a physical point of view, but the backward system
is occasionally preferable for technical reasons. It plays also an important
role in connection with questions of reversibility. In the terminology of the
theory of differential equations, each of the systems is the adjoint of the other.
As an example, consider the Poisson process. With the present notations its transition probabilities are

P_{νk}(τ, t) = e^{−η(t−τ)} [η(t − τ)]^{k−ν} / (k − ν)!,  if k ≥ ν;   P_{νk}(τ, t) = 0,  if k < ν.    (47)

By definition of the process p_k(t) = η, Π_{k,k+1}(t) = 1, whereas all other Π_{kn}(t) = 0. The two systems satisfied by (47) are

∂P_{νk}(τ, t)/∂t = −η {P_{νk}(τ, t) − P_{ν,k−1}(τ, t)},
∂P_{νk}(τ, t)/∂τ = η {P_{νk}(τ, t) − P_{ν+1,k}(τ, t)}.    (48)

In the general case the situation is of course less obvious.
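For a finite (or truncated) set of states the forward system (44) can be integrated numerically without further theory. The sketch below (an added illustration, not from the paper) does this for the stationary Poisson coefficients p_k = η, Π_{k,k+1} = 1 and compares the result with the explicit transition probabilities (47); the step size and truncation are arbitrary numerical devices.

```python
import math
import numpy as np

eta, s, K, dt = 1.5, 2.0, 40, 1e-4        # arbitrary intensity; K truncates the state space

# Stationary coefficients p_k = eta, Pi_{k,k+1} = 1, so (44) reduces to
#   dP_{vk}/dt = -eta * P_{vk} + eta * P_{v,k-1}.
nu = 0                                     # initial state E_0, cf. (45)
P = np.zeros(K + 1)
P[nu] = 1.0
for _ in range(int(round(s / dt))):
    dP = -eta * P
    dP[1:] += eta * P[:-1]
    P += dt * dP

for k in range(nu, nu + 5):
    exact = math.exp(-eta * s) * (eta * s) ** (k - nu) / math.factorial(k - nu)
    print(k, P[k], exact)                  # should agree with (47)
```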


The two systems of differential equations could also be interpreted mechanically by considering containers E_k and a liquid flowing from one to the other
through channels whose conductivity is regulated by the coefficients of the
equations. This interpretation is natural at least in the stationary case where
all the coefficients are constants.
It has been shown"0 that each of the two systems of infinitely many ordinary
differential equations (44) and (46) has a solution which is automatically a
solution of the other and of the fundamental equation (41). In all cases of
practical interest this solution is unique.11 However, there exist particular processes where a phenomenon of "explosion" can occur, completely analogous to that discussed in connection with the example given in subsection (3) above. Necessary and sufficient conditions for the possibility of such an occur-
rence have also been found (loc. cit.). Incidentally, it will be seen in section 6 that our differential equations are only a very special case of a couple of integrodifferential equations describing the general discontinuous Markov process.

10 Cf. Feller (1940).
11 There is in general no uniqueness if the consideration is not restricted to solutions which are meaningful from a probability standpoint. Cf. also footnote 5.
5. Ergodicity
The classical counterparts of our random processes are the discrete Markov
chains, that is to say, random operations such as shuffling of cards, random
walk, transferring a molecule from one container to another, and the like.
With each operation the system passes from one state to another, but the
operations are performed at arbitrary moments. With shuffling cards it seems
plausible, and desirable, that after a large number of operations all permuta-
tions become essentially equally probable, no matter what the initial arrange-
ment has been. If there are infinitely many possible states, one cannot reason-
ably expect that all of them will in the limit become equally probable. Instead,
we shall say that a Markov process with transition probabilities P_{νk}(τ, t) is regularly ergodic if

lim_{t→∞} P_{νk}(τ, t) = P_k    (49)

exists and is independent of ν and τ; in this case the probability distribution for the states tends to a uniquely determined limiting distribution which is independent of the initial state. We shall apply this notion in particular to stationary processes where the transition probabilities depend only on the difference t − τ. It should be noticed that the constants P_k do not necessarily add to unity. This fact is illustrated by the limiting distribution (27). In most physical and technical applications, however, Σ P_k = 1, and the P_k represent the steady state for the system. A non-trivial example connected with
a waiting time problem will be found in section 8.
The main headache with the classical theory was to establish conditions
under which a Markov chain is ergodic. The trouble is due to the fact that in
the discrete case even quite respectable chains are not ergodic but exhibit
periodic fluctuations. For example, a Rayleigh-Pearson random walk consists
of a series of operations each of which carries the moving particle a unit step
to the right or to the left. If the state of the system denotes the distance of
the particle from its original position, an odd number of operations will carry
the system into an odd-numbered state, an even number into an even state.
Therefore the system cannot be ergodic. In view of such difficulties it is re-
markable that stochastic processes with a continuous time parameter behave
much more simply. Any Markov process defined by a system of differential
equations (44) or (46) with constant coefficients will be regularly ergodic,
provided that it is at all possible to reach (in one or more steps) any state from any state.12 More particularly, only ergodic discrete Markov chains can be interpolated by processes with continuous time parameter.13
12 The latter condition is by no means necessary and is given only for simplicity. However, some restriction is obviously required in order to exclude trivial exceptional cases without practical meaning, such as systems where the possible states split into two groups without transitions from one group to the other. This case would actually represent two independent physical systems which have arbitrarily been combined into one.
Even if the process is not regularly ergodic, the limits (49) will exist, but they will depend on ν. If the system of differential equations is finite, the existence of these limits can be established in an elementary way: the solutions P_{νk}(τ, t) are then linear combinations of exponential terms of the form exp(a_i t), and it is not difficult to see that the real part of any non-vanishing a_i is negative. For infinite systems the ergodicity can be established either directly from the explicit form of the solutions or (according to a remark of Doob) as a simple consequence of abstract ergodic theorems.
13 For finite chains, cf. Elfving (1937).
What ergodicity might possibly mean in practice may be illustrated by the
much discussed Pareto law of income distribution. According to this law there
should exist a universal distribution function, depending on one or two param-
eters, which (after an appropriate adjustment of the parameters) would de-
scribe the income distribution within any economic unit (country). It appears
that at present opinions regarding the existence of such a mysterious function are less divided than opinions concerning its precise analytic form. Now, if the
transition of an individual from one finite income class to another is con-
sidered as a random process which is stationary or almost stationary, the
ergodic properties would, to begin with, ensure the existence of a limiting
distribution. Moreover, it seems plausible that this stable limiting distribu-
tion would to a certain degree be insensitive to the changes in numerical values
of the coefficients p_k, Π_{kn}, as long as these retain certain qualitative properties.
In fact, experience with other limit laws in probability would lead one to
expect that a wide class of processes with the same qualitative properties of
the elementary transition probabilities Π_{kn} would lead to the same limiting
distribution in the same sense as the summation of a wide class of different
random variables leads to the normal distribution. If there is any reality to
Pareto's law, such considerations may possibly lead to a better understand-
ing.14 Alternatively, if the hypothesis should prove false, this would not dis-
prove Pareto's claim, but at least it would produce doubts in certain respects.
Another example illustrating the importance of ergodic properties is sup-
plied by statistical mechanics. It is usually proved (by more or less con-
troversial arguments) that the Gauss-Maxwell distribution for the velocities
of gas molecules is the only one having certain properties which are desirable
to the physicist. In itself this does not prove that any gas obeys the Gauss-
Maxwell law. What is really required is the knowledge that, whatever the
initial velocity distribution (which, in fact, is arbitrary), the gas will tend to
a state in which the velocities obey the Gauss-Maxwell law. In classical
statistical mechanics the molecules move in a perfectly deterministic man-
mer, and "statistics" enters into the considerations only in connection with
the initial state. It is well known that the ergodic theory of such systems has
led to very deep and beautiful mathematical results. Seen against this background, the comparative triviality of the ergodic theory for random processes becomes the more remarkable. From the point of view of the physicist it may well pay to introduce randomness from the beginning at a more essential place.

14 The thoughts expressed in the text are perhaps related to, and certainly influenced by, a recent paper by Bernardelli (1944). Starting from deterministic considerations, Bernardelli also arrives at a system of differential equations which may be interpreted as describing a stochastic process. The present author is unable to follow all the mathematical arguments in Bernardelli's paper. In particular, with a division into finitely many classes, it would seem that all income classes should be equally probable, which would contradict the original thesis. This is not the first instance where an oversimplification completely hides the actual mechanism and contradicts all facts and theories; unfortunately, such a contradiction is not always manifest on the surface.
6. Examples of general discontinuous Markov processes
Up to now we have considered only the case where the possible states form
a simple sequence E_1, E_2, ⋯. In general, this will of course not be the situa-
tion. Even in the Pareto case the classification of possible incomes into classes
Ei is an artifice which makes the theoretical considerations in many respects
more complicated. In fact, a classification into a finite number of classes
obscures the essential point.15 It is therefore safer and at the same time more
convenient to characterize the possible states by a real number x ≥ 0. Similar
situations arise frequently. For example, in the collective theory of risk for
insurance companies as developed by F. Lundberg and H. Cramér16 the
transitions of the system are connected with the occurrence of a claim; if a
claim occurs, it may be for any amount. Again the actual state is expressed
by a real number, positive or negative. Particles of ore are subject to splitting,
and such a change can occur at any moment. This is another instance where
it is more convenient not to restrict the consideration to a simple sequence
of possible states. The more general method has been used by Kolmogoroff
(1941) to derive the so-called logarithmico-normal law for the distribution of
particle sizes.
A simple example where the solution can be obtained without recourse to
the general theory is supplied by the theory of transport of stones by rivers,
which has been treated by Pólya.17 It has applications in engineering. A stone
will lie still on the bottom of the river for a relatively long time. For some
reason or other it will sooner or later be set into motion and transported a
stretch X farther down the river. The traveling time is so short that the change
in position can be treated as instantaneous. We are then concerned with a
random process where the state of the system ( = stone) is given by the
distance x from its position, say at t = 0. If one wishes to take into considera-
tion the fact that the river is not homogeneous, the integrodifferential equa-
tion to be developed later is required. In the first approximation, however,
we may, with Pólya, treat the river as homogeneous. Then the probability of a transport taking place in any time interval of length dt will remain the same (independent of the actual position of the stone); let this intensity be denoted by η. Moreover, the distance X_n traveled at the nth step is a random variable and under the hypothesis of homogeneity all X_n have the same distribution function, say U(x). The total distance traveled in the first n steps
is X_1 + X_2 + ⋯ + X_n and, since the X_k are mutually independent, the
distribution function Un(x) of this sum is given by the well known formulas

U_1(x) = U(x),   U_n(x) = ∫ U_{n−1}(x − y) dU(y).    (50)


15 Cf. the preceding footnote.
16 Cf. Segerdahl (1939).
17 Pólya (1937, 1938); H. A. Einstein (1937). The methods given in the text are different.

With the present assumptions, the number N of transports during a time t is


a random variable subject to the Poisson distribution. Accordingly, the dis-
tribution function for the state (that is, the total distance traveled) can be
written down directly. It is the compound Poisson distribution

P(t, x) = Σ_{n=0}^∞ e^{−ηt} (ηt)^n / n! · U_n(x).    (51)
The same distribution has been extensively used by O. Lundberg and plays
an important role in the collective theory of risk. It has also been given in
Khintchine's well-known booklet published in 1933 (p.23). For many pur-
poses it is simpler to pass from (51) to the corresponding equation for the
characteristic functions. According to the central limit theorem the U_n(x) tend,
with increasing n, to a normal distribution. For larger values of t the expres-
sion (51) can therefore be approximated in an exceedingly simple way and
the final result depends essentially only on three parameters: η, and the first
two moments of the distribution function U(x).
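The structure of (51) is conveniently illustrated by simulation: the number of transports in (0, t) is Poisson with mean ηt and the displacements are added independently. In the sketch below (not from the paper) the displacement law U is taken exponential purely as an assumption for the illustration, and the first two moments of the total distance are compared with ηt times the first two moments of U.

```python
import numpy as np

rng = np.random.default_rng(7)
eta, t, mean_jump, runs = 0.5, 10.0, 2.0, 200_000   # arbitrary; U taken exponential with mean 2

n_moves = rng.poisson(eta * t, size=runs)            # number of transports in (0, t)
total = np.array([rng.exponential(mean_jump, size=n).sum() for n in n_moves])

m1, m2 = mean_jump, 2.0 * mean_jump ** 2             # first two moments of U
print("mean", total.mean(), "  theory", eta * t * m1)
print("var ", total.var(), "  theory", eta * t * m2)
```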
The case where the possible states of the system are determined by two,
three, or more real numbers, that is to say, by a point in the plane or in a
Euclidean space, presents no essential novelty. However, even more general
cases present themselves. In connection with the theory of automatic tele-
phone exchanges one is led to consider units in which certain conversations
may be going on; the state will be described by stating the duration of each
of the conversations and the number of people in the waiting line: the state is
therefore given by a system of real numbers plus an integer. In Arley's theory
of cosmic rays one is interested in the number not only of electrons but also
of photons; the state is then characterized by two integers. If energy is to be
taken into account, the situation becomes more complicated.
It is true that all these examples can, more or less artificially, be reduced
to the case where the state is described by a point in a Euclidean space or
even by a point on the real axis. However, there is no analytic gain in this
procedure and we shall therefore write the equations in their general form.
The reader can always interpret the space to mean the real axis.

7. The integrodifferential equations of Markov processes


We shall denote a possible state by a simple symbol such as x or ξ, which
thus may stand for real numbers, points in space, or anything else. The set E
of all possible states is, in accordance with physical terminology, called the
"phase space." Instead of the transition probabilities Ppk(Tt) we are now led
to consider transition probabilities of the form P(Tr,Z; tF) where T < t are,
as before, the time parameters, Z is an arbitrary state, and F an arbitrary set
of states (which may also consist of a single state). This P(r,(; tP) denotes the
(conditional) probability that at time t the system will be in some state in-
cluded in F, if at time r it has been in state t. As an example, consider the case
where E is the real axis. The transition from a point t into a point x will in
general have probability zero; however, in many cases a density p(τ, ξ; t, x) for such transitions will exist. If then Γ is an interval a < x < b, we shall have

P(τ, ξ; t, Γ) = ∫_a^b p(τ, ξ; t, x) dx.    (52)
On the other hand, the case where the possible states form a simple sequence E_1, E_2, ⋯ can also be interpreted on the real axis by letting the integer x = k stand for the state E_k. Then with our previous notations

P(τ, ξ; t, k) = P_{ξk}(τ, t),    (53)

whereas transitions into any set (interval) not containing an integer are impossible (have probability zero). Equation (53) has an actual meaning only when ξ is an integer, since the state can never assume other values.
The theory of Markov processes hinges on the general Chapman-Kolmogo-
roff equation analogous to (41) and expressing the fundamental assumption
that the changes of the system in any time interval (s, t) are statistically independent of the changes in the previous interval (τ, s). It reads now

P(τ, ξ; t, Γ) = ∫_E P(τ, ξ; s, dE_y) P(s, y; t, Γ).    (54)

We can now proceed as before and introduce elementary probabilities analogous to the p_k(t) and Π_{kn}(t) of section 4. Accordingly, we shall assume:
1. If the system is at time t in state x, there is a probability p(t, x) dt + o(dt) that any change of state occurs during (t, t + dt).
2. If a change occurs, the probability that it will take the system into a state included in the set Γ is given by the distribution function Π(t, x, Γ). This function must, of course, satisfy the conditions of respectability, such as being non-negative, Π(t, x, E) = 1, etc. Analytically the two conditions mean simply that, whenever x is not included in Γ, then

P(t, x; t + dt, Γ) = p(t, x) Π(t, x, Γ) dt + o(dt),    (54′)

whereas

P(t, x; t + dt, x) = 1 − p(t, x) dt + o(dt).    (54″)

Under some weak regularity conditions on the functions p(t, x) and Π(t, x, Γ) it is possible to show [Feller (1940)] that the transition probabilities satisfy
two integrodifferential equations which now take the place of (44) and (46):
The forward equation

∂P(τ, ξ; t, Γ)/∂t = −∫_Γ p(t, y) P(τ, ξ; t, dE_y) + ∫_E p(t, y) Π(t, y, Γ) P(τ, ξ; t, dE_y),    (55)
and the backward equation

∂P(τ, ξ; t, Γ)/∂τ = p(τ, ξ) {P(τ, ξ; t, Γ) − ∫_E P(τ, y; t, Γ) Π(τ, ξ, dE_y)}.    (56)

Again the common boundary condition is given by the obvious requirement that, as t − τ approaches zero, P(τ, ξ; t, Γ) tends to one or zero according as Γ does or does not include ξ. With (55), τ and ξ are parameters occurring only in the initial conditions, and conversely (56) represents an equation in the variables τ and ξ only.
As an illustrative example let us return to Pólya's problem of the transport of stones which led to the solution (51). The states were there given by numbers x ≥ 0, and by assumption for any x ≥ 0, t > 0

p(t, x) = η = const.    (57)

For the set Γ we shall naturally take an interval, Γ = (0, x). Then, by definition,

Π(t, ξ, Γ) = U(x − ξ), if ξ ≤ x;   Π(t, ξ, Γ) = 0, if x < ξ.    (58)

Obviously the transition probabilities depend only on the differences t − τ and x − ξ, and we can in this case write P(τ, ξ; t, Γ) = P(t − τ; x − ξ). Equations (55) and (56) take on the form

∂P(t − τ; x − ξ)/∂t = −η {P(t − τ; x − ξ) − ∫ U(x − y) d_y P(t − τ; y − ξ)}    (55′)

and

∂P(t − τ; x − ξ)/∂τ = η {P(t − τ; x − ξ) − ∫ P(t − τ; x − y) dU(y − ξ)}.    (56′)

Introducing more natural variables of integration we see that these two equations actually represent the same equation, which fortunately is of the simple "renewal" type. Its integration presents no difficulty, and one arrives also in this way at the solution (51), where t has naturally to be replaced by t − τ and x by x − ξ.
Returning to the general case (55) and (56) we shall mention only that the
existence and uniqueness of solutions has been established18 and the situation
is in every respect similar to that described in connection with the differential
equations (44) and (46).
18 Feller (1940); under somewhat weaker hypotheses, also Feller (1936), where a derivation
of the equations is given.
The integrodifferential equations (55) and (56) seem to be of a type not studied before. They present some general mathematical interest inasmuch
as they contain as special cases many functional equations of unexpected
types. To begin with, the systems of infinitely many differential equations
(44) and (46) are special cases which one obtains by taking for E the real
axis and supposing that all probability mass is concentrated in the points
x = 1, 2, * * * (or, even more naturally, letting E be the abstract space con-
sisting of the denumerably many points Ek). An equation of quite different
character is obtained by again taking for E the real axis but putting

Π(t, x, Γ) = 1, if x − q is contained in Γ;   Π(t, x, Γ) = 0, otherwise;    (59)
here q may be a constant or an arbitrary function. Equation (56) then reads

∂P(τ, ξ; t, Γ)/∂τ = p(τ, ξ) {P(τ, ξ; t, Γ) − P(τ, ξ − q; t, Γ)}.    (60)

Now t and Γ occur herein only as parameters, and the equation is really of
the type of a difference-differential equation

∂U(τ, ξ)/∂τ = p(τ, ξ) {U(τ, ξ) − U(τ, ξ − q)},    (61)

where p and q are arbitrarily prescribed. By virtue of our theory, equation


(61) has an adjoint which is obtained by substituting from (59) into (55); the
two equations are, in a certain sense, equivalent, although they are very
different in appearance. Taking for E a finite or infinite system of lines one
can similarly obtain finite or infinite systems of equations analogous to (61):

∂U_i(τ, ξ)/∂τ = Σ_k p_{ik}(τ, ξ) U_k(τ, ξ − q_k).    (62)


If we select for Π(τ, ξ, Γ) a function with more than one jump it is possible to replace the right-hand side in (61) by arbitrary linear combinations of terms U(τ, ξ − q_k),
and the like. It is evident that our existence and uniqueness theorems apply
to these equations only to the extent that the latter represent stochastic
processes, which imply restrictions such as positivity, bounded variation, and
the like.
8. Waiting time and telephone problems
Although we have here been considering only Markov processes, it must
continually be borne in mind that they represent only a special case of the
general stochastic process. As in classical mechanics the present state com-
pletely determines the future of the system, so in a Markov process the
present state determines all future probability relations. The past history of
the system has an influence only in so far as it has produced the present state
("influence globale," in P6lya's terminology). Accordingly, a Markov process
is completely determined when its transition probabilities are known. Now
if a man arrives at a counter and observes three people in the waiting line,
this knowledge of the present state, in itself, does not permit him to compute
the probability distribution of his waiting time. The latter depends in an
essential manner on how the present state has developed, namely, on the time
elapsed since the moment when the customer actually being served got access
to the counter. Thus, at least if "state of a system" is defined so that it can
really be observed, the problem of waiting times does not lead to a Markov
process. (For possible redefinitions of the notion of state, cf. the next section.)
Fortunately, there exists an artifice which is often justified in practice and
which reduces waiting time and similar problems to a Markov process. The
time T which it takes to serve a customer is a random variable and will be
called the "holding time." Let then
F(t) = Pr(T ≤ t)    (63)
be the distribution function of the holding time. If it is known that the present
customer has been at the counter for exactly t time units, the probability that
he will leave within the next dt time units is, up to terms of higher order, equal
to dF(t)/[1 − F(t)], provided that the derivative F'(t) exists. Consider now
the particular case of exponential holding times, that is,
F(t) = 1 − e^{−ct},   c > 0.    (64)
In this case the probability just considered is independent of t and equals
cdt: thus, with (64), if the counter is at any time occupied, the probability
that the holding time will terminate within the next time interval of length dt
is c dt + o(dt) and independent of what has happened before. This is exactly the
situation with the telephone conversations of certain people, for whom time
plays no role and the probability of continuing is independent of how long
the conversation has already been going on.
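
The lack-of-memory property expressed by (64) is easily confirmed by simulation. In the following Python sketch the rate c and the times t and s are arbitrary; the estimated conditional probability Pr{T > t + s | T > t} is compared with Pr{T > s} and with e^{-cs}.

```python
import numpy as np

rng = np.random.default_rng(0)
c, t, s = 2.0, 0.7, 0.4                             # arbitrary illustrative values
T = rng.exponential(scale=1.0 / c, size=1_000_000)  # holding times, F(t) = 1 - exp(-c t)

conditional = np.mean(T[T > t] > t + s)             # Pr{T > t + s | T > t}
unconditional = np.mean(T > s)                      # Pr{T > s}
print(conditional, unconditional, np.exp(-c * s))   # the three numbers agree closely
```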
The importance of exponential holding times is emphasized in the well-
known book of T. C. Fry (1928), where several stochastic processes pertaining
to the theory of telephone exchanges are discussed.19 Many practical problems
can at present be solved only under this simplifying assumption. Fortunately,
this assumption is in many cases less artificial and more justified than would
appear at first glance. For example, if many trunk lines serve the same cus-
tomers, a waiting line will form only if all these lines are busy. In that event
one is not so much interested in when a given line will become free, but rather
when any line will become free. The waiting time is in this case given by the
smallest among many random variables, and it follows from known limit
theorems that under fairly general conditions the corresponding distribution
function is of type (64).
19 Fry generally discusses only the steady state. Many other problems will be found in
Palm's book.
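
The remark about the smallest of many random variables can also be illustrated numerically. In the sketch below the residual occupation times of the individual lines are taken uniform on (0, 1), purely as an assumed example; the tail of their minimum is then compared with an exponential tail of the same mean.

```python
import numpy as np

rng = np.random.default_rng(1)
n_lines, n_trials = 50, 200_000
# assumed residual occupation times: uniform on (0, 1) for each line
residuals = rng.uniform(0.0, 1.0, size=(n_trials, n_lines))
wait = residuals.min(axis=1)              # waiting time = smallest of many variables

lam = 1.0 / wait.mean()                   # exponential law fitted by matching the mean
for x in (0.01, 0.02, 0.05):
    print(x, np.mean(wait > x), np.exp(-lam * x))   # empirical tail vs exponential tail
```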

As an example, let us consider the simplest problem in telephone traffic


[C. Palm (1943)]: Infinitely many trunk lines are available and calls arrive
at a constant intensity η [the number of calls being distributed according to
the Poisson law (4)]. Every new call is directed to a free trunk line, and the
holding time (length of the ensuing conversation) is distributed according to
(64). Required is the probability P_{ik}(t) that k trunk lines will be busy at time
t if initially i lines were busy. If at any time exactly n lines are busy, the prob-
ability of any change in state during the next time interval of length dt is
(nc + η)dt + o(dt); here the term nc accounts for the probability that a busy
line is set free, the term η for new calls. The differential equations of our
problem are therefore

P'_{i0}(t) = -\eta P_{i0}(t) + c\,P_{i1}(t),
P'_{ik}(t) = -(\eta + kc)\,P_{ik}(t) + \eta\,P_{i,k-1}(t) + (k+1)c\,P_{i,k+1}(t). \qquad (65)

Alternatively, we could use the "backward equations"


P'_{0k}(t) = -\eta P_{0k}(t) + \eta\,P_{1k}(t),
P'_{ik}(t) = -(\eta + ic)\,P_{ik}(t) + ic\,P_{i-1,k}(t) + \eta\,P_{i+1,k}(t). \qquad (66)
The initial conditions are, in either case,

P_{ik}(0) = \begin{cases} 1, & \text{if } k = i,\\ 0, & \text{if } k \neq i. \end{cases}
The solution is

P_{ik}(t) = \exp\Bigl[-\frac{\eta}{c}\,(1-e^{-ct})\Bigr]\,(1-e^{-ct})^{i}\sum_{r=0}^{\min(i,k)}\frac{1}{(k-r)!}\Bigl[\frac{\eta}{c}\,(1-e^{-ct})\Bigr]^{k-r}\binom{i}{r}\Bigl(\frac{e^{-ct}}{1-e^{-ct}}\Bigr)^{r}. \qquad (67)
In particular,

P_{0k}(t) = \exp\Bigl[-\frac{\eta}{c}\,(1-e^{-ct})\Bigr]\cdot\frac{1}{k!}\Bigl\{\frac{\eta}{c}\,(1-e^{-ct})\Bigr\}^{k}, \qquad (68)


which is simply a Poisson distribution with parameter η(1 − e^{-ct})/c. Palm has
also treated the more general case of intensities varying in time with ensuing
periodic oscillations for the required probabilities.
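
For a numerical check, the following Python sketch integrates a truncated version of the forward system (65), with arbitrarily chosen η and c and with i = 0 initially busy lines, and compares the result with the Poisson form (68); the truncation level and the step size are assumptions of the sketch only.

```python
import numpy as np
from math import exp, factorial

eta, c = 3.0, 1.0                      # arbitrary call intensity and holding-time rate
K = 40                                 # truncation: states with 0..K busy lines
t_end, dt = 2.0, 1e-4

P = np.zeros(K + 1); P[0] = 1.0        # start with i = 0 busy lines
k = np.arange(K + 1)
for _ in range(int(t_end / dt)):       # explicit Euler for the truncated system (65)
    dP = -(eta + k * c) * P
    dP[1:] += eta * P[:-1]             # a new call: k-1 -> k
    dP[:-1] += (k[1:] * c) * P[1:]     # a line set free: k+1 -> k
    P = P + dt * dP

m = (eta / c) * (1.0 - exp(-c * t_end))            # Poisson parameter as in (68)
poisson = [exp(-m) * m**j / factorial(j) for j in range(6)]
print(np.round(P[:6], 5))
print(np.round(poisson, 5))
```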
If we assume exponential holding times, the problem of waiting lines with
one or more counters can be treated in a similar way, but the corresponding
differential equations are less easy to handle. However, at least the ergodic
steady-state limit is easy to obtain.
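
As an illustration of the last remark, the following sketch finds the steady state of a single counter under the usual assumptions (Poisson arrivals of intensity η < c, exponential holding times with rate c) by solving a truncated set of balance equations; the parameters and the truncation are arbitrary, and the geometric distribution printed for comparison is the familiar closed-form answer in this special case.

```python
import numpy as np

eta, c, K = 0.6, 1.0, 200               # eta < c is needed for a proper steady state

Q = np.zeros((K + 1, K + 1))            # generator of the truncated queue-length process
for n in range(K):
    Q[n, n + 1] = eta                   # an arrival lengthens the waiting line
    Q[n + 1, n] = c                     # a completed service shortens it
np.fill_diagonal(Q, -Q.sum(axis=1))

A = np.vstack([Q.T, np.ones(K + 1)])    # stationary pi solves pi Q = 0 with sum(pi) = 1
b = np.append(np.zeros(K + 1), 1.0)
pi = np.linalg.lstsq(A, b, rcond=None)[0]

rho = eta / c
print(np.round(pi[:5], 4))
print(np.round([(1 - rho) * rho**n for n in range(5)], 4))   # geometric steady state
```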

9. Non-Markov processes
The definition of a Markov process depends on what is meant by the state
of a system, and the question may well be asked whether it is not always pos-
sible to redefine the notion of state in such a way that the process becomes a
Markov process. In theory this is possible. For example, in the waiting time
problem of the beginning of the last section one might define the state so as
to include not only the number of people in the waiting line but also the
moment when the customer being served got access to the counter; the state
would then be characterized by an integer and a real number. This would be
in accordance with the procedure of mechanics where the primitive notion of
state would include only the positions of particles, not their velocities; the
latter have been included only for the convenience of having the present state
determine the future. On the other hand, the new definition would have the
disadvantage of the state not being directly observable. More important is
that in most cases our integrodifferential equations would no longer hold be-
cause the intensity function p(t,x) shows too strong discontinuities. The new
equations can usually be written down, but they are so complicated that it is
doubtful whether anything is gained. It is interesting to illustrate the situa-
tion by means of the simplest and, perhaps, most important process which is
not of the Markov type.
We shall consider a process which is best known from the so-called "re-
newal theory," although exactly the same type and the same integral equa-
tion appear in many other applications, some of them even more important
than the renewal theory. In order to explain the process and its practical
meaning in the very simplest case, let us consider a certain unit (such as a
machine or bulb) which, when installed, has a certain probability density φ(t)
to last for a time t; in other words, the time of service of our unit is a random
variable with the elementary distribution function φ(t). When the lifetime of
the unit expires, it is immediately replaced by a similar unit. Required is the
probability density u(t) for the necessity of a replacement at time t. Now
such a replacement can occur only in one of two mutually exclusive ways.
Either the original unit is being replaced, or it has been replaced at some time
s < t and one of its successors is to be replaced at time t - s after the in-
stallation of the first successor. Therefore

u(t) = \varphi(t) + \int_{0}^{t} u(t-s)\,\varphi(s)\,ds. \qquad (69)

This is the well-known integral equation of renewal theory,20 whose integra-


tion presents no particular difficulties.
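One simple way of carrying out this integration is to discretize the convolution in (69). The Python sketch below does so for an assumed lifetime density φ (a Gamma density with mean 2, chosen only for illustration) and exhibits the approach of u(t) to the reciprocal of the mean lifetime.

```python
import numpy as np

# Sketch only: step-by-step solution of the discretized renewal equation (69),
#     u(t) = phi(t) + integral_0^t u(t - s) phi(s) ds.
h, t_end = 0.01, 30.0
t = np.arange(0.0, t_end + h, h)
phi = t * np.exp(-t)                   # assumed lifetime density: Gamma(2, 1), mean 2

u = np.zeros_like(t)
u[0] = phi[0]
for n in range(1, len(t)):
    # rectangle rule for the convolution integral
    u[n] = phi[n] + h * np.dot(u[n - 1::-1], phi[1:n + 1])

print(u[-1], 0.5)                      # u(t) tends to 1 / (mean lifetime) = 1/2
```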
In order to treat the same process as a Markov process we would define
the state of the system at time t to be given by the moment x < t of installa-
tion of the unit which actually serves at time t. The transition probabilities
could be expressed by means of one symbol, but it is more convenient to use
20 Cf., for example, Feller (1941).
two. If at time τ the state is ξ, then at a later time t > τ the state may still be
ξ: the probability for this event we shall denote by p(τ,ξ,t). If the state has
changed, the new state will be given by a number x with τ < x < t. There
exists a corresponding probability density which will be denoted by p(τ,ξ; t,x).
These two functions together give all transition probabilities, since at time t
any state with x < ξ, ξ < x < τ, or x > t is impossible. Now it is seen that
these transition probabilities have a bad discontinuity at the particular point
x = t, which is not even fixed in the phase space. The result is that the forward
equation (55) alone does not describe the process. It yields in the present case
\frac{\partial p(\tau,\xi,t)}{\partial t} = -\,\frac{\varphi(t-\xi)}{1-\Phi(t-\xi)}\,p(\tau,\xi,t), \qquad (70)
and

\frac{\partial p(\tau,\xi;\,t,x)}{\partial t} = -\,\frac{\varphi(t-x)}{1-\Phi(t-x)}\,p(\tau,\xi;\,t,x), \qquad (71)

where \Phi(x) = \int_{0}^{x}\varphi(y)\,dy. From the singularity at x = t we obtain another


equation, namely,
p(\tau,\xi;\,t,x)\big|_{x=t} = \frac{\varphi(t-\xi)}{1-\Phi(t-\xi)}\,p(\tau,\xi,t) + \int_{\tau}^{t}\frac{\varphi(t-z)}{1-\Phi(t-z)}\,p(\tau,\xi;\,t,z)\,dz. \qquad (72)

It can again be shown that these three equations together completely de-
termine the transition probabilities. On the other hand, these transition prob-
abilities can also be written down in terms of u(t), and the equation (69) has
certain advantages over the system (70) to (72).
It may be of interest to remark that many counting mechanisms and other
simple apparatus change an incoming Markov process into an outgoing non-
Markov process of the renewal type, which is described by (69). For example
consider a trunk line or other unit in a telephone exchange. An incoming call
is served by this line if the latter is free; otherwise it is directed to a second
line. Suppose that the incoming calls are Markovian, or distributed according
to the Poisson law. It is easily seen that the outgoing traffic of the first line,
which is the incoming traffic of the second line, is no longer Markovian: in-
stead, the time between consecutive calls will be regulated by an equation
of type (69). For each consecutive line one integral equation of that type has
to be solved. For a detailed analysis we refer the reader to Palm's book.
Similar remarks apply to Geiger-Müller counters, where, owing to the re-
solving time of the counter, not all events are actually counted.
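
The non-Markovian character of the overflow traffic is easy to see in a simulation: with Poisson input and exponential holding times on the first line, the intervals between successive overflowing calls have a squared coefficient of variation well above 1, whereas an exponential law would give exactly 1. The parameters in the sketch below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, mu, n_arrivals = 1.0, 1.0, 500_000      # arbitrary arrival and holding-time rates

arrivals = np.cumsum(rng.exponential(1.0 / lam, n_arrivals))   # Poisson input stream
free_at = 0.0
overflow_times = []
for a in arrivals:
    if a >= free_at:                         # first line idle: the call is accepted
        free_at = a + rng.exponential(1.0 / mu)
    else:                                    # first line busy: the call overflows
        overflow_times.append(a)

gaps = np.diff(overflow_times)               # intervals of the outgoing (overflow) stream
print(round(gaps.var() / gaps.mean() ** 2, 3))   # about 1.5 here; exponential would give 1
```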
10. The connection with diffusion processes
In this paper we have been considering only the "purely discontinuous" type
of Markov process: in a small time interval there is an overwhelming prob-
ability that the state will remain unchanged; however, if it changes, the change
may be radical. The other extreme, that of a "purely continuous" process, is
represented by diffusion and by Brownian motion; there it is certain that
some change will occur in any time interval, however small; only, here it is
certain that the changes during small time intervals will be also small. Con-
sider, for simplicity, a one-dimensional diffusion process. It is most convenient
to consider densities of the transition probabilities and to let u(τ,ξ; t,x) denote
the probability density of finding the particle at time t at the place x if it is
known that at a previous time τ it has been at ξ. The Chapman-Kolmogoroff
equation expressing that the changes in position during time intervals (τ,s)
and (s,t) are independent now reads

u(\tau,\xi;\,t,x) = \int_{-\infty}^{+\infty} u(\tau,\xi;\,s,y)\,u(s,y;\,t,x)\,dy, \qquad (73)


where of course τ < s < t. The forward and backward equations of the dis-
continuous processes have a counterpart in the so-called Fokker-Planck equa-
tions for u(τ,ξ; t,x), which are familiar to the physicists and have served as a
model to Kolmogoroff in laying the foundations of the general theory. In
order to derive these equations we have again to introduce natural assump-
tions concerning the limits of the transition probabilities for very short time
intervals. To begin with, we shall assume that the mathematical expectation
of the displacement of the particle during a time interval of length dt is of
the order of magnitude of dt; more precisely, we shall assume that if the
particle is at time t at the place x, this mathematical expectation satisfies the
relation
\lim_{dt\to+0}\frac{1}{dt}\int_{-\infty}^{+\infty} u(t,x;\,t+dt,\,y)\,(y-x)\,dy = b(t,x), \qquad (74)

where the function b(t,x) characterizes the direction and intensity of diffusion
at place x. In the case of a symmetric diffusion, b(t,x) naturally vanishes. The
second and last assumption is that the variance of the displacements satisfies
a relation analogous to (74):
\lim_{dt\to+0}\frac{1}{dt}\int_{-\infty}^{+\infty} u(t,x;\,t+dt,\,y)\,(y-x)^{2}\,dy = a(t,x). \qquad (75)

Under these conditions,21 u(τ,ξ; t,x) satisfies the "forward equation"


u_{t}(\tau,\xi;\,t,x) = \tfrac{1}{2}\,[a(t,x)\,u(\tau,\xi;\,t,x)]_{xx} - [b(t,x)\,u(\tau,\xi;\,t,x)]_{x}, \qquad (76)
and the "backward equation"
-u_{\tau}(\tau,\xi;\,t,x) = \tfrac{1}{2}\,a(\tau,\xi)\,u_{\xi\xi}(\tau,\xi;\,t,x) + b(\tau,\xi)\,u_{\xi}(\tau,\xi;\,t,x); \qquad (77)
21 Feller (1936). There it is also shown that each of the differential equations (76) and (77)
has a unique solution which is automatically a solution of the other equation and of the
Chapman-Kolmogoroff equation, and which satisfies all prescribed conditions, such as (74),
(75), etc.
here subscripts denote differentiation. Again (76) is an equation in the vari-
ables t and x where τ and ξ enter only as parameters occurring in the initial
condition; conversely, (77) is an equation in the variables τ and ξ. The initial
condition expresses the simple fact that as the difference t − τ approaches
zero the probability of any finite displacement tends to zero: in other words,
for every fixed ε > 0,
\int_{\xi-\epsilon}^{\xi+\epsilon} u(\tau,\xi;\,t,y)\,dy \to 1. \qquad (78)

The most familiar case is that of a homogeneous diffusion where b(t,x) = 0


and a(t,x) is a constant which can be chosen as 2. Then

u(\tau,\xi;\,t,x) = \frac{1}{\sqrt{4\pi(t-\tau)}}\,\exp\left\{-\frac{(x-\xi)^{2}}{4(t-\tau)}\right\} \qquad (79)

is the familiar Gaussian distribution.
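
That (79) satisfies the forward equation (76) with b = 0 and a = 2, which then reduces to u_t = u_{xx}, can be confirmed by a quick finite-difference check; the test point and the increments below are arbitrary.

```python
import numpy as np

def u(tau, xi, t, x):
    # Gaussian kernel (79) for b = 0, a = 2
    return np.exp(-(x - xi) ** 2 / (4.0 * (t - tau))) / np.sqrt(4.0 * np.pi * (t - tau))

tau, xi, t, x, h = 0.0, 0.3, 1.2, 0.9, 1e-4          # arbitrary test point and increment
u_t = (u(tau, xi, t + h, x) - u(tau, xi, t - h, x)) / (2 * h)
u_xx = (u(tau, xi, t, x + h) - 2 * u(tau, xi, t, x) + u(tau, xi, t, x - h)) / h ** 2
print(u_t, u_xx)                                     # the two values agree to small error
```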


It is both of theoretical interest and of practical importance that many
empirical phenomena can be described as well by a discontinuous as by a
continuous model. For example, a biological population changes size only in
jumps, individuals being born or dying. Nevertheless, if the population is
large, its size can in the usual manner be treated as a continuous function. It
is then only a matter of analytical convenience whether the process is to be
treated as a diffusion-type random process or handled by one of the models
considered in this paper. In many cases the diffusion equation lends itself
better to analytical treatment [Feller (1939)].
From these considerations it follows that, for practical purposes, it is pos-
sible to replace the differential equations of diffusion (76) and (77) by systems
of ordinary differential equations. In fact, if the real axis is divided into small
intervals of length δ, we may in the first approximation neglect the precise
position of the particle and consider only the interval in which it is contained.
We have then a simple sequence of states E_k, where E_k stands for the fact
that the particle is contained in the interval kδ < x < (k + 1)δ. We may
then, by the method of section 4, consider the transition probabilities P_{νk}(τ,t),
which will satisfy a system of infinitely many ordinary differential equations,
say (46). One would then expect that as δ → 0 the solutions P_{νk}(τ,t) will
converge toward a solution u(τ,ξ; t,x) of an equation of type (77). In other words,
passing to the limit δ → 0 so that

\nu\delta \to \xi, \qquad k\delta \to x, \qquad (80)

one would expect that

\delta^{-1} P_{\nu k}(\tau,t) \to u(\tau,\xi;\,t,x). \qquad (81)
Let us consider this passage to a limit in a purely formal and heuristic way.
As the length δ of the individual intervals decreases, the probability of passing
from one into another increases, and we have to assume that the intensities
of the discontinuous process (46) satisfy p_{\nu}(t)\,\delta \to 1. If we wish to come from
(46) to (77) it is necessary to assume that the mathematical expectation and
the variance of the displacement in the discontinuous process tend to the cor-
responding quantities for the diffusion process (77), or, in other words, we have to
assume that
\sum_{k}\Pi_{\nu k}\,(k-\nu)\,\delta \to b(\tau,\xi), \qquad (82)
\sum_{k}\Pi_{\nu k}\,(k-\nu)^{2}\,\delta^{2} \to a(\tau,\xi). \qquad (83)
Finally, since in small time intervals small changes are preponderant, we are
led to assume that
\sum_{k}\Pi_{\nu k}\,|k-\nu|^{3}\,\delta^{3} \to 0, \qquad (84)
a relation which, under certain regularity conditions, would be implied by (83).
It is now easy to say what becomes of the individual terms in (46). Putting
y = jδ we obtain from (81)
\delta^{-1} P_{jk}(\tau,t) \approx u(\tau,y;\,t,x) \qquad (85)
= u(\tau,\xi;\,t,x) + (y-\xi)\,u_{\xi}(\tau,\xi;\,t,x) + \frac{(y-\xi)^{2}}{2}\,u_{\xi\xi}(\tau,\xi;\,t,x) + \cdots
= u(\tau,\xi;\,t,x) + (j-\nu)\,\delta\,u_{\xi}(\tau,\xi;\,t,x) + \frac{(j-\nu)^{2}\delta^{2}}{2}\,u_{\xi\xi}(\tau,\xi;\,t,x) + \cdots
Substituting from (85) into (46) and remembering (82) and (83), we see that
(46) passes formally into (77).
It appears therefore that every continuous process can be considered as a
limiting case of discontinuous processes, and that solutions of partial differen-
tial equations of form (76) and (77) can be approximated by solutions of sys-
tems of infinitely many ordinary differential equations of type (44) and (46).
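
In the simplest case b = 0, a = 2 this approximation is easy to exhibit numerically: giving the particle jump intensity 1/δ² to each neighbouring interval makes the corresponding system of ordinary differential equations a discrete heat equation, and δ⁻¹P_{νk}(τ,t) approaches the Gaussian (79). The truncation, grid, and time step in the following sketch are of course arbitrary.

```python
import numpy as np

delta, L = 0.1, 6.0                       # cell width and spatial truncation (arbitrary)
x = np.arange(-L, L + delta, delta)
N = len(x)
rate = 1.0 / delta**2                     # jump intensity to each neighbour: a = 2, b = 0

P = np.zeros(N)
P[N // 2] = 1.0                           # the particle starts in the cell at xi = 0

t_end, dt = 0.5, 1e-4                     # dt < delta**2 / 2 keeps the Euler steps stable
for _ in range(int(t_end / dt)):
    lap = np.zeros(N)
    lap[1:-1] = P[:-2] - 2 * P[1:-1] + P[2:]   # P'_k = rate * (P_{k-1} - 2 P_k + P_{k+1})
    P += dt * rate * lap

gauss = np.exp(-x**2 / (4 * t_end)) / np.sqrt(4 * np.pi * t_end)   # the kernel (79)
print(np.max(np.abs(P / delta - gauss)))  # delta^{-1} P_k approximates u(0, 0; t, x)
```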

REFERENCES
ARLEY, N.
1934. "On the Theory of Stochastic Processes and Their Application to the Theory of Cosmic
Radiation." 240 pp. Copenhagen.
ARLEY, N. and V. BORCHSENIUS
1945. "On the theory of infinite systems of differential equations and their application to
the theory of stochastic processes and the perturbation theory of quantum me-
chanics," Acta Mathematica, vol. 76, pp. 261-322.
BERNARDELLI, H.
1944. "The stability of income distributions," Sankhya, vol. 6, pp. 351-362.
CHANDRASEKHAR, S.
1943. "Stochastic problems in physics and astronomy," Reviews of Modern Physics, vol. 15,
pp. 1-89.
DANIELS, H. E.
1944. "The statistical theory of the strength of bundles of threads I," Proc. Roy. Soc.
London, vol. 183-A, pp. 405-435.
DOOB, J. L.
1945. "Markoff chains-Denumerable case," Trans. Amer. Math. Soc., vol. 58, pp. 455-
473.
DUBROVSKI, W.
1938. "Eine Verallgemeinerung der Theorie der rein unstetigen stochastischen Prozesse
von W. Feller," C. R. (Doklady) Academie Sciences URSS, vol. 19, pp. 439-445.
EINSTEIN, H. A.
1937. Der Geschiebetrieb als Wahrscheinlichkeitsproblem. Mitteilungen der Versuchsanstalt
für Wasserbau der Eidgenössischen Technischen Hochschule in Zürich, Thesis,
111 pp.
ELDERTON, W.
1945. "Cricket scores and some skew correlation distributions" (discussion by Mr. Seal),
Jour. Roy. Stat. Soc., vol. 108, pp. 34-37.
ELFVING, G.
1937. "Zur Theorie der Markoffschen Ketten," Acta Societatis Scientiarum Fennicae
(Helsingfors) N. S. A., vol. 2, no. 8, 17 pp.
FELLER, W.
1936. "Zur Theorie der stochastischen Prozesse," Mathematische Annalen, vol. 113, pp.
113-160.
1939. "Die Grundlagen der Volterraschen Theorie des Kampfes ums Dasein in wahrschein-
lichkeitstheoretischer Behandlung," Acta Biotheoretica (Leiden), vol. 5, pp. 11-40.
1940. "On the integrodifferential equations of purely discontinuous Markoff processes,"
Trans. Amer. Math. Soc., vol. 48, pp. 488-515.
1941. "On the integral equation of renewal theory," Annals of Math. Stat., vol. 12, pp.
243-267.
1943. "On a general class of 'contagious distributions'," ibid., vol. 14, pp. 389-399.
FRY, T. C.
1928. "Probability and Its Engineering Uses." 476 pp. New York, D. Van Nostrand.
GREENWOOD, M., and G. UDNY YULE
1920. "An inquiry into the nature of frequency distribution representative of multiple
happenings with particular reference to the occurrence of multiple attacks of disease
or of repeated accidents," Jour. Roy. Stat. Soc., vol. 83, pp. 255-279.
KHINTCHINE, A.
1939. "Asymptotische Gesetze der Wahrscheinlichkeitsrechnung." Ergebnisse der Mathe-
matik, vol. 2, Berlin, J. Springer, 77 pp.
KOLMOGOROFF, A.
1931a. "t~ber die analytischen Methoden in der Wahrscheinlichkeitsrechnung," Math.
Annalen, vol. 104, pp. 415-458.
1931b. "Sur le problbme d'attente," Bull. Academie Sciences URSS, pp. 101-106.
1941. "tVber das logarithmisch normale Verteilungsgesetz der Dimension der Teilchen bei
Zerstuckelung," C. R. (Doklady) Academie Sciences URSS, vol. 31, pp. 99-101.
LUNDBERG, O.
1940. "On Random Processes and Their Application to Sickness and Accident Statistics."
172 pp. Uppsala.
PALM, C.
1943. "Intensitiatsschwankungen im Fernsprechverkehr," Ericcson Technics (Stockholm),
vol. 44, pp. 1-189.
POLLACZEK, F.
1946. "Sur l'application de la theorie des fonctions au calcul de certaines probabilites
continues utilis6cs dans la theorie des r6seaux tkl6phoniques," Annales Inst. H.
Poincare, vol. 10, pp. 1-55.
1947. "Sur un probleme du calcul des probabilites qui se rapporte a la telephonie." Journ.
Math6m. Purcs. Appl. Vol. 25, pp. 307-334.
PÓLYA, G.
1930. "Sur quelques points de la th6orie des probabilitds," Annales Institut Henri Poin-
care, vol. 1, pp. 117-16.1.
1937. "Zur Kinematik der Geschiebebewegung." Mitteilungen der Versuchsanstalt fur
Wasserbau der Eidgenossischen Technischen Hochsehule, Zurich, pp. 1-21.
1938. "Sur la promenade au hasard dans un r6seau de rues." Actualites Scientifiques et
Industrielles, no. 734, pp. 25-44.
SEGERDAHL, C. O.
1939. "On Homogeneous Random Processes and Collective Risk Theory." 132 pp. Uppsala.
WANG, MING CHEN, and G. E. UHLENBECK
1945. "On the theory of the Brownian Motion II," Reviews of Modern Physics, vol. 17,
pp. 323-342.
YULE, G. UDNY
1924. "A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis,
F.R.S.," Philo. Trans. Roy. Soc. London, Series B, vol. 213, pp. 21-87.
