Asymptotic Properties in Dynamic Programming (1993)
DOV MONDERER
Department of Economics, Queen's University, Kingston, Canada, K7L 3N6, Canada and
Faculty of Industrial Engineering and Management, the Technion, Haifa, Israel
SYLVAIN SORIN
DMJ (URA CNRS 762), Ecole Normale Supérieure, 45 rue d'Ulm, 75230 Paris, France
1 Introduction
Let $S$ be a state space. For each $s \in S$ let $\emptyset \neq F(s) \subseteq S$, and let $f$ be a real bounded function on $S$. Consider the dynamic programming problem where the decision maker on day $t$, at state $s_t$, has to choose a new state $s_{t+1} \in F(s_t)$, and receives a payoff $f(s_t)$. A play at $s \in S$ is a sequence $(s_t)_{t=0}^{\infty}$ with $s_0 = s$ and $s_{t+1} \in F(s_t)$ for all $t \ge 0$. One traditionally considers the $\lambda$-discounted value $V_\lambda(s)$:
$$V_\lambda(s) = \sup_{(s_t)_{t=0}^{\infty}} (1-\lambda) \sum_{t=0}^{\infty} \lambda^t f(s_t),$$
and the average value $V_T(s)$ over the first $T+1$ stages:
$$V_T(s) = \sup_{(s_t)_{t=0}^{T}} \frac{1}{T+1} \sum_{t=0}^{T} f(s_t).$$
Lehrer and Sorin (1992) proved that if either one of the limits $\lim_{\lambda \to 1} V_\lambda(s)$ or $\lim_{T \to \infty} V_T(s)$ exists uniformly in $s \in S$, then the other limit also exists uniformly, and the limit functions coincide.
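The two evaluations can be compared numerically on a small finite problem. The following is a minimal sketch, not part of the original argument: the two-state problem, the function names `average_values` and `discounted_values`, and the use of plain value iteration are all illustrative assumptions. In this toy problem $V_T(0) = T/(T+1)$ and $V_\lambda(0) = \lambda$, so both tend to 1.

```python
def average_values(F, f, T):
    """Return {s: V_T(s)} by backward induction on the number of stages."""
    total = {s: f[s] for s in F}  # best total payoff when only stage 0 is played
    for _ in range(T):
        # one more stage: collect f(s) now, then continue optimally from a successor
        total = {s: f[s] + max(total[n] for n in F[s]) for s in F}
    return {s: total[s] / (T + 1) for s in F}


def discounted_values(F, f, lam, sweeps=5000):
    """Approximate {s: V_lambda(s)} by value iteration (error is of order lam**sweeps)."""
    v = {s: 0.0 for s in F}
    for _ in range(sweeps):
        v = {s: (1 - lam) * f[s] + lam * max(v[n] for n in F[s]) for s in F}
    return v


# Hypothetical toy problem: at state 0 one may wait (payoff 0) or move to the
# absorbing state 1 (payoff 1 forever).
F = {0: [0, 1], 1: [1]}
f = {0: 0.0, 1: 1.0}

for T in (9, 99, 999):
    print("V_T(0), T =", T, ":", average_values(F, f, T)[0])
for lam in (0.5, 0.9, 0.99):
    print("V_lambda(0), lambda =", lam, ":", discounted_values(F, f, lam)[0])
```

Both families approach the same limit here, which is the finite-state shadow of the uniform equivalence stated above.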
In Section 3 we give sufficient conditions on linearly ordered families $(\Theta, >)$ of probabilities on the integers to get analogous results for $(V_\theta)_{\theta \in \Theta}$ and $(V_T)_{T \ge 0}$.
This research was supported by the fund for the promotion of research at the Technion.
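There are other natural ways of evaluating streams of payoffs in dynamic programming (besides those discussed above):
The lower (long-run average) value,
$$\underline{V}(s) = \sup_{(s_t)_{t=0}^{\infty}} \liminf_{T \to \infty} \frac{1}{T+1} \sum_{t=0}^{T} f(s_t),$$
the upper (long-run average) value,
$$\overline{V}(s) = \sup_{(s_t)_{t=0}^{\infty}} \limsup_{T \to \infty} \frac{1}{T+1} \sum_{t=0}^{T} f(s_t),$$
and
$$\underline{U}(s) = \sup_{\mu \in \Delta} \liminf_{T \to \infty} E_\mu\Big(\frac{1}{T+1} \sum_{t=0}^{T} f(s_t)\Big),$$
where $\Delta$ is the set of all probabilities on the set of plays, endowed with the cylinder $\sigma$-field, and $E_\mu$ stands for the expectation operator with respect to $\mu$.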
Obviously $\underline{U} \ge \underline{V}$. As for the relationship between $\underline{U}$ and the limit $V$ of the discounted value functions, Mertens and Neyman (1981) provided sufficient conditions, stronger than the uniform convergence of $(V_\lambda)_{\lambda \in [0,1)}$ (and satisfied in every finite setup), that ensure the equality $\underline{U} = V$ (even for stochastic games). In Section 2 we show that uniform convergence alone is not sufficient by providing a counterexample. See Mertens (1987) for related conjectures, hints, and comments. Other types of necessary conditions, for specific types of dynamic programming problems, are discussed in Dutta (1991).
Every rooted directed tree without terminal nodes naturally defines a dynamic pro-
gramming problem when we attach payoffs to the nodes. Our dynamic program-
ming problem will be defined as a tree, constructed inductively in the spirit of Lehrer
and Monderer (1989).
Given two decreasing vanishing sequences $(a_n)_{n=1}^{\infty}$ and $(\varepsilon_n)_{n=1}^{\infty}$, define for every real number $x$ the tree $T(x)$ as follows:
Every node of $T(x)$ except for the root has outdegree one, and the root itself has countably many branches. On the $n$th branch of the root the payoff, $g(s)$, is 0 until node $[\varepsilon_n n] + 1$, it then equals $x - a_n$ until node $n$, and from then on it equals 0. Define a valuation $\varphi$ at each node $s$ different from the root as follows: $\varphi(s) = x - a_n$ for every $s$ in the $n$th branch appearing before the $n$th node in this branch, and $\varphi(s) = 0$ for every node thereafter. Set $T_1 = T(1)$. $T_2$ is obtained from $T_1$ by attaching the tree $T(\varphi(s))$ to each node $s$ of $T_1$, different from the root, and keeping the old payoff of $s$ (i.e., its payoff in $T_1$). One can continue naturally and define inductively the trees $T_3, T_4, \ldots$ and finally define $T = \bigcup_{n=1}^{\infty} T_n$. Denote the root of $T$ by $s_0$, and the payoff function by $g$.
Note that although $g$ is bounded from above by 1, it is not necessarily bounded from below. Therefore we replace $g$ with a new bounded payoff function $f$, defined by: $f(s) = \max(g(s), 0)$ for every node $s$ of $T$.
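The payoff pattern along a single branch can be sketched as follows. This is an illustration under stated assumptions, not the authors' construction in full: it covers only one branch of one $T(x)$, it assumes the payoff $x - a_n$ is paid at nodes $k$ with $[\varepsilon_n n] + 1 \le k \le n$ (both ends inclusive, root excluded), and the helper names and numeric values are hypothetical.

```python
import math


def branch_payoffs(x, a_n, eps_n, n, length):
    """Payoffs f = max(g, 0) at nodes 1..length of the n-th branch of T(x).

    Assumed indexing: g = x - a_n at nodes k with floor(eps_n * n) + 1 <= k <= n,
    and g = 0 at every other node of the branch.
    """
    start = math.floor(eps_n * n) + 1
    value = max(x - a_n, 0.0)  # truncation from g to f
    return [value if start <= k <= n else 0.0 for k in range(1, length + 1)]


def average_up_to_node_n(x, a_n, eps_n, n):
    """Average payoff over nodes 1..n of the branch (root not counted)."""
    return sum(branch_payoffs(x, a_n, eps_n, n, n)) / n


# Hypothetical numbers: x = 1, a_n = 0.25, eps_n = 0.5, n = 10.
print(branch_payoffs(1.0, 0.25, 0.5, 10, 12))
print(average_up_to_node_n(1.0, 0.25, 0.5, 10))
```

Under this indexing the average up to node $n$ is $(x - a_n)^{+}\,(n - [\varepsilon_n n])/n$, which is close to $x$ once $a_n$ and $\varepsilon_n$ are small. This is one way to see why long branches nearly achieve the valuation.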
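It is clear that $\lim_{T \to \infty} V_T(s) = \varphi(s)$ uniformly on all nodes $s$ of $T$. In particular $V(s_0) = 1$.
We will show that for a specific choice of the sequences $(\varepsilon_n)_{n=1}^{\infty}$ and $(a_n)_{n=1}^{\infty}$, $\underline{U}(s_0) = 0$.
Let then $\alpha > 0$ and let us prove that $\underline{U}(s_0) < \alpha$. Assume, to the contrary, that there exists $\mu \in \Delta$ such that for some integer $M$, $T \ge M$ implies
$$E_\mu\Big(\frac{1}{T+1} \sum_{t=0}^{T} f(s_t)\Big) \ge \alpha. \tag{2.1}$$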
We remark that we can assume that all plays in the support of $\mu$ belong to the following set $\Omega$:
If a play in $\Omega$ is on the $n$th branch of some $T(\cdot)$, it remains in this branch until exactly node $n$. In fact, if some play leaves the branch before node $[\varepsilon_n n] + 1$, the decision maker will increase his payoff by leaving the branch at its root, and if a play leaves the $n$th branch after node $n$, it is better for the decision maker to leave it at precisely node $n$. In particular, a play in $\Omega$ never remains in a branch of some $T(\cdot)$ and is thus characterized by a sequence $(m_i)_{i=1}^{\infty}$ of integers inducing the path: branch $m_1$ of $T(1)$ until the $m_1$th node, $s_{m_1}$, branch $m_2$ of $T(\varphi(s_{m_1}))$ until node $m_2$ of this branch (with the valuation $1 - a_{m_1} - a_{m_2}$), etc. Finally, for every play in $\Omega$,
$$\sum_{i=1}^{\infty} a_{m_i} \le 1. \tag{2.2}$$
Having done the above reduction, we can now replace any strictly positive payoff on any play in $\Omega$ by 1.
The basic idea of the proof is to choose a sequence $(a_n)_{n=1}^{\infty}$ converging very slowly to zero, implying by (2.2) that for every play in $\Omega$, for a set of integers $i$ with positive density, $\varepsilon_{m_i} m_i$ is much larger than $\sum_{k<i} m_k$. Hence, every play in $\Omega$ has "many" large blocks of zeros.
More precisely, let $M_1 = 2$, and define inductively $n_i = \sum_{k \le i} M_k$ and $M_{i+1} = n_i$ for every $i \ge 1$. Define $a_{n_i} = \frac{1}{i}$ for all $i$ and extend $a$ by monotonicity to all other integers. Choose $\varepsilon_n = \frac{1}{\sqrt{n}}$. We say that a play $w$ is good in the $i$th block $I_i = [n_{i-1}, n_i]$ if a sequence of ones starts in this block. That is, if $w$ is determined by $m_1, m_2, \ldots$, there exists $m_k$ adapted to $I_i$ in the sense that
$$\sum_{j<k} m_j + \varepsilon_{m_k} m_k \in I_i. \tag{2.3}$$
Set $S_n(w) = \frac{1}{n} \sum_{\ell=1}^{n} w_\ell$. We claim that there exists $i_0$ such that for every $i > i_0$ and for every $w \in \Omega$, if $S_{n_i}(w) \ge \alpha$, then $w$ is good in the $i$th block. Otherwise, denote by $k$ the largest integer such that the $k$th sequence of ones in $w$ starts before the $i$th block.
a contradiction.
3 Uniform Convergence
We first establish some notation. Let $D$ denote the set of all probability distributions $\theta$ on the set $N = \{0, 1, 2, \ldots\}$ of non-negative integers that are non-increasing. That is, $\theta(t) \ge \theta(t+1)$ for all $t \in N$.
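Note that, with $\bar\theta(t) = (t+1)\big(\theta(t) - \theta(t+1)\big)$ for $t \in N$,
$$\sum_{t=0}^{T} \bar\theta(t) = \sum_{t=0}^{T} \theta(t) - (T+1)\,\theta(T+1) \quad \text{for all } T \ge 0. \tag{3.2}$$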
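Identity (3.2) is a telescoping sum and can be checked exactly on a small example. The sketch below assumes the transform $\bar\theta(t) = (t+1)(\theta(t) - \theta(t+1))$, which is the form that makes (3.2) hold for every $T$; the helper names `bar` and `check_identity` and the sample distribution are hypothetical.

```python
from fractions import Fraction


def bar(theta):
    """Assumed transform: bar(theta)(t) = (t + 1) * (theta(t) - theta(t + 1)).

    theta is a finite list and is treated as 0 beyond its last entry.
    """
    padded = list(theta) + [Fraction(0)]
    return [(t + 1) * (padded[t] - padded[t + 1]) for t in range(len(theta))]


def check_identity(theta):
    """Check (3.2) exactly for every T inside the support of theta."""
    padded = list(theta) + [Fraction(0)]
    theta_bar = bar(theta)
    for T in range(len(theta)):
        lhs = sum(theta_bar[: T + 1])
        rhs = sum(theta[: T + 1]) - (T + 1) * padded[T + 1]
        if lhs != rhs:
            return False
    return True


# A non-increasing distribution on {0, 1, 2, 3}.
theta = [Fraction(4, 10), Fraction(3, 10), Fraction(2, 10), Fraction(1, 10)]
print(bar(theta))             # weights of the transformed distribution
print(sum(bar(theta)))        # total mass of the transformed distribution
print(check_identity(theta))  # whether (3.2) holds for T = 0..3
```

Because $\theta$ is non-increasing, each transformed weight is non-negative, and letting $T$ grow in (3.2) shows the transformed weights again sum to one in this finite example.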
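For a bounded real sequence $a = (a_t)_{t=0}^{\infty}$, write
$$S_T(a) = \frac{1}{T+1} \sum_{t=0}^{T} a_t,$$
$$S_\theta(a) = \sum_{t=0}^{\infty} \theta(t)\, a_t.$$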
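Observe that by (3.1), similarly to the way (3.2) was obtained, we have $S_\theta(a) = S_{\bar\theta}(S(a))$ for all sequences $a$ and probabilities $\theta$, that is,
$$\sum_{t=0}^{\infty} \theta(t)\, a_t = \sum_{t=0}^{\infty} \bar\theta(t)\, S_t(a). \tag{3.3}$$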
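The change of weights in (3.3) says that a $\theta$-weighted sum of a sequence equals a reweighted sum of its running averages. A small exact check, again assuming the transform $\bar\theta(t) = (t+1)(\theta(t) - \theta(t+1))$ and using hypothetical helper names and sample data:

```python
from fractions import Fraction


def bar(theta):
    """Assumed transform: bar(theta)(t) = (t + 1) * (theta(t) - theta(t + 1))."""
    padded = list(theta) + [Fraction(0)]
    return [(t + 1) * (padded[t] - padded[t + 1]) for t in range(len(theta))]


def running_averages(a):
    """Return [S_0(a), S_1(a), ...] with S_t(a) = (a_0 + ... + a_t) / (t + 1)."""
    out, total = [], Fraction(0)
    for t, x in enumerate(a):
        total += x
        out.append(total / (t + 1))
    return out


def weighted_sum(weights, values):
    """Sum of weights[t] * values[t] over the common index range."""
    return sum(w * v for w, v in zip(weights, values))


theta = [Fraction(4, 10), Fraction(3, 10), Fraction(2, 10), Fraction(1, 10)]
a = [Fraction(3), Fraction(1), Fraction(4), Fraction(1)]

direct = weighted_sum(theta, a)                               # left side of (3.3)
via_averages = weighted_sum(bar(theta), running_averages(a))  # right side of (3.3)
print(direct, via_averages)
```

This rewriting is what lets statements about Cesaro averages be transferred to general non-increasing weights.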
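We consider linearly ordered families $(\Theta, >)$, where $\Theta \subseteq D$, and "$>$" is a linear (complete) order on $\Theta$ satisfying:
$$\forall \varepsilon > 0,\ \forall N \ge 0,\ \exists \theta_0 \in \Theta \text{ such that } \forall \theta > \theta_0,\quad \sum_{t=0}^{N} \theta(t) < \varepsilon. \tag{B}$$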
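For the geometric family $\theta_\lambda(t) = (1-\lambda)\lambda^t$, ordered by $\lambda$, Condition (B) can be made explicit: the mass on $\{0, \ldots, N\}$ is $1 - \lambda^{N+1}$, which is below $\varepsilon$ exactly when $\lambda > (1-\varepsilon)^{1/(N+1)}$. A minimal sketch of that threshold, with hypothetical function names and assuming $0 < \varepsilon < 1$:

```python
def head_mass(lam, N):
    """Mass that the geometric distribution theta_lambda puts on {0, ..., N}."""
    return sum((1 - lam) * lam ** t for t in range(N + 1))


def threshold(eps, N):
    """Value lambda_0 such that head_mass(lam, N) < eps exactly when lam > lambda_0.

    Derived from 1 - lam**(N + 1) < eps; assumes 0 < eps < 1.
    """
    return (1 - eps) ** (1 / (N + 1))


lam0 = threshold(0.1, 9)
print(lam0)
print(head_mass(lam0 + 0.001, 9))  # just above the threshold
print(head_mass(lam0 - 0.001, 9))  # just below the threshold
```

The threshold tends to 1 as $N$ grows or $\varepsilon$ shrinks, so "large $\theta$" in (B) corresponds to discount factors close to 1 for this family.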
Note that Condition (B) implies that for every $\theta \in \Theta$, there exists $\theta' \in \Theta$ with $\theta < \theta'$. Therefore, the notions of lim, lim inf, lim sup, etc. are naturally defined for real-valued functions on $\Theta$. An increasing sequence $(\theta_n)_{n=0}^{\infty}$ in $\Theta$ is increasing to $\infty$ if for every $\theta \in \Theta$, there exists an integer $N$ such that $\theta_n > \theta$ for all $n \ge N$. For the equivalence results we will need the next properties:
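(C) $\exists \varepsilon_0 > 0$ and $\varphi : (0, \varepsilon_0) \to (0, 1)$ such that $\forall \varepsilon < \varepsilon_0$, $\exists J(\varepsilon)$ and a sequence $(\theta_{\varepsilon,n})_{n=J(\varepsilon)}^{\infty}$ that increases to $\infty$ and satisfies:
(D) There exists a sequence $(\theta_n)_{n=0}^{\infty}$ that increases to $\infty$, and $\exists \varepsilon_0 > 0$ and $\psi : (0, \varepsilon_0) \to (0, 1)$ such that $\forall \varepsilon < \varepsilon_0$, $\exists I(\varepsilon)$,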
We will assume without loss of generality that the payoff function in our dynamic programming problem satisfies $0 \le f \le 1$.
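Lemma 3.1. $\forall \varepsilon > 0$, $\forall N$, $\exists \theta_0$ such that $\forall \theta > \theta_0$, $\forall s_0 \in S$, $\exists n \ge N$ satisfying
$$V_n(s_0) \ge V_\theta(s_0) - \varepsilon.$$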
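Proof: By condition (B) and by (3.2), there exists $\theta_0$ such that $\sum_{t=0}^{N} \bar\theta(t) < \frac{\varepsilon}{2}$ for all $\theta > \theta_0$. Let $\theta > \theta_0$, and let $s_0 \in S$. Let $s = (s_t)_{t=0}^{\infty}$ be an $\frac{\varepsilon}{2}$-optimal play for $\theta$ at $s_0$. Then by (3.3),
$$\sum_{t=N+1}^{\infty} \bar\theta(t)\, S_t(f(s)) \ge V_\theta(s_0) - \varepsilon,$$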
Corollary 3.2.
Hence, it suffices to prove that for every $\varepsilon > 0$, for sufficiently large $\theta$,
By rearranging terms and by (3.3), the last inequality can be proved by showing
that
Hence, it suffices to prove that for every $\varepsilon > 0$, for sufficiently large $\theta$,
$$\frac{1}{T+1} \sum_{t=0}^{T} f(s_{L+t}) \ge V_n(s_0) - \varepsilon \quad \text{for every } 0 \le T \le \frac{\varepsilon}{2}\, n.$$
3.2 From $V_\theta$ to $V_n$
Proof: Set $\varepsilon_1 = \frac{\varepsilon}{3}$. By the uniform convergence assumption, there exists $\theta_0$ such that
~, Oo(t)> 1 - ~ , (3.5)
t=o
and let $N$ be an integer satisfying N > --.2 We now show that $N$ satisfies the assertion of the proposition. Indeed, let $n \ge N$, and let $s_0 \in S$. By Lemma 3.4, there exists a play $s = (s_t)_{t=0}^{\infty}$ and an integer $L$ that satisfy the assertion of Lemma 3.4 for $\varepsilon_1$. By (3.3) and (3.5), this implies $V_{\theta_0}(s_L) \ge V_n(s_0) - 2\varepsilon_1$. Therefore $V(s_L) \ge V_n(s_0) - 3\varepsilon_1$, by (3.4). Hence, by Lemma 3.3, and because $3\varepsilon_1 = \varepsilon$,
Proof: Otherwise, there exists $\varepsilon > 0$ such that for every $N$, there exists $n \ge N$ and $s_0 \in S$ with $V_n(s_0) < V(s_0) - \varepsilon$. We now choose a particular integer $N$ as follows: set $\varepsilon_1 = \varepsilon_2 =$ ~ , and choose $\varepsilon_3, \varepsilon_4, \varepsilon_5$ in a way that will be described later. Choose an
(2) Let $J(\varepsilon_2)$ and the sequence $(\theta_{\varepsilon_2,n})_{n=J(\varepsilon_2)}^{\infty}$ satisfy the property stated in Condition (C). Choose $K > J(\varepsilon_2)$. That is,
(4) By Proposition 1, we can choose $K$ large enough, such that for every $n \ge K$,
$$\sum_{t=0}^{K} \cdots \quad \text{for all } n \ge N.$$
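By our initial assumption there exists $n \ge N$ and $s_0$ with $V_n(s_0) < V(s_0) - \varepsilon$. Let $s = (s_t)_{t=0}^{\infty}$ be any play at $s_0$. Set $a_t = \bar\theta_{\varepsilon_2,n}(t)\, S_t(f(s))$. Then
$$S_{\theta_{\varepsilon_2,n}}(f(s)) = \sum_{t=0}^{K} a_t + \sum_{K < t < (1-\varepsilon_2) n} a_t + \sum_{(1-\varepsilon_2) n \le t \le n} a_t + \sum_{t > n} a_t.$$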
where
$$\Delta = \varepsilon_3 + \varepsilon_5 - \varphi(\varepsilon_2)\, \varepsilon_1.$$
vo~ V(so) + a .
(ff ~ <" ~3 + ~4 + e5 9
$$\frac{1}{T+1} \sum_{t=0}^{T} f(s_t) \ge W(s_0) - \varepsilon \quad \text{for every } \varepsilon n \le T \le (1-\varepsilon) n.$$
Proof: Let $\varepsilon > 0$. Let $\delta > 0$ satisfy $\frac{\delta}{1-\delta} < \min(\psi(\varepsilon), \varepsilon)$. Then by Lemma 3.5 there exists $N$ such that for every $n \ge N$ and $s_0 \in S$, there is a play $s = (s_t)_{t=0}^{\infty}$ at $s_0$ satisfying:
$$\frac{1}{T+1} \sum_{t=0}^{T} f(s_t) > W(s_0) - \delta \quad \text{for every } \delta n \le T \le (1-\delta) n.$$
Without loss of generality we can choose $N \ge I(\varepsilon)$. Note that if $m \ge N$ (assuming that $N$ was chosen large enough), there exists $n \ge N$ with
$$[\psi(\varepsilon) m,\ m] \subseteq [\delta n,\ (1-\delta) n].$$
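The existence of such an $n$ can be made concrete. One natural choice (an assumption of this sketch, not stated in the text) is the smallest $n$ with $(1-\delta) n \ge m$. Then $\delta n \le \frac{\delta}{1-\delta} m + \delta$, which is at most $\psi(\varepsilon) m$ once $m$ is large, because $\frac{\delta}{1-\delta} < \psi(\varepsilon)$. The function name `find_n` and the numbers are hypothetical, and exact rationals avoid rounding at the interval ends.

```python
import math
from fractions import Fraction


def find_n(m, delta, psi):
    """Return n with [psi * m, m] inside [delta * n, (1 - delta) * n], or None.

    Tries only the smallest n with (1 - delta) * n >= m; assumes
    delta / (1 - delta) < psi, so the test succeeds for all large enough m.
    """
    n = math.ceil(m / (1 - delta))
    if delta * n <= psi * m and m <= (1 - delta) * n:
        return n
    return None  # m is too small for this choice of n


delta = Fraction(1, 10)  # delta / (1 - delta) = 1/9
print(find_n(100, delta, Fraction(1, 5)))
print(find_n(100, delta, Fraction(3, 25)))
print(find_n(1, delta, Fraction(3, 25)))  # small m: the candidate n fails
```

Since this $n$ is at least $m$, the requirement $n \ge N$ follows from $m \ge N$.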
Remark 1.
If the sequence $(\theta_n)_{n=0}^{\infty}$ given in Condition (D) is dense in $(\Theta, >)$ (in the sense that its uniform convergence implies the uniform convergence of $(V_\theta)_{\theta \in \Theta}$), then under conditions (C) and (D), uniform convergence of $(V_n)_{n=0}^{\infty}$ implies uniform convergence of $(V_\theta)_{\theta \in \Theta}$ to the same limit function. As was proved in Lehrer and Sorin (1992), such is the case when $\Theta = \{\theta_\lambda : \lambda \in [0, 1)\}$, where $\theta_\lambda(t) = (1-\lambda)\lambda^t$, and "$>$" is the natural order on real numbers.
Remark 2.
Let $(\Theta, >)$ be a linearly ordered set of distributions on $N$ satisfying (B), (C*), and (D*), where (C*) and (D*) are obtained from (C) and (D) respectively, by replacing $\bar\theta$ with $\theta$ everywhere. Define,
It is obvious that our proofs yield the equivalence theorem for this solution concept
as well. E.g., for every $0 < \lambda < 1$ define
Then $(U_\lambda)$ converges uniformly if and only if $(V_n)$ converges uniformly, and both share the same limit function.
References
1. Dutta PK, What Do Discounted Optima Converge to? A Theory of Discount Rate Asymptotics in Economic Models, Journal of Economic Theory 55 (1991), 64-94.
2. Lehrer E and Monderer D, Discounting Versus Averaging in Dynamic Programming,
Games and Economic Behavior (to appear) (1989).
3. Lehrer E and Sorin S, A Uniform Tauberian Theorem in Dynamic Programming, Mathe-
matics of Operations Research 17 (1992), 303-307.
4. Mertens J-F, Repeated Games, Proceedings of the International Congress of Mathematicians (Berkeley 1986) (1987), 1528-1577.
5. Mertens J-F and Neyman A, Stochastic games, International Journal of Game Theory 10,
2 (1981), 53-66.