Asymptotic Properties in Dynamic Programming

Article in International Journal of Game Theory · February 1993


DOI: 10.1007/BF01245566 · Source: RePEc



International Journal of Game Theory (1993) 22:1-11
Asymptotic Properties in Dynamic Programming

DOV MONDERER
Department of Economics, Queen's University, Kingston, Canada, K7L 3N6, Canada and
Faculty of Industrial Engineering and Management, the Technion, Haifa, Israel

SYLVAIN SORIN

DMJ (URA CNRS 762), Ecole Normale Supérieure, 45 rue d'Ulm, 75230 Paris, France

Abstract: In the framework of dynamic programming we provide two results:


- An example where uniform convergence of the T-stage value does not imply equality
of the limit and the lower infinite value.
- Generalized Tauberian theorems, that relate uniform convergence of the T-stage value
to uniform convergence of values associated with a general distribution on stages.

1 Introduction

Let $S$ be a state space. For each $s \in S$ let $\emptyset \neq \Gamma(s) \subseteq S$, and let $f$ be a real bounded function on $S$. Consider the dynamic programming problem where the decision maker on day $t$, at state $s_t$, has to choose a new state $s_{t+1} \in \Gamma(s_t)$, and receives a payoff $f(s_t)$. A play at $s \in S$ is a sequence $(s_t)_{t=0}^{\infty}$ with $s_0 = s$ and $s_{t+1} \in \Gamma(s_t)$ for all $t \geq 0$. One traditionally considers the $\lambda$-discounted value $V_\lambda(s)$:

$$V_\lambda(s) = \sup_{(s_t)_{t=0}^{\infty}} (1-\lambda) \sum_{t=0}^{\infty} \lambda^t f(s_t),$$

or the $T$-stage value $V_T(s)$:

$$V_T(s) = \sup_{(s_t)_{t=0}^{T}} \frac{1}{T+1} \sum_{t=0}^{T} f(s_t),$$

where in both cases the supremum ranges over all plays at $s$.
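As a minimal sketch of the two definitions above, assuming a finite state space (the paper allows an arbitrary $S$), both values can be computed by the usual dynamic programming recursions. The function names and the two-state example are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction

def t_stage_value(gamma, f, T):
    """Return {s: V_T(s)} for a finite problem.

    gamma[s] is the nonempty list of successors of s, f[s] the payoff at s.
    Uses W_0(s) = f(s), W_k(s) = f(s) + max_{s' in gamma[s]} W_{k-1}(s'),
    and V_T = W_T / (T + 1).
    """
    w = {s: Fraction(f[s]) for s in gamma}
    for _ in range(T):
        w = {s: Fraction(f[s]) + max(w[n] for n in gamma[s]) for s in gamma}
    return {s: w[s] / (T + 1) for s in gamma}

def discounted_value(gamma, f, lam, sweeps=2000):
    """Approximate V_lambda by iterating V(s) = (1 - lam) f(s) + lam max V(s')."""
    v = {s: 0.0 for s in gamma}
    for _ in range(sweeps):
        v = {s: (1 - lam) * f[s] + lam * max(v[n] for n in gamma[s])
             for s in gamma}
    return v

# Hypothetical example: state 0 pays 0 and may stay or move to state 1;
# state 1 pays 1 forever.
gamma = {0: [0, 1], 1: [1]}
f = {0: 0, 1: 1}
print(t_stage_value(gamma, f, 9)[0])
print(discounted_value(gamma, f, 0.9)[0])
```

In this toy problem the best play from state 0 moves to state 1 immediately, so $V_T(0) = T/(T+1)$ and $V_\lambda(0) = \lambda$; both tend to 1 as $T \to \infty$ and $\lambda \to 1$.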


One can also consider other evaluations: Let $\theta = (\theta(t))_{t=0}^{\infty}$ be a probability on the set of non-negative integers and define:

$$V_\theta(s) = \sup_{(s_t)_{t=0}^{\infty}} \sum_{t=0}^{\infty} \theta(t) f(s_t).$$

Lehrer and Sorin (1992) proved that if either one of the limits $\lim_{\lambda \to 1} V_\lambda(s)$, or $\lim_{T \to \infty} V_T(s)$ exists uniformly in $s \in S$, then the other limit also exists uniformly, and the limit functions coincide.
In Section 3 we give sufficient conditions on linearly ordered families $(\Theta, <)$ of probabilities on the integers to get analogous results for $(V_\theta)_{\theta \in \Theta}$ and $(V_T)_{T \geq 0}$.

This research was supported by the Fund for the Promotion of Research at the Technion.

0020-7276/93/1/1-11 $2.50 © 1993 Physica-Verlag, Heidelberg



There are other natural ways of evaluating streams of payoffs in dynamic programming (except for those discussed above):
The lower (long-run average) value,

$$\underline{V}(s) = \sup_{(s_t)_{t=0}^{\infty}} \liminf_{T \to \infty} \frac{1}{T+1} \sum_{t=0}^{T} f(s_t),$$

and the upper (long-run average) value,

$$\overline{V}(s) = \sup_{(s_t)_{t=0}^{\infty}} \limsup_{T \to \infty} \frac{1}{T+1} \sum_{t=0}^{T} f(s_t),$$

where, again, the supremum is taken on all plays at $s$.
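To see why the lower and upper values are distinct notions, it helps to look at a single payoff stream whose running averages oscillate. The sketch below is an illustrative assumption (the block construction and function names are not from the paper): payoffs come in blocks of doubling length that alternate between 0 and 1.

```python
from fractions import Fraction

def block_payoffs(num_blocks):
    """Payoff stream in which block k has length 2**k and constant payoff k % 2."""
    stream = []
    for k in range(num_blocks):
        stream.extend([k % 2] * (2 ** k))
    return stream

def running_averages(stream):
    """Return the averages (1/(T+1)) * sum_{t<=T} stream[t] for every T."""
    out, total = [], 0
    for t, x in enumerate(stream):
        total += x
        out.append(Fraction(total, t + 1))
    return out

avgs = running_averages(block_payoffs(12))
late = avgs[len(avgs) // 4:]  # ignore the early part of the stream
print(float(min(late)), float(max(late)))
```

Along this stream the average returns to exactly 2/3 at the end of every block of ones and drops towards 1/3 at the end of every block of zeros, so the lim inf and the lim sup of the averages differ for this one play.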


Lehrer and Monderer (1989) proved that uniform convergence of $(V_\lambda)_{\lambda \in [0,1)}$ to some $V$ implies $\overline{V} = V$, and showed in an example that it does not imply the equality $\underline{V} = V$. If one allows the decision maker to use mixed strategies, i.e., to choose a play at random, and then defines the payoff of each stage as the expectation, one obtains new evaluations. It is clear that the evaluations $V_\lambda$, $V_T$, $V_\theta$, and $\overline{V}$ will not change by allowing mixed strategies, but $\underline{V}$ will change in general. Let

$$\underline{U}(s) = \sup_{\mu \in \Delta} \liminf_{T \to \infty} E_\mu\left(\frac{1}{T+1} \sum_{t=0}^{T} f(s_t)\right),$$

where $\Delta$ is the set of all probabilities on the set of plays, endowed with the cylinder $\sigma$-field, and $E_\mu$ stands for the expectation operator with respect to $\mu$.
Obviously $\underline{U} \geq \underline{V}$. As for the relationship between $\underline{U}$ and the limit $V$ of the discounted value functions, Mertens and Neyman (1981) provided sufficient conditions, stronger than the uniform convergence of $(V_\lambda)_{\lambda \in [0,1)}$ (and satisfied in every finite setup), that ensure the equality $\underline{U} = V$ (even for stochastic games). In Section 2 we show that uniform convergence alone is not sufficient by providing a counter example. See Mertens (1987) for related conjectures, hints, and comments. Other types of necessary conditions, for specific types of dynamic programming problems, are discussed in Dutta (1991).

2 The Counter Example

Every rooted directed tree without terminal nodes naturally defines a dynamic programming problem when we attach payoffs to the nodes. Our dynamic programming problem will be defined as a tree, constructed inductively in the spirit of Lehrer and Monderer (1989).
Given two decreasing vanishing sequences $(a_n)_{n=1}^{\infty}$ and $(\varepsilon_n)_{n=1}^{\infty}$, define for every real number $x$ the tree $T(x)$ as follows:
Every node of $T(x)$ except for the root has outdegree one, and the root itself has countably many branches. On the $n$-th branch of the root the payoff, $g(s)$, is 0 until node $[\varepsilon_n n] + 1$, it then equals $x - a_n$ until node $n$, and from then on it equals 0. Define a valuation $\varphi$ at each node $s$ different from the root as follows: $\varphi(s) = x - a_n$ for every $s$ in the $n$-th branch appearing before the $n$-th node in this branch, and $\varphi(s) = 0$ for every node thereafter. Set $T_1 = T(1)$. $T_2$ is obtained from $T_1$ by attaching the tree $T(\varphi(s))$ to each node $s$ of $T_1$, different from the root, and keeping the old payoff of $s$ (i.e., its payoff in $T_1$). One can continue naturally and define inductively the trees $T_3, T_4, \ldots$ and finally define $T = \bigcup_{n=1}^{\infty} T_n$. Denote the root of $T$ by $s_0$, and the payoff function by $g$.
Note that although $g$ is bounded from above by 1, it is not necessarily bounded from below. Therefore we replace $g$ with a new bounded payoff function $f$, defined by: $f(s) = \max(g(s), 0)$ for every node $s$ of $T$.
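It is clear that $\lim_{T \to \infty} V_T(s) = \varphi(s)$ uniformly on all nodes $s$ of $T$. In particular $V(s_0) = 1$.
We will show that for a specific choice of the sequences $(\varepsilon_n)_{n=1}^{\infty}$ and $(a_n)_{n=1}^{\infty}$, $\underline{U}(s_0) = 0$.
Let then $\alpha > 0$ and let us prove that $\underline{U}(s_0) < \alpha$. Assume, to the contrary, that there exists $\mu \in \Delta$ such that for some integer $M$, $T \geq M$ implies

$$E_\mu\left(\frac{1}{T+1} \sum_{t=0}^{T} f(s_t)\right) \geq \alpha. \tag{2.1}$$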

We remark that we can assume that all plays in the support of $\mu$ belong to the following set $\Omega$:
If a play in $\Omega$ is on the $n$-th branch of some $T(\cdot)$, it remains in this branch until exactly node $n$. In fact, if some play leaves the branch before node $[\varepsilon_n n] + 1$, the decision maker will increase his payoff by leaving the branch at its root, and if a play leaves the $n$-th branch after node $n$, it is better for the decision maker to leave it at precisely node $n$. In particular, a play in $\Omega$ never remains in a branch of some $T(\cdot)$ and is thus characterized by a sequence $(m_i)_{i=1}^{\infty}$ of integers inducing the path: branch $m_1$ of $T(1)$ until the $m_1$-th node, $s_{m_1}$, branch $m_2$ of $T(\varphi(s_{m_1}))$ until node $m_2$ of this branch (with the valuation $1 - a_{m_1} - a_{m_2}$), etc. Finally, for every play in $\Omega$,

$$\sum_{i=1}^{\infty} a_{m_i} \leq 1. \tag{2.2}$$

Having done the above reduction, we can now replace any strictly positive payoff on any play in $\Omega$ by 1.
The basic idea of the proof is to choose a sequence $(a_n)_{n=1}^{\infty}$ converging very slowly to zero, implying by (2.2), that for every play in $\Omega$, for a set of integers $i$ with positive density, $\varepsilon_{m_i} m_i$ is much larger than $\sum_{k<i} m_k$. Hence, every play in $\Omega$ has "many" large blocks of zeros.
More precisely, let $M_1 = 2$, and define inductively $n_i = \sum_{k \leq i} M_k$ and $M_{i+1} = n_i^3$ for every $i \geq 1$. Define $a_{n_i} = \frac{1}{i}$ for all $i$ and extend $a$ by monotonicity to all other integers. Choose $\varepsilon_n = \frac{1}{\sqrt{n}}$. We say that a play $w$ is good in the $i$-th block $I_i = [n_{i-1}, n_i]$ if a sequence of ones starts in this block. That is, if $w$ is determined by $m_1, m_2, \ldots$, there exists $m_k$ adapted to $I_i$ in the sense that

$$\sum_{j<k} m_j + \varepsilon_{m_k} m_k \in I_i. \tag{2.3}$$
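The shape of a play in $\Omega$ after the reduction can be made concrete with a short sketch. It assumes $\varepsilon_n = 1/\sqrt{n}$, so that $[\varepsilon_m m]$ is the integer part of $\sqrt{m}$, and it adopts one indexing convention that is an assumption of this illustration: on branch $m$ the first $[\sqrt{m}]$ nodes pay 0 and the remaining nodes pay 1. The function names are hypothetical.

```python
from math import isqrt

def play_payoffs(ms):
    """0/1 payoff stream of the play that follows branches ms[0], ms[1], ...

    Assumed convention: on branch m, the first isqrt(m) nodes pay 0 and the
    remaining m - isqrt(m) nodes pay 1 (positive payoffs replaced by 1).
    """
    w = []
    for m in ms:
        zeros = isqrt(m)  # integer part of eps_m * m with eps_m = 1/sqrt(m)
        w.extend([0] * zeros + [1] * (m - zeros))
    return w

def start_of_ones(ms, k):
    """0-based position sum_{j<k} m_j + [eps_{m_k} m_k] where run k of ones starts."""
    return sum(ms[:k]) + isqrt(ms[k])

ms = [4, 9, 100]
w = play_payoffs(ms)
print(w[:13])
print(start_of_ones(ms, 2))
```

The quantity returned by `start_of_ones` is the left-hand side of (2.3) under this convention; a play is good in a block exactly when one of these start positions falls inside it.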
1
Set $S_n(w) = \frac{1}{n} \sum_{\ell=1}^{n} w_\ell$. We claim that there exists $i_0$ such that for every $i > i_0$ and for every $w \in \Omega$, if $S_{n_i}(w) \geq \alpha$, then $w$ is good in the $i$-th block. Otherwise, denote by $k$ the largest integer such that the $k$-th sequence of ones in $w$ starts before the $i$-th block. Then $\varepsilon_{m_k} m_k \leq n_{i-1}$, and hence $m_k \leq n_{i-1}^2 = \frac{M_i}{n_{i-1}}$. This implies that this sequence of ones ends very early in the $i$-th block, and that $w_t = 0$ for $\left(1 - \frac{1}{n_{i-1}}\right) M_i$ $t$'s in this block. As $\frac{n_{i-1}}{M_i} \to 0$ as $i \to \infty$, then $S_{n_i}(w)$ must be very small, contradicting our assumption.
Define $J_i(w)$ to be one if $S_{n_i}(w) \geq \alpha$ and 0 otherwise. If $J_i(w) = 1$, one can, by the above claim, define $k(w, i)$ as the smallest $k$ that satisfies (2.3). Denote $\theta_i(w) = a_{m_{k(w,i)}}$ if $J_i(w) = 1$ and 0 otherwise.
Using the monotone convergence theorem we have:

$$1 \geq E_\mu\Big(\sum_{i \geq i_0} J_i(w)\,\theta_i(w)\Big) = \sum_{i \geq i_0} E_\mu\big(J_i(w)\,\theta_i(w)\big) \geq \sum_{i \geq i_0} E_\mu\big(J_i(w)\big)\, a_{n_i}.$$
1
Since (2.1) at $n_i$ implies $E_\mu(J_i(w)) \geq \alpha$, we obtain, recalling that $a_{n_i} = \frac{1}{i}$,

$$1 \geq \sum_{i \geq i_0} \frac{1}{i}\,\alpha,$$

a contradiction. □

3 Uniform Convergence

We first establish a few notations. Let $D$ denote the set of all probability distributions $\theta$ on the set $N = \{0, 1, 2, \ldots\}$ of non-negative integers, that are non-increasing. That is,

$$\theta(t+1) \leq \theta(t) \quad \text{for all } t \in N. \tag{A}$$

For real numbers $\alpha \leq \beta$ and for a distribution $\theta$,

$$\theta[\alpha, \beta] = \sum_{\alpha \leq t \leq \beta} \theta(t).$$

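For $\theta \in D$, define $\hat\theta$ on $N$ as follows:

$$\hat\theta(t) = (\theta(t) - \theta(t+1))(t+1) \quad \text{for all } t \in N. \tag{3.1}$$

Note that

$$\sum_{t=0}^{T} \hat\theta(t) = \sum_{t=0}^{T} \theta(t) - (T+1)\,\theta(T+1) \quad \text{for all } T \geq 0. \tag{3.2}$$

Because of (A), $\lim_{t \to \infty} t\,\theta(t) = 0$, and therefore $\hat\theta$ is a probability distribution on $N$.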
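For completeness, here is one way to verify (3.2) and the limit used above. It is a sketch written with $\hat\theta(t) = (\theta(t) - \theta(t+1))(t+1)$ as in (3.1); the index shift in the second line is the only step.

```latex
\begin{align*}
\sum_{t=0}^{T}\hat\theta(t)
  &= \sum_{t=0}^{T}(t+1)\,\theta(t) - \sum_{t=0}^{T}(t+1)\,\theta(t+1) \\
  &= \sum_{t=0}^{T}(t+1)\,\theta(t) - \sum_{t=1}^{T+1} t\,\theta(t) \\
  &= \sum_{t=0}^{T}\theta(t) - (T+1)\,\theta(T+1).
\end{align*}
% Why t * theta(t) -> 0: theta is non-increasing, and at least t/2 integers
% lie in [t/2, t], so
\[
\frac{t}{2}\,\theta(t) \;\le\; \sum_{t/2 \,\le\, u \,\le\, t} \theta(u)
\;\xrightarrow[t\to\infty]{}\; 0 ,
\]
% because the right-hand side is part of the tail of a convergent series.
```

Letting $T \to \infty$ in (3.2) then gives $\sum_{t} \hat\theta(t) = \sum_{t} \theta(t) = 1$, and $\hat\theta(t) \geq 0$ by (A).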
Let $a = (a_t)_{t=0}^{\infty}$ be a bounded sequence. For every $T \geq 0$, denote

$$S_T(a) = \frac{1}{T+1} \sum_{t=0}^{T} a_t,$$

and denote $S(a) = (S_t(a))_{t=0}^{\infty}$. For every probability $\theta$, set

$$S_\theta(a) = \sum_{t=0}^{\infty} \theta(t)\, a_t.$$
t=O

Observe that by (3.1), similarly to the way (3.2) was obtained, we have $S_\theta(a) = S_{\hat\theta}(S(a))$ for all sequences $a$ and probabilities $\theta$, that is,

$$\sum_{t=0}^{\infty} \theta(t)\, a_t = \sum_{t=0}^{\infty} \hat\theta(t)\, S_t(a). \tag{3.3}$$
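Identity (3.3) is easy to check numerically. The sketch below assumes a non-increasing distribution with finite support, stored as a list of exact fractions (zero beyond the list), and writes `hat(theta)` for the transform $(\theta(t) - \theta(t+1))(t+1)$ of (3.1); the helper names are illustrative.

```python
from fractions import Fraction

def hat(theta):
    """Transform (3.1) for a finitely supported, non-increasing theta."""
    ext = list(theta) + [Fraction(0)]  # theta is 0 beyond its support
    return [(ext[t] - ext[t + 1]) * (t + 1) for t in range(len(theta))]

def averages(a):
    """Cesaro averages S_t(a) = (a_0 + ... + a_t) / (t + 1)."""
    out, total = [], Fraction(0)
    for t, x in enumerate(a):
        total += x
        out.append(total / (t + 1))
    return out

def evaluate(theta, a):
    """S_theta(a) = sum_t theta(t) a_t over the support of theta."""
    return sum(p * x for p, x in zip(theta, a))

theta = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
a = [Fraction(1), Fraction(0), Fraction(1)]
print(evaluate(theta, a), evaluate(hat(theta), averages(a)))

uniform = [Fraction(1, 3)] * 3
print(hat(uniform))
```

For the uniform distribution on $\{0, \ldots, T\}$ the transform is the point mass at $T$, so in that case (3.3) reduces to the statement that $S_\theta(a)$ is the $T$-stage average $S_T(a)$.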

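We consider linearly ordered families $(\Theta, >)$, where $\Theta \subseteq D$, and "$>$" is a linear (complete) order on $\Theta$ satisfying:

$$\forall \varepsilon > 0,\ \forall N \geq 0,\ \exists \theta_0 \in \Theta \text{ such that } \forall \theta > \theta_0,\quad \sum_{t=0}^{N} \theta(t) < \varepsilon, \tag{B}$$

which is obviously equivalent to:

$$\forall \varepsilon > 0,\ \exists \theta_0 \in \Theta \text{ such that } \forall \theta > \theta_0,\quad \theta(0) < \varepsilon. \tag{B*}$$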

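Note that Condition (B) implies that for every $\theta \in \Theta$, there exists $\theta' \in \Theta$ with $\theta < \theta'$. Therefore, the notions of lim, lim inf, lim sup, etc. are naturally defined for real-valued functions on $\Theta$. An increasing sequence $(\theta_n)_{n=0}^{\infty}$ in $\Theta$ is increasing to $\infty$ if for every $\theta \in \Theta$, there exists an integer $N$ such that $\theta_n > \theta$ for all $n \geq N$. For the equivalence results we will need the next properties:
(C) $\exists \varepsilon_0 > 0$ and $\varphi: (0, \varepsilon_0) \to (0, 1)$ such that $\forall \varepsilon < \varepsilon_0$, $\exists J(\varepsilon)$, and a sequence $(\theta_{n,\varepsilon})_{n=J(\varepsilon)}^{\infty}$ that increases to $\infty$ and satisfies:

$$\hat\theta_{n,\varepsilon}[(1-\varepsilon)n, n] > \varphi(\varepsilon) \quad \text{for all } n \geq J(\varepsilon).$$

(D) There exists a sequence $(\theta_n)_{n=0}^{\infty}$ that increases to $\infty$, and $\exists \varepsilon_0 > 0$ and $\psi: (0, \varepsilon_0) \to (0, 1)$ such that $\forall \varepsilon < \varepsilon_0$, $\exists I(\varepsilon)$,

$$\hat\theta_n[\psi(\varepsilon) n, n] > 1 - \varepsilon \quad \text{for all } n \geq I(\varepsilon).$$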

3.1 Preliminary Results

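We will assume without loss of generality that the payoff function in our dynamic programming satisfies $0 \leq f \leq 1$.

Lemma 3.1. $\forall \varepsilon > 0$, $\forall N$, $\exists \theta_0$ such that $\forall \theta > \theta_0$, $\forall s_0 \in S$, $\exists n \geq N$ satisfying $V_n(s_0) \geq V_\theta(s_0) - \varepsilon$.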

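Proof: By condition (B) and by (3.2), there exists $\theta_0$ such that $\sum_{t=0}^{N} \hat\theta(t) < \frac{\varepsilon}{2}$ for all $\theta > \theta_0$. Let $\theta > \theta_0$, and let $s_0 \in S$. Let $s = (s_t)_{t=0}^{\infty}$ be an $\frac{\varepsilon}{2}$-optimal play for $\theta$ in $s_0$. Then by (3.3),

$$\sum_{t=N+1}^{\infty} \hat\theta(t)\, S_t(f(s)) \geq V_\theta(s_0) - \varepsilon,$$

where $f(s) = (f(s_t))_{t=0}^{\infty}$.

As $\sum_{t \geq N+1} \hat\theta(t) \leq 1$, the above inequality implies that a convex combination of $\{S_t(f(s)) \mid t \geq N+1\}$ is greater than or equal to $V_\theta(s_0) - \varepsilon$. Therefore there exists $t \geq N+1$ with $S_t(f(s)) \geq V_\theta(s_0) - \varepsilon$, implying $V_t(s_0) \geq V_\theta(s_0) - \varepsilon$. □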
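The displayed inequality in this proof can be spelled out as follows. This is a sketch under two stated assumptions: the play $s$ is $\frac{\varepsilon}{2}$-optimal for $\theta$, and $0 \leq S_t(f(s)) \leq 1$ (which follows from $0 \leq f \leq 1$). Here $\hat\theta(t) = (\theta(t) - \theta(t+1))(t+1)$ as in (3.1).

```latex
\begin{align*}
\sum_{t=N+1}^{\infty}\hat\theta(t)\,S_t(f(s))
  &= \sum_{t=0}^{\infty}\hat\theta(t)\,S_t(f(s))
     - \sum_{t=0}^{N}\hat\theta(t)\,S_t(f(s)) \\
  &= S_\theta(f(s)) - \sum_{t=0}^{N}\hat\theta(t)\,S_t(f(s))
     && \text{by (3.3)} \\
  &\ge \Big(V_\theta(s_0) - \tfrac{\varepsilon}{2}\Big)
     - \sum_{t=0}^{N}\hat\theta(t)
     && \text{since } S_t(f(s)) \le 1 \\
  &> V_\theta(s_0) - \varepsilon
     && \text{since } \textstyle\sum_{t=0}^{N}\hat\theta(t) < \tfrac{\varepsilon}{2}.
\end{align*}
```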

Corollary 3.2.

$$\limsup V_n \geq \limsup V_\theta.$$

Lemma 3.3. $\limsup V_\theta$ is non-increasing in plays. That is,

$$\limsup V_\theta(s_0) \geq \limsup V_\theta(s_1) \quad \text{for every } s_1 \in \Gamma(s_0).$$

Proof: Note that if $(s_t)_{t=1}^{\infty}$ is $\varepsilon$-optimal in $s_1$ for $\theta$, then $s = (s_t)_{t=0}^{\infty}$ is a play in $s_0$.

Hence, it suffices to prove that for every $\varepsilon > 0$, for sufficiently large $\theta$,

$$\sum_{t=0}^{\infty} \theta(t)\,\big(f(s_{t+1}) - f(s_t)\big) < \varepsilon.$$

By rearranging terms and by (3.3), the last inequality can be proved by showing that

$$\sum_{t=0}^{\infty} \hat\theta(t)\, \frac{f(s_{t+1}) - f(s_0)}{t+1} < \varepsilon,$$

which follows easily from Condition (B). □
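The last step can be checked directly. The following is a short sketch using $0 \leq f \leq 1$, the transform $\hat\theta(t) = (\theta(t) - \theta(t+1))(t+1)$ of (3.1), and the equivalent form (B*) of Condition (B).

```latex
\begin{align*}
\sum_{t=0}^{\infty}\hat\theta(t)\,\frac{f(s_{t+1}) - f(s_0)}{t+1}
  &\le \sum_{t=0}^{\infty}\frac{\hat\theta(t)}{t+1}
   = \sum_{t=0}^{\infty}\big(\theta(t) - \theta(t+1)\big) \\
  &= \theta(0) - \lim_{T\to\infty}\theta(T)
   = \theta(0).
\end{align*}
% By (B*), theta(0) < epsilon for every theta > theta_0, which gives the bound.
```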


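Lemma 3.4 (Lehrer and Sorin (1992)). $\forall \varepsilon > 0$, $\forall n > \frac{2}{\varepsilon}$, and $\forall s_0 \in S$ there exist a play $s = (s_t)_{t=0}^{\infty}$ and a stage $L$ such that

$$\frac{1}{T+1} \sum_{t=0}^{T} f(s_{L+t}) \geq V_n(s_0) - \varepsilon \quad \text{for every } 0 \leq T \leq \frac{\varepsilon}{2}\, n.$$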

3.2 From $V_\theta$ to $V_n$

Proposition 1. Assume $\lim_{\theta \to \infty} V_\theta = V$, uniformly.

$$\forall \varepsilon > 0\ \exists N \text{ such that } \forall n \geq N,\quad V_n \leq V + \varepsilon.$$

Proof: Set $\varepsilon_1 = \frac{\varepsilon}{3}$. By the uniform convergence assumption, there exists $\theta_0$ such that

$$|V_{\theta_0}(s_0) - V(s_0)| < \varepsilon_1 \quad \text{for all } s_0 \in S. \tag{3.4}$$

Let $M$ be an integer satisfying

$$\sum_{t=0}^{M} \hat\theta_0(t) > 1 - \varepsilon_1, \tag{3.5}$$

and let $N$ be an integer satisfying $N > \frac{2M}{\varepsilon_1}$. We now show that $N$ satisfies the assertion of the proposition. Indeed, let $n \geq N$, and let $s_0 \in S$. By Lemma 3.4, there exists a play $s = (s_t)_{t=0}^{\infty}$ and an integer $L$ that satisfy the assertion of Lemma 3.4 for $\varepsilon_1$. By (3.3) and (3.5), this implies $V_{\theta_0}(s_L) \geq V_n(s_0) - 2\varepsilon_1$. Therefore $V(s_L) \geq V_n(s_0) - 3\varepsilon_1$, by (3.4). Hence, by Lemma 3.3, and because $3\varepsilon_1 = \varepsilon$,

$$V(s_0) \geq V_n(s_0) - \varepsilon. \qquad \square$$

Proposition 2. Assume $(\Theta, >)$ satisfies Condition (C), and uniform convergence of $(V_\theta)_{\theta \in \Theta}$ to $V$.

$$\forall \varepsilon > 0,\ \exists N \text{ such that } \forall n \geq N,\quad V_n \geq V - \varepsilon.$$

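Proof: Otherwise, there exists $\varepsilon > 0$ such that for every $N$, there exists $n \geq N$ and $s_0 \in S$ with $V_n(s_0) < V(s_0) - \varepsilon$. We now choose a particular integer $N$ as follows: set $\varepsilon_1 = \varepsilon_2 = \frac{\varepsilon}{2}$, and choose $\varepsilon_3, \varepsilon_4, \varepsilon_5$ in a way that will be described later. Choose an integer $K$ satisfying the following 4 properties.

(1) $K$ is large enough such that at every play $s = (s_t)_{t=0}^{\infty}$, $\forall n \geq K$, if $V_n(s_0) < V(s_0) - \varepsilon$, then

$$S_T(f(s)) < V(s_0) - \varepsilon_1 \quad \text{for all } (1-\varepsilon_1) n \leq T \leq n.$$

(2) Let $J(\varepsilon_2)$ and the sequence $(\theta_{n,\varepsilon_2})_{n=J(\varepsilon_2)}^{\infty}$ satisfy the property stated in Condition (C). Choose $K > J(\varepsilon_2)$. That is,

$$\hat\theta_n[(1-\varepsilon_2) n, n] > \varphi(\varepsilon_2) \quad \text{for every } n \geq K,$$

where $\theta_n = \theta_{n,\varepsilon_2}$.

(3) As $(\theta_n)_{n=K}^{\infty}$ is increasing to $\infty$, and $V_\theta \to V$, we can choose $K$ large enough such that

$$-\varepsilon_4 < V_{\theta_n} - V < \varepsilon_4 \quad \text{for all } n \geq K.$$

(4) By Proposition 1, we can choose $K$ large enough, such that

$$V_n < V + \varepsilon_3 \quad \text{for all } n \geq K.$$

Finally, choose $N > K$ satisfying

$$\sum_{t=0}^{K} \hat\theta_n(t) < \varepsilon_5 \quad \text{for all } n \geq N.$$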

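By our initial assumption there exists $n \geq N$ and $s_0$ with $V_n(s_0) < V(s_0) - \varepsilon$. Let $s = (s_t)_{t=0}^{\infty}$ be any play at $s_0$. Set $a_t = \hat\theta_n(t)\, S_t(f(s))$. Then

$$S_{\theta_n}(f(s)) = \sum_{t=0}^{K} a_t + \sum_{K < t < (1-\varepsilon_2) n} a_t + \sum_{(1-\varepsilon_2) n \leq t \leq n} a_t + \sum_{t > n} a_t.$$

Therefore, by the way we chose $N$,

$$S_{\theta_n}(f(s)) \leq V(s_0) + A,$$

where

$$A = \varepsilon_3 + \varepsilon_5 - \varphi(\varepsilon_2)\,\varepsilon_1.$$

As the last inequality holds for every play at $s_0$, then

$$V_{\theta_n}(s_0) \leq V(s_0) + A.$$

Hence, by property (3), satisfied by $K$ and hence by $N$, and recalling that $\varepsilon_1 = \varepsilon_2 = \frac{\varepsilon}{2}$, we have

$$\varphi\Big(\frac{\varepsilon}{2}\Big)\, \frac{\varepsilon}{2} < \varepsilon_3 + \varepsilon_4 + \varepsilon_5.$$

Thus we obtain a contradiction by choosing $\varepsilon_i$, $i = 3, 4, 5$, small enough that their sum is less than $\varphi(\frac{\varepsilon}{2})\, \frac{\varepsilon}{2}$. □
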

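3.3 From $V_n$ to $V_\theta$

Proposition 3. Assume $\lim_{n \to \infty} V_n = W$ uniformly.

$$\forall \varepsilon > 0,\ \exists \theta_0 \text{ such that } \forall \theta > \theta_0,\quad V_\theta \leq W + \varepsilon.$$

Proof: The proof is an immediate consequence of Lemma 3.1. □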

Lemma 3.5 (Lehrer and Sorin (1992)). Assume $\lim_{n \to \infty} V_n = W$ uniformly. Then for every $\varepsilon$ small enough, there exists an integer $N$, such that for every $n \geq N$ and $s_0 \in S$, there is a play $s = (s_t)_{t=0}^{\infty}$ at $s_0$ satisfying:

$$\frac{1}{T+1} \sum_{t=0}^{T} f(s_t) \geq W(s_0) - \varepsilon \quad \text{for every } \varepsilon n \leq T \leq (1-\varepsilon) n.$$

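Proposition 4. Assume $(\Theta, >)$ satisfies condition (D), and $\lim_{n \to \infty} V_n = W$ uniformly.

$$\forall \varepsilon > 0,\ \exists N \text{ such that } \forall n \geq N,\quad V_{\theta_n} \geq W - \varepsilon,$$

where $(\theta_n)_{n=0}^{\infty}$ is defined in Condition (D).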


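Proof: Let $\varepsilon > 0$. Let $\delta > 0$ satisfy $\frac{\delta}{1-\delta} < \min(\psi(\varepsilon), \varepsilon)$. Then by Lemma 3.5 there exists $N$ such that for every $n \geq N$ and $s_0 \in S$, there is a play $s = (s_t)_{t=0}^{\infty}$ at $s_0$ satisfying:

$$\frac{1}{T+1} \sum_{t=0}^{T} f(s_t) \geq W(s_0) - \delta \quad \text{for every } \delta n \leq T \leq (1-\delta) n.$$

Without loss of generality we can choose $N \geq I(\varepsilon)$. Note that if $m \geq N$ (assuming that $N$ was chosen large enough), there exists $n \geq N$, with

$$[\psi(\varepsilon) m, m] \subseteq [\delta n, (1-\delta) n].$$

Hence, $\hat\theta_m[\psi(\varepsilon) m, m] \geq 1 - \varepsilon$, and $S_T(f(s)) \geq W(s_0) - \delta \geq W(s_0) - \varepsilon$, for $T \in [\psi(\varepsilon) m, m]$. Therefore,

$$V_{\theta_m}(s_0) \geq W(s_0) - 2\varepsilon \quad \text{for all } m \geq N \text{ and all } s_0 \in S. \qquad \square$$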

Remark 1.
If the sequence $(\theta_n)_{n=0}^{\infty}$ given in Condition (D) is dense in $(\Theta, >)$ (in the sense that uniform convergence of $(V_{\theta_n})_{n=0}^{\infty}$ implies the uniform convergence of $(V_\theta)_{\theta \in \Theta}$), then under conditions (C) and (D), uniform convergence of $(V_n)_{n=0}^{\infty}$ implies uniform convergence of $(V_\theta)_{\theta \in \Theta}$ to the same limit function. As it was proved in Lehrer and Sorin (1992), such is the case when $\Theta = \{\theta_\lambda : \lambda \in [0, 1)\}$, where $\theta_\lambda(t) = (1-\lambda)\lambda^t$, and "$>$" is the natural order on real numbers.
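For the discounted family, applying (3.1) to $\theta_\lambda(t) = (1-\lambda)\lambda^t$ gives the transform $(1-\lambda)^2 (t+1) \lambda^t$. The sketch below (function names and the choice of windows are illustrative assumptions, not taken from the paper) computes how much of this mass falls in a window $[\psi n, n]$ when $n$ is taken proportional to $1/(1-\lambda)$, which is the kind of concentration Condition (D) asks for.

```python
def theta_hat_discounted(lam, t):
    """Transform (3.1) of theta(t) = (1 - lam) lam^t: (1 - lam)^2 (t + 1) lam^t."""
    return (1 - lam) ** 2 * (t + 1) * lam ** t

def window_mass(lam, lo, hi):
    """Mass that the transformed distribution puts on the integers lo, ..., hi."""
    return sum(theta_hat_discounted(lam, t) for t in range(lo, hi + 1))

# Hypothetical illustration: n = 10 / (1 - lam) and the window [n/100, n].
for lam in (0.9, 0.99, 0.999):
    n = round(10 / (1 - lam))
    print(lam, n, window_mass(lam, n // 100, n))
```

In this illustration the window keeps most of the mass for each of the three discount factors; choosing a wider window (a larger multiple of $1/(1-\lambda)$ and a smaller $\psi$) pushes the mass closer to 1.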

Remark 2.
Let $(\Theta, >)$ be a linearly ordered set of distributions on $N$ satisfying (B), (C*), and (D*), where (C*) and (D*) are obtained from (C) and (D) respectively, by replacing $\hat\theta$ with $\theta$ everywhere. Define,

$$U_\theta(s_0) = \sup_{(s_t)_{t=0}^{\infty}} \sum_{t=0}^{\infty} \theta(t)\, S_t(f(s)).$$

It is obvious that our proofs yield the equivalence theorem for this solution concept as well. E.g., for every $0 < \lambda < 1$ define

$$U_\lambda(s_0) = \sup_{(s_t)_{t=0}^{\infty}} (1-\lambda) \sum_{t=0}^{\infty} \lambda^t S_t(f(s)).$$

Then $(U_\lambda)$ converges uniformly if and only if $(V_n)$ converges uniformly, and both share the same limit function.

References

1. Dutta PK, What Do Discounted Optima Converge To? A Theory of Discount Rate Asymptotics in Economic Models, Journal of Economic Theory 55 (1991), 64-94.
2. Lehrer E and Monderer D, Discounting Versus Averaging in Dynamic Programming, Games and Economic Behavior (to appear) (1989).
3. Lehrer E and Sorin S, A Uniform Tauberian Theorem in Dynamic Programming, Mathematics of Operations Research 17 (1992), 303-307.
4. Mertens J-F, Repeated Games, Proceedings of the International Congress of Mathematicians (Berkeley 1986) (1987), 1528-1577.
5. Mertens J-F and Neyman A, Stochastic Games, International Journal of Game Theory 10, 2 (1981), 53-66.

Received June 1992


Revised version February 1993
