Stopping Time Markov Processes
and

    E[f(X_{n+1}) | X_n = x, X_{n−1}, …, X_0] = ∫_S f(y) P(dy|x)
for any measurable function f : S → R. The main information from
these equalities is that the distribution of Xn+1 , conditional on the history
{Xn , Xn−1, . . . , X0}, is independent of {Xn−1 , . . . , X0}.
Let c : S → R and g : S → R be two functions on the state space. The
objective is to minimize the total expected cost
    E[ g(X_τ) + Σ_{j=0}^{τ−1} c(X_j) ].
1 Finite-horizon
The DPE for this problem is quite simple: Define
    V_N(x) := g(x)

    V_n(x) := min{ g(x), c(x) + ∫_S V_{n+1}(y) P(dy|x) },   n = N−1, …, 0.
Proposition 1 The value function v_N(x) = V_0(x), and the optimal stopping time is

    τ*_N := inf{ j ≥ 0 : V_j(X_j) = g(X_j) }.
In particular, we have
    E[Z_{τ*_N ∧ (n+1)} | X_n, X_{n−1}, …, X_0]
    = 1_{τ*_N ≤ n} Z_{τ*_N} + 1_{τ*_N ≥ n+1} E[Z_{n+1} | X_n, X_{n−1}, …, X_0]
    = 1_{τ*_N ≤ n} Z_{τ*_N} + 1_{τ*_N ≥ n+1} ( Σ_{j=0}^{n} c(X_j) + ∫_S V_{n+1}(y) P(dy|X_n) )
    = 1_{τ*_N ≤ n} Z_{τ*_N} + 1_{τ*_N ≥ n+1} ( Σ_{j=0}^{n−1} c(X_j) + V_n(X_n) )
    = 1_{τ*_N ≤ n} Z_{τ*_N} + 1_{τ*_N ≥ n+1} Z_n
    = Z_{τ*_N ∧ n}.
Therefore, since τ*_N ≤ N, we have E[Z_{τ*_N}] = E[Z_{τ*_N ∧ N}] = E[Z_0] = V_0(x). But
    E[Z_{τ*_N}] = E[ Σ_{j=0}^{τ*_N −1} c(X_j) + V_{τ*_N}(X_{τ*_N}) ] = E[ Σ_{j=0}^{τ*_N −1} c(X_j) + g(X_{τ*_N}) ].
It follows that
    V_0(x) = v_N(x),

and that τ*_N is optimal for the problem with horizon N.
The following result is a restatement of Proposition 1; it is the forward form of the above DPE.
    v_0(x) = g(x)

    v_{n+1}(x) = min{ g(x), c(x) + ∫_S v_n(y) P(dy|x) }.
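For a chain on a finite state space, either form of the recursion is straightforward to implement. The sketch below runs the backward induction and flags the states where stopping is optimal, i.e. where V_n(x) = g(x); the functions g, c, the transition matrix, and the horizon used here are hypothetical illustrative choices.

```python
import numpy as np

def finite_horizon_stopping(g, c, P, N):
    """Backward induction for the finite-horizon optimal stopping DPE.

    g, c : arrays of shape (S,) -- stopping cost and running cost
    P    : array of shape (S, S) -- transition kernel, P[x, y] = P(y | x)
    N    : horizon

    Returns V with V[n, x] = V_n(x), and a boolean array `stop` marking
    the states where V_n(x) = g(x), i.e. where stopping is optimal.
    """
    S = len(g)
    V = np.zeros((N + 1, S))
    V[N] = g                            # V_N(x) = g(x)
    for n in range(N - 1, -1, -1):      # V_n = min{g, c + P V_{n+1}}
        V[n] = np.minimum(g, c + P @ V[n + 1])
    stop = V >= g - 1e-12               # V <= g always, so this tests V_n(x) = g(x)
    return V, stop
```

The rule `stop[n]` recovers the optimal stopping time of Proposition 1, τ*_N = inf{j ≥ 0 : V_j(X_j) = g(X_j)}.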
with 0 < a < b. Each solicitation will cost c. After the first offer comes in,
what should the agent do so as to minimize the total cost plus the minimum
price attained?
Solution: The objective is to find
    v_N(x) := E[ X_τ + Σ_{j=0}^{τ−1} c | X_0 = x ],

where N = 2, and X_n := Y_0 ∧ ⋯ ∧ Y_n.
It is not difficult to check that {X0, X1, X2} is a time-homogeneous
Markov chain taking values in {a, b}, with transition probabilities
P (Xn+1 = a|Xn = a) = 1,
P (Xn+1 = b|Xn = a) = 0,
P (Xn+1 = a|Xn = b) = 1/2,
P (Xn+1 = b|Xn = b) = 1/2.
We will write
    c = (b − a)/2 + ε.
Define V_2(x) = x (that is, V_2(a) = a, V_2(b) = b), let V_1 and V_0 be given by the DPE recursion, and set

    τ* = inf{ n ≥ 0 : V_n(X_n) = X_n }.
More precisely,
2. For ε < 0, we have v_2(a) = a and v_2(b) = b + 3ε/2. It is not difficult to see that

    τ* = inf{ n ≥ 0 : X_n = a }.

That is, the optimal policy is to search until the lowest price a is solicited.
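The ε < 0 case is easy to check numerically. A minimal sketch, with the illustrative values a = 1, b = 3, ε = −0.1 (my own choices):

```python
# Two-offer search example: offers take values a or b (0 < a < b),
# cost per solicitation c = (b - a)/2 + eps, here with eps < 0.
a, b, eps = 1.0, 3.0, -0.1          # illustrative values (my own choice)
c = (b - a) / 2 + eps               # = 0.9

def step(V):
    """One step of the DPE: V_n(x) = min{x, c + E[V_{n+1}(X_{n+1}) | X_n = x]}.

    From a the chain X_n = Y_0 ∧ ... ∧ Y_n stays at a; from b it moves
    to a or stays at b, each with probability 1/2."""
    Va = min(a, c + V[0])
    Vb = min(b, c + 0.5 * (V[0] + V[1]))
    return (Va, Vb)

V2 = (a, b)        # V_2(x) = x
V1 = step(V2)      # V_1(b) = b + eps
V0 = step(V1)      # V_0(b) = b + 3*eps/2, confirming the claim above

print(V0)          # (a, b + 3*eps/2) = (1, 2.85) up to rounding
```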
2 Infinite-horizon
The following result claims that the value function satisfies the DPE. An optimal (finite) stopping time exists if and only if the stopping time

    τ* := inf{ n ≥ 0 : v(X_n) = g(X_n) }

is finite almost surely.
But thanks to Condition 1, MCT, and DCT, we have

    E[ g(X_{τ_ε}) + Σ_{j=0}^{τ_ε −1} c(X_j) ] = lim_n E[ g(X_{τ_ε ∧ n}) + Σ_{j=0}^{(τ_ε ∧ n)−1} c(X_j) ]
                                              ≥ lim_n v_n(x).
Therefore,
    v(x) + ε ≥ lim_n v_n(x).
Since ε is arbitrary, we have v(x) ≥ limn vn (x), which in turn implies v(x) =
limn vn (x), for every x ∈ S.
By Proposition 2, we have
    v_{n+1}(x) = min{ g(x), c(x) + ∫_S v_n(y) P(dy|x) }.

Letting n → ∞ on both sides, the DPE for v follows from DCT.
Fix X0 = x, and assume now there exists an optimal finite stopping
time, say σ. Consider the process
    Z_n := Σ_{j=0}^{n−1} c(X_j) + v(X_n),   n = 0, 1, ….
v(Xσ ) = g(Xσ ).
This implies σ ≥ τ ∗ . Whence τ ∗ is finite if there exists an optimal finite
stopping time.
Suppose now τ* is finite. The same proof as in Proposition 1 yields that

    v(x) = E[Z_0] = ⋯ = E[Z_{τ* ∧ n}] = E[ Σ_{j=0}^{(τ* ∧ n)−1} c(X_j) + v(X_{τ* ∧ n}) ].
Corollary 1 The value function v is the largest solution to the DPE (3), in the sense that if u is another solution to the DPE, then v(x) ≥ u(x) for every x ∈ S.
Proof. Let x̄ ∈ S be such that
Since S is a finite set, such an x̄ always exists. Clearly, v(x̄) = g(x̄). Thus
    τ* := inf{ n ≥ 0 : v(X_n) = g(X_n) } ≤ inf{ n ≥ 0 : X_n = x̄ } =: σ.
Proof. The proof is exactly the same as the last part of the proof of Theorem
1 with v replaced by v̄. We omit the details.
All the results we have obtained hold for the discounted problem, as long as
Condition 1 holds.
3 Examples of infinite horizon
is the least concave majorant of f (i.e., the smallest concave function that
dominates f ).
    v_0(x) = f(x)

    v_{n+1}(x) = max{ f(x), ∫_S v_n(y) P(dy|x) }.
for every x ∈ S. Then
    v_{n+1}(x) ≤ max{ f(x), ∫_S u(y) P(dy|x) } ≤ max{ f(x), u(x) } = u(x).
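The least-concave-majorant characterization can be seen numerically by iterating the recursion for a simple symmetric random walk absorbed at the endpoints of a finite grid; the grid size and the payoff f below are my own illustrative choices.

```python
import numpy as np

# Iterate v_{n+1}(x) = max{f(x), (v_n(x-1) + v_n(x+1))/2} in the interior of
# {0, ..., M}; the endpoint states are absorbing, so v = f there.  The
# iterates increase to a concave function that majorizes f.
M = 10
x = np.arange(M + 1)
f = np.sin(np.pi * x / M) * (1 + 0.5 * np.cos(3.0 * x))   # arbitrary payoff

v = f.copy()
for _ in range(5000):
    cont = v.copy()
    cont[1:-1] = 0.5 * (v[:-2] + v[2:])   # expected next value, interior states
    v = np.maximum(f, cont)
```

At the fixed point, v ≥ f and v(x) ≥ [v(x−1) + v(x+1)]/2 in the interior, which is the discrete (midpoint) form of a concave majorant.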
and the optimal strategy. The value function represents the price of a discrete put option.
Clearly, α < 0. But since we assume v̄ is bounded, it follows that B = 0, and thus

    v̄(x) = A e^{αx},   ∀ x ≥ b.
But at x = b, we must have

    1 − e^b = A e^{αb},

or

    A = e^{−αb}(1 − e^b).
In other words, a candidate solution is

    v̄(x) = { 1 − e^x,               if x ≤ b;
            { e^{α(x−b)}(1 − e^b),  if x > b.                (5)
But what is the value of the free boundary b ∈ Z? Recall that for v̄ to be a solution, we will have

    1 − e^x ≥ β/2 [v̄(x+1) + v̄(x−1)],   ∀ x ≤ b,             (6)

and

    1 − e^x < β/2 [v̄(x+1) + v̄(x−1)],   ∀ x > b.             (7)

These inequalities indeed uniquely determine b ∈ Z. We have the following Lemma.
Lemma 1 There exists a unique negative integer b ∈ Z such that the above
inequalities (6)-(7) hold. Indeed, b ∈ Z is the unique integer such that
b ∈ (B − 1, B], with
    B = log[ (1 − e^α) / (1 − e^{α−1}) ] < 0.
The function v̄ given by (5) is the value function, and the optimal stopping
time is
    τ* = inf{ n ≥ 0 : X_n ≤ b }.
The proof of this lemma is given in the Appendix.
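Lemma 1 is also easy to check numerically. In the sketch below, β is an illustrative choice, and I take α < 0 to be the root of e^α + e^{−α} = 2/β — my reading of the definition (4), which is not reproduced above — so that e^{αx} solves the continuation equation. The code computes B, picks b as the unique integer in (B − 1, B], and verifies the variational inequalities (6)-(7) on a window around b:

```python
import math

beta = 0.9                                  # illustrative discount factor

# alpha < 0 solving e^alpha + e^{-alpha} = 2/beta (assumed form of (4)):
z = (2 / beta - math.sqrt((2 / beta) ** 2 - 4)) / 2     # z = e^alpha in (0, 1)
alpha = math.log(z)

B = math.log((1 - z) / (1 - z / math.e))    # B = log[(1-e^a)/(1-e^{a-1})] < 0
b = math.floor(B)                           # unique integer in (B - 1, B]

def vbar(x):
    """Candidate value function (5)."""
    if x <= b:
        return 1 - math.exp(x)
    return math.exp(alpha * (x - b)) * (1 - math.exp(b))

# Variational inequalities: stop for x <= b, continue for x > b.
for x in range(b - 10, b + 11):
    cont = beta / 2 * (vbar(x + 1) + vbar(x - 1))
    if x <= b:
        assert vbar(x) >= cont - 1e-12              # (6): stopping optimal
    else:
        assert max(1 - math.exp(x), 0.0) < cont     # (7): payoff < continuation
        assert abs(vbar(x) - cont) < 1e-12          # v̄ harmonic past b
```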
as j → ∞. This implies that v(x) ≥ 1, whence v(x) = 1. The value function clearly satisfies the DPE

    v(x) = max{ (1 − e^x)+, [v(x+1) + v(x−1)]/2 }.
But since the stopping time
inf{n ≥ 0 : v(Xn) = (1 − exp(Xn ))+ } = ∞,
there is no optimal stopping time. Note that {σj } is a sequence of stopping
times that approach the value function (i.e., {σj } is an optimizing sequence).
The results in the previous sections rely on Condition 1. What if the con-
dition fails to hold? Or more seriously, what if the cost structure is not
as specified; e.g. the cost is path-dependent? In this case, one method we
can apply is the so-called verification argument. This approach has been used
many times in Chapter 2, and is also implicitly used in the proof of Theorem
1. The next two examples show that this method is quite general.
Example 4 (Search for maximum) This is another search model, but with
a different cost. Let {Y0 , Y1, . . .} be a sequence of iid non-negative random
variables, representing the offers, with common distribution F and such
that E[Y0] < ∞. Let c > 0 be a constant, representing the cost for each
solicitation. The objective for the agent is to solve the optimization problem
    max_τ E[ Y_0 ∨ Y_1 ∨ ⋯ ∨ Y_τ − cτ ].
This does not satisfy Condition 1 since {X_n} may not be bounded. Denote the value function by

    v(x) = sup_{τ : P(τ<∞)=1} E[ X_τ − cτ | X_0 = x ].
1. If E[Y0] ≤ c, then the optimal stopping time is τ ∗ = 0, and the value
function is v(x) ≡ x for all x. Indeed, it is trivial to check that the
process {X_n − cn} is a supermartingale. Thus, for every finite stopping time τ and n ∈ N, optional sampling gives E[X_{τ∧n} − c(τ ∧ n)] ≤ x, and letting n → ∞,

    E[X_τ − cτ] ≤ x.
Before proving the claim, it is worth pointing out that x* always exists, since

    ∫_0^∞ [1 − F(y)] dy = E[Y_0] > c,

and that τ* is obviously finite.
The idea of the proof is the same as in Case 1. Denote by v̄ the RHS of equation (8). It is not difficult to check that

    v̄(x) = max{ x, −c + ∫ v̄(y) P(dy|x) }                    (9)
         = max{ x, −c + ∫ v̄(x ∨ y) dF(y) }.
Indeed, for x > x*, we have

    −c + ∫ v̄(x ∨ y) dF(y)
    = −c + ∫_0^x x dF(y) + ∫_x^∞ y dF(y)
    = −c + E[Y_0] + ∫_0^x (x − y) dF(y)
    = −c + E[Y_0] + ∫_0^x F(y) dy
    = −∫_{x*}^∞ [1 − F(y)] dy + ∫_0^∞ [1 − F(y)] dy + ∫_0^x F(y) dy
    = ∫_0^{x*} [1 − F(y)] dy + ∫_0^x F(y) dy
    = x − ∫_{x*}^x [1 − F(y)] dy,
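The chain of identities above can be verified in closed form for a concrete distribution. Taking Y_j uniform on [0, 1] (my own illustrative choice), F(y) = y, E[Y_0] = 1/2, and x* solves (1 − x*)²/2 = c:

```python
import math

c = 0.08                           # illustrative cost, c < E[Y0] = 1/2
x_star = 1 - math.sqrt(2 * c)      # solves ∫_{x*}^1 (1 - F(y)) dy = c

def continuation(x):
    """-c + ∫ v̄(x ∨ y) dF(y) for x > x*, with v̄(y) = y on [x*, 1].

    For uniform F, the integrand is x on [0, x] and y on (x, 1]."""
    return -c + x * x + (1 - x * x) / 2

def closed_form(x):
    """x - ∫_{x*}^x (1 - F(y)) dy, the last line of the computation."""
    return x - ((1 - x_star) ** 2 - (1 - x) ** 2) / 2

for x in [0.65, 0.8, 0.95]:
    assert abs(continuation(x) - closed_form(x)) < 1e-12
    assert continuation(x) <= x    # so stopping at x > x* is optimal
```

The final assertion is the point of the computation: for x > x*, the continuation value falls short of x, so v̄(x) = x there.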
Example 5 (Discrete up-and-out put option) Suppose the log-stock price is modeled by the simple symmetric random walk {X_0, X_1, …} on Z. Let β ∈ (0, 1) be a discount factor. Compute the value function

    v(x) := sup_{τ : P(τ<∞)=1} E[ β^τ (1 − exp(X_τ))+ · 1_{max_{0≤j≤τ} X_j < H} | X_0 = x ].
Now consider the process Z_n := β^{σ∧n} V(X_{σ∧n}), n = 0, 1, …. We claim {Z_n} is a supermartingale. Indeed,
It follows from the Optional Sampling Theorem that for any stopping time τ and any n ∈ N, we have E[Z_{τ∧n}] ≤ E[Z_0] = V(x). But

    Z_τ = β^{σ∧τ} V(X_{σ∧τ}) ≥ β^τ (1 − exp(X_τ))+ · 1_{τ<σ}.

Whence

    V(x) ≥ E[ β^τ (1 − exp(X_τ))+ · 1_{τ<σ} ].
Taking supremum over τ on the RHS, we have V (x) ≥ v(x).
Now define Z*_n := β^{σ∧τ*∧n} V(X_{σ∧τ*∧n}), n = 0, 1, …. Similarly to the above argument, we have

    E[Z*_{n+1} | X_n, …, X_0]
    = β^{σ∧τ*} V(X_{σ∧τ*}) 1_{σ∧τ* ≤ n} + 1_{σ∧τ* ≥ n+1} β^{n+1} [V(X_n + 1) + V(X_n − 1)]/2
    = β^{σ∧τ*} V(X_{σ∧τ*}) 1_{σ∧τ* ≤ n} + 1_{σ∧τ* ≥ n+1} β^n V(X_n)
    = Z*_n.
In particular,
    E[ β^{σ∧τ*∧n} V(X_{σ∧τ*∧n}) ] = E[Z*_n] = E[Z*_{n−1}] = ⋯ = E[Z*_0] = V(x).
But

    β^{σ∧τ*} V(X_{σ∧τ*}) = β^{τ*} (1 − exp(X_{τ*}))+ · 1_{τ*<σ}.

Thus V(x) ≤ v(x), which in turn implies that V(x) = v(x) and that τ* is optimal.
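As a sanity check on the verification argument, the up-and-out put value can also be computed by brute-force value iteration on a truncated grid; β, the barrier H, and the truncation depth M below are illustrative choices of mine.

```python
import math

beta, H, M = 0.9, 3, 40            # discount factor, barrier, truncation depth

xs = list(range(-M, H + 1))
payoff = {x: max(1 - math.exp(x), 0.0) for x in xs}
v = {x: 0.0 for x in xs}

for _ in range(2000):              # value iteration; contracts with factor beta
    new = {}
    for x in xs:
        if x >= H:
            new[x] = 0.0           # knocked out at the barrier
        elif x == -M:
            new[x] = payoff[x]     # truncation: stop at the bottom edge
        else:
            new[x] = max(payoff[x], beta / 2 * (v[x + 1] + v[x - 1]))
    v = new
```

The fixed point satisfies v(x) = max{(1 − e^x)+, β/2 [v(x+1) + v(x−1)]} in the interior with v(H) = 0, matching the variational characterization used in the verification argument above.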
with α as defined in (4), and
    A = (1 − e^b) e^{−αb} / (1 − e^{2α(H−b)}),    B = (1 − e^b) e^{αb} / (1 − e^{−2α(H−b)}).
The free boundary is the unique integer in the interval (D − 1, D], with D
being the unique negative solution satisfying the equation
    (1 − e^{D−1}) [1 − e^{2α(H−D)}] + (e^D − 1) [e^{−α} − e^{α+2α(H−D)}] = 0.
Proof. The proof of the lemma consists of routine technical details, and is deferred to the Appendix.
A Appendix
Proof of Lemma 1. It suffices to solve for an integer b < 0. The rest is
implied by Proposition 3. The inequalities (6)-(7) amount to
    1 − e^x ≥ β/2 [ (1 − e^{x+1}) + (1 − e^{x−1}) ],   ∀ x ≤ b − 1,
    1 − e^b ≥ β/2 [ e^α (1 − e^b) + (1 − e^{b−1}) ],     x = b,
    (1 − e^x)+ < e^{α(x−b)} (1 − e^b),                  ∀ x ≥ b + 1.
However, since b < 0, the LHS is a convex function, and the two sides are equal at x = b. Thus the above inequality reduces to the case x = b + 1:

    e^{b+1} + e^α (1 − e^b) > 1,

or

    b > log[ (1 − e^α) / (e − e^α) ] = B − 1.
The inequality corresponding to x = b gives
    b ≤ log[ (e^{−α} − 1) / (e^{−α} − e^{−1}) ] = B.
These uniquely determine b as the integer in the interval (B − 1, B].
It remains to verify the inequality for x ≤ b − 1, or
    1 − β ≥ e^x ( 1 − βe/2 − βe^{−1}/2 ),   ∀ x ≤ b − 1.
Thus, we only need to show V (x) > 1 − ex for all b < x < H. However,
    (A e^{αx} + B e^{−αx})″ = α² (A e^{αx} + B e^{−αx}) > 0