where t runs from 0 to the given terminal time T > 0 , the supremum is taken
over admissible controls u , and c > 0 is a given constant. By employing the
method of Lagrange multipliers we show that the nonlinear problem can be reduced
to a family of linear problems. Solving the latter using a classic Hamilton-Jacobi-
Bellman approach we find that the optimal dynamic control is given by
u_*(t, x) = (δ/(2cσ)) (1/x) e^{(δ²−r)(T−t)}
where δ = (µ−r)/σ . The dynamic formulation of the problem and the method
of solution are applied to the constrained problems of maximising/minimising the
mean/variance subject to the upper/lower bound on the variance/mean from which
the nonlinear problem above is obtained by optimising the Lagrangian itself.
1. Introduction
Imagine an investor who has an initial wealth which he wishes to exchange between a risky
stock and a riskless bond in a self-financing manner dynamically in time so as to maximise
his return and minimise his risk at the given terminal time. In line with the mean-variance
analysis of Markowitz [11] where the optimal portfolio selection problem of this kind was solved
in a single period model (see e.g. Merton [12] and the references therein) we will identify the
return with the expectation of the terminal wealth and the risk with the variance of the terminal
wealth. The quadratic nonlinearity of the variance then moves the resulting optimal control
problem outside the scope of the standard optimal control theory (see e.g. [5]) which may be
viewed as dynamic programming in the sense of solving the Hamilton-Jacobi-Bellman (HJB)
equation and obtaining an optimal control which remains optimal independently from the
initial (and hence any subsequent) value of the wealth. Consequently the results and methods
Mathematics Subject Classification 2010. Primary 60H30, 60J65. Secondary 49L20, 91G80.
Key words and phrases: Nonlinear optimal control, static optimality, dynamic optimality, mean-variance
analysis, the Hamilton-Jacobi-Bellman equation, martingale, geometric Brownian motion, Markov process.
of the standard/linear optimal control theory are not directly applicable in this new/nonlinear
setting. The purpose of the present paper is to develop a new methodology for solving nonlinear
optimal control problems of this kind and demonstrate its use in the optimal mean-variance
portfolio selection problem stated above. This is done in parallel to the novel methodology for
solving nonlinear optimal stopping problems that was recently developed in [13] when tackling
an optimal mean-variance selling problem.
Assuming that the stock price follows a geometric Brownian motion and the bond price
compounds exponentially, we first consider the constrained problem in which the investor aims
to maximise the expectation of his terminal wealth X_T^u over all admissible controls u (representing the fraction of the wealth held in the stock) such that the variance of X_T^u is bounded above by a positive constant. Similarly the investor could aim to minimise the variance of his terminal wealth X_T^u over all admissible controls u such that the expectation of X_T^u is bounded below by a positive constant. A first application of Lagrange multipliers implies that the Lagrange function (Lagrangian) for either/both constrained problems can be expressed as a linear combination of the expectation of X_T^u and the variance of X_T^u with opposite signs.
Optimisation of the Lagrangian over all admissible controls u thus yields the central optimal
control problem under consideration. Due to the quadratic nonlinearity of the variance we can
no longer apply standard/linear results of the optimal control theory to solve the problem.
Conditioning on the size of the expectation we show that a second application of Lagrange
multipliers reduces the nonlinear optimal control problem to a family of linear optimal control
problems. Solving the latter using a classic HJB approach we find that the optimal control
depends on the initial point of the controlled wealth process in an essential way. This spatial
inconsistency introduces a time inconsistency in the problem that in turn raises the question
whether the optimality obtained is adequate for practical purposes. We refer to this optimality
as the static optimality (Definition 1) to distinguish it from the dynamic optimality (Definition
2) in which each new position of the controlled wealth process yields a new optimal control
problem to be solved upon overruling all the past problems. This in effect corresponds to solving
infinitely many optimal control problems dynamically in time with the aim of determining the
optimal control (in the sense that no other control applied at present time could produce a
more favourable value at the terminal time). While the static optimality has been used in
the paper by Strotz [21] under the name of 'pre-commitment', as far as we know the dynamic
optimality has not been studied in the nonlinear setting of optimal control before. In Section 4
below we give a more detailed account of the mean-variance results and methods on the static
optimality starting with the paper by Richardson [19]. Optimal controls in all these papers
are time inconsistent in the sense described above. This line of papers ends with the paper by
Basak and Chabakauri [1] where a time-consistent control is derived that corresponds to Strotz's approach of 'consistent planning' [21] realised as the subgame-perfect Nash equilibrium
(the optimality concept refining Nash equilibrium proposed by Selten in 1965).
We show that the dynamic formulation of the nonlinear optimal control problem admits a
simple closed-form solution (Theorem 3) in which the optimal control no longer depends on
the initial point of the controlled wealth process and hence is time consistent. Remarkably
we also verify that this control yields the expected terminal value which (i) coincides with the
expected terminal value obtained by the statically optimal control (Remark 4) and moreover
(ii) dominates the expected terminal value obtained by the subgame-perfect Nash equilibrium
control (in the sense of Strotz’s ‘consistent planning’) derived in [1] (Section 4). Closed-form
solutions to the constrained problems are then derived using the solution to the unconstrained
problem (Corollaries 5 and 7). These results are of both theoretical and practical interest.
In the first problem we note that the optimal wealth exhibits a dynamic compliance effect
(Remark 6) and in the second problem we observe that the optimal wealth solves a meander
type equation of independent interest (Remark 8). In both problems we verify that the expected
terminal value obtained by the dynamically optimal control dominates the expected terminal
value obtained by the statically optimal control.
The novel problems and methodology of the present paper suggest a number of avenues for
further research. Firstly, we work within the transparent setting of one-dimensional geometric
Brownian motion in order to illustrate the main ideas and describe the new methodology with-
out unnecessary technical complications. Extending the results to higher dimensions and more
general diffusion/Markov processes appears to be worthy of further consideration. Secondly,
for similar tractability reasons we assume that (i) unlimited short-selling and borrowing are
permitted, (ii) transaction costs are zero, (iii) the wealth process may take both positive and
negative values of unlimited size. Extending the results under some of these constraints being
imposed is also worthy of further consideration. In both of these settings it is interesting to
examine to what extent the results and methods laid down in the present paper remain valid
under any of these more general or restrictive hypotheses.
Let the riskless bond price B solve

(2.1)    dB_t = r B_t dt

with B_0 = b for some b > 0 where r ∈ IR is the interest rate, and let the risky stock price S follow a geometric Brownian motion solving

(2.2)    dS_t = µ S_t dt + σ S_t dW_t

with S_0 = s for some s > 0 where µ ∈ IR is the drift, σ > 0 is the volatility, and W is a standard Brownian motion defined on a probability space (Ω, F, P). Note that a unique solution to (2.1) is given by B_t = b e^{rt} and recall that a unique strong solution to (2.2) is given by S_t = s exp( σ W_t + (µ − σ²/2) t ) for t ≥ 0.
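As a small illustrative aside (not part of the original text), the closed-form expressions for B and S above can be simulated directly on a time grid; the parameter values below are arbitrary choices used only for illustration.

```python
import numpy as np

# Illustrative parameter values (arbitrary choices)
b, s, r, mu, sigma, T = 1.0, 1.0, 0.1, 0.5, 0.4, 1.0
n = 1000
t = np.linspace(0.0, T, n + 1)

rng = np.random.default_rng(0)
dW = rng.normal(0.0, np.sqrt(np.diff(t)))   # Brownian increments
W = np.concatenate(([0.0], np.cumsum(dW)))  # Brownian path W_t

B = b * np.exp(r * t)                                # bond price B_t = b e^{rt}
S = s * np.exp(sigma * W + (mu - sigma**2 / 2) * t)  # stock price S_t (geometric Brownian motion)
```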
Consider the investor who has an initial wealth x0 ∈ IR which he wishes to exchange
between B and S in a self-financing manner (with no exogenous infusion or withdrawal of
wealth) dynamically in time up to the given horizon T > 0 . It is then well known (see e.g. [2,
Chapter 6]) that the investor’s wealth X u solves
(2.3)    dX_t^u = ( r(1−u_t) + µ u_t ) X_t^u dt + σ u_t X_t^u dW_t

with X_{t_0}^u = x_0 where u_t denotes the fraction of the investor's wealth held in the stock at
time t ∈ [t0 , T ] for t0 ∈ [0, T ) given and fixed. Note that (i) ut < 0 corresponds to short
selling of the stock, (ii) ut > 1 corresponds to borrowing from the bond, and (iii) ut ∈ [0, 1]
corresponds to a long position in both the stock and the bond.
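The wealth dynamics (2.3) can be simulated for any feedback control u(t, x) by a simple Euler scheme. The sketch below is an illustration only; the helper simulate_wealth and the constant-fraction control used in the example are assumptions made for the sketch, not objects from the paper.

```python
import numpy as np

def simulate_wealth(u, x0, r, mu, sigma, t0, T, n=1000, rng=None):
    """Euler scheme for dX = (r(1-u)+mu*u) X dt + sigma*u X dW with X_{t0} = x0,
    where u = u(t, x) is a feedback control (fraction of wealth held in the stock)."""
    rng = rng or np.random.default_rng()
    dt = (T - t0) / n
    t, X = t0, x0
    for _ in range(n):
        ut = u(t, X)
        dW = rng.normal(0.0, np.sqrt(dt))
        X += (r * (1 - ut) + mu * ut) * X * dt + sigma * ut * X * dW
        t += dt
    return X

# Example: keep half of the wealth in the stock at all times (an arbitrary control)
XT = simulate_wealth(lambda t, x: 0.5, x0=1.0, r=0.1, mu=0.5, sigma=0.4, t0=0.0, T=1.0)
```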
To simplify the exposition we will assume that the control u in (2.3) is given by u_t = u(t, X_t^u) where (t, x) ↦ u(t, x)·x is a continuous function from [0, T]×IR into IR for which
the stochastic differential equation (2.3) understood in Itô’s sense has a unique strong solution
X u (meaning that the solution X u to (2.3) is adapted to the natural filtration of W and if
X̃ u is another solution to (2.3) of this kind then X u and X̃ u are equal almost surely). We
will call controls of this kind admissible in the sequel. Recalling that the natural filtration of
S coincides with the natural filtration of W we see that admissible controls have a natural
financial interpretation as they are obtained as deterministic (measurable) functionals of the
observed stock price. Moreover, adopting the convention that u(t, 0)·0 := lim_{x→0, x≠0} u(t, x)·x
we see that the solution X u to (2.3) could take both positive and/or negative values after
passing through zero when the latter limit is different from zero (as is the case in the main
results below). This convention corresponds to re-expressing (2.3) in terms of the total wealth
u_t X_t^u held in the stock as opposed to its fraction u_t which we follow throughout (note that the essence of the wealth equation (2.3) remains the same in both cases). We always identify u(t, 0) with u(t, 0)·0 however since x ↦ u(t, x) may not be well defined at 0.
Note that the results to be presented below also hold if the set of admissible controls
is enlarged to include discontinuous and path dependent controls u that are adapted to the
natural filtration of W , or even controls u which are adapted to a larger filtration still making
W a martingale so that (2.3) has a unique weak solution X u (meaning that the solution X u
to (2.3) is adapted to the larger filtration and if X̃ u is another solution to (2.3) of this kind
then X u and X̃ u are equal in law). Since these extensions follow along the same lines and
needed modifications of the arguments are evident, we will omit further details in this direction
and focus on the set of admissible controls as defined above.
For a given admissible control u we let Pt,x denote the probability measure (defined on
the canonical space) under which the solution X u to (2.3) takes value x at time t for
(t, x) ∈ [0, T ] × IR . Note that X u is a (strong) Markov process with respect to Pt,x for
(t, x) ∈ [0, T ]×IR .
Consider the optimal control problem
(2.4)    V(t, x) = sup_u [ E_{t,x}(X_T^u) − c Var_{t,x}(X_T^u) ]

where the supremum is taken over all admissible controls u such that E_{t,x}[(X_T^u)²] < ∞ for (t, x) ∈ [0, T]×IR and c > 0 is a given and fixed constant. A sufficient condition for the latter expectation to be finite is that E_{t,x}[ ∫_t^T (1+u_s²)(X_s^u)² ds ] < ∞ and we will assume in the sequel that all admissible controls by definition satisfy that condition as well.
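For a fixed admissible control the objective in (2.4) can be estimated by plain Monte Carlo. As a minimal sketch (assuming a constant-fraction control, for which the terminal wealth in (2.3) is lognormal; all parameter values are arbitrary):

```python
import numpy as np

# Monte Carlo estimate of E[X_T^u] - c Var[X_T^u] for a constant-fraction control u
x0, c, r, mu, sigma, t0, T, u = 1.0, 1.0, 0.1, 0.5, 0.4, 0.0, 1.0, 0.5
rng = np.random.default_rng(1)

Z = rng.standard_normal(1_000_000)
XT = x0 * np.exp((r * (1 - u) + mu * u - 0.5 * sigma**2 * u**2) * (T - t0)
                 + sigma * u * np.sqrt(T - t0) * Z)
print(XT.mean() - c * XT.var())
```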
Due to the quadratic nonlinearity of the second term in the expression Var_{t,x}(X_T^u) = E_{t,x}[(X_T^u)²] − [E_{t,x}(X_T^u)]² it is evident that the problem (2.4) falls outside the scope of the
standard/linear optimal control theory for Markov processes (see e.g. [5]). Moreover, we will
see below that in addition to the static formulation of the nonlinear problem (2.4) where the
maximisation takes place relative to the initial point (t, x) which is given and fixed, one is also
naturally led to consider a dynamic formulation of the nonlinear problem (2.4) in which each
new position of the controlled process (t, X_t^u)_{t∈[0,T]} yields a new optimal control problem to
be solved upon overruling all the past problems. We believe that this dynamic optimality is of
general interest in the nonlinear problems of optimal control (as well as nonlinear problems of
optimal stopping as discussed in [13]).
The problem (2.4) seeks to maximise the investor's return identified with the expectation of X_T^u and minimise the investor's risk identified with the variance of X_T^u upon applying the
control u . This identification is done in line with the mean-variance analysis of Markowitz
[11]. Moreover, we will see in the proof below that the problem (2.4) is obtained by optimising
the Lagrangian of the constrained problems
(2.5)    V_1(t, x) = sup_{u : Var_{t,x}(X_T^u) ≤ α} E_{t,x}(X_T^u)

(2.6)    V_2(t, x) = inf_{u : E_{t,x}(X_T^u) ≥ β} Var_{t,x}(X_T^u)

respectively, where u is any admissible control, and α ∈ (0, ∞) and β ∈ IR are given and
fixed constants. Solving (2.4) we will therefore be able to solve (2.5) and (2.6) as well. Note
that the constrained problems have transparent interpretations in terms of the investor’s return
and the investor’s risk as discussed above.
We now formalise definitions of the optimalities alluded to above. Recall that all controls
throughout refer to admissible controls as defined/discussed above.
Note that the static optimality refers to the optimality relative to the initial point (t, x)
which is given and fixed. Changing the initial point may yield a different optimal control in the
nonlinear problems since the statically optimal controls may and generally will depend on the
initial point in an essential way (cf. [21]). This stands in sharp contrast with standard/linear
problems of optimal control where in view of dynamic programming (the HJB equation) the
optimal control does not depend on the initial point explicitly. This is a key difference between
the static optimality in nonlinear problems of optimal control and the standard optimality in
linear problems of optimal control (cf. [5]).
A control u_* is dynamically optimal in (2.6), if for every given and fixed (t, x) ∈ [0, T]×IR and every control v such that v(t, x) ≠ u_*(t, x) with E_{t,x}(X_T^v) ≥ β, there exists a control w satisfying w(t, x) = u_*(t, x) with E_{t,x}(X_T^w) ≥ β such that

(2.12)    Var_{t,x}(X_T^w) < Var_{t,x}(X_T^v).
Dynamic optimality above is understood in the ‘strong’ sense. Replacing the strict inequalities
in (2.10)-(2.12) by inequalities would yield dynamic optimality in the ‘weak’ sense.
Note that the dynamic optimality corresponds to solving infinitely many optimal control
problems dynamically in time where each new position of the controlled process ((t, Xtu ))t∈[0,T ]
yields a new optimal control problem to be solved upon overruling all the past problems. The
optimal decision at each time tells us to exert the best control among all possible controls.
While the static optimality remembers the past (through the initial point) the dynamic op-
timality completely ignores it and only looks ahead. Nonetheless it is clear that there is a
strong link between the static and dynamic optimality (the latter being formed through the
beginnings of the former as shown below) and this will be exploited in the proof below when
searching for the dynamically optimal controls. In the case of standard/linear optimal control
problems for Markov processes it is evident that the static and dynamic optimality coincide
under mild regularity conditions due to the fact that dynamic programming (the HJB equation)
is applicable. This is not the case for the nonlinear problems of optimal control considered in
the present paper as it will be seen below.
Theorem 3. Consider the optimal control problem (2.4) where X^u solves (2.3) with X_{t_0}^u = x_0 under P_{t_0,x_0} for (t_0, x_0) ∈ [0, T]×IR given and fixed. Recall that B solves (2.1), S solves (2.2), and we set δ = (µ−r)/σ for µ ∈ IR, r ∈ IR and σ > 0. We assume throughout that δ ≠ 0 and r ≠ 0 (the cases δ = 0 or r = 0 follow by passage to the limit when the non-zero δ or r tends to 0).
(A) The statically optimal control is given by

(3.1)    u_*^s(t, x) = (δ/σ) (1/x) [ x_0 e^{r(t−t_0)} − x + (1/(2c)) e^{δ²(T−t_0) − r(T−t)} ]

for (t, x) ∈ [t_0, T]×IR. The statically optimal controlled process is given by

(3.2)    X_t^s = x_0 e^{r(t−t_0)} + (1/(2c)) e^{(δ²−r)(T−t)} [ e^{δ²(t−t_0)} − e^{−δ(W_t−W_{t_0}) − (δ²/2)(t−t_0)} ]

              = x_0 (B_t/B_{t_0}) + (1/(2c)) (B_t/B_T)^{1−δ²/r} [ (B_t/B_{t_0})^{δ²/r} − (B_t/B_{t_0})^{δ(1/σ+(δ−σ)/2r)} (S_t/S_{t_0})^{−δ/σ} ]

for t ∈ [t_0, T]. The static value function V_s := E(X_T^s) − c Var(X_T^s) is given by

(3.3)    V_s(t_0, x_0) = x_0 e^{r(T−t_0)} + (1/(4c)) [ e^{δ²(T−t_0)} − 1 ]

for (t_0, x_0) ∈ [0, T]×IR.
(B) The dynamically optimal control is given by

(3.4)    u_*^d(t, x) = (δ/(2cσ)) (1/x) e^{(δ²−r)(T−t)}

for (t, x) ∈ [t_0, T]×IR. The dynamically optimal controlled process is given by

(3.5)    X_t^d = x_0 e^{r(t−t_0)} + (1/(2c)) e^{−r(T−t)} [ e^{δ²(T−t_0)} − e^{δ²(T−t)} + δ ∫_{t_0}^t e^{δ²(T−s)} dW_s ]

for t ∈ [t_0, T]. The dynamic value function V_d := E(X_T^d) − c Var(X_T^d) is given by

(3.6)    V_d(t_0, x_0) = x_0 e^{r(T−t_0)} + (1/(2c)) [ e^{δ²(T−t_0)} − (1/4) e^{2δ²(T−t_0)} − 3/4 ]

for (t_0, x_0) ∈ [0, T]×IR.
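As an illustrative numerical sketch (not part of the original text, with arbitrary parameter values), the closed-form objects of Theorem 3 can be evaluated directly; note that the two controls agree at the initial point (t_0, x_0) and that the static value dominates the dynamic value.

```python
import numpy as np

# Illustrative parameters (arbitrary choices)
r, mu, sigma, c, T = 0.1, 0.5, 0.4, 1.0, 1.0
delta = (mu - r) / sigma
t0, x0 = 0.0, 1.0

def u_static(t, x):
    """Statically optimal control (3.1), evaluated relative to (t0, x0)."""
    return (delta / sigma) / x * (x0 * np.exp(r * (t - t0)) - x
            + np.exp(delta**2 * (T - t0) - r * (T - t)) / (2 * c))

def u_dynamic(t, x):
    """Dynamically optimal control (3.4)."""
    return delta / (2 * c * sigma * x) * np.exp((delta**2 - r) * (T - t))

V_s = x0 * np.exp(r * (T - t0)) + (np.exp(delta**2 * (T - t0)) - 1) / (4 * c)   # (3.3)
V_d = (x0 * np.exp(r * (T - t0))
       + (np.exp(delta**2 * (T - t0)) - np.exp(2 * delta**2 * (T - t0)) / 4 - 3 / 4) / (2 * c))  # (3.6)
assert V_s >= V_d   # the static value dominates the dynamic value by definition
print(u_static(t0, x0), u_dynamic(t0, x0), V_s, V_d)   # the two controls coincide at (t0, x0)
```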
Proof. We assume throughout that the process X^u solves the stochastic differential equation (2.3) with X_{t_0}^u = x_0 under P_{t_0,x_0} for (t_0, x_0) ∈ [0, T]×IR given and fixed where u is any admissible control as defined/discussed above. To simplify the notation we will drop the subscript zero from t_0 and x_0 in the first part of the proof below.
(A): Note that the objective function in (2.4) reads

(3.7)    E_{t,x}(X_T^u) − c Var_{t,x}(X_T^u) = E_{t,x}(X_T^u) + c [E_{t,x}(X_T^u)]² − c E_{t,x}[(X_T^u)²]

where the key difficulty is the quadratic nonlinearity of the middle term on the right-hand side. To overcome this difficulty we will condition on the size of E_{t,x}(X_T^u). This yields

(3.8)    V(t, x) = sup_{M ∈ IR} sup_{u : E_{t,x}(X_T^u) = M} [ E_{t,x}(X_T^u) − c Var_{t,x}(X_T^u) ]
                 = sup_{M ∈ IR} sup_{u : E_{t,x}(X_T^u) = M} [ E_{t,x}(X_T^u) + c [E_{t,x}(X_T^u)]² − c E_{t,x}[(X_T^u)²] ]
                 = sup_{M ∈ IR} [ M + cM² − c inf_{u : E_{t,x}(X_T^u) = M} E_{t,x}[(X_T^u)²] ].

Hence to solve (3.8) and thus (2.4) we need to solve the constrained problem

(3.9)    V_M(t, x) = inf_{u : E_{t,x}(X_T^u) = M} E_{t,x}[(X_T^u)²]

for M ∈ IR given and fixed.
1. To tackle the problem (3.9) we will apply the method of Lagrange multipliers. For this, define the Lagrangian as follows

(3.10)    L_{t,x}(u, λ) = E_{t,x}[(X_T^u)²] − λ ( E_{t,x}(X_T^u) − M )

for λ ∈ IR and let u_*^λ denote the optimal control in the unconstrained problem

(3.11)    inf_u L_{t,x}(u, λ)

upon assuming that it exists. Suppose moreover that there is λ = λ(M, t, x) ∈ IR such that

(3.12)    E_{t,x}( X_T^{u_*^λ} ) = M.

We then have

(3.13)    E_{t,x}[ (X_T^{u_*^λ})² ] = L_{t,x}(u_*^λ, λ) ≤ L_{t,x}(u, λ) = E_{t,x}[ (X_T^u)² ]

for any admissible control u such that E_{t,x}(X_T^u) = M. This shows that u_*^λ satisfying (3.11) and (3.12) is optimal in (3.9).
2. To tackle the problem (3.11) with (3.12) we consider the optimal control problem

(3.14)    V^λ(t, x) = inf_u E_{t,x}[ (X_T^u)² − λ X_T^u ]

where u is any admissible control. This is a standard/linear problem of optimal control (see e.g. [5]) that can be solved using a classic HJB approach. For the sake of completeness we present key steps in the derivation of the solution.

From (3.14) combined with (2.3) we see that the HJB system reads

(3.15)    inf_{u ∈ IR} [ V_t^λ + ( r(1−u) + µu ) x V_x^λ + (1/2) σ² u² x² V_xx^λ ] = 0

(3.16)    V^λ(T, x) = x² − λ x

for (t, x) ∈ [0, T)×IR. Minimising over u in (3.15) we find that

(3.17)    u = −(δ/σ) (1/x) ( V_x^λ / V_xx^λ ).

Inserting this expression back into (3.15) we obtain

(3.18)    V_t^λ + r x V_x^λ − (δ²/2) (V_x^λ)² / V_xx^λ = 0.
Seeking a solution of the HJB system in the form

(3.19)    V^λ(t, x) = a(t) x² + b(t) x + c(t),

inserting (3.19) into (3.18) and making use of (3.16) we find that
(3.20)    a′(t) = (δ²−2r) a(t)  &  b′(t) = (δ²−r) b(t)  &  c′(t) = (δ²/4) b²(t)/a(t)

on [0, T] with a(T) = 1, b(T) = −λ and c(T) = 0. Solving (3.20) under these terminal conditions we obtain

(3.21)    a(t) = e^{−(δ²−2r)(T−t)}  &  b(t) = −λ e^{−(δ²−r)(T−t)}  &  c(t) = −(λ²/4) [ 1 − e^{−δ²(T−t)} ].

Inserting (3.21) into (3.19) and calculating (3.17) we find that

(3.22)    u(t, x) = −(δ/σ) (1/x) [ x − (λ/2) e^{−r(T−t)} ].
Applying Itô's formula to the process Z defined by

(3.23)    Z_t = K − e^{−r(t−t_0)} X_t^u

where we set K := (λ/2) e^{−r(T−t_0)} and making use of (2.3) we find that

(3.24)    dZ_t = −δ² Z_t dt − δ Z_t dW_t

with Z_{t_0} = K − x_0 under P_{t_0,x_0}. Solving the linear equation (3.24) explicitly we obtain the following closed form expression

(3.25)    X_t^u = e^{r(t−t_0)} [ K − (K − x_0) e^{−δ(W_t−W_{t_0}) − (3δ²/2)(t−t_0)} ]
for t ∈ [t_0, T]. The process X^u defined by (3.25) is a unique strong solution to the stochastic differential equation (2.3) obtained by the control u from (3.22) and yielding the value function V^λ given in (3.19) combined with (3.21) above. It is then a matter of routine to apply Itô's formula to V^λ composed with (t, X_t^v) for any admissible control v and using (3.15)+(3.16) verify that the candidate control u from (3.22) is optimal in (3.14) as envisaged (these arguments are displayed more explicitly in (3.36)-(3.37) below).
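As a quick numerical cross-check of this step (an illustration only; the value of λ below is an arbitrary choice), one can sample X_T from the closed form (3.25) and compare the Monte Carlo estimate of E[(X_T)² − λ X_T] with the quadratic ansatz (3.19) evaluated via (3.21).

```python
import numpy as np

r, mu, sigma, T, t0, x0, lam = 0.1, 0.5, 0.4, 1.0, 0.0, 1.0, 2.0
delta, tau = (mu - r) / sigma, T - t0
K = lam / 2 * np.exp(-r * tau)
rng = np.random.default_rng(4)

# Sample X_T from the closed form (3.25) and estimate E[(X_T)^2 - lam*X_T]
Z = rng.standard_normal(1_000_000)
XT = np.exp(r * tau) * (K - (K - x0) * np.exp(-delta * np.sqrt(tau) * Z - 1.5 * delta**2 * tau))
mc_value = np.mean(XT**2 - lam * XT)

# Compare with V^lam(t0, x0) = a(t0) x0^2 + b(t0) x0 + c(t0) from (3.19) and (3.21)
a = np.exp(-(delta**2 - 2 * r) * tau)
b = -lam * np.exp(-(delta**2 - r) * tau)
c_ = -lam**2 / 4 * (1 - np.exp(-delta**2 * tau))
print(mc_value, a * x0**2 + b * x0 + c_)   # should agree up to Monte Carlo error
```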
3. Having solved the problem (3.14) we still need to meet the condition (3.12). For this, we find from (3.25) that

(3.26)    E_{t_0,x_0}(X_T^u) = x_0 e^{−(δ²−r)(T−t_0)} + (λ/2) [ 1 − e^{−δ²(T−t_0)} ].

To realise (3.12) we need to identify (3.26) with M. This yields

(3.27)    λ = 2 ( M − x_0 e^{−(δ²−r)(T−t_0)} ) / ( 1 − e^{−δ²(T−t_0)} )
for δ ≠ 0. Note that the case δ = 0 is evident since in this case u_*^λ = 0 is optimal in (3.14) for every λ ∈ IR and hence the inequality in (3.13) holds for every admissible control u while from (2.3) we also easily see that (3.26) (with δ = 0) holds for every admissible control u so that we only have one M possible in (3.8) and that is the one given by (3.26) (with δ = 0). This shows that (3.1)-(3.3) are valid when δ = 0 and we will therefore assume in the sequel that δ ≠ 0. Moreover, from (3.25) we also find that
(3.28)    E_{t_0,x_0}[(X_T^u)²] = x_0² e^{−(δ²−2r)(T−t_0)} + (λ²/4) [ 1 − e^{−δ²(T−t_0)} ].

Note that this expression can also be obtained from (3.14) and (3.26) upon recalling (3.19) with (3.21) above. Inserting (3.27) into (3.28) and recalling (3.13) we see that (3.9) is given by

(3.29)    V_M(t_0, x_0) = x_0² e^{−(δ²−2r)(T−t_0)} + ( M − x_0 e^{−(δ²−r)(T−t_0)} )² / ( 1 − e^{−δ²(T−t_0)} )
for δ ≠ 0. Inserting (3.29) into (3.8) we get

(3.30)    V(t_0, x_0) = sup_{M ∈ IR} [ M + cM² − c ( x_0² e^{−(δ²−2r)(T−t_0)} + ( M − x_0 e^{−(δ²−r)(T−t_0)} )² / ( 1 − e^{−δ²(T−t_0)} ) ) ]

for δ ≠ 0. Note that the function of M to be maximised on the right-hand side is quadratic with the coefficient in front of M² strictly negative when δ ≠ 0. This shows that there exists a unique maximum point in (3.30) that is easily found to be given by

(3.31)    M_* = x_0 e^{r(T−t_0)} + (1/(2c)) [ e^{δ²(T−t_0)} − 1 ].

Inserting (3.31) into (3.27) we find that

(3.32)    λ_* = 2 x_0 e^{r(T−t_0)} + (1/c) e^{δ²(T−t_0)}.
c
Inserting (3.32) into (3.22) we establish the existence of the optimal control in (2.4) that is
given by (3.1) above. Moreover, inserting (3.32) into (3.25) we obtain the first identity in (3.2).
The second identity in (3.2) then follows upon recalling the closed form expressions for B and
S stated following (2.2) above. Finally, inserting (3.31) into (3.30) we obtain (3.3) and this
completes the first part of the proof.
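As a small sanity check on this step (an illustration only, with arbitrary parameter values), the quadratic function of M in (3.30) can be maximised on a grid and compared with the closed forms (3.31) and (3.3).

```python
import numpy as np

r, mu, sigma, c, T, t0, x0 = 0.1, 0.5, 0.4, 1.0, 1.0, 0.0, 1.0
delta, tau = (mu - r) / sigma, T - t0

def objective(M):
    """The function of M maximised in (3.30)."""
    A = x0 * np.exp(-(delta**2 - r) * tau)
    B = 1 - np.exp(-delta**2 * tau)
    return M + c * M**2 - c * (x0**2 * np.exp(-(delta**2 - 2 * r) * tau) + (M - A)**2 / B)

M_grid = np.linspace(-10, 10, 200_001)
M_num = M_grid[np.argmax(objective(M_grid))]
M_star = x0 * np.exp(r * tau) + (np.exp(delta**2 * tau) - 1) / (2 * c)   # (3.31)
V_s = x0 * np.exp(r * tau) + (np.exp(delta**2 * tau) - 1) / (4 * c)     # (3.3)
print(M_num, M_star, objective(M_star), V_s)   # M_num ~ M_star and objective(M_star) ~ V_s
```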
(B): Identifying t_0 with t and x_0 with x in the statically optimal control u_*^s from (3.1) we obtain the control u_*^d from (3.4). We claim that this control is dynamically optimal in (2.4). For this, take any other admissible control v such that v(t_0, x_0) ≠ u_*^d(t_0, x_0) and set w = u_*^s. Then w(t_0, x_0) = u_*^d(t_0, x_0) and we claim that

(3.33)    V_w(t_0, x_0) := E_{t_0,x_0}(X_T^w) − c Var_{t_0,x_0}(X_T^w) > E_{t_0,x_0}(X_T^v) − c Var_{t_0,x_0}(X_T^v) =: V_v(t_0, x_0)

upon noting that V_w(t_0, x_0) equals V(t_0, x_0) since w is statically optimal in (2.4).
4. To verify (3.33) set M_v := E_{t_0,x_0}(X_T^v) and first consider the case when M_v ≠ M_* where M_* is given by (3.31) above. Using (3.9)+(3.29) and (3.30)+(3.31) we then find that

(3.34)    V_v(t_0, x_0) = M_v + cM_v² − c E_{t_0,x_0}[(X_T^v)²] ≤ M_v + cM_v² − c V_{M_v}(t_0, x_0)
                        ≤ M_v + cM_v² − c ( x_0² e^{−(δ²−2r)(T−t_0)} + ( M_v − x_0 e^{−(δ²−r)(T−t_0)} )² / ( 1 − e^{−δ²(T−t_0)} ) )
                        < M_* + cM_*² − c ( x_0² e^{−(δ²−2r)(T−t_0)} + ( M_* − x_0 e^{−(δ²−r)(T−t_0)} )² / ( 1 − e^{−δ²(T−t_0)} ) )
                        = V_w(t_0, x_0)

for δ ≠ 0 where the strict inequality follows since M_* is the unique maximum point of the quadratic function as pointed out following (3.30) above. The case δ = 0 is excluded since then as pointed out following (3.27) above we only have M_* possible in (3.8) so that M_v would be equal to M_*. This shows that (3.33) is satisfied when M_v ≠ M_* as claimed.
Next consider the case when M_v = M_*. We then claim that

(3.35)    V_v^{λ_*}(t_0, x_0) := E_{t_0,x_0}[ (X_T^v)² − λ_* X_T^v ] > V^{λ_*}(t_0, x_0)

where V^{λ_*} is defined in (3.14) and λ_* is given by (3.32) above. For this, note that using (3.16) and applying Itô's formula we get
on R_ε for some β > 0 given and fixed. Setting τ_ε := inf { s ∈ [t_0, t_0+ε] : (s, X_s^v) ∉ R_ε } we see by (3.37) and (3.38) that (3.39) holds. In the first inequality in (3.39) we use that the integrand in (3.37) is non-negative as pointed out above and in the final (strict) inequality we use that τ_ε > t_0 with P_{t_0,x_0}-probability one due to the continuity of X^v. The arguments remain also valid when x_0 = 0 upon recalling that v(t_0, 0) and w(t_0, 0) are identified with v(t_0, 0)·0 and w(t_0, 0)·0 in this case. From (3.39) we see that (3.35) holds as claimed.
Recalling from (3.10)-(3.13) that V^{λ_*}(t_0, x_0) = V_{M_*}(t_0, x_0) − λ_* M_* as well as that M_v = M_* by hypothesis we see from (3.35) that

(3.40)    E_{t_0,x_0}[(X_T^v)²] > V_{M_*}(t_0, x_0).

This shows that (3.33) holds when M_v = M_* as well and hence we can conclude that the control u_*^d from (3.4) is dynamically optimal as claimed.
5. Applying Itô's formula to e^{r(T−t)} X_t^d where we set X^d := X^{u_*^d} and making use of (2.3) we easily find that (3.5) is satisfied. Integrating by parts and recalling the closed form expressions for B and S stated following (2.2) above, the process X^d can equivalently be expressed in terms of B and S. From (3.5) we get

(3.42)    E_{t_0,x_0}(X_T^d) = x_0 e^{r(T−t_0)} + (1/(2c)) [ e^{δ²(T−t_0)} − 1 ]

and

(3.43)    Var_{t_0,x_0}(X_T^d) = (1/(8c²)) [ e^{2δ²(T−t_0)} − 1 ].

From (3.42) and (3.43) we obtain (3.6) and this completes the proof. ¤
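The terminal law implied by the expression for X^d is Gaussian (the stochastic integral in (3.5) is a Wiener integral), so (3.42), (3.43) and hence (3.6) are easy to check by simulation. A minimal sketch with arbitrary parameter values:

```python
import numpy as np

r, mu, sigma, c, T, t0, x0 = 0.1, 0.5, 0.4, 1.0, 1.0, 0.0, 1.0
delta, tau = (mu - r) / sigma, T - t0
rng = np.random.default_rng(2)

# The stochastic integral in (3.5) is Gaussian with variance (e^{2 delta^2 tau} - 1)/(2 delta^2)
I = rng.normal(0.0, np.sqrt((np.exp(2 * delta**2 * tau) - 1) / (2 * delta**2)), size=1_000_000)
XTd = x0 * np.exp(r * tau) + (np.exp(delta**2 * tau) - 1 + delta * I) / (2 * c)

mean_d = x0 * np.exp(r * tau) + (np.exp(delta**2 * tau) - 1) / (2 * c)   # (3.42)
var_d = (np.exp(2 * delta**2 * tau) - 1) / (8 * c**2)                    # (3.43)
print(XTd.mean(), mean_d)   # agree up to Monte Carlo error
print(XTd.var(), var_d)
```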
Remark 4. The dynamically optimal control u_*^d from (3.4) by its nature rejects any past point (t_0, x_0) to measure its performance so that although the static value V_s(t_0, x_0) by its definition dominates the dynamic value V_d(t_0, x_0) this comparison is meaningless from the standpoint of the dynamic optimality. Another issue with a plain comparison of the values V_s(t, x) and V_d(t, x) for (t, x) ∈ [t_0, T]×IR is that the optimally controlled processes X^s and X^d may never come to the same point x at the same time t so that the comparison itself may be unreal. A more dynamic way that also makes more sense in general is to compare the value functions composed with the controlled processes. This amounts to looking at V_s(t, X_t^s) and V_d(t, X_t^d) for t ∈ [t_0, T] and paying particular attention to t becoming the terminal value T. Note that V_s(T, X_T^s) = X_T^s and V_d(T, X_T^d) = X_T^d so that to compare E_{t_0,x_0}[V_s(T, X_T^s)] and E_{t_0,x_0}[V_d(T, X_T^d)] is the same as to compare E_{t_0,x_0}(X_T^s) and E_{t_0,x_0}(X_T^d). It is easily seen from (3.2) and (3.5) that the latter two expectations coincide. We can therefore conclude that
(3.44)    E_{t_0,x_0}[V_s(T, X_T^s)] = E_{t_0,x_0}(X_T^s) = E_{t_0,x_0}(X_T^d) = E_{t_0,x_0}[V_d(T, X_T^d)]

for all (t_0, x_0) ∈ [0, T]×IR. This shows that the dynamically optimal control u_*^d is as good as the statically optimal control u_*^s from this static standpoint as well (with respect to any past point (t_0, x_0) given and fixed). In addition to that however the dynamically optimal control u_*^d is time consistent while the statically optimal control u_*^s is not.
Note also from (3.4) that the amount of the dynamically optimal wealth u_*^d(t, x)·x held in the stock at time t does not depend on the amount of the total wealth x. This is consistent with the fact that the risk/cost in (2.4) is measured by the variance (applied at a constant rate c) which is a quadratic function of the terminal wealth while the return/gain is measured by the expectation (applied at a constant rate too) which is a linear function of the terminal wealth. The former therefore penalises stochastic movements of the large wealth more severely than what the latter is able to compensate for and the investor is discouraged from holding larger amounts of his wealth in the stock. Thus even if the total wealth is large (in modulus) it is still dynamically optimal to hold the same amount of wealth u_*^d(t, x)·x in the stock at time t as when the total wealth is small (in modulus). The same optimality behaviour has also been observed for the subgame-perfect Nash equilibrium controls (cf. Section 4).
We now turn to the constrained problems. Note in the proofs below that the unconstrained
problem above is obtained by optimising the Lagrangian of the constrained problems.
Corollary 5. Consider the optimal control problem (2.5) where X^u solves (2.3) with X_{t_0}^u = x_0 under P_{t_0,x_0} for (t_0, x_0) ∈ [0, T]×IR given and fixed. Recall that B solves (2.1), S solves (2.2), and we set δ = (µ−r)/σ for µ ∈ IR, r ∈ IR and σ > 0. We assume throughout that δ ≠ 0 and r ≠ 0 (the cases δ = 0 or r = 0 follow by passage to the limit when the non-zero δ or r tends to 0).
(A) The statically optimal control is given by

(3.45)    u_*^s(t, x) = (δ/σ) (1/x) [ x_0 e^{r(t−t_0)} − x + √α e^{δ²(T−t_0) − r(T−t)} / ( e^{δ²(T−t_0)} − 1 )^{1/2} ]

for (t, x) ∈ [t_0, T]×IR. The statically optimal controlled process is given by

(3.46)    X_t^s = x_0 e^{r(t−t_0)} + √α ( e^{(δ²−r)(T−t)} / ( e^{δ²(T−t_0)} − 1 )^{1/2} ) [ e^{δ²(t−t_0)} − e^{−δ(W_t−W_{t_0}) − (δ²/2)(t−t_0)} ]

              = x_0 (B_t/B_{t_0}) + ( √α / ( (B_T/B_{t_0})^{δ²/r} − 1 )^{1/2} ) (B_t/B_T)^{1−δ²/r} [ (B_t/B_{t_0})^{δ²/r} − (B_t/B_{t_0})^{δ(1/σ+(δ−σ)/2r)} (S_t/S_{t_0})^{−δ/σ} ]

for t ∈ [t_0, T]. The static value function V_s1 := E(X_T^s) is given by

(3.47)    V_s1(t_0, x_0) = x_0 e^{r(T−t_0)} + √α ( e^{δ²(T−t_0)} − 1 )^{1/2}

for (t_0, x_0) ∈ [0, T]×IR.
(B) The dynamically optimal control is given by

(3.48)    u_*^d(t, x) = √α (δ/σ) (1/x) e^{(δ²−r)(T−t)} / ( e^{δ²(T−t)} − 1 )^{1/2}

for (t, x) ∈ [t_0, T)×IR. The dynamically optimal controlled process is given by

(3.49)    X_t^d = x_0 e^{r(t−t_0)} + 2√α e^{−r(T−t)} [ ( e^{δ²(T−t_0)} − 1 )^{1/2} − ( e^{δ²(T−t)} − 1 )^{1/2} + (δ/2) ∫_{t_0}^t e^{δ²(T−s)} / ( e^{δ²(T−s)} − 1 )^{1/2} dW_s ]

for t ∈ [t_0, T), with an equivalent closed-form expression in terms of B and S obtained by integration by parts as in the proof below. The dynamic value function V_d1 := lim_{t↑T} E(X_t^d) is given by

(3.50)    V_d1(t_0, x_0) = x_0 e^{r(T−t_0)} + 2√α ( e^{δ²(T−t_0)} − 1 )^{1/2}

for (t_0, x_0) ∈ [0, T]×IR.
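A small numerical illustration (arbitrary parameter values) of the values in (3.47) and (3.50); as discussed in Remark 6 below, the dynamic value strictly dominates the static one.

```python
import numpy as np

r, mu, sigma, T, t0, x0, alpha = 0.1, 0.5, 0.4, 1.0, 0.0, 1.0, 0.25
delta, tau = (mu - r) / sigma, T - t0

V_s1 = x0 * np.exp(r * tau) + np.sqrt(alpha) * np.sqrt(np.exp(delta**2 * tau) - 1)      # (3.47)
V_d1 = x0 * np.exp(r * tau) + 2 * np.sqrt(alpha) * np.sqrt(np.exp(delta**2 * tau) - 1)  # (3.50)
print(V_s1, V_d1)   # V_d1 > V_s1 whenever delta != 0 and alpha > 0
```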
Proof. We assume throughout that the process X^u solves the stochastic differential equation (2.3) with X_{t_0}^u = x_0 under P_{t_0,x_0} for (t_0, x_0) ∈ [0, T]×IR given and fixed where u is any admissible control as defined/discussed above. To simplify the notation we will drop the subscript zero from t_0 and x_0 in the first part of the proof below.
(A): Note that we can think of (3.7) as (the essential part of) the Lagrangian for the constrained problem (2.5) defined by

(3.51)    L_{t,x}(u, c) = E_{t,x}(X_T^u) − c ( Var_{t,x}(X_T^u) − α )

for c > 0. By the result of Theorem 3 we know that the control u_*^s given in (3.1) is optimal in the unconstrained problem

(3.52)    sup_u [ E_{t,x}(X_T^u) − c Var_{t,x}(X_T^u) ]

for c > 0. Suppose moreover that there exists c = c(α, t, x) > 0 such that

(3.53)    Var_{t,x}( X_T^{u_*^c} ) = α.

We then have

(3.54)    E_{t,x}( X_T^{u_*^c} ) = L_{t,x}(u_*^c, c) ≥ L_{t,x}(u, c) ≥ E_{t,x}(X_T^u)

for any admissible control u such that Var_{t,x}(X_T^u) ≤ α. This shows that the control u_*^c from (3.1) with c = c(α, t, x) > 0 is statically optimal in (2.5).
To realise (3.53) note that taking E_{t_0,x_0} in (3.2) and making use of (3.3) we find that

(3.55)    Var_{t_0,x_0}( X_T^{u_*^c} ) = (1/(4c²)) [ e^{δ²(T−t_0)} − 1 ].

Setting this expression equal to α yields

(3.56)    c = (1/(2√α)) ( e^{δ²(T−t_0)} − 1 )^{1/2}.

By (3.53) and (3.54) we can then conclude that the control u_*^c is statically optimal in (2.5). Inserting (3.56) into (3.1) and (3.2) we obtain (3.45) and (3.46) respectively. Taking E_{t_0,x_0} in (3.46) we obtain (3.47) and this completes the first part of the proof.
(B): Identifying t_0 with t and x_0 with x in the statically optimal control u_*^s from (3.45) we obtain the control u_*^d from (3.48). We claim that this control is dynamically optimal in (2.5). For this, take any other admissible control v such that v(t_0, x_0) ≠ u_*^d(t_0, x_0) and set w = u_*^s. Then w(t_0, x_0) = u_*^d(t_0, x_0) and (3.33) holds with c from (3.56). Using that Var_{t_0,x_0}(X_T^w) = α by (3.55) and (3.56) we see that (3.33) yields

(3.57)    E_{t_0,x_0}(X_T^w) > E_{t_0,x_0}(X_T^v) + c ( α − Var_{t_0,x_0}(X_T^v) ) ≥ E_{t_0,x_0}(X_T^v)

whenever Var_{t_0,x_0}(X_T^v) ≤ α. This shows that the control u_*^d from (3.48) is dynamically optimal in (2.5) as claimed.
Applying Itô's formula to e^{r(T−t)} X_t^d where we set X^d := X^{u_*^d} and making use of (2.3) we easily find that (3.49) is satisfied. Integrating by parts and recalling the closed form expressions for B and S stated following (2.2) above we then obtain the equivalent expression for X^d in terms of B and S. From (3.49) we get

(3.58)    E_{t_0,x_0}(X_t^d) = x_0 e^{r(t−t_0)} + 2√α e^{−r(T−t)} [ ( e^{δ²(T−t_0)} − 1 )^{1/2} − ( e^{δ²(T−t)} − 1 )^{1/2} ]

for t ∈ [t_0, T). Letting t ↑ T in (3.58) we obtain (3.50) and this completes the proof. ¤
Remark 6 (A dynamic compliance effect). From (3.47) and (3.50) we see that the dynamic value V_d1(t_0, x_0) strictly dominates the static value V_s1(t_0, x_0). To see why this is possible note that using (3.49) we find that

(3.59)    Var_{t_0,x_0}(X_t^d) = α e^{−2r(T−t)} [ e^{δ²(T−t_0)} − e^{δ²(T−t)} + log( ( e^{δ²(T−t_0)} − 1 ) / ( e^{δ²(T−t)} − 1 ) ) ]

for t ∈ [t_0, T) with δ ≠ 0. This shows that Var_{t_0,x_0}(X_t^d) → ∞ as t ↑ T so that the static value V_s1(t_0, x_0) can indeed be exceeded by the dynamic value V_d1(t_0, x_0) since the set of admissible controls is virtually larger in the dynamic case. It amounts to what we refer to as a dynamic compliance effect where the investor follows a uniformly bounded risk (variance) strategy at each time (and thus complies with the adopted regulation rule imposed internally/externally) while the resulting static strategy exhibits an unbounded risk (variance). Denoting the stochastic integral (martingale) in (3.49) by M_t we see that ⟨M, M⟩_t = ∫_{t_0}^t e^{2δ²(T−s)} / ( e^{δ²(T−s)} − 1 ) ds → ∞ as t ↑ T. It follows therefore that M_t oscillates from −∞ to ∞ with P_{t_0,x_0}-probability one
as t ↑ T and hence the same is true for X_t^d whenever δ ≠ 0 (for similar behaviour arising from the continuous-time analogue of a doubling strategy see [9, Example 2.3]). We also see from (3.46) and (3.49) that unlike in (3.44) we have the strict inequality
(3.60)    lim_{t↑T} E_{t_0,x_0}[V_s1(t, X_t^s)] = lim_{t↑T} E_{t_0,x_0}(X_t^s) < lim_{t↑T} E_{t_0,x_0}(X_t^d) = lim_{t↑T} E_{t_0,x_0}[V_d1(t, X_t^d)]

satisfied for all (t_0, x_0) ∈ [0, T)×IR. This shows that the dynamic control u_*^d from (3.48) outperforms the static control u_*^s from (3.45) in the constrained problem (2.5).
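To illustrate the blow-up of the variance in (3.59) (an illustration only, with arbitrary parameter values):

```python
import numpy as np

r, mu, sigma, T, t0, alpha = 0.1, 0.5, 0.4, 1.0, 0.0, 0.25
delta = (mu - r) / sigma

def var_Xd(t):
    """Var_{t0,x0}(X_t^d) from (3.59); grows without bound (logarithmically) as t -> T."""
    a, b = delta**2 * (T - t0), delta**2 * (T - t)
    return alpha * np.exp(-2 * r * (T - t)) * (np.exp(a) - np.exp(b)
            + np.log((np.exp(a) - 1) / (np.exp(b) - 1)))

for t in [0.5, 0.9, 0.99, 0.999, 0.9999]:
    print(t, var_Xd(t))
```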
Corollary 7. Consider the optimal control problem (2.6) where X^u solves (2.3) with X_{t_0}^u = x_0 under P_{t_0,x_0} for (t_0, x_0) ∈ [0, T]×IR given and fixed. Recall that B solves (2.1), S solves (2.2), and we set δ = (µ−r)/σ for µ ∈ IR, r ∈ IR and σ > 0. We assume throughout that δ ≠ 0 and r ≠ 0 (the cases δ = 0 or r = 0 follow by passage to the limit when the non-zero δ or r tends to 0).
(A) The statically optimal control is given by

(3.61)    u_*^s(t, x) = (δ/σ) (1/x) [ x_0 e^{r(t−t_0)} − x + ( β − x_0 e^{r(T−t_0)} ) e^{δ²(T−t_0) − r(T−t)} / ( e^{δ²(T−t_0)} − 1 ) ]

if x_0 e^{r(T−t_0)} < β and u_*^s(t, x) = 0 if x_0 e^{r(T−t_0)} ≥ β for (t, x) ∈ [t_0, T]×IR. The statically optimal controlled process is given by

(3.62)    X_t^s = x_0 e^{r(t−t_0)} + ( β − x_0 e^{r(T−t_0)} ) ( e^{(δ²−r)(T−t)} / ( e^{δ²(T−t_0)} − 1 ) ) [ e^{δ²(t−t_0)} − e^{−δ(W_t−W_{t_0}) − (δ²/2)(t−t_0)} ]

              = x_0 (B_t/B_{t_0}) + ( β − x_0 (B_T/B_{t_0}) ) ( (B_t/B_T)^{1−δ²/r} / ( (B_T/B_{t_0})^{δ²/r} − 1 ) ) [ (B_t/B_{t_0})^{δ²/r} − (B_t/B_{t_0})^{δ(1/σ+(δ−σ)/2r)} (S_t/S_{t_0})^{−δ/σ} ]

if x_0 e^{r(T−t_0)} < β and X_t^s = x_0 e^{r(t−t_0)} if x_0 e^{r(T−t_0)} ≥ β for t ∈ [t_0, T] (see Figure 1 below). The static value function V_s2 := Var(X_T^s) is given by

(3.63)    V_s2(t_0, x_0) = ( β − x_0 e^{r(T−t_0)} )² / ( e^{δ²(T−t_0)} − 1 )

if x_0 e^{r(T−t_0)} < β and V_s2(t_0, x_0) = 0 if x_0 e^{r(T−t_0)} ≥ β for (t_0, x_0) ∈ [0, T]×IR.
(B) The dynamically optimal control is given by

(3.64)    u_*^d(t, x) = (δ/σ) (1/x) ( β − x e^{r(T−t)} ) e^{(δ²−r)(T−t)} / ( e^{δ²(T−t)} − 1 )

if x_0 e^{r(T−t_0)} < β and u_*^d(t, x) = 0 if x_0 e^{r(T−t_0)} ≥ β for (t, x) ∈ [t_0, T)×IR. The dynamically optimal controlled process is given by

(3.65)    X_t^d = e^{−r(T−t)} [ β − ( β − x_0 e^{r(T−t_0)} ) ( ( e^{δ²(T−t)} − 1 ) / ( e^{δ²(T−t_0)} − 1 ) ) exp( −δ ∫_{t_0}^t ( e^{δ²(T−s)} / ( e^{δ²(T−s)} − 1 ) ) dW_s − (δ²/2) ∫_{t_0}^t ( e^{2δ²(T−s)} / ( e^{δ²(T−s)} − 1 )² ) ds ) ]

for t ∈ [t_0, T), with an equivalent closed-form expression in terms of B and S obtained by integration by parts as in the proof below. Moreover X_t^d e^{r(T−t)} < β for t ∈ [t_0, T) and X_T^d := lim_{t↑T} X_t^d = β with P_{t_0,x_0}-probability one if x_0 e^{r(T−t_0)} < β, and X_t^d = x_0 e^{r(t−t_0)} for t ∈ [t_0, T] if x_0 e^{r(T−t_0)} ≥ β (see Figure 1 below). The dynamic value function V_d2 := Var(X_T^d) is given by

(3.66)    V_d2(t_0, x_0) = 0

for (t_0, x_0) ∈ [0, T]×IR.
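As an illustrative simulation (not part of the original text; parameter values as in Figure 1 below), the expression in (3.65) can be discretised on a grid that stops just before T, showing that X^d stays strictly below β and approaches β.

```python
import numpy as np

r, mu, sigma, T, t0, x0, beta = 0.1, 0.5, 0.4, 1.0, 0.0, 1.0, 2.0
delta = (mu - r) / sigma
n = 100_000
t = np.linspace(t0, T, n + 1)[:-1]   # grid on [t0, T); the integrands in (3.65) blow up at T
dt = t[1] - t[0]
rng = np.random.default_rng(3)

f = np.exp(delta**2 * (T - t)) / (np.exp(delta**2 * (T - t)) - 1)
dW = rng.normal(0.0, np.sqrt(dt), size=n)
logE = np.concatenate(([0.0],
        np.cumsum(-delta * f[:-1] * dW[:-1] - 0.5 * delta**2 * f[:-1]**2 * dt)))
G = (np.exp(delta**2 * (T - t)) - 1) / (np.exp(delta**2 * (T - t0)) - 1)
Xd = np.exp(-r * (T - t)) * (beta - (beta - x0 * np.exp(r * (T - t0))) * G * np.exp(logE))

print(Xd.max() < beta, Xd[-1])   # stays strictly below beta and tends to beta as t -> T
```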
Proof. We assume throughout that the process X^u solves the stochastic differential equation (2.3) with X_{t_0}^u = x_0 under P_{t_0,x_0} for (t_0, x_0) ∈ [0, T]×IR given and fixed where u is any admissible control as defined/discussed above. To simplify the notation we will drop the subscript zero from t_0 and x_0 in the first part of the proof below.
(A): Note that the Lagrangian for the constrained problem (2.6) is defined by

(3.67)    L_{t,x}(u, c) = Var_{t,x}(X_T^u) − c ( E_{t,x}(X_T^u) − β )

for c > 0. Suppose moreover that there exists c = c(β, t, x) > 0 such that

(3.70)    E_{t,x}( X_T^{u_*^{1/c}} ) = β.

We then have

(3.71)    Var_{t,x}( X_T^{u_*^{1/c}} ) = L_{t,x}(u_*^{1/c}, c) ≤ L_{t,x}(u, c) ≤ Var_{t,x}(X_T^u)

for any admissible control u such that E_{t,x}(X_T^u) ≥ β. This shows that the control u_*^{1/c} from (3.1) with c = c(β, t, x) > 0 is statically optimal in (2.6).

To realise (3.70) note that taking E_{t_0,x_0} in (3.2) we find that

(3.72)    E_{t_0,x_0}( X_T^{u_*^{1/c}} ) = x_0 e^{r(T−t_0)} + (c/2) [ e^{δ²(T−t_0)} − 1 ].
Setting this expression equal to β yields

c = 2 ( β − x_0 e^{r(T−t_0)} ) / ( e^{δ²(T−t_0)} − 1 )

when x_0 e^{r(T−t_0)} < β, and inserting this expression into (3.1) and (3.2) (with 1/c in place of c) we obtain (3.61) and (3.62), and hence also (3.63). This completes the first part of the proof.

(B): Identifying t_0 with t and x_0 with x in the statically optimal control u_*^s from (3.61) we obtain the control u_*^d from (3.64), and its dynamic optimality in (2.6) is verified as in the proof of Corollary 5 above. Applying Itô's formula to the process Z defined by

(3.75)    Z_t = β − e^{r(T−t)} X_t^d

and making use of (2.3) we find that

(3.76)    dZ_t = −( δ² / ( 1 − e^{−δ²(T−t)} ) ) Z_t dt − ( δ / ( 1 − e^{−δ²(T−t)} ) ) Z_t dW_t

with Z_{t_0} = β − e^{r(T−t_0)} x_0 under P_{t_0,x_0}. Solving the linear equation (3.76) explicitly we obtain the closed form expression

(3.77)    Z_t = Z_{t_0} exp( −δ ∫_{t_0}^t ( 1 / ( 1 − e^{−δ²(T−s)} ) ) dW_s − ∫_{t_0}^t [ δ² / ( 1 − e^{−δ²(T−s)} ) + (δ²/2) ( 1 / ( 1 − e^{−δ²(T−s)} ) )² ] ds )

for t ∈ [t_0, T) under P_{t_0,x_0}. Inserting (3.77) into (3.75) we easily find that the expression in (3.65) is satisfied. Integrating by parts and recalling the closed form expressions for B and S stated following (2.2) above we then obtain the equivalent expression for X^d in terms of B and S. From (3.75) and (3.77) we see that Z_t = β − e^{r(T−t)} X_t^d > 0 so that X_t^d e^{r(T−t)} < β for t ∈ [t_0, T) as claimed. Moreover, by the Dambis-Dubins-Schwarz theorem (see e.g. [18, p. 181])
Figure 1. The dynamically optimal wealth t ↦ X_t^d and the statically optimal wealth t ↦ X_t^s in the constrained problem (2.6) of Corollary 7 obtained from the stock price t ↦ S_t when t_0 = 0, x_0 = 1, S_0 = 1, β = 2, r = 0.1, µ = 0.5, σ = 0.4 and T = 1. Note that the expected value of S_T equals e^{µT} ≈ 1.64 which is strictly smaller than β.
we know that the continuous martingale M defined by M_t = −δ ∫_{t_0}^t e^{δ²(T−s)} / ( e^{δ²(T−s)} − 1 ) dW_s for t ∈ [t_0, T) is a time-changed Brownian motion W̄ in the sense that M_t = W̄_{⟨M,M⟩_t} for t ∈ [t_0, T) where we note that ⟨M, M⟩_t = δ² ∫_{t_0}^t e^{2δ²(T−s)} / ( e^{δ²(T−s)} − 1 )² ds ↑ ∞ as t ↑ T. It follows therefore by the well-known sample path properties of W̄ that M_t − (1/2)⟨M, M⟩_t = W̄_{⟨M,M⟩_t} − (1/2)⟨M, M⟩_t → −∞ as t ↑ T with P_{t_0,x_0}-probability one. Making use of this fact in (3.65) we see that X_t^d → β with P_{t_0,x_0}-probability one as t ↑ T if x_0 e^{r(T−t_0)} < β as claimed.
From the preceding facts we also see that (3.66) holds and the proof is complete. ¤
Remark 8. Note from the proof above that X_t^d < β with P_{t_0,x_0}-probability one for all t ∈ [t_0, T) if x_0 e^{r(T−t_0)} < β so that X^d is not a bridge process but a time-reversed meander process. The result of Corollary 7 shows that it is dynamically optimal to keep the wealth X_t^d strictly below β for t ∈ [t_0, T) while achieving X_T^d = β. This behaviour is different from the statically optimal wealth X_t^s which can go above β on [t_0, T) and end up either above or below β at T (see Figure 1 above). Moreover, it is easily seen from (3.62) that P_{t_0,x_0}(X_T^s < β) > 0 from where we find that

(3.78)    E_{t_0,x_0}[V_s2(T, X_T^s)] = ∞ > 0 = E_{t_0,x_0}[V_d2(T, X_T^d)]

if x_0 e^{r(T−t_0)} < β using (3.63) and (3.66) respectively. This shows that the dynamic control u_*^d from (3.64) outperforms the static control u_*^s from (3.61) in the constrained problem (2.6).
Remark 9. Note that no admissible control u can move a given deterministic wealth x_0 at time t_0 ∈ [0, T) to any other deterministic wealth at time T apart from x_0 e^{r(T−t_0)} in which case u equals zero. This is important since otherwise the optimal control problem (2.6) would not be well posed. Indeed, this can be seen by a standard martingale measure change dP̃_{t_0,x_0} = exp( −δ(W_T−W_{t_0}) − (δ²/2)(T−t_0) ) dP_{t_0,x_0} making W̃_t := W_t − W_{t_0} + δ(t−t_0) a standard Brownian motion for t ∈ [t_0, T]. It then follows from (2.3) using integration by parts that

(3.79)    e^{−r(t−t_0)} X_t^u = x_0 + σ ∫_{t_0}^t e^{−r(s−t_0)} u_s X_s^u dW̃_s

where M_t := σ ∫_{t_0}^t e^{−r(s−t_0)} u_s X_s^u dW̃_s is a continuous local martingale under P̃_{t_0,x_0} for t ∈ [t_0, T]. Moreover, by Hölder's inequality we see that

(3.80)    Ẽ_{t_0,x_0}[ ⟨M, M⟩_T^{1/2} ] = E_{t_0,x_0}[ e^{−δ(W_T−W_{t_0}) − (δ²/2)(T−t_0)} ( ∫_{t_0}^T σ² e^{−2r(t−t_0)} u_t² (X_t^u)² dt )^{1/2} ]
              ≤ ( E_{t_0,x_0}[ e^{−2δ(W_T−W_{t_0}) − δ²(T−t_0)} ] )^{1/2} ( E_{t_0,x_0}[ ∫_{t_0}^T σ² e^{−2r(t−t_0)} u_t² (X_t^u)² dt ] )^{1/2} < ∞

since E_{t_0,x_0}[ ∫_{t_0}^T (1+u_t²)(X_t^u)² dt ] < ∞ by admissibility of u. This shows that M is a martingale under P̃_{t_0,x_0}. Hence if X_T^u is constant then it follows from (3.79) and the martingale property of M that M_t = 0 for all t ∈ [t_0, T]. But this means that X_t^u = x_0 e^{r(t−t_0)} for t ∈ [t_0, T] with u being equal to zero as claimed.
Remark 10. Note from (3.65) that E_{t_0,x_0}(X_t^d) → β as t ↑ T if x_0 e^{r(T−t_0)} < β, however, this convergence fails to extend to the variance. Indeed, using (3.65) it can be verified that

(3.81)    Var_{t_0,x_0}(X_t^d) = e^{−2r(T−t)} ( β − x_0 e^{r(T−t_0)} )² ( ( e^{δ²(T−t)} − 1 ) / ( e^{δ²(T−t_0)} − 1 ) )²
              × [ ( ( e^{δ²(T−t_0)} − 1 ) / ( e^{δ²(T−t)} − 1 ) ) exp( ( e^{δ²(T−t_0)} − e^{δ²(T−t)} ) / ( ( e^{δ²(T−t)} − 1 )( e^{δ²(T−t_0)} − 1 ) ) ) − 1 ]

for t ∈ [t_0, T) from where we see that Var_{t_0,x_0}(X_t^d) → ∞ as t ↑ T if x_0 e^{r(T−t_0)} < β. To connect to the comments on the sample path behaviour made in Remark 6 note that t ↦ X_t^d is not bounded from below on [t_0, T). Both of these consequences are due partly to the fact that we allow the wealth process to take both positive and negative values of unlimited size (recall the end of Section 1 above). Another reason is that the dynamic optimality by its nature pushes the optimal controls to their limits so that breakdown points are possible.
4. Static vs dynamic optimality
In this section we address the rationale for introducing the static and dynamic optimality
in the nonlinear optimal control problems under consideration and explain their relevance for
applications of both theoretical and practical interest. We also discuss the relation of these results to the existing approaches to similar problems in the literature.
1. To simplify the exposition we focus on the unconstrained problem (2.4) and similar
arguments apply to the constrained problems (2.5) and (2.6) as well. Recall that (2.4) represents
the optimal portfolio selection problem for an investor who has an initial wealth x0 ∈ IR which
he wishes to exchange between a risky stock S and a riskless bond B in a self-financing
manner dynamically in time so as to maximise his return (identified with the expectation of
his wealth) and minimise his risk (identified with the variance of his wealth) at the given
terminal time T . Due to the quadratic nonlinearity of the variance (as a function of the
expectation) the optimal portfolio strategy (3.1) depends on the initial wealth x0 in an essential
way. This spatial inconsistency (not present in the standard/linear optimal control problems)
introduces the time inconsistency in the problem because the investor’s wealth process moves
from the initial value x0 in t units of time to a new value x1 (different from x0 with
probability one) which in turn yields a new optimal portfolio strategy that is different from the
initial strategy. This time inconsistency repeats itself between any two points in time and the
investor may be in doubt which optimal portfolio strategy to use unless he has already made up his
mind. To tackle these inconsistencies we are naturally led to consider two types of investors
and consequently introduce the two notions of optimality as stated in Definitions 1 and 2
respectively. The first investor is a static investor who stays ‘pre-committed’ to the optimal
portfolio strategy evaluated initially and does not re-evaluate the optimality criterion (2.4) at
later times. This investor will determine the optimal portfolio strategy at time t0 and follow
it blindly to the terminal time T . The second investor is a dynamic investor who remains
‘non-committed’ to the optimal portfolio strategy evaluated initially as well as subsequently
and continuously re-evaluates the optimality criterion (2.4) at each new time. This investor will
determine the optimal portfolio strategy at time t0 and continue doing so at each new time
until the terminal time T . Clearly both the static investor and the dynamic investor embody
realistic economic behaviour (see below for a more detailed account coming from economics) and
Theorem 3 discloses their optimal portfolio selection strategies in the unconstrained problem
(2.4). Similarly Corollary 5 and Corollary 7 disclose their optimal portfolio selection strategies
in the constrained problems (2.5) and (2.6). Given that the financial interpretations of these
results are easy to draw directly and somewhat lengthy to state explicitly we will omit further
details. It needs to be noted that although closely related the three problems (2.4)-(2.6) are still
different and hence it is to be expected that their solutions are also different for some values of
the parameters. Difference between the static and dynamic optimality is best understood by
analysing each problem on its own first as in this case the complexity of the overall comparison
is greatly reduced.
2. Apart from the paper [13] where the dynamic optimality was used in a nonlinear problem
of optimal stopping, we are not aware of any other paper on optimal control where nonlinear
problems were studied using this methodology. The dynamic optimality (Definition 2) appears
therefore to be original to the present paper in the context of nonlinear problems of optimal
control. There are two streams of papers on optimal control however where the static opti-
mality (Definition 1) has been used. The first one belongs to the economics literature and
dates back to the paper by Strotz [21]. The second one belongs to the finance literature and
dates back to the paper by Richardson [19]. We present a brief review of these papers to high-
light similarities/differences and indicate the applicability of the present methodology in these
settings.
3. The stream of papers in the economics literature starts with the paper by Strotz [21]
who points out a time inconsistency arising from the presence of the initial point in the time
domain when the exponential discounting in the utility model of Samuelson [20] is replaced by
a non-exponential discounting. For an illuminating exposition of the problem of intertemporal
choices (decisions involving tradeoffs among costs and benefits occurring at different times)
lasting over a hundred years and leading to Samuelson's simplifying model containing a single
parameter (discount rate) see [7] and the references therein. To tackle the issue of the time
inconsistency Strotz proposed two strategies in his paper: (i) the strategy of ‘pre-commitment’
(where the individual commits to the optimal strategy derived initially) and (ii) the strategy
of ‘consistent planning’ (where the individual rejects any strategy which he will not follow
through and aims to find the optimal strategy among those that he will actually follow). Note
in particular that Strotz coins the term ‘pre-committed’ strategy in his paper and this term
has since been used in the literature including most recent papers too. Although his setting is deterministic and his time is discrete, on closer look one sees that our financial analysis of the
static investor above is fully consistent with his economic reasoning and moreover the statically
optimal portfolio strategy derived in the present paper may be viewed as the strategy of ‘pre-
commitment’ in Strotz’s sense as already indicated above. The dynamically optimal portfolio
strategy derived in the present paper is different however from the strategy of ‘consistent
planning' in Strotz's sense. The difference is subtle yet substantial and it will become clearer through the exposition of the subsequent development that continues to the present time. The next paper to point out is that by Pollak [16] who showed that the derivation of the strategy of
‘consistent planning’ in the Strotz paper [21] was incorrect (one cannot replace the individual’s
non-exponential discount function by the exponential discount function having the same slope
as the non-exponential discount function at zero). Peleg and Yaari [14] then attempted to find
the strategy of ‘consistent planning’ by backward recursion and concluded that the strategy
could exist only under too restrictive hypotheses to be useful. They suggested looking at what
we now refer to as a subgame-perfect Nash equilibrium (the optimality concept refining Nash
equilibrium proposed by Selten in 1965). Goldman [8] then pointed out that the failure of
backward recursion does not disprove the existence as suggested in [14] and showed that the
strategy of ‘consistent planning’ does exist under quite general conditions. All these papers
deal with problems in discrete time. A continuous-time extension of these results appears more recently in the paper by Ekeland and Pirvu [6] and the paper by Björk and Murgoci [3] (see also the references therein for other unpublished work). Strotz's strategy of 'consistent planning' is understood as a subgame-perfect Nash equilibrium in this context (satisfying
the natural consumption constraint at present time).
4. The stream of papers in the finance literature starting with the paper by Richardson [19]
deals with optimal portfolio selection problems under mean-variance criteria similar/analogous
to (2.4)-(2.6) above. Richardson’s paper [19] derives a statically optimal control in the con-
strained problem (2.6) using the martingale method suggested by Pliska [15] who makes use of
the Legendre transform (convex analysis) rather than the Lagrange multipliers. For an overview
of the martingale method based on Lagrange multipliers see e.g. [2, Section 20]. This martin-
gale method can be used to solve the auxiliary optimal control problem (3.14) in the proof
of Theorem 3 above. Moreover on closer look it is possible to see that the dynamically opti-
mal control is obtained by setting the Radon-Nikodym derivative of the equivalent martingale
measure with respect to the original measure equal to one. Given that the martingale method
is applicable to more general problems of optimal control including those in non-Markovian
settings as well this observation provides a lead for finding the dynamically optimal controls
when a classic HJB approach may not be directly applicable.
Returning to the stream of papers in the finance literature, the paper by Li & Ng [10,
Theorems 1 & 2] in discrete time and the paper by Zhou & Li [24, Theorem 3.1] in continuous
time show that if there is a statically optimal control in the unconstrained problem (2.4) then this
control can be found by solving a linear-quadratic optimal control problem (which in turn also
yields statically optimal controls in the constrained problems (2.5) and (2.6)). The methodology
in these papers relies upon the results on multi-index optimisation problems from the paper by
Reid & Citron [17] and is more involved (in comparison with the simple conditioning combined
with a double application of Lagrange multipliers as done in the present paper). In particular,
the results of [10] and [24] do not establish the existence of statically optimal controls in
the problems (2.4)-(2.6) although they do derive their closed form expressions in discrete and
continuous time respectively. In this context it may be useful to recall that the first to point
out that nonlinear dynamic programming problems may be tackled using the ideas of Lagrange
multipliers was White in his paper [23]. He also considered the constrained problem (2.6)
in discrete time (his Section 3) and using Lagrange multipliers derived some conclusions on
the statically optimal control (without realising its time inconsistency). In his setting the
conditioning on the size of the expected value is automatic since he assumed that the expected
value in (2.6) equals β . For this reason his first Lagrangian associated with (2.6) was a linear
problem and hence there was no need to untangle the resulting nonlinearity by yet another
application of Lagrange multipliers as done in the present paper.
All papers in the finance literature reviewed above (including others not mentioned) study
statically optimal controls which in turn are time inconsistent. Thus all of them deal with ‘pre-
committed’ strategies in the sense of Strotz. This was pointed out by Basak and Chabakauri
in their paper [1] where they return to Strotz's approach of 'consistent planning' and study
the subgame-perfect Nash equilibrium in continuous time. The paper by Björk and Murgoci [3]
merges this with the stream of papers from the economics literature (as already stated above)
and studies general formulations of time inconsistent problems based on Strotz's approach of
‘pre-commitment’ vs ‘consistent planning’ in the sense of the subgame-perfect Nash equilibrium.
A recent paper by Czichowsky [4] studies analogous formulations and further refinements in
a general semimartingale setting. For applications of statically optimal controls to pension
schemes see the paper by Vigna [22].
5. We now return to the question of comparison between Strotz's definition of 'consistent planning' which is interpreted as the subgame-perfect Nash equilibrium in the literature and the 'dynamic optimality' as defined in the present paper. The key conceptual difference is that Strotz's definition of 'consistent planning' is relative (constrained) in the sense that the
‘optimal’ control at time t is best among all ‘available’ controls (the ones which will be actually
followed) while the present definition of the ‘dynamic optimality’ is absolute (unconstrained)
in the sense that the optimal control at time t is best among all ‘possible’ controls afterwards.
To illustrate this distinction recall that the subgame-perfect Nash equilibrium formulation of
the Strotz ‘consistent planning’ optimality can be informally described as follows. Given the
present time t and all future times s > t one identifies the control cs applied at time s ≥ t
with an action of the s-th player. The Strotz ‘consistent planning’ optimality is then obtained
through the subgame-perfect Nash equilibrium at a given control (cr )r≥0 if the action ct is
best when the actions c_s for s > t are given and fixed, i.e. no other action c̃_t in place of c_t
would do better when the actions cs for s > t are given and fixed (the requirement is clear
in discrete time and requires some right-hand limiting argument in continuous time). Clearly
this optimality is different from the ‘dynamic optimality’ where the optimal control at time t
is best among all ‘possible’ controls afterwards.
To make a more explicit comparison between the two concepts of optimality, recall from [1] (see also [3]) that a subgame-perfect Nash optimal control in the problem (2.4) is given by

(4.1)    u_*^n(t, x) = (δ/(2cσ)) (1/x) e^{−r(T−t)}

for (t, x) ∈ [t_0, T]×IR, the subgame-perfect Nash optimal controlled process is given by

(4.2)    X_t^n = x_0 e^{r(t−t_0)} + (δ/(2c)) e^{−r(T−t)} [ δ(t−t_0) + W_t − W_{t_0} ]

for t ∈ [t_0, T], and the subgame-perfect Nash value function is given by

(4.3)    V_n(t_0, x_0) = x_0 e^{r(T−t_0)} + (δ²/(4c)) (T−t_0)

for (t_0, x_0) ∈ [0, T]×IR (compare these expressions with those given in (3.4)-(3.6) above).
Returning to the analysis from the first paragraph of Remark 4 above, one can easily see by
direct comparison that the subgame-perfect Nash value Vn (t0 , x0 ) dominates the dynamic value
Vd (t0 , x0 ) (and is dominated by the static value Vs (t0 , x0 ) due to its definition). Given that
the optimally controlled processes X n and X d may never come to the same point x at the
same time t we see (as pointed out in Remark 4) that this comparison may be unreal and a
better way is to compare the value functions composed with the controlled processes. Noting
that V_n(T, X_T^n) = X_T^n and V_d(T, X_T^d) = X_T^d it is easy to verify using (3.5) and (4.2) that

(4.4)    E_{t_0,x_0}[V_n(T, X_T^n)] = E_{t_0,x_0}(X_T^n) < E_{t_0,x_0}(X_T^d) = E_{t_0,x_0}[V_d(T, X_T^d)]
for all (t_0, x_0) ∈ [0, T)×IR. This shows that the dynamically optimal control u_*^d from (3.4) outperforms the subgame-perfect Nash optimal control u_*^n from (4.1) in the unconstrained problem (2.4). A similar comparison in the constrained problems (2.5) and (2.6) is not possible since subgame-perfect Nash optimal controls are not available in these problems at present.
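As a small numerical summary of the comparison above (illustrative parameter values only), the closed-form expectations and value functions from (3.42), (4.2), (3.3), (3.6) and (4.3) can be evaluated and ordered directly.

```python
import numpy as np

r, mu, sigma, c, T, t0, x0 = 0.1, 0.5, 0.4, 1.0, 1.0, 0.0, 1.0
delta, tau = (mu - r) / sigma, T - t0

E_Xn = x0 * np.exp(r * tau) + delta**2 * tau / (2 * c)                 # mean of X_T^n from (4.2)
E_Xd = x0 * np.exp(r * tau) + (np.exp(delta**2 * tau) - 1) / (2 * c)   # mean of X_T^d from (3.42)
V_n = x0 * np.exp(r * tau) + delta**2 * tau / (4 * c)                  # (4.3)
V_d = x0 * np.exp(r * tau) + (np.exp(delta**2 * tau)
       - np.exp(2 * delta**2 * tau) / 4 - 3 / 4) / (2 * c)             # (3.6)
V_s = x0 * np.exp(r * tau) + (np.exp(delta**2 * tau) - 1) / (4 * c)    # (3.3)

assert E_Xn < E_Xd        # (4.4): the dynamic control yields the larger expected terminal wealth
assert V_d <= V_n <= V_s  # ordering of the value functions discussed above
print(E_Xn, E_Xd, V_n, V_d, V_s)
```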
References
[1] Basak, S. and Chabakauri, G. (2010). Dynamic mean-variance asset allocation. Rev.
Financ. Stud. 23 (2970–3016).
[2] Björk, T. (2009). Arbitrage Theory in Continuous Time. Oxford Univ. Press.
[3] Björk, T. and Murgoci, A. (2010). A general theory of Markovian time inconsistent
stochastic control problems. Preprint SSRN (55 pp).
[4] Czichowsky, C. (2013). Time-consistent mean-variance portfolio selection in discrete
and continuous time. Finance Stoch. 17 (227–271).
[5] Fleming, W. H. and Rishel, R. W. (1975). Deterministic and Stochastic Optimal
Control. Springer-Verlag.
[6] Ekeland, I. and Pirvu, T. A. (2008). Investment and consumption without commitment. Math. Financ. Econ. 2 (57–86).
[7] Frederick, S., Loewenstein, G. and O'Donoghue, T. (2002). Time discounting and time preference: A critical review. J. Econ. Lit. 40 (351–401).
[8] Goldman, S. M. (1980). Consistent plans. Rev. Econ. Stud. 47 (533–537).
[9] Karatzas, I. and Shreve, S. E. (1998). Methods of Mathematical Finance. Springer-
Verlag.
[10] Li, D. and Ng, W. L. (2000). Optimal dynamic portfolio selection: Multiperiod mean-
variance formulation. Math. Finance 10 (387–406).
[11] Markowitz, H. M. (1952). Portfolio selection. J. Finance 7 (77–91).
[12] Merton, R. C. (1972). An analytic derivation of the efficient portfolio frontier. J. Fi-
nancial Quant. Anal. 7 (1851–1872).
[13] Pedersen, J. L. and Peskir, G. (2012). Optimal mean-variance selling strategies. Re-
search Report No. 12, Probab. Statist. Group Manchester (20 pp). To appear in Math.
Financ. Econ.
[14] Peleg, B. and Yaari, M. E. (1973). On the existence of a consistent course of action
when tastes are changing. Rev. Econ. Stud. 40 (391–401).
[15] Pliska, S. R. (1986). A stochastic calculus model of continuous trading: Optimal port-
folios. Math. Oper. Res. 11 (370–382).
[16] Pollak, R. A. (1968). Consistent planning. Rev. Econ. Stud. 35 (201–208).
[17] Reid, R. W. and Citron, S. J. (1971). On noninferior performance index vectors. J.
Optimization Theory Appl. 7 (11–28).
[18] Revuz, D. and Yor, M. (1999). Continuous Martingales and Brownian Motion.
Springer-Verlag.
[19] Richardson, H. R. (1989). A minimum variance result in continuous trading portfolio
optimization. Management Sci. 35 (1045–1055).
[20] Samuelson, P. (1937). A note on measurement of utility. Rev. Econ. Stud. 4 (155–161).
[21] Strotz, R. H. (1956). Myopia and inconsistency in dynamic utility maximization. Rev.
Econ. Stud. 23 (165–180).
[22] Vigna, E. (2014). On efficiency of mean-variance based portfolio selection in defined
contribution pension schemes. Quant. Finance 14 (237–258).
[23] White, D. J. (1974). Dynamic programming and probabilistic constraints. Operations
Res. 22 (654–664).
[24] Zhou, X. Y. and Li, D. (2000). Continuous-time mean-variance portfolio selection: A
stochastic LQ framework. Appl. Math. Optim. 42 (19–33).