DOI 10.1007/s00245-010-9123-8
1 Introduction
for some functions b, σ , ψ and φ, and a Brownian motion Bt . For every t the control
ut is allowed to take values in the action space U . The mean-field SDE is obtained as
the mean square limit of an interacting particle system of the form
$$dx_t^{i,n} = b\Big(t, x_t^{i,n}, \frac{1}{n}\sum_{j=1}^{n}\psi\big(x_t^{j,n}\big), u_t\Big)\,dt + \sigma\Big(t, x_t^{i,n}, \frac{1}{n}\sum_{j=1}^{n}\phi\big(x_t^{j,n}\big), u_t\Big)\,dB_t^i,$$
when n → ∞. The classical example is the McKean-Vlasov model (see e.g. [13] and
the references therein), although in that model the coefficients are linear in the law of
the process. For the nonlinear case, see [10].
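The particle system above can be simulated directly, which gives a feel for how the empirical averages approximate the expectations in the mean-field SDE. The following is a minimal sketch using an Euler-Maruyama discretization; the function name `simulate_particles`, the deterministic open-loop control, and the example coefficients are illustrative assumptions, not taken from the paper.

```python
import math
import random


def simulate_particles(n, T, steps, b, sigma, psi, phi, u, x0, seed=0):
    """Euler-Maruyama scheme for the n-particle system; returns terminal states."""
    rng = random.Random(seed)
    dt = T / steps
    sqrt_dt = math.sqrt(dt)
    x = [float(x0)] * n
    for k in range(steps):
        t = k * dt
        # Empirical averages stand in for E psi(x_t) and E phi(x_t).
        mean_psi = sum(psi(xi) for xi in x) / n
        mean_phi = sum(phi(xi) for xi in x) / n
        ut = u(t)  # assumption: deterministic, open-loop control
        # Each particle is driven by its own independent Gaussian increment.
        x = [
            xi
            + b(t, xi, mean_psi, ut) * dt
            + sigma(t, xi, mean_phi, ut) * sqrt_dt * rng.gauss(0.0, 1.0)
            for xi in x
        ]
    return x


# Hypothetical coefficients: the drift pulls each particle toward the empirical mean.
terminal = simulate_particles(
    n=200, T=1.0, steps=50,
    b=lambda t, x, y, v: (y - x) + v,
    sigma=lambda t, x, y, v: 0.3,
    psi=lambda x: x, phi=lambda x: x,
    u=lambda t: 0.0, x0=1.0,
)
empirical_mean = sum(terminal) / len(terminal)
```

Increasing `n` makes the empirical averages less noisy, which is the numerical counterpart of the limit n → ∞ mentioned in the text.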
The object of the control problem is to minimize a cost functional of the form
$$J(u) = E\Big[\int_0^T h\big(t, x_t, E\varphi(x_t), u_t\big)\,dt + g\big(x_T, E\chi(x_T)\big)\Big], \qquad (1.2)$$
for given functions h, g, ϕ and χ . This cost functional is also of mean-field type, as
the functions h and g depend on the law of the state process.
The fact that J is a (possibly) nonlinear function of the expected value stands in
contrast to the standard formulation of a stochastic control problem, where J is the
expected value of a functional of the state process. In fact, this leads to a so-called
time inconsistent control problem. That is, the Bellman optimality principle does not
hold, see e.g. [4, 9], and [2]. The reason for this is that one cannot apply the law
of iterated expectations on the cost functional. A more general form of this control
problem has been studied in [1], and to some extent in [12], using an extended version
of the dynamic programming principle.
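The failure of the law of iterated expectations can be seen in a toy example. The sketch below uses two fair coin flips as a hypothetical two-period model (not taken from the paper) and compares a quantity that is linear in the law of the terminal state with one that is a nonlinear function of an expected value.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)
# Two fair coin flips Y1, Y2 in {-1, +1}; X = Y1 + Y2 plays the role of x_T.
outcomes = [(y1, y2, half * half) for y1, y2 in product((-1, 1), repeat=2)]


def X(y1, y2):
    return y1 + y2


def mean(f):
    """Unconditional expectation E[f(Y1, Y2)]."""
    return sum(p * f(y1, y2) for y1, y2, p in outcomes)


def cond_mean(f, y1_obs):
    """Conditional expectation E[f(Y1, Y2) | Y1 = y1_obs]."""
    return sum(half * f(y1_obs, y2) for y2 in (-1, 1))


# Linear in the law: E[ E[X^2 | Y1] ] equals E[X^2].
lhs_linear = mean(lambda y1, y2: cond_mean(lambda a, b: X(a, b) ** 2, y1))
rhs_linear = mean(lambda y1, y2: X(y1, y2) ** 2)

# Nonlinear in the expected value: E[ (E[X | Y1])^2 ] differs from (E[X])^2.
lhs_nonlinear = mean(lambda y1, y2: cond_mean(X, y1) ** 2)
rhs_nonlinear = mean(X) ** 2
```

The first pair agrees and the second does not, so a criterion containing a term such as (E x_T)^2 cannot be evaluated by conditioning step by step. This is the mechanism behind the time inconsistency described above.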
In this paper, we derive necessary and sufficient conditions for optimality of this
control problem in the form of a stochastic maximum principle. The standard sto-
chastic maximum principle involves solving the adjoint equation, a backward SDE,
and maximizing the Hamiltonian. In our case, where the state process and cost func-
tional are of mean-field type, the adjoint equation will in fact not be just a backward
SDE, but a mean-field backward SDE, which has been studied recently in [5] and [6].
Due to time inconsistency of the control problem, [9] and [4] need to specify a
definition of admissible controls that involves the Nash Certainty Equivalent Prin-
ciple and the notion of Nash Equilibrium, respectively, to turn the problem into a
time consistent one and solve it using Bellman’s dynamic programming principle.
[1] express the value function in terms of the Nisio nonlinear semigroup of opera-
tors to obtain a highly complicated version of the Hamilton-Jacobi-Bellman equa-
tion. In this work we show that to study the control problem associated with (1.1)
and (1.2), the set up offered by the stochastic maximum principle does not require
any special definition of the set of admissible controls, besides the classical adapted-
ness and some integrability conditions, which makes it suitable to solve this class of
time inconsistent control problems without introducing further concepts and techni-
cal tools.
We apply the methods in [3] to obtain the necessary conditions, i.e. we assume
that the action space is convex, which allows us to make a convex perturbation of the
optimal control and obtain a maximum principle in local form. The extension to
the general case where the action space is not convex, which uses a spike variation of the
optimal control and leads to a Peng-type maximum principle, is more involved and
will appear elsewhere.
In the last section we illustrate the result by applying it to the mean-variance op-
timization problem. That is, the continuous version of a Markowitz investment prob-
lem where one constructs a portfolio by investing in a risk free bank account and a
risky asset (e.g. a stock). The objective is to maximize the expected terminal wealth
while minimizing the variance of the terminal wealth. Since this cost function in-
volves the variance which is quadratic in the expected value, the problem is time
inconsistent as explained above. It has nevertheless been solved by different methods;
for instance, [15] obtains an optimal control in feedback form by embedding the
problem into a class of stochastic LQ problems. We show that our version of the
stochastic maximum principle can be directly applied to obtain the optimal control
found in [15]. This optimal control differs from the one obtained in [4] using the
notion of Nash equilibrium and from the one suggested in [2] using the total conditional
variance formula.
To ease the exposition of the results, we only consider the one dimensional case.
The extension to the multidimensional case is by now straightforward.
Let T > 0 be a fixed time horizon and (Ω, F, (Ft)t≥0, P) be a filtered probability space
satisfying the usual conditions, on which a standard Brownian motion B = (Bt)t≥0
is defined. We assume that (Ft)t≥0 is the natural filtration of B augmented by P-null
sets of F.
The action space, U, is a non-empty, closed and convex subset of R, and 𝒰 is the
class of measurable, Ft-adapted and square integrable processes u : [0, T] × Ω → U.
For any u ∈ 𝒰, we consider the following stochastic differential equation
$$dx_t = b\big(t, x_t, E\psi(x_t), u_t\big)\,dt + \sigma\big(t, x_t, E\phi(x_t), u_t\big)\,dB_t, \qquad x(0) = x_0, \qquad (2.1)$$
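where
b : [0, T] × R × R × U −→ R,
σ : [0, T] × R × R × U −→ R,
ψ : R −→ R,
φ : R −→ R.
The cost functional is of the form given in (1.2),
$$J(u) = E\Big[\int_0^T h\big(t, x_t, E\varphi(x_t), u_t\big)\,dt + g\big(x_T, E\chi(x_T)\big)\Big], \qquad (2.2)$$
where
g : R × R −→ R,
h : [0, T] × R × R × U −→ R,
χ : R −→ R,
ϕ : R −→ R.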
The following assumptions will be in force throughout this paper, where x denotes
the state variable, y the ‘expected value’, and v the control variable.
(A.1) ψ, φ, χ and ϕ are continuously differentiable. g is continuously differentiable
with respect to (x, y). b, σ, h are continuously differentiable with respect to
(x, y, v).
(A.2) All the derivatives in (A.1) are Lipschitz continuous and bounded.
Further assumptions will be needed for the sufficient conditions in Sect. 4. Under the
above assumptions, the SDE (2.1) has a unique strong solution. Indeed, since the
coefficients b and σ are Lipschitz continuous with respect to x, this follows from
Proposition 1.2 in [10] if they are also Lipschitz continuous with respect to the Wasserstein metric
$$d(\mu, \nu) = \inf\Big\{E_Q|X - Y|^2 ;\ Q \in P \text{ with marginals } \mu \text{ and } \nu\Big\} = \sup\Big\{\int h\,d(\mu - \nu);\ |h(x) - h(y)| \le |x - y|\Big\},$$
≤ Kd (μX , μY ) ,
for any p ∈ N+ .
The optimal control problem is to minimize the functional J(·) over the class 𝒰 of
admissible controls. A control that solves this problem is called optimal.
We denote for any process ϕt ,
For notational convenience, we will suppress the dependence on the time variable. bx,
by, bv denote the derivatives of b with respect to the state trajectory, the ‘expected
value’ and the control variable, respectively, and similarly for the other functions.
Finally, we denote by x̂t and ût the optimal trajectory and control, respectively.
In this section we derive necessary conditions for optimality in the form of a
maximum principle in local form. The methods are those used in [3].
We let $x_t^\theta$ denote the state trajectory corresponding to the following perturbation $u_t^\theta$
of $\hat u_t$:
$$u_t^\theta = \hat u_t + \theta v_t, \qquad v_t \in U.$$
Proof Since the coefficients in (3.1) are bounded, it follows from Proposition 1.2 in
[10] that there exists a unique solution such that
$$E \sup_{t\in[0,T]} |z_t|^p < \infty, \qquad (3.2)$$
for any p ∈ N+ .
If we define $y_t^\theta = \frac{x_t^\theta - \hat x_t}{\theta} - z_t$ and note (2.3), it is also clear that
$$E \sup_{t\in[0,T]} |y_t^\theta|^p < \infty, \qquad (3.3)$$
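for any p ∈ N+. We have $y_0^\theta = 0$ and $y_t^\theta$ fulfills the following SDE.
$$
\begin{aligned}
dy_t^\theta ={}& \frac{1}{\theta}\Big[b\big(\hat x_t + \theta(y_t^\theta + z_t),\, E\psi(\hat x_t + \theta(y_t^\theta + z_t)),\, \hat u_t + \theta v_t\big) - \hat b(t)\Big]dt \\
& - \Big[\hat b_x(t) z_t + \hat b_y(t) E\big(\hat\psi_x(t) z_t\big) + \hat b_v(t) v_t\Big]dt \\
& + \frac{1}{\theta}\Big[\sigma\big(\hat x_t + \theta(y_t^\theta + z_t),\, E\phi(\hat x_t + \theta(y_t^\theta + z_t)),\, \hat u_t + \theta v_t\big) - \hat\sigma(t)\Big]dB_t \\
& - \Big[\hat\sigma_x(t) z_t + \hat\sigma_y(t) E\big(\hat\phi_x(t) z_t\big) + \hat\sigma_v(t) v_t\Big]dB_t. \qquad (3.4)
\end{aligned}
$$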
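Noting that
$$\frac{d}{d\lambda}\, b\big(\cdot, E\psi(\hat x_t + \lambda\theta(y_t^\theta + z_t)), \cdot\big) = b_y\big(\cdot, E\psi(\hat x_t + \lambda\theta(y_t^\theta + z_t)), \cdot\big)\, E\Big(\psi_x\big(\hat x_t + \lambda\theta(y_t^\theta + z_t)\big)(y_t^\theta + z_t)\Big)\theta,$$
we proceed as in [3] and write
$$
\begin{aligned}
\frac{1}{\theta}\Big[b\big(\hat x_t &+ \theta(y_t^\theta + z_t),\, E\psi(\hat x_t + \theta(y_t^\theta + z_t)),\, \hat u_t + \theta v_t\big) - \hat b(t)\Big] \\
={}& \int_0^1 b_x\big(\hat x_t + \lambda\theta(y_t^\theta + z_t),\, E\psi(\hat x_t + \lambda\theta(y_t^\theta + z_t)),\, \hat u_t + \lambda\theta v_t\big)(y_t^\theta + z_t)\,d\lambda \\
& + \int_0^1 b_y\big(\hat x_t + \lambda\theta(y_t^\theta + z_t),\, E\psi(\hat x_t + \lambda\theta(y_t^\theta + z_t)),\, \hat u_t + \lambda\theta v_t\big) \\
& \qquad\quad \cdot E\Big(\psi_x\big(\hat x_t + \lambda\theta(y_t^\theta + z_t)\big)(y_t^\theta + z_t)\Big)\,d\lambda \\
& + \int_0^1 b_v\big(\hat x_t + \lambda\theta(y_t^\theta + z_t),\, E\psi(\hat x_t + \lambda\theta(y_t^\theta + z_t)),\, \hat u_t + \lambda\theta v_t\big) v_t\,d\lambda. \qquad (3.5)
\end{aligned}
$$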
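The three last terms tend to 0 in L2(Ω × [0, T]) as θ → 0. To see this, we rewrite the
second to last term above as
$$
\begin{aligned}
I_t :={}& \int_0^1 \Big[b_y\big(x_t^{\lambda,\theta}, E\psi(x_t^{\lambda,\theta}), u_t^{\lambda,\theta}\big) - b_y\big(\hat x_t, E\psi(x_t^{\lambda,\theta}), u_t^{\lambda,\theta}\big)\Big] E\big(\psi_x(x_t^{\lambda,\theta}) z_t\big)\,d\lambda \\
& + \int_0^1 \Big[b_y\big(\hat x_t, E\psi(x_t^{\lambda,\theta}), u_t^{\lambda,\theta}\big) - b_y\big(\hat x_t, E\psi(\hat x_t), u_t^{\lambda,\theta}\big)\Big] E\big(\psi_x(x_t^{\lambda,\theta}) z_t\big)\,d\lambda \\
& + \int_0^1 \Big[b_y\big(\hat x_t, E\psi(\hat x_t), u_t^{\lambda,\theta}\big) - \hat b_y(t)\Big] E\big(\psi_x(x_t^{\lambda,\theta}) z_t\big)\,d\lambda \\
& + \int_0^1 \hat b_y(t)\Big[E\big(\psi_x(x_t^{\lambda,\theta}) z_t\big) - E\big(\hat\psi_x(t) z_t\big)\Big]\,d\lambda.
\end{aligned}
$$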
$$+ \Big(E\int_0^T\!\!\int_0^1 |\lambda\theta v_t|^4\,d\lambda\,dt\Big)^{1/2}\Big(E\int_0^T |z_t|^4\,dt\Big)^{1/2}\Big\},$$
which converges to 0 as θ → 0 since the expected values are finite. Similar estimates
for the third and fifth terms in (3.6) show that these terms also converge to 0
in L2(Ω × [0, T]). Now, rewriting the diffusion part in (3.4) in the same way and
using the Burkholder-Davis-Gundy inequality, we have by the boundedness of the
functions and Jensen’s inequality that
$$E|y^\theta|_T^{*,2} \le K\Big(\int_0^T E|y^\theta|_t^{*,2}\,dt + \int_0^T \sup_{s\in[0,t]} E|y_s^\theta|^2\,dt\Big) + \rho^\theta \le K\int_0^T E|y^\theta|_t^{*,2}\,dt + \rho^\theta,$$
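$$\frac{d}{d\theta} J(\hat u + \theta v)\Big|_{\theta=0} = E\Big[\int_0^T \Big(\hat h_x(t) z_t + \hat h_y(t) E\big(\hat\varphi_x(t) z_t\big) + \hat h_v(t) v_t\Big)dt\Big] + E\Big[\hat g_x(T) z_T + \hat g_y(T) E\big(\hat\chi_x(T) z_T\big)\Big].$$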
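Proof Using the short hand notation g(xT) = g(xT, E(χ(xT))), etc., we have, in
view of Lemma 3.4,
$$
\begin{aligned}
\frac{d}{d\theta} E\, g\big(x_T^\theta\big)\Big|_{\theta=0}
={}& \lim_{\theta\to 0} E\Big[\frac{g(x_T^\theta) - g(\hat x_T)}{\theta}\Big] \\
={}& \lim_{\theta\to 0} \bigg\{ E\Big[\int_0^1 g_x\big(\hat x_T + \lambda(x_T^\theta - \hat x_T)\big)\,d\lambda\; \frac{x_T^\theta - \hat x_T}{\theta}\Big] \\
& + E\Big[\int_0^1 g_y\big(\hat x_T + \lambda(x_T^\theta - \hat x_T)\big)\, E\Big(\chi_x\big(\hat x_T + \lambda(x_T^\theta - \hat x_T)\big)\frac{x_T^\theta - \hat x_T}{\theta}\Big)\,d\lambda\Big] \bigg\} \\
={}& E\Big[\hat g_x(T) z_T + \hat g_y(T) E\big(\hat\chi_x(T) z_T\big)\Big].
\end{aligned}
$$
From the definitions of the cost function and the perturbed control, we see that this
proves the lemma.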
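The terminal-cost part of this derivative formula can be checked numerically in a static setting. The sketch below replaces x̂_T and z_T by a hypothetical discrete random pair, picks illustrative smooth functions g and χ (none of which come from the paper), and compares the formula with a central finite difference in θ.

```python
def terminal_cost(xs, probs, g, chi):
    """E g(X, E chi(X)) for a discrete random variable X."""
    y = sum(p * chi(x) for x, p in zip(xs, probs))
    return sum(p * g(x, y) for x, p in zip(xs, probs))


def gateaux_formula(xs, zs, probs, g_x, g_y, chi, chi_x):
    """E[g_x z + g_y E(chi_x z)], with derivatives taken at the unperturbed state."""
    y = sum(p * chi(x) for x, p in zip(xs, probs))
    mean_chi_z = sum(p * chi_x(x) * z for x, z, p in zip(xs, zs, probs))
    return sum(
        p * (g_x(x, y) * z + g_y(x, y) * mean_chi_z)
        for x, z, p in zip(xs, zs, probs)
    )


# Hypothetical data: three outcomes for the pair (x_T, z_T).
xs, zs, probs = [0.5, 1.0, 2.0], [1.0, -0.5, 0.25], [0.2, 0.5, 0.3]


def g(x, y):
    return 0.5 * x * x + x * y + y * y


def g_x(x, y):
    return x + y


def g_y(x, y):
    return x + 2.0 * y


def chi(x):
    return x ** 3


def chi_x(x):
    return 3.0 * x * x


theta = 1e-5
plus = [x + theta * z for x, z in zip(xs, zs)]
minus = [x - theta * z for x, z in zip(xs, zs)]
# Central difference in theta (a numerical choice; the text uses a one-sided limit).
finite_diff = (terminal_cost(plus, probs, g, chi)
               - terminal_cost(minus, probs, g, chi)) / (2.0 * theta)
exact = gateaux_formula(xs, zs, probs, g_x, g_y, chi, chi_x)
```

The two numbers agree up to discretization error. The term g_y E(χ_x z) is the contribution that is absent in the classical, non-mean-field setting.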
3.2 Duality
This equation reduces to the standard one when the coefficients do not depend
explicitly on the marginal law of the underlying diffusion. Under the assumptions
(A.1)–(A.2), this is a linear mean-field backward SDE with bounded coefficients and
it follows from [5], Theorem 3.1, that it has a unique adapted solution such that
$$E|\hat p|_T^{*,2} + E\int_0^T |\hat q_t|^2\,dt < +\infty. \qquad (3.8)$$
Lemma 3.3
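$$E\big(\hat p_T z_T\big) = E\Big[\int_0^T \Big(\hat p_t \hat b_v(t) v_t - z_t \hat h_x(t) - z_t E\big(\hat h_y(t)\big)\hat\varphi_x(t) + \hat q_t \hat\sigma_v(t) v_t\Big)dt\Big].$$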
+ Mt ,
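$$E\big(\hat p_T z_T\big) = E\Big[\int_0^T \Big(\hat p_t \hat b_v(t) v_t - z_t \hat h_x(t) - z_t E\big(\hat h_y(t)\big)\hat\varphi_x(t) + \hat q_t \hat\sigma_v(t) v_t\Big)dt\Big].$$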
$$+\ \sigma\Big(t, x, \int \phi\,d\mu, u\Big) q.$$
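Corollary 3.1 The Gateaux derivative of the cost functional can be expressed in
terms of the Hamiltonian H in the following way.
$$
\begin{aligned}
\frac{d}{d\theta} J(\hat u + \theta v)\Big|_{\theta=0}
&= E\Big[\int_0^T \Big(\hat h_v(t) v_t + \hat p_t \hat b_v(t) v_t + \hat q_t \hat\sigma_v(t) v_t\Big)dt\Big] \\
&= E\Big[\int_0^T \frac{d}{dv} H\big(t, \hat x_t, \hat u_t, \hat p_t, \hat q_t\big) v_t\,dt\Big].
\end{aligned}
$$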
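$$\frac{d}{d\theta} J\big(\hat u + \theta(v - \hat u)\big)\Big|_{\theta=0} = E\Big[\int_0^T \frac{d}{dv} H\big(t, \hat x_t, \hat u_t, \hat p_t, \hat q_t\big)(v_t - \hat u_t)\,dt\Big] \ge 0.$$
As in [3], we can reduce this to
$$\frac{d}{dv} H\big(t, \hat x_t, \hat u_t, \hat p_t, \hat q_t\big)(v - \hat u_t) \ge 0,$$
a.e., P-a.s., for all v ∈ U. We summarize this with the main result of this section.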
Theorem 3.1 Under assumptions (A.1)–(A.2), if $\hat u_t$ is an optimal control with state
trajectory $\hat x_t$, then there exists a pair $(\hat p_t, \hat q_t)$ of adapted processes which satisfies
(3.7) and (3.8), such that, for all v ∈ U,
$$\frac{d}{dv} H\big(t, \hat x_t, \hat u_t, \hat p_t, \hat q_t\big)(v - \hat u_t) \ge 0, \quad P\text{-a.s., for all } t \in [0, T]. \qquad (3.9)$$
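To see what the local condition (3.9) says on a convex action space, consider a fixed (t, ω) and a stand-in for the map v ↦ H(t, x̂_t, v, p̂_t, q̂_t). The sketch below assumes a hypothetical strictly convex quadratic H(v) = c v²/2 + d v on an interval U = [lo, hi]; these choices are for illustration only and are not the Hamiltonian of the paper.

```python
def project(v, lo, hi):
    """Projection of v onto the interval [lo, hi]."""
    return min(max(v, lo), hi)


def minimize_quadratic_on_interval(c, d, lo, hi):
    """Minimizer of H(v) = 0.5*c*v**2 + d*v over [lo, hi], assuming c > 0."""
    return project(-d / c, lo, hi)


def local_condition_holds(c, d, lo, hi, u_hat, grid=101, tol=1e-12):
    """Check H'(u_hat) * (v - u_hat) >= 0 for v on a grid of [lo, hi]."""
    slope = c * u_hat + d
    step = (hi - lo) / (grid - 1)
    return all(slope * (lo + i * step - u_hat) >= -tol for i in range(grid))


# Interior minimizer: the derivative vanishes, so the inequality is an equality.
u_interior = minimize_quadratic_on_interval(2.0, -1.0, 0.0, 1.0)
# Boundary minimizer: the derivative is nonzero but points out of U.
u_boundary = minimize_quadratic_on_interval(1.0, -5.0, 0.0, 1.0)

interior_ok = local_condition_holds(2.0, -1.0, 0.0, 1.0, u_interior)
boundary_ok = local_condition_holds(1.0, -5.0, 0.0, 1.0, u_boundary)
# A point that is not a minimizer violates the inequality for some v in U.
non_optimal_ok = local_condition_holds(1.0, -5.0, 0.0, 1.0, 0.5)
```

In the boundary case the unconstrained stationary point lies outside U, so the derivative does not vanish at the minimizer. The variational inequality is the form of the first-order condition that remains valid there.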
Remark 3.1 We note that if we assume that the functions b, σ, h, g, ϕ, ψ, χ and φ
are only Lipschitz continuous, Theorem 3.1 still holds but on an extended probability
space, using distributional derivatives and the Bouleau-Hirsch Flow Property (cf. [7]).
Theorem 4.1 Assume the conditions (A.1)–(A.6) are satisfied and let û ∈ U with
state trajectory x̂t be given and such that there exist solutions p̂t , q̂t to the adjoint
equation (3.7). Then, if
$$H\big(t, \hat x_t, \hat u_t, \hat p_t, \hat q_t\big) = \inf_{v \in U} H\big(t, \hat x_t, v, \hat p_t, \hat q_t\big), \qquad (4.1)$$
Remark 4.1 By assumption (A.4), the conditions (3.9) and (4.1) are equivalent.
Proof We introduce the short hand notation b(t) = b (t, xt , E (ψ(xt )) , ut ) and simi-
larly for the other functions. The functions b̂(t), ψ̂(t) etc., are defined as in Sect. 3.
Moreover, we denote H (t) = H (t, xt , ut , p̂t , q̂t ) and Ĥ (t) = H (t, x̂t , ût , p̂t , q̂t ).
Since g and χ are convex and gy ≥ 0, it holds that
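$$
\begin{aligned}
E\big(\hat g(T) - g(T)\big) &\le E\Big[\hat g_x(T)(\hat x_T - x_T) + \hat g_y(T) E\big(\hat\chi(T) - \chi(T)\big)\Big] \\
&\le E\Big[\hat g_x(T)(\hat x_T - x_T) + \hat g_y(T) E\big(\hat\chi_x(T)(\hat x_T - x_T)\big)\Big] \\
&= E\big(\hat p_T(\hat x_T - x_T)\big).
\end{aligned}
$$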
$$+\ E\int_0^T \big(\hat H(t) - H(t)\big)\,dt - E\int_0^T \big(\hat h(t) - h(t)\big)\,dt,$$
where, in the last step, we have used the definition of the Hamiltonian H . Next, we
differentiate the Hamiltonian and use the convexity of the functions to get for all
t ∈ [0, T ], P-a.s.,
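$$
\begin{aligned}
\hat H(t) - H(t)
\le{}& \hat H_x(t)(\hat x_t - x_t) + \hat h_y(t) E\big(\hat\varphi(t) - \varphi(t)\big) + \hat b_y(t) E\big(\hat\psi(t) - \psi(t)\big)\hat p_t \\
& + \hat\sigma_y(t) E\big(\hat\phi(t) - \phi(t)\big)\hat q_t + \hat H_u(t)(\hat u_t - u_t) \\
\le{}& \hat H_x(t)(\hat x_t - x_t) + \hat h_y(t) E\big(\hat\varphi_x(t)(\hat x_t - x_t)\big) + \hat b_y(t) E\big(\hat\psi_x(t)(\hat x_t - x_t)\big)\hat p_t \\
& + \hat\sigma_y(t) E\big(\hat\phi_x(t)(\hat x_t - x_t)\big)\hat q_t + \hat H_u(t)(\hat u_t - u_t) \\
\le{}& \hat H_x(t)(\hat x_t - x_t) + \hat h_y(t) E\big(\hat\varphi_x(t)(\hat x_t - x_t)\big) + \hat b_y(t) E\big(\hat\psi_x(t)(\hat x_t - x_t)\big)\hat p_t \\
& + \hat\sigma_y(t) E\big(\hat\phi_x(t)(\hat x_t - x_t)\big)\hat q_t,
\end{aligned}
$$
where in the last step we have used that $\hat H_u(t)(\hat u_t - u_t) \le 0$ due to the minimum
condition (4.1). Combining the inequalities above gives us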
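$$
\begin{aligned}
J(\hat u) - J(u)
={}& E\int_0^T \big(\hat h(t) - h(t)\big)\,dt + E\big(\hat g(T) - g(T)\big) \\
\le{}& E\int_0^T \big(\hat H(t) - H(t)\big)\,dt - E\int_0^T (\hat x_t - x_t)\Big(\hat b_x(t)\hat p_t + E\big(\hat b_y(t)\hat p_t\big)\hat\psi_x(t) \\
& \qquad + \hat\sigma_x(t)\hat q_t + E\big(\hat\sigma_y(t)\hat q_t\big)\hat\phi_x(t) + \hat h_x(t) + E\big(\hat h_y(t)\big)\hat\varphi_x(t)\Big)dt \\
={}& E\int_0^T \big(\hat H(t) - H(t)\big)\,dt - E\int_0^T (\hat x_t - x_t)\Big(\hat H_x(t) + E\big(\hat b_y(t)\hat p_t\big)\hat\psi_x(t) \\
& \qquad + E\big(\hat\sigma_y(t)\hat q_t\big)\hat\phi_x(t) + E\big(\hat h_y(t)\big)\hat\varphi_x(t)\Big)dt \le 0,
\end{aligned}
$$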
In this section we will illustrate the maximum principle by solving the optimal mean-
variance portfolio problem.
We consider a market with a risky asset and a risk free bank account and denote
the prices at time t by $S_t^1$ and $S_t^0$, respectively. The price processes evolve according
to the equations
$$dS_t^0 = \rho_t S_t^0\,dt,$$
$$dS_t^1 = \alpha_t S_t^1\,dt + \sigma_t S_t^1\,dB_t,$$
where αt, σt, ρt are bounded deterministic functions. If ut denotes the amount of
money invested in the risky asset at time t, we can write down the value xt of a
self-financing portfolio consisting of the risky and the risk free assets, as
$$J(u) = E\Big(\frac{\gamma}{2} x_T^2 - x_T\Big) - \frac{\gamma}{2}\big(E(x_T)\big)^2,$$
we see that this is a cost functional of the form (2.2). As noted in e.g. [4], this becomes
a time inconsistent control problem. We start our attempt to solve it by writing down
the Hamiltonian for this system:
+ At σt ut dBt , (5.4)
where $\dot A_t$ and $\dot C_t$ denote the derivatives with respect to t. By comparing (5.4)
with (5), we get
Since H is linear in the control variable u, in which case the conditions (3.9) and
(4.1) are equivalent, we consider the first order condition for minimizing the
Hamiltonian, which yields
$$(\alpha_t - \rho_t)\, p_t + \sigma_t q_t = 0.$$
Inserting (5.6) into the latter expression gives us the following candidate of feedback
form for the optimal control:
$$\hat u_t = -\frac{\alpha_t - \rho_t}{\sigma_t^2}\big(\hat x_t - \hat\mu_t\big) - \frac{\alpha_t - \rho_t}{\sigma_t^2 A_t}\, C_t, \qquad (5.7)$$
which is square integrable since $\hat x_t$ is. The expected value of $\hat u_t$ is
$$E\,\hat u_t = \frac{(\rho_t - \alpha_t)}{\sigma_t^2}\,\frac{C_t}{A_t}. \qquad (5.8)$$
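$$
\begin{aligned}
\hat u_t &= \frac{\alpha_t - \rho_t}{\sigma_t^2}\Big(C_t A_t^{-1} - \big(\hat x_t - \hat\mu_t\big)\Big) \\
&= \frac{\alpha_t - \rho_t}{\sigma_t^2}\Big(\frac{1}{\gamma}\, e^{\int_t^T \big(\frac{(\alpha_s - \rho_s)^2}{\sigma_s^2} - \rho_s\big)ds} - \big(\hat x_t - \hat\mu_t\big)\Big), \qquad (5.10)
\end{aligned}
$$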
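$$d\hat\mu_t = \Big(\rho_t \hat\mu_t + \frac{1}{\gamma}\,\frac{(\alpha_t - \rho_t)^2}{\sigma_t^2}\, e^{\int_t^T \big(\frac{(\alpha_s - \rho_s)^2}{\sigma_s^2} - \rho_s\big)ds}\Big)dt, \qquad \hat\mu_0 = x_0. \qquad (5.12)$$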
Finally, by inserting (5.13) into (5.10), we get the solution candidate for the
mean-variance portfolio selection problem (5.2), when xt obeys (5.1), given in feedback
form by
$$\hat u\big(t, \hat x_t\big) = \frac{\alpha_t - \rho_t}{\sigma_t^2}\Big(x_0\, e^{\int_0^t \rho_s\,ds} + \frac{1}{\gamma}\, e^{\int_0^T \frac{(\alpha_s - \rho_s)^2}{\sigma_s^2}\,ds - \int_t^T \rho_s\,ds} - \hat x_t\Big),$$
which is identical to the optimal control found in [15], cf. (5.12), (6.7) and the
subsequent comments.
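The feedback control can be tried out in a small Monte Carlo experiment. The sketch below assumes constant coefficients α, ρ, σ and the standard self-financing wealth dynamics dx = (ρx + (α − ρ)u) dt + σu dB, which is an assumption about the form of (5.1). The closed-form moments used for comparison are derived under these assumptions (the gap between the target level and the wealth is then a geometric Brownian motion) and are not quoted from the paper; all function names are hypothetical.

```python
import math
import random


def simulate_terminal_wealth(x0, alpha, rho, sigma, gamma, T, steps, paths, seed=0):
    """Euler-Maruyama paths of the wealth under the feedback control; returns x_T samples."""
    rng = random.Random(seed)
    dt = T / steps
    sqrt_dt = math.sqrt(dt)
    lam = ((alpha - rho) / sigma) ** 2  # (alpha - rho)^2 / sigma^2, constant here
    wealth = []
    for _ in range(paths):
        x = x0
        for k in range(steps):
            t = k * dt
            # Target level inside the feedback rule, for constant coefficients.
            target = x0 * math.exp(rho * t) + math.exp(lam * T - rho * (T - t)) / gamma
            u = (alpha - rho) / sigma ** 2 * (target - x)
            # Assumed wealth dynamics: dx = (rho*x + (alpha - rho)*u) dt + sigma*u dB.
            x += (rho * x + (alpha - rho) * u) * dt + sigma * u * sqrt_dt * rng.gauss(0.0, 1.0)
        wealth.append(x)
    return wealth


def closed_form_moments(x0, alpha, rho, sigma, gamma, T):
    """Mean and variance of x_T under the assumptions stated in the lead-in."""
    lam = ((alpha - rho) / sigma) ** 2
    mean = x0 * math.exp(rho * T) + (math.exp(lam * T) - 1.0) / gamma
    var = (math.exp(lam * T) - 1.0) / gamma ** 2
    return mean, var


# Hypothetical market parameters.
params = dict(x0=1.0, alpha=0.08, rho=0.03, sigma=0.2, gamma=1.0, T=1.0)
wealth = simulate_terminal_wealth(steps=100, paths=4000, **params)
sample_mean = sum(wealth) / len(wealth)
sample_var = sum((w - sample_mean) ** 2 for w in wealth) / len(wealth)
exact_mean, exact_var = closed_form_moments(**params)
```

The sample moments should be close to the closed-form values up to Monte Carlo and time-discretization error. A larger γ penalizes variance more strongly and shrinks both the excess return and the variance of the terminal wealth.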
356 Appl Math Optim (2011) 63: 341–356
