
Optimal Control in Large Stochastic Multi-agent Systems

Bart van den Broek, Wim Wiegerinck, and Bert Kappen

SNN, Radboud University Nijmegen, Geert Grooteplein 21, Nijmegen, The Netherlands
{b.vandenbroek,w.wiegerinck,b.kappen}@science.ru.nl

Abstract. We study optimal control in large stochastic multi-agent systems in continuous space and time. We consider multi-agent systems where agents have independent dynamics with additive noise and control. The goal is to minimize the joint cost, which consists of a state dependent term and a term quadratic in the control. The system is described by a mathematical model, and an explicit solution is given. We focus on large systems where agents have to distribute themselves over a number of targets with minimal cost. In such a setting the optimal control problem is equivalent to a graphical model inference problem. Exact inference will be intractable, and we use the mean field approximation to compute accurate approximations of the optimal controls. We conclude that near-optimal control in large stochastic multi-agent systems is possible with this approach.

1 Introduction
A collaborative multi-agent system is a group of agents in which each member behaves autonomously to reach the common goal of the group. Some examples are teams of robots or unmanned vehicles, and networks of automated resource allocation. An issue typically appearing in multi-agent systems is decentralized coordination; the communication between agents may be restricted, there may be no time to receive all the demands for a certain resource, or an unmanned vehicle may be unsure about how to anticipate another vehicle's movement and avoid a collision.
In this paper we focus on the issue of optimal control in large multi-agent systems where the agents' dynamics are continuous in space and time. In particular we look at cases where the agents have to distribute themselves in admissible ways over a number of targets. Due to the noise in the dynamics, a configuration that initially seems attainable with little effort may become harder to reach later on.
Common approaches to derive a coordination rule are based on discretizations of space and time. These often suffer from the curse of dimensionality, as the complexity increases exponentially in the number of agents. Some successful ideas, however, have recently been put forward, which are based on structures that are assumed to be present [1,2].


Here we rather model the system in continuous space and time, following the approach of Wiegerinck et al. [3]. The agents satisfy dynamics with additive control and noise, and the joint behaviour of the agents is valued by a joint cost function that is quadratic in the control. The stochastic optimization problem may then be transformed into a linear partial differential equation, which can be solved using generic path integral methods [4,5]. The dynamics of the agents are assumed to factorize over the agents, such that the agents are coupled by their joint task only.
The optimal control problem is equivalent to a graphical model inference problem [3]. In large and sparsely coupled multi-agent systems the optimal control can be computed using the junction tree algorithm. Exact inference, however, will break down when the system is both large and densely coupled. Here we explore the use of graphical model approximate inference methods in optimal control of large stochastic multi-agent systems. We apply the mean field approximation to show that accurate control is possible in systems where exact inference breaks down.

2 Stochastic Optimal Control of a Multi-agent System


We consider $n$ agents in a $k$-dimensional space $\mathbb{R}^k$. The state of each agent $a$ is given by a vector $x_a$ in this space, satisfying the stochastic dynamics

$$dx_a(t) = b_a(x_a(t), t)\,dt + B u_a(t)\,dt + \sigma\,dw(t), \qquad (1)$$

where $u_a$ is the control of agent $a$, $b_a$ is an arbitrary function representing autonomous dynamics, $w$ is a Wiener process, and $B$ and $\sigma$ are $k \times k$ matrices. The agents have to reach a goal at the end time $T$; they will pay a cost $\phi(x(T))$ at the end time depending on their joint end state $x(T) = (x_1(T), \ldots, x_n(T))$, but to reach this goal they will have to make an effort which depends on the agents' controls and states over time. At any time $t < T$, the expected cost-to-go is

$$C(x, t, u(t \to T)) = \left\langle \phi(x(T)) + \int_t^T d\theta\, V(x(\theta), \theta) + \int_t^T d\theta \sum_{a=1}^n \frac{1}{2}\, u_a(\theta)^\top R\, u_a(\theta) \right\rangle, \qquad (2)$$

given the agents' initial state $x$ and the joint control over time $u(t \to T)$. $R$ is a symmetric $k \times k$ matrix with positive eigenvalues, such that $u_a(\theta)^\top R\, u_a(\theta)$ is always a non-negative number, and $V(x(\theta), \theta)$ is the cost for the agents to be in a joint state $x(\theta)$ at time $\theta$. The issue is to find the optimal control which minimizes the expected cost-to-go.
The optimal controls are given by the gradient

$$u_a(x, t) = -R^{-1} B^\top \partial_{x_a} J(x, t), \qquad (3)$$


where $J(x, t)$ is the optimal expected cost-to-go, i.e. the cost (2) minimized over all possible controls; a brief derivation is contained in the appendix. An important implication of equation (3) is that at any moment in time, each agent can compute its own optimal control if it knows its own state and that of the other agents: there is no need to discuss possible strategies! This is because the agents always perform the control that is optimal, and the optimal control is unique.
To compute the optimal controls, however, we first need to find the optimal expected cost-to-go $J$. The latter may be expressed in terms of a forward diffusion process:

$$J(x, t) = -\lambda \log \int dy\, \rho(y, T|x, t)\, e^{-\phi(y)/\lambda}, \qquad (4)$$

$\rho(y, T|x, t)$ being the transition probability for the system to go from a state $x$ at time $t$ to a state $y$ at the end time $T$. The constant $\lambda$ is determined by the relation $\sigma\sigma^\top = \lambda B R^{-1} B^\top$, equation (14) in the appendix. The density $\rho(y, \theta|x, t)$, $t < \theta \leq T$, satisfies the forward Fokker-Planck equation

$$\partial_\theta \rho = -\frac{V}{\lambda}\rho - \sum_{a=1}^n \partial_{y_a}(b_a \rho) + \sum_{a=1}^n \frac{1}{2}\,\mathrm{Tr}\left(\sigma\sigma^\top \partial_{y_a}^2 \rho\right). \qquad (5)$$

The solution to this equation may generally be estimated using path integral methods [4,5]; in a few special cases a solution exists in closed form.
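To make the sampling idea behind such estimates concrete, the following minimal sketch (all names and defaults are illustrative, not from the paper) estimates $J(x, t)$ of equation (4) by plain Monte Carlo: it draws end states $y \sim \rho(y, T|x, t)$ by simulating the uncontrolled dynamics forward with Euler-Maruyama steps, accumulates the path weight $\exp(-\frac{1}{\lambda}\int V\, d\theta)$ implied by the Fokker-Planck equation (5), and averages $e^{-\phi(y)/\lambda}$. It assumes a smooth end cost $\phi$; delta-function end costs like those in the examples below require the closed-form densities instead.

```python
import numpy as np

def estimate_J(x0, t, T, phi, b, V, sigma, lam, n_samples=10000, dt=0.01):
    """Monte Carlo estimate of J(x,t) = -lam * log E[exp(-phi(y)/lam)], eq. (4).

    Samples y ~ rho(y,T|x,t) by Euler-Maruyama simulation of the uncontrolled
    dynamics dx = b(x,t) dt + sigma dw, weighting each path by
    exp(-(1/lam) * integral of V), as required by eq. (5).
    """
    x = np.tile(np.atleast_1d(x0).astype(float), (n_samples, 1))
    log_w = np.zeros(n_samples)                 # accumulated path weights
    n_steps = int(round((T - t) / dt))
    for k in range(n_steps):
        tk = t + k * dt
        log_w -= V(x, tk) * dt / lam
        x += b(x, tk) * dt + sigma * np.sqrt(dt) * np.random.randn(*x.shape)
    Z = np.mean(np.exp(log_w - phi(x) / lam))   # estimate of eq. (16)
    return -lam * np.log(Z)
```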
Example 1. Consider a multi-agent system in one dimension in which there is noise and control in the velocities of the agents, according to the set of equations

$$dx_a(t) = \dot x_a(t)\,dt,$$
$$d\dot x_a(t) = u_a(t)\,dt + \sigma\,dw(t).$$

Note that this set of equations can be merged into a single equation of the form (1) by a concatenation of $x_a$ and $\dot x_a$ into a single vector. We choose the potential $V = 0$. Under the task where each agent $a$ has to reach a target with location $\mu_a$ at the end time $T$, and arrive with speed $\dot\mu_a$, the end cost function $\phi$ can be given in terms of a product of delta functions, that is

$$e^{-\phi(x, \dot x)/\lambda} = \prod_{a=1}^n \delta(x_a - \mu_a)\,\delta(\dot x_a - \dot\mu_a),$$

and the system decouples into $n$ independent single-agent systems. The dynamics of each agent $a$ is given by a transition probability

$$\rho_a(y_a, \dot y_a, T | x_a, \dot x_a, t) = \det(2\pi c)^{-1/2} \exp\left(-\frac{1}{2}\begin{pmatrix} y_a - x_a - (T-t)\dot x_a \\ \dot y_a - \dot x_a \end{pmatrix}^{\!\top} c^{-1} \begin{pmatrix} y_a - x_a - (T-t)\dot x_a \\ \dot y_a - \dot x_a \end{pmatrix}\right), \qquad (6)$$

where

$$c = \frac{\sigma^2}{6}\begin{pmatrix} 2(T-t)^3 & 3(T-t)^2 \\ 3(T-t)^2 & 6(T-t) \end{pmatrix}.$$

The optimal control follows from equations (3) and (4) and reads

$$u_a(x_a, \dot x_a, t) = \frac{6(\mu_a - x_a - (T-t)\dot x_a) - 2(T-t)(\dot\mu_a - \dot x_a)}{(T-t)^2}. \qquad (7)$$

The first term in the control steers the agent towards the target $\mu_a$ in a straight line, but since this may happen with a speed that differs from the arrival speed $\dot\mu_a$, there is a second term that initially 'exaggerates' the speed of the straight-line motion, so that towards the end there is time to adjust the speed to the end speed $\dot\mu_a$.
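As a minimal sketch of this closed-form case (parameter values are illustrative, not from the paper's experiments), the following simulates one agent under the control (7) with Euler-Maruyama steps; since (7) diverges as $t \to T$, the loop stops one step before the end time.

```python
import numpy as np

def u_star(x, xdot, t, T, mu, mu_dot):
    """Single-agent optimal control of eq. (7)."""
    return (6 * (mu - x - (T - t) * xdot)
            - 2 * (T - t) * (mu_dot - xdot)) / (T - t) ** 2

T, dt, sigma = 1.0, 1e-3, np.sqrt(0.1)   # illustrative values
mu, mu_dot = 1.0, 0.0                    # target location and arrival speed
x, xdot = 0.0, 0.0
for k in range(int(T / dt) - 1):         # stop one step early: (7) blows up at t = T
    t = k * dt
    u = u_star(x, xdot, t, T, mu, mu_dot)
    xdot += u * dt + sigma * np.sqrt(dt) * np.random.randn()
    x += xdot * dt
print(x, xdot)  # should land near (mu, mu_dot) up to residual end-time noise
```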

2.1 A Joint Task: Distribution over Targets


We consider the situation where agents have to distribute themselves over a number of targets $s = 1, \ldots, m$. In general, there will be $m^n$ possible combinations of assigning the $n$ agents to the targets—note, in example 1 we considered only one assignment. We can describe this by letting the end cost function $\phi$ be given in terms of a positive linear combination of functions

$$\Phi(y_1, \ldots, y_n, s_1, \ldots, s_n) = \prod_{a=1}^n \Phi_a(y_a, s_a)$$

that are peaked around the location $(\mu_{s_1}, \ldots, \mu_{s_n})$ of a joint target $(s_1, \ldots, s_n)$, that is

$$e^{-\phi(y)/\lambda} = \sum_{s_1, \ldots, s_n} w(s_1, \ldots, s_n) \prod_{a=1}^n \Phi_a(y_a, s_a),$$

where the $w(s_1, \ldots, s_n)$ are positive weights. We will refer to these weights as coupling factors, since they introduce dependencies between the agents. The optimal control of a single agent is obtained using equations (3) and (4), and is a weighted combination of single-target controls,

$$u_a = \sum_{s=1}^m p_a(s)\, u_a(s) \qquad (8)$$

(the explicit $(x, t)$ dependence has been dropped in the notation). Here $u_a(s)$ is the control for agent $a$ to go to target $s$,

$$u_a(s) = \lambda R^{-1} B^\top \partial_{x_a} \log Z_a(s), \qquad (9)$$

with $Z_a(s)$ defined by

$$Z_a(s_a) = \int dy_a\, \rho_a(y_a, T|x_a, t)\, \Phi_a(y_a, s_a).$$

The weights $p_a(s)$ are marginals of the joint distribution

$$p(s_1, \ldots, s_n) \propto w(s_1, \ldots, s_n) \prod_{a=1}^n Z_a(s_a). \qquad (10)$$

$p$ thus is a distribution over all possible assignments of agents to targets.



Example 2. Consider the multi-agent system of example 1, but with a different task: each of the agents $a = 1, \ldots, n$ has to reach a target $s = 1, \ldots, n$ with location $\mu_s$ at the end time $T$, and arrive with zero speed, but no two agents are allowed to arrive at the same target. We model this by choosing an end cost function $\phi(x, \dot x)$ given by

$$e^{-\phi(x, \dot x)/\lambda} = \sum_{s_1, \ldots, s_n} w(s_1, \ldots, s_n) \prod_{a=1}^n \delta(x_a - \mu_{s_a})\,\delta(\dot x_a)$$

with coupling factors

$$w(s_1, \ldots, s_n) = \exp\left(\frac{c}{\lambda n} \sum_{a, a'=1}^n \delta_{s_a, s_{a'}}\right).$$

For any agent $a$, the optimal control under this task is a weighted average of single-target controls (7),

$$u_a(x_a, \dot x_a, t) = \frac{6(\langle\mu_a\rangle - x_a - (T-t)\dot x_a) + 2(T-t)\dot x_a}{(T-t)^2}, \qquad (11)$$

where $\langle\mu_a\rangle$ is the averaged target for agent $a$,

$$\langle\mu_a\rangle = \sum_{s=1}^n p_a(s)\,\mu_s.$$

The average is taken with respect to the marginal $p_a$ of the joint distribution

$$p(s_1, \ldots, s_n) \propto w(s_1, \ldots, s_n) \prod_{a=1}^n \rho_a(\mu_{s_a}, 0, T|x_a, \dot x_a, t),$$

the densities $\rho_a$ given by (6).
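For small $n$, the marginals $p_a(s)$ of equation (10) can be computed exactly by brute-force enumeration of all $n^n$ assignments. The following sketch (function and parameter names are illustrative) does exactly that for this example, using the Gaussian densities (6) as single-agent factors, and then evaluates the control (11). For this velocity-controlled model, relation (14) fixes $\lambda = R\sigma^2$; with the experimental values $\sigma^2 = 0.1$, $R = 1$ used later in the paper, $\lambda = 0.1$.

```python
import numpy as np
from itertools import product
from scipy.stats import multivariate_normal

def c_matrix(tau, sigma2):
    """Covariance c of eq. (6) for time-to-go tau."""
    return (sigma2 / 6.0) * np.array([[2 * tau**3, 3 * tau**2],
                                      [3 * tau**2, 6 * tau]])

def exact_marginals(x, xdot, t, T, mu, c_coupling, lam, sigma2):
    """Target marginals p_a(s) of eq. (10), by enumeration over all
    n**n assignments; feasible only for small n."""
    n, tau = len(x), T - t
    cov = c_matrix(tau, sigma2)
    # single-agent factors Z_a(s) = rho_a(mu_s, 0, T | x_a, xdot_a, t), eq. (6)
    Z = np.array([[multivariate_normal.pdf(
                       [mu[s] - x[a] - tau * xdot[a], -xdot[a]], cov=cov)
                   for s in range(n)] for a in range(n)])
    p = np.zeros((n, n))
    for assign in product(range(n), repeat=n):
        s = np.array(assign)
        w = np.exp(c_coupling / (lam * n) * np.sum(s[:, None] == s[None, :]))
        weight = w * Z[np.arange(n), s].prod()
        p[np.arange(n), s] += weight            # accumulate into p_a(s_a)
    return p / p.sum(axis=1, keepdims=True)

def control(x, xdot, t, T, p, mu):
    """Optimal control of eq. (11): steer to the averaged target <mu_a>."""
    tau = T - t
    mu_avg = p @ np.asarray(mu)
    return (6 * (mu_avg - x - tau * xdot) + 2 * tau * xdot) / tau**2
```

The enumeration loop makes the exponential cost of exact inference explicit; the mean field scheme of section 2.3 replaces it with an iteration that is linear in $n$ per sweep.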


In general, and in example 2 in particular, the optimal control of an agent will
not only depend on the state of this agent alone, but also on the states of other
agents. Since the controls are computed anew at each instant in time, the agents
are able to continuously adapt to the behaviour of the other agents, adjusting
their control to the new states of all the agents.

2.2 Factored End Costs


The additional computational effort in multi-agent control compared to single-agent control lies in the computation of the marginals of the joint distribution $p$, which involves a sum of at most $m^n$ terms. For small systems this is feasible; for large systems it will only be feasible if the summation can be performed efficiently. Whether an efficient way of computing the marginals exists depends on the joint task of the agents. In the most complex case, to fulfil the task each agent will have to take the joint state of the entire system into account. In less complicated cases, an agent will only consider the states of a few agents in the system; in other words, the coupling factors will have a nontrivial factorized form:

$$w(s_1, \ldots, s_n) = \prod_A w_A(s_A),$$

where the $A$ are subsets of agents. In such cases we may represent the couplings, and thus the joint distribution, by a factor graph; see Figure 1 for an example.


Fig. 1. Example of a factor graph for a multi-agent system of four agents. The couplings are represented by the factors A, with A = {1, 4}, {1, 2}, {2, 4}, {3, 4}, {2, 3}.

2.3 Graphical Model Inference


In the previous paragraph we observed that the joint distribution may be represented by a factor graph. This implies that the issue of assigning agents to targets is equivalent to a graphical model inference problem. Both exact methods (junction tree algorithm [6]) and approximate methods (mean field approximation [7], belief propagation [8]) can be used to compute the marginals in (8). In this paper we will use the mean field (MF) approximation to tackle optimal control in large multi-agent systems.
In the mean field approximation we minimize the mean field free energy, a function of single-agent marginals $q_a$ defined by

$$F_{\mathrm{MF}}(\{q_a\}) = -\lambda \langle \log w \rangle_q - \lambda \sum_a \langle \log Z_a \rangle_{q_a} - \lambda \sum_a H(q_a),$$

where $q(s) = q_1(s_1) \cdots q_n(s_n)$. Here the $H(q_a)$ are the entropies of the distributions $q_a$,

$$H(q_a) = -\sum_s q_a(s) \log q_a(s).$$

The minimum

$$J_{\mathrm{MF}} = \min_{\{q_a\}} F_{\mathrm{MF}}(\{q_a\})$$

is an upper bound for the optimal cost-to-go $J$; it equals $J$ in case the agents are uncoupled. $F_{\mathrm{MF}}$ has zero gradient in its local minima, that is,

$$0 = \frac{\partial F_{\mathrm{MF}}(q_1(s_1), \ldots, q_n(s_n))}{\partial q_a(s_a)}, \qquad a = 1, \ldots, n,$$

with additional constraints for normalization of the probability vectors $q_a$. Solutions to this set of equations are implicitly given by the mean field equations

$$q_a(s_a) = \frac{Z_a(s_a) \exp\left(\langle \log w \,|\, s_a \rangle\right)}{\sum_{s'_a=1}^m Z_a(s'_a) \exp\left(\langle \log w \,|\, s'_a \rangle\right)}, \qquad (12)$$

where $\langle \log w \,|\, s_a \rangle$ is the conditional expectation of $\log w$ given $s_a$,

$$\langle \log w \,|\, s_a \rangle = \sum_{s_1, \ldots, s_n \setminus s_a} \prod_{a' \neq a} q_{a'}(s_{a'}) \log w(s_1, \ldots, s_n).$$

The mean field equations are solved by means of iteration, and the solutions are the local minima of the mean field free energy. Thus the mean field free energy minimized over all solutions to the mean field equations equals the minimum $J_{\mathrm{MF}}$.
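A minimal sketch of this fixed-point iteration, specialized to the pairwise coupling of example 2, is given below (names and defaults are illustrative). For $\log w = \frac{c}{\lambda n}\sum_{a,a'} \delta_{s_a, s_{a'}}$, the conditional expectation in (12) reduces, up to an $s$-independent constant that cancels in the normalization, to $\langle \log w \,|\, s_a = s \rangle = \frac{2c}{\lambda n}\sum_{a' \neq a} q_{a'}(s)$.

```python
import numpy as np

def mean_field_marginals(Z, c_coupling, lam, n_iter=200, tol=1e-10):
    """Fixed-point iteration of the mean field equations (12) for the
    coupling of example 2. Z[a, s] are the single-agent factors Z_a(s)."""
    n, m = Z.shape
    # random initialization lets the iteration break the symmetry between
    # agents that start in the same state
    q = np.random.dirichlet(np.ones(m), size=n)
    for _ in range(n_iter):
        q_old = q.copy()
        for a in range(n):
            # <log w | s_a = s> up to a constant: (2c/(lam*n)) sum_{a'!=a} q_a'(s)
            field = (2 * c_coupling / (lam * n)) * (q.sum(axis=0) - q[a])
            log_q = np.log(Z[a]) + field
            q[a] = np.exp(log_q - log_q.max())  # subtract max for stability
            q[a] /= q[a].sum()
        if np.abs(q - q_old).max() < tol:
            break
    return q
```

Each sweep costs $O(nm)$, in contrast to the $O(m^n)$ enumeration needed for the exact marginals.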
The mean field approximation of the optimal control is found by taking the gradient of the minimum $J_{\mathrm{MF}}$ of the mean field free energy, similar to the exact case where the optimal control is the gradient of the optimal expected cost-to-go, equation (3):

$$u_a(x, t) = -R^{-1} B^\top \partial_{x_a} J_{\mathrm{MF}}(x, t) = \sum_{s_a} q_a(s_a)\, u_a(x_a, t; s_a).$$

Similar to the exact case, it is an average of single-agent single-target optimal controls $u_a(x_a, t; s_a)$, given by equation (9), where the average is taken with respect to the mean field approximate marginal $q_a(s_a)$ of agent $a$.

3 Control of Large Multi-agent Systems

Exact inference of multi-agent optimal control is intractable in large and densely coupled systems. In this section we present numerical results from approximate inference in optimal control of a large multi-agent system. We focus on the system presented in example 2: a group of $n$ agents has to distribute itself over an equal number of targets, and each target should be reached by precisely one agent. The agents all start in the same location at $t = 0$, and the time at which they reach the targets is $T = 1$, as illustrated in figure 3. The variance of the noise equals 0.1 and the control cost parameter $R$ equals 1; both are the same for each agent. The coupling strength $c$ in the coupling factors equals $-10$. For implementation, time had to be discretized: each time step $\Delta t$ equaled 0.05 times the time-to-go $T - t$.
We considered two approximate inference methods for obtaining the marginals in (8): the mean field approximation described in section 2.3, and an approximation which at each moment in time assigns each agent to precisely one target. In the latter method the agent that is nearest to any of the targets is assigned first to its nearest target; then, removing this pair of agent and target, this is repeated for the remaining agents and targets, until there are no more remaining agents and targets. We will refer to this method as the sort distances (SD) method.
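The greedy assignment just described can be sketched in a few lines (a minimal sketch; the names are illustrative, and the paper does not prescribe an implementation):

```python
import numpy as np

def sort_distances(agent_pos, target_pos):
    """Greedy assignment: repeatedly pick the globally closest remaining
    (agent, target) pair and remove both from the pool."""
    agents = list(range(len(agent_pos)))
    targets = list(range(len(target_pos)))
    assignment = {}
    while agents:
        d = np.array([[abs(agent_pos[a] - target_pos[s]) for s in targets]
                      for a in agents])
        i, j = np.unravel_index(np.argmin(d), d.shape)
        assignment[agents.pop(i)] = targets.pop(j)
    return assignment
```

Under SD control, each agent then applies the single-target control (7) towards its assigned target, and the assignment is recomputed at every time step.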
For several sizes of the system we computed the control cost and the required CPU time to calculate the controls, under both control methods. Figures 2(a) and (b) show the control cost and the required CPU time as a function of the system size $n$; each value is an average obtained from 100 simulations. To emphasize the necessity of the approximate inference methods, in figure 2(b) we included the required CPU time under exact inference; this quantity increases exponentially with $n$, as we may have expected, making exact inference intractable in large multi-agent systems. In contrast, both under the SD method and the MF method the required CPU time appears to increase polynomially with $n$, the SD method requiring less computation time than the MF method. Though the SD method is faster than the MF method, it also is more costly: the control cost under the SD method is significantly higher than under the MF method. The MF method thus better approximates the optimal control.


Fig. 2. The control cost (a) and the required CPU time in seconds (b) as functions of the system size n, under the exact method (·−·), the MF method (−−), and the SD method (—)

Figure 3 shows the positions and the velocities of the agents over time, both under the control obtained using the MF approximation and under the control obtained with the SD method. We observe that under MF control the agents determine their targets early, between t = 0 and t = 0.5, and the agents' velocities gradually increase from zero to a maximum value at t = 0.5, to again gradually decrease to zero, as required. This is not very surprising, since the MF approximation is known to show an early symmetry breaking. In contrast, under the SD method the decision making process of the agents choosing their targets takes place over almost the entire time interval, and the velocities of the agents are subject to frequent changes; in particular, as time increases, the agents who have not yet chosen a target seem to exchange targets frequently. This may be understood by realising that under the SD method agents always perform a control to their nearest target only, instead of a weighted combination of controls to different targets, which is the situation under MF control.


Fig. 3. A multi-agent system of 15 agents. The positions (a) and the velocities (b) over
time under MF control, and the positions (c) and the velocities (d) over time under
SD control.

Furthermore, compared with the velocities under the MF method, the velocities under the SD method take on higher maximum values. This may account for the relatively high control costs under SD control.

4 Discussion

In this paper we studied optimal control in large stochastic multi-agent systems in continuous space and time, focussing on systems where agents have the task of distributing themselves over a number of targets. We followed the approach of Wiegerinck et al. [3]: we modeled the system in continuous space and time, resulting in an adaptive control policy where agents continuously adjust their controls to the environment. We considered the task of assigning agents to targets as a graphical model inference problem. We showed that in large and densely coupled systems, in which exact inference would break down, the mean field approximation manages to compute accurate approximations of the optimal controls of the agents.
We considered the performances of the mean field approximation and an alternative method, referred to as the sort distances method, on an example system in which a number of agents have to distribute themselves over an equal number of targets, such that each target is reached by precisely one agent. In the sort distances method each agent performs a control to a single nearby target, in such a way that no two agents head to the same target at the same time. This method has the advantage of being fast, but it results in relatively high control costs. Because each agent performs a control to a single target, agents switch targets frequently during the control process. In the mean field approximation each agent performs a control which is a weighted sum of controls to single targets. This requires more computation time than the sort distances method, but involves significantly lower control costs and therefore is a better approximation to the optimal control.
An obvious choice for a graphical model inference method not considered in
the present paper would be belief propagation. Results of numeric simulations
with this method in the context of multi-agent control, and comparisons with the
mean field approximation and the exact junction tree algorithm will be published
elsewhere.
There are many possible model extensions worthwhile exploring in future research. Examples are non-zero potentials V in case of a non-empty environment, penalties for collisions in the context of robotics, non-fixed end times, or bounded state spaces in the context of a production process. Typically, such model extensions will not allow for a solution in closed form, and approximate numerical methods will be required. Some suggestions are given by Kappen [4,5]. In the setting that we considered, the model which describes the behaviour of the agents was given. It would be worthwhile, however, to consider cases of stochastic optimal control of multi-agent systems in continuous space and time where the model first needs to be learned.

Acknowledgments
We thank Joris Mooij for making available useful software, and the reviewers for their useful remarks. This research is part of the Interactive Collaborative Information Systems (ICIS) project, supported by the Dutch Ministry of Economic Affairs, grant BSIK03024.

References
1. Guestrin, C., Koller, D., Parr, R.: Multiagent planning with factored MDPs. In:
Proceedings of NIPS, vol. 14, pp. 1523–1530 (2002)
2. Guestrin, C., Venkataraman, S., Koller, D.: Context-specific multiagent coordination
and planning with factored MDPs. In: Proceedings of AAAI, vol. 18, pp. 253–259
(2002)
3. Wiegerinck, W., van den Broek, B., Kappen, B.: Stochastic optimal control in continuous space-time multi-agent systems. In: UAI 2006 (2006)
4. Kappen, H.J.: Path integrals and symmetry breaking for optimal control theory. Journal of Statistical Mechanics: Theory and Experiment, P11011 (2005)
5. Kappen, H.J.: Linear theory for control of nonlinear stochastic systems. Physical Review Letters 95(20), 200201 (2005)

6. Lauritzen, S., Spiegelhalter, D.: Local computations with probabilities on graphical structures and their application to expert systems (with discussion). J. Royal Statistical Society Series B 50, 157–224 (1988)
7. Jordan, M., Ghahramani, Z., Jaakkola, T., Saul, L.: An introduction to variational
methods for graphical models. In: Jordan, M.I. (ed.) Learning in Graphical Models,
MIT Press, Cambridge (1999)
8. Kschischang, F.R., Frey, B.J., Loeliger, H.A.: Factor graphs and the sum-product
algorithm. IEEE Trans. Info. Theory 47, 498–519 (2001)

A Stochastic Optimal Control


In this appendix we give a brief derivation of equations (3), (4) and (5), starting from (2). Details can be found in [4,5].
The optimal expected cost-to-go $J$, by definition the expected cost-to-go (2) minimized over all controls, satisfies the stochastic Hamilton-Jacobi-Bellman (HJB) equation

$$-\partial_t J = \min_u \left\{ \sum_{a=1}^n \left( \frac{1}{2} u_a^\top R u_a + (b_a + B u_a)^\top \partial_{x_a} J + \frac{1}{2}\,\mathrm{Tr}\left(\sigma\sigma^\top \partial_{x_a}^2 J\right) \right) + V \right\},$$

with boundary condition $J(x, T) = \phi(x)$. The minimization with respect to $u$ yields equation (3), which specifies the optimal control for each agent. Substituting these controls in the HJB equation gives a non-linear equation for $J$. We can remove the non-linearity by using a log transformation: if we introduce a constant $\lambda$, and define $Z(x, t)$ through

$$J(x, t) = -\lambda \log Z(x, t), \qquad (13)$$

then

$$\frac{1}{2} u_a^\top R u_a + (B u_a)^\top \partial_{x_a} J = -\frac{1}{2} \lambda^2 Z^{-2} (\partial_{x_a} Z)^\top B R^{-1} B^\top \partial_{x_a} Z,$$

$$\frac{1}{2}\,\mathrm{Tr}\left(\sigma\sigma^\top \partial_{x_a}^2 J\right) = \frac{1}{2} \lambda Z^{-2} (\partial_{x_a} Z)^\top \sigma\sigma^\top \partial_{x_a} Z - \frac{1}{2} \lambda Z^{-1}\,\mathrm{Tr}\left(\sigma\sigma^\top \partial_{x_a}^2 Z\right).$$

The terms quadratic in $Z$ vanish when $\sigma\sigma^\top$ and $R$ are related via

$$\sigma\sigma^\top = \lambda B R^{-1} B^\top. \qquad (14)$$

In the one-dimensional case a constant $\lambda$ can always be found such that equation (14) is satisfied; in the higher-dimensional case the equation puts restrictions on the matrices $\sigma$ and $R$, because in general $\sigma\sigma^\top$ and $B R^{-1} B^\top$ will not be proportional.
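As a short check of the one-dimensional case, with scalar $B$, $R$ and noise variance $\sigma^2$, equation (14) reads

$$\sigma^2 = \lambda B R^{-1} B, \qquad \text{hence} \qquad \lambda = \frac{\sigma^2 R}{B^2}.$$

In the velocity-controlled model of examples 1 and 2, where noise and control act on the velocity component only with unit gain, this gives $\lambda = \sigma^2 R$.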
When equation (14) is satisfied, the HJB equation becomes

$$\partial_t Z = \left( \frac{V}{\lambda} - \sum_{a=1}^n b_a^\top \partial_{x_a} - \sum_{a=1}^n \frac{1}{2}\,\mathrm{Tr}\left(\sigma\sigma^\top \partial_{x_a}^2\right) \right) Z = -HZ, \qquad (15)$$

where $H$ is a linear operator acting on the function $Z$. Equation (15) is solved backwards in time with $Z(x, T) = e^{-\phi(x)/\lambda}$. However, the linearity allows us to reverse the direction of computation, replacing it by a diffusion process, as we will now explain.
The solution to equation (15) is given by

$$Z(x, t) = \int dy\, \rho(y, T|x, t)\, e^{-\phi(y)/\lambda}, \qquad (16)$$

the density $\rho(y, \theta|x, t)$ ($t < \theta \leq T$) satisfying the forward Fokker-Planck equation (5). Combining equations (13) and (16) yields the expression (4) for the optimal expected cost-to-go.
