Optimal Control in Large Stochastic Multi-Agent Systems

B. van den Broek, W. Wiegerinck, and B. Kappen
1 Introduction
A collaborative multi-agent system is a group of agents in which each member
behaves autonomously to reach the common goal of the group. Some examples
are teams of robots or unmanned vehicles, and networks of automated resource
allocation. An issue typically appearing in multi-agent systems is decentralized
coordination; the communication between agents may be restricted, there may
be no time to receive all the demands for a certain resource, or an unmanned
vehicle may be unsure about how to anticipate another vehicle's movement and
avoid a collision.
In this paper we focus on the issue of optimal control in large multi-agent sys-
tems where the agents' dynamics are continuous in space and time. In particular
we look at cases where the agents have to distribute themselves in admissible
ways over a number of targets. Due to the noise in the dynamics, a configura-
tion that initially seems attainable with little effort may become harder to reach
later on.
Common approaches to derive a coordination rule are based on discretizations
of space and time. These often suffer from the curse of dimensionality, as the
complexity increases exponentially in the number of agents. Some successful
ideas, however, have recently been put forward, which are based on structures
that are assumed to be present [1,2].
Here we rather model the system in continuous space and time, following the
approach of Wiegerinck et al. [3]. The agents satisfy dynamics with additive
control and noise, and the joint behaviour of the agents is valued by a joint cost
function that is quadratic in the control. The stochastic optimization problem
may then be transformed into a linear partial differential equation, which can
be solved using generic path integral methods [4,5]. The dynamics of the agents
are assumed to factorize over the agents, such that the agents are coupled by
their joint task only.
The optimal control problem is equivalent to a graphical model inference prob-
lem [3]. In large and sparsely coupled multi-agent systems the optimal control
can be computed using the junction tree algorithm. Exact inference, however,
will break down when the system is both large and densely coupled. Here we
explore the use of graphical model approximate inference methods in optimal
control of large stochastic multi-agent systems. We apply the mean field approximation and show that accurate control is feasible in systems where exact inference breaks down.
given the agents' initial state x and the joint control over time u(t → T). R is a symmetric k × k matrix with positive eigenvalues, such that u_a(θ)ᵀR u_a(θ) is always non-negative, and V(x(θ), θ) is the cost for the agents to be in a joint state x(θ) at time θ. The issue is to find the optimal control which minimizes the expected cost-to-go.
The optimal controls are given by the gradient

$$u_a(x, t) = -R^{-1} B^\top \partial_{x_a} J(x, t), \qquad (3)$$
where J(x, t) is the optimal expected cost-to-go, i.e. the cost (2) minimized over all possible controls; a brief derivation is contained in the appendix. An impor-
tant implication of equation (3) is that at any moment in time, each agent can
compute its own optimal control if it knows its own state and that of the other
agents: there is no need to discuss possible strategies! This is because the agents
always perform the control that is optimal, and the optimal control is unique.
To compute the optimal controls, however, we first need to find the optimal
expected cost-to-go J. The latter may be expressed in terms of a forward diffusion
process:
$$J(x, t) = -\lambda \log \int \mathrm{d}y\, \rho(y, T\,|\,x, t)\, e^{-\phi(y)/\lambda}, \qquad (4)$$
ρ(y, T |x, t) being the transition probability for the system to go from a state
x at time t to a state y at the end time T . The constant λ is determined by
the relation σσᵀ = λBR⁻¹Bᵀ, equation (14) in the appendix. The density ρ(y, θ|x, t), t < θ ≤ T, satisfies the forward Fokker-Planck equation,
$$\partial_\theta \rho = -\frac{V}{\lambda}\,\rho \;-\; \sum_{a=1}^{n} \partial_{y_a}\!\left(b_a\,\rho\right) \;+\; \sum_{a=1}^{n} \frac{1}{2}\,\mathrm{Tr}\!\left(\sigma\sigma^\top \partial_{y_a}^{2}\,\rho\right). \qquad (5)$$
The solution to this equation may generally be estimated using path integral methods [4,5]; a minimal sampling sketch is given below. In a few special cases a solution exists in closed form, as the following example shows.
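As an illustration, here is a minimal Monte Carlo sketch of this estimate, assuming V = 0 so that equation (4) reduces to averaging e^{−φ(y_T)/λ} over end states of the uncontrolled dynamics; for V ≠ 0 each path would additionally carry a weight exp(−(1/λ)∫V dθ). The function names and the Euler-Maruyama discretization are our own illustrative choices, not the authors' implementation.

```python
import numpy as np

def estimate_J(x, t, T, b, sigma, phi, lam, n_paths=100_000, n_steps=100):
    """Monte Carlo estimate of J(x, t) = -lam * log E[exp(-phi(y_T) / lam)],
    sampling the uncontrolled dynamics dy = b(y) dt + sigma dw (V = 0)."""
    dt = (T - t) / n_steps
    y = np.tile(np.atleast_1d(x).astype(float), (n_paths, 1))
    for _ in range(n_steps):
        dw = np.sqrt(dt) * np.random.randn(*y.shape)
        y += b(y) * dt + sigma * dw  # Euler-Maruyama step
    return -lam * np.log(np.mean(np.exp(-phi(y) / lam)))
```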
Example 1. Consider a multi-agent system in one dimension in which there is
noise and control in the velocities of the agents, according to the set of equations
$$dx_a(t) = \dot x_a(t)\, dt, \qquad d\dot x_a(t) = u_a(t)\, dt + \sigma\, dw(t).$$
Note that this set of equations can be merged into a single equation of the
form (1) by a concatenation of xa and ẋa into a single vector. We choose the
potential V = 0. Under the task where each agent a has to reach a target with
location μa at the end time T , and arrive with speed μ̇a , the end cost function
φ can be given in terms of a product of delta functions, that is
$$e^{-\phi(x,\dot x)/\lambda} = \prod_{a=1}^{n} \delta(x_a - \mu_a)\,\delta(\dot x_a - \dot\mu_a),$$
and the system decouples into n independent single-agent systems. The dynamics
of each agent a is given by a transition probability ρ_a(y_a, ẏ_a, T | x_a, ẋ_a, t), a Gaussian with covariance matrix

$$c = \frac{\sigma^{2}}{6}\begin{pmatrix} 2(T-t)^{3} & 3(T-t)^{2} \\ 3(T-t)^{2} & 6(T-t) \end{pmatrix}.$$
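To make the example concrete, the following sketch evaluates this transition density. The covariance is the matrix c above; the mean (x_a + (T − t)ẋ_a, ẋ_a) is our assumption, taken as the noise-free drift of the uncontrolled dynamics of this example.

```python
import numpy as np

def transition_density(y, y_dot, x, x_dot, t, T, sigma):
    """Gaussian transition density rho_a(y, y_dot, T | x, x_dot, t) of the
    uncontrolled double integrator: covariance c as above, and mean
    (x + (T - t) * x_dot, x_dot), the noise-free drift (our assumption)."""
    tau = T - t
    c = (sigma**2 / 6.0) * np.array([[2.0 * tau**3, 3.0 * tau**2],
                                     [3.0 * tau**2, 6.0 * tau]])
    d = np.array([y - (x + tau * x_dot), y_dot - x_dot])
    return np.exp(-0.5 * d @ np.linalg.solve(c, d)) / (2.0 * np.pi * np.sqrt(np.linalg.det(c)))
```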
The optimal control follows from equations (3) and (4) and reads
$$u_a(x_a, \dot x_a, t) = \frac{6\left(\mu_a - x_a - (T-t)\,\dot x_a\right) - 2(T-t)\left(\dot\mu_a - \dot x_a\right)}{(T-t)^{2}}. \qquad (7)$$
The first term in the control steers the agent towards the target μ_a in a straight line, but since the agent may then arrive with a speed that differs from the required arrival speed μ̇_a, the second term initially 'exaggerates' the speed of the straight-line approach, so that towards the end there is time to adjust the speed to the end speed μ̇_a.
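A short simulation makes this behaviour visible. The sketch below applies control (7) to the noisy double integrator of this example; the noise level and step count are illustrative assumptions.

```python
import numpy as np

def single_target_control(x, v, mu, mu_dot, t, T):
    """The optimal control (7): steer to position mu, arriving with speed mu_dot at T."""
    tau = T - t
    return (6.0 * (mu - x - tau * v) - 2.0 * tau * (mu_dot - v)) / tau**2

def simulate(x0, v0, mu, mu_dot, sigma=0.5, T=1.0, n_steps=1000):
    """Euler-Maruyama simulation of dx = v dt, dv = u dt + sigma dw under (7)."""
    dt = T / n_steps
    x, v, t = x0, v0, 0.0
    for _ in range(n_steps - 1):  # stop one step before T, where (7) diverges
        u = single_target_control(x, v, mu, mu_dot, t, T)
        x += v * dt
        v += u * dt + sigma * np.sqrt(dt) * np.random.randn()
        t += dt
    return x, v  # close to (mu, mu_dot) for moderate sigma

print(simulate(x0=0.0, v0=0.0, mu=1.0, mu_dot=0.0))
```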
Now consider a task defined by functions Φ_a(y_a, s_a) that are peaked around the location (μ_{s_1}, . . . , μ_{s_n}) of a joint target (s_1, . . . , s_n), that is,

$$e^{-\phi(y)/\lambda} = \sum_{s_1, \ldots, s_n} w(s_1, \ldots, s_n) \prod_{a=1}^{n} \Phi_a(y_a, s_a),$$
where the w(s1 , . . . , sn ) are positive weights. We will refer to these weights as
coupling factors, since they introduce dependencies between the agents. The
optimal control of a single agent is obtained using equations (3) and (4), and is
a weighted combination of single-target controls,
$$u_a = \sum_{s=1}^{m} p_a(s)\, u_a(s), \qquad (8)$$
(the explicit (x, t) dependence has been dropped in the notation). Here ua (s) is
the control for agent a to go to target s,
$$u_a(s) = -R^{-1} B^\top \partial_{x_a} Z_a(s), \qquad (9)$$
with Z_a(s) defined by

$$Z_a(s_a) = \int \mathrm{d}y_a\, \rho_a(y_a, T\,|\,x_a, t)\, \Phi_a(y_a, s_a).$$
In the setting of Example 1, with point targets, the end cost takes the form

$$e^{-\phi(y,\dot y)/\lambda} = \sum_{s_1, \ldots, s_n} w(s_1, \ldots, s_n) \prod_{a=1}^{n} \delta(y_a - \mu_{s_a})\,\delta(\dot y_a).$$
For any agent a, the optimal control under this task is a weighted average of single-target controls (7), obtained by replacing the target μ_a in (7) with the weighted mean

$$\mu_a = \sum_{s=1}^{n} p_a(s)\, \mu_s.$$
The average is taken with respect to the marginal pa of the joint distribution
$$p(s_1, \ldots, s_n) \propto w(s_1, \ldots, s_n) \prod_{a=1}^{n} \rho_a(\mu_{s_a}, 0, T\,|\,x_a, \dot x_a, t),$$
over the joint task of the agents. In the most complex case, each agent will have to take the joint state of the entire system into account to fulfil the task. In less complicated cases, an agent will only consider the states of a few agents in the system; in other words, the coupling factors will have a nontrivial factorized form:
$$w(s_1, \ldots, s_n) = \prod_{A} w_A(s_A),$$
where the A are subsets of agents. In such cases we may represent the couplings,
and thus the joint distribution, by a factor graph; see Figure 1 for an example.
Fig. 1. Example of a factor graph for a multi-agent system of four agents (nodes 1, 2, 3, 4). The couplings are represented by the factors A, with A = {1, 4}, {1, 2}, {2, 4}, {3, 4}, {2, 3}.
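For small systems these marginals can be computed exactly by brute-force enumeration, as in the sketch below; representing w as a Python function and the agent factors as a precomputed table is our own choice of interface. The n_targets ** n_agents iterations make plain why exact inference by enumeration is intractable for large systems.

```python
import itertools
import numpy as np

def exact_marginals(w, rho, n_agents, n_targets):
    """Brute-force marginals p_a(s) of the joint distribution
    p(s_1, ..., s_n) proportional to w(s) * prod_a rho[a][s_a].

    w: function mapping a tuple s to a positive coupling factor.
    rho[a][s]: transition density of agent a to target s, e.g. the
    Gaussian density of Example 1 evaluated at (mu_s, 0)."""
    p = np.zeros((n_agents, n_targets))
    for s in itertools.product(range(n_targets), repeat=n_agents):
        weight = w(s) * np.prod([rho[a][s[a]] for a in range(n_agents)])
        for a in range(n_agents):
            p[a, s[a]] += weight  # accumulate unnormalized marginals
    return p / p.sum(axis=1, keepdims=True)
```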
where q(s) = q_1(s_1) · · · q_n(s_n). Here the H(q_a) are the entropies of the distributions q_a,

$$H(q_a) = -\sum_{s} q_a(s) \log q_a(s).$$
The minimum

$$J_{\mathrm{MF}} = \min_{\{q_a\}} F_{\mathrm{MF}}(\{q_a\})$$

is an upper bound for the optimal expected cost-to-go J; it equals J in case the agents are uncoupled. F_MF has zero gradient in its local minima, that is,
The mean field equations are solved by means of iteration, and the solutions are
the local minima of the mean field free energy. Thus the mean field free energy
minimized over all solutions to the mean field equations equals the minimum
JMF .
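The mean field equations are not reproduced here in closed form, but they are the standard coordinate-wise updates for a fully factorized q. The sketch below iterates them for the special case of pairwise coupling factors w_ab, an assumption we make so that the required expectations stay tractable.

```python
import numpy as np

def mean_field(log_rho, log_w_pair, n_iter=100):
    """Coordinate-wise mean field updates for a distribution
    p(s) proportional to prod_a rho_a(s_a) * prod_{(a,b)} w_ab(s_a, s_b).

    log_rho: (n, m) array with log_rho[a, s] = log rho_a(s).
    log_w_pair: dict mapping an edge (a, b) to an (m, m) array of log w_ab.
    Returns the factorized approximation q as an (n, m) array."""
    n, m = log_rho.shape
    q = np.full((n, m), 1.0 / m)
    for _ in range(n_iter):
        for a in range(n):
            log_qa = log_rho[a].copy()
            for (i, j), log_w in log_w_pair.items():
                if i == a:
                    log_qa += log_w @ q[j]    # E_{q_j}[log w_aj(s_a, s_j)]
                elif j == a:
                    log_qa += log_w.T @ q[i]  # E_{q_i}[log w_ia(s_i, s_a)]
            log_qa -= log_qa.max()            # numerical stability
            q[a] = np.exp(log_qa) / np.exp(log_qa).sum()
    return q
```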
The mean field approximation of the optimal control is found by taking the
gradient of the minimum JMF of the mean field free energy, similar to the exact
case where the optimal control is the gradient of the optimal expected cost-to-go,
equation (3):
$$u_a(x, t) = -R_a^{-1} B_a^\top \partial_{x_a} J_{\mathrm{MF}}(x, t) = \sum_{s_a} q_a(s_a)\, u_a(x_a, t; s_a).$$
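In code, this weighted combination can be assembled from the single-target controls (7); the sketch below reuses single_target_control from the earlier example and assumes, as in the task above, zero arrival speeds.

```python
def mf_control(a, x, v, t, T, q, mu):
    """Mean field control of agent a: the single-target controls (7) to the
    targets mu[s], weighted by the mean field marginal q[a]."""
    return sum(q[a][s] * single_target_control(x, v, mu[s], 0.0, t, T)
               for s in range(len(mu)))
```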
repeated for the remaining agents and targets, until no agents and targets remain. We will refer to this method as the sort distances (SD) method; a sketch is given below.
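The description above leaves some details open; the following greedy sketch is one plausible reading of the SD method, repeatedly matching the globally closest (agent, target) pair and removing both, so that no two agents head to the same target.

```python
def sort_distances(positions, targets):
    """Greedy assignment in the spirit of the SD method (our reading):
    repeatedly match the globally closest (agent, target) pair and
    remove both, until no agents and targets remain."""
    agents = list(range(len(positions)))
    free = list(range(len(targets)))
    assignment = {}
    while agents:
        a, s = min(((a, s) for a in agents for s in free),
                   key=lambda pair: abs(positions[pair[0]] - targets[pair[1]]))
        assignment[a] = s
        agents.remove(a)
        free.remove(s)
    return assignment
```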
For several sizes of the system we computed, under both control methods, the control cost and the CPU time required to calculate the controls.
Figures 2(a) and (b) show the control cost and the required CPU time as a
function of the system size n; each value is an average obtained from 100 sim-
ulations. To emphasize the necessity of the approximate inference methods, in
figure 2(b) we included the required CPU time under exact inference; this quantity increases exponentially with n, as expected, making exact
inference intractable in large MASs. In contrast, both under the SD method and
the MF method the required CPU time appears to increase polynomially with n,
the SD method requiring less computation time than the MF method. Though
the SD method is faster than the MF method, it also is more costly: the control
cost under the SD method is significantly higher than under the MF method.
The MF method thus better approximates the optimal control.
Fig. 2. The control cost (a) and the required CPU time in seconds (b), as functions of the number of agents n, under the exact method (· − ·), the MF method (−−), and the SD method (—).
Figure 3 shows the positions and the velocities of the agents over time, both
under the control obtained using the MF approximation and under the control
obtained with the SD method. We observe that under MF control the agents determine their targets early, between t = 0 and t = 0.5, and the agents' velocities gradually increase from zero to a maximum value at t = 0.5, then gradually decrease to zero again, as required. This is not very surprising, since the
MF approximation is known to show an early symmetry breaking. In contrast,
under the SD method the decision making process of the agents choosing their targets takes place over almost the entire time interval, and the velocities of the agents are subject to frequent changes; in particular, as time increases, the agents that have not yet chosen a target seem to exchange targets frequently. This may be understood by realising that under the SD method an agent always performs a control towards its nearest target only, instead of a weighted combination of controls towards different targets, as is the case under MF control.
Fig. 3. A multi-agent system of 15 agents. The positions (a) and the velocities (b) over time under MF control, and the positions (c) and the velocities (d) over time under SD control.
Furthermore, compared with the velocities under the MF method, the velocities under the SD method reach higher maximum values. This may account for the relatively high control costs under SD control.
4 Discussion
We have studied the control of agents that have to distribute themselves over a number of targets, such that each target is reached by precisely one agent. In the sort
distances method each agent performs a control to a single nearby target, in
such a way that no two agents head to the same target at the same time. This
method has the advantage of being fast, but it results in relatively high control
costs. Because each agent performs a control to a single target, agents switch
targets frequently during the control process. In the mean field approximation
each agent performs a control which is a weighted sum of controls to single tar-
gets. This requires more computation time than the sort distances method, but
involves significantly lower control costs and therefore is a better approximation
to the optimal control.
An obvious choice for a graphical model inference method not considered in
the present paper would be belief propagation. Results of numeric simulations
with this method in the context of multi-agent control, and comparisons with the
mean field approximation and the exact junction tree algorithm will be published
elsewhere.
There are many possible model extensions worth exploring in future re-
search. Examples are non-zero potentials V in case of a non-empty environment,
penalties for collisions in the context of robotics, non-fixed end times, or bounded
state spaces in the context of a production process. Typically, such model ex-
tensions will not allow for a solution in closed form, and approximate numerical
methods will be required. Some suggestions are given by Kappen [4,5]. In the
setting that we considered, the model describing the behaviour of the agents was given. It would be worthwhile, however, to consider cases of stochastic optimal control of multi-agent systems in continuous space and time where the model first needs to be learned.
Acknowledgments
We thank Joris Mooij for making available useful software and the reviewers for
their useful remarks. This research is part of the Interactive Collaborative Infor-
mation Systems (ICIS) project, supported by the Dutch Ministry of Economic
Affairs, grant BSIK03024.
References
1. Guestrin, C., Koller, D., Parr, R.: Multiagent planning with factored MDPs. In:
Proceedings of NIPS, vol. 14, pp. 1523–1530 (2002)
2. Guestrin, C., Venkataraman, S., Koller, D.: Context-specific multiagent coordination
and planning with factored MDPs. In: Proceedings of AAAI, vol. 18, pp. 253–259
(2002)
3. Wiegerinck, W., van den Broek, B., Kappen, B.: Stochastic optimal control in con-
tinuous space-time multi-agent systems. In: UAI 2006 (2006)
4. Kappen, H.J.: Path integrals and symmetry breaking for optimal control theory. Journal of Statistical Mechanics: Theory and Experiment, P11011 (2005)
5. Kappen, H.J.: Linear theory for control of nonlinear stochastic systems. Physical Review Letters 95(20), 200201 (2005)
then

$$\frac{1}{2}\, u_a^\top R\, u_a + (B u_a)^\top \partial_{x_a} J = -\frac{1}{2}\lambda^{2} Z^{-2}\, (\partial_{x_a} Z)^\top B R^{-1} B^\top \partial_{x_a} Z,$$

$$\frac{1}{2}\,\mathrm{Tr}\!\left(\sigma\sigma^\top \partial_{x_a}^{2} J\right) = \frac{1}{2}\lambda Z^{-2}\, (\partial_{x_a} Z)^\top \sigma\sigma^\top \partial_{x_a} Z - \frac{1}{2}\lambda Z^{-1}\, \mathrm{Tr}\!\left(\sigma\sigma^\top \partial_{x_a}^{2} Z\right).$$
The terms quadratic in Z vanish when σσᵀ and R are related via

$$\sigma\sigma^\top = \lambda B R^{-1} B^\top. \qquad (14)$$

In the one-dimensional case a constant λ can always be found such that equation (14) is satisfied; for scalar σ, B and R it reads λ = σ²R/B². In the higher-dimensional case the equation puts restrictions on the matrices σ and R, because in general σσᵀ and BR⁻¹Bᵀ will not be proportional.
When equation (14) is satisfied, the HJB equation becomes

$$\partial_t Z = \left(\frac{V}{\lambda} - \sum_{a=1}^{n} b_a\, \partial_{x_a} - \sum_{a=1}^{n} \frac{1}{2}\,\mathrm{Tr}\!\left(\sigma\sigma^\top \partial_{x_a}^{2}\right)\right) Z = -HZ, \qquad (15)$$
the density ρ(y, θ|x, t) (t < θ ≤ T) satisfying the forward Fokker-Planck equa-
tion (5). Combining the equations (13) and (16) yields the expression (4) for the
optimal expected cost-to-go.