Decentralized stochastic control
The person-by-person and the common information approaches
Aditya Mahajan
McGill University
Banff Workshop on Optimal Cooperation, Communication,
and Learning in Decentralized Systems, 14 Oct 2014
Decentralized stochastic control (Aditya Mahajan)
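Simplest general model of a decentralized control system
[Figure: block diagram of a system with state $X_t$ and $n$ agents. Agent $i$ receives observation $Y^i_t$, holds information $I^i_t$, and applies control law $g^i_t$ to generate $U^i_t$. A designer is shown alongside the agents.]
Dynamics: $X_{t+1} = f(X_t, \mathbf{U}_t, W_t)$, where $\mathbf{U}_t = (U^1_t, \dots, U^n_t)$.
Observation: $Y^i_t = h^i(X_t, W^i_t)$.
Information structure: $\{Y^i_{1:t}, U^i_{1:t-1}\} \subseteq I^i_t \subseteq \{\mathbf{Y}_{1:t}, \mathbf{U}_{1:t-1}\}$, $U^i_t = g^i_t(I^i_t)$.
Control strategy: $\mathbf{g} = (\mathbf{g}^1, \dots, \mathbf{g}^n)$, where $\mathbf{g}^i = (g^i_1, g^i_2, \dots)$.
Per-step reward: $R_t = \rho(X_t, \mathbf{U}_t)$.
Performance: $J(\mathbf{g}) = \mathbb{E}^{\mathbf{g}}\big[\sum_{t=1}^{\infty} \beta^t R_t\big]$, where $\beta$ is the discount factor.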
Literature overview
Economics literature
Radner, "Team decision problems," Ann Math Stat, 1962.
Marschak and Radner, Economic Theory of Teams, 1972.
...
Systems & control literature
Witsenhausen, Separation of estimation and control, Proc IEEE, 1971.
Witsenhausen, On information structures, feedback and causality, SICON 1971.
Ho and Chu, Team decision theory and information structures, IEEE TAC 1972.
...
AI literature
...
Main difficulty: seeking global optimality.
Simpler than non-cooperative game theory: all pre-game agreements are enforceable.
Simpler than cooperative game theory: the value of the game does not need to be split between the players.
Conceptual difficulties
The optimal control problem is a functional optimization problem where we have to
choose an infinite sequence of control laws to maximize the expected total reward.
The domain $I^i_t$ of control law $g^i_t$ increases with time.
Can the optimization problem be solved?
Can we implement the optimal solution?
Agent-based methods lead to infinite regress.
Signaling (or the communication aspect of control).
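Centralized stochastic control: Information state
[Figure: the information available to the controller at consecutive times, $I_t$ and $I_{t+1}$.]
A process $\{Z_t\}_{t=1}^{\infty}$ is called an information state if it satisfies the following.
Function of available information: there exists a series of functions $\{F_t\}_{t=1}^{\infty}$ such that $Z_t = F_t(I_t)$.
Absorbs the effect of available information on current rewards:
$\mathbb{P}(R_t \mid I_t = i_t, U_t = u_t) = \mathbb{P}(R_t \mid Z_t = F_t(i_t), U_t = u_t)$.
Controlled Markov property:
$\mathbb{P}(Z_{t+1} \mid I_t = i_t, U_t = u_t) = \mathbb{P}(Z_{t+1} \mid Z_t = F_t(i_t), U_t = u_t)$.
Examples: the system state in MDPs; the belief state in POMDPs.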
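To make the POMDP example concrete, here is a minimal sketch of one step of the Bayes update that carries the belief state forward. It illustrates the "function of available information" and "controlled Markov" properties: the next belief depends on the history only through the current belief, the action, and the new observation. The function name `belief_update`, the arrays `P` and `O`, and all numbers are assumptions made for illustration and do not come from the talk.

```python
def belief_update(belief, action, observation, P, O):
    """One Bayes-filter step for a finite POMDP.

    belief[x]        : current probability of state x
    P[a][x][x2]      : probability of moving from x to x2 under action a
    O[a][x2][y]      : probability of observing y in the new state x2
    """
    n = len(belief)
    # Prediction step: distribution of the next state before seeing y.
    predicted = [sum(belief[x] * P[action][x][x2] for x in range(n))
                 for x2 in range(n)]
    # Correction step: weight each next state by the likelihood of y.
    unnormalized = [predicted[x2] * O[action][x2][observation]
                    for x2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [v / total for v in unnormalized]


# Hypothetical model: two states, one action, two observations.
P = [[[0.9, 0.1],
      [0.2, 0.8]]]
O = [[[0.8, 0.2],
      [0.3, 0.7]]]
b0 = [0.5, 0.5]
b1 = belief_update(b0, action=0, observation=1, P=P, O=O)
```

Because `b1` is computed from `b0` alone, a controller that stores only the belief never needs to revisit the full observation history.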
Centralized control: Structure of optimal strategies
The information state absorbs the effect of available information on expected future cost,
i.e., for any choice of future strategy $(g_{t+1}, g_{t+2}, \dots)$,
$\mathbb{E}\big[\sum_{s=t}^{\infty} \beta^s R_s \,\big|\, I_t = i_t, U_t = u_t\big] = \mathbb{E}\big[\sum_{s=t}^{\infty} \beta^s R_s \,\big|\, Z_t = F_t(i_t), U_t = u_t\big]$.
Therefore, $Z_t$ is a sufficient statistic for performance evaluation, and
there is no loss of optimality in using control laws of the form $g_t \colon Z_t \mapsto U_t$.
Examples:
In MDPs, $g_t \colon X_t \mapsto U_t$.
In POMDPs, $g_t \colon B_t \mapsto U_t$, where $B_t$ is the belief state.
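Centralized control: Dynamic programming
For any strategy of the form $g_t \colon Z_t \mapsto U_t$,
$\mathbb{E}\big[\beta^t R_t + \mathbb{E}\big[\sum_{s=t+1}^{\infty} \beta^s R_s \,\big|\, Z_{t+1}, U_{t+1} = g_{t+1}(Z_{t+1})\big] \,\big|\, Z_t = z_t, U_t = u_t\big] = \mathbb{E}\big[\sum_{s=t}^{\infty} \beta^s R_s \,\big|\, Z_t = z_t, U_t = u_t\big]$.
This recursion relies on the controlled Markov property of the information state.
There exists a time-homogeneous optimal strategy $\mathbf{g} = (g, g, \dots)$ that is given by
the fixed point of the following dynamic program:
$V(z) = \min_{u} \mathbb{E}[R_t + \beta V(Z_{t+1}) \mid Z_t = z, U_t = u]$.
Both these results rely on an appropriate choice of information state.
Note that an information state for dynamic programming is also a sufficient statistic for control.
In the decentralized setting:
Can we identify a sufficient statistic $Z^i_t$ and restrict attention to $g^i_t \colon Z^i_t \mapsto U^i_t$?
Can we show that there exist time-homogeneous optimal control strategies?
Can we identify appropriate information states to determine a dynamic program that computes such
optimal strategies?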
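The fixed point of the centralized dynamic program can be approximated by value iteration. The following is a minimal sketch for a finite information-state space, treating $R_t$ as a per-step cost to match the min in the slides' dynamic program. The function names, the dictionary layout of `cost` and `trans`, and the two-state toy problem are assumptions for illustration only.

```python
def q_value(V, z, u, cost, trans, beta):
    """Cost of taking action u in state z and then following the values V."""
    return cost[z][u] + beta * sum(p * V[z2] for z2, p in trans[z][u].items())


def value_iteration(states, actions, cost, trans, beta, tol=1e-10, max_iter=10_000):
    """Iterate V(z) <- min_u E[R + beta * V(Z+) | Z = z, U = u] until convergence."""
    V = {z: 0.0 for z in states}
    for _ in range(max_iter):
        V_new = {z: min(q_value(V, z, u, cost, trans, beta) for u in actions)
                 for z in states}
        gap = max(abs(V_new[z] - V[z]) for z in states)
        V = V_new
        if gap < tol:
            break
    # The minimizing action in each state gives a time-homogeneous control law.
    policy = {z: min(actions, key=lambda u: q_value(V, z, u, cost, trans, beta))
              for z in states}
    return V, policy


# Hypothetical toy problem: state 0 is cheap, state 1 is expensive.
states = [0, 1]
actions = ["stay", "move"]
cost = {0: {"stay": 0.0, "move": 1.0},
        1: {"stay": 2.0, "move": 1.0}}
trans = {0: {"stay": {0: 1.0}, "move": {1: 1.0}},
         1: {"stay": {1: 1.0}, "move": {0: 1.0}}}
V, policy = value_iteration(states, actions, cost, trans, beta=0.9)
```

The returned `policy` depends only on the current state, which is the sense in which the optimal strategy is time-homogeneous.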
Two approaches to dynamic programming:
The person-by-person approach
Pick an agent, say $i$.
Arbitrarily fix the strategies $\mathbf{g}^{-i}$ of all other agents.
Identify an information-state process $\{Z^i_t\}_{t=1}^{\infty}$ for agent $i$.
Structure of optimal strategies: if $\mathcal{Z}^i_t$, the space of realizations of $Z^i_t$, does not depend on $\mathbf{g}^{-i}$, then
there is no loss of optimality in using $g^i_t \colon Z^i_t \mapsto U^i_t$.
Write coupled dynamic programs to identify the best-response strategy
$\mathbf{g}^i = \mathrm{BR}^i(\mathbf{g}^{-i})$.
Remarks:
Is the best-response strategy time-homogeneous?
Does there exist a fixed point of the coupled dynamic program?
Is the fixed point unique?
The person-by-person approach:
May identify the structure of globally optimal control strategies.
Provides coupled dynamic programs, which, at best, may determine person-by-person optimal control
strategies. Such strategies can be arbitrarily bad compared to globally optimal strategies.
Radner, "Team decision problems," Ann Math Stat, 1962.
Marschak and Radner, Economic Theory of Teams, 1972.
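The gap between person-by-person optimality and global optimality already shows up in a static two-agent team. The sketch below uses a hypothetical 2x2 cost table (not taken from the talk): iterating best responses from the starting point (0, 0) never leaves it, even though the pair (1, 1) has strictly lower cost. The name `best_response_iteration` and the cost values are assumptions for illustration.

```python
from itertools import product

# Hypothetical team cost: COST[u1][u2] is the common cost paid by both agents.
COST = [[1.0, 10.0],
        [10.0, 0.0]]


def best_response_iteration(cost, start, max_rounds=100):
    """Alternate best responses until neither agent wants to deviate."""
    u1, u2 = start
    for _ in range(max_rounds):
        # Agent 1 responds to agent 2's current action, then agent 2 responds.
        new_u1 = min(range(len(cost)), key=lambda a: cost[a][u2])
        new_u2 = min(range(len(cost[0])), key=lambda b: cost[new_u1][b])
        if (new_u1, new_u2) == (u1, u2):
            break
        u1, u2 = new_u1, new_u2
    return u1, u2


pbp_optimal = best_response_iteration(COST, start=(0, 0))
globally_optimal = min(product(range(2), range(2)),
                       key=lambda u: COST[u[0]][u[1]])
```

Raising the cost at (0, 0), while keeping it below the off-diagonal entries, makes the person-by-person solution as bad as desired relative to the global optimum.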
An example: coupled subsystems with control sharing
Dynamics: $X^i_{t+1} = f^i(X^i_t, \mathbf{U}_t, W^i_t)$, where $\mathbf{U}_t = (U^1_t, \dots, U^n_t)$.
Information structure: $I^i_t = \{X^i_{1:t}, \mathbf{U}_{1:t-1}\}$.
Conditional independence: for any arbitrary choice of control strategies $\mathbf{g}$,
$\mathbb{P}(\mathbf{X}_{1:t} \mid \mathbf{U}_{1:t-1} = \mathbf{u}_{1:t-1}) = \prod_{i=1}^{n} \mathbb{P}(X^i_{1:t} \mid \mathbf{U}_{1:t-1} = \mathbf{u}_{1:t-1})$.
Structure of optimal strategies: arbitrarily fix strategies $\mathbf{g}^{-i}$, and consider the best-response strategy
at agent $i$. Then $\{X^i_t, \mathbf{U}_{1:t-1}\}$ is an information state at agent $i$.
Mahajan, Optimal decentralized control of coupled subsystems with control sharing, IEEE TAC 2013.
Two approaches to dynamic programming:
The common-information approach
One dynamic program to rule them all
$V(z) = \min_{\boldsymbol{\gamma}} \mathbb{E}[R_t + \beta V(Z_{t+1}) \mid Z_t = z, \boldsymbol{\Gamma}_t = \boldsymbol{\gamma}]$
The information state must be a function of the information available to every controller.
Common information: $C_t = \bigcap_{i=1}^{n} I^i_t$. Local information: $L^i_t = I^i_t \setminus C_t$.
Each step of the dynamic program must determine a mapping $(C_t, L^i_t) \mapsto U^i_t$.
The information state $Z_t$ depends only on $C_t$.
Thus, the action at each step must be a mapping $L^i_t \mapsto U^i_t$. Call it a prescription and
denote it by $\gamma^i_t$.
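The split into common and local information, and the idea that the action of the dynamic program is a whole mapping rather than a single control, can be sketched with plain sets and dictionaries. The labels such as "Y1_1" are hypothetical tags for observations and controls, chosen to mimic two agents at time 2 that share data with a one-step delay; `split_information` and `enumerate_prescriptions` are illustrative names, not notation from the talk.

```python
from itertools import product


def split_information(info_sets):
    """Return (C, [L^1, ..., L^n]) with C the intersection of all I^i and L^i = I^i minus C."""
    common = set.intersection(*info_sets)
    local = [info - common for info in info_sets]
    return common, local


def enumerate_prescriptions(local_values, actions):
    """List every mapping from realizations of local information to actions."""
    local_values = list(local_values)
    return [dict(zip(local_values, choice))
            for choice in product(actions, repeat=len(local_values))]


# Hypothetical information sets of two agents.
I1 = {"Y1_1", "Y1_2", "U1_1", "Y2_1", "U2_1"}
I2 = {"Y2_1", "Y2_2", "U2_1", "Y1_1", "U1_1"}
common, local = split_information([I1, I2])

# With binary local information and two actions there are 2**2 prescriptions.
prescriptions = enumerate_prescriptions(local_values=[0, 1], actions=["a", "b"])
```

The minimization in the dynamic program ranges over tuples of such prescriptions, one per agent, which is why the action space grows quickly with the size of the local information.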
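A virtual coordinator
[Figure: system with state $X_t$, local information $L^1_t, \dots, L^n_t$ at the agents, and common information $C_t$.]
Partial history sharing: $|\mathcal{L}^i_t|$ is uniformly bounded (over $i$ and $t$) and
$\mathbb{P}(L^i_{t+1} \mid C_t, L^i_t, U^i_t, Y^i_{t+1}) = \mathbb{P}(L^i_{t+1} \mid L^i_t, U^i_t, Y^i_{t+1})$.
Information state: $\mathbb{P}(X_t, \mathbf{L}_t \mid C_t = c)$ (or something else).
Centralized POMDP: standard POMDP results apply; the value function is PWLC.
Subsumes many previous results on DP for decentralized stochastic control.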
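Example 1: Delayed sharing information structure
Dynamics: $X_{t+1} = f(X_t, \mathbf{U}_t, W_t)$, where $\mathbf{U}_t = (U^1_t, \dots, U^n_t)$.
Observations: $Y^i_t = h^i(X_t, W^i_t)$.
Information structure: $I^i_t = \{Y^i_{1:t}, U^i_{1:t-1}, \mathbf{Y}_{1:t-k}, \mathbf{U}_{1:t-k}\}$, where $k$ is the sharing delay.
Common info.: $C_t = \{\mathbf{Y}_{1:t-k}, \mathbf{U}_{1:t-k}\}$. Local info.: $L^i_t = I^i_t \setminus C_t$. Prescriptions: $\gamma^i_t \colon L^i_t \mapsto U^i_t$.
Information state: $\Pi_t = \mathbb{P}(X_t, \mathbf{L}_t \mid C_t)$.
Results: no loss of optimality in using control strategies $g^i_t \colon (L^i_t, \Pi_t) \mapsto U^i_t$.
Dynamic program: $V(\pi) = \min_{\boldsymbol{\gamma}} \mathbb{E}[R_t + \beta V(\Pi_{t+1}) \mid \Pi_t = \pi, \boldsymbol{\Gamma}_t = \boldsymbol{\gamma}]$.
Witsenhausen, Separation of estimation and control, Proc IEEE, 1971.
Nayyar, Mahajan and Teneketzis, Optimal control strategies in delayed sharing information structures, IEEE TAC 2011.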
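Example 2: Control sharing information structure
Dynamics: $X^i_{t+1} = f^i(X^i_t, \mathbf{U}_t, W^i_t)$, where $\mathbf{U}_t = (U^1_t, \dots, U^n_t)$.
Information structure: original: $I^i_t = \{X^i_{1:t}, \mathbf{U}_{1:t-1}\}$. Using the p-by-p approach: $I^i_t = \{X^i_t, \mathbf{U}_{1:t-1}\}$.
Common info.: $C_t = \mathbf{U}_{1:t-1}$. Local info.: $L^i_t = X^i_t$. Prescriptions: $\gamma^i_t \colon X^i_t \mapsto U^i_t$.
Information state: define $\Pi^i_t(x) = \mathbb{P}(X^i_t = x \mid \mathbf{U}_{1:t-1})$.
Then $\Pi_t = (\Pi^1_t, \dots, \Pi^n_t)$ is an information state.
Results: no loss of optimality in using control strategies $g^i_t \colon (X^i_t, \Pi_t) \mapsto U^i_t$.
Dynamic program: $V(\pi) = \min_{\boldsymbol{\gamma}} \mathbb{E}[R_t + \beta V(\Pi_{t+1}) \mid \Pi_t = \pi, \boldsymbol{\Gamma}_t = \boldsymbol{\gamma}]$.
Mahajan, Optimal decentralized control of coupled subsystems with control sharing, IEEE TAC 2013.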
Example 3: Mean-field sharing information structure
Dynamics: $X^i_{t+1} = f(X^i_t, U^i_t, M_t, W^i_t)$, where $M_t = \sum_{i=1}^{n} \delta_{X^i_t}$.
Information structure: $I^i_t = \{X^i_t, M_{1:t}\}$, and assume identical control laws.
Common info.: $C_t = M_{1:t}$. Local info.: $L^i_t = X^i_t$. Prescriptions: $\gamma_t \colon X^i_t \mapsto U^i_t$.
Information state: due to the symmetry of the system, $M_t$ is an information state.
Results: no loss of optimality in using control strategies $g^i_t(X^i_t, M_t)$.
Dynamic program: $V(m) = \min_{\gamma} \mathbb{E}[R_t + \beta V(M_{t+1}) \mid M_t = m, \Gamma_t = \gamma]$.
Size of state space = poly(n); size of action space = number of prescriptions $\gamma_t$.
Arabneydi and Mahajan, Team optimal control of coupled subsystems with mean-field sharing, CDC 2014.
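The poly(n) claim can be checked by counting. Assuming each agent's state takes values in a finite set (an assumption of this sketch), a mean-field is determined by how many of the n agents sit in each state, so the number of mean-fields is a binomial coefficient that is polynomial in n for a fixed number of states. A prescription is a map from local state to action, so there are (number of actions) to the power (number of states) of them. The function names below are illustrative.

```python
from math import comb


def num_mean_fields(n, num_states):
    """Number of ways to distribute n exchangeable agents over num_states states."""
    return comb(n + num_states - 1, num_states - 1)


def num_joint_states(n, num_states):
    """Size of the joint state space if every agent is tracked individually."""
    return num_states ** n


def num_prescriptions(num_states, num_actions):
    """Number of maps from a local state to an action."""
    return num_actions ** num_states


# For 10 agents with binary states, the mean-field takes far fewer values
# than the joint state.
mean_field_count = num_mean_fields(10, 2)
joint_count = num_joint_states(10, 2)
```

Normalizing the mean-field by n does not change these counts, since it only rescales each value.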
What if the shared information is empty?
The designer's approach
An example: Finite memory controller
Dynamics: $X_{t+1} = f(X_t, U_t, W_t)$, $Y_t = h(X_t, N_t)$.
Information structure: $I_t = \{Y_t, M_t\}$, $[U_t, M_{t+1}] = g_t(Y_t, M_t)$. Simplest non-classical information structure.
Common info.: $C_t = \emptyset$. Local info.: $L_t = (Y_t, M_t)$. Prescriptions: $g_t \colon (Y_t, M_t) \mapsto U_t$.
Information state: $\Pi_t = \mathbb{P}(X_t, M_t \mid g_{1:t-1})$.
Results: dynamic program: $V(\pi) = \min_{g} \mathbb{E}[R_t + \beta V(\Pi_{t+1}) \mid \Pi_t = \pi, g_t = g]$.
Cannot show that time-homogeneous strategies are optimal!
Witsenhausen, A standard form for sequential stochastic control, Math. Sys. Theory, 1973.
Some applications
Real-time communication with feedback
Source Encoder Channel Decoder
Variations
Source coding, channel coding, or joint source-channel coding setup;
Feedback from channel output to encoder;
No feedback or noisy feedback (but either encoder or decoder has finite memory);
Generalization
Multi-terminal real-time communication
Source coding, channel coding, joint source-channel coding
Networked control systems
Plant Sensor Channel Controller
Variations
Feedback from channel output to sensor;
No feedback from channel output to sensor (but either the sensor or the controller has
finite memory);
Connections to posterior matching
Other examples
Paging and registration in cellular networks
Hajek, Mitzel, Yang, IEEE TIT 2008
Multi-access broadcast
Hluchyj, Gallager, NTC 1983; Ooi, Wornell, CDC 1996; Mahajan, Allerton 2011
Decentralized balancing of queues
Ouyang, Teneketzis, arxiv 2014.
Remote Estimation
Lipsa, Martins, IEEE TAC 2011; Nayyar, Başar, Teneketzis, Veeravalli, IEEE TAC 2013.
Decentralized sequential hypothesis testing
Nayyar, Teneketzis, IEEE TIT, 2011. Related to social learning.
Further Reading
Existence results for arbitrary spaces
Gupta, Yüksel, Başar, Langbort, On the Existence of Optimal Policies for a Class of
Static and Sequential Dynamic Teams, arXiv preprint 2014.
Application to Linear Quadratic Gaussian (LQG) system
Mahajan, Nayyar, Sufficient statistics for linear control strategies in decentralized
systems with partial history sharing, IEEE TAC 2015 (in print)
Nayyar, Lessard, Optimal Control for LQG Systems on Graphs Part I: Structural
Results, arXiv preprint, 2014.
Generalization to Games
Nayyar, Gupta, Langbort, Başar, Common Information Based Markov Perfect Equilibria
for Stochastic Games With Asymmetric Information: Finite Games, IEEE TAC 2014.
Nayyar, Gupta, Langbort, Başar, Common Information based Markov Perfect Equilibria
for Linear-Gaussian Games with Asymmetric Information, arXiv preprint 2014.
Final Thoughts
Simple solution to a complex class of problems
Is common information (or PHS) a realistic assumption?
Arises naturally in certain applications.
Use (a faster time-scale) consensus dynamics to generate common information (e.g.,
in mean-field sharing)
Provide upper and lower bounds
Are there good numerical algorithms?
Are there POMDP algorithms for large action spaces?
Is there some structure in the common-information DP that can be exploited?
Interesting variations
Approximation techniques
Reinforcement learning
Other information structures (sparse structures)?
References
Nayyar, Sequential Decision-Making in Decentralized systems, PhD Thesis, Univ of
Michigan, 2011.
Mahajan, Nayyar, and Teneketzis, Identifying tractable decentralized problems on the
basis of information structures, Allerton 2008.
Nayyar, Mahajan and Teneketzis, Optimal control strategies in delayed sharing information
structures, IEEE TAC 2011.
Nayyar, Mahajan and Teneketzis, Decentralized stochastic control with partial history
sharing: A common information approach, IEEE TAC 2013.
Mahajan, Optimal decentralized control of coupled subsystems with control sharing,
IEEE TAC 2013.
Arabneydi and Mahajan, Team optimal control of coupled subsystems with mean-field
sharing, CDC 2014.
Mahajan and Mannan, Decentralized Stochastic Control, Annals of OR, (in print).