Decentralized Stochastic Control: Aditya Mahajan

The document discusses decentralized stochastic control. It presents the simplest general model which involves multiple agents taking actions based on local information to optimize a global reward function. Two main approaches are discussed: the person-by-person approach and the common information approach. The conceptual difficulties in solving the decentralized control problem are outlined, such as choosing control laws to optimize expected long-term reward across agents with increasing information domains over time.


Decentralized stochastic control

The person-by-person and the common information approaches

Aditya Mahajan
McGill University

Banff Workshop on Optimal Cooperation, Communication, and Learning in Decentralized Systems, 14 Oct 2014

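Simplest general model of a decentralized control system

[Figure: a plant with state $X_t$ and $n$ agents. Agent $i$ observes $Y^i_t$, holds information $I^i_t$, and chooses action $U^i_t$ through control law $g^i_t$. A designer selects the control laws $g^1, \dots, g^n$.]

Dynamics: $X_{t+1} = f(X_t, \mathbf{U}_t, W_t)$, where $\mathbf{U}_t = (U^1_t, \dots, U^n_t)$.

Observation: $Y^i_t = h^i(X_t, W^i_t)$.

Information structure: $\{Y^i_{1:t}, U^i_{1:t-1}\} \subseteq I^i_t \subseteq \{\mathbf{Y}_{1:t}, \mathbf{U}_{1:t-1}\}$, and $U^i_t = g^i_t(I^i_t)$.

Control strategy: $\mathbf{g} = (\mathbf{g}^1, \dots, \mathbf{g}^n)$, where $\mathbf{g}^i = (g^i_1, g^i_2, \dots)$.

Per-step reward: $R_t = \rho(X_t, \mathbf{U}_t)$.

Performance: $J(\mathbf{g}) = \mathbb{E}^{\mathbf{g}}\Big[\sum_{t=1}^{\infty} \beta^{t} R_t\Big]$, where $\beta \in (0,1)$ is the discount factor.
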
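To make the pieces of the model concrete, the following is a minimal simulation sketch, not the author's code. The toy dynamics `f`, observation map `h`, reward `rho`, control law `g_i`, the noise probabilities, the discount value, and the truncation of the infinite sum at a finite horizon are all illustrative assumptions. Each agent's information here is only its own observation history.

```python
import random

BETA = 0.9  # assumed discount factor


def f(x, u, w):
    # Toy dynamics (assumption): binary state driven by the joint action and noise.
    return (x + sum(u) + w) % 2


def h(x, w_i):
    # Toy observation (assumption): the state seen through a noisy binary channel.
    return x ^ w_i


def rho(x, u):
    # Toy per-step reward (assumption): 1 when every agent's action matches the state.
    return 1.0 if all(u_i == x for u_i in u) else 0.0


def g_i(info):
    # Toy control law (assumption): repeat the most recent local observation.
    return info[-1]


def estimate_J(n_agents=2, horizon=50, runs=2000, seed=0):
    """Monte Carlo estimate of J(g), truncated at a finite horizon."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        x = rng.randint(0, 1)
        info = [[] for _ in range(n_agents)]  # I^i_t: own observation history
        for t in range(1, horizon + 1):
            for i in range(n_agents):
                info[i].append(h(x, int(rng.random() < 0.1)))
            u = tuple(g_i(info[i]) for i in range(n_agents))
            total += BETA ** t * rho(x, u)
            x = f(x, u, int(rng.random() < 0.2))
    return total / runs
```

Calling `estimate_J()` returns an estimate for this one fixed strategy; comparing strategies means repeating the evaluation for each candidate, which is what makes the design problem a functional optimization.
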
Literature overview

Economics literature
  Radner, Team decision problems, Ann Math Stat, 1962.
  Marschak and Radner, Economic Theory of Teams, 1972.
  ...

Systems & control literature
  Witsenhausen, Separation of estimation and control, Proc IEEE, 1971.
  Witsenhausen, On information structures, feedback and causality, SICON 1971.
  Ho and Chu, Team decision theory and information structures, IEEE TAC 1972.
  ...

AI literature
  ...

Simpler than non-cooperative game theory: all pre-game agreements are enforceable.

Simpler than cooperative game theory: the value of the game does not need to be split between the players.

Main difficulty: seeking global optimality.

Conceptual difficulties

The optimal control problem is a functional optimization problem where we have to choose an infinite sequence of control laws to maximize the expected total reward.

The domain $I^i_t$ of control law $g^i_t$ increases with time.
  Can the optimization problem be solved?
  Can we implement the optimal solution?

Agent-based methods lead to infinite regress.

Signaling (or the communication aspect of control).

Centralized stochastic control: Information state

$I_t \subseteq I_{t+1}$ (the controller's information grows with time).

A process $\{Z_t\}_{t=1}^{\infty}$ is called an information state if it has the following properties.

Function of available information: there exists a series of functions $\{F_t\}_{t=1}^{\infty}$ such that $Z_t = F_t(I_t)$.

Absorbs the effect of available information on current rewards:
$$\mathbb{P}(R_t \mid I_t = i_t, U_t = u_t) = \mathbb{P}(R_t \mid Z_t = F_t(i_t), U_t = u_t).$$

Controlled Markov property:
$$\mathbb{P}(Z_{t+1} \mid I_t = i_t, U_t = u_t) = \mathbb{P}(Z_{t+1} \mid Z_t = F_t(i_t), U_t = u_t).$$

Examples: the system state in MDPs; the belief state in POMDPs.

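The POMDP belief state is the standard example of an information state that can be computed recursively. The sketch below shows one Bayes-filter step; the table layouts `P[u][x][x']` and `O[x'][y]` and the numbers in the example are assumptions chosen for illustration.

```python
def belief_update(b, u, y, P, O):
    """One Bayes-filter step: b'(x') is proportional to O[x'][y] * sum_x P[u][x][x'] * b[x]."""
    n = len(b)
    unnorm = [
        O[x2][y] * sum(P[u][x][x2] * b[x] for x in range(n))
        for x2 in range(n)
    ]
    z = sum(unnorm)
    if z == 0:
        raise ValueError("observation has zero probability under the model")
    return [p / z for p in unnorm]


# Hypothetical two-state model with a single action.
P = {0: [[0.9, 0.1], [0.2, 0.8]]}  # P[u][x][x']: transition probabilities
O = [[0.8, 0.2], [0.3, 0.7]]       # O[x'][y]: observation probabilities
b1 = belief_update([0.5, 0.5], u=0, y=0, P=P, O=O)
```

The new belief depends on the history only through the old belief, the action, and the new observation, which is the controlled Markov property in this special case.
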
Centralized control: Structure of optimal strategies

The information state absorbs the effect of available information on expected future cost, i.e., for any choice of future strategy $\mathbf{g}_{(t+1)} = (g_{t+1}, g_{t+2}, \dots)$,

$$\mathbb{E}^{\mathbf{g}_{(t+1)}}\Big[\sum_{\tau=t}^{\infty} \beta^{\tau} R_\tau \,\Big|\, I_t = i_t, U_t = u_t\Big] = \mathbb{E}^{\mathbf{g}_{(t+1)}}\Big[\sum_{\tau=t}^{\infty} \beta^{\tau} R_\tau \,\Big|\, Z_t = F_t(i_t), U_t = u_t\Big].$$

Therefore, $Z_t$ is a sufficient statistic for performance evaluation, and there is no loss of optimality in using control laws of the form $g_t \colon Z_t \mapsto U_t$.

Examples
  In MDPs, $g_t \colon X_t \mapsto U_t$.
  In POMDPs, $g_t \colon B_t \mapsto U_t$, where $B_t$ is the belief state.

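Centralized control: Dynamic programming

For any strategy of the form $g_t \colon Z_t \mapsto U_t$,

$$\mathbb{E}\Big[\,\mathbb{E}^{\mathbf{g}_{(t+1)}}\Big[\sum_{\tau=t+1}^{\infty} \beta^{\tau} R_\tau \,\Big|\, Z_{t+1}, U_{t+1} = g_{t+1}(Z_{t+1})\Big] \,\Big|\, Z_t = z_t, U_t = u_t\Big] = \mathbb{E}^{\mathbf{g}_{(t+1)}}\Big[\sum_{\tau=t+1}^{\infty} \beta^{\tau} R_\tau \,\Big|\, Z_t = z_t, U_t = u_t\Big].$$

This identity relies on the law of iterated expectations (with $I_t \subseteq I_{t+1}$) and on the controlled Markov property of the information state.

There exists a time-homogeneous optimal strategy $\mathbf{g}^* = (g^*, g^*, \dots)$ that is given by the fixed point of the following dynamic program:

$$V(z) = \min_{u} \mathbb{E}\big[R_t + \beta V(Z_{t+1}) \mid Z_t = z, U_t = u\big].$$

Both these results rely on an appropriate choice of information state. Note that the information state for dynamic programming is also a sufficient statistic for control.

Questions for the decentralized setting:
  Can we identify a sufficient statistic $Z^i_t$ and restrict attention to $g^i_t \colon Z^i_t \mapsto U^i_t$?
  Can we show that there exist time-homogeneous optimal control strategies?
  Can we identify appropriate information states to determine a dynamic program that computes such optimal strategies?
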
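For a finite information-state space, the fixed point of the centralized dynamic program can be approximated by value iteration. The sketch below follows the slides in writing the recursion with a min, so it treats `R[z][u]` as a per-step cost; the two-state model, its numbers, and the stopping tolerance are illustrative assumptions.

```python
def value_iteration(states, actions, P, R, beta=0.9, tol=1e-9):
    """Iterate V(z) = min_u { R[z][u] + beta * sum_z' P[z][u][z'] * V(z') } to a fixed point."""
    V = {z: 0.0 for z in states}
    while True:
        Q = {
            z: {
                u: R[z][u] + beta * sum(P[z][u][z2] * V[z2] for z2 in states)
                for u in actions
            }
            for z in states
        }
        V_new = {z: min(Q[z].values()) for z in states}
        if max(abs(V_new[z] - V[z]) for z in states) < tol:
            # Time-homogeneous control law g: z -> u that attains the minimum.
            g = {z: min(actions, key=lambda u: Q[z][u]) for z in states}
            return V_new, g
        V = V_new


# Hypothetical model: action 0 stays put, action 1 moves to state 0 at an extra cost.
states, actions = [0, 1], [0, 1]
P = {
    0: {0: {0: 1.0, 1: 0.0}, 1: {0: 1.0, 1: 0.0}},
    1: {0: {0: 0.0, 1: 1.0}, 1: {0: 1.0, 1: 0.0}},
}
R = {0: {0: 0.0, 1: 0.5}, 1: {0: 1.0, 1: 1.5}}
V, g = value_iteration(states, actions, P, R)
```

In this toy model it is cheaper to pay once to leave state 1 than to keep paying the running cost, and the returned control law is the same at every time step.
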
Two approaches to dynamic programming:
The person-by-person approach
The person-by-person approach

Pick an agent, say $i$.

Arbitrarily fix the strategies $\mathbf{g}^{-i}$ of all other agents.

Identify an information-state process $\{Z^i_t\}_{t=1}^{\infty}$ for agent $i$.

Structure of optimal strategies: if $\mathcal{Z}^i_t$, the space of realizations of $Z^i_t$, does not depend on $\mathbf{g}^{-i}$, then there is no loss of optimality in using $g^i_t \colon Z^i_t \mapsto U^i_t$.

Write coupled dynamic programs to identify the best-response strategy $\mathbf{g}^i = \mathcal{B}^i(\mathbf{g}^{-i})$, where $\mathcal{B}^i$ denotes agent $i$'s best-response map.

Remarks
  Is the best-response strategy time-homogeneous?
  Does there exist a fixed point of the coupled dynamic program?
  Is the fixed point unique?

In summary, the person-by-person approach:
  May identify the structure of globally optimal control strategies.
  Provides coupled dynamic programs, which, at best, may determine person-by-person optimal control strategies. Such strategies can be arbitrarily bad compared to globally optimal strategies.

Radner, Team decision problems, Ann Math Stat, 1962.
Marschak and Radner, Economic Theory of Teams, 1972.

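The gap between person-by-person optimality and global optimality already shows up in a static two-agent team. The sketch below is an illustration only: the cost table `COST` is hypothetical, and the static setting replaces the coupled dynamic programs with a plain table lookup.

```python
from itertools import product

# Hypothetical team cost c(u1, u2) shared by both agents.
COST = [[1.0, 10.0],
        [10.0, 0.0]]


def best_response_iteration(u1, u2, max_rounds=100):
    """Alternate best responses until neither agent wants to deviate."""
    for _ in range(max_rounds):
        new_u1 = min((0, 1), key=lambda a: COST[a][u2])
        new_u2 = min((0, 1), key=lambda a: COST[new_u1][a])
        if (new_u1, new_u2) == (u1, u2):
            break
        u1, u2 = new_u1, new_u2
    return (u1, u2), COST[u1][u2]


def global_optimum():
    """Search over joint actions, which is what a global optimum requires."""
    best = min(product((0, 1), repeat=2), key=lambda u: COST[u[0]][u[1]])
    return best, COST[best[0]][best[1]]
```

Started from `(0, 0)`, best-response iteration stays at `(0, 0)` because no single agent can improve alone, while the global search finds `(1, 1)`. Scaling the off-diagonal entries widens the gap as much as desired.
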
An example: coupled subsystems with control sharing

Dynamics: $X^i_{t+1} = f^i(X^i_t, \mathbf{U}_t, W^i_t)$, where $\mathbf{U}_t = (U^1_t, \dots, U^n_t)$.

Information structure: $I^i_t = \{X^i_{1:t}, \mathbf{U}_{1:t-1}\}$.

Conditional independence: for any arbitrary choice of control strategies $\mathbf{g}$,

$$\mathbb{P}^{\mathbf{g}}(\mathbf{X}_{1:t} \mid \mathbf{U}_{1:t-1} = \mathbf{u}_{1:t-1}) = \prod_{i=1}^{n} \mathbb{P}^{\mathbf{g}}(X^i_{1:t} \mid \mathbf{U}_{1:t-1} = \mathbf{u}_{1:t-1}).$$

Structure of optimal strategies: arbitrarily fix strategies $\mathbf{g}^{-i}$, and consider the best-response strategy at agent $i$. Then $\{X^i_t, \mathbf{U}_{1:t-1}\}$ is an information state at agent $i$.

Mahajan, Optimal decentralized control of coupled subsystems with control sharing, IEEE TAC 2013.

Two approaches to dynamic programming:
The common-information approach
One dynamic program to rule them all

$$V(z) = \min_{\gamma} \mathbb{E}\big[R_t + \beta V(Z_{t+1}) \mid Z_t = z, \Gamma_t = \gamma\big].$$

The information state must be a function of the information available to every controller.

Common information: $C_t = \bigcap_{i=1}^{n} I^i_t$. Local information: $L^i_t = I^i_t \setminus C_t$.

Each step of the dynamic program must determine a mapping from $(C_t, L^i_t)$ to $U^i_t$.

The information state $Z_t$ only depends on $C_t$. Thus, the action at each step must be a mapping $L^i_t \mapsto U^i_t$. Call it a prescription and denote it by $\gamma^i_t$.

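Because the minimization in the common-information dynamic program runs over prescriptions, the "action space" is a space of functions. For finite local-information and action sets it can be enumerated directly, as in this sketch; the function names and the small example sets are hypothetical.

```python
from itertools import product


def prescriptions(local_space, action_space):
    """All maps gamma: local_space -> action_space, each returned as a dict."""
    return [
        dict(zip(local_space, choice))
        for choice in product(action_space, repeat=len(local_space))
    ]


def joint_prescriptions(local_spaces, action_spaces):
    """All tuples (gamma^1, ..., gamma^n), one prescription per agent."""
    per_agent = [prescriptions(L, U) for L, U in zip(local_spaces, action_spaces)]
    return list(product(*per_agent))


# Agent 1 has 2 local values, agent 2 has 3; both choose binary actions.
gammas = joint_prescriptions([[0, 1], [0, 1, 2]], [[0, 1], [0, 1]])
```

The count grows as the number of actions raised to the number of local-information values, per agent, so brute-force enumeration is only feasible for very small local-information spaces.
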
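A virtual coordinator

[Figure: a virtual coordinator with common information $C_t$, the plant state $X_t$, and agents $1, \dots, n$ with local information $L^1_t, \dots, L^n_t$.]

Partial history sharing: $|\mathcal{L}^i_t|$ is uniformly bounded (over $i$ and $t$) and

$$\mathbb{P}(L^i_{t+1} \mid C_t, L^i_t, U^i_t, Y^i_{t+1}) = \mathbb{P}(L^i_{t+1} \mid L^i_t, U^i_t, Y^i_{t+1}).$$

Centralized POMDP
  Information state: $\mathbb{P}(X_t, \mathbf{L}_t \mid C_t = c)$ (or something else).
  Standard POMDP results apply; the value function is PWLC (piecewise linear and convex).

Subsumes many previous results on DP for decentralized stochastic control.
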
Example 1: Delayed sharing information structure

Dynamics: $X_{t+1} = f(X_t, \mathbf{U}_t, W_t)$, where $\mathbf{U}_t = (U^1_t, \dots, U^n_t)$.

Observations: $Y^i_t = h^i(X_t, W^i_t)$.

Information structure: $I^i_t = \{Y^i_{1:t}, U^i_{1:t-1}, \mathbf{Y}_{1:t-k}, \mathbf{U}_{1:t-k}\}$, where $k$ is the sharing delay.

Common information: $C_t = \{\mathbf{Y}_{1:t-k}, \mathbf{U}_{1:t-k}\}$. Local information: $L^i_t = I^i_t \setminus C_t$. Prescriptions: $\gamma^i_t \colon L^i_t \mapsto U^i_t$.

Information state: $\Pi_t = \mathbb{P}(X_t, \mathbf{L}_t \mid C_t)$.

Results
  No loss of optimality in using control strategies $g^i_t \colon (L^i_t, \Pi_t) \mapsto U^i_t$.
  Dynamic program: $V(\pi) = \min_{\gamma} \mathbb{E}\big[R_t + \beta V(\Pi_{t+1}) \mid \Pi_t = \pi, \Gamma_t = \gamma\big]$.

Witsenhausen, Separation of estimation and control, Proc IEEE, 1971.
Nayyar, Mahajan and Teneketzis, Optimal control strategies in delayed sharing information structures, IEEE TAC 2011.

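The split of agent $i$'s information into common and local parts under a sharing delay $k$ can be written out with plain sets. This is a bookkeeping sketch only; the label format `("Y", agent, time)` and the function name are assumptions made for illustration.

```python
def delayed_sharing_split(i, t, k, n):
    """Return (common, local) for agent i at time t, as sets of (kind, agent, time) labels."""
    shared_times = range(1, t - k + 1)  # empty while t <= k
    common = {("Y", j, s) for j in range(1, n + 1) for s in shared_times}
    common |= {("U", j, s) for j in range(1, n + 1) for s in shared_times}
    info_i = set(common)
    info_i |= {("Y", i, s) for s in range(1, t + 1)}  # own observations Y^i_{1:t}
    info_i |= {("U", i, s) for s in range(1, t)}      # own actions U^i_{1:t-1}
    return common, info_i - common


common, local = delayed_sharing_split(i=1, t=5, k=2, n=2)
```

For `t > k` the local part always holds the last `k` own observations and the last `k - 1` own actions, so its size does not grow with time, which is the boundedness that partial history sharing asks for.
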
Example 2: Control sharing information structure

Dynamics: $X^i_{t+1} = f^i(X^i_t, \mathbf{U}_t, W^i_t)$, where $\mathbf{U}_t = (U^1_t, \dots, U^n_t)$.

Information structure
  Original: $I^i_t = \{X^i_{1:t}, \mathbf{U}_{1:t-1}\}$.
  Using the person-by-person approach: $I^i_t = \{X^i_t, \mathbf{U}_{1:t-1}\}$.

Common information: $C_t = \mathbf{U}_{1:t-1}$. Local information: $L^i_t = X^i_t$. Prescriptions: $\gamma^i_t \colon X^i_t \mapsto U^i_t$.

Information state: define $\Pi^i_t(x) = \mathbb{P}(X^i_t = x \mid \mathbf{U}_{1:t-1})$. Then $\Pi_t = (\Pi^1_t, \dots, \Pi^n_t)$ is an information state.

Results
  No loss of optimality in using control strategies $g^i_t \colon (X^i_t, \Pi_t) \mapsto U^i_t$.
  Dynamic program: $V(\pi) = \min_{\gamma} \mathbb{E}\big[R_t + \beta V(\Pi_{t+1}) \mid \Pi_t = \pi, \Gamma_t = \gamma\big]$.

Mahajan, Optimal decentralized control of coupled subsystems with control sharing, IEEE TAC 2013.

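One way to picture how the coordinator's belief on a local state evolves under control sharing is a two-step update: condition on the shared action being consistent with the prescription, then propagate through the local dynamics. The sketch below is a plain Bayes-rule reading of the definitions above, not the update derived in the cited paper; the function signature, the `kernel(x_next, x, u_joint)` interface, and the toy numbers are assumptions.

```python
def update_local_belief(pi, gamma, u_i, u_joint, kernel):
    """Update a belief on agent i's local state after the joint action is shared."""
    states = list(pi)
    # Step 1: keep only states x for which the prescription yields the observed action.
    post = {x: pi[x] if gamma[x] == u_i else 0.0 for x in states}
    z = sum(post.values())
    if z == 0:
        raise ValueError("observed action is inconsistent with the prescription")
    # Step 2: push the conditioned belief through the local transition kernel.
    return {
        x2: sum(kernel(x2, x, u_joint) * post[x] / z for x in states)
        for x2 in states
    }


def toy_kernel(x_next, x, u_joint):
    # Hypothetical dynamics: the local state persists with probability 0.9.
    return 0.9 if x_next == x else 0.1


new_pi = update_local_belief(
    pi={0: 0.5, 1: 0.5}, gamma={0: 0, 1: 1}, u_i=1, u_joint=(1, 0), kernel=toy_kernel
)
```

Here the prescription reveals the local state exactly, since only `x = 1` produces action 1, which is a small instance of the signaling effect mentioned under conceptual difficulties.
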
Example 3: Mean-field sharing information structure

Dynamics: $X^i_{t+1} = f(X^i_t, U^i_t, M_t, W^i_t)$, where $M_t = \frac{1}{n} \sum_{i=1}^{n} \delta_{X^i_t}$.

Information structure: $I^i_t = \{X^i_t, M_{1:t}\}$, and assume identical control laws.

Common information: $C_t = M_{1:t}$. Local information: $L^i_t = X^i_t$. Prescriptions: $\gamma_t \colon X^i_t \mapsto U^i_t$.

Information state: due to the symmetry of the system, $M_t$ is an information state.

Results
  No loss of optimality in using control strategies $g^i_t \colon (X^i_t, M_t) \mapsto U^i_t$.
  Dynamic program: $V(m) = \min_{\gamma} \mathbb{E}\big[R_t + \beta V(M_{t+1}) \mid M_t = m, \Gamma_t = \gamma\big]$.
  Size of state space $= \mathrm{poly}(n)$; size of action space $= |\mathcal{U}|^{|\mathcal{X}|}$.

Arabneydi, Mahajan, Team optimal control of coupled subsystems with mean field sharing, CDC 2014.

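The size claims can be checked by counting, assuming the mean field is the empirical distribution of the local states over a finite state space. The helper names below are hypothetical.

```python
from collections import Counter
from math import comb


def mean_field(xs, state_space):
    """Empirical distribution of the agents' local states."""
    counts = Counter(xs)
    n = len(xs)
    return tuple(counts[x] / n for x in state_space)


def num_mean_fields(n, num_states):
    """Number of distinct empirical distributions of n agents over num_states states."""
    return comb(n + num_states - 1, num_states - 1)


def num_prescriptions(num_states, num_actions):
    """Number of maps from local states to actions."""
    return num_actions ** num_states
```

For a fixed local state space, `num_mean_fields(n, num_states)` is a polynomial in `n`, while `num_prescriptions` does not depend on `n` at all, so the dynamic program scales with the number of agents only through the state space.
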
What if the shared information is empty?
The designer's approach
An example: Finite memory controller

Dynamics: $X_{t+1} = f(X_t, U_t, W_t)$, $Y_t = h(X_t, N_t)$.

Information structure: $I_t = \{Y_t, M_t\}$ and $[U_t, M_{t+1}] = g_t(Y_t, M_t)$. This is the simplest non-classical information structure.

Common information: $C_t = \emptyset$. Local information: $L_t = (Y_t, M_t)$. Prescriptions: $g_t \colon (Y_t, M_t) \mapsto U_t$.

Information state: $\Pi_t = \mathbb{P}(X_t, M_t \mid g_{1:t-1})$.

Results
  Dynamic program: $V(\pi) = \min_{g} \mathbb{E}\big[R_t + \beta V(\Pi_{t+1}) \mid \Pi_t = \pi, g_t = g\big]$.
  Cannot show that time-homogeneous strategies are optimal!

Witsenhausen, A standard form for sequential stochastic control, Math. Sys. Theory, 1973.

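The closed loop of a finite-memory controller can be sketched as follows. The control law takes the time index so that time-varying laws are allowed, in line with the remark that time-homogeneous strategies are not known to be optimal here. The toy maps `toy_f`, `toy_h`, and `toy_g` and the noise thresholds are illustrative assumptions.

```python
import random


def run_finite_memory(g, f, h, x0, m0, horizon, seed=0):
    """Simulate [U_t, M_{t+1}] = g_t(Y_t, M_t); return rows (t, x, y, m, u)."""
    rng = random.Random(seed)
    x, m, trace = x0, m0, []
    for t in range(1, horizon + 1):
        y = h(x, rng.random())
        u, m_next = g(t, y, m)  # the controller sees only (Y_t, M_t)
        trace.append((t, x, y, m, u))
        x = f(x, u, rng.random())
        m = m_next
    return trace


def toy_f(x, u, w):
    # Hypothetical dynamics: the action flips the state unless the noise blocks it.
    return (x + u) % 2 if w < 0.9 else x


def toy_h(x, n):
    # Hypothetical observation: correct with probability 0.8.
    return x if n < 0.8 else 1 - x


def toy_g(t, y, m):
    # Hypothetical law: act on the XOR of observation and memory, then store the observation.
    return (y ^ m, y)


trace = run_finite_memory(toy_g, toy_f, toy_h, x0=0, m0=0, horizon=10)
```

The controller never sees the state or its own earlier observations directly; everything it remembers has to pass through the memory update, which is what makes the information structure non-classical.
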
Some applications
Real-time communication with feedback

[Block diagram: Source → Encoder → Channel → Decoder]

Variations
  Source coding, channel coding, or joint source-channel coding setup;
  Feedback from channel output to encoder;
  No feedback or noisy feedback (but either encoder or decoder has finite memory).

Generalization
  Multi-terminal real-time communication
  Source coding, channel coding, joint source-channel coding

Networked control systems

[Block diagram: Plant → Sensor → Channel → Controller]

Variations
  Feedback from channel output to sensor;
  No feedback from channel output to sensor (but either the sensor or the controller has finite memory);
  Connections to posterior matching.

Other examples

Paging and registration in cellular networks
  Hajek, Mitzel, Yang, IEEE TIT 2008.

Multi-access broadcast
  Hluchyj, Gallager, NTC 1983; Ooi, Wornell, CDC 1996; Mahajan, Allerton 2011.

Decentralized balancing of queues
  Ouyang, Teneketzis, arXiv 2014.

Remote estimation
  Lipsa, Martins, IEEE TAC 2011; Nayyar, Başar, Teneketzis, Veeravalli, IEEE TAC 2013.

Decentralized sequential hypothesis testing
  Nayyar, Teneketzis, IEEE TIT 2011. Related to social learning.

Further Reading

Existence results for arbitrary spaces
  Gupta, Yüksel, Başar, Langbort, On the Existence of Optimal Policies for a Class of Static and Sequential Dynamic Teams, arXiv preprint 2014.

Application to Linear Quadratic Gaussian (LQG) systems
  Mahajan, Nayyar, Sufficient statistics for linear control strategies in decentralized systems with partial history sharing, IEEE TAC 2015 (in print).
  Nayyar, Lessard, Optimal Control for LQG Systems on Graphs Part I: Structural Results, arXiv preprint 2014.

Generalization to games
  Nayyar, Gupta, Langbort, Başar, Common Information Based Markov Perfect Equilibria for Stochastic Games With Asymmetric Information: Finite Games, IEEE TAC 2014.
  Nayyar, Gupta, Langbort, Başar, Common Information based Markov Perfect Equilibria for Linear-Gaussian Games with Asymmetric Information, arXiv preprint 2014.

Final Thoughts

Simple solution to a complex class of problems.

Is common information (or partial history sharing, PHS) a realistic assumption?
  Arises naturally in certain applications.
  Use (a faster time-scale) consensus dynamics to generate common information (e.g., in mean-field sharing).
  Provide upper and lower bounds.

Are there good numerical algorithms?
  Are there POMDP algorithms for large action spaces?
  Is there some structure in the DP that can be exploited?

Interesting variations of the common-information approach
  Approximation techniques
  Reinforcement learning
  Other information structures (sparse structures)?

References

Nayyar, Sequential Decision-Making in Decentralized Systems, PhD Thesis, Univ of Michigan, 2011.

Mahajan, Nayyar, and Teneketzis, Identifying tractable decentralized problems on the basis of information structures, Allerton 2008.

Nayyar, Mahajan and Teneketzis, Optimal control strategies in delayed sharing information structures, IEEE TAC 2011.

Nayyar, Mahajan and Teneketzis, Decentralized stochastic control with partial history sharing: A common information approach, IEEE TAC 2013.

Mahajan, Optimal decentralized control of coupled subsystems with control sharing, IEEE TAC 2013.

Arabneydi and Mahajan, Team optimal control of coupled subsystems with mean field sharing, CDC 2014.

Mahajan and Mannan, Decentralized Stochastic Control, Annals of OR (in print).