Decentralized Stochastic Control: Aditya Mahajan

The document discusses decentralized stochastic control. It presents the simplest general model which involves multiple agents taking actions based on local information to optimize a global reward function. Two main approaches are discussed: the person-by-person approach and the common information approach. The conceptual difficulties in solving the decentralized control problem are outlined, such as choosing control laws to optimize expected long-term reward across agents with increasing information domains over time.

Decentralized stochastic control

The person-by-person and the common information approaches

Aditya Mahajan
McGill University

Banff Workshop on Optimal Cooperation, Communication, and Learning in Decentralized Systems, 14 Oct 2014
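Simplest general model of a decentralized control system

[Figure: block diagram of a system with state $X_t$ and agents $1, \dots, n$. Agent $i$ receives observation $Y^i_t$, has information $I^i_t$, and uses control law $g^i$ to choose action $U^i_t$. A designer block supplies the control laws.]

Dynamics: $X_{t+1} = f(X_t, \mathbf{U}_t, W_t)$, where $\mathbf{U}_t = (U^1_t, \dots, U^n_t)$.

Observation: $Y^i_t = h^i(X_t, W^i_t)$.

Information structure: $\{Y^i_{1:t}, U^i_{1:t-1}\} \subseteq I^i_t \subseteq \{\mathbf{Y}_{1:t}, \mathbf{U}_{1:t-1}\}$, $U^i_t = g^i_t(I^i_t)$.

Control strategy: $\mathbf{g} = (g^1, \dots, g^n)$, where $g^i = (g^i_1, g^i_2, \dots)$.

Per-step reward: $R_t = \rho(X_t, \mathbf{U}_t)$.

Performance: $J(\mathbf{g}) = \mathbb{E}^{\mathbf{g}}\left[ \sum_{t=1}^{\infty} R_t \right]$.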
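The model above can be made concrete with a small simulation. The following is only a sketch under stated assumptions: the binary state, the particular `dynamics`, `observe`, and `reward` functions, the noise probabilities, and the control law `last_observation_law` are all hypothetical choices made for illustration, and the infinite sum in the performance criterion is truncated to a finite horizon. Each agent's information is the smallest set the model allows, namely its own past observations and actions.

```python
import random

def dynamics(x, actions, rng):
    """Hypothetical f: the state shifts by the number of actions taken, plus noise."""
    noise = rng.random() < 0.1            # plays the role of W_t
    return (x + sum(actions) + noise) % 2

def observe(x, rng):
    """Hypothetical h^i: the state seen through a channel that flips it w.p. 0.2."""
    return x ^ (rng.random() < 0.2)       # plays the role of W^i_t

def reward(x, actions):
    """Hypothetical rho: one unit of reward when every action matches the state."""
    return 1.0 if all(u == x for u in actions) else 0.0

def last_observation_law(info):
    """A simple control law g^i_t: act on the most recent local observation."""
    return info["y"][-1]

def simulate(control_laws, horizon, seed=0):
    """Run the closed loop and return the total reward over a finite horizon."""
    rng = random.Random(seed)
    n = len(control_laws)
    x = 0
    # Local information I^i_t = {Y^i_{1:t}, U^i_{1:t-1}} for each agent.
    info = [{"y": [], "u": []} for _ in range(n)]
    total = 0.0
    for _ in range(horizon):
        for i in range(n):
            info[i]["y"].append(observe(x, rng))
        actions = [control_laws[i](info[i]) for i in range(n)]
        total += reward(x, actions)
        for i in range(n):
            info[i]["u"].append(actions[i])
        x = dynamics(x, actions, rng)
    return total

total = simulate([last_observation_law, last_observation_law], horizon=50)
```

Note that the domain of each control law (the `info` dictionary) grows at every step, which is the difficulty discussed on the following slides.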
Literature overview

Economics literature
Radner, "Team decision problems," Ann Math Stat, 1962.
Marschak and Radner, Economic Theory of Teams, 1972.
...
Systems & control literature
Witsenhausen, "Separation of estimation and control," Proc IEEE, 1971.
Witsenhausen, "On information structures, feedback and causality," SICON 1971.
Ho and Chu, "Team decision theory and information structures," IEEE TAC 1972.
...
AI literature
...

Simpler than non-cooperative game theory.

All pre-game agreements are enforceable.

Simpler than cooperative game theory.

The value of the game does not need to be split between the players.

Conceptual difficulties

The optimal control problem is a functional optimization problem where we have to choose an infinite sequence of control laws to maximize the expected total reward.

The domain Ii of control law gi increases with time.

Can the optimization problem be solved?
Can we implement the optimal solution?

Agent-based methods lead to infinite regress.

Signaling (or the communication aspect of control)

Centralized stochastic control: Information state

$I_t \subseteq I_{t+1}$

A process $\{Z_t\}_{t=1}^{\infty}$ is called an information state if:

Function of available information: there exists a sequence of functions $\{F_t\}_{t=1}^{\infty}$ such that $Z_t = F_t(I_t)$.

Absorbs the effect of available information on current rewards: $\mathbb{P}(R_t \mid I_t = i_t, U_t = u_t) = \mathbb{P}(R_t \mid Z_t = F_t(i_t), U_t = u_t)$.

Controlled Markov property: $\mathbb{P}(Z_{t+1} \mid I_t = i_t, U_t = u_t) = \mathbb{P}(Z_{t+1} \mid Z_t = F_t(i_t), U_t = u_t)$.

Examples: the system state in MDPs; the belief state in POMDPs.
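For the POMDP example, the belief state can be computed recursively, which is what makes it a function of the available information with a controlled Markov update. The sketch below assumes a finite state space and uses hypothetical names: `P[u][x][x2]` for the transition probabilities and `O[x2][y]` for the observation likelihoods. It is one standard way to write the Bayes filter, not code taken from the talk.

```python
def belief_update(belief, u, y, P, O):
    """One step of the Bayes filter on a finite state space.

    belief[x]   : probability of state x given the information so far
    P[u][x][x2] : probability of moving from x to x2 under action u (assumed name)
    O[x2][y]    : probability of observing y in state x2 (assumed name)
    """
    n = len(belief)
    # Prediction: push the current belief through the controlled dynamics.
    predicted = [sum(belief[x] * P[u][x][x2] for x in range(n)) for x2 in range(n)]
    # Correction: weight each state by the likelihood of the new observation.
    unnormalized = [predicted[x2] * O[x2][y] for x2 in range(n)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return [p / total for p in unnormalized]

# Two states, one action that leaves the state unchanged, and a noisy sensor.
P = [[[1.0, 0.0], [0.0, 1.0]]]
O = [[0.8, 0.2], [0.2, 0.8]]
new_belief = belief_update([0.5, 0.5], u=0, y=0, P=P, O=O)
```

The new belief depends on the past only through the old belief, the action, and the new observation, so the growing history never has to be stored.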
Centralized control: Structure of optimal strategies

The information state absorbs the effect of available information on the expected future reward, i.e., for any choice of future strategy $(g_{t+1}, g_{t+2}, \dots)$,

$\mathbb{E}\left[ \sum_{\tau=t}^{\infty} R_\tau \,\middle|\, I_t = i_t, U_t = u_t \right] = \mathbb{E}\left[ \sum_{\tau=t}^{\infty} R_\tau \,\middle|\, Z_t = F_t(i_t), U_t = u_t \right].$

$Z_t$ is a sufficient statistic for performance evaluation. Therefore, there is no loss of optimality in using control laws of the form $g_t : Z_t \mapsto U_t$.

Examples: In MDPs, $g_t : X_t \mapsto U_t$. In POMDPs, $g_t : B_t \mapsto U_t$, where $B_t$ is the belief state.
Centralized control: Dynamic programming

For any strategy of the form $g_t : Z_t \mapsto U_t$,

$\mathbb{E}\left[ R_t + \mathbb{E}\left[ \sum_{\tau=t+1}^{\infty} R_\tau \,\middle|\, Z_{t+1}, U_{t+1} = g_{t+1}(Z_{t+1}) \right] \,\middle|\, Z_t = z_t, U_t = u_t \right] = \mathbb{E}\left[ \sum_{\tau=t}^{\infty} R_\tau \,\middle|\, Z_t = z_t, U_t = u_t \right].$

There exists a time-homogeneous optimal strategy $(g^*, g^*, \dots)$ that is given by the fixed point of the following dynamic program:

$V(z) = \max_{u} \mathbb{E}\left[ R_t + V(Z_{t+1}) \mid Z_t = z, U_t = u \right].$

Both these results rely on an appropriate choice of information state. Note that an information state for DP is also a sufficient statistic for control.

Can we identify a sufficient statistic $Z^i_t$ and restrict attention to $g^i_t : Z^i_t \mapsto U^i_t$?

Can we show that there exist time-homogeneous optimal control strategies?

Can we identify appropriate information states to determine a dynamic program that computes such optimal strategies?
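A fixed point of a dynamic program of this kind can be approximated by value iteration when the information state takes finitely many values. The sketch below rests on assumptions that are not on the slide: the objective is reward maximization (as in the model definition), a discount factor `beta < 1` is added so that the iteration is a contraction and converges, and the arrays `P[u][z][z2]` and `R[z][u]` are hypothetical names for the transition probabilities and expected per-step rewards.

```python
def value_iteration(P, R, beta=0.9, tol=1e-9):
    """Approximate the fixed point V(z) = max_u { R[z][u] + beta * E[V(Z+)] }.

    P[u][z][z2] : probability of moving from z to z2 under action u (assumed name)
    R[z][u]     : expected per-step reward in state z under action u (assumed name)
    beta        : discount factor, an assumption added so the iteration converges
    """
    n_z, n_u = len(R), len(R[0])
    V = [0.0] * n_z
    while True:
        # State-action values under the current estimate of V.
        Q = [[R[z][u] + beta * sum(P[u][z][z2] * V[z2] for z2 in range(n_z))
              for u in range(n_u)]
             for z in range(n_z)]
        V_new = [max(row) for row in Q]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            # Time-homogeneous control law: the maximizing action in each state.
            law = [max(range(n_u), key=lambda u: Q[z][u]) for z in range(n_z)]
            return V_new, law
        V = V_new

# Action 0 always leads to state 0, action 1 always leads to state 1.
P = [[[1.0, 0.0], [1.0, 0.0]],
     [[0.0, 1.0], [0.0, 1.0]]]
# Reward is earned only by taking action 1 in state 1.
R = [[0.0, 0.0],
     [0.0, 1.0]]
V, law = value_iteration(P, R)
```

In this toy instance the best law is to always take action 1, giving values close to `1 / (1 - beta) = 10` in state 1 and `beta * 10 = 9` in state 0. The resulting law depends only on the current information state and not on time, which is the time-homogeneity property stated above.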
Two approaches to dynamic programming:
The person-by-person approach
The person-by-person approach

Pick an agent, say $i$.

Arbitrarily fix the strategies $g^{-i}$ of all other agents.

Identify an information-state process $\{Z^i_t\}_{t=1}^{\infty}$ for agent $i$.

Structure of optimal strategies: If $\mathcal{Z}^i_t$, the space of realizations of $Z^i_t$, does not depend on $g^{-i}$, then there is no loss of optimality in using $g^i_t : Z^i_t \mapsto U^i_t$.

Write coupled dynamic programs to identify the best-response strategy $g^i = \mathcal{B}^i(g^{-i})$, where $\mathcal{B}^i$ denotes the best-response map of agent $i$.

Remarks:
Is the best-response strategy time-homogeneous?
Does there exist a fixed point of the coupled dynamic programs?
Is the fixed point unique?

The person-by-person approach:
May identify the structure of globally optimal control strategies.
Provides coupled dynamic programs, which, at best, may determine person-by-person optimal control strategies. Such strategies can be arbitrarily bad compared to globally optimal strategies.

Radner, "Team decision problems," Ann Math Stat, 1962.
Marschak and Radner, Economic Theory of Teams, 1972.
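The gap between person-by-person optimal and globally optimal strategies already shows up in a static team with two agents. The sketch below is a hypothetical toy problem, not an example from the talk: the `REWARD` table and the function names are invented for illustration, and there are no dynamics, so each "strategy" is a single action. Iterating best responses from the joint action (0, 0) stops immediately, even though (1, 1) earns more.

```python
# Common reward of a two-agent static team (hypothetical numbers).
REWARD = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}

def best_response(agent, other_action):
    """Best action for `agent` when the other agent's action is held fixed."""
    def payoff(action):
        joint = (action, other_action) if agent == 0 else (other_action, action)
        return REWARD[joint]
    return max((0, 1), key=payoff)

def person_by_person(start, max_rounds=100):
    """Iterate best responses, one agent at a time, until no agent wants to change."""
    actions = list(start)
    for _ in range(max_rounds):
        changed = False
        for agent in (0, 1):
            response = best_response(agent, actions[1 - agent])
            if response != actions[agent]:
                actions[agent] = response
                changed = True
        if not changed:
            break
    return tuple(actions)

stuck = person_by_person((0, 0))      # person-by-person optimal joint action
best = max(REWARD, key=REWARD.get)    # globally optimal joint action
```

Here `stuck` is (0, 0) with reward 1, while `best` is (1, 1) with reward 2. Raising the reward of (1, 1) leaves (0, 0) person-by-person optimal, so the gap can be made as large as desired.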
An example: coupled subsystems with control sharing

Dynamics: $X^i_{t+1} = f^i(X^i_t, \mathbf{U}_t, W^i_t)$, where $\mathbf{U}_t = (U^1_t, \dots, U^n_t)$.

Information structure: $I^i_t = \{X^i_{1:t}, \mathbf{U}_{1:t-1}\}$.

Conditional independence: For any arbitrary choice of control strategies $\mathbf{g}$,

$\mathbb{P}^{\mathbf{g}}(\mathbf{X}_{1:t} \mid \mathbf{U}_{1:t-1} = \mathbf{u}_{1:t-1}) = \prod_{i=1}^{n} \mathbb{P}^{\mathbf{g}}(X^i_{1:t} \mid \mathbf{U}_{1:t-1} = \mathbf{u}_{1:t-1}).$

Structure of optimal strategies: Arbitrarily fix strategies $g^{-i}$, and consider the best-response strategy at agent $i$. Then $\{X^i_t, \mathbf{U}_{1:t-1}\}$ is an information state at agent $i$.

Mahajan, "Optimal decentralized control of coupled subsystems with control sharing," IEEE TAC 2013.
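The conditional independence property can be checked by exact enumeration on a small instance. The sketch below is an illustration under assumptions, not the construction from the cited paper: two subsystems with states in {0, 1, 2}, independent initial states and noises, a two-step trajectory, and hypothetical control laws `g` and dynamics `f` in which each subsystem is driven by the other agent's shared action. Exact fractions are used so that equality can be tested without rounding error.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical distributions of the initial states and the noises (independent).
P_X1 = [{0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)},
        {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}]
P_W = [{0: Fraction(1, 2), 1: Fraction(1, 2)},
       {0: Fraction(1, 5), 1: Fraction(4, 5)}]

def g(i, x):
    """Hypothetical control laws: each action depends only on the local state."""
    return int(x >= 1) if i == 0 else int(x == 2)

def f(i, x, u, w):
    """Hypothetical dynamics: subsystem i is driven by the other agent's action."""
    return (x + u[1 - i] + w) % 3

# Joint law of (U_1, X^0_{1:2}, X^1_{1:2}) by exact enumeration.
joint = defaultdict(Fraction)
for x0, x1, w0, w1 in product(range(3), range(3), range(2), range(2)):
    prob = P_X1[0][x0] * P_X1[1][x1] * P_W[0][w0] * P_W[1][w1]
    u = (g(0, x0), g(1, x1))
    joint[(u, (x0, f(0, x0, u, w0)), (x1, f(1, x1, u, w1)))] += prob

def factorizes():
    """Check P(X_{1:2} | U_1 = u) = prod_i P(X^i_{1:2} | U_1 = u) for every u."""
    p_u, m0, m1 = defaultdict(Fraction), defaultdict(Fraction), defaultdict(Fraction)
    for (u, t0, t1), prob in joint.items():
        p_u[u] += prob
        m0[(u, t0)] += prob
        m1[(u, t1)] += prob
    for u in p_u:
        trajs0 = [t for (v, t) in m0 if v == u]
        trajs1 = [t for (v, t) in m1 if v == u]
        for t0, t1 in product(trajs0, trajs1):
            lhs = joint.get((u, t0, t1), Fraction(0)) / p_u[u]
            rhs = (m0[(u, t0)] / p_u[u]) * (m1[(u, t1)] / p_u[u])
            if lhs != rhs:
                return False
    return True

holds = factorizes()
```

The check passes because, once the shared actions are fixed, each trajectory depends only on its own initial state and noise. This is a single small instance and does not replace the general proof in the cited paper.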
Two approaches to dynamic programming:
The common-information approach
One dynamic program to rule them all

$V(\pi) = \max_{\gamma} \mathbb{E}\left[ R_t + V(\Pi_{t+1}) \mid \Pi_t = \pi, \Gamma_t = \gamma \right]$