
Dynamic Programming via Linear Programming

İ. Esra Büyüktahtakın

Systems and Industrial Engineering, University of Arizona

1127 E James E. Rogers Way, Tucson, AZ, 85721

[email protected]

Abstract

Dynamic programming (DP) has been used to solve a wide range of optimization problems. Given that dynamic programs can be equivalently formulated as linear programs, linear programming (LP) offers an efficient alternative to the functional equation approach in solving such problems. LP is also utilized with DP to characterize the polyhedral structure of discrete optimization problems. In this paper, we investigate the close relationship between the two traditionally distinct areas of dynamic programming and linear programming.

To cite this paper: İ. E. Büyüktahtakın. Dynamic Programming Via Linear Programming. In J. J. Cochran, L. A. Cox, Jr., P. Keskinocak, J. P. Kharoufeh, and J. C. Smith, editors, Wiley Encyclopedia of Operations Research and Management Science. John Wiley & Sons, Hoboken, NJ, 2011.

1 Related Literature

A wide variety of problems have been shown to be polynomially solvable via dynamic programming recursive formulations. The problem is attacked by decomposing it into a sequence of interrelated subproblems defined by a recursive function, starting with the smallest subproblem. The problem is then enlarged by computing the current optimal solution from the preceding subproblem until the original problem is solved in its entirety. The main strength of this approach is that it offers treatments for different types of problems involving sequential decision-making with discrete, nonlinear, and stochastic characteristics.
Dynamic programming was introduced to solve Markov decision processes by Bellman
[1]. The basics of a dynamic program can be described as follows: Consider a system being
observed over a finite or infinite time horizon divided into periods or stages. At each stage,
a decision or an action updates the state to be observed at the next stage, and depending on
the state and the decision made, an immediate reward (cost) is observed. The value function
represents the expected total reward (cost) from the current stage through the end of the
planning horizon, while the functional equation expresses the relationship between the value
function at the present stage and the successive stage. Optimal decisions, depending on stage and state, are determined iteratively backwards by maximizing (minimizing) the right-hand side of the functional equation.
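As an illustration of this backward recursion (a sketch added here, not an example from the cited works), consider a small deterministic finite-horizon inventory problem with made-up demands and cost parameters; the code only demonstrates the mechanics of the functional equation V_t(s) = min_u [g_t(s, u) + V_{t+1}(s')].

```python
# Backward induction on a toy finite-horizon inventory problem.
# All data (demands, costs, inventory limit) are hypothetical.

T = 3                                 # number of stages (periods)
demand = [1, 2, 1]                    # deterministic demand per period (assumed)
max_inv = 3                           # largest inventory level a state may take
setup, unit, hold = 3.0, 1.0, 0.5     # assumed setup, unit, and holding costs

# V[t][s] = minimal cost from stage t onward when the state (inventory) is s.
V = [[0.0] * (max_inv + 1) for _ in range(T + 1)]   # terminal values V[T][s] = 0
policy = [[0] * (max_inv + 1) for _ in range(T)]

for t in range(T - 1, -1, -1):            # stages, backwards
    for s in range(max_inv + 1):          # states observed at stage t
        best_cost, best_u = float("inf"), 0
        for u in range(max_inv + demand[t] - s + 1):   # candidate decisions
            s_next = s + u - demand[t]                 # state at the next stage
            if not 0 <= s_next <= max_inv:
                continue                               # infeasible transition
            g = (setup if u > 0 else 0.0) + unit * u + hold * s_next
            if g + V[t + 1][s_next] < best_cost:
                best_cost, best_u = g + V[t + 1][s_next], u
        V[t][s], policy[t][s] = best_cost, best_u

print("optimal cost starting empty:", V[0][0])
print("first-period decision:", policy[0][0])
```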
The use of linear programming to solve dynamic programming formulations was first
introduced by D’epenoux [11] and Manne [21]. Manne [21] studies an undiscounted Markov
decision model with an infinite planning horizon. He specifically analyzes an inventory con-
trol problem where the “state” variable represents the initial stock level i and the “decision
variable” corresponds to the order quantity j. In the linear program in Manne [21], i and
j are introduced as subscripts to the variables xij . These variables xij represent the joint
probabilities with which the initial stock equals i and the production quantity equals j.
The steady state probabilities of inventory, production, and shortage levels are then derived.
The constraints of the LP are the requirements regarding steady state probabilities, and
the objective function is the minimization of the expected cost corresponding to the steady
state probabilities. D’epenoux [11] provides a linear program for the discounted version of
the problem in [21] by linearizing the functional equations of the corresponding dynamic program.
Howard [17] combines dynamic programming with Markov chain theory to develop Markov
decision processes. He also contributes to the solution of infinite horizon problems by de-
veloping the policy iteration method as an alternative to the backward induction method
of Bellman [1], which is known as value iteration. The policy iteration algorithm generates
a sequence of stationary policies by evaluating and improving the policies until the optimal
policy is obtained. Later, Osaki and Mine [26] formulate a semi-Markovian decision pro-
cess as an LP and show that the dual of this LP is equivalent to dynamic programming
formulation of Howard [17].
Another line of research on dynamic programming and Markov decision processes arises from the observation that finding the optimal cost function can be cast as a linear programming problem ([3], [4], [9], [10], [16], [21], [29]). However, the resulting linear program suffers from the curse of dimensionality: as the problem size increases linearly, the size of the state space grows exponentially. This LP has as many decision variables as there are states in the Markov decision process, and an even greater number of constraints. This difficulty is addressed by approximate dynamic programming (ADP), where the optimal cost-to-go function of the dynamic program is approximated within the span of some pre-specified set of basis functions, as introduced by Schweitzer and Seidmann [27]. de Farias and Van Roy analyze and further develop this approach by proposing the procedure known as approximate linear programming (ALP), which reduces the dimensionality of the linear program by utilizing a linear combination of basis functions combined with sampling only a small subset of the constraints ([6], [7], [8]). de Farias and Van Roy [7] also establish strong approximation guarantees for ALP-based approximations, assuming knowledge of a Lyapunov-like function that must be included in the basis. An extension of the ALP approach that automatically generates the basis functions in the linear framework was developed by Valenti et al. [30]. Later, Desai et al. [12] study ADP via a smoothed linear program and develop an error bound that characterizes the quality of the approximations produced by ADP.

Eppen and Martin [14] and Martin et al. [22] study the underlying DP network structure
to reformulate difficult integer programs into new models that have better bounds and solve
more quickly. Eppen and Martin [14] provide tighter mixed integer programming formula-
tions for the single and multi-item lot-sizing problems using a variable redefinition approach.
They first remove the capacity constraints from the traditional lot-sizing formulation and
represent the subproblem with the dynamic programming network structure. This shortest
path network can be written as an integer linear program (IP), with the arcs correspond-
ing to binary variables and the nodes corresponding to flow balance constraints. Eppen
and Martin then relate the variables of the traditional model to the new set of variables
through a linear transformation. By using this transformation, they insert the complicating
constraints in terms of the new variables into the new formulation. Although this new refor-
mulation has a greater number of variables and constraints, it has a tighter LP relaxation
lower bound leading to reduced solution times. Martin et al. [22] formulate polynomially
solvable optimization problems as shortest path problems by using dynamic programming.
They then represent the dynamic program as an LP having a polynomial number of variables
and constraints. The extreme points of this LP are represented by the solution vectors of
the DP, and the dual of the LP provides the DP formulation. They also show that with
an appropriate change of variables, the LP formulation obtained from the DP provides a
polyhedral description of the model considered.
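To make the shortest-path structure behind this reformulation concrete, the following sketch (with hypothetical demand and cost data, added here for illustration) solves the uncapacitated single-item subproblem that the reformulation starts from: node t stands for "period t is entered with zero inventory", arc (t, s) for "produce in period t to cover demand of periods t through s−1", and the plain DP recursion below computes a shortest path from node 1 to node T+1.

```python
# Shortest-path (DP) view of uncapacitated single-item lot sizing.
# Node t: period t entered with zero inventory; arc (t, s): produce in
# period t for periods t..s-1.  Demand and cost data are hypothetical.

demand = [20, 30, 10, 40]            # d_1..d_T (assumed)
setup = [50, 50, 50, 50]             # fixed setup cost per period (assumed)
hold = 1.0                           # unit holding cost per period (assumed)
T = len(demand)

def arc_cost(t, s):
    """Cost of producing in period t (1-based) for periods t..s-1."""
    c = setup[t - 1]
    for k in range(t, s):            # period k's demand is held for k-t periods
        c += hold * (k - t) * demand[k - 1]
    return c

# Plain shortest-path recursion over the acyclic graph on nodes 1..T+1.
INF = float("inf")
dist, pred = [INF] * (T + 2), [0] * (T + 2)
dist[1] = 0.0
for s in range(2, T + 2):
    for t in range(1, s):
        if dist[t] + arc_cost(t, s) < dist[s]:
            dist[s], pred[s] = dist[t] + arc_cost(t, s), t

print("minimum cost:", dist[T + 1])
node, production_periods = T + 1, []         # walk predecessors back to node 1
while node > 1:
    production_periods.append(pred[node])
    node = pred[node]
print("produce in periods:", sorted(production_periods))
```

Writing each arc as a 0–1 variable and each node as a flow-balance constraint turns this network into the integer (in fact, integral) linear program referred to above.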
There are a number of studies that utilize dynamic programming algorithms to formalize
and solve integer linear programming problems (e.g. [2], [13], [19], [25], [28], [31]). In a
recent study, Hartman et al. [15] introduce a set of dynamic programming-based inequalities that can be used to augment the integer linear programming formulation of the capacitated lot-sizing problem (CLSP). These authors utilize iterative solutions of forward dynamic programming formulations for the CLSP to generate inequalities for an equivalent integer programming
formulation. The inequalities capture convex and concave envelopes of intermediate-stage
value functions, and can be lifted by examining potential state information at future stages.

Lawler and Wood [20] discuss the close relationship between branch and bound and dynamic
programming, while Morin and Marsten [24] study branch and bound techniques to reduce
storage and computational requirements in discrete dynamic programming. In particular,
they utilize relaxations and fathoming criteria from branch and bound to eliminate states of the DP that will not lead to optimal policies.
The paper is outlined as follows. In Section 2, we review the close relationship between
linear programming techniques and dynamic programming in stochastic control, while we
discuss the utilization of dynamic programming driven acyclic decision graphs to describe
polyhedral characteristics of discrete optimization problems in Section 3. Section 4 concludes
the paper and offers directions for future research.

2 LP-DP Relationship in Stochastic Control

In this section, we consider a discrete-time stochastic control problem involving a finite state space $S = \{1, \dots, n\}$. Let $U$ be a finite decision set available in every state $i \in S$. Given state $i$, the use of decision $u \in U$ specifies the transition probability $p_{ij}(u)$ to the next state $j$, and a cost $g(i, u)$ is incurred when decision $u$ is taken in state $i$. Future costs are discounted by a factor $\alpha \in (0, 1)$. A policy of the stochastic control problem is denoted by $\mu : S \to U$. The problem is then to minimize the so-called cost-to-go function $J_\mu : S \to \mathbb{R}$ over the set of admissible policies $P$:
" ∞
#
X
min Jµ (i0 ) = min E αk g(ik , µ(ik )) , (1)
µ∈P µ∈P
k=0

where i0 ∈ S is an initial state and the expectation E is taken over the possible future states
{i1 , i2 , . . .}, given i0 and the policy µ.
The optimal cost associated with the optimal policy $\mu^*$, denoted by $J^* = J_{\mu^*}$, satisfies the Bellman equation:
\[
J(i) = \min_{u \in U}\left[ g(i, u) + \alpha \sum_{j \in S} p_{ij}(u) J(j) \right], \quad \forall i \in S, \tag{2}
\]
which is called value iteration and is a principal method for calculating the optimal cost function $J^*$ [4]. Once $J^*$ is found by solving (2), the optimal policy $\mu^*$ can be computed.
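As a minimal illustration (added here, not part of the original article), the sketch below runs value iteration on an artificial instance with randomly generated transition matrices and one-step costs; the instance size, random seed, and stopping tolerance are arbitrary.

```python
import numpy as np

# Value iteration for the Bellman equation (2) on an artificial instance:
# n states, q decisions, random stochastic matrices P[u] and costs g.

rng = np.random.default_rng(0)
n, q, alpha = 5, 3, 0.9
P = rng.random((q, n, n))
P /= P.sum(axis=2, keepdims=True)        # each P[u] is row-stochastic
g = rng.random((n, q))                   # g[i, u] = one-step cost

J = np.zeros(n)
for _ in range(1000):
    # Right-hand side of (2) for every pair (i, u).
    Q = g + alpha * np.einsum("uij,j->iu", P, J)
    J_new = Q.min(axis=1)
    if np.max(np.abs(J_new - J)) < 1e-8: # arbitrary stopping tolerance
        J = J_new
        break
    J = J_new

Q = g + alpha * np.einsum("uij,j->iu", P, J)
policy = Q.argmin(axis=1)                # greedy policy with respect to J
print("J* ~", np.round(J, 3))
print("optimal policy:", policy)
```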
An alternative approach to solving the stochastic control problem described here is to fix
the policy µ and then to solve the following linear system:
\[
J_\mu(i) = g(i, \mu(i)) + \alpha \sum_{j \in S} p_{ij}(\mu(i))\, J_\mu(j), \quad \forall i \in S. \tag{3}
\]

Solving (3) is called policy evaluation, yielding $J_\mu$, the cost-to-go of the fixed policy $\mu$.
After computing the policy’s cost-to-go function, a better policy can be constructed by
performing a policy improvement step. The policy iteration method repeatedly performs
policy evaluation followed by policy improvement. This procedure generates a sequence of
policies that is guaranteed to converge to the optimal policy µ∗ after a finite number of
iterations since the new policy must be strictly better than the previous policy and there is
only a finite number of possible policies in total [17].
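A corresponding sketch of policy iteration on the same style of artificial instance is given below; here policy evaluation solves the linear system (3) with a dense solver, which is only reasonable for small state spaces.

```python
import numpy as np

# Policy iteration: evaluate a fixed policy by solving system (3), then
# improve it greedily; repeat until the policy stops changing.

rng = np.random.default_rng(1)
n, q, alpha = 5, 3, 0.9
P = rng.random((q, n, n)); P /= P.sum(axis=2, keepdims=True)
g = rng.random((n, q))

mu = np.zeros(n, dtype=int)                    # arbitrary starting policy
while True:
    # Policy evaluation: (I - alpha * P_mu) J_mu = g_mu is exactly (3).
    P_mu = P[mu, np.arange(n), :]              # row i is the i-th row of P[mu[i]]
    g_mu = g[np.arange(n), mu]
    J_mu = np.linalg.solve(np.eye(n) - alpha * P_mu, g_mu)
    # Policy improvement: greedy decision with respect to J_mu in every state.
    Q = g + alpha * np.einsum("uij,j->iu", P, J_mu)
    mu_new = Q.argmin(axis=1)
    if np.array_equal(mu_new, mu):             # no improvement -> optimal policy
        break
    mu = mu_new

print("optimal policy:", mu)
print("its cost-to-go:", np.round(J_mu, 3))
```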

2.1 LP Formulation of DP

Suppose that we use value iteration to generate a sequence of vectors $J_k = (J_k(1), \dots, J_k(n))$ given an initial condition vector $J_0 = (J_0(1), \dots, J_0(n))$. Then the following constraints:
\[
J(i) \le g(i, u) + \alpha \sum_{j \in S} p_{ij}(u)\, J(j), \quad \forall i \in S,\ u \in U, \tag{4}
\]

form a polyhedron in $\mathbb{R}^n$. Because the minimization operator on the right-hand side of (2) is monotone and has $J^*$ as its fixed point, any vector satisfying (4) is bounded above by $J^*$ componentwise; hence the optimal cost vector $J^* = (J^*(1), \dots, J^*(n))$ solves the following problem (in $w_1, \dots, w_n$)
LP1:
\[
\max \sum_{i \in S} w_i \tag{5}
\]
\[
\text{subject to: } w_i \le g(i, u) + \alpha \sum_{j \in S} p_{ij}(u)\, w_j, \quad \forall i \in S,\ u \in U. \tag{6}
\]

This is a linear program with n variables and as many as n × q constraints, where q is the
maximum number of elements of the set U. For very large n and q, the linear program can
be solved by the use of specialized, large-scale linear programming algorithms ([4], [29]).
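For instances of modest size, LP1 can also be handed to a general-purpose solver directly. The sketch below (an illustration using scipy.optimize.linprog, not a recommendation from the cited references) rearranges constraint (6) as w_i − α Σ_j p_ij(u) w_j ≤ g(i, u) and converts the maximization (5) into a minimization.

```python
import numpy as np
from scipy.optimize import linprog

# LP1, equations (5)-(6), built explicitly for an artificial instance and
# solved with a general-purpose LP solver.

rng = np.random.default_rng(2)
n, q, alpha = 5, 3, 0.9
P = rng.random((q, n, n)); P /= P.sum(axis=2, keepdims=True)
g = rng.random((n, q))

A_ub, b_ub = [], []
for i in range(n):
    for u in range(q):
        row = -alpha * P[u, i, :]          # -alpha * p_i.(u)
        row[i] += 1.0                      # the w_i term of constraint (6)
        A_ub.append(row)
        b_ub.append(g[i, u])

# max sum_i w_i  <=>  min -sum_i w_i; the w_i are free variables.
res = linprog(c=-np.ones(n), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n)
print("J* from LP1:", np.round(res.x, 3))
```

Up to solver tolerance, the resulting vector res.x is the fixed point J* of (2) for this instance.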

2.2 Dual and Policy Iteration

We now investigate the dual of LP1, which is shown to be equivalent to policy iteration ([10]).
Duality theory of linear programming (see e.g., [5]) asserts that the following dual linear
program:
Dual1:
\[
\min \sum_{i \in S} \sum_{u \in U} q(i, u)\, g(i, u) \tag{7}
\]
\[
\text{subject to: } \sum_{u \in U} q(i, u) - \alpha \sum_{j \in S} \sum_{u \in U} q(j, u)\, p_{ji}(u) = 1, \quad \forall i \in S, \tag{8}
\]
\[
q(i, u) \ge 0, \quad \forall i \in S,\ u \in U, \tag{9}
\]

has the same optimal value as LP1. The variables $q(i, u)$, $i \in S$, $u \in U$, of the dual program can be interpreted as the steady-state probabilities that state $i$ is visited at a typical transition and that control $u$ is then applied. The constraints of Dual1 are the conditions that the $q(i, u)$ must satisfy in order to be feasible steady-state probabilities.
The cost function
\[
\sum_{i \in S} \sum_{u \in U} q(i, u)\, g(i, u)
\]
is then the steady-state average cost per transition.


Denardo [10] shows that the feasible bases for Dual1 are in one-to-one correspondence with the policies. Denardo also proves that the application of the simplex method to the dual program performs the same sequence of pivots as does policy iteration, and that policy iteration is the same as multiple substitution (block pivoting) in the dual simplex method applied to LP1.
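Dual1 can be set up just as directly. The sketch below (an added illustration, reusing the data-generation recipe of the LP1 sketch so the two optimal values can be compared) flattens the variables q(i, u) into a single vector; provided the solver returns a basic (vertex) solution, each state carries positive mass on exactly one decision, which is precisely the basis–policy correspondence described above.

```python
import numpy as np
from scipy.optimize import linprog

# Dual1, equations (7)-(9).  Variable q(i, u) sits at index i*n_dec + u;
# row i of the equality system encodes constraint (8).

rng = np.random.default_rng(2)                 # same instance as the LP1 sketch
n, n_dec, alpha = 5, 3, 0.9
P = rng.random((n_dec, n, n)); P /= P.sum(axis=2, keepdims=True)
g = rng.random((n, n_dec))

A_eq = np.zeros((n, n * n_dec))
for i in range(n):
    for j in range(n):
        for u in range(n_dec):
            A_eq[i, j * n_dec + u] = (1.0 if i == j else 0.0) - alpha * P[u, j, i]
b_eq = np.ones(n)

res = linprog(c=g.reshape(-1), A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (n * n_dec))
q_sol = res.x.reshape(n, n_dec)
# In a basic optimal solution, only one q(i, .) per state is positive.
print("policy read from Dual1:", q_sol.argmax(axis=1))
print("objective (7):", round(res.fun, 4))     # equals the optimal value of LP1
```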

3 LP-DP Relationship through Acyclic Graphs

Many discrete optimization problems that are solvable through dynamic programming can
be represented by directed acyclic shortest path decision graphs (e.g. [10], [18], [19], [23]).

Given a finite state set S, the vertices of the graph correspond to the states in S and reflect the transition phases in the solution of the underlying sequential decision process. An arc (i, j) between vertices i, j ∈ S represents a decision changing state i to state j > i. Furthermore, each arc is assigned a length equal to the cost of the corresponding decision. The problem is then to find the shortest path from an initial state to a goal state. Generally, states are partitioned into stages such that the decision made at a stage transforms the state at that stage into a state of the next stage.
Once the shortest path formulation of a discrete optimization problem is constructed, the
shortest path problem can easily be written as a linear program where variables represent
arcs and the flow balance at each vertex is expressed as a constraint. This LP provides
the polyhedral characterization of the discrete optimization problem where every face of
the associated polytope contains the incidence vector of a decision path in the dynamic
programming graph. For any given cost function, one such incidence vector will be the
linear programming optimal solution [22].
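As a toy illustration of this correspondence (a hypothetical graph, not an example from [22]), the sketch below writes the flow-balance LP of a small acyclic decision graph and solves it with scipy.optimize.linprog; because the constraint matrix is a vertex–arc incidence matrix, the optimal solution is the 0–1 incidence vector of a shortest decision path.

```python
import numpy as np
from scipy.optimize import linprog

# Shortest decision path in a toy acyclic graph, written as an LP with one
# variable per arc and one flow-balance constraint per state (vertex).
# States 0 (initial) through 4 (goal); graph and costs are hypothetical.

arcs = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
cost = [2.0, 4.0, 1.0, 7.0, 3.0, 6.0, 2.0]
n_states, source, goal = 5, 0, 4

A_eq = np.zeros((n_states, len(arcs)))
for a, (i, j) in enumerate(arcs):
    A_eq[i, a] += 1.0          # arc a leaves state i
    A_eq[j, a] -= 1.0          # arc a enters state j
b_eq = np.zeros(n_states)
b_eq[source], b_eq[goal] = 1.0, -1.0   # one unit flows from source to goal

res = linprog(c=cost, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * len(arcs))
chosen = [arcs[a] for a in range(len(arcs)) if res.x[a] > 0.5]
print("shortest decision path:", chosen, "with cost", res.fun)
```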
One problem with the acyclic shortest path paradigm is that it is inadequate for more
complex discrete optimization problems since a typical decision involves composing two or
more partial solution elements into a single element. To overcome this difficulty, Martin et
al. [22] developed a directed acyclic decision hypergraph framework for deriving extended
formulations of combinatorial optimization problems from dynamic programming algorithms
as mentioned in Section 1. The dynamic programming algorithms considered in [22] search hyperpaths in a directed hypergraph $\mathcal{H} = (S, H)$ on a finite state space $S$ with cardinality $|S| = n$, where the directed hyperarcs in $H$ are of the form $(K, j)$ with $j \in S$ and $K \subseteq S$. The states are ordered by a numbering function $\sigma : S \to \{1, 2, \dots, m\}$, with $m = |S|$, such that $\sigma(i) < \sigma(j)$ for all $(K, j) \in H$ and $i \in K$. Furthermore, it is assumed that the directed hypergraph is single tailed. The set $S_T \subset S$ of boundary states is the set of all $j \in S$ for which there is a hyperarc $(\emptyset, j) \in H$, and all nonboundary states are called intermediate. Finally, there must be a finite reference set $I(j) \ne \emptyset$ for each state $j \in S$ such that $I(i) \subseteq I(j)$ and $I(i) \cap I(i') = \emptyset$ for all $i, i' \in K$ with $i \ne i'$, for every $(K, j) \in H$.
In this setting, Martin et al. [22] proved that the convex hull of the characteristic vectors of decision paths in the dynamic programming hypergraph is described by the following linear program
LP2:
\[
\min \sum_{(K, j) \in H} c[K, j]\, z[K, j] \tag{10}
\]
\[
\text{subject to: } \sum_{(K, m) \in H} z[K, m] = 1, \tag{11}
\]
\[
\sum_{(K, \bar{j}) \in H} z[K, \bar{j}] - \sum_{(K, j) \in H \text{ with } \bar{j} \in K} z[K, j] = 0 \quad \text{for } \sigma(\bar{j}) = 1, \dots, m - 1, \tag{12}
\]
\[
z[K, j] \ge 0 \quad \forall\, (K, j) \in H, \tag{13}
\]

where $z[K, j]$ is a binary (0–1) variable that takes value 1 if the hyperarc $(K, j)$ is selected,
and 0 otherwise, and $c[K, j]$ represents the cost of selecting the hyperarc $(K, j)$. The objective
function (10) minimizes the sum of costs of choosing a particular path, while constraints (11)
ensure that exactly one decision will terminate at the global state m. Constraints (12) enforce
flow balance conditions. Constraints (11) and (12) with the nonnegativity constraints (13)
are shown to be sufficient to produce a binary solution to the primal linear program LP2.
LP2 can be viewed as the dual to the computations of the dynamic program itself. Thus
we provide the dual of the problem above as follows:
Dual2:
\[
\max\ \omega \tag{14}
\]
\[
\text{subject to: } \omega - \sum_{i \in K} \lambda[i] \le c[K, m], \quad \forall\, (K, m) \in H, \tag{15}
\]
\[
\lambda[j] - \sum_{i \in K} \lambda[i] \le c[K, j], \quad \forall\, (K, j) \in H,\ j \ne m. \tag{16}
\]

Here $\omega$ denotes the dual multiplier for constraint (11), and $\lambda[j]$ is the dual variable for row $\sigma(j)$ of (12). Note that Dual2 is equivalent to the dynamic programming formulation.
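As an illustrative special case (worked out here, not notation taken from [22]), suppose every hyperarc is either a boundary arc (∅, j) or an ordinary single-tail arc ({i}, j). Constraints (15)–(16) then collapse to the familiar shortest-path conditions, with λ[j] playing the role of the dynamic programming value of state j and ω that of the optimal objective value:

```latex
% Dual2 specialized to single-tail arcs: the shortest-path recursion in LP form.
\begin{align*}
\lambda[j] &\le c[\emptyset, j]            && \text{for boundary arcs } (\emptyset, j) \in H,\\
\lambda[j] &\le \lambda[i] + c[\{i\}, j]   && \text{for ordinary arcs } (\{i\}, j) \in H,\ j \ne m,\\
\omega     &\le \lambda[i] + c[\{i\}, m]   && \text{for arcs } (\{i\}, m) \in H \text{ into the goal state } m.
\end{align*}
```

An optimal solution is obtained by setting each λ[j] to the cheapest way of reaching state j and ω to the optimal value of the original problem, so that the constraints that are tight at optimality reproduce the dynamic programming recursion.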

4 Conclusions and Future Remarks

In this paper we survey linear programming approaches for solving dynamic programming
formulations of hard optimization problems. In particular, we analyze techniques used for
casting a dynamic program as a linear program. An LP with few variables and a large
number of constraints is often tractable via large-scale linear programming algorithms such as constraint generation. However, for problems where the DP requires an exponential number of states, the corresponding LP formulation consists of an exponential number of constraints.
This difficulty necessitates research on new techniques to reduce the state space of DP by
eliminating the states and decisions that will not lead to the optimal policy.
As each problem requires a specific DP formulation and a program to be solved, LP provides the advantage of easily writing and solving tractable DP models using commercial software. Furthermore, LP enables us to use sensitivity analysis for dynamic programs. LP
sensitivity analysis and its relation to the states and decisions of the dynamic program could
be further investigated. Another direction for research is utilization of LP formulation of
DP in order to obtain stronger IP formulations in discrete optimization.

References

[1] R. Bellman. A Markovian decision process. Journal of Mathematics and Mechanics,


6:679–684, 1957.

[2] R. Bellman. On a routing problem. Quarterly of Applied Mathematics, 16:87–90, 1958.

[3] D. P. Bertsekas. Dynamic Programming and Optimal Control: Volume 1. Athena


Scientific, 2005.

[4] D. P. Bertsekas. Dynamic Programming and Optimal Control: Volume 2. Athena


Scientific, 2007.

[5] G. B. Dantzig. Linear Programming and Extensions. Princeton University Press, Prince-
ton, N. J., 1963.

[6] D. P. de Farias. The Linear Programming Approach to Approximate Dynamic Program-


ming: Theory and Application. PhD thesis, Stanford University, 2002.

[7] D. P. de Farias and B. Van Roy. The linear programming approach to approximate
dynamic programming. Operations Research, 51(6):850–865, 2003.

[8] D. P. de Farias and B. Van Roy. On constraint sampling in the linear programming
approach to approximate dynamic programming. Mathematics of Operations Research,
29(3):462–478, 2004.

[9] E. V. Denardo. On linear programming in a Markov decision problem. Management


Science, 16(5):282–288, 1970.

[10] E. V. Denardo. Dynamic Programming. Prentice-Hall, 2nd edition, 2003.

[11] F. D’epenoux. A probabilistic production and inventory problem. Management Science,


10:98–108, 1963. Translation of an article published in Revue Française de Recherche Opérationnelle, 14, 1960.

[12] V. V. Desai, V. F. Farias, and C. C. Moallemi. Approximate dynamic programming via


a smoothed approximate linear program. Working paper, 2009.

[13] S. E. Elmaghraby. The concept of state in discrete dynamic programming. Journal of


Mathematical Analysis and Applications, 29(3):523–557, 1970.

[14] G. D. Eppen and R. K. Martin. Solving multi-item capacitated lot sizing problems
using variable redefinition. Operations Research, 35(6):832–848, 1987.

[15] J. C. Hartman, İ. E. Büyüktahtakın, and J. C. Smith. Dynamic programming based


inequalities for the capacitated lot-sizing problem. To appear in IIE Transactions, 2010.

[16] A. Hordijk and L. C. M. Kallenberg. Linear programming and Markov decision chains.
Management Science, 25(4):352–362, 1979.

[17] R. A. Howard. Dynamic Programming and Markov Processes. M.I.T. Press, Cambridge,
Massachusetts, 1960.

[18] T. Ibaraki. Solvable classes of discrete dynamic programming. Journal of Mathematical


Analysis and Applications, 43:642–693, 1973.

[19] R. M. Karp and M. Held. Finite state processes and dynamic programming. SIAM
Journal on Applied Mathematics, 15:693–718, 1967.

[20] E. L. Lawler and D. E. Wood. Branch-and-bound methods: A survey. Operations


Research, 14(4):699–719, 1966.

[21] A. S. Manne. Linear programming and sequential decisions. Management Science,


6(3):259–267, 1960.

[22] R. K. Martin, R. L. Rardin, and B. A. Campbell. Polyhedral characterization of discrete


dynamic programming. Operations Research, 38(1):127–138, 1990.

[23] T. L. Morin. Monotonicity and the principle of optimality. Journal of Mathematical


Analysis and Applications, 86:665–674, 1982.

[24] Th. L. Morin and R. E. Marsten. Branch-and-bound strategies for dynamic program-
ming. Operations Research, 24(4):611–627, 1976.

[25] G. Nemhauser. Introduction to Dynamic Programming. John Wiley and Sons, 1966.

[26] S. Osaki and H. Mine. Linear programming algorithms for semi-Markovian decision
processes. Journal of Mathematical Analysis and Applications, 22:356–381, 1968.

[27] P. Schweitzer and A. Seidmann. Generalized polynomial approximations in Markovian


decision processes. Journal of Mathematical Analysis and Applications, 110(2):568–582,
1985.

[28] J. F. Shapiro. Dynamic programming algorithms for the integer programming problem-
I: The integer programming problem viewed as a knapsack type problem. Operations
Research, 16(1):103–121, 1968.

[29] M. A. Trick and S. E. Zin. A linear programming approach to solving stochastic dynamic
programs. unpublished, 1993.

[30] M. Valenti, B. Bethke, J. How, D. P. de Farias, and J. Vian. Embedding health man-
agement into mission tasking for UAV teams. In American Control Conference, New
York, NY, 2007.

[31] L. A. Wolsey. Generalized dynamic programming methods in integer programming.


Mathematical Programming, 4(1):222–232, 1973.
