
Dynamic programming

The basic features that characterize dynamic programming problems are the following:

Stages: The problem can be divided into stages, with a policy decision required at each stage. If
an investment planning problem has an n-period horizon, then the problem can be divided into n
stages, where each stage represents one period of the planning horizon. At each stage a decision
has to be made about what to do: how much to invest in that period, for instance. Generally,
dynamic programming problems require making a sequence of interrelated decisions, where each
decision corresponds to one stage of the problem.

States: Each stage has a number of states associated with the beginning of that stage. For
instance, if the problem is to find the shortest route from a certain initial node to a destination
node, the states at stage n are the nodes where the traveler might be located and from which he
must decide the route to the next node. In general, the states are the various possible conditions
in which the system might be at that stage of the problem. The number of states may be either
finite or infinite. For instance, in a shortest-route problem there is a small, discrete number of
nodes at a given stage representing the possible starting states of the system at that stage. On the
other hand, the amount carried over from previous periods' investments, which is the state
entering the investment decision at period k, may be continuous, with an infinite number of
possible values.

State transition: The effect of the policy decision at each stage is to transform the current state
to a state associated with the beginning of the next stage (possibly according to a probability
distribution). The fortune seeker’s decision as to his next destination led him from his current
state to the next state on his journey. This procedure suggests that dynamic programming
problems can be interpreted in terms of networks. Each node would correspond to a state.
The network would consist of columns of nodes, with each column corresponding to a stage, so
that the flow from a node can go only to a node in the next column to the right. The links from a
node to nodes in the next column correspond to the possible policy decisions on which state to
go to next. The value assigned to each link usually can be interpreted as the immediate
contribution to the objective function from making that policy decision. In most cases, the
objective corresponds to finding either the shortest or the longest path through the network.
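
As a concrete illustration of this network view, the following sketch represents a small staged
network in Python as a dictionary. The node labels and link values are hypothetical, chosen only
to show the structure; each link value stands for the immediate contribution of the corresponding
policy decision.

# Hypothetical staged network: each node maps to the nodes in the next
# column (stage) that can be reached from it, together with the link value,
# i.e. the immediate contribution of choosing that policy decision.
network = {
    "A": {"B1": 2, "B2": 4},   # stage 1 state
    "B1": {"C1": 7, "C2": 3},  # stage 2 states
    "B2": {"C1": 4, "C2": 1},
    "C1": {"D": 5},            # stage 3 states
    "C2": {"D": 8},
    "D": {},                   # stage 4: the destination, no further decisions
}
# Finding the shortest (or longest) path from "A" to "D" through these columns
# then corresponds to finding the optimal sequence of policy decisions.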

Optimal policy decision: The solution procedure is designed to find an optimal policy for the
overall problem, i.e., a prescription of the optimal policy decision at each stage for each of the
possible states. For the stagecoach problem, the solution procedure constructed a table for each
stage (n) that prescribed the optimal decision (xn*) for each possible state (s). Thus, in addition
to identifying three optimal solutions (optimal routes) for the overall problem, the results show
the fortune seeker how he should proceed if he gets detoured to a state that is not on an optimal
route. For any problem, dynamic programming provides this kind of policy prescription of what
to do under every possible circumstance (which is why the actual decision made upon reaching a
particular state at a given stage is referred to as a policy decision). Providing this additional
information beyond simply specifying an optimal solution (optimal sequence of decisions) can
be helpful in a variety of ways, including sensitivity analysis.

Optimality Principle: Given the current state, an optimal policy for the remaining stages is
independent of the policy decisions adopted in previous stages. Therefore, the optimal immediate
decision depends on only the current state and not on how you got there. This is the principle of
optimality for dynamic programming.

Given the state in which the fortune seeker is currently located, the optimal life insurance policy
(and its associated route) from this point onward is independent of how he got there. For
dynamic programming problems in general, knowledge of the current state of the system
conveys all the information about its previous behavior necessary for determining the optimal
policy henceforth. (This property is the Markovian property). Any problem lacking this property
cannot be formulated as a dynamic programming problem.

Optimal policy for last stage: The solution procedure begins by finding the optimal policy for
the last stage. The optimal policy for the last stage prescribes the optimal policy decision for
each of the possible states at that stage. The solution of this one-stage problem is usually trivial,
as it was for the stagecoach problem.

A recursive relationship that identifies the optimal policy for stage n, given the optimal policy
for stage n + 1, is available.

For the stagecoach problem, this recursive relationship was

f*n(s) = min_xn { c(s, xn) + f*n+1(xn) }.

Therefore, finding the optimal policy decision when you start in state s at stage n requires finding
the minimizing value of xn. For this particular problem, the corresponding minimum cost is
achieved by using this value of xn and then following the optimal policy when you start in state
xn at stage n +1.
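
As a minimal illustration of this recursion, the following Python sketch applies backward
recursion to a small staged network. The states and link costs are hypothetical rather than the
stagecoach data, but the calculation has exactly the form f*n(s) = min_xn { c(s, xn) + f*n+1(xn) }.

# Hypothetical staged network: costs[s][x] is the immediate cost of going
# from state s to state x in the next stage.
costs = {
    "A": {"B1": 2, "B2": 4},
    "B1": {"C": 7},
    "B2": {"C": 1},
}
stages = [["A"], ["B1", "B2"], ["C"]]   # states grouped by stage, first to last

f_star = {"C": 0}                       # last stage: no remaining cost at the destination
policy = {}
for states in reversed(stages[:-1]):    # work backward, stage by stage
    for s in states:
        # f*_n(s) = min over xn of { c(s, xn) + f*_{n+1}(xn) }
        x_best = min(costs[s], key=lambda x: costs[s][x] + f_star[x])
        policy[s] = x_best
        f_star[s] = costs[s][x_best] + f_star[x_best]

print(f_star["A"], policy)              # minimum total cost from "A" and the optimal policy
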
The precise form of the recursive relationship differs somewhat among dynamic programming
problems. However, notation analogous to that introduced in the preceding section will continue
to be used here, as summarized below.

N = number of stages.

n = label for the current stage (n = 1, 2, . . . , N).

sn = current state for stage n.

xn = decision variable for stage n.

xn* = optimal value of xn (given sn).

fn(sn, xn) = contribution of stages n, n + 1, . . . , N to the objective function if the system starts in
state sn at stage n, the immediate decision is xn, and optimal decisions are made thereafter.

f*n(sn) = fn(sn, xn*).

The recursive relationship will always be of the form

f*n(sn) = max_xn { fn(sn, xn) }   or   f*n(sn) = min_xn { fn(sn, xn) },

where fn(sn, xn) would be written in terms of sn, xn, f*n+1(sn+1), and probably some measure of
the immediate contribution of xn to the objective function. It is the inclusion of f*n+1(sn+1) on
the right-hand side, so that f*n(sn) is defined in terms of f*n+1(sn+1), that makes the expression
for f*n (sn) a recursive relationship.

The recursive relationship keeps recurring as we move backward stage by stage. When the
current stage number n is decreased by 1, the new f*n(sn) function is derived by using the
f*n+1(sn+1) function that was just derived during the preceding iteration, and then this process
keeps repeating. This property is emphasized in the next (and final) characteristic of dynamic
programming.
DP solution process

DP problems can be solved through either backward or forward recursion. In forward recursion,
the solution starts from stage 1 and proceeds to stage n; in backward recursion, the reverse order
is taken, from stage n to stage 1. Both approaches give the same result, but depending on the
problem one or the other may be more efficient computationally (Taha 2007).

Backward Recursion: When we use this recursive relationship, the solution procedure starts at
the end and moves backward stage by stage—each time finding the optimal policy for that stage
— until it finds the optimal policy starting at the initial stage. This optimal policy immediately
yields an optimal solution for the entire problem, namely, x1* for the initial state s1, then x2* for
the resulting state s2, then x3* for the resulting state s3, and so forth to xN* for the resulting
state sN.

This backward movement was demonstrated by the stagecoach problem, where the optimal
policy was found successively beginning in each state at stages 4, 3, 2, and 1, respectively. For
all dynamic programming problems, a table such as the following would be obtained for each
stage (n = N, N - 1, . . . , 1).

              xn
  sn     fn(sn, xn)       f*n(sn)      xn*

When this table is finally obtained for the initial stage (n = 1), the problem of interest is solved.
Because the initial state is known, the initial decision is specified by x1* in this table. The
optimal values of the other decision variables are then specified by the other tables in turn,
according to the state of the system that results from the preceding decisions.
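
The forward trace can be sketched as follows, assuming (hypothetically) that the backward pass
has produced, for each stage, a table mapping each state to its optimal decision xn*, and that, as
in the stagecoach problem, the decision made is itself the state entered at the next stage.

# Hypothetical per-stage tables from a completed backward pass:
# x_star holds one table per stage, in order; each table maps a state to its optimal decision.
x_star = [
    {"A": "B2"},                  # stage 1
    {"B1": "C1", "B2": "C2"},     # stage 2
    {"C1": "D", "C2": "D"},       # stage 3
]

state = "A"                       # the known initial state s1
solution = []
for table in x_star:              # read the tables in stage order
    decision = table[state]       # optimal decision for the current state
    solution.append(decision)
    state = decision              # here the decision itself is the next state
print(solution)                   # ['B2', 'C2', 'D']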

State of the system

The state of the system at stage n is the subtlest component of DP problems. Though the context
of the problem determines what a state means, the following two questions help guide the
identification of the relevant states:
• What relationship binds the stages together?
• What information is needed to make feasible decisions at the current stage without
reexamining the decisions made at previous stages?

Deterministic Dynamic Programming

• The state at the next stage is completely determined by the state and policy decision at the
current stage.
• The structure of deterministic dynamic programming is shown in the diagram below.

[Diagram: at stage n the system is in state sn, with value fn(sn, xn); the policy decision xn, with
its contribution to the objective function, moves the system to stage n + 1, state sn+1, with value
f*n+1(sn+1).]

Thus, at stage n the process will be in some state sn. Making policy decision xn then moves the
process to some state sn+1 at stage n + 1. The contribution thereafter to the objective function
under an optimal policy has been previously calculated to be f*n+1(sn+1). The policy decision
xn also makes some contribution to the objective function. Combining these two quantities in an
appropriate way provides fn(sn, xn), the contribution of stages n onward to the objective
function. Optimizing with respect to xn then gives f*n(sn) = fn(sn, xn*). After xn* and f*n(sn)
are found for each possible value of sn, the solution procedure is ready to move back one stage.
Types of deterministic DP

Objective function: minimization or maximization of a sum or product of the contributions of the
individual stages.

Nature of the set of states at each stage: discrete, continuous, or a state vector.

A Prevalent Problem Type—The Distribution of Effort Problem

For this type of problem, there is just one kind of resource that is to be allocated to a number of
activities. The objective is to determine how to distribute the effort (the resource) among the
activities most effectively.

Assumptions

Though, like LP, this is a resource allocation problem, there is still a fundamental difference
between the two. One key difference is that the distribution of effort problem involves only one
resource (one functional constraint), whereas linear programming can deal with thousands of
resources. Another is that the assumptions of proportionality, divisibility and certainty may be
violated in this model. The only assumption retained here, as in LP, is additivity, which is needed
because of the principle of optimality in DP.

Formulation

Because they always involve allocating one kind of resource to a number of activities,
distribution of effort problems always have the following dynamic programming formulation
(where the ordering of the activities is arbitrary):

Stage n = activity n (n = 1, 2, . . . , N).

xn = amount of resource allocated to activity n.


State sn = amount of resource still available for allocation to remaining activities (n, . . . , N).

The reason for defining state sn in this way is that the amount of the resource still available for
allocation is precisely the information about the current state of affairs (entering stage n) that is
needed for making the allocation decisions for the remaining activities.

When the system starts at stage n in state sn, the choice of xn results in the next state at stage
n + 1 being sn+1 = sn - xn, as depicted below:

[Diagram: at stage n the state is sn; the decision xn moves the system to stage n + 1, where the
state is sn+1 = sn - xn.]
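
A minimal Python sketch of this formulation is given below, using a small hypothetical return
table g[n][x] for the contribution of allocating x units of the resource to activity n. It applies
backward recursion with the state transition sn+1 = sn - xn and then traces the optimal allocation
forward from the full amount of the resource.

# Distribution of effort: allocate R units of one resource to N activities.
# g[n][x] = contribution of allocating x units to activity n (hypothetical data).
R = 4
g = [
    [0, 3, 5, 6, 6],    # activity 1
    [0, 2, 5, 7, 8],    # activity 2
    [0, 4, 6, 7, 7],    # activity 3
]
N = len(g)

# f_star[n][s] = best total contribution obtainable from the remaining activities
# (those with index >= n) when s units of the resource are left.
f_star = [[0] * (R + 1) for _ in range(N + 1)]
x_star = [[0] * (R + 1) for _ in range(N)]

for n in reversed(range(N)):                 # backward recursion over the activities
    for s in range(R + 1):                   # state: amount of resource still available
        # decision xn: units allocated to this activity; next state is s - xn
        best_x = max(range(s + 1), key=lambda x: g[n][x] + f_star[n + 1][s - x])
        x_star[n][s] = best_x
        f_star[n][s] = g[n][best_x] + f_star[n + 1][s - best_x]

# Trace the optimal allocation forward, starting with all R units available.
s, allocation = R, []
for n in range(N):
    allocation.append(x_star[n][s])
    s -= x_star[n][s]
print(f_star[0][R], allocation)              # maximum total contribution and allocation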

Probabilistic dynamic programming model

Probabilistic dynamic programming differs from deterministic dynamic programming in that the
state at the next stage is not completely determined by the state and policy decision at the current
stage. Rather, there is a probability distribution for what the next state will be. However, this
probability distribution still is completely determined by the state and policy decision at the
current stage. The structure of the probabilistic DP model is shown in the diagram below.

For the purposes of this diagram, we let S denote the number of possible states at stage n + 1 and
label these states on the right side as 1, 2, . . . , S. The system goes to state i with probability pi (i
=1, 2, . . . , S) given state sn and decision xn at stage n. If the system goes to state i, Ci is the
contribution of stage n to the objective function.

[Diagram: at stage n, with state sn and decision xn, the system moves to state i (i = 1, 2, . . . , S)
at stage n + 1 with probability pi, receiving contribution Ci from stage n and f*n+1(i) thereafter.]
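
When, for example, the objective is to minimize the expected sum of the stage contributions,
fn(sn, xn) becomes the sum over i of pi [Ci + f*n+1(i)]. The Python sketch below works through
one such stage calculation for a single state; the states, probabilities and contributions are
hypothetical.

# One stage of probabilistic DP: choose the decision that minimizes expected cost.
# For each decision xn, the system moves to state i with probability p,
# contributing c now and f*_{n+1}(i) thereafter (hypothetical numbers).
f_next = {1: 10.0, 2: 6.0, 3: 0.0}           # f*_{n+1}(i) from the previous iteration

# transitions[xn] = list of (next state i, probability pi, contribution Ci)
transitions = {
    "x_a": [(1, 0.5, 2.0), (2, 0.5, 4.0)],
    "x_b": [(2, 0.2, 1.0), (3, 0.8, 7.0)],
}

def expected_cost(xn):
    # fn(sn, xn) = sum over i of pi * (Ci + f*_{n+1}(i))
    return sum(p * (c + f_next[i]) for i, p, c in transitions[xn])

xn_best = min(transitions, key=expected_cost)   # optimal decision for this state
f_star_n = expected_cost(xn_best)               # f*_n(sn)
print(xn_best, f_star_n)                        # "x_b" with expected cost 7.0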
