Dynamic Programming
Introduction:
Dynamic programming is a mathematical technique for optimizing multistage decision
processes. The word ‘dynamic’ refers to situations that change over several stages,
such as every day, every week, or every month, and ‘programming’ is used in the
mathematical sense of selecting an optimum allocation of resources.
Dynamic programming determines the optimum solution to an n-variable problem by
decomposing it into n stages, with each stage constituting a single-variable sub problem.
Recursive nature of dynamic programming:
Computations in DP are done recursively in the sense that the optimum solution of one
sub problem is used as an input to the next sub problem. By the time the last sub problem
is solved, the optimum solution for the entire problem is at hand.
Principle of optimality:
The concept of DP is largely based upon the principle of optimality due to Bellman,
viz.
“An optimal policy has the property that whatever the initial state and initial
decision are, the remaining decisions must constitute an optimal policy with regard to the
state resulting from the first decision.”
The principle of optimality implies that, given the initial state of a system, an
optimal policy for the subsequent stages does not depend upon the policy adopted at the
preceding stages. That is, the effect of a current decision on any of the policy decisions of
the preceding stages need not be taken into account at all. This is usually referred to as the
Markovian property of dynamic programming.
Salient features of a DPP (dynamic programming problem) and their definitions
1. Recursive function: A DPP is solved through an objective function that is recursive in
nature. The recursion can be applied in two directions.
a) Forward recursive approach: the problem is solved stage by stage, from the
initial stage to the final stage.
b) Backward recursive approach: the problem is subdivided from the last stage to the
first, and is then solved stage by stage from the final stage back to the initial
stage.
Operations Research Dynamic programming
2. Stage: When a large problem is decomposed into smaller, sequential sub problems, each
sub problem at which a decision is called for is referred to as a ‘stage’. Each stage can
thus be considered to have a start and an end, and the end of one stage is the
beginning of the next stage.
Suppose an engineering student has to acquire good marks in his degree after
studying for four years. Each year is considered to be a stage. When the student
completes his first year successfully, he has ended the first stage (and started
the second), with 1/4th of the large problem completed. He then moves on to
optimize his returns in the second stage, i.e. the 2nd year, with a similar objective
function that includes the results of the first year; on completing the second stage he
has 2/4th of the problem solved. Similarly, he goes on to the 3rd stage, after which 3/4th
is finished, including the results of the 1st and 2nd years. Finally, after completing his
fourth year, he completes the total problem, which includes the results of the previous
three stages.
3. State: The condition of the system just before a stage starts or just after it is completed
is called the “state”, and the variable that links two stages is called the state variable.
4. Return function: The algebraic equation, written in terms of the state variables, that
describes the worth (benefit or cost) of the decision made at a stage is known as the
return function or transformation equation.
5. Decision or policy variable: A decision is made at each stage. The decision or policy
variable transforms the state variable of any given stage into the state associated with
the next stage. For example, at any stage in the salesmen allocation problem, a policy
would be the allocation of some specific number of the available salesmen to the
territory represented by that stage.
6. Optimal policy: A policy that optimizes the value of a criterion, return or objective
function is called an optimal policy. Starting in any given state of any stage, the
optimal policy depends only upon that state and not upon how it was reached. In
accordance with the principle of optimality, the optimal return function is found by
optimizing the immediate return (cost) from stage n plus the known optimal return
(cost) from the previous stage n-1.
The general relationship between state, stage and policy (or decision) can be
illustrated as shown in figure.
This is an n-stage decision problem where the value of the decision variable Xj must be
determined at stage j.
The right-hand sides bi of the constraints can be treated as m types of resources to
be allocated among the different kinds of activities Xj (the activities may be, for example,
the number of castings produced, the number of forgings produced, or the number of finished
components produced); the bi may be available machines or available time. The constant Cj
represents the profit per unit of the jth activity, and aij represents the amount of resource
bi needed for 1 unit of the jth activity Xj (e.g. the amount of material required to produce
one casting).
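Denote the optimal value of the composite objective function over n stages by fn*, and let
fi*(β1, β2, ..., βm) be the optimal value over the first i stages when the amounts
β1, β2, ..., βm of the m resources are available. The recurrence relation is then:
Find: fi*(β1, β2, ..., βm) = max [Ci Xi + f(i-1)*(β1 - a1i Xi, β2 - a2i Xi, ..., βm - ami Xi)]
Such that 0 ≤ Xi ≤ β, where β = min (β1/a1i, β2/a2i, ..., βm/ami),
since any value of Xi larger than β would violate at least one constraint. Thus at the ith stage,
the optimal values Xi* and fi* can be determined as functions of β1, β2, ..., βm.
Finally, at the nth stage, since the values of β1, β2, ..., βm are known to be b1, b2, b3,
..., bm respectively, we can determine Xn* and fn*. Once Xn* is known, the remaining
values Xn-1*, Xn-2*, ..., X1* can be determined by retracing the suboptimization steps.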
Max Z = 50X1 + 100X2
Subject to: 10X1 + 5X2 ≤ 2500
4X1 + 10X2 ≤ 2000
X1 + 1.5X2 ≤ 450
X1, X2 ≥ 0
Solution:
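Since n=2; m=3; this problem can be considered as a two stage problem with three state
parameters.
Stage I: Max f1 (β1, β2, β3, X1) = 50X1
where X1 is a non-negative value that satisfies the side constraints
10X1 ≤ β1; 4X1 ≤ β2; X1 ≤ β3.
Here β1, β2, β3 are the resources available at stage I, and hence the maximum value
that X1 can assume is given by
X1* = min (β1/10, β2/4, β3)
f1* (β1, β2, β3) = 50X1* = 50 min (β1/10, β2/4, β3)
Stage II: maximum value of f2
Max f2 (β1, β2, β3) = max [100X2 + f1* (β1 - 5X2, β2 - 10X2, β3 - 1.5X2)]
where β1, β2, β3 are the resources available for allocation at stage II, which are equal to 2500,
2000, 450 respectively.
The maximum value X2 can assume without violating any constraint is given by
min (2500/5, 2000/10, 450/1.5) = min (500, 200, 300) = 200
Thus the recurrence relation is restated as
Max f2 (2500, 2000, 450) = max [100X2 + 50 min ((2500 - 5X2)/10, (2000 - 10X2)/4, 450 - 1.5X2)], 0 ≤ X2 ≤ 200
Since min ((2500 - 5X2)/10, (2000 - 10X2)/4, 450 - 1.5X2)
= (2500 - 5X2)/10 for 0 ≤ X2 ≤ 125, and (2000 - 10X2)/4 for 125 ≤ X2 ≤ 200,
f2 = max (12,500 + 75X2) for 0 ≤ X2 ≤ 125, and max (25,000 - 25X2) for 125 ≤ X2 ≤ 200,
which gives 21,875 at X2 = 125.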
Hence
f2*(2500, 2000, 450) = 21,875 at X2* = 125.
X1* = min ((2500 - 5X2*)/10, (2000 - 10X2*)/4, 450 - 1.5X2*)
= min (187.5, 187.5, 262.5) = 187.5
X1* = 187.5; X2* = 125; fmax* = 21,875.
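The two-stage recursion above can be checked numerically with a short sketch. The coefficients used here are an assumption: not all of them are legible in this handout, and they were chosen to agree with the worked figures (resources 2500, 2000, 450 and the optimum X1* = 187.5, X2* = 125). The names `upper_bound`, `f1` and `f2` are illustrative only, and the grid search over X2 stands in for the analytical maximization.

```python
# Two-stage DP for a two-variable LP, following the recursion in the text.
# ASSUMPTION: these coefficients are reconstructed to match the worked
# figures (f* = 21,875 at X1 = 187.5, X2 = 125); they are not all legible
# in the source.
C = (50.0, 100.0)                            # profit per unit of X1, X2
A = ((10.0, 5.0), (4.0, 10.0), (1.0, 1.5))   # A[i][j]: resource i per unit of Xj
B = (2500.0, 2000.0, 450.0)                  # available resources b1, b2, b3


def upper_bound(beta, j):
    """Largest value of Xj that the remaining resources beta allow."""
    return min(beta[i] / A[i][j] for i in range(len(beta)))


def f1(beta):
    """Stage I: everything that is left goes to X1."""
    x1 = upper_bound(beta, 0)
    return C[0] * x1, x1


def f2(beta, step=0.5):
    """Stage II: try X2 on a grid and add the stage I optimum."""
    best = (float("-inf"), None, None)
    n_steps = int(upper_bound(beta, 1) / step)
    for k in range(n_steps + 1):
        x2 = k * step
        remaining = tuple(beta[i] - A[i][1] * x2 for i in range(len(beta)))
        value1, x1 = f1(remaining)
        total = C[1] * x2 + value1
        if total > best[0]:
            best = (total, x1, x2)
    return best


f_max, x1_opt, x2_opt = f2(B)
print(f_max, x1_opt, x2_opt)
```

A grid only approximates a continuous maximum in general; here the optimum happens to lie on the grid, so the sketch returns the same values as the hand calculation.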
Max f2 (
6X2
Here and hence the maximum value that X2 can assume is
given by
=X2*=min
Stage 1:
Max f1 (
are the resources available for allocation at stage I, which are equal to 56, 20.
Max f1 (
Since min
Z max= 760
X1*=16; X2*=20
Max f2 (
X2
Here and hence the maximum value that X2 can assume is given
by
=X2*=min
Stage 1:
Max f1 (
are the resources available for allocation at stage I, which are equal to 21.5, 23.
Max f1 (
Since min
Zmax= 135
X1*=10; X2*=23
4. Max Z = X1 + 9X2
Subject to: 2X1 + X2 ≤ 25
X2 ≤ 11
X1, X2 ≥ 0
Solution:
Since n=2; m=2; this problem can be considered as a two stage problem with 2 state
parameters.
Stage II: maximum value of f2:
Max f2(
X2
Here and hence the maximum value that X2 can assume is given
by
=X2*=min
Stage 1:
Max f1 (
are the resources available for allocation at stage I, which are equal to 12.5, 11.
Max f1 (
Since min
Z max= 106
X1*=7; X2*=11
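As a check on the result, one way to carry out the recursion is written out below. It assumes the right-hand sides 25 and 11, which are consistent with the stated optimum (2X1 + X2 = 25 and X2 = 11 at X1* = 7, X2* = 11), and it treats X1 as the first stage; the symbols β1 and f1*, f2* follow the general recurrence and are not taken from this example's own working.

```latex
\begin{aligned}
f_1^*(\beta_1) &= \max_{0 \le X_1 \le \beta_1/2} X_1 = \frac{\beta_1}{2},\\
f_2^*(25, 11) &= \max_{0 \le X_2 \le \min(25,\,11)} \left[\, 9X_2 + f_1^*(25 - X_2) \,\right]\\
              &= \max_{0 \le X_2 \le 11} \left[\, 12.5 + 8.5\,X_2 \,\right] = 106 \quad \text{at } X_2^* = 11,\\
X_1^* &= \frac{25 - X_2^*}{2} = 7 .
\end{aligned}
```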
Cargo loading:
1) Find the number of each of three items to be included in a package so that the value of
the package is maximum. The total weight of the package must not exceed 5 kg.
Solution:
The given problem can be formulated as follows:
Let X1, X2 and X3 be the number of items of each type.
Maximize f(X) =30X1+80X2+65X3 (The total value is to be maximized)
Subject to: X1 + 3X2 + 2X3 ≤ 5
X1, X2 and X3 ≥ 0 and integers
We have to determine how many units of each of the three types of items are to be loaded, so
it is a three stage dynamic programming problem.
Let Xj (j=1,2,3) denote the three decisions, and let fj*(S) denote the value of the optimal
allocation over items 1 to j when a weight S is available.
If fj (S, Xj) is the value associated with decision Xj at stage j, then we have
fj*(S) = max fj (S, Xj) = max [Pj (Xj) + f(j-1)*(S - wj Xj)], with f0* = 0,
where Pj (Xj) denotes the value obtained from loading Xj units of item j and wj is the
unit weight of item j.
Now for the stage I problem:
f1*(S) = max [30X1], 0 ≤ X1 ≤ S
Hence we have six alternatives, i.e. 0, 1, 2, 3, 4 and 5, for X1, and the following
computations.
Stage I:
Value of f1(S, X1) = 30X1 for X1 = 0, 1, ..., 5, and the optimum solution
S 0 1 2 3 4 5 f1*(S) X1*
0 0 - - - - - 0 0
1 0 30 - - - - 30 1
2 0 30 60 - - - 60 2
3 0 30 60 90 - - 90 3
4 0 30 60 90 120 - 120 4
5 0 30 60 90 120 150 150 5
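Stage II: The largest value of X2 is the integer part of 5/3, i.e. 1, and thus we have two
alternatives, i.e. 0 and 1, for X2.
f2*(S) = max [80X2 + f1*(S - 3X2)]
S X2=0 X2=1 f2*(S) X2*
0 0+0=0 - 0 0
1 0+30=30 - 30 0
2 0+60=60 - 60 0
3 0+90=90 80+0=80 90 0
4 0+120=120 80+30=110 120 0
5 0+150=150 80+60=140 150 0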
Stage III:
The largest value of X3 is the integer part of 5/2, i.e. 2, and we have 0, 1 and 2 as the
alternatives for X3, and
f3*(S) = max [65X3 + f2*(S - 2X3)]
Value of f3(S, X3) for X3 = 0, 1, 2, and the optimum solution
S 0 1 2 f3*(S) X3*
0 0+0=0 - - 0 0
1 0+30=30 - - 30 0
2 0+60=60 65+0=65 - 65 1
3 0+90=90 65+30=95 - 95 1
4 0+120=120 65+60=125 130+0=130 130 2
5 0+150=150 65+90=155 130+30=160 160 2
For the given total weight W=5 the optimum solution is X1*=1; X2*=0; X3*=2, with a
maximum value of Rs. 160.
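The three stage tables can be reproduced with a small tabular sketch. The unit weights (1, 3, 2) and unit values (30, 80, 65) are read off the formulation and the stage calculations above; the function name `load_cargo` and its arguments are illustrative, not part of the original notes.

```python
def load_cargo(weights, values, capacity):
    """Stage-wise DP: stage j decides how many units of item j to load."""
    # f[s] = best value from the items handled so far when weight s is available
    f = [0] * (capacity + 1)
    decisions = []  # decisions[j][s] = optimal count of item j in state s
    for w, v in zip(weights, values):
        new_f, choice = [], []
        for s in range(capacity + 1):
            # try every feasible count x of the current item
            best_x = max(range(s // w + 1), key=lambda x: v * x + f[s - w * x])
            choice.append(best_x)
            new_f.append(v * best_x + f[s - w * best_x])
        f = new_f
        decisions.append(choice)
    # retrace the stages from last to first to recover the optimal policy
    s, counts = capacity, []
    for w, choice in zip(reversed(weights), reversed(decisions)):
        counts.append(choice[s])
        s -= w * choice[s]
    return f[capacity], tuple(reversed(counts))


best_value, plan = load_cargo((1, 3, 2), (30, 80, 65), 5)
print(best_value, plan)
```

The retracing loop mirrors the hand method: X3* is read from the last table, the remaining weight is reduced, and the earlier tables are consulted in turn.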
2. A 4-ton vessel can be loaded with one or more of three items. The following table gives
the unit weight wi (in tons) and the unit revenue ri (in thousands of dollars) for item i.
How should the vessel be loaded to maximize the total return?
Item i wi ri
1 2 31
2 3 47
3 1 14
Solution:
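Stage III:
f3*(S) = max [14X3], 0 ≤ X3 ≤ S
The largest value of X3 is 4/1 = 4, so f3*(S) = 14S with X3* = S for S = 0, 1, 2, 3, 4.
Stage II:
The largest value of X2 is the integer part of 4/3, i.e. 1, and thus we have two
alternatives, i.e. 0 and 1, for X2.
f2*(S) = max [47X2 + f3*(S - 3X2)]
S X2=0 X2=1 f2*(S) X2*
0 0+0=0 - 0 0
1 0+14=14 - 14 0
2 0+28=28 - 28 0
3 0+42=42 47+0=47 47 1
4 0+56=56 47+14=61 61 1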
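Stage I:
The largest value of X1 is 4/2 = 2, and we have 0, 1 and 2 as alternatives for X1, and
f1*(S) = max [31X1 + f2*(S - 2X1)]
Value of f1(S, X1) for X1 = 0, 1, 2, and the optimum solution
S 0 1 2 f1*(S) X1*
0 0+0=0 - - 0 0
1 0+14=14 - - 14 0
2 0+28=28 31+0=31 - 31 1
3 0+47=47 31+14=45 - 47 0
4 0+61=61 31+28=59 62+0=62 62 2
For the vessel capacity W=4 the optimum solution is X1*=2; X2*=0; X3*=0, with a
maximum revenue of 62 (thousand dollars).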
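The same backward recursion can also be written top-down with memoization, which avoids building the stage tables by hand. This is only a sketch: the names `best`, `WEIGHTS` and `REVENUES` are illustrative, and the data are the wi and ri values from the table in the problem statement.

```python
from functools import lru_cache

WEIGHTS = (2, 3, 1)       # w_i in tons, from the problem table
REVENUES = (31, 47, 14)   # r_i in thousands of dollars, from the problem table


@lru_cache(maxsize=None)
def best(stage, s):
    """Return (f*, plan) for items stage..last with s tons still free."""
    if stage == len(WEIGHTS):
        return 0, ()
    w, r = WEIGHTS[stage], REVENUES[stage]
    options = []
    for x in range(s // w + 1):          # feasible counts of the current item
        value, plan = best(stage + 1, s - w * x)
        options.append((r * x + value, (x,) + plan))
    return max(options, key=lambda o: o[0])


revenue, plan = best(0, 4)
print(revenue, plan)
```

Each call `best(stage, s)` corresponds to one cell of a stage table, and the cache ensures that every state is evaluated only once.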
Solution:
To solve the problem by D.P, we first decompose it into stages as delineated by the
vertical dashed line in figure. Next, we carry out the computations for each stage
separately.
The general idea is to compute the shortest (cumulative) distances to all the terminal
nodes of a stage and then use these distances as input data to the immediately succeeding
stage. Stage 1 includes three nodes (2, 3, 4) and its computations are simple.
Stage I:
Shortest distance to node 2= 7 miles (from node 1)
Shortest distance to node 3= 8 miles (from node 1)
Shortest distance to node 4= 5 miles (from node 1)
Stage II:
Stage 2 has two end nodes (5 and 6). Considering node 5 first, we see from the figure that
there are three possible routes to reach node 5, namely (2, 5), (3, 5) and (4, 5). This
information, together with the shortest distances to nodes 2, 3 and 4, determines the
shortest (cumulative) distance to node 5 as
Shortest distance to node 5
Stage III:
Shortest distance to node 7:=min
Therefore, shortest distance from node 1 to node 7 = 21 miles.
From stage III, node 7 is linked to node 5. Next, node 5 is linked to node 4. Finally,
node 4 is linked to node 1. Thus the shortest route is 1 → 4 → 5 → 7.
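The stage-by-stage computation can be sketched as follows. Only the three arcs leaving node 1 (7, 8 and 5 miles) are stated in the text; every other arc length below is an assumed value, picked for illustration so that the result agrees with the 21-mile route 1 → 4 → 5 → 7. The names `ARCS`, `STAGES` and `shortest_route` are likewise illustrative.

```python
# Arc lengths in miles. Only the arcs out of node 1 come from the text;
# all other lengths are ASSUMED for illustration.
ARCS = {
    (1, 2): 7, (1, 3): 8, (1, 4): 5,     # stage I (from the text)
    (2, 5): 12, (3, 5): 8, (4, 5): 7,    # stage II (assumed)
    (3, 6): 9, (4, 6): 13,               # stage II (assumed)
    (5, 7): 9, (6, 7): 6,                # stage III (assumed)
}
STAGES = [(2, 3, 4), (5, 6), (7,)]       # end nodes of each stage


def shortest_route(arcs, stages, source=1):
    """Forward recursion: shortest cumulative distance to each stage's end nodes."""
    dist = {source: 0}
    pred = {}
    for end_nodes in stages:
        for node in end_nodes:
            # every arc into this node from a node whose distance is known
            candidates = [
                (dist[i] + length, i)
                for (i, j), length in arcs.items()
                if j == node and i in dist
            ]
            dist[node], pred[node] = min(candidates)
    # retrace the predecessor links from the final node back to the source
    node = stages[-1][-1]
    route = [node]
    while node != source:
        node = pred[node]
        route.append(node)
    return dist[route[0]], route[::-1]


distance, route = shortest_route(ARCS, STAGES)
print(distance, route)
```

The distances found in one stage are the only input to the next stage, which is the point the text makes about using cumulative distances as input data.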
Allocation:
1. A member of a certain political party is making plans for his election to the
parliament. He has received the services of six volunteer workers and wishes to
assign them to three districts in such a way as to maximize their effectiveness. He
feels that it would be inefficient to assign a worker to more than one district,
but he is willing to assign no workers to any one of the districts if they can
accomplish more in the other districts. The following table gives the estimated increase
in the number of votes in his favour in each district if it were allocated various
numbers of workers.
Number of Districts
workers 1 2 3
0 0 0 0
1 25 20 33
2 42 38 43
3 55 54 47
4 63 65 50
5 69 73 52
6 74 80 53
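How many of the six workers should be assigned to each of the three districts in
order to maximize the total estimated increase in the number of votes in his favour?
Solution:
Maximize Z = V1(X1) + V2(X2) + V3(X3)
Subject to: X1 + X2 + X3 = 6; X1, X2, X3 ≥ 0 and integers,
where Vj(Xj) is the estimated increase in votes when Xj workers are assigned to district j.
Recursive relation (backward, starting from district 3):
f3*(S) = max V3(X3), 0 ≤ X3 ≤ S
If fj (S, Xj) is the return associated with decision Xj when S workers remain for
districts j to 3, then
fj*(S) = max [Vj(Xj) + f(j+1)*(S - Xj)], 0 ≤ Xj ≤ S, j = 2, 1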
Stage-3
S f3*(S) X3*
0 0 0
1 33 1
2 43 2
3 47 3
4 50 4
5 52 5
6 53 6
Stage-2:
Value of f2(S, X2) = V2(X2) + f3*(S-X2) for X2 = 0, 1, ..., 6, and the optimum solution
S 0 1 2 3 4 5 6 f2*(S) X2*
0 0+0=0 - - - - - - 0 0
1 0+33=33 20+0=20 - - - - - 33 0
2 0+43=43 20+33=53 38+0=38 - - - - 53 1
3 0+47=47 20+43=63 38+33=71 54+0=54 - - - 71 2
4 0+50=50 20+47=67 38+43=81 54+33=87 65+0=65 - - 87 3
5 0+52=52 20+50=70 38+47=85 54+43=97 65+33=98 73+0=73 - 98 4
6 0+53=53 20+52=72 38+50=88 54+47=101 65+43=108 73+33=106 80+0=80 108 4
Stage 1:
Value of f1(S, X1) = V1(X1) + f2*(S-X1) for X1 = 0, 1, ..., 6, and the optimum solution
S 0 1 2 3 4 5 6 f1*(S) X1*
0 0+0=0 - - - - - - 0 0
1 0+33=33 25+0=25 - - - - - 33 0
2 0+53=53 25+33=58 42+0=42 - - - - 58 1
3 0+71=71 25+53=78 42+33=75 55+0=55 - - - 78 1
4 0+87=87 25+71=96 42+53=95 55+33=88 63+0=63 - - 96 1
5 0+98=98 25+87=112 42+71=113 55+53=108 63+33=96 69+0=69 - 113 2
6 0+108=108 25+98=123 42+87=129 55+71=126 63+53=116 69+33=102 74+0=74 129 2
X1*= 2; X2*=3; X3*=1; maximum total increase = 129 votes.
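The stage tables above can be reproduced with a short sketch of the backward recursion. The vote table is copied from the problem; the names `VOTES` and `allocate` are illustrative. One simplification is assumed: leftover workers are allowed after the last district, which gives the same optimum here because every column of the table is non-decreasing.

```python
VOTES = [
    [0, 25, 42, 55, 63, 69, 74],  # district 1, for 0..6 workers
    [0, 20, 38, 54, 65, 73, 80],  # district 2
    [0, 33, 43, 47, 50, 52, 53],  # district 3
]


def allocate(returns, total):
    """Backward recursion: f_j*(S) = max over x of [V_j(x) + f_{j+1}*(S - x)]."""
    f_next = [0] * (total + 1)   # return after the last stage (ASSUMED 0 for leftovers)
    policy = []
    for table in reversed(returns):
        f, choice = [], []
        for s in range(total + 1):
            x_best = max(range(s + 1), key=lambda x: table[x] + f_next[s - x])
            choice.append(x_best)
            f.append(table[x_best] + f_next[s - x_best])
        f_next = f
        policy.append(choice)
    policy.reverse()             # policy[j][s] now refers to district j + 1
    s, plan = total, []
    for choice in policy:        # retrace from district 1 onwards
        plan.append(choice[s])
        s -= choice[s]
    return f_next[total], plan


votes, plan = allocate(VOTES, 6)
print(votes, plan)
```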
2. The owner of a chain of four grocery stores has purchased six crates of fresh
strawberries. The estimated probability distribution of potential sales of the
strawberries before spoilage differs among the four stores. The following table
gives the estimated total expected profit at each store, when it is allocated
various numbers of crates.
For administrative reasons, the owner does not wish to split crates between
stores. However, he is willing to distribute zero crates to any of his stores.
Find the allocation of the six crates to the four stores so as to maximize the expected profit.
Number of crates    Store 1    Store 2    Store 3    Store 4
0 0 0 0 0
1 4 2 6 2
2 6 4 8 3
3 7 6 8 4
4 7 8 8 4
5 7 9 8 4
6 7 10 8 4
Solution:
Let Xj be the number of crates allocated at the jth stage j=1, 2, 3, 4;
Pj (Xj) = expected profit from allocation of Xj crates to store j.
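Now the problem can be formulated as follows:
Max Z = P1 (X1) + P2 (X2) + P3 (X3) + P4 (X4)
Subject to: X1 + X2 + X3 + X4 = 6 and
X1, X2, X3, X4 ≥ 0 and integers
Recursive equations (backward, starting from store 4):
f4*(S) = max P4 (X4), 0 ≤ X4 ≤ S
If fj (S, Xj) denotes the profit associated with decision Xj when S crates remain for
stores j to 4 (j=1,2,3,4), then
fj*(S) = max [Pj (Xj) + f(j+1)*(S - Xj)], 0 ≤ Xj ≤ S, j = 3, 2, 1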
Stage 4:
S f4* X4*
0 0 0
1 2 1
2 3 2
3 4 3
4 4 3, 4
5 4 3, 4, 5
6 4 3, 4, 5, 6
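Stage 3:
f3*(S) = max [P3 (X3) + f4*(S - X3)]
S f3* X3*
0 0 0
1 6 1
2 8 1, 2
3 10 2
4 11 2
5 12 2
6 12 2, 3
Stage 2:
Value of f2(S, X2) = P2 (X2) + f3*(S - X2) for X2 = 0, 1, ..., 6, and the optimum solution
S 0 1 2 3 4 5 6 f2*(S) X2*
0 0 - - - - - - 0 0
1 6 2 - - - - - 6 0
2 8 8 4 - - - - 8 0, 1
3 10 10 10 6 - - - 10 0, 1, 2
4 11 12 12 12 8 - - 12 1, 2, 3
5 12 13 14 14 14 9 - 14 2, 3, 4
6 12 14 15 16 16 15 10 16 3, 4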
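Stage 1:
Value of f1(S, X1) = P1 (X1) + f2*(S - X1) for X1 = 0, 1, ..., 6, and the optimum solution
S 0 1 2 3 4 5 6 f1*(S) X1*
6 16 18 18 17 15 13 7 18 1, 2
From the above computations it is observed that the maximum profit of Rs. 18 can be
obtained by choosing any of the following eight alternative solutions; the allocations in
each row must add up to 6.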
Store 1 :X1 Store 2 : X2 Store 3: X3 Store 4: X4
1 2 2 1
1 3 1 1
1 3 2 0
1 4 1 0
2 1 2 1
2 2 1 1
2 2 2 0
2 3 1 0
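Because several decisions tie at each stage, it can be useful to enumerate every optimal allocation rather than a single one. The sketch below does this by following all decisions that attain the stage optimum. The profit table is copied from the problem; the names `PROFIT`, `solve` and `plans` are illustrative, and the requirement that all six crates be placed is enforced by a minus-infinity boundary value.

```python
PROFIT = [
    [0, 4, 6, 7, 7, 7, 7],    # store 1, for 0..6 crates
    [0, 2, 4, 6, 8, 9, 10],   # store 2
    [0, 6, 8, 8, 8, 8, 8],    # store 3
    [0, 2, 3, 4, 4, 4, 4],    # store 4
]
CRATES = 6


def solve(profit, crates):
    n = len(profit)
    # f[j][s]: best profit from stores j..n-1 when exactly s crates must be placed
    f = [[0] * (crates + 1) for _ in range(n + 1)]
    for s in range(1, crates + 1):
        f[n][s] = float("-inf")   # crates left after the last store are not allowed
    for j in range(n - 1, -1, -1):
        for s in range(crates + 1):
            f[j][s] = max(profit[j][x] + f[j + 1][s - x] for x in range(s + 1))

    def plans(j, s):
        """Yield every allocation for stores j..n-1 that attains f[j][s]."""
        if j == n:
            yield ()
            return
        for x in range(s + 1):
            if profit[j][x] + f[j + 1][s - x] == f[j][s]:
                for rest in plans(j + 1, s - x):
                    yield (x,) + rest

    return f[0][crates], sorted(plans(0, crates))


best_profit, allocations = solve(PROFIT, CRATES)
print(best_profit, len(allocations))
for row in allocations:
    print(row)
```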
1. Minimize Z =
Since the minimum value of the function occurs at y3=5, we have y1+y2=10 and y1=5, y2=5.
Hence the optimal policy is (5, 5, 5) with f3 (15) =75.
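The problem statement and the intermediate steps are not legible here. Assuming the problem is to minimize Z = y1² + y2² + y3² subject to y1 + y2 + y3 = 15 with all yj ≥ 0 (an assumption that is consistent with the stated result f3 (15) = 75), the recursion with state sj = y1 + ... + yj would run roughly as follows:

```latex
\begin{aligned}
f_1(s_1) &= s_1^2, \qquad s_1 = y_1,\\
f_2(s_2) &= \min_{0 \le y_2 \le s_2}\left[\, y_2^2 + (s_2 - y_2)^2 \,\right]
          = \frac{s_2^2}{2} \quad \text{at } y_2 = \frac{s_2}{2},\\
f_3(s_3) &= \min_{0 \le y_3 \le s_3}\left[\, y_3^2 + \frac{(s_3 - y_3)^2}{2} \,\right]
          = \frac{s_3^2}{3} \quad \text{at } y_3 = \frac{s_3}{3},\\
f_3(15) &= \frac{15^2}{3} = 75, \qquad y_3 = 5,\; y_2 = \frac{10}{2} = 5,\; y_1 = 5 .
\end{aligned}
```

Each minimum follows from setting the derivative of the bracketed expression to zero.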
2. Maximize Z=
Stage 1:
S1= y1 where y1 can vary from 1 to 4
with y1 integer
y1 1 2 3 4
f1(S1) 1 4 9 16
Stage 2:
With y1, y2 integers
y1 1 2 3 4
4 2 - 1
y2=
y1 1 2 4
f1(S1) 1 4 16
y2 y22
1 1 2 5 17
2 4 5 8
4 16 17
S2 1 2 4
f2(S2) 2 5 17
Stage 3:
=Max
S2 1 2 4
f2(S2) 2 5 17
y3 y32
1 1 3 6 18
2 4 6 9
4 16 18
S3 1 2 4
f3(S3) 3 6 18
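The stage tables can be cross-checked with a small recursive sketch. The problem statement is not legible in this handout, so the form used here is an assumption: maximize y1² + y2² + y3² subject to y1·y2·y3 ≤ 4 with positive integer yj, which reproduces f3 = 18. The function name `best` is illustrative, and unlike the tables it also evaluates the state 3.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def best(stages, bound):
    """Max of y1^2 + ... + y_stages^2 with y1 * ... * y_stages <= bound, y >= 1.

    ASSUMED problem form; returns (optimal value, one optimal policy).
    """
    if stages == 1:
        return bound * bound, (bound,)
    options = []
    for y in range(1, bound + 1):
        # the remaining variables must have a product of at most bound // y
        value, rest = best(stages - 1, bound // y)
        options.append((y * y + value, rest + (y,)))
    return max(options, key=lambda o: o[0])


z_max, policy = best(3, 4)
print(z_max, policy)
```

Ties are possible (the value 4 can be assigned to any one of the three variables), so the sketch reports only one of the optimal policies.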
At stage 2:
At stage1:
Since y1 = S2-y2
f2 (S2) =
f3 (S3) =
Assignment problems
4. Max Z= 50 X1+100X2
Subject to: 2 X1+3 X2
X1+3X2
X1+X2
X1, X2 ≥ 0
Sol: X1=6, X2=12; Zmax=60
6. Max Z= 50x1+80x2
Subject to: x1 , x2 , 5x1+6x2 , x1+2x2 , x1,x2
Sol: Z= 7,000
x1=60, x2= 50.
7. Find the minimum distance from city A to city B in the figure shown below:
Solution:
Total distance = 3+3+6+4 = 16 km.
Route:
8. A truck can carry a total of 10 tons of commodities. Three types (A, B and C) of
commodities are to be carried. These commodities have the characteristics shown in
the following table:
9. A ship is to be loaded with certain items. Each unit of item ‘i’ has a weight wi and a
value vi (i=1, 2, 3). The maximum cargo weight permitted is W. Using the following
table, determine the most valuable cargo load which will not exceed the maximum
permissible weight.
i) When W=10;
i Wi vi
1 5 4
2 8 10
3 3 6
10. A company makes three types of products A, B and C. The raw material required and
the cash return for each product are shown in the table below. The total number of
boxes of raw material available for the given period is 10. The company wants to
select a product mix that maximizes the total cash return. Since a fractional unit of a
product is meaningless, the products must be measured in integer values.
Product    Boxes of raw material required per unit of product    Profit per unit (Rs.)
A 4 10
B 3 7
C 2 4
11. A student has to take examination in three courses. He has only 5 days left. Table
below indicates the relationships between the expected grade in a particular course
and the block of time spent studying it.
Block of time available (days)    Course A    Course B    Course C
0 0 0 0
1 4 3 2
2 6 5 4
3 7 6 5
4 9 7 7
5 10 8 9
12. Mr. X wishes to invest Rs. 50,000 among three possible investment projects. He
could invest all, part or none of his Rs. 50,000 in any one of the projects, but he has to
invest in units of Rs. 10,000. The expected return, which depends on the amount of
money invested, is shown in the table given below:
C 5000 550
D 3000 350
Determine the number of units of each package that would maximize the revenue,
given that the capacity of the van is limited to 10,000 Kgs. Use dynamic
programming approach.
2. a) What are the situations warranting the use of dynamic programming?
b) Six units of capital are available to invest in four business ventures. The returns
from each unit of investment in all the four ventures are given in the table below.
Find how the capital should be allocated to the business proposals in order to maximize
profit, using dynamic programming.
2 6 4 7 3
3 7 6 8 4
4 8 8 8 5
5 8 9 8 6
6 8 10 8 6
replacing some old inefficient machines by automatic machines. The electronic gadget
parts section was started only a few days back, and thus an additional amount can be
invested only by adding new machines to the section. The cost of adding and replacing
the machines, along with the associated expected returns in the different sections, is
given in the table below. Select a set of expansion plans which may yield the maximum
return. Use dynamic programming.
Alternatives            Automobile parts          Computer parts            Electronic gadgets
                        Cost (Rs)  Return (Rs)    Cost (Rs)  Return (Rs)    Cost (Rs)  Return (Rs)
No expansion            0          0              0          0              0          0
Add new machines        4000       8000           8000       12000          2000       8000
Replace old machines    6000       10000          12000      18000          -          -