DP AS AN OPTIMIZATION METHODOLOGY
• Generic optimization problem:
min g(u)
u∈U
where u is the optimization/decision variable,
g(u) is the cost function, and U is the constraint
set
• Categories of problems:
− Discrete (U is finite) or continuous
− Linear (g is linear and U is polyhedral) or
nonlinear
− Stochastic or deterministic: In stochastic problems the cost involves a random parameter w, which is averaged out, i.e., it has the form
      g(u) = E_{w}{ G(u, w) }
• DP can deal with complex stochastic problems
where information about w becomes available in
stages, and the decisions are also made in stages
and make use of this information.
BASIC STRUCTURE OF STOCHASTIC DP
• Discrete-time system
xk+1 = fk (xk , uk , wk ), k = 0, 1, . . . , N − 1
− k: Discrete time
− xk : State; summarizes past information that
is relevant for future optimization
− uk : Control; decision to be selected at time
k from a given set
− wk : Random parameter (also called disturbance or noise depending on the context)
− N : Horizon or number of times control is
applied
• Cost function that is additive over time
      E{ gN(xN) + Σ_{k=0}^{N−1} gk(xk, uk, wk) }
• Alternative system description: P (xk+1 | xk , uk )
xk+1 = wk with P (wk | xk , uk ) = P (xk+1 | xk , uk )
INVENTORY CONTROL EXAMPLE
[Figure: inventory system block diagram — stock xk and order uk enter the inventory system, demand wk is subtracted, giving xk+1 = xk + uk − wk; the cost of period k is c uk + r(xk + uk − wk)]
• Discrete-time system
xk+1 = fk (xk , uk , wk ) = xk + uk − wk
• Cost function that is additive over time
      E{ gN(xN) + Σ_{k=0}^{N−1} gk(xk, uk, wk) }
         = E{ Σ_{k=0}^{N−1} ( c uk + r(xk + uk − wk) ) }
(terminal cost gN ≡ 0 here)
• Optimization over policies: Rules/functions uk =
µk (xk ) that map states to controls
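The backward recursion for this example can be sketched numerically. Everything below (horizon, ordering cost c, holding/shortage cost r(·) = |·|, the uniform demand, and the truncated stock range) is a hypothetical instance chosen only to make the sketch runnable:

```python
# Backward DP for the inventory problem x_{k+1} = x_k + u_k - w_k.
# Hypothetical instance: N = 3 periods, ordering cost c = 1 per unit,
# holding/shortage cost r(x) = |x|, demand uniform on {0, 1, 2},
# stock truncated to [-4, 4] to keep the state space finite.
N, c = 3, 1.0
demands = [0, 1, 2]
states = list(range(-4, 5))
orders = [0, 1, 2]                    # admissible orders U_k(x_k)

def r(x):
    return abs(x)                     # assumed holding/shortage cost

J = {x: 0.0 for x in states}          # terminal cost g_N = 0
policy = []
for k in reversed(range(N)):
    Jk, mu = {}, {}
    for x in states:
        best_cost, best_u = float("inf"), None
        for u in orders:
            # expected stage cost plus cost-to-go, averaged over demand
            exp_cost = 0.0
            for w in demands:
                nxt = max(-4, min(4, x + u - w))   # truncate for the sketch
                exp_cost += (c * u + r(x + u - w) + J[nxt]) / len(demands)
            if exp_cost < best_cost:
                best_cost, best_u = exp_cost, u
        Jk[x], mu[x] = best_cost, best_u
    J, policy = Jk, [mu] + policy

print(J[0], policy[0][0])             # optimal cost and first order at x0 = 0
```

Note that the output µ0(·) is a rule mapping each stock level to an order, not a single number — this is exactly the "optimization over policies" above.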
ADDITIONAL ASSUMPTIONS
• The set of values that the control uk can take depends at most on xk and not on prior x or u
• Probability distribution of wk does not depend
on past values wk−1 , . . . , w0 , but may depend on
xk and uk
− Otherwise past values of w or x would be
useful for future optimization
• Sequence of events envisioned in period k:
− xk occurs according to
      xk = fk−1(xk−1, uk−1, wk−1)
− uk is selected with knowledge of xk , i.e.,
uk ∈ Uk (xk )
− wk is random and generated according to a
distribution
Pwk (xk , uk )
DETERMINISTIC FINITE-STATE PROBLEMS
• Scheduling example: Find optimal sequence of
operations A, B, C, D
• A must precede B, and C must precede D
• Given startup costs SA and SC , and setup transition cost Cmn from operation m to operation n
[Figure: state-transition graph for the scheduling problem — from the initial state, arcs with startup costs SA (to A) and SC (to C), then arcs with transition costs Cmn through the states AB, AC, CA, CD and on to ABC, ACB, ACD, CAB, CAD, CDA]
STOCHASTIC FINITE-STATE PROBLEMS
• Example: Find two-game chess match strategy
• Timid play draws with prob. pd > 0 and loses
with prob. 1 − pd . Bold play wins with prob. pw <
1/2 and loses with prob. 1 − pw
[Figure: transition diagrams for the 1st and 2nd game under timid and bold play — under timid play each game is drawn with probability pd and lost with probability 1 − pd; under bold play it is won with probability pw and lost with probability 1 − pw; scores range from 2-0 down to 0-2]
BASIC PROBLEM
• System xk+1 = fk (xk , uk , wk ), k = 0, . . . , N −1
• Control constraints uk ∈ Uk (xk )
• Probability distribution Pk (· | xk , uk ) of wk
• Policies π = {µ0 , . . . , µN −1 }, where µk maps
states xk into controls uk = µk (xk ) and is such
that µk (xk ) ∈ Uk (xk ) for all xk
• Expected cost of π starting at x0 is
      Jπ(x0) = E{ gN(xN) + Σ_{k=0}^{N−1} gk(xk, µk(xk), wk) }
• Optimal cost function
J ∗ (x0 ) = min Jπ (x0 )
π
• Optimal policy π ∗ satisfies
Jπ∗ (x0 ) = J ∗ (x0 )
When produced by DP, π ∗ is independent of x0 .
SIGNIFICANCE OF FEEDBACK
• Open-loop versus closed-loop policies
[Figure: closed-loop system block diagram — the controller µk produces uk = µk(xk), which drives the system xk+1 = fk(xk, uk, wk) together with the disturbance wk; the state xk is fed back to the controller]
• In deterministic problems open loop is as good
as closed loop
• Value of information; chess match example
• Example of open-loop policy: Play always bold
• Consider the closed-loop policy: Play timid if
and only if you are ahead
[Figure: transition diagram of the closed-loop policy — at 0-0 play bold (win pw → 1-0, lose 1 − pw → 0-1); when ahead at 1-0 play timid (draw pd → 1.5-0.5, lose 1 − pd → 1-1); when behind at 0-1 play bold (win pw → 1-1, lose 1 − pw)]
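The value of feedback in this example can be quantified. A minimal sketch, assuming (this is not stated above, but is the standard convention for this example) that a match tied at 1-1 after two games is decided by a single sudden-death game played bold:

```python
# Probability of winning the two-game match, for pw < 1/2 and pd > pw.
pw, pd = 0.45, 0.9   # illustrative values

# Open-loop: play bold in both games; a 1-1 tie is resolved by one more
# bold game (assumption), won with probability pw.
open_loop = pw**2 + 2*pw*(1 - pw)*pw

# Closed-loop: play timid iff ahead in the score.
# Game 1 bold: win (pw) -> timid at 1-0: draw (pd) wins 1.5-0.5,
#   loss (1 - pd) -> 1-1 -> sudden death won with pw;
# Game 1 loss (1 - pw) -> bold at 0-1: win (pw) -> 1-1 -> sudden death (pw).
closed_loop = pw*(pd + (1 - pd)*pw) + (1 - pw)*pw*pw

print(open_loop, closed_loop)
```

For these numbers the closed-loop policy wins the match with probability above 1/2 even though pw < 1/2 — information about the current score is worth real probability mass.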
VARIANTS OF DP PROBLEMS
• Continuous-time problems
• Imperfect state information problems
• Infinite horizon problems
• Suboptimal control
PRINCIPLE OF OPTIMALITY
• Let π ∗ = {µ∗0 , µ∗1 , . . . , µ∗N −1 } be an optimal policy
• Consider the “tail subproblem” whereby we are at xi at time i and wish to minimize the “cost-to-go” from time i to time N
      E{ gN(xN) + Σ_{k=i}^{N−1} gk(xk, µk(xk), wk) }
and the “tail policy” {µ∗i , µ∗i+1 , . . . , µ∗N −1 }
[Figure: time line 0 … i … N, with the tail subproblem starting at state xi at time i]
• Principle of optimality: The tail policy is optimal for the tail subproblem (optimization of the future does not depend on what we did in the past)
• DP first solves ALL tail subproblems of the final stage
• At the generic step, it solves ALL tail subproblems of a given time length, using the solution of the tail subproblems of shorter time length
DETERMINISTIC SCHEDULING EXAMPLE
• Find optimal sequence of operations A, B, C,
D (A must precede B and C must precede D)
[Figure: the scheduling problem as a graph — nodes are the partial schedules (A, C, AB, AC, CA, CD, ABC, ACB, ACD, CAB, CAD, CDA) reached from the initial state, and arcs carry the startup and transition costs]
• Start from the last tail subproblem and go backwards
• At each state-time pair, we record the optimal
cost-to-go and the optimal decision
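The backward recursion over (completed operations, last operation) can be sketched as follows. The startup and transition costs below are hypothetical, chosen only to make the sketch runnable; they are not the numbers from the figure:

```python
from functools import lru_cache

# Hypothetical costs: startup S[m] for the first operation (A or C only),
# transition C[(m, n)] from operation m to operation n.
S = {"A": 5, "C": 3}
C = {("A","B"): 2, ("A","C"): 3, ("A","D"): 4, ("B","C"): 1, ("B","D"): 6,
     ("C","A"): 3, ("C","B"): 4, ("C","D"): 6, ("D","A"): 2, ("D","B"): 3}

def allowed(op, done):
    # precedence constraints: A before B, C before D
    if op in done:
        return False
    if op == "B" and "A" not in done:
        return False
    if op == "D" and "C" not in done:
        return False
    return True

@lru_cache(maxsize=None)
def J(done, last):
    # optimal cost-to-go from state (set of completed ops, last op)
    if len(done) == 4:
        return 0.0
    return min(C[(last, op)] + J(done | frozenset([op]), op)
               for op in "ABCD" if allowed(op, done))

best = min(S[f] + J(frozenset([f]), f) for f in "AC")
print(best)
```

Each memoized call records the optimal cost-to-go at one state, exactly as in the backward pass described above.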
STOCHASTIC INVENTORY EXAMPLE
[Figure: inventory system block diagram, as before — xk+1 = xk + uk − wk, with cost of period k equal to c uk + r(xk + uk − wk)]
• Tail Subproblems of Length 1:
      JN−1(xN−1) = min_{uN−1 ≥ 0} E_{wN−1}{ c uN−1 + r(xN−1 + uN−1 − wN−1) }
• Tail Subproblems of Length N − k:
      Jk(xk) = min_{uk ≥ 0} E_{wk}{ c uk + r(xk + uk − wk) + Jk+1(xk + uk − wk) }
• J0 (x0 ) is opt. cost of initial state x0
DP ALGORITHM
• Start with
JN (xN ) = gN (xN ),
and go backwards using
      Jk(xk) = min_{uk ∈ Uk(xk)} E_{wk}{ gk(xk, uk, wk) + Jk+1( fk(xk, uk, wk) ) },   k = 0, 1, . . . , N − 1.
• Then J0 (x0 ), generated at the last step, is equal
to the optimal cost J ∗ (x0 ). Also, the policy
π ∗ = {µ∗0 , . . . , µ∗N −1 }
where µ∗k (xk ) minimizes in the right side above for
each xk and k, is optimal
• Justification: Proof by induction that Jk (xk ) is
equal to Jk∗ (xk ), defined as the optimal cost of the
tail subproblem that starts at time k at state xk
• Note:
− ALL the tail subproblems are solved (in addition to the original problem)
− Intensive computational requirements
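As a sanity check on the algorithm, here is a tiny finite-state instance where J0(x0) from the backward recursion is compared against exhaustive enumeration of every closed-loop policy. All the numbers below are arbitrary:

```python
import itertools

# Two states, two controls, horizon N = 2; arbitrary costs and transitions.
states, controls, N = [0, 1], [0, 1], 2
P = {(0,0): [0.7, 0.3], (0,1): [0.2, 0.8],     # P[(x,u)][y] = Prob(next = y)
     (1,0): [0.5, 0.5], (1,1): [0.9, 0.1]}
g = {(0,0): 1.0, (0,1): 0.4, (1,0): 0.3, (1,1): 2.0}   # stage cost g(x,u)
gN = {0: 0.0, 1: 1.5}                                  # terminal cost

# Backward DP: J_N = gN, then J_k(x) = min_u [ g(x,u) + E{ J_{k+1}(next) } ]
J = dict(gN)
for k in range(N):
    J = {x: min(g[(x, u)] + sum(P[(x, u)][y] * J[y] for y in states)
                for u in controls)
         for x in states}

# Exhaustive check: evaluate every policy mu_k(x), k = 0, ..., N-1
def evaluate(pol, x, k):
    if k == N:
        return gN[x]
    u = pol[k][x]
    return g[(x, u)] + sum(P[(x, u)][y] * evaluate(pol, y, k + 1) for y in states)

best = min(evaluate(pol, 0, 0)
           for pol in itertools.product(itertools.product(controls, repeat=2),
                                        repeat=N))
print(J[0], best)    # the two agree
```

The enumeration grows exponentially in the horizon and state count, which is precisely the "intensive computational requirements" that the single backward pass avoids.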
PROOF OF THE INDUCTION STEP
• Let πk = {µk , µk+1 , . . . , µN −1 } denote a tail policy from time k onward
• Assume that Jk+1(xk+1) = Jk+1∗(xk+1). Then

Jk∗(xk) = min_{(µk, πk+1)} E_{wk,...,wN−1}{ gk(xk, µk(xk), wk) + gN(xN) + Σ_{i=k+1}^{N−1} gi(xi, µi(xi), wi) }

        = min_{µk} E_{wk}{ gk(xk, µk(xk), wk)
              + min_{πk+1} E_{wk+1,...,wN−1}[ gN(xN) + Σ_{i=k+1}^{N−1} gi(xi, µi(xi), wi) ] }

        = min_{µk} E_{wk}{ gk(xk, µk(xk), wk) + Jk+1∗( fk(xk, µk(xk), wk) ) }

        = min_{µk} E_{wk}{ gk(xk, µk(xk), wk) + Jk+1( fk(xk, µk(xk), wk) ) }

        = min_{uk ∈ Uk(xk)} E_{wk}{ gk(xk, uk, wk) + Jk+1( fk(xk, uk, wk) ) }

        = Jk(xk)
LINEAR-QUADRATIC ANALYTICAL EXAMPLE
[Figure: two ovens in series — material at initial temperature x0 enters Oven 1 (temperature u0), exits at temperature x1, enters Oven 2 (temperature u1), and exits at final temperature x2]
• System
xk+1 = (1 − a)xk + auk , k = 0, 1,
where a is a given scalar from the interval (0, 1)
• Cost
r(x2 − T )2 + u20 + u21
where r is a given positive scalar
• DP Algorithm:
      J2(x2) = r(x2 − T)^2
      J1(x1) = min_{u1} [ u1^2 + r( (1 − a)x1 + a u1 − T )^2 ]
      J0(x0) = min_{u0} [ u0^2 + J1( (1 − a)x0 + a u0 ) ]
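Carrying out the minimization in J1 in closed form (set the derivative of u1^2 + r((1 − a)x1 + a u1 − T)^2 with respect to u1 to zero) gives u1* = r a (T − (1 − a)x1) / (1 + r a^2). A quick numeric check, with arbitrary values for a, r, T, x1:

```python
# Verify the closed-form minimizer of the J1 minimization by grid search.
a, r, T, x1 = 0.7, 2.0, 100.0, 60.0    # arbitrary instance

def stage(u1):
    # the quantity minimized in the J1 step of the DP algorithm
    return u1**2 + r*((1 - a)*x1 + a*u1 - T)**2

u_star = r*a*(T - (1 - a)*x1) / (1 + r*a**2)
grid = [u_star + d/100.0 for d in range(-500, 501)]
u_best = min(grid, key=stage)
print(u_star, u_best)    # grid minimizer coincides with the analytic one
```

Because the objective is a strictly convex quadratic in u1, the stationary point is the unique minimizer; substituting it back makes J1 quadratic in x1, which is what keeps the J0 step tractable as well.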
STATE AUGMENTATION
• When assumptions of the basic problem are violated (e.g., disturbances are correlated, cost is nonadditive, etc.), reformulate/augment the state
• DP algorithm still applies, but the problem gets
BIGGER
• Example: Time lags
xk+1 = fk (xk , xk−1 , uk , wk )
• Introduce additional state variable yk = xk−1 .
New system takes the form
      xk+1 = fk(xk, yk, uk, wk),    yk+1 = xk
View x̃k = (xk , yk ) as the new state.
• DP algorithm for the reformulated problem:
      Jk(xk, xk−1) = min_{uk ∈ Uk(xk)} E_{wk}{ gk(xk, uk, wk) + Jk+1( fk(xk, xk−1, uk, wk), xk ) }
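A minimal sketch of the augmentation, using a hypothetical linear lagged system f(x, y, u) = 0.5x + 0.3y + u: the pair (xk, yk) with yk = xk−1 evolves as an ordinary first-order system.

```python
def f(x, y, u):
    # hypothetical dynamics with a one-step lag: x_{k+1} = f(x_k, x_{k-1}, u_k)
    return 0.5*x + 0.3*y + u

def f_tilde(state, u):
    # augmented first-order dynamics on the pair (x_k, y_k), y_k = x_{k-1}
    x, y = state
    return (f(x, y, u), x)

state = (1.0, 0.0)             # (x_0, x_{-1})
for u in (0.1, 0.2, 0.3):
    state = f_tilde(state, u)
print(state)                   # (x_3, x_2)
```

The augmented state doubles in size here; in general this is the price of restoring the Markov structure that the DP algorithm requires.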