
Reinforcement Learning and Optimal Control

by
Dimitri P. Bertsekas
Massachusetts Institute of Technology

DRAFT TEXTBOOK
This is a draft of a textbook that is scheduled to be finalized in 2019,
and to be published by Athena Scientific. It represents “work in progress,”
and it will be periodically updated. It more than likely contains errors
(hopefully not serious ones). Furthermore, its references to the literature
are incomplete. Your comments and suggestions to the author at
dimitrib@mit.edu are welcome. The date of last revision is given below.

December 14, 2018

WWW site for book information and orders

http://www.athenasc.com

Athena Scientific, Belmont, Massachusetts


Athena Scientific
Post Office Box 805
Nashua, NH 03060
U.S.A.

Email: info@athenasc.com
WWW: http://www.athenasc.com

Publisher’s Cataloging-in-Publication Data

Bertsekas, Dimitri P.
Reinforcement Learning and Optimal Control
Includes Bibliography and Index
1. Mathematical Optimization. 2. Dynamic Programming. I. Title.
QA402.5 .B465 2019 519.703 00-91281

ISBN-10: 1-886529-39-6, ISBN-13: 978-1-886529-39-7


ABOUT THE AUTHOR

Dimitri Bertsekas studied Mechanical and Electrical Engineering at the
National Technical University of Athens, Greece, and obtained his Ph.D.
in system science from the Massachusetts Institute of Technology. He has
held faculty positions with the Engineering-Economic Systems Department,
Stanford University, and the Electrical Engineering Department of the
University of Illinois, Urbana. Since 1979 he has been teaching at the
Electrical Engineering and Computer Science Department of the Massachusetts
Institute of Technology (M.I.T.), where he is currently the McAfee Professor
of Engineering.
His teaching and research span several fields, including deterministic
optimization, dynamic programming and stochastic control, large-scale
and distributed computation, and data communication networks. He has
authored or coauthored numerous research papers and seventeen books,
several of which are currently used as textbooks in MIT classes, including
“Dynamic Programming and Optimal Control,” “Data Networks,” “Introduction
to Probability,” and “Nonlinear Programming.”
Professor Bertsekas was awarded the INFORMS 1997 Prize for Research
Excellence in the Interface Between Operations Research and Computer
Science for his book “Neuro-Dynamic Programming” (co-authored with
John Tsitsiklis), the 2001 AACC John R. Ragazzini Education Award,
the 2009 INFORMS Expository Writing Award, the 2014 AACC Richard
Bellman Heritage Award, the 2014 Khachiyan Prize for Life-Time
Accomplishments in Optimization, the 2015 George B. Dantzig Prize, and
the 2018 John von Neumann Theory Prize. In 2001, he was elected to the
United States National Academy of Engineering for “pioneering contributions
to fundamental research, practice and education of optimization/control
theory, and especially its application to data communication networks.”

ATHENA SCIENTIFIC
OPTIMIZATION AND COMPUTATION SERIES

1. Abstract Dynamic Programming, 2nd Edition, by Dimitri P. Bertsekas,
   2018, ISBN 978-1-886529-46-5, 360 pages
2. Dynamic Programming and Optimal Control, Two-Volume Set, by
   Dimitri P. Bertsekas, 2017, ISBN 1-886529-08-6, 1270 pages
3. Nonlinear Programming, 3rd Edition, by Dimitri P. Bertsekas, 2016,
   ISBN 1-886529-05-1, 880 pages
4. Convex Optimization Algorithms, by Dimitri P. Bertsekas, 2015,
   ISBN 978-1-886529-28-1, 576 pages
5. Convex Optimization Theory, by Dimitri P. Bertsekas, 2009,
   ISBN 978-1-886529-31-1, 256 pages
6. Introduction to Probability, 2nd Edition, by Dimitri P. Bertsekas and
   John N. Tsitsiklis, 2008, ISBN 978-1-886529-23-6, 544 pages
7. Convex Analysis and Optimization, by Dimitri P. Bertsekas, Angelia
   Nedić, and Asuman E. Ozdaglar, 2003, ISBN 1-886529-45-0, 560 pages
8. Network Optimization: Continuous and Discrete Models, by Dimitri P.
   Bertsekas, 1998, ISBN 1-886529-02-7, 608 pages
9. Network Flows and Monotropic Optimization, by R. Tyrrell Rockafellar,
   1998, ISBN 1-886529-06-X, 634 pages
10. Introduction to Linear Optimization, by Dimitris Bertsimas and
    John N. Tsitsiklis, 1997, ISBN 1-886529-19-1, 608 pages
11. Parallel and Distributed Computation: Numerical Methods, by Dimitri P.
    Bertsekas and John N. Tsitsiklis, 1997, ISBN 1-886529-01-9, 718 pages
12. Neuro-Dynamic Programming, by Dimitri P. Bertsekas and John N.
    Tsitsiklis, 1996, ISBN 1-886529-10-8, 512 pages
13. Constrained Optimization and Lagrange Multiplier Methods, by
    Dimitri P. Bertsekas, 1996, ISBN 1-886529-04-3, 410 pages
14. Stochastic Optimal Control: The Discrete-Time Case, by Dimitri P.
    Bertsekas and Steven E. Shreve, 1996, ISBN 1-886529-03-5, 330 pages

Contents

1. Exact Dynamic Programming

1.1. Deterministic Dynamic Programming (p. 2)
1.1.1. Deterministic Problems (p. 2)
1.1.2. The Dynamic Programming Algorithm (p. 7)
1.1.3. Approximation in Value Space (p. 12)
1.1.4. Model-Free Approximate Solution - Q-Learning (p. 13)
1.2. Stochastic Dynamic Programming (p. 14)
1.3. Examples, Variations, and Simplifications (p. 17)
1.3.1. Deterministic Shortest Path Problems (p. 18)
1.3.2. Discrete Deterministic Optimization (p. 19)
1.3.3. Problems with a Terminal State (p. 23)
1.3.4. Forecasts (p. 26)
1.3.5. Problems with Uncontrollable State Components (p. 27)
1.3.6. Partial State Information and Belief States (p. 32)
1.3.7. Linear Quadratic Optimal Control (p. 35)
1.4. Reinforcement Learning and Optimal Control - Some Terminology (p. 38)
1.5. Notes and Sources (p. 40)

2. Approximation in Value Space

2.1. Variants of Approximation in Value Space (p. 3)
2.1.1. Off-Line and On-Line Methods (p. 4)
2.1.2. Simplifying the Lookahead Minimization (p. 5)
2.1.3. Model-Free Approximation in Value and Policy Space (p. 6)
2.1.4. When is Approximation in Value Space Effective? (p. 9)
2.2. Multistep Lookahead (p. 10)
2.2.1. Multistep Lookahead and Rolling Horizon (p. 11)
2.2.2. Multistep Lookahead and Deterministic Problems (p. 13)
2.3. Problem Approximation (p. 14)
2.3.1. Enforced Decomposition (p. 15)
2.3.2. Probabilistic Approximation - Certainty Equivalent Control (p. 21)
2.4. Rollout and Model Predictive Control (p. 27)
2.4.1. Rollout for Deterministic Problems (p. 27)
2.4.2. Stochastic Rollout and Monte Carlo Tree Search (p. 34)
2.4.3. Model Predictive Control (p. 41)
2.5. Notes and Sources (p. 46)

3. Parametric Approximation

3.1. Approximation Architectures (p. 2)
3.1.1. Linear and Nonlinear Feature-Based Architectures (p. 2)
3.1.2. Training of Linear and Nonlinear Architectures (p. 7)
3.1.3. Incremental Gradient and Newton Methods (p. 9)
3.2. Neural Networks (p. 21)
3.2.1. Training of Neural Networks (p. 24)
3.2.2. Multilayer and Deep Neural Networks (p. 26)
3.3. Sequential Dynamic Programming Approximation (p. 29)
3.4. Q-factor Parametric Approximation (p. 31)
3.5. Notes and Sources (p. 33)

4. Infinite Horizon Reinforcement Learning

4.1. An Overview of Infinite Horizon Problems (p. 2)
4.2. Stochastic Shortest Path Problems (p. 5)
4.3. Discounted Problems (p. 14)
4.4. Exact and Approximate Value Iteration (p. 19)
4.5. Policy Iteration (p. 22)
4.5.1. Exact Policy Iteration (p. 22)
4.5.2. Policy Iteration for Q-factors (p. 27)
4.5.3. Limited Lookahead Policies and Rollout (p. 28)
4.5.4. Approximate Policy Iteration - Error Bounds (p. 30)
4.6. Simulation-Based Policy Iteration with Parametric Approximation (p. 34)
4.6.1. Self-Learning and Actor-Critic Systems (p. 34)
4.6.2. A Model-Based Variant (p. 35)
4.6.3. A Model-Free Variant (p. 37)
4.6.4. Issues Relating to Approximate Policy Iteration (p. 39)
4.7. Exact and Approximate Linear Programming (p. 42)
4.8. Q-Learning (p. 44)
4.9. Additional Methods - Temporal Differences (p. 47)
4.10. Approximation in Policy Space (p. 58)
4.11. Notes and Sources (p. 60)
4.12. Appendix: Mathematical Analysis (p. 63)
4.12.1. Proofs for Stochastic Shortest Path Problems (p. 63)
4.12.2. Proofs for Discounted Problems (p. 69)
4.12.3. Convergence of Exact Policy Iteration (p. 69)
4.12.4. Error Bounds for Approximate Policy Iteration (p. 70)

5. Aggregation

5.1. Aggregation Frameworks
5.2. Classical and Biased Forms of the Aggregate Problem
5.3. Bellman’s Equation for the Aggregate Problem
5.4. Algorithms for the Aggregate Problem
5.5. Some Examples
5.6. Spatiotemporal Aggregation for Deterministic Problems
5.7. Notes and Sources

References

Index
Preface
In this book we consider large and challenging multistage decision
problems, which can in principle be solved by dynamic programming (DP
for short), but whose exact solution is computationally intractable. We
discuss solution methods that rely on approximations to produce suboptimal
policies with adequate performance. These methods are collectively referred
to as reinforcement learning, and also by alternative names such as
approximate dynamic programming and neuro-dynamic programming.
Our subject has benefited greatly from the interplay of ideas from
optimal control and from artificial intelligence. One of the aims of the
book is to explore the common boundary between these two fields and to
form a bridge that is accessible to workers with a background in either field.
Our primary focus will be on approximation in value space. Here, the
control at each state is obtained by limited lookahead with cost function
approximation, i.e., by optimization of the cost over a limited horizon, plus
an approximation of the optimal future cost, starting from the end of this
horizon. The latter cost, which we generally denote by J˜, is a function of
the state where we may be at the end of the horizon. It may be computed
by a variety of methods, possibly involving simulation and/or some given or
separately derived heuristic/suboptimal policy. The use of simulation often
allows for model-free implementations that do not require the availability
of a mathematical model, a major idea that has allowed the use of dynamic
programming beyond its classical boundaries.
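To make the lookahead idea concrete, here is a minimal sketch, not taken
from the book, of one-step lookahead for a deterministic problem with
finitely many controls at each state. The dynamics f, stage cost g,
admissible control set U, and the approximation J_tilde are placeholders
to be supplied for a specific problem.

def one_step_lookahead_control(x, U, f, g, J_tilde):
    """Return a control u in U(x) minimizing g(x, u) + J_tilde(f(x, u))."""
    best_u, best_cost = None, float("inf")
    for u in U(x):                           # enumerate the admissible controls at x
        cost = g(x, u) + J_tilde(f(x, u))    # stage cost plus approximate cost-to-go
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u

Multistep lookahead would replace the single application of f and g by an
optimization over a short horizon, with J_tilde applied at the end of that
horizon.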
We focus selectively on four types of methods for obtaining J˜:
(a) Problem approximation: Here J˜ is the optimal cost function of a
related simpler problem, which is solved by exact DP. Certainty
equivalent control and enforced decomposition schemes are discussed
in some detail.
(b) Rollout and model predictive control: Here J˜ is the cost function of
some known heuristic policy. The needed cost values to implement a
rollout policy are often calculated by simulation. While this method
applies to stochastic problems, the reliance on simulation favors
deterministic problems, including challenging combinatorial problems
for which heuristics may be readily implemented. Rollout may also
be combined with adaptive simulation and Monte Carlo tree search,
which have proved very effective in the context of games such as
backgammon, chess, Go, and others. A minimal code sketch of rollout
is given after this list.
Model predictive control was originally developed for continuous-space
optimal control problems that involve some goal state, e.g., the origin
in a classical control context. It can be viewed as a specialized
rollout method that is based on an optimization algorithm for reaching
a goal state.
(c) Parametric cost approximation: Here J˜ is chosen from within a
parametric class of functions, including neural networks, with the
parameters “optimized” or “trained” by using state-cost sample pairs
and some type of incremental least squares/regression algorithm; a
minimal sketch of such a fit for a linear feature-based architecture
is also given after this list. Approximate policy iteration and its
variants are covered in some detail, including several actor-critic
schemes. These include policy evaluation that involves temporal
difference-based training methods, and policy improvement that is
based on approximation in policy space.
(d) Aggregation: Here the cost function J˜ is the optimal cost function
of some approximation to the original problem, called the aggregate
problem, which has fewer states. The aggregate problem can be
formulated in a variety of ways, and may be solved by using exact
DP techniques. Its optimal cost function is then used as J˜ in a
limited lookahead scheme. Aggregation may also be used to provide
local improvements to parametric approximation schemes that involve
neural networks or linear feature-based architectures.
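To illustrate item (b), the following is a minimal sketch of rollout for a
deterministic problem, not taken from the book: the cost-to-go approximation
is simply the cost accumulated by simulating a given base heuristic. The
dynamics f, stage cost g, admissible control set U, base policy, simulation
horizon, and terminal test are all placeholders for a specific problem.

def heuristic_cost(x, f, g, base_policy, horizon, is_terminal):
    """Simulate the base heuristic from state x and return its accumulated cost."""
    total = 0.0
    for _ in range(horizon):
        if is_terminal(x):
            break
        u = base_policy(x)          # control chosen by the base heuristic
        total += g(x, u)            # accumulate the stage cost
        x = f(x, u)                 # advance the deterministic dynamics
    return total

def rollout_control(x, U, f, g, base_policy, horizon, is_terminal):
    """One-step lookahead using the base heuristic's cost as the approximation."""
    best_u, best_cost = None, float("inf")
    for u in U(x):
        cost = g(x, u) + heuristic_cost(f(x, u), f, g,
                                        base_policy, horizon, is_terminal)
        if cost < best_cost:
            best_u, best_cost = u, cost
    return best_u

Note that rollout_control has the same structure as the earlier lookahead
sketch, with J_tilde replaced by the simulated cost of the base heuristic;
this is the sense in which rollout is a special case of approximation in
value space.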
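For item (c), here is a minimal sketch, again not taken from the book, of an
incremental least-squares fit of a linear feature-based architecture
J_tilde(x) = features(x) . r from state-cost sample pairs. The feature map
features, its dimension dim, the step size, and the number of passes are
placeholder choices.

import numpy as np

def fit_linear_cost_approximation(samples, features, dim, step=0.01, passes=10):
    """Incremental (LMS-type) least-squares fit of J_tilde(x) = features(x) . r,
    given a list of (state, observed_cost) sample pairs.
    features(x) is assumed to return a NumPy array of length dim."""
    r = np.zeros(dim)
    for _ in range(passes):
        for x, cost in samples:
            phi = features(x)
            r += step * (cost - phi @ r) * phi   # gradient step on the squared error
    return r

def J_tilde(x, r, features):
    """Evaluate the fitted linear cost approximation at state x."""
    return features(x) @ r

The fitted J_tilde can then be plugged into the one-step lookahead sketch
given earlier.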
We have adopted a gradual expository approach, which proceeds
along three directions:
(1) From exact DP to approximate DP: We first discuss exact DP
algorithms, explain why they may be difficult to implement, and then
use them as the basis for approximations.
(2) From finite horizon to infinite horizon problems: We first discuss
finite horizon exact and approximate DP methodologies, which are
intuitive and mathematically simple, in Chapters 1-3. We then progress
to infinite horizon problems in Chapters 4 and 5.
(3) From model-based to model-free approaches: Reinforcement learning
methods offer a major potential benefit over classical DP approaches,
which were practiced exclusively up to the early 90s: they can be
implemented by using a simulator/computer model rather than a
mathematical model. In our presentation, we first discuss model-based
methods, and then we identify those methods that can be appropriately
modified to work with a simulator.
After the first chapter, each new class of methods is introduced as a
more sophisticated or generalized version of a simpler method introduced
earlier. Moreover, each type of method is illustrated by means of examples,
which should be helpful in providing insight into its use, but may also be
skipped selectively and without loss of continuity. Detailed solutions to
some of the simpler examples are given, and may illustrate some of the
implementation details.
The mathematical style of this book is somewhat different from that
of the author’s dynamic programming books [Ber12], [Ber17a], [Ber18a],
and the neuro-dynamic programming research monograph, written jointly
with John Tsitsiklis [BeT96]. While we rigorously present the theory of
finite and infinite horizon dynamic programming, and some fundamental
approximation methods, we rely more on intuitive explanations and less on
proof-based insights. Moreover, our mathematical requirements are modest:
calculus, elementary probability, and a minimal use of matrix-vector
algebra.
Furthermore, we present methods that are often successful in practice,
but have less than solid performance properties. This is a reflection of the
state of the art in the field: there are no methods that are guaranteed to
work for all or even most problems, but there are enough methods to try
on a given problem with a reasonable chance of success in the end. For this
process to work, however, it is important to have proper intuition into the
inner workings of each type of method, as well as an understanding of its
analytical and computational properties. To quote a statement from the
preface of the neuro-dynamic programming (NDP) monograph [BeT96]:
“It is primarily through an understanding of the mathematical structure of
the NDP methodology that we will be able to identify promising or solid
algorithms from the bewildering array of speculative proposals and claims
that can be found in the literature.”

Dimitri P. Bertsekas
Winter 2018
