Adaptive Dynamic Programming with Applications in Optimal Control, 1st Edition, Derong Liu

The document promotes a collection of ebooks related to adaptive dynamic programming and optimal control, highlighting various titles and their authors. It emphasizes the importance of optimal control techniques for nonlinear systems and discusses the challenges and methodologies involved, particularly the use of adaptive dynamic programming (ADP). The book aims to provide both theoretical insights and practical applications in the field of control engineering.


Advances in Industrial Control

Derong Liu
Qinglai Wei
Ding Wang
Xiong Yang
Hongliang Li

Adaptive Dynamic
Programming with
Applications in
Optimal Control
Advances in Industrial Control

Series editors
Michael J. Grimble, Glasgow, UK
Michael A. Johnson, Kidlington, UK
More information about this series at http://www.springer.com/series/1412
Derong Liu
Institute of Automation
Chinese Academy of Sciences
Beijing
China

Qinglai Wei
Institute of Automation
Chinese Academy of Sciences
Beijing
China

Ding Wang
Institute of Automation
Chinese Academy of Sciences
Beijing
China

Xiong Yang
Tianjin University
Tianjin
China

Hongliang Li
Tencent Inc.
Shenzhen
China

ISSN 1430-9491 ISSN 2193-1577 (electronic)

Advances in Industrial Control
ISBN 978-3-319-50813-9 ISBN 978-3-319-50815-3 (eBook)
DOI 10.1007/978-3-319-50815-3
Library of Congress Control Number: 2016959539

© Springer International Publishing AG 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made.

Printed on acid-free paper

This Springer imprint is published by Springer Nature

The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Foreword

Nowadays, nonlinearity is involved in all walks of life. It is a challenge for
engineers to design controllers for all kinds of nonlinear systems. To handle this
issue, various nonlinear control theories have been developed, such as theories of
adaptive control, optimal control, and robust control. Among these theories, the
theory of optimal control has drawn considerable attention over the past several
decades. This is mainly because optimal control provides an effective way to design
controllers with guaranteed robustness properties as well as capabilities of
optimization and resource conservation that are important in manufacturing, vehicle
emission control, aerospace systems, power systems, chemical engineering
processes, and many other applications.
The core challenge in deriving the solutions of nonlinear optimal control
problems is that it often boils down to solving certain Hamilton–Jacobi–Bellman
(HJB) equations. The HJB equations are nonlinear and difficult to solve for general
nonlinear dynamical systems. Indeed, no closed-form solution to such equations
exists, except for very special problems. Therefore, numerical solutions to HJB
equations have been developed by engineers. To obtain such numerical solutions, a
highly effective method known as adaptive/approximate dynamic programming
(ADP) can be used. A distinct advantage of ADP is that it can avoid the well-known
“curse of dimensionality” of dynamic programming while adaptively solving the
HJB equations. Due to this characteristic, many elegant ADP approaches and their
applications have been developed in the literature during the past several decades.
It is also notable that ADP techniques provide a link with cognitive
decision-making methods that are observed in the human brain, and thus, ADP has
become a main channel to achieve truly brain-like intelligence in human-engineered
automatic control systems.
Unlike most ADP books, the present book “Adaptive Dynamic Programming
with Applications in Optimal Control” focuses on the principles of emerging
optimal control techniques for nonlinear systems in both discrete-time and
continuous-time domains, and on creating applications of these optimal control
techniques. This book contains three themes:

1. Optimal control for discrete-time nonlinear dynamical systems, covering various
novel techniques used to derive optimal control in the discrete-time domain,
such as general value iteration, θ-ADP, finite approximation error-based value
iteration, policy iteration, generalized policy iteration, and error bounds analysis
of ADP.
2. Optimal control for continuous-time nonlinear systems, discussing the optimal
control for input-affine/input-nonaffine nonlinear systems, robust and optimal
guaranteed cost control for input-affine nonlinear systems, decentralized control
for interconnected nonlinear systems, and optimal control for differential games.
3. Applications, providing typical applications of optimal control approaches in the
areas of energy management in smart homes, coal gasification, and water gas
shift reaction.
This book provides timely and informative coverage of ADP, including both
rigorous derivations and insightful developments. It will help both specialists and
nonspecialists understand the new developments in the field of nonlinear optimal
control using online/offline learning techniques. Meanwhile, it will be beneficial for
engineers to apply the developed ADP methods to their own problems in practice.
I am sure you will enjoy reading this book.

Arlington, TX, USA Frank L. Lewis

September 2016
Series Editors’ Foreword

The series Advances in Industrial Control aims to report and encourage technology
transfer in control engineering. The rapid development of control technology has an
impact on all areas of the control discipline: new theory, new controllers, actuators,
sensors, new industrial processes, computer methods, new applications, new design
philosophies, and new challenges. Much of this development work resides in
industrial reports, feasibility study papers, and the reports of advanced collaborative
projects. The series offers an opportunity for researchers to present an extended
exposition of such new work in all aspects of industrial control for wider and rapid
dissemination.
The method of dynamic programming has a long history in the field of optimal
control. It dates back to those days when the subject of control was emerging in a
modern form in the 1950s and 1960s. It was devised by Richard Bellman who gave
it a modern revision in a publication of 1954 [1]. The name of Bellman became
linked to an optimality equation, key to the method, and like the name of Kalman
became uniquely associated with the early development of optimal control. One
notable extension to the method was that of differential dynamic programming due
to David Q. Mayne in 1966 and developed at length in the book by Jacobson and
Mayne [2]. Their new technique used locally quadratic models for the system
dynamics and cost functions and improved the convergence of the dynamic
programming method for optimal trajectory control problems.
Since those early days, the subject of control has taken many different directions,
but dynamic programming has always retained a place in the theory of optimal
control fundamentals. It is therefore instructive for the Advances in Industrial
Control monograph series to have a contribution that presents new ways of solving
dynamic programming and demonstrating these methods with some up-to-date
industrial problems. This monograph, Adaptive Dynamic Programming with
Applications in Optimal Control, by Derong Liu, Qinglai Wei, Ding Wang, Xiong
Yang and Hongliang Li, has precisely that objective.
The authors open the monograph with a very interesting and relevant discussion
of another computationally difficult problem, namely devising a computer program
to defeat human master players at the Chinese game of Go. Inspiration from the
better programming techniques used in the Go-master problem was used by the
authors to defeat the “curse of dimensionality” that arises in dynamic programming
methods.
More formally, the objective of the techniques reported in the monograph is to
control in an optimal fashion an unknown or uncertain nonlinear multivariable
system using recorded and instantaneous output signals. The algorithms’ technical
framework is then constructed through different categories of the usual state-space
nonlinear ordinary differential system model. The system model can be continuous
or discrete, have affine or nonaffine control inputs, be subject to no constraints, or
have constraints present. A set of 11 chapters contains the theory for various
formulations of the system features.
Since standard dynamic programming schemes suffer from various implementation
obstacles, adaptive dynamic programming procedures have been developed
to find computable practical suboptimal control solutions. A key technique used by
the authors is that of neural networks which are trained using recorded data and
updated, or “adapted,” to accommodate uncertain system knowledge. The theory
chapters are arranged in two parts: Part 1, Discrete-Time Systems (five chapters);
and Part 2, Continuous-Time Systems (five chapters).
An important feature of the monographs of the Advances in Industrial Control
series is a demonstration of potential or actual application to industrial problems.
After a comprehensive presentation of the theory of adaptive dynamic programming,
the authors devote Part 3 of their monograph to three chapter-length application
studies. Chapter 12 examines the scheduling of energy supplies in a smart
home environment, a topic and problem of considerable contemporary interest.
Chapter 13 uses a coal gasification process that is suitably challenging to
demonstrate the authors’ techniques. And finally, Chapter 14 concerns the control of the
water gas shift reaction. In this example, the data used was taken from a real-world
operational system.
This monograph is very comprehensive in its presentation of the adaptive
dynamic programming theory and has demonstrations with three challenging
processes. It should find a wide readership in both the industrial control engineering
and the academic control theory communities. Readers in other fields such as
computer science and chemical engineering may also find the monograph of
considerable interest.
Michael J. Grimble
Michael A. Johnson
Industrial Control Centre
University of Strathclyde
Glasgow, Scotland, UK

References

1. Bellman R (1954) The theory of dynamic programming. Bulletin of the American Mathematical Society 60(6):503–515
2. Jacobson DH, Mayne DQ (1970) Differential dynamic programming. American Elsevier Pub. Co., New York
Preface

With the rapid development in information science and technology, many
businesses and industries have undergone great changes, such as the chemical industry,
electric power engineering, the electronics industry, mechanical engineering,
transportation, and the logistics business. While the scale of industrial enterprises is
increasing, production equipment and industrial processes are becoming more and
more complex. For these complex systems, decision and control are necessary to
ensure that they perform properly and meet prescribed performance objectives.
Under these circumstances, designing safe, reliable, and efficient control for
complex systems is essential for our society. As modern systems become more
complex and performance requirements become more stringent, advanced control
methods are greatly needed to achieve guaranteed performance and satisfactory
goals.
In general, optimal control deals with the problem of finding a control law for a
given system such that a certain optimality criterion is achieved. The main
difference between optimal control of linear and nonlinear systems is that the latter
often requires solving the nonlinear Bellman equation instead of the Riccati
equation. Although dynamic programming is a conventional method for solving
optimization and optimal control problems, it often suffers from the “curse of
dimensionality.” To overcome this difficulty, based on function approximators such
as neural networks, adaptive/approximate dynamic programming (ADP) was
proposed by Werbos as a method for solving optimal control problems
forward-in-time.
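To make the contrast between the two equations concrete, the following worked statement may help. It is a sketch in generic discrete-time notation: the symbols F, U, J*, A, B, Q, R, and P are assumptions chosen for illustration and are not taken from the text above. For a system x_{k+1} = F(x_k, u_k) with stage cost U(x_k, u_k), the Bellman equation reads as in the first line; when the system is linear and the cost is quadratic, the optimal cost is quadratic in the state and the Bellman equation reduces to the algebraic Riccati equation in the last line.

```latex
% Bellman equation for x_{k+1} = F(x_k, u_k) with stage cost U(x_k, u_k)
J^*(x_k) = \min_{u_k} \left\{ U(x_k, u_k) + J^*\big(F(x_k, u_k)\big) \right\}

% Linear-quadratic special case (assumed notation):
% F(x, u) = A x + B u, \quad U(x, u) = x^\top Q x + u^\top R u, \quad J^*(x) = x^\top P x
P = Q + A^\top P A - A^\top P B \left( R + B^\top P B \right)^{-1} B^\top P A
```

In the general nonlinear case no such finite-dimensional reduction is available, which is why approximate solution methods are needed.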
This book presents the recent results of ADP with applications in optimal
control. It is composed of 14 chapters which cover most of the hot research areas of
ADP and are divided into three parts. Part I concerns discrete-time systems,
including five chapters from Chaps. 2 to 6. Part II concerns continuous-time
systems, including five chapters from Chaps. 7 to 11. Part III concerns applications,
including three chapters from Chaps. 12 to 14.
In Chap. 1, an introduction to the history of ADP is provided, including the basic
and iterative forms of ADP. The review begins with the origin of ADP and
describes the basic structures and the algorithm development in detail. Connections
between ADP and reinforcement learning are also discussed.
Part I: Discrete-Time Systems (Chaps. 2–6)
In Chap. 2, optimal control problems of discrete-time nonlinear dynamical systems,
including optimal regulation, optimal tracking control, and constrained optimal
control, are studied using a series of value iteration ADP approaches. First, an ADP
scheme based on general value iteration is developed to obtain near-optimal control
for discrete-time affine nonlinear systems with continuous state and control spaces.
The present scheme is also employed to solve infinite-horizon optimal tracking
control problems for a class of discrete-time nonlinear systems. In particular, using
the globalized dual heuristic programming technique, a value iteration-based
optimal control strategy of unknown discrete-time nonlinear dynamical systems
with input constraints is established as a case study. Second, an iterative θ-ADP
algorithm is given to solve the optimal control problem of infinite-horizon
discrete-time nonlinear systems, which shows that each of the iterative controls can
stabilize the nonlinear dynamical systems and the condition of initial admissible
control is avoided effectively.
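As a rough illustration of the value iteration idea summarized above, the following minimal sketch treats the simplest special case: a scalar linear system x_{k+1} = a*x_k + b*u_k with quadratic cost, for which every iterative value function has the form V_i(x) = p_i * x^2, so no neural network is needed. The system, the cost weights, and the function name are illustrative assumptions and are not taken from the book. The initial coefficient p0 stands in for an arbitrary positive semidefinite initial function.

```python
def value_iteration_lq(a, b, q, r, p0=0.0, iters=100):
    """Value iteration for x_{k+1} = a*x_k + b*u_k with stage cost q*x^2 + r*u^2.

    Hypothetical helper for illustration only. Each iterate is the quadratic
    value function V_i(x) = p_i * x^2; the list of coefficients p_i is returned.
    """
    p = p0
    history = [p]
    for _ in range(iters):
        # Minimizing r*u^2 + p*(a*x + b*u)^2 over u gives u = -gain * x.
        gain = a * b * p / (r + b * b * p)
        # Value update: stage cost plus the previous value at the next state.
        p = q + r * gain ** 2 + p * (a - b * gain) ** 2
        history.append(p)
    return history


if __name__ == "__main__":
    # Two different nonnegative initializations of the same (assumed) problem.
    for p0 in (0.0, 10.0):
        print(p0, value_iteration_lq(1.2, 1.0, 1.0, 1.0, p0=p0)[-1])
```

In this toy problem both initializations approach the same fixed point, which is the positive solution of the scalar Riccati equation.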
In Chap. 3, a series of iterative ADP algorithms is developed to solve the
infinite-horizon optimal control problems for discrete-time nonlinear dynamical
systems with finite approximation errors. Iterative control laws are obtained by
using the present algorithms such that the iterative value functions reach the
optimum. Then, the numerical optimal control problems are solved by a novel
numerical adaptive learning control scheme based on the ADP algorithm. Moreover, a
general value iteration algorithm with finite approximation errors is developed to
guarantee that the iterative value function converges to the solution of the Bellman
equation. The general value iteration algorithm permits an arbitrary positive
semidefinite function to initialize itself, which overcomes the disadvantage of
traditional value iteration algorithms.
In Chap. 4, a discrete-time policy iteration ADP method is developed to solve
the infinite-horizon optimal control problems for nonlinear dynamical systems. The
idea is to use an iterative ADP technique to obtain iterative control laws that
optimize the iterative value functions. The convergence, stability, and optimality
properties are analyzed for the policy iteration method for discrete-time nonlinear
dynamical systems, and it is shown that the iterative value functions are
nonincreasingly convergent to the optimal solution of the Bellman equation. It is also
proven that any of the iterative control laws obtained from the present policy
iteration algorithm can stabilize the nonlinear dynamical systems.
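The properties described above (nonincreasing value functions and stabilizing iterative control laws) can be observed in a minimal sketch of policy iteration for a scalar linear-quadratic problem. This is an illustrative special case under assumed dynamics x_{k+1} = a*x_k + b*u_k and stage cost q*x^2 + r*u^2, not the book's algorithm for general nonlinear systems; the function name and parameters are hypothetical. Each control law is u = -k*x, and the initial gain must be admissible, which here means |a - b*k0| < 1.

```python
def policy_iteration_lq(a, b, q, r, k0, iters=20):
    """Policy iteration for x_{k+1} = a*x_k + b*u_k with stage cost q*x^2 + r*u^2.

    Hypothetical helper for illustration only. Returns the value coefficients
    p_i (V_i(x) = p_i * x^2) and the gains k_i of the laws u = -k_i * x.
    """
    if abs(a - b * k0) >= 1.0:
        raise ValueError("initial gain must be stabilizing (admissible)")
    k = k0
    values, gains = [], []
    for _ in range(iters):
        closed_loop = a - b * k
        # Policy evaluation: p solves p = q + r*k^2 + closed_loop^2 * p.
        p = (q + r * k * k) / (1.0 - closed_loop ** 2)
        values.append(p)
        gains.append(k)
        # Policy improvement: greedy gain with respect to p * x^2.
        k = a * b * p / (r + b * b * p)
    return values, gains


if __name__ == "__main__":
    values, gains = policy_iteration_lq(1.2, 1.0, 1.0, 1.0, k0=1.0)
    for p, k in zip(values[:5], gains[:5]):
        print(p, k, abs(1.2 - 1.0 * k))
```

Unlike the value iteration sketch, this scheme needs an admissible starting gain, which is the trade-off the chapter summaries point to.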
In Chap. 5, a generalized policy iteration algorithm is developed to solve the
optimal control problems for infinite-horizon discrete-time nonlinear systems.
The generalized policy iteration algorithm builds on the interaction between the
policy iteration algorithm and the value iteration algorithm of ADP. It permits an arbitrary
positive semidefinite function to initialize the algorithm, where two iteration indices
are used for policy evaluation and policy improvement, respectively. The
monotonicity, convergence, admissibility, and optimality properties of the
generalized policy iteration algorithm are analyzed.
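One common way to picture the interplay between policy iteration and value iteration is to alternate a policy improvement step with a fixed number of policy evaluation backups. The sketch below does this for an assumed scalar linear-quadratic problem (x_{k+1} = a*x_k + b*u_k, stage cost q*x^2 + r*u^2). It is a simplified illustration, not the book's generalized policy iteration algorithm; the function name and the n_eval parameter are hypothetical. With n_eval = 1 the loop coincides with value iteration, and a larger n_eval moves it toward policy iteration.

```python
def generalized_policy_iteration_lq(a, b, q, r, p0=0.0, n_eval=3, iters=200):
    """Alternate one policy improvement with n_eval evaluation backups.

    Hypothetical helper for illustration only. The value estimate is
    V(x) = p * x^2 and the control law is u = -k * x.
    """
    p = p0
    history = [p]
    for _ in range(iters):
        # Policy improvement: greedy gain for the current value estimate.
        k = a * b * p / (r + b * b * p)
        # Partial policy evaluation: n_eval backups under the fixed gain k.
        for _ in range(n_eval):
            p = q + r * k * k + (a - b * k) ** 2 * p
        history.append(p)
    return history


if __name__ == "__main__":
    for n_eval in (1, 3, 10):
        print(n_eval, generalized_policy_iteration_lq(1.2, 1.0, 1.0, 1.0, n_eval=n_eval)[-1])
```

The two loop counters correspond loosely to the two iteration indices mentioned above: the outer one counts improvements and the inner one counts evaluation backups.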
In Chap. 6, error bounds of ADP algorithms are established for solving
undiscounted infinite-horizon optimal control problems of discrete-time deterministic
nonlinear systems. The error bounds for approximate value iteration based on a
novel error condition are developed. The error bounds for approximate policy
iteration and approximate optimistic policy iteration algorithms are also provided. It
is shown that the iterative approximate value function can converge to a finite
neighborhood of the optimal value function under some conditions. In addition,
error bounds are also established for Q-function of approximate policy iteration for
optimal control of unknown discounted discrete-time nonlinear systems. Neural
networks are used to approximate the Q-function and the control policy.
Part II: Continuous-Time Systems (Chaps. 7–11)
In Chap. 7, optimal control problems of continuous-time affine nonlinear dynamical
systems are studied using ADP approaches. First, an identifier–critic architecture
based on ADP methods is presented to derive the approximate optimal control for
uncertain continuous-time nonlinear dynamical systems. The identifier neural
network and the critic neural network are tuned simultaneously, while the restrictive
persistence of excitation condition is relaxed. Second, an ADP-based algorithm is
developed to solve the optimal control problems for continuous-time nonlinear
dynamical systems with control constraints. Only a single critic neural network is
utilized to derive the optimal control, and there is no special requirement on the
initial control.
In Chap. 8, the optimal control problems are considered for continuous-time
nonaffine nonlinear dynamical systems with completely unknown dynamics via
ADP methods. First, an ADP-based novel identifier–actor–critic architecture is
developed to provide approximate optimal control solutions for continuous-time
unknown nonaffine nonlinear dynamical systems, where the identifier is constructed
by a dynamic neural network to transform nonaffine nonlinear systems into a class
of affine nonlinear systems. Second, an ADP-based observer–critic architecture is
presented to obtain the approximate optimal control for nonaffine nonlinear
dynamical systems in the presence of unknown dynamics, where the observer is
composed of a three-layer feedforward neural network aiming to get the knowledge
of system states.
In Chap. 9, robust control and optimal guaranteed cost control of
continuous-time uncertain nonlinear systems are studied using the idea of ADP.
First, a novel strategy is established to design the robust controller for a class of
nonlinear systems with uncertainties based on an online policy iteration algorithm.
By properly choosing a cost function that reflects the uncertainties, regulation, and
control, the robust control problem is transformed into an optimal control problem,
which can be solved effectively under the framework of ADP. Then, the
infinite-horizon optimal guaranteed cost control problem of uncertain nonlinear
systems is investigated by employing the formulation of ADP-based online optimal
control design, which extends the application scope of ADP methods to nonlinear
and uncertain environments.
In Chap. 10, by using a neural network-based online learning optimal control
approach, a decentralized control strategy is developed to stabilize a class of
continuous-time large-scale interconnected nonlinear systems. The decentralized
control strategy of the overall system can be established by adding appropriate
feedback gains to the optimal control laws of isolated subsystems. Then, an online
policy iteration algorithm is presented to solve the Hamilton–Jacobi–Bellman
equations related to the optimal control problems. Furthermore, as a generalization,
a neural network-based decentralized control law is developed to stabilize the
large-scale interconnected nonlinear systems with unknown dynamics by using an
online model-free integral policy iteration algorithm.
In Chap. 11, differential game problems of continuous-time systems, including
two-player zero-sum games, multiplayer zero-sum games, and multiplayer
nonzero-sum games, are studied via a series of ADP approaches. First, an integral
policy iteration algorithm is developed to learn online the Nash equilibrium solution
of two-player zero-sum differential games with completely unknown
continuous-time linear dynamics. Second, multiplayer zero-sum differential games
for a class of continuous-time uncertain nonlinear systems are solved by using an
iterative ADP algorithm. Finally, an online synchronous approximate optimal
learning algorithm based on policy iteration is developed to solve multiplayer
nonzero-sum games of continuous-time nonlinear systems without requiring exact
knowledge of system dynamics.
Part III: Applications (Chaps. 12–14)
In Chap. 12, intelligent optimization methods based on ADP are applied to the
challenges of intelligent price-responsive management of residential energy, with
an emphasis on home battery use connected to the power grid. First, an
action-dependent heuristic dynamic programming method is developed to obtain the
optimal control law for residential energy management. Second, a dual iterative
Q-learning algorithm is developed to solve the optimal battery management and
control problem in smart residential environments, where two iterations are
introduced, namely internal and external iterations. Based on the dual
iterative Q-learning algorithm, the convergence property of the iterative Q-learning
method for the optimal battery management and control problem is proven. Finally,
a distributed iterative ADP method is developed to solve the multibattery optimal
coordination control problem for home energy management systems.
In Chap. 13, a coal gasification optimal tracking control problem is solved
through a data-based iterative optimal learning control scheme by using an iterative
ADP approach. According to system data, neural networks are used to construct the
dynamics of the coal gasification process, coal quality, and reference control,
respectively. Via system transformation, the optimal tracking control problem with
approximation errors and disturbances is effectively transformed into a two-person
zero-sum optimal control problem. An iterative ADP algorithm is developed to
obtain the optimal control laws for the transformed system.
In Chap. 14, a data-driven stable iterative ADP algorithm is developed to solve
the optimal temperature control problem of the water gas shift reaction system.
According to the system data, neural networks are used to construct the dynamics of
the water gas shift reaction system and solve the reference control. Considering the
reconstruction errors of neural networks and the disturbances of the system and
control input, a stable iterative ADP algorithm is developed to obtain the optimal
control law. A convergence property is established to guarantee that the iterative value
function converges to a finite neighborhood of the optimal cost function. A stability
property is established so that each of the iterative control laws can guarantee the
tracking error to be uniformly ultimately bounded.

Derong Liu
Qinglai Wei
Ding Wang
Xiong Yang
Hongliang Li

Beijing, China
Chicago, USA
September 2016
Acknowledgements

The authors would like to acknowledge the help and encouragement they have
received from colleagues in Beijing and Chicago during the course of writing this
book. Some materials presented in this book are based on the research conducted
with several Ph.D. students, including Yuzhu Huang, Dehua Zhang, Pengfei Yan,
Yancai Xu, Hongwen Ma, Chao Li, and Guang Shi. The authors also wish to thank
Oliver Jackson, Editor (Engineering) from Springer for his patience and
encouragement.
The authors are very grateful to the National Natural Science Foundation of
China (NSFC) for providing necessary financial support to our research in the past
five years. The present book is the result of NSFC Grants 61034002, 61233001,
61273140, 61304086, and 61374105.

Contents

1 Overview of Adaptive Dynamic Programming
1.1 Introduction
1.2 Reinforcement Learning
1.3 Adaptive Dynamic Programming
1.3.1 Basic Forms of Adaptive Dynamic Programming
1.3.2 Iterative Adaptive Dynamic Programming
1.3.3 ADP for Continuous-Time Systems
1.3.4 Remarks
1.4 Related Books
1.5 About This Book
References

Part I Discrete-Time Systems

2 Value Iteration ADP for Discrete-Time Nonlinear Systems
2.1 Introduction
2.2 Optimal Control of Nonlinear Systems Using General Value Iteration
2.2.1 Convergence Analysis
2.2.2 Neural Network Implementation
2.2.3 Generalization to Optimal Tracking Control
2.2.4 Optimal Control of Systems with Constrained Inputs
2.2.5 Simulation Studies
2.3 Iterative θ-Adaptive Dynamic Programming Algorithm for Nonlinear Systems
2.3.1 Convergence Analysis
2.3.2 Optimality Analysis
2.3.3 Summary of Iterative θ-ADP Algorithm
2.3.4 Simulation Studies
2.4 Conclusions
References
3 Finite Approximation Error-Based Value Iteration ADP
3.1 Introduction
3.2 Iterative θ-ADP Algorithm with Finite Approximation Errors
3.2.1 Properties of the Iterative ADP Algorithm with Finite Approximation Errors
3.2.2 Neural Network Implementation
3.2.3 Simulation Study
3.3 Numerical Iterative θ-Adaptive Dynamic Programming
3.3.1 Derivation of the Numerical Iterative θ-ADP Algorithm
3.3.2 Properties of the Numerical Iterative θ-ADP Algorithm
3.3.3 Summary of the Numerical Iterative θ-ADP Algorithm
3.3.4 Simulation Study
3.4 General Value Iteration ADP Algorithm with Finite Approximation Errors
3.4.1 Derivation and Properties of the GVI Algorithm with Finite Approximation Errors
3.4.2 Designs of Convergence Criteria with Finite Approximation Errors
3.4.3 Simulation Study
3.5 Conclusions
References
4 Policy Iteration for Optimal Control of Discrete-Time Nonlinear Systems
4.1 Introduction
4.2 Policy Iteration Algorithm
4.2.1 Derivation of Policy Iteration Algorithm
4.2.2 Properties of Policy Iteration Algorithm
4.2.3 Initial Admissible Control Law
4.2.4 Summary of Policy Iteration ADP Algorithm
4.3 Numerical Simulation and Analysis
4.4 Conclusions
References
5 Generalized Policy Iteration ADP for Discrete-Time Nonlinear Systems
5.1 Introduction
5.2 Generalized Policy Iteration-Based Adaptive Dynamic Programming Algorithm
5.2.1 Derivation and Properties of the GPI Algorithm
5.2.2 GPI Algorithm and Relaxation of Initial Conditions
5.2.3 Simulation Studies
5.3 Discrete-Time GPI with General Initial Value Functions
5.3.1 Derivation and Properties of the GPI Algorithm
5.3.2 Relaxations of the Convergence Criterion and Summary of the GPI Algorithm
5.3.3 Simulation Studies
5.4 Conclusions
References
6 Error Bounds of Adaptive Dynamic Programming Algorithms
6.1 Introduction
6.2 Error Bounds of ADP Algorithms for Undiscounted Optimal Control Problems
6.2.1 Problem Formulation
6.2.2 Approximate Value Iteration
6.2.3 Approximate Policy Iteration
6.2.4 Approximate Optimistic Policy Iteration
6.2.5 Neural Network Implementation
6.2.6 Simulation Study
6.3 Error Bounds of Q-Function for Discounted Optimal Control Problems
6.3.1 Problem Formulation
6.3.2 Policy Iteration Under Ideal Conditions
6.3.3 Error Bound for Approximate Policy Iteration
6.3.4 Neural Network Implementation
6.3.5 Simulation Study
6.4 Conclusions
References

Part II Continuous-Time Systems

7 Online Optimal Control of Continuous-Time Affine Nonlinear Systems
7.1 Introduction
7.2 Online Optimal Control of Partially Unknown Affine Nonlinear Systems
7.2.1 Identifier–Critic Architecture for Solving HJB Equation
7.2.2 Stability Analysis of Closed-Loop System
7.2.3 Simulation Study
7.3 Online Optimal Control of Affine Nonlinear Systems with Constrained Inputs
7.3.1 Solving HJB Equation via Critic Architecture
7.3.2 Stability Analysis of Closed-Loop System with Constrained Inputs
7.3.3 Simulation Study
7.4 Conclusions
References
8 Optimal Control of Unknown Continuous-Time Nonaffine Nonlinear Systems
8.1 Introduction
8.2 Optimal Control of Unknown Nonaffine Nonlinear Systems with Constrained Inputs
8.2.1 Identifier Design via Dynamic Neural Networks
8.2.2 Actor–Critic Architecture for Solving HJB Equation
8.2.3 Stability Analysis of Closed-Loop System
8.2.4 Simulation Study
8.3 Optimal Output Regulation of Unknown Nonaffine Nonlinear Systems
8.3.1 Neural Network Observer
8.3.2 Observer-Based Optimal Control Scheme Using Critic Network
8.3.3 Stability Analysis of Closed-Loop System
8.3.4 Simulation Study
8.4 Conclusions
References
9 Robust and Optimal Guaranteed Cost Control of Continuous-Time Nonlinear Systems
9.1 Introduction
9.2 Robust Control of Uncertain Nonlinear Systems
9.2.1 Equivalence Analysis and Problem Transformation
9.2.2 Online Algorithm and Neural Network Implementation
9.2.3 Stability Analysis of Closed-Loop System
9.2.4 Simulation Study
9.3 Optimal Guaranteed Cost Control of Uncertain Nonlinear Systems
9.3.1 Optimal Guaranteed Cost Controller Design
9.3.2 Online Solution of Transformed Optimal Control Problem
9.3.3 Stability Analysis of Closed-Loop System
9.3.4 Simulation Studies
9.4 Conclusions
References
10 Decentralized Control of Continuous-Time Interconnected Nonlinear Systems
10.1 Introduction
10.2 Decentralized Control of Interconnected Nonlinear Systems
10.2.1 Decentralized Stabilization via Optimal Control Approach
10.2.2 Optimal Controller Design of Isolated Subsystems
10.2.3 Generalization to Model-Free Decentralized Control
10.2.4 Simulation Studies
10.3 Conclusions
References
11 Learning Algorithms for Differential Games of Continuous-Time Systems
11.1 Introduction
11.2 Integral Policy Iteration for Two-Player Zero-Sum Games
11.2.1 Derivation of Integral Policy Iteration
11.2.2 Convergence Analysis
11.2.3 Neural Network Implementation
11.2.4 Simulation Studies
11.3 Iterative Adaptive Dynamic Programming for Multi-player Zero-Sum Games
11.3.1 Derivation of the Iterative ADP Algorithm
11.3.2 Properties
11.3.3 Neural Network Implementation
11.3.4 Simulation Studies
11.4 Synchronous Approximate Optimal Learning for Multi-player Nonzero-Sum Games
11.4.1 Derivation and Convergence Analysis
11.4.2 Neural Network Implementation
11.4.3 Simulation Study
11.5 Conclusions
References

Part III Applications

12 Adaptive Dynamic Programming for Optimal Residential Energy Management
12.1 Introduction
12.2 A Self-learning Scheme for Residential Energy System Control and Management
12.2.1 The ADHDP Method
12.2.2 A Self-learning Scheme for Residential Energy System
12.2.3 Simulation Study
12.3 A Novel Dual Iterative Q-Learning Method for Optimal Battery Management
12.3.1 Problem Formulation
12.3.2 Dual Iterative Q-Learning Algorithm
12.3.3 Neural Network Implementation
12.3.4 Numerical Analysis
12.4 Multi-battery Optimal Coordination Control for Residential Energy Systems
12.4.1 Distributed Iterative ADP Algorithm
12.4.2 Numerical Analysis
12.5 Conclusions
References
13 Adaptive Dynamic Programming for Optimal Control of Coal Gasification Process
13.1 Introduction
13.2 Data-Based Modeling and Properties
13.2.1 Description of Coal Gasification Process and Control Systems
13.2.2 Data-Based Process Modeling and Properties
13.3 Design and Implementation of Optimal Tracking Control
13.3.1 Optimal Tracking Controller Design by Iterative ADP Algorithm Under System and Iteration Errors
13.3.2 Neural Network Implementation
13.4 Numerical Analysis
13.5 Conclusions
References
14 Data-Based Neuro-Optimal Temperature Control of Water Gas Shift Reaction
14.1 Introduction
14.2 System Description and Data-Based Modeling
14.2.1 Water Gas Shift Reaction
14.2.2 Data-Based Modeling and Properties
14.3 Design of Neuro-Optimal Temperature Controller
14.3.1 System Transformation
14.3.2 Derivation of Stable Iterative ADP Algorithm
14.3.3 Properties of Stable Iterative ADP Algorithm with Approximation Errors and Disturbances
14.4 Neural Network Implementation for the Optimal Tracking Control Scheme
14.5 Numerical Analysis
14.6 Conclusions
References
Index

Abbreviations

ACD Adaptive critic designs

AD Action-dependent, e.g., ADHDP and ADDHP
ADP Adaptive dynamic programming, or approximate dynamic programming
ADPRL Adaptive dynamic programming and reinforcement learning
BP Backpropagation
DHP Dual heuristic programming
DP Dynamic programming
GDHP Globalized dual heuristic programming
GPI Generalized policy iteration
GVI General value iteration
HDP Heuristic dynamic programming
HJB Hamilton–Jacobi–Bellman, e.g., HJB equation
HJI Hamilton–Jacobi–Isaacs, e.g., HJI equation
NN Neural network
PE Persistence of excitation
PI Policy iteration
UUB Uniformly ultimately bounded
VI Value iteration
RL Reinforcement learning
Symbols

$\mathsf{T}$ The transposition symbol, e.g., $A^{\mathsf{T}}$ is the transposition of matrix $A$
$\mathbb{N}$ The set of all natural numbers
$\mathbb{Z}^{+}$ The set of all positive integers, i.e., $\mathbb{N} = \{0, \mathbb{Z}^{+}\}$
$\mathbb{R}$ The set of all real numbers
$\mathbb{R}^{n}$ The Euclidean space of all real $n$-vectors, e.g., a vector $x \in \mathbb{R}^{n}$ is written as $x = (x_1, x_2, \ldots, x_n)^{\mathsf{T}}$
$\mathbb{R}^{m \times n}$ The space of all $m$ by $n$ real matrices, e.g., a matrix $A \in \mathbb{R}^{m \times n}$ is written as $A = (a_{ij}) \in \mathbb{R}^{m \times n}$
$\|\cdot\|$ The vector norm or matrix norm in $\mathbb{R}^{n}$ or $\mathbb{R}^{n \times m}$
$\|\cdot\|_F$ The Frobenius matrix norm, which is the Euclidean norm of a matrix, is defined as $\|A\|_F = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} a_{ij}^{2}}$ for $A = (a_{ij}) \in \mathbb{R}^{n \times m}$
$\in$ Belongs to
$\forall$ For all
$\Rightarrow$ Implies
$\Leftrightarrow$ Equivalent, or if and only if
$\otimes$ Kronecker product
$\emptyset$ The empty set
$\triangleq$ Equal to by definition
$C^{n}(\Omega)$ The class of functions having continuous $n$th derivative on $\Omega$
$\mathcal{L}_2(\Omega)$ The $\mathcal{L}_2$ space defined on $\Omega$, i.e., $\sqrt{\int_{\Omega} \|f(x)\|^{2}\,dx} < \infty$ for $f \in \mathcal{L}_2(\Omega)$
$\mathcal{L}_\infty(\Omega)$ The $\mathcal{L}_\infty$ space defined on $\Omega$, i.e., $\sup_{x \in \Omega} \|f(x)\| < \infty$ for $f \in \mathcal{L}_\infty(\Omega)$
$\lambda_{\min}(A)$ The minimum eigenvalue of matrix $A$
$\lambda_{\max}(A)$ The maximum eigenvalue of matrix $A$
$I_n$ The $n$ by $n$ identity matrix
$A > 0$ Matrix $A$ is positive definite
$\det(A)$ Determinant of matrix $A$
$A^{-1}$ The inverse of matrix $A$
$\operatorname{tr}(A)$ The trace of matrix $A$

$\operatorname{vec}(A)$ The vectorization mapping from matrix $A$ into an $mn$-dimensional column vector for $A \in \mathbb{R}^{m \times n}$
$\operatorname{diag}\{\zeta_i\}$ Also written as $\operatorname{diag}\{\zeta_1, \zeta_2, \ldots, \zeta_n\}$, which is an $n \times n$ diagonal matrix with diagonal elements $\zeta_1, \zeta_2, \ldots, \zeta_n$
$\tanh(x)$ The hyperbolic tangent function of $x$
$\operatorname{sgn}(x)$ The sign function of $x$, i.e., $\operatorname{sgn}(x) = 1$ for $x > 0$, $\operatorname{sgn}(0) = 0$, and $\operatorname{sgn}(x) = -1$ for $x < 0$
$\mathcal{A}(\Omega)$ The set of admissible controls on $\Omega$
$J$ Performance index, or cost-to-go, or cost function
$J^{\mu}$ Performance index, or cost-to-go, or cost function associated with the policy $\mu$
$J^{*}$ Optimal performance index function or optimal cost function
$V$ Value function, or performance index associated with a specific policy
$V^{*}$ Optimal value function
$L$ Lyapunov function
$W_f, Y_f$ NN weights for function approximation
$W_m, Y_m$ Model/identifier NN weights
$W_o, Y_o$ Observer NN weights
$W_c, Y_c$ Critic NN weights
$W_a, Y_a$ Action NN weights
Chapter 1
Overview of Adaptive Dynamic Programming

1.1 Introduction

Big data, artificial intelligence (AI), and deep learning are the three topics talked about
the most lately in information technology. The recent emergence of deep learning
[10, 17, 38, 68, 88] has pushed neural networks (NNs) to become a hot research
topic again. It has also gained huge success in almost every branch of AI, includ-
ing machine learning, pattern recognition, speech recognition, computer vision, and
natural language processing [17, 25, 26, 35, 74]. On the other hand, the study of
big data often uses AI technologies such as machine learning [80] and deep learning
[17]. One particular subject of study in AI, i.e., the computer game of Go, faced
a great challenge of dealing with vast amounts of data. The ancient Chinese board
game Go has been studied for years with the hope that one day, computer programs
can defeat human professional players. The Go board consists of a 19 × 19
grid. At the beginning of the game, each of the two players has roughly
360 options for placing each stone. However, the number of potential legal board
positions grows exponentially, and it quickly becomes greater than the total number
of atoms in the whole universe [103]. With so many directions in which any given
game can move, it is impossible for a computer to play by brute-force computation
of all possible outcomes.
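
The scale of this count can be checked with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: it uses the crude upper bound 3^361 (each of the 361 intersections is empty, black, or white, so illegal positions are included), and it compares against 10^80, a commonly quoted order-of-magnitude estimate for the number of atoms in the observable universe that is not taken from the text.

```python
# Crude upper bound on Go board configurations: each of the 19 x 19 = 361
# intersections is empty, black, or white. This over-counts, because it
# includes positions that are not legal.
INTERSECTIONS = 19 * 19
upper_bound = 3 ** INTERSECTIONS

# Assumed order-of-magnitude estimate for atoms in the observable universe
# (an illustrative figure, not a value given in the text).
ATOMS_ESTIMATE = 10 ** 80

digits = len(str(upper_bound))            # number of decimal digits in the bound
exceeds_atoms = upper_bound > ATOMS_ESTIMATE

print(f"3^{INTERSECTIONS} has {digits} decimal digits")
print(f"larger than 10^80: {exceeds_atoms}")
```

Even allowing for the over-count, the gap between the two magnitudes suggests why exhaustive enumeration is not a practical strategy.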
Previous computer programs focused less on evaluating the state of the board
positions and more on speeding up simulations of how the game might play out.
The Monte Carlo tree search approach, which was often used in computer game programs,
randomly samples only some of the possible sequences of plays at each step to
choose between different possible moves instead of trying to calculate every possible
one. Google DeepMind, an AI company in London acquired by Google in 2014,
developed a program called AlphaGo [92] that has shown performance previously
thought to be at least a decade away. Instead of exploring various sequences
of moves, AlphaGo learns to make a move by evaluating the strength of its position on
the board. Such an evaluation was made possible by NN’s deep learning capabilities.

D. Liu et al., Adaptive Dynamic Programming with Applications
in Optimal Control, Advances in Industrial Control,
DOI 10.1007/978-3-319-50815-3_1
Position evaluation (for approximating the optimal cost-to-go function of the
game) is the key to the success of AlphaGo. Such ideas have been used previously
by many researchers in computer games, such as backgammon (TD-Gammon) [100,
101], checkers [87], Othello [13], and chess [16]. A reinforcement learning technique
called TD(λ) was employed in AlphaGo and TD-Gammon for position evaluation.
With TD-Gammon, the program has learned to play backgammon at a grandmaster
level [100, 101]. On the other hand, AlphaGo has defeated European Go champion
Fan Hui (professional 2 dan) by 5 games to 0 [92] and defeated world Go champion
Lee Sedol (professional 9 dan) by 4 games to 1 [71, 111].
The success of the reinforcement learning (RL) technique in this case relied on the
deep learning capabilities of NNs [10, 17, 38, 68, 88]. The NNs used in AlphaGo have a
deep structure with 13 layers. Even though there were reports on the use of RL and
related techniques for the computer game of Go [15, 89, 93, 137, 138], it is only
with AlphaGo [92] that value networks were obtained using deep NNs to achieve
high evaluation accuracy. Although position evaluation [8, 20, 24, 89, 93]
and deep learning [18, 66, 102] had been considered for building programs to play
the game of Go, none of them achieved the level of success of AlphaGo [92]. The
match of AlphaGo versus Lee Sedol in March 2016 was a history-making event and
a milestone in the quest for AI. The defeat of a top human player by a machine has also
generated huge public interest in AI technology around the world, especially in
China, Korea, the US, and the UK [111]. It will have a lasting impact on the research in AI,
deep learning, and RL [142].
RL is a very useful tool for solving optimization problems by employing the principle
of optimality from dynamic programming (DP). In particular, in the control systems
community, RL is an important approach to handling optimal control problems for
unknown nonlinear systems. DP provides an essential foundation for understanding
RL. In fact, most RL methods can be viewed as attempts to achieve much
the same effect as DP, with less computation and without assuming a perfect model
of the environment. One class of RL methods is built upon the actor-critic structure,
namely adaptive critic designs, where an actor component applies an action or control
policy to the environment, and a critic component assesses the value of that action
and the state resulting from it. The combination of DP, NNs, and the actor-critic structure
results in adaptive dynamic programming (ADP) algorithms.
The present book studies ADP with applications to optimal control. Significant
effort will be devoted to building value functions that indicate how good
the predicted system performance is and that, in turn, are used to develop the optimal
control strategy. Both RL and ADP provide approximate solutions to dynamic
programming, and they are closely related to each other. Therefore, there has been a
trend to consider the two together as ADPRL (ADP and RL). Examples include the IEEE
International Symposium on Adaptive Dynamic Programming and Reinforcement
Learning (started in 2007), the IEEE CIS Technical Committee on Adaptive Dynamic
Programming and Reinforcement Learning (started in 2008), and a survey article
[42] published in 2009. A brief overview of RL will be given in the next section,
followed by a more detailed overview of ADP. We review both the basic forms of
ADP and its iterative forms. A few related books will be briefly reviewed before
the end of this chapter.

1.2 Reinforcement Learning

The main research results in RL can be found in the book by Sutton and Barto
[98] and the references cited in the book. Even though both RL and the main topic
studied in the present book, i.e., ADP, provide approximate solutions to dynamic
programming, research in these two directions has been somewhat independent [7]
in the past. The most famous algorithms in RL are the temporal difference algorithm
[97] and the Q-learning algorithm [112, 113]. Compared to ADP, the area of RL is
more mature and has a vast amount of literature (cf. [27, 34, 47, 98]).
An RL system typically consists of the following four components: $\{S, A, R, F\}$,
where $S$ is the set of states, $A$ is the set of actions, $R$ is the set of scalar reinforcement
signals or rewards, and $F$ is the function describing the transition from one state
to the next under a given action, i.e., $F : S \times A \to S$. A policy $\pi$ is defined as a
mapping $\pi : S \to A$. At any given time $t$, the system can be in a state $s_t \in S$, take
an action $a_t \in A$ determined by the policy $\pi$, i.e., $a_t = \pi(s_t)$, transition to the next
state $s' = s_{t+1}$, which is denoted by $s_{t+1} = F(s_t, a_t)$, and at the same time, receive a
reward signal $r_{t+1} = r(s_t, a_t, s_{t+1}) \in R$. The goal of RL is to determine a policy to
maximize the accumulated reward starting from initial state $s_0$ at $t = 0$.
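
The interaction loop described above can be sketched in a few lines. This is a minimal illustration, not an implementation from the text: the five-state line world, the goal state, the reward rule, and the always-move-right policy are all hypothetical choices made only to give F, r, and π concrete bodies.

```python
# Toy instance of the {S, A, R, F} formulation. All concrete choices below
# (states 0..4, actions -1/+1, goal state 4) are hypothetical.

def F(s, a):
    """Deterministic transition F : S x A -> S, clipped to the states 0..4."""
    return max(0, min(4, s + a))

def r(s, a, s_next):
    """Scalar reward r(s_t, a_t, s_{t+1}): 1 when the next state is the goal state 4."""
    return 1.0 if s_next == 4 else 0.0

def pi(s):
    """A fixed policy pi : S -> A that always moves right."""
    return 1

def rollout(s0, steps):
    """Follow pi from s0 and record (s_t, a_t, r_{t+1}, s_{t+1}) at each step."""
    trajectory = []
    s = s0
    for _ in range(steps):
        a = pi(s)                    # a_t = pi(s_t)
        s_next = F(s, a)             # s_{t+1} = F(s_t, a_t)
        trajectory.append((s, a, r(s, a, s_next), s_next))
        s = s_next
    return trajectory

trajectory = rollout(s0=2, steps=3)
total_reward = sum(step[2] for step in trajectory)  # undiscounted accumulated reward
```

Swapping in a different `pi` changes the trajectory and the accumulated reward, which is the quantity an RL method would try to maximize over policies.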
An RL task always involves estimating some kind of value function. A value
function estimates how good it is to be in a given state $s$, and it is defined as

$$V^{\pi}(s) = \sum_{k=0}^{\infty} \gamma^{k} r_{k+1} \Big|_{s_0 = s} = \sum_{k=0}^{\infty} \gamma^{k} r(s_k, a_k, s_{k+1}) \Big|_{s_0 = s},$$

where $0 \le \gamma \le 1$ is a discount factor and $a_k = \pi(s_k)$ and $s_{k+1} = F(s_k, a_k)$ for
$k = 0, 1, \ldots$. The definition of $V^{\pi}(s)$ can also be considered to start from $s_t$, i.e.,

$$V^{\pi}(s) = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \Big|_{s_t = s} = \sum_{k=0}^{\infty} \gamma^{k} r(s_{t+k}, a_{t+k}, s_{t+k+1}) \Big|_{s_t = s}, \tag{1.2.1}$$

where $a_{t+k} = \pi(s_{t+k})$ and $s_{t+k+1} = F(s_{t+k}, a_{t+k})$ for $k = 0, 1, \ldots$. $V^{\pi}(s)$ is referred
to as the state-value function for policy $\pi$. On the other hand, the action-value function
for policy $\pi$ estimates how good it is to perform a given action $a$ in a given state $s$
under the policy $\pi$, and it is defined as

$$Q^{\pi}(s, a) = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \Big|_{s_t = s,\, a_t = a} = \sum_{k=0}^{\infty} \gamma^{k} r(s_{t+k}, a_{t+k}, s_{t+k+1}) \Big|_{s_t = s,\, a_t = a}. \tag{1.2.2}$$
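
For a deterministic system, the sums in (1.2.1) and (1.2.2) can be approximated by truncating them after a finite horizon. The following is a minimal sketch under stated assumptions: the five-state line world, the reward rule, the policy, the discount factor 0.5, and the horizon of 60 steps are hypothetical illustration values, and the code assumes that after the first action a the remaining actions follow π.

```python
GAMMA = 0.5  # assumed discount factor; below 1 so the truncated sum settles quickly

def F(s, a):
    """Hypothetical deterministic transition on states 0..4."""
    return max(0, min(4, s + a))

def r(s, a, s_next):
    """Hypothetical reward: 1 when the next state is the goal state 4."""
    return 1.0 if s_next == 4 else 0.0

def pi(s):
    """Hypothetical fixed policy: always move right."""
    return 1

def q_value(s, a, horizon=60):
    """Truncated (1.2.2): apply action a in state s, then follow pi."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        s_next = F(s, a)
        total += discount * r(s, a, s_next)   # gamma^k * r(s_{t+k}, a_{t+k}, s_{t+k+1})
        discount *= GAMMA
        s = s_next
        a = pi(s)                             # later actions come from the policy
    return total

def v_value(s, horizon=60):
    """Truncated (1.2.1): the first action is also chosen by pi."""
    return q_value(s, pi(s), horizon)

v2 = v_value(2)           # state value under pi
q2_left = q_value(2, -1)  # value of first stepping left, then following pi
```

In this toy setting `v2` comes out close to 1.0 and `q2_left` close to 0.25, so moving away from the goal first is visibly worse than following the policy from the start.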
BOOK III.
LAMBETH PALACE.

CHAPTER I.
HOW CARDINAL POLE ARRIVED IN ENGLAND, AND HOW HE
WAS WELCOMED BY THE KING AND QUEEN.

The court returned to Whitehall in November, Parliament
being about to meet in the middle of that month.
One morning, as the royal pair were walking together in
the west gallery overlooking the garden, the Lord
Chancellor presented himself with a despatch in his hand. It was
easy to perceive, from the joyous expression of his countenance,
that he brought good tidings.
“Welcome, my good lord,” said Mary. “I see you have satisfactory
intelligence to communicate. Have you heard from Rome?”
“I have just received this transcript of the decree which has been
sent to Cardinal Pole by the Pope,” replied Gardiner, “in which his
Holiness, after due deliberation, has agreed to extend the privileges
of the Legate, so as to enable him to act on all occasions with the
same plentitude of power as the Pope himself. In regard to church
revenues and goods, his Holiness fully recognises the great difficulty
of the question, feeling it to be the main obstacle to the nation’s
recognition of the Papal supremacy, and he therefore invests his
Eminence with the most ample power to agree and compound with
the present owners; to assure to them their possessions, on
whatever title they may hold them; and to exempt them from any
duty of restitution.”
“This is glad news indeed!” exclaimed the Queen. “Parliament
meets in a few days. Your first business must be to repeal the
attainder of the Cardinal, who will then be free to return to his own
country, and aid us with his counsels. Hasten his arrival, I pray you,
my lord, by all means in your power. I shall not feel perfectly happy
till I behold him!”
“There shall not be a moment’s needless delay, rely upon it,
gracious Madam,” replied Gardiner. “The repeal of the attainder may
be considered as already accomplished, since no opposition will now
be made to the measure. Meantime, an escort shall be immediately
despatched to Brussels to bring over his Eminence with all honour to
this country.”
Having nothing more to lay before their Majesties, he then bowed
and withdrew.
Parliament was opened by the King and Queen in person, a sword
of state and a cap of maintenance being borne before each of them
as they went in state to the House of Lords. Everything proceeded
as satisfactorily as had been anticipated by Gardiner. The first bill
brought before the Lords was that for reversing Pole’s attainder,
which, being quickly passed, was sent down to the Commons, and
read thrice in one day; after which it received the royal assent, the
impression of the great seal being taken off in gold.
Meantime, in confident anticipation of this event, a brilliant escort,
comprising Lord Paget, Sir Edward Hastings, Sir William Cecil, and
forty gentlemen of good birth, had been despatched to Brussels, to
bring back the illustrious exile to his own country. As soon as
intimation was received by Pole that he was free to return, he took
leave of the Emperor, and set out with his escort for England.
Among the Cardinal’s suite was one of whom some account may
be necessary. Years ago, while studying at the celebrated university
of Padua, Pole contracted a friendship with Ludovico Priuli, a young
Venetian noble, distinguished for his personal accomplishments,
refined manners, and love of learning. From this date the two friends
became inseparable. Possessed of an ample fortune, Priuli, from his
position, might have filled the highest offices in the Venetian
Republic, but he preferred sharing Pole’s labours, and proved a most
valuable coadjutor to him. Chosen as successor to the Bishop of
Brescia by Pope Julius III., Priuli declined to exercise his functions,
and even refused the purple rather than quit his friend. He had
remained with Pole during his retirement at the convent of
Maguzano, had attended him to Brussels and to Paris, whither the
Cardinal went to negotiate terms of peace between Spain and
France, and of course accompanied him to England. Besides the
Lord Priuli, Pole was attended by his secretary, Floribello, an
excellent scholar, together with the Signori Stella and Rollo, both
men of learning and piety, though somewhat advanced in years.
Owing to the infirm state of his health, the Cardinal was unable to
proceed far without resting, and after a week’s slow travel he
reached Calais (then, it need scarcely be said, in possession of
England, though soon afterwards lost), where he was received by
the governor with a distinction rarely shown to any other than a
crowned head.
Pole attended high mass at the cathedral, and the populace, clad in
holiday attire, flocked thither to receive his blessing. One
circumstance occurred which was regarded as a most favourable
omen. For more than a week strong adverse winds had prevailed in
the Channel, but a favourable change suddenly took place,
promising a swift and pleasant passage to the Cardinal.
A royal vessel awaited him, in which he embarked with his train,
and escorted by six men-of-war, well armed, and under the
command of the Lord High Admiral, he sailed on a bright sunny day
for England, and, impelled by a fresh wind, arrived in a few hours at
Dover.
A royal salute was fired from the guns of the castle as the Cardinal
landed, and he was received by his nephew, Lord Montague, son of
his elder brother, who had been put to death by Henry VIII. With
Lord Montague were several other noblemen and gentlemen,
amongst whom were the mayor and the town authorities, and
besides these there was a vast miscellaneous concourse.
No sooner did the Cardinal set foot on the mole, closely followed
by his other nephew, Sir Edward Hastings, and Lord Priuli, than the
whole assemblage prostrated themselves before him. Spreading his
arms over them, Pole gave them his solemn benediction. All eyes
were fixed on the venerable and majestic figure before them—all
ears were strained to catch his words. The noble cast of the
Cardinal’s countenance, proclaiming his royal descent—his reverend
air, increased by the long grey beard that descended to his waist—
the benignity and sweetness of his looks—the stateliness of his
deportment—all produced an indescribable effect on the spectators.
Lofty of stature, and spare of person—the result of frequent fastings
—Pole, notwithstanding the ailments under which he laboured,
carried himself erect, and ever maintained a most dignified
deportment. To complete the picture we desire to present, it may be
necessary to say that his garments were those proper to his eminent
ecclesiastical rank, namely, a scarlet soutane, rochet, and short
purple mantle. His silk gloves and hose were scarlet in hue, and from
his broad red hat depended on either side long cords, terminating in
tassels of two knots each. These garments became him well, and
heightened the imposing effect of his presence.
Behind him stood his friend, Lord Priuli, who was nearly of his own
age, though he looked full ten years younger, and appeared scarcely
past the prime of life. The noble Venetian had a countenance which
Titian would have delighted to paint, so handsome was it, so grave
and full of thought. Priuli was attired in black taffetas, over which he
wore a long silk gown of the same colour, and had a black skull-cap
on his head.
Signor Floribello, Pole’s secretary, was a Roman, and had a
massive and antique cast of countenance, which might have become
one of his predecessors of the Augustan age. He had a grave,
scholar-like aspect, and was attired in dark habiliments. With him
were the Cardinal’s other attendants, Stella and Rollo, neither of
whom merit special description. The former was the Cardinal’s
steward, and the latter his comptroller, and each wore a gold chain
around his neck.
Lord Montague was a very goodly personage, and bore such a
remarkable resemblance to his ill-fated father, that Pole exclaimed,
as he tenderly embraced him, “I could almost fancy that my long-
lost and much-lamented brother had come to life again. I doubt not
you possess your father’s excellent qualities of head and heart, as
well as his good looks.”
“I trust I am no degenerate son, dear and venerated uncle,”
replied Montague. “But I would my father had lived to see this day,
and to welcome you back to the land from which you have been so
long and so unjustly exiled.”
“Heaven’s will be done!” ejaculated Pole, fervently. “I do not
repine, though I have never ceased to lament the calamities and
afflictions I have brought upon my family.”
“Think not of them now, dear uncle,” rejoined Lord Montague.
“They are passed and gone. The tyrant who inflicted these injuries is
in his grave. Happier days have dawned upon us. Your brother yet
lives in me, to honour and serve you. Perchance your martyred
mother now looks down from that heaven which her destroyer shall
never enter, and joys at her son’s return.”
“It may be,” replied the Cardinal, glancing upwards, “and ere long
I hope to join her, for my sojourn in this Vale of Tears is nearly
ended; but I have much to do while I tarry here. Oh! my good
nephew! what mixed emotions of joy and sorrow agitate my breast—
joy at returning to the country of my birth—sorrow for the relatives
and friends I have lost. Many a time and oft, during my long
banishment, have I besought Heaven to allow me to return and lay
my bones in my native land; and now that my prayers have been
granted, I tremble and am sad, for I feel like a stranger.”
“You will not be a stranger long, dear uncle,” returned Lord
Montague. “There is not one of this throng who does not feel that
Heaven has sent you to us to give us a blessing, of which we have
so long been deprived.”
As he spoke, the crowd, which had been pressing on them, could
no longer be kept back, but completely surrounded the Cardinal;
those nearest him throwing themselves at his feet, kissing his
garments, trying to embrace his knees, and making every possible
demonstration of reverence. Little children were held up to him; old
men struggled to approach him; and it was long before he could
extricate himself from the throng, which he did with great
gentleness and consideration.
Graciously declining the hospitality proffered by the mayor, the
Cardinal proceeded with his suite to the Priory of Saint Martin, where
he tarried for the night.
On the next day, attended by an immense cortége, and having
two great silver crosses, two massive silver pillars, and two silver
pole-axes borne before him, as emblems of his Legantine authority,
he journeyed to Canterbury. Here he heard mass in the magnificent
cathedral, of which he was so soon to become head, and rested at
the palace.
On the second day he proceeded to Rochester, his escort
increasing as he went on; and on the third day he reached
Gravesend, where he was met by the Bishop of Durham, the Earl of
Shrewsbury, and other important personages, who had been
dispatched by their Majesties to offer him their congratulations on
his safe arrival in England, and at the same time to present him with
a copy of the act by which his attainder was reversed.
At Gravesend he again tarried for the night, and next morning
entered a royal barge, richly decorated, lined with tapestry, and
containing a throne covered with gold brocade. At the prow of this
barge a silver cross was fixed, which attracted universal attention as
he passed up the river, attended by several other gorgeous barges
conveying his retinue.
As the Cardinal approached the metropolis, the river swarmed
with boats filled with persons of all ranks eager to welcome him,
while crowds collected on the banks to gaze at his barge with the
great silver cross at the prow.
While passing the Tower, and gazing at the gloomy fortress where
the terrible tragedies connected with his family had been enacted,
the Cardinal became a prey to saddening thoughts. But these were
dispelled as he approached London Bridge, and heard the shouts of
the spectators, who greeted him from the windows of the lofty
habitations. The next objects that attracted his attention were
Baynard’s Castle and Saint Paul’s, and he uttered aloud his
thanksgivings that the ancient rites of worship were again performed
in the cathedral.
Sweeping up the then clear river, past the old palace of Bridewell,
Somerset House—built in the preceding reign by the Lord Protector,
and which the Cardinal had never before seen—past Durham-place
and York House, attended by hundreds of barques, he at length
approached the palace of Whitehall, and was taken to the privy
stairs.
At the head of the stairs stood Gardiner, ready to receive him, and
after they had interchanged a most amicable greeting, and Pole had
presented his friend Priuli, Gardiner conducted the Cardinal through
two lines of attendants apparelled in the royal liveries, all of whom
bowed reverentially as Pole passed on to the principal entrance of
the palace, where the King, with the chief personages of his court,
awaited his coming.
As the Lord Legate slowly approached, supported by Gardiner,
Philip advanced to meet him, and, embracing him affectionately,
bade him welcome, saying how anxiously both the Queen and
himself had looked for his coming. To these gracious expressions
Pole replied:
“I have rejoiced at the union her Majesty has formed, Sire,
because I regard it as a presage of my country’s future felicity.
Inasmuch as a nuptial disagreement between an English monarch
and a Spanish queen led to a most lamentable breach with the Holy
See, so the marriage of a Spanish king and an English princess will
serve to heal the breach. Most assuredly my countrymen will reap
the benefit of this auspicious alliance, and so far from finding any
yoke placed upon them, as they once apprehended, will recognise
the difference between your Majesty and that Prince who chastised
them with so heavy a rod.”
“With the aid of your Eminence in all spiritual matters, and with
that of the Lord Chancellor in temporal affairs,” replied Philip, “I
doubt not I shall be able, through the Queen’s Highness, to
contribute to the welfare and prosperity of the realm. Such has been
my constant endeavour since I have been here. And now suffer me
to lead you to her Majesty, who is all impatience to behold you.”
Hereupon they ascended the grand staircase, the King graciously
giving his arm to the Lord Legate. At the head of the staircase they
found the Queen, who exhibited the liveliest marks of delight on
seeing the Cardinal, and gave him a most affectionate greeting.
Pole could not fail to be deeply moved by so much kindness, and
with streaming eyes, and in broken accents, sought to express his
gratitude. He soon, however, regained his customary serenity, and
attended the Queen to the privy-chamber, whither they were
followed by the King and the Lord Chancellor. He then delivered his
credentials to her Majesty, and they had a long discourse together, in
which both the King and the Lord Chancellor took part.
Before withdrawing, Pole besought permission to present his
friend Lord Priuli, and Mary kindly assenting, the noble and learned
Venetian was introduced to their Majesties, and very graciously
received by both. After this the Cardinal took leave, and, attended
by Gardiner, re-entered his barge, and was conveyed in it to
Lambeth Palace, which had been prepared for his residence.
On the same day a grand banquet was given at Whitehall in
honour of the Lord Legate, at which all the nobles vied with each
other in paying him attention. Indeed, since Wolsey’s palmiest days
no such distinction had been shown to an ecclesiastic. Priuli, also,
came in for some share of the tribute of respect paid to his
illustrious friend.
On the following day, in order to celebrate Pole’s arrival publicly, a
grand tournament was held in the court of the palace, where
galleries were erected, adorned with rich hangings, having two
canopies of crimson cloth of silver, embroidered with the royal arms,
prepared for their Majesties—a chair for the Cardinal being set near
that of the Queen. Precisely at two o’clock her Majesty issued from
the palace in company with the Cardinal, attended by her ladies, and
took her place beneath the canopy, Pole seating himself beside her.
The galleries on either side presented a magnificent sight, being
thronged with all the beauty and chivalry of the court—high-born
dames and noble gallants, all richly apparelled.
The lists were under the governance of the Lord Chamberlain, Sir
John Gage, who was clad in russet armour, and mounted on a
powerful and richly-caparisoned steed; and as soon as the Queen
and the Cardinal had taken their places, loud fanfares were blown by
a bevy of trumpeters stationed on the opposite side of the court.
At this summons two champions immediately rode into the ring,
attracting great attention. One of them was the King. He was clad in
a suit of richly chased armour inlaid with gold, and his helm was
adorned with a panache of red ostrich plumes. His courser was
trapped with purple satin, broached with gold. As he rode round the
tilt-yard and saluted the Queen, a buzz of applause followed his
course.
His opponent was Osbert Clinton, whom his Majesty had
challenged to a trial of skill. Osbert wore a suit of black armour, with
a white plume, and was mounted on a powerful charger, with bases
and bards of black cloth of gold of damask.
As soon as the champions had taken their places, the signal was
given by Sir John Gage, and dashing vigorously against each other,
they met in mid-career, both their lances being shivered by the
shock. As no advantage had been gained on either side, fresh lances
were brought, and they immediately ran another course. In this
encounter, Osbert had the best of it, for he succeeded in striking off
the King’s helmet, and was consequently proclaimed the victor, and
received a costly owche as a prize from the hands of the Queen.
Other courses were then run, and spears broken, all the
combatants demeaning themselves valiantly and like men of
prowess. Amongst the Spaniards, those who most distinguished
themselves were Don Ruy Gomez de Silva, Don Frederic de Toledo,
and Don Adrian Garcias; whilst amongst the Englishmen the best
knights were accounted the Lord Admiral and Sir John Perrot. The
King was more fortunate in other courses than in those he had run
with Osbert Clinton, and received a diamond ring from her Majesty,
amid the loud plaudits of the spectators.
After this, Sir John Gage called upon them to disarm, the trumpets
sounded, and graciously bowing to the assemblage, the Queen
withdrew with the Cardinal.
CHAPTER II.
OF THE RECONCILIATION OF THE REALM WITH THE SEE
OF ROME.

A few days afterwards, in consequence of the Queen’s
indisposition, which, however, was not supposed to be of a
nature to inspire uneasiness, both Houses of Parliament
were summoned to the palace of Whitehall, and assembled
in the presence-chamber. Mary, who was so weak at the time that
she had to be carried to her throne, was placed on a hautpas,
beneath a rich canopy embroidered with the royal arms in gold.
On her left hand was seated the King, attired in black velvet, over
which he wore a robe of black cloth of gold, bordered with pearls
and diamonds. The collar of the Garter was round his neck, and the
lesser badge, studded with gems, beneath his knee.
On the Queen’s right, and on the hautpas, but not beneath the
canopy, sat Cardinal Pole. His robes were of the richest scarlet, and
he wore a mantle of fine sables about his neck. He was attended by
four gentleman ushers, all richly clad, and having heavy chains of
gold round their necks. Two of these carried the large silver crosses,
and the other two bore the silver pillars. Behind the Queen stood Sir
John Gage, in his robes of office as Grand Chamberlain, and holding
a white wand, and with him were the Vice-Chamberlain and other
officers of the royal household. All the Queen’s ladies were likewise
grouped around the throne.
Near to the Lord Legate stood Gardiner, and as soon as all were in
their places, and the doors had been closed by the ushers, he
addressed both Houses, informing them that the Right Reverend
Father in God, the Lord Cardinal Pole, legate a latere, who was now
present before them, had come as ambassador from Pope Julius III.
to the King and Queen’s Majesties on a matter of the utmost
importance, not only to their Highnesses, but to the whole realm. As
representatives of the nation, they were called there to listen to the
declaration about to be made to them by the Lord Legate.
When Gardiner concluded his address and retired, every eye was
fixed upon the Cardinal, and a hush of expectation fell upon the
assemblage. After a moment’s pause, Pole arose, and with a
dignified bow to their Majesties, commenced his address, in tones
that vibrated through every breast.
“Long excluded from this assembly,” he said, “and exiled from my
native country by laws upon the severity and injustice of which I will
not dwell, I have most heartily to thank you, my Lords of the Upper
House, and you, good Sirs, of the Nether House, for reversing the
sentence pronounced upon me, and enabling me to appear before
you once more. I rejoice that I am able to requite the great service
you have rendered me. You have restored me to my country and to
my place amongst the highest nobility upon earth. I can restore you
to a heavenly kingdom, and to a Christian greatness, which you have
unhappily forfeited by renouncing a fealty annexed to the true
Church. Bethink you of the many evils that have occurred to this
land since its lamentable defection. Estimate aright the great boon
now offered you. Until the late most unhappy schism, the English
nation ever stood foremost in the regard of the See of Rome,
abundant proofs of which I can offer you. While reminding you of
your past errors, let me exhort you to a sincere repentance, and to
receive with a deep and holy joy the reconciliation with the Church
of Rome, which I, as Legate, am empowered to impart to you. To
reap this great blessing it only needs that you should repeal
whatever you have enacted against the Holy See, and those laws by
which you have severed yourselves from the body of the faithful.”
Delivered in tones of mellifluous sweetness and persuasion, this
discourse was listened to with profound attention, and produced an
unmistakeable effect upon the auditors. As the Cardinal resumed his
seat, Gardiner advanced towards him.
“I thank your Eminence,” he said, “in the name of their Majesties
and the Parliament, for the good offices you have rendered the
nation. The members of both Houses will at once deliberate upon
what you have proposed, and will speedily acquaint you with their
determination, which, I nothing doubt, will be favourable to the
cause of our holy religion.”
Upon this, the Lord Legate arose and retired with his attendants
into an adjoining chamber, there to await the decision of the
Parliament.
As soon as he was gone, Gardiner again addressed the
assemblage in these terms: “Heaven hath spoken to you by the lips
of the holy man to whom you have just listened. I can confirm the
truth of all he has uttered. I acknowledge myself to be a great
delinquent, but I have deeply and sincerely repented of my errors,
and I beseech you to do so likewise. Rise from your fallen estate,
and dispose yourselves to a complete reconciliation with the Catholic
Church, and a return to its communion. Are ye all agreed to this?”
“We are all agreed,” replied the whole assemblage, without a
moment’s hesitation.
“I rejoice to hear it,” replied Gardiner. “If you have erred, you at
least make amends for your error.”
The promptitude and unanimity of this decision gave great
satisfaction to their Majesties, and the King, calling Gardiner to him,
held a brief conference with him, after which Sir John Gage, with the
Earl of Arundel, six knights of the Garter, and the like number of
bishops, were sent to summon the Lord Legate. As Pole again
entered the presence-chamber, the whole of the assemblage arose.
The Cardinal having resumed his seat, Gardiner called out, in a loud
voice,—
“I again ask you, in the presence of the Lord Legate, whether you
sincerely desire to return to the unity of the Church, and the
obedience due to her chief pastor?”
“We do!—we do!” cried the entire assemblage.
A radiant smile passed over Pole’s benign countenance at these
exclamations, and he raised up his hands in thankfulness to Heaven.
“This moment repays me for all I have suffered,” he murmured.
Then Gardiner turned towards the King and Queen, and, making a
profound obeisance to them, said:—
“On behalf of the members of both Houses of Parliament,
representatives of the whole realm, I have to express to your
Majesties their sorrow for the former schism, and for whatever they
have enacted against the See of Rome and the Catholic religion, all
which they now annul; and would humbly beseech you to obtain
from the Lord Legate pardon and restoration to that body from
which they had separated themselves by their misdeeds.”
“We pray your Eminence to grant the pardon and reconciliation
thus humbly sued for,” said Philip, turning towards the Cardinal.
“Right joyfully will I accede to your Majesty’s request,” replied
Pole.
The Cardinal’s assent having been communicated to the
assemblage by Gardiner, they all advanced towards Pole, who arose
as they approached, and said:—
“Thanks are due to Divine goodness for granting you this
opportunity of cancelling your past offences. If your repentance be
answerable to the importance of the occasion and the heinousness
of the fault, great, indeed, must be the joy of the saints at your
conversion.”
It being now evident that the Cardinal was about to pronounce
the absolution, the whole assemblage, with the exception of the
King and Queen, fell upon their knees. Extending his arms over
them, Pole, in a clear and distinct voice, said:—
“As representative of Christ’s Vicegerent, I here absolve all those
present, and the whole nation, and the whole dominion thereof,
from all heresy and schism, and all judgments, censures, for that
cause incurred, and restore them to the communion of the Holy
Church, in the name of the Father, Son, and Holy Ghost.”
To this the whole assemblage responded “Amen!”
Nothing could be more solemn and impressive than the Cardinal’s
manner while pronouncing this absolution, and his words penetrated
all hearts. The Queen and most of her ladies shed tears. As the
assembly rose from their kneeling posture, they embraced each
other, and gave utterance to their satisfaction.
The King and Queen, with their attendants, then proceeded to the
royal chapel to return thanks, and were followed by the Cardinal,
Gardiner, and the entire assemblage. A solemn mass was then
performed, and Te Deum sung.
CHAPTER III.
OF THE EVENTS THAT FOLLOWED THE RESTORATION OF
THE PAPAL AUTHORITY.

No sooner was the nation’s reconciliation with the See of
Rome completed, than an express was sent by Cardinal
Pole to Pope Julius III., acquainting his Holiness with the
joyful event. On receipt of the intelligence, public rejoicings
on the grandest scale were held at Rome, religious processions
paraded the streets, masses were performed in all the churches, and
a solemn service was celebrated at Saint Peter’s by the Pontiff in
person. The event, indeed, was a signal triumph to the Pope, and in
reply to Cardinal Pole he thanked him heartily for the great service
he had rendered the Church, and warmly commended his zeal and
diligence. Moreover, he issued a bull granting indulgences to all such
persons as should openly manifest their satisfaction at the
restoration of the Papal authority in England.
Public rejoicings also took place in London, and in other towns,
but they were productive of mischief rather than good, as they led to
many serious brawls and disturbances. Though compelled to submit
to their opponents, who were now in the ascendant, the Reformers
were far from subdued, but were quite ready for outbreak, should a
favourable opportunity occur for attempting it. The triumphant
demonstrations of the Romanists were abhorrent to them, and
constant collisions, as we have said, took place between the more
violent adherents of the opposing creeds. In these encounters, the
Protestants, being the less numerous, got the worst of it, but they
promised themselves revenge on a future day.
On the Sunday after the reconciliation, a sermon was preached by
Gardiner at Paul’s Cross, before the King and Cardinal Pole. A large
crowd collected to hear him. On this occasion, in spite of the
presence of a strong guard, some interruptions occurred, proving
that there were dissentients among the auditors. Evidently there was
a growing feeling of dislike to Philip and the Spaniards, fostered by
the malcontents, and many a fierce glance was fixed upon the King,
many a threat breathed against him, as, surrounded by a band of
halberdiers, he listened to Gardiner’s discourse.
But if Philip was hated even by the Romanists, who after all were
as true lovers of their country as those of the adverse sect, and
equally hostile to the Spaniards, the universal feeling was favourable
to Cardinal Pole, whose benevolent countenance pleased the
Reformers, as much as his dignified deportment commanded their
respect. He and the King rode together to Saint Paul’s, and after
hearing the sermon, returned in the same way to Whitehall. Philip
had the sword of state borne before him, but the Cardinal contented
himself with the silver cross.
A few days afterwards, intimation was sent by the council to
Bonner, Bishop of London, that the Queen was in a condition to
become a mother. Command was given at the same time that there
should be a solemn procession to Saint Paul’s, in which the Lord
Mayor, the aldermen, and all the City companies, in their liveries,
should join, to offer up prayers for her Majesty’s preservation during
her time of travail, coupled with earnest supplications that the child
might be a male.
This announcement, which, as may be supposed, was quickly
bruited abroad throughout the City, gave great satisfaction to the
Romanists, but it was anything but welcome or agreeable to the
Reformers, who saw in it an extension of power to their enemies,
and an increase of danger to themselves. If an heir to the throne
should be born, Philip’s authority in England would be absolute. Such
was the general impression, and its correctness was confirmed by a
petition made to the King by both Houses, which prayed “that if it
should happen otherwise than well to the Queen, he would take
upon himself the government of the realm during the minority of her
Majesty’s issue.” As may be supposed, Philip readily assented, and
an act was immediately passed carrying out the provisions above
mentioned, and making it high treason to compass the King’s death,
or attempt to remove him from the government and guardianship
confided to him.
Under these circumstances the solemn procession to Saint Paul’s
took place. Vast crowds encumbered the streets as the civic
authorities proceeded from Guildhall to the cathedral, headed by ten
bishops in their robes, the pix being borne before them under a
canopy. This gave such offence, that had not a strong military force
kept the populace in awe, it is certain that the procession would
have been molested. As it was, expressions of antipathy to Philip
could not be checked. “England shall never be ruled by the
Spaniard,” was the indignant outcry, which found an echo in many a
breast, whether of Romanist or Reformer.
In spite of all these clamours, the procession reached Saint Paul’s
in safety, and high mass was celebrated by Bonner and the other
bishops, after which prayers were offered up for the Queen, in
accordance with the council’s mandate. The mass of the assemblage
joined heartily in these supplications, but there were some who
refused to recite them, and secretly prayed that Philip’s hopes of an
heir might be frustrated.
The reader is already aware that Cardinal Pole, immediately on his
arrival in London, had been put in possession of Lambeth Palace.
This noble residence, with the revenues of the Archbishopric of
Canterbury, confiscated on the condemnation of Cranmer for high
treason, was bestowed on the Cardinal by the King and Queen; but
Pole could not be promoted to the archiepiscopal see while Cranmer
lived.
One of the Cardinal’s first acts on taking possession of the palace
was to summon all the bishops and principal clergy before him, and,
after listening to their expressions of penitence for the perjuries,
heresies, and schisms they had committed during the late reigns, he
gave them absolution.
And now, before proceeding further, it may be desirable to give a
brief description of the ancient edifice occupied by the Cardinal.
The present vast and irregular pile, known as Lambeth Palace,
was preceded by a much smaller mansion, wherein the archbishops
of Canterbury were lodged, and to which a chapel was attached.
This building was pulled down in 1262 by the turbulent Archbishop
Boniface, and a new and more important structure erected in its
place. Of Boniface’s palace little now remains save the chapel and
crypt. So many additions were made to the palace by successive
archbishops, and so much was it altered, that it may almost be said
to have become another structure. A noble hall, subsequently
destroyed in the time of the Commonwealth, was built by Archbishop
Chichely, who flourished in the reign of Edward IV.; while the chief
ornament of the existing pile, the gateway, was reared by Cardinal
Archbishop Morton, towards the end of the 15th century. The
Steward’s Parlour, a chamber of large dimensions, was added by
Cranmer, and a long gallery and other buildings were erected by
Cardinal Pole.
Before entering the palace, let us pause to examine the gateway,
a structure of almost unrivalled beauty, and consisting of two large
square towers, built of fine brick, embattled, and edged with stone.
The archway is pointed, and has a groined roof springing from four
pillars, one in each corner. Spiral stone staircases lead to the upper
chambers, and from the leads of the roof a wonderful prospect of
the surrounding metropolis is obtained. Connected with the porter’s
lodge is a small prison-chamber, having a double door, and high,
narrow-grated windows. The walls are cased with stone, and of
prodigious thickness, while three heavy iron rings fixed in them
attest the purpose to which the room was formerly applied.
Passing through the principal court, we enter the great hall, rebuilt
by Archbishop Juxon on the exact model of the old hall, demolished
during the Protectorate, so that it may be considered a counterpart
of Archbishop Chichely’s banqueting-chamber. Nearly a hundred feet
in length, proportionately wide and lofty, this noble room has a
superb pendant timber roof, enriched with elaborate carvings, and
lighted by a louvre. In the great bay-window, amidst the relics of
stained glass, recovered from the original hall, may be discerned the
arms of Philip of Spain, painted by order of Cardinal Pole. At the
present day the hall is used as the palace library, and its space is
somewhat encroached upon by projecting bookcases, filled with
works of divinity. At the upper end is the archbishop’s seat.
From the great hall we may proceed to the gallery and guard-
chamber, the latter of which was once used as the armoury of the
palace. It has an ancient timber roof, with pendants, pointed arches,
and pierced spandrels. Here are portraits of many of the archbishops
of Canterbury, among which may be seen that of Cardinal Pole,
copied from the original by Raffaelle, preserved in the Barberini
Palace at Rome.
Pass we by the presence-chamber and other state-rooms, and let
us enter the long gallery erected by Cardinal Pole—a noble room,
lighted by windows enriched with stained glass.
Hence we will proceed to the chapel erected by Boniface. Lighted
by three lancet-shaped windows on either side, and divided by an
elaborately carved screen, on the inner side of which is the
archiepiscopal stall, this chapel contains but little of its pristine
character, and is disfigured by a flat-panelled ceiling, added by
Archbishop Laud.
Beneath the chapel, and corresponding with it in size, is an
ancient crypt, with a groined roof, once used as a place of worship.
In this part of the palace is a large room built by Cranmer, and now
called the Steward’s Parlour, and close to it are the servants’ hall and
the great kitchen.
We now come to a part of the palace to which interest of a
peculiar nature attaches. This is the Lollards’ Tower, a large stone
structure, erected by Archbishop Chichely, which derives its name
from being used as a place of imprisonment for the followers of
Wickliffe, called Lollards. This time-worn tower faces the river, and
on its front is a small niche or tabernacle, formerly occupied by an
image of Thomas à Becket.
In the lower part of the Lollards’ Tower is a gloomy chamber of
singular construction, the heavy timber roof being supported by a
strong wooden pillar standing in the centre of the chamber, whence
the place is called the Post Room. Tradition asserts that the
unfortunate Lollards, confined in the chamber above, were tied to
this pillar and scourged. The Post Room is lighted by three low
pointed windows looking towards the Thames, and its flat-panelled
ceiling is ornamented at the intersections with grotesque carvings.
Ascending by a narrow spiral stone staircase, we reach the
prison-chamber just referred to, which is guarded by an inner and outer
door of stout oak, studded with broad-headed nails. A strange,
strong room, that cannot fail powerfully to impress the visitor.
Wainscot, ceiling, floor, every part of the chamber is boarded with
dark oak of great thickness. Fixed to the wainscot, breast-high from
the ground, are eight massive rings. The boards adjoining them are
covered with inscriptions—mementoes of the many unfortunates
confined there. The prison-chamber is lighted by two small grated
windows, narrowing outwardly, one of which looks upon the river.
Attached to the palace are a park and gardens of considerable
extent, and in the olden time of great beauty. Within the gardens, up
to the commencement of the present century, grew two singularly
fine fig trees, planted by Cardinal Pole, and trained against that part
of the palace which he erected.
Lambeth Palace came into Pole’s hands in a very habitable
condition, having been well kept up by his predecessor, Cranmer. So
well pleased was the Cardinal with the mansion, that he not only
embellished it in many ways, but enlarged it, as we have previously
mentioned. He also took great delight in the gardens, and laid them
out in the Italian style.
Unostentatious of character, and simple in his tastes and habits,
Pole felt it due to his elevated position to maintain princely state in
the residence assigned to him by their Majesties, and employ his
large revenues in hospitality and charity. When complete, which it
was within a month after his occupation of the Palace, Pole’s
household was as numerous and magnificent as Wolsey’s, and
comprised a high-chamberlain and vice-chamberlain, twelve
gentlemen ushers, steward, treasurer, comptroller, cofferer, three
marshals, two grooms, and an almoner. In his chapel he had a dean,
a sub-dean, twelve singing-priests, and the like number of quiristers.
Besides these, there were his cross-bearers, his pillar-bearers, and
two yeomen to bear his poleaxes. The inferior officers were almost
too numerous to particularise, comprehending purveyors, cooks,
sewers, cup-bearers, yeomen of the larder, of the buttery, of the
ewery, the cellar, the laundry, the bakehouse, the wardrobe, the
chandry, the wood yard, and the garden. Of gardeners, indeed,
there were several. Besides these, there were a multitude of pages
and grooms, a sumpter-man, a muleteer, and sixteen grooms of the
stable, each of whom had four horses. Then there were tall porters
at the gate, yeomen of the chariot, and yeomen of the barge. Nor
were these all. In addition to those previously enumerated, there were
a physician, two chaplains, and two secretaries.
Such was the magnificent establishment maintained by Pole
during his residence at Lambeth Palace. His hospitality may be
judged of by the fact that three long tables were daily laid in the
great hall, abundantly supplied with viands, and ever thronged with
guests. At the upper table sat the Cardinal, generally surrounded by
nobles or ecclesiastical dignitaries. A place at this table, not far from
his illustrious friend, was always reserved for Lord Priuli.
Apartments in the palace were, of course, assigned to Priuli, who
had likewise his own attendants. The entire control of the vast
establishment devolved upon the noble Venetian, who undertook the
office in order to relieve the Cardinal of a portion of his labours.
Amidst all this profusion the poor were not forgotten. Dole was
daily distributed at the palace gate, under the personal
superintendence of Pole and Priuli. The wants of the necessitous
were relieved, and medicines were delivered to the sick. None who
deserved assistance were ever sent empty-handed away by the
Cardinal.
Amongst the Cardinal’s officers were our old acquaintances
Rodomont Bittern, Nick Simnel, and Jack Holiday, the first of whom
had been recommended to Pole by the King himself. Rodomont was
appointed captain of the palace guard, and his two friends were
made lieutenants. On state occasions they formed part of the
Cardinal’s body-guard.
One fine morning, at an early hour, these three personages had
scaled the lofty gate-tower, in order to enjoy the goodly prospect it
commanded. Before them flowed the Thames, then a clear and
unpolluted stream, its smooth surface speckled, even at that early
hour, by many barques. A ferry-boat, laden with passengers and
horses, was crossing at the time from Lambeth to Westminster. On
the opposite side stood the ancient Abbey, with the Parliament
House, the Star Chamber, the beautiful gates of Whitehall, designed
by Holbein, the royal gardens, and the palace. Further on could be
observed the exquisite cross at Charing, subsequently destroyed by
fanatical fury. Then following the course of the river, the eye lighted
upon York-place, Durham-place, the Savoy, and the splendid
mansion then but recently completed by the aspiring Duke of
Somerset. Further on was the ancient palace of Bridewell, and
beyond, Baynard’s Castle, while above the clustering habitations of
the City rose the massive tower and lofty spire of old Saint Paul’s.
London at the period of which we treat was singularly picturesque
and beautiful. The walls encircling it were well fortified and in good
repair, and most of its oldest and most remarkable edifices were still
standing, no terrible conflagration having as yet touched them.
Numberless towers, churches, and picturesque habitations, with high
roofs and quaint gables, excited the admiration of those who stood
that morn on the gateway of Lambeth Palace; but perhaps the
object that pleased them best was London-bridge, which, with its
gates, its drawbridges, its church, and lofty habitations, proudly
bestrode the Thames. Having gazed their fill at this wondrous
structure, or rather collection of structures, they turned towards the
Surrey side of the river, and noted Saint Mary Overy’s fine old
church, the palace of the Bishop of Winchester, the Ring, at that
time much frequented, in which bulls and bears were baited, and the
adjacent theatre, wherein, at a later date, many of the plays of our
immortal bard were represented. Content with this distant survey,
they then looked nearer home, and allowed their gaze to wander
over the park and gardens of the palace, and finally to settle upon
the various courts, towers, and buildings composing the pile.
“By my faith, ’tis a stately edifice, this palace of Lambeth!”
exclaimed Rodomont. “Our lord and master the Cardinal is as well
lodged as the King and Queen at Whitehall.”
“Were it not for yonder ague-bringing marshes the palace would
be a marvellous pleasant residence,” observed Nick Simnel.
“Why should a sturdy fellow like you, Nick, fear ague?” cried
Rodomont. “Lord Priuli tells me that his Eminence enjoys better
health here than he has done since he left the Lago di Garda—a
plain proof that the place cannot be insalubrious, as you would have
it.”
“Follow my example, Nick, and fortify yourself against the morning
mists with a thimbleful of aqua vitæ,” remarked Jack Holiday, with a
laugh. “’Tis a sovereign remedy against ague. But see! yonder are
the Cardinal and the Lord Priuli, taking an early walk in the garden.
They seem engaged in earnest discourse.”
“I warrant their discourse relates to the recusant Protestant
divines, who have just been excommunicated by the ecclesiastical
commissioners, and are to be burnt,” observed Rodomont. “There
will be rare doings at Smithfield ere long, if Gardiner and Bonner
have their way. But our good lord the Cardinal is averse to
persecution, and may succeed in checking it.”
“Heaven grant he may!” exclaimed Jack Holiday. “If once the fires
are lighted at Smithfield, there’s no saying when they may be
extinguished, or who may perish by them. ’Tis a marvel to me that
the late occupant of this palace, Cranmer, has so long been spared.
If the ecclesiastical commissioners desire to deal a heavy blow
against the Reformers, why not strike their leader now they have
him in their power?”
“I will tell you why,” rejoined Rodomont. “In this high place none
can overhear us, so we may talk freely. Gardiner would fain be
Archbishop of Canterbury, but he knows that if Cranmer be burnt,
our lord the Cardinal will at once be appointed to the archiepiscopal
see. Therefore Cranmer is allowed to live, in the hope that Pole may
be recalled to Rome by his Holiness. But the crafty Bishop of
Winchester will be disappointed, for the Cardinal is not likely to leave
his native country again.”
“I am rejoiced to hear it,” said Simnel. “We could ill spare him. The
Cardinal is the pillar of the Romish church in England.”
“By our Lady, he is a pattern to all,” cried Rodomont. “There lives
not a better man than his Eminence. Even the Queen, they say, is
governed by his advice. He has more influence with her than the
King himself.”
“Like enough,” observed Jack Holiday, “for they do say that the
royal couple, like other married folk, have an occasional quarrel. Her
Majesty is plaguily jealous.”
“And not without reason,” said Rodomont, with a laugh. “It was
not to be expected that the King, who is of an amorous complexion,
as all the world knows, should continue faithful to a woman eleven
years older than himself, and ill-favoured into the bargain. He wants
something younger and better-looking.”
“Like poor Constance Tyrrell,” said Nick Simnel; “she who is shut
up yonder,” he added, pointing to the Lollards’ Tower.
“Ay, and she will never get out unless she yields to the King’s
wishes,” observed Jack Holiday.
“Don’t be too sure of that,” rejoined Rodomont. “It will be her own
fault if she remains here another twenty-four hours.”
“How so?—who will unlock the door for her?—not her gaoler?”
said Holiday.
“Not her gaoler, fool,” rejoined Rodomont, “but her lover, Osbert
Clinton. Since he can’t unlock the door, he will unbar the window.
You are both too generous to betray him, I know, and therefore I’ll
e’en tell you what occurred last night. While making my rounds, a
little after midnight, I entered the outer court, and was standing
near the Water Tower, when looking up, I espied a head above yon