PDP: Parallel Dynamic Programming

Fei-Yue Wang, Jie Zhang, Qinglai Wei, Xinhu Zheng, and Li Li
Abstract—Deep reinforcement learning is a focus research area in artificial intelligence. The principle of optimality in dynamic programming is a key to the success of reinforcement learning methods. The principle of adaptive dynamic programming (ADP) is first presented instead of direct dynamic programming (DP), and the inherent relationship between ADP and deep reinforcement learning is developed. Next, analytics intelligence, as a necessary requirement for real reinforcement learning, is discussed. Finally, the principle of parallel dynamic programming, which integrates dynamic programming and analytics intelligence, is presented as the future of computational intelligence.

Index Terms—Parallel dynamic programming, Dynamic programming, Adaptive dynamic programming, Reinforcement learning, Deep learning, Neural networks, Artificial intelligence.

Manuscript received November 11, 2015; accepted December 21, 2016. This work was supported by the National Natural Science Foundation of China (61533019, 61374105, 71232006, 61233001, 71402178).

Citation: F.-Y. Wang, J. Zhang, Q. L. Wei, X. H. Zheng, and L. Li, "PDP: parallel dynamic programming," IEEE/CAA Journal of Automatica Sinica, vol. 4, no. 1, pp. 1-5, Jan. 2017.

Fei-Yue Wang is with The State Key Laboratory of Management and Control for Complex Systems (SKL-MCCS), Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing 100190, China, the School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049, China, and also with the Research Center for Military Computational Experiments and Parallel Systems Technology, National University of Defense Technology, Changsha 410073, China (e-mail: [email protected]).

Jie Zhang is with The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences (SKL-MCCS, CASIA), Beijing 100190, China, and also with the Qingdao Academy of Intelligent Industries, Shandong 266000, China (e-mail: [email protected]).

Qinglai Wei is with The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences (SKL-MCCS, CASIA), Beijing 100190, China, and also with the School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049, China (e-mail: [email protected]).

Xinhu Zheng is with the Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55414, USA (e-mail: [email protected]).

Li Li is with the Department of Automation, Tsinghua University, Beijing 100084, China (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JAS.2017.7510310

I. INTRODUCTION

Google DeepMind's deep reinforcement learning based AlphaGo computer program [1] won the historic Go match against world champion Lee Sedol in March 2016. The combination of Monte-Carlo tree search and deep reinforcement learning achieved a breakthrough in Go playing, which had been believed impossible with brute-force search, and made artificial intelligence a focus of the year. Most people pay more attention to the intuitive, highly brain-like deep learning technologies. However, the other key to AlphaGo's success, the Principle of Optimality of dynamic programming, has often been taken for reinforcement learning (RL) itself. As a matter of fact, dynamic programming plays a very important role in modern reinforcement learning. The victory of AlphaGo is actually also a victory of dynamic programming.

Dynamic programming [2] has been well known since the 1950s in many fields. In 1977, Werbos combined DP, neural networks, and reinforcement learning, and introduced approximate/adaptive dynamic programming (ADP) [3], [4] to address the "curse of dimensionality" [2]. However, trial-and-error based reinforcement learning and deep learning both focus on engineering complexity and ignore social complexity. In this article, we suggest another extension of dynamic programming that considers both engineering and social complexities, aiming at the "paradox of scientific methods" in the "scientific solutions" of complex systems [5]. We utilize big data analytics and the ACP approach [6], [7]: artificial societies for descriptive analytics, computational experiments for predictive analytics, and parallel execution for prescriptive analytics. We name our approach Parallel Dynamic Programming (PDP).

This article is organized as follows. The next section reviews dynamic programming and adaptive dynamic programming. Then, we briefly discuss the neural network structure of ADP and AlphaGo. We present the ACP approach to analytics intelligence in Section IV. In Section V, we introduce the basic structure of parallel dynamic programming. The last section concludes the article.

II. FROM DYNAMIC PROGRAMMING TO ADAPTIVE DYNAMIC PROGRAMMING

Dynamic programming (DP) is a very useful tool for solving optimization and optimal control problems [8]-[10]. The dynamic programming technique rests on a very simple idea, Bellman's principle of optimality [2]: "An optimal policy has the property that no matter what the previous decisions (i.e., controls) have been, the remaining decisions must constitute an optimal policy with regard to the state resulting from those previous decisions."

DP can easily be applied to the optimal control of discrete-time nonlinear systems. Let the system be

x_{k+1} = F_k(x_k, u_k),    (1)

where x_k and u_k are the state and control, respectively, and F_k(·) is the system function at time k. Suppose we associate with this plant the performance index function

J_i(x_i) = \varphi(N, x_N) + \sum_{k=i}^{N-1} U_k(x_k, u_k),    (2)
where [i, N] is the time interval of interest. According to Bellman's principle of optimality, the optimal performance index function, which is to be minimized, satisfies the following equation

J_k^*(x_k) = \min_{u_k} \{ U_k(x_k, u_k) + J_{k+1}^*(x_{k+1}) \}.    (3)

Equation (3) is Bellman's optimality equation. Its importance lies in the fact that it allows us to optimize over only one control vector at a time by working backward from N. It is called the functional equation of dynamic programming and is the basis for computer implementation of Bellman's method. However, it is often computationally untenable to obtain the optimal control by directly solving the Bellman equation (3) due to the backward numerical process required for its solution, i.e., as a result of the well-known "curse of dimensionality" [2]. We have to find a series of optimal control actions that must be taken in sequence. This sequence will give the optimal performance index, but the total cost of these actions is unknown until the end of that sequence.
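To make the backward-in-time nature of (3) concrete, the following is a minimal sketch of solving (3) by exhaustive backward recursion on a small, illustrative problem. The grid sizes, the horizon N, and the functions F, U, and phi below are hypothetical choices for illustration only; they are not taken from the paper.

```python
# Minimal backward dynamic programming sketch for equation (3),
# assuming a toy problem: scalar states/controls on a grid,
# quadratic stage cost U, simple dynamics F, and terminal cost phi.
import numpy as np

N = 10                                   # horizon (illustrative)
states = np.linspace(-2.0, 2.0, 41)      # discretized state grid
controls = np.linspace(-1.0, 1.0, 21)    # discretized control grid

F = lambda x, u: np.clip(x + u, -2.0, 2.0)   # system function x_{k+1} = F(x_k, u_k)
U = lambda x, u: x**2 + u**2                 # utility (stage cost) U_k(x_k, u_k)
phi = lambda x: 10.0 * x**2                  # terminal cost phi(N, x_N)

J = {N: {i: phi(x) for i, x in enumerate(states)}}   # J_N*(x) = phi(N, x)
policy = {}

def nearest(x):
    """Index of the grid point closest to x (crude state aggregation)."""
    return int(np.argmin(np.abs(states - x)))

# Work backward from k = N-1 to k = 0, applying (3) at every grid state.
for k in range(N - 1, -1, -1):
    J[k], policy[k] = {}, {}
    for i, x in enumerate(states):
        costs = [U(x, u) + J[k + 1][nearest(F(x, u))] for u in controls]
        best = int(np.argmin(costs))
        J[k][i] = costs[best]            # J_k*(x) = min_u {U(x,u) + J_{k+1}*(F(x,u))}
        policy[k][i] = controls[best]    # optimal control at (k, x)

print("J_0*(x=1.0) ≈", J[0][nearest(1.0)])
```

The nested loops over states and controls are exactly what breaks down in high dimensions: the table J grows exponentially with the state dimension, which is the "curse of dimensionality" that ADP is designed to circumvent.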
Approximate dynamic programming, proposed by Werbos [3], [4], circumvents the "curse of dimensionality" by building a system, called the "critic", to approximate the cost function in dynamic programming. The main idea of approximate dynamic programming is shown in Fig. 1. There are three parts in the structure of approximate dynamic programming: the dynamic system, the critic module, and the action module. First, the action module outputs a control policy according to the system state. Second, according to the system implementation, the critic module receives an evaluation signal. Third, a reinforcement signal is created by the critic network, which guides the action module to find a better, or at least no worse, control policy. The whole implementation is self-learning, and the critic and action modules can be regarded as an agent. According to the principle in Fig. 1, the dynamic programming problem is expected to be solved forward in time. In [11], [12], the approximate dynamic programming method was implemented with each part in Fig. 1 modeled by a neural network, and is hence called "Neuro-Dynamic Programming". Several synonyms are used, such as "Adaptive Critic Designs" [13], [14], "Asymptotic Dynamic Programming" [15], "Adaptive Dynamic Programming" [16]-[21], and "Neural Dynamic Programming" [22]. In 2006, the synonyms were unified as "Adaptive Dynamic Programming (ADP)" [23]-[36].

III. NEURAL NETWORK STRUCTURE OF ADP

HDP is the most basic and widely applied structure of ADP [11], [13]. The structure of HDP is shown in Fig. 2. HDP is a method for estimating the performance index function. Estimating the performance index function for a given policy only requires samples from the instantaneous utility function U, while models of the environment and the instantaneous reward are needed to find the performance index function corresponding to the optimal policy.

[Fig. 2: block diagram with Action Network, Model Network, and Critic Network blocks connected by the signals x_k, u_k, U(x_k, u_k), and J(x_k).]

Fig. 2. The HDP structure diagram.

In the HDP structure, the model network aims to describe the dynamics of the system, the action network aims to approximate the control policy of the system, and the critic network aims to approximate the performance index function. If each neural network in HDP is chosen as a three-layer back-propagation (BP) network, then the neural network structure of HDP can be expressed as in Fig. 3.

[Fig. 3: the Action Network, Model Network, and Critic Network realized as three-layer BP networks, with inputs u_k, x_k and output J_k, under the heading "Deep Neural Network Based Reinforcement Learning".]

In Fig. 3, we can say that we use three BP neural networks to implement the learning of the optimal control. However, if the three neural networks are regarded as one neural network, then we can say that we implement the learning of the optimal control by at least a nine-layer deep BP network [37]. From this point of view, the structure of HDP is that of a deep neural network. For all the other structures of ADP [13], such as dual heuristic programming (DHP), global dual heuristic programming (GDHP), and their action-dependent versions, the structures can also be transformed into one deep neural network. Thus, the structure of ADP is naturally a deep neural network. The training target of the deep neural network is to force the following error

e_k = U_k(x_k, u_k) + J_{k+1}(x_{k+1}) - J_k(x_k)    (4)

to zero. Obviously, the training error e_k can be chosen as the reinforcement signal, which optimizes the control policy to minimize the distance between e_k and the equilibrium point.
Hence, the optimization process of ADP is actually a reinforcement learning process via a deep neural network. This is strikingly similar to the implementation of AlphaGo. Earlier works in [38], [39] also provided neural network based control methods with embedded knowledge structures.
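As a concrete illustration of how the error (4) can drive learning, here is a minimal HDP-style sketch in which a critic is trained on e_k and an action rule is adjusted to decrease the estimated cost-to-go. The linear dynamics, quadratic features, learning rates, and episode counts are illustrative assumptions and not the paper's algorithm.

```python
# Minimal HDP-style sketch built around the error e_k in (4):
# a critic approximates J(x), an action rule approximates u(x), and a known
# model x_{k+1} = F(x_k, u_k) is used, as in the HDP structure of Fig. 2.
# All numerical choices below are illustrative assumptions.
import numpy as np

F = lambda x, u: 0.9 * x + u           # model network stand-in (known dynamics)
U = lambda x, u: x**2 + u**2           # utility U_k(x_k, u_k)

phi = lambda x: np.array([x * x, x, 1.0])   # critic features: J(x) ≈ w·phi(x)
w = np.zeros(3)                             # critic weights
K = 0.0                                     # action rule u(x) = -K x

alpha_c, alpha_a = 0.05, 0.01               # critic / action learning rates

rng = np.random.default_rng(0)
for episode in range(200):
    x = rng.uniform(-1.0, 1.0)
    for k in range(20):
        u = -K * x
        x_next = F(x, u)

        # Critic update: push e_k = U + J(x_{k+1}) - J(x_k) toward zero.
        e = U(x, u) + w @ phi(x_next) - w @ phi(x)
        w += alpha_c * e * phi(x)

        # Action update: nudge K to reduce U(x,u) + J(F(x,u)) via d/du.
        dJdx_next = 2.0 * w[0] * x_next + w[1]
        dQdu = 2.0 * u + dJdx_next          # d/du [U + J(F(x,u))], with dF/du = 1
        K += alpha_a * dQdu * x             # since du/dK = -x, gradient step on K

        x = x_next

print("learned critic weights:", w, " feedback gain K ≈", K)
```

In the deep version discussed above, w and the action rule would be the weights of multi-layer networks trained by back-propagation, but the role of e_k as the reinforcement signal is the same.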
IV. ANALYTICS INTELLIGENCE: FROM ACP TO DPP

Reinforcement learning is a computational approach to understanding and automating goal-directed learning and decision-making. During the implementation of ADP, it is emphasized that reinforcement learning is a key technique for finding a better control policy of the system via trial and error. However, it should be pointed out that shortcomings are inherent in the trial-and-error approach. Many real-world control systems cannot be "tried" sufficiently for reasons of security and cost. Particularly, for systems that involve humans and societies (cyber-physical-social systems, CPSS) [40], the "error" is sometimes intolerable. The success of AlphaGo suggests the possibility of conducting reinforcement learning with a virtual Go game played by two virtual players. However, unlike the Go game, we do not know the exact rules or dynamic systems in most of our real-world control and management problems.

Data-driven parallel systems in cyberspace are the key to solving the trial-and-error challenge. Two founding pioneers of modern management science stated famous maxims for operations: W. Edwards Deming, "In God we trust; all others must bring data," and Peter F. Drucker, "The best way to predict the future is to create it." Our suggestion is to integrate artificial intelligence and analytics [6]: artificial societies (or systems) for descriptive analytics, computational experiments for predictive analytics, and parallel execution for prescriptive analytics. The DPP (descriptive, predictive and prescriptive) analytics intelligence can be built based on the ACP approach, as shown in Fig. 4.

Predictive analysis [41]-[43] should be performed, according to the descriptive model and historical data, to predict the future by reasoning. It tells us "what will happen", "when will it happen", and "why will it happen". In our ACP approach, predictive analytics is conducted through computational experiments that predict the future of a given artificial society under a given control and management policy. "Big data" are created by the computational experiments in cyberspace.

Finally, no matter how many possible futures or policies there are, we can choose only one to implement in the real world. Hence, after the predictive analytics of different policies and different artificial societies, we reduce the "big data", extract rules, and create the real future through learning and adaptation. In our ACP approach, prescriptive analytics is developed to find beneficial policies from the predictions, based on the descriptive models and historical data, through parallel execution. Data are collected for further descriptive and predictive analytics during the parallel execution.
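The DPP loop just described can be summarized in code. The sketch below is only meant to show how the three analytics stages feed one another under strong simplifying assumptions: the "artificial society" is a one-line linear model fitted to logged data, the "computational experiment" is a short roll-out, and the candidate policies are simple feedback gains. None of these modeling choices come from the paper.

```python
# Illustrative DPP (descriptive / predictive / prescriptive) loop built on the
# ACP approach. Everything here is a toy stand-in.
import numpy as np

def build_artificial_society(xs, us, xs_next):
    """Descriptive analytics: fit x_{k+1} ≈ a*x_k + b*u_k from logged data."""
    A = np.column_stack([xs, us])
    a, b = np.linalg.lstsq(A, xs_next, rcond=None)[0]
    return a, b

def run_experiment(model, gain, x0=1.0, horizon=30):
    """Predictive analytics: roll the artificial model out under u = -gain*x."""
    a, b = model
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -gain * x
        cost += x**2 + u**2
        x = a * x + b * u
    return cost

# Logged data from the "real" system (here simulated with a=0.9, b=1.0).
rng = np.random.default_rng(1)
xs = rng.uniform(-1, 1, 200)
us = rng.uniform(-1, 1, 200)
xs_next = 0.9 * xs + 1.0 * us + 0.01 * rng.standard_normal(200)

model = build_artificial_society(xs, us, xs_next)
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
predicted = {g: run_experiment(model, g) for g in candidates}

# Prescriptive analytics: choose one policy to execute in the real world;
# new observations would then refresh the descriptive model.
best_gain = min(predicted, key=predicted.get)
print("predicted costs:", predicted, "-> chosen gain:", best_gain)
```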
V. PARALLEL DYNAMIC PROGRAMMING

In the practice of AlphaGo, one of the key ideas is to extract a policy by supervised learning and then improve it into a sub-optimal policy during the reinforcement learning procedure. In parallel dynamic programming (PDP), we suggest the ACP approach for decision making in CPSS with analytics intelligence.

Hence, assume we have n artificial systems (n = 3 in Fig. 5) parallelized with the real-world system. Since the state equations of the artificial systems are known, we can employ dynamic programming or adaptive dynamic programming, in a trial-and-error fashion and without any cost or risk, to solve the optimal control problems in the virtual parallel systems and obtain n optimal (or sub-optimal) decisions. Note that this procedure is naturally distributed, and previous real-world decisions can be used as initial guesses in the reinforcement learning iterations to reduce the computation. Then, the computational experiments can be conducted in a bionic manner: based on a voting mechanism, the n virtual systems vote on the n decisions. Hence, we obtain a winning decision and its corresponding critic network. Note that the computational experiments search for an acceptable artificial system and an admissible decision, rather than the optimal control.

The parallel execution is based on the optimality principle of dynamic programming and on the critic network selected in the computational experiments. We adjust the decision according to the winning critic network and the observed real-world state. The virtual-real system interaction is conducted by observing states and errors, updating the artificial systems, and adjusting the voting mechanism.

A detailed implementation of the PDP algorithm for unknown discrete systems has been carried out and the results are very interesting and promising [44]; more work is underway and will be reported.
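The workflow described above can be sketched as follows. This is a toy sketch under strong simplifying assumptions: n = 3 linear artificial models of a scalar plant (mirroring the n = 3 example in the text), the per-system DP/ADP step replaced by a cheap roll-out search over a few candidate gains, and a simple score-based vote. It is not the implementation of [44]; it is only intended to show how the pieces, namely parallel artificial systems, per-system optimization, voting, and virtual-real interaction, fit together.

```python
# Toy PDP sketch: n artificial systems in parallel with one "real" system.
# Each artificial model proposes a decision (here, a feedback gain); the
# artificial systems then vote on the proposals, and the winning decision is
# executed on the real system, whose observations update the artificial models.
import numpy as np

rng = np.random.default_rng(2)
real_a, real_b = 0.9, 1.0                      # unknown "real" dynamics
models = [(0.9 + 0.1 * rng.standard_normal(),  # n = 3 artificial systems
           1.0 + 0.1 * rng.standard_normal()) for _ in range(3)]
gains = np.linspace(0.1, 1.2, 12)              # candidate decisions

def rollout_cost(model, gain, x0=1.0, horizon=30):
    a, b = model
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -gain * x
        cost += x**2 + u**2
        x = a * x + b * u
    return cost

for cycle in range(5):
    # Each artificial system proposes its best decision (stand-in for DP/ADP).
    proposals = [min(gains, key=lambda g: rollout_cost(m, g)) for m in models]

    # Voting: every artificial system scores every proposal; lowest total wins.
    totals = {g: sum(rollout_cost(m, g) for m in models) for g in set(proposals)}
    winner = min(totals, key=totals.get)

    # Parallel execution: apply the winning decision to the real system and
    # use the observed transitions to refine each artificial model (LMS update).
    x = 1.0
    for _ in range(20):
        u = -winner * x
        x_next = real_a * x + real_b * u + 0.01 * rng.standard_normal()
        models = [(a + 0.05 * (x_next - (a * x + b * u)) * x,
                   b + 0.05 * (x_next - (a * x + b * u)) * u)
                  for (a, b) in models]
        x = x_next
    print(f"cycle {cycle}: winning gain = {winner:.2f}")
```

In an actual PDP implementation, each inner optimization would be a DP/ADP iteration on the corresponding artificial system, and the voting rule itself would be updated through the virtual-real interaction described above.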
VI. REMARK AND CONCLUSION

"Scientific solutions" need to satisfy two conditions: they must be triable and repeatable [6]. In real-world systems that involve humans and societies, trial-and-error based reinforcement learning cannot be conducted unless we already know that the "error" will be harmless. On the other hand, the suggested parallel dynamic programming conducts computational experiments in virtual systems guided by the principle of optimality. Unlike the game of AlphaGo, PDP is based on parallel systems [45], [46] instead of the exact rules of real-world systems, and will be more flexible and feasible for complex problems.
REFERENCES

[1] D. Silver et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484−489, 2016.
[2] R. E. Bellman, Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.
[3] P. J. Werbos, "Advanced forecasting methods for global crisis warning and models of intelligence," General Syst. Yearbook, vol. 22, 1977.
[4] P. J. Werbos, "A menu of designs for reinforcement learning over time," in Neural Networks for Control, W. T. Miller, R. S. Sutton, and P. J. Werbos (Eds.), Cambridge: MIT Press, 1991, pp. 67−95.
[5] F.-Y. Wang et al., "Where does AlphaGo go: From Church-Turing thesis to AlphaGo thesis and beyond," IEEE/CAA J. Autom. Sinica, vol. 3, no. 2, pp. 113−120, Apr. 2016.
[6] F.-Y. Wang, "A big-data perspective on AI: Newton, Merton, and analytics intelligence," IEEE Intell. Syst., vol. 27, no. 5, pp. 2−4, 2012.
[7] L. Li, Y.-L. Lin, D.-P. Cao, N.-N. Zheng, and F.-Y. Wang, "Parallel learning: A new framework for machine learning," Acta Autom. Sinica, vol. 43, no. 1, pp. 1−8, 2017 (in Chinese).
[8] J. Li, W. Xu, J. Zhang, M. Zhang, Z. Wang, and X. Li, "Efficient video stitching based on fast structure deformation," IEEE Trans. Cybern., article in press, 2015. DOI: 10.1109/TCYB.2014.2381774.
[9] C. Vagg, S. Akehurst, C. J. Brace, and L. Ash, "Stochastic dynamic programming in the real-world control of hybrid electric vehicles," IEEE Trans. Control Syst. Technol., vol. 24, no. 3, pp. 853−866, Mar. 2016.
[10] P. M. Esfahani, D. Chatterjee, and J. Lygeros, "Motion planning for continuous-time stochastic processes: A dynamic programming approach," IEEE Trans. Autom. Control, vol. 61, pp. 2155−2170, 2016.
[11] P. J. Werbos, "Approximate dynamic programming for real-time control and neural modeling," in Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, D. A. White and D. A. Sofge (Eds.), New York: Van Nostrand Reinhold, 1992, ch. 13.
[12] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.
[13] D. V. Prokhorov and D. C. Wunsch, "Adaptive critic designs," IEEE Trans. Neural Netw., vol. 8, no. 5, pp. 997−1007, Sep. 1997.
[14] J. Han, S. Khushalani-Solanki, J. Solanki, and J. Liang, "Adaptive critic design-based dynamic stochastic optimal control design for a microgrid with multiple renewable resources," IEEE Trans. Smart Grid, vol. 6, no. 6, pp. 2694−2703, Jun. 2015.
[15] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
[16] J. J. Murray, C. J. Cox, G. G. Lendaris, and R. Saeks, "Adaptive dynamic programming," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 32, no. 2, pp. 140−153, May 2002.
[17] Q. Wei, F. L. Lewis, D. Liu, R. Song, and H. Lin, "Discrete-time local value iteration adaptive dynamic programming: Convergence analysis," IEEE Trans. Syst., Man, Cybern., Syst., article in press, 2016. DOI: 10.1109/TSMC.2016.2623766.
[18] Q. Wei, F. L. Lewis, Q. Sun, P. Yan, and R. Song, "Discrete-time deterministic Q-learning: A novel convergence analysis," IEEE Trans. Cybern., article in press, 2016. DOI: 10.1109/TCYB.2016.2542923.
[19] Q. Wei, D. Liu, and G. Shi, "A novel dual iterative Q-learning method for optimal battery management in smart residential environments," IEEE Trans. Ind. Electron., vol. 62, no. 4, pp. 2509−2518, Apr. 2015.
[20] Q. Wei and D. Liu, "A novel iterative θ-adaptive dynamic programming for discrete-time nonlinear systems," IEEE Trans. Autom. Sci. Eng., vol. 11, no. 4, pp. 1176−1190, Oct. 2014.
[21] Q. Wei, D. Liu, Q. Lin, and R. Song, "Discrete-time optimal control via local policy iteration adaptive dynamic programming," IEEE Trans. Cybern., article in press, 2016. DOI: 10.1109/TCYB.2016.2586082.
[22] R. Enns and J. Si, "Helicopter trimming and tracking control using direct neural dynamic programming," IEEE Trans. Neural Netw., vol. 14, no. 4, pp. 929−939, Aug. 2003.
[23] R. Kamalapurkar, J. R. Klotz, and W. E. Dixon, "Concurrent learning-based approximate feedback-Nash equilibrium solution of N-player nonzero-sum differential games," IEEE/CAA J. Autom. Sinica, vol. 1, no. 3, pp. 239−247, Jul. 2014.
[24] Q. Wei, D. Liu, and Q. Lin, "Discrete-time local iterative adaptive dynamic programming: Terminations and admissibility analysis," IEEE Trans. Neural Netw. Learn. Syst., article in press, 2016. DOI: 10.1109/TNNLS.2016.2593743.
[25] Q. Wei, R. Song, and P. Yan, "Data-driven zero-sum neuro-optimal control for a class of continuous-time unknown nonlinear systems with disturbance using ADP," IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 2, pp. 444−458, Feb. 2016.
[26] H. Zhang, C. Qin, B. Jiang, and Y. Luo, "Online adaptive policy learning algorithm for H∞ state feedback control of unknown affine nonlinear discrete-time systems," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2706−2718, Dec. 2014.
[27] F.-Y. Wang and G. N. Saridis, "Suboptimal control for nonlinear stochastic systems," in Proc. 31st IEEE Conf. Decision and Control, 1992.
[28] G. N. Saridis and F.-Y. Wang, "Suboptimal control of nonlinear stochastic systems," Control Theory and Advanced Technology, vol. 10, no. 4, pp. 847−871, 1994.
[29] Q. Wei, D. Liu, and X. Yang, "Infinite horizon self-learning optimal control of nonaffine discrete-time nonlinear systems," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 4, pp. 866−879, Apr. 2015.
[30] Q. Wei, D. Liu, Y. Liu, and R. Song, "Optimal constrained self-learning battery sequential management in microgrid via adaptive dynamic programming," IEEE/CAA J. Autom. Sinica, article in press, 2016. DOI: 10.1109/JAS.2016.7510262.
[31] Q. Zhao, H. Xu, and S. Jagannathan, "Near optimal output feedback control of nonlinear discrete-time systems based on reinforcement neural network learning," IEEE/CAA J. Autom. Sinica, vol. 1, no. 4, pp. 372−384, Oct. 2014.
[32] Q. Wei, D. Liu, G. Shi, and Y. Liu, "Optimal multi-battery coordination control for home energy management systems via distributed iterative adaptive dynamic programming," IEEE Trans. Ind. Electron., vol. 62, no. 7, pp. 4203−4214, Jul. 2015.
[33] Q. Wei, D. Liu, and H. Lin, "Value iteration adaptive dynamic programming for optimal control of discrete-time nonlinear systems," IEEE Trans. Cybern., vol. 46, no. 3, pp. 840−853, Mar. 2016.
[34] Q. Wei, F. Wang, D. Liu, and X. Yang, "Finite-approximation-error-based discrete-time iterative adaptive dynamic programming," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2820−2833, Dec. 2014.
[35] H. Li and D. Liu, "Optimal control for discrete-time affine non-linear systems using general value iteration," IET Control Theory Appl., vol. 6, no. 18, pp. 2725−2736, Dec. 2012.
[36] W. Gao and Z.-P. Jiang, "Adaptive dynamic programming and adaptive optimal output regulation of linear systems," IEEE Trans. Autom. Control, vol. 61, no. 12, pp. 4164−4169, Dec. 2016.
[37] Y. Duan, Y. Lv, J. Zhang, X. Zhao, and F.-Y. Wang, "Deep learning for control: The state of the art and prospects," Acta Autom. Sinica, vol. 42, no. 5, pp. 643−654, 2016.
[38] F.-Y. Wang, "Building knowledge structure in neural nets using fuzzy logic," in Robotics and Manufacturing: Recent Trends in Research, Education and Applications, M. Jamshidi (Ed.), New York, NY: ASME Press, 1992.
[39] F.-Y. Wang and H.-A. Kim, "Implementing adaptive fuzzy logic controllers with neural networks: A design paradigm," J. Intell. Fuzzy Syst., vol. 3, no. 2, pp. 165−180, 1995.
[40] F.-Y. Wang, "The emergence of intelligent enterprises: From CPS to CPSS," IEEE Intell. Syst., vol. 25, no. 4, pp. 85−88, 2010.
[41] C. Nyce, "Predictive analytics white paper," American Institute for Chartered Property Casualty Underwriters/Insurance Institute of America, 2007.
[42] W. Eckerson, "Extending the value of your data warehousing investment," The Data Warehouse Institute, USA, 2007.
[43] J. R. Evans and C. H. Lindner, "Business analytics: The next frontier for decision sciences," Decision Line, vol. 43, no. 2, pp. 1−4, Mar. 2012.
[44] J. Zhang, Q. Wei, and F.-Y. Wang, "Parallel dynamic programming with an average-greedy mechanism for discrete systems," SKL-MCCS/QAII Tech. Report 01-09-2016, CASIA, Beijing, China.
[45] F.-Y. Wang, "Parallel control: A method for data-driven and computational control," Acta Autom. Sinica, vol. 39, no. 2, pp. 293−302, 2013.
[46] F.-Y. Wang, "Control 5.0: From Newton to Merton in Popper's Cyber-Social-Physical Spaces," IEEE/CAA J. Autom. Sinica, vol. 3, no. 3, pp. 233−234, 2016.

Jie Zhang (M'16) is an associate professor with The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. His current research interests include mechanism design and optimal control in e-commerce and traffic systems. He received his Ph.D. degree in Technology of Computer Application from the University of Chinese Academy of Sciences in 2015. He received his B.Sc. degree in Information and Computing Science from Tsinghua University in 2005, and his M.Sc. degree in Operations Research and Control Theory from Renmin University of China in 2009.

Qinglai Wei (M'11) received the Ph.D. degree in control theory and control engineering from Northeastern University, Shenyang, China, in 2009. From 2009 to 2011, he was a postdoctoral fellow with The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China. He is currently a Professor of the institute. He is also a Professor at the University of Chinese Academy of Sciences. He has authored two books and published over 60 international journal papers. His research interests include adaptive dynamic programming, neural-networks-based control, optimal control, nonlinear systems, and their industrial applications.

Dr. Wei has been an Associate Editor of IEEE Transactions on Systems, Man, and Cybernetics: Systems since 2016, Information Sciences since 2016, Neurocomputing since 2016, Optimal Control Applications and Methods since 2016, and Acta Automatica Sinica since 2015, and he held the same position for IEEE Transactions on Neural Networks and Learning Systems during 2014-2015. He has been the Secretary of the IEEE Computational Intelligence Society (CIS) Beijing Chapter since 2015.