PDP: Parallel Dynamic Programming

Fei-Yue Wang, Jie Zhang, Qinglai Wei, Xinhu Zheng, and Li Li
Abstract—Deep reinforcement learning is a focus research area in artificial intelligence. The principle of optimality in dynamic programming is a key to the success of reinforcement learning methods. The principle of adaptive dynamic programming (ADP) is first presented instead of direct dynamic programming (DP), and the inherent relationship between ADP and deep reinforcement learning is developed. Next, analytics intelligence, as a necessary requirement for real reinforcement learning, is discussed. Finally, the principle of parallel dynamic programming, which integrates dynamic programming and analytics intelligence, is presented as the future of computational intelligence.

Index Terms—Parallel dynamic programming, Dynamic programming, Adaptive dynamic programming, Reinforcement learning, Deep learning, Neural networks, Artificial intelligence.

Manuscript received November 11, 2015; accepted December 21, 2016. This work was supported by the National Natural Science Foundation of China (61533019, 61374105, 71232006, 61233001, 71402178).

Citation: F.-Y. Wang, J. Zhang, Q. L. Wei, X. H. Zheng, and L. Li, "PDP: parallel dynamic programming," IEEE/CAA Journal of Automatica Sinica, vol. 4, no. 1, pp. 1-5, Jan. 2017.

Fei-Yue Wang is with The State Key Laboratory of Management and Control for Complex Systems (SKL-MCCS), Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing 100190, China, the School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049, China, and also with the Research Center for Military Computational Experiments and Parallel Systems Technology, National University of Defense Technology, Changsha 410073, China (e-mail: [email protected]).

Jie Zhang is with The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences (SKL-MCCS, CASIA), Beijing 100190, China, and also with the Qingdao Academy of Intelligent Industries, Shandong 266000, China (e-mail: [email protected]).

Qinglai Wei is with The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences (SKL-MCCS, CASIA), Beijing 100190, China, and also with the School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049, China (e-mail: [email protected]).

Xinhu Zheng is with the Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55414, USA (e-mail: [email protected]).

Li Li is with the Department of Automation, Tsinghua University, Beijing 100084, China (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JAS.2017.7510310

I. INTRODUCTION

Google DeepMind's deep reinforcement learning based AlphaGo computer program [1] won the historic Go match against world champion Lee Sedol in March 2016. The combination of Monte-Carlo tree search and deep reinforcement learning achieved a breakthrough in Go playing, which had been believed impossible with brute-force search, and made artificial intelligence a focus of the year. Most people pay more attention to the intuitive, highly brain-like deep learning technologies. However, the other key to AlphaGo's success, the Principle of Optimality of dynamic programming, has often been taken for reinforcement learning (RL) itself. As a matter of fact, dynamic programming plays a very important role in modern reinforcement learning. The victory of AlphaGo is actually also a victory of dynamic programming.

Dynamic programming [2] has been well known since the 1950s in many fields. In 1977, Werbos combined DP, neural networks, and reinforcement learning, and introduced approximate/adaptive dynamic programming (ADP) [3], [4] to address the "curse of dimensionality" [2]. However, trial-and-error based reinforcement learning and deep learning both focus on engineering complexity and ignore social complexity. In this article, we suggest another extension of dynamic programming that considers both engineering and social complexities, aiming at the "paradox of scientific methods" in the "scientific solutions" of complex systems [5]. We utilize big data analytics and the ACP approach [6], [7]: artificial societies for descriptive analytics, computational experiments for predictive analytics, and parallel execution for prescriptive analytics. We name our approach Parallel Dynamic Programming (PDP).

This article is organized as follows. The next section reviews dynamic programming and adaptive dynamic programming. Then, we briefly discuss the neural network structure of ADP and AlphaGo. We present the ACP approach to analytics intelligence in Section IV. In Section V, we introduce the basic structure of parallel dynamic programming. The last section concludes the article.

II. FROM DYNAMIC PROGRAMMING TO ADAPTIVE DYNAMIC PROGRAMMING

Dynamic programming (DP) is a very useful tool for solving optimization and optimal control problems [8]-[10]. The dynamic programming technique rests on a very simple idea, Bellman's principle of optimality [2]: "An optimal policy has the property that no matter what the previous decisions (i.e., controls) have been, the remaining decisions must constitute an optimal policy with regard to the state resulting from those previous decisions."

DP can easily be applied to the optimal control of discrete-time nonlinear systems. Let the system be

x_{k+1} = F_k(x_k, u_k),    (1)

where x_k and u_k are the state and control, respectively, and F_k(·) is the system function at time k. Suppose we associate with this plant the performance index function

J_i(x_i) = \varphi(N, x_N) + \sum_{k=i}^{N-1} U_k(x_k, u_k),    (2)
where [i, N] is the time interval of interest. According to Bellman's principle of optimality, the optimal performance index function, which is to be minimized, satisfies the following equation

J_k^*(x_k) = \min_{u_k} \{ U_k(x_k, u_k) + J_{k+1}^*(x_{k+1}) \}.    (3)

Equation (3) is Bellman's optimality equation. Its importance lies in the fact that it allows us to optimize over only one control vector at a time by working backward from N. It is called the functional equation of dynamic programming and is the basis for computer implementation of Bellman's method. However, it is often computationally untenable to obtain the optimal control by directly solving the Bellman equation (3) due to the backward numerical process required for its solution, i.e., as a result of the well-known "curse of dimensionality" [2]. We have to find a series of optimal control actions that must be taken in sequence. This sequence will give the optimal performance index, but the total cost of these actions is unknown until the end of that sequence.
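To make the backward-in-time nature of (3) concrete, the following is a minimal sketch of solving (3) by exhaustive backward recursion on a small, illustrative problem. The grid sizes, the horizon N, and the functions F, U, and phi below are hypothetical choices for illustration only; they are not taken from the paper.

```python
# Minimal backward dynamic programming sketch for equation (3),
# assuming a toy problem: scalar states/controls on a grid,
# quadratic stage cost U, simple dynamics F, and terminal cost phi.
import numpy as np

N = 10                                   # horizon (illustrative)
states = np.linspace(-2.0, 2.0, 41)      # discretized state grid
controls = np.linspace(-1.0, 1.0, 21)    # discretized control grid

F = lambda x, u: np.clip(x + u, -2.0, 2.0)   # system function x_{k+1} = F(x_k, u_k)
U = lambda x, u: x**2 + u**2                 # utility (stage cost) U_k(x_k, u_k)
phi = lambda x: 10.0 * x**2                  # terminal cost phi(N, x_N)

J = {N: {i: phi(x) for i, x in enumerate(states)}}   # J_N*(x) = phi(N, x)
policy = {}

def nearest(x):
    """Index of the grid point closest to x (crude state aggregation)."""
    return int(np.argmin(np.abs(states - x)))

# Work backward from k = N-1 to k = 0, applying (3) at every grid state.
for k in range(N - 1, -1, -1):
    J[k], policy[k] = {}, {}
    for i, x in enumerate(states):
        costs = [U(x, u) + J[k + 1][nearest(F(x, u))] for u in controls]
        best = int(np.argmin(costs))
        J[k][i] = costs[best]            # J_k*(x) = min_u {U(x,u) + J_{k+1}*(F(x,u))}
        policy[k][i] = controls[best]    # optimal control at (k, x)

print("J_0*(x=1.0) ≈", J[0][nearest(1.0)])
```

The nested loops over states and controls are exactly what breaks down in high dimensions: the table J grows exponentially with the state dimension, which is the "curse of dimensionality" that ADP is designed to circumvent.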
Approximate dynamic programming, proposed by Werbos [3], [4], circumvents the "curse of dimensionality" by building a system, called the "critic", to approximate the cost function in dynamic programming. The main idea of approximate dynamic programming is shown in Fig. 1. There are three parts in the structure of approximate dynamic programming: the dynamic system, the critic module, and the action module. First, the action module outputs a control policy according to the system state. Second, according to the system implementation, the critic module receives an evaluation signal. Third, a reinforcement signal is created by the critic network, which guides the action module to find a better, or at least no worse, control policy. The whole implementation is self-learning, and the critic and action modules can be regarded as an agent. According to the principle in Fig. 1, the dynamic programming problem is expected to be solved forward in time. In [11], [12], the approximate dynamic programming method was implemented with each part in Fig. 1 modeled by a neural network, and is hence called "Neuro-Dynamic Programming". Several synonyms are used, such as "Adaptive Critic Designs" [13], [14], "Asymptotic Dynamic Programming" [15], "Adaptive Dynamic Programming" [16]-[21], and "Neural Dynamic Programming" [22]. In 2006, the synonyms were unified as "Adaptive Dynamic Programming (ADP)" [23]-[36].

III. NEURAL NETWORK STRUCTURE OF ADP

HDP is the most basic and widely applied structure of ADP [11], [13]. The structure of HDP is shown in Fig. 2. HDP is a method for estimating the performance index function. Estimating the performance index function for a given policy only requires samples from the instantaneous utility function U, while models of the environment and the instantaneous reward are needed to find the performance index function corresponding to the optimal policy.

[Fig. 2: block diagram with Action Network, Model Network, and Critic Network blocks connected by the signals x_k, u_k, U(x_k, u_k), and J(x_k).]

Fig. 2. The HDP structure diagram.

In the HDP structure, the model network aims to describe the dynamics of the system, the action network aims to approximate the control policy of the system, and the critic network aims to approximate the performance index function. If each neural network in HDP is chosen as a three-layer back-propagation (BP) network, then the neural network structure of HDP can be expressed as in Fig. 3.

[Fig. 3: the Action Network, Model Network, and Critic Network realized as three-layer BP networks, with inputs u_k, x_k and output J_k, under the heading "Deep Neural Network Based Reinforcement Learning".]

In Fig. 3, we can say that we use three BP neural networks to implement the learning of the optimal control. However, if the three neural networks are regarded as one neural network, then we can say that we implement the learning of the optimal control by at least a nine-layer deep BP network [37]. From this point of view, the structure of HDP is that of a deep neural network. For all the other structures of ADP [13], such as dual heuristic programming (DHP), global dual heuristic programming (GDHP), and their action-dependent versions, the structures can also be transformed into one deep neural network. Thus, the structure of ADP is naturally a deep neural network. The training target of the deep neural network is to force the following error

e_k = U_k(x_k, u_k) + J_{k+1}(x_{k+1}) - J_k(x_k)    (4)

to zero. Obviously, the training error e_k can be chosen as the reinforcement signal, which optimizes the control policy to minimize the distance between e_k and the equilibrium point.
Hence, the optimization process of ADP is actually a reinforcement learning process via a deep neural network. This is strikingly similar to the implementation of AlphaGo. Earlier works in [38], [39] also provided neural network based control methods with embedded knowledge structures.
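As a concrete illustration of how the error (4) can drive learning, here is a minimal HDP-style sketch in which a critic is trained on e_k and an action rule is adjusted to decrease the estimated cost-to-go. The linear dynamics, quadratic features, learning rates, and episode counts are illustrative assumptions and not the paper's algorithm.

```python
# Minimal HDP-style sketch built around the error e_k in (4):
# a critic approximates J(x), an action rule approximates u(x), and a known
# model x_{k+1} = F(x_k, u_k) is used, as in the HDP structure of Fig. 2.
# All numerical choices below are illustrative assumptions.
import numpy as np

F = lambda x, u: 0.9 * x + u           # model network stand-in (known dynamics)
U = lambda x, u: x**2 + u**2           # utility U_k(x_k, u_k)

phi = lambda x: np.array([x * x, x, 1.0])   # critic features: J(x) ≈ w·phi(x)
w = np.zeros(3)                             # critic weights
K = 0.0                                     # action rule u(x) = -K x

alpha_c, alpha_a = 0.05, 0.01               # critic / action learning rates

rng = np.random.default_rng(0)
for episode in range(200):
    x = rng.uniform(-1.0, 1.0)
    for k in range(20):
        u = -K * x
        x_next = F(x, u)

        # Critic update: push e_k = U + J(x_{k+1}) - J(x_k) toward zero.
        e = U(x, u) + w @ phi(x_next) - w @ phi(x)
        w += alpha_c * e * phi(x)

        # Action update: nudge K to reduce U(x,u) + J(F(x,u)) via d/du.
        dJdx_next = 2.0 * w[0] * x_next + w[1]
        dQdu = 2.0 * u + dJdx_next          # d/du [U + J(F(x,u))], with dF/du = 1
        K += alpha_a * dQdu * x             # since du/dK = -x, gradient step on K

        x = x_next

print("learned critic weights:", w, " feedback gain K ≈", K)
```

In the deep version discussed above, w and the action rule would be the weights of multi-layer networks trained by back-propagation, but the role of e_k as the reinforcement signal is the same.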
IV. ANALYTICS INTELLIGENCE: FROM ACP TO DPP

Reinforcement learning is a computational approach to understanding and automating goal-directed learning and decision-making. During the implementation of ADP, it is emphasized that reinforcement learning is a key technique for finding a better control policy of the system via trial and error. However, it should be pointed out that shortcomings are inherent in the trial-and-error approach. Many real-world control systems cannot be "tried" sufficiently for reasons of security and cost. Particularly, for systems that involve humans and societies (cyber-physical-social systems, CPSS) [40], the "error" is sometimes intolerable. The success of AlphaGo suggests the possibility of conducting reinforcement learning with a virtual Go game played by two virtual players. However, unlike the Go game, we do not know the exact rules or dynamic systems in most of our real-world control and management problems.

Data-driven parallel systems in cyberspace are the key to solving the trial-and-error challenge. Two founding pioneers of modern management science stated famous maxims for operations: W. Edwards Deming, "In God we trust; all others must bring data," and Peter F. Drucker, "The best way to predict the future is to create it." Our suggestion is to integrate artificial intelligence and analytics [6]: artificial societies (or systems) for descriptive analytics, computational experiments for predictive analytics, and parallel execution for prescriptive analytics. The DPP (descriptive, predictive and prescriptive) analytics intelligence can be built based on the ACP approach, as shown in Fig. 4.

Predictive analysis [41]-[43] should be performed, according to the descriptive model and historical data, to predict the future by reasoning. It tells us "what will happen", "when will it happen", and "why will it happen". In our ACP approach, predictive analytics is conducted through computational experiments that predict the future of a given artificial society under a given control and management policy. "Big data" are created by the computational experiments in cyberspace.

Finally, no matter how many possible futures or policies there are, we can choose only one to implement in the real world. Hence, after the predictive analytics of different policies and different artificial societies, we reduce the "big data", extract rules, and create the real future through learning and adaptation. In our ACP approach, prescriptive analytics is developed to find beneficial policies from the predictions, based on the descriptive models and historical data, through parallel execution. Data are collected for further descriptive and predictive analytics during the parallel execution.
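The DPP loop just described can be summarized in code. The sketch below is only meant to show how the three analytics stages feed one another under strong simplifying assumptions: the "artificial society" is a one-line linear model fitted to logged data, the "computational experiment" is a short roll-out, and the candidate policies are simple feedback gains. None of these modeling choices come from the paper.

```python
# Illustrative DPP (descriptive / predictive / prescriptive) loop built on the
# ACP approach. Everything here is a toy stand-in.
import numpy as np

def build_artificial_society(xs, us, xs_next):
    """Descriptive analytics: fit x_{k+1} ≈ a*x_k + b*u_k from logged data."""
    A = np.column_stack([xs, us])
    a, b = np.linalg.lstsq(A, xs_next, rcond=None)[0]
    return a, b

def run_experiment(model, gain, x0=1.0, horizon=30):
    """Predictive analytics: roll the artificial model out under u = -gain*x."""
    a, b = model
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -gain * x
        cost += x**2 + u**2
        x = a * x + b * u
    return cost

# Logged data from the "real" system (here simulated with a=0.9, b=1.0).
rng = np.random.default_rng(1)
xs = rng.uniform(-1, 1, 200)
us = rng.uniform(-1, 1, 200)
xs_next = 0.9 * xs + 1.0 * us + 0.01 * rng.standard_normal(200)

model = build_artificial_society(xs, us, xs_next)
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
predicted = {g: run_experiment(model, g) for g in candidates}

# Prescriptive analytics: choose one policy to execute in the real world;
# new observations would then refresh the descriptive model.
best_gain = min(predicted, key=predicted.get)
print("predicted costs:", predicted, "-> chosen gain:", best_gain)
```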
V. PARALLEL DYNAMIC PROGRAMMING

In the practice of AlphaGo, one of the key ideas is to extract a policy by supervised learning and then improve it into a sub-optimal policy during the reinforcement learning procedure. In parallel dynamic programming (PDP), we suggest the ACP approach for decision making in CPSS with analytics intelligence.

Hence, assume we have n artificial systems (n = 3 in Fig. 5) parallelized with the real-world system. Since the state equations of the artificial systems are known, we can employ dynamic programming or adaptive dynamic programming, in a trial-and-error fashion and without any cost or risk, to solve the optimal control problems in the virtual parallel systems and obtain n optimal (or sub-optimal) decisions. Note that this procedure is naturally distributed, and previous real-world decisions can be used as initial guesses in the reinforcement learning iterations to reduce the computation. Then, the computational experiments can be conducted in a bionic manner: based on a voting mechanism, the n virtual systems vote on the n decisions. Hence, we obtain a winning decision and its corresponding critic network. Note that the computational experiments search for an acceptable artificial system and an admissible decision, rather than the optimal control.

The parallel execution is based on the optimality principle of dynamic programming and on the critic network selected in the computational experiments. We adjust the decision according to the winning critic network and the observed real-world state. The virtual-real system interaction is conducted by observing states and errors, updating the artificial systems, and adjusting the voting mechanism.

A detailed implementation of the PDP algorithm for unknown discrete systems has been carried out and the results are very interesting and promising [44]; more work is underway and will be reported.
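The workflow described above can be sketched as follows. This is a toy sketch under strong simplifying assumptions: n = 3 linear artificial models of a scalar plant (mirroring the n = 3 example in the text), the per-system DP/ADP step replaced by a cheap roll-out search over a few candidate gains, and a simple score-based vote. It is not the implementation of [44]; it is only intended to show how the pieces, namely parallel artificial systems, per-system optimization, voting, and virtual-real interaction, fit together.

```python
# Toy PDP sketch: n artificial systems in parallel with one "real" system.
# Each artificial model proposes a decision (here, a feedback gain); the
# artificial systems then vote on the proposals, and the winning decision is
# executed on the real system, whose observations update the artificial models.
import numpy as np

rng = np.random.default_rng(2)
real_a, real_b = 0.9, 1.0                      # unknown "real" dynamics
models = [(0.9 + 0.1 * rng.standard_normal(),  # n = 3 artificial systems
           1.0 + 0.1 * rng.standard_normal()) for _ in range(3)]
gains = np.linspace(0.1, 1.2, 12)              # candidate decisions

def rollout_cost(model, gain, x0=1.0, horizon=30):
    a, b = model
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -gain * x
        cost += x**2 + u**2
        x = a * x + b * u
    return cost

for cycle in range(5):
    # Each artificial system proposes its best decision (stand-in for DP/ADP).
    proposals = [min(gains, key=lambda g: rollout_cost(m, g)) for m in models]

    # Voting: every artificial system scores every proposal; lowest total wins.
    totals = {g: sum(rollout_cost(m, g) for m in models) for g in set(proposals)}
    winner = min(totals, key=totals.get)

    # Parallel execution: apply the winning decision to the real system and
    # use the observed transitions to refine each artificial model (LMS update).
    x = 1.0
    for _ in range(20):
        u = -winner * x
        x_next = real_a * x + real_b * u + 0.01 * rng.standard_normal()
        models = [(a + 0.05 * (x_next - (a * x + b * u)) * x,
                   b + 0.05 * (x_next - (a * x + b * u)) * u)
                  for (a, b) in models]
        x = x_next
    print(f"cycle {cycle}: winning gain = {winner:.2f}")
```

In an actual PDP implementation, each inner optimization would be a DP/ADP iteration on the corresponding artificial system, and the voting rule itself would be updated through the virtual-real interaction described above.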
VI. REMARK AND CONCLUSION

"Scientific solutions" need to satisfy two conditions: they must be triable and repeatable [6]. In real-world systems that involve humans and societies, trial-and-error based reinforcement learning cannot be conducted unless we already know that the "error" will be harmless. On the other hand, the suggested parallel dynamic programming conducts computational experiments in virtual systems guided by the principle of optimality. Unlike the game of AlphaGo, PDP is based on parallel systems [45], [46] instead of the exact rules of real-world systems, and will be more flexible and feasible for complex problems.
REFERENCES

[1] D. Silver et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484−489, 2016.
[2] R. E. Bellman, Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.
[3] P. J. Werbos, "Advanced forecasting methods for global crisis warning and models of intelligence," General Syst. Yearbook, vol. 22, 1977.
[4] P. J. Werbos, "A menu of designs for reinforcement learning over time," in Neural Networks for Control, W. T. Miller, R. S. Sutton, and P. J. Werbos (Eds.), Cambridge: MIT Press, 1991, pp. 67−95.
[5] F.-Y. Wang et al., "Where does AlphaGo go: From Church-Turing thesis to AlphaGo thesis and beyond," IEEE/CAA J. Autom. Sinica, vol. 3, no. 2, pp. 113−120, Apr. 2016.
[6] F.-Y. Wang, "A big-data perspective on AI: Newton, Merton, and analytics intelligence," IEEE Intell. Syst., vol. 27, no. 5, pp. 2−4, 2012.
[7] L. Li, Y.-L. Lin, D.-P. Cao, N.-N. Zheng, and F.-Y. Wang, "Parallel learning: A new framework for machine learning," Acta Autom. Sinica, vol. 43, no. 1, pp. 1−8, 2017 (in Chinese).
[8] J. Li, W. Xu, J. Zhang, M. Zhang, Z. Wang, and X. Li, "Efficient video stitching based on fast structure deformation," IEEE Trans. Cybern., article in press, 2015. DOI: 10.1109/TCYB.2014.2381774.
[9] C. Vagg, S. Akehurst, C. J. Brace, and L. Ash, "Stochastic dynamic programming in the real-world control of hybrid electric vehicles," IEEE Trans. Control Syst. Technol., vol. 24, no. 3, pp. 853−866, Mar. 2016.
[10] P. M. Esfahani, D. Chatterjee, and J. Lygeros, "Motion planning for continuous-time stochastic processes: A dynamic programming approach," IEEE Trans. Autom. Control, vol. 61, pp. 2155−2170, 2016.
[11] P. J. Werbos, "Approximate dynamic programming for real-time control and neural modeling," in Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, D. A. White and D. A. Sofge (Eds.), New York: Van Nostrand Reinhold, 1992, ch. 13.
[12] D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.
[13] D. V. Prokhorov and D. C. Wunsch, "Adaptive critic designs," IEEE Trans. Neural Netw., vol. 8, no. 5, pp. 997−1007, Sep. 1997.
[14] J. Han, S. Khushalani-Solanki, J. Solanki, and J. Liang, "Adaptive critic design-based dynamic stochastic optimal control design for a microgrid with multiple renewable resources," IEEE Trans. Smart Grid, vol. 6, no. 6, pp. 2694−2703, Jun. 2015.
[15] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
[16] J. J. Murray, C. J. Cox, G. G. Lendaris, and R. Saeks, "Adaptive dynamic programming," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 32, no. 2, pp. 140−153, May 2002.
[17] Q. Wei, F. L. Lewis, D. Liu, R. Song, and H. Lin, "Discrete-time local value iteration adaptive dynamic programming: Convergence analysis," IEEE Trans. Syst., Man, Cybern., Syst., article in press, 2016. DOI: 10.1109/TSMC.2016.2623766.
[18] Q. Wei, F. L. Lewis, Q. Sun, P. Yan, and R. Song, "Discrete-time deterministic Q-learning: A novel convergence analysis," IEEE Trans. Cybern., article in press, 2016. DOI: 10.1109/TCYB.2016.2542923.
[19] Q. Wei, D. Liu, and G. Shi, "A novel dual iterative Q-learning method for optimal battery management in smart residential environments," IEEE Trans. Ind. Electron., vol. 62, no. 4, pp. 2509−2518, Apr. 2015.
[20] Q. Wei and D. Liu, "A novel iterative θ-adaptive dynamic programming for discrete-time nonlinear systems," IEEE Trans. Autom. Sci. Eng., vol. 11, no. 4, pp. 1176−1190, Oct. 2014.
[21] Q. Wei, D. Liu, Q. Lin, and R. Song, "Discrete-time optimal control via local policy iteration adaptive dynamic programming," IEEE Trans. Cybern., article in press, 2016. DOI: 10.1109/TCYB.2016.2586082.
[22] R. Enns and J. Si, "Helicopter trimming and tracking control using direct neural dynamic programming," IEEE Trans. Neural Netw., vol. 14, no. 4, pp. 929−939, Aug. 2003.
[23] R. Kamalapurkar, J. R. Klotz, and W. E. Dixon, "Concurrent learning-based approximate feedback-Nash equilibrium solution of N-player nonzero-sum differential games," IEEE/CAA J. Autom. Sinica, vol. 1, no. 3, pp. 239−247, Jul. 2014.
[24] Q. Wei, D. Liu, and Q. Lin, "Discrete-time local iterative adaptive dynamic programming: Terminations and admissibility analysis," IEEE Trans. Neural Netw. Learn. Syst., article in press, 2016. DOI: 10.1109/TNNLS.2016.2593743.
[25] Q. Wei, R. Song, and P. Yan, "Data-driven zero-sum neuro-optimal control for a class of continuous-time unknown nonlinear systems with disturbance using ADP," IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 2, pp. 444−458, Feb. 2016.
[26] H. Zhang, C. Qin, B. Jiang, and Y. Luo, "Online adaptive policy learning algorithm for H∞ state feedback control of unknown affine nonlinear discrete-time systems," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2706−2718, Dec. 2014.
[27] F.-Y. Wang and G. N. Saridis, "Suboptimal control for nonlinear stochastic systems," in Proc. 31st IEEE Conf. Decision and Control, 1992.
[28] G. N. Saridis and F.-Y. Wang, "Suboptimal control of nonlinear stochastic systems," Control Theory and Advanced Technology, vol. 10, no. 4, pp. 847−871, 1994.
[29] Q. Wei, D. Liu, and X. Yang, "Infinite horizon self-learning optimal control of nonaffine discrete-time nonlinear systems," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 4, pp. 866−879, Apr. 2015.
[30] Q. Wei, D. Liu, Y. Liu, and R. Song, "Optimal constrained self-learning battery sequential management in microgrid via adaptive dynamic programming," IEEE/CAA J. Autom. Sinica, article in press, 2016. DOI: 10.1109/JAS.2016.7510262.
[31] Q. Zhao, H. Xu, and S. Jagannathan, "Near optimal output feedback control of nonlinear discrete-time systems based on reinforcement neural network learning," IEEE/CAA J. Autom. Sinica, vol. 1, no. 4, pp. 372−384, Oct. 2014.
[32] Q. Wei, D. Liu, G. Shi, and Y. Liu, "Optimal multi-battery coordination control for home energy management systems via distributed iterative adaptive dynamic programming," IEEE Trans. Ind. Electron., vol. 62, no. 7, pp. 4203−4214, Jul. 2015.
[33] Q. Wei, D. Liu, and H. Lin, "Value iteration adaptive dynamic programming for optimal control of discrete-time nonlinear systems," IEEE Trans. Cybern., vol. 46, no. 3, pp. 840−853, Mar. 2016.
[34] Q. Wei, F. Wang, D. Liu, and X. Yang, "Finite-approximation-error-based discrete-time iterative adaptive dynamic programming," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2820−2833, Dec. 2014.
[35] H. Li and D. Liu, "Optimal control for discrete-time affine non-linear systems using general value iteration," IET Control Theory Appl., vol. 6, no. 18, pp. 2725−2736, Dec. 2012.
[36] W. Gao and Z.-P. Jiang, "Adaptive dynamic programming and adaptive optimal output regulation of linear systems," IEEE Trans. Autom. Control, vol. 61, no. 12, pp. 4164−4169, Dec. 2016.
[37] Y. Duan, Y. Lv, J. Zhang, X. Zhao, and F.-Y. Wang, "Deep learning for control: The state of the art and prospects," Acta Autom. Sinica, vol. 42, no. 5, pp. 643−654, 2016.
[38] F.-Y. Wang, "Building knowledge structure in neural nets using fuzzy logic," in Robotics and Manufacturing: Recent Trends in Research, Education and Applications, M. Jamshidi (Ed.), New York, NY: ASME Press, 1992.
[39] F.-Y. Wang and H.-A. Kim, "Implementing adaptive fuzzy logic controllers with neural networks: A design paradigm," J. Intell. Fuzzy Syst., vol. 3, no. 2, pp. 165−180, 1995.
[40] F.-Y. Wang, "The emergence of intelligent enterprises: From CPS to CPSS," IEEE Intell. Syst., vol. 25, no. 4, pp. 85−88, 2010.
[41] C. Nyce, "Predictive analytics white paper," American Institute for Chartered Property Casualty Underwriters/Insurance Institute of America, 2007.
[42] W. Eckerson, "Extending the value of your data warehousing investment," The Data Warehouse Institute, USA, 2007.
[43] J. R. Evans and C. H. Lindner, "Business analytics: The next frontier for decision sciences," Decision Line, vol. 43, no. 2, pp. 1−4, Mar. 2012.
[44] J. Zhang, Q. Wei, and F.-Y. Wang, "Parallel dynamic programming with an average-greedy mechanism for discrete systems," SKL-MCCS/QAII Tech. Report 01-09-2016, CASIA, Beijing, China.
[45] F.-Y. Wang, "Parallel control: A method for data-driven and computational control," Acta Autom. Sinica, vol. 39, no. 2, pp. 293−302, 2013.
[46] F.-Y. Wang, "Control 5.0: From Newton to Merton in Popper's Cyber-Social-Physical Spaces," IEEE/CAA J. Autom. Sinica, vol. 3, no. 3, pp. 233−234, 2016.

Jie Zhang (M'16) is an associate professor with The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. His current research interests include mechanism design and optimal control in e-commerce and traffic systems. He received his Ph.D. degree in Technology of Computer Application from the University of Chinese Academy of Sciences in 2015. He received his B.Sc. degree in Information and Computing Science from Tsinghua University in 2005, and his M.Sc. degree in Operations Research and Control Theory from Renmin University of China in 2009.

Qinglai Wei (M'11) received the Ph.D. degree in control theory and control engineering from Northeastern University, Shenyang, China, in 2009. From 2009 to 2011, he was a postdoctoral fellow with The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China. He is currently a Professor of the institute. He is also a Professor at the University of Chinese Academy of Sciences. He has authored two books and published over 60 international journal papers. His research interests include adaptive dynamic programming, neural-networks-based control, optimal control, nonlinear systems, and their industrial applications.

Dr. Wei has been an Associate Editor of IEEE Transactions on Systems, Man, and Cybernetics: Systems since 2016, Information Sciences since 2016, Neurocomputing since 2016, Optimal Control Applications and Methods since 2016, and Acta Automatica Sinica since 2015, and he held the same position for IEEE Transactions on Neural Networks and Learning Systems during 2014-2015. He has been the Secretary of the IEEE Computational Intelligence Society (CIS) Beijing Chapter since 2015.