Abstract: In this paper, a novel nonlinear learning controller called fuzzy-based goal representation adaptive dynamic programming (Fuzzy-GrADP) is proposed. In the adopted GrADP method, a goal representation network is introduced to generate an adaptive internal reinforcement signal for the critic network, which helps the controller to provide a general mapping between the input and the output action. Moreover, in the proposed architecture, the action network in the goal representation adaptive dynamic programming (GrADP) is improved by using the fuzzy hyperbolic model (FHM), which combines the merits of the fuzzy model and the neural network model. Based on the back-propagation technique, the parameters in the membership functions (MFs) and the fuzzy rules all undergo training and online adaptation. The proposed controller is tested on two numerical benchmarks, and the simulation results show that it outperforms the original adaptive dynamic fuzzy controller and the pure neural network-based GrADP controller. In addition, the proposed controller is applied to a large multimachine power system for static var compensator (SVC) damping control, where simulation results demonstrate the effectiveness of the proposed approach in real applications. Furthermore, to demonstrate the theoretical guarantee of the proposed method, a Lyapunov stability analysis supporting the proposed Fuzzy-GrADP approach has also been carried out.

Index Terms: Adaptive dynamic programming (ADP); internal goal representation; goal representation adaptive dynamic programming (GrADP); fuzzy hyperbolic model (FHM); multimachine power systems; stability analysis.

the users [6]. For nonlinear system modeling, the most widely used approaches are based on conventional philosophy, such as differential algebraic equation (DAE) based mathematical models. When the system exhibits strong nonlinearities, multivariable coupling, and variation of operating conditions together with unknown model structure and parameters, the conventional mathematics may not be suitable [7] [8] [9]. In these situations, methods that do not require system modeling and that also have online learning ability are highly desired in real applications. Adaptive dynamic programming (ADP) is such a tool: it provides sequential decision and control to address the aforementioned real-life problems [10]. The key idea of ADP is to achieve optimization over time based on the Bellman equation, which has the following form [11]:

$$J[x(t), t] = \sum_{i=t}^{\infty} \alpha^{i-t}\, U[x(i), u(i), i] \tag{1}$$

where x(t) is the state vector of the system, u(t) is the control action, U is the utility function, and α is a discount factor. Approximate dynamic programming is used to seek the control policy u(t) that minimizes the total cost function J. Instead of finding the exact minimum, an approximate solution is provided by solving the following equation:

$$J^{*}(x(t)) = \min_{u(t)} \left\{ U(x(t), u(t)) + \alpha J^{*}(x(t+1)) \right\} \tag{2}$$
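As a rough illustration of the recursion in (2), the following minimal sketch runs value iteration on a toy problem. The five-state chain, the unit utility, the two actions, and the discount factor α = 0.9 are all assumptions made for the example and do not come from the paper.

```python
# Toy illustration of the Bellman recursion in (2).
# The chain, the utility and ALPHA are assumptions for this sketch only.
ALPHA = 0.9
STATES = range(5)      # state 0 plays the role of the goal
ACTIONS = (-1, +1)     # move one cell left or right


def step(x, u):
    """Deterministic toy dynamics x(t+1) = f(x(t), u(t)), clipped to the chain."""
    return min(max(x + u, 0), 4)


def utility(x, u):
    """Hypothetical utility U(x, u): zero at the goal, one elsewhere."""
    return 0.0 if x == 0 else 1.0


def value_iteration(n_sweeps=200):
    """Repeatedly apply J(x) <- min_u { U(x, u) + ALPHA * J(f(x, u)) }."""
    J = [0.0 for _ in STATES]
    for _ in range(n_sweeps):
        J = [min(utility(x, u) + ALPHA * J[step(x, u)] for u in ACTIONS)
             for x in STATES]
    return J


J_star = value_iteration()
```

On this chain the fixed point satisfies J*(0) = 0 and J*(x) = 1 + αJ*(x − 1) for the other states, which is the pattern (2) predicts when moving towards the goal is always the cheaper action.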
to have better performance than HDP and DHP. However, the computational complexities and hardware implementation difficulties are much higher for GDHP. Variations of these major designs, such as the action-dependent (AD) versions, have also been developed in the community [17]. The online “model-free” direct HDP was developed in [16], where the authors took advantage of the potential scalability of the adaptive critic designs and the intuitiveness of Q-learning. It is also an online learning scheme that simultaneously updates the value function and the control policy. For the model-based DHP/GDHP design, the authors in [15] proposed that efficient learning can be achieved with different weights on the error terms for the control of an auto-lander helicopter. In [18] [19], the authors demonstrated the convergence analysis for model-based DHP/GDHP in terms of cost function and control law. In addition, the Levenberg-Marquardt method has been proposed to be integrated into the ADP design, to improve the learning and control of both the tension and height of a looper system in a hot strip mill [20].

Among the ADP designs, the online “model-free” technique has attracted considerable attention. To be specific, the previous total cost-to-go value J(t − 1) is stored and used to obtain the temporal difference for training at any time instance, which enables online learning, association, and optimization over time. In [21], the authors proposed to improve the online learning of the ADP design with the incorporation of a dual-critic/reference network. This hierarchical ADP design with multiple goal networks has been tested on maze navigation [22], energy storage based power system damping control [23], power system stability control for a wind farm [24], and load frequency control for an island smart grid with electric vehicles and renewable resources [25], demonstrating superior learning performance over the traditional designs.

Meanwhile, fuzzy systems have been used in many applications for their robust control in the presence of noise and uncertainties. In these systems, the linguistic control strategy based on expert knowledge is converted into an automatic control strategy. Generally speaking, fuzzy systems provide a nonlinear mapping from the input to a set of fuzzy values using fuzzification methods, and then back to the output using defuzzification techniques [26]. The parameters of the membership functions and the fuzzy IF-THEN rules are provided according to the experience and knowledge of human experts. However, in reality, there is no systematic way to select the proper membership functions and fuzzy rules [27]. If the pre-set parameters give unsatisfactory performance, then an adaptive law is applied to update the parameters in the fuzzy rules or the membership functions, which is called adaptive fuzzy control [28] [29] [30]. Similar to this concept, the neuro-fuzzy controller based on neural networks has been proposed in [31] [32], where the adaptive-network-based fuzzy inference system (ANFIS) is a typical structure belonging to this category [33]. These methods possess certain advantages of the neural network, and thus could achieve better performance than the traditional fuzzy logic controllers. Along this topic, many improvements on the algorithm side and the application side have been intensively carried out in the literature, such as neuro-fuzzy system modeling with self-constructing rules based on a hybrid singular value decomposition and gradient descent method [34], permanent-magnet synchronous motor drive speed control using a self-constructing fuzzy neural network [35], oscillation energy descent based adaptive fuzzy-logic SVC damping controller design [36], and short-term load forecasting using radial basis function (RBF) based ANFIS [37]. Based on the aforementioned discussions, it is interesting to combine the advantages of fuzzy and ADP methods to design a robust controller. Such an idea has been implemented in the community [38] [39] [40] [41] [42], demonstrating a promising future along this direction.

Inspired by this previous research, in this paper, we propose a new real-time control framework using GrADP and the fuzzy hyperbolic model (FHM). Moreover, an application study on industrial-scale multimachine power system damping control using a static var compensator (SVC) is also presented in this research. The main contributions of this paper are summarized as follows:

1) A novel nonlinear learning controller, called Fuzzy-GrADP, based on the fuzzy hyperbolic model (FHM) and goal representation adaptive dynamic programming (GrADP) has been proposed in this paper. Different from the original GrADP method, the proposed controller incorporates the advantage of FHM to increase the robustness. Under this framework, the parameters in the membership functions (MFs) and the fuzzy rules are updated through a learning mechanism, and can provide an online sequential control policy.
2) Comparative simulation studies have been carried out for the proposed method against the original GrADP algorithm, the hierarchical GrADP algorithm, the Fuzzy-ADP algorithm, and the traditional ADP algorithm on two classical control benchmarks, namely the cart-pole balancing problem and the ball-and-beam balancing problem. Simulation results demonstrate that the proposed controller has much better robustness to noise in the environment.
3) Moreover, an application case study on a large multimachine power system for SVC damping control has also been carried out in this paper. As the multimachine power system is much more complex than the above two classical benchmarks, the specific controller design, including the wide-area control signal (WACS) selection and the reinforcement signal setting, is introduced in detail. Based on dynamic time-domain simulation and a quantitative performance index, the proposed intelligent controller demonstrates increased power system damping and improved system transient stability.
4) The stability analysis for the proposed Fuzzy-GrADP has been carried out in this paper. The constraints for this method have been derived based on a Lyapunov function, and can be used for choosing the key parameters. The implementation issues and future research directions are also provided in this research.

The rest of the paper is organized as follows. Section II presents the details of the proposed controller architecture and
IEEE, NOV 2015 3
Fig. 2. The schematic diagram of the goal network with three-layer nonlinear
architecture
associated learning algorithm. In Section III and Section IV, the experimental setup and simulation analysis based on two small benchmarks (i.e., cart-pole and ball-and-beam) are presented to show the effectiveness of our approach. Then a relatively large multimachine power system control case study is carried out in Section V, including the input selection and the reinforcement signal design. The stability analysis for the proposed Fuzzy-GrADP is discussed in Section VI. Finally, the concluding remarks, implementation issues and potential solutions are given in Section VII.

II. ONLINE LEARNING OF THE FUZZY-GrADP CONTROLLER

The schematic diagram of the proposed Fuzzy-GrADP is shown in Fig. 1. The reason the adopted GrADP is better than the traditional two-network ADP (i.e., the action network and the critic network) is that an internal goal network is introduced. The key idea of the goal network is to replace the traditional “hand-crafted” reinforcement signal setting, and hence to provide an adaptive internal goal/reward representation to the critic network [21]. As we can see from Fig. 1, the reinforcement signal r(t) is no longer directly used by the critic network. It is used by the goal network to generate the internal goal representation signal for the critic network. In this way, the goal network helps the critic network to better approximate the value function. The motivation of this hierarchical goal representation is to mimic the human brain, which uses multiple levels of internal goals to accomplish long-term final goals [43]. In this paper, one goal representation network is used to provide the internal goal representation signal.

Because of the integration of the goal network and the use of FHM, the learning and the adaptation in the three networks differ from those in GrADP and Fuzzy-ADP. For the feed-forward process, the output of the fuzzy logic controller u(t) has two paths to contribute to the error function formulation: one is through the goal network and the other is through the critic network. For backward propagation, the error function of the goal network is related to the primary reinforcement signal r(t), and the error function of the critic network is related to the internal reinforcement signal s(t). Meanwhile, the updating of the rules and the membership functions in the FHM is composed of two parts, where one is from the goal network path and the other is from the critic network path. The detailed learning and adaptation for each module are discussed in the following sections.

A. The Goal Network Learning and Adaptation

The structure of the goal network is shown in Fig. 2. As can be seen from this figure, a neural network with a three-layer nonlinear architecture (with one hidden layer) is used in this paper, where this structure setting is the same as in [21] [22] [44]. The feed-forward propagation of the signal in the goal network is as follows:

$$s(t) = \frac{1 - e^{-k(t)}}{1 + e^{-k(t)}} \tag{3}$$

$$k(t) = \sum_{i=1}^{N_h} w_{g_i}^{(2)}(t) \cdot y_i(t) \tag{4}$$

$$y_i(t) = \frac{1 - e^{-z_i(t)}}{1 + e^{-z_i(t)}}, \quad i = 1, \ldots, N_h \tag{5}$$

$$z_i(t) = \sum_{j=1}^{n+1} w_{g_{i,j}}^{(1)}(t) \cdot x_j(t), \quad i = 1, \ldots, N_h \tag{6}$$

where z_i(t) is the ith hidden node input of the goal network and y_i(t) is the corresponding output of the hidden node, k(t) is the input to the output node of the goal network before the sigmoid function, N_h is the number of hidden neurons of the goal network, and (n + 1) is the total number of inputs to the goal network including the action value u(t) from the fuzzy logic controller.

Before adjusting the weights in the goal network through the back-propagation rule, we need to define the error function first [21]. As shown in Fig. 1, the primary reinforcement signal r(t) is presented to the goal network directly, then a secondary/internal reinforcement signal s(t) is generated and sent to the critic network, which in turn is used to provide a better approximation of J(t) [21]. In this way, the primary reinforcement signal r(t) is at a higher hierarchical level and can be a simple binary signal to represent “good” or “bad”,
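The feed-forward pass of the goal network in (3)-(6) can be sketched in a few lines. This is only an illustration: the function names, the list-based weight layout (`w1` as N_h rows of n + 1 input weights, `w2` as N_h output weights) and the numeric values are assumptions, not the authors' implementation.

```python
import math


def bipolar_sigmoid(v):
    """Activation used in (3) and (5): (1 - e^{-v}) / (1 + e^{-v})."""
    return (1.0 - math.exp(-v)) / (1.0 + math.exp(-v))


def goal_forward(x, w1, w2):
    """Return s(t) for the input vector x (n states plus the action u(t)).

    w1: hypothetical N_h x (n+1) input-to-hidden weights, as in (6)
    w2: hypothetical length-N_h hidden-to-output weights, as in (4)
    """
    z = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w1]  # (6)
    y = [bipolar_sigmoid(z_i) for z_i in z]                           # (5)
    k = sum(w_i * y_i for w_i, y_i in zip(w2, y))                     # (4)
    return bipolar_sigmoid(k)                                         # (3)


# Illustrative call: four states plus one action, three hidden nodes.
x_in = [0.1, -0.2, 0.05, 0.0, 0.3]
w1_demo = [[0.2, -0.1, 0.4, 0.0, 0.3],
           [-0.3, 0.5, 0.1, 0.2, -0.2],
           [0.1, 0.1, -0.4, 0.3, 0.0]]
w2_demo = [0.5, -0.6, 0.2]
s_demo = goal_forward(x_in, w1_demo, w2_demo)
```

Because the activation equals tanh(v/2), the internal reinforcement signal s(t) produced this way always lies strictly between −1 and 1.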
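$$P_{a1}^{(2)} = e_a(t) \cdot \sum_{i=1}^{N_h} \left[ w_{c_i}^{(2)}(t) \cdot \frac{1}{2}\left(1 - p_i^2(t)\right) w_{c_{i,n+1}}^{(1)}(t) \right] \cdot \omega_r \tag{31}$$

$$P_{a2}^{(2)} = e_a(t) \cdot \sum_{i=1}^{N_h} \left[ w_{c_i}^{(2)}(t) \cdot \frac{1}{2}\left(1 - p_i^2(t)\right) w_{c_{i,n+2}}^{(1)}(t) \right] \cdot \sum_{i=1}^{N_h} \left[ w_{g_i}^{(2)}(t) \cdot \frac{1}{2}\left(1 - y_i^2(t)\right) w_{g_{i,n+1}}^{(1)}(t) \right] \cdot \omega_r \tag{32}$$

$$\frac{\partial E_a(t)}{\partial \theta_i(t)} = \underbrace{\frac{\partial E_a(t)}{\partial J(t)} \cdot \frac{\partial J(t)}{\partial u(t)} \cdot \frac{\partial u(t)}{\partial \omega_r(t)} \cdot \frac{\partial \omega_r(t)}{\partial \mu_{i,j_i}(t)} \cdot \frac{\partial \mu_{i,j_i}(t)}{\partial \theta_i(t)}}_{P_{a1}^{(1)}} + \underbrace{\frac{\partial E_a(t)}{\partial J(t)} \cdot \frac{\partial J(t)}{\partial s(t)} \cdot \frac{\partial s(t)}{\partial u(t)} \cdot \frac{\partial u(t)}{\partial \omega_r(t)} \cdot \frac{\partial \omega_r(t)}{\partial \mu_{i,j_i}(t)} \cdot \frac{\partial \mu_{i,j_i}(t)}{\partial \theta_i(t)}}_{P_{a2}^{(1)}} \tag{34}$$

$$P_{a1}^{(1)} = e_a(t) \cdot \sum_{i=1}^{N_h} \left[ w_{c_i}^{(2)}(t) \cdot \frac{1}{2}\left(1 - p_i^2(t)\right) \cdot w_{c_{i,n+1}}^{(1)}(t) \right] \cdot \left[ \sum_{r=1}^{2^n} R_r \cdot \Big( \prod_{t=1,\, t \neq i}^{n} \mu_{i,j_i}(t) \Big) \cdot \frac{\partial \mu_{i,j_i}(t)}{\partial \theta_i(t)} \right] \tag{35}$$

$$P_{a2}^{(1)} = e_a(t) \cdot \sum_{i=1}^{N_h} \left[ w_{c_i}^{(2)}(t) \cdot \frac{1}{2}\left(1 - p_i^2(t)\right) \cdot w_{c_{i,n+2}}^{(1)}(t) \right] \cdot \sum_{i=1}^{N_h} \left[ w_{g_i}^{(2)}(t) \cdot \frac{1}{2}\left(1 - y_i^2(t)\right) \cdot w_{g_{i,n+1}}^{(1)}(t) \right] \cdot \left[ \sum_{r=1}^{2^n} R_r \cdot \Big( \prod_{t=1,\, t \neq i}^{n} \mu_{i,j_i}(t) \Big) \cdot \frac{\partial \mu_{i,j_i}(t)}{\partial \theta_i(t)} \right] \tag{36}$$

Fig. 5. Flowchart of the Fuzzy-GrADP simulation procedure

$$\frac{\partial \mu_{i,j_i}(t)}{\partial \theta_i(t)} = \begin{cases} -\frac{1}{2}\, \mathrm{sech}^2\!\left(\theta_i(t) \cdot x_i(t)\right) \cdot x_i(t), & j_i = N \\ \frac{1}{2}\, \mathrm{sech}^2\!\left(\theta_i(t) \cdot x_i(t)\right) \cdot x_i(t), & j_i = P \end{cases} \tag{37}$$

where η_a(t) is the learning rate in the FHM. The setting of this parameter is similar to that of η_g(t) in the goal network and η_c(t) in the critic network, and will be discussed in the parameter setting section.

Finally, the parameter tuning for the fuzzy logic controller follows the gradient descent rule:

$$\begin{aligned} R_r(t+1) &= R_r(t) + \Delta R_r(t) \\ \theta_i(t+1) &= \theta_i(t) + \Delta \theta_i(t) \end{aligned} \tag{38}$$

D. Fuzzy-GrADP Learning Process and Parameter Setting

The utility function U_c(t) is set as zero to represent success in the paper. Once a system state x(t) is observed (we assume that in this paper, the system/plant to be controlled is fully observable) and sent to the controller, the learning process will occur and a consequent control action will be generated by the controller.

The flowchart of the simulation procedure is presented in Fig. 5. The dashed lines represent the back-propagation path, and the order of the back-propagation corresponds to the numbers. During each sampling time step, after the feed-forward propagation, the goal network will first update its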
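The derivative in (37) can be checked numerically. The sketch below assumes membership functions of the form μ_P = ½(1 + tanh(θx)) and μ_N = ½(1 − tanh(θx)); this form is not stated in the text above and is chosen only because its θ-derivative reproduces (37). The function names are likewise hypothetical.

```python
import math


def mu(theta, x, label):
    """Assumed membership value: 0.5 * (1 +/- tanh(theta * x)) for 'P' / 'N'."""
    sign = 1.0 if label == "P" else -1.0
    return 0.5 * (1.0 + sign * math.tanh(theta * x))


def dmu_dtheta(theta, x, label):
    """Analytic derivative from (37): +/- 0.5 * sech^2(theta * x) * x."""
    sech_squared = 1.0 / math.cosh(theta * x) ** 2
    sign = 1.0 if label == "P" else -1.0
    return sign * 0.5 * sech_squared * x


def finite_difference(theta, x, label, h=1e-6):
    """Central-difference estimate of the same derivative, for comparison."""
    return (mu(theta + h, x, label) - mu(theta - h, x, label)) / (2.0 * h)


analytic_p = dmu_dtheta(0.7, 1.3, "P")
numeric_p = finite_difference(0.7, 1.3, "P")
```

Under this assumption the two labels are complementary, so their derivatives with respect to θ differ only in sign, which matches the two cases of (37).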
TABLE I
GENERAL PARAMETERS USED IN THE FUZZY-GrADP CONTROLLER
TABLE II
SPECIFIC PARAMETERS USED IN THE FUZZY-GrADP CONTROLLER
Fig. 7. Typical record of the cost-to-go J, control action u, cart position x, and pole angle φ versus time step (0 to 6000) on the cart-pole balancing problem
TABLE III
PERFORMANCE EVALUATION ON CASE I: CART-POLE, BASED ON REQUIRED AVERAGE NO. OF TRIALS TO SUCCESS
$10^{-5}$ kg·m², the friction coefficient of the drive mechanics b = 1 N·s/m, the radius of force application l = 0.48 m, the radius of the beam l_ω = 0.5 m, the stiffness of the drive mechanics K = 0.001 N/m, the gravity g = 9.8 N/kg, the inertia moment of the beam I_ω = 0.14025 kg·m², and u is the force of the drive mechanics.

In order to simplify the system model function, we re-define that x_1 = x represents the position of the ball, x_2 = ẋ represents the velocity of the ball, x_3 = α is the angle of the beam with respect to the horizontal axis, and x_4 = α̇ is the angular velocity of the beam. In this way, the system function in (43) and (44) can be transformed into the following form:

$$\left(m + \frac{I_b}{r^2}\right)\dot{x}_2 + \left(m r^2 + I_b\right)\frac{1}{r}\,\dot{x}_4 = m x_1 x_4^2 + m g \sin x_3 \tag{45}$$

$$\left(m r^2 + I_b\right)\frac{1}{r}\,\dot{x}_2 + \left[m x_1^2 + I_b + I_\omega\right]\dot{x}_4 = (u l + m g x_1)\cos x_3 - (2 m x_2 x_1 + b l^2)\,x_4 - K l^2 x_3 \tag{46}$$

then re-write (45) and (46) in matrix notation as follows:

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} \cdot \begin{bmatrix} \dot{x}_2 \\ \dot{x}_4 \end{bmatrix} = \begin{bmatrix} P \\ Q \end{bmatrix} \tag{47}$$

where the elements are as follows:

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} m + \frac{I_b}{r^2} & \left(m r^2 + I_b\right)\frac{1}{r} \\ \left(m r^2 + I_b\right)\frac{1}{r} & m x_1^2 + I_b + I_\omega \end{bmatrix} \tag{48}$$

$$\begin{bmatrix} P \\ Q \end{bmatrix} = \begin{bmatrix} m x_1 x_4^2 + m g \sin x_3 \\ (u l + m g x_1)\cos x_3 - (2 m x_1 x_2 + b l^2)\,x_4 - K l^2 x_3 \end{bmatrix} \tag{49}$$

and the general form of this problem is obtained as follows:

$$\begin{bmatrix} \dot{x}_2 \\ \dot{x}_4 \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} \begin{bmatrix} P \\ Q \end{bmatrix} \tag{50}$$

and the other two terms in the state vector can be expressed as ẋ_1 = x_2 and ẋ_3 = x_4, thus with the state vector as follows:

$$\begin{bmatrix} x_1 & x_2 & x_3 & x_4 \end{bmatrix} \tag{51}$$

B. Results Analysis in Case II

Since the number of state variables is the same as that in Case I, the parameter setting described in Table I remains unchanged for Case II. The objective of the task is to keep the ball balanced on the beam for a certain period of time. Specifically, each run consists of a maximum of 1000 trials, and it is considered successful if the last trial of the run has lasted 10000 time steps. Otherwise, if the controller is unable to learn to balance the ball-and-beam within 1000 trials, then the run is considered unsuccessful. The range of the beam is [−0.48, 0.48] m and the range of the angle of the beam to the horizontal axis is [−0.24, 0.24] rad. In this case, different from the “bang-bang” control in Case I, a continuous force is applied to the driver directly.

We compare the proposed algorithm with the ADP structure presented in [16], the GrADP in [21], the hierarchical GrADP with three goal networks in [43], and the T-S Fuzzy-HDP in [50]. The required average number of trials to success and the success rate in 100 individual runs are shown in Table IV. For a fair comparison, we apply the same initial conditions and types of noise as in [43] in our simulation. Specifically, the ball position x_1 and the angle of the beam x_3 are uniformly distributed in the range of [−0.2, 0.2] m and [−0.15, 0.15] rad, respectively, and the ball velocity x_2 and the angular velocity x_4 are set to zero. The initialization of the neural networks and the fuzzy logic controllers is the same as in Case I. From the results in Table IV we can observe that the proposed approach provides the best performance with uniform or Gaussian noise. In particular, the proposed algorithm is insensitive to the noise intensity and type, which is consistent with the observation in Case I, namely, that it is effective and robust under noisy conditions.

V. CASE III: MULTIMACHINE POWER SYSTEM CONTROL STUDY

A. Benchmark Power System Description

To demonstrate the feasibility of the proposed Fuzzy-GrADP approach on real applications, a case study is undertaken based on the New England 10-machine 39-bus system. The power system configuration is shown in Fig. 9. This test system consists of 10 generators, 39 buses, and 46 transmission lines. Similar to reference [53], each generator is
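The ball-and-beam state equations in (45)-(51) can be evaluated by solving the 2×2 system in (47) in closed form, as in (50). The sketch below is a minimal illustration: b, l, K, g and I_ω use the values stated in the text, while the ball mass, ball radius and ball inertia (`M_BALL`, `R_BALL`, `I_B`) are placeholders, because the surviving text does not give them. The function names and the Euler step size are also assumptions.

```python
import math

# Stated in the text: friction b, force radius l, stiffness K, gravity g, beam inertia.
B_FRIC, L_FORCE, K_STIFF, GRAV, I_W = 1.0, 0.48, 0.001, 9.8, 0.14025
# Placeholders only: ball mass, ball radius and ball inertia are illustrative.
M_BALL, R_BALL, I_B = 0.02, 0.02, 4.0e-5


def coefficients(x, u):
    """Entries A, B, C, D of (48) and P, Q of (49) for state x and force u."""
    x1, x2, x3, x4 = x
    a = M_BALL + I_B / R_BALL ** 2
    b = (M_BALL * R_BALL ** 2 + I_B) / R_BALL
    c = b
    d = M_BALL * x1 ** 2 + I_B + I_W
    p = M_BALL * x1 * x4 ** 2 + M_BALL * GRAV * math.sin(x3)
    q = ((u * L_FORCE + M_BALL * GRAV * x1) * math.cos(x3)
         - (2.0 * M_BALL * x1 * x2 + B_FRIC * L_FORCE ** 2) * x4
         - K_STIFF * L_FORCE ** 2 * x3)
    return a, b, c, d, p, q


def state_derivative(x, u):
    """Time derivative of [x1, x2, x3, x4]: solve (47) by the 2x2 inverse in (50)."""
    a, b, c, d, p, q = coefficients(x, u)
    det = a * d - b * c
    dx2 = (d * p - b * q) / det
    dx4 = (a * q - c * p) / det
    return [x[1], dx2, x[3], dx4]


def euler_step(x, u, dt=0.02):
    """One explicit Euler step; the step size dt is an assumed value."""
    return [xi + dt * dxi for xi, dxi in zip(x, state_derivative(x, u))]


x_next = euler_step([0.1, 0.0, 0.05, 0.0], u=0.3)
```

A closed-form inverse is used here instead of a linear-algebra library only to keep the sketch dependency-free; the determinant AD − BC stays positive for these placeholder values.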
TABLE IV
PERFORMANCE EVALUATION ON CASE II: BALL-AND-BEAM, BASED ON REQUIRED AVERAGE NO. OF TRIALS TO SUCCESS AND SUCCESS RATE
TABLE V
INTER-AREA MODES AND THE OBSERVABILITY SIGNALS CORRESPONDING TO MODE I
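[Figure residue: vertical axis "Power and control output (p.u.)"; legend entries: active power on a transmission line, output of the controller]

$$WACS = \begin{bmatrix} \Delta P_{3\text{-}18} & \Delta P_{17\text{-}18} & \Delta P_{15\text{-}16} & \Delta P_{16\text{-}17} \end{bmatrix} \tag{52}$$

$$Q = \mathrm{diag}\begin{bmatrix} 1 & 1 & 1 & 1 \end{bmatrix}, \qquad r(t) = -0.25 \cdot WACS \cdot Q \cdot WACS^{T} \tag{53}$$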
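The quadratic reinforcement signal in (52)-(53) reduces to a weighted sum of squared line-power deviations when Q is diagonal. The following minimal sketch shows that computation; the function name, the argument layout and the sample deviations are assumptions for illustration.

```python
def reinforcement(wacs, q_diag=(1.0, 1.0, 1.0, 1.0), gain=0.25):
    """r(t) = -gain * WACS * Q * WACS^T for a diagonal weighting matrix Q.

    wacs: the four active-power deviations of (52), in the same order
    q_diag: diagonal entries of Q, all ones as in (53)
    """
    return -gain * sum(q * dp * dp for q, dp in zip(q_diag, wacs))


# Hypothetical deviations on lines 3-18, 17-18, 15-16 and 16-17.
r_demo = reinforcement([0.2, -0.1, 0.0, 0.1])
```

With this form r(t) is zero only when all four deviations vanish and becomes more negative as the oscillation on the monitored lines grows.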
shown in Fig. 12 and Fig. 13. Specifically, Fig. 12 and Fig. 13 show the transmitted active power on lines 3-18 and 17-18, respectively, with the original PI control, GrADP control, Fuzzy-ADP control and Fuzzy-GrADP control. From the simulation results we can see that, with Fuzzy-GrADP control, the system becomes stable after about 5 seconds. Meanwhile, the proposed Fuzzy-GrADP approach has the best control performance among the compared methods.

To better assess the control performance of the different methods during the transient process, a quantitative performance index based on the integral of the time multiplied by the absolute error (ITAE) [62] [63] has been adopted as follows:

$$J_{ITAE} = \int_{0}^{T_{sim}} \sum_{i=1}^{n} |\delta_i - \delta_r| \cdot t \cdot dt \tag{54}$$

where δ_i is the rotor angle of the ith generator, δ_r is the rotor angle of the reference generator (i.e., G10 in this case), n is the number of generators, and T_sim is the total simulation time. As indicated in [63], a smaller J_ITAE indicates less deviation of synchronization among all the generators and a shorter time for the system to reach steady state. Since the rotor angle oscillations of all the generators are considered, J_ITAE is a system-level performance index representing overall stability and dynamic performance. In this paper, this index is used to supplement and summarize the time-domain simulation for a clearer comparison.

Table VI shows the comparison of J_ITAE under the same fault with different control methods. It can be observed that the proposed method achieves the smallest J_ITAE value, which means the whole system has less oscillation under this fault condition. Moreover, taking the J_ITAE value of the original PI control as the baseline, the percentage of damping improvement of each method is also calculated. The proposed method improves the system damping by 48.38%. We can therefore conclude that the Fuzzy-GrADP controller has the best control performance in increasing the system damping.

VI. FUZZY-GrADP STABILITY ANALYSIS

Fuzzy modeling and fuzzy-based networks have been used in the adaptive control area for many years, and have demonstrated their performance with stability analysis from different perspectives [28] [32] [33] [34], as well as successful control performance on power system applications [35] [36]. Based on the advantages of fuzzy modeling, researchers have also introduced such fuzzy mapping into adaptive/approximate dynamic programming for adaptive online learning control. For instance, in [40] [41] [42] [50], the authors introduced the fuzzy neural network model into the ADP design, and demonstrated better statistical performance on the balancing benchmarks. In [49], the convergence analysis for the value function and control policy was provided for FHM based adaptive dynamic programming. Meanwhile, the three-network/goal representation ADP was proposed to introduce an additional neural network mapping compared with the existing ADP designs mentioned above. This additional neural network can provide an adaptive (internal) reinforcement signal for the critic network, to help the value function approximation and control policy seeking over time [21]. In addition, the stability of the GrADP controller is analyzed in [44] [64], showing with the Lyapunov method that such a controller is stable under certain constraints on the key parameters. We are also working on the convergence analysis of the value function and the (internal) reinforcement signal with certain monotonic properties, based on our previously published results [65].

In this paper, we proposed a fuzzy-based GrADP structure, in which we use the fuzzy hyperbolic model to seek the control policy. The goal network in the Fuzzy-GrADP still provides the adaptive (internal) reinforcement signal for the critic network, which then evaluates the performance of the action network, i.e., the FHM. The objective of the proposed Fuzzy-GrADP is to keep the control system stable while minimizing the total cost function over time. There are two possible directions to address the stability and convergence of the proposed Fuzzy-GrADP method. On one hand, we can define a Lyapunov function for the proposed design, and show that the first difference of the Lyapunov function is negative definite [66]. Under the conditions derived, we can conclude that the proposed Fuzzy-GrADP method is (asymptotically) stable. On the other hand, we plan to address the convergence of the value function and the (internal) reinforcement signal as in our previous related works [65]. We will first analyze the monotonic properties of both signals and then find the upper/lower bounds. We are currently working on both directions to handle the theoretical analysis of the proposed Fuzzy-GrADP method. In this paper, the former method is adopted for the stability analysis of our proposed structure. Similar to the method in [66], here we use R(t) to represent the fuzzy control rules before the output layer in the FHM. We use ω_c(t), ω_g(t) to represent the hidden-to-output layer weights $\omega_c^{(2)}(t)$, $\omega_g^{(2)}(t)$ and define the outputs of the hidden layers as φ_c(t) = p(t), φ_g(t) = y(t), in the critic and goal networks, respectively.

Define the Lyapunov function candidate as follows:

$$V(t) = V_1(t) + V_2(t) + V_3(t) + V_4(t) \tag{55}$$

where

$$V_1(t) = \frac{1}{\eta_c}\, \mathrm{tr}\{\tilde{\omega}_c^T(t)\tilde{\omega}_c(t)\}, \quad \tilde{\omega}_c(t) = \omega_c(t) - \omega_c^{*} \tag{56}$$

$$V_2(t) = \frac{1}{\gamma_2 \eta_g}\, \mathrm{tr}\{\tilde{\omega}_g^T(t)\tilde{\omega}_g(t)\}, \quad \tilde{\omega}_g(t) = \omega_g(t) - \omega_g^{*} \tag{57}$$

$$V_3(t) = \frac{1}{\gamma_3 \eta_a}\, \mathrm{tr}\{\tilde{R}^T(t)\tilde{R}(t)\}, \quad \tilde{R}(t) = R(t) - R^{*} \tag{58}$$

$$V_4(t) = \frac{1}{2}\|\xi_c(t-1)\|^2, \quad \xi_c(t) = \tilde{\omega}_c^T(t)\phi_c(t) \tag{59}$$

Hence, the first difference of the Lyapunov function candidate is:

$$\Delta V(t) = \Delta V_1(t) + \Delta V_2(t) + \Delta V_3(t) + \Delta V_4(t) \tag{60}$$
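The ITAE index in (54) is usually evaluated from sampled simulation output rather than in closed form. The sketch below approximates the integral with the trapezoidal rule; the function name, the list-based data layout and the quadrature rule are assumptions for illustration, not the procedure used in the paper.

```python
def itae(times, rotor_angles, reference_angle):
    """Trapezoidal approximation of J_ITAE in (54).

    times: increasing sample instants from 0 to T_sim
    rotor_angles: one sampled angle series per generator (delta_i)
    reference_angle: sampled angle series of the reference generator (delta_r)
    """
    integrand = [
        t * sum(abs(series[k] - reference_angle[k]) for series in rotor_angles)
        for k, t in enumerate(times)
    ]
    total = 0.0
    for k in range(1, len(times)):
        total += 0.5 * (integrand[k] + integrand[k - 1]) * (times[k] - times[k - 1])
    return total


# Hypothetical data: one generator holding a constant 1 rad offset for 2 s.
j_demo = itae([0.0, 1.0, 2.0], [[1.0, 1.0, 1.0]], [0.0, 0.0, 0.0])
```

For the constant-offset example the integrand is simply t, so the index equals the area under t from 0 to 2. Because later deviations are weighted by time, a controller that damps oscillations quickly yields a smaller index.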
TABLE VI
PERFORMANCE EVALUATION ON CASE III: MULTIMACHINE POWER SYSTEMS, BASED ON J_ITAE
and according to the Cauchy-Schwarz inequality, (70) becomes:

$$\Delta V_3 \le \frac{1}{\gamma_3}\Big( -\big(1 - \eta_a\|\omega(t)\|^2\big)\|\omega_c^T(t)\phi_c(t)\|^2\,\|\omega_c^T(t)H(t)\|^2 + 2\|\omega_c^T(t)\phi_c(t)\|^2\,\|\omega_c^T(t)H(t)\|^2 + \|\xi_a(t)\|^2 \Big) \tag{71}$$

For the fourth term,

$$\Delta V_4(t) = \frac{1}{2}\left( \|\xi_c(t)\|^2 - \|\xi_c(t-1)\|^2 \right) \tag{72}$$

Substituting (64), (68), (71), and (72) into (60), we obtain the first difference of the Lyapunov function candidate as follows:

$$\begin{aligned} \Delta V(t) \le{}& -\Big(\alpha^2 - \frac{1}{2}\Big)\|\xi_c(t)\|^2 - \alpha^2\big(1 - \alpha^2\eta_c\|\phi_c(t)\|^2\big)\big\|\xi_c(t) + \omega_c^{*T}\phi_c(t) + \alpha^{-1}s(t) - \alpha^{-1}\omega_c^T(t-1)\phi_c(t-1)\big\|^2 \\ &- \frac{1}{\gamma_2}\big(1 - \alpha^2\eta_g\|\phi_g(t)\|^2\big)\|C(t)\|^2\,\|\alpha s(t) + r(t) - s(t-1)\|^2 \\ &- \frac{1}{\gamma_3}\big(1 - \eta_a\|\omega(t)\|^2\big)\|\omega_c^T(t)\phi_c(t)\|^2\,\|\omega_c^T(t)H(t)\|^2 \\ &+ 2\Big\|\alpha\omega_c^{*T}\phi_c(t) + s(t) - \frac{1}{2}\omega_c^T(t-1)\phi_c(t-1) - \frac{1}{2}\omega_c^{*T}\phi_c(t-1)\Big\|^2 + \frac{\alpha^2}{\gamma_2}\|\xi_g(t)\|^2 \\ &+ \frac{2}{\gamma_2}\|C(t)\|^2\,\|\alpha s(t) + r(t) - s(t-1)\|^2 + \frac{2}{\gamma_3}\|\omega_c^T(t)\phi_c(t)\|^2\,\|\omega_c^T(t)H(t)\|^2 \end{aligned} \tag{73}$$

Set the following constraints:

$$\frac{\sqrt{2}}{2} < \alpha < 1, \quad \alpha^2\eta_c\|\phi_c(t)\|^2 < 1, \quad \alpha^2\eta_g\|\phi_g(t)\|^2 < 1, \quad \eta_a\|\omega(t)\|^2 < 1 \tag{74}$$

and define:

$$\begin{aligned} P^2 ={}& 2\Big\|\alpha\omega_c^{*T}\phi_c(t) + s(t) - \frac{1}{2}\omega_c^T(t-1)\phi_c(t-1) - \frac{1}{2}\omega_c^{*T}\phi_c(t-1)\Big\|^2 + \frac{\alpha^2}{\gamma_2}\|\xi_g(t)\|^2 \\ &+ \frac{2}{\gamma_2}\|C(t)\|^2\,\|\alpha s(t) + r(t) - s(t-1)\|^2 + \frac{2}{\gamma_3}\|\omega_c^T(t)\phi_c(t)\|^2\,\|\omega_c^T(t)H(t)\|^2 \end{aligned} \tag{75}$$

and applying the Cauchy-Schwarz inequality, we obtain:

$$\begin{aligned} P^2 \le{}& 8\Big( \alpha^2\|\omega_c^{*T}\phi_c(t)\|^2 + \|s(t)\|^2 + \frac{1}{4}\|\omega_c^T(t-1)\phi_c(t-1)\|^2 + \frac{1}{4}\|\omega_c^{*T}\phi_c(t-1)\|^2 \Big) \\ &+ \frac{8}{\gamma_2}\|C(t)\|^2\Big( \alpha^2\|s(t)\|^2 + \|s(t-1)\|^2 + \frac{1}{2}\|r(t)\|^2 \Big) \\ &+ \frac{2\alpha^2}{\gamma_2}\Big( \|\omega_g^T\phi_g(t)\|^2 + \|\omega_g^{*T}\phi_g(t)\|^2 \Big) + \frac{2}{\gamma_3}\|\omega_c^T(t)\phi_c(t)\|^2\,\|\omega_c^T(t)H(t)\|^2 \end{aligned} \tag{76}$$

therefore, we can further obtain that:

$$\begin{aligned} P^2 \le{}& \big(8\alpha^2 + 4\big)\omega_{cm}^2\phi_{cm}^2 + 8 s_m^2 + \frac{2}{\gamma_3}\,\omega_{cm}^2 H_m^2\,\omega_{cm}^2\phi_{cm}^2 \\ &+ \frac{8}{\gamma_2} C_m^2\Big( (\alpha^2 + 1)s_m^2 + \frac{1}{2} r_m^2 \Big) + \frac{4\alpha^2}{\gamma_2}\,\omega_{gm}^2\phi_{gm}^2 = P_m^2 \end{aligned} \tag{77}$$

where ω_cm, ω_gm, φ_cm, φ_gm, C_m, H_m, s_m, and r_m are the upper bounds of ω_c, ω_g, φ_c, φ_g, C(t), H(t), s(t), and r(t), respectively.

Hence, if condition (74) holds, then for any:

$$\|\xi_c(t)\|^2 > \frac{2}{2\alpha^2 - 1} P_m^2 \tag{78}$$

the first difference of the Lyapunov function candidate satisfies ΔV ≤ 0. According to the standard Lyapunov extension theorem [67], this demonstrates that the errors between the optimal weights ω_c*, ω_g*, R* and their estimates ω_c, ω_g, R are uniformly ultimately bounded (UUB), which further implies that the proposed Fuzzy-GrADP is stable.

VII. CONCLUSIONS AND DISCUSSIONS

In this paper, a novel FHM based GrADP algorithm (Fuzzy-GrADP) was proposed for nonlinear control problems. The parameters in the membership functions and the fuzzy rules were updated through a learning mechanism, and were thus able to provide an online sequential control policy. Simulation results on three case studies, i.e., a cart-pole balancing problem, a ball-and-beam balancing problem and a multimachine power system damping control problem, demonstrated that the proposed control algorithm is effective and robust both in small balancing problems and in large power system damping applications. Furthermore, a detailed Lyapunov stability analysis was also carried out in this paper to demonstrate the theoretical convergence guarantee of the proposed approach.

The adjustment of the parameters in the FHM, goal network and critic network is based on back-propagation, which is time-consuming. In real power system applications, the sampling time should be long enough to guarantee that the Fuzzy-GrADP controller has adapted the parameters in the three networks. In our simulation, an Intel(R) Core(TM) i7-4770 CPU at 3.4 GHz with the Matlab/Simulink R2013a environment is used. The iteration numbers in each sampling time step in the FHM, goal network and critic network are set as N_a = 100, N_g = 50, and N_c = 80 (see Table II), respectively. In Case I, the average time to fully adapt the parameters in the three networks in each sampling time step is 0.72 ms. In Case II, the average time to fully adapt the parameters in the three networks in each sampling time step is 0.83 ms. In the power system damping control case, the average time to fully adapt the parameters in the three networks in each sampling time step is 5.8 ms. Therefore, in real power system applications, the sampling time for the controller could be chosen as 20 ms (50 Hz).

As indicated in Case III, the Fuzzy-GrADP requires wide-area control signals (WACS), such as generator speed and transmission line voltage measurements. In modern power
systems, this hurdle has been addressed by the large-scale installation of wide-area measurement systems (WAMS). The remote generator or bus signal will be measured by the sensors with a global time tag, and sent to the control center, such as the energy management system (EMS). Even if not all the generator or bus information is available, the state estimation technique will help the controller to get the accurate and real-time system state. The Fuzzy-GrADP is used to damp out the most critical inter-area modes, such as the low-frequency oscillation mode in largely inter-connected power systems. However, some of the unstable local-area modes may also need damping control. The proposed Fuzzy-GrADP controller can be coordinated with the traditional power system stabilizer (PSS) to improve the system dynamic stability over a wide range of operating conditions.

REFERENCES

[1] C.-H. Wang, H.-L. Liu, and T.-C. Lin, “Direct adaptive fuzzy-neural control with state observer and supervisory controller for unknown nonlinear dynamical systems,” Fuzzy Systems, IEEE Transactions on, vol. 10, no. 1, pp. 39–49, 2002.
[2] Y. Yang and C. Zhou, “Adaptive fuzzy H-∞ stabilization for strict-feedback canonical nonlinear systems via backstepping and small-gain approach,” Fuzzy Systems, IEEE Transactions on, vol. 13, no. 1, pp. 104–114, 2005.
[3] S. J. Yoo and J. B. Park, “Neural-network-based decentralized adaptive control for a class of large-scale nonlinear systems with unknown time-varying delays,” Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 39, no. 5, pp. 1316–1323, 2009.
[17] P. J. Werbos, “Intelligence in the brain: A theory of how it works and how to build it,” Neural Networks, vol. 22, no. 3, pp. 200–212, 2009.
[18] D. Liu, D. Wang, D. Zhao, Q. Wei, and N. Jin, “Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming,” Automation Science and Engineering, IEEE Transactions on, vol. 9, no. 3, pp. 628–634, 2012.
[19] D. Wang, D. Liu, Q. Wei, D. Zhao, and N. Jin, “Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming,” Automatica, vol. 48, no. 8, pp. 1825–1832, 2012.
[20] J. Fu, H. He, and X. Zhou, “Adaptive learning and control for MIMO system based on adaptive dynamic programming,” Neural Networks, IEEE Transactions on, vol. 22, no. 7, pp. 1133–1148, 2011.
[21] H. He, Z. Ni, and J. Fu, “A three-network architecture for on-line learning and optimization based on adaptive dynamic programming,” Neurocomputing, vol. 78, no. 1, pp. 3–13, 2012.
[22] Z. Ni, H. He, J. Wen, and X. Xu, “Goal representation heuristic dynamic programming on maze navigation,” Neural Networks and Learning Systems, IEEE Transactions on, vol. 24, no. 12, pp. 2038–2050, Dec 2013.
[23] X. Sui, Y. Tang, H. He, and J. Wen, “Energy-storage-based low-frequency oscillation damping control using particle swarm optimization and heuristic dynamic programming,” Power Systems, IEEE Transactions on, vol. 29, no. 5, pp. 2539–2548, Sept 2014.
[24] Y. Tang, H. He, J. Wen, and J. Liu, “Power system stability control for a wind farm based on adaptive dynamic programming,” Smart Grid, IEEE Transactions on, vol. 6, no. 1, pp. 166–177, Jan 2015.
[25] Y. Tang, J. Yang, J. Yan, and H. He, “Intelligent load frequency controller using GrADP for island smart grid with electric vehicles and renewable resources,” Neurocomputing, vol. 170, pp. 406–416, 2015.
[26] C.-C. Lee, “Fuzzy logic in control systems: fuzzy logic controller. I and II,” Systems, Man and Cybernetics, IEEE Transactions on, vol. 20, no. 2, pp. 404–435, 1990.
[27] S. Mohagheghi, G. Venayagamoorthy, and R. Harley, “Fully evolvable optimal neurofuzzy controller using adaptive critic designs,” Fuzzy
[4] M. Wang, S. S. Ge, and K.-S. Hong, “Approximation-based adaptive Systems, IEEE Transactions on, vol. 16, no. 6, pp. 1450–1461, Dec
tracking control of pure-feedback nonlinear systems with multiple un- 2008.
known time-varying delays,” Neural Networks, IEEE Transactions on, [28] L.-X. Wang, “Stable adaptive fuzzy control of nonlinear systems,” Fuzzy
vol. 21, no. 11, pp. 1804–1816, 2010. Systems, IEEE Transactions on, vol. 1, no. 2, pp. 146–155, 1993.
[5] W.-Y. Wang, Y.-H. Chien, Y.-G. Leu, and T.-T. Lee, “Adaptive T-S fuzzy- [29] H. O. Wang, K. Tanaka, and M. F. Griffin, “An approach to fuzzy control
neural modeling and control for general MIMO unknown nonaffine of nonlinear systems: stability and design issues,” Fuzzy Systems, IEEE
nonlinear systems using projection update laws,” Automatica, vol. 46, Transactions on, vol. 4, no. 1, pp. 14–23, 1996.
no. 5, pp. 852–863, 2010. [30] P. P. Angelov and D. P. Filev, “An approach to online identification of
[6] Q. Gao, X.-J. Zeng, G. Feng, Y. Wang, and J. Qiu, “T-S-fuzzy- takagi-sugeno fuzzy models,” Systems, Man, and Cybernetics, Part B:
model-based approximation and controller design for general nonlinear Cybernetics, IEEE Transactions on, vol. 34, no. 1, pp. 484–498, 2004.
systems,” Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE [31] C.-T. Lin and C. G. Lee, “Reinforcement structure/parameter learning
Transactions on, vol. 42, no. 4, pp. 1143–1154, 2012. for neural-network-based fuzzy logic control systems,” Fuzzy Systems,
[7] Y.-J. Liu, W. Wang, S.-C. Tong, and Y.-S. Liu, “Robust adaptive tracking IEEE Transactions on, vol. 2, no. 1, pp. 46–63, 1994.
control for nonlinear systems based on bounds of fuzzy approximation [32] J.-S. Jang and C.-T. Sun, “Neuro-fuzzy modeling and control,” Proceed-
parameters,” Systems, Man and Cybernetics, Part A: Systems and ings of the IEEE, vol. 83, no. 3, pp. 378–406, 1995.
Humans, IEEE Transactions on, vol. 40, no. 1, pp. 170–184, 2010. [33] J.-S. Jang, “ANFIS: adaptive-network-based fuzzy inference system,”
[8] C. Chen, Y.-J. Liu, and G.-X. Wen, “Fuzzy neural network-based Systems, Man and Cybernetics, IEEE Transactions on, vol. 23, no. 3,
adaptive control for a class of uncertain nonlinear stochastic systems,” pp. 665–685, 1993.
Cybernetics, IEEE Transactions on, vol. 44, no. 5, pp. 583–593, May [34] S.-J. Lee and C.-S. Ouyang, “A neuro-fuzzy system modeling with
2014. self-constructing rule generationand hybrid SVD-based learning,” Fuzzy
[9] H. Han, X.-L. Wu, and J.-F. Qiao, “Nonlinear systems modeling based Systems, IEEE Transactions on, vol. 11, no. 3, pp. 341–353, June 2003.
on self-organizing fuzzy-neural-network with adaptive computation al- [35] F.-J. Lin, C.-H. Lin, and P.-H. Shen, “Self-constructing fuzzy neural
gorithm,” Cybernetics, IEEE Transactions on, vol. 44, no. 4, pp. 554– network speed controller for permanent-magnet synchronous motor
564, April 2014. drive,” Fuzzy Systems, IEEE Transactions on, vol. 9, no. 5, pp. 751–
[10] J. Si, A. G. Barto, W. B. Powell, and D. C. Wunsch, Handbook of 759, 2001.
learning and approximate dynamic programming. New York, USA: [36] D. Fang, Y. Xiaodong, T. S. Chung, and K. Wong, “Adaptive fuzzy-logic
IEEE, 2004. SVC damping controller using strategy of oscillation energy descent,”
[11] R. E. Bellman and S. E. Dreyfus, Applied dynamic programming. Power Systems, IEEE Transactions on, vol. 19, no. 3, pp. 1414–1421,
Princeton, NJ: Princeton Univ. Press, 1966. Aug 2004.
[12] W. B. Powell, Approximate Dynamic Programming: Solving the curses [37] Z. Yun, Z. Quan, S. Caixin, L. Shaolan, L. Yuming, and S. Yang, “RBF
of dimensionality. USA: John Wiley & Sons, 2007. neural network and ANFIS-based short-term load forecasting approach
[13] H. He and E. Garcia, “Learning from imbalanced data,” Knowledge and in real-time price environment,” Power Systems, IEEE Transactions on,
Data Engineering, IEEE Transactions on, vol. 21, no. 9, pp. 1263–1284, vol. 23, no. 3, pp. 853–858, Aug 2008.
Sept 2009. [38] T. Shannon and G. Lendaris, “Adaptive critic based approximate dy-
[14] P. J. Werbos, “Backpropagation through time: what it does and how to namic programming for tuning fuzzy controllers,” in Fuzzy Systems,
do it,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990. 2000. FUZZ IEEE 2000. The Ninth IEEE International Conference on,
[15] D. V. Prokhorov, D. C. Wunsch et al., “Adaptive critic designs,” Neural May 2000.
Networks, IEEE Transactions on, vol. 8, no. 5, pp. 997–1007, 1997. [39] S. Mohagheghi, G. K. Venayagamoorthy, and R. G. Harley, “Adaptive
[16] J. Si and Y.-T. Wang, “Online learning control by association and critic design based neuro-fuzzy controller for a static compensator in
reinforcement,” Neural Networks, IEEE Transactions on, vol. 12, no. 2, a multimachine power system,” Power Systems, IEEE Transactions on,
pp. 264 –276, Mar. 2001. vol. 21, no. 4, pp. 1744–1754, 2006.
[40] T. Li, D. Zhao, and J. Yi, “Adaptive dynamic neuro-fuzzy system for traffic signal control,” in Neural Networks, 2008. IJCNN 2008. (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on. IEEE, 2008, pp. 1840–1846.
[41] D. Zhao, Y. Zhu, and H. He, “Neural and fuzzy dynamic programming for under-actuated systems,” in Neural Networks (IJCNN), The 2012 International Joint Conference on. IEEE, 2012.
[42] Y. Zhu, D. Zhao, and H. He, “Integration of fuzzy controller with adaptive dynamic programming,” in Intelligent Control and Automation (WCICA), 2012 10th World Congress on. IEEE, 2012, pp. 310–315.
[43] Z. Ni, H. He, D. Zhao, and D. V. Prokhorov, “Reinforcement learning control based on multi-goal representation using hierarchical heuristic dynamic programming,” in Neural Networks (IJCNN), The 2012 International Joint Conference on. IEEE, 2012.
[44] Z. Ni, H. He, and J. Wen, “Adaptive learning in tracking control based on the dual critic network design,” Neural Networks and Learning Systems, IEEE Transactions on, vol. 24, no. 6, pp. 913–928, June 2013.
[45] H. Zhang and Y. Quan, “Modeling, identification, and control of a class of nonlinear systems,” Fuzzy Systems, IEEE Transactions on, vol. 9, no. 2, pp. 349–354, 2001.
[46] S. Lun, Z. Guo, and H. Zhang, “Fuzzy hyperbolic neural network model and its application in H-∞ filter design,” in Advances in Neural Networks-ISNN 2008. Springer, 2008, pp. 222–230.
[47] H. Zhang and D. Liu, Fuzzy modeling and fuzzy control. Boston, MA: Birkhauser, 2006.
[48] G. Wang, H. Zhang, B. Chen, and S. Tong, “Fuzzy hyperbolic neural network with time-varying delays,” Fuzzy Sets and Systems, vol. 161, no. 19, pp. 2533–2551, 2010.
[49] J. Zhang, H. Zhang, Y. Luo, and H. Liang, “Nearly optimal control scheme using adaptive dynamic programming based on generalized fuzzy hyperbolic model,” Acta Automatica Sinica, vol. 39, no. 2, pp. 142–148, 2013.
[50] Y. Zhu, D. Zhao, and D. Liu, “Convergence analysis and application of fuzzy-HDP for nonlinear discrete-time HJB systems,” Neurocomputing, vol. 149, pp. 124–131, 2015.
[51] P. H. Eaton, D. V. Prokhorov, and D. C. Wunsch, “Neurocontroller alternatives for “fuzzy” ball-and-beam systems with nonuniform nonlinear friction,” Neural Networks, IEEE Transactions on, vol. 11, no. 2, pp. 423–435, 2000.
[52] T.-L. Chien, C.-C. Chen, Y.-C. Huang, and W.-J. Lin, “Stability and almost disturbance decoupling analysis of nonlinear system subject to feedback linearization and feedforward neural network controller,” Neural Networks, IEEE Transactions on, vol. 19, no. 7, pp. 1220–1230, 2008.
[53] W. Yao, L. Jiang, J. Wen, Q. Wu, and S. Cheng, “Wide-area damping controller of FACTS devices for inter-area oscillations considering communication time delays,” Power Systems, IEEE Transactions on, vol. 29, no. 1, pp. 318–329, 2014.
[54] M. Pai, Energy function analysis for power system stability. Norwell, MA: Kluwer, 1989.
[55] M. Aboul-Ela, A. Sallam, J. McCalley, and A. Fouad, “Damping controller design for power system oscillations using global signals,” Power Systems, IEEE Transactions on, vol. 11, no. 2, pp. 767–773, May 1996.
[56] Y. Zhang and A. Bose, “Design of wide-area damping controllers for interarea oscillations,” Power Systems, IEEE Transactions on, vol. 23, no. 3, pp. 1136–1143, 2008.
[57] W. Yao, L. Jiang, J. Wen, Q. Wu, and S. Cheng, “Wide-area damping controller for power system interarea oscillations: A networked predictive control approach,” Control Systems Technology, IEEE Transactions on, vol. 23, no. 1, pp. 27–36, Jan 2015.
[58] P. Kundur, Power System Stability and Control. New York, USA: McGraw-Hill, 1994.
[59] Y. Tang, H. He, and J. Wen, “Comparative study between HDP and PSS on DFIG damping control,” in Computational Intelligence Applications In Smart Grid (CIASG), 2013 IEEE Symposium on, 2013, pp. 59–65.
[60] Y. Tang, H. He, Z. Ni, J. Wen, and X. Sui, “Reactive power control of grid-connected wind farm based on adaptive dynamic programming,” Neurocomputing, vol. 125, no. 1, pp. 125–133, 2014.
[61] Y. Tang, H. He, and J. Wen, “Adaptive control for an HVDC transmission link with FACTS and a wind farm,” in Proc. IEEE Innovative Smart Grid Technologies Conference (ISGT’13), Feb 2013.
[62] A. Bartoszewicz and A. Nowacka-Leverton, “ITAE optimal sliding modes for third-order systems with input signal and state constraints,” Automatic Control, IEEE Transactions on, vol. 55, no. 8, pp. 1928–1932, Aug 2010.
[63] J. Fang, W. Yao, Z. Chen, J. Wen, and S. Cheng, “Design of anti-windup compensator for energy storage-based damping controller to enhance power system stability,” Power Systems, IEEE Transactions on, vol. 29, no. 3, pp. 1175–1185, May 2014.
[64] Z. Ni, X. Fang, H. He, D. Zhao, and X. Xu, “Real-time tracking control on adaptive critic design with uniformly ultimately bounded condition,” in IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL’13), IEEE Symposium Series on Computational Intelligence (SSCI), Apr. 2013.
[65] X. Zhong, H. He, H. Zhang, and Z. Wang, “Optimal control for unknown discrete-time nonlinear Markov jump systems using adaptive dynamic programming,” Neural Networks and Learning Systems, IEEE Transactions on, vol. 25, no. 12, pp. 2141–2155, Dec 2014.
[66] F. Liu, J. Sun, J. Si, W. Guo, and S. Mei, “A boundedness result for the direct heuristic dynamic programming,” Neural Networks, vol. 32, pp. 229–235, 2012.
[67] J.-J. E. Slotine, W. Li et al., Applied nonlinear control. Englewood Cliffs, NJ: Prentice-Hall International Inc, 1991.

Yufei Tang (S’13) received the B.Eng. and M.Eng. degrees in electrical engineering from Hohai University, Nanjing, China, in 2008 and 2011, respectively. He is currently working toward the Ph.D. degree at the Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI, USA.
His research interests include power system stability, control, and optimization, renewable energy systems, smart grid security, and computational intelligence for smart grids.

Haibo He (SM’11) received the B.S. and M.S. degrees in electrical engineering from Huazhong University of Science and Technology (HUST), Wuhan, China, in 1999 and 2002, respectively, and the Ph.D. degree in electrical engineering from Ohio University, Athens, in 2006. From 2006 to 2009, he was an assistant professor in the Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, New Jersey. He is currently the Robert Haas Endowed Chair Professor in Electrical Engineering at the University of Rhode Island, Kingston, Rhode Island.
He has authored one sole-author research book (Wiley), edited one book (Wiley-IEEE) and six conference proceedings (Springer), and authored or co-authored over 180 peer-reviewed journal and conference papers. His current research interests include adaptive dynamic programming, computational intelligence, machine learning, data mining, and various applications, such as smart grid, cognitive radio networks, human-robot interaction, and sensor networks.
Prof. He received the IEEE International Conference on Communications Best Paper Award in 2014, the IEEE Computational Intelligence Society (CIS) Outstanding Early Career Award in 2014, the K. C. Wong Research Award from the Chinese Academy of Sciences in 2012, the National Science Foundation CAREER Award in 2011, the Providence Business News Rising Star Innovator Award in 2011, and the Best Master Thesis Award of Hubei Province, China, in 2002. His research results have been covered by national and international media, including The Wall Street Journal, Yahoo!, and Providence Business News, among others. He has delivered numerous keynote and invited talks at various conferences and organizations. He was the General Chair of the IEEE Symposium Series on Computational Intelligence (IEEE SSCI) in 2014. He is an Associate Editor of the IEEE Transactions on Neural Networks and Learning Systems, the IEEE Computational Intelligence Magazine, and the IEEE Transactions on Smart Grid.

Zhen Ni (M’15) received the B.S. degree from the Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan, China, in 2010, and the Ph.D. degree from the Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island (URI), Kingston, RI, USA, in 2015.
He is currently an Assistant Professor with the Department of Electrical Engineering and Computer Science, South Dakota State University, Brookings, SD, USA. His current research interests include computational intelligence, smart grid, machine learning, and cyber-physical systems. Prof. Ni received the Chinese Government Award for Outstanding Students Abroad in 2014. He has been actively involved in numerous conference and workshop organization committees in the society, including as the Local Arrangement Chair of the IEEE Computational Intelligence Society (CIS) Workshop in URI in 2014, and the General Co-Chair of the IEEE CIS Winter School in Washington, DC, USA, in 2016.

Xin Xu (SM’12) received the B.S. degree in electrical engineering from the Department of Automatic Control, National University of Defense Technology (NUDT), Changsha, China, in 1996, where he also received the Ph.D. degree in control science and engineering from the College of Mechatronics and Automation, NUDT, in 2002. He is currently a Full Professor with the College of Mechatronics and Automation, NUDT, China. He has co-authored more than 100 papers in international journals and conferences, and co-authored four books. His research interests include reinforcement learning, approximate dynamic programming, machine learning, robotics, and autonomous vehicles.
Dr. Xu was a recipient of the 2nd class National Natural Science Award of China, in 2012. He is an Associate Editor of Information Sciences and Intelligent Automation and Soft Computing, and a Guest Editor of the International Journal of Adaptive Control and Signal Processing. He is a Committee Member of the IEEE TC on Approximate Dynamic Programming and Reinforcement Learning (ADPRL) and the IEEE TC on Robot Learning.