
Fuzzy-Based Goal Representation Adaptive Dynamic Programming

Yufei Tang, Student Member, IEEE, Haibo He, Senior Member, IEEE, Zhen Ni, Member, IEEE, Xiangnan Zhong, Dongbin Zhao, Senior Member, IEEE, and Xin Xu, Senior Member, IEEE

Abstract—In this paper, a novel nonlinear learning controller called fuzzy-based goal representation adaptive dynamic programming (Fuzzy-GrADP) is proposed. In the adopted GrADP method, a goal representation network is introduced to generate an adaptive internal reinforcement signal for the critic network, helping the controller provide a general mapping between the input and the output action. Moreover, in the proposed architecture, the action network of the goal representation adaptive dynamic programming (GrADP) is improved by using the fuzzy hyperbolic model (FHM), which combines the merits of fuzzy models and neural network models. Based on the back-propagation technique, the parameters in the membership functions (MFs) and the fuzzy rules all undergo training and online adaptation. The proposed controller is tested on two numerical benchmarks, and the simulation results show that it outperforms the original adaptive dynamic fuzzy controller and the pure neural-network-based GrADP controller. In addition, the proposed controller is further applied to a large multimachine power system for static var compensator (SVC) damping control, where simulation results demonstrate the effectiveness of the proposed approach in real applications. Furthermore, to demonstrate the theoretical guarantee of the proposed method, a Lyapunov stability analysis supporting the proposed Fuzzy-GrADP approach has also been carried out.

Index Terms—Adaptive dynamic programming (ADP); internal goal representation; goal representation adaptive dynamic programming (GrADP); fuzzy hyperbolic model (FHM); multimachine power systems; stability analysis.

I. INTRODUCTION

ADAPTIVE control has been successfully applied in many areas, such as canonical nonlinear systems [1] [2], large-scale nonlinear systems with unknown time-varying delays [3] [4], and MIMO unknown nonaffine nonlinear systems [5]. Generally speaking, these control methods are based on nonlinear system modeling, which requires consideration of both the specific objectives of the task and the preferences of the users [6]. For nonlinear system modeling, the most widely used approaches follow the conventional philosophy, such as differential algebraic equation (DAE) based mathematical models. When the system exhibits strong nonlinearities, multivariable coupling, and variation of operating conditions together with unknown model structure and parameters, conventional mathematical modeling may not be suitable [7] [8] [9]. In these situations, methods that do not require system modeling and also possess online learning ability are highly desired in real applications. Adaptive dynamic programming (ADP) is such a tool, providing sequential decision and control to address the aforementioned real-life problems [10]. The key idea of ADP is to achieve optimization over time based on the Bellman equation, which has the following form [11]:

J[x(t), t] = \sum_{i=t}^{\infty} \alpha^{i-t} U[x(i), u(i), i]    (1)

where x(t) is the state vector of the system, u(t) is the control action, U is the utility function, and \alpha is a discount factor. Approximate dynamic programming is used to seek a control policy u(t) that minimizes the total cost function J. Instead of finding the exact minimum, an approximate solution is provided by solving the following equation:

J^*(x(t)) = \min_{u(t)} \{ U(x(t), u(t)) + \alpha J^*(x(t+1)) \}    (2)

Formulating the problem in this way, ADP can successfully achieve learning and control by using a function approximation structure to approximate the total cost function, obtaining an approximate solution of the Bellman equation while overcoming "the curse of dimensionality" [12]. Universal function approximation structures, such as multi-layer perceptron (MLP) neural networks with the back-propagation learning mechanism, have been widely studied in the computational intelligence society [13] [14] [15].

Existing ADP designs can be categorized into three major groups: heuristic dynamic programming (HDP), dual heuristic dynamic programming (DHP), and globalized dual heuristic dynamic programming (GDHP) [15]. The major difference between HDP and DHP is the design of the critic network. The critic network approximates J directly in HDP, while DHP approximates the derivative of J with respect to its input vector. Such inner building of derivative terms over time helps DHP reduce the probability of error introduced by back-propagation (BP) [16]. GDHP takes the advantages of HDP and DHP by using the critic network to approximate both J and its derivative simultaneously. Thus, GDHP is expected

This work was supported in part by the National Science Foundation (NSF) under grants ECCS 1053717 and IIS 1526835, and the National Natural Science Foundation of China (NSFC) under grants 51529701, 91220301, 61273136, and 61573353.
Yufei Tang, Haibo He, and Xiangnan Zhong are with the Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island, Kingston, RI 02881, USA (email: {ytang, he, xzhong}@ele.uri.edu).
Zhen Ni is with the Department of Electrical Engineering and Computer Science, South Dakota State University, Brookings, SD 57007, USA (email: [email protected]).
Dongbin Zhao is with the State Key Lab of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (email: [email protected]).
Xin Xu is with the College of Mechatronics and Automation, National University of Defense Technology, Changsha 410073, China (email: [email protected]).
Digital Object Identifier 10.1109/TFUZZ.2015.2505327
1063-6706 © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
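The discounted cost of Eq. (1) and the recursion of Eq. (2) can be illustrated with a small sketch. This is not from the paper: the three-state transition table and utilities below are hypothetical, chosen only to show the fixed-point iteration that ADP-style methods approximate.

```python
import numpy as np

ALPHA = 0.95                       # discount factor alpha, as in Eqs. (1)-(2)
N_STATES = 3

# Hypothetical deterministic toy problem: next_state[s, u] and utility U[s, u]
next_state = np.array([[1, 2], [2, 0], [0, 1]])
utility = np.array([[1.0, 0.5], [0.2, 1.5], [0.0, 2.0]])

def value_iteration(tol=1e-8, max_iter=10_000):
    """Iterate J*(x) = min_u { U(x,u) + alpha * J*(x') } to a fixed point."""
    J = np.zeros(N_STATES)
    for _ in range(max_iter):
        # Bellman backup of Eq. (2) for every (state, action) pair at once
        Q = utility + ALPHA * J[next_state]
        J_new = Q.min(axis=1)
        if np.max(np.abs(J_new - J)) < tol:
            return J_new
        J = J_new
    return J

J_star = value_iteration()
# Greedy policy with respect to the converged cost-to-go
policy = (utility + ALPHA * J_star[next_state]).argmin(axis=1)
```

Because the backup is a contraction for alpha < 1, the iteration converges; ADP replaces the exact table J with a trained function approximator.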
IEEE, NOV 2015 2

to have better performance than HDP and DHP. However, the computational complexity and hardware implementation difficulty are much higher for GDHP. Variations of these major designs, such as the action-dependent (AD) versions, have also been developed in the community [17]. The online "model-free" direct HDP was developed in [16], where the authors took advantage of the potential scalability of the adaptive critic designs and the intuitiveness of Q-learning. It is also an online learning scheme that simultaneously updates the value function and the control policy. For the model-based DHP/GDHP design, the authors in [15] showed that efficient learning can be achieved with different weight-error terms for the control of an auto-lander helicopter. In [18] [19], the authors demonstrated the convergence analysis for model-based DHP/GDHP in terms of the cost function and the control law. In addition, the Levenberg-Marquardt method has been integrated into the ADP design to improve the learning and control of both the tension and the height of a looper system in a hot strip mill [20].

Among the ADP designs, the online "model-free" technique has attracted considerable attention. To be specific, the previous total cost-to-go value J(t - 1) is stored and used to obtain the temporal difference for training at any time instance, which enables online learning, association, and optimization over time. In [21], the authors proposed to improve the online learning of the ADP design with the incorporation of a dual-critic/reference network. This hierarchical ADP design with multiple goal networks has been tested on maze navigation [22], energy-storage-based power system damping control [23], power system stability control for a wind farm [24], and load frequency control for an island smart grid with electric vehicles and renewable resources [25], demonstrating superior learning performance over the traditional designs.

Meanwhile, fuzzy systems have been used in many applications for their robust control in the presence of noise and uncertainties. In these systems, a linguistic control strategy based on expert knowledge is converted into an automatic control strategy. Generally speaking, fuzzy systems provide a nonlinear mapping from the input to a set of fuzzy values using fuzzification methods, and then back to the output using defuzzification techniques [26]. The parameters of the membership functions and the fuzzy IF-THEN rules are provided according to the experience and knowledge of human experts. However, in reality, there is no systematic way to select the proper membership functions and fuzzy rules [27]. If the pre-set parameters yield unsatisfactory performance, an adaptive law is applied to update the parameters in the fuzzy rules or the membership functions, which is called adaptive fuzzy control [28] [29] [30]. Similar to this concept, neuro-fuzzy controllers based on neural networks have been proposed in [31] [32], where the adaptive-network-based fuzzy inference system (ANFIS) is a typical structure belonging to this category [33]. These methods possess certain advantages of the neural network and thus can achieve improved performance over traditional fuzzy logic controllers. Along this topic, many improvements on both the algorithm side and the application side have been carried out in the literature, such as neuro-fuzzy system modeling with self-constructing rules based on hybrid singular value decomposition and the gradient descent method [34], permanent-magnet synchronous motor drive speed control using a self-constructing fuzzy neural network [35], oscillation-energy-descent-based adaptive fuzzy-logic SVC damping controller design [36], and short-term load forecasting using radial basis function (RBF) based ANFIS [37]. Based on the aforementioned discussion, it is interesting to combine the advantages of fuzzy and ADP methods to design robust controllers. Such an idea has been implemented in the community [38] [39] [40] [41] [42], demonstrating the promising future of this direction.

Inspired by these previous studies, in this paper we propose a new real-time control framework using GrADP and the fuzzy hyperbolic model (FHM). Moreover, an application study on industrial-scale multimachine power system damping control using a static var compensator (SVC) is also presented. The main contributions of this paper are summarized as follows:

1) A novel nonlinear learning controller, called Fuzzy-GrADP, based on the fuzzy hyperbolic model (FHM) and goal representation adaptive dynamic programming (GrADP), is proposed. Different from the original GrADP method, the proposed controller incorporates the advantages of the FHM to increase robustness. Under this framework, the parameters in the membership functions (MFs) and the fuzzy rules are updated through a learning mechanism and can provide an online sequential control policy.

2) Comparative simulation studies are carried out for the proposed method against the original GrADP algorithm, the hierarchical GrADP algorithm, the Fuzzy-ADP algorithm, and the traditional ADP algorithm on two classical control benchmarks: the cart-pole balancing problem and the ball-and-beam balancing problem. Simulation results demonstrate that the proposed controller is much more robust to noise in the environment.

3) Moreover, an application case study on a large multimachine power system for SVC damping control is carried out. As the multimachine power system is much more complex than the above two classical benchmarks, the specific controller design, including the wide-area control signal (WACS) selection and the reinforcement signal setting, is introduced in detail. Based on dynamic time-domain simulation and a quantitative performance index, the proposed intelligent controller demonstrates increased power system damping and improved system transient stability.

4) The stability analysis for the proposed Fuzzy-GrADP is carried out. The constraints for this method are derived based on a Lyapunov function and can be used for choosing the key parameters. Implementation issues and future research directions are also provided.

The rest of the paper is organized as follows. Section II presents the details of the proposed controller architecture and

Fig. 1. The schematic diagram of the proposed Fuzzy-GrADP

Fig. 2. The schematic diagram of the goal network with three-layer nonlinear architecture

associated learning algorithm. In Section III and Section IV, the experimental setup and simulation analysis based on two small benchmarks (i.e., cart-pole and ball-and-beam) are presented to show the effectiveness of our approach. Then a relatively large multimachine power system control case study is carried out in Section V, including the input selection and the reinforcement signal design. The stability analysis for the proposed Fuzzy-GrADP is discussed in Section VI. Finally, the concluding remarks, implementation issues, and potential solutions are given in Section VII.

II. ONLINE LEARNING OF THE FUZZY-GRADP CONTROLLER

The schematic diagram of the proposed Fuzzy-GrADP is shown in Fig. 1. The reason the adopted GrADP is better than the traditional two-network ADP (i.e., with only an action network and a critic network) is that an internal goal network is introduced. The key idea of the goal network is to replace the traditional "hand-crafted" reinforcement signal setting and hence to provide an adaptive internal goal/reward representation to the critic network [21]. As we can see from Fig. 1, the reinforcement signal r(t) is no longer directly used by the critic network. It is instead used by the goal network to generate an internal goal representation signal for the critic network. In this way, the goal network helps the critic network better approximate the value function. The motivation of this hierarchical goal representation is to mimic the human brain, which uses multiple levels of internal goals to accomplish long-term final goals [43]. In this paper, one goal representation network is used to provide the internal goal representation signal.

Because of the integration of the goal network and the use of the FHM, the learning and adaptation in the three networks differ from those of GrADP and Fuzzy-ADP. In the feed-forward process, the output of the fuzzy logic controller u(t) has two paths contributing to the error function formulation: one through the goal network and the other through the critic network. In the backward propagation, the error function of the goal network is related to the primary reinforcement signal r(t), and the error function of the critic network is related to the internal reinforcement signal s(t). Meanwhile, the updating of the rules and the membership functions in the FHM is composed of two parts, where one is from the goal network path and the other is from the critic network path. The detailed learning and adaptation for each module are discussed in the following sections.

A. The Goal Network Learning and Adaptation

The structure of the goal network is shown in Fig. 2. As can be seen from this figure, a neural network with a three-layer nonlinear architecture (with one hidden layer) is used in this paper, where this structure setting is the same as in [21] [22] [44]. The feed-forward propagation of the signal in the goal network is as follows:

s(t) = \frac{1 - e^{-k(t)}}{1 + e^{-k(t)}}    (3)

k(t) = \sum_{i=1}^{N_h} w_{g_i}^{(2)}(t) \cdot y_i(t)    (4)

y_i(t) = \frac{1 - e^{-z_i(t)}}{1 + e^{-z_i(t)}}, \quad i = 1, ..., N_h    (5)

z_i(t) = \sum_{j=1}^{n+1} w_{g_{i,j}}^{(1)}(t) \cdot x_j(t), \quad i = 1, ..., N_h    (6)

where z_i(t) is the input of the ith hidden node of the goal network and y_i(t) is the corresponding output of that hidden node, k(t) is the input to the output node of the goal network before the sigmoid function, N_h is the number of hidden neurons of the goal network, and (n + 1) is the total number of inputs to the goal network, including the action value u(t) from the fuzzy logic controller.

Before adjusting the weights in the goal network through the back-propagation rule, we need to define the error function first [21]. As shown in Fig. 1, the primary reinforcement signal r(t) is presented to the goal network directly; then a secondary/internal reinforcement signal s(t) is generated and sent to the critic network, which in turn is used to provide a better approximation of J(t) [21]. In this way, the primary reinforcement signal r(t) is at a higher hierarchical level and can be a simple binary signal to represent "good" or "bad",

or "success" or "failure", while the secondary/internal reinforcement signal s(t) can be a more informative continuous value for improved learning and generalization performance. Therefore, the error function E_g(t) is defined as [22] [44]:

e_g(t) = \alpha \cdot s(t) - [s(t-1) - r(t)], \quad E_g(t) = \frac{1}{2} e_g^2(t)    (7)

and the chain back-propagation path can be represented as:

\frac{\partial E_g(t)}{\partial w_g(t)} = \frac{\partial E_g(t)}{\partial s(t)} \cdot \frac{\partial s(t)}{\partial w_g(t)}    (8)

By applying the chain back-propagation rule, the adaptation of the goal network can be implemented as follows.

(a) \Delta w_g^{(2)}: goal network weight adjustment from the hidden to the output layer:

\Delta w_{g_i}^{(2)} = \eta_g(t) \cdot \left[ -\frac{\partial E_g(t)}{\partial w_{g_i}^{(2)}(t)} \right]    (9)

\frac{\partial E_g(t)}{\partial w_{g_i}^{(2)}(t)} = \frac{\partial E_g(t)}{\partial s(t)} \cdot \frac{\partial s(t)}{\partial k(t)} \cdot \frac{\partial k(t)}{\partial w_{g_i}^{(2)}(t)} = \alpha \cdot e_g(t) \cdot \frac{1}{2}\left(1 - s^2(t)\right) \cdot y_i(t)    (10)

(b) \Delta w_g^{(1)}: goal network weight adjustment from the input to the hidden layer:

\Delta w_{g_{i,j}}^{(1)} = \eta_g(t) \cdot \left[ -\frac{\partial E_g(t)}{\partial w_{g_{i,j}}^{(1)}(t)} \right]    (11)

\frac{\partial E_g(t)}{\partial w_{g_{i,j}}^{(1)}(t)} = \frac{\partial E_g(t)}{\partial s(t)} \cdot \frac{\partial s(t)}{\partial k(t)} \cdot \frac{\partial k(t)}{\partial y_i(t)} \cdot \frac{\partial y_i(t)}{\partial z_i(t)} \cdot \frac{\partial z_i(t)}{\partial w_{g_{i,j}}^{(1)}(t)} = \alpha \cdot e_g(t) \cdot \frac{1}{2}\left(1 - s^2(t)\right) \cdot w_{g_i}^{(2)}(t) \cdot \frac{1}{2}\left(1 - y_i^2(t)\right) \cdot x_j(t)    (12)

where \eta_g(t) is the learning rate of the goal network. In general, this learning rate starts from an initial value and gradually decreases as the iteration step increases. The detailed parameter setting will be discussed in the parameter setting section.

At last, the weight tuning for the goal network follows the gradient descent rule:

w_g(t+1) = w_g(t) + \Delta w_g(t)    (13)

B. The Critic Network Learning and Adaptation

The critic network is shown in Fig. 3. As can be seen from this figure, the same three-layer neural network structure with one hidden layer is used. The feed-forward propagation of the signal in the critic network is as follows:

J(t) = \sum_{i=1}^{N_h} w_{c_i}^{(2)}(t) \cdot p_i(t)    (14)

Fig. 3. The schematic diagram of the critic network with three-layer nonlinear architecture

p_i(t) = \frac{1 - e^{-q_i(t)}}{1 + e^{-q_i(t)}}, \quad i = 1, ..., N_h    (15)

q_i(t) = \sum_{j=1}^{n+2} w_{c_{i,j}}^{(1)}(t) \cdot x_j(t), \quad i = 1, ..., N_h    (16)

where q_i(t) and p_i(t) are the input and output of the ith hidden node of the critic network, respectively, and (n + 2) is the total number of inputs to the critic network, including the action value u(t) from the action network and the internal reinforcement signal s(t) from the goal network.

Since the primary reinforcement signal r(t) is used by the goal network rather than the critic network, the error function E_c(t) used to update the parameters in the critic network is based on the internal reinforcement signal s(t) and is defined as follows:

e_c(t) = \alpha \cdot J(t) - [J(t-1) - s(t)], \quad E_c(t) = \frac{1}{2} e_c^2(t)    (17)

and the chain back-propagation path can be represented as:

\frac{\partial E_c(t)}{\partial w_c(t)} = \frac{\partial E_c(t)}{\partial J(t)} \cdot \frac{\partial J(t)}{\partial w_c(t)}    (18)

By applying the chain back-propagation rule, the adaptation of the critic network can be implemented as follows.

(a) \Delta w_c^{(2)}: critic network weight adjustment from the hidden to the output layer:

\Delta w_{c_i}^{(2)} = \eta_c(t) \cdot \left[ -\frac{\partial E_c(t)}{\partial w_{c_i}^{(2)}(t)} \right]    (19)

\frac{\partial E_c(t)}{\partial w_{c_i}^{(2)}(t)} = \frac{\partial E_c(t)}{\partial J(t)} \cdot \frac{\partial J(t)}{\partial w_{c_i}^{(2)}(t)} = \alpha \cdot e_c(t) \cdot p_i(t)    (20)

(b) \Delta w_c^{(1)}: critic network weight adjustment from the input to the hidden layer:

\Delta w_{c_{i,j}}^{(1)} = \eta_c(t) \cdot \left[ -\frac{\partial E_c(t)}{\partial w_{c_{i,j}}^{(1)}(t)} \right]    (21)
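Since the goal network (Eqs. (3)-(13)) and the critic network (Eqs. (14)-(21)) share the same three-layer shape and the same form of temporal-difference error, both can be sketched with one class. This is an illustrative NumPy sketch, not the authors' code; layer sizes, the random seed, and the learning rate are assumed values.

```python
import numpy as np

def bipolar(v):
    # (1 - e^{-v}) / (1 + e^{-v}): the sigmoid used in Eqs. (3), (5), (15)
    return (1.0 - np.exp(-v)) / (1.0 + np.exp(-v))

class ThreeLayerNet:
    """One hidden layer of bipolar-sigmoid units; sigmoid output for the
    goal network (s in (-1, 1)), linear output for the critic (J)."""
    def __init__(self, n_in, n_hidden, sigmoid_out, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.uniform(-1, 1, (n_hidden, n_in))   # input-to-hidden
        self.w2 = rng.uniform(-1, 1, n_hidden)           # hidden-to-output
        self.sigmoid_out = sigmoid_out

    def forward(self, x):
        self.x = x
        self.hidden = bipolar(self.w1 @ x)               # Eqs. (5)-(6) / (15)-(16)
        raw = self.w2 @ self.hidden                      # Eq. (4) / (14)
        self.out = bipolar(raw) if self.sigmoid_out else raw
        return self.out

    def td_update(self, prev_out, signal, alpha=0.95, eta=0.05):
        # Error shape shared by Eqs. (7) and (17):
        #   e = alpha * out(t) - [out(t-1) - signal(t)]
        e = alpha * self.out - (prev_out - signal)
        dE_dout = alpha * e
        if self.sigmoid_out:                             # ds/dk term in Eq. (10)
            dE_dout *= 0.5 * (1.0 - self.out ** 2)
        grad_w2 = dE_dout * self.hidden                  # Eqs. (9)-(10) / (19)-(20)
        delta = dE_dout * self.w2 * 0.5 * (1.0 - self.hidden ** 2)
        grad_w1 = np.outer(delta, self.x)                # Eqs. (11)-(12) / (21)
        self.w2 -= eta * grad_w2                         # Eq. (13) / (23)
        self.w1 -= eta * grad_w1
        return 0.5 * e * e                               # E_g / E_c
```

A goal network would be `ThreeLayerNet(n_in=n + 1, n_hidden=6, sigmoid_out=True)` and a critic `ThreeLayerNet(n_in=n + 2, n_hidden=6, sigmoid_out=False)`, matching the (n + 1)- and (n + 2)-input counts stated above.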

\frac{\partial E_c(t)}{\partial w_{c_{i,j}}^{(1)}(t)} = \frac{\partial E_c(t)}{\partial J(t)} \cdot \frac{\partial J(t)}{\partial p_i(t)} \cdot \frac{\partial p_i(t)}{\partial q_i(t)} \cdot \frac{\partial q_i(t)}{\partial w_{c_{i,j}}^{(1)}(t)} = \alpha \cdot e_c(t) \cdot w_{c_i}^{(2)}(t) \cdot \frac{1}{2}\left(1 - p_i^2(t)\right) \cdot x_j(t)    (22)

where \eta_c(t) is the learning rate of the critic network. The setting of this parameter is similar to \eta_g(t) in the goal network and will be discussed in the parameter setting section.

At last, the weight tuning for the critic network follows the gradient descent rule:

w_c(t+1) = w_c(t) + \Delta w_c(t)    (23)

C. The FHM Learning and Adaptation

In this paper, the fuzzy hyperbolic model (FHM) is employed as the control policy approximator, which is different from the original GrADP method with a three-layer neural network as the action network. The FHM is also a fuzzy hyperbolic neural network model (FHNNM) [45] [46], which is shown in Fig. 4. The definition of hyperbolic-type fuzzy rules has been described in detail in [45] and is briefly introduced as follows.

Fig. 4. The schematic diagram of the fuzzy hyperbolic model, also known as the fuzzy hyperbolic neural network model

Definition 1: Given a plant with n input variables x = (x_1(t), ..., x_n(t))^T and n output variables \dot{x} = (\dot{x}_1(t), ..., \dot{x}_n(t))^T, for each output variable \dot{x}_k, k = 1, ..., n, the corresponding group of hyperbolic-type fuzzy rules has the following form:

R^j: IF x_1 is F_{x_1} and x_2 is F_{x_2}, ..., and x_n is F_{x_n}, THEN \dot{x}_k = \pm c_{F_{x_1}} \pm c_{F_{x_2}} \pm \cdots \pm c_{F_{x_n}}

where F_{x_i}, i = 1, ..., n, are fuzzy sets of x_i, which include P_{x_i} (positive) and N_{x_i} (negative), and \pm c_{F_{x_i}}, i = 1, ..., n, are 2n real constants corresponding to F_{x_i}.

(1) The constant terms \pm c_{F_{x_i}} in the THEN-part correspond to F_{x_i} in the IF-part. Specifically, if the linguistic value of the F_{x_i} term in the IF-part is P_{x_i}, then +c_{F_{x_i}} must appear in the THEN-part; if the linguistic value of the F_{x_i} term in the IF-part is N_{x_i}, then -c_{F_{x_i}} must appear in the THEN-part; if there is no F_{x_i} term in the IF-part, then \pm c_{F_{x_i}} does not appear in the THEN-part.

(2) There are 2^n fuzzy rules in each rule base; that is, there is a total of 2^n input variable combinations of all the possible P_{x_i} and N_{x_i} in the IF-part. This group of fuzzy rules is called a hyperbolic-type fuzzy rule base (HFRB). If a plant has n output variables, then there will be n HFRBs.

As we know, both the FHM and the Takagi-Sugeno (T-S) model are universal approximators and can be used to establish nonlinear mappings for complex environments. The advantage of using the FHM over the T-S model is that neither premise structure identification nor completeness design of the premise variable space is needed [47]. The FHM can be obtained without knowing much information about the real plant, and it can be derived from a set of fuzzy rules. Moreover, the FHM can be seen as a neural network model, where the model parameters can be learned by the back-propagation algorithm [48]. Since the variables of real physical systems are always bounded, using the FHM is more reasonable in practice. A more important point is that the norm of the derivative of the hyperbolic tangent function is less than one; thus, using the FHM yields less conservatism than using a general neural network for stability conditions [49]. Based on the aforementioned definition and discussion, the feed-forward propagation of the signal in the FHM is as follows:

u(t) = \sum_{r=1}^{N_r} \omega_r(t) \cdot R_r(t)    (24)

\omega_r(t) = \prod_{i=1}^{n} \mu_{i,j_i}(t)    (25)

\mu_{i,N}(t) = \frac{1}{2}\left[1 - \tanh(\theta_i(t) \cdot x_i(t))\right], \quad \mu_{i,P}(t) = \frac{1}{2}\left[1 + \tanh(\theta_i(t) \cdot x_i(t))\right], \quad i = 1, ..., n    (26)

where \theta_i(t) is the parameter of the membership function, \omega_r(t) is the output of the "hidden" layer, j_i equals N or P, N_r is the number of \omega_r(t), and R_r(t) are the weights representing the fuzzy control rules.

The error function E_a(t) used to update the parameters in the FHM indirectly back-propagates the error between the desired ultimate objective U_c(t) and the J(t) function from the critic network, and is defined as:

e_a(t) = J(t) - U_c(t), \quad E_a(t) = \frac{1}{2} e_a^2(t)    (27)

and the chain back-propagation path can be represented as:

\frac{\partial E_a(t)}{\partial w_a(t)} = \frac{\partial E_a(t)}{\partial J(t)} \cdot \frac{\partial J(t)}{\partial u(t)} \cdot \frac{\partial u(t)}{\partial w_a(t)} + \frac{\partial E_a(t)}{\partial J(t)} \cdot \frac{\partial J(t)}{\partial s(t)} \cdot \frac{\partial s(t)}{\partial u(t)} \cdot \frac{\partial u(t)}{\partial w_a(t)}    (28)

By applying the chain back-propagation rule, the adaptation of the FHM can be implemented as follows.

(a) \Delta R_r: the adjustment of the fuzzy control rules:

\Delta R_r(t) = \eta_a(t) \cdot \left[ -\frac{\partial E_a(t)}{\partial R_r(t)} \right]    (29)
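The FHM forward pass of Eqs. (24)-(26) can be sketched directly: each input receives a negative and a positive hyperbolic membership degree, the 2^n products over the N/P sign patterns give the rule firing strengths, and the output is their weighted sum. The numbers below (theta values, rule weights, and the test input) are illustrative only.

```python
import numpy as np
from itertools import product

def fhm_forward(x, theta, rules):
    """FHM feed-forward pass, Eqs. (24)-(26).
    x, theta: length-n arrays; rules: length-2^n array of rule weights R_r."""
    n = len(x)
    t = np.tanh(theta * x)
    mu_N = 0.5 * (1.0 - t)                       # Eq. (26), negative fuzzy set
    mu_P = 0.5 * (1.0 + t)                       # Eq. (26), positive fuzzy set
    strengths = np.array([
        np.prod(np.where(pattern, mu_P, mu_N))   # Eq. (25): product over inputs
        for pattern in product((0, 1), repeat=n) # one pattern per N/P rule
    ])
    return strengths @ rules, strengths          # Eq. (24)

# Illustrative two-input example with 2^2 = 4 rules
u, w = fhm_forward(np.array([0.2, -0.5]),
                   np.array([1.0, 1.0]),
                   np.array([-1.0, -0.5, 0.5, 1.0]))
```

Because mu_{i,N} + mu_{i,P} = 1 for every input, the 2^n firing strengths sum to one, so u(t) is a convex combination of the rule weights R_r; this is one reason the FHM output inherits the boundedness noted above.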

\frac{\partial E_a(t)}{\partial R_r(t)} = \underbrace{\frac{\partial E_a(t)}{\partial J(t)} \cdot \frac{\partial J(t)}{\partial u(t)} \cdot \frac{\partial u(t)}{\partial R_r(t)}}_{P_{a1}^{(2)}} + \underbrace{\frac{\partial E_a(t)}{\partial J(t)} \cdot \frac{\partial J(t)}{\partial s(t)} \cdot \frac{\partial s(t)}{\partial u(t)} \cdot \frac{\partial u(t)}{\partial R_r(t)}}_{P_{a2}^{(2)}}    (30)

P_{a1}^{(2)} = e_a(t) \cdot \sum_{i=1}^{N_h} \left[ w_{c_i}^{(2)}(t) \cdot \frac{1}{2}\left(1 - p_i^2(t)\right) \cdot w_{c_{i,n+1}}^{(1)}(t) \right] \cdot \omega_r    (31)

P_{a2}^{(2)} = e_a(t) \cdot \sum_{i=1}^{N_h} \left[ w_{c_i}^{(2)}(t) \cdot \frac{1}{2}\left(1 - p_i^2(t)\right) \cdot w_{c_{i,n+2}}^{(1)}(t) \right] \cdot \sum_{i=1}^{N_h} \left[ w_{g_i}^{(2)}(t) \cdot \frac{1}{2}\left(1 - y_i^2(t)\right) \cdot w_{g_{i,n+1}}^{(1)}(t) \right] \cdot \omega_r    (32)

(b) \Delta \theta: the adjustment of the parameters in the membership functions:

\Delta \theta_i(t) = \eta_a(t) \cdot \left[ -\frac{\partial E_a(t)}{\partial \theta_i(t)} \right]    (33)

\frac{\partial E_a(t)}{\partial \theta_i(t)} = \underbrace{\frac{\partial E_a(t)}{\partial J(t)} \cdot \frac{\partial J(t)}{\partial u(t)} \cdot \frac{\partial u(t)}{\partial \omega_r(t)} \cdot \frac{\partial \omega_r(t)}{\partial \mu_{i,j_i}(t)} \cdot \frac{\partial \mu_{i,j_i}(t)}{\partial \theta_i(t)}}_{P_{a1}^{(1)}} + \underbrace{\frac{\partial E_a(t)}{\partial J(t)} \cdot \frac{\partial J(t)}{\partial s(t)} \cdot \frac{\partial s(t)}{\partial u(t)} \cdot \frac{\partial u(t)}{\partial \omega_r(t)} \cdot \frac{\partial \omega_r(t)}{\partial \mu_{i,j_i}(t)} \cdot \frac{\partial \mu_{i,j_i}(t)}{\partial \theta_i(t)}}_{P_{a2}^{(1)}}    (34)

P_{a1}^{(1)} = e_a(t) \cdot \sum_{i=1}^{N_h} \left[ w_{c_i}^{(2)}(t) \cdot \frac{1}{2}\left(1 - p_i^2(t)\right) \cdot w_{c_{i,n+1}}^{(1)}(t) \right] \cdot \sum_{r=1}^{2^n} \left[ R_r \cdot \left( \prod_{t=1, t \neq i}^{n} \mu_{t,j_t}(t) \right) \cdot \frac{\partial \mu_{i,j_i}(t)}{\partial \theta_i(t)} \right]    (35)

P_{a2}^{(1)} = e_a(t) \cdot \sum_{i=1}^{N_h} \left[ w_{c_i}^{(2)}(t) \cdot \frac{1}{2}\left(1 - p_i^2(t)\right) \cdot w_{c_{i,n+2}}^{(1)}(t) \right] \cdot \sum_{i=1}^{N_h} \left[ w_{g_i}^{(2)}(t) \cdot \frac{1}{2}\left(1 - y_i^2(t)\right) \cdot w_{g_{i,n+1}}^{(1)}(t) \right] \cdot \sum_{r=1}^{2^n} \left[ R_r \cdot \left( \prod_{t=1, t \neq i}^{n} \mu_{t,j_t}(t) \right) \cdot \frac{\partial \mu_{i,j_i}(t)}{\partial \theta_i(t)} \right]    (36)

\frac{\partial \mu_{i,j_i}(t)}{\partial \theta_i(t)} = \begin{cases} -\frac{1}{2} \mathrm{sech}^2(\theta_i(t) \cdot x_i(t)) \cdot x_i(t), & j_i = N \\ \phantom{-}\frac{1}{2} \mathrm{sech}^2(\theta_i(t) \cdot x_i(t)) \cdot x_i(t), & j_i = P \end{cases}    (37)

where \eta_a(t) is the learning rate of the FHM. The setting of this parameter is similar to \eta_g(t) in the goal network and \eta_c(t) in the critic network, and will be discussed in the parameter setting section.

At last, the parameter tuning for the fuzzy logic controller follows the gradient descent rule:

R_r(t+1) = R_r(t) + \Delta R_r(t), \quad \theta_i(t+1) = \theta_i(t) + \Delta \theta_i(t)    (38)

D. Fuzzy-GrADP Learning Process and Parameter Setting

The utility function U_c(t) is set to zero to represent success in this paper. Once a system state x(t) is observed (we assume in this paper that the system/plant to be controlled is fully observable) and sent to the controller, the learning process occurs and a consequent control action is generated by the controller.

The flowchart of the simulation procedure is presented in Fig. 5. The dashed lines represent the back-propagation paths, and the order of the back-propagation corresponds to the numbers. During each sampling time step, after the feed-forward propagation, the goal network will first update its

Fig. 5. Flowchart of the Fuzzy-GrADP simulation procedure
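The derivative in Eq. (37) follows from d tanh(v)/dv = sech^2(v) applied to the membership functions of Eq. (26). A quick numerical check of the analytic form at an arbitrarily chosen point (the values 0.7 and 1.3 are not from the paper):

```python
import numpy as np

def mu(theta, x, sign):
    # Eq. (26): hyperbolic membership degrees for the N and P fuzzy sets
    t = np.tanh(theta * x)
    return 0.5 * (1.0 - t) if sign == 'N' else 0.5 * (1.0 + t)

def dmu_dtheta(theta, x, sign):
    # Eq. (37): analytic d(mu)/d(theta), with sech^2(v) = 1 / cosh^2(v)
    s = x / np.cosh(theta * x) ** 2
    return -0.5 * s if sign == 'N' else 0.5 * s

# Central-difference check at an arbitrary point theta = 0.7, x = 1.3
theta0, x0, h = 0.7, 1.3, 1e-6
checks = {
    sign: (mu(theta0 + h, x0, sign) - mu(theta0 - h, x0, sign)) / (2 * h)
    for sign in ('N', 'P')
}
```

The finite differences agree with Eq. (37) to within the discretization error, confirming the signs: for a positive input, increasing theta raises the P-membership and lowers the N-membership.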

TABLE I
GENERAL PARAMETERS USED IN THE FUZZY-GRADP CONTROLLER

Network       FHM         Goal        Critic
Inputs        Ki          Ki + Koa    Ki + Koa + 1
Outputs       Koa         1           1
Hidden^a      2^Ki        Khg         Khc
Activation^b  Hyperbolic  Sigmoid     Sigmoid

a. In the FHM, this row represents the number of fuzzy rules.
b. In the FHM, this row represents the membership function.

TABLE II
SPECIFIC PARAMETERS USED IN THE FUZZY-GRADP CONTROLLER

Parameters  ηa(0)   ηg(0)   ηc(0)   ηa(f)
Value       0.3     0.3     0.3     0.005
Parameters  ηg(f)   ηc(f)   Na      Ng
Value       0.005   0.005   100     50
Parameters  Nc      Ta      Tg      Tc
Value       80      0.005   0.05    0.05

weights until the stop criterion is satisfied, and s(t) is sent to the critic network. Then the critic network will update its weights until the stop criterion is satisfied, and J(t) is used by the FHM. Finally, the FHM will update its weights until the stop criterion is satisfied, and then the control action u(t) is sent to the system. The general parameters used in the Fuzzy-GrADP controller are shown in Table I, and the notations are defined as follows:

Ki: number of system states sent to the controller, corresponding to n in Section II;
Koa: number of FHM outputs, which depends on the number of units to be controlled;
Khg: number of goal network hidden neurons, which depends on Ki and the plant, corresponding to Nh in Section II;
Khc: number of critic network hidden neurons, which depends on Ki and the plant, and is usually kept the same as Khg.

The specific parameters used in the Fuzzy-GrADP controller are summarized in Table II, and the notations are defined as follows:

ηa(0): initial learning rate of the FHM;
ηg(0): initial learning rate of the goal network;
ηc(0): initial learning rate of the critic network;
ηa(k): learning rate of the FHM, which is decreased by 0.05 every 5 time steps until it reaches ηa(f) and stays there thereafter;
ηg(k): learning rate of the goal network, which is decreased by 0.05 every 5 time steps until it reaches ηg(f) and stays there thereafter;
ηc(k): learning rate of the critic network, which is decreased by 0.05 every 5 time steps until it reaches ηc(f) and stays there thereafter;
Na: internal cycle of the FHM;
Ng: internal cycle of the goal network;
Nc: internal cycle of the critic network;
Ta: internal training error threshold for the FHM;
Tg: internal training error threshold for the goal network;
Tc: internal training error threshold for the critic network.

Fig. 6. The schematic diagram of the cart-pole plant in Case I

III. CASE I: CART-POLE BALANCING PROBLEM

A. Cart-Pole System Model Description

The proposed Fuzzy-GrADP controller has been tested on a cart-pole balancing problem as shown in Fig. 6, which is the same as that in [21]. The ultimate goal here is to control the force applied on the cart, moving it either left or right, to keep the balance of the single pole mounted on the cart.

The system function of the model is described as follows:

\frac{\partial^2 \phi}{\partial t^2} = \frac{g \sin\phi + \cos\phi \left[ \frac{-F - m l \dot{\phi}^2 \sin\phi + \mu_c\,\mathrm{sgn}(\dot{x})}{m_c + m} \right] - \frac{\mu_p \dot{\phi}}{m l}}{l \left( \frac{4}{3} - \frac{m \cos^2\phi}{m_c + m} \right)}    (39)

\frac{\partial^2 x}{\partial t^2} = \frac{F + m l \left[ \dot{\phi}^2 \sin\phi - \ddot{\phi} \cos\phi \right] - \mu_c\,\mathrm{sgn}(\dot{x})}{m_c + m}    (40)

where x is the position of the cart, \phi is the angle of the pole, the gravitational acceleration g = 9.8 m/s^2, the mass of the cart m_c = 1.0 kg, the mass of the pole m = 0.1 kg, the half-pole length l = 0.5 m, the coefficient of friction of the cart \mu_c = 0.0005, and the coefficient of friction of the pole \mu_p = 0.000002. The force F applied to the cart is either 10 or -10 Newton, and the sgn function in (40) is defined as follows:

\mathrm{sgn}(x) = \begin{cases} -1, & x < 0 \\ -1 \vee 1, & x = 0 \\ \phantom{-}1, & x > 0 \end{cases}    (41)

and the state vector in this system model is as follows:

\left[ x \;\; \phi \;\; \dot{x} \;\; \dot{\phi} \right]    (42)

B. Results Analysis in Case I

Based on the definition of the state vector, the parameters described in Table I are set as Ki = 4, Koa = 1, Khg = 6, and Khc = 6. In our current study, the same criteria as those in [21] to evaluate the performance have been adopted: a run consists of a maximum of 1000 consecutive trials, and it is considered successful if the last trial of the run has lasted 6000 time steps. Otherwise, if the controller is unable to learn to balance the cart-pole within 1000 trials, then the
IEEE, NOV 2015 8

Fig. 7. Typical record of cost-to-go, control action, cart position, and pole angular signal on the cart-pole balancing problem

run is considered unsuccessful. Moreover, the pole is considered fallen when its angle is outside the range [−12◦, 12◦], or the cart is beyond the range [−2.4, 2.4] m. Note that the force F applied to the cart is a binary value (i.e., either 10 or −10 Newton), while the control action u(t) fed to the goal network and the critic network is a continuous value.

In order to provide a comprehensive, statistics-based performance comparison of our proposed approach with the original ADP in [16], the GrADP in [21], the FHM Fuzzy-ADP in [42], and the T-S Fuzzy-HDP in [50], we conduct 100 independent runs for this task, where the initial conditions of the plant are set the same as in [16]. Before each run, the weights in the neural networks are randomized in the range [−1, 1], the fuzzy control rules Rr(t) in the FHM are also initialized in the range [−1, 1], and the parameter of the membership function is calculated as θi(t) = (ϑ − MINtanh)·(MAXTanh − MINTanh)/(MAXtanh − MINtanh) + MINTanh, in which MAXtanh = 1, MINtanh = −1, MAXTanh = 10, MINTanh = 0.01, and ϑ is a random number in [−1, 1]. The simulation results for the required average number of trials to succeed over the 100 runs are summarized in Table III. For a fair comparison, we also added the same types of noise in our simulation. From this table one can see that the FHM Fuzzy-ADP and the T-S Fuzzy-HDP demonstrate similar control performance, while our approach provides quite robust performance with a lower required number of trials to succeed under noisy conditions. It can also be observed that the proposed method is insensitive to the noise type and size. This indicates that, by using the three-network-based architecture, the controller is more robust and can work effectively under large levels of noise, which is the more general case in reality.

Furthermore, the critic network is used to estimate the cost-to-go value J(t); thus we further analyze how the J(t) value and the control action u(t) behave in this case. Fig. 7 shows a snapshot of the convergence of the J(t) value during the learning process and of the control action u(t) during a typical successful run. The performance of the cart position and the pole angle signal are also presented in this figure. This figure clearly demonstrates that our proposed approach can effectively accomplish the control task in this case.

IV. CASE II: BALL-AND-BEAM BALANCING PROBLEM

A. Ball-And-Beam System Model Description

Fig. 8. The schematic diagram of the ball-and-beam system in Case II

The ball-and-beam system is shown in Fig. 8, which is the same as that in [43] [51] [52]. The system function and parameter setting are described in the following.

The motion equations from the Lagrange equation are as follows:

mg sin α = (m + Ib/r²)ẍ + (mr² + Ib)(1/r)α̈ − mxα̇²   (43)

ul cos α = [mx² + Ib + Iω]α̈ + (2mẋx + bl²)α̇ + Kl²α + (mr² + Ib)(1/r)ẍ − mgx cos α   (44)

where the mass of the ball m = 0.0162 kg, the roll radius of the ball r = 0.02 m, the inertia moment of the ball Ib = 4.32 ×

TABLE III
PERFORMANCE EVALUATION ON CASE I: CART-POLE, BASED ON REQUIRED AVERAGE NO. OF TRIALS TO SUCCEED

Noise type Fuzzy-GrADP FHM Fuzzy-ADP T-S Fuzzy-HDP GrADP ADP


Noise free 9.73 21.67 15.42 13.7 6
Uniform 5% a.1 10.93 21.37 19.27 12.6 8
Uniform 10% a. 11.22 24.65 27.89 14.4 14
Uniform 5% s.2 10.26 21.65 18.92 12.6 32
Uniform 10% s. 10.71 17.46 31.02 14.4 54
Gaussian σ 2 (0.1) s. 10.56 25.23 21.02 15.0 164
Gaussian σ 2 (0.2) s. 11.14 29.87 38.72 21.3 193
1: Actuators are subject to the noise.
2: Sensors are subject to the noise.

10⁻⁵ kg·m², the friction coefficient of the drive mechanics b = 1 Ns/m, the radius of force application l = 0.48 m, the radius of the beam lω = 0.5 m, the stiffness of the drive mechanics K = 0.001 N/m, the gravity g = 9.8 N/kg, the inertia moment of the beam Iω = 0.14025 kg·m², and u is the force of the drive mechanics.

In order to simplify the system model function, we re-define x1 = x to represent the position of the ball, x2 = ẋ the velocity of the ball, x3 = α the angle of the beam with respect to the horizontal axis, and x4 = α̇ the angular velocity of the beam. In this way, the system functions in (43) and (44) can be transformed into the following form:

(m + Ib/r²)ẋ2 + (mr² + Ib)(1/r)ẋ4 = mx1x4² + mg sin x3   (45)

(mr² + Ib)(1/r)ẋ2 + [mx1² + Ib + Iω]ẋ4 = (ul + mgx1) cos x3 − (2mx2x1 + bl²)x4 − Kl²x3   (46)

We then re-write (45) and (46) in matrix notation as follows:

[A B; C D] · [ẋ2; ẋ4] = [P; Q]   (47)

where the elements are as follows:

[A B; C D] = [m + Ib/r²,  (mr² + Ib)(1/r);  (mr² + Ib)(1/r),  mx1² + Ib + Iω]   (48)

[P; Q] = [mx1x4² + mg sin x3;  (ul + mgx1) cos x3 − (2mx1x2 + bl²)x4 − Kl²x3]   (49)

and the general form of this problem is obtained as follows:

[ẋ2; ẋ4] = [A B; C D]⁻¹ · [P; Q]   (50)

The other two terms in the state vector can be expressed as ẋ1 = x2 and ẋ3 = x4; thus the state vector is as follows:

[x1  x2  x3  x4]   (51)

B. Results Analysis In Case II

Since the number of the state vector is the same as that in Case I, the parameter setting described in Table I for Case II remains unchanged. The objective of the task is to keep balancing the ball on the beam for a certain period of time. Specifically, each run consists of a maximum of 1000 trials, and it is considered successful if the last trial of the run has lasted 10000 time steps. Otherwise, if the controller is unable to learn to balance the ball-and-beam within 1000 trials, then the run is considered unsuccessful. The range of the beam is [−0.48, 0.48] m, and the range of the angle of the beam with respect to the horizontal axis is [−0.24, 0.24] rad. In this case, different from the "bang-bang" control in Case I, a continuous force is applied to the driver directly.

We compare the proposed algorithm with the ADP structure presented in [16], the GrADP in [21], the hierarchical GrADP with three goal networks in [43], and the T-S Fuzzy-HDP in [50]. The results for the required average number of trials to succeed and the success rate over 100 individual runs are shown in Table IV. For a fair comparison, we add the same initial conditions and types of noise according to [43] in our simulation. Specifically, the ball position x1 and the angle of the beam x3 are uniformly distributed in the ranges [−0.2, 0.2] m and [−0.15, 0.15] rad, respectively, and the ball velocity x2 and the angular velocity x4 are set to zero. The initialization of the neural networks and the fuzzy logic controllers is the same as in Case I. From the results in Table IV we can observe that the proposed approach provides the best performance under uniform or Gaussian noise. In particular, the proposed algorithm is insensitive to the noise intensity and type, which is consistent with the observation in Case I, namely, that it is effective and robust under noisy conditions.

V. CASE III: MULTIMACHINE POWER SYSTEM CONTROL STUDY

A. Benchmark Power System Description

To demonstrate the feasibility of the proposed Fuzzy-GrADP approach in real applications, a case study is undertaken based on the New England 10-machine 39-bus system. The power system configuration is shown in Fig. 9. This test system consists of 10 generators, 39 buses, and 46 transmission lines. Similar to reference [53], each generator is

TABLE IV
PERFORMANCE EVALUATION ON CASE II: BALL-AND-BEAM, BASED ON REQUIRED AVERAGE NO. OF TRIALS TO SUCCEED AND SUCCESS RATE

Noise type Fuzzy-GrADP hierarchical GrADP T-S Fuzzy-HDP GrADP ADP
Trials Rate1 Trials Rate Trials Rate Trials Rate Trials Rate
Noise free 12.06 100% 13.5 100% 19.7 100% 21.9 100% 42.1 98%
Uniform 5% a.2 13.02 98% 17.6 99% 27.4 98% 21.3 98% 53.2 98%
Uniform 5% x.3 15.78 100% 16.2 100% 24.3 99% 23.8 100% 71.8 98%
Gaussian σ 2 (0.1) a. 15.89 100% 23.2 100% 37.3 99% 29.7 100% 79.3 98%
Gaussian σ 2 (0.2) a. 15.71 99% 31.3 98% 43.5 97% 32.4 98% 121.3 97%
1: The successful rate of all the test runs.
2: Actuators are subject to the noise.
3: Position sensor are subject to the noise.

modelled as a fourth-order model and equipped with an excitation system, except for generator G10, which is an equivalent infinite bus. The transmission system is modelled as a passive circuit, and the loads are modelled as constant impedances. The mechanical power of each generator is assumed to be constant during the fault simulation. As has been indicated in [54], this benchmark system is a typical interconnected system with poorly damped inter-area oscillation modes. We can see from Fig. 9 that this power system is divided into two separate subsystems by the transmission lines 15 ∼ 16 and 16 ∼ 17. Low-frequency oscillation has been observed on the transmission lines when a system fault occurs. A ±200 Mvar SVC is installed at bus 16 to support the system voltage and therefore increase the system damping. This power system has been widely used as a benchmark in the power and energy society (PES) [55] [56] [57], and it is also employed in this paper to test the effectiveness and efficiency of the proposed approach.

Fig. 9. The schematic diagram of the New England 10-machine 39-bus system in Case III. The SVC is placed at Bus 16 to provide fine reactive power to damp system oscillation

Fig. 10. The schematic diagram of the SVC controller

The proposed SVC supplementary controller is shown in Fig. 10. In this figure, the wide-area control signals (WACS) are collected by the wide-area measurement system (WAMS) and used by the proposed Fuzzy-GrADP controller to generate a supplementary control signal for the original SVC controller. Vref is the pre-set reference voltage for the SVC and VSVC is the measured voltage. The other parameters are set as: Bmax = 2 p.u., Bmin = −2 p.u., K = 20, and T = 0.05 s. The detailed damping controller design will be introduced in the following sections, including the WACS selection and the reinforcement signal setting.

B. Detailed Fuzzy-GrADP Design for Damping Control

The benchmark power system is linearized around a nominal operating point [58]. Then modal analysis is carried out based on this linear model; only the inter-area modes are selected and shown in the first three columns of Table V. It can be observed that the damping ratios of all four inter-area modes are less than 0.1. Mode I has the smallest oscillation frequency at 0.61 Hz, while the others have relatively higher frequencies. Thus, mode I is the critical inter-area mode, which should be provided with a supplementary control signal to increase its damping. The observability analysis is carried out for mode I, and the results are shown in the last two columns of Table V. We can see that the transmitted active power on line 3 ∼ 18 has the largest observability value of 0.096, followed by the transmitted active power on line 17 ∼ 18 with an observability value of 0.094. Thus, these two signals are selected as wide-area control signals (WACS). Since the transmission lines 15 ∼ 16 and 16 ∼ 17 are the tie-lines between the two areas, the active power signals on these lines

TABLE V
INTER-AREA MODES AND THE OBSERVABILITY SIGNALS CORRESPONDING TO MODE I

Mode Number Damping ratio Frequency (Hz) Signal Mode I Observability


I 0.052 0.61 P3−18 0.096
II 0.039 0.93 P17−18 0.094
III 0.043 1.04 P5−8 0.092
IV 0.045 1.14 P8−9 0.091

should also be included in the WACS.

WACS = [ΔP3−18  ΔP17−18  ΔP15−16  ΔP16−17]   (52)

Q = diag[1 1 1 1]
r(t) = −0.25 · WACS · Q · WACSᵀ   (53)

Based on the aforementioned analysis, the finalized WACS is illustrated in equation (52), where the ΔP terms are the active power deviations on the corresponding transmission lines [59] [60]. Based on the selected WACS, the reinforcement signal of the intelligent controller is designed in equation (53).

Fig. 11. The simulation result of the first trial (normalized value): the active power on transmission line 3 to 18 and the output of the controller (p.u.)
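To make the mapping from line-power deviations to the learning signal concrete, the quadratic form in (53) can be evaluated directly. The sketch below uses illustrative deviation values, not numbers taken from the simulation:

```python
# Wide-area control signals from (52): active power deviations (p.u.) on
# lines 3-18, 17-18, 15-16, and 16-17 (illustrative values, not the paper's data).
wacs = [0.04, -0.03, 0.05, -0.02]

# Q from (53) is the 4x4 identity, i.e., every deviation is weighted equally.
Q = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# r(t) = -0.25 * WACS * Q * WACS^T: a non-positive quadratic penalty that is
# zero only when all line-power deviations vanish.
r = -0.25 * sum(wacs[i] * Q[i][j] * wacs[j]
                for i in range(4) for j in range(4))
print(r)
```

Because Q is positive semi-definite, r(t) ≤ 0 always holds, so the reinforcement signal penalizes any inter-area power swing and is maximized (at zero) only at the steady state.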

C. Results Analysis In Case III


Simulation studies are carried out based on detailed nonlin-
ear benchmark power system model to verify the effectiveness
of the designed Fuzzy-GrADP controller. The proposed control
algorithm is also compared with GrADP in reference [21] [22],
Fuzzy-ADP in [42], and the original PI controller without
supplementary control. The sampling time of the controller
is 20ms, which is large enough for the controller to finish the
adaptation in each time step. The supplementary control signal
generated by the proposed intelligent controller is limited
between −0.1 p.u. and 0.1 p.u.

As demonstrated in equation (52), the number of the state vector is 4; therefore Ki in Table I is set as 4. The SVC is the only unit to be controlled, thus Koa is set as 1. In this case, we set Khg = 12 and Khc = 12 to address the more changeable and unpredictable power system operating conditions. The weights in the two neural networks and the parameters in the FHM are randomly initialized before the training, where the initialization strategy is the same as in Case I and Case II. It is well known that in neural networks the initial weights contribute significantly to the performance of the controller [60] [61]. The trained weights and parameters in the first trial should be saved and carried on for the next trial, regardless of what the simulation result is.

A three-phase ground fault occurs at the end terminal of line 3 ∼ 4 near bus 3 at t = 0.5 s, followed by tripping the faulty transmission line at t = 0.6 s and reclosing it again at t = 1.1 s. With the original PI control, the system needs almost 12 s to damp the inter-area oscillation after this disturbance. Then the proposed Fuzzy-GrADP controller is activated in the benchmark power system to provide a supplementary control signal to the SVC, and the simulation result of the first trial is shown in Fig. 11. Because of the random initial weights and parameters, we can observe that the proposed intelligent controller does not generate a proper control signal during the early stage of the simulation (0.5 ∼ 4 s) in the first trial. After about 6 s, the proposed intelligent controller learned to damp the line active power swing by adapting the weights in the neural networks and the parameters in the FHM.

Fig. 12. The active power on line 3 to 18 with compared control

Fig. 13. The active power on line 17 to 18 with compared control

The weights in the first trial are carried on as the initial weights for the second trial. The results of the second trial are

shown in Fig. 12 and Fig. 13. Specifically, Fig. 12 and Fig. 13 show the transmitted active power on lines 3 ∼ 18 and 17 ∼ 18 with the original PI control, GrADP control, Fuzzy-ADP control, and the Fuzzy-GrADP control, respectively. From the simulation results we can see that, with Fuzzy-GrADP control, the system becomes stable after about 5 seconds. Meanwhile, the proposed Fuzzy-GrADP approach has better control performance than the other methods.

To better assess the control performance during the transient process with different methods, a quantitative performance index based on the integral of the time multiplied by the absolute error (ITAE) [62] [63] has been adopted as follows:

JITAE = ∫₀^Tsim Σᵢ₌₁ⁿ |δi − δr| · t · dt   (54)

where δi is the rotor angle of the ith generator, δr is the rotor angle of the reference generator (i.e., G10 in this case), n is the number of all the generators, and Tsim is the total simulation time. As indicated in [63], a smaller JITAE indicates less deviation of synchronization among all the generators and a shorter time for the system to reach steady state. Since the rotor angle oscillations of all the generators have been considered, JITAE is a system-level performance index representing overall stability and dynamic performance. In this paper, this index is used as a supplement and conclusion to the time-domain simulation for a better view of comparison.

Table VI shows the comparison of JITAE under the same fault with different control methods. It can be observed that the proposed method achieves the smallest JITAE value, which means the whole system has less oscillation under this fault condition. Moreover, based on the JITAE value of the original PI control, the percentage of damping improvement of each method is also calculated. The proposed method improves the system damping by 48.38%. We can thus conclude that the Fuzzy-GrADP controller has the best control performance in increasing the system damping.

VI. FUZZY-GRADP STABILITY ANALYSIS

Fuzzy modeling and fuzzy-based networks have been introduced into the adaptive control area for many years and have demonstrated their performance with stability analysis from different perspectives [28] [32] [33] [34], as well as successful control performance in power system applications [35] [36]. Based on the advantages of fuzzy modeling, researchers have also introduced such fuzzy mappings into adaptive/approximate dynamic programming for adaptive online learning control. For instance, in [40] [41] [42] [50], the authors introduced the fuzzy neural network model into the ADP design and demonstrated better statistical performance on the balancing benchmarks. In [49], the convergence analysis of the value function and control policy was provided for FHM-based adaptive dynamic programming. Meanwhile, the three-network/goal representation ADP was proposed to introduce an additional neural network mapping compared with the existing ADP designs mentioned above. This additional neural network can provide an adaptive (internal) reinforcement signal for the critic network, so as to help the value function approximation and control policy seeking over time [21]. In addition, the stability of the GrADP controller is analyzed in [44] [64], where such a controller is shown, via the Lyapunov method, to be stable under certain constraints on the key parameters. We are also working on the convergence analysis of the value function and the (internal) reinforcement signal with certain monotonic properties based on our previously published results [65].

In this paper, we proposed a fuzzy-based GrADP structure, in which we use the fuzzy hyperbolic model to seek the control policy. The goal network in the Fuzzy-GrADP still provides the adaptive (internal) reinforcement signal for the critic network, which in turn evaluates the performance of the action network, i.e., the FHM. The objective of the proposed Fuzzy-GrADP is to keep the control system stable while also minimizing the total cost function over time. There are two possible directions to address the stability and convergence of the proposed Fuzzy-GrADP method. On one hand, we can define a Lyapunov function for the proposed design and show that the first difference of the Lyapunov function is negative definite [66]. Under the conditions derived, we can conclude that the proposed Fuzzy-GrADP method is (asymptotically) stable. On the other hand, we plan to address the convergence of the value function and the (internal) reinforcement signal as in our previous related works [65]: we will first analyze the monotonic properties of both signals and then find their upper/lower bounds. We are currently working on both possible directions to handle the theoretical analysis of the proposed Fuzzy-GrADP method. In this paper, the former method is adopted to show the stability analysis of our proposed structure. Similar to the method in [66], here we use R(t) to represent the fuzzy control rules before the output layer in the FHM, use ωc(t), ωg(t) to represent the hidden-to-output layer weights ωc⁽²⁾(t), ωg⁽²⁾(t), and define the outputs of the hidden layers as φc(t) = p(t) and φg(t) = y(t) in the critic and goal networks, respectively.

Define the Lyapunov function candidate as follows:

V(t) = V1(t) + V2(t) + V3(t) + V4(t)   (55)

where

V1(t) = (1/ηc) tr{ω̃cᵀ(t)ω̃c(t)},  ω̃c(t) = ωc(t) − ωc*   (56)

V2(t) = (1/(γ²ηg)) tr{ω̃gᵀ(t)ω̃g(t)},  ω̃g(t) = ωg(t) − ωg*   (57)

V3(t) = (1/(γ³ηa)) tr{R̃ᵀ(t)R̃(t)},  R̃(t) = R(t) − R*   (58)

V4(t) = (1/2)‖ξc(t − 1)‖²,  ξc(t) = ω̃cᵀ(t)φc(t)   (59)

Hence, the first difference of the Lyapunov function candidate is:

ΔV(t) = ΔV1(t) + ΔV2(t) + ΔV3(t) + ΔV4(t)   (60)

TABLE VI
PERFORMANCE EVALUATION ON CASE III: MULTIMACHINE POWER SYSTEMS, BASED ON JITAE

System Performance Fuzzy-GrADP Fuzzy-ADP GrADP Original PI


JITAE 2.1218 × 10⁴ 2.6832 × 10⁴ 2.5974 × 10⁴ 4.1107 × 10⁴
Damping Improvement 48.38% 34.73% 36.81% −
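The JITAE values compared above come from evaluating (54) on the simulated rotor-angle trajectories. A minimal discrete-time sketch follows; the rectangle-rule integration and the toy exponentially damped trajectories are assumptions for illustration, not the paper's simulation data:

```python
import math

def itae(rotor_angles, ref_angles, dt):
    """Discrete approximation of (54): sum over generators and sampled time
    of |delta_i - delta_r| * t, integrated with a rectangle rule of step dt."""
    total = 0.0
    for deltas in rotor_angles:            # one trajectory per generator
        for k, (d, dr) in enumerate(zip(deltas, ref_angles)):
            t = k * dt
            total += abs(d - dr) * t * dt
    return total

# Toy example: two generators swinging against a fixed reference (G10).
dt = 0.02                                  # 20 ms sampling, as in Case III
ts = [k * dt for k in range(500)]          # 10 s horizon
ref = [0.0 for _ in ts]
gen1 = [0.1 * math.exp(-0.5 * t) * math.cos(2 * math.pi * 0.61 * t) for t in ts]
gen2 = [0.05 * math.exp(-0.3 * t) * math.cos(2 * math.pi * 0.93 * t) for t in ts]

print(itae([gen1, gen2], ref, dt))         # smaller value = better damping
```

Since the index is linear in |δi − δr|, scaling all angle deviations by a common factor scales JITAE by the same factor, so the relative rankings of the controllers are insensitive to a uniform rescaling of the oscillations.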

With the updating rules of ωc(t) given in (20), we obtain:

ω̃c(t + 1) = ω̃c(t) − αηcφc(t)[αωcᵀ(t)φc(t) + s(t) − ωcᵀ(t − 1)φc(t − 1)]
= [I − α²ηcφc(t)φcᵀ(t)]ω̃c(t) − αηcφc(t)[αωc*ᵀφc(t) + s(t) − ωcᵀ(t − 1)φc(t − 1)]ᵀ
= A(t)ω̃c(t) − αηcφc(t)Bᵀ(t)   (61)

where A(t) = I − α²ηcφc(t)φcᵀ(t) and B(t) = αωc*ᵀφc(t) + s(t) − ωcᵀ(t − 1)φc(t − 1). Hence, considering the first term in (60), we have:

ΔV1(t) = (1/ηc) tr{ω̃cᵀ(t + 1)ω̃c(t + 1) − ω̃cᵀ(t)ω̃c(t)}
= (1/ηc) tr{ω̃cᵀ(t)AᵀAω̃c(t) − ω̃cᵀ(t)ω̃c(t) − 2αηcBφcᵀ(t)Aω̃c(t) + α²ηc²Bφcᵀ(t)φc(t)Bᵀ}   (62)

and since ξc(t) = ω̃cᵀ(t)φc(t), (62) becomes:

ΔV1(t) = (1/ηc) tr{−α²ηc‖ξc(t)‖² − α²ηc‖ξc(t)‖²(1 − α²ηc‖φc(t)‖²) − 2αηcBφcᵀ(t)[I − α²ηcφc(t)φcᵀ(t)]ω̃c(t) + α²ηc²Bφcᵀ(t)φc(t)Bᵀ}
= tr{−α²‖ξc(t)‖² − α²(1 − α²ηc‖φc(t)‖²)‖ξc(t) + α⁻¹B‖² + α²(1 − α²ηc‖φc(t)‖²)‖α⁻¹B‖² + α²ηc‖B‖²‖φc(t)‖²}
= −α²‖ξc(t)‖² − α²(1 − α²ηc‖φc(t)‖²)‖ξc(t) + α⁻¹B‖² + ‖B‖²   (63)

and based on the Cauchy-Schwarz inequality, we have:

ΔV1(t) ≤ −α²‖ξc(t)‖² − α²(1 − α²ηc‖φc(t)‖²)‖ξc(t) + ωc*ᵀφc(t) + α⁻¹s(t) − α⁻¹ωcᵀ(t − 1)φc(t − 1)‖² + 2‖αωc*ᵀφc(t) + s(t) − ωcᵀ(t − 1)φc(t − 1) − (1/2)ωc*ᵀφc(t − 1)‖² + (1/2)‖ξc(t − 1)‖²   (64)

For the second term,

ΔV2(t) = (1/(γ²ηg)) tr{ω̃gᵀ(t + 1)ω̃g(t + 1) − ω̃gᵀ(t)ω̃g(t)}   (65)

where

ω̃g(t + 1) = ω̃g(t) − αηg(1/2)(1 − s²(t))φg(t)(αs(t) + r(t) − s(t − 1))
= ω̃g(t) − αηgC(t)φg(t)D(t)   (66)

with C(t) = (1/2)(1 − s²(t)) and D(t) = αs(t) + r(t) − s(t − 1). Then, letting ξg(t) = ω̃g(t)φg(t), (65) becomes:

ΔV2(t) = (1/(γ²ηg)) tr{−2αηgC(t)Dᵀ(t)φgᵀ(t)ω̃g(t) + α²ηg²C(t)²‖D(t)‖²‖φg(t)‖²}
= (1/γ²) tr{‖C(t)Dᵀ(t) − αξg(t)‖² − C(t)²‖D(t)‖² − α²‖ξg(t)‖² + α²ηgC(t)²‖D(t)‖²‖φg(t)‖²}
= (1/γ²)(−(1 − α²ηg‖φg(t)‖²)C(t)²‖D(t)‖² − α²‖ξg(t)‖² + ‖C(t)Dᵀ(t) − αξg(t)‖²)   (67)

and based on the Cauchy-Schwarz inequality, we have:

ΔV2(t) ≤ (1/γ²)(−(1 − α²ηg‖φg(t)‖²)C(t)²‖D(t)‖² + α²‖ξg(t)‖² + 2‖C(t)Dᵀ(t)‖²)   (68)

For the third term, given the following updating rule:

R̃(t + 1) = R̃(t) − ηaω(t)[ωcᵀ(t)φc(t)]ᵀ(ωcᵀ(t)E(t) + [ωcᵀ(t)F(t)][ωgᵀ(t)G(t)])
= R̃(t) − ηaω(t)[ωcᵀ(t)φc(t)]ᵀωcᵀ(t)(E(t) + F(t)[ωgᵀ(t)G(t)])
= R̃(t) − ηaω(t)[ωcᵀ(t)φc(t)]ᵀ[ωcᵀ(t)H(t)]   (69)

where E(t) = (1/2)(1 − φc²(t))ωc,n+1⁽¹⁾(t), F(t) = (1/2)(1 − φc²(t))ωc,n+2⁽¹⁾(t), G(t) = (1/2)(1 − φg²(t))ωg,n+1⁽¹⁾(t), and H(t) = E(t) + F(t)[ωgᵀ(t)G(t)]. Set ξa(t) = R̃(t)ω(t); then we have:

ΔV3 = (1/(γ³ηa)) tr{R̃ᵀ(t + 1)R̃(t + 1) − R̃ᵀ(t)R̃(t)}
= (1/(γ³ηa)) tr{−2ηaR̃(t)ω(t)[ωcᵀφc(t)]ᵀ[ωcᵀ(t)H(t)] + ηa²‖ω(t)‖²‖ωcᵀ(t)φc(t)‖²‖ωcᵀ(t)H(t)‖²}
= (1/γ³)(−(1 − ηa‖ω(t)‖²)‖ωcᵀ(t)φc(t)‖²‖ωcᵀ(t)H(t)‖² − ‖ξa(t)‖² + ‖[ωcᵀ(t)φc(t)]ᵀ[ωcᵀ(t)H(t)] − ξa(t)‖²)   (70)
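The bounding steps used to pass from (63), (67), and (70) to the inequalities (64), (68), and (71) rely on the elementary consequence of the Cauchy-Schwarz inequality ‖a + b‖² ≤ 2‖a‖² + 2‖b‖². A quick numerical sanity check of this inequality (an illustration only, not part of the proof):

```python
import random

def sq_norm(v):
    """Squared Euclidean norm of a vector given as a list of floats."""
    return sum(x * x for x in v)

random.seed(0)
for _ in range(1000):
    a = [random.uniform(-1, 1) for _ in range(4)]
    b = [random.uniform(-1, 1) for _ in range(4)]
    lhs = sq_norm([x + y for x, y in zip(a, b)])
    rhs = 2 * sq_norm(a) + 2 * sq_norm(b)
    # ||a+b||^2 = ||a||^2 + 2 a.b + ||b||^2 <= 2||a||^2 + 2||b||^2,
    # since 2 a.b <= ||a||^2 + ||b||^2 by Cauchy-Schwarz.
    assert lhs <= rhs + 1e-12
print("inequality verified on 1000 random vector pairs")
```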

and according to the Cauchy-Schwarz inequality, (70) becomes:

ΔV3 ≤ (1/γ³)(−(1 − ηa‖ω(t)‖²)‖ωcᵀ(t)φc(t)‖²‖ωcᵀ(t)H(t)‖² + 2‖ωcᵀ(t)φc(t)‖²‖ωcᵀH(t)‖² + ‖ξa(t)‖²)   (71)

For the fourth term,

ΔV4(t) = (1/2)(‖ξc(t)‖² − ‖ξc(t − 1)‖²)   (72)

Substituting (64), (68), (71), and (72) into (60), we obtain the first difference of the Lyapunov function candidate as follows:

ΔV(t) ≤ −(α² − 1/2)‖ξc(t)‖² − α²(1 − α²ηc‖φc(t)‖²)‖ξc(t) + ωc*ᵀφc(t) + α⁻¹s(t) − α⁻¹ωcᵀ(t − 1)φc(t − 1)‖² − (1/γ²)(1 − α²ηg‖φg(t)‖²)C(t)²‖αs(t) + r(t) − s(t − 1)‖² − (1/γ³)(1 − ηa‖ω(t)‖²)‖ωcᵀ(t)φc(t)‖²‖ωcᵀ(t)H(t)‖² + 2‖αωc*ᵀφc(t) + s(t) − ωcᵀ(t − 1)φc(t − 1) − (1/2)ωc*ᵀφc(t − 1)‖² + (α²/γ²)‖ξg(t)‖² + (2/γ²)C(t)²‖αs(t) + r(t) − s(t − 1)‖² + (2/γ³)‖ωcᵀ(t)φc(t)‖²‖ωcᵀH(t)‖²   (73)

Set the following constraints:

√2/2 < α < 1,  α²ηc‖φc(t)‖² < 1,  α²ηg‖φg(t)‖² < 1,  ηa‖ω(t)‖² < 1   (74)

and define:

‖P‖² = 2‖αωc*ᵀφc(t) + s(t) − ωcᵀ(t − 1)φc(t − 1) − (1/2)ωc*ᵀφc(t − 1)‖² + (α²/γ²)‖ξg(t)‖² + (2/γ²)C(t)²‖αs(t) + r(t) − s(t − 1)‖² + (2/γ³)‖ωcᵀ(t)φc(t)‖²‖ωcᵀH(t)‖²   (75)

Applying the Cauchy-Schwarz inequality, we obtain:

‖P‖² ≤ 8(α²‖ωc*ᵀφc(t)‖² + ‖s(t)‖² + ‖ωcᵀ(t − 1)φc(t − 1)‖² + (1/4)‖ωc*ᵀφc(t − 1)‖²) + (8/γ²)C(t)²(α²‖s(t)‖² + ‖s(t − 1)‖² + (1/2)‖r(t)‖²) + (2α²/γ²)(‖ωgᵀφg(t)‖² + ‖ωgᵀφg(t)‖²) + (2/γ³)‖ωcᵀ(t)φc(t)‖²‖ωcᵀ(t)H(t)‖²   (76)

therefore, we can further obtain that:

‖P‖² ≤ (8α² + 10)ω²cmφ²cm + 8s²m + (8/γ²)C²m((α² + 1)s²m + (1/2)r²m) + (4α²/γ²)ω²gmφ²gm + (2/γ³)ω⁴cmφ²cmH²m = P²m   (77)

where ωcm, ωgm, φcm, φgm, Cm, Hm, sm, and rm are the upper bounds of ωc, ωg, φc, φg, C(t), H(t), s(t), and r(t), respectively.

Hence, if condition (74) holds, then for any:

‖ξc(t)‖² > (2/(2α² − 1))P²m   (78)

the first difference of the Lyapunov function candidate satisfies ΔV ≤ 0. According to the standard Lyapunov extension theorem [67], this demonstrates that the errors between the optimal weights ωc*, ωg*, R* and their estimates ωc, ωg, R are uniformly ultimately bounded (UUB), which further implies that the proposed Fuzzy-GrADP is stable.

VII. CONCLUSIONS AND DISCUSSIONS

In this paper, a novel FHM-based GrADP algorithm (Fuzzy-GrADP) was proposed for nonlinear control problems. The parameters in the membership functions and the fuzzy rules were updated through a learning mechanism and were thus able to provide an online sequential control policy. Simulation results on three case studies, i.e., a cart-pole balancing problem, a ball-and-beam balancing problem, and a multimachine power system damping control problem, demonstrated that the proposed control algorithm is effective and robust both in small balancing problems and in large power system damping applications. Furthermore, a detailed Lyapunov stability analysis was carried out in this paper to demonstrate the theoretical convergence guarantee of the proposed approach.

The adjustment of the parameters in the FHM, goal network, and critic network is based on back-propagation, which is time-consuming. In real power system applications, the sampling time should be long enough to guarantee that the Fuzzy-GrADP controller has adapted the parameters in the three networks. In our simulation, an Intel(R) Core(TM) i7-4770 CPU at 3.4 GHz with the Matlab/Simulink R2013a environment is used. The iteration numbers in each sampling time step in the FHM, goal network, and critic network are set as Na = 100, Ng = 50, and Nc = 80 (see Table II), respectively. In Case I, the average time to fully adapt the parameters in the three networks in each sampling time step is 0.72 ms. In Case II, the average time to fully adapt the parameters in the three parts in each sampling time step is 0.83 ms. In the power system damping control case, the average time to fully adapt the parameters in the three networks in each sampling time step is 5.8 ms. Therefore, in real power system applications, the sampling time for the controller could be chosen as 20 ms (50 Hz).

As indicated in Case III, the Fuzzy-GrADP requires wide-area control signals (WACS), such as generator speed and transmission line voltage measurements. In modern power

system, this hurdle has been addressed by the large-scale installation of wide-area measurement systems (WAMS). The remote generator or bus signals will be measured by sensors with a global time tag and sent to the control center, such as the energy management system (EMS). Even if not all the generator or bus information is available, state estimation techniques will help the controller obtain an accurate and real-time system state. The Fuzzy-GrADP is used to damp out the most critical inter-area modes, such as the low-frequency oscillation modes in large interconnected power systems. However, some of the unstable local-area modes may also need damping control. The proposed Fuzzy-GrADP controller can be coordinated with traditional power system stabilizers (PSS) to improve the system dynamic stability over a wide range of operating conditions.

REFERENCES

[1] C.-H. Wang, H.-L. Liu, and T.-C. Lin, “Direct adaptive fuzzy-neural control with state observer and supervisory controller for unknown nonlinear dynamical systems,” Fuzzy Systems, IEEE Transactions on, vol. 10, no. 1, pp. 39–49, 2002.
[2] Y. Yang and C. Zhou, “Adaptive fuzzy H-∞ stabilization for strict-feedback canonical nonlinear systems via backstepping and small-gain approach,” Fuzzy Systems, IEEE Transactions on, vol. 13, no. 1, pp. 104–114, 2005.
[3] S. J. Yoo and J. B. Park, “Neural-network-based decentralized adaptive control for a class of large-scale nonlinear systems with unknown time-varying delays,” Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 39, no. 5, pp. 1316–1323, 2009.
[4] M. Wang, S. S. Ge, and K.-S. Hong, “Approximation-based adaptive tracking control of pure-feedback nonlinear systems with multiple unknown time-varying delays,” Neural Networks, IEEE Transactions on, vol. 21, no. 11, pp. 1804–1816, 2010.
[5] W.-Y. Wang, Y.-H. Chien, Y.-G. Leu, and T.-T. Lee, “Adaptive T-S fuzzy-neural modeling and control for general MIMO unknown nonaffine nonlinear systems using projection update laws,” Automatica, vol. 46, no. 5, pp. 852–863, 2010.
[6] Q. Gao, X.-J. Zeng, G. Feng, Y. Wang, and J. Qiu, “T-S-fuzzy-model-based approximation and controller design for general nonlinear systems,” Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 42, no. 4, pp. 1143–1154, 2012.
[7] Y.-J. Liu, W. Wang, S.-C. Tong, and Y.-S. Liu, “Robust adaptive tracking control for nonlinear systems based on bounds of fuzzy approximation parameters,” Systems, Man and Cybernetics, Part A: Systems and
[17] P. J. Werbos, “Intelligence in the brain: A theory of how it works and how to build it,” Neural Networks, vol. 22, no. 3, pp. 200–212, 2009.
[18] D. Liu, D. Wang, D. Zhao, Q. Wei, and N. Jin, “Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming,” Automation Science and Engineering, IEEE Transactions on, vol. 9, no. 3, pp. 628–634, 2012.
[19] D. Wang, D. Liu, Q. Wei, D. Zhao, and N. Jin, “Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming,” Automatica, vol. 48, no. 8, pp. 1825–1832, 2012.
[20] J. Fu, H. He, and X. Zhou, “Adaptive learning and control for MIMO system based on adaptive dynamic programming,” Neural Networks, IEEE Transactions on, vol. 22, no. 7, pp. 1133–1148, 2011.
[21] H. He, Z. Ni, and J. Fu, “A three-network architecture for on-line learning and optimization based on adaptive dynamic programming,” Neurocomputing, vol. 78, no. 1, pp. 3–13, 2012.
[22] Z. Ni, H. He, J. Wen, and X. Xu, “Goal representation heuristic dynamic programming on maze navigation,” Neural Networks and Learning Systems, IEEE Transactions on, vol. 24, no. 12, pp. 2038–2050, Dec 2013.
[23] X. Sui, Y. Tang, H. He, and J. Wen, “Energy-storage-based low-frequency oscillation damping control using particle swarm optimization and heuristic dynamic programming,” Power Systems, IEEE Transactions on, vol. 29, no. 5, pp. 2539–2548, Sept 2014.
[24] Y. Tang, H. He, J. Wen, and J. Liu, “Power system stability control for a wind farm based on adaptive dynamic programming,” Smart Grid, IEEE Transactions on, vol. 6, no. 1, pp. 166–177, Jan 2015.
[25] Y. Tang, J. Yang, J. Yan, and H. He, “Intelligent load frequency controller using GrADP for island smart grid with electric vehicles and renewable resources,” Neurocomputing, vol. 170, pp. 406–416, 2015.
[26] C.-C. Lee, “Fuzzy logic in control systems: fuzzy logic controller. I and II,” Systems, Man and Cybernetics, IEEE Transactions on, vol. 20, no. 2, pp. 404–435, 1990.
[27] S. Mohagheghi, G. Venayagamoorthy, and R. Harley, “Fully evolvable optimal neurofuzzy controller using adaptive critic designs,” Fuzzy Systems, IEEE Transactions on, vol. 16, no. 6, pp. 1450–1461, Dec 2008.
[28] L.-X. Wang, “Stable adaptive fuzzy control of nonlinear systems,” Fuzzy Systems, IEEE Transactions on, vol. 1, no. 2, pp. 146–155, 1993.
[29] H. O. Wang, K. Tanaka, and M. F. Griffin, “An approach to fuzzy control of nonlinear systems: stability and design issues,” Fuzzy Systems, IEEE Transactions on, vol. 4, no. 1, pp. 14–23, 1996.
[30] P. P. Angelov and D. P. Filev, “An approach to online identification of Takagi-Sugeno fuzzy models,” Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 34, no. 1, pp. 484–498, 2004.
[31] C.-T. Lin and C. G. Lee, “Reinforcement structure/parameter learning for neural-network-based fuzzy logic control systems,” Fuzzy Systems, IEEE Transactions on, vol. 2, no. 1, pp. 46–63, 1994.
[32] J.-S. Jang and C.-T. Sun, “Neuro-fuzzy modeling and control,” Proceedings of the IEEE, vol. 83, no. 3, pp. 378–406, 1995.
Humans, IEEE Transactions on, vol. 40, no. 1, pp. 170–184, 2010. [33] J.-S. Jang, “ANFIS: adaptive-network-based fuzzy inference system,”
[8] C. Chen, Y.-J. Liu, and G.-X. Wen, “Fuzzy neural network-based Systems, Man and Cybernetics, IEEE Transactions on, vol. 23, no. 3,
adaptive control for a class of uncertain nonlinear stochastic systems,” pp. 665–685, 1993.
Cybernetics, IEEE Transactions on, vol. 44, no. 5, pp. 583–593, May [34] S.-J. Lee and C.-S. Ouyang, “A neuro-fuzzy system modeling with
2014. self-constructing rule generationand hybrid SVD-based learning,” Fuzzy
[9] H. Han, X.-L. Wu, and J.-F. Qiao, “Nonlinear systems modeling based Systems, IEEE Transactions on, vol. 11, no. 3, pp. 341–353, June 2003.
on self-organizing fuzzy-neural-network with adaptive computation al- [35] F.-J. Lin, C.-H. Lin, and P.-H. Shen, “Self-constructing fuzzy neural
gorithm,” Cybernetics, IEEE Transactions on, vol. 44, no. 4, pp. 554– network speed controller for permanent-magnet synchronous motor
564, April 2014. drive,” Fuzzy Systems, IEEE Transactions on, vol. 9, no. 5, pp. 751–
[10] J. Si, A. G. Barto, W. B. Powell, and D. C. Wunsch, Handbook of 759, 2001.
learning and approximate dynamic programming. New York, USA: [36] D. Fang, Y. Xiaodong, T. S. Chung, and K. Wong, “Adaptive fuzzy-logic
IEEE, 2004. SVC damping controller using strategy of oscillation energy descent,”
[11] R. E. Bellman and S. E. Dreyfus, Applied dynamic programming. Power Systems, IEEE Transactions on, vol. 19, no. 3, pp. 1414–1421,
Princeton, NJ: Princeton Univ. Press, 1966. Aug 2004.
[12] W. B. Powell, Approximate Dynamic Programming: Solving the curses [37] Z. Yun, Z. Quan, S. Caixin, L. Shaolan, L. Yuming, and S. Yang, “RBF
of dimensionality. USA: John Wiley & Sons, 2007. neural network and ANFIS-based short-term load forecasting approach
[13] H. He and E. Garcia, “Learning from imbalanced data,” Knowledge and in real-time price environment,” Power Systems, IEEE Transactions on,
Data Engineering, IEEE Transactions on, vol. 21, no. 9, pp. 1263–1284, vol. 23, no. 3, pp. 853–858, Aug 2008.
Sept 2009. [38] T. Shannon and G. Lendaris, “Adaptive critic based approximate dy-
[14] P. J. Werbos, “Backpropagation through time: what it does and how to namic programming for tuning fuzzy controllers,” in Fuzzy Systems,
do it,” Proceedings of the IEEE, vol. 78, no. 10, pp. 1550–1560, 1990. 2000. FUZZ IEEE 2000. The Ninth IEEE International Conference on,
[15] D. V. Prokhorov, D. C. Wunsch et al., “Adaptive critic designs,” Neural May 2000.
Networks, IEEE Transactions on, vol. 8, no. 5, pp. 997–1007, 1997. [39] S. Mohagheghi, G. K. Venayagamoorthy, and R. G. Harley, “Adaptive
[16] J. Si and Y.-T. Wang, “Online learning control by association and critic design based neuro-fuzzy controller for a static compensator in
reinforcement,” Neural Networks, IEEE Transactions on, vol. 12, no. 2, a multimachine power system,” Power Systems, IEEE Transactions on,
pp. 264 –276, Mar. 2001. vol. 21, no. 4, pp. 1744–1754, 2006.
IEEE, NOV 2015 16
[40] T. Li, D. Zhao, and J. Yi, "Adaptive dynamic neuro-fuzzy system for traffic signal control," in Neural Networks, 2008. IJCNN 2008 (IEEE World Congress on Computational Intelligence). IEEE International Joint Conference on. IEEE, 2008, pp. 1840–1846.
[41] D. Zhao, Y. Zhu, and H. He, "Neural and fuzzy dynamic programming for under-actuated systems," in Neural Networks (IJCNN), The 2012 International Joint Conference on. IEEE, 2012.
[42] Y. Zhu, D. Zhao, and H. He, "Integration of fuzzy controller with adaptive dynamic programming," in Intelligent Control and Automation (WCICA), 2012 10th World Congress on. IEEE, 2012, pp. 310–315.
[43] Z. Ni, H. He, D. Zhao, and D. V. Prokhorov, "Reinforcement learning control based on multi-goal representation using hierarchical heuristic dynamic programming," in Neural Networks (IJCNN), The 2012 International Joint Conference on. IEEE, 2012.
[44] Z. Ni, H. He, and J. Wen, "Adaptive learning in tracking control based on the dual critic network design," Neural Networks and Learning Systems, IEEE Transactions on, vol. 24, no. 6, pp. 913–928, June 2013.
[45] H. Zhang and Y. Quan, "Modeling, identification, and control of a class of nonlinear systems," Fuzzy Systems, IEEE Transactions on, vol. 9, no. 2, pp. 349–354, 2001.
[46] S. Lun, Z. Guo, and H. Zhang, "Fuzzy hyperbolic neural network model and its application in H-∞ filter design," in Advances in Neural Networks-ISNN 2008. Springer, 2008, pp. 222–230.
[47] H. Zhang and D. Liu, Fuzzy Modeling and Fuzzy Control. Boston, MA: Birkhauser, 2006.
[48] G. Wang, H. Zhang, B. Chen, and S. Tong, "Fuzzy hyperbolic neural network with time-varying delays," Fuzzy Sets and Systems, vol. 161, no. 19, pp. 2533–2551, 2010.
[49] J. Zhang, H. Zhang, Y. Luo, and H. Liang, "Nearly optimal control scheme using adaptive dynamic programming based on generalized fuzzy hyperbolic model," Acta Automatica Sinica, vol. 39, no. 2, pp. 142–148, 2013.
[50] Y. Zhu, D. Zhao, and D. Liu, "Convergence analysis and application of fuzzy-HDP for nonlinear discrete-time HJB systems," Neurocomputing, vol. 149, pp. 124–131, 2015.
[51] P. H. Eaton, D. V. Prokhorov, and D. C. Wunsch, "Neurocontroller alternatives for 'fuzzy' ball-and-beam systems with nonuniform nonlinear friction," Neural Networks, IEEE Transactions on, vol. 11, no. 2, pp. 423–435, 2000.
[52] T.-L. Chien, C.-C. Chen, Y.-C. Huang, and W.-J. Lin, "Stability and almost disturbance decoupling analysis of nonlinear system subject to feedback linearization and feedforward neural network controller," Neural Networks, IEEE Transactions on, vol. 19, no. 7, pp. 1220–1230, 2008.
[53] W. Yao, L. Jiang, J. Wen, Q. Wu, and S. Cheng, "Wide-area damping controller of FACTS devices for inter-area oscillations considering communication time delays," Power Systems, IEEE Transactions on, vol. 29, no. 1, pp. 318–329, 2014.
[54] M. Pai, Energy Function Analysis for Power System Stability. Norwell, MA: Kluwer, 1989.
[55] M. Aboul-Ela, A. Sallam, J. McCalley, and A. Fouad, "Damping controller design for power system oscillations using global signals," Power Systems, IEEE Transactions on, vol. 11, no. 2, pp. 767–773, May 1996.
[56] Y. Zhang and A. Bose, "Design of wide-area damping controllers for interarea oscillations," Power Systems, IEEE Transactions on, vol. 23, no. 3, pp. 1136–1143, 2008.
[57] W. Yao, L. Jiang, J. Wen, Q. Wu, and S. Cheng, "Wide-area damping controller for power system interarea oscillations: A networked predictive control approach," Control Systems Technology, IEEE Transactions on, vol. 23, no. 1, pp. 27–36, Jan 2015.
[58] P. Kundur, Power System Stability and Control. New York, USA: McGraw-Hill, 1994.
[59] Y. Tang, H. He, and J. Wen, "Comparative study between HDP and PSS on DFIG damping control," in Computational Intelligence Applications In Smart Grid (CIASG), 2013 IEEE Symposium on, 2013, pp. 59–65.
[60] Y. Tang, H. He, Z. Ni, J. Wen, and X. Sui, "Reactive power control of grid-connected wind farm based on adaptive dynamic programming," Neurocomputing, vol. 125, no. 1, pp. 125–133, 2014.
[61] Y. Tang, H. He, and J. Wen, "Adaptive control for an HVDC transmission link with FACTS and a wind farm," in Proc. IEEE Innovative Smart Grid Technologies Conference (ISGT'13), Feb 2013.
[62] A. Bartoszewicz and A. Nowacka-Leverton, "ITAE optimal sliding modes for third-order systems with input signal and state constraints," Automatic Control, IEEE Transactions on, vol. 55, no. 8, pp. 1928–1932, Aug 2010.
[63] J. Fang, W. Yao, Z. Chen, J. Wen, and S. Cheng, "Design of anti-windup compensator for energy storage-based damping controller to enhance power system stability," Power Systems, IEEE Transactions on, vol. 29, no. 3, pp. 1175–1185, May 2014.
[64] Z. Ni, X. Fang, H. He, D. Zhao, and X. Xu, "Real-time tracking control on adaptive critic design with uniformly ultimately bounded condition," in IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL'13), IEEE Symposium Series on Computational Intelligence (SSCI), Apr. 2013.
[65] X. Zhong, H. He, H. Zhang, and Z. Wang, "Optimal control for unknown discrete-time nonlinear Markov jump systems using adaptive dynamic programming," Neural Networks and Learning Systems, IEEE Transactions on, vol. 25, no. 12, pp. 2141–2155, Dec 2014.
[66] F. Liu, J. Sun, J. Si, W. Guo, and S. Mei, "A boundedness result for the direct heuristic dynamic programming," Neural Networks, vol. 32, pp. 229–235, 2012.
[67] J.-J. E. Slotine, W. Li et al., Applied Nonlinear Control. Englewood Cliffs, NJ: Prentice-Hall International Inc, 1991.

Yufei Tang (S'13) received the B.Eng. and M.Eng. degrees in electrical engineering from Hohai University, Nanjing, China, in 2008 and 2011, respectively. He is currently working toward the Ph.D. degree at the Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI, USA.
His research interests include power system stability, control, and optimization, renewable energy systems, smart grid security, and computational intelligence for smart grids.

Haibo He (SM'11) received the B.S. and M.S. degrees in electrical engineering from Huazhong University of Science and Technology (HUST), Wuhan, China, in 1999 and 2002, respectively, and the Ph.D. degree in electrical engineering from Ohio University, Athens, in 2006. From 2006 to 2009, he was an assistant professor in the Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, New Jersey. He is currently the Robert Haas Endowed Chair Professor in Electrical Engineering at the University of Rhode Island, Kingston, Rhode Island.
He has authored one sole-author research book (Wiley), edited one book (Wiley-IEEE) and six conference proceedings (Springer), and authored or co-authored over 180 peer-reviewed journal and conference papers. His current research interests include adaptive dynamic programming, computational intelligence, machine learning, data mining, and various applications, such as smart grid, cognitive radio networks, human-robot interaction, and sensor networks.
Prof. He received the IEEE International Conference on Communications Best Paper Award in 2014, the IEEE Computational Intelligence Society (CIS) Outstanding Early Career Award in 2014, the K. C. Wong Research Award from the Chinese Academy of Sciences in 2012, the National Science Foundation CAREER Award in 2011, the Providence Business News Rising Star Innovator Award in 2011, and the Best Master Thesis Award of Hubei Province, China, in 2002. His research results have been covered by national and international media, such as The Wall Street Journal, Yahoo!, and Providence Business News, among others. He has delivered numerous keynote and invited talks at various conferences and organizations. He was the General Chair of the IEEE Symposium Series on Computational Intelligence (IEEE SSCI) in 2014. He is an Associate Editor of the IEEE Transactions on Neural Networks and Learning Systems, the IEEE Computational Intelligence Magazine, and the IEEE Transactions on Smart Grid.
Zhen Ni (M'15) received the B.S. degree from the Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan, China, in 2010, and the Ph.D. degree from the Department of Electrical, Computer and Biomedical Engineering, University of Rhode Island (URI), Kingston, RI, USA, in 2015.
He is currently an Assistant Professor with the Department of Electrical Engineering and Computer Science, South Dakota State University, Brookings, SD, USA. His current research interests include computational intelligence, smart grid, machine learning, and cyber-physical systems. Prof. Ni received the Chinese Government Award for Outstanding Students Abroad in 2014. He has been actively involved in numerous conference and workshop organization committees in the society, including as the Local Arrangement Chair of the IEEE Computational Intelligence Society (CIS) Workshop at URI in 2014 and the General Co-Chair of the IEEE CIS Winter School in Washington, DC, USA, in 2016.

Xin Xu (SM'12) received the B.S. degree in electrical engineering from the Department of Automatic Control, National University of Defense Technology (NUDT), Changsha, China, in 1996, and the Ph.D. degree in control science and engineering from the College of Mechatronics and Automation, NUDT, in 2002. He is currently a Full Professor with the College of Mechatronics and Automation, NUDT, China. He has co-authored more than 100 papers in international journals and conferences, and co-authored four books. His research interests include reinforcement learning, approximate dynamic programming, machine learning, robotics, and autonomous vehicles.
Dr. Xu was a recipient of the second-class National Natural Science Award of China in 2012. He is an Associate Editor of Information Sciences and Intelligent Automation and Soft Computing, and a Guest Editor of the International Journal of Adaptive Control and Signal Processing. He is a Committee Member of the IEEE TC on Approximate Dynamic Programming and Reinforcement Learning (ADPRL) and the IEEE TC on Robot Learning.
Xiangnan Zhong received the B.S. and M.S. degrees in automation, and control theory and control engineering from Northeastern University, Shenyang, China, in 2010 and 2012, respectively. She is currently pursuing the Ph.D. degree with the Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI, USA.
Her current research interests include adaptive dynamic programming, reinforcement learning, machine learning, and optimal control.
Dongbin Zhao (SM'10) received the B.S., M.S., and Ph.D. degrees from Harbin Institute of Technology, Harbin, China, in Aug. 1994, Aug. 1996, and Apr. 2000, respectively. Dr. Zhao was a postdoctoral fellow with Tsinghua University, Beijing, China, from May 2000 to Jan. 2002. He was an associate professor from 2002, and has been a professor since 2012 at the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, China. He has published one book, edited two books, and published over forty international journal papers. His current research interests lie in the areas of computational intelligence, adaptive dynamic programming, robotics, intelligent transportation systems, and process simulation.
Dr. Zhao is an Associate Editor of the IEEE Transactions on Neural Networks and Learning Systems (2012-), IEEE Computational Intelligence Magazine (2014-), and Cognitive Computation (2011-). He serves as the Newsletter Editor of the IEEE Computational Intelligence Society (CIS) (2013-) and the Vice Chair of the Neural Network Technical Committee of IEEE CIS (2013-). He was the Program Chair of the 4th International Conference on Intelligent Control and Information Processing (Beijing, China, 2013), and an organizer of flagship conferences of IEEE CIS (WCCI 2014, SSCI 2014). He has served as a guest editor for several international journals.
He received the Second Award for Scientific Progress of National Defense from the Commission of Science, Technology and Industry for National Defense of China (1999), the First Award for Scientific Progress of Chinese Universities, Ministry of Education of China (2001), the Third Award for Scientific and Technology Progress from the China Petroleum and Chemical Industry Association (2009), and the First Award for Scientific and Technology Progress from the China Petroleum and Chemical Industry Association (2010, 2012). He has been a senior member of IEEE since 2010.