
This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS: SYSTEMS 1

High-Precision Quick Control in Multivariable Time-Varying Nonlinear System: A Biological Decision Model Predictive Control Algorithm

Jinying Yang, Yongjun Zhang, Qiang Guo, Xiong Xiao, Tanju Yildirim, and Fei Zhang, Member, IEEE

Abstract—To solve the problem of unsatisfactory control and poor real-time performance of nonlinear time-varying multi-input systems, this article proposes an intelligent model predictive control (MPC) algorithm inspired by heuristic dynamic programming (HDP), biological control theory, and operations research. Considering that the internal feedback information from a neural network (NN) is limited, a multilevel feedback NN is proposed. Combining an NN with a biofeedback mechanism increases the internal feedback information and improves the convergence accuracy of the NN. The multilevel feedback network is used in three internal networks of the intelligent MPC algorithm. In order to improve the convergence speed of the proposed algorithm, a biologically inspired central coordination module and an operations research theory inspired priority factor module are incorporated within the HDP algorithm. The prediction accuracy and control speed of the algorithm for nonlinear time-varying systems are greatly improved without affecting the control accuracy. The stability and convergence of the intelligent MPC algorithm are demonstrated on test data. Finally, the effectiveness and superiority of the proposed MPC algorithm are verified by comparison against several traditional algorithms.

Index Terms—Biological regulatory mechanism, heuristic dynamic programming (HDP), model predictive control (MPC), neural network (NN), operational decision.

Manuscript received 25 March 2024; accepted 8 August 2024. This work was supported in part by the National Natural Science Foundation of China under Grant U21A20483. This article was recommended by Associate Editor C. Platania. (Corresponding author: Yongjun Zhang.)
Jinying Yang, Yongjun Zhang, Xiong Xiao, and Fei Zhang are with the Institute of Engineering Technology, University of Science and Technology Beijing, Beijing 100083, China (e-mail: [email protected]; zhangyj@ustb.edu.cn; [email protected]; [email protected]).
Qiang Guo is with the National Engineering Research Center for Advanced Rolling and Intelligent Manufacturing, University of Science and Technology Beijing, Beijing 100083, China (e-mail: [email protected]).
Tanju Yildirim is with the Faculty of Science and Engineering, Southern Cross University, East Lismore, NSW 2480, Australia (e-mail: tanju.yildirim@scu.edu.au).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TSMC.2024.3449332.
Digital Object Identifier 10.1109/TSMC.2024.3449332
2168-2216 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: National Institute for Materials Science. Downloaded on September 10,2024 at 05:15:41 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

With increasingly complex nonlinear and time-varying objects in process control, it is becoming increasingly difficult to achieve accurate and fast control of systems. Owing to their robustness in practical applications, predictive control algorithms have been widely used in various fields, such as maglev systems [1] and multiagent systems [2], [3]. Model predictive control (MPC) algorithms can be divided into three categories: 1) predictive control based on adaptive control theory, such as generalized predictive control [4], [5], extended time-domain adaptive control [6], and pole assignment control [7]; 2) nonparametric model prediction control, such as dynamic matrix predictive control based on a step response model [8], and model control based on an impulse response [9]; and 3) predictive control based on structural design, such as inference control [10], [11] and internal model control (IMC) [12], [13]. Traditional predictive control algorithms can effectively control linear or weakly nonlinear systems via a predictive model, rolling optimization, and feedback correction. With the development and progress of science and technology, the control objective in processes is becoming increasingly complex, and traditional MPC algorithms encounter difficulties in accurately predicting multivariable nonlinear and time-varying uncertain systems. Therefore, the control performance of traditional MPC algorithms deteriorates when processing complex objectives and does not satisfy the requirements of various control indexes.

To address control challenges in traditional MPC, Ju and Haihua [14] designed an MPC algorithm that effectively dealt with network time delay, which improved the performance of the controller and enhanced the system's stability. However, the algorithm only applies to time-invariant systems. Thomas and Hansson [15] designed an MPC incorporating an integral effect load observer. A fast response speed, small current ripple and electromagnetic force were predicted; however, the algorithm was not used in an actual system. Zhang et al. [16] proposed a multivariable sequential MPC method for an inductor–capacitor–inductor (LCL)-type three-level grid-connected inverter (GCI), and verified the effectiveness of the proposed algorithm by applying the method to the power grid current control of an LCL three-level GCI system. The prediction model is derived from the mathematical model of the discrete domain; therefore, difficulties are encountered when faced with complex nonlinear control objectives. Masuda and Uchiyama [17] proposed an extended MPC (EMPC) algorithm which applied matching conditions to improve robustness. Numerical simulations demonstrated that EMPC inherits the performance characteristics of traditional MPC with greater robustness. However, the application object of EMPC is based on a linear time-invariant model, and its control performance for time-varying nonlinear systems remains to be verified. Islam [18] and Yang et al. [19] applied MPC to control the target water level in a watershed. Results indicated that the control performance of the algorithm is stable. However, it is difficult to measure all the physical parameters of the basin and update the model in real time.
To cope with time-varying systems, adaptive control algorithms offer many advantages. Yang et al. [20] proposed an intelligent adaptive control algorithm and applied it to the flow control of multitributary rivers. The algorithm can control the river flow in a fast and stable manner, which demonstrates the superiority of the control algorithm. Wang et al. [21] proposed a fuzzy adaptive fixed-time quantized feedback controller for strict-feedback nonlinear systems with quantized inputs. The controller can make the tracking error converge to a small neighborhood of the origin in a fixed time and ensure that all signals in the closed-loop system are bounded. Xu et al. [22] addressed the problem of adaptive neural network (NN) time tracking control for multiple-input–multiple-output systems with dynamic uncertainty. A hyperbolic tangent function and a piecewise function are introduced to overcome multiple singularities, and the output of the controlled system can track the reference signal.

To improve the fast control ability of an MPC algorithm for time-varying nonlinear complex systems, an MPC algorithm based on heuristic dynamic programming (HDP), biological systems, and operational decision is proposed. The proposed algorithm combines the adaptive learning ability of a biologically inspired central coordination module and an operations research theory inspired priority factor module, which are incorporated within the HDP algorithm. Through theoretical analysis, it is proved that the proposed MPC exhibits stable iterative approximation and accurate convergence. Finally, a time-varying nonlinear function is selected as the test object for comparison experiments to verify the validity and superiority of the MPC algorithm proposed in this article.

The primary contributions of this article can be summarized in three aspects as follows.
1) This article proposes a new intelligent model prediction control algorithm, which is called biological-decision HDP (Bio-Dec-HDP). In Bio-Dec-HDP, the human biological network and the operational decision are incorporated into the HDP algorithm. By improving the structure of the algorithm, the convergence speed of the algorithm for nonlinear time-varying systems is improved.
2) A new multilevel feedback network developed in this article combines a biological regulation mechanism with an NN to increase the internal information exchange and improve the tracking and prediction ability for nonlinear time-varying objects.
3) A detailed analysis of the feasibility and stability of Bio-Dec-HDP is conducted to guarantee system stability and convergence.

The remainder of this article is organized as follows. Section II provides a detailed background. Section III describes the overall structure of the Bio-Dec-HDP algorithm and two new improved modules. Section IV describes the design of the multilevel feedback NN and its application in Bio-Dec-HDP. Section V analyzes the feasibility and stability of Bio-Dec-HDP. Section VI uses a case study to demonstrate the validity of Bio-Dec-HDP. Section VII presents some simulation results and discussions about the case study. Section VIII ends with concluding remarks.

II. BACKGROUND

A. Heuristic Dynamic Programming Algorithm

HDP [23], [24], [25] is a near-optimal method in the field of optimal control. In essence, it is a nonlinear programming method, the core of which is the Bellman optimality principle. Bellman's optimality principle simplifies the problem-solving process by reducing a multistep decision problem to multiple single-step decision problems and performing backward recursion from the end to the beginning. HDP integrates the ideas of reinforcement learning, dynamic programming, function approximation and adaptive control, which can effectively solve the optimal control problem of nonlinear systems. It has been applied to trajectory tracking and formation control of multiagent systems [26], virtual guide automatic berthing control of multiagent systems in marine ships [27], and the constrained near-optimal control problem for nonlinear discrete-time systems [28]. As shown in Fig. 1, HDP includes three parts, which are the model network, action network and critic network [29]. The model network is a model used to approximate the real system. It inputs the system's state x(n) and control strategy u(n), and outputs the system's state x(n + 1) at the next sampling time. The action network obtains the system's control variable through calculation. It inputs x(n), and outputs u(n). The critic network is an evaluator used to estimate the system's performance and judge whether that performance satisfies the control requirements. It inputs x(n) and x(n + 1), and outputs the performance index J(x(n)) and J(x(n + 1)).

Fig. 1. Structure diagram of HDP.

B. Biological Regulation Systems

The three principal control systems of biological networks are the nervous system, endocrine system, and immune system [30], [31], which compose the nervous–endocrine–immune system (NEIS). The nervous system is a system in the human body that plays a leading role in the regulation of physiological functions and is mainly composed of nerve tissues [32], [33]. The endocrine system is a large body fluid regulation system comprised of all endocrine glands and hormones, which is closely related to the nervous system [34],


YANG et al.: HIGH-PRECISION QUICK CONTROL IN MULTIVARIABLE TIME-VARYING NONLINEAR SYSTEM 3

[35], [36], [37]. The immune system is an important system that has the functions of immune surveillance, defense, and regulation. It is composed of organs, cells, and active substances [38], [39]. The three systems form a human regulatory network in which they coordinate and control each other, similar to the function of the three networks in HDP theory. In addition, the central nervous system in organisms can also assist in regulating NEIS, as shown in Fig. 2. All systems interact and coordinate with each other to form a closely connected complex network, which jointly maintains the stability of the body's internal environment.

Fig. 2. Structure diagram of the NEIS.

Hormone regulation is an important mechanism in biological systems, which includes a traditional feedback mechanism and an ultrashort feedback mechanism. The ultrashort feedback mechanism produces a particular feedback inhibition effect on the secretion activity of the glands through the concentration of gland hormones. Compared with a traditional feedback loop, the regulation is faster and can achieve rapid regulation. The structure of the endocrine ultrashort feedback regulation circuit is shown in Fig. 3.

Fig. 3. Endocrine feedback regulation circuit.

The regulation law of hormone secretion is monotonic and non-negative, and the rise and fall trend of hormone secretion follows the rule of Hill's function [40]. Fig. 4 is the curve of the regulation law of hormone secretion, where horizontal and vertical coordinates are dimensionless variables.

Fig. 4. Hormone trend curves: (a) rise of hormone secretion and (b) fall of hormone secretion.

If hormone x is regulated by hormone y, the relationship between the secretion rate $S_x$ of hormone x and the concentration $C_y$ of hormone y is

$$S_x = a \cdot F(C_y) + S_{x0} \tag{1}$$

where $S_{x0}$ is the basic secretion rate of x, and a is a constant coefficient.

C. Operations Research and Priority Factor Theory

In the objective planning of operations research, when the objectives differ in importance, the priority factor $P_n$, n = 1, 2, . . . , N, can be used to express importance [41], [42], [43]. In this case, it is necessary to ensure that the target corresponding to the higher priority factor has been satisfied before considering the target with a lower priority factor. In addition, when considering the target with a lower priority factor, the generation of the new decision needs to ensure that the target corresponding to the higher priority factor is not destroyed. The selection of priority factors should follow the principle that the high-priority factor is far greater than the low-priority factor

$$P_1 \gg P_2 \gg \cdots \gg P_n \tag{2}$$

that is, the target related to $P_n$ has absolute priority over the target related to $P_{n+1}$.

In the decision-making process, each target is assigned a corresponding priority factor $P_n$, n = 1, 2, . . . , N according to its importance. As the application scenario changes, the same goal may be given different priority factors. Simultaneously, different goals may also be assigned the same priority factor according to the degree of importance.

The HDP algorithm, biological regulatory mechanism, and operations research theory have good self-learning, self-adaptation, robustness, and fault tolerance. Compared with the artificial experience control algorithm with greater uncertainty, they have the advantages of higher accuracy and faster control. The MPC control system in this article is based on the HDP algorithm. On this basis, biological regulation laws and operations research are adopted to improve HDP's structure, resulting in the introduction of a central coordination module and priority factor module. Moreover, inspired by the ultrashort feedback mechanism of the endocrine system, which can quickly adjust hormone secretion before the ordinary feedback mechanism acts, a biofeedback mechanism is integrated within the HDP NN, yielding a multilevel feedback network with greater feedback dimensionality. The proposed network structure is used in the model module, action module, and critic module of the Bio-Dec-HDP algorithm.

III. BIO-DEC-HDP ALGORITHM

A. Algorithm Overall Structure Design

Since nonlinear systems have multiple control variables, the priority factor module based on operations research is

introduced into the HDP algorithm structure to optimize the predicted value and control variable of the system. The priority factors of the model module, action module, and critic module are adjusted according to the biologically inspired regulation mechanism. In addition, the central coordination module is introduced into the HDP structure inspired by the biological regulation system, which trains and corrects the NN in real time according to the deviation between the predicted value calculated by Bio-Dec-HDP and the actual value. Fig. 5 shows the structure of the Bio-Dec-HDP algorithm.

Fig. 5. Bio-Dec-HDP algorithm.

In Fig. 5, the dotted box shows the Bio-Dec-HDP controller, which includes five components: 1) model module; 2) critic module; 3) action module; 4) priority factor module; and 5) central coordination module. The model module is the prediction model of the controller. It inputs the current state x(n) and outputs the predicted state x(n + 1) of the next time step. The action module and priority factor module work together to form the rolling optimization module of MPC. The action module inputs x(n), and under the optimization effect of the priority factor, outputs the control variable u(n). The critic module acts as feedback correction. Based on the current state x(n) and the next moment state x(n + 1) predicted by the model module, the control variable obtained by the action module is evaluated by the critic network, and the performance index J is used to judge whether the control meets the requirements. In the process of control, the central coordination module regulates the controller parameters, which makes the control performance of the algorithm more rapid, stable, and accurate.

B. Priority Factor Module

According to operational decision theory [41], [42], [43], priority factors $\alpha_{m1}$, $\alpha_{m2}$, $\alpha_{a1}$, and $\alpha_{a2}$ are added to the model module and action module. In the actual control process, the optimal value of each priority factor changes dynamically with the change of system state. The priority factor module adjusts the priority factors of the model module and action module, respectively, according to the influence of each variable on the control performance. The adjustment of priority factors first calculates the average of the model module and action module inputs according to

$$EX = \frac{\sum_{i=1}^{k} x(i)}{k} \tag{3}$$

where x(i) represents the input of the module adjusted by the priority factor.

The variance is obtained as

$$DX = \frac{\sum_{i=1}^{k} \left(x(i) - EX\right)^2}{k}. \tag{4}$$

The covariance of each input and each control object is given by

$$\mathrm{Cov}(X, Z) = E\left[(X - EX)(Z - EZ)\right] = E(XZ) - EX \cdot EZ \tag{5}$$

where Z represents the controlled objects.

According to (5), the correlation coefficient is

$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{DX} \cdot \sqrt{DY}}. \tag{6}$$

The priority factor is regulated in real time according to (7), which is inspired by the ultrashort feedback rule:

$$f\left(\rho_{X_i Y}\right) = \begin{cases} f_{up}\left(\rho_{X_i Y}\right), & 0 \le \rho_{X_i Y} \le 1 \\ f_{down}\left(\rho_{X_i Y}\right), & -1 \le \rho_{X_i Y} < 0 \end{cases} \tag{7}$$

where

$$f_{up}\left(\rho_{X_i Y}\right) = \mathrm{sign}\left(\rho_{X_i Y}\right) \cdot \frac{\rho_{X_i Y}}{\upsilon_1 + \rho_{X_i Y}} \tag{8}$$

$$f_{down}\left(\rho_{X_i Y}\right) = \mathrm{sign}\left(\rho_{X_i Y}\right) \cdot \frac{\upsilon_2}{\upsilon_2 + \rho_{X_i Y}}. \tag{9}$$

Here $\upsilon_1$ in (8) is the constant for $\rho_{X_i Y} \ge 0$, and $\upsilon_2$ in (9) is the constant for $\rho_{X_i Y} < 0$.

The priority factors $\alpha_{m1}$, $\alpha_{m2}$, $\alpha_{a1}$, and $\alpha_{a2}$ are adjusted in real time according to (3)–(9).

C. Central Coordination Module

According to HDP theory, the learning rates of the model module $l_{mp}$, critic module $l_{cp}$, and action module $l_{ap}$ are all fixed values selected by experience. But in an actual control process, the learning rates $l_{mp}$, $l_{cp}$, and $l_{ap}$ have individual optimal values as the system changes with time. To optimize the control performance, the central coordination module regulates $l_{mp}$, $l_{cp}$, and $l_{ap}$ according to the state of the system. Its regulation rules refer to the regulation mechanism of biological hormone secretion.

$l_{mp}$ can be represented by

$$l_{mp} = a_m \cdot f(\alpha) + \beta_m \tag{10}$$

where $\alpha$ is the deviation between the controlled variable's actual value and the target value, $\beta_m$ is the model module's basic learning rate, and $a_m$ is the model module learning rate's constant coefficient. $f(\alpha)$ can be written as

$$f(\alpha) = \begin{cases} f_{up}(\alpha), & \alpha \ge 0 \\ f_{down}(\alpha), & \alpha < 0 \end{cases} \tag{11}$$

so that

$$f_{up}(\alpha) = \mathrm{sign}(\alpha) \cdot \frac{|\alpha|}{\delta_1 + |\alpha|} \tag{12}$$

$$f_{down}(\alpha) = \mathrm{sign}(\alpha) \cdot \frac{1}{\delta_2 + |\alpha|} \tag{13}$$

where $\delta_1$ in (12) is constant when $\alpha \ge 0$, and $\delta_2$ in (13) is constant when $\alpha < 0$.

$l_{cp}$ can be represented by

$$l_{cp} = a_c \cdot f(\alpha) + \beta_c. \tag{14}$$
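The learning-rate rule of (10)–(14) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `hill_switch` and `learning_rate`, and all numeric values for the coefficient a, the basic rate β, and the constants δ1 and δ2, are hypothetical. The branches (12) and (13) are coded as they read in this text, sign(0) is taken as 0, and δ1, δ2 are assumed positive so that the denominators never vanish.

```python
import math


def hill_switch(alpha: float, delta_up: float, delta_down: float) -> float:
    """Piecewise f(alpha) of (11), with branches (12) and (13) as printed."""
    if alpha >= 0:
        sign = 1.0 if alpha > 0 else 0.0  # sign(0) is taken as 0 (assumption)
        return sign * abs(alpha) / (delta_up + abs(alpha))  # (12)
    # alpha < 0, so sign(alpha) = -1
    return -1.0 / (delta_down + abs(alpha))  # (13)


def learning_rate(alpha: float, gain: float, base: float,
                  delta_up: float = 1.0, delta_down: float = 1.0) -> float:
    """l = a * f(alpha) + beta, the common form of (10) and (14)."""
    return gain * hill_switch(alpha, delta_up, delta_down) + base


if __name__ == "__main__":
    # Hypothetical deviations between actual and target value.
    for alpha in (-1.0, 0.0, 1.0):
        print(alpha, learning_rate(alpha, gain=0.2, base=0.3))
```

With these hypothetical settings a zero deviation returns the basic learning rate unchanged, while a nonzero deviation shifts it by the coefficient a times f(α). The same function can serve the model, critic, and action modules by passing each module's own coefficient and basic rate.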

In (14), $\beta_c$ is the critic module's basic learning rate, and $a_c$ is the critic module learning rate's constant coefficient.

$l_{ap}$ can be written as follows:

$$l_{ap} = a_a \cdot f(\alpha) + \beta_a. \tag{15}$$

In (15), $\beta_a$ is the action module's basic learning rate, and $a_a$ is the action module learning rate's constant coefficient.

IV. MULTILEVEL FEEDBACK NEURAL NETWORK BASED ON BIOLOGICAL MECHANISM

A. Multilevel Feedback Network Design

A traditional NN regulates internal parameters by the deviation between the target value and the network output. Due to less feedback information, the convergence speed and accuracy for nonlinear time-varying objects are not satisfactory. In biology, the ultrashort feedback regulation mechanism rapidly regulates glandular hormone secretion. Through the feedback information of the gland itself, the secretion of hormones can be adjusted before the regular feedback mechanism acts, in order to accelerate the regulation speed of hormonal secretion. According to this biologically inspired feedback mechanism, the feedback mechanism of backpropagation NN (BPNN) is improved by adding the feedback information. The improved BPNN can quickly perceive internal information by adding feedback information between the output layer and the hidden layer, the hidden layer itself, and the output layer itself, which improves the convergence speed and convergence accuracy. The optimized NN is shown in Fig. 6. In the Bio-Dec-HDP algorithm, the model module, action module, and critic module all use this multilevel feedback network.

Fig. 6. Structure of the improved multilevel feedback network.

B. Model Module

Various models can be used as prediction models, such as mechanism models and data-driven models [44]. We selected the network shown in Fig. 6 as the prediction model to predict the future state. The input of the model module is $M_m$, and the output is x(n + 1). The forward calculation of the network prediction model can be divided into the following five steps:

$$M_m = [u(n), x(n)] \tag{16}$$

$$m_{h1} = W_{m1} \times \left( M_m - \alpha_{m1} \cdot \frac{1 - e^{-W_{m1} \times M_m}}{1 + e^{-W_{m1} \times M_m}} - \alpha_{m2} \cdot W_{m2} \times \frac{1 - e^{-W_{m1} \times M_m}}{1 + e^{-W_{m1} \times M_m}} \right) \tag{17}$$

$$\hat{x}(n + 1) = W_{m2} \times \frac{1 - e^{-W_{m1} \times M_m}}{1 + e^{-W_{m1} \times M_m}} \tag{18}$$

$$m_{h2} = \frac{1 - e^{-m_{h1}}}{1 + e^{-m_{h1}}} \tag{19}$$

$$x(n + 1) = W_{m2} \times m_{h2} + \vartheta_m \cdot \hat{x}(n + 1) \tag{20}$$

where $m_{h1}$ is the hidden layer's inactive value, $m_{h2}$ is the hidden layer's activated value, $W_{m1}$ is the weight matrix between the input layer and the hidden layer, $W_{m2}$ is the weight matrix between the hidden layer and the output layer, $\alpha_{m1}$ and $\alpha_{m2}$ are the priority factors, and $\vartheta_m$ is the optimization coefficient of the model module.

The forward calculation error can be written as follows:

$$e_{mp} = x_p(n + 1) - x(n + 1) \tag{21}$$

$$E_{mp} = \frac{1}{2} e_{mp}^2 = \frac{1}{2} \left( x_p(n + 1) - x(n + 1) \right)^2 \tag{22}$$

where $x_p(n + 1)$ is the actual value at the next time step. The training target of the predictive model is to minimize $E_{mp}$.

After the above calculation, $W_{m1}$ and $W_{m2}$ are adjusted as follows:

$$\Delta W_{m1} = l_{mp} \left( -\frac{\partial E_{mp}}{\partial W_{m1}} \right) = l_{mp} \cdot \hat{x}(n + 1) \times \left[ W_{m2} \times \frac{2 \cdot e^{-m_{h1}}}{\left(1 + e^{-m_{h1}}\right)^2} \times \frac{\partial m_{h1}}{\partial W_{m1}} + \vartheta \cdot W_{m2} \times \frac{2 \cdot e^{-m_{h1} \times M_{mp}}}{\left(1 + e^{-m_{h1} \times M_{mp}}\right)^2} \times M_{mp} \right] \tag{23}$$

$$W_{m1} = W_{m1} + \Delta W_{m1} \tag{24}$$

$$\Delta W_{m2} = l_{mp} \left( -\frac{\partial E_{mp}}{\partial W_{m2}} \right) = -l_{mp} \times \hat{x}(n + 1) \times \left[ m_{h2} + \vartheta \cdot \frac{1 - e^{-W_{m1} \times M_{mp}}}{1 + e^{-W_{m1} \times M_{mp}}} \right] \tag{25}$$

$$W_{m2} = W_{m2} + \Delta W_{m2} \tag{26}$$

where $l_{mp} \in (0, 1)$ is the model module's learning rate, which is adjusted by the central coordination module in real time. In (24), $\partial m_{h1} / \partial W_{m1}$ can be written as

$$\frac{\partial m_{h1}}{\partial W_{m1}} = M_{mp} - \alpha_{m1} \cdot \frac{1 - e^{-W_{m1} \cdot M_{mp}}}{1 + e^{-W_{m1} \cdot M_{mp}}} - \alpha_{m2} \cdot W_{m2} \times \frac{1 - e^{-W_{m1} \cdot M_{mp}}}{1 + e^{-W_{m1} \cdot M_{mp}}} - W_{m1} \times \left[ \alpha_{m1} \cdot \frac{2 \cdot M_{mp} \times e^{-W_{m1} \cdot M_{mp}}}{\left(1 + e^{-W_{m1} \cdot M_{mp}}\right)^2} + \alpha_{m2} \cdot W_{m2} \times \frac{2 \cdot M_{mp} \times e^{-W_{m1} \cdot M_{mp}}}{\left(1 + e^{-W_{m1} \cdot M_{mp}}\right)^2} \right]. \tag{27}$$

C. Action Module

The input of the action module is x(n), and the output is u(n). The calculation of the action module can be written as follows:

$$a_{h1} = W_{a1} \times \left( x(n) - \alpha_{a1} \cdot \frac{1 - e^{-W_{a1} \times x(n)}}{1 + e^{-W_{a1} \times x(n)}} - \alpha_{a2} \cdot W_{a2} \times \frac{1 - e^{-W_{a1} \times x(n)}}{1 + e^{-W_{a1} \times x(n)}} \right) \tag{28}$$

$$a_{h2} = \frac{1 - e^{-a_{h1}}}{1 + e^{-a_{h1}}} \tag{29}$$

$$\hat{u} = W_{a2} \times \frac{1 - e^{-W_{a1} \times x(n)}}{1 + e^{-W_{a1} \times x(n)}} \tag{30}$$

$$u = W_{a2} \times a_{h2} + \vartheta_a \cdot \hat{u}. \tag{31}$$

$W_{a1}$ is the weight matrix between the input layer and the hidden layer, $W_{a2}$ is the weight matrix between the hidden layer and the output layer, $a_{h1}$ is the hidden layer's inactive value, $a_{h2}$ is the hidden layer's activated value, $\alpha_{a1}$ and $\alpha_{a2}$ are the priority factors, and $\vartheta_a$ is the optimization coefficient of the action module.

$\hat{J}$ is the cost function of this module. $W_{a1}$ and $W_{a2}$ are adjusted as follows:

$$\Delta W_{a2} = l_{ap} \cdot \left( -\frac{\partial \hat{J}}{\partial W_{a2}} \right) = l_{ap} \cdot \left( -\frac{\partial \hat{J}}{\partial u} \times \frac{\partial u}{\partial W_{a2}} \right) \tag{32}$$

$$W_{a2} = W_{a2} + \Delta W_{a2} \tag{33}$$

$$\Delta W_{a1} = l_{ap} \cdot \left( -\frac{\partial \hat{J}}{\partial W_{a1}} \right) = l_{ap} \cdot \left( -\frac{\partial \hat{J}}{\partial u} \times \frac{\partial u}{\partial W_{a1}} \right) \tag{34}$$

$$W_{a1} = W_{a1} + \Delta W_{a1} \tag{35}$$

where $l_{ap} \in (0, 1)$ is the action module's learning rate, which is adjusted by the central coordination module in real time. In (32), $\partial \hat{J} / \partial u$ can be written as follows:

$$\frac{\partial \hat{J}}{\partial u} = 2 \cdot u \tag{36}$$

Other terms in (32) and (34) can be written as

$$\frac{\partial u}{\partial W_{a2}} = a_{h2} + \vartheta \cdot \frac{\partial \hat{u}}{\partial W_{a2}} \tag{37}$$

$$\frac{\partial u}{\partial W_{a1}} = W_{a2} \times \frac{\partial a_{h2}}{\partial W_{a1}} + \vartheta \cdot \frac{\partial \hat{u}}{\partial W_{a1}}. \tag{38}$$

In (37), $\partial \hat{u} / \partial W_{a2}$ can be converted into

$$\frac{\partial \hat{u}}{\partial W_{a2}} = \frac{1 - e^{-W_{a1} \times x(n)}}{1 + e^{-W_{a1} \times x(n)}}. \tag{39}$$

In (38), $\partial a_{h2} / \partial W_{a1}$ can be converted into

$$\frac{\partial a_{h2}}{\partial W_{a1}} = \frac{\partial a_{h2}}{\partial a_{h1}} \times \frac{\partial a_{h1}}{\partial W_{a1}} = \frac{2 \cdot e^{-a_{h1}}}{\left(1 + e^{-a_{h1}}\right)^2} \times \frac{\partial a_{h1}}{\partial W_{a1}} \tag{40}$$

and $\partial a_{h1} / \partial W_{a1}$ can be converted into

$$\frac{\partial a_{h1}}{\partial W_{a1}} = x(n) - \alpha_{a1} \cdot \frac{1 - e^{-W_{a1} \times x(n)}}{1 + e^{-W_{a1} \times x(n)}} - \alpha_{a2} \cdot W_{a2} \times \frac{1 - e^{-W_{a1} \times x(n)}}{1 + e^{-W_{a1} \times x(n)}} - W_{a1} \times \left[ \alpha_{a1} \cdot \frac{2 \cdot x(n) \times e^{-W_{a1} \times x(n)}}{\left(1 + e^{-W_{a1} \times x(n)}\right)^2} + \alpha_{a2} \cdot W_{a2} \times \frac{2 \cdot x(n) \times e^{-W_{a1} \times x(n)}}{\left(1 + e^{-W_{a1} \times x(n)}\right)^2} \right] \tag{41}$$

where $\partial \hat{u} / \partial W_{a1}$ can be converted into

$$\frac{\partial \hat{u}}{\partial W_{a1}} = W_{a2} \times \frac{2 \cdot x(n) \times e^{-W_{a1} \times x(n)}}{\left(1 + e^{-W_{a1} \times x(n)}\right)^2}. \tag{42}$$

In summary, $W_{a1}$ and $W_{a2}$ can be adjusted according to (36)–(42).

D. Critic Module

After the action module calculates the control variables, they need to be evaluated, and the evaluation results determine whether the control variables obtained by the action module need further adjustment. The critic module guides the adjustment of the internal parameters of the action module by estimating the control performance of control variables. The critic module inputs the state variable's component, and outputs the performance index.

The input of the critic module is x, and the output is J. The calculation of the critic module can be defined as follows:

$$c_{h1} = W_{c1} \times (x - \alpha_c \cdot W_{c1}) \tag{43}$$

$$c_{h2} = \frac{1 - e^{-c_{h1}}}{1 + e^{-c_{h1}}} \tag{44}$$

$$J = W_{c2} \times c_{h2}. \tag{45}$$

$W_{c1}$ represents the weight matrix between the input layer and hidden layer, $W_{c2}$ represents the weight matrix between the hidden layer and output layer, and $\alpha_c$ represents the feedback coefficient of the hidden layer.

The objective function of this critic module is

$$e_c(n) = \hat{J}(n) - U(n) - \xi \cdot \hat{J}(n + 1). \tag{46}$$

The target of this module is to minimize $E_c(n)$ as follows:

$$E_c(n) = \frac{1}{2} e_c(n)^2. \tag{47}$$

The critic module's weight is adjusted as follows:

$$\Delta W_{c1} = l_c \cdot \left( -\frac{\partial E_c}{\partial W_{c1}} \right) = -l_c \cdot e_c \times W_{c2} \times (1 - m_{h2} \times m_{h2}) \times x \tag{48}$$

$$W_{c1} = W_{c1} + \Delta W_{c1} \tag{49}$$

$$\Delta W_{c2} = l_c \cdot \left( -\frac{\partial E_c}{\partial W_{c2}} \right) = -l_c \cdot e_c \times c_{h2} \tag{50}$$

$$W_{c2} = W_{c2} + \Delta W_{c2}. \tag{51}$$

$l_c \in (0, 1)$ is the critic module's learning rate, which is adjusted by the central coordination module in real time.
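The output-layer part of the critic update, (45)–(47) with (50) and (51), can be sketched as follows. This is a simplified illustration rather than the authors' code. It takes the hidden activations $c_{h2}$ at steps n and n + 1 as given inputs instead of computing them from (43) and (44), and it treats U(n) and ξ, which are not defined in this section, as a per-step utility and a weighting factor (both assumptions). The names `bipolar_sigmoid`, `dot`, and `critic_output_step` and all numeric values are hypothetical. The activation in (44) is written as tanh(z/2), which is algebraically the same expression.

```python
import math


def bipolar_sigmoid(z: float) -> float:
    """(1 - e^-z) / (1 + e^-z) as in (44); identical to tanh(z / 2)."""
    return math.tanh(z / 2.0)


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def critic_output_step(w_c2, c_h2_now, c_h2_next, utility, xi, l_c):
    """One update of the hidden-to-output weights W_c2."""
    j_now = dot(w_c2, c_h2_now)            # (45) evaluated at step n
    j_next = dot(w_c2, c_h2_next)          # (45) evaluated at step n + 1
    e_c = j_now - utility - xi * j_next    # (46)
    cost = 0.5 * e_c ** 2                  # (47)
    # (50): Delta W_c2 = -l_c * e_c * c_h2, then (51): W_c2 <- W_c2 + Delta W_c2
    new_w = [w - l_c * e_c * h for w, h in zip(w_c2, c_h2_now)]
    return new_w, e_c, cost


if __name__ == "__main__":
    # Hypothetical two-unit hidden layer.
    new_w, e_c, cost = critic_output_step(
        w_c2=[0.5, -0.2], c_h2_now=[0.4, 0.1], c_h2_next=[0.2, 0.3],
        utility=0.1, xi=0.9, l_c=0.1)
    print(new_w, e_c, cost)
```

The update in (50) differentiates only the $\hat{J}(n)$ term of (46) with respect to $W_{c2}$, so the sketch likewise holds $\hat{J}(n + 1)$ fixed while forming the weight change.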


V. FEASIBILITY ANALYSIS OF BIO-DEC-HDP

The feasibility analysis of the Bio-Dec-HDP algorithm consists of three aspects, which are iteration and approximation analysis, stability analysis for nonlinear systems, and iterative convergence analysis.

A. Algorithm's Iteration and Approximation Analysis

Nonlinear systems can be approximated by the model module, which can be described as

$$x(n + 1) = f\left(e_{mp}(n)\right) + g(x(n)) u(x(n)) \tag{52}$$

where $x \in \mathbb{R}^n$, $f(e_{mp}(n)) \in \mathbb{R}^n$, $g(x) \in \mathbb{R}^n$. The cost function is minimized as shown in (53) to get u(x(n))

$$J\left(e_{mp}(n)\right) = \sum_{i=k}^{\infty} \left[ e_{mp}(i)^T Q e_{mp}(i) + u(i)^T R u(i) \right] \tag{53}$$

where $Q \in \mathbb{R}^{n \times n}$ and $R \in \mathbb{R}^{n \times n}$ are positive definite, i.e., $\forall e_{mp}, u \neq 0$, $e_{mp}(i)^T Q e_{mp}(i) > 0$, $u(i)^T R u(i) > 0$, and $e_{mp} = 0 \Rightarrow e_{mp}^T Q e_{mp} = 0$, $u = 0 \Rightarrow u^T R u = 0$. Equation (53) can be written as

$$J\left(e_{mp}(n)\right) = e_{mp}(n)^T Q e_{mp}(n) + u(n)^T R u(n) + J\left(e_{mp}(n + 1)\right). \tag{54}$$

Based on the Bellman optimality principle, the Hamilton–Jacobi–Bellman (HJB) equation can be written as

$$J^*(x(n)) = \min_{u(n)} \left[ e_{mp}(n)^T Q e_{mp}(n) + u(n)^T R u(n) + J^*(x(n + 1)) \right]. \tag{55}$$

The optimal strategy $u^*(x(n))$ requires that the first-order necessary condition for the gradient with respect to u(n) in (55) satisfies

$$\frac{\partial J^*\left(e_{mp}(n)\right)}{\partial u(n)} = \frac{\partial \left( e_{mp}(n)^T Q e_{mp}(n) + u(n)^T R u(n) \right)}{\partial u(n)} + \frac{\partial e_{mp}(n + 1)}{\partial u(n)} \frac{\partial J^*\left(e_{mp}(n + 1)\right)}{\partial e_{mp}(n + 1)} = 0 \tag{56}$$

resulting in

$$u^*(x(n)) = \frac{1}{2} R^{-1} g(x(n))^T \frac{\partial J^*\left(e_{mp}(n + 1)\right)}{\partial e_{mp}(n + 1)}. \tag{57}$$

An HJB equation can be obtained by substituting (57) into (53), where $J^*(e_{mp}(n + 1))$ is the value function of $u^*(x(n))$

$$J^*\left(e_{mp}(n)\right) = J^*\left(e_{mp}(n + 1)\right) + e_{mp}(n)^T Q e_{mp}(n) + \frac{1}{4} \frac{\partial J^{*T}\left(e_{mp}(n + 1)\right)}{\partial e_{mp}(n + 1)} g(x(n)) R^{-1} g(x(n))^T \frac{\partial J^*\left(e_{mp}(n + 1)\right)}{\partial e_{mp}(n + 1)}. \tag{58}$$

In Bio-Dec-HDP, the cost function $J_0(e_{mp}) = 0$ is calculated first (the initial value of the cost function is not required), and the control strategy $u_0$ is obtained as in

$$u_0(x(n)) = \arg\min_{u} \left[ x(n)^T Q x(n) + u^T R u + J_0\left(e_{mp}(n + 1)\right) \right]. \tag{59}$$

The updated cost function is given by

$$J_1 = e_{mp}(n)^T Q e_{mp}(n) + u_0^T(x(n)) R u_0(x(n)) + J_0\left(e_{mp}(n + 1)\right). \tag{60}$$

Hence, Bio-Dec-HDP computes between the following equations iteratively:

$$u(i, x(n)) = \arg\min_{u} \left[ x(n)^T Q x(n) + u^T R u + J\left(i, e_{mp}(n + 1)\right) \right] \tag{61}$$

$$J(i + 1) = \min_{u} \left[ e_{mp}(n)^T Q e_{mp}(n) + u(n)^T R u(n) + J\left(i, e_{mp}(n + 1)\right) \right] = e_{mp}(n)^T Q e_{mp}(n) + u(i, x(n))^T R u(i, x(n)) + J\left(i, f\left(e_{mp}(n)\right) + g(x(n)) u(i, x(n))\right). \tag{62}$$

To implement the algorithm's iterative procedure shown in (61) and (62), we approximate the value function using an NN. $\hat{J}(i, e_{mp})$ and $\hat{u}(i, x)$ are approximated by an NN as derived in

$$\hat{J}\left(i, e_{mp}(n), W_{c2}(i)\right) = W_{c2}^T(i) \phi\left(e_{mp}(n)\right) \tag{63}$$

$$\hat{u}\left(i, x(n), W_{a2}(i)\right) = W_{a2}^T(i) \sigma(x(n)) \tag{64}$$

where $\phi(\cdot)$ in (63) is the activation function of the critic module, and $\sigma(\cdot)$ in (64) is the activation function of the action module.

The target cost function is

$$d\left(\phi\left(e_{mp}(n)\right), W_{c2}^T(i)\right) = e_{mp}(n)^T Q e_{mp}(n) + \hat{u}(i, x(n))^T R \hat{u}(i, x(n)) + W_{c2}^T(i) \phi\left(e_{mp}(n + 1)\right) \tag{65}$$

where $W_{c2} \in \mathbb{R}^{\mathrm{LIC} \times 1}$, $\phi(e_{mp}(n)) \in \mathbb{R}^{\mathrm{LIC} \times 1}$.

In (63), the relation between $W_{c2}(i)$ and the target cost function shown in (65) is explicit. Therefore, to find the weight $W_{c2}(i + 1)$, the error between the target value in (63) and (65) is minimized by least-squares (L-S) over a compact set. Therefore, $W_{c2}(i + 1)$ can be given as

$$W_{c2}(i + 1) = \arg\min_{W_{c2}(i + 1)} \int \left| W_{c2}^T(i + 1) \phi\left(e_{mp}(n)\right) - d\left(\phi\left(e_{mp}(n)\right), W_{c2}^T(i)\right) \right|^2 de_{mp}(n). \tag{66}$$

In (66), $d(\phi(e_{mp}(n)), W_{c2}^T(i))$ can be simplified using (62), (63), and (65), resulting in

$$d\left(\phi\left(e_{mp}(n)\right), W_{c2}^T(i)\right) = \hat{J}(i + 1) \tag{67}$$

therefore, (66) can be rewritten as follows:

$$W_{c2}(i + 1) = \arg\min_{W_{c2}(i + 1)} \int \left| W_{c2}^T(i + 1) \phi\left(e_{mp}(n)\right) - \hat{J}(i + 1) \right|^2 de_{mp}(n) = \arg\min_{W_{c2}(i + 1)} \int \Big( W_{c2}^T(i + 1) \phi\left(e_{mp}(n)\right) \phi^T\left(e_{mp}(n)\right) W_{c2}(i + 1) - W_{c2}^T(i + 1) \phi\left(e_{mp}(n)\right) \hat{J}(i + 1) - \hat{J}(i + 1)^T \phi^T\left(e_{mp}(n)\right) W_{c2}^T(i + 1) + \hat{J}(i + 1)^T \hat{J}(i + 1) \Big) de_{mp}(n). \tag{68}$$

In (68), $\left| W_{c2}^T(i + 1) \phi(e_{mp}(n)) - \hat{J}(i + 1) \right|^2$ can be simplified as

$$\left| W_{c2}^T(i + 1) \phi\left(e_{mp}(n)\right) - \hat{J}(i + 1) \right|^2 = W_{c2}^T(i + 1) \phi\left(e_{mp}(n)\right) \phi^T\left(e_{mp}(n)\right) W_{c2}(i + 1) - W_{c2}^T(i + 1) \phi\left(e_{mp}(n)\right) \hat{J}(i + 1) - \hat{J}(i + 1)^T \phi^T\left(e_{mp}(n)\right) W_{c2}^T(i + 1) + \hat{J}(i + 1)^T \hat{J}(i + 1). \tag{69}$$

Setting the derivative of (69) equal to zero results in

therefore

$$\frac{\partial l_a}{\partial |\alpha|} = a_a \cdot \frac{\partial f(\alpha)}{\partial |\alpha|} \ge 0, \quad -2 \le \alpha \le 2. \tag{77}$$

In (77), it is important to note that $l_a$ is monotonically bounded, meaning there is no change in the convergence of $l_a \cdot \partial \left( x(n)^T Q x(n) + \hat{u}(i)^T R \hat{u}(i) + \hat{J}(i, e_{mp}(n + 1)) \right) / \partial W_{a2}(i)$. Therefore, (74) can be rewritten as follows:

$$W_{a2}(i + 1) = W_{a2}(i) - l_a \cdot \left[ 2 \sigma(x(n)) R \hat{u}(i) + \sigma(x(n)) g(x(n))^T \frac{\partial \phi\left(e_{mp}(n + 1)\right)}{\partial e_{mp}(n + 1)} W_{c2}(i) \right]. \tag{78}$$

Equation (78) can be restated as

2φ emp (n) φ T emp (n) Wc2 (i + 1) − 2φ emp (n) Ĵ(i + 1) = 0. Wa2 (i + 1) = Wa2 (i) − α 2σ (x(n))Rû(i)
(70)
T ∂φ emp (n + 1)
Solving out for Wc2 (i+1) in (70) leads to +σ (x(n))g(x(n)) Wc2 (i)
∂emp (n + 1)
−1
Wc2 (i + 1) = φ emp (n) φ T emp (n) · φ emp (n) Ĵ(i + 1). (79)
(71) where x(n + 1) = f (emp (n)) + g(x(n))û(x(n), Wa2 (i)). When
Therefore, (66) can be obtained as follows: i ⇒ ∞, the weight Wa2 (i) ⇒ Wa2 satisfies (72). Hence,
u(i, x(n), Wa2 (i)) ⇒ û and J(i, emp (n), Wc2 (i)) ⇒ Ĵ.
−1
T
Wc2 (i + 1) = φ emp (n) φ emp (n) demp (n)
B. Stability Analysis for Nonlinear System
T
× φ emp (n) Ĵ i + 1, φ emp (n) , Wc2 (i) demp (n). Theorem 1: When 0 < ϑa < [1/(([∂x(n)]/[∂u(n)])2 )],
the nonlinear system shown in (52) is stable.
(72)
Proof: Construct the Lyapunov function
Similarly, to obtain the control policy’s parameter 1
û(t, x(n), Wa2 (t)) L(n) = Emp (n)T · Emp (n). (80)
2

Wa2 (i + 1) = arg min x(n)T Qx(n) + û(x(n), α)T Rû(x(n), α) Then
α
L(n) = L(n + 1) − L(n)
+ Ĵ i, f emp (n) + g x(n)û(x(n), α) (73)
1
= Emp (n)T Emp (n) + Emp (n) . (81)
where Wa2 ∈ RLIa ×1 . 2
It is known that in (73) the relation of Wa2 (i + 1) is implicit. Let
On a training set constructed from , the weight Wa2 (i + 1)
can be updated using the gradient steepest descent algorithm u(n) = u(n) − u(n − 1) = P · u(n) (82)
as follows:
where P is the invertible coefficient matrix of u(n). P is found
Wa2 (i + 1) = Wa2 (i) as follows:

∂ x(n)T Qx(n) + û(i)T Rû(i) + Ĵ i, emp (n + 1) ∂J
− la
u(n) = −ϑa · . (83)
∂Wa2 (i) ∂u(n)
(74) Inserting (82) and (83) into (53) results in
where la is determined by (51), giving ∂x(n) T
u(n) = 2 · ϑa · · Q · Emp (n)
∂la ∂f (α) ∂u(n)
= aa · . (75) − 2 · ϑa · PT · R · u(n). (84)
∂|α| ∂|α|
In (75) Then
⎧ T
⎨ ∂fup (α) α ∂x(n)
∂f (α) ∂α = ≥ 0, 0 ≤ α < 2 I + 2 · ϑa · PT · R u(n) = 2 · ϑa · · Q · Emp (n)
= (1+α)2 (76) ∂u(n)
∂|α| ⎩ ∂fdown
∂α
(α)
= 1
≥ 0, −2 ≤ α < 0
(1−α)2 (85)

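where

$$\Delta u(n) = 2 \cdot \vartheta_a \cdot \big( I + 2 \cdot \vartheta_a \cdot P^T \cdot R \big)^{-1} \cdot \left( \frac{\partial x(n)}{\partial u(n)} \right)^T \cdot Q \cdot E_{mp}(n). \tag{86}$$

Substituting (86) into (81)

$$\begin{aligned} \Delta L(n) &= \Delta E_{mp}(n)^T \left( E_{mp}(n) + \frac{1}{2} \Delta E_{mp}(n) \right) \\ &= -\left[ \frac{\partial x(n)}{\partial u(n)} \cdot 2 \cdot \vartheta_a \cdot \big( I + 2 \cdot \vartheta_a \cdot P^T \cdot R \big)^{-1} \cdot \left( \frac{\partial x(n)}{\partial u(n)} \right)^T \cdot Q \cdot E_{mp}(n) \right]^T \\ &\quad \cdot \left[ E_{mp}(n) - \frac{\partial x(n)}{\partial u(n)} \cdot \vartheta_a \cdot \big( I + 2 \cdot \vartheta_a \cdot P^T \cdot R \big)^{-1} \cdot \left( \frac{\partial x(n)}{\partial u(n)} \right)^T \cdot Q \cdot E_{mp}(n) \right]. \end{aligned} \tag{87}$$

Let $\kappa = \partial x(n)/\partial u(n)$, $R = I$, $Q = I$; then (87) can be written as

$$\Delta L(n) = -\big( \kappa^T \cdot E_{mp}(n) \big)^T \cdot 2 \cdot \vartheta_a \cdot \Big[ \big( I + 2 \cdot \vartheta_a \cdot P^T \big)^{-1} \Big]^T \cdot \Big[ I - \kappa^T \cdot \kappa \cdot \vartheta_a \cdot \big( I + 2 \cdot \vartheta_a \cdot P^T \big)^{-1} \Big] \cdot \kappa^T \cdot E_{mp}(n). \tag{88}$$

Thus, $\Delta L(n) < 0$ is equivalent to the following:

$$\begin{cases} \vartheta_a \cdot \big( I + 2 \cdot \vartheta_a \cdot P^T \big)^{-1} > \vartheta_a > 0 \\ I - \kappa^T \cdot \kappa \cdot \vartheta_a \cdot \big( I + 2 \cdot \vartheta_a \cdot P^T \big)^{-1} > I - \kappa^T \cdot \kappa \cdot \vartheta_a > 0. \end{cases} \tag{89}$$

Therefore, when $0 < \vartheta_a < 1/\big(\partial x(n)/\partial u(n)\big)^2$ is satisfied, $\Delta L(n) < 0$ can be guaranteed. From the Lyapunov stability theorem, the closed-loop system of the Bio-Dec-HDP algorithm is asymptotically stable. Hence, the theorem proves that Bio-Dec-HDP is stable for nonlinear systems.

C. Algorithm's Iterative Convergence Analysis

To analyze the iteration convergence, i.e., $J(i) \to J^*$, $u(i) \to u^*$ as $i \to \infty$, we introduce two lemmas.

Lemma 1: Let $\upsilon(i)$ be an arbitrary control strategy in place of (61), let $J(i)$ be defined as in (62), and let $\hat{J}(i)$ be written as follows:

$$\hat{J}\big(i+1, e_{mp}(n)\big) = e_{mp}(n)^T Q\, e_{mp}(n) + \upsilon(i)^T R\, \upsilon(i) + \hat{J}\big(i, e_{mp}(n+1)\big). \tag{90}$$

Then $J(i) \leq \hat{J}(i)$ $\forall i$, when $J(0) = \hat{J}(0) = 0$.

Proof: $J(i)$ is the optimal performance index, which is calculated by minimizing over the control variable $u$ in (61), whereas $\hat{J}(i)$ is obtained with an arbitrary control input. Therefore, when $J(0) = \hat{J}(0) = 0$, $J(i) \leq \hat{J}(i)$ $\forall i$.

Lemma 2: Let the performance index sequence $\{J(i)\}$ be formulated by (62). When the system is controllable, there is an upper bound $J_{max}$ that satisfies $0 \leq J(i) \leq J_{max}$ $\forall i$.

Proof: Let $\hat{\upsilon}(n)$ be any admissible stabilizing strategy, and let $J(0) = J_1(0) = 0$. $J(i)$ updates as in (62), and $J_1(i)$ becomes

$$J_1\big(i+1, e_{mp}(n)\big) = e_{mp}(n)^T Q\, e_{mp}(n) + \hat{\upsilon}(n)^T R\, \hat{\upsilon}(n) + J_1\big(i, e_{mp}(n+1)\big) \tag{91}$$

where $i$ is the number of the current iteration, and $J_1(i)$ represents an arbitrary performance index other than the optimal performance index calculated by (62).

The difference is shown as follows:

$$\begin{aligned} J_1\big(i+1, e_{mp}(n)\big) - J_1\big(i, e_{mp}(n)\big) &= J_1\big(i, e_{mp}(n+1)\big) - J_1\big(i-1, e_{mp}(n+1)\big) \\ &= \cdots \\ &= J_1\big(1, e_{mp}(n+i)\big) - J_1\big(0, e_{mp}(n+i)\big). \end{aligned} \tag{92}$$

Then, (92) can be written as

$$J_1\big(i+1, e_{mp}(n)\big) - J_1\big(i, e_{mp}(n)\big) = J_1\big(1, e_{mp}(n+i)\big) - J_1\big(0, e_{mp}(n+i)\big). \tag{93}$$

Since $J_1(0, e_{mp}(n)) = 0$

$$\begin{aligned} J_1\big(i+1, e_{mp}(n)\big) &= J_1\big(1, e_{mp}(n+i)\big) + J_1\big(i, e_{mp}(n)\big) \\ &= J_1\big(1, e_{mp}(n+i)\big) + J_1\big(1, e_{mp}(n+i-1)\big) + J_1\big(1, e_{mp}(n+i-2)\big) + \cdots + J_1\big(1, e_{mp}(n)\big). \end{aligned} \tag{94}$$

Therefore, (94) can be written as

$$\begin{aligned} J_1\big(i+1, e_{mp}(n)\big) &= \sum_{j=0}^{i} J_1\big(1, e_{mp}(n+j)\big) \\ &= \sum_{j=0}^{i} \Big[ e_{mp}(n+j)^T Q\, e_{mp}(n+j) + \hat{\upsilon}(n+j)^T R\, \hat{\upsilon}(n+j) \Big] \\ &\leq \sum_{j=0}^{\infty} \Big[ e_{mp}(n+j)^T Q\, e_{mp}(n+j) + \hat{\upsilon}(n+j)^T R\, \hat{\upsilon}(n+j) \Big]. \end{aligned} \tag{95}$$

The system is known to be stable, i.e., $e_{mp}(n) \to 0$ as $n \to \infty$, and $\hat{\upsilon}(n)$ is admissible and stabilizing, so

$$\forall i: \; J_1\big(i+1, e_{mp}(n)\big) \leq \sum_{i=0}^{\infty} J_1\big(1, e_{mp}(n+i)\big) \leq J_{max}. \tag{96}$$

From Lemma 1

$$\forall i: \; J\big(i+1, e_{mp}(n)\big) \leq J_1\big(i+1, e_{mp}(n)\big) \leq J_{max}. \tag{97}$$

$\{J(i)\}$ is defined as in (62). Define $u(i)$ as in (61) and $\psi(0) = 0$. $\psi(i)$ can be updated as

$$\psi\big(i+1, e_{mp}(n)\big) = e_{mp}(n)^T Q\, e_{mp}(n) + u(i+1)^T R\, u(i+1) + \psi\big(i, e_{mp}(n+1)\big). \tag{98}$$

Knowing that

$$J\big(1, e_{mp}(n)\big) - \psi\big(0, e_{mp}(n)\big) = e_{mp}(n)^T Q\, e_{mp}(n) \geq 0 \tag{99}$$

then

$$J\big(1, e_{mp}(n)\big) \geq \psi\big(0, e_{mp}(n)\big). \tag{100}$$

Suppose that $J(i, e_{mp}(n)) \geq \psi(i-1, e_{mp}(n))$ $\forall e_{mp}(n)$.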
Since

$$\psi\big(i, e_{mp}(n)\big) = e_{mp}(n)^T Q\, e_{mp}(n) + u(i+1)^T R\, u(i+1) + \psi\big(i-1, e_{mp}(n+1)\big) \tag{101}$$

$$J\big(i+1, e_{mp}(n)\big) = e_{mp}(n)^T Q\, e_{mp}(n) + \upsilon(i)^T R\, \upsilon(i) + J\big(i, e_{mp}(n+1)\big). \tag{102}$$

One can obtain

$$J\big(i+1, e_{mp}(n)\big) - \psi\big(i, e_{mp}(n)\big) = J\big(i, e_{mp}(n+1)\big) - \psi\big(i-1, e_{mp}(n+1)\big) \geq 0. \tag{103}$$

Hence

$$\psi\big(i, e_{mp}(n)\big) \leq J\big(i+1, e_{mp}(n)\big). \tag{104}$$

From Lemma 1, $J(i, e_{mp}(n)) \leq \psi(i, e_{mp}(n))$, so

$$J\big(i, e_{mp}(n)\big) \leq \psi\big(i, e_{mp}(n)\big) \leq J\big(i+1, e_{mp}(n)\big) \tag{105}$$

$$J\big(i, e_{mp}(n)\big) \leq J\big(i+1, e_{mp}(n)\big). \tag{106}$$

The above proves that $\{J(i)\}$ is a bounded, nondecreasing sequence. Therefore, when $i \to \infty$, $J(i) \to J^*(e_{mp}(n))$. In the same way, when $i \to \infty$, $u(i) \to u^*$.

Fig. 7. Prediction results for our model and existing models.
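The iteration in (61) and (62), the least-squares critic update in (71), and the monotone, bounded behavior established in (106) can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a scalar linear error model $e(n+1) = a\,e(n) + b\,u(n)$, a single quadratic critic feature $\varphi(e) = e^2$, and hypothetical values for `a`, `b`, `q`, and `r`. For this special case the minimizer in (61) has a closed form, which the code uses directly.

```python
import numpy as np

def fitted_value_iteration(a=0.9, b=0.5, q=1.0, r=1.0, iters=50, n_samples=41):
    """Iterate (61)-(62) for e(n+1) = a*e(n) + b*u(n) with critic J(i, e) = w_i * e**2.

    All parameter values are illustrative assumptions, not taken from the paper.
    Returns the sequence of critic weights w_0, w_1, ..., w_iters.
    """
    e = np.linspace(-1.0, 1.0, n_samples)   # sampled error states (assumed training set)
    phi = (e ** 2).reshape(-1, 1)           # critic feature matrix, shape (n_samples, 1)
    w = np.zeros(1)                         # J(0, .) = 0
    history = [w[0]]
    for _ in range(iters):
        # (61): minimize q*e^2 + r*u^2 + w*(a*e + b*u)^2 over u (closed form, scalar case)
        u = -(w[0] * a * b * e) / (r + w[0] * b ** 2)
        # (62): one-step target J(i+1, e) under the minimizing control
        e_next = a * e + b * u
        target = q * e ** 2 + r * u ** 2 + w[0] * e_next ** 2
        # (71): least-squares fit of the critic weight to the target
        w, *_ = np.linalg.lstsq(phi, target, rcond=None)
        history.append(w[0])
    return np.array(history)

history = fitted_value_iteration()
print("first weights:", history[:4])
print("final weight:", history[-1])
```

Under these assumptions the critic weight starts at zero, never decreases, and settles at a fixed point of the scalar recursion $w \mapsto q + a^2 r w / (r + b^2 w)$, which mirrors the bounded nondecreasing sequence $\{J(i)\}$ of the convergence analysis.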

VI. CASE STUDY

According to the feasibility analysis of Section V, as long as the nonlinear system can be described by (52), it can be controlled accurately and in a timely manner by the Bio-Dec-HDP algorithm. To verify the validity and superiority of Bio-Dec-HDP in a nonlinear time-varying system, this study selected the time-varying nonlinear function shown in (107) as the research object

$$\begin{cases} G(s) = \dfrac{K^*}{T^* s + 1} \cdot e^{-\tau s} \\ K^* = 0.9 - 0.15 \cdot t^{0.15} \\ T^* = 50 + 0.4 \cdot t^{0.1} \\ \tau = 5 \end{cases} \tag{107}$$

where $t$ is a time variable introduced to describe the general rule of $K^*$ and $T^*$ changing with time.

The expression of the first-order inertial pure delay model in the discrete domain is as follows:

$$y_{out}(n+1) = F\big(y_{out}(n), x_{in}(n-d), \ldots, x_{in}(n-d-1)\big) \tag{108}$$

where $n$ is the sampling time, $d = [\tau/T_s]$ is the lag time, $T_s = 5$ is the sampling time interval, $[\,\cdot\,]$ is the integer function, $y_{out}(n)$ is the output variable at sampling time $n$, and $x_{in}(n)$ is the input variable at sampling time $n$.

With $\tau = 5$ and $T_s = 5$, $d = 1$. Therefore, combining (107) and (108), the discrete domain expression of the system model is

$$y_{out}(n+1) = F\big(y_{out}(n), x_{in}(n-1), x_{in}(n-2)\big). \tag{109}$$

VII. RESULTS AND DISCUSSION

A. Simulation of Prediction Model

The system shown in (109) was simulated to obtain the characteristic input and output data of the system, totaling 1000 sets of data. Among them, the first 800 sets of data were used as training data, and 200 sets of data were used as test data. The Bio-Dec-HDP model was compared with the BPNN model [45], [46] and the self-feedback BPNN model [47], [48]. The prediction results for our model and existing models are shown in Fig. 7, and the prediction accuracies for our model and existing models are shown in Fig. 8.

Fig. 8. Prediction accuracies for our model and existing models.

Figs. 7 and 8 demonstrate the high accuracy of the prediction model obtained with the Bio-Dec-HDP algorithm proposed in this work. Over time, Bio-Dec-HDP maintains high prediction accuracy, with a lowest prediction accuracy of 98.60% and an average accuracy of 99.59%, which is much higher than the prediction accuracy of both the BP algorithm and the self-feedback BP algorithm.

To better analyze the prediction results, the mean absolute error (MAE), mean relative error (MRE), and mean-square error (MSE) are used to compare the three prediction models. MAE reflects the actual error between the predicted value and the actual value of the three models, MRE reflects the degree of deviation between the predicted value and the actual value, and MSE reflects the degree of concentration and dispersion of the error between the predicted value and the actual value. The calculation methods of the above evaluation indexes are shown in (110)–(112), and the calculation results are shown

in Table I

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \big| x(i) - \hat{x}(i) \big| \tag{110}$$

$$\mathrm{MRE} = \sum_{i=1}^{N} \frac{\big| x(i) - \hat{x}(i) \big|}{\hat{x}(i)} \times 100\% \tag{111}$$

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \big( x(i) - \hat{x}(i) \big)^2 \tag{112}$$

where $N$ is the number of samples in the test set, $\hat{x}(i)$ is the actual value, and $x(i)$ is the predicted value.

TABLE I. PERFORMANCE INDEXES OF THE PREDICTION

Fig. 9. Comparison of control performance.

TABLE II. COMPARISON OF TRAINING TIMES FOR MPC
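The three indexes in (110)–(112) can be computed directly from a pair of arrays. The sketch below is illustrative only: the function name `prediction_errors` and the sample values are assumptions, and the actual values are assumed to be positive so that the division in (111) is well defined. MRE is implemented as (111) is printed, without a $1/N$ factor; dividing the result by $N$ gives the averaged form.

```python
import numpy as np

def prediction_errors(actual, predicted):
    """Return (MAE, MRE in percent, MSE) following (110)-(112).

    `actual` plays the role of x_hat(i) and `predicted` the role of x(i).
    Assumes all actual values are positive (hypothetical helper, not from the paper).
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    diff = predicted - actual
    mae = np.mean(np.abs(diff))                     # (110)
    mre = np.sum(np.abs(diff) / actual) * 100.0     # (111), as printed (no 1/N factor)
    mse = np.mean(diff ** 2)                        # (112)
    return mae, mre, mse

# Made-up sample values, purely for illustration
mae, mre, mse = prediction_errors([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
print(mae, mre, mse)
```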
Table I reveals that Bio-Dec-HDP has the best performance among the three models in terms of the MAE, MRE, and MSE performance indexes. The MAE value of the Bio-Dec-HDP model is 95.91% lower than that of the BP model and 90.45% lower than that of the self-feedback BP model. The MRE value of the Bio-Dec-HDP model is 96.35% lower than that of the BP model and 90.75% lower than that of the self-feedback BP model. The MSE value of the Bio-Dec-HDP model is 99.80% lower than that of the BP model and 98.92% lower than that of the self-feedback BP model. These simulation results show that the Bio-Dec-HDP model can accurately predict the state of the nonlinear time-varying system; therefore, the prediction model can meet the requirements of MPC.

B. Validation of Bio-Dec-HDP

To verify the control performance of the proposed algorithm for a nonlinear time-varying system, the Bio-Dec-HDP algorithm is compared with the HDP algorithm [26], [49] and the IMC algorithm [50], [51]. To ensure a fair comparison, the training data and test data obtained from the system shown in (107) under the same initial and external conditions were input into each of the three controllers for simulation verification, with consistent initial parameters. Fig. 9 shows the controllers' state control results, with two time points highlighted in the insets.

In Fig. 9, the Bio-Dec-HDP algorithm can accurately predict and control the nonlinear time-varying system. When the system control error was set to 0.001, the algorithm could still drive the system to the set value quickly when the target value changed, which verifies the control performance of the predictive controller for the nonlinear time-varying system. In the control process, it meets the requirement that the system adjusts to changes in demand.

To better analyze the MPC performance of the three algorithms, the average training time needed to achieve the expected control accuracy is calculated. To eliminate data contingency in the predictive control process, each algorithm was run in 50 simulation experiments, and the averaged values are given in Table II.

Table II shows that the average training time of Bio-Dec-HDP is much lower than that of HDP and IMC. The average training time of Bio-Dec-HDP is only 0.76 s, 98.60% faster than IMC and 96.60% faster than HDP.

According to the above data, the Bio-Dec-HDP algorithm can implement rapid regulation for complex nonlinear time-varying systems with fast training. In summary, the Bio-Dec-HDP algorithm is a new intelligent predictive control algorithm that is superior to traditional predictive control algorithms.

VIII. CONCLUSION

Traditional MPC algorithms encounter numerous difficulties in controlling nonlinear, time-varying, and multicontrol systems rapidly and accurately. This work proposed an MPC algorithm inspired by HDP, biological regulation, and operations research. By introducing a biologically inspired feedback system and operations research, the HDP algorithm is significantly improved. For the prediction component, the information feedback path of the NN is improved, overcoming model mismatch issues in the control process. Moreover, because NN modeling is used, less physical data about the model is required, and the approach generalizes well. Finally, the predictive control algorithm is simulated using a nonlinear time-varying function, and the effectiveness of the control algorithm is demonstrated. Since the Bio-Dec-HDP algorithm can quickly and accurately predict and control multivariable nonlinear time-varying systems, it has certain application

potential in industrial processes and social life, such as flood warning and control, and for the high-quality production of steel and oil.

While Bio-Dec-HDP is a time-efficient and stable algorithm, the authors note that there are two limitations for further improvement.

1) The MPC algorithm proposed in this article is only simulated on a nonlinear function, which is affected by few external factors. In reality, an actual system is subject to many uncontrollable factors, so the control effect of the algorithm still needs to be further verified. In future work, a variety of field data will be gradually controlled to ensure the effectiveness of Bio-Dec-HDP in an actual system.

2) The number of nonlinear function input variables is only 3, which is small. However, the structure of an actual system is very complex, and the input of the algorithm will be significantly increased, resulting in a surge of algorithmic calculations. In the future, as the complexity of the system increases, the actual data required by the control system will increase. Reducing the complexity of the control object and the data volume on the premise of ensuring high precision and applicability of Bio-Dec-HDP is the direction of subsequent research.

REFERENCES

[1] L.-Y. Yu, Z.-Y. Sun, Q. Meng, and C.-C. Chen, “A new finite-time stabilizing design for a class of high-order uncertain nonlinear systems and its application in maglev systems,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 53, no. 1, pp. 417–424, Jan. 2023.
[2] L. Li, P. Shi, R. K. Agarwal, C. K. Ahn, and W. Xing, “Event-triggered model predictive control for multiagent systems with communication constraints,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 51, no. 5, pp. 3304–3316, May 2021.
[3] G.-P. Liu, “Predictive control of networked nonlinear multiagent systems with communication constraints,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 50, no. 11, pp. 4447–4457, Nov. 2020.
[4] M. Wang, Y. Wang, T. Chai, and X. Zhang, “External generalized predictive cascade control for main steam temperature based on weight factor self-regulating,” Acta Automatica Sinica, vol. 48, no. 2, pp. 418–433, 2022.
[5] L. Yin, Q. Li, Z. Hong, Y. Han, and W. Chen, “FFRLS online identification and real-time optimal temperature generalized predictive control method of PEMFC power generation system,” Proc. Chin. Soci. Electr. Eng., vol. 37, no. 11, pp. 3223–3235, 2017.
[6] K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. Image Process., vol. 26, pp. 4509–4522, 2017.
[7] Z. Fang, S. Duan, T. Chen, C. Chen, and B. Liu, “Formation mechanism and suppression strategy of prediction control error applied in a battery energy storage inverter,” Proc. Chin. Soci. Electr. Eng., vol. 33, no. 30, pp. 1–9, 2013.
[8] S. Li and B. Ding, “An overall solution to double-layered model predictive control based on dynamic matrix control,” Acta Automatica Sinica, vol. 41, no. 11, pp. 1857–1866, 2015.
[9] J. Liu, Y. Gu, Y. Cheng, and X. Wang, “Prediction of breast cancer pathogenic genes based on multi-agent reinforcement learning,” Acta Automatica Sinica, vol. 48, no. 5, pp. 1246–1258, 2022.
[10] J. Salt, J. Alcaina, Á. Cuenca, and A. Baños, “Multirate control strategies for avoiding sample losses. Application to UGV path tracking,” ISA Trans., vol. 101, pp. 130–146, Jun. 2020.
[11] Y. Cheng, L. Huang, and X. Wang, “Authentic boundary proximal policy optimization,” IEEE Trans. Cybern., vol. 52, no. 9, pp. 9428–9438, Sep. 2022.
[12] G. Xia, Y. Hua, X. Tang, L. Zhao, and W. Chen, “Internal-model control of vehicle chassis based on wavelet-network dynamic inversion method,” Int. J. Veh. Auton. Syst., vol. 14, no. 2, pp. 170–195, 2018.
[13] Q. Yang, J. Sun, and J. Chen, “Output consensus for heterogeneous linear multiagent systems with a predictive event-triggered mechanism,” IEEE Trans. Cybern., vol. 51, no. 4, pp. 1993–2005, Apr. 2021.
[14] Z. Ju and Z. Haihua, “Explicit model predictive control of networked control systems with time-delay,” in Proc. Chin. Control Decis. Conf. (CCDC), 2011, pp. 907–911.
[15] J. Thomas and A. Hansson, “Enumerative nonlinear model predictive control for linear induction motor using load observer,” in Proc. UKACC Int. Conf. Control (CONTROL), 2014, pp. 373–377.
[16] H. Zhang, R. Tao, Z. Li, X. Zhang, and Z. Ma, “Multivariable sequential model predictive control of LCL-type grid connected inverter,” IET Power Electron., vol. 16, no. 4, pp. 558–574, 2023.
[17] K. Masuda and K. Uchiyama, “Simply robust control strategy based on model predictive control,” in Proc. SICE Int. Symp. Control Syst. (SICE ISCS), 2020, pp. 99–106.
[18] A. Islam, “Model predictive control for flood regulation,” in Proc. Int. Conf. Intell. Comput., Instrum. Control Technol. (ICICICT), 2017, pp. 275–279.
[19] F. Yang, Y. Shi, Y. Zhang, L. Liu, and W. Guan, “Research on the control of the yellow river diversion project based on the decentralized model predictive control,” in Proc. 32nd Youth Acad. Annu. Conf. Chin. Assoc. Autom. (YAC), 2017, pp. 681–686.
[20] J. Yang, Y. Zhang, T. Yildirim, and J. Zhang, “A model predictive control algorithm based on biological regulatory mechanism and operational research,” IEEE/CAA J. Automatica Sinica, vol. 10, no. 11, pp. 2174–2176, Nov. 2023.
[21] H. Wang, M. Chen, and X. Liu, “Fuzzy adaptive fixed-time quantized feedback control for a class of nonlinear systems,” Acta Automatica Sinica, vol. 47, no. 12, pp. 2823–2830, 2021.
[22] K. Xu, H. Wang, and P. X. Liu, “Singularity-free adaptive fixed-time tracking control for MIMO nonlinear systems with dynamic uncertainties,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 71, no. 3, pp. 1356–1360, Mar. 2024.
[23] X. Sui, Y. Tang, H. He, and J. Wen, “Energy-storage-based low-frequency oscillation damping control using particle swarm optimization and heuristic dynamic programming,” IEEE Trans. Power Syst., vol. 29, no. 5, pp. 2539–2548, Sep. 2014.
[24] S. Saadatmand, P. Shamsi, and M. Ferdowsi, “The heuristic dynamic programming approach in boost converters,” in Proc. IEEE Texas Power Energy Conf. (TPEC), 2020, pp. 1–6.
[25] G. Sterling and B. Tyler, “Renewable energy management using action dependent heuristic dynamic programming,” in Proc. IEEE Int. Smart Cities Conf. (ISC2), 2018, pp. 1–5.
[26] E. F. Ferreira, J. V. D. F. Neto, and R. R. Selmic, “HDP algorithms for trajectory tracking and formation control of multi-agent systems,” IEEE Access, vol. 10, pp. 27136–27146, 2022.
[27] Q. Liu, T. Li, Q. Shan, R. Yu, and X. Gao, “Virtual guide automatic berthing control of marine ships based on heuristic dynamic programming iteration method,” Neurocomputing, vol. 437, pp. 289–299, May 2021.
[28] M. Ha, D. Wang, and D. Liu, “Event-triggered adaptive critic control design for discrete-time constrained nonlinear systems,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 50, no. 9, pp. 3158–3168, Sep. 2020.
[29] C. Mu, D. Wang, and H. He, “Data-driven finite-horizon approximate optimal control for discrete-time nonlinear systems using iterative HDP approach,” IEEE Trans. Cybern., vol. 48, no. 10, pp. 2948–2961, Oct. 2018.
[30] K. Isa and M. R. Arshad, “An analysis of homeostatic motion control system for a hybrid-driven underwater glider,” in Proc. IEEE/ASME Int. Conf. Adv. Intell. Mechatronics, 2013, pp. 1570–1575.
[31] B. Liu, Y. Ding, and J.-H. Wang, “Intelligent network control system inspired from neuroendocrine-immune system,” in Proc. 6th Int. Conf. Fuzzy Syst. Knowl. Discov., 2009, pp. 136–140.
[32] C. Moy, “Bio-inspired cognitive phones based on human nervous system,” in Proc. 3rd Int. Symp. Appl. Sci. Biomed. Commun. Technol. (ISABEL), 2010, pp. 1–5.
[33] N. Safiannikov and P. Burenev, “Intelligent measuring system for implementation of the method of assessment of functional state of human central nervous system,” in Proc. 20th IEEE Int. Conf. Soft Comput. Meas. (SCM), 2017, pp. 577–579.
[34] D. Ventura and S. Kak, “Quantum computing and neural information processing,” Inf. Sci. Inform. Comput. Sci., Intell. Syst., Appl. Int. J., vol. 128, nos. 3–4, pp. 147–148, 2000.
[35] D. Ventura and T. Martinez, “Quantum associative memory,” Inf. Sci., vol. 124, nos. 1–4, pp. 273–296, 2000.

[36] C. Sauze and M. Neal, “Artificial endocrine controller for power management in robotic systems,” IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 12, pp. 1973–1985, Dec. 2013.
[37] Z. Yang, Y. Ding, Y. Jin, and K. Hao, “Immune-endocrine system inspired hierarchical coevolutionary multiobjective optimization algorithm for IoT service,” IEEE Trans. Cybern., vol. 50, no. 1, pp. 164–177, Jan. 2020.
[38] P. A. D. Castro and F. J. Von Zuben, “Learning ensembles of neural networks by means of a Bayesian artificial immune system,” IEEE Trans. Neural Netw., vol. 22, no. 2, pp. 304–316, Feb. 2011.
[39] L. Deng, P. Yang, and W. Liu, “Artificial immune network clustering based on a cultural algorithm,” vol. 2020, no. 1, pp. 1–17, 2020.
[40] A. K. Khanmamedov and A. F. Mamedova, “A remark on the inverse scattering problem for the perturbed Hill equation,” Math. Notes, vol. 112, no. 1, pp. 281–285, 2022.
[41] N. Wei and Y. Song, “An energy-aware routing strategy based on dynamic priority factor in ad hoc networks,” in Proc. Int. Conf. Inf. Technol., Comput. Eng. Manag. Sci., 2011, pp. 189–192.
[42] J. Luo, Z. Chen, Y. Zhou, and J. Wang, “Optimal transit signal priority method with conflicting requests under connected vehicle environment,” in Proc. IEEE 5th Adv. Inf. Technol., Electron. Autom. Control Conf. (IAEAC), 2021, pp. 308–313.
[43] Ariani and M. Salman, “Modeling study of priority intrusion response selected on intrusion detection system alert,” in Proc. 6th Int. Conf. Sci. Technol. (ICST), 2020, pp. 1–6.
[44] G. Cademartori, L. Oneto, F. Valdenazzi, A. Coraddu, A. Gambino, and D. Anguita, “A review on ship motions and quiescent periods prediction models,” Ocean Eng., vol. 280, Jul. 2023, Art. no. 114822.
[45] K. AL-Bukhaiti, Y. Liu, S. Zhao, and H. Abas, “An application of BP neural network to the prediction of compressive strength in circular concrete columns confined with CFRP,” KSCE J. Civil Eng., vol. 27, pp. 3006–3018, May 2023.
[46] Z.-z. Chang et al., “Model predictive control of long Transfer-line cooling process based on Back-Propagation neural network,” Appl. Thermal Eng., vol. 207, May 2022, Art. no. 118178.
[47] H. Zhou, H. Chen, and J. Li, “Design of multi-parameter fusion coulometer based on self-feedback BP network,” in Proc. Int. Conf. Electr. Control Eng., 2010, pp. 767–770.
[48] D. Zhao, Y. Zeng, and Y. Li, “BackEISNN: A deep spiking neural network with adaptive self-feedback and balanced excitatory–inhibitory neurons,” Neural Netw., vol. 154, pp. 68–77, Oct. 2022.
[49] L. Sun, X. Zhao, and Y. Lv, “Stability analysis and performance improvement of power sharing control in islanded microgrids,” IEEE Trans. Smart Grid, vol. 13, no. 6, pp. 4665–4676, Nov. 2022.
[50] L. Condrachi, R. Vilanova, and M. Barbu, “Data-driven internal model control of an anaerobic digestion process,” in Proc. 25th Int. Conf. Syst. Theory, Control Comput. (ICSTCC), 2021, pp. 504–509.
[51] N. Kumar, H. Malik, A. Singh, M. A. Alotaibi, and M. E. Nassar, “Novel neural network-based load frequency control scheme: A case study of restructured power system,” IEEE Access, vol. 9, pp. 162231–162242, 2021.

Jinying Yang was born in Shandong, China, in 1995. She received the B.S. degree in automation and the M.S. degree in control science and engineering from the China University of Petroleum, Qingdao, China, in 2017 and 2021, respectively. She is currently pursuing the Ph.D. degree in control science and engineering with the Institute of Engineering Technology, University of Science and Technology Beijing, Beijing, China.
Her research interests include nonlinear system modeling and prediction, predictive control, adaptive control systems, and advanced automatic control algorithms and industrial applications.

Yongjun Zhang was born in Shandong, China, in 1973. He received the M.Sc. and Ph.D. degrees in electrical engineering from the University of Science and Technology Beijing, Beijing, China, in 2002 and 2011, respectively.
He is currently a Professor with the Engineering Research Institute, University of Science and Technology Beijing. His research interests include modeling and optimization of complex industrial processes, power electronics and converter technology, advanced automatic control algorithms and industrial applications, knowledge modeling and optimization, pattern recognition, and intelligent control.

Qiang Guo was born in Beijing, China. He received the B.S. degree in industrial automation and the M.S. degree in control science and engineering from the University of Science and Technology Beijing, Beijing, China, in 1995 and 2005, respectively.
He is currently an Associate Professor with the National Engineering Research Center for Advanced Rolling and Intelligent Manufacturing, University of Science and Technology Beijing. His research interests include control theory and its application in production process control, automation of metallurgical production process, and modeling and control of complex processes.

Xiong Xiao was born in Hubei, China. He received the B.S. and Ph.D. degrees in control science and engineering from the University of Science and Technology Beijing, Beijing, China, in 2010 and 2017, respectively.
From 2017 to 2019, he was a Postdoctoral Researcher with the University of Science and Technology Beijing, where he was an Associate Professor with the National Engineering Research Center for Advanced Rolling and Intelligent Manufacturing. His research interests include advanced control algorithms, data-driven power electronics, and electrical transmission.
Dr. Xiao is a reviewer for journals, such as IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, IEEE TRANSACTIONS ON INDUSTRY APPLICATIONS, and Chinese Society for Electrical Engineering Journal of Power and Energy Systems.

Tanju Yildirim was born in Wollongong, NSW, Australia. He received the bachelor's degree in mechanical (engineering) and the Ph.D. degree in philosophy (engineering) from the University of Wollongong, Wollongong, in 2014 and 2017, respectively.
He has worked with Shenzhen University, Shenzhen, China, and the National Institute for Materials Science, Tsukuba, Japan. He is currently a Lecturer with the Faculty of Science and Engineering, Southern Cross University, East Lismore, NSW, Australia. His research interests include nanomechanical sensors, semiconductors, materials science, and acoustic-olfactory technology.

Fei Zhang (Member, IEEE) was born in Hunan, China. He received the B.S. degree in industrial automation and the Ph.D. degree in control theory and control engineering from the University of Science and Technology Beijing, Beijing, China, in 2001 and 2007, respectively.
He is currently an Associate Professor with the Engineering Research Institute, University of Science and Technology Beijing. His research interests include control theory and its application in robust control theory and method for systems with large time delay, detection and control based on pattern recognition, intelligent technology of industrial robot, integrated intelligent condition monitoring, and automatic system real-time diagnosis technology.