An Echo State Gaussian Process-Based Nonlinear Model Predictive Control for Pneumatic Muscle Actuators

Jian Huang, Senior Member, IEEE, Yu Cao, Student Member, IEEE, Caihua Xiong, Member, IEEE, and Hai-Tao Zhang, Senior Member, IEEE
Abstract: Pneumatic muscle actuators (PMAs), a kind of soft/compliant actuator, have attracted a great deal of attention in studies of rehabilitation robots. However, the nonlinearities, uncertainties, hysteresis, and time-varying features of PMAs make high-precision trajectory tracking difficult. In this paper, an echo state Gaussian process-based nonlinear model predictive control (ESGP-NMPC) is designed for PMAs. The proposed strategy comprises an ESGP, which is suitable for modeling unknown nonlinear systems as well as measuring their uncertainties, and a gradient descent optimization algorithm for calculating the control signal sequences. Based on the Lyapunov theorem, the characteristics of the closed-loop system are analyzed to guarantee asymptotic stability. Both simulations and physical experiments are carried out to illustrate the validity of the proposed control strategy. Compared with other conventional methods, the ESGP-NMPC achieves a better model fit for the PMA and better control performance in high-precision tracking tasks.

Note to Practitioners: High-precision control of pneumatic muscle actuators (PMAs) is a vital problem when PMAs are used as actuators of rehabilitation robots, since the patient's safety and the performance of rehabilitation tasks depend largely on the accuracy of the actuators. Conventional model-based control approaches usually require relatively accurate identification of system parameters, which is difficult for the PMA owing to its strongly nonlinear and time-varying characteristics. This paper proposes a new model predictive control method based on an echo state Gaussian process, which can describe the unknown dynamics of a PMA due to its universal approximation property. Through the optimization method, the controller can be realized efficiently and performs better than several comparable methods. By applying this approach, it is possible to achieve not only high-precision control of PMAs but also a certain degree of robustness to the load.

Index Terms: Echo state Gaussian process (ESGP), nonlinear model predictive control (NMPC), pneumatic muscle actuator (PMA).

I. INTRODUCTION

With the development of robotic technologies, robot-assisted therapy has attracted a lot of attention all over the world [1]–[4]. Meanwhile, designing a robot that is suitable for task-oriented rehabilitation therapy is still challenging, since this kind of robot should be compliant and safe for the patients. Traditionally, robots are driven by motors that are usually expensive and lack the necessary compliance. Pneumatic muscle actuators (PMAs)
deficiencies of the existing approaches, we would like to propose an efficient strategy that is easy to implement and has sound theoretical support, in order to obtain satisfactory performance in high-precision tracking tasks of PMAs.

Model predictive control (MPC) is a powerful technique for optimizing the performance of control systems, with wide recognition from academia and industry [15], [16]. Conventionally, MPC has mostly been used in industrial processes because of its easy implementation [17]. In the past, its high computational cost made it difficult to use in motion control, which requires a high sampling frequency. Because of the rapid development of hardware, MPC is starting to be applied in the field of motion control [18]–[20]. This allows the motion controller to be designed efficiently and implemented easily. By adopting the current states as initial states, an optimal control sequence is generated by solving a constrained finite-horizon optimal control problem at each sample time, and the first element of this sequence is applied to the plant [16]. This demands a reasonably accurate plant model that is able to capture the dynamics of the system. Nonetheless, many applications are based on linear models, which leads to poor control performance for highly nonlinear processes. One method to deal with this problem is to linearize the nonlinear plant with a Taylor series expansion and estimate the higher order unknown term to compensate for the nonlinearity [21], [22]. An alternative is to model the nonlinear plant for nonlinear MPC (NMPC), which results in numerically determining the optimal values of the control signal. In this case, an important aspect of a practical, real-time capable MPC scheme is the use of an efficient algorithm [23], [24]. Generally, it is difficult to obtain accurate model parameters when identifying nonlinear physical systems. Fortunately, with the great improvement of artificial intelligence, neural networks have been widely used to approximate unknown systems due to their excellent capability of modeling uncertain and nonlinear systems. This provides an alternative solution for the design of model-free NMPC. Thus, the idea of neural network-based NMPC (NN-NMPC) schemes has gained acceptance. Different applications of NN-NMPC have been investigated in [24] and [25]. At the same time, accurately describing an unknown, complex system remains a challenging problem.

At present, there are various methods that can approximate the dynamics of a nonlinear system. The common function approximators include multilayer perceptrons (MLPs) [26], fuzzy logic systems [27], and recurrent neural networks (RNNs) [28]. However, MLPs are limited in the sense that they are only capable of providing a static map between inputs and outputs (i.e., they have no way of internally representing the dynamics of a nonlinear system). In fuzzy control, the tuning of the fuzzy rules and membership functions depends strongly on experience and is time-consuming. For RNNs, the universal approximation property makes them particularly suitable for modeling dynamical systems with arbitrary precision [29]. Hence, they are more applicable to nonlinear system modeling and control. Despite the potential and capability of RNNs, the main problem is the difficulty of training them, together with the complexity and slow convergence of the training algorithms. The echo state network (ESN) is a newer kind of RNN characterized by the ability to uniquely map a temporal input history to an echo state [30]. In this way, the high computational complexity of the conventional RNN is significantly reduced due to the sparse connections among the hidden neurons. In addition, the learning requirements are reduced to only the weights connecting the hidden layer and the readout neuron, and this simple structure makes the ESN more applicable to physical systems.

However, when the testing data deviate slightly from the training data, ESNs may be ill-posed and accompanied by large output weights, which weakens their generalization capability [31]. One solution to this problem is an extension of the ESN called the echo state Gaussian process (ESGP), which fuses the ESN with Bayesian inference for Gaussian processes (GPs). The ESGP model generalizes the conventional ESN, trained by means of ridge or linear regression, under a Bayesian perspective, and includes these variants as special subcases [32]. As a result, the ESGP combines the merits of both the ESN and the GP. The ESGP provides not only the prediction but also a measure of the model uncertainty. In addition, in contrast to the high computational complexity of conventional neural networks, the ESGP can be trained by type-II maximum likelihood, which turns out to be computationally efficient. Although the ESGP has been available for years, there are few studies concerning its application in control systems, except for [33].

In this paper, an ESGP-NMPC strategy is developed for stabilizing the transformed model of the tracking error dynamics of a PMA plant. The main contributions of this paper are: 1) the proposed control strategy, called ESGP-NMPC, for high-precision tracking tasks of PMAs; 2) the stability analysis of the closed-loop system, which depends on the reference trajectory; and 3) the experimental studies validating the proposed control approach.

The rest of this paper is organized as follows. In Section II, the three-element model of the PMA is presented. In addition, the architectures and formulations of the ESN and the ESGP are stated. In Section III, the overall NMPC procedure is given, together with the detailed derivation of the ESGP differential coefficients. Then, in Section IV, the stability of the proposed ESGP-NMPC is analyzed. Simulations and practical experiments are presented in Sections V and VI to demonstrate the effectiveness and robustness of the strategy. The conclusions are given in Section VII.

A. Model Formulation

A generalized three-element model of the PMA is often assumed in its control applications [10]. Fig. 1 shows the PMA placed in the vertical position with a load attached to the bottom. As the air pressure of the inner bladder changes, the contractile length of the muscle changes correspondingly. When the rubber bladder expands, the diameter increases in the radial direction and, simultaneously, the length decreases in the axial direction. In this way, a force is generated in the axial direction. The dynamic behavior of a PMA hanging vertically

Fig. 2. Block diagram of ESN.

Fig. 1. (a) Dynamic principle of PMA. (b) Three-element model of PMA.

can be described as

$$ M\ddot{y} + B(P)\dot{y} + K(P)y = F(P) - Mg \tag{1} $$

where $M$ represents the mass of the load, $g$ is the gravitational acceleration, $y$ denotes the contractile length of the PMA, $y = 0$ corresponds to the fully deflated position, and $K(P)$, $B(P)$, and $F(P)$ are the spring coefficient, the damping coefficient, and the effective force, respectively. They can be expressed as

$$ \begin{aligned} K(P) &= K_0 + K_1 P \\ F(P) &= F_0 + F_1 P \\ B_i(P) &= B_{0i} + B_{1i} P \quad \text{(inflation)} \\ B_d(P) &= B_{0d} + B_{1d} P \quad \text{(deflation).} \end{aligned} \tag{2} $$

The value of $F(P)$ relates to the geometric factors of the sheath as well as the thickness of the sheath/bladder. All the coefficients depend on the air pressure of the inner bladder, and the damping coefficient also depends on whether the PMA is in an inflation or deflation period. Therefore, this system can be regarded as a SISO system in which the input air pressure and the displacement of the PMA are the system's input $u_c$ and output $y$, respectively. Let $T$ be the sampling time. According to (1) and (2), the dynamics of the PMA can be rewritten in the following discrete form:

$$ \begin{cases} y(t+1) = \alpha_1 y(t) - \alpha_2 y(t-1) + \alpha_3 \\ \alpha_1 = 2 - M^{-1}T[B_0 + B_1 u_c(t-1)] \\ \alpha_2 = 1 + M^{-1}T[B_0 + B_1 u_c(t-1)] + M^{-1}T[K_0 + K_1 u_c(t-1)] \\ \alpha_3 = M^{-1}T^2[F_0 + F_1 u_c(t-1) - Mg]. \end{cases} \tag{3} $$

B. Echo State Network

The ESN contains $K$ inputs in the input layer, $N$ neurons in the dynamic reservoir, and $L$ neurons in the output layer. Fig. 2 shows the neural network structure. The dynamics of an ESN are given by

$$ x(t+1) = f[W^{in}u(t+1) + Wx(t) + W^{back}y(t)] \tag{4} $$

$$ y(t+1) = W^{out}[x(t+1)^T, u(t+1)^T]^T \tag{5} $$

where $f(\cdot)$ is the nonlinear activation function of the internal units, and $W^{in} \in \mathbb{R}^{N \times K}$, $W \in \mathbb{R}^{N \times N}$, and $W^{back} \in \mathbb{R}^{N \times L}$ are the input–hidden, hidden–hidden, and output–hidden connection weight matrices, called the input matrix, internal matrix, and feedback matrix, respectively. $x(t) = [x_1(t), \ldots, x_N(t)]^T \in \mathbb{R}^N$, $u(t) = [u_1(t), \ldots, u_K(t)]^T \in \mathbb{R}^K$, and $y(t+1) = [y_1(t+1), \ldots, y_L(t+1)]^T \in \mathbb{R}^L$ represent the reservoir states, input, and output, respectively. The ESN is a well-known form of reservoir computing. The network evolves by iterating (4) and (5).

Remark 1: The universal computation and approximation properties of the ESN have been studied and advanced in the field of liquid state machines [34], [35]. In addition, [30, Definition 1] implies that nearby echo states must correspond to similar input histories. Intuitively, the echo states may represent the current dynamics or states of the system that the ESN is modeling [36]. In addition, the existence of echo states can be verified using the necessary condition $\rho(W) < 1$ and the sufficient condition $\sigma_{\max}(W) < 1$, where $\rho(W)$ and $\sigma_{\max}(W)$ denote the spectral radius and the largest singular value of the reservoir weight matrix, respectively. Because the outputs of the ESN from the squashing function $f(x) = \tanh(x)$ are bounded for all inputs, bounded-input bounded-output stability is satisfied. More rigorous bounds for guaranteeing asymptotic stability of ESNs can be found in [36] and [37].

C. Echo State Gaussian Process [32]

The ESGP is an extension of the ESN that uses a GP to regress against the augmented reservoir states of the following form:

$$ \phi(t) = [x(t)^T, u(t)^T]^T \tag{6} $$

where $x(t)$ is the vector of reservoir states and $u(t)$ denotes the ESN's input. Then, the $j$th output of the ESN can be written as

$$ y_j(t) = w_j^T \phi(t) \tag{7} $$

where $w_j^T$ represents the $j$th row of $W^{out}$.

Applying the assumption of the GP, we impose a Gaussian prior over the output weights of the ESN:

$$ w_j \sim \mathcal{N}(0, I). \tag{8} $$

On the basis of probability theory, $y_j(t)$ obeys a Gaussian distribution. Then, the mean and the covariance corresponding to time instants $t_1$ and $t_2$ can be obtained as

$$ E[y_j(t)] = E[w_j^T \phi(t)] = 0 \tag{9} $$

$$ E[y_j(t_1) y_j(t_2)] = \phi(t_1)^T \phi(t_2). $$
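As an illustration of (4), (6), and the covariance above, the following is a minimal NumPy sketch, not the authors' implementation. It assumes $f = \tanh$ and $W^{back} = 0$; the helper names `make_reservoir` and `augmented_states`, the weight ranges, the 30% connection density, the target spectral radius 0.9, and the sinusoidal demo input are all hypothetical choices made only for this example.

```python
import numpy as np

def make_reservoir(n_res, n_in, rho=0.9, density=0.3, seed=0):
    """Draw random W_in and a sparse W, then rescale W to spectral radius rho."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
    W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    W = W * (rng.random((n_res, n_res)) < density)  # keep roughly 30% of the links
    W = W * (rho / np.max(np.abs(np.linalg.eigvals(W))))
    return W_in, W

def augmented_states(W_in, W, inputs):
    """Iterate (4) with W_back = 0 and f = tanh; return rows phi(t) = [x(t); u(t)] as in (6)."""
    x = np.zeros(W.shape[0])
    rows = []
    for u in inputs:
        x = np.tanh(W_in @ u + W @ x)
        rows.append(np.concatenate([x, u]))
    return np.asarray(rows)

# Hypothetical demo input: K = 3 channels of a slow sinusoid over s = 50 steps.
steps = np.arange(50)
inputs = np.stack(
    [np.sin(0.1 * steps), np.sin(0.1 * (steps - 1)), np.sin(0.1 * (steps - 2))],
    axis=1,
)

W_in, W = make_reservoir(n_res=20, n_in=3)
phis = augmented_states(W_in, W, inputs)   # shape (s, N + K)
K_r = phis @ phis.T                        # K_r[a, b] = phi(t_a)^T phi(t_b)
spectral_radius = np.max(np.abs(np.linalg.eigvals(W)))
```

Because the covariance is a plain inner product of augmented states, `K_r` is symmetric and positive semidefinite by construction, which is what allows it to serve as a GP covariance matrix.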
Thus, $y_j(t_1)$ and $y_j(t_2)$ obey a joint Gaussian distribution with zero mean and covariance $\phi(t_1)^T \phi(t_2)$ for any $j \in \{1, \ldots, L\}$ and all $t_1, t_2$. Based on the above analysis, the distribution of the ESN outputs turns out to be a GP

$$ [y_j(t)]_{t=t_1}^{t_s} \sim \mathcal{N}(0, K_r(\Phi, \Phi)) $$

$$ \Phi = [\phi(t_1), \ldots, \phi(t_s)] $$

$$ K_r(\Phi, \Phi) = \begin{bmatrix} \phi(t_1)^T \phi(t_1) & \cdots & \phi(t_1)^T \phi(t_s) \\ \vdots & \ddots & \vdots \\ \phi(t_s)^T \phi(t_1) & \cdots & \phi(t_s)^T \phi(t_s) \end{bmatrix} \tag{11} $$

where $t_1, \ldots, t_s$ denote a series of time points.

With the linear regression method for GPs, the predictive mean and variance of the ESGP can be expressed as

$$ E_{j*} = \phi(t_*)^T A^{-1} \Phi \tilde{y}_j \tag{12} $$

$$ \sigma_*^2 = \sigma^2 \phi(t_*)^T A^{-1} \phi(t_*) \tag{13} $$

where

$$ A = \Phi \Phi^T + \sigma^2 I_{N_k} \in \mathbb{R}^{(N+K) \times (N+K)} $$

and $\tilde{y}_j = [\tilde{y}_j(t_\tau)]_{t_\tau = t_1}^{t_s}$ denotes the series of ESN outputs $\tilde{y}_j = y_j + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, i.e., the outputs superimposed with an independent white Gaussian noise signal, at the historical time points $t_1, \ldots, t_s$. Examining (12), the ESGP calculates the predictive mean by combining the augmented reservoir states at time point $t_*$ with the historical states of the ESN, its inputs, and the corresponding output values. In other words, $D = \{(\phi(t_i), \tilde{y}_j(t_i)) \mid i = 1, \ldots, s\}$ can be regarded as the training set, and $\phi(t_*)$ acts as the testing input at time point $t_*$. Because the ESGP relies on linear regression, the only parameter that needs to be optimized is the noise variance $\sigma^2$ of the GP model, which is usually done by type-II maximum likelihood [38]. The model evidence is given by

$$ \log p(\tilde{y}_j \mid \Phi, \sigma^2) = -\frac{s}{2} \log 2\pi - \frac{1}{2} \log |K_r(\Phi, \Phi) + \sigma^2 I| - \frac{1}{2} \tilde{y}_j^T (K_r(\Phi, \Phi) + \sigma^2 I)^{-1} \tilde{y}_j. \tag{14} $$

III. ECHO STATE GAUSSIAN PROCESS-BASED NONLINEAR MODEL PREDICTIVE CONTROL

A. Learning Procedure

To describe the approximation model of the PMA, we first give the training process of the ESN and then generalize it to the ESGP for forecasting the output of the PMA. Based on (3), the output of the PMA at time $t$ is directly related to the control signal at time $t-1$ and the outputs at the two previous time steps. We model the PMA by the nonlinear autoregressive exogenous model

$$ y(t) = f_u([u_c(t-1), y(t-1), y(t-2)]^T) \tag{15} $$

where $u_c$ and $y$ are the input air pressure and the displacement of the PMA, respectively. Let $u(t) = [u_c(t-1), y(t-1), y(t-2)]^T$. Training an ESN amounts to solving a linear regression after constructing the training set and the testing set. The online learning procedure is briefly as follows.

1) Randomly generate the matrices $W^{in}$, $W$, and $W^{back}$, and scale the weight matrix $W$ so that its largest eigenvalue magnitude satisfies $|\lambda_{\max}| < 1$.
2) Drive the network with the training data by computing the internal states $x(t)$ according to (4).
3) Collect the state $x(t)$ between time points $t_1$ and $t_s$ as a row into a state matrix and, similarly, collect the corresponding system output into a teacher matrix.
4) Compute the output weight matrix $W^{out}$ by the online recursive least squares algorithm, using the collected state matrix and teacher matrix.

In this way, the training procedure is finished, and the trained network can be driven by new input sequences to predict new outputs.

Fig. 3. Control structure of the proposed control scheme.

Fig. 3 shows the structure of the proposed control strategy, in which the ESGP model approximates the dynamic behavior of the PMA using the historical information of the plant. Equation (12) can be regarded as the predicted output of the ESGP model, in which $\phi(t_*)$ serves as the testing input corresponding to the future time point $t_* = t + i$ $(i = 1, \ldots, N_p)$. In this way, $E_{j*}$ is the predicted output $y_m(t+i)$ of the PMA system at the future time point $t+i$. $E_{j*}$ can be divided into two parts. One is the testing part, i.e., $\phi(t+i) = [x(t+i)^T, u(t+i)^T]^T$, in which the reservoir states are updated by iteratively calculating the internal states and the prediction. The other is the training part $A^{-1} \Phi \tilde{y}_j$. Letting $\bar{W}^{out} = (A^{-1} \Phi \tilde{y}_j)^T$, the ESGP predictive mean can be rewritten as $E_{j*} = \bar{W}^{out} \phi(t_*)$. The predictive mean of the ESGP and the output of the ESN given by (5) therefore have the same form. In addition, $\bar{W}^{out}$ and $W^{out}$ are fixed matrices once the training process is completed. Thus, to cover the output of both the ESGP and the ESN, let $\hat{W}^{out}$ represent $\bar{W}^{out}$ or $W^{out}$, depending on whether the predictive model is the ESGP or the ESN. The unified form can be expressed as

$$ y_m(t+i) = \hat{W}^{out} \phi(t+i). \tag{16} $$

In this way, (4) can be rewritten as

$$ x(t+1) = f[W^{in} u(t+1) + W x(t) + W^{back} y_m(t)]. \tag{17} $$
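The ESGP quantities in (12)–(14) and the readout $\bar{W}^{out}$ can be sketched numerically as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: the function names `esgp_fit`, `esgp_predict`, and `log_evidence` are hypothetical, the data are synthetic, and the coarse grid search over $\sigma^2$ is only a stand-in for a proper type-II maximum likelihood optimizer.

```python
import numpy as np

def esgp_fit(Phi, y_tilde, sigma2):
    """Phi has columns phi(t_1..t_s). Returns W_bar_out = A^{-1} Phi y and A^{-1}, cf. (12)."""
    d = Phi.shape[0]
    A = Phi @ Phi.T + sigma2 * np.eye(d)
    A_inv = np.linalg.inv(A)
    return A_inv @ Phi @ y_tilde, A_inv

def esgp_predict(phi_star, w_bar, A_inv, sigma2):
    """Predictive mean (12) and variance (13) at the test state phi_star."""
    mean = phi_star @ w_bar
    var = sigma2 * (phi_star @ A_inv @ phi_star)
    return mean, var

def log_evidence(Phi, y_tilde, sigma2):
    """Log marginal likelihood (14) with K_r = Phi^T Phi."""
    s = y_tilde.size
    C = Phi.T @ Phi + sigma2 * np.eye(s)
    _, logdet = np.linalg.slogdet(C)
    quad = y_tilde @ np.linalg.solve(C, y_tilde)
    return -0.5 * s * np.log(2.0 * np.pi) - 0.5 * logdet - 0.5 * quad

# Synthetic stand-in for collected augmented states and noisy outputs (assumption).
rng = np.random.default_rng(1)
d, s = 5, 40
Phi = rng.normal(size=(d, s))
w_true = rng.normal(size=d)
y_tilde = w_true @ Phi + 0.1 * rng.normal(size=s)

# Type-II maximum likelihood approximated by a coarse grid search (assumption).
grid = [1e-3, 1e-2, 1e-1, 1.0]
sigma2 = max(grid, key=lambda v: log_evidence(Phi, y_tilde, v))

w_bar, A_inv = esgp_fit(Phi, y_tilde, sigma2)
phi_star = rng.normal(size=d)
mean, var = esgp_predict(phi_star, w_bar, A_inv, sigma2)
```

The weight-space form used here inverts an $(N+K) \times (N+K)$ matrix rather than an $s \times s$ kernel matrix, so its cost does not grow cubically with the number of training samples; both forms give the same mean and variance.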

To compensate for the error between the prediction and the plant's actual output, the compensation term $y(t) - y_m(t)$ is also computed. Combining the compensation and the prediction, the actual predicted output can be described as

$$ \hat{y}(t+i) = y_m(t+i) + (y(t) - y_m(t)) \tag{18} $$

where $y_m$, $\hat{y}$, and $y$ represent the predicted output of the model, the actual predicted output, and the output of the system, respectively. As soon as $\hat{y}$ is calculated, the error between this signal and the reference $r(t)$ is fed into the NMPC controller to obtain the control signal.

B. Nonlinear Model Predictive Control Scheme

The basic idea of NMPC is that the current control inputs are chosen to minimize a cost function over several future steps for stable control.

The formulation based on constrained finite-horizon optimization can be described as

$$ J(t) = \sum_{i=1}^{N_p} [r(t+i) - \hat{y}(t+i)]\, W_i^y\, [r(t+i) - \hat{y}(t+i)] + \sum_{j=1}^{N_u} \Delta u_c(t+j-1)\, W_j^u\, \Delta u_c(t+j-1) \tag{19} $$

subject to

$$ \begin{aligned} &|\Delta u_c| < \Delta u_{c\max} \\ &u_{c\min} \le u_c(t) \le u_{c\max} \\ &\hat{y}_{\min} \le \hat{y}(t) \le \hat{y}_{\max} \end{aligned} $$

where $r$ and $u_c$ represent the reference and the control input, $\Delta u_c$ is the incremental control signal, $N_p$ is the prediction horizon, $N_u$ is the control horizon, which must be smaller than $N_p$ $(N_u < N_p)$, and $W_i^y$ and $W_j^u$ are the weight parameters. According to the MPC scheme, the iterative optimization repeats at each time step over the finite prediction horizon $N_p$, and its solution leads to an optimal control sequence.

C. Optimization Method

To minimize the cost function (19) over several steps in the future, the PMA system is considered. The input air pressure and the PMA's contractile length serve as the input and the output of the system. The following vectors are defined to describe the MPC scheme:

$$ \begin{aligned} \mathbf{r}(t) &= [r(t+1), \ldots, r(t+N_p)]^T \\ \hat{\mathbf{y}}(t) &= [\hat{y}(t+1), \ldots, \hat{y}(t+N_p)]^T \\ \mathbf{e}(t) &= [e(t+1), \ldots, e(t+N_p)]^T \\ \mathbf{u}_c(t) &= [u_c(t+1), \ldots, u_c(t+N_u)]^T \\ \Delta\mathbf{u}_c(t) &= [\Delta u_c(t), \ldots, \Delta u_c(t+N_u-1)]^T. \end{aligned} $$

Then, the cost function of the NMPC can be rewritten as

$$ J(t) = \mathbf{e}(t)^T W^y \mathbf{e}(t) + \Delta\mathbf{u}_c(t)^T W^u \Delta\mathbf{u}_c(t) \tag{20} $$

where $\mathbf{e}(t) = \mathbf{r}(t) - \hat{\mathbf{y}}(t)$. In practice, for simplicity, the weight parameters can be chosen as $W^y = \rho_y I$ and $W^u = \rho_u I$ $(\rho_y, \rho_u > 0)$.

To minimize $J(t)$ at each time step $t$, we consider the gradient descent algorithm, in which the control input sequence $\mathbf{u}_c(t)$ is updated as follows:

$$ \mathbf{u}_c(t+1) = \mathbf{u}_c(t) - \eta \frac{\partial J(t)}{\partial \mathbf{u}_c(t)} \tag{21} $$

where $\eta = \rho_\eta I$ $(\rho_\eta > 0)$ is the learning rate for the control signals. $\partial J(t)/\partial \mathbf{u}_c(t)$ is a vector, which can be derived from the predictive model (the ESN or the ESGP) based on the control input sequence.

It is obvious that

$$ \frac{\partial J(t)}{\partial \mathbf{u}_c(t)} = \frac{\partial \mathbf{e}(t)^T}{\partial \mathbf{u}_c(t)} W^y \mathbf{e}(t) + W^u \Delta\mathbf{u}_c(t). \tag{22} $$

From (21) and (22), the increments of the control input vector $\Delta\mathbf{u}_c(t)$ are obtained as follows:

$$ \Delta\mathbf{u}_c(t) = -(I + \eta W^u)^{-1} \eta \frac{\partial \mathbf{e}(t)^T}{\partial \mathbf{u}_c(t)} W^y \mathbf{e}(t) = (I + \eta W^u)^{-1} \eta \frac{\partial \hat{\mathbf{y}}(t)^T}{\partial \mathbf{u}_c(t)} W^y \mathbf{e}(t) \tag{23} $$

where the Jacobian matrix $\partial \hat{\mathbf{y}}(t)^T / \partial \mathbf{u}_c(t)$ can be derived from the predictive model based on the control input sequence.

Remark 2: To achieve real-time operation, we first rely on reservoir computing with sparse connections among the hidden neurons, which can significantly reduce the computational complexity. Note that there are $K$ inputs in the input layer and $N$ internal neurons in the reservoir. Hence, because of the matrix inversion in (12), the computational complexity of the ESGP is $O((N+K)^3)$. On the other hand, the computational complexity of the gradient descent algorithm depends on the size of the reservoir, as well as on the prediction horizon $N_p$ and the control horizon $N_u$. The control signal is obtained by calculating the increment of the control signal $\Delta u$. According to (23) and (39), the complexity of the proposed gradient descent method is $O((N_p^2 + N_u^2) N^2)$. Although this complexity is high, the small scale of the calculation ($K = 3$, $N = 20$, $N_p = 2$, and $N_u = 1$) means that the time consumption of the proposed method can still meet the requirement of this application. Therefore, based on the above analysis, the real-time requirement of the proposed control strategy can be satisfied.

Remark 3: NMPC requires accurate computation of gradients and lacks convexity [39], so it is difficult to obtain the optimal solution. In addition, typical numerical solvers are not able to distinguish a local optimum from a global optimum. In this application, the control strategy requires fast update intervals within a highly resource-limited computing environment. We cannot expect that the optimization solver will be allowed to execute until strict tolerances on the optimality conditions or stopping criteria are met in every case. In this situation, we prefer to get a feasible solution instead of an optimal solution. A recent survey of optimization algorithms for NMPC is given in [40]. Because of

nonconvexity, most algorithms find local minima or stationary points rather than global minima. On the other hand, stability may still be achieved using suboptimal (approximate) MPC. This indicates that model predictive controllers that require feasibility rather than optimality have a much better prospect of being implemented when the system is nonlinear [16], [41].

Remark 4: Adaptive control (MIT rule) is designed with a reference model. A cost function that depends on the errors between the controlled object and the reference model is then generated. Based on the negative gradient direction of the cost function, the reference model parameters can be adaptively adjusted. The main idea of adaptive control is to drive the reference model to approximate the controlled object [42], [43]. This requires a reasonable model of the object. However, in this paper, the ESGP model is not a reference model. It can be regarded as a global approximator. By training this model online, the ESGP can approach the dynamic model of the PMA with higher accuracy. The significant difference between adaptive control and the proposed method is that adaptive control is still a model-based control strategy, while the proposed method is a data-driven strategy. Furthermore, this strategy includes the idea of adaptive control. The gradient descent algorithm makes the predicted trajectory approach the reference trajectory, which results in accurate tracking.

D. Echo State Gaussian Process-Based NMPC

To get the Jacobian matrix $\partial \hat{\mathbf{y}}(t)^T / \partial \mathbf{u}_c(t)$, we consider the situation $i \ge j$ for given time points $t+i$ and $t+j$, since the term $\partial \hat{y}(t+i) / \partial u_c(t+j)$ is equal to zero when $i < j$.

Activated by the input information, the reservoir states are updated by a trace of previously acquired reservoir states and the system output. Thus, the relationship of the reservoir states at different time instants can be obtained as

$$ \frac{\partial x(t+i)^T}{\partial x(t+j)} = \begin{cases} \displaystyle\prod_{l=j}^{i-1} \frac{\partial x(t+l+1)^T}{\partial x(t+l)}, & i > j \\ I, & i = j \\ 0, & i < j. \end{cases} \tag{24} $$

The relationship between the reservoir states and the control signals can be obtained from (17) as follows:

$$ \frac{\partial x(t+i)^T}{\partial u_c(t+j)} = \frac{\partial x(t+j)^T}{\partial u_c(t+j)} \frac{\partial x(t+i)^T}{\partial x(t+j)} \tag{25} $$

where $u_c(t+j)$ is the first element of the input vector.

This indicates that the input vector is composed of the control signal $u_c$ and the actual predictions $\hat{y}$ of the PMA model. Thus, according to (16) and (18), it follows that

$$ \frac{\partial \hat{y}(t+i)}{\partial u_c(t+j)} = \frac{\partial \phi(t+i)^T}{\partial u_c(t+j)} \frac{\partial \hat{y}(t+i)}{\partial \phi(t+i)} \tag{26} $$

$$ \frac{\partial \hat{y}(t+i)}{\partial \phi(t+i)} = \frac{\partial y_m(t+i)}{\partial \phi(t+i)} = (\hat{W}^{out})^T. \tag{27} $$

Meanwhile, we have

$$ \frac{\partial \phi(t+i)^T}{\partial u_c(t+j)} = \frac{\partial x(t+i)^T}{\partial u_c(t+j)} \frac{\partial \phi(t+i)^T}{\partial x(t+i)}. \tag{28} $$

From (6), it follows that

$$ \frac{\partial \phi(t+i)^T}{\partial x(t+i)} = \begin{pmatrix} 1 & \cdots & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ 0 & \cdots & 1 & \cdots & 0 \end{pmatrix} = \begin{bmatrix} I_{N \times N} & 0_{N \times K} \end{bmatrix} = \bar{I} \in \mathbb{R}^{N \times (N+K)} \tag{29} $$

where $I_{N \times N}$ is the identity matrix of dimension $N$ and $0_{N \times K}$ is the zero matrix of dimension $N \times K$.

Combining (24)–(29), we have

$$ \frac{\partial \hat{y}(t+i)}{\partial u_c(t+j)} = \frac{\partial x(t+j)^T}{\partial u_c(t+j)} \frac{\partial x(t+i)^T}{\partial x(t+j)} \bar{I} (\hat{W}^{out})^T. \tag{30} $$

To analyze the relationship between the reservoir states and the model inputs, (17) can be written as

$$ x(t+j) = f \begin{pmatrix} g_1(t+j-1) \\ \vdots \\ g_N(t+j-1) \end{pmatrix} \tag{31} $$

where

$$ g_r(t+j-1) = \sum_{s=1}^{K} w_{rs}^{in} u_s(t+j) + \sum_{s=1}^{N} w_{rs} x_s(t+j-1) + \sum_{s=1}^{L} w_{rs}^{back} y_{m,s}(t+j-1). \tag{32} $$

Then, we have

$$ \frac{\partial x(t+j)^T}{\partial u_c(t+j)} = W_u^{in} \Gamma(t+j-1) \tag{33} $$

where $W_u^{in} = [w_{11}^{in}, \ldots, w_{N1}^{in}]$ and $\Gamma(t+j-1) = \mathrm{diag}\{f'(g_1(t+j-1)), \ldots, f'(g_N(t+j-1))\}$.

Since the dynamic reservoir states contain the information of historical outputs, we can rewrite the reservoir state equation as

$$ x_m(t+l+1) = f\left( \sum_{n=1}^{N} \left( w_{mn} + \sum_{s=1}^{L} w_{ms}^{back} \hat{w}_{sn}^{out} \right) x_n(t+l) + \sum_{s=1}^{K} w_{ms}^{in} u_s(t+l+1) \right) \tag{34} $$

where $x_m$ and $x_n$ are elements of the reservoir states, and $w_{mn}$, $w_{ms}^{back}$, and $\hat{w}^{out}$ are elements of the internal weights, feedback weights, and output weights, respectively.

Then, it follows that

$$ \frac{\partial x_m(t+l+1)}{\partial x_n(t+l)} = \left( w_{mn} + \sum_{s=1}^{L} w_{ms}^{back} \hat{w}_{sn}^{out} \right) f'(g_m(t+l)). \tag{35} $$

In addition, we have the Jacobian matrix $\partial x(t+l+1)^T / \partial x(t+l)$

$$ \frac{\partial x(t+l+1)^T}{\partial x(t+l)} = \begin{bmatrix} \dfrac{\partial x_1(t+l+1)}{\partial x_1(t+l)} & \cdots & \dfrac{\partial x_N(t+l+1)}{\partial x_1(t+l)} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial x_1(t+l+1)}{\partial x_N(t+l)} & \cdots & \dfrac{\partial x_N(t+l+1)}{\partial x_N(t+l)} \end{bmatrix}. \tag{36} $$

Thus

$$ \frac{\partial x(t+l+1)^T}{\partial x(t+l)} = (W^T + \bar{W}) \Gamma(t+l) \tag{37} $$

where

$$ \bar{W} = \begin{bmatrix} \displaystyle\sum_{s=1}^{L} w_{1s}^{back} \hat{w}_{s1}^{out} & \cdots & \displaystyle\sum_{s=1}^{L} w_{Ns}^{back} \hat{w}_{s1}^{out} \\ \vdots & \ddots & \vdots \\ \displaystyle\sum_{s=1}^{L} w_{1s}^{back} \hat{w}_{sN}^{out} & \cdots & \displaystyle\sum_{s=1}^{L} w_{Ns}^{back} \hat{w}_{sN}^{out} \end{bmatrix} $$

$$ \Gamma(t+l) = \mathrm{diag}\{f'(g_1(t+l)), \ldots, f'(g_N(t+l))\}. $$

From (24), (26), (33), and (37), and letting

$$ Q = \prod_{l=j}^{i-1} [(W^T + \bar{W}) \Gamma(t+l)] \tag{38} $$

we finally get

$$ \frac{\partial \hat{y}(t+i)}{\partial u_c(t+j)} = \begin{cases} W_u^{in} \Gamma(t+j-1) Q \bar{I} (\hat{W}^{out})^T, & i > j \\ W_u^{in} \Gamma(t+j-1) \bar{I} (\hat{W}^{out})^T, & i = j \\ 0, & i < j. \end{cases} \tag{39} $$

IV. STABILITY ANALYSIS

Since it is essential for successful applications, the stability of the closed-loop system is a vital issue and requires careful investigation. In this section, a Lyapunov function is proposed and the stability of the closed-loop system is analyzed in two situations, under the condition that the ESGP can approximate the dynamics of the PMA, as described in Remark 1.

Theorem 1: Assume that the parameter matrix $\eta^{-1} + W^u$ is positive definite and that the feedback matrix $W^{back}$ is set to zero.
1) If the reference trajectory does not change over time, the asymptotic stability of the proposed ESGP-NMPC can be guaranteed.
2) If the reference trajectory changes over time, the system will be passive.

Proof: Let us define the function

$$ V(t) = \frac{1}{2} \mathbf{e}(t)^T W^y \mathbf{e}(t). \tag{40} $$

When the weight parameter $W^y$ is positive definite, the function is nonnegative $[V(t) \ge 0]$.

The change of the function can be obtained in the following form:

$$ \frac{\partial V(t)}{\partial t} = \frac{\partial \mathbf{e}(t)^T}{\partial t} W^y \mathbf{e}(t). \tag{41} $$

Now, we want to compute $\partial \mathbf{e}(t)^T / \partial t$, which relates directly to the model prediction. According to (16), we only need to consider the augmented reservoir states $\phi(t+i) = [x(t+i)^T, u(t+i)^T]^T$, since the output matrix $\hat{W}^{out}$ is fixed once the training process is finished. Meanwhile, the feedback matrix $W^{back}$ is set to be a zero matrix. By iteratively updating the reservoir states, the relationship between the future states and the historical states can be expressed as

$$ \begin{aligned} x(t+N_p) &= f[W^{in} u(t+N_p) + W x(t+N_p-1)] \\ &= f[W^{in} u(t+N_p) + W f(W^{in} u(t+N_p-1) + W x(t+N_p-2))] \\ &\;\;\vdots \\ &= f\{W^{in} u(t+N_p) + W f[W^{in} u(t+N_p-1) + W f(W^{in} u(t+N_p-2) \\ &\qquad + W f(\cdots f(W^{in} u(t+1) + W x(t))))]\}. \end{aligned} \tag{42} $$

Because the historical states $x(t)$ are known, the derivative of $x(t)$ with respect to time is equal to 0. Thus, $\partial \mathbf{e}(t)^T / \partial t$ only relates to the network inputs $u(t+1), \ldots, u(t+N_p)$, in which the control signal $u_c$ is the first element.

According to the previous analysis and the chain rule, $\partial \mathbf{e}(t)^T / \partial t$ can be obtained as

$$ \frac{\partial \mathbf{e}(t)^T}{\partial t} = -\frac{\partial \mathbf{u}_c(t)^T}{\partial t} \frac{\partial \hat{\mathbf{y}}(t)^T}{\partial \mathbf{u}_c(t)} + \frac{\partial \mathbf{r}(t)^T}{\partial t} \tag{43} $$

where $\partial \mathbf{r}(t)^T / \partial t$ is the derivative of the reference. From (23), the following relationship involving $\Delta\mathbf{u}_c(t)$, which is needed to evaluate the error derivative $\partial \mathbf{e}(t)^T / \partial t$, can be obtained:

$$ \frac{\partial \hat{\mathbf{y}}(t)^T}{\partial \mathbf{u}_c(t)} W^y \mathbf{e}(t) = (\eta^{-1} + W^u) \Delta\mathbf{u}_c(t). \tag{44} $$

Combining (41), (43), and (44), the derivative $\partial V(t) / \partial t$ can be expressed as

$$ \frac{\partial V(t)}{\partial t} = -\frac{\partial \mathbf{u}_c(t)^T}{\partial t} (\eta^{-1} + W^u) \Delta\mathbf{u}_c(t) + \frac{\partial \mathbf{r}(t)^T}{\partial t} W^y \mathbf{e}(t). \tag{45} $$

Approximately, we have

$$ \Delta V(t) = -\Delta\mathbf{u}_c(t)^T (\eta^{-1} + W^u) \Delta\mathbf{u}_c(t) + \Delta\mathbf{r}(t)^T W^y \mathbf{e}(t). \tag{46} $$

First, consider the case in which $\Delta\mathbf{r}(t)$ is equal to zero. The asymptotic stability of the proposed ESGP-NMPC can be guaranteed when the parameter $\eta^{-1} + W^u$ is positive definite, since the function can then be seen as a Lyapunov function and $\Delta V(t)$ is less than or equal to zero $(\Delta V(t) \le 0)$. This corresponds to the situation in which the reference does not change over time.

Considering the second case, in which $\Delta\mathbf{r}(t)$ is not equal to zero, the closed-loop system can be regarded as a system,

where $r(t)$ serves as the input and $W_y e(t)$ serves as the output. Now, the function $V(t)$ is the storage function. Then, the system will be passive since $r(t)^T W_y e(t) \ge \dot{V}(t)$ when $\eta^{-1} + W_u$ is positive definite.

It is worth mentioning that our applications of the PMA are mainly aimed at rehabilitation robots that help patients complete the rehabilitation training process. In this case, the reference trajectory usually changes slowly, so situation 1 is satisfied in most cases.

V. SIMULATION STUDIES

In this section, the proposed ESGP-NMPC strategy is applied to control the PMA plant. A sinusoidal trajectory is adopted as the tracking reference to demonstrate the validity of the proposed control approach in the following simulation. It is worth mentioning that [24] and [44] proposed an NMPC scheme with an MLP and an RNN, respectively. Inspired by them, we utilized control strategies called MLP-NMPC and RNN-NMPC for comparison. To evaluate and compare performance on high-precision position tracking tasks, the ESN, RNN, and MLP are applied in order to show whether the ESGP is able to build a more precise model and whether the ESGP-NMPC can improve the tracking performance. All the simulations were programmed with MATLAB, version 2013b, and were run on a PC with a clock speed of 3.6 GHz and 8-GB RAM in a Microsoft Windows 10 environment.

A. System Conditions

This simulation is conducted to demonstrate the effectiveness of the proposed control strategy for the reference trajectory

$$y_d = A_x \cos(2\pi f t - \pi/2) + B_x \tag{47}$$

where we choose $A_x = 0.0125$, $f = 0.25$ Hz, and $B_x = 0.0225$.

The system model is based on Fig. 1. The system parameters in (2) are selected based on the identification of the three-element model [10], and the values are shown in Table I. The damping coefficient $B(P)$ depends on whether the PMA is being inflated or deflated, which corresponds to two sets of $B_0$ and $B_1$. In fact, the spring coefficient $K(P)$ turns out to be a piecewise linear function with a breakpoint at $P = 163892$ Pa, and hence, there are also two different sets of $K_0$ and $K_1$ for $P < 163892$ and $P > 163892$, respectively.

TABLE I
MODEL PARAMETERS USED FOR THE SIMULATIONS

We adopt the parameters $N_p = 2$ and $N_u = 1$. The other parameters $W_u = \rho_u I$, $W_y = \rho_y I$, and $\eta = \rho_\eta I$ ($\rho_u, \rho_y, \rho_\eta > 0$) were selected based on Theorem 1 to guarantee the stability of the system ($\rho_u = 0.87$, $\rho_y = 12247$, $\rho_\eta = 0.35$). The sampling time is set to 0.001 s. We model the PMA by a nonlinear autoregressive exogenous model according to (15). The ESGP, as well as the ESN, has three layers, including the input layer, the dynamic reservoir layer, and the output layer. The corresponding weight matrices $W_{in}$ and $W$ are randomly generated based on its learning procedure, and the weight matrix $W$ is generated such that its maximum eigenvalue is smaller than 1 to ensure the "echo state property." $W_{back}$ is set to be a zero matrix. Since the accuracy of the predicted output may seriously influence the control performance, increasing the reservoir size is the most direct way to guarantee the capacity of modeling. However, a large reservoir size is always accompanied by high computational complexity. Therefore, an appropriate reservoir size needs to be selected, which is set as 20 in this paper. In order to illustrate the effectiveness and high precision of the proposed control algorithm, we use the ESN, RNN, and MLP with the nonlinear model predictive controller, together with a PID controller, for comparative experiments. The predictive models forecast the future processes over the specified horizon, and the gradient descent algorithm is applied to minimize the cost function at each sampling instance. The PID controller can be expressed as the following equation:

$$u(t) = P e(t) + I \int_0^t e(\tau)\, d\tau + D \frac{de(t)}{dt} \tag{48}$$

with the parameters $P$, $I$, and $D$ blind tuned.

Our aim is to measure and compare the accuracy of the control results. The maximum absolute error and the integral of absolute error describe the tracking performance from two different perspectives. One is the maximum deviation, and the other one is the average error. They can be calculated with the following formulas:

$$\mathrm{ERROR}_a = \max\left(|r(t) - y(t)|_{t=1}^{n}\right)$$

$$\mathrm{ERROR} = \sum_{t=1}^{n} |r(t) - y(t)| \tag{49}$$

where $n$ is the total number of samples.

B. System Modeling

As described in Remark 1, the ESGP has universal computation and approximation properties. It can model the PMA with sufficient accuracy. The ESGP model will be trained by the input–output data of the PMA system. The effectiveness of the ESGP model is evaluated by online modeling and multistep prediction. Assume that the input $u(t) = \alpha_1 \sin(2\pi w t) + \alpha_2$ ($\alpha_1 = 10000$, $\alpha_2 = 145000$, $w = 0.25$) is known. The prediction of the ESGP model is compared with an ESN, an MLP, and an RNN. In the simulations, all the function approximators (ESGP, ESN, MLP, and RNN) are set to have similar three-layer topologies, including the input layer, hidden layer, and output layer. The significant difference between the ESGP and the ESN on one side and the other function approximators on the other is that the hidden layer applies reservoir computing. In order to ensure a fair comparison, the hidden nodes of all the approximators are set to 20, and the weight matrices

are randomly initialized according to the principles of each approximator. Therefore, we run the program 100 times to eliminate the effects of randomness. The forecast performance is measured by applying the root-mean-square error (RMSE) that can be described as

$$\mathrm{RMSE}(t) = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left(y(t) - \hat{y}(t)\right)^2}. \tag{50}$$

The predictions of all the models are shown in Figs. 4 and 5 for comparison. The forecasting performance is assessed using the same number of hidden nodes, as shown in Table II. With the same number of hidden layer nodes, the results indicate that the ESGP can predict the behavior of the PMA with the highest accuracy among all the approaches.

Fig. 4. Prediction performance of the PMA using models.

Fig. 5. Forecasting errors of the PMA using models.

TABLE II
COMPARISON OF THE PREDICTION OF DIFFERENT MODELS

C. Simulation Results

The trajectory tracking results of the PMA obtained with the ESGP-NMPC, ESN-NMPC, RNN-NMPC, MLP-NMPC, and PID control strategies are shown in Fig. 6, and the corresponding tracking error results are presented in Fig. 7. It is easy to see that the ESGP-NMPC presents the best performance, whereas the tracking errors of the ESN-NMPC and ESGP-NMPC are almost the same. The reason is that the ESGP derives from the ESN and works on a similar principle. The RNN-NMPC and MLP-NMPC behave a little worse than the proposed strategies since the prediction accuracy of the RNN and MLP is directly related to the number of hidden neurons. However, with the increase in the number of neurons, the structure of the MLP and RNN would become complex. Compared with the dynamic reservoir computing of the ESGP, the training methods of the MLP and RNN would be complicated, and the convergence speed may not guarantee real-time calculation. On the other hand, the PID controller is limited not only by the lack of theoretical support to prove the stability but also by the difficulty of finding suitable controller parameters for high-precision control with blind tuned P, I, and D. Consequently, the PID controller performs the worst despite our best efforts at parameter regulation in this simulation. Finally, the corresponding results for the maximum error and the integral of absolute error are shown in Table III.

Fig. 6. Tracking performances of the control strategies with the sinusoidal reference.

Fig. 7. Errors of the tracking performances with the sinusoidal reference.

A. Experimental Setup

The main module in the PMA hardware system is the xPC Target, a product from MathWorks, which makes it easy to



create real-time application models by SIMULINK/BLOCKS. With the dedicated hardware, the proposed strategy can be implemented. Two computers are utilized. One is the host, and the other one is the target. The host computer is installed with MATLAB/SIMULINK and a C compiler, and it uses the models to generate executable code, while the target computer executes the generated code in real time. Hence, the xPC Target can provide the necessary software that makes use of real-time resources on the target computer hardware. Table IV shows the physical hardware and its parameters. The PMA practical platform is comprised of a PMA, a force sensor, a displacement sensor, a pressure-relief valve, an air compressor, an electromagnetic proportion valve, and a load, as shown in Figs. 8 and 9. Through the electromagnetic proportion valve, the air compressor provides the air pressure for the system to drive the PMA. Meanwhile, the real-time sensory data are sampled and sent to the computer via the signal acquisition board, and then, the computer calculates the control signals according to the proposed strategy. In fact, the control algorithm regulates the voltage of the electromagnetic proportion valve, which changes the input air pressure of the system to alter the displacement of the PMA.

Fig. 8. Physical system of the PMA.

Fig. 9. Experimental schematic of the PMA platform. 1: air compressor. 2: pressure-relief valve. 3: electromagnetic valve. 4: load. 5: PMA. 6: force sensor. 7: displacement sensor.

TABLE IV
MAIN DEVICES

B. Experimental Results

To verify the validity of the proposed approach, a sinusoidal trajectory is also utilized as the desired trajectory. Comparisons are given to show the tracking performance of the ESGP-NMPC, ESN-NMPC, RNN-NMPC, MLP-NMPC, and PID. The proposed control strategy will be applied to the PMA with different loads attached to test its robustness to different loads. In addition, experiments with different frequencies and amplitudes of reference trajectories are applied to further verify the effectiveness of this method.

Before presenting the experimental results, we would like to show the calculation time of the control algorithm. At first, we performed the whole experiment 10 times and calculated the average computational time within a control period. The time consumption of each control period includes two parts. One is the learning procedure of the ESGP that is used to approximate the dynamics of the PMA. The other one is the gradient descent algorithm that is used to obtain the control signal. It turns out that the average time consumed by the ESGP is $8.1758 \times 10^{-5}$ s. This is due to the relatively small size of the internal states. In addition, the average time consumed by the gradient descent algorithm is $1.483 \times 10^{-5}$ s. Hence, this fully meets our real-time requirements.
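As a rough check on the real-time claim, the two reported average times can be added and compared with the control period. The sketch below assumes a control period of 0.001 s, which is the sampling time quoted for the simulations; the period used on the hardware is not stated here, and the name `control_period_budget` is illustrative only.

```python
# Reported average times per control period (from the text).
ESGP_LEARNING_TIME_S = 8.1758e-5      # ESGP learning procedure
GRADIENT_DESCENT_TIME_S = 1.483e-5    # gradient descent optimization

# Assumption: the control period equals the 0.001 s sampling time
# quoted for the simulations.
ASSUMED_CONTROL_PERIOD_S = 1e-3


def control_period_budget(learning_s, optimization_s, period_s):
    """Return (total compute time, fraction of the control period used)."""
    total = learning_s + optimization_s
    return total, total / period_s


if __name__ == "__main__":
    total, fraction = control_period_budget(
        ESGP_LEARNING_TIME_S, GRADIENT_DESCENT_TIME_S, ASSUMED_CONTROL_PERIOD_S
    )
    print(f"total compute time per period: {total:.4e} s")
    print(f"fraction of assumed period used: {fraction:.1%}")
```

Under that assumption, the two parts together take a little under one tenth of the control period, which leaves headroom for data acquisition and actuation.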

The control performances are shown in Fig. 10, and the corresponding tracking errors are presented in Fig. 11. It is clear that the maximum absolute error of the ESGP-NMPC is the smallest, and the corresponding integral of absolute error is $6.8894 \times 10^{-4}$ m, which is also the smallest of the five control strategies. This is because the ESGP can better capture the dynamics of the PMA, so that the control effect is effectively improved. The tracking errors mostly come from the sinusoidal peaks and troughs because of the PMA's hysteresis feature. Even though we use the same internal weights for the ESGP and the ESN, the tracking performance of the ESGP-NMPC is better because the ESGP model takes noise into account in modeling the PMA system. The RNN and MLP are also able to model the system, but they require a number of neurons in a few hidden layers, which makes the structure hugely complex and the training methods costly. In contrast, the ESGP and ESN only need to calculate the output layer weights $\hat{W}_{out}$. Meanwhile, the PID controller is still able to achieve a certain degree of control precision, but it is not as good as the ESGP-NMPC. In this way, the ESGP-NMPC not only achieves the high-precision task but also meets the requirement of real-time computing. The results for the maximum error and the integral of absolute error are shown in Table V.

Fig. 10. Tracking performances of the control strategies with the sinusoidal reference.

Fig. 11. Tracking errors of the control strategies with the sinusoidal reference.

To illustrate the robustness of the system, experiments in which 0.5-, 1.0-, and 1.5-kg loads are attached to the PMA are conducted. The tracking performances are shown in Fig. 12, and the corresponding tracking errors are presented in Fig. 13. Although different loads are attached to the PMA, the tracking performances are almost the same, and the ESGP-NMPC is still able to track the reference with high precision. As the weight of the load increases, the errors increase only slightly, and the deviation is so small that it is within the acceptable range. This indicates that the ESGP-NMPC is robust to different loads. Table VI gives the specific values of the maximum absolute error and the integral of absolute error.

Fig. 12. Tracking performances of the control strategy ESGP-NMPC with different loads attached to the PMA.

Fig. 13. Tracking errors of the control strategy ESGP-NMPC with different loads attached to the PMA.

TABLE VI
RESULTS OF THE CONTROL STRATEGIES WITH THE DIFFERENT LOADS
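The two tracking metrics reported in the tables, the maximum absolute error and the integral of absolute error, can be computed from sampled reference and output sequences as sketched below. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the integral of absolute error is assumed to be accumulated as a plain sum of $|r(t) - y(t)|$ over the $n$ samples, since any normalization used in the paper is not recoverable here.

```python
def max_abs_error(reference, output):
    """Maximum absolute tracking error over all samples."""
    if len(reference) != len(output):
        raise ValueError("reference and output must have the same length")
    return max(abs(r - y) for r, y in zip(reference, output))


def integral_abs_error(reference, output):
    """Integral of absolute error.

    Assumption: accumulated as an unnormalized sum over the samples.
    """
    if len(reference) != len(output):
        raise ValueError("reference and output must have the same length")
    return sum(abs(r - y) for r, y in zip(reference, output))


if __name__ == "__main__":
    # Toy sequences chosen only to exercise the functions.
    r = [0.0, 1.0, 2.0]
    y = [0.0, 1.5, 1.0]
    print(max_abs_error(r, y))
    print(integral_abs_error(r, y))
```

Keeping the two metrics as separate functions mirrors how they are used in the text: one captures the worst-case deviation, the other the accumulated deviation over the whole run.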

Fig. 14. Tracking performance of the control strategy ESGP-NMPC with a sinusoidal reference signal of frequency 0.1 Hz and amplitude 0.0075, 0.01, and 0.0125 m.

Fig. 15. Tracking performance of the control strategy ESGP-NMPC with a sinusoidal reference signal of amplitude 0.0125 m and frequency 0.1, 0.25, and 0.4 Hz.

Further experiments with sinusoidal reference signals of frequency 0.1 Hz and amplitude 0.0075, 0.01, and 0.0125 m are conducted, respectively. The results indicate that the

C. Discussion

Note that the ESGP used in this paper has K input neurons, N internal neurons, and one output neuron. The storage of the ESGP is mainly used for the weight matrices and the vector of internal states. The computational and storage costs of the ESGP-NMPC algorithm are $O((N + K)^3)$ and $O((N + K)^2)$, respectively. In order to satisfy the real-time property, the internal size is rather small in our application. Currently, a high-speed multicore digital signal processor (DSP), such as the TMS320C6678 SoC, is clocked at 1.25 GHz and has 512 KB of memory, which is a fairly high speed of operation. In fact, there have been studies using DSPs to implement ESNs to accomplish specific tasks [45]. In addition, the advanced RISC machine (ARM) Cortex-A53 that contains eight cores at 1.4 GHz is considered to be the most promising ARM processor in recent years. It can run an embedded operating system, and it has been applied in various fields. There is no doubt that it can also meet our requirement. FPGAs allow a much faster implementation cycle for a "small" number of chips and offer an intermediate between the programmability of classic processors and the parallel nature and high speed of application-specific integrated circuits. There exist several studies about the implementation of liquid state machines for real-time speech recognition [46]. Note that the liquid state machine is an RNN of spiking neurons with reservoir computing, whose principle is the same as that of ESNs. In the current stage, our main purpose is to verify the effectiveness and robustness of the algorithm. For our further application, we may implement this algorithm on a DSP or FPGA.

This paper proposed a systematic design methodology to develop an ESGP-NMPC for a PMA system. By analyzing the NMPC scheme, a gradient descent algorithm was utilized to minimize the cost function, and a control signal sequence was obtained. As stability is essential for physical applications, characteristics of the closed-loop system were analyzed to obtain the stability condition. Simulations and experiments were conducted to demonstrate the validity of the ESGP-NMPC, and its robustness was also verified through physical experiments in which different loads were mounted on the PMA. Compared with other control strategies, the ESGP-NMPC can achieve a better model fitting for the PMA and a better control performance for high-precision tracking tasks.
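The Discussion ties the real-time cost to a small reservoir whose storage is dominated by the weight matrices and the internal state vector. The following sketch builds such a reservoir and counts what has to be stored. It rests on several assumptions that are not taken from the paper: the state update is the common `tanh` ESN form with the feedback matrix set to zero, the input dimension `K = 2` is a placeholder, and the recurrent matrix is scaled so that its infinity norm is below 1. That scaling is a conservative stand-in for the eigenvalue condition, since the infinity norm upper-bounds the largest eigenvalue magnitude.

```python
import math
import random


def infinity_norm(matrix):
    """Maximum absolute row sum; an upper bound on the spectral radius."""
    return max(sum(abs(v) for v in row) for row in matrix)


def make_reservoir(n_internal, n_input, target_norm=0.9, seed=0):
    """Random input and recurrent weights, with W scaled to a norm below 1."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_input)]
            for _ in range(n_internal)]
    w = [[rng.uniform(-1.0, 1.0) for _ in range(n_internal)]
         for _ in range(n_internal)]
    scale = target_norm / infinity_norm(w)
    w = [[v * scale for v in row] for row in w]
    return w_in, w


def reservoir_step(w_in, w, state, u):
    """One state update x <- tanh(W_in u + W x), with no output feedback."""
    return [
        math.tanh(
            sum(a * b for a, b in zip(w_in[i], u))
            + sum(a * b for a, b in zip(w[i], state))
        )
        for i in range(len(state))
    ]


def storage_count(n_internal, n_input):
    """Number of stored values: W, W_in, and the internal state vector."""
    return n_internal * n_internal + n_internal * n_input + n_internal


if __name__ == "__main__":
    N, K = 20, 2  # N = 20 as in the simulations; K = 2 is a placeholder
    w_in, w = make_reservoir(N, K)
    x = [0.0] * N
    for _ in range(10):
        x = reservoir_step(w_in, w, x, [0.5, -0.2])
    print(infinity_norm(w), storage_count(N, K))
```

The stored-value count grows quadratically in the reservoir size, which is consistent with the $O((N + K)^2)$ storage cost quoted in the Discussion.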
Ind. Electron., vol. 61, no. 4, pp. 1970–1982, Apr. 2014.
[24] L. Cheng, W. Liu, Z.-G. Hou, J. Yu, and M. Tan, “Neural-network-based nonlinear model predictive control for piezoelectric actuators,” IEEE Trans. Ind. Electron., vol. 62, no. 12, pp. 7717–7727, Dec. 2015.
[25] M. A. Hosen, M. A. Hussain, and F. S. Mjalli, “Control of polystyrene batch reactors using neural network based model predictive control (NNMPC): An experimental investigation,” Control Eng. Pract., vol. 19, no. 5, pp. 454–467, May 2011.
[26] W. Liu, L. Cheng, Z.-G. Hou, J. Yu, and M. Tan, “An inversion-free predictive controller for piezoelectric actuators based on a dynamic linearized neural network model,” IEEE/ASME Trans. Mechatronics, vol. 21, no. 1, pp. 214–226, Feb. 2016.

Jian Huang (M’07–SM’17) received the B.S., M.E., and Ph.D. degrees from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 1997, 2000, and 2005, respectively.
From 2006 to 2008, he was a Post-Doctoral Researcher with the Department of Micro-Nano System Engineering and the Department of Mechano-Informatics and Systems, Nagoya University, Nagoya, Japan. He is currently a Full Professor with the School of Automation, HUST. His main research interests include rehabilitation robot, robotic assembly, networked control systems, and bioinformatics.

Yu Cao (S’17) received the B.S. degree in control science and control engineering from the Wuhan University of Technology, Wuhan, China, in 2011, and the M.S. degree in software engineering from the Huazhong University of Science and Technology, Wuhan, in 2014, where he is currently pursuing the Ph.D. degree.
His current research interests include modeling and control of rehabilitation robotics exoskeleton systems.

Caihua Xiong (M’12) received the Ph.D. degree in mechanical engineering from the Huazhong University of Science and Technology (HUST), Wuhan, China, in 1998.
From 1999 to 2003, he was a Post-Doctoral Fellow with the City University of Hong Kong, Hong Kong, and The Chinese University of Hong Kong, Hong Kong, and a Research Scientist with the Worcester Polytechnic Institute, Worcester, MA, USA. He is currently a Chang Jiang Professor and the Director of the Institute of Rehabilitation and Medical Robotics, HUST. His current research interests include biomechatronic prostheses, rehabilitation robotics, and robot motion planning and control.
Dr. Xiong received the National Science Fund for Distinguished Young Scholars of China.

Hai-Tao Zhang (M’07–SM’13) received the B.E. and Ph.D. degrees from the University of Science and Technology of China, Hefei, China, in 2000 and 2005, respectively.
In 2007, he was a Post-Doctoral Researcher with the University of Cambridge, Cambridge, U.K. From 2005 to 2010, he was an Associate Professor with the Huazhong University of Science and Technology, Wuhan, China, where he has been a Professor since 2010. His research interests include multi-agent systems control, model predictive control, and multi-robot collaboration manufacturing control.
Dr. Zhang serves as an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II and the Asian Journal of Control and an Editorial Board Member for the IEEE Control Systems Society (CSS). He also serves as the Chair of IEEE CSS Wuhan Chapter.

