
Journal of Control, Automation and Electrical Systems
https://doi.org/10.1007/s40313-021-00765-2

Robust Optimal Tracking Control Using Disturbance Observer for Robotic Arm Systems

Nam Hai Trinh² · Nga Thi-Thuy Vu¹ · Phuoc Doan Nguyen¹

Received: 22 February 2021 / Revised: 17 May 2021 / Accepted: 26 May 2021
© Brazilian Society for Automatics – SBA 2021

* Nga Thi-Thuy Vu, [email protected]
¹ School of Electrical Engineering, Hanoi University of Science and Technology, Hanoi, Vietnam
² Viettel Aerospace Institute, Hanoi, Vietnam

Abstract
In this paper, an online adaptive dynamic programming (OADP) scheme combined with a disturbance observer is proposed to solve the robust optimization problem for nonlinear systems. The scheme has only one neural network, so it is simpler and more effective than other works that use two or three neural networks (two networks for action decision and one for disturbance estimation). Moreover, the stability of the overall system, which includes the Actor, Critic, and disturbance observer components, is mathematically proven through Lyapunov theory. Finally, simulations and comparisons are carried out to evaluate the correctness and the advantages of the proposed algorithm. The simulation results show that the observer-based OADP technique gives a good response for the planar robot even under system uncertainties and external disturbances. They also demonstrate that the proposed method is not only simpler but also more effective than other works, i.e., faster convergence, smaller error, and smaller storage requirement.

Keywords Nonlinear systems · Disturbance observer · Online adaptive dynamic programming · Robust optimal control

1 Introduction

Nonlinear systems with disturbances are very common in practice, and many studies on tracking control for these systems have been published (Chang et al., 2020); (Du et al., 2015). However, optimal control design for nonlinear systems is still limited, because the Hamilton–Jacobi–Bellman (HJB) equation for a nonlinear system is a nonlinear partial differential equation, which is difficult to solve by analytical methods.

In recent years, Reinforcement Learning (RL) and Adaptive Dynamic Programming (ADP) have emerged as potential methods for solving optimization problems for both linear and nonlinear systems (Luo et al., 2017); (Vamvoudakis & Lewis, 2010a), with some pioneering articles considered the foundation for applying the ADP algorithm to find the optimal controller, such as (Sutton & Barto, 2018); (Vamvoudakis & Lewis, 2010a). Based on these successes, some algorithms have been developed for nonlinear systems with disturbances (Fan & Yang, 2016); (Song & Lewis, 2020). Using an adaptive Actor-Critic method, Fan and Yang (2016) designed an integral sliding mode controller for a nonlinear system with unknown system parameters and input disturbances. An online adaptive learning algorithm based on the PI method to solve the two-player zero-sum game problem for continuous nonlinear systems is presented in Vamvoudakis and Lewis (2010b). To obtain the optimal value for the control law while ensuring system stability, the adaptive algorithm is deployed through the simultaneous variation of three neural networks: a Critic network, an Actor network, and a disturbance approximation network. In Wu and Luo (2012), an online simultaneous policy update algorithm (SPUA) is developed to solve the nonlinear H∞ state feedback control problem with unknown internal system dynamics. The algorithm uses two neural networks: the first to approximate the Critic component and the other to approximate the disturbance. The weight matrices of these two neural networks are updated simultaneously in only one iteration loop. The H∞ control problem is addressed in Luo et al. (2015) for nonlinear systems with an unknown internal system model. By using an off-policy reinforcement learning method, the solution of the HJI equation is calculated entirely from real system data instead of a mathematical model.


In this algorithm, the system data can be generated with arbitrary policies rather than the evaluating policy, which is extremely important and promising for real systems. In Song and Lewis (2020), an algorithm based on an online adaptive reinforcement learning method is developed for the optimal control problem of continuous nonlinear systems with model uncertainty. To approximate the solution of the HJB equation, an Actor-Critic-Identifier (ACI) structure is used, based on three neural networks: the Actor and Critic networks estimate the optimal control law and the optimal cost function, respectively, and the third network is used for system dynamic identification.

In this paper, an online adaptive dynamic programming (OADP) algorithm (Vamvoudakis & Lewis, 2010a) combined with a disturbance observer is used to design a robust optimal controller for a robotic arm system. The controller consists of two parts: the first part is the optimal component, which optimizes the cost function; the second is the compensation component, which uses the estimated disturbances to cancel the effect of system uncertainties and disturbances. The optimal controller is designed based on the value iteration (VI) method with simultaneous update of the weight matrix. The stability of the whole system using the optimal component and the disturbance observer will be proven according to the Uniformly Ultimately Bounded (UUB) condition. Finally, some simulations are executed to prove the correctness of the algorithm. The simulation results show that the proposed scheme gives good performance for both the nominal system and the system subject to external disturbances. Also, the proposed algorithm is compared with the traditional AC structure to show that the presented method improves considerably, i.e., faster convergence, lower storage requirement, and smaller errors.

The contributions of this paper are summarized below:

• Solving the robust optimal problem of nonlinear systems with disturbances by using the ADP technique combined with the disturbance observer.
• The proposed control structure uses only one neural network, so it reduces the computation in comparison with structures using two neural networks (Wu & Luo, 2012); (Song & Lewis, 2020). Moreover, this structure avoids model complexity and increases the convergence rate of the algorithm.
• An online update rule for the neural network parameters is analyzed and designed not only to ensure the stability of the closed-loop system but also to make these parameters converge in close proximity to the optimal values. In addition, the requirement to initialize a stabilizing controller is eliminated.
• The neural network parameters and the controller are updated simultaneously in only one iteration loop to further increase the convergence rate.

2 Dynamic Model of the Robotic Arm

To investigate and evaluate a robust optimal controller, we consider a control problem for the two-degree-of-freedom robotic arm, shown in Fig. 1, whose dynamic equation is described as follows (Dupree et al., 2010):

$$M(q)\ddot{q} + V_m(q,\dot{q})\dot{q} + G(q) + F(\dot{q}) + d_0(t) = \tau(t) \tag{1}$$

where $d_0(t)$ denotes an unknown system disturbance and

$$M(q) = \begin{bmatrix} a_1 + 2a_2\cos(q_2) & a_3 + a_2\cos(q_2) \\ a_3 + a_2\cos(q_2) & a_3 \end{bmatrix}$$

$$G(q) = \begin{bmatrix} a_4\cos(q_1) + a_5\cos(q_1 + q_2) \\ a_5\cos(q_1 + q_2) \end{bmatrix}$$

$$V_m(q,\dot{q}) = \begin{bmatrix} -a_2\sin(q_2)\dot{q}_2 & -a_2\sin(q_2)(\dot{q}_1 + \dot{q}_2) \\ a_2\sin(q_2)\dot{q}_1 & 0 \end{bmatrix}$$

in which $a_i$, $i = 1,\dots,5$, are constant parameters defined by the robot mechanical characteristics as follows:

$$a_1 = m_1 l_{g1}^2 + J_1 + m_2 l_1^2 + m_2 l_{g2}^2 + J_2,\quad a_2 = m_2 l_1 l_{g2},\quad a_3 = m_2 l_{g2}^2 + J_2$$

$$a_4 = m_1 g l_{g1} + m_2 g l_1,\quad a_5 = m_2 g l_{g2}$$

where $m_i$ is the mass of link $i$, $q_i$ is the rotation angle of link $i$, $J_i$ is the moment of inertia of link $i$ about the vertical axis through its center, $l_i$ is the length of link $i$, and $l_{gi}$ is the distance from the beginning of link $i$ to its center (with $i = 1, 2$).

Fig. 1 Two-degree-of-freedom robotic arm
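For readers who wish to reproduce the model, the sketch below implements the terms of Eq. (1) in Python/NumPy. The mechanical constants (m1, m2, l1, lg1, lg2, J1, J2) are illustrative placeholders, not values taken from the paper.

```python
import numpy as np

# Illustrative mechanical constants (placeholders, not the paper's values)
m1, m2 = 1.0, 1.0          # link masses
l1 = 1.0                   # length of link 1
lg1, lg2 = 0.5, 0.5        # distances from joint to link centers
J1, J2 = 0.1, 0.1          # link moments of inertia
g = 9.81

# Lumped parameters a1..a5 as defined in Sect. 2
a1 = m1 * lg1**2 + J1 + m2 * l1**2 + m2 * lg2**2 + J2
a2 = m2 * l1 * lg2
a3 = m2 * lg2**2 + J2
a4 = m1 * g * lg1 + m2 * g * l1
a5 = m2 * g * lg2

def M(q):
    """Inertia matrix M(q) of the two-link arm."""
    c2 = np.cos(q[1])
    return np.array([[a1 + 2 * a2 * c2, a3 + a2 * c2],
                     [a3 + a2 * c2,     a3          ]])

def Vm(q, dq):
    """Centripetal-Coriolis matrix Vm(q, dq)."""
    s2 = np.sin(q[1])
    return np.array([[-a2 * s2 * dq[1], -a2 * s2 * (dq[0] + dq[1])],
                     [ a2 * s2 * dq[0],  0.0                      ]])

def G_vec(q):
    """Gravity vector G(q)."""
    return np.array([a4 * np.cos(q[0]) + a5 * np.cos(q[0] + q[1]),
                     a5 * np.cos(q[0] + q[1])])
```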
In this paper, the following assumptions are used:

Assumption 2.1 (Dupree et al., 2010) The matrix $M(q)$ is symmetric, positive definite, and satisfies, $\forall \xi \in \mathbb{R}^n$:

$$\underline{a}\|\xi\|^2 \le \xi^T M(q)\xi \le \bar{b}(q)\|\xi\|^2$$

$$\xi^T\left(\dot{M}(q) - 2V_m(q,\dot{q})\right)\xi = 0$$

where $\underline{a} \in \mathbb{R}$ is a positive constant and $\bar{b}(q) \in \mathbb{R}$ is a positive function of $q$.

Define the variable $s$ as:

$$s(t) = \dot{e}_1 + \lambda_1 e_1,\quad \text{where } \lambda_1 \in \mathbb{R}^{n\times n},\; e_1(t) = q_d - q \tag{2}$$

From (1), we have the dynamic equation with respect to $s(t)$ as follows:

$$M\dot{s} = -V_m s - \tau + d \tag{3}$$

where

$$d = M\left(\ddot{q}_d + \lambda_1\dot{e}_1\right) + V_m\left(\dot{q}_d + \lambda_1 e_1\right) + G + F + d_0 = \bar{f} + d_0 \tag{4}$$

is considered as the system disturbance.

Define the state variable $x = \left[e_1^T, s^T\right]^T$; then the following is obtained:

$$\dot{x} = \begin{bmatrix} -\lambda_1 e_1 + s \\ -M^{-1}(q)V_m(q,\dot{q})s \end{bmatrix} + \begin{bmatrix} 0_{n\times n} \\ -M^{-1}(q) \end{bmatrix}\tau + \begin{bmatrix} 0_{n\times n} \\ M^{-1}(q) \end{bmatrix}d \tag{5}$$

It can be shortened to:

$$\dot{x} = f(x) + g_u\tau + g_d d \tag{6}$$

where $f(x) = \begin{bmatrix} -\lambda_1 e_1 + s \\ -M^{-1}(q)V_m(q,\dot{q})s \end{bmatrix}$, $g_u = \begin{bmatrix} 0_{n\times n} \\ -M^{-1}(q) \end{bmatrix}$, and $g_d = \begin{bmatrix} 0_{n\times n} \\ M^{-1}(q) \end{bmatrix}$.

For the nonlinear system (6), the desired controller is designed as follows:

$$\tau = u = u_r(x) + u_d(x)\hat{d} \tag{7}$$

where $u_r(x)$, built according to the OADP method, is the optimal control component when $d = 0$, $u_d(x)$ is the gain of the disturbance compensation component, and $\hat{d}$ is the estimate of the disturbance $d$.

3 Robust Optimal Controller Design for the Robotic Arm System

The block diagram of the proposed scheme is shown in Fig. 2.

Fig. 2 The proposed control structure

As mentioned above, the controller has two parts: the optimal component $u_r(x)$ and the compensation component $u_d(x)\hat{d}$. The detailed design of each part is presented as follows.

3.1 Optimal Controller Design

Consider the nonlinear system (6) in the case of disturbance $d = 0$; the following is obtained:

$$\dot{x} = f(x) + g_u(x)u_r \tag{8}$$

It is general to assume that system (8) always has solutions, which depend on the initial value $x_0$. Define the set $\Omega \in \mathbb{R}^n$ consisting of all possible solutions of (8). This means that $\Omega$ is the working space of (8), which contains all existing trajectories $x$, including the origin.

Assumption 1 $f(x)$ satisfies the Lipschitz continuity condition on the set $\Omega$.

The performance index function is defined as:

$$J(x) = \int_t^{\infty} r\left(x(\tau), u_r(x(\tau))\right)d\tau \tag{9}$$

with the reward function $r\left(x(\tau), u_r(\tau)\right) = x^T(\tau)Qx(\tau) + u_r^T(x)Ru_r(x)$, where $Q \in \mathbb{R}^{2n\times 2n}$ and $R \in \mathbb{R}^{n\times n}$ are symmetric positive definite matrices.
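As a minimal illustration of Eqs. (2) and (9), the following sketch builds the tracking-error state x = [e1; s] and evaluates the quadratic reward. The lam1, Q, and R values here are placeholders; the simulation section later chooses specific ones.

```python
import numpy as np

lam1 = np.eye(2)            # λ1 gain (placeholder; the paper tunes it later)
Q = np.eye(4)               # state weight, symmetric positive definite
R = 0.25 * np.eye(2)        # control weight

def error_state(q, dq, qd, dqd):
    """Build x = [e1; s] from Eq. (2): e1 = qd - q, s = de1/dt + lam1 e1."""
    e1 = qd - q
    de1 = dqd - dq
    s = de1 + lam1 @ e1
    return np.concatenate([e1, s])

def reward(x, ur):
    """Quadratic reward r(x, ur) = x'Qx + ur'Rur from Eq. (9)."""
    return x @ Q @ x + ur @ R @ ur
```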


The Hamiltonian function is defined as:

$$H\left(x, u_r, J_x\right) = \left(\frac{\partial J}{\partial x}\right)^T\dot{x} + r\left(x, u_r\right) = \left(\frac{\partial J}{\partial x}\right)^T\dot{x} + x^TQx + u_r^TRu_r \tag{10}$$

To have the optimal solution for system (8), there must exist a function $V\left(x, u_r\right)$ satisfying the HJB equation:

$$H\left(x, u_r, V\right) = \left(\frac{\partial V}{\partial x}\right)^T\dot{x} + x^TQx + u_r^TRu_r = 0 \tag{11}$$

Then, the optimal control signal $u_r^*$ is calculated by the following formula:

$$u_r^* = \arg\min_{u_r}\left\{H\left(x, u_r, V\right)\right\} \tag{12}$$

Setting $\partial H/\partial u_r = 0$, $u_r^*$ is expressed as:

$$u_r^* = -\frac{1}{2}R^{-1}g_u^T\frac{\partial V}{\partial x} \tag{13}$$

In (13), $V\left(x, u_r\right)$ is an unknown component, so to calculate the optimal value, $V\left(x, u_r\right)$ is approximated by a neural network, formed as follows:

$$V\left(x, u_r\right) = W^T\Phi(x) + \varepsilon(x) \tag{14}$$

where $W$ is the weight matrix of the neural network, $\Phi(x)$ is the activation function that depends on the state variable $x$, and $\varepsilon(x)$ is the residual error.

Then the optimal control signal $u_r^*$ becomes:

$$u_r^* = -\frac{1}{2}R^{-1}g_u^T\frac{\partial V}{\partial x} = -\frac{1}{2}R^{-1}g_u^T\left(\frac{\partial \Phi}{\partial x}\right)^TW \tag{15}$$

However, the ideal values in (15) are unknown, so $V\left(x, u_r\right)$ is estimated by:

$$\hat{V}\left(x, u_r\right) = \hat{W}^T\Phi(x) \tag{16}$$

where $\hat{W}$ is the estimate of $W$.

To get the approximate solution of the above HJB equation, the on-policy ADP (OADP) algorithm is used to approximate the function $V\left(x, u_r\right)$ and the control signal $u_r^*$. Accordingly, we choose an arbitrary initial control signal $u_r^0$; then, in each step, we solve the equation $H\left(x, u_r^i, V^i\right) = 0$ with boundary conditions and update the control signal as follows:

$$u_r^{i+1} = -\frac{1}{2}R^{-1}g_u^T\left(\frac{\partial \Phi}{\partial x}\right)^T\hat{W}^i \tag{17}$$

In order to guarantee that the weight matrix converges to the optimal value while the system is simultaneously UUB, the tuning law for the weight $\hat{W}$ would be:

$$\dot{\hat{W}} = \begin{cases} \dot{\hat{W}}_1 & \text{if } x^T\left(f(x) + g_u(x)u_r\right) \le 0 \\ \dot{\hat{W}}_1 + W_{RB} & \text{otherwise} \end{cases} \tag{18}$$

where:

$$\dot{\hat{W}}_1 = -\alpha_1\frac{\hat{\sigma}}{\left(\hat{\sigma}^T\hat{\sigma} + 1\right)^2}\left(\hat{\sigma}^T\hat{W} + Q(x) + u_r^TRu_r\right)$$

$$W_{RB} = \frac{1}{2}\alpha_2\Phi_x g_u(x)R^{-1}g_u^T(x)x,\quad \hat{\sigma} = \Phi_x\left(f(x) + g_uu_r\right),\quad \Phi_x = \frac{\partial \Phi}{\partial x}$$

The tuning law $\dot{\hat{W}}_1$ is a modified Levenberg–Marquardt algorithm, using $\left(\hat{\sigma}^T\hat{\sigma} + 1\right)^2$ instead of $\hat{\sigma}^T\hat{\sigma} + 1$ (Ioannou & Fidan, 2006). $W_{RB}$ is added to prove that the closed-loop system is UUB.
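A minimal NumPy sketch of the tuning law (18) is given below. The function name and argument layout are our own, and Q(x) is interpreted as the quadratic form x'Qx, consistent with the reward in (9).

```python
import numpy as np

def critic_update(W_hat, x, f_x, g_u, u_r, Phi_x, Q, R, alpha1, alpha2):
    """One evaluation of the tuning law (18) for the critic weights W_hat.

    Phi_x is the Jacobian dPhi/dx evaluated at x (shape: n_neurons x n_states).
    Returns dW_hat/dt.
    """
    sigma = Phi_x @ (f_x + g_u @ u_r)                  # sigma = Phi_x (f + gu ur)
    norm = (sigma @ sigma + 1.0) ** 2                  # (sigma'sigma + 1)^2 (modified LM)
    e_H = sigma @ W_hat + x @ Q @ x + u_r @ R @ u_r    # Bellman-type residual
    dW1 = -alpha1 * sigma * e_H / norm                 # first term of (18)

    if x @ (f_x + g_u @ u_r) <= 0.0:
        return dW1                                     # first branch of (18)
    # robustifying term W_RB, active when x'(f + gu ur) > 0
    W_RB = 0.5 * alpha2 * Phi_x @ g_u @ np.linalg.solve(R, g_u.T) @ x
    return dW1 + W_RB
```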
3.2 Disturbance Observer

Consider the nonlinear system with disturbance described by:

$$\dot{x} = f(x) + g_u(x)u + g_d(x)d \tag{19}$$

where $x \in \mathbb{R}^n$ is the system state, $u \in \mathbb{R}^m$ is the system control input, and $d \in \mathbb{R}^l$ is the system disturbance. $f(x) \in \mathbb{R}^n$ and $g_u(x) \in \mathbb{R}^{n\times m}$ are known smooth functions, and $g_d(x) \in \mathbb{R}^{n\times l}$ is the coefficient of the disturbance $d$ on the system. $f(x)$ also satisfies Assumption 1.

To facilitate proving the stability of the whole system, the following assumptions are used:

Assumption 2 There exists a function $\gamma(x) \in \mathbb{R}^{m\times l}$ such that $g_d(x) = \gamma(x)g_u(x)$.

Assumption 3 The system disturbance $d$ is slowly time varying, that is, $\dot{d} \simeq 0$.

Since the system disturbance is unknown, we use an observer to approximate the disturbance for system (19) as follows (Chen, 2004); (Yang et al., 2011):

$$\begin{cases} \hat{d} = \eta + \rho(x) \\ \dot{\eta} = -h(x)\left\{g_d(x)\left[\eta + \rho(x)\right] + f(x) + g_u(x)u\right\} \end{cases} \tag{20}$$

where $\hat{d}$ is the estimate of the unknown disturbance, $\eta \in \mathbb{R}^l$ is the intermediate variable, referred to as the internal state of the observer, $\rho(x) \in \mathbb{R}^l$ is a vector function to be designed, and $h(x) = \partial\rho(x)/\partial x$ is referred to as the coefficient of the observer.

It is shown in Dupree et al. (2010) that, for the system (19) and the disturbance observer (20), if $G(x) = h(x)g_d(x)$ is a symmetric positive definite function, the disturbance observer can exponentially track the disturbance, i.e., the error of the disturbance observer $\tilde{d} = d - \hat{d}$ is exponentially stable.

Because $g_u(x) = \left[0_{n\times n}^T\; -M^{-1}(q)\right]^T$ and $g_d(x) = \left[0_{n\times n}^T\; M^{-1}(q)\right]^T$, then

$$\gamma(x) = -1 \tag{21}$$
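The observer (20) can be sketched as a simple forward-Euler update, as below. The helper signature is ours, with rho(x) and h(x) = drho/dx supplied pre-evaluated by the designer.

```python
import numpy as np

def observer_step(eta, f_x, g_u, g_d, u, h_x, rho_x, dt):
    """One Euler step of the disturbance observer (20).

    eta    : internal observer state
    h_x    : observer coefficient h(x) = drho/dx evaluated at x
    rho_x  : rho(x) evaluated at x
    Returns (new eta, disturbance estimate d_hat).
    """
    d_hat = eta + rho_x                                 # d_hat = eta + rho(x)
    deta = -h_x @ (g_d @ d_hat + f_x + g_u @ u)         # eta-dot from (20)
    eta_new = eta + dt * deta
    return eta_new, eta_new + rho_x
```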

The function $h(x)$ is chosen as:

$$h(x) = k_d\left[0_{n\times n}^T\;\; M\left(q_d - e_1\right)\right] \tag{22}$$

where $k_d$ is considered as the convergence rate of the observer. Then $\rho(x)$ and $G(x)$ are calculated according to $h(x)$ as follows:

$$\rho(x) = \int h(x)dx \tag{23}$$

$$G(x) = h(x)g_d(x) = k_dI_{2\times 2} \tag{24}$$

To ensure the stability of the closed-loop system according to the Lyapunov theorem, the disturbance compensation component is selected as:

$$u_d = -\gamma(x) \tag{25}$$

Finally, from (17) and (25), the explicit formula for the controller (7) is:

$$u = -\frac{1}{2}R^{-1}g_u^T\left(\frac{\partial \Phi}{\partial x}\right)^T\hat{W}^i - \gamma(x)\hat{d} \tag{26}$$
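Putting (17), (25), and the observer output together, controller (26) reduces to a few lines. This is a sketch under the paper's choice γ(x) = −1, with names of our own choosing.

```python
import numpy as np

def controller(W_hat, g_u, Phi_x, R, d_hat, gamma=-1.0):
    """Controller (26): u = -0.5 R^-1 gu' Phi_x' W_hat - gamma(x) d_hat."""
    u_r = -0.5 * np.linalg.solve(R, g_u.T @ (Phi_x.T @ W_hat))   # optimal part (17)
    u_d = -gamma * d_hat                                          # compensation (25)
    return u_r + u_d
```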
3.3 Overall Stability

The optimal controller (17) guarantees stability for system (8), which does not contain disturbances. However, for the system that includes the observer and disturbances, the controller is (26), and the stability of the overall system will be demonstrated in this section.

Inserting (7) into (6), the closed-loop system is transformed into the following:

$$\dot{x} = f + g_uu_r + g_uu_d\hat{d} + g_d\left(\tilde{d} + \hat{d}\right) \tag{27}$$

Choose the Lyapunov function as follows:

$$V(X, t) = v_1x^Tx + v_2\int_t^{\infty}\left(x^TQx + u_r^{iT}Ru_r^i\right)d\tau + v_3\int_t^{\infty}\tilde{d}^T\tilde{d}\,d\tau + \frac{v_4}{\alpha_2}\tilde{W}^T\tilde{W} \tag{28}$$

where $\tilde{W} = W - \hat{W}$, $X = \left[x\;\; u_r\;\; \tilde{d}\;\; \tilde{W}\right]$, and the coefficients satisfy $v_1 > 0$, $v_2 > 0$, $v_3 > 0$, $v_4 > 0$.

Since $u_r$ is the solution of (9), the component $\int_t^{\infty}\left(x^TQx + u_r^{iT}Ru_r^i\right)d\tau$ is bounded. Also, the observer in Sect. 3.2 is exponentially convergent, so $\int_t^{\infty}\tilde{d}^T\tilde{d}\,d\tau$ converges. These lead to the boundedness of $V$ as follows:

$$0 \le V(X, t) \le \eta_1\|X\| \tag{29}$$

where $\eta_1$ is a positive scalar.

The derivative of $V$ is:

$$\dot{V} = 2v_1x^T\left(f + g_uu_r^i + g_uu_d\hat{d} + g_d\left(\tilde{d} + \hat{d}\right)\right) - v_2\left(x^TQx + u_r^{iT}Ru_r^i\right) - v_3\left\|\tilde{d}\right\|^2 - \frac{2v_4}{\alpha_2}\tilde{W}^T\dot{\hat{W}} \tag{30}$$

Since $u_d = -\gamma(x)$ and $g_d = \gamma(x)g_u$, Eq. (30) can be rewritten as:

$$\begin{aligned}
\dot{V} &= 2v_1x^T\left(f + g_uu_r^i - \gamma(x)g_u\hat{d} + \gamma(x)g_u\left(\tilde{d} + \hat{d}\right)\right) - v_2\left(x^TQx + u_r^{iT}Ru_r^i\right) - v_3\left\|\tilde{d}\right\|^2 - \frac{2v_4}{\alpha_2}\tilde{W}^T\dot{\hat{W}} \\
&= 2v_1x^T\left(f + g_uu_r^i\right) + 2v_1x^T\gamma(x)g_u\tilde{d} - v_2\left(x^TQx + u_r^{iT}Ru_r^i\right) - v_3\left\|\tilde{d}\right\|^2 - \frac{2v_4}{\alpha_2}\tilde{W}^T\dot{\hat{W}}
\end{aligned} \tag{31}$$

Since:

$$2v_1x^T\gamma(x)g_u\tilde{d} \le v_1\|x\|^2 + v_1\left\|g_u\gamma\right\|^2\left\|\tilde{d}\right\|^2 \tag{32}$$

$$x^TQx + u_r^{iT}Ru_r^i \ge \lambda_{\min}(Q)\|x\|^2 + \lambda_{\min}(R)\left\|u_r^i\right\|^2 \;\Rightarrow\; -v_2\left(x^TQx + u_r^{iT}Ru_r^i\right) \le -v_2\lambda_{\min}(Q)\|x\|^2 - v_2\lambda_{\min}(R)\left\|u_r^i\right\|^2 \tag{33}$$

Equation (31) is equivalent to:

$$\dot{V} \le 2v_1x^T\left(f + g_uu_r^i\right) + v_1\|x\|^2 + v_1\left\|g_u\gamma\right\|^2\left\|\tilde{d}\right\|^2 - v_2\lambda_{\min}(Q)\|x\|^2 - v_2\lambda_{\min}(R)\left\|u_r^i\right\|^2 - v_3\left\|\tilde{d}\right\|^2 - \frac{2v_4}{\alpha_2}\tilde{W}^T\dot{\hat{W}} \tag{34}$$

Case 1: when $x^T\left(f + g_uu_r^i\right) > 0$, then $\dot{\hat{W}} = \dot{\hat{W}}_1 + W_{RB}$ and:

$$\begin{aligned}
-\frac{2v_4}{\alpha_2}\tilde{W}^T\dot{\hat{W}} &= -\frac{2v_4}{\alpha_2}\tilde{W}^T\dot{\hat{W}}_1 - \frac{2v_4}{\alpha_2}\tilde{W}^TW_{RB} \\
&= 2v_4\frac{\alpha_1}{\alpha_2}\frac{\tilde{W}^T\hat{\sigma}}{\left(\hat{\sigma}^T\hat{\sigma} + 1\right)^2}\left(\hat{\sigma}^TW - \hat{\sigma}^T\tilde{W} + Q(x) + u_r^TRu_r\right) - v_4\tilde{W}^T\Phi_xg_u(x)R^{-1}g_u^T(x)x
\end{aligned} \tag{35}$$

Letting $\varepsilon_H = \hat{\sigma}^TW + Q(x) + u_r^TRu_r$ yields:

$$-\frac{2v_4}{\alpha_2}\tilde{W}^T\dot{\hat{W}} = -2v_4\frac{\alpha_1}{\alpha_2}\frac{\tilde{W}^T\hat{\sigma}\hat{\sigma}^T\tilde{W}}{\left(\hat{\sigma}^T\hat{\sigma} + 1\right)^2} + 2v_4\frac{\alpha_1}{\alpha_2}\frac{\tilde{W}^T\hat{\sigma}}{\left(\hat{\sigma}^T\hat{\sigma} + 1\right)^2}\varepsilon_H - v_4\tilde{W}^T\Phi_xg_u(x)R^{-1}g_u^T(x)x \tag{36}$$

$$\begin{aligned}
-\frac{2v_4}{\alpha_2}\tilde{W}^T\dot{\hat{W}} &\le -2v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\hat{\sigma}}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2\left\|\tilde{W}\right\|^2 + v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\hat{\sigma}}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2\left\|\tilde{W}\right\|^2 + v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\varepsilon_H}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2 - v_4\tilde{W}^T\Phi_xg_u(x)R^{-1}g_u^T(x)x \\
&= -v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\hat{\sigma}}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2\left\|\tilde{W}\right\|^2 + v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\varepsilon_H}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2 - v_4\tilde{W}^T\Phi_xg_u(x)R^{-1}g_u^T(x)x
\end{aligned} \tag{37}$$

On the other hand, since $f(x)$ satisfies Assumption 1, there exists $k$ such that $2x^Tf \le 2k\|x\|^2$. Substituting $u_r^i = -\frac{1}{2}R^{-1}g_u^T\Phi_x^T\hat{W}$ into (34) and using the Lipschitz condition, the following is obtained:

$$\begin{aligned}
2v_1x^T\left(f + g_uu_r^i\right) + v_1\|x\|^2 + v_1\left\|g_u\gamma\right\|^2\left\|\tilde{d}\right\|^2 &\le 2kv_1\|x\|^2 - v_1x^Tg_uR^{-1}g_u^T\Phi_x^T\hat{W} + v_1\|x\|^2 + v_1\left\|g_u\gamma\right\|^2\left\|\tilde{d}\right\|^2 \\
&= v_1(2k + 1)\|x\|^2 + v_1\left\|g_u\gamma\right\|^2\left\|\tilde{d}\right\|^2 - v_1x^Tg_uR^{-1}g_u^T\Phi_x^T\hat{W}
\end{aligned} \tag{38}$$

Substituting (37) and (38) into (34) leads to the following result:

$$\begin{aligned}
\dot{V} \le\; &\left[(2k + 1)v_1 - v_2\lambda_{\min}(Q)\right]\|x\|^2 - v_2\lambda_{\min}(R)\left\|u_r^i\right\|^2 + \left(v_1\left\|g_u\gamma\right\|^2 - v_3\right)\left\|\tilde{d}\right\|^2 - v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\hat{\sigma}}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2\left\|\tilde{W}\right\|^2 \\
&+ v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\varepsilon_H}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2 - v_1\hat{W}^T\Phi_xg_u(x)R^{-1}g_u^T(x)x + v_4\tilde{W}^T\Phi_xg_u(x)R^{-1}g_u^T(x)x
\end{aligned} \tag{39}$$

Let $v_4 = v_1$ and note that $\hat{W} = \tilde{W} + W$:

$$\begin{aligned}
\dot{V} \le\; &\left[(2k + 1)v_1 - v_2\lambda_{\min}(Q)\right]\|x\|^2 - v_2\lambda_{\min}(R)\left\|u_r^i\right\|^2 + \left(v_1\left\|g_u\gamma\right\|^2 - v_3\right)\left\|\tilde{d}\right\|^2 \\
&- v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\hat{\sigma}}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2\left\|\tilde{W}\right\|^2 + v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\varepsilon_H}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2 - v_1W^T\Phi_xg_u(x)R^{-1}g_u^T(x)x
\end{aligned} \tag{40}$$

Bounding the last term by $-v_1W^T\Phi_xg_u(x)R^{-1}g_u^T(x)x \le v_1\lambda\|x\|$, where $\lambda = W_{\max}^T\Phi_{x\max}g_{u\max}(x)R^{-1}g_{u\max}^T(x)$, gives:

$$\begin{aligned}
\dot{V} \le\; &\left[(2k + 1)v_1 - v_2\lambda_{\min}(Q)\right]\|x\|^2 - v_2\lambda_{\min}(R)\left\|u_r^i\right\|^2 + \left(v_1\left\|g_u\gamma\right\|^2 - v_3\right)\left\|\tilde{d}\right\|^2 \\
&- v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\hat{\sigma}}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2\left\|\tilde{W}\right\|^2 + v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\varepsilon_H}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2 + v_1\lambda\|x\|
\end{aligned} \tag{41}$$


Using $v_1\lambda\|x\| \le \frac{v_1\lambda^2}{2} + \frac{v_1}{2}\|x\|^2$, Eq. (41) becomes:

$$\begin{aligned}
\dot{V} \le\; &\left[(2k + 3/2)v_1 - v_2\lambda_{\min}(Q)\right]\|x\|^2 - v_2\lambda_{\min}(R)\left\|u_r^i\right\|^2 + \left(v_1\left\|g_u\gamma\right\|^2 - v_3\right)\left\|\tilde{d}\right\|^2 \\
&- v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\hat{\sigma}}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2\left\|\tilde{W}\right\|^2 + v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\varepsilon_H}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2 + \frac{v_1\lambda^2}{2}
\end{aligned} \tag{42}$$

Let $V_{1x} = (2k + 3/2)v_1 - v_2\lambda_{\min}(Q)$; $V_{1u} = -v_2\lambda_{\min}(R)$; $V_{1d} = v_1\left\|g_u\gamma\right\|^2 - v_3$; $V_{1W} = -v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\hat{\sigma}}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2$; $V_{1\varepsilon} = v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\varepsilon_H}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2 + \frac{v_1\lambda^2}{2}$.

Accordingly, (42) is rewritten as:

$$\dot{V} \le V_{1x}\|x\|^2 + V_{1u}\left\|u_r^i\right\|^2 + V_{1d}\left\|\tilde{d}\right\|^2 + V_{1W}\left\|\tilde{W}\right\|^2 + V_{1\varepsilon} \tag{43}$$

If $v_i$, $i = 1,\dots,4$, satisfy $v_2 \ge \frac{(2k + 2)v_1}{\lambda_{\min}(Q)}$, $v_3 > v_1\left\|g_u\gamma\right\|^2$, $v_4 = v_1$, and $\|x\| \ge \sqrt{\frac{V_{1\varepsilon}}{-V_{1x}}}$ or $\left\|u_r^i\right\| \ge \sqrt{\frac{V_{1\varepsilon}}{-V_{1u}}}$ or $\left\|\tilde{d}\right\| \ge \sqrt{\frac{V_{1\varepsilon}}{-V_{1d}}}$ or $\left\|\tilde{W}\right\| \ge \sqrt{\frac{V_{1\varepsilon}}{-V_{1W}}}$, then

$$\dot{V} \le \eta_2\|X\|^2 \tag{44}$$

where $\eta_2 \le \max\left\{V_{1x}, V_{1u}, V_{1d}, V_{1W}\right\}$ is a negative factor.

Case 2: $\dot{\hat{W}} = \dot{\hat{W}}_1$ when $x^T\left(f + g_uu_r^i\right) \le 0$. Then:

$$\begin{aligned}
-\frac{2v_4}{\alpha_2}\tilde{W}^T\dot{\hat{W}}_1 &\le -2v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\hat{\sigma}}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2\left\|\tilde{W}\right\|^2 + v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\hat{\sigma}}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2\left\|\tilde{W}\right\|^2 + v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\varepsilon_H}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2 \\
&= -v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\hat{\sigma}}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2\left\|\tilde{W}\right\|^2 + v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\varepsilon_H}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2
\end{aligned} \tag{45}$$

Meanwhile, since $x^T\left(f + g_uu_r^i\right) \le 0$, there exists $k_0 > 0$ such that $x^T\left(f + g_uu_r^i\right) \le -k_0\|x\|^2$. This leads to the following:

$$2v_1x^T\left(f + g_uu_r^i\right) + v_1\|x\|^2 + v_1\left\|g_u\gamma\right\|^2\left\|\tilde{d}\right\|^2 \le -2v_1k_0\|x\|^2 + v_1\|x\|^2 + v_1\left\|g_u\gamma\right\|^2\left\|\tilde{d}\right\|^2 \tag{46}$$

With transformation steps completely similar to Case 1, we get:

$$\dot{V} \le V_{2x}\|x\|^2 + V_{2u}\left\|u_r^i\right\|^2 + V_{2d}\left\|\tilde{d}\right\|^2 + V_{2W}\left\|\tilde{W}\right\|^2 + V_{2\varepsilon} \tag{47}$$

where $V_{2x} = 2v_1\left(1 - k_0\right) - v_2\lambda_{\min}(Q)$; $V_{2u} = -v_2\lambda_{\min}(R)$; $V_{2d} = v_1\left\|g_u\gamma\right\|^2 - v_3$; $V_{2W} = -v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\hat{\sigma}}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2$; $V_{2\varepsilon} = v_4\frac{\alpha_1}{\alpha_2}\left\|\frac{\varepsilon_H}{\hat{\sigma}^T\hat{\sigma} + 1}\right\|^2$.

If $v_i$, $i = 1,\dots,4$, satisfy $v_2 > \frac{2v_1\left(1 - k_0\right)}{\lambda_{\min}(Q)}$, $v_3 > v_1\left\|g_u\gamma\right\|^2$, and $\|x\| \ge \sqrt{\frac{V_{2\varepsilon}}{-V_{2x}}}$ or $\left\|u_r^i\right\| \ge \sqrt{\frac{V_{2\varepsilon}}{-V_{2u}}}$ or $\left\|\tilde{d}\right\| \ge \sqrt{\frac{V_{2\varepsilon}}{-V_{2d}}}$ or $\left\|\tilde{W}\right\| \ge \sqrt{\frac{V_{2\varepsilon}}{-V_{2W}}}$, then

$$\dot{V} \le \eta_3\|X\|^2 \tag{48}$$

where $\eta_3 \le \max\left\{V_{2x}, V_{2u}, V_{2d}, V_{2W}\right\}$ is a negative factor.

According to both (44) and (48), the closed-loop system is therefore UUB. ∎

Remark: In (17), the controller component $u_r$ depends on the system information $g_u(x)$, which is not required in algorithms using the Actor-Critic structure with two neural networks. However, in our proposed scheme, $g_u(x)$ contains only nominal system parameters; the uncertainties of the system can be considered as disturbances, which are represented by $d$. By using the simple observer (20), the effect of the disturbances is canceled, so the computational burden of the system is reduced to only one neural network.

4 Simulation Results

Consider an optimization problem for the nonlinear system (6) given above. The parameters of the problem are chosen as follows:

$$Q = \begin{bmatrix} 40 & 2 & -4 & 4 \\ 2 & 40 & 4 & -6 \\ -4 & 4 & 4 & 0 \\ 4 & -6 & 0 & 4 \end{bmatrix},\quad R = \begin{bmatrix} 0.25 & 0 \\ 0 & 0.25 \end{bmatrix}$$

The desired trajectory is $q_d(t) = \left[3\sin(0.1t)\;\; 3\cos(0.1t)\right]^T$.


The system disturbance is $d_0(t) = \left[0.5\sin(t)\;\; 0.5\cos(t)\right]^T$. The variable $s(t)$ is determined by the formula $s(t) = \dot{e}_1 + \lambda_1e_1$, where

$$\lambda_1 = \begin{bmatrix} 15.6 & 10.6 \\ 10.6 & 10.4 \end{bmatrix}$$

The planar robotic arm system parameters (Luo et al., 2015) are: $a_1 = 5$, $a_2 = 1$, $a_3 = 1$, $a_4 = 1.2g$, $a_5 = g$.

The OADP controller parameters are $\alpha_1 = 800$, $\alpha_2 = 5\cdot 10^{-3}$, and $\zeta = \mathrm{diag}(1000, 1000, 1000, 1, 1, 1)$. Selecting the rate coefficient $k_d = 150$, we have the functions related to the disturbance observer as follows:

$$h(x) = 150\left[0_{n\times n}^T\;\; M(q)\right] = 150\begin{bmatrix} 0 & 0 & a_1 + 2a_2\cos(q_2) & a_3 + a_2\cos(q_2) \\ 0 & 0 & a_3 + a_2\cos(q_2) & a_3 \end{bmatrix}$$

$$\rho(x) = 150\begin{bmatrix} 0 & 0 & a_1 + 2a_2\cos(q_2) & a_3 + a_2\cos(q_2) \\ 0 & 0 & a_3 + a_2\cos(q_2) & a_3 \end{bmatrix}\left[e_1\;\; e_2\right]^T$$
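Collected as a configuration sketch in NumPy, the simulation parameters above read as follows; the Q entries follow the matrix printed in this section, and g = 9.81 m/s² is assumed.

```python
import numpy as np

g = 9.81
a1, a2, a3, a4, a5 = 5.0, 1.0, 1.0, 1.2 * g, g   # robot parameters (Luo et al., 2015)

Q = np.array([[40.,  2., -4.,  4.],
              [ 2., 40.,  4., -6.],
              [-4.,  4.,  4.,  0.],
              [ 4., -6.,  0.,  4.]])
R = 0.25 * np.eye(2)
lam1 = np.array([[15.6, 10.6],
                 [10.6, 10.4]])
alpha1, alpha2 = 800.0, 5e-3        # critic learning gains
kd = 150.0                          # observer convergence rate

def qd(t):
    """Desired trajectory qd(t) = [3 sin(0.1t), 3 cos(0.1t)]."""
    return np.array([3 * np.sin(0.1 * t), 3 * np.cos(0.1 * t)])

def d0(t):
    """External disturbance d0(t) = [0.5 sin t, 0.5 cos t]."""
    return np.array([0.5 * np.sin(t), 0.5 * np.cos(t)])
```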
The NN weights are randomly initialized in [0; 1], and the states q(0) and q̇(0) ∈ ℝ² are arbitrarily initialized. The PE condition should be ensured throughout the controller's learning period. However, unlike for linear systems, there is no way to exactly guarantee the PE condition in nonlinear systems. Therefore, to achieve this PE condition, a PE noise, which is a sum of trigonometric components with different frequencies, is added during the first 200 s. After this period, it is immediately removed so that the disturbance observer works properly.

The simulation results are shown in Figs. 3, 4, 5 and 6. From the simulation results, it can be seen that at the beginning of the simulation, the Critic is in the learning process, and the responses of the output states (q1 and q2) tend to track the reference values. After about 100 s, the learning process finishes, and the weights of the neural network W converge to the ideal values; however, the output states still oscillate around the reference trajectories due to the effect of the PE noise. At the time of 200 s, the PE noise is excluded, and the tracking errors become almost zero.
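The PE noise itself is not given in closed form here; a typical choice, consistent with the description (a sum of trigonometric components with different frequencies, switched off after 200 s), would be something like the following illustrative sketch, with amplitudes and frequencies of our own choosing.

```python
import numpy as np

def pe_noise(t, t_off=200.0):
    """Probing noise for the PE condition: a mix of sinusoids with different
    frequencies, added to the control input and switched off after t_off
    seconds (illustrative values, not the paper's exact signal)."""
    if t >= t_off:
        return np.zeros(2)
    n = (np.sin(2.0 * t) ** 2 * np.cos(0.1 * t)
         + np.sin(1.2 * t) * np.cos(0.5 * t)
         + 0.5 * np.sin(t) ** 3)
    return np.array([n, n])

# Usage: tau = controller(...) + pe_noise(t) inside the simulation loop.
```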

Fig. 3 The state q of the robotic arm system and the desired trajectory qd

Fig. 4 The state q in the first 50 s

Fig. 5 The state error of the robotic arm system (a) and the error of the disturbance observer after turning off the PE noise (b)

Fig. 6 The weights of the neural network during the simulation

Fig. 7 The Euclidean norm of the weight matrix for the proposed method and AC2NN

5 Comparison with Other AC Algorithms Using Two Neural Networks (AC2NN)

In order to evaluate the advantages of the proposed algorithm, a comparison is executed between the presented scheme and a method that uses two neural networks for the learning process. The convergence of the two algorithms' weight norms ‖W‖ is shown in Fig. 7. The proposed OADP method uses only one neural network in the algorithm, so its convergence rate is significantly faster than the AC algorithm using two neural networks; i.e., the Critic weights of the OADP algorithm need only nearly 200 s to converge, whereas in the AC2NN it is more than 800 s. The detailed comparison results are depicted in Table 1; they show that using the simultaneous update method with only one neural network reduces the computation cost and increases the convergence rate of the algorithm significantly.

Table 1 Comparison of quality between OADP and AC2NN

No | Criterion | OADP | AC2NN
1 | Convergence time W1 (s) | 197 | 694
2 | Convergence time W2 (s) | 135 | 871
3 | Convergence time W3 (s) | 175 | 854
4 | Convergence time W4 (s) | 85 | 672
5 | Convergence time W5 (s) | 184 | 754
6 | Convergence time W6 (s) | 118 | 397
7 | Convergence time W7 (s) | 141 | 600
8 | Convergence time W8 (s) | 96 | 222
9 | Storage resources* | 8 | 16

*The weight matrices needed in the whole system

6 Conclusions

An online adaptive dynamic programming scheme combined with a disturbance observer is proposed in this paper to solve the robust optimization problem for nonlinear systems. The scheme, which has only one neural network, provides better performance, i.e., it reduces the computation time and increases the quality of the system. The stability of the overall system, which includes the Actor, Critic, and disturbance observer components, is mathematically proven through Lyapunov theory. Finally, simulations and comparisons are carried out to evaluate the correctness and the advantages of the proposed algorithm. The simulation results show that the observer-based OADP technique gives a good response for the planar robot even under system uncertainties and external disturbances. They also demonstrate that the proposed method is not only simpler but also more effective than AC2NN, i.e., faster convergence, smaller error, and smaller storage requirement.

Declarations

Conflict of interest The authors declare that they have no conflict of interest.

References

Chang, Y., Zhang, S., Alotaibi, N. D., & Alkhateeb, A. F. (2020). Observer-based adaptive finite-time tracking control for a class of switched nonlinear systems with unmodeled dynamics. IEEE Access, 8, 204782–204790.

Chen, W. (2004). Disturbance observer based control for nonlinear systems. IEEE/ASME Transactions on Mechatronics, 9(4), 706–710.

Du, X. K., Zhao, H., & Chang, X. H. (2015). Unknown input observer design for fuzzy systems with uncertainties. Applied Mathematics and Computation, 266, 108–118.

Dupree, K., Patre, P. M., Wilcox, Z. D., & Dixon, W. E. (2010). Asymptotic optimal control of uncertain nonlinear Euler-Lagrange systems. Automatica, 47(1), 99–107.

Fan, Q., & Yang, G. (2016). Adaptive actor–critic design-based integral sliding-mode control for partially unknown nonlinear systems with input disturbances. IEEE Transactions on Neural Networks and Learning Systems, 27(1), 165–177.

Gao, W., & Jiang, Z. (2016). Adaptive dynamic programming and adaptive optimal output regulation of linear systems. IEEE Transactions on Automatic Control, 61(12), 4164–4169.

Ioannou, P., & Fidan, B. (2006). Adaptive control tutorial. Society for Industrial and Applied Mathematics.

Jiang, Y., & Jiang, Z.-P. (2012). Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics. Automatica, 48(10), 2699–2704.

Liu, Y., Luo, Y., & Zhang, H. (2014). Adaptive dynamic programming for discrete-time LQR optimal tracking control problems with unknown dynamics. In 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL) (pp. 1–6). IEEE.

Luo, B., Liu, D., & Wu, H. (2018). Adaptive constrained optimal control design for data-based nonlinear discrete-time systems with critic-only structure. IEEE Transactions on Neural Networks and Learning Systems, 29(6), 2099–2111.

Luo, B., Liu, D., Wu, H., Wang, D., & Lewis, F. L. (2017). Policy gradient adaptive dynamic programming for data-based optimal control. IEEE Transactions on Cybernetics, 47(10), 3341–3354.

Luo, B., Wu, H., & Huang, T. (2015). Off-policy reinforcement learning for H∞ control design. IEEE Transactions on Cybernetics, 45(1), 65–76.

Song, R., & Lewis, F. L. (2020). Robust optimal control for a class of nonlinear systems with unknown disturbances based on disturbance observer and policy iteration. Neurocomputing, 390, 185–195.

Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.

Vamvoudakis, K. G., & Lewis, F. L. (2010a). Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica, 46(5), 878–888.

Vamvoudakis, K. G., & Lewis, F. L. (2010b). Online solution of nonlinear two-player zero-sum games using synchronous policy iteration. In 49th IEEE Conference on Decision and Control (pp. 3040–3047). Atlanta, GA.

Vrabie, D., Pastravanu, O., Abu-Khalaf, M., & Lewis, F. L. (2009). Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica, 45(2), 477–484.

Wu, H., & Luo, B. (2012). Neural network based online simultaneous policy update algorithm for solving the HJI equation in nonlinear H∞ control. IEEE Transactions on Neural Networks and Learning Systems, 23(12), 1884–1895.

Yang, J., Chen, W. H., & Li, S. (2011). Non-linear disturbance observer-based robust control for systems with mismatched disturbances/uncertainties. IET Control Theory & Applications, 5(18), 2053–2062.

Zhang, J., Peng, Z., Hu, J., Zhao, Y., Luo, R., & Ghosh, B. K. (2020). Internal reinforcement adaptive dynamic programming for optimal containment control of unknown continuous-time multi-agent systems. Neurocomputing, 413, 85–95.

Zhang, K., & Ge, S. (2019). Adaptive optimal control with guaranteed convergence rate for continuous-time linear systems with completely unknown dynamics. IEEE Access, 7, 11526–11532.

Zhou, P., Zhang, L., Zhang, S., & Alkhateeb, A. F. (2020). Observer-based adaptive fuzzy finite-time control design with prescribed performance for switched pure-feedback nonlinear systems. IEEE Access, 9, 69481–69491.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.