
IET Control Theory & Applications
Special Issue: Data-driven Control, and Data-Based System Modelling, Monitoring, and Control
ISSN 1751-8644

Sampled-data-based adaptive optimal output-feedback control of a 2-degree-of-freedom helicopter

Received on 24th September 2015; Revised on 29th January 2016; Accepted on 13th March 2016
doi: 10.1049/iet-cta.2015.0977
www.ietdl.org

Weinan Gao 1, Mengzhe Huang 1, Zhong-Ping Jiang 1, Tianyou Chai 2

1 Department of Electrical and Computer Engineering, Tandon School of Engineering, New York University, Brooklyn, NY 11201, USA
2 State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, People's Republic of China
E-mail: [email protected]

Abstract: This study addresses the adaptive and optimal control problem of a Quanser 2-degree-of-freedom helicopter via output feedback. In order to satisfy the requirement of digital implementation of the flight controller, this study distinguishes itself by proposing a novel sampled-data-based approximate/adaptive dynamic programming approach. A policy iteration algorithm is presented that learns a near-optimal control gain iteratively from input/output data. The convergence of the proposed algorithm is theoretically ensured, and the trade-off between optimality and the sampling period is rigorously studied as well. Finally, the authors show the performance of the proposed algorithm under bounded model uncertainties.

1 Introduction

Miniature unmanned aerial vehicles (UAVs) are not only deployed for military reconnaissance, but are also increasingly used in civilian applications, such as emergency transportation and mapping in hazardous areas. Typical examples of miniature UAVs include rotary-wing [1], fixed-wing [2, 3], and flapping-wing aircraft [4]. Among them, rotary-wing aircraft have attracted considerable attention since they require little space for take-off and landing, and offer strong flexibility in dense environments. However, high-precision control of rotary-wing UAVs is still a challenging task due to the unknown system dynamics and unmeasurable states encountered in practice. As a result, further investigation of the attitude control of rotary-wing aircraft is imperative and significant.

Multiple alternatives have been proposed to address the attitude control problem of rotary-wing UAVs, such as proportional–integral–derivative [5], sliding-mode [6] and backstepping methods [7, 8]. With the unknown or uncertain dynamics of the aircraft system in mind, adaptive control laws can be designed using the above-mentioned control methods; see [9, 10].

Optimal control is another important issue in flight control design. It mainly concerns how to develop controllers that minimise fuel usage [11, 12]. These algorithms usually rely on accurate knowledge of the system dynamics. Unfortunately, it is often hard to obtain an accurate model of a real aircraft. A straightforward way to avoid this difficulty is to design an adaptive optimal control policy by identifying the system parameters first, and then solving the corresponding Hamilton–Jacobi–Bellman equation [or algebraic Riccati equation (ARE) for linear systems]. However, the closed-loop system designed this way responds slowly to parameter variations of the plant [13].

Approximate/adaptive dynamic programming (ADP) is a non-model-based method inspired by biological systems [14–17]. Recently, how to iteratively learn the optimal controller without requiring knowledge of the system dynamics has been under extensive investigation; see [13, 18–30] and many references therein. In this paper, we employ the data-driven ADP technique to develop an attitude controller for a Quanser 2-degree-of-freedom (DOF) helicopter, which is a benchmark for testing various flight control algorithms. Our contributions are threefold. First, compared with full-state-feedback ADP control, which usually needs plenty of sensors, we propose a more practical output-feedback design approach. Second, this paper, for the first time, fills the gap between adaptive optimal control [31, 32] and sampled-data system theory [33] for continuous-time dynamic systems. It is generally accepted that modern flight control systems often use digital controllers due to the numerous benefits of digital technology. Therefore, the proposed approach is of great practical importance. Since the controller is designed by periodic sampling, it is suboptimal for the continuous-time system. Interestingly, we can guarantee that this sampled-data controller approaches the optimal controller as the sampling period goes to zero. Third, this paper computes an initial stabilising dynamic output-feedback controller by means of robust control theory.

The rest of this paper is organised as follows. In Section 2, the dynamic model of a 2-DOF helicopter is first presented in the presence of model uncertainties. Then, we review some basics of linear optimal control theory. The ADP-based output-feedback approach is proposed in Section 3 and an initial stabilising controller is developed in Section 4. In Section 5, we test the effectiveness of the proposed approach. Concluding remarks are contained in Section 6.

Notations: Throughout this paper, | · | represents the Euclidean norm for vectors and the induced norm for matrices. vec(A) = [a_1^T, a_2^T, ..., a_m^T]^T, where a_i ∈ R^n are the columns of A ∈ R^{n×m}. ⊗ indicates the Kronecker product operator.

2 Problem formulation and preliminaries

In this section, we first construct the dynamic model of a 2-DOF helicopter with parametric uncertainties. Then, we review some basic results in linear optimal control theory and a policy iteration (PI) technique for solving a discrete-time Riccati equation.

2.1 Dynamic model of 2-DOF helicopter

The free-body diagram of the 2-DOF helicopter is shown in Fig. 1. Using the Euler–Lagrange formula, the dynamic model of the helicopter is made available in [34]. In this paper, a state-space description of the helicopter with unknown parameters is given as

IET Control Theory Appl., 2016, Vol. 10, Iss. 12, pp. 1440–1447
1440 © The Institution of Engineering and Technology 2016
Fig. 1 Free-body diagram of 2-DOF helicopter

Table 1 System parameters
  J_p: total moment of inertia about pitch axis
  J_y: total moment of inertia about yaw axis
  m_heli: total mass of the helicopter
  B_p: equivalent viscous damping about pitch axis
  B_y: equivalent viscous damping about yaw axis
  K_pp: thrust torque constant acting on pitch axis from pitch propeller
  K_yy: thrust torque constant acting on yaw axis from yaw propeller
  K_py: thrust torque constant acting on pitch axis from yaw propeller
  K_yp: thrust torque constant acting on yaw axis from pitch propeller

follows

  ẋ = (A_0 + ΔA)x + (B_0 + ΔB)u,
  y = Cx,                                                         (1)

where x = [θ, ψ, θ̇, ψ̇]^T ∈ R^4 is the state, u = [F_p, F_y]^T ∈ R^2 is the control input, y(t) = [θ, ψ]^T ∈ R^2 is the output, and all matrices have appropriate dimensions. The corresponding system matrices of the nominal system are

  A_0 = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -B_p/J_{Tp} & 0 \\ 0 & 0 & 0 & -B_y/J_{Ty} \end{bmatrix},
  B_0 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ K_{pp}/J_{Tp} & K_{py}/J_{Tp} \\ K_{yp}/J_{Ty} & K_{yy}/J_{Ty} \end{bmatrix},
  C^T = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix},

with

  J_{Tp} = J_p + m_heli l_cm^2,  J_{Ty} = J_y + m_heli l_cm^2.

The norm-bounded system uncertainties are defined as follows

  ΔA = G_1 F_1 E_1,  ΔB = G_2 F_2 E_2,                            (2)

where, for i = 1, 2, G_i and E_i are known constant matrices with appropriate dimensions, and F_i is a bounded unknown matrix satisfying

  F_i ∈ F := {F | F^T F ≤ I}.                                     (3)

The following assumption is made on the system (1).

Assumption 1: The pair (A, B) is controllable and (A, C) is observable, where A = A_0 + ΔA and B = B_0 + ΔB.

Define a quadratic cost for (1)

  J(x_0) = ∫_0^∞ (y^T(τ)Qy(τ) + u^T(τ)Ru(τ)) dτ,                  (4)

where Q = Q^T ≥ 0 and R = R^T > 0 with (A, √Q C) observable. By linear optimal control theory [35], the minimum cost J* is obtained by

  u = −R^{-1}B^T P* x ≡ −K* x,                                    (5)

where P* = (P*)^T > 0 is the unique solution to the following ARE

  A^T P* + P* A + C^T Q C − P* B R^{-1} B^T P* = 0.               (6)

For practical implementation in the helicopter control system, our goal in this paper is to seek a suboptimal sampled-data control policy. A discretised model of (1) is obtained by periodic sampling

  x_{k+1} = A_d x_k + B_d u_k,
  y_k = C x_k,                                                    (7)

where x_k and u_k are the state and the input at the sampling instant kh, A_d = e^{Ah}, B_d = (∫_0^h e^{Aτ} dτ)B, and h > 0 is the sampling period. Assume the sampling frequency ω_h = 2π/h is non-pathological [33]; in other words, one cannot find two eigenvalues of A with equal real parts whose imaginary parts differ by an integer multiple of ω_h. Then both (A_d, C) and (A_d, √Q C) are observable, with (A_d, B_d) controllable. The cost for (7) is

  J_d(x_0) = Σ_{j=0}^∞ (y_j^T Q y_j + u_j^T R u_j).               (8)

The optimal control law minimising (8) is

  u_k = −(R + B_d^T P_d* B_d)^{-1} B_d^T P_d* A_d x_k ≡ −K_d* x_k,   (9)

where P_d* = (P_d*)^T > 0 is the unique solution to

  A_d^T P_d* A_d − P_d* + C^T Q C − A_d^T P_d* B_d (R + B_d^T P_d* B_d)^{-1} B_d^T P_d* A_d = 0.   (10)

Instead of directly solving (10), which is non-linear in P_d, the model-based PI Algorithm 1 (see Fig. 2), proposed by Hewer [36], can be used to approximate P_d*. It has been shown that the sequences {P_j}_{j=0}^∞ and {K_j}_{j=1}^∞ computed from Algorithm 1 (Fig. 2) converge to P_d* and K_d*, respectively. Moreover, for j = 0, 1, 2, ..., A_d − B_d K_j is a Schur matrix.

3 Adaptive optimal control design via output-feedback ADP

In this section, we will develop an adaptive optimal control algorithm for the discretised system (7) via output feedback. This algorithm does not rely on knowledge of the system matrices A_d, B_d or C. Convergence of the proposed control algorithm is rigorously analysed as well.

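The derivations in this section rely on the vec(·) and ⊗ operators defined in the Notations, which turn bilinear forms in the unknown matrices into expressions that are linear in their vectorisations. A minimal sketch of the underlying identity, on generic matrices rather than anything from the paper:

```python
# Illustrative sketch (generic data, not from the paper): with column-stacking
# vec(.) as in the Notations, u^T H z = (z^T ⊗ u^T) vec(H), which is why terms
# such as those in (16) become linear in the unknowns vec(H̄_j1), vec(H̄_j2).
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((3, 3))      # stands in for an unknown block like H̄_j1
u = rng.standard_normal(3)
z = rng.standard_normal(3)

vec_H = H.reshape(-1, order="F")     # stack the columns of H
lhs = u @ H @ z                      # bilinear form u^T H z
rhs = np.kron(z, u) @ vec_H          # (z^T ⊗ u^T) vec(H): linear in vec(H)
print(np.isclose(lhs, rhs))          # True
```

Collecting one such row per sample instant is what produces the regressors φ_k1 and φ_k2 used below.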
IET Control Theory Appl., 2016, Vol. 10, Iss. 12, pp. 1440–1447
© The Institution of Engineering and Technology 2016 1441
Fig. 2 Model-based PI algorithm
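Algorithm 1 (Fig. 2) can be sketched numerically as follows. This is an assumed illustration on a generic stable second-order plant, not the authors' implementation or the helicopter matrices; it alternates the policy-evaluation Lyapunov equation (11) with the gain update (12) and compares the result with a direct solution of the DARE (10):

```python
# Sketch of the model-based PI of Algorithm 1 (Hewer [36]) on an illustrative
# plant; the discretisation follows (7).
import numpy as np
from scipy.linalg import expm, solve_discrete_are, solve_discrete_lyapunov

A = np.array([[0.0, 1.0], [-1.0, -0.5]])           # illustrative continuous plant
B = np.array([[0.0], [1.0]])
C = np.eye(2)
Q, R, h = np.eye(2), np.eye(1), 0.05

# A_d = e^{Ah}, B_d = (int_0^h e^{A tau} d tau) B, via one augmented exponential
n, m = A.shape[0], B.shape[1]
E = expm(np.block([[A, B], [np.zeros((m, n + m))]]) * h)
Ad, Bd = E[:n, :n], E[:n, n:]

K = np.zeros((m, n))                               # stabilising here: A_d is Schur
for _ in range(15):
    Aj = Ad - Bd @ K                               # policy evaluation, (11)
    Pj = solve_discrete_lyapunov(Aj.T, C.T @ Q @ C + K.T @ R @ K)
    K = np.linalg.solve(R + Bd.T @ Pj @ Bd,        # policy improvement, (12)
                        Bd.T @ Pj @ Ad)

Pd = solve_discrete_are(Ad, Bd, C.T @ Q @ C, R)    # direct DARE solution of (10)
print(np.allclose(Pj, Pd))                         # PI has converged to P_d*
```

The data-driven algorithm of this section performs the same iteration, but with (11) replaced by the least-squares problem (16), so that A_d, B_d and C are never used.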

3.1 State reconstruction

Just like in [31, 37], we extend the system (7) using input/output sequences on the time horizon [k − N, k − 1]

  x_k = A_d^N x_{k−N} + V(N)ū_{k−1,k−N},
  ȳ_{k−1,k−N} = U(N)x_{k−N} + T(N)ū_{k−1,k−N},                    (13)

where

  ū_{k−1,k−N} = [u_{k−1}^T, u_{k−2}^T, ..., u_{k−N}^T]^T,
  ȳ_{k−1,k−N} = [y_{k−1}^T, y_{k−2}^T, ..., y_{k−N}^T]^T,
  V(N) = [B_d, A_d B_d, ..., A_d^{N−1} B_d],
  U(N) = [(C A_d^{N−1})^T, ..., (C A_d)^T, C^T]^T,

  T(N) = \begin{bmatrix} 0 & CB_d & CA_dB_d & \cdots & CA_d^{N-2}B_d \\ 0 & 0 & CB_d & \cdots & CA_d^{N-3}B_d \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & CB_d \\ 0 & 0 & \cdots & 0 & 0 \end{bmatrix},

and N is the observability index [31]. Hence there exists a left inverse of U(N), defined as U^+(N) = [U^T(N)U(N)]^{-1}U^T(N). A lemma on the uniqueness of the state reconstruction from [38] is recalled below.

Lemma 1: Under the condition of Assumption 1, if ω_h is non-pathological, x_k is obtained uniquely in terms of the measured input/output sequences by

  x_k = [M_u, M_y] [ū_{k−1,k−N}; ȳ_{k−1,k−N}] := Θ z_k,           (14)

where M_y = A_d^N U^+(N), M_u = V(N) − M_y T(N), Θ = [M_u, M_y], z_k = [ū_{k−1,k−N}^T, ȳ_{k−1,k−N}^T]^T ∈ R^q and q = N[dim(u) + dim(y)] = 4N.

3.2 Adaptive optimal controller design

Defining A_j = A_d − B_d K_j, we rewrite the discretised system (7) as

  x_{k+1} = A_j x_k + B_d (K_j x_k + u_k).                        (15)

Letting K̄_j = K_j Θ and P̄_j = Θ^T P_j Θ, by (11) and (15) it follows that

  z_{k+1}^T P̄_j z_{k+1} − z_k^T P̄_j z_k
    = x_{k+1}^T P_j x_{k+1} − x_k^T P_j x_k
    = x_k^T A_j^T P_j A_j x_k + (K_j x_k + u_k)^T B_d^T P_j B_d (K_j x_k + u_k)
      + 2(K_j x_k + u_k)^T B_d^T P_j (A_d − B_d K_j) x_k − x_k^T P_j x_k
    = (K_j x_k + u_k)^T [B_d^T P_j B_d, B_d^T P_j A_d] [−K_j x_k + u_k; 2x_k]
      − (y_k^T Q y_k + x_k^T K_j^T R K_j x_k)
    = (K̄_j z_k + u_k)^T [H̄_{j1}, H̄_{j2}] [−K̄_j z_k + u_k; 2z_k]
      − (y_k^T Q y_k + z_k^T K̄_j^T R K̄_j z_k)
    = [u_k^T ⊗ u_k^T − (z_k^T ⊗ z_k^T)(K̄_j^T ⊗ K̄_j^T)] vec(H̄_{j1})
      + 2[(z_k^T ⊗ z_k^T)(I_q ⊗ K̄_j^T) + (z_k^T ⊗ u_k^T)] vec(H̄_{j2})
      − (y_k^T Q y_k + z_k^T K̄_j^T R K̄_j z_k)
    := φ_{k1} vec(H̄_{j1}) + φ_{k2} vec(H̄_{j2}) − (y_k^T Q y_k + z_k^T K̄_j^T R K̄_j z_k),   (16)

where H̄_{j1} = B_d^T P_j B_d, H̄_{j2} = B_d^T P_j A_d, and φ_{k1}, φ_{k2} are two row vectors computed from online data and K̄_j.

Assumption 2: There exists a large enough integer s > 0 such that

  rank(Φ) = (1/2)[dim(u) + dim(z)][dim(u) + dim(z) + 1],          (17)

where

  Φ = [η_{k_0} ⊗ η_{k_0}, η_{k_1} ⊗ η_{k_1}, ..., η_{k_s} ⊗ η_{k_s}]   (18)

and η_k = [u_k^T, z_k^T]^T.

Assumption 2 is related to the condition of persistent excitation (PE) in adaptive control theory [39, 40]. Under (17), it has been proved in [37] that the triplet (P̄_j, H̄_{j1}, H̄_{j2}) can be uniquely solved (e.g. by the least-squares method) given K̄_j and the online data u_k, y_k, z_k, z_{k+1} measured during the period k ∈ [k_0, k_s]. Then, K̄_{j+1} can be computed by (12)

  K̄_{j+1} = (R + H̄_{j1})^{-1} H̄_{j2}.                           (19)

Remark 1: By solving (16) instead of (11), the requirement of knowledge of the system matrices A_d, B_d and C is fully eliminated. All that needs to be measured is input/output data at the sampling instants.

Now we are ready to present the output-feedback ADP Algorithm 2 (see Fig. 3). It should be noted that (16), called policy evaluation, aims to solve for P̄_j, while (19), called policy improvement, updates the control gain K̄_{j+1}. The convergence of Algorithm 2 (Fig. 3) is given in Theorem 1.

Theorem 1: Under the condition of Assumption 2, given a stabilising gain K̄_0, the sequences {P̄_j}_0^∞ and {K̄_j}_0^∞ obtained from

Fig. 3 Output-feedback ADP algorithm

iteratively solving (16) and (19) converge to P̄* and K̄*, respectively.

Proof: Given a stabilising K_j, if P_j = P_j^T is the solution to (11), then K_{j+1} is uniquely determined by (12). It is easy to check that P̄_j and K̄_{j+1} satisfy (16) and (19). Letting P̄ and K̄ solve (16) and (19), condition (17) ensures that P̄_j = P̄ and K̄_{j+1} = K̄ are uniquely determined. By Hewer [36], we have lim_{j→∞} K̄_j = K̄_d* and lim_{j→∞} P̄_j = P̄_d*. This completes the proof of convergence. □

Remark 2: From the definition of z_k, we observe that the learned control policy (20) depends only on previous input/output data. Therefore, it is a dynamic output-feedback control policy.

Remark 3: Since z_k contains the input/output data u_{k−N} and y_{k−N}, it is not available from k = 0 to k = N − 1. In this paper, we develop a robust controller v_k for this period. The details of the design of v_k are presented in Section 4.

Remark 4: As in previous work on ADP [13, 31], the exploration noise e_k in Algorithm 2 (Fig. 3) is introduced in order to satisfy the rank condition (17).

The following theorem characterises the relationship between the optimal cost for the original continuous-time system and the cost value for the discretised counterpart under the sampled-data controller (20).

Theorem 2: Letting J^⊕ be the cost in (4) for the system (1) in closed loop with (20), the error between J^⊕ and J*,

  J^⊕ − J* = ∫_0^∞ (u_k^{j*} + K* x)^T R (u_k^{j*} + K* x) dτ,    (21)

is bounded. For small h > 0 such that ω_h is non-pathological, we have J^⊕ − J* → y^T(0)Qy(0)h + O_J(h^2) as ε → 0, where lim sup_{h→0} |O_J(h^2)/h^2| < ∞.

Proof: Differentiating the Lyapunov function V = x^T P* x along the solutions of the system

  ẋ = Ax + Bu_k^{j*},                                             (22)

we have

  d/dt (x^T P* x) = x^T (A^T P* + P* A)x + 2x^T P* B u_k^{j*}
    = −y^T Q y + x^T (K*)^T R K* x + 2x^T (K*)^T R u_k^{j*}
    = −(y^T Q y + (u_k^{j*})^T R u_k^{j*}) + (u_k^{j*} + K* x)^T R (u_k^{j*} + K* x).

Integrating both sides of the last equation, we have

  −x^T(0) P* x(0) = −∫_0^∞ (y^T Q y + (u_k^{j*})^T R u_k^{j*}) dt
                    + ∫_0^∞ (u_k^{j*} + K* x)^T R (u_k^{j*} + K* x) dt,

where the left side is −J* and the first term on the right side is −J^⊕. Moving the first term on the right side to the left, we obtain (21).

Let J be the cost in (4) for the system (1) in closed loop with (9). It is easy to check that the learned controller approaches the discretised optimal controller (9) as the threshold ε goes to 0, which implies that J^⊕ → J as ε → 0. Next, we explore how the sampling period h affects the cost error J − J*. Inspired by Melzer and Kuo [41], for h > 0, let P_h(h) = hP_d*(h), and

  X(h) = (A_d^T P_h(h) A_d − P_h(h) + hC^T Q C)/h,
  Y(h) = (A_d^T P_h(h) B_d)/h,                                    (23)
  Z(h) = (hR + B_d^T P_h(h) B_d)/h.

It is easy to obtain the limits of X(h), Y(h) and Z(h) as h goes to zero

  X(0) = P_h(0)A + A^T P_h(0) + C^T Q C,
  Y(0) = P_h(0)B,  Z(0) = R.                                      (24)

Then, (9) and (10) imply

  K_d*(h) = Z^{-1}(h)Y^T(h);  K_d*(0) = R^{-1}B^T P_h(0),
  X(h) = Y(h)K_d*(h);  P_h(0)A + A^T P_h(0) + C^T Q C = P_h(0)BR^{-1}B^T P_h(0),   (25)

which indicates that P_h(0) = P* and K_d*(0) = K*. For the first-order sensitivities, K_d*(h) and X(h) are differentiated at h = 0:

  ∂K_d*/∂h = R^{-1} ∂Y^T/∂h − R^{-1} (∂Z/∂h) R^{-1} B^T P_h(0),
  ∂X/∂h = P_h(0)B ∂K_d*/∂h + (∂Y/∂h) R^{-1} B^T P_h(0).           (26)

By (23), the first-order sensitivities of X(h), Y(h) and Z(h) are

  ∂X/∂h = A^T ∂P_h/∂h + (∂P_h/∂h) A + (A^T/2)(A^T P_h(0) + P_h(0)A) + (A^T P_h(0) + P_h(0)A)(A/2),
  ∂Y/∂h = (∂P_h/∂h) B + (1/2)(2A^T P_h(0) + P_h(0)A)B,            (27)
  ∂Z/∂h = B^T P_h(0) B.
Substituting (27) into (26), and using (25), we have

  (A − BR^{-1}B^T P_h(0))^T (∂P_h/∂h − C^T Q C/2)
  + (∂P_h/∂h − C^T Q C/2)(A − BR^{-1}B^T P_h(0)) = 0.

Since A − BR^{-1}B^T P_h(0) is asymptotically stable, we obtain

  ∂P_h/∂h = C^T Q C / 2.                                          (28)

Write J as a summation (see [41])

  J = ∫_0^∞ (y^T(τ)Qy(τ) + u^T(τ)Ru(τ)) dτ
    = Σ_{j=0}^∞ (x_j^T Q̂ x_j + 2x_j^T M u_j + u_j^T R̂ u_j),      (29)

where

  Q̂ = ∫_0^h (e^{As})^T C^T Q C e^{As} ds,
  M = ∫_0^h (e^{As})^T C^T Q C (∫_0^s e^{Aλ} B dλ) ds,
  R̂ = hR + ∫_0^h (∫_0^s e^{Aλ} B dλ)^T C^T Q C (∫_0^s e^{Aλ} B dλ) ds.

Then, we have J = x^T(0)P(h)x(0), where P(h) satisfies

  (A_d − B_d K_d*)^T P (A_d − B_d K_d*) − P
  = Q̂ − M K_d* − (K_d*)^T M^T + (K_d*)^T R̂ K_d*.                 (30)

Define ΔP(h) = P(h) − P_h(h), and

  X_2(h) = [(A_d − B_d K_d*)^T ΔP (A_d − B_d K_d*) − ΔP]/h,
  Y_2(h) = [Q̂ − hC^T Q C − M K_d* − (K_d*)^T M^T + (K_d*)^T (R̂ − hR) K_d*]/h.

Equations (10) and (30) imply X_2(h) = Y_2(h). By X_2(0) = Y_2(0) and the first-order sensitivities of X_2 and Y_2, we have

  0 = (A − BK*)^T ΔP(0) + ΔP(0)(A − BK*),
  0 = (A − BK*)^T (∂ΔP/∂h − C^T Q C/2) + (∂ΔP/∂h − C^T Q C/2)(A − BK*).

Since (A − BK*) is a Hurwitz matrix, this reveals that ΔP(0) = 0 and ∂ΔP/∂h = C^T Q C/2. By (28), we obtain the first-order approximation of P − P*:

  P − P* = ΔP + (P_h − P*) → C^T Q C h.

Fig. 4 Robust controller design algorithm [42]

This directly implies

  J − J* = y^T(0)Qy(0)h + O_J(h^2).

The proof is thus completed. □

4 Robust controller design

In this section, we focus on the design of a robust output-feedback controller v_k for Algorithm 2 (Fig. 3). First, we seek a stabilising continuous-time output-feedback control policy v(t) given the bound of the plant uncertainties. Then, we discretise v(t) by periodic sampling. The stability of the system in closed loop with this discretised controller is ensured by sampled-data system theory.

Consider a dynamic output-feedback controller

  x̂̇(t) = A_K x̂(t) + B_K y(t),
  v(t) = C_K x̂(t),                                                (31)

where x̂ ∈ R^p is the state of the controller and all matrices are constant with proper dimensions. Combining the controller (31) and the model (1), the closed-loop system is described by

  ξ̇(t) = A_cl ξ(t),                                               (32)

where

  ξ(t) = [x(t); x̂(t)],
  A_cl = \begin{bmatrix} A_0 + ΔA & (B_0 + ΔB)C_K \\ B_K C & A_K \end{bmatrix}.   (33)

Let K_ex incorporate all the parameters of the controller

  K_ex := \begin{bmatrix} 0 & C_K \\ B_K & A_K \end{bmatrix},     (34)

then

  A_cl = A_0ex + ΔA_ex + [B_0ex + ΔB_ex] K_ex C_ex,               (35)

where

  A_0ex = \begin{bmatrix} A_0 & 0 \\ 0 & 0 \end{bmatrix},
  ΔA_ex = \begin{bmatrix} ΔA & 0 \\ 0 & 0 \end{bmatrix},
  B_0ex = \begin{bmatrix} B_0 & 0 \\ 0 & I \end{bmatrix},
  ΔB_ex = \begin{bmatrix} ΔB & 0 \\ 0 & 0 \end{bmatrix},
  C_ex = \begin{bmatrix} C & 0 \\ 0 & I \end{bmatrix}.

Define a Lyapunov function as V(t) = ξ^T(t) P_ex ξ(t). Then, the derivative of this function with respect to time is

  V̇(t) = ξ^T (A_cl^T P_ex + P_ex A_cl) ξ
        = ξ^T [A_0ex + ΔA_ex + (B_0ex + ΔB_ex)K_ex C_ex]^T P_ex ξ
        + ξ^T P_ex [A_0ex + ΔA_ex + (B_0ex + ΔB_ex)K_ex C_ex] ξ.   (36)
By using (3) and the fact that

  GFE + E^T F^T G^T ≤ (1/ε) G G^T + ε E^T E,  ε > 0,              (37)

for any matrices G and E and any F with F^T F ≤ I, we have

  V̇(t) ≤ ξ^T(t) Π ξ(t),

where

  Π = S̃ + C_ex^T K_ex^T B_0ex^T P_ex + P_ex B_0ex K_ex C_ex
      + C_ex^T K_ex^T [I  0]^T Q_1 [I  0] K_ex C_ex
      + P_ex [G^T  0]^T [G^T  0] P_ex,

  S̃ = A_0ex^T P_ex + P_ex A_0ex + \begin{bmatrix} R_1 & 0 \\ 0 & 0 \end{bmatrix},
  R_1 = E_1^T E_1,
  Q_1 = E_2^T E_2,
  [G^T  0]^T [G^T  0] = [G_1^T  0]^T [G_1^T  0] + [G_2^T  0]^T [G_2^T  0].

By using the Schur complement, Π < 0 is equivalent to

  \begin{bmatrix} S̃ + C_ex^T K_ex^T B_0ex^T P_ex + P_ex B_0ex K_ex C_ex & P_ex [G^T  0]^T & C_ex^T K_ex^T [I  0]^T \\ [G^T  0] P_ex & -I & 0 \\ [I  0] K_ex C_ex & 0 & -Q_1^{-1} \end{bmatrix} < 0.   (38)

A sufficient condition for the existence of robust controllers is given in Theorem 3, and Algorithm 3 (Fig. 4) can be employed to seek a robust controller.

Theorem 3 [42]: Consider the system (1) and let C_⊥^T be the orthogonal complement of C^T. If there exist matrices X > 0 and Y > 0 satisfying the following inequalities

  \begin{bmatrix} I & 0 \\ -B_0^T & 0 \\ 0 & I \end{bmatrix}^T
  \begin{bmatrix} X A_0^T + A_0 X & 0 & X & G \\ 0 & -Q_1^{-1} & 0 & 0 \\ X & 0 & -R_1^{-1} & 0 \\ G^T & 0 & 0 & -I \end{bmatrix}
  \begin{bmatrix} I & 0 \\ -B_0^T & 0 \\ 0 & I \end{bmatrix} < 0,   (39)

  \begin{bmatrix} C_⊥ & 0 \\ 0 & I \end{bmatrix}^T
  \begin{bmatrix} A_0^T Y + Y A_0 + R_1 & Y G \\ G^T Y & -I \end{bmatrix}
  \begin{bmatrix} C_⊥ & 0 \\ 0 & I \end{bmatrix} < 0,              (40)

  \begin{bmatrix} X & I \\ I & Y \end{bmatrix} ≥ 0,                (41)

then there exist robust stabilising controllers for (1).

As mentioned before, we need to propose a robust discrete-time control policy for the purpose of digital implementation. Using the same sampling period h, we obtain the discretised variant of (31)

  x̂_{k+1} = A_Kd x̂_k + B_Kd y_k,
  v_k = C_K x̂_k,                                                  (42)

where A_Kd = e^{A_K h} and B_Kd = (∫_0^h e^{A_K τ} dτ) B_K.

To make a connection between the stability of the continuous-time system and that of its discretised counterpart, the following Lemma 2 is presented.

Lemma 2 [43]: If the continuous-time system (1) in closed loop with the controller (31) is stable, then there exists a δ > 0 such that its discretised counterpart (7) with (42) is stable for any h ∈ (0, δ).

To this end, we obtain the discrete-time output-feedback stabilising controller (42) for Algorithm 2 (Fig. 3) by robust control and sampled-data system theory.

Fig. 5 Trajectory of inputs

Fig. 6 Trajectory of outputs

5 Numerical simulations

In this section, simulations are conducted on the 2-DOF helicopter. The parameters in the nominal model are B_p = 0.8 N/V, B_y = 0.318 N/V, m_heli = 1.3872 kg, l_cm = 0.186 m, J_p = 0.0384 kg m^2, J_y = 0.0432 kg m^2, K_pp = 0.204 N m/V, K_yy = 0.072 N m/V, K_py = 0.0068 N m/V and K_yp = 0.0219 N m/V. The norm-bounded
uncertainties are

  ΔA = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0.1 & 0 \\ 0 & 0 & 0 & 0.2 \end{bmatrix} F_1,
  ΔB = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0.1 & 0.05 \\ 0.05 & 0.1 \end{bmatrix} F_2.

First, we follow Algorithm 3 (Fig. 4) to compute the gain of the initial stabilising controller (see the equation at the bottom of this page). Then, we take the sampling period h = 0.1 s, and thus get the discrete-time robust controller (42). For the purpose of simulation, we set the weight matrices as Q = diag{200, 150} and R = I_2. The observability index is N = 2 and the stopping criterion of Algorithm 2 is ε = 10. By solving the discrete-time ARE (10), we get the optimal values P̄_d* := Θ^T P_d* Θ and K̄_d* := K_d* Θ (see the equations at the bottom of this page).

From t = 0 to 10 s, the robust output-feedback controller (42) is employed. After that, an exploration noise is added to the control input until Assumption 2 is satisfied, i.e. until t = 40 s. The online input and output information is collected during the whole process and the adaptive optimal controller is computed iteratively. After 13 iterations, the approximate optimal values are obtained as shown at the bottom of this page; they are very close to the corresponding optimal values. Figs. 5–7 depict the trajectories of the inputs, outputs and states of the helicopter system. Finally, we obtain that the cost of the continuous-time system is J* = 2.2031, and that of the sampled-data system is J^⊕ = 2.2117, which is close to J*.

Fig. 7 Trajectory of states

6 Conclusion

This paper addresses the adaptive optimal control problem of a helicopter via sampled-data output feedback. ADP is employed as a useful tool to design a new class of adaptive output-feedback optimal controllers. The suboptimality of the designed controller is closely related to the sampling period, reflecting the compromise made for digital implementation. An initial stabilising controller is designed by robust control theory for the learning phase. Numerical simulation results demonstrate the validity of the proposed approach. The developed methodology can be used to study other sampled-data adaptive optimal control problems.

7 Acknowledgments

This work was supported in part by the U.S. National Science Foundation grants ECCS-1230040 and ECCS-1501044, and by the
  K_ex = \begin{bmatrix}
    0 & 0 & 0 & 0 & -0.59 & -0.06 \\
    0 & 0 & 0 & 0 & -0.02 & -0.02 \\
    2424186.55 & 54807.81 & -217478.30 & -6246.69 & 738120.17 & 9547.23 \\
    288020.99 & 3888867.25 & -25081.01 & -476750.84 & 85230.93 & 1027753.76 \\
    235133.49 & 18498.04 & -21090.76 & -2222.12 & 71575.64 & 4411.69 \\
    125465.95 & 1220394.57 & -11018.11 & -149604.95 & 37428.55 & 322451.81
  \end{bmatrix}

  P̄_d* = \begin{bmatrix}
    2114.8 & -525.37 & -769.57 & 360.73 & 26.198 & -2.4188 & 12.272 & -1.1706 \\
    -525.37 & 6898.8 & 206.92 & -4418.0 & 4.2648 & 39.577 & 2.781 & 20.008 \\
    -769.57 & 206.92 & 283.61 & -142.12 & -9.7906 & 0.97296 & -4.5095 & 0.47326 \\
    360.73 & -4418.0 & -142.12 & 2871.1 & -2.5641 & -26.007 & -1.6803 & -12.998 \\
    26.198 & 4.2648 & -9.7906 & -2.5641 & 0.36529 & 0.035236 & 0.16619 & 0.01771 \\
    -2.4188 & 39.577 & 0.97296 & -26.007 & 0.035236 & 0.23801 & 0.020441 & 0.11793 \\
    12.272 & 2.781 & -4.5095 & -1.6803 & 0.16619 & 0.020441 & 0.077248 & 0.010432 \\
    -1.1706 & 20.008 & 0.47326 & -12.998 & 0.01771 & 0.11793 & 0.010432 & 0.058951
  \end{bmatrix}

  K̄_d* = \begin{bmatrix}
    19.689 & 2.5852 & -7.6329 & -1.6218 & 0.29616 & 0.025204 & 0.12903 & 0.012087 \\
    -2.5028 & 33.425 & 0.99182 & -22.492 & 0.026238 & 0.2093 & 0.015181 & 0.1019
  \end{bmatrix}

  P̄_13 = \begin{bmatrix}
    2114.8 & -525.37 & -769.57 & 360.73 & 26.198 & -2.4188 & 12.272 & -1.1706 \\
    -525.37 & 6898.8 & 206.92 & -4418.0 & 4.2648 & 39.577 & 2.781 & 20.008 \\
    -769.57 & 206.92 & 283.61 & -142.12 & -9.7906 & 0.97296 & -4.5095 & 0.47327 \\
    360.73 & -4418.0 & -142.12 & 2871.1 & -2.5641 & -26.007 & -1.6803 & -12.998 \\
    26.198 & 4.2648 & -9.7906 & -2.5641 & 0.36529 & 0.035236 & 0.16619 & 0.01771 \\
    -2.4188 & 39.577 & 0.97296 & -26.007 & 0.035236 & 0.23801 & 0.020441 & 0.11793 \\
    12.272 & 2.781 & -4.5095 & -1.6803 & 0.16619 & 0.020441 & 0.077248 & 0.010432 \\
    -1.1706 & 20.008 & 0.47327 & -12.998 & 0.01771 & 0.11793 & 0.010432 & 0.058951
  \end{bmatrix}

  K̄_13 = \begin{bmatrix}
    19.689 & 2.5843 & -7.6331 & -1.6231 & 0.29616 & 0.025204 & 0.12903 & 0.012087 \\
    -2.5029 & 33.425 & 0.99185 & -22.492 & 0.026238 & 0.2093 & 0.015181 & 0.1019
  \end{bmatrix}
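The closeness of J^⊕ = 2.2117 to J* = 2.2031 reflects the sampling-period trade-off of Theorem 2. A quick numerical check, on an illustrative plant rather than the helicopter model, that P_h(h) = hP_d*(h) approaches the ARE solution P* of (6) as the sampling period shrinks:

```python
# Illustrative sketch (generic plant, not the paper's simulation): as h -> 0,
# h * P_d*(h) from the DARE (10) tends to the continuous ARE solution P* of (6),
# so the sampled-data design recovers the continuous-time optimum.
import numpy as np
from scipy.linalg import expm, solve_continuous_are, solve_discrete_are

A = np.array([[0.0, 1.0], [-2.0, -1.0]])
B = np.array([[0.0], [1.0]])
C = np.eye(2)
Q, R = np.eye(2), np.eye(1)

P_star = solve_continuous_are(A, B, C.T @ Q @ C, R)       # ARE (6)

def P_h(h):
    """h * P_d*(h), with (A_d, B_d) as in (7) and P_d*(h) from the DARE (10)."""
    n, m = A.shape[0], B.shape[1]
    E = expm(np.block([[A, B], [np.zeros((m, n + m))]]) * h)
    Ad, Bd = E[:n, :n], E[:n, n:]
    return h * solve_discrete_are(Ad, Bd, C.T @ Q @ C, R)

errs = [np.linalg.norm(P_h(h) - P_star) for h in (0.1, 0.01, 0.001)]
print(errs[0] > errs[1] > errs[2])   # the gap shrinks with the sampling period
```

Consistent with (28), the leading error term here scales linearly with h.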

National Natural Science Foundation of China grant 61374042. The authors thank Dr. P. Albertos for fruitful discussions.

8 References

1 Cai, G., Chen, B.M., Lee, T.H.: 'Unmanned rotorcraft systems' (Springer, 2011)
2 Beard, R., Kingston, D., Quigley, M. et al.: 'Autonomous vehicle technologies for small fixed-wing UAVs', J. Aerosp. Comput. Inf. Commun., 2005, 2, (1), pp. 92–108
3 Kang, Y., Hedrick, J.K.: 'Linear tracking for a fixed-wing UAV using nonlinear model predictive control', IEEE Trans. Control Syst. Technol., 2009, 17, (5), pp. 1202–1210
4 Deng, X., Schenato, L., Wu, W.C. et al.: 'Flapping flight for biomimetic robotic insects: part I-system modeling', IEEE Trans. Robot., 2006, 22, (4), pp. 776–788
5 Tommaso, B.: 'Modelling, identification and control of a quadrotor helicopter'. PhD thesis, Lund University, 2008
6 Xu, R., Ozguner, U.: 'Sliding mode control of a quadrotor helicopter'. IEEE Conf. on Decision and Control, 2006, pp. 4957–4962
7 Farrell, J.M.S., Polycarpou, M., Dong, W.: 'Command filtered backstepping', IEEE Trans. Autom. Control, 2009, 54, (6), pp. 1391–1395
8 Gao, W., Fang, Z.: 'Adaptive integral backstepping control for a 3-DOF helicopter', Int. Conf. Inf. Autom., 2012, pp. 190–195
9 Fang, Z., Gao, W.: 'Adaptive integral backstepping control of a micro-quadrotor'. Int. Conf. Intelligent Control and Information Processing, 2011, pp. 910–915
10 Lee, D., Kim, H.J., Sastry, S.: 'Feedback linearization vs. adaptive sliding mode control for a quadrotor helicopter', Int. J. Control Autom. Syst., 2009, 7, (3), pp. 419–428
11 Budiyono, A., Wibowo, S.S.: 'Optimal tracking controller design for a small scale helicopter', J. Bionic Eng., 2007, 4, (4), pp. 271–280
12 Sanchez, L.A., Santos, O., Romero, H.: 'Nonlinear and optimal real-time control of a rotary-wing UAV', Proc. American Control Conf., 2012, pp. 3857–3862
13 Jiang, Y., Jiang, Z.P.: 'Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics', Automatica, 2012, 48, (10), pp. 2699–2704
14 Bertsekas, D.P., Tsitsiklis, J.N.: 'Neuro-dynamic programming' (Athena Scientific, 1996)
15 Murray, J.J., Cox, C.J., Lendaris, G.G. et al.: 'Adaptive dynamic programming', IEEE Trans. Syst. Man Cybern. C, Appl. Rev., 2002, 32, (2), pp. 140–153
16 Powell, W.B.: 'Approximate dynamic programming: solving the curse of dimensionality' (John Wiley & Sons, 2007)
17 Werbos, P.J.: 'Beyond regression: new tools for prediction and analysis in the behavioral sciences'. PhD thesis, Harvard University, 1974
18 Bian, T., Jiang, Y., Jiang, Z.P.: 'Adaptive dynamic programming and optimal control of nonlinear non-affine systems', Automatica, 2014, 50, (10), pp. 2624–2632
19 Dierks, T., Jagannathan, S.: 'Online optimal control of affine nonlinear discrete-time systems with unknown internal dynamics by using time-based policy update', IEEE Trans. Neural Netw. Learn. Syst., 2012, 23, (7), pp. 1118–1129
20 Gao, W., Jiang, Z.P.: 'Linear optimal tracking control: an adaptive dynamic programming approach', Proc. American Control Conf., Chicago, IL, 2015, pp. 4929–4934
21 Gao, W., Jiang, Z.P.: 'Adaptive dynamic programming and adaptive optimal output regulation of linear systems', accepted by IEEE Trans. Autom. Control, 2016
22 Gao, W., Jiang, Z.P.: 'Nonlinear and adaptive suboptimal control of connected vehicles: a global adaptive dynamic programming approach', accepted by J. Intell. Robot. Syst., 2016
23 Jiang, Y., Jiang, Z.P.: 'Robust adaptive dynamic programming and feedback stabilization of nonlinear systems', IEEE Trans. Neural Netw. Learn. Syst., 2014, 25, (5), pp. 882–893
24 Jiang, Z.P., Jiang, Y.: 'Robust adaptive dynamic programming for linear and nonlinear systems: an overview', Eur. J. Control, 2013, 19, (5), pp. 417–425
25 Kamalapurkar, R., Dinh, H., Bhasin, S. et al.: 'Approximate optimal trajectory tracking for continuous-time nonlinear systems', Automatica, 2015, 51, pp. 40–48
26 Lewis, F.L., Vrabie, D.: 'Reinforcement learning and adaptive dynamic programming for feedback control', IEEE Circuits Syst. Mag., 2009, 9, (3), pp. 32–50
27 Lewis, F.L., Vrabie, D., Vamvoudakis, K.G.: 'Reinforcement learning and feedback control: using natural decision methods to design optimal adaptive controllers', IEEE Control Syst. Mag., 2012, 32, (6), pp. 76–105
28 Luo, B., Wu, H.N., Huang, T. et al.: 'Data-based approximate policy iteration for affine nonlinear continuous-time optimal control design', Automatica, 2014, 50, (12), pp. 3281–3290
29 Wang, D., Liu, D., Wei, Q. et al.: 'Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming', Automatica, 2012, 48, (8), pp. 1825–1832
30 Zhang, H., Liu, D., Luo, Y. et al.: 'Adaptive dynamic programming for control: algorithms and stability' (Springer, 2013)
31 Lewis, F.L., Vamvoudakis, K.G.: 'Reinforcement learning for partially observable dynamic processes: adaptive dynamic programming using measured output data', IEEE Trans. Syst. Man Cybern. B, Cybern., 2011, 41, (1), pp. 14–25
32 Nodland, D., Zargarzadeh, H., Jagannathan, S.: 'Neural network-based optimal adaptive output feedback control of a helicopter UAV', IEEE Trans. Neural Netw. Learn. Syst., 2013, 24, (7), pp. 1061–1073
33 Chen, T., Francis, B.A.: 'Optimal sampled-data control systems' (Springer, 1995)
34 Quanser: 'Quanser 2-DOF helicopter user and control manual' (Quanser Inc., 2006)
35 Lewis, F.L., Vrabie, D., Syrmos, V.L.: 'Optimal control' (Wiley, 2012)
36 Hewer, G.: 'An iterative technique for the computation of the steady state gains for the discrete optimal regulator', IEEE Trans. Autom. Control, 1971, 16, (4), pp. 382–384
37 Gao, W., Jiang, Y., Jiang, Z.P. et al.: 'Adaptive and optimal output feedback control of linear systems: an adaptive dynamic programming approach', Proc. World Congress on Intelligent Control and Automation, Shenyang, China, 2014, pp. 2085–2090
38 Aangenent, W., Kostic, D., de Jager, B. et al.: 'Data-based optimal control', Proc. American Control Conf., Portland, OR, 2005, pp. 1460–1465
39 Astrom, K.J., Wittenmark, B.: 'Adaptive control' (Addison-Wesley Longman Publishing Co., 1994)
40 Ioannou, P.A., Sun, J.: 'Robust adaptive control' (Dover Publications, 2012)
41 Melzer, S.M., Kuo, B.C.: 'Sampling period sensitivity of the optimal sampled data linear regulator', Automatica, 1971, 7, (3), pp. 367–370
42 Jeung, E., Oh, D., Kim, J. et al.: 'Robust controller design for uncertain systems with time delays: LMI approach', Automatica, 1996, 32, (8), pp. 1229–1231
43 Chen, T., Francis, B.A.: 'Input-output stability of sampled-data systems', IEEE Trans. Autom. Control, 1991, 36, (1), pp. 50–58

