Communicated by Dr Chenguang Yang

Accepted Manuscript

Robust control scheme for a class of uncertain nonlinear systems with completely unknown dynamics using data-driven reinforcement learning method

He Jiang, Huaguang Zhang, Yang Cui, Geyang Xiao

PII: S0925-2312(17)31381-4
DOI: 10.1016/j.neucom.2017.07.058
Reference: NEUCOM 18771

To appear in: Neurocomputing

Received date: 20 March 2017


Revised date: 16 July 2017
Accepted date: 31 July 2017

Please cite this article as: He Jiang, Huaguang Zhang, Yang Cui, Geyang Xiao, Robust control scheme for a class of uncertain nonlinear systems with completely unknown dynamics using data-driven reinforcement learning method, Neurocomputing (2017), doi: 10.1016/j.neucom.2017.07.058

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.
Robust control scheme for a class of uncertain nonlinear systems with completely unknown dynamics using data-driven reinforcement learning method

He Jiang, Huaguang Zhang∗, Yang Cui, Geyang Xiao

College of Information Science and Engineering, Northeastern University, Box 134, 110819, Shenyang, P. R. China

Abstract

This paper deals with the robust control issues for a class of uncertain nonlinear systems with completely unknown dynamics via a data-driven reinforcement learning method. First, we formulate the optimal regulation control problem for the nominal system, and the robust controller for the original uncertain system is then designed by adding a constant feedback gain to the optimal controller of the nominal system. This scheme is subsequently extended to optimal tracking control by means of an augmented system and a discount factor. It is also demonstrated that the proposed robust controller achieves optimality with respect to a newly defined performance index function when there is no control perturbation. It is well known that the nonlinear optimal control problem relies on the solution of the Hamilton-Jacobi-Bellman (HJB) equation, a nonlinear partial differential equation that cannot be solved analytically. In order to overcome this difficulty, we introduce a model-based iterative learning algorithm to successively approximate the solution of the HJB equation and provide its convergence proof. Subsequently, based on the structure of the model-based approach, a data-driven reinforcement learning method is derived, which only requires sampled data from the real system under different control inputs rather than an accurate mathematical system model. Neural networks (NNs) are utilized to implement this model-free method to approximate the optimal solutions, and the least-squares approach is employed to minimize the NN approximation residual errors. Finally, two numerical simulation examples are given to illustrate the effectiveness of our proposed method.

Keywords: Reinforcement learning; Adaptive dynamic programming; Data-driven; Model-free; Neural networks.

∗Corresponding author
Email addresses: [email protected] (He Jiang), [email protected] (Huaguang Zhang), [email protected] (Yang Cui), [email protected] (Geyang Xiao)

Preprint submitted to Neurocomputing August 21, 2017
1. Introduction

Model uncertainties, which may severely affect the control performance of closed-loop feedback systems, usually occur in real-world complex systems, such as manufacturing systems and power systems. Therefore, research on robust control problems has received considerable attention, and, so far, many significant relevant results have been achieved. The authors of [1] pointed out that the robust control problem of the uncertain original system can be translated into the optimal control problem of its nominal system, but detailed results were not shown. It is known that solving the optimal control problem relies on the solution of the Hamilton-Jacobi-Bellman (HJB) equation. For linear optimal control, i.e., the linear quadratic regulator (LQR) problem, the HJB equation reduces to the well-known Riccati equation, which can be computed directly. Nevertheless, for the nonlinear optimal control problem, the HJB equation becomes a nonlinear partial differential equation, which cannot be solved analytically.

Dynamic programming is regarded as a conventional approach for solving the optimal control problem. However, due to its backward-in-time procedure, it generally suffers from the "curse of dimensionality". In recent years, reinforcement learning (RL), which can obtain the optimal action of an agent via the responses from its environment, has provided many effective ways to overcome this bottleneck of dynamic programming, such as approximation/adaptive dynamic programming (ADP), which serves as a forward-in-time method to solve optimal control problems [2]. ADP can be divided into two mainstream iterative algorithms, i.e., value iteration (VI) [3, 4] and policy iteration (PI) [5, 6, 7]. The convergence proof of the VI algorithm for discrete-time (DT) systems was first given in [3], and a novel PI algorithm for the DT case was presented in [6]. For continuous-time (CT) systems, a PI algorithm was proposed in [5] without requiring knowledge of the internal system dynamics. Following the theoretical structure of these classical works [3, 4, 5, 6, 7], a variety of optimal control issues have been addressed, such as optimal control with constrained control input [8, 9, 10, 11] and time delay [12, 13, 14], optimal control for zero-sum [15, 16, 17, 18] and non-zero-sum games [19, 20, 21, 22], and optimal control applied to Markov jump systems [23, 24], robot systems [25, 26] and multi-agent systems [27, 28, 29, 30, 31]. Among these issues, the optimal tracking control problem (OTCP) has become one of the most attractive topics within the scope of ADP. The integral RL technique was employed to address the OTCP for partially unknown CT systems with constrained control inputs in [32], and the DT case was investigated in [33] by means of an actor-critic-based RL algorithm. In [34], a novel infinite-time optimal tracking control scheme was provided for a class of DT nonlinear systems via the greedy heuristic dynamic programming (HDP) iteration algorithm, and a finite-horizon version of the OTCP was studied in [35] through an adaptive dynamic programming approach. However, model uncertainties, which are generally inevitable for most real-world systems, are not considered in the studies above.
The main idea of this paper is to employ the ADP technique to obtain the optimal solution of the nominal system, and then extend it to the robust controller design of the uncertain original system. In [36], a neural-network-based robust tracking controller was designed for a class of electrically driven nonholonomic mechanical systems, but optimality was not considered. The authors of [37] proposed a novel ADP scheme to solve the optimal robust control problem for a class of uncertain nonlinear systems, and then, based on this work [37], a data-based robust control approach was developed via the neural network identification technique in [38]. However, for many industrial systems such as aerospace control systems, power systems and chemical engineering processes, the system structures are so complicated that there may be no accurate mathematical system models to support the control design. Hence, model-based approaches become invalid for these practical complex systems. Although the identification technique is viewed as an effective way to overcome this difficulty, prior model identification is costly and introduces unexpected identification errors which may affect the control performance. On the other hand, conventional identification approaches [15, 21, 23, 38, 39, 40, 41] generally utilize the obtained identification results to approximate and replace the real system dynamics, and, as a consequence, they are in fact still model-based rather than truly model-free. Owing to the rapid development of digital sensor technology, vast volumes of system information can be acquired, which inspires data-driven RL control methods [42, 43, 44, 45]. Therefore, it is interesting and challenging to handle robust control issues by using a data-driven RL method in a model-free environment, which motivates our research.
In this paper, we present a data-driven RL scheme to solve the robust control issues for a class of uncertain CT nonlinear systems with completely unknown system dynamics. First of all, the problem formulation is derived in Section 2. Secondly, the robust controller of the uncertain original system is designed by adding a constant feedback gain to the optimal controller of the nominal system, and the proofs of optimality and stability are provided in Section 3. Subsequently, based on the introduced model-based iterative learning algorithm, a data-driven model-free RL method is derived and implemented by two neural networks (NNs), whose tuning laws are updated by the least-squares approach in Section 4. Two numerical simulation examples are given to demonstrate the effectiveness of our proposed scheme in Section 5. Finally, a brief conclusion is drawn in Section 6.
2. Problem formulation

Optimal control issues can be generally classified into two main groups: optimal regulation control and optimal tracking control [33].

2.1. Optimal regulation control

Consider a class of uncertain nonlinear systems with control input perturbation denoted by the following original system:

\[ \dot{x}(t) = f(x(t)) + g(x(t))\big(\bar{u}(t) + \bar{d}(x(t))\big) \tag{1} \]

where x(t) ∈ R^n is the state; ū(t) ∈ R^m denotes the control input; d̄(x) ∈ R^m represents the finite control input perturbation, which is assumed to be bounded by ‖d̄(x)‖ ≤ k_d‖x‖ with the positive constant k_d; f(x(t)) ∈ R^n and g(x(t)) ∈ R^{n×m} are the system matrices, which, in this paper, are both considered to be unknown.

The corresponding nominal system of (1) can be described by

\[ \dot{x}(t) = f(x(t)) + g(x(t))u(t). \tag{2} \]

Define the performance index function

\[ V(x(t)) = \int_{t}^{\infty} \big[ x^{T}(\tau) P x(\tau) + u^{T}(x(\tau)) R u(x(\tau)) \big] \, d\tau \tag{3} \]

where P ∈ R^{n×n} > 0 and R ∈ R^{m×m} > 0 are symmetric positive definite weight matrices.

The associated optimal control policy which minimizes V(x) can be derived as

\[ u^{*}(x) = -\tfrac{1}{2} R^{-1} g^{T}(x) \nabla V^{*}(x) \tag{4} \]

where V^{*}(x) denotes the optimal performance index function and ∇V^{*}(x) = ∂V^{*}(x)/∂x.

Then, one can obtain the following Hamilton-Jacobi-Bellman (HJB) equation for the optimal regulation control problem:

\[ \nabla V^{*T}(x) f(x) + x^{T} P x - \tfrac{1}{4} \nabla V^{*T}(x) g(x) R^{-1} g^{T}(x) \nabla V^{*}(x) = 0. \tag{5} \]

2.2. Optimal tracking control
Let the command generator, which can produce a large class of command trajectories, be the reference system to be tracked:

\[ \dot{x}_d(t) = h(x_d(t)) \tag{6} \]

where x_d(t) ∈ R^n denotes the tracking objective state and h(x_d) ∈ R^n is the command generator function.

Thus, the tracking error e_d(t) can be defined as

\[ e_d(t) = x(t) - x_d(t). \tag{7} \]

Bringing (2) and (6) together yields the tracking error dynamics

\[ \dot{e}_d(t) = f(x(t)) - h(x_d(t)) + g(x(t))u(t). \tag{8} \]

Let X(t) = [e_d^T(t) \; x_d^T(t)]^T be the state of the augmented system. Next, combining (6) and (8) yields the dynamics of the augmented system:

\[ \dot{X}(t) = F(X(t)) + G(X(t))u(t) \tag{9} \]

where

\[ F(X(t)) = \begin{bmatrix} f(e_d(t) + x_d(t)) - h(x_d(t)) \\ h(x_d(t)) \end{bmatrix} \quad \text{and} \quad G(X(t)) = \begin{bmatrix} g(e_d(t) + x_d(t)) \\ 0 \end{bmatrix}. \]

X(t) ∈ 𝒳 ⊂ R^{2n} and u(t) ∈ 𝒰 ⊂ R^m. D = {(X, u) | X ∈ 𝒳, u ∈ 𝒰}, where 𝒳 and 𝒰 denote compact sets.
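To make the construction of (9) concrete, the following Python sketch (an illustration only, not part of the paper's algorithm) assembles the augmented drift F(X) and input matrix G(X) from user-supplied functions f, g and h; the function names and the example dynamics below are assumptions chosen for illustration.

```python
import numpy as np

def augmented_dynamics(f, g, h, n):
    """Build F(X), G(X) of the augmented system (9) from f, g, h.

    X = [e_d; x_d] with e_d = x - x_d, so the original state is x = e_d + x_d.
    """
    def F(X):
        e_d, x_d = X[:n], X[n:]
        x = e_d + x_d
        return np.concatenate([f(x) - h(x_d), h(x_d)])

    def G(X):
        e_d, x_d = X[:n], X[n:]
        x = e_d + x_d
        gx = np.atleast_2d(g(x))                 # n x m input matrix of the original system
        return np.vstack([gx, np.zeros_like(gx)])

    return F, G

# Hypothetical dynamics, for illustration only:
f = lambda x: np.array([x[1], -0.5 * (x[0] + x[1])])
g = lambda x: np.array([[0.0], [1.0]])
h = lambda xd: np.array([xd[1], -xd[0]])
F, G = augmented_dynamics(f, g, h, n=2)
X = np.array([0.1, -0.2, 1.0, 0.0])              # stacked [e_d; x_d]
print(F(X), G(X))
```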
For the nominal augmented system (9), in order to solve the infinite-horizon OTCP, one needs to design a state feedback control policy u(X) which minimizes the following discounted performance index function:

\[ V(X(t)) = \int_{t}^{\infty} e^{-\alpha(\tau - t)} \big[ X^{T}(\tau) Q X(\tau) + u^{T}(X(\tau)) R u(X(\tau)) \big] \, d\tau \tag{10} \]

where α > 0 is the discount factor; Q = \begin{bmatrix} P & 0 \\ 0 & 0 \end{bmatrix} with P > 0, and R > 0 is a symmetric positive definite matrix.

Remark 1. Note that it is necessary to employ a discount factor in the performance index function (10). This is because the trajectory of the reference system x_d(t) in (6) to be tracked may not go to zero, which is a common case in most practical systems, and then the performance index function, which contains the control policy u(X), will become infinite without the discount factor.
Definition 1. A control policy u(X) ∈ Ψ(𝒳) is said to be admissible with respect to (10) on 𝒳 if u(X) not only stabilizes the tracking error system (8) but also guarantees that the performance index function V(X(t)) in (10) is finite.

Assumption 1. Assume that there exists at least one admissible control policy u(X) on the compact set 𝒳 such that the tracking error system (8) is asymptotically stable and the performance index function V(X) in (10) is finite.

By means of Leibniz's rule, differentiating V(X(t)) along the augmented system trajectories (9) yields

\[ \dot{V}(X(t)) = \alpha V(X(t)) - X^{T}(t) Q X(t) - u^{T}(t) R u(t). \tag{11} \]

In light of (11), the Hamiltonian function can be defined as

\[ H(X, \nabla V(X), u) = \nabla V^{T}(X)\big(F(X) + G(X)u\big) - \alpha V(X) + X^{T} Q X + u^{T} R u \tag{12} \]

where ∇V(X) = ∂V(X)/∂X.

The optimal performance index function V^{*}(X) is given by

\[ V^{*}(X(t)) = \min_{u \in \Psi(\mathcal{X})} \int_{t}^{\infty} e^{-\alpha(\tau - t)} \big[ X^{T}(\tau) Q X(\tau) + u^{T}(X(\tau)) R u(X(\tau)) \big] \, d\tau, \tag{13} \]

which also satisfies the following HJB equation:

\[ 0 = \min_{u \in \Psi(\mathcal{X})} H(X, \nabla V^{*}(X), u). \tag{14} \]

If the minimum on the right-hand side of (14) exists and is unique, the corresponding optimal control policy u^{*}(X) can be obtained as

\[ u^{*}(X) = -\tfrac{1}{2} R^{-1} G^{T}(X) \nabla V^{*}(X). \tag{15} \]

Inserting (15) into (14), the HJB equation can be rewritten as

\[ H(V^{*}(X)) = \nabla V^{*T}(X) F(X) + X^{T} Q X - \alpha V^{*}(X) - \tfrac{1}{4} \nabla V^{*T}(X) G(X) R^{-1} G^{T}(X) \nabla V^{*}(X) = 0. \tag{16} \]

Remark 2. For linear systems, the HJB equation reduces to the well-known Riccati equation, which can be solved directly. However, for nonlinear systems, the HJB equation becomes a nonlinear partial differential equation, for which an analytical solution is generally unobtainable. In Section 4, we will introduce two iterative ADP methods to overcome this difficulty.
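As a concrete illustration of the linear special case mentioned in Remark 2 (for the undiscounted regulation problem, i.e., α = 0 and cost (3)), the following Python sketch solves the Riccati equation with SciPy and recovers the corresponding optimal gain; the numerical system matrices are assumptions chosen only for illustration and are not taken from the paper.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical linear system x_dot = A x + B u with quadratic cost (3):
# here f(x) = A x and g(x) = B, so V*(x) = x^T S x and u*(x) = -R^{-1} B^T S x.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
P = np.eye(2)          # state weight (called P in (3))
R = np.array([[1.0]])  # control weight

S = solve_continuous_are(A, B, P, R)   # solves A^T S + S A - S B R^{-1} B^T S + P = 0
K = np.linalg.solve(R, B.T @ S)        # u*(x) = -K x, consistent with (4) since grad V*(x) = 2 S x

print("Riccati solution S =\n", S)
print("Optimal feedback gain K =", K)
```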

3. Robust controller design

In this section, we first provide the robust regulation controller, and then extend the results to the design of the robust tracking controller.

3.1. Robust regulation controller design

In light of the optimal control policy u^{*}(x) in (4) for the nominal system (2), the robust controller ū(x) for the uncertain original system (1) is designed by adding a constant feedback gain η to u^{*}(x):

\[ \bar{u}(x) = \eta u^{*}(x). \tag{17} \]

Define a new performance index function for the system (1) as

\[ \mathcal{J}(x(t)) = \int_{t}^{\infty} \big[ \mathcal{P}(x(\tau)) + \bar{u}^{T}(x(\tau)) \bar{R} \bar{u}(x(\tau)) \big] \, d\tau \tag{18} \]

where 𝒫(x) = x^T P x + (η − 1) u^{*T}(x) R u^{*}(x) with η ≥ 1 and R̄ = η^{−1} R.

Theorem 1. Consider the system (1) and let η ≥ 1. One can attain:

1. If there is no control input perturbation, i.e., d̄ = 0, then the control policy ū(x) in (17) achieves optimality with the performance index function (18).
2. If the constant feedback gain η is selected appropriately, then the robust control policy ū(x) in (17) guarantees the system (1) to be asymptotically stable.
Proof. 1) Let J^{*}(x) be the optimal performance index function for the system (1) under the condition d̄ = 0. One can derive the associated optimal control policy ū^{*}(x) and HJB equation, respectively, as

\[ \bar{u}^{*}(x) = -\tfrac{1}{2} \bar{R}^{-1} g^{T}(x) \nabla J^{*}(x), \tag{19} \]

and

\[ \nabla J^{*T}(x) f(x) + x^{T} P x + (\eta - 1) u^{*T} R u^{*} - \tfrac{1}{4} \nabla J^{*T}(x) g(x) \bar{R}^{-1} g^{T}(x) \nabla J^{*}(x) = 0 \tag{20} \]

where ∇J^{*}(x) = ∂J^{*}(x)/∂x.

Based on (5), replacing J^{*}(x) by V^{*}(x) and inserting (4) into (20) yields

\[
\begin{aligned}
& \nabla V^{*T}(x) f(x) + x^{T} P x + \tfrac{1}{4}(\eta - 1) \nabla V^{*T}(x) g(x) R^{-1} g^{T}(x) \nabla V^{*}(x) - \tfrac{1}{4} \eta \nabla V^{*T}(x) g(x) R^{-1} g^{T}(x) \nabla V^{*}(x) \\
&= \nabla V^{*T}(x) f(x) + x^{T} P x - \tfrac{1}{4} \nabla V^{*T}(x) g(x) R^{-1} g^{T}(x) \nabla V^{*}(x) = 0.
\end{aligned} \tag{21}
\]

From (21), it can be observed that V^{*}(x) is a solution of the HJB equation (20), which also implies that the optimal control policy (19) can be rewritten as

\[ \bar{u}^{*}(x) = -\tfrac{1}{2} \eta R^{-1} g^{T}(x) \nabla V^{*}(x) = \eta u^{*}(x). \tag{22} \]

This completes the proof.


2) Consider the Lyapunov function candidate

\[ \Theta(x) = V^{*}(x). \tag{23} \]

Then, one has

\[
\begin{aligned}
\dot{\Theta}(x) &= \nabla V^{*T}(x)\big(f(x) + g(x)(\bar{u}(x) + \bar{d}(x))\big) \\
&= -x^{T} P x + \tfrac{1}{4} \nabla V^{*T}(x) g(x) R^{-1} g^{T}(x) \nabla V^{*}(x) + \nabla V^{*T}(x) g(x) \bar{u}(x) + \nabla V^{*T}(x) g(x) \bar{d}(x) \\
&= -x^{T} P x - \big(\tfrac{1}{2}\eta - \tfrac{1}{4}\big) \nabla V^{*T}(x) g(x) R^{-1} g^{T}(x) \nabla V^{*}(x) + \nabla V^{*T}(x) g(x) \bar{d}(x) \\
&\le -x^{T} P x - \big(\tfrac{1}{2}\eta - \tfrac{1}{4}\big) \nabla V^{*T}(x) g(x) R^{-1} g^{T}(x) \nabla V^{*}(x) + \tfrac{1}{2} \nabla V^{*T}(x) g(x) g^{T}(x) \nabla V^{*}(x) + \tfrac{1}{2} \bar{d}^{T}(x) \bar{d}(x) \\
&\le -\big(\lambda_{\min}(P) - \tfrac{1}{2} k_{d}^{2}\big) \|x\|^{2} - \Big(\big(\tfrac{1}{2}\eta - \tfrac{1}{4}\big)\lambda_{\min}(R^{-1}) - \tfrac{1}{2}\Big) \big\| g^{T}(x) \nabla V^{*}(x) \big\|^{2}
\end{aligned} \tag{24}
\]

where λ_min(P) and λ_min(R^{−1}) denote the minimum eigenvalues of the matrices P and R^{−1}, respectively.

In order to obtain Θ̇(x) < 0, the matrix P and the constant feedback gain η should be chosen to satisfy the following condition:

\[ \lambda_{\min}(P) > \tfrac{1}{2} k_{d}^{2}, \qquad \eta > \frac{1}{\lambda_{\min}(R^{-1})} + \frac{1}{2}. \tag{25} \]

Since P and R are both symmetric positive definite matrices, one can easily choose a large enough constant feedback gain η to satisfy the condition η > max{1, 1/λ_min(R^{−1}) + 1/2}. If P and η are selected appropriately such that the condition (25) holds, then Θ̇(x) < 0, which indicates that the system is asymptotically stable under the robust controller (17).

The proof is completed. □

Remark 3. Theorem 1 shows that the control policy (17) is a successful robust controller design for the uncertain system (1). Furthermore, if there is no control input perturbation, (17) is also an optimal control policy with respect to the performance index function (18). For some given P, if the control perturbation becomes so large that the condition (25) is not satisfied, one can still stabilize the system (1) by enhancing the feedback gain η. This will be demonstrated in the simulation results.
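The gain selection in condition (25) is easy to check numerically. The short Python sketch below (an illustration under assumed values of P, R and k_d, not taken from the paper) verifies both inequalities and returns a feedback gain satisfying them together with η ≥ 1.

```python
import numpy as np

def robust_gain(P, R, k_d, margin=0.1):
    """Check condition (25) and propose a feedback gain eta."""
    lam_P = np.min(np.linalg.eigvalsh(P))
    lam_Rinv = np.min(np.linalg.eigvalsh(np.linalg.inv(R)))
    if lam_P <= 0.5 * k_d**2:
        raise ValueError("lambda_min(P) must exceed k_d^2 / 2; increase P.")
    # strict inequalities, enforced with a small margin
    eta = max(1.0, 1.0 / lam_Rinv + 0.5) + margin
    return eta

# Hypothetical weights and perturbation bound, for illustration only:
P = np.eye(2)
R = np.array([[1.0]])
k_d = 0.5
print("suggested eta =", robust_gain(P, R, k_d))
```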

3.2. Robust tracking controller design

Similar to the nominal augmented system (9), the original one can be expressed as

\[ \dot{X}(t) = F(X(t)) + G(X(t))\big(\bar{u}(t) + \bar{d}(t)\big). \tag{26} \]

Based on the optimal control policy u^{*}(X) in (15) for the nominal augmented system, the robust controller for the uncertain original augmented system (26) is designed by

\[ \bar{u}(X) = \eta u^{*}(X). \tag{27} \]

Define a new performance index function for the augmented system (26) as

\[ J(X(t)) = \int_{t}^{\infty} e^{-\alpha(\tau - t)} \big[ L(X(\tau)) + \bar{u}^{T}(X(\tau)) \bar{R} \bar{u}(X(\tau)) \big] \, d\tau \tag{28} \]

where L(X) = X^T Q X + (η − 1) u^{*T}(X) R u^{*}(X) with η ≥ 1 and R̄ = η^{−1} R.

Corollary 1. Consider the augmented system (26). If η ≥ 1 and there is no control input perturbation, i.e., d̄ = 0, then the control policy ū in (27) achieves optimality with the performance index function (28).
Proof. The derivation is similar to that of Theorem 1. Let J^{*}(X) be the optimal performance index function for the augmented system (26) with d̄ = 0. The corresponding optimal control policy ū^{*}(X) and HJB equation can be given, respectively, by

\[ \bar{u}^{*}(X) = -\tfrac{1}{2} \bar{R}^{-1} G^{T}(X) \nabla J^{*}(X), \tag{29} \]

and

\[ \nabla J^{*T}(X) F(X) + X^{T} Q X + (\eta - 1) u^{*T}(X) R u^{*}(X) - \alpha J^{*}(X) - \tfrac{1}{4} \nabla J^{*T}(X) G(X) \bar{R}^{-1} G^{T}(X) \nabla J^{*}(X) = 0 \tag{30} \]

where ∇J^{*}(X) = ∂J^{*}(X)/∂X.

According to (16), replacing J^{*}(X) by V^{*}(X) and substituting (15) into (30) yields

\[
\begin{aligned}
& \nabla V^{*T}(X) F(X) + X^{T} Q X + \tfrac{1}{4}(\eta - 1) \nabla V^{*T}(X) G(X) R^{-1} G^{T}(X) \nabla V^{*}(X) - \alpha V^{*}(X) - \tfrac{1}{4} \eta \nabla V^{*T}(X) G(X) R^{-1} G^{T}(X) \nabla V^{*}(X) \\
&= \nabla V^{*T}(X) F(X) + X^{T} Q X - \alpha V^{*}(X) - \tfrac{1}{4} \nabla V^{*T}(X) G(X) R^{-1} G^{T}(X) \nabla V^{*}(X) = 0.
\end{aligned} \tag{31}
\]

From (31), it can be deduced that V^{*}(X) is a solution of the HJB equation (30), which also indicates that the optimal tracking control policy (29) can be rewritten as

\[ \bar{u}^{*}(X) = -\tfrac{1}{2} \eta R^{-1} G^{T}(X) \nabla V^{*}(X) = \eta u^{*}(X). \tag{32} \]

The proof is completed. □

Corollary 2. Consider the augmented system (26) and let η ≥ 1. If the constant feedback gain η is selected large enough, then the robust control policy ū(X) in (27) makes the tracking error dynamics asymptotically stable in the limit as the discount factor goes to zero.

Remark 4. If the discount factor goes to zero, then in light of the results of Theorem 1 and previous relevant works [32, 33, 45, 46], it can be easily shown that the tracking error is asymptotically stable. Nevertheless, if the discount factor is nonzero, the stability analysis becomes difficult [47, 48]. According to the results of [32, 33, 45, 46], one can make the tracking error as small as desired by selecting a small enough discount factor. If the discount factor is not small, stability may not be guaranteed.

4. Data-driven reinforcement learning algorithm

In order to solve the HJB equation, we first introduce a model-based iterative learning algorithm to approximate the solution and provide the corresponding convergence proof. Second, based on the model-based algorithm, we present a data-driven model-free one, which only requires real system data under different control inputs rather than an accurate mathematical system model.

4.1. Model-based iterative learning algorithm

Step 1. Let i = 0. Given an initial function V^{(0)}(X) ∈ V_0, where V_0 is determined by Lemma 5 in [49], set u^{(0)} = -\tfrac{1}{2} R^{-1} G^{T}(X) \nabla V^{(0)}(X).

Step 2. Solve for V^{(i+1)}(X) by using the following equation:

\[ [\nabla V^{(i+1)}(X)]^{T}\big(F(X) + G(X)u^{(i)}\big) - \alpha V^{(i+1)}(X) + X^{T} Q X + u^{(i)T} R u^{(i)} = 0. \tag{33} \]

Step 3. Update the control policy by

\[ u^{(i+1)} = -\tfrac{1}{2} R^{-1} G^{T}(X) \nabla V^{(i+1)}(X). \tag{34} \]

If ‖V^{(i+1)} − V^{(i)}‖ ≤ ε, where ε is a small enough positive constant, then stop at Step 3; otherwise, let i = i + 1 and go back to Step 2. A structural sketch of this iteration is given below.
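The following Python sketch only illustrates the structure of Steps 1–3; the inner routine `solve_evaluation_equation`, which would return V^{(i+1)} satisfying (33) for the current policy (e.g., via projection onto a set of basis functions), and `policy_from_value`, which applies (34), are hypothetical placeholders not specified by the paper.

```python
# Structural sketch of the model-based iterative learning algorithm (Steps 1-3).
# `solve_evaluation_equation`, `policy_from_value` and `distance` are user-supplied
# callables: the first solves (33) for V^{(i+1)}, the second applies the update (34),
# i.e. u = -0.5 R^{-1} G^T(X) grad V(X), and the third measures ||V^{(i+1)} - V^{(i)}||.

def model_based_iteration(V0, solve_evaluation_equation, policy_from_value,
                          distance, eps=1e-6, max_iter=100):
    V = V0
    u = policy_from_value(V)                    # u^{(0)} from the admissible initial V^{(0)}
    for i in range(max_iter):
        V_next = solve_evaluation_equation(u)   # Step 2: solve (33) for V^{(i+1)}
        u = policy_from_value(V_next)           # Step 3: policy update (34)
        if distance(V_next, V) <= eps:          # stopping criterion of Step 3
            return V_next, u
        V = V_next
    return V, u
```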
Following the idea of previous works [49, 50], the convergence proof of the proposed model-based iterative learning algorithm is provided as follows.

Let us consider a Banach space Λ ⊂ {V(x) | V(x) : 𝒳 → R, V(0) = 0} with the norm ‖·‖_𝒳 and the mapping H : Λ → Λ given by (16). Define another mapping Γ : Λ → Λ as

\[ \Gamma V = V - (H'(V))^{-1} H(V) \tag{35} \]

where H'(V) represents the Fréchet derivative of H(·) at the point V.

Due to the difficulty of calculating the Fréchet derivative on the Banach space Λ directly, we introduce the following Gâteaux derivative.

Definition 2. [50, 51] (Gâteaux Derivative) Let H : σ(V) ⊆ X → Y be a mapping with two Banach spaces X and Y, where σ(V) represents a neighborhood of V. If there exists a bounded linear operator L : X → Y such that

\[ H(V + sM) - H(V) = sL(M) + o(s), \quad s \to 0, \tag{36} \]

for all s in the neighborhood of zero with \lim_{s \to 0}(o(s)/s) = 0 and all M with ‖M‖_X = 1, then the mapping H is Gâteaux differentiable at the point V and L denotes the Gâteaux derivative of H at V.

Therefore, according to (36), the Gâteaux differential at V can be given by L(M) as

\[ L(M) = \lim_{s \to 0} \frac{H(V + sM) - H(V)}{s}. \tag{37} \]

It should be pointed out that (37) provides an easier way to compute the Gâteaux derivative than the Fréchet derivative. The relationship between these two derivatives is shown in the following lemma.

Lemma 1. [50, 51] If H' is continuous at V and exists as the Gâteaux derivative in the neighborhood of V, then L = H'(V) is also the Fréchet derivative at V.

Lemma 2. Let H be the mapping expressed in (16). For any V ∈ Λ, the Fréchet differential of H at V can be given by

\[ H'(V)M = L(M) = \nabla M^{T} F - \alpha M - \tfrac{1}{2} \nabla V^{T} G R^{-1} G^{T} \nabla M. \tag{38} \]

Proof. According to (16), one has

\[
\begin{aligned}
H(V + sM) &= \nabla (V + sM)^{T} F + X^{T} Q X - \alpha (V + sM) - \tfrac{1}{4} \nabla (V + sM)^{T} G R^{-1} G^{T} \nabla (V + sM), \\
H(V) &= \nabla V^{T} F + X^{T} Q X - \alpha V - \tfrac{1}{4} \nabla V^{T} G R^{-1} G^{T} \nabla V, \\
H(V + sM) - H(V) &= s \nabla M^{T} F - \alpha s M - \tfrac{1}{4} s^{2} \nabla M^{T} G R^{-1} G^{T} \nabla M - \tfrac{1}{2} s \nabla V^{T} G R^{-1} G^{T} \nabla M.
\end{aligned} \tag{39}
\]

By means of (37) and (39), the Gâteaux derivative at V is obtained by

\[ L(M) = \lim_{s \to 0} \frac{H(V + sM) - H(V)}{s} = \nabla M^{T} F - \alpha M - \tfrac{1}{2} \nabla V^{T} G R^{-1} G^{T} \nabla M. \tag{40} \]

Based on Lemma 1, one gets H'(V)M = L(M). This completes the proof.

Next, construct a Newton iterative sequence as below:

\[ V^{(i+1)} \triangleq \Gamma V^{(i)} = V^{(i)} - (H'(V^{(i)}))^{-1} H(V^{(i)}), \quad i = 0, 1, 2, \cdots. \tag{41} \]

Remark 5. According to Lemma 4 (Kantorovich's Theorem) in [49], the Newton iterative sequence {V^{(i)}} in (41) converges to the optimal value V^{*} as i → ∞. Lemma 5 in [49] also provides a method to select an appropriate initial value V^{(0)} which guarantees the convergence of the Newton iteration.

Theorem 2. The sequence {V^{(i)}} produced by the model-based iterative learning algorithm is equivalent to that of the Newton iteration (41).

Proof. Based on Lemma 2, one attains

\[ H'(V^{(i)}) V^{(i+1)} = \nabla V^{(i+1)T} F - \alpha V^{(i+1)} - \tfrac{1}{2} \nabla V^{(i)T} G R^{-1} G^{T} \nabla V^{(i+1)}, \tag{42} \]
\[ H'(V^{(i)}) V^{(i)} = \nabla V^{(i)T} F - \alpha V^{(i)} - \tfrac{1}{2} \nabla V^{(i)T} G R^{-1} G^{T} \nabla V^{(i)}, \tag{43} \]
\[ H(V^{(i)}) = \nabla V^{(i)T} F + X^{T} Q X - \alpha V^{(i)} - \tfrac{1}{4} \nabla V^{(i)T} G R^{-1} G^{T} \nabla V^{(i)}. \tag{44} \]

According to the model-based iterative learning algorithm, i.e., (33) and (34), and combining (42), (43) and (44), one has

\[ H'(V^{(i)}) V^{(i+1)} = H'(V^{(i)}) V^{(i)} - H(V^{(i)}), \tag{45} \]

which is equivalent to the Newton iteration (41). □

Thus, as mentioned in Remark 5, once the initial value V^{(0)} is determined by Lemma 5 in [49], then according to Lemma 4 (Kantorovich's Theorem) in [49] the proposed model-based iterative learning algorithm ensures that V^{(i)} converges to the optimal value V^{*} as i → ∞, which also means u^{(i)} → u^{*} as i → ∞.

4.2. Derivation of the data-driven reinforcement learning method

To derive the data-driven reinforcement learning algorithm, first of all, we rewrite the system (9) as

\[ \dot{X} = F(X) + G(X)u = F(X) + G(X)u^{(i)} + G(X)\big(u - u^{(i)}\big). \tag{46} \]

Next, by means of (46), one gets

\[ \frac{dV^{(i+1)}(X(t))}{dt} = [\nabla V^{(i+1)}(X)]^{T}\big(F(X) + G(X)u\big) = [\nabla V^{(i+1)}(X)]^{T}\big(F(X) + G(X)u^{(i)}\big) + [\nabla V^{(i+1)}(X)]^{T} G(X)\big(u - u^{(i)}\big). \tag{47} \]

Based on the updating laws (33) and (34), (47) becomes

\[ \frac{dV^{(i+1)}(X(t))}{dt} = \alpha V^{(i+1)}(X) - X^{T} Q X - u^{(i)T} R u^{(i)} - 2[u^{(i+1)}]^{T} R\big(u - u^{(i)}\big). \tag{48} \]

Integrating both sides of (48) on the interval [t, t + ∆t], one attains

\[
\begin{aligned}
V^{(i+1)}(X(t + \Delta t)) - V^{(i+1)}(X(t)) = {} & \int_{t}^{t+\Delta t} \alpha V^{(i+1)}(X(\tau))\, d\tau - \int_{t}^{t+\Delta t} \big( X^{T}(\tau) Q X(\tau) + u^{(i)T}(\tau) R u^{(i)}(\tau) \big) d\tau \\
& - 2 \int_{t}^{t+\Delta t} [u^{(i+1)}(\tau)]^{T} R\big(u(\tau) - u^{(i)}(\tau)\big) d\tau
\end{aligned} \tag{49}
\]

where V^{(i+1)}(X) and u^{(i+1)}(X) are the unknown functions to be determined.

Remark 6. From the aforementioned derivation, it can be observed that the main idea of the data-driven RL method is to solve equation (49) iteratively rather than the iterative equations (33) and (34). Furthermore, different from (33) and (34), equation (49) only requires arbitrary system data (X, u) ∈ D instead of the system models, i.e., F(X) and G(X).

The following lemma shows that the data-driven model-free algorithm is equivalent to the model-based iterative learning algorithm, which also implies the convergence of the data-driven approach, that is, {V^{(i)}} and {u^{(i)}} generated by the iterative equation (49) converge to the optimal values V^{*} and u^{*}, respectively, as i → ∞.

Lemma 3. [46] Let V^{(i+1)}(0) = 0 for all i = 0, 1, 2, ⋯. Then, (V^{(i+1)}(X), u^{(i+1)}(X)) is the solution of (49) if and only if it is the solution of (33) and (34), i.e., the iterative equation (49) is equivalent to the iterative equations (33) and (34).

4.3. NN-based implementation of the data-driven model-free algorithm

The NN structure used in this paper is similar to that in [52, 53]. Two NNs, namely a critic NN and an actor NN, are utilized to approximate the iterative performance index function V^{(i)} and control policy u^{(i)} as below:

\[ \hat{V}^{(i)}(X(t)) = \phi_{V}^{T}(X(t)) W_{V}^{(i)}, \tag{50} \]
\[ \hat{u}^{(i)}(X(t)) = \phi_{u}^{T}(X(t)) W_{u}^{(i)}, \tag{51} \]

where W_V^{(i)} and W_u^{(i)} are the weight vectors of the critic NN and actor NN, respectively, and φ_V and φ_u are their corresponding NN activation functions. For the following derivation, we take the single control input case into consideration, that is, m = 1, which is convenient for the mathematical expressions.

Since there will be NN approximation errors brought by the NN implementation, replacing V^{(i+1)}, u^{(i)} and u^{(i+1)} in (49) by V̂^{(i+1)}, û^{(i)} and û^{(i+1)} yields the following residual error Ξ^{(i)}:

\[
\begin{aligned}
\Xi^{(i)}(X(t), X(t+\Delta t), u(t)) \triangleq {} & \hat{V}^{(i+1)}(X(t)) - \hat{V}^{(i+1)}(X(t+\Delta t)) + \int_{t}^{t+\Delta t} \alpha \hat{V}^{(i+1)}(X(\tau))\, d\tau \\
& - \int_{t}^{t+\Delta t} \big( X^{T}(\tau) Q X(\tau) + [\hat{u}^{(i)}(X(\tau))]^{T} R\, \hat{u}^{(i)}(X(\tau)) \big) d\tau \\
& + 2 \int_{t}^{t+\Delta t} [\hat{u}^{(i+1)}(X(\tau))]^{T} R \big( \hat{u}^{(i)}(X(\tau)) - u(\tau) \big) d\tau \\
= {} & \big( \phi_{V}(X(t)) - \phi_{V}(X(t+\Delta t)) \big)^{T} W_{V}^{(i+1)} + \int_{t}^{t+\Delta t} \alpha \phi_{V}^{T}(X(\tau)) W_{V}^{(i+1)}\, d\tau \\
& - \int_{t}^{t+\Delta t} \big( X^{T}(\tau) Q X(\tau) + [\hat{u}^{(i)}(X(\tau))]^{T} R\, \hat{u}^{(i)}(X(\tau)) \big) d\tau \\
& + 2 \int_{t}^{t+\Delta t} \big( \hat{u}^{(i)}(X(\tau)) - u(\tau) \big)^{T} R \big( \phi_{u}^{T}(X(\tau)) W_{u}^{(i+1)} \big) d\tau.
\end{aligned} \tag{52}
\]

For simplicity of mathematical expression, let

\[
\begin{aligned}
\eta_{1} &= \phi_{V}(X(t)) - \phi_{V}(X(t+\Delta t)), \qquad \eta_{2} = \int_{t}^{t+\Delta t} \alpha \phi_{V}^{T}(X(\tau))\, d\tau, \\
\eta_{3} &= 2 \int_{t}^{t+\Delta t} \big( \hat{u}^{(i)}(X(\tau)) - u(\tau) \big)^{T} R\, \phi_{u}^{T}(X(\tau))\, d\tau, \\
\Upsilon^{(i)} &= \int_{t}^{t+\Delta t} \big( X^{T}(\tau) Q X(\tau) + [\hat{u}^{(i)}(X(\tau))]^{T} R\, \hat{u}^{(i)}(X(\tau)) \big) d\tau.
\end{aligned} \tag{53}
\]

It is worth pointing out that if ∆t is selected as a small enough time period, one can utilize the trapezoidal rule to approximate the definite integrals η_2, η_3 and Υ^{(i)} as

\[ \int_{t}^{t+\Delta t} p(\tau)\, d\tau \approx \frac{\Delta t}{2}\big[ p(t) + p(t+\Delta t) \big]. \tag{54} \]

Therefore, (52) can be given by

\[ \Xi^{(i)}(X(t), X(t+\Delta t), u(t)) = \eta_{1}^{T} W_{V}^{(i+1)} + \eta_{2} W_{V}^{(i+1)} - \Upsilon^{(i)} + \eta_{3} W_{u}^{(i+1)}. \tag{55} \]

Define Φ^{(i)} ≜ [η_1^T + η_2, η_3] and W_{Vu}^{(i+1)} ≜ [(W_V^{(i+1)})^T (W_u^{(i+1)})^T]^T. Then, (52) can be rewritten as

\[ \Xi^{(i)}(X(t), X(t+\Delta t), u(t)) = \Phi^{(i)} W_{Vu}^{(i+1)} - \Upsilon^{(i)}. \tag{56} \]

In order to minimize the residual error Ξ^{(i)}, we employ the least-squares approach, and thus multiple data sets are required. Let the size of the data sampling set be M, where M is a large enough number. Choosing different control inputs u_k(t) with a small enough time period ∆t, one can obtain the system data sampling sets (X_k(t), X_k(t+∆t), u_k(t)), where k = 1, 2, ⋯, M. Subsequently, the database can be constructed as ξ^{(i)} = [(Φ_1^{(i)})^T (Φ_2^{(i)})^T ⋯ (Φ_M^{(i)})^T]^T and θ^{(i)} = [(Υ_1^{(i)})^T (Υ_2^{(i)})^T ⋯ (Υ_M^{(i)})^T]^T. Consequently, the least-squares solution for the updating law of W_{Vu}^{(i+1)} is derived by

\[ W_{Vu}^{(i+1)} = \big[ (\xi^{(i)})^{T} \xi^{(i)} \big]^{-1} (\xi^{(i)})^{T} \theta^{(i)}. \tag{57} \]
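As an implementation sketch (not the authors' code), the following Python functions build one data row (Φ^{(i)}, Υ^{(i)}) from a sampled tuple (X(t), X(t+∆t), u(t)) using the trapezoidal rule (54), and stack M such rows to solve (57) by least squares. The feature maps `phi_V`, `phi_u`, the weights Q and the scalar R are supplied by the user; the exploratory input is assumed to be held constant over each sampling interval.

```python
import numpy as np

def data_row(X_t, X_next, u_t, W_u_i, phi_V, phi_u, Q, R, alpha, dt):
    """One data row (Phi, Upsilon) of (56) for a single tuple, via the trapezoidal rule (54)."""
    trap = lambda a, b: 0.5 * dt * (a + b)              # approximation (54)
    u_hat = lambda X: phi_u(X) @ W_u_i                  # actor estimate (51), scalar since m = 1
    eta1 = phi_V(X_t) - phi_V(X_next)
    eta2 = trap(alpha * phi_V(X_t), alpha * phi_V(X_next))
    eta3 = trap(2.0 * R * (u_hat(X_t) - u_t) * phi_u(X_t),
                2.0 * R * (u_hat(X_next) - u_t) * phi_u(X_next))
    cost = lambda X: X @ Q @ X + R * u_hat(X) ** 2      # integrand of Upsilon^{(i)} in (53)
    upsilon = trap(cost(X_t), cost(X_next))
    return np.concatenate([eta1 + eta2, eta3]), upsilon

def least_squares_update(rows):
    """Solve (57); np.linalg.lstsq returns the same minimizer as (xi^T xi)^{-1} xi^T theta."""
    xi = np.vstack([r[0] for r in rows])
    theta = np.array([r[1] for r in rows])
    W, *_ = np.linalg.lstsq(xi, theta, rcond=None)
    return W                                            # stacked [W_V^{(i+1)}; W_u^{(i+1)}]
```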

4.4. NN-based data-driven model-free algorithm

Based on the above, the data-driven reinforcement learning algorithm can be summarized as follows.

Step 1. With different control inputs u_k(t), collect the sampling sets of system data (X_k(t), X_k(t+∆t), u_k(t)), where k = 1, 2, ⋯, M. Let i = 0. Choose an initial NN weight W_{Vu}^{(0)} such that V̂^{(0)} ∈ V_0.

Step 2. Use the collected sampling data sets to compute ξ^{(i)} and θ^{(i)}.

Step 3. Tune the NN weights W_{Vu}^{(i+1)} through the updating law (57). If ‖W_{Vu}^{(i+1)} − W_{Vu}^{(i)}‖ ≤ ε, where ε is a small enough positive constant, then stop at Step 3; otherwise, let i = i + 1 and go back to Step 2. A sketch of this outer loop is given after the following remark.

Remark 7. It can be observed that if we set the discount factor α = 0 and use the performance index function (3), this data-driven method can also be applied to solve the optimal regulation control problem, which implies that optimal regulation control is a special case of optimal tracking control.
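A minimal Python sketch of Steps 1–3 is given below; it reuses the hypothetical `data_row` and `least_squares_update` helpers from the previous sketch, and the stopping tolerance, initial weights and data set are assumptions for illustration only.

```python
import numpy as np

def data_driven_rl(data, W0, phi_V, phi_u, Q, R, alpha, dt, n_V, eps=1e-4, max_iter=50):
    """Outer iteration of the NN-based data-driven algorithm (Steps 1-3).

    `data` is a list of tuples (X_t, X_next, u_t) collected under exploratory inputs;
    `W0` stacks the initial critic/actor weights [W_V^{(0)}; W_u^{(0)}]; `n_V` is the
    number of critic features, so W[:n_V] are critic weights and W[n_V:] actor weights.
    `data_row` and `least_squares_update` are as defined in the previous sketch.
    """
    W = np.asarray(W0, dtype=float)
    for i in range(max_iter):
        W_u_i = W[n_V:]                                    # actor weights at iteration i
        rows = [data_row(X_t, X_next, u_t, W_u_i,
                         phi_V, phi_u, Q, R, alpha, dt)    # Step 2: build xi^{(i)}, theta^{(i)}
                for (X_t, X_next, u_t) in data]
        W_next = least_squares_update(rows)                # Step 3: update law (57)
        if np.linalg.norm(W_next - W) <= eps:              # stopping criterion of Step 3
            return W_next, i + 1
        W = W_next
    return W, max_iter
```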
AN
5. Simulation Results

In this section, two simulation examples for both robust regulation control and robust tracking control are shown to demonstrate the effectiveness of our proposed scheme.

5.1. Robust regulation control

Consider the following uncertain original system:

\[ \dot{x} = \begin{bmatrix} -x_{1} + x_{2} \\ -0.5(x_{1} + x_{2}) + 0.5 x_{1}^{2} x_{2} \end{bmatrix} + \begin{bmatrix} 0 \\ x_{1} \end{bmatrix} (\bar{u} + \bar{d}) \tag{58} \]

where ū = ηu^{*} with η = 4, and d̄ = δ_1 sin(δ_2 x_1 + δ_3 x_2) denotes the control input perturbation with the random constants δ_1 ∈ [−0.5, 0.5], δ_2 ∈ [−1, 1] and δ_3 ∈ [−1.5, 1.5].

The corresponding nominal system of (58) can be expressed as

\[ \dot{x} = \begin{bmatrix} -x_{1} + x_{2} \\ -0.5(x_{1} + x_{2}) + 0.5 x_{1}^{2} x_{2} \end{bmatrix} + \begin{bmatrix} 0 \\ x_{1} \end{bmatrix} u \tag{59} \]

with the associated performance index function

\[ V(x(t)) = \int_{t}^{\infty} \big[ x^{T}(\tau) P x(\tau) + u^{T}(x(\tau)) R u(x(\tau)) \big] \, d\tau \tag{60} \]

where P = I_{2×2} (I denotes the identity matrix) and R = 1.

We select the activation functions for the critic and actor NNs, respectively, as

\[ \phi_{V}(x) = [\, x_{1}^{2} \;\; x_{1}x_{2} \;\; x_{2}^{2} \,]^{T}, \tag{61} \]

and

\[ \phi_{u}(x) = [\, x_{1} \;\; x_{2} \;\; x_{1}^{2} \;\; x_{1}x_{2} \;\; x_{2}^{2} \,]^{T}. \tag{62} \]
It should be pointed out that this simulation example is constructed by the converse HJB method [54], so the optimal solutions are known to be V^{*} = 0.5x_1^2 + x_2^2 and u^{*} = -x_1 x_2. From the simulation results in Fig. 1 and Fig. 2, it can be seen that the NN weights converge to their optimal values after several iterations. In Fig. 3, the proposed robust controller overcomes the random control perturbation and the state trajectories eventually stabilize. When the control perturbation becomes so large that condition (25) no longer holds and the system (58) may be unstable, one can enhance the feedback gain η to handle this problem, as mentioned in Remark 3. As shown in Fig. 4, when η = 4 is kept for a much larger control perturbation, the system becomes unstable; when the feedback gain η is enhanced, the system becomes stable, which demonstrates the statement in Remark 3. A simulation sketch of this example is given below.
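For reproducibility of the closed-loop behaviour discussed above, the following Python sketch integrates the uncertain system (58) under the robust controller ū = ηu^{*} with the known optimal policy u^{*} = -x_1 x_2; the integration step, horizon, initial state and the sampled perturbation constants are assumptions, and the plotting details of Figs. 3–4 are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random perturbation constants as in the example: delta1, delta2, delta3
d1, d2, d3 = rng.uniform(-0.5, 0.5), rng.uniform(-1, 1), rng.uniform(-1.5, 1.5)

def closed_loop(x, eta):
    """Right-hand side of (58) with u_bar = eta * u*, where u*(x) = -x1*x2."""
    x1, x2 = x
    u_bar = eta * (-x1 * x2)
    d_bar = d1 * np.sin(d2 * x1 + d3 * x2)
    f = np.array([-x1 + x2, -0.5 * (x1 + x2) + 0.5 * x1**2 * x2])
    g = np.array([0.0, x1])
    return f + g * (u_bar + d_bar)

def simulate(x0, eta, dt=0.01, T=30.0):
    """Fourth-order Runge-Kutta integration of the closed loop."""
    x, traj = np.array(x0, dtype=float), [np.array(x0, dtype=float)]
    for _ in range(int(T / dt)):
        k1 = closed_loop(x, eta)
        k2 = closed_loop(x + 0.5 * dt * k1, eta)
        k3 = closed_loop(x + 0.5 * dt * k2, eta)
        k4 = closed_loop(x + dt * k3, eta)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(x)
    return np.array(traj)

traj = simulate([0.2, 0.1], eta=4)
print("final state:", traj[-1])   # should approach the origin for eta = 4
```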
[Figure: critic NN weights WV1, WV2, WV3 versus iteration step.]
Fig. 1. Convergence of the critic NN weights.

[Figure: actor NN weights Wu1-Wu5 versus iteration step.]
Fig. 2. Convergence of the actor NN weights.

[Figure: state trajectories x1 and x2 versus time step.]
Fig. 3. Evolution of the state trajectories x1 and x2.

[Figure: state trajectories x1 and x2 versus time step for η = 4, 6, 8 and 10.]
Fig. 4. Evolution of the state trajectories x1 and x2 with different η.

5.2. Robust tracking control

Consider the uncertain original system as follows:

\[ \dot{x} = \begin{bmatrix} x_{2} \\ -0.5(x_{1} + x_{2}) + 0.5 x_{1}^{2} x_{2} \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} (\bar{u} + \bar{d}) \tag{63} \]

where we set ū = ηu^{*} with the constant feedback gain η = 2, and select the control input perturbation as d̄ = δ_1 sin(δ_2 x_1 + δ_3 x_2) with the random constants δ_1 ∈ [−1, 1], δ_2 ∈ [−3, 3] and δ_3 ∈ [−4, 4].

Thus, the nominal nonlinear system of (63) is described by

\[ \dot{x} = \begin{bmatrix} x_{2} \\ -0.5(x_{1} + x_{2}) + 0.5 x_{1}^{2} x_{2} \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u. \tag{64} \]

The reference system, which generates the desired trajectory dynamics, is given by

\[ \dot{x}_{d} = \begin{bmatrix} x_{d2} \\ -x_{d1} \end{bmatrix}. \tag{65} \]

Let X_1 = x_1 − x_{d1}, X_2 = x_2 − x_{d2}, X_3 = x_{d1} and X_4 = x_{d2}. Then, we can obtain the augmented system dynamics as below:

\[ \dot{X} = \begin{bmatrix} X_{2} \\ -0.5(X_{1} + X_{2} - X_{3} + X_{4}) + 0.5(X_{1} + X_{3})^{2}(X_{2} + X_{4}) \\ X_{4} \\ -X_{3} \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} u \tag{66} \]

with the following discounted performance index function:

\[ V(X(t)) = \int_{t}^{\infty} e^{-\alpha(\tau - t)} \big[ X^{T}(\tau) Q X(\tau) + u^{T}(\tau) R u(\tau) \big] \, d\tau \tag{67} \]

where the discount factor α = 0.01; Q = \begin{bmatrix} P & 0 \\ 0 & 0 \end{bmatrix} with P = 10 I_{2×2}, and R = 1.

The activation functions for the critic NN are selected as

\[ \phi_{V}(X) = [\, X_{1}^{2} \;\; X_{1}X_{2} \;\; X_{1}X_{3} \;\; X_{1}X_{4} \;\; X_{2}^{2} \;\; X_{2}X_{3} \;\; X_{2}X_{4} \;\; X_{3}^{2} \;\; X_{3}X_{4} \;\; X_{4}^{2} \;\; X_{1}^{3} \;\; X_{2}^{3} \;\; X_{3}^{3} \;\; X_{4}^{3} \;\; X_{1}^{4} \;\; X_{2}^{4} \;\; X_{3}^{4} \;\; X_{4}^{4} \,]^{T}, \tag{68} \]

and the activation functions for the actor NN are given by

\[
\begin{aligned}
\phi_{u}(X) = [\, & X_{1} \;\; X_{2} \;\; X_{3} \;\; X_{4} \;\; X_{1}^{2} \;\; X_{1}X_{2} \;\; X_{1}X_{3} \;\; X_{1}X_{4} \;\; X_{2}^{2} \;\; X_{2}X_{3} \;\; X_{2}X_{4} \;\; X_{3}^{2} \;\; X_{3}X_{4} \;\; X_{4}^{2} \\
& X_{1}^{3} \;\; X_{1}X_{2}X_{3} \;\; X_{1}X_{2}X_{4} \;\; X_{1}X_{3}X_{4} \;\; X_{1}^{2}X_{2} \;\; X_{1}^{2}X_{3} \;\; X_{1}^{2}X_{4} \;\; X_{2}^{3} \;\; X_{2}^{2}X_{1} \;\; X_{2}^{2}X_{3} \;\; X_{2}^{2}X_{4} \;\; X_{2}X_{3}X_{4} \\
& X_{3}^{3} \;\; X_{3}^{2}X_{1} \;\; X_{3}^{2}X_{2} \;\; X_{3}^{2}X_{4} \;\; X_{4}^{3} \;\; X_{4}^{2}X_{1} \;\; X_{4}^{2}X_{2} \;\; X_{4}^{2}X_{3} \,]^{T}.
\end{aligned} \tag{69}
\]


CE

We set ∆t = 0.1s, and then, with different control inputs uk , real system
data sampling sets (Xk (t), Xk (t+∆t), uk ) can be collected. After that, update
the two NNs via the least-square method (57). From Fig. 5 and Fig. 6, it
can be seen that the NN weights achieve convergence finally.
AC

22
ACCEPTED MANUSCRIPT
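A possible way to generate such data tuples for the augmented system (66) is sketched below in Python; the choice of exploratory inputs (uniformly random, held constant over each sampling interval), the initial-state distribution and the number of samples M are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

def F_aug(X):
    """Drift term of the augmented system (66)."""
    X1, X2, X3, X4 = X
    return np.array([X2,
                     -0.5 * (X1 + X2 - X3 + X4) + 0.5 * (X1 + X3)**2 * (X2 + X4),
                     X4,
                     -X3])

G_aug = np.array([0.0, 1.0, 0.0, 0.0])          # input matrix of (66)

def collect_data(M=200, dt=0.1, substeps=10):
    """Collect M tuples (X_k(t), X_k(t+dt), u_k) with piecewise-constant exploratory inputs."""
    data = []
    X = rng.uniform(-1.0, 1.0, size=4)          # assumed initial augmented state
    h = dt / substeps
    for _ in range(M):
        u = rng.uniform(-2.0, 2.0)              # exploratory input held over [t, t+dt]
        X_t = X.copy()
        for _ in range(substeps):               # simple Euler integration inside the interval
            X = X + h * (F_aug(X) + G_aug * u)
        data.append((X_t, X.copy(), u))
    return data

data = collect_data()
print(len(data), "tuples collected; first tuple:", data[0])
```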

[Figure: critic NN weights versus iteration step.]
Fig. 5. Convergence of the critic NN weights.

[Figure: actor NN weights versus iteration step.]
Fig. 6. Convergence of the actor NN weights.

[Figure: state trajectories x1 and xd1 versus time step.]
Fig. 7. Evolution of the state trajectories x1 and xd1.

[Figure: state trajectories x2 and xd2 versus time step.]
Fig. 8. Evolution of the state trajectories x2 and xd2.

Subsequently, the obtained results, together with the constant feedback gain, are used to control the uncertain original system (63) under the random control input perturbation. As indicated in Fig. 7 and Fig. 8, the uncertain original system (63) successfully synchronizes with the reference system (65).

6. Conclusion

In this paper, the robust control issues for a class of uncertain systems have been investigated. A novel reinforcement learning scheme has been employed to obtain the optimal control policy of the nominal system without requiring knowledge of the system model. Adding a constant feedback gain to this optimal control policy yields the robust controller, which has been proved to achieve optimality under a newly defined performance index function when there is no control input perturbation. To implement the model-free algorithm, two neural networks updated by the least-squares method have been utilized to learn the solution of the HJB equation iteration by iteration. Simulation results have demonstrated the feasibility and effectiveness of our proposed scheme. It is expected that, with the powerful ability of ADP in solving optimal control problems, our research results can be extended to other decision support systems.
Acknowledgment

This work was supported by the National Natural Science Foundation of China (61433004, 61627809, 61621004), and IAPI Fundamental Research Funds 2013ZCX14.
References

[1] F. Lin, R. D. Brandt, J. Sun, Robust control of nonlinear systems: compensating for uncertainty, International Journal of Control 56 (6) (1992) 1453–1459.

[2] R. Song, W. Xiao, H. Zhang, C. Sun, Adaptive dynamic programming for a class of complex-valued nonlinear systems, IEEE Transactions on Neural Networks and Learning Systems 25 (9) (2014) 1733–1739.

[3] A. Al-Tamimi, F. L. Lewis, M. Abu-Khalaf, Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 38 (4) (2008) 943–949.

[4] Q. Wei, D. Liu, H. Lin, Value iteration adaptive dynamic programming for optimal control of discrete-time nonlinear systems, IEEE Transactions on Cybernetics 46 (3) (2016) 840–853.

[5] J. J. Murray, C. J. Cox, G. G. Lendaris, R. Saeks, Adaptive dynamic programming, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 32 (2) (2002) 140–153.

[6] D. Liu, Q. Wei, Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems, IEEE Transactions on Neural Networks and Learning Systems 25 (3) (2014) 621–634.

[7] K. G. Vamvoudakis, F. L. Lewis, Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem, Automatica 46 (5) (2010) 878–888.

[8] H. Zhang, C. Qin, Y. Luo, Neural-network-based constrained optimal control scheme for discrete-time switched nonlinear system using dual heuristic programming, IEEE Transactions on Automation Science and Engineering 11 (3) (2014) 839–849.

[9] M. Abu-Khalaf, F. L. Lewis, Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach, Automatica 41 (5) (2005) 779–791.

[10] H. Zhang, Y. Luo, D. Liu, Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints, IEEE Transactions on Neural Networks 20 (9) (2009) 1490–1503.

[11] H. Modares, F. L. Lewis, M.-B. Naghibi-Sistani, Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks, IEEE Transactions on Neural Networks and Learning Systems 24 (10) (2013) 1513–1525.

[12] B. Wang, D. Zhao, C. Alippi, D. Liu, Dual heuristic dynamic programming for nonlinear discrete-time uncertain systems with state delay, Neurocomputing 134 (2014) 222–229.

[13] H. Zhang, R. Song, Q. Wei, T. Zhang, Optimal tracking control for a class of nonlinear discrete-time systems with time delays based on heuristic dynamic programming, IEEE Transactions on Neural Networks 22 (12) (2011) 1851–1862.

[14] Q. Wei, H. Zhang, D. Liu, Y. Zhao, An optimal control scheme for a class of discrete-time nonlinear systems with time delays using adaptive dynamic programming, Acta Automatica Sinica 36 (1) (2010) 121–129.

[15] Q. Wei, R. Song, P. Yan, Data-driven zero-sum neuro-optimal control for a class of continuous-time unknown nonlinear systems with disturbance using ADP, IEEE Transactions on Neural Networks and Learning Systems 27 (2) (2016) 444–458.

[16] K. G. Vamvoudakis, F. Lewis, Online solution of nonlinear two-player zero-sum games using synchronous policy iteration, International Journal of Robust and Nonlinear Control 22 (13) (2012) 1460–1483.

[17] D. Liu, H. Li, D. Wang, Neural-network-based zero-sum game for discrete-time nonlinear systems via iterative adaptive dynamic programming algorithm, Neurocomputing 110 (2013) 92–100.

[18] H. Zhang, Q. Wei, D. Liu, An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games, Automatica 47 (1) (2011) 207–214.

[19] K. G. Vamvoudakis, F. L. Lewis, Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton-Jacobi equations, Automatica 47 (8) (2011) 1556–1569.

[20] H. Zhang, L. Cui, Y. Luo, Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP, IEEE Transactions on Cybernetics 43 (1) (2013) 206–216.

[21] D. Liu, H. Li, D. Wang, Online synchronous approximate optimal learning algorithm for multi-player non-zero-sum games with unknown dynamics, IEEE Transactions on Systems, Man, and Cybernetics: Systems 44 (8) (2014) 1015–1027.

[22] R. Song, F. L. Lewis, Q. Wei, Off-policy integral reinforcement learning method to solve nonlinear continuous-time multiplayer nonzero-sum games, IEEE Transactions on Neural Networks and Learning Systems 28 (3) (2017) 704–713.

[23] X. Zhong, H. He, H. Zhang, Z. Wang, Optimal control for unknown discrete-time nonlinear Markov jump systems using adaptive dynamic programming, IEEE Transactions on Neural Networks and Learning Systems 25 (12) (2014) 2141–2155.

[24] X. Zhong, H. He, H. Zhang, Z. Wang, A neural network based online learning and control approach for Markov jump systems, Neurocomputing 149 (2015) 116–123.

[25] C. Yang, K. Huang, H. Cheng, Y. Li, C.-Y. Su, Haptic identification by ELM-controlled uncertain manipulator, IEEE Transactions on Systems, Man, and Cybernetics: Systems PP (99) (2017) 1–12.

[26] C. Yang, X. Wang, L. Cheng, H. Ma, Neural-learning-based telerobot control with guaranteed performance, IEEE Transactions on Cybernetics PP (99) (2016) 1–12.

[27] H. Zhang, T. Feng, G. H. Yang, H. Liang, Distributed cooperative optimal control for multiagent systems on directed graphs: An inverse optimal approach, IEEE Transactions on Cybernetics 45 (7) (2015) 1315–1326.

[28] K. G. Vamvoudakis, F. L. Lewis, G. R. Hudas, Multi-agent differential graphical games: Online adaptive learning solution for synchronization with optimality, Automatica 48 (8) (2012) 1598–1611.

[29] H. Zhang, J. Zhang, G. H. Yang, Y. Luo, Leader-based optimal coordination control for the consensus problem of multiagent differential games via fuzzy adaptive dynamic programming, IEEE Transactions on Fuzzy Systems 23 (1) (2015) 152–163.

[30] Q. Wei, D. Liu, F. L. Lewis, Optimal distributed synchronization control for continuous-time heterogeneous multi-agent differential graphical games, Information Sciences 317 (2015) 96–113.

[31] M. I. Abouheaf, F. Lewis, K. Vamvoudakis, S. Haesaert, R. Babuska, Multi-agent discrete-time graphical games and reinforcement learning solutions, Automatica 50 (2014) 3038–3053.

[32] H. Modares, F. L. Lewis, Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning, Automatica 50 (7) (2014) 1780–1792.

[33] B. Kiumarsi, F. L. Lewis, Actor-critic-based optimal tracking for partially unknown nonlinear discrete-time systems, IEEE Transactions on Neural Networks and Learning Systems 26 (1) (2015) 140–151.

[34] H. Zhang, Q. Wei, Y. Luo, A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 38 (4) (2008) 937–942.

[35] D. Wang, D. Liu, Q. Wei, Finite-horizon neuro-optimal tracking control for a class of discrete-time nonlinear systems using adaptive dynamic programming approach, Neurocomputing 78 (1) (2012) 14–22.

[36] H.-M. Yen, T.-H. S. Li, Y.-C. Chang, Design of a robust neural network-based tracking controller for a class of electrically driven nonholonomic mechanical systems, Information Sciences 222 (2013) 559–575.

[37] D. Wang, D. Liu, H. Li, H. Ma, Neural-network-based robust optimal control design for a class of uncertain nonlinear systems via adaptive dynamic programming, Information Sciences 282 (2014) 167–179.

[38] D. Wang, D. Liu, Q. Zhang, D. Zhao, Data-based adaptive critic designs for nonlinear robust optimal control with uncertain dynamics, IEEE Transactions on Systems, Man, and Cybernetics: Systems 46 (11) (2015) 1544–1555.

[39] T. Dierks, B. T. Thumati, S. Jagannathan, Optimal control of unknown affine nonlinear discrete-time systems using offline-trained neural networks with proof of convergence, Neural Networks 22 (5) (2009) 851–860.

[40] D. Liu, D. Wang, D. Zhao, Q. Wei, N. Jin, Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming, IEEE Transactions on Automation Science and Engineering 9 (3) (2012) 628–634.

[41] H. Zhang, L. Cui, X. Zhang, Y. Luo, Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method, IEEE Transactions on Neural Networks 22 (12) (2011) 2226–2236.

[42] R. Song, F. L. Lewis, Q. Wei, H. Zhang, Off-policy actor-critic structure for optimal control of unknown systems with disturbances, IEEE Transactions on Cybernetics 46 (5) (2016) 1041–1050.

[43] R. Song, F. Lewis, Q. Wei, H.-G. Zhang, Z.-P. Jiang, D. Levine, Multiple actor-critic structures for continuous-time optimal control using input-output data, IEEE Transactions on Neural Networks and Learning Systems 26 (4) (2015) 851–865.

[44] B. Luo, H.-N. Wu, T. Huang, D. Liu, Data-based approximate policy iteration for affine nonlinear continuous-time optimal control design, Automatica 50 (12) (2014) 3281–3290.

[45] H. Modares, F. L. Lewis, Z.-P. Jiang, H∞ tracking control of completely unknown continuous-time systems via off-policy reinforcement learning, IEEE Transactions on Neural Networks and Learning Systems 26 (10) (2015) 2550–2562.

[46] G. Xiao, H. Zhang, Y. Luo, H. Jiang, Data-driven optimal tracking control for a class of affine non-linear continuous-time systems with completely unknown dynamics, IET Control Theory & Applications 10 (6) (2016) 700–710.

[47] R. Kamalapurkar, H. Dinh, S. Bhasin, W. E. Dixon, Approximate optimal trajectory tracking for continuous-time nonlinear systems, Automatica 51 (2015) 40–48.

[48] R. Kamalapurkar, L. Andrews, P. Walters, W. E. Dixon, Model-based reinforcement learning for infinite-horizon approximate optimal tracking, IEEE Transactions on Neural Networks and Learning Systems PP (99) (2016) 1–6.

[49] B. Luo, H. Wu, Computationally efficient simultaneous policy update algorithm for nonlinear H∞ state feedback control with Galerkin's method, International Journal of Robust and Nonlinear Control 23 (9) (2013) 991–1012.

[50] H.-N. Wu, B. Luo, Neural network based online simultaneous policy update algorithm for solving the HJI equation in nonlinear H∞ control, IEEE Transactions on Neural Networks and Learning Systems 23 (12) (2012) 1884–1895.

[51] E. Zeidler, Nonlinear Functional Analysis and Its Applications: III: Variational Methods and Optimization, Springer Science & Business Media, 2013.

[52] C. Yang, Y. Jiang, Z. Li, W. He, C.-Y. Su, Neural control of bimanual robots with guaranteed global stability and motion precision, IEEE Transactions on Industrial Informatics PP (99) (2016) 1–9.

[53] C. Yang, X. Wang, Z. Li, Y. Li, C.-Y. Su, Teleoperation control based on combination of wave variable and neural networks, IEEE Transactions on Systems, Man, and Cybernetics: Systems PP (99) (2016) 1–12.

[54] V. Nevistic, J. A. Primbs, Optimality of nonlinear design techniques: a converse HJB approach, Technical Report TR96-022, California Institute of Technology (1996).
He Jiang received the B.S. degree in automation control in 2014 from Northeastern University, Shenyang, China, where he is currently pursuing the Ph.D. degree in control theory and control engineering. His current research interests include adaptive dynamic programming, fuzzy control, multi-agent system control and their industrial applications.

Huaguang Zhang received the B.S. degree and the M.S. degree in control engineering from Northeast Dianli University of China, Jilin City, China, in 1982 and 1985, respectively. He received the Ph.D. degree in thermal power engineering and automation from Southeast University, Nanjing, China, in 1991. He joined the Department of Automatic Control, Northeastern University, Shenyang, China, in 1992, as a Postdoctoral Fellow for two years. Since 1994, he has been a Professor and Head of the Institute of Electric Automation, School of Information Science and Engineering, Northeastern University, Shenyang, China. His main research interests are fuzzy control, stochastic system control, neural networks based control, nonlinear control, and their applications. He has authored and coauthored over 280 journal and conference papers, six monographs and co-invented 90 patents. Dr. Zhang is a fellow of IEEE, the E-letter Chair of the IEEE CIS Society, and the former Chair of the Adaptive Dynamic Programming & Reinforcement Learning Technical Committee of the IEEE Computational Intelligence Society. He is an Associate Editor of AUTOMATICA, IEEE TRANSACTIONS ON NEURAL NETWORKS, IEEE TRANSACTIONS ON CYBERNETICS, and NEUROCOMPUTING, respectively. He was an Associate Editor of IEEE TRANSACTIONS ON FUZZY SYSTEMS (2008-2013). He was awarded the Outstanding Youth Science Foundation Award from the National Natural Science Foundation Committee of China in 2003. He was named a Cheung Kong Scholar by the Education Ministry of China in 2005. He is a recipient of the IEEE Transactions on Neural Networks 2012 Outstanding Paper Award.

Yang Cui received the B.S. degree in information and computing science and the M.S. degree in applied mathematics from Liaoning University of Technology, Jinzhou, China. She is currently working toward the Ph.D. degree in control theory and control engineering at Northeastern University. Her research interests include dynamic surface control, neural networks and network control.

Geyang Xiao received the B.S. degree in Automation Control from Northeastern University, Shenyang, China, in 2012. He has been pursuing the Ph.D. degree with Northeastern University, Shenyang, China, since 2012. His current research interests include neural-network-based control, nonlinear control, adaptive dynamic programming and their industrial applications.
