
Provably Safe and Robust Learning-Based Model Predictive Control ⋆

Anil Aswani, Humberto Gonzalez, S. Shankar Sastry, Claire Tomlin

Electrical Engineering and Computer Sciences, Berkeley, CA 94720

arXiv:1107.2487v2 [math.OC] 4 Aug 2012

Abstract

Controller design faces a trade-off between robustness and performance, and the reliability of linear controllers has caused
many practitioners to focus on the former. However, there is renewed interest in improving system performance to deal with
growing energy constraints. This paper describes a learning-based model predictive control (LBMPC) scheme that provides
deterministic guarantees on robustness, while statistical identification tools are used to identify richer models of the system
in order to improve performance; the benefits of this framework are that it handles state and input constraints, optimizes
system performance with respect to a cost function, and can be designed to use a wide variety of parametric or nonparametric
statistical tools. The main insight of LBMPC is that safety and performance can be decoupled under reasonable conditions in
an optimization framework by maintaining two models of the system. The first is an approximate model with bounds on its
uncertainty, and the second model is updated by statistical methods. LBMPC improves performance by choosing inputs that
minimize a cost subject to the learned dynamics, and it ensures safety and robustness by checking whether these same inputs
keep the approximate model stable when it is subject to uncertainty. Furthermore, we show that if the system is sufficiently
excited, then the LBMPC control action probabilistically converges to that of an MPC computed using the true dynamics.

Key words: Predictive control; statistics; robustness; safety analysis; learning control.

⋆ Corresponding author A. Aswani. Email addresses: [email protected] (Anil Aswani), [email protected] (Humberto Gonzalez), [email protected] (S. Shankar Sastry), [email protected] (Claire Tomlin).

1 Introduction

Tools from control theory face an inherent trade-off between robustness and performance. Stability can be derived using approximate models, but optimality requires accurate models. This has driven research in adaptive [64,65,55,6,60] and learning-based [74,3,70,1,47] control. Adaptive control reduces conservatism by modifying controller parameters based on system measurements, and learning-based control improves performance by using measurements to refine models of the system. However, learning by itself cannot ensure the properties that are important to controller safety and stability [15,7,8].

The motivation of this paper is to design a control scheme that can (a) handle state and input constraints, (b) optimize system performance with respect to a cost function, (c) use statistical identification tools to learn model uncertainties, and (d) provably converge.

The main challenge is combining (a) and (c): Statistical methods converge in a probabilistic sense, and this is not strong enough for the purpose of providing deterministic guarantees of safety. Showing (d) is also difficult because of the differences between statistical and dynamical convergence.

We introduce a form of robust, adaptive model predictive control (MPC) that we refer to as learning-based model predictive control (LBMPC). The main insight of LBMPC is that performance and safety can be decoupled in an MPC framework by using reachability tools [4,14,56,23,5,69,52]. In particular, LBMPC improves performance by choosing inputs that minimize a cost subject to the dynamics of a learned model that is updated using statistics, while ensuring safety and stability by using theory from robust MPC [19,21,42,44] to check whether these same inputs keep a nominal model stable when it is subject to uncertainty.

LBMPC is similar to other variants of MPC. For instance, linear parameter-varying MPC (LPV-MPC) has a model that changes using successive online linearizations of a nonlinear model [38,26]; the difference is that LBMPC updates the models using statistical methods, provides robustness to poor model updates, and can involve nonlinear models. Other forms of robust, adaptive MPC [28,2] use an adaptive model with an uncertainty measure to ensure robustness, while LBMPC uses a learned model to improve performance and a nominal model with an uncertainty measure to provide robustness.

Here, we focus on LBMPC for when the nominal model is linear and has a known level of uncertainty. After reviewing notation and definitions, we formally define the LBMPC optimization problem. Deterministic theorems about safety, stability, and robustness are proved. Next, we discuss how learning is incorporated into the LBMPC framework using parametric or nonparametric statistical tools. Provided sufficient excitation of the system, we show convergence of the control law of LBMPC to that of an MPC that knows the true dynamics. The paper concludes by discussing applications of LBMPC to three experimental testbeds [12,9,20,13] and to a simulated jet engine compression system [53,25,39].

2 Preliminaries

In this section, we define the notation, the model, and summarize three results on estimation and filtering. Note that polytopes are assumed to be convex and compact.

2.1 Mathematical Notation

We use A′ to denote the transpose of A, and subscripts denote time indices. Marks above a variable distinguish the state, output, and input of different models of the same system. For instance, the true system has state x, the linear model with disturbance has state x̄, and the model with oracle has state x̃.

A function γ : R+ → R+ is type-K if it is continuous, strictly increasing, and γ(0) = 0 [63]. A function β : R+ × R+ → R+ is type-KL if for each fixed t ≥ 0, the function β(·, t) is type-K, and for each fixed s ≥ 0, the function β(s, ·) is decreasing and β(s, t) → 0 as t → ∞ [35]. Also, Vm(x) is a Lyapunov function for a discrete-time system if (a) Vm(xs) = 0 and Vm(x) > 0, ∀x ≠ xs; (b) α1(‖x − xs‖) ≤ Vm(x) ≤ α2(‖x − xs‖), where α1, α2 are type-K functions; (c) xs lies in the interior of the domain of Vm(x); and (d) Vm+1(xm+1) − Vm(xm) < 0 for states xm ≠ xs of a dynamical system.

Let U, V, W be sets. Their Minkowski sum [66] is U ⊕ V = {u + v : u ∈ U; v ∈ V}, and their Pontryagin set difference [66] is U ⊖ V = {u : u ⊕ V ⊆ U}. This set difference is not symmetric, and so the order of operations is important; also, the set difference can result in an empty set. The linear transformation of U by matrix T is given by T U = {T u : u ∈ U}. Some useful properties [66,37] include: (U ⊖ V) ⊕ V ⊆ U; (U ⊖ (V ⊕ W)) ⊕ W ⊆ U ⊖ V; (U ⊖ V) ⊖ W ⊆ U ⊖ (V ⊕ W); and T (U ⊖ V) ⊆ T U ⊖ T V.

For a sequence fn and rate rn, the notation fn = O(rn) means that ∃M, N > 0 such that ‖fn‖ ≤ M‖rn‖, for all n > N. For a random variable fn, constant f, and rate rn, the notation ‖fn − f‖ = Op(rn) means that given ε > 0, ∃M, N > 0 such that P(‖fn − f‖/rn > M) < ε, for all n > N. The notation fn →p f means that there exists rn → 0 such that ‖fn − f‖ = Op(rn).

2.2 Model

Let x ∈ Rp be the state vector, u ∈ Rm be the control input, and y ∈ Rq be the output. We assume that the states x ∈ X and control inputs u ∈ U are constrained by the polytopes X, U. The true system dynamics are

xn+1 = Axn + Bun + g(xn, un)   (1)

and yn = Cxn, where A, B, C are matrices of appropriate size and g(x, u) describes the unmodeled (possibly nonlinear) dynamics. The intuition is that we have a nominal linear model with modeling error. The term uncertainty is used interchangeably with modeling error.

We assume that the modeling error g(x, u) of (1) is bounded and lies within a polytope W, meaning that g(x, u) ∈ W for all (x, u) ∈ (X, U). This assumption is not restrictive in practice because it holds whenever g(x, u) is continuous, since X, U are bounded. Moreover, the set W can be determined using techniques from uncertainty quantification [18]; for example, the residual error from model fitting can be used to compute this uncertainty.

2.3 Estimation and Filtering

Simultaneously performing state estimation and learning unmodeled dynamics requires measuring all states [10], except in special cases [9]. We focus on the case in which all states are measured (i.e., C = I). It is possible to relax these assumptions by using set theoretic estimation methods (e.g., [51]), but we do not consider those extensions here. For simplicity of presentation, we assume that there is no measurement noise; however, our results extend to the case with measurement noise by simply replacing the modeling error W in our results with W ⊕ D, where D is a polytope encapsulating the effect of bounded measurement noise.

3 Learning-Based MPC

This section presents the LBMPC technique. The first step is to use reachability tools to construct a terminal set with robustness properties for the LBMPC, and this terminal set is important for proving the stability, safety, and robustness properties of LBMPC. The terminal constraint set is typically used to guarantee both feasibility
and convergence [50]. We decouple performance from robustness by identifying feasibility with robustness and convergence with performance.

One novelty of LBMPC is that different models of the system are maintained by the controller. In order to delineate the variables of the various models, we add marks above x and u. The true system (1) has state x and input u. The nominal linear model with uncertainty has state x̄ and input ū; its dynamics are given by

x̄n+1 = Ax̄n + Būn + dn,   (2)

where dn ∈ W is a disturbance. Because g(x, u) ∈ W, the dn reflects the uncertain nature of modeling error.

For the learned model, we denote the state x̃ and input ũ. Its dynamics are x̃n+1 = Ax̃n + Bũn + On(x̃n, ũn), where On is a time-varying function that is called the oracle. The reason we call this function the oracle is in reference to computer science, in which an oracle is a black box that takes in inputs and gives an answer: LBMPC only needs to know the value (and gradient when doing numerical computations) of this function at a finite set of points; and yet, the mathematical structure and details of how the oracle is computed are not relevant to the stability and robustness properties of LBMPC.

3.1 Construction of an Invariant Set

We begin by recalling two facts [44]. First, if (A, B) is stabilizable, then the set of steady-state points are xs = Λθ and us = Ψθ, where θ ∈ Rm and Λ, Ψ are full column-rank matrices with suitable dimensions. These matrices can be computed with a null space computation, by noting that range([Λ′ Ψ′]′) = null([(I − A) −B]). Second, if (A + BK) is Schur stable (i.e., all eigenvalues have magnitude strictly less than one), then the control input un = K(xn − xs) + us = Kxn + (Ψ − KΛ)θ steers (2) to steady-state xs = Λθ and us = Ψθ, whenever dn ≡ 0.

These facts are useful because they can be used to construct a robust reachable set that serves as the terminal constraint set for LBMPC. The particular type of reach set we use is known as a maximal output admissible disturbance invariant set Ω ⊆ X × Rm. It is a set of points such that any trajectory of the system with initial condition chosen from this set and with control un remains within the set for any sequence of bounded disturbance, while satisfying constraints on the state and input [37].

These properties of Ω are formalized as (a) constraint satisfaction:

Ω ⊆ {(x, θ) : x ∈ X; Λθ ∈ X; Kx + (Ψ − KΛ)θ ∈ U; Ψθ ∈ U},   (3)

and (b) disturbance invariance:

[ A + BK   B(Ψ − KΛ) ]
[    0          I     ] Ω ⊕ (W × {0}) ⊆ Ω.   (4)

Recall that the θ component of the set is a parametrization of which points can be tracked using control un.

The set Ω has an infinite number of constraints in general, though arbitrarily good approximations can be computed in a finite number of steps [37,44,57]. These approximations maintain both disturbance invariance and constraint satisfaction, and these are the properties which are used in the proofs for our MPC scheme. So even though our results are for Ω, they equally hold true for appropriately computed approximations.

3.2 Stability and Safety of LBMPC

LBMPC uses techniques from a type of robust MPC known as tube MPC [21,42,44], and it enlarges the feasible domain of the control by using tracking ideas from [22,45]. The idea of tube MPC is that given a nominal trajectory of the linear system (2) without disturbance, the trajectory of the true system (1) is guaranteed to lie within a tube that surrounds the nominal trajectory. A linear feedback K is used to control how wide this tube can grow. Moreover, LBMPC fixes the initial condition of the nominal trajectory as in [21,42], as opposed to letting the initial condition be an optimization variable as in [44].

Let N be the number of time steps for the horizon of the MPC. The width of the tube at the i-th step, for i ∈ I = {0, ..., N − 1}, is given by a set Ri, and the constraints X are shrunk by the width of this tube. The result is that if the nominal trajectory lies in X ⊖ Ri, then the true trajectory lies in X. Similarly, suppose that the N-th step of the nominal trajectory lies in Projx(Ω) ⊖ RN, where Projx(Ω) = Ωx = {x : ∃θ s.t. (x, θ) ∈ Ω}; then the true trajectory lies in Projx(Ω), and the invariance properties of Ω imply that there exists a control that keeps the system stable even under disturbances.

The following optimization problem defines LBMPC:

Vn(xn) = min_{c,θ} ψn(θ, x̃n, ..., x̃n+N, ǔn, ..., ǔn+N−1)   (5)
subject to:
x̃n = xn,  x̄n = xn   (6)
x̃n+i+1 = Ax̃n+i + Bǔn+i + On(x̃n+i, ǔn+i)   (7)
x̄n+i+1 = Ax̄n+i + Bǔn+i
ǔn+i = Kx̄n+i + cn+i
x̄n+i+1 ∈ X ⊖ Ri+1,  ǔn+i ∈ U ⊖ KRi   (8)
(x̄n+N, θ) ∈ Ω ⊖ (RN × {0})
for all i ∈ I in the constraints; K is the feedback gain used to compute Ω; R0 = {0} and Ri = W ⊕ (A + BK)W ⊕ ··· ⊕ (A + BK)^{i−1}W; On is the oracle; and ψn are non-negative functions that are Lipschitz continuous in their arguments. Note that the Lipschitz assumption is not restrictive because it is satisfied by costs with bounded derivatives; for example, linear and quadratic costs satisfy this due to the boundedness of states and inputs. Also note that the same control ǔ[·] is applied to both the nominal and learned models.

Remark 1. The cost ψn is a function of the states of the learned model, which uses the oracle to update the nominal model. The cost function may contain a terminal cost, an offset cost, a stage cost, etc. An interesting feature of LBMPC is that its stability and robustness properties do not depend on the actual terms within the cost function; this is one of the reasons that we state that LBMPC decouples safety (i.e., stability and robustness) from performance (i.e., having the cost be a function of the learned model).

Remark 2. The constraints in (8) are taken from [21] and are robustly imposed on the nominal linear model (2), taking into account the prior bounds on the unmodeled dynamics of the nominal model g(x, u). The reason that the constraints are not relaxed to exploit the refined results of the oracle (as in [28,2]) is that this provides robustness to the situation in which the learned model is not a good representation of the true dynamics. It is known that the performance of a learning-based controller can be arbitrarily bad if the learned model does not exactly match the true model [15]; imposing the constraints on the nominal model, instead of the learned model, protects against this situation.

Remark 3. There is another, more subtle reason for maintaining two models. Suppose that the oracle is bounded by a polytope On ∈ P, where P is a polytope; then, the worst case error between the true model (1) and the learned model (7) lies within the polytope W ⊕ P, which is strictly larger than W whenever P ≠ {0}. Intuitively, this means that if we were to use the worst-case bounded learned model in the constraints, then the constraints would be reduced by the larger amount W ⊕ P; this is in contrast to using the nominal model, in which case the constraints are reduced by only W.

Note that the value function Vn(xn) (i.e., the value of the objective (5) at its minimum), the cost function ψn, and the oracle On can be time-varying because they are functions of n. It is important that the oracle be allowed to be time-varying, because it is updated using statistical methods as time advances and more data is gathered. This is discussed in more detail in the next section.

Let Mn be a feasible point for the LBMPC scheme (5) with initial state xn, and denote a minimizing point of (5) as M∗n. The states and inputs predicted by the linear model (2) for point Mn are denoted x̄n+i[Mn] and ǔn+i[Mn], for i ∈ I. In this notation, the control law is explicitly given by

un[M∗n] = Kxn + cn[M∗n].   (9)

This MPC scheme is endowed with robust feasibility and constraint satisfaction properties, which in turn imply stability of the closed-loop control provided by LBMPC. The equivalence between these properties and stability holds because of the compactness of the constraints X, U.

Theorem 1. If Ω has the properties defined in Sect. 3.1 and Mn = {cn, ..., cn+N−1, θn} is feasible for the LBMPC scheme (5) with xn, then applying the control (9) gives

a) Robust feasibility: there exists a feasible Mn+1 for xn+1;
b) Robust constraint satisfaction: xn+1 ∈ X.

Proof. The proof follows a similar line of reasoning as Lemma 7 of [21]. We begin by showing that the point Mn+1 = {cn+1, ..., cn+N−1, 0, θn} is feasible for xn+1; the results follow as consequences of this.

Let dn+1+i[Mn] = (A + BK)^i g(xn, un), and note that dn+1+i[Mn] ∈ (A + BK)^i W. Some algebra gives the predicted states for i = 0, ..., N − 1 as x̄n+1+i[Mn+1] = x̄n+1+i[Mn] + dn+1+i[Mn] and predicted inputs for i = 0, ..., N − 2 as ǔn+1+i[Mn+1] = ǔn+1+i[Mn] + Kdn+1+i[Mn].

Because Mn is feasible, this means by definition that x̄n+1+i[Mn] ∈ X ⊖ Ri+1 for i = 0, ..., N − 1. Combining terms gives x̄n+1+i[Mn+1] ∈ (X ⊖ (Ri ⊕ (A + BK)^i W)) ⊕ (A + BK)^i W. It follows that x̄n+1+i[Mn+1] ∈ X ⊖ Ri for i = 0, ..., N − 1. Similar reasoning gives that ǔn+1+i[Mn+1] ∈ U ⊖ KRi for i = 0, ..., N − 2.

The same argument gives (x̄n+1+N−1[Mn+1], θn) ∈ Ω ⊖ (RN−1 × {0}) ⊂ Ω. Now by construction of Mn+1, it holds that ǔn+1+N−1[Mn+1] = Mp, where M = [K (Ψ − KΛ)] is a matrix and p = (x̄n+1+N−1[Mn+1], θn) is a point. Therefore, we have ǔn+1+N−1[Mn+1] = Mp ∈ MΩ ⊖ M(RN−1 × {0}) = MΩ ⊖ KRN−1. However, the constraint satisfaction property of Ω (3) implies that MΩ ⊆ U. Consequently, we have that ǔn+1+N−1[Mn+1] ∈ U ⊖ KRN−1.

Next, observe that the control ǔn+1+N−1[Mn+1] leads to x̄n+1+N[Mn+1] = ([A 0] + BM)p. Consequently, we have x̄n+1+N[Mn+1] ∈ ([A 0] + BM)Ω ⊖ (A + BK)RN−1. As a result of the disturbance invariance property of Ω (4), it must be that (x̄n+1+N[Mn+1], θn) ∈ (Ω ⊖ (W × {0})) ⊖ ((A + BK)RN−1 × {0}) = Ω ⊖ (RN × {0}). This completes the proof for part (a).
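As an aside, the set manipulations used in part (a) can be sanity-checked numerically. The sketch below, a minimal illustration assuming axis-aligned interval boxes (the paper allows general polytopes), implements the Minkowski sum ⊕ and Pontryagin difference ⊖, checks the property (U ⊖ (V ⊕ W)) ⊕ W ⊆ U ⊖ V, and outer-approximates the tube cross-sections Ri; all helper names and numeric values are hypothetical, not from the paper.

```python
import numpy as np

# Boxes represented as (lo, hi) endpoint arrays; a simplifying assumption
# for illustration only -- the paper works with general polytopes.

def minkowski_sum(U, V):
    # U ⊕ V = {u + v : u ∈ U, v ∈ V}; for boxes, add endpoints coordinatewise.
    return (U[0] + V[0], U[1] + V[1])

def pontryagin_diff(U, V):
    # U ⊖ V = {u : u ⊕ V ⊆ U}; for boxes, shrink U by the extent of V.
    return (U[0] - V[0], U[1] - V[1])

def contains(U, V, tol=1e-9):
    # True if box V is a subset of box U.
    return bool(np.all(V[0] >= U[0] - tol) and np.all(V[1] <= U[1] + tol))

rng = np.random.default_rng(0)
for _ in range(100):
    U = (rng.uniform(-5, -1, 2), rng.uniform(1, 5, 2))
    V = (rng.uniform(-0.5, 0, 2), rng.uniform(0, 0.5, 2))
    W = (rng.uniform(-0.5, 0, 2), rng.uniform(0, 0.5, 2))
    # Property used in part (a): (U ⊖ (V ⊕ W)) ⊕ W ⊆ U ⊖ V.
    lhs = minkowski_sum(pontryagin_diff(U, minkowski_sum(V, W)), W)
    assert contains(pontryagin_diff(U, V), lhs)

# Tube cross-sections R_i = W ⊕ (A+BK)W ⊕ ... ⊕ (A+BK)^{i-1}W for a
# symmetric box W = [-w, w]: the image (A+BK)^j W is generally not a box,
# so we track an outer bounding box with half-widths |(A+BK)^j| w
# (entrywise absolute value).
A_K = np.array([[0.5, 0.1], [0.0, 0.4]])  # a Schur-stable A + BK (assumed)
w = np.array([0.1, 0.05])                 # half-widths of W (assumed)
R_half = [np.zeros(2)]                    # R_0 = {0}
for i in range(1, 6):
    R_half.append(R_half[-1] + np.abs(np.linalg.matrix_power(A_K, i - 1)) @ w)
```

Because A + BK here is Schur stable, the half-widths R_half[i] grow monotonically but stay bounded by a convergent geometric series, which is why the tightening X ⊖ Ri in (8) need not shrink the constraints to nothing for a well-chosen K.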
Similar arithmetic shows that the true next state is xn+1[Mn] = x̄n+1[Mn] + wn, where wn = g(xn, un) ∈ W. Since Mn is a feasible point, it holds that x̄n+1[Mn] ∈ X ⊖ W. This implies that xn+1[Mn] = x̄n+1[Mn] + wn ∈ (X ⊖ W) ⊕ W ⊆ X; this proves part (b).

Corollary 1. If Ω has the properties defined in Sect. 3.1 and M0 is feasible for the LBMPC scheme (5) with initial state x0, then the closed-loop system provided by LBMPC is (a) stable, (b) satisfies all state and input constraints, and (c) feasible, for all points of time n ≥ 0.

Remark 4. Robust feasibility and constraint satisfaction, as in Theorem 1, trivially imply this result.

Remark 5. These results apply to the case where ψn, On are time-varying; this allows, for example, changing the set point of the LBMPC using the approach in [45]. Moreover, the safety and stability that we have proved for the closed-loop system under LBMPC are actually robust results, because they imply that the states remain within bounded constraints even under disturbances, provided the modeling error in (2) follows the prescribed bound and the invariant set Ω can be computed.

Next, we discuss additional types of robustness provided by LBMPC. First, we show that the value function Vn(xn) of LBMPC (5) is continuous, and this property can be used for establishing certain other types of robustness of an MPC controller [30,48,58,43].

Lemma 1. Let XF = {xn : ∃Mn} be the feasible region of the LBMPC (5). If ψn, On are continuous, then Vn(xn) is continuous on int(XF).

Proof. We define a cost function ψ̃n and constraint function φ such that the LBMPC (5) can be rewritten as

min_{c,θ} ψ̃n(θ, xn, cn, ..., cn+N−1)  s.t. (c, θ) ∈ φ(xn).   (10)

The proof proceeds by showing that both the objective ψ̃n and the constraint φ are continuous. Under such continuity, we get continuity of the value function by the Berge maximum theorem [16] (or equivalently by Theorem C.34 of [58]).

Because the constraints (6) and (8) in LBMPC are linear, the constraint φ is continuous [30]. Continuity of ψ̃n follows by noting that the composition of continuous functions — specifically (5), (6), and (7) — is also a continuous function [61].

Remark 6. This result is surprising because a non-convex (and hence nonlinear) MPC problem generally has a discontinuous value function (cf. [30]). LBMPC is non-convex when On is nonlinear (or ψn is non-convex), and the reason that we have a continuous value function is that our active constraints are linear equality constraints or polytopes. In practice, this result requires being able to numerically compute a global minimum, and this can only be efficiently done for convex optimization problems.

Remark 7. The proof of this result suggests another benefit of LBMPC: The fact that the constraints are linear means that suboptimal solutions can be computed by solving a linear (and hence convex) feasibility problem, even when the LBMPC problem is nonlinear. This enables more precise tradeoffs between computation and solution accuracy, as compared to conventional forms of nonlinear MPC.

Next, we prove that LBMPC is robust because its worst case behavior is an increasing function of modeling error. This type of robustness is formalized by the following definition.

Definition 1 (Grimm, et al. [30]). A system is robustly asymptotically stable (RAS) about xs if there exists a type-KL function β and for each ε > 0 there exists δ > 0, such that for all dn satisfying maxn ‖dn‖ < δ it holds that xn ∈ X and ‖xn − xs‖ ≤ β(‖x0 − xs‖, n) + ε for all n ≥ 0.

Remark 8. The intuition is that if a controller for the approximate system (2) with no disturbance converges to xs, then the same controller applied to the approximate system (2) with bounded disturbance (note that this also includes the true system (1)) asymptotically remains within a bounded distance from xs.

We can now prove when LBMPC is RAS. The key intuitive points are that linear MPC (i.e., LBMPC with an identically zero oracle: On ≡ 0) needs to be provably convergent for the approximate model with no disturbance, and the oracle for LBMPC needs to be bounded.

Theorem 2. Assume (a) Ω has the properties defined in Sect. 3.1; (b) M0 is feasible for LBMPC (5) with x0; (c) the cost function ψn is time-invariant, continuous, and strictly convex; and (d) there exists a continuous Lyapunov function W(x) for the approximate system (2) with no disturbance, when using the control law of linear MPC (i.e., LBMPC with On ≡ 0). Under these conditions, the control law of LBMPC is RAS with respect to the disturbance dn in (2), whenever the oracle On is a continuous function satisfying max_{n, X×U} ‖On‖ ≤ δ. Note that this δ is the same one as from the definition of RAS.

Proof. Let M̄∗n be the minimizer for linear MPC, and note that it is unique because ψn is assumed to be strictly convex. Similarly, let M∗n be a minimizer for LBMPC. Now consider the state-dependent disturbance

en = B(ǔn[M∗n] − ǔn[M̄∗n]) + dn,   (11)
for the approximate system (2). By construction, it holds that xn+1[M∗n] = x̄n+1[M̄∗n] + en.

Proposition 8 of [30] and Theorem 1 imply that given ε > 0, there exists δ1 > 0 such that for all en satisfying maxn ‖en‖ < δ1 it holds that xn ∈ X and ‖xn − xs‖ ≤ β(‖x0 − xs‖, n) + ε for all n ≥ 0. What remains to be checked is whether there exists δ such that maxn ‖en‖ < δ1 for the en defined in (11).

The same argument as used in Lemma 1, coupled with the strict convexity of the linear MPC, gives that M∗n is continuous with respect to On, when On ≡ 0. (Recall that the minimizer at this point is M̄∗n.) Because of this continuity, there exists δ2 > 0 such that ‖ǔn[M∗n] − ǔn[M̄∗n]‖ ≤ δ1/(2‖B‖), whenever the oracle lies in the set {On : ‖On‖ < δ2}. Taking δ = min{δ1/2, δ2} gives the result.

Remark 9. Condition (a) is satisfied if the set Ω can be computed; it cannot be computed in some situations because it is possible to have Ω = ∅. Conditions (b) and (c) are easy to check. As we will show in Sect. 3.2.1, certain systems have easy sufficient conditions for checking the Lyapunov condition in (d).

3.2.1 Example: Tracking in Linearized Systems

Here, we show that the Lyapunov condition in Theorem 2 can be easily checked when the cost function is quadratic and the approximate model is linear with bounds on its uncertainty. Suppose we use the quadratic cost defined in [45]

ψn = ‖x̃n+N − Λθ‖P² + ‖xs − Λθ‖T² + Σ_{i=0}^{N−1} (‖x̃n+i − Λθ‖Q² + ‖ǔn+i − Ψθ‖R²),   (12)

where P, Q, R, T are positive definite matrices, to track to the point xs ∈ {Λθ : Λθ ∈ X}. Then, the Lyapunov condition required for Theorem 2 holds.

Proposition 1. For linear MPC with cost (12) where xs ∈ {Λθ : Λθ ∈ X} is kept fixed, if (A + BK) is Schur stable and P solves the discrete-time Lyapunov equation (A + BK)′P(A + BK) − P = −(Q + K′RK), then there exists a continuous Lyapunov function W for the equilibrium point xs of the approximate model (2) with no disturbances.

Proof. First note that because we consider the linear MPC case, we have by definition x̃ = x̄.

Results from converse Lyapunov theory [36] indicate that the result is true if the following two conditions hold. The first is local uniform stability, meaning that for every ε > 0, there exists some δ > 0 such that ‖x0 − xs‖ < δ implies that ‖xn − xs‖ < ε for all n ≥ 0. The second is that limn→∞ ‖xn − xs‖ = 0 for all feasible points x0 ∈ XF.

The second condition was shown in Theorem 1 of [45], and so we only need to check the first condition. We begin by noting that since Q, T are positive definite matrices, there exists a positive definite matrix S such that S ≤ Q and S ≤ T. Next, observe that ‖x̃n − xs‖S² ≤ ‖x̃n − Λθ‖Q² + ‖xs − Λθ‖T² ≤ ψn. Minimizing both sides of the inequality subject to the linear MPC constraints yields ‖x̃n − xs‖S² ≤ V(xn), where V(xn) is the value function of the linear MPC optimization.

Because linear MPC is the special case of LBMPC in which On ≡ 0, the result in Lemma 1 applies: The value function V(xn) is continuous. Furthermore, the proof of Theorem 1 of [45] shows that the value function is non-increasing (i.e., V(xn+1) ≤ V(xn)), non-negative (i.e., V(xn) ≥ 0), and zero-valued only at the equilibrium point (i.e., V(xs) = 0). Because of the continuity of the value function, given ε > 0 there exists δ > 0 such that V(x0) < ε whenever ‖x0 − xs‖ < δ. The local uniform stability condition holds by noting that ‖x̃n − xs‖S² ≤ V(xn) ≤ V(x0) < ε, and this proves the result.

Remark 10. The result does not immediately follow from [45], because the value function of the linear MPC is not a Lyapunov function in this situation. In particular, the value function is non-increasing, but it is not strictly decreasing.

4 The Oracle

In theoretical computer science, oracles are black boxes that take in inputs and give answers. An important class of arguments known as relativizing proofs utilize oracles in order to prove results in complexity theory and computability theory. These proofs proceed by endowing the oracle with certain generic properties and then studying the resulting consequences.

We have named the functions On oracles in reference to those in computer science. Our reason is that we proved robustness and stability properties of LBMPC by only assuming generic properties, such as continuity or boundedness, on the function On. These functions are arbitrary, which can include worst-case behavior, for the purpose of the theorems in the previous section.

Whereas the previous section considered the oracles as abstract objects, here we discuss and study specific forms that the oracle can take. In particular, we can design On to be a statistical tool that identifies better system models. This leads to two natural questions: First, what are examples of statistical methods that can be used to construct an oracle for LBMPC? Secondly, when does
the control law of LBMPC converge to the control law of MPC that knows the true model?

This section begins by defining two general classes of statistical tools that can be used to design the oracle On. For concreteness, we provide a few examples of methods that belong to these two classes. The section concludes by addressing the second question above. Because our control law is the minimizer of an optimization problem, the key technical issue that we discuss is sufficient conditions that ensure convergence of the minimizers of a sequence of optimization problems to the minimizer of a limiting optimization problem.

4.1 Parametric Oracles

A parametric oracle is a continuous function On(x, u) = χ(x, u; λn) that is parameterized by a set of coefficients λn ∈ T ⊆ R^L, where T is a set. This class of learning is often used in adaptive control [64,6]. In the most general case, the function χ is nonlinear in all its arguments, and it is customary to use a least-squares cost function with input and trajectory data to estimate the parameters:

λ̂n = arg min_{λ∈T} Σ_{j=0}^{n} (Yj − χ(xj, uj; λ))²,   (13)

where Yi = xi+1 − (Axi + Bui). This can be difficult to compute in real-time because it is generally a nonlinear optimization problem.

Example 1. It is common in biochemical networks to have nonlinear terms in the dynamics such as

On(x, u) = λn,1 · (x1^{λn,2} / (x1^{λn,2} + λn,3)) · (λn,4 / (u1^{λn,5} + λn,4)),   (14)

where λn ∈ T ⊂ R^5 are the unknown coefficients in this example. Such terms are often called Hill equation type reactions [11].

An important subclass of parametric oracles are those that are linear in the coefficients: On(x, u) = Σ_{i=1}^{L} λn,i χi(x, u), where χi ∈ R^p for i = 1, . . . , L are a set of (possibly nonlinear) functions. The reason for the importance of this subclass is that the least-squares procedure (13) is convex in this situation, even when the functions χi are nonlinear. This greatly simplifies the computation required to solve the least-squares problem (13) that gives the unknown coefficients λn.

Example 2. One special case of linear parametric oracles is when the χi are linear functions. Here, the oracle can be written as Om(x, u) = Fλm x + Gλm u, where Fλm, Gλm are matrices whose entries are parameters. The intuition is that this oracle allows for corrections to the values in the A, B matrices of the nominal model; it was used in conjunction with LBMPC on a quadrotor helicopter testbed [9,20], in which LBMPC enabled high-performance flight.

4.2 Nonparametric Oracles

Nonparametric regression refers to techniques that estimate a function g(x, u) of input variables such as x, u, without making a priori assumptions about the mathematical form or structure of the function g. This class of techniques is interesting because it allows us to integrate non-traditional forms of adaptation and "learning" into LBMPC. And because LBMPC robustly maintains feasibility and constraint satisfaction as long as Ω can be computed, we can design or choose the nonparametric regression method without having to worry about stability properties. This is a specific instantiation of the separation between robustness and performance in LBMPC.

Example 3. Neural networks are a classic example of a nonparametric method that has been used in adaptive control [55,60,3], and they can also be used with LBMPC. There are many particular forms of neural networks, and one specific type is a feedforward neural network with a hidden layer of kn neurons; it is given by

On(x, u) = c0 + Σ_{i=1}^{kn} ci σ(ai′[x′ u′]′ + bi),   (15)

where ai ∈ R^{p+m} and bi, c0, ci ∈ R for all i ∈ {1, . . . , kn} are coefficients, and σ(x) = 1/(1 + e^{−x}) : R → [0, 1] is a sigmoid function [31]. Note that this is considered a nonparametric method because it does not generally converge unless kn → ∞ as n → ∞.

Designing a nonparametric oracle for LBMPC is challenging because the tool should ideally be an estimator that is bounded, to ensure robustness of LBMPC, and differentiable, to allow for its use with numerical optimization algorithms. Local linear estimators [62,8] are not guaranteed to be bounded, and their extensions that remain bounded are generally non-differentiable [27]. On the other hand, neural networks can be designed to remain bounded and differentiable, but they can have technical difficulties related to the estimation of their coefficients [72].

4.2.1 Example: L2-Regularized Nadaraya-Watson Estimator

The Nadaraya-Watson (NW) estimator [54,62], which can be intuitively thought of as the interpolation of non-uniformly sampled data points by a suitably normalized convolution kernel, is promising because it ensures boundedness. Our approach to designing a nonparametric estimator for LBMPC is to modify the NW estimator by adding regularization that deterministically ensures boundedness. Thus, it serves the same purpose as trimming [17]; but the benefit of our approach is that it also deterministically ensures differentiability of the estimator. To our knowledge, this modification of NW has not been previously considered in the literature.
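To make the regularization idea above concrete, here is a minimal numerical sketch of such an estimator. It computes a kernel-weighted average of observed model-error residuals, with the extra weight λ assigned to the point 0 so that the denominator never vanishes. The Epanechnikov-style kernel, the use of scalar residuals, and all function names are our own illustrative assumptions, not code from the testbed implementations:

```python
def epanechnikov(v):
    # Kernel with finite support: positive for |v| < 1, zero otherwise.
    return 0.75 * (1.0 - v * v) if abs(v) < 1.0 else 0.0

def l2nw(xi, X, Y, h=0.5, lam=1e-3):
    """Regularized Nadaraya-Watson estimate at the query point xi.

    xi:  query point (components of state and input stacked together)
    X:   list of past data points X_i; Y: list of scalar residuals Y_i
    h:   bandwidth; lam: regularization weight assigned to the point 0
    """
    num, den = 0.0, lam
    for X_i, Y_i in zip(X, Y):
        dist2 = sum((a - b) ** 2 for a, b in zip(xi, X_i)) / h ** 2
        w = epanechnikov(dist2)  # weight of data point i
        num += w * Y_i
        den += w
    # Weighted mean of {0, Y_1, ..., Y_n} with weights {lam, w_1, ..., w_n}.
    return num / den
```

With λ > 0 the output is always a convex combination of {0, Y_1, . . . , Y_n}, so the estimate stays bounded and well-defined even when no data fall within the bandwidth of the query point; this mirrors the deterministic boundedness that motivates the regularization.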
Define hn, λn ∈ R+ to be two non-negative parameters; except when we wish to emphasize their temporal dependence, we will drop the subscript n to match the convention of the statistics literature. Let Xi = [x′i u′i]′, Yi = xi+1 − (Axi + Bui), and Ξi = ‖ξ − Xi‖²/h², where Xi ∈ R^{p+m} and Yi ∈ R^p are data and ξ = [x′ u′]′ are free variables. We define any function κ : R → R+ to be a kernel function if it has (a) finite support (i.e., κ(ν) = 0 for |ν| ≥ 1), (b) even symmetry κ(ν) = κ(−ν), (c) positive values κ(ν) > 0 for |ν| < 1, (d) differentiability (denoted by dκ), and (e) nonincreasing values of κ(ν) over ν ≥ 0. The L2-regularized NW (L2NW) estimator is defined as

On(x, u) = (Σi Yi κ(Ξi)) / (λ + Σi κ(Ξi)),   (16)

where λ ∈ R+. If λ = 0, then (16) is simply the NW estimator. The λ term acts to regularize the problem and ensures differentiability.

There are two alternative characterizations of (16). The first is as the unique minimizer of the parametrized, strictly convex optimization problem On(x, u) = arg min_γ L(x, u, Xi, Yi, γ) for

L(x, u, Xi, Yi, γ) = Σi κ(Ξi)(Yi − γ)² + λγ².   (17)

Viewed in this way, the λ term represents a Tikhonov (or L2) regularization [71,32]. The second characterization is as the mean with weights {λ, κ(Ξ1), . . . , κ(Ξn)} for points {0, Y1, . . . , Yn}, and it is useful for showing the second part of the following theorem about the deterministic properties of the L2NW estimator.

Theorem 3. If 0 ∈ W, κ(·) is a kernel function, and λ > 0; then (a) the L2NW estimator On(x, u) as defined in (16) is differentiable, and (b) On(x, u) ∈ W.

Proof. To prove (a), note that the estimate On(x, u) is the value of γ that solves dL/dγ(x, u, Xi, Yi, γ) = 0, where L(·) is from (17). Because λ + Σi κ(Ξi) > 0, the hypothesis of the implicit function theorem is satisfied, and the result follows directly from the implicit function theorem. Part (b) is shown by noting that the assumptions imply that 0, Yi ∈ W. If the weights of a weighted mean are positive and have a nonzero sum, then the weighted mean can be written as a convex combination of points. This is our situation, and so the result follows from the weighted mean characterization of (16).

Remark 11. This shows that L2NW is deterministically bounded and differentiable, which is needed for robustness and numerical optimization, respectively. We can compute the gradient of L2NW using standard calculus, and its jk-th component is given by (18) for fixed Xi, Yi:

∂On/∂ξk (x, u) = [{Σi [Yi]j · dκ(Ξi) · Ξi · [ξ − Xi]k}{λ + Σi κ(Ξi)} − {Σi [Yi]j κ(Ξi)}{Σi dκ(Ξi) · Ξi · [ξ − Xi]k}] / (h²{λ + Σi κ(Ξi)}²/2).   (18)

There are a few notes regarding numerical computation of L2NW. First, picking the parameters λ, h in a data-driven manner [24,67] is too slow for real-time implementation, and so we suggest rules of thumb: Deterministic regularity is provided by Theorem 3 for any positive λ (e.g., 1e-3), and we conjecture using hn = O(n^{−1/(p+m)}) because random samples cover X × U ⊆ R^{p+m} at this rate. Second, computational savings are possible through careful software coding, because if h is small, then most terms in the summations of (17) and (18) will be zero because of the finite support of κ(·).

4.3 Stochastic Epi-convergence

It remains to be shown that if On(x, u) stochastically converges to the true model g(x, u), then the control law of the LBMPC scheme will stochastically converge to that of an MPC that knows the true model. The main technical problem occurs because On is time-varying, and so the control law is given by the minimizer of an LBMPC optimization problem that is different at each point in time n. This presents a problem because pointwise convergence of On to g is generally insufficient to prove convergence of the minimizers of a sequence of optimization problems to the minimizer of a limiting optimization problem [59,73].

A related notion called epi-convergence is sufficient for showing convergence of the control law. Define the epigraph of fn(·, ω) to be the set of all points lying on or above the function, and denote it as Epi fn(·, ω) = {(x, µ) : µ ≥ fn(x, ω)}. To prove convergence of the sequence of minimizers, we must show that the epigraph of the cost function (and constraints) of the sequence of optimizations converges in probability to the epigraph of the cost function (and constraints) in the limiting optimization problem. This notion is called epi-convergence, and we denote it as fn →^{l-prob.}_{X} f0.

For simplicity, we will assume in this section that the cost function is time-invariant (i.e., ψn ≡ ψ0). It is enough to cite the relevant results for our purposes, but the interested reader can refer to [59,73] for details.

Theorem 4 (Theorem 4.3 [73]). Let ψ̃n and φ be as defined in Lemma 1, and define ψ̃0 to be the composition of (5) with both (6) and xn+i+1(xn+i, un+i) = Axn+i + Bun+i + g(xn+i, un+i). If ψ̃n →^{l-prob.}_{φ(xn)} ψ̃0 for all {xn : φ(xn) ≠ ∅}, then the set of minimizers converges:

arg min{ψ̃n | (c, θ) ∈ φ(xn)} →^p arg min{ψ̃0 | (c, θ) ∈ φ(xn)}.   (19)

Remark 12. The intuition is that if the cost function ψn composed with the oracle On(x, u) converges in the appropriate manner to ψ0 composed with the true dynamics g(x, u); then we get convergence of the minimizers of LBMPC to those of the MPC with true model, and the control law (9) converges. This theorem can be used to prove convergence of the LBMPC control law.

4.4 Epi-convergence for Parametric Oracles

Sufficient excitation (SE) is an important concept in system identification, and it intuitively means that the control inputs and state trajectory of the system are such that all modes of the system are activated. In general, it is hard to design a control scheme that ensures this a priori, which is a key aim of reinforcement learning [15]. However, LBMPC provides a framework in which SE may be able to be designed. Because we have a nominal model, we can in principle design a reference trajectory that sufficiently explores the state-input space X × U.

Though designing a controller that ensures SE can be difficult, checking a posteriori whether a system has SE is straightforward [46,7,8]. In this section, we assume SE and leave open the problem of how to design reference trajectories for LBMPC that guarantee SE. This is not problematic from the standpoint of stability and robustness, because LBMPC provides these properties, even without SE, whenever the conditions in Sect. 3 hold. We have convergence of the control law assuming SE, statistical regularity, and that the oracle can correctly model g(x, u). The proof of the following theorem can be found in [10].

Theorem 5. Suppose there exists λ0 ∈ T such that g(x, u) = χ(x, u; λ0). If the system has SE [41,34,49], then the control law of the LBMPC with oracle (13) converges in probability to the control law of an MPC that knows the true model (i.e., un[M∗n] →^p u0[M∗0]).

4.5 Epi-convergence for Nonparametric Oracles

For a nonlinear system, SE is usually defined using ergodicity or mixing, but this is hard to verify in general. Instead, we define SE as a finite sample cover (FSC) of X. Let Bh(x) = {y : ‖x − y‖ ≤ h} be a ball centered at x with radius h; then a FSC of X is a set Sh = ∪i B_{h/2}(Xi) that satisfies X ⊆ Sh. The intuition is that {Xi} sample X with average inter-sample distance less than h/2.

Our first result considers a generic nonparametric oracle with uniform pointwise convergence. Such uniform convergence implicitly implies SE in the form of a FSC with asymptotically decreasing radius h [75], though we make this explicit in our statement of the result. A proof can be found in [10].

Theorem 6. Let hn be some sequence such that hn → 0. If S_{hn} is a FSC of X × U and

sup_{X×U} ‖On(x, u) − g(x, u)‖ = Op(rn),   (20)

with rn → 0; then the control law of LBMPC with On(x, u) converges in probability to the control law of an MPC that knows the true model (i.e., un[M∗n] →^p u0[M∗0]).

Remark 13. Our reason for presenting this result is that this theorem may be useful for proving convergence of the control law when using types of nonparametric regression that are more complex than L2NW. However, we stress that this is a sufficient condition, and so it may be possible for nonparametric tools that do not meet this condition to generate such stochastic convergence of the controller.

Assuming SE in the form of a FSC with asymptotically decreasing radius h, we can show that the control law of LBMPC that uses L2NW converges to that of an MPC that knows the true dynamics. Because the proofs [10] rely upon theory from probability and statistics, we simply summarize the main result.

Theorem 7. Let hn be some sequence such that hn → 0. If S_{hn} is a FSC of X × U, λ = O(hn), and g(x, u) is Lipschitz continuous; then the control law of LBMPC with L2NW converges in probability to the control law of an MPC that knows the true model (i.e., un[M∗n] →^p u0[M∗0]).

5 Experimental and Numerical Results

In this section, we briefly discuss applications in which LBMPC has been experimentally applied to different testbeds. The section concludes with numerical simulations that display some of the features of LBMPC.

5.1 Energy-efficient Building Automation

We have implemented LBMPC on two testbeds that were built on the Berkeley campus for the purpose of studying energy-efficient control of heating, ventilation, and air-conditioning (HVAC) equipment. The first testbed
[12], which is named the Berkeley Retrofitted and Inexpensive HVAC Testbed for Energy Efficiency (BRITE), is a single room that uses HVAC equipment that is commonly found in homes. LBMPC was able to generate up to 30% energy savings on warm days and up to 70% energy savings on cooler days, as compared to the existing control of the thermostat within the room. It achieved this by using semiparametric regression to estimate, using only temperature measurements from the thermostat, the heating load from exogenous sources like occupants, equipment, and solar heating. The LBMPC used this estimated heating load as its form of learning, and was able to adjust the control action of the HVAC based on this in order to achieve large energy savings.

The second testbed [13], which is named BRITE in Sutardja Dai Hall (BRITE-S), is a seven-floor office building that is used in multiple ways. The building has offices, classrooms, an auditorium, laboratory space, a kitchen, and a coffee shop with dining area. Using a variant of LBMPC for hybrid systems with controlled switches, we were able to achieve an average of 1.5 MWh of energy savings per day. For reference, eight days of energy savings is enough to power an average American home for one year. Again, we used semiparametric regression to estimate, using only temperature measurements from the building, the heating load from exogenous sources like occupants, equipment, and solar heating. The LBMPC used this estimated heating load, along with additional estimates of unmodeled actuator dynamics, as its form of learning, in order to adjust its supervisory control action.

5.2 High Performance Quadrotor Helicopter Flight

We have also used LBMPC in order to achieve high performance flight for semi-autonomous systems such as a quadrotor helicopter, which is a non-traditional helicopter with four propellers that enable improved steady-state stability properties [33]. In our experiments with LBMPC on this quadrotor testbed [9,20], the learning was implemented using an extended Kalman filter (EKF) that provided corrections to the coefficients in the A, B matrices. This makes it similar to LPV-MPC, which performs linear MPC using a successive series of linearizations of a nonlinear model; in our case, we used the learning provided by the EKF to in effect perform such linearizations.

Various experiments that we conducted showed that LBMPC improved performance and provided robustness. Amongst the experiments we performed were those that (a) showed improved step responses with lower amounts of overshoot and settling time as compared to linear MPC, and (b) displayed the ability of the LBMPC controller to overcome a phenomenon known as the ground effect that typically makes flight paths close to the ground difficult to perform. Furthermore, the LBMPC displayed robustness by preventing crashes into the ground during experiments in which the EKF was purposely made unstable in order to mis-learn. The improved performance and learning generalization possible with the type of adaptation and learning within LBMPC was demonstrated with an integrated experiment in which the quadrotor helicopter caught ping-pong balls that were thrown to it by a human.

5.3 Example: Moore-Greitzer Compressor Model

Here, we present a simulation of LBMPC on a nonlinear system for illustrative purposes. The compression system of a jet engine can exhibit two types of instability: rotating stall and surge [53,25,39]. Rotating stall is a rotating region of reduced air flow, and it degrades the performance of the engine. Surge is an oscillation of air flow that can damage the engine. Historically, these instabilities were prevented by operating the engine conservatively. But better performance is possible through active control schemes [25,39].

The Moore-Greitzer model is an ODE model that describes the compressor and predicts surge instability:

Φ̇ = −Ψ + Ψc + 1 + 3Φ/2 − Φ³/2
Ψ̇ = (Φ + 1 − r√Ψ)/β²,   (21)

where Φ is mass flow, Ψ is pressure rise, β > 0 is a constant, and r is the throttle opening. We assume r is controlled by a second-order actuator with transfer function r(s) = wn²/(s² + 2ζwn s + wn²) u(s), where ζ is the damping coefficient, wn is the resonant frequency, and u is the input.

We conducted simulations of this system with the parameters β = 1, Ψc = 0, ζ = 1/√2, and wn = √1000. We chose state constraints 0 ≤ Φ ≤ 1 and 1.1875 ≤ Ψ ≤ 2.1875, actuator constraints 0.1547 ≤ r ≤ 2.1547 and −20 ≤ ṙ ≤ 20, and input constraints 0.1547 ≤ u ≤ 2.1547. For the controller design, we took the approximate model with state δx = [δΦ δΨ δr δṙ]′ to be the exact discretization (with sampling time T = 0.01) of the linearization of (21) about the equilibrium x0 = [Φ0 Ψ0 r0 ṙ0]′ = [0.5000 1.6875 1.1547 0]′; the control is un = δun + u0, where u0 ≡ r0. The linearization and approximate model are unstable, and so we picked a nominal feedback matrix K = [−3.0741 2.0957 0.1195 −0.0090] that stabilizes the system by ensuring that the poles of the closed-loop system xn+1 = (A + BK)xn were placed at {0.75, 0.78, 0.98, 0.99}. These particular poles were chosen because they are close to the poles of the open-loop system, while still being stable.

For the purpose of computing the invariant set Ω, we used the algorithm in [37]. This algorithm uses the modeling error set W as one of its inputs. This set W was chosen to be a hypercube that encompasses both a bound on
the linearization error, derived using the Taylor remainder theorem applied to the true nonlinear model, along with a small amount of subjectively-chosen "safety margin" to provide protection against the effect of numerical errors.

We compared the performance of linear MPC, nonlinear MPC, and LBMPC with L2NW for regulating the system about the operating point x0, by conducting a simulation starting from initial condition [Φ0 − 0.35 Ψ0 − 0.40 r0 0]′. The horizon was chosen to be N = 100, and we used the cost function (12), with Q = I4, R = 1, T = 1e3, and P that solves the discrete-time Lyapunov equation. The L2NW used an Epanechnikov kernel (CITE), with parameter values h = 0.5, λ = 1e-3, and data measured as the system was controlled by LBMPC. Also, the L2NW only used three states Xi = [Φi Ψi ui] to estimate g(x, u); incorporation of such prior knowledge improves estimation by reducing dimensionality.

The significance of this setup is that the assumptions of Theorems 1 and 2 (via Proposition 1) are satisfied. This means that for both linear MPC and LBMPC: (a) constraints and feasibility are robustly maintained despite modeling errors, (b) closed-loop stability is ensured, and (c) control is ISS with respect to modeling error. In the instances we simulated, the controllers demonstrated these features. More importantly, this example shows that the conditions of our deterministic theorems can be checked easily for interesting systems such as this.

Fig. 1. The states and control of LBMPC (solid blue), linear MPC (dashed red), and nonlinear MPC (dotted green) are shown. LBMPC converges faster than linear MPC.
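The operating point and open-loop instability used in this example can be sanity-checked numerically. The following sketch (our own illustration, not the paper's MATLAB code; the function names and the finite-difference Jacobian are assumptions) evaluates the Moore-Greitzer dynamics (21) with β = 1 and Ψc = 0 at the stated equilibrium, and checks that the linearized flow dynamics have positive trace, so the equilibrium is unstable without feedback:

```python
def mg_dynamics(phi, psi, r, psi_c=0.0, beta=1.0):
    # Moore-Greitzer compressor dynamics (21) with a fixed throttle opening r.
    dphi = -psi + psi_c + 1.0 + 1.5 * phi - 0.5 * phi ** 3
    dpsi = (phi + 1.0 - r * psi ** 0.5) / beta ** 2
    return dphi, dpsi

# Operating point from the simulation study.
phi0, psi0, r0 = 0.5, 1.6875, 1.1547

def jacobian(phi, psi, r, eps=1e-6):
    # Finite-difference Jacobian of the (phi, psi) flow dynamics.
    f0 = mg_dynamics(phi, psi, r)
    cols = []
    for dphi_, dpsi_ in ((eps, 0.0), (0.0, eps)):
        f1 = mg_dynamics(phi + dphi_, psi + dpsi_, r)
        cols.append(((f1[0] - f0[0]) / eps, (f1[1] - f0[1]) / eps))
    # Rearrange columns of partial derivatives into a 2x2 matrix.
    return [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]
```

At (Φ0, Ψ0, r0) both time derivatives are numerically zero, and the Jacobian trace is positive (roughly 1.125 − 0.444), so the linearization has an eigenvalue with positive real part; this is consistent with the need for the stabilizing nominal feedback K.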

Simulation results are shown in Fig. 1: LBMPC converges faster to the operating point than linear MPC, but requires increased computation at each step (0.3s for linear MPC vs. 0.9s for LBMPC). Interestingly, LBMPC performs as well as nonlinear MPC, but nonlinear MPC only requires 0.4s to compute each step. However, our point is that LBMPC does not require the control engineer to model nonlinearities, in contrast to nonlinear MPC. Our code was written in MATLAB and uses the SNOPT solver [29] for optimization; polytope computations used the Multi-Parametric Toolbox (MPT) [40].

6 Conclusion

LBMPC uses a linear model with bounds on its uncertainty to construct invariant sets that provide deterministic guarantees on robustness and safety. An advantage of LBMPC is that many types of statistical identification tools can be used with it, and we constructed a new nonparametric estimator that has deterministic properties required for use with numerical optimization algorithms while also satisfying conditions required for robustness. A simulation shows that LBMPC can improve over linear MPC, and experiments on testbeds [12,9,20] show that such improvement translates to real systems.

Amongst the most interesting directions for future work is the design of better learning methods for use in LBMPC. Loosely speaking, nonparametric methods work by localizing measurements in order to provide consistent estimates of the function g(x, u) [75]. The L2NW estimator maintains strict locality in the sense of [75], because this property makes it easier to perform theoretical analysis. However, it is known that learning methods that also incorporate global regularization, such as support vector regression [68,72], can outperform strictly local methods [75]. The design of such globally-regularized nonparametric methods which also have theoretical properties favorable for LBMPC is an open problem.

Acknowledgements

The authors would like to acknowledge Jerry Ding and Ram Vasudevan for useful discussions about collocation. This material is based upon work supported by the National Science Foundation under Grant No. 0931843, the Army Research Laboratory under Cooperative Agreement Number W911NF-08-2-0004, the Air Force Office of Scientific Research under Agreement Number FA9550-06-1-0312, and PRET Grant 18796-S2.
References

[1] P. Abbeel, A. Coates, and A. Ng. Autonomous helicopter aerobatics through apprenticeship learning. International Journal of Robotics Research, 29(13):1608–1639, 2010.
[2] V. Adetola and M. Guay. Robust adaptive MPC for constrained uncertain nonlinear systems. Int. J. Adapt. Control, 25(2):155–167, 2011.
[3] C. Anderson, P. Young, M. Buehner, J. Knight, K. Bush, and D. Hittle. Robust reinforcement learning control using integral quadratic constraints for recurrent neural networks. IEEE Trans. Neural Netw., 18(4):993–1002, 2007.
[4] E. Asarin, O. Bournez, T. Dang, and O. Maler. Approximate reachability analysis of piecewise-linear dynamical systems. In HSCC 2000, pages 20–31, 2000.
[5] E. Asarin, T. Dang, and A. Girard. Reachability analysis of nonlinear systems using conservative approximation. In HSCC 2003, pages 20–35, 2003.
[6] K.J. Åström and B. Wittenmark. Adaptive Control. Addison-Wesley, 1995.
[7] A. Aswani, P. Bickel, and C. Tomlin. Statistics for sparse, high-dimensional, and nonparametric system identification. In ICRA, 2009.
[8] A. Aswani, P. Bickel, and C. Tomlin. Regression on manifolds: Estimation of the exterior derivative. Annals of Statistics, 39(1):48–81, 2010.
[9] A. Aswani, P. Bouffard, and C. Tomlin. Extensions of learning-based model predictive control for real-time application to a quadrotor helicopter. In ACC, pages 4661–4666, 2012.
[10] A. Aswani, H. Gonzalez, S. Sastry, and C. Tomlin. Statistical results on filtering and epi-convergence for learning-based model predictive control. Technical report, 2012.
[11] A. Aswani, H. Guturu, and C. Tomlin. System identification of hunchback protein patterning in early Drosophila embryogenesis. In CDC, pages 7723–7728, Dec. 2009.
[12] A. Aswani, N. Master, J. Taneja, D. Culler, and C. Tomlin. Reducing transient and steady state electricity consumption in HVAC using learning-based model-predictive control. Proceedings of the IEEE, 99(12), 2011.
[13] A. Aswani, N. Master, J. Taneja, A. Krioukov, D. Culler, and C. Tomlin. Energy-efficient building HVAC control using hybrid system LBMPC. In IFAC Conference on Nonlinear Model Predictive Control, 2012. To appear.
[14] A. Aswani and C. Tomlin. Reachability algorithm for biological piecewise-affine hybrid systems. In HSCC 2007, pages 633–636, 2007.
[15] A. Barto and T. Dietterich. Reinforcement learning and its relationship to supervised learning. In Handbook of Learning and Approximate Dynamic Programming. Wiley-IEEE Press, 2004.
[16] C. Berge. Topological Spaces. Oliver and Boyd, Ltd., 1963.
[17] P. Bickel. On adaptive estimation. Annals of Statistics, 10(3):647–671, 1982.
[18] L. Biegler, G. Biros, O. Ghattas, M. Heinkenschloss, D. Keyes, B. Mallick, L. Tenorio, B. van Bloemen Waanders, K. Willcox, and Y. Marzouk. Large-Scale Inverse Problems and Quantification of Uncertainty. John Wiley & Sons, 2011.
[19] F. Borelli, A. Bemporad, and M. Morari. Constrained Optimal Control and Predictive Control for Linear and Hybrid Systems. 2009. In preparation.
[20] P. Bouffard, A. Aswani, and C. Tomlin. Learning-based model predictive control on a quadrotor: Onboard implementation and experimental results. In ICRA, pages 279–284, 2012.
[21] L. Chisci, J. Rossiter, and G. Zappa. Systems with persistent disturbances: predictive control with restricted constraints. Automatica, 37:1019–1028, 2001.
[22] L. Chisci and G. Zappa. Dual mode predictive tracking of piecewise constant references for constrained linear systems. International Journal of Control, 76(1):61–72, 2003.
[23] A. Chutinan and B. Krogh. Verification of polyhedral-invariant hybrid automata using polygonal flow pipe approximations. In HSCC, pages 76–90, 1999.
[24] B. Efron. Estimating the error rate of a prediction rule: Some improvements on cross-validation. JASA, 78:316–331, 1983.
[25] A. Epstein, J. Ffowcs Williams, and E. Greitzer. Active suppression of aerodynamic instabilities in turbomachines. Journal of Propulsion, 5(2):204–211, 1989.
[26] P. Falcone, F. Borrelli, H. Tseng, J. Asgari, and D. Hrovat. Linear time-varying model predictive control and its application to active steering systems. International Journal of Robust and Nonlinear Control, 18:862–875, 2008.
[27] A. Fiacco. Sensitivity analysis for nonlinear programming using penalty methods. Mathematical Programming, 10:287–311, 1976.
[28] H. Fukushima, T.H. Kim, and T. Sugie. Adaptive model predictive control for a class of constrained linear systems based on the comparison model. Automatica, 43(2):301–308, 2007.
[29] P. Gill, W. Murray, and M. Saunders. SNOPT: An SQP algorithm for large-scale constrained optimization. SIAM Review, 47(1):99–131, 2005.
[30] G. Grimm, M. Messina, S. Tuna, and A. Teel. Examples when nonlinear model predictive control is nonrobust. Automatica, 40(10):1729–1738, 2004.
[31] L. Györfi, M. Kohler, A. Krzyżak, and H. Walk. Neural networks estimates. In A Distribution-Free Theory of Nonparametric Regression, pages 297–328. Springer New York, 2002.
[32] A.E. Hoerl and R.W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 8:27–51, 1970.
[33] G. Hoffmann, S. Waslander, and C. Tomlin. Quadrotor helicopter trajectory tracking control. In AIAA Guidance, Navigation and Control Conference and Exhibit, Honolulu, Hawaii, USA, August 2008.
[34] R. Jennrich. Asymptotic properties of non-linear least squares estimators. Annals of Mathematical Statistics, 40:633–643, 1969.
[35] Z.-P. Jiang and Y. Wang. Input-to-state stability for discrete-time nonlinear systems. Automatica, 37(6):857–869, 2001.
[36] Z.-P. Jiang and Y. Wang. A converse Lyapunov theorem for discrete-time systems with disturbances. Systems and Control Letters, 45:49–58, 2002.
[37] I. Kolmanovsky and E. Gilbert. Theory and computation of disturbance invariant sets for discrete-time linear systems. Mathematical Problems in Engineering, 4:317–367, 1998.
[38] M. Kothare, B. Mettler, M. Morari, P. Bendotti, and C.-M. Falinower. Level control in the steam generator of a nuclear power plant. IEEE Transactions on Control Systems Technology, 8(1):55–69, 2000.
[39] M. Krstić and P. Kokotović. Lean backstepping design for a jet engine compressor model. In CCA, pages 1047–1052, September 1995.
[40] M. Kvasnica, P. Grieder, and M. Baotić. Multi-Parametric Toolbox (MPT). 2004.
[41] T. Lai, H. Robbins, and C. Wei. Strong consistency of least squares estimates in multiple regression II. Journal of Multivariate Analysis, 9:343–361, 1979.
[42] W. Langson, I. Chryssochoos, S. Raković, and D. Mayne. Robust model predictive control using tubes. Automatica, 40(1):125–133, 2004.
[43] D. Limon, T. Alamo, D. Raimondo, D. de la Peña, J. Bravo, A. Ferramosca, and E. Camacho. Input-to-state stability: A unifying framework for robust model predictive control. In L. Magni, D. Raimondo, and F. Allgöwer, editors, Nonlinear Model Predictive Control, volume 384 of Lecture Notes in Control and Information Sciences, pages 1–26. Springer Berlin / Heidelberg, 2009.
[44] D. Limon, I. Alvarado, T. Alamo, and E. Camacho. Robust tube-based MPC for tracking of constrained linear systems with additive disturbances. Journal of Process Control, 20(3):248–260, 2010.
[45] D. Limon, I. Alvarado, T. Alamo, and E.F. Camacho. MPC for tracking piecewise constant references for constrained linear systems. Automatica, 44(9):2382–2387, 2008.
[46] L. Ljung. System Identification: Theory for the User. Prentice-Hall, 1987.
[47] L. Ljung, H. Hjalmarsson, and H. Ohlsson. Four encounters with system identification. European Journal of Control, 17(5–6):449–471, 2011.
[48] L. Magni and R. Scattolini. Robustness and robust design of MPC for nonlinear discrete-time systems. In Assessment and Future Directions of Nonlinear Model Predictive Control, pages 239–254. Springer, 2007.
[49] E. Malinvaud. The consistency of nonlinear regressions. Annals of Mathematical Statistics, 41(3):956–969, 1970.
[50] D. Mayne, J. Rawlings, C. Rao, and P. Scokaert. Constrained model predictive control: Stability and optimality. Automatica, 36:789–814, 2000.
[51] M. Milanese and G. Belaforte. Estimation theory and uncertainty intervals evaluation in the presence of unknown but bounded errors: Linear families of models and estimates. IEEE Transactions on Automatic Control, 27(2):408–414, 1982.
[52] I. Mitchell, A. Bayen, and C. Tomlin. A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games. IEEE Trans. Autom. Control, 50(7):947–957, 2005.
[53] F. Moore and E. Greitzer. A theory of poststall transients
[57] S.V. Rakovic and M. Baric. Parameterized robust control invariant sets for linear systems: Theoretical advances and computational remarks. IEEE Trans. Autom. Control, 55(7):1599–1614, 2010.
[58] J.B. Rawlings and D.Q. Mayne. Model Predictive Control: Theory and Design. Nob Hill Pub., 2009.
[59] R. Rockafellar and R. Wets. Variational Analysis. Springer-Verlag, 1998.
[60] G. Rovithakis and M. Christodoulou. Adaptive control of unknown plants using dynamical neural networks. IEEE Trans. Syst., Man, Cybern., 24(3):400–412, 1994.
[61] W. Rudin. Principles of Mathematical Analysis. McGraw-Hill, 2nd edition, 1964.
[62] D. Ruppert and M. Wand. Multivariate locally weighted least squares regression. Annals of Statistics, 22(3):1346–1370, 1994.
[63] S. Sastry. Nonlinear Systems: Analysis, Stability, and Control. Springer, 1999.
[64] S. Sastry and M. Bodson. Adaptive Control: Stability, Convergence, and Robustness. Prentice-Hall, 1989.
[65] S. Sastry and A. Isidori. Adaptive control of linearizable systems. IEEE Trans. Autom. Control, 34(11):1123–1131, 1989.
[66] R. Schneider. Convex Bodies: The Brunn-Minkowski Theory. Cambridge University Press, 1993.
[67] J. Shao. Linear model selection by cross-validation. Journal of the American Statistical Association, 88(422):486–494, 1993.
[68] A. Smola and B. Schölkopf. A tutorial on support vector regression. Statistics and Computing, 14:199–222, 2004.
[69] O. Stursberg and B. Krogh. Efficient representation and computation of reachable sets for hybrid systems. In HSCC 2003, pages 482–497, 2003.
[70] R. Tedrake. LQR-trees: Feedback motion planning on sparse randomized trees. In Robotics: Science and Systems, pages 17–24, 2009.
[71] A.N. Tikhonov and V.I.A. Arsenin. Solutions of Ill-Posed Problems. Scripta Series in Mathematics. Winston, 1977.
[72] V.N. Vapnik. An overview of statistical learning theory. IEEE Transactions on Neural Networks, 10(5):988–999, Sep. 1999.
[73] S. Vogel and P. Lachout. On continuous convergence and epi-convergence of random functions. Part I: Theory and relations. Kybernetika, 39(1):75–98, 2003.
[74] J.X. Xu and Y. Tan. Linear and Nonlinear Iterative Learning Control. Springer, 2003.
[75] A. Zakai and Y. Ritov. How local should a learning
in axial compressors–part I: Development of the equations. method be? In COLT, pages 205–216, 2008.
ASME Journal of Engineering for Gas Turbines and Power,
108:68–76, 1986.
[54] H. Müller. Weighted local regression and kernel methods
for nonparametric curve fitting. Journal of the American
Statistical Association, 82:231–238, 1987.
[55] K.S. Narendra and K. Parthasarathy. Identification and
control of dynamical systems using neural networks. Neural
Networks, IEEE Transactions on, 1(1):4–27, 1990.
[56] S. Raković, E. Kerrigan, D. Mayne, and J. Lygeros.
Reachability analysis of discrete-time systems with
disturbances. IEEE Trans. Autom. Control, 51(4):546–561,
2006.