0% found this document useful (0 votes)
8 views28 pages

2017-Non-Asymptotic Analysis of Robust Control From Coarse-Grained Identification

This document presents a non-asymptotic analysis of robust control through coarse-grained identification, focusing on the trade-off between sample size and control performance. It establishes bounds on the number of noisy samples needed to approximate a stable linear time-invariant system and shows that simpler models can achieve robust control objectives with fewer samples than previously thought. The authors also introduce optimal experiment design procedures for input selection under physical constraints and demonstrate the effectiveness of their approach through theoretical results and experimental validation.

Uploaded by

Gary Rey
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views28 pages

2017-Non-Asymptotic Analysis of Robust Control From Coarse-Grained Identification

This document presents a non-asymptotic analysis of robust control through coarse-grained identification, focusing on the trade-off between sample size and control performance. It establishes bounds on the number of noisy samples needed to approximate a stable linear time-invariant system and shows that simpler models can achieve robust control objectives with fewer samples than previously thought. The authors also introduce optimal experiment design procedures for input selection under physical constraints and demonstrate the effectiveness of their approach through theoretical results and experimental validation.

Uploaded by

Gary Rey
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 28

Non-Asymptotic Analysis of Robust Control from

Coarse-Grained Identification
Stephen Tu, Ross Boczar, Andrew Packard, Benjamin Recht
arXiv:1707.04791v2 [math.OC] 30 Nov 2017

December 1, 2017

Abstract
This work explores the trade-off between the number of samples required to accurately build
models of dynamical systems and the degradation of performance in various control objectives
due to a coarse approximation. In particular, we show that simple models can be easily fit
from input/output data and are sufficient for achieving various control objectives. We derive
bounds on the number of noisy input/output samples from a stable linear time-invariant system
that are sufficient to guarantee that the corresponding finite impulse response approximation
is close to the true system in the H∞ -norm. We demonstrate that these demands are lower
than those derived in prior art which aimed to accurately identify dynamical models. We also
explore how different physical input constraints, such as power constraints, affect the sample
complexity. Finally, we show how our analysis fits within the established framework of robust
control, by demonstrating how a controller designed for an approximate system provably meets
performance objectives on the true system.

1 Introduction
Most control design relies on establishing a model of the system to be controlled. For simple physical
systems, a model with reasonable fidelity can typically be constructed from knowledge of the physics
at hand. However, for complex, uncertain systems, building models from first principles becomes
quickly intractable and one usually resorts to fitting models from empirical input/output data. This
approach naturally raises an important question: how well must we identify a system in order to
control it?
In this work, we attempt to answer this question by striking a balance between system iden-
tification and robust control. We aim to identify coarse estimates of the true underlying model
while coupling our estimation with precise probabilistic bounds on the inaccuracy of our estimates.
With a coarse model in hand, we can use standard robust synthesis tools that take into account the
derived bounds on the model uncertainty.
More precisely, given an unknown stable discrete-time plant G, we bound the error accrued by
fitting a finite impulse response (FIR) approximation to G from noisy output measurements. These
bounds balance the sample complexity of estimating an unknown FIR filter against the capability of
such a filter to approximate the behavior of G. In particular, we show that notably short FIR filters
provide a sufficient approximation to stable systems in order to ensure robust performance for a
variety of control design tasks. In particular, we demonstrate considerable savings in experimental
measurements as compared to other non-asymptotic schemes that aim to precisely identify G.

1
In the process of fitting a FIR filter, a natural question arises as to what inputs should be
used to excite the unknown system. Of course, due to actuator limitations and other physical
constraints, we are not free to choose any arbitrary input. Hence, we model the choice of inputs as
an experiment design question, where the practitioner specifies a bounded input set and asks for the
best m inputs to use to minimize FIR identification error. We propose a new optimal experiment
design procedure for solving this problem, and relate it to the well studied A-optimal experiment
design objective from the statistics literature [24]. This connection is used to study practical cases
of input constraints. Specifically, we prove that when the inputs are `2 -power constrained, then
impulse responses are the optimal choice of inputs. However, we show that is not the case when the
inputs are `∞ -constrained. For `∞ constraints, we construct a deterministic set of inputs which is
within a factor of 2 to the optimal solution. Combining these designs with our probabilistic bounds,
we show that for estimating a length-r FIR filter Gr , as long as m ≥ 4r, the residual H∞ error
kGr − G b r k∞ on the estimate G e √m)1 with high probability. This is a substantial
b r satisfies O(1/
p
improvement over the O( e r/m) scaling which we show occurs in the `2 -constrained case. We also
prove an information-theoretic lower bound which shows that when the true system happens to
be an FIR filter, our bounds are minimax optimal up to constant factors for the given estimation
problem.
Experimentally, we show that H∞ loop-shaping controller design on the estimated FIR model,
using probabilistic bounds, can be used to synthesize controllers with both stability and performance
guarantees on the closed loop with the true plant. We also demonstrate that our probabilistic bounds
can be estimated directly from data using Monte–Carlo techniques. We hope that our results
encourage further investigation into a rigorous foundation for data-driven controller synthesis.

1.1 A sample complexity bound for FIR identification


We now state our main results: upper and lower bounds on the sample complexity of FIR system
identification. Let G be a stable, discrete-time SISO LTI system. Suppose we are given query access
to G via independent, noisy measurements of the form
−1
Yu,T := (g ∗ u)Tk=0 + ξ , ξ ∼ N (0, σ 2 IT ) . (1.1)

Above, g denotes the impulse response of G and T is the length of the output we observe. We
assume that we are allowed to choose any input u contained within a bounded set U, which is
specified beforehand. From these measurements, we can approximate G by a length-r FIR filter
Gb r (z) as

r−1
X
G
b r (z) := gbk z −k , (1.2)
k=0

where r ≤ T , and the coefficients gbk are estimated from ordinary least-squares (c.f. Section 3). We
note that the extra degree of freedom in allowing r 6= T reduces the variance of the higher lag terms,
which is a standard trick used in the system-identification literature (see e.g. Section 2 of Wahlberg
et al. [30]).
The main quantity of interest in this setting is the number of timesteps needed in order to ensure
that the H∞ -norm of the error G − G b r satisfies the bound kG − Gb r k∞ ≤ ε with probability at least
1
The notation O(·)
e suppresses dependence on polylogarithmic factors.

2
1 − δ over the randomness of the noise. Here, the total number of timesteps is the product of the
number of queries of the form (1.1) times length of the queries. That is, the number of timesteps is
m × T , where m is number of measurements taken and T denotes the length of each measurement.
This quantity depends on the set U; we restrict (for now) to the case where U is either the unit
`2 -ball or the unit `∞ -ball. These two sets comprise the most common input constraints found in
the controls literature. Nevertheless, our analysis later will cover all `p -balls for p ∈ [1, ∞].
In both cases, we must first consider the length of the FIR filter (1.2) required to ensure rea-
sonable approximation in the H∞ -norm. We must guarantee that we are able to accurately capture
the large components of the impulse response of G. Therefore, we will need some measure of how
quickly the impulse response coefficients tend to zero. It turns out that the H∞ -norm of G provides
a convenient proxy for the decay of the impulse response. Throughout, we will use the following
sufficient bound on the truncation length:
Definition 1 (Sufficient Length Condition). Let G be stable with stability radius ρ ∈ (0, 1). Fix a
ε > 0. Let R(ε) be the smallest integer which satisfies
kG(γz)k∞
 
1
R(ε) ≥ inf log . (1.3)
ρ<γ<1 1 − γ ε(1 − γ)
Equation (1.3) characterizes the approximation error of an FIR filter to G as a balance between
the growth of 1/(1 − γ) versus the decay of the logarithm of kG(γz)k∞ , as γ varies between (ρ, 1).
We first study the `2 -ball case. In this case, we will set all m inputs to an impulse; that is,
ui = e1 , where e1 ∈ Rr is the first standard basis vector.
Theorem 1.1 (Main result, `2 -constrained case). Fix an ε > 0 and δ ∈ (0, 1), and suppose that
U = {x ∈ RT : kxk2 ≤ 1}. Let G be stable with stability radius ρ ∈ (0, 1), and set r ≥ R(ε/2) from
(1.3) and T = 2r. Set m measurements u1 , ..., um ∈ U, with ui = e1 for i = 1, ..., m, where m ≥ 1
satisfies
σ2r
  
1
m ≥ C 2 log r + log . (1.4)
ε δ
Then, with probability at least 1 − δ, we have kG − G
b r k∞ ≤ ε. Above, C is an absolute positive
constant.
Theorem 1.1 states that the number of timesteps to achieve identification error ε with `2 -
constrained inputs scales as O(σe 2 r2 /ε2 ) in the regime when σ/ε  1. It also turns out that
this input ensemble is optimal for the `2 -ball case, which we will discuss shortly.
We next turn to the `∞ -ball case. In this case, we take m = 2n to be an even number, and
construct the measurement ensemble
   
(c) 2πit (s) 2πit
ui,t = cos and ui,t = sin for i = 0, ..., n − 1 . (1.5)
n n
With this measurement ensemble, we prove the following result for `∞ -constraints.
Theorem 1.2 (Main result, `∞ -constrained case). Fix an ε > 0 and δ ∈ (0, 1), and suppose that
U = {x ∈ RT : kxk∞ ≤ 1}. Let G be stable with stability radius ρ ∈ (0, 1), and set r ≥ R(ε/2) from
(1.3) and T = 2r. Set m measurements as described in (1.5), where m ≥ 4r satisfies
σ2
  
1
m ≥ C 2 log r + log . (1.6)
ε δ

3
Then, with probability at least 1 − δ, we have kG − G
b r k∞ ≤ ε. Above, C is an absolute positive
constant.
In the regime when σ/ε  1, Theorem 1.2 states that the number of timesteps to achieve
e 2 r/ε2 ). This is substantially more
identification error ε with `∞ -constrained inputs scales as O(σ
efficient than the complexity O(σ 2 2 2
e r /ε ) which arises in the `2 -constrained case. We conclude by
noting that this particular input ensemble is optimal for the `∞ -ball case up to constants, which we
turn our attention to now.
For the lower bound, we assume that G itself is an length-r FIR filter. We consider the general
case when all m inputs are constrained to a unit `p -ball, where p ∈ [1, ∞].
Theorem 1.3 (Main result, minimax risk lower bound). Fix a p ∈ [1, ∞] and r ≥ 16. Suppose
that m ≥ 1 measurements u1 , ..., um ∈ RT are fixed beforehand, with T = 2r and kui kp ≤ 1 for all
i = 1, ..., m. Let Hr denote the space of all length-r FIR filters. We have that
r
2/ max(p,2) log r
inf sup EkG b − Gk∞ ≥ Cσ r , (1.7)
b G∈Hr
G m
b : ⊗m RT −→ Hr , and C is an absolute
where the infimum ranges over all measurable functions G k=1
positive constant.
In view of Theorem 1.3, we see that the rates prescribed by Theorem 1.1 for the `2 -constrained
case and by Theorem 1.2 for the `∞ -constrained case are minimax optimal up to constant factors.
We conclude by noting that our choice of T = 2r is arbitrary. Indeed, the same results hold for
T = d(1 + ε)re for any fixed ε > 0, which only a change in the constant factors.

2 Related Work
2.1 Transfer function identification
Estimating the transfer function of a linear time-invariant system from input/output pairs has been
studied in various forms in both the controls literature [17, 18] and the statistics literature [8, 10, 28],
where it is closely related to estimating the coefficients of a stable autoregressive (AR) process. The
main difference between our work and that of autoregressive estimation is that we assume the
noise process driving the system is chosen by the practitioner (which we denote as the input to
the system), and the stochastic component enters only during the output of the system. This
simplifying assumption allows us to provide stronger non-asymptotic guarantees. Also by making
prior assumptions on the stability radius of the underlying system, we circumvent the delicate issue
of model order selection; a similar assumption is made in [10].
Most closely related to our work is that of Goldenshluger [9], where he considers the problem of
estimating the impulse response coefficients of a stable SISO LTI system. Goldenshluger provides
upper and lower bounds on the `p -error when the residual between the estimate and the true
coefficients is treated as a sequence in `p for p ∈ [1, ∞]. The main difference between Goldenshluger’s
setting and ours is that he restricts himself to the case when the input u is `∞ -constrained, and
furthermore assumes only a single realization is available. On the other hand, we make assumption
that multiple independent realizations of the system are available, which is reasonable in a controlled
laboratory setting. This assumption simplifies the analysis and allows us to study more general `p -
constrained inputs.

4
2.2 System identification
We now turn our attention to system identification, where the classical results can be found in
[16]. Sample complexity guarantees in the system identification literature often require strong
assumptions, which are difficult to verify. Most analyses are asymptotic and are based on the idea
of persistence of excitation or mixing [19, 29]. There has been some progress in estimating the
sample complexity of dynamical system identification using machine learning tools [4, 29], but such
results typically yield pessimistic sample complexity bounds that are exponential in the degree of
the linear system or other relevant quantities.
Two recent results provide polynomial sample complexity for identifying linear dynamical sys-
tems. Shah et al. [27] show that if certain frequency domain measurements are obtained from a linear
dynamical system, then the system can be approximately identified by solving a second-order cone
programming problem. The degree of the estimated IIR system scales as (1 − ρ(A))−2 where ρ(A)
denotes the stability radius. Similarly, Hardt et al. [11] show that one can estimate an IIR system
from time domain observations with a number of measurements polynomial in (1 − ρ(A))−2 , under
the assumption that the impulse response coefficients {gk }k≥0 satisfy the decay law |gk | ≤ Cρ(A)k ,
where C is considered a constant independent of the degree of the system. In this work, we show
that under the same decay assumption, a considerably smaller FIR approximation with degree
O((1
e − ρ(A))−1 ) suffices to complete many control design tasks.

2.3 Robust control


Classical robust control literature focuses much of its effort on designing a controller while taking into
account fixed bounds on the uncertainty in the model. There are numerous algorithms for controller
synthesis under various uncertainty specifications, such as coprime factor uncertainty [20] or state-
space uncertainty [23]. However, there are only a few branches of the robust control literature that
couple identification to control design, and the identification procedure best suited for a particular
control synthesis scheme is usually not specified.

2.4 H∞ identification and gain estimation


Most related to our work is the literature on H∞ identification. In this literature, noisy input/output
data from an unknown stable linear time-invariant (LTI) plant is collected in either the frequency
or time domain; the goal is often to estimate a model with low H∞ error. For frequency domain
algorithms, see e.g. [12, 13], and for time domain algorithms, see e.g. [6]. A comprehensive review
of this line of work is given by Chen and Gu [5].
The main difference between the H∞ identification literature and our work is that we assume a
probabilistic noise model instead of worst-case (adversarial), and we assume that our identification
algorithm is allowed to pick its inputs to the plant G. As we will see, these simplifying assumptions
lead to simple algorithms, straightforward analysis, and finite-time sample complexity guarantees.
Another related line of work is the use of the power method [25, 30] for estimating the H∞ -norm
of an unknown SISO plant. The key insight is that in the SISO case, a time-reversal trick can be
applied to effectively query the system G∗ ◦ G, where G∗ denotes the adjoint system. This approach
is appealing, since the power method is known to converge exponentially quickly to the leading
eigenvector. However, the leading factor in the convergence rate is the ratio of λ1 /λ2 , and hence
providing a finite-time guarantee of this method would require a non-asymptotic analysis of the rate
of convergence of the second singular value of finite sections of a Toeplitz operator.

5
2.5 Norms of random polynomials
A significant portion of our Panalysis relies on bounding the norms of random trigonometric poly-
r−1
nomials of the form Q(z) = k=0 εk z k . The study of the supremum norm of random finite degree
polynomials was first initiated by Salem and Zygmund [26], who studied the setting where the coef-
ficients are drawn from a symmetric Bernoulli distribution supported on {±1}. Later, Kahane [15]
proved that when the coefficients
p are distributed as an isotropic Gaussian, then with probability
at least 1 − δ, kQk∞ ≤ O( r log(r/δ)). More recently, Meckes [21] extended this result to hold
for independent sub-Gaussian random variables by employing standard tools from probability in
Banach spaces. In Section 3, we extend these results to the case when the coefficients follow a
non-isotropic Gaussian distribution. This is important because it allows us to reduce the overall
error of our estimate by using non-isotropic covariance matrices from experiment design.

3 System Identification of Finite Impulse Responses


Recall from Section 1.1 that we are given query access to G via the form
−1
Yu,T = (g ∗ u)Tk=0 + ξ , ξ ∼ N (0, σ 2 IT ) ,

where u ∈ U for some fixed set U. Therefore, the ratio of some measure of the size of U to σ serves
as the signal-to-noise (SNR) ratio for our setting. In what follows, we will always assume U is a
unit `p -ball for p ∈ [1, ∞].
Fix a set of m inputs u1 , ..., umP ∈ U. Given a realization of {Yuk ,T }mk=1 , we can estimate the first
T coefficients {gk }k=0 of G(z) = ∞
T −1
k=0 gk z −k via ordinary least-squares (OLS). Calling the vector

Y := (Yu1 ,T , ..., Yum ,T ) ∈ RT m , it is straightforward to show that the least squares estimator gb0:T −1
is given by
 
gb0  
 gb1  Toep(u1 )
gb0:T −1 :=  .  = (Z T Z)−1 Z T Y , Z :=  .. T m×T
∈R .
   
 ..  .
Toep(um )
gbT −1

Let us clarify the Toep(u) notation. For a vector u ∈ RT , Toep(u) is the T × T lower-triangular
Toeplitz matrix where the first column is equal to u. Later on, we will use the notation Toepa×b (u),
where a, b are positive integers. This is to be interpreted as the upper left a × b section of the
semi-infinite lower-triangular Toeplitz matrix form by treating u as a zero-padded sequence in RN .
Above, we assume the matrix Z T Z is invertible, which we will ensure in our analysis. From
gb0:T −1 , we form the estimated finite impulse response G b r (z) := Pr−1 gbk z −k .
b r for any r ≤ T as G
k=0
The Gaussian output noise assumption means that the error vector gb0:T −1 − g0:T −1 is distributed
N
P(0, σ 2 (Z T Z)−1 ), and hence G
b r − Gr is equal in distribution to the random polynomial Q(z) =
r−1
εk z with ε ∼ N (0, σ Er (Z T Z)−1 ErT ), where Er := Ir 0r×(T −r) ∈ Rr×T . Here, Gr (z) :=
−k 2
 
Pr−1k=0
−k is the length-r FIR truncation of G. Since the covariance matrix will play a critical
k=0 gk z
role in our analysis to follow, we introduce the notation
m
X
Σ(u) := Toep(uk )T Toep(uk ) , (3.1)
k=1

6
where m will be clear from context. We will also use the shorthand notation [M ][r] , to refer to the
r × r matrix Er M ErT for any T × T matrix M .
The roadmap for this section is as follows. In Section 3.1, we characterize the behavior of
the random quantity kQk∞ as a function of the covariance matrix σ 2 Σ(u)−1 and the polynomial
degree r. Next, we study in Section 3.2 the problem of experiment design for choosing the best
inputs u1 , ..., um to minimize the error kQk∞ . Using these results, we give upper bounds for FIR
identification with `p -constrained inputs in Section 3.3. We then combine these results and prove
in Section 3.4 the main results from Theorem 1.1 and Theorem 1.2. Finally, we prove the minimax
risk lower bound from Theorem 1.3 in Section 3.5.

Process noise. Before we begin our analysis, we note that our upper bounds easily extend to
the case where process noise enters the system through the same channel as the input. Specifically,
suppose the input signal is corrupted by ζ ∼ N (0, σn2 IT ) which is independent of the output noise
ξ, and instead of observing Yu,T we observe
−1
e)Tk=0
Yeu,T = (g ∗ u +ξ, u
e := u + ζ .

In this setting, the error vector gb0:T −1 − g0:T −1 of the least-squares estimator on {Yeuk ,T }m
k=1 is
distributed

N (0, Λ) , Λ := (Z T Z)−1 Z T (σn2 ToepT ×T (g)ToepT ×T (g)T + σ 2 IT )Z(Z T Z)−1 .

Since g is the impulse response of a stable system, kToepT ×T (g)k ≤ kGk∞ , and therefore Λ 4
(σn2 kGk2∞ + σ 2 )(Z T Z)−1 . Thus, the upper bounds carry over to this process noise setting with the
variable substitution σ 2 ← σn2 kGk2∞ + σ 2 . The modification to the lower bounds in this setting is
more delicate, and we leave this to future work.

3.1 A concentration result for the error polynomial


We first address the behavior of the error kQk∞ . Our main tool is a discretization result from
Bhaskar et al. [1]:

Lemma 3.1 (Bhaskar et al. [1]). Let Q(z) := r−1 −k , where ε ∈ C. For any N ≥ 4πr,
P
k=0 εk z k
 
4πr
kQk∞ ≤ 1 + max |Q(ej2πk/N )| .
N k=0,...,N −1

Lemma 3.1 immediately reduces controlling the H∞ -norm of a finite-degree polynomial to con-
trolling the maxima of a finite set of points on the torus. Hence, upper bounding the expected
value of kQk∞ and showing concentration is straightforward. Before we state the result, we define
some useful notation which we will use throughout this section. For a z ∈ C, define the vector of
monomials ϕ(z) as

ϕ(z) := (1, z, z 2 , ..., z r−1 ) ∈ Cr , ϕ1 (z) := Re{ϕ(z)} , ϕ2 (z) := Im{ϕ(z)} , (3.2)

where the length r will be implicit from context.

7
Pr−1 −k .
Lemma 3.2. Let ε ∼ N (0, V ) where ε ∈ Rr , and put Q(z) = k=0 εk z Define for ` = 1, 2,

η`2 := sup ϕ` (z)T V ϕ` (z) . (3.3)


z∈T

We have that
√ p
EkQk∞ ≤ 4 2η log(8πr) , (3.4)

where η := max(η1 , η2 ). Furthermore, with probability at least 1 − δ, we have


√ p p
kQk∞ ≤ 4 2η( log(8πr) + log(2/δ)) . (3.5)

Proof. Set N = 4πr and invoke Lemma 3.1 to conclude that

kQk∞ ≤ 2 max |Q(ejk/2r )|


k=0,...,4πr−1

=2 max |hϕ(ejk/2r ), εi|


k=0,...,4πr−1

≤2 max |hϕ1 (ejk/2r ), εi| + 2 max |hϕ2 (ejk/2r ), εi| .


k=0,...,4πr−1 k=0,...,4πr−1

We first prove (3.4) by bounding E maxk=0,...,4πr−1 |hϕ` (ejk/2r ), εi| for ` = 1, 2. For a fixed k, we
have that hϕ` (ejk/2r ), εi ∼ N (0, ϕ` (ejk/2r )T V ϕ` (ejk/2r )). By standard results for expected maxima
of Gaussian random variables, we have
q p
E max |hϕ` (ejk/2r ), εi| ≤ max ϕ` (ejk/2r )T V ϕ` (ejk/2r ) 2 log(8πr)
k=0,...,4πr−1 k=0,...,4πr−1
p
≤ η 2 log(8πr) .

This yields (3.4).


For (3.5), using standard concentration results for suprema of Gaussian processes (see e.g. [3]),
we have that with probability at least 1 − δ,
p
max |hϕ` (ejk/2r ), εi| ≤ E max |hϕ` (ejk/2r ), εi| + η 2 log(1/δ)
k=0,...,4πr−1 k=0,...,4πr−1
p p
≤ η 2 log(8πr) + η 2 log(1/δ) .

The claim (3.5) now follows from a union bound.

Note that when V = I, η 2 ≤ r which recovers the known results from [15] up to constants.
Furthermore, when V is diagonal, η 2 ≤ Tr(V ). We will exploit this result in the sequel.

3.2 Experiment design


We now consider the problem of choosing a set of inputs u ∈ U in order to minimize the expected
error of the residual polynomial. Fixing the number of inputs m, the input constraint set U, and
recalling the definition of the covariance Σ(u) from (3.1), the optimal experiment design problem is

minimize Eε∼N (0,[Σ(u)−1 ][r] ) kQk∞ . (3.6)


u1 ,...,um ∈U

8
In (3.6) and the sequel, if the covariance matrix Σ(u) is not invertible then we assign the function
value +∞. Problem (3.6) is difficult to solve as written because the expected value does not have
a form which is easy to work with computationally. The following design problem provides a good
approximation of (3.6). Let {z1 , ..., zs } ⊆ T denote a grid of s points on T. Consider the problem
minimize max ϕ` (zk )T [Σ(u)−1 ][r] ϕ` (zk ) . (3.7)
u1 ,...,um ∈U 1≤k≤s
`=1,2

The objective (3.7) minimizes the maximum pointwise variance of Q(z) over all points on the grid
{z1 , ..., zs }. If the grid is uniformly spaced and s ≥ 4πr, then by Lemma 3.1 we can interpret (3.7)
as minimizing an upper bound to the objective function in (3.6), since
Eε∼N (0,[Σ(u)−1 ][r] ) kQk∞ ≤ (1 + 4πr/s) E max |hϕ(zk ), εi|
1≤k≤s

p 2
X q
≤ (1 + 4πr/s) 2 log(2s) max ϕ` (zk )T [Σ(u)−1 ][r] ϕ` (zk ) . (3.8)
1≤k≤s
`=1

However, (3.7) is non-convex in the ui ’s. A convex version of the problem can be written by choosing
m0 inputs u1 , ..., um0 ∈ U and solving the semidefinite program (SDP)
m0
X
minimize
m
max ϕ` (zk )T [Σ−1 ][r] ϕ` (zk ) s.t. Σ = λi Toep(ui )T Toep(ui ) , λT 1 = 1 , λ ≥ 0 . (3.9)
λ∈R 0 1≤k≤s
`=1,2 i=1

(3.9) is a convex program and can be solved with any off-the-shelf solver such as MOSEK [22].
We now study two special cases of U to show how input constraints can affect design. We first
observe that when Σ(u) is diagonal, continuing the estimates from (3.8), we have the following
upper bound which holds since kϕ` (z)k∞ ≤ 1,
q
Eε∼N (0,[Σ(u)−1 ][r] ) kQk∞ ≤ (1 + 4πr/s) 2 2 log(2s) Tr([Σ(u)−1 ][r] ) . (3.10)

Even though (3.10) only holds when Σ(u) is diagonal, it motivates us to consider the standard
A-optimal design problem
minimize Tr([Σ(u)−1 ][r] ) . (3.11)
u1 ,...,um ∈U

An advantage of (3.11) versus (3.7) is that the reduced complexity of the objective function allows
us to make statements about optimality when U is an `p -ball. The analogous SDP relaxation of
(3.11), similar to (3.9), is also more efficient to implement in practice for more general U’s.
Let Fp∗ (T, r) denote the optimal value of (3.11) with U = BpT for p ∈ [1, ∞], where BpT denotes
the unit `p -ball in RT . We will always assume T ≥ r. It is not hard to show that Fp∗ (T, r) is finite
and the value is attained (and hence Σ(u) at the optimum is invertible). We will now show the
following statements about (3.11):
(a) When p ∈ [1, 2], the optimal solution is to set ui = e1 , i = 1, ..., m.
(b) When p ∈ (2, ∞], we can solve (3.11) to within a factor of two of optimal by convex program-
ming.
(c) When p = ∞, we can give an exact closed form solution for (3.11) in the special case when
r = 2n with n ≥ 0, T = 2k r with k ≥ 1, and m is a multiple of T .

9
3.2.1 A-optimal design for `p -balls
We first prove a lower bound on the optimal objective value Fp∗ (T, r). To do this, we use the
following linear algebra fact.

Lemma 3.3. Let A be an n × n positive definite matrix. We have that


n
X
Tr(A−1 ) ≥ A−1
ii .
i=1

Proof. This proof is due to Mateusz Wasilewski [31].


By the Schur-Horn theorem, we know that the eigenvalues of A majorize the diagonal of A, i.e.
k
X k
X n
X n
X
Aii ≤ λi (A) , k = 1, ..., n , Aii = λi (A) .
i=1 i=1 i=1 i=1

The function x 7→ 1/x is convex for x > 0. This allows us to apply Karamata’s inequality, from
which the claim immediately follows.

Lemma 3.3 immediately yields an optimization problem which lower bounds the optimal value
Fp∗ .

Lemma 3.4. For all p ∈ [2, ∞], we have that

r
"T −r+i #−1
1 X X 1
Fp∗ (T, r) ≥ inf w` := Dp (T, r) . (3.12)
m w∈RT m
kwkp/2 ≤1,w≥0 i=1 `=1

Proof. We have that

(a) (b) r
X
Fp∗ (T, r) = inf Tr([Σ(u) −1
][r] ) ≥ inf Tr([Σ(u)]−1
[r] ) ≥ inf Σ(u)−1
ii
u1 ,...,um ∈RT u1 ,...,um ∈RT u1 ,...,um ∈RT
i=1
kui kp ≤1 kui kp ≤1 kui kp ≤1
det Σ(u)6=0 det Σ(u)6=0 det Σ(u)6=0
r r
" m T −r+i #−1 r
"T −r+i #−1
(c) X X X X 1
(d) X X
≥ inf Σ(u)−1
ii = inf (uk )2` = inf u2`
u1 ,...,um ∈RT u1 ,...,um ∈RT m u∈RT
i=1 i=1 k=1 `=1 kukp ≤1 i=1 `=1
kui kp ≤1 kui kp ≤1
r
"T −r+i #−1
1
(e) X X
= inf w` .
m w∈RT
kwkp/2 ≤1,w≥0 i=1 `=1

Above, (a) follows since for any positive definite matrix M , [M −1 ][r] < [M ]−1 [r] (see e.g. Equation
(3.2.27) of [14]), combined with the fact that trace is operator monotone, (b) follows from Lemma 3.3,
(c) follows since the infimum is over a larger set, (d) follows by the symmetry of the objective function
with respect to u1 , ..., um , and (e) follows by a simple reparameterization.

10
Before we proceed, a couple of remarks are in order. First, it is clear that D∞ (T, r) = HT −
HT −r = Θ(log(T /(T − r))), where Hn is the n-th Harmonic number. This is achieved by setting
w = 1. Second, it is straightforward to show (e.g. by the KKT conditions) that D2 (T, r) = r, which
is achieved by setting w = e1 . When p ∈ (2, ∞), the optimal u will be an interpolation between
w = 1 and w = e1 . To the best of our knowledge, there is no simple closed form formula for w in
the general case. However, we prove the following upper and lower bound on Dp (T, r).

Lemma 3.5. When p > 2, we have that

(T − r + 1)2/p (HT − HT −r ) ≤ Dp (T, r) ≤ T 2/p (HT − HT −r ) .

Proof. Set p0 = p/2 and let q 0 denote its conjugate pair. The upper bound follows by plugging in
the feasible vector w = 1T /k1T kp0 . The lower bound follows by the chain of inequalities

r
"T −r+i #−1 r
"T −r+i #−1
X X X X
Dp (T, r) = inf w` ≥ inf w`
w∈RT w∈RT
kwkp0 ≤1,w≥0 i=1 `=1 i=1 kwk 0 ≤1,w≥0
p
`=1
 −1
r −r+i
TX r
X   X 1
=  sup w`  =

w∈RT
 k1T −r+i kq0
i=1 `=1 i=1
kwkp0 ≤1,w≥0
r  1−1/p0 T  1−1/p0
X 1 X 1
= =
T −r+i i
i=1 i=T −r+1
T T
X 1 1/p0 0
X 1
= i ≥ (T − r + 1)1/p
i i
i=T −r+1 i=T −r+1
1/p0
= (T − r + 1) (HT − HT −r ) .

Lemma 3.5 implies that if e.g. T = 2r, then Dp (2r, r) = Θ(r2/p log r). We now use the lower
bound on F2∗ (T, r) to show that in the regime p ∈ [1, 2], we can solve the problem (3.11) exactly.

Lemma 3.6. For p ∈ [1, 2], we have that Fp∗ (T, r) = r


m, which is achieved by setting u1 = ... =
um = e1 .

Proof. For u1 = ... = um = e1 , it is immediate to verify that Σ(u) = mI, which yields Tr(Σ(u)−1 ) =
r r ∗ 1 r
m . Hence by Lemma 3.4, for the p = 2 case, m ≥ F2 (T, r) ≥ m D2 (T, r) = m . On the other hand,
since e1 is feasible for all `p -balls and since Bp ⊆ B2 for p ∈ [1, 2], m ≥ Fp∗ (T, r) ≥ F2∗ (T, r) =
T T r
r
m.

The remainder of this section will focus on the regime when p > 2. In this case, Dp (T, r) is a
convex program, and for each fixed T, r, p, one can quickly solve for the optimal value numerically.
Our goal now is to use utilize this numerical solution for approximating the experiment design
problem. Specifically, we show how we can choose m inputs of length T via convex programming
such that Tr([Σ(u)−1 ][r] ) = m
2
Dp (T, r). Recall by Lemma 3.4, that Fp∗ (T, r) ≥ m
1
Dp (T, r). Hence,
we can always recover a solution that is within a factor of two of the optimal solution by convex

11
programming. The loss of a factor of two is due to the real-valued nature of the inputs: if complex-
valued inputs were allowed then our solution would be exact.
Before we can state the result, we need a few auxiliary lemmas and notation. For two vectors
x, y ∈ CT , we let the notation x y = (x1 y1 , ..., xT yT ) ∈ CT denote the vector formed by the
element wise product of the entries. The next lemma states that when we form a (Hermitian)
covariance matrix from weighted sinusoidal inputs, the resulting covariance matrix has a simple
diagonal structure.
Lemma 3.7. Fix positive integers T, n with n ≥ T . Define zi = e2πji/n for i = 0, ..., n − 1. Fix any
weighting w ∈ CT . We have that
n−1 T T −1
!
X X X
∗ 2 2 2
M := Toep(w ϕ(zi )) Toep(w ϕ(zi )) = n diag |wi | , |wi | , ..., |w1 | , (3.13)
i=0 i=1 i=1

Proof. To ease indexing notation, we zero index the coefficients of w in the proof. Define the matrix
−1
Mz := Toep(w ϕ(z))∗ Toep(w ϕ(z)). Let us compute the entries of (Mz )Tk,`=0 for z = ejθ .
Assuming k < `, we have that
 
T −(`+1)
X
(Mz )k,` =  w`−k+m wm  e−jθ(`−k) .
m=0

Now summing over the zi ’s,


 
n−1 T −(`+1) n−1
X X X 2πji
Mk,` = (Mzi )k,` =  w`−k+m wm  e−j n
(`−k)

i=0 m=0 i=0


 
T −(`+1)
X 1 − e−2πj(`−k)
= w`−k+m wm  2πj =0.
m=0 1 − e− n
(`−k)

Above, the last inequality holds because we assumed n ≥ T . Since Mz is Hermitian, (Mz )`,k =
(Mz )k,` = 0, and therefore we have that M is a diagonal matrix. The diagonal entries are simply
given by
T −k+1
X
Mkk = n |wi |2 .
i=1

The claim now follows.

The weighting vector w from Lemma 3.7 will be useful when we consider inputs constrained by
general `p -balls. We now state a lemma which will allow us to work with purely real-valued input
signals by splitting the signals up into the real and imaginary parts, at an expense of a factor of
two. This is where our sub-optimality enters in.
Lemma 3.8. Fix positive integers m, n, T , and suppose that m = 2n and n ≥ T . Also fix any
w ∈ RT . Let zi = e2πji/n for i = 0, ..., n − 1. Define the vectors u0 , ..., un−1 and un , ..., u2n−1 as

ui = Re{w ϕ(zi )} , un+i = Im{w ϕ(zi )} , i = 0, ..., n − 1 .

12
We have that
T T −1
!
m X X
Σ(u) = diag |wi |2 , |wi |2 , ..., |w1 |2 . (3.14)
2
i=1 i=1

Proof. We observe that for any complex matrix X,


Re{X ∗ X} = Re{X}T Re{X} + Im{X}T Im{X} .
Furthermore, Re{Toep(u)} = Toep(Re{u}) and similarly Im{Toep(u)} = Toep(Im{u}). Hence,
Re{Toep(w ϕ(zi ))∗ Toep(w ϕ(zi ))}
T
= Toep(Re{w ϕ(zi )}) Toep(Re{w ϕ(zi )}) + Toep(Im{w ϕ(zi )})T Toep(Im{w ϕ(zi )}) .
Therefore, we have the identity
(n−1 )
X

Σ(u) = Re Toep(w ϕ(zi )) Toep(w ϕ(zi )) .
i=0

The claim now follows from Lemma 3.7.

We are finally ready to define the input ensemble which approximately solves the A-optimal
experimental design problem in the p ∈ (2, ∞] regime. For any p ∈ (2, ∞], define the vector
wp (T, r) ∈ RT as any vector which achieves the infimum in the optimization problem defining
Dp (T, r) before re-parameterization, i.e. wp (T, r) satisfies kwp (T, r)kp ≤ 1 and
r
"T −r+i #−1
X X
(wp (T, r))2` = Dp (T, r) .
i=1 `=1

Recall that wp (T, r) can be solved for numerically via convex optimization.
Lemma 3.9. Fix positive integers m, n, T , and suppose that m = 2n and n ≥ T . Let zi = e2πji/n
for i = 0, ..., n − 1. Define the vectors u0 , ..., un−1 and un , ..., u2n−1 as
ui = Re{wp (T, r) ϕ(zi )} , un+i = Im{wp (T, r) ϕ(zi )} , i = 0, ..., n − 1 .
We have that the covariance matrix Σ(u) satisfies
2
Tr([Σ(u)−1 ][r] ) = Dp (T, r) .
m
Proof. This is a direct consequence of Lemma 3.8. First, we check that both kui kp ≤ 1 and
kun+i kp ≤ 1. To check ui , note that
kui kp = kRe{wp (T, r) ϕ(zi )}kp ≤ kwp (T, r) ϕ(zi )kp = kwp (T, r)kp ≤ 1 .
A similar calculation holds for un+i . Hence from Lemma 3.8, specifically (3.14), we have that the
covariance matrix Σ(u) is diagonal, invertible, and satisfies Tr([Σ(u)−1 ][r] ) = m
2
Dp (T, r).

At this point we are nearly done, since we have shown that (3.11) can be solved to within a
factor of two. We conclude this section by showing that when p = ∞, r = 2n with n ≥ 0, T = 2k r
with k ≥ 1, and m is a multiple of T , we can we can remove the factor of two sub-optimality and
exactly solve (3.11).

13
Hadamard construction. Our construction is based on the Hadamard transform. We will show
that the optimal input vectors for A-design are r orthogonal vectors in {−1, +1}r . We give a
construction for these vectors in the following proposition.
n
Proposition 3.10. For n = 0, 1, 2, ..., there exists 2n vectors in {−1, +1}2 that are orthogonal
n
with respect to the standard `2 inner product on R2 .
Proof. We will induct on n = 0, 1, 2, ..., for which the base case n = 0 holds with u0 = 1. Assume
n
we have 2n orthogonal vectors, in {−1, +1}2 , denoted {uk }. Then, the 2n+1 vectors
n −1 
2[   
uk uk
{ũk } := , ,
uk −uk
k=0

which reside in {−1, +1} 2n+1 , are also orthogonal.


2 −1 n
Lemma 3.11. The constructed orthogonal vectors {uk }k=0 specified in Proposition 3.10 satisfy
n −1
2X
Σ(u) = Toep(uk )T Toep(uk ) = 2n diag(2n , 2n − 1, 2n − 2, . . . , 1) .
k=0

Proof. This follows from straightforward manipulations shown in Appendix A.

Combining Lemma 3.4 and Lemma 3.11 implies that the construction from Proposition 3.10 is
optimal for (3.11).

3.3 Upper bounds on FIR identification with `p -constrained inputs


Combining the results from Section 3.1 and Section 3.2, we now prove an upper bound on length-r
FIR identification when the inputs are `p -constrained.
Lemma 3.12. Fix positive integers m and r, and set T = 2r. Consider the input ensemble u1 =
... = um = e1 when p ∈ [1, 2], or the input ensemble defined in Lemma 3.9 when p ∈ (2, ∞] (with
additional restrictions on m, r in this case). Let Gb r denote the length-r FIR estimate derived from
least-squares, and let Gr denote the length-r FIR truncation of G. With probability at least 1 − δ,
 √ p p 
4 2σ r
p
m log(8πr) + log(2/δ) if p ∈ [1, 2]
kGb r − Gr k∞ ≤
8√2 log 2σ r2/p
q  p p 
m log(8πr) + log(2/δ) if p ∈ (2, ∞] .

Proof. From Lemma 3.2, we have that with probability at least 1 − δ,


√ q p p 
b r − Gr k∞ ≤ 4 2σ Tr([Σ(u)−1 ][r] )
kG log(8πr) + log(2/δ) .

We just need to upper bound the variance term Tr([Σ(u)−1 ][r] ). When p ∈ [1, 2], we know
by Lemma 3.6 that the optimal input ensemble is the impulse response ui = e1 , so we have
Tr([Σ(u)−1 ][r] ) ≤ r/m. On the other hand, when p ∈ (2, ∞], we know by Lemma 3.9 that the
specified u’s satisfy
2 4 4 log 2 2/p
Tr([Σ(u)−1 ][r] ) ≤ Dp (2r, r) ≤ r2/p (H2r − Hr ) ≤ r .
m m m
Above, the upper bound on Dp (2r, r) follows from Lemma 3.5.

14
3.4 Proof of upper bounds for main result
We now prove Theorem 1.1 and Theorem 1.2. Recall that G b r is the estimated length-r FIR approx-
imation to G, and Gr is the true length-r FIR truncation of G. By the triangle inequality, we have
the following error decomposition into an approximation error and an estimation error
kG − G
b r k∞ ≤ kG − Gr k∞ + kGr − G
b k . (3.15)
| {z } | {z r ∞}
Approx. error. Estimation error.

Hence, in order for kG − Gb r k∞ ≤ ε to hold, it suffices to have both the approximation error
kG − Gr k∞ ≤ ε/2 and the estimation error kGr − G b r k∞ ≤ ε/2.
The approximation error is a deterministic quantity, and its behavior is governed by the tail
decay of the impulse response coefficients {gk }k≥0 . In Section 4, we prove in Lemma 4.1 that as
long as r satisfies
 
1 2kG(γz)k∞
r ≥ inf log ,
ρ<γ<1 1 − γ ε(1 − γ)
then we have kG − Gr k∞ ≤ ε/2.
We now turn our attention to the estimation error. For the case when p = 2, Lemma 3.12 tells
us with probability at least 1 − δ, the estimation error satisfies

r 
r p p 
kGr − Gr k∞ ≤ 4 2σ
b log(8πr) + log(2/δ) .
m
Setting the RHS less than ε/2 and solving for m, combining with the inequality (a+b)2 ≤ 2(a2 +b2 ),
we conclude that a sufficient condition on m is
256σ 2 r
    
2
m ≥ max 2
log(8πr) + log ,1 .
ε δ
This concludes the proof of Theorem 1.1.
The proof of Theorem 1.2 is nearly identical. For the case when p = ∞, Lemma 3.12 tells us
with probability at least 1 − δ, the estimation error satisfies
r 
b r k∞ ≤ 8 2 log 2σ 1
p p p 
kGr − G log(8πr) + log(2/δ) .
m
Setting the RHS less than ε/2 and solving for m, we conclude that the sufficient condition is
(1024 log 2)σ 2
    
2
m ≥ max log(8πr) + log , 4r .
ε2 δ
This concludes the proof of Theorem 1.2.

3.5 Proof of lower bounds for main result


We will slightly relax the estimation setting to work with complex inputs and complex systems
which will simplify the proof. The relaxation is as follows. Recall that Hr is the space of length-r
FIR filters,
r−1
( )
X
−k
Hr = G(z) = r−1
gk z : (gk )k=0 ∈ C r
.
k=0

15
We fix a true G ∈ Hr parameterized by g ∈ Cr , and we fix m signals u1 , ..., um ∈ CT with
kui kp ≤ 1 for all i = 1, ..., m and T = 2r. Our estimator is given a realization of the random
variable Y = (Y1 , ..., Ym ) ∈ ⊗m T
i=1 C , where

Y ∼ ⊗m 2
i=1 N (ToepT ×T (g)ui , σ IT ) .

An estimator is any measurable function G b : ⊗m CT −→ Hr .


i=1
Let us clarify what is meant by Y ∼ N (µ, σ 2 I) where µ is a complex vector. We mean that the
random variable Y has distribution equal to Y = (Re(µ) + σw1 ) + j(Im(µ) + σw2 ) where w1 , w2 are
independent N (0, I) random vectors. We can equivalently treat this as the N ((Re(µ), Im(µ)), σ 2 I)
distribution over R2T , which means that for µ1 , µ2 ∈ CT ,
 2
σ 2 Re(µ1 ) σ2
  
2 2 Re(µ2 )
DKL (N (µ1 , σ I), N (µ2 , σ I)) = − = kµ1 − µ2 k2CT , (3.16)
2 Im(µ1 ) Im(µ2 ) R2T 2
where DKL (·, ·) denotes the KL-divergence. In the sequel, we will drop the norm subscripts as they
will be clear from context.
We now establish some preliminaries. First letting zk = e2πjk/r for k = 0, ..., r − 1 and denoting
ϕ(zk ) as the length-r vector of monomials, we define the T × T matrix Mr as
r−1
1X
Mr := ToepT ×T (ϕ(zk ))∗ ToepT ×T (ϕ(zk )) . (3.17)
r
k=0

It turns out this matrix will also play a central role in the lower bound analysis. To aid the analysis,
we have the following identity for Mr .
Proposition 3.13. We have that
Mr = diag(r, ..., r, r, r − 1, ..., 1) . (3.18)
| {z }
r times

Proof. The proof is similar to Lemma 3.7. For z = ejθ define Mz := Toep2r×2r (ϕ(z))∗ Toep2r×2r (ϕ(z)).
When 0 ≤ k < ` ≤ 2r − 1,
(
(min(2r, k + r) − `)e−jθ(`−k) if ` − k < r
(Mz )k,` = .
0 o.w.
Therefore, for k < ` and ` − k < r,
r
X r−1
X
(Mzi )k,` = (min(2r, k + r) − `) e−j2πji(`−k)/r = 0 ,
i=1 i=0
Pr
and when k < ` and ` − k ≥ r, i=1 (Mzi )k,` = 0 trivially. Therefore, Mr is a diagonal matrix. The
diagonal entries are easily computed.

In light of this identity, we will be interested in upper bounds on the quantity


T
X
Ap := sup u∗ Mr u = sup min(k, r)u2k . (3.19)
kukp ≤1 kukp ≤1 k=1

We have the following calculation which establishes the desired upper bounds.

16
Proposition 3.14. When p ∈ [1, 2], we have

Ap = r .

Furthermore, when p ∈ (2, ∞], we have the upper bound

Ap ≤ 4r2(p−1)/p .

Proof. First, we treat the case when p ∈ [1, 2]. By setting u = eT , we have that Ap ≥ r. Now, by
Hölder’s inequality,
T
X
min(k, r)u2k ≤ rkuk22 ≤ r ,
k=1

where the last inequality holds since kuk2 ≤ kukp ≤ 1.


Now, we treat the case when p ∈ (2, ∞]. First, we use the upper bound
T
X
Ap ≤ sup ku2k := Bp .
kukp ≤1 k=1

We now bound Bp by the duality of `p and `q norms. We write kukpp as


T
X T
X
kukpp = p
|uk | = ||uk |2 |p/2 .
k=1 k=1

Making a change of variables wk ← |uk |2 , we have by the duality of `p and `q norms that Bp is
equivalent to

Bp = max h(1, 2, ..., T ), wi = k(1, 2, ..., T )k 1 .


kwkp/2 ≤1 1−2/p

Immediately, we can read off that B2 = k(1, 2, ..., T )k∞ = T and B∞ = k(1, 2, ..., T )k1 = T (T +1)/2.
Let us now handle the case when p ∈ (2, ∞).

Bp = k(1, 2, ..., T )k 1
1−2/p

T
! 1−2/p
X p
= k p−2
k=1
Z T +1 1−2/p
p
≤ x p−2 dx
1
1−2/p
p−2

2(p−1)/(p−2)
≤ (T + 1)
2(p − 1)
1−2/p
p/(p−2) p − 2 2(p−1)/(p−2)

= 2 T
p−1
p − 2 (p−2)/p 2(p−1)/p
 
= T .
p−1

17
 (p−2)/p
p−2
Hence, we have that Bp ≤ p−1 T 2(p−1)/p when p ∈ (2, ∞). Notice that when p & 2 or
p % ∞ we have that this bound approaches T and T 2 , respectively, which is correct up to constants.
 (p−2)/p
The bound now follows since supp>2 p−2
p−1 = 1 and T = 2r.

Finally, we establish a simple lower bound on the H∞ -norm. For what follows, let F ∈ Cr×r be
the un-normalized Discrete Fourier Transform (DFT) matrix (i.e. F −1 = 1r F ∗ ).

Proposition 3.15. For any g ∈ Cr , we have that


r−1
X
sup gk z −k ≥ kF gk∞ . (3.20)
z∈T k=0

Proof. For any r set of points z0 , ..., zr−1 ∈ T with zm = ejθm ,


r−1
X r−1
X
sup gk z −k ≥ max −k
gk zm .
z∈T k=0 m=0,...,r−1
k=0

The result now holds by choosing θm := 2πm/r.

We are now in a position to prove Theorem 1.3. Our approach is based on Fano’s inequality
combined with a reduction to multi-way hypothesis testing, which is one of the standard approaches
in statistics for establishing minimax risk lower bounds (see e.g. [32]).
Recall that F ∈ Cr×r is the un-normalized DFT matrix. Fix a δ > 0, and define r systems
G1 , ..., Gr with parameters gk := δF −1 ek , where ek is the k-th standard basis vector. Using (3.20),
for k 6= `,

kGk − G` k∞ ≥ δkF (F −1 ek − F −1 e` )k∞ = δkek − e` k∞ = δ .

Now let Pgk := ⊗m 2


i=1 N (ToepT ×T (gk )ui , σ IT ). Define the random variable J to be uniform on the
set {1, ..., r}, and the random variable Z with conditional distribution Z|J=k ∼ Pgk . By Fano’s
inequality, we have that
 
R := inf sup EkG b − Gk∞ ≥ δ 1 − I(Z; J) + log 2 , (3.21)
b G∈Hr
G 2 log r

where I(Z; J) is the mutual information between Z and J. The remainder of the proof involves
choosing a δ such that I(Z;J)+log
log r
2
≤ 1/2, from which we conclude R ≥ δ/4.
By joint convexity of relative entropy and the fact that x 7→ Toep(x) is a linear operator, letting
k, ` be independent random indices distributed uniformly on {1, ..., r}, we have that

I(Z; J) ≤ Ek,` [DKL (Pgk , Pg` )]


m
δ2 X
= 2 Ek,` [kToepT ×T (F −1 (ek − e` ))ui k22 ]

i=1
2 m
2δ X
≤ Ek [kToepT ×T (F −1 ek )ui k22 ] ,
σ2
i=1

18
where the first equality holds from (3.16) and the last inequality uses kx + yk2 ≤ 2(kxk2 + kyk2 )
which holds for any norm k·k and all vectors x, y. Now we need to upper bound the expectation
Ek [kToepT ×T (F −1 ek )ui k22 ] for all i = 1, ..., m. If we let f0∗ , ..., fr−1
∗ denote the rows of F , the columns
of F −1 1 1
are r f0 , ..., r fr−1 . Furthermore, each fk = ϕ(zk ) for zk = e2πjk/r . Using the linearity of
x 7→ ToepT ×T (x) and the definition of Mr (3.17) and Ap (3.19),
r
1X ∗
Ek [kToepT ×T (F −1 ek )ui k22 ] = ui ToepT ×T (F −1 ek )∗ ToepT ×T (F −1 ek )ui
r
k=1
r
!
1 ∗ X
= ui ToepT ×T (F −1 ek )∗ ToepT ×T (F −1 ek ) ui
r
k=1
r−1
!
1 ∗ X −1 ∗ −1
= ui ToepT ×T (r fk ) ToepT ×T (r fk ) ui
r
k=0
r−1
!
1 ∗ 1 X
= ui ToepT ×T (fk )∗ ToepT ×T (fk ) ui
r r2
k=0
1
= 2 u∗i Mr ui
r
1
≤ 2 Ap .
r
Therefore, the mutual information I(Z; J) is bounded above by

2δ 2 m
I(Z; J) ≤ Ap .
σ2 r2
q
σ 2 r2 log r
Setting δ = 8 Ap m , and using our assumption that r ≥ 16, it is straightforward to check that

2δ 2 m
I(Z; J) + log 2 σ2 r2 p
A + log 2 1
≤ ≤ .
log r log r 2
Hence we conclude that
s
σ r2 log r
R≥ √ .
8 2 Ap m

By Proposition 3.14, when p ∈ [1, 2] we have Ap = r, in which case


r
σ r log r
R≥ √ .
8 2 m

On the other hand, when p ∈ (2, ∞], we have Ap ≤ 4r2(p−1)/p , in which case
r
σ r2/p log r
R≥ √ .
16 2 m
The concludes the proof of Theorem 1.3.

19
4 Finite Truncation Error Analysis for Stable Systems
In Section 3, we presented both probabilistic guarantees and experiment design for identification of
FIR systems of length r, which were independent of any system specific properties of G. In this
section, we analyze how system behavior affects the necessary truncation length needed to reach a
desired approximation error tolerance.
In order to provide guarantees, we require that the underlying system G is stable with stability
radius ρ ∈ (0, 1). A standard fact states that stability is equivalent to the existence
P∞ of −ka constant
C > 0 such that the tail decay on the coefficients of the Laurent expansion G = k=0 gk z satisfies
the following condition

|gk | ≤ Cρk , k ≥ 1 . (4.1)

Under this assumption, a simple calculation reveals that as long as


 
1 C
r≥ log ,
1−ρ ε(1 − ρ)

then we have that the approximation error kG − Gr k∞ satisfies kG − Gr k∞ ≤ ε.


Unfortunately, without more knowledge of the system at hand, a bound on C in (4.1) is hard
to characterize. However, by slightly relaxing the decay condition (4.1), we are able to derive a
tail bound using system-theoretic ideas. Intuitively, if a system has long transient behavior, then
we expect the constant C in (4.1) to be large, since in order to obtain a small approximation error
one needs to capture the transient behavior. The next lemma shows that the H∞ -norm provides
a sufficient characterization of this transient behavior. This result is due to Goldenshluger and
Zeevi [10]. We include the proof for completeness.

Lemma 4.1 (Lemma 1, Goldenshluger and Zeevi [10]). Let G(z) = ∞ −k be a stable SISO
P
k=0 gk z
LTI system with stability radius ρ ∈ (0, 1). Fix any γ satisfying ρ < γ < 1. Then for all k ≥ 1,

|gk | ≤ kG(γz)k∞ γ k .

Proof. Define the function H(z) := G(z −1 ), which is analytic for all |z| ≤ 1/γ. It is easy to check
that k-th derivative of H(z) evaluated at zero is H (k) (0) = k!gk . Therefore,

k!|gk | = |H (k) (0)| ≤ k!γ k max |H(z)| = k!γ k max |G(z)|


|z|≤1/γ |z|≥γ
k k
= k!γ max |G(γz)| = k!γ kG(γz)k∞ .
|z|≥1

Above, the first inequality is Cauchy’s estimate formula for analytic functions, and the last equality
follows from the maximum modulus principle.

We note that the technique used in Lemma 4.1 of considering the proxy system G(γz) instead
of G(z) directly also appears in [2] in the context of certifying exponential rates of convergence for
linear dynamical systems.

20
5 Robust Controller Design
In Section 3, we described how to obtain a FIR system Gfir with a probabilistic guarantee that
G = Gfir + ∆, where ∆ is an LTI system satisfying k∆k∞ ≤ ε. This description of G naturally lends
itself to many robust control synthesis methods. In this section, we describe the application of one
particular method based on H∞ loop-shaping to a particular unknown plant.
Suppose that G is itself an FIR described with z-transform
149
X
G(z) = |w0 | + |wk |ρk−1 z −k , ρ = 0.95 , (5.1)
k=1

where wk ∼ N (0, 1) are independent Gaussians. In this section, we will detail the design of a
reference tracking controller for G using probabilistic guarantees.

5.1 Computing bounds


While the non-asymptotic bounds of Section 3 and Section 4 give us upper bounds on the error
of noisy FIR approximation, the constant factors in the bounds are not optimal. Hence, strictly
relying on the bounds will cause oversampling by a constant factor of (say) 10 or more. For real
systems, this is extremely undesirable– using the sharpest bound possible is of great practical inter-
est. Fortunately, we can do this via simple Monte–Carlo simulations, which we detail in Section B
the appendix. For now, we describe the results of these simulations.
Our first Monte–Carlo simulation establishes that G satisfies the tail decay specified in (4.1)
with C = 3.9703 and ρ = 0.95. If we truncate G with r = 75, we see that our worst-case bound on
r−1
0.9574
kG − Gr k∞ is kG − Gr k∞ ≤ C ρ1−ρ = 3.9703 × 1−0.95 = 1.7840. In general, assuming we have no
other information about G other than the bounds on C and ρ, this is the sharpest approximation
error bound possible, since for any system with real-valued, all non-negative Fourier coefficients,
the H∞ -norm is simply the sum of the coefficients.
However, if we further assume we know the structure of G as in this case where we know
the form of (5.1), but not the values of wk , we can P further sharper our approximation bound.
Specifically, we know that Eapprox := kG − Gr k∞ = 149 k=75 |wk |ρ
k−1 , and hence we can perform

another Monte–Carlo simulation to estimate the tail probability of this random variable. The result
of our simulation is that P(Eapprox ≤ 0.46) ≥ 0.99. This is a substantial improvement over the
previous bound of Eapprox ≤ 1.7840 which only uses the information contained in the tail decay.
Furthermore, we can use the same trick to sharpest the estimates from Lemma 3.2. P We perform
our final Monte–Carlo simulation, this time on the random variable Enoise := √σN k 74 k=0 ξk z
−k k ,

2
with σ = 1 and ξk ∼ N (0, 1). Note that this corresponds to choosing the inputs to the system as
impulse responses, which we recall from Section 3.2 is optimal under `2 -power constraints. Doing
this simulation, we obtain that P(Enoise ≤ 3.5954) ≥ 0.99.

5.2 Controller design


Our goal is to design a controller K in the setup described in Figure 1, under the assumption that
k∆k∞ ≤ Eapprox + Enoise ≤ 4.0554. This assumption comes from the calculations in Section 5.1.
We note that k∆k∞ /kGk∞ fluctuates between 10-20%, so Gfir is a relatively coarse description of
G. We use standard loop-shaping performance goals (see e.g. [7]). Let Tr7→e and Tn7→e denote the

21

r e y
K Gfir

Figure 1: Closed-loop experimental setup. The goal is to design the controller K. Gfir is estimated from
noisy output data, and k∆k∞ is bounded via Monte–Carlo simulations.

transfer functions from r 7→ e and n 7→ e, respectively. At low frequencies, we would like |Tr7→e |
to have small gain, and at high frequencies we would like |Tr7→e | ≤ 2. Similarly, we would like
|Tn7→e | ≤ 2 at low frequencies and |Tn7→e | small at high frequencies. Of course, we would like these
goals to be achieved, in addition to closed loop stability, for all G = Gfir + ∆.
We proceed in two steps. We first design a controller with the nominal Gfir using H∞ loop-
shaping synthesis (mixsyn in MATLAB). We choose weights to encourage our performance goals on
Tr7→e and Tn7→e to be met. Next, we check that our performance goal is met, in addition to robust
stability. To make the computation easier, we check the performance goals separately. First, it is
well known (see e.g. [7]) that the goal on Tr7→e is met (in addition to robust stability) if the following
holds
k|W1 S| + γ|KS|k∞ < 1 , (5.2)
1
where S = 1+KG fir
and γ = 4.0554. Specifically, under (5.2), the closed loop with K in feedback
with G is stable and achieves the performance guarantee |Tr7→e (z)| ≤ |W11(z)| for every frequency
z ∈ T. On the other hand, to the best of our knowledge no simple expression for the performance
goal on Tn7→e exists, so we resort to a standard structured singular value (SSV) calculation [23].
We generate our controller K via the following MATLAB commands
w_c = 0.07; % Cross-over freq
W1 = makeweight(5000, w_c, .5, 1); % Low-freq disturbance rejection
W2 = 1.5*fir_error_bound; % Robust stability
W3 = makeweight(.5, 3 * w_c, 5000, 1); % High-freq noise insensitivity
P = augw(G_fir, W1, W2, W3);
K = hinfsyn(P);
In Figure 2, we plot the open loop gain L = Gfir K, sensitivity function S = 1/(1 + L), and
complementary sensitivity function T = 1 − S. Here, we see that the cross-over frequency ωc ≈ 0.1.
Next, in Figure 3, we plot the µ values for both the reference tracking objective Tr7→e and the noise
insensitivity objective Tn7→e , and check that both curves lies below 1 for all frequencies. Recall
that this means that G in feedback with K is not only exponentially stable, but also satisfies both
performance guarantees. Finally, in Figure 4, we plot the output y as a function of a noisy square
wave input u, to show the desired reference tracking behavior, on both the closed loop simulation
(with Gfir ), and the actual closed loop behavior (with G). This shows that, while the model Gfir
was a coarse grained description of G with up to 20% relative error, it was faithful enough to allow
for a robust controller design.

22
80 1.0
60 S
40 L 0.8
Magnitude (dB)

20 T
0.6
0
20 0.4
40 0.2 Reference Tracking
60 Noise Insensitivity
80 0.0
3 2 1 0
10 10 10 10 0.0 0.5 1.0 1.5 2.0 2.5 3.0
Frequency (rad/s) Frequency (rad/s)

Figure 2: Loop-shaping curves from the experi- Figure 3: The pointwise frequency µ value for
mental setup of Figure 1. L = Gfir K denotes the both reference tracking Tr7→e and noise insensi-
open-loop gain, S = 1/(1 + L) is the sensitivity tivity Tn7→e . Robust performance is guaranteed
function, and T = 1 − S. as the curve lies below 1 at all frequencies.


Figure 4: Reference tracking behavior of the closed loop with the model G_fir and the actual plant G.

Figure 5: Reference tracking behavior as the FIR truncation length is varied from r = 10, 30, 50, 70.

5.3 Varying truncation length


We next study the effect of truncation length r on controller design. In Figure 5, we assume
the same setup and performance goals as the previous section, but vary the truncation length
r ∈ {10, 30, 50, 70}. We also include the result of a controller design which has full knowledge
of the true system G, which we label as Opt. We see that for r = 10, the resulting controller
unsurprisingly has undesirable overshoot behavior. However, as r increases, the resulting controller
mimics the behavior of Opt quite closely. This plot shows that, at least for reference tracking
behavior, a fairly low-fidelity model suffices. For instance, across different trials, the relative error
of G_fir for r = 30 fluctuated between 15% and 30%, but in many cases r = 30 was able to provide
reasonable reference tracking behavior.
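A rough outline of this sweep is sketched below; here estimate_fir is a hypothetical stand-in for the
identification procedure of Section 5.1 (it is not a function defined in this paper), and the weights
W1 and W3 are reused from the synthesis above.
for r = [10 30 50 70]
    [G_fir_r, err_bound_r] = estimate_fir(u_data, y_data, r);   % hypothetical FIR estimate and H-inf error bound
    P_r = augw(G_fir_r, W1, 1.5 * err_bound_r, W3);             % same weighting scheme as before
    K_r = hinfsyn(P_r);
    % simulate reference tracking with K_r as in Figure 4 ...
end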

6 Conclusion
This paper explored the use of a coarse-grained FIR model estimated from noisy output data for
control design. We showed that sharp bounds on the H∞ error between the true unknown plant and
the estimated FIR filter can be derived using tools from concentration of measure, and the constant
factors on these bounds can be further refined via Monte–Carlo simulation techniques. Finally,
we demonstrated empirically that one can perform controller synthesis using only a coarse-grained
approximation of the true system while meeting certain performance goals.
There are many possible future extensions of our work. We highlight a few ideas below.

End-to-end Guarantees. In Section 5, we studied empirically the performance of the controller
synthesized on our nominal system versus the optimal controller synthesized on the true system.
We would like to push our analysis further and provide guarantees on this performance gap as a
function of our estimation error. This would allow us to provide a true end-to-end guarantee on the
number of samples needed to control an unknown system to a certain degree of performance.

MIMO Systems. While our approach can be generalized to the MIMO case by estimating filters
for each input/output pair separately, we believe that when the MIMO transfer matrix has special
structure (e.g. low rank), it should be possible to couple the estimation procedure to reduce the n^2
factor increase in sample complexity. This is motivated by the vast literature on compressed sensing,
where sparse models embedded in a much larger ambient dimension can be uniquely recovered with
at most a logarithmic factor more samples than the degree of the intrinsic sparsity.

Nonlinear Systems. An extension of these techniques to nonlinear systems is another exciting
direction. One possible idea is to treat a nonlinear system’s Jacobian linearization as the target
unknown system, and fit an FIR model using our techniques by exciting the nonlinear system locally. One
would expect that the controller designed on the FIR would be valid in a neighborhood, and upon
exiting the neighborhood, the process would repeat itself. The challenge here remains to estimate
online the regime for which a controller is valid.

Acknowledgements
We thank Orianna DeMasi for helpful comments and suggestions regarding this manuscript, Kevin
Jamieson for insightful discussions regarding Monte–Carlo simulation, Anders Rantzer for pointing
out references related to this work, Max Simchowitz for guidance regarding the proof of Theorem 1.3,
and Vikas Sindhwani for providing ideas around linear control techniques for nonlinear systems.
RB is supported by the Department of Defense NDSEG Scholarship. AP gratefully acknowledges
generous support from the FANUC Corporation as well as the National Science Foundation under
grant ECCS-1405413. BR is generously supported by NSF award CCF-1359814, ONR awards
N00014-14-1-0024 and N00014-17-1-2191, the DARPA Fundamental Limits of Learning (Fun LoL)
Program, a Sloan Research Fellowship, and a Google Faculty Award.

References
[1] B. N. Bhaskar, G. Tang, and B. Recht. Atomic norm denoising with applications to line spectral
estimation. arXiv:1204.0562, 2012.

[2] R. Boczar, L. Lessard, A. Packard, and B. Recht. Exponential Stability Analysis via Integral
Quadratic Constraints. arXiv:1706.01337, 2017.

[3] S. Boucheron, G. Lugosi, and P. Massart. Concentration Inequalities: A Nonasymptotic Theory
of Independence. 2016.

[4] M. C. Campi and E. Weyer. Finite Sample Properties of System Identification Methods. IEEE
Transactions on Automatic Control, 47(8), 2002.

[5] J. Chen and G. Gu. Control-Oriented System Identification: An H∞ Approach. 2000.

[6] J. Chen and C. N. Nett. The Carathéodory-Fejér Problem and H∞ Identification: A Time
Domain Approach. In Conference on Decision and Control, 1993.

[7] J. Doyle, B. Francis, and A. Tannenbaum. Feedback Control Theory. 1990.

[8] L. Gerencsér. AR(∞) Estimation and Nonparametric Stochastic Complexity. IEEE Transac-
tions on Information Theory, 38(6), 1992.

[9] A. Goldenshluger. Nonparametric Estimation of Transfer Functions: Rates of Convergence and
Adaptation. IEEE Transactions on Information Theory, 44(2), 1998.

[10] A. Goldenshluger and A. Zeevi. Nonasymptotic Bounds for Autoregressive Time Series Mod-
eling. The Annals of Statistics, 29(2), 2001.

[11] M. Hardt, T. Ma, and B. Recht. Gradient Descent Learns Linear Dynamical Systems.
arXiv:1609.05191, 2016.

[12] A. J. Helmicki, C. A. Jacobson, and C. N. Nett. Control Oriented System Identification: A
Worst-Case/Deterministic Approach in H∞. IEEE Transactions on Automatic Control, 36(10), 1991.

[13] H. Hindi, C.-Y. Seong, and S. Boyd. Computing Optimal Uncertainty Models from Frequency
Domain Data. In Conference on Decision and Control, 2002.

[14] R. A. Horn and F. Zhang. Basic Properties of the Schur Complement. In F. Zhang, editor,
The Schur Complement and Its Applications, 2005.

[15] J.-P. Kahane. Some Random Series of Functions. 1994.

[16] L. Ljung. System Identification: Theory for the User. 1999.

[17] L. Ljung and B. Wahlberg. Asymptotic Properties of the Least-Squares Method for Estimating
Transfer Functions and Disturbance Spectra. Advances in Applied Probability, 24(2), 1992.

[18] L. Ljung and Z.-D. Yuan. Asymptotic Properties of Black-Box Identification of Transfer Func-
tions. IEEE Transactions on Automatic Control, 30(6), 1985.

[19] D. J. McDonald, C. R. Shalizi, and M. Schervish. Nonparametric Risk Bounds for Time-Series
Forecasting. Journal of Machine Learning Research, 18, 2017.

[20] D. McFarlane and K. Glover. A Loop Shaping Design Procedure Using H∞ Synthesis. IEEE
Transactions on Automatic Control, 37(6), 1992.

[21] M. W. Meckes. On the spectral norm of a random Toeplitz matrix. Electron. Commun. Probab.,
12, 2007.

[22] MOSEK ApS. The MOSEK optimization toolbox for MATLAB manual. Version 7.1 (Revision
28)., 2015.

[23] A. Packard and J. Doyle. The Complex Structured Singular Value. Automatica, 29(1), 1993.

[24] F. Pukelsheim. Optimal Design of Experiments. 1993.

[25] C. R. Rojas, T. Oomen, H. Hjalmarsson, and B. Wahlberg. Analyzing Iterations in Identification
with Application to Nonparametric H∞-norm Estimation. Automatica, 48(11), 2012.

[26] R. Salem and A. Zygmund. Some properties of trigonometric series whose terms have random
signs. Acta Mathematica, 91, 1954.

[27] P. Shah, B. N. Bhaskar, G. Tang, and B. Recht. Linear System Identification via Atomic Norm
Regularization. In Conference on Decision and Control, 2012.

[28] R. Shibata. Asymptotically Efficient Selection of the Order of the Model for Estimating Pa-
rameters of a Linear Process. The Annals of Statistics, 8(1), 1980.

[29] M. Vidyasagar and R. L. Karandikar. A learning theory approach to system identification and
stochastic adaptive control. Journal of Process Control, 18(3), 2008.

[30] B. Wahlberg, M. B. Syberg, and H. Hjalmarsson. Non-parametric methods for L2 -gain estima-
tion using iterative experiments. Automatica, 46(8), 2010.

[31] M. Wasilewski. Trace matrix inequality. MathOverflow. URL (version: 2017-07-05):
https://mathoverflow.net/q/106233.

[32] B. Yu. Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam: Research Papers in
Probability and Statistics, 1997.

A Proof of Lemma 3.11
We will induct on n, for which the base case n = 1 holds. Denote by u_k^{[i]} the vector obtained by
applying the i-right shift operator to u_k. Now, assume the property holds for n. This implies that

    M_{ij} = Σ_{k=0}^{2^n − 1} u_k^{[i]} · u_k^{[j]} = 0   for i ≠ j ,
    M_{ii} = Σ_{k=0}^{2^n − 1} u_k^{[i]} · u_k^{[i]} = 2^n (2^n − i) .

Now, construct {ũ_k} as before, and note that for l = 0, . . . , 2^n − 1,

    ũ_{2l}^{[i]} = [ 0_{i×1} ; u_l ; (u_l)_{1:2^n−i} ]   and   ũ_{2l+1}^{[i]} = [ 0_{i×1} ; −u_l ; (u_l)_{1:2^n−i} ]   for i ≤ 2^n ,
    ũ_{2l}^{[i]} = [ 0_{2^n×1} ; u_l^{[i−2^n]} ]          and   ũ_{2l+1}^{[i]} = [ 0_{2^n×1} ; −u_l^{[i−2^n]} ]          for i > 2^n ,

where (u_l)_{1:m} denotes the first m entries of u_l and [ · ; · ] denotes vertical concatenation.

Thus, for i ≤ 2^n, when i ≠ j,

    M̃_{ij} = Σ_{k=0}^{2^{n+1} − 1} ũ_k^{[i]} · ũ_k^{[j]}
            = Σ_{l=0}^{2^n − 1} ( ũ_{2l}^{[i]} · ũ_{2l}^{[j]} + ũ_{2l+1}^{[i]} · ũ_{2l+1}^{[j]} )
            = 0 ,

since the block inner products that change sign between the 2l and (2l+1) summands cancel, and the
remaining block inner products are of the form Σ_l u_l^{[i']} · u_l^{[j']} with i' ≠ j', which vanish
by the inductive hypothesis.

Furthermore, for i ≤ 2^n,

    M̃_{ii} = Σ_{k=0}^{2^{n+1} − 1} ũ_k^{[i]} · ũ_k^{[i]}
            = Σ_{l=0}^{2^n − 1} ( ũ_{2l}^{[i]} · ũ_{2l}^{[i]} + ũ_{2l+1}^{[i]} · ũ_{2l+1}^{[i]} )
            = Σ_{l=0}^{2^n − 1} ( u_l^{[i]} · u_l^{[i]} + u_l · u_l + u_l^{[i]} · u_l^{[i]} + u_l · u_l )
            = 2 (2^n (2^n − i)) + 2 (2^n · 2^n)
            = 2^{n+1} (2^{n+1} − i) .

A similar calculation holds for i > 2^n. Thus, by induction, the property holds for all r = 2^n.
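The conclusion of the lemma can also be checked numerically for small r. The sketch below assumes that
the doubling step takes the form ũ_{2l} = [u_l ; u_l] and ũ_{2l+1} = [−u_l ; u_l], which is one
construction consistent with the shift calculations above; it is offered only as an illustration of the
diagonal property, not as a restatement of the construction referenced in the proof.
U = {1};                                    % base case: a single input of length 1
for n = 1:4                                 % build inputs of length 2, 4, 8, 16
    V = cell(1, 2 * numel(U));
    for l = 1:numel(U)
        V{2*l-1} = [ U{l};  U{l}];          % assumed form of the even-indexed inputs
        V{2*l}   = [-U{l};  U{l}];          % assumed form of the odd-indexed inputs
    end
    U = V;
end
r = numel(U{1});
M = zeros(r);
for k = 1:numel(U)
    S = tril(toeplitz(U{k}));               % column j is the (j-1)-right shift of u_k
    M = M + S' * S;                         % accumulate the shifted inner products
end
assert(norm(M - diag(r * (r - (0:r-1))), 'fro') < 1e-9)   % M is diagonal with M_ii = 2^n (2^n - i)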

B Details for Monte–Carlo simulations
In all of our simulations we are faced with the following problem which we describe in some generality.
Let X be a random variable distributed according to the law P. We assume we have access to iid
samples from P. Our goal is to estimate an upper bound on P(X ≥ t) for a fixed t ∈ R.
If the law P admits a density f (·) with respect to the Lebesgue measure, a possible solution
could be to solve this problem exactly by numerically integrating
    ∫_{X(ξ) ≥ t} f(ξ) dξ .

However, numerical integration does not scale favorably with dimension. For our experiments, ξ is
75-dimensional, which is prohibitive for numerical integration.
An alternative approach to numerical integration is to rely on concentration of measure. Let
X_1, ..., X_N be iid copies of X, and let P^N denote the product measure P^N = ⊗_{k=1}^N P. Using a
Chernoff bound and defining F_t := P(X ≥ t), we have

    P^N( (1/N) Σ_{k=1}^N 1{X_k ≥ t} ≤ F_t − ε ) ≤ e^{−N · D(F_t − ε, F_t)} ,                      (B.1)

where D(p, q) = p log(p/q) + (1 − p) log((1 − p)/(1 − q)) is the KL-divergence between two Bernoulli
distributions. Given a δ ∈ (0, 1), define the random variable Q as the solution to the implicit
equation

    N · D( (1/N) Σ_{k=1}^N 1{X_k ≥ t} , Q ) = log(1/δ) .                                          (B.2)

Note that, from a realization of X_1, ..., X_N, the realization of Q from (B.2) can be solved for by
numerical root finding. Plugging the definition of Q back into the Chernoff inequality (B.1), we
conclude that there exists an event E (in the product σ-algebra) such that on E the inequality
F_t ≤ Q holds, and furthermore P^N(E) ≥ 1 − δ. This is the methodology which we use to generate
all our bounds, with δ = 10^{−4}. Hence, the statements of the form “F_t ≤ γ” in Section 5.1 should
be understood as operating under the assumption that our implementation of the simulation chose
a particular realization which is contained in the simulator event E described previously.
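As a concrete illustration, (B.2) can be solved with a few lines of MATLAB; the sample size and
exceedance count below are made up for illustration (δ matches the value used in the text), and fzero
carries out the root finding.
N = 1e5; delta = 1e-4;                        % illustrative sample size; delta as in the text
p_hat = 37 / N;                               % illustrative empirical frequency of {X_k >= t}
kl = @(p, q) p .* log(max(p, realmin) ./ q) + (1 - p) .* log(max(1 - p, realmin) ./ (1 - q));
g  = @(q) N * kl(p_hat, q) - log(1 / delta);  % left-hand side of (B.2) minus log(1/delta)
Q  = fzero(g, [p_hat + eps, 1 - eps]);        % with probability >= 1 - delta, F_t <= Q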
