Balanced Truncation
This lecture introduces balanced truncation for LTI systems: an important projection
model reduction method which delivers high quality reduced models by making an extra
effort in choosing the projection subspaces.
(© A. Megretski, 2004. Version of September 27, 2004.)
ẋ1 = −x1 + x2 + f,
ẋ2 = −2x2,
y = x1 + x2,
can be replaced by
ẋ = −x + f, y = x
without changing its transfer function. In this state space model, with
\[ A = \begin{bmatrix} -1 & 1 \\ 0 & -2 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 1 \end{bmatrix}, \quad D = 0, \]
the controllability matrix Mc = [B AB] satisfies pMc = 0 for p = [0 1], and hence the variable px = x2 represents an uncontrollable
mode. The removal of such a mode can be viewed as a canonical projection model reduction
where the columns of V form a basis in the column range of Mc (which is the same as the
null space of p), and U can be selected quite arbitrarily subject to the usual constraint
UV = I.
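As a quick numerical illustration of the statement pMc = 0, the following minimal sketch builds the controllability matrix for the matrices A and B given above. It assumes NumPy is available; the variable names are chosen here for illustration only.

```python
import numpy as np

# State-space data of the second-order example above.
A = np.array([[-1.0, 1.0],
              [0.0, -2.0]])
B = np.array([[1.0],
              [0.0]])

# Controllability matrix M_c = [B, AB].
Mc = np.hstack([B, A @ B])

# Row vector p that picks out the mode x2.
p = np.array([[0.0, 1.0]])

# p is a left (row) eigenvector of A: p A = -2 p.
left_eig_residual = p @ A + 2.0 * p

# p annihilates M_c, so the mode p x = x2 cannot be excited by f.
print("Mc =\n", Mc)
print("p Mc =", p @ Mc)
print("rank of Mc =", np.linalg.matrix_rank(Mc))
```

The rank of Mc is smaller than the number of states, which is the usual rank test for a system that is not completely controllable.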
Strictly speaking, the example above cannot even be considered as “model reduction”,
as the orders of the original and the projected systems are both equal to 1. A more
interesting situation is represented by the perturbed model
ẋ1 = −x1 + x2 + f,
ẋ2 = −2x2 + εf,
y = x1 + x2,
(same A, C, D but a modified B), where ε > 0 is a parameter. Intuitively, one can expect
that, when ε > 0 is small enough,
ẋ = −x + f, y = x
is still a good reduced model. This expectation can be related to the fact that x2 is
difficult to control by f when ε > 0 is small. One can say that the mode x2 = px, which
corresponds to the left (row) eigenvector of the A matrix (pA = −2p in this case), is
almost uncontrollable, which can be seen directly from the transfer function
\[ G(s) = \frac{s+2+\epsilon}{(s+2)(s+1)} = \frac{1+\epsilon}{s+1} - \frac{\epsilon}{s+2}, \]
where A is an n-by-n Hurwitz matrix (all eigenvalues have negative real part). When
f (t) ≡ 0 for t ≥ 0, the value of the output y(t) at a given moment t is uniquely defined
by x(0), and converges to zero exponentially as t → +∞. Hence the integral
\[ E_o = \int_0^\infty |y(t)|^2 \, dt, \]
measuring the “observable output energy” accumulated in the initial state, is a function
of x(0), i.e. Eo = Eo(x(0)). Moreover, since y(t) is a linear function of x(0), Eo will be a
quadratic form with respect to x(0), i.e. Eo(x(0)) = x(0)′Wo x(0) for a symmetric matrix Wo = Wo′ ≥ 0, the observability Gramian.
i.e. as the minimal output energy which can be observed for t ≥ 0 when px(0) = 1. Note
that infimum over an empty set equals plus infinity, hence E^o(0) = ∞. When the pair
(C, A) is observable, and hence Wo > 0 is invertible, the dual observability measure is
given by
\[ E^o(p) = \frac{1}{p W_o^{-1} p'} \quad \text{for } p \neq 0. \]
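The closed-form expression for E^o(p) can be checked numerically by constructing the state that attains the minimum of x′Wo x subject to px = 1. The sketch below is only an illustration: the matrix Wo and the row vector p are hypothetical values chosen here, and the function names are not from the notes.

```python
import numpy as np

def dual_observability_measure(Wo, p):
    """Closed form 1 / (p Wo^{-1} p') for a row vector p != 0 and Wo > 0."""
    p = np.atleast_2d(p)
    return 1.0 / (p @ np.linalg.solve(Wo, p.T)).item()

def minimizing_state(Wo, p):
    """State x with p x = 1 minimizing x' Wo x: x = Wo^{-1} p' / (p Wo^{-1} p')."""
    p = np.atleast_2d(p)
    z = np.linalg.solve(Wo, p.T)
    return z / (p @ z).item()

# Hypothetical positive definite Gramian and dual state (illustration only).
Wo = np.array([[2.0, 0.5],
               [0.5, 1.0]])
p = np.array([[1.0, -1.0]])

x_star = minimizing_state(Wo, p)
energy_at_x_star = (x_star.T @ Wo @ x_star).item()

# Another state on the constraint p x = 1: d satisfies p d = 0.
d = np.array([[1.0], [1.0]])
x_other = x_star + 0.3 * d
energy_other = (x_other.T @ Wo @ x_other).item()

print("closed form :", dual_observability_measure(Wo, p))
print("at minimizer:", energy_at_x_star)
print("elsewhere   :", energy_other)
```

The minimizer follows from the Lagrange condition Wo x = λp′, which is why x is proportional to Wo⁻¹p′.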
The following theorem is frequently utilized for computing Wo numerically.
implies
\[ x(t)' W_o \, x(t) = \int_t^\infty |Cx(\tau)|^2 \, d\tau. \]
Differentiating the second identity with respect to t at t = 0 yields
2x(0)′ Wo Ax(0) = −|Cx(0)|2
for all x(0) ∈ Rn . Comparing the coefficients on both sides of the quadratic identity
yields (5.3).
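For small n the Lyapunov equation can be solved directly as a linear system in the entries of Wo. The sketch below assumes that (5.3) is the equation Wo A + A′Wo = −C′C, which is the form in which it appears later in these notes. The Kronecker-product formulation is one simple option and costs on the order of n⁶ operations, so it is only suitable for tiny examples. The output map C = [1 0] is a hypothetical choice made here for illustration.

```python
import numpy as np

def observability_gramian(A, C):
    """Solve Wo A + A' Wo = -C' C for a Hurwitz A (small n only)."""
    n = A.shape[0]
    I = np.eye(n)
    # With row-major vectorization, vec(M X N) = (M kron N') vec(X), so
    # vec(A' W) = (A' kron I) vec(W) and vec(W A) = (I kron A') vec(W).
    K = np.kron(A.T, I) + np.kron(I, A.T)
    w = np.linalg.solve(K, -(C.T @ C).reshape(-1))
    W = w.reshape(n, n)
    return 0.5 * (W + W.T)  # symmetrize against round-off

A = np.array([[-1.0, 1.0],
              [0.0, -2.0]])
C = np.array([[1.0, 0.0]])   # hypothetical output y = x1

Wo = observability_gramian(A, C)
residual = Wo @ A + A.T @ Wo + C.T @ C

print("Wo =\n", Wo)
print("largest residual entry:", np.abs(residual).max())
```

For this data the integral of e^{A′t}C′Ce^{At} can also be evaluated by hand, giving Wo = [[1/2, 1/6], [1/6, 1/12]], which is a convenient cross-check of the solver.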
Finding a numerical solution of (5.3) is not easy when n is about 10^4 or larger. In
such situations, Theorem 5.1 can be used as a basis for finding an approximation of Wo.
It is important to understand that the observability measure alone should not be the
only numerical test for choosing which states to eliminate in a model reduction procedure.
Instead, a combination of observability and controllability measures, to be introduced
in the next subsection, should be used.
corresponds to a unique initial condition x(0) in (5.1) for which the corresponding solution
x = x(t) satisfies x(t) → 0 as t → −∞. This solution is given by
\[ x(t) = \int_0^\infty e^{A\tau} B f(t-\tau) \, d\tau, \]
where e^M denotes the matrix exponential of a square matrix M. One can say that input
f = f(t) drives the system state from x(−∞) = 0 to x(0) = X(f(·)).
Let p be a 1-by-n row vector, so that the product px(t) is a dual state of (5.1): a
linear combination of components of the state space vector. The (dual) controllability
measure E^c = E^c(p) is defined as the maximal value of |px(0)|² which can be achieved by
using an input f = f(t) of unit energy:
\[ E^c(p) = \max\{ |pX(f(\cdot))|^2 : \ \|f\| \le 1 \}. \]
The following statement describes some basic properties of these controllability measures.
Theorem 5.2 Assume that A is an n-by-n Hurwitz matrix.
(a) E^c(p) = pWc p′ is a quadratic form with the coefficient matrix
\[ W_c = \int_0^\infty e^{At} B B' e^{A't} \, dt. \]
(b) Wc is the unique solution of the Lyapunov equation AWc + Wc A′ = −BB′.
(c) A given state x0 ∈ Rⁿ is reachable from zero if and only if Ec(x0) > 0 or, equivalently,
the equation Wc p′ = x0 has a solution p′. In this case Ec(x0)⁻¹ = px0 is the
minimum of ‖f(·)‖² subject to X(f(·)) = x0.
Proof To prove (a), note that
\[ \max_{\|f\| \le 1} \int_{-\infty}^{\infty} g(t)' f(t) \, dt = \|g\|, \]
hence
\[ E^c(p) = \int_0^\infty |p e^{At} B|^2 \, dt = p W_c p'. \]
Statement (b) is actually a re-wording of Theorem 5.1, with C replaced by B ′ , A
replaced by A′ , Wo replaced by Wc , and x(0) replaced by p.
To prove (c), consider first the case when equation Wc p′ = x0 has no solution p. Then
there exists a row vector p0 such that p0 Wc = 0 but p0 x0 ≠ 0. Here the equality means
that |p0 X(f(·))|² equals zero for every finite energy signal f = f(t). Since, from the
inequality, |p0 x0|² > 0, the state x0 is not reachable from zero.
Now assume that x0 = Wc p′ for some p. Then ‖f‖² ≥ pWc p′ = px0 whenever x0 =
X(f(·)). On the other hand, for
\[ f(t) = \begin{cases} B' e^{-A't} p', & t \le 0, \\ 0, & t > 0, \end{cases} \]
we have ‖f‖² = px0 and x0 = X(f(·)).
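The minimal-energy input used in the proof of (c) can be checked numerically. The sketch below is a rough illustration under stated assumptions: it uses a hypothetical diagonal matrix A = diag(−1, −2) with B = [1; 1], so that e^{At} is available in closed form, and it approximates the integrals by a trapezoid rule on a truncated time grid. None of these choices come from the notes.

```python
import numpy as np

# Hypothetical diagonal system: A = diag(lam), single input with B = [1, 1]'.
lam = np.array([-1.0, -2.0])
B = np.array([1.0, 1.0])

# Controllability Gramian in closed form: Wc[i, j] = -B[i] B[j] / (lam[i] + lam[j]).
Wc = -np.outer(B, B) / (lam[:, None] + lam[None, :])

x0 = np.array([1.0, 0.0])        # target state (illustrative)
p = np.linalg.solve(Wc, x0)      # solves Wc p' = x0

# Samples of f(-tau) = B' e^{A' tau} p' for tau in [0, T]; T truncates the integral.
tau = np.linspace(0.0, 40.0, 400001)
expAtauB = np.exp(np.outer(tau, lam)) * B   # row k holds e^{A tau_k} B
f = expAtauB @ p

def trapezoid(y, x):
    """Composite trapezoid rule along axis 0 on a uniform grid."""
    h = x[1] - x[0]
    return h * (y.sum(axis=0) - 0.5 * (y[0] + y[-1]))

# X(f) = integral of e^{A tau} B f(-tau) d tau, and the input energy ||f||^2.
reached = trapezoid(expAtauB * f[:, None], tau)
energy = trapezoid(f ** 2, tau)

print("reached state:", reached)
print("input energy :", energy, " p x0 =", p @ x0)
```

Up to quadrature error, the reached state equals x0 and the input energy equals px0, in agreement with the last step of the proof.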
When the pair (A, B) is controllable, and, hence, Wc > 0, the primal controllability
measure Ec = Ec(x0) can be expressed as
\[ E_c(x_0) = \frac{1}{x_0' W_c^{-1} x_0} \quad \text{for } x_0 \neq 0. \]
For controllable and observable systems Wc and Wo are positive definite, and the formulae
can be simplified to
\[ E_{oc}(x_0) = \frac{x_0' W_o x_0}{x_0' W_c^{-1} x_0} \quad (x_0 \neq 0), \qquad E^{oc}(p) = \frac{p W_c p'}{p W_o^{-1} p'} \quad (p \neq 0). \]
For model reduction purposes, we are interested in finding a subspace of primal state
vectors for which the minimum of the joint controllability and observability measure
over all non-zero elements is maximal. A basis in this subspace will yield columns of
a projection matrix V . Similarly, we are interested in finding a subspace of dual state
vectors for which the minimum of the joint controllability and observability measure over
all non-zero elements is maximal. A basis in this subspace will yield rows of a projection
matrix U.
The following theorem can be used in finding such V and U.
Theorem 5.3 Let Wc = Lc Lc′ and Wo = Lo′Lo be the Choleski decompositions of the
controllability and observability Gramians. Let
ρ1 ≥ · · · ≥ ρr > ρr+1 ≥ · · · ≥ ρn ≥ 0
be the ordered eigenvalues of Lc′Wo Lc, let ψ1, . . . , ψr be orthonormal eigenvectors
corresponding to ρ1, . . . , ρr, and let σi = ρi^{1/2}. Then
(a) ρ1 ≥ · · · ≥ ρr are also the eigenvalues of Lo Wc Lo′, and the corresponding normalized
row eigenvectors can be defined by
φi = σi⁻¹ ψi′Lc′Lo′ (i = 1, . . . , r).
(b) The set of all linear combinations of vectors Lc ψi is the only r-dimensional linear
subspace V in Rⁿ such that Eoc(v) ≥ ρr for every v ∈ V.
(c) The set of all linear combinations of row vectors φi Lo is the only r-dimensional
linear subspace U of row n-vectors such that E^oc(u) ≥ ρr for every u ∈ U.
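(d) UV = Ir, where
\[ V = L_c \begin{bmatrix} \psi_1 \sigma_1^{-1/2} & \dots & \psi_r \sigma_r^{-1/2} \end{bmatrix}, \qquad U = \begin{bmatrix} \sigma_1^{-1/2} \phi_1 \\ \vdots \\ \sigma_r^{-1/2} \phi_r \end{bmatrix} L_o. \]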
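The construction of U and V in Theorem 5.3 can be sketched in a few lines. The code below is an illustration under stated assumptions: ψi are taken as orthonormal eigenvectors of Lc′Wo Lc and σi as the square roots of ρi (this is what makes the row vectors φi normalized), the Gramians are obtained from a small Kronecker-product Lyapunov solver, and the three-state system is a hypothetical example chosen here. The function names are not from the notes.

```python
import numpy as np

def lyap(A, Q):
    """Solve A X + X A' = -Q for a Hurwitz A (Kronecker form, small n only)."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(A, I) + np.kron(I, A)
    X = np.linalg.solve(K, -Q.reshape(-1)).reshape(n, n)
    return 0.5 * (X + X.T)

def balancing_projection(Wc, Wo, r):
    """Return U (r-by-n), V (n-by-r) and sigma_1..sigma_r as in Theorem 5.3(d)."""
    Lc = np.linalg.cholesky(Wc)        # Wc = Lc Lc'
    Lo = np.linalg.cholesky(Wo).T      # Wo = Lo' Lo
    rho, psi = np.linalg.eigh(Lc.T @ Wo @ Lc)         # ascending eigenvalues
    rho, psi = rho[::-1][:r], psi[:, ::-1][:, :r]     # keep the r largest
    sigma = np.sqrt(rho)
    phi = (psi.T @ Lc.T @ Lo.T) / sigma[:, None]      # rows phi_i
    V = (Lc @ psi) / np.sqrt(sigma)                   # columns Lc psi_i sigma_i^{-1/2}
    U = (phi / np.sqrt(sigma)[:, None]) @ Lo          # rows sigma_i^{-1/2} phi_i Lo
    return U, V, sigma

# Hypothetical controllable and observable three-state system.
A = np.array([[-1.0, 1.0, 0.0],
              [0.0, -2.0, 1.0],
              [0.0, 0.0, -3.0]])
B = np.array([[0.0], [0.0], [1.0]])
C = np.array([[1.0, 0.0, 0.0]])

Wc = lyap(A, B @ B.T)          # A Wc + Wc A' = -B B'
Wo = lyap(A.T, C.T @ C)        # A' Wo + Wo A = -C' C

U, V, sigma = balancing_projection(Wc, Wo, r=2)
Ar, Br, Cr = U @ A @ V, U @ B, C @ V    # projected (reduced) model

print("U V =\n", U @ V)
print("sigma =", sigma)
```

With this scaling the projected Gramians U Wc U′ and V′Wo V both equal diag(σ1, . . . , σr), which is the sense in which the retained coordinates are balanced.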
5.1.6 Example
Let
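\[ G(s) = \frac{1}{s+1-\epsilon} + \frac{1}{s+1+\epsilon}, \]
where ε > 0 is a small parameter. A state space model is
\[ A = \begin{bmatrix} -1+\epsilon & 0 \\ 0 & -1-\epsilon \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 1 \end{bmatrix}, \quad D = 0. \]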
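For this diagonal realization the Gramians are available in closed form, so the Hankel singular numbers (taken here as the square roots of the eigenvalues of Wc Wo, the standard definition) are easy to evaluate. The sketch below is an illustration only; the value ε = 0.1 is an arbitrary choice and 0 < ε < 1 is assumed so that A is Hurwitz.

```python
import numpy as np

def hankel_singular_values(eps):
    """Hankel singular numbers of G(s) = 1/(s+1-eps) + 1/(s+1+eps), 0 < eps < 1."""
    lam = np.array([-1.0 + eps, -1.0 - eps])   # diagonal of A
    b = np.ones(2)                             # B = [1, 1]'
    c = np.ones(2)                             # C = [1, 1]
    # For diagonal A: W[i, j] = -v[i] v[j] / (lam[i] + lam[j]).
    Wc = -np.outer(b, b) / (lam[:, None] + lam[None, :])
    Wo = -np.outer(c, c) / (lam[:, None] + lam[None, :])
    ev = np.sort(np.linalg.eigvals(Wc @ Wo).real)[::-1]
    return np.sqrt(np.clip(ev, 0.0, None))

eps = 0.1
hsv = hankel_singular_values(eps)
print("Hankel singular numbers:", hsv)
print("sum     :", hsv.sum(), " expected 1/(1 - eps^2) =", 1.0 / (1.0 - eps**2))
print("product :", hsv.prod(), " expected eps^2 / (4 (1 - eps^2)) =",
      eps**2 / (4.0 * (1.0 - eps**2)))
```

Here Wc = Wo, so the singular numbers are the eigenvalues of that common matrix; their sum is its trace 1/(1 − ε²) and their product is its determinant ε²/(4(1 − ε²)). For small ε the second singular number is therefore close to ε²/4.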
Wo ≥ Wo− , Wc ≥ Wc−
are guaranteed. The definition of upper bounds will be more strict: by upper bounds of
Gramians Wo , Wc defined by the Lyapunov equalities
Wo A + A′ Wo = −C ′ C, AWc + Wc A′ = −BB ′ ,
where A is a Hurwitz matrix, we mean solutions Wo+ , Wc+ of the corresponding Lyapunov
inequalities
Wo+ A + A′ Wo+ ≤ −C ′ C, AWc+ + Wc+ A′ ≤ −BB ′ .
These inequalities imply that Wo+ ≥ Wo and Wc+ ≥ Wc, but the converse implication is not
always true.
The following simple observation can be used to produce lower bounds of the Gramians.
Theorem 5.5 Let A be an n-by-n Hurwitz matrix. Let B be an n-by-m matrix. For
s ∈ C with Re(s) > 0 define
\[ a = a(s) = (sI_n - A)^{-1}(\bar{s} I_n + A), \qquad b = b(s) = \sqrt{2\,\mathrm{Re}(s)}\,(sI_n - A)^{-1} B. \]
Then
(a) a is a Schur matrix (all eigenvalues strictly inside the unit disc);
AP + P A′ = −BB ′
P = aP a′ + bb′ ;
\[ P = \lim_{k \to \infty} P_k, \]
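The way such a limit produces lower bounds can be illustrated with a short iteration. The sketch below assumes the recursion P_{k+1} = a P_k a′ + bb′ started from P_0 = 0 (a standard Smith-type iteration; the definition of P_k used in the notes is not reproduced here), a real shift s = 1, and the perturbed second-order model from the beginning of the lecture with the arbitrary value ε = 0.5. The Kronecker-based solver is included only to provide a reference solution.

```python
import numpy as np

def lyap(A, Q):
    """Reference solution of A X + X A' = -Q (Kronecker form, small n only)."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(A, I) + np.kron(I, A)
    X = np.linalg.solve(K, -Q.reshape(-1)).reshape(n, n)
    return 0.5 * (X + X.T)

def smith_iterates(A, B, s=1.0, steps=30):
    """Iterates P_1, ..., P_steps of P <- a P a' + b b' for a real shift s > 0."""
    n = A.shape[0]
    I = np.eye(n)
    R = np.linalg.inv(s * I - A)
    a = R @ (s * I + A)                 # conj(s) = s because s is real
    b = np.sqrt(2.0 * s) * (R @ B)
    P = np.zeros((n, n))
    iterates = []
    for _ in range(steps):
        P = a @ P @ a.T + b @ b.T
        iterates.append(P)
    return a, iterates

eps = 0.5                                # arbitrary illustrative value
A = np.array([[-1.0, 1.0],
              [0.0, -2.0]])
B = np.array([[1.0],
              [eps]])

a, iterates = smith_iterates(A, B)
P_exact = lyap(A, B @ B.T)

print("spectral radius of a:", np.max(np.abs(np.linalg.eigvals(a))))
for k in (1, 2, 5, 30):
    gap = P_exact - iterates[k - 1]
    print(k, "smallest eigenvalue of P - P_k:", np.linalg.eigvalsh(gap).min(),
          " norm:", np.linalg.norm(gap))
```

Each iterate equals a partial sum of the series of terms aʲ bb′ (a′)ʲ, so P − P_k is positive semidefinite and every P_k is a lower bound of the Gramian.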
AQ + QA′ < 0.
AWc + Wc A′ = −BB ′ ,
then
AWc− + Wc− A′ ≈ −BB ′ ,
and hence
A(Wc− + εQ) + (Wc− + εQ)A′ ≤ −BB′
for some ε > 0 (which will, hopefully, be small enough). Then Wc− and Wc− + εQ are a
lower and an upper bound for the controllability Gramian Wc.
Theorem 5.6 Let Wo− = Fo′Fo and Wc− = Fc Fc′ be lower bounds of the observability and
controllability Gramians Wo, Wc of a stable LTI model G = G(s). Let
σ1− ≥ σ2− ≥ · · · ≥ 0
be the ordered singular numbers of Fo Fc. Then σk− is a lower bound for the k-th Hankel
singular number σk = σk(G) of the system, and
‖G − Ĝ‖∞ ≥ σk−
for every LTI system Ĝ of order less than k.
Proof Let Zk denote the subspace spanned by the k dominant eigenvectors of Fc′Wo− Fc,
i.e.
|Fo Fc z| ≥ σk− |z| ∀ z ∈ Zk.
Since Wc ≥ Fc Fc′, every vector Fc z lies in the range of Wc, and q′Fc z ≤ |z|² whenever
Fc z = Wc q. Hence every state x(0) = Fc z can be reached from x(−∞) = 0 using a minimal
energy input f = fz(t) (depending linearly on z) of energy not exceeding |z|². On the
other hand, every state x(0) = Fc z with z ∈ Zk will produce at least |Fo Fc z|² ≥ (σk−)²|z|²
of output energy. Since Ĝ is a linear system of order less than k, there exists at least one
non-zero z ∈ Zk for which the input f = fz(t) produces a zero state x̂(0) = 0 at zero time.
Then, assuming the input is zero for t > 0, the error output energy is at least (σk−)²|z|².
Since the testing input energy is not larger than |z|² > 0, this yields an energy gain of
(σk−)², which means that ‖G − Ĝ‖∞ ≥ σk−.
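The lower bounds of Theorem 5.6 can be illustrated numerically. The sketch below is a rough illustration under stated assumptions: the low-rank factors Fc and Fo are built from a truncated Smith-type sum with a real shift s = 1 (one possible way of producing Gramian lower bounds, chosen here), the three-state system is hypothetical, and the exact Hankel singular numbers are computed from a small Kronecker-based Lyapunov solver for comparison.

```python
import numpy as np

def lyap(A, Q):
    """Solve A X + X A' = -Q for a Hurwitz A (Kronecker form, small n only)."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(A, I) + np.kron(I, A)
    X = np.linalg.solve(K, -Q.reshape(-1)).reshape(n, n)
    return 0.5 * (X + X.T)

def low_rank_factor(A, B, s=1.0, steps=2):
    """Columns [b, a b, ..., a^(steps-1) b]; F F' is a lower bound of the Gramian."""
    n = A.shape[0]
    I = np.eye(n)
    R = np.linalg.inv(s * I - A)
    a = R @ (s * I + A)
    b = np.sqrt(2.0 * s) * (R @ B)
    cols = [b]
    for _ in range(steps - 1):
        cols.append(a @ cols[-1])
    return np.hstack(cols)

# Hypothetical three-state system.
A = np.array([[-1.0, 1.0, 0.0],
              [0.0, -2.0, 1.0],
              [0.0, 0.0, -3.0]])
B = np.array([[0.0], [0.0], [1.0]])
C = np.array([[1.0, 0.0, 0.0]])

Fc = low_rank_factor(A, B)             # Wc- = Fc Fc'
Fo = low_rank_factor(A.T, C.T).T       # Wo- = Fo' Fo
lower = np.linalg.svd(Fo @ Fc, compute_uv=False)

Wc = lyap(A, B @ B.T)
Wo = lyap(A.T, C.T @ C)
ev = np.sort(np.linalg.eigvals(Wc @ Wo).real)[::-1]
hsv = np.sqrt(np.clip(ev, 0.0, None))

for k, (lo, exact) in enumerate(zip(lower, hsv), start=1):
    print(f"k = {k}: lower bound {lo:.6f} <= sigma_k {exact:.6f}")
```

Only the factors Fc and Fo are needed to evaluate the bounds, which is what makes this kind of estimate usable when the full Gramians are too large to compute.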
Theorem 5.7 Let σ1 > σ2 > · · · > σh be the ordered set of different Hankel singular
numbers of a stable LTI system G. Let Ĝ be the reduced model obtained by removing the
states corresponding to singular numbers not larger than σk from a balanced realization of
G. Then Ĝ is stable, and satisfies
\[ \|G - \hat{G}\|_\infty \le 2 \sum_{i=k}^{h} \sigma_i. \]
The utility of Theorem 5.7 in practical calculations of H-Infinity norms of model
reduction errors is questionable: an exact calculation of the H-Infinity norm is possible at
about the same cost, and the upper bound itself can be quite conservative. Nevertheless,
the theorem provides an important reassuring insight into the potential of balanced
truncation: since the singular numbers of exponentially stable LTI systems decay
exponentially, the upper bound of Theorem 5.7 is not expected to be much larger than the
lower bound.
For example, for a system with singular numbers σi = 2^{−i}, a kth order reduced model
cannot have quality better than 2^{−k−1}, and exact balanced truncation is guaranteed to
provide quality of at least 2^{−k+1}.
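The arithmetic behind this example can be reproduced in a few lines. The sketch below assumes distinct singular numbers sorted in decreasing order and truncates the infinite sequence σi = 2^{−i} after 40 terms, which is an approximation introduced here; the function name is illustrative.

```python
def truncation_error_bounds(sigmas, k):
    """Bounds for a k-th order reduced model, given distinct Hankel singular
    numbers sorted in decreasing order.

    lower: sigma_{k+1}, which no model of order k can beat;
    upper: 2 * (sigma_{k+1} + sigma_{k+2} + ...), guaranteed by balanced truncation.
    """
    tail = sigmas[k:]
    return tail[0], 2.0 * sum(tail)

# sigma_i = 2^{-i} for i = 1, ..., 40 (finite truncation of the sequence).
sigmas = [2.0 ** -i for i in range(1, 41)]

for k in (1, 3, 5):
    lower, upper = truncation_error_bounds(sigmas, k)
    print(f"k = {k}: lower = {lower:.6g} (2^-(k+1) = {2.0 ** -(k + 1):.6g}), "
          f"upper = {upper:.6g} (2^-(k-1) = {2.0 ** -(k - 1):.6g})")
```

The ratio of the upper bound to the lower bound stays close to 4 for every k, which is the sense in which the two bounds are not far apart for geometrically decaying singular numbers.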
The proof of Theorem 5.7 is based on estimating the quality of balanced truncation
in the case when only the states of a balanced realization corresponding to the smallest
Hankel singular number are removed, which is done in the following technical lemma.
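Lemma 5.1 Let W = W′ > 0 be a positive definite symmetric n-by-n matrix satisfying
the Lyapunov equalities
W A + A′W = −C′C, AW + W A′ = −BB′. (5.5)
Assume that W, A, B, C are partitioned as
\[ W = \begin{bmatrix} P & 0 \\ 0 & \sigma I_r \end{bmatrix}, \quad A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \quad B = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix}, \quad C = \begin{bmatrix} C_1 & C_2 \end{bmatrix}, \]
where A22 is an r-by-r matrix, and matrices B2, C2′ have r rows. Then
(a) the transfer matrices G(s) = C(sI − A)⁻¹B and G1(s) = C1(sI − A11)⁻¹B1
are stable;
(b) the Lyapunov equalities P A11 + A11′P = −C1′C1 and A11 P + P A11′ = −B1 B1′
are satisfied;
(c) ‖G − G1‖∞ ≤ 2σ.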
Proof It is sufficient to consider the case when the dimension m of f = f (t) equals the
dimension k of y = y(t). (If m < k, add zero columns to B, if m > k, add zero rows to
C.) First note that re-writing (5.5) in terms of the blocks Aik , Bi , Ci yields
P A11 + A′11 P = −C1′ C1 , (5.6)
P A12 + σA′21 = −C1′ C2 , (5.7)
σ(A22 + A′22 ) = −C2′ C2 , (5.8)
A11 P + P A′11 = −B1 B1′ , (5.9)
σA12 + P A′21 = −B1 B2′ , (5.10)
σ(A22 + A′22 ) = −B2 B2′ . (5.11)
Note that (5.8) together with (5.11) implies that C2′ = B2 θ for some unitary matrix θ. Also,
(5.6) and (5.9) prove (b).
To prove (a), note that for every complex eigenvector v ≠ 0 of A, Av = sv for some
s ∈ C, multiplication of the first equation in (5.5) by v ′ on the left and by v on the right
yields
2Re(s)v ′ W v = −|Cv|2 .
Hence either Re(s) < 0 or Re(s) = 0 and Cv = 0. Hence all unstable modes of A are
unobservable, and G = G(s) has no unstable poles. The same proof applies to G1 , since
A11 satisfies similar Lyapunov equations.
To prove (c), consider the following state space model of the error system G − G1 :
ẋ1 = A11 x1 + A12 x2 + B1 f,
ẋ2 = A21 x1 + A22 x2 + B2 f,
ẋ3 = A11 x3 + B1 f,
e = C1 x1 + C2 x2 − C1 x3.
It would be sufficient to find a positive definite quadratic form V(x) = x′Hx such that
\[ \psi(t) = 4\sigma^2 |f(t)|^2 - |e(t)|^2 - \frac{dV(x(t))}{dt} \ge 0 \]
for all solutions of system equations. Indeed, such a Lyapunov function V can be readily
presented, though there is no easy way to describe the intuitive meaning of its form:
\[ V(x) = \sigma^2 (x_1 + x_3)' P^{-1} (x_1 + x_3) + (x_1 - x_3)' P (x_1 - x_3) + 2\sigma |x_2|^2. \]
To streamline the derivation, introduce the shortcut notation
\[ z = x_1 + x_3, \quad \Delta = x_1 - x_3, \quad \delta = C_1 \Delta, \quad u = \sigma^{-1} B_2' x_2, \quad q = B_1' P^{-1} z. \]