Infinite Dimensional QR Iteration
Numerische Mathematik
https://doi.org/10.1007/s00211-019-01047-5
Received: 8 June 2018 / Revised: 11 April 2019 / Published online: 18 May 2019
© The Author(s) 2019
Abstract
Spectral computations of infinite-dimensional operators are notoriously difficult, yet
ubiquitous in the sciences. Indeed, despite more than half a century of research, it is still
unknown which classes of operators allow for the computation of spectra and eigen-
vectors with convergence rates and error control. Recent progress in classifying the
difficulty of spectral problems into complexity hierarchies has revealed that the most
difficult spectral problems are so hard that one needs three limits in the computation,
and no convergence rates nor error control is possible. This begs the question: which
classes of operators allow for computations with convergence rates and error control?
In this paper, we address this basic question, and the algorithm used is an infinite-
dimensional version of the QR algorithm. Indeed, we generalise the QR algorithm
to infinite-dimensional operators. We prove that not only is the algorithm executable
on a finite machine, but one can also recover the extremal parts of the spectrum and
corresponding eigenvectors, with convergence rates and error control. This allows
for new classification results in the hierarchy of computational problems that existing
algorithms have not been able to capture. The algorithm and convergence theorems are
demonstrated on a wealth of examples with comparisons to standard approaches (that
are notorious for providing false solutions). We also find that in some cases the IQR
algorithm performs better than predicted by theory and make conjectures for future
study.
1 Introduction
M. J. Colbrook · A. C. Hansen
Corresponding author: Matthew J. Colbrook, [email protected]
Ideally, one seeks a sequence of algorithms $\Gamma_n$ such that
$$\Gamma_n(T) \longrightarrow \sigma(T), \qquad n \to \infty, \tag{1.1}$$
preferably with some form of error control of the convergence. As this philosophy
forms the basics of numerical analysis, it naturally permeates the classical literature
on the computational spectral problem. However, as is shown in [8,9,38], an algorithm
satisfying (1.1) is impossible even for the class of self-adjoint operators. Indeed, in the
general case, the best possible alternative is an algorithm depending on three indices $n_1, n_2, n_3$ such that
$$\lim_{n_3\to\infty}\lim_{n_2\to\infty}\lim_{n_1\to\infty}\Gamma_{n_3,n_2,n_1}(T) = \sigma(T).$$
In fact, any algorithm with fewer than three limits will fail on the general class of
operators. Moreover, no error control nor convergence rate on any of the limits is possible, since any such error control would reduce the number of limits needed. However, for the self-adjoint and normal cases, two limits suffice in order to recover the
spectrum. This phenomenon implies that the only way to characterise the computa-
tional spectral problem is through a hierarchy classifying the difficulty of computing
spectra of different subclasses of operators. This is the motivation behind the SCI
hierarchy, which also covers general numerical analysis problems. Indeed, the SCI
hierarchy is closely related to Smale's question on the existence of a purely iterative generally convergent algorithm for polynomial zero finding [69]. As demonstrated by
McMullen [49,50] and Doyle and McMullen [30], this is a case where several limits
are needed in the computation, and their results become special cases of classification
in the SCI hierarchy [8,9].
Informally, the SCI hierarchy is characterised as follows (see the “Appendix 1” for
a more detailed summary describing the SCI hierarchy).
$\Delta_0$: The set of problems that can be computed in finite time; the SCI $= 0$.
$\Delta_1$: The set of problems that can be computed using one limit (the SCI $= 1$); however, one has error control and one knows an error bound that tends to zero as the algorithm progresses.
$\Delta_2$: The set of problems that can be computed using one limit (the SCI $= 1$), but error control may not be possible.
$\Delta_{m+1}$: For $m \in \mathbb{N}$, the set of problems that can be computed by using $m$ limits; the SCI $\le m$.
The class $\Delta_1$ is of course a highly desired class; however, most spectral problems are much higher up in the hierarchy. For example, we have the following known classifications [8,9,38].
Here, the notation $\backslash$ indicates the standard set minus. Note that the SCI hierarchy can be refined. We will not consider the full generalisation in the higher part of the hierarchy in this paper, but recall the class $\Sigma_1$ [24]. This class consists of problems computable in one limit whose outputs satisfy the one-sided error control
$$\Gamma_n(T) \subset \sigma(T) + B_{2^{-n}}(0),$$
where $B_\epsilon(0)$ denotes the closed $\epsilon$-ball around the origin.
The main contributions of the paper can be summarised as follows: new convergence results, algorithmic results (the IQR algorithm can be implemented), classification results in the SCI hierarchy, and numerical examples.
(1) Convergence results We provide new convergence theorems for the IQR algo-
rithm with convergence rates and error control. The results include eigenvalues,
eigenvectors and invariant subspaces.
(2) Algorithmic implementation We prove that for infinite matrices with finitely many
non-zero entries in each column, it is possible to implement the IQR algorithm
exactly (on a finite machine) as if one had an infinite computer at one’s disposal.
This can be extended to implementing the IQR algorithm with error control for
general invertible operators.
(3) SCI hierarchy classifications As a result of (1) and (2), we provide new classification results for the SCI hierarchy. In particular, the convergence properties of the IQR algorithm capture key structures that allow for sharp $\Sigma_1$ classification of the problem of computing extremal points in the spectrum. Moreover, we establish sharp $\Sigma_1$ classification of the problem of computing spectra of subclasses of compact operators.
(4) Numerical examples Finally, we demonstrate the IQR algorithm and the proven
convergence results on a variety of difficult problems in practical computation,
illustrating how the IQR algorithm is much more than a theoretical concept.
Moreover, the examples demonstrate that the IQR algorithm performs much
better than predicted by our theory, working on much larger classes of operators.
Hence, we are left with many open problems on the theoretical understanding of
the potential and limitations of this algorithm. The computational experiments
include examples from
(i) Toeplitz/Laurent operators and their perturbations,
(ii) $\mathcal{PT}$-symmetry in quantum mechanics,
(iii) Hopping sign model in sparse neural networks,
(iv) NSA Anderson model in superconductors.
Our results connect to many different approaches in the vast literature on spectral
computation in infinite dimensions. The infinite-dimensional computational spectral
problem is very different from the finite-dimensional computational eigenvalue prob-
lem, and even though the IQR algorithm is inspired by the finite-dimensional version,
this paper solely focuses on the infinite-dimensional problem. Thus, the paper is aimed
at the analysis and numerical analysis audience focusing on infinite-dimensional prob-
lems rather than the finite-dimensional numerical linear algebra discipline.
Finite sections The IQR algorithm provides an alternative to the standard finite
section method in several cases where it fails. Whereas the finite section method
would extract a finite section from the infinite matrix and then apply, for example,
the finite-dimensional QR algorithm, the IQR algorithm first performs the infinite
QR iterations and then extracts a finite section. In general, these two processes do
not commute. The finite section method (or any derivative of it) cannot work in
general because of the general classification results in the SCI hierarchy mentioned
in Sect. 1. Typically, it may provide false solutions. However, in the cases where
it converges, it provides invaluable $\Delta_2$ classifications in the SCI hierarchy. The
finite section method has often been viewed in connection with Toeplitz theory
and the reader may want to consult the work by Böttcher [14,15], Böttcher and
Silberman [18], Böttcher et al. [16], Brunner et al. [22], Hagen et al. [35], Lind-
ner [44], Marletta [46] and Marletta and Scheichl [47]. From the operator algebra
point of view, the work of Arveson [5–7] has been influential as well as the work
of Brown [21].
Infinite-dimensional Toda flow Deift et al. [28] provided the first results on the
IQR algorithm in connection with Toda flows with infinitely many variables. Their
results are purely functional analytic and do not take implementation and com-
putability issues into account. However, these results provide the fundamentals
of the IQR algorithm. In [36] these results were expanded with a convergence
result for eigenvectors corresponding to eigenvalues outside the essential numeri-
cal range for normal operators. Yet, this paper did not consider convergence rates,
actual numerical calculation nor any classification results.
Infinite-dimensional QL algorithm Olver, Townsend and Webb have provided
a practical framework for infinite-dimensional linear algebra and foundational
results on computations with infinite data structures [53–56,73]. This includes
efficient codes as well as theoretical results. The infinite-dimensional QL (IQL)
algorithm is an important part of this program. The IQL algorithm is rather differ-
ent from the IQR algorithm, although they are similar in spirit. In particular, both
the implementation and the convergence results are somewhat contrasting.
Infinite-dimensional spectral computation: The results in this paper follow in the
long tradition of infinite-dimensional spectral computations. This field contains
a vast literature that spans more than half a century, and the references that we
have cited in the first paragraph of Sect. 1 represent a small sample. However, we
would like to highlight the recent work by Bögli et al. [13] who were able to com-
putationally confirm, with absolute certainty, a conjecture on a certain oscillatory
behaviour of higher auto-ionizing resonances of atoms. Note that problems that are classified as $\Sigma_1$ or $\Pi_1$ in the SCI hierarchy may allow for computer-assisted proofs.
Here we briefly recall some definitions used in the paper. We will consider the canonical separable Hilbert space $\mathcal{H} = l^2(\mathbb{N})$ (the set of square summable sequences). Moreover, we write $\mathcal{B}(\mathcal{H})$ for the set of bounded operators on $\mathcal{H}$. For orthogonal projections $E, F$, we will write $E \le F$ if the range of $E$ is a subspace of the range of $F$. We denote the canonical orthonormal basis of $\mathcal{H}$ by $\{e_j\}_{j\in\mathbb{N}}$, and if $\xi \in \mathcal{H}$ we write $\xi(j) = \langle \xi, e_j \rangle$.
We write
$$T_n \xrightarrow{\mathrm{SOT}} T, \qquad T_n \xrightarrow{\mathrm{WOT}} T$$
to mean convergence in the strong and weak operator topology respectively. The
spectrum of T ∈ B(H) will be denoted by σ (T ), and σd (T ) denotes the set of isolated
eigenvalues with finite multiplicity (the discrete spectrum).
In connection with the spectrum, we need to recall some definitions which will
appear in the statement of our theorems. We recall that, for $T \in \mathcal{B}(\mathcal{H})$, the essential spectrum¹ and the essential spectral radius are given by
$$\sigma_{\mathrm{ess}}(T) = \{\lambda \in \mathbb{C} : T - \lambda I \text{ is not Fredholm}\}, \qquad r_{\mathrm{ess}}(T) = \sup\{|z| : z \in \sigma_{\mathrm{ess}}(T)\}.$$
Moreover, the numerical range and the essential numerical range of $T$ are defined by
$$W(T) = \{\langle T\xi, \xi\rangle : \|\xi\| = 1\}, \qquad W_e(T) = \bigcap_{K \text{ compact}} \overline{W(T + K)},$$
where $d(\lambda, \Omega) = \inf_{\rho \in \Omega} |\rho - \lambda|$ denotes the distance from $\lambda$ to a set $\Omega$. We also recall a generalisation of the spectrum, known as the pseudospectrum. Indeed, for $\epsilon > 0$ define the $\epsilon$-pseudospectrum as
$$\sigma_\epsilon(T) = \left\{ z \in \mathbb{C} : \|(T - zI)^{-1}\| \ge \epsilon^{-1} \right\},$$
where we interpret $\|S^{-1}\|$ as $+\infty$ if $S$ does not have a bounded inverse. This is easier to compute than the spectrum, converges in the Hausdorff metric to the spectrum as $\epsilon \downarrow 0$, and gives an indication of the instability of the spectrum of $T$. We shall use it as a comparison for the IQR algorithm and as a means to detect spectral pollution for finite section methods.
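For a finite matrix, this definition can be evaluated directly from the smallest singular value of $T - zI$, since $\|(T - zI)^{-1}\| = 1/\sigma_{\min}(T - zI)$. The following sketch (assuming NumPy; the function name and grid are ours, not from the paper) flags the grid points belonging to $\sigma_\epsilon$ for a Jordan block, a standard example whose pseudospectra are far larger than its spectrum $\{0\}$:

```python
import numpy as np

def pseudospectrum_indicator(T, grid, eps):
    """Boolean mask over `grid`: True where sigma_min(T - z I) <= eps,
    i.e. where ||(T - z I)^{-1}|| >= 1/eps."""
    n = T.shape[0]
    mask = np.zeros(grid.shape, dtype=bool)
    for idx, z in np.ndenumerate(grid):
        smin = np.linalg.svd(T - z * np.eye(n), compute_uv=False)[-1]
        mask[idx] = smin <= eps
    return mask

# An 8x8 nilpotent Jordan block: highly non-normal, spectrum is {0}.
J = np.diag(np.ones(7), k=1)
xs = np.linspace(-1, 1, 41)
grid = xs[None, :] + 1j * xs[:, None]   # grid[i, j] = xs[j] + i*xs[i]
mask = pseudospectrum_indicator(J, grid, eps=0.1)
```

For non-normal operators such as this one, the mask is True on a region much larger than a small neighbourhood of the origin, illustrating the spectral instability mentioned above.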
Finally, we need a notion of convergence of subspaces. We follow the notation in
[41]. Let $M \subset \mathcal{B}$ and $N \subset \mathcal{B}$ be two non-trivial closed subspaces of a Banach space $\mathcal{B}$. The distance between them is defined by
$$\delta(M, N) = \sup\{\operatorname{dist}(\xi, N) : \xi \in M,\ \|\xi\| = 1\}, \qquad \hat\delta(M, N) = \max\{\delta(M, N), \delta(N, M)\}.$$
1 Of course in the case of non-normal T there are different definitions of the essential spectrum. However,
these differences will not matter regarding the results of this paper.
Given subspaces $M$ and $\{M_k\}$ such that $\hat\delta(M_k, M) \to 0$ as $k \to \infty$, we will use the notation $M_k \to M$. If we replace $\mathcal{B}$ with a Hilbert space $\mathcal{H}$, we can express $\delta$ and $\hat\delta$ conveniently in terms of projections and operator norms. In particular, if $E$ and $F$ are the orthogonal projections onto subspaces $M \subset \mathcal{H}$ and $N \subset \mathcal{H}$ respectively, then
$$\delta(M, N) = \|(I - F)E\|, \qquad \hat\delta(M, N) = \|E - F\|.$$
This allows us to extend the definition to allow the trivial subspace {0} and gives
rise to a metric on the set of all closed subspaces of H (first introduced by Krein
and Krasnoselski in [42]). We also define the (maximal) subspace angle, φ(M, N ) ∈
[0, π/2], between M and N by
sin φ(M, N ) = δ̂(M, N ). (1.4)
Finally, we will use two further well-known properties in the Hilbert space setting.
First, if $M$ and $N$ are both finite $l$-dimensional subspaces, then
$$\delta(M, N) \le l^{1/2}\, \delta(N, M). \tag{1.5}$$
Second, suppose that
$$M = \bigoplus_{j=1}^n M_j, \qquad N^{(k)} = N_1^{(k)} + \cdots + N_n^{(k)},$$
where the $N_j^{(k)}$ need not be orthogonal. Then a simple application of Hölder's inequality yields
$$\delta(M, N^{(k)}) \le \left( \sum_{j=1}^n \delta(M_j, N_j^{(k)})^2 \right)^{1/2}, \tag{1.6}$$
which shows that if the dimensions of $M_j$ and $N_j^{(k)}$ are finite and equal, then to prove convergence $N^{(k)} \to M$ we only need to prove that $\delta(M_j, N_j^{(k)}) \to 0$ as $k \to \infty$.
For further properties (including other notions of distances between subspaces) and
a discussion on two projections theory, we refer the reader to the excellent article of
Böttcher and Spitkovsky [19].
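In a Hilbert space these quantities are straightforward to compute numerically via the projection formula $\hat\delta(M, N) = \|E - F\|$. A minimal sketch (assuming NumPy; the helper name is ours): for two lines in $\mathbb{R}^2$ at angle $\theta$, the gap equals $\sin\theta$, in accordance with (1.4):

```python
import numpy as np

def proj(basis):
    """Orthogonal projection onto the column span of `basis`."""
    Q, _ = np.linalg.qr(basis)
    return Q @ Q.conj().T

# Two lines in R^2 at angle theta: the gap between them is sin(theta).
theta = 0.3
M = np.array([[1.0], [0.0]])
N = np.array([[np.cos(theta)], [np.sin(theta)]])
E, F = proj(M), proj(N)
gap = np.linalg.norm(E - F, 2)      # delta-hat(M, N), the spectral norm of E - F
angle = np.arcsin(min(gap, 1.0))    # maximal subspace angle phi(M, N)
```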
The paper is organised as follows. In Sect. 2 we define the IQR algorithm (simple
codes are also provided in the appendix). Section 3 contains and proves our main
theorems including convergence rates. The outcome is more elaborate than the finite-
dimensional case, as the infinite-dimensional setting includes more intricate instances.
Our key practical result is that, despite being an algorithm dealing with an infinite amount of information, it can be implemented on any standard computer, and this is discussed in Sect. 4. The fact that the IQR algorithm can be computed allows for its use in order to provide new classifications in the SCI hierarchy, as discussed in Sect. 5. In particular, we demonstrate $\Sigma_1$ classification for the extremal part of the spectrum and dominant invariant subspaces, as well as $\Sigma_1$ results for spectra of certain classes of compact operators. Note that the general spectral problem for compact operators is not in $\Sigma_1$.
The IQR algorithm and convergence theorems are demonstrated on a large collection
of examples from the sciences on difficult computational spectral problems in Sect. 6,
with comparisons to the finite section method. The IQR algorithm is also found to
perform better than theory predicts and we conjecture conditions on the operator for
this to be the case. Finally, we conclude with a discussion of the opportunities and
limits of the IQR algorithm in Sect. 7.
The IQR algorithm has existed as a pure mathematical concept for more than thirty
years and it first appeared in the paper “Toda Flows with Infinitely Many Variables”
[28] in 1985. However, the analysis in [28] covers only self-adjoint infinite matrices
with real entries, and since the analysis is done from a pure mathematical perspective,
the question regarding the actual numerical algorithm is left out. We will extend the
analysis to more general operators and answer the crucial question: can one actually
implement the IQR algorithm? The answer is affirmative, and we also prove conver-
gence theorems, generalising the well-known finite-dimensional case.
The QR decomposition is the core of the QR algorithm. If T ∈ Cn×n , one may apply
the Gram-Schmidt procedure to the columns of T and store these columns in a matrix
Q. This gives us the QR decomposition
$$T = QR. \tag{2.1}$$
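For a finite matrix, the decomposition (2.1) can be written down directly. The sketch below is illustrative only (in floating point one would use Householder reflections or `numpy.linalg.qr` instead of classical Gram–Schmidt, which is not numerically robust); the function name is ours:

```python
import numpy as np

def gram_schmidt_qr(T):
    """QR decomposition of a full-rank square matrix by applying
    classical Gram-Schmidt to the columns of T."""
    n = T.shape[1]
    Q = np.zeros_like(T, dtype=complex)
    R = np.zeros((n, n), dtype=complex)
    for j in range(n):
        v = T[:, j].astype(complex)          # column to orthogonalise
        for i in range(j):
            R[i, j] = Q[:, i].conj() @ T[:, j]
            v -= R[i, j] * Q[:, i]           # remove components along earlier columns
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

T = np.array([[2.0, 1.0], [1.0, 3.0]])
Q, R = gram_schmidt_qr(T)
```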
The aim is to generalise this construction to infinite matrices and to find a way so that one can implement the procedure on a finite
machine. To do this, we need to introduce the concept of Householder reflections in
the infinite-dimensional setting.
Definition 2.1 A Householder reflection is an operator $S \in \mathcal{B}(\mathcal{H})$ of the form
$$S = I - \frac{2}{\|\xi\|^2}\, \xi \otimes \bar{\xi}, \qquad \xi \in \mathcal{H}. \tag{2.2}$$
In other words, one can introduce zeros in the column below the diagonal entry. Indeed, if $\eta_1 = \langle \eta, e_1 \rangle \ne 0$ one may choose $\xi = \eta \pm \|\eta\| \zeta$, where $\zeta = (\eta_1/|\eta_1|)\, e_1$, and if $\eta_1 = 0$ choose $\xi = \eta \pm \|\eta\|\, e_1$. The following theorem gives the existence of a QR
decomposition, even in the case where the operator is not invertible.
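The choice of $\xi$ above can be checked on a finite vector: the resulting reflection is unitary, self-adjoint, and maps $\eta$ to a multiple of $e_1$, i.e. it introduces zeros below the first entry. A sketch (assuming NumPy; the function name is ours), using the sign choice $\xi = \eta + \|\eta\|\zeta$:

```python
import numpy as np

def householder_zero_below(eta):
    """Householder reflection S such that S @ eta is supported on e_1 only,
    following the choice of xi in the text."""
    eta = eta.astype(complex)
    norm = np.linalg.norm(eta)
    phase = eta[0] / abs(eta[0]) if eta[0] != 0 else 1.0
    xi = eta + phase * norm * np.eye(len(eta))[0]   # xi = eta + ||eta|| * zeta
    S = np.eye(len(eta)) - 2.0 * np.outer(xi, xi.conj()) / (xi.conj() @ xi)
    return S

eta = np.array([3.0, 4.0, 0.0])
S = householder_zero_below(eta)
out = S @ eta    # supported on the first coordinate only
```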
Theorem 2.2 ([36]) Let $T$ be a bounded operator on a separable Hilbert space $\mathcal{H}$ and let $\{e_j\}_{j\in\mathbb{N}}$ be an orthonormal basis for $\mathcal{H} \cong l^2(\mathbb{N})$. Then there exists an isometry $Q$ such that $T = QR$, where $R$ is upper triangular with respect to $\{e_j\}$. Moreover,
$$Q = \operatorname{SOT-}\lim_{n \to \infty} V_n,$$
where each $V_n$ is a finite product of Householder reflections.
Let T ∈ B(H) be invertible and let {e j } be an orthonormal basis for H. By Theorem 2.2
we have T = Q R, where Q is an isometry and R is upper triangular with respect to
{e j }. Since T is invertible, Q is in fact unitary. Consider the following construction of
unitary operators { Q̂ k } and upper triangular (w.r.t. {e j }) operators { R̂k }. Let T = Q 1 R1
be a QR decomposition of T and define T1 = R1 Q 1 . Then QR factorize T1 = Q 2 R2
and define T2 = R2 Q 2 . The recursive procedure becomes
$$T_{m-1} = Q_m R_m, \qquad T_m = R_m Q_m. \tag{2.3}$$
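In finite dimensions, the recursion (2.3) is the classical QR algorithm: each step is a unitary similarity $T_m = Q_m^* T_{m-1} Q_m$, and for a self-adjoint matrix with eigenvalues of distinct moduli the iterates converge to a diagonal matrix. A quick sketch (assuming NumPy; the matrix is our own toy example):

```python
import numpy as np

def qr_iteration(T, steps):
    """The recursion (2.3): factor T_{m-1} = Q_m R_m, then set T_m = R_m Q_m.
    Each step is a unitary similarity, so the spectrum is preserved."""
    Tm = T.astype(float).copy()
    for _ in range(steps):
        Q, R = np.linalg.qr(Tm)
        Tm = R @ Q
    return Tm

T = np.array([[4.0, 1.0], [1.0, 2.0]])   # eigenvalues 3 +/- sqrt(2)
T50 = qr_iteration(T, 50)                # off-diagonal decays like |lambda_2/lambda_1|^k
```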
Now define
$$\hat{Q}_m = Q_1 Q_2 \cdots Q_m, \qquad \hat{R}_m = R_m R_{m-1} \cdots R_1. \tag{2.4}$$
Definition 2.3 Let T ∈ B(H) be invertible and let {e j } be an orthonormal basis for
H. The sequences { Q̂ j } and { R̂ j } constructed as in (2.3) and (2.4) will be called a
Q-sequence and an R-sequence of T with respect to {e j }.
Remark 2.4 Note that since the Householder transformations used in the proof of
Theorem 2.2 are unique up to a ± sign, we will with some abuse of language refer
to the QR decomposition constructed as the QR decomposition. In general for an
invertible operator, the IQR algorithm is uniquely defined up to phase—see Sect. 4.2.
This will not be a problem for our theorems or numerical examples.
The following observation will be useful in the later developments. From the con-
struction in (2.3) and (2.4) we get
$$T = Q_1 R_1 = \hat{Q}_1 \hat{R}_1,$$
$$T^2 = Q_1 R_1 Q_1 R_1 = Q_1 Q_2 R_2 R_1 = \hat{Q}_2 \hat{R}_2,$$
$$T^3 = Q_1 R_1 Q_1 R_1 Q_1 R_1 = Q_1 Q_2 R_2 Q_2 R_2 R_1 = Q_1 Q_2 Q_3 R_3 R_2 R_1 = \hat{Q}_3 \hat{R}_3,$$
and, inductively,
$$T^m = \hat{Q}_m \hat{R}_m. \tag{2.5}$$
Note that $\hat{R}_m$ must be upper triangular with respect to $\{e_j\}_{j\in\mathbb{N}}$ since each $R_j$, $j \le m$, is upper triangular with respect to $\{e_j\}_{j\in\mathbb{N}}$. Also, if $T$ is invertible then $\langle \hat{R}_m e_i, e_i \rangle \ne 0$. From this it follows immediately that
$$\operatorname{span}\{T^m e_1, \ldots, T^m e_j\} = \operatorname{span}\{\hat{Q}_m e_1, \ldots, \hat{Q}_m e_j\}, \qquad j \in \mathbb{N}. \tag{2.6}$$
3 Convergence theorems
$$Q_m^* T Q_m \longrightarrow \sum_{j=1}^N \lambda_j\, e_j \otimes e_j, \qquad \text{as } m \to \infty.$$
In this section we will address the convergence of the IQR algorithm for normal
operators under similar assumptions and prove an analogue of Theorem 3.1 in infinite
dimensions (Theorem 3.9). As well as this, and for more general operators T that are
not necessarily normal, we address block convergence (Theorem 3.13), relevant when
the eigenvalues do not have distinct moduli, and convergence to (dominant) invariant
subspaces (Theorem 3.15).
To state and prove our theorems we need some preliminary results. The reader only
interested in the results themselves is referred to Sect. 3.2. If T is a normal operator, we
will use χ S (T ) to denote the indicator function of the set S defined via the functional
calculus. Without loss of generality, we deal with the Hilbert space H = l 2 (N) and the
canonical orthonormal basis {e j } j∈N . Our first set of results concerns the convergence
of spanning sets under power iterations and is analogous to the finite-dimensional
case. The following proposition can be found in [36] and together with Lemma 3.6
below, these are the only results we will use from [36].
eigenvalues of $T$ with $|\lambda_1| > |\lambda_2| > \cdots > |\lambda_N|$. Suppose further that $\sup\{|z| : z \in \Theta\} < |\lambda_N|$. Let $l \in \mathbb{N}$ and suppose that $\{\xi_i\}_{i=1}^l$ are linearly independent vectors in $\mathcal{H}$ such that $\{\chi_\omega(T)\xi_i\}_{i=1}^l$ are also linearly independent. Then
(i) The vectors $\{T^k \chi_\omega(T)\xi_i\}_{i=1}^l$ are linearly independent and there exists an $l$-dimensional subspace $B \subset \operatorname{ran}\chi_\omega(T)$ such that
$$\operatorname{span}\{T^k \xi_i\}_{i=1}^l \to B, \qquad \text{as } k \to \infty.$$
(ii) If
$$\operatorname{span}\{T^k \xi_i\}_{i=1}^{l-1} \to D \subset \mathcal{H}, \qquad \text{as } k \to \infty,$$
In order to extend this proposition to describe rates of convergence and prove our
main theorems, we need to describe the space B in more detail. This is done inductively
as follows. The first step is to choose $\nu_{1,1} \in \{\lambda_i\}_{i=1}^N$ of maximum modulus such that $\chi_{\nu_{1,1}}(T)\xi_1 \ne 0$. We then let $\xi_{1,1}$ be a linear multiple of $\xi_1$ such that $\chi_{\nu_{1,1}}(T)\xi_{1,1}$ has norm one. Now suppose that at the $m$-th stage we have constructed vectors $\{\xi_{m,i}\}_{i=1}^m$ with the same linear span as $\{\xi_i\}_{i=1}^m$ and such that there exist $\{\nu_{m,j}\}_{j=1}^{s_m} \subset \{\lambda_i\}_{i=1}^N$ with the following properties. After re-ordering the vectors $\{\xi_{m,i}\}_{i=1}^m$ if necessary, there exist integers $0 = k_{m,0} < k_{m,1} < k_{m,2} < \cdots < k_{m,s_m} = m$ such that
(1) $|\nu_{m,s_m}| < |\nu_{m,s_m-1}| < \cdots < |\nu_{m,1}|$.
(2) $\chi_\lambda(T)\xi_{m,i} = 0$ if $i > k_{m,j}$ and $\lambda \in \{\lambda_i\}_{i=1}^N$ has $|\lambda| > |\nu_{m,j+1}|$.
(3) $\{\chi_{\nu_{m,j}}(T)\xi_{m,i}\}_{i=k_{m,j-1}+1}^{k_{m,j}}$ are orthonormal.
We seek to add the space spanned by the vector $\xi_{m+1}$ whilst preserving these properties. First we deal with (2). Let $\eta_{m+1} \in \{\lambda_i\}_{i=1}^N$ be of maximal modulus such that $\chi_{\{\lambda_1,\ldots,\eta_{m+1}\}}(T)\xi_{m+1} \notin \operatorname{span}\{\chi_{\{\lambda_1,\ldots,\eta_{m+1}\}}(T)\xi_j\}_{j=1}^m$. If $|\eta_{m+1}| < |\nu_{m,1}|$ then let $t(m+1)$ be maximal such that $|\eta_{m+1}| < |\nu_{m,t(m+1)}|$. We then choose complex numbers $\{a_{m,j}\}_{j=1}^{k_{m,t(m+1)}}$ such that, writing
$$\tilde{\xi}_{m+1,m+1} = \xi_{m+1} + \sum_{j=1}^{k_{m,t(m+1)}} a_{m,j}\, \xi_{m,j},$$
we may apply Gram–Schmidt to the vectors
$$\{\chi_{\nu_{m,t(m+1)+1}}(T)\xi_{m+1,i}\}_{i=k_{m,t(m+1)}+1}^{k_{m,t(m+1)+1}} \cup \{\chi_{\nu_{m,t(m+1)+1}}(T)\tilde{\xi}_{m+1,m+1}\}$$
(without changing $\{\xi_{m+1,i}\}_{i=k_{m,t(m+1)}+1}^{k_{m,t(m+1)+1}}$). Note that by (2) and the definition of $\eta_{m+1}$ these vectors are linearly independent. This gives $\xi_{m+1,m+1}$ such that
$$\{\chi_{\nu_{m,t(m+1)+1}}(T)\xi_{m+1,i}\}_{i=k_{m,t(m+1)}+1}^{k_{m,t(m+1)+1}} \cup \{\chi_{\nu_{m,t(m+1)+1}}(T)\xi_{m+1,m+1}\}$$
are orthonormal and $\chi_\lambda(T)\xi_{m+1,m+1} = 0$ if $\lambda \in \{\lambda_i\}_{i=1}^N$ has $|\lambda| > |\nu_{m,t(m+1)+1}|$.
After re-ordering indices if necessary, we see that (1)-(3) now hold for m + 1.
After $l$ steps the above process terminates, giving a new basis $\{\tilde{\xi}_i\}_{i=1}^l = \{\xi_{l,i}\}_{i=1}^l$ for $\operatorname{span}\{\xi_i\}_{i=1}^l$, along with $\{\nu_j\}_{j=1}^n = \{\nu_{l,j}\}_{j=1}^n \subset \{\lambda_i\}_{i=1}^N$ and $0 = k_0 < k_1 < k_2 < \cdots < k_n = l$ such that
(i) $|\nu_n| < |\nu_{n-1}| < \cdots < |\nu_1|$.
(ii) $\chi_\lambda(T)\tilde{\xi}_i = 0$ if $i > k_j$ and $\lambda \in \{\lambda_i\}_{i=1}^N$ has $|\lambda| > |\nu_{j+1}|$.
(iii) $\{\chi_{\nu_j}(T)\tilde{\xi}_i\}_{i=k_{j-1}+1}^{k_j}$ are orthonormal.
In this case,
$$B = \bigoplus_{j=1}^n \operatorname{span}\{\chi_{\nu_j}(T)\tilde{\xi}_i\}_{i=k_{j-1}+1}^{k_j}.$$
Definition 3.3 With respect to the above construction we define the following:
$$E_j := \operatorname{span}\{\chi_{\nu_j}(T)\tilde{\xi}_i\}_{i=k_{j-1}+1}^{k_j}, \qquad Z(T, \{\xi_j\}_{j=1}^l) := \left( \sum_{i=1}^l \left( \|\tilde{\xi}_i\|^2 - 1 \right) \right)^{1/2}. \tag{3.1}$$
Note that
$$Z(T, \{\xi_j\}_{j=1}^{l+1}) \ge Z(T, \{\xi_j\}_{j=1}^l).$$
Set
$$\rho = \sup\{|z| : z \in \Theta \cup \{\lambda_{J+1}, \ldots, \lambda_N\}\}, \qquad r = \max\{|\lambda_2/\lambda_1|, \ldots, |\lambda_J/\lambda_{J-1}|,\ \rho/|\lambda_J|\}.$$
Then $r < 1$ and $\delta(B, \operatorname{span}\{T^k \xi_i\}_{i=1}^l) \le Z(T, \{\xi_j\}_{j=1}^l)\, r^k$. Since the spaces are $l$-dimensional, it follows from (1.5) that we have the convergence rate
$$\hat\delta(B, \operatorname{span}\{T^k \xi_i\}_{i=1}^l) \le Z(T, \{\xi_j\}_{j=1}^l)\, l^{1/2}\, r^k.$$
For the proof, define
$$E_j^k = \operatorname{span}\{T^k \tilde{\xi}_i\}_{i=k_{j-1}+1}^{k_j}.$$
Let $\zeta = \sum_{i=k_{j-1}+1}^{k_j} \alpha_i\, \chi_{\nu_j}(T)\tilde{\xi}_i \in E_j$ be a unit vector (hence $\sum_{i=k_{j-1}+1}^{k_j} |\alpha_i|^2 = 1$) and consider
$$\eta_k = \sum_{i=k_{j-1}+1}^{k_j} \alpha_i\, T^k \tilde{\xi}_i / \nu_j^k \in E_j^k.$$
By construction, we have for any such $\tilde{\xi}_i$ in the above sum that $\chi_\lambda(T)\tilde{\xi}_i = 0$ whenever $|\lambda| > |\nu_j|$. Writing $\theta_j$ for the part of the spectrum of modulus strictly smaller than $|\nu_j|$, we have
$$\rho_j = \sup\{|z| : z \in \theta_j\} < |\nu_j|.$$
Thus, since
$$T^k \tilde{\xi}_i / \nu_j^k = \chi_{\nu_j}(T)\tilde{\xi}_i + (T/\nu_j)^k\, \chi_{\theta_j}(T)\tilde{\xi}_i,$$
we have
$$\|\zeta - \eta_k\| \le |\rho_j/\nu_j|^k \sum_{i=k_{j-1}+1}^{k_j} |\alpha_i|\, \|\chi_{\theta_j}(T)\tilde{\xi}_i\| \le \left( \sum_{i=k_{j-1}+1}^{k_j} \left( \|\tilde{\xi}_i\|^2 - 1 \right) \right)^{1/2} r^k.$$
Here we have used Hölder's inequality together with the fact that $\|\chi_{\theta_j}(T)\tilde{\xi}_i\|^2 = \|\tilde{\xi}_i\|^2 - 1$ by orthonormality of $\{\chi_{\nu_j}(T)\tilde{\xi}_i\}_{i=k_{j-1}+1}^{k_j}$. The right-hand side gives an upper bound for $\delta(E_j, E_j^k)$. Analogous rates of convergence hold for the other subspaces.
Definition 3.5 Suppose that (A1) and (A2) hold and let $K \in \mathbb{N} \cup \{\infty\}$ be minimal with the property that $\dim(\operatorname{span}\{\chi_\omega(T)e_j\}_{j=1}^K) = M$. Define
$$\Lambda_\omega = \{e_j : \chi_\omega(T)e_j \ne 0,\ j \le K\}, \qquad \Lambda = \{e_j : \chi_\omega(T)e_j = 0,\ j \le K\},$$
$$\tilde{\Lambda}_\omega = \{e_j \in \Lambda_\omega : \chi_\omega(T)e_j \in \operatorname{span}\{\chi_\omega(T)e_i\}_{i=1}^{j-1}\}.$$
where $\{\xi_j\}_{j=1}^M$ is an orthonormal set of eigenvectors of $T$. The following simple lemma extends Lemma 39 in [36] to infinite $M$, but the proof is verbatim so is omitted.
Lemma 3.6 If $e_m \in \Lambda \cup \tilde{\Lambda}_\omega$, then
$$\operatorname{span}\{\chi_\omega(T)e_j\}_{j=1}^m = \operatorname{span}\{\chi_\omega(T)\hat{e}_j\}_{j=1}^{s(m)},$$
where $s(m)$ is the largest integer such that $\{\hat{e}_j\}_{j=1}^{s(m)} \subset \{e_j\}_{j=1}^m$.
The following theorem is the key step of the proof of Theorem 3.9 and concerns
convergence to the eigenvectors of T .
$$\hat\delta(\operatorname{span}\{\hat{q}_j\}, \operatorname{span}\{\hat{q}_{k,j}\}) \le B(j)\, Z(T, \{\hat{e}_i\}_{i=1}^j)\, r^k. \tag{3.4}$$
and hence
$$\hat\delta(\operatorname{span}\{\hat{q}_{k,j}\}_{j=1}^\mu, \operatorname{span}\{\hat{q}_j\}_{j=1}^\mu) \le \mu^{1/2}\, C(\mu)\, Z(T, \{\hat{e}_j\}_{j=1}^\mu)\, r^k. \tag{3.6}$$
We will provide an inductive proof of Theorem 3.7 which requires the following
for the inductive step of part (a).
Lemma 3.8 Assume the conditions in the statement of Theorem 3.7. Suppose also that
(b) in Theorem 3.7 holds for j = 1, . . . , μ and that (c) holds for a given μ < M. Let
$e_{p_{\mu+1}} = \hat{e}_{\mu+1}$; then if $e_m \in \Lambda \cup \tilde{\Lambda}_\omega$, where $m < p_{\mu+1}$, (3.3) also holds with
$$A(m) = \left\{ \sum_{j=1}^\mu \left[ C(\mu) + B(j) \right]^2 \right\}^{1/2} + C(\mu).$$
Proof First note that from (2.6), invertibility of $T$ and the fact that $\{\chi_\omega(T)\hat{e}_j\}_{j=1}^\mu$ are linearly independent, it must hold that $\{\chi_\omega(T)\hat{q}_{k,j}\}_{j=1}^\mu$ are linearly independent also. Then by using the assumptions stated and the fact that $\chi_\omega(T)\hat{q}_j = \hat{q}_j$ we have
$$\delta(\operatorname{span}\{\chi_\omega(T)\hat{q}_{k,j}\}_{j=1}^\mu, \operatorname{span}\{\hat{q}_j\}_{j=1}^\mu) \le \delta(\operatorname{span}\{\hat{q}_{k,j}\}_{j=1}^\mu, \operatorname{span}\{\hat{q}_j\}_{j=1}^\mu) \le C(\mu)\, Z(T, \{\hat{e}_j\}_{j=1}^\mu)\, r^k.$$
By Lemma 3.6,
$$\operatorname{span}\{\chi_\omega(T)q_{k,j}\}_{j=1}^m = \operatorname{span}\{\chi_\omega(T)\hat{q}_{k,j}\}_{j=1}^{s(m)} \subset \operatorname{span}\{\chi_\omega(T)\hat{q}_{k,j}\}_{j=1}^\mu.$$
Using the fact that $\|\chi_\omega(T)q_{k,m}\| \le 1$ and the definition of $\delta$ (along with the fact that $\operatorname{span}\{\hat{q}_j\}_{j=1}^\mu$ is finite-dimensional), it follows that there exists some $v_k = \sum_{j=1}^\mu \beta_{j,k}\, \hat{q}_j \in \operatorname{span}\{\hat{q}_j\}_{j=1}^\mu$ with $\|v_k\| \le 1$ and
$$\|\chi_\omega(T)q_{k,m} - v_k\| \le C(\mu)\, Z(T, \{\hat{e}_j\}_{j=1}^\mu)\, r^k. \tag{3.7}$$
since $q_{k,m}$ is orthogonal to $\hat{q}_{k,j}$. This together with (3.7) gives that $|\beta_{j,k}| \le \left[ C(\mu) + B(j) \right] Z(T, \{\hat{e}_j\}_{j=1}^\mu)\, r^k$. Hence we must have
$$\|v_k\| \le \left\{ \sum_{j=1}^\mu \left[ C(\mu) + B(j) \right]^2 \right\}^{1/2} Z(T, \{\hat{e}_j\}_{j=1}^\mu)\, r^k.$$
Using (3.7) again then gives the result. Note that we have used orthonormality of $\{\hat{q}_j\}_{j=1}^\mu$, which will be proven as part of the induction. $\square$
Proof of Theorem 3.7 We begin with the initial step of the induction for (b) and (c).
Note that (a) trivially holds by construction with A(m) = 0 for any m < p1 where
e p1 = ê1 and this provides the initial step for (a).
By Propositions 3.2 and 3.4, there exists a unit eigenvector $\hat{q}_1 \in \operatorname{ran}\chi_\omega(T)$ such that
$$\delta(\operatorname{span}\{\hat{q}_1\}, \operatorname{span}\{T^k e_i\}_{i=1}^{p_1}) \le Z(T, \{\hat{e}_1\})\, r^k.$$
Hence we can take $B(1) = 1$ and $C(1) = 1$ in (b) and (c) respectively, which completes the initial step.
For the induction step we will argue simultaneously for (a), (b) and (c) using induc-
tion on μ. Suppose that (a) holds for m < pμ with e pμ = êμ together with (b) and
(c) for j ≤ μ and some μ < M. Let e pμ+1 = êμ+1 then we can use Lemma 3.8 to
extend (a) to all m < pμ+1 and this provides the step for (a). For (b), we note that
Propositions 3.2 and 3.4 imply that
$$\delta\!\left(\operatorname{span}\{\hat{q}_i\}_{i=1}^\mu \oplus \operatorname{span}\{\xi\}, \operatorname{span}\{T^k \hat{e}_i\}_{i=1}^{\mu+1}\right) \le Z(T, \{\hat{e}_j\}_{j=1}^{\mu+1})\, r^k, \qquad \xi \in \operatorname{ran}\chi_\omega(T), \tag{3.11}$$
and hence
$$\delta(\operatorname{span}\{\hat{q}_i\}_{i=1}^\mu \oplus \operatorname{span}\{\xi\}, \operatorname{span}\{T^k e_i\}_{i=1}^{p_{\mu+1}}) \le Z(T, \{\hat{e}_j\}_{j=1}^{\mu+1})\, r^k,$$
so that
$$\delta(\operatorname{span}\{\hat{q}_i\}_{i=1}^\mu \oplus \operatorname{span}\{\xi\}, \operatorname{span}\{q_{k,i}\}_{i=1}^{p_{\mu+1}}) = \delta(\operatorname{span}\{\hat{q}_i\}_{i=1}^\mu \oplus \operatorname{span}\{\xi\}, \operatorname{span}\{T^k e_i\}_{i=1}^{p_{\mu+1}}) \le Z(T, \{\hat{e}_j\}_{j=1}^{\mu+1})\, r^k. \tag{3.12}$$
Again, using that $\{q_{k,i}\}_{i=1}^{p_{\mu+1}}$ are orthonormal, there exist some coefficients $\alpha_{k,i}$ with $\sum_{i=1}^{p_{\mu+1}} |\alpha_{k,i}|^2 \le 1$ such that, defining $\tilde{\eta}_k = \sum_{i=1}^{p_{\mu+1}} \alpha_{k,i}\, q_{k,i}$, we have
$$\|\xi - \tilde{\eta}_k\| \le Z(T, \{\hat{e}_j\}_{j=1}^{\mu+1})\, r^k. \tag{3.13}$$
Taking the inner product of $\xi - \tilde{\eta}_k$ with $q_{k,m}$ and using (3.13) together with the orthonormality of the $q_{k,j}$'s, it follows that $|\alpha_{k,m}| \le \left[ A(m) + 1 \right] Z(T, \{\hat{e}_j\}_{j=1}^{\mu+1})\, r^k$. Similarly, if $j \le \mu$ then for any $c \in \mathbb{C}$,
$$|\langle \hat{q}_{k,j}, \xi \rangle| \le |\langle c\hat{q}_j, \xi \rangle| + \|c\hat{q}_j - \hat{q}_{k,j}\| = \|c\hat{q}_j - \hat{q}_{k,j}\|,$$
To finish the inductive step, we define $\hat{q}_{\mu+1} = \xi$. Recall that $\xi$ is orthogonal to any $\hat{q}_l$ with $l \le \mu$. Hence it follows that $\{\hat{q}_i\}_{i=1}^{\mu+1}$ are orthonormal and we can take
$$B(\mu+1) = 1 + \left\{ \sum_{\substack{m=1 \\ e_m \in \Lambda \cup \tilde{\Lambda}_\omega}}^{p_{\mu+1}} \left[ A(m) + 1 \right]^2 + \sum_{j=1}^\mu \left[ B(j) + 1 \right]^2 \right\}^{1/2}$$
in (b). For the induction step for (c), the fact that $\{\hat{q}_{k,i}\}_{i=1}^{\mu+1}$ are orthonormal and (1.6) imply we can take
$$C(\mu+1) = \left( \sum_{j=1}^{\mu+1} B(j)^2 \right)^{1/2}. \qquad \square$$
Our first result generalises Theorem 3.1 to infinite dimensions and relies on Theo-
rem 3.7 (which concerns convergence to eigenvectors).
Theorem 3.9 (Convergence theorem for normal operators in infinite dimensions) Let $T \in \mathcal{B}(l^2(\mathbb{N}))$ be an invertible normal operator with $\sigma(T) = \omega \cup \Theta$ and $\omega = \{\lambda_i\}_{i=1}^N$, where the $\lambda_i$'s are isolated eigenvalues with (possibly infinite) multiplicity $m_i$ satisfying $|\lambda_1| > \cdots > |\lambda_N|$. Suppose further that $\sup\{|\theta| : \theta \in \Theta\} < |\lambda_N|$, and let $\{e_j\}_{j\in\mathbb{N}}$ be the canonical orthonormal basis. Let $\{Q_n\}_{n\in\mathbb{N}}$ and $\{R_n\}_{n\in\mathbb{N}}$ be Q- and R-sequences of $T$ with respect to $\{e_j\}_{j\in\mathbb{N}}$. Let $\{\hat{e}_j\}_{j=1}^M \subset \{e_j\}_{j\in\mathbb{N}}$, where $M = m_1 + \cdots + m_N$, be the subset described in Definition 3.5 and Theorem 3.7, i.e. $\operatorname{span}\{Q_k \hat{e}_j\} \to \operatorname{span}\{\hat{q}_j\}$, where $\{\hat{q}_j\}_{j=1}^M \subset \operatorname{ran}\chi_\omega(T)$ is a collection of orthonormal eigenvectors of $T$, and, if $e_j \notin \{\hat{e}_j\}_{j=1}^M$, then $\chi_\omega(T)Q_k e_j \to 0$. Then:
(i) Any subsequence of $\{Q_n^* T Q_n\}$ has a further subsequence along which
$$Q_{n_k}^* T Q_{n_k} \xrightarrow{\mathrm{WOT}} \sum_{j=1}^M \langle T\hat{q}_j, \hat{q}_j \rangle\, \hat{e}_j \otimes \hat{e}_j + \sum_{j \in \Upsilon} \xi_j \otimes e_j, \qquad \text{as } k \to \infty,$$
where
$$\Upsilon = \{j : e_j \notin \{\hat{e}_l\}_{l=1}^M\}, \qquad \xi_j \in \operatorname{span}\{e_i\}_{i \in \Upsilon},$$
and only $\sum_{j\in\Upsilon} \xi_j \otimes e_j$ depends on the choice of subsequence. Furthermore, if $T$ has only finitely many non-zero entries in each column then we can replace WOT convergence by SOT convergence.
(ii) We have the following convergence of sections:
$$\hat{P}_M Q_n^* T Q_n \hat{P}_M \xrightarrow{\mathrm{SOT}} \sum_{j=1}^M \langle T\hat{q}_j, \hat{q}_j \rangle\, \hat{e}_j \otimes \hat{e}_j, \qquad \text{as } n \to \infty,$$
where $\hat{P}_M$ denotes the orthogonal projection onto $\operatorname{span}\{\hat{e}_j\}_{j=1}^M$. If, in addition, $M$ is finite, then $r < 1$ and for any fixed $x \in \operatorname{span}\{\hat{e}_j\}_{j=1}^M$ we have the following rate of convergence:
$$\left\| \hat{P}_M Q_n^* T Q_n \hat{P}_M\, x - \left( \sum_{j=1}^M \langle T\hat{q}_j, \hat{q}_j \rangle\, \hat{e}_j \otimes \hat{e}_j \right) x \right\| = O(r^n), \qquad \text{as } n \to \infty. \tag{3.14}$$
Remark 3.10 What Theorem 3.9 essentially says is that if we take the $n$-th iterate of the IQR algorithm and truncate it to an $m \times m$ matrix (i.e. $P_m Q_n^* T Q_n P_m$) then, as $n$ grows, the eigenvalues of this matrix will converge to the extremal parts of the spectrum of $T$. In particular, the theorem suggests that the IQR algorithm can locate the extremal parts of the spectrum.
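The following sketch mimics this behaviour on a large finite truncation (assuming NumPy; the matrix and all parameters are our own illustration, not an example from the paper): a banded self-adjoint matrix with two isolated large eigenvalues above a cluster near $1$, the cluster standing in for the essential spectrum. After many QR steps, the eigenvalues of the truncated top-left $2 \times 2$ block match the two extremal eigenvalues:

```python
import numpy as np

n = 100
# Banded self-adjoint matrix: two isolated eigenvalues near 5 and 4
# above a cluster near 1 (a stand-in for the essential spectrum).
A = np.diag(np.concatenate(([5.0, 4.0], np.ones(n - 2))))
A += 0.1 * (np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1))

Tm = A.copy()
for _ in range(300):            # the recursion (2.3) on the truncation
    Q, R = np.linalg.qr(Tm)
    Tm = R @ Q

approx = np.sort(np.linalg.eigvalsh(Tm[:2, :2]))   # truncate, then diagonalise
exact = np.sort(np.linalg.eigvalsh(A))[-2:]        # two extremal eigenvalues
```

The off-diagonal coupling between the leading $2 \times 2$ block and the rest decays like $|\lambda_3/\lambda_2|^k$, so the block decouples even though the clustered eigenvalues below do not have distinct moduli.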
Proof of Theorem 3.9 To prove (i), since a closed ball in $\mathcal{B}(l^2(\mathbb{N}))$ is weakly sequentially compact, it follows that any subsequence of $\{Q_n^* T Q_n\}_{n\in\mathbb{N}}$ must have a weakly convergent subsequence $\{Q_{n_k}^* T Q_{n_k}\}_{k\in\mathbb{N}}$. In particular, there exists a $W \in \mathcal{B}(l^2(\mathbb{N}))$ such that
$$Q_{n_k}^* T Q_{n_k} \xrightarrow{\mathrm{WOT}} W, \qquad k \to \infty.$$
Let $\hat{P}_M$ denote the projection onto $\operatorname{span}\{\hat{e}_j\}_{j=1}^M$. Note that part (i) of the theorem will follow if we can show that
$$\hat{P}_M W \hat{P}_M = \sum_{j=1}^M \langle T\hat{q}_j, \hat{q}_j \rangle\, \hat{e}_j \otimes \hat{e}_j, \tag{3.17}$$
and
$$\hat{P}_M^\perp W \hat{P}_M = 0, \qquad \hat{P}_M W \hat{P}_M^\perp = 0.$$
We will indeed show this, and we start by observing that, due to the weak convergence and the standard functional calculus, we have that
$$\chi_\omega(T) Q_n e_i \to 0, \quad n \to \infty, \quad i \in \Upsilon
\implies
\begin{cases}
\lim_{k\to\infty} \langle T Q_{n_k} \hat{e}_j, \chi_\omega(T) Q_{n_k} e_i \rangle = 0, & i \in \Upsilon, \\
\lim_{k\to\infty} \langle \chi_\omega(T) Q_{n_k} e_i, T^* Q_{n_k} \hat{e}_j \rangle = 0, & i \in \Upsilon,
\end{cases} \tag{3.20}$$
and
$$\operatorname{span}\{Q_n \hat{e}_j\} \to \operatorname{span}\{\hat{q}_j\}, \quad n \to \infty, \quad T\hat{q}_j = \lambda \hat{q}_j, \ \lambda \in \omega
\implies
\begin{cases}
\lim_{k\to\infty} \langle T Q_{n_k} \hat{e}_j, \chi_\Theta(T) Q_{n_k} e_i \rangle = 0, & i \in \mathbb{N}, \\
\lim_{k\to\infty} \langle T Q_{n_k} e_i, \chi_\Theta(T) Q_{n_k} \hat{e}_j \rangle = 0, & i \in \mathbb{N}, \\
\lim_{k\to\infty} \langle T Q_{n_k} \hat{e}_j, \chi_\omega(T) Q_{n_k} \hat{e}_l \rangle = \delta_{j,l}\, \lambda.
\end{cases} \tag{3.21}$$
Thus, by (3.18), (3.20), (3.21) and Theorem 3.7 we get (3.17), and also that $\hat{P}_M^\perp W \hat{P}_M = 0$. Also, by (3.19), (3.20), (3.21) and Theorem 3.7 we get that $\hat{P}_M W \hat{P}_M^\perp = 0$. Note that in all of these cases, Theorem 3.7 implies that the rate of convergence is such that the difference between $\langle W\hat{e}_j, e_i \rangle$, $\langle We_i, \hat{e}_j \rangle$ and their limiting values is $O(r^{n_k})$ (however, not necessarily uniformly over the indices). Now suppose that $T$ has finitely many non-zero entries in each column. This can be described by a non-decreasing function $f : \mathbb{N} \to \mathbb{N}$ with $f(n) \ge n$ such that $\langle Te_j, e_i \rangle = 0$ when $i > f(j)$, as in Definition 4.1. Proposition 4.2 shows that this is preserved under the iteration in the IQR algorithm, i.e. $Q_{n_k}^* T Q_{n_k}$ also has this property. So let $x \in l^2(\mathbb{N})$ and $\epsilon > 0$. Choose $y$ of finite support such that $\|x - y\| \le \epsilon$. It is then clear that $\|Q_{n_k}^* T Q_{n_k} y - Wy\| \to 0$ as $n_k \to \infty$ (since we only require convergence in finitely many entries). Hence
$$\limsup_{k\to\infty} \|Q_{n_k}^* T Q_{n_k} x - Wx\| \le 2\|T\|\epsilon.$$
Since $\epsilon > 0$ and $x$ were arbitrary, we have $Q_{n_k}^* T Q_{n_k} \xrightarrow{\mathrm{SOT}} W$.
Next, let
$$x = \sum_{j=1}^{M} x_j \hat e_j,$$
with at most finitely many x_j non-zero. We have that δ̂(span{Q_n ê_j}, span{q̂_j}) = O(r^n),
and hence there exist a_{n,j} of unit modulus such that ‖Q_n ê_j − a_{n,j} q̂_j‖ = O(r^n). Since
Q_n is unitary, we then have
$$\Big\|\hat P_M Q_n^* T Q_n \hat P_M x - \Big(\sum_{j=1}^{M}\langle T\hat q_j,\hat q_j\rangle\,\hat e_j\otimes\hat e_j\Big)x\Big\| \le \Big\|Q_n^* T Q_n \hat P_M x - Q_n^*\Big(\sum_{j=1}^{M}\langle T\hat q_j,\hat q_j\rangle\,(Q_n\hat e_j)\otimes(Q_n\hat e_j)\Big)Q_n x\Big\|$$
$$= \Big\|\sum_{j=1}^{M} x_j\big(T - \langle T\hat q_j,\hat q_j\rangle I\big)Q_n\hat e_j\Big\| = O(r^n),$$
where we have used the fact that T is bounded in the last line. We therefore have
convergence on span{ê_j}_{j=1}^M and, since the operators are uniformly bounded, we
must have convergence on the closure of span{ê_j}_{j=1}^M, which implies that
$$\hat P_M Q_n^* T Q_n \hat P_M \xrightarrow{\mathrm{SOT}} \sum_{j=1}^{M}\langle T\hat q_j,\hat q_j\rangle\,\hat e_j\otimes\hat e_j, \qquad n\to\infty.$$
For the last parts, suppose that M is finite. Theorem 3.7 then implies (3.15) after a
possible re-ordering. The rate of convergence in (3.14) also implies that
$$\Big\|\hat P_M Q_n^* T Q_n \hat P_M - \sum_{j=1}^{M}\langle T\hat q_j,\hat q_j\rangle\,\hat e_j\otimes\hat e_j\Big\| = O(r^n).$$
$$\Lambda_\omega = \{e_j : \chi_\omega(T)e_j \neq 0,\ j \le K\}, \qquad \Lambda = \{e_j : \chi_\omega(T)e_j = 0,\ j \le K\}$$
and
$$\tilde\Lambda_\omega = \{e_j \in \Lambda_\omega : \chi_\omega(T)e_j \in \mathrm{span}\{\chi_\omega(T)e_i\}_{i=1}^{j-1}\}.$$
Theorems 3.9 and 3.7 also give us convergence to the eigenvectors. With the use
of (possibly countably many) shifts and rotations, the above theorem allows us to
find all eigenvalues, their multiplicities and eigenspaces outside the convex hull of the
essential spectrum, i.e. outside the essential numerical range.
Example 3.11 It is possible in the case of infinite M that the q̂_j do not form an
orthonormal basis of ran χ_ω(T), and we can even lose part of ω in the convergence of
P̂_M Q_n^* T Q_n P̂_M to a diagonal operator. This is in contrast to the finite-dimensional
case. For example, suppose that with respect to an initial orthonormal basis {v_j}_{j∈N}, T
is given by the diagonal matrix Diag(1/2, 1, 1, ...). Now define f_j = v_1 + (1/j)v_{j+1}
and apply Gram–Schmidt to the sequence {f_j}_{j∈N} to generate orthonormal vectors
{e_j}_{j∈N}. It is easy to see that any v_j can be approximated to arbitrary accuracy using
finite linear combinations of the e_j, and hence {e_j}_{j∈N} is an orthonormal basis of our
Hilbert space. We also have that the χ_1(T)(f_j) = (1/j)v_{j+1} are linearly independent,
and hence so are the χ_1(T)(e_j). It follows that the IQR iterates converge in the strong
operator topology to the identity operator. However, we could equally take ω = {1, 1/2}
in Theorem 3.9. Hence we have the curious case that span{q̂_j}_{j∈N} ⊂ span{v_j}_{j>1} and
we lose the eigenvalue 1/2.
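The mechanism of Example 3.11 can be observed in a finite compression (a hedged numerical sketch, not from the paper; N = 30 and 500 iterations are arbitrary choices, and a finite compression only mimics the infinite-dimensional behaviour):

```python
import numpy as np

N = 30
# T = Diag(1/2, 1, 1, ...) in the basis {v_j}, truncated to N + 1 dimensions
Tfull = np.eye(N + 1)
Tfull[0, 0] = 0.5
# columns of F are f_j = v_1 + (1/j) v_{j+1}
F = np.zeros((N + 1, N))
for j in range(N):
    F[0, j] = 1.0
    F[j + 1, j] = 1.0 / (j + 1)
E, _ = np.linalg.qr(F)       # Gram-Schmidt: columns span {e_1, ..., e_N}
A = E.T @ Tfull @ E          # compression of T to span{e_j}
Ak = A.copy()
for _ in range(500):         # unshifted QR iteration
    Q, R = np.linalg.qr(Ak)
    Ak = R @ Q
print(np.round(np.diag(Ak)[:4], 4), np.diag(Ak).min())
```

The leading diagonal entries of the iterates are all 1, while the lone eigenvalue below 1 is pushed to the bottom corner: the finite-dimensional shadow of losing the eigenvalue 1/2 in the infinite-dimensional limit.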
In the finite-dimensional case and the case of distinct eigenvalues of the same
magnitude, the QR algorithm applied to a normal matrix will ‘converge’ to a block
diagonal matrix (without necessarily converging in each block). This can be extended
to infinite dimensions by inductively using the following theorem which also extends
to non-normal operators.
Theorem 3.13 (Block convergence theorem in infinite dimensions) Let T ∈ B(l 2 (N))
be an invertible operator (not necessarily normal) and suppose that there exists an
orthogonal projection P of rank M (possibly infinite) such that both the ranges of P
and of I − P are invariant under T . Suppose also that there exists α > β > 0 such
that
• ‖Tx‖ ≥ α‖x‖ for all x ∈ ran(P),
• ‖Tx‖ ≤ β‖x‖ for all x ∈ ran(I − P).

Let {Q_n}_{n∈N} and {R_n}_{n∈N} be Q- and R-sequences of T with respect to {e_i}. Then
there exists a subset {ê_j}_{j=1}^M ⊂ {e_i}_{i∈N} such that:

(i) For any finite μ ≤ M we have δ(span{Q_n ê_j}_{j=1}^μ, ran(P)) = O(β^n/α^n) as
n → ∞. If M is finite, this implies full convergence δ̂(span{Q_n ê_j}_{j=1}^M, ran(P)) =
O(β^n/α^n) as n → ∞.
(ii) Every subsequence of {Q_n^* T Q_n}_{n∈N} has a WOT-convergent subsequence {Q_{n_k}^* T Q_{n_k}}_{k∈N}
such that
$$Q_{n_k}^* T Q_{n_k} \xrightarrow{\mathrm{WOT}} \sum_{j=1}^{M}\xi_j\otimes\hat e_j + \sum_{i\in\Lambda}\zeta_i\otimes e_i,$$
as k → ∞, where
$$\Lambda = \{j : e_j \notin \{\hat e_l\}_{l=1}^{M}\}, \qquad \xi_j \in \overline{\mathrm{span}}\{\hat e_l\}_{l=1}^{M}, \qquad \zeta_i \in \overline{\mathrm{span}}\{e_l\}_{l\in\Lambda}.$$

If {Pe_l}_{l=1}^M are linearly independent then we can take ê_j = e_j. Furthermore, if T
has only finitely many non-zero entries in each column, then we can replace WOT
convergence by SOT convergence.
Remark 3.14 Theorem 3.13 essentially says that the IQR algorithm can compute the
invariant subspace ran(P) of such an operator if there is enough separation between
T restricted to ran(P) and T restricted to ran(I − P); in other words, provided a
dominant invariant subspace exists.
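The rate O(β^n/α^n) in part (i) is easy to observe in finite dimensions (a hedged sketch; all sizes and scales below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 12, 3
# T0 = T1 (+) T2 block diagonal, with ||T1 x|| >= alpha ||x|| and ||T2|| <= beta < alpha
T1 = np.diag([3.0, 2.7, 2.4])
T2 = 0.2 * rng.standard_normal((N - M, N - M))
T0 = np.block([[T1, np.zeros((M, N - M))], [np.zeros((N - M, M)), T2]])
V, _ = np.linalg.qr(rng.standard_normal((N, N)))  # random orthogonal change of basis
T = V @ T0 @ V.T                                  # ran(P) = span of the first M columns of V
P1 = V[:, :M]

Q = np.eye(N)
A = T.copy()
dists = []
for _ in range(40):
    Qs, R = np.linalg.qr(A)
    A = R @ Qs
    Q = Q @ Qs                                    # accumulated Q-sequence: T^n = Q_n R_n
    U = Q[:, :M]                                  # orthonormal basis of span{Q_n e_j}, j <= M
    dists.append(np.linalg.norm(U - P1 @ (P1.T @ U), 2))
print(dists[0], dists[-1])                        # geometric decay toward ran(P)
```

Here the subspace distance is evaluated as ‖(I − P)U‖ for an orthonormal basis U of span{Q_n e_j}_{j=1}^M, which agrees with δ̂ for subspaces of equal finite dimension.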
Proof of Theorem 3.13 The main ideas of the proof of Theorem 3.13 have already been
presented, so we only sketch it. We first define the vectors {ê_j}_{j=1}^M in a similar way
to Definition 3.5, inductively by ê_j = e_{p_j}, where
$$p_j = \min\{i : Pe_i \notin \mathrm{span}\{P\hat e_k\}_{k=1}^{j-1}\}.$$
As before, every subsequence of {Q_n^* T Q_n}_{n∈N} has a further subsequence such that
$$Q_{n_k}^* T Q_{n_k} \xrightarrow{\mathrm{WOT}} W, \qquad k \to \infty.$$
implies that ‖(I − P)Q_n ê_j‖ ≤ C_1(j)r^n. The final part of the theorem then follows
from the same arguments as in the proof of Theorem 3.9. Hence we only need to prove
(a) and (b).
We first claim that
$$\delta\big(\mathrm{span}\{PT^n\hat e_j\}_{j=1}^{\mu},\ \mathrm{span}\{T^n\hat e_j\}_{j=1}^{\mu}\big) \le C_3(\mu)r^n. \qquad (3.22)$$
P commutes with T, which is invertible, and hence both of these spaces have dimension
μ by the construction of the ê_j. It follows that (3.22) implies
$$\hat\delta\big(\mathrm{span}\{PT^n\hat e_j\}_{j=1}^{\mu},\ \mathrm{span}\{T^n\hat e_j\}_{j=1}^{\mu}\big) \le \mu^{\frac12}C_3(\mu)r^n = C_4(\mu)r^n. \qquad (3.23)$$
To show (3.22), let x_1^n, ..., x_μ^n be an orthonormal basis for span{PT^n ê_j}_{j=1}^μ and let
ξ = Σ_{j=1}^μ α_j x_j^n have norm at most 1. Now, we may choose coefficients β_{j,n} such that
T^n Σ_{j=1}^μ β_{j,n} x_j^n = ξ, since T|_{ran(P)} is invertible when viewed as an operator acting
on ran(P). By the assumptions on T we must have that
$$\Big(\sum_{j=1}^{\mu}\big|\beta_{j,n}\big|^{2}\Big)^{1/2} \le \frac{1}{\alpha^{n}}.$$
We may change basis from {ê_j}_{j=1}^μ to {ẽ_j}_{j=1}^μ such that Pẽ_j = x_j^n. Form the vector
$$\eta_n = T^{n}\Big(\sum_{j=1}^{\mu}\beta_{j,n}\tilde e_j\Big) \in \mathrm{span}\{T^n\hat e_j\}_{j=1}^{\mu}.$$
Now, for each m ∈ N we have
$$\mathrm{span}\{PQ_ne_j\}_{j=1}^{m} = \mathrm{span}\{PQ_n\hat e_j\}_{j=1}^{s(m)}, \qquad (3.24)$$
and in particular s(p_μ) = μ. Hence
$$\delta\big(\mathrm{span}\{T^n\hat e_j\}_{j=1}^{\mu},\ \mathrm{span}\{PQ_n\hat e_j\}_{j=1}^{\mu}\big) = \delta\big(\mathrm{span}\{T^n\hat e_j\}_{j=1}^{\mu},\ \mathrm{span}\{PQ_ne_j\}_{j=1}^{p_\mu}\big)$$
$$= \delta\big(\mathrm{span}\{T^n\hat e_j\}_{j=1}^{\mu},\ \mathrm{span}\{PT^ne_j\}_{j=1}^{p_\mu}\big) \le \delta\big(\mathrm{span}\{T^n\hat e_j\}_{j=1}^{\mu},\ \mathrm{span}\{PT^n\hat e_j\}_{j=1}^{\mu}\big) \le C_4(\mu)r^n,$$
where we have used (2.6) to reach the second line, and the fact that span{PT^n ê_j}_{j=1}^μ ⊂
span{PT^n e_j}_{j=1}^{p_μ} to reach the third line. Again, both spaces have dimension μ, so we
have
$$\delta\big(\mathrm{span}\{PQ_n\hat e_j\}_{j=1}^{\mu},\ \mathrm{span}\{Q_ne_j\}_{j=1}^{p_\mu}\big) = \delta\big(\mathrm{span}\{PQ_n\hat e_j\}_{j=1}^{\mu},\ \mathrm{span}\{T^ne_j\}_{j=1}^{p_\mu}\big)$$
$$\le \delta\big(\mathrm{span}\{PQ_n\hat e_j\}_{j=1}^{\mu},\ \mathrm{span}\{T^n\hat e_j\}_{j=1}^{\mu}\big) \le C_5(\mu)r^n. \qquad (3.25)$$
With these arguments out of the way (they are the analogue of Proposition 3.4), we
can now form our inductive argument, similar to the proof of Theorem 3.7. Suppose
first that (a) holds for μ (allowing μ = 0 for the initial step) and let j ∈ Λ have
j < p_{μ+1} (where p_{μ+1} = ∞ if μ = M). From (a) for μ and (3.24) we have that
$$PQ_ne_j = v_n + \sum_{i=1}^{\mu}a_{n,i}Q_n\hat e_i.$$
Moreover, by (3.25), any unit vector ξ in the relevant span can be written as
$$\xi = \sum_{j=1}^{p_{\mu+1}}b_{n,j}Q_ne_j + w_n$$
with ‖w_n‖ ≤ C_5(μ + 1)r^n. Now let j ∈ Λ with j < p_{μ+1}; then we must have
where the square root factor appears since the relevant spaces are μ-dimensional. This
completes the inductive step (the initial step is identical) and hence the proof of the
theorem.
Theorem 3.13 can be made sharper (under a slightly stricter assumption on the
linear independence of {e_j}_{j=1}^M) with the following theorem, which includes the case
that ran(I − P) is not necessarily invariant.
Theorem 3.15 (Convergence to invariant subspace in infinite dimensions) Let T ∈
B(l²(N)) be an invertible operator (not necessarily normal) and suppose that there
exists an orthogonal projection P of finite rank M such that the range of P is invariant
under T. Suppose also that there exists α > β > 0 such that
• ‖Tx‖ ≥ α‖x‖ for all x ∈ ran(P),
• ‖(I − P)T(I − P)‖ ≤ β.
Under these conditions, there exists a canonical M-dimensional T^*-invariant subspace
S, and we let P̃ denote the orthogonal projection onto S (in the special case that
ran(I − P) is also T-invariant, such as in Theorems 3.9 and 3.13, S = ran(P)).
Suppose also that {P̃e_j}_{j=1}^M are linearly independent. Let {Q_n}_{n∈N} and {R_n}_{n∈N} be
Q- and R-sequences of T with respect to {e_i}. Then:
(i) The subspace angle φ(span{e_j}_{j=1}^M, S) < π/2 and we have
$$\hat\delta\big(\mathrm{span}\{Q_ne_j\}_{j=1}^{M},\ \mathrm{ran}(P)\big) \le \frac{\sin\phi(\mathrm{span}\{e_j\}_{j=1}^{M},\mathrm{ran}(P))}{\cos\phi(\mathrm{span}\{e_j\}_{j=1}^{M},S)}\,\frac{\beta^{n}}{\alpha^{n}}\Big(1+\frac{\|PT(I-P)\|}{\alpha-\beta}\Big), \qquad (3.26)$$
(ii) Every subsequence of {Q_n^* T Q_n}_{n∈N} has a WOT-convergent subsequence {Q_{n_k}^* T Q_{n_k}}_{k∈N} such that
$$Q_{n_k}^* T Q_{n_k} \xrightarrow{\mathrm{WOT}} \sum_{j=1}^{M}\xi_j\otimes e_j + \sum_{i>M}\zeta_i\otimes e_i,$$
as k → ∞, where
$$\xi_j \in \mathrm{span}\{e_l\}_{l=1}^{M}, \qquad \zeta_i \in H.$$
Furthermore, if T has only finitely many non-zero entries in each column, then we can
replace WOT convergence by SOT convergence.
Remark 3.16 Theorem 3.15 says that the IQR algorithm can be used to approximate
dominant invariant subspaces. In particular, we shall use the bound (3.26) to build a Σ₁
algorithm in Sect. 5. Note that in the normal case Theorem 3.9 is more precise, both
in giving convergence of individual vectors to eigenvectors and in its less restrictive
assumptions on spanning sets and M. In the normal case (and that of Theorem 3.13)
we also have that the limit operator has a block diagonal form.
In this section we will prove Theorem 3.15. The proof technique is different from
those used above, and hence we give it a separate section. Throughout, we will
denote the ratio β/α by r. Note that since M is finite, the bound α implies that
T|_{ran(P)} : ran(P) → ran(P) is invertible with ‖T|_{ran(P)}^{-1}‖ ≤ 1/α. First, let Q denote
a unitary change-of-basis matrix from {e_j} to {ẽ_j}, where {ẽ_j}_{j=1}^M is a basis for ran(P).
Then, as matrices with respect to the original basis, we can write
$$Q = [P_1, P_2], \qquad Q^* T Q = \begin{pmatrix} T_{11} & T_{12} \\ 0 & T_{22} \end{pmatrix},$$
where T_{11} ∈ C^{M×M} and T_{12} has M rows. Our assumptions imply that ‖T_{11}^{-1}‖ ≤ 1/α
and ‖T_{22}‖ ≤ β. The next lemma shows that we can change the basis further to eliminate
the sub-block T_{12}. This is needed to apply a power-iteration type argument.
Lemma 3.17 Define the linear function F : B(l²(N), C^M) → B(l²(N), C^M) by
$$F(A) = T_{11}^{-1}AT_{22}.$$
It is then straightforward to check that A − F(A) = −T_{11}^{-1}T_{12}, that B(A)B(−A) =
B(−A)B(A) = I, and the identity (3.27).
Let
$$Y = Q\begin{pmatrix} I & 0 \\ -A^* & I \end{pmatrix}.$$
$$T^{n}P_0 = Q_nR_nP_0 = Q_nP_0P_0^*R_nP_0.$$
From this one obtains
$$T_{11}^{n}(V_0^1 - AV_0^2) = (V_n^1 - AV_n^2)Z_n, \qquad (3.28)$$
$$T_{22}^{n}V_0^2 = V_n^2Z_n, \qquad (3.29)$$
and
$$\hat\delta\big(\mathrm{span}\{Q_ne_j\}_{j=1}^{M},\ \mathrm{ran}(P)\big) = \|V_n^2\|. \qquad (3.30)$$
To see (3.30), note that
$$\hat\delta\big(\mathrm{span}\{Q_ne_j\}_{j=1}^{M},\ \mathrm{ran}(P)\big) = \|Q_nP_0P_0^*Q_n^* - P_1P_1^*\| = \|Q_n^*(Q_nP_0P_0^*Q_n^* - P_1P_1^*)Q\| = \left\|\begin{pmatrix} 0 & P_0^*Q_n^*P_2 \\ -(I-P_0)^*Q_n^*P_1 & 0 \end{pmatrix}\right\|.$$
But we have that P_0^*Q_n^*P_2 = V_n^2, and hence we are done if we can show
‖P_0^*Q_n^*P_2‖ = ‖(I − P_0)^*Q_n^*P_1‖. Consider the unitary matrix
$$U := Q_n^*Q = \begin{pmatrix} P_0^*Q_n^*P_1 & P_0^*Q_n^*P_2 \\ (I-P_0)^*Q_n^*P_1 & (I-P_0)^*Q_n^*P_2 \end{pmatrix} = \begin{pmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{pmatrix}.$$
$$\|(V_0^1 - AV_0^2)^{-1}\| \le \frac{1}{\cos\phi(\mathrm{span}\{e_j\}_{j=1}^{M}, S)}. \qquad (3.31)$$
$$Y = [P_1 - P_2A^*,\ P_2],$$
and hence the columns of W are a basis for the subspace S. Arguing as in the proof
of Lemma 3.18, we have that
$$\hat\delta\big(\mathrm{span}\{e_j\}_{j=1}^{M}, S\big) = \sqrt{1 - \sigma_0(W^*P_0)^2} < 1.$$
Since (I + AA^*)^{-1/2} has norm at most 1, we see that (V_0^1 − AV_0^2) is invertible and
(3.31) holds.
Proof of Theorem 3.15 Using Lemma 3.19 and the matrix identities (3.28) and (3.29),
we can write
$$V_n^2 = T_{22}^{n}V_0^2(V_0^1 - AV_0^2)^{-1}T_{11}^{-n}(V_n^1 - AV_n^2).$$
with these coefficients forming null sequences. But again, by (i), PQ_ne_j approaches
span{Q_ne_k}_{k=1}^M, which is orthogonal to Q_ne_i, and hence {α_{j,n}} is null. The proof
of part (ii) now follows the same argument as in the proof of part (i) of Theorem 3.9
and of the final part of Theorem 3.13, the key property being that if j ≤ M and i > M,
then ⟨Q_n^*TQ_ne_j, e_i⟩ → 0 due to the invariance of ran(P) under T. Note that it
does not necessarily follow (as is easily seen by considering upper triangular T) that
⟨Q_n^*TQ_ne_i, e_j⟩ → 0 for such i, j.
The previous section gives a theoretical justification for why the IQR algorithm may
work, but we are faced with the possibly unpleasant problem of how to compute with
infinite data structures on a computer. Fortunately, there is a way to overcome such a
problem. The key is to impose some structural requirements on the infinite matrix.
Definition 4.1 Let T be an infinite matrix acting as a bounded operator on l²(N) with
basis {e_j}_{j∈N}. For f : N → N non-decreasing with f(n) ≥ n, we say that T has
quasi-banded subdiagonals with respect to f if ⟨Te_j, e_i⟩ = 0 whenever i > f(j).
This is the class of infinite matrices with a finite number of non-zero elements in
each column (but not necessarily in each row), which is captured by the function f. It is
for this class that the computation of the IQR algorithm is feasible on a finite machine.
For this class of operators one can actually compute (without any approximation or
any extra discretisation) the matrix elements of the n-th iteration of the IQR algorithm
as if it were done on an infinite computer (meaning the computation collapses to a
finite one). The following result of independent interest is needed in the proof and
generalises the well-known fact in finite dimensions that the QR algorithm preserves
bandwidth (see [57] for a good discussion of the tridiagonal case).
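On truncations, the defining condition of Definition 4.1 is straightforward to test (a small helper, not from the paper; arrays are 0-based while the definition is 1-based, hence the index shift):

```python
import numpy as np

def is_quasi_banded(A, f):
    """True iff A[i, j] = 0 whenever i > f(j), in the 1-based indexing of Definition 4.1."""
    # column j (1-based index j + 1) may only be non-zero in rows 1..f(j + 1)
    return all(np.allclose(A[f(j + 1):, j], 0.0) for j in range(A.shape[1]))

tridiag = np.diag(np.ones(5), 1) + np.diag(np.ones(5), -1) + np.diag(np.arange(6.0))
print(is_quasi_banded(tridiag, lambda n: n + 1))          # True: one subdiagonal, f(n) = n + 1
print(is_quasi_banded(np.ones((6, 6)), lambda n: n + 1))  # False: full lower triangle
```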
Proposition 4.2 Let T ∈ B(l²(N)) and let T_n be the n-th element in the IQR iteration,
i.e. T_n = Q_n^* ⋯ Q_1^* T Q_1 ⋯ Q_n, where
$$Q_j = \text{SOT-}\lim_{l\to\infty}U_1^{j}\cdots U_l^{j}$$
and each U_l^j is a Householder transformation. If T has quasi-banded subdiagonals with
respect to f, then so does T_n.

Proof By induction, it is enough to prove the result for n = 1. From the construction
of the Householder reflections U_m^1 = P_{m−1} ⊕ S_m, the chosen η_m (see Theorem 2.2)
satisfy η_m ∈ span{e_k : m ≤ k ≤ f(m)}.
Using the fact that f is non-decreasing, it follows that each U_m^1 has quasi-banded
subdiagonals with respect to f, as does the product U_1^1 ⋯ U_m^1. It follows that Q_1 must
have quasi-banded subdiagonals with respect to f, and hence so does T_1 = R_1Q_1, since
R_1 is upper triangular.
Theorem 4.3 Let T ∈ B(l²(N)) have quasi-banded subdiagonals with respect to f
and let T_n be the n-th element in the IQR iteration, i.e. T_n = Q_n^* ⋯ Q_1^* T Q_1 ⋯ Q_n,
where
$$Q_j = \text{SOT-}\lim_{l\to\infty}U_1^{j}\cdots U_l^{j}$$
and U_l^j is a Householder transformation (the superscript is not a power, but an index).
Let P_m be the usual projection onto span{e_j}_{j=1}^m and denote the a-fold iteration of f
by
$$f^{a} = \underbrace{f\circ f\circ\cdots\circ f}_{a\ \text{times}}.$$
Then P_m T_n P_m can be computed using only the finitely many entries of P_{f^n(m)} T P_{f^n(m)}.
Remark 4.4 What Theorem 4.3 says is that to compute the finite section of size m of the
n-th iteration of the IQR algorithm (i.e. P_m T_n P_m), one only needs information from
the finite section of size f^n(m) (i.e. P_{f^n(m)} T P_{f^n(m)}), since the relevant Householder
reflections can be computed from this information. In other words, the IQR algorithm
can be computed.
To see why this is true, note that by the assumption that T has quasi-banded subdiagonals
with respect to f, Proposition 4.2 shows that T_n has quasi-banded subdiagonals
with respect to f for all n ∈ N. Thus, it follows from the construction in the proof of
Theorem 2.2 that each U_l^j is of the form
$$U_l^{j} = I_{l,j,1} \oplus \Big(I_{l,j,2} - \frac{2}{\|\xi_{l,j}\|^{2}}\,\xi_{l,j}\otimes\bar\xi_{l,j}\Big) \oplus I_{l,j,3},$$
where I_{l,j,1} denotes the identity on P_{l−1}H, I_{l,j,2} denotes the identity on span{e_k : l ≤
k ≤ f(l)}, I_{l,j,3} denotes the identity on P_{f(l)}^⊥H, and ξ_{l,j} ∈ span{e_k : l ≤ k ≤ f(l)}.
Since P_m is compact, it then follows that P_m T_n P_m can be evaluated using only finitely
many of the U_l^j.
Remark 4.5 This result allows us to implement the IQR algorithm because each U_l^j
only affects finitely many rows or columns of a matrix when multiplied on the left or
the right. In computer science, computing with infinite data structures while deferring
use of the information until it is needed is often referred to as "lazy evaluation". A
simple implementation is shown in the appendix for the case where the matrix has k
subdiagonals (i.e. f(n) = n + k).
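Remark 4.4 suggests a simple experiment (a hedged sketch, not the paper's implementation: unshifted QR steps on truncations stand in for the Householder-based infinite computation, and the potential 2/j and all sizes are arbitrary choices). For a tridiagonal matrix f(n) = n + 1, so computing an m × m corner of the n-th iterate should require only the section of size f^n(m) = m + n; a far larger section indeed gives the same corner:

```python
import numpy as np

def iqr_corner(T_big, n_steps, m):
    """Run n unshifted QR steps on a finite truncation and return the m x m corner."""
    A = T_big.copy()
    for _ in range(n_steps):
        Q, R = np.linalg.qr(A)
        A = R @ Q
    return A[:m, :m]

def tridiag(N):
    # discrete Schroedinger-type matrix: off-diagonals 1, decaying potential 2/j
    d = 2.0 / np.arange(1, N + 1)
    return np.diag(d) + np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)

m, n = 5, 20
small = iqr_corner(tridiag(m + n), n, m)   # section of size f^n(m) = m + n
big   = iqr_corner(tridiag(200), n, m)     # generous section for comparison
print(np.max(np.abs(small - big)))         # agreement at machine precision
```

This is a numerical check, not a proof; it relies on both runs making the same reflector sign choices, which holds here because the shared columns of the two truncations are identical.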
The next question is: how restrictive is the assumption in Definition 4.1? In particular,
suppose that T ∈ B(H) and that ξ ∈ H is a cyclic vector for T (i.e.
span{ξ, Tξ, T²ξ, ...} is dense in H). Then by applying the Gram–Schmidt procedure
to {ξ, Tξ, T²ξ, ...} we obtain an orthonormal basis {η_1, η_2, η_3, ...} for H such that
the matrix representation of T with respect to {η_1, η_2, η_3, ...} is upper Hessenberg,
and thus the matrix representation has only one subdiagonal. The question is therefore
about the existence of a cyclic vector. Note that if T has no non-trivial invariant subspaces,
then every non-zero vector ξ ∈ H is a cyclic vector. Now what happens if ξ is not cyclic for
T? We may still form {η_1, η_2, η_3, ...} as above; however, H_1 = span{η_1, η_2, η_3, ...}
is now an invariant subspace for T with H_1 ≠ H. We may still form a matrix representation
of T with respect to {η_1, η_2, η_3, ...}, but this will now be a matrix representation
of T|_{H_1}. Obviously, we can have σ(T|_{H_1}) ≠ σ(T).
However, the following example shows that the class of matrices for which we
can compute the IQR algorithm covers a wide range of applications. In particular, it
includes all finite-interaction Hamiltonians on graphs. Such operators play a prominent
role in solid state physics [48,51], describing propagation of waves and spin waves, and
encompass the Jacobi operators studied in many physical models and integrable
lattices [71].
Example 4.6 Consider a connected, undirected graph G, such that each vertex degree is
finite and the set of vertices V(G) is countably infinite. Consider the set of all bounded
operators A on l²(V(G)) ≅ l²(N) such that the set S(v) := {w ∈ V : ⟨w, Av⟩ ≠ 0}
is finite for any v ∈ V. Suppose our enumeration of the vertices obeys the following
pattern: e_1's neighbours (including itself) are S_1 = {e_1, e_2, ..., e_{q_1}} for some finite
q_1; the set of neighbours of these vertices is S_2 = {e_1, ..., e_{q_2}} for some finite
q_2, where we continue the enumeration of S_1; and this process continues inductively,
enumerating S_m. If we know S(v) for all v ∈ V, then we can find an f : N → N such
that A_{j,m} = 0 if j > f(m). We simply choose f(n) = q_{r_n}, where r_n is minimal such
that ∪_{j≤n} S(e_j) ⊂ S_{r_n}.
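The enumeration described in Example 4.6 is a breadth-first traversal of the graph. A sketch (the `neighbors` callback and the binary-tree example are illustrative assumptions; for an infinite graph one would generate the order lazily):

```python
from collections import deque

def bfs_enumeration(neighbors, root=0, count=15):
    """First `count` vertices in the breadth-first order of Example 4.6:
    S_1 = the root and its neighbours, S_2 = their neighbours, and so on."""
    order, seen, queue = [], {root}, deque([root])
    while queue and len(order) < count:
        v = queue.popleft()
        order.append(v)
        for w in neighbors(v):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order

# infinite binary tree: vertex v has parent (v - 1) // 2 and children 2v + 1, 2v + 2
tree = lambda v: ([] if v == 0 else [(v - 1) // 2]) + [2 * v + 1, 2 * v + 2]
print(bfs_enumeration(tree))  # [0, 1, 2, ..., 14]: adjacency becomes quasi-banded
```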
More generally, given an invertible operator T with information on how its columns
decay at infinity, we can compute finite sections of the IQR iterates with error control.
For computing spectral properties, we can assume, by shifting T → T + λI and then
translating the output back by −λ, that the operator we are interested in is invertible; hence the
invertibility criterion is not that restrictive. Throughout, we will use the following
lemma, which says that for invertible operators the QR decomposition is essentially
unique.
Another way to see this result is to note that the columns of Q are obtained by
applying the Gram–Schmidt procedure to the columns of T. The restriction that R_{ii} ∈
R_{>0} can also be incorporated into Theorem 4.3. In this subcase, Theorem 4.3 follows
from the identity
$$P_m T^{n} P_m = P_m\big(P_{f^n(m)} T P_{f^n(m)}\big)^{n} P_m$$
and the relations (2.5): we can apply Gram–Schmidt (or a more stable modified version)
to the columns of P_{f^n(m)} T P_{f^n(m)} and truncate the resulting matrix.
Assume that, given T ∈ B(l²(N)) invertible (not necessarily with quasi-banded
subdiagonals), we can evaluate an increasing family of increasing functions g_j :
N → N such that, defining the matrix T_{(j)} with columns {P_{g_j(n)} T e_n}, we have that T_{(j)}
is invertible and
$$\big\|(P_{g_j(n)} - I)Te_n\big\| \le \frac{1}{j}. \qquad (4.5)$$
It is easy to see that such a sequence of functions must exist, since any S with ‖S − T‖ <
‖T^{-1}‖^{-1} is invertible. Given this information, without loss of generality (by increasing
the g_j's pointwise if necessary, applying Hölder's inequality and taking subsequences),
we may assume that ‖T_{(j)} − T‖ ≤ 1/j. In other words, given a sequence of functions
satisfying (4.5), we can evaluate a sequence of functions with this stronger condition.
The following says that, given such a sequence of functions, we can compute the
truncations P_m T_n P_m to a given precision.
Theorem 4.8 Suppose T ∈ B(l²(N)) is invertible and the family of functions {g_j} is
as above. Suppose also that we are given a bound C such that ‖T‖ ≤ C. Let ε > 0 and
m, n ∈ N. Then we can choose j such that, applying Theorem 4.3 (with the diagonal
operators to ensure R_{ii} > 0) to T_{(j)} using the function g_j instead of f, we have the
guaranteed bound
$$\|P_m T_n P_m - P_m T_{(j),n} P_m\| \le \varepsilon,$$
Proof of Theorem 4.8 First consider the error when applying Theorem 4.3 to T_{(j)} with
g_j, for any fixed j. We will show that we can compute an error bound which converges
to zero as j → ∞; from this the theorem easily follows by successively computing
the bound and halting when this bound is less than ε.
Write the QR decompositions T^n = Q̂_n R̂_n and (T_{(j)})^n = Q̂_{(j),n} R̂_{(j),n}. Since
‖T − T_{(j)}‖ ≤ 1/j and ‖T‖ ≤ C, we have
$$\big\|T^{n} - (T_{(j)})^{n}\big\| \le \sum_{k=1}^{n}\binom{n}{k}C^{n-k}\frac{1}{j^{k}} \le \frac{(C+1)^{n}}{j} = \frac{\tilde C}{j},$$
where C̃ = (C + 1)^n. The columns of Q̂_n and Q̂_{(j),n} are simply the columns of the
matrices T^n and (T_{(j)})^n after the application of Gram–Schmidt. Let the first m columns
of T^n and (T_{(j)})^n be denoted by {t_k}_{k=1}^m and {t̃_k^j}_{k=1}^m respectively, and let {q_k}_{k=1}^m and
{q̃_k^j}_{k=1}^m be the vectors obtained after applying Gram–Schmidt to these sequences of
vectors. We then have
$$\|q_1 - \tilde q_1^{j}\| = \left\|\frac{t_1}{\|t_1\|} - \frac{\tilde t_1^{j}}{\|\tilde t_1^{j}\|}\right\| = \left\|\frac{t_1\big(\|\tilde t_1^{j}\| - \|t_1\|\big)}{\|t_1\|\,\|\tilde t_1^{j}\|} - \frac{\tilde t_1^{j} - t_1}{\|\tilde t_1^{j}\|}\right\| \le \frac{2\|t_1 - \tilde t_1^{j}\|}{\|\tilde t_1^{j}\|} \le \frac{2\tilde C}{j\|\tilde t_1^{j}\|}. \qquad (4.6)$$
For a vector v of unit norm, let P_{⊥v} denote the orthogonal projection onto the
space of vectors perpendicular to v. Note that for two such vectors v, w, we have
‖P_{⊥v} − P_{⊥w}‖ ≤ ‖v − w‖. Let
$$v_k = P_{\perp q_{k-1}}\cdots P_{\perp q_1}t_k, \qquad \tilde v_k^{j} = P_{\perp\tilde q_{k-1}^{j}}\cdots P_{\perp\tilde q_1^{j}}\tilde t_k^{j}; \qquad (4.7)$$
then q_k is just the normalised version of v_k, and likewise q̃_k^j is just the normalised
version of ṽ_k^j. Suppose that for μ < k we have ‖q_μ − q̃_μ^j‖ ≤ δ for some δ > 0. Then,
applying the above products of projections, we have
$$\|v_k - \tilde v_k^{j}\| \le \big\|P_{\perp q_{k-1}}\cdots P_{\perp q_1}(t_k - \tilde t_k^{j})\big\| + \big\|\big(P_{\perp q_{k-1}}\cdots P_{\perp q_1} - P_{\perp\tilde q_{k-1}^{j}}\cdots P_{\perp\tilde q_1^{j}}\big)\tilde t_k^{j}\big\| \le \|t_k - \tilde t_k^{j}\| + (k-1)\delta\|\tilde t_k^{j}\|.$$
In the last line we have used the fact that if the operators {A_l}_{l=1}^m and {B_l}_{l=1}^m have
norm bounded by 1, then
$$\Big\|\prod_{l=1}^{m}A_l - \prod_{l=1}^{m}B_l\Big\| \le \sum_{l=1}^{m}\|A_l - B_l\|.$$
It follows that
$$\|q_k - \tilde q_k^{j}\| \le \frac{2\big(\|t_k - \tilde t_k^{j}\| + (k-1)\delta\|\tilde t_k^{j}\|\big)}{\|\tilde v_k^{j}\|} \le \frac{2\big(\tilde C/j + 2(k-1)\delta\tilde C\big)}{\|\tilde v_k^{j}\|}, \qquad (4.8)$$
since ‖t̃_k^j‖ ≤ C^n + C̃/j ≤ 2C̃. Now note that we can compute the ‖ṽ_k^j‖ from the proof
of Theorem 4.3. Set δ_1(j) = 2C̃/(j‖t̃_1^j‖) and, for 1 < k ≤ m, define iteratively
$$\delta_k(j) = \max\left\{\delta_{k-1}(j),\ \frac{2\big(\tilde C/j + 2(k-1)\delta_{k-1}(j)\tilde C\big)}{\|\tilde v_k^{j}\|}\right\}.$$
We must have ‖q_k − q̃_k^j‖ ≤ δ_m(j) for 1 ≤ k ≤ m, where we have now shown the j-dependence
as an argument.
It follows that ‖(Q̂_n − Q̂_{(j),n})P_m‖ ≤ √m δ_m(j), and hence that
$$\|P_m T_n P_m - P_m T_{(j),n} P_m\| \le \big\|P_m(\hat Q_n - \hat Q_{(j),n})^{*}T\hat Q_nP_m\big\| + \big\|P_m\hat Q_{(j),n}^{*}\big(T\hat Q_n - T_{(j)}\hat Q_{(j),n}\big)P_m\big\|$$
$$\le \sqrt m\,\delta_m(j)C + \big\|(T - T_{(j)})\hat Q_{(j),n}P_m\big\| + \big\|T(\hat Q_n - \hat Q_{(j),n})P_m\big\| \le 2\sqrt m\,\delta_m(j)C + \frac{1}{j}.$$
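The heart of this proof, that Gram–Schmidt applied to nearby sets of columns yields nearby orthonormal vectors, with the bound degrading in k as in the δ_k(j) recursion, can be checked numerically (a hedged sketch; the sizes, the perturbation level and the diagonal shift that keeps the columns well conditioned are all arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps = 8, 1e-8
T = rng.standard_normal((n, n)) + 3 * np.eye(n)   # well-conditioned columns
Tp = T + eps * rng.standard_normal((n, n))        # small columnwise perturbation
Q, _ = np.linalg.qr(T)                            # Householder QR = Gram-Schmidt up to signs
Qp, _ = np.linalg.qr(Tp)
S = np.sign(np.sum(Q * Qp, axis=0))               # fix the per-column sign ambiguity
err = np.linalg.norm(Q - Qp * S, axis=0)
print(err)  # columnwise errors remain of the order of the perturbation
```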
In this section we will apply the above results to prove three new classification theorems
in the SCI hierarchy. First, assume that T ∈ B(l²(N)) is an invertible normal operator
with σ(T) = ω ∪ Ω, where ω ∩ Ω = ∅, ω = {λ_i}_{i=1}^N, and the λ_i's are isolated
eigenvalues with multiplicity m_i satisfying |λ_1| > ⋯ > |λ_N|. As usual, we also
assume that sup{|θ| : θ ∈ Ω} < |λ_N|, and set
$$M := m_1 + \cdots + m_N \in \mathbb{N}\cup\{\infty\}. \qquad (5.1)$$
In this section we will assume for simplicity that all the m_i, except possibly m_N, are
finite. To be able to obtain the classification results we need two key assumptions.
(I) Column decay We assume a much weaker condition than bandedness of the infinite
matrix. Indeed, we suppose a known decay of the elements in the columns of T that
is described through a family of increasing functions {g_j}_{j∈N}. In particular, g_j :
N → N is such that, defining the infinite matrix T_{(j)} with columns {P_{g_j(n)} T e_n}_{n∈N},
we have that T_{(j)} is invertible and
$$\big\|(P_{g_j(n)} - I)Te_n\big\| \le \frac{1}{j}, \qquad n \in \mathbb{N}. \qquad (5.2)$$
(II) Distance to span of eigenvectors In order to obtain error control (Σ₁ classification),
one needs to control the hidden constant in the O(r^n) estimate in (3.16). This is
possible when the IQR algorithm converges with the expected ordering (largest eigenvalue in the first
diagonal entry, then in descending order). It follows from Theorems 3.9 and 3.7
that there exist eigenspaces E_1, ..., E_N (with the last space depending on k and
the vectors {e_j}) corresponding to the eigenvalues λ_1, ..., λ_N such that:
– E_i = ker(T − λ_iI) is the full eigenspace if i < N;
– δ̂(⊕_{i=1}^{l} E_i, span{Q_n e_j}_{j=1}^{min{m_1+⋯+m_l, k}}) → 0 as n → ∞, for l = 1, ..., N.
Define
$$\Theta(T, \{e_j\}_{j=1}^{k}) := \sup_{l=1,\ldots,N}\ \phi\Big(\bigoplus_{i=1}^{l}E_i,\ \mathrm{span}\{e_j\}_{j=1}^{\min\{m_1+\cdots+m_l,\,k\}}\Big), \qquad (5.3)$$
where φ, defined by (1.4), denotes the subspace angle. Our assumptions and the
proofs in Sect. 3 show that Θ(T, {e_j}_{j=1}^k) < π/2, and hence the key quantity
tan Θ(T, {e_j}_{j=1}^k) is finite.
Remark 5.1 The quantity tan Θ(T, {e_j}_{j=1}^k) can be viewed as a measure of how far
{e_j}_{j=1}^k is from {q_j}_{j=1}^k, the k eigenvectors of T corresponding to the first k eigenvalues
(including multiplicity and preserving order). Hence it gives an estimate of how good
the initial approximation {e_j}_{j=1}^k to {q_j}_{j=1}^k is. Indeed, we know from (3.16) that
the convergence rate is O(r^n), and the hidden constant C depends exactly on this
behaviour. In particular, if e_j = q_j for j ≤ k then C = 0.
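For finite-dimensional subspaces, angles of the kind appearing in Θ(T, {e_j}_{j=1}^k) can be computed from principal angles via an SVD (a sketch; that the angle φ of (1.4) coincides with the largest principal angle is an assumption made here):

```python
import numpy as np

def largest_principal_angle(U, V):
    """Largest principal angle between ran(U) and ran(V); columns are orthonormalized first."""
    Qu, _ = np.linalg.qr(U)
    Qv, _ = np.linalg.qr(V)
    s = np.linalg.svd(Qu.T @ Qv, compute_uv=False)  # cosines of the principal angles
    return np.arccos(np.clip(s.min(), -1.0, 1.0))

t = 0.3
e1 = np.array([[1.0], [0.0], [0.0]])
v = np.array([[np.cos(t)], [np.sin(t)], [0.0]])
print(largest_principal_angle(e1, v))  # recovers the rotation angle 0.3
```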
Define also the convergence rate r(T) appearing in (3.16).
We can now define the class of operators Ω^k_{t,L} for the classification theorem.
Definition 5.2 Given k ∈ N, t ∈ (0, 1) and L > 0, let Ω^k_{t,L} denote the class of
invertible normal operators T acting on l²(N) with ‖T‖ ≤ L such that:
1. There exists the decomposition σ(T) = ω ∪ Ω as above with m_1 + ⋯ + m_{N−1} <
k ≤ M, where M is defined in (5.1).
2. If m_1 + ⋯ + m_l < k then {χ_{{λ_1,...,λ_l}}(T)e_j}_{j=1}^{m_1+⋯+m_l} are linearly independent.
Also, the vectors {χ_{{λ_1,...,λ_N}}(T)e_j}_{j=1}^k are linearly independent.
3. We have access to functions g_j : N → N with (5.2).
4. It holds that r(T) ≤ t and tan Θ(T, {e_j}_{j=1}^k) ≤ L.
We can now define the computational problem that we want to classify in the SCI
hierarchy. Consider, for any T ∈ Ω^k_{t,L}, the problem of computing the k largest eigenvalues
(including multiplicity) and the corresponding eigenspaces. In other words, we
consider Ξ_1(T) = S, where we define
$$S := \Big\{\big(\underbrace{\lambda_1,\ldots,\lambda_1}_{m_1\ \text{times}},\ \ldots,\ \underbrace{\lambda_N,\ldots,\lambda_N}_{k-(m_1+\cdots+m_{N-1})\ \text{times}}\big)\times(\hat q_1,\ldots,\hat q_k) :$$
$$\text{s.t.}\ \{\hat q_j\}_{j=m_1+\cdots+m_{l-1}+1}^{m_1+\cdots+m_l}\ \text{is an orthonormal basis of}\ \mathrm{ran}(\chi_{\lambda_l}(T))\ \text{for}\ l<N$$
$$\text{and}\ \{\hat q_j\}_{j=m_1+\cdots+m_{N-1}+1}^{k}\ \text{is an orthonormal basis for a subspace of}\ \mathrm{ran}(\chi_{\lambda_N}(T))\Big\}.$$
Having established the basic definitions, we can now present the classification theorem.

Theorem 5.3 (Σ₁ classification for the extremal part of the spectrum) Given the above
set-up, we have {Ξ_1, Ω^k_{t,L}} ∈ Σ₁. In other words, for all n ∈ N, there exists a general
tower using radicals, Γ_n(T), such that for all T ∈ Ω^k_{t,L},
$$\mathrm{dist}(\Gamma_n(T), \Xi_1(T)) \le 2^{-n}.$$

Remark 5.4 Note that this means we converge to the k largest-magnitude eigenvalues
in order with error control, and not just to arbitrary points of the spectrum. This is in
contrast to most Σ₁ classifications in the SCI hierarchy, where the best we can hope
for is to bound dist(z, σ(T)) for z ∈ C.
Proof of Theorem 5.3 Let T ∈ Ω^k_{t,L}; then, by the definition of Ω^k_{t,L}, we may take ê_j = e_j
for j = 1, ..., k in the arguments of Sect. 3.1. The first step is to bound Z(T, {e_j}_{j=1}^k)
in terms of Θ(T, {e_j}_{j=1}^k). Let {ẽ_j}_{j=1}^k denote the basis described in Sect. 3.1. In our
case:
• For any 1 ≤ i ≤ k, span{ẽ_j}_{j=1}^i = span{e_j}_{j=1}^i.
• If j > m_1 + ⋯ + m_l then χ_{λ_l}(T)ẽ_j = 0.
• The vectors {χ_{λ_l}(T)ẽ_j}_{j=m_1+⋯+m_{l-1}+1}^{min{m_1+⋯+m_l, k}} are orthonormal.
$$\frac{\delta_j^2 - 1}{\delta_j^2} \le \delta\Big(\mathrm{span}\{\tilde e_j\},\ \bigoplus_{i=1}^{l}\mathrm{span}\{\chi_{\{\lambda_i\}}(T)\tilde e_j\}_{j=m_1+\cdots+m_{i-1}+1}^{\min\{m_1+\cdots+m_i,\,k\}}\Big)^{2}$$
$$\le \delta\Big(\mathrm{span}\{\tilde e_j\}_{j=1}^{\min\{m_1+\cdots+m_l,\,k\}},\ \bigoplus_{i=1}^{l}\mathrm{span}\{\chi_{\{\lambda_i\}}(T)\tilde e_j\}_{j=m_1+\cdots+m_{i-1}+1}^{\min\{m_1+\cdots+m_i,\,k\}}\Big)^{2}$$
$$= \delta\Big(\mathrm{span}\{e_j\}_{j=1}^{\min\{m_1+\cdots+m_l,\,k\}},\ \bigoplus_{i=1}^{l}E_i\Big)^{2} \le \sin^{2}\Theta(T, \{e_j\}_{j=1}^{k}),$$
where the first line holds since the nearest point to ẽ_j in ⊕_{i=1}^{l} span{χ_{{λ_i}}(T)ẽ_j}_{j=m_1+⋯+m_{i-1}+1}^{min{m_1+⋯+m_i, k}}
is simply χ_{λ_l}(T)ẽ_j, and the E_i are defined as above and in (3.1).
Rearranging, this implies that
$$\delta_j^2 \le \frac{1}{1 - \sin^{2}\Theta(T,\{e_j\}_{j=1}^{k})} = \frac{1}{\cos^{2}\Theta(T,\{e_j\}_{j=1}^{k})}.$$
Hence
$$Z(T, \{e_j\}_{j=1}^{k}) = \Big(\sum_{j=1}^{k}(\delta_j^2 - 1)\Big)^{\frac12} \le \Big(\sum_{j=1}^{k}\tan^{2}\Theta(T,\{e_j\}_{j=1}^{k})\Big)^{\frac12} \le \sqrt k\,L.$$
Note that we do not need to assume knowledge of N for this bound (trivially N ≤ k).
Using that Q_m is an isometry, this implies that
$$\big|\langle Q_m^* T Q_m e_j, e_j\rangle - \lambda_{a_j}\big| \le 2\|T\|\beta t^{m} \le 2L\beta t^{m},$$
where T q̂_j = λ_{a_j} q̂_j. Note that we must have {λ_{a_j}}_{j=m_1+⋯+m_{l-1}+1}^{m_1+⋯+m_l} = λ_l and
{λ_{a_j}}_{j=m_1+⋯+m_{N-1}+1}^{k} = λ_N by 3. in the definition of Ω^k_{t,L}.
Given any ε > 0, choose m large enough so that 2Lβt^m ≤ ε and βt^m ≤ ε. The facts
that ‖T‖ ≤ L and that (5.2) holds imply that we can compute ⟨Q_m^* T Q_m e_j, e_j⟩ to accuracy ε
using finitely many arithmetic and square-root operations, by Theorem 4.8. Call
these approximations λ̃_1, λ̃_2, ..., λ̃_k. Furthermore, the proof of Theorem 4.8 also
makes clear that we can compute Q_m e_j ∈ l²(N) to accuracy ε using finitely many
arithmetic and square-root operations (the approximations have finite support). Call
these approximations q̃_1, q̃_2, ..., q̃_k. Then set
$$\Gamma^{\varepsilon}(T) := (\tilde\lambda_1,\ldots,\tilde\lambda_k)\times(\tilde q_1,\ldots,\tilde q_k).$$
The above estimates show that dist(Γ^ε(T), Ξ_1(T)) ≤ 4kε. The proof is completed by
setting Γ_n(T) = Γ^{2^{-(n+2)}/k}(T).
Note that by Theorem 3.9 this includes all normal compact operators T such that
{z ∈ σ(T) : |z| = s} has size at most 1 for all s > 0 (where we can take g(x) = x).²
We will allow evaluations of g in our algorithms, and also assume that we are given
functions that satisfy (5.2) and have an upper bound for ‖T‖. We consider computing
Ξ_2(T) = σ(T) in the space of compact non-empty subsets of C with the Hausdorff
metric.
Theorem 5.5 (Σ₁ classification for spectrum) Given the above set-up, we have
{Ξ_2, Ω^g_{IQR}} ∈ Σ₁. In other words, there is a convergent sequence of general towers
using radicals, Γ_n(T), such that Γ_n(T) → Ξ_2(T) = σ(T) for any T ∈ Ω^g_{IQR}, and
for all n we have
$$\Gamma_n(T) \subset \sigma(T) + B_{2^{-n}}(0).$$
Proof of Theorem 5.5 Let T ∈ Ω^g_{IQR} and let Q_m be a Q-sequence of T. Fix n ∈ N. Then
Theorem 4.8 shows that we can compute any finite number of the diagonal entries
of Q_m^* T Q_m to any given accuracy using finitely many arithmetic and square-root
operations. Similarly, the proof shows that we can compute T Q_m e_j and Q_m e_j to any
given accuracy in l²(N) (the approximations have finite support). Now let α_{j,m} be the
computed approximations of ⟨Q_m^* T Q_m e_j, e_j⟩ to accuracy 1/m; then, since T ∈ Ω^g_{IQR},
these diagonal entries converge to points of σ(T). We have that
$$\big\|(T - \alpha_{j,m}I)^{-1}\big\|^{-1} \le \big\|TQ_me_j - \alpha_{j,m}Q_me_j\big\|. \qquad (5.4)$$

² A simple compactness argument shows that for any bounded operator T there is a corresponding function
g that works.
Given m and j, we can compute an upper bound h_{j,m} for the right-hand side of (5.4)
by approximating the norm ‖TQ_me_j − α_{j,m}Q_me_j‖ from above to accuracy 1/m
using finitely many evaluations of g. Namely, let x_{j,m} be the approximation of
‖TQ_me_j − α_{j,m}Q_me_j‖ and set h_{j,m} = x_{j,m} + 1/m. One then obtains
$$\Gamma_n(T) \subset \sigma(T) + B_{2^{-n}}(0).$$
We also assume that we are given functions that satisfy (5.2) and consider computing
the dominant invariant subspace 3 (T ) = ran(P) in the space of M-dimensional
subspaces of l 2 (N) equipped with the metric δ̂.
Theorem 5.6 (Δ1 classification for dominant invariant subspace) Given the above
set-up we have {Ξ3 , Ω̃^M_{t,L} } ∈ Δ1 . In other words, for all n ∈ N, there exists a general
tower using radicals, Γn (T ), each an M-dimensional subspace of l 2 (N), such that for
all T ∈ Ω̃^M_{t,L} ,

δ̂(Γn (T ), Ξ3 (T )) ≤ 2^{−n} .
Proof of Theorem 5.6 Let n ∈ N and T ∈ Ω̃^M_{t,L} . Then from Theorem 3.15, we can
choose m large so that t^m L < 2^{−(n+1)} , and hence

δ̂(span{Q m e j }^M_{j=1} , ran(P)) < 2^{−(n+1)} .
On the infinite-dimensional QR algorithm 59
Using Theorem 4.8 and its proof, given ε > 0 we can compute, in finitely many arithmetical
and square root operations, approximations vm, j (ε) (of finite support) such that

‖vm, j (ε) − Q m e j ‖ ≤ ε.

The vectors {Q m e j }^M_{j=1} are orthonormal, as are the approximations {vm, j (ε)}^M_{j=1} . A
3 A discussion of this is beyond the scope of this paper. In effect, for invertible operators, this corresponds
to choosing the order of columns on which to perform a Gram-Schmidt type procedure.
We first briefly say a few words on the finite section method, the standard means
to discretise infinite matrices, since comparisons will be made later. If {Pm }m∈N is a
sequence of finite-rank projections such that Pm+1 ≥ Pm and Pm → I strongly, where
I is the identity, then the idea is to replace T by the finite square matrix Pm T | Pm H
(typically, one takes Pm to be the orthogonal projection onto span{e1 , . . . , em }). Thus,
to find σ (T ), we instead compute σ (Pm T | Pm H ). However, there can be significant
issues when using the finite section method. In general, there is no guarantee that the
computed spectra σ (Pm T | Pm H ) need converge to σ (T ).
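This failure is easy to reproduce numerically for the shift example discussed next: every finite section of the unilateral shift is a nilpotent Jordan block, so its computed spectrum is {0} for every m, while σ (S) is the closed unit disc. A minimal sketch (Python/NumPy assumed):

```python
import numpy as np

m = 50
# Finite section P_m S|_{P_m H} of the shift S e_j = e_{j+1}:
# a nilpotent Jordan block with ones on the subdiagonal.
Sm = np.diag(np.ones(m - 1), -1)
eigs = np.linalg.eigvals(Sm)
print(np.max(np.abs(eigs)))  # 0 (up to rounding): the section "sees"
                             # none of the closed unit disc sigma(S)
```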
For example, consider the shift operator Se j = e j+1 on l 2 (N). If Pm projects onto
span{e1 , . . . , em }, we would get that σ (Pm S| Pm H ) = {0} for all m, whereas σ (S) is
the closed unit disc. We can also have that σ (Pm T | Pm H ) ⊄ σ (T ). For example, let
A = ⎛ a1  i               ⎞
    ⎜ 1   a2  i           ⎟
    ⎜     1   a3  i       ⎟ ,   (6.1)
    ⎜         1   a4  ⋱   ⎟
    ⎝             ⋱   ⋱   ⎠
where a j = 5 cos( j)/4 + 2i sin( j). To gain an accurate picture of the spectrum, note
that A is banded and hence we can compute approximations to the pseudospectrum
σε (A) [38]. In order to approximate the spectrum in the best possible way we must take
ε as small as possible. Unfortunately, there is a restriction on how small ε can be,
depending on the machine precision εmach of the software used. To illustrate this,
observe that the approximations are given by (a discrete version of)

σε,m (A) = {z ∈ C : min{√λ : λ ∈ σ (Pm (A − z)∗ (A − z)| Pm H )} ≤ ε}
         ∪ {z ∈ C : min{√λ : λ ∈ σ (Pm (A − z)(A − z)∗ | Pm H )} ≤ ε}.    (6.2)

Thus, ignoring the additional error in computing the smallest eigenvalues of the
squared operator and assuming A to have matrix entries of order 1, computing σε,m (A)
will have the same challenges as if one squares a real number and then takes its square
root. In particular, due to the floating point arithmetic used in the software and (6.2),
we must ensure that

ε ≳ √εmach ,

and this puts a serious restriction on our computation, particularly for the non-normal
case where the distance d H (σ (T ), σε (T )) may be large (though we always have
σ (T ) ⊂ σε (T )). However, it is possible to detect spectral pollution outside of σε (T )
if we can approximate it well.
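The loss of half the digits from working with the squared operator can be seen in miniature (a sketch; the test matrix and its smallest singular value 1e-12 are illustrative, not from the paper):

```python
import numpy as np

# Smallest singular value of a matrix with sigma_min = 1e-12, computed two
# ways. Forming A^T A first squares the condition number, mirroring the
# eps >~ sqrt(eps_mach) restriction coming from (6.2).
n = 4
rng = np.random.default_rng(0)
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q1 @ np.diag([1.0, 1.0, 1.0, 1e-12]) @ Q2.T

via_svd = np.linalg.svd(A, compute_uv=False).min()
via_squaring = np.sqrt(max(np.linalg.eigvalsh(A.T @ A).min(), 0.0))
print(via_svd)       # close to 1e-12
print(via_squaring)  # the 1e-12 is lost: resolution floor ~sqrt(eps_mach)
```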
The phenomenon of “spectral pollution” occurs for A: namely, the computed spec-
trum σ (Pm A| Pm H ) contains elements that have nothing to do with σ (A). This is
Fig. 1 Left: σε (A) plotted as contours of the resolvent norm, as well as σ (Pm A| Pm H ) for m = 300 with
the false eigenvalue (recall that σ (A) ⊆ σε (A)). Right: σ (T ), σε (T ) and σ (Pm T | Pm H ) for m = 100
Theorem 6.1 (Pokrzywa [58]) Let A ∈ B(H) and {Pm } be a sequence of finite-
dimensional projections converging strongly to the identity. Suppose that S ⊂ We (A).
Then there exists a sequence { P̃m } of finite-dimensional projections such that Pm < P̃m
(so P̃m → I strongly) and

d H (σ (Am ) ∪ S, σ ( Ãm )) → 0, as m → ∞,

where Am = Pm A| Pm H and Ãm = P̃m A| P̃m H .
Despite this result, the finite section can perform quite well. This is the case for
self-adjoint operators [6,21,36] and it is also well suited for the computation of pseu-
dospectra of Toeplitz operators [14,18]. Moreover, in general, we have the following
(recall that We (T ) is the convex hull of the essential spectrum for T normal):
Theorem 6.2 (Pokrzywa [58]) Let T ∈ B(H) and {Pm } be a sequence of finite-
dimensional projections converging strongly to the identity. If λ ∉ We (T ) then
λ ∈ σ (T ) if and only if
λ ∈ σ (T ) if and only if
dist(λ, σ (Pm T | Pm H )) −→ 0, as m → ∞.
However, if we want to use the finite section method and rely on Theorem 6.2, we
must know We (T ), and that may be unpleasant to compute. Alternatively, we could
hope that σess (T ) is close to We (T ). For example, if T is hypo-normal (T ∗ T − T T ∗ ≥
0) then
conv(σess (T )) = We (T ),
where conv(σess (T )) denotes the convex hull of σess (T ). But what if we have a “very
non-normal” operator?
Another problem we may encounter when using the finite section method is that
even though σd (T ) may be recovered, one may get a very misleading picture of the
rest of the spectrum. Such problems are illustrated in the following simple example.
Let
T = ⎛ 2.5 + 0.5i  0         0     0     0   0   0   ⋯ ⎞
    ⎜ 1           3 − 0.5i  0     0     0   0   0   ⋯ ⎟
    ⎜ 0           1         1.7   0.05  0   0   0   ⋯ ⎟
    ⎜ 0           0         0.05  t4    0   0   0   ⋯ ⎟
    ⎜ 0           0         0     0     t5  0   0   ⋯ ⎟ ,   (6.3)
    ⎜ 0           0         0     0     1   t6  0   ⋯ ⎟
    ⎜ 0           0         0     0     0   1   t7  ⋯ ⎟
    ⎝ ⋮           ⋮         ⋮     ⋮     ⋮   ⋮   ⋮   ⋱ ⎠
Remark 6.3 The previous examples demonstrated that, in general, the finite section
method is not always suitable for computing spectra. Rather than working with square
sections of the infinite matrix T , one should work with uneven sections Pn T Pm , where
the parameters n and m are allowed to vary independently. Indeed, the algorithms
presented in [24,38] use this method. In effect, we need to know how large n should
be to retain enough information of the operator T Pm . This type of idea is also used
implicitly in the IQR algorithm (see Sect. 4).
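A sketch of the uneven-section idea (our own illustration): for a banded T with bandwidth k, the rectangular section P_{m+k} T P_m already retains all of T P_m , so nothing is lost by truncating the rows at m + k.

```python
import numpy as np

def uneven_section(T, n, m):
    """Rectangular truncation P_n T P_m of a (large) matrix T,
    viewed here as a finite proxy for an infinite matrix."""
    return T[:n, :m]

# For a banded T with bandwidth k, the columns of T P_m are supported on
# the first m + k coordinates, so P_{m+k} T P_m retains *all* of T P_m.
N, k, m = 200, 2, 50
rng = np.random.default_rng(1)
T = np.triu(np.tril(rng.standard_normal((N, N)), k), -k)  # bandwidth k
full_cols = T[:, :m]                # T P_m (first m columns)
rect = uneven_section(T, m + k, m)  # P_{m+k} T P_m
assert np.array_equal(full_cols[:m + k, :], rect)
print(np.abs(full_cols[m + k:, :]).max())  # 0.0: no entries below row m+k
```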
Example 6.4 (Convergence of the IQR algorithm) We begin with two simple examples
that demonstrate the linear (or exponential) convergence proven in Theorem 3.9 and
Corollary 3.12 (and its generalisations). Consider first the one-dimensional discrete
Schrödinger operator given by
T1 = ⎛ v1  1               ⎞
     ⎜ 1   v2  1           ⎟
     ⎜     1   v3  1       ⎟ ,
     ⎜         1   v4  ⋱   ⎟
     ⎝             ⋱   ⋱   ⎠
where v j = 5 sin( j)^2 /√j if j ≤ 10 and v j = 0 otherwise. As a compact (in fact
finite rank) perturbation of the free Laplacian, σ (T1 ) consists of the interval [−2, 2]
together with isolated eigenvalues of finite multiplicity which can be computed [73].
The second operator, T2 , consists of taking the operator
T0 = ⎛ 2  0     0     0     ⎞
     ⎜ 0  3i/2  0     0     ⎟ ⊕ U1 ,
     ⎜ 0  0     −5/4  0     ⎟
     ⎝ 0  0     0     −9i/8 ⎠
where Uk denotes the bilateral shift e j → e j+k , writing this as an operator on l 2 (N)
and then mixing the spaces via a random unitary transformation on the span of the first
9 basis vectors. This ensures T2 is not written in block form but has known eigenvalues.
We have plotted the difference in norm between the first j × j block of each Q ∗n Tl Q n
and the diagonal operator formed via the largest j eigenvalues for j = 1, 2, 3 and 4
in Fig. 2. The plot clearly shows the exponential convergence.
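The finite-dimensional mechanism behind Example 6.4 can be imitated in a few lines of unshifted QR iteration on a truncation of T1 (a caricature only: the IQR algorithm of Sect. 4 acts on the infinite matrix itself, and the truncation size and iteration count here are illustrative):

```python
import numpy as np

def qr_iterates(T, n_iter):
    """Unshifted QR iteration A_{j+1} = R_j Q_j, a finite-dimensional
    caricature of the IQR iterates Q_n^* T Q_n."""
    A = T.copy()
    for _ in range(n_iter):
        Q, R = np.linalg.qr(A)
        A = R @ Q
    return A

# Truncation of the Schrodinger-type operator T1 with v_j as in the text
N = 40
v = np.array([5 * np.sin(j)**2 / np.sqrt(j) if j <= 10 else 0.0
              for j in range(1, N + 1)])
T = np.diag(v) + np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)

A = qr_iterates(T, 200)
lam_max = np.max(np.linalg.eigvalsh(T))
print(abs(A[0, 0] - lam_max))  # small: (1,1) entry -> dominant eigenvalue
```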
Example 6.5 (Convergence to extremal parts of the spectrum) To see why we may
need some condition on σ (T ) for convergence of the IQR algorithm to the extreme
parts of the spectrum, we consider Laurent and Toeplitz operators with symbol given
by a trigonometric polynomial
a(t) = ∑_{j=−k}^{k} a_j t^j ,
acting on l 2 (Z) and l 2 (N) respectively. Note that L(a) is always normal whereas T (a)
need not be (see for example [18]). A simple example already mentioned is a(t) = t
which gives rise to the bilateral and unilateral shifts L(a) = U1 and T (a) = S. In this
case, both of these operators are invariant under iterations of the IQR algorithm and
hence their finite sections Pm Q ∗n T Q n | Pm H always have spectrum {0}. In the case of
L(a) this is an example of spectral pollution, whereas in the case of T (a) this does not
capture the extremal parts of the spectrum. Regarding pure finite section, the following
beautiful result is known:
where ar (t) = a(r t). Furthermore, this limit set is a connected finite union of analytic
arcs, each pair of which has at most endpoints in common.
a(t) = (t^3 + t^{−1})/2 ,    ã(t) = t + i t^{−2} .
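The sets in the caption of Fig. 3 can be explored numerically by sampling the symbol curve b(T) and counting winding numbers; a sketch (the sampling resolution and helper names are ours):

```python
import numpy as np

def symbol_curve(coeffs, n=2048):
    """Sample b(e^{it}) for a trigonometric polynomial b(t) = sum_j a_j t^j,
    with coefficients given as a dict {j: a_j}."""
    t = np.exp(2j * np.pi * np.arange(n) / n)
    return sum(a * t**j for j, a in coeffs.items())

def winding(curve, z):
    """Winding number of a closed sampled curve about the point z,
    accumulated from the angle increments between consecutive samples."""
    w = np.angle((curve - z) / (np.roll(curve, 1) - z)).sum() / (2 * np.pi)
    return int(round(w))

# b(t) = t: sigma(L(b)) is the unit circle, and wind(b, 0) = 1, so 0
# (indeed the whole open disc) belongs to sigma(T(b)).
curve = symbol_curve({1: 1.0})
print(winding(curve, 0))  # 1
print(winding(curve, 2))  # 0
```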
Figure 3 shows the outputs of the IQR algorithm and plain finite section for
the corresponding Laurent and Toeplitz operators for m = 50 and n = 1 and
n = 300. In the case of a, it appears that both limit sets are the extremal parts
of σ (L(a)) (together with 0 if m is not a multiple of 4). Whereas in the case of
ã it appears that limn→∞ Pm Q ∗n T (ã)Q n | Pm H is the extremal parts of ϒ(ã) and
limn→∞ Pm Q ∗n L(ã)Q n | Pm H is the extremal parts of σ (L(ã)) (again together with
a finite collection of points depending on the value of m modulo 3). Curiously, in
both cases we observed convergence in the strong operator topology to block diago-
nal operators (up to unitary equivalence in each subblock), whose blocks have spectra
corresponding to the limiting sets (hence the dependence on the remainder of m mod-
ulo 2 or 3). However, in contrast to convergence to points in the discrete spectrum,
Fig. 3 Top: output of IQR and finite section on T (a) and L(a) for m = 50 and n = 1 (left), n = 300 (right).
Bottom: same but for the symbol ã. In both cases for a given symbol b, σ (L(b)) is given by {b(z) : z ∈ T}
(shown) and σ (T (b)) is given by σ (L(b)) ∪ {z ∈ C\b(T) : wind(b, z) ≠ 0}
convergence to these operators was only algebraic. This is shown in Fig. 4 where we
have plotted the Hausdorff distance between the limiting set and the eigenvalues of
the first diagonal block. We also shifted the operators (+ 1.1I for a and − 1.5i I for
ã) so that the extremal points correspond to exactly one point. In this scenario and for
all operators (Laurent or Toeplitz) the IQR algorithm converges strongly to a diagonal
operator whose diagonal entries are the corresponding extremal point of σ (L(a)). This
convergence is also shown in Fig. 4 and we observed a slower rate of convergence than
before. This is possibly due to points from the other tips of the petals of σ (L(a)) con-
verging as we increase n. It would be interesting to see if some form of Theorem 6.6
holds for the IQR algorithm (now taking n → ∞). Given the examples presented
here, such a statement would likely be quite complicated. However, we conjecture
that if a normal operator has exactly one extreme point of its essential spectrum (and
finitely many eigenvalues of magnitude greater than r_ess ) then this extreme point will
be recovered in the limit n → ∞ for large enough m.
Example 6.7 (IQR and avoiding spectral pollution) In this example we consider
whether the IQR algorithm may be used as a tool to avoid spectral pollution. Some-
times when considering σ (Pm T | Pm H ), spectral pollution can be detected by changing
m (edge states which correspond to spectral pollution are often unstable, but this is not
Fig. 4 Left: algebraic convergence to block diagonal operators. Right: algebraic convergence to diagonal
operators. In both cases, we have plotted the difference in eigenvalues of the first block as we increase n
However, for finite section, spectral pollution occurs for all large m
Fig. 5 Left: d H (σ (Pm Q ∗n (T3 + 0.2I )Q n | Pm H ) − 0.2I , σ (T3 )) as a function of n for different m. Right:
σ (Pm Q ∗n (T3 + 0.2I )Q n | Pm H ) − 0.2I as a function of n for m = 201. Note the crossing of eigenvalues
across the spectral gap
and the IQR algorithm can only recover the extreme parts of the spectrum
Despite this, we found that for small fixed n > 0 it appears that
Although Theorem 3.9 considers normal operators, Theorems 3.13 and 3.15 suggest
the IQR algorithm may also be useful for non-normal operators. Indeed, the results
presented here demonstrate that in practice the IQR algorithm can work very well for
Fig. 7 Left: output of the IQR algorithm σ (Pm Q∗n AQ n | Pm H ) for m = 300 and n = 1000. Right: output of
the IQR algorithm σ (Pm Q∗n T Q n | Pm H ) for m = 100 and n = 300
σ (Pm Q ∗n T Q n | Pm H ) −→ {λ1 , . . . , λm }, as n → ∞.
We will verify this numerically in the next examples. However, we will see that not
only do we get convergence to the eigenvalues, but often we also pick up parts of
the boundary of the essential spectrum (this was the case when considering T (a) but
appeared not to be the case for T (ã)). This phenomenon is not accounted for in the
previous exposition where normality was crucial for proving Theorem 3.9.
Example 6.8 (Recovering the extremal part of the spectrum) Let us return to the infinite
matrices A in (6.1) and T in (6.3) from Sect. 6.1. We have run the IQR algorithm with
n = 1000 and n = 300 for A and T respectively, shown in Fig. 7. We see that if one
takes a finite section after running the IQR algorithm, then part of the boundary of the
essential spectrum also appears, along with the discrete spectrum σd (A). Note that the
part of the boundary that is captured is the extreme part (points with largest modulus).
It seems that after running the IQR algorithm, the spectral information from the largest
isolated eigenvalues and the largest approximate point spectrum is “squeezed up” to
the upper and leftmost portions of the matrix. This is not completely counter-intuitive
given (2.5) and is what normally happens in finite dimensions. For both examples, we
found that the IQR iterates converge to an upper triangular matrix (analogous to the
finite-dimensional case) in agreement with Theorems 3.13 and 3.15. The convergence
of the upper 1 × 1 block for A (corresponding to the dominant eigenvalue) and 4 × 4
non-diagonal block for T are shown in Fig. 8 where we have plotted the difference in
norm.
Fig. 8 Left: output of the IQR algorithm σ (Pm Q∗n T1 Q n | Pm H ) for m = 100 and n = 300. The reference
circle is the boundary of the essential spectrum. Right: convergence of upper diagonal blocks for operators
A, T and T1
Fig. 9 The figures show finite sections σ (Pm H1 | Pm H ) (magenta) and (shifted) IQR iterates
σ (Pm Q∗n H1 Q n | Pm H ) (blue) along with converged resolvent norm contours for γ = 1 (left) and γ = 2
(right). Both figures are for m = 500, n = 3000 and show the convergence to the extremal parts of the spectrum
This commutes with (the discrete version of) PT precisely when the potential has
even real part and odd imaginary part. We tested the IQR algorithm on the potential

Vn = ⎧ cos(n) + iγ sin(n),  mod (n, 2) = 0,
     ⎨                                          (6.5)
     ⎩ 0,                   mod (n, 2) = 1,
and found similar results for other potentials. Figure 9 shows the same qualitative
behaviour as the last example for γ = 1, 2 at m = 500, n = 3000. We shifted by
2.2 and 2.15 for γ = 1, 2 respectively. For comparison, we have shown converged
resolvent norms. We found that spectral pollution with no IQR iterates was consistent
as we varied m. However, for a fixed m, increasing the number of iterates (n → ∞)
caused σ (Pm Q ∗n H1 Q n | Pm H ) to approach the extremal part of the spectrum.
In this final section, we explore examples where the Pm Q ∗n T Q n Pm naturally give rise
to periodic boundary conditions (this was already seen for some examples of Laurent
operators in Sect. 6.2). Both examples discussed here are physically motivated random
tridiagonal operators on the lattice Z. One of the key applications of studying such ran-
dom operators can be found in condensed matter physics. The discrete models below
have been used to study conductivity of disordered media, flux lines in superconduc-
tors and asymmetric hopping particles. Many such operators are also the discretisation
of certain stochastic differential equations. As we will demonstrate, the IQR method
can be a powerful way of avoiding spectral pollution caused by unnatural “open”
boundary conditions in forming the finite section Pm T Pm . In both of these examples,
periodic boundary conditions are natural and we find that taking finite sections after
iterating the IQR algorithm captures periodic boundary conditions.
Example 6.10 (Hopping sign model in sparse neural networks) The first example is a
non-normal operator with random sub and superdiagonals, first studied by Feinberg
and Zee [23,31,40]. The usual “Hopping Sign Model” is defined via
(H3 x)n = s^−_{n−1} exp(−g)x_{n−1} + s^+_n exp(g)x_{n+1} ,    (6.6)
and appearing in [1] in the context of sparse neural networks. We shall assume that
g is real and non-negative and that s ±_j are i.i.d. random variables with Bernoulli
distribution p. In other words
P(s ±_j = 1) = 1 − P(s ±_j = −1) = p.
We will only consider g = 1/10 and p = 1/2, but will vary p in an effort to compute
the spectrum of H3 which only depends on the support of the distribution of the s ± j ’s.
It is easy to prove that the spectrum (and pseudospectrum) of H3 is almost surely
constant and that there is no inessential spectrum. Furthermore, one can show that
σ (H3 ) is contained in the annulus {z ∈ C : 2 sinh(g) ≤ |z| ≤ 2 cosh(g)}.
Finite section calculations associated with this operator have some interesting prop-
erties and are extensively studied in [1]. If one projects using the standard basis of
l 2 (Z) then one obtains matrices of the form
Mn1 = ⎛ 0                   s^−_{−n+1} exp(−g)                        ⎞
      ⎜ s^+_{−n+1} exp(g)   0                   ⋱                    ⎟
      ⎜                     ⋱                   ⋱  s^−_{n−1} exp(−g) ⎟ .
      ⎝                         s^+_{n−1} exp(g)   0                 ⎠
If we use open boundary conditions (i.e. we simply project onto the space spanned by
{e−n , . . . , en }) then one can “gauge” away g by a similarity transformation, leading
to
Mn = ⎛ 0            s^−_{−n+1}                 ⎞
     ⎜ s^+_{−n+1}   0           ⋱              ⎟
     ⎜              ⋱           ⋱  s^−_{n−1}   ⎟ .
     ⎝                 s^+_{n−1}   0           ⎠
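The gauge transformation is the diagonal similarity D = diag(e^{g j}): conjugating the open-boundary matrix by D cancels the factors e^{±g}, so the eigenvalues of Mn1 cannot depend on g. A quick numerical check (the size and the Bernoulli signs are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N, g = 12, 0.1
sp = rng.choice([-1.0, 1.0], N - 1)  # s_j^+ (Bernoulli signs)
sm = rng.choice([-1.0, 1.0], N - 1)  # s_j^-

# Open-boundary matrix with the exp(+-g) hopping asymmetry
M_open = np.diag(sp * np.exp(g), -1) + np.diag(sm * np.exp(-g), 1)
# Diagonal "gauge" similarity removes g entirely
D = np.diag(np.exp(g * np.arange(N)))
gauged = np.linalg.inv(D) @ M_open @ D
print(np.abs(gauged - (np.diag(sp, -1) + np.diag(sm, 1))).max())  # ~0
```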
Fig. 10 Top: output of finite section over a random sample of 200 matrices of size 200 (left) and the
estimates using pseudospectral techniques (right). Bottom: the output of IQR over 200 samples computing
σ (Pm Q ∗n H3 Q n | Pm H ) for m = 200 and n = 50 (left), n = 2000 (right). Note that after a few iterates, the
output seems to agree with periodic boundary conditions and then increasing the number of iterates leads
to convergence to the extremal parts of the essential spectrum
On the other hand, the use of periodic boundary conditions leads to the matrix
Mn2 = ⎛ 0                   s^−_{−n+1} exp(−g)        s^+_n exp(g)   ⎞
      ⎜ s^+_{−n+1} exp(g)   0                   ⋱                    ⎟
      ⎜                     ⋱                   ⋱  s^−_{n−1} exp(−g) ⎟ ,
      ⎝ s^−_n exp(−g)           s^+_{n−1} exp(g)   0                 ⎠
y_{n+1} (z) = exp(g) ψ_{n+2} /ψ_{n+1} = −(s^−_{n−1} /s^+_n )/y_n (z) + z/s^+_n ,

κ(z; g) = lim_{N→∞} (1/(2N + 1)) ∑_{j=−N}^{N} log |y_j (z)| − g .
This is known as the transfer matrix approach. For fixed z, as we increase g, κ(z; g)
becomes negative. The heuristic is that a hole opens up in the spectrum corresponding
to a negative Lyapunov exponent. Eigenvalues of Mn2 inside the hole are swept up
and become delocalised moving to the rim of the hole, whereas those outside remain
largely undisturbed. Eigenvalues of Mn1 inside the negative κ zone correspond to edge
states due to the finite system size approximation.
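A sketch of this transfer-matrix computation (the initial value y_0 = 1, the index conventions and the finite-N truncation of the limit are our assumptions):

```python
import numpy as np

def kappa(z, g, N=20000, seed=3):
    """Sketch of the Lyapunov exponent kappa(z; g) via the recursion
    y_{n+1} = -(s_{n-1}^-/s_n^+)/y_n + z/s_n^+, with p = 1/2 Bernoulli
    signs and a finite sample replacing the N -> infinity limit."""
    rng = np.random.default_rng(seed)
    s = rng.choice([-1.0, 1.0], size=(2, N))  # rows: s^-, s^+
    y, acc = 1.0 + 0.0j, 0.0
    for n in range(1, N):
        y = -(s[0, n - 1] / s[1, n]) / y + z / s[1, n]
        acc += np.log(abs(y))
    return acc / (N - 1) - g

print(kappa(3.0, 0.1))  # positive: |z| = 3 lies well outside the hole
```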
Figure 10 shows the output of a sample of 200 finite sections with open boundary
conditions and matrix size 200. We have also shown the annular region that bounds
the spectrum, as well as the contour κ = 0. In order to calculate κ, we calculated the
above sum on a grid with large N to ensure convergence. The colour bar corresponds
to the inverse participation ratio (log scale) of normalised eigenfunctions defined by
1/P ≡ ∑_j |ψ_j |^4 / ∑_j |ψ_j |^2 .
Note that this has a maximum value of 1 (localised) and a minimum value of 1/N
(delocalised), N being the size of the matrix. Open boundary conditions produce
spectral pollution in the hole with localised eigenfunctions and the contour κ = 0
corresponds to the delocalised region. In order to compare to the spectrum of the
infinite operator on l 2 (Z) we have plotted σε (H3 ), for ε = 10^{−2} , calculated using
matrix sizes of order 10^5 . We note that the spectrum is independent of p ∈ (0, 1) so
we have also shown the union of these estimates over p = {k/100}^{99}_{k=1} . Although the
algorithm used to compute the pseudospectrum is guaranteed to converge to σε (H3 ),
there are regions in the complex plane where this convergence is very slow. Taking
unions over p is simply a way to speed up this convergence. Upon taking ε smaller, we
found that the spectrum appeared to have a fractal-like nature. It also appears that the
hole in the spectrum corresponds to the boundary of two ellipses. It is easy to prove
that the ellipse
is contained in σ (H3 ) and that the spectrum (and pseudospectrum) of H3 has fourfold
rotational symmetry. Denoting the rotation of E 1 by π/4 as E 2 we have shown E 1 ∪ E 2
in the figure.
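The inverse participation ratio used as the colour scale in Fig. 10 can be sketched as follows (for normalised ψ the denominator equals 1; it is squared here only so the ratio is scale-invariant):

```python
import numpy as np

def ipr(psi):
    """Inverse participation ratio of a vector psi: 1 for a localised
    basis vector, 1/N for a fully delocalised one."""
    p = np.abs(psi)**2
    return (p**2).sum() / p.sum()**2

N = 100
e1 = np.zeros(N); e1[0] = 1.0   # perfectly localised
flat = np.ones(N) / np.sqrt(N)  # perfectly delocalised
print(ipr(e1), ipr(flat))       # 1 and 1/N
```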
Figure 10 also shows the effect of IQR iterations over random samples of size 200
for m = 200 and n = 50 and 2000. Remarkably, as we increase n, a few iterations is
enough to capture periodic boundary conditions and sweep away the localised edge
states. We have also shown the inverse participation ratio which, although now is
defined with respect to a new basis, still gives an indication of how “diagonal” the
matrix Pm Q ∗n H3 Q n | Pm H is. If we increase n further, the output approaches the edge of
the spectrum with eigenvectors becoming more localised (in the new basis). We found
exactly the same phenomena to occur if we shifted the operator H3 , with convergence
to the corresponding extremal part of the essential spectrum.
where g > 0 and V is a random potential. This operator also has applications in
population biology [52] and the self-adjoint version of this model is widely studied
for the phenomenon of Anderson localisation (absence of diffusion of waves) [2,12].
In the non-self-adjoint case, complex values of the spectrum indicate delocalisation.
Note that we now have randomness on the diagonal with fixed coupling coefficients
exp(±g).
Standard finite section produces real eigenvalues since the matrix Pm H4 | Pm H is
similar to a real symmetric matrix. However, truncating the operator and adopting
periodic boundary conditions gives rise to the famous “bubble and wings”. If V = 0
then the spectrum is an ellipse E = {exp(g + iθ ) + exp(−g − iθ ) : θ ∈ [0, 2π )},
but as we increase the randomness wings appear on the real axis. For a study of this
phenomenon and the described phase transition we refer the reader to [31]. Goldsheid
and Khoruzhenko have studied the convergence of the spectral measure in the periodic
case as N → ∞ in [33], N being the number of sites. In general, the support of these
measures as N → ∞ can be very different from the spectrum of the operator on l 2 (Z)
given by (6.7), highlighting the difficulty in computing the spectrum.
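For V = 0 the periodic truncation is a circulant matrix with eigenvalues e^{g} ω + e^{−g} ω^{−1} for ω an N-th root of unity, and these lie exactly on the ellipse E; a quick check (sizes illustrative):

```python
import numpy as np

# Periodic truncation of the (V = 0) non-self-adjoint model: a circulant
# with e^{g} on one off-diagonal and e^{-g} on the other. Its eigenvalues
# lie on the ellipse E with semi-axes 2*cosh(g) and 2*sinh(g).
N, g = 64, 0.5
H = (np.exp(g) * np.diag(np.ones(N - 1), 1)
     + np.exp(-g) * np.diag(np.ones(N - 1), -1))
H[-1, 0] = np.exp(g)   # periodic corner entries
H[0, -1] = np.exp(-g)
lam = np.linalg.eigvals(H)
on_ellipse = (lam.real / (2 * np.cosh(g)))**2 + (lam.imag / (2 * np.sinh(g)))**2
print(np.abs(on_ellipse - 1).max())  # ~0: the "bubble" before wings appear
```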
We consider the case g = 1/2 with Vn i.i.d. Bernoulli random variables taking
values in {±1} with equal probability p = 1/2. Again, there is no inessential spectrum
and the spectrum/pseudospectrum is constant almost surely, depending only on the
support of the distribution of the Vn . The following inclusion is also known, which
bounds the spectrum:
Fig. 11 The output of IQR over 200 samples computing σ (Pm Q ∗n H4 Q n | Pm H ) for m = 30 and n = 15
(left), n = 300 (right). Note that we appear to recover the periodic limit curve and increasing the number
of iterates converges to the extremal parts. Applying shifts allowed us to recover the extremal parts of the
limit curves
where conv(E) denotes the closed convex hull of E and B1 denotes the closed unit disk. The
choice of g ensures the spectrum has a hole in it. One may calculate the Lyapunov
exponent, either by the transfer matrix approach or by calculating a potential related
to the density of states. The limiting distribution of the eigenvalues of finite section
with periodic boundary conditions is given by the complex curve
The output of the IQR algorithm for m = 30 and n = 15 and n = 300 over 200
random samples is shown in Fig. 11. Note that if we took n = 0, the spectrum would
be real in stark contrast to Fig. 11. Taking a small number of IQR iterates approximates
the bubble and wings with a few remaining real eigenvalues. However, upon increasing
n, the output does not seem to converge to the extremal parts of the spectrum, but seems
to remain stuck on the limit curve with the operator Pm Q ∗n H4 Q n | Pm H . Shifting by
+4i I caused the output to recover the top part of the limit curve.
Remark 6.12 For any operator T that has Q n unitary, the essential spectrum and spec-
trum of Q ∗n T Q n is equal to that of T . As the above two examples suggest, taking a
small value of n could be used as a method of testing eigenvalues of finite section meth-
ods that correspond to finite system size effects, such as open boundary conditions.
This could be used in quasi-periodic systems or systems with very few symmetries,
where there is no obvious choice of appropriate boundary conditions. However, detect-
ing isolated eigenvalues of finite multiplicity within the convex hull of the essential
spectrum still remains a challenge.
This paper discussed the generalisation of the famous QR algorithm to infinite dimen-
sions. It was shown that for a large class of operators, encompassing many in scientific
Based on our findings, we end with a list of open problems for further study on the
theoretical properties of the IQR algorithm:
• Which conditions are needed on a possibly non-normal operator in order for the
IQR algorithm to pick up the extreme points of the essential spectrum?
• Is the convergence rate to non-isolated points of the spectrum algebraic?
• For operators which do not have a trivial QR decomposition, is there a way of
choosing n = n(m) such that σ (Pm Q ∗n(m) T Q n(m) | Pm ) converges to the spectrum
as m → ∞? If not, then for which classes of operators does such a choice exist?
• Is there a link between the IQR algorithm and the finite section method with
periodic boundary conditions for the class of pseudoergodic operators?
• Are there other cases where the IQR algorithm alleviates the need to provide natural
boundary conditions when applying the finite section method?
• Extending the IQR algorithm to unbounded operators. Can the IQR algorithm also
be extended to a continuous version for differential operators?
Acknowledgements MJC acknowledges support from the UK Engineering and Physical Sciences Research
Council (EPSRC) Grant EP/L016516/1. ACH acknowledges support from a Royal Society University
Research Fellowship as well as EPSRC Grant EP/L003457/1. We would also like to thank the referees
whose comments and suggestions led to the improvement of the manuscript.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 Interna-
tional License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution,
and reproduction in any medium, provided you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license, and indicate if changes were made.
A Appendix
Here we show example code for the IQR algorithm in the case that the matrix has k
subdiagonals. The code can easily be adapted for the more general case considered in
Sect. 4.1.
Algorithm A.1
function J = Infinite_QR(A,n,k,m)
d = size(A,2);
for j = 1:n
    A = Inf_QR(A,d-j*k,k);  % The output in each loop is actually
end                         % U_(d-j*k)...U_1 A_(j-1) U_1...U_(d-j*k),
J = A(1:m,1:m);             % if A_j is the j-th term in the QR iteration.
Algorithm A.2
% Inf_QR(A,n,k) takes a matrix A with k subdiagonals and performs
% multiplication by n Householder transformations from the left, applying
% the corresponding two-sided conjugation to B at each step.
function B = Inf_QR(A,n,k)
B = A; d = size(A,1);
for j = 1:n
    u = House(A(j:j+k,j));
    A(j:j+k,j:d) = A(j:j+k,j:d) - 2*u*(u'*A(j:j+k,j:d));
    B(j:j+k,1:d) = B(j:j+k,1:d) - 2*u*(u'*B(j:j+k,1:d));
    B(1:d,j:j+k) = B(1:d,j:j+k) - 2*(B(1:d,j:j+k)*u)*u';
end
Algorithm A.3
function u = House(x)
v = x;
if v(1) == 0
    v(1) = v(1) + norm(v);              % This is the classical way
else                                    % of creating Householder reflections
    v(1) = x(1) + sign(x(1))*norm(x);   % as in finite dimensions.
end
u = v/norm(v);
The goal is to find algorithms which approximate the function Ξ . More generally,
the main pillar of our framework is the concept of a tower of algorithms, which is
needed to describe problems that need several limits in the computation. However,
first one needs the definition of a general algorithm.
Definition A.5 (General Algorithm) Given a computational problem {Ω, Λ, M, Ξ },
a general algorithm is a mapping Γ : Ω → M such that for each A ∈ Ω
(i) there exists a finite subset of evaluations ΛΓ (A) ⊂ Λ,
(ii) the action of Γ on A only depends on {A f } f ∈ΛΓ (A) where A f := f (A),
(iii) for every B ∈ Ω such that B f = A f for every f ∈ ΛΓ (A), it holds that Γ (B) =
Γ (A).
Note that the definition of a general algorithm is more general than the definition
of a Turing machine or a Blum–Shub–Smale (BSS) machine. A general algorithm has
no restrictions on the operations allowed. The only restriction is that it can only take
a finite amount of information, though it is allowed to adaptively choose the finite
amount of information it reads depending on the input. Condition (iii) assures that the
algorithm reads the information in a consistent way. Note that the purpose of such a
general definition is to get strong lower bounds. In particular, the more general the
definition is, the stronger a proven lower bound will be.
With a definition of a general algorithm we can define the concept of towers of
algorithms. However, before we define that, we will discuss the cases for which we
may have a set-valued function.
Remark A.6 (Set-valued functions) Occasionally we will consider a function Ξ such that for T ∈ Ω we have that Ξ(T) ⊂ M. In this case, we will still require that a general algorithm produces a single-valued output, i.e. Γ(T) ∈ M for T ∈ Ω. However, we replace the metric in order to define convergence. In particular, Γ_n(T) → Ξ(T), as n → ∞, means

inf_{y ∈ Ξ(T)} d_M(Γ_n(T), y) → 0.
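Numerically, this notion of convergence is just the distance from the single-valued output Γ_n(T) to the set Ξ(T). A toy sketch (our own illustration, with Ξ(T) replaced by a finite sample of the set):

```python
def dist_to_set(gamma_n, xi_sample):
    # inf over y in Xi(T) of |Gamma_n(T) - y|, the inf taken over a
    # finite sample of the set-valued output Xi(T)
    return min(abs(gamma_n - y) for y in xi_sample)

# Suppose Xi(T) = {-1, 1} and Gamma_n(T) = 1 + 1/n: the outputs converge
# to the set even though they never equal an element of it.
dists = [dist_to_set(1 + 1/n, [-1.0, 1.0]) for n in range(1, 101)]
```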
Definition A.7 (Tower of algorithms) Given a computational problem {Ξ, Ω, M, Λ}, a tower of algorithms of height k for {Ξ, Ω, M, Λ} is a family of sequences of functions

Γ_{n_k} : Ω → M,  Γ_{n_k,n_{k-1}} : Ω → M,  ...,  Γ_{n_k,...,n_1} : Ω → M,

where n_k, ..., n_1 ∈ N and the functions Γ_{n_k,...,n_1} at the "lowest level" of the tower are general algorithms in the sense of Definition A.5. Moreover, for every A ∈ Ω,

Ξ(A) = lim_{n_k→∞} Γ_{n_k}(A),
Γ_{n_k}(A) = lim_{n_{k-1}→∞} Γ_{n_k,n_{k-1}}(A),
...
Γ_{n_k,...,n_2}(A) = lim_{n_1→∞} Γ_{n_k,...,n_1}(A).

Definition A.8 (Radical tower of algorithms) A radical tower of algorithms is a tower of algorithms in which the lowest functions

Γ = Γ_{n_k,...,n_1} : Ω → M

satisfy the following: for each A ∈ Ω the action of Γ on A consists of only finitely many arithmetic operations, comparisons and radicals (√·) of positive numbers on {A_f}_{f ∈ Λ_Γ(A)}, where A_f = f(A).
In other words, one may say that for the finitely many steps of the computation of the lowest functions Γ = Γ_{n_k,...,n_1} : Ω → M only the four arithmetic operations +, −, ·, / within the smallest (algebraic) field which is generated by the input {A_f}_{f ∈ Λ_Γ(A)} are allowed. In addition, we allow the extraction of radicals of positive real numbers. We implicitly assume that any complex number can be decomposed into a real and an imaginary part, and moreover we can determine whether a = b or a > b for all real numbers a, b which can occur during the computations. Given the definitions above we can now define the key concept, namely, the Solvability Complexity Index:
Definition A.9 (Solvability Complexity Index) A computational problem {Ξ, Ω, M, Λ} is said to have Solvability Complexity Index SCI(Ξ, Ω, M, Λ)_α = k, with respect to a tower of algorithms of type α, if k is the smallest integer for which there exists a tower of algorithms of type α of height k. If no such tower exists then SCI(Ξ, Ω, M, Λ)_α = ∞. If there exists a tower {Γ_n}_{n∈N} of type α and height one such that Ξ = Γ_{n_1} for some n_1 < ∞, then we define SCI(Ξ, Ω, M, Λ)_α = 0. We may sometimes write SCI(Ξ, Ω)_α to simplify notation when M and Λ are obvious.
The definition of the SCI immediately induces the SCI hierarchy:
Definition A.10 (The Solvability Complexity Index Hierarchy) Consider a collection C of computational problems and let T be the collection of all towers of algorithms of type α for the computational problems in C. Define

Δ^α_0 := {{Ξ, Ω} ∈ C : SCI(Ξ, Ω)_α = 0},
Δ^α_{m+1} := {{Ξ, Ω} ∈ C : SCI(Ξ, Ω)_α ≤ m},  m ∈ N,

as well as

Δ^α_1 := {{Ξ, Ω} ∈ C : ∃ {Γ_n}_{n∈N} ∈ T such that ∀A ∈ Ω, d(Γ_n(A), Ξ(A)) ≤ 2^{−n}}.
Remark A.11 (The Δ_k notation) Note that in this paper we only consider radical towers and hence the superscript α will be omitted throughout. Thus we will always write Δ_k.
Finally, we recall the definition of Δ^α_1.
References
1. Amir, A., Hatano, N., Nelson, D.R.: Non-Hermitian localization in biological networks. Phys. Rev. E
93(4), 042310 (2016)
2. Anderson, P.W.: Absence of diffusion in certain random lattices. Phys. Rev. 109(5), 1492 (1958)
3. Aronszajn, N.: Approximation methods for Eigenvalues of completely continuous symmetric operators.
In: Proceedings of the Symposium on Spectral Theory and Differential Problems, pp. 179–202 (1951)
4. Arveson, W.: Improper filtrations for C∗-algebras: spectra of unilateral tridiagonal operators. Acta Sci. Math. (Szeged) 57(1–4), 11–24 (1993)
5. Arveson, W.: Noncommutative spheres and numerical quantum mechanics. In: Operator Algebras,
Mathematical Physics, and Low-dimensional Topology (Istanbul, 1991), Research Notes in Mathe-
matics, vol. 5, A K Peters, Wellesley, pp. 1–10 (1993)
6. Arveson, W.: C∗-algebras and numerical linear algebra. J. Funct. Anal. 122(2), 333–360 (1994)
7. Arveson, W.: The role of C∗-algebras in infinite-dimensional numerical linear algebra. In: C∗-algebras: 1943–1993 (San Antonio, TX, 1993), Contemporary Mathematics, vol. 167, Amer. Math. Soc., Providence, RI, pp. 114–129 (1994)
8. Ben-Artzi, J., Colbrook, M.J., Hansen, A.C., Nevanlinna, O., Seidel, M.: On the Solvability Complexity
Index Hierarchy and Towers of Algorithms (Preprint) (2018)
9. Ben-Artzi, J., Hansen, A.C., Nevanlinna, O., Seidel, M.: New barriers in complexity theory: on the
solvability complexity index and the towers of algorithms. C. R. Math. 353(10), 931–936
(2015)
10. Bender, C.M.: Making sense of non-Hermitian Hamiltonians. Rep. Prog. Phys. 70(6), 947 (2007)
11. Bender, C.M., Boettcher, S.: Real spectra in non-Hermitian Hamiltonians having PT symmetry. Phys.
Rev. Lett. 80(24), 5243 (1998)
12. Billy, J., Josse, V., Zuo, Z., Bernard, A., Hambrecht, B., Lugan, P., Clément, D., Sanchez-Palencia, L.,
Bouyer, P., Aspect, A.: Direct observation of Anderson localization of matter waves in a controlled
disorder. Nature 453(7197), 891–894 (2008)
13. Bögli, S., Brown, B.M., Marletta, M., Tretter, C., Wagenhofer, M.: Guaranteed resonance enclosures
and exclosures for atoms and molecules. In: Proceedings of the Royal Society of London A: Mathe-
matical, Physical and Engineering Sciences, vol. 470, no. 2171 (2014)
14. Böttcher, A.: Pseudospectra and singular values of large convolution operators. J. Integral Equ. Appl.
6(3), 267–301 (1994)
15. Böttcher, A.: Infinite matrices and projection methods. In: Lectures on Operator Theory and Its Appli-
cations (Waterloo, ON, 1994), Fields Institute Monographs, vol. 3, Amer. Math. Soc., Providence, pp.
1–72 (1996)
16. Böttcher, A., Brunner, H., Iserles, A., Nørsett, S.P.: On the singular values and eigenvalues of the
Fox-Li and related operators. N. Y. J. Math. 16, 539–561 (2010)
17. Böttcher, A., Chithra, A.V., Namboodiri, M.N.N.: Approximation of approximation numbers by trun-
cation. Integral Equ. Oper. Theory 39(4), 387–395 (2001)
18. Böttcher, A., Silbermann, B.: Introduction to Large Truncated Toeplitz Matrices. Springer, New York
(1999)
19. Böttcher, A., Spitkovsky, I.M.: A gentle guide to the basics of two projections theory. Linear Algebra
Appl. 432(6), 1412–1459 (2010)
20. Boulton, L.: Projection methods for discrete Schrödinger operators. Proc. Lond. Math. Soc. 88(2),
526–544 (2004)
21. Brown, N.: Quasi-diagonality and the finite section method. Math. Comput. 76(257), 339–360 (2007)
22. Brunner, H., Iserles, A., Nørsett, S.P.: The computation of the spectra of highly oscillatory Fredholm
integral operators. J. Integral Equ. Appl. 23(4), 467–519 (2011)
23. Chandler-Wilde, S., Chonchaiya, R., Lindner, M.: Eigenvalue problem meets Sierpinski triangle: com-
puting the spectrum of a non-self-adjoint random operator. Oper. Matrices 5(4), 633–648 (2011)
24. Colbrook, M.J., Roman, B., Hansen, A.: How to Compute Spectra with Error Control (Preprint) (2019)
25. Davies, E.B.: Spectral enclosures and complex resonances for general self-adjoint operators. LMS J.
Comput. Math. 1, 42–74 (1998)
26. Davies, E.B.: Linear Operators and Their Spectra, vol. 106. Cambridge University Press, Cambridge
(2007)
27. Dean, C., Wang, L., Maher, P., Forsythe, C., Ghahari, F., Gao, Y., Katoch, J., Ishigami, M., Moon, P.,
Koshino, M., et al.: Hofstadter’s butterfly and the fractal quantum Hall effect in moiré superlattices.
Nature 497(7451), 598–602 (2013)
28. Deift, P., Li, L., Tomei, C.: Toda flows with infinitely many variables. J. Funct. Anal. 64(3), 358–402
(1985)
29. Digernes, T., Varadarajan, V.S., Varadhan, S.: Finite approximations to quantum systems. Rev. Math.
Phys. 6(04), 621–648 (1994)
30. Doyle, P., McMullen, C.: Solving the quintic by iteration. Acta Math. 163(3–4), 151–180 (1989)
31. Feinberg, J., Zee, A.: Non-Hermitian localization and delocalization. Phys. Rev. E 59(6), 6433 (1999)
32. Multiprecision Computing Toolbox for MATLAB 4.5.3.12856. Advanpix LLC, Yokohama, Japan
33. Goldsheid, I.Y., Khoruzhenko, B.A.: Distribution of eigenvalues in non-Hermitian Anderson models.
Phys. Rev. Lett. 80(13), 2897 (1998)
34. Gray, R.M., et al.: Toeplitz and circulant matrices: a review. Found. Trends Commun. Inf. Theory 2(3),
155–239 (2006)
35. Hagen, R., Roch, S., Silbermann, B.: C∗-algebras and numerical analysis. In: Monographs and Textbooks in Pure and Applied Mathematics, vol. 236, Marcel Dekker Inc., New York (2001)
36. Hansen, A.C.: On the approximation of spectra of linear operators on Hilbert spaces. J. Funct. Anal.
254(8), 2092–2126 (2008)
37. Hansen, A.C.: Infinite-dimensional numerical linear algebra: theory and applications. Proc. R. Soc.
Lond. Ser. A Math. Phys. Eng. Sci. 466(2124), 3539–3559 (2010)
38. Hansen, A.C.: On the solvability complexity index, the n-pseudospectrum and approximations of
spectra of operators. J. Am. Math. Soc. 24(1), 81–124 (2011)
39. Hatano, N., Nelson, D.R.: Localization transitions in non-Hermitian quantum mechanics. Phys. Rev.
Lett. 77(3), 570 (1996)
40. Holz, D.E., Orland, H., Zee, A.: On the remarkable spectrum of a non-Hermitian random matrix model.
J. Phys. A Math. Gen. 36(12), 3385 (2003)
41. Kato, T.: Perturbation Theory for Linear Operators, vol. 132. Springer, Berlin (2013)
42. Krein, M., Krasnoselski, M.: Fundamental theorems concerning the extension of Hermitian operators
and some of their applications to the theory of orthogonal polynomials and the moment problem.
Uspekhi Mat. Nauk. 2, 60–106 (1947)
43. Levitin, M., Shargorodsky, E.: Spectral pollution and second-order relative spectra for self-adjoint
operators. IMA J. Numer. Anal. 24(3), 393–416 (2004)
44. Lindner, M.: Infinite matrices and their finite sections. In: Frontiers in Mathematics: An Introduction
to the Limit Operator Method, Birkhäuser Verlag, Basel (2006)
45. Makris, K.G., El-Ganainy, R., Christodoulides, D.N., Musslimani, Z.H.: Beam dynamics in PT sym-
metric optical lattices. Phys. Rev. Lett. 100(10), 103904 (2008)
46. Marletta, M.: Neumann–Dirichlet maps and analysis of spectral pollution for non-self-adjoint elliptic
PDEs with real essential spectrum. IMA J. Numer. Anal. 30(4), 917–939 (2010)
47. Marletta, M., Scheichl, R.: Eigenvalues in spectral gaps of differential operators. J. Spectr. Theory
2(3), 293–320 (2012)
48. Mattis, D.C.: The few-body problem on a lattice. Rev. Mod. Phys. 58(2), 361 (1986)
49. McMullen, C.: Families of rational maps and iterative root-finding algorithms. Ann. Math. 125(3),
467–493 (1987)
50. McMullen, C.: Braiding of the attractor and the failure of iterative algorithms. Invent. Math. 91(2),
259–272 (1988)
51. Mogilner, A.: Hamiltonians in solid state physics as multiparticle discrete Schrödinger operators. Adv. Sov. Math. 5, 139–194 (1991)
52. Nelson, D.R., Shnerb, N.M.: Non-Hermitian localization and population biology. Phys. Rev. E 58(2),
1383 (1998)
53. Olver, S.: ApproxFun.jl v0.8. github (online). https://github.com/JuliaApproximation/ApproxFun.jl
(2018)
54. Olver, S., Townsend, A.: A fast and well-conditioned spectral method. SIAM Rev. 55(3), 462–489
(2013)
55. Olver, S., Townsend, A.: A practical framework for infinite-dimensional linear algebra. In: Proceedings of the First Workshop for High Performance Technical Computing in Dynamic Languages, HPTCDL ’14, Piscataway, NJ, USA, IEEE Press, pp. 57–62 (2014)
56. Olver, S., Webb, M.: SpectralMeasures.jl. github (online). https://github.com/JuliaApproximation/
SpectralMeasures.jl (2018)
57. Parlett, B.N.: The Symmetric Eigenvalue Problem, vol. 20. SIAM, Philadelphia (1998)
58. Pokrzywa, A.: Method of orthogonal projections and approximation of the spectrum of a bounded
operator. Stud. Math. 65(1), 21–29 (1979)
59. Ponomarenko, L., Gorbachev, R., Yu, G., Elias, D., Jalil, R., Patel, A., Mishchenko, A., Mayorov, A.,
Woods, C., Wallbank, J., et al.: Cloning of Dirac fermions in graphene superlattices. Nature 497(7451),
594–597 (2013)
60. Regensburger, A., Bersch, C., Miri, M.-A., Onishchukov, G., Christodoulides, D.N., Peschel, U.:
Parity-time synthetic photonic lattices. Nature 488(7410), 167–171 (2012)
61. Riddell, R.: Spectral concentration for self-adjoint operators. Pac. J. Math. 23(2), 377–401 (1967)
62. Schmidt, P., Spitzer, F.: The Toeplitz matrices of an arbitrary Laurent polynomial. Math. Scand. 8(1),
15–38 (1960)
63. Seidel, M.: On (N, ε)-pseudospectra of operators on Banach spaces. J. Funct. Anal. 262(11), 4916–4927 (2012)
64. Seidel, M., Silbermann, B.: Finite sections of band-dominated operators—norms, condition numbers
and pseudospectra. In: Operator Theory, Pseudo-differential Equations, and Mathematical Physics,
Operator Theory: Advances and Applications, vol. 228, Birkhauser/Springer Basel AG, Basel, pp.
375–390 (2013)
65. Shargorodsky, E.: Geometry of higher order relative spectra and projection methods. J. Oper. Theory
44(1), 43–62 (2000)
66. Shargorodsky, E.: On the limit behaviour of second order relative spectra of self-adjoint operators. J.
Spectr. Theory 3, 535–552 (2013)
67. Shivakumar, P., Sivakumar, K., Zhang, Y.: Infinite Matrices and Their Recent Applications. Springer,
Berlin (2016)
68. Simon, B.: The classical moment problem as a self-adjoint finite difference operator. Adv. Math. 137(1),
82–203 (1998)
69. Smale, S.: The fundamental theorem of algebra and complexity theory. Bull. Am. Math. Soc. (N.S.)
4(1), 1–36 (1981)
70. Szabo, A., Ostlund, N.S.: Modern Quantum Chemistry: Introduction to Advanced Electronic Structure
Theory. Courier Corporation, Chelmsford (2012)
71. Teschl, G.: Jacobi Operators and Completely Integrable Nonlinear Lattices. American Mathematical
Soc, Providence (2000)
72. Trefethen, L.N., Embree, M.: Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and
Operators. Princeton University Press, Princeton (2005)
73. Webb, M., Olver, S.: Spectra of Jacobi Operators Via Connection Coefficient Matrices. arXiv preprint.
arXiv:1702.03095 (2017)
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.