
Numerische Mathematik (2019) 143:17–83

https://doi.org/10.1007/s00211-019-01047-5

On the infinite-dimensional QR algorithm

Matthew J. Colbrook¹ · Anders C. Hansen¹

Received: 8 June 2018 / Revised: 11 April 2019 / Published online: 18 May 2019
© The Author(s) 2019

Abstract
Spectral computations of infinite-dimensional operators are notoriously difficult, yet ubiquitous in the sciences. Indeed, despite more than half a century of research, it is still unknown which classes of operators allow for the computation of spectra and eigenvectors with convergence rates and error control. Recent progress in classifying the difficulty of spectral problems into complexity hierarchies has revealed that the most difficult spectral problems are so hard that one needs three limits in the computation, and neither convergence rates nor error control are possible. This raises the question: which classes of operators allow for computations with convergence rates and error control? In this paper, we address this basic question using an infinite-dimensional version of the QR algorithm. Indeed, we generalise the QR algorithm to infinite-dimensional operators. We prove that not only is the algorithm executable on a finite machine, but one can also recover the extremal parts of the spectrum and corresponding eigenvectors, with convergence rates and error control. This allows for new classification results in the hierarchy of computational problems that existing algorithms have not been able to capture. The algorithm and convergence theorems are demonstrated on a wealth of examples with comparisons to standard approaches (which are notorious for providing false solutions). We also find that in some cases the IQR algorithm performs better than predicted by theory, and we make conjectures for future study.

Mathematics Subject Classification 47A10 · 65J10 · 46N40 · 03D55

¹ DAMTP, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Rd, Cambridge CB3 0WA, UK
Corresponding author: Matthew J. Colbrook

1 Introduction

Spectral computations are ubiquitous in the sciences, with applications to differential and integral equations, spline functions, orthogonal polynomials, quantum mechanics, quantum chemistry, statistical mechanics, Hermitian and non-Hermitian Hamiltonians, optics, etc. [10,26,27,34,59,67,68,70]. The computational problem is as follows. Letting $T$ denote a bounded linear operator on the canonical separable Hilbert space $l^2(\mathbb{N})$, one wants to design algorithms to compute the spectrum of $T$, denoted by $\sigma(T)$. Given the many applications, this problem has been investigated intensely since the 1950s [3,4,6,7,16,17,20–22,25,28,29,35–38,43,46,47,61,63–66,72], and we can only cite a small subset here.
In the paper “On the Solvability Complexity Index, the n-pseudospectrum and approximations of spectra of operators” [38], the Solvability Complexity Index (SCI) was introduced. The SCI provides a classification hierarchy [8,9,38] of spectral problems according to their computational difficulty. The SCI of a class of spectral problems is the least number of limits needed in order to compute the spectrum of operators in this class. From a classical numerical analysis point of view, such a concept may seem foreign. Indeed, the traditional sentiment is that one should have an algorithm $\Gamma_n$ such that, for an operator $T \in \mathcal{B}(l^2(\mathbb{N}))$,

$$\Gamma_n(T) \longrightarrow \sigma(T), \qquad n \to \infty, \tag{1.1}$$

preferably with some form of error control of the convergence. As this philosophy forms the basis of numerical analysis, it naturally permeates the classical literature on the computational spectral problem. However, as is shown in [8,9,38], an algorithm satisfying (1.1) is impossible even for the class of self-adjoint operators. Indeed, in the general case, the best possible alternative is an algorithm depending on three indices $n_1, n_2, n_3$ such that

$$\lim_{n_3 \to \infty} \lim_{n_2 \to \infty} \lim_{n_1 \to \infty} \Gamma_{n_3, n_2, n_1}(T) = \sigma(T).$$

In fact, any algorithm with fewer than three limits will fail on the general class of operators. Moreover, neither error control nor convergence rates on any of the limits are possible, since any such error control would reduce the number of limits needed. However, for the self-adjoint and normal cases, two limits suffice in order to recover the spectrum. This phenomenon implies that the only way to characterise the computational spectral problem is through a hierarchy classifying the difficulty of computing spectra of different subclasses of operators. This is the motivation behind the SCI hierarchy, which also covers general numerical analysis problems. Indeed, the SCI hierarchy is closely related to Smale’s question on the existence of purely iterative generally convergent algorithms for polynomial zero finding [69]. As demonstrated by McMullen [49,50] and Doyle and McMullen [30], this is a case where several limits are needed in the computation, and their results become special cases of classification in the SCI hierarchy [8,9].
Informally, the SCI hierarchy is characterised as follows (see “Appendix 1” for a more detailed summary describing the SCI hierarchy).

$\Delta_0$: The set of problems that can be computed in finite time, the SCI = 0.
$\Delta_1$: The set of problems that can be computed using one limit, the SCI = 1; however, one has error control and one knows an error bound that tends to zero as the algorithm progresses.
$\Delta_2$: The set of problems that can be computed using one limit, the SCI = 1, but error control may not be possible.
$\Delta_{m+1}$: For $m \in \mathbb{N}$, the set of problems that can be computed by using $m$ limits, the SCI $\leq m$.

The class $\Delta_1$ is, of course, highly desirable; however, most spectral problems are much higher in the hierarchy. For example, we have the following known classifications [8,9,38].

(i) The general spectral problem is in $\Delta_4 \setminus \Delta_3$.
(ii) The self-adjoint spectral problem is in $\Delta_3 \setminus \Delta_2$.
(iii) The compact spectral problem is in $\Delta_2 \setminus \Delta_1$.

Here, the notation $\setminus$ indicates the standard “setminus”. Note that the SCI hierarchy can be refined. We will not consider the full generalisation in the higher part of the hierarchy in this paper, but recall the class $\Sigma_1$ [24]. This class is defined as follows.

$\Sigma_1$: We have $\Delta_1 \subset \Sigma_1 \subset \Delta_2$, and $\Sigma_1$ is the set of problems that can be computed by passing to one limit. Error control may not be possible; however, there exists an algorithm for these problems that converges and whose output is included in the spectrum (up to an arbitrarily small accuracy parameter $\epsilon$).

In the context of computing $\sigma(T)$, a $\Sigma_1$ classification means the existence of an algorithm $\Gamma_n$ such that

$$\Gamma_n(T) \subset \sigma(T) + B_{2^{-n}}(0)$$

and $\Gamma_n(T)$ converges to $\sigma(T)$ in the Hausdorff metric. The $\Sigma_1$ class is very important, as it allows for algorithms that never make a mistake. In particular, one is always sure that the output is sound, but one does not know whether everything has been found yet. The simplest infinite-dimensional spectral problem is that of computing the spectrum of an infinite diagonal matrix and, as is easy to see, we have the following.

(iv) The problem of computing spectra of infinite diagonal matrices is in $\Sigma_1 \setminus \Delta_1$.
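To make the $\Sigma_1$ property concrete, here is a minimal Python sketch (an illustration of our own, not taken from the paper) of the obvious algorithm for diagonal matrices: $\Gamma_n$ reads the first $n$ diagonal entries. Since each $t_{jj}$ is an eigenvalue, $\Gamma_n(T) \subset \sigma(T)$ holds exactly. The operator $T = \operatorname{diag}(1, 1/2, 1/3, \ldots)$, with $\sigma(T) = \{1/j : j \in \mathbb{N}\} \cup \{0\}$, is a hypothetical example, and the Hausdorff error is measured against a finite truncation of $\sigma(T)$ that happens to be known in closed form.

```python
def gamma_n(diagonal, n):
    """Hypothetical Sigma_1 algorithm for a diagonal operator: return t_11, ..., t_nn.

    Each t_jj is an eigenvalue (T e_j = t_jj e_j), so the output is always
    contained in sigma(T): it is never wrong, only possibly incomplete.
    """
    return [diagonal(j) for j in range(1, n + 1)]


def dist(z, S):
    """d(z, S) = inf over s in S of |z - s|, for a finite set S."""
    return min(abs(z - s) for s in S)


def hausdorff(S1, S2):
    """Hausdorff distance between two finite subsets of C, as in (1.2)."""
    return max(max(dist(s, S2) for s in S1), max(dist(s, S1) for s in S2))


def t(j):
    """Diagonal entries of the example operator T = diag(1, 1/2, 1/3, ...)."""
    return 1.0 / j


# sigma(T) = {1/j : j >= 1} ∪ {0}; the truncation is used only to measure the error.
J = 2000
sigma_truncated = [t(j) for j in range(1, J + 1)] + [0.0]

for n in (1, 10, 100):
    print(n, hausdorff(gamma_n(t, n), sigma_truncated))   # the error equals 1/n here
```

The error in this example is $1/n$, but nothing in the output $\Gamma_n(T)$ itself reveals that: for a general diagonal matrix the unread entries could lie anywhere in the disc of radius $\|T\|$, which is, intuitively, why no error bound is available.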

Hence, the computational spectral problem becomes an infinite classification theory in order to characterise the above hierarchy. In order to do so, there will necessarily have to be many different types of algorithms. Indeed, characterising the hierarchy will yield a myriad of different approaches, as different structures on the various classes of operators will require specific algorithms. The key contribution of this paper is to investigate the convergence properties of the infinite-dimensional QR (IQR) algorithm, its implementation properties, and how this algorithm provides classification results in the SCI hierarchy.

1.1 Main contribution and novelty of the paper

The main contributions of the paper can be summarised as follows: new convergence results, algorithmic results (the IQR algorithm can be implemented), classification results in the SCI hierarchy and numerical examples.
(1) Convergence results. We provide new convergence theorems for the IQR algorithm with convergence rates and error control. The results include eigenvalues, eigenvectors and invariant subspaces.
(2) Algorithmic implementation. We prove that for infinite matrices with finitely many non-zero entries in each column, it is possible to implement the IQR algorithm exactly (on a finite machine) as if one had an infinite computer at one’s disposal. This can be extended to implementing the IQR algorithm with error control for general invertible operators.
(3) SCI hierarchy classifications. As a result of (1) and (2), we provide new classification results for the SCI hierarchy. In particular, the convergence properties of the IQR algorithm capture key structures that allow for sharp 1 classification of the problem of computing extremal points in the spectrum. Moreover, we establish sharp 1 classification of the problem of computing spectra of subclasses of compact operators.
(4) Numerical examples. Finally, we demonstrate the IQR algorithm and the proven convergence results on a variety of difficult problems in practical computation, illustrating how the IQR algorithm is much more than a theoretical concept. Moreover, the examples demonstrate that the IQR algorithm performs much better than predicted by our theory, working on much larger classes of operators. Hence, we are left with many open problems on the theoretical understanding of the potential and limitations of this algorithm. The computational experiments include examples from
(i) Toeplitz/Laurent operators and their perturbations,
(ii) PT-symmetry in quantum mechanics,
(iii) Hopping sign model in sparse neural networks,
(iv) NSA Anderson model in superconductors.

1.2 Connection to previous work

Our results connect to many different approaches in the vast literature on spectral computation in infinite dimensions. The infinite-dimensional computational spectral problem is very different from the finite-dimensional computational eigenvalue problem, and even though the IQR algorithm is inspired by the finite-dimensional version, this paper solely focuses on the infinite-dimensional problem. Thus, the paper is aimed at the analysis and numerical analysis audience focusing on infinite-dimensional problems rather than the finite-dimensional numerical linear algebra discipline.

Finite sections. The IQR algorithm provides an alternative to the standard finite section method in several cases where it fails. Whereas the finite section method would extract a finite section from the infinite matrix and then apply, for example, the finite-dimensional QR algorithm, the IQR algorithm first performs the infinite QR iterations and then extracts a finite section. In general, these two processes do not commute. The finite section method (or any derivative of it) cannot work in general because of the general classification results in the SCI hierarchy mentioned in Sect. 1. Typically, it may provide false solutions. However, in the cases where it converges, it provides invaluable $\Delta_2$ classifications in the SCI hierarchy. The finite section method has often been viewed in connection with Toeplitz theory, and the reader may want to consult the work by Böttcher [14,15], Böttcher and Silbermann [18], Böttcher et al. [16], Brunner et al. [22], Hagen et al. [35], Lindner [44], Marletta [46] and Marletta and Scheichl [47]. From the operator algebra point of view, the work of Arveson [5–7] has been influential, as has the work of Brown [21].
Infinite-dimensional Toda flow. Deift et al. [28] provided the first results on the IQR algorithm in connection with Toda flows with infinitely many variables. Their results are purely functional analytic and do not take implementation and computability issues into account. However, these results provide the fundamentals of the IQR algorithm. In [36] these results were expanded with a convergence result for eigenvectors corresponding to eigenvalues outside the essential numerical range for normal operators. Yet, that paper did not consider convergence rates, actual numerical calculations or any classification results.
Infinite-dimensional QL algorithm. Olver, Townsend and Webb have provided a practical framework for infinite-dimensional linear algebra and foundational results on computations with infinite data structures [53–56,73]. This includes efficient codes as well as theoretical results. The infinite-dimensional QL (IQL) algorithm is an important part of this program. The IQL algorithm is rather different from the IQR algorithm, although they are similar in spirit. In particular, both the implementation and the convergence results are somewhat contrasting.
Infinite-dimensional spectral computation. The results in this paper follow in the long tradition of infinite-dimensional spectral computations. This field contains a vast literature that spans more than half a century, and the references that we have cited in the first paragraph of Sect. 1 represent a small sample. However, we would like to highlight the recent work by Bögli et al. [13], who were able to computationally confirm, with absolute certainty, a conjecture on a certain oscillatory behaviour of higher auto-ionizing resonances of atoms. Note that problems that are classified as $\Delta_1$ and $\Sigma_1$ in the SCI hierarchy may allow for computer-assisted proofs.

1.3 Background and notation

Here we briefly recall some definitions used in the paper. We will consider the canonical separable Hilbert space $\mathcal{H} = l^2(\mathbb{N})$ (the set of square-summable sequences). Moreover, we write $\mathcal{B}(\mathcal{H})$ for the set of bounded operators on $\mathcal{H}$. For orthogonal projections $E, F$, we will write $E \leq F$ if the range of $E$ is a subspace of the range of $F$. We denote the canonical orthonormal basis of $\mathcal{H}$ by $\{e_j\}_{j \in \mathbb{N}}$, and if $\xi \in \mathcal{H}$ we write $\xi(j) = \langle \xi, e_j \rangle$. Note that $T \in \mathcal{B}(\mathcal{H})$ is uniquely determined by its matrix elements $t_{ij} = \langle T e_j, e_i \rangle$. Hence we will use the words bounded operator and infinite matrix interchangeably. Given a sequence of operators $\{T_n\}$, we will use the notation

$$T_n \xrightarrow{\ \text{SOT}\ } T, \qquad T_n \xrightarrow{\ \text{WOT}\ } T$$

to mean convergence in the strong and weak operator topology, respectively. The spectrum of $T \in \mathcal{B}(\mathcal{H})$ will be denoted by $\sigma(T)$, and $\sigma_d(T)$ denotes the set of isolated eigenvalues with finite multiplicity (the discrete spectrum).
In connection with the spectrum, we need to recall some definitions which will appear in the statement of our theorems. We recall that, for $T \in \mathcal{B}(\mathcal{H})$, the essential spectrum¹ and the essential spectral radius are given by

$$\sigma_{\mathrm{ess}}(T) = \{z \in \mathbb{C} : T - zI \text{ is not Fredholm}\}, \qquad r_{\mathrm{ess}}(T) = \sup\{|z| : z \in \sigma_{\mathrm{ess}}(T)\}.$$

Moreover, the numerical range and the essential numerical range of $T$ are defined by

$$W(T) = \{\langle T\xi, \xi \rangle : \|\xi\| = 1\}, \qquad W_e(T) = \bigcap_{K \text{ compact}} \overline{W(T + K)}.$$

In addition, we need the Hausdorff metric, defined as follows. Let $S, T \subset \mathbb{C}$ be compact. Then their Hausdorff distance is

$$d_H(S, T) = \max\Big\{ \sup_{\lambda \in S} d(\lambda, T),\ \sup_{\lambda \in T} d(\lambda, S) \Big\}, \tag{1.2}$$

where $d(\lambda, T) = \inf_{\rho \in T} |\rho - \lambda|$. We also recall a generalisation of the spectrum, known as the pseudospectrum. Indeed, for $\epsilon > 0$ define the $\epsilon$-pseudospectrum as

$$\sigma_\epsilon(T) = \big\{ z \in \mathbb{C} : \|(T - zI)^{-1}\| \geq \epsilon^{-1} \big\},$$

where we interpret $\|S^{-1}\|$ as $+\infty$ if $S$ does not have a bounded inverse. This is easier to compute than the spectrum, converges in the Hausdorff metric to the spectrum as $\epsilon \downarrow 0$, and gives an indication of the instability of the spectrum of $T$. We shall use it as a comparison for the IQR algorithm and as a means to detect spectral pollution for finite section methods.
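As a small illustration of why $\sigma_\epsilon(T)$ signals spectral instability, the following sketch (our own toy example, restricted to $2 \times 2$ matrices so that the resolvent norm has a closed form via singular values) compares a nilpotent Jordan block with the zero matrix. Both have spectrum $\{0\}$, yet $z = 0.1$ lies in $\sigma_{0.01}$ of the Jordan block and not in $\sigma_{0.01}$ of the zero matrix. The function names are hypothetical helpers.

```python
import math


def resolvent_norm_2x2(T, z):
    """||(T - zI)^{-1}|| in the operator 2-norm, for a 2x2 complex matrix T.

    With A = T - zI, the singular values satisfy s_max^2 + s_min^2 = ||A||_F^2
    and s_max * s_min = |det A|, so ||A^{-1}|| = 1/s_min = s_max/|det A|.
    Returns math.inf when A is singular (z is an eigenvalue).
    """
    a, b = T[0][0] - z, T[0][1]
    c, d = T[1][0], T[1][1] - z
    det = a * d - b * c
    if det == 0:
        return math.inf
    fro2 = abs(a) ** 2 + abs(b) ** 2 + abs(c) ** 2 + abs(d) ** 2   # trace(A^* A)
    disc = math.sqrt(max(fro2 ** 2 - 4 * abs(det) ** 2, 0.0))
    s_max = math.sqrt((fro2 + disc) / 2)
    return s_max / abs(det)


def in_pseudospectrum(T, z, eps):
    """Membership test for sigma_eps(T) = {z : ||(T - zI)^{-1}|| >= 1/eps}."""
    return resolvent_norm_2x2(T, z) >= 1.0 / eps


jordan = [[0.0, 1.0], [0.0, 0.0]]   # non-normal, spectrum {0}
zero = [[0.0, 0.0], [0.0, 0.0]]     # normal, spectrum {0}
z, eps = 0.1, 0.01
print(resolvent_norm_2x2(jordan, z), in_pseudospectrum(jordan, z, eps))   # about 101, True
print(resolvent_norm_2x2(zero, z), in_pseudospectrum(zero, z, eps))       # about 10, False
```

For the normal matrix the resolvent norm is simply $1/\operatorname{dist}(z, \sigma(T))$, whereas the non-normal block has a much larger pseudospectrum than its spectrum suggests.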
Finally, we need a notion of convergence of subspaces. We follow the notation in [41]. Let $M \subset \mathcal{B}$ and $N \subset \mathcal{B}$ be two non-trivial closed subspaces of a Banach space $\mathcal{B}$. The distance between them is defined by

$$\delta(M, N) = \sup_{\substack{x \in M \\ \|x\| = 1}} \inf_{y \in N} \|x - y\|, \qquad \hat{\delta}(M, N) = \max[\delta(M, N), \delta(N, M)].$$

¹ Of course, in the case of non-normal $T$ there are different definitions of the essential spectrum. However, these differences will not matter regarding the results of this paper.

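Given subspaces $M$ and $\{M_k\}$ such that $\hat{\delta}(M_k, M) \to 0$ as $k \to \infty$, we will use the notation $M_k \to M$. If we replace $\mathcal{B}$ with a Hilbert space $\mathcal{H}$, we can express $\delta$ and $\hat{\delta}$ conveniently in terms of projections and operator norms. In particular, if $E$ and $F$ are the orthogonal projections onto subspaces $M \subset \mathcal{H}$ and $N \subset \mathcal{H}$ respectively, then

$$\delta(M, N) = \sup_{\substack{x \in M \\ \|x\| = 1}} \inf_{y \in N} \|x - y\| = \sup_{\substack{x \in M \\ \|x\| = 1}} \|F^{\perp} x\| = \|F^{\perp} E\|.$$

Since the operator $E - F = F^{\perp}E - FE^{\perp}$ is essentially the direct sum of operators $F^{\perp}E \oplus (-FE^{\perp})$ (the two terms act on the orthogonal subspaces $\operatorname{ran} E$ and $\operatorname{ran} E^{\perp}$ and map into the orthogonal subspaces $\operatorname{ran} F^{\perp}$ and $\operatorname{ran} F$), its norm is $\hat{\delta}(M, N)$, i.e.

$$\hat{\delta}(M, N) = \max(\|F^{\perp}E\|, \|E^{\perp}F\|) = \max(\|F^{\perp}E\|, \|FE^{\perp}\|) = \|E - F\|, \tag{1.3}$$

where the middle equality uses $\|E^{\perp}F\| = \|(E^{\perp}F)^*\| = \|FE^{\perp}\|$. This allows us to extend the definition to allow the trivial subspace $\{0\}$ and gives rise to a metric on the set of all closed subspaces of $\mathcal{H}$ (first introduced by Krein and Krasnoselski in [42]). We also define the (maximal) subspace angle $\phi(M, N) \in [0, \pi/2]$ between $M$ and $N$ by

$$\sin(\phi(M, N)) = \hat{\delta}(M, N). \tag{1.4}$$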
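For one-dimensional subspaces, (1.3) and (1.4) can be checked numerically. The following sketch (a toy check with real vectors; all helper names are ours) takes lines $M = \operatorname{span}\{u\}$ and $N = \operatorname{span}\{v\}$ meeting at angle $\phi$, computes $\delta(M, N) = \|F^{\perp}u\|$ directly, and estimates $\|E - F\|$ by power iteration on $(E - F)^2$. Both should equal $\sin \phi$.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    return math.sqrt(dot(x, x))


def projector(u):
    """Orthogonal projection onto span{u} (u real and non-zero) as a dense matrix."""
    s = dot(u, u)
    return [[ui * uj / s for uj in u] for ui in u]


def matvec(A, x):
    return [dot(row, x) for row in A]


def delta_lines(u, v):
    """delta(M, N) = ||F^perp u|| for M = span{u}, N = span{v}, u and v unit vectors."""
    c = dot(u, v)
    return norm([ui - c * vi for ui, vi in zip(u, v)])


def op_norm_symmetric(A, iters=100):
    """Operator 2-norm of a real symmetric matrix via power iteration on A^2.

    ||A^2 x|| / ||x|| tends to the largest eigenvalue of A^2, assuming the
    start vector is not orthogonal to the corresponding eigenspace.
    """
    x = [1.0] * len(A)
    lam = 0.0
    for _ in range(iters):
        y = matvec(A, matvec(A, x))
        ny = norm(y)
        if ny == 0.0:
            return 0.0
        lam = ny / norm(x)
        x = [yi / ny for yi in y]
    return math.sqrt(lam)


phi = 0.3
u = [1.0, 0.0, 0.0]
v = [math.cos(phi), math.sin(phi), 0.0]
E, F = projector(u), projector(v)
E_minus_F = [[e - f for e, f in zip(row_e, row_f)] for row_e, row_f in zip(E, F)]
print(delta_lines(u, v), op_norm_symmetric(E_minus_F), math.sin(phi))
```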

We will also use two further well-known properties in the Hilbert space setting. First, if $M$ and $N$ are both finite $l$-dimensional subspaces, then

$$\delta(M, N) \leq l^{1/2}\, \delta(N, M), \tag{1.5}$$

which shows that to prove convergence of finite-dimensional subspaces, it is enough to prove $\delta$-convergence. Second, suppose we have

$$M = \bigoplus_{j=1}^{n} M_j, \qquad N^{(k)} = N_1^{(k)} + \cdots + N_n^{(k)},$$

where the $N_j^{(k)}$ need not be orthogonal. Then a simple application of Hölder’s inequality yields

$$\delta(M, N^{(k)}) \leq \Big( \sum_{j=1}^{n} \delta(M_j, N_j^{(k)})^2 \Big)^{1/2}, \tag{1.6}$$

which shows that if the dimensions of $M_j$ and $N_j^{(k)}$ are finite and equal, then to prove convergence $N^{(k)} \to M$ we only need to prove that $\delta(M_j, N_j^{(k)}) \to 0$ as $k \to \infty$. For further properties (including other notions of distances between subspaces) and a discussion on two projections theory, we refer the reader to the excellent article of Böttcher and Spitkovsky [19].

1.4 Organisation of the paper

The paper is organised as follows. In Sect. 2 we define the IQR algorithm (simple codes are also provided in the appendix). Section 3 contains and proves our main theorems, including convergence rates. The outcome is more elaborate than in the finite-dimensional case, as the infinite-dimensional setting includes more intricate instances. Our key practical result is that, despite being an algorithm dealing with an infinite amount of information, it can be implemented on any standard computer; this is discussed in Sect. 4. The fact that the IQR algorithm can be computed allows for its use in providing new classifications in the SCI hierarchy, as discussed in Sect. 5. In particular, we demonstrate 1 classification for the extremal part of the spectrum and dominant invariant subspaces, as well as 1 results for spectra of certain classes of compact operators. Note that the general spectral problem for compact operators is not in 1. The IQR algorithm and convergence theorems are demonstrated on a large collection of examples from the sciences on difficult computational spectral problems in Sect. 6, with comparisons to the finite section method. The IQR algorithm is also found to perform better than theory predicts, and we conjecture conditions on the operator for this to be the case. Finally, we conclude with a discussion of the opportunities and limits of the IQR algorithm in Sect. 7.

2 The infinite-dimensional QR algorithm (IQR)

The IQR algorithm has existed as a pure mathematical concept for more than thirty years; it first appeared in the paper “Toda Flows with Infinitely Many Variables” [28] in 1985. However, the analysis in [28] covers only self-adjoint infinite matrices with real entries, and since the analysis is done from a pure mathematical perspective, the question regarding the actual numerical algorithm is left out. We will extend the analysis to more general operators and answer the crucial question: can one actually implement the IQR algorithm? The answer is affirmative, and we also prove convergence theorems, generalising the well-known finite-dimensional case.

2.1 The QR decomposition

The QR decomposition is the core of the QR algorithm. If $T \in \mathbb{C}^{n \times n}$, one may apply the Gram–Schmidt procedure to the columns of $T$ and store these columns in a matrix $Q$. This gives us the QR decomposition

$$T = QR, \tag{2.1}$$

where $Q$ is a unitary matrix and $R$ is upper triangular. It is no surprise that a QR decomposition should exist in the infinite-dimensional case; however, we need more than just existence. A key ingredient in the QR algorithm is the use of Householder transformations, for computational reasons (they are backward stable). It is crucial that we can adopt these tools in the infinite-dimensional setting. Our goal is to extend the construction of the QR decomposition, via Householder transformations, to infinite matrices and to find a way to implement the procedure on a finite machine. To do this, we need to introduce the concept of Householder reflections in the infinite-dimensional setting.
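Definition 2.1 A Householder reflection is an operator $S \in \mathcal{B}(\mathcal{H})$ of the form

$$S = I - \frac{2}{\|\xi\|^2}\, \xi \otimes \bar{\xi}, \qquad \xi \in \mathcal{H}, \tag{2.2}$$

where $\bar{\xi}$ denotes the associated functional in $\mathcal{H}^*$ given by $x \mapsto \langle x, \xi \rangle$. In the case where $\mathcal{H} = \mathcal{H}_1 \oplus \mathcal{H}_2$ and $I_i$ is the identity on $\mathcal{H}_i$, then

$$U = I_1 \oplus \Big( I_2 - \frac{2}{\|\xi\|^2}\, \xi \otimes \bar{\xi} \Big), \qquad \xi \in \mathcal{H}_2,$$

will be called a Householder transformation.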


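A straightforward calculation shows that $S^* = S^{-1} = S$ and thus also $U^* = U^{-1} = U$. An important property of the operator $S$ is that if $\{e_j\}$ is an orthonormal basis for $\mathcal{H}$ and $\eta \in \mathcal{H}$, then one can choose $\xi \in \mathcal{H}$ such that

$$\langle S\eta, e_j \rangle = \Big\langle \Big( I - \frac{2}{\|\xi\|^2}\, \xi \otimes \bar{\xi} \Big) \eta, e_j \Big\rangle = 0, \qquad \forall j \neq 1.$$

In other words, one can introduce zeros in the column below the diagonal entry. Indeed, if $\eta_1 = \langle \eta, e_1 \rangle \neq 0$ one may choose $\xi = \eta \pm \|\eta\| \zeta$, where $\zeta = (\eta_1/|\eta_1|)\, e_1$, and if $\eta_1 = 0$ choose $\xi = \eta \pm \|\eta\| e_1$; in both cases $S\eta = \mp \|\eta\| \zeta$ (with $\zeta = e_1$ in the second case). The following theorem gives the existence of a QR decomposition, even in the case where the operator is not invertible.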
Theorem 2.2 ([36]) Let $T$ be a bounded operator on a separable Hilbert space $\mathcal{H}$ and let $\{e_j\}_{j \in \mathbb{N}}$ be an orthonormal basis for $\mathcal{H} \cong l^2(\mathbb{N})$. Then there exists an isometry $Q$ such that $T = QR$, where $R$ is upper triangular with respect to $\{e_j\}$. Moreover,

$$Q = \operatorname*{SOT-lim}_{n \to \infty} V_n,$$

where $V_n = U_1 \cdots U_n$ are unitary and each $U_j$ is a Householder transformation.
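In finite dimensions the strong limit in Theorem 2.2 is just a finite product, and the construction reduces to the textbook Householder QR factorisation. The sketch below is a plain-Python illustration under that assumption (square complex matrices stored as lists of rows, with the "+" sign in $\xi = \eta + \|\eta\|\zeta$ chosen to avoid cancellation); it is not the implementation used in the paper, and the function names are hypothetical.

```python
import math


def householder_qr(A):
    """QR factorisation A = Q R of a square complex matrix (list of rows).

    Step k applies U_k = I_1 ⊕ (I_2 - 2 xi xi^* / ||xi||^2), where I_1 acts on
    the first k coordinates, so that column k becomes zero below the diagonal.
    """
    n = len(A)
    R = [[complex(x) for x in row] for row in A]
    Q = [[complex(i == j) for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        eta = [R[i][k] for i in range(k, n)]          # column k on and below the diagonal
        norm_eta = math.sqrt(sum(abs(x) ** 2 for x in eta))
        if norm_eta == 0.0:
            continue                                  # nothing to eliminate
        # zeta = (eta_1/|eta_1|) e_1, or e_1 if eta_1 = 0; '+' avoids cancellation
        phase = eta[0] / abs(eta[0]) if eta[0] != 0 else 1.0
        xi = eta[:]
        xi[0] += norm_eta * phase
        xi2 = sum(abs(x) ** 2 for x in xi)
        # R <- U_k R: only rows k..n-1 (and columns k..n-1) change
        for j in range(k, n):
            coef = 2 * sum(xi[i].conjugate() * R[k + i][j] for i in range(len(xi))) / xi2
            for i in range(len(xi)):
                R[k + i][j] -= coef * xi[i]
        for i in range(k + 1, n):
            R[i][k] = 0j                              # exact zeros introduced by U_k
        # Q <- Q U_k (U_k is self-adjoint): only columns k..n-1 change
        for i in range(n):
            coef = 2 * sum(Q[i][k + l] * xi[l] for l in range(len(xi))) / xi2
            for l in range(len(xi)):
                Q[i][k + l] -= coef * xi[l].conjugate()
    return Q, R


def matmul(X, Y):
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[2, 1j, 0], [1, 3, 1 - 1j], [0, 2, 1]]
Q, R = householder_qr(A)
QR = matmul(Q, R)
print(max(abs(QR[i][j] - A[i][j]) for i in range(3) for j in range(3)))   # round-off level
```

Since each $U_k$ is its own inverse, the accumulated $Q = U_1 \cdots U_{n-1}$ satisfies $A = QR$ once $R = U_{n-1} \cdots U_1 A$ has been formed.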

2.2 The IQR algorithm

Let $T \in \mathcal{B}(\mathcal{H})$ be invertible and let $\{e_j\}$ be an orthonormal basis for $\mathcal{H}$. By Theorem 2.2 we have $T = QR$, where $Q$ is an isometry and $R$ is upper triangular with respect to $\{e_j\}$. Since $T$ is invertible, $Q$ is in fact unitary. Consider the following construction of unitary operators $\{\hat{Q}_k\}$ and upper triangular (w.r.t. $\{e_j\}$) operators $\{\hat{R}_k\}$. Let $T = Q_1 R_1$ be a QR decomposition of $T$ and define $T_1 = R_1 Q_1$. Then QR-factorise $T_1 = Q_2 R_2$ and define $T_2 = R_2 Q_2$. The recursive procedure becomes

$$T_{m-1} = Q_m R_m, \qquad T_m = R_m Q_m. \tag{2.3}$$

Now define

$$\hat{Q}_m = Q_1 Q_2 \cdots Q_m, \qquad \hat{R}_m = R_m R_{m-1} \cdots R_1. \tag{2.4}$$

This is known as the QR algorithm and is completely analogous to the finite-dimensional case. Note also that we have $T_n = \hat{Q}_n^* T \hat{Q}_n$, since each step gives $T_m = R_m Q_m = Q_m^* T_{m-1} Q_m$. In the finite-dimensional case and under favourable conditions, $\hat{Q}_n^* T \hat{Q}_n$ converges to a diagonal operator and the columns of $\hat{Q}_n$ converge to the corresponding eigenvectors as $n \to \infty$ (see Theorem 3.1 below). We will see that the IQR algorithm behaves similarly for the extreme parts of the spectrum.
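The recursion (2.3)–(2.4) can be tried out on a small matrix. The sketch below is a toy illustration, not the paper's code: it uses a modified Gram–Schmidt QR (the construction mentioned in Sect. 2.1; Householder reflections would serve equally well) on a hypothetical real symmetric $3 \times 3$ matrix whose eigenvalues $3 + \sqrt{3}$, $3$, $3 - \sqrt{3}$ have distinct moduli, and checks both $T_m = \hat{Q}_m^* T \hat{Q}_m$ and the convergence to diagonal form.

```python
import math


def mgs_qr(A):
    """Modified Gram-Schmidt QR of a square invertible real matrix (list of rows).

    Returns (Q, R) with A = Q R, Q orthogonal and R upper triangular with a
    positive diagonal.
    """
    n = len(A)
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(n)]
        for i, q in enumerate(q_cols):
            R[i][j] = sum(qk * vk for qk, vk in zip(q, v))
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q)]
        R[j][j] = math.sqrt(sum(vk * vk for vk in v))
        q_cols.append([vk / R[j][j] for vk in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def qr_algorithm(T, m):
    """m steps of (2.3): T_{k-1} = Q_k R_k, T_k = R_k Q_k, with T_0 = T.

    Returns T_m and Q_hat_m = Q_1 Q_2 ... Q_m from (2.4).
    """
    n = len(T)
    Tk = [row[:] for row in T]
    Q_hat = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(m):
        Q, R = mgs_qr(Tk)
        Tk = matmul(R, Q)
        Q_hat = matmul(Q_hat, Q)
    return Tk, Q_hat


T = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]   # eigenvalues 3+sqrt(3), 3, 3-sqrt(3)
Tm, Q_hat = qr_algorithm(T, 100)
print([Tm[i][i] for i in range(3)])   # approaches the eigenvalues of T
```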

Definition 2.3 Let $T \in \mathcal{B}(\mathcal{H})$ be invertible and let $\{e_j\}$ be an orthonormal basis for $\mathcal{H}$. The sequences $\{\hat{Q}_j\}$ and $\{\hat{R}_j\}$ constructed as in (2.3) and (2.4) will be called a Q-sequence and an R-sequence of $T$ with respect to $\{e_j\}$.

Remark 2.4 Note that since the Householder transformations used in the proof of
Theorem 2.2 are unique up to a ± sign, we will with some abuse of language refer
to the QR decomposition constructed as the QR decomposition. In general for an
invertible operator, the IQR algorithm is uniquely defined up to phase—see Sect. 4.2.
This will not be a problem for our theorems or numerical examples.

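The following observation will be useful in the later developments. From the construction in (2.3) and (2.4) we get

$$\begin{aligned}
T &= Q_1 R_1 = \hat{Q}_1 \hat{R}_1, \\
T^2 &= Q_1 R_1 Q_1 R_1 = Q_1 Q_2 R_2 R_1 = \hat{Q}_2 \hat{R}_2, \\
T^3 &= Q_1 R_1 Q_1 R_1 Q_1 R_1 = Q_1 Q_2 R_2 Q_2 R_2 R_1 = Q_1 Q_2 Q_3 R_3 R_2 R_1 = \hat{Q}_3 \hat{R}_3.
\end{aligned}$$

An easy induction gives us that

$$T^m = \hat{Q}_m \hat{R}_m. \tag{2.5}$$

Note that $\hat{R}_m$ must be upper triangular with respect to $\{e_j\}_{j \in \mathbb{N}}$, since each $R_j$, $j \leq m$, is upper triangular with respect to $\{e_j\}_{j \in \mathbb{N}}$. Also, if $T$ is invertible then $\langle R_j e_i, e_i \rangle \neq 0$ for all $i$ and $j$, and hence $\langle \hat{R}_m e_i, e_i \rangle = \prod_{j=1}^{m} \langle R_j e_i, e_i \rangle \neq 0$. From this it follows immediately that

$$\operatorname{span}\{T^m e_j\}_{j=1}^{J} = \operatorname{span}\{\hat{Q}_m e_j\}_{j=1}^{J}, \qquad J \in \mathbb{N}. \tag{2.6}$$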
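The identities (2.5) and (2.6) can be checked numerically in the same toy setting. This sketch (again with a modified Gram–Schmidt QR and a hypothetical test matrix, not code from the paper) accumulates $\hat{Q}_m$ and $\hat{R}_m$ as in (2.4), compares $\hat{Q}_m \hat{R}_m$ with $T^m$, and checks the case $J = 1$ of (2.6): the normalised first column of $T^m$ equals $\hat{Q}_m e_1$, because Gram–Schmidt gives each $R_j$ a positive diagonal.

```python
import math


def mgs_qr(A):
    """Modified Gram-Schmidt QR of a square invertible real matrix (list of rows)."""
    n = len(A)
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(n)]
        for i, q in enumerate(q_cols):
            R[i][j] = sum(qk * vk for qk, vk in zip(q, v))
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q)]
        R[j][j] = math.sqrt(sum(vk * vk for vk in v))   # positive diagonal entry
        q_cols.append([vk / R[j][j] for vk in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]


def q_and_r_sequences(T, m):
    """Q_hat_m = Q_1 ... Q_m and R_hat_m = R_m ... R_1 from (2.3)-(2.4)."""
    Tk = [row[:] for row in T]
    Q_hat, R_hat = identity(len(T)), identity(len(T))
    for _ in range(m):
        Q, R = mgs_qr(Tk)
        Tk = matmul(R, Q)
        Q_hat = matmul(Q_hat, Q)   # right-multiply by the new Q_k
        R_hat = matmul(R, R_hat)   # left-multiply by the new R_k
    return Q_hat, R_hat


T = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
m = 5
Q_hat, R_hat = q_and_r_sequences(T, m)
T_pow = identity(3)
for _ in range(m):
    T_pow = matmul(T_pow, T)
QR = matmul(Q_hat, R_hat)
print(max(abs(QR[i][j] - T_pow[i][j]) for i in range(3) for j in range(3)))   # round-off level

# (2.6) with J = 1: T^m e_1 is a positive multiple of Q_hat_m e_1.
col = [T_pow[i][0] for i in range(3)]
scale = math.sqrt(sum(c * c for c in col))
print([col[i] / scale - Q_hat[i][0] for i in range(3)])                       # round-off level
```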

3 Convergence theorems

In finite dimensions we have the following well-known theorem:

Theorem 3.1 (Finite dimensions) Let $T \in \mathbb{C}^{N \times N}$ be a normal matrix with eigenvalues satisfying $|\lambda_1| > \cdots > |\lambda_N|$. Let $\{Q_m\}$ be a Q-sequence of unitary operators. Then (up to re-ordering of the basis)

$$Q_m^* T Q_m \longrightarrow \sum_{j=1}^{N} \lambda_j\, e_j \otimes e_j, \qquad \text{as } m \to \infty.$$

In this section we will address the convergence of the IQR algorithm for normal operators under similar assumptions and prove an analogue of Theorem 3.1 in infinite dimensions (Theorem 3.9). As well as this, and for more general operators $T$ that are not necessarily normal, we address block convergence (Theorem 3.13), relevant when the eigenvalues do not have distinct moduli, and convergence to (dominant) invariant subspaces (Theorem 3.15).
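The hypothesis of distinct moduli in Theorem 3.1 matters. In the following toy sketch (our own example, using the same modified Gram–Schmidt QR as a stand-in for the Householder construction), the symmetric matrix with eigenvalues $\pm 1$ is a fixed point of the unshifted QR iteration, so $T_m = T$ never becomes diagonal, whereas a matrix with eigenvalues $(1 \pm \sqrt{5})/2$ converges quickly. Equal moduli are exactly the situation in which block convergence is the appropriate notion.

```python
import math


def mgs_qr(A):
    """Modified Gram-Schmidt QR of a square invertible real matrix (list of rows)."""
    n = len(A)
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(n)]
        for i, q in enumerate(q_cols):
            R[i][j] = sum(qk * vk for qk, vk in zip(q, v))
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q)]
        R[j][j] = math.sqrt(sum(vk * vk for vk in v))
        q_cols.append([vk / R[j][j] for vk in v])
    return [[q_cols[j][i] for j in range(n)] for i in range(n)], R


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def qr_iterate(T, m):
    """T_m after m unshifted QR steps (2.3)."""
    Tk = [row[:] for row in T]
    for _ in range(m):
        Q, R = mgs_qr(Tk)
        Tk = matmul(R, Q)
    return Tk


def off_diagonal(T):
    n = len(T)
    return max(abs(T[i][j]) for i in range(n) for j in range(n) if i != j)


swap = [[0.0, 1.0], [1.0, 0.0]]   # eigenvalues 1 and -1: equal moduli
fib = [[1.0, 1.0], [1.0, 0.0]]    # eigenvalues (1 + sqrt(5))/2 and (1 - sqrt(5))/2
print(off_diagonal(qr_iterate(swap, 50)))   # 1.0: Q_1 = T and R_1 = I, so T_m = T for all m
print(off_diagonal(qr_iterate(fib, 50)))    # tiny: decays roughly like 0.382**m
```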

3.1 Preliminary definitions and results

To state and prove our theorems we need some preliminary results. The reader only interested in the results themselves is referred to Sect. 3.2. If $T$ is a normal operator, we will use $\chi_S(T)$ to denote the indicator function of the set $S$ defined via the functional calculus. Without loss of generality, we deal with the Hilbert space $\mathcal{H} = l^2(\mathbb{N})$ and the canonical orthonormal basis $\{e_j\}_{j \in \mathbb{N}}$. Our first set of results concerns the convergence of spanning sets under power iterations and is analogous to the finite-dimensional case. The following proposition can be found in [36] and, together with Lemma 3.6 below, these are the only results we will use from [36].

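Proposition 3.2 Suppose that $T \in \mathcal{B}(\mathcal{H})$ is normal, is invertible and that $\sigma(T) = \omega \cup \Psi$ is a disjoint union such that $\omega = \{\lambda_i\}_{i=1}^N$ consists of finitely many isolated eigenvalues of $T$ with $|\lambda_1| > |\lambda_2| > \cdots > |\lambda_N|$. Suppose further that $\sup\{|z| : z \in \Psi\} < |\lambda_N|$. Let $l \in \mathbb{N}$ and suppose that $\{\xi_i\}_{i=1}^l$ are linearly independent vectors in $\mathcal{H}$ such that $\{\chi_\omega(T)\xi_i\}_{i=1}^l$ are also linearly independent. Then

(i) The vectors $\{T^k \chi_\omega(T)\xi_i\}_{i=1}^l$ are linearly independent and there exists an $l$-dimensional subspace $B \subset \operatorname{ran}\chi_\omega(T)$ such that

$$\operatorname{span}\{T^k \xi_i\}_{i=1}^l \to B, \qquad \text{as } k \to \infty.$$

(ii) If

$$\operatorname{span}\{T^k \xi_i\}_{i=1}^{l-1} \to D \subset \mathcal{H}, \qquad \text{as } k \to \infty,$$

where $D$ is an $(l-1)$-dimensional subspace, then

$$\operatorname{span}\{T^k \xi_i\}_{i=1}^{l} \to D \oplus \operatorname{span}\{\xi\}, \qquad \text{as } k \to \infty,$$

where $\xi \in \operatorname{ran}\chi_\omega(T)$ is an eigenvector of $T$.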

In order to extend this proposition to describe rates of convergence and prove our main theorems, we need to describe the space $B$ in more detail. This is done inductively as follows. The first step is to choose $\nu_{1,1} \in \{\lambda_i\}_{i=1}^N$ of maximum modulus such that

$$\operatorname{span}\{\chi_{\nu_{1,1}}(T)\xi_1\} \neq \{0\}.$$

We then let $\xi_{1,1}$ be a linear multiple of $\xi_1$ such that $\chi_{\nu_{1,1}}(T)\xi_{1,1}$ has norm one. Now suppose that at the $m$-th stage we have constructed vectors $\{\xi_{m,i}\}_{i=1}^m$ with the same linear span as $\{\xi_i\}_{i=1}^m$ and such that there exist $\{\nu_{m,j}\}_{j=1}^{s_m} \subset \{\lambda_i\}_{i=1}^N$ with the following properties. After re-ordering the vectors $\{\xi_{m,i}\}_{i=1}^m$ if necessary, there exist integers $0 = k_{m,0} < k_{m,1} < k_{m,2} < \cdots < k_{m,s_m} = m$ such that

(1) $|\nu_{m,s_m}| < |\nu_{m,s_m - 1}| < \cdots < |\nu_{m,1}|$.
(2) $\chi_\lambda(T)\xi_{m,i} = 0$ if $i > k_{m,j}$ and $\lambda \in \{\lambda_i\}_{i=1}^N$ has $|\lambda| > |\nu_{m,j+1}|$.
(3) $\{\chi_{\nu_{m,j}}(T)\xi_{m,i}\}_{i=k_{m,j-1}+1}^{k_{m,j}}$ are orthonormal.

We seek to add the space spanned by the vector $\xi_{m+1}$ whilst preserving these properties. First we deal with (2). Let $\eta_{m+1} \in \{\lambda_i\}_{i=1}^N$ be of maximal modulus such that $\chi_{\{\lambda_1, \ldots, \eta_{m+1}\}}(T)\xi_{m+1} \notin \operatorname{span}\{\chi_{\{\lambda_1, \ldots, \eta_{m+1}\}}(T)\xi_j\}_{j=1}^m$. If $|\eta_{m+1}| < |\nu_{m,1}|$, then let $t(m+1)$ be maximal such that $|\eta_{m+1}| < |\nu_{m,t(m+1)}|$. We then choose complex numbers $\{a_{m,j}\}_{j=1}^{k_{m,t(m+1)}}$ such that, writing

$$\tilde{\xi}_{m+1,m+1} = \xi_{m+1} + \sum_{j=1}^{k_{m,t(m+1)}} a_{m,j}\, \xi_{m,j},$$

we have that $\chi_\lambda(T)\tilde{\xi}_{m+1,m+1} = 0$ if $\lambda \in \{\lambda_i\}_{i=1}^N$ has $|\lambda| > |\eta_{m+1}|$. Note that by (2), (3) and the definition of $\eta_{m+1}$, the coefficients $a_{m,j}$ are determined uniquely in terms of $\{\xi_{m,i}\}_{i=1}^{k_{m,t(m+1)}}$. If $|\eta_{m+1}| \geq |\nu_{m,1}|$, then let $t(m+1) = 0$ and set $\tilde{\xi}_{m+1,m+1} = \xi_{m+1}$. In this case we still have that $\chi_\lambda(T)\tilde{\xi}_{m+1,m+1} = 0$ if $\lambda \in \{\lambda_i\}_{i=1}^N$ has $|\lambda| > |\eta_{m+1}|$.

We then define $\xi_{m+1,j} = \xi_{m,j}$ for $1 \leq j \leq m$ and now deal with (3). If $\eta_{m+1} \notin \{\nu_{m,j}\}_{j=1}^{s_m}$, then let $\xi_{m+1,m+1}$ be a linear multiple of $\tilde{\xi}_{m+1,m+1}$ such that $\chi_{\eta_{m+1}}(T)\xi_{m+1,m+1}$ has norm 1, and let $\{\nu_{m+1,j}\}_{j=1}^{s_m+1}$ be a re-ordering of $\{\nu_{m,j}\}_{j=1}^{s_m} \cup \{\eta_{m+1}\}$. Otherwise, we have $\eta_{m+1} = \nu_{m,t(m+1)+1}$ and we apply Gram–Schmidt to

$$\{\chi_{\nu_{m,t(m+1)+1}}(T)\xi_{m+1,i}\}_{i=k_{m,t(m+1)}+1}^{k_{m,t(m+1)+1}} \cup \{\chi_{\nu_{m,t(m+1)+1}}(T)\tilde{\xi}_{m+1,m+1}\}$$

(without changing $\{\xi_{m+1,i}\}_{i=k_{m,t(m+1)}+1}^{k_{m,t(m+1)+1}}$). Note that by (2) and the definition of $\eta_{m+1}$ these vectors are linearly independent. This gives $\xi_{m+1,m+1}$ such that

$$\{\chi_{\nu_{m,t(m+1)+1}}(T)\xi_{m+1,i}\}_{i=k_{m,t(m+1)}+1}^{k_{m,t(m+1)+1}} \cup \{\chi_{\nu_{m,t(m+1)+1}}(T)\xi_{m+1,m+1}\}$$

are orthonormal and $\chi_\lambda(T)\xi_{m+1,m+1} = 0$ if $\lambda \in \{\lambda_i\}_{i=1}^N$ has $|\lambda| > |\nu_{m,t(m+1)+1}|$.

After re-ordering indices if necessary, we see that (1)–(3) now hold for $m + 1$.
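After $l$ steps the above process terminates, giving a new basis $\{\tilde{\xi}_i\}_{i=1}^l = \{\xi_{l,i}\}_{i=1}^l$ for $\operatorname{span}\{\xi_i\}_{i=1}^l$ along with $\{\nu_j\}_{j=1}^n = \{\nu_{l,j}\}_{j=1}^n \subset \{\lambda_i\}_{i=1}^N$ and $0 = k_0 < k_1 < k_2 < \cdots < k_n = l$ such that

(i) $|\nu_n| < |\nu_{n-1}| < \cdots < |\nu_1|$.
(ii) $\chi_\lambda(T)\tilde{\xi}_i = 0$ if $i > k_j$ and $\lambda \in \{\lambda_i\}_{i=1}^N$ has $|\lambda| > |\nu_{j+1}|$.
(iii) $\{\chi_{\nu_j}(T)\tilde{\xi}_i\}_{i=k_{j-1}+1}^{k_j}$ are orthonormal.

The subspace $B$ can then be described as

$$B = \bigoplus_{j=1}^{n} \operatorname{span}\{\chi_{\nu_j}(T)\tilde{\xi}_i\}_{i=k_{j-1}+1}^{k_j}.$$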

Definition 3.3 With respect to the above construction we define the following:

$$E_j := \operatorname{span}\{\chi_{\nu_j}(T)\tilde{\xi}_i\}_{i=k_{j-1}+1}^{k_j}, \qquad Z(T, \{\xi_j\}_{j=1}^l) := \Big( \sum_{i=1}^{l} \big( \|\tilde{\xi}_i\|^2 - 1 \big) \Big)^{1/2}. \tag{3.1}$$

Since the Gram–Schmidt process is defined uniquely up to phases, we see that $Z(T, \{\xi_j\}_{j=1}^l)$ is well defined. The above construction also shows that if $\{\chi_\omega(T)\xi_i\}_{i=1}^{l+1}$ are linearly independent then

$$Z(T, \{\xi_j\}_{j=1}^{l+1}) \geq Z(T, \{\xi_j\}_{j=1}^{l}).$$
l

We can now prove the following refinement of Proposition 3.2:

Proposition 3.4 Suppose the assumptions of Proposition 3.2 hold. Let $J \leq N$ be minimal such that $\{\chi_{\{\lambda_1, \ldots, \lambda_J\}}(T)\xi_i\}_{i=1}^l$ are linearly independent. Set

$$\rho = \sup\{|z| : z \in \Psi \cup \{\lambda_{J+1}, \ldots, \lambda_N\}\}, \qquad r = \max\{|\lambda_2/\lambda_1|, \ldots, |\lambda_J/\lambda_{J-1}|, \rho/|\lambda_J|\}.$$

Then $r < 1$ and $\delta(B, \operatorname{span}\{T^k \xi_i\}_{i=1}^l) \leq Z(T, \{\xi_j\}_{j=1}^l)\, r^k$. Since the spaces are $l$-dimensional, it follows from (1.5) that we have the convergence rate

$$\hat{\delta}(B, \operatorname{span}\{T^k \xi_i\}_{i=1}^l) \leq Z(T, \{\xi_j\}_{j=1}^l)\, l^{1/2}\, r^k.$$

Proof Consider the subspaces
$$E_j^k = \operatorname{span}\{T^k\tilde\xi_i\}_{i=k_{j-1}+1}^{k_j}.$$
Let $\zeta = \sum_{i=k_{j-1}+1}^{k_j}\alpha_i\chi_{\nu_j}(T)\tilde\xi_i \in E_j$ be a unit vector (hence $\sum_{i=k_{j-1}+1}^{k_j}|\alpha_i|^2 = 1$) and consider
$$\eta_k = \sum_{i=k_{j-1}+1}^{k_j}\alpha_i T^k\tilde\xi_i/\nu_j^k \in E_j^k.$$
By construction, we have for any such $\tilde\xi_i$ in the above sum that
$$\tilde\xi_i = (\chi_{\nu_j}(T) + \chi_{\theta_j}(T))\tilde\xi_i, \qquad \theta_j = \{\lambda\in\sigma(T) : |\lambda| < |\nu_j|\}.$$
This gives $T^k\tilde\xi_i = \nu_j^k\chi_{\nu_j}(T)\tilde\xi_i + T^k\chi_{\theta_j}(T)\tilde\xi_i$. Now, by the assumption on $\sigma(T)$, we have
$$\rho_j = \sup\{|z| : z\in\theta_j\} < |\nu_j|.$$
Thus, since
$$\|T^k\chi_{\theta_j}(T)\tilde\xi_i\|/|\nu_j^k| \le |\rho_j/\nu_j|^k\,\|\chi_{\theta_j}(T)\tilde\xi_i\|,$$
we have
$$\|\zeta - \eta_k\| \le |\rho_j/\nu_j|^k\sum_{i=k_{j-1}+1}^{k_j}|\alpha_i|\,\|\chi_{\theta_j}(T)\tilde\xi_i\| \le \Big(\sum_{i=k_{j-1}+1}^{k_j}\big(\|\tilde\xi_i\|^2 - 1\big)\Big)^{1/2} r^k.$$
Here we have used $|\rho_j/\nu_j| \le r$, Hölder's inequality, and the fact that $\|\chi_{\theta_j}(T)\tilde\xi_i\|^2 = \|\tilde\xi_i\|^2 - 1$ by orthonormality of $\{\chi_{\nu_j}(T)\tilde\xi_i\}_{i=k_{j-1}+1}^{k_j}$. The right-hand side gives an upper bound for $\delta(E_j, E_j^k)$. Analogous rates of convergence hold for the other subspaces, and from (1.6) we have
$$\delta(B, \operatorname{span}\{T^k\tilde\xi_i\}_{i=1}^l) \le Z(T,\{\xi_j\}_{j=1}^l)\, r^k, \tag{3.2}$$
since the spaces $E_j$ are orthogonal. □
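The bound of Proposition 3.4 can be checked numerically in the simplest case $l = 1$. The sketch below is only a finite-dimensional illustration under assumptions of our own choosing. $T$ is a hypothetical diagonal (hence normal) matrix, so the spectral projections are coordinate projections. The starting vector has a non-zero component along the dominant eigenvalue, so that $J = 1$, $B = \operatorname{span}\{\chi_{\lambda_1}(T)\xi\}$ and $r = \rho/|\lambda_1| = 1/2$. The helper `delta` is an illustrative name for $\delta(B, \operatorname{span}\{T^k\xi\})$.

```python
import numpy as np

# Hypothetical diagonal (hence normal) T: lambda_1 = 2 is dominant, the rest of
# the spectrum lies in |z| <= 1, so J = 1 and r = 1/2 for the vector xi below.
eigs = np.array([2.0, 1.0, 0.5, -0.25, 0.1])
T = np.diag(eigs)
xi = np.array([0.3, 1.0, -0.7, 0.2, 0.5])   # chi_{lambda_1}(T) xi = 0.3 e_1 != 0

chi_nu_xi = np.zeros_like(xi)
chi_nu_xi[0] = xi[0]                        # spectral projection onto lambda_1

# l = 1 case of the construction: rescale so that chi_nu(T) xi_tilde is a unit vector.
xi_tilde = xi / np.linalg.norm(chi_nu_xi)
Z = np.sqrt(np.linalg.norm(xi_tilde) ** 2 - 1.0)   # Z(T, {xi}) as in (3.1)
r = abs(eigs[1] / eigs[0])                         # rho / |lambda_1|
b = chi_nu_xi / np.linalg.norm(chi_nu_xi)          # unit vector spanning B

def delta(k):
    """Distance from the unit vector b to span{T^k xi}."""
    u = np.linalg.matrix_power(T, k) @ xi
    u = u / np.linalg.norm(u)
    return np.linalg.norm(b - (u @ b) * u)

for k in (0, 5, 10, 20):
    print(k, delta(k), Z * r**k)
```

For $l = 1$, $Z$ is the tangent of the angle $\theta_0$ between $\xi$ and $B$. The bound $\delta \le Zr^k$ then reduces to $\sin\theta_k \le \tan\theta_k \le r^k\tan\theta_0$.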



For the rest of this section we shall assume the following:
(A1) $T \in \mathcal{B}(\mathcal{H})$ is an invertible normal operator and $\{e_j\}_{j\in\mathbb{N}}$ an orthonormal basis for $\mathcal{H}$. $\{Q_k\}$ and $\{R_k\}$ are Q- and R-sequences of $T$ with respect to the basis $\{e_j\}_{j\in\mathbb{N}}$.
(A2) $\sigma(T) = \omega\cup\Psi$ such that $\omega\cap\Psi = \emptyset$ and $\omega = \{\lambda_i\}_{i=1}^N$, where the $\lambda_i$'s are isolated eigenvalues with (possibly infinite) multiplicity $m_i$. Let $M = m_1 + \cdots + m_N = \dim(\operatorname{ran}\chi_\omega(T))$ and suppose that $|\lambda_1| > \cdots > |\lambda_N|$. Suppose further that $\sup\{|\theta| : \theta\in\Psi\} < |\lambda_N|$.
To apply Propositions 3.2 and 3.4 to prove the main result, Theorem 3.9, we need to take care of the case that some of the $e_j$ may have $\chi_\omega(T)e_j = 0$.


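Definition 3.5 Suppose that (A1) and (A2) hold and let $K\in\mathbb{N}\cup\{\infty\}$ be minimal with the property that $\dim(\operatorname{span}\{\chi_\omega(T)e_j\}_{j=1}^K) = M$. Define
$$\Lambda_\omega = \{e_j : \chi_\omega(T)e_j \neq 0,\ j\le K\}, \qquad \Lambda_\Psi = \{e_j : \chi_\omega(T)e_j = 0,\ j\le K\},$$
$$\tilde\Lambda_\omega = \{e_j\in\Lambda_\omega : \chi_\omega(T)e_j\in\operatorname{span}\{\chi_\omega(T)e_i\}_{i=1}^{j-1}\}.$$
Define also the corresponding subset $\{\hat e_j\}_{j=1}^M\subset\{e_j\}_{j=1}^K$ such that $\{\hat e_j\}_{j=1}^M = \Lambda_\omega\setminus\tilde\Lambda_\omega$ and such that, upon writing $\hat e_j = e_{p_j}$, the $p_j$ are increasing.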
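The selection in Definition 3.5 is a greedy rank test. It can be sketched in a finite-dimensional truncation, assuming the spectral projection $\chi_\omega(T)$ is available as a matrix `P`. This is an assumption for illustration only, since in practice $\chi_\omega(T)$ is not known in advance. The function name `select_hat_indices` is hypothetical.

```python
import numpy as np

def select_hat_indices(P, tol=1e-10):
    """Greedy version of Definition 3.5 for a finite matrix P = chi_omega(T).

    Returns the 1-based indices p_1 < p_2 < ... such that P e_{p_j} is not in the
    span of the previously selected P e_{p_i}. Columns with P e_j = 0 (the set
    Lambda_Psi) and columns linearly dependent on earlier ones (the set
    tilde-Lambda_omega) are skipped.
    """
    target_rank = np.linalg.matrix_rank(P, tol=tol)   # M = dim ran chi_omega(T)
    selected, kept = [], np.zeros((P.shape[0], 0))
    for j in range(P.shape[1]):
        if len(selected) == target_rank:              # j has passed K
            break
        candidate = np.column_stack([kept, P[:, j]])
        if np.linalg.matrix_rank(candidate, tol=tol) > kept.shape[1]:
            selected.append(j + 1)
            kept = candidate
    return selected

# Hypothetical rank-2 projection onto span{(e_2 + e_3)/sqrt(2), e_4} in R^4.
V = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, np.sqrt(2.0)]]) / np.sqrt(2.0)
P = V @ V.T
print(select_hat_indices(P))   # [2, 4]: e_1 has P e_1 = 0, and P e_3 = P e_2
```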

Note that we have the following decomposition of $T$:
$$T = \Big(\sum_{j=1}^M\lambda_{c_j}\,\xi_j\otimes\bar\xi_j\Big)\oplus\chi_\Psi(T)T, \qquad \lambda_{c_j}\in\omega,$$
where $\{\xi_j\}_{j=1}^M$ is an orthonormal set of eigenvectors of $T$. The following simple lemma extends Lemma 39 in [36] to infinite $M$, but the proof is verbatim and is therefore omitted.

Lemma 3.6 If $e_m\in\Lambda_\Psi\cup\tilde\Lambda_\omega$, then
$$\operatorname{span}\{\chi_\omega(T)q_{k,j}\}_{j=1}^m = \operatorname{span}\{\chi_\omega(T)\hat q_{k,j}\}_{j=1}^{s(m)}, \qquad q_{k,j} = Q_ke_j,\ \hat q_{k,j} = Q_k\hat e_j,$$
where $s(m)$ is the largest integer such that $\{\hat e_j\}_{j=1}^{s(m)}\subset\{e_j\}_{j=1}^m$.

The following theorem is the key step of the proof of Theorem 3.9 and concerns
convergence to the eigenvectors of T .

Theorem 3.7 Assume (A1) and (A2) and define
$$\rho = \sup\{|z| : z\in\Psi\}, \qquad r = \max\{|\lambda_2/\lambda_1|, \dots, |\lambda_N/\lambda_{N-1}|, \rho/|\lambda_N|\}.$$
Then there exists a collection of orthonormal eigenvectors $\{\hat q_j\}_{j=1}^M\subset\operatorname{ran}\chi_\omega(T)$ of $T$ and collections of constants $A(m)$, $B(j)$ and $C(\mu)$ such that:
(a) If $e_m\in\Lambda_\Psi\cup\tilde\Lambda_\omega$ and $\mu$ is maximal with $p_\mu < m$ (recall that $\hat e_j = e_{p_j}$), then we have
$$\|\chi_\omega(T)q_{k,m}\| \le A(m)Z(T,\{\hat e_j\}_{j=1}^\mu)r^k. \tag{3.3}$$
In the case that $m < p_1$, we interpret this as $\|\chi_\omega(T)q_{k,m}\| = 0$, which holds by Lemma 3.6.
(b) For any $j < M+1$,
$$\hat\delta(\operatorname{span}\{\hat q_j\}, \operatorname{span}\{\hat q_{k,j}\}) \le B(j)Z(T,\{\hat e_i\}_{i=1}^j)r^k. \tag{3.4}$$
(c) For any $\mu < M+1$,
$$\delta(\operatorname{span}\{\hat q_{k,j}\}_{j=1}^\mu, \operatorname{span}\{\hat q_j\}_{j=1}^\mu) \le C(\mu)Z(T,\{\hat e_j\}_{j=1}^\mu)r^k \tag{3.5}$$
and hence
$$\hat\delta(\operatorname{span}\{\hat q_{k,j}\}_{j=1}^\mu, \operatorname{span}\{\hat q_j\}_{j=1}^\mu) \le \mu^{1/2}C(\mu)Z(T,\{\hat e_j\}_{j=1}^\mu)r^k. \tag{3.6}$$
Here, as in Lemma 3.6, $q_{k,j} = Q_ke_j$ and $\hat q_{k,j} = Q_k\hat e_j$. Finally, if $M$ is finite then we must have $\operatorname{span}\{\hat q_j\}_{j=1}^M = \operatorname{ran}\chi_\omega(T)$.

We will provide an inductive proof of Theorem 3.7, which requires the following lemma for the inductive step of part (a).

Lemma 3.8 Assume the conditions in the statement of Theorem 3.7. Suppose also that (b) in Theorem 3.7 holds for $j = 1,\dots,\mu$ and that (c) holds for a given $\mu < M$. Let $e_{p_{\mu+1}} = \hat e_{\mu+1}$. Then if $e_m\in\Lambda_\Psi\cup\tilde\Lambda_\omega$, where $m < p_{\mu+1}$, (3.3) also holds with
$$A(m) = \Big\{\sum_{j=1}^\mu[C(\mu)+B(j)]^2\Big\}^{1/2} + C(\mu).$$

Proof First note that from (2.6), invertibility of $T$ and the fact that $\{\chi_\omega(T)\hat e_j\}_{j=1}^\mu$ are linearly independent, it must hold that $\{\chi_\omega(T)\hat q_{k,j}\}_{j=1}^\mu$ are linearly independent also. Then by using the assumptions stated and the fact that $\chi_\omega(T)\hat q_j = \hat q_j$ we have
$$\delta(\operatorname{span}\{\chi_\omega(T)\hat q_{k,j}\}_{j=1}^\mu, \operatorname{span}\{\hat q_j\}_{j=1}^\mu) \le \delta(\operatorname{span}\{\hat q_{k,j}\}_{j=1}^\mu, \operatorname{span}\{\hat q_j\}_{j=1}^\mu) \le C(\mu)Z(T,\{\hat e_j\}_{j=1}^\mu)r^k.$$
Also, we have that $s(m)\le\mu$ and Lemma 3.6 implies
$$\operatorname{span}\{\chi_\omega(T)q_{k,j}\}_{j=1}^m = \operatorname{span}\{\chi_\omega(T)\hat q_{k,j}\}_{j=1}^{s(m)} \subset \operatorname{span}\{\chi_\omega(T)\hat q_{k,j}\}_{j=1}^\mu.$$
Using the fact that $\|\chi_\omega(T)q_{k,m}\|\le1$ and the definition of $\delta$ (along with the fact that $\operatorname{span}\{\hat q_j\}_{j=1}^\mu$ is finite-dimensional), it follows that there exists some $v_k = \sum_{j=1}^\mu\beta_{j,k}\hat q_j\in\operatorname{span}\{\hat q_j\}_{j=1}^\mu$ with $\|v_k\|\le1$ and
$$\|\chi_\omega(T)q_{k,m} - v_k\| \le C(\mu)Z(T,\{\hat e_j\}_{j=1}^\mu)r^k. \tag{3.7}$$
We also have from assumption (b) that
$$|\langle\chi_\omega(T)q_{k,m},\hat q_j\rangle| = |\langle q_{k,m},\hat q_j\rangle| \le B(j)Z(T,\{\hat e_i\}_{i=1}^j)r^k + |\langle q_{k,m},\hat q_{k,j}\rangle| = B(j)Z(T,\{\hat e_i\}_{i=1}^j)r^k, \tag{3.8}$$
since $q_{k,m}$ is orthogonal to $\hat q_{k,j}$. This together with (3.7) gives that $|\beta_{j,k}| \le \big(C(\mu)+B(j)\big)Z(T,\{\hat e_j\}_{j=1}^\mu)r^k$. Hence we must have
$$\|v_k\| \le \Big(\sum_{j=1}^\mu\big(C(\mu)+B(j)\big)^2\Big)^{1/2}Z(T,\{\hat e_j\}_{j=1}^\mu)r^k.$$
Using (3.7) again then gives the result. Note that we have used orthonormality of $\{\hat q_j\}_{j=1}^\mu$, which will be proven as part of the induction. □


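Proof of Theorem 3.7 We begin with the initial step of the induction for (b) and (c). Note that (a) trivially holds by construction with $A(m) = 0$ for any $m < p_1$, where $e_{p_1} = \hat e_1$, and this provides the initial step for (a).
By Propositions 3.2 and 3.4, there exists a unit eigenvector $\hat q_1\in\operatorname{ran}\chi_\omega(T)$ such that
$$\delta(\operatorname{span}\{\hat q_1\}, \operatorname{span}\{T^k\hat e_1\}) \le Z(T,\{\hat e_1\})r^k.$$
Since $\operatorname{span}\{T^k\hat e_1\}\subset\operatorname{span}\{T^ke_i\}_{i=1}^{p_1}$, this implies that
$$\delta(\operatorname{span}\{\hat q_1\}, \operatorname{span}\{T^ke_i\}_{i=1}^{p_1}) \le Z(T,\{\hat e_1\})r^k.$$
Thus, it follows from (2.6) that
$$\delta(\operatorname{span}\{\hat q_1\}, \operatorname{span}\{q_{k,i}\}_{i=1}^{p_1}) = \delta(\operatorname{span}\{\hat q_1\}, \operatorname{span}\{T^ke_i\}_{i=1}^{p_1}) \le Z(T,\{\hat e_1\})r^k. \tag{3.9}$$
Note that $\{q_{k,i}\}_{i=1}^{p_1}$ are orthonormal (recall that $Q_k$ is unitary) and hence by (3.9) there exist coefficients $\alpha_{k,i}$ with $\sum_{i=1}^{p_1}|\alpha_{k,i}|^2\le1$ such that, defining $\tilde\eta_k = \sum_{i=1}^{p_1}\alpha_{k,i}q_{k,i}$, we have
$$\|\hat q_1 - \tilde\eta_k\| \le Z(T,\{\hat e_1\})r^k. \tag{3.10}$$
If $e_m\in\Lambda_\Psi\cup\tilde\Lambda_\omega$, where $m < p_1$, then by Lemma 3.6 $\langle q_{k,m},\hat q_1\rangle = 0$. It follows that we must have
$$\delta(\operatorname{span}\{\hat q_1\}, \operatorname{span}\{\hat q_{k,1}\}) \le \|\hat q_1 - \alpha_{k,p_1}\hat q_{k,1}\| \le Z(T,\{\hat e_1\})r^k.$$
Hence we can take $B(1) = 1$ and $C(1) = 1$ in (b) and (c) respectively, which completes the initial step.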
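For the induction step we will argue simultaneously for (a), (b) and (c) using induction on $\mu$. Suppose that (a) holds for $m < p_\mu$ with $e_{p_\mu} = \hat e_\mu$, together with (b) and (c) for $j\le\mu$ and some $\mu < M$. Let $e_{p_{\mu+1}} = \hat e_{\mu+1}$. Then we can use Lemma 3.8 to extend (a) to all $m < p_{\mu+1}$, and this provides the step for (a). For (b), we note that Propositions 3.2 and 3.4 imply that
$$\delta\big(\operatorname{span}\{\hat q_i\}_{i=1}^\mu\oplus\operatorname{span}\{\xi\}, \operatorname{span}\{T^k\hat e_i\}_{i=1}^{\mu+1}\big) \le Z(T,\{\hat e_j\}_{j=1}^{\mu+1})r^k, \qquad \xi\in\operatorname{ran}\chi_\omega(T), \tag{3.11}$$
where $\xi$ is a unit eigenvector of $T$. We may also assume without loss of generality that $\xi$ is orthogonal to $\hat q_j$ for $j = 1,\dots,\mu$. As before, since $\operatorname{span}\{T^k\hat e_i\}_{i=1}^{\mu+1}\subset\operatorname{span}\{T^ke_i\}_{i=1}^{p_{\mu+1}}$, we have
$$\delta\big(\operatorname{span}\{\hat q_i\}_{i=1}^\mu\oplus\operatorname{span}\{\xi\}, \operatorname{span}\{T^ke_i\}_{i=1}^{p_{\mu+1}}\big) \le Z(T,\{\hat e_j\}_{j=1}^{\mu+1})r^k,$$
and hence, by invertibility of $T$,
$$\begin{aligned}\delta\big(\operatorname{span}\{\hat q_i\}_{i=1}^\mu\oplus\operatorname{span}\{\xi\}, \operatorname{span}\{q_{k,i}\}_{i=1}^{p_{\mu+1}}\big) &= \delta\big(\operatorname{span}\{\hat q_i\}_{i=1}^\mu\oplus\operatorname{span}\{\xi\}, \operatorname{span}\{T^ke_i\}_{i=1}^{p_{\mu+1}}\big)\\ &\le Z(T,\{\hat e_j\}_{j=1}^{\mu+1})r^k.\end{aligned} \tag{3.12}$$
Again, using that $\{q_{k,i}\}_{i=1}^{p_{\mu+1}}$ are orthonormal, there exist coefficients $\alpha_{k,i}$ with $\sum_{i=1}^{p_{\mu+1}}|\alpha_{k,i}|^2\le1$ such that, defining $\tilde\eta_k = \sum_{i=1}^{p_{\mu+1}}\alpha_{k,i}q_{k,i}$, we have
$$\|\xi - \tilde\eta_k\| \le Z(T,\{\hat e_j\}_{j=1}^{\mu+1})r^k. \tag{3.13}$$
If $e_m\in\Lambda_\Psi\cup\tilde\Lambda_\omega$, where $m < p_{\mu+1}$, then as shown above we have
$$|\langle q_{k,m},\xi\rangle| = |\langle\chi_\omega(T)q_{k,m},\xi\rangle| \le A(m)Z(T,\{\hat e_j\}_{j=1}^\mu)r^k \le A(m)Z(T,\{\hat e_j\}_{j=1}^{\mu+1})r^k.$$
Taking the inner product of $\xi - \tilde\eta_k$ with $q_{k,m}$ and using (3.13) together with the orthonormality of the $q_{k,j}$'s, it follows that $|\alpha_{k,m}| \le \big(A(m)+1\big)Z(T,\{\hat e_j\}_{j=1}^{\mu+1})r^k$. Similarly, if $j\le\mu$ then for any $c\in\mathbb{C}$
$$|\langle\hat q_{k,j},\xi\rangle| \le |\langle c\hat q_j,\xi\rangle| + \|c\hat q_j - \hat q_{k,j}\| = \|c\hat q_j - \hat q_{k,j}\|,$$
since $\xi$ is orthogonal to $\hat q_j$. Minimising over $c$, we can bound this by $B(j)Z(T,\{\hat e_j\}_{j=1}^\mu)r^k$. In the same way, it then follows that $|\alpha_{k,p_j}| \le \big(B(j)+1\big)Z(T,\{\hat e_j\}_{j=1}^{\mu+1})r^k$, where $\hat e_j = e_{p_j}$. Together, these imply that
$$\|\xi - \alpha_{k,p_{\mu+1}}\hat q_{k,\mu+1}\| \le \Bigg[1 + \Bigg\{\sum_{\substack{m=1\\ e_m\in\Lambda_\Psi\cup\tilde\Lambda_\omega}}^{p_{\mu+1}}\big(A(m)+1\big)^2 + \sum_{j=1}^\mu[B(j)+1]^2\Bigg\}^{1/2}\Bigg]Z(T,\{\hat e_j\}_{j=1}^{\mu+1})r^k.$$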


To finish the inductive step, we define $\hat q_{\mu+1} = \xi$. Recall that $\xi$ is orthogonal to any $\hat q_l$ with $l\le\mu$. Hence it follows that $\{\hat q_i\}_{i=1}^{\mu+1}$ are orthonormal and we can take
$$B(\mu+1) = 1 + \Bigg\{\sum_{\substack{m=1\\ e_m\in\Lambda_\Psi\cup\tilde\Lambda_\omega}}^{p_{\mu+1}}\big(A(m)+1\big)^2 + \sum_{j=1}^\mu\big(B(j)+1\big)^2\Bigg\}^{1/2}$$
in (b). For the induction step for (c), the fact that $\{\hat q_{k,i}\}_{i=1}^{\mu+1}$ are orthonormal and (1.6) imply we can take
$$C(\mu+1) = \Big(\sum_{j=1}^{\mu+1}B(j)^2\Big)^{1/2}.$$
Finally, if $M$ is finite we demonstrate that $\operatorname{span}\{\hat q_j\}_{j=1}^M = \operatorname{span}\{\xi_j\}_{j=1}^M$. Since the $\{\hat q_i\}_{i=1}^M$ are orthogonal and are eigenvectors of $\sum_{j=1}^M\lambda_{c_j}\xi_j\otimes\bar\xi_j$, it follows that $\operatorname{span}\{\hat q_j\}_{j=1}^M = \operatorname{span}\{\xi_j\}_{j=1}^M = \operatorname{ran}\chi_\omega(T)$. □


3.2 Main results

Our first result generalises Theorem 3.1 to infinite dimensions and relies on Theo-
rem 3.7 (which concerns convergence to eigenvectors).

Theorem 3.9 (Convergence theorem for normal operators in infinite dimensions) Let $T\in\mathcal{B}(l^2(\mathbb{N}))$ be an invertible normal operator with $\sigma(T) = \omega\cup\Psi$ and $\omega = \{\lambda_i\}_{i=1}^N$, where the $\lambda_i$'s are isolated eigenvalues with (possibly infinite) multiplicity $m_i$ satisfying $|\lambda_1| > \cdots > |\lambda_N|$. Suppose further that $\sup\{|\theta| : \theta\in\Psi\} < |\lambda_N|$, and let $\{e_j\}_{j\in\mathbb{N}}$ be the canonical orthonormal basis. Let $\{Q_n\}_{n\in\mathbb{N}}$ and $\{R_n\}_{n\in\mathbb{N}}$ be Q- and R-sequences of $T$ with respect to $\{e_j\}_{j\in\mathbb{N}}$. Let $\{\hat e_j\}_{j=1}^M\subset\{e_j\}_{j\in\mathbb{N}}$, where $M = m_1 + \cdots + m_N$, be the subset described in Definition 3.5 and Theorem 3.7, i.e. $\operatorname{span}\{Q_k\hat e_j\}\to\operatorname{span}\{\hat q_j\}$, where $\{\hat q_j\}_{j=1}^M\subset\operatorname{ran}\chi_\omega(T)$ is a collection of orthonormal eigenvectors of $T$, and if $e_j\notin\{\hat e_l\}_{l=1}^M$, then $\chi_\omega(T)Q_ke_j\to0$. Then:

(i) Every subsequence of $\{Q_n^*TQ_n\}_{n\in\mathbb{N}}$ has a convergent subsequence $\{Q_{n_k}^*TQ_{n_k}\}_{k\in\mathbb{N}}$ such that
$$Q_{n_k}^*TQ_{n_k}\xrightarrow{\mathrm{WOT}}\Big(\sum_{j=1}^M\langle T\hat q_j,\hat q_j\rangle\,\hat e_j\otimes\hat e_j\Big)\oplus\sum_{j\in\Gamma}\xi_j\otimes e_j,$$
as $k\to\infty$, where
$$\Gamma = \{j : e_j\notin\{\hat e_l\}_{l=1}^M\}, \qquad \xi_j\in\operatorname{span}\{e_i\}_{i\in\Gamma},$$
and only $\sum_{j\in\Gamma}\xi_j\otimes e_j$ depends on the choice of subsequence. Furthermore, if $T$ has only finitely many non-zero entries in each column then we can replace WOT convergence by SOT convergence.
(ii) We have the following convergence of sections:
$$\hat P_MQ_n^*TQ_n\hat P_M\xrightarrow{\mathrm{SOT}}\sum_{j=1}^M\langle T\hat q_j,\hat q_j\rangle\,\hat e_j\otimes\hat e_j, \quad\text{as } n\to\infty,$$
where $\hat P_M$ denotes the orthogonal projection onto $\operatorname{span}\{\hat e_j\}_{j=1}^M$. Furthermore, if we define
$$\rho = \sup\{|z| : z\in\Psi\}, \qquad r = \max\{|\lambda_2/\lambda_1|, \dots, |\lambda_N/\lambda_{N-1}|, \rho/|\lambda_N|\},$$
then $r < 1$ and for any fixed $x\in\operatorname{span}\{\hat e_j\}_{j=1}^M$ we have the following rate of convergence:
$$\Big\|\hat P_MQ_n^*TQ_n\hat P_Mx - \Big(\sum_{j=1}^M\langle T\hat q_j,\hat q_j\rangle\,\hat e_j\otimes\hat e_j\Big)x\Big\| = O(r^n), \quad\text{as } n\to\infty. \tag{3.14}$$

If $M$ is finite then we can write (after possibly re-ordering)
$$\sum_{j=1}^M\langle T\hat q_j,\hat q_j\rangle\,\hat e_j\otimes\hat e_j = \sum_{k=1}^N\Big(\lambda_k\sum_{j=1+\sum_{l<k}m_l}^{\sum_{l\le k}m_l}\hat e_j\otimes\hat e_j\Big), \tag{3.15}$$
and in part (ii) we have the rate of convergence
$$\Big\|\hat P_MQ_n^*TQ_n\hat P_M - \sum_{j=1}^M\langle T\hat q_j,\hat q_j\rangle\,\hat e_j\otimes\hat e_j\Big\| = O(r^n), \quad\text{as } n\to\infty. \tag{3.16}$$
If $\{\chi_\omega(T)e_l\}_{l=1}^M$ are linearly independent, then we can take $\hat e_j = e_j$.

Remark 3.10 What Theorem 3.9 essentially says is that if we take the $n$-th iteration of the IQR algorithm and truncate it to an $m\times m$ matrix (i.e. $P_mQ_n^*TQ_nP_m$) then, as $n$ grows, the eigenvalues of this matrix will converge to the extremal parts of the spectrum of $T$. In particular, the theorem suggests that the IQR algorithm can locate the extremal parts of the spectrum.
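Remark 3.10 can be illustrated, though of course not proved, with the ordinary QR algorithm on a finite matrix chosen to mimic the hypotheses of Theorem 3.9. The sketch assumes a hypothetical $60\times60$ real symmetric matrix with isolated eigenvalues $3$ and $2$ and the rest of the spectrum in $[-1, 1]$, so that $r = \max\{2/3, 1/2\} = 2/3$. It does not implement the IQR algorithm on an infinite matrix, which is the subject of Sect. 4.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dim = 60
# Hypothetical finite section: a real symmetric (hence normal) matrix with two
# isolated eigenvalues 3 and 2 and the rest of the spectrum in [-1, 1].
eigs = np.concatenate([[3.0, 2.0], np.linspace(-1.0, 1.0, n_dim - 2)])
U, _ = np.linalg.qr(rng.standard_normal((n_dim, n_dim)))
T = U @ np.diag(eigs) @ U.T

A = T.copy()
for n in range(80):
    Q, R = np.linalg.qr(A)   # unshifted QR step A = QR,
    A = R @ Q                # so after k steps A equals Q_k^* T Q_k

print(np.round(A[:2, :2], 6))       # close to diag(3, 2)
print(np.linalg.norm(A[2:, :2]))    # coupling to the remaining block, tiny
```

The $2\times2$ leading section plays the role of $\hat P_MQ_n^*TQ_n\hat P_M$ with $M = 2$ and $\hat e_j = e_j$. Its off-diagonal entry decays like $(2/3)^n$, in line with (3.16).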
Proof of Theorem 3.9 To prove (i), since a closed ball in $\mathcal{B}(l^2(\mathbb{N}))$ is weakly sequentially compact, it follows that any subsequence of $\{Q_n^*TQ_n\}_{n\in\mathbb{N}}$ must have a weakly convergent subsequence $\{Q_{n_k}^*TQ_{n_k}\}_{k\in\mathbb{N}}$. In particular, there exists a $W\in\mathcal{B}(l^2(\mathbb{N}))$ such that
$$Q_{n_k}^*TQ_{n_k}\xrightarrow{\mathrm{WOT}}W, \quad k\to\infty.$$
Let $\hat P_M$ denote the projection onto $\operatorname{span}\{\hat e_j\}_{j=1}^M$. Note that part (i) of the theorem will follow if we can show that
$$\hat P_MW\hat P_M = \sum_{j=1}^M\langle T\hat q_j,\hat q_j\rangle\,\hat e_j\otimes\hat e_j \tag{3.17}$$
and
$$\hat P_M^\perp W\hat P_M = 0, \qquad \hat P_MW\hat P_M^\perp = 0.$$
We will indeed show this, and we start by observing that, due to the weak convergence and the standard functional calculus, we have that
$$\langle W\hat e_j,e_i\rangle = \lim_{k\to\infty}\langle TQ_{n_k}\hat e_j,\chi_\omega(T)Q_{n_k}e_i\rangle + \lim_{k\to\infty}\langle TQ_{n_k}\hat e_j,\chi_\Psi(T)Q_{n_k}e_i\rangle, \tag{3.18}$$
$$\langle We_i,\hat e_j\rangle = \lim_{k\to\infty}\langle\chi_\omega(T)Q_{n_k}e_i,T^*Q_{n_k}\hat e_j\rangle + \lim_{k\to\infty}\langle TQ_{n_k}e_i,\chi_\Psi(T)Q_{n_k}\hat e_j\rangle. \tag{3.19}$$
We then have the following:
$$\chi_\omega(T)Q_ne_i\to0,\ n\to\infty,\ i\in\Gamma \implies \begin{cases}\lim_{k\to\infty}\langle TQ_{n_k}\hat e_j,\chi_\omega(T)Q_{n_k}e_i\rangle = 0, & i\in\Gamma,\\ \lim_{k\to\infty}\langle\chi_\omega(T)Q_{n_k}e_i,T^*Q_{n_k}\hat e_j\rangle = 0, & i\in\Gamma,\end{cases} \tag{3.20}$$
$$\left.\begin{aligned}&\operatorname{span}\{Q_n\hat e_j\}\to\operatorname{span}\{\hat q_j\},\ n\to\infty,\\ &T\hat q_j = \lambda\hat q_j,\ \lambda\in\omega,\end{aligned}\right\} \implies \begin{cases}\lim_{k\to\infty}\langle TQ_{n_k}\hat e_j,\chi_\Psi(T)Q_{n_k}e_i\rangle = 0, & i\in\mathbb{N},\\ \lim_{k\to\infty}\langle TQ_{n_k}e_i,\chi_\Psi(T)Q_{n_k}\hat e_j\rangle = 0, & i\in\mathbb{N},\\ \lim_{k\to\infty}\langle TQ_{n_k}\hat e_j,\chi_\omega(T)Q_{n_k}\hat e_l\rangle = \delta_{j,l}\lambda. & \end{cases} \tag{3.21}$$
Thus, by (3.18), (3.20), (3.21) and Theorem 3.7 we get (3.17) and also that $\hat P_M^\perp W\hat P_M = 0$. Also, by (3.19), (3.20), (3.21) and Theorem 3.7 we get that $\hat P_MW\hat P_M^\perp = 0$. Note that in all of these cases, Theorem 3.7 implies that $\langle Q_{n_k}^*TQ_{n_k}\hat e_j,e_i\rangle$ and $\langle Q_{n_k}^*TQ_{n_k}e_i,\hat e_j\rangle$ differ from their limiting values $\langle W\hat e_j,e_i\rangle$ and $\langle We_i,\hat e_j\rangle$ by $O(r^{n_k})$ (however, not necessarily uniformly over the indices). Now suppose that $T$ has finitely many non-zero entries in each column. This can be described by a non-decreasing function $f:\mathbb{N}\to\mathbb{N}$ with $f(n)\ge n$ such that $\langle Te_j,e_i\rangle = 0$ when $i > f(j)$, as in Definition 4.1. Proposition 4.2 shows that this is preserved under the iteration in the IQR algorithm, i.e. $Q_{n_k}^*TQ_{n_k}$ also has this property. So let $x\in l^2(\mathbb{N})$ and $\epsilon > 0$. Choose $y$ of finite support such that $\|x - y\|\le\epsilon$. It is then clear that $\|Q_{n_k}^*TQ_{n_k}y - Wy\|\to0$ as $n_k\to\infty$ (since we only require convergence in finitely many entries). Hence
$$\limsup_{n_k\to\infty}\|Q_{n_k}^*TQ_{n_k}x - Wx\| \le \epsilon(\|T\| + \|W\|).$$
Since $\epsilon > 0$ and $x$ were arbitrary, we have $Q_{n_k}^*TQ_{n_k}\xrightarrow{\mathrm{SOT}}W$.


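To prove (ii), suppose that $x\in\operatorname{span}\{\hat e_j\}_{j=1}^M$. Then $x$ can be written as
$$x = \sum_{j=1}^M x_j\hat e_j,$$
with at most finitely many $x_j$ non-zero. We have that $\hat\delta(\operatorname{span}\{Q_n\hat e_j\}, \operatorname{span}\{\hat q_j\}) = O(r^n)$ and hence there exists some $a_{n,j}$ of unit modulus such that $\|Q_n\hat e_j - a_{n,j}\hat q_j\| = O(r^n)$. Since $Q_n$ is unitary, we then have
$$\begin{aligned}\Big\|\hat P_MQ_n^*TQ_n\hat P_Mx - \Big(\sum_{j=1}^M\langle T\hat q_j,\hat q_j\rangle\,\hat e_j\otimes\hat e_j\Big)x\Big\| &\le \Big\|Q_n^*TQ_n\hat P_Mx - \Big(\sum_{j=1}^M\langle T\hat q_j,\hat q_j\rangle\,\hat e_j\otimes\hat e_j\Big)Q_n^*Q_nx\Big\|\\ &= \Big\|\sum_{j=1}^M x_j\big(T - \langle T\hat q_j,\hat q_j\rangle I\big)Q_n\hat e_j\Big\| = O(r^n),\end{aligned}$$
where in the last line we have used that $\hat q_j$ is an eigenvector, so that $(T - \langle T\hat q_j,\hat q_j\rangle I)\hat q_j = 0$, together with the boundedness of $T$. We therefore have convergence on $\operatorname{span}\{\hat e_j\}_{j=1}^M$. Since the operators are uniformly bounded, we must also have convergence on its closure $\overline{\operatorname{span}}\{\hat e_j\}_{j=1}^M$, which implies that
$$\hat P_MQ_n^*TQ_n\hat P_M\xrightarrow{\mathrm{SOT}}\sum_{j=1}^M\langle T\hat q_j,\hat q_j\rangle\,\hat e_j\otimes\hat e_j, \quad\text{as } n\to\infty.$$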
j=1

For the last parts, suppose that $M$ is finite. Theorem 3.7 then implies (3.15) after a possible re-ordering. The rate of convergence in (3.14) also implies that
$$\Big\|\hat P_MQ_n^*TQ_n\hat P_M - \sum_{j=1}^M\langle T\hat q_j,\hat q_j\rangle\,\hat e_j\otimes\hat e_j\Big\| = O(r^n).$$
More generally, let $K\in\mathbb{N}\cup\{\infty\}$ be minimal such that $\dim(\operatorname{span}\{\chi_\omega(T)e_j\}_{j=1}^K) = M$. Recall that we defined
$$\Lambda_\omega = \{e_j : \chi_\omega(T)e_j\neq0,\ j\le K\}, \qquad \Lambda_\Psi = \{e_j : \chi_\omega(T)e_j = 0,\ j\le K\}$$
and
$$\tilde\Lambda_\omega = \{e_j\in\Lambda_\omega : \chi_\omega(T)e_j\in\operatorname{span}\{\chi_\omega(T)e_i\}_{i=1}^{j-1}\}.$$
Recall also from Definition 3.5 that $\{\hat e_j\}_{j=1}^M = \Lambda_\omega\setminus\tilde\Lambda_\omega$. If $\{\chi_\omega(T)e_j\}_{j=1}^M$ are linearly independent then $\tilde\Lambda_\omega = \emptyset$, and therefore $\{\hat e_j\}_{j=1}^M = \{e_j\}_{j=1}^M$. This yields that the projection $\hat P_M$ in (3.17) is the projection onto $\operatorname{span}\{e_j\}_{j=1}^M$. □

j=1


Theorems 3.9 and 3.7 also give us convergence to the eigenvectors. With the use
of (possibly countably many) shifts and rotations, the above theorem allows us to
find all eigenvalues, their multiplicities and eigenspaces outside the convex hull of the
essential spectrum, i.e. outside the essential numerical range.

Example 3.11 It is possible in the case of infinite $M$ that the $\hat q_j$ do not form an orthonormal basis of $\operatorname{ran}\chi_\omega(T)$, and we can even lose part of $\omega$ in the convergence of $\hat P_MQ_n^*TQ_n\hat P_M$ to a diagonal operator. This is in contrast with the finite-dimensional case. For example, suppose that with respect to an initial orthonormal basis $\{v_j\}_{j\in\mathbb{N}}$, $T$ is given by the diagonal matrix $\operatorname{Diag}(1/2, 1, 1, \dots)$. Now define $f_j = v_1 + (1/j)v_{j+1}$ and apply Gram-Schmidt to the sequence $\{f_j\}_{j\in\mathbb{N}}$ to generate orthonormal vectors $\{e_j\}_{j\in\mathbb{N}}$. It is easy to see that any $v_j$ can be approximated to arbitrary accuracy using finite linear combinations of the $e_j$ (for instance, $\|v_1 - f_j\| = 1/j$), and hence $\{e_j\}_{j\in\mathbb{N}}$ is an orthonormal basis of our Hilbert space. We also have that the vectors $\chi_{\{1\}}(T)f_j = (1/j)v_{j+1}$ are linearly independent, and hence so are the $\chi_{\{1\}}(T)e_j$. It follows that the IQR iterates converge in the strong operator topology to the identity operator. However, we could equally take $\omega = \{1, 1/2\}$ in Theorem 3.9. Hence we have the curious case that $\operatorname{span}\{\hat q_j\}_{j\in\mathbb{N}}\subset\overline{\operatorname{span}}\{v_j\}_{j>1}$, and we lose the eigenvalue $1/2$.

The following corollary is entirely analogous to the finite-dimensional case.

Corollary 3.12 Suppose that the conditions of Theorem 3.9 hold with $M$ finite. Suppose also that for $j = 1,\dots,N$ the vectors $\{\chi_{\{\lambda_1,\dots,\lambda_j\}}(T)e_i\}_{i=1}^{\sum_{l\le j}m_l}$ are linearly independent. In the notation of Theorem 3.9, let $\rho = \sup\{|z| : z\in\Psi\}$. For $j < N$ define $r_j = \max\{|\lambda_{k+1}/\lambda_k| : k\le j\}$ and for $j = N$ define $r_N = \max\{|\lambda_{k+1}/\lambda_k|, \rho/|\lambda_N| : k < N\}$. We then have the following rates of convergence to the diagonal operator for $i, j\le M$:
1. $|\langle Q_n^*TQ_ne_j,e_i\rangle| = O(r_k^n)$ as $n\to\infty$ if $i > j$ and $k$ is minimal such that $i\le\sum_{l\le k}m_l$,
2. $|\langle Q_n^*TQ_ne_i,e_i\rangle - \lambda_k| = O(r_k^n)$ as $n\to\infty$ if $k$ is minimal such that $i\le\sum_{l\le k}m_l$.

Proof The result follows from Theorem 3.9 applied successively to $\omega_1, \omega_2, \dots, \omega_N$, where $\omega_j = \{\lambda_k : k\le j\}$. In general, analogous results follow from Theorem 3.9 when $M$ is infinite and with other linear independence conditions on $\chi_{\omega'}(T)e_i$ with $\omega'\subset\omega$, but the statements become less succinct. □
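The rate in part 1 of Corollary 3.12 can be observed empirically on a hypothetical finite symmetric matrix with simple eigenvalues. The corollary itself concerns the infinite-dimensional iteration; this is only a finite-dimensional sanity check. With eigenvalues $(4, 2, 1, 0.5, 0.3, 0.2)$ and all $m_l = 1$, the entry $\langle Q_n^*TQ_ne_1, e_2\rangle$ is $O(r_2^n)$ with $r_2 = 1/2$. In this example the ratio of successive moduli is expected to approach $1/2$.

```python
import numpy as np

rng = np.random.default_rng(5)
n_dim = 6
# Hypothetical symmetric matrix with simple eigenvalues (m_l = 1); here
# r_2 = max(|lambda_2/lambda_1|, |lambda_3/lambda_2|) = 1/2.
eigs = np.array([4.0, 2.0, 1.0, 0.5, 0.3, 0.2])
U, _ = np.linalg.qr(rng.standard_normal((n_dim, n_dim)))
T = U @ np.diag(eigs) @ U.T

A = T.copy()
subdiag = []
for n in range(25):
    Q, R = np.linalg.qr(A)
    A = R @ Q
    subdiag.append(abs(A[1, 0]))     # |<Q_n^* T Q_n e_1, e_2>|

print(subdiag[-1] / subdiag[-2])     # empirical contraction factor per step
print(abs(A[1, 1] - eigs[1]))        # distance of the (2, 2) entry from lambda_2
```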


In the finite-dimensional case and the case of distinct eigenvalues of the same
magnitude, the QR algorithm applied to a normal matrix will ‘converge’ to a block
diagonal matrix (without necessarily converging in each block). This can be extended
to infinite dimensions by inductively using the following theorem which also extends
to non-normal operators.

Theorem 3.13 (Block convergence theorem in infinite dimensions) Let $T\in\mathcal{B}(l^2(\mathbb{N}))$ be an invertible operator (not necessarily normal) and suppose that there exists an orthogonal projection $P$ of rank $M$ (possibly infinite) such that both the ranges of $P$ and of $I - P$ are invariant under $T$. Suppose also that there exist $\alpha > \beta > 0$ such that
• $\|Tx\|\ge\alpha\|x\|$ for all $x\in\operatorname{ran}(P)$,
• $\|Tx\|\le\beta\|x\|$ for all $x\in\operatorname{ran}(I-P)$.
Let $\{Q_n\}_{n\in\mathbb{N}}$ and $\{R_n\}_{n\in\mathbb{N}}$ be Q- and R-sequences of $T$ with respect to $\{e_i\}$. Then there exists a subset $\{\hat e_j\}_{j=1}^M\subset\{e_i\}_{i\in\mathbb{N}}$ such that:
(i) For any finite $\mu\le M$ we have $\delta(\operatorname{span}\{Q_n\hat e_j\}_{j=1}^\mu, \operatorname{ran}(P)) = O(\beta^n/\alpha^n)$ as $n\to\infty$. If $M$ is finite this implies full convergence $\hat\delta(\operatorname{span}\{Q_n\hat e_j\}_{j=1}^M, \operatorname{ran}(P)) = O(\beta^n/\alpha^n)$ as $n\to\infty$.
(ii) Every subsequence of $\{Q_n^*TQ_n\}_{n\in\mathbb{N}}$ has a convergent subsequence $\{Q_{n_k}^*TQ_{n_k}\}_{k\in\mathbb{N}}$ such that
$$Q_{n_k}^*TQ_{n_k}\xrightarrow{\mathrm{WOT}}\Big(\sum_{j=1}^M\xi_j\otimes\hat e_j\Big)\oplus\Big(\sum_{i\in\Gamma}\zeta_i\otimes e_i\Big),$$
as $k\to\infty$, where
$$\Gamma = \{j : e_j\notin\{\hat e_l\}_{l=1}^M\}, \qquad \xi_j\in\operatorname{span}\{\hat e_l\}_{l=1}^M, \qquad \zeta_i\in\operatorname{span}\{e_l\}_{l\in\Gamma}.$$
If $\{Pe_l\}_{l=1}^M$ are linearly independent then we can take $\hat e_j = e_j$. Furthermore, if $T$ has only finitely many non-zero entries in each column then we can replace WOT convergence by SOT convergence.
Remark 3.14 Theorem 3.13 essentially says that the IQR algorithm can compute the invariant subspace $\operatorname{ran}(P)$ of such an operator if there is enough separation between $T$ restricted to $\operatorname{ran}(P)$ and $T$ restricted to $\operatorname{ran}(I-P)$; in other words, provided that a dominant invariant subspace exists.
Proof of Theorem 3.13 The main ideas of the proof of Theorem 3.13 have already been presented, so we only sketch the proof. We first define the vectors $\{\hat e_j\}_{j=1}^M$ in a similar way to Definition 3.5, inductively by $\hat e_j = e_{p_j}$ where
$$p_j = \min\{i : Pe_i\notin\operatorname{span}\{P\hat e_k\}_{k=1}^{j-1}\}.$$
Let $r = \beta/\alpha < 1$. We will prove inductively that
(a) $\hat\delta(\operatorname{span}\{Q_n\hat e_j\}_{j=1}^\mu, \operatorname{span}\{PQ_n\hat e_j\}_{j=1}^\mu)\le C_1(\mu)r^n$ for any finite $\mu\le M$,
(b) $\|PQ_ne_j\|\le C_2(j)r^n$ for any $j\in\Gamma$,
for some constants $C_1(\mu)$ and $C_2(j)$. Suppose that this has been done. Part (i) of Theorem 3.13 now follows since $\operatorname{span}\{PQ_n\hat e_j\}_{j=1}^\mu\subset\operatorname{ran}(P)$. We then argue as in the proof of Theorem 3.9 to obtain
$$Q_{n_k}^*TQ_{n_k}\xrightarrow{\mathrm{WOT}}W, \quad k\to\infty.$$
Then, by studying the inner products $\langle TQ_{n_k}e_j, Q_{n_k}e_i\rangle$ using the invariance of $\operatorname{ran}(P)$ and $\operatorname{ran}(I-P)$ under $T$ and from (b), part (ii) of Theorem 3.13 easily follows (note that (a) implies that $\|(I-P)Q_n\hat e_j\|\le C_1(j)r^n$). The final part of the theorem then follows from the same arguments as in the proof of Theorem 3.9. Hence we only need to prove (a) and (b).
We first claim that
$$\delta(\operatorname{span}\{PT^n\hat e_j\}_{j=1}^\mu, \operatorname{span}\{T^n\hat e_j\}_{j=1}^\mu)\le C_3(\mu)r^n. \tag{3.22}$$
Since $P$ commutes with $T$, which is invertible, both of these spaces have dimension $\mu$ by the construction of the $\hat e_j$. It follows that (3.22) implies
$$\hat\delta(\operatorname{span}\{PT^n\hat e_j\}_{j=1}^\mu, \operatorname{span}\{T^n\hat e_j\}_{j=1}^\mu)\le\mu^{1/2}C_3(\mu)r^n = C_4(\mu)r^n. \tag{3.23}$$
To show (3.22), let $x_1^n,\dots,x_\mu^n$ be an orthonormal basis for $\operatorname{span}\{PT^n\hat e_j\}_{j=1}^\mu$ and let $\xi = \sum_{j=1}^\mu\alpha_jx_j^n$ have norm at most 1. Now, we may choose coefficients $\beta_{j,n}$ such that $T^n\sum_{j=1}^\mu\beta_{j,n}x_j^n = \xi$, since $T|_{\operatorname{ran}(P)}$ is invertible when viewed as an operator acting on $\operatorname{ran}(P)$. By the assumptions on $T$ we must have that
$$\Big(\sum_{j=1}^\mu|\beta_{j,n}|^2\Big)^{1/2}\le\frac{1}{\alpha^n}.$$
We may change basis from $\{\hat e_j\}_{j=1}^\mu$ to $\{\tilde e_j\}_{j=1}^\mu$ such that $P\tilde e_j = x_j^n$. Form the vector
$$\eta_n = T^n\Big(\sum_{j=1}^\mu\beta_{j,n}\tilde e_j\Big)\in\operatorname{span}\{T^n\hat e_j\}_{j=1}^\mu.$$
Then clearly, by Hölder's inequality,
$$\|\xi - \eta_n\|\le\frac{\big(\sum_{j=1}^\mu\|T^n(I-P)\tilde e_j\|^2\big)^{1/2}}{\alpha^n}\le C_3(\mu)\frac{\beta^n}{\alpha^n},$$
proving (3.22) and hence (3.23).
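Note that the proof of Lemma 3.6 carries over (replacing the projection $\chi_\omega(T)$ by $P$) to prove that
$$\operatorname{span}\{PQ_ne_j\}_{j=1}^m = \operatorname{span}\{PQ_n\hat e_j\}_{j=1}^{s(m)}, \tag{3.24}$$
where $s(m)$ is maximal with $\{\hat e_j\}_{j=1}^{s(m)}\subset\{e_j\}_{j=1}^m$. It follows that
$$\begin{aligned}\delta(\operatorname{span}\{T^n\hat e_j\}_{j=1}^\mu, \operatorname{span}\{PQ_n\hat e_j\}_{j=1}^\mu) &= \delta(\operatorname{span}\{T^n\hat e_j\}_{j=1}^\mu, \operatorname{span}\{PQ_ne_j\}_{j=1}^{p_\mu})\\ &= \delta(\operatorname{span}\{T^n\hat e_j\}_{j=1}^\mu, \operatorname{span}\{PT^ne_j\}_{j=1}^{p_\mu})\\ &\le \delta(\operatorname{span}\{T^n\hat e_j\}_{j=1}^\mu, \operatorname{span}\{PT^n\hat e_j\}_{j=1}^\mu)\\ &\le C_4(\mu)r^n,\end{aligned}$$
where we have used (2.6) to reach the second line and the fact that $\operatorname{span}\{PT^n\hat e_j\}_{j=1}^\mu\subset\operatorname{span}\{PT^ne_j\}_{j=1}^{p_\mu}$ to reach the third line. Again, both spaces have dimension $\mu$, so we have
$$\begin{aligned}\delta(\operatorname{span}\{PQ_n\hat e_j\}_{j=1}^\mu, \operatorname{span}\{Q_ne_j\}_{j=1}^{p_\mu}) &= \delta(\operatorname{span}\{PQ_n\hat e_j\}_{j=1}^\mu, \operatorname{span}\{T^ne_j\}_{j=1}^{p_\mu})\\ &\le \delta(\operatorname{span}\{PQ_n\hat e_j\}_{j=1}^\mu, \operatorname{span}\{T^n\hat e_j\}_{j=1}^\mu)\\ &\le C_5(\mu)r^n.\end{aligned} \tag{3.25}$$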

With these arguments out of the way (these are the analogue of Proposition 3.4), we can now form our inductive argument, similar to the proof of Theorem 3.7. Suppose first that (a) holds for $\mu$ (allowing $\mu = 0$ for the initial step) and let $j\in\Gamma$ have $j < p_{\mu+1}$ (where $p_{\mu+1} = \infty$ if $\mu = M$). From (a) for $\mu$ and (3.24) we have that
$$PQ_ne_j = v_n + \sum_{i=1}^\mu a_{n,i}Q_n\hat e_i$$
for some $v_n$ with $\|v_n\|\le C_1(\mu)r^n$. Then we must have
$$a_{n,i} + \langle v_n, Q_n\hat e_i\rangle = \langle PQ_ne_j, Q_n\hat e_i\rangle = \langle Q_ne_j, PQ_n\hat e_i\rangle.$$
Using (a) again, along with the fact that $Q_ne_j$ is orthogonal to $\{Q_n\hat e_i\}_{i=1}^\mu$, we must have $|a_{n,i}|\le2C_1(\mu)r^n$. It follows that we can take $C_2(j) = (2\sqrt{\mu}+1)C_1(\mu)$ in (b) for $j\in\Gamma$ with $p_\mu < j < p_{\mu+1}$. Now we use (3.25). Let $\xi\in\operatorname{span}\{PQ_n\hat e_j\}_{j=1}^{\mu+1}$ have unit norm and assume that $p_{\mu+1} < \infty$ (else there is nothing to prove since then $\mu = M$). Then there exist $b_{n,j}$ and $w_n$ such that
$$\xi = \sum_{j=1}^{p_{\mu+1}}b_{n,j}Q_ne_j + w_n$$
and $\|w_n\|\le C_5(\mu+1)r^n$. Now let $j\in\Gamma$ with $j < p_{\mu+1}$. Then we must have
$$\langle\xi, PQ_ne_j\rangle = \langle\xi, Q_ne_j\rangle = b_{n,j} + \langle w_n, Q_ne_j\rangle.$$
We have proven (b) for such $j$ and hence we have $|b_{n,j}|\le\big(C_2(j) + C_5(\mu+1)\big)r^n$. It follows that we can take
$$C_1(\mu+1) = \mu^{1/2}\Bigg[C_5(\mu+1) + \Bigg\{\sum_{\substack{j=1\\ j\in\Gamma}}^{p_{\mu+1}}[C_2(j) + C_5(\mu+1)]^2\Bigg\}^{1/2}\Bigg],$$
where the square root factor appears since the relevant spaces are $\mu$-dimensional. This completes the inductive step (the initial step is identical) and hence the proof of the theorem. □

Theorem 3.13 can be made sharper (under a slightly stricter assumption on the linear independence of $\{e_j\}_{j=1}^M$) with the following theorem, which includes the case that $\operatorname{ran}(I-P)$ is not necessarily invariant.

Theorem 3.15 (Convergence to invariant subspace in infinite dimensions) Let $T\in\mathcal{B}(l^2(\mathbb{N}))$ be an invertible operator (not necessarily normal) and suppose that there exists an orthogonal projection $P$ of finite rank $M$ such that the range of $P$ is invariant under $T$. Suppose also that there exist $\alpha > \beta > 0$ such that
• $\|Tx\|\ge\alpha\|x\|$ for all $x\in\operatorname{ran}(P)$,
• $\|(I-P)T(I-P)\|\le\beta$.
Under these conditions, there exists a canonical $M$-dimensional $T^*$-invariant subspace $S$, and we let $\tilde P$ denote the orthogonal projection onto $S$ (in the special case that $\operatorname{ran}(I-P)$ is also $T$-invariant, such as in Theorems 3.9 and 3.13, then $S = \operatorname{ran}(P)$). Suppose also that $\{\tilde Pe_j\}_{j=1}^M$ are linearly independent. Let $\{Q_n\}_{n\in\mathbb{N}}$ and $\{R_n\}_{n\in\mathbb{N}}$ be Q- and R-sequences of $T$ with respect to $\{e_i\}$. Then:
(i) The subspace angle $\phi(\operatorname{span}\{e_j\}_{j=1}^M, S) < \pi/2$ and we have
$$\hat\delta(\operatorname{span}\{Q_ne_j\}_{j=1}^M, \operatorname{ran}(P)) \le \frac{\sin\big(\phi(\operatorname{span}\{e_j\}_{j=1}^M, \operatorname{ran}(P))\big)}{\cos\big(\phi(\operatorname{span}\{e_j\}_{j=1}^M, S)\big)}\,\frac{\beta^n}{\alpha^n}\Big(1 + \frac{\|PT(I-P)\|}{\alpha-\beta}\Big). \tag{3.26}$$
(ii) Every subsequence of $\{Q_n^*TQ_n\}_{n\in\mathbb{N}}$ has a convergent subsequence $\{Q_{n_k}^*TQ_{n_k}\}_{k\in\mathbb{N}}$ such that
$$Q_{n_k}^*TQ_{n_k}\xrightarrow{\mathrm{WOT}}\sum_{j=1}^M\xi_j\otimes e_j + \sum_{i=M+1}^\infty\zeta_i\otimes e_i,$$
as $k\to\infty$, where
$$\xi_j\in\operatorname{span}\{e_l\}_{l=1}^M, \qquad \zeta_i\in\mathcal{H}.$$
Furthermore, if $T$ has only finitely many non-zero entries in each column then we can replace WOT convergence by SOT convergence.
Remark 3.16 Theorem 3.15 says that the IQR algorithm can be used to approximate dominant invariant subspaces. In particular, we shall use the bound (3.26) to build a $\Sigma_1$ algorithm in Sect. 5. Note that in the normal case Theorem 3.9 is more precise, both in giving convergence of individual vectors to eigenvectors and in its less restrictive assumptions on spanning sets and on $M$. In the normal case (and that of Theorem 3.13) we also have that the limit operator has a block diagonal form.


3.3 Proof of Theorem 3.15

In this section we will prove Theorem 3.15. The proof technique is different from those used above, and hence we have given it a separate section. Throughout, we will denote the ratio $\beta/\alpha$ by $r$. Note that since $M$ is finite, the bound $\alpha$ implies that $T|_{\operatorname{ran}(P)}:\operatorname{ran}(P)\to\operatorname{ran}(P)$ is invertible with $\|T|_{\operatorname{ran}(P)}^{-1}\|\le1/\alpha$. First, let $Q$ denote a unitary change of basis matrix from $\{e_j\}$ to $\{\tilde e_j\}$, where $\{\tilde e_j\}_{j=1}^M$ is a basis for $\operatorname{ran}(P)$. Then, as matrices with respect to the original basis, we can write
$$Q = [P_1, P_2], \qquad Q^*TQ = \begin{pmatrix}T_{11} & T_{12}\\ 0 & T_{22}\end{pmatrix},$$
where $T_{11}\in\mathbb{C}^{M\times M}$ and $T_{12}$ has $M$ rows. Our assumptions imply that $\|T_{11}^{-1}\|\le1/\alpha$ and $\|T_{22}\|\le\beta$. The next lemma shows that we can change the basis further to eliminate the sub-block $T_{12}$. This is needed to apply a power-iteration-type argument.

Lemma 3.17 Define the linear function $F:\mathcal{B}(l^2(\mathbb{N}),\mathbb{C}^M)\to\mathcal{B}(l^2(\mathbb{N}),\mathbb{C}^M)$ by
$$F(A) = T_{11}^{-1}AT_{22},$$
where we identify elements of $\mathcal{B}(l^2(\mathbb{N}),\mathbb{C}^M)$ as matrices. Then we can define $A\in\mathcal{B}(l^2(\mathbb{N}),\mathbb{C}^M)$ by $A - F(A) = -T_{11}^{-1}T_{12}$. Furthermore, if we define
$$B(A) = \begin{pmatrix}I & A\\ 0 & I\end{pmatrix},$$
then $B(A)$ has inverse $B(-A)$ and
$$B(-A)\begin{pmatrix}T_{11} & T_{12}\\ 0 & T_{22}\end{pmatrix}B(A) = \begin{pmatrix}T_{11} & 0\\ 0 & T_{22}\end{pmatrix}. \tag{3.27}$$

Proof Our assumptions on $T$ ensure that $F$ is a contraction with $\|F\|\le r < 1$. Hence we can define $A$ via the series
$$A = \sum_{k=0}^\infty F^k(-T_{11}^{-1}T_{12}).$$
It is then straightforward to check $A - F(A) = -T_{11}^{-1}T_{12}$, $B(A)B(-A) = B(-A)B(A) = I$ and the identity (3.27). □
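Lemma 3.17 is constructive, and the Neumann series can be summed numerically. The sketch below uses small hypothetical blocks $T_{11}$, $T_{12}$, $T_{22}$ chosen so that $\|T_{11}^{-1}\|\,\|T_{22}\| < 1$. It truncates the series after a fixed number of terms, then checks $A - F(A) = -T_{11}^{-1}T_{12}$ and the block diagonalisation (3.27). In finite dimensions $A$ solves the Sylvester equation $T_{11}A - AT_{22} = -T_{12}$, so a direct Sylvester solver could be used instead; the series is kept here because it mirrors the proof.

```python
import numpy as np

rng = np.random.default_rng(2)
M, m2 = 3, 5
# Hypothetical finite blocks with ||T11^{-1}|| * ||T22|| < 1.
T11 = np.diag([5.0, 4.0, 3.0]) + 0.1 * np.triu(rng.standard_normal((M, M)), 1)
T22 = 0.2 * rng.standard_normal((m2, m2))
T12 = rng.standard_normal((M, m2))
T11_inv = np.linalg.inv(T11)

def F(X):
    """The map F(A) = T11^{-1} A T22 from Lemma 3.17."""
    return T11_inv @ X @ T22

# A = sum_{k >= 0} F^k(-T11^{-1} T12), truncated after enough terms.
term = -T11_inv @ T12
A = np.zeros_like(term)
for _ in range(300):
    A += term
    term = F(term)

def B(X):
    """Block matrix [[I, X], [0, I]]."""
    return np.block([[np.eye(M), X], [np.zeros((m2, M)), np.eye(m2)]])

T_blk = np.block([[T11, T12], [np.zeros((m2, M)), T22]])
D = B(-A) @ T_blk @ B(A)
print(np.abs(D[:M, M:]).max())   # the T12 block has been eliminated
```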


Let
$$Y = Q\begin{pmatrix}I & 0\\ -A^* & I\end{pmatrix};$$
then we have the matrix identity
$$Y^{-1}T^*Y = \begin{pmatrix}T_{11}^* & 0\\ 0 & T_{22}^*\end{pmatrix}.$$
The canonical $T^*$-invariant subspace alluded to in Theorem 3.15 is then simply $S = \operatorname{span}\{Ye_j\}_{j=1}^M$. The space is canonical since it is easily seen that it is unchanged if we use a different basis for $\operatorname{ran}(P_1)$ and $\operatorname{ran}(P_2)$ in the definition of $Q$. Now let $P_0 = \begin{pmatrix}e_1 & e_2 & \cdots & e_M\end{pmatrix}\in\mathcal{B}(\mathbb{C}^M, l^2(\mathbb{N}))$ denote the matrix whose columns are the first $M$ basis elements $\{e_j\}_{j=1}^M$. Since the $\{R_i\}$ are upper triangular, it is easy to see that
$$T^nP_0 = Q_nR_nP_0 = Q_nP_0P_0^*R_nP_0.$$

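We will denote the (invertible) matrix $P_0^*R_nP_0\in\mathbb{C}^{M\times M}$ by $Z_n$. Now define
$$V_n^1 = P_1^*Q_nP_0\in\mathcal{B}(\mathbb{C}^M), \qquad V_n^2 = P_2^*Q_nP_0\in\mathcal{B}(\mathbb{C}^M, l^2(\mathbb{N})).$$
Then we have the relation
$$\begin{pmatrix}T_{11} & T_{12}\\ 0 & T_{22}\end{pmatrix}^n\begin{pmatrix}V_0^1\\ V_0^2\end{pmatrix} = \begin{pmatrix}V_n^1\\ V_n^2\end{pmatrix}Z_n.$$
But by Lemma 3.17 we have
$$\begin{pmatrix}T_{11} & T_{12}\\ 0 & T_{22}\end{pmatrix}^n = B(A)\begin{pmatrix}T_{11}^n & 0\\ 0 & T_{22}^n\end{pmatrix}B(-A).$$
Unwinding the definitions, this implies the matrix identities
$$T_{11}^n(V_0^1 - AV_0^2) = (V_n^1 - AV_n^2)Z_n, \tag{3.28}$$
$$T_{22}^nV_0^2 = V_n^2Z_n. \tag{3.29}$$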

Lemma 3.18 The following identity holds:
$$\hat\delta(\operatorname{span}\{Q_ne_j\}_{j=1}^M, \operatorname{ran}(P)) = \|V_n^2\|. \tag{3.30}$$

Proof Note that $\operatorname{span}\{Q_ne_j\}_{j=1}^M = \operatorname{ran}(Q_nP_0)$ and $\operatorname{ran}(P) = \operatorname{ran}(P_1)$. Since $P_1P_1^*$ and $Q_nP_0P_0^*Q_n^*$ are orthogonal projections, it follows that
$$\begin{aligned}\hat\delta(\operatorname{span}\{Q_ne_j\}_{j=1}^M, \operatorname{ran}(P)) &= \|Q_nP_0P_0^*Q_n^* - P_1P_1^*\|\\ &= \|Q_n^*(Q_nP_0P_0^*Q_n^* - P_1P_1^*)Q\|\\ &= \left\|\begin{pmatrix}0 & P_0^*Q_n^*P_2\\ -(I-P_0)^*Q_n^*P_1 & 0\end{pmatrix}\right\|.\end{aligned}$$
But we have that $\|P_0^*Q_n^*P_2\| = \|V_n^2\|$, and hence we are done if we can show that $\|P_0^*Q_n^*P_2\| = \|(I-P_0)^*Q_n^*P_1\|$. Consider the unitary matrix
$$U := Q_n^*Q = \begin{pmatrix}P_0^*Q_n^*P_1 & P_0^*Q_n^*P_2\\ (I-P_0)^*Q_n^*P_1 & (I-P_0)^*Q_n^*P_2\end{pmatrix} = \begin{pmatrix}U_{11} & U_{12}\\ U_{21} & U_{22}\end{pmatrix}.$$
Now let $x\in\mathbb{C}^M$ be of unit norm; then $\|U_{11}x\|^2 + \|U_{21}x\|^2 = 1$. It follows that $\|U_{21}\|^2 = 1 - \sigma_0(U_{11})^2$, where $\sigma_0$ denotes the smallest singular value. Applying the same argument to $U^*$, we see that $\|U_{12}\|^2 = 1 - \sigma_0(U_{11})^2 = \|U_{21}\|^2$, completing the proof. □
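The identity (3.30), and the equality $\|U_{12}\| = \|U_{21}\|$ used in its proof, can be checked on random finite-dimensional data. In the sketch, one hypothetical random orthogonal matrix stands in for $Q = [P_1, P_2]$ and another for $Q_n$. The distance $\hat\delta$ between the two $M$-dimensional subspaces is computed as the norm of the difference of the orthogonal projections.

```python
import numpy as np

rng = np.random.default_rng(3)
n_dim, M = 7, 3
Qbasis, _ = np.linalg.qr(rng.standard_normal((n_dim, n_dim)))  # plays the role of [P1, P2]
P1, P2 = Qbasis[:, :M], Qbasis[:, M:]
Qn, _ = np.linalg.qr(rng.standard_normal((n_dim, n_dim)))      # plays the role of Q_n
P0 = np.eye(n_dim)[:, :M]                                       # columns e_1, ..., e_M

V = Qn @ P0                                     # columns Q_n e_1, ..., Q_n e_M
gap = np.linalg.norm(V @ V.T - P1 @ P1.T, 2)    # hat-delta via projections
Vn2 = P2.T @ Qn @ P0                            # V_n^2 = P_2^* Q_n P_0

U = Qn.T @ Qbasis                               # the unitary U = Q_n^* Q
U12, U21 = U[:M, M:], U[M:, :M]
print(gap, np.linalg.norm(Vn2, 2))
print(np.linalg.norm(U12, 2), np.linalg.norm(U21, 2))
```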


Lemma 3.19 The matrix $V_0^1 - AV_0^2$ is invertible with
$$\|(V_0^1 - AV_0^2)^{-1}\|\le\frac{1}{\cos\big(\phi(\operatorname{span}\{e_j\}_{j=1}^M, S)\big)}. \tag{3.31}$$

Proof First note that since $\{\tilde Pe_j\}_{j=1}^M$ are linearly independent, we must have $\phi(\operatorname{span}\{e_j\}_{j=1}^M, S) < \pi/2$, and hence the bound in (3.31) is finite. Let $W = (P_1 - P_2A^*)(I + AA^*)^{-1/2}\in\mathcal{B}(\mathbb{C}^M, l^2(\mathbb{N}))$. By considering $W^*W = I\in\mathbb{C}^{M\times M}$, we see that the columns of $W$ are orthonormal. In fact, expanding $Y$ we have
$$Y = [P_1 - P_2A^*,\ P_2],$$
and hence the columns of $W$ are a basis for the subspace $S$. Arguing as in the proof of Lemma 3.18, we have that
$$\hat\delta(\operatorname{span}\{e_j\}_{j=1}^M, S) = \sqrt{1 - \sigma_0(W^*P_0)^2} < 1.$$
This implies that $W^*P_0$ is invertible with
$$\sigma_0(W^*P_0) = \cos\big(\phi(\operatorname{span}\{e_j\}_{j=1}^M, S)\big) > 0.$$
We also have the identity
$$V_0^1 - AV_0^2 = (I + AA^*)^{1/2}(W^*P_0).$$
Since $(I + AA^*)^{-1/2}$ has norm at most 1, we see that $V_0^1 - AV_0^2$ is invertible and (3.31) holds. □


Proof of Theorem 3.15 Using Lemma 3.19 and the matrix identities (3.28) and (3.29),
we can write

−n
Vn2 = T22
n 2
V0 (V01 − AV02 )−1 T11 (Vn1 − AVn2 ).

123
On the infinite-dimensional QR algorithm 47

Using (3.30) and (3.31), this implies

V02 Vn1 − AVn2 r n


δ̂(span{Q n e j } M
j=1 , ran(P)) ≤  
cos φ(span{e j } M
j=1 , S)
  1
sin φ(span{e j } M
j=1 , ran(P)) Vn − AVn r
2 n
=   .
cos φ(span{e j } M j=1 , S)
(3.32)

It is clear by summing a geometric series that


T12 P T (I − P)
A ≤ = .
α(1 − r ) α−β

It follows that Vn1 − AVn2 ≤ 1 + P T (I − P) /(α − β). Substituting this into


(3.32) proves part (i) of the theorem.
Next we argue that if $i > M$ then $\|PQ_ne_i\| \to 0$ as $n \to \infty$. We have that
$$PQ_ne_i = \sum_{j=1}^M \alpha_{j,n} Q_ne_j + v_n$$
with $v_n \to 0$ by part (i). Note that we then have
$$\alpha_{j,n} = \langle PQ_ne_i, Q_ne_j\rangle + \epsilon_{j,n} = \langle Q_ne_i, PQ_ne_j\rangle + \epsilon_{j,n}$$
with $\{\epsilon_{j,n}\}$ null. But again by (i), $PQ_ne_j$ approaches $\operatorname{span}\{Q_ne_k\}_{k=1}^M$, which is orthogonal to $Q_ne_i$, and hence $\{\alpha_{j,n}\}$ is null. The proof of part (ii) now follows the same argument as in the proof of part (i) of Theorem 3.9 and of the final part of Theorem 3.13. The key property is that if $j \le M$ and $i > M$ then $\langle Q_n^*TQ_ne_j, e_i\rangle \to 0$, due to the invariance of $\operatorname{ran}(P)$ under $T$. Note that it does not necessarily follow (as is easily seen by considering upper triangular $T$) that $\langle Q_n^*TQ_ne_i, e_j\rangle \to 0$ for such $i, j$. □
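The geometric-series bound on $\|A\|$ used in the proof of Theorem 3.15 can be illustrated numerically. Since $A$ is defined earlier in the paper, the sketch below rests on an assumption: that $A$ solves the Sylvester equation $T_{11}A - AT_{22} = T_{12}$ and is given by the series $\sum_{k \ge 0} T_{11}^{-(k+1)}T_{12}T_{22}^k$, with $\|T_{11}^{-1}\| \le 1/\alpha$, $\|T_{22}\| \le \beta$ and $r = \beta/\alpha$. The blocks themselves are hypothetical test data.

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta = 3.0, 1.0
T11 = np.diag([3.0, -4.0, 5.0j])                 # normal, so ||T11^{-1}|| = 1/alpha
T22 = rng.standard_normal((4, 4))
T22 *= beta / np.linalg.norm(T22, 2)             # ||T22|| = beta
T12 = rng.standard_normal((3, 4))

T11_inv = np.linalg.inv(T11)
A = np.zeros((3, 4), dtype=complex)
term = T11_inv @ T12                             # k = 0 term of the series
for _ in range(200):                             # partial sums of sum_k T11^{-(k+1)} T12 T22^k
    A += term
    term = T11_inv @ term @ T22

residual = np.linalg.norm(T11 @ A - A @ T22 - T12)   # Sylvester equation residual
bound = np.linalg.norm(T12, 2) / (alpha - beta)      # = ||T12|| / (alpha (1 - r))
print(residual, np.linalg.norm(A, 2) <= bound)
```

Each term is bounded by $\alpha^{-(k+1)}\beta^k\|T_{12}\|$, which is exactly the geometric series summed in the text.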


4 The IQR algorithm can be computed

The previous section gives a theoretical justification for why the IQR algorithm may
work, but we are faced with the possibly unpleasant problem of how to compute with
infinite data structures on a computer. Fortunately, there is a way to overcome such a
problem. The key is to impose some structural requirements on the infinite matrix.

4.1 Quasi-banded subdiagonals

Definition 4.1 Let $T$ be an infinite matrix acting as a bounded operator on $l^2(\mathbb{N})$ with basis $\{e_j\}_{j\in\mathbb{N}}$. For $f : \mathbb{N} \to \mathbb{N}$ non-decreasing with $f(n) \ge n$, we say that $T$ has quasi-banded subdiagonals with respect to $f$ if $\langle Te_j, e_i\rangle = 0$ when $i > f(j)$.


This is the class of infinite matrices with a finite number of non-zero elements in
each column (and not necessarily in each row) which is captured by the function f . It is
for this class that the computation of the IQR algorithm is feasible on a finite machine.
For this class of operators one can actually compute (without any approximation or any extra discretisation) the matrix elements of the $n$-th iteration of the IQR algorithm as if it were done on an infinite computer (meaning the computation collapses to a finite one). The following result, of independent interest, is needed in the proof and generalises the well-known fact in finite dimensions that the QR algorithm preserves bandwidth (see [57] for a good discussion of the tridiagonal case).

Proposition 4.2 Let $T \in \mathcal{B}(l^2(\mathbb{N}))$ and let $T_n$ be the $n$-th element in the IQR iteration, such that $T_n = Q_n^* \cdots Q_1^* T Q_1 \cdots Q_n$, where
$$Q_j = \text{SOT-}\lim_{l\to\infty} U_1^j \cdots U_l^j$$
and $U_l^j$ is a Householder transformation. If $T$ has quasi-banded subdiagonals with respect to $f$, then so does $T_n$.

Proof By induction, it is enough to prove the result for $n = 1$. From the construction of the Householder reflections $U_m^1 = P_{m-1} \oplus S_m$, the chosen $\eta_m$ (see Theorem 2.2) have
$$\langle \eta_m, e_j\rangle = 0, \quad j > f(m). \tag{4.1}$$
Using the fact that $f$ is non-decreasing, it follows that each $U_m^1$ has quasi-banded subdiagonals with respect to $f$, as does the product $U_1^1 \cdots U_m^1$. It follows that $Q_1$ must have quasi-banded subdiagonals with respect to $f$, and hence so does $T_1 = R_1Q_1$, since $R_1$ is upper triangular. □
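Proposition 4.2 has a direct finite-dimensional analogue that can be checked numerically: an unshifted QR step $T \mapsto RQ$ preserves the zero pattern $\langle Te_j, e_i\rangle = 0$ for $i > f(j)$ whenever $f$ is non-decreasing. The sketch below uses NumPy's Householder-based QR and a hypothetical choice $f(j) = \min(2j+1, m-1)$ (0-based indices), which is not taken from the text.

```python
import numpy as np

def has_quasi_banded_subdiagonals(T, f, tol=1e-12):
    """True if |T[i, j]| <= tol whenever i > f(j) (0-based indices)."""
    m = T.shape[0]
    return all(np.all(np.abs(T[f(j) + 1:, j]) <= tol) for j in range(m))

def qr_step(T):
    """One unshifted QR step: factor T = QR and return RQ = Q^* T Q."""
    Q, R = np.linalg.qr(T)
    return R @ Q

rng = np.random.default_rng(3)
m = 12
f = lambda j: min(2 * j + 1, m - 1)     # hypothetical non-decreasing f with f(j) >= j
T = rng.standard_normal((m, m))
for j in range(m):
    T[f(j) + 1:, j] = 0.0               # impose the quasi-banded pattern

Tk = T.copy()
for _ in range(5):                      # five QR iterates
    Tk = qr_step(Tk)
print(has_quasi_banded_subdiagonals(T, f), has_quasi_banded_subdiagonals(Tk, f))
```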


Theorem 4.3 Let $T \in \mathcal{B}(l^2(\mathbb{N}))$ have quasi-banded subdiagonals with respect to $f$ and let $T_n$ be the $n$-th element in the IQR iteration, i.e. $T_n = Q_n^* \cdots Q_1^* T Q_1 \cdots Q_n$, where
$$Q_j = \text{SOT-}\lim_{l\to\infty} U_1^j \cdots U_l^j$$
and $U_l^j$ is a Householder transformation (the superscript is not a power, but an index). Let $P_m$ be the usual projection onto $\operatorname{span}\{e_j\}_{j=1}^m$ and denote the $a$-fold iteration of $f$ by $f^a = \underbrace{f \circ f \circ \cdots \circ f}_{a \text{ times}}$. Then
$$\begin{aligned} P_mT_nP_m = P_m\, &U_m^n \cdots U_1^n\, U_{f^1(m)}^{n-1} \cdots U_1^{n-1} \cdots U_{f^{n-2}(m)}^2 \cdots U_1^2\, U_{f^{n-1}(m)}^1 \cdots U_1^1 \\ &\cdot P_{f^n(m)}\, T\, P_{f^n(m)} \\ &\cdot U_1^1 \cdots U_{f^{n-1}(m)}^1\, U_1^2 \cdots U_{f^{n-2}(m)}^2 \cdots U_1^{n-1} \cdots U_{f^1(m)}^{n-1}\, U_1^n \cdots U_m^n P_m. \end{aligned} \tag{4.2}$$


Remark 4.4 What Theorem 4.3 says is that to compute the finite section of size $m$ of the $n$-th iteration of the IQR algorithm (i.e. $P_mT_nP_m$), one only needs information from the finite section of size $f^n(m)$ (i.e. $P_{f^n(m)}TP_{f^n(m)}$), since the relevant Householder reflections can be computed from this information. In other words, the IQR algorithm can be computed.
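As a worked illustration of Remark 4.4 (the two choices of $f$ below are examples chosen here, not taken from the text), the required section size follows from iterating $f$:
$$f(n) = n + k \;\Longrightarrow\; f^{a}(m) = m + ak, \qquad f(n) = 2n \;\Longrightarrow\; f^{a}(m) = 2^{a}m.$$
For instance, for a tridiagonal $T$ ($k = 1$), computing $P_{100}T_{50}P_{100}$ only requires $P_{150}TP_{150}$, whereas for $f(n) = 2n$ the required section $P_{2^{50}\cdot 100}TP_{2^{50}\cdot 100}$ grows geometrically with the number of iterations.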

Proof of Theorem 4.3 By induction it is enough to prove that
$$P_mT_nP_m = P_mU_m^n \cdots U_1^n\, P_{f(m)}T_{n-1}P_{f(m)}\, U_1^n \cdots U_m^n P_m. \tag{4.3}$$
To see why this is true, note that by the assumption that $T$ has quasi-banded subdiagonals with respect to $f$, Proposition 4.2 shows that $T_n$ has quasi-banded subdiagonals with respect to $f$ for all $n \in \mathbb{N}$. Thus, it follows from the construction in the proof of Theorem 2.2 that each $U_l^j$ is of the form
$$U_l^j = I_{l,j,1} \oplus \Big(I_{l,j,2} - \frac{2}{\|\xi_{l,j}\|^2}\, \xi_{l,j} \otimes \bar{\xi}_{l,j}\Big) \oplus I_{l,j,3},$$
where $I_{l,j,1}$ denotes the identity on $P_{l-1}\mathcal{H}$, $I_{l,j,2}$ denotes the identity on $\operatorname{span}\{e_k : l \le k \le f(l)\}$, $I_{l,j,3}$ denotes the identity on $P_{f(l)}^{\perp}\mathcal{H}$ and $\xi_{l,j} \in \operatorname{span}\{e_k : l \le k \le f(l)\}$. Since $P_m$ is compact, it then follows that
$$\begin{aligned} P_mT_nP_m &= \Big(\text{SOT-}\lim_{l\to\infty} P_mU_l^n \cdots U_1^n\Big) P_{f(m)}T_{n-1}P_{f(m)} \Big(\text{SOT-}\lim_{l\to\infty} U_1^n \cdots U_l^nP_m\Big) \\ &= P_mU_m^n \cdots U_1^n\, P_{f(m)}T_{n-1}P_{f(m)}\, U_1^n \cdots U_m^nP_m. \end{aligned} \tag{4.4}$$
□



Remark 4.5 This result allows us to implement the IQR algorithm, because each $U_l^j$ only affects finitely many columns or rows of the matrix when multiplied either on the left or the right. In computer science, computing with infinite data structures while deferring the use of the information until it is needed is often referred to as “lazy evaluation”. A simple implementation is shown in the appendix for the case that the matrix has $k$ subdiagonals (i.e. $f(n) = n + k$).
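The appendix implementation is not reproduced here. Instead, here is a minimal Python sketch of the lazy-evaluation idea for $f(n) = n + k$. It rests on two assumptions of ours: it uses NumPy's QR with the sign normalisation $R_{ii} > 0$ of Lemma 4.7 rather than explicit Householder bookkeeping, and the test operator `jacobi_entry` is hypothetical. Matrix entries are requested through a callable, and only the $f^n(m) = m + nk$ section is ever touched. The result is compared with the same computation on a much larger section.

```python
import numpy as np

def qr_positive(T):
    """QR factorisation normalised so that diag(R) > 0 (cf. Lemma 4.7), real case."""
    Q, R = np.linalg.qr(T)
    d = np.where(np.diag(R) < 0, -1.0, 1.0)
    return Q * d, d[:, None] * R

def iqr_section(entry, k, m, n):
    """Return P_m T_n P_m for an operator with k subdiagonals (f(j) = j + k).

    `entry(i, j)` returns <T e_j, e_i> for 1-based i, j; only indices up to
    f^n(m) = m + n*k are ever requested.
    """
    size = m + n * k
    T = np.array([[entry(i, j) for j in range(1, size + 1)]
                  for i in range(1, size + 1)], dtype=float)
    for _ in range(n):                          # n unshifted QR steps on the section
        Q, R = qr_positive(T)
        T = R @ Q
    return T[:m, :m]

def jacobi_entry(i, j):
    """Hypothetical test operator: tridiagonal (k = 1) and diagonally dominant."""
    if i == j:
        return 4.0 + np.cos(i)
    return 1.0 if abs(i - j) == 1 else 0.0

requested = []
def logged_entry(i, j):
    requested.append((i, j))
    return jacobi_entry(i, j)

m, n, k = 4, 6, 1
lazy = iqr_section(logged_entry, k, m, n)
reference = iqr_section(jacobi_entry, k, m + 40, n)[:m, :m]   # much larger section
print(len(requested), np.max(np.abs(lazy - reference)))
```

Diagonal dominance keeps every leading section invertible, so the normalised QR factors are unique and the two computations must agree up to rounding.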

The next question is how restrictive the assumption in Definition 4.1 is. In particular, suppose that $T \in \mathcal{B}(\mathcal{H})$ and that $\xi \in \mathcal{H}$ is a cyclic vector for $T$ (i.e. $\operatorname{span}\{\xi, T\xi, T^2\xi, \ldots\}$ is dense in $\mathcal{H}$). Then by applying the Gram–Schmidt procedure to $\{\xi, T\xi, T^2\xi, \ldots\}$ we obtain an orthonormal basis $\{\eta_1, \eta_2, \eta_3, \ldots\}$ for $\mathcal{H}$ such that the matrix representation of $T$ with respect to $\{\eta_1, \eta_2, \eta_3, \ldots\}$ is upper Hessenberg, and thus the matrix representation has only one subdiagonal. The question is therefore about the existence of a cyclic vector. Note that if $T$ has no non-trivial closed invariant subspaces, then every non-zero vector $\xi \in \mathcal{H}$ is a cyclic vector. Now what happens if $\xi$ is not cyclic for $T$? We may still form $\{\eta_1, \eta_2, \eta_3, \ldots\}$ as above; however, $\mathcal{H}_1 = \overline{\operatorname{span}}\{\eta_1, \eta_2, \eta_3, \ldots\}$ is now an invariant subspace for $T$ and $\mathcal{H}_1 \neq \mathcal{H}$. We may still form a matrix representation of $T$ with respect to $\{\eta_1, \eta_2, \eta_3, \ldots\}$, but this will now be a matrix representation of $T|_{\mathcal{H}_1}$. Obviously, we can have that $\sigma(T|_{\mathcal{H}_1}) \neq \sigma(T)$.
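The Gram–Schmidt construction on $\{\xi, T\xi, T^2\xi, \ldots\}$ described above is, in finite dimensions, the Arnoldi process. The sketch below is an illustration on a random matrix (our own test data). It generates the same nested Krylov subspaces (the basis vectors agree with plain Gram–Schmidt on the Krylov vectors up to unimodular factors) and confirms that $V^*TV$ is upper Hessenberg. It assumes no breakdown, i.e. the Krylov vectors stay linearly independent.

```python
import numpy as np

def krylov_orthonormal_basis(T, xi, n):
    """Orthonormalise xi, T xi, ..., T^{n-1} xi in Arnoldi form.

    Returns V (orthonormal columns spanning the Krylov space) and H = V^* T V.
    """
    V = np.zeros((T.shape[0], n), dtype=complex)
    V[:, 0] = xi / np.linalg.norm(xi)
    for j in range(1, n):
        w = T @ V[:, j - 1]
        for _ in range(2):                      # classical Gram-Schmidt, applied twice
            w = w - V[:, :j] @ (V[:, :j].conj().T @ w)
        V[:, j] = w / np.linalg.norm(w)
    return V, V.conj().T @ T @ V

rng = np.random.default_rng(4)
N = 10
T = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
xi = rng.standard_normal(N)
V, H = krylov_orthonormal_basis(T, xi, 6)
print(np.max(np.abs(np.tril(H, -2))))           # rounding level: H is upper Hessenberg
```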


However, the following example shows that the class of matrices for which we can compute the IQR algorithm covers a wide range of applications. In particular, it
includes all finite interaction Hamiltonians on graphs. Such operators play a prominent
role in solid state physics [48,51] describing propagation of waves and spin waves as
well as encompassing Jacobi operators studied in many physical models and integrable
lattices [71].

Example 4.6 Consider a connected, undirected graph $G$ such that each vertex degree is finite and the set of vertices $V(G)$ is countably infinite. Consider the set of all bounded operators $A$ on $l^2(V(G)) \cong l^2(\mathbb{N})$ such that the set $S(v) := \{w \in V : \langle w, Av\rangle \neq 0\}$ is finite for any $v \in V$. Suppose our enumeration of the vertices obeys the following pattern. The neighbours of $e_1$ (including itself) are $S_1 = \{e_1, e_2, \ldots, e_{q_1}\}$ for some finite $q_1$. The set of neighbours of these vertices is $S_2 = \{e_1, \ldots, e_{q_2}\}$ for some finite $q_2$, where we continue the enumeration of $S_1$, and this process continues inductively, enumerating $S_m$. If we know $S(v)$ for all $v \in V$, then we can find an $f : \mathbb{N} \to \mathbb{N}$ such that $A_{j,m} = 0$ if $j > f(m)$. We simply choose $f(n) = q_{r_n}$, where $r_n$ is minimal such that $\bigcup_{j \le n} S(e_j) \subset S_{r_n}$.
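The enumeration in Example 4.6 is a breadth-first search. As a sketch, we take the square lattice $\mathbb{Z}^2$ with the nearest-neighbour discrete Laplacian (a hypothetical choice; the helper names are ours). For this operator $S(v)$ is $v$ together with its neighbours, so $r_n = d_n + 1$, where $d_n$ is the graph distance of the $n$-th vertex from the root. The code builds $f(n) = q_{r_n}$ and checks $f(n) \ge n$, monotonicity, and $A_{j,n} = 0$ for $j > f(n)$ on the enumerated part of the lattice.

```python
from collections import deque
import numpy as np

def lattice_neighbours(v):
    """Nearest neighbours in the square lattice Z^2 (hypothetical example graph)."""
    x, y = v
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def bfs_enumeration(root, radius, neighbours):
    """List the vertices within graph distance `radius` of root in breadth-first order."""
    dist, order, queue = {root: 0}, [root], deque([root])
    while queue:
        v = queue.popleft()
        if dist[v] == radius:
            continue
        for w in neighbours(v):
            if w not in dist:
                dist[w] = dist[v] + 1
                order.append(w)
                queue.append(w)
    return order, dist

R = 6
order, dist = bfs_enumeration((0, 0), R, lattice_neighbours)
index = {v: i for i, v in enumerate(order)}                      # 0-based position
q = [sum(1 for v in order if dist[v] <= r) for r in range(R + 1)]  # q[r] = |S_r|

def f(n):
    """f(n) = q_{r_n} for 1-based n, with r_n = d_n + 1 for this operator."""
    return q[dist[order[n - 1]] + 1]

size = len(order)                                  # discrete Laplacian in this ordering
A = np.zeros((size, size))
for v in order:
    A[index[v], index[v]] = -4.0
    for w in lattice_neighbours(v):
        if w in index:
            A[index[w], index[v]] = 1.0
```

Only vertices at distance at most $R - 1$ have all their neighbours enumerated, so $f$ is used only for those.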

4.2 Invertible operators

More generally, given an invertible operator T with information on how its columns
decay at infinity, we can compute finite sections of the IQR iterates with error control.
For computing spectral properties, we can assume, by shifting T → T + λI then
translating by −λ back, that the operator we are interested in is invertible, hence the
invertibility criterion is not that restrictive. Throughout, we will use the following
lemma which says that for invertible operators, the QR decomposition is essentially
unique.

Lemma 4.7 Let $T$ be an invertible operator (viewed as a matrix acting on $l^2(\mathbb{N})$). Then there exists a unique decomposition $T = QR$ with $Q$ unitary and $R$ invertible and upper triangular such that $R_{ii} \in \mathbb{R}_{>0}$. Furthermore, for any other “QR” decomposition $T = Q'R'$ there is a diagonal matrix $D = \operatorname{Diag}(t_1, t_2, \ldots)$ such that $|t_i| = 1$ and $Q = Q'D$. In other words, the QR decomposition is unique up to phase choices.

Proof Consider the QR decomposition already discussed in this paper, $T = Q'R'$. Since $T$ is invertible, $Q'$ is a surjective isometry and hence unitary. Hence $R' = Q'^*T$ is invertible. Being upper triangular, it follows that $R'_{ii} \neq 0$ for all $i$. Choose $t_i \in \mathbb{T}$ such that $t_iR'_{ii} \in \mathbb{R}_{>0}$ and set $D = \operatorname{Diag}(t_1, t_2, \ldots)$. Letting $Q = Q'D^*$ and $R = DR'$, we clearly have the decomposition as claimed.
Now suppose that $T = Q''R''$; then we can write $Q = Q''R''R^{-1}$. It follows that $R''R^{-1}$ is a unitary upper triangular matrix and hence must be of the form $D = \operatorname{Diag}(t_1, t_2, \ldots)$ with $|t_i| = 1$. □




Another way to see this result is to note that the columns of $Q$ are obtained by applying the Gram–Schmidt procedure to the columns of $T$. The restriction that $R_{ii} \in \mathbb{R}_{>0}$ can also be incorporated into Theorem 4.3. Theorem 4.3 (in this subcase of invertibility) is then a consequence of the fact that if $T$ has quasi-banded subdiagonals with respect to $f$ then
$$P_mT^nP_m = P_m\big(P_{f^n(m)}TP_{f^n(m)}\big)^nP_m$$
and the relations (2.5): we can apply Gram–Schmidt (or a more stable modified version) to the columns of $P_{f^n(m)}TP_{f^n(m)}$ and truncate the resulting matrix.
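The link between QR iterates and Gram–Schmidt on the columns of $T^n$ can be checked numerically. In finite dimensions, $T^n = (Q_1 \cdots Q_n)(R_n \cdots R_1)$, so with the normalisation $R_{ii} > 0$ of Lemma 4.7 the product $Q_1 \cdots Q_n$ equals the orthonormalised columns of $T^n$, and $T_n = \hat{Q}_n^* T \hat{Q}_n$. The sketch below assumes a real, comfortably invertible random test matrix of our own choosing.

```python
import numpy as np

def qr_positive(T):
    """QR factorisation with diag(R) > 0 (the normalisation of Lemma 4.7), real case."""
    Q, R = np.linalg.qr(T)
    d = np.where(np.diag(R) < 0, -1.0, 1.0)
    return Q * d, d[:, None] * R

rng = np.random.default_rng(5)
N, n = 8, 4
T = rng.standard_normal((N, N)) + 6.0 * np.eye(N)   # shifted to be comfortably invertible

Tk, Q_hat = T.copy(), np.eye(N)
for _ in range(n):
    Q, R = qr_positive(Tk)
    Tk = R @ Q                                       # T_k = R_k Q_k
    Q_hat = Q_hat @ Q                                # Q_1 Q_2 ... Q_k

Q_gs, _ = qr_positive(np.linalg.matrix_power(T, n))  # Gram-Schmidt on the columns of T^n
print(np.max(np.abs(Q_hat - Q_gs)), np.max(np.abs(Tk - Q_gs.T @ T @ Q_gs)))
```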
Assume that, given $T \in \mathcal{B}(l^2(\mathbb{N}))$ invertible (not necessarily with quasi-banded subdiagonals), we can evaluate an increasing family of increasing functions $g_j : \mathbb{N} \to \mathbb{N}$ such that, defining the matrix $T_{(j)}$ with columns $\{P_{g_j(n)}Te_n\}$, we have that $T_{(j)}$ is invertible and
$$\big\|(P_{g_j(n)} - I)Te_n\big\| \le \frac{1}{j}. \tag{4.5}$$
It is easy to see that such a sequence of functions must exist, since any $S$ with $\|S - T\| < \|T^{-1}\|^{-1}$ is invertible. Given this information, without loss of generality, by increasing the $g_j$s pointwise if necessary, applying Hölder's inequality and taking subsequences, we may assume that $\|T_{(j)} - T\| \le 1/j$. In other words, given a sequence of functions satisfying (4.5) we can evaluate a sequence of functions with this stronger condition.
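To make (4.5) concrete, here is a sketch for a hypothetical operator with geometrically decaying columns, $\langle Te_n, e_i\rangle = c^{|i-n|}$ with $c = 1/2$ (our own example, not from the text). For $g \ge n$ the tail norm has the closed form $\|(P_g - I)Te_n\| = \big(c^{2(g+1-n)}/(1-c^2)\big)^{1/2}$, so the smallest admissible $g_j(n)$ can be written down explicitly.

```python
import math

C_DECAY = 0.5   # hypothetical operator: <T e_n, e_i> = C_DECAY ** |i - n|

def tail_norm(n, g, c=C_DECAY):
    """||(P_g - I) T e_n|| = (sum_{i > g} c^{2(i - n)})^{1/2}, valid for g >= n."""
    return math.sqrt(c ** (2 * (g + 1 - n)) / (1 - c ** 2))

def g_j(j, n, c=C_DECAY):
    """Smallest g >= n with tail_norm(n, g) <= 1/j, as required in (4.5)."""
    x = math.log((1 - c ** 2) / j ** 2) / (2 * math.log(c))
    return max(n, n - 1 + math.ceil(x))

print([g_j(10, n) for n in range(1, 6)])   # [4, 5, 6, 7, 8]
```

Here $g_j(n) = n + \text{const}(j)$, so each $g_j$ is increasing in $n$ and the family increases with $j$, as the text requires.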
The following says that given such a sequence of functions, we can compute the
truncations Pm Tn Pm to a given precision.

Theorem 4.8 Suppose $T \in \mathcal{B}(l^2(\mathbb{N}))$ is invertible and the family of functions $\{g_j\}$ is as above. Suppose also that we are given a bound $C$ such that $\|T\| \le C$. Let $\epsilon > 0$ and $m, n \in \mathbb{N}$. Then we can choose $j$ such that, applying Theorem 4.3 (with the diagonal operators to ensure $R_{ii} > 0$) to $T_{(j)}$ using the function $g_j$ instead of $f$, we have the guaranteed bound
$$\big\|P_mT_nP_m - P_mT_{(j),n}P_m\big\| \le \epsilon,$$
where $T_{(j),n}$ denotes the $n$-th IQR iterate of $T_{(j)}$.

Proof of Theorem 4.8 First consider the error when applying Theorem 4.3 to $T_{(j)}$ with $g_j$ for any fixed $j$. We will show that we can compute an error bound which converges to zero as $j \to \infty$, and from this the theorem easily follows by successively computing the bound and halting when this bound is less than $\epsilon$.
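Write the QR decompositions
$$T^n = \hat{Q}_n\hat{R}_n, \qquad (T_{(j)})^n = \hat{Q}_{(j),n}\hat{R}_{(j),n}.$$
We have $\|T - T_{(j)}\| \le 1/j$ and hence, by writing $T_{(j)} = T + (T_{(j)} - T)$, that
$$\big\|T^n - (T_{(j)})^n\big\| \le \sum_{k=1}^n \binom{n}{k}\frac{C^{n-k}}{j^k} \le \frac{(C+1)^n}{j} = \frac{\tilde{C}}{j},$$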


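where $\tilde{C} = (C+1)^n$. The columns of $\hat{Q}_n$ and $\hat{Q}_{(j),n}$ are simply the columns of the matrices $T^n$ and $(T_{(j)})^n$ after the application of Gram–Schmidt. Let the first $m$ columns of $T^n$ and $(T_{(j)})^n$ be denoted by $\{t_k\}_{k=1}^m$ and $\{\tilde{t}_k^j\}_{k=1}^m$ respectively, and let $\{q_k\}_{k=1}^m$ and $\{\tilde{q}_k^j\}_{k=1}^m$ be the vectors obtained after applying Gram–Schmidt to these sequences of vectors. We then have
$$\begin{aligned} \|q_1 - \tilde{q}_1^j\| &= \left\| \frac{t_1}{\|t_1\|} - \frac{\tilde{t}_1^j}{\|\tilde{t}_1^j\|} \right\| = \left\| \frac{t_1\big(\|\tilde{t}_1^j\| - \|t_1\|\big) - (\tilde{t}_1^j - t_1)\|t_1\|}{\|t_1\|\,\|\tilde{t}_1^j\|} \right\| \\ &\le \frac{2\|t_1 - \tilde{t}_1^j\|}{\|\tilde{t}_1^j\|} \le \frac{2\tilde{C}}{j\|\tilde{t}_1^j\|}. \end{aligned} \tag{4.6}$$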

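For a vector $v$ of unit norm, let $P_{\perp v}$ denote the orthogonal projection onto the space of vectors perpendicular to $v$. Note that for two such vectors $v, w$ we have $\|P_{\perp v} - P_{\perp w}\| \le \|v - w\|$. Let
$$v_k = P_{\perp q_{k-1}} \cdots P_{\perp q_1} t_k, \qquad \tilde{v}_k^j = P_{\perp \tilde{q}_{k-1}^j} \cdots P_{\perp \tilde{q}_1^j}\, \tilde{t}_k^j; \tag{4.7}$$
then the $q_k$ are just the normalised versions of the $v_k$, and likewise the $\tilde{q}_k^j$ are just the normalised versions of the $\tilde{v}_k^j$. Suppose that for $\mu < k$ we have $\|q_\mu - \tilde{q}_\mu^j\| \le \delta$ for some $\delta > 0$. Then applying the above products of projections we have
$$\begin{aligned} \|v_k - \tilde{v}_k^j\| &\le \big\|P_{\perp q_{k-1}} \cdots P_{\perp q_1}(t_k - \tilde{t}_k^j)\big\| + \big\|P_{\perp q_{k-1}} \cdots P_{\perp q_1}\tilde{t}_k^j - \tilde{v}_k^j\big\| \\ &\le \|t_k - \tilde{t}_k^j\| + \big\|\big(P_{\perp q_{k-1}} \cdots P_{\perp q_1} - P_{\perp \tilde{q}_{k-1}^j} \cdots P_{\perp \tilde{q}_1^j}\big)\tilde{t}_k^j\big\| \\ &\le \|t_k - \tilde{t}_k^j\| + (k-1)\delta\|\tilde{t}_k^j\|. \end{aligned}$$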

In the last line we have used the fact that if the operators $\{A_l\}_{l=1}^m$ and $\{B_l\}_{l=1}^m$ have norm bounded by 1, then
$$\Big\|\prod_{l=1}^m A_l - \prod_{l=1}^m B_l\Big\| \le \sum_{l=1}^m \|A_l - B_l\|.$$
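The three elementary estimates used so far in this proof can be sanity-checked numerically. The sketch below evaluates them on random complex vectors: the normalisation bound in (4.6), the projection bound $\|P_{\perp v} - P_{\perp w}\| \le \|v - w\|$, and the telescoping bound for products of contractions (here products of rank-one complementary projections). It is a spot check, not a proof.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 6

def unit(v):
    return v / np.linalg.norm(v)

def perp(v):
    """Orthogonal projection onto the complement of the unit vector v."""
    return np.eye(len(v)) - np.outer(v, v.conj())

def cvec():
    return rng.standard_normal(N) + 1j * rng.standard_normal(N)

t = cvec()
s = t + 1e-2 * cvec()
# (4.6): || t/||t|| - s/||s|| || <= 2 ||t - s|| / ||s||
lhs_46 = np.linalg.norm(unit(t) - unit(s))
rhs_46 = 2 * np.linalg.norm(t - s) / np.linalg.norm(s)

# ||P_{perp v} - P_{perp w}|| <= ||v - w|| for unit vectors v, w
v, w = unit(t), unit(s)
lhs_p = np.linalg.norm(perp(v) - perp(w), 2)
rhs_p = np.linalg.norm(v - w)

# contractions: ||prod A_l - prod B_l|| <= sum ||A_l - B_l||
As = [perp(unit(cvec())) for _ in range(4)]
Bs = [perp(unit(cvec())) for _ in range(4)]
lhs_prod = np.linalg.norm(np.linalg.multi_dot(As) - np.linalg.multi_dot(Bs), 2)
rhs_prod = sum(np.linalg.norm(a - b, 2) for a, b in zip(As, Bs))
print(lhs_46 <= rhs_46, lhs_p <= rhs_p, lhs_prod <= rhs_prod)
```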

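Applying the same argument as in the inequalities (4.6), we see that
$$\|q_k - \tilde{q}_k^j\| \le \frac{2\big(\|t_k - \tilde{t}_k^j\| + (k-1)\delta\|\tilde{t}_k^j\|\big)}{\|\tilde{v}_k^j\|} \le \frac{2\big(\tilde{C}/j + 2(k-1)\delta\tilde{C}\big)}{\|\tilde{v}_k^j\|}, \tag{4.8}$$
since $\|\tilde{t}_k^j\| \le C^n + \tilde{C}/j \le 2\tilde{C}$. Now note that we can compute the $\tilde{v}_k^j$ from the proof of Theorem 4.3. Set $\delta_1(j) = 2\tilde{C}/(j\|\tilde{t}_1^j\|)$ and for $1 < k \le m$ define iteratively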


$$\delta_k(j) = \max\left\{\delta_{k-1}(j),\ \frac{2\big(\tilde{C}/j + 2(k-1)\delta_{k-1}(j)\tilde{C}\big)}{\|\tilde{v}_k^j\|}\right\}.$$

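We must have $\|q_k - \tilde{q}_k^j\| \le \delta_m(j)$ for $1 \le k \le m$, where we have now shown the $j$ dependence as an argument. It follows that $\|(\hat{Q}_n - \hat{Q}_{(j),n})P_m\| \le \sqrt{m}\,\delta_m(j)$ and hence that
$$\begin{aligned} \|P_mT_nP_m - P_mT_{(j),n}P_m\| &\le \|P_m(\hat{Q}_n - \hat{Q}_{(j),n})^*T\hat{Q}_nP_m\| + \|P_m\hat{Q}_{(j),n}^*(T\hat{Q}_n - T_{(j)}\hat{Q}_{(j),n})P_m\| \\ &\le \sqrt{m}\,\delta_m(j)C + \|(T - T_{(j)})\hat{Q}_{(j),n}P_m\| + \|T(\hat{Q}_n - \hat{Q}_{(j),n})P_m\| \\ &\le 2\sqrt{m}\,\delta_m(j)C + \frac{1}{j}. \end{aligned}$$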

So we need only show that $\delta_m(j) \to 0$ as $j \to \infty$. Note that as $j \to \infty$, the columns of $(T_{(j)})^n$ converge to those of $T^n$. It follows that the $\tilde{t}_k^j$ converge to the $t_k$ and that $\tilde{q}_1^j$ converges to $q_1$. An easy inductive argument using (4.7) and (4.8) shows that the vectors $\tilde{q}_k^j$ converge to $q_k$ and that the $\tilde{v}_k^j$ are bounded below. The convergence $\delta_m(j) \to 0$ now follows. □
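The computable bound from this proof can be exercised on a finite proxy. In the sketch below, a random $10 \times 10$ matrix stands in for $T$ and a perturbation of norm exactly $1/j$ stands in for $T_{(j)}$. Both are assumptions made for illustration. The code runs modified Gram–Schmidt (which realises the products of projections in (4.7)), evaluates the recursion for $\delta_k(j)$, and checks that $\|q_k - \tilde{q}_k^j\| \le \delta_k(j)$ and that the section error is at most $2\sqrt{m}\,\delta_m(j)C + 1/j$.

```python
import numpy as np

def gram_schmidt(cols):
    """Modified Gram-Schmidt: v_k = P_{perp q_{k-1}} ... P_{perp q_1} t_k, q_k = v_k/||v_k||."""
    qs, vs = [], []
    for t in cols.T:
        v = t.copy()
        for q in qs:
            v = v - (q @ v) * q          # apply P_{perp q} (real arithmetic)
        vs.append(v)
        qs.append(v / np.linalg.norm(v))
    return np.array(qs).T, np.array(vs).T

def delta_bounds(v_tilde_norms, C_tilde, j):
    """delta_1(j), ..., delta_m(j) from the recursion in the proof of Theorem 4.8."""
    deltas = [2 * C_tilde / (j * v_tilde_norms[0])]   # v~_1 = t~_1
    for k in range(2, len(v_tilde_norms) + 1):
        prev = deltas[-1]
        cand = 2 * (C_tilde / j + 2 * (k - 1) * prev * C_tilde) / v_tilde_norms[k - 1]
        deltas.append(max(prev, cand))
    return np.array(deltas)

rng = np.random.default_rng(7)
N, m, n, j = 10, 3, 2, 10**6
T = rng.standard_normal((N, N)) + 5 * np.eye(N)
C = np.linalg.norm(T, 2)                       # a bound with ||T|| <= C
E = rng.standard_normal((N, N))
T_j = T + E / (j * np.linalg.norm(E, 2))       # stands in for T_(j): ||T_(j) - T|| = 1/j
C_tilde = (C + 1) ** n

q, _ = gram_schmidt(np.linalg.matrix_power(T, n)[:, :m])
q_t, v_t = gram_schmidt(np.linalg.matrix_power(T_j, n)[:, :m])
deltas = delta_bounds(np.linalg.norm(v_t, axis=0), C_tilde, j)

errors = np.linalg.norm(q - q_t, axis=0)       # ||q_k - q~_k||
section_error = np.linalg.norm(q.T @ T @ q - q_t.T @ T_j @ q_t, 2)
bound = 2 * np.sqrt(m) * deltas[-1] * C + 1 / j
print(errors, deltas)
print(section_error, bound)
```

The bound is pessimistic by design. The actual errors are typically orders of magnitude smaller, but the bound has the virtue of being computable and converges to zero as $j \to \infty$.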


5 SCI classification theorems

In this section we will apply the above results to prove three new classification theorems in the SCI hierarchy. First, assume that $T \in \mathcal{B}(l^2(\mathbb{N}))$ is an invertible normal operator with $\sigma(T) = \omega \cup \Psi$, where $\omega \cap \Psi = \emptyset$, $\omega = \{\lambda_i\}_{i=1}^N$, and the $\lambda_i$'s are isolated eigenvalues with multiplicity $m_i$ satisfying $|\lambda_1| > \cdots > |\lambda_N|$. As usual, we also assume that $\sup\{|\theta| : \theta \in \Psi\} < |\lambda_N|$ and set
$$M := m_1 + \cdots + m_N \in \mathbb{N} \cup \{\infty\}. \tag{5.1}$$

In this section we will assume for simplicity that all the m i except possibly m N are
finite. To be able to obtain the classification results we need two key assumptions.
(I) Column decay. We assume a much weaker condition than bandedness of the infinite matrix. Indeed, we suppose a known decay of the elements in the columns of $T$ that is described through a family of increasing functions $\{g_j\}_{j\in\mathbb{N}}$. In particular, $g_j : \mathbb{N} \to \mathbb{N}$ is such that, defining the infinite matrix $T_{(j)}$ with columns $\{P_{g_j(n)}Te_n\}_{n\in\mathbb{N}}$, we have that $T_{(j)}$ is invertible and
$$\big\|(P_{g_j(n)} - I)Te_n\big\| \le \frac{1}{j}, \quad n \in \mathbb{N}. \tag{5.2}$$

(II) Distance to span of eigenvectors. In order to obtain error control ($\Delta_1$ classification), one needs to control the hidden constant in the $O(r^n)$ estimate in (3.16). This is done as follows, where $\{Q_n\}_{n\in\mathbb{N}}$ is a Q-sequence of $T$ with respect to $\{e_j\}_{j\in\mathbb{N}}$. Given finite $k < M + 1$ with $m_1 + \cdots + m_{N-1} < k$, we will assume that if $l < N$ then $\{\chi_{\{\lambda_1,\ldots,\lambda_l\}}(T)e_j\}_{j=1}^{m_1+\cdots+m_l}$ are linearly independent. We also assume that $\{\chi_{\{\lambda_1,\ldots,\lambda_N\}}(T)e_j\}_{j=1}^k$ are linearly independent. This simply ensures that the IQR algorithm converges with the expected ordering (largest eigenvalue in the first diagonal entry, then in descending order). It follows from Theorems 3.9 and 3.7 that there exist eigenspaces $E_1, \ldots, E_N$ (with the last space depending on $k$ and the vectors $\{e_j\}$) corresponding to the eigenvalues $\lambda_1, \ldots, \lambda_N$ such that
– $E_i = \ker(T - \lambda_iI)$ is the full eigenspace if $i < N$;
– $\hat{\delta}\big(\bigoplus_{i=1}^l E_i, \operatorname{span}\{Q_ne_j\}_{j=1}^{\min\{m_1+\cdots+m_l,k\}}\big) \to 0$ as $n \to \infty$ for $l = 1, \ldots, N$.

We then define the initial supremum subspace angle by
$$\Phi(T, \{e_j\}_{j=1}^k) := \sup_{l=1,\ldots,N} \phi\Big(\bigoplus_{i=1}^l E_i, \operatorname{span}\{e_j\}_{j=1}^{\min\{m_1+\cdots+m_l,k\}}\Big), \tag{5.3}$$
where $\phi$, defined by (1.4), denotes the subspace angle. Our assumptions and the proofs in Sect. 3 show that $\Phi(T, \{e_j\}_{j=1}^k) < \pi/2$, and hence the key quantity $\tan\big(\Phi(T, \{e_j\}_{j=1}^k)\big)$ is finite.
 
Remark 5.1 The quantity $\tan\big(\Phi(T, \{e_j\}_{j=1}^k)\big)$ can be viewed as a measure of how far $\{e_j\}_{j=1}^k$ is from $\{q_j\}_{j=1}^k$, the $k$ eigenvectors of $T$ corresponding to the first $k$ eigenvalues (including multiplicity and preserving order). Hence it gives an estimate of how good the initial approximation $\{e_j\}_{j=1}^k$ to $\{q_j\}_{j=1}^k$ is. Indeed, we know from (3.16) that the convergence rate is $O(r^n)$, and the hidden constant $C$ depends exactly on this behaviour. In particular, if $e_j = q_j$ for $j \le k$ then $C = 0$.
Define also
$$r(T) = \max\{|\lambda_2/\lambda_1|, \ldots, |\lambda_N/\lambda_{N-1}|, \rho(T)/|\lambda_N|\}, \qquad \rho(T) = \sup\{|z| : z \in \Psi\}.$$
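The rate $r(T)$ is elementary to evaluate once the extremal eigenvalues and $\rho(T)$ are known. The short sketch below takes the distinct extremal eigenvalues (ordered by decreasing modulus) and $\rho(T)$ as inputs. The sample values are hypothetical.

```python
def r_of_T(eigenvalues, rho):
    """r(T) = max{|l2/l1|, ..., |lN/l(N-1)|, rho/|lN|}.

    `eigenvalues` lists the distinct extremal eigenvalues with |l1| > ... > |lN|,
    and `rho` is the supremum of |z| over the rest of the spectrum.
    """
    ratios = [abs(b / a) for a, b in zip(eigenvalues, eigenvalues[1:])]
    return max(ratios + [rho / abs(eigenvalues[-1])])

print(r_of_T([4, 3j, -1], 0.5))   # 0.75
```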

We can now define the class of operators $\Omega^k_{t,L}$ for the classification theorem.

Definition 5.2 Given $k \in \mathbb{N}$, $t \in (0,1)$ and $L > 0$, let $\Omega^k_{t,L}$ denote the class of invertible normal operators $T$ acting on $l^2(\mathbb{N})$ with $\|T\| \le L$ such that:
1. There exists the decomposition $\sigma(T) = \omega \cup \Psi$ as above with $m_1 + \cdots + m_{N-1} < k \le M$, where $M$ is defined in (5.1).
2. If $m_1 + \cdots + m_l < k$ then $\{\chi_{\{\lambda_1,\ldots,\lambda_l\}}(T)e_j\}_{j=1}^{m_1+\cdots+m_l}$ are linearly independent. Also, the vectors $\{\chi_{\{\lambda_1,\ldots,\lambda_N\}}(T)e_j\}_{j=1}^k$ are linearly independent.
3. We have access to functions $g_j : \mathbb{N} \to \mathbb{N}$ satisfying (5.2).
4. It holds that $r(T) \le t$ and $\tan\big(\Phi(T, \{e_j\}_{j=1}^k)\big) \le L$.
We can now define the computational problem that we want to classify in the SCI hierarchy. Consider, for any $T \in \Omega^k_{t,L}$, the problem of computing the $k$ largest eigenvalues (including multiplicity) and the corresponding eigenspaces. In other words, we consider the set-valued mapping
$$\Xi_1(T) = \mathcal{S} \subset \mathcal{M} = \mathbb{C}^k \times l^2(\mathbb{N}),$$
where we define
$$\mathcal{S} := \Big\{ (\underbrace{\lambda_1, \ldots, \lambda_1}_{m_1 \text{ times}}, \ldots, \underbrace{\lambda_N, \ldots, \lambda_N}_{k - (m_1 + \cdots + m_{N-1}) \text{ times}}) \times (\hat{q}_1, \ldots, \hat{q}_k) : \{\hat{q}_j\}_{j=m_1+\cdots+m_{l-1}+1}^{m_1+\cdots+m_l} \text{ is an orthonormal basis of } \operatorname{ran}(\chi_{\lambda_l}(T)) \text{ for } l < N, \text{ and } \{\hat{q}_j\}_{j=m_1+\cdots+m_{N-1}+1}^k \text{ is an orthonormal basis for a subspace of } \operatorname{ran}(\chi_{\lambda_N}(T)) \Big\}.$$

As discussed in Remark A.6 in the Appendix, where we review the SCI hierarchy, when we speak of convergence of $\mathcal{M} \ni \Gamma_n(T)$ to $\Xi_1(T)$ we define, with a slight abuse of notation,
$$\operatorname{dist}(\Gamma_n(T), \Xi_1(T)) := \inf_{y \in \Xi_1(T)} d_{\mathcal{M}}(\Gamma_n(T), y) \to 0.$$

Having established the basic definitions, we can now present the classification theorem.

Theorem 5.3 ($\Delta_1$ classification for the extremal part of the spectrum) Given the above set-up, we have $\{\Xi_1, \Omega^k_{t,L}\} \in \Delta_1$. In other words, for all $n \in \mathbb{N}$ there exists a general tower using radicals, $\Gamma_n(T)$, such that for all $T \in \Omega^k_{t,L}$,
$$\operatorname{dist}(\Gamma_n(T), \Xi_1(T)) \le 2^{-n}.$$

Remark 5.4 Note that this means we converge to the $k$ largest magnitude eigenvalues in order with error control, and not just to arbitrary points of the spectrum. This is in contrast to most $\Sigma_1$ classifications in the SCI hierarchy, where the best we can hope for is to bound $\operatorname{dist}(z, \sigma(T))$ for $z \in \mathbb{C}$.

Proof of Theorem 5.3 Let $T \in \Omega^k_{t,L}$. Then by the definition of $\Omega^k_{t,L}$ we may take $\hat{e}_j = e_j$ for $j = 1, \ldots, k$ in the arguments in Sect. 3.1. The first step is to bound $Z(T, \{e_j\}_{j=1}^k)$ in terms of $\Phi(T, \{e_j\}_{j=1}^k)$. Let $\{\tilde{e}_j\}_{j=1}^k$ denote the basis described in Sect. 3.1. In our case:
• For any $1 \le i \le k$, $\operatorname{span}\{\tilde{e}_j\}_{j=1}^i = \operatorname{span}\{e_j\}_{j=1}^i$.
• If $j > m_1 + \cdots + m_l$ then $\chi_{\lambda_l}(T)\tilde{e}_j = 0$.
• The vectors $\{\chi_{\lambda_l}(T)\tilde{e}_j\}_{j=m_1+\cdots+m_{l-1}+1}^{\min\{m_1+\cdots+m_l,k\}}$ are orthonormal.


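Let $\delta_j = \|\tilde{e}_j\|$. Then we must have that if $m_1 + \cdots + m_{l-1} < j \le m_1 + \cdots + m_l$ then
$$\begin{aligned}\frac{\delta_j^2 - 1}{\delta_j^2} &\le \delta\Big(\operatorname{span}\{\tilde{e}_j\}, \bigoplus_{i=1}^l \operatorname{span}\{\chi_{\{\lambda_i\}}(T)\tilde{e}_j\}_{j=m_1+\cdots+m_{i-1}+1}^{\min\{m_1+\cdots+m_i,k\}}\Big)^2 \\ &\le \delta\Big(\operatorname{span}\{\tilde{e}_j\}_{j=1}^{\min\{m_1+\cdots+m_l,k\}}, \bigoplus_{i=1}^l \operatorname{span}\{\chi_{\{\lambda_i\}}(T)\tilde{e}_j\}_{j=m_1+\cdots+m_{i-1}+1}^{\min\{m_1+\cdots+m_i,k\}}\Big)^2 \\ &= \delta\Big(\operatorname{span}\{e_j\}_{j=1}^{\min\{m_1+\cdots+m_l,k\}}, \bigoplus_{i=1}^l E_i\Big)^2 \\ &\le \sin^2\big(\Phi(T, \{e_j\}_{j=1}^k)\big),\end{aligned}$$
where the first line holds since the nearest point to $\tilde{e}_j$ in $\bigoplus_{i=1}^l \operatorname{span}\{\chi_{\{\lambda_i\}}(T)\tilde{e}_j\}_{j=m_1+\cdots+m_{i-1}+1}^{\min\{m_1+\cdots+m_i,k\}}$ is simply $\chi_{\lambda_l}(T)\tilde{e}_j$, and the $E_i$ are defined as above and in (3.1). Rearranging, this implies that
$$\delta_j^2 \le \frac{1}{1 - \sin^2\big(\Phi(T, \{e_j\}_{j=1}^k)\big)} = \frac{1}{\cos^2\big(\Phi(T, \{e_j\}_{j=1}^k)\big)}.$$
Hence it follows that
$$Z(T, \{e_j\}_{j=1}^k) = \Big(\sum_{j=1}^k (\delta_j^2 - 1)\Big)^{1/2} \le \Big(\sum_{j=1}^k \tan^2\big(\Phi(T, \{e_j\}_{j=1}^k)\big)\Big)^{1/2} \le \sqrt{k}\,L.$$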

In particular, Theorem 3.7 and its proof now imply that
$$\hat{\delta}(\operatorname{span}\{\hat{q}_j\}, \operatorname{span}\{Q_me_j\}) \le B(j)\sqrt{k}\,Lt^m,$$
where $\{\hat{q}_j\}_{j=1}^k$ are orthonormal eigenvectors of $T$ and $Q_m$ is a Q-sequence of $T$. In particular, $\{B(j)\}_{j=1}^k$ can be computed in finitely many arithmetic operations from the induction proof of Theorem 3.7. It follows that there exists $z_{j,m} \in \mathbb{C}$ of unit modulus such that, defining $\beta = \max\{B(1), \ldots, B(k)\}\sqrt{k}\,L$, we have
$$\|Q_me_j - z_{j,m}\hat{q}_j\| \le \beta t^m.$$
Note that we do not need to assume knowledge of $N$ for this bound (trivially $N \le k$). Using that $Q_m$ is an isometry, this implies that
$$\big|\langle Q_m^*TQ_me_j, e_j\rangle - \lambda_{a_j}\big| \le 2\|T\|\beta t^m \le 2L\beta t^m,$$
where $T\hat{q}_j = \lambda_{a_j}\hat{q}_j$. Note that we must have $\{\lambda_{a_j}\}_{j=m_1+\cdots+m_{l-1}+1}^{m_1+\cdots+m_l} = \{\lambda_l\}$ and $\{\lambda_{a_j}\}_{j=m_1+\cdots+m_{N-1}+1}^k = \{\lambda_N\}$ by 3. in the definition of $\Omega^k_{t,L}$.


Given any $\epsilon > 0$, choose $m$ large enough so that $2L\beta t^m \le \epsilon$ and $\beta t^m \le \epsilon$. The fact that $\|T\| \le L$ and (5.2) hold implies that we can compute $\langle Q_m^*TQ_me_j, e_j\rangle$ to accuracy $\epsilon$ using finitely many arithmetical and square root operations, by Theorem 4.8. Call these approximations $\tilde{\lambda}_1, \tilde{\lambda}_2, \ldots, \tilde{\lambda}_k$. Furthermore, the proof of Theorem 4.8 also makes clear that we can compute $Q_me_j \in l^2(\mathbb{N})$ to accuracy $\epsilon$ using finitely many arithmetical and square root operations (the approximations have finite support). Call these approximations $\tilde{q}_1, \tilde{q}_2, \ldots, \tilde{q}_k$. Then set
$$\Gamma^{\epsilon}(T) = (\tilde{\lambda}_1, \tilde{\lambda}_2, \ldots, \tilde{\lambda}_k) \times (\tilde{q}_1, \tilde{q}_2, \ldots, \tilde{q}_k).$$
The above estimates show that $\operatorname{dist}(\Gamma^{\epsilon}(T), \Xi_1(T)) \le 4k\epsilon$. The proof is completed by setting $\Gamma_n(T) = \Gamma^{2^{-(n+2)}/k}(T)$. □
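The only quantitative choice in this proof is the number of IQR iterations $m$. The sketch below picks the smallest $m$ with $\max(2L\beta, \beta)\,t^m \le \epsilon$ for the accuracy $\epsilon = 2^{-(n+2)}/k$ used to define $\Gamma_n$. The values of $L$, $\beta$, $t$, $k$ and $n$ are hypothetical, and $\beta$ is simply assumed to be available (the text obtains it from the bounds $B(j)$).

```python
import math

def iterations_needed(L, beta, t, eps):
    """Smallest m >= 0 with 2*L*beta*t**m <= eps and beta*t**m <= eps (0 < t < 1)."""
    c = max(2 * L * beta, beta)
    m = 0 if c <= eps else math.ceil(math.log(eps / c) / math.log(t))
    while c * t ** m > eps:                    # guard against rounding in the logarithms
        m += 1
    while m > 0 and c * t ** (m - 1) <= eps:
        m -= 1
    return m

k, n = 4, 5
eps = 2.0 ** (-(n + 2)) / k                    # accuracy used for Gamma_n in the proof
print(iterations_needed(L=2.0, beta=3.0, t=0.5, eps=eps))   # 13
```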


Next, suppose that we have a continuous increasing function $g : \mathbb{R}_{\ge 0} \to \mathbb{R}_{\ge 0}$ diverging at $\infty$ such that $g(0) = 0$ and $g(x) \le x$. Let $\Omega^g_{\mathrm{IQR}}$ be the set of all operators $T$ acting on $l^2(\mathbb{N})$ (i.e. we fix the representation with respect to the canonical basis) for which the IQR algorithm converges in the weak operator topology to a diagonal matrix with the same spectrum as $T$ and such that
$$\big\|(T - zI)^{-1}\big\|^{-1} \ge g\big(\operatorname{dist}(z, \sigma(T))\big).$$
Note that by Theorem 3.9 this includes all normal compact operators $T$ such that $\{z \in \sigma(T) : |z| = s\}$ has size at most 1 for all $s > 0$ (where we can take $g(x) = x$; see footnote 2). We will allow evaluations of $g$ in our algorithms and also assume that we are given functions that satisfy (5.2) and have an upper bound for $\|T\|$. We consider computing $\Xi_2(T) = \sigma(T)$ in the space of compact non-empty subsets of $\mathbb{C}$ with the Hausdorff metric.

Theorem 5.5 ($\Sigma_1$ classification for spectrum) Given the above set-up, we have $\{\Xi_2, \Omega^g_{\mathrm{IQR}}\} \in \Sigma_1$. In other words, there is a convergent sequence of general towers using radicals, $\Gamma_n(T)$, such that $\Gamma_n(T) \to \Xi_2(T) = \sigma(T)$ for any $T \in \Omega^g_{\mathrm{IQR}}$, and for all $n$ we have
$$\Gamma_n(T) \subset \sigma(T) + B_{2^{-n}}(0).$$

Proof of Theorem 5.5 Let $T \in \Omega^g_{\mathrm{IQR}}$ and let $Q_m$ be a Q-sequence of $T$. Fix $n \in \mathbb{N}$. Then Theorem 4.8 shows that we can compute any finite number of the diagonal entries of $Q_m^*TQ_m$ to any given accuracy using finitely many arithmetical and square root operations. Similarly, the proof shows that we can compute $TQ_me_j$ and $Q_me_j$ to any given accuracy in $l^2(\mathbb{N})$ (the approximations have finite support). Now let $\alpha_{j,m}$ be the computed approximations of $\langle Q_m^*TQ_me_j, e_j\rangle$ to accuracy $1/m$. Then, since $T \in \Omega^g_{\mathrm{IQR}}$, we have that $\lim_{m\to\infty} \alpha_{j,m} = \alpha_j \in \sigma(T)$. Furthermore, $\{\alpha_j : j \in \mathbb{N}\}$ is dense in $\sigma(T)$. We have that
$$\big\|(T - \alpha_{j,m}I)^{-1}\big\|^{-1} \le \|TQ_me_j - \alpha_{j,m}Q_me_j\|$$
and hence that
$$\operatorname{dist}(\alpha_{j,m}, \sigma(T)) \le g^{-1}\big(\|TQ_me_j - \alpha_{j,m}Q_me_j\|\big). \tag{5.4}$$
Given $m, j$, we can compute an upper bound $h_{j,m}$ for the right-hand side of (5.4) by approximating the norm $\|TQ_me_j - \alpha_{j,m}Q_me_j\|$ from above to accuracy $1/m$ and using finitely many evaluations of $g$. Namely, let $x_{j,m}$ be the approximation of $\|TQ_me_j - \alpha_{j,m}Q_me_j\|$ and set
$$h_{j,m} = \frac{\min\{l \in \mathbb{N} : g(l/m) \ge x_{j,m}\}}{m}.$$
It is then clear that $\lim_{m\to\infty} h_{j,m} = 0$ and $h_{j,m} \ge g^{-1}(\|TQ_me_j - \alpha_{j,m}Q_me_j\|)$. We set $\Gamma_n(T) = \{\alpha_{j,m(n,T)} : j = 1, \ldots, n\}$, where $m(n,T)$ is minimal such that $h_{j,m} \le 2^{-n}$ for $j = 1, \ldots, n$. By (5.4), we must have that
$$\Gamma_n(T) \subset \sigma(T) + B_{2^{-n}}(0).$$
It is also clear that $\Gamma_n(T) \to \sigma(T)$ in the Hausdorff metric. □

2 A simple compactness argument shows that for any bounded operator $T$ there is a corresponding function $g$ that works.
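The grid search that defines $h_{j,m}$ in this proof is straightforward to implement. The sketch below assumes, as in the text, that $g$ is continuous, increasing and unbounded with $g(0) = 0$, so the loop terminates. The particular $g(s) = \min(s, s^2)$ is a hypothetical example satisfying $g(s) \le s$.

```python
def h_bound(x, m, g):
    """h = min{l in N : g(l/m) >= x} / m, an upper bound for g^{-1}(x) on the grid 1/m.

    Assumes g is continuous, increasing and unbounded with g(0) = 0.
    """
    l = 1
    while g(l / m) < x:
        l += 1
    return l / m

g_example = lambda s: min(s, s * s)     # hypothetical g with g(0) = 0 and g(s) <= s
print(h_bound(0.25, 100, g_example))    # 0.5, which equals g^{-1}(0.25) here
```

In the proof, $x$ is an upper approximation $x_{j,m}$ of the residual norm, and $\alpha_{j,m}$ is accepted into $\Gamma_n(T)$ once $h_{j,m} \le 2^{-n}$.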



The final result considers dominant invariant subspaces, discussed in Theorem 3.15. Let $M \in \mathbb{N}$, $t \in (0,1)$ and $L > 0$. We let $\tilde{\Omega}^M_{t,L}$ denote the class of operators such that the assumptions of Theorem 3.15 hold (with the same $M$) and such that:
1. $\beta/\alpha < t$;
2. $\max\left\{\|T\|, \dfrac{\sin\big(\phi(\operatorname{span}\{e_j\}_{j=1}^M, \operatorname{ran}(P))\big)}{\cos\big(\phi(\operatorname{span}\{e_j\}_{j=1}^M, S)\big)}\left(1 + \dfrac{\|PT(I - P)\|}{\alpha - \beta}\right)\right\} \le L$.
We also assume that we are given functions that satisfy (5.2) and consider computing the dominant invariant subspace $\Xi_3(T) = \operatorname{ran}(P)$ in the space of $M$-dimensional subspaces of $l^2(\mathbb{N})$ equipped with the metric $\hat{\delta}$.

Theorem 5.6 ($\Delta_1$ classification for dominant invariant subspace) Given the above set-up, we have $\{\Xi_3, \tilde{\Omega}^M_{t,L}\} \in \Delta_1$. In other words, for all $n \in \mathbb{N}$ there exists a general tower using radicals, $\Gamma_n(T)$, each an $M$-dimensional subspace of $l^2(\mathbb{N})$, such that for all $T \in \tilde{\Omega}^M_{t,L}$,
$$\hat{\delta}(\Gamma_n(T), \Xi_3(T)) \le 2^{-n}.$$

Proof of Theorem 5.6 Let $n \in \mathbb{N}$ and $T \in \tilde{\Omega}^M_{t,L}$. Then from Theorem 3.15, we can choose $m$ large so that $t^mL < 2^{-(n+1)}$, and hence
$$\hat{\delta}(\operatorname{span}\{Q_me_j\}_{j=1}^M, \operatorname{ran}(P)) < 2^{-(n+1)}.$$


Using Theorem 4.8 and its proof, given $\epsilon$ we can compute, in finitely many arithmetical and square root operations, approximations $v_{m,j}(\epsilon)$ (of finite support) such that
$$\|v_{m,j}(\epsilon) - Q_me_j\| \le \epsilon.$$
The vectors $\{Q_me_j\}_{j=1}^M$ are orthonormal, as are the approximations $\{v_{m,j}(\epsilon)\}_{j=1}^M$. A simple application of Hölder's inequality then yields
$$\hat{\delta}(\operatorname{span}\{v_{m,j}(\epsilon)\}_{j=1}^M, \operatorname{span}\{Q_me_j\}_{j=1}^M) \le \sqrt{M}\,\epsilon.$$
By the triangle inequality, the proof of the theorem is complete by choosing $\epsilon$ such that $\sqrt{M}\,\epsilon \le 2^{-(n+1)}$ and then setting $\Gamma_n(T) = \operatorname{span}\{v_{m,j}(\epsilon)\}_{j=1}^M$. □
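The final estimate, $\hat{\delta} \le \sqrt{M}\,\epsilon$ for two orthonormal families that are columnwise $\epsilon$-close, can be checked in finite dimensions. The sketch assumes that $\hat{\delta}$ is the usual gap, computed as the norm of the difference of the two orthogonal projections. It uses random orthonormal columns as stand-ins for $\{Q_me_j\}$ and a re-orthonormalised perturbation as stand-ins for $\{v_{m,j}(\epsilon)\}$.

```python
import numpy as np

def gap(X, Y):
    """delta-hat(span X, span Y) as ||P_X - P_Y|| (columns of X, Y orthonormal)."""
    return np.linalg.norm(X @ X.conj().T - Y @ Y.conj().T, 2)

rng = np.random.default_rng(8)
N, M = 12, 3
Q, _ = np.linalg.qr(rng.standard_normal((N, M)))             # stands in for {Q_m e_j}
V, _ = np.linalg.qr(Q + 1e-3 * rng.standard_normal((N, M)))  # orthonormal approximations
V = V * np.sign(np.sum(V * Q, axis=0))                       # fix column signs
eps = np.max(np.linalg.norm(V - Q, axis=0))                  # max_j ||v_j - Q e_j||
print(gap(V, Q), np.sqrt(M) * eps)                           # first <= second
```

For equal dimensions, the gap equals $\|(I - QQ^*)V\| = \|(I - QQ^*)(V - Q)\| \le \|V - Q\|_F \le \sqrt{M}\,\epsilon$, which is the inequality used above.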


6 Examples and numerical simulations

The aim of this section is threefold:


1. To demonstrate the convergence and implementation results of Sects. 3–5 on prac-
tical examples.
2. To demonstrate that, as well as the proven results, the IQR algorithm performs
better than theoretically expected in many cases. In particular, we conjecture that
for normal operators whose essential spectrum has exactly one extremal point,
the IQR algorithm will also converge to this point. We also demonstrate cases
where this seems to hold even if there are multiple extreme points of the essential
spectrum and even in non-normal cases.
3. To compare the IQR algorithm to the finite section method and show that in some cases it considerably outperforms it. In general, one can view $\sigma(P_mQ_n^*TQ_n|_{P_m\mathcal{H}})$ as a generalised version of the finite section method, now with two parameters ($m$ and $n$) that can be varied, with $n$ controlling the number of IQR iterates. In some cases we find this avoids spectral pollution whilst still converging to the entire spectrum.
Before embarking on some numerical examples, two remarks are in order. First, extra care has been taken in the case of non-self-adjoint operators whose finite truncations can be non-normal, and hence the computation of their spectra can be numerically unstable. Unless stated otherwise, all calculations were performed in double precision (in MATLAB) and have been checked against extended precision [32] to ensure that none of the results are due to numerical artefacts. Second, when dealing with operators acting on $l^2(\mathbb{Z})$ we use $\mathbb{N}$ as an index set by listing the canonical basis as $e_0, e_1, e_{-1}, e_2, e_{-2}, \ldots$, allowing us to apply the IQR algorithm on $l^2(\mathbb{N})$. Of course, different indexing is possible and in general this would lead to different implementations of the IQR algorithm,3 but we stick with this ordering throughout.

3 A discussion of this is beyond the scope of this paper. In effect, for invertible operators, this corresponds
to choosing the order of columns on which to perform a Gram-Schmidt type procedure.


6.1 The finite section method

We first briefly say a few words on the finite section method, the standard means to discretise infinite matrices, since comparisons will be made later. If $\{P_m\}_{m\in\mathbb{N}}$ is a sequence of finite-rank projections such that $P_{m+1} \ge P_m$ and $P_m \to I$ strongly, where $I$ is the identity, then the idea is to replace $T$ by the finite square matrix $P_mT|_{P_m\mathcal{H}}$ (typically, one takes $P_m$ to be the orthogonal projection onto $\operatorname{span}\{e_1, \ldots, e_m\}$). Thus, to find $\sigma(T)$, we instead compute $\sigma(P_mT|_{P_m\mathcal{H}})$. However, there can be significant issues when using the finite section method. In general, there is no guarantee that the computed spectra $\sigma(P_mT|_{P_m\mathcal{H}})$ converge to $\sigma(T)$.
For example, consider the shift operator $Se_j = e_{j+1}$ on $l^2(\mathbb{N})$. If $P_m$ projects onto $\operatorname{span}\{e_1, \ldots, e_m\}$, we get $\sigma(P_mS|_{P_m\mathcal{H}}) = \{0\}$ for all $m$, whereas $\sigma(S)$ is the closed unit disc. We can also have that $\sigma(P_mT|_{P_m\mathcal{H}}) \not\subset \sigma(T)$. For example, let
$$A = \begin{pmatrix} a_1 & i & & & \\ 1 & a_2 & i & & \\ & 1 & a_3 & i & \\ & & 1 & a_4 & \ddots \\ & & & \ddots & \ddots \end{pmatrix}, \tag{6.1}$$
where $a_j = 5\cos(j)/4 + 2i\sin(j)$. To gain an accurate picture of the spectrum, note that $A$ is banded, and hence we can compute approximations to the pseudospectrum [38].
In order to approximate the spectrum in the best possible way, we must take $\epsilon$ as small as possible. Unfortunately, there is a restriction on how small $\epsilon$ can be, depending on $\epsilon_{\mathrm{mach}}$ (machine precision) of the software used. To illustrate this, observe that the approximations are given by (a discrete version of)
$$\begin{aligned}\Gamma^{\epsilon}_m(A) = &\{z \in \mathbb{C} : \min\{\sqrt{\lambda} : \lambda \in \sigma(P_m(A - z)^*(A - z)|_{P_m\mathcal{H}})\} \le \epsilon\} \\ \cup\, &\{z \in \mathbb{C} : \min\{\sqrt{\lambda} : \lambda \in \sigma(P_m(A - z)(A - z)^*|_{P_m\mathcal{H}})\} \le \epsilon\}.\end{aligned} \tag{6.2}$$
Thus, ignoring the additional error in computing the smallest eigenvalues of the squared operator and assuming $A$ to have matrix entries of order 1, computing $\Gamma^{\epsilon}_m(A)$ will have the same challenges as if one squares a real number and then takes its square root. In particular, due to the floating point arithmetic used in the software and (6.2), we must ensure that
$$\epsilon \gg \sqrt{\epsilon_{\mathrm{mach}}},$$
and this puts a serious restriction on our computation, particularly for the non-normal case, where the distance $d_H(\sigma_\epsilon(T), \sigma(T))$ may be large (though we always have $\sigma(T) \subset \sigma_\epsilon(T)$). However, it is possible to detect spectral pollution outside of $\sigma_\epsilon(T)$ if we can approximate it well.
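Because $A$ in (6.1) is tridiagonal, $\sqrt{\lambda_{\min}}$ in (6.2) is just the smallest singular value of the $(m+1) \times m$ block of $A - z$ (and likewise for the adjoint). This avoids forming the squared operator, and with it the loss of accuracy discussed above. The sketch below (function names are ours) evaluates this indicator and flags finite-section eigenvalues where it exceeds $\epsilon$, i.e. points outside $\Gamma^{\epsilon}_m(A)$. Since the indicator is non-increasing in $m$, such points are only candidates for spectral pollution, not certified ones. We use $m = 100$ rather than the $m = 300$ of Fig. 1, to keep the run short.

```python
import numpy as np

def square_section(size):
    """P_size A P_size for the tridiagonal matrix A in (6.1)."""
    idx = np.arange(1, size + 1)
    a = 5 * np.cos(idx) / 4 + 2j * np.sin(idx)
    return (np.diag(a)
            + np.diag(np.ones(size - 1), -1)          # 1 on the subdiagonal
            + np.diag(1j * np.ones(size - 1), 1))     # i on the superdiagonal

def gamma(m, z):
    """Indicator from (6.2): z lies in Gamma_m^eps(A) iff gamma(m, z) <= eps.

    For tridiagonal A, sqrt(lambda_min(P_m (A - z)^*(A - z) P_m)) is the smallest
    singular value of the (m+1) x m block of A - z (similarly for the adjoint).
    """
    S = square_section(m + 1)
    E = np.eye(m + 1)[:, :m]
    s1 = np.linalg.svd(S[:, :m] - z * E, compute_uv=False)[-1]
    s2 = np.linalg.svd(S.conj().T[:, :m] - np.conj(z) * E, compute_uv=False)[-1]
    return min(s1, s2)

m, eps = 100, 0.1
eigs = np.linalg.eigvals(square_section(m))           # finite section spectrum
candidates = [z for z in eigs if gamma(m, z) > eps]   # outside Gamma_m^eps(A)
print(len(candidates))
```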
The phenomenon of “spectral pollution” occurs for $A$: namely, the computed spectrum $\sigma(P_mA|_{P_m\mathcal{H}})$ contains elements that have nothing to do with $\sigma(A)$. This is visualised in Fig. 1, an example with spectral pollution $z \notin \sigma_{1/10}(A)$, where the same phenomenon occurs for larger $m$. The spectral pollution phenomenon is well known. As the following theorem suggests, such pollution can be arbitrarily bad.

Fig. 1 Left: $\sigma_\epsilon(A)$ plotted as contours of the resolvent norm, as well as $\sigma(P_mA|_{P_m\mathcal{H}})$ for $m = 300$ with the false eigenvalue (recall that $\sigma(A) \subseteq \sigma_\epsilon(A)$). Right: $\sigma(T)$, $\sigma_\epsilon(T)$ and $\sigma(P_mT|_{P_m\mathcal{H}})$ for $m = 100$.

Theorem 6.1 (Pokrzywa [58]) Let $A \in \mathcal{B}(\mathcal{H})$ and let $\{P_m\}$ be a sequence of finite-dimensional projections converging strongly to the identity. Suppose that $S \subset W_e(A)$. Then there exists a sequence $\{\tilde{P}_m\}$ of finite-dimensional projections such that $P_m < \tilde{P}_m$ (so $\tilde{P}_m \to I$ strongly) and

$$d_H\big(\sigma(A_m) \cup S, \sigma(\tilde{A}_m)\big) \to 0, \quad \text{as } m \to \infty,$$

where

$$A_m = P_m A|_{P_m\mathcal{H}}, \qquad \tilde{A}_m = \tilde{P}_m A|_{\tilde{P}_m\mathcal{H}},$$

and $d_H$ denotes the Hausdorff metric.

Despite this result, the finite section method can perform quite well. This is the case for self-adjoint operators [6,21,36], and the method is also well suited to the computation of pseudospectra of Toeplitz operators [14,18]. Moreover, in general, we have the following (recall that $W_e(T)$ is the convex hull of the essential spectrum for normal $T$):

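Theorem 6.2 (Pokrzywa [58]) Let $T \in \mathcal{B}(\mathcal{H})$ and let $\{P_m\}$ be a sequence of finite-dimensional projections converging strongly to the identity. If $\lambda \notin W_e(T)$, then $\lambda \in \sigma(T)$ if and only if

$$\operatorname{dist}\big(\lambda, \sigma(P_m T|_{P_m\mathcal{H}})\big) \longrightarrow 0, \quad \text{as } m \to \infty.$$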

However, if we want to use the finite section method and rely on Theorem 6.2, we must know $W_e(T)$, which may be hard to compute. Alternatively, we could hope that $\sigma_{\mathrm{ess}}(T)$ is close to $W_e(T)$. For example, if $T$ is hyponormal ($T^*T - TT^* \geq 0$), then

$$\operatorname{conv}(\sigma_{\mathrm{ess}}(T)) = W_e(T),$$

where $\operatorname{conv}(\sigma_{\mathrm{ess}}(T))$ denotes the convex hull of $\sigma_{\mathrm{ess}}(T)$. But what if we have a “very non-normal” operator?
Another problem we may encounter when using the finite section method is that
even though σd (T ) may be recovered, one may get a very misleading picture of the
rest of the spectrum. Such problems are illustrated in the following simple example.
Let
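$$T = \begin{pmatrix}
2.5 + 0.5i & 0 & 0 & 0 & 0 & 0 & 0 & \cdots \\
1 & 3 - 0.5i & 0 & 0 & 0 & 0 & 0 & \cdots \\
0 & 1 & 1.7 & 0.05 & 0 & 0 & 0 & \cdots \\
0 & 0 & 0.05 & t_4 & 0 & 0 & 0 & \cdots \\
0 & 0 & 0 & 0 & t_5 & 0 & 0 & \cdots \\
0 & 0 & 0 & 0 & 1 & t_6 & 0 & \cdots \\
0 & 0 & 0 & 0 & 0 & 1 & t_7 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}, \tag{6.3}$$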

where $t_j = 1 + 0.5(\sin(j) + i\cos(j))$ for $j \geq 4$. This operator decomposes into an upper $4 \times 4$ block and an operator acting on the orthogonal complement. It is also possible to compute the spectrum analytically: it consists of a disc of radius 1 centred at 1 together with two isolated eigenvalues. Again, we can compute the pseudospectrum of $T$ (Fig. 1) to reveal that, whilst the eigenvalues produced by the finite section method are correct, they do not capture the entire spectrum. It is straightforward to adapt this example (e.g. by changing basis) so that the same phenomena occur without an obvious decomposition of the operator into a finite part and a triangular part. Without the support of the pseudospectrum picture, the finite section method provides no information about the boundary of the essential numerical range of $T$: there is a misleading circle of eigenvalues of $P_m T|_{P_m\mathcal{H}}$ which do not lie along the boundary of the essential spectrum but are simply the diagonal entries $\{t_5, t_6, \ldots, t_m\}$.
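The last observation can be checked directly on a finite section of (6.3). In the minimal NumPy sketch below (our own; the helper names are hypothetical), the computed eigenvalues of $P_m T|_{P_m\mathcal{H}}$ include every diagonal entry $t_5, \ldots, t_m$, and these all lie on the circle $|z - 1| = 1/2$, since $|t_j - 1| = \tfrac{1}{2}|\sin(j) + i\cos(j)| = \tfrac{1}{2}$, rather than on the circle $|z - 1| = 1$ bounding the disc in $\sigma(T)$.

```python
import numpy as np


def t_diag(j):
    """Diagonal entries t_j = 1 + 0.5 (sin j + i cos j) of (6.3), for j >= 4."""
    return 1 + 0.5 * (np.sin(j) + 1j * np.cos(j))


def section_T(m):
    """m x m finite section P_m T|_{P_m H} of the operator T in (6.3), m >= 5."""
    T = np.zeros((m, m), dtype=complex)
    T[:4, :4] = [[2.5 + 0.5j, 0, 0, 0],
                 [1, 3 - 0.5j, 0, 0],
                 [0, 1, 1.7, 0.05],
                 [0, 0, 0.05, t_diag(4)]]
    for j in range(5, m + 1):            # j is the 1-based row index
        T[j - 1, j - 1] = t_diag(j)
        if j >= 6:
            T[j - 1, j - 2] = 1.0        # subdiagonal ones below the 4 x 4 block
    return T


m = 30
eigs = np.linalg.eigvals(section_T(m))
tail = np.array([t_diag(j) for j in range(5, m + 1)])

# Each t_j (j >= 5) is matched by a computed eigenvalue ...
print(max(np.min(np.abs(eigs - tj)) for tj in tail))
# ... and every t_j lies on the circle |z - 1| = 1/2.
print(np.abs(tail - 1).min(), np.abs(tail - 1).max())
```

The two isolated eigenvalues $2.5 + 0.5i$ and $3 - 0.5i$ also appear, since the upper $4 \times 4$ block decouples from the rest of the matrix.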

Remark 6.3 The previous examples demonstrate that, in general, the finite section method is not always suitable for computing spectra. Rather than working with square sections of the infinite matrix $T$, one should work with uneven sections $P_n T P_m$, where the parameters $n$ and $m$ are allowed to vary independently. Indeed, the algorithms presented in [24,38] use this method. In effect, we need to know how large $n$ should be to retain enough information about the operator $TP_m$. This type of idea is also used implicitly in the IQR algorithm (see Sect. 4).
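The point of this remark can be seen with the unilateral shift $S$ (the Toeplitz operator with symbol $a(t) = t$ discussed below). The NumPy sketch that follows is our own illustration, with hypothetical names, and not the algorithms of [24,38]. For an operator with a single subdiagonal, $(T - z)P_m$ has range in $P_{m+1}\mathcal{H}$, so the uneven section with $n = m + 1$ already carries all the information in $(T - z)P_m$, and its smallest singular value is the quantity $\min\{\sqrt{\lambda}\}$ in the first set of (6.2), computed without squaring. Since $S$ is an isometry this value is 1, whereas the square section $P_m S|_{P_m\mathcal{H}}$ is nilpotent and reports 0.

```python
import numpy as np


def smin_uneven(T, z, n, m):
    """Smallest singular value of the n x m uneven section P_n (T - z) P_m.

    T is a finite truncation of the operator that is large enough (n <= N).
    """
    M = T[:n, :m] - z * np.eye(n, m)
    return np.linalg.svd(M, compute_uv=False).min()


# Unilateral shift S e_j = e_{j+1}: ones on the subdiagonal of an N x N truncation.
N, m = 60, 20
S = np.diag(np.ones(N - 1), k=-1)

for n in (m, m + 1, m + 5):
    print(n, smin_uneven(S, 0.0, n, m))   # square section vs uneven sections

# The (A - z)(A - z)^* part of (6.2) corresponds to sections of the adjoint S^*.
print(smin_uneven(S.T, 0.0, m + 1, m))
```

The last line checks the adjoint: $S^* e_1 = 0$, so the section of $S^*$ has a zero column, which is how the second set in (6.2) picks up $0 \in \sigma(S)$ even though $\|Sx\| = \|x\|$ for all $x$.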

6.2 Numerical examples I: normal operators

Example 6.4 (Convergence of the IQR algorithm) We begin with two simple examples
that demonstrate the linear (or exponential) convergence proven in Theorem 3.9 and
Corollary 3.12 (and its generalisations). Consider first the one-dimensional discrete
Schrödinger operator given by

$$T_1 = \begin{pmatrix}
v_1 & 1 & & & \\
1 & v_2 & 1 & & \\
 & 1 & v_3 & 1 & \\
 & & 1 & v_4 & \ddots \\
 & & & \ddots & \ddots
\end{pmatrix},$$

where $v_j = 5\sin(j)^2/j$ if $j \leq 10$ and $v_j = 0$ otherwise. As a compact (in fact finite-rank) perturbation of the free Laplacian, $\sigma(T_1)$ consists of the interval $[-2, 2]$ together with isolated eigenvalues of finite multiplicity, which can be computed [73]. The second operator, $T_2$, is obtained by taking the operator

$$T_0 = \operatorname{diag}\left(2, \tfrac{3i}{2}, -\tfrac{5}{4}, -\tfrac{9i}{8}\right) \oplus U_1,$$

where $U_k$ denotes the bilateral shift $e_j \mapsto e_{j+k}$, writing this as an operator on $l^2(\mathbb{N})$, and then mixing the spaces via a random unitary transformation on the span of the first 9 basis vectors. This ensures that $T_2$ is not written in block form but has known eigenvalues. In Fig. 2 we have plotted the difference in norm between the first $j \times j$ block of each $Q_n^* T_l Q_n$ and the diagonal operator formed from the largest $j$ eigenvalues, for $j = 1, 2, 3$ and $4$. The plot clearly shows the exponential convergence.

Fig. 2 Exponential convergence to the diagonal blocks for $T_1$ and $T_2$
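For a banded operator such as $T_1$, iterates like those in Fig. 2 can be reproduced from a finite section. The NumPy sketch below is our own simplification, not the implementation of Sect. 4, and its names are hypothetical. It assumes the operator is upper Hessenberg (true for the tridiagonal $T_1$): then each unshifted step $A = QR \mapsto RQ$ can corrupt at most one further trailing row and column of an $N \times N$ truncation, so for $N \geq m + n$ the leading $m \times m$ block after $n$ steps agrees with $P_m Q_n^* T_1 Q_n|_{P_m\mathcal{H}}$ up to rounding and the sign convention of the QR factorisation.

```python
import numpy as np


def section_T1(N):
    """N x N finite section of T1: v_j on the diagonal, ones off the diagonal."""
    j = np.arange(1, N + 1)
    v = np.where(j <= 10, 5 * np.sin(j) ** 2 / j, 0.0)
    off = np.ones(N - 1)
    return np.diag(v) + np.diag(off, 1) + np.diag(off, -1)


def iqr_block(section, n, m):
    """Leading m x m block after n unshifted QR steps on a finite section.

    Assumes (our simplification) an upper Hessenberg operator and a section of
    size N >= m + n: each step A = QR -> RQ can corrupt at most one more
    trailing row and column, so the returned block does not see the truncation.
    """
    if section.shape[0] < m + n:
        raise ValueError("finite section too small for n iterations")
    A = np.array(section)            # work on a copy
    for _ in range(n):
        Q, R = np.linalg.qr(A)
        A = R @ Q                    # equals Q^* A Q
    return A[:m, :m]


m, n = 4, 100
B_small = iqr_block(section_T1(m + n), n, m)
B_large = iqr_block(section_T1(m + n + 50), n, m)
lam_max = np.linalg.eigvalsh(section_T1(400)).max()

print(np.max(np.abs(np.diag(B_small) - np.diag(B_large))))  # truncation-independent
print(B_small[0, 0], lam_max)   # (1,1) entry vs dominant eigenvalue
```

Comparing two truncation sizes is a cheap sanity check of this assumption. The $(1,1)$ entry should approach the largest eigenvalue of $T_1$, which is at least $\langle T_1 e_1, e_1\rangle = v_1 = 5\sin(1)^2 \approx 3.54$ and hence lies outside $[-2, 2]$.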

Example 6.5 (Convergence to extremal parts of the spectrum) To see why we may
need some condition on σ (T ) for convergence of the IQR algorithm to the extreme
parts of the spectrum, we consider Laurent and Toeplitz operators with symbol given
by a trigonometric polynomial

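$$a(t) = \sum_{j=-k}^{k} a_j t^j.$$

Given such a symbol, we define the Laurent and Toeplitz operators

$$L(a) = \begin{pmatrix}
\ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \\
\cdots & a_0 & a_{-1} & a_{-2} & a_{-3} & a_{-4} & \cdots \\
\cdots & a_1 & a_0 & a_{-1} & a_{-2} & a_{-3} & \cdots \\
\cdots & a_2 & a_1 & a_0 & a_{-1} & a_{-2} & \cdots \\
\cdots & a_3 & a_2 & a_1 & a_0 & a_{-1} & \cdots \\
\cdots & a_4 & a_3 & a_2 & a_1 & a_0 & \cdots \\
 & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}, \qquad
T(a) = \begin{pmatrix}
a_0 & a_{-1} & a_{-2} & \cdots \\
a_1 & a_0 & a_{-1} & \cdots \\
a_2 & a_1 & a_0 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{pmatrix},$$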

acting on l 2 (Z) and l 2 (N) respectively. Note that L(a) is always normal whereas T (a)
need not be (see for example [18]). A simple example already mentioned is a(t) = t
which gives rise to the bilateral and unilateral shifts L(a) = U1 and T (a) = S. In this
case, both of these operators are invariant under iterations of the IQR algorithm and
hence their finite sections Pm Q ∗n T Q n | Pm H always have spectrum {0}. In the case of
L(a) this is an example of spectral pollution, whereas in the case of T (a) this does not
capture the extremal parts of the spectrum. Regarding pure finite section, the following
beautiful result is known:

Theorem 6.6 (Schmidt and Spitzer [62]) If $a$ is a trigonometric polynomial, then we have the following convergence in the Hausdorff metric:

$$\lim_{m\to\infty} \sigma(P_m L(a)|_{P_m\mathcal{H}}) = \lim_{m\to\infty} \sigma(P_m T(a)|_{P_m\mathcal{H}}) = \bigcap_{r \in (0,\infty)} \sigma(T(a_r)) =: \Upsilon(a),$$

where $a_r(t) = a(rt)$. Furthermore, this limit set is a connected finite union of analytic arcs, each pair of which has at most endpoints in common.
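One ingredient behind Theorem 6.6 is a diagonal similarity: with $D_r = \operatorname{diag}(1, r, r^2, \ldots)$, the $m \times m$ section of $T(a)$ is similar to the section of $T(a_r)$, so the two have the same eigenvalues for every $r > 0$. The NumPy sketch below (ours; the helper names are hypothetical) checks this for $a(t) = (t^3 + t^{-1})/2$ and also confirms that for $a(t) = t$ the section is nilpotent, so its spectrum is $\{0\}$.

```python
import numpy as np


def toeplitz_section(coeffs, m):
    """m x m finite section of T(a) for a(t) = sum_j a_j t^j.

    `coeffs` maps j to a_j; entry (p, q) is a_{p-q}, so a_0 sits on the diagonal
    and a_1 on the first subdiagonal, as in the matrices above.
    """
    M = np.zeros((m, m), dtype=complex)
    for j, aj in coeffs.items():
        M += aj * np.eye(m, k=-j)    # k = -j selects the j-th subdiagonal
    return M


def rescale(coeffs, r):
    """Coefficients of a_r(t) = a(r t): a_j -> a_j r^j."""
    return {j: aj * r ** j for j, aj in coeffs.items()}


m, r = 10, 1.7
a = {3: 0.5, -1: 0.5}                          # a(t) = (t^3 + t^{-1}) / 2
D = np.diag(r ** np.arange(m))
D_inv = np.diag(r ** -np.arange(m))
print(np.allclose(D @ toeplitz_section(a, m) @ D_inv,
                  toeplitz_section(rescale(a, r), m)))      # True

shift = toeplitz_section({1: 1.0}, m)          # a(t) = t gives the unilateral shift
print(np.count_nonzero(np.linalg.matrix_power(shift, m)))   # 0, so sigma = {0}
```

Entrywise, $(D_r T_m(a) D_r^{-1})_{pq} = r^{p-q} a_{p-q}$, which is exactly the $(p, q)$ entry of the section of $T(a_r)$.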

It is straightforward to construct examples where it appears that both $\lim_{n\to\infty} P_m Q_n^* T(a) Q_n|_{P_m\mathcal{H}}$ and $\lim_{n\to\infty} P_m Q_n^* L(a) Q_n|_{P_m\mathcal{H}}$ exist and have spectra given by the extreme parts of either $\sigma(L(a))$ or $\Upsilon(a)$. For example, consider the symbols

$$a(t) = \frac{t^3 + t^{-1}}{2}, \qquad \tilde{a}(t) = t + it^{-2}.$$

Figure 3 shows the outputs of the IQR algorithm and of plain finite section for the corresponding Laurent and Toeplitz operators for $m = 50$ and $n = 1$ and $n = 300$. For $a$, it appears that both limit sets are the extremal parts of $\sigma(L(a))$ (together with 0 if $m$ is not a multiple of 4). In contrast, for $\tilde{a}$ it appears that the spectrum of $\lim_{n\to\infty} P_m Q_n^* T(\tilde{a}) Q_n|_{P_m\mathcal{H}}$ is the extremal part of $\Upsilon(\tilde{a})$, and that of $\lim_{n\to\infty} P_m Q_n^* L(\tilde{a}) Q_n|_{P_m\mathcal{H}}$ is the extremal part of $\sigma(L(\tilde{a}))$ (again together with a finite collection of points depending on the value of $m$ modulo 3). Curiously, in both cases we observed convergence in the strong operator topology to block diagonal operators (up to unitary equivalence in each subblock), whose blocks have spectra corresponding to the limiting sets (hence the dependence on the remainder of $m$ modulo 2 or 3). However, in contrast to convergence to points in the discrete spectrum, convergence to these operators was only algebraic. This is shown in Fig. 4, where we have plotted the Hausdorff distance between the limiting set and the eigenvalues of the first diagonal block. We also shifted the operators (by $+1.1I$ for $a$ and $-1.5iI$ for $\tilde{a}$) so that the extremal part consists of exactly one point. In this scenario, and for all operators (Laurent or Toeplitz), the IQR algorithm converges strongly to a diagonal operator whose diagonal entries are the corresponding extremal point of $\sigma(L(a))$. This convergence is also shown in Fig. 4, and we observed a slower rate of convergence than before, possibly because points from the other tips of the petals of $\sigma(L(a))$ converge as we increase $n$. It would be interesting to see whether some form of Theorem 6.6 holds for the IQR algorithm (now taking $n \to \infty$). Given the examples presented here, such a statement would likely be quite complicated. However, we conjecture that if a normal operator has exactly one extreme point of its essential spectrum (and finitely many eigenvalues of magnitude greater than $r_{\mathrm{ess}}$), then this extreme point will be recovered in the limit $n \to \infty$ for large enough $m$.

Fig. 3 Top: output of IQR and finite section on $T(a)$ and $L(a)$ for $m = 50$ and $n = 1$ (left), $n = 300$ (right). Bottom: the same, but for the symbol $\tilde{a}$. In both cases, for a given symbol $b$, $\sigma(L(b))$ is given by $\{b(z) : z \in \mathbb{T}\}$ (shown) and $\sigma(T(b))$ is given by $\sigma(L(b)) \cup \{z \in \mathbb{C}\setminus b(\mathbb{T}) : \operatorname{wind}(b, z) \neq 0\}$

Fig. 4 Left: algebraic convergence to block diagonal operators. Right: algebraic convergence to diagonal operators. In both cases, we have plotted the difference in eigenvalues of the first block as we increase $n$

Example 6.7 (IQR and avoiding spectral pollution) In this example we consider whether the IQR algorithm may be used as a tool to avoid spectral pollution. Sometimes, when considering $\sigma(P_m T|_{P_m\mathcal{H}})$, spectral pollution can be detected by changing $m$ (edge states which correspond to spectral pollution are often unstable, but this is not always the case). In general, $\sigma(P_m Q_n^* T Q_n|_{P_m\mathcal{H}})$ can be considered as a generalised version of finite section, with a finite number ($n$) of IQR iterates being performed on the infinite-dimensional operator before truncation. If $Q_n$ is unitary, then this simply changes the basis before truncation, and such a change may reduce (or change) spectral pollution, allowing it to be detected. Here we consider
$$T_3 = \begin{pmatrix}
0 & 3 & & & \\
3 & 0 & 1 & & \\
 & 1 & 0 & 3 & \\
 & & 3 & 0 & \ddots \\
 & & & \ddots & \ddots
\end{pmatrix}.$$

The spectrum of $T_3$ is $[-4, -2] \cup [2, 4]$. However, if $m$ is odd, then $0 \in \sigma(P_m T_3|_{P_m\mathcal{H}})$. We shifted the operator by considering $T_3 + 0.2I$ (and then shifted back for the spectrum). Figure 5 shows the Hausdorff distance between $\sigma(P_m Q_n^*(T_3 + 0.2I)Q_n|_{P_m\mathcal{H}}) - 0.2$ and $\sigma(T_3)$ as $n$ varies, for different $m$. The spikes in the distance correspond to eigenvalues leaving the interval $[-4, -2]$ and crossing to $[2, 4]$ (also shown in Fig. 5). The increase in distance as $m$ decreases (for large $n$) is due to less of the interval $[-4, -2]$ being approximated. It appears that the IQR algorithm can be an effective tool for detecting spectral pollution; certainly a mixture of varying $m$ and $n$ will be more effective than varying $m$ alone.
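Both statements about $T_3$ are easy to check numerically. The NumPy sketch below (ours; the helper names are hypothetical) computes, for an even and an odd $m$, the largest distance from an eigenvalue of $P_m T_3|_{P_m\mathcal{H}}$ to $\sigma(T_3)$, which is one of the two one-sided terms in the Hausdorff distance plotted in Fig. 5. For even $m$ the section is a direct sum of $2 \times 2$ blocks with off-diagonal entry 3 plus a perturbation of norm 1, so by Weyl's inequality no eigenvalue enters the gap; for odd $m$ the zero diagonal makes the eigenvalues symmetric about 0, and an odd count forces 0 to be one of them.

```python
import numpy as np


def section_T3(m):
    """m x m finite section of T3: zero diagonal, off-diagonals 3, 1, 3, 1, ..."""
    off = np.where(np.arange(m - 1) % 2 == 0, 3.0, 1.0)
    return np.diag(off, 1) + np.diag(off, -1)


def dist_to_sigma_T3(x):
    """Distance from a real number x to sigma(T3) = [-4, -2] U [2, 4]."""
    y = abs(x)
    return max(2.0 - y, y - 4.0, 0.0)


for m in (20, 21):
    eigs = np.linalg.eigvalsh(section_T3(m))
    worst = max(dist_to_sigma_T3(x) for x in eigs)
    print(m, worst)   # sup over eigenvalues of dist(., sigma(T3))
```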
Another example of this is given by the operator $L(a)$ considered previously. For fixed $m$ we found that

$$\lim_{n\to\infty} \sup_{z \in \sigma(P_m Q_n^* L(a) Q_n|_{P_m\mathcal{H}})} \operatorname{dist}(z, \sigma(L(a))) = 0.$$

However, for finite section, spectral pollution occurs for all large $m$,

$$\lim_{m\to\infty} \sup_{z \in \sigma(P_m L(a)|_{P_m\mathcal{H}})} \operatorname{dist}(z, \sigma(L(a))) > 0,$$

and the IQR algorithm can only recover the extreme parts of the spectrum:

$$\lim_{n\to\infty} d_H\big(\sigma(P_m Q_n^* L(a) Q_n|_{P_m\mathcal{H}}), \sigma(L(a))\big) > 0.$$

Despite this, we found that for small fixed $n > 0$ it appears that

$$\lim_{m\to\infty} d_H\big(\sigma(P_m Q_n^* L(a) Q_n|_{P_m\mathcal{H}}), \sigma(L(a))\big) = 0.$$

This is shown in Fig. 6, with similar results for $L(\tilde{a})$.

Fig. 5 Left: $d_H\big(\sigma(P_m Q_n^*(T_3 + 0.2I)Q_n|_{P_m\mathcal{H}}) - 0.2, \sigma(T_3)\big)$ as a function of $n$ for different $m$. Right: $\sigma(P_m Q_n^*(T_3 + 0.2I)Q_n|_{P_m\mathcal{H}}) - 0.2$ as a function of $n$ for $m = 201$. Note the crossing of eigenvalues across the spectral gap

Fig. 6 Left: $d_H\big(\sigma(P_m Q_n^* L(a) Q_n|_{P_m\mathcal{H}}), \sigma(L(a))\big)$ as a function of $n$ for different $m$. Right: $d_H\big(\sigma(P_m Q_n^* L(\tilde{a}) Q_n|_{P_m\mathcal{H}}), \sigma(L(\tilde{a}))\big)$ as a function of $n$ for different $m$

6.3 Numerical examples II: non-normal operators

Although Theorem 3.9 considers normal operators, Theorems 3.13 and 3.15 suggest that the IQR algorithm may also be useful for non-normal operators. Indeed, the results presented here demonstrate that in practice the IQR algorithm can work very well for non-normal problems. If an infinite matrix $T$ has $m$ isolated eigenvalues $\{\lambda_1, \ldots, \lambda_m\}$ (repeated according to multiplicity) of modulus greater than $r_{\mathrm{ess}}(T)$ (the essential spectral radius), then Theorems 3.13 and 3.15 suggest that these eigenvalues will appear on the diagonal of $P_m Q_n^* T Q_n|_{P_m\mathcal{H}}$ as $n \to \infty$, i.e.

$$\sigma(P_m Q_n^* T Q_n|_{P_m\mathcal{H}}) \longrightarrow \{\lambda_1, \ldots, \lambda_m\}, \quad \text{as } n \to \infty.$$

We will verify this numerically in the next examples. However, we will see that not only do we get convergence to the eigenvalues, but often we also pick up parts of the boundary of the essential spectrum (this was the case for $T(a)$ but appeared not to be the case for $T(\tilde{a})$). This phenomenon is not accounted for by the previous exposition, where normality was crucial for proving Theorem 3.9.

Fig. 7 Left: output of the IQR algorithm, $\sigma(P_m Q_n^* A Q_n|_{P_m\mathcal{H}})$, for $m = 300$ and $n = 1000$. Right: output of the IQR algorithm, $\sigma(P_m Q_n^* T Q_n|_{P_m\mathcal{H}})$, for $m = 100$ and $n = 300$

Example 6.8 (Recovering the extremal part of the spectrum) Let us return to the infinite matrices $A$ in (6.1) and $T$ in (6.3) from Sect. 6.1. We ran the IQR algorithm with $n = 1000$ for $A$ and $n = 300$ for $T$; the output is shown in Fig. 7. We see that if one takes a finite section after running the IQR algorithm, then part of the boundary of the essential spectrum also appears, along with the discrete spectrum $\sigma_d(A)$. Note that the part of the boundary that is captured is the extreme part (points with largest modulus). It seems that after running the IQR algorithm, the spectral information from the largest isolated eigenvalues and the largest approximate point spectrum is “squeezed up” to the upper and leftmost portions of the matrix. This is not completely counter-intuitive given (2.5), and it is what normally happens in finite dimensions. For both examples, we found that the IQR iterates converge to an upper triangular matrix (analogous to the finite-dimensional case), in agreement with Theorems 3.13 and 3.15. The convergence of the upper $1 \times 1$ block for $A$ (corresponding to the dominant eigenvalue) and of the $4 \times 4$ non-diagonal block for $T$ is shown in Fig. 8, where we have plotted the difference in norm.

Fig. 8 Left: output of the IQR algorithm, $\sigma(P_m Q_n^* T1 Q_n|_{P_m\mathcal{H}})$, for $m = 100$ and $n = 300$. The reference circle is the boundary of the essential spectrum. Right: convergence of upper diagonal blocks for the operators $A$, $T$ and $T1$

We also illustrate another drawback of the algorithm $\Gamma_m^{\epsilon}$ for computing the pseudospectrum, by perturbing the operator $T$. Let $T1$ be the operator obtained from $T$ by setting $(T1)_{5,4} = 5 \times 10^7$ (note that this removes the block form). The computation of $\Gamma_m^{\epsilon}$ involves squaring the operator and hence leads to matrices of norm of order $10^{15}$, making it impossible to compute $\sigma_\epsilon(T1)$ in double precision for $\epsilon \lesssim 10^{-1}$. However, the IQR algorithm shares the pleasant feature of finite section of allowing a wider range of magnitudes of the matrix entries of the operator. The output for $n = 300$, $m = 100$ is shown in Fig. 8, together with the convergence of the upper $2 \times 2$ block (corresponding to the dominant eigenvalues). Note that in this case we can only compute this upper block to an accuracy of about $10^{-8}$ in double precision, due to the large perturbed entry. However, this is still much better than pseudospectral techniques. All the errors in Fig. 8 were obtained via comparison with converged matrices computed using quadruple precision.

Example 6.9 ($\mathcal{PT}$-symmetry in quantum mechanics) Finally, we consider a so-called $\mathcal{PT}$-symmetric (non-normal) operator, demonstrating the same phenomena. A Hamiltonian $H = p^2/2 + V(x)$ is said to be $\mathcal{PT}$-symmetric if it commutes with the action of the operator $\mathcal{PT}$, where $\mathcal{P}$ is the parity operator $\hat{x} \to -\hat{x}$, $\hat{p} \to -\hat{p}$ and $\mathcal{T}$ is the time-reversal operator $\hat{p} \to -\hat{p}$, $i \to -i$. A further distinction can be made between exact (unbroken) $\mathcal{PT}$ symmetry, when $H$ shares common “eigenfunctions” with $\mathcal{PT}$, and broken $\mathcal{PT}$ symmetry, when they possess different eigenfunctions. Many $\mathcal{PT}$-symmetric Hamiltonians possess the remarkable property that their spectra are real for small enough $\operatorname{Im}(V)$, but that the spectrum becomes complex above a certain threshold [11]. This phase transition from the exact to the broken $\mathcal{PT}$ phase is known as symmetry breaking. There has been a lot of interest in recent years, both theoretical and experimental, in non-Hermitian $\mathcal{PT}$-symmetric Hamiltonians [45,60].
We consider an operator on $l^2(\mathbb{Z})$ of the form

$$(H_1 x)_n = x_{n-1} + x_{n+1} + V_n x_n. \tag{6.4}$$

Fig. 9 Finite sections $\sigma(P_m H_1|_{P_m\mathcal{H}})$ (magenta) and (shifted) IQR iterates $\sigma(P_m Q_n^* H_1 Q_n|_{P_m\mathcal{H}})$ (blue), along with converged resolvent norm contours, for $\gamma = 1$ (left) and $\gamma = 2$ (right), plotted against $\operatorname{Re}(z)$ and $\operatorname{Im}(z)$. Both panels are for $m = 500$, $n = 3000$ and show the convergence to the extremal parts of the spectrum

This commutes with (the discrete version of) $\mathcal{PT}$ precisely when the potential has even real part and odd imaginary part. We tested the IQR algorithm on the potential

$$V_n = \begin{cases} \cos(n) + i\gamma\sin(n), & n \equiv 0 \pmod 2, \\ 0, & n \equiv 1 \pmod 2, \end{cases} \tag{6.5}$$

and found similar results for other potentials. Figure 9 shows the same qualitative behaviour as the last example for $\gamma = 1, 2$ at $m = 500$, $n = 3000$. We shifted by $2.2$ and $2.15$ for $\gamma = 1$ and $2$, respectively. For comparison, we have shown converged resolvent norms. We found that, with no IQR iterates, the spectral pollution persisted as we varied $m$. However, for fixed $m$, increasing the number of iterates ($n \to \infty$) caused $\sigma(P_m Q_n^* H_1 Q_n|_{P_m\mathcal{H}})$ to approach the extremal part of the spectrum.
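The commutation condition can be verified on a finite section. In the NumPy sketch below (ours; the helper names are hypothetical), we assume the discrete $\mathcal{PT}$ acts by $(\mathcal{PT}x)_n = \overline{x_{-n}}$; then $\mathcal{PT} H_1 \mathcal{PT} = H_1$ exactly when $V_n = \overline{V_{-n}}$, i.e. when $\operatorname{Re} V$ is even and $\operatorname{Im} V$ is odd. Truncating to the symmetric window of sites $-N, \ldots, N$ preserves this structure, so the check applies to the matrix directly.

```python
import numpy as np


def potential(n, gamma):
    """V_n from (6.5): cos(n) + i*gamma*sin(n) for even n and 0 for odd n."""
    return np.where(n % 2 == 0, np.cos(n) + 1j * gamma * np.sin(n), 0.0)


def section_H1(N, gamma):
    """Finite section of H1 from (6.4) on the sites -N, ..., N."""
    n = np.arange(-N, N + 1)
    ones = np.ones(2 * N)
    return np.diag(potential(n, gamma)) + np.diag(ones, 1) + np.diag(ones, -1)


N, gamma = 50, 2.0
H = section_H1(N, gamma)
J = np.fliplr(np.eye(2 * N + 1))       # parity: x_n -> x_{-n}

# With (PT x)_n = conj(x_{-n}), PT H PT corresponds to the matrix J conj(H) J.
print(np.allclose(J @ H.conj() @ J, H))    # True: H1 commutes with PT
print(np.allclose(H, H.conj().T))          # False: H1 is not self-adjoint
```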

6.4 Numerical examples III: random non-Hermitian operators and boundary conditions

In this final section, we explore examples where the matrices $P_m Q_n^* T Q_n P_m$ naturally give rise to periodic boundary conditions (this was already seen for some examples of Laurent operators in Sect. 6.2). Both examples discussed here are physically motivated random tridiagonal operators on the lattice $\mathbb{Z}$. One of the key applications of such random operators is in condensed matter physics: the discrete models below have been used to study the conductivity of disordered media, flux lines in superconductors and asymmetric hopping particles. Many such operators also arise as discretisations of certain stochastic differential equations. As we will demonstrate, the IQR method can be a powerful way of avoiding spectral pollution caused by unnatural “open” boundary conditions in forming the finite section $P_m T P_m$. In both examples, periodic boundary conditions are natural, and we find that taking finite sections after iterating the IQR algorithm captures periodic boundary conditions.

Example 6.10 (Hopping sign model in sparse neural networks) The first example is a non-normal operator with random sub- and superdiagonals, first studied by Feinberg and Zee [23,31,40]. The usual “Hopping Sign Model” is defined via

$$(H_2 x)_n = x_{n-1} + b_n x_{n+1},$$

with $b_n \in \{\pm 1\}$ (say, independent Bernoulli with parameter $p = 1/2$). This describes a particle “hopping” on $\mathbb{Z}$ and can be mapped into a (complex-valued) random walk. We will consider a slightly different operator, described by

$$(H_3 x)_n = s_{n-1}^- \exp(-g) x_{n-1} + s_n^+ \exp(g) x_{n+1}, \tag{6.6}$$

which appears in [1] in the context of sparse neural networks. We shall assume that $g$ is real and non-negative and that the $s_j^\pm$ are i.i.d. random variables with Bernoulli distribution $p$. In other words,

$$\mathbb{P}(s_j^\pm = 1) = 1 - \mathbb{P}(s_j^\pm = -1) = p.$$

We will only consider $g = 1/10$ and $p = 1/2$, but will vary $p$ in an effort to compute the spectrum of $H_3$, which depends only on the support of the distribution of the $s_j^\pm$. It is easy to prove that the spectrum (and pseudospectrum) of $H_3$ is almost surely constant and that there is no inessential spectrum. Furthermore, one can show that $\sigma(H_3)$ is contained in the annulus $\{z \in \mathbb{C} : 2\sinh(g) \leq |z| \leq 2\cosh(g)\}$.
Finite section calculations associated with this operator have some interesting properties and are studied extensively in [1]. If one projects using the standard basis of $l^2(\mathbb{Z})$, then one obtains matrices of the form

$$M_n^1 = \begin{pmatrix}
0 & s_{-n+1}^- \exp(-g) & & \\
s_{-n+1}^+ \exp(g) & 0 & \ddots & \\
 & \ddots & \ddots & s_{n-1}^- \exp(-g) \\
 & & s_{n-1}^+ \exp(g) & 0
\end{pmatrix}.$$

If we use open boundary conditions (i.e. we simply project onto the space spanned by $\{e_{-n}, \ldots, e_n\}$), then one can “gauge” away $g$ by a similarity transformation, leading to

$$M_n = \begin{pmatrix}
0 & s_{-n+1}^- & & \\
s_{-n+1}^+ & 0 & \ddots & \\
 & \ddots & \ddots & s_{n-1}^- \\
 & & s_{n-1}^+ & 0
\end{pmatrix}.$$

Fig. 10 Top: output of finite section over a random sample of 200 matrices of size 200 (left) and the estimates using pseudospectral techniques (right). Bottom: the output of IQR over 200 samples, computing $\sigma(P_m Q_n^* H_3 Q_n|_{P_m\mathcal{H}})$ for $m = 200$ and $n = 50$ (left), $n = 2000$ (right). Note that after a few iterates, the output seems to agree with periodic boundary conditions, and then increasing the number of iterates leads to convergence to the extremal parts of the essential spectrum

On the other hand, the use of periodic boundary conditions leads to the matrix

$$M_n^2 = \begin{pmatrix}
0 & s_{-n+1}^- \exp(-g) & & s_n^+ \exp(g) \\
s_{-n+1}^+ \exp(g) & 0 & \ddots & \\
 & \ddots & \ddots & s_{n-1}^- \exp(-g) \\
s_n^- \exp(-g) & & s_{n-1}^+ \exp(g) & 0
\end{pmatrix},$$

which does not suffer from this setback.
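The gauge argument can be made explicit. The NumPy sketch below is ours: the helper names are hypothetical, the sites are relabelled $0, \ldots, L-1$, and the section is built directly from (6.6) rather than from the displayed matrices. It applies the similarity $D = \operatorname{diag}(e^{gk})$. With open boundary conditions this removes $g$ entirely; with periodic boundary conditions it leaves the factors $e^{-gL}$ and $e^{gL}$ on the two corner entries, which is why $g$ cannot be gauged away from $M_n^2$.

```python
import numpy as np


def section_H3(s_minus, s_plus, g, periodic=False):
    """Finite section of H3 following (6.6), on sites relabelled 0, ..., L-1.

    Row k holds s^-_{k-1} e^{-g} in column k-1 and s^+_k e^{g} in column k+1;
    with periodic=True the two bonds leaving the ends wrap around.
    """
    L = len(s_plus)
    H = np.zeros((L, L))
    for k in range(L):
        if k > 0 or periodic:
            H[k, (k - 1) % L] = s_minus[(k - 1) % L] * np.exp(-g)
        if k < L - 1 or periodic:
            H[k, (k + 1) % L] = s_plus[k] * np.exp(g)
    return H


rng = np.random.default_rng(0)
L, g = 40, 0.1
s_minus = rng.choice([-1.0, 1.0], size=L)      # p = 1/2
s_plus = rng.choice([-1.0, 1.0], size=L)

D = np.diag(np.exp(g * np.arange(L)))          # gauge transformation
D_inv = np.diag(np.exp(-g * np.arange(L)))

gauged_open = D @ section_H3(s_minus, s_plus, g) @ D_inv
print(np.allclose(gauged_open, section_H3(s_minus, s_plus, 0.0)))  # True: g is gone

gauged_per = D @ section_H3(s_minus, s_plus, g, periodic=True) @ D_inv
print(gauged_per[0, L - 1], gauged_per[L - 1, 0])   # s^-_{L-1} e^{-gL}, s^+_{L-1} e^{gL}
```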


In [1] this phenomenon was studied via localisation of the eigenvalues of $M_n^2$, in particular using the Lyapunov exponent $\kappa(z)$, which is equal to the inverse of the localisation length. An eigenfunction $\psi$ with eigenvalue $z$ localised around $x_0$ behaves approximately as

$$|\psi(x)| \sim \exp(-\kappa(z)|x - x_0|).$$

If one defines recursively

$$y_{n+1}(z) = \exp(g)\frac{\psi_{n+2}}{\psi_{n+1}} = -\frac{s_{n-1}^-/s_n^+}{y_n(z)} + \frac{z}{s_n^+},$$

then (in the limit of large system sizes)

$$\kappa(z; g) = \lim_{N\to\infty} \frac{1}{2N+1} \sum_{j=-N}^{N} \log|y_j(z)| - g.$$

This is known as the transfer matrix approach. For fixed $z$, as we increase $g$, $\kappa(z; g)$ becomes negative. The heuristic is that a hole opens up in the spectrum corresponding to a negative Lyapunov exponent. Eigenvalues of $M_n^2$ inside the hole are swept up and become delocalised, moving to the rim of the hole, whereas those outside remain largely undisturbed. Eigenvalues of $M_n^1$ inside the region where $\kappa < 0$ correspond to edge states due to the finite system size approximation.
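The transfer matrix approach amounts to iterating the recursion for $y_n$ along a long random chain. The NumPy sketch below (ours) does this; the function name, chain length $N$, random seed and starting value $y = 1$ are arbitrary choices, and the average is a finite-$N$ estimate rather than the limit.

```python
import numpy as np


def lyapunov_kappa(z, g, N=20000, p=0.5, seed=0):
    """Finite-N estimate of kappa(z; g) via the recursion for y_n above.

    The signs s^+_n, s^-_n are drawn i.i.d. with P(s = 1) = p, and the
    recursion is started from y = 1 (an arbitrary choice).
    """
    rng = np.random.default_rng(seed)
    count = 2 * N + 1
    s_plus = np.where(rng.random(count) < p, 1.0, -1.0)
    s_minus = np.where(rng.random(count) < p, 1.0, -1.0)
    y, total = 1.0 + 0.0j, 0.0
    for n in range(1, count):
        y = -(s_minus[n - 1] / s_plus[n]) / y + z / s_plus[n]
        total += np.log(abs(y))
    return total / count - g


print(lyapunov_kappa(0.0, 0.1))    # |y_n| = 1 for all n, so this is -g
print(lyapunov_kappa(10.0, 0.1))   # roughly log(10) - g > 0
```

Two sanity checks follow from the recursion itself: at $z = 0$ it reduces to $y_{n+1} = -(s_{n-1}^-/s_n^+)/y_n$, so $|y_n| = 1$ for all $n$ and $\kappa(0; g) = -g < 0$, placing 0 inside the hole; for large $|z|$, $|y_n| \approx |z|$ and $\kappa \approx \log|z| - g > 0$.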
Figure 10 shows the output of a sample of 200 finite sections with open boundary conditions and matrix size 200. We have also shown the annular region that bounds the spectrum, as well as the contour $\kappa = 0$. To calculate $\kappa$, we evaluated the above sum on a grid with large $N$ to ensure convergence. The colour bar corresponds to the inverse participation ratio (log scale) of normalised eigenfunctions, defined by

$$1/P \equiv \frac{\sum_j |\psi_j|^4}{\left(\sum_j |\psi_j|^2\right)^2}.$$

Note that this has a maximum value of 1 (localised) and a minimum value of $1/N$ (delocalised), $N$ being the size of the matrix. Open boundary conditions produce spectral pollution in the hole with localised eigenfunctions, and the contour $\kappa = 0$ corresponds to the delocalised region. To compare with the spectrum of the infinite operator on $l^2(\mathbb{Z})$, we have plotted $\sigma_\epsilon(H_3)$ for $\epsilon = 10^{-2}$, calculated using matrix sizes of order $10^5$. Since the spectrum is independent of $p \in (0, 1)$, we have also shown the union of these estimates over $p \in \{k/100\}_{k=1}^{99}$. Although the algorithm used to compute the pseudospectrum is guaranteed to converge to $\sigma_\epsilon(H_3)$, there are regions in the complex plane where this convergence is very slow; taking unions over $p$ is simply a way to speed it up. Upon taking $\epsilon$ smaller, we found that the spectrum appeared to have a fractal-like nature. It also appears that the hole in the spectrum corresponds to the boundary of two ellipses. It is easy to prove that the ellipse

$$E_1 = \{\exp(g + i\theta) + \exp(-g - i\theta) : \theta \in [0, 2\pi)\}$$

is contained in $\sigma(H_3)$ and that the spectrum (and pseudospectrum) of $H_3$ has fourfold rotational symmetry. Denoting the rotation of $E_1$ by $\pi/4$ as $E_2$, we have shown $E_1 \cup E_2$ in the figure.
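For reference, here is the inverse participation ratio as a small NumPy function (ours; the names are hypothetical), together with the two extreme cases quoted above.

```python
import numpy as np


def inverse_participation_ratio(psi):
    """1/P = sum_j |psi_j|^4 / (sum_j |psi_j|^2)^2 for a vector psi."""
    w = np.abs(psi) ** 2
    return np.sum(w ** 2) / np.sum(w) ** 2


N = 200
localised = np.zeros(N)
localised[17] = 1.0                       # supported on a single site
delocalised = np.ones(N) / np.sqrt(N)     # spread evenly over all sites

print(inverse_participation_ratio(localised))    # 1.0
print(inverse_participation_ratio(delocalised))  # approximately 1/N = 0.005
```

Applied column by column to the eigenvector matrix returned by `np.linalg.eig`, this gives quantities of the kind shown by the colour bar in Fig. 10.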

Figure 10 also shows the effect of IQR iterations over random samples of size 200 for $m = 200$ and $n = 50$ and $2000$. Remarkably, a few iterations are enough to capture periodic boundary conditions and sweep away the localised edge states. We have also shown the inverse participation ratio, which, although it is now defined with respect to a new basis, still gives an indication of how “diagonal” the matrix $P_m Q_n^* H_3 Q_n|_{P_m\mathcal{H}}$ is. If we increase $n$ further, the output approaches the edge of the spectrum, with eigenvectors becoming more localised (in the new basis). We found exactly the same phenomena when we shifted the operator $H_3$, with convergence to the corresponding extremal part of the essential spectrum.

Example 6.11 (NSA Anderson model in superconductors) Finally, we consider a non-


normal operator with no inessential spectrum where the IQR algorithm does not seem
to converge to the boundary of the essential spectrum, but rather to a curve associated
with periodic boundary conditions in the large system size limit.
Over the past twenty years there has been considerable interest in non-self-adjoint random operators, sparked by Hatano and Nelson's study of a non-self-adjoint Anderson model in the context of vortex pinning in type-II superconductors [39]. Their model showed that an imaginary gauge field in a disordered one-dimensional lattice can induce a delocalisation transition. The operator in $\mathcal{B}(l^2(\mathbb{Z}))$ can be written as

$$(H_4 x)_n = \exp(-g) x_{n-1} + \exp(g) x_{n+1} + V_n x_n, \tag{6.7}$$

where $g > 0$ and $V$ is a random potential. This operator also has applications in population biology [52], and the self-adjoint version of this model is widely studied for the phenomenon of Anderson localisation (absence of diffusion of waves) [2,12]. In the non-self-adjoint case, complex values of the spectrum indicate delocalisation. Note that we now have randomness on the diagonal with fixed coupling coefficients $\exp(\pm g)$.
Standard finite section produces real eigenvalues since the matrix Pm H4 | Pm H is
similar to a real symmetric matrix. However, truncating the operator and adopting
periodic boundary conditions gives rise to the famous “bubble and wings”. If V = 0
then the spectrum is an ellipse E = {exp(g + iθ ) + exp(−g − iθ ) : θ ∈ [0, 2π )},
but as we increase the randomness wings appear on the real axis. For a study of this
phenomenon and the described phase transition we refer the reader to [31]. Goldsheid
and Khoruzhenko have studied the convergence of the spectral measure in the periodic
case as N → ∞ in [33], N being the number of sites. In general, the support of these
measures as N → ∞ can be very different from the spectrum of the operator on l 2 (Z)
given by (6.7), highlighting the difficulty in computing the spectrum.
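Both claims in the preceding paragraph can be checked on small sections. In the NumPy sketch below (ours; the helper names, $L$ and the random seed are arbitrary), the similarity $\operatorname{diag}(e^{gn})$ turns the open-boundary section of $H_4$ into a real symmetric matrix with off-diagonal entries 1, which is why standard finite section produces real eigenvalues. With $V = 0$ and periodic boundary conditions the section is circulant, with eigenvalues $e^{g + i\theta_k} + e^{-g - i\theta_k}$, $\theta_k = 2\pi k/L$, i.e. points on the ellipse $E$. The parameters $g = 1/2$ and $V_n = \pm 1$ anticipate the case studied next.

```python
import numpy as np


def section_H4(V, g, periodic=False):
    """Finite section of H4 from (6.7): e^{-g} below the diagonal, e^{g} above it."""
    L = len(V)
    H = np.diag(np.asarray(V, dtype=float))
    H = H + np.exp(-g) * np.eye(L, k=-1) + np.exp(g) * np.eye(L, k=1)
    if periodic:
        H[0, L - 1] = np.exp(-g)     # wrap-around bond
        H[L - 1, 0] = np.exp(g)
    return H


rng = np.random.default_rng(1)
L, g = 32, 0.5
V = rng.choice([-1.0, 1.0], size=L)           # i.i.d. +-1 potential, p = 1/2

# Open boundary conditions: diag(e^{gn}) turns the section into a real symmetric matrix.
D = np.diag(np.exp(g * np.arange(L)))
D_inv = np.diag(np.exp(-g * np.arange(L)))
S = D @ section_H4(V, g) @ D_inv
print(np.allclose(S, S.T))                    # True, hence real eigenvalues

# Periodic boundary conditions with V = 0: eigenvalues e^{g+i theta} + e^{-g-i theta}.
lam = np.linalg.eigvals(section_H4(np.zeros(L), g, periodic=True))
ellipse = (lam.real / (2 * np.cosh(g))) ** 2 + (lam.imag / (2 * np.sinh(g))) ** 2
print(np.allclose(ellipse, 1.0))              # True: all eigenvalues lie on E
```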
We consider the case g = 1/2 with Vn i.i.d. Bernoulli random variables taking
values in {±1} with equal probability p = 1/2. Again, there is no inessential spectrum
and the spectrum/pseudospectrum is constant almost surely, depending only on the
support of the distribution of the Vn . The following inclusion is also known, which
bounds the spectrum:

$$\sigma(H_4) \subset \big(\operatorname{conv}(E) + [-1, 1]\big) \cap \big(E + B_1\big),$$

where $\operatorname{conv}(E)$ denotes the closed convex hull of $E$ and $B_1$ denotes the closed unit disc. The choice of $g$ ensures that the spectrum has a hole in it. One may calculate the Lyapunov exponent either by the transfer matrix approach or by calculating a potential related to the density of states. The limiting distribution of the eigenvalues of finite section with periodic boundary conditions is given by the complex curve

$$\{z \in \mathbb{C}\setminus\mathbb{R} : \kappa(z) = 0\} \cup \{x \in \operatorname{supp}(dN) : \kappa(x + i0) > 0\}.$$

Fig. 11 The output of IQR over 200 samples computing $\sigma(P_m Q_n^* H_4 Q_n|_{P_m\mathcal{H}})$ for $m = 30$ and $n = 15$ (left), $n = 300$ (right). Note that we appear to recover the periodic limit curve and that increasing the number of iterates converges to the extremal parts. Applying shifts allowed us to recover the extremal parts of the limit curves

The output of the IQR algorithm for $m = 30$ and $n = 15$ and $n = 300$ over 200 random samples is shown in Fig. 11. Note that if we took $n = 0$, the computed eigenvalues would be real, in stark contrast to Fig. 11. Taking a small number of IQR iterates approximates the bubble and wings, with a few remaining real eigenvalues. However, upon increasing $n$, the spectrum of $P_m Q_n^* H_4 Q_n|_{P_m\mathcal{H}}$ does not seem to converge to the extremal parts of the spectrum, but seems to remain stuck on the limit curve. Shifting by $+4iI$ caused the output to recover the top part of the limit curve.

Remark 6.12 For any operator $T$ for which $Q_n$ is unitary, the spectrum and essential spectrum of $Q_n^* T Q_n$ are equal to those of $T$. As the above two examples suggest, taking a small value of $n$ could be used as a way of testing which eigenvalues produced by finite section methods correspond to finite system size effects, such as open boundary conditions. This could be useful for quasi-periodic systems or systems with very few symmetries, where there is no obvious choice of appropriate boundary conditions. However, detecting isolated eigenvalues of finite multiplicity within the convex hull of the essential spectrum remains a challenge.

7 Concluding remarks and open problems

This paper discussed the generalisation of the famous QR algorithm to infinite dimensions. It was shown that for a large class of operators, encompassing many in scientific applications, the iterates of the IQR algorithm can be computed efficiently on a computer. For matrices with finitely many entries in each column, the computation collapses to a finite one. In general, for an invertible operator, we can compute the iterates to any given accuracy in finite time. Furthermore, it was proven that for normal operators the algorithm converges to the discrete spectrum outside the convex hull of the essential spectrum, with a rate of convergence that generalises the well-known result in finite dimensions. These results were extended to more general invariant subspaces and to non-normal operators in Theorems 3.13 and 3.15. Unfortunately, the IQR algorithm cannot, in general, be sped up by shift strategies, which considerably speed up the finite-dimensional algorithm [57]. There are two reasons for this. The first is simply that an infinite matrix has no final column, so the usual link with inverse iteration cannot be made. In finite dimensions, the last column of the accumulated unitary factor undergoes inverse iteration with the adjoint, and it is this that shifts accelerate. Second, part of the spectrum may be lost in the limit (see Example 3.11), and below the essential spectral radius there is no guarantee of convergence.
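The finite-dimensional convergence result referred to above can be illustrated on a toy example. The following minimal Python sketch is an illustration, not the paper's code, and its function names are hypothetical. It runs the unshifted QR algorithm $A_{k+1} = R_k Q_k = Q_k^* A_k Q_k$ on a small real matrix, using a modified Gram–Schmidt factorisation whose R has a positive diagonal. Take the assumed example $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, with eigenvalues 3 and 1. One can check by hand that the off-diagonal entry of the n-th iterate equals $2\cdot 3^n/(9^n+1)$, so it decays like $r^n$ with $r = |\lambda_2/\lambda_1| = 1/3$. Every step is a unitary similarity, so the trace and the spectrum are unchanged (cf. Remark 6.12).

```python
import math


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def qr_decompose(A):
    """Modified Gram-Schmidt QR of a real nonsingular square matrix.

    R has a positive diagonal, which makes the factorisation unique.
    """
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]
    q_cols, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for i, q in enumerate(q_cols):
            R[i][j] = sum(q[t] * v[t] for t in range(n))
            v = [v[t] - R[i][j] * q[t] for t in range(n)]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        q_cols.append([x / R[j][j] for x in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R


def qr_algorithm(A, steps):
    """Unshifted QR algorithm: A_{k+1} = R_k Q_k (= Q_k^T A_k Q_k)."""
    for _ in range(steps):
        Q, R = qr_decompose(A)
        A = matmul(R, Q)
    return A


if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 2.0]]   # eigenvalues 3 and 1, ratio r = 1/3
    for n in (1, 5, 10, 20):
        An = qr_algorithm(A, n)
        print(n, An[0][0], An[1][1], abs(An[1][0]), 2 * 3**n / (9**n + 1))
```

Printing the off-diagonal entry next to the closed form shows that roughly a factor of 3 is gained per iteration. This is the linear rate $O(r^n)$ that the IQR results generalise.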
Despite these inherent drawbacks of the infinite-dimensional setting, we showed how the IQR algorithm can be used to obtain new classification results and convergent algorithms in the SCI hierarchy. In particular, we showed how to compute eigenvalues and eigenspaces outside the essential spectrum with error control for normal operators. This was extended to dominant invariant subspaces of general (possibly non-normal) operators, as well as to the spectrum of a large class of operators that includes compact normal operators with eigenvalues of distinct magnitudes. These are the first algorithms to tackle these problems with error control.
Finally, we demonstrated that the IQR algorithm can be implemented both in theory and in practice. We illustrated the convergence theorems of Sect. 3 numerically, as well as some examples (normal and non-normal) where the extremal parts of the essential spectrum also appear to be recovered. Based on this, we conjecture that there may be a large class of operators for which the IQR algorithm converges to the extreme parts of the essential spectrum.⁴ In particular, we conjecture that this holds for normal operators if the set of extremal points of the essential spectrum consists of a single point. However, an example was given where convergence to the essential spectrum was only algebraic, $O(n^{-\alpha})$ as $n \to \infty$, as opposed to the linear convergence rate $O(r^n)$ to the discrete spectrum/eigenvalues. It was also demonstrated that the truncations of the IQR algorithm have spectra agreeing with those obtained using periodic boundary conditions for a range of operators in the class of "pseudoergodic" NSA random operators. In some cases, the algorithm performed much better than standard finite section methods. We stress that, as in the case of the finite section method, for fixed n and $m \to \infty$, the output $\sigma(P_m Q_n^* T Q_n|_{P_m})$ will in general still suffer from spectral pollution and in some cases will not recover the full spectrum. This was apparent in the numerical examples and is likely to hold for many operators, even when taking a mixture of double limits $m, n \to \infty$. However, examples of Laurent operators were given where $\sigma(P_m Q_n^* T Q_n|_{P_m})$ appears to converge to the spectrum as $m \to \infty$ for fixed n > 0, but not for n = 0. We hope that the algorithm's potential for sifting out spectral pollution, and for imposing appropriate boundary conditions via a canonical unitary transformation, can also be exploited.

4 This is false in general, as is easily seen by considering the shift operator.

Based on our findings, we end with a list of open problems for further study on the
theoretical properties of the IQR algorithm:
• Which conditions are needed on a possibly non-normal operator in order for the
IQR algorithm to pick up the extreme points of the essential spectrum?
• Is the convergence rate to non-isolated points of the spectrum algebraic?
• For operators which do not have a trivial QR decomposition, is there a way of choosing $n = n(m)$ such that $\sigma(P_m Q_{n(m)}^* T Q_{n(m)}|_{P_m})$ converges to the spectrum as $m \to \infty$? If not, then for which classes of operators does such a choice exist?
• Is there a link between the IQR algorithm and the finite section method with
periodic boundary conditions for the class of pseudoergodic operators?
• Are there other cases where the IQR algorithm alleviates the need to provide natural
boundary conditions when applying the finite section method?
• Can the IQR algorithm be extended to unbounded operators? Can it also be extended to a continuous version for differential operators?

Acknowledgements MJC acknowledges support from the UK Engineering and Physical Sciences Research
Council (EPSRC) Grant EP/L016516/1. ACH acknowledges support from a Royal Society University
Research Fellowship as well as EPSRC Grant EP/L003457/1. We would also like to thank the referees
whose comments and suggestions led to the improvement of the manuscript.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International
License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution,
and reproduction in any medium, provided you give appropriate credit to the original author(s) and the
source, provide a link to the Creative Commons license, and indicate if changes were made.

A Appendix

A.1 Example codes

Here we show example code for the IQR algorithm in the case that the matrix has k
subdiagonals. The code can easily be adapted for the more general case considered in
Sect. 4.1.
Algorithm A.1

% Infinite_QR(A,n,k,m) takes the section P_{nk+m}AP_{nk+m} of an
% infinite matrix A with k subdiagonals, performs n iterations of the
% infinite-dimensional QR algorithm and returns J = P_mQ_n^*AQ_nP_m.

function J = Infinite_QR(A,n,k,m)
d = size(A,2);                  % d = nk+m, the size of the section
for j = 1:n
    % If A_j denotes the j-th term of the QR iteration, the output of
    % this call is U_(d-j*k)...U_1 A_(j-1) U_1...U_(d-j*k).
    A = Inf_QR(A,d-j*k,k);
end
J = A(1:m,1:m);
end

Algorithm A.2
% Inf_QR(A,n,k) takes a matrix A with k subdiagonals and multiplies it
% by n Householder reflections from the left and the right,
% i.e. B = U_n...U_1AU_1...U_n.

function B = Inf_QR(A,n,k)
B = A; d = size(A,1);
for j = 1:n
    % Reflection acting on rows j:j+k that zeroes the k subdiagonal
    % entries of column j of the partially triangularised matrix A.
    u = House(A(j:j+k,j));
    A(j:j+k,j:d) = A(j:j+k,j:d) - 2*u*(u'*A(j:j+k,j:d));   % A <- U_j*A
    B(j:j+k,1:d) = B(j:j+k,1:d) - 2*u*(u'*B(j:j+k,1:d));   % B <- U_j*B
    B(1:d,j:j+k) = B(1:d,j:j+k) - 2*(B(1:d,j:j+k)*u)*u';   % B <- B*U_j
end
end

Algorithm A.3

% House(x) takes a vector x and returns a unit vector u such that
% (I - 2u*u')x = c*e_1, where c is some complex number (depending on x)
% and e_1 = [1,0,...,0]'.

function u = House(x)
if norm(x) == 0
    u = zeros(size(x)); u(1) = 1;       % x = 0: any reflection will do
    return
end
v = x;                                  % the classical construction of a
if v(1) == 0                            % Householder reflection, as in
    v(1) = norm(x);                     % finite dimensions
else
    v(1) = x(1) + sign(x(1))*norm(x);   % sign(z) = z/abs(z) for complex z
end
u = v/norm(v);
end
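For readers without MATLAB, a Python transcription of Algorithms A.1–A.3 may be useful. It is a sketch of our own, not the authors' code. The names house, inf_qr and infinite_qr are hypothetical lowercase analogues of House, Inf_QR and Infinite_QR, and matrices are plain lists of rows. The sketch also checks that the section has at least nk + m rows. It can be used to test the property that motivates working with the section $P_{nk+m}AP_{nk+m}$: the returned m × m block should not change when the section is enlarged.

```python
import math


def house(x):
    """Algorithm A.3: unit vector u with (I - 2 u u^*) x = c e_1."""
    nx = math.sqrt(sum(abs(t) ** 2 for t in x))
    if nx == 0:
        return [1.0] + [0.0] * (len(x) - 1)   # x = 0: any reflection will do
    v = list(x)
    if v[0] == 0:
        v[0] = nx
    else:
        v[0] = x[0] + (x[0] / abs(x[0])) * nx  # sign(z) = z/|z|, also for complex z
    nv = math.sqrt(sum(abs(t) ** 2 for t in v))
    return [t / nv for t in v]


def inf_qr(A, n, k):
    """Algorithm A.2: apply n Householder reflections from both sides.

    Returns B = U_n ... U_1 A U_1 ... U_n; the input A is not modified.
    """
    d = len(A)
    R = [row[:] for row in A]   # left-reduced copy, used only to build each u
    B = [row[:] for row in A]
    for j in range(n):
        rows = range(j, j + k + 1)
        u = house([R[i][j] for i in rows])
        for M, cols in ((R, range(j, d)), (B, range(d))):   # M <- U_j M
            for c in cols:
                s = sum(u[p].conjugate() * M[i][c] for p, i in enumerate(rows))
                for p, i in enumerate(rows):
                    M[i][c] -= 2 * u[p] * s
        for r in range(d):                                   # B <- B U_j
            s = sum(B[r][i] * u[p] for p, i in enumerate(rows))
            for p, i in enumerate(rows):
                B[r][i] -= 2 * s * u[p].conjugate()
    return B


def infinite_qr(A, n, k, m):
    """Algorithm A.1: A is a d x d section (d >= n*k + m) of a matrix with
    k subdiagonals; returns the m x m block P_m Q_n^* A Q_n P_m."""
    d = len(A)
    if d < n * k + m:
        raise ValueError("section too small: need at least n*k + m rows")
    for j in range(1, n + 1):
        A = inf_qr(A, d - j * k, k)
    return [row[:m] for row in A[:m]]
```

For example, for a tridiagonal matrix (k = 1), the 6 × 6 output after n = 3 iterations computed from the 9 × 9 section should agree, up to rounding, with the output computed from a 16 × 16 section.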

A.2 Recalling the basics of the SCI hierarchy

The cornerstone in the SCI hierarchy is the definition of a computational problem, a general algorithm and towers of algorithms. The basic objects in a computational problem are as follows:
(i) $\Omega$ is some set, called the domain.
(ii) $\Lambda$ is a set of complex-valued functions on $\Omega$, called the evaluation set.
(iii) $\mathcal{M}$ is a metric space with metric $d_{\mathcal{M}}$.
(iv) $\Xi : \Omega \to \mathcal{M}$ is called the problem function.
The set $\Omega$ is the set of objects that give rise to our computational problems. The problem function $\Xi : \Omega \to \mathcal{M}$ is what we are interested in computing. Moreover, the set $\Lambda$ is the collection of functions that provide us with the information we are allowed to read. This leads to the following definition.
Definition A.4 (Computational Problem) Given a primary set $\Omega$, an evaluation set $\Lambda$, a metric space $\mathcal{M}$ and a problem function $\Xi : \Omega \to \mathcal{M}$, we call the collection $\{\Xi, \Omega, \mathcal{M}, \Lambda\}$ a computational problem.
For instance, when computing the spectrum of bounded operators on $l^2(\mathbb{N})$, we let $\Omega$ be a subset of $\mathcal{B}(l^2(\mathbb{N}))$ (for example the set of self-adjoint operators or compact operators), and let $(\mathcal{M}, d)$ be the set of all non-empty compact subsets of $\mathbb{C}$ provided with the Hausdorff metric $d = d_H$ in (1.2). The evaluation functions in $\Lambda$ consist of the family of all functions $f_{i,j} : A \mapsto \langle Ae_j, e_i\rangle$, $i, j \in \mathbb{N}$, which provide the entries of the matrix representation of $A$ with respect to the canonical basis $\{e_i\}_{i\in\mathbb{N}}$. Finally, $\Xi : A \mapsto \sigma(A)$.

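Equation (1.2) lies outside this excerpt. Assuming it is the usual Hausdorff distance $d_H(X, Y) = \max\{\sup_{x\in X} \operatorname{dist}(x, Y), \sup_{y\in Y} \operatorname{dist}(y, X)\}$, the following minimal Python sketch shows how it can be evaluated for finite sets of complex numbers, such as computed eigenvalues. The function names are hypothetical. The one-sided term $\sup_{x\in X} \operatorname{dist}(x, Y)$ is the quantity that must be below $\delta$ for a finite set $X$ to lie in the open $\delta$-neighbourhood of $Y$, which is the kind of error control used for the hierarchy below. A spurious point in $X$ (spectral pollution) makes this term large even when every point of $Y$ is well approximated.

```python
def point_to_set(x, Y):
    """dist(x, Y) = min over y in Y of |x - y|, for a finite non-empty set Y."""
    return min(abs(x - y) for y in Y)


def one_sided(X, Y):
    """sup_{x in X} dist(x, Y); a finite X lies in the open delta-neighbourhood
    of Y exactly when this value is < delta."""
    return max(point_to_set(x, Y) for x in X)


def hausdorff(X, Y):
    """Hausdorff distance between two finite non-empty sets of complex numbers."""
    return max(one_sided(X, Y), one_sided(Y, X))


if __name__ == "__main__":
    spectrum = [1 + 0j, -1 + 0j]
    approx = [1.01 + 0j, -0.99 + 0j, 0.5 + 0j]   # 0.5 plays the role of a spurious eigenvalue
    print(one_sided(spectrum, approx), one_sided(approx, spectrum), hausdorff(approx, spectrum))
```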

The goal is to find algorithms which approximate the function $\Xi$. More generally,
the main pillar of our framework is the concept of a tower of algorithms, which is
needed to describe problems that need several limits in the computation. However,
first one needs the definition of a general algorithm.
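Definition A.5 (General Algorithm) Given a computational problem $\{\Xi, \Omega, \mathcal{M}, \Lambda\}$, a general algorithm is a mapping $\Gamma : \Omega \to \mathcal{M}$ such that for each $A \in \Omega$
(i) there exists a finite subset of evaluations $\Lambda_\Gamma(A) \subset \Lambda$,
(ii) the action of $\Gamma$ on $A$ only depends on $\{A_f\}_{f\in\Lambda_\Gamma(A)}$, where $A_f := f(A)$,
(iii) for every $B \in \Omega$ such that $B_f = A_f$ for every $f \in \Lambda_\Gamma(A)$, it holds that $\Lambda_\Gamma(B) = \Lambda_\Gamma(A)$.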
Note that the definition of a general algorithm is more general than the definition
of a Turing machine or a Blum–Shub–Smale (BSS) machine. A general algorithm has
no restrictions on the operations allowed. The only restriction is that it can only take
a finite amount of information, though it is allowed to adaptively choose the finite
amount of information it reads depending on the input. Condition (iii) assures that the
algorithm reads the information in a consistent way. Note that the purpose of such a
general definition is to get strong lower bounds. In particular, the more general the
definition is, the stronger a proven lower bound will be.
With a definition of a general algorithm we can define the concept of towers of
algorithms. However, before we define that, we will discuss the cases for which we
may have a set-valued function.
Remark A.6 (Set-valued functions) Occasionally we will consider a function $\Xi$ such that for $T \in \Omega$ we have $\Xi(T) \subset \mathcal{M}$. In this case, we still require that a general algorithm produces a single-valued output, i.e. $\Gamma(T) \in \mathcal{M}$ for $T \in \Omega$. However, we replace the metric in order to define convergence. In particular, $\Gamma_n(T) \to \Xi(T)$ as $n \to \infty$ means

$$\inf_{y \in \Xi(T)} d_{\mathcal{M}}(\Gamma_n(T), y) \to 0.$$

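Definition A.7 (Tower of Algorithms) Given a computational problem $\{\Xi, \Omega, \mathcal{M}, \Lambda\}$, a tower of algorithms of height k for $\{\Xi, \Omega, \mathcal{M}, \Lambda\}$ is a family of sequences of functions

$$\Gamma_{n_k} : \Omega \to \mathcal{M}, \quad \Gamma_{n_k,n_{k-1}} : \Omega \to \mathcal{M}, \quad \ldots, \quad \Gamma_{n_k,\ldots,n_1} : \Omega \to \mathcal{M},$$

where $n_k, \ldots, n_1 \in \mathbb{N}$ and the functions $\Gamma_{n_k,\ldots,n_1}$ at the "lowest level" of the tower are general algorithms in the sense of Definition A.5. Moreover, for every $A \in \Omega$,

$$\Xi(A) = \lim_{n_k\to\infty} \Gamma_{n_k}(A), \qquad \Gamma_{n_k,\ldots,n_{j+1}}(A) = \lim_{n_j\to\infty} \Gamma_{n_k,\ldots,n_j}(A), \quad j = k-1, \ldots, 1.$$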

In addition to a general tower of algorithms (defined above), we will focus on radical towers. The definition of a general algorithm allows for strong lower bounds; however, to produce upper bounds we must add structure to the algorithm and towers of algorithms. A radical tower allows for arithmetic operations, comparisons and radicals.

Definition A.8 (Radical Towers) Given a computational problem $\{\Xi, \Omega, \mathcal{M}, \Lambda\}$, a Radical Tower of Algorithms of height k for $\{\Xi, \Omega, \mathcal{M}, \Lambda\}$ is a tower of algorithms where the lowest-level functions

$$\Gamma = \Gamma_{n_k,\ldots,n_1} : \Omega \to \mathcal{M}$$

satisfy the following: for each $A \in \Omega$ the action of $\Gamma$ on $A$ consists of only finitely many arithmetic operations, comparisons and radicals ($\sqrt{\cdot}$) of positive numbers on $\{A_f\}_{f\in\Lambda_\Gamma(A)}$, where $A_f = f(A)$.
In other words, for the finitely many steps of the computation of the lowest-level functions $\Gamma = \Gamma_{n_k,\ldots,n_1} : \Omega \to \mathcal{M}$, only the four arithmetic operations $+, -, \cdot, /$ within the smallest (algebraic) field generated by the input $\{A_f\}_{f\in\Lambda_\Gamma(A)}$ are allowed. In addition, we allow the extraction of radicals of positive real numbers. We implicitly assume that any complex number can be decomposed into a real and an imaginary part, and moreover that we can determine whether $a = b$ or $a > b$ for all real numbers $a, b$ that can occur during the computations. Given the definitions above, we can now define the key concept, namely, the Solvability Complexity Index:
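Definition A.9 (Solvability Complexity Index) A computational problem $\{\Xi, \Omega, \mathcal{M}, \Lambda\}$ is said to have Solvability Complexity Index $\mathrm{SCI}(\Xi, \Omega, \mathcal{M}, \Lambda)_\alpha = k$, with respect to a tower of algorithms of type $\alpha$, if k is the smallest integer for which there exists a tower of algorithms of type $\alpha$ of height k. If no such tower exists, then $\mathrm{SCI}(\Xi, \Omega, \mathcal{M}, \Lambda)_\alpha = \infty$. If there exists a tower $\{\Gamma_n\}_{n\in\mathbb{N}}$ of type $\alpha$ and height one such that $\Xi = \Gamma_{n_1}$ for some $n_1 < \infty$, then we define $\mathrm{SCI}(\Xi, \Omega, \mathcal{M}, \Lambda)_\alpha = 0$. We may sometimes write $\mathrm{SCI}(\Xi, \Omega)_\alpha$ to simplify notation when $\mathcal{M}$ and $\Lambda$ are obvious.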
The definition of the SCI immediately induces the SCI hierarchy:
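Definition A.10 (The Solvability Complexity Index Hierarchy) Consider a collection $\mathcal{C}$ of computational problems and let $\mathcal{T}$ be the collection of all towers of algorithms of type $\alpha$ for the computational problems in $\mathcal{C}$. Define

$$\Delta^\alpha_0 := \{\{\Xi, \Omega\} \in \mathcal{C} \mid \mathrm{SCI}(\Xi, \Omega)_\alpha = 0\},$$

$$\Delta^\alpha_{m+1} := \{\{\Xi, \Omega\} \in \mathcal{C} \mid \mathrm{SCI}(\Xi, \Omega)_\alpha \le m\}, \quad m \in \mathbb{N},$$

as well as

$$\Delta^\alpha_1 := \{\{\Xi, \Omega\} \in \mathcal{C} \mid \exists\, \{\Gamma_n\}_{n\in\mathbb{N}} \in \mathcal{T} \text{ s.t. } \forall A \in \Omega,\ d(\Gamma_n(A), \Xi(A)) \le 2^{-n}\}.$$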

Remark A.11 (The $\Delta_k$ notation) Note that in this paper we only consider radical towers, and hence the superscript $\alpha$ will be omitted throughout. Thus we will always write $\Delta_k$.
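Finally, we recall the definition of $\Sigma^\alpha_1$:

$$\Sigma^\alpha_1 = \{\{\Xi, \Omega\} \in \Delta^\alpha_2 \mid \exists\, \{\Gamma_n\}_{n\in\mathbb{N}} \in \mathcal{T} \text{ s.t. } \Gamma_n(A) \subset N_{2^{-n}}(\Xi(A)) \text{ and } \Gamma_n(A) \to \Xi(A)\ \forall A \in \Omega\},$$

where $N_\delta(\omega)$ denotes the $\delta$-neighbourhood of $\omega \subset \mathcal{M}$.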

References
1. Amir, A., Hatano, N., Nelson, D.R.: Non-Hermitian localization in biological networks. Phys. Rev. E
93(4), 042310 (2016)
2. Anderson, P.W.: Absence of diffusion in certain random lattices. Phys. Rev. 109(5), 1492 (1958)
3. Aronszajn, N.: Approximation methods for Eigenvalues of completely continuous symmetric operators.
In: Proceedings of the Symposium on Spectral Theory and Differential Problems, pp. 179–202 (1951)
4. Arveson, W.: Improper filtrations for C*-algebras: spectra of unilateral tridiagonal operators. Acta Sci.
Math. (Szeged) 57(1–4), 11–24 (1993)
5. Arveson, W.: Noncommutative spheres and numerical quantum mechanics. In: Operator Algebras,
Mathematical Physics, and Low-dimensional Topology (Istanbul, 1991), Research Notes in
Mathematics, vol. 5, A K Peters, Wellesley, pp. 1–10 (1993)
6. Arveson, W.: C*-algebras and numerical linear algebra. J. Funct. Anal. 122(2), 333–360 (1994)
7. Arveson, W.: The role of C*-algebras in infinite-dimensional numerical linear algebra. In: C*-algebras:
1943–1993 (San Antonio, TX, 1993), Contemporary Mathematics, vol. 167, Amer. Math. Soc.,
Providence, RI, pp. 114–129 (1994)
8. Ben-Artzi, J., Colbrook, M.J., Hansen, A.C., Nevanlinna, O., Seidel, M.: On the Solvability Complexity
Index Hierarchy and Towers of Algorithms (Preprint) (2018)
9. Ben-Artzi, J., Hansen, A.C., Nevanlinna, O., Seidel, M.: New barriers in complexity theory: on the
solvability complexity index and the towers of algorithms. C. R. Math. 353(10), 931–936
(2015)
10. Bender, C.M.: Making sense of non-Hermitian Hamiltonians. Rep. Prog. Phys. 70(6), 947 (2007)
11. Bender, C.M., Boettcher, S.: Real spectra in non-Hermitian Hamiltonians having PT symmetry. Phys.
Rev. Lett. 80(24), 5243 (1998)
12. Billy, J., Josse, V., Zuo, Z., Bernard, A., Hambrecht, B., Lugan, P., Clément, D., Sanchez-Palencia, L.,
Bouyer, P., Aspect, A.: Direct observation of Anderson localization of matter waves in a controlled
disorder. Nature 453(7197), 891–894 (2008)
13. Bögli, S., Brown, B.M., Marletta, M., Tretter, C., Wagenhofer, M.: Guaranteed resonance enclosures
and exclosures for atoms and molecules. In: Proceedings of the Royal Society of London A:
Mathematical, Physical and Engineering Sciences, vol. 470, no. 2171 (2014)
14. Böttcher, A.: Pseudospectra and singular values of large convolution operators. J. Integral Equ. Appl.
6(3), 267–301 (1994)
15. Böttcher, A.: Infinite matrices and projection methods. In: Lectures on Operator Theory and Its
Applications (Waterloo, ON, 1994), Fields Institute Monographs, vol. 3, Amer. Math. Soc., Providence,
pp. 1–72 (1996)
16. Böttcher, A., Brunner, H., Iserles, A., Nørsett, S.P.: On the singular values and eigenvalues of the
Fox-Li and related operators. N. Y. J. Math. 16, 539–561 (2010)
17. Böttcher, A., Chithra, A.V., Namboodiri, M.N.N.: Approximation of approximation numbers by
truncation. Integral Equ. Oper. Theory 39(4), 387–395 (2001)
18. Böttcher, A., Silbermann, B.: Introduction to Large Truncated Toeplitz Matrices. Springer, New York
(1999)
19. Böttcher, A., Spitkovsky, I.M.: A gentle guide to the basics of two projections theory. Linear Algebra
Appl. 432(6), 1412–1459 (2010)
20. Boulton, L.: Projection methods for discrete Schrödinger operators. Proc. Lond. Math. Soc. 88(2),
526–544 (2004)
21. Brown, N.: Quasi-diagonality and the finite section method. Math. Comput. 76(257), 339–360 (2007)
22. Brunner, H., Iserles, A., Nørsett, S.P.: The computation of the spectra of highly oscillatory Fredholm
integral operators. J. Integral Equ. Appl. 23(4), 467–519 (2011)
23. Chandler-Wilde, S., Chonchaiya, R., Lindner, M.: Eigenvalue problem meets Sierpinski triangle:
computing the spectrum of a non-self-adjoint random operator. Oper. Matrices 5(4), 633–648 (2011)
24. Colbrook, M.J., Roman, B., Hansen, A.: How to Compute Spectra with Error Control (Preprint) (2019)
25. Davies, E.B.: Spectral enclosures and complex resonances for general self-adjoint operators. LMS J.
Comput. Math. 1, 42–74 (1998)
26. Davies, E.B.: Linear Operators and Their Spectra, vol. 106. Cambridge University Press, Cambridge
(2007)
27. Dean, C., Wang, L., Maher, P., Forsythe, C., Ghahari, F., Gao, Y., Katoch, J., Ishigami, M., Moon, P.,
Koshino, M., et al.: Hofstadter’s butterfly and the fractal quantum Hall effect in moire superlattices.
Nature 497(7451), 598–602 (2013)
28. Deift, P., Li, L., Tomei, C.: Toda flows with infinitely many variables. J. Funct. Anal. 64(3), 358–402
(1985)
29. Digernes, T., Varadarajan, V.S., Varadhan, S.: Finite approximations to quantum systems. Rev. Math.
Phys. 6(04), 621–648 (1994)
30. Doyle, P., McMullen, C.: Solving the quintic by iteration. Acta Math. 163(3–4), 151–180 (1989)
31. Feinberg, J., Zee, A.: Non-Hermitian localization and delocalization. Phys. Rev. E 59(6), 6433 (1999)
32. Multiprecision Computing Toolbox for MATLAB 4.5.3.12856. Advanpix LLC., Yokohama, Japan
33. Goldsheid, I.Y., Khoruzhenko, B.A.: Distribution of eigenvalues in non-Hermitian Anderson models.
Phys. Rev. Lett. 80(13), 2897 (1998)
34. Gray, R.M., et al.: Toeplitz and circulant matrices: a review. Found. Trends Commun. Inf. Theory 2(3),
155–239 (2006)
35. Hagen, R., Roch, S., Silbermann, B.: C*-algebras and numerical analysis. In: Monographs and
Textbooks in Pure and Applied Mathematics, vol. 236, Marcel Dekker Inc., New York (2001)
36. Hansen, A.C.: On the approximation of spectra of linear operators on Hilbert spaces. J. Funct. Anal.
254(8), 2092–2126 (2008)
37. Hansen, A.C.: Infinite-dimensional numerical linear algebra: theory and applications. Proc. R. Soc.
Lond. Ser. A Math. Phys. Eng. Sci. 466(2124), 3539–3559 (2010)
38. Hansen, A.C.: On the solvability complexity index, the n-pseudospectrum and approximations of
spectra of operators. J. Am. Math. Soc. 24(1), 81–124 (2011)
39. Hatano, N., Nelson, D.R.: Localization transitions in non-Hermitian quantum mechanics. Phys. Rev.
Lett. 77(3), 570 (1996)
40. Holz, D.E., Orland, H., Zee, A.: On the remarkable spectrum of a non-Hermitian random matrix model.
J. Phys. A Math. Gen. 36(12), 3385 (2003)
41. Kato, T.: Perturbation Theory for Linear Operators, vol. 132. Springer, Berlin (2013)
42. Krein, M., Krasnoselski, M.: Fundamental theorems concerning the extension of Hermitian operators
and some of their applications to the theory of orthogonal polynomials and the moment problem.
Uspekhi Mat. Nauk. 2, 60–106 (1947)
43. Levitin, M., Shargorodsky, E.: Spectral pollution and second-order relative spectra for self-adjoint
operators. IMA J. Numer. Anal. 24(3), 393–416 (2004)
44. Lindner, M.: Infinite matrices and their finite sections. In: Frontiers in Mathematics: An Introduction
to the Limit Operator Method, Birkhäuser Verlag, Basel (2006)
45. Makris, K.G., El-Ganainy, R., Christodoulides, D.N., Musslimani, Z.H.: Beam dynamics in
PT symmetric optical lattices. Phys. Rev. Lett. 100(10), 103904 (2008)
46. Marletta, M.: Neumann–Dirichlet maps and analysis of spectral pollution for non-self-adjoint elliptic
PDEs with real essential spectrum. IMA J. Numer. Anal. 30(4), 917–939 (2010)
47. Marletta, M., Scheichl, R.: Eigenvalues in spectral gaps of differential operators. J. Spectr. Theory
2(3), 293–320 (2012)
48. Mattis, D.C.: The few-body problem on a lattice. Rev. Mod. Phys. 58(2), 361 (1986)
49. McMullen, C.: Families of rational maps and iterative root-finding algorithms. Ann. Math. 125(3),
467–493 (1987)
50. McMullen, C.: Braiding of the attractor and the failure of iterative algorithms. Invent. Math. 91(2),
259–272 (1988)
51. Mogilner, A.: Hamiltonians in solid state physics as multiparticle discrete Schrödinger operators. Adv.
Soviet Math. 5, 139–194 (1991)
52. Nelson, D.R., Shnerb, N.M.: Non-Hermitian localization and population biology. Phys. Rev. E 58(2),
1383 (1998)
53. Olver, S.: ApproxFun.jl v0.8. github (online). https://github.com/JuliaApproximation/ApproxFun.jl
(2018)
54. Olver, S., Townsend, A.: A fast and well-conditioned spectral method. SIAM Rev. 55(3), 462–489
(2013)
55. Olver, S., Townsend, A.: A practical framework for infinite-dimensional linear algebra. In:
Proceedings of the 1st Workshop for High Performance Technical Computing in Dynamic Languages,
HPTCDL ’14, Piscataway, NJ, USA, IEEE Press, pp. 57–62 (2014)
56. Olver, S., Webb, M.: SpectralMeasures.jl. github (online).
https://github.com/JuliaApproximation/SpectralMeasures.jl (2018)
57. Parlett, B.N.: The Symmetric Eigenvalue Problem, vol. 20. SIAM, Philadelphia (1998)
58. Pokrzywa, A.: Method of orthogonal projections and approximation of the spectrum of a bounded
operator. Stud. Math. 65(1), 21–29 (1979)
59. Ponomarenko, L., Gorbachev, R., Yu, G., Elias, D., Jalil, R., Patel, A., Mishchenko, A., Mayorov, A.,
Woods, C., Wallbank, J., et al.: Cloning of Dirac fermions in graphene superlattices. Nature 497(7451),
594–597 (2013)
60. Regensburger, A., Bersch, C., Miri, M.-A., Onishchukov, G., Christodoulides, D.N., Peschel, U.:
Parity-time synthetic photonic lattices. Nature 488(7410), 167–171 (2012)
61. Riddell, R.: Spectral concentration for self-adjoint operators. Pac. J. Math. 23(2), 377–401 (1967)
62. Schmidt, P., Spitzer, F.: The Toeplitz matrices of an arbitrary Laurent polynomial. Math. Scand. 8(1),
15–38 (1960)
63. Seidel, M.: On (N, ε)-pseudospectra of operators on Banach spaces. J. Funct. Anal. 262(11),
4916–4927 (2012)
64. Seidel, M., Silbermann, B.: Finite sections of band-dominated operators—norms, condition numbers
and pseudospectra. In: Operator Theory, Pseudo-differential Equations, and Mathematical Physics,
Operator Theory: Advances and Applications, vol. 228, Birkhauser/Springer Basel AG, Basel, pp.
375–390 (2013)
65. Shargorodsky, E.: Geometry of higher order relative spectra and projection methods. J. Oper. Theory
44(1), 43–62 (2000)
66. Shargorodsky, E.: On the limit behaviour of second order relative spectra of self-adjoint operators. J.
Spectr. Theory 3, 535–552 (2013)
67. Shivakumar, P., Sivakumar, K., Zhang, Y.: Infinite Matrices and Their Recent Applications. Springer,
Berlin (2016)
68. Simon, B.: The classical moment problem as a self-adjoint finite difference operator. Adv. Math. 137(1),
82–203 (1998)
69. Smale, S.: The fundamental theorem of algebra and complexity theory. Bull. Am. Math. Soc. (N.S.)
4(1), 1–36 (1981)
70. Szabo, A., Ostlund, N.S.: Modern Quantum Chemistry: Introduction to Advanced Electronic Structure
Theory. Courier Corporation, Chelmsford (2012)
71. Teschl, G.: Jacobi Operators and Completely Integrable Nonlinear Lattices. American Mathematical
Soc, Providence (2000)
72. Trefethen, L.N., Embree, M.: Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and
Operators. Princeton University Press, Princeton (2005)
73. Webb, M., Olver, S.: Spectra of Jacobi Operators Via Connection Coefficient Matrices. arXiv preprint.
arXiv:1702.03095 (2017)

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.
