
arXiv:2201.08309v1 [quant-ph] 20 Jan 2022


Lecture Notes on
Quantum Algorithms for Scientific Computation

Lin Lin

Department of Mathematics, University of California, Berkeley


Challenge Institute of Quantum Computation, University of California, Berkeley
Computational Research Division, Lawrence Berkeley National Laboratory

January 21, 2022


Contents

Preface 5

Chapter 1. Preliminaries of quantum computation 7


1.1. Postulates of quantum mechanics 7
1.2. Density operator 11
1.3. Quantum circuit 13
1.4. Copy operation and no-cloning theorem 15
1.5. Measurement 16
1.6. Linear error growth and Duhamel’s principle 19
1.7. Universal gate sets and reversible computation 20
1.8. Fixed point number representation and classical arithmetic operations 23
1.9. Fault tolerant computation 23
1.10. Complexity of quantum algorithms 24
1.11. Notation 25

Chapter 2. Grover’s algorithm 27


2.1. The first quantum algorithm: Deutsch’s algorithm 27
2.2. Unstructured search problem 30
2.3. Amplitude amplification 33
2.4. Lower bound of query complexity* 34

Chapter 3. Quantum phase estimation 39


3.1. Hadamard test 39
3.2. Quantum phase estimation (Kitaev’s method)* 42
3.3. Quantum Fourier transform 45
3.4. Quantum phase estimation using quantum Fourier transform 48
3.5. Analysis of quantum phase estimation 50

Chapter 4. Applications of quantum phase estimation 53


4.1. Ground state energy estimation 53
4.2. Amplitude estimation 57
4.3. HHL algorithm for solving linear systems of equations 59
4.4. Example: Solve Poisson’s equation 65
4.5. Solve linear differential equations* 66
4.6. Example: Solve the heat equation* 72

Chapter 5. Trotter based Hamiltonian simulation 75


5.1. Trotter splitting 75

5.2. Commutator type error bound 76


Chapter 6. Block encoding 79
6.1. Query model for matrix entries 79
6.2. Block encoding 80
6.3. Block encoding of s-sparse matrices 84
6.4. Hermitian block encoding 85
6.5. Query models for general sparse matrices* 86
Chapter 7. Matrix functions of Hermitian matrices 91
7.1. Qubitization of Hermitian matrices with Hermitian block encoding 91
7.2. Application: Szegedy’s quantum walk* 94
7.3. Linear combination of unitaries 99
7.4. Qubitization of Hermitian matrices with general block encoding 103
7.5. Quantum eigenvalue transformation 105
7.6. Quantum signal processing 109
7.7. Application: Time-independent Hamiltonian simulation 114
7.8. Application: Ground state preparation 116

Chapter 8. Quantum singular value transformation 119


8.1. Generalized matrix functions 119
8.2. Qubitization of general matrices 120
8.3. Quantum singular value transformation 122
8.4. Application: Solve linear systems of equations 124
8.5. Quantum singular value transformation with basis transformation* 126
8.6. Application: Grover’s search revisited, and fixed-point amplitude amplification* 128
Bibliography 131
Preface

With the availability of near-term quantum devices and the breakthrough of quantum supremacy
experiments, quantum computation has received an increasing amount of attention from a diverse
range of scientific disciplines in the past few years. Despite the availability of excellent textbooks
as well as lecture notes such as [NC00, KSV02, Nak08, RP11, Aar13, Pre99, DW19, Chi21], these
materials often cover all aspects of quantum computation, including complexity theory, physical
implementations of quantum devices, quantum information theory, quantum error correction, quan-
tum algorithms etc. This leaves little room for introducing how a quantum computer is supposed
to be used to solve challenging computational problems in science and engineering. For instance,
after the initial reading of (admittedly, selected chapters of) the classic textbook by Nielsen and
Chuang [NC00], I was both amazed by the potential power of a quantum computer, and baffled by
its practical range of applicability: are we really trying to build a quantum computer, either to per-
form a quantum Fourier transform or to perform a quantum search? Is quantum phase estimation
the only bridge connecting a quantum computer on one side, and virtually all scientific computing
problems on the other, such as solving linear systems, eigenvalue problems, least squares problems,
differential equations, numerical optimization etc.?
Thanks to the significant progress in the development of quantum algorithms, it should by now
be self-evident that the answer to both questions above is no. This is a fast-evolving field, and
much important progress has been made only in the past few years. However, many
such developments are theoretically and technically involved, and can be difficult to penetrate for
someone with only basic knowledge of quantum computing. I think it is worth delivering some of
these exciting results, in a somewhat more accessible way, to a broader community interested in
using future fault-tolerant quantum computers to solve scientific problems.
This is a set of lecture notes used in a graduate topic class in applied mathematics called
“Quantum Algorithms for Scientific Computation” at the Department of Mathematics, UC Berkeley
during the fall semester of 2021. These lecture notes focus only on quantum algorithms closely
related to scientific computation, and in particular, matrix computation. In fact, this is only a
small class of quantum algorithms viewed from the perspective of the “quantum algorithm zoo” 1.
This means that many important materials are consciously left out, such as quantum complexity
theory, applications in number theory and cryptography (notably, Shor’s algorithm), applications in
algebraic problems (such as the hidden subgroup problems) etc. Readers interested in these topics
can consult some of the excellent aforementioned textbooks. Since the materials were designed to
fit into the curriculum of one semester, several other topics relevant to scientific computation are
not included, notably adiabatic quantum computation (AQC), and variational quantum algorithms
(VQA). These materials may be added in future editions of the lecture notes. To my knowledge,
some of the materials in these lecture notes may be new and have not been presented in the
literature. The sections marked by * can be skipped upon first reading without much detriment.

1 https://quantumalgorithmzoo.org/
I would like to thank Dong An, Yulong Dong, Di Fang, Fabian M. Faulstich, Cory Hargus,
Zhen Huang, Subhayan Roy Moulik, Yu Tong, Jiasu Wang, Mathias Weiden, Jiahao Yao, Lexing
Ying for useful discussions and for pointing out typos in the notes. I would also like to thank
Nilin Abrahamsen, Di Fang, Subhayan Roy Moulik, Yu Tong for contributing some of the exercises,
and Jiahao Yao for providing the cover image of the notes. For errors / comments / suggestions /
general thoughts on the lecture notes, please send me an email: [email protected].
CHAPTER 1

Preliminaries of quantum computation

1.1. Postulates of quantum mechanics


We introduce the four main postulates of quantum mechanics related to this course. For more
details, we refer readers to [NC00, Section 2.2]. All postulates concern closed quantum systems
(i.e., systems isolated from environments) only.

1.1.1. State space postulate. The set of all quantum states of a quantum system forms a
complex vector space with inner product structure (i.e., it is a Hilbert space, denoted by H), called
the state space. If the state space H is finite dimensional, it is isomorphic to some CN , written as
H ∼= CN . Without loss of generality we may simply take H = CN . We always assume N = 2n
for some non-negative integer n, often called the number of quantum bits (or qubits). A quantum
state ψ ∈ CN can be expressed in terms of its components as

(1.1)    ψ = (ψ0 , ψ1 , . . . , ψN −1 )> .

Its Hermitian conjugate is

(1.2)    ψ † = (ψ̄0 , ψ̄1 , . . . , ψ̄N −1 ),

where c̄ is the complex conjugate of c ∈ C. We also use the Dirac notation, which uses |ψi to
denote a quantum state, hψ| to denote its Hermitian conjugate ψ † , and the inner product
(1.3)    hψ|ϕi := hψ, ϕi = Σi∈[N ] ψ̄i ϕi .

Here [N ] = { 0, . . . , N − 1 }. Let {|ii} be the standard basis of CN . The i-th entry of ψ can be
written as an inner product ψi = hi|ψi. Then |ψi hϕ| should be interpreted as an outer product,
with (i, j)-th matrix element given by

(1.4)    hi|(|ψi hϕ|)|ji = hi|ψi hϕ|ji = ψi ϕ̄j .

Two state vectors |ψi and c |ψi for some 0 ≠ c ∈ C always refer to the same physical state, i.e.,
c has no observable effects. Hence without loss of generality we always assume |ψi is normalized to
be a unit vector, i.e., hψ|ψi = 1. Sometimes it is more convenient to write down an unnormalized
state, which will be denoted by ψ without the ket notation |·i. Restricting to normalized state
vectors, the complex number c = eiθ for some θ ∈ [0, 2π), called the global phase factor.

Example 1.1 (Single qubit system). A (single) qubit corresponds to a state space H ∼= C2 . We
also define

(1.5)    |0i = (1, 0)> ,    |1i = (0, 1)> .

Since the state space of the spin-1/2 system is also isomorphic to C2 , this is also called the single
spin system, where |0i , |1i are referred to as the spin-up and spin-down state, respectively. A general
state vector in H takes the form

(1.6)    |ψi = a |0i + b |1i = (a, b)> ,    a, b ∈ C,

and the normalization condition implies |a|² + |b|² = 1. So we may rewrite |ψi as

(1.7)    |ψi = eiγ (cos(θ/2) |0i + eiϕ sin(θ/2) |1i),    θ, ϕ, γ ∈ R.

If we ignore the irrelevant global phase γ, the state is effectively

(1.8)    |ψi = cos(θ/2) |0i + eiϕ sin(θ/2) |1i ,    0 ≤ θ < π, 0 ≤ ϕ < 2π.

So we may identify each single qubit quantum state with a unique point on the unit sphere in three
dimensions (called the Bloch sphere) as

(1.9)    a = (sin θ cos ϕ, sin θ sin ϕ, cos θ)> .
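As a quick illustration (not part of the original notes), the parametrization (1.8)-(1.9) can be
checked numerically. The following Python/numpy sketch, with a hypothetical helper bloch_vector,
strips the global phase of a normalized state a |0i + b |1i and returns the corresponding point on
the Bloch sphere.

import numpy as np

def bloch_vector(a: complex, b: complex) -> np.ndarray:
    """Map a normalized single-qubit state a|0> + b|1> to its Bloch vector (1.9)."""
    # Remove the global phase so that the coefficient of |0> is real and nonnegative.
    phase = np.exp(-1j * np.angle(a)) if a != 0 else 1.0
    a, b = a * phase, b * phase
    theta = 2 * np.arccos(np.clip(np.real(a), -1.0, 1.0))
    phi = np.angle(b) if abs(b) > 1e-12 else 0.0
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

print(bloch_vector(1, 0))                        # |0>  ->  (0, 0, 1)
print(bloch_vector(1/np.sqrt(2), 1/np.sqrt(2)))  # |+>  ->  (1, 0, 0)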

1.1.2. Quantum operator postulate. The evolution of a quantum state from |ψi → |ψ′ i ∈
CN is always achieved via a unitary operator U ∈ CN ×N , i.e.,

(1.10)    |ψ′ i = U |ψi ,    U † U = IN .
Here U † is the Hermitian conjugate of a matrix U , and IN is the N -dimensional identity matrix.
When the dimension is apparent, we may also simply write I ≡ IN . In quantum computation, a
unitary matrix is often referred to as a gate.
Example 1.2. For a single qubit, the Pauli matrices are

(1.11)    σx = X = [ 0 1 ; 1 0 ],    σy = Y = [ 0 −i ; i 0 ],    σz = Z = [ 1 0 ; 0 −1 ].
Together with the two-dimensional identity matrix, they form a basis of all linear operators on
C2 . 
Some other commonly used single qubit operators include, to name a few:
• Hadamard gate

(1.12)    H = (1/√2) [ 1 1 ; 1 −1 ]

• Phase gate

(1.13)    S = [ 1 0 ; 0 i ]

• T gate:

(1.14)    T = [ 1 0 ; 0 eiπ/4 ]
When there are notational conflicts, we will use the roman font such as H, X for these single-qubit
gates (one common scenario is to distinguish the Hadamard gate H from a Hamiltonian H). An
operator acting on an n-qubit quantum state space is called an n-qubit operator.
Starting from an initial quantum state |ψ(0)i, the quantum state can evolve in time, which
gives a single parameter family of quantum states denoted by {|ψ(t)i}. These quantum states are
related to each other via a quantum evolution operator U :
(1.15) ψ(t2 ) = U (t2 , t1 )ψ(t1 ),
where U (t2 , t1 ) is unitary for any given t1 , t2 . Here t2 > t1 refers to quantum evolution forward in
time, t2 < t1 refers to quantum evolution backward in time, and U (t1 , t1 ) = I for any t1 .
The quantum evolution under a time-independent Hamiltonian H satisfies the time-dependent
Schrödinger equation

(1.16)    i∂t |ψ(t)i = H |ψ(t)i .
Here H = H † is a Hermitian matrix. The corresponding time evolution operator is
(1.17) U (t2 , t1 ) = e−iH(t2 −t1 ) , ∀t1 , t2 .
In particular, U (t2 , t1 ) = U (t2 − t1 , 0).
On the other hand, for any unitary matrix U , we can always find a Hermitian matrix H such
that U = eiH (Exercise 1.1).
Example 1.3. Let the Hamiltonian H be the Pauli-X gate. Then

(1.18)    U (t, 0) = e−iXt = [ cos t −i sin t ; −i sin t cos t ] = (cos t)I − iX(sin t).
Starting from an initial state |ψ(0)i = |0i, after time t = π/2, the state evolves into |ψ(π/2)i =
−i |1i, i.e., the |1i state (up to a global phase factor). 
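The identity in Eq. (1.18) can also be verified numerically. The following sketch (assuming numpy
and scipy are available; it is not part of the original notes) compares the matrix exponential with
(cos t)I − iX(sin t) and evolves |0i for time t = π/2.

import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

t = np.pi / 2
U = expm(-1j * X * t)                                       # U(t, 0) = e^{-iXt}
print(np.allclose(U, np.cos(t) * I2 - 1j * np.sin(t) * X))  # True

ket0 = np.array([1, 0], dtype=complex)
print(U @ ket0)                                             # ~ [0, -1j], i.e. -i|1>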
1.1.3. Quantum measurement postulate. Without loss of generality, we only discuss a
special type of quantum measurements called the projective measurement. For more general types
of quantum measurements, see [NC00, Section 2.2.3]. All quantum measurements expressed as a
positive operator-valued measure (POVM) can be expressed in terms of projective measurements
in an enlarged Hilbert space via the Naimark dilation theorem.
In a finite dimensional setting, a quantum observable always corresponds to a Hermitian matrix
M , which has the spectral decomposition
(1.19)    M = Σm λm Pm .

Here λm ∈ R are the eigenvalues of M , and Pm is the projection operator onto the eigenspace
associated with λm , i.e., Pm² = Pm .
When a quantum state |ψi is measured by a quantum observable M , the outcome of the
measurement is always an eigenvalue λm , with probability
(1.20) pm = hψ|Pm |ψi .

After the measurement, the quantum state becomes

(1.21)    |ψi → Pm |ψi /√pm .
Note that this is not a unitary process!
In order to evaluate the expectation value of a quantum observable M , we first use the resolution
of identity:
(1.22)    Σm Pm = I.

This implies the normalization condition,


(1.23)    Σm pm = Σm hψ|Pm |ψi = hψ|ψi = 1.

Together with pm ≥ 0, we find that {pm } is indeed a probability distribution.


The expectation value of the measurement outcome is
(1.24)    Eψ (M ) = Σm λm pm = Σm λm hψ|Pm |ψi = hψ| (Σm λm Pm ) |ψi = hψ|M |ψi .

Example 1.4. Again let M = X. From the spectral decomposition of X:

(1.25)    X |±i = λ± |±i ,

where |±i := (|0i ± |1i)/√2 and λ± = ±1, we obtain the eigendecomposition

(1.26)    M = X = |+i h+| − |−i h−| .

Consider a quantum state |ψi = |0i = (|+i + |−i)/√2, then

(1.27)    hψ|P+ |ψi = hψ|P− |ψi = 1/2 .

Therefore the expectation value of the measurement is hψ|M |ψi = 0. 
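The probabilities and the expectation value in Example 1.4 can be reproduced with a few lines of
numpy (an illustrative sketch, not from the notes): build the projectors P± = |±i h±|, then evaluate
Eq. (1.20) and Eq. (1.24) for |ψi = |0i.

import numpy as np

ket0 = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
minus = np.array([1, -1], dtype=complex) / np.sqrt(2)

P_plus = np.outer(plus, plus.conj())      # projector |+><+|
P_minus = np.outer(minus, minus.conj())   # projector |-><-|
M = P_plus - P_minus                      # equals the Pauli X matrix

p_plus = np.real(ket0.conj() @ P_plus @ ket0)     # probability of outcome +1
p_minus = np.real(ket0.conj() @ P_minus @ ket0)   # probability of outcome -1
expectation = np.real(ket0.conj() @ M @ ket0)     # <psi|M|psi>
print(p_plus, p_minus, expectation)               # 0.5 0.5 0.0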
1.1.4. Tensor product postulate. For a quantum system consisting of m components with state
spaces {Hi }i∈[m] , the state space of the whole system is their tensor product, denoted by
H = H0 ⊗ · · · ⊗ Hm−1 . Let |ψi i be a state vector in Hi , then

(1.28)    |ψi = |ψ0 i ⊗ · · · ⊗ |ψm−1 i

in H. However, not all quantum states in H can be written in the tensor product form above. Let
{|e(i)j i}j∈[Ni ] be a basis of Hi ; then a general state vector in H takes the form

(1.29)    |ψi = Σj0 ∈[N0 ],...,jm−1 ∈[Nm−1 ] ψj0 ···jm−1 |e(0)j0 i ⊗ · · · ⊗ |e(m−1)jm−1 i .

Here ψj0 ···jm−1 ∈ C is an entry of an m-way tensor, and the dimension of H is therefore Πi∈[m] Ni .
The state space of n qubits is H = (C2 )⊗n ∼= C^(2^n) , rather than C^(2n) . We also use the notation

(1.30)    |01i ≡ |0, 1i ≡ |0i |1i ≡ |0i ⊗ |1i ,    |0⊗n i = |0i⊗n .

Furthermore, x ∈ {0, 1}n is called a classical bit-string, and { |xi | x ∈ {0, 1}n } is called the
computational basis of C^(2^n) .

Example 1.5 (Two qubit system). The state space is H = (C2 )⊗2 ∼= C4 . The standard basis is
(row-major order, i.e., last index is the fastest changing one)

(1.31)    |00i = (1, 0, 0, 0)> ,  |01i = (0, 1, 0, 0)> ,  |10i = (0, 0, 1, 0)> ,  |11i = (0, 0, 0, 1)> .
The Bell state (also called the EPR pair) is defined to be

(1.32)    |ψi = (|00i + |11i)/√2 = (1, 0, 0, 1)> /√2 ,
which cannot be written as any product state |ai ⊗ |bi (Exercise 1.4).
There are many important quantum operators on the two-qubit quantum system. One of them
is the CNOT gate, with matrix representation

(1.33)    CNOT = [ 1 0 0 0 ; 0 1 0 0 ; 0 0 0 1 ; 0 0 1 0 ].
In other words, when acting on the standard basis, we have

(1.34)    CNOT: |00i 7→ |00i ,  |01i 7→ |01i ,  |10i 7→ |11i ,  |11i 7→ |10i .

This can be compactly written as


(1.35) CNOT |ai |bi = |ai |a ⊕ bi .
Here a ⊕ b = (a + b) mod 2 is the “exclusive or” (XOR) operation. 
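A short numpy sketch (illustrative, not part of the original notes) makes the CNOT action concrete
in the row-major ordering used here: applying CNOT to |+i ⊗ |0i produces exactly the Bell state
(1.32), which is how entangled states are typically generated in circuits.

import numpy as np

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

ket0 = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)

state = np.kron(plus, ket0)     # qubit 0 (the control) is the leftmost tensor factor
bell = CNOT @ state
print(bell)                                          # ~ [0.707, 0, 0, 0.707]
print(np.allclose(CNOT.conj().T @ CNOT, np.eye(4)))  # CNOT is unitary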
Example 1.6 (Multi-qubit Pauli operators). For an n-qubit quantum system, the Pauli operator
acting on the i-th qubit is denoted by Pi (P = X, Y, Z). For instance
(1.36) Xi := I ⊗(i−1) ⊗ X ⊗ I ⊗(n−i) .


1.2. Density operator


So far all quantum states encountered can be described by a single |ψi ∈ H, called the pure
state. More generally, if a quantum system is in one of a number of states |ψi i with respective
probabilities pi , then {pi , |ψi i} is an ensemble of pure states. The density operator of the quantum
system is
(1.37)    ρ := Σi pi |ψi i hψi | .
For a pure state |ψi, we have

(1.38)    ρ = |ψi hψ| ,

which is a rank-1 matrix.
Consider a quantum observable in Eq. (1.19) associated with the projectors {Pm }. For a pure
state, it can be verified that the probability of obtaining λm and the expectation value of
the measurement are, respectively,

(1.39)    p(m) = Tr[Pm ρ],    Eρ [M ] = Tr[M ρ].

The expression (1.39) also holds for general density operators ρ.
An operator ρ is the density operator associated to some ensemble {pi , |ψi i} if and only if (1)
Tr ρ = 1, and (2) ρ ⪰ 0, i.e., ρ is a positive semidefinite matrix (also called a positive operator). All
postulates in Section 1.1 can be stated in terms of density operators (see [NC00, Section 2.4.2]).
Note that a pure state satisfies ρ² = ρ. In general we have ρ² ⪯ ρ. If ρ² ≺ ρ, then ρ is called (the
density operator of) a mixed state. Furthermore, an ensemble of admissible density operators is
also a density operator.
A quantum operator U that transforms |ψi to U |ψi also transforms the density operator
according to

(1.40)    ρ = Σi pi |ψi i hψi | 7→ Σi pi U |ψi i hψi | U † = U ρU † := U [ρ].

However, not all quantum operations on density operators need to be unitary! See [NC00, Section
8.2] for more general discussions on quantum operations.
Most of the discussions in this course will be restricted to pure states, and unitary quantum
operations. Even in this restricted setting, the density operator formalism can still be convenient,
particularly for describing a subsystem of a composite quantum system. Consider a quantum system
of (n + m) qubits, partitioned into a subsystem A with n qubits (state space HA = C^(2^n)) and
a subsystem B with m qubits (state space HB = C^(2^m)), respectively. The quantum state is a
pure state |ψi ∈ C^(2^(n+m)) with density operator ρAB . Let |a1 i , |a2 i be two state vectors in HA ,
and |b1 i , |b2 i be two state vectors in HB . Then the partial trace over system B is defined as

(1.41)    TrB [|a1 i ha2 | ⊗ |b1 i hb2 |] = |a1 i ha2 | Tr[|b1 i hb2 |] = |a1 i ha2 | hb2 |b1 i .
Since we can expand the density operator ρAB in terms of the basis of HA , HB , the definition of
(1.41) can be extended to define the reduced density operator for the subsystem A
(1.42) ρA = TrB [ρAB ].
The reduced density operator for the subsystem B can be similarly defined. The reduced density
operators ρA , ρB are generally mixed states.

Example 1.7 (Reduced density operator of tensor product states). If ρAB = ρ1 ⊗ ρ2 , then
(1.43) TrB [ρAB ] = ρ1 , TrA [ρAB ] = ρ2 .


If a quantum observable is defined only on the subsystem A, i.e., M = MA ⊗ I where MA has
the decomposition (1.19), then the probability of obtaining λm and the expectation value
are, respectively,

(1.44)    p(m) = Tr[(Pm ⊗ I)ρ] = Tr[Pm TrB [ρ]] = Tr[Pm ρA ],    Eρ [M ] = Tr[(MA ⊗ I)ρ] = Tr[MA ρA ].
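The partial trace (1.41)-(1.42) is straightforward to implement for small systems. The sketch below
(illustrative; the helper partial_trace_B is not from the notes) traces out a one-qubit subsystem B
and shows that the reduced density operator of the Bell state is the maximally mixed state, hence
a genuinely mixed state.

import numpy as np

def partial_trace_B(rho_AB: np.ndarray, dim_A: int, dim_B: int) -> np.ndarray:
    """Trace out subsystem B from a density matrix on H_A tensor H_B (row-major order)."""
    rho = rho_AB.reshape(dim_A, dim_B, dim_A, dim_B)
    return np.trace(rho, axis1=1, axis2=3)

bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_AB = np.outer(bell, bell.conj())
rho_A = partial_trace_B(rho_AB, 2, 2)
print(rho_A)                                # 0.5 * identity
print(np.allclose(rho_A @ rho_A, rho_A))    # False: rho_A^2 != rho_A, so rho_A is mixed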

1.3. Quantum circuit


Nearly all quantum algorithms operate on multi-qubit quantum systems. When quantum op-
erators operate on two or more qubits, writing down quantum states in terms of its components as
in Eq. (1.29) quickly becomes cumbersome. The quantum circuit language offers a graphical and
compact manner for writing down the procedure of applying a sequence of quantum operators to a
quantum state. For more details see [NC00, Section 4.2, 4.3].
In the quantum circuit language, time flows from left to right: the input quantum state
appears on the left, the quantum operators are then applied in sequence, and each "wire" represents
a qubit, i.e.,

|ψi U U |ψi

Here are a few examples:

|0i —[X]— |1i      |1i —[Z]— − |1i      |0i —[H]— |+i

which is a graphical way of writing


(1.45) X |0i = |1i , Z |1i = − |1i , H |0i = |+i .
The relation between these states can be expressed in terms of the diagram (1.46): X exchanges
|0i and |1i, Z exchanges |+i and |−i, and H maps between the two pairs, i.e., H |0i = |+i and
H |1i = |−i.

Also verify that

|0i X |1i

|0i |0i
which is a graphical way of writing
(1.47) (X ⊗ I) |00i = |10i .
Note that the input state can be general, and in particular does not need to be a product state. For
example, if the input is a Bell state (1.32), we just apply the quantum operator to |00i and |11i,
respectively, multiply the results by 1/√2, and add them together. To distinguish them from other
symbols, these single qubit gates may be written either as X, Y, Z, H or (using the roman font) X, Y, Z, H.
The quantum circuit for the CNOT gate is
|ai |ai

|bi |a ⊕ bi
Here the “dot” means that the quantum gate connected to the dot only becomes active if the state of
the qubit 0 (called the control qubit) is a = 1. This justifies the name of the CNOT gate (controlled
NOT).
Similarly,

|ai |ai

|bi U U a |bi
is the controlled U gate for some unitary U . Here U a = I if a = 0. The CNOT gate can be obtained
by setting U = X.
Another commonly used two-qubit gate is the SWAP gate, which swaps the state in the 0-th
and the 1-st qubits.
|ai |bi

|bi |ai
Quantum operators applied to multiple qubits can be written in a similar manner:
qubit 0: |0i

qubit 1: |0i
U
qubit 2: |0i

qubit 3: |0i
For a multi-qubit quantum circuit, unless stated otherwise, the first qubit will be referred to as the
qubit 0, and the second qubit as the qubit 1, etc.
When the context is clear, we may also use a more compact notation for the multi-qubit
quantum operators:

|0i
⊗4
U ⇔ |0i⊗4 U ⇔ |0i⊗4 U

One useful multiple qubit gate is the Toffoli gate (or controlled-controlled-NOT, CCNOT gate).
|ai |ai
|bi |bi

|ci |(ab) ⊕ ci
We may also want to apply a n-qubit unitary U only when certain conditions are met
|1i |1i
|1i |1i
|0i |0i

|xi U U |xi
where the empty circle means that the gate being controlled only becomes active when the value of
the control qubit is 0. This can be used to write down the quantum “if” statements, i.e., when the
qubits 0, 1 are at the |1i state and the qubit 2 is at the |0i state, then apply U to |xi.
A set of qubits is often called a register (or quantum variable). For example, in the picture
above, the main quantum state of interest (an n qubit quantum state |xi) is called the system
register. The first 3 qubits can be called the control register.

1.4. Copy operation and no-cloning theorem


One of the most striking early results of quantum computation is the no-cloning theorem (by
Wootters and Zurek, as well as Dieks in 1982), which forbids generic quantum copy operations
(see also [NC00, Section 12.1]). The no-cloning theorem is a consequence of the linearity of quantum
mechanics.
Assume there is a unitary operator U that acts as the copy operation, i.e.,

(1.48) U |xi ⊗ |si = |xi ⊗ |xi ,

for any black-box state |xi and a chosen target state |si (e.g. |0n i). Then, taking two states |x1 i , |x2 i,
we have

(1.49) U |x1 i ⊗ |si = |x1 i ⊗ |x1 i , U |x2 i ⊗ |si = |x2 i ⊗ |x2 i .

Taking the inner product of the two equations, we have

(1.50)    hx1 |x2 i = hx1 |x2 i² ,

which implies hx1 |x2 i = 0 or 1. When hx1 |x2 i = 1, |x1 i , |x2 i refer to the same physical state.
Therefore a cloning operator U can at most copy states which are orthogonal to each other, and a
general quantum copy operation is impossible.
Given the ubiquity of the copy operation in scientific computing like y = x, the no-cloning
theorem has profound implications. For instance, all classical iterative algorithms for solving linear
systems require storing some intermediate variables. On a quantum computer, this copy operation is
generally not possible, or at least cannot be performed efficiently.
There are two notable exceptions to the range of applications of the no-cloning theorem. The
first is when we know how a quantum state is prepared, i.e., |xi = Ux |si for a known unitary Ux
and some |si. Then we can of course copy this specific vector |xi via

(1.51) (I ⊗ Ux ) |xi ⊗ |si = |xi ⊗ |xi .

The second is the copying of classical information. This is an application of the CNOT gate.

|xi |xi

|0i |xi

i.e.,

(1.52) CNOT |x, 0i = |x, xi , x ∈ {0, 1}.

The same principle applies to copying classical information from multiple qubits. Fig. 1.1 gives an
example of copying the classical information stored in 3 bits.

|x1 i |x1 i
|x2 i |x2 i
|x3 i |x3 i

|0i |x1 i

|0i |x2 i

|0i |x3 i

Figure 1.1. Copying classical information using multi-qubit CNOT gates.

In general, a multi-qubit CNOT operation can be used to perform the classical copying operation
in the computational basis. Note that in the circuit model, this can be implemented with a depth
1 circuit, since they all act on different qubits.
Example 1.8. Let us verify that the CNOT gate does not violate the no-cloning theorem, i.e., it
cannot be used to copy a superposition of classical bits |xi = a |0i + b |1i. Direct calculation shows
(1.53)    CNOT |xi ⊗ |0i = a |00i + b |11i ≠ |xi ⊗ |xi .
In particular, if |xi = |+i, then CNOT creates a Bell state.

The quantum no-cloning theorem implies that there does not exist a unitary U that performs
the deleting operation, which resets a black-box state |xi to |0n i. This is because such a deleting
unitary can be viewed as a copying operation
(1.54) U |0n i ⊗ |xi = |0n i ⊗ |0n i .
Then, taking |x1 i , |x2 i that are orthogonal to each other, applying the deleting gate, and computing
the inner products, we obtain
(1.55) 0 = hx1 |x2 i = h0n |0n i = 1,
which is a contradiction.
A more common way to express the no-deleting theorem is in terms of the time reversed dual
of the no-cloning theorem: in general, given two copies of some arbitrary quantum state, it is
impossible to delete one of the copies. More specifically, there is no unitary U performing the
following operation using known states |si , |s′ i,

(1.56)    U |xi |xi |si = |xi |0n i |s′ i
for an arbitrary unknown state |xi (Exercise 1.7).

1.5. Measurement
The quantum measurement applied to any qubit, by default, measures the outcome in the
computational basis. For example,
|0i H

outputs 0 or 1 each w.p. 1/2. We may also measure some of the qubits in a multi-qubit system.

(1.57)    [Circuit diagram: measurements applied to some of the qubits of a multi-qubit circuit,
          written equivalently using the compact register notation |0i⊗2 , |ψi.]

There are two important principles related to quantum measurements: the principle of deferred
measurement, and the principle of implicit measurement. At first glance, both principles may
seem counterintuitive.
The principle of deferred measurement states that measurement operations can always be moved
from an intermediate stage of a quantum circuit to the end of the circuit. This is because even if
a measurement is performed as an intermediate step in a quantum circuit, and the result of the
measurement is used to conditionally control subsequent quantum gates, such classical controls can
always be replaced by quantum controls, and the result of the quantum measurement is postponed
to later.
Example 1.9 (Deferring quantum measurements). Consider the circuit

|0i H

|0i X
Here the double line denotes the classical control operation. The outcome is that qubit 0 has
probability 1/2 of outputting 0, and the qubit 1 is at state |0i. Qubit 0 also has probability 1/2 of
outputting 1, and the qubit 1 is at state |1i.
However, we may replace the classical control operation after the measurement by a quantum
controlled X (which is CNOT), and measure the qubit 0 afterwards:

|0i H

|0i
It can be verified that the result is the same. Note that CNOT acts as the classical copying
operation. So qubit 1 really stores the classical information (i.e., in the computational basis) of
qubit 0. 
Example 1.10 (Deferred measurement requires extra qubits). The procedure of deferring quantum
measurements using CNOTs is general, and important. Consider the following circuit:
|0i H H
The probability of obtaining 0, 1 is 1/2, respectively. However, if we simply “defer” the measurement
to the end by removing the intermediate measurement, we obtain
|0i H H
The result of the measurement is deterministically 0! The correct way of deferring the intermediate
quantum measurement is to introduce another qubit

|0i H H

|0i

Measuring the qubit 0, we obtain 0 or 1 w.p. 1/2, respectively. Hence when deferring quantum
measurements, it is necessary to store the intermediate information in extra (ancilla) qubits, even
if such information is not used afterwards. 
The principle of implicit measurements states that at the end of a quantum circuit, any un-
measured qubit may be assumed to be measured. More specifically, assume the quantum system
consists of two subsystems A and B. If the qubits of A are to be measured at the end of the circuit,
the results of the measurements do not depend on whether the qubits of B are measured or not.
Recall from Eq. (1.44) that a measurement on the subsystem A only depends on the reduced den-
sity matrix ρA . So we only need to show that ρA does not depend on the measurement in B.
To see why this is the case, let {Pi } be the projectors onto the computational basis of B. Before
the measurement, the density operator is ρ. If we measure the subsystem B, the resulting density
operator is transformed into
(1.58)    ρ′ = Σi (I ⊗ Pi )ρ(I ⊗ Pi ).
Then it can be verified that

(1.59)    ρ′A = TrB [ρ′ ] = TrB [ρ Σi (I ⊗ Pi )] = TrB [ρ] = ρA .
This proves the principle of implicit measurements.
By definition, the output of all quantum algorithms must be obtained through measurements,
and hence the measurement outcome is probabilistic in general. If the goal is to compute the
expectation value of a quantum observable MA acting on a subsystem A, then its variance is
(1.60)    Varρ [MA ] = Tr[MA² ρA ] − (Tr[MA ρA ])² .
The number of samples N needed to estimate Tr[MA ρA ] to additive precision ϵ satisfies

(1.61)    √(Varρ [MA ]/N ) ≤ ϵ    ⇒    N ≥ Varρ [MA ]/ϵ² ,

which only depends on ρA .
Example 1.11 (Estimating success probability on one qubit). Let A be the single qubit to be
measured in the computational basis, and we are interested in the accuracy in estimating the success
probability of obtaining 1, i.e., p. This can be realized as an expectation value with MA = |1i h1|,
and p = Tr[MA ρA ]. Note that MA² = MA , then

(1.62)    Varρ [MA ] = p − p² = p(1 − p).

Hence to estimate p to additive error ϵ, the number of samples needed satisfies

(1.63)    N ≥ p(1 − p)/ϵ² .

Note that if p is close to 0 or 1, the number of samples needed is also very small: indeed, the
outcome of the measurement becomes increasingly deterministic in this case!

If we are interested in estimating p to multiplicative accuracy ϵ, then the number of samples satisfies

(1.64)    N ≥ p(1 − p)/(p²ϵ²) = (1 − p)/(pϵ²) ,

and the task becomes increasingly difficult as p approaches 0. 
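The sample-complexity estimates (1.63)-(1.64) can be illustrated with a small Monte Carlo
experiment (a sketch, not from the notes): with N ≈ p(1 − p)/ϵ² samples the empirical frequency
estimates p to additive error about ϵ.

import numpy as np

rng = np.random.default_rng(0)
p, eps = 0.1, 0.01
n_samples = int(np.ceil(p * (1 - p) / eps**2))   # ~ 900 samples for additive error eps
outcomes = rng.random(n_samples) < p             # simulated 0/1 measurement outcomes
p_hat = outcomes.mean()
print(n_samples, p_hat, abs(p_hat - p))          # deviation is on the order of eps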

1.6. Linear error growth and Duhamel’s principle


If a quantum algorithm denoted by a unitary U can be decomposed into a series of simpler
unitaries as U = UK · · · U1 , and if we can implement each Ui to precision ϵ, then what is the global
error? We now introduce a simple technique connecting the local error with the global error. In
the context of quantum computation, this is often referred to as the "hybrid argument".

Proposition 1.12 (Hybrid argument). Given unitaries U1 , . . . , UK , Ũ1 , . . . , ŨK ∈ CN ×N satisfying

(1.65)    ∥Ui − Ũi ∥ ≤ ϵ,    ∀i = 1, . . . , K,

we have

(1.66)    ∥UK · · · U1 − ŨK · · · Ũ1 ∥ ≤ Kϵ.

Proof. Use a telescoping series

          UK · · · U1 − ŨK · · · Ũ1
(1.67)      = (UK · · · U2 U1 − UK · · · U2 Ũ1 ) + (UK · · · U3 U2 Ũ1 − UK · · · U3 Ũ2 Ũ1 ) + · · ·
                + (UK ŨK−1 · · · Ũ1 − ŨK ŨK−1 · · · Ũ1 )
            = UK · · · U2 (U1 − Ũ1 ) + UK · · · U3 (U2 − Ũ2 )Ũ1 + · · · + (UK − ŨK )ŨK−1 · · · Ũ1 .

Since all Ui , Ũi are unitary matrices, we readily have

(1.68)    ∥UK · · · U1 − ŨK · · · Ũ1 ∥ ≤ Σi=1,...,K ∥Ui − Ũi ∥ ≤ Kϵ.

In other words, if we can implement each local unitary to precision ϵ, the global error grows
at most linearly with respect to the number of gates and is bounded by Kϵ. The telescoping series
Eq. (1.67), as well as the hybrid argument, can also be seen as a discrete analogue of the variation
of constants method (also called Duhamel's principle).
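The linear error growth of Proposition 1.12 is easy to observe numerically. The following sketch
(assuming numpy and scipy are available; not part of the original notes) perturbs K random
unitaries by at most ϵ in operator norm and checks that the product deviates by at most Kϵ.

import numpy as np
from scipy.linalg import expm
from scipy.stats import unitary_group

rng = np.random.default_rng(1)
K, N, eps = 20, 8, 1e-3

U_exact, U_approx = np.eye(N, dtype=complex), np.eye(N, dtype=complex)
for _ in range(K):
    U = unitary_group.rvs(N, random_state=rng)
    H = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    H = (H + H.conj().T) / 2                  # random Hermitian perturbation
    H *= eps / np.linalg.norm(H, 2)           # ensures ||U - U e^{-iH}|| <= eps
    U_tilde = U @ expm(-1j * H)               # the "imperfect" implementation of U
    U_exact, U_approx = U @ U_exact, U_tilde @ U_approx

err = np.linalg.norm(U_exact - U_approx, 2)
print(err, K * eps, err <= K * eps)           # global error bounded by K * eps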
Proposition 1.13 (Duhamel's principle for Hamiltonian simulation). Let U (t), Ũ (t) ∈ CN ×N
satisfy

(1.69)    i∂t U (t) = HU (t),    i∂t Ũ (t) = H Ũ (t) + B(t),    U (0) = Ũ (0) = I,

where H ∈ CN ×N is a Hermitian matrix, and B(t) ∈ CN ×N is an arbitrary matrix. Then

(1.70)    Ũ (t) = U (t) − i ∫₀ᵗ U (t − s)B(s) ds,

and

(1.71)    ∥Ũ (t) − U (t)∥ ≤ ∫₀ᵗ ∥B(s)∥ ds.

Proof. Directly verify that Eq. (1.70) is the solution to the differential equation. 

As a special case, consider B(t) = E(t)Ũ (t); then Eq. (1.70) becomes

(1.72)    Ũ (t) = U (t) − i ∫₀ᵗ U (t − s)E(s)Ũ (s) ds,

and

(1.73)    ∥Ũ (t) − U (t)∥ ≤ ∫₀ᵗ ∥E(s)∥ ds.

This is a direct analogue of the hybrid argument in the continuous setting.

1.7. Universal gate sets and reversible computation


In classical computation, there are many universal gate sets, in the sense that any classical gate
can be represented as a combination of gates from the set. For example, the NAND gate (“Not
AND”) alone forms a universal gate set [NC00, Section 3.1.2]. The NOR gate (“Not OR”) is also a
universal gate set.
In the quantum setting, any unitary operator on n qubits can be implemented using 1- and
2-qubit gates [NC00, Section 4.5]. It is desirable to come up with a set of discrete universal gates,
but this means that we need to give up the notion that the unitary U can be exactly represented.
Instead, a set of quantum gates S is universal if, given any unitary operator U and desired precision
ϵ, we can find U1 , . . . , Um ∈ S such that

(1.74)    ∥U − Um Um−1 · · · U1 ∥ ≤ ϵ.

Here ∥A∥ = sup over hψ|ψi = 1 of ∥A |ψi∥ is the operator norm (also called the spectral norm) of A,
and ∥|ψi∥ = √hψ|ψi is the vector 2-norm. There are many possible choices of universal gate sets, e.g.
{H, T, CNOT}. Another universal gate set is {H, Toffoli}, which only involves real numbers.
Are some universal gate sets better than others? The Solovay-Kitaev theorem states that all
choices of universal gate sets are asymptotically equivalent (see e.g. [Chi21, Chapter 2]):
Theorem 1.14 (Solovay-Kitaev). Let S, T be two universal gate sets that are closed under
inverses. Then any m-gate circuit using the gate set S can be implemented to precision ϵ using
a circuit of O(m · polylog(m/ϵ)) gates from the gate set T , and there is a classical algorithm for
finding this circuit in time O(m · polylog(m/ϵ)).
Another natural question is about the computational power of quantum computers. Perhaps
surprisingly, it is very difficult to prove that a quantum computer is more powerful than a classical
computer. But is a quantum computer at least as powerful as a classical computer? The answer is
yes! More specifically, any classical circuit can also be asymptotically efficiently implemented using
a quantum circuit.
The proof rests on the fact that the classical universal gate can be efficiently simulated using quantum
circuits. Note that this is not a straightforward process: NAND, and other classical gates (such as
AND, OR etc.) are not reversible gates! Hence the first step is to perform classical computation

with reversible gates. More specifically, any irreversible classical gate x 7→ f (x) can be made into
a reversible classical gate

(1.75) (x, y) 7→ (x, y ⊕ f (x)).

In particular, we have (x, 0) 7→ (x, f (x)) computed in a reversible way. The key idea is to store all
intermediate steps of the computation (see [NC00, Section 3.2.5] for more details).
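As a concrete instance of (1.75) (an illustrative sketch, not from the notes), the irreversible AND
gate becomes the reversible map (x1 , x2 , y) 7→ (x1 , x2 , y ⊕ (x1 AND x2 )), which is exactly the action
of the Toffoli gate on the computational basis.

from itertools import product

def f_and(x1: int, x2: int) -> int:
    return x1 & x2

def reversible_and(x1: int, x2: int, y: int) -> tuple:
    # (x, y) -> (x, y XOR f(x)) with x = (x1, x2): the Toffoli (CCNOT) action
    return (x1, x2, y ^ f_and(x1, x2))

# The map is a bijection on {0,1}^3, hence reversible (a permutation of the basis states).
images = {reversible_and(*bits) for bits in product([0, 1], repeat=3)}
print(len(images) == 8)          # True: all 8 outputs are distinct
print(reversible_and(1, 1, 0))   # (1, 1, 1): with y = 0 the last bit holds f(x)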
On a quantum computer, storing all intermediate computational steps indefinitely creates two
problems: (1) a tremendous waste of quantum resources, and (2) the intermediate results stored in
some extra qubits are still entangled with the quantum state of interest. So if the environment
interferes with the intermediate results, the quantum state of interest is also affected.
Fortunately, both problems can be solved by a step called “uncomputation”. In order to imple-
ment a Boolean function f : {0, 1}n → {0, 1}m , we assume there is an oracle

(1.76) |0m i |xi 7→ |f (x)i |xi ,

where |0m i comes from an m-qubit output register. The oracle is often further implemented with
the help of a working register (a.k.a. “garbage” register) such that

(1.77) Uf : |0w i |0m i |xi 7→ |g(x)i |f (x)i |xi .

From the no-deleting theorem, there is no generic unitary operator that can set a black-box
state to |0w i. In order to set the working register back to |0w i while keeping the input and output
state, we introduce yet another m-qubit ancilla register initialized at |0m i. Then we can use an
m-qubit CNOT controlled on the output register and obtain

(1.78)    |0m i |g(x)i |f (x)i |xi 7→ |f (x)i |g(x)i |f (x)i |xi
          (registers, left to right: ancilla, working, output, input).

It is important to remember that in the operation above, the multi-qubit CNOT gate only performs
the classical copying operation in the computational basis, and does not violate the no-cloning
theorem.
Recalling that Uf⁻¹ = Uf† , we have

(1.79)    (Im ⊗ Uf† ) |f (x)i |g(x)i |f (x)i |xi = |f (x)i (Uf† |g(x)i |f (x)i |xi) = |f (x)i |0w i |0m i |xi .

Finally we apply an m-qubit SWAP operator on the ancilla and output registers to obtain

(1.80) |f (x)i |0w i |0m i |xi 7→ |0m i |0w i |f (x)i |xi .

After this procedure, both the ancilla and the working register are set to the initial state. They
are no longer entangled to the input or output register, and can be reused for other purposes. This
procedure is called uncomputation. The circuit is shown in Fig. 1.2.

|0m i |0m i

|0w i |0w i

|0m i Uf Uf† |f (x)i

|xi |xi

Figure 1.2. Circuit for uncomputation. The CNOT and SWAP operators indicate
the multi-qubit copy and swap operations, respectively.

Remark 1.15 (Discarding working registers). After the uncomputation as shown in Fig. 1.2, the
first two registers are unchanged before and after the application of the circuit (though they are
changed during the intermediate steps). Therefore Fig. 1.2 effectively implements a unitary

(1.81) (Im+w ⊗ Vf ) |0m i |0w i |0m i |xi = |0m i |0w i |f (x)i |xi

or equivalently

(1.82) Vf |0m i |xi = |f (x)i |xi .

In the definition of Vf , all working registers have been discarded (on paper). This allows us to
simplify the notation and focus on the essence of the quantum algorithms under study. 

Using the technique of uncomputation, if the map x 7→ f (x) can be efficiently implemented on
a classical computer, then we can implement this map efficiently on a quantum computer as well.
To do this, we first turn it into a reversible map (1.75). All reversible single-bit and two-bit classical
gates can be implemented using single-qubit and two-qubit quantum gates. So the reversible map
can be made into a unitary operator

(1.83) Uf : |x, yi 7→ |x, y ⊕ f (x)i

on a quantum computer. This proves that a quantum computer is at least as powerful as classical
computers.
The unitary transformation Uf in (1.83) can be applied to any superposition of states in the
computational basis, e.g.
(1.84)    Uf : (1/√(2^n)) Σx∈{0,1}n |x, 0m i 7→ (1/√(2^n)) Σx∈{0,1}n |x, f (x)i .

This does not necessarily mean that we can efficiently implement the map |xi 7→ |f (x)i. However, if
f is a bijection, and we have access to the inverse of the reversible circuit for computing f −1 , then
we may use the technique of uncomputation to implement such a map (Exercise 1.9).

1.8. Fixed point number representation and classical arithmetic operations


Let [N ] = { 0, 1, . . . , N − 1 }. Any integer k ∈ [N ] where N = 2n can be expressed as an n-bit
string k = kn−1 · · · k0 with ki ∈ {0, 1}. This is called the binary representation of the integer k.
It should be interpreted as

(1.85)    k = Σi∈[n] ki 2^i .

The number k divided by 2^m (0 ≤ m ≤ n) can be written as (note that the binary point is shifted
to be after km ):

(1.86)    a = k/2^m = Σi∈[n] ki 2^(i−m) =: (kn−1 · · · km .km−1 · · · k0 ).

The most common case is m = n, where

(1.87)    a = k/2^n = Σi∈[n] ki 2^(i−n) =: (0.kn−1 · · · k0 ).

Sometimes we may also write a = 0.k1 · · · kn , so that ki is the i-th digit of a in the binary
representation. For a given real number 0 ≤ a < 1 written as

(1.88)    a = (0.k1 · · · kn kn+1 · · · ),

the number (0.k1 · · · kn ) is called the n-bit fixed point representation of a. Therefore to represent a
to additive precision ϵ, we will need n = ⌈log2 (1/ϵ)⌉ qubits. If the sign of a is also important, we may
reserve one extra bit to indicate its sign.
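A few lines of Python (illustrative, not from the notes) make the fixed point representation (1.88)
and the bit-count estimate concrete: truncating a ∈ [0, 1) to n binary digits incurs an additive error
below 2^(−n), so n = ⌈log2 (1/ϵ)⌉ bits suffice for precision ϵ.

import math

def fixed_point_bits(a: float, n: int) -> list:
    """Return the first n binary digits k_1, ..., k_n of a in [0, 1)."""
    bits = []
    for _ in range(n):
        a *= 2
        bit = int(a)        # next binary digit
        bits.append(bit)
        a -= bit
    return bits

eps = 1e-3
n = math.ceil(math.log2(1 / eps))                         # 10 bits
bits = fixed_point_bits(0.1, n)
approx = sum(b * 2**-(i + 1) for i, b in enumerate(bits))
print(n, bits, abs(0.1 - approx) < eps)                   # True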
Together with the reversible computational model, we can perform classical arithmetic oper-
ations, such as (x, y) 7→ x + y, (x, y) 7→ xy, x 7→ xα , x 7→ cos(x) etc. using reversible quantum
circuits. The number of ancilla qubits and the number of elementary gates needed for implementing
such quantum circuits are O(poly(n)) (see [RP11, Chapter 6] for more details).
It is worth commenting that while a quantum computer is theoretically as powerful as a classical
computer, there is a very significant overhead in implementing reversible classical circuits on
quantum devices, both in terms of the number of ancilla qubits and the circuit depth.

1.9. Fault tolerant computation


All previous discussions assume that quantum operations can be performed perfectly. Due
to the immense technical difficulty of realizing quantum computers, both quantum gates and
quantum measurements may involve (significant) errors, particularly on near-term quantum devices.
However, the threshold theorem states that if the noise in individual quantum gates is below a
certain constant threshold (around 10⁻⁴ or higher), it is possible to efficiently perform an arbitrarily
large quantum computation with any desired precision (see [NC00, Section 10.6]). This procedure
requires quantum error correction protocols.
This course will not discuss any details on quantum error corrections. We always assume fault-
tolerant protocols have been implemented, and all errors come from either approximation errors
at the mathematical level, or Monte Carlo errors in the readout process due to the probabilistic
nature of the measurement process.

1.10. Complexity of quantum algorithms


Let n be the number of qubits needed to represent the input. A quantum algorithm is efficient
if the number of gates in the quantum circuit is O(poly(n)). Due to the probabilistic nature of the
measurement outcome, we are typically satisfied if a quantum algorithm can produce the correct
answer with sufficiently high probability p. For a decision problem that asks for a binary answer
0 or 1, we require p > 2/3 (or at least p > 1/2 + 1/poly(n)). For other problems for which we have
an efficient procedure to check the correctness of the answer, we require p = Ω(1). Repeating this
process many times and applying the Chernoff bound [NC00, Box 3.4], we can make the probability
of outputting an incorrect answer vanishingly small.
In quantum algorithms, the computational cost is often measured in terms of the query com-
plexity. Assume that we have access to a black-box unitary operator Uf (e.g. the one used in the
reversible computation), which is often called a quantum oracle. Our goal is to perform a given
task using as few queries as possible to Uf .
Example 1.16 (Query access to a boolean function). Let f : {0, 1}n → {0, 1} be a boolean
function, which can be queried via the following unitary
(1.89) Uf |xi = (−1)f (x) |xi .
This is called a phase kickback, i.e., the value of f (x) is returned as a phase factor. The phase
kickback is an important tool in many quantum algorithms, e.g. Grover’s algorithm. Note that 1)
Uf can be applied to a superposition of states in the computational basis, and 2) Having query access
to f (x) does not mean that we know everything about f (x), e.g. finding the set { x | f (x) = 0 }
can still be a difficult task. 
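For small instances, the phase oracle (1.89) is just a diagonal matrix with entries (−1)f (x) . The
numpy sketch below (illustrative; the marked element x = 5 is hypothetical) applies it to a uniform
superposition and flips the sign of exactly one amplitude.

import numpy as np

n = 3
N = 2**n
f = lambda x: int(x == 5)                    # hypothetical boolean function marking x = 5
U_f = np.diag([(-1.0)**f(x) for x in range(N)])

psi = np.ones(N) / np.sqrt(N)                # uniform superposition over the computational basis
print(U_f @ psi)                             # amplitude of |5> changes sign, all others unchanged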
Example 1.17 (Partially specified quantum oracles). When designing quantum algorithms, it is
common that we are not interested in the behavior of the entire unitary matrix Uf , but only Uf
applied to certain vectors. For instance, for a (n + 1)-qubit state space, we are only interested in
(1.90) Uf |0i |xi = |0i (A |xi) + |1i (B |xi).
This means that we have only defined the first block-column of Uf as (remember that the row-major
order is used)

(1.91)    Uf = [ A ∗ ; B ∗ ].
Here A, B are N × N matrices, and ∗ stands for an arbitrary N × N matrix so that Uf is unitary.
Of course in order to implement Uf into quantum gates, we still need to specify the content of ∗.
However, at the conceptual level, the partially specified unitary (1.90) simplifies the design process
of quantum oracles. 
The concept of query complexity hides the implementation details of Uf , and in some cases we
can prove lower bounds on the number of queries to solve a certain problem, e.g. in the case of
Grover’s search (proving a lower bound of the number of gates among all quantum algorithms can
be much harder). Furthermore, once we have access to the number of elementary gates needed to
implement Uf , we obtain immediately the gate complexity of the total algorithm. However, some
queries can be (provably) difficult to implement, and then there can be a large gap between the query
complexity and gate complexity. In order to obtain a meaningful query complexity analysis, one
should also make sure that other components of the quantum algorithm will not end up dominating
the total gate complexity, when all factors are taken into account.

Another important measure of the complexity is the circuit depth, i.e., the maximum number
of gates along any path from an input to an output. Since quantum gates can be performed in
parallel, the circuit depth is approximately equivalent to the concept of “wall-clock time” in classical
computation, i.e., the real time needed for a quantum computer to carry out a certain task. Since
quantum states can only be preserved for a short period of time (called the coherence time), the
circuit depth also provides an approximate measure of whether the quantum algorithm exceeds the
coherence limit of a given quantum computer. In many scenarios, the maximum coherence time
is the most severe limiting factor. When possible, it is often desirable to reduce the circuit depth,
even if it means that the quantum circuit needs to be carried out many more times.
Let us summarize the basic components of a typical quantum algorithm: the set of qubits can be
separated into system registers (storing quantum states of interest) and ancilla registers (auxiliary
registers needed to implement the unitary operation acting on system registers). Starting from an
initial state, apply a series of one-/two-qubit gates, and perform measurements. Uncomputation
should be performed whenever possible. Within the ancilla registers, if a register can be “freed”
after the uncomputation, it is called a working register. Since working registers can be reused for
other purposes, the cost of working registers is often not (explicitly) factored into the asymptotic
cost analysis in the literature.

1.11. Notation
We use ∥ · ∥ to denote the vector or matrix 2-norm: when v is a vector we denote by ∥v∥ its 2-norm,
and when A is a matrix we denote by ∥A∥ its operator norm. Other matrix and vector norms will be
introduced when needed. Unless otherwise specified, a vector v ∈ CN is an unnormalized vector,
and a normalized vector (stored as a quantum state) is denoted by |vi = v/∥v∥. A vector v can
be expressed in terms of its j-th component as v = (vj ) or (v)j = vj . We use 0-based indexing, i.e.,
j = 0, . . . , N − 1 or j ∈ [N ]. When 1-based indexing is used, we will explicitly write j = 1, . . . , N .
We use the following asymptotic notations besides the usual O (or "big-O") notation: we write
f = Ω(g) if g = O(f ); f = Θ(g) if f = O(g) and g = O(f ); f = Õ(g) if f = O(g polylog(g)).

Exercise 1.1. Prove that any unitary matrix U ∈ CN ×N can be written as U = eiH , where H is
a Hermitian matrix.
Exercise 1.2. Prove Eq. (1.26).

Exercise 1.3. Write down the matrix representation of the SWAP gate, as well as the √SWAP
and iSWAP gates.
Exercise 1.4. Prove that the Bell state Eq. (1.32) cannot be written as any product state |ai ⊗ |bi.
Exercise 1.5. Prove Eq. (1.39) holds for a general mixed state ρ.
Exercise 1.6. Prove that an ensemble of admissible density operators is also a density operator.
Exercise 1.7. Prove the no-deleting theorem in Eq. (1.56).
Exercise 1.8. Work out the circuit for implementing Eq. (1.78) and Eq. (1.80).
Exercise 1.9. Prove that if f : {0, 1}n → {0, 1}n is a bijection, and we have access to the inverse
mapping f −1 , then the mapping Uf : |xi 7→ |f (x)i can be implemented on a quantum computer.
Exercise 1.10. Prove Eq. (1.58) and Eq. (1.59).
CHAPTER 2

Grover’s algorithm

Now we will introduce a few basic quantum algorithms, of which the ideas and variants are
present in numerous other quantum algorithms. Hence they are called “quantum primitives”. There
is no official ruling on which quantum algorithms qualify for being included in the set of quantum
primitives, but the membership of Grover’s algorithm, quantum Fourier transform, quantum phase
estimation, and Trotter based Hamiltonian simulation should not be controversial. We first intro-
duce Deutsch’s algorithm, which is arguably one of the simplest quantum algorithms carrying out
a well-defined task.

2.1. The first quantum algorithm: Deutsch’s algorithm


Assume we have two boxes, each of them may contain either an apple or an orange. We would
like to answer: whether the two boxes contain the same type of fruit (but do not need to answer
whether it is apple or orange).
This seems to be a weird question. If the content of a box can only be checked by opening
it, then to answer the question we would need to open two boxes and check what is inside. It is
impossible to answer whether the fruit types are the same without knowing the types! Nevertheless,
this is precisely the question to be addressed by Deutsch’s algorithm.

Mathematically, consider a boolean function f : {0, 1} → {0, 1}. The question is whether
f (0) = f (1) or f (0) ≠ f (1). A quantum fruit-checker assumes access to f via the following
quantum oracle:

(2.1) Uf |x, yi = |x, y ⊕ f (x)i , x, y ∈ {0, 1}.

A classical fruit-checker can only query Uf in the computational basis as

(2.2) Uf |0, 0i = |0, f (0)i , Uf |1, 0i = |1, f (1)i .

After these two queries, we can measure qubit 1 with a deterministic outcome, and answer whether
f (0) = f (1). However, a quantum fruit-checker can apply Uf to a linear combination of states in
the computational basis.
Let us first check that Uf is unitary:

(2.3)    hx′ , y ′ |Uf† Uf |x, yi = hx′ , y ′ ⊕ f (x′ )|x, y ⊕ f (x)i
                                 = hx′ |xi hy ′ ⊕ f (x′ )|y ⊕ f (x)i
                                 = δx,x′ δy,y′ ,

which gives Uf† Uf = I.
The idea behind Deutsch's algorithm is to convert the oracle (2.1) into a phase kickback as in
Eq. (1.89). Take |yi = |−i = (|0i − |1i)/√2. Then

(2.4)    Uf |x, yi = (|x, f (x)i − |x, 1 ⊕ f (x)i)/√2 = (−1)f (x) |x, yi .

Noting that |yi = HX |0i, Eq. (2.4) can also be interpreted as
(2.5) (I ⊗ XH)Uf (I ⊗ HX) |x, 0i = (−1)f (x) |x, 0i .
The application of XH can be viewed as the step of uncomputation. Neglecting the qubit 1 which
does not change before and after the application, we can focus on the first qubit only, which
effectively defines a unitary
(2.6)    Ũf |xi = (−1)f (x) |xi .
Hence the information of f (x) is stored as a phase factor (0 or π). Recall that using a Hadamard
gate H |0i = |+i , H |1i = |−i, the quantum circuit of Deutsch’s algorithm is

|0i — H — Ũf — H —

or in the commonly seen form in Fig. 2.1.

|0i H x x H
Uf
|1i H y y ⊕ f (x)

Figure 2.1. Quantum circuit for Deutsch’s algorithm.

The answer is embedded in the measurement outcome of qubit 0. To verify this:

(2.7)    |0, 1i  —(H ⊗ H)→  |+, −i = (1/√2)(|0i + |1i) ⊗ |−i
                 —(Uf )→     (1/√2)((−1)f (0) |0i + (−1)f (1) |1i) ⊗ |−i
                 —(H ⊗ I)→   (1/2)((−1)f (0) + (−1)f (1) ) |0, −i
                              + (1/2)((−1)f (0) − (−1)f (1) ) |1, −i .
So if f (0) = f (1), the final state is ± |0, −i. Measuring qubit 0 returns 0 deterministically (the
global phase factor is irrelevant). Similarly, if f (0) ≠ f (1), the final state is ± |1, −i. Measuring
qubit 0 returns 1 deterministically. In summary, only one query to Uf is sufficient to answer
whether the two boxes contain the same type of fruit. The procedure is equally counterintuitive.
Note that a classical fruit checker as implemented in Eq. (2.2) naturally receives the information by
measuring qubit 1. On the other hand, Deutsch’s algorithm only uses qubit 1 as a signal qubit, and
all the information is retrieved by measuring qubit 0, which, at least from the classical perspective
of Eq. (2.2), seems to contain no information at all!
Now we have seen that in terms of the query complexity, a quantum fruit-checker is clearly more
efficient. However, it is a fair question how to implement the oracle Uf , especially in a way that
somehow does not already reveal the values of f (0), f (1). We give the implementation of some cases
of Uf in Example 2.2. In general, a query complexity argument alone may not be convincing enough
to show that quantum computers are better than classical computers, and the gate complexity matters. This
will not be the last time we hide away such “implementation details” of quantum oracles.

Remark 2.1 (Deutsch–Jozsa algorithm). The single-qubit version of the Deutsch algorithm can
be naturally generalized to the n-qubit version, called the Deutsch–Jozsa algorithm. Given N = 2n
boxes with an apple or an orange in each box, and the promise that either 1) all boxes contain
the same type of fruit or 2) exactly half of the boxes contain apples and the other half contain
oranges, we would like to distinguish the two cases. Mathematically, given the promise that a
boolean function f : {0, 1}n → {0, 1} is either a constant function (i.e., | { x | f (x) = 0 } | = 0 or
2n ) or a balanced function (i.e., | { x | f (x) = 0 } | = 2n−1 ), we would like to decide to which type
f belongs. We refer to [NC00, Section 1.4.4] for more details. 

Example 2.2 (Qiskit example of Deutsch’s algorithm). For f (0) = f (1) = 1 (constant case), we
can use Uf = I ⊗ X. For f (0) = 0, f (1) = 1 (balanced case), we can use Uf = CNOT.
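The Qiskit listing accompanying this example is not reproduced above; the following is a minimal
sketch of the two cases (assuming Qiskit and its quantum_info module are installed). Measuring
qubit 0 in |1i with probability 0 identifies the constant oracle, and with probability 1 the balanced
one, in agreement with (2.7).

from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

def deutsch_circuit(oracle: str) -> QuantumCircuit:
    qc = QuantumCircuit(2)
    qc.x(1)                       # prepare qubit 1 in |1>
    qc.h(0)
    qc.h(1)                       # state |+>|-> before the oracle
    if oracle == "constant":      # f(0) = f(1) = 1: U_f = I tensor X
        qc.x(1)
    elif oracle == "balanced":    # f(0) = 0, f(1) = 1: U_f = CNOT
        qc.cx(0, 1)
    qc.h(0)                       # final Hadamard on qubit 0
    return qc

for case in ["constant", "balanced"]:
    probs = Statevector(deutsch_circuit(case)).probabilities([0])
    print(case, probs)            # [1, 0] for constant, [0, 1] for balanced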



2.2. Unstructured search problem


Assume we have N = 2n boxes, and we are given the promise that only one of the boxes
contains an orange, and each of the remaining boxes contains an apple. The goal is to find the box
that contains the orange.
Mathematically, given a boolean function f : {0, 1}n → {0, 1} and the promise that there exists
a unique marked state x0 such that f (x0 ) = 1, we would like to find x0 . This is called an unstructured
search problem. Classically, there is no better method than opening (N − 1) boxes in the worst
case to determine x0 .
The quantum algorithm below, called Grover’s algorithm, relies on access to an oracle
(2.8) Uf |x, yi = |x, y ⊕ f (x)i , x ∈ {0, 1}n , y ∈ {0, 1},

and can find x0 using O( N ) queries. A classical computer again can only query Uf in the
computational basis, and Grover’s algorithm achieves a quadratic speedup in terms of the query
complexity.
The origin of the quadratic speedup can be summarized as follows: while classical proba-
bilistic algorithms work with probability densities, quantum algorithms work with wavefunction
amplitudes, of which the square gives the probability densities. More specifically, we start from a
uniform superposition of all states as the initial state
1 X
(2.9) |ψ0 i = √ |xi .
N x∈[N ]

This state can be prepared using Hadamard gates as


(2.10) |ψ0 i = H ⊗n |0n i .
√ √
We would like to amplify the desired amplitude corresponding to |x0 i from 1/ N to p = Ω(1).

We demonstrate below that this requires O( N ) queries to Uf . After this procedure, by measuring
the final state in the computational basis, we obtain some output state |xi. We can check whether
x = x0 by applying another query of Uf according to Uf |x, 0i = |x, f (x)i. The probability of
obtaining f (x) = 1 is p. If f (x) 6= 1, we repeat the process. Then after O(1/p) times of repetition,
we can obtain x0 with high probability.
The first step of Grover’s algorithm is to turn the oracle (2.8) into a phase kickback. For this
we take |yi = |−i, and for any x ∈ {0, 1}n ,
1
(2.11) Uf |x, −i = √ (|x, f (x)i − |x, 1 ⊕ f (x)i) = (−1)f (x) |x, −i .
2
Any quantum state |ψi can be decomposed as
(2.12) |ψi = α |x0 i + β |ψ ⊥ i ,
where |ψ ⊥ i is the component of |ψi orthogonal to |x0 i, i.e., hψ ⊥ |x0 i = 0. We have
(2.13) Uf |ψi ⊗ |−i = (−α |x0 i + β |ψ ⊥ i) ⊗ |−i .
Here the minus sign is gained through the phase kickback. Discarding the |−i which is unchanged
by applying Uf , we obtain an n-qubit unitary
(2.14) Rx0 (α |x0 i + β |ψ ⊥ i) = −α |x0 i + β |ψ ⊥ i .
2.2. UNSTRUCTURED SEARCH PROBLEM 31

Therefore Rx0 is a reflection operator across the hyperplane orthogonal to |x0 i, i.e., the Householder
reflector
(2.15) Rx0 = I − 2 |x0 i hx0 | .
Let us write
(2.16) |ψ0 i = sin(θ/2) |x0 i + cos(θ/2) |ψ0⊥ i ,
with θ = 2 arcsin √1N ≈ √2N , and |ψ0⊥ i = √N1−1 x6=x0 |xi is a normalized state orthogonal to |x0 i.
P

Then
(2.17) Rx0 |ψ0 i = − sin(θ/2) |x0 i + cos(θ/2) |ψ0⊥ i .
So span{|x0 i , |ψ0⊥ i} is an invariant subspace of Rx0 .
The next key step is to consider another Householder reflector with respect to |ψ0 i. For later
convenience we add a global phase factor −1 (which is irrelevant to the physical outcome):
(2.18) Rψ0 = −(I − 2 |ψ0 i hψ0 |).
Direct computation shows
Rψ0 Rx0 |ψ0 i =Rψ0 (|ψ0 i − 2 sin(θ/2) |x0 i)
=(|ψ0 i − 4 sin2 (θ/2) |ψ0 i) + 2 sin(θ/2) |x0 i
(2.19)
= sin(θ/2)(3 − 4 sin2 (θ/2)) |x0 i + cos(θ/2)(1 − 4 sin2 (θ/2)) |ψ0⊥ i
= sin(3θ/2) |x0 i + cos(3θ/2) |ψ0⊥ i .
So define G = Rψ0 Rx0 as the product of the two reflection operators (called the Grover operator),
then it amplifies the angle from θ/2 to 3θ/2. The geometric picture is in fact even clearer in Fig. 2.2
and the conclusion can be observed without explicit computation.

Figure 2.2. Geometric interpretation of one Grover iteration.

Applying the Grover operator k times, we obtain


(2.20) Gk |ψ0 i = sin((2k + 1)θ/2) |x0 i + cos((2k + 1)θ/2) |ψ0⊥ i .

π
So for sin((2k + 1)θ/2) ≈ 1, we need k ≈ 2θ − 21 ≈ N π
4 . This proves that Grover’s algorithm can

solve the unstructured search problem with O( N ) queries to Uf .
32 2. GROVER’S ALGORITHM

Another derivation of the Grover method is to focus on the operator, instead of the initial
vector |ψ0 i at each step of the calculation. In the orthonormal basis B = {|x0 i , |ψ0⊥ i}, the matrix
representation of the reflector Rx0 = I − 2 |x0 i hx0 | is
 
−1 0
(2.21) [Rx0 ]B = .
0 1
The matrix representation for the Grover diffusion operator Rψ0 = 2 |ψ0 i hψ0 | − I is
 2
√   
(2.22) [Rψ0 ]B = √ −1
2a 2a 1 − a2
=
− cos θ sin θ
.
2a 1 − a2 1 − 2a2 sin θ cos θ

Here sin(θ/2) = a = 1/ N . Therefore for the matrix representation of the Grover iterate G =
Rψ0 Rx0 is
 
cos θ sin θ
(2.23) [G]B = ,
− sin θ cos θ
i.e., G is a rotation matrix restricted to the two-dimensional space H = span B.
The initial vector satisfies
 
sin(θ/2)
(2.24) [|ψ0 i]B = ,
cos(θ/2)

so Grover’s search can be applied as before, via Gk for k ≈ π 4N times.
To draw the quantum circuit of Grover’s algorithm, we need an implementation of Rψ0 . Note
that
(2.25) Rψ0 = H ⊗n (2 |0n i h0n | − I)H ⊗n .
This can be implemented via the following circuit using one ancilla qubit:

|−i X

|ψi H ⊗n H ⊗n

Here the controlled-NOT gate is an n-qubit controlled-X gate, and is only active if the system
qubits are in the 0n state. Discarding the signal qubit, we obtain and implementation of Rψ0 . Since
the signal qubit |−i only changes up to a sign, it can be reused for both Rψ0 and Rx0 .
The reflector Rψ0 can also be implemented without using the ancilla qubit as (use a 3-qubit
system as an example)

H X Z X H

H X X H

H X X H

Figure 2.3. Implementing Rψ0 for a three qubit system.


2.3. AMPLITUDE AMPLIFICATION 33

Remark 2.3 (Multiple marked states). The Grover search algorithm can be naturally
p generalized
to the case when there are M > 1 marked states. The query complexity is O( N/M ). 
Example 2.4 (Qiskit for Grover’s algorithm). https://qiskit.org/textbook/ch-algorithms/
grover.html 

2.3. Amplitude amplification


Grover’s algorithm is not restricted to the problem of unstructured search. One immediate
application is called amplitude amplification (AA), which is used ubiquitously as a subroutine to
achieve quadratic speedups.
Let |ψ0 i be prepared by an oracle Uψ0 , i.e., Uψ0 |0n i = |ψ0 i. We have the knowledge that
√ p
(2.26) |ψ0 i = p0 |ψgood i + 1 − p0 |ψbad i ,
and p0  1. Here |ψbad i is an orthogonal state to the desired state |ψgood i. We cannot get access to
|ψgood i directly, but would like to obtain a state that has a large overlap with |ψgood i, i.e., amplify
the amplitude of |ψgood i.
In the problem of unstructured search, |ψgood i = |x0 i, and p0 = 1/N . Although we do not
have access to the answer |x0 i, we assume access to the reflection oracle Rx0 . Here we also assume
access to the reflection oracle
(2.27) Rgood = 1 − 2 |ψgood i hψgood | .
From Uψ0 , we can construct the reflection with respect to the initial state
(2.28) Rψ0 = 2 |ψ0 i hψ0 | − I = Uψ0 (2 |0n i h0n | − I)Uψ† 0
via the n-qubit controlled-X gate. So following exactly the same procedure as the unstructured
search problem, we can construct the Grover iterate
(2.29) G = Rψ0 Rgood .

Applying Gk to |ψ0 i for some k = O(1/ p0 ), we obtain a state that has Ω(1) overlap with |ψgood i.
Example 2.5 (Reflection with respect to signal qubits). One common scenario is that the imple-
mentation of Uψ0 requires m ancilla qubits (also called signal qubits), i.e.,
√ p
(2.30) Uψ0 |0m i |0n i = p0 |0m i |ψ0 i + 1 − p0 |⊥i ,
where |⊥i is some orthogonal state satisfying
(2.31) (Π ⊗ In ) |⊥i = 0, Π = |0m i h0m | .
Therefore
(2.32) |ψgood i = |0m i |ψ0 i , |ψbad i = |⊥i .
This setting is special since the “good” state can be verified by measuring the ancilla qubits after
applying Uψ0 in Eq. (2.30), and post-select the outcome 0m . In particular, the expected number of
measurements needed to obtain |ψgood i is 1/p0 .
In order to employ the AA procedure, we first note that the reflection operator can be simplified
as
(2.33) Rgood = (1 − 2 |0m i h0m |) ⊗ In = (1 − 2Π) ⊗ In .
34 2. GROVER’S ALGORITHM

This is because |ψgood i can be entirely identified by measuring the ancilla qubits. Meanwhile
(2.34) Rψ0 = Uψ0 (2 |0m+n i h0m+n | − I)Uψ† 0 .

Let G = Rψ0 Rgood , and applying Gk to |ψ0 i for some k = O(1/ p0 ) times, we obtain a state that
has Ω(1) overlap with |ψgood i. This achieves the desired quadratic speedup. 
Example 2.6 (Amplitude damping). Assuming access to an oracle in Eq. (2.30), where p0 is large,

we can easily dampen the amplitude to any number α ≤ p0 .
We introduce an additional signal qubit. Then Eq. (2.30) becomes
√ p
(2.35) (I ⊗ Uψ0 ) |0i |0i |0n i = |0i p0 |0i |ψ0 i + 1 − p0 |⊥i .
Define a single qubit rotation operation as
(2.36) Rθ |0i = cos θ |0i + sin θ |1i ,
and we have
(Rθ ⊗ Im+n )(I ⊗ Uψ0 ) |0i |0m i |0n i
√ √
= cos θ |0i ( p0 |0m i |ψ0 i + 1 − p0 |⊥0 i) + sin θ |1i ( p0 |0m i |ψ0 i + 1 − p0 |⊥0 i)
p p
(2.37)
√ p
:= p0 cos θ |0i |0m i |ψ0 i + 1 − p0 cos2 θ |⊥0 i .

Here (|0m+1 i h0m+1 | ⊗ In ) |⊥0 i = 0. We only need to choose p0 cos θ = α.


2.4. Lower bound of query complexity*


Recall that the unstructured search problem tries to find a marked state x0 ∈ [N ], using a
reflection oracle
(2.38) Rx0 = I − 2 |x0 i hx0 | .

Grover’s algorithm can find x0 with constant probability (e.g. at least 1/2) by making O( N )
times querying Rx0 . It turns out that this√ is asymptotically optimal, i.e., no quantum algorithm
can perform this task using fewer than Ω( N ) access to Rx0 .
Any quantum search algorithm that starts from a universal initial state |ψ0 i and queries Rx0
for k steps can be written in the following form:
(2.39) |ψkx0 i = Ukx0 |ψ0 i = Uk Rx0 · · · U2 Rx0 U1 Rx0 |ψ0 i ,
for some unitaries U1 , . . . , Uk . For simplicity we assume no ancilla qubits are used, and the proof
can be generalized to the case in the presence of ancilla qubits. The superscript x0 indicates that
the state depends on the marked state x0 . Specifically, by “solving” the search problem, it means
that there exists a k so that for each marked state x0 ,
1
(2.40) | hψkx0 |x0 i |2 ≥ .
2
x0
In other words, measuring |ψk i in the computational basis, the probability of obtaining |x0 i is at
least 1/2.
To prove the lower bound, we compare the action of Ukx0 with another “fake algorithm” Uk ,
defined as
(2.41) |ψk i = Uk |ψ0 i = Uk · · · U2 U1 |ψ0 i .
2.4. LOWER BOUND OF QUERY COMPLEXITY* 35

In particular, |ψk i does not involve any information of the solution x0 and therefore cannot possibly
solve the search problem.
For a set of vectors {f x0 }x0 ∈[N ] , and each f x0 ∈ CN , we will extensively use the following
discrete `2 -norm:
  12
X
(2.42) kf k`2 :=  kf x0 k .
x0 ∈[N ]

In particular, we have the following triangle inequality

(2.43) kf k`2 − kgk`2 ≤ kf + gk`2 ≤ kf k`2 + kgk`2 .

The proof contains two steps. First, we show that the true solution and the fake solution differs
significantly, in the sense that
X 2
(2.44) Dk := k|ψkx0 i − |ψk ik = Ω(N ).
x0 ∈[N ]

Second, we prove that (define D0 = 0):

(2.45) Dk ≤ 4k 2 , k ≥ 0.

Therefore to satisfy Eq. (2.44), we must have k = Ω( N ).
In the first step, since multiplying a phase factor eiθ to |ψkx0 i does not have any physical
consequences, we may choose a particular phase θ so that Eq. (2.40) becomes

1
(2.46) hψkx0 |x0 i ≥ √ .
2

Therefore
2

(2.47) k|ψkx0 i − |x0 ik = 2 − 2 hψkx0 |x0 i ≤ 2 − 2.

This means that


X 2

(2.48) k|ψkx0 i − |x0 ik ≤ 2N − 2N.
x0 ∈[N ]

On the other hand, for the “fake algorithm”, using the Cauchy-Schwarz inequality,
X 2
X √ X √
(2.49) k|ψk i − |x0 ik ≥ 2N − 2 | hx0 |ψi | ≥ 2N − 2 N | hx0 |ψi |2 = 2N − 2 N .
x0 ∈[N ] x0 ∈[N ] x0 ∈[N ]

This violates the bound in Eq. (2.48), and the fake algorithm cannot solve the search problem for
arbitrarily large k.
36 2. GROVER’S ALGORITHM

So from Eqs. (2.48) and (2.49), and the triangle inequality, we have
X 2
X 2
Dk = k|ψkx0 i − |ψk ik = k(|ψkx0 i − |x0 i) − (|ψk i − |x0 i)k
x0 ∈[N ] x0 ∈[N ]
 2
s X s X
2 2
≥ k|ψk i − |x0 ik − k|ψkx0 i − |x0 ik 
(2.50) x0 ∈[N ] x0 ∈[N ]
2
√ √
q q
≥ 2N − 2 N − 2N − 2N

= Ω(N ).
This proves Eq. (2.44). In other words, the true solution and the fake solution must be well separated
in `2 -norm.
In the second step, we prove Eq. (2.45) inductively. Clearly Eq. (2.45) is true for k = 0. Assume
this is true, then
X
x0 2
Dk+1 = |ψk+1 i − |ψk+1 i
x0 ∈[N ]
X 2
= kUk+1 Rx0 |ψkx0 i − Uk+1 |ψk ik
x0 ∈[N ]
X 2
= kRx0 |ψkx0 i − |ψk ik
(2.51) x0 ∈[N ]
X 2
= kRx0 (|ψkx0 i − |ψk i) + (Rx0 − I) |ψk ik
x0 ∈[N ]
 2
s X s X
2 2
≤ kRx0 (|ψkx0 i − |ψk i)k + k(Rx0 − I) |ψk ik  .
x0 ∈[N ] x0 ∈[N ]

The last inequality uses the triangle inequality of the discrete `2 -norm. Note that
s X s X
2 2
(2.52) k(Rx0 − I) |ψk ik = 4 |hx0 |ψk i| = 2,
x0 ∈[N ] x0 ∈[N ]

and
s X s X
2 2
p
(2.53) kRx0 (|ψkx0 i − |ψk i)k = k|ψkx0 i − |ψk ik = Dk ,
x0 ∈[N ] x0 ∈[N ]

we have
p p
(2.54) Dk+1 ≤ Dk + 2 ≤ 2(k + 1),
which finishes the induction.
Finally, combining the lower bound Eqs. (2.44) and (2.45), we find
√ that the necessary condition
to solve the unstructured search problem is 4k 2 = Ω(N ), or k = Ω( N ).
Remark 2.7 (Implication for amplitude amplification). Due to the close relation between unstruc-
tred search and amplitude amplification, it means that given a state |ψi of which the amplitude of
2.4. LOWER BOUND OF QUERY COMPLEXITY* 37

the “good” component is α  1, no quantum algorithms can amplify the amplitude to Ω(1) using
1
o(α− 2 ) queries to the reflection operators. 

Exercise 2.1. In Deutsch’s algorithm, demonstrate why not assuming access to an oracle Vf :
|xi 7→ |f (x)i.
Exercise 2.2. For all possible mappings f : {0, 1} → {0, 1}, draw the corresponding quantum
circuit to implement Uf : |x, 0i 7→ |x, f (x)i.
Exercise 2.3. Prove Eq. (2.20).
Exercise 2.4. Draw the quantum circuit for Eq. (2.28).
Exercise 2.5. Prove√ that when ancilla qubits are used, the complexity of the unstructured search
problem is still Ω( N ).
CHAPTER 3

Quantum phase estimation

The setup of the phase estimation problem is as follows. Let U be a unitary, and |ψi is an
eigenvector, i.e.,

(3.1) U |ψi = ei2πϕ |ψi , ϕ ∈ [0, 1).

The goal is to find ϕ up to certain precision. This is a quantum primitive with numerous applica-
tions: prime factorization (Shor’s algorithm), linear system (HHL), eigenvalue problem, amplitude
estimation, quantum counting, quantum walk, etc.
Using a classical computer, we can estimate ϕ using U |ψi |ψi, where stands for the element-
wise division operation. Specifically, if |ψi is indeed an eigenvector and hj|ψi 6= 0 for any j in the
computational basis, then we can extract the phase from

(3.2) hj|U |ψi / hj|ψi = ei2πϕ .

Unfortunately, such a element-wise division operation cannot be efficiently implemented on quantum


computers, and alternative methods are needed.
Quantum phase estimation has numerous variants, and still receives intensive research attention
till today. This chapter only introduces some of the simplest variants.

3.1. Hadamard test


We first introduce the Hadamard test, which is a useful tool for computing the expectation value
of an unitary operator with respect to a state, i.e., hψ|U |ψi. Since U is generally not Hermitian, this
does not correspond to the measurement of a physical observable. Instead the real and imaginary
part of the expectation value need to be measured separately.
The (real) Hadamard test is the quantum circuit in Fig. 3.1 for estimating the real part of
hψ|U |ψi.

|0i H H

|ψi U

Figure 3.1. Hadamard test for Re hψ|U |ψi.

39
40 3. QUANTUM PHASE ESTIMATION

To verify this, we find that the circuit transforms |0i |ψi as

H⊗I 1
|0i |ψi −−−→ √ (|0i + |1i) |ψi
2
c-U 1
−−→ √ (|0i |ψi + |1i U |ψi)
2
H⊗I 1 1
−−−→ |0i (|ψi + U |ψi) + |1i (|ψi − U |ψi).
2 2

The probability of measuring the qubit 0 to be in state |0i is

1
(3.3) p(0) = (1 + Re hψ|U |ψi).
2

This is well defined since −1 ≤ Re hψ|U |ψi ≤ 1.


To obtain the imaginary part, we can use the circuit in Fig. 3.2 called the (imaginary) Hadamard
test, where
 
1 0
(3.4) S=
0 i

is called the phase gate.

|0i H S† H

|ψi U

Figure 3.2. Hadamard test for Im hψ|U |ψi.

Similar calculation shows the circuit transforms |0i |ψi to the state

1 1
(3.5) |0i (|ψi − iU |ψi) + |1i (|ψi + iU |ψi).
2 2

Therefore the probability of measuring the qubit 0 to be in state |0i is

1
(3.6) p(0) = (1 + Im hψ|U |ψi).
2

Combining the results from the two circuits, we obtain the estimate to hψ|U |ψi.

Example 3.1 (Overlap estimate using swap test). An application of the Hadamard test is called
the swap test, which is used to estimate the overlap of two quantum states |hϕ|ψi|. The quantum
circuit for the swap test is
3.1. HADAMARD TEST 41

|0i H H

|ϕi

|ψi

Figure 3.3. Circuit for the SWAP test.

Note that this is exactly the Hadamard test with U being the n-qubit swap gate. Direct
calculation shows that the probability of measuring the qubit 0 to be in state |0i is
1 1 2
(3.7) p(0) = (1 + Re hϕ, ψ|ψ, ϕi) = (1 + |hϕ|ψi| ).
2 2

Example 3.2 (Overlap estimate with relative phase information). In the swap test, the quantum
states |ϕi , |ψi can be black-box states, and in such a scenario obtaining an estimate to |hϕ|ψi| is
the best one can do. In order to retrieve the relative phase information and to obtain hϕ|ψi, we
need to have access to the unitary circuit preparing |ϕi , |ψi, i.e.,
(3.8) Uϕ |0n i = |ϕi , Uψ |0n i = |ψi .
Then we have hϕ|ψi = h0n |Uϕ† Uψ |0n i. 
Example 3.3 (Single qubit phase estimation). The Hadamard test can also be used to derive the
simplest version of the phase estimate based on success probabilities. Apply the Hadamard test in
Fig. 3.1 with U, ψ satisfying Eq. (3.1). Then the probability of measuring the qubit 0 to be in state
|1i is
1 1
(3.9) p(1) = (1 − Re hψ|U |ψi) = (1 − cos(2πϕ)).
2 2
Therefore
arccos(1 − 2p(1))
(3.10) ϕ=± (mod 1).

In order to quantify the efficiency of the procedure, recall from Example 1.11 that if p(1) is far
away from 0 or 1, i.e., (2ϕ mod 1) is far away from 0, in order to approximate p(1) (and hence ϕ)
to additive precision , the number of samples needed is O(1/2 ).
Now assume ϕ is very close to 0 and we would like to estimate ϕ to additive precision . Note
that
(3.11) p(1) ≈ (2πϕ)2 = O(2 ).
Then p(1) needs to be estimated to precision O(2 ), and again the number of samples needed is
O(1/2 ). The case when ϕ is close to 1/2 or 1 is similar.
Note that the circuit in Fig. 3.1 cannot distinguish the sign of ϕ (or whether ϕ ≥ 1/2 when
restricted to the interval [0, 1)). To this end we need Fig. 3.2, but replace S† by S, so that the
success probability of measuring 1 in the computational basis is
1
(3.12) p(1) = (1 + sin(2πϕ)).
2
42 3. QUANTUM PHASE ESTIMATION

This gives
(
∈ [0, 1/2), p(1) ≥ 12 ,
(3.13) ϕ
∈ (1/2, 1), p(1) < 12 .

Unlike the previous estimate, in order to correctly estimate the sign, we only require O(1) accuracy,
and run Fig. 3.2 for a constant number of times (unless ϕ is very close to 0 or π). 
Example 3.4 (Qiskit example for phase estimation using the Hadamard test). Here is a qiskit
example of the simple phase estimation for
  
1 0 0
R|1i = = ei2πϕ |1i,
0 ei2πϕ 1
1
with ϕ = 0.5 + 2d
≡ 0.10 · · · 01 (d bits in total). In this example, d = 4 and the exact value is
ϕ = 0.5625.

3.2. Quantum phase estimation (Kitaev’s method)*


In Example 3.3, the number of measurements needed to estimate ϕ to precision  is O(1/2 ). A
quadratic improvement in precision (i.e., O(1/)) can be achieved by means of the quantum phase
estimation. One such procedure is called Kitaev’s method [KSV02, Section 13.5].
In the fixed point representation in Section 1.8, for simplicity we assume that the eigenvalue
can be exactly represented using d bits, i.e.,
(3.14) ϕ = (.ϕd−1 · · · ϕ0 ).
In the simplest scenario, we assume d = 1, and ϕ = .ϕ0 , ϕ0 ∈ {0, 1}. Then ei2πϕ = eiπϕ0 .
Performing the real Hadamard test in Example 3.3, we have p(1) = 0 if ϕ0 = 0, and p(1) = 1 if
ϕ0 = 1. In either case, the result is deterministic, and one measurement is sufficient to determine
the value of ϕ0 .
3.2. QUANTUM PHASE ESTIMATION (KITAEV’S METHOD)* 43

Next, consider ϕ = . 0 · · · 0ϕ0 . To determine the value of ϕ0 , we need to reach precision


| {z }
d bits
 < 2−d . The method in Example 3.3 requires O(1/2 ) = O(22d ) repeated measurements, or
number of queries to U . The observation from Kitaev’s method is that if we can have access to U j
for a suitable power j, then the number of queries to U can be reduced. More specifically, if we can
d−1
query U 2 , then the circuit in Fig. 3.4 with j = d − 1 gives
(
1 0, ϕ0 = 0,
(3.15) p(1) = (1 − cos(2π.ϕ0 )) =
2 1, ϕ0 = 1.
The result is again deterministic. Therefore the total number of queries to U becomes O(2−d ) =
O(−1 ).

|0i H H

|ψi U2
j

Figure 3.4. Circuit used in Kitaev’s method. Another one with a phase gate to
determine the sign of 2j ϕ may also be used.

This is the basic idea behind Kitaev’s method: use a more complex quantum circuit (and in
particular, with a larger circuit depth) to reduce the total number of queries. As a general strategy,
j
instead of estimating ϕ from a single number, we assume access to U 2 , and estimate ϕ bit-by-bit.
j
In particular, changing U → U 2 in the Hadamard test allows us to estimate
(3.16) 2j ϕ = ϕd−1 · · · ϕd−j .ϕd−j−1 · · · ϕ0 = .ϕd−j−1 · · · ϕ0 (mod 1)
One immediate difficulty of the bit-by-bit estimation is that we need to tell 0.0111 . . . apart from
0.1000 . . ., and the two numbers can be arbitrarily close to each other (though the two numbers can
also differ at some number of digits), and some careful work is needed. We will first describe the
algorithm, and then analyze its performance. The algorithm works for any ϕ, and then the goal is
to estimate its d bits. For simplicity of the analysis, we assume ϕ is exactly represented by d bits.
We will use extensively the distance
(3.17) |x|1 ≡ |x| mod 1 := min{(x mod 1), 1 − (x mod 1)},
which is the distance on the unit circle.
First, by applying the circuit in Fig. 3.4 (and the corresponding circuit to determine the sign)
with j = 0, 1, . . . , d − 3, for each j we can estimate p(0), so that the error in 2j ϕ is less than 1/16
for all j (this can happen with a sufficiently high success probability. For simplicity, let us assume
that this happens with certainty). The measured result is denoted by αj . This means that any per-
turbation must be due to the 5th digit in the binary representation. For example, if 2j ϕ = 0.11100,
then αj = 0.11011 is an acceptable result with an error 0.00001 = 1/32, but αj = 0.11110 is not
acceptable since the error is 0.0001 = 1/16. We then round αj (mod 1) by its closest 3-bit estimate
denoted by βj , i.e., βj is taken from the set { 0.000, 0.001, 0.010, 0.011, 0.100, 0.101, 0.110, 0.111 }.
Consider the example of 2j ϕ = 0.11110, if αj = 0.11101, then βj = 0.111. But if αj = 0.11111, then
βj = 0.000. Another example is 2j ϕ = 0.11101, if αj = 0.11110, then both βj = 0.111 (rounded
44 3. QUANTUM PHASE ESTIMATION

down) and βj = 0.000 (rounded up) are acceptable. We can pick one of them at random. We will
show later that the uncertainty in αj , βj is not detrimental to the success of the algorithm.
Second, we perform some post-processing. Start from j = d − 3, we can estimate .ϕ2 ϕ1 ϕ0 to
accuracy 1/16, which recovers these three bits exactly. The values of these three bits will be taken
from βd−3 directly. Then we proceed with the iteration: for j = d − 4, . . . , 0, we assign
(
0, |.0ϕd−j−2 ϕd−j−3 − βj | mod 1 < 1/4,
(3.18) ϕd−j−1 =
1, |.1ϕd−j−2 ϕd−j−3 − βj | mod 1 < 1/4.
Here |·| mod 1 is the periodic distance on [0, 1) and its value is always ≤ 1/2. Since the two possibil-
ities are separated by 1/2, for each j, there will be at most one case that is satisfied. We will also
show that in all circumstances, there is always one case that is satisfied, regardless of the ambiguity
of the choice of βj above.
After running the algorithm above, we recover ϕ = .ϕd−1 · · · ϕ0 exactly. The total cost of
Pd−3 j  −1
Kitaev’s method measured by the number of queries to U is O j=0 2 = O( ).
If ϕ is exactly represented by d bits, we will obtain an estimate
(3.19) |.ϕd−1 · · · ϕ0 − ϕ| mod 1 < 2−d = .
Example 3.5. Consider ϕ = 0.ϕ4 ϕ3 ϕ2 ϕ1 ϕ0 = 0.11111 and d = 5. Running Kitaev’s algorithm
with j = 0, 1, 2 gives the following possible choices of βj :
j 2j ϕ possible βj
0 0.11111 { 0.111, 0.000 }
1 0.1111 { 0.111, 0.000 }
2 0.111 { 0.111 }
Start with j = 2. We have only one choice of βj , and can recover 0.ϕ2 ϕ1 ϕ0 = 0.111. Then for
j = 1, we need to use Eq. (3.18) to decide ϕ3 . If we choose βj = 0.111, we have ϕ3 = 1. But if we
choose βj = 0.000, we still need to choose ϕ3 = 1, since |.011 − 0.000| mod 1 = 0.100 = 1/2 > 1/4,
and |.111 − 0.000| mod 1 = 0.001 = 1/8 < 1/4. Similar analysis shows that for j = 0 we have ϕ4 = 1.
This recovers ϕ exactly. 
Example 3.6 (A variant of Kitaev’s algorithm that does not work). Let us modify Kitaev’s al-
gorithm as follows: for each 2j ϕ is determined to precision 1/8, and round the result to βj ∈
{ 0.00, 0.01, 0.10, 0.11 }. Start from j = d − 2, we estimate .ϕ1 ϕ0 exactly. Then for j = d − 3, . . . , 0,
we assign
(
0, |.0ϕd−j−2 − βj | mod 1 < 1/2,
(3.20) ϕd−j−1 =
1, |.1ϕd−j−2 − βj | mod 1 < 1/2.
Now that the inequality < 1/2 above can be equivalently written as ≤ 1/4.
Let us run the algorithm above for ϕ = 0.ϕ4 ϕ3 ϕ2 ϕ1 ϕ0 = 0.1111 and d = 4. This gives:
j 2j ϕ possible βj
0 0.1111 { 0.11, 0.00 }
1 0.111 { 0.11, 0.00 }
2 0.11 { 0.11 }
Start with j = 2. We have only one choice of βj , and can recover 0.ϕ1 ϕ0 = 0.11. Then for j = 1, if we
choose βj = 0.11, we have ϕ2 = 1. But if we choose βj = 0.00, then |.01 − 0.00| mod 1 = 0.0.01 = 1/4,
3.3. QUANTUM FOURIER TRANSFORM 45

and |.11 − 0.000| mod 1 = 0.01 = 1/4. So the algorithm cannot distinguish the two possibilities and
fails. 
Let us now inductively show why Kitaev’s algorithm works. Again assume ϕ is exactly rep-
resented by d bits. For j = d − 3, we know that ϕ2 ϕ1 ϕ0 can be recovered exactly. Then assume
ϕd−j−2 · · · ϕ0 have all been exactly computed, at step j we would like to determine the value of
ϕd−j−1 . From
(3.21) αj − 2j ϕ mod 1
< 1/16, |αj − βj | mod 1 ≤ 1/16,
we know
(3.22) 2j ϕ − βj mod 1
< 1/8.
Then
|.ϕd−j−1 ϕd−j−2 ϕd−j−3 − βj | mod 1
(3.23) ≤ .ϕd−j−1 ϕd−j−2 ϕd−j−3 − 2j ϕ mod 1
+ 2 j ϕ − βj mod 1
≤1/16 + 1/8 < 1/4.
The wrong choice of ϕd−j−1 denoted by ϕ
ed−j−1 then satisfies
(3.24)
|.ϕ
ed−j−1 ϕd−j−2 ϕd−j−3 − βj | mod 1
≥ |.ϕ
ed−j−1 ϕd−j−2 ϕd−j−3 − .ϕd−j−1 ϕd−j−2 ϕd−j−3 | mod 1 − |.ϕd−j−1 ϕd−j−2 ϕd−j−3 − βj | mod 1
>1/2 − 1/4 = 1/4.
This proves the validity of Eq. (3.18), and hence that of Kitaev’s algorithm.

3.3. Quantum Fourier transform


Fourier transform is used ubiquitously in scientific computing, and fast Fourier transform (FFT)
is the backbone for many fast algorithms in classical computation. Similarly, the quantum Fourier
transform is also an important component in many quantum algorithms, such as phase estima-
tion, Shor’s algorithm, and inspires other fast algorithms such as fast Fermionic Fourier transform
(FFFT) etc.
For any j in the computational basis, the (discrete) forward Fourier transform is defined as
1 X i2π kj
(3.25) UFT |ji = √ e N |ki .
N k∈[N ]

In particular
1 X
(3.26) UFT |0n i = √ |ki = H ⊗n |0n i .
N k∈[N ]

The (discrete) inverse Fourier transform is


† 1 X −i2π kj
(3.27) UFT |ji = √ e N |ki .
N k∈[N ]

Using the binary representation of integers


(3.28) k = (kn−1 · · · k0 .), j = (jn−1 · · · j0 .)
46 3. QUANTUM PHASE ESTIMATION

we have
kj j j j
=k0 n + k1 n−1 + · · · + kn−1
N 2 2 2
=k0 (.jn−1 · · · j0 ) + k1 (jn−1 .jn−2 · · · j0 ) + · · · + kn−1 (jn−1 · · · j1 .j0 ).
Therefore the exponential can be written as
kj
(3.29) ei2π N = ei2πk0 (.jn−1 ···j0 ) ei2πk1 (.jn−2 ···j0 ) · · · ei2πkn−1 (.j0 ) .
The most important step of QFT is the following direct calculation, which requires some patience
with the manipulation of indices:
(3.30)
1 X
UFT |jn−1 · · · j0 i = √ ei2πk0 (.jn−1 ···j0 ) ei2πk1 (.jn−2 ···j0 ) · · · ei2πkn−1 (.j0 ) |kn−1 · · · k0 i
n
2 k ,...,k
n−1 0
   
1  X i2πkn−1 (.j0 ) X
=√ e |kn−1 i ⊗  ei2πkn−2 (.j1 j0 ) |kn−1 i
2n k kn−2
n−1
!
X
⊗ ··· ⊗ ei2πk0 (.jn−1 ···j0 ) |k0 i
k0
1      
=√ |0i + ei2π(.j0 ) |1i ⊗ |0i + ei2π(.j1 j0 ) |1i ⊗ · · · ⊗ |0i + ei2π(.jn−1 ···j0 ) |1i .
2n
Eq. (3.30) involves a series of controlled rotations of the form
1  
(3.31) |0i → √ |0i + ei2π(.jn−1 ···j0 ) |1i .
2
Hence before discussing the quantum circuit for QFT, let us first work out the circuit for imple-
menting this controlled rotation. We use the relation
(3.32) ei2π(.jn−1 ···j0 ) = ei2π(.jn−1 ) ei2π(.0jn−2 ) · · · ei2π(.0···0j0 ) .
Example 3.7 (Implementation of controlled rotation). Consider the implementation of
1  
(3.33) |0i |ji → √ |0i + ei2π(.jn−1 ···j0 ) |1i |ji ,
2
Let
 
1 0
(3.34) Rz (θ) = ,
0 eiθ
and Rj = Rz (π/2j−1 ). In particular, R1 = Z. The quantum circuit is
|0i H R1 R2 ··· Rn

|jn−1 i ···
|jn−2 i ···
···
|j0 i ···
3.3. QUANTUM FOURIER TRANSFORM 47

The implementation of QFT follows the same principle, but does not require the signal qubit
to store the phase information. Let us see a few examples.
When n = 1, we need to implement

1  
(3.35) |j0 i → √ |0i + ei2π(.j0 ) |1i .
2

This is the Hadamard gate:

(3.36) |j0 i → H |j0 i .

When n = 2, we need to implement

1    
(3.37) |ji → √ |0i + ei2π(.j0 ) |1i ⊗ |0i + ei2π(.j1 j0 ) |1i .
22

This can be implemented using the following circuit:

|j1 i H R2 √1 (|0i + ei2π(.j1 j0 ) |1i)


2

|j0 i H √1 (|0i + ei2π(.j0 ) |1i)


2

Comparing the result with that in Eq. (3.37), we find that the ordering of the qubits is reversed.
To recover the desired result in QFT, we can apply a SWAP gate to the outcome, i.e.,

|j1 i H R2 √1 (|0i + ei2π(.j0 ) |1i)


2

|j0 i H √1 (|0i + ei2π(.j1 j0 ) |1i)


2

In order to implement the inverse Fourier transform, we only need to apply the Hermitian
conjugate as

|j1 i R2† H √1 (|0i


2
+ e−i2π(.j0 ) |1i)

|j0 i H √1 (|0i
2
+ e−i2π(.j1 j0 ) |1i)

Similarly one can construct the circuit for UFT and its inverse for n = 3.
In general, the QFT circuit is given by Fig. 3.5. Compare the circuit in Fig. 3.5 with Eq. (3.30),
we find again that the ordering is reversed in the output. To restore the correct order as defined in
QFT, we can use O(n/2) swaps operations. The total gate complexity of QFT is O(n2 ).
48 3. QUANTUM PHASE ESTIMATION

|jn−1 i H R2 R3 ··· Rn ··· √1 (|0i


2
+ ei2π(.jn−1 ···j0 ) |1i)

|jn−2 i ··· H R2 ··· √1 (|0i


2
+ ei2π(.jn−2 ···j0 ) |1i)

|jn−3 i ··· ··· √1 (|0i


2
+ ei2π(.jn−3 ···j0 ) |1i)
···

|j0 i ··· ··· H √1 (|0i + ei2π(.j0 ) |1i)


2

Figure 3.5. Quantum circuit for quantum Fourier transform (before applying
swap operations).

Example 3.8 (Qiskit example for QFT). https://qiskit.org/textbook/ch-algorithms/quantum-


fourier-transform.html 

3.4. Quantum phase estimation using quantum Fourier transform


In Kitaev’s method, we use 1 ancilla qubit but d different circuits of various circuit depths to
perform phase estimation. In this section, we introduce the (standard) quantum phase estimation
(QPE), which uses one signal quantum circuit based on QFT, but requires d-ancilla qubits to store
the phase information in the quantum computer. From now, we assume ϕ = .ϕd−1 · · · ϕ0 is exact.
From the availability of U j we can define a controlled unitary operation
X
(3.38) U= |ji hj| ⊗ U j .
j∈[2d ]

When d = 1, U is simply the controlled U operation. For a general d, it seems that we need to
implement all 2d different U j . However, this is not necessary.
Pd−1
Using the binary representation of
Pd−1 i Qd−1 i
integers j = (jd−1 · · · j0 .) = i=0 ji 2i , we have U j = U i=0 ji 2 = i=0 U ji 2 . Therefore similar
to the operations in QFT,
X
U= |ji hj| ⊗ U j
j∈[2d ]
d−1
X Y i
= (|jd−1 i hjd−1 |) ⊗ · · · ⊗ (|j0 i hj0 |) ⊗ U ji 2
jd−1 ,...,j0 i=0
(3.39) d−1
 
Y 0 X i
=  |ji i hji | ⊗ U ji 2 
i=0 ji
d−1
Y 0  i

= |0i h0| ⊗ In + |1i h1| ⊗ U 2 .
i=0
3.4. QUANTUM PHASE ESTIMATION USING QUANTUM FOURIER TRANSFORM 49

Q0
Here the primed product is a slightly awkward notation, which means the tensor product for
the first register, and the regular matrix product for the second register. It is in fact much clearer
to observe the structure in the quantum circuit in Fig. 3.6.

|jd−1 i ···
···
|j1 i ···
|j0 i ···

|ψi U U2 ··· U2
d−1
U |ψi

Figure 3.6. Circuit for controlled matrix power of U .

Remark 3.9. At first glance, the saving due to the usage of the circuit in Fig. 3.6 may not
2d−1
seem to be large, since we still need to P implement matrix powers as high as U . However,
the alternative would be to implement j∈[2d ] |ji hj|, which requires very complicated multi-qubit
control operations. Another scenario when significant advantage can be gained is when U can be
fast-forwarded, i.e., U j can be implemented at a cost that is independent of j. This is for instance,
if U = Rz (θ) is a single-qubit rotation. Then the circuit Fig. 3.6 is exponentially better than the
direct implementation of U. 

Now let the initial state in the ancilla qubits be |0n i. Use QFT and U, we transform the initial
states according to

UFT ⊗I 1 X
|0d i |ψ0 i −− −−→ √ |ji |ψ0 i
2d j∈[2d ]
U 1 X 1 X
→√
− |ji U j |ψ0 i = √ |ji ei2πϕj |ψ0 i
(3.40) 2d j∈[2d ] 2d j∈[2d ]
 
† X i2πj ϕ− k0 
UFT ⊗I X 1 2d  |k 0 i |ψ i .
−−−−→  e 0
0 d
2d d
k ∈[2 ] j∈[2 ]

Since we have ϕ = 2kd for some k ∈ [2d ], measuring the ancilla qubits, and we will obtain the state
|ki |ψ0 i with certainty, and we obtain the phase information. Therefore the quantum circuit for

QFT based QPE is given by Fig. 3.7. Here we have used Eq. (3.26). We should note that UFT
includes the swapping operations. (Exercise: 1. what if the swap operation is not implemented? 2.
Is it possible to modify the circuit and remove the need of implementing the swap operations?)
50 3. QUANTUM PHASE ESTIMATION


|0d i H ⊗d UFT
U
|ψi |ψi

Figure 3.7. Quantum circuit for quantum phase estimation using quantum
Fourier transform.


Example 3.10 (Hadamard test viewed as QPE). When d = 1, note that U † = UFT = H, the QFT
based QPE in Fig. 3.7 is exactly the Hadamard test in Fig. 3.1. Note that ϕ does not need to be
exactly represented by a one bit number! 
Example 3.11 (Qiskit example for QPE). https://qiskit.org/textbook/ch-algorithms/quantum-
phase-estimation.html 

3.5. Analysis of quantum phase estimation


In order to apply QPE (standard or Kitaev), we have assumed that
(1) |ψ0 i is an eigenstate.
(2) ϕ0 has an d-bit binary representation.
In general practical calculations, neither condition can be exactly satisfied, and we need to analyze
the effect on the error of the QPE. Recall the discussion in Section 1.9, we assume the only sources
of errors are at the mathematical level (instead of noisy implementation of quantum devices). In
this context, the error can be due to an inexact eigenstate |ψi, or Monte Carlo errors in the readout
process due to the probabilistic nature of the measurement process. In this section, we relax these
constraints and study what happens when the conditions are not exactly met. We assume U has
the eigendecomposition.
(3.41) U |ψj i = ei2πϕj |ψj i .
Without loss of generality we assume 0 ≤ ϕ0 ≤ ϕ1 · · · ≤ ϕN −1 < 1. We are interested in using QPE
to find the value of ϕ0 .
We first relax the condition (1), i.e., assume all ϕi ’s have an exact d-bit binary representation,
but the quantum state is given by a linear combination
X
(3.42) |φi = ck |ψk i .
k∈[N ]
2 2
Here the overlap p0 = |hφ|ψ0 i| = |c0 | < 1.
Applying the QPE circuit in Fig. 3.7 to |0t i |φi with t = d, and measure the ancilla qubits,
we obtain the binary representation of ϕ0 with probability p0 . Furthermore, the system register
returns the eigenstate |ψ0 i with probability p0 . Of course, in order to recognize that ϕ0 is the
desired phase, we need some a priori knowledge of the location of ϕ0 , e.g. ϕ0 ∈ (a, b) and ϕi > b
for all i 6= 0. It would be desirable to relax both conditions (1) and (2). However, the analysis can
be rather involved and additional assumptions are needed. We will discuss some of the implications
in the context of estimating the ground state energy in Section 4.1.
For now, to simplify the analysis, we focus on the case that only the condition (2) is violated,
i.e., ϕ0 cannot be exactly represented by a d-bit number, and we need to apply the QPE circuit to
3.5. ANALYSIS OF QUANTUM PHASE ESTIMATION 51

an initial state |0t i |φi with t > d. The exact relation between the t and the desired accuracy d will
be determined later. Similar to Eq. (3.40), we obtain the state
 
X 1 X i2πj ϕ0 − k0 
|0t i |ψ0 i →  e T  |k 0 i |ψ0 i
0
T
(3.43) k ∈[T ] j∈[T ]
X
= γ0,k0 |k 0 i |ψ0 i .
k0

Here
1 X i2πj ϕ0 − kT0 1 1 − ei2πT (ϕ0 −ϕek0 ) k0
 
(3.44) γ0,k0 = e = , ϕ
ek0 = .
T T 1 − ei2π(ϕ0 −ϕek0 ) T
j∈[T ]

ek00 for some k00 , then γ0,k0 = δk0 ,k00 .


Therefore if ϕ0 has an exact d-bit representation, i.e., ϕ0 = ϕ
We recover the previous result that one run of the QPE circuit gives the value ϕ0 deterministically.
Now assume that ϕ0 6= ϕ ek0 for any k 0 . Note that ei2πx is a periodic function with period 1,
we can only determine the value of x mod 1. Therefore we use the periodic distance Eq. (3.17). In
terms of the phase, we would like to find k00 such that
(3.45) ϕ0 − ϕ
ek00 1
< .

Here  = 2−d = 2t−d /T is the precision parameter. In particular, for any k 0 we have
1
(3.46) |ϕ0 − ϕ
ek0 |1 ≤ .
2
Using the relation that for any θ ∈ [−π, π],
p 2
(3.47) 1 − eiθ = 2(1 − cos θ) = 2 |sin(θ/2)| ≥ |θ| ,
π
we obtain
2 1
(3.48) |γ0,k0 | ≤ = .
T π2 2π |ϕ0 − ϕ
ek0 |1 2T |ϕ0 − ϕ
ek0 |1

Figure 3.8. For ϕ0 = 0.35, the shape of |γ0,k | with T = 64 and T = 1024.
52 3. QUANTUM PHASE ESTIMATION

Let k00 be the measurement outcome, which can be viewed as a random variable. The probability
of obtaining some ϕ ek00 that is at least  distance away from ϕ0 is
X 2
P ( ϕ0 − ϕek00 1 ≥ ) = |γ0,k0 |
|ϕ0 −ϕ
ek0 |1 ≥
X 1
(3.49) ≤ 2
|ϕ0 −ϕ
ek0 |1 ≥
4T 2 |ϕ0 − ϕ
ek0 |1
Z ∞
2 1 2 1 1
≤ dx + = + .
4T  x2 4T 2 2 2T  2(T )2
Set t − d = dlog2 δ −1 e, then T  = 2t−d ≥ δ −1 . Hence for any 0 < δ < 1, the failure probability
δ + δ2
(3.50) P ( ϕ0 − ϕ
ek00 1
≥ ) ≤ ≤ δ.
2
In other words, in order to obtain the phase ϕ0 to accuracy  = 2−d with a success probability at
least 1 − δ, we need d + dlog2 δ −1 e ancilla qubits to store the value of the phase. On top of that,
the simulation time needs to be T = (δ)−1 .
Remark 3.12 (Quantum median method). One problem with QPE is that in order to obtain a
success probability 1 − δ, we must use log2 δ −1 ancilla qubits, and the maximal simulation time also
needs to be increased by a factor δ −1 . The increase of the maximal simulation time is particularly
undesirable since it increases the circuit depth and hence the required coherence time of the quantum
device. When |ψi is an exact eigenstate, this can be improved by the median method, which uses
log δ −1 copies of the result from QPE without using ancilla qubits or increasing the circuit depth.
When |ψi is a linear combination of eigenstates, the problem of the aliasing effect becomes more
difficult to handle. One possibility is to generalize the median method into the quantum median
method [NWZ09], which uses classical arithmetics to evaluate the median using a quantum circuit.
To reach success probability 1 − δ, we still need log2 δ −1 ancilla qubits, but the maximal simulation
time does not need to be increased. 

Exercise 3.1. Write down the quantum circuit for the overlap estimate in Example 3.2.
Exercise 3.2. For ϕ = 0.111111, we run Kitaev’s algorithm to estimate its first 4 bits. Check that
the outcome satisfies Eq. (3.19). Note that 0.0000 and 0.1111 are both acceptable answers. Prove
the validity of Kitaev’s algorithm in general in the sense of Eq. (3.19).
Exercise 3.3. For a 3 qubit system, explicitly construct the circuit for UFT and its inverse.
Exercise 3.4. For a n-qubit system, write down the quantum circuit for the swap operation used
in QFT.
Exercise 3.5. Similar to the Hadamard test in Example 3.10, develop an algorithm to perform
QPE using the circuit in Fig. 3.7 with only d = 2, while the phase ϕ can be any number in [0, 1/2).
CHAPTER 4

Applications of quantum phase estimation

4.1. Ground state energy estimation


As an application, we use QPE to solve the problem of estimating the ground state energy of
a Hamiltonian. Let H be a Hermitian matrix with eigendecomposition
(4.1) H |ψj i = λj |ψj i .
Below are two examples of Hamiltonians commonly encountered in quantum many-body physics.
Example 4.1 (Transverse field Ising model). The Hamiltonian for the one dimensional transverse
field Ising model (TFIM) with nearest neighbor interaction of length n is
n−1
X n
X
(4.2) H=− Zi Zi+1 − g Xi .
i=1 i=1

The dimension of the Hamiltonian matrix H is 2n . 


Example 4.2 (Fermionic system in second quantization). For a fermionic system (such as elec-
trons), the Hamiltonian can be expressed in terms of the creation and annihilation operators as
n n
Tij â†i âj + Vijkl â†i â†j âl âk .
X X
(4.3) H=
ij=1 ijkl=1

The creation and annihilation operators â†i , âi can be converted into Pauli operators via e.g. the
Jordan-Wigner transform as
1 1
(4.4) âi = Z ⊗(i−1) ⊗ (X + iY ) ⊗ I ⊗(N −i) , â†i = Z ⊗(i−1) ⊗ (X − iY ) ⊗ I ⊗(N −i) .
2 2
Here X, Y, Z, I are single-qubit Pauli-matrices. The dimension of the Hamiltonian matrix Ĥ is thus
2n .
The number operator takes the form
1
(4.5) n̂i := â†i âi = (I − Zi ).
2
For a given state |Ψi, the total number of particles is
* n
+
X
(4.6) Ne = Ψ n̂i Ψ .
i=1

The Hamiltonian H preserves the total number of particles Ne . 


53
54 4. APPLICATIONS OF QUANTUM PHASE ESTIMATION

Without loss of generality we assume 0 < λ0 ≤ λ1 ≤ · · · < λN −1 < 21 . Note that for the
purpose of estimating the ground state energy, we do not necessarily require a positive energy gap.
For simplicity of the presentation, we still assume that the ground state is non-degenerate, i.e.,
λ0 < λ1 . We are also provided an approximate eigenstate
X
(4.7) |φi = ck |ψk i ,
k∈[N ]

2
of which the overlap with the ground state is p0 = |hφ|ψ0 i| . Our goal is to estimate λ0 to precision
 = 2−d . We assume  < λ0 . This appears in many problems in quantum many-body physics,
quantum chemistry, optimization etc.
In order to use QPE (based on QFT), we assume access to the unitary evolution operator
U = ei2πH . This is called a Hamiltonian simulation problem, which will be discussed in detail in
later chapters. For now we assume U can be implemented exactly. Then
U |ψ0 i = ei2πλ0 |ψ0 i .
This becomes a phase estimation problem, where the input vector is not an exact eigenstate.
Following the discussion in Section 3.5, if all eigenvalues λj can be exactly represented by d-
bit numbers, we obtain both the ground state and the ground state energy with probability p0 .
Therefore repeating the process for O(p−1 0 ) times we obtain the ground state energy.
Now we relax both conditions (1) and (2) in Section 3.5, and apply the QPE circuit in Fig. 3.7
to an initial state |0t i |φi for some t > d. Similar to Eq. (3.40), we have
UFT ⊗I
X 1 X
|0t i |φi −− −−→ ck √ |ji |ψk i
k
T j∈[T ]
U
X 1 X X 1 X

→ ck √ |ji U j |ψk i = ck √ |ji ei2πλk j |ψk i
k
T j∈[T ] k
T j∈[T ]
(4.8)  
† X i2πj λk − k0
1
 
UFT ⊗I X X
−− −−→ ck e T  |k 0 i |ψk i
0
T
k k ∈[T ] j∈[T ]
X X
= ck γk,k0 |k 0 i |ψk i .
k k0 ∈[T ]

Here
1 X i2πj λk − kT0 1 1 − ei2πT (λk −ϕek0 ) k0
 
(4.9) γk,k0 = e = , ϕ
ek0 = .
T T 1 − ei2π(λk −ϕek0 ) T
j∈[T ]

Therefore the definition in Eq. (3.44) is a special case.


Our algorithm is simple: we would like to run QPE M times, and denote the output of the
(`) (`)
`-th run by ϕ ek0 . Then we take the minimum of the measured output min` ϕ ek0 as the estimate to
the ground state energy. The hope is to obtain the ground state energy to accuracy  with success
probability at least 1 − δ for any δ > 0. Let us now analyze this algorithm.
If ϕ0 has an exact d-bit representation, i.e., λ0 = ϕ ek00 for some integer k00 , then γk00 ,k0 =
δk00 ,k0 . It may seem that this implies with probability p0 , we obtain the exact estimate of ϕ0 , and
correspondingly the eigenstate |ψ0 i is stored in the system register. This is much better than the
previous assumption that all eigenvalues λj need to be represented by a d-bit number.
4.1. GROUND STATE ENERGY ESTIMATION 55

Unfortunately, this analysis is not correct. In fact, for any λk that does not have an exact t-bit
representation (note that t > d), we may have γk,k00 6= 0 and ϕ ek00 < λ0 , i.e., we obtain an energy
estimate that is lower than the ground state energy! Therefore the probability of ending up in the
2
state |k 00 i |ψk i is |ck γk,k00 | , i.e., it is still possible to obtain a wrong ground state energy. This is
called the aliasing effect.
We demonstrate below that if T is large enough, we can control the probability of underes-
timating the ground state energy. Since λ0 is the ground state energy, and all eigenvalues are in
(0, 1/2), when ϕ ek0 ≤ λ0 − , we have

(4.10) |λk − ϕ
ek0 |1 ≥ |λ0 − ϕ
ek0 |1 = λ0 − ϕ
ek0 ≥ , ∀k ∈ [N ].

Then the probability of under estimating the ground state energy by  is

X X 2
ek0 ≤ λ0 − ) =
P (ϕ |ck γk,k0 |
k λ 0 −ϕ
ek0 ≥
X X 1
≤ pk 2
k λ 0 −ϕ
ek0 ≥
4T 2 |λk − ϕ
ek0 |1
X X 1
(4.11) ≤ pk 2
k λ0 −ϕ
ek0 ≥
4T 2 |λ0 − ϕ
ek0 |
X 1
= 2
λ 0 −ϕ
ek0 ≥
4T 2 |λ0 − ϕ
ek0 |
1 1
≤ + .
4T  4(T )2

Let T  = δ 0−1 with δ 0 < 1/2, we have

δ0
(4.12) ek0 ≤ λ0 − ) ≤
P (ϕ .
2

Therefore after M repetitions, we have

M δ0
 
(`)
(4.13) P ek0 ≤ λ0 −  ≤
min ϕ .
` 2

 
(`)
ek0 ≤ λ0 −  < 2δ , we need to set δ 0 = M −1 δ.
In order to obtain P min` ϕ
56 4. APPLICATIONS OF QUANTUM PHASE ESTIMATION

 
(`)
ek0 ≥ λ0 +  . To this end, we
On the other hand, we also would like to have bound P min` ϕ
ek0 ≥ λ0 + , we have |ϕ
first note that when ϕ ek0 − λ0 |1 ≥ . Moreover,
X X 2
ek0 − λ0 |1 < ) =
P (|ϕ |ck γk,k0 |
k |ϕ
ek0 −λ0 |1 <
X 2
≥p0 |γ0,k0 |

ek0 −λ0 |1 <
 
(4.14) X 2
=p0 1 − |γ0,k0 | 

ek0 −λ0 |1 ≥
 
1 1
≥p0 1 − −
2T  2(T )2
≥p0 (1 − δ 0 ).
Here we have used the normalization condition that
X 2
(4.15) |γk,k0 | = 1, ∀k.
k0
Therefore
p0
(4.16) P (|ϕ ek0 − λ0 |1 < ) ≤ 1 − p0 (1 − δ 0 ) ≤ 1 −
ek0 − λ0 |1 ≥ ) = 1 − P (|ϕ .
2
This means that
p0 M
(`)
(4.17) ek0 − λ0 |1 ≥ )M = (1 − p0 /2)M ≤ e−
ek0 ≥ λ0 + ) ≤ P (|ϕ
P (min ϕ 2 .
`

We can then take M = d p20 log 2δ e so that


(`) δ
(4.18) ek0 ≥ λ0 + ) ≤
P (min ϕ .
` 2
To summarize, according to the relation δ 0 = M −1 δ, in order to estimate the ground state energy
to precision  = 2−d with success probability 1 − δ, we need
(4.19) t = d + dlog δ 0−1 e = d + O(log p−1
0 + log log δ
−1
)
ancilla qubits in QPE. The circuit depth is
T = O (δp0 )−1 log δ −1 .

(4.20)
Taking the number of repetitions M into account, the total cost of the method is
O −1 δ −1 p−2 −1 2

(4.21) 0 (log δ ) .
Remark 4.3 (Dependence on the initial overlap p0 ). The analysis above shows that QPE has a
nontrivial dependence on the initial overlap p0 , which has a rather indirect source. First, in order
to reduce the possibility of overshooting, the number of repetitions M needs to be large enough
and is O(p−10 ). However, this also increases the probability of undershooting the eigenvalue, and
hence δ 0 needs to be chosen to be O(M −1 ) = O(p0 ). This means that the circuit depth should be
O(δ 0 ) = O(p−1 −2
0 ). The total complexity is thus T M = O(p0 ). Therefore, when the initial overlap
p0 is small, using QPE to find the ground state energy can be very costly. On the other hand,
−1
using different techniques, the dependence on p0 can be drastically improved to O(p0 2 ) [LT20b].
See also the discussions in Section 7.8.
4.2. AMPLITUDE ESTIMATION 57


Remark 4.4 (QPE for preparing the ground state). The estimate of the ground state energy does
not necessarily require the energy gap λg := λ1 −λ0 to be positive. However, if our goal is to prepare
the ground state |ψ0 i from an initial state |φi using QPE, then we need stronger assumptions. In
particular, we cannot afford to obtain |k 0 i |ψk i, where |ϕ
ek0 − λ0 | <  but k 6= 0. This at least
−d
requires  = 2 < λg , and introduces a natural dependence on the inverse of the gap. 
Through the analysis above, we see that although the analysis of QPE is very clean when 1) all
eigenvalues (properly scaled to be represented as phase factors) are exactly given by d-bit numbers
2) the input vector is an eigenstate, the analysis can become rather complicated and tedious when
such conditions are relaxed. Such difficulty does not merely show up at the theoretical level, but
can seriously impact the robust performance of QPE in practical applications. To simplify the
discussion of the applications below, we will be much more cavalier about the usage of QPE and
assume all eigenvalues are exactly represented by d-bit numbers whenever necessary. But we should
keep such caveats in mind. Furthermore, when we move beyond QPE, the issue of having exact
d-bit numbers will become much less severe in techniques based on quantum signal processing, i.e.,
quantum eigenvalue transformation (QET) and quantum singular value transformation (QSVT).

4.2. Amplitude estimation


Let |ψ0 i be prepared by an oracle Uψ0 , i.e., Uψ0 |0n i = |ψ0 i. We have the knowledge that
√ p
(4.22) |ψ0 i = p0 |ψgood i + 1 − p0 |ψbad i ,

and p0  1. Here |ψbad i is an orthogonal state to the desired state |ψgood i, and p0 = sin θ2 . The
problem of amplitude estimation is to estimate p0 to -precision. If p0 is away from 0 or 1 and is to
be estimated directly from the Monte Carlo method, the number of samples needed is N = O(−2 ).
Let G = Rψ0 Rgood be the Grover operator as in Section 2.3. Then in the basis B = {|ψgood i , |ψbad i},
the subspace H = span B is an invariant subspace of G. Recall the computation of Eq. (2.23), the
matrix representation is
 
cos θ sin θ
(4.23) [G]B = .
− sin θ cos θ
Its two eigenstates are
1
(4.24) |ψ± i = √ (|ψgood i ± i |ψbad i) ,
2
with eigenvalues e±iθ , respectively.
Therefore the problem of estimating θ can be solved with phase estimation with an imperfect
initial state. Note that
2
2 1 θ θ 1 2
(4.25) |hψ0 |ψ+ i| = sin + i cos = = |hψ0 |ψ− i| .
2 2 2 2
Consider a QPE circuit with t ancilla qubits and querying G for T = 2t times. Then each execution
with the system register will be in |ψ+ i or |ψ− i states each with probability 0.5. Let
(4.26) t = d + dlog δ −1 e
58 4. APPLICATIONS OF QUANTUM PHASE ESTIMATION

be the number of ancilla qubits with 0 = 2−d . Then QPE obtains an estimate denoted by θ,
e which
0 2 θ
approximates θ to precision  with success probability 1 − δ. Note that p0 = sin 2 , and

θe θ
sin2 − sin2
2 2
θe − θ θ θe − θ θ θ θ θe − θ θe − θ θ
(4.27) = sin2 cos2 + cos2 sin2 + 2 sin cos sin cos − sin2
2 2 2 2 2 2 2 2 2
θe − θ
 
θ θ θ
= sin cos sin(θe − θ) + 1 − 2 sin2 sin2 .
2 2 2 2

Using the fact that sin(θe − θ) ≤ θe − θ ≤ 0 , we have

p 02
(4.28) |e
p − p| ≤ p0 (1 − p0 )0 + (1 − 2p0 ) .
4
Let 0 be sufficiently small. Now if p0 (1 − p0 ) = Ω(1), we can choose 0 = O(), and the total
complexity of QPE is O(−1 ).
If p0 is small, then we should estimate p0 to multiplicative accuracy  instead. Use

(4.29) |e
p − p| ≈ p0 0 < p0 ,
√ −1
we have 0 = p0 . Therefore the runtime of QPE is O(p0 2 −1 ). If p0 is to be estimated to
precision 0 using the Monte Carlo method, the number of samples would be N = O(p−10 
−2
).
So, the amplitude estimation method achieves quadratic speedup in the total complexity, but
the circuit depth is increased to O(0−1 δ −1 ).

Example 4.5 (Amplitude estimation to accelerate Hadamard test). Consider the circuit for the
Hadamard test in Fig. 3.1 to estimate Re hψ|U |ψi. Let the initial state |ψi be prepared by a unitary
Uψ , then the following combined circuit

|0i H H

|0n i Uψ U

maps |0i |0n i to


1 1 √ p
(4.30) |ψ0 i = |0i (|ψi + U |ψi) + |1i (|ψi − U |ψi) := p0 |ψgood i + 1 − p0 |ψbad i ,
2 2
and the goal is to estimate p0 . This also gives the implementation of the reflector Rψ0 .
Note that Rgood can be implemented by simply reflecting against the signal qubit, i.e.,

(4.31) Rgood = (I − 2 |0i h0|) ⊗ In = −Z ⊗ In .

Then we can run QPE to the Grover unitary G = Rψ0 Rgood to estimate p(0), and the circuit depth
is O(−1 ). 
4.3. HHL ALGORITHM FOR SOLVING LINEAR SYSTEMS OF EQUATIONS 59

4.3. HHL algorithm for solving linear systems of equations


In this section, we consider the solution of linear systems of equations
(4.32) Ax = b,
where A ∈ CN ×N is a non-singular matrix. Without loss of generality we assume A is Hermitian.
Otherwise, we can solve a dilated Hermitian matrix, which enlarges the matrix dimension by a
factor of 2 (i.e., use one ancilla qubit)

 
(4.33) Ae = 0 A = |1i h0| ⊗ A + |0i h1| ⊗ A† ,
A 0
and solve the enlarged problem
(4.34) e |0, xi = |1, bi .
A
We assume b ∈ CN is a normalized vector and hence can be stored in a quantum state. More
specifically, we have a unitary Ub such that (may require some work registers)
(4.35) |bi = Ub |0n i .
On classical computers, the solution is simply x = A−1 b. However, the solution vector is in general
not a normalized vector, and hence cannot be directly stored as a quantum state. Therefore the
goal of solving the quantum linear system problem (QLSP) is to find a quantum state |e xi so that
A−1 |bi
(4.36) k|e
xi − |xik ≤ , |xi = .
kA−1 |bik
The normalization constant A−1 |bi should be recovered separately.
One useful application of QLSP solvers is to evaluate the many-body Green’s function [NO88],
based on the quantum many-body Hamiltonian in Example 4.2. We will omit the details here.
4.3.1. Algorithmic description. The HHL algorithm [HHL09], based on QPE, is the first
quantum algorithm for solving QLSP. The algorithm can be summarized as follows. Let A have
the following eigendecomposition
(4.37) A |vj i = λj |vj i .
To simplify the analysis, we assume 0 < λ0 ≤ λ1 ≤ . . . ≤ λN −1 < 1 and all eigenvalues have an
exact d-bit representation. The analysis can also be generalized to the case when A has negative
eigenvalues, but the interpretation of the result from QPE needs to be modified accordingly.
The matrix A can be queried using Hamiltonian simulation as U = ei2πA . Then if the input
state is already one of the eigenvectors, i.e., |bi = |vj i, then QPE can be applied to implement the
mapping
(4.38) UQPE |0d i |vj i = |λj i |vj i .
P
In general, let the input state |bi = j βj |vj i be expanded using the eigendecomposition of A.
Then by linearity,
X
(4.39) UQPE |0d i |bi = βj |λj i |vj i .
j

Note that the unnormalized solution satisfies


X βj
(4.40) A−1 |bi = |vj i ,
j
λj
60 4. APPLICATIONS OF QUANTUM PHASE ESTIMATION

so all we need to do is to use the information of the eigenvalue |λj i stored in the ancilla register,
and perform a controlled rotation to multiply the factor λ−1 j to each βj . To this end, we see that
it is crucial to store all eigenvalues in the quantum computer coherently, as achieved by QPE. We
would like to implement the following controlled rotation unitary (see Section 4.3.2)
s !
C2 C
(4.41) UCR |0i |λj i = 1− |0i + |1i |λj i .
e2
λ λ
ej
j

where each λ ej approximates λj .



Finally, perform uncomputation by applying UQPE , we convert the information from the ancilla
d
register |λj i back to |0 i. Therefore the quantum circuit for the HHL algorithm is in Fig. 4.1.

r
C2 C
|0i 1− e2 |0i + |1i
λj λ
ej

UCR
|0d i |0d i

UQPE UQPE
|vj i |vj i

Figure 4.1. Circuit for the HHL algorithm.

Note that through the uncomputation, the d ancilla qubits for storing the eigenvalues also
becomes a working register. Discarding all working registers, the resulting unitary denoted by
UHHL satisfies
s !
X C2 C
(4.42) UHHL |0i |bi = 1− |0i + |1i βj |vj i .
e2
λ λ
ej
j j

Finally, measuring the signal qubit (the only ancilla qubit left), if the outcome is 1, we obtain the
(unnormalized) vector
X Cβj
(4.43) x
e= |vj i
j λ
ej

stored as a normalized state in the system register is


x
|e
xi = ≈ |xi ,
e
(4.44)
ke
xk
which is the desired approximate solution to QLSP. In particular, the constant C does not appear
in the solution.
Remark 4.6 (Recovering the norm of the solution). The HHL algorithm returns the solution to
QLSP in the form of a normalized state |xi stored in the quantum computer. In order to recover
4.3. HHL ALGORITHM FOR SOLVING LINEAR SYSTEMS OF EQUATIONS 61

the magnitude of the unnormalized solution ke


xk, we note that the success probability of measuring
the signal qubit in Eq. (4.42) is
2
X Cβj 2
(4.45) p(1) = |vj i = ke
xk .
j λ
ej

Therefore sufficient repetitions of running the circuit in Eq. (4.42) and estimate p(1), we can obtain
an estimate of kexk. 
More general discussion of the readout problem of the HHL algorithm will be given in Re-
mark 4.10.
4.3.2. Implementation of controlled rotation.
Proposition 4.7 (Controlled rotation given rotation angles). Let 0 ≤ θ < 1 has exact d-bit fixed
point representation θ = .θd−1 · · · θ0 be its d-bit fixed point representation. Then there is a (d + 1)-
qubit unitary Uθ such that
(4.46) Uθ : |0i|θi 7→ (cos(πθ)|0i + sin(πθ)|1i)|θi.
Proof. First (by e.g. Taylor expansion)
 
cos(τ ) − sin(τ )
(4.47) exp (−iτ σy ) = =: Ry (2τ ).
sin(τ ) cos(τ )
Here Ry (·) perform a single-qubit rotation around the y-axis. For any j ∈ [2d ] with its binary
representation j = jd−1 · · · j0 , we have
(4.48) j/2d = (.jd−1 · · · j0 ).
So choose τ = π(.jd−1 · · · j0 ), and define
X
(4.49) Uθ = exp (−iπ(.jd−1 · · · j0 )σy ) ⊗ |ji hj| .
j∈[2d ]

Applying Uθ to |0i |θi gives the desired results. 


The quantum circuit for the controlled rotation circuit is
|0i Ry (π) Ry (π/2) ··· Ry (π/2d−1 )

|θd−1 i
|θd−2 i
···
|θ0 i

This is a sequence of single-qubit rotations on the signal qubit, each controlled by a single qubit.
In order to use the controlled rotation operation, we need to store the information of λj in term
of an angle θj . Let C > 0 be a lower bound to λ0 , so that 0 < C/λj < 1 for all j. Define
1
(4.50) θj = arcsin(C/λj ),
π
62 4. APPLICATIONS OF QUANTUM PHASE ESTIMATION

and θej be its d0 -bit representation. Then


C C
(4.51) sin π θej ≡ ≈ .
λj
e λ j

Again for simplicity we assume d0 is large enough so that the error of the fixed point representation
is negligible in this step. The mapping
0
(4.52) Uangle |0d −d i |λj i = |θej i
can be implemented using classical arithmetics circuits in Section 1.8, which may require poly(d0 )
gates and an additional working register of poly(d0 ) qubits, which are not displayed here. Therefore
the entire controlled rotation operation needed for the HHL algorithm is given by the circuit in
Fig. 4.2.

|0i cos(π θej ) |0i + sin(π θej ) |1i

0 0
|0d −d i Uθ |0d −d i

Uangle Uangle
|λj i |λj i

Figure 4.2. Circuit for the controlled rotation step used by the HHL algorithm
(not including additional working register for classical arithmetic operations).


Therefore through the uncomputation Uangle , the d0 − d ancilla qubits also become a working
register. Discard the working register, and we obtain a unitary UCR satisfying
s !
  C 2 C
(4.53) UCR |0i |λj i = cos(π θej ) |0i + sin(π θej ) |1i |λj i = 1− |0i + |1i |λj i .
e2
λ λ
ej
j

This is the unitary used in the HHL circuit in Fig. 4.1.

4.3.3. Complexity analysis of the HHL algorithm. Although the choice of the constant
C does not appear in the normalized quantum state, it does directly affect the success probability.
From Eq. (4.42) we immediately obtain the success probability for measuring the signal qubit with
outcome 1 is the square of the norm of the unnormalized solution
2 2
(4.54) xk ≈ C 2 A−1 |bi
p(1) = ke .
Therefore the success probability is determined by
(1) the choice of the normalization constant C,
(2) the norm of the true solution kxk = A−1 |bi .
To maximize the success probability, C should be chosen to be as large as possible (without
exceeding λ0 ). So assuming the exact knowledge of λ0 , we can choose C = λ0 . For a Hermitian
4.3. HHL ALGORITHM FOR SOLVING LINEAR SYSTEMS OF EQUATIONS 63

positive definite matrix A, kAk = λN −1 , and A−1 = λ−1 0 . For simplicity, assume the largest
eigenvalue of A is λN −1 = 1. Then the condition number of A is
λN −1
(4.55) κ := kAk A−1 = = C −1 .
λ0
Furthermore,
1
(4.56) A−1 |bi ≥ k|bik = 1.
kAk
Therefore
(4.57) p(1) = Ω(κ−2 ).
In other words, in the worst case, we need to repeatedly run the HHL algorithm for O(κ2 ) times
to obtain the outcome 1 in the signal qubit.
Assuming the number of system qubits n is large, the circuit depth and the gate complexity
of UHHL is mainly determined by those of UQPE . Therefore we can measure the complexity of the
HHL algorithm in terms of the number of queries to U = ei2πA . In order to solve QLSP to precision
, we need to estimate the eigenvalues to multiplicative accuracy  instead of the standard additive
accuracy.
To see why this is the case, assume λej = λj (1 + ej ) and |ej | ≤  ≤ 1 . Then the unnormalized
4 2
solution satisfies
!
X 1 1 X βj  −ej  
(4.58) ke
x − xk = βj − |vj i ≤ |vj i ≤ kxk .
j λj
e λj j
λ j 1 + ej 2

Hence
ke
xk 
(4.59) 1− ≤ .
kxk 2
Then the normalized solution satisfies
x x ke
xk ke
x − xk
k|e
xi − |xik = − ≤ 1− ≤ .
e
(4.60) +
ke
xk kxk kxk kxk
The discussion requires the QPE to be run to additive precision 0 = λ0  = /κ. Therefore
the query complexity of QPE is O(κ/). Counting the number of times needed to repeat the HHL
circuit, the worst case query complexity of the HHL algorithm is O(κ3 /).
The above analysis the worst case analysis, because we assume p(1) attains the lower bound
Ω(κ−2 ). In practical applications, the result may not be so pessimistic. For instance, if βj con-
centrates around the smallest eigenvalues of A, then we may have kxk ∼ Θ(λ−1 0 ) = Θ(κ
−1
). Then
p(1) = Θ(1). In such a case, we only need to repeat the HHL algorithm for a constant number of
times to yield outcome 1 in the ancilla qubit. This does not reduce the query complexity of each
run of the algorithm. Then in this best case, the query complexity is O(κ/).

4.3.4. Additional considerations. Below we discuss a few more aspects of the HHL algo-
rithm. The first observation is that the asymptotic worst-case complexity of the HHL algorithm
can be generally improved using amplitude amplification.
64 4. APPLICATIONS OF QUANTUM PHASE ESTIMATION

Remark 4.8 (HHL with amplitude amplification). We may view Eq. (4.42) as
p p
(4.61) UHHL |0i |bi = p(1) |1i |ψgood i + 1 − p(1) |0i |ψbad i , |ψgood i = |e
xi .
Since |ψgood i is marked by a single signal qubit, we may use Example 2.5 to construct a reflection
operator with respect to the signal qubit. This is simply given by
(4.62) Rgood = Z ⊗ In .
The reflection with respect to the initial vector is
(4.63) Rψ0 = Uψ0 (2 |01+n i h01+n | − I)Uψ† 0 ,
where Uψ0 = UHHL (I1 ⊗Ub ). Let G = Rψ0 Rgood be the Grover iterate. Then amplitude amplification
1
allows us to apply G for Θ(p(1)− 2 ) times to boost the success probability of obtaining |ψgood i
with constant success probability. Therefore in the worst case when p(1) = Θ(κ−2 ), the number
of repetitions is reduced to O(κ), and the total runtime is O(κ2 /). This query complexity is
the commonly referred query complexity for the HHL algorithm. Note that as usual, amplitude
amplification increases the circuit depth. However, the tradeoff is that the circuit depth increases
from O(κ/) to O(κ2 /). 
So far our analysis, especially that based on QPE relies on the assumption that all λj all
eigenvalues have an exact d-bit representation. From the discussion in Section 3.5, we know that
such an assumption is unrealistic and causes theoretical and practical difficulties. The full analysis
of the HHL algorithm is thus more involved. We refer to e.g. [DHM+ 18] for more details.
Remark 4.9 (Comparison with classical iterative linear system solvers). Let us now compare the
cost of the HHL algorithm to that of classical iterative algorithms. If A is n-qubit Hermitian positive
definite with condition number κ, and is d-sparse (i.e., each row/column of A has at most d nonzero
entries), then each matrix vector multiplication Ax costs O(dN ) floating point operations. The
−1
number of iterations for the steepest√ descent (SD) algorithm is O(κ log  ), and the this number
can be significantly reduced to O( κ log −1 ) by the renowned conjugate gradient (CG) √ method.
Therefore the total cost (or wall clock time) of SD and CG is O(dN κ log −1 ) and O(dN κ log −1 ),
respectively.
On the other hand, the query complexity of the HHL algorithm, even after using the AA
algorithm, is still O(κ2 /). Such a performance is terrible in terms of both κ and . Hence the
power of the HHL algorithm, and other QLSP solvers is based on that each application of A (in
this case, using the unitary U ) is much faster. In particular, if U can be implemented with poly(n)
gate complexity (also can be measured by the wall clock time), then the total gate complexity
of the HHL algorithm (with AA) is O(poly(n)κ2 /). When n is large enough, we expect that
poly(n)  N = 2n and the HHL algorithm would eventually yield an advantage. Nonetheless, for
a realistic problem, the assumption that U can be implemented with poly(n) cost, and no classical
algorithm can implement Ax with poly(n) cost should be taken with a grain of salt and carefully
examined. 
Remark 4.10 (Readout problem of QLSP). By solving the QLSP, the solution is stored as a
quantum state in the quantum computer. Sometimes the QLSP is only a subproblem of a larger
application, so it is sufficient to treat the HHL algorithm (or other QLSP solvers) as a “quantum
subroutine”, and leave |xi in the quantum computer. However, in many applications (such as the
solution of Poisson’s equation in Section 4.4, the goal is to solve the lienar system. Then the
information in |xi must be converted to a measurable classical output.
4.4. EXAMPLE: SOLVE POISSON’S EQUATION 65

The most common case is to compute the expectation of some observable hOi = hx|O|xi ≈
he
x|O|exi. Assuming hOi = Θ(1). Then to reach additive precision  of the observable, the number
of samples needed is O(−2 ). On the other hand, in order to reach precision , the solution vector
|e
xi must be solved to precision . Assuming the worst case analysis for the HHL algorithm, the
total query complexity needed is
(4.64) O(κ2 /) × O(−2 ) = O(κ2 /3 ).

Remark 4.11 (Query complexity lower bound). The cost of a quantum algorithm for solving a
generic QLSP scales at least as Ω(κ(A)), where κ(A) := kAk A−1 is the condition number of
A. The proof is based on converting the QLSP into a Hamiltonian simulation problem, and the
lower bound with respect to κ is proved via the “no-fast-forwarding” theorem for simulating generic
Hamiltonians [HHL09]. Nonetheless, for specific classes of Hamiltonians, it may still be possible to
develop fast algorithms to overcome this lower bound. 

4.4. Example: Solve Poisson’s equation


As an application of the HHL algorithm, let us consider a toy problem of solving the Poisson’s
equation in one-dimension with Dirichlet boundary conditions
(4.65) − u00 (r) = b(r), r ∈ Ω = [0, 1], u(0) = u(1) = 0.
We use the central finite difference formula to discretize the Laplacian operator on a uniform grid
ri = (i + 1)h, i ∈ [N ] and h = 1/(N + 1). Let ui = u(ri ), bi = b(ri ), then Poisson’s equation is
discretized into a linear system of equations Au = b, with a tridiagonal matrix
 
2 −1 0 · · · 0 0 0
−1 2 −1 · · · 0 0 0
1 
 .. .

(4.66) A= 2 . .. .

h  
0 0 0 · · · −1 2 −1
0 0 0 · · · 0 −1 2
Our goal here is not to address the quality of the spatial discretization, but to study the cost for
solving the linear system. To this end we need to compute the condition number of A. We also
assume the right hand vector b is already normalized so that |bi = b.
Proposition 4.12 (Diagonalization of tridiagonal matrices). A Hermitian Toeplitz tridiagonal
matrix
 
a b 0 ··· 0 0 0
b
 a b ··· 0 0 0
(4.67) A =  ... ..  ∈ CN ×N
 
 . 
0 0 0 ··· b a b
0 0 0 ··· 0 b a
can be analytically diagonalized as
(4.68) Avk = λk vk , k = 1, . . . , N.
where (vk )j = vj,k , j = 1, . . . , N , b = |b| eiθ , and
kπ jkπ ijθ
(4.69) λk = a + 2 |b| cos , vj,k = sin e .
N +1 N +1
66 4. APPLICATIONS OF QUANTUM PHASE ESTIMATION

Proof. Note that formally v0,k = vN +1,k = 0. Then direct matrix vector multiplication shows
that for any j = 1, . . . , N ,
jkπ ijθ jkπ kπ ijθ
(4.70) (Avk )j = a sin e + 2 |b| sin cos e = λk (vk )j .
N +1 N +1 N +1


Using Proposition 4.12 with a = 2/h2 , b = −1/h2 , the largest eigenvalue of A is λmax = kAk ≈
−1
4/h2 , and the smallest eigenvalue λmin = A−1 ≈ π 2 . So
4
(4.71) κ(A) ≈ = O(N 2 ).
h2 π 2
The circuit depth of the HHL algorithm is O(N 2 /), and the worst case query complexity (using
AA) is O(N 4 /). So when N is large, there is little benefit in employing the quantum computer to
solve this problem.
Let us now consider solving a d-dimensional Poisson’s equation with Dirichlet boundary con-
ditions
(4.72) − ∆u(r) = b(r), r ∈ Ω = [0, 1]d , u|∂Ω = 0.
The grid is the Cartesian product of the uniform grid in 1D with N grid points per dimension and
h = 1/(N + 1). The total number of grid points is N = N d . After discretization, we obtain a linear
system Au = b, where
(4.73) A = A ⊗ I ⊗ · · · I + · · · + I ⊗ · · · I ⊗ A,
where I is an identity matrix of size N . Since A is Hermitian and positive definite, we have
−1
kAk ≈ 4d/h2 , and A−1 ≈ dπ 2 . So κ(A) ≈ 4/(h2 π 2 ) ≈ κ(A). Therefore the condition number
is independent of the spatial dimension d.
The worst case query complexity of the HHL algorithm is O(N 2 /). So when the number of
grid points per dimension N is fixed and the spatial dimension d increases, and if U = eiAτ can be
implemented efficiently for some τ kAk < 1 with poly(d) cost, then the HHL algorithm will have
an advantage over classical solvers, for which each matrix-vector multiplication scales linearly with
respect to N and is therefore exponential in d.

4.5. Solve linear differential equations*


Consider the solution of a time-dependent linear differential equation
d
x(t) = A(t)x(t) + b(t), t ∈ [0, T ],
(4.74) dt
x(0) = x0 .
Here b(t), x(t) ∈ Cd and A(t) ∈ Cd×d . Eq. (4.74) can be used to represent a very large class
of ordinary differential equations (ODEs) and spatially discretized partial differential equations
(PDEs). For instance, if A(t) = −iH(t) for some Hermitian matrix H(t) and b(t) = 0, this is the
time-dependent Schrödinger equation
d
(4.75) i x(t) = H(t)x(t).
dt
4.5. SOLVE LINEAR DIFFERENTIAL EQUATIONS* 67

A special case when H(t) ≡ H is often called the Hamiltonian simulation problem, and the solution
can be written as
(4.76) x(T ) = e−iHT x(0),
which can be viewed as the problem of evaluating the matrix function e−iHT . This will be discussed
separately in later chapters.
In this section we consider the general case of Eq. (4.74), and for simplicity discretize the
equation in time using the forward Euler method with a uniform grid tk = k∆t where ∆t =
T /N, k = 0, . . . , N . Let Ak = A(tk ), bk = b(tk ). The resulting discretized system becomes
(4.77) xk+1 − xk = ∆t(Ak xk + bk ), k = 1, . . . , N,
which can be rewritten as
(4.78)
    
I 0 0 ··· 0 0 x1 (I + ∆tA0 )x0 + ∆tb0
−(I + ∆tA1 ) I 0 ··· 0 0   x2   ∆tb1
   
 

 0 −(I + ∆tA2 ) I ··· 0 0  x3  
  =  ∆tb 2 ,

 .. ..   ..   .. 
 . .   .   . 
0 0 0 ··· −(I + ∆tAN −1 ) I xN ∆tbN −1
or more compactly as a linear systems of equations
(4.79) Ax = b.
Here I is the identity matrix of size d, and x ∈ RN d encodes the entire history of the states.
To solve Eq. (4.78) as a QLSP, the right hand side b needs to be a normalized vector. This
means we need to properly normalize x0 , bk so that
2 2
X 2
(4.80) kbk = k(I + ∆tA0 )x0 + ∆tb0 k + (∆t)2 kbk k = 1.
k∈[N −1]

In the limit ∆t → 0, this can be written as


Z T
2 2 2
(4.81) kbk = kx0 k + kb(t)k dt = 1,
0

which is not difficult to satisfy as long as x0 , b(t) can be prepared efficiently using unitary circuits
and kx0 k , kb(t)k = Θ(1).
To solve Eq. (4.78) using the HHL algorithm, we need to estimate the condition number of A.
Note that A is a block-bidiagonal matrix and in particular is not Hermitian. So we need to use the
dilation method in Eq. (4.33) and solve the corresponding Hermitian problem.

4.5.1. Scalar case. In order to estimate the condition number, for simplicity we first assume
d = 1 (i.e., this is a scalar ODE problem), and A(t) ≡ a ∈ C is a constant. Then
 
1 0 0 ··· 0 0
−(1 + ∆ta) 1 0 ··· 0 0
 
(4.82) A=
 0 −(1 + ∆ta) 1 · · · 0 0  ∈ CN ×N .
 .. .. 
 . .
0 0 0 ··· −(1 + ∆ta) 1
68 4. APPLICATIONS OF QUANTUM PHASE ESTIMATION

Let ξ = 1 + ∆ta. When Re a ≤ 0, the absolute stability condition of the forward Euler method
requires |1 + ∆ta| = |ξ| < 1. In general we are interested in the regime ∆t |a|  1, and in particular
|1 + ∆ta| = |ξ| < 2.
2
Proposition 4.13. For any A ∈ CN ×N , kAk , (1/ A−1 )2 are given by the largest and smallest
eigenvalue of A† A, respectively.
From Proposition 4.13, we need to first compute
 2

1 + |ξ| −ξ ··· 0 0
2
 −ξ 1 + |ξ| −ξ ··· 0 0
 

2
 0 −ξ 1 + |ξ| ··· 0 0
 
A† A = 

(4.83) .
 .. .. 
 . . 
 2 
 0 0 0 ··· 1 + |ξ| −ξ 
0 0 0 ··· −ξ 1
Theorem 4.14 (Gershgorin circle theorem, see e.g. [GVL13, Theorem 7.2.1]). Let A ∈ CN ×N
with entries aij . For each i = 1, . . . , N , define
X
(4.84) Ri = |aij | .
j6=i

Let D(aii , Ri ) ⊆ C be a closed disc centered at aii with radius Ri , which is called a Gershgorin disc.
Then every eigenvalue of A lies within at least one of the Gershgorin discs D(aii , Ri )
Since A† A is Hermitian, we can restrict the Gershgorin discs to the real line so that D(aii , Ri ) ⊆
R. Then Gershgorin discs of the matrix A† A satisfy the bound
h i
2 2
D(a11 , R1 ) ⊆ 1 + |ξ| − |ξ| , 1 + |ξ| + |ξ| ,
h i
(4.85) 2 2
D(aii , Ri ) ⊆ 1 + |ξ| − 2 |ξ| , 1 + |ξ| + 2 |ξ| , i = 2, . . . , N − 1
D(aN N , RN ) ⊆ [1 − |ξ| , 1 + |ξ|] .
Applying Theorem 4.14 we have
(4.86) λmax (A† A) ≤ (1 + |ξ|)2 < 9.
for all values of a such that |ξ| < 2.
To obtain a meaningful lower bound of λmin (A† A), we need Re a < 0 and hence |ξ| < 1. Use
the inequality
√ 1
(4.87) 1 + x ≤ 1 + x, x > −1,
2
we have
p (∆t |a|)2 ∆t
(4.88) 1 − |ξ| = 1 − (1 + ∆t Re a)2 + ∆t2 (Im a)2 ≥ −∆t Re a − ≥− Re a,
2 2
when
(4.89) ∆t |a| < (− Re a)/ |a|
is satisfied. Then we have
∆t
(4.90) 1 > 1 − |ξ| ≥ − Re a.
2
4.5. SOLVE LINEAR DIFFERENTIAL EQUATIONS* 69

Then according to Theorem 4.14, under the condition Eq. (4.89),

(∆t Re a)2
(4.91) λmin (A† A) ≥ (1 − |ξ|)2 ≥ .
4
Therefore
q q
2
(4.92) kAk = λmax (A† A) ≤ 1 + |ξ| + 2 |ξ| < 3,

and

∆t(− Re a)
q
−1
(4.93) A−1 = λmin (A† A) ≥ .
2

As a result, when (− Re a) = Θ(1), the condition number satisfies

(4.94) κ(A) = kAk A−1 = O (1/∆t) .

In summary, for the scalar problem d = 1 and Re a < 0, the query complexity of the HHL
algorithm is O((∆t)−2 −2 ).

Example 4.15 (Growth of the condition number when Re a ≥ 0). The Gershgorin circle theorem
does not provide a meaningful bound of the condition number when Re a > 0 and 1 < |ξ| < 2.
 a = 1, b = 0, and the solution should grow
This is a correct behavior. To see this, just consider
exponentially as x(T ) = eT . If κ(A) = O (∆t)−1 holds and in particular is independent of the
final T , then the norm of the solution can only grow polynomially in T , which is a contradiction.
See Fig. 4.3 for an illustration. Note that when a = −1.0, ∆t = 0.1, the condition number is less
2
than 20, which is smaller than the upper bound above, i.e., 3 × ∆t(−a) = 60.

Figure 4.3. Growth of the condition number κ(A) with respect to T with a fixed
step size ∆t = 0.1 for a = 1.0 and a = −1.0.


70 4. APPLICATIONS OF QUANTUM PHASE ESTIMATION

2
Remark 4.16. It may be tempting to modify the (N, N )-th entry of A to be 1 + |ξ| to obtain
2
 
1 + |ξ| −ξ ··· 0 0
 −ξ 2
 1 + |ξ| −ξ ··· 0 0  
2
 0 −ξ 1 + |ξ| · · · 0 0 
 
(4.95) G= .. ..  .
 . . 
 2 
 0 0 0 · · · 1 + |ξ| −ξ 
2
0 0 0 ··· −ξ 1 + |ξ|
Here G is a Toeplitz tridiagonal matrix satisfying the requirement of Proposition 4.12. The eigen-
values of G take the form
2 kπ
(4.96) λk = 1 + |ξ| + 2 |ξ| cos , k = 1, . . . , N.
N +1
If we take the approximation λmin (A† A) ≈ λmin (G), we would find that Eq. (4.94) holds even when
Re a > 0. This behavior is however incorrect, despite that the matrices A† A and G only differ by
a single entry! 
4.5.2. Vector case. Here we consider a general d > 1, but for simplicity assume A(t) ≡ A ∈
Cd×d is a constant matrix. We also assume A is diagonalizable with eigenvalue decomposition
(4.97) A = V ΛV −1 ,
and Λ = diag(λ1 , . . . , λN ). We only consider the case Re λk < 0 for all k.
Proposition 4.17. For any diagonalizable A ∈ CN ×N with eigenvalue decomposition Avk = λk vk ,
we have
−1
(4.98) A−1 ≤ min |λk | ≤ max |λk | ≤ kAk .
k k

Proof. Use the Schur form A = QT Q† , where Q is an unitary matrix and T is an upper tri-
angular matrix (see e.g. [GVL13, Theorem 7.13]). The diagonal entries of T encodes all eigenvalues
of A, and the eigenvalues can appear in any order along the diagonal of T . The proposition follows
by arranging the eigenvalue of A with the smallest and largest absolute values to the (N, N )-th
entry, respectively. 

The absolute stability condition of the forward Euler method requires ∆t kAk < 1, and we are
interested in the regime ∆t kAk  1. Therefore ∆t |λk |  1 for all k.
Let I be an identity matrix of size d, and denote by B = −(I + ∆tA), then
I + B†B B†
 
··· 0 0
 B I + B†B B† ··· 0 0

 
 0 B I + B B · · · 0 0

(4.99) A A= ..  .
 
..

 . . 

 0 0 0 · · · I + B B B†

0 0 0 ··· B I
Note that
(4.100) kBk ≤ I + B † B ≤ 1 + (1 + ∆t kAk)2 ≤ 5.
4.5. SOLVE LINEAR DIFFERENTIAL EQUATIONS* 71

For any x ∈ CN d ,
2
h
2 2 2 2 2
A† Ax ≤ I + B†B (kx1 k + kx2 k ) + (kx1 k + kx2 k + kx3 k ) + · · ·
(4.101) i
2 2 2
+ (kxN −1 k + kxN k ) ≤ 15 kxk .

So λmax (A† A) ≤ 15, and


q √
(4.102) kAk = λmax (A† A) ≤ 15 = O(1).

To bound A−1 , we first note that from the eigenvalue decomposition of A, we have
(4.103)  
  I 0 0 ··· 0 0  −1 
V −(I + ∆tΛ) V
I 0 ··· 0 0
 V   V −1 
A=
  0 −(I + ∆tΛ) I ··· 0 0 
.

.. 
..

..   ..
 .  . 
. .
V −1

V
0 0 0 ··· −(I + ∆tΛ) I
Hence
(4.104) A−1 ≤ kV k V −1 max A−1
k = κ(V ) A−1
k .
k

Here κ(V ) = kV k V −1 is the condition number of the eigenvector matrix


 
1 0 0 ··· 0 0
−(1 + ∆tλk ) 1 0 ··· 0 0
 
(4.105) Ak = 
 0 −(1 + ∆tλ k ) 1 ··· 0 0 .
 .. .. 
 . .
0 0 0 ··· −(1 + ∆tλk ) 1
This reduces the problem to the scalar case. Let ξk = 1 + ∆tλk , and assume
(4.106) Re λk < 0, ∆t |λk | < (− Re λk )/ |λk | , ∀k.
Then from Eq. (4.93) we have
−1 ∆t mink (− Re λk )
(4.107) min A−1
k ≥ .
k 2
So if mink (− Re λk ) = Θ(1), we have
(4.108) κ(A) = O(κ(V )/∆t),
and the cost of the HHL algorithm is O((∆t)−2 −2 κ(V )). Hence compared to the scalar case, the
condition number of the eigenvector matrix V can play an important role.

4.5.3. Computing observables. The solution of Eq. (4.78) means that the normalized state
|xi is computed to precision  and stored in the quantum computer. In order to evaluate observables
at the final time T , i.e., hx(T )|O|x(T )i, we find that by the normalization condition, kx(T )k is on
1 1
average O(N − 2 ) = O((∆t) 2 ), and hx(T )|O|x(T )i = O(∆t). Therefore instead of reaching accuracy
, the Monte Carlo procedure must reach precision O(∆t). This increases the number of samples
by another factor of O((∆t)−2 ).
72 4. APPLICATIONS OF QUANTUM PHASE ESTIMATION

There is however a simple way to overcome this problem. Instead of solving Eq. (4.78), we
can redefine x by artificially padding the vector with N copies of the final state xN . This can be
written as and can be reinterpreted as
(4.109) X = |0i ⊗ x + |1i ⊗ y,
with the unnormalized vector
     
x1 xN +1 xN
 x2  xN +2  xN 
(4.110) x =  . , y =  .  =  . ,
     
 ..   ..   .. 
xN x2N xN
and the corresponding linear systems of equation becomes
(4.111)
    
I x1 (I + ∆tA0 )x0 + ∆tb0
−(I + ∆tA1 ) I  x2   ∆tb1 
.. ..
    
 ..    

 . 
 .  
  . 


 −(I + ∆tA N −1 ) I   xN  
 = ∆tb N −1

.

 −I I  xN +1  
   0 


 −I I  xN +2  
   0 

..  .   .
  ..   ..
 
 . 
−I I x2N 0
Note that the solution vector only requires one ancilla qubit, after solving the equation The con-
dition number of this modified equation is still κ = O((∆t)−1 ), so the total query complexity
is O((∆t)−2 −2 ). By solving the modified equation Eq. (4.111) to precision , we can estimate
hx(T )|O|x(T )i by
(4.112) hy|I ⊗ O|yi
of which the magnitude does not scale with ∆t. Here I is the identity matrix of size N , and y can
be obtained by measuring the ancilla qubit and obtain 1. If the norm of kx(t)k is comparable for
all t ∈ [0, T ], then the success probability will be Θ(1) after |Xi is obtained.

4.6. Example: Solve the heat equation*


As an application of the differential equation solver in 4.5, let us consider a toy problem of
solving the heat equation in one-dimension with Dirichlet boundary conditions
(4.113) ∂t u(r, t) = u00 (r), r ∈ Ω = [0, 1], u(0, t) = u(1, t) = 0.
After spatial discretization using the central finite difference method with d grid points, this becomes
a linear ODE system
(4.114) ∂t u = −Au,
N ×N
where A ∈ R is a tridiagonal matrix given by Eq. (4.66). After applying the forward Euler
method and discretize the simulation time T into L intervals with ∆t = T /L, we obtain a linear
system of size N L. The eigenvalues of −A, denoted by λk , are all negative and satisfy
4 −1 1
(4.115) − 2 ≈ − kAk ≤ λk ≤ − A−1 ≈ − 2.
h π
4.6. EXAMPLE: SOLVE THE HEAT EQUATION* 73

The absolute stability condition requires |1 + ∆tλk | < 1, or ∆t < h2 /4 = O(N −2 ), which implies
L = O(N 2 ). Since A is Hermitian, we have κ(V ) = 1. So for T = O(1), the query complexity for
solving the heat equation is O(N 2 −2 ), which is the same as solving Poisson’s equation.
Again, the potential advantage of the quantum solver only appears when solving the d-dimensional
heat equation
(4.116) ∂t u(r, t) = ∆u(r), r ∈ Ω = [0, 1]d , u(·, t)|∂Ω = 0.
This can be written as a linear system of equations
(4.117) ∂t u = −Au,
where A is given in Eq. (4.73). The eigenvalues of A are all negative. Note that kAk = Θ(dN 2 ),
then h = O(d−1 N −2 ), and the query complexity of the HHL solver is O(dN 2 −2 ). This could
potentially have an exponential advantage over classical solvers.

Exercise 4.1 (Quantum counting). Given query access to a function f : {0, 1}N → {0, 1} design
a quantum algorithm that computes the size of its kernel, i.e.„ total number of x’s that satisfy
f (x) = 1.
Exercise 4.2. Consider the initial value problem of the linear differential equation Eq. (4.74).
(1) Construct the linear system of equations
Ax = b
like Eq. (4.78) using the backward Euler method.
(2) In the scalar case when A(t) ≡ a ∈ C is a constant satisfying Re(a) ≤ 0, estimate the
query complexity of the HHL algorithm applying to the linear system constructed in (1).
CHAPTER 5

Trotter based Hamiltonian simulation

The Hamiltonian simulation problem with a time-independent Hamiltonian, or the Hamiltonian


simulation problem for short is the following problem: given an initial state |ψ0 i and a Hamiltonian
H, evaluate the quantum state at time t according to |ψ(t)i = e−itH |ψ0 i. Hamiltonian simulation
is of immense importance in characterizing quantum dynamics for a diverse range of systems and
situations in quantum physics, chemistry and materials science. Simulation of one quantum Hamil-
tonian by another quantum system was also one of the motivations of Feynman’s 1982 proposal
for design of quantum computers [Fey82]. We have also seen that Hamiltonian simulation appears
as a quantum subroutine in numerous other quantum algorithms, such as QPE and its various
applications.
The Hamiltonian simulation problem can also be viewed as a linear ODE:
(5.1) ∂t ψ(t) = −iHψ(t), ψ(0) = ψ0 .
−itH
However, thanks to the unitarity of the operator e for any t, we do not need to store the full
history of the quantum states as in Section 4.5, and can instead focus on the quantum state at time
t of interest.
Following the conceptualization of a universal quantum simulator using a Trotter decomposi-
tion of the time evolution operator e−itH [Llo96], many new quantum algorithms for Hamiltonian
simulation have been proposed. We will discuss some more advanced methods in later chapters.
This chapter focuses on the Trotter based Hamiltonian simulation method (also called the product
formula).

5.1. Trotter splitting


Consider the Hamiltonian simulation problem for H = H1 + H2 , where e−iH1 ∆t and e−iH2 ∆t
can be efficiently computed at least for some ∆t. In general [H1 , H2 ] 6= 0, and the splitting of the
evolution of H1 , H2 needs to be implemented via the Lie product formula
 t t
L
(5.2) e−itH = lim e−i L H1 e−i L H2 .
L→∞
When taking L to be a finite number, and let ∆t = t/L, this gives the simplest first order Trotter
method with
(5.3) e−i∆tH − e−i∆tH1 e−i∆tH2 = O(∆t2 ),
Therefore to perform Hamiltonian simulation to time t, the error is
L  2
 t t t
(5.4) e−itH − e−i L H1 e−i L H2 = O(∆t2 L) = O .
L
So to reach precision  in the operator norm, we need
(5.5) L = O(t2 −1 ).
75
76 5. TROTTER BASED HAMILTONIAN SIMULATION

This can be improved to the second order Trotter method (also called the symmetric Trotter
splitting, or Strang splitting)

(5.6) e−i∆tH − e−i∆t/2H2 e−i∆tH1 e−i∆t/2H2 = O(∆t3 ).


Following a similar analysis to the first order method, we find that to reach precision  we need
(5.7) L = O(t3/2 −1/2 ).
Higher order Trotter methods are also available, such as the p-th order Suzuki formula. The local
truncation error is (∆t)p+1 . Therefore to reach precision , we need
p+1
(5.8) L = O(t p −1/p ).
This is often written as L = O(t1+o(1) −o(1) ) as p → ∞.
Example 5.1 (Simulating transverse field Ising model). For the one dimensional transverse field
Ising model (TFIM) with nearest neighbor interaction in Eq. (4.2), wince all Pauli-Zi operators
commute, we have
Pn−1 n−1
Y
(5.9) e−itH1 := eit i=1 Zi Zi+1
= eitZi Zi+1 .
i=1
itZi Zi+1
Each e is a rotation involving only the qubits i, j, and the splitting has no error. Similarly
P Y
(5.10) e−itH2 := eg i Xi = eitgXi ,
i
itgXi
and each e can be implemented independently without error. 
Example 5.2 (Particle in a potential). Let H = −∆r + V (r) = H1 + H2 be the Hamiltonian of a
particle in a potential field V (r), where r ∈ Ω = [0, 1]d with periodic boundary conditions. After
discretization using Fourier modes, eiH1 t can be efficiently performed by diagonalizing H1 in the
Fourier space, and eiH2 t can be efficiently performed since V (r) is diagonal in the real space. 

5.2. Commutator type error bound


In this section we try to refine the error bounds in Eq. (5.3) by evaluating the preconstant
explicitly. For simplicity we only focus on the first order Trotter formula. The Trotter propagator
e (t) = e−itH1 e−itH2 satisfies the equation
U
e (t) = H1 e−itH1 e−itH2 + e−itH1 H2 e−itH2
i∂t U
(5.11) = (H1 + H2 )e−itH1 e−itH2 + e−itH1 H2 e−itH2 − H2 e−itH1 e−itH2
e (t) + [e−itH1 , H2 ]e−itH2 ,
= HU
e (0) = I. By Duhamel’s principle, and let U (t) = e−itH , we have
with initial condition U
Z t
(5.12) U (t) = U (t) − i
e e−iH(t−s) [e−isH1 , H2 ]e−isH2 ds.
0
So we have
Z t
(5.13) e (t) − U (t) ≤
U [e−isH1 , H2 ] ds.
0
5.2. COMMUTATOR TYPE ERROR BOUND 77

Now consider G(t) = [e−itH1 , H2 ]eitH1 = e−itH1 H2 eitH1 − H2 , which satisfies G(0) = 0 and
(5.14) i∂t G(t) = e−itH1 [H1 , H2 ]e+itH1 .
Hence
(5.15) [e−itH1 , H2 ] = kG(t)k ≤ t k[H1 , H2 ]k .
Plugging this back to Eq. (5.13), we have
Z t
t2
(5.16) U (t) − U (t) ≤
e s k[H1 , H2 ]k ds ≤ k[H1 , H2 ]k ≤ t2 ν 2 .
0 2
In the last equality, we have used the relation k[H1 , H2 ]k ≤ 2ν 2 with ν = max{kH1 k , kH2 k}.
Therefore Eq. (5.3) can be replaced by a sharper inequality
∆t2
(5.17) e−i∆tH − e−i∆tH1 e−i∆tH2 ≤ k[H1 , H2 ]k ≤ (∆t)2 ν 2 .
2
Here the first inequality is called the commutator norm error estimate, and the second inequality
the operator norm error estimate.
For the transverse field Ising model with nearest neighbor interaction, we have kH1 k , kH2 k =
O(n), and hence ν 2 = O(n2 ). On the other hand, since [Zi Zj , Xk ] 6= 0 only if k = i or k = j, the
commutator bound satisfies k[H1 , H2 ]k = O(n). Therefore to reach precision , the scaling of the
total number of time steps L with respect to the system size is O(n2 /) according to the estimate
based on the operator norm, but is only O(n/) according to that based on the commutator norm.
For the particle in a potential, for simplicity consider d = 1 and the domain Ω = [0, 1] is
discretized using a uniform grid of size N . For smooth and bounded potential, we have kH1 k =
O(N 2 ), and kV k = O(1). Therefore the operator norm bound gives ν 2 = O(N 4 ). This is too
pessimistic. Reexamining the second inequality of Eq. (5.17) shows that in this case, the error
bound should be O((∆t)2 ν) instead of (∆t)2 ν 2 . So according to the operator norm error estimate,
we have L = O(N 2 /). On the other hand, in the continuous space, for any smooth function ψ(r),
we have
d2
 
(5.18) [H1 , H2 ]ψ = − 2 , V ψ = −V 00 ψ − V 0 ψ 0 .
dr
So
(5.19) k[H1 , H2 ]ψk ≤ kV 00 k + kV 0 k kψ 0 k = O(N ).
Here we have used that kV 0 k = kV 00 k = O(1), and kψ 0 k = O(N ) in the worst case scenario. There-
fore k[H1 , H2 ]k = O(N ), and we obtain a significantly improved estimate L = O(N/) according to
the commutator norm.
The commutator scaling of the Trotter error is an important feature of the method. We refer
readers to [JL00] for analysis of the second order Trotter method, and [Tha08, CST+ 21] for the
analysis of the commutator scaling of high order Trotter methods.
Remark 5.3 (Vector norm bound). The Hamiltonian simulation problem of interest in practice
often concerns the solution with particular types of initial conditions, instead of arbitrary initial
conditions. Therefore the operator norm bound in Eq. (5.17) can still be too loose. Taking the
initial condition into account, we readily obtain
∆t2
(5.20) e−i∆tH ψ(0) − e−i∆tH1 e−i∆tH2 ψ(0) ≤ max k[H1 , H2 ]ψ(s)k .
2 0≤s≤∆t
78 5. TROTTER BASED HAMILTONIAN SIMULATION

For the example of the particle in a potential, we have


(5.21) max k[H1 , H2 ]ψ(s)k ≤ kV 00 k + kV 0 k max kψ 0 (s)k .
0≤s≤∆t 0≤s≤∆t

Therefore if we are given the a priori knowledge that max0≤s≤t kψ 0 (s)k = O(1), we may even have
L = O(−1 ), i.e., the number of time steps is independent of N . 

Exercise 5.1. Consider the Hamiltonian simulation problem for H = H1 + H2 + H3 . Show that
the first order Trotter formula
Ue (t) = e−itH1 e−itH2 e−itH3
has a commutator type error bound.
Exercise 5.2. Consider the time-dependent Hamiltonian simulation problem for the following
controlled Hamiltonian
H(t) = a(t)H1 + b(t)H2 ,
where a(t) and b(t) are smooth functions bounded together with all derivatives. We focus on the
following Trotter type splitting, defined as
U e (tn , tn−1 ) · · · U
e (t) := U e (tj+1 , tj ) = e−i∆ta(tj )H1 e−i∆tb(tj )H2 ,
e (t1 , t0 ), U
where the intervals [tj , tj+1 ] are equidistant and of length ∆t on the interval [0, t] with tn = t. Show
that this method has first-order accuracy, but does not exhibit a commutator type error bound in
general.
CHAPTER 6

Block encoding

In order to perform matrix computations, we must first address the problem of the input model :
how to get access to information in a matrix A ∈ CN ×N (N = 2n ) which is generally a non-unitary
matrix, into the quantum computer? One possible input model is given via the unitary eiτ A (if A
is not Hermitian, in some scenarios we can consider its Hermitian version via the dilation method).
This is particularly useful when eiτ A can be constructed using simple circuits, e.g. Trotter splitting.
A more general input model, as will be discussed in this chapter, is called “block encoding”. Of
course, if A is a dense matrix without obvious structures, any input model will be very expensive
(e.g. exponential in n) to implement. Therefore a commonly assumed input model is s-sparse, i.e.,
there are at most s nonzero entries in each row / column of the matrix. Furthermore, we have an
efficient procedure to get access to the location, as well as the value of the nonzero entries. This in
general can again be a difficult task given that the number of nonzero entries can still be exponential
in n for a sparse matrix. Some dense matrices may also be efficiently block encoded on quantum
computers. This chapter will illustrate the block encoding procedure for via a number of detailed
examples.

6.1. Query model for matrix entries


The query model for sparse matrices is based on certain quantum oracles. In some scenarios,
these quantum oracles can be implemented all the way to the elementary gate level.
Throughout the discussion we assume A is an n-qubit, square matrix, and
(6.1) kAkmax := max |Aij | < 1.
ij

If the kAkmax ≥ 1, we can simply consider the rescaled matrix A/α e for some α > kAk
max .
To query the entries of a matrix, the desired oracle takes the following general form
 q 
2
(6.2) OA |0i |ii |ji = Aij |0i + 1 − |Aij | |1i |ii |ji .

In other words, given i, j ∈ [N ] and a signal qubit 0, OA performs a controlled rotation (controlling
on i, j) of the signal qubit, which encodes the information in terms of amplitude of |0i.
However, the classical information in A is usually not stored natively in terms of such an oracle
OA . Sometimes it is more natural to assume that there is an oracle
(6.3) eA |0d0 i |ii |ji = |A
O eij i |ii |ji ,

where Aeij is a d0 -bit fixed point representation of Aij , and the value of A
eij is either computed
on-the-fly with a quantum computer, or obtained through an external database. In either case, the
implementation of O eA may be challenging, and we will only consider the query complexity with
respect to this oracle.
79
80 6. BLOCK ENCODING

Using classical arithmetic operations, we can convert this oracle into an oracle
0
(6.4) OA |0d i |ii |ji = |θeij i |ii |ji ,

where 0 ≤ θeij < 1, and θeij is a d-bit representation of θij = arccos(Aij )/π . This step may require
some additional work registers not shown here.
Now using the controlled rotation in Proposition 4.7, the information of A eij , θeij has now been
transferred to the phase of the signal qubit. We should then perform uncomputation and free the
work register storing such intermediate information A eij , θeij . The procedure is as follows

0
OA
|0i |0d i |ii |ji −−→ |0i |θeij i |ii |ji
|{z}
work register
 q 
(6.5) CR 2
−−→ Aij |0i + 1 − |Aij | |1i |θeij i |ii |ji
0 −1
 q 
(OA ) 2
−−−−−→ Aij |0i + 1 − |Aij | |1i |0d i |ii |ji

From now on, we will always assume that the matrix entries of A can be queried using the
phase oracle OA or its variants.

6.2. Block encoding


The simplest example of block encoding is the following: assume we can find a (n + 1)-qubit
unitary matrix U (i.e., U ∈ C2N ×2N ) such that
 
A ∗
UA =
∗ ∗

where ∗ means that the corresponding matrix entries are irrelevant, then for any n-qubit quantum
state |bi, we can consider the state
 
b
(6.6) |0, bi = |0i |bi = ,
0

and
 
Ab
(6.7) UA |0, bi = =: |0i A |bi + |⊥i .

Here the (unnormalized) state |⊥i can be written as |1i |ψi for some (unnormalized) state |ψi, that
is irrelevant to the computation of A |bi. In particular, it satisfies the orthogonality relation.

(6.8) (h0| ⊗ In ) |⊥i = 0.

In order to obtain A |bi, we need to measure the qubit 0 and only keep the state if it returns 0.
This can be summarized into the following quantum circuit:
6.2. BLOCK ENCODING 81

|0i
UA
A|bi
|bi kA|bik (upon measuring 0)

Figure 6.1. Circuit for block encoding of A using one ancilla qubit.

Note that the output state is normalized after the measurement takes place. The success
probability of obtaining 0 from the measurement can be computed as
2
(6.9) p(0) = kA |bik = hb|A† A|bi .
So the missing information of norm kA |bik can be recovered via the success probability p(0) if
needed. We find that the success probability is only determined by A, |bi, and is independent of
other irrelevant components of UA .
Note that we may not need to restrict the matrix UA to be a (n + 1)-qubit matrix. If we can
find any (n + m)-qubit matrix UA so that
 
A ∗ ··· ∗
 ∗ ∗ · · · ∗
(6.10) UA =  .
 
..
 ..

. 
∗ ∗ ··· ∗
Here each ∗ stands for an n-qubit matrix, and there are 2m block rows / columns in UA . The
relation above can be written compactly using the braket notation as
(6.11) A = (h0m | ⊗ In ) UA (|0m i ⊗ In )
A necessary condition for the existence of UA is that kAk ≤ 1. (Note: kAkmax ≤ 1 does not
guarantee that kAk ≤ 1, see Exercise 6.2). However, if we can find sufficiently large α and UA so
that
(6.12) A/α = (h0m | ⊗ In ) UA (|0m i ⊗ In ) .
Measuring the m ancilla qubits and all m-qubits return 0, we still obtain the normalized state
A|bi
kA|bik . The number α is hidden in the success probability:

1 2 1
(6.13) p(0m ) = kA |bik = 2 hb|A† A|bi .
α2 α
So if α is chosen to be too large, the probability of obtaining all 0’s from the measurement can be
vanishingly small.
Finally, it can be difficult to find UA to block encode A exactly. This is not a problem, since it
is sufficient if we can find UA to block encode A up to some error . We are now ready to give the
definition of block encoding in Definition 6.1.
Definition 6.1 (Block encoding). Given an n-qubit matrix A, if we can find α,  ∈ R+ , and an
(m + n)-qubit unitary matrix UA so that
(6.14) kA − α (h0m | ⊗ In ) UA (|0m i ⊗ In ) k ≤ ,
82 6. BLOCK ENCODING

then UA is called an (α, m, )-block-encoding of A. When the block encoding is exact with  = 0,
UA is called an (α, m)-block-encoding of A. The set of all (α, m, )-block-encoding of A is denoted
by BEα,m (A, ), and we define BEα,m (A) = BE(A, 0).
Assume we know each matrix element of the n-qubit matrix Aij , and we are given an (n + m)-
qubit unitary UA . In order to verify that UA ∈ BE1,m (A), we only need to verify that
(6.15) h0m , i|UA |0m , ji = Aij ,
and UA applied to any vector |0m , bi can be obtained via the superposition principle.
Therefore we may first evaluate the state UA |0m , ji, and perform inner product with |0m , ii and
verify the resulting the inner product is Aij . We will also use the following technique frequently.
Assume UA = UB UC , and then
(6.16) h0m , i|UA |0m , ji = h0m , i|UB UC |0m , ji = (UB† |0m , ii)† (UC |0m , ji).
So we can evaluate the states UB† |0m , ii , UC |0m , ji independently, and then verify the inner product
is Aij . Such a calculation amounts to running the circuit Fig. 6.2, P and if theP ancilla qubits are
measured to be 0m , the system qubits return the normalized state i Aij |ii / k i Aij |iik.

|0m i
UA
|ji

Figure 6.2. Circuit for general block encoding of A.

Example 6.2 ((1, 1)-block-encoding is general). For any n-qubit matrix A with kAk2 ≤ 1, the
singular value decomposition (SVD) of A is denoted by W ΣV † , where all singular values in the
diagonal matrix Σ belong to [0, 1]. Then we may construct an (n + 1)-qubit unitary matrix
  √  † 
W 0 √ Σ In − Σ2 V 0
UA :=
0 In In − Σ 2 −Σ 0 In
(6.17)  √ 
A W In − Σ2
= √ †
In − Σ V
2 −Σ
which is a (1, 1)-block-encoding of A. 
Example 6.3 (Random circuit block encoded matrix). In some scenarios, we may want to con-
struct a pseudo-random non-unitary matrix on quantum computers. Note that it would be highly
inefficient if we first generate a dense pseudo-random matrix A classically and then feed it into the
quantum computer using e.g. quantum random-access memory (QRAM). Instead we would like to
work with matrices that are inherently easy to generate on quantum computers. This inspires the
random circuit based block encoding matrix (RACBEM) model [DL21]. Instead of first identifying
A and then finding its block encoding UA , we reverse this thought process: we first identify a
unitary UA that is easy to implement on a quantum computer, and then ask which matrix can be
block encoded by UA .
Example 6.2 shows that in principle, any matrix A with kAk2 ≤ 1 can be accessed via a (1, 1, 0)-
block-encoding. In other words, A can be block encoded by an (n + 1)-qubit random unitary UA ,
6.2. BLOCK ENCODING 83

and UA can be constructed using only basic one-qubit unitaries and CNOT gates. The layout of
the two-qubit operations can be designed to be compatible with the coupling map of the hardware.
A cartoon is shown in Example 6.3, and an example is given in Fig. 6.4.

Figure 6.3. A cartoon illustration of the RACBEM model.

q0 : |0i U1(5.12) U3(3.09, 2.64, 2.83) U3(5.99, 6.27, 0.798) U3(4.51, 5.13, 0.165) • • U1(4.29)

q1 : |0i U1(5.17) U1(5.73) U2 (0.137, 0.709) U2 (0.833, 4.42) U1(1.3) U3(5.78, 1.22, 0.301)

q2 : |0i U2 (1.57, 2.43) • U3(3.9, 4.32, 0.171) U2 (5.16, 3.67) U2 (1.39, 1.36) U3(5.38, 3.58, 4.07) •

q3 : |0i U1(4.64) U1(2.98) U2 (5.1, 3.74) U2 (2.66, 3.07)

q0 (continue) U1(5.25) U2 (5.68, 2.7) • U1(0.034) • U2 (6.1, 1.35)

q1 (continue) U3(4.21, 5.36, 3.3) • U1(0.703) U2 (3.63, 1.84) • U2 (2.34, 4.58) U1(5.45)

q2 (continue) U3(1.05, 1.55, 5.24) U3(3.4, 5.71, 3.6) • U2 (6.27, 2.12) U1(4.21) U3(3.09, 4.85, 5.24)

q3 (continue) • U1(5.97) U1(5.5) U3(0.614, 1.83, 4.87) U1(0.782) U2 (1.67, 2.85) U1(0.783)

 
0.096 + 0.256i −0.041 + 0.058i 0.096 − 0.224i 0.120 + 0.061i −0.138 − 0.054i 0.013 + 0.052i 0.189 − 0.099i 0.152 − 0.166i

 0.143 − 0.023i 0.001 − 0.335i 0.046 − 0.237i −0.056 + 0.007i 0.063 + 0.016i 0.079 − 0.063i 0.017 + 0.276i −0.046 + 0.007i 


 0.054 − 0.017i −0.073 + 0.149i −0.002 − 0.063i −0.128 + 0.128i −0.371 + 0.048i −0.163 − 0.102i −0.069 − 0.069i 0.126 + 0.037i 

 −0.043 − 0.208i −0.156 − 0.170i 0.189 − 0.080i −0.090 + 0.142i −0.057 + 0.075i 0.252 + 0.080i 0.150 + 0.057i 0.098 − 0.043i 
A= 

 0.145 + 0.178i −0.325 + 0.125i 0.114 + 0.242i −0.136 − 0.316i 0.145 + 0.255i −0.120 − 0.335i −0.046 + 0.295i −0.142 − 0.184i 


 −0.117 + 0.149i −0.101 + 0.338i −0.213 − 0.018i −0.474 + 0.081i −0.036 − 0.121i 0.444 + 0.147i −0.198 + 0.035i −0.091 − 0.054i 

 −0.063 + 0.305i 0.001 − 0.145i −0.177 + 0.045i −0.209 − 0.150i −0.041 + 0.296i 0.046 + 0.082i 0.387 − 0.051i −0.430 + 0.233i 
−0.093 − 0.127i 0.254 + 0.307i −0.144 − 0.265i −0.048 − 0.353i 0.023 + 0.060i 0.085 − 0.156i 0.011 + 0.225i 0.249 + 0.420i

Figure 6.4. A RACBEM circuit constructed using the basic gate set
{U1 , U2 , U3 , CNOT}. The circuit at the bottom is a continuation of the top circuit.
A is the 3-qubit matrix block encoded as the upper-left block, namely, identifying
q0 as the block encoding qubit.


Example 6.4 (Block encoding of a diagonal matrix). As a special case, let us consider the block
encoding of a diagonal matrix. Since the row and column indices are the same, we may simplify
the oracle Eq. (6.2) into
 q 
2
(6.18) OA |0i |ii = Aii |0i + 1 − |Aii | |1i |ii .

In the case when the oracle O


eA is used, we may assume accordingly
(6.19) OeA |0d0 i |ii = |Aii i |ii .
Let UA = OA . Direct calculation shows that for any i, j ∈ [N ],
(6.20) h0| hi| UA |0i |ji = Aii δij .
This proves that UA ∈ BE1,1 (A), i.e., UA is a (1, 1)-block-encoding of the diagonal matrix A. 
84 6. BLOCK ENCODING

6.3. Block encoding of s-sparse matrices


We now give a few examples of block encodings of more general sparse matrices. We start from
a 1-sparse matrix, i.e., there is only one nonzero entry in each row or column of the matrix. This
means that for each j ∈ [N ], there is a unique c(j) ∈ [N ] such that Ac(j),j 6= 0, and the mapping c
is a permutation. Then there exists a unitary Oc such that
(6.21) Oc |ji = |c(j)i .
The implementation of Oc may require the usage of some work registers that are omitted here. We
also have
(6.22) Oc† |c(j)i = |ji .
We assume the matrix entry Ac(j),j can be queried via
 q 
2
(6.23) OA |0i |ji = Ac(j),j |0i + 1 − Ac(j),j |1i |ji .

Now we construct UA = (I ⊗ Oc )OA , and compute


 q 
2
(6.24) h0| hi| UA |0i |ji = h0| hi| Ac(j),j |0i + 1 − Ac(j),j |1i |c(j)i = Ac(j),j δi,c(j) .

This proves that UA ∈ BE1,1 (A).


For a more general s-sparse matrix, WLOG we assume each row and column has exactly s
nonzero entries (otherwise we can always treat some zero entries as nonzeros). For each column j,
the row index for the `-th nonzero entry is denoted by c(j, `) ≡ cj,` . For simplicity, we assume that
there exists a unitary Oc such that
(6.25) Oc |`i |ji = |`i |c(j, `)i .
s
Here we assume s = 2 and the first register is an s-qubit register. A necessary condition for this
query model is that Oc is reversible, i.e., we can have Oc† |`i |c(j, `)i = |`i |ji. This means that for
each row index i = c(j, `), we can recover the column index j given the value of `. This can be
satisfied e.g.
(6.26) c(j, `) = j + ` − `0 (mod N ),
where `0 is a fixed number. This corresponds to a banded matrix. This assumption is of course
somewhat restrictive. We shall discuss more general query models in Section 6.5.
Corresponding to Eq. (6.25), the matrix entries can be queried via
 q 
2
(6.27) OA |0i |`i |ji = Ac(j,`),j |0i + 1 − Ac(j,`),j |1i |`i |ji .

In order to construct a unitary that encodes all row indices at the same time, we define D = H ⊗s
(sometimes called a diffusion operator, which is a term originated from Grover’s search) satisfying
1 X
(6.28) D |0s i = √ |`i .
s
`∈[s]

Consider UA given by the circuit in Fig. 6.5. The measurement means that to obtain A |bi, the
ancilla register should all return the value 0.
6.4. HERMITIAN BLOCK ENCODING 85

|0i

|0s i D OA D
Oc
|bi

Figure 6.5. Quantum circuit for block encoding an s-sparse matrix.

Proposition 6.5. The circuit in Fig. 6.5 defines UA ∈ BEs,s+1 (A).


Proof. We call |0i |0s i |ji the source state, and |0i |0s i |ii the target state. In order to compute
the inner product h0| h0s | hi| UA |0i |0s i |ji, we apply D, OA , Oc to the source state accordingly as
D 1
X
|0i |0s i |ji −→ √ |0i |`i |ji
s
`∈[s]

OA 1
X q
2

(6.29) −−→ √ Ac(j,`),j |0i + 1 − Ac(j,`),j |1i |`i |ji
s
`∈[s]

Oc 1
X q
2

−−→ √ Ac(j,`),j |0i + 1 − Ac(j,`),j |1i |`i |c(j, `)i .
s
`∈[s]

Since we are only interested in the final state when all ancilla qubits are the 0 state, we may apply
D to target state |0i |0s i |ii as (note that D is Hermitian)
D 1 X
(6.30) |0i |0s i |ii −→ √ |0i |`0 i |ii .
s 0
` ∈[s]

Hence the inner product


1X 1
(6.31) h0| h0s | hi| UA |0i |0s i |ji = Ac(j,`),j δi,c(j,`) = Aij .
s s
`


6.4. Hermitian block encoding


So far we have considered general s-sparse matrices. Note that if A is a Hermitian matrix, its
(α, m, )-block-encoding UA does not need to be Hermitian. Even if  = 0, we only have that the
upper-left n-qubit block of UA is Hermitian. For instance, even the block encoding of a Hermitian,
diagonal matrix in Example 6.4 may not be Hermitian (exercise). On the other hand, there are
indeed cases when UA = UA† is indeed a Hermitian matrix, and hence the definition:
Definition 6.6 (Hermitian block encoding). Let UA be an (α, m, )-block-encoding of A. If UA
is also Hermitian, then it is called an (α, m, )-Hermitian-block-encoding of A. When  = 0, it is
called an (α, m)-Hermitian-block-encoding. The set of all (α, m, )-Hermitian-block-encoding of A
is denoted by HBEα,m (A, ), and we define HBEα,m (A) = HBE(A, 0).
The Hermitian block encoding provides the simplest scenario of the qubitization process in
Section 7.1.
86 6. BLOCK ENCODING

|0i

|0n i D OA D
Oc SWAP Or†
|bi

Figure 6.6. Quantum circuit for block encoding of general sparse matrices.

6.5. Query models for general sparse matrices*


If we query the oracle (6.25), the assumption that for each ` the value of c(j, `) is unique for all
j seems unnatural for constructing general sparse matrices. So we consider an altnerative method
for construct the block encoding of a general sparse matrix as below.
Again WLOG we assume that each row / column has at most s = 2s nonzero entries, and that
we have access to the following two (2n)-qubit oracles

Or |`i |ii = |r(i, `)i |ii ,


(6.32)
Oc |`i |ji = |c(j, `)i |ji .

Here r(i, `), c(j, `) gives the `-th nonzero entry in the i-th row and j-th column, respectively. It
should be noted that although the index ` ∈ [s], we should expand it into an n-qubit state (e.g. let
` take the last s qubits of the n-qubit register following the binary representation of integers). The
reason for such an expansion, and that we need two oracles Or , Oc will be seen shortly.
Similar to the discussion before, we need a diffusion operator satisfying

1 X
(6.33) D |0n i = √ |`i .
s
`∈[s]

This can be implemented using Hadamard gates as

(6.34) D = In−s ⊗ H ⊗s .

We assume that the matrix entries are queried using the following oracle using controlled
rotations
 q 
2
(6.35) OA |0i |ii |ji = Aij |0i + 1 − |Aij | |ii |ji ,

where the rotation is controlled by both row and column indices. However, if Aij = 0 for some i, j,
the rotation can be arbitrary, as there will be no contribution due to the usage of Or , Oc .

Proposition 6.7. Fig. 6.6 defines UA ∈ BEs,n+1 (A).


6.5. QUERY MODELS FOR GENERAL SPARSE MATRICES* 87

Proof. We apply the first four gate sets to the source state
D O
|0i |0n i |ji −→−−→c

OA 1
X q
2

−−→ √ Ac(j,`),j |0i + 1 − Ac(j,`),j |1i |c(j, `)i |ji
(6.36) s
`∈[s]

SWAP 1
X q
2

−−−−→ √ Ac(j,`),j |0i + 1 − Ac(j,`),j |1i |ji |c(j, `)i .
s
`∈[s]

We then apply D and Or to the target state


D Or 1 X
(6.37) |0i |0n i |ii −→−−→ √ |0i |r(i, `0 )i |ii .
s 0
` ∈[s]

Then the inner product gives


1X
h0| h0n | hi| UA |0i |0n i |ji = Ac(j,`),j δi,c(j,`) δr(j,`0 ),j
s 0
`,`
(6.38)
1X 1
= Ac(j,`),j δi,c(j,`) = Aij .
s s
`

Here we have used that there exists a unique ` such that i = c(j, `), and a unique `0 such that
j = r(i, `0 ). 
We remark that the quantum circuit in Fig. 6.6 is essentially the construction in [GSLW18,
Lemma 48], which gives a (s, n + 3)-block-encoding. The construction above slightly simplifies the
procedure and saves two extra qubits (used to mark whether ` ≥ s).
Next we consider the Hermitian block encoding of a s-sparse Hermitian matrix. Since A is
Hermitian, we only need one oracle to query the location of the nonzero entries
(6.39) Oc |`i |ji = |c(j, `)i |ji .
Here c(j, `) gives the `-th nonzero entry in the j-th column. It can also be interpreted as the `-th
nonzero entry in the i-th column. Again the first register needs to be interpreted as an n-qubit
register. The diffusion operator is the same as in Eq. (6.34).
Unlike all discussions before, we introduce two signal qubits, and a quantum state in the
computational basis takes the form |ai |ii |bi |ji, where a, b ∈ {0, 1}, i, j ∈ [N ]. In other words, we
may view |ai |ii as the first register, and |bi |ji as the second register. The (n + 1)-qubit SWAP
gate is defined as
(6.40) SWAP |ai |ii |bi |ji = |bi |ji |ai |ii .
To query matrix entries, we need access to the square root of Aij as (note that act on the second
single-qubit register)
 q 
p
(6.41) OA |ii |0i |ji = |ii Aij |0i + 1 − |Aij | |1i |ji .

The square root operation is well defined if Aij ≥ 0 for all entries. If A has negative (or complex)
entries, we
p first iθwrite Aij = |Aij | eiθij , θij ∈ [0, 2π), and the square root is uniquely defined as
/2
p
Aij = |Aij |e ij .
Proposition 6.8. Fig. 6.7 defines UA ∈ HBEs,n+2 (A).
88 6. BLOCK ENCODING

|0i

|0n i D Oc2 Oc†2 D


SWAP
|0i OA (OA )†

|bi Oc Oc†

Figure 6.7. Quantum circuit for Hermitian block encoding of a general Hermitian matrix

Proof. Apply the first four gate sets to the source state gives
D O
|0i |0n i |0i |ji −→−−→ c

 
OA 1
X q q
−−→ √ |0i |c(j, `)i Ac(j,`),j |0i + 1 − Ac(j,`),j |1i |ji
(6.42) s
`∈[s]

SWAP 1
X q q 
−−−−→ √ Ac(j,`),j |0i + 1 − Ac(j,`),j |1i |ji |0i |c(j, `)i
s
`∈[s]

Apply the last three gate sets to the target state


D O
|0i |0n i |0i |ii −→−−→ c

q 
(6.43) OA 1
X q
−−→ √ |0i |c(i, `0 )i Ac(i,`0 ),i |0i + 1 − Ac(i,`0 ),i |1i |ii
s 0
` ∈[s]

Finally, take the inner product as


h0| h0n | h0| hi| UA |0i |0n i |0i |ji
1 Xq q
= Ac(j,`),j A∗c(i,`0 ),i δi,c(j,`) δc(i,`0 ),j
(6.44) s 0
`,`
1 q X 1
= Aij A∗ji δi,c(j,`) δc(i,`0 ),j = Aij .
s 0
s
`,`

In this equality, we have used that A is Hermitian: Aij = A∗ji , and there exists a unique ` such that
i = c(j, `), as well as a unique `0 such that j = c(i, `0 ). 
The quantum circuit in Fig. 6.7 is essentially the construction in [CKS17]. The relation with
quantum walks will be further discussed in Section 7.2.

Exercise 6.1. Construct a query oracle OA similar to that in Eq. (6.5), when Aij ∈ C with
|Aij | < 1.
6.5. QUERY MODELS FOR GENERAL SPARSE MATRICES* 89

Exercise 6.2. Let A ∈ CN ×N be a s-sparse matrix. Prove that kAk ≤ s kAkmax . For every
1 ≤ s ≤ N , provide an example that the equality can reached.
Exercise 6.3. Construct an s-sparse matrix so that the oracle in Eq. (6.25) does not exist.
Exercise 6.4. Let A ∈ CN ×N (N = 2n ) be a Hermitian matrix with entries on the complex unit
circle Aij = zij , |zij | = 1.
2 2
(1) Construct a 2n qubit block-diagonal unitary V ∈ CN ×N such that
1 X p
V |0i |ji = √ z̄ij |ii |ji , j ∈ [N ].
N i∈[N ]

Here, block-diagonal means (hx| ⊗ I)V (|yi ⊗ I) = 0N ×N for x 6= y.


(2) Draw a circuit which uses V to implement a block encoding U of A with n ancilla qubits
. What is the prefactor α for the block encoding?
(3) Give an explicit expression for the entries of the block encoding U .
CHAPTER 7

Matrix functions of Hermitian matrices

Let A be an n-qubit Hermitian matrix. Then A has the eigenvalue decomposition


(7.1) A = V ΛV † .
Here Λ = diag({λi }) is a diagonal matrix, and λ0 ≤ · · · ≤ λN −1 . Let the scalar function f be well
defined on all λi ’s. Then the matrix function f (A) can be defined in terms of the eigendecomposition:
Definition 7.1 (Matrix function of Hermitian matrices). Let A ∈ CN ×N be a Hermitian matrix
with eigenvalue decomposition Eq. (7.1). Let f : R → C be a scalar function such that f (λi ) is
defined for all i ∈ [N ]. The matrix function is defined as
(7.2) f (A) := V f (Λ)V † ,
where
(7.3) f (Λ) = diag (f (λ0 ) , f (λ1 ) , . . . , f (λN −1 )) .
This chapter introduces techniques to construct an efficient quantum circuit to compute f (A) |bi
for any state |bi. Throughout the discussion we assume A is queried in the block encoding model
denoted by UA . For simplicity we assume that there is no error in the block encoding, i.e., UA ∈
BEα,m (A), and WLOG we can take α = 1.
Many tasks in scientific computation can be expressed in terms of matrix functions. Here are
a few examples:
• Hamiltonian simulation: f (A) = eiAt .
• Gibbs state preparation f (A) = e−βA .
• Solving linear systems of equation f (A) = A−1 .
• Eigenstate filtering f (A) = 1(−∞,0) (A − µI).
A key technique for representing matrix functions is called the qubitization.

7.1. Qubitization of Hermitian matrices with Hermitian block encoding


We first introduce some heuristic idea behind qubitization. For any −1 < λ ≤ 1, we can
consider a 2 × 2 rotation matrix,
 √   
√ λ − 1 − λ2 cos θ − sin θ
(7.4) O(λ) = = .
1 − λ2 λ sin θ cos θ
where we have performed the change of variable λ = cos θ with 0 ≤ θ < π.
Now direct computation shows
 
cos(kθ) − sin(kθ)
(7.5) Ok (λ) = .
sin(kθ) cos(kθ)
91
92 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES

Using the definition of Chebyshev polynomials (of first and second kinds, respectively)
sin(kθ) sin(k arccos λ)
(7.6) Tk (λ) = cos(kθ) = cos(k arccos λ), Uk−1 (λ) = = √ ,
sin θ 1 − λ2
we have
 √ 
√ Tk (λ) − 1 − λ2 Uk−1 (λ)
(7.7) Ok (λ) = .
1 − λ2 Uk−1 (λ) Tk (λ)
Note that if we can somehow replace λ by A, we immediately obtain a (1, 1)-block-encoding for the
Chebyshev polynomial Tk (A)! This is precisely what qubitization aims at achieving, though there
are some small twists.
In the simplest scenario, we assume that UA ∈ HBE1,m (A). Start from the spectral decompo-
sition
X
(7.8) A= λi |vi i hvi | ,
i

we have that for each eigenstate |vi i,


(7.9) UA |0m i |vi i = |0m i A |vi i + |⊥
e i i = λi |0m i |vi i + |⊥
e ii .
e i i is an unnormalized state that is orthogonal to all states of the form |0m i |ψi, i.e.,
Here |⊥
(7.10) Π |⊥
e i i = 0.
where
(7.11) Π = |0m i h0m | ⊗ In
is a projection operator.
Since the right hand side of Eq. (7.9) is a normalized state, we may also write
q
(7.12) e i i = 1 − λ2 |⊥i i ,
|⊥ i

where |⊥i i is a normalized state.


Now if λi = ±1, then Hi = span{|0m i |vi i} is already an invariant subspace of UA , and |⊥i i
can be any state. Otherwise, use the fact that UA = UA† , we can apply UA again to both sides of
Eq. (7.9) and obtain
q
(7.13) UA |⊥i i = 1 − λ2i |0m i |vi i − λi |⊥i i .
Therefore Hi = span{|0m i |vi i , |⊥i i} is an invariant subspace of UA . Furthermore, the matrix
representation of UA with respect to the basis Bi = {|0m i |vi i , |⊥i i} is
 p 
λ 1 − λ 2
i i
(7.14) [UA ]Bi = p ,
1 − λ2i −λi
i.e., UA restricted to Hi is a reflection operator. This also leads to the name “qubitization”, which
means that each eigenvector |vi i is “qubitized” into a two-dimensional space Hi .
In order to construct a block encoding for Tk (A), we need to turn UA into a rotation. For this
note that Hi is also an invariant subspace for the projection operator Π:
 
1 0
(7.15) [Π]Bi = .
0 0
7.1. QUBITIZATION OF HERMITIAN MATRICES WITH HERMITIAN BLOCK ENCODING 93

Similarly define ZΠ = 2Π − 1, since


 
1 0
(7.16) [ZΠ ]Bi = ,
0 −1
ZΠ acts as a reflection operator restricted to each subspace Hi . Then Hi is the invariant subspace
for the iterate
(7.17) O = UA ZΠ
and
 p 
λ − 1 − λ2i
(7.18) [O]Bi = p i 2
1 − λi λi
is the desired rotation matrix. Therefore
 p 
k k Tk (λi ) − 1 − λ2i Uk−1 (λi )
(7.19) [O ]Bi = [(UA ZΠ ) ]Bi = p .
1 − λ2i Uk−1 (λi ) Tk (λi )
Since {|0m i |vi i} spans the range of Π, we have
 
Tk (A) ∗
(7.20) Ok =
∗ ∗
i.e., Ok = (UA ZΠ )k is a (1, m)-block-encoding of the Chebyshev polynomial Tk (A).
In order to implement ZΠ , note that if m = 1, then ZΠ is just the Pauli Z gate. When m > 1,
the circuit
|1i Z

|bi
returns |1i |0 i if b = 0 , and − |1i |bi if b 6= 0m . So this precisely implements ZΠ where the signal
m m

qubit |1i is used as a work register. We may also discard the signal qubit, and resulting unitary is
denoted by ZΠ .
In other words, the circuit in Fig. 7.1 implements the operator O. Repeating the circuit k times
gives the (1, m + 1)-block-encoding of Tk (A).

|1i Z
|0m i ZΠ
m
|0 i ≡ UA
UA |ψi
|ψi

Figure 7.1. Circuit implementing one step of qubitization with a Hermitian block
encoding of a Hermitian matrix. Here UA ∈ HBE1,m (A).

Remark 7.2 (Alternative perspectives of qubitization). The fact that an arbitrarily large block
encoding matrix UA can be partially block diagonalized into N subblocks of size 2 × 2 may seem a
rather peculiar algebraic structure. In fact there are other alternative perspectives and derivations
94 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES

of the qubitization result. Some noticeable ones include the use of Jordan’s Lemma, and the use of
the cosine-sine (CS) decomposition. Throughout this chapter and the next chapter, we will adopt
the more “elementary” derivations used above. 

7.2. Application: Szegedy’s quantum walk*


Quantum walk is one of the major topics in quantum algorithms. Roughly speaking, there are
two versions of quantum walks. The continuous time quantum walk is the closely analogous to
its classical counterpart, i.e., continuous time random walk. The other version, the discrete time
quantum walk, or Szegedy’s quantum walk [Sze04] is not so obviously connected to the classical
random walks. We will not introduce the motivations behind the continuous and discrete time
random walks, and refer readers to [Chi21, Chapter 16,17] for detailed discussions.

7.2.1. Basics of Markov chain. Let G = (V, E) be a graph of size N . A Markov chain (or
a random walk) is given by a transition matrix P , with its entry Pij denoting the probability of the
transition from vertex i to vertex j. The matrix P is a stochastic matrix satisfying
X
(7.21) Pij ≥ 0, Pij = 1.
j

Let π be the stationary state, which is a left eigenvector of P with eigenvalue 1:


X X
(7.22) πi Pij = πj , πi ≥ 0, πi = 1.
i i

A Markov chain is irreducible if any state can be reached from any other state in a finite number
of steps. An irreducible Markov chain is aperiodic if there exists no integer greater than one that
divides the length of every directed cycle of the graph. A Markov chain is ergodic if it is both
irreducible and aperiodic. By the Perron–Frobenius Theorem, any ergodic Markov chain P has a
unique stationary state π, and πi > 0 for all i. A Markov chain is reversible if the following detailed
balance condition is satisfied
(7.23) πi Pij = πj Pji .
Now we define the discriminant matrix associated with a Markov chain as
p
(7.24) Dij = Pij Pji ,
which is real symmetric and hence Hermitian. For a reversible Markov chain, the stationary state
can be encoded as an eigenvector of D (the proof is left as an exercise).
Proposition 7.3 (Reversible Markov chain). If a Markov chain is reversible, then the coherent
version of the stationary state
X√
(7.25) |πi = πi |ii
i

is a normalized eigenvector of the discriminant matrix D satisfying


(7.26) D |πi = |πi .
Furthermore, when πi > 0 for all i, we have
√ √
(7.27) D = diag( π)P diag( π)−1 .
Therefore the set of (left) eigenvalues of P and the set of the eigenvalues of D are the same.
7.2. APPLICATION: SZEGEDY’S QUANTUM WALK* 95

7.2.2. Block encoding of the discriminant matrix. Our first goal is to construct a Her-
mitian block encoding of D. Assume that we have access to an oracle OP satisfying
Xp
(7.28) OP |0n i |ji = Pjk |ki |ji .
k

Thanks to the stochasticity of P , the right hand side is already a normalized vector, and no
additional signal qubit is needed.
We also introduce the n-qubit SWAP operator Swap operator:

(7.29) SWAP |ii |ji = |ji |ii ,

which swaps the value of the two registers in the computational basis, and can be directly imple-
mented using n two-qubit SWAP gates.
We claim that the following circuit gives UD ∈ HBE1,n (D).

|0n i
OP SWAP OP†
|ji

Figure 7.2. Circuit for the Hermitian block encoding of a discriminant matrix.

Proposition 7.4. Fig. 7.2 defines UD ∈ HBE1,n (D).

Proof. Clearly UD is unitary and Hermitian. Now we compute as before


OP SWAP
Xp Xp
(7.30) |0n i |ji −−→ Pjk |ki |ji −−−−→ Pjk |ji |ki .
k k

Meanwhile
O
Xp
(7.31) |0n i |ii −−→
P
Pik0 |k 0 i |ii .
k0

So the inner product gives


Xp
h0n | hi| UD |0n i |ji =
p
(7.32) Pik0 Pjk δj,k0 δi,k = Pij Pji = Dij .
k,k0

This proves the claim. 

7.2.3. Szegedy’s quantum walk. For a Markov chain defined on a graph G = (V, E),
Szegedy’s quantum walk implements a qubitization of the discriminant matrix D in Eq. (7.24).
Let UD be the Hermitian block encoding defined by the circuit in Fig. 7.2, we may readily plug it
into Fig. 7.1, and obtain the circuit in Fig. 7.3 denoted by OZ .
96 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES

|0n i ZΠ
OP SWAP OP†
|ψi

Figure 7.3. Circuit implementing one step of Szegedy’s quantum walk operator.

Let the eigendecomposition of D be denoted by


(7.33) D |vi i = λi |vi i .
For each |vi i, the associated basis in the 2-dimensional subspace is Bi = {|0m i |vi i , |⊥i i}. Then the
qubitization procedure gives
 p 
λ − 1 − λ2
i i
(7.34) [OZ ]Bi = p .
1 − λ2i λi
The eigenvalues of OZ in the 2 × 2 matrix block are
(7.35) e±i arccos(λi ) .
This relation is important for the following reasons. By Proposition 7.3, if a Markov chain is
reversible and ergodic, the eigenvalues of D and P are the same. In particular, the largest eigenvalue
of D is unique and is equal to 1, and the second largest eigenvalue
√ of D is 1−δ, where δ > 0 is called
the spectral gap. Since arccos(1)√ = 0, and arccos(1 − δ) ≈ 2δ, we find that the spectral gap of OZ
on the unit circle is in fact O( δ) instead of O(δ). This is called the spectral gap amplification,
which leads to e.g. the quadratic quantum speedup of the hitting time.
Example 7.5 (Determining whether there is a marked vertex in a complete graph). Let G = (V, E)
be a complete, graph of N = 2n vertices. We would like to distinguish the following two scenarios:
(1) All vertices are the same, and the random walk is given by the transition matrix
1 >
(7.36) P = ee , e = (1, . . . , 1)> .
N
(2) There is one marked vertex. Without loss of generality we may assume this is the 0-th
vertex (of course we do not have access to this information). In this case, the transition
matrix is
(
δij , i = 0,
(7.37) Peij =
Pij , i > 0.
In other words, in the case (2), the random walk will stop at the marked index. The transition
matrix can also be written in the block partitioned form as
 
1 0
(7.38) P = 1
e .
Ne e N1 eeee>
Here ee is an all 1 vector of length N − 1.
For the random walk defined by P , the stationary state is π = N1 e, and the spectral gap is
e = (1, 0, . . . , 0)> , and the spectral
1. For the random walk defined by Pe, the stationary state is π
−1
gap of is δ = N . Starting from the uniform state π, the probability distribution after k steps of
7.2. APPLICATION: SZEGEDY’S QUANTUM WALK* 97

random walk is π > Pek . This converges to the stationary state of Pe, and hence reach the marked
vertex after O(N ) steps of walks (exercise).
These properties are also inherited by the discriminant matrices, with D = P and
 
(7.39) e= 1 10> .
D
0 N eeee
To distinguish the two cases, we are given a Szegedy quantum walk operator called O, which
can be either OZ or OeZ , which is associated with D, D,
e respectively. The initial state is
(7.40) |ψ0 i = |0n i (H ⊗n |0n i).
Our strategy is to measure the expectation
(7.41) mk = hψ0 |Ok |ψ0 i ,
which can be obtained via Hadamard’s test.
Before determining the value of k, first notice that if O = OZ , then OZ |ψ0 i = |ψ0 i. Hence
mk = 1 for all values of k.
On the other hand, if O = OeZ , we use the fact that D
e only has two nonzero eigenvalues 1 and
(N − 1)/N = 1 − δ, with associated eigenvectors denoted by |e π i and |ev i = √N1−1 (0, 1, 1 . . . , 1)> ,
respectively. Furthermore,
r
1 n N −1 n
(7.42) |ψ0 i = √ |0 i |e πi + |0 i |evi .
N N
Due to qubitization, we have
r
k 1 n N −1 n
(7.43) OZ |ψ0 i = √ |0 i Tk (1) |e
e πi + |0 i Tk (1 − δ) |e
v i + |⊥i ,
N N
where |⊥i is an unnormalized state satisfying (|0n i h0n |) ⊗ In |⊥i = 0. Then using Tk (1) = 1 for all
k, we have
 
1 1
(7.44) mk = + 1− Tk (1 − δ).
N N
Use the fact that Tk (1 − δ) = cos(k arccos(1 − δ)), in order to have Tk (1 − δ) ≈ 0, the smallest k
satisfies

π π π N
(7.45) k≈ ≈ √ = √ .
2 arccos(1 − δ) 2 2δ 2 2

Therefore taking k = d π2√N2
e, we have mk ≈ 1/N . Running Hadamard’s test to constant accuracy
allows us to distinguish the two scenarios. 
Remark 7.6 (Without using the Hadamard test). Alternatively, we may evaluate the success
probability of obtaining 0n in the ancilla qubits, i.e.,
2
(7.46) p(0n ) = (|0n i h0n | ⊗ In )Ok |ψ0 i .
When O = OZ , we have p(0n ) = 1 with certainty. When O = O eZ , according to Eq. (7.43),
 
1 1
(7.47) p(0n ) = + 1− Tk2 (1 − δ).
N N

So running the problem with k = d π2√N
2
e, we can distinguish between the two cases. 
98 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES

Remark 7.7 (Comparison with Grover’s search). It is natural to draw comparisons between
Szegedy’s quantum walk and Grover’s search. The two algorithms make queries to different or-
acles, and both yield quadratic speedup compared to the classical algorithms. The quantum walk
is slightly weaker, since it only tells whether there is one marked vertex or not. On the other hand,
Grover’s search also finds the location of the marked vertex. Both algorithms consist of repeated
usage of the product of two reflectors. The number of iterations need to be carefully controlled.
Indeed, choosing a polynomial degree four times as large as Eq. (7.45) would result in mk ≈ 1 for
the case with a marked vertex. 
Remark 7.8 (Comparison with QPE). Another possible solution of the problem of finding the
marked vertex is to perform QPE on the Szegedy walk operator O (which can be OZ or O eZ ). The
effectiveness of the method rests on the spectral gap amplification discussed above. We refer to
[Chi21, Chapter 17] for more details. 
7.2.4. Comparison with the original version of Szegedy’s quantum walk. The quan-
tum walk procedure can also be presented as follows. Using the OP oracle and the multi-qubit
SWAP gate, we can define two set of quantum states
Xp
|ψj1 i = OP |0n i |ji = Pjk |ki |ji ,
k
(7.48) Xp
|ψj2 i = SWAP(OP |0n i |ji) = Pjk |ji |ki .
k

This defines two projection operators


X
(7.49) Πl = |ψjl i hψjl | , l = 1, 2,
j∈[N ]

from which we can define two 2n-qubit reflection operators RΠl = 2Πl − I2n . Let us write down
the reflection operators more explicitly. Using the resolution of identity,
(7.50) RΠ1 = OP ((2 |0n i h0n | − I) ⊗ In )OP† = OP (ZΠ ⊗ In )OP† .
Similarly
(7.51) RΠ2 = SWAP OP (ZΠ ⊗ In )OP† SWAP .
Then Szegedy’s quantum walk operator takes the form
(7.52) UZ = RΠ2 RΠ1 ,
which is a rotation operator that resembles Grover’s algorithm. Note that
(7.53) UZ = SWAP OP (ZΠ ⊗ In )OP† SWAP OP (ZΠ ⊗ In )OP† ,
so
(7.54) OP† UZ (OP† )−1 = OZ
2
,
so the walk operator is the same as a block encoding of T2 (D) using qubitization, up to a matrix
similarity transformation, and the eigenvalues are the same. In particular, consider the matrix
k
power OZ , which provides a block encoding of the Chebyshev matrix polynomial Tk (D). Then the
2k
difference between OZ and UZk appears only at the beginning and end of the circuit.
7.3. LINEAR COMBINATION OF UNITARIES 99

7.3. Linear combination of unitaries


In practical applications, we are often not interested in constructing the Chebyshev polynomial
of A, but a linear combination of Chebyshev polynomials. For instance, the matrix inversion prob-
lem can be solved by expanding f (x) = x−1 using a linear combination of Chebyshev polynomials,
on the interval [−1, −κ−1 ] ∪ [κ−1 , 1]. Here we assume kAk = 1 and κ = kAk A−1 is the condition
number. One way to implement this is via a quantum primitive called the linear combination of
unitaries (LCU).
PK−1
Let T = i=0 αi Ui be the linear combination of unitary matrices Ui . For simplicity let
K = 2a , and αi > 0 (WLOG we can absorb the phase of αi into the unitary Ui ). Then
X
(7.55) U := |ii hi| ⊗ Ui ,
i∈[K]

implements the selection of Ui conditioned on the value of the a-qubit ancilla states (also called the
control register). U is called a select oracle.
Let V be a unitary operation satisfying
1 X √
(7.56) V |0a i = p αi |ii ,
kαk1 i∈[K]

and V is called the prepare oracle. The 1-norm of the coefficients is given by
X
(7.57) kαk1 = |αi | .
i

In the matrix form

 √ 
α0 ∗ ··· ∗
1 .. . .
(7.58) V =p ∗ . . ..  .

kαk1 √
 .
αK−1 ∗ ··· ∗
where the first basis is |0m i, and all other basis functions are orthogonal to it. Then
√ √ 
α0 · · · αK−1
1  ∗ ··· ∗ 
(7.59) V† = p ..  .
 
 .. . .
kαk1  . . . 
∗ ··· ∗
Then T can be implemented using the unitary given in Lemma 7.9 (called the LCU lemma).
Lemma 7.9 (LCU). Define W = (V † ⊗ In )U (V ⊗ In ), then for any |ψi,
1
(7.60) W |0a i |ψi = |0a i T |ψi + |⊥i
e ,
kαk1

where |⊥i
e is an unnormalized state satisfying

(7.61) (|0a i h0a | ⊗ In ) |⊥i


e = 0.

In other words, W ∈ BEkαk1 ,a (T ).


100 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES

Proof. First
1 X√ 1 X√
(7.62) U (V ⊗ In ) |0a i |ψi = U p αi |ii |ψi = p αi |ii Ui |ψi .
kαk1 i kαk1 i

Then using the matrix representation (7.59), and let the state |⊥i
e collect all the states marked by
m
∗ orthogonal to |0 i,

1 X
e = 1 |0a i T |ψi + |⊥i
(7.63) (V † ⊗ In )U (V ⊗ In ) |0a i |ψi = |0a i αi Ui |ψi + |⊥i e .
kαk1 i
kαk1

The LCU Lemma is a useful quantum primitive, as it states that the number of ancilla qubits
needed only depends logarithmically on K, the number of terms in the linear combination. Hence
it is possible to implement the linear combination of a very large number of terms efficiently. From
a practical perspective, the select and prepare oracles uses multi-qubit controls, and can be difficult
to implement. If implemented directly, the number of multi-qubit controls again depends linearly
on K and is not desirable. Therefore an efficient implementation using LCU (in terms of the gate
complexity) also requires additional structures in the prepare and select oracles.
If we apply W to |0a i |ψi and measure the ancilla qubits, then the probability of obtaining the
outcome 0a in the ancilla qubits (and therefore obtaining the state T |ψi / kT |ψik in the system reg-
2 2
ister) is (kT |ψik / kαk1 ) . The expected number of repetition needed to succeed is (kαk1 / kT |ψik) .
Now we demonstrate that using amplitude amplification (AA) in Section 2.3, this number can be
reduced to O (kαk1 / kT |ψik).

Remark 7.10 (Alternative construction of the prepare oracle). In some applications it may not
be convenient to absorb the phase of αi into the select oracle. In such a case, we may modify the
√ √
prepare oracle instead. If αi = ri eiθi with ri > 0, θi ∈ [0, 2π), we can define αi = ri eiθi /2 , and
V is defined as in Eq. (7.56). However, instead of V † , we need to introduce
√ √ 
α0 · · · αK−1
1  ∗ ··· ∗ 
(7.64) Ve = p ..  .
 
 .. . .
kαk1  . . . 
∗ ··· ∗

Then following the same proof as Lemma 7.9, we find that W = (Ve ⊗ In )U (V ⊗ In ) ∈ BEkαk1 ,a (T ).


Remark 7.11 (Linear combination of non-unitaries). Using the block encoding technique, we may
immediately obtain linear combination of general matrices that are not unitaries. However, with
some abuse of notation, the term “LCU” will be used whether the terms to be combined are unitaries
or not. In other words, the term “linear combination of unitaries” should be loosely interpreted as
“linear combination of things” (LCT) in many contexts. 

Example 7.12 (Linear combination of two matrices). Let UA , UB be two n-qubit unitaries, and
we would like to construct a block encoding of T = UA + UB .
7.3. LINEAR COMBINATION OF UNITARIES 101

There are two terms in total, so one ancilla qubit is needed. The prepare oracle needs to
implement
1
(7.65) V |0i = √ (|0i + |1i),
2
so this is the Hadamard gate. The circuit is given by Fig. 7.4, which constructs W ∈ BE√2,1 (T ).

|0i H H

|ψi UA UB

Figure 7.4. Circuit for linear combination of two unitaries.

A special case is the linear combination of two block encoded matrices. Given two n-qubit
matrices A, B, for simplicity let UA ∈ BE1,m (A), UB ∈ BE1,m (B). We would like to construct a
block encoding of T = A + B. The circuit is given by Fig. 7.5, which constructs W ∈ BE√2,1+m (T ).
This is also an example of a linear combination of non-unitary matrices.

|0i H H

|0m i
UA UB
|ψi

Figure 7.5. Circuit for linear combination of two block encoded matrices.


Example 7.13 (Transverse field Ising model). Consider the following TFIM model with periodic
boundary conditions (Zn = Z0 ), and n = 2n ,
X X
(7.66) Ĥ = − Zi Zi+1 − Xi .
i∈[n] i∈[n]

In order to use LCU, we need (n + 1) ancilla qubits. The prepare oracle can be simply constructed
from the Hadamard gate
(7.67) V = H ⊗(n+1) ,
and the select oracle implements
X X
(7.68) U= |ii hi| ⊗ (−Zi Zi+1 ) + |i + ni hi + n| ⊗ (−Xi ).
i∈[n] i∈[n]

The corresponding W ∈ BE√2n,n+1 (Ĥ). 


102 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES

Example 7.14 (Block encoding of a matrix polynomial). Let us use the LCU lemma to construct
the block encoding for an arbitrary matrix polynomial for a Hermitian matrix A in Section 7.1.
X
(7.69) f (A) = αk Tk (A),
k∈[K]

with kαk1 = k∈[K] |αk | and we set K = 2a . For simplicity assume αk ≥ 0.


P

We have constructed Uk := (UA ZΠ )k as the (1, m)-block-encoding of Tk (A). From each Uk we


can implement the select oracle
X
(7.70) U := |ki hk| ⊗ Uk
k∈[K]

via multi-qubit controls. Also given the availability of the prepare oracle
1 X √
(7.71) V |0a i = p αk |ki ,
kαk1 k∈[K]

we obtain a (kαk1 , m + a)-block-encoding of f (A).


The need of using a ancilla qubits, and even more importantly the need to implement the
prepare and select oracles is undesirable. We will see later that the quantum signal processing
(QSP) and quantum singular value transformation (QSVT) can drastically reduce both sources of
difficulties. 

Example 7.15 (Matrix functions given by a matrix Fourier series). Instead of block encoding,
LCU can also utilize a different query model based on Hamiltonian simulation. Let A be an n-qubit
Hermitian matrix. Consider f (x) ∈ R given by its Fourier expansion (up to a normalization factor)
Z
(7.72) f (x) = fˆ(k)eikx dk,

and we are interested in computing the matrix function via numerical quadrature
Z X
(7.73) f (A) = fˆ(k)eikA dk ≈ ∆k fˆ(k)eikA .
k∈K

Here K is a uniform grid discretizing the interval [−L, L] using |K| = 2k grid points, and the grid
spacing is ∆k = 2L/ |K|. The prepare oracle is given by the coefficients ck = ∆k fˆ(k), and the
corresponding subnormalization factor is
X Z
(7.74) kck1 = ∆k fˆ(k) ≈ fˆ(k) dk.
k∈K

The select oracle is


X
(7.75) U= |ki hk| ⊗ eikA .
k∈K

This can be efficiently implemented using the controlled matrix powers as in Fig. 3.6, where the
basic unit is the short time Hamiltonian simulation ei∆kA . This can be used to block encode a large
class of matrix functions. 
7.4. QUBITIZATION OF HERMITIAN MATRICES WITH GENERAL BLOCK ENCODING 103

7.4. Qubitization of Hermitian matrices with general block encoding


In Section 7.1 we assume that UA = UA† to block encode a Hermitian matrix A. For instance, s-
sparse Hermitian matrices, such Hermitian block encodings can be constructed following the recipe
in Section 6.5. However, this can come at the expense of requiring additional structures and oracles.
In general, the block encoding of a Hermitian matrix may not be Hermitian itself. In this section
we demonstrate that the strategy of qubitization can be modified to accommodate general block
encodings.
Again start from the eigendecomposition Eq. (7.8), we apply UA to |0m i |vi i and obtain
q
(7.76) UA |0m i |vi i = λi |0m i |vi i + 1 − λ2i |⊥0i i ,
where |⊥0i i is a normalized state satisfying Π |⊥0i i = 0.
Since UA block-encodes a Hermitian matrix A, we have
 
A ∗
(7.77) UA† = ,
∗ ∗
which implies that there exists another normalized state |⊥i i satisfying Π |⊥i i = 0 and
q
(7.78) UA† |0m i |vi i = λi |0m i |vi i + 1 − λ2i |⊥i i .
Now apply UA to both sides of Eq. (7.78), we obtain
q q
(7.79) |0m i |vi i = λ2i |0m i |vi i + λi 1 − λ2i |⊥0i i + 1 − λ2i UA |⊥i i ,
which gives
q
(7.80) UA |⊥i i = 1 − λ2i |0m i |vi i − λi |⊥0i i .
Define
(7.81) Bi = {|0m i |vi i , |⊥i i}, Bi0 = {|0m i |vi i , |⊥0i i},
and the associated two-dimensional subspaces Hi = span Bi , Hi0 = span Bi0 , we find that UA maps
Hi to Hi0 . Correspondingly UA† maps Hi0 to Hi .
Then Eqs. (7.76) and (7.80) give the matrix representation
 p 
Bi0 λ i 1 − λ2i
(7.82) [UA ]Bi = p .
1 − λ2i −λi
Similar calculation shows that
 p 
p λi 1 − λ2i
(7.83) [UA† ]B
B0 =
i
.
i 1 − λ2i −λi
Meanwhile both Hi and Hi0 are the invariant subspaces of the projector Π, with matrix representa-
tion
 
1 0
(7.84) [Π]Bi = [Π]Bi0 = .
0 0
Therefore
 
1 0
(7.85) [ZΠ ]Bi = [ZΠ ]Bi0 = .
0 −1
104 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES

e = U † ZΠ UA ZΠ , with matrix representation


Hence Hi is an invariant subspace of O A
 p 2
λi − 1 − λ2i
(7.86) [O]Bi =
e p .
1 − λ2i λi
Repeating k times, we have
 p 2k
e k ]B =(U † ZΠ UA ZΠ )k = p λi − 1 − λ2i
[O A
i
1 − λ2i λi
(7.87)  p 
T (λ ) − 1 − λ 2U (λ )
2k i i 2k−1 i
= p .
1 − λ2i U2k−1 (λi ) T2k (λi )
Since any vector |0m i |ψi can be expanded in terms of the eigenvectors |0m i |vi i, we have
 
T2k (A) ∗
(7.88) (UA† ZΠ UA ZΠ )k = .
∗ ∗
Therefore if we would like to construct an even order Chebyshev polynomial T2k (A), the circuit
(UA† ZΠ UA ZΠ )k straightforwardly gives a (1, m)-block-encoding.
In order to construct the block-encoding of an odd polynomial T2k+1 (A), we note that
 p 

0
k Bi T2k+1 (λi ) − 1 − λ2i U2k (λi )
(7.89) [UA ZΠ (UA ZΠ UA ZΠ ) ]Bi = p .
1 − λ2i U2k (λi ) T2k+1 (λi )
Using the fact that Bi , Bi0 share the common basis |0m i |vi i, we still have the block-encoding
 
T2k+1 (A) ∗
(7.90) UA ZΠ (UA† ZΠ UA ZΠ )k = .
∗ ∗

Therefore UA ZΠ (UA† ZΠ UA ZΠ )k is a (1, m)-block-encoding of T2k+1 (A).


In summary, the block-encoding of Tl (A) is given by applying UA ZΠ and UA† ZΠ alternately. If
l = 2k, then there are exactly k such pairs. The quantum circuit for each pair O
e is

|0m i ZΠ ZΠ
UA UA†
|ψi

Figure 7.6. Circuit implementing one step of qubitization with a general block
encoding of a Hermitian matrix. This block encodes T2 (A). Here UA ∈ BE1,m (A).

Otherwise if l = 2k + 1, then there is an extra UA ZΠ . The effect is to map each eigenvector


|0m i |vi i back and forth between the two-dimensional subspaces Hi and Hi0 . In Section 7.6, we shall
see that the separate treatment even/odd polynomials will play a more prominent role.
Now that we have obtained Ok ∈ BE1,m (Tk (A)) for all k, we can use the LCU lemma again to
construct block encodings for linear combination of Chebyshev polynomials. We omit the details
here.
7.5. QUANTUM EIGENVALUE TRANSFORMATION 105

7.5. Quantum eigenvalue transformation


Let us briefly recap what we have done so far: (1) construct a block encoding of an Hermitian
matrix A (the block encoding matrix itself can be non-Hermitian); (2) use qubitization to block
encode Tk (A); (3) use LCU to block encode an arbitrary polynomial function of A (up to a subnor-
malization factor). This framework is conceptually appealing, but the practical implementation of
the select and prepare oracles are by no means straightforward, and can come at significant costs.
π
7.5.1. Hermitian block encoding. Note that iZ = ei 2 Z , if A is given by a Hermitian block
encoding UA , the block encoding of the Chebyshev polynomial in Section 7.1 can be written as
d
π
Y
(7.91) Od = (−i)d (UA ei 2 ZΠ ).
j=1

This is a special case of the following representation. Note that (−i)d is an irrelevant phase factor
and can be discarded.
d
Y

e0 ZΠ
(7.92) UΦ = e (UA eiφj ZΠ ).
e
e
j=1

The representation Eq. (7.92) is called a quantum eigenvalue transformation (QET).


Due to qubitization, UΦ
e should block encode some matrix polynomial of A. We first state the
following theorem without proof.
Theorem 7.16 (Quantum signal processing, a simplified version). Consider
 √ 
x 1 − x2
(7.93) UA = √ .
1 − x2 −x
e := (φe0 , · · · , φed ) ∈ Rd+1 ,
For any Φ
d  
iφ0 Z
Y P (x) ∗
(7.94) UΦ
e := e (UA eiφj Z ) = ,
e e
∗ ∗
j=1

where P ∈ C[x] satisfy


(1) deg(P ) ≤ d,
(2) P has parity (d mod 2),
(3) |P (x)| ≤ 1, ∀x ∈ [−1, 1].
Also define −Φ e := (−φe0 , · · · , −φed ) ∈ Rd+1 , then
d
P ∗ (x) ∗
Y  
−iφ0 Z
(7.95) U−Φ
e := e (UA e−iφj Z ) = UΦ

e = .
e e
∗ ∗
j=1
∗ ∗
Here P (x) is the complex conjugation of P (x), and UΦ
e is the complex conjugation (without trans-
pose) of UΦ
e.

Remark 7.17. This theorem can be proved inductively. However, this is a special case of the
quantum signal processing in Theorem 7.20, so we will omit the proof here. In fact, Theorem 7.20
will also state the converse of the result, which describes precisely the class of matrix polynomials
that can be described by such phase factor modulations. In Theorem 7.16, the condition (1) states
that the polynomial degree is upper bounded by the number of UA ’s, and the condition (3) is simply
106 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES

a consequence of that UΦ e is a unitary matrix. The condition (2) is less obvious, but should not
come at a surprise, since we have seen the need of treating even and odd polynomials separately in
the case of qubitization with a general block encoding. Eq. (7.95) can be proved directly by taking
the complex conjugation of UΦ e. 
Following the qubitization procedure, we immediately have Theorem 7.18.
Theorem 7.18 (Quantum eigenvalue transformation with Hermitian block encoding). Let UA ∈
HBE1,m (A). Then for any Φe := (φe0 , · · · , φed ) ∈ Rd+1 ,
d  

e0 ZΠ
Y
iφej ZΠ P (A) ∗
(7.96) UΦ
e =e (UA e )= ∈ BE1,m (P (A)),
∗ ∗
j=1

where P ∈ C[x] satisfies the requirements in Theorem 7.16.


Using Theorem 7.18, we may construct the block encoding of a matrix polynomial without
invoking LCU. The cost is essentially the same as block encoding a Chebyshev polynomial.
In order to implement eiφZΠ , we note that the quantum circuit denoted by CRφ is in Fig. 7.7
returns eiφ |0i |0m i if b = 0m , and e−iφ |0i |bi if b 6= 0m . So omitting the signal qubit, this is precisely
eiφZΠ .

|0i e−iφZ

|bi

Figure 7.7. Implementing the controlled rotation circuit for quantum eigenvalue transformation.

Therefore, if A is given by a Hermitian block encoding UA , we can follow the argument in


Section 7.1 and construct the following unitary The corresponding quantum circuit is in Fig. 7.8,
which uses one extra ancilla qubit. When measuring the (m + 1) ancilla qubits and obtain |0i |0m i,
the corresponding (unnormalized) state in the system register is P (A) |ψi.

|0i ···
CRφed CRφed−1 CRφe0
|0m i ···
UA UA UA
|ψi ···

Figure 7.8. Circuit of quantum eigenvalue transformation to construct UP (A) ∈


BE1,m+1 (P (A)), using UA ∈ HBE1,m (A).

The QET described by the circuit in Fig. 7.8 generally constructs a block encoding of P (A)
for some complex polynomial P . In practical applications (such as those later in this chapter),
7.5. QUANTUM EIGENVALUE TRANSFORMATION 107

we would like to construct a block encoding of PRe (A) ≡ (Re P )(A) = 12 (P (A) + P ∗ (A)) instead.
Below we demonstrate that a simple modification of Fig. 7.8 allows us to achieve this goal.
To this end, we use Eq. (7.95). Qubitization allows us to construct
d  ∗ 
−iφe0 ZΠ
Y
−iφ
ej ZΠ P (A) ∗
(7.97) U−Φe =e (UA e )= .
∗ ∗
j=1

So all we need is to negate all phase factors in Φ.


e In order to implement CR−φ , we do not actually
need to implement a new circuit. Instead we may simply change the signal qubit from |0i to |1i:

|1i e−iφZ

|bi

which returns e−iφ |1i |0m i if b = 0m , and eiφ |1i |bi if b 6= 0m . In other words, the circuit for UP ∗ (A)
and UP (A) are exactly the same except that the input signal qubit is changed from |0i to |1i.
Now we claim the circuit in Fig. 7.9 implements a block encoding UPRe (A) ∈ BE1,m+1 (PRe (A)).
This circuit can be viewed as an implementation of the linear combination of unitaries 21 (UP ∗ (A) +
UP (A) ).

|0i H H

|0m i UP (A)

|ψi

Figure 7.9. Circuit of quantum eigenvalue transformation for constructing a


(1, m + 1)-block-encoding of PRe (A).

To verify this, we may evaluate


H⊗Im+n 1
|0i |0m i |ψi −−−−−−→ √ (|0i + |1i) |0m i |ψi
2
UP (A) 1 1
−−−−→ √ |0i (|0m i P (A) |ψi + |⊥i) + √ |1i (|0m i P ∗ (A) |ψi + |⊥0 i)
(7.98) 2 2

 
H⊗Im+n P (A) + P (A)
−−−−−−→ |0i |0m i |ψi + |⊥i
e
2
= |0i |0m i PRe (A) |ψi + |⊥i
e

Here |⊥i , |⊥0 i are two (m+n)-qubit state orthogonal to any state |0m i |xi, while |⊥i
e is a (m+n+1)-
m
qubit state orthogonal to any state |0i |0 i |xi. In other words, by measuring all (m + 1) ancilla
qubits and obtain 0m+1 , the corresponding (unnormalized) state in the system register is PRe (A) |ψi.
108 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES

7.5.2. General block encoding. If A is given by a general block encoding UA , the quantum
eigenvalue transformation should consist of an alternating sequence of UA , UA† gates. The circuit is
given by Fig. 7.10, and the corresponding block encoding is described in Theorem 7.19. Note that
the Hermitian block encoding becomes a special case with UA = UA† .

|0i ···
CRφed CRφed−1 CRφe0
|0m i ···
UA UA† UA
|ψi ···

Figure 7.10. Circuit of quantum eigenvalue transformation to construct UP (A) ∈


BE1,m+1 (P (A)), using UA ∈ BE1,m (A). Here UA , UA† should be applied alternately.
When d is even, the last UA gate should be replaced UA† .

Theorem 7.19 (Quantum eigenvalue transformation with general block encoding). Let UA ∈
e := (φe0 , · · · , φed ) ∈ Rd+1 , let
BE1,m (A). Then for any Φ
d/2 h i
UA† eiφ2j−1 ZΠ UA eiφ2j ZΠ
Y
iφ0 ZΠ
(7.99) UΦ
e =e
e e e

j=1

when d is even, and


(d−1)/2 h i
UA† eiφ2j ZΠ UA eiφ2j+1 ZΠ
Y
d iφ0 ZΠ
(7.100) UΦ
e = (−i) e (UA eiφ1 ZΠ )
e e e e

j=1

when d is odd. Then


 
P (A) ∗
(7.101) UΦ
e = ∈ BE1,m (P (A)),
∗ ∗
where P ∈ C[x] satisfy the conditions in Theorem 7.16.
Following exactly the same procedure, we find that the circuit in Fig. 7.9, with UP (A) given by
Fig. 7.10 implements a UPRe (A) ∈ BE1,m+1 (PRe (A)). This is left as an exercise.
7.5.3. General matrix polynomials. In practical applications, we may be interested in
matrix polynomials f (A), where f (x) ∈ R[x] does not have a definite parity. This violates the
parity requirement of Theorem 7.16. This can be solved by using the LCU technique.
Note that
(7.102) f (x) = feven (x) + fodd (x),
1
where feven (x) = 2 (f (x)+ f (−x)), fodd (x) = 12 (f (x) − f (−x)). If |f (x)| ≤ 1 on [−1, 1], then
|feven (x)|, |fodd (x)| ≤ 1 on [−1, 1], and feven (x), fodd (x) can be each constructed using the circuit
in Fig. 7.9. Introducing another ancilla qubit and using the LCU technique, we obtain a (1, m + 2)-
block-encoding of (feven (A) + fodd (A))/2. In other words, we obtain a circuit Uf ∈ BE2,m+2 (f (A)).
7.6. QUANTUM SIGNAL PROCESSING 109

Note that unlike the case of the block encoding of PRe (A), we lose a subnormalization factor of 2
here.
Following the same principle, if f (x) = g(x) + ih(x) ∈ C[x] is a given complex polynomial,
and g, h ∈ R[x] do not have a definite parity, we can construct Ug(A) ∈ BE2,m+2 (g(A)), Uh(A) ∈
BE2,m+2 (h(A)). Then applying another layer of LCU, we obtain Uf (A) ∈ BE4,m+3 (f (A)).
On the other hand, if the real and imaginary parts g, h have definite parity, then Ug(A) ∈
BE1,m+1 (g(A)), Uh(A) ∈ BE1,m+1 (h(A)). Applying LCU, we obtain Uf (A) ∈ BE2,m+2 (f (A)).
The construction circuits in the cases above is left as an exercise.

7.6. Quantum signal processing


In terms of implementing matrix polynomials of Hermitian matrices, quantum eigenvalue trans-
form provides a much simpler circuit than the method based on LCU and qubitization (i.e., linear
combination of Chebyshev polynomials). The simplification is clear both in terms of the number
of ancilla qubits and of the circuit architecture. However, it is not clear so far for which polyno-
mials (either a complex polynomial P ∈ C[x] or a real polynomial PRe ∈ R[x]) we can apply the
QET technique, and how to obtain the phase factors. Quantum signal processing (QSP) provides
a complete answer to this question.
Due to qubitization, all these questions can be answered in the context of SU(2) matrices. QSP
is the theory of QET for SU(2) matrices, or the unitary representation of a scalar (real or complex)
polynomial P (x). Let A = x ∈ [−1, 1] be a scalar with a one-qubit Hermitian block encoding
 √ 
x 1 − x 2
(7.103) UA (x) = √ .
1 − x2 −x
Then
 √ 
√ x − 1 − x2
(7.104) O(x) = UA (x)Z =
1 − x2 x
is a rotation matrix.
Similar to Eq. (7.92), the QSP representation takes the following form
(7.105) U (x) = eiφ0 Z O(x)eiφ1 Z O(x) · · · eiφd−1 Z O(x)eiφd Z .
By setting φ0 = · · · = φd = 0, we immediately obtain the block encoding of the Chebyshev
polynomial Td (x). The representation power of this formulation is characterized by Theorem 7.20,
which is based on slight modification of [GSLW19, Theorem 4]. In the following discussion, even
functions have parity 0 and odd functions have parity 1.
Theorem 7.20 (Quantum signal processing). There exists a set of phase factors Φ := (φ0 , · · · , φd ) ∈
Rd+1 such that
d  √ 
iφ0 Z
Y  iφj Z
 P√
(x) −Q(x) 1 − x2
(7.106) UΦ (x) = e O(x)e =
Q∗ (x) 1 − x2 P ∗ (x)
j=1

if and only if P, Q ∈ C[x] satisfy


(1) deg(P ) ≤ d, deg(Q) ≤ d − 1,
(2) P has parity d mod 2 and Q has parity d − 1 mod 2, and
(3) |P (x)|2 + (1 − x2 )|Q(x)|2 = 1, ∀x ∈ [−1, 1].
Here deg Q = −1 means Q = 0.
110 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES

Proof. ⇒:
Since both eiφZ and O(x) are unitary, the matrix UΦ (x) is always a unitary matrix, which
immediately implies the condition (3). Below we only need to verify the conditions (1), (2).
When d = 0, UΦ (x) = eiφ0 Z , which gives P (x) = eiφ0 and Q = 0, satisfying all three conditions.
For induction, suppose U(φ0 ,··· ,φd−1 ) (x) takes the form in Eq. (7.106) with degree d − 1, then for
any φ ∈ R, we have

(7.107)
 √  √   iφ 
P√
(x) −Q(x) 1 − x2 √ x − 1 − x2 e 0
U(φ0 ,··· ,φd−1 ) (x) = ∗
Q (x) 1 − x2 P ∗ (x) 1 − x2 x 0 e−iφ

− x2 )Q(x)
   iφ 
= √xP (x) − (1 − 1 − x2 (P (x) + xQ(x)) e 0
− 1 − x2 (P ∗ (x) + xQ∗ (x)) xP ∗ (x) − (1 − x2 )Q∗ (x) 0 e−iφ

eiφ (xP (x) − (1 − x2 )Q(x)) e−iφ (− 1 − x2 (P (x) + xQ(x)))
 
= iφ √ .
e ((− 1 − x2 (P ∗ (x) + xQ∗ (x))) eiφ (xP ∗ (x) − (1 − x2 )Q∗ (x))

Therefore U(φ0 ,··· ,φd−1 ,φ) (x) satisfies conditions (1),(2).


⇐:
When d = 0, the only possibility is P (x) = eiφ0 and Q = 0, which satisfies Eq. (7.106).
For d > 0, when d is even we may still have deg P = 0, i.e., P (x) = eiφ0 and Q = 0. In this
case, note that
 √ 
−1 † √x 1 − x2 π π
(7.108) O (x) = O (x) = = e−i 2 Z O(x)e+i 2 Z ,
− 1 − x2 x

we may set φj = (−1)j π2 , j = 1, . . . , d, and

d
d
Y
eiφ0 Z O(x)eiφj Z = eiφ0 Z (O† (x)O(x)) 2 = eiφ0 Z .
 
(7.109)
j=1

Thus the statement holds.


Now given P, Q satisfying conditions (1)–(3), with deg P = ` > 0, and ` ≡ d (mod 2). Then
2
deg(|P (x)| ) = 2` > 0, and according to the condition (3) we must have deg(Q) = ` − 1. Let P, Q
be expanded as

`
X `−1
X
(7.110) P (x) = αk xk , Q(x) = βk x k ,
k=0 k=0

then the leading term of |P (x)|2 + (1 − x2 )|Q(x)|2 is

2 2 2 2
(7.111) |α` | x2` − x2 |β`−1 | x2`−2 = (|α` | − |β`−1 | )x2` = 0,

which implies |α` | = |β`−1 |.


7.6. QUANTUM SIGNAL PROCESSING 111

For any φ ∈ R, we have


 √ 
P√(x) −Q(x) 1 − x2
e−iφZ O† (x)
Q∗ (x) 1 − x2 P ∗ (x)
 √   −iφ  √ 
P√(x) −Q(x) 1 − x2 e 0 √ x 1 − x 2
=
Q∗ (x) 1 − x2 P ∗ (x) 0 eiφ − 1 − x2 x
(7.112) −iφ 2 iφ
√ −iφ
− 1 − x (−e P (x) + xQ(x)eiφ )
 
e xP (x) + (1 − x )Q(x)e 2
= √
1 − x2 (−eiφ P ∗ (x) + xQ∗ (x)e−iφ ) eiφ xP ∗ (x) + (1 − x2 )Q∗ (x)e−iφ
√ !
Pe(x) −Q(x)
e 1 − x2
=: ∗
√ ∗
.
Q (x) 1 − x
e 2 P (x)
e

It may appear that deg Pe = ` + 1. However, by properly choosing φ we may obtain deg Pe = ` − 1.
Let e2iφ = α` /β`−1 . Then the coefficient of the x`+1 term in Pe is
(7.113) e−iφ α` − eiφ β`−1 = 0.
Similarly, the coefficient of the x` term in Q
e is
(7.114) − e−iφ α` + eiφ β`−1 = 0.
The coefficient of the x` term in Pe, and the coefficient of the x`−1 term in Q
e are both 0 by the
parity condition. So we have
(1) deg(Pe) ≤ ` − 1 ≤ d − 1, deg(Q) ≤ ` − 2 ≤ d − 2,
(2) Pe has parity d − 1 mod 2 and Qe has parity d − 2 mod 2, and
2 2 e 2
(3) |P (x)| + (1 − x )|Q(x)| = 1, ∀x ∈ [−1, 1].
e
Here the condition (3) is automatically satisfied due to unitarity. The induction follows until
` = 0, and apply the argument in Eq. (7.109) to represent the remaining constant phase factor if
needed. 
Remark 7.21 (W -convention of QSP). [GSLW19, Theorem 4] is stated slightly differently as
d h i  √ 
W Y W P√(x) iQ(x) 1 − x2
(7.115) UΦW (x) = eiφ0 Z W (x)eiφj Z = ,
iQ∗ (x) 1 − x2 P ∗ (x)
j=1

where
 √ 
√ x i 1 − x2
(7.116) W (x) = ei arccos(x)X = .
i 1 − x2 x
This will be referred to as the W -convention. Correspondingly Eq. (7.105) will be referred to as the
O-convention. The two conventions can be easily converted into one another, due to the relation
π π
(7.117) W (x) = e−i 4 Z O(x)e+i 4 Z .
Correspondingly the relation between the phase angles using the O and W representations are
related according to

W π
φ0 − 4 ,
 j = 0,
W
(7.118) φj = φj , j = 1, . . . , d − 1,
 W π

φd + 4 , j = d,
112 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES

On the other hand, note that for any θ ∈ R, UΦ (x) and eiθZ UΦ (x)e−iθZ both block encodes P (x).
Therefore WLOG we may as well take
(7.119) Φ = ΦW .
In many applications, we are only interested in P ∈ C[x], and Q ∈ C[x] is not provided a
priori. [GSLW18, Theorem 4] states that under certain conditions P , the polynomial Q can always
be constructed. We omit the details here. 

7.6.1. QSP for real polynomials. Note that the normalization condition (3) in Theo-
rem 7.20 imposes very strong constraints on the coefficients of P, Q ∈ C[x]. If we are only interested
in QSP for real polynomials, the conditions can be significantly relaxed.

Theorem 7.22 (Quantum signal processing for real polynomials). Given a real polynomial
PRe (x) ∈ R[x], and deg PRe = d > 0, satisfying
(1) PRe has parity d mod 2,
(2) |PRe (x)| ≤ 1, ∀x ∈ [−1, 1],
then there exists polynomials P (x), Q(x) ∈ C[x] with Re P = PRe and a set of phase factors Φ :=
(φ0 , · · · , φd ) ∈ Rd+1 such that the QSP representation Eq. (7.106) holds.

Compared to Theorem 7.20, the conditions in Theorem 7.22 is much easier to satisfy: given
any polynomial f (x) ∈ R[x] satisfying condition (1) on parity, we can always scale f to satisfy the
condition (2) on its magnitude. Again the presentation is slightly modified compared to [GSLW19,
Corollary 5]. We can now summarize the result of QET with real polynomials as follows.

Corollary 7.23 (Quantum eigenvalue transformation with real polynomials). Let A ∈ CN ×N be


encoded by its (1, m)-block-encoding UA . Given a polynomial PRe (x) ∈ R[x] of degree d satisfying the
conditions in Theorem 7.22, we can find a sequence of phase factors Φ ∈ Rd+1 , so that the circuit in
Fig. 7.9 denoted by UΦ implements a (1, m + 1)-block-encoding of PRe (A). UΦ uses UA , UA† , m-qubit
controlled NOT, and single qubit rotation gates for O(d) times.

Remark 7.24 (Relation between QSP representation and QET circuit). Although O(x) = UA (x)Z,
π
we do not actually need to implement Z separately in QET. Note that iZ = ei 2 Z , i.e., ZeiφZ =
π
(−i)ei( 2 +φ)Z , we obtain
d
Y d h
Y i
UΦ (x) = eiφ0 Z O(x)eiφj Z = (−i)d eiφ0 Z UA (x)eiφj Z ,
 
(7.120)
e e

j=1 j=1

where φe0 = φ0 , φej = φj + π/2, j = 1, . . . , d. For the purpose of block encoding P (x), another
equivalent, and more symmetric choice is

π
φ0 + 4 , j = 0,

π
(7.121) φj = φj + 2 , j = 1, . . . , d − 1,
e
φd + π4 , j = d.

When the phase factors are given in the W -convention, since we can perform a similarity
transformation and take Φ = ΦW , we can directly convert ΦW to Φ
e according to Eq. (7.121), which
is used in the QET circuit in Fig. 7.10. 
7.6. QUANTUM SIGNAL PROCESSING 113

Example 7.25 (QSP for Chebyshev polynomial revisited). In order to block encode the Chebyshev
polynomial, we have φj = 0, j = 0, . . . , d. This gives φe0 = 0, φej = π/2, j = 1, . . . , d, and
 √  d
d √ Td (x) − 1 − x2 Ud−1 (x) d
Y π
UA (x)ei 2 Z .
 
(7.122) UΦ (x) = [O(x)] = = (−i)
1 − x Ud−1 (x)
2 Td (x)
j=1

According to Eq. (7.121), an equivalent symmetric choice for block encoding Td (x) is

π
 4 , j = 0,

(7.123) φej = π2 , j = 1, . . . , d − 1,
π

4 , j = d.

7.6.2. Optimization based method for finding phase factors. QSP for real polynomials
is the most useful version for many problems in scientific computation. Let us now summarize the
problem of finding phase factors following the W -convention and identify Φ = ΦW .
Given a target polynomial f = PRe ∈ R[x] satisfying (1) deg(f ) = d, (2) the parity of f is
d mod 2, (3) kf k∞ := maxx∈[−1,1] |f (x)| < 1, we would like to find phase factors Φ := (φ0 , · · · , φd ) ∈
[−π, π)d+1 so that
(7.124) f (x) = g(x, Φ) := Re[U (x, Φ)11 ], x ∈ [−1, 1],
with
(7.125) U (x, Φ) := eiφ0 Z W (x)eiφ1 Z W (x) · · · eiφd−1 Z W (x)eiφd Z .
Theorem 7.22 shows the existence of the phase factors. Due to the parity constraint, the number of
degrees of freedom in the target polynomial f (x) is de := d d+1
2 e. Hence f (x) is entirely determined
by   on d distinct points. Throughout the paper, we choose these points to be xk =
its values e
2k−1
cos π , k = 1, ..., d,
e i.e., positive nodes of the Chebyshev polynomial T e(x). The QSP
2d
4de
problem can be equivalently solved via the following optimization problem
de
1X 2
(7.126) Φ∗ = arg min F (Φ), F (Φ) := |g(xk , Φ) − f (xk )| ,
Φ∈[−π,π)d+1 de k=1

i.e., any solution Φ to Eq. (7.124) achieves the global minimum of the cost function with F (Φ∗ ) = 0,

and vice versa.


However, note that the number of variables is larger than the number of equations and there
should be an infinite number of global minima. [DMWL21, Theorem 2] shows that the existence of
symmetric phase factors
(7.127) Φ = (φ0 , φ1 , φ2 , . . . , φ2 , φ1 , φ0 ) ∈ [−π, π)d+1 ,
Then the optimization problem is changed to
de
∗ 1X 2
(7.128) Φ = arg min F (Φ), F (Φ) := |g(xk , Φ) − f (xk )| ,
Φ∈[−π,π) d+1
, d
e
k=1
symmetric.

This corresponds to choosing complementary polynomial Q(x) ∈ R[x]. With the symmetric con-
straint taken into account, the number of variables matches the number of constraints.
114 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES

Unfortunately, the energy landscape of the cost function F (Φ) is very complex, and has numer-
ous global as well as local minima. Starting from a random initial guess, an optimization algorithm
can easily be trapped at a local minima already when d is small. It is therefore surprising that
starting from a special symmetric initial guess
(7.129) Φ0 = (π/4, 0, 0, . . . , 0, 0, π/4),
at least one global minimum can be robustly identified using standard unconstrained optimization
algorithms even when d is as large as 10, 000 using standard double precision arithmetic opera-
tions [DMWL21], and the optimization method is observed to be free from being trapped by any
local minima. Direct calculation shows that g(x, Φ0 ) = 0, and therefore Φ0 does not contain any a
priori information of the target polynomial f (x).
This optimization based method is implemented in QSPPACK1.
Remark 7.26 (Other methods for treating QSP with real polynomials). The proof of [GSLW19,
Corollary 5] also gives a constructive algorithm for solving the QSP problem for real polynomials.
Since PRe = f ∈ R[x] is given, the idea is to first find complementary polynomials PIm , Q ∈ R[x],
so that the resulting P (x) = PRe (x) + iPIm (x) and Q(x) satisfy the requirement in Theorem 7.20.
Then the phase factors can be constructed following the recursion relation shown in the proof of
Theorem 7.20. We will not describe the details of the procedure here. It is worth noting that
the method is not numerically stable. This is made more precise by [Haa19] that these algorithms
require O(d log(d/)) bits of precision, where d is the degree of f (x) and  is the target accuracy.
It is worth mentioning that the extended precision needed in these algorithms is not an artifact
of the proof technique. For instance, for d ≈ 500, the number of bits needed to represent each
floating point number can be as large as 1000 ∼ 2000. In particular, such a task cannot be reliably
performed using standard double precision arithmetic operations which only has 64 bits. 
7.6.3. A typical workflow preparing the circuit of QET. Let us now use f (x) = cos(xt)
as in the Hamiltonian simulation to demonstrate a typical workflow of QSP. This function should
be an even or odd function to satisfy the parity constraint (1) in Theorem 7.22.
(1) Expand f (x), x ∈ [−1, 1] using a polynomial expansion (in this case, the Jacob–Anger
expansion in Eq. (7.131)), and truncate it to some finite order.
(2) Scale the truncated polynomial by a suitable constant so that the resulting real polynomial
PRe (x) satisfies the maxnorm constraint (2) in Theorem 7.22.
(3) Use the optimization based method to find phase factors ΦW = Φ, and convert the result
to Φ
e according to the relation in Eq. (7.121). Φ
e can be directly used in the QET circuit
in Figs. 7.9 and 7.10.
Remark 7.27. When the function f (x) of interest has singularity on [−1, 1], the function should
first be mollified on a proper subinterval of interest, and then approximated by polynomials. A
more streamlined method is to use the Remez exchange algorithm with parity constraint to directly
approximate f (x) on the subinterval. We refer readers to [DMWL21, Appendix E] for more details.


7.7. Application: Time-independent Hamiltonian simulation


Using QET, let us now revisit the problem of time-independent Hamiltonian simulation problem
U = eiHt . Instead of Trotter splitting, we assume that we are given access to a block encoding
1https://github.com/qsppack/QSPPACK
7.7. APPLICATION: TIME-INDEPENDENT HAMILTONIAN SIMULATION 115

UH ∈ BEα,m (H). Since eiHt = ei(H/α)(αt) , the subnormalization factor α can be factored into the
simulation time t. So WLOG we assume UH ∈ BE1,m (H). Since

(7.130) U = cos(Ht) + i sin(Ht),

and that cos(xt), sin(xt) are even and odd functions, respectively, we can construct a block encoding
for the real and imaginary part directly using the circuit for real matrix polynomials in Fig. 7.9.
More specifically, we first use the Fourier–Chebyshev series of the trigonometric functions given
by the Jacobi-Anger expansion [−1, 1]:


X
cos(tx) =J0 (t) + 2 (−1)k J2k (t)T2k (x),
k=1
(7.131) ∞
X
sin(tx) =2 (−1)k J2k+1 (t)T2k+1 (x).
k=0

Here Jν (t) denotes Bessel functions of the first kind.


This series converges very rapidly. With
 
log(1/)
(7.132) r =Θ t+
log(e + log(1/)/t)

terms, the truncated Jacobi–Anger


√ expansion with degree up to d = 2r + 1 can approximate
cos(tx), sin(tx) to precision / 2, respectively [GSLW19, Corollary 32]. Such a scaling also matches
the complexity lower bound for Hamiltonian simulation [BACS07, LC17b]. Define

r r
1 2X 2X
(7.133) Cd (x) = J0 (t) + (−1)k J2k (t)T2k (x), Sd (x) = (−1)k J2k+1 (t)T2k+1 (x),
β β β
k=1 k=0

where β > 1 is chosen so that |Cd |, |Sd | ≤ 1 on [−1, 1], and β can be chosen to be as small as 1 + .
Also let fd (x) = Cd (x) + iSd (x), then

(7.134) max βfd (x) − eitx ≤ .


x∈[−1,1]

Theorem 7.22 guarantees the existence of phase factors Φ eC, Φ


e S , using which we can construct
UC ∈ BE1,m+1 (Cd (H)) and Us ∈ BE1,m+1 (Sd (H)). Finally, we can use one more ancilla qubit and
LCU in Section 7.5.3 to construct a block encoding Ud ∈ BE2,m+2 (fd (H)), or Ud ∈ BE2β,m+2 (U, ).
The circuit depth is O (t + log(1/)).
As an example, Fig. 7.11 shows the QSP representation of the quality of approximating cos(tx)
using PRe (x) = Cd (x) with t = 4π, β = 1.001, d = 24. The quality of the approximation can be
significantly improved with a larger degree d = 50 (see Fig. 7.12). The phase factors are obtained
via QSPPACK.
116 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES

Figure 7.11. QSP representation of cos(tx)/β with t = 4π, β = 1.001, d = 24.


The phase factors plotted removes a factor of π/4 on both ends (see Eq. (7.129)).

Figure 7.12. Error of the QSP representation of cos(tx)/β with t = 4π, β =


1.001, d = 50. The phase factors plotted removes a factor of π/4 on both ends (see
Eq. (7.129)).

7.8. Application: Ground state preparation


Given a block encoding UH ∈ BE1,m (H), and WLOG assume 0  H  1. We assume that we
2
are provided an initial state |ϕi so that the initial overlap p0 = |hϕ|ψ0 i| is not small. To simplify
the problem we also assume that ground and the first excited state energies E0 , E1 are known with
a positive gap ∆ := E1 − E0 > 0. Our goal is to use QET to prepare an approximate quantum
state |ψi ≈ |ψ0 i.
To this end, we can construct a threshold polynomial approximating the sign function on [0, 1].
Due to the assumption that 0  H  1, this function can be chosen to be an even function. Since
the sign function is a discontinuous, the polynomial should only aim at approximating the sign
function outside (µ − ∆/2, µ + ∆/2), where µ = (E0 + E1 )/2. The polynomial also needs to satisfy
the conditions in Theorem 7.22. We need to find an even polynomial PRe (x) satisfying
(7.135) |PRe (x) − 1| ≤ , ∀x ∈ [0, µ − ∆/2]; |PRe (x)| ≤ , ∀x ∈ [µ + ∆/2, 1].
We can achieve this by approximating the sign function, with deg(PRe ) = O(log(1/)∆−1 ). (see
e.g. [LC17a, Corollary 7] and [GSLW19, Corollary 16]). This construction is based on an approxi-
mation to the erf function. Fig. 7.13 gives a concrete construction of the even polynomial obtained
via numerical optimization, and the phase factors are obtained via QSPPACK.
7.8. APPLICATION: GROUND STATE PREPARATION 117

Figure 7.13. QSP representation for approximating a step function using an even
polynomial on [0, 0.1] ∪ [0.2, 1]. The phase factors plotted removes a factor of π/4
on both ends (see Eq. (7.129)).

Applying the circuit UΦ to the initial state |ϕi, we have


√ p
(7.136) UΦ |0m i |ϕi = p0 |0m i PRe (H) |ψ0 i + 1 − p0 |0m i PRe (H) |ψ⊥ i + |⊥i .
Here |ψ⊥ i is a state in the system register orthogonal to |ψ0 i, while |⊥i is orthogonal to all states
|0m i |ψi. Note that
(7.137) kPRe (H) |ψ⊥ ik ≤ , kPRe (H) |ψ0 ik ≥ 1 − ,
Therefore if we measure the ancilla qubits, the success probability of obtaining 0m in the ancilla
qubits, and the ground state |ψ0 i in the system register is at least p0 (1 − ). So the total number
of queries to to UA and UA† is O(∆−1 p−1
0 log(1/)).
Using amplitude amplification, the number of repetitions can be reduced to O(γ −1 ), and the
−1
total number of queries to to UA and UA† becomes O(∆−1 p0 2 log(1/)). This also matches the
lower bound [LT20a].
Once the ground state is prepared, we can estimate the ground state energy by measuring the
expectation value hψ0 |H|ψ0 i. The number of samples needed is O(1/2 ), which can be reduced to
O(1/) using amplitude amplification. In summary, the best complexity for estimating the ground
−1
state energy E0 to accuracy  is O(∆−1 p0 2 −1 log(1/)). Note that the cost of estimating the
ground state energy depends on the gap ∆. This is because the algorithm first prepares the ground
state and then estimates the ground state energy. If we are only interested in estimating E0 to
precision , the gap dependence is not necessary (see Section 4.1 as well as [LT20a]).

Exercise 7.1. Let A, B be two n-qubit matrices. Construct a circuit to block encode C = A + B
with UA ∈ BEαA ,m (A), UB ∈ BEαB ,m (B).
Exercise 7.2. Use LCU to construct a block encoding of the TFIM model with periodic boundary
conditions in Eq. (4.2), with g 6= 1.
Exercise 7.3. Prove Proposition 7.3.
Exercise 7.4. Let A be an n-qubit Hermitian matrix. Write down the circuit for UPRe (A) ∈
BE1,m+1 (PRe (A)) with a block encoding UA ∈ BE1,m (A), where P is characterized by the phase
sequence Φ
e specified in Theorem 7.16.
118 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES

Exercise 7.5. Write down the circuit for LCU of Hamiltonian simulation.
Exercise 7.6. Using QET to prepare the Gibbs state.
CHAPTER 8

Quantum singular value transformation

In Chapter 7, we have found that using qubitization, we can effectively block encode the
Chebyshev matrix polynomial Tk (A) for a Hermitian matrix A. Combined with LCU, we can
construct a block encoding of any matrix polynomial of A. The process is greatly simplified using
QSP and QET, which allows the implementation a general class of matrix functions for Hermitian
matrices.
In this section, we generalize the results of qubitization and QET to general non-Hermitian ma-
trices. This is called the quantum singular value transformation (QSVT). Throughout the chapter
we assume A ∈ CN ×N is a square matrix. QSVT is applicable to non-square matrices as well, and
we will omit the discussions here.

8.1. Generalized matrix functions


For a square matrix A ∈ CN ×N , where for simplicity we assume N = 2n for some positive
integer n, the singular value decomposition (SVD) of the normalized matrix A can be written as
(8.1) A = W ΣV † ,
or equivalently
(8.2) A |vi i = σi |wi i , A† |wi i = σi |vi i , i ∈ [N ].
We may apply a function f (·) on its singular values and define the generalized matrix function
[HBI73, ABF16] as below.
Definition 8.1 (Generalized matrix function [ABF16, Definition 4]). Given A ∈ CN ×N with sin-
gular value decomposition Eq. (8.1), and let f : R → C be a scalar function such that f (σi ) is
defined for all i ∈ [N ]. The generalized matrix function is defined as
(8.3) $f^{\diamond}(A) := W f(\Sigma) V^\dagger$,
where
f (Σ) = diag (f (σ0 ) , f (σ1 ) , . . . , f (σN −1 )) .
Given the form in Eq. (8.2), we also define two other types of generalized matrix functions.
Definition 8.2. Under the conditions in Definition 8.1, the left and right generalized matrix functions are defined respectively as
(8.4) $f^{\triangleleft}(A) := W f(\Sigma) W^\dagger, \qquad f^{\triangleright}(A) := V f(\Sigma) V^\dagger$.


Here the left and right pointing triangles reflect that the transformation only keeps the left and right singular vectors, respectively. For a given A, somewhat confusingly, in the discussion below the transformations $f^{\triangleright}(A)$, $f^{\triangleleft}(A)$, $f^{\diamond}(A)$ will all be referred to as singular value transformations. In particular, QSVT mainly concerns $f^{\triangleright}(A)$, $f^{\diamond}(A)$.
Proposition 8.3. The following relations hold:
(8.5) $f^{\diamond}(A^\dagger) = (f^{\diamond}(A))^\dagger, \qquad f^{\triangleright}(A) = f^{\triangleleft}(A^\dagger)$,
and
(8.6) $f^{\triangleright}(A) = f^{\diamond}(\sqrt{A^\dagger A}) = f(\sqrt{A^\dagger A}), \qquad f^{\triangleleft}(A) = f^{\diamond}(\sqrt{AA^\dagger}) = f(\sqrt{AA^\dagger})$.
Proof. Just note that $A^\dagger A = V\Sigma^2 V^\dagger$, so $\sqrt{A^\dagger A} = V\Sigma V^\dagger$. Hence the eigenvalue decomposition and the singular value decomposition coincide for both $\sqrt{A^\dagger A}$ and $\sqrt{AA^\dagger}$. □
WLOG we assume access to UA ∈ BE1,m (A), so that the singular values of A are in [0, 1], i.e.,
(8.7) 0 ≤ σi ≤ 1, i ∈ [N ].
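The definitions above are straightforward to check numerically. The following is a minimal NumPy sketch (the matrix, the function f, and all variable names are illustrative choices, not part of the notes) that forms $f^{\diamond}(A)$, $f^{\triangleleft}(A)$, $f^{\triangleright}(A)$ from an SVD and verifies the relation $f^{\triangleright}(A) = f(\sqrt{A^\dagger A})$ of Proposition 8.3:

```python
# Numerical sketch of Definitions 8.1-8.2 and Proposition 8.3 (illustrative example only).
import numpy as np

rng = np.random.default_rng(0)
N = 8
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A /= np.linalg.norm(A, 2)                 # normalize so that all singular values lie in [0, 1]

W, sigma, Vh = np.linalg.svd(A)           # A = W diag(sigma) V^dagger
f = np.cos                                # any scalar function defined on the singular values

f_diamond = W @ np.diag(f(sigma)) @ Vh                 # f^{diamond}(A) = W f(Sigma) V^dagger
f_left    = W @ np.diag(f(sigma)) @ W.conj().T         # f^{triangleleft}(A) = W f(Sigma) W^dagger
f_right   = Vh.conj().T @ np.diag(f(sigma)) @ Vh       # f^{triangleright}(A) = V f(Sigma) V^dagger

# Proposition 8.3: f^{triangleright}(A) = f(sqrt(A^dagger A)), computed by diagonalizing A^dagger A
lam, U = np.linalg.eigh(A.conj().T @ A)
f_sqrt = U @ np.diag(f(np.sqrt(np.clip(lam, 0, None)))) @ U.conj().T
print(np.linalg.norm(f_right - f_sqrt))   # ~1e-14
```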

8.2. Qubitization of general matrices


In Section 7.4 we have observed that when A is a Hermitian matrix, the qubitization procedure
introduces two different subspaces Hi and Hi0 associated with each eigenvector |vi i. In particular,
UA maps Hi to Hi0 , and UA† maps Hi0 to Hi . Furthermore, both Hi and Hi0 are the invariant
subspaces of the projection operator Π. Therefore Hi is an invariant subspace of UA† f (Π)UA for
any function f. Much of the same structure carries over to the quantum singular value transformation. The only difference is that the qubitization is now defined with respect to the singular vectors. The procedure below almost entirely parallels that of Section 7.4, except that we need to work with the singular value decomposition instead of the eigenvalue decomposition.
Starting from the SVD in Eq. (8.2), we apply $U_A$ to $|0^m\rangle|v_i\rangle$ and obtain
(8.8) $U_A|0^m\rangle|v_i\rangle = \sigma_i\,|0^m\rangle|w_i\rangle + \sqrt{1-\sigma_i^2}\,|\perp_i'\rangle$,
where $|\perp_i'\rangle$ is a normalized state satisfying $\Pi|\perp_i'\rangle = 0$.
Since $U_A$ block encodes a matrix A, we have
(8.9) $U_A^\dagger = \begin{pmatrix} A^\dagger & * \\ * & * \end{pmatrix}$,
which implies that there exists another normalized state $|\perp_i\rangle$ satisfying $\Pi|\perp_i\rangle = 0$ and
(8.10) $U_A^\dagger |0^m\rangle|w_i\rangle = \sigma_i\,|0^m\rangle|v_i\rangle + \sqrt{1-\sigma_i^2}\,|\perp_i\rangle$.
Now applying $U_A$ to both sides of Eq. (8.10), we obtain
(8.11) $|0^m\rangle|w_i\rangle = \sigma_i^2\,|0^m\rangle|w_i\rangle + \sigma_i\sqrt{1-\sigma_i^2}\,|\perp_i'\rangle + \sqrt{1-\sigma_i^2}\, U_A|\perp_i\rangle$,
which gives
(8.12) $U_A|\perp_i\rangle = \sqrt{1-\sigma_i^2}\,|0^m\rangle|w_i\rangle - \sigma_i\,|\perp_i'\rangle$.
Define
(8.13) $B_i = \{|0^m\rangle|v_i\rangle,\ |\perp_i\rangle\}, \qquad B_i' = \{|0^m\rangle|w_i\rangle,\ |\perp_i'\rangle\}$,

and the associated two-dimensional subspaces $\mathcal{H}_i = \mathrm{span}\, B_i$, $\mathcal{H}_i' = \mathrm{span}\, B_i'$, we find that $U_A$ maps $\mathcal{H}_i$ to $\mathcal{H}_i'$. Correspondingly $U_A^\dagger$ maps $\mathcal{H}_i'$ to $\mathcal{H}_i$.
Then Eqs. (8.8) and (8.12) give the matrix representation
(8.14) $[U_A]_{B_i}^{B_i'} = \begin{pmatrix} \sigma_i & \sqrt{1-\sigma_i^2} \\ \sqrt{1-\sigma_i^2} & -\sigma_i \end{pmatrix}$.
A similar calculation shows that
(8.15) $[U_A^\dagger]_{B_i'}^{B_i} = \begin{pmatrix} \sigma_i & \sqrt{1-\sigma_i^2} \\ \sqrt{1-\sigma_i^2} & -\sigma_i \end{pmatrix}$.
Meanwhile both $\mathcal{H}_i$ and $\mathcal{H}_i'$ are invariant subspaces of the projector $\Pi$, with matrix representation
(8.16) $[\Pi]_{B_i} = [\Pi]_{B_i'} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$.
Therefore
(8.17) $[Z_\Pi]_{B_i} = [Z_\Pi]_{B_i'} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$.
Hence $\mathcal{H}_i$ is an invariant subspace of $\widetilde O = U_A^\dagger Z_\Pi U_A Z_\Pi$, with matrix representation
(8.18) $[\widetilde O]_{B_i} = \begin{pmatrix} \sigma_i & -\sqrt{1-\sigma_i^2} \\ \sqrt{1-\sigma_i^2} & \sigma_i \end{pmatrix}^{2}$.
Repeating k times, we have
(8.19) $[\widetilde O^k]_{B_i} = [(U_A^\dagger Z_\Pi U_A Z_\Pi)^k]_{B_i} = \begin{pmatrix} \sigma_i & -\sqrt{1-\sigma_i^2} \\ \sqrt{1-\sigma_i^2} & \sigma_i \end{pmatrix}^{2k} = \begin{pmatrix} T_{2k}(\sigma_i) & -\sqrt{1-\sigma_i^2}\,U_{2k-1}(\sigma_i) \\ \sqrt{1-\sigma_i^2}\,U_{2k-1}(\sigma_i) & T_{2k}(\sigma_i) \end{pmatrix}$.
In other words,
(8.20) $\widetilde O^k = \begin{pmatrix} \sum_i |v_i\rangle T_{2k}(\sigma_i)\langle v_i| & * \\ * & * \end{pmatrix} = \begin{pmatrix} T_{2k}^{\triangleright}(A) & * \\ * & * \end{pmatrix}$.
Therefore the circuit $(U_A^\dagger Z_\Pi U_A Z_\Pi)^k$ gives a (1, m)-block-encoding of $T_{2k}^{\triangleright}(A)$.
Similarly,
(8.21) $[U_A Z_\Pi (U_A^\dagger Z_\Pi U_A Z_\Pi)^k]_{B_i}^{B_i'} = \begin{pmatrix} T_{2k+1}(\sigma_i) & -\sqrt{1-\sigma_i^2}\,U_{2k}(\sigma_i) \\ \sqrt{1-\sigma_i^2}\,U_{2k}(\sigma_i) & T_{2k+1}(\sigma_i) \end{pmatrix}$.
In other words,
(8.22) $U_A Z_\Pi (U_A^\dagger Z_\Pi U_A Z_\Pi)^k = \begin{pmatrix} \sum_i |w_i\rangle T_{2k+1}(\sigma_i)\langle v_i| & * \\ * & * \end{pmatrix} = \begin{pmatrix} T_{2k+1}^{\diamond}(A) & * \\ * & * \end{pmatrix}$.
Therefore the circuit $U_A Z_\Pi (U_A^\dagger Z_\Pi U_A Z_\Pi)^k$ gives a (1, m)-block-encoding of $T_{2k+1}^{\diamond}(A)$.
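These block-encoding identities can be verified numerically. The sketch below is illustrative only: it constructs one particular block encoding $U_A$ by an explicit unitary dilation (this specific dilation is an assumption of the sketch, not a circuit from the notes), uses $Z_\Pi = 2\Pi - I$ consistent with Eq. (8.17), and compares the top-left blocks of the qubitization products with $T_{2k}^{\triangleright}(A)$ and $T_{2k+1}^{\diamond}(A)$:

```python
# Numerical check of Eqs. (8.20) and (8.22) for a randomly generated A (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
N = 4
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A /= 2 * np.linalg.norm(A, 2)                  # singular values strictly inside (0, 1)
W, s, Vh = np.linalg.svd(A)
V = Vh.conj().T
c = np.diag(np.sqrt(1 - s**2))

# an explicit unitary dilation U_A whose top-left block is A
UA = np.block([[A,                   W @ c @ W.conj().T],
               [V @ c @ V.conj().T, -V @ np.diag(s) @ W.conj().T]])
assert np.allclose(UA @ UA.conj().T, np.eye(2 * N))

Pi  = np.diag(np.r_[np.ones(N), np.zeros(N)])  # projector onto the ancilla-0 block
ZPi = 2 * Pi - np.eye(2 * N)                   # Z_Pi, cf. Eq. (8.17)
O   = UA.conj().T @ ZPi @ UA @ ZPi             # the qubitization iterate

T = lambda k, x: np.cos(k * np.arccos(x))      # Chebyshev polynomial of the first kind
k = 3
Ok = np.linalg.matrix_power(O, k)
print(np.linalg.norm(Ok[:N, :N] - V @ np.diag(T(2 * k, s)) @ Vh))       # T_{2k}^{|>}(A), ~1e-13
odd = UA @ ZPi @ Ok
print(np.linalg.norm(odd[:N, :N] - W @ np.diag(T(2 * k + 1, s)) @ Vh))  # T_{2k+1}^{diamond}(A), ~1e-13
```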

Remark 8.4. By approximating any continuous function f using polynomials, and using the LCU lemma, we can approximately evaluate $f^{\diamond}(A)$ for any odd function f, and $f^{\triangleright}(A)$ for any even function f. This may seem somewhat restrictive. However, note that all singular values are non-negative. Hence when performing the polynomial approximation, if we are interested in $f^{\diamond}(A)$, we can always first perform a polynomial approximation of an odd extension of f, i.e.,
(8.23) $g(x) = \begin{cases} f(x), & x > 0, \\ 0, & x = 0, \\ -f(-x), & x < 0, \end{cases}$
and then evaluate $g^{\diamond}(A)$. Similarly, if we are interested in $f^{\triangleright}(A)$ for a general f, we can perform a polynomial approximation to its even extension
(8.24) $g(x) = \begin{cases} f(x), & x \ge 0, \\ f(-x), & x < 0, \end{cases}$
and evaluate $g^{\triangleright}(A)$. □
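For concreteness, a tiny sketch of the extensions in Eqs. (8.23) and (8.24) (the choice $f(x) = e^{-x}$ is arbitrary and purely illustrative):

```python
# Odd and even extensions, Eqs. (8.23)-(8.24), for an illustrative f(x) = exp(-x).
import numpy as np

f = lambda x: np.exp(-x)
g_odd  = lambda x: np.where(x > 0, f(x), np.where(x < 0, -f(-x), 0.0))   # Eq. (8.23)
g_even = lambda x: np.where(x >= 0, f(x), f(-x))                         # Eq. (8.24)

x = np.linspace(-1, 1, 5)
print(g_odd(x))     # odd:  g(-x) = -g(x)
print(g_even(x))    # even: g(-x) =  g(x)
```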

8.3. Quantum singular value transformation


8.3.1. Quantum circuit. Due to the close relation between eigenvalue and singular value
transformation in terms of Chebyshev polynomials and qubitization in Section 8.2, we can obtain
the QSVT circuit easily following the discussion in Section 7.6.
First, there are no changes to the scalar case of QSP (in terms of SU(2) matrices), and in
particular Theorem 7.20 and Theorem 7.22.
For the matrix case, when d is even,
(8.25) $U_\Phi = (-i)^d\, e^{i\phi_0 Z_{\widetilde\Pi}} \prod_{j=1}^{d/2}\left[ U_A^\dagger e^{i\phi_{2j-1}Z_{\widetilde\Pi}}\, U_A\, e^{i\phi_{2j} Z_{\widetilde\Pi}}\right]$
gives a (1, m + 1)-block-encoding of $P^{\triangleright}(A)$ for some even polynomial $P \in \mathbb{C}[x]$.
When d is odd,
(8.26) $U_\Phi = (-i)^d\, e^{i\phi_0 Z_{\widetilde\Pi}}\, \big(U_A e^{i\phi_1 Z_{\widetilde\Pi}}\big) \prod_{j=1}^{(d-1)/2}\left[ U_A^\dagger e^{i\phi_{2j} Z_{\widetilde\Pi}}\, U_A\, e^{i\phi_{2j+1} Z_{\widetilde\Pi}}\right]$
gives a (1, m + 1)-block-encoding of $P^{\diamond}(A)$ for some odd polynomial $P \in \mathbb{C}[x]$.
The quantum circuit is exactly the same as that in Fig. 7.10. The phase factors can be adjusted
so that all polynomials P satisfying the conditions in Theorem 7.20 can be exactly represented. If

we are only interested in some real polynomial PRe ∈ R[x] and PRe (A) (odd) and P . (A) (even),
we can use Theorem 7.22 and the circuit in Fig. 8.1 (which is simply a combination of Figs. 7.9
and 7.10) to implement its (1, m + 1)-block-encoding. We have the following theorem. Since the
conditions of QSP representation for real polynomials is simple to satisfy and is also most useful in
practice, we only state the case with real polynomials.
Theorem 8.5 (Quantum singular value transformation with real polynomials). Let $A \in \mathbb{C}^{N\times N}$ be encoded by its (1, m)-block-encoding $U_A$. Given a polynomial $P_{\mathrm{Re}}(x) \in \mathbb{R}[x]$ of degree d satisfying the conditions in Theorem 7.22, we can find a sequence of phase factors $\Phi \in \mathbb{R}^{d+1}$, so that the circuit in Fig. 8.1 denoted by $U_\Phi$ implements a (1, m + 1)-block-encoding of $P_{\mathrm{Re}}^{\diamond}(A)$ if d is odd, and
of $P_{\mathrm{Re}}^{\triangleright}(A)$ if d is even. $U_\Phi$ uses $U_A$, $U_A^\dagger$, m-qubit controlled NOT, and single qubit rotation gates $O(d)$ times.

Figure 8.1. Circuit of quantum singular value transformation to construct $U_{P_{\mathrm{Re}}^{\diamond}} \in \mathrm{BE}_{1,m+1}(P_{\mathrm{Re}}^{\diamond}(A))$, using $U_A \in \mathrm{BE}_{1,m}(A)$. Here $U_A$, $U_A^\dagger$ should be applied alternately. When d is even, the last $U_A$ gate should be replaced by $U_A^\dagger$, and the circuit constructs $U_{P_{\mathrm{Re}}^{\triangleright}} \in \mathrm{BE}_{1,m+1}(P_{\mathrm{Re}}^{\triangleright}(A))$. This is simply a combination of Figs. 7.9 and 7.10.

8.3.2. QSVT applied to Hermitian matrices. When A is a Hermitian matrix, the quantum circuits for QET and QSVT are the same. This means that the eigenvalue transformation and the singular value transformation are merely two different perspectives on the same object.
For a Hermitian matrix A, the eigenvalue decomposition and the singular value decomposition are connected as
(8.27) $A = \sum_i |v_i\rangle \lambda_i \langle v_i| = \sum_i |\mathrm{sgn}(\lambda_i) v_i\rangle\, |\lambda_i|\, \langle v_i| =: \sum_i |w_i\rangle \sigma_i \langle v_i|$.
Here
(8.28) $|w_i\rangle = |\mathrm{sgn}(\lambda_i) v_i\rangle, \qquad \sigma_i = |\lambda_i|$.
So if $P \in \mathbb{C}[x]$ is an even polynomial,
(8.29) $P(A) = \sum_i |v_i\rangle P(\lambda_i)\langle v_i| = \sum_i |v_i\rangle P(|\lambda_i|)\langle v_i| = P^{\triangleright}(A)$.
Similarly, if $P \in \mathbb{C}[x]$ is an odd polynomial, then
(8.30) $P(A) = \sum_i |v_i\rangle P(\lambda_i)\langle v_i| = \sum_i |\mathrm{sgn}(\lambda_i)v_i\rangle P(|\lambda_i|)\langle v_i| = P^{\diamond}(A)$.

These relations verify that the eigenvalue transformation and the singular value transformation are indeed the same when P has a definite parity. When the parity of P is indefinite, the two objects are in general not the same, and in particular cannot be directly implemented using the QET circuit.
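A quick numerical confirmation of Eqs. (8.29) and (8.30) (illustrative only; the matrix is a random real symmetric matrix and the polynomials are $T_2$ and $T_3$):

```python
# Numerical check of Eqs. (8.29)-(8.30) for a Hermitian (here real symmetric) A.
import numpy as np

rng = np.random.default_rng(4)
N = 6
B = rng.standard_normal((N, N))
A = (B + B.T) / 2
A /= np.linalg.norm(A, 2)                      # eigenvalues in [-1, 1]

lam, U = np.linalg.eigh(A)                     # eigenvalue decomposition
W, s, Vh = np.linalg.svd(A)                    # singular value decomposition
V = Vh.T                                       # real matrix, so V = Vh^T

P_even = lambda x: 2 * x**2 - 1                # T_2, even
P_odd  = lambda x: 4 * x**3 - 3 * x            # T_3, odd

eig_fun = lambda P: U @ np.diag(P(lam)) @ U.T  # P(A) via the eigendecomposition
print(np.linalg.norm(eig_fun(P_even) - V @ np.diag(P_even(s)) @ Vh))   # = P^{|>}(A), ~1e-15
print(np.linalg.norm(eig_fun(P_odd)  - W @ np.diag(P_odd(s))  @ Vh))   # = P^{diamond}(A), ~1e-15
```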

8.3.3. QSVT and matrix dilation. For general matrices, we have seen in the context of
solving linear equations in Section 4.3 that the matrix dilation method in Eq. (4.33) can be used to
convert the non-Hermitian problem to a Hermitian problem. Here we study the relation between
QSP applied to the dilated Hermitian matrix, and QSVT for the general matrix.

Recall the definition of the dilated Hermitian matrix
(8.31) $\widetilde A = \begin{pmatrix} 0 & A^\dagger \\ A & 0 \end{pmatrix}$.
When A is given by its block encoding $U_A \in \mathrm{BE}_{1,m}(A)$, the dilated Hermitian matrix $\widetilde A$ can be obtained with one ancilla qubit through $U_{\widetilde A} = |0\rangle\langle 1| \otimes U_A^\dagger + |1\rangle\langle 0| \otimes U_A$, i.e., $U_{\widetilde A} \in \mathrm{BE}_{1,m+1}(\widetilde A)$. Note that this requires the controlled versions of $U_A$, $U_A^\dagger$.
From the SVD in Eq. (8.2), we can construct
(8.32) $|z_i^{\pm}\rangle = \frac{1}{\sqrt{2}}\big(|0\rangle|v_i\rangle \pm |1\rangle|w_i\rangle\big)$.
Direct calculation shows
(8.33) $\widetilde A\, |z_i^{\pm}\rangle = \pm\sigma_i\, |z_i^{\pm}\rangle$,
i.e., $\{|z_i^{\pm}\rangle\}$ are all the eigenvectors of $\widetilde A$.


For an arbitrary polynomial $f \in \mathbb{C}[x]$, the matrix function applied to $\widetilde A$ can be computed as
(8.34) $f(\widetilde A) = \sum_i |z_i^{+}\rangle f(\sigma_i)\langle z_i^{+}| + |z_i^{-}\rangle f(-\sigma_i)\langle z_i^{-}| = \sum_i \begin{pmatrix} |v_i\rangle f_{\mathrm{even}}(\sigma_i)\langle v_i| & |v_i\rangle f_{\mathrm{odd}}(\sigma_i)\langle w_i| \\ |w_i\rangle f_{\mathrm{odd}}(\sigma_i)\langle v_i| & |w_i\rangle f_{\mathrm{even}}(\sigma_i)\langle w_i| \end{pmatrix} = \begin{pmatrix} f_{\mathrm{even}}^{\triangleright}(A) & f_{\mathrm{odd}}^{\diamond}(A^\dagger) \\ f_{\mathrm{odd}}^{\diamond}(A) & f_{\mathrm{even}}^{\triangleleft}(A) \end{pmatrix}.$
Here
(8.35) $f_{\mathrm{even}}(x) = \frac{1}{2}\big(f(x)+f(-x)\big), \qquad f_{\mathrm{odd}}(x) = \frac{1}{2}\big(f(x)-f(-x)\big)$.
Therefore applying QSP to the dilated matrix $\widetilde A$ automatically implements QSVT of A using polynomials of even and odd parities.
In particular, if f is an even function, then
(8.36) $f(\widetilde A)\,|0\rangle|\psi\rangle = |0\rangle\, f^{\triangleright}(A)\,|\psi\rangle$,
i.e., measuring the signal qubit we obtain 0 with certainty, and the system register is in the state $f^{\triangleright}(A)|\psi\rangle$. Similarly, if f is odd, then
(8.37) $f(\widetilde A)\,|0\rangle|\psi\rangle = |1\rangle\, f^{\diamond}(A)\,|\psi\rangle$,
i.e., measuring the signal qubit we obtain 1 with certainty.
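Eq. (8.34) and the parity statements above can be checked directly. The following sketch is illustrative only; the choice $f(x) = e^x$ is arbitrary:

```python
# Numerical check of Eq. (8.34): the blocks of f(A_tilde) are even/odd singular value transforms of A.
import numpy as np

rng = np.random.default_rng(5)
N = 5
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A /= np.linalg.norm(A, 2)
W, s, Vh = np.linalg.svd(A)
V = Vh.conj().T

Atil = np.block([[np.zeros((N, N)), A.conj().T],
                 [A,                np.zeros((N, N))]])
f = lambda x: np.exp(x)                          # arbitrary scalar function
lam, U = np.linalg.eigh(Atil)
fAtil = U @ np.diag(f(lam)) @ U.conj().T

f_even = lambda x: (f(x) + f(-x)) / 2
f_odd  = lambda x: (f(x) - f(-x)) / 2
blocks = {
    (0, 0): V @ np.diag(f_even(s)) @ Vh,             # f_even^{|>}(A)
    (0, 1): V @ np.diag(f_odd(s))  @ W.conj().T,     # f_odd^{diamond}(A^dagger)
    (1, 0): W @ np.diag(f_odd(s))  @ Vh,             # f_odd^{diamond}(A)
    (1, 1): W @ np.diag(f_even(s)) @ W.conj().T,     # f_even^{triangleleft}(A)
}
for (i, j), M in blocks.items():
    print((i, j), np.linalg.norm(fAtil[i*N:(i+1)*N, j*N:(j+1)*N] - M))   # all ~1e-14
```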

8.4. Application: Solve linear systems of equations


In this section, we revisit the problem of solving linear systems of equations $Ax = b$. With QSVT we can solve QLSP for general matrices without the need to dilate the matrix into a Hermitian matrix. Assume $A = W\Sigma V^\dagger$ is invertible, i.e., $\Sigma_{ii} > 0$ for all i; then
(8.38) $A^{-1} = V\Sigma^{-1}W^\dagger = f^{\diamond}(A^\dagger)$,
where $f(x) = x^{-1}$ is an odd function.
WLOG assume A can be accessed by $U_A \in \mathrm{BE}_{1,m}(A)$. For simplicity we also assume $\|A\| = 1$ (though in general $\|A\| \le \alpha = 1$, and we may not always be able to set $\|A\| = 1$). Let $\kappa$ be

the condition number of A, then the singular values of A are contained in the interval [δ, 1], with
δ = κ−1 .
Note that $f(\cdot)$ is not bounded by 1 and is in particular singular at $x = 0$. Therefore instead of approximating f on the whole interval $[-1, 1]$ we consider an odd polynomial $p(x)$ such that
(8.39) $\left| p(x) - \frac{\delta}{\beta x} \right| \le \epsilon', \qquad \forall x \in [-1, -\delta] \cup [\delta, 1]$.
The factor $\beta$ is chosen so that $|p(x)| \le 1$ for all $x \in [-1, 1]$, to satisfy the requirement of condition (2) in Theorem 7.22. For instance, we may choose $\beta = 4/3$. The precision parameter $\epsilon'$ will be chosen later. The degree of the odd polynomial can be chosen to be $O(\frac{1}{\delta}\log(\frac{1}{\epsilon'}))$, as guaranteed by e.g. [GSLW18, Corollary 69]. This construction is not explicit (see an explicit construction in [CKS17]). Fig. 8.2 gives a concrete construction of an odd polynomial obtained via numerical optimization, and the phase factors are obtained via QSPPACK.

Figure 8.2. QSP representation of an odd polynomial approximating the inverse function on $[\kappa^{-1}, 1]$ with $\kappa = 10$. The phase factors plotted remove a factor of $\pi/4$ on both ends (see Eq. (7.129)).

Then Fig. 8.1 implements a (1, m + 1)-block-encoding of $p^{\diamond}(A^\dagger) = V p(\Sigma) W^\dagger$, denoted by $U_\Phi$. We have
(8.40) $\| p^{\diamond}(A^\dagger) - (\delta/\beta) A^{-1} \| = \| p(\Sigma) - (\delta/\beta)\Sigma^{-1} \| \le \epsilon'$.
The total number of queries to $U_A$ and $U_A^\dagger$ is
(8.41) $O\!\left( \frac{1}{\delta} \log\frac{1}{\epsilon'} \right) = O\!\left( \kappa \log\frac{1}{\epsilon'} \right)$.
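To make the discussion concrete, the following sketch checks Eqs. (8.39) and (8.40) with a deliberately simple odd polynomial $p(x) = \frac{\delta}{\beta}\,\frac{1-(1-x^2)^b}{x}$ of degree $2b-1$. This is not the construction behind Fig. 8.2, [CKS17], or QSPPACK: its degree scales like $O(\kappa^2\log(1/\epsilon'))$ rather than $O(\kappa\log(1/\epsilon'))$, and it is not bounded by 1 on all of $[-1, 1]$, so it cannot be used directly in the QSVT circuit; it only serves to illustrate the approximation requirement classically.

```python
# Sketch of the inverse-function approximation (8.38)-(8.40) with a simple odd polynomial.
import numpy as np

kappa, beta = 10.0, 4/3
delta = 1 / kappa
b = 700                                                       # ~ kappa^2 * log(1/eps')
p = lambda x: (delta / beta) * (1 - (1 - x**2)**b) / x        # odd polynomial of degree 2b-1

x = np.linspace(delta, 1, 2000)
print(np.max(np.abs(p(x) - delta / (beta * x))))              # eps' in Eq. (8.39), ~7e-4 here

# matrix version, Eq. (8.40): p^{diamond}(A^dagger) is close to (delta/beta) A^{-1}
rng = np.random.default_rng(6)
N = 32
W, _ = np.linalg.qr(rng.standard_normal((N, N)))
V, _ = np.linalg.qr(rng.standard_normal((N, N)))
s = np.linspace(delta, 1, N)                                  # singular values in [1/kappa, 1]
A = W @ np.diag(s) @ V.T
p_diamond_Adag = V @ np.diag(p(s)) @ W.T                      # V p(Sigma) W^dagger
print(np.linalg.norm(p_diamond_Adag - (delta / beta) * np.linalg.inv(A), 2))   # <= eps'
```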
To solve QLSP, we assume the right hand side vector |bi can be accessed through the oracle Ub
such that
(8.42) Ub |0n i = |bi .
We introduce the parameter
(8.43) $\xi = \| A^{-1} |b\rangle \|$,
which plays an important part in the success probability of the procedure. Let
(8.44) x = (δ/β)A−1 |bi

be the unnormalized true solution, and the normalized solution state is $|x\rangle = x/\|x\|$. Now denote $\widetilde x = p^{\diamond}(A^\dagger)|b\rangle$, and $|\widetilde x\rangle = \widetilde x / \|\widetilde x\|$. Then the unnormalized solution satisfies $\|x - \widetilde x\| \le \epsilon'$. For the normalized state $|\widetilde x\rangle$, this error is scaled accordingly. When $\epsilon' \ll \|\widetilde x\|$, we have
(8.45) $\| |\widetilde x\rangle - |x\rangle \| \approx \frac{\|\widetilde x - x\|}{\|\widetilde x\|} \le \frac{\epsilon'}{\|\widetilde x\|}$.
Also we have
(8.46) $\|\widetilde x\| \approx \|x\| = \frac{\delta}{\beta}\,\|A^{-1}|b\rangle\| = \frac{\delta\xi}{\beta} = \frac{\xi}{\beta\kappa}$.
Therefore in order for the normalized output quantum state to be $\epsilon$-close to the normalized solution state $|x\rangle$, we need to set $\epsilon' = O(\epsilon\xi/\kappa)$. This is similar to the case of the HHL algorithm in Section 4.3, where QPE needs to achieve $\epsilon$ multiplicative accuracy, which means that the additive accuracy parameter $\epsilon'$ should be set to $O(\epsilon/\kappa)$.
The success probability of the above procedure is $\Omega(\|\widetilde x\|^2) = \Omega(\xi^2/\kappa^2)$. With amplitude amplification we can boost the success probability to be greater than 1/2, with one qubit serving as a witness, i.e., if measuring this qubit we get the outcome 0 the procedure has succeeded, and if 1 the procedure has failed. It takes $O(\kappa/\xi)$ rounds of amplitude amplification, i.e., using $U_\Phi^\dagger$, $U_\Phi$, $U_b$, and $U_b^\dagger$ $O(\kappa/\xi)$ times. A single $U_\Phi$ uses $U_A$ and its inverse
(8.47) $O\!\left(\frac{1}{\delta}\log\frac{1}{\epsilon'}\right) = O\!\left(\kappa\log\frac{\kappa}{\epsilon\xi}\right)$
times. Therefore the total number of queries to $U_A$ and its inverse is
(8.48) $O\!\left(\frac{\kappa^2}{\xi}\log\frac{\kappa}{\epsilon\xi}\right)$.
The number of queries to $U_b$ and its inverse is $O(\kappa/\xi)$. We consider the following two cases for the magnitude of $\xi$.
(1) In general, if no further promise is given, then $\xi \ge 1$. The total query complexity of $U_A$ is therefore $O(\kappa^2\log(\kappa/\epsilon))$. This is the typical complexity referred to in the literature.
(2) If $|b\rangle$ has an $\Omega(1)$ overlap with the left singular vector of A with the smallest singular value, then $\xi = \Omega(\kappa)$. This is the best case scenario: the total query complexity of $U_A$ is $O(\kappa\log(1/\epsilon))$, and the number of queries to the right hand side vector $|b\rangle$ is $O(1)$, which is independent of the condition number. Both regimes are illustrated in the sketch below.
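The two regimes can be illustrated with a small classical experiment (the matrix below is synthesized with prescribed singular values; all values are arbitrary):

```python
# Illustration of the parameter xi = ||A^{-1}|b>|| (Eq. (8.43)) in the two regimes.
import numpy as np

rng = np.random.default_rng(2)
N, kappa = 64, 50.0
W, _ = np.linalg.qr(rng.standard_normal((N, N)))
V, _ = np.linalg.qr(rng.standard_normal((N, N)))
s = np.linspace(1 / kappa, 1.0, N)               # singular values in [1/kappa, 1], so ||A|| = 1
A = W @ np.diag(s) @ V.T

b = rng.standard_normal(N); b /= np.linalg.norm(b)
print(np.linalg.norm(np.linalg.solve(A, b)))      # generic |b>: xi >= 1 since ||A|| = 1
b_best = W[:, 0]                                  # aligned with the smallest-singular-value left vector
print(np.linalg.norm(np.linalg.solve(A, b_best))) # best case: xi = kappa
```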

8.5. Quantum singular value transformation with basis transformation*


So far we have assumed that we have a matrix A in mind, and the access to A is provided via the block encoding matrix $U_A$. The QSVT circuit takes the form
(8.49) $U_\Phi = \mathrm{CR}_{\widetilde\phi_0} \cdots \mathrm{CR}_{\widetilde\phi_{d-2}}\, U_A^\dagger\, \mathrm{CR}_{\widetilde\phi_{d-1}}\, U_A\, \mathrm{CR}_{\widetilde\phi_d}$.
If we are further given two unitary matrices P, Q, we can equivalently rewrite the QSVT transformation as
(8.50) $U_\Phi = \cdots (Q U_A P^\dagger)^\dagger (Q\, \mathrm{CR}_{\widetilde\phi_{d-1}}\, Q^\dagger)(Q U_A P^\dagger)(P\, \mathrm{CR}_{\widetilde\phi_d}\, P^\dagger)\, P.$
The leftmost factor of the product is $P^\dagger$ or $Q^\dagger$, depending on the parity of d. The insertion of P, Q amounts to a basis transformation. Assume that we have access to $\widetilde U_A = Q U_A P^\dagger$; then $U_A$ is the matrix representation of $\widetilde U_A$ with respect to the bases given by P, Q, respectively. What is

different is that the controlled rotations before $U_A$ and $U_A^\dagger$ are now expressed with respect to two different basis sets, i.e., $P\, \mathrm{CR}_{\widetilde\phi_d}\, P^\dagger$ and $Q\, \mathrm{CR}_{\widetilde\phi_{d-1}}\, Q^\dagger$, respectively. This can be useful for certain applications. Let us now express these ideas more formally.
Assume that we are given an (n + m)-qubit unitary U eA , and two (n + m)-qubit projectors Π, Π0 .
0
For simplicity we assume rank(Π) = rank(Π ) = N . Define an orthonormal basis set
(8.51) B = {|ϕ0 i , . . . , |ϕN −1 i , |vN i , . . . , |vN M −1 i},
where the vectors |ϕ0 i , . . . , |ϕN −1 i span the range of Π, and all states |vi i are orthogonal to |ϕj i.
Similarly define an orthonormal basis set
(8.52) B 0 = {|ψ0 i , . . . , |ψN −1 i , |wN i , . . . , |wN M −1 i}
where the vectors |ψ0 i , . . . , |ψN −1 i span the range of Π0 , and all states |wi i are orthogonal to |ψj i.
We can think of the columns of B, B′ as forming the basis transformation matrices P, Q, respectively. Then the matrix A is defined implicitly in terms of the matrix representation
(8.53) $[\widetilde U_A]_B^{B'} = U_A = \begin{pmatrix} A & * \\ * & * \end{pmatrix}$.
Noting that
(8.54) $[\Pi]_B = [\Pi']_{B'} = \begin{pmatrix} I_n & 0 \\ 0 & 0 \end{pmatrix} = |0^m\rangle\langle 0^m| \otimes I_n$,
we find that
(8.55) $\Pi' \widetilde U_A \Pi = \sum_{i,j\in[N]} |\psi_i\rangle A_{ij} \langle\varphi_j|$.
Therefore Theorem 8.5 can be viewed as the singular value transformation of A, which is a submatrix of the matrix representation of $\widetilde U_A$ with respect to the bases B, B′.
The implementation of the controlled rotation $P\,\mathrm{CR}_{\widetilde\phi}\,P^\dagger$ relies on the implementation of $C_\Pi\mathrm{NOT}$. The projectors $\Pi, \Pi'$ can be accessed via the controlled NOT gates $C_\Pi\mathrm{NOT}$, $C_{\Pi'}\mathrm{NOT}$, respectively; WLOG we focus on one projector $\Pi$. Motivated by Grover's search, we may assume access to a reflection operator
(8.56) $R_\Pi = I_m - 2\Pi$.
We can then define the m-qubit controlled NOT gate as
(8.57) $C_\Pi\mathrm{NOT} := X \otimes \Pi + I \otimes (I_m - \Pi)$,
which can be constructed using $R_\Pi$ as
(8.58) $C_\Pi\mathrm{NOT} = X \otimes \frac{I_m - R_\Pi}{2} + I \otimes \frac{I_m + R_\Pi}{2} = \frac{I+X}{2}\otimes I_m + \frac{I-X}{2}\otimes R_\Pi = |+\rangle\langle +| \otimes I_m + |-\rangle\langle -| \otimes R_\Pi = (H\otimes I_m)\big(|0\rangle\langle 0|\otimes I_m + |1\rangle\langle 1|\otimes R_\Pi\big)(H\otimes I_m).$
Therefore, assuming access to $R_\Pi$, the $C_\Pi\mathrm{NOT}$ gate can be implemented using the circuit in Fig. 8.3.

Figure 8.3. Circuit for implementing $C_\Pi\mathrm{NOT}$ using a reflector $R_\Pi$.
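Since Eq. (8.58) is a purely algebraic identity, it can also be verified directly; the sketch below uses a random rank-3 projector (all parameters are illustrative):

```python
# Numerical check of Eq. (8.58): C_Pi NOT = (H x I)(|0><0| x I + |1><1| x R_Pi)(H x I).
import numpy as np

rng = np.random.default_rng(7)
dim, r = 8, 3                                       # register dimension and rank of Pi
Q, _ = np.linalg.qr(rng.standard_normal((dim, r)))
Pi = Q @ Q.T                                        # a rank-r orthogonal projector
RPi = np.eye(dim) - 2 * Pi                          # reflection R_Pi = I - 2 Pi, Eq. (8.56)

X = np.array([[0, 1], [1, 0]])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
P0, P1 = np.diag([1, 0]), np.diag([0, 1])

lhs = np.kron(X, Pi) + np.kron(np.eye(2), np.eye(dim) - Pi)     # Eq. (8.57)
rhs = np.kron(H, np.eye(dim)) @ (np.kron(P0, np.eye(dim)) + np.kron(P1, RPi)) @ np.kron(H, np.eye(dim))
print(np.linalg.norm(lhs - rhs))                    # ~1e-16
```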

Then according to Theorem 8.5, we can implement the QSVT using $\widetilde U_A$, $\widetilde U_A^\dagger$, $C_{|0^m\rangle\langle 0^m|}\mathrm{NOT}$ and single qubit rotation gates, where $|0^m\rangle\langle 0^m| \otimes I_n$ is a rank-$2^n$ projector. In this generalized setting, $C_{|0^m\rangle\langle 0^m|\otimes I_n}\mathrm{NOT} = C_{|0^m\rangle\langle 0^m|}\mathrm{NOT} \otimes I_n$ in the B, B′ bases should be implemented using $C_\Pi\mathrm{NOT}$, $C_{\Pi'}\mathrm{NOT}$, respectively. We arrive at the following theorem:
Theorem 8.6 (Quantum singular value transformation with real polynomials and projectors). Let $\widetilde U_A$ be an (n + m)-qubit unitary, and $\Pi, \Pi'$ be two (n + m)-qubit projectors of rank $2^n$. Define the bases B, B′ according to Eqs. (8.51) and (8.52), and let $A \in \mathbb{C}^{N\times N}$ be defined in terms of the matrix representation in Eq. (8.53). Given a polynomial $P_{\mathrm{Re}}(x) \in \mathbb{R}[x]$ of degree d satisfying the conditions in Theorem 7.22, we can find a sequence of phase factors $\Phi \in \mathbb{R}^{d+1}$ to define a unitary $U_\Phi$ satisfying
(8.59) $U_\Phi |\varphi_j\rangle = \sum_{i\in[N]} |\psi_i\rangle\, [P_{\mathrm{Re}}^{\diamond}(A)]_{ij} + |\perp_j'\rangle$,
if d is odd, and
(8.60) $U_\Phi |\varphi_j\rangle = \sum_{i\in[N]} |\varphi_i\rangle\, [P_{\mathrm{Re}}^{\triangleright}(A)]_{ij} + |\perp_j\rangle$,
if d is even. Here $\Pi'|\perp_j'\rangle = 0$, $\Pi|\perp_j\rangle = 0$, and $U_\Phi$ uses $\widetilde U_A$, $\widetilde U_A^\dagger$, $C_\Pi\mathrm{NOT}$, $C_{\Pi'}\mathrm{NOT}$, and single qubit rotation gates $O(d)$ times.

8.6. Application: Grover’s search revisited, and fixed-point amplitude amplification*


As an application, let us revisit the Grover search problem from the perspective of QSVT. Again
let |x0 i be the desired marked state, and |ψ0 i be the uniform superposition of states. Now let us
perform a basis transformation. We define an orthonormal basis set B = {|ψ0 i , |v1 i , . . . , |vN −1 i},
where all states |vi i are orthogonal to |ψ0 i. Similarly define an orthonormal basis set B 0 =
$\{|x_0\rangle, |w_1\rangle, \dots, |w_{N-1}\rangle\}$, where all states $|w_i\rangle$ are orthogonal to $|x_0\rangle$. Then the matrix of the reflection operator $R_{\psi_0}$ with respect to B, B′ is (let $a = 1/\sqrt{N}$)
(8.61) $[R_{\psi_0}]_B^{B'} = \begin{pmatrix} a & * \\ * & * \end{pmatrix}$.
Let $A = a$ be a $1\times 1$ matrix; then $R_{\psi_0} \in \mathrm{BE}_{1,n}(A)$. Furthermore, the projectors $\Pi = |\psi_0\rangle\langle\psi_0|$ and $\Pi' = |x_0\rangle\langle x_0|$ can be accessed via the reflection operators $R_{\psi_0}$, $R_{x_0}$, respectively. According to Eq. (8.58), this defines $C_\Pi\mathrm{NOT}$, $C_{\Pi'}\mathrm{NOT}$. Let $\widetilde U_A = R_{\psi_0}$; we have
(8.62) $\Pi' \widetilde U_A \Pi = a\, |x_0\rangle\langle\psi_0|$,
and we would like to use Theorem 8.6 to find $U_\Phi$ that block encodes
(8.63) $|x_0\rangle\, P_{\mathrm{Re}}(a)\, \langle\psi_0| \approx |x_0\rangle\langle\psi_0|$.

To this end, we need to find an odd, real polynomial $P_{\mathrm{Re}}(x)$ satisfying $P_{\mathrm{Re}}(a) \approx 1$. More specifically, we need to find $P_{\mathrm{Re}}(x)$ satisfying
(8.64) $|P_{\mathrm{Re}}(x) - 1| \le \epsilon^2, \qquad \forall x \in [a, 1]$.
We can achieve this by approximating the sign function, with $\deg(P_{\mathrm{Re}}) = O(\log(1/\epsilon^2)\, a^{-1}) = O(\log(1/\epsilon)\sqrt{N})$ (see e.g. [LC17a, Corollary 6]). This construction is also based on an approximation to the erf function. Fig. 8.4 gives a concrete construction of an odd polynomial obtained via numerical optimization, and the phase factors are obtained via QSPPACK.

Figure 8.4. QSP representation of an odd polynomial approximating the sign function on [0.1, 1]. The phase factors plotted remove a factor of $\pi/4$ on both ends (see Eq. (7.129)).
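As a rough illustration of such a construction (this is not the optimization-based polynomial of Fig. 8.4 or QSPPACK; the smoothing scale k and the degree d below are ad hoc choices), one can truncate the Chebyshev expansion of $\mathrm{erf}(kx)$:

```python
# Illustrative approximation of the sign function by a Chebyshev interpolant of erf(k*x).
import numpy as np
from scipy.special import erf
from numpy.polynomial.chebyshev import Chebyshev

a = 0.1                                        # = 1/sqrt(N) in the Grover setting
k, d = 40.0, 121                               # ad hoc smoothing scale and (odd) degree
p = Chebyshev.interpolate(lambda x: erf(k * x), deg=d, domain=[-1, 1])

x = np.linspace(a, 1, 2000)
print(np.max(np.abs(p(x) - 1)))                # |P(x) - 1| on [a, 1], cf. Eq. (8.64)
xx = np.linspace(-1, 1, 4001)
print(np.max(np.abs(p(xx))))                   # remains close to 1 on [-1, 1]
```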

Then
(8.65) $U_\Phi|\psi_0\rangle \approx |x_0\rangle$.
More specifically, note that
(8.66) $U_\Phi|\psi_0\rangle - |x_0\rangle = (P_{\mathrm{Re}}(a)-1)\,|x_0\rangle + |\widetilde\perp\rangle$
for some unnormalized state $|\widetilde\perp\rangle$ orthogonal to $|x_0\rangle$. Moreover
(8.67) $\| |\widetilde\perp\rangle \|^2 + P_{\mathrm{Re}}^2(a) = 1$,
which gives
(8.68) $\| |\widetilde\perp\rangle \| = \sqrt{1 - P_{\mathrm{Re}}^2(a)} \le \sqrt{1-(1-\epsilon^2)^2} \le \sqrt{2}\,\epsilon$.
So
(8.69) $\| U_\Phi|\psi_0\rangle - |x_0\rangle \| \le \epsilon^2 + \sqrt{2}\,\epsilon = O(\epsilon)$.
Therefore we can measure the system register to find x0 , and we achieve the same Grover type
speedup.
Note that the approximation can be made arbitrarily accurate, and there is no overshooting problem (though this is only a minor issue) as in the standard Grover search. While Grover's search does not require the output quantum state to be exactly $|x_0\rangle$, this can be desirable when the procedure is used as a quantum subroutine, such as in amplitude amplification.
The immediate generalization of the procedure above is called fixed-point amplitude amplification.

Proposition 8.7 (Fixed-point amplitude amplification). Let $\widetilde U_A$ be an n-qubit unitary and $\Pi'$ be an n-qubit orthogonal projector such that
(8.70) $\Pi'\widetilde U_A|\varphi_0\rangle = a\,|\psi\rangle, \qquad a > \delta > 0$.
Then there is an (n + 1)-qubit unitary circuit $U_\Phi$ such that
(8.71) $\| |0\rangle|\psi\rangle - U_\Phi|0\rangle|\varphi_0\rangle \| \le \epsilon$.
Here $U_\Phi$ uses the gates $\widetilde U_A$, $\widetilde U_A^\dagger$, $C_{\Pi'}\mathrm{NOT}$, $C_{|\varphi_0\rangle\langle\varphi_0|}\mathrm{NOT}$ and single qubit rotation gates $O(\log(1/\epsilon)\delta^{-1})$ times.
Proof. The procedure is very similar to Grover's search. We only prove the case when $\Pi'$ is of rank 1, though the statement is also correct when the rank of $\Pi'$ is larger than 1. We can construct $B = \{|\varphi_0\rangle, |v_1\rangle, \dots, |v_{N-1}\rangle\}$, where all states $|v_i\rangle$ are orthogonal to $|\varphi_0\rangle$. Similarly define an orthonormal basis set $B' = \{|\psi\rangle, |w_1\rangle, \dots, |w_{N-1}\rangle\}$, where all states $|w_i\rangle$ are orthogonal to the target state $|\psi\rangle$. Since the target state $|\psi\rangle$ belongs to the range of $\Pi'$,
(8.72) $\langle\psi|\widetilde U_A|\varphi_0\rangle = \langle\psi|\Pi'\widetilde U_A|\varphi_0\rangle = a$,
i.e.,
(8.73) $[\widetilde U_A]_B^{B'} = \begin{pmatrix} a & * \\ * & * \end{pmatrix}$.
Now let $\Pi = |\varphi_0\rangle\langle\varphi_0|$; we can use the same choice of $P_{\mathrm{Re}}(x)$ as in Grover's search so that $|P_{\mathrm{Re}}(x) - 1| = O(\epsilon^2)$ for any $x \ge \delta$, and $\deg(P_{\mathrm{Re}}) = O(\log(1/\epsilon)\delta^{-1})$. The corresponding $U_\Phi$ uses one ancilla qubit to block encode
(8.74) $|\psi\rangle\, P_{\mathrm{Re}}(a)\, \langle\varphi_0| \approx |\psi\rangle\langle\varphi_0|$.
Note that the ranks of $\Pi'$, $\Pi$ can be different. This does not affect the proof. □

Exercise 8.1 (Robust oblivious amplitude amplification). Consider a quantum circuit consisting of two registers denoted by a and s. Suppose we have a block encoding V of A: $A = (\langle 0|_a \otimes I_s) V (|0\rangle_a \otimes I_s)$. Let $W = -V(\mathrm{REF}\otimes I_s)V^\dagger(\mathrm{REF}\otimes I_s)V$, where $\mathrm{REF} = I_a - 2|0\rangle_a\langle 0|_a$. (1) Within the framework of QSVT, what is the polynomial associated with the singular value transformation implemented by W? (2) Suppose $A = U/2$ for some unitary U. What is $(\langle 0|_a\otimes I_s)W(|0\rangle_a\otimes I_s)$? (3) Explain the construction of W in terms of a singular value transformation $f^{\diamond}(A)$ with $f(x) = 3x - 4x^3$. Draw the picture of f(x) and mark its values at $x = 0, \frac{1}{2}, 1$.
Exercise 8.2 (Logarithm of unitaries). Given access to a unitary U = eiH where kHk ≤ π/2. Use
QSVT to design a quantum algorithm to approximately implement a block encoding of H, using
controlled U and its inverses, as well as elementary quantum gates.
Bibliography

[Aar13] Scott Aaronson. Quantum computing since Democritus. Cambridge Univ. Pr., 2013.
[ABF16] F. Arrigo, M. Benzi, and C. Fenu. Computation of generalized matrix functions. SIAM
J. Matrix Anal. Appl., 37:836–860, 2016.
[BACS07] Dominic W Berry, Graeme Ahokas, Richard Cleve, and Barry C Sanders. Efficient
quantum algorithms for simulating sparse Hamiltonians. Commun. Math. Phys.,
270(2):359–371, 2007.
[Chi21] Andrew Childs. Lecture notes on quantum algorithms, 2021.
[CKS17] Andrew M. Childs, Robin Kothari, and Rolando D. Somma. Quantum algorithm
for systems of linear equations with exponentially improved dependence on precision.
SIAM J. Comput., 46:1920–1950, 2017.
[CST+21] Andrew M Childs, Yuan Su, Minh C Tran, Nathan Wiebe, and Shuchen Zhu. Theory of Trotter error with commutator scaling. Phys. Rev. X, 11(1):011020, 2021.
[DHM+ 18] Danial Dervovic, Mark Herbster, Peter Mountney, Simone Severini, Naïri Usher, and
Leonard Wossnig. Quantum linear systems algorithms: a primer. arXiv:1802.08227,
2018.
[DL21] Yulong Dong and Lin Lin. Random circuit block-encoded matrix and a proposal of
quantum linpack benchmark. Phys. Rev. A, 103(6):062412, 2021.
[DMWL21] Yulong Dong, Xiang Meng, K Birgitta Whaley, and Lin Lin. Efficient phase factor
evaluation in quantum signal processing. Phys. Rev. A, 103:042419, 2021.
[DW19] Ronald De Wolf. Quantum computing: Lecture notes. arXiv:1907.09415, 2019.
[Fey82] Richard P Feynman. Simulating physics with computers. Int. J. Theor. Phys, 21(6/7),
1982.
[GSLW18] András Gilyén, Yuan Su, Guang Hao Low, and Nathan Wiebe. Quantum singular
value transformation and beyond: exponential improvements for quantum matrix arith-
metics. arXiv:1806.01838, 2018.
[GSLW19] András Gilyén, Yuan Su, Guang Hao Low, and Nathan Wiebe. Quantum singular
value transformation and beyond: exponential improvements for quantum matrix arith-
metics. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of
Computing, pages 193–204, 2019.
[GVL13] G. H. Golub and C. F. Van Loan. Matrix computations. Johns Hopkins Univ. Press,
Baltimore, fourth edition, 2013.
[Haa19] J. Haah. Product decomposition of periodic functions in quantum signal processing.
Quantum, 3:190, 2019.
[HBI73] J. B. Hawkins and A. Ben-Israel. On generalized matrix functions. Linear and Multi-
linear Algebra, 1:163–171, 1973.
[HHL09] Aram W Harrow, Avinatan Hassidim, and Seth Lloyd. Quantum algorithm for linear
systems of equations. Phys. Rev. Lett., 103:150502, 2009.

[JL00] Tobias Jahnke and Christian Lubich. Error bounds for exponential operator splittings.
BIT, 40(4):735–744, 2000.
[KSV02] Alexei Yu Kitaev, Alexander Shen, and Mikhail N Vyalyi. Classical and quantum
computation. Number 47. American Mathematical Soc., 2002.
[LC17a] Guang Hao Low and Isaac L Chuang. Hamiltonian simulation by uniform spectral
amplification. arXiv:1707.05391, 2017.
[LC17b] Guang Hao Low and Isaac L. Chuang. Optimal Hamiltonian simulation by quantum signal processing. Phys. Rev. Lett., 118:010501, 2017.
[Llo96] Seth Lloyd. Universal quantum simulators. Science, pages 1073–1078, 1996.
[LT20a] Lin Lin and Yu Tong. Near-optimal ground state preparation. Quantum, 4:372, 2020.
[LT20b] Lin Lin and Yu Tong. Optimal quantum eigenstate filtering with application to solving
quantum linear systems. Quantum, 4:361, 2020.
[Nak08] Mikio Nakahara. Quantum computing: from linear algebra to physical realizations.
CRC press, 2008.
[NC00] Michael A Nielsen and Isaac Chuang. Quantum computation and quantum information,
2000.
[NO88] J. W. Negele and H. Orland. Quantum many-particle systems. Westview, 1988.
[NWZ09] Daniel Nagaj, Pawel Wocjan, and Yong Zhang. Fast amplification of QMA. Quantum
Inf. Comput., 9(11):1053–1068, 2009.
[Pre99] John Preskill. Lecture notes for physics 219: Quantum computation. Caltech Lecture
Notes, 1999.
[RP11] Eleanor G Rieffel and Wolfgang H Polak. Quantum computing: A gentle introduction.
MIT Press, 2011.
[Sze04] Mario Szegedy. Quantum speed-up of Markov chain based algorithms. In 45th Annual
IEEE symposium on foundations of computer science, pages 32–41, 2004.
[Tha08] Mechthild Thalhammer. High-order exponential operator splitting methods for time-
dependent Schrödinger equations. SIAM J. Numer. Anal., 46(4):2022–2038, 2008.
