Lecture Notes on Quantum Algorithms for Scientific Computation
Lin Lin
Preface
With the availability of near-term quantum devices and the breakthrough of quantum supremacy experiments, quantum computation has received an increasing amount of attention from a diverse range of scientific disciplines in the past few years. Despite the availability of excellent textbooks as well as lecture notes such as [NC00, KSV02, Nak08, RP11, Aar13, Pre99, DW19, Chi21], these materials often cover all aspects of quantum computation, including complexity theory, physical implementations of quantum devices, quantum information theory, quantum error correction, quantum algorithms, etc. This leaves little room for introducing how a quantum computer is supposed to be used to solve challenging computational problems in science and engineering. For instance, after the initial reading of (admittedly, selected chapters of) the classic textbook by Nielsen and Chuang [NC00], I was both amazed by the potential power of a quantum computer, and baffled by its practical range of applicability: are we really trying to build a quantum computer, either to perform a quantum Fourier transform or to perform a quantum search? Is quantum phase estimation the only bridge connecting a quantum computer on one side, and virtually all scientific computing problems on the other, such as solving linear systems, eigenvalue problems, least squares problems, differential equations, numerical optimization, etc.?
Thanks to the significant progress in the development of quantum algorithms, it should by now be self-evident that the answer to both questions above is no. This is a fast evolving field, and many important advances have been made only in the past few years. However, many such developments are theoretically and technically involved, and can be difficult to penetrate for someone with only basic knowledge of quantum computing. I think it is worth delivering some of these exciting results, in a somewhat more accessible way, to a broader community interested in using future fault-tolerant quantum computers to solve scientific problems.
This is a set of lecture notes used in a graduate topics class in applied mathematics called "Quantum Algorithms for Scientific Computation" at the Department of Mathematics, UC Berkeley during the fall semester of 2021. These lecture notes focus only on quantum algorithms closely related to scientific computation, and in particular, matrix computation. In fact, this is only a small class of quantum algorithms viewed from the perspective of the "quantum algorithm zoo"¹. This means that many important materials are consciously left out, such as quantum complexity theory, applications in number theory and cryptography (notably, Shor's algorithm), applications in algebraic problems (such as the hidden subgroup problem), etc. Readers interested in these topics can consult some of the excellent aforementioned textbooks. Since the materials were designed to fit into the curriculum of one semester, several other topics relevant to scientific computation are not included, notably adiabatic quantum computation (AQC) and variational quantum algorithms (VQA). These materials may be added in future editions of the lecture notes. To my knowledge, some of the materials in these lecture notes may be new and have not been presented in the literature. The sections marked by * can be skipped upon first reading without much detriment.
¹ https://quantumalgorithmzoo.org/
I would like to thank Dong An, Yulong Dong, Di Fang, Fabian M. Faulstich, Cory Hargus, Zhen Huang, Subhayan Roy Moulik, Yu Tong, Jiasu Wang, Mathias Weiden, Jiahao Yao, and Lexing Ying for useful discussions and for pointing out typos in the notes. I would also like to thank Nilin Abrahamsen, Di Fang, Subhayan Roy Moulik, and Yu Tong for contributing some of the exercises, and Jiahao Yao for providing the cover image of the notes. For errors / comments / suggestions / general thoughts on the lecture notes, please send me an email: [email protected].
CHAPTER 1

Preliminaries of quantum computation

1.1. Postulates of quantum mechanics
1.1.1. State space postulate. The set of all quantum states of a quantum system forms a complex vector space with inner product structure (i.e., it is a Hilbert space, denoted by H), called the state space. If the state space H is finite dimensional, it is isomorphic to some C^N, written as H ≅ C^N. Without loss of generality we may simply take H = C^N. We always assume N = 2^n for some non-negative integer n, often called the number of quantum bits (or qubits). A quantum state ψ ∈ C^N can be expressed in terms of its components as

(1.1)    ψ = (ψ_0, ψ_1, …, ψ_{N−1})^⊤.
Its Hermitian conjugate is

(1.2)    ψ† = (ψ̄_0, ψ̄_1, …, ψ̄_{N−1}),
where c̄ is the complex conjugate of c ∈ C. We also use the Dirac notation, which uses |ψ⟩ to denote a quantum state, ⟨ψ| to denote its Hermitian conjugate ψ†, and the inner product

(1.3)    ⟨ψ|ϕ⟩ := ⟨ψ, ϕ⟩ = Σ_{i∈[N]} ψ̄_i ϕ_i.
Here [N] = {0, …, N − 1}. Let {|i⟩} be the standard basis of C^N. The i-th entry of ψ can be written as an inner product ψ_i = ⟨i|ψ⟩. Then |ψ⟩⟨ϕ| should be interpreted as an outer product, with (i, j)-th matrix element given by

(1.4)    (|ψ⟩⟨ϕ|)_{ij} = ψ_i ϕ̄_j.
Two state vectors |ψ⟩ and c|ψ⟩ for some 0 ≠ c ∈ C always refer to the same physical state, i.e., c has no observable effect. Hence without loss of generality we always assume |ψ⟩ is normalized to be a unit vector, i.e., ⟨ψ|ψ⟩ = 1. Sometimes it is more convenient to write down an unnormalized state, which will be denoted by ψ without the ket notation |·⟩. Restricting to normalized state vectors, the remaining freedom is a complex number c = e^{iθ} for some θ ∈ [0, 2π), called a global phase factor.
Example 1.1 (Single qubit system). A (single) qubit corresponds to a state space H ≅ C². We also define

(1.5)    |0⟩ = (1, 0)^⊤,    |1⟩ = (0, 1)^⊤.
Since the state space of the spin-1/2 system is also isomorphic to C², this is also called the single spin system, where |0⟩, |1⟩ are referred to as the spin-up and spin-down states, respectively. A general state vector in H takes the form

(1.6)    |ψ⟩ = a|0⟩ + b|1⟩ = (a, b)^⊤,    a, b ∈ C,
and the normalization condition implies |a|² + |b|² = 1. So we may rewrite |ψ⟩ as

(1.7)    |ψ⟩ = e^{iγ} cos(θ/2) |0⟩ + e^{iϕ} sin(θ/2) |1⟩,    θ, ϕ, γ ∈ R.
If we ignore the irrelevant global phase γ, the state is effectively
(1.8)    |ψ⟩ = cos(θ/2) |0⟩ + e^{iϕ} sin(θ/2) |1⟩,    0 ≤ θ < π, 0 ≤ ϕ < 2π.
So we may identify each single qubit quantum state with a unique point on the unit sphere in three dimensions (called the Bloch sphere) as

(1.9)    a = (sin θ cos ϕ, sin θ sin ϕ, cos θ)^⊤.
1.1.2. Quantum operator postulate. The evolution of a quantum state |ψ⟩ → |ψ′⟩ ∈ C^N is always achieved via a unitary operator U ∈ C^{N×N}, i.e.,

(1.10)    |ψ′⟩ = U|ψ⟩,    U†U = I_N.
Here U † is the Hermitian conjugate of a matrix U , and IN is the N -dimensional identity matrix.
When the dimension is apparent, we may also simply write I ≡ IN . In quantum computation, a
unitary matrix is often referred to as a gate.
Example 1.2. For a single qubit, the Pauli matrices are

(1.11)    σ_x = X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},    σ_y = Y = \begin{pmatrix} 0 & −i \\ i & 0 \end{pmatrix},    σ_z = Z = \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix}.

Together with the two-dimensional identity matrix, they form a basis of all linear operators on C².
Some other commonly used single qubit operators include, to name a few:
• Hadamard gate

(1.12)    H = (1/√2) \begin{pmatrix} 1 & 1 \\ 1 & −1 \end{pmatrix}

• Phase gate

(1.13)    S = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}
• T gate:

(1.14)    T = \begin{pmatrix} 1 & 0 \\ 0 & e^{iπ/4} \end{pmatrix}
When there are notational conflicts, we will use the roman font such as H, X for these single-qubit
gates (one common scenario is to distinguish the Hadamard gate H from a Hamiltonian H). An
operator acting on an n-qubit quantum state space is called an n-qubit operator.
Starting from an initial quantum state |ψ(0)i, the quantum state can evolve in time, which
gives a single parameter family of quantum states denoted by {|ψ(t)i}. These quantum states are
related to each other via a quantum evolution operator U :
(1.15) ψ(t2 ) = U (t2 , t1 )ψ(t1 ),
where U (t2 , t1 ) is unitary for any given t1 , t2 . Here t2 > t1 refers to quantum evolution forward in
time, t2 < t1 refers to quantum evolution backward in time, and U (t1 , t1 ) = I for any t1 .
The quantum evolution under a time-independent Hamiltonian H satisfies the time-dependent Schrödinger equation

(1.16)    i∂_t |ψ(t)⟩ = H |ψ(t)⟩.
Here H = H † is a Hermitian matrix. The corresponding time evolution operator is
(1.17) U (t2 , t1 ) = e−iH(t2 −t1 ) , ∀t1 , t2 .
In particular, U (t2 , t1 ) = U (t2 − t1 , 0).
On the other hand, for any unitary matrix U , we can always find a Hermitian matrix H such
that U = eiH (Exercise 1.1).
Example 1.3. Let the Hamiltonian H be the Pauli-X gate. Then

(1.18)    U(t, 0) = e^{−iXt} = \begin{pmatrix} cos t & −i sin t \\ −i sin t & cos t \end{pmatrix} = (cos t) I − i (sin t) X.

Starting from an initial state |ψ(0)⟩ = |0⟩, after time t = π/2 the state evolves into |ψ(π/2)⟩ = −i|1⟩, i.e., the |1⟩ state (up to a global phase factor).
1.1.3. Quantum measurement postulate. Without loss of generality, we only discuss a
special type of quantum measurements called the projective measurement. For more general types
of quantum measurements, see [NC00, Section 2.2.3]. All quantum measurements expressed as a
positive operator-valued measure (POVM) can be expressed in terms of projective measurements
in an enlarged Hilbert space via the Naimark dilation theorem.
In a finite dimensional setting, a quantum observable always corresponds to a Hermitian matrix M, which has the spectral decomposition

(1.19)    M = Σ_m λ_m P_m.

Here λ_m ∈ R are the eigenvalues of M, and P_m is the projection operator onto the eigenspace associated with λ_m, i.e., P_m² = P_m.
When a quantum state |ψ⟩ is measured by a quantum observable M, the outcome of the measurement is always an eigenvalue λ_m, with probability

(1.20)    p_m = ⟨ψ|P_m|ψ⟩.
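The following small sketch (not from the notes; the helper name is hypothetical) builds the spectral projectors of a Hermitian observable numerically and computes the outcome probabilities p_m = ⟨ψ|P_m|ψ⟩.

```python
# Measurement postulate check: probabilities of observing each eigenvalue of M.
import numpy as np

def measurement_probabilities(M, psi):
    """Return a dict {eigenvalue: probability} for measuring M on the state psi."""
    evals, evecs = np.linalg.eigh(M)
    probs = {}
    for lam, v in zip(evals, evecs.T):
        p = abs(np.vdot(v, psi)) ** 2          # <psi|v><v|psi> for this eigenvector
        key = round(float(lam), 10)            # group (numerically) degenerate eigenvalues
        probs[key] = probs.get(key, 0.0) + p
    return probs

Z = np.array([[1, 0], [0, -1]], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)   # |+> state
print(measurement_probabilities(Z, plus))             # {1.0: 0.5, -1.0: 0.5}
```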
Example 1.5 (Two qubit system). The state space is H = (C²)^⊗2 ≅ C⁴. The standard basis is (row-major order, i.e., the last index is the fastest changing one)

(1.31)    |00⟩ = (1,0,0,0)^⊤,  |01⟩ = (0,1,0,0)^⊤,  |10⟩ = (0,0,1,0)^⊤,  |11⟩ = (0,0,0,1)^⊤.
The Bell state (also called the EPR pair) is defined to be

(1.32)    |ψ⟩ = (1/√2)(|00⟩ + |11⟩) = (1/√2)(1, 0, 0, 1)^⊤,

which cannot be written as any product state |a⟩ ⊗ |b⟩ (Exercise 1.4).
There are many important quantum operators on the two-qubit quantum system. One of them is the CNOT gate, with matrix representation

(1.33)    CNOT = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.
In other words, when acting on the standard basis, we have

(1.34)    CNOT : |00⟩ ↦ |00⟩,  |01⟩ ↦ |01⟩,  |10⟩ ↦ |11⟩,  |11⟩ ↦ |10⟩.
In particular, the density operator of a pure state, ρ = |ψ⟩⟨ψ|, is a rank-1 matrix.
Consider a quantum observable in Eq. (1.19) associated with the projectors {P_m}. For a pure state ρ = |ψ⟩⟨ψ|, it can be verified that the probability of obtaining λ_m and the expectation value of the measurement are, respectively,

(1.39)    p(m) = Tr[P_m ρ],    E_ρ[M] = Tr[M ρ].

The expression (1.39) also holds for general density operators ρ.
An operator ρ is the density operator associated to some ensemble {p_i, |ψ_i⟩} if and only if (1) Tr ρ = 1, and (2) ρ ⪰ 0, i.e., ρ is a positive semidefinite matrix (also called a positive operator). All postulates in Section 1.1 can be stated in terms of density operators (see [NC00, Section 2.4.2]). Note that a pure state satisfies ρ² = ρ. In general we have ρ² ⪯ ρ. If ρ² ≺ ρ, then ρ is called (the density operator of) a mixed state. Furthermore, an ensemble of admissible density operators is also a density operator.
A quantum operator U that transforms |ψ⟩ to U|ψ⟩ also transforms the density operator according to

(1.40)    ρ = Σ_i p_i |ψ_i⟩⟨ψ_i|  ↦  Σ_i p_i U|ψ_i⟩⟨ψ_i|U† = UρU† =: 𝒰[ρ].
However, not all quantum operations on density operators need to be unitary! See [NC00, Section
8.2] for more general discussions on quantum operations.
Most of the discussions in this course will be restricted to pure states, and unitary quantum
operations. Even in this restricted setting, the density operator formalism can still be convenient,
particularly for describing a subsystem of a composite quantum system. Consider a quantum system of (n + m) qubits, partitioned into a subsystem A with n qubits (the state space is H_A = C^{2^n}) and a subsystem B with m qubits (the state space is H_B = C^{2^m}), respectively. The quantum state is a pure state |ψ⟩ ∈ C^{2^{n+m}} with density operator ρ_AB. Let |a_1⟩, |a_2⟩ be two state vectors in H_A, and |b_1⟩, |b_2⟩ be two state vectors in H_B. Then the partial trace over system B is defined as

(1.41)    Tr_B[|a_1⟩⟨a_2| ⊗ |b_1⟩⟨b_2|] = |a_1⟩⟨a_2| Tr[|b_1⟩⟨b_2|] = |a_1⟩⟨a_2| ⟨b_2|b_1⟩.
Since we can expand the density operator ρAB in terms of the basis of HA , HB , the definition of
(1.41) can be extended to define the reduced density operator for the subsystem A
(1.42) ρA = TrB [ρAB ].
The reduced density operator for the subsystem B can be similarly defined. The reduced density
operators ρA , ρB are generally mixed states.
Example 1.7 (Reduced density operator of tensor product states). If ρAB = ρ1 ⊗ ρ2 , then
(1.43) TrB [ρAB ] = ρ1 , TrA [ρAB ] = ρ2 .
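A minimal sketch (not code from the notes; the helper name is hypothetical) of the partial trace Tr_B via index reshaping, checked against Example 1.7 and against the Bell state (1.32), whose reduced density operator is maximally mixed.

```python
# Partial trace over subsystem B for rho_AB on C^{dim_A} (x) C^{dim_B}.
import numpy as np

def partial_trace_B(rho_AB, dim_A, dim_B):
    """Reduced density operator rho_A = Tr_B[rho_AB]."""
    rho = rho_AB.reshape(dim_A, dim_B, dim_A, dim_B)
    return np.einsum('ijkj->ik', rho)          # trace over the B indices

# Example 1.7: for a tensor product state, Tr_B[rho_1 (x) rho_2] = rho_1.
rho1 = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)
rho2 = 0.5 * np.eye(2, dtype=complex)
assert np.allclose(partial_trace_B(np.kron(rho1, rho2), 2, 2), rho1)

# For the Bell state (1.32), the reduced density operator is I/2 (a mixed state).
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
print(partial_trace_B(np.outer(bell, bell.conj()), 2, 2))
```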
As basic examples of circuit diagrams: a wire carrying |ψ⟩ through a box labeled U outputs U|ψ⟩; for instance, Z maps |+⟩ to |−⟩, X maps |0⟩ to |1⟩, and an empty wire leaves |0⟩ unchanged. A two-qubit circuit in which X acts on qubit 0 and nothing acts on qubit 1 is a graphical way of writing

(1.47)    (X ⊗ I) |00⟩ = |10⟩.
Note that the input state can be general, and in particular does not need to be a product state. For example, if the input is a Bell state (1.32), we just apply the quantum operator to |00⟩ and |11⟩, respectively, multiply the results by 1/√2, and add them together. To distinguish them from other symbols, these single qubit gates may be written either as X, Y, Z, H or (using the roman font) X, Y, Z, H.
The quantum circuit for the CNOT gate is drawn with a dot on the control wire and ⊕ on the target wire, mapping |a⟩ |b⟩ ↦ |a⟩ |a ⊕ b⟩.
Here the “dot” means that the quantum gate connected to the dot only becomes active if the state of
the qubit 0 (called the control qubit) is a = 1. This justifies the name of the CNOT gate (controlled
NOT).
Similarly, the circuit with a control dot on qubit 0 and a box U on qubit 1, mapping |a⟩ |b⟩ ↦ |a⟩ U^a |b⟩, is the controlled-U gate for some unitary U. Here U^a = I if a = 0. The CNOT gate can be obtained
by setting U = X.
Another commonly used two-qubit gate is the SWAP gate, which swaps the state in the 0-th
and the 1-st qubits.
Its circuit maps |a⟩ |b⟩ ↦ |b⟩ |a⟩.
Quantum operators applied to multiple qubits can be written in a similar manner:
(diagram: four wires labeled qubit 0 through qubit 3, each initialized in |0⟩, entering a single box U)
For a multi-qubit quantum circuit, unless stated otherwise, the first qubit will be referred to as the
qubit 0, and the second qubit as the qubit 1, etc.
When the context is clear, we may also use a more compact notation for multi-qubit quantum operators: the four wires initialized in |0⟩ may be drawn as a single wire labeled |0⟩^⊗4 (optionally with the register size marked on the wire) entering the box U.
One useful multiple qubit gate is the Toffoli gate (or controlled-controlled-NOT, CCNOT gate).
Its circuit has control dots on the first two qubits and ⊕ on the third, mapping |a⟩ |b⟩ |c⟩ ↦ |a⟩ |b⟩ |(ab) ⊕ c⟩.
We may also want to apply an n-qubit unitary U only when certain conditions are met. For instance, a circuit with filled control dots on qubits 0 and 1 (both in the state |1⟩), an empty control circle on qubit 2 (in the state |0⟩), and a box U on the system register maps |1⟩ |1⟩ |0⟩ |x⟩ ↦ |1⟩ |1⟩ |0⟩ U|x⟩,
where the empty circle means that the gate being controlled only becomes active when the value of
the control qubit is 0. This can be used to write down the quantum “if” statements, i.e., when the
qubits 0, 1 are at the |1i state and the qubit 2 is at the |0i state, then apply U to |xi.
A set of qubits is often called a register (or quantum variable). For example, in the picture
above, the main quantum state of interest (an n qubit quantum state |xi) is called the system
register. The first 3 qubits can be called the control register.
1.4. Copy operation and no-cloning theorem
Suppose there were a cloning unitary U satisfying U|x⟩ |s⟩ = |x⟩ |x⟩ for any black-box state |x⟩ and a chosen target state |s⟩ (e.g. |0^n⟩). Then taking two states |x_1⟩, |x_2⟩, we have

(1.50)    ⟨x_1|x_2⟩ = ⟨x_1|x_2⟩²,

which implies ⟨x_1|x_2⟩ = 0 or 1. When ⟨x_1|x_2⟩ = 1, |x_1⟩, |x_2⟩ refer to the same physical state.
Therefore a cloning operator U can at most copy states which are orthogonal to each other, and a
general quantum copy operation is impossible.
Given the ubiquity of the copy operation in scientific computing like y = x, the no-cloning
theorem has profound implications. For instance, all classical iterative algorithms for solving linear
systems require storing some intermediate variables. This operation is generally not possible, or at
least cannot be efficiently performed.
There are two notable exceptions to the range of applications of the no-cloning theorem. The first is when we know how a quantum state is prepared, i.e., |x⟩ = U_x|s⟩ for a known unitary U_x and some |s⟩. Then we can of course prepare another copy of this specific vector |x⟩ by applying U_x to a fresh register initialized in |s⟩.
The second is the copying of classical information. This is an application of the CNOT gate.
A CNOT with control |x⟩ (x ∈ {0, 1}) and target |0⟩ maps |x⟩ |0⟩ ↦ |x⟩ |x⟩, i.e., it copies the classical bit x.
The same principle applies to copying classical information from multiple qubits. Fig. 1.1 gives an
example of copying the classical information stored in 3 bits.
Figure 1.1. Copying the classical information stored in 3 bits: three CNOTs map |x_1⟩ |x_2⟩ |x_3⟩ |0⟩ |0⟩ |0⟩ to |x_1⟩ |x_2⟩ |x_3⟩ |x_1⟩ |x_2⟩ |x_3⟩.
In general, a multi-qubit CNOT operation can be used to perform the classical copying operation
in the computational basis. Note that in the circuit model, this can be implemented with a depth
1 circuit, since they all act on different qubits.
Example 1.8. Let us verify that the CNOT gate does not violate the no-cloning theorem, i.e., it
cannot be used to copy a superposition of classical bits |xi = a |0i + b |1i. Direct calculation shows
(1.53)    CNOT |x⟩ ⊗ |0⟩ = a |00⟩ + b |11⟩ ≠ |x⟩ ⊗ |x⟩.
In particular, if |xi = |+i, then CNOT creates a Bell state.
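A quick numerical check of Example 1.8 (a sketch, not from the notes): CNOT applied to (a|0⟩ + b|1⟩) ⊗ |0⟩ yields a|00⟩ + b|11⟩, which for |x⟩ = |+⟩ is the Bell state and is not the product |x⟩ ⊗ |x⟩.

```python
import numpy as np

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

a, b = 1 / np.sqrt(2), 1 / np.sqrt(2)           # |x> = |+>
x = np.array([a, b], dtype=complex)
ket0 = np.array([1, 0], dtype=complex)

out = CNOT @ np.kron(x, ket0)
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
assert np.allclose(out, bell)                    # CNOT creates the Bell state (1.32)
assert not np.allclose(out, np.kron(x, x))       # and it is not |x> (x) |x>
print("no-cloning check passed")
```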
The quantum no-cloning theorem implies that there does not exist a unitary U that performs the deleting operation, which resets a black-box state |x⟩ to |0^n⟩. This is because such a deleting unitary would satisfy

(1.54)    U |0^n⟩ ⊗ |x⟩ = |0^n⟩ ⊗ |0^n⟩.

Then taking |x_1⟩, |x_2⟩ that are orthogonal to each other, applying the deleting gate, and computing the inner products, we obtain

(1.55)    0 = ⟨x_1|x_2⟩ = ⟨0^n|0^n⟩ = 1,

which is a contradiction.
A more common way to express the no-deleting theorem is in terms of the time reversed dual
of the no-cloning theorem: in general, given two copies of some arbitrary quantum state, it is
impossible to delete one of the copies. More specifically, there is no unitary U performing the
following operation using known states |s⟩, |s′⟩:

(1.56)    U |x⟩ |x⟩ |s⟩ = |x⟩ |0^n⟩ |s′⟩

for an arbitrary unknown state |x⟩ (Exercise 1.7).
1.5. Measurement
The quantum measurement applied to any qubit, by default, measures the outcome in the
computational basis. For example,
applying H to |0⟩ and then measuring the qubit outputs 0 or 1, each with probability 1/2. We may also measure some of the qubits in a multi-qubit system.
(1.57)    (circuit diagram: a unitary U acts on several qubits initialized in |0⟩ together with |ψ⟩, and only some of the qubits are measured; the measured |0⟩ wires may equivalently be grouped compactly as a single wire labeled |0⟩^⊗2.)
There are two important principles related to quantum measurements: the principle of deferred
measurement, and the principle of implicit measurement. At a first glance, both principles may
seem to be counterintuitive.
The principle of deferred measurement states that measurement operations can always be moved
from an intermediate stage of a quantum circuit to the end of the circuit. This is because even if
a measurement is performed as an intermediate step in a quantum circuit, and the result of the
measurement is used to conditionally control subsequent quantum gates, such classical controls can
always be replaced by quantum controls, and the result of the quantum measurement is postponed
to later.
Example 1.9 (Deferring quantum measurements). Consider the circuit
(circuit: qubit 0 starts in |0⟩, passes through H, and is measured; the classical outcome, carried on a double line, controls an X gate on qubit 1, which also starts in |0⟩)
Here the double line denotes the classical control operation. The outcome is that qubit 0 has
probability 1/2 of outputting 0, and the qubit 1 is at state |0i. Qubit 0 also has probability 1/2 of
outputting 1, and the qubit 1 is at state |1i.
However, we may replace the classical control operation after the measurement by a quantum
controlled X (which is CNOT), and measure the qubit 0 afterwards:
(circuit: H on qubit 0, then a CNOT from qubit 0 to qubit 1, then a measurement of qubit 0)
It can be verified that the result is the same. Note that CNOT acts as the classical copying
operation. So qubit 1 really stores the classical information (i.e., in the computational basis) of
qubit 0.
Example 1.10 (Deferred measurement requires extra qubits). The procedure of deferring quantum
measurements using CNOTs is general, and important. Consider the following circuit:
(circuit: qubit 0 starts in |0⟩, passes through H, is measured, passes through another H, and is measured again)
The probability of obtaining 0, 1 is 1/2, respectively. However, if we simply “defer” the measurement
to the end by removing the intermediate measurement, we obtain
(circuit: |0⟩ passes through H and then another H, followed by a single measurement)
The result of the measurement is deterministically 0! The correct way of deferring the intermediate
quantum measurement is to introduce another qubit
18 1. PRELIMINARIES OF QUANTUM COMPUTATION
(circuit: qubit 0 starts in |0⟩ and passes through H; a CNOT copies its value onto an ancilla qubit initialized in |0⟩; qubit 0 then passes through another H and is measured)
Measuring the qubit 0, we obtain 0 or 1 w.p. 1/2, respectively. Hence when deferring quantum
measurements, it is necessary to store the intermediate information in extra (ancilla) qubits, even
if such information is not used afterwards.
The principle of implicit measurements states that at the end of a quantum circuit, any un-
measured qubit may be assumed to be measured. More specifically, assume the quantum system
consists of two subsystems A and B. If the qubits in A are to be measured at the end of the circuit, the results of the measurements do not depend on whether the qubits in B are measured or not.
Recall from Eq. (1.44) that a measurement on the subsystem A only depends on the reduced den-
sity matrix ρA . So we only need to show that ρA does not depend on the measurement in B.
To see why this is the case, let {Pi } be the projectors onto the computational basis of B. Before
the measurement, the density operator is ρ. If we measure the subsystem B, the resulting density
operator is transformed into
(1.58)    ρ′ = Σ_i (I ⊗ P_i) ρ (I ⊗ P_i).
Then it can be verified that

(1.59)    ρ′_A = Tr_B[ρ′] = Tr_B[ρ Σ_i (I ⊗ P_i)] = Tr_B[ρ] = ρ_A.
This proves the principle of implicit measurements.
By definition, the output of all quantum algorithms must be obtained through measurements, and hence the measurement outcome is probabilistic in general. If the goal is to compute the expectation value of a quantum observable M_A acting on a subsystem A, then its variance is

(1.60)    Var_ρ[M_A] = Tr[M_A² ρ_A] − (Tr[M_A ρ_A])².

The number of samples N needed to estimate Tr[M_A ρ_A] to additive precision ε satisfies

(1.61)    √(Var_ρ[M_A]/N) ≤ ε  ⇒  N ≥ Var_ρ[M_A]/ε²,

which only depends on ρ_A.
Example 1.11 (Estimating success probability on one qubit). Let A be the single qubit to be measured in the computational basis, and suppose we are interested in the accuracy of estimating the success probability p of obtaining 1. This can be realized as an expectation value with M_A = |1⟩⟨1| and p = Tr[M_A ρ_A]. Note that M_A² = M_A; then

(1.62)    Var_ρ[M_A] = p − p² = p(1 − p).

Hence to estimate p to additive error ε, the number of samples needed satisfies

(1.63)    N ≥ p(1 − p)/ε².

Note that if p is close to 0 or 1, the number of samples needed is also very small: indeed, the outcome of the measurement becomes increasingly deterministic in this case!
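A small Monte Carlo illustration (not from the notes; p_true and eps are hypothetical values) of Example 1.11: N = p(1−p)/ε² single-shot samples suffice to estimate p to roughly one standard deviation ε.

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.1                                  # hypothetical success probability
eps = 0.01                                    # target additive error
N = int(np.ceil(p_true * (1 - p_true) / eps**2))

samples = rng.random(N) < p_true              # N simulated single-shot measurements
p_hat = samples.mean()
print(f"N = {N}, estimate = {p_hat:.4f}, error = {abs(p_hat - p_true):.4f}")
```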
1.6. Linear error growth and Duhamel's principle
Assume that the unitaries U_i and their approximations Ũ_i satisfy

(1.65)    ‖U_i − Ũ_i‖ ≤ ε,    ∀i = 1, …, K.

Then we have

(1.66)    ‖U_K ⋯ U_1 − Ũ_K ⋯ Ũ_1‖ ≤ Kε.

To see this, write the telescoping sum

(1.67)    U_K ⋯ U_1 − Ũ_K ⋯ Ũ_1
          = (U_K ⋯ U_2 U_1 − U_K ⋯ U_2 Ũ_1) + (U_K ⋯ U_3 U_2 Ũ_1 − U_K ⋯ U_3 Ũ_2 Ũ_1) + ⋯ + (U_K Ũ_{K−1} ⋯ Ũ_1 − Ũ_K Ũ_{K−1} ⋯ Ũ_1)
          = U_K ⋯ U_2 (U_1 − Ũ_1) + U_K ⋯ U_3 (U_2 − Ũ_2) Ũ_1 + ⋯ + (U_K − Ũ_K) Ũ_{K−1} ⋯ Ũ_1.

Since all U_i, Ũ_i are unitary matrices, we readily have

(1.68)    ‖U_K ⋯ U_1 − Ũ_K ⋯ Ũ_1‖ ≤ Σ_{i=1}^{K} ‖U_i − Ũ_i‖ ≤ Kε.
In other words, if we can implement each local unitary to precision ε, the global error grows at most linearly with respect to the number of gates and is bounded by Kε. The telescoping series Eq. (1.67), as well as the hybrid argument, can also be seen as a discrete analogue of the variation of constants method (also called Duhamel's principle).
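A numerical illustration (a sketch, not from the notes) of the linear error growth bound (1.66): each U_i is perturbed by a unitary factor within distance ε, and the error of the full product stays below Kε.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
N, K, eps = 4, 20, 1e-3

def random_unitary(n):
    Q, R = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    return Q * (np.diag(R) / np.abs(np.diag(R)))     # fix column phases

Us, Us_tilde = [], []
for _ in range(K):
    U = random_unitary(N)
    H = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
    H = (H + H.conj().T) / 2
    # ||U - U exp(-i eps H/||H||)|| <= eps, a unitary perturbation of size at most eps
    Us.append(U)
    Us_tilde.append(U @ expm(-1j * eps * H / np.linalg.norm(H, 2)))

prod, prod_tilde = np.eye(N), np.eye(N)
for U, Ut in zip(Us, Us_tilde):
    prod, prod_tilde = U @ prod, Ut @ prod_tilde

err = np.linalg.norm(prod - prod_tilde, 2)
print(err, "<=", K * eps, err <= K * eps)
```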
Proposition 1.13 (Duhamel's principle for Hamiltonian simulation). Let U(t), Ũ(t) ∈ C^{N×N} satisfy

(1.69)    i∂_t U(t) = H U(t),    i∂_t Ũ(t) = H Ũ(t) + B(t),    U(0) = Ũ(0) = I.

Then

(1.70)    Ũ(t) = U(t) − i ∫_0^t U(t − s) B(s) ds,

and

(1.71)    ‖Ũ(t) − U(t)‖ ≤ ∫_0^t ‖B(s)‖ ds.

Proof. Directly verify that Eq. (1.70) is the solution to the differential equation.
As a special case, consider B(t) = E(t)Ũ(t); then Eq. (1.70) becomes

(1.72)    Ũ(t) = U(t) − i ∫_0^t U(t − s) E(s) Ũ(s) ds,

and

(1.73)    ‖Ũ(t) − U(t)‖ ≤ ∫_0^t ‖E(s)‖ ds.

This is a direct analogue of the hybrid argument in the continuous setting.
with reversible gates. More specifically, any irreversible classical gate x ↦ f(x) can be made into a reversible classical gate

(1.75)    (x, y) ↦ (x, y ⊕ f(x)).

In particular, we have (x, 0) ↦ (x, f(x)) computed in a reversible way. The key idea is to store all intermediate steps of the computation (see [NC00, Section 3.2.5] for more details).
On the quantum computer, storing all intermediate computational steps indefinitely creates two
problems: (1) tremendous waste of quantum resources (2) the intermediate results stored in some
extra qubits are still entangled to the quantum state of interest. So if the environments interfere
with intermediate results, the quantum state of interest is also affected.
Fortunately, both problems can be solved by a step called "uncomputation". In order to implement a Boolean function f : {0,1}^n → {0,1}^m, we assume there is an oracle

(1.76)    U_f |0^m⟩ |x⟩ = |f(x)⟩ |x⟩,

where |0^m⟩ comes from an m-qubit output register. The oracle is often further implemented with the help of a working register (a.k.a. "garbage" register) such that

(1.77)    U_f |0^w⟩ |0^m⟩ |x⟩ = |g(x)⟩ |f(x)⟩ |x⟩,

where |g(x)⟩ is the garbage state on the w-qubit working register.
From the no-deleting theorem, there is no generic unitary operator that can set a black-box state to |0^w⟩. In order to set the working register back to |0^w⟩ while keeping the input and output states, we introduce yet another m-qubit ancilla register initialized at |0^m⟩. Then we can use an m-qubit CNOT controlled on the output register and obtain

(1.78)    |0^m⟩ |g(x)⟩ |f(x)⟩ |x⟩ ↦ |f(x)⟩ |g(x)⟩ |f(x)⟩ |x⟩.
It is important to remember that in the operation above, the multi-qubit CNOT gate only performs
the classical copying operation in the computational basis, and does not violate the no-cloning
theorem.
Recalling that U_f^{−1} = U_f^†, we have

(1.79)    (I_m ⊗ U_f^†) |f(x)⟩ |g(x)⟩ |f(x)⟩ |x⟩ = |f(x)⟩ (U_f^† |g(x)⟩ |f(x)⟩ |x⟩) = |f(x)⟩ |0^w⟩ |0^m⟩ |x⟩.
Finally we apply an m-qubit SWAP operator on the ancilla and output registers to obtain

(1.80)    |f(x)⟩ |0^w⟩ |0^m⟩ |x⟩ ↦ |0^m⟩ |0^w⟩ |f(x)⟩ |x⟩.
After this procedure, both the ancilla and the working register are set to the initial state. They
are no longer entangled to the input or output register, and can be reused for other purposes. This
procedure is called uncomputation. The circuit is shown in Fig. 1.2.
(Figure 1.2 circuit: the ancilla register |0^m⟩ and the working register |0^w⟩ return to |0^m⟩, |0^w⟩, and the input register |x⟩ is unchanged.)
Figure 1.2. Circuit for uncomputation. The CNOT and SWAP operators indicate
the multi-qubit copy and swap operations, respectively.
Remark 1.15 (Discarding working registers). After the uncomputation as shown in Fig. 1.2, the
first two registers are unchanged before and after the application of the circuit (though they are
changed during the intermediate steps). Therefore Fig. 1.2 effectively implements a unitary
(1.81)    (I_{m+w} ⊗ V_f) |0^m⟩ |0^w⟩ |0^m⟩ |x⟩ = |0^m⟩ |0^w⟩ |f(x)⟩ |x⟩,

or equivalently

(1.82)    V_f |0^m⟩ |x⟩ = |f(x)⟩ |x⟩.

In the definition of V_f, all working registers have been discarded (on paper). This allows us to simplify the notation and focus on the essence of the quantum algorithms under study.
Using the technique of uncomputation, if the map x 7→ f (x) can be efficiently implemented on
a classical computer, then we can implement this map efficiently on a quantum computer as well.
To do this, we first turn it into a reversible map (1.75). All reversible single-bit and two-bit classical
gates can be implemented using single-qubit and two-qubit quantum gates. So the reversible map
can be made into a unitary operator

(1.83)    U_f : |x, y⟩ ↦ |x, y ⊕ f(x)⟩

on a quantum computer. This proves that a quantum computer is at least as powerful as a classical computer.
The unitary transformation U_f in (1.83) can be applied to any superposition of states in the computational basis, e.g.

(1.84)    U_f : (1/√{2^n}) Σ_{x∈{0,1}^n} |x, 0^m⟩ ↦ (1/√{2^n}) Σ_{x∈{0,1}^n} |x, f(x)⟩.

This does not necessarily mean that we can efficiently implement the map |x⟩ ↦ |f(x)⟩. However, if f is a bijection, and we have access to a reversible circuit for computing f^{−1}, then we may use the technique of uncomputation to implement such a map (Exercise 1.9).
The number k divided by 2^m (0 ≤ m ≤ n) can be written as (note that the binary point is shifted to be after k_m):

(1.86)    a = k/2^m = Σ_{i∈[n]} k_i 2^{i−m} =: (k_{n−1} ⋯ k_m . k_{m−1} ⋯ k_0).
Sometimes we may also write a = 0.k_1 ⋯ k_n, so that k_i is the i-th digit of a in the binary representation. For a given floating point number 0 ≤ a < 1 written as

(1.88)    a = (0.k_1 ⋯ k_n k_{n+1} ⋯),

the number (0.k_1 ⋯ k_n) is called the n-bit fixed point representation of a. Therefore to represent a to additive precision ε, we need n = ⌈log₂ ε^{−1}⌉ qubits. If the sign of a is also important, we may reserve one extra bit to indicate its sign.
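A small helper (hypothetical, not from the notes) illustrating the n-bit fixed point representation (0.k_1 ⋯ k_n) and the choice n = ⌈log₂ ε^{−1}⌉ for additive precision ε.

```python
import math

def fixed_point_bits(a, n):
    """Return the first n binary digits k_1 ... k_n of a in [0, 1)."""
    bits = []
    for _ in range(n):
        a *= 2
        bit = int(a)          # k_i is the integer part after shifting the binary point
        bits.append(bit)
        a -= bit
    return bits

eps = 1e-3
n = math.ceil(math.log2(1 / eps))       # number of bits for additive precision eps
a = 0.3141592653589793
bits = fixed_point_bits(a, n)
approx = sum(b * 2.0 ** (-(i + 1)) for i, b in enumerate(bits))
print(n, bits, abs(approx - a) < eps)   # truncation error is below eps
```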
Together with the reversible computational model, we can perform classical arithmetic oper-
ations, such as (x, y) 7→ x + y, (x, y) 7→ xy, x 7→ xα , x 7→ cos(x) etc. using reversible quantum
circuits. The number of ancilla qubits, and the number of elementary gates needed for implementing
such quantum circuits is O(poly(n)) (see [RP11, Chapter 6] for more details).
It is worth commenting that while a quantum computer is theoretically as powerful as a classical computer, there is a very significant overhead in implementing reversible classical circuits on quantum devices, both in terms of the number of ancilla qubits and the circuit depth.
Another important measure of the complexity is the circuit depth, i.e., the maximum number
of gates along any path from an input to an output. Since quantum gates can be performed in
parallel, the circuit depth is approximately equivalent to the concept of “wall-clock time” in classical
computation, i.e., the real time needed for a quantum computer to carry out a certain task. Since
quantum states can only be preserved for a short period of time (called the coherence time), the
circuit depth also provides an approximate measure of whether the quantum algorithm exceeds the
coherence limit of a given quantum computer. In many scenarios, the maximum coherence time
is the most severe limiting factor. When possible, it is often desirable to reduce the circuit depth,
even if it means that the quantum circuit needs to be carried out many more times.
Let us summarize the basic components of a typical quantum algorithm: the set of qubits can be
separated into system registers (storing quantum states of interest) and ancilla registers (auxiliary
registers needed to implement the unitary operation acting on system registers). Starting from an
initial state, apply a series of one-/two-qubit gates, and perform measurements. Uncomputation
should be performed whenever possible. Within the ancilla registers, if a register can be “freed”
after the uncomputation, it is called a working register. Since working registers can be reused for
other purposes, the cost of working registers is often not (explicitly) factored into the asymptotic
cost analysis in the literature.
1.11. Notation
We use ‖·‖ to denote the vector or matrix 2-norm: when v is a vector we denote by ‖v‖ its 2-norm, and when A is a matrix we denote by ‖A‖ its operator norm. Other matrix and vector norms will be introduced when needed. Unless otherwise specified, a vector v ∈ C^N is an unnormalized vector, and a normalized vector (stored as a quantum state) is denoted by |v⟩ = v/‖v‖. A vector v can be expressed in terms of its j-th component as v = (v_j) or (v)_j = v_j. We use 0-based indexing, i.e., j = 0, …, N − 1 or j ∈ [N]. When 1-based indexing is used, we will explicitly write j = 1, …, N.
We use the following asymptotic notations besides the usual O (or "big-O") notation: we write f = Ω(g) if g = O(f); f = Θ(g) if f = O(g) and g = O(f); f = Õ(g) if f = O(g polylog(g)).
Exercise 1.1. Prove that any unitary matrix U ∈ CN ×N can be written as U = eiH , where H is
an Hermitian matrix.
Exercise 1.2. Prove Eq. (1.26).
Exercise 1.3. Write down the matrix representation of the SWAP gate, as well as the √SWAP and √iSWAP gates.
Exercise 1.4. Prove that the Bell state Eq. (1.32) cannot be written as any product state |ai ⊗ |bi.
Exercise 1.5. Prove Eq. (1.39) holds for a general mixed state ρ.
Exercise 1.6. Prove that an ensemble of admissible density operators is also a density operator.
Exercise 1.7. Prove the no-deleting theorem in Eq. (1.56).
Exercise 1.8. Work out the circuit for implementing Eq. (1.78) and Eq. (1.80).
Exercise 1.9. Prove that if f : {0, 1}n → {0, 1}n is a bijection, and we have access to the inverse
mapping f −1 , then the mapping Uf : |xi 7→ |f (x)i can be implemented on a quantum computer.
Exercise 1.10. Prove Eq. (1.58) and Eq. (1.59).
CHAPTER 2
Grover’s algorithm
Now we will introduce a few basic quantum algorithms, of which the ideas and variants are
present in numerous other quantum algorithms. Hence they are called “quantum primitives”. There
is no official ruling on which quantum algorithms qualify for being included in the set of quantum
primitives, but the membership of Grover’s algorithm, quantum Fourier transform, quantum phase
estimation, and Trotter based Hamiltonian simulation should not be controversial. We first intro-
duce Deutsch’s algorithm, which is arguably one of the simplest quantum algorithms carrying out
a well-defined task.
Mathematically, consider a boolean function f : {0, 1} → {0, 1}. The question is whether f(0) = f(1) or f(0) ≠ f(1). A quantum fruit-checker assumes access to f via the following quantum oracle:

(2.1)    U_f : |x, y⟩ ↦ |x, y ⊕ f(x)⟩.
A classical fruit-checker can query the oracle twice, applying U_f to |0, 0⟩ and to |1, 0⟩ in the computational basis. After these two queries, we can measure qubit 1 with a deterministic outcome, and answer whether f(0) = f(1). However, a quantum fruit-checker can apply U_f to a linear combination of states in the computational basis.
Let us first check that U_f is unitary: it permutes the computational basis states (applying it twice returns |x, y ⊕ f(x) ⊕ f(x)⟩ = |x, y⟩), and any permutation matrix is unitary.
(Deutsch's algorithm circuit: qubit 0 starts in |0⟩ and qubit 1 in |1⟩; both pass through H, then the oracle U_f : |x⟩ |y⟩ ↦ |x⟩ |y ⊕ f(x)⟩ is applied, then another H acts on qubit 0, which is measured.)
Measuring qubit 0 then deterministically reveals whether the two boxes contain the same type of fruit. The procedure is equally counterintuitive.
Note that a classical fruit checker as implemented in Eq. (2.2) naturally receives the information by
measuring qubit 1. On the other hand, Deutsch’s algorithm only uses qubit 1 as a signal qubit, and
all the information is retrieved by measuring qubit 0, which, at least from the classical perspective
of Eq. (2.2), seems to contain no information at all!
Now we have seen that in terms of the query complexity, a quantum fruit-checker is clearly more
efficient. However, it is a fair question how to implement the oracle Uf , especially in a way that
somehow does not already reveal the values of f (0), f (1). We give the implementation of some cases
of Uf in Example 2.2. In general, proving the query complexity alone may not be convincing enough
that quantum computers are better than classical computers, and the gate complexity matters. This
will not be the last time we hide away such “implementation details” of quantum oracles.
Remark 2.1 (Deutsch–Jozsa algorithm). The single-qubit version of the Deutsch algorithm can be naturally generalized to the n-qubit version, called the Deutsch–Jozsa algorithm. Given N = 2^n boxes with an apple or an orange in each box, and the promise that either 1) all boxes contain the same type of fruit or 2) exactly half of the boxes contain apples and the other half contain oranges, we would like to distinguish the two cases. Mathematically, given the promise that a boolean function f : {0,1}^n → {0,1} is either a constant function (i.e., |{x | f(x) = 0}| = 0 or 2^n) or a balanced function (i.e., |{x | f(x) = 0}| = 2^{n−1}), we would like to decide to which type f belongs. We refer to [NC00, Section 1.4.4] for more details.
Example 2.2 (Qiskit example of Deutsch’s algorithm). For f (0) = f (1) = 1 (constant case), we
can use Uf = I ⊗ X. For f (0) = 0, f (1) = 1 (balanced case), we can use Uf = CNOT.
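A NumPy sketch (instead of the Qiskit version, not from the notes) of Deutsch's algorithm using the two oracles from Example 2.2: U_f = I ⊗ X (constant) and U_f = CNOT (balanced).

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

def deutsch(Uf):
    """Return the probability of measuring qubit 0 in |1>, which indicates f(0) != f(1)."""
    psi = np.kron(np.array([1, 0]), np.array([0, 1])).astype(complex)  # |0>|1>
    psi = np.kron(H, H) @ psi
    psi = Uf @ psi
    psi = np.kron(H, I2) @ psi
    # qubit 0 is the first tensor factor; indices 2, 3 correspond to qubit 0 = |1>
    return abs(psi[2]) ** 2 + abs(psi[3]) ** 2

print(deutsch(np.kron(I2, X)))   # constant f: probability 0 of measuring 1
print(deutsch(CNOT))             # balanced f: probability 1 of measuring 1
```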
Therefore R_{x_0} is a reflection operator across the hyperplane orthogonal to |x_0⟩, i.e., the Householder reflector

(2.15)    R_{x_0} = I − 2|x_0⟩⟨x_0|.

Let us write

(2.16)    |ψ_0⟩ = sin(θ/2)|x_0⟩ + cos(θ/2)|ψ_0^⊥⟩,

with θ = 2 arcsin(1/√N) ≈ 2/√N, and |ψ_0^⊥⟩ = (1/√(N − 1)) Σ_{x≠x_0} |x⟩ a normalized state orthogonal to |x_0⟩. Then

(2.17)    R_{x_0}|ψ_0⟩ = −sin(θ/2)|x_0⟩ + cos(θ/2)|ψ_0^⊥⟩.
So span{|x0 i , |ψ0⊥ i} is an invariant subspace of Rx0 .
The next key step is to consider another Householder reflector with respect to |ψ0 i. For later
convenience we add a global phase factor −1 (which is irrelevant to the physical outcome):
(2.18) Rψ0 = −(I − 2 |ψ0 i hψ0 |).
Direct computation shows

(2.19)    R_{ψ_0} R_{x_0} |ψ_0⟩ = R_{ψ_0}(|ψ_0⟩ − 2 sin(θ/2)|x_0⟩)
          = (|ψ_0⟩ − 4 sin²(θ/2)|ψ_0⟩) + 2 sin(θ/2)|x_0⟩
          = sin(θ/2)(3 − 4 sin²(θ/2))|x_0⟩ + cos(θ/2)(1 − 4 sin²(θ/2))|ψ_0^⊥⟩
          = sin(3θ/2)|x_0⟩ + cos(3θ/2)|ψ_0^⊥⟩.
So define G = Rψ0 Rx0 as the product of the two reflection operators (called the Grover operator),
then it amplifies the angle from θ/2 to 3θ/2. The geometric picture is in fact even clearer in Fig. 2.2
and the conclusion can be observed without explicit computation.
Another derivation of the Grover method is to focus on the operator, instead of the initial vector |ψ_0⟩, at each step of the calculation. In the orthonormal basis B = {|x_0⟩, |ψ_0^⊥⟩}, the matrix representation of the reflector R_{x_0} = I − 2|x_0⟩⟨x_0| is

(2.21)    [R_{x_0}]_B = \begin{pmatrix} −1 & 0 \\ 0 & 1 \end{pmatrix}.

The matrix representation of the Grover diffusion operator R_{ψ_0} = 2|ψ_0⟩⟨ψ_0| − I is

(2.22)    [R_{ψ_0}]_B = \begin{pmatrix} 2a² − 1 & 2a√(1 − a²) \\ 2a√(1 − a²) & 1 − 2a² \end{pmatrix} = \begin{pmatrix} −cos θ & sin θ \\ sin θ & cos θ \end{pmatrix}.

Here sin(θ/2) = a = 1/√N. Therefore the matrix representation of the Grover iterate G = R_{ψ_0} R_{x_0} is

(2.23)    [G]_B = \begin{pmatrix} cos θ & sin θ \\ −sin θ & cos θ \end{pmatrix},

i.e., G is a rotation matrix restricted to the two-dimensional space H = span B.
The initial vector satisfies

(2.24)    [|ψ_0⟩]_B = \begin{pmatrix} sin(θ/2) \\ cos(θ/2) \end{pmatrix},

so Grover's search can be applied as before, via G^k for k ≈ (π/4)√N times.
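A NumPy sketch (not from the notes; the marked item x0 is hypothetical) of Grover's algorithm: build R_{x_0} and R_{ψ_0} as dense matrices and apply G = R_{ψ_0} R_{x_0} about (π/4)√N times, then check the success probability |⟨x_0|ψ_k⟩|².

```python
import numpy as np

n = 8
N = 2 ** n
x0 = 37                                         # hypothetical marked item

psi0 = np.ones(N) / np.sqrt(N)                  # uniform superposition
R_x0 = np.eye(N); R_x0[x0, x0] = -1             # I - 2|x0><x0|
R_psi0 = 2 * np.outer(psi0, psi0) - np.eye(N)   # equals -(I - 2|psi0><psi0|)
G = R_psi0 @ R_x0

k = int(round(np.pi / 4 * np.sqrt(N)))
psi = psi0.copy()
for _ in range(k):
    psi = G @ psi

print(f"k = {k}, success probability = {abs(psi[x0])**2:.4f}")   # close to 1
```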
To draw the quantum circuit of Grover’s algorithm, we need an implementation of Rψ0 . Note
that
(2.25) Rψ0 = H ⊗n (2 |0n i h0n | − I)H ⊗n .
This can be implemented via the following circuit using one ancilla qubit:
(circuit: the system register |ψ⟩ passes through H^⊗n, a multi-controlled X then acts on an ancilla signal qubit prepared in |−⟩, and another H^⊗n is applied to the system register.) Here the controlled-NOT gate is an n-qubit controlled-X gate, and is only active if the system qubits are in the |0^n⟩ state. Discarding the signal qubit, we obtain an implementation of R_{ψ_0}. Since the signal qubit |−⟩ only changes up to a sign, it can be reused for both R_{ψ_0} and R_{x_0}.
The reflector Rψ0 can also be implemented without using the ancilla qubit as (use a 3-qubit
system as an example)
(circuit: apply H and X to each of the three qubits, then a doubly-controlled Z (Z on one qubit, controlled by the other two), then X and H on each qubit again.)
Remark 2.3 (Multiple marked states). The Grover search algorithm can be naturally generalized to the case when there are M > 1 marked states. The query complexity is O(√(N/M)).
Example 2.4 (Qiskit for Grover’s algorithm). https://qiskit.org/textbook/ch-algorithms/
grover.html
This is because |ψ_good⟩ can be entirely identified by measuring the ancilla qubits. Meanwhile

(2.34)    R_{ψ_0} = U_{ψ_0}(2|0^{m+n}⟩⟨0^{m+n}| − I)U_{ψ_0}^†.

Let G = R_{ψ_0} R_good. Applying G^k to |ψ_0⟩ for some k = O(1/√p_0) times, we obtain a state that has Ω(1) overlap with |ψ_good⟩. This achieves the desired quadratic speedup.
Example 2.6 (Amplitude damping). Assuming access to an oracle in Eq. (2.30), where p_0 is large, we can easily dampen the amplitude to any number α ≤ √p_0.
We introduce an additional signal qubit. Then Eq. (2.30) becomes

(2.35)    (I ⊗ U_{ψ_0}) |0⟩ |0^m⟩ |0^n⟩ = |0⟩ (√p_0 |0^m⟩ |ψ_0⟩ + √(1 − p_0) |⊥⟩).

Define a single qubit rotation operation as

(2.36)    R_θ |0⟩ = cos θ |0⟩ + sin θ |1⟩,

and we have

(2.37)    (R_θ ⊗ I_{m+n})(I ⊗ U_{ψ_0}) |0⟩ |0^m⟩ |0^n⟩
          = cos θ |0⟩ (√p_0 |0^m⟩ |ψ_0⟩ + √(1 − p_0) |⊥⟩) + sin θ |1⟩ (√p_0 |0^m⟩ |ψ_0⟩ + √(1 − p_0) |⊥⟩)
          =: √p_0 cos θ |0⟩ |0^m⟩ |ψ_0⟩ + √(1 − p_0 cos² θ) |⊥′⟩.

Here (|0^{m+1}⟩⟨0^{m+1}| ⊗ I_n) |⊥′⟩ = 0. We only need to choose √p_0 cos θ = α.
In particular, |ψk i does not involve any information of the solution x0 and therefore cannot possibly
solve the search problem.
For a set of vectors {f^{x_0}}_{x_0∈[N]}, with each f^{x_0} ∈ C^N, we will extensively use the following discrete ℓ²-norm:

(2.42)    ‖f‖_{ℓ²} := (Σ_{x_0∈[N]} ‖f^{x_0}‖²)^{1/2}.
The proof contains two steps. First, we show that the true solution and the fake solution differ significantly, in the sense that

(2.44)    D_k := Σ_{x_0∈[N]} ‖|ψ_k^{x_0}⟩ − |ψ_k⟩‖² = Ω(N).

Second, we show that

(2.45)    D_k ≤ 4k²,    k ≥ 0.

Therefore to satisfy Eq. (2.44), we must have k = Ω(√N).
In the first step, since multiplying a phase factor e^{iθ} to |ψ_k^{x_0}⟩ does not have any physical consequences, we may choose a particular phase θ so that Eq. (2.40) becomes

(2.46)    ⟨ψ_k^{x_0}|x_0⟩ ≥ 1/√2.

Therefore

(2.47)    ‖|ψ_k^{x_0}⟩ − |x_0⟩‖² = 2 − 2⟨ψ_k^{x_0}|x_0⟩ ≤ 2 − √2,

and summing over x_0 gives

(2.48)    Σ_{x_0∈[N]} ‖|ψ_k^{x_0}⟩ − |x_0⟩‖² ≤ (2 − √2)N.

On the other hand, for the "fake algorithm", using the Cauchy-Schwarz inequality,

(2.49)    Σ_{x_0∈[N]} ‖|ψ_k⟩ − |x_0⟩‖² ≥ 2N − 2 Σ_{x_0∈[N]} |⟨x_0|ψ_k⟩| ≥ 2N − 2√N √(Σ_{x_0∈[N]} |⟨x_0|ψ_k⟩|²) = 2N − 2√N.

This violates the bound in Eq. (2.48), and the fake algorithm cannot solve the search problem for arbitrarily large k.
So from Eqs. (2.48) and (2.49), and the triangle inequality, we have

(2.50)    D_k = Σ_{x_0∈[N]} ‖|ψ_k^{x_0}⟩ − |ψ_k⟩‖² = Σ_{x_0∈[N]} ‖(|ψ_k^{x_0}⟩ − |x_0⟩) − (|ψ_k⟩ − |x_0⟩)‖²
              ≥ (√(Σ_{x_0∈[N]} ‖|ψ_k⟩ − |x_0⟩‖²) − √(Σ_{x_0∈[N]} ‖|ψ_k^{x_0}⟩ − |x_0⟩‖²))²
              ≥ (√(2N − 2√N) − √((2 − √2)N))²
              = Ω(N).
This proves Eq. (2.44). In other words, the true solution and the fake solution must be well separated
in `2 -norm.
In the second step, we prove Eq. (2.45) inductively. Clearly Eq. (2.45) is true for k = 0. Assume
this is true, then
(2.51)    D_{k+1} = Σ_{x_0∈[N]} ‖|ψ_{k+1}^{x_0}⟩ − |ψ_{k+1}⟩‖²
                  = Σ_{x_0∈[N]} ‖U_{k+1} R_{x_0} |ψ_k^{x_0}⟩ − U_{k+1} |ψ_k⟩‖²
                  = Σ_{x_0∈[N]} ‖R_{x_0} |ψ_k^{x_0}⟩ − |ψ_k⟩‖²
                  = Σ_{x_0∈[N]} ‖R_{x_0}(|ψ_k^{x_0}⟩ − |ψ_k⟩) + (R_{x_0} − I)|ψ_k⟩‖²
                  ≤ (√(Σ_{x_0∈[N]} ‖R_{x_0}(|ψ_k^{x_0}⟩ − |ψ_k⟩)‖²) + √(Σ_{x_0∈[N]} ‖(R_{x_0} − I)|ψ_k⟩‖²))².
The last inequality uses the triangle inequality of the discrete `2 -norm. Note that
(2.52)    √(Σ_{x_0∈[N]} ‖(R_{x_0} − I)|ψ_k⟩‖²) = √(Σ_{x_0∈[N]} 4|⟨x_0|ψ_k⟩|²) = 2,
and
(2.53)    √(Σ_{x_0∈[N]} ‖R_{x_0}(|ψ_k^{x_0}⟩ − |ψ_k⟩)‖²) = √(Σ_{x_0∈[N]} ‖|ψ_k^{x_0}⟩ − |ψ_k⟩‖²) = √D_k,
we have
(2.54)    √D_{k+1} ≤ √D_k + 2 ≤ 2(k + 1),
which finishes the induction.
Finally, combining the lower bound Eq. (2.44) and the upper bound Eq. (2.45), we find that the necessary condition to solve the unstructured search problem is 4k² = Ω(N), or k = Ω(√N).
Remark 2.7 (Implication for amplitude amplification). Due to the close relation between unstructured search and amplitude amplification, this means that given a state |ψ⟩ of which the amplitude of the "good" component is α ≪ 1, no quantum algorithm can amplify the amplitude to Ω(1) using o(α^{−1/2}) queries to the reflection operators.
Exercise 2.1. In Deutsch's algorithm, explain why we do not assume access to an oracle of the form V_f : |x⟩ ↦ |f(x)⟩.
Exercise 2.2. For all possible mappings f : {0, 1} → {0, 1}, draw the corresponding quantum
circuit to implement Uf : |x, 0i 7→ |x, f (x)i.
Exercise 2.3. Prove Eq. (2.20).
Exercise 2.4. Draw the quantum circuit for Eq. (2.28).
Exercise 2.5. Prove that when ancilla qubits are used, the complexity of the unstructured search problem is still Ω(√N).
CHAPTER 3

Quantum phase estimation
The setup of the phase estimation problem is as follows. Let U be a unitary, and let |ψ⟩ be an eigenvector, i.e.,

(3.1)    U |ψ⟩ = e^{i2πϕ} |ψ⟩,    ϕ ∈ [0, 1).

The goal is to find ϕ up to certain precision. This is a quantum primitive with numerous applications: prime factorization (Shor's algorithm), linear systems (HHL), eigenvalue problems, amplitude estimation, quantum counting, quantum walks, etc.
Using a classical computer, we can estimate ϕ using U|ψ⟩ ⊘ |ψ⟩, where ⊘ stands for the element-wise division operation. Specifically, if |ψ⟩ is indeed an eigenvector and ⟨j|ψ⟩ ≠ 0 for any j in the computational basis, then we can extract the phase from

(3.2)    ⟨j|U|ψ⟩ / ⟨j|ψ⟩ = e^{i2πϕ}.
Figure 3.1. Quantum circuit for the Hadamard test: the ancilla qubit |0⟩ passes through H, controls a controlled-U acting on |ψ⟩, passes through another H, and is then measured.
|0⟩ |ψ⟩  →(H ⊗ I)→  (1/√2)(|0⟩ + |1⟩) |ψ⟩
         →(c-U)→    (1/√2)(|0⟩ |ψ⟩ + |1⟩ U|ψ⟩)
         →(H ⊗ I)→  (1/2) |0⟩ (|ψ⟩ + U|ψ⟩) + (1/2) |1⟩ (|ψ⟩ − U|ψ⟩).

Therefore the probability of measuring qubit 0 in the state |0⟩ is

(3.3)    p(0) = (1/2)(1 + Re⟨ψ|U|ψ⟩).
Figure 3.2. The modified Hadamard test: an S† gate is inserted on the ancilla qubit between the first H and the controlled-U, i.e., the ancilla undergoes H, S†, the control of U, H, and is then measured.
A similar calculation shows that the circuit transforms |0⟩ |ψ⟩ to the state

(3.5)    (1/2) |0⟩ (|ψ⟩ − iU|ψ⟩) + (1/2) |1⟩ (|ψ⟩ + iU|ψ⟩),

so that

(3.6)    p(0) = (1/2)(1 + Im⟨ψ|U|ψ⟩).

Combining the results from the two circuits, we obtain an estimate of ⟨ψ|U|ψ⟩.
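A NumPy sketch (not from the notes; the helper name is hypothetical) of the two Hadamard test circuits: the ancilla statistics give Re⟨ψ|U|ψ⟩ via (3.3) and Im⟨ψ|U|ψ⟩ via (3.6).

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
Sdag = np.diag([1, -1j])

def hadamard_test(U, psi, imaginary=False):
    n = len(psi)
    cU = np.block([[np.eye(n), np.zeros((n, n))], [np.zeros((n, n)), U]])  # controlled-U
    pre = (Sdag @ H) if imaginary else H         # H, then optionally S^dagger, on the ancilla
    state = np.kron(pre, np.eye(n)) @ np.kron(np.array([1, 0]), psi)
    state = np.kron(H, np.eye(n)) @ (cU @ state)
    p0 = np.linalg.norm(state[:n]) ** 2          # probability of measuring 0 on the ancilla
    return 2 * p0 - 1                            # equals Re or Im of <psi|U|psi>

phi = 0.3
U = np.diag([1, np.exp(2j * np.pi * phi)])
psi = np.array([0, 1], dtype=complex)            # eigenvector with eigenvalue e^{2 pi i phi}
z = hadamard_test(U, psi) + 1j * hadamard_test(U, psi, imaginary=True)
print(z, np.exp(2j * np.pi * phi))               # both approximately equal
```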
Example 3.1 (Overlap estimate using swap test). An application of the Hadamard test is called
the swap test, which is used to estimate the overlap of two quantum states |hϕ|ψi|. The quantum
circuit for the swap test is
(circuit: an ancilla qubit |0⟩ undergoes H, controls a SWAP acting on the registers |ϕ⟩ and |ψ⟩, undergoes another H, and is measured.)
Note that this is exactly the Hadamard test with U being the n-qubit SWAP gate. Direct calculation shows that the probability of measuring qubit 0 to be in state |0⟩ is

(3.7)    p(0) = (1/2)(1 + Re⟨ϕ, ψ|ψ, ϕ⟩) = (1/2)(1 + |⟨ϕ|ψ⟩|²).
Example 3.2 (Overlap estimate with relative phase information). In the swap test, the quantum
states |ϕi , |ψi can be black-box states, and in such a scenario obtaining an estimate to |hϕ|ψi| is
the best one can do. In order to retrieve the relative phase information and to obtain hϕ|ψi, we
need to have access to the unitary circuit preparing |ϕi , |ψi, i.e.,
(3.8) Uϕ |0n i = |ϕi , Uψ |0n i = |ψi .
Then we have hϕ|ψi = h0n |Uϕ† Uψ |0n i.
Example 3.3 (Single qubit phase estimation). The Hadamard test can also be used to derive the simplest version of phase estimation based on success probabilities. Apply the Hadamard test in Fig. 3.1 with U, ψ satisfying Eq. (3.1). Then the probability of measuring qubit 0 to be in state |1⟩ is

(3.9)    p(1) = (1/2)(1 − Re⟨ψ|U|ψ⟩) = (1/2)(1 − cos(2πϕ)).

Therefore

(3.10)    ϕ = ± arccos(1 − 2p(1)) / (2π)  (mod 1).

In order to quantify the efficiency of the procedure, recall from Example 1.11 that if p(1) is far away from 0 or 1, i.e., (2ϕ mod 1) is far away from 0, then in order to approximate p(1) (and hence ϕ) to additive precision ε, the number of samples needed is O(1/ε²).
Now assume ϕ is very close to 0 and we would like to estimate ϕ to additive precision ε. Note that

(3.11)    p(1) = sin²(πϕ) ≈ (πϕ)² = O(ε²).

Then p(1) needs to be estimated to precision O(ε²), and again the number of samples needed is O(1/ε²). The case when ϕ is close to 1/2 or 1 is similar.
Note that the circuit in Fig. 3.1 cannot distinguish the sign of ϕ (or whether ϕ ≥ 1/2 when restricted to the interval [0, 1)). To this end we need Fig. 3.2, but with S† replaced by S, so that the probability of measuring 1 in the computational basis is

(3.12)    p(1) = (1/2)(1 + sin(2πϕ)).
This gives

(3.13)    ϕ ∈ [0, 1/2) if p(1) ≥ 1/2,    ϕ ∈ (1/2, 1) if p(1) < 1/2.

Unlike the previous estimate, in order to correctly estimate the sign, we only require O(1) accuracy, and we run Fig. 3.2 a constant number of times (unless 2πϕ is very close to 0 or π).
Example 3.4 (Qiskit example for phase estimation using the Hadamard test). Here is a Qiskit example of the simple phase estimation for

R = \begin{pmatrix} 1 & 0 \\ 0 & e^{i2πϕ} \end{pmatrix},    R|1⟩ = e^{i2πϕ}|1⟩,

with ϕ = 0.5 + 2^{−d} ≡ 0.10⋯01 (d bits in total). In this example, d = 4 and the exact value is ϕ = 0.5625.
(circuit: the ancilla |0⟩ undergoes H, controls U^{2^j} applied to |ψ⟩, undergoes another H, and is measured.)
Figure 3.4. Circuit used in Kitaev's method. Another circuit with a phase gate, to determine the sign of 2^j ϕ, may also be used.
This is the basic idea behind Kitaev's method: use a more complex quantum circuit (and in particular, one with a larger circuit depth) to reduce the total number of queries. As a general strategy, instead of estimating ϕ from a single number, we assume access to U^{2^j}, and estimate ϕ bit by bit. In particular, changing U → U^{2^j} in the Hadamard test allows us to estimate

(3.16)    2^j ϕ = ϕ_{d−1} ⋯ ϕ_{d−j} . ϕ_{d−j−1} ⋯ ϕ_0 = .ϕ_{d−j−1} ⋯ ϕ_0  (mod 1).
One immediate difficulty of the bit-by-bit estimation is that we need to tell 0.0111 . . . apart from
0.1000 . . ., and the two numbers can be arbitrarily close to each other (though the two numbers can
also differ at some number of digits), and some careful work is needed. We will first describe the
algorithm, and then analyze its performance. The algorithm works for any ϕ, and then the goal is
to estimate its d bits. For simplicity of the analysis, we assume ϕ is exactly represented by d bits.
We will use extensively the distance
(3.17) |x|1 ≡ |x| mod 1 := min{(x mod 1), 1 − (x mod 1)},
which is the distance on the unit circle.
First, by applying the circuit in Fig. 3.4 (and the corresponding circuit to determine the sign) with j = 0, 1, …, d − 3, for each j we can estimate p(0), so that the error in 2^j ϕ is less than 1/16 for all j (this can happen with a sufficiently high success probability; for simplicity, let us assume that it happens with certainty). The measured result is denoted by α_j. This means that any perturbation must be due to digits from the 5th onward in the binary representation. For example, if 2^j ϕ = 0.11100, then α_j = 0.11011 is an acceptable result with an error 0.00001 = 1/32, but α_j = 0.11110 is not acceptable since the error is 0.0001 = 1/16. We then round α_j (mod 1) to its closest 3-bit estimate, denoted by β_j, i.e., β_j is taken from the set {0.000, 0.001, 0.010, 0.011, 0.100, 0.101, 0.110, 0.111}. Consider the example of 2^j ϕ = 0.11110: if α_j = 0.11101, then β_j = 0.111. But if α_j = 0.11111, then β_j = 0.000. Another example is 2^j ϕ = 0.11101: if α_j = 0.11110, then both β_j = 0.111 (rounded
down) and βj = 0.000 (rounded up) are acceptable. We can pick one of them at random. We will
show later that the uncertainty in αj , βj is not detrimental to the success of the algorithm.
Second, we perform some post-processing. Starting from j = d − 3, we can estimate .ϕ_2 ϕ_1 ϕ_0 to accuracy 1/16, which recovers these three bits exactly. The values of these three bits will be taken from β_{d−3} directly. Then we proceed with the iteration: for j = d − 4, …, 0, we assign

(3.18)    ϕ_{d−j−1} = 0 if |.0ϕ_{d−j−2}ϕ_{d−j−3} − β_j|_{mod 1} < 1/4,    ϕ_{d−j−1} = 1 if |.1ϕ_{d−j−2}ϕ_{d−j−3} − β_j|_{mod 1} < 1/4.
Here |·| mod 1 is the periodic distance on [0, 1) and its value is always ≤ 1/2. Since the two possibil-
ities are separated by 1/2, for each j, there will be at most one case that is satisfied. We will also
show that in all circumstances, there is always one case that is satisfied, regardless of the ambiguity
of the choice of βj above.
After running the algorithm above, we recover ϕ = .ϕ_{d−1} ⋯ ϕ_0 exactly. The total cost of Kitaev's method, measured by the number of queries to U, is O(Σ_{j=0}^{d−3} 2^j) = O(ε^{−1}).
If ϕ is not exactly represented by d bits, we will instead obtain an estimate satisfying

(3.19)    |.ϕ_{d−1} ⋯ ϕ_0 − ϕ|_{mod 1} < 2^{−d} = ε.
Example 3.5. Consider ϕ = 0.ϕ_4 ϕ_3 ϕ_2 ϕ_1 ϕ_0 = 0.11111 and d = 5. Running Kitaev's algorithm with j = 0, 1, 2 gives the following possible choices of β_j:

j | 2^j ϕ (mod 1) | possible β_j
0 | 0.11111       | {0.111, 0.000}
1 | 0.1111        | {0.111, 0.000}
2 | 0.111         | {0.111}

Start with j = 2. We have only one choice of β_j, and can recover 0.ϕ_2 ϕ_1 ϕ_0 = 0.111. Then for j = 1, we need to use Eq. (3.18) to decide ϕ_3. If we choose β_j = 0.111, we have ϕ_3 = 1. But if we choose β_j = 0.000, we still need to choose ϕ_3 = 1, since |.011 − 0.000|_{mod 1} = 0.011 = 3/8 > 1/4, while |.111 − 0.000|_{mod 1} = 0.001 = 1/8 < 1/4. Similar analysis shows that for j = 0 we have ϕ_4 = 1. This recovers ϕ exactly.
Example 3.6 (A variant of Kitaev's algorithm that does not work). Let us modify Kitaev's algorithm as follows: each 2^j ϕ is determined to precision 1/8, and the result is rounded to β_j ∈ {0.00, 0.01, 0.10, 0.11}. Starting from j = d − 2, we estimate .ϕ_1 ϕ_0 exactly. Then for j = d − 3, …, 0, we assign

(3.20)    ϕ_{d−j−1} = 0 if |.0ϕ_{d−j−2} − β_j|_{mod 1} < 1/2,    ϕ_{d−j−1} = 1 if |.1ϕ_{d−j−2} − β_j|_{mod 1} < 1/2.

Note that the inequality < 1/2 above can be equivalently written as ≤ 1/4.
Let us run the algorithm above for ϕ = 0.ϕ_3 ϕ_2 ϕ_1 ϕ_0 = 0.1111 and d = 4. This gives:

j | 2^j ϕ (mod 1) | possible β_j
0 | 0.1111        | {0.11, 0.00}
1 | 0.111         | {0.11, 0.00}
2 | 0.11          | {0.11}

Start with j = 2. We have only one choice of β_j, and can recover 0.ϕ_1 ϕ_0 = 0.11. Then for j = 1, if we choose β_j = 0.11, we have ϕ_2 = 1. But if we choose β_j = 0.00, then |.01 − 0.00|_{mod 1} = 0.01 = 1/4, and |.11 − 0.00|_{mod 1} = 0.01 = 1/4. So the algorithm cannot distinguish the two possibilities and fails.
Let us now inductively show why Kitaev’s algorithm works. Again assume ϕ is exactly rep-
resented by d bits. For j = d − 3, we know that ϕ2 ϕ1 ϕ0 can be recovered exactly. Then assume
ϕd−j−2 · · · ϕ0 have all been exactly computed, at step j we would like to determine the value of
ϕd−j−1 . From
(3.21)    |α_j − 2^j ϕ|_{mod 1} < 1/16,    |α_j − β_j|_{mod 1} ≤ 1/16,

we know

(3.22)    |2^j ϕ − β_j|_{mod 1} < 1/8.

Then

(3.23)    |.ϕ_{d−j−1} ϕ_{d−j−2} ϕ_{d−j−3} − β_j|_{mod 1} ≤ |.ϕ_{d−j−1} ϕ_{d−j−2} ϕ_{d−j−3} − 2^j ϕ|_{mod 1} + |2^j ϕ − β_j|_{mod 1} ≤ 1/16 + 1/8 < 1/4.

The wrong choice of ϕ_{d−j−1}, denoted by ϕ̃_{d−j−1}, then satisfies

(3.24)    |.ϕ̃_{d−j−1} ϕ_{d−j−2} ϕ_{d−j−3} − β_j|_{mod 1} ≥ |.ϕ̃_{d−j−1} ϕ_{d−j−2} ϕ_{d−j−3} − .ϕ_{d−j−1} ϕ_{d−j−2} ϕ_{d−j−3}|_{mod 1} − |.ϕ_{d−j−1} ϕ_{d−j−2} ϕ_{d−j−3} − β_j|_{mod 1} > 1/2 − 1/4 = 1/4.
This proves the validity of Eq. (3.18), and hence that of Kitaev’s algorithm.
3.3. Quantum Fourier transform

In particular,

(3.26)    U_FT |0^n⟩ = (1/√N) Σ_{k∈[N]} |k⟩ = H^⊗n |0^n⟩.
we have

kj/N = k_0 j/2^n + k_1 j/2^{n−1} + ⋯ + k_{n−1} j/2
     = k_0 (.j_{n−1} ⋯ j_0) + k_1 (j_{n−1}.j_{n−2} ⋯ j_0) + ⋯ + k_{n−1} (j_{n−1} ⋯ j_1.j_0).

Therefore the exponential can be written as

(3.29)    e^{i2π kj/N} = e^{i2πk_0(.j_{n−1}⋯j_0)} e^{i2πk_1(.j_{n−2}⋯j_0)} ⋯ e^{i2πk_{n−1}(.j_0)}.
The most important step of QFT is the following direct calculation, which requires some patience with the manipulation of indices:

(3.30)    U_FT |j_{n−1} ⋯ j_0⟩ = (1/√{2^n}) Σ_{k_{n−1},…,k_0} e^{i2πk_0(.j_{n−1}⋯j_0)} e^{i2πk_1(.j_{n−2}⋯j_0)} ⋯ e^{i2πk_{n−1}(.j_0)} |k_{n−1} ⋯ k_0⟩
          = (1/√{2^n}) (Σ_{k_{n−1}} e^{i2πk_{n−1}(.j_0)} |k_{n−1}⟩) ⊗ (Σ_{k_{n−2}} e^{i2πk_{n−2}(.j_1 j_0)} |k_{n−2}⟩) ⊗ ⋯ ⊗ (Σ_{k_0} e^{i2πk_0(.j_{n−1}⋯j_0)} |k_0⟩)
          = (1/√{2^n}) (|0⟩ + e^{i2π(.j_0)}|1⟩) ⊗ (|0⟩ + e^{i2π(.j_1 j_0)}|1⟩) ⊗ ⋯ ⊗ (|0⟩ + e^{i2π(.j_{n−1}⋯j_0)}|1⟩).
Eq. (3.30) involves a series of controlled rotations of the form

(3.31)    |0⟩ → (1/√2)(|0⟩ + e^{i2π(.j_{n−1}⋯j_0)} |1⟩).

Hence before discussing the quantum circuit for QFT, let us first work out the circuit for implementing this controlled rotation. We use the relation

(3.32)    e^{i2π(.j_{n−1}⋯j_0)} = e^{i2π(.j_{n−1})} e^{i2π(.0j_{n−2})} ⋯ e^{i2π(.0⋯0j_0)}.
Example 3.7 (Implementation of controlled rotation). Consider the implementation of

(3.33)    |0⟩ |j⟩ → (1/√2)(|0⟩ + e^{i2π(.j_{n−1}⋯j_0)} |1⟩) |j⟩.

Let

(3.34)    R_z(θ) = \begin{pmatrix} 1 & 0 \\ 0 & e^{iθ} \end{pmatrix},

and R_j = R_z(π/2^{j−1}). In particular, R_1 = Z. The quantum circuit applies H to the |0⟩ qubit, followed by the controlled rotations R_1, R_2, …, R_n controlled by the qubits |j_{n−1}⟩, |j_{n−2}⟩, …, |j_0⟩, respectively.
The implementation of QFT follows the same principle, but does not require the signal qubit
to store the phase information. Let us see a few examples.
When n = 1, we need to implement

(3.35)    |j_0⟩ → (1/√2)(|0⟩ + e^{i2π(.j_0)} |1⟩).

When n = 2, we need to implement

(3.37)    |j⟩ → (1/√{2²})(|0⟩ + e^{i2π(.j_0)} |1⟩) ⊗ (|0⟩ + e^{i2π(.j_1 j_0)} |1⟩).
Comparing the result with that in Eq. (3.37), we find that the ordering of the qubits is reversed.
To recover the desired result in QFT, we can apply a SWAP gate to the outcome, i.e.,
In order to implement the inverse Fourier transform, we only need to apply the Hermitian
conjugate as
(circuit for the inverse transform: e.g. the wire carrying |j_0⟩, after the Hadamard and the conjugated controlled phase, ends in the state (1/√2)(|0⟩ + e^{−i2π(.j_1 j_0)} |1⟩).)
Similarly one can construct the circuit for UFT and its inverse for n = 3.
In general, the QFT circuit is given by Fig. 3.5. Comparing the circuit in Fig. 3.5 with Eq. (3.30), we find again that the ordering is reversed in the output. To restore the correct order as defined in QFT, we can use O(n/2) swap operations. The total gate complexity of QFT is O(n²).
Figure 3.5. Quantum circuit for quantum Fourier transform (before applying
swap operations).
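A NumPy sketch (not from the notes) checking the QFT convention used here, (U_FT)_{kj} = e^{i2πkj/N}/√N, against Eq. (3.26) and against NumPy's FFT (whose sign convention makes U_FT proportional to the inverse FFT).

```python
import numpy as np

n = 3
N = 2 ** n
k_idx, j_idx = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
U_FT = np.exp(2j * np.pi * k_idx * j_idx / N) / np.sqrt(N)

assert np.allclose(U_FT.conj().T @ U_FT, np.eye(N))          # unitarity
e0 = np.zeros(N); e0[0] = 1
assert np.allclose(U_FT @ e0, np.ones(N) / np.sqrt(N))       # Eq. (3.26)

x = np.random.default_rng(2).standard_normal(N) + 0j
assert np.allclose(U_FT @ x, np.fft.ifft(x) * np.sqrt(N))    # relation to the classical FFT
print("QFT convention checks passed")
```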
When d = 1, 𝒰 is simply the controlled-U operation. For a general d, it seems that we need to implement all 2^d different U^j. However, this is not necessary. Using the binary representation of integers j = (j_{d−1} ⋯ j_0.) = Σ_{i=0}^{d−1} j_i 2^i, we have U^j = U^{Σ_{i=0}^{d−1} j_i 2^i} = Π_{i=0}^{d−1} U^{j_i 2^i}. Therefore, similar to the operations in QFT,

(3.39)    𝒰 = Σ_{j∈[2^d]} |j⟩⟨j| ⊗ U^j
            = Σ_{j_{d−1},…,j_0} (|j_{d−1}⟩⟨j_{d−1}|) ⊗ ⋯ ⊗ (|j_0⟩⟨j_0|) ⊗ Π_{i=0}^{d−1} U^{j_i 2^i}
            = Π′_{i=0}^{d−1} (Σ_{j_i} |j_i⟩⟨j_i| ⊗ U^{j_i 2^i})
            = Π′_{i=0}^{d−1} (|0⟩⟨0| ⊗ I_n + |1⟩⟨1| ⊗ U^{2^i}).
Here the primed product Π′ is a slightly awkward notation, which means the tensor product for the first register, and the regular matrix product for the second register. It is in fact much clearer to observe the structure in the quantum circuit in Fig. 3.6.
Figure 3.6. (circuit: the control qubits |j_0⟩, |j_1⟩, …, |j_{d−1}⟩ control the gates U, U², …, U^{2^{d−1}}, respectively, applied in sequence to the system register, mapping |ψ⟩ to U^j |ψ⟩.)
Remark 3.9. At first glance, the saving due to the usage of the circuit in Fig. 3.6 may not seem to be large, since we still need to implement matrix powers as high as U^{2^{d−1}}. However, the alternative would be to implement Σ_{j∈[2^d]} |j⟩⟨j| ⊗ U^j directly, which requires very complicated multi-qubit control operations. Another scenario where a significant advantage can be gained is when U can be fast-forwarded, i.e., U^j can be implemented at a cost that is independent of j. This is the case, for instance, if U = R_z(θ) is a single-qubit rotation. Then the circuit in Fig. 3.6 is exponentially better than the direct implementation of 𝒰.
Now let the initial state in the ancilla qubits be |0n i. Use QFT and U, we transform the initial
states according to
UFT ⊗I 1 X
|0d i |ψ0 i −− −−→ √ |ji |ψ0 i
2d j∈[2d ]
U 1 X 1 X
→√
− |ji U j |ψ0 i = √ |ji ei2πϕj |ψ0 i
(3.40) 2d j∈[2d ] 2d j∈[2d ]
† X i2πj ϕ− k0
UFT ⊗I X 1 2d |k 0 i |ψ i .
−−−−→ e 0
0 d
2d d
k ∈[2 ] j∈[2 ]
Since we have ϕ = 2kd for some k ∈ [2d ], measuring the ancilla qubits, and we will obtain the state
|ki |ψ0 i with certainty, and we obtain the phase information. Therefore the quantum circuit for
†
QFT based QPE is given by Fig. 3.7. Here we have used Eq. (3.26). We should note that UFT
includes the swapping operations. (Exercise: 1. what if the swap operation is not implemented? 2.
Is it possible to modify the circuit and remove the need of implementing the swap operations?)
50 3. QUANTUM PHASE ESTIMATION
†
|0d i H ⊗d UFT
U
|ψi |ψi
Figure 3.7. Quantum circuit for quantum phase estimation using quantum
Fourier transform.
†
Example 3.10 (Hadamard test viewed as QPE). When d = 1, note that U † = UFT = H, the QFT
based QPE in Fig. 3.7 is exactly the Hadamard test in Fig. 3.1. Note that ϕ does not need to be
exactly represented by a one bit number!
Example 3.11 (Qiskit example for QPE). https://qiskit.org/textbook/ch-algorithms/quantum-
phase-estimation.html
an initial state |0t i |φi with t > d. The exact relation between the t and the desired accuracy d will
be determined later. Similar to Eq. (3.40), we obtain the state
X 1 X i2πj ϕ0 − k0
|0t i |ψ0 i → e T |k 0 i |ψ0 i
0
T
(3.43) k ∈[T ] j∈[T ]
X
= γ0,k0 |k 0 i |ψ0 i .
k0
Here
1 X i2πj ϕ0 − kT0 1 1 − ei2πT (ϕ0 −ϕek0 ) k0
(3.44) γ0,k0 = e = , ϕ
ek0 = .
T T 1 − ei2π(ϕ0 −ϕek0 ) T
j∈[T ]
Here = 2−d = 2t−d /T is the precision parameter. In particular, for any k 0 we have
1
(3.46) |ϕ0 − ϕ
ek0 |1 ≤ .
2
Using the relation that for any θ ∈ [−π, π],
p 2
(3.47) 1 − eiθ = 2(1 − cos θ) = 2 |sin(θ/2)| ≥ |θ| ,
π
we obtain
2 1
(3.48) |γ0,k0 | ≤ = .
T π2 2π |ϕ0 − ϕ
ek0 |1 2T |ϕ0 − ϕ
ek0 |1
Figure 3.8. For ϕ0 = 0.35, the shape of |γ0,k | with T = 64 and T = 1024.
52 3. QUANTUM PHASE ESTIMATION
Let k00 be the measurement outcome, which can be viewed as a random variable. The probability
of obtaining some ϕ ek00 that is at least distance away from ϕ0 is
X 2
P ( ϕ0 − ϕek00 1 ≥ ) = |γ0,k0 |
|ϕ0 −ϕ
ek0 |1 ≥
X 1
(3.49) ≤ 2
|ϕ0 −ϕ
ek0 |1 ≥
4T 2 |ϕ0 − ϕ
ek0 |1
Z ∞
2 1 2 1 1
≤ dx + = + .
4T x2 4T 2 2 2T 2(T )2
Set t − d = dlog2 δ −1 e, then T = 2t−d ≥ δ −1 . Hence for any 0 < δ < 1, the failure probability
δ + δ2
(3.50) P ( ϕ0 − ϕ
ek00 1
≥ ) ≤ ≤ δ.
2
In other words, in order to obtain the phase ϕ0 to accuracy = 2−d with a success probability at
least 1 − δ, we need d + dlog2 δ −1 e ancilla qubits to store the value of the phase. On top of that,
the simulation time needs to be T = (δ)−1 .
Remark 3.12 (Quantum median method). One problem with QPE is that in order to obtain a
success probability 1 − δ, we must use log2 δ −1 ancilla qubits, and the maximal simulation time also
needs to be increased by a factor δ −1 . The increase of the maximal simulation time is particularly
undesirable since it increases the circuit depth and hence the required coherence time of the quantum
device. When |ψi is an exact eigenstate, this can be improved by the median method, which uses
log δ −1 copies of the result from QPE without using ancilla qubits or increasing the circuit depth.
When |ψi is a linear combination of eigenstates, the problem of the aliasing effect becomes more
difficult to handle. One possibility is to generalize the median method into the quantum median
method [NWZ09], which uses classical arithmetics to evaluate the median using a quantum circuit.
To reach success probability 1 − δ, we still need log2 δ −1 ancilla qubits, but the maximal simulation
time does not need to be increased.
Exercise 3.1. Write down the quantum circuit for the overlap estimate in Example 3.2.
Exercise 3.2. For ϕ = 0.111111, we run Kitaev’s algorithm to estimate its first 4 bits. Check that
the outcome satisfies Eq. (3.19). Note that 0.0000 and 0.1111 are both acceptable answers. Prove
the validity of Kitaev’s algorithm in general in the sense of Eq. (3.19).
Exercise 3.3. For a 3 qubit system, explicitly construct the circuit for UFT and its inverse.
Exercise 3.4. For a n-qubit system, write down the quantum circuit for the swap operation used
in QFT.
Exercise 3.5. Similar to the Hadamard test in Example 3.10, develop an algorithm to perform
QPE using the circuit in Fig. 3.7 with only d = 2, while the phase ϕ can be any number in [0, 1/2).
CHAPTER 4
The creation and annihilation operators â†i , âi can be converted into Pauli operators via e.g. the
Jordan-Wigner transform as
1 1
(4.4) âi = Z ⊗(i−1) ⊗ (X + iY ) ⊗ I ⊗(N −i) , â†i = Z ⊗(i−1) ⊗ (X − iY ) ⊗ I ⊗(N −i) .
2 2
Here X, Y, Z, I are single-qubit Pauli-matrices. The dimension of the Hamiltonian matrix Ĥ is thus
2n .
The number operator takes the form
1
(4.5) n̂i := â†i âi = (I − Zi ).
2
For a given state |Ψi, the total number of particles is
* n
+
X
(4.6) Ne = Ψ n̂i Ψ .
i=1
Without loss of generality we assume 0 < λ0 ≤ λ1 ≤ · · · < λN −1 < 21 . Note that for the
purpose of estimating the ground state energy, we do not necessarily require a positive energy gap.
For simplicity of the presentation, we still assume that the ground state is non-degenerate, i.e.,
λ0 < λ1 . We are also provided an approximate eigenstate
X
(4.7) |φi = ck |ψk i ,
k∈[N ]
2
of which the overlap with the ground state is p0 = |hφ|ψ0 i| . Our goal is to estimate λ0 to precision
= 2−d . We assume < λ0 . This appears in many problems in quantum many-body physics,
quantum chemistry, optimization etc.
In order to use QPE (based on QFT), we assume access to the unitary evolution operator
U = ei2πH . This is called a Hamiltonian simulation problem, which will be discussed in detail in
later chapters. For now we assume U can be implemented exactly. Then
U |ψ0 i = ei2πλ0 |ψ0 i .
This becomes a phase estimation problem, where the input vector is not an exact eigenstate.
Following the discussion in Section 3.5, if all eigenvalues λj can be exactly represented by d-
bit numbers, we obtain both the ground state and the ground state energy with probability p0 .
Therefore repeating the process for O(p−1 0 ) times we obtain the ground state energy.
Now we relax both conditions (1) and (2) in Section 3.5, and apply the QPE circuit in Fig. 3.7
to an initial state |0t i |φi for some t > d. Similar to Eq. (3.40), we have
UFT ⊗I
X 1 X
|0t i |φi −− −−→ ck √ |ji |ψk i
k
T j∈[T ]
U
X 1 X X 1 X
−
→ ck √ |ji U j |ψk i = ck √ |ji ei2πλk j |ψk i
k
T j∈[T ] k
T j∈[T ]
(4.8)
† X i2πj λk − k0
1
UFT ⊗I X X
−− −−→ ck e T |k 0 i |ψk i
0
T
k k ∈[T ] j∈[T ]
X X
= ck γk,k0 |k 0 i |ψk i .
k k0 ∈[T ]
Here
1 X i2πj λk − kT0 1 1 − ei2πT (λk −ϕek0 ) k0
(4.9) γk,k0 = e = , ϕ
ek0 = .
T T 1 − ei2π(λk −ϕek0 ) T
j∈[T ]
Unfortunately, this analysis is not correct. In fact, for any λk that does not have an exact t-bit
representation (note that t > d), we may have γk,k00 6= 0 and ϕ ek00 < λ0 , i.e., we obtain an energy
estimate that is lower than the ground state energy! Therefore the probability of ending up in the
2
state |k 00 i |ψk i is |ck γk,k00 | , i.e., it is still possible to obtain a wrong ground state energy. This is
called the aliasing effect.
We demonstrate below that if T is large enough, we can control the probability of underes-
timating the ground state energy. Since λ0 is the ground state energy, and all eigenvalues are in
(0, 1/2), when ϕ ek0 ≤ λ0 − , we have
(4.10) |λk − ϕ
ek0 |1 ≥ |λ0 − ϕ
ek0 |1 = λ0 − ϕ
ek0 ≥ , ∀k ∈ [N ].
X X 2
ek0 ≤ λ0 − ) =
P (ϕ |ck γk,k0 |
k λ 0 −ϕ
ek0 ≥
X X 1
≤ pk 2
k λ 0 −ϕ
ek0 ≥
4T 2 |λk − ϕ
ek0 |1
X X 1
(4.11) ≤ pk 2
k λ0 −ϕ
ek0 ≥
4T 2 |λ0 − ϕ
ek0 |
X 1
= 2
λ 0 −ϕ
ek0 ≥
4T 2 |λ0 − ϕ
ek0 |
1 1
≤ + .
4T 4(T )2
δ0
(4.12) ek0 ≤ λ0 − ) ≤
P (ϕ .
2
M δ0
(`)
(4.13) P ek0 ≤ λ0 − ≤
min ϕ .
` 2
(`)
ek0 ≤ λ0 − < 2δ , we need to set δ 0 = M −1 δ.
In order to obtain P min` ϕ
56 4. APPLICATIONS OF QUANTUM PHASE ESTIMATION
(`)
ek0 ≥ λ0 + . To this end, we
On the other hand, we also would like to have bound P min` ϕ
ek0 ≥ λ0 + , we have |ϕ
first note that when ϕ ek0 − λ0 |1 ≥ . Moreover,
X X 2
ek0 − λ0 |1 < ) =
P (|ϕ |ck γk,k0 |
k |ϕ
ek0 −λ0 |1 <
X 2
≥p0 |γ0,k0 |
|ϕ
ek0 −λ0 |1 <
(4.14) X 2
=p0 1 − |γ0,k0 |
|ϕ
ek0 −λ0 |1 ≥
1 1
≥p0 1 − −
2T 2(T )2
≥p0 (1 − δ 0 ).
Here we have used the normalization condition that
X 2
(4.15) |γk,k0 | = 1, ∀k.
k0
Therefore
p0
(4.16) P (|ϕ ek0 − λ0 |1 < ) ≤ 1 − p0 (1 − δ 0 ) ≤ 1 −
ek0 − λ0 |1 ≥ ) = 1 − P (|ϕ .
2
This means that
p0 M
(`)
(4.17) ek0 − λ0 |1 ≥ )M = (1 − p0 /2)M ≤ e−
ek0 ≥ λ0 + ) ≤ P (|ϕ
P (min ϕ 2 .
`
Remark 4.4 (QPE for preparing the ground state). The estimate of the ground state energy does
not necessarily require the energy gap λg := λ1 −λ0 to be positive. However, if our goal is to prepare
the ground state |ψ0 i from an initial state |φi using QPE, then we need stronger assumptions. In
particular, we cannot afford to obtain |k 0 i |ψk i, where |ϕ
ek0 − λ0 | < but k 6= 0. This at least
−d
requires = 2 < λg , and introduces a natural dependence on the inverse of the gap.
Through the analysis above, we see that although the analysis of QPE is very clean when 1) all
eigenvalues (properly scaled to be represented as phase factors) are exactly given by d-bit numbers
2) the input vector is an eigenstate, the analysis can become rather complicated and tedious when
such conditions are relaxed. Such difficulty does not merely show up at the theoretical level, but
can seriously impact the robust performance of QPE in practical applications. To simplify the
discussion of the applications below, we will be much more cavalier about the usage of QPE and
assume all eigenvalues are exactly represented by d-bit numbers whenever necessary. But we should
keep such caveats in mind. Furthermore, when we move beyond QPE, the issue of having exact
d-bit numbers will become much less severe in techniques based on quantum signal processing, i.e.,
quantum eigenvalue transformation (QET) and quantum singular value transformation (QSVT).
be the number of ancilla qubits with 0 = 2−d . Then QPE obtains an estimate denoted by θ,
e which
0 2 θ
approximates θ to precision with success probability 1 − δ. Note that p0 = sin 2 , and
θe θ
sin2 − sin2
2 2
θe − θ θ θe − θ θ θ θ θe − θ θe − θ θ
(4.27) = sin2 cos2 + cos2 sin2 + 2 sin cos sin cos − sin2
2 2 2 2 2 2 2 2 2
θe − θ
θ θ θ
= sin cos sin(θe − θ) + 1 − 2 sin2 sin2 .
2 2 2 2
p 02
(4.28) |e
p − p| ≤ p0 (1 − p0 )0 + (1 − 2p0 ) .
4
Let 0 be sufficiently small. Now if p0 (1 − p0 ) = Ω(1), we can choose 0 = O(), and the total
complexity of QPE is O(−1 ).
If p0 is small, then we should estimate p0 to multiplicative accuracy instead. Use
√
(4.29) |e
p − p| ≈ p0 0 < p0 ,
√ −1
we have 0 = p0 . Therefore the runtime of QPE is O(p0 2 −1 ). If p0 is to be estimated to
precision 0 using the Monte Carlo method, the number of samples would be N = O(p−10
−2
).
So, the amplitude estimation method achieves quadratic speedup in the total complexity, but
the circuit depth is increased to O(0−1 δ −1 ).
Example 4.5 (Amplitude estimation to accelerate Hadamard test). Consider the circuit for the
Hadamard test in Fig. 3.1 to estimate Re hψ|U |ψi. Let the initial state |ψi be prepared by a unitary
Uψ , then the following combined circuit
|0i H H
|0n i Uψ U
Then we can run QPE to the Grover unitary G = Rψ0 Rgood to estimate p(0), and the circuit depth
is O(−1 ).
4.3. HHL ALGORITHM FOR SOLVING LINEAR SYSTEMS OF EQUATIONS 59
so all we need to do is to use the information of the eigenvalue |λj i stored in the ancilla register,
and perform a controlled rotation to multiply the factor λ−1 j to each βj . To this end, we see that
it is crucial to store all eigenvalues in the quantum computer coherently, as achieved by QPE. We
would like to implement the following controlled rotation unitary (see Section 4.3.2)
s !
C2 C
(4.41) UCR |0i |λj i = 1− |0i + |1i |λj i .
e2
λ λ
ej
j
r
C2 C
|0i 1− e2 |0i + |1i
λj λ
ej
UCR
|0d i |0d i
†
UQPE UQPE
|vj i |vj i
Note that through the uncomputation, the d ancilla qubits for storing the eigenvalues also
becomes a working register. Discarding all working registers, the resulting unitary denoted by
UHHL satisfies
s !
X C2 C
(4.42) UHHL |0i |bi = 1− |0i + |1i βj |vj i .
e2
λ λ
ej
j j
Finally, measuring the signal qubit (the only ancilla qubit left), if the outcome is 1, we obtain the
(unnormalized) vector
X Cβj
(4.43) x
e= |vj i
j λ
ej
Therefore sufficient repetitions of running the circuit in Eq. (4.42) and estimate p(1), we can obtain
an estimate of kexk.
More general discussion of the readout problem of the HHL algorithm will be given in Re-
mark 4.10.
4.3.2. Implementation of controlled rotation.
Proposition 4.7 (Controlled rotation given rotation angles). Let 0 ≤ θ < 1 has exact d-bit fixed
point representation θ = .θd−1 · · · θ0 be its d-bit fixed point representation. Then there is a (d + 1)-
qubit unitary Uθ such that
(4.46) Uθ : |0i|θi 7→ (cos(πθ)|0i + sin(πθ)|1i)|θi.
Proof. First (by e.g. Taylor expansion)
cos(τ ) − sin(τ )
(4.47) exp (−iτ σy ) = =: Ry (2τ ).
sin(τ ) cos(τ )
Here Ry (·) perform a single-qubit rotation around the y-axis. For any j ∈ [2d ] with its binary
representation j = jd−1 · · · j0 , we have
(4.48) j/2d = (.jd−1 · · · j0 ).
So choose τ = π(.jd−1 · · · j0 ), and define
X
(4.49) Uθ = exp (−iπ(.jd−1 · · · j0 )σy ) ⊗ |ji hj| .
j∈[2d ]
|θd−1 i
|θd−2 i
···
|θ0 i
This is a sequence of single-qubit rotations on the signal qubit, each controlled by a single qubit.
In order to use the controlled rotation operation, we need to store the information of λj in term
of an angle θj . Let C > 0 be a lower bound to λ0 , so that 0 < C/λj < 1 for all j. Define
1
(4.50) θj = arcsin(C/λj ),
π
62 4. APPLICATIONS OF QUANTUM PHASE ESTIMATION
Again for simplicity we assume d0 is large enough so that the error of the fixed point representation
is negligible in this step. The mapping
0
(4.52) Uangle |0d −d i |λj i = |θej i
can be implemented using classical arithmetics circuits in Section 1.8, which may require poly(d0 )
gates and an additional working register of poly(d0 ) qubits, which are not displayed here. Therefore
the entire controlled rotation operation needed for the HHL algorithm is given by the circuit in
Fig. 4.2.
0 0
|0d −d i Uθ |0d −d i
†
Uangle Uangle
|λj i |λj i
Figure 4.2. Circuit for the controlled rotation step used by the HHL algorithm
(not including additional working register for classical arithmetic operations).
†
Therefore through the uncomputation Uangle , the d0 − d ancilla qubits also become a working
register. Discard the working register, and we obtain a unitary UCR satisfying
s !
C 2 C
(4.53) UCR |0i |λj i = cos(π θej ) |0i + sin(π θej ) |1i |λj i = 1− |0i + |1i |λj i .
e2
λ λ
ej
j
4.3.3. Complexity analysis of the HHL algorithm. Although the choice of the constant
C does not appear in the normalized quantum state, it does directly affect the success probability.
From Eq. (4.42) we immediately obtain the success probability for measuring the signal qubit with
outcome 1 is the square of the norm of the unnormalized solution
2 2
(4.54) xk ≈ C 2 A−1 |bi
p(1) = ke .
Therefore the success probability is determined by
(1) the choice of the normalization constant C,
(2) the norm of the true solution kxk = A−1 |bi .
To maximize the success probability, C should be chosen to be as large as possible (without
exceeding λ0 ). So assuming the exact knowledge of λ0 , we can choose C = λ0 . For a Hermitian
4.3. HHL ALGORITHM FOR SOLVING LINEAR SYSTEMS OF EQUATIONS 63
positive definite matrix A, kAk = λN −1 , and A−1 = λ−1 0 . For simplicity, assume the largest
eigenvalue of A is λN −1 = 1. Then the condition number of A is
λN −1
(4.55) κ := kAk A−1 = = C −1 .
λ0
Furthermore,
1
(4.56) A−1 |bi ≥ k|bik = 1.
kAk
Therefore
(4.57) p(1) = Ω(κ−2 ).
In other words, in the worst case, we need to repeatedly run the HHL algorithm for O(κ2 ) times
to obtain the outcome 1 in the signal qubit.
Assuming the number of system qubits n is large, the circuit depth and the gate complexity
of UHHL is mainly determined by those of UQPE . Therefore we can measure the complexity of the
HHL algorithm in terms of the number of queries to U = ei2πA . In order to solve QLSP to precision
, we need to estimate the eigenvalues to multiplicative accuracy instead of the standard additive
accuracy.
To see why this is the case, assume λej = λj (1 + ej ) and |ej | ≤ ≤ 1 . Then the unnormalized
4 2
solution satisfies
!
X 1 1 X βj −ej
(4.58) ke
x − xk = βj − |vj i ≤ |vj i ≤ kxk .
j λj
e λj j
λ j 1 + ej 2
Hence
ke
xk
(4.59) 1− ≤ .
kxk 2
Then the normalized solution satisfies
x x ke
xk ke
x − xk
k|e
xi − |xik = − ≤ 1− ≤ .
e
(4.60) +
ke
xk kxk kxk kxk
The discussion requires the QPE to be run to additive precision 0 = λ0 = /κ. Therefore
the query complexity of QPE is O(κ/). Counting the number of times needed to repeat the HHL
circuit, the worst case query complexity of the HHL algorithm is O(κ3 /).
The above analysis the worst case analysis, because we assume p(1) attains the lower bound
Ω(κ−2 ). In practical applications, the result may not be so pessimistic. For instance, if βj con-
centrates around the smallest eigenvalues of A, then we may have kxk ∼ Θ(λ−1 0 ) = Θ(κ
−1
). Then
p(1) = Θ(1). In such a case, we only need to repeat the HHL algorithm for a constant number of
times to yield outcome 1 in the ancilla qubit. This does not reduce the query complexity of each
run of the algorithm. Then in this best case, the query complexity is O(κ/).
4.3.4. Additional considerations. Below we discuss a few more aspects of the HHL algo-
rithm. The first observation is that the asymptotic worst-case complexity of the HHL algorithm
can be generally improved using amplitude amplification.
64 4. APPLICATIONS OF QUANTUM PHASE ESTIMATION
Remark 4.8 (HHL with amplitude amplification). We may view Eq. (4.42) as
p p
(4.61) UHHL |0i |bi = p(1) |1i |ψgood i + 1 − p(1) |0i |ψbad i , |ψgood i = |e
xi .
Since |ψgood i is marked by a single signal qubit, we may use Example 2.5 to construct a reflection
operator with respect to the signal qubit. This is simply given by
(4.62) Rgood = Z ⊗ In .
The reflection with respect to the initial vector is
(4.63) Rψ0 = Uψ0 (2 |01+n i h01+n | − I)Uψ† 0 ,
where Uψ0 = UHHL (I1 ⊗Ub ). Let G = Rψ0 Rgood be the Grover iterate. Then amplitude amplification
1
allows us to apply G for Θ(p(1)− 2 ) times to boost the success probability of obtaining |ψgood i
with constant success probability. Therefore in the worst case when p(1) = Θ(κ−2 ), the number
of repetitions is reduced to O(κ), and the total runtime is O(κ2 /). This query complexity is
the commonly referred query complexity for the HHL algorithm. Note that as usual, amplitude
amplification increases the circuit depth. However, the tradeoff is that the circuit depth increases
from O(κ/) to O(κ2 /).
So far our analysis, especially that based on QPE relies on the assumption that all λj all
eigenvalues have an exact d-bit representation. From the discussion in Section 3.5, we know that
such an assumption is unrealistic and causes theoretical and practical difficulties. The full analysis
of the HHL algorithm is thus more involved. We refer to e.g. [DHM+ 18] for more details.
Remark 4.9 (Comparison with classical iterative linear system solvers). Let us now compare the
cost of the HHL algorithm to that of classical iterative algorithms. If A is n-qubit Hermitian positive
definite with condition number κ, and is d-sparse (i.e., each row/column of A has at most d nonzero
entries), then each matrix vector multiplication Ax costs O(dN ) floating point operations. The
−1
number of iterations for the steepest√ descent (SD) algorithm is O(κ log ), and the this number
can be significantly reduced to O( κ log −1 ) by the renowned conjugate gradient (CG) √ method.
Therefore the total cost (or wall clock time) of SD and CG is O(dN κ log −1 ) and O(dN κ log −1 ),
respectively.
On the other hand, the query complexity of the HHL algorithm, even after using the AA
algorithm, is still O(κ2 /). Such a performance is terrible in terms of both κ and . Hence the
power of the HHL algorithm, and other QLSP solvers is based on that each application of A (in
this case, using the unitary U ) is much faster. In particular, if U can be implemented with poly(n)
gate complexity (also can be measured by the wall clock time), then the total gate complexity
of the HHL algorithm (with AA) is O(poly(n)κ2 /). When n is large enough, we expect that
poly(n) N = 2n and the HHL algorithm would eventually yield an advantage. Nonetheless, for
a realistic problem, the assumption that U can be implemented with poly(n) cost, and no classical
algorithm can implement Ax with poly(n) cost should be taken with a grain of salt and carefully
examined.
Remark 4.10 (Readout problem of QLSP). By solving the QLSP, the solution is stored as a
quantum state in the quantum computer. Sometimes the QLSP is only a subproblem of a larger
application, so it is sufficient to treat the HHL algorithm (or other QLSP solvers) as a “quantum
subroutine”, and leave |xi in the quantum computer. However, in many applications (such as the
solution of Poisson’s equation in Section 4.4, the goal is to solve the lienar system. Then the
information in |xi must be converted to a measurable classical output.
4.4. EXAMPLE: SOLVE POISSON’S EQUATION 65
The most common case is to compute the expectation of some observable hOi = hx|O|xi ≈
he
x|O|exi. Assuming hOi = Θ(1). Then to reach additive precision of the observable, the number
of samples needed is O(−2 ). On the other hand, in order to reach precision , the solution vector
|e
xi must be solved to precision . Assuming the worst case analysis for the HHL algorithm, the
total query complexity needed is
(4.64) O(κ2 /) × O(−2 ) = O(κ2 /3 ).
Remark 4.11 (Query complexity lower bound). The cost of a quantum algorithm for solving a
generic QLSP scales at least as Ω(κ(A)), where κ(A) := kAk A−1 is the condition number of
A. The proof is based on converting the QLSP into a Hamiltonian simulation problem, and the
lower bound with respect to κ is proved via the “no-fast-forwarding” theorem for simulating generic
Hamiltonians [HHL09]. Nonetheless, for specific classes of Hamiltonians, it may still be possible to
develop fast algorithms to overcome this lower bound.
Proof. Note that formally v0,k = vN +1,k = 0. Then direct matrix vector multiplication shows
that for any j = 1, . . . , N ,
jkπ ijθ jkπ kπ ijθ
(4.70) (Avk )j = a sin e + 2 |b| sin cos e = λk (vk )j .
N +1 N +1 N +1
Using Proposition 4.12 with a = 2/h2 , b = −1/h2 , the largest eigenvalue of A is λmax = kAk ≈
−1
4/h2 , and the smallest eigenvalue λmin = A−1 ≈ π 2 . So
4
(4.71) κ(A) ≈ = O(N 2 ).
h2 π 2
The circuit depth of the HHL algorithm is O(N 2 /), and the worst case query complexity (using
AA) is O(N 4 /). So when N is large, there is little benefit in employing the quantum computer to
solve this problem.
Let us now consider solving a d-dimensional Poisson’s equation with Dirichlet boundary con-
ditions
(4.72) − ∆u(r) = b(r), r ∈ Ω = [0, 1]d , u|∂Ω = 0.
The grid is the Cartesian product of the uniform grid in 1D with N grid points per dimension and
h = 1/(N + 1). The total number of grid points is N = N d . After discretization, we obtain a linear
system Au = b, where
(4.73) A = A ⊗ I ⊗ · · · I + · · · + I ⊗ · · · I ⊗ A,
where I is an identity matrix of size N . Since A is Hermitian and positive definite, we have
−1
kAk ≈ 4d/h2 , and A−1 ≈ dπ 2 . So κ(A) ≈ 4/(h2 π 2 ) ≈ κ(A). Therefore the condition number
is independent of the spatial dimension d.
The worst case query complexity of the HHL algorithm is O(N 2 /). So when the number of
grid points per dimension N is fixed and the spatial dimension d increases, and if U = eiAτ can be
implemented efficiently for some τ kAk < 1 with poly(d) cost, then the HHL algorithm will have
an advantage over classical solvers, for which each matrix-vector multiplication scales linearly with
respect to N and is therefore exponential in d.
A special case when H(t) ≡ H is often called the Hamiltonian simulation problem, and the solution
can be written as
(4.76) x(T ) = e−iHT x(0),
which can be viewed as the problem of evaluating the matrix function e−iHT . This will be discussed
separately in later chapters.
In this section we consider the general case of Eq. (4.74), and for simplicity discretize the
equation in time using the forward Euler method with a uniform grid tk = k∆t where ∆t =
T /N, k = 0, . . . , N . Let Ak = A(tk ), bk = b(tk ). The resulting discretized system becomes
(4.77) xk+1 − xk = ∆t(Ak xk + bk ), k = 1, . . . , N,
which can be rewritten as
(4.78)
I 0 0 ··· 0 0 x1 (I + ∆tA0 )x0 + ∆tb0
−(I + ∆tA1 ) I 0 ··· 0 0 x2 ∆tb1
0 −(I + ∆tA2 ) I ··· 0 0 x3
= ∆tb 2 ,
.. .. .. ..
. . . .
0 0 0 ··· −(I + ∆tAN −1 ) I xN ∆tbN −1
or more compactly as a linear systems of equations
(4.79) Ax = b.
Here I is the identity matrix of size d, and x ∈ RN d encodes the entire history of the states.
To solve Eq. (4.78) as a QLSP, the right hand side b needs to be a normalized vector. This
means we need to properly normalize x0 , bk so that
2 2
X 2
(4.80) kbk = k(I + ∆tA0 )x0 + ∆tb0 k + (∆t)2 kbk k = 1.
k∈[N −1]
which is not difficult to satisfy as long as x0 , b(t) can be prepared efficiently using unitary circuits
and kx0 k , kb(t)k = Θ(1).
To solve Eq. (4.78) using the HHL algorithm, we need to estimate the condition number of A.
Note that A is a block-bidiagonal matrix and in particular is not Hermitian. So we need to use the
dilation method in Eq. (4.33) and solve the corresponding Hermitian problem.
4.5.1. Scalar case. In order to estimate the condition number, for simplicity we first assume
d = 1 (i.e., this is a scalar ODE problem), and A(t) ≡ a ∈ C is a constant. Then
1 0 0 ··· 0 0
−(1 + ∆ta) 1 0 ··· 0 0
(4.82) A=
0 −(1 + ∆ta) 1 · · · 0 0 ∈ CN ×N .
.. ..
. .
0 0 0 ··· −(1 + ∆ta) 1
68 4. APPLICATIONS OF QUANTUM PHASE ESTIMATION
Let ξ = 1 + ∆ta. When Re a ≤ 0, the absolute stability condition of the forward Euler method
requires |1 + ∆ta| = |ξ| < 1. In general we are interested in the regime ∆t |a| 1, and in particular
|1 + ∆ta| = |ξ| < 2.
2
Proposition 4.13. For any A ∈ CN ×N , kAk , (1/ A−1 )2 are given by the largest and smallest
eigenvalue of A† A, respectively.
From Proposition 4.13, we need to first compute
2
1 + |ξ| −ξ ··· 0 0
2
−ξ 1 + |ξ| −ξ ··· 0 0
2
0 −ξ 1 + |ξ| ··· 0 0
A† A =
(4.83) .
.. ..
. .
2
0 0 0 ··· 1 + |ξ| −ξ
0 0 0 ··· −ξ 1
Theorem 4.14 (Gershgorin circle theorem, see e.g. [GVL13, Theorem 7.2.1]). Let A ∈ CN ×N
with entries aij . For each i = 1, . . . , N , define
X
(4.84) Ri = |aij | .
j6=i
Let D(aii , Ri ) ⊆ C be a closed disc centered at aii with radius Ri , which is called a Gershgorin disc.
Then every eigenvalue of A lies within at least one of the Gershgorin discs D(aii , Ri )
Since A† A is Hermitian, we can restrict the Gershgorin discs to the real line so that D(aii , Ri ) ⊆
R. Then Gershgorin discs of the matrix A† A satisfy the bound
h i
2 2
D(a11 , R1 ) ⊆ 1 + |ξ| − |ξ| , 1 + |ξ| + |ξ| ,
h i
(4.85) 2 2
D(aii , Ri ) ⊆ 1 + |ξ| − 2 |ξ| , 1 + |ξ| + 2 |ξ| , i = 2, . . . , N − 1
D(aN N , RN ) ⊆ [1 − |ξ| , 1 + |ξ|] .
Applying Theorem 4.14 we have
(4.86) λmax (A† A) ≤ (1 + |ξ|)2 < 9.
for all values of a such that |ξ| < 2.
To obtain a meaningful lower bound of λmin (A† A), we need Re a < 0 and hence |ξ| < 1. Use
the inequality
√ 1
(4.87) 1 + x ≤ 1 + x, x > −1,
2
we have
p (∆t |a|)2 ∆t
(4.88) 1 − |ξ| = 1 − (1 + ∆t Re a)2 + ∆t2 (Im a)2 ≥ −∆t Re a − ≥− Re a,
2 2
when
(4.89) ∆t |a| < (− Re a)/ |a|
is satisfied. Then we have
∆t
(4.90) 1 > 1 − |ξ| ≥ − Re a.
2
4.5. SOLVE LINEAR DIFFERENTIAL EQUATIONS* 69
(∆t Re a)2
(4.91) λmin (A† A) ≥ (1 − |ξ|)2 ≥ .
4
Therefore
q q
2
(4.92) kAk = λmax (A† A) ≤ 1 + |ξ| + 2 |ξ| < 3,
and
∆t(− Re a)
q
−1
(4.93) A−1 = λmin (A† A) ≥ .
2
In summary, for the scalar problem d = 1 and Re a < 0, the query complexity of the HHL
algorithm is O((∆t)−2 −2 ).
Example 4.15 (Growth of the condition number when Re a ≥ 0). The Gershgorin circle theorem
does not provide a meaningful bound of the condition number when Re a > 0 and 1 < |ξ| < 2.
a = 1, b = 0, and the solution should grow
This is a correct behavior. To see this, just consider
exponentially as x(T ) = eT . If κ(A) = O (∆t)−1 holds and in particular is independent of the
final T , then the norm of the solution can only grow polynomially in T , which is a contradiction.
See Fig. 4.3 for an illustration. Note that when a = −1.0, ∆t = 0.1, the condition number is less
2
than 20, which is smaller than the upper bound above, i.e., 3 × ∆t(−a) = 60.
Figure 4.3. Growth of the condition number κ(A) with respect to T with a fixed
step size ∆t = 0.1 for a = 1.0 and a = −1.0.
70 4. APPLICATIONS OF QUANTUM PHASE ESTIMATION
2
Remark 4.16. It may be tempting to modify the (N, N )-th entry of A to be 1 + |ξ| to obtain
2
1 + |ξ| −ξ ··· 0 0
−ξ 2
1 + |ξ| −ξ ··· 0 0
2
0 −ξ 1 + |ξ| · · · 0 0
(4.95) G= .. .. .
. .
2
0 0 0 · · · 1 + |ξ| −ξ
2
0 0 0 ··· −ξ 1 + |ξ|
Here G is a Toeplitz tridiagonal matrix satisfying the requirement of Proposition 4.12. The eigen-
values of G take the form
2 kπ
(4.96) λk = 1 + |ξ| + 2 |ξ| cos , k = 1, . . . , N.
N +1
If we take the approximation λmin (A† A) ≈ λmin (G), we would find that Eq. (4.94) holds even when
Re a > 0. This behavior is however incorrect, despite that the matrices A† A and G only differ by
a single entry!
4.5.2. Vector case. Here we consider a general d > 1, but for simplicity assume A(t) ≡ A ∈
Cd×d is a constant matrix. We also assume A is diagonalizable with eigenvalue decomposition
(4.97) A = V ΛV −1 ,
and Λ = diag(λ1 , . . . , λN ). We only consider the case Re λk < 0 for all k.
Proposition 4.17. For any diagonalizable A ∈ CN ×N with eigenvalue decomposition Avk = λk vk ,
we have
−1
(4.98) A−1 ≤ min |λk | ≤ max |λk | ≤ kAk .
k k
Proof. Use the Schur form A = QT Q† , where Q is an unitary matrix and T is an upper tri-
angular matrix (see e.g. [GVL13, Theorem 7.13]). The diagonal entries of T encodes all eigenvalues
of A, and the eigenvalues can appear in any order along the diagonal of T . The proposition follows
by arranging the eigenvalue of A with the smallest and largest absolute values to the (N, N )-th
entry, respectively.
The absolute stability condition of the forward Euler method requires ∆t kAk < 1, and we are
interested in the regime ∆t kAk 1. Therefore ∆t |λk | 1 for all k.
Let I be an identity matrix of size d, and denote by B = −(I + ∆tA), then
I + B†B B†
··· 0 0
B I + B†B B† ··· 0 0
†
0 B I + B B · · · 0 0
†
(4.99) A A= .. .
..
. .
0 0 0 · · · I + B B B†
†
0 0 0 ··· B I
Note that
(4.100) kBk ≤ I + B † B ≤ 1 + (1 + ∆t kAk)2 ≤ 5.
4.5. SOLVE LINEAR DIFFERENTIAL EQUATIONS* 71
For any x ∈ CN d ,
2
h
2 2 2 2 2
A† Ax ≤ I + B†B (kx1 k + kx2 k ) + (kx1 k + kx2 k + kx3 k ) + · · ·
(4.101) i
2 2 2
+ (kxN −1 k + kxN k ) ≤ 15 kxk .
To bound A−1 , we first note that from the eigenvalue decomposition of A, we have
(4.103)
I 0 0 ··· 0 0 −1
V −(I + ∆tΛ) V
I 0 ··· 0 0
V V −1
A=
0 −(I + ∆tΛ) I ··· 0 0
.
..
..
.. ..
. .
. .
V −1
V
0 0 0 ··· −(I + ∆tΛ) I
Hence
(4.104) A−1 ≤ kV k V −1 max A−1
k = κ(V ) A−1
k .
k
4.5.3. Computing observables. The solution of Eq. (4.78) means that the normalized state
|xi is computed to precision and stored in the quantum computer. In order to evaluate observables
at the final time T , i.e., hx(T )|O|x(T )i, we find that by the normalization condition, kx(T )k is on
1 1
average O(N − 2 ) = O((∆t) 2 ), and hx(T )|O|x(T )i = O(∆t). Therefore instead of reaching accuracy
, the Monte Carlo procedure must reach precision O(∆t). This increases the number of samples
by another factor of O((∆t)−2 ).
72 4. APPLICATIONS OF QUANTUM PHASE ESTIMATION
There is however a simple way to overcome this problem. Instead of solving Eq. (4.78), we
can redefine x by artificially padding the vector with N copies of the final state xN . This can be
written as and can be reinterpreted as
(4.109) X = |0i ⊗ x + |1i ⊗ y,
with the unnormalized vector
x1 xN +1 xN
x2 xN +2 xN
(4.110) x = . , y = . = . ,
.. .. ..
xN x2N xN
and the corresponding linear systems of equation becomes
(4.111)
I x1 (I + ∆tA0 )x0 + ∆tb0
−(I + ∆tA1 ) I x2 ∆tb1
.. ..
..
.
.
.
−(I + ∆tA N −1 ) I xN
= ∆tb N −1
.
−I I xN +1
0
−I I xN +2
0
.. . .
.. ..
.
−I I x2N 0
Note that the solution vector only requires one ancilla qubit, after solving the equation The con-
dition number of this modified equation is still κ = O((∆t)−1 ), so the total query complexity
is O((∆t)−2 −2 ). By solving the modified equation Eq. (4.111) to precision , we can estimate
hx(T )|O|x(T )i by
(4.112) hy|I ⊗ O|yi
of which the magnitude does not scale with ∆t. Here I is the identity matrix of size N , and y can
be obtained by measuring the ancilla qubit and obtain 1. If the norm of kx(t)k is comparable for
all t ∈ [0, T ], then the success probability will be Θ(1) after |Xi is obtained.
The absolute stability condition requires |1 + ∆tλk | < 1, or ∆t < h2 /4 = O(N −2 ), which implies
L = O(N 2 ). Since A is Hermitian, we have κ(V ) = 1. So for T = O(1), the query complexity for
solving the heat equation is O(N 2 −2 ), which is the same as solving Poisson’s equation.
Again, the potential advantage of the quantum solver only appears when solving the d-dimensional
heat equation
(4.116) ∂t u(r, t) = ∆u(r), r ∈ Ω = [0, 1]d , u(·, t)|∂Ω = 0.
This can be written as a linear system of equations
(4.117) ∂t u = −Au,
where A is given in Eq. (4.73). The eigenvalues of A are all negative. Note that kAk = Θ(dN 2 ),
then h = O(d−1 N −2 ), and the query complexity of the HHL solver is O(dN 2 −2 ). This could
potentially have an exponential advantage over classical solvers.
Exercise 4.1 (Quantum counting). Given query access to a function f : {0, 1}N → {0, 1} design
a quantum algorithm that computes the size of its kernel, i.e.„ total number of x’s that satisfy
f (x) = 1.
Exercise 4.2. Consider the initial value problem of the linear differential equation Eq. (4.74).
(1) Construct the linear system of equations
Ax = b
like Eq. (4.78) using the backward Euler method.
(2) In the scalar case when A(t) ≡ a ∈ C is a constant satisfying Re(a) ≤ 0, estimate the
query complexity of the HHL algorithm applying to the linear system constructed in (1).
CHAPTER 5
This can be improved to the second order Trotter method (also called the symmetric Trotter
splitting, or Strang splitting)
Now consider G(t) = [e−itH1 , H2 ]eitH1 = e−itH1 H2 eitH1 − H2 , which satisfies G(0) = 0 and
(5.14) i∂t G(t) = e−itH1 [H1 , H2 ]e+itH1 .
Hence
(5.15) [e−itH1 , H2 ] = kG(t)k ≤ t k[H1 , H2 ]k .
Plugging this back to Eq. (5.13), we have
Z t
t2
(5.16) U (t) − U (t) ≤
e s k[H1 , H2 ]k ds ≤ k[H1 , H2 ]k ≤ t2 ν 2 .
0 2
In the last equality, we have used the relation k[H1 , H2 ]k ≤ 2ν 2 with ν = max{kH1 k , kH2 k}.
Therefore Eq. (5.3) can be replaced by a sharper inequality
∆t2
(5.17) e−i∆tH − e−i∆tH1 e−i∆tH2 ≤ k[H1 , H2 ]k ≤ (∆t)2 ν 2 .
2
Here the first inequality is called the commutator norm error estimate, and the second inequality
the operator norm error estimate.
For the transverse field Ising model with nearest neighbor interaction, we have kH1 k , kH2 k =
O(n), and hence ν 2 = O(n2 ). On the other hand, since [Zi Zj , Xk ] 6= 0 only if k = i or k = j, the
commutator bound satisfies k[H1 , H2 ]k = O(n). Therefore to reach precision , the scaling of the
total number of time steps L with respect to the system size is O(n2 /) according to the estimate
based on the operator norm, but is only O(n/) according to that based on the commutator norm.
For the particle in a potential, for simplicity consider d = 1 and the domain Ω = [0, 1] is
discretized using a uniform grid of size N . For smooth and bounded potential, we have kH1 k =
O(N 2 ), and kV k = O(1). Therefore the operator norm bound gives ν 2 = O(N 4 ). This is too
pessimistic. Reexamining the second inequality of Eq. (5.17) shows that in this case, the error
bound should be O((∆t)2 ν) instead of (∆t)2 ν 2 . So according to the operator norm error estimate,
we have L = O(N 2 /). On the other hand, in the continuous space, for any smooth function ψ(r),
we have
d2
(5.18) [H1 , H2 ]ψ = − 2 , V ψ = −V 00 ψ − V 0 ψ 0 .
dr
So
(5.19) k[H1 , H2 ]ψk ≤ kV 00 k + kV 0 k kψ 0 k = O(N ).
Here we have used that kV 0 k = kV 00 k = O(1), and kψ 0 k = O(N ) in the worst case scenario. There-
fore k[H1 , H2 ]k = O(N ), and we obtain a significantly improved estimate L = O(N/) according to
the commutator norm.
The commutator scaling of the Trotter error is an important feature of the method. We refer
readers to [JL00] for analysis of the second order Trotter method, and [Tha08, CST+ 21] for the
analysis of the commutator scaling of high order Trotter methods.
Remark 5.3 (Vector norm bound). The Hamiltonian simulation problem of interest in practice
often concerns the solution with particular types of initial conditions, instead of arbitrary initial
conditions. Therefore the operator norm bound in Eq. (5.17) can still be too loose. Taking the
initial condition into account, we readily obtain
∆t2
(5.20) e−i∆tH ψ(0) − e−i∆tH1 e−i∆tH2 ψ(0) ≤ max k[H1 , H2 ]ψ(s)k .
2 0≤s≤∆t
78 5. TROTTER BASED HAMILTONIAN SIMULATION
Therefore if we are given the a priori knowledge that max0≤s≤t kψ 0 (s)k = O(1), we may even have
L = O(−1 ), i.e., the number of time steps is independent of N .
Exercise 5.1. Consider the Hamiltonian simulation problem for H = H1 + H2 + H3 . Show that
the first order Trotter formula
Ue (t) = e−itH1 e−itH2 e−itH3
has a commutator type error bound.
Exercise 5.2. Consider the time-dependent Hamiltonian simulation problem for the following
controlled Hamiltonian
H(t) = a(t)H1 + b(t)H2 ,
where a(t) and b(t) are smooth functions bounded together with all derivatives. We focus on the
following Trotter type splitting, defined as
U e (tn , tn−1 ) · · · U
e (t) := U e (tj+1 , tj ) = e−i∆ta(tj )H1 e−i∆tb(tj )H2 ,
e (t1 , t0 ), U
where the intervals [tj , tj+1 ] are equidistant and of length ∆t on the interval [0, t] with tn = t. Show
that this method has first-order accuracy, but does not exhibit a commutator type error bound in
general.
CHAPTER 6
Block encoding
In order to perform matrix computations, we must first address the problem of the input model :
how to get access to information in a matrix A ∈ CN ×N (N = 2n ) which is generally a non-unitary
matrix, into the quantum computer? One possible input model is given via the unitary eiτ A (if A
is not Hermitian, in some scenarios we can consider its Hermitian version via the dilation method).
This is particularly useful when eiτ A can be constructed using simple circuits, e.g. Trotter splitting.
A more general input model, as will be discussed in this chapter, is called “block encoding”. Of
course, if A is a dense matrix without obvious structures, any input model will be very expensive
(e.g. exponential in n) to implement. Therefore a commonly assumed input model is s-sparse, i.e.,
there are at most s nonzero entries in each row / column of the matrix. Furthermore, we have an
efficient procedure to get access to the location, as well as the value of the nonzero entries. This in
general can again be a difficult task given that the number of nonzero entries can still be exponential
in n for a sparse matrix. Some dense matrices may also be efficiently block encoded on quantum
computers. This chapter will illustrate the block encoding procedure for via a number of detailed
examples.
If the kAkmax ≥ 1, we can simply consider the rescaled matrix A/α e for some α > kAk
max .
To query the entries of a matrix, the desired oracle takes the following general form
q
2
(6.2) OA |0i |ii |ji = Aij |0i + 1 − |Aij | |1i |ii |ji .
In other words, given i, j ∈ [N ] and a signal qubit 0, OA performs a controlled rotation (controlling
on i, j) of the signal qubit, which encodes the information in terms of amplitude of |0i.
However, the classical information in A is usually not stored natively in terms of such an oracle
OA . Sometimes it is more natural to assume that there is an oracle
(6.3) eA |0d0 i |ii |ji = |A
O eij i |ii |ji ,
where Aeij is a d0 -bit fixed point representation of Aij , and the value of A
eij is either computed
on-the-fly with a quantum computer, or obtained through an external database. In either case, the
implementation of O eA may be challenging, and we will only consider the query complexity with
respect to this oracle.
79
80 6. BLOCK ENCODING
Using classical arithmetic operations, we can convert this oracle into an oracle
0
(6.4) OA |0d i |ii |ji = |θeij i |ii |ji ,
where 0 ≤ θeij < 1, and θeij is a d-bit representation of θij = arccos(Aij )/π . This step may require
some additional work registers not shown here.
Now using the controlled rotation in Proposition 4.7, the information of A eij , θeij has now been
transferred to the phase of the signal qubit. We should then perform uncomputation and free the
work register storing such intermediate information A eij , θeij . The procedure is as follows
0
OA
|0i |0d i |ii |ji −−→ |0i |θeij i |ii |ji
|{z}
work register
q
(6.5) CR 2
−−→ Aij |0i + 1 − |Aij | |1i |θeij i |ii |ji
0 −1
q
(OA ) 2
−−−−−→ Aij |0i + 1 − |Aij | |1i |0d i |ii |ji
From now on, we will always assume that the matrix entries of A can be queried using the
phase oracle OA or its variants.
where ∗ means that the corresponding matrix entries are irrelevant, then for any n-qubit quantum
state |bi, we can consider the state
b
(6.6) |0, bi = |0i |bi = ,
0
and
Ab
(6.7) UA |0, bi = =: |0i A |bi + |⊥i .
∗
Here the (unnormalized) state |⊥i can be written as |1i |ψi for some (unnormalized) state |ψi, that
is irrelevant to the computation of A |bi. In particular, it satisfies the orthogonality relation.
In order to obtain A |bi, we need to measure the qubit 0 and only keep the state if it returns 0.
This can be summarized into the following quantum circuit:
6.2. BLOCK ENCODING 81
|0i
UA
A|bi
|bi kA|bik (upon measuring 0)
Figure 6.1. Circuit for block encoding of A using one ancilla qubit.
Note that the output state is normalized after the measurement takes place. The success
probability of obtaining 0 from the measurement can be computed as
2
(6.9) p(0) = kA |bik = hb|A† A|bi .
So the missing information of norm kA |bik can be recovered via the success probability p(0) if
needed. We find that the success probability is only determined by A, |bi, and is independent of
other irrelevant components of UA .
Note that we may not need to restrict the matrix UA to be a (n + 1)-qubit matrix. If we can
find any (n + m)-qubit matrix UA so that
A ∗ ··· ∗
∗ ∗ · · · ∗
(6.10) UA = .
..
..
.
∗ ∗ ··· ∗
Here each ∗ stands for an n-qubit matrix, and there are 2m block rows / columns in UA . The
relation above can be written compactly using the braket notation as
(6.11) A = (h0m | ⊗ In ) UA (|0m i ⊗ In )
A necessary condition for the existence of UA is that kAk ≤ 1. (Note: kAkmax ≤ 1 does not
guarantee that kAk ≤ 1, see Exercise 6.2). However, if we can find sufficiently large α and UA so
that
(6.12) A/α = (h0m | ⊗ In ) UA (|0m i ⊗ In ) .
Measuring the m ancilla qubits and all m-qubits return 0, we still obtain the normalized state
A|bi
kA|bik . The number α is hidden in the success probability:
1 2 1
(6.13) p(0m ) = kA |bik = 2 hb|A† A|bi .
α2 α
So if α is chosen to be too large, the probability of obtaining all 0’s from the measurement can be
vanishingly small.
Finally, it can be difficult to find UA to block encode A exactly. This is not a problem, since it
is sufficient if we can find UA to block encode A up to some error . We are now ready to give the
definition of block encoding in Definition 6.1.
Definition 6.1 (Block encoding). Given an n-qubit matrix A, if we can find α, ∈ R+ , and an
(m + n)-qubit unitary matrix UA so that
(6.14) kA − α (h0m | ⊗ In ) UA (|0m i ⊗ In ) k ≤ ,
82 6. BLOCK ENCODING
then UA is called an (α, m, )-block-encoding of A. When the block encoding is exact with = 0,
UA is called an (α, m)-block-encoding of A. The set of all (α, m, )-block-encoding of A is denoted
by BEα,m (A, ), and we define BEα,m (A) = BE(A, 0).
Assume we know each matrix element of the n-qubit matrix Aij , and we are given an (n + m)-
qubit unitary UA . In order to verify that UA ∈ BE1,m (A), we only need to verify that
(6.15) h0m , i|UA |0m , ji = Aij ,
and UA applied to any vector |0m , bi can be obtained via the superposition principle.
Therefore we may first evaluate the state UA |0m , ji, and perform inner product with |0m , ii and
verify the resulting the inner product is Aij . We will also use the following technique frequently.
Assume UA = UB UC , and then
(6.16) h0m , i|UA |0m , ji = h0m , i|UB UC |0m , ji = (UB† |0m , ii)† (UC |0m , ji).
So we can evaluate the states UB† |0m , ii , UC |0m , ji independently, and then verify the inner product
is Aij . Such a calculation amounts to running the circuit Fig. 6.2, P and if theP ancilla qubits are
measured to be 0m , the system qubits return the normalized state i Aij |ii / k i Aij |iik.
|0m i
UA
|ji
Example 6.2 ((1, 1)-block-encoding is general). For any n-qubit matrix A with kAk2 ≤ 1, the
singular value decomposition (SVD) of A is denoted by W ΣV † , where all singular values in the
diagonal matrix Σ belong to [0, 1]. Then we may construct an (n + 1)-qubit unitary matrix
√ †
W 0 √ Σ In − Σ2 V 0
UA :=
0 In In − Σ 2 −Σ 0 In
(6.17) √
A W In − Σ2
= √ †
In − Σ V
2 −Σ
which is a (1, 1)-block-encoding of A.
Example 6.3 (Random circuit block encoded matrix). In some scenarios, we may want to con-
struct a pseudo-random non-unitary matrix on quantum computers. Note that it would be highly
inefficient if we first generate a dense pseudo-random matrix A classically and then feed it into the
quantum computer using e.g. quantum random-access memory (QRAM). Instead we would like to
work with matrices that are inherently easy to generate on quantum computers. This inspires the
random circuit based block encoding matrix (RACBEM) model [DL21]. Instead of first identifying
A and then finding its block encoding UA , we reverse this thought process: we first identify a
unitary UA that is easy to implement on a quantum computer, and then ask which matrix can be
block encoded by UA .
Example 6.2 shows that in principle, any matrix A with kAk2 ≤ 1 can be accessed via a (1, 1, 0)-
block-encoding. In other words, A can be block encoded by an (n + 1)-qubit random unitary UA ,
6.2. BLOCK ENCODING 83
and UA can be constructed using only basic one-qubit unitaries and CNOT gates. The layout of
the two-qubit operations can be designed to be compatible with the coupling map of the hardware.
A cartoon is shown in Example 6.3, and an example is given in Fig. 6.4.
q0 : |0i U1(5.12) U3(3.09, 2.64, 2.83) U3(5.99, 6.27, 0.798) U3(4.51, 5.13, 0.165) • • U1(4.29)
q1 : |0i U1(5.17) U1(5.73) U2 (0.137, 0.709) U2 (0.833, 4.42) U1(1.3) U3(5.78, 1.22, 0.301)
q2 : |0i U2 (1.57, 2.43) • U3(3.9, 4.32, 0.171) U2 (5.16, 3.67) U2 (1.39, 1.36) U3(5.38, 3.58, 4.07) •
q1 (continue) U3(4.21, 5.36, 3.3) • U1(0.703) U2 (3.63, 1.84) • U2 (2.34, 4.58) U1(5.45)
q2 (continue) U3(1.05, 1.55, 5.24) U3(3.4, 5.71, 3.6) • U2 (6.27, 2.12) U1(4.21) U3(3.09, 4.85, 5.24)
q3 (continue) • U1(5.97) U1(5.5) U3(0.614, 1.83, 4.87) U1(0.782) U2 (1.67, 2.85) U1(0.783)
0.096 + 0.256i −0.041 + 0.058i 0.096 − 0.224i 0.120 + 0.061i −0.138 − 0.054i 0.013 + 0.052i 0.189 − 0.099i 0.152 − 0.166i
0.143 − 0.023i 0.001 − 0.335i 0.046 − 0.237i −0.056 + 0.007i 0.063 + 0.016i 0.079 − 0.063i 0.017 + 0.276i −0.046 + 0.007i
0.054 − 0.017i −0.073 + 0.149i −0.002 − 0.063i −0.128 + 0.128i −0.371 + 0.048i −0.163 − 0.102i −0.069 − 0.069i 0.126 + 0.037i
−0.043 − 0.208i −0.156 − 0.170i 0.189 − 0.080i −0.090 + 0.142i −0.057 + 0.075i 0.252 + 0.080i 0.150 + 0.057i 0.098 − 0.043i
A=
0.145 + 0.178i −0.325 + 0.125i 0.114 + 0.242i −0.136 − 0.316i 0.145 + 0.255i −0.120 − 0.335i −0.046 + 0.295i −0.142 − 0.184i
−0.117 + 0.149i −0.101 + 0.338i −0.213 − 0.018i −0.474 + 0.081i −0.036 − 0.121i 0.444 + 0.147i −0.198 + 0.035i −0.091 − 0.054i
−0.063 + 0.305i 0.001 − 0.145i −0.177 + 0.045i −0.209 − 0.150i −0.041 + 0.296i 0.046 + 0.082i 0.387 − 0.051i −0.430 + 0.233i
−0.093 − 0.127i 0.254 + 0.307i −0.144 − 0.265i −0.048 − 0.353i 0.023 + 0.060i 0.085 − 0.156i 0.011 + 0.225i 0.249 + 0.420i
Figure 6.4. A RACBEM circuit constructed using the basic gate set
{U1 , U2 , U3 , CNOT}. The circuit at the bottom is a continuation of the top circuit.
A is the 3-qubit matrix block encoded as the upper-left block, namely, identifying
q0 as the block encoding qubit.
Example 6.4 (Block encoding of a diagonal matrix). As a special case, let us consider the block
encoding of a diagonal matrix. Since the row and column indices are the same, we may simplify
the oracle Eq. (6.2) into
q
2
(6.18) OA |0i |ii = Aii |0i + 1 − |Aii | |1i |ii .
In order to construct a unitary that encodes all row indices at the same time, we define D = H ⊗s
(sometimes called a diffusion operator, which is a term originated from Grover’s search) satisfying
1 X
(6.28) D |0s i = √ |`i .
s
`∈[s]
Consider UA given by the circuit in Fig. 6.5. The measurement means that to obtain A |bi, the
ancilla register should all return the value 0.
6.4. HERMITIAN BLOCK ENCODING 85
|0i
|0s i D OA D
Oc
|bi
OA 1
X q
2
(6.29) −−→ √ Ac(j,`),j |0i + 1 − Ac(j,`),j |1i |`i |ji
s
`∈[s]
Oc 1
X q
2
−−→ √ Ac(j,`),j |0i + 1 − Ac(j,`),j |1i |`i |c(j, `)i .
s
`∈[s]
Since we are only interested in the final state when all ancilla qubits are the 0 state, we may apply
D to target state |0i |0s i |ii as (note that D is Hermitian)
D 1 X
(6.30) |0i |0s i |ii −→ √ |0i |`0 i |ii .
s 0
` ∈[s]
|0i
|0n i D OA D
Oc SWAP Or†
|bi
Figure 6.6. Quantum circuit for block encoding of general sparse matrices.
Here r(i, `), c(j, `) gives the `-th nonzero entry in the i-th row and j-th column, respectively. It
should be noted that although the index ` ∈ [s], we should expand it into an n-qubit state (e.g. let
` take the last s qubits of the n-qubit register following the binary representation of integers). The
reason for such an expansion, and that we need two oracles Or , Oc will be seen shortly.
Similar to the discussion before, we need a diffusion operator satisfying
1 X
(6.33) D |0n i = √ |`i .
s
`∈[s]
(6.34) D = In−s ⊗ H ⊗s .
We assume that the matrix entries are queried using the following oracle using controlled
rotations
q
2
(6.35) OA |0i |ii |ji = Aij |0i + 1 − |Aij | |ii |ji ,
where the rotation is controlled by both row and column indices. However, if Aij = 0 for some i, j,
the rotation can be arbitrary, as there will be no contribution due to the usage of Or , Oc .
Proof. We apply the first four gate sets to the source state
D O
|0i |0n i |ji −→−−→c
OA 1
X q
2
−−→ √ Ac(j,`),j |0i + 1 − Ac(j,`),j |1i |c(j, `)i |ji
(6.36) s
`∈[s]
SWAP 1
X q
2
−−−−→ √ Ac(j,`),j |0i + 1 − Ac(j,`),j |1i |ji |c(j, `)i .
s
`∈[s]
Here we have used that there exists a unique ` such that i = c(j, `), and a unique `0 such that
j = r(i, `0 ).
We remark that the quantum circuit in Fig. 6.6 is essentially the construction in [GSLW18,
Lemma 48], which gives a (s, n + 3)-block-encoding. The construction above slightly simplifies the
procedure and saves two extra qubits (used to mark whether ` ≥ s).
Next we consider the Hermitian block encoding of a s-sparse Hermitian matrix. Since A is
Hermitian, we only need one oracle to query the location of the nonzero entries
(6.39) Oc |`i |ji = |c(j, `)i |ji .
Here c(j, `) gives the `-th nonzero entry in the j-th column. It can also be interpreted as the `-th
nonzero entry in the i-th column. Again the first register needs to be interpreted as an n-qubit
register. The diffusion operator is the same as in Eq. (6.34).
Unlike all discussions before, we introduce two signal qubits, and a quantum state in the
computational basis takes the form |ai |ii |bi |ji, where a, b ∈ {0, 1}, i, j ∈ [N ]. In other words, we
may view |ai |ii as the first register, and |bi |ji as the second register. The (n + 1)-qubit SWAP
gate is defined as
(6.40) SWAP |ai |ii |bi |ji = |bi |ji |ai |ii .
To query matrix entries, we need access to the square root of Aij as (note that act on the second
single-qubit register)
q
p
(6.41) OA |ii |0i |ji = |ii Aij |0i + 1 − |Aij | |1i |ji .
The square root operation is well defined if Aij ≥ 0 for all entries. If A has negative (or complex)
entries, we
p first iθwrite Aij = |Aij | eiθij , θij ∈ [0, 2π), and the square root is uniquely defined as
/2
p
Aij = |Aij |e ij .
Proposition 6.8. Fig. 6.7 defines UA ∈ HBEs,n+2 (A).
88 6. BLOCK ENCODING
|0i
|bi Oc Oc†
Figure 6.7. Quantum circuit for Hermitian block encoding of a general Hermitian matrix
Proof. Apply the first four gate sets to the source state gives
D O
|0i |0n i |0i |ji −→−−→ c
OA 1
X q q
−−→ √ |0i |c(j, `)i Ac(j,`),j |0i + 1 − Ac(j,`),j |1i |ji
(6.42) s
`∈[s]
SWAP 1
X q q
−−−−→ √ Ac(j,`),j |0i + 1 − Ac(j,`),j |1i |ji |0i |c(j, `)i
s
`∈[s]
q
(6.43) OA 1
X q
−−→ √ |0i |c(i, `0 )i Ac(i,`0 ),i |0i + 1 − Ac(i,`0 ),i |1i |ii
s 0
` ∈[s]
In this equality, we have used that A is Hermitian: Aij = A∗ji , and there exists a unique ` such that
i = c(j, `), as well as a unique `0 such that j = c(i, `0 ).
The quantum circuit in Fig. 6.7 is essentially the construction in [CKS17]. The relation with
quantum walks will be further discussed in Section 7.2.
Exercise 6.1. Construct a query oracle OA similar to that in Eq. (6.5), when Aij ∈ C with
|Aij | < 1.
6.5. QUERY MODELS FOR GENERAL SPARSE MATRICES* 89
Exercise 6.2. Let A ∈ CN ×N be a s-sparse matrix. Prove that kAk ≤ s kAkmax . For every
1 ≤ s ≤ N , provide an example that the equality can reached.
Exercise 6.3. Construct an s-sparse matrix so that the oracle in Eq. (6.25) does not exist.
Exercise 6.4. Let A ∈ CN ×N (N = 2n ) be a Hermitian matrix with entries on the complex unit
circle Aij = zij , |zij | = 1.
2 2
(1) Construct a 2n qubit block-diagonal unitary V ∈ CN ×N such that
1 X p
V |0i |ji = √ z̄ij |ii |ji , j ∈ [N ].
N i∈[N ]
Using the definition of Chebyshev polynomials (of first and second kinds, respectively)
sin(kθ) sin(k arccos λ)
(7.6) Tk (λ) = cos(kθ) = cos(k arccos λ), Uk−1 (λ) = = √ ,
sin θ 1 − λ2
we have
√
√ Tk (λ) − 1 − λ2 Uk−1 (λ)
(7.7) Ok (λ) = .
1 − λ2 Uk−1 (λ) Tk (λ)
Note that if we can somehow replace λ by A, we immediately obtain a (1, 1)-block-encoding for the
Chebyshev polynomial Tk (A)! This is precisely what qubitization aims at achieving, though there
are some small twists.
In the simplest scenario, we assume that UA ∈ HBE1,m (A). Start from the spectral decompo-
sition
X
(7.8) A= λi |vi i hvi | ,
i
|bi
returns |1i |0 i if b = 0 , and − |1i |bi if b 6= 0m . So this precisely implements ZΠ where the signal
m m
qubit |1i is used as a work register. We may also discard the signal qubit, and resulting unitary is
denoted by ZΠ .
In other words, the circuit in Fig. 7.1 implements the operator O. Repeating the circuit k times
gives the (1, m + 1)-block-encoding of Tk (A).
|1i Z
|0m i ZΠ
m
|0 i ≡ UA
UA |ψi
|ψi
Figure 7.1. Circuit implementing one step of qubitization with a Hermitian block
encoding of a Hermitian matrix. Here UA ∈ HBE1,m (A).
Remark 7.2 (Alternative perspectives of qubitization). The fact that an arbitrarily large block
encoding matrix UA can be partially block diagonalized into N subblocks of size 2 × 2 may seem a
rather peculiar algebraic structure. In fact there are other alternative perspectives and derivations
94 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES
of the qubitization result. Some noticeable ones include the use of Jordan’s Lemma, and the use of
the cosine-sine (CS) decomposition. Throughout this chapter and the next chapter, we will adopt
the more “elementary” derivations used above.
7.2.1. Basics of Markov chain. Let G = (V, E) be a graph of size N . A Markov chain (or
a random walk) is given by a transition matrix P , with its entry Pij denoting the probability of the
transition from vertex i to vertex j. The matrix P is a stochastic matrix satisfying
X
(7.21) Pij ≥ 0, Pij = 1.
j
A Markov chain is irreducible if any state can be reached from any other state in a finite number
of steps. An irreducible Markov chain is aperiodic if there exists no integer greater than one that
divides the length of every directed cycle of the graph. A Markov chain is ergodic if it is both
irreducible and aperiodic. By the Perron–Frobenius Theorem, any ergodic Markov chain P has a
unique stationary state π, and πi > 0 for all i. A Markov chain is reversible if the following detailed
balance condition is satisfied
(7.23) πi Pij = πj Pji .
Now we define the discriminant matrix associated with a Markov chain as
p
(7.24) Dij = Pij Pji ,
which is real symmetric and hence Hermitian. For a reversible Markov chain, the stationary state
can be encoded as an eigenvector of D (the proof is left as an exercise).
Proposition 7.3 (Reversible Markov chain). If a Markov chain is reversible, then the coherent
version of the stationary state
X√
(7.25) |πi = πi |ii
i
7.2.2. Block encoding of the discriminant matrix. Our first goal is to construct a Her-
mitian block encoding of D. Assume that we have access to an oracle OP satisfying
Xp
(7.28) OP |0n i |ji = Pjk |ki |ji .
k
Thanks to the stochasticity of P , the right hand side is already a normalized vector, and no
additional signal qubit is needed.
We also introduce the n-qubit SWAP operator Swap operator:
which swaps the value of the two registers in the computational basis, and can be directly imple-
mented using n two-qubit SWAP gates.
We claim that the following circuit gives UD ∈ HBE1,n (D).
|0n i
OP SWAP OP†
|ji
Figure 7.2. Circuit for the Hermitian block encoding of a discriminant matrix.
Meanwhile
O
Xp
(7.31) |0n i |ii −−→
P
Pik0 |k 0 i |ii .
k0
7.2.3. Szegedy’s quantum walk. For a Markov chain defined on a graph G = (V, E),
Szegedy’s quantum walk implements a qubitization of the discriminant matrix D in Eq. (7.24).
Let UD be the Hermitian block encoding defined by the circuit in Fig. 7.2, we may readily plug it
into Fig. 7.1, and obtain the circuit in Fig. 7.3 denoted by OZ .
96 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES
|0n i ZΠ
OP SWAP OP†
|ψi
Figure 7.3. Circuit implementing one step of Szegedy’s quantum walk operator.
random walk is π > Pek . This converges to the stationary state of Pe, and hence reach the marked
vertex after O(N ) steps of walks (exercise).
These properties are also inherited by the discriminant matrices, with D = P and
(7.39) e= 1 10> .
D
0 N eeee
To distinguish the two cases, we are given a Szegedy quantum walk operator called O, which
can be either OZ or OeZ , which is associated with D, D,
e respectively. The initial state is
(7.40) |ψ0 i = |0n i (H ⊗n |0n i).
Our strategy is to measure the expectation
(7.41) mk = hψ0 |Ok |ψ0 i ,
which can be obtained via Hadamard’s test.
Before determining the value of k, first notice that if O = OZ , then OZ |ψ0 i = |ψ0 i. Hence
mk = 1 for all values of k.
On the other hand, if O = OeZ , we use the fact that D
e only has two nonzero eigenvalues 1 and
(N − 1)/N = 1 − δ, with associated eigenvectors denoted by |e π i and |ev i = √N1−1 (0, 1, 1 . . . , 1)> ,
respectively. Furthermore,
r
1 n N −1 n
(7.42) |ψ0 i = √ |0 i |e πi + |0 i |evi .
N N
Due to qubitization, we have
r
k 1 n N −1 n
(7.43) OZ |ψ0 i = √ |0 i Tk (1) |e
e πi + |0 i Tk (1 − δ) |e
v i + |⊥i ,
N N
where |⊥i is an unnormalized state satisfying (|0n i h0n |) ⊗ In |⊥i = 0. Then using Tk (1) = 1 for all
k, we have
1 1
(7.44) mk = + 1− Tk (1 − δ).
N N
Use the fact that Tk (1 − δ) = cos(k arccos(1 − δ)), in order to have Tk (1 − δ) ≈ 0, the smallest k
satisfies
√
π π π N
(7.45) k≈ ≈ √ = √ .
2 arccos(1 − δ) 2 2δ 2 2
√
Therefore taking k = d π2√N2
e, we have mk ≈ 1/N . Running Hadamard’s test to constant accuracy
allows us to distinguish the two scenarios.
Remark 7.6 (Without using the Hadamard test). Alternatively, we may evaluate the success
probability of obtaining 0n in the ancilla qubits, i.e.,
2
(7.46) p(0n ) = (|0n i h0n | ⊗ In )Ok |ψ0 i .
When O = OZ , we have p(0n ) = 1 with certainty. When O = O eZ , according to Eq. (7.43),
1 1
(7.47) p(0n ) = + 1− Tk2 (1 − δ).
N N
√
So running the problem with k = d π2√N
2
e, we can distinguish between the two cases.
98 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES
Remark 7.7 (Comparison with Grover’s search). It is natural to draw comparisons between
Szegedy’s quantum walk and Grover’s search. The two algorithms make queries to different or-
acles, and both yield quadratic speedup compared to the classical algorithms. The quantum walk
is slightly weaker, since it only tells whether there is one marked vertex or not. On the other hand,
Grover’s search also finds the location of the marked vertex. Both algorithms consist of repeated
usage of the product of two reflectors. The number of iterations need to be carefully controlled.
Indeed, choosing a polynomial degree four times as large as Eq. (7.45) would result in mk ≈ 1 for
the case with a marked vertex.
Remark 7.8 (Comparison with QPE). Another possible solution of the problem of finding the
marked vertex is to perform QPE on the Szegedy walk operator O (which can be OZ or O eZ ). The
effectiveness of the method rests on the spectral gap amplification discussed above. We refer to
[Chi21, Chapter 17] for more details.
7.2.4. Comparison with the original version of Szegedy’s quantum walk. The quan-
tum walk procedure can also be presented as follows. Using the OP oracle and the multi-qubit
SWAP gate, we can define two set of quantum states
Xp
|ψj1 i = OP |0n i |ji = Pjk |ki |ji ,
k
(7.48) Xp
|ψj2 i = SWAP(OP |0n i |ji) = Pjk |ji |ki .
k
from which we can define two 2n-qubit reflection operators RΠl = 2Πl − I2n . Let us write down
the reflection operators more explicitly. Using the resolution of identity,
(7.50) RΠ1 = OP ((2 |0n i h0n | − I) ⊗ In )OP† = OP (ZΠ ⊗ In )OP† .
Similarly
(7.51) RΠ2 = SWAP OP (ZΠ ⊗ In )OP† SWAP .
Then Szegedy’s quantum walk operator takes the form
(7.52) UZ = RΠ2 RΠ1 ,
which is a rotation operator that resembles Grover’s algorithm. Note that
(7.53) UZ = SWAP OP (ZΠ ⊗ In )OP† SWAP OP (ZΠ ⊗ In )OP† ,
so
(7.54) OP† UZ (OP† )−1 = OZ
2
,
so the walk operator is the same as a block encoding of T2 (D) using qubitization, up to a matrix
similarity transformation, and the eigenvalues are the same. In particular, consider the matrix
k
power OZ , which provides a block encoding of the Chebyshev matrix polynomial Tk (D). Then the
2k
difference between OZ and UZk appears only at the beginning and end of the circuit.
7.3. LINEAR COMBINATION OF UNITARIES 99
implements the selection of Ui conditioned on the value of the a-qubit ancilla states (also called the
control register). U is called a select oracle.
Let V be a unitary operation satisfying
1 X √
(7.56) V |0a i = p αi |ii ,
kαk1 i∈[K]
and V is called the prepare oracle. The 1-norm of the coefficients is given by
X
(7.57) kαk1 = |αi | .
i
√
α0 ∗ ··· ∗
1 .. . .
(7.58) V =p ∗ . . .. .
kαk1 √
.
αK−1 ∗ ··· ∗
where the first basis is |0m i, and all other basis functions are orthogonal to it. Then
√ √
α0 · · · αK−1
1 ∗ ··· ∗
(7.59) V† = p .. .
.. . .
kαk1 . . .
∗ ··· ∗
Then T can be implemented using the unitary given in Lemma 7.9 (called the LCU lemma).
Lemma 7.9 (LCU). Define W = (V † ⊗ In )U (V ⊗ In ), then for any |ψi,
1
(7.60) W |0a i |ψi = |0a i T |ψi + |⊥i
e ,
kαk1
where |⊥i
e is an unnormalized state satisfying
Proof. First
1 X√ 1 X√
(7.62) U (V ⊗ In ) |0a i |ψi = U p αi |ii |ψi = p αi |ii Ui |ψi .
kαk1 i kαk1 i
Then using the matrix representation (7.59), and let the state |⊥i
e collect all the states marked by
m
∗ orthogonal to |0 i,
1 X
e = 1 |0a i T |ψi + |⊥i
(7.63) (V † ⊗ In )U (V ⊗ In ) |0a i |ψi = |0a i αi Ui |ψi + |⊥i e .
kαk1 i
kαk1
The LCU Lemma is a useful quantum primitive, as it states that the number of ancilla qubits
needed only depends logarithmically on K, the number of terms in the linear combination. Hence
it is possible to implement the linear combination of a very large number of terms efficiently. From
a practical perspective, the select and prepare oracles uses multi-qubit controls, and can be difficult
to implement. If implemented directly, the number of multi-qubit controls again depends linearly
on K and is not desirable. Therefore an efficient implementation using LCU (in terms of the gate
complexity) also requires additional structures in the prepare and select oracles.
If we apply W to |0a i |ψi and measure the ancilla qubits, then the probability of obtaining the
outcome 0a in the ancilla qubits (and therefore obtaining the state T |ψi / kT |ψik in the system reg-
2 2
ister) is (kT |ψik / kαk1 ) . The expected number of repetition needed to succeed is (kαk1 / kT |ψik) .
Now we demonstrate that using amplitude amplification (AA) in Section 2.3, this number can be
reduced to O (kαk1 / kT |ψik).
Remark 7.10 (Alternative construction of the prepare oracle). In some applications it may not
be convenient to absorb the phase of αi into the select oracle. In such a case, we may modify the
√ √
prepare oracle instead. If αi = ri eiθi with ri > 0, θi ∈ [0, 2π), we can define αi = ri eiθi /2 , and
V is defined as in Eq. (7.56). However, instead of V † , we need to introduce
√ √
α0 · · · αK−1
1 ∗ ··· ∗
(7.64) Ve = p .. .
.. . .
kαk1 . . .
∗ ··· ∗
Then following the same proof as Lemma 7.9, we find that W = (Ve ⊗ In )U (V ⊗ In ) ∈ BEkαk1 ,a (T ).
Remark 7.11 (Linear combination of non-unitaries). Using the block encoding technique, we may
immediately obtain linear combination of general matrices that are not unitaries. However, with
some abuse of notation, the term “LCU” will be used whether the terms to be combined are unitaries
or not. In other words, the term “linear combination of unitaries” should be loosely interpreted as
“linear combination of things” (LCT) in many contexts.
Example 7.12 (Linear combination of two matrices). Let UA , UB be two n-qubit unitaries, and
we would like to construct a block encoding of T = UA + UB .
7.3. LINEAR COMBINATION OF UNITARIES 101
There are two terms in total, so one ancilla qubit is needed. The prepare oracle needs to
implement
1
(7.65) V |0i = √ (|0i + |1i),
2
so this is the Hadamard gate. The circuit is given by Fig. 7.4, which constructs W ∈ BE√2,1 (T ).
|0i H H
|ψi UA UB
A special case is the linear combination of two block encoded matrices. Given two n-qubit
matrices A, B, for simplicity let UA ∈ BE1,m (A), UB ∈ BE1,m (B). We would like to construct a
block encoding of T = A + B. The circuit is given by Fig. 7.5, which constructs W ∈ BE√2,1+m (T ).
This is also an example of a linear combination of non-unitary matrices.
|0i H H
|0m i
UA UB
|ψi
Figure 7.5. Circuit for linear combination of two block encoded matrices.
Example 7.13 (Transverse field Ising model). Consider the following TFIM model with periodic
boundary conditions (Zn = Z0 ), and n = 2n ,
X X
(7.66) Ĥ = − Zi Zi+1 − Xi .
i∈[n] i∈[n]
In order to use LCU, we need (n + 1) ancilla qubits. The prepare oracle can be simply constructed
from the Hadamard gate
(7.67) V = H ⊗(n+1) ,
and the select oracle implements
X X
(7.68) U= |ii hi| ⊗ (−Zi Zi+1 ) + |i + ni hi + n| ⊗ (−Xi ).
i∈[n] i∈[n]
Example 7.14 (Block encoding of a matrix polynomial). Let us use the LCU lemma to construct
the block encoding for an arbitrary matrix polynomial for a Hermitian matrix A in Section 7.1.
X
(7.69) f (A) = αk Tk (A),
k∈[K]
via multi-qubit controls. Also given the availability of the prepare oracle
1 X √
(7.71) V |0a i = p αk |ki ,
kαk1 k∈[K]
Example 7.15 (Matrix functions given by a matrix Fourier series). Instead of block encoding,
LCU can also utilize a different query model based on Hamiltonian simulation. Let A be an n-qubit
Hermitian matrix. Consider f (x) ∈ R given by its Fourier expansion (up to a normalization factor)
Z
(7.72) f (x) = fˆ(k)eikx dk,
and we are interested in computing the matrix function via numerical quadrature
Z X
(7.73) f (A) = fˆ(k)eikA dk ≈ ∆k fˆ(k)eikA .
k∈K
Here K is a uniform grid discretizing the interval [−L, L] using |K| = 2k grid points, and the grid
spacing is ∆k = 2L/ |K|. The prepare oracle is given by the coefficients ck = ∆k fˆ(k), and the
corresponding subnormalization factor is
X Z
(7.74) kck1 = ∆k fˆ(k) ≈ fˆ(k) dk.
k∈K
This can be efficiently implemented using the controlled matrix powers as in Fig. 3.6, where the
basic unit is the short time Hamiltonian simulation ei∆kA . This can be used to block encode a large
class of matrix functions.
7.4. QUBITIZATION OF HERMITIAN MATRICES WITH GENERAL BLOCK ENCODING 103
|0m i ZΠ ZΠ
UA UA†
|ψi
Figure 7.6. Circuit implementing one step of qubitization with a general block
encoding of a Hermitian matrix. This block encodes T2 (A). Here UA ∈ BE1,m (A).
This is a special case of the following representation. Note that (−i)d is an irrelevant phase factor
and can be discarded.
d
Y
iφ
e0 ZΠ
(7.92) UΦ = e (UA eiφj ZΠ ).
e
e
j=1
Remark 7.17. This theorem can be proved inductively. However, this is a special case of the
quantum signal processing in Theorem 7.20, so we will omit the proof here. In fact, Theorem 7.20
will also state the converse of the result, which describes precisely the class of matrix polynomials
that can be described by such phase factor modulations. In Theorem 7.16, the condition (1) states
that the polynomial degree is upper bounded by the number of UA ’s, and the condition (3) is simply
106 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES
a consequence of that UΦ e is a unitary matrix. The condition (2) is less obvious, but should not
come at a surprise, since we have seen the need of treating even and odd polynomials separately in
the case of qubitization with a general block encoding. Eq. (7.95) can be proved directly by taking
the complex conjugation of UΦ e.
Following the qubitization procedure, we immediately have Theorem 7.18.
Theorem 7.18 (Quantum eigenvalue transformation with Hermitian block encoding). Let UA ∈
HBE1,m (A). Then for any Φe := (φe0 , · · · , φed ) ∈ Rd+1 ,
d
iφ
e0 ZΠ
Y
iφej ZΠ P (A) ∗
(7.96) UΦ
e =e (UA e )= ∈ BE1,m (P (A)),
∗ ∗
j=1
|0i e−iφZ
|bi
Figure 7.7. Implementing the controlled rotation circuit for quantum eigenvalue transformation.
|0i ···
CRφed CRφed−1 CRφe0
|0m i ···
UA UA UA
|ψi ···
The QET described by the circuit in Fig. 7.8 generally constructs a block encoding of P (A)
for some complex polynomial P . In practical applications (such as those later in this chapter),
7.5. QUANTUM EIGENVALUE TRANSFORMATION 107
we would like to construct a block encoding of PRe (A) ≡ (Re P )(A) = 12 (P (A) + P ∗ (A)) instead.
Below we demonstrate that a simple modification of Fig. 7.8 allows us to achieve this goal.
To this end, we use Eq. (7.95). Qubitization allows us to construct
d ∗
−iφe0 ZΠ
Y
−iφ
ej ZΠ P (A) ∗
(7.97) U−Φe =e (UA e )= .
∗ ∗
j=1
|1i e−iφZ
|bi
which returns e−iφ |1i |0m i if b = 0m , and eiφ |1i |bi if b 6= 0m . In other words, the circuit for UP ∗ (A)
and UP (A) are exactly the same except that the input signal qubit is changed from |0i to |1i.
Now we claim the circuit in Fig. 7.9 implements a block encoding UPRe (A) ∈ BE1,m+1 (PRe (A)).
This circuit can be viewed as an implementation of the linear combination of unitaries 21 (UP ∗ (A) +
UP (A) ).
|0i H H
|0m i UP (A)
|ψi
Here |⊥i , |⊥0 i are two (m+n)-qubit state orthogonal to any state |0m i |xi, while |⊥i
e is a (m+n+1)-
m
qubit state orthogonal to any state |0i |0 i |xi. In other words, by measuring all (m + 1) ancilla
qubits and obtain 0m+1 , the corresponding (unnormalized) state in the system register is PRe (A) |ψi.
108 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES
7.5.2. General block encoding. If A is given by a general block encoding UA , the quantum
eigenvalue transformation should consist of an alternating sequence of UA , UA† gates. The circuit is
given by Fig. 7.10, and the corresponding block encoding is described in Theorem 7.19. Note that
the Hermitian block encoding becomes a special case with UA = UA† .
|0i ···
CRφed CRφed−1 CRφe0
|0m i ···
UA UA† UA
|ψi ···
Theorem 7.19 (Quantum eigenvalue transformation with general block encoding). Let UA ∈
e := (φe0 , · · · , φed ) ∈ Rd+1 , let
BE1,m (A). Then for any Φ
d/2 h i
UA† eiφ2j−1 ZΠ UA eiφ2j ZΠ
Y
iφ0 ZΠ
(7.99) UΦ
e =e
e e e
j=1
j=1
Note that unlike the case of the block encoding of PRe (A), we lose a subnormalization factor of 2
here.
Following the same principle, if f (x) = g(x) + ih(x) ∈ C[x] is a given complex polynomial,
and g, h ∈ R[x] do not have a definite parity, we can construct Ug(A) ∈ BE2,m+2 (g(A)), Uh(A) ∈
BE2,m+2 (h(A)). Then applying another layer of LCU, we obtain Uf (A) ∈ BE4,m+3 (f (A)).
On the other hand, if the real and imaginary parts g, h have definite parity, then Ug(A) ∈
BE1,m+1 (g(A)), Uh(A) ∈ BE1,m+1 (h(A)). Applying LCU, we obtain Uf (A) ∈ BE2,m+2 (f (A)).
The construction circuits in the cases above is left as an exercise.
Proof. ⇒:
Since both eiφZ and O(x) are unitary, the matrix UΦ (x) is always a unitary matrix, which
immediately implies the condition (3). Below we only need to verify the conditions (1), (2).
When d = 0, UΦ (x) = eiφ0 Z , which gives P (x) = eiφ0 and Q = 0, satisfying all three conditions.
For induction, suppose U(φ0 ,··· ,φd−1 ) (x) takes the form in Eq. (7.106) with degree d − 1, then for
any φ ∈ R, we have
(7.107)
√ √ iφ
P√
(x) −Q(x) 1 − x2 √ x − 1 − x2 e 0
U(φ0 ,··· ,φd−1 ) (x) = ∗
Q (x) 1 − x2 P ∗ (x) 1 − x2 x 0 e−iφ
√
− x2 )Q(x)
iφ
= √xP (x) − (1 − 1 − x2 (P (x) + xQ(x)) e 0
− 1 − x2 (P ∗ (x) + xQ∗ (x)) xP ∗ (x) − (1 − x2 )Q∗ (x) 0 e−iφ
√
eiφ (xP (x) − (1 − x2 )Q(x)) e−iφ (− 1 − x2 (P (x) + xQ(x)))
= iφ √ .
e ((− 1 − x2 (P ∗ (x) + xQ∗ (x))) eiφ (xP ∗ (x) − (1 − x2 )Q∗ (x))
d
d
Y
eiφ0 Z O(x)eiφj Z = eiφ0 Z (O† (x)O(x)) 2 = eiφ0 Z .
(7.109)
j=1
`
X `−1
X
(7.110) P (x) = αk xk , Q(x) = βk x k ,
k=0 k=0
2 2 2 2
(7.111) |α` | x2` − x2 |β`−1 | x2`−2 = (|α` | − |β`−1 | )x2` = 0,
It may appear that deg Pe = ` + 1. However, by properly choosing φ we may obtain deg Pe = ` − 1.
Let e2iφ = α` /β`−1 . Then the coefficient of the x`+1 term in Pe is
(7.113) e−iφ α` − eiφ β`−1 = 0.
Similarly, the coefficient of the x` term in Q
e is
(7.114) − e−iφ α` + eiφ β`−1 = 0.
The coefficient of the x` term in Pe, and the coefficient of the x`−1 term in Q
e are both 0 by the
parity condition. So we have
(1) deg(Pe) ≤ ` − 1 ≤ d − 1, deg(Q) ≤ ` − 2 ≤ d − 2,
(2) Pe has parity d − 1 mod 2 and Qe has parity d − 2 mod 2, and
2 2 e 2
(3) |P (x)| + (1 − x )|Q(x)| = 1, ∀x ∈ [−1, 1].
e
Here the condition (3) is automatically satisfied due to unitarity. The induction follows until
` = 0, and apply the argument in Eq. (7.109) to represent the remaining constant phase factor if
needed.
Remark 7.21 (W -convention of QSP). [GSLW19, Theorem 4] is stated slightly differently as
d h i √
W Y W P√(x) iQ(x) 1 − x2
(7.115) UΦW (x) = eiφ0 Z W (x)eiφj Z = ,
iQ∗ (x) 1 − x2 P ∗ (x)
j=1
where
√
√ x i 1 − x2
(7.116) W (x) = ei arccos(x)X = .
i 1 − x2 x
This will be referred to as the W -convention. Correspondingly Eq. (7.105) will be referred to as the
O-convention. The two conventions can be easily converted into one another, due to the relation
π π
(7.117) W (x) = e−i 4 Z O(x)e+i 4 Z .
Correspondingly the relation between the phase angles using the O and W representations are
related according to
W π
φ0 − 4 ,
j = 0,
W
(7.118) φj = φj , j = 1, . . . , d − 1,
W π
φd + 4 , j = d,
112 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES
On the other hand, note that for any θ ∈ R, UΦ (x) and eiθZ UΦ (x)e−iθZ both block encodes P (x).
Therefore WLOG we may as well take
(7.119) Φ = ΦW .
In many applications, we are only interested in P ∈ C[x], and Q ∈ C[x] is not provided a
priori. [GSLW18, Theorem 4] states that under certain conditions P , the polynomial Q can always
be constructed. We omit the details here.
7.6.1. QSP for real polynomials. Note that the normalization condition (3) in Theo-
rem 7.20 imposes very strong constraints on the coefficients of P, Q ∈ C[x]. If we are only interested
in QSP for real polynomials, the conditions can be significantly relaxed.
Theorem 7.22 (Quantum signal processing for real polynomials). Given a real polynomial
PRe (x) ∈ R[x], and deg PRe = d > 0, satisfying
(1) PRe has parity d mod 2,
(2) |PRe (x)| ≤ 1, ∀x ∈ [−1, 1],
then there exists polynomials P (x), Q(x) ∈ C[x] with Re P = PRe and a set of phase factors Φ :=
(φ0 , · · · , φd ) ∈ Rd+1 such that the QSP representation Eq. (7.106) holds.
Compared to Theorem 7.20, the conditions in Theorem 7.22 is much easier to satisfy: given
any polynomial f (x) ∈ R[x] satisfying condition (1) on parity, we can always scale f to satisfy the
condition (2) on its magnitude. Again the presentation is slightly modified compared to [GSLW19,
Corollary 5]. We can now summarize the result of QET with real polynomials as follows.
Remark 7.24 (Relation between QSP representation and QET circuit). Although O(x) = UA (x)Z,
π
we do not actually need to implement Z separately in QET. Note that iZ = ei 2 Z , i.e., ZeiφZ =
π
(−i)ei( 2 +φ)Z , we obtain
d
Y d h
Y i
UΦ (x) = eiφ0 Z O(x)eiφj Z = (−i)d eiφ0 Z UA (x)eiφj Z ,
(7.120)
e e
j=1 j=1
where φe0 = φ0 , φej = φj + π/2, j = 1, . . . , d. For the purpose of block encoding P (x), another
equivalent, and more symmetric choice is
π
φ0 + 4 , j = 0,
π
(7.121) φj = φj + 2 , j = 1, . . . , d − 1,
e
φd + π4 , j = d.
When the phase factors are given in the W -convention, since we can perform a similarity
transformation and take Φ = ΦW , we can directly convert ΦW to Φ
e according to Eq. (7.121), which
is used in the QET circuit in Fig. 7.10.
7.6. QUANTUM SIGNAL PROCESSING 113
Example 7.25 (QSP for Chebyshev polynomial revisited). In order to block encode the Chebyshev
polynomial, we have φj = 0, j = 0, . . . , d. This gives φe0 = 0, φej = π/2, j = 1, . . . , d, and
√ d
d √ Td (x) − 1 − x2 Ud−1 (x) d
Y π
UA (x)ei 2 Z .
(7.122) UΦ (x) = [O(x)] = = (−i)
1 − x Ud−1 (x)
2 Td (x)
j=1
According to Eq. (7.121), an equivalent symmetric choice for block encoding Td (x) is
π
4 , j = 0,
(7.123) φej = π2 , j = 1, . . . , d − 1,
π
4 , j = d.
7.6.2. Optimization based method for finding phase factors. QSP for real polynomials
is the most useful version for many problems in scientific computation. Let us now summarize the
problem of finding phase factors following the W -convention and identify Φ = ΦW .
Given a target polynomial f = PRe ∈ R[x] satisfying (1) deg(f ) = d, (2) the parity of f is
d mod 2, (3) kf k∞ := maxx∈[−1,1] |f (x)| < 1, we would like to find phase factors Φ := (φ0 , · · · , φd ) ∈
[−π, π)d+1 so that
(7.124) f (x) = g(x, Φ) := Re[U (x, Φ)11 ], x ∈ [−1, 1],
with
(7.125) U (x, Φ) := eiφ0 Z W (x)eiφ1 Z W (x) · · · eiφd−1 Z W (x)eiφd Z .
Theorem 7.22 shows the existence of the phase factors. Due to the parity constraint, the number of
degrees of freedom in the target polynomial f (x) is de := d d+1
2 e. Hence f (x) is entirely determined
by on d distinct points. Throughout the paper, we choose these points to be xk =
its values e
2k−1
cos π , k = 1, ..., d,
e i.e., positive nodes of the Chebyshev polynomial T e(x). The QSP
2d
4de
problem can be equivalently solved via the following optimization problem
de
1X 2
(7.126) Φ∗ = arg min F (Φ), F (Φ) := |g(xk , Φ) − f (xk )| ,
Φ∈[−π,π)d+1 de k=1
i.e., any solution Φ to Eq. (7.124) achieves the global minimum of the cost function with F (Φ∗ ) = 0,
∗
This corresponds to choosing complementary polynomial Q(x) ∈ R[x]. With the symmetric con-
straint taken into account, the number of variables matches the number of constraints.
114 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES
Unfortunately, the energy landscape of the cost function F (Φ) is very complex, and has numer-
ous global as well as local minima. Starting from a random initial guess, an optimization algorithm
can easily be trapped at a local minima already when d is small. It is therefore surprising that
starting from a special symmetric initial guess
(7.129) Φ0 = (π/4, 0, 0, . . . , 0, 0, π/4),
at least one global minimum can be robustly identified using standard unconstrained optimization
algorithms even when d is as large as 10, 000 using standard double precision arithmetic opera-
tions [DMWL21], and the optimization method is observed to be free from being trapped by any
local minima. Direct calculation shows that g(x, Φ0 ) = 0, and therefore Φ0 does not contain any a
priori information of the target polynomial f (x).
This optimization based method is implemented in QSPPACK1.
Remark 7.26 (Other methods for treating QSP with real polynomials). The proof of [GSLW19,
Corollary 5] also gives a constructive algorithm for solving the QSP problem for real polynomials.
Since PRe = f ∈ R[x] is given, the idea is to first find complementary polynomials PIm , Q ∈ R[x],
so that the resulting P (x) = PRe (x) + iPIm (x) and Q(x) satisfy the requirement in Theorem 7.20.
Then the phase factors can be constructed following the recursion relation shown in the proof of
Theorem 7.20. We will not describe the details of the procedure here. It is worth noting that
the method is not numerically stable. This is made more precise by [Haa19] that these algorithms
require O(d log(d/)) bits of precision, where d is the degree of f (x) and is the target accuracy.
It is worth mentioning that the extended precision needed in these algorithms is not an artifact
of the proof technique. For instance, for d ≈ 500, the number of bits needed to represent each
floating point number can be as large as 1000 ∼ 2000. In particular, such a task cannot be reliably
performed using standard double precision arithmetic operations which only has 64 bits.
7.6.3. A typical workflow preparing the circuit of QET. Let us now use f (x) = cos(xt)
as in the Hamiltonian simulation to demonstrate a typical workflow of QSP. This function should
be an even or odd function to satisfy the parity constraint (1) in Theorem 7.22.
(1) Expand f (x), x ∈ [−1, 1] using a polynomial expansion (in this case, the Jacob–Anger
expansion in Eq. (7.131)), and truncate it to some finite order.
(2) Scale the truncated polynomial by a suitable constant so that the resulting real polynomial
PRe (x) satisfies the maxnorm constraint (2) in Theorem 7.22.
(3) Use the optimization based method to find phase factors ΦW = Φ, and convert the result
to Φ
e according to the relation in Eq. (7.121). Φ
e can be directly used in the QET circuit
in Figs. 7.9 and 7.10.
Remark 7.27. When the function f (x) of interest has singularity on [−1, 1], the function should
first be mollified on a proper subinterval of interest, and then approximated by polynomials. A
more streamlined method is to use the Remez exchange algorithm with parity constraint to directly
approximate f (x) on the subinterval. We refer readers to [DMWL21, Appendix E] for more details.
UH ∈ BEα,m (H). Since eiHt = ei(H/α)(αt) , the subnormalization factor α can be factored into the
simulation time t. So WLOG we assume UH ∈ BE1,m (H). Since
and that cos(xt), sin(xt) are even and odd functions, respectively, we can construct a block encoding
for the real and imaginary part directly using the circuit for real matrix polynomials in Fig. 7.9.
More specifically, we first use the Fourier–Chebyshev series of the trigonometric functions given
by the Jacobi-Anger expansion [−1, 1]:
∞
X
cos(tx) =J0 (t) + 2 (−1)k J2k (t)T2k (x),
k=1
(7.131) ∞
X
sin(tx) =2 (−1)k J2k+1 (t)T2k+1 (x).
k=0
r r
1 2X 2X
(7.133) Cd (x) = J0 (t) + (−1)k J2k (t)T2k (x), Sd (x) = (−1)k J2k+1 (t)T2k+1 (x),
β β β
k=1 k=0
where β > 1 is chosen so that |Cd |, |Sd | ≤ 1 on [−1, 1], and β can be chosen to be as small as 1 + .
Also let fd (x) = Cd (x) + iSd (x), then
Figure 7.13. QSP representation for approximating a step function using an even
polynomial on [0, 0.1] ∪ [0.2, 1]. The phase factors plotted removes a factor of π/4
on both ends (see Eq. (7.129)).
Exercise 7.1. Let A, B be two n-qubit matrices. Construct a circuit to block encode C = A + B
with UA ∈ BEαA ,m (A), UB ∈ BEαB ,m (B).
Exercise 7.2. Use LCU to construct a block encoding of the TFIM model with periodic boundary
conditions in Eq. (4.2), with g 6= 1.
Exercise 7.3. Prove Proposition 7.3.
Exercise 7.4. Let A be an n-qubit Hermitian matrix. Write down the circuit for UPRe (A) ∈
BE1,m+1 (PRe (A)) with a block encoding UA ∈ BE1,m (A), where P is characterized by the phase
sequence Φ
e specified in Theorem 7.16.
118 7. MATRIX FUNCTIONS OF HERMITIAN MATRICES
Exercise 7.5. Write down the circuit for LCU of Hamiltonian simulation.
Exercise 7.6. Using QET to prepare the Gibbs state.
CHAPTER 8
In Chapter 7, we have found that using qubitization, we can effectively block encode the
Chebyshev matrix polynomial Tk (A) for a Hermitian matrix A. Combined with LCU, we can
construct a block encoding of any matrix polynomial of A. The process is greatly simplified using
QSP and QET, which allows the implementation a general class of matrix functions for Hermitian
matrices.
In this section, we generalize the results of qubitization and QET to general non-Hermitian ma-
trices. This is called the quantum singular value transformation (QSVT). Throughout the chapter
we assume A ∈ CN ×N is a square matrix. QSVT is applicable to non-square matrices as well, and
we will omit the discussions here.
119
120 8. QUANTUM SINGULAR VALUE TRANSFORMATION
Here the left and right pointing triangles reflects that the transformation only keeps the left and
right singular vectors, respectively. For a given A, somewhat confusingly, in the discussion below,
the transformation f . (A), f / (A), f (A) will all be referred to as singular value transformations. In
particular, QSVT mainly concerns f . (A), f (A).
Proposition 8.3. The following relations hold:
(8.5) f (A† ) = (f (A))† , f . (A) = f / (A† ),
and
√ √ √ √
(8.6) f . (A) = f ( A† A) = f ( A† A), f / (A) = f ( AA† ) = f ( AA† ).
√
Proof. Just note that A† A = V Σ2√V † , we have√ A† A = V ΣV † . So the eigenvalue and singular
value decomposition coincide for both A† A and AA† .
WLOG we assume access to UA ∈ BE1,m (A), so that the singular values of A are in [0, 1], i.e.,
(8.7) 0 ≤ σi ≤ 1, i ∈ [N ].
and the associated two-dimensional subspaces Hi = span Bi , Hi0 = span Bi0 , we find that UA maps
Hi to Hi0 . Correspondingly UA† maps Hi0 to Hi .
Then Eqs. (8.8) and (8.12) give the matrix representation
p
Bi0 σi 1 − σi2
(8.14) [UA ]Bi = p .
1 − σi2 −σi
Similar calculation shows that
p
p σi 1 − σi2
(8.15) [UA† ]B i
Bi0 = .
1 − σi2 −σi
Meanwhile both Hi and Hi0 are the invariant subspaces of the projector Π, with matrix representa-
tion
1 0
(8.16) [Π]Bi = [Π]Bi0 = .
0 0
Therefore
1 0
(8.17) [ZΠ ]Bi = [ZΠ ]Bi0 = .
0 −1
e = U † ZΠ UA ZΠ , with matrix representation
Hence Hi is an invariant subspace of O A
p 2
σ i − 1 − σi2
(8.18) [O]Bi = p
e .
1 − σi2 σi
Repeating k times, we have
p 2k
e k ]B =(U † ZΠ UA ZΠ )k = p σi − 1 − σi2
[O A
i
1 − σi2 σi
(8.19) p
T (σ ) − 1 − σ 2U (σ )
2k i i 2k−1 i
= p .
1 − σi2 U2k−1 (σi ) T2k (σi )
In other words,
vi T2k (σi )vi†
P .
ek = i ∗ T2k (A) ∗
(8.20) O = .
∗ ∗ ∗ ∗
Remark 8.4. By approximating any continuous function f using polynomials, and using the LCU
lemma, we can approximately evaluate f (A) for any odd function f , and f . (A) for any even
function f . This may seem somewhat restrictive. However, note that all singular values are non-
negative. Hence when performing the polynomial approximation, if we are interested in f (A), we
can always use first perform a polynomial approximation of an odd extension of f , i.e.,
f (x),
x > 0,
(8.23) g(x) = 0, x = 0,
−f (−x), x < 0,
and then evaluate g (A). Similarly, if we are interested in f . (A) for a general f , we can perform
polynomial approximation to its even extension
(
f (x), x ≥ 0,
(8.24) g(x) =
f (−x), x < 0,
and evaluate g . (A).
j=1
.
gives a (1, m + 1)-block-encoding of P (A) for some even polynomial P ∈ C[x].
When d is odd,
(d−1)/2 h i
UA† eiφ2j ZΠ UA eiφ2j+1 ZΠ
Y
(8.26) UΦ = (−i)d eiφ0 ZΠ (UA eiφ1 ZΠ )
e e e e
j=1
gives a (1, m + 1)-block-encoding of P (A) for some odd polynomial P ∈ C[x].
The quantum circuit is exactly the same as that in Fig. 7.10. The phase factors can be adjusted
so that all polynomials P satisfying the conditions in Theorem 7.20 can be exactly represented. If
we are only interested in some real polynomial PRe ∈ R[x] and PRe (A) (odd) and P . (A) (even),
we can use Theorem 7.22 and the circuit in Fig. 8.1 (which is simply a combination of Figs. 7.9
and 7.10) to implement its (1, m + 1)-block-encoding. We have the following theorem. Since the
conditions of QSP representation for real polynomials is simple to satisfy and is also most useful in
practice, we only state the case with real polynomials.
Theorem 8.5 (Quantum singular value transformation with real polynomials). Let A ∈ CN ×N
be encoded by its (1, m)-block-encoding UA . Given a polynomial PRe (x) ∈ R[x] of degree d satisfying
the conditions in Theorem 7.22, we can find a sequence of phase factors Φ ∈ Rd+1 , so that the
circuit in Fig. 8.1 denoted by UΦ implements a (1, m + 1)-block-encoding of PRe (A) if d is odd, and
8.3. QUANTUM SINGULAR VALUE TRANSFORMATION 123
.
of PRe (A) if d is even. UΦ uses UA , UA† , m-qubit controlled NOT, and single qubit rotation gates
for O(d) times.
|0i H ··· H
CRφed CRφed−1 CRφe0
|0m i ···
UA UA† UA
|ψi ···
∈
Figure 8.1. Circuit of quantum singular value transformation to construct UPRe
†
BE1,m+1 (PRe (A)), using UA ∈ BE1,m (A). Here UA , UA should be applied alter-
nately. When d is even, the last UA gate should be replaced UA† , and the circuit
.
constructs UPRe
. ∈ BE1,m+1 (PRe (A)). This is simply a combination of Figs. 7.9
and 7.10.
8.3.2. QSVT applied to Hermitian matrices. When A is a Hermitian matrix, the quan-
tum circuit for QET and QSVT are the same. This means that the eigenvalue transformation and
the singular value transformation are merely two different perspectives of the same object.
For a Hermitian matrix A, the eigenvalue decomposition and the singular value decomposition
are connected as
X X X
(8.27) A= |vi i λi hvi | = |sgn(λi )vi i |λi | hvi | := |wi i σi hvi | .
i i i
Here
(8.28) |wi i = |sgn(λi )vi i , σi = |λi | .
So if P ∈ C[x] is an even polynomial,
X X
(8.29) P (A) = |vi i P (λi ) hvi | = |vi i P (|λi |) hvi | = P . (A).
i i
These relations indeed verify that the eigenvalue decomposition and singular value decomposition
are indeed the same when P has a definite parity. When the parity of P is indefinite, the two
objects are in general not the same, and in particular cannot be directly implemented using the
QET circuit.
8.3.3. QSVT and matrix dilation. For general matrices, we have seen in the context of
solving linear equations in Section 4.3 that the matrix dilation method in Eq. (4.33) can be used to
convert the non-Hermitian problem to a Hermitian problem. Here we study the relation between
QSP applied to the dilated Hermitian matrix, and QSVT for the general matrix.
124 8. QUANTUM SINGULAR VALUE TRANSFORMATION
the condition number of A, then the singular values of A are contained in the interval [δ, 1], with
δ = κ−1 .
Note that f (·) is not bounded by 1 and in particular singular at x = 0. Therefore instead of
approximating f on the whole interval [−1, 1] we consider an odd polynomial p(x) such that
δ
(8.39) p(x) − ≤ 0 , ∀x ∈ [−1, −δ] ∪ [δ, 1].
βx
The β factor is chosen arbitrarily so that |p(x)| ≤ 1 for all x ∈ [−1, 1] to satisfy the requirement of
the condition (2) in Theorem 7.22. For instance, we may choose β = 4/3. The precision parameter
0 will be chosen later. The degree of the odd polynomial can be chosen to be O( 1δ log( 10 )) is
guaranteed by e.g. [GSLW18, Corollary 69]. This construction is not explicit (see an explicit
construction in [CKS17]). Fig. 8.2 gives a concrete construction of an odd polynomial obtained via
numerical optimization, and the phase factors are obtained via QSPPACK.
be the unnormalized true solution, and the normalized solution state is |xi = x/ kxk. Now denote
e = p (A† ) |bi, and |e
x xi = x
e/ ke ek ≤ 0 . For the
xk. Then the unnormalized solution satisfies kx − x
0
normalized state |yi, this error is scaled accordingly. When k |e
xi k, we have
ke
x − xk 0
(8.45) k |xi − |xi k ≈ ≤ .
ke
xk ke
xk
Also we have
δ −1 δξ ξ
(8.46) ke
xk = A |bi = = .
β β βκ
Therefore in order for the normalized output quantum state to be -close to the normalized solution
state |xi, we need to set 0 = O(ξ/κ). This is similar to the case of the HHL algorithm in
Section 4.3, where QPE needs to achieve multiplicative accuracy, which means that the additive
accuracy parameter 0 should be set to O(/κ).
The success probability of the above procedure is Ω(ke xk2 ) = Ω(ξ 2 /κ2 ). With amplitude am-
plification we can boost the success probability to be greater than 1/2 with one qubit serving as
a witness, i.e., if measuring this qubit we get an outcome 0 it means the procedure has succeeded,
and if 1 it means the procedure has failed. It takes O(κ/ξ) rounds of amplitude amplification, i.e.,
using UΦ† , UΦ , Ub , and Ub† for O(κ/ξ) times. A single UΦ uses UA and its inverse
1 1 κ
(8.47) O log 0 = O κ log
δ ξ
times. Therefore the total number of queries to UA and its inverse is
2
κ κ
(8.48) O log .
ξ ξ
The number of queries to the Ub and its inverse is O(κ/ξ). We consider the following two cases for
the magnitude of ξ.
(1) In general if no further promise is given, then ξ ≥ 1. The total query complexity of UA is
therefore O(κ2 log(κ/)). This is the typical complexity referred to in the literature.
(2) If |bi has a Ω(1) overlap with the left-singular vector of A with the smallest singular value,
then ξ = Ω(κ). This is the best case scenario, and the total query complexity of UA is
O(κ log(1/)), and the number of queries to the right hand side vector |bi is O(1), which
is independent of the condition number.
different is that the controlled rotations before UA and UA† are now expressed with respect to two
different basis sets, i.e., P CRφed P † , and Q CRφed−1 Q† , respectively. This can be useful for certain
applications. Let us now express these ideas more formally.
Assume that we are given an (n + m)-qubit unitary U eA , and two (n + m)-qubit projectors Π, Π0 .
0
For simplicity we assume rank(Π) = rank(Π ) = N . Define an orthonormal basis set
(8.51) B = {|ϕ0 i , . . . , |ϕN −1 i , |vN i , . . . , |vN M −1 i},
where the vectors |ϕ0 i , . . . , |ϕN −1 i span the range of Π, and all states |vi i are orthogonal to |ϕj i.
Similarly define an orthonormal basis set
(8.52) B 0 = {|ψ0 i , . . . , |ψN −1 i , |wN i , . . . , |wN M −1 i}
where the vectors |ψ0 i , . . . , |ψN −1 i span the range of Π0 , and all states |wi i are orthogonal to |ψj i.
We can think that the columns of B, B 0 form the basis transformation matrix P, Q, respectively.
Then the matrix A is defined implicitly in terms of its matrix representation
(8.53) eA ]B0 = UA = A ∗ .
[U B ∗ ∗
Note that
0 In 0
(8.54) [Π]B 0 B
B = [Π ]B0 = = |0m i h0m | ⊗ In ,
0 0
we find that
X
(8.55) Π0 U
eA Π = |ψi i Aij hϕj | .
i,j∈[N ]
Therefore Theorem 8.5 can be viewed as the singular value transformation of A, which is a submatrix
of the matrix representation of UA with respect to bases B, B 0 .
The implementation of the controlled rotation P CRφe P † relies on the implementation of CΠ NOT.
The projectors Π, Π0 can be accessed directly, and WLOG we focus on one projector Π. Motivated
from Grover’s search, we may assume access to a reflection operator
(8.56) RΠ = Im − 2Π.
via the controlled NOT gates CΠ NOT, CΠ0 NOT respectively. We can then define an m-qubit
controlled NOT gate as
(8.57) CΠ NOT := X ⊗ Π + I ⊗ (Im − Π),
which can be constructed using RΠ as
Im − RΠ Im + RΠ
CΠ NOT = X ⊗ +I ⊗
2 2
I +X I −X
(8.58) = ⊗ Im + ⊗ RΠ
2 2
= |+i h+| ⊗ Im + |−i h−| ⊗ RΠ
= (H ⊗ Im )(|0i h0| ⊗ Im + |1i h1| ⊗ RΠ )(H ⊗ Im ).
Therefore assuming access to RΠ , the CΠ NOT gate can be implemented using the circuit in Fig. 8.3.
128 8. QUANTUM SINGULAR VALUE TRANSFORMATION
H H
RΠ
Then according to Theorem 8.5, we can implement the QSVT using U e † , C|0m ih0m | NOT
eA , U
A
and single qubit rotation gates, where |0m i h0m | ⊗ In is a rank-2n projector. In this generalized
setting, C|0m ih0m |⊗In NOT = C|0m ih0m | NOT ⊗In in the B, B 0 basis should be implemented using
CΠ NOT, CΠ0 NOT, respectively. We arrive at the following theorem:
Theorem 8.6 (Quantum singular value transformation with real polynomials and projectors).
Let UeA be a (n + m)-qubit unitary, and Π, Π0 be two (n + m)-qubit projectors of rank 2n . Define
the basis B, B 0 according to Eqs. (8.51) and (8.52), and let A ∈ CN ×N be defined in terms of the
matrix representation in Eq. (8.53). Given a polynomial PRe (x) ∈ R[x] of degree d satisfying the
conditions in Theorem 7.22, we can find a sequence of phase factors Φ ∈ Rd+1 to define a unitary
UΦ satisfying
X
(8.59) UΦ |ϕj i = |ψi i [PRe (A)]ij + |⊥0j i ,
i∈[N ]
if d is odd, and
X
.
(8.60) UΦ |ϕj i = |ϕi i [PRe (A)]ij + |⊥j i .
i∈[N ]
if d is even. Here Π0 |⊥0j i = 0, Π |⊥j i = 0, and UΦ uses U e † , CΠ NOT, CΠ0 NOT, and single
eA , U
A
qubit rotation gates for O(d) times.
(8.62) Π0 U
eA Π = a |x0 i hψ0 | ,
and we would like to use Theorem 8.6 to find UΦ that block encodes
(8.63) |x0 i PRe (a) hψ0 | ≈ |x0 i hψ0 | .
8.6. APPLICATION: GROVER’S SEARCH REVISITED, AND FIXED-POINT AMPLITUDE AMPLIFICATION*129
To this end, we need to find an odd, real polynomial PRe (x) satisfying PRe (a) ≈ 1. More specifically,
we need to find PRe (x) satisfying
(8.64) |PRe (x) − 1| ≤ 2 , ∀x ∈ [a, 1].
We can achieve
√ this by approximating the sign function, with deg(PRe ) = O(log(1/2 )a−1 ) =
O(log(1/) N ) (see e.g. [LC17a, Corollary 6]). This construction is also based on an approximation
to the erf function. Fig. 8.4 gives a concrete construction of an odd polynomial obtained via
numerical optimization, and the phase factors are obtained via QSPPACK.
Then
(8.65) UΦ |ψ0 i ≈ |x0 i .
More specifically, note that
(8.66) UΦ |ψ0 i − |x0 i = (PRe (a) − 1) |ψ0 i + |⊥i
e
Exercise 8.1 (Robust oblivious amplitude amplification). Consider a quantum circuit consisting of
two registers denoted by a and s. Suppose we have a block encoding V of A: A = (h0|a ⊗Is )V (|0ia ⊗
Is ). Let W = −V (REF⊗Is )V † (REF⊗Is )V , where REF = Ia −2 |0ia h0|a . (1) Within the framework
of QSVT, what is the polynomial associated with the singular value transformation implemented
by W ? (2) Suppose A = U/2 for some unitary U . What is (h0|a ⊗ Is )W (|0ia ⊗ Is )? (3) Explain the
construction of W in terms of a singular value transformation f (A) with f (x) = 3x − 4x3 . Draw
the picture of f (x) and mark its values at x = 0, 21 , 1.
Exercise 8.2 (Logarithm of unitaries). Given access to a unitary U = eiH where kHk ≤ π/2. Use
QSVT to design a quantum algorithm to approximately implement a block encoding of H, using
controlled U and its inverses, as well as elementary quantum gates.
Bibliography
[Aar13] Scott Aaronson. Quantum computing since Democritus. Cambridge Univ. Pr., 2013.
[ABF16] F. Arrigo, M. Benzi, and C. Fenu. Computation of generalized matrix functions. SIAM
J. Matrix Anal. Appl., 37:836–860, 2016.
[BACS07] Dominic W Berry, Graeme Ahokas, Richard Cleve, and Barry C Sanders. Efficient
quantum algorithms for simulating sparse Hamiltonians. Commun. Math. Phys.,
270(2):359–371, 2007.
[Chi21] Andrew Childs. Lecture notes on quantum algorithms, 2021.
[CKS17] Andrew M. Childs, Robin Kothari, and Rolando D. Somma. Quantum algorithm
for systems of linear equations with exponentially improved dependence on precision.
SIAM J. Comput., 46:1920–1950, 2017.
[CST+ 21] Andrew M Childs, Yuan Su, Minh C Tran, Nathan Wiebe, and Shuchen Zhu. Theory
of trotter error with commutator scaling. Phys. Rev. X, 11(1):011020, 2021.
[DHM+ 18] Danial Dervovic, Mark Herbster, Peter Mountney, Simone Severini, Naïri Usher, and
Leonard Wossnig. Quantum linear systems algorithms: a primer. arXiv:1802.08227,
2018.
[DL21] Yulong Dong and Lin Lin. Random circuit block-encoded matrix and a proposal of
quantum linpack benchmark. Phys. Rev. A, 103(6):062412, 2021.
[DMWL21] Yulong Dong, Xiang Meng, K Birgitta Whaley, and Lin Lin. Efficient phase factor
evaluation in quantum signal processing. Phys. Rev. A, 103:042419, 2021.
[DW19] Ronald De Wolf. Quantum computing: Lecture notes. arXiv:1907.09415, 2019.
[Fey82] Richard P Feynman. Simulating physics with computers. Int. J. Theor. Phys, 21(6/7),
1982.
[GSLW18] András Gilyén, Yuan Su, Guang Hao Low, and Nathan Wiebe. Quantum singular
value transformation and beyond: exponential improvements for quantum matrix arith-
metics. arXiv:1806.01838, 2018.
[GSLW19] András Gilyén, Yuan Su, Guang Hao Low, and Nathan Wiebe. Quantum singular
value transformation and beyond: exponential improvements for quantum matrix arith-
metics. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of
Computing, pages 193–204, 2019.
[GVL13] G. H. Golub and C. F. Van Loan. Matrix computations. Johns Hopkins Univ. Press,
Baltimore, fourth edition, 2013.
[Haa19] J. Haah. Product decomposition of periodic functions in quantum signal processing.
Quantum, 3:190, 2019.
[HBI73] J. B. Hawkins and A. Ben-Israel. On generalized matrix functions. Linear and Multi-
linear Algebra, 1:163–171, 1973.
[HHL09] Aram W Harrow, Avinatan Hassidim, and Seth Lloyd. Quantum algorithm for linear
systems of equations. Phys. Rev. Lett., 103:150502, 2009.
131
132 Bibliography
[JL00] Tobias Jahnke and Christian Lubich. Error bounds for exponential operator splittings.
BIT, 40(4):735–744, 2000.
[KSV02] Alexei Yu Kitaev, Alexander Shen, and Mikhail N Vyalyi. Classical and quantum
computation. Number 47. American Mathematical Soc., 2002.
[LC17a] Guang Hao Low and Isaac L Chuang. Hamiltonian simulation by uniform spectral
amplification. arXiv:1707.05391, 2017.
[LC17b] Guang Hao Low and Isaac L. Chuang. Optimal hamiltonian simulation by quantum
signal processing. Phys. Rev. Lett., 118:010501, 2017.
[Llo96] Seth Lloyd. Universal quantum simulators. Science, pages 1073–1078, 1996.
[LT20a] Lin Lin and Yu Tong. Near-optimal ground state preparation. Quantum, 4:372, 2020.
[LT20b] Lin Lin and Yu Tong. Optimal quantum eigenstate filtering with application to solving
quantum linear systems. Quantum, 4:361, 2020.
[Nak08] Mikio Nakahara. Quantum computing: from linear algebra to physical realizations.
CRC press, 2008.
[NC00] Michael A Nielsen and Isaac Chuang. Quantum computation and quantum information,
2000.
[NO88] J. W. Negele and H. Orland. Quantum many-particle systems. Westview, 1988.
[NWZ09] Daniel Nagaj, Pawel Wocjan, and Yong Zhang. Fast amplification of QMA. Quantum
Inf. Comput., 9(11):1053–1068, 2009.
[Pre99] John Preskill. Lecture notes for physics 219: Quantum computation. Caltech Lecture
Notes, 1999.
[RP11] Eleanor G Rieffel and Wolfgang H Polak. Quantum computing: A gentle introduction.
MIT Press, 2011.
[Sze04] Mario Szegedy. Quantum speed-up of markov chain based algorithms. In 45th Annual
IEEE symposium on foundations of computer science, pages 32–41, 2004.
[Tha08] Mechthild Thalhammer. High-order exponential operator splitting methods for time-
dependent Schrödinger equations. SIAM J. Numer. Anal., 46(4):2022–2038, 2008.