Quantum Computing For Programmers
Information on this title: www.cambridge.org/9781009098175
DOI: 10.1017/9781009099974
© Robert Hundt 2022
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2022
Printed in the United Kingdom by TJ Books Limited, Padstow Cornwall
A catalogue record for this publication is available from the British Library.
Library of Congress Cataloging-in-Publication Data
Names: Hundt, Robert, author.
Title: Quantum computing for programmers / Robert Hundt.
Description: Cambridge, United Kingdom ; New York, NY : Cambridge
University Press, 2022. | Includes bibliographical references and index.
Identifiers: LCCN 2021044761 (print) | LCCN 2021044762 (ebook) |
ISBN 9781009098175 (hardback) | ISBN 9781009099974 (epub)
Subjects: LCSH: Quantum computing. | BISAC: COMPUTERS / General
Classification: LCC QA76.889 .H86 2022 (print) | LCC QA76.889 (ebook) |
DDC 006.3/843–dc23/eng/20211105
LC record available at https://lccn.loc.gov/2021044761
LC ebook record available at https://lccn.loc.gov/2021044762
ISBN 978-1-009-09817-5 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this publication
and does not guarantee that any content on such websites is, or will remain,
accurate or appropriate.
To Mary, Thalia, and Johannes
Contents
Acknowledgments page x
Introduction xi
3 Simple Algorithms 78
3.1 Random Number Generator 78
3.2 Gate Equivalences 79
3.3 Classical Arithmetic 89
3.4 Swap Test 93
References 335
Index 343
Acknowledgments
A book like this would not be possible without the help of many people. Vincent Russo and Timofey Golubev found a large number of issues with the mathematical formulation, code, and writing. Gabriel Hannon provided valuable pointers to related physics
concepts. Several of my questions were answered on the Quantum Computing Stack
Exchange, a very helpful resource and community. Wes Cowley and Sarah Schedler
provided line editing, Sue Klefstad produced the impressive index, and Eleanor Bolton
provided outstanding copyediting services. I am grateful to Tiago Leao for pointing me to Beauregard (2003), which was key for my implementation of Shor's factoring algorithm.
Together with Rui Maia, Tiago also provides the community with a much-appreciated
reference implementation (Leao, 2021).
I also want to thank many of my colleagues at Google. Dave Bennet and Michael
Dorner went through the first draft of this text, which, I am certain, was not a pleasant
experience. Their feedback helped to shape this work into an actual learning resource.
Two of my colleagues stand out for their rigor and obsession with details: Fedor
Kostritsa, who provided most detailed comments on text, math, derivations, and code;
and Ton Kalker, who diligently reviewed the whole text and helped greatly in sharpening the mathematical formulation. Sergio Boixo and Benjamin Villalonga corrected
many of my misunderstandings about the quantum supremacy experiment. Michael
Broughton and Craig Gidney helped to improve the section on Grover’s algorithm.
Craig also maintains the elegant online simulator Quirk. Thanks also to Mark Heffernan, Chris Leary, Rob Springer, and Mirko Rossini. Finally, I owe gratitude to Aamer
Mahmood for his extraordinary support.
Without exception, my contacts at Cambridge University Press were outstanding.
First and foremost, I must thank my editor, Lauren Cowles, who did a tremendous job
guiding me throughout the whole process.
Finally, and most importantly, I am incredibly thankful to my family, including my
dog, Charlie, for their love, patience, and support during this all-consuming effort.
Introduction
I have been impressed by numerous instances of mathematical theories that are really about
particular algorithms; these theories are typically formulated in mathematical terms that are
much more cumbersome and less natural than the equivalent formulation today’s computer
scientists would use.
Knuth (1974)
mathematical concepts that are necessary to understand the algorithms. We hope that
this format will be helpful to the linear algebra-challenged while not being too shallow
for the cognoscenti. After the introduction of the basic mathematical concepts, the
book is organized into the following major sections:
Source Code
Much of the content in this book is explained with both math and code. To avoid
turning this book into a giant code listing, however, we abbreviate less interesting or
repetitive pieces of code with constructs such as [...]. Scaffolding code, such as Python import statements or C++ #include directives, is typically omitted. The
full sources are hosted under a permissive Apache license on GitHub, along with
instructions on how to download, build, and run:
https://github.com/qcc4cp/qcc
Contributions, comments, and suggestions are always welcome. Typesetting the code
may have introduced errors, but the source of truth is the online repository. The code
is also likely to have evolved beyond what is published here.
1 The Mathematical Minimum
z = x + i y.
The x is called the real part of z; y is the imaginary part. The imaginary number i is defined as the solution to the equation:

x² + 1 = 0.
In other words, i is defined as the square root of −1. A complex number's conjugate, often denoted by z̄ or z*, is created by simply negating its imaginary part: i → −i. For example, for z = 5 + 2i the conjugate z* would simply be z* = 5 − 2i.
The conjugate of a product of complex numbers is equal to the product of the conjugates of the complex numbers:

(ab)* = a* b*.
The norm of a complex number is computed from the product of the number with its conjugate:

|z|² = z* z,
|z| = √(z* z).
Complex numbers can be drawn in the 2D plane with an x- and y-axis according
to the definition. If we think of a complex number as a vector originating at (0, 0), the
norm of a complex number, which is then the length of the corresponding vector, is a
real number and can be computed using Pythagoras’ theorem as:
|z| = |x + iy| = √((x − iy)(x + iy)) = √(x² + y²).
For complex numbers, the norm is commonly referred to as the modulus. Note the
difference between the square of a complex number and its squared norm. The square
is computed as:
z² = (x + iy)² = (x + iy)(x + iy) = x² + 2ixy − y².
In polar form, a complex number with radius r and phase angle φ is written, using Euler's formula, as:

r e^{iφ} = r (cos(φ) + i sin(φ)).

For r = 1:

z = e^{iφ} = cos(φ) + i sin(φ).
The resulting complex numbers from this exponentiation are on a unit circle around
the origin (0,0).
In Python, complex numbers are, conveniently, part of the language. Note, however, that the imaginary i is written as a j, which is the customary notation in electrical engineering. An example:
x = 1.0 + 0.5j
To conjugate, you can use the built-in conjugate() function for the complex data
types or use numpy’s conj() function. For example:
x_conj = x.conjugate() # or
x_conj = np.conj(x)
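These identities are easy to check with Python's built-in complex type and the cmath module (a quick standalone sketch, not part of the book's code base):

```python
import cmath
import math

z = 3.0 + 4.0j

# z times its conjugate is the squared norm, a real number.
norm_sq = (z.conjugate() * z).real
assert norm_sq == abs(z) ** 2

# Pythagoras: |z| = sqrt(x^2 + y^2).
assert math.isclose(abs(z), math.sqrt(z.real**2 + z.imag**2))

# Polar form: z = r * e^(i*phi).
r, phi = cmath.polar(z)
assert cmath.isclose(z, r * cmath.exp(1j * phi))
```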
denoted by a dagger |x⟩†, is the transpose of the vector with each element conjugated. We write this vector as ⟨x|, changing the direction of the angle bracket:

⟨x| = (x0*, x1*, …, xn−1*).
In Dirac notation, such a row vector ⟨x| is called a bra or the dual vector for a ket |x⟩. Transposition and conjugation go both ways – applying the transformation twice results in the original ket, a property called involutivity:

|x⟩† = ⟨x|,
⟨x|† = |x⟩,
(|x⟩†)† = |x⟩.
There is potential for confusion around the conjugates: should the conjugates be denoted explicitly, via a * or a †, as in ⟨x0* x1* … xn−1*|, or is the fact that a vector has been converted from ket to bra sufficient? Typically, the conjugates are not marked explicitly.
For kets |x⟩ and |y⟩ with

|x⟩ = (x0, x1, …, xn−1)ᵀ,   ⟨x| = (x0*, x1*, …, xn−1*),   |y⟩ = (y0, y1, …, yn−1)ᵀ,

the inner product is defined as:

⟨x|y⟩ = x0* y0 + x1* y1 + ⋯ + xn−1* yn−1.
The inner product is how vectors in this notation get their names. It forms a product
of a bra and a ket, a bra(c)ket. Naming is difficult in general and quantum computing
is no exception.
Note that ⟨x|y⟩ does not generally equal ⟨y|x⟩. For example, consider two kets |x⟩ and |y⟩:

|x⟩ = (−1, 2i, 1)ᵀ,   |y⟩ = (1, 0, i)ᵀ. (1.1)
We construct the corresponding bras via transposition and negation of the imaginary parts:

⟨x| = (−1, −2i, 1),   ⟨y| = (1, 0, −i).

⟨x|y⟩ = (−1)·1 + (−2i)·0 + 1·i = −1 + i,
⟨y|x⟩ = 1·(−1) + 0·2i + (−i)·1 = −1 − i.
The second result is the conjugate of the first; the two inner products are different.
This points to the important general rule:
⟨x|y⟩* = ⟨y|x⟩.
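We can verify this rule numerically with numpy's `vdot`, which conjugates its first argument and therefore computes exactly the bra-ket inner product (using the vectors from Equation (1.1)):

```python
import numpy as np

x = np.array([-1, 2j, 1])
y = np.array([1, 0, 1j])

# np.vdot conjugates its first argument, matching <bra|ket>.
xy = np.vdot(x, y)   # <x|y>
yx = np.vdot(y, x)   # <y|x>

assert xy == -1 + 1j
assert yx == -1 - 1j
assert np.conj(xy) == yx   # <x|y>* == <y|x>
```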
Two vectors are orthogonal if and only if their scalar product is zero. For 2D
vectors, we visualize orthogonal vectors as perpendicular to each other:
⟨x|y⟩ = 0 ⟺ x, y orthogonal.
Similar to the way in which we compute the norm of a complex number, the norm of a vector is the square root of the scalar product of the vector with its dual vector. A vector is normalized if its norm is 1:

⟨x|x⟩ = 1.
The outer product of a ket |x⟩ and a bra ⟨y| is an n × n matrix:

|x⟩⟨y| = (x0, x1, …, xn−1)ᵀ (y0*, y1*, …, yn−1*)

          [ x0 y0*     x0 y1*     …  x0 yn−1*   ]
          [ x1 y0*     x1 y1*     …  x1 yn−1*   ]
        = [ ⋮          ⋮          ⋱  ⋮          ]
          [ xn−1 y0*   xn−1 y1*   …  xn−1 yn−1* ]
In the example given by Equation (1.1), |x⟩ is a 3 × 1 vector and ⟨y| is a 1 × 3 vector. By the rules of matrix multiplication, their outer product will be a 3 × 3 matrix. Again, if the vector elements are complex, we conjugate the vector elements when converting from ket to bra and vice versa.
1.3 Tensor Product
To compute the tensor product¹ of two vectors, which can be either bras or kets, we use any of these notations:

|x⟩ ⊗ |y⟩ = |x⟩|y⟩ = |x, y⟩ = |xy⟩. (1.3)

And correspondingly:

⟨x| ⊗ ⟨y| = ⟨x|⟨y| = ⟨x, y| = ⟨xy|.
In a tensor product, each element of the first constituent is multiplied with the whole of the second constituent. Hence, an n × m matrix tensored with a k × l matrix will result in an nk × ml matrix. For example, we compute the tensor product of the following two kets:

|0⟩ = (1, 0)ᵀ,   |1⟩ = (0, 1)ᵀ,

|0⟩ ⊗ |1⟩ = |01⟩ = (1·0, 1·1, 0·0, 0·1)ᵀ = (0, 1, 0, 0)ᵀ.
You can see that the tensor product of two kets is a ket. Similarly, the tensor product
of two bras is a bra, and the tensor product of two diagonal matrices is a diagonal
matrix. Of course, tensor products are also defined for general matrices:
[ a00  a01 ]   [ b00  b01 ]   [ a00 B   a01 B ]
[ a10  a11 ] ⊗ [ b10  b11 ] = [ a10 B   a11 B ]

    [ a00 b00   a00 b01   a01 b00   a01 b01 ]
    [ a00 b10   a00 b11   a01 b10   a01 b11 ]
  = [ a10 b00   a10 b01   a11 b00   a11 b01 ]
    [ a10 b10   a10 b11   a11 b10   a11 b11 ]
For multiplication of scalars α and β with a tensor product, these rules apply:
α(x ⊗ y) = (αx) ⊗ y = x ⊗ (αy), (1.4)
(α + β)(x ⊗ y) = αx ⊗ y + βx ⊗ y. (1.5)
A key property of the tensor product is the following mixed-product rule – it is used in many derivations in this text:

(A ⊗ B)(a ⊗ b) = (Aa) ⊗ (Bb). (1.6)
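The mixed-product property of the Kronecker product is easy to confirm with `np.kron` and random matrices (a quick standalone check, not from the book's code base):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
B = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
a = rng.normal(size=2) + 1j * rng.normal(size=2)
b = rng.normal(size=2) + 1j * rng.normal(size=2)

# (A tensor B) applied to (a tensor b) equals (A a) tensor (B b).
lhs = np.kron(A, B) @ np.kron(a, b)
rhs = np.kron(A @ a, B @ b)
assert np.allclose(lhs, rhs)
```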
1 Here, we are ignoring differences between the tensor product and the Kronecker product.
The next rule is also very important for this text. Given two composite kets:

|ψ1⟩ = |φ1⟩ ⊗ |χ1⟩ and |ψ2⟩ = |φ2⟩ ⊗ |χ2⟩,

the inner product between |ψ1⟩ and |ψ2⟩ is computed as:

⟨ψ1|ψ2⟩ = (|φ1⟩ ⊗ |χ1⟩)† (|φ2⟩ ⊗ |χ2⟩)
        = (⟨φ1| ⊗ ⟨χ1|)(|φ2⟩ ⊗ |χ2⟩)
        = ⟨φ1|φ2⟩⟨χ1|χ2⟩. (1.7)
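Equation (1.7) can likewise be checked numerically (a sketch with random two-dimensional kets, not from the book's code base):

```python
import numpy as np

rng = np.random.default_rng(2)
phi1, chi1, phi2, chi2 = (rng.normal(size=2) + 1j * rng.normal(size=2)
                          for _ in range(4))

psi1 = np.kron(phi1, chi1)
psi2 = np.kron(phi2, chi2)

# <psi1|psi2> == <phi1|phi2> * <chi1|chi2>.
lhs = np.vdot(psi1, psi2)
rhs = np.vdot(phi1, phi2) * np.vdot(chi1, chi2)
assert np.isclose(lhs, rhs)
```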
Here are the rules for how to conjugate expressions of matrices and vectors. We have already learned how to convert between bras and kets:

|ψ⟩† = ⟨ψ|,
⟨ψ|† = |ψ⟩.

To compute the adjoint of a matrix scaled by a complex factor:

(αA)† = α* A†. (1.8)
For matrix-matrix products, the order reverses (this is an important rule used in this book):

(AB)† = B† A†. (1.9)
And similarly, to compute the adjoint for products of matrices and vectors:
And finally:
(A + B)† = A† + B†. (1.13)
For a matrix A, there can be special vectors |ψ⟩ and scalars λ for which:

A|ψ⟩ = λ|ψ⟩.

Applying A to the special vector |ψ⟩ only scales the vector by the complex number λ; it does not change its orientation. We call λ an eigenvalue of A. There can be multiple eigenvalues for a given operator. The corresponding vectors for which this equation holds are called eigenvectors. In quantum mechanics, the synonym eigenstates is also used. Eigenvalues are allowed to be 0 by definition, but a null vector is not considered an eigenvector.
Diagonal matrices are a case for which finding the eigenvalues is trivial. Given a diagonal matrix of the form

diag(λ0, λ1, …, λn−1),

we can pick the eigenvalues right off the diagonal. The corresponding eigenvectors are the computational basis vectors (1, 0, 0, …)ᵀ, (0, 1, 0, …)ᵀ, and so on. For Hermitian matrices, the eigenvalues are necessarily real.
Basic properties of the trace are the following, where c is a scalar, and A and B are
square matrices:
tr(A + B) = tr(A) + tr(B), (1.14)
tr(c A) = c tr(A), (1.15)
tr(AB) = tr(B A). (1.16)
For tensor products, this important relation holds:
tr(A ⊗ B) = tr(A) tr(B). (1.17)
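These trace rules can be confirmed with numpy (a quick standalone check with random matrices, not from the book's code base):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))

assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))      # (1.14)
assert np.isclose(np.trace(2.5 * A), 2.5 * np.trace(A))            # (1.15)
assert np.isclose(np.trace(A @ B), np.trace(B @ A))                # (1.16)
assert np.isclose(np.trace(np.kron(A, B)),
                  np.trace(A) * np.trace(B))                       # (1.17)
```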
The trace of a Hermitian matrix is real because the diagonal elements of a Hermitian matrix are real. The trace of a matrix A is the sum of its n eigenvalues λi:

tr(A) = Σ_{i=0}^{n−1} λi. (1.18)
This next relation is important for measurements. Suppose we have two kets |x⟩ and |y⟩, such that

|x⟩ = (x0, x1, …, xn−1)ᵀ   and   |y⟩ = (y0, y1, …, yn−1)ᵀ.

The trace of the outer product of |x⟩ and ⟨y| is equal to their inner product:

tr(|x⟩⟨y|) = ⟨y|x⟩. (1.19)
This is easy to see from the outer product:

(x0, x1, …, xn−1)ᵀ (y0*, y1*, …, yn−1*) =
  [ x0 y0*     x0 y1*     …  x0 yn−1*   ]
  [ x1 y0*     x1 y1*     …  x1 yn−1*   ]
  [ ⋮          ⋮          ⋱  ⋮          ]
  [ xn−1 y0*   xn−1 y1*   …  xn−1 yn−1* ]

⟹ tr(|x⟩⟨y|) = Σ_{i=0}^{n−1} xi yi* = ⟨y|x⟩.
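A quick numerical check of Equation (1.19) with `np.outer` and `np.vdot` (a standalone sketch, not from the book's code base):

```python
import numpy as np

x = np.array([1.0, 2j, -1.0])
y = np.array([0.5, 1.0, 1j])

outer = np.outer(x, np.conj(y))                    # |x><y|
assert np.isclose(np.trace(outer), np.vdot(y, x))  # equals <y|x>
```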
2 Quantum Computing Fundamentals
In this chapter, we describe the fundamental concepts and rules of quantum computing. In parallel, we develop an initial, easy-to-understand, and easy-to-debug code base for building and simulating smaller-scale algorithms.
The chapter is structured as follows. We first introduce our basic underlying data type, the Python Tensor type, which is derived from numpy's ndarray data structure. Using this type we construct single qubits and quantum states composed of many qubits. We define operators, which allow us to modify states, and describe a range of important single-qubit gates. Controlled gates, which play a similar role to that of control flow in classical computing, come next. We detail how to describe quantum circuits via the Bloch sphere and in quantum circuit notation. A discussion of entanglement follows: that fascinating "spooky action at a distance," as Einstein called it. In quantum physics, measurement might be even more problematic than entanglement (Norsen, 2017). In this text, we avoid philosophy and conclude the chapter by describing a simple way to simulate measurements.
2.1 Tensors
Quantum mechanics and quantum computing are expressed in the language of linear
algebra – vectors, matrices, and operations such as the dot product, outer product, and
Kronecker products. As we develop the theory, we complement it with working code
to allow experimentation.
We start by describing a fundamental data structure in Python. Python may be
slow in general, but it also has the vectorized and accelerated numpy numerical
library for scientific computing. We will make heavy use of this library so that we
do not have to implement standard numerical linear algebra operations ourselves. In
general, we follow Google’s coding style guides for Python (Google, 2021b) and C++
(Google, 2021a).
The core data types, such as states, operators, and density matrices, are all vectors
and matrices of complex numbers. It is good practice to base all of them on one
common type abstraction, which hides the underlying implementation. This approach
avoids potential problems with type mismatches and makes analysis, testing, debugging, pretty-printing, and other common functionality easier to maintain consistently.
The base data type for all subsequent work will be a common Tensor class.
We derive Tensor from the ndarray array data structure in numpy. Since we
derive Tensor, it will behave just like a numpy array, but we can augment it with
additional convenience functions.
There are several complex ways to instantiate an ndarray. The proper way
to derive a class from this data type is complicated but well documented.1 The
implementation is in the open-source repository in lib/tensor.py:
import numpy as np
class Tensor(np.ndarray):
"""Tensor is a numpy array representing a state or operator."""
Note the use of tensor_type() in this code snippet: It abstracts the floating-
point representation for complex numbers. Why do we do this? The choice of which
complex data type to use is an interesting question, and each has its implications.
Should it be complex numbers based on 64-bit doubles, 32-bit floats, or something else, for example, the TPU's 16-bit bfloat format? Smaller data types are faster to simulate because of lower memory bandwidth requirements. But what level of accuracy is needed for which circuit? The numpy package supports np.complex128 and
np.complex64, so we simply define a global variable that holds this type’s width.
Having this information in one place makes it easy to experiment with different data
types later on.
def tensor_type():
  if tensor_width == 64:
    return np.complex64
  return np.complex128
1 https://numpy.org/doc/stable/user/basics.subclassing.html.
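The subclassing boilerplate that the text refers to is not reproduced in the listing above. A minimal sketch of the pattern, following numpy's documented recipe (the `tensor_width` global and the `__new__` body here are modeled on the text, not copied from the repository; the full recipe also covers `__array_finalize__`):

```python
import numpy as np

tensor_width = 128  # global: bit width of the complex type


def tensor_type():
  """Return the numpy complex dtype matching the configured width."""
  if tensor_width == 64:
    return np.complex64
  return np.complex128


class Tensor(np.ndarray):
  """Tensor is a numpy array representing a state or operator."""

  def __new__(cls, input_array):
    # View-cast the input as our subclass, in the configured dtype.
    return np.asarray(input_array, dtype=tensor_type()).view(cls)
```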
As we will see in our discussion of quantum states in Section 2.3, the Kronecker
product of tensors, denoted with operator ⊗, is an important operation. This product is
often referred to as tensor product. The Kronecker product describes a block product
between matrices and is the correct term to use. We use the terms Kronecker product
and tensor product interchangeably – tensoring states rolls off the tongue much more
easily than Kroneckering states.
We implement it by adding the member function kron to the Tensor class. The
function simply delegates to the function of the same name in numpy. We make heavy
use of this operation, so for convenience we additionally overload the * operator to
call this function.
There is the potential to confuse the * operator with simple matrix multiplication.
However, in Python and with numpy, matrix multiplication is done with the at opera-
tor @. We inherit this operator from numpy; we don’t have to implement it ourselves.
def __mul__(self, arg):
  return self.kron(arg)
In our initial approach to quantum computing, we will often construct larger matrices by tensoring together many identical matrices, which corresponds to calling the kron function multiple times. For example, to tensor together n unitary matrices U ,
we will use the following notation:
U ⊗ U ⊗ ⋯ ⊗ U = U^⊗n,  with n factors on the left.
def kpow(self, n: int) -> 'Tensor':
  if n == 0:
    return 1.0
  t = self
  for _ in range(n - 1):
    t = np.kron(t, self)
  return self.__class__(t)  # Necessary to return a Tensor type
def is_hermitian(self) -> bool:
  if len(self.shape) != 2:
    return False
  if self.shape[0] != self.shape[1]:
    return False
  return self.is_close(np.conj(self.transpose()))
Some of the matrices we will encounter later in the text are permutation matrices,
which have only a single 1 in each row and column of the matrix. This routine verifies
this property:
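The routine itself does not survive in this excerpt. A standalone sketch of such a permutation-matrix check (the function name `is_permutation` and the exact tolerances are my assumptions, not necessarily the book's code):

```python
import numpy as np

def is_permutation(m) -> bool:
  """Check for exactly one 1 per row and column, zeros elsewhere."""
  m = np.asarray(m)
  if m.ndim != 2 or m.shape[0] != m.shape[1]:
    return False
  # Every entry must be (close to) 0 or 1 ...
  if not np.all(np.isclose(m, 0) | np.isclose(m, 1)):
    return False
  # ... and every row and column must sum to exactly one.
  return bool(np.all(np.isclose(m.sum(axis=0), 1)) and
              np.all(np.isclose(m.sum(axis=1), 1)))

# The SWAP matrix is a permutation matrix; a projector is not.
swap = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]])
assert is_permutation(swap)
assert not is_permutation(np.array([[1, 0], [0, 0]]))
```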
2 Note that for scalars, math.isclose is significantly faster than np.allclose. We will use it in
performance-critical code.
2.2 Qubits
In classical computing, a bit can have the value 0 or 1. If we think of a bit as a switch,
it is either off or on. You could say it is in the off-state (0-state) or on-state (1-state).
Quantum bits, which are also called qubits, can be in a 0-state or a 1-state as well. What makes them quantum is that they can also be in a superposition of these states: they can be in the 0-state and the 1-state at the same time. What does this mean exactly?
First, we have to distinguish between a qubit and the state of a qubit. Physical
qubits, developed for real quantum computers, are real physical entities, such as ions
captured in an electric field, Josephson junctions on an ASIC, and so on. The state of
a qubit describes some measurable property of that qubit, for example, the energy
level of an electron. In quantum computing, at the level of programming abstractions, the physical implementation does not matter; instead, we are concerned with
the measurable state. This is similar to classical computing, where very few people
are knowledgeable about the quantum effects that enable transistors at the level of
logic gates. Thus, the terms qubit and state of the qubit are used interchangeably; we
typically only use the term qubit.
This state space of one or more qubits is often denoted by the Greek symbol |ψ⟩ ("psi"). The standard notation for a qubit's 0-state is |0⟩ in the Dirac notation. The 1-state is correspondingly written as |1⟩. You can think of these as physically distinguishable states, such as the spin of an electron. These two states, |0⟩ and |1⟩, are
known as basis states. We will not delve into the typical elaborate discussion of linear
algebra and the theory of vector spaces here. All we need to know is that basis states
represent orthogonal sets of vectors of dimensionality n (vectors with n components).
Any vector of the same dimensionality can be constructed from linear combinations of
basis states. In our context, we also require that basis vectors are normalized, forming
an orthonormal set of basis vectors. Another way to say this is that the basis vectors
are linearly independent and have a modulus of 1.
Superposition now simply means that the state of a qubit is a linear combination of orthonormal basis states, for example the |0⟩ and |1⟩ states:

|ψ⟩ = α|0⟩ + β|1⟩,

where α and β are complex numbers, called the probability amplitudes, with

|α|² + |β|² = 1.
Note that we use the square of the norm, not the square of a complex number. As will become clear later, this follows from one of the fundamental postulates of quantum mechanics: on measurement, the state collapses to either |0⟩ with (real) probability |α|² or |1⟩ with (real) probability |β|². The state has to collapse to one of the two, and thus the probabilities must add up to 1.0. If both α and β are exactly 1/√2, there is an equal probability (1/√2)² = 1/2 that the state collapses to |0⟩ or to |1⟩ on measurement. If α is 1.0 and β is 0.0, it is certain that the state will collapse to the |0⟩ state on measurement.
Let us look at a standard example. Given a qubit |φ⟩ as

|φ⟩ = (√3/2)|0⟩ + (i/2)|1⟩,

the probability of measuring |0⟩ is

Pr_|0⟩(|φ⟩) = (√3/2)(√3/2) = 3/4.

The probability of measuring |1⟩ is the following – we compute the norm squared of the factor i/2:

Pr_|1⟩(|φ⟩) = |i/2|² = (−i/2)(i/2) = 1/4.
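As a quick numerical check of this example (plain Python/numpy, not from the book's library):

```python
import math
import numpy as np

alpha = math.sqrt(3) / 2     # amplitude of |0>
beta = 1j / 2                # amplitude of |1>

# Probabilities are the squared norms of the amplitudes.
prob0 = np.real(np.conj(alpha) * alpha)
prob1 = np.real(np.conj(beta) * beta)

assert math.isclose(prob0, 3 / 4)
assert math.isclose(prob1, 1 / 4)
assert math.isclose(prob0 + prob1, 1.0)
```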
The following code translates these concepts into a straightforward implementation. As a forward reference, we use the type State, which we discuss in the next
section. Put simply, a State is a vector of complex numbers implemented using
Tensor.
To construct a qubit, we need either α or β, or both. If only one is provided, we
can easily compute the other one, given that their squared norms must add to 1.0. To
compute the norms of the complex numbers α and β, we multiply each one with its
complex conjugate (hence the use of np.conj()). The result will be a real number.
For the code not to generate a type error from numpy, we have to explicitly convert the
result with np.real(). We compare the results to 1.0, and if it is within tolerance,
we construct and return the qubit.
What data structure should we use to represent a qubit? We simply create an array
of two complex values, fill in α and β, and return a State constructed from this
array.
def qubit(alpha=None, beta=None) -> State:
  if alpha is None and beta is None:
    raise ValueError('Provide alpha, beta, or both.')
  if beta is None:
    beta = math.sqrt(1.0 - np.real(np.conj(alpha) * alpha))
  if alpha is None:
    alpha = math.sqrt(1.0 - np.real(np.conj(beta) * beta))
  qb = np.zeros(2, dtype=tensor.tensor_type())
  qb[0] = alpha
  qb[1] = beta
  return State(qb)
From this code, you can infer what the basis states might look like – a state is just a complex vector: the |0⟩ state should be [1, 0]ᵀ and the |1⟩ state [0, 1]ᵀ. With this in mind, the state of a qubit can be written in these forms:

|ψ⟩ = α|0⟩ + β|1⟩ = α(1, 0)ᵀ + β(0, 1)ᵀ = (α, β)ᵀ.
The choice of |0⟩ = [1, 0]ᵀ and |1⟩ = [0, 1]ᵀ as the basis states is not the only possible one. What matters is that these vectors are orthonormal. They are orthogonal, with a mutual scalar product of ⟨0|1⟩ = 0.0, and normalized, with scalar products of ⟨0|0⟩ = 1.0 and ⟨1|1⟩ = 1.0.
The set of orthonormal basis vectors [1, 0]ᵀ and [0, 1]ᵀ for the qubit vector space, which is also called the computational basis, is intuitive and simplifies the math. But other bases are possible, especially the ones resulting from rotations. Those are commonplace in quantum computing, as we will see shortly.
2.3 States
As we saw in the previous section, qubits are states: vectors of complex numbers representing probability amplitudes. We should use our trusty Tensor class to represent
states in code. We inherit a State class from Tensor and add a moderately improved
print function. The sources are in the open-source repository in lib/state.py:
class State(tensor.Tensor):
"""class State represents single and multi-qubit states."""
The state of two or more qubits is defined as their tensor product. To compute
it, we added the * operator to the underlying Tensor type in the previous section
(implemented as the corresponding Python __mul__ member function). Note that the
tensor product of two states, which both have a norm of 1.0, also has a norm of 1.0.
For two qubits |φ⟩ and |χ⟩, the combined state can be written as in Equation (1.3), with ⊗ being the symbol for the Kronecker product:

|ψ⟩ = |φ⟩ ⊗ |χ⟩ = |φ⟩|χ⟩ = |φ, χ⟩ = |φχ⟩.
Given this definition, the state for n qubits is a Tensor of 2ⁿ complex numbers, the probability amplitudes. We could maintain the qubit count n as an extra member variable of State, but it is easy to compute from the length of the state vector (which is already
maintained by numpy). We define it as a property. Because this property is required for
all classes derived from Tensor (e.g., States and Operators), we add the nbits
property to the Tensor base class, so that derived classes can inherit it:
@property
def nbits(self) -> int:
"""Compute the number of qubits in the state."""
return int(math.log2(self.shape[0]))
Python does have a bit_length() function to determine how many bits are needed to represent a number. Here, using this function would be wrong: to represent eight states, n = 3 qubits suffice for a state of 2ⁿ complex numbers, but you would need four classical binary bits to represent the number 8. Using a value of n − 1 will not work for an input value of 0. Additionally, bit_length() returns values for negative numbers, which make no sense in this quantum state context. For all these reasons, we decided to use the log2 function. As a code example, let us combine two qubits:
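The example snippet does not survive in this excerpt; in plain numpy (standing in for the library's qubit constructor), combining |0⟩ and |1⟩ amounts to:

```python
import numpy as np

q0 = np.array([1, 0], dtype=np.complex128)   # the |0> basis state
q1 = np.array([0, 1], dtype=np.complex128)   # the |1> basis state

psi = np.kron(q0, q1)                        # |0> tensor |1> = |01>
assert np.allclose(psi, [0, 1, 0, 0])
```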
The resulting state is a Tensor with the complex values [0.0, 1.0, 0.0, 0.0], which is the result you would expect from tensoring [1, 0]ᵀ and [0, 1]ᵀ. The index of the value 1.0 is 1, which indicates that the combination of the |0⟩ state and the |1⟩ state is being interpreted as binary 0b01. This will become clearer in the next section, where we discuss qubit ordering.
In all code examples that follow, states are constructed from the high order bit to
the low order bit. This choice is arbitrary, and as a matter of fact, some texts have it
the other way around. The important thing is to stay consistent.
A quantum state’s probability amplitudes represent probabilities – the squared
norms of all amplitudes must add up to 1.0. Basis states are normalized. As an
example (which generalizes to n qubits), for two qubits there are four basis states, and
we can write the state |ψ⟩ as a superposition:
|ψ⟩ = c0 (1, 0, 0, 0)ᵀ + c1 (0, 1, 0, 0)ᵀ + c2 (0, 0, 1, 0)ᵀ + c3 (0, 0, 0, 1)ᵀ
    = c0 ψ0 + c1 ψ1 + c2 ψ2 + c3 ψ3 = Σ_{i=0}^{3} ci ψi.
The amplitudes are complex numbers, so to compute the norm we multiply by the complex conjugates:

⟨ψ|ψ⟩ = Σ_i ci* ci = Σ_i |ci|² = 1.
For states that are products of states, we apply Equation (1.7). Note, again, that the elements in the bras are the complex conjugates:

|ψ⟩ = |φχ⟩,
⟨ψ| = |φχ⟩† = ⟨φχ|,
⟨ψ|ψ⟩ = ⟨φχ|φχ⟩ = ⟨φ|φ⟩⟨χ|χ⟩.
p1 = state.qubit(alpha=random.random())
x1 = state.qubit(alpha=random.random())
psi = p1 * x1 # Tensor product.
• As qubits are added to a circuit, they are added from left to right (in a binary string), from the high-order qubit to the low-order qubit.
• In Dirac notation, two tensored states are written as |x, y⟩, for example, |0, 1⟩. Additionally, in this notation the most significant bit is the first to appear. States like this are also expressed as decimals: here, the state is interpreted as |1⟩ (binary 01) as opposed to |2⟩ (binary 10, if incorrectly read from right to left).
• We will see in Section 2.8 that circuits are drawn as a vertical stack of qubits, with
the top qubit being the most significant.
• We will learn soon about simple functions for constructing composite states from |0⟩ and |1⟩ states. In these functions, the first qubit to appear will be the most significant qubit, similar to the circuit notation. For example, the state |ψ⟩ = |1⟩ ⊗ |0⟩ ⊗ |1⟩ ⊗ |0⟩ is constructed with
psi = state.bitstring(1, 0, 1, 0)
• When states are formatted to print, the most significant bit will also be to the left,
as in binary notation.
• We have to distinguish between how bits or qubits are interpreted and how we
store them in our programs. In classical computing, bit 0 is typically the rightmost
bit, which is the least significant bit. When we store bits as an array, we will store them from low to high array index. This means the index into the array
is 0 for the first stored bit. In the quantum case, this is the most significant qubit.
This is a constant source of confusion, not just in this context but in any context
that has to represent bits or qubits in an indexed, array-like data structure.
For brevity, when interpreting the bitstrings as binary numbers and numbering them
in decimal, states are often written as in this example:
|011⟩ = |3⟩.
Be aware of the potential for confusion between the state |000⟩, the corresponding decimal state |0⟩, and an actual one-qubit state |0⟩. How does the decimal interpretation of a state relate to the state vector?
• The state |00⟩ is computed as (1, 0)ᵀ ⊗ (1, 0)ᵀ = (1, 0, 0, 0)ᵀ. Also called |0⟩.
• The state |01⟩ is computed as (1, 0)ᵀ ⊗ (0, 1)ᵀ = (0, 1, 0, 0)ᵀ. Also called |1⟩.
• The state |10⟩ is computed as (0, 1)ᵀ ⊗ (1, 0)ᵀ = (0, 0, 1, 0)ᵀ. Also called |2⟩.
• The state |11⟩ is computed as (0, 1)ᵀ ⊗ (0, 1)ᵀ = (0, 0, 0, 1)ᵀ. Also called |3⟩.
To find the probability amplitude for a given state, we can use binary addressing. The state vector for the three-qubit state |011⟩ is:

[0, 0, 0, 1, 0, 0, 0, 0]ᵀ.

Interpreting the right-most qubit in |011⟩ as the least significant bit 0 with a bit value of 1, the middle one as bit 1 with a bit value of 1 and a power-of-2 value of 2, and the left-most as bit 2 with a value of 0, the state |011⟩ represents the value 3 in decimal, state |3⟩. We index the state vector as an array, as described above, from left to right, from 0 to 2ⁿ − 1. Indeed, entry 3 of the state vector is set to 1. The amplitude for each state in the state vector can be found with this simple binary addressing scheme.
Note that the tensor product representation of this 3-qubit state contains the amplitudes for all eight possible states. Seven states have an amplitude of 0.0. This already hints at a potentially more efficient sparse representation.
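The binary addressing scheme can be checked directly with a small standalone sketch (the `ampl` helper here is a hypothetical stand-in for the State method of the same name):

```python
import numpy as np

# State vector for the three-qubit state |011>: entry 0b011 == 3 is 1.
psi = np.zeros(8, dtype=np.complex128)
psi[0b011] = 1.0

def ampl(state, *bits):
  """Look up an amplitude, interpreting bits (MSB first) as an index."""
  idx = 0
  for b in bits:
    idx = (idx << 1) | b
  return state[idx]

assert ampl(psi, 0, 1, 1) == 1.0
```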
def prob(self, *bits) -> float:
  amplitude = self.ampl(*bits)
  return np.real(amplitude.conj() * amplitude)
We use Python parameters that are decorated with a *. This means a variable
number of arguments is allowed and the parameters are packed as a tuple. To unpack
the tuple, you have to prefix the access with a * again, as shown in the function
definitions above.
As an example, for a four-qubit state, you can get the amplitude and probability for
the state |1011i in the following way:
psi.ampl(1, 0, 1, 1)
psi.prob(1, 0, 1, 1)
The following snippet iterates over all possible states and prints the probabilities
for each state:
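The snippet itself is not reproduced in this excerpt; a minimal standalone sketch, using itertools.product in place of the book's helper.bitprod, could look like this:

```python
import itertools
import numpy as np

def prob(psi, *bits):
  """Probability of a basis state, via binary addressing of the vector."""
  amplitude = psi[int(''.join(map(str, bits)), 2)]
  return np.real(amplitude.conjugate() * amplitude)

# A two-qubit example state (|00> + |11>) / sqrt(2).
psi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
for bits in itertools.product([0, 1], repeat=2):
  print(bits, prob(psi, *bits))
```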
During algorithm development, we often want to find the one state with the highest
probability. For this, we add the following convenience function, which iterates over
all possible states and returns the state/probability pair with the highest probability:
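The listing is omitted here; a standalone sketch of such a helper (the name maxprob and the exact signature are assumptions) might look like this:

```python
import itertools
import numpy as np

def maxprob(psi):
  """Return the (bits, probability) pair with the highest probability."""
  nbits = int(np.log2(len(psi)))
  best_bits, best_prob = None, -1.0
  for bits in itertools.product([0, 1], repeat=nbits):
    p = abs(psi[int(''.join(map(str, bits)), 2)])**2
    if p > best_prob:
      best_bits, best_prob = bits, p
  return best_bits, best_prob

psi = np.zeros(4, dtype=complex)
psi[0] = 0.2
psi[3] = np.sqrt(1 - 0.04)
bits, p = maxprob(psi)
```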
As we will see later, it can become necessary to renormalize a state vector. This is
done with the normalize member function. Note that this function assumes that the
dot product is not 0.0; otherwise, this code will result in a division by zero exception:
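A possible standalone version of such a normalize helper (a sketch, not the book's exact member function):

```python
import numpy as np

def normalize(psi):
  """Scale the state vector so that <psi|psi> = 1 (assumes nonzero norm)."""
  dot = np.conj(psi) @ psi
  return psi / np.sqrt(np.real(dot))

psi = normalize(np.array([1.0, 1.0], dtype=complex))
```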
The phase of a qubit is the angle obtained when converting the qubit’s complex
amplitude to polar coordinates. We only use this during print-outs, so we convert the
phase to degrees here.
def phase(self, *bits) -> float:
  """Phase of a basis state's amplitude, in degrees."""
  amplitude = self.ampl(*bits)
  return math.degrees(cmath.phase(amplitude))
def dump_state(psi, desc: Optional[str] = None,
               prob_only: bool = True) -> None:
  if desc:
    print('|', end='')
    for i in range(psi.nbits-1, -1, -1):
      print(i % 10, end='')
    print(f'> \'{desc}\'')
  state_list: List[str] = []
  for bits in helper.bitprod(psi.nbits):
    if prob_only and (psi.prob(*bits) < 10e-6):
      continue
    state_list.append(
        '{:s}: ampl: {:+.2f} prob: {:.2f} Phase: {:5.1f}'
        .format(state_to_string(bits),
                psi.ampl(*bits),
                psi.prob(*bits),
                psi.phase(*bits)))
  state_list.sort()
  print(*state_list, sep='\n')
As an example, the output from the dumper may look like the following, showing
all states with nonzero probability:
def zeros_or_ones(d: int = 1, idx: int = 0) -> State:
  if d < 1:
    raise ValueError('Rank must be at least 1.')
  shape = 2**d
  t = np.zeros(shape, dtype=tensor.tensor_type())
  t[idx] = 1
  return State(t)
The function bitstring allows the construction of states from a defined series of $|0\rangle$ and $|1\rangle$ states. As noted above, the most significant bit comes first:
def bitstring(*bits) -> State:
  d = len(bits)
  if d == 0:
    raise ValueError('Rank must be at least 1.')
  t = np.zeros(1 << d, dtype=tensor.tensor_type())
  t[helper.bits2val(bits)] = 1
  return State(t)
def rand(n: int) -> State:
  bits = [0] * n
  for i in range(n):
    bits[i] = random.randint(0, 1)
  return bitstring(*bits)
Finally, because the canonical single-qubit states $|0\rangle$ and $|1\rangle$ are used often, it may make sense to define constants for them. Global variables are bad style; we only added them to offer compatibility with other frameworks. Don't use them.
Can we initialize a state with a given normalized vector? Yes, we can. This is a
pattern we will see later, in the section on phase estimation (Section 6.4), where we
initialize a state directly with the eigenvector of a unitary matrix:
umat = scipy.stats.unitary_group.rvs(2**nbits)
eigvals, eigvecs = np.linalg.eig(umat)
psi = state.State(eigvecs[:, 0])
In code:
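The listing is not reproduced in this excerpt; a standalone numpy sketch of the density operator $\rho = |\psi\rangle\langle\psi|$ could look like this:

```python
import numpy as np

def density(psi):
  """Construct the density matrix rho = |psi><psi| via the outer product."""
  return np.outer(psi, np.conj(psi))

psi = np.array([1, 1], dtype=complex) / np.sqrt(2)   # the |+> state
rho = density(psi)
```

For a pure state, the trace of rho is 1 and rho is idempotent (rho @ rho == rho).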
Because of the way we constructed this density matrix, it represents a pure state; it
is not entangled or statistically mixed with anything else. Correspondingly, the trace
of the density matrix is 1; it is the sum of all state probabilities.
We will need a small set of helper functions. They will be used in several places, but
don’t seem to belong to any specific core module. Hence we collect helper functions
in the open-source repository in file lib/helper.py.
Bit Conversions
We often have to convert between a decimal number and its binary representation as a
tuple of 0s and 1s. These two helper functions make that easy:
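The two listings are not shown above; a sketch of what such helpers could look like, with bits ordered most significant first (the names bits2val and val2bits are used in the book's lib/helper.py, but the exact implementation here is an assumption):

```python
def bits2val(bits):
  """Convert a tuple of 0s and 1s (MSB first) to its decimal value."""
  return sum(bit << (len(bits) - i - 1) for i, bit in enumerate(bits))

def val2bits(val, nbits):
  """Convert a decimal value to a tuple of 0s and 1s, MSB first."""
  return tuple((val >> (nbits - i - 1)) & 1 for i in range(nbits))
```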
2.5 Operators
We have discussed qubits and states. In quantum computing, how are these states
modified? Classical bits are manipulated via logic gates, such as AND, OR, XOR,
and NAND. In quantum computing, qubits and states are changed with operators. It
seems appropriate to think of operators as the Instruction Set Architecture (ISA) of a
quantum computer. It is a different ISA than that of a typical classical computer, but it
is an ISA nonetheless. It enables computation.
In this section, we discuss operators, their structure, properties, and how to apply
them to states. All sources are in the open-source repository, in file lib/ops.py.
An example of a single-qubit gate is the Identity gate, which, when applied, leaves
a qubit unmodified:
$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}.$$
Another example is the X-gate (a synonym for the Pauli X-gate described in Section
2.6.2), which swaps the probability amplitudes of a qubit:
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \beta \\ \alpha \end{pmatrix}.$$
We detail many standard gates later in this section. Note that because $UU^\dagger = I$, unitary matrices are necessarily invertible. As a result, all (unitary) quantum gates are reversible by simply using a gate's conjugate transpose.
On the other hand, Hermitian matrices are not necessarily unitary. In Section 2.15
we will see how Hermitian operators used for measurements are neither unitary nor
reversible.
class Operator(tensor.Tensor):
"""Operators are represented by square, unitary matrices."""
The numpy package has routines to print arrays, but we add another dumper function that produces a more compact output, making it easier to see the matrix structure instead of seeing values with high precision. This function can be adapted quickly to help during challenging debugging sessions.
def dump(self,
         description: Optional[str] = None,
         zeros: bool = False) -> None:
  res = ''
  if description:
    res += f'{description} ({self.nbits}-qubits operator)\n'
  for row in range(self.shape[0]):
    for col in range(self.shape[1]):
      val = self[row, col]
      res += f'{val.real:+.1f}{val.imag:+.1f}j '
    res += '\n'
  if not zeros:
    res = res.replace('+0.0j', '    ')
    res = res.replace('+0.0', ' - ')
    res = res.replace('-0.0', ' - ')
    res = res.replace('+', ' ')
  print(res)
Here are examples for a two-qubit operator, printed both with this dumper function and numpy's own print function.3
# dump
0.5 0.5 0.5 0.5
0.5 -0.5 0.5 -0.5
0.5 0.5 -0.5 -0.5
0.5 -0.5 -0.5 0.5
# numpy print
Operator for 2-qubit state space. Tensor:
[[ 0.49999997+0.j 0.49999997+0.j 0.49999997+0.j 0.49999997+0.j]
[ 0.49999997+0.j -0.49999997+0.j 0.49999997+0.j -0.49999997+0.j]
[ 0.49999997+0.j 0.49999997+0.j -0.49999997+0.j -0.49999997+0.j]
[ 0.49999997+0.j -0.49999997+0.j -0.49999997+0.j 0.49999997-0.j]]
3 numpy has a fairly flexible way to configure prints as well; see https://numpy.org/doc/stable/reference/generated/numpy.set_printoptions.html.
def __call__(self,
             arg: Union[state.State, ops.Operator],
             idx: int = 0) -> state.State:
  return self.apply(arg, idx)

def apply(self,
          arg: Union[state.State, ops.Operator],
          idx: int) -> state.State:
  """Apply operator to a state or operator."""
  [...]
  if not isinstance(arg, state.State):
    raise AssertionError('Invalid parameter, expected State.')
  [...]
  return state.State(np.matmul(self, arg))
We can also apply an operator to another operator. In this case, application results in matrix-matrix multiplication. What is the order of application when multiple operators are applied in sequence?
Assume we have an X-gate and a Y-gate (to be explained later), and we want to
apply them in sequence. We can write this the following way in Python, where gates
are applied to a state, and we return the updated state:
psi_1 = X(psi_0)
psi_2 = Y(psi_1)
In circuit notation, the gates are drawn from left to right in the order they are applied:

|0⟩ ── X ── Y ── |ψ⟩
In the function call notation, we would write the symbols from left to right as well.
But note that function parameters are being evaluated first, which means they are being
applied first:
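The listing is not reproduced here; the nested call would look like the following, with X applied first because function arguments are evaluated first (a sketch using plain numpy matrices as stand-ins for the gate objects):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
psi_0 = np.array([1, 0], dtype=complex)

# Y(X(psi)): the inner X is evaluated, and hence applied, first.
psi_2 = Y @ (X @ psi_0)
```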
This already points to the fact that if we express the combined operator as the
product of matrices, we have to invert their order (with @ being the matrix multiply
operator in Python):
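As a sketch with numpy stand-ins for the gate matrices, the combined operator multiplies in reverse order of application:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
psi_0 = np.array([1, 0], dtype=complex)

# Applying X first, then Y, equals applying the single operator (Y @ X).
combined = Y @ X
```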
def apply(self,
          arg: Union[state.State, ops.Operator],
          idx: int) -> Union[state.State, ops.Operator]:
  """Apply operator to a state or operator."""
  if isinstance(arg, Operator):
    if self.nbits != arg.nbits:
      raise AssertionError('Operator with mis-matched dimensions.')
to apply the 2 × 2 gate to just one qubit in their tensor product? The key property of the tensor product that enables handling this case is Equation (1.6):

$$(A \otimes B)(|\alpha\rangle \otimes |\beta\rangle) = (A|\alpha\rangle) \otimes (B|\beta\rangle).$$
We can utilize this equation with the identity gate $I$, which is the matrix:

$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Applying I to any qubit leaves the qubit intact. The above equation allows us to,
for example, apply the X-gate (discussed earlier) to the second qubit in a three-qubit
state by tensoring together I , the X-gate, and another I to obtain an 8 × 8 matrix:
psi = state.bitstring(0, 0, 0)
op = ops.Identity() * ops.PauliX() * ops.Identity()
psi = op(psi)
psi.dump()
psi = state.bitstring(0, 0, 0)
opx = ops.Identity() * ops.PauliX() * ops.Identity()
opy = ops.Identity() * ops.Identity() * ops.PauliY()
big_op = opx(opy)
psi = big_op(psi)
Note that there is a shortcut notation for this. To indicate that gate $A$ should be applied to the qubit at a certain index $i$, we just write $A_i$. This notation means that the operator is padded from both sides with identity matrices. For the example above, to apply the X-gate to qubit 1 and the Y-gate to qubit 2, we would write $X_1 Y_2$.
Of course, in regards to performance, constructing the full, combined operator up front for $n$ qubits is the worst possible case, as we have to perform full matrix multiplies with matrices of size $(2^n)^2$. Matrix multiplication is of cubic complexity $O(n^3)$.4 Since a matrix-vector product is of complexity $O(n^2)$, it can be faster to apply the gates individually, depending on the number of gates.
psi = state.bitstring(0, 0, 0)
opx = ops.Identity() * ops.PauliX() * ops.Identity()
psi = opx(psi)
opy = ops.Identity() * ops.Identity() * ops.PauliY()
psi = opy(psi)
Of course, in this particular example we could have simply combined the gates:
psi = state.bitstring(0, 0, 0)
opxy = ops.Identity() * ops.PauliX() * ops.PauliY()
psi = opxy(psi)
4 This is an approximation to make a point, which we will use in several places. More efficient algorithms are known, such as the Coppersmith–Winograd algorithm with complexity $O(n^{2.3752477})$.
To apply a given gate, say the X-gate, to a state psi at a given qubit index idx,
we write:
X = ops.PauliX()
psi = X(psi, idx)
To achieve this, we augment the function call operator for Operator. If an index
is provided as a parameter, we pad the operator up to this index with identity matrices.
Then, we compute the size of the given operator, which can be larger than 2 × 2, and
if the resulting matrix’s dimension is still smaller than the state it is applied to, we pad
it further with identity matrices. In above example, instead of:
psi = state.bitstring(0, 0, 0)
opx = ops.Identity() * ops.PauliX() * ops.Identity()
psi = opx(psi)
we can now write the following. Note that the first pair of parentheses in PauliX() returns a simple 2 × 2 Operator object. The parentheses (psi, 1) hold the parameters passed to the operator's function call operator __call__, which delegates to the apply function. This is where the automatic padding finally happens. This syntax may be confusing at first sight:
psi = state.bitstring(0, 0, 0)
psi = ops.PauliX()(psi, 1)
def apply(self,
          arg: Union[state.State, ops.Operator],
          idx: int) -> Union[state.State, ops.Operator]:
  """Apply operator to a state or operator."""
  if isinstance(arg, Operator):
    arg_bits = arg.nbits
    if idx > 0:
      arg = Identity().kpow(idx) * arg
    if self.nbits > arg.nbits:
      arg = arg * Identity().kpow(self.nbits - idx - arg_bits)
    if self.nbits != arg.nbits:
      raise AssertionError('Operator(O) with mis-matched dimensions.')
    #
    # [... Comment block as shown above]
    #
    return arg @ self
op = self
if idx > 0:
  op = Identity().kpow(idx) * op
if arg.nbits - idx - self.nbits > 0:
  op = op * Identity().kpow(arg.nbits - idx - self.nbits)
In this section, we list single-qubit gates that are commonly used in quantum computing. They play a role similar to logic gates in classical computing: you have to understand their function in order to compose them into more interesting circuits. However, for the most part, the gates' functions are quite different from those of classical gates.
We start with simple gates and then discuss the more complicated roots and rotations before discussing the important Hadamard gate, which puts qubits into a superposition of basis states.
For each gate, we define a constructor function and allow passing a dimension parameter d, which allows the construction of multi-qubit operators from the same underlying single-qubit gates. For example, for the identity gates in the previous example, instead of having to write ops.Identity() * ops.Identity() * ops.PauliY(), we can write:

y2 = ops.Identity(2) * ops.PauliY()
that computes the following tensor product. Note the subscript in $Y_2$, which indicates that the Y-gate should only be applied to qubit 2:

$$Y_2 = I \otimes I \otimes Y = I^{\otimes 2} \otimes Y.$$
The Pauli Z-gate, or just Z-gate, is also known as the phase-flip gate, as it inverts the sign of a qubit's second component:

$$Z|\psi\rangle = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \alpha \\ -\beta \end{pmatrix}.$$
Figure 2.1 The Bloch sphere: a state $\vec{a}$ is a point on the unit sphere, with polar angle θ measured from the z-axis and azimuthal angle φ in the x-y plane; $|0\rangle$ sits at the north pole, $|1\rangle$ at the south pole.
In other words, it changes $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ to $|\psi\rangle = \alpha|0\rangle - \beta|1\rangle$. Just to reiterate one more time, the basis states remain unchanged. It is the sign of the coefficient $\beta$ that changes. The gate's constructor looks similar to the previous ones:
Together with the identity matrix, the Pauli matrices form a basis for the vector space of 2 × 2 Hermitian matrices. This means that all 2 × 2 Hermitian matrices can be constructed as a linear combination of Pauli matrices. The Pauli matrices have eigenvalues of 1.0 and −1.0, and they are involutory:

$$II = XX = YY = ZZ = I.$$
2.6.3 Rotations
Rotation operators are constructed via exponentiation of the Pauli matrices. Their
impact can best be visualized as rotations around the Bloch sphere. We will discuss
the Bloch sphere in more detail in Section 2.9. In short, every single qubit can be visualized as a point on a 3D sphere with a radius of 1.0, as shown in Figure 2.1. The $|0\rangle$ and $|1\rangle$ states are located at the north and south poles, respectively.
Applying a gate to a qubit moves its corresponding point from one surface point
to another. This sphere lives in the 3D space, so there are x, y, and z axes with
corresponding coordinates on the sphere. A qubit can reach any point on the sphere
defined by spherical coordinates with $r = 1$ and the two angles θ and φ. The rotation around the z-axis, $e^{i\varphi}$, is called the relative phase.
$$|\psi\rangle = \cos\frac{\theta}{2}|0\rangle + e^{i\varphi}\sin\frac{\theta}{2}|1\rangle.$$
We define rotations about the orthogonal axes x, y, and z with the help of the Pauli matrices as:

$$R_x(\theta) = e^{-i\frac{\theta}{2}X}, \quad R_y(\theta) = e^{-i\frac{\theta}{2}Y}, \quad R_z(\theta) = e^{-i\frac{\theta}{2}Z}.$$
To evaluate these operator exponentials, recall that a function of a matrix can be defined by its power series:

$$f(A) = c_0 I + c_1 A + c_2 A^2 + c_3 A^3 + \cdots.$$

For the exponential function, this is:

$$f(A) = e^A = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \frac{A^4}{4!} + \cdots.$$

If the operator is involutory and satisfies $A^2 = I$, then $e^{i\theta A}$ becomes the following, which can be reordered into the Taylor series for sin(·) and cos(·):

$$e^{i\theta A} = I + i\theta A - \frac{\theta^2 I}{2!} - i\frac{\theta^3 A}{3!} + \frac{\theta^4 I}{4!} + \cdots = \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \cdots\right)I + i\left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots\right)A = \cos(\theta)I + i\sin(\theta)A.$$
$$R_y(\theta) = e^{-i\frac{\theta}{2}Y} = \cos\frac{\theta}{2}I - i\sin\frac{\theta}{2}Y = \begin{pmatrix} \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \\ \sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix},$$

$$R_z(\theta) = e^{-i\frac{\theta}{2}Z} = \cos\frac{\theta}{2}I - i\sin\frac{\theta}{2}Z = \begin{pmatrix} e^{-i\theta/2} & 0 \\ 0 & e^{i\theta/2} \end{pmatrix}.$$
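The closed form for $R_z$ can be checked quickly in plain numpy (a standalone sketch, not the book's ops module): $R_z(\pi)$ should equal $\mathrm{diag}(e^{-i\pi/2}, e^{i\pi/2}) = -iZ$, i.e., the Z-gate up to a global phase.

```python
import numpy as np

def rotation_z(theta):
  """Closed-form Rz(theta) = diag(exp(-i theta/2), exp(i theta/2))."""
  return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

Z = np.diag([1.0, -1.0]).astype(complex)
```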
This also helps to explain why the Pauli Z-gate is called the phase-flip gate. Applying this gate will rotate the $|1\rangle$ part of a qubit around the z-axis by φ. With an angle $\varphi = \pi$, the expression $e^{i\varphi} = -1$, a rotation by 180°. This result is also famously known as Euler's identity:

$$e^{i\pi} = -1.$$
In general, rotations can be defined about any arbitrary axis $\hat{n} = (n_0, n_1, n_2)$:

$$R_{\hat{n}}(\theta) = \exp\left(-i\,\frac{\theta}{2}\,\hat{n}\cdot\hat{\sigma}\right).$$
We will learn more about general rotations and how to compute the axis and rotation angles in Section 6.15.3. For now, we can focus on the implementation of rotations about the standard Cartesian (x, y, z) axes:
def Rotation(v, theta: float) -> Operator:
  v = np.asarray(v)
  if (v.shape != (3,) or not math.isclose(v @ v, 1) or
      not np.all(np.isreal(v))):
    raise ValueError('Rotation vector must be 3D real unit vector.')
  return Operator(np.cos(theta / 2) * Identity() - 1j * np.sin(theta / 2) *
                  (v[0] * PauliX() + v[1] * PauliY() + v[2] * PauliZ()))
$$e^{i\varphi} = \cos\varphi + i\sin\varphi,$$
$$e^{i\pi/2} = \cos(\pi/2) + i\sin(\pi/2) = i.$$
Note that there is an important difference between this and the RotationZ gate.
The S-gate only affects the second component of a qubit with its imaginary i, repre-
senting a 90◦ rotation. It leaves the first component of a qubit intact because of the
1.0 in the operator’s upper left corner. In contrast, the RotationZ gate affects both
components of a qubit. In code:
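The listing is omitted in this excerpt; a standalone sketch of the two gates side by side (plain numpy, not the book's constructors) makes the difference visible:

```python
import numpy as np

def s_gate():
  """S-gate: leaves the |0> component alone, multiplies |1> by i."""
  return np.array([[1.0, 0.0], [0.0, 1.0j]])

def rotation_z(theta):
  """Rz affects both components, with phases exp(-i theta/2), exp(i theta/2)."""
  return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])
```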
To see the difference between the RotationZ gate and the S-gate, you can run a
few simple experiments:
def test_rotation(self):
rz = ops.RotationZ(math.pi)
rz.dump('RotationZ pi/2')
rs = ops.Sgate()
rs.dump('S-gate')
psi = state.qubit(random.random())
psi.dump('Random state')
ops.Sgate()(psi).dump('After applying S-gate')
ops.RotationZ(math.pi)(psi).dump('After applying RotationZ')
See how the S-gate only affects the $|1\rangle$ component of the state, while the RotationZ gate affects both components?
We can spot a potential source of errors – the direction of rotations, especially
when porting code from other infrastructures that might interpret angle directions
differently. Fortunately, for much of this text, we are shielded from this problem.
However, it may be one of the first things to look out for when results do not match
expectations.
Finally, remember the Z-gate and how similar it looks to the phase gate? The
relationship is easy to see – applying two phase gates, each effecting a rotation of
π/2, yields a rotation of π, which we get from applying the Z-gate:
$$S^2 = SS = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = Z.$$
The next gate is the U1(lambda) gate, also known as the phase shift or phase kick gate:

$$U_1(\lambda) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\lambda} \end{pmatrix}.$$
It is similar to $R_k$, except arbitrary phase angles are allowed. In this text, we will only use power-of-2 fractions of π. The implementation of the gate itself is straightforward:

[...]
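A standalone sketch of the two constructors (plain numpy; the relation $R_k = U_1(2\pi/2^k)$ is the one verified by the test below):

```python
import cmath
import math
import numpy as np

def u1(lam):
  """Phase-shift gate: multiply the |1> amplitude by exp(i*lambda)."""
  return np.array([[1.0, 0.0], [0.0, cmath.exp(1j * lam)]])

def rk(k):
  """Rk is U1 restricted to power-of-2 fractions of 2*pi."""
  return u1(2 * math.pi / (2**k))
```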
You can verify this quickly:
def test_rk_u1(self):
for i in range(10):
u1 = ops.U1(2*math.pi / (2**i))
rk = ops.Rk(i)
self.assertTrue(u1.is_close(rk))
Some of the named gates are just special cases of $R_k$, in particular, the Identity gate, Z-gate, S-gate, and T-gate (which we define in Section 2.6.6 below). This test code helps to clarify:
def test_rk(self):
rk0 = ops.Rk(0)
self.assertTrue(rk0.is_close(ops.Identity()))
rk1 = ops.Rk(1)
self.assertTrue(rk1.is_close(ops.PauliZ()))
rk2 = ops.Rk(2)
self.assertTrue(rk2.is_close(ops.Sgate()))
rk3 = ops.Rk(3)
self.assertTrue(rk3.is_close(ops.Tgate()))
The root of a rotation is a rotation about the same axis, with the same direction, but
by half the angle. This is obvious from the exponential form:
$$\sqrt{e^{i\varphi}} = \left(e^{i\varphi}\right)^{1/2} = e^{i\varphi/2}.$$
The root of the Phase gate (the S-gate) is called the T-gate. The S-gate represents a phase of 90° around the z-axis; correspondingly, the T-gate is equivalent to a 45° phase around the z-axis:

$$T = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}.$$
def Tgate(d: int = 1) -> Operator:
  """T-gate, the square root of the S-gate."""
  return Operator(
      np.array([[1.0, 0.0],
                [0.0, cmath.exp(cmath.pi * 1j / 4)]])).kpow(d)
The root of the Y-gate has no special name (that we know of), but it is required later in the text, so we introduce it here as Yroot. It is defined as:

$$\sqrt{Y} = \frac{1}{2}\begin{pmatrix} 1+i & -1-i \\ 1+i & 1+i \end{pmatrix}.$$
There are other interesting roots, but these are the main ones we will encounter in
this text. We can test for the correct implementation of the roots with these snippets:
def test_t_gate(self):
"""Test that T^2 == S."""
t = ops.Tgate()
self.assertTrue(t(t).is_close(ops.Phase()))
def test_v_gate(self):
"""Test that V^2 == X."""
v = ops.Vgate()
self.assertTrue(v(v).is_close(ops.PauliX()))
def test_yroot_gate(self):
"""Test that Yroot^2 == Y."""
yr = ops.Yroot()
self.assertTrue(yr(yr).is_close(ops.PauliY()))
Finding a root in closed form can be quite cumbersome. In case of problems, you can simply use the scipy function scipy.linalg.sqrtm(). For this to work, scipy must be installed:
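For example, a matrix square root of the X-gate (the V-gate) can be computed numerically like this (a sketch; it assumes scipy is available):

```python
import numpy as np
from scipy.linalg import sqrtm

X = np.array([[0, 1], [1, 0]], dtype=complex)
V = sqrtm(X)   # a matrix square root of the X-gate
```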
def Projector(psi: state.State) -> Operator:
  """Construct a projection operator |psi><psi| from a state."""
  return Operator(psi.density())
Applying the projector for the $|0\rangle$ state to a random qubit yields the probability amplitude of the qubit being found in the $|0\rangle$ state (similarly for the projector onto the $|1\rangle$ state):
Projection operators are Hermitian, hence $P = P^\dagger$, but note that projection operators are not unitary and not reversible. If a projection operator's basis states are normalized, the projection operator is equal to its own square, $P = P^2$; it is idempotent. This is a result we will use later in the section on measurement. Similar to basis states, two projection operators are orthogonal if and only if their product is 0, which means that for each state $|\psi\rangle$:

$$P_{|0\rangle} P_{|1\rangle} |\psi\rangle = 0.$$
To generalize, you can think of the outer product $|r\rangle\langle c|$ as a two-dimensional index $[r, c]$ into a matrix. This is called the outer product representation of an operator:

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = a|0\rangle\langle 0| + b|0\rangle\langle 1| + c|1\rangle\langle 0| + d|1\rangle\langle 1|.$$
This also works for larger operators. For example, consider this two-qubit operator $U$ with just one nonzero element $\alpha$, sitting at row index 3 ($|11\rangle$) and column index 1 ($|01\rangle$):

$$U = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & \alpha & 0 & 0 \end{pmatrix}.$$

The outer product representation for the single nonzero element $\alpha$ in this operator would be $\alpha|11\rangle\langle 01|$, an index pattern of $|row\rangle\langle col|$. For derivations, this representation
can be more convenient than having to deal with full matrices. For example, to express
the application of the X-gate to a qubit, we would write:
$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = |0\rangle\langle 1| + |1\rangle\langle 0|,$$

$$\begin{aligned} X\left(\alpha|0\rangle + \beta|1\rangle\right) &= \left(|0\rangle\langle 1| + |1\rangle\langle 0|\right)\left(\alpha|0\rangle + \beta|1\rangle\right) \\ &= \alpha|0\rangle\underbrace{\langle 1|0\rangle}_{=0} + \beta|0\rangle\underbrace{\langle 1|1\rangle}_{=1} + \alpha|1\rangle\underbrace{\langle 0|0\rangle}_{=1} + \beta|1\rangle\underbrace{\langle 0|1\rangle}_{=0} \\ &= \beta|0\rangle + \alpha|1\rangle. \end{aligned}$$
Applying the Hadamard gate twice reverses the initial Hadamard gate. A Hadamard gate is its own inverse, $H = H^{-1}$, $HH = I$. It is involutory, just like the Pauli matrices:
$$HH = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I.$$
$$H^{\otimes n}|0\rangle^{\otimes n} = \frac{1}{\sqrt{2^n}} \sum_{x \in \{0,1\}^n} |x\rangle.$$
This construction is common for two and three qubits and used in many of the
algorithms and examples. Let’s spell it out explicitly:
$$(H \otimes H)\,|0\rangle \otimes |0\rangle = \frac{1}{2}\left(|00\rangle + |01\rangle + |10\rangle + |11\rangle\right),$$

$$\begin{aligned} (H \otimes H \otimes H)\,|0\rangle \otimes |0\rangle \otimes |0\rangle &= \frac{1}{\sqrt{2^3}}\left(|000\rangle + |001\rangle + |010\rangle + |011\rangle + |100\rangle + |101\rangle + |110\rangle + |111\rangle\right) \\ &= \frac{1}{\sqrt{2^3}}\left(|0\rangle + |1\rangle + |2\rangle + |3\rangle + |4\rangle + |5\rangle + |6\rangle + |7\rangle\right) \\ &= \frac{1}{\sqrt{2^3}} \sum_{x=0}^{7} |x\rangle = \frac{1}{\sqrt{2^3}} \sum_{x \in \{0,1\}^3} |x\rangle. \end{aligned}$$
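This construction is easy to verify numerically with plain numpy (a standalone sketch):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
psi = np.array([1, 0], dtype=complex)   # |0>

# (H ⊗ H ⊗ H) applied to |000> yields the uniform superposition over 8 states.
op = np.kron(np.kron(H, H), H)
psi3 = op @ np.kron(np.kron(psi, psi), psi)
```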
Although we have seen single-qubit gates and learned how to construct multi-qubit
states, a key ingredient to flexible computing is still missing. Where are the control
flow constructs that are so common in classical computing and that seem essential
for any type of algorithm? The quantum equivalents of those constructs are called
controlled gates, which we discuss next.
Quantum computing does not have classical control flow with branches around conditionally executed parts of the code. As described earlier, all qubits are active at all times. The quantum equivalent of control-dependent execution is the controlled gate. These are gates that are always applied but only show an effect under certain conditions. At least two qubits are involved: a controller qubit and a controlled qubit. Note that two-qubit gates of this form cannot be decomposed into single-qubit gates.
Let us explain the function of controlled gates by example. Consider how the following Controlled-Not matrix (abbreviated as CNOT, or CX) from qubit 0 to qubit 1 operates on all combinations of the $|0\rangle$ and $|1\rangle$ states:

$$CX_{0,1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$
Eagle-eyed readers will find the X-gate at the lower right side of this matrix and
the identity matrix in the upper left. This can be misleading, as we show below for the
controlled gate from 1 to 0. The important thing to note is that a Controlled-Not gate
is a permutation matrix.
Applying this matrix to states $|00\rangle$ and $|01\rangle$ leaves the states intact:

$$CX_{0,1}|00\rangle = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = |00\rangle,$$

$$CX_{0,1}|01\rangle = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} = |01\rangle.$$
The CX matrix flips the second qubit from $|0\rangle$ to $|1\rangle$, or from $|1\rangle$ to $|0\rangle$, but only if the first qubit is in state $|1\rangle$. The X-gate on the second qubit is controlled by the first qubit. Any 2 × 2 quantum gate can be controlled this way. We can have Controlled-Z gates, controlled rotations, or any other controlled 2 × 2 gates.

The CX gate is usually introduced, as we did here, by its effects on the $|0\rangle$ and $|1\rangle$ states of the second qubit. Only the amplitudes of the controlled qubit are being flipped. This is easy to see with the effects of the X-gate alone on a single qubit in superposition:

$$X|\psi\rangle = X\left(\alpha|0\rangle + \beta|1\rangle\right) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \beta \\ \alpha \end{pmatrix} = \beta|0\rangle + \alpha|1\rangle.$$
The CX matrix allows a first qubit to control an adjacent second qubit. What if the controller and controlled qubit are farther apart? The general way to construct a controlled unitary operator $U$ with the help of projectors is the following:

$$CU_{0,1} = P_{|0\rangle} \otimes I + P_{|1\rangle} \otimes U. \tag{2.1}$$
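The projector construction can be sketched in plain numpy, assuming the standard form $CU = P_{|0\rangle} \otimes I + P_{|1\rangle} \otimes U$ (an illustration, not the book's exact constructor):

```python
import numpy as np

p0 = np.array([[1, 0], [0, 0]], dtype=complex)   # |0><0|
p1 = np.array([[0, 0], [0, 1]], dtype=complex)   # |1><1|
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

# Controlled-X from qubit 0 to qubit 1: identity on the |0> branch,
# X on the |1> branch of the controlling qubit.
cx01 = np.kron(p0, I) + np.kron(p1, X)
```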
Note that for a Controlled-Not gate $CX_{1,0}$ from 1 to 0, you cannot find the original X-gate or identity matrix in the operator. This matrix is still just a permutation matrix:

$$CX_{1,0} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}.$$
If there are n qubits in between the controlling and controlled qubits, n identity matrices have to be tensored in between as well. If the index of the controlling qubit is larger than the index of the controlled qubit, the tensor products in Equation (2.1) need to be inverted. Here is an example with qubit 2 controlling gate U on qubit 0:

$$CU_{2,0} = I \otimes I \otimes P_{|0\rangle} + U \otimes I \otimes P_{|1\rangle}.$$
The corresponding code is straightforward. We have to make sure that the right number of identity matrices is being added to pad the operator:

def ControlledU(idx0: int, idx1: int, u: Operator) -> Operator:
  """Control the operator u at idx1, controlled by the qubit at idx0."""
  if idx0 == idx1:
    raise ValueError('Control and controlled qubit must not be equal.')
  p0 = Projector(state.zeros(1))
  p1 = Projector(state.ones(1))
  # space between qubits
  ifill = Identity(abs(idx1 - idx0) - 1)
  # 'width' of U in terms of Identity matrices
  ufill = Identity().kpow(u.nbits)
  [...]
What is clear from this code is that operators larger than 2 × 2 can be controlled
as well. We can construct Controlled-Controlled-... gates, which are required for most
interesting algorithms.
This code makes one big operator matrix. This can be a problem in larger circuits, e.g., for a circuit with 20 qubits with qubit 0 controlling qubit 19 (or any other padded operator), the operator will be a matrix of size $(2^{20})^2 \cdot$ sizeof(complex), which could be 8 or 16 terabytes5 of memory. Building such a large matrix in memory and

5 tebibytes, to be precise.
Note that this construction to control a gate by $|0\rangle$ works for any target gate. We will see several examples of this in the later sections.
Figure 2.2 The Toffoli gate (T) as a logic block: inputs A, B, C map to outputs A, B, and (A ∧ B) ⊕ C.
gate. Interestingly, this universality attribute does not hold in quantum computing. In quantum computing, there are only sets of universal gates (see also Section 6.15). This is how the Toffoli gate works: if the first two inputs are $|1\rangle$, it flips the third qubit. This is often shown in the form of a logic block diagram (with ∧ as the logical AND), as in Figure 2.2.
In matrix form, we can describe it using block matrices, with $0_n$ as an $n \times n$ null matrix. Note that changing the indices of the controller and controlled qubits may destroy these patterns, but the matrix will still be a permutation matrix:

$$\begin{pmatrix} I_4 & 0_4 \\ 0_4 & CX \end{pmatrix} = \begin{pmatrix} I_2 & 0_2 & 0_2 & 0_2 \\ 0_2 & I_2 & 0_2 & 0_2 \\ 0_2 & 0_2 & I_2 & 0_2 \\ 0_2 & 0_2 & 0_2 & X \end{pmatrix}.$$
The constructor code is fairly straightforward and a good example of how to construct a double-controlled gate:
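The constructor listing is not reproduced here; a standalone numpy sketch that builds the double-controlled X from projectors (an illustration, not the book's exact constructor) could look like this:

```python
import numpy as np

p0 = np.diag([1.0, 0.0]).astype(complex)   # |0><0|
p1 = np.diag([0.0, 1.0]).astype(complex)   # |1><1|
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

# Double-controlled X: apply X on qubit 2 only if qubits 0 and 1 are |1>.
cx12 = np.kron(p0, I) + np.kron(p1, X)            # CX from qubit 1 to 2
ccx = np.kron(p0, np.eye(4)) + np.kron(p1, cx12)  # control that by qubit 0

# |110> (index 6) flips to |111> (index 7).
psi = np.zeros(8, dtype=complex)
psi[6] = 1
```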
We observe that because we are able to construct quantum Toffoli gates, and because Toffoli gates are classical universal gates, it follows that quantum computers are at least as capable as classical computers.
Figure 2.3 The Fredkin gate (F) as a logic block: inputs A, B, C map to outputs A, (¬A ∧ B) ⊕ (A ∧ C), and (¬A ∧ C) ⊕ (A ∧ B).
$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{2.2}$$
Our approach to constructing controlled gates cannot produce this gate. However, it turns out that a sequence of three CNOT gates swaps the probability amplitudes of the basis states. For example, to swap qubits 0 and 1, you apply $CX_{1,0}\,CX_{0,1}\,CX_{1,0}$. This is analogous to classical computing, where a sequence of three XOR operations also swaps values. These techniques do not require additional temporary storage, such as a temporary variable or an additional helper qubit. There are other ways to construct Swap gates; some interesting examples are outlined in Gidney (2021b).
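The three-CNOT construction is easy to check numerically (a standalone numpy sketch, building both CNOT directions from projectors):

```python
import numpy as np

p0 = np.diag([1.0, 0.0]).astype(complex)
p1 = np.diag([0.0, 1.0]).astype(complex)
I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

cx01 = np.kron(p0, I) + np.kron(p1, X)   # qubit 0 controls qubit 1
cx10 = np.kron(I, p0) + np.kron(X, p1)   # qubit 1 controls qubit 0

# Three alternating CNOTs produce the Swap gate of Equation (2.2).
swap = cx10 @ cx01 @ cx10
```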
We have learned the basics of qubits, states, operators, and gates, and how to com-
bine them into larger circuits. We already hinted at a nice graphical way to visualize
circuits. Here is how it works.
Qubits are drawn from top to bottom. Like the ordering we described earlier, the
qubits are depicted from the “most significant” qubit to the “least significant” qubit.
This can be confusing because you may naturally consider the top qubit as “qubit 0,”
which, in classical computing, denotes the least significant bit. In analytical equations,
the top qubit will always be on the left of a state, such as the 1 in |1000i.
Here is another way to visualize this ordering. If we imagine the column of qubits
as a vector, transposing this vector will move the top qubit into the most significant
(left-most) slot.
Graphically, the qubits’ initial state is drawn to the left, and horizontal lines go to
the right, indicating how the state changes over time as operators are applied. Again,
note the absence of classical control flow. All gates are always active in a combined
state, which could be a product state or an entangled state. Computation flows from
left to right.
The initial state of three qubits appears like the following, with the initial state to the left of the circuit. It is conventional to always initialize qubits in state $|0\rangle$. However, because it is trivial to insert X-gates or Hadamard gates, we may take shortcuts and draw circuits as if these gates were present:

|0⟩ ─────
|1⟩ ─────
|+⟩ ─────
Also, note that the state of this circuit is the tensor product of the three qubits; it is
a combined state (in the example, the state is still separable). It is tempting to reason
that single, isolated qubits in the circuit diagram are in a particular state. However,
the reality is that qubits are always in a combined state with other qubits, either as
separable product states or entangled states.
Graphically, applying a Hadamard gate, or any other single-qubit operator, to the first qubit looks like the following. Before the operator is applied, the state is $\psi_0 = |0\rangle \otimes |0\rangle = [1, 0, 0, 0]^T$. After the operator is applied, the state is $\psi_1$. You can think about the state $\psi_1$ as the tensor product of the top qubit being in a superposition of $|0\rangle$ and $|1\rangle$ and the bottom qubit being in state $|0\rangle$.

      ψ0        ψ1
|0⟩ ──── H ────
|0⟩ ───────────
The initial state before the Hadamard gate is $\psi_0 = |0\rangle \otimes |0\rangle = |00\rangle$, the tensor product of the two $|0\rangle$ states. The Hadamard gate puts the top qubit into $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle) = |+\rangle$. As a result, the state $\psi_1$ is the tensor product of the top qubit with the bottom qubit $|0\rangle$:
$$\psi_1 = |+\rangle \otimes |0\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}} \otimes |0\rangle = \frac{1}{\sqrt{2}}\left(|00\rangle + |10\rangle\right). \tag{2.3}$$
Applying a Z-gate after the Hadamard gate to qubit 1 results in this circuit:

      ψ1          ψ2
|0⟩ ──── H ──────────
|0⟩ ───────── Z ─────
The fact that the Z-gate is to the right of the Hadamard gate indicates that this
operator should be applied after the Hadamard gate. We can think of this as two
separate gate applications – first the Hadamard H tensored with I , and then a second
application of I tensored with Z . This is the equivalent of applying just one two-
qubit operator O that has been constructed by multiplying the two tensor products (in
reverse order of application, see Section 2.8.1):
O = (I ⊗ Z )(H ⊗ I ).
Controlled-X gates are indicated with a solid dot for the controller qubit and the addition-modulo-2 symbol ⊕ for the controlled qubit (not to be confused with the symbol for the tensor product ⊗). Addition-modulo-2 behaves like the binary XOR function. If the controlled qubit is $|0\rangle$ and we apply the X-gate, the qubit becomes $|1\rangle$, as in 0 ⊕ 1 = 1. If the controlled qubit is $|1\rangle$, applying the X-gate will turn it into $|0\rangle$, as in 1 ⊕ 1 = 0. The XOR gives the same truth table as addition-modulo-2. In some instances, we may still want to see an X-gate, but again, these two are identical:
──●──       ──●──
  │     =     │
──⊕──       ─[X]─
Any single-qubit gate can be controlled this way – for example, the Z-gate.
The Controlled-Not-by-0 gate can be built by applying an X-gate before and after
the controller. It is drawn with an empty circle on the controlling qubit:

    ──X──●──X──     ──○──
         │       =    │
    ─────⊕─────     ──⊕──
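One way to see why the X-sandwich works is to build controlled gates from projectors. The sketch below uses plain NumPy (not the book's ops module) and the standard decomposition CU = |0⟩⟨0| ⊗ I + |1⟩⟨1| ⊗ U:

```python
import numpy as np

# Standalone sketch: assemble controlled gates from control-qubit
# projectors, then check that X-conjugating the control of a CNOT
# yields the Controlled-Not-by-0 gate.
X = np.array([[0, 1], [1, 0]])
I2 = np.eye(2)
P0 = np.diag([1.0, 0.0])                # |0><0|
P1 = np.diag([0.0, 1.0])                # |1><1|

cx = np.kron(P0, I2) + np.kron(P1, X)   # standard Controlled-X (CNOT)
cx0 = np.kron(P1, I2) + np.kron(P0, X)  # Controlled-Not-by-0

# X before and after on the control turns CNOT into CNOT-by-0:
conj = np.kron(X, I2) @ cx @ np.kron(X, I2)
```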
Figure 2.4 Measurement and flow of classical data after measurement, indicated with
double lines.
Swap gates are marked with two connected × symbols, as in the circuit diagram
below. As discussed, like any other gate, Swap gates can also be controlled:

    ──×──
      │
    ──×──

If a gate is controlled by more than one qubit, this can be indicated with multiple
solid or empty circles, depending on whether the gates are controlled by |1⟩ or |0⟩. In
the example below, qubits 0 and 2 must be |1⟩ (have an amplitude for this basis state)
and qubit 1 must be |0⟩ to activate the X-gate on qubit 3:

    ──●──
      │
    ──○──
      │
    ──●──
      │
    ──X──
We will talk more about measurements in Section 2.15. Measurement gates produce
real, classical values and are indicated with a meter symbol. Classical information
flow is drawn with double lines. In the example in Figure 2.4, measurements are being
made, and the real, classical measurement data may then be used to build or control
other unitary gates, U and V in the example.6
Consider a circuit that applies an X-gate, a Y-gate, and a Z-gate in sequence to a
single qubit. In code, we can apply the gates to the state from left to right, for example:
6 All circuit diagrams in this book were created using the excellent LaTeX quantikz package.
psi = state.zeros(1)
psi = ops.PauliX()(psi)
psi = ops.PauliY()(psi)
psi = ops.PauliZ()(psi)
Alternatively, we can combine the three gates into a single operator and apply it
only once:

psi = state.zeros(1)
op = ops.PauliX(ops.PauliY(ops.PauliZ()))
psi = op(psi)
But if you want to write this as an explicit matrix multiplication, the order reverses.
Take note of the parentheses – in Python, the function call operator has higher
precedence than the matrix multiply operator:
psi = state.zeros(1)
psi = (ops.PauliZ() @ (ops.PauliY() @ ops.PauliX()))(psi)
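The ordering rule can be verified directly with plain NumPy (a standalone sketch, not the book's state/ops library):

```python
import numpy as np

# Standalone sketch: applying X, then Y, then Z to a state equals one
# multiplication with the reversed matrix product Z @ Y @ X.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

psi = np.array([1, 0], dtype=complex)   # |0>
seq = Z @ (Y @ (X @ psi))               # left-to-right application
combined = (Z @ Y @ X) @ psi            # single combined operator
```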
2.9 Bloch Sphere

We have seen several ways to describe states and gates, including Dirac notation,
matrix notation, circuit notation, and code. You may prefer one over another in your
learning journey. Another representation is especially useful for visual learners: the
Bloch sphere, named after the famous physicist Felix Bloch.
A single complex number can be drawn in a 2D polar coordinate system by
specifying just a radius r and an angle φ to the x-axis, measured counterclockwise.
Complex numbers with radius r = 1 are restricted to the unit circle.
A qubit has two complex amplitudes, or four real degrees of freedom. It is normalized
because the probabilities of measuring a basis state must add up to 1.0, and (as we
will see in Section 2.10) a global phase has no physical meaning. Removing these two
degrees of freedom, two angles suffice⁷ to describe a qubit fully – it can be placed on
the surface of a sphere with radius 1.0, the unit sphere.
In this representation, the |0⟩ state is located at the north pole along the z-axis
and the |1⟩ state at the south pole of a sphere with radius 1. The |+⟩ state points
in the positive direction of the x-axis. Typically, the x-axis is drawn as pointing out
7 https://en.wikipedia.org/wiki/Qubit#Bloch_sphere_representation.
Figure 2.5 A Bloch sphere and the same sphere rotated counterclockwise by 90° about the
z-axis.
of the page. The corresponding |−⟩ state points in the negative x-direction, into the
page. The state |i⟩ is on the Bloch sphere's equator on the positive y-axis, which
is typically drawn to the right of the page. Correspondingly, |−i⟩ is on the negative
y-axis. The two spheres in Figure 2.5 are identical; the second sphere is rotated by
90° counterclockwise about the z-axis.
To see how to move about the sphere, let's start in state |0⟩, the north pole of
the sphere. Applying the Hadamard operator moves the state to |+⟩, which is on
the x-axis. This is shown in Figure 2.6a. Applying the Z-gate moves the arrow to
point to state |−⟩ on the negative x-axis. The relative phase is now π, as shown in
Figure 2.6b.
It is obvious that, with rotations, you can reach any point on this sphere by using
different paths or sequences of gate applications. Large rotations can also be broken
down into equivalent sequences of smaller rotations.
This can also be demonstrated in code. In the example below, we start out with state
|1⟩ and apply the Hadamard gate to change the state to |−⟩:

    H |1⟩ = (|0⟩ − |1⟩)/√2 = |−⟩.

Next, we apply an X-gate:

    X (|0⟩ − |1⟩)/√2 = (|1⟩ − |0⟩)/√2.
Finally, we apply the Hadamard gate again, resulting in the state −|1⟩. We should
wonder about the minus sign and whether −|1⟩ = |1⟩. We will show in Section 2.10
that we can ignore this minus sign because it is a global phase.
Figure 2.6 (a) Application of an H-gate. (b) Application of a Z-gate.
    H (|1⟩ − |0⟩)/√2 = 1/√2 [[1, 1], [1, −1]] · 1/√2 [−1, 1]ᵀ
                     = 1/2 [0, −2]ᵀ
                     = −[0, 1]ᵀ
                     = −|1⟩.
def basis_changes():
  """Explore basis changes via Hadamard."""
  psi = state.ones(1)        # generate |1> = [0, 1]
  psi = ops.Hadamard()(psi)  # H|1> = |->
  psi = ops.PauliX()(psi)    # X|-> = (|1> - |0>)/sqrt(2)
  psi = ops.Hadamard()(psi)  # H yields -|1>, a global phase on |1>
As useful as the Bloch sphere can be, it may also lead to confusion. For example,
the basis states |0⟩ and |1⟩ are orthogonal, but on the Bloch sphere they appear on
opposite poles (similarly for |+⟩, |−⟩ and |i⟩, |−i⟩). It may be tempting to think
that classical rules of vector addition apply, but they do not: adding |1⟩ and −|1⟩
does not equal |0⟩.
2.9.1 Coordinates
How do you compute the x, y, and z coordinates on the Bloch sphere for a given state
|ψ⟩? We do it in two simple steps:
1. Compute the outer product of state |ψ⟩ with itself to obtain its density matrix
   ρ = |ψ⟩⟨ψ|. We can use the convenience function density() for this, as
   described in Section 2.3.5.
2. Apply the helper function density_to_cartesian(rho), shown below, which
   returns the corresponding x, y, z coordinates.
The function density_to_cartesian(rho) computes the Cartesian coordinates
from a single-qubit density matrix. In the open-source repository, we add this function
in file lib/helper.py .
How does the math⁸ work? We stated in Section 2.6.2 that the Pauli matrices form
a basis for the space of 2 × 2 Hermitian matrices. A density operator ρ is a 2 × 2
Hermitian matrix; we can write it as:

    ρ = (I + xX + yY + zZ)/2 = 1/2 [[1 + z, x − iy], [x + iy, 1 − z]].   (2.5)
If we think of ρ as a matrix [[a, b], [c, d]], then

    2a = 1 + z,
    2c = x + iy,

and correspondingly x = 2 Re(c), y = 2 Im(c), and z = 2a − 1.
def density_to_cartesian(rho):
  """Compute Bloch sphere x, y, z coordinates from a density matrix."""
  a = rho[0, 0]
  c = rho[1, 0]
  x = 2.0 * c.real
  y = 2.0 * c.imag
  z = 2.0 * a - 1.0
  return np.real(x), np.real(y), np.real(z)
8 https://quantumcomputing.stackexchange.com/a/17180/11582.
def test_bloch(self):
psi = state.zeros(1)
x, y, z = helper.density_to_cartesian(psi.density())
self.assertEqual(x, 0.0)
self.assertEqual(y, 0.0)
self.assertEqual(z, 1.0)
psi = ops.PauliX()(psi)
x, y, z = helper.density_to_cartesian(psi.density())
self.assertEqual(x, 0.0)
self.assertEqual(y, 0.0)
self.assertEqual(z, -1.0)
psi = ops.Hadamard()(psi)
x, y, z = helper.density_to_cartesian(psi.density())
self.assertTrue(math.isclose(x, -1.0, abs_tol=1e-6))
self.assertTrue(math.isclose(y, 0.0, abs_tol=1e-6))
self.assertTrue(math.isclose(z, 0.0, abs_tol=1e-6))
Bloch spheres are only defined for single-qubit states. You can visualize an
individual qubit's Bloch sphere in a many-qubit system by tracing out all the other qubits
in the state. This is done with the partial trace procedure, a helpful tool that we
detail in Section 2.14.
2.10 Global Phase

What do we do with the minus sign on the final state after rotating the state about the
Bloch sphere? Is |1⟩ different from −|1⟩? The Bloch sphere does not allow simple
addition; |1⟩ plus −|1⟩ does not equal |0⟩! The answer is that the minus sign in −|1⟩
represents a global phase, a rotation by π. It has no physical meaning and can be
ignored.
This is an important insight in quantum computing. A global phase is a complex
coefficient e^{iφ} with norm 1.0 multiplying a state. Multiplying a state by such a
coefficient has no physical meaning because the expectation value of the state with or
without the coefficient does not change. Physicists also call this phase invariance.

The expectation value for an operator A on state |ψ⟩ is this expression (which we
will develop in Section 2.15 on measurement):

    ⟨ψ| A |ψ⟩.

For a state e^{iφ}|ψ⟩, the phase cancels out: ⟨ψ| e^{−iφ} A e^{iφ} |ψ⟩ = ⟨ψ| A |ψ⟩.
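A quick standalone NumPy check (a hypothetical example, not from the book) confirms that a global phase leaves the expectation value untouched:

```python
import numpy as np

# Hypothetical check: a global phase e^{i phi} cancels in <psi|A|psi>.
psi = np.array([1, 1]) / np.sqrt(2)      # |+>
phased = np.exp(1j * 0.7) * psi          # same state, arbitrary global phase

A = np.array([[0, 1], [1, 0]])           # an observable, here Pauli X
ev = np.conj(psi) @ A @ psi
ev_phased = np.conj(phased) @ A @ phased
```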
2.11 Entanglement
The entanglement of two or more qubits is one of the most fascinating aspects of
quantum physics. When two qubits (or systems) are entangled, it means that, on
measurement, the results are strongly correlated, even when the states were physically
separated, be it by a millimeter or across the universe! This is the effect that Albert
Einstein famously called “spooky action at a distance.” If we entangle two qubits in a
specific way (described below), and qubit 0 is measured to be in state |0⟩, qubit 1 will
always be found in state |0⟩ as well.
Why is this truly remarkable? What if we took two coins, placed them heads-up in
two boxes, and shipped one of those boxes to Mars. When we open up the boxes, they
will both show heads. So what’s so special about the quantum case? In this example,
coins have hidden state. We have placed them in the boxes before shipment, knowing
which side to put on top in an initial, defined, nonprobabilistic state. We also know
that this state will not change during shipment.
If there were hidden state in quantum mechanics, then the theory would be
incomplete; the quantum mechanical wave functions would be insufficient to describe a
physical state in full. This was the point that Einstein, Podolsky, and Rosen attempted
to make in their famous EPR paper (Einstein et al., 1935).
However, a few decades later it was shown that there cannot be a hidden state in an
entangled quantum system. A famous thought experiment, the Bell inequalities (Bell,
1964), proved this and was later experimentally confirmed.
Qubits collapse probabilistically during measurement to either |0⟩ or |1⟩.⁹ This is
equivalent to putting the coins in the boxes while they are twirling on their edges.
Only when we open the boxes will the coins fall to one of their sides. Perfect coins
would fall to each side 50% of the time. Similarly, if we prepare a qubit in the |0⟩
state and apply a Hadamard gate to it, this qubit will measure as either |0⟩ or |1⟩, with
50% probability for each outcome. The magic of quantum entanglement means that
both qubits of an entangled pair will measure the same value, either |0⟩ or |1⟩, 100%
of the time. This is equivalent to the coins falling to the same side, 100% of the time,
on Earth and Mars!
There are profound philosophical arguments about entanglement, measurement,
and what they tell us about the very nature of reality. Many of the greatest physicists of
the last century have argued over this, for decades – Einstein, Schrödinger, Heisenberg,
9 This is true as long as we measure in this basis. We talk about measurements in different bases in
Section 6.11.3.
Bohr, and many others. These discussions are not settled to this day; there is no
agreement. Many books and articles have been written about this topic, explaining
it better than we would be able to do here. We are not even going to try. Instead, we
accept the facts for what they are: rules we can exploit for computation.
This sentiment might put us in the camp of the Copenhagen interpretation of quan-
tum mechanics (Faye, 2019). Ontology is a fancy term for questions like “What is?”
or “What is the nature of reality?” The Copenhagen interpretation refuses to answer
all ontological questions. To quote what David Mermin said about it (Mermin, 1989,
p. 2): "If I were forced to sum up in one sentence what the Copenhagen interpretation
says to me, it would be 'Shut up and calculate!'" The key here is, of course, that
progress can be made even if the ontological questions remain unanswered.
Proof As a quick proof, assume two qubits q0 = [i, k]ᵀ and q1 = [m, n]ᵀ. Their
Kronecker product is q0 ⊗ q1 = [im, in, km, kn]ᵀ. Multiplying the outer elements and
the inner elements, corresponding to the ad = bc form above, we see that
(im)(kn) = (in)(km) always holds – every product state satisfies the condition.
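A numerical spot-check of this identity, in plain NumPy (a standalone sketch) with randomly chosen, unnormalized qubits:

```python
import numpy as np

# Spot-check: any Kronecker product [a, b, c, d] of two single-qubit
# vectors satisfies a*d == b*c, the product-state condition.
rng = np.random.default_rng(0)
q0 = rng.standard_normal(2) + 1j * rng.standard_normal(2)
q1 = rng.standard_normal(2) + 1j * rng.standard_normal(2)

a, b, c, d = np.kron(q0, q1)
```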
    |0⟩ ──H──●──
             │
    |0⟩ ─────⊕──
The initial state |ψ0⟩, before the Hadamard gate, is the tensor product of the two |0⟩
states, which is |00⟩ with a state vector of [1, 0, 0, 0]ᵀ. The Hadamard gate puts the first
qubit in a superposition of |0⟩ and |1⟩. The state |ψ1⟩ after the Hadamard gate becomes
the tensor product of the superposition of the first qubit with the second qubit:

    |ψ1⟩ = (|0⟩ + |1⟩)/√2 ⊗ |0⟩ = 1/√2 (|00⟩ + |10⟩).
In code, we compute this with the following snippet. The print statement produces
a state with nonzero entries at indices 0 and 2, corresponding to the states |00⟩
and |10⟩:
psi = state.zeros(2)
op = ops.Hadamard() * ops.Identity()
psi = op(psi)
print(psi)
>>
2-qubit state. Tensor:
[0.70710677+0.j 0. +0.j 0.70710677+0.j 0. +0.j]
Now we apply the Controlled-Not gate. The |0⟩ part of the first qubit in superposition
does not impact the second qubit; the |00⟩ part remains unchanged. However, the
|1⟩ part of the superposed first qubit controls the second qubit and will flip it to
|1⟩, changing the |10⟩ part to |11⟩. The resulting state |ψ2⟩ after the Controlled-Not
gate thus becomes:

    |ψ2⟩ = (|00⟩ + |11⟩)/√2.
This state corresponds to this state vector:

    |ψ2⟩ = 1/√2 [1, 0, 0, 1]ᵀ = [a, b, c, d]ᵀ.
This state is now entangled because the ad = bc identity in the rule above does not
hold: the product of elements 0 and 3 is 1/2, but the product of elements 1 and 2 is 0.
The state can no longer be expressed as a product state.
In code, we take the state psi we computed above and apply the Controlled-Not.
This prints the entangled 2-qubit state, with elements 0 and 3 having values 1/√2,
corresponding to the binary indices of the |00⟩ and |11⟩ states.
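For readers without the book's library at hand, the whole entangling circuit fits in a few lines of plain NumPy (a standalone sketch):

```python
import numpy as np

# Standalone sketch of the entangling circuit: H on qubit 0, then CNOT.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

psi = np.array([1, 0, 0, 0])              # |00>
psi = CNOT @ (np.kron(H, I2) @ psi)       # (|00> + |11>) / sqrt(2)

a, b, c, d = psi
entangled = not np.isclose(a * d, b * c)  # product-state test fails
```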
Entanglement also means that now only the states |00⟩ and |11⟩ can be measured. The
other two basis states have a probability of 0.0 and cannot be measured. If qubit 0
is measured as |0⟩, the other qubit will be measured as |0⟩ as well, since the only
nonzero-probability state with a |0⟩ for the first qubit is |00⟩. Similar logic applies for
state |1⟩.
This explains the correlations – the spooky action at a distance – at least
mathematically. The measurement results of the two qubits are 100% correlated. We don't know
why this is, what physical mechanism facilitates this effect, or what reality is. But, at
least for simple circuits and their respective matrices, we now have a means to express
this unreal-feeling reality.
The four Bell states are:

    β00 = (|00⟩ + |11⟩)/√2 = 1/√2 [1, 0, 0, 1]ᵀ,
    β01 = (|01⟩ + |10⟩)/√2 = 1/√2 [0, 1, 1, 0]ᵀ,
    β10 = (|00⟩ − |11⟩)/√2 = 1/√2 [1, 0, 0, −1]ᵀ,
    β11 = (|01⟩ − |10⟩)/√2 = 1/√2 [0, 1, −1, 0]ᵀ.
Here is the code to construct these states (in file lib/bell.py in the open-source
repository):
    |0⟩ ──H──●─────
             │
    |0⟩ ─────⊕──●──
                │        →  (|000⟩ + |111⟩)/√2
    |0⟩ ────────⊕──
We then define maximally entangled states the following way. The partial trace allows
reasoning about a subspace of a state, as explained in Section 2.14; it allows tracing
out parts of a state. What is left is a reduced density matrix representing the reduced
state, the subspace. We call a state maximally entangled if the remaining reduced
density matrices are maximally mixed, meaning their diagonal elements are all the
same.
There is another profound difference between classical computing and quantum
computing. In classical computing, it is always possible to copy a bit, a byte, or any
region of memory, as many times as we like. In quantum computing, this is verboten – it is
generally impossible to clone the state of a given qubit. This restriction is related
to the topic of measurements and the fact that it is impossible to create a measurement
device that does not impact (entangle with) a state. The inability to copy is expressed
with the so-called No-Cloning Theorem (Wootters and Zurek, 1982).
Theorem 2.1 Given a general quantum state |ψ⟩ = |φ⟩|0⟩, there cannot exist a
unitary operator U such that U|ψ⟩ = |φ⟩|φ⟩.
2.13 Uncomputation
• For constructions like the multi-controlled gate, as we will see in Section 3.2.8, we
need additional qubits in order to properly perform the computation. You may
think of these qubits as temporary qubits, or helper qubits, that play no essential
role for the algorithm. They are the equivalent of compiler-allocated stack space
mitigating classical register pressure. These qubits are called ancilla qubits, or
ancillae.
• Ancilla qubits may start in state |0⟩ and also end up in state |0⟩ after a construction
like the multi-controlled gate. In other scenarios, however, ancillae may remain
entangled with a state, potentially destroying a desired result. In this case, we call
these ancillae junk qubits, or simply junk.
The typical structure of a quantum computation looks like that shown in Figure 2.7.
All quantum gates are unitary, so we pretend we packed them all up in one giant
unitary operator U_f. There is the input state |x⟩ and some ancilla qubits, all initialized
to |0⟩. The result of the computation will be f(|x⟩) and some leftover ancillae, which
are now junk; they serve no purpose, they just hang around, intent on messing up
[Figure 2.7: The operator U_f takes the input |x⟩ and ancillae |0⟩ to f(|x⟩) and junk.]

[Figure 2.8: The result f(x) is copied out to a fresh |0⟩ register; applying U_f and then
its adjoint U_f† restores |x⟩ and the |0⟩ ancillae.]
our results. The problem is that the junk qubits may still be entangled with the result,
nullifying the intended effects of quantum interference, which quantum algorithms are
based upon. We will not be exposed to this problem until Section 6.6, where we have
to solve this problem for the order finding part of Shor’s algorithm.
Here is the procedure, as shown in Figure 2.8. After computing a solution, we apply
the inverse unitary operations to undo the computation completely. We can either
build a giant combined unitary adjoint operator, or, if we have constructed a circuit
from individual gates, we apply the inverses of the gates in reverse order. This works
because operators are unitary and U † U = I .
The problem now is that we have lost the result f(|x⟩) that we were trying to compute.
Here is the "trick" to work around the problem, which is similar to Bennett's recipe.
After computation, but before uncomputation, we connect the result qubits out to
another quantum register via Controlled-Not gates, as shown in Figure 2.8.
With this circuit, the result of f (|xi) will be in the upper register, and the state
of the other registers will be restored to their original state, eliminating all unwanted
entanglement.
Why does this work at all? We start in the state composed of an input state |xi and
a working register initialized with |0i. The first U f transforms the initial state |xi|0i
into f (|xi)g(|0i). We add the ancillae register at the top, building a product state
|0i f (|xi)g(|0i), and CNOT the register holding the result to get f (|xi) f (|xi)g(|0i).
This does not violate the no-cloning theorem; the two registers cannot be measured
independently and give the same result. We apply U_f† to the bottom two registers
to uncompute U_f and obtain f(|x⟩)|x⟩|0⟩. The final result is in the top register; the
bottom registers have been successfully restored.
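The full compute-copy-uncompute pattern can be simulated on a toy example. The sketch below is plain NumPy and entirely hypothetical: it picks f(x) = x, computed by a CNOT into the working qubit w, so that w would remain entangled junk without uncomputation:

```python
import numpy as np

# Toy compute-copy-uncompute, entirely hypothetical. Qubit order is
# (out, x, w). U_f computes f(x) = x into the working qubit w via CNOT.
def cnot(n, control, target):
    """Full 2^n permutation matrix for a CNOT between two of n qubits."""
    dim = 2 ** n
    m = np.zeros((dim, dim))
    for i in range(dim):
        bits = [(i >> (n - 1 - k)) & 1 for k in range(n)]
        if bits[control]:
            bits[target] ^= 1
        m[int(''.join(map(str, bits)), 2), i] = 1
    return m

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
Uf = cnot(3, 1, 2)        # compute f(x) = x into w
copy = cnot(3, 2, 0)      # copy the result w -> out

psi = np.zeros(8); psi[0] = 1                          # |000>
psi = np.kron(np.eye(2), np.kron(H, np.eye(2))) @ psi  # put x into |+>
psi = Uf.conj().T @ (copy @ (Uf @ psi))                # compute, copy, uncompute

# Only |000> and |110> remain: w is restored to |0>, and out holds f(x).
```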
we want to trace out qubit 0, so the |0⟩ and |1⟩ states come first and are tensored with
an identity matrix:

    |0_A⟩ = [1, 0]ᵀ ⊗ I = [[1, 0], [0, 1], [0, 0], [0, 0]],
    |1_A⟩ = [0, 1]ᵀ ⊗ I = [[0, 0], [0, 0], [1, 0], [0, 1]].
Multiplying the 4 × 4 density to the right with a 4 × 2 matrix results in a 4 × 2
matrix. Multiplying this matrix from the left with a (now transposed) 2 × 4 matrix
results in a 2 × 2 matrix.
This definition of the partial trace is sufficient for implementation. The following
code is written in full matrix form and will only scale to a small number of qubits.
def TraceOutSingle(rho, index):
  """Trace out the qubit at `index` from density matrix `rho`."""
  nbits = int(math.log2(rho.shape[0]))
  if index > nbits:
    raise AssertionError(
        'Error in TraceOutSingle, invalid index (>nbits).')

  eye = Identity()
  zero = Operator(np.array([1.0, 0.0]))
  one = Operator(np.array([0.0, 1.0]))

  p0 = p1 = tensor.Tensor(1.0)
  for idx in range(nbits):
    if idx == index:
      p0 = p0 * zero
      p1 = p1 * one
    else:
      p0 = p0 * eye
      p1 = p1 * eye

  rho0 = p0 @ rho
  rho0 = rho0 @ p0.transpose()
  rho1 = p1 @ rho
  rho1 = rho1 @ p1.transpose()
  rho_reduced = rho0 + rho1
  return rho_reduced
If we have a state of n qubits and are interested in the state of just one of the qubits,
we have to trace out all other qubits. Here is a convenience function for this:
def TraceOut(rho, index_set):
  """Trace out all qubits in `index_set` from density matrix `rho`."""
  for idx, val in enumerate(index_set):
    nbits = int(math.log2(rho.shape[0]))
    if val > nbits:
      raise AssertionError('Error TraceOut, invalid index (>nbits).')
    rho = TraceOutSingle(rho, val)

    # Tracing out a qubit means that rho is now 1 qubit smaller; the
    # indices to the right of the traced-out qubit shift left by 1.
    # Example, to trace out qubits 2 and 4:
    # Before:
    #   qubit 0 1 2 3 4 5
    #         a b c d e f
    # Trace out 2:
    #   qubit 0 1 <- 3 4 5
    #   qubit 0 1  2 3 4
    #         a b  d e f
    # Trace out 4 (is now 3):
    #   qubit 0 1 2 <- 4
    #   qubit 0 1 2  3
    #         a b d  f
    for i in range(idx + 1, len(index_set)):
      index_set[i] = index_set[i] - 1
  return rho
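For comparison, here is a compact standalone partial trace in plain NumPy (using reshape and einsum rather than the projection-matrix approach above), applied to the first Bell state:

```python
import numpy as np

# Standalone partial-trace sketch for a 2-qubit density matrix: reshape
# rho into a 4-index tensor and sum over the traced-out qubit's indices.
def trace_out_qubit0(rho):
    t = rho.reshape(2, 2, 2, 2)       # indices (q0, q1, q0', q1')
    return np.einsum('abad->bd', t)   # trace over qubit 0

psi = np.array([1, 0, 0, 1]) / np.sqrt(2)   # first Bell state
rho = np.outer(psi, psi.conj())
reduced = trace_out_qubit0(rho)             # I/2: maximally mixed
```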
2.14.1 Experiments
Now let us see this procedure in action. We start by producing a state from two well-
defined qubits.
q0 = state.qubit(alpha=0.5)
q1 = state.qubit(alpha=0.8660254)
psi = q0 * q1
Tracing out one qubit should leave the other in the resulting density matrix, with
matrix element (0, 0) holding the value |α|² and matrix element (1, 1) holding the value
|β|². Remember that the density matrix is the outer product of a state vector with itself:

    [a, b]ᵀ [a*, b*] = [[aa*, ab*], [ba*, bb*]].
We have seen earlier that

    tr(|x⟩⟨y|) = Σ_{i=0}^{n−1} xi yi* = ⟨y|x⟩.   (2.7)
For an outer product of a state vector, we know that the trace must be 1.0, as the
probabilities add up to 1.0. Correspondingly, density matrices also must have a trace
of 1.0. This is confirmed above; the diagonal elements are the squares of the norms of
the probability amplitudes.
Tracing out qubit 1 should result in a value of 0.5 ∗ 0.5 = 0.25 in matrix element
(0, 0), which is the norm squared of qubit 0’s α value of 0.5:
Tracing out qubit 0 should leave 0.86602542 = 0.75 in matrix element (0, 0):
This becomes interesting for entangled states. Take, for example, the first Bell state.
If we compute the density matrix and square it, the trace of the squared matrix is 1.0.
If we trace out qubit 0:
psi = bell.bell_state(0, 0)
reduced = ops.TraceOut(psi.density(), [0])
self.assertTrue(math.isclose(np.real(np.trace(reduced)),
1.0, abs_tol=1e-6))
self.assertTrue(math.isclose(np.real(reduced[0, 0]),
0.5, abs_tol=1e-6))
self.assertTrue(math.isclose(np.real(reduced[1, 1]),
0.5, abs_tol=1e-6))
We see that the result is I/2: the diagonal elements are all the same, so the state was
maximally entangled. The trace of the squared, reduced density matrix is 0.5. We
mentioned pure and mixed states above. The trace operation gives us a mathematical
definition:

    tr(ρ²) < 1 : mixed state,
    tr(ρ²) = 1 : pure state.
The result for the partial trace on the Bell state shows that the remaining qubit is in
a mixed state:

    tr((I/2)²) = 0.5 < 1.
The joint state of the two qubits, entangled or not, is a pure state; it is known exactly,
it is not entangled further with the environment. However, looking at the individual
qubits of the entangled Bell state, we find that those are in a mixed state – we do not
have full knowledge of their state.
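The purity test tr(ρ²) is a one-liner in NumPy; this standalone sketch reproduces the numbers above:

```python
import numpy as np

# Standalone sketch: purity tr(rho^2) separates pure from mixed states.
psi = np.array([1, 0, 0, 1]) / np.sqrt(2)         # first Bell state
rho = np.outer(psi, psi)                          # joint state

purity_joint = np.trace(rho @ rho).real           # pure: 1.0
reduced = np.eye(2) / 2                           # one traced-out qubit
purity_single = np.trace(reduced @ reduced).real  # mixed: 0.5
```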
2.15 Measurement
We have arrived at the end of the introductory section. What remains is to discuss
measurements. This is a complex subject with many subtleties and layered theories.
Here, we stick to the minimum – projective measurements.
In the preceding chapters, we already captured postulates one and two by express-
ing states as full state vectors and showing how to apply unitary operators. By means
of probability amplitudes, we have implicitly used postulate four and, to some degree,
postulate five as well. In this section, we will focus on postulate three, measurements.
An important thing to note is that the postulates are postulates, not standard physical
laws. As noted above, they are also the subject of almost a century’s worth of scientific
disputes and philosophical interpretation. See, for example, Einstein et al. (1935),
Bell (1964), Norsen (2017), Faye (2019), and Ghirardi and Bassi (2020), among many
others. Nevertheless, as indicated before, we avoid philosophy and focus on how the
postulates enable interesting forms of computation.
energy state high or low, the idea behind making a projective measurement is to simply
determine which of those two states the atom was in. Qubits are in superposition. We
may wonder whether a qubit is in the |0⟩ state or the |1⟩ state, but measurement can only
return one of these two states, which it does with a given probability. We know that
after measurement, according to the Born rule, the state collapses to the measured
state (postulate 5). It is now either |0⟩ or |1⟩ and will be found in this state for all
future measurements.
Why is this called a projective measurement? In the section on single-qubit
operators, we already learned about projection operators:

    P|0⟩ = |0⟩⟨0| = [1, 0]ᵀ [1, 0] = [[1, 0], [0, 0]],
    P|1⟩ = |1⟩⟨1| = [0, 1]ᵀ [0, 1] = [[0, 0], [0, 1]].
Applying a projector to a qubit "extracts" a subspace. For example, for P|0⟩:
Following Equation (1.2) to compute the vector norm and Equation (1.10) to
compute the Hermitian adjoint of an expression, we have

    Pr(i) = (P|i⟩ |ψ⟩)† (P|i⟩ |ψ⟩) = ⟨ψ| P|i⟩† P|i⟩ |ψ⟩.
We also know that for projectors of normalized basis vectors (which have 1s on the
diagonal):

    P|i⟩² = P|i⟩,

and, since projectors are Hermitian, P|i⟩† = P|i⟩, so the probability simplifies to
Pr(i) = ⟨ψ| P|i⟩ |ψ⟩. The term ⟨ψ| P|i⟩ |ψ⟩ is also called the expectation value of
operator P|i⟩, the quantum equivalent of the average of P|i⟩, commonly written ⟨P|i⟩⟩.
We know from the section on the trace of a matrix (Equation (1.19)) that

    tr(|x⟩⟨y|) = Σ_{i=0}^{n−1} xi yi* = ⟨y|x⟩.   (2.8)
By rearranging terms and using Equation (2.8), we finally arrive at the form we will
use in our code:

    Pr(i) = ⟨ψ| P|i⟩ |ψ⟩ = tr(P|i⟩ |ψ⟩⟨ψ|).   (2.9)
You can understand this form intuitively. The density matrix |ψ⟩⟨ψ| of the state
has the probabilities Pr(i) for each basis state |xi⟩ on the diagonal, as shown in
Section 2.3.5. The projector zeros out all diagonal elements that are not covered by
the projector's basis state. What remains on the diagonal are the probabilities of states
matching the projector. The trace then adds up these remaining probabilities along the
diagonal.
After measurement, the state collapses to the measured result. Basis states that
do not agree with the measured qubit values get a resulting probability of 0.0 and
"disappear." As a result, the remaining states' probabilities no longer add up to 1.0 and
need to be renormalized, which we achieve with the complicated-looking expression
(no worries, in code this will look quite simple):

    |ψ⟩ = P|i⟩ |ψ⟩ / √(⟨ψ| P|i⟩ |ψ⟩).   (2.10)
As an example, let us assume we have this state:

    |ψ⟩ = 1/2 (|00⟩ + |01⟩ + |10⟩ + |11⟩).

Each of the four basis states has equal probability (1/2)² = 1/4 of being measured.
Let us further assume that qubit 0 is measured as |0⟩. This means that the only
choices for the final, full state to be measured are |00⟩ or |01⟩. The first qubit is "fixed"
at |0⟩ after measurement. This means that the states where qubit 0 is |1⟩ now have a
0% probability of ever being measured. The state collapsed to this unnormalized state:

    |ψ⟩ = 1/2 (|00⟩ + |01⟩) + 0.0 (|10⟩ + |11⟩).
But in this form, the squares of the probability amplitudes no longer add up to 1.0.
We must renormalize the state following Equation (2.10) and divide by the square root
of the expectation value (which was 1/2) to get:

    |ψ⟩ = 1/√2 (|00⟩ + |01⟩).
This step might be surprising. How does Nature know when and if to normalize?
Given that we're adhering to the Copenhagen interpretation and decided to "Shut up
and calculate!", a possible answer is that the need for renormalization is simply a
remnant of the mathematical framework, nothing more, nothing less.
2.15.3 Implementation
The function to measure a specific qubit has the following parameters:
The way this function is written, if we measure and collapse to state |0⟩, the state
will be made to collapse to this state, independent of the probabilities. There are
other ways to implement this, e.g., by selecting the measurement result based on the
probabilities. At this early point in our exploration, the ability to force a result works
quite well; it makes debugging easier. Care must be taken, though, never to force the
state to collapse to a result with probability 0. This would lead to a division by 0 and
likely very confusing subsequent measurement results.
The function returns two values: the probability of measuring the desired qubit state
and a state. This state is either the post-measurement collapsed state, if collapse was
set to True, or the unmodified state otherwise. Here is the implementation. The func-
tion first computes the density matrix and the padded operator around the projection
operator:
The probability is computed from a trace over the matrix resulting from
multiplication of the padded projection operator with the density matrix, as in
Equation (2.9):
If state collapse is required, we update the state and renormalize it before returning
the updated (or unmodified) probability and state.
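Putting Equations (2.9) and (2.10) together, a standalone NumPy version of such a measurement function (a sketch, not the book's ops.Measure) might look like this:

```python
import numpy as np

# Standalone sketch of projective measurement following Equations (2.9)
# and (2.10): measure qubit `idx` of an n-qubit state as `tostate` (0/1).
def measure(psi, idx, tostate=0, collapse=True):
    nbits = int(np.log2(psi.shape[0]))
    proj = np.diag([1.0, 0.0]) if tostate == 0 else np.diag([0.0, 1.0])
    # Pad the single-qubit projector with identities on the other qubits.
    op = np.array([[1.0]])
    for i in range(nbits):
        op = np.kron(op, proj if i == idx else np.eye(2))
    rho = np.outer(psi, psi.conj())
    prob = np.trace(op @ rho).real       # Equation (2.9)
    if collapse:
        psi = (op @ psi) / np.sqrt(prob)  # Equation (2.10)
    return prob, psi

# Bell state: measuring qubit 0 as |0> has probability 0.5 and collapses
# the state to |00>.
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
p, post = measure(bell, 0, 0)
```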
And just to clarify one more time: the measurement operators are projectors. They
are Hermitian with eigenvalues 0 and 1, and eigenvectors |0⟩ and |1⟩. A measurement
will produce |0⟩ or |1⟩, corresponding to the basis states' probabilities. Measurement
will not measure, say, a value of 0.75; it will measure one of the two basis
states with probability 0.75. This can be a source of confusion for beginners – in the
real world, we have to measure several times to find the probabilities with statistical
significance.
2.15.4 Examples
Let us look at a handful of examples to see measurements in action. In the first
example, let us create a 4-qubit state and look at the probabilities:
psi = state.bitstring(1, 0, 1, 0)
psi.dump()
>>
|1010> (|10>): ampl: +1.00+0.00j prob: 1.00 Phase: 0.0
There is only one basis state with a nonzero probability. If we measure the second
qubit to be 0, which it is, we get a probability of 1.0:
p0, _ = ops.Measure(psi, 1)
print(p0)
>>
1.0
But if we try to measure this second qubit to be 1, which it cannot be, we will get
an error, as expected. Next, let us look at a Bell state:
psi = bell.bell_state(0, 0)
psi.dump()
>>
|00> (|0>): ampl: +0.71+0.00j prob: 0.50 Phase: 0.0
|11> (|3>): ampl: +0.71+0.00j prob: 0.50 Phase: 0.0
The state has only two possible measurement outcomes, |00⟩ and |11⟩. Let us measure
the first qubit to be |0⟩ without collapsing the state:
psi = bell.bell_state(0, 0)
p0, _ = ops.Measure(psi, 0, 0, collapse=False)
print('Probability: ', p0)
psi.dump()
>>
Probability: 0.49999997
|00> (|0>): ampl: +0.71+0.00j prob: 0.50 Phase: 0.0
|11> (|3>): ampl: +0.71+0.00j prob: 0.50 Phase: 0.0
This shows the correct probability of 0.5 of measuring |0⟩, but the state is still
unmodified. Now we collapse the state after the measurement, which is more reflective of
making an actual, physical measurement:
psi = bell.bell_state(0, 0)
p0, psi = ops.Measure(psi, 0, 0, collapse=True)
print('Probability: ', p0)
psi.dump()
>>
Probability: 0.49999997
|00> (|0>): ampl: +1.00+0.00j prob: 1.00 Phase: 0.0
Now only one possible measurement outcome remains, the state |00⟩, which from here
on out would be measured with 100% probability.
At this point we have mastered the fundamental concepts and are ready to move on
to studying our first quantum algorithms. There are two possible paths forward. You
may want to explore Chapter 3 on simple algorithms next before learning more about
infrastructure and high-performance simulation in Chapter 4. Or, you may prefer
reading Chapter 4 on infrastructure first before exploring Chapter 3 on simple algorithms.
3 Simple Algorithms
In this chapter, we introduce a first set of quantum algorithms. All that is needed to
follow along is the background and infrastructure from Chapter 2. What makes
the algorithms presented here simple compared to the complex algorithms in Chapter
6? A judgment call. To justify the judgment call: the algorithms in this chapter
are typically shorter and require less preparation or background than those in later
chapters. Additionally, the derivations are developed in great detail. Many of these
techniques will be taken for granted in later sections.
In this chapter, we start with what is possibly the simplest algorithm of all: a
quantum random number generator. We follow this with a range of gate equivalences –
how one gate, or gate sequence, can be expressed by another. Armed with these basic
tools, we implement a classical full adder but with quantum gates. This circuit does
not yet exploit superposition or entanglement.
Then it gets more exciting. We describe a swap test, which allows measurement
of the similarity between two states without directly measuring the states themselves.
We describe two algorithms with very cool names that utilize entanglement – quantum
teleportation and superdense coding.
After this we move on to three so-called oracle algorithms. These algorithms exploit
superposition and compute solutions in parallel using a large unitary operator. These
are the first quantum algorithms we explore that perform better than their classical
counterparts.
3.1 Random Number Generator

Every programming system introduces itself with the equivalent of a "Hello World"
program. In quantum computing, this may be a random number generator. It is the
simplest possible quantum circuit that does something meaningful, and it does so with
just one qubit and one gate:
(Circuit: a single qubit |0⟩ passes through a Hadamard gate into |+⟩ and is measured, yielding |0⟩ or |1⟩.)
psi = ops.Hadamard()(state.zero)
psi.dump()
>>
0.70710678+0.00000000i |0> 50.0%
0.70710678+0.00000000i |1> 50.0%
Since we can construct a single, completely random qubit, which we interpret
as a classical bit after measurement, bundling several such bits in parallel or in
sequence allows the generation of random numbers of any bit width. By random,
we mean true, atomic-level randomness, not classical pseudo-randomness. A classical
computer with a finite amount of memory has only a finite (enumerable) number of
possible states. Hence random numbers generated on a classical computer are not
truly random; their sequence will eventually repeat itself.
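To make this concrete outside the book's `ops` library, here is a minimal standalone numpy sketch (not the book's code): a Hadamard on |0⟩ yields equal amplitudes, and sampling those probabilities repeatedly produces random bits of any width.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate
zero = np.array([1.0, 0.0])                   # state |0>

psi = H @ zero
probs = np.abs(psi) ** 2                      # measurement probabilities, [0.5, 0.5]

# Bundle eight such one-qubit "measurements" into one random byte.
rng = np.random.default_rng()
bits = rng.choice([0, 1], size=8, p=probs)
byte = int(''.join(map(str, bits)), 2)
```

On a real device each sampled bit would come from a separate physical measurement; here the sampling merely stands in for that step.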
This circuit can barely be called a circuit, never mind an algorithm (even though
in Section 6.8 on amplitude amplification, we do call it an algorithm). It only has
one gate, so it is the simplest of all possible circuits. Nevertheless, it exploits crucial
quantum computing properties, namely superposition and the probabilistic collapse of
the wave function on measurement. It is trivial, and it is not, both at the same time. It
is a true quantum circuit.
3.2 Gate Equivalences

As we learned earlier, points on the surface of the Bloch sphere can be reached in
an infinite number of ways, simply by utilizing rotations. Similarly for circuits, using
only standard gates, there are many interesting equivalences for one-, two-, and many-
more-qubit circuits. In this section, we describe common equivalences, mostly in the
form of code. With just a little stretch of the imagination, we can consider those simple
circuits as algorithms; that’s why we’re discussing them in this part of the book.
There are also higher-level functions that can be composed of simpler, one- or two-
qubit gates, e.g., a Swap gate over larger qubit distances and the double-controlled
X-gate. This aspect, the reduction to one- and two-qubit gates, is important, and in
the remainder of this text, we will (mostly) focus on these types of gates. The main
reasons for this focus are:
def test_t_gate(self):
"""T^2 == S."""
s = ops.Tgate() @ ops.Tgate()
self.assertTrue(s.is_close(ops.Phase()))
def test_s_gate(self):
"""S^2 == Z."""
s = ops.Sgate() @ ops.Sgate()
self.assertTrue(s.is_close(ops.PauliZ()))
def test_v_gate(self):
"""V^2 == X."""
s = ops.Vgate() @ ops.Vgate()
self.assertTrue(s.is_close(ops.PauliX()))
(Circuit: a CNOT sandwiched between Hadamard gates on both qubits equals a CNOT with control and target exchanged.)
def test_had_cnot_had(self):
h2 = ops.Hadamard(2)
cnot = ops.Cnot(0, 1)
op = h2(cnot(h2))
self.assertTrue(op.is_close(ops.Cnot(1, 0)))
We can also convince ourselves of this result by looking at the operator matrices;
this is a useful debugging technique. The circuit above translates to the following
gate-level expression. Because this circuit is symmetric around the CNOT gate, we don't
have to worry about the ordering of the matrix multiplications:
$$(H \otimes H)\,\text{CNOT}_{0,1}\,(H \otimes H).$$

In matrix form:

$$\frac{1}{2}\begin{pmatrix}1&1&1&1\\1&-1&1&-1\\1&1&-1&-1\\1&-1&-1&1\end{pmatrix}
\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&0&1\\0&0&1&0\end{pmatrix}
\frac{1}{2}\begin{pmatrix}1&1&1&1\\1&-1&1&-1\\1&1&-1&-1\\1&-1&-1&1\end{pmatrix}.$$
We construct this in code and find that both versions produce the same operator matrix:
ops.Cnot(1, 0).dump()
>>
1.0 - - -
- - - 1.0
- - 1.0 -
- 1.0 - -
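The same equivalence can be checked in a standalone numpy sketch, independent of the book's `ops` library:

```python
import numpy as np

# Sandwiching CNOT(0,1) between Hadamards on both qubits reverses
# control and target.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
H2 = np.kron(H, H)
cnot01 = np.array([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 1, 0]])   # control qubit 0, target qubit 1
cnot10 = np.array([[1, 0, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 1, 0],
                   [0, 1, 0, 0]])   # control qubit 1, target qubit 0

assert np.allclose(H2 @ cnot01 @ H2, cnot10)
```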
(Circuit: a Controlled-Z gate with control and target exchanged is the same gate.)
Let us confirm that the Controlled-Z is indeed symmetric. In Section 2.7 on
controlled gates we learned how to construct controlled unitary gates with the help of
projectors, for example:

$$CZ_{0,1} = |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes Z.$$
It is a good exercise to try and manually compute the tensor products for this
Controlled-Z gate. In one case we add the matrices:

$$CZ_{0,1} = \begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&0&0\\0&0&0&0\end{pmatrix}
+ \begin{pmatrix}0&0&0&0\\0&0&0&0\\0&0&1&0\\0&0&0&-1\end{pmatrix}.$$
In the case with the indices changed from 0,1 to 1,0 we add these matrices:

$$CZ_{1,0} = \begin{pmatrix}1&0&0&0\\0&0&0&0\\0&0&1&0\\0&0&0&0\end{pmatrix}
+ \begin{pmatrix}0&0&0&0\\0&1&0&0\\0&0&0&0\\0&0&0&-1\end{pmatrix}.$$
The corresponding code for this experiment is below. Note that constructing a
Multi-Controlled-Z results in a similar matrix with all diagonal elements being 1
except the bottom-right element, which is −1. Try it out yourself!
def test_controlled_z(self):
z0 = ops.ControlledU(0, 1, ops.PauliZ())
z1 = ops.ControlledU(1, 0, ops.PauliZ())
self.assertTrue(z0.is_close(z1))
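A standalone numpy sketch of the projector construction, using the standard definition from the equation above, confirms the same symmetry:

```python
import numpy as np

I = np.eye(2)
Z = np.diag([1.0, -1.0])
P0 = np.array([[1.0, 0], [0, 0]])   # projector |0><0|
P1 = np.array([[0, 0], [0, 1.0]])   # projector |1><1|

cz01 = np.kron(P0, I) + np.kron(P1, Z)   # control qubit 0
cz10 = np.kron(I, P0) + np.kron(Z, P1)   # control qubit 1

# Both orderings give the same diagonal matrix diag(1, 1, 1, -1).
assert np.allclose(cz01, cz10)
assert np.allclose(cz01, np.diag([1, 1, 1, -1]))
```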
Note that all controlled phase gates are symmetric, for example:
(Circuit: a Controlled-S gate with control and target exchanged is the same gate.)
X Y X = −Y
def test_xyx(self):
x = ops.PauliX()
y = ops.PauliY()
print(y)
op = x(y(x))
print(op)
self.assertTrue(op.is_close(-1.0 * y))
The two print statements in the code above indeed produce the expected result:
2 Note that these equivalences are related to the Pauli commutators. See, for example, https://en.wikipedia.org/wiki/Pauli_matrices#Commutation_relations.
def test_equalities(self):
# Generate the Pauli and Hadamard matrices.
_, x, y, z = ops.Pauli()
h = ops.Hadamard()
# Check equalities.
op = h(x(h))
self.assertTrue(op.is_close(z))
op = h(y(h))
self.assertTrue(op.is_close(-1.0 * y))
op = h(z(h))
self.assertTrue(op.is_close(x))
op = x(z)
self.assertTrue(op.is_close(1.0j * y))
Let us compare this against a rotation by π/4 around the x-axis, which is this operator:
At first glance, the two results do not appear to be equivalent. However, operators
that differ only by a constant factor are equivalent: applying a scaled operator to a
state leads to an equally scaled state, and since this scale is a global phase, which
has no physical meaning, we can ignore it. If the two operators are indeed equivalent
in this sense, dividing one by the other elementwise should yield a matrix with all
identical elements. And indeed, it does:
def test_global_phase(self):
h = ops.Hadamard()
op = h(ops.Tgate()(h))
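This check can also be sketched standalone in numpy. The sketch assumes the standard rotation convention Rx(θ) = exp(−iθX/2); the resulting global phase of e^{iπ/8} is this sketch's own derivation, not a value from the book:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
T = np.diag([1.0, np.exp(1j * np.pi / 4)])   # T gate

theta = np.pi / 4
Rx = np.array([[np.cos(theta / 2), -1j * np.sin(theta / 2)],
               [-1j * np.sin(theta / 2), np.cos(theta / 2)]])

# Elementwise division: every entry is the same constant, a global phase.
ratio = (H @ T @ H) / Rx
assert np.allclose(ratio, np.exp(1j * np.pi / 8))
```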
3 We can prove that any unitary matrix has a square root. Unitary matrices are diagonalizable, so the root
of the unitary matrix is just the root of the diagonal elements.
(Circuit: a Toffoli gate on |q0⟩, |q1⟩, |q2⟩ decomposed into two-qubit gates: Controlled-√X from q1 to q2, CNOT(q0, q1), Controlled-√X† from q1 to q2, CNOT(q0, q1), and Controlled-√X from q0 to q2.)
def test_v_vdag_v(self):
# Make Toffoli out of V = sqrt(X).
#
v = ops.Vgate() # Could be any unitary, in principle!
ident = ops.Identity()
cnot = ops.Cnot(0, 1)
o0 = ident * ops.ControlledU(1, 2, v)
c2 = cnot * ident
o2 = (ident * ops.ControlledU(1, 2, v.adjoint()))
o4 = ops.ControlledU(0, 2, v)
final = o4 @ c2 @ o2 @ c2 @ o0
v2 = v @ v
cv1 = ops.ControlledU(1, 2, v2)
cv0 = ops.ControlledU(0, 1, cv1)
self.assertTrue(final.is_close(cv0))
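A standalone numpy sketch makes the square-root property concrete; the matrix below is one common choice for √X and squares exactly to X:

```python
import numpy as np

# V = sqrt(X) = (1/2) [[1+i, 1-i], [1-i, 1+i]]
V = 0.5 * np.array([[1 + 1j, 1 - 1j],
                    [1 - 1j, 1 + 1j]])
X = np.array([[0, 1], [1, 0]])

# V squared is the X-gate.
assert np.allclose(V @ V, X)
# V is unitary: V V-dagger = I.
assert np.allclose(V @ V.conj().T, np.eye(2))
```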
(Circuit: a multi-controlled X-gate built from Toffoli gates and three ancilla qubits in |0⟩⊗3; after the target X-gate, the Toffoli cascade is applied in reverse to uncompute the ancillae.)
After the uncomputation, the entanglement with the ancilla qubits has been eliminated.
We detail an implementation of multi-controlled gates that may have 0, 1, or many
controllers and that can be controlled by |0⟩ or |1⟩ in Section 4.3.7.
Other constructions are possible. Mermin (2007) proposes multi-controlled gates
that trade additional gates for lower numbers of ancillae, as well as circuits that do not
require the ancillae to be in |0⟩ states (which may save a few uncomputation gates).
We can use the following code to validate these equivalences. We will also revisit
some of these in Section 8.4 where we discuss compiler optimization.
def test_control_equalities(self):
"""Exercise 4.31 Nielsen, Chuang."""
i, x, y, z = ops.Pauli()
x1 = x * i
x2 = i * x
y1 = y * i
y2 = i * y
z1 = z * i
z2 = i * z
c = ops.Cnot(0, 1)
theta = 25.0 * math.pi / 180.0
rx2 = i * ops.RotationX(theta)
rz1 = ops.RotationZ(theta) * i
self.assertTrue(c(x1(c)).is_close(x1(x2)))
self.assertTrue((c @ x1 @ c).is_close(x1 @ x2))
self.assertTrue((c @ y1 @ c).is_close(y1 @ x2))
self.assertTrue((c @ z1 @ c).is_close(z1))
self.assertTrue((c @ x2 @ c).is_close(x2))
self.assertTrue((c @ y2 @ c).is_close(z1 @ y2))
self.assertTrue((c @ z2 @ c).is_close(z1 @ z2))
self.assertTrue((rz1 @ c).is_close(c @ rz1))
self.assertTrue((rx2 @ c).is_close(c @ rx2))
(Circuit: a Swap gate takes the two-qubit state |ψ⟩, |φ⟩ to |φ⟩, |ψ⟩.)
There are many more equivalences to be found in the literature. They are all inter-
esting and valuable, especially in the context of optimization and compilation. An
interesting (and not necessarily trivial) problem is to find them programmatically. We
leave this as a challenge to you.
3.3 Classical Arithmetic

In this section we study a standard classical logic circuit, the full adder, and implement
it with quantum gates. The quantum circuit does not exploit any quantum features;
we will detail arithmetic in the quantum Fourier domain in Section 6.2.
A 1-bit full adder block is usually drawn as shown in Figure 3.4.
Input bits are A and B. The carry-in from a previous, chained-in full adder is
denoted by Cin. The outputs are the sum Sum and the potential carry-out Cout.
Multiple full adders can be chained together (with Cin and Cout) to produce adders of
arbitrary bit width. The truth table for the full adder logic circuit is shown in Table 3.1.
Classical circuits use, unsurprisingly, classical gates like AND, OR, NAND, and
others. The task at hand is to construct a quantum circuit that produces the same truth
table by using only quantum gates. Classical 0s and 1s are represented by the basis
states |0⟩ and |1⟩. With some thought (and experimentation), we arrive at the circuit in
Figure 3.5. Note that this circuit exploits neither superposition nor entanglement.
Table 3.1 Truth table for the full adder.

A  B  Cin | Cout  Sum
0  0  0   | 0     0
0  0  1   | 0     1
0  1  0   | 0     1
0  1  1   | 1     0
1  0  0   | 0     1
1  0  1   | 1     0
1  1  0   | 1     0
1  1  1   | 1     1
(Figure 3.4: the 1-bit full adder block with inputs A, B, Cin and outputs Sum, Cout. Figure 3.5: the equivalent quantum circuit, with qubits A, B, Cin and two ancilla qubits producing Sum and Cout.)
Let’s walk through the circuit to convince ourselves that it is working properly:
We run an experiment the following way. First, we construct the state from the
inputs, augmenting it with two |0i states for expected outputs sum and cout. Then,
we apply the circuit we just constructed. We measure the probabilities of the outputs
being 1, which means we will get a probability of 0.0 if the state was |0⟩ and a
probability of 1.0 if the state was |1⟩:
def add_classic():
"""Full eval of the full adder."""
def main(argv):
[...]
add_classic()
This will produce the following output. The absence of error messages indicates
that things went as planned:
Other classical circuits can be implemented and combined in this way to build up
more powerful circuits. We show a general construction below. All these circuits point
to a general statement about quantum computers: since universal logic gates can be
implemented on quantum computers, a quantum computer is at least as capable as a
classical computer. This does not mean that it performs better in the general case. The
circuit presented here may be a very inefficient way to implement a simple 1-bit adder.
However, we will soon see a class of algorithms that performs significantly better on
quantum computers than on classical computers.
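Because all adder inputs are basis states, CNOT and Toffoli act like classical reversible gates, so the circuit can be traced with plain bit operations. The sketch below uses one standard gate sequence (three Toffolis for the carry, three CNOTs for the sum); it is not necessarily the book's exact circuit from Figure 3.5:

```python
def full_adder(a, b, cin):
    """Trace a quantum full adder on basis states with bit operations."""
    s, cout = 0, 0            # the two ancilla qubits, initialized |0>
    cout ^= a & b             # Toffoli(a, b, cout)
    cout ^= a & cin           # Toffoli(a, cin, cout)
    cout ^= b & cin           # Toffoli(b, cin, cout)
    s ^= a                    # CNOT(a, sum)
    s ^= b                    # CNOT(b, sum)
    s ^= cin                  # CNOT(cin, sum)
    return s, cout

# Reproduce the full-adder truth table (Table 3.1).
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert s == (a + b + cin) % 2
            assert cout == (a + b + cin) // 2
```

The carry works because the majority function on bits equals ab ⊕ a·cin ⊕ b·cin, which three Toffoli gates accumulate in the ancilla.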
NOT: |a⟩ → |a ⊕ 1⟩
CNOT: |a, b⟩ → |a, a ⊕ b⟩
Toffoli: |a, b, c⟩ → |a, b, (a ∧ b) ⊕ c⟩

These three gates are sufficient to construct the classical gates AND (∧), OR (∨),
and, of course, the NOT gate. The AND gate is a Toffoli gate with a |0⟩ as third input:

AND: |a, b, 0⟩ → |a, b, a ∧ b⟩
The OR gate is slightly more involved, but it is just based on another Toffoli gate,
combined with X-gates (via De Morgan: a ∨ b = ¬(¬a ∧ ¬b)):

OR: |a, b, 0⟩ → |a, b, a ∨ b⟩
We know that with NOT and AND, we can build the universal NAND gate, which
means we can construct any classical logic circuit with quantum gates. We might run
into the need for fan-out to connect single wires to multiple gates for complex logic
circuits. So, we need a fan-out circuit. Is this even possible? Doesn't fan-out violate
the no-cloning theorem? The answer is no, it does not, because logical 0 and 1 are
represented by qubits in states |0⟩ or |1⟩. For those basis states, cloning and fan-out
are indeed possible, as we showed in Section 2.13.
Fan-out: |a, 0⟩ → |a, a⟩, implemented as a CNOT with the wire a as control and a fresh |0⟩ as target.
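A small numpy sketch (separate from the book's library) shows both halves of this argument: a CNOT with target |0⟩ fans out the basis states, but on a superposition it produces entanglement rather than a clone:

```python
import numpy as np

cnot = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])
zero = np.array([1.0, 0.0])
one = np.array([0.0, 1.0])

# Basis states fan out correctly: |a>|0> -> |a>|a>.
assert np.allclose(cnot @ np.kron(zero, zero), np.kron(zero, zero))
assert np.allclose(cnot @ np.kron(one, zero), np.kron(one, one))

# A superposition |+>|0> becomes the entangled Bell state, not |+>|+>.
plus = (zero + one) / np.sqrt(2)
entangled = cnot @ np.kron(plus, zero)
assert not np.allclose(entangled, np.kron(plus, plus))
```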
With these elements and knowing that any Boolean formula can be expressed as
a product of sums,5 we can build any logic circuit with quantum gates. Of course,
5 https://en.wikipedia.org/wiki/Canonical_normal_form
(Figure 3.6: a circuit on inputs x0, x1, x2 with two ancillae |0⟩0 and |0⟩1 receiving x0 ∧ x1 and x1 ∨ x2.)
making this construction efficient would require additional techniques, such as ancilla
management, uncomputation, and general minimization of gates.
An example of a quantum circuit for the Boolean formula (x0 ∧ x1) ∧ (x1 ∨ x2)
is shown in Figure 3.6. In this circuit diagram, we do not show the uncomputation
following the final gate that would be required to disentangle the ancillae from the
state. The ability to uncompute ancillae in a large chain of logic expressions can reduce
the number of required ancillae. In this example, we could uncompute |0⟩0 and |0⟩1
and make them available again for future temporary results.
3.4 Swap Test

The quantum swap test allows measuring similarity between two quantum states
without actually measuring the two states directly (Buhrman et al., 2001). If the resulting
measurement probability for the basis state |0⟩ of an ancilla is close to 0.5, it means
that the two states are very different. Conversely, a measurement probability close to
1.0 means that the two states are very similar. In the physical world, we have to run
the experiment multiple times to confirm the probabilities. In our implementation, we
can conveniently peek at the probabilities.
This is an instance of a quantum algorithm that allows the derivation of an indirect
measure. It won’t tell us what the two states are – that would constitute a measurement.
It also does not tell us which state has the larger amplitude. However, it does tell us
how similar two unknown states are without having to measure them. The circuit to
measure the proximity of qubits |ψ⟩ and |φ⟩ is shown in Figure 3.7.
Let’s denote the state of the 3-qubit system by χ and see how it progresses going
from left to right. We use this circuit as a first example to exhaustively derive the
related math.
At the start of the circuit, the state is the tensor product of the three qubits:

$$|\chi_0\rangle = |0, \psi, \phi\rangle.$$
The Hadamard gate on qubit 0 superimposes the system to:

$$|\chi_1\rangle = \frac{1}{\sqrt{2}}\big(|0,\psi,\phi\rangle + |1,\psi,\phi\rangle\big).$$
(Figure 3.7: the swap test circuit. An ancilla |0⟩ receives a Hadamard, controls a Swap of |ψ⟩ and |φ⟩, receives another Hadamard, and is measured.)
The Controlled-Swap gate modifies the second half of this expression because of the
controlling |1⟩ state for qubit 0. In the part marked b, |φ⟩ and |ψ⟩ switch positions:

$$|\chi_2\rangle = \frac{1}{\sqrt{2}}\big(\underbrace{|0,\psi,\phi\rangle}_{a} + \underbrace{|1,\phi,\psi\rangle}_{b}\big).$$
The second Hadamard now superimposes further. The first part of the state (marked
as a),

$$\frac{1}{\sqrt{2}}\underbrace{|0,\psi,\phi\rangle}_{a} + \cdots,$$

turns into the following (the Hadamard superposition of the |0⟩ state introduces a plus
sign):

$$\frac{1}{\sqrt{2}}\frac{1}{\sqrt{2}}\big(|0,\psi,\phi\rangle + |1,\psi,\phi\rangle\big) + \cdots.$$
The second part (marked as b):

$$\cdots + \underbrace{|1,\phi,\psi\rangle}_{b},$$

becomes the following state (the Hadamard superposition of |1⟩ introduces a minus
sign):

$$\cdots + \frac{1}{\sqrt{2}}\frac{1}{\sqrt{2}}\big(|0,\phi,\psi\rangle - |1,\phi,\psi\rangle\big).$$
Combining the two subexpressions results in state |χ3⟩:

$$|\chi_3\rangle = \frac{1}{2}\big(|0,\psi,\phi\rangle + |1,\psi,\phi\rangle + |0,\phi,\psi\rangle - |1,\phi,\psi\rangle\big).$$
This simplifies to the following, pulling out the first qubit (qubit 0):

$$|\chi_3\rangle = \frac{1}{2}|0\rangle\big(|\psi,\phi\rangle + |\phi,\psi\rangle\big) + \frac{1}{2}|1\rangle\big(|\psi,\phi\rangle - |\phi,\psi\rangle\big).$$
Now we measure the first qubit. If it collapses to |0⟩, the second term disappears
(it can no longer be measured and has probability 0). We only consider measurements
that result in state |0⟩ for the first qubit and ignore all others. The probability amplitude
of the state collapsing to state |0⟩ is taken from the first term:
$$\frac{1}{2}|0\rangle\big(|\psi,\phi\rangle + |\phi,\psi\rangle\big).$$
The probability is computed by squaring the amplitude's norm, which is to multiply
the amplitude with its complex conjugate. Here we have to be careful: to compute
this square (of amplitude and its complex conjugate) we have to square the whole
amplitude of the |0⟩ state, which includes the two tensor products after the |0⟩ itself,
as well as the factor of 1/2:
$$\left[\frac{1}{2}\big(|\psi,\phi\rangle + |\phi,\psi\rangle\big)\right]^{\dagger}\frac{1}{2}\big(|\psi,\phi\rangle + |\phi,\psi\rangle\big)$$
$$= \frac{1}{2}\big(\langle\psi,\phi| + \langle\phi,\psi|\big)\,\frac{1}{2}\big(|\psi,\phi\rangle + |\phi,\psi\rangle\big)$$
$$= \frac{1}{4}\langle\psi,\phi|\phi,\psi\rangle + \frac{1}{4}\underbrace{\langle\psi,\phi|\psi,\phi\rangle}_{=1} + \frac{1}{4}\underbrace{\langle\phi,\psi|\phi,\psi\rangle}_{=1} + \frac{1}{4}\langle\phi,\psi|\psi,\phi\rangle.$$
The scalar product of a normalized state with itself is 1.0, which means, in the above
expression, that the second and third subterms each become 1/4 and the expression
simplifies to:

$$\frac{1}{2} + \frac{1}{4}\langle\psi,\phi|\phi,\psi\rangle + \frac{1}{4}\langle\phi,\psi|\psi,\phi\rangle. \tag{3.3}$$
Now recall how to compute the inner product of two tensors from Equation (1.7).
For two such tensor products, the inner product factorizes:

$$\langle\psi,\phi|\phi,\psi\rangle = \langle\psi|\phi\rangle\langle\phi|\psi\rangle.$$
This means we rewrite Equation (3.3) as the following, changing the order of the
scalar products; they are just complex numbers:

$$\frac{1}{2} + \frac{1}{4}\langle\psi,\phi|\phi,\psi\rangle + \frac{1}{4}\langle\phi,\psi|\psi,\phi\rangle$$
$$= \frac{1}{2} + \frac{1}{4}\langle\psi|\phi\rangle\langle\phi|\psi\rangle + \frac{1}{4}\langle\phi|\psi\rangle\langle\psi|\phi\rangle$$
$$= \frac{1}{2} + \frac{1}{4}\langle\psi|\phi\rangle\langle\phi|\psi\rangle + \frac{1}{4}\langle\psi|\phi\rangle\langle\phi|\psi\rangle$$
$$= \frac{1}{2} + \frac{1}{2}\langle\psi|\phi\rangle\langle\phi|\psi\rangle.$$
This means that for the swap test circuit, the final probability of measuring |0⟩ will be

$$\Pr(|0\rangle) = \frac{1}{2} + \frac{1}{2}\,|\langle\psi|\phi\rangle|^2.$$
This probability containing the scalar product of the two states is the key to the
similarity measurement. Measuring the probability for state |0⟩ will give a value close to
1/2 if the dot product of |ψ⟩ and |φ⟩ is close to 0, which means that these two states
are orthogonal and maximally different. The measurement will give a value close to
1.0 if the dot product is close to 1.0, which means the states are almost identical.
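The whole derivation can be cross-checked with a standalone numpy sketch of the circuit. The qubit ordering and the explicit Controlled-Swap construction below are this sketch's own choices, not the book's code:

```python
import numpy as np

def swap_test(psi, phi):
    """Ancilla |0>, then H, Controlled-Swap, H; return P(ancilla = |0>)."""
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    h0 = np.kron(H, np.eye(4))             # Hadamard on the ancilla (qubit 0)
    cswap = np.eye(8)
    # With ancilla = 1, swap qubits 1 and 2: exchange basis |101> <-> |110>.
    cswap[[5, 6]] = cswap[[6, 5]]
    state = np.kron(np.array([1.0, 0.0]), np.kron(psi, phi))
    state = h0 @ cswap @ h0 @ state
    return np.sum(np.abs(state[:4]) ** 2)  # amplitudes with ancilla = 0

# Orthogonal states measure 0.5, identical states measure 1.0.
assert np.isclose(swap_test(np.array([1.0, 0]), np.array([0, 1.0])), 0.5)
assert np.isclose(swap_test(np.array([1.0, 0]), np.array([1.0, 0])), 1.0)

# A general pair matches the derived formula 1/2 + 1/2 |<psi|phi>|^2.
a, b = np.array([0.6, 0.8]), np.array([0.8, 0.6])
assert np.isclose(swap_test(a, b), 0.5 + 0.5 * abs(np.dot(a, b)) ** 2)
```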
In code, this looks quite simple. In each experiment, we construct the circuit:
We perform the usual measurement by peek-a-boo and find the probability of qubit
0 to be in state |0⟩:
That's all there is to it. The variable p0 will be the probability of qubit 0 being found
in the |0⟩ state. What's left to do now is to compare this probability against a target to
check that the result is valid. We allow a 5% error margin (0.05):
def main(argv):
[...]
print('Swap test. 0.5 means different, 1.0 means similar')
run_experiment(1.0, 0.0, 0.5)
run_experiment(0.0, 1.0, 0.5)
run_experiment(1.0, 1.0, 1.0)
run_experiment(0.0, 0.0, 1.0)
run_experiment(0.1, 0.9, 0.65)
[...]
Swap test to compare state. 0.5 means different, 1.0 means similar
Similarity of a1: 1.00, a2: 0.00 ==> %: 50.00
Similarity of a1: 0.00, a2: 1.00 ==> %: 50.00
Similarity of a1: 1.00, a2: 1.00 ==> %: 100.00
Similarity of a1: 0.00, a2: 0.00 ==> %: 100.00
Similarity of a1: 0.10, a2: 0.90 ==> %: 63.71
[...]
3.5 Quantum Teleportation

This section describes the quantum algorithm with one of the most intriguing names
of all time: quantum teleportation (Bennett et al., 1993). It is a small example from
the fascinating field of quantum information, which includes encryption and error
correction. This type of algorithm exploits entanglement to communicate information
across spatially separate locations.
The algorithmic story begins, as always, with Alice and Bob, placeholders for the
distinct systems A and B. At the beginning of the story, they are together in a lab on
Earth and create an entangled pair of qubits, for example, the Bell state β00. Let us
mark the first qubit as Alice's and the second one as Bob's:

$$\beta_{00} = \frac{|0_A 0_B\rangle + |1_A 1_B\rangle}{\sqrt{2}}.$$
def main(argv):
[...]
After creating the state, they each take one of the qubits and physically separate –
Alice goes to the Moon and Bob ships off to Mars. Let’s not worry about how they
are getting their super-cooled quantum qubits across the solar system. Nobody said
teleportation was easy.
Sitting there on the Moon, Alice happens to be in possession of another qubit |x⟩,
which is in a specific state with probability amplitudes α and β:

$$|x\rangle = \alpha|0\rangle + \beta|1\rangle.$$
Alice does not know what the values of α and β are, and measuring the qubit would
destroy the superposition. But Alice wants to communicate α and β to Bob, so that
when he measures, he will obtain the basis states of |x⟩ with the corresponding
probabilities. How can Alice "send" or "teleport" the state of |x⟩ to Bob? She can do this
by exploiting the entangled qubit she already has in her hands from the time before
the Moon travel.

In code, let's create the qubit |x⟩ with defined values for α and β so we can check
later whether Alice has teleported the correct values to Bob:
Here comes the key "trick": Alice combines the new qubit |x⟩ with the qubit she
brought with her from Earth, the one that is entangled with Bob's qubit. We don't
concern ourselves with how this might be accomplished in the physical world; we just
assume it is possible:

Abusing notation a little bit, the combined state is now |x A B⟩, with A representing
Alice's entangled qubit on the Moon and B being Bob's entangled qubit on Mars. She
now explicitly entangles |x⟩ with the usual technique of applying a Controlled-Not
gate:

Finally, she applies a Hadamard gate to |x⟩. Note that the application of an entangler
circuit in reverse, with a first Controlled-Not gate followed by a Hadamard gate, is
also called making a Bell measurement.
(Figure 3.8: the teleportation circuit, with intermediate states ψ0 through ψ5 marked. A Hadamard and CNOT entangle |a⟩ and |b⟩; later, a CNOT from |x⟩ to |a⟩ followed by a Hadamard on |x⟩ implements the Bell measurement.)
The whole procedure in circuit notation is shown in Figure 3.8. Let us analyze how
the state progresses from left to right and spell out the math in great detail. Starting in
the lab, before the first Hadamard gate, the state is just the tensor product of the two
qubits:

$$\psi_0 = |0\rangle_A \otimes |0\rangle_B = |0_A 0_B\rangle.$$

After the remaining gates of the circuit, the state evolves to:

$$\psi_5 = \frac{1}{2}\big[\alpha\big(|000\rangle + |011\rangle + |100\rangle + |111\rangle\big) + \beta\big(|010\rangle + |001\rangle - |110\rangle - |101\rangle\big)\big].$$
We're almost there. Alice has in her possession the first two qubits. If we regroup
the above expression and isolate the first two qubits, we arrive at our target
expression:

$$\psi_5 = \frac{1}{2}\Big[|00\rangle\big(\alpha|0\rangle + \beta|1\rangle\big) + |01\rangle\big(\beta|0\rangle + \alpha|1\rangle\big) + |10\rangle\big(\alpha|0\rangle - \beta|1\rangle\big) + |11\rangle\big({-\beta}|0\rangle + \alpha|1\rangle\big)\Big].$$
Remember that the first two qubits are Alice’s, and the third qubit is Bob’s. Alice’s
four basis states have probabilities determined by combinations of α and β. She can
measure her first two qubits, while leaving the superposition of Bob’s third qubit
intact. The probability amplitudes changed, but Bob’s qubit remains in superposition.
As Alice measures her two qubits, the state collapses and leaves only one probability
combination for Bob's qubit. The final trick is that Alice tells Bob over a
classical communication channel what she has measured:
• If she measured |00⟩, we know that Bob's qubit is in the state α|0⟩ + β|1⟩.
• If she measured |01⟩, we know that Bob's qubit is in the state β|0⟩ + α|1⟩.
• If she measured |10⟩, we know that Bob's qubit is in the state α|0⟩ − β|1⟩.
• If she measured |11⟩, we know that Bob's qubit is in the state −β|0⟩ + α|1⟩.
Alice succeeded in teleporting the probability amplitudes of |x⟩ to Bob. She still
has to classically communicate her measurement results, so there is no faster-than-light
communication. However, the spooky action at a distance "modified" Bob's entangled
qubit on Mars to obtain the probability amplitudes from Alice's qubit |x⟩, which she
created on the Moon. This spooky action is truly spooky, and also astonishing.
The final step is, depending on Alice's classical communication, to apply gates to
Bob's qubit to put it in the actual state of α|0⟩ + β|1⟩:

After this, Bob's qubit on Mars will be in the state of Alice's original qubit |x⟩ on the
Moon. Teleportation completed. Minds blown.
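The full protocol can be traced in a standalone numpy sketch. The correction rule used here (apply X for Alice's second measured bit, then Z for her first) is one common convention, equivalent to the book's per-case decoder gates:

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.diag([1.0, -1.0])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])

alpha, beta = 0.6, 0.8
x = np.array([alpha, beta])                 # Alice's qubit |x>
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)  # shared pair (|00> + |11>)/sqrt(2)

psi = np.kron(x, bell)                      # 3-qubit state |x A B>
psi = np.kron(CNOT, I2) @ psi               # CNOT from |x> to A
psi = np.kron(np.kron(H, I2), I2) @ psi     # Hadamard on |x>

for m0 in (0, 1):
    for m1 in (0, 1):
        # Project onto Alice's outcome |m0 m1> and renormalize Bob's qubit.
        idx = 4 * m0 + 2 * m1
        bob = psi[idx:idx + 2].copy()
        bob /= np.linalg.norm(bob)
        if m1:                              # correction X, then Z
            bob = X @ bob
        if m0:
            bob = Z @ bob
        assert np.allclose(bob, [alpha, beta])
```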
# Alice measures and communicates the result |00>, |01>, ... to Bob.
alice_measures(alice, a, b, 0, 0)
alice_measures(alice, a, b, 0, 1)
alice_measures(alice, a, b, 1, 0)
alice_measures(alice, a, b, 1, 1)
For each experiment, we pretend that Alice measured a specific result and apply
the corresponding decoder gates to Bob’s qubit:
Quantum Teleportation
Start with EPR Pair a=0.60, b=0.80
Teleported (|00>) a=0.60, b=0.80
Teleported (|01>) a=0.60, b=0.80
Teleported (|10>) a=0.60, b=0.80
Teleported (|11>) a=0.60, b=0.80
The core idea of exploiting entanglement is found in other algorithms of this type.
An interesting example is superdense coding, which we discuss next. Entanglement
swapping would be another representative of this class of algorithms (Berry and
Sanders, 2002), but we won’t discuss it further here. A sample implementation can be
found in file src/entanglement_swap.py in the open source repository.
3.6 Superdense Coding

Superdense coding, another algorithm with a very cool name, takes the core idea from
quantum teleportation and turns it on its head. Alice and Bob again share an entangled
pair of qubits. Alice takes hers to the Moon, while Bob takes his to Mars. Sitting on
the Moon, Alice wants to communicate two classical bits to Bob. Superdense coding
encodes two classical bits and sends them to Bob by physically transporting just a
single qubit. Two qubits are still needed in total, but the communication is done with
just a single qubit.
No classical compression scheme allows the lossless compression of two arbitrary
classical bits into one. Of course, here we are dealing with qubits, which
have two degrees of freedom (two angles define the position on the Bloch sphere). The
challenge is how to exploit this fact in order to compress information. To understand
how this works, we again begin with an entangled pair of qubits (the corresponding
code is in file src/superdense.py):
Alice manipulates her qubit 0 on the Moon according to the rules of how to encode
two classical bits into a single qubit, as shown below. In a twist of events, she will
then physically ship her qubit to Bob's Mars station. There, Bob will disentangle and
(Figure 3.9: the superdense coding circuit. A Hadamard and CNOT entangle the pair; Alice applies X if classical bit 0 is set and Z if classical bit 1 is set; Bob applies the CNOT and Hadamard in reverse and measures.)
measure both qubits. Based on the results of the measurement, he can derive Alice’s
original two classical bits. Alice sent just one qubit to allow Bob to restore two
classical bits.
To start the process, Alice manipulates her qubit, which is qubit 0, in the following
way.
The whole procedure in circuit notation is shown in Figure 3.9. In code, the two
classical bits encode four possible cases (00, 01, 10, and 11). We iterate over these
four combinations to drive our experiments:
# Alice manipulates her qubit and sends her 1 qubit back to Bob,
# who measures. In the Hadamard basis he would get b00, b01, etc.
# but we're measuring in the computational basis by reverse
# applying Hadamard and Cnot.
Let us understand how the math works. The entangled pair is initially in the Bell
state β00:

$$\beta_{00} = \frac{|00\rangle + |11\rangle}{\sqrt{2}}.$$
Now let us apply the X-gate to qubit 0:

$$(X \otimes I)\,\beta_{00} = \begin{pmatrix}0&0&1&0\\0&0&0&1\\1&0&0&0\\0&1&0&0\end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix}1\\0\\0\\1\end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix}0\\1\\1\\0\end{pmatrix} = \beta_{01}.$$
Or, in short:

$$(X \otimes I)\,\beta_{00} = \frac{|10\rangle + |01\rangle}{\sqrt{2}} = \beta_{01}.$$
Applying the X-gate changes the state and turns it into a different Bell state – it
flips the second subscript of the Bell state. This corresponds to setting the classical bit
0 to 1 in Alice’s encoding.
Applying the Z-gate changes the state and flips the first subscript of the Bell state,
which corresponds to setting classical bit 1 to 1 in Alice's encoding:

$$(Z \otimes I)\,\beta_{00} = \frac{|00\rangle - |11\rangle}{\sqrt{2}} = \beta_{10}.$$
2
Applying both the X-gate and the Z-gate will change the state to β11, indicating
that both classical bits 0 and 1 were set to 1. Analogously, as we saw earlier with
Equation (3.2), we could explicitly apply iY to β00 to yield β11. Of course, this
step is not needed here because the prior applications of the X-gate and Z-gate already
compound to this effect.
$$(iY \otimes I)\,\beta_{00} = \frac{|01\rangle - |10\rangle}{\sqrt{2}} = \beta_{11}.$$
Bob measures in the Hadamard basis, which is another way of saying that before
measurement, he converts the state to the computational basis by applying the entan-
gler’s Hadamard and Controlled-Not gates in reverse order.
Going through the entangler circuit in reverse uncomputes the entanglement and
changes the state to one of the defined basis states |00⟩, |01⟩, |10⟩, or |11⟩, depending
on the value of the original classical bits. The probability for each possible case is
100%. We will find the classical bit values as they were set by Alice.
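The four cases can be traced end to end in a standalone numpy sketch, independent of src/superdense.py:

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.diag([1.0, -1.0])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])

for b1 in (0, 1):          # classical bit 1 -> Z-gate
    for b0 in (0, 1):      # classical bit 0 -> X-gate
        psi = np.array([1, 0, 0, 1]) / np.sqrt(2)   # Bell state b00
        if b0:
            psi = np.kron(X, I2) @ psi
        if b1:
            psi = np.kron(Z, I2) @ psi
        # Bob reverses the entangler, then measures a pure basis state.
        psi = np.kron(H, I2) @ CNOT @ psi
        probs = np.abs(psi) ** 2
        measured = int(np.argmax(probs))
        assert np.isclose(probs[measured], 1.0)
        assert measured == 2 * b1 + b0   # both bits recovered
```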
print(f'Expected/matched: {expect0}{expect1}.')
This confirms results with 100% probability for the |0⟩ and |1⟩ states, depending on
how the qubit was manipulated by Alice. Here is the expected output:
Expected/matched: 00
Expected/matched: 01
Expected/matched: 10
Expected/matched: 11
3.7 Bernstein–Vazirani Algorithm

The goal is to find the secret string s. On a classical computer, we would have to try
n times. Each experiment would have an input string of all 0s, except for a single 1.
Each iteration where Equation (3.4) holds identifies a 1-bit in s at position t, for trial
t ∈ [0, n − 1].

In the quantum formulation, we construct a circuit. After the circuit has been
applied, the outputs will be in states |0⟩ and |1⟩, corresponding to the secret string's
bits, which we encode into a big unitary operator Uf. In the example below, the secret
string is 001.
(Figure: a block diagram of the oracle Uf; for the secret string 001, the input qubits end up reading off the secret's bits.)
To see how this works, we need to understand the mechanics of basis changes.
Remember how the |0⟩ and |1⟩ states are put in superposition with Hadamard gates:

$$H|0\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}} = |+\rangle, \tag{3.5}$$

$$H|1\rangle = \frac{|0\rangle - |1\rangle}{\sqrt{2}} = |-\rangle. \tag{3.6}$$
As a first step, we create an input state of length n, initialized with all |0⟩, with an
additional ancilla qubit in state |1⟩. The main trick of this circuit and algorithm is the
following.

If we apply a Controlled-Not from a controlling qubit in the |+⟩ state to a qubit
in the |−⟩ state, this has the effect of flipping the controlling qubit into the |−⟩ state!
This is the crucial trick, because now applying another Hadamard gate will rotate the
bases from |+⟩ back to |0⟩ and from |−⟩ to |1⟩. In other words, the qubits on which
we applied the Controlled-Not will have the resulting state |1⟩.
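This phase kickback can be verified in a few lines of numpy (a standalone sketch, not the book's library): a CNOT from a |+⟩ control to a |−⟩ target flips the control into |−⟩ while leaving the target unchanged.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])

plus = H @ np.array([1.0, 0.0])    # |+>
minus = H @ np.array([0.0, 1.0])   # |->

# CNOT on |+>|-> yields |->|->: the phase "kicks back" to the control.
out = CNOT @ np.kron(plus, minus)
assert np.allclose(out, np.kron(minus, minus))
```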
We can visualize this effect with the circuit in Figure 3.10, where we symbolically
inline the states. Let us write this in code (in file src/bernstein.py in the open
source repository). First, we create the secret string:
Next, we construct the circuit, which is also called an oracle. The construction is
simple – we apply a Controlled-Not for each of the qubits corresponding to 1s in the
secret string. For example, for the secret string 1010, we would construct the circuit
in Figure 3.11.
We construct the corresponding circuit as one big unitary operator matrix U .
This limits the number of qubits we can use but is still sufficient for exploring the
algorithm.
(Circuit: four input qubits, each sandwiched between Hadamard gates; the ancilla qubit is prepared with an X-gate and a Hadamard, and CNOTs connect the secret's 1-bits to it.)
Figure 3.11 The quantum circuit for the Bernstein–Vazirani algorithm with secret string 1010.
def make_u(nbits, constant_c):
  """Build the oracle operator for the secret string."""
  op = ops.Identity(nbits)
  for idx in range(nbits-1):
    if constant_c[idx]:
      op = (ops.Identity(idx) * ops.Cnot(idx, nbits-1)) @ op
  if not op.is_unitary():
    raise AssertionError('Constructed non-unitary operator.')
  return op
For the full circuit, we perform the following steps. First we create a secret string
of length nbits-1 and construct the big unitary. Then we build a state consisting of
nbits states initialized as |0⟩, tensored with an ancilla qubit initialized as |1⟩. We
follow this with the big unitary, which we sandwich between Hadamard gates. As a
final step, we measure and compare the results:
c = make_c(nbits-1)
u = make_u(nbits, c)
To check the results, we measure the probability for all possible states and ensure
the state with nonzero probability (p > 0.1) matches the secret string. There should
only be one matching state of n qubits. In the code below, we iterate over all possible
results and only print the results with higher probability.
print(f'Expected: {c}')
# The state with the 'flipped' bits will have probability 1.0.
# It will be found on the very first try.
for bits in helper.bitprod(nbits):
  if psi.prob(*bits) > 0.1:
    print('Found : {}, with prob: {:.1f}'
          .format(bits[:-1], psi.prob(*bits)))
    if bits[:-1] != c:
      raise AssertionError('invalid result')
That’s it! Running this program will produce something like the following output,
showing the bit settings and the resulting probabilities:
Expected: (0, 1, 0, 1, 0, 0)
Found : (0, 1, 0, 1, 0, 0), with prob: 1.0
particular algorithm is then to extract some meaningful information from the resulting
states, which may be entangled with junk qubits.
There are a handful of oracle algorithms to be found in the literature. We will visit
2.5 of them. First, we will discuss the fundamental Deutsch algorithm, and then, later
in this chapter, its extension to more than two input qubits. That’s two algorithms, so
we add another 1/2 algorithm by showing how to formulate the previously discussed
Bernstein–Vazirani algorithm in oracle form as well, using the oracle constructor we
develop in this section.
f : {0, 1} → {0, 1}.
There are four possible cases for this function, which we call constant or balanced:
the two constant functions f(x) = 0 and f(x) = 1, and the two balanced functions
f(x) = x and f(x) = 1 − x.
Deutsch’s algorithm answers the following question: Given one of these four func-
tions f , which type of function is it: balanced or constant?
To answer this question with a classical computer, you have to evaluate the function
for all possible inputs. In the quantum model, we assume we have an oracle that, given
two input qubits |x⟩ and |y⟩, changes the state to Equation 3.7. Note that XOR, denoted
by ⊕, is equivalent to addition modulo 2, hence the plus sign in the circle:

|x⟩ |y⟩ → |x⟩ |y ⊕ f(x)⟩.   (3.7)

The input |x⟩ remains unmodified; |y⟩ is XOR'ed with f(|x⟩). This is a
formulation that we will see in other oracle algorithms as well – there is always
an ancilla |y⟩, and the result of the evaluated function is XOR'ed with that ancilla.
Remember that quantum operators must be reversible; this is one of the ways to
achieve this.
Assuming we have oracle U f representing and applying the unknown function
f (x), then the Deutsch algorithm can be drawn as the circuit shown in Figure 3.12.
As discussed, it is a convention to start every circuit with all qubits in state |0⟩. The
algorithm requires the ancilla qubit to be in state |1⟩, which can be achieved easily by
adding an X-gate to the lower qubit. Let us look at the detailed math again. Initially,
after the X-gate on qubit 1, the state is:

|ψ0⟩ = |01⟩.
[Figure 3.12 (circuit): states ψ0 through ψ3 marked left to right. The top qubit |0⟩ passes through H, then into U_f (input x, output x), then through a final H. The bottom qubit |0⟩ passes through X and H into U_f (input y, output y ⊕ f(x)).]
If f(x) = 0, then

ψ2 = (|0⟩ + |1⟩)/√2 ⊗ (|0 ⊕ 0⟩ − |1 ⊕ 0⟩)/√2
   = (|0⟩ + |1⟩)/√2 ⊗ (|0⟩ − |1⟩)/√2.
But if f(x) = 1, then

ψ2 = (|0⟩ + |1⟩)/√2 ⊗ (|0 ⊕ 1⟩ − |1 ⊕ 1⟩)/√2
   = (|0⟩ + |1⟩)/√2 ⊗ (|1⟩ − |0⟩)/√2.
We can combine the two results into a single expression:

ψ2 = (−1)^f(x) (|0⟩ + |1⟩)/√2 ⊗ (|0⟩ − |1⟩)/√2.
The first qubit x is in a superposition, so we multiply in the constant factor:

ψ2 = ((−1)^f(0) |0⟩ + (−1)^f(1) |1⟩)/√2 ⊗ (|0⟩ − |1⟩)/√2.
Finally, applying the final Hadamard to the top qubit takes the state from the
Hadamard basis back to the computational basis. To see how this works, let us remind
ourselves that the Hadamard operator is its own inverse:
H |0⟩ = (|0⟩ + |1⟩)/√2   and   H (|0⟩ + |1⟩)/√2 = |0⟩,
Table 3.2 Truth table for the constant function f(0) = f(1) = 0.
  x  y  f(x)  y ⊕ f(x)  new state
  0  0   0       0        0,0
  0  1   0       1        0,1
  1  0   0       0        1,0
  1  1   0       1        1,1
For a constant function f, we always have a |0⟩ in the front, and in the balanced
case the first qubit will always be in state |1⟩. This means that after a single run of the
circuit, we can determine the type of f simply by measuring the first qubit.
The superposition allows the computation of the results for both basis states |0⟩
and |1⟩ simultaneously. This is also called quantum parallelism. The XOR'ing to the
ancilla qubit allows the math to add up in a smart way such that a result can be obtained
with high probability. The result does not tell us which specific function it is out of the
four possible cases, but it does tell us which of the two classes it belongs to. Because
the algorithm is able to exploit superposition to compute the result in parallel, it has a
true advantage over the classical equivalent.
3.8.2 Construct Uf
The math in Subsection 3.8.1 is a bit abstract, but things may become clearer when
considering how to construct U f . To reiterate, for a combined state of two qubits, the
basis states are:
|00⟩ = [1, 0, 0, 0]^T,
|01⟩ = [0, 1, 0, 0]^T,
|10⟩ = [0, 0, 1, 0]^T,
|11⟩ = [0, 0, 0, 1]^T.
We want to construct an operator that takes any linear combination of these input
states to one in which the second qubit is XOR'ed with f(x):

|x, y⟩ → |x, y ⊕ f(x)⟩.
We show this by example, followed by the code to compute the oracle operator.
f (0) = f (1) = 0
The function f only modifies the second qubit as a function of the first qubit. For the
case where f(0) = f(1) = 0, the truth table is shown in Table 3.2. The columns x and
y represent the input qubits; f(x) is constant 0 in this case. The next column shows
the result of XOR'ing the function's return value with y, which is y ⊕ f(x). The last
column finally shows the resulting new state, which leaves the first qubit unmodified
and changes the second qubit to the result of the previous XOR.
Table 3.3 Truth table for the balanced function f(0) = 0, f(1) = 1.
  x  y  f(x)  y ⊕ f(x)  new state
  0  0   0       0        0,0
  0  1   0       1        0,1
  1  0   1       1        1,1
  1  1   1       0        1,0
We can express this with a 4 × 4 permutation matrix, where the rows and columns
are marked with the four basis states. We use the combination of x and y as row
index, and the new state as column index. In this example, old state and new state are
identical, and the resulting U_f matrix is the identity matrix I:

        |00⟩ |01⟩ |10⟩ |11⟩
  |00⟩    1    0    0    0
  |01⟩    0    1    0    0
  |10⟩    0    0    1    0
  |11⟩    0    0    0    1
Note that this has to be a permutation matrix in order to make this a reversible operator.
f(0) = 0, f(1) = 1
The construction follows the same pattern as above, with a truth table as shown in
Table 3.3. The table translates into this matrix:

        |00⟩ |01⟩ |10⟩ |11⟩
  |00⟩    1    0    0    0
  |01⟩    0    1    0    0
  |10⟩    0    0    0    1
  |11⟩    0    0    1    0
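The reversibility claim is easy to check numerically: this permutation matrix is unitary and, being real and symmetric, its own inverse. A quick standalone numpy check:

```python
import numpy as np

# Uf for f(0)=0, f(1)=1: the CNOT-like permutation matrix above.
u = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])

# Unitary: for a real matrix, U^T U must be the identity.
assert np.allclose(u.T @ u, np.eye(4))
# Reversible: applying it twice restores any input.
assert np.allclose(u @ u, np.eye(4))
```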
f(0) = 1, f(1) = 0
We apply the same approach for this flavor of f(x), with the truth table as shown in
Table 3.4. It translates into this matrix:

        |00⟩ |01⟩ |10⟩ |11⟩
  |00⟩    0    1    0    0
  |01⟩    1    0    0    0
  |10⟩    0    0    1    0
  |11⟩    0    0    0    1
Table 3.4 Truth table for the balanced function f(0) = 1, f(1) = 0.
  x  y  f(x)  y ⊕ f(x)  new state
  0  0   1       1        0,1
  0  1   1       0        0,0
  1  0   0       0        1,0
  1  1   0       1        1,1
Table 3.5 Truth table for the constant function f(0) = f(1) = 1.
  x  y  f(x)  y ⊕ f(x)  new state
  0  0   1       1        0,1
  0  1   1       0        0,0
  1  0   1       1        1,1
  1  1   1       0        1,0
f(0) = f(1) = 1
For the final case, the truth table is in Table 3.5. It corresponds to this matrix:

        |00⟩ |01⟩ |10⟩ |11⟩
  |00⟩    0    1    0    0
  |01⟩    1    0    0    0
  |10⟩    0    0    0    1
  |11⟩    0    0    1    0
def make_uf(f: Callable[[int], int]) -> ops.Operator:
  """Make the Uf operator as a 4 x 4 permutation matrix."""
  u = np.zeros(16).reshape(4, 4)
  for col in range(4):
    y = col & 1
    x = col & 2
    fx = f(x >> 1)
    xor = y ^ fx
    u[col][x + xor] = 1.0
  op = ops.Operator(u)
  if not op.is_unitary():
    raise AssertionError('Produced non-unitary operator.')
  return op
for i in range(4):
  f = make_f(i)
  u = make_uf(f)
  print(f'Flavor {i:02b}: {u}')
3.8.4 Experiments
To run an experiment, we construct the circuit and measure the first qubit. If it col-
lapses to |0⟩, f(·) was a constant function, according to the math above. If it collapses
to |1⟩, f(·) was a balanced function.
First, we define a function make_f which returns a function object according to
one of the four possible function flavors. We can call the returned function object as
f(0) or f(1); it will return 0 or 1:
return f
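Only the final return f of this listing survived typesetting here. A minimal sketch consistent with the description — the table-of-flavors layout is an assumption, with flavor indexing the four (f(0), f(1)) pairs — could read:

```python
from typing import Callable

def make_f(flavor: int) -> Callable[[int], int]:
    # Encode the four flavors as (f(0), f(1)) pairs:
    # 0 -> (0,0), 1 -> (0,1), 2 -> (1,0), 3 -> (1,1).
    flavors = [(0, 0), (0, 1), (1, 0), (1, 1)]

    def f(bit: int) -> int:
        return flavors[flavor][bit]

    return f
```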
The full experiment first constructs this function object, then the oracle. Hadamard
gates are applied to each qubit in an initial state |0⟩ ⊗ |1⟩, followed by the oracle
operator and a final Hadamard gate on the top qubit:
f = make_f(flavor)
u = make_uf(f)
h = ops.Hadamard()
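The state-preparation and measurement lines of the listing are cut off in this extract. As a self-contained illustration of the whole experiment, here is a numpy sketch that swaps the book's ops/state API for explicit matrices (make_uf here follows the permutation-matrix recipe from Subsection 3.8.2):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def make_uf(f):
    # 4x4 permutation matrix mapping |x,y> -> |x, y XOR f(x)>.
    u = np.zeros((4, 4))
    for col in range(4):
        y, x = col & 1, (col & 2) >> 1
        u[(x << 1) + (y ^ f(x))][col] = 1.0
    return u

def deutsch(f):
    psi = np.kron(np.array([1, 0]), np.array([0, 1]))  # |0>|1>
    psi = np.kron(H, H) @ psi                          # Hadamard both qubits
    psi = make_uf(f) @ psi                             # the oracle
    psi = np.kron(H, np.eye(2)) @ psi                  # final H on top qubit
    p0 = abs(psi[0])**2 + abs(psi[1])**2               # prob of top qubit |0>
    return 'constant' if p0 > 0.5 else 'balanced'
```

All four flavors classify correctly: the two constant flavors report 'constant', the two balanced ones 'balanced'.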
Finally, we check that we get the right answer for all four function flavors:
def main(argv):
if len(argv) > 1:
raise app.UsageError('Too many command-line arguments.')
run_experiment(0)
run_experiment(1)
run_experiment(2)
run_experiment(3)
new_bits = bits[0:-1]
new_bits.append(xor)
op = Operator(u)
if not op.is_unitary():
  raise AssertionError('Constructed non-unitary operator.')
return op
Instead of building the circuit from Controlled-Not gates to represent the secret
number, we write an oracle function and call the OracleUf constructor above. This
also demonstrates how a multi-qubit input can be used to build the oracle. This
implementation only supports a single ancilla qubit.
First, we construct the function to compute the dot product between the state and
the secret string:
def make_oracle_f(c: Tuple[int, ...]) -> Callable:
  """Return a function computing the dot product (mod 2) with c."""
  const_c = c
  def f(bit_string: Tuple[int]) -> int:
    val = 0
    for idx in range(len(bit_string)):
      val += const_c[idx] * bit_string[idx]
    return val % 2
  return f
And then we repeat much of the original algorithm, but with an oracle:
c = make_c(nbits-1)
f = make_oracle_f(c)
u = ops.OracleUf(nbits, f)
[...]
check_result(nbits, c, psi)
We run the code to convince ourselves that we implemented all this correctly:
Expected: (0, 1, 0, 1, 0, 0)
Found : (0, 1, 0, 1, 0, 0), with prob: 1.0
[Figure 3.13 (circuit): n data qubits |0⟩^⊗n pass through H^⊗n, then into U_f (input x, output x), then through a final H^⊗n; the ancilla qubit |0⟩ passes through X and H into U_f (input y, output y ⊕ f(x)).]
f : {0, 1}^n → {0, 1}.
The mathematical treatment of this case parallels the two-qubit Deutsch algorithm.
The key result is that we will measure a state of n qubits in the end. If we only find
qubits in state |0⟩, then the function is constant; if we find anything else, the function
is balanced. The circuit, shown in Figure 3.13, looks similar to the two-qubit case,
except that multiple qubits are being handled on both the input and output. The single
ancilla qubit at the bottom will still be the key to the answer.
3.9.1 Implementation
Let us focus on the code (in file src/deutsch_jozsa.py), which looks quite com-
pact with our U_f operator. First, we create the many-qubit function as either a constant
function (all 0s or all 1s with equal probability) or a balanced function (the same
number of 0s and 1s, randomly distributed over the length of the input bitstring).
We create an array of bits and fill it with 0s and 1s accordingly. Finally, we return
a function object that returns one of the values from this prepopulated array, thus
representing one of the two function types:
def make_f(dim: int, flavor: int) -> Callable:
  """Return a constant or balanced function over dim bits."""
  power2 = 2**dim
  bits = np.zeros(power2, dtype=np.uint8)
  if flavor == exp_constant:
    bits[:] = int(np.random.random() < 0.5)
  else:
    bits[np.random.choice(power2, size=power2//2, replace=False)] = 1

  def f(bit_string: Tuple[int]) -> int:
    idx = helper.bits2val(bit_string)
    return bits[idx]

  return f
This function drives the rest of the implementation. To run an experiment, we construct
the circuit shown in Figure 3.13 and measure. If the measurement finds that only state
|00...0⟩ has a nonzero probability amplitude, then we have a constant function.
f = make_f(nbits-1, flavor)
u = ops.OracleUf(nbits, f)
psi = (ops.Hadamard(nbits-1)(state.zeros(nbits-1)) *
       ops.Hadamard()(state.ones(1)))
psi = u(psi)
psi = (ops.Hadamard(nbits-1) * ops.Identity(1))(psi)
def main(argv):
  [...]
  for qubits in range(2, 8):
    result = run_experiment(qubits, exp_constant)
    print('Found: {} ({} qubits) (expected: {})'
          .format(result, qubits, exp_constant))
    if result != exp_constant:
      raise AssertionError('Error, expected {}'.format(exp_constant))
if __name__ == '__main__':
app.run(main)
Other algorithms of this nature are Simon’s algorithm and Simon’s generalized
algorithm (Simon, 1994). We won’t discuss them here, but implementations can be
found in the open-source repository in files simon.py and simon_general.py.
At this point, notice how the execution speed slows down as we increase the number
of qubits in this algorithm. Once we reach 10, 11, or 12 qubits, the corresponding
operator matrices become very large. Adding a few more qubits makes simulation
intractable. To remedy this, we develop ways to make gate application much faster in
Chapter 4, increasing our ability to simulate many more qubits.
4 Scalable, Fast Simulation
The concepts and basic infrastructure consisting of tensors, states, and operators,
implemented as big matrices and state vectors, are sufficient to implement many small-
scale quantum algorithms. All the algorithms in Chapter 3 on simple algorithms were
implemented this way.
This basic infrastructure is great for learning and experimenting with the basic
concepts and mechanisms of quantum computing. However, for complex algorithms,
which typically consist of much larger circuits with many more qubits, this matrix-
based infrastructure becomes unwieldy, error-prone, and does not scale. In this chapter
we address the scalability problem by developing an improved infrastructure that
scales easily to much larger problems. We recommend at least skimming this content
before exploring the complex algorithms in Chapter 6. Ultimately, we are building the
foundation for a high-performance quantum simulator. You don’t want to miss it!
In this chapter, we first give an overview of the various levels of infrastructure
that will be developed, with the corresponding computational complexities and lev-
els of performance. We introduce quantum registers, which are named groups of
qubits. We describe a quantum circuit model, where most of the complexity of the
base infrastructure is hidden away in an elegant way. To handle the larger circuits
of advanced algorithms, we need faster simulation speeds. We detail an approach to
apply an operator with linear complexity rather than the O(n²) method that we started
with in Section 2.5.3. We then further accelerate this method with C++, attaining a
performance improvement of up to 100× over the Python version. For some specific
algorithms we can do even better. We describe a sparse state representation, which
will be the best-performing implementation for many circuits.
This book focuses on algorithms and how to efficiently simulate them on a classical
computer. Quantum simulation can be implemented in a multitude of ways. The key
attributes of the various implementation strategies are computational complexity, the
resulting performance, and the maximal number of qubits that can be simulated in a
reasonable amount of time with reasonable resource requirements.
The size of the quantum state vector (as described so far) grows exponentially
with the number of qubits. For a single qubit, we only need to store two complex
numbers, each amounting to 8 bytes when using float or 16 bytes when using double as
the underlying data type. Two qubits require four complex numbers; n qubits require
2^n complex numbers. Simulation speed, or the ability to fit a state into memory, is
typically measured by the number of qubits at which a given methodology is still
tractable. By tractable, we mean that a result can be obtained in roughly less than
an hour. At the time of this writing, the world record in storing and simulating a full
wave function stood at 48 qubits (De Raedt et al., 2019).
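To get a feeling for these numbers, the memory requirement of a full state vector is easy to compute; the helper below is purely illustrative:

```python
def state_vector_bytes(nbits: int, bytes_per_amplitude: int = 16) -> int:
    # 2**nbits complex amplitudes; a complex double takes 16 bytes.
    return 2**nbits * bytes_per_amplitude

print(state_vector_bytes(28) // 2**30, 'GiB for 28 qubits')  # 4 GiB
print(state_vector_bytes(48) // 2**50, 'PiB for 48 qubits')  # 4 PiB
```

A laptop with a few GiB of RAM tops out below 30 qubits; the 48-qubit record already needs petabytes of aggregate memory.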
Because of the exponential nature of the problem, improving performance by a
factor of 8× means that we can only handle three more qubits. If we see a speedup of
100×, this means we can handle six or seven additional qubits. The following are the
five different approaches we describe in this book:
another handful of qubits (if they were fully dedicated to a simulation job, including
all their secondary storage).
These aforementioned techniques are mostly standard High-Performance Comput-
ing (HPC) techniques. Since they don’t add much to the exposition here, we will not
discuss them further. We list a range of open-source solutions in Section 8.5.10, several
of which do support distributed simulation (the transpilation techniques detailed in
Section 8.5 allow the targeting of several of these simulators). What these numbers
demonstrate is how quickly simulation hits limits. Improving performance or scala-
bility by 1,000× only gains about 10 qubits. Adding 20 qubits results in 1,000,000×
higher resource requirements.
There are other important simulation techniques. For example, the Schrödinger–
Feynman simulation technique, which is based on path history (Rudiak-Gould,
2006; Frank et al., 2009). This technique trades performance for reduced memory
requirements. Other simulators work efficiently on restricted gate sets, such as the
Clifford gates (Anders and Briegel, 2006; Aaronson and Gottesman, 2004). Further-
more, there is ongoing research on improving simulation of specific circuit types
(Markov et al., 2018; Pan and Zhang, 2021). As exciting as these efforts are, they are
beyond the scope of this book.
For larger and more complex circuits, we want to make the formulation of algorithms
more readable by addressing qubits in named groups. For example, the circuit in
Figure 4.1 has a total of eight qubits. We want to name the first four data, the
next three ancilla, and the bottom one control. In the example, the gates are just
random. On the right side, the figure shows the global qubit index as g_x, as well as the index
into the named groups; for example, global qubit g5 corresponds to register qubit ancilla1.
These named groups of adjacent qubits are called quantum registers. In classical
machines, a register typically holds a single value (ignoring vector registers for a
moment). A group of registers, or the full physical implementation of registers in
hardware, is what is typically called a register file. In that sense, because a quantum
register is a named group of qubits, it is more akin to a classical register file, like a
group of pipes in a church organ register.
The state of the system is still the tensor product of all eight (global) qubits, num-
bered from 0 to 7 (g0 to g7 ). At the same time, we want to address data with indices
ranging from 0 to 3, which should produce the global qubit indices 0 to 3 in the
combined state; we want to address ancilla from 0 to 2, resulting in global qubit
indices 4 to 6; and we want to address control with index 0, resulting in global qubit
index 7. In code, a simple lookup table will do the trick.
The initial implementation is a bit rough. No worries, we will wrap this up nicely
in the next section. We introduce a Python class Reg (for “Register”) and initialize it
by passing the size of the register file we want to create and the current global offset,
[Figure 4.1 (circuit): an eight-qubit example circuit with random gates (H, Z, T, U); the right side labels the global qubit indices g0 : data0 through g3 : data3, g4 : ancilla0 through g6 : ancilla2, and g7 : control0.]
which has to be manually maintained for this interface. In the example above, the first
global offset is 0, for the second register it is 4, and for the last register it is 7.
By default, the states are assumed to be all |0⟩, but an initializer, it, can be passed
as well. If it is an integer, it is converted to a string with the number’s binary repre-
sentation. If it is a string (including after the previous step), tuple, or list, the lookup
table is initialized with 0s and 1s according to the binary numbers passed. Again, the
ordering is from most significant to least significant qubit.
class Reg():
def __init__(self, size: int, it=0, global_reg: int = None):
self.size = size
self.global_idx = list(range(global_reg,
global_reg + size))
self.val = [0] * size
if it:
if isinstance(it, int):
it = format(it, '0{}b'.format(size))
if isinstance(it, (str, tuple, list)):
for idx, val in enumerate(it):
if val == '1' or val == 1:
self.val[idx] = 1
For example, to create and initialize data with |1011⟩ and ancilla with |111⟩,
which is the binary representation of decimal 7, and to access global qubit 5, we write:
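The example listing is missing at this point in the extract. A runnable sketch (with the class body repeated from above so the snippet stands alone, plus an index accessor returning the global qubit index) could look like:

```python
class Reg:
    # The Reg class from above, repeated here for a standalone snippet.
    def __init__(self, size, it=0, global_reg=0):
        self.size = size
        self.global_idx = list(range(global_reg, global_reg + size))
        self.val = [0] * size
        if it:
            if isinstance(it, int):
                it = format(it, '0{}b'.format(size))
            for idx, v in enumerate(it):
                if v == '1' or v == 1:
                    self.val[idx] = 1

    def __getitem__(self, idx):
        # Map a register-local index to the global qubit index.
        return self.global_idx[idx]

data = Reg(4, 0b1011, 0)   # data    = |1011>, global qubits 0-3
ancilla = Reg(3, 7, 4)     # ancilla = |111>,  global qubits 4-6
print(ancilla[1])          # global qubit 5
```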
To give a textual representation of the register with the initial state, we write a short
dumper function to print the register in state notation:
The code so far is simple but good enough for our next use cases. It does not,
for example, allow the initialization of individual registers in superposition. To
get the global qubit index from a register’s index, we use this function, which
allows getting a global register by simply indexing into the register, for example,
greg = ancilla[1]:
@property
def nbits(self) -> int:
return self.size
After all this setup, we still have to create an actual state from the register, with:
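The listing for this step is not shown here. Functionally, it has to tensor one basis qubit per stored bit; a standalone numpy sketch of the computation behind it (the function name is an assumption):

```python
import numpy as np

def reg_to_state(val):
    # Build the state vector for a register initialized to the bits in val,
    # e.g. [1, 0] -> |10> = [0, 0, 1, 0]^T.
    psi = np.array([1.0])
    for bit in val:
        qubit = np.array([0.0, 1.0]) if bit else np.array([1.0, 0.0])
        psi = np.kron(psi, qubit)
    return psi
```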
We must only call this function once per initialized register, as a final step.
Modifying the initialization value of a register after the state has been created has
no effect on the created state. This may not be the most elegant way to do this, but it
is compact and sufficient for our purposes. We do not show any code examples here,
because we are going to develop a much nicer interface next.
4.3 Circuits
So far we have used full state vectors and operator matrices to implement the initial
algorithms. This infrastructure is easy to understand and works quite well for algo-
rithms with a small number of qubits. It is helpful for learning, but the representation
is explicit. It exposes the underlying data structures, and that can cause problems:
class qc:
"""Wrapper class to maintain state + operators."""
4.3.1 Qubits
The circuit class supports quantum registers, which immediately add a register’s qubits
to the circuit’s full state. This is the place where the global register count is maintained,
hiding away the rough earlier interface of the underlying Reg class:
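The listing for this register wrapper is missing in the extract; the essential bookkeeping — handing the current global offset to each new register and advancing it — can be sketched standalone (all names here are assumptions, simplified away from the book's qc class):

```python
class QcRegisters:
    # Minimal sketch of what qc.reg() maintains: a running global offset.
    def __init__(self):
        self.global_reg = 0
        self.registers = []

    def reg(self, size, name=None):
        # Hand out the next `size` global qubit indices and advance.
        offsets = list(range(self.global_reg, self.global_reg + size))
        self.global_reg += size
        self.registers.append((name, offsets))
        return offsets

qc = QcRegisters()
data = qc.reg(4, 'data')       # global qubits 0-3
ancilla = qc.reg(3, 'ancilla') # global qubits 4-6
control = qc.reg(1, 'control') # global qubit 7
```

This mirrors the register layout of Figure 4.1: ancilla[1] resolves to global qubit 5.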
To add individual qubits to the circuit, we wrap the various constructor functions
discussed earlier with corresponding member functions of qc. Each of these generator
functions immediately combines the newly generated qubits with the internal state. In
order to allow mixing of qubits and registers, we have to update the global register
count as well.
def qubit(self,
alpha: np.complexfloating = None,
beta: np.complexfloating = None) -> None:
self.psi = self.psi * state.qubit(alpha, beta)
self.global_reg = self.global_reg + 1
@property
def nbits(self) -> int:
return self.psi.nbits
The function apply1 applies the single-qubit gate gate to the qubit at idx. The
gate to be applied may be given a name. Some gates require parameters, e.g., the rotation
gates, which can be specified with the named parameter val.
The function applyc operates the same way, but it additionally gets the index of
the controlling qubit ctl.
4.3.3 Gates
With these two apply functions in place, we can now wrap all standard gates as
member functions of the circuit. This is mostly a straightforward wrapping, except
for the double-controlled X-gate and the corresponding ccx member function, which
uses the special construction we saw earlier in Section 3.2.7.
def ccx(self, idx0: int, idx1: int, idx2: int) -> None:
"""Sleator-Weinfurter Construction."""
self.cv(idx0, idx2)
self.cx(idx0, idx1)
self.cv_adj(idx1, idx2)
self.cx(idx0, idx1)
self.cv(idx1, idx2)
def toffoli(self, idx0: int, idx1: int, idx2: int) -> None:
self.ccx(idx0, idx1, idx2)
All these gates can be applied with our still-hypothetical two apply functions,
except the unitary function. This function allows the application of an arbitrarily
sized operator, falling back to the full matrix implementation. In the context of qc,
this function is an abomination. As a matter of fact, we don’t use it for any of the
examples and algorithms in this book. We only added it for generality. Don’t use it.
And add the id function as a default parameter to each gate application function. For
example, for the S-gate:
And to apply the adjoint function, call the function with the adjoint function as
parameter:
qc.s(0, circuit.adjoint)
This is certainly elegant, especially for compiled languages, which can optimize away
the overhead from this construction. Python is relatively slow as is, and we don’t want
to further slow it down, so we go with the first alternative – we add individual apply
functions for adjoint gates, as needed for the code examples.
4.3.5 Measurement
We wrap the measurement operator in a straightforward way:
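The wrapper's listing is cut off here. At its core, determining a measurement probability reduces to summing |amplitude|² over the basis states whose bit matches; a standalone sketch (which skips collapsing the state, and whose name and signature are assumptions):

```python
import numpy as np

def prob_qubit(psi, idx, nbits, tostate=0):
    # Probability that qubit idx measures as |tostate> in an nbits state.
    p = 0.0
    for i in range(2**nbits):
        bit = (i >> (nbits - 1 - idx)) & 1
        if bit == tostate:
            p += abs(psi[i])**2
    return p

psi = np.array([1, 0, 1, 0]) / np.sqrt(2)   # the state |+>|0>
```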
Note that we construct a full-matrix measurement operator, which means this way
of measuring won’t scale. Fortunately, in many cases we don’t have to perform an
actual measurement to determine a most likely measurement outcome. We can just
look at the state vector and find the one state with the highest probability – we can do
measurement by peek-a-boo.
For convenience, we also add a statistical sampling function. Its parameter is the
probability of measuring |0⟩. For example, we could provide the value 0.25. The
function picks a random number in the range of 0.0 to 1.0. If the probability of
measuring |0⟩ is lower than this random number, it means that we happened to measure
a state |1⟩.
To a degree this is silly, as in our infrastructure we know the probabilities for any
given state. We don't have to sample over the probabilities to obtain probabilities we
already know. Nevertheless, some code might be written as if it were running
on an actual quantum computer, and that would make sampling necessary. To mimic
this, and also to mirror code that can be found in other infrastructures, we offer this
function.
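A minimal sketch of such a sampling function (name and signature are assumptions):

```python
import random

def sample_state(prob_state0: float) -> int:
    # Return 1 ("we measured |1>") when the probability of |0> falls
    # below the random draw, else 0 ("we measured |0>").
    if prob_state0 < random.random():
        return 1
    return 0

# With prob_state0 = 0.25, roughly 75% of samples come out as 1.
counts = sum(sample_state(0.25) for _ in range(10000))
```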
[Figure 4.2 (circuit): five qubits q0 through q4; the original diagram shows a multi-controlled X-gate on q4 with a mix of Controlled-By-1 and Controlled-By-0 controls on the other qubits.]
The Controlled-Swap gate will be used later in Shor's algorithm (Section 6.6). It is easy to implement by simply
changing the cx gates in a Swap gate to double-controlled ccx gates:
def cswap(self, ctl: int, idx0: int, idx1: int) -> None:
self.ccx(ctl, idx1, idx0)
self.ccx(ctl, idx0, idx1)
self.ccx(ctl, idx1, idx0)
• For the controlling gates, we allow zero, one, two, or more controllers. This makes this
implementation quite versatile in several scenarios.
• We allow for Controlled-By-1 gates and Controlled-By-0 gates. To mark a gate as
Controlled-By-0, the index idx of the controller is passed as a single-element list
[idx].
For the example in Figure 4.2, for the X-gate on qubit q4, which is controlled by
both By-1 and By-0 control qubits, we make the following function call. Of course,
we have to make sure we have reserved enough space for the ancillae in the
aux register:
Here is the full implementation. We also modified the function applyc to enable
Controlled-By-0 gates by emitting an X-gate before and after the controller qubit (not
shown here).
# Uncompute predicate.
aux_idx = aux_idx - 1
for i in range(len(ctl)-1, 1, -1):
  self.ccx(ctl[i], aux[aux_idx], aux[aux_idx+1])
  aux_idx = aux_idx - 1
self.ccx(ctl[0], ctl[1], aux[0])
4.3.8 Example
To show an example of how to use the circuit model, here is the classical arithmetic
adder circuit using the matrix-based infrastructure:
And here is the formulation using the quantum circuit. It is considerably more
compact. It also hides the implementation details, which means we will be able to
accelerate the circuit later (in Sections 4.4 and 4.5) by providing fast implementations
of the apply functions.
qc.cx(0, 3)
qc.cx(1, 3)
qc.ccx(0, 1, 4)
qc.ccx(0, 2, 4)
qc.ccx(1, 2, 4)
qc.cx(2, 3)
As we will see later in Section 8.5, this wrapper class makes it easy to augment the
implementations of the gate application functions to add functionality for transpilation
of a circuit to other forms, e.g., QASM (Cross et al., 2017), Cirq (Google, 2021c), or
Qiskit (Gambetta et al., 2019).
But wait – we have not yet detailed the apply functions! They will be the topic of
the next sections.
Up to this point, we have applied a gate by first tensoring it with identity matrices
and then applying the resulting large matrix to a full state vector. As described in
the introductory notes in Section 4.1, this does not scale well beyond a small number
of qubits. For ten qubits, the augmented matrix is already a 1024 × 1024 matrix,
requiring 1024² multiplications and additions. Can we devise a more efficient way to
apply gates? Yes, we can.
Let’s analyze what happens during gate application. To start the analysis, we create
a pseudo state vector which is not normalized, but allows the visualization of what
happens to it as gates are applied to individual qubits.
qc = circuit.qc('test')
qc.arange(4)
print(qc.psi)
>>
4-qubit state. Tensor:
[ 0.+0.j 1.+0.j 2.+0.j 3.+0.j 4.+0.j 5.+0.j 6.+0.j 7.+0.j 8.+0.j
9.+0.j 10.+0.j 11.+0.j 12.+0.j 13.+0.j 14.+0.j 15.+0.j]
Now we apply the X-gate to qubits 0 to 3, one by one, always starting with a freshly
created vector. The X-gate is interesting in that it multiplies state vector entries by 0
and 1, causing values to swap. This is similar to how applying the X-gate to a regular
qubit “flips” |0⟩ and |1⟩.
It appears the right half of the vector was swapped with the left half. Let’s try the
next qubit index. Applying the X-gate to qubit 1 results in:
Now it appears that chunks of four vector elements are being swapped. The ele-
ments 4–7 swap position with elements 0–3, and the elements 12–15 swap position
with elements 8–11. A pattern is emerging. Let us apply the X-gate to qubit 2:
The pattern continues: now groups of two elements are swapped. And finally, for
qubit 3 we see that individual elements are being swapped:
• Qubit 0. Applying the X-gate to qubit 0 swaps the first half of the state vector
with the second half.
If we interpret vector indices in binary, the state elements with indices that had
bit 3 set (most significant bit) switched position with the indices that did not have
bit 3 set. Positions 8–15 had bit 3 set and switched positions with 0–7, which did
not have bit 3 set. One block of eight elements got switched.
• Qubit 1. Applying the X-gate to qubit 1 swaps the second quarter of the state
vector with the first, and the fourth quarter with the third.
Correspondingly, the vector elements with indices that had bit 2 set switched
with the ones that have bit 2 not set, “bracketed” by the bit pattern in bit 3. What
does it mean that an index is bracketed by a higher-order bit? It simply means that
the higher-order bit did not change, it remained 0 or 1. Only the lower order bits
switch between 0 and 1. Blocks of four elements were switched at a time and there
are two brackets for qubit index 1.
• Qubit 2. Applying the X-gate to qubit 2 swaps the second eighth of the state
vector with the first, the fourth with the third, the sixth with the fifth, and so on.
Similar to above, the vector elements with indices that had bit 1 set switched
with the ones that didn’t have bit 1 set. This swapping is bracketed by the bit
pattern in bit 2 and further bracketed by the bit patterns of bit 3.
• Qubit 3. Finally, and again similar to above, applying the X-gate to qubit 3 now
swaps single elements: element 0 swaps with element 1, element 3 swaps with
element 2, and so on.
We can put this pattern in a closed form by looking at the binary bit pattern for the
state vector indices (Smelyanskiy et al., 2016). Let us introduce this bit index notation
for a state with a classical binary bit representation (where we omit the state brackets
|·⟩ for ease of notation):

ψ_{β_{n−1} β_{n−2} ... β_0}

As an example, the state |01101⟩ can be written as decimal |13⟩ or ψ_{01101} in this
notation.
Applying a single-qubit gate on qubit k in an n-qubit state (qubits 0 to n − 1)
applies the gate to a pair of amplitudes whose indices differ in bit (n − 1 − k) in binary
representation. In our first example, we have four qubits. Qubit 0 translates to classical
bit 3 in this notation, and qubit 3 corresponds to classical bit 0. We apply the X-gate to
the probability amplitudes that correspond to the states where the bit index switches
between 0 and 1, thus swapping chunks of the state vector. The swapping happens
because we applied the X-gate. Again, the same approach will work for all gates; it
just becomes visual and easy to understand for the X-gate.
In general, we want to apply a gate G to a qubit of a system in state |ψ⟩, where
G is a 2 × 2 matrix. Let us name the four matrix elements G_00, G_01, G_10, and G_11,
corresponding to left-top, right-top, left-bottom, and right-bottom.
Applying a gate G to the kth qubit corresponds to the following recipe. This nota-
tion indicates looping over the full state vector. All vector elements whose indices
match the specified bit patterns are multiplied with the gate elements G_00, G_01,
G_10, and G_11, as specified in this recipe.
For controlled gates, the pattern can be extended. We have to ensure that the control
bit c is set to 1 and only apply the gates to states for which this is the case:
To apply a single gate, we add this function to our implementation of states in file
lib/state.py (with 1<<n as an optimized version of 2**n):
The implementation for controlled gates is very similar, but note the additional if
statement in the code, checking whether or not the control bit is set:
def applyc(self, gate: ops.Operator, ctrl: int, target: int) -> None:
"""Apply a controlled 2-qubit gate via explicit indexing."""
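The function bodies are elided here; the same explicit-indexing idea can be sketched in plain numpy as follows (function names and arguments are illustrative, not the book's exact lib/state.py API):

```python
import numpy as np

def apply_gate(gate, target, psi, nbits):
    """Apply a 2x2 gate to the target qubit via explicit indexing."""
    psi = psi.copy()
    # Qubit `target` corresponds to classical bit (nbits - 1 - target).
    stride = 1 << (nbits - 1 - target)
    for g in range(0, 1 << nbits, stride << 1):
        for i in range(g, g + stride):
            t1, t2 = psi[i], psi[i + stride]
            psi[i] = gate[0, 0] * t1 + gate[0, 1] * t2
            psi[i + stride] = gate[1, 0] * t1 + gate[1, 1] * t2
    return psi

def apply_controlled_gate(gate, ctrl, target, psi, nbits):
    """Apply a controlled 2x2 gate: update only if the control bit is 1."""
    psi = psi.copy()
    cbit = 1 << (nbits - 1 - ctrl)
    stride = 1 << (nbits - 1 - target)
    for g in range(0, 1 << nbits, stride << 1):
        for i in range(g, g + stride):
            if i & cbit:  # the additional check on the control bit
                t1, t2 = psi[i], psi[i + stride]
                psi[i] = gate[0, 0] * t1 + gate[0, 1] * t2
                psi[i + stride] = gate[1, 0] * t1 + gate[1, 1] * t2
    return psi
```

For example, applying a Pauli-X to qubit 0 of a two-qubit |00⟩ moves the amplitude from index 0 to index 2 (binary 10), and a controlled-X from qubit 0 to qubit 1 maps |10⟩ to |11⟩.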
4.4.1 Benchmarking
The complexity of this method is now linear in the size of the state vector, O(N)
with N = 2^n for n qubits, compared to O(N^2) for full matrix-vector multiplication.
To see how quickly one outperforms the other, we write a quick test.
This is not “rocket surgery,” but the effects are too pleasing to ignore.
nbits = 12
qubit = random.randint(0, nbits-1)
gate = ops.PauliX()
def with_matmul():
psi = state.zeros(nbits)
op = ops.Identity(qubit) * gate * ops.Identity(nbits - qubit - 1)
psi = op(psi)
def apply_single():
psi = state.zeros(nbits)
psi = apply_single_gate(gate, qubit, psi)
We could now add these routines to the quantum circuit class, but wait – we can do
even better and accelerate these routines with C++. This will be the topic of the next
section.
We now understand how to apply gates to a state vector with linear complexity, but
the code was in Python, which is known to execute slower than C++. In order to add
a few more qubits to our simulation capabilities and accelerate gate application, we
implement the gate application functions in C++ and import them into Python using
standard extension techniques.
This section contains a lot of C++ code. The core principles were shown in Section
4.4; there is not much new here, except some fun observations about performance at
the end. We still detail this code as it might be of value for readers with no experience
extending Python with fast C++. The actual open-source code is about 150 lines long
and available in the open-source repository.
The key routines are in a file xgates.cc for “accelerated gates”. The <path> to
numpy must be set correctly to point to a local setup. The open-source repository will
have the latest instructions on how to compile and use this Python extension. We also
want to support both float and double complex numbers, so we templatize the code
accordingly.
#include <stdio.h>
#include <stdlib.h>
#include <complex>
The code above mirrors the Python implementation very closely. To now extend
Python and make this extension loadable as a shared module, we add standard Python
bindings code for single-qubit gates:
PyObject *gate_arr =
PyArray_FROM_OTF(param_gate, npy_type, NPY_IN_ARRAY);
cmplx_type *gate = ((cmplx_type *)PyArray_GETPTR1(gate_arr, 0));
Py_DECREF(psi_arr);
Py_DECREF(gate_arr);
}
There is, of course, similar code for the controlled gates in the open-source repos-
itory. The following are the functions the Python interpreter will call when importing
a module. We register the Python wrappers in a module named xgates with standard
boilerplate code:
PyMODINIT_FUNC PyInit_xgates(void) {
Py_Initialize();
import_array();
return PyModule_Create(&xgates_definition);
}
In order for Python to be able to find this extension, we typically set an environment
variable. For example, on Linux:
export PYTHONPATH=path_to_xgates.so
Alternatively, you can extend Python’s module search path programmatically with
code like this:
import sys
sys.path.append('/path/to/search')
import xgates
There is also a modified and seemingly faster variant of the inner loop, which avoids
at least four multiplications per iteration.
Table 4.1 Benchmark results (program output), comparing hand-optimized and nonoptimized gate
application routines.
The performance results are shown in Table 4.1. Remember our hypothesis that
the optimized version would be faster, because it executes fewer multiplications and
additions. Column Iterations shows iterations per second; higher is better.
The specialized version runs about 10% slower! For the given x86 platform, the
compiler was able to vectorize the nonspecialized version, leading to a slightly higher
overall throughput. Intuition is good, validation is better.
In summary, we found a way to apply gates with linear complexity and accelerated
it by another significant factor with C++. Performance comparison to the Python
version shows a speedup of about 100×. This should add six or seven additional
qubits to our simulation capabilities. This infrastructure is sufficient for all remaining
algorithms in this book.
There are other ways to simulate quantum computing (Altman et al., 2021), as we
discussed at the end of Section 4.1. There is one specific, interesting way to represent
states sparsely. For many circuits this is a very efficient data structure. We give a brief
overview of it in Section 4.6 and provide full implementation details in the Appendix.
So far, our data structure for representing quantum states is a dense array holding all
probability amplitudes for all superimposed states, where the amplitude for a specific
state can be found via binary indexing. However, for many circuits and algorithms,
there can be a high percentage of states with close to zero probability. Storing these
0-states and applying gates to them will have no effect and is wasteful. This fact can be
exploited with a sparse representation. An excellent reference implementation of this
principle can be found in the venerable, open source libquantum library (Butscher
and Weimer, 2013).
We re-implement the core ideas of that library as they pertain to this book;
libquantum addresses other aspects of quantum information, which we do not
cover. We therefore name our implementation libq to distinguish it from the original.
The original library is in plain C, but our implementation was moderately updated
with C++ for improved readability and performance. We maintain some of the C
naming conventions for key variables and functions to help with direct comparisons.
Here is the core idea: assume we have a state of N qubits, all initialized to be in the
state |0⟩. The dense representation stores 2^N complex numbers, where only the very
first entry is a 1.0 and all other values are 0.0, corresponding to state |00...0⟩.
Our library libq turns this on its head. States are stored as bitmasks (currently
up to 64 qubits, but this can be extended), where 0s and 1s correspond to states |0⟩
and |1⟩. Each of these bit combinations is then paired with a probability amplitude.
In the above example, libq would store the tuple (0x00...0, 1.0), indicating
that the only state with nonzero probability is |00...0⟩. For 53 qubits, the full state
representation would require 72 petabytes of memory, while the sparse representation
only requires a total of 16 bytes if the amplitude is stored as a double precision value,
12 bytes if we use 4-byte floats.
Applying a Hadamard gate to the least significant qubit will put it in superposition.
In libq, this means that there are now two states with nonzero probability:
(0x00...0, 0.707) and (0x00...1, 0.707).
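The core idea can be sketched in a few lines of Python, where a dictionary stands in for libq's hash table (this is illustrative, not the actual libq API):

```python
import math

# Sparse state: map basis-state bitmasks to amplitudes.
# Zero-amplitude states are simply absent from the dictionary.
def apply_hadamard(state, qubit):
    h = 1.0 / math.sqrt(2.0)
    bit = 1 << qubit
    out = {}
    for bits, ampl in state.items():
        # H maps |0> -> (|0> + |1>)/sqrt(2) and |1> -> (|0> - |1>)/sqrt(2).
        sign = -1.0 if bits & bit else 1.0
        out[bits & ~bit] = out.get(bits & ~bit, 0.0) + h * ampl
        out[bits | bit] = out.get(bits | bit, 0.0) + sign * h * ampl
    # Prune states that interfered away to (near) zero.
    return {b: a for b, a in out.items() if abs(a) > 1e-12}

psi = {0b0: 1.0}               # |00...0> needs just one entry
psi = apply_hadamard(psi, 0)   # now two entries with amplitude ~0.707
```

Note how a state of any number of qubits starts out as a single dictionary entry, and entries only appear as superposition actually spreads.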
4.6.1 Benchmarking
We only provide anecdotal evidence for the efficiency of the sparse representation.
A full performance evaluation is ill-advised in a book like this – the results will be out
of date and no longer relevant by the time you read this.
The most complex algorithm in this book is Shor’s integer factorization algorithm
(Section 4.6). The quantum part of the algorithm is called order finding. To factor the
number 15, it requires 18 qubits and 10,533 gates; to factor 21, it requires 22 qubits
and 20,671 gates, and to factor 35, it requires 26 qubits and 36,373 gates. We run this
circuit in two different ways:
Both versions will compute the same result; the textual output only differs
marginally. To factor 21 with 22 qubits, we get the following result. Note that a
maximum of only 1.6% of all possible states ever obtain a nonzero probability at one
point or the other during execution:
# of qubits : 22
# of hash computes : 2736
Maximum # of states: 65536, theoretical: 4194304, 1.562%
States with nonzero probability:
0.499966 +0.000000i|4> (2.499658e-01) (|00 0000 0000 0000 0000 0100>)
The libq version runs in under five seconds on a modern workstation, while the
circuit version takes about 2.5 minutes, a speedup of roughly 25×. To factor the
number 35 with 26 qubits, the libq version runs for about 3 minutes, while the full
state simulation takes about an hour. Again, a solid speedup, this time of about 20×.
We do ignore the compile times for the generated C++ versions. We would have to
include these in an actual scientific evaluation, which this is not.
5 Beyond Classical
The term Beyond Classical is now the preferred term over Quantum Advantage, which
in turn was preferred over the unfortunate term Quantum Supremacy. The latter term
was originally coined by Prof. John Preskill to describe a computation that can be
run efficiently on a quantum computer but would be intractable to run on a classical
computer (Preskill, 2012; Harrow and Montanaro, 2017).
Computational complexity theory is a pillar of computer science. A good intro-
duction, along with extensive literature references, can be found in Dean (2016).
There exists a large set of complexity classes. The best known big categories are the
following:
• Class P, the class of decision problems (with a yes or no answer) with problem
size n that run in polynomial time (n^x).
• Class NP, decision problems with exponential run time (x^n) which can be verified
in polynomial time.
• Class NP-complete, which is a somewhat technical construction. It is a class of NP
problems that other NP-complete problems can be mapped to in polynomial time.
Finding a single example from this class falling into P would mean that all
members of this class are in P as well.
• Class NP-hard, the class of problems that are at least as hard as the hardest
problems in NP. To simplify a little bit, this is the class of NP problems that may
not be a decision problem, such as integer factorization, or for which there is no
known polynomial-time algorithm for verification, such as the traveling salesman
problem (Applegate et al., 2006).
There are dozens of complexity classes with various properties and inter-
relationships. The famous question of whether P = NP remains one of the great
challenges in computer science today (and can be answered jokingly with yes – if
N = 1 or P = 0).
The interest in quantum computing arises from the belief that quantum algorithms
fall into class BQP, the complexity class of algorithms that can be solved by a quantum
Turing machine in polynomial time with an error probability of less than 1/3. This
group is believed to be more powerful than class BPP, the class of algorithms that can
be solved in polynomial time by a probabilistic Turing machine with a similar error
rate. Stated simply, there is a class of algorithms that can run exponentially faster on
quantum machines than on classical machines.
From a complexity-theoretic point of view, since BQP contains BPP, this would
mean that quantum computers can efficiently simulate classical computers. However,
would we run a word processor or video game on a quantum computer? Classical
and quantum computing appear complementary. The term beyond seems well chosen
to indicate that there is a complexity class for algorithms that run tractably only on
quantum computers.
To establish the quantum advantage, we will not take a complexity-theoretic
approach in this book. Instead, we will try to estimate and validate the results of the
quantum supremacy paper by Arute et al. (2019) to convince ourselves that quantum
computers indeed reach capabilities beyond those of classical machines.
In 2019, Google published a seminal paper claiming to finally have reached quantum
advantage on their 53-qubit Sycamore chip (Arute et al., 2019). A quantum random
circuit was executed and sampled 1,000,000 times in just 200 seconds, a result
that was estimated to take the world's fastest supercomputer 10,000 years to simulate
classically.
Shortly thereafter, IBM, a competitor in the field of quantum computing, followed
up by estimating that a similar result could be achieved in just a few days, with higher
accuracy, on a classical supercomputer (Pednault et al., 2019). A few days versus 200
seconds is a factor of about 1,000×. A few days versus 10,000 years is a factor
of about 1,000,000×. Disagreements of this magnitude are exciting. How is it that these two
great companies disagree to the tune of 1,000,000×?
In order to make claims about performance, you first need a proper benchmark. Typi-
cal benchmark sets are SPEC (www.spec.org) for CPU performance and the recent
MLPerf benchmarks (http://mlcommons.org) for machine learning systems. It is
also known that as soon as benchmarks are published, large groups of people embark
on efforts to optimize and tune their various infrastructures towards the benchmarks.
When these efforts cross into an area where optimizations only work for benchmarks,
these efforts are called benchmark gaming.
The challenge, therefore, is to build a benchmark that is meaningful, general, yet
hard to game. Google suggested the methodology of using quantum random circuits
(QRC) and cross entropy benchmarking (XEB) (Boixo et al., 2018). XEB observes
that the measurement probabilities of a random circuit follow certain patterns, which
would be destroyed if there were errors or chaotic randomness in the system. XEB
samples the resulting bitstrings and uses statistical modeling to confirm that the chip
indeed performed a nonchaotic computation. The math is beyond the scope of this
text, so we defer to Boixo et al. (2018) for further details.
5.2 Quantum Random Circuit Algorithm
There are specific constraints for the gates on the Google chip; they cannot be
placed at random. We follow the original construction rules from Boixo et al. (2018).
The supremacy experiment uses three types of gates, each a rotation by π/2 around
an axis on the Bloch sphere's equator. Note that the definitions of these gates are slightly
different from those we presented earlier:
    X^{1/2} ≡ R_X(π/2) = 1/√2 [[1, −i], [−i, 1]],

    Y^{1/2} ≡ R_Y(π/2) = 1/√2 [[1, −1], [1, 1]],

    W^{1/2} ≡ R_{X+Y}(π/2) = 1/√2 [[1, −√i], [√−i, 1]].
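Writing these matrices out in numpy confirms that each is unitary (a quick sanity check; it assumes √i = e^{iπ/4} and √−i = e^{−iπ/4} in the W-gate):

```python
import numpy as np

s = 1 / np.sqrt(2)
x_half = s * np.array([[1, -1j], [-1j, 1]])
y_half = s * np.array([[1, -1], [1, 1]])
sqrt_i = np.exp(1j * np.pi / 4)    # sqrt(i)
sqrt_mi = np.exp(-1j * np.pi / 4)  # sqrt(-i)
w_half = s * np.array([[1, -sqrt_i], [sqrt_mi, 1]])

# Each gate G must satisfy G G† = I.
for gate in (x_half, y_half, w_half):
    assert np.allclose(gate @ gate.conj().T, np.eye(2))
```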
There is a list of specific constraints for circuits:
• For each qubit, the very first and last gates must be Hadamard gates. This is
reflected in a notation for circuit depth as 1-n-1, indicating that n steps, or gate
levels, are to be sandwiched between Hadamard gates.
• Apply CZ gates in the patterns shown in Figure 5.1, alternating between horizontal
and vertical layouts.
• Apply single-qubit operators X^{1/2}, Y^{1/2}, and T (or W^{1/2}) to qubits that are not
affected by the CZ gates, using the criteria below. For our simulation (using our
infrastructure, which does not specialize for specific gates), the choice of gates
actually does not matter in regards to computational complexity: they are all 2 × 2
gates, and we can use any of the standard gates. For more sophisticated
methodology, like tensor networks, the choice of gates can make a difference.
(Figure 5.1: patterns for applying controlled gates on the Sycamore chip — diagrams not reproduced.)
(Figure 5.2: an example circuit with layers of H, T, U, and CZ gates on ten qubits — diagram not reproduced.)
– If the previous cycle had a CZ gate, apply any of three single-qubit unitary
gates.
– If the previous cycle had a nondiagonal unitary gate, apply the T-gate.
– If the previous cycle had no unitary gate (except Hadamard), apply the T-gate.
– Else, don’t apply a gate.
• Repeat above steps for a given number of steps (which we call depth in our
implementation).
• Apply the final Hadamard and measure.
This interpretation of the rules produces a circuit like the one shown in Figure 5.2.
Note that there have been refinements since first publication; Arute et al. (2020) has
the details. The main motivation for making changes was to make it harder for the new
circuits to be simulated by tensor networks, the most efficient simulation technique for
this type of network (Pan and Zhang, 2021). In our case, we are looking for orders of
magnitude differences. We stick with this original definition and make sure to apply
corresponding fudge factors in the final estimation.
Let’s implement this interpretation. Again, it doesn’t matter which gates to apply
specifically; the simulation time is the same for each gate in our infrastructure. As long
as the gate types and density are roughly aligned with the Google circuit, our estima-
tion should be reasonably accurate. Note that other infrastructures, including Google’s
qsimh, do apply a range of optimizations to improve simulation performance.
We encode the patterns as lists of indices, where a nonzero element indicates a CZ
gate from the current index to the index with the offset found at that location. The
eight patterns are then encoded like this:
pattern2 = [1, 0, 0, 0, 1, 0,
0, 0, 1, 0, 0, 0] * 3
Gates are represented by a simple enumeration (H, T, U, CZ). With this, we are ready
to build the circuit. We start from the horizontal and vertical patterns and then proceed
to apply the rules as stated above. Note, again, for our simulation the actual gates do
not matter:
def apply_pattern(pattern):
bits_touched = []
for i in range(min(nbits, len(pattern))):
if pattern[i] != 0 and i + pattern[i] < nbits:
bits_touched.append((i, i + pattern[i]))
return bits_touched
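Running apply_pattern on the pattern shown earlier, with a hypothetical 12-qubit register, yields the qubit pairs that would receive CZ gates:

```python
nbits = 12  # illustrative register size

pattern2 = [1, 0, 0, 0, 1, 0,
            0, 0, 1, 0, 0, 0] * 3

def apply_pattern(pattern):
    bits_touched = []
    for i in range(min(nbits, len(pattern))):
        if pattern[i] != 0 and i + pattern[i] < nbits:
            bits_touched.append((i, i + pattern[i]))
    return bits_touched

print(apply_pattern(pattern2))   # [(0, 1), (4, 5), (8, 9)]
```

Each tuple (i, j) names a CZ gate from qubit i to qubit j; the remaining qubits are then eligible for the single-qubit gates per the rules above.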
5.4 Estimation
The idea is to construct a circuit of the same structure but with a smaller, tractable
dimension, simulate it, and from the simulation results extrapolate to a 53-qubit circuit.
We make several simplifying assumptions, most notably, that communication
between machines is free. At the end of the estimation, we should apply appropriate
factors to account for such overheads.
Simulation is done with an eager execution function, which iterates over the depth
of the circuit, simulating each gate one by one:
[...]
qc = circuit.qc('Supremacy Circuit')
qc.reg(nbits)
for d in range(depth):
s = states[d]
for i in range(nbits):
if s[i] == Gate.UNK:
continue
ngates += 1
if s[i] == Gate.T:
qc.t(i)
[... similar for H, U/V, U/Yroot]
if s[i] == Gate.CZ:
ngates += 1 # This is just an estimate of the overhead
if i < nbits - 1 and s[i + 1] == Gate.CZ:
qc.cz(i, i+1)
s[i+1] = Gate.UNK
if i < nbits - 6 and s[i + 6] == Gate.CZ:
qc.cz(i, i+6)
s[i+6] = Gate.UNK
[...]
To estimate the time it would take to execute this circuit at 53 qubits, we make the
following assumptions:
• We assume that one-qubit and two-qubit gate application time is linear over the
size of the state vector.
• Performance is memory-bound.
• We know we’d have to distribute the computation over multiple machines, but we
ignore the communication cost.
• We assume a number of machines and a number of cores on those machines. We
know that a small number of cores on a high-core machine can saturate the
available memory bandwidth, so we take a guess on what the number of
reasonably utilized cores would be (16, but this number can be adjusted).
With these assumptions, the metric Time per gate per byte in the state vector is the
one we’ll use to extrapolate results. It is remarkably stable across qubits and depth
and thus can be used to estimate approximate performance of bigger circuits. In order
to estimate how many gates there would be in a larger circuit, we compute a gate
ratio, which is the number of gates found in a circuit divided by (nbits * depth).
In code:
estimated_sim_time_secs = (
# time per gate per byte
(duration / ngates / (2**(nbits-1) * 16))
# gates
* target_nbits
# gate ratio scaling factor to circuit size
* gate_ratio
# depth
* target_depth
# memory
* 2**(target_nbits-1) * 16
# number of machines
/ flags.FLAGS.machines
# Active core per machine
/ flags.FLAGS.cores)
Let’s look at a specific result. We assume the target circuit has 53 qubits and
is run on 100 machines, each one having 16 fully available cores. The number of
gates in our simulation (475) seems to roughly align with the published number of
gates from the Google publication, though not exactly. (Google quoted 1,200 gates,
while extrapolating the 475 would yield about 2,000 gates.) For these parameters, the
estimation results are:
5.5 Evaluation
For comparison, let’s look at the massive Summit supercomputer (Oak Ridge National
Laboratory, 2021). It is theoretically capable of performing up to 10^17 single precision
floating point operations per second. To compute 2^53 equivalents of 2 × 2 matrix
multiplications is of complexity 2^56. At 100 percent utilization, it would take Summit
just a few seconds to compute a full simulation!
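A quick back-of-the-envelope check of this claim:

```python
ops_needed = 2 ** 56   # ~2^53 amplitudes, a few flops each
peak_flops = 1e17      # Summit's peak single-precision rate
seconds = ops_needed / peak_flops
print(f'{seconds:.2f} seconds')   # well under a second at 100% utilization
```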
To store a full state of 53 qubits, we need 72 PB of storage. Summit has
an estimated 2.5 PB of RAM on all sockets, and 250 PB of secondary storage. This
means we should expect that the simulation encounters high communication overhead
moving data from permanent storage into RAM. Much of the permanent storage would
have to be reserved for this experiment as well. The researchers at IBM found an
impressive way to minimize data transfers, which is a major contribution by Pednault
et al. (2019). With this technique, a slow-down of about 500× was anticipated, which
led to the estimate that a full simulation could run in about two days.
Now let’s try to answer the question that started this section: Where does the
discrepancy of 10,000 years versus days come from? This is a factor of about 1,000×,
after all.
The Google Quantum X team based their estimations on a different simulator
architecture (Markov et al., 2018), assuming that a full-state simulation is not realistic.
6 Complex Algorithms
Now that we have convinced ourselves that quantum computers can indeed reach
capabilities beyond classical, at least on a semi-random circuit, we move on to discuss
more meaningful algorithms. The previous sections on simple algorithms prepared us
well to explore the complex algorithms in this chapter. We will still use a mix of full
matrix and accelerated circuit implementations, depending on which seems best in
context. It is recommended that you at least skim Chapter 4 on infrastructure before
exploring this chapter.
In this chapter, we develop the quantum Fourier transform (QFT), an important
technique used by many complex algorithms, and show it in action by performing
arithmetic in the quantum Fourier domain. We discuss phase estimation next, another
essential tool, especially when used together with QFT. Armed with these tools, we
embark on implementing Shor’s famous factorization algorithm.
After this, we switch gears and discuss Grover’s search algorithm, along with some
derivatives and improvements. We show how combining Grover and phase estimation
leads to the interesting quantum counting algorithm. A short interlude on the topic of
quantum random walks follows. Quantum walks are a complex topic; we only discuss
and implement basic principles.
At a high level, quantum computing appears to have a computational complexity
advantage over classical computing for several classes of algorithms and their deriva-
tives. These are algorithms utilizing quantum search, algorithms based on the quantum
Fourier transform, algorithms utilizing quantum random walks, and a fourth class,
the simulation of quantum systems. We detail the variational quantum eigensolver
algorithm (VQE), which allows finding minimum eigenvalues of a Hamiltonian. As
an application, we develop a graph maximum cut algorithm by framing the problem
as a Hamiltonian. This algorithm was introduced as part of the quantum approximate
optimization algorithm (QAOA), which we briefly touch upon. We further explore the
Subset Sum problem using a similar mechanism.
We conclude this chapter with an in-depth discussion of the elegant Solovay–Kitaev
algorithm for gate approximation, another seminal result in quantum computing.
6.1 Phase Kick
(Figure 6.1: a phase kick circuit — two qubits initialized to |0⟩ with Hadamard gates,
controlling an S-gate and a T-gate on an ancilla initialized to |1⟩; diagram not reproduced.)
In this section we discuss the phase kick mechanism, which is the basis for the quan-
tum Fourier transform.
The controlled rotation gates have the interesting property that they can be used
in an additive fashion. The basic principle is best explained with a circuit that is
commonly known as a phase kick circuit. An example is shown in Figure 6.1.
Any number of qubits (the two top qubits in this example circuit) are initialized as
|0⟩ and put into superposition with Hadamard gates. A third ancilla qubit starts out
in state |1⟩. We apply the controlled S-gate and T-gate, but remember that these gates
only add a phase to the |1⟩ part of a state.
Each of the top qubits then connects a controlled rotation gate to the ancilla. In the
example above:
• The top qubit controls a 90° rotation gate, the S-gate.
• The second qubit controls a 45° rotation gate, the T-gate.
To express this in code:
psi = state.bitstring(0, 0, 1)
psi = ops.Hadamard(2)(psi)
psi = ops.ControlledU(0, 2, ops.Sgate())(psi)
psi = ops.ControlledU(1, 2, ops.Tgate())(psi, 1)
psi.dump()
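The additive behavior can also be reproduced with plain numpy, outside the book's library (the construction of the controlled gates via projectors is illustrative):

```python
import numpy as np

P0, P1, I = np.diag([1.0, 0.0]), np.diag([0.0, 1.0]), np.eye(2)
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
S = np.diag([1.0, 1.0j])                      # adds 90 degrees to |1>
T = np.diag([1.0, np.exp(1.0j * np.pi / 4)])  # adds 45 degrees to |1>

def kron(*ms):
    out = np.array([[1.0 + 0.0j]])
    for m in ms:
        out = np.kron(out, m)
    return out

psi = np.zeros(8, dtype=complex)
psi[0b001] = 1.0                               # state |001>, ancilla is |1>
psi = kron(H, H, I) @ psi                      # superposition on the top qubits
psi = (kron(P0, I, I) + kron(P1, I, S)) @ psi  # controlled-S, qubit 0 -> ancilla
psi = (kron(I, P0, I) + kron(I, P1, T)) @ psi  # controlled-T, qubit 1 -> ancilla

for idx in (0b001, 0b011, 0b101, 0b111):
    print(f'|{idx:03b}>: phase {np.degrees(np.angle(psi[idx])):6.1f}')
```

The printed phases are 0°, 45°, 90°, and 135°: exactly the sums contributed by the controlling qubits.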
Because of the superposition, the |1⟩ part of each of the top qubits’ superpositioned
states will activate the rotation of the controlled gate. Using this, we can perform
addition of phases. For the example, the resulting basis states |001⟩, |011⟩, |101⟩, and
|111⟩ carry phases of 0°, 45°, 90°, and 135°, respectively.
Note how the phases add up because they are controlled by a (superpositioned)
|1⟩ in the corresponding qubit. Having the top qubit as |1⟩ adds 90°, and having the
second qubit as |1⟩ adds 45°. The third qubit is an ancilla. Also, note that we could use
arbitrary rotations or fractions of π. We can use this type of circuit and corresponding
rotation gates to express numerical computations in terms of phases. Of course, we
have to normalize to 2π to avoid overflows.
The ability to add phases in a controlled fashion is powerful and is the foundation
of the quantum Fourier transformation, which we will explore in the next section.
In preparation for this section, let us briefly look at how to express the rotations
mathematically.
• A rotation by 180° as a fraction of 2π is e^{2πi/2^1}. Expressed as a phase angle,
this is −1.
• A rotation by 90° as a fraction of 2π is e^{2πi/2^2}, a phase of i.
• A rotation by 45° as a fraction of 2π is e^{2πi/2^3}.
• Finally, a rotation by 135° = 90° + 45° as a fraction of 2π is e^{2πi(1/2^2 + 1/2^3)}.
So why is this circuit called a phase kick circuit? To understand this, let us look at
a simpler version of the circuit and the corresponding math.
(Circuit: a single control qubit initialized to |0⟩ with a Hadamard gate, controlling an
S-gate on an ancilla initialized to |1⟩; the states before and after the controlled gate
are marked |ψ1⟩ and |ψ2⟩.)
6.2 Quantum Fourier Transform
The quantum Fourier transform (QFT) is one of the foundational algorithms of quantum
computing. It is important to note that although it does not speed up classical
Fourier analysis of classical data, it does enable other important algorithms, such as
phase estimation, which is the approximation of the eigenvalues of an operator. Phase
estimation is a key ingredient in Shor’s factoring algorithm and others. Let us discuss
a few preliminaries first.
The x_i should be interpreted as binary bits, with values of either 0 or 1. This looks
natural in the following mathematical notation:

    x_0/2^1 = x_0 (1/2^1) = 0.x_0,

    x_0/2^1 + x_1/2^2 = x_0 (1/2^1) + x_1 (1/2^2) = 0.x_0 x_1,

    x_0/2^1 + x_1/2^2 + x_2/2^3 = x_0 (1/2^1) + x_1 (1/2^2) + x_2 (1/2^3) = 0.x_0 x_1 x_2,

    ...
We can define this the other way around, with x_0 being the least significant fractional
part of a binary fraction, as in 0.x_{n−1} ··· x_1 x_0. It is important to note that this is just a
notational difference. We will encounter an example of this in the derivation of phase
estimation in Section 6.4.
In our code below, the most significant bit represents the largest part of the fraction,
for example, 0.5 for the first bit, 0.25 for the second bit, 0.125 for the third, and so
on. Again, we can easily revert the order. Given a binary string of states, we use the
following routine from file lib/helpers.py to compute binary fractions:
val = helper.bits2frac((0,))
print(val)
>> 0
val = helper.bits2frac((1,))
print(val)
>> 0.5
For two bits, the first bit will represent the 0.5 part of the fraction, and the second
will represent the 0.25 part:
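A minimal version of such a helper (a sketch; the actual helper.bits2frac in lib/helpers.py may differ in details):

```python
def bits2frac(bits):
    """Compute the binary fraction 0.b0 b1 b2 ... from a tuple of bits."""
    return sum(bit * 2 ** -(i + 1) for i, bit in enumerate(bits))

print(bits2frac((1, 0)))   # 0.5
print(bits2frac((0, 1)))   # 0.25
print(bits2frac((1, 1)))   # 0.75
```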
with:

    R_k(0) = U1(2π/2^0),
    R_k(1) = U1(2π/2^1),
    R_k(2) = U1(2π/2^2),
    ...
Applying one of these gates to a state means only the |1⟩ basis state gets a phase.
To reiterate, the angles (with cw meaning clockwise and ccw meaning counterclockwise)
are:

    e^{iπ/2} = i ⇒ 90° ccw,
    e^{iπ} = −1 ⇒ 180° ccw,
    e^{i3π/2} = −i ⇒ 270° ccw = 90° cw.
You might have noticed that the S-gate and T-gate we used in Section 6.1 are also
of this form, except they have their rotation angles at the fixed values of π/2 and π/4.
The QFT takes a state in binary representation and converts it to a form where
fractional values are encoded as fractional phases. Since we use the phase gates, only
the |1⟩ basis state gets a phase. This is how the state looks after we apply a QFT circuit;
a detailed derivation of how this state comes about is presented later, in Section 6.4:
    QFT |x_0 x_1 ··· x_{n−1}⟩ = 1/2^{n/2} (|0⟩ + e^{2πi 0.x_0} |1⟩)
                                ⊗ (|0⟩ + e^{2πi 0.x_0x_1} |1⟩)
                                ⊗ ···
                                ⊗ (|0⟩ + e^{2πi 0.x_0x_1x_2···x_{n−1}} |1⟩).
If we interpret the binary fractions in the reverse order, we’d get this state:

    1/2^{n/2} (|0⟩ + e^{2πi 0.x_{n−1}} |1⟩)
              ⊗ (|0⟩ + e^{2πi 0.x_{n−1}x_{n−2}} |1⟩)
              ⊗ ···
              ⊗ (|0⟩ + e^{2πi 0.x_{n−1}x_{n−2}···x_1x_0} |1⟩).
Note that the QFT is a unitary operation, as it is made up of other unitary operators.
Since it is unitary, it has an inverse, and we should explicitly state this important
inverse relation:

    QFT† [ 1/2^{n/2} (|0⟩ + e^{2πi 0.x_0} |1⟩)
                     ⊗ (|0⟩ + e^{2πi 0.x_0x_1} |1⟩)
                     ⊗ ···
                     ⊗ (|0⟩ + e^{2πi 0.x_0x_1x_2···x_{n−1}} |1⟩) ] = |x_0 x_1 ··· x_{n−1}⟩.
This mathematical formulation gives us the blueprint for constructing the circuit.
We have to put the qubits into superposition, and we have to apply controlled rotation
gates to rotate qubits around according to the scheme of binary fractions.
We can construct the QFT in two directions, depending on our interpretation of the
input qubits. We can draw it from top to bottom:
(Two-qubit QFT circuit diagrams, drawn in both qubit orderings: a Hadamard gate on
each qubit and a controlled π/2 phase rotation between them — diagrams not reproduced.)
Or we can implement it one way and add an optional Swap gate to get both possible
directions with just one implementation. We will see examples of all of these styles in
the remainder of this book. Note how we can also switch the controlling and controlled
phase gates: phase gates are symmetric, as shown in Section 3.2.3.
(Diagram: the same two-qubit QFT with the roles of controlling and controlled phase
gates switched — not reproduced.)
(Figure: a two-qubit QFT example circuit — H on qubit 0, controlled-S from qubit 0
to qubit 1, then H on qubit 1 — with intermediate states marked ψ0 through ψ3.)
psi = state.bitstring(0, 0)
psi = ops.Hadamard()(psi)
psi = ops.ControlledU(0, 1, ops.Sgate())(psi)
psi = ops.Hadamard()(psi, 1)
psi.dump()
Let us briefly look at how to compute the results of this circuit. We start with the
tensor product of the two input qubits; as an example, we trace the input |01⟩:

    ψ0 = |0⟩ ⊗ |1⟩,

    ψ1 = (H ⊗ I)(|0⟩ ⊗ |1⟩)
       = ((|0⟩ + |1⟩)/√2) ⊗ |1⟩
       = (|01⟩ + |11⟩)/√2.
Applying the Controlled-S gate will have an effect in this case:

    ψ2 = (|01⟩ + e^{iπ/2}|11⟩)/√2.

And the final Hadamard leads to:

    ψ3 = (I ⊗ H)(|01⟩ + e^{iπ/2}|11⟩)/√2
       = 1/2 (|0⟩(|0⟩ − |1⟩) + e^{iπ/2}|1⟩(|0⟩ − |1⟩)).

As e^{iπ/2} = i, this results in:

    ψ3 = 1/2 (|00⟩ − |01⟩ + i|10⟩ − i|11⟩).
Now let’s look at the four different inputs in matrix form. Remember that applying
the operators in a circuit means we have to multiply the matrices in reverse order:

    (I ⊗ H) CS (H ⊗ I) = 1/2 [[1,  1,  1,  1],
                              [1, −1,  1, −1],
                              [1,  i, −1, −i],
                              [1, −i, −1,  i]].

Applying this gate to the |00⟩ basis state pulls out the first column, and, correspondingly,
|01⟩ pulls out the second, matching our results above. Similarly for the other
two basis states and columns 2 and 3:
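This combined operator is easy to verify with a few lines of plain numpy (a quick cross-check, independent of the book's library):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I = np.eye(2)
CS = np.diag([1, 1, 1, 1j])              # controlled-S in the standard basis

# Circuit order: H on qubit 0, CS, H on qubit 1 -> multiply in reverse.
M = np.kron(I, H) @ CS @ np.kron(H, I)

# Input |01> pulls out column 1, which must be (1, -1, i, -i) / 2.
assert np.allclose(2 * M[:, 1], [1, -1, 1j, -1j])
```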
    1/2 [[1,  1,  1,  1],
         [1, −1,  1, −1],
         [1,  i, −1, −i],
         [1, −i, −1,  i]] [1, 0, 0, 0]^T = 1/2 [1, 1, 1, 1]^T

                                         = 1/2 (|00⟩ + |01⟩ + |10⟩ + |11⟩).
And indeed, the other three cases produce amplitudes that correspond to columns 1, 2,
and 3 of the above matrix:
Input: |01>
|00> (|0>): ampl: +0.50+0.00j prob: 0.25 Phase: 0.0
|01> (|1>): ampl: -0.50+0.00j prob: 0.25 Phase: 180.0
|10> (|2>): ampl: +0.00+0.50j prob: 0.25 Phase: 90.0
|11> (|3>): ampl: +0.00-0.50j prob: 0.25 Phase: -90.0
Input: |10>
|00> (|0>): ampl: +0.50+0.00j prob: 0.25 Phase: 0.0
|01> (|1>): ampl: +0.50+0.00j prob: 0.25 Phase: 0.0
|10> (|2>): ampl: -0.50+0.00j prob: 0.25 Phase: 180.0
|11> (|3>): ampl: -0.50+0.00j prob: 0.25 Phase: 180.0
Input: |11>
|00> (|0>): ampl: +0.50+0.00j prob: 0.25 Phase: 0.0
|01> (|1>): ampl: -0.50+0.00j prob: 0.25 Phase: 180.0
|10> (|2>): ampl: +0.00-0.50j prob: 0.25 Phase: -90.0
|11> (|3>): ampl: +0.00+0.50j prob: 0.25 Phase: 90.0
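These dumps can be cross-checked with a few lines of numpy. This is our own sketch, not the book's code; it assumes qubit 0 is the left (most significant) tensor factor, matching state.bitstring, so that each input basis state selects one column of amplitudes of the combined matrix:

```python
import numpy as np

# Combined circuit: H on qubit 0, Controlled-S (control qubit 0), H on qubit 1.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I = np.eye(2)
CS = np.diag([1, 1, 1, 1j])  # applies S = diag(1, i) when qubit 0 is set
M = np.kron(I, H) @ CS @ np.kron(H, I)

# Input |01> yields the amplitudes dumped above.
print(np.round(M @ np.eye(4)[1], 2))  # 0.5, -0.5, 0.5i, -0.5i
```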
In summary, QFT encodes the binary fractions of a state into the phases of the basis states. It “rotates around” the states according to the binary fractional parts of each qubit. In Section 6.3, we will see an immediate application of this: quantum arithmetic. We will combine two states in an additive fashion to enable addition and subtraction in the Fourier domain.
One very important aspect of QFT is that while it enables encoding of (binary)
states with phases, on measurement the state would collapse to just one of the basis
states. All other information will be lost. The challenge for QFT-based algorithms is to
apply transformations such that, on measurement, you can find an algorithmic solution
to the problem at hand. In practically all cases, we will apply the inverse QFT to get
the state out of superposition to measure a result, following Equation (6.2).
must get the indices into the right order. It usually helps to transpile a circuit into
a textual format, such as QASM (defined in Section 8.3.1) or similar, to inspect the
indices.
op = Identity(nbits)
h = Hadamard()
[...]
if not op.is_unitary():
  raise AssertionError('Constructed non-unitary operator.')
return op
Computing the inverse of the QFT operator is trivial: QFT is a unitary operator, so the inverse is simply the adjoint:
Qft = ops.Qft(nbits)
[...]
InvQft = Qft.adjoint()
If the QFT is computed via explicit gate applications in a circuit, then the inverse
has to be implemented as the application of the inverse gates in reverse order, as
outlined in Section 2.13 on reversible computing. We will see examples of this explicit
construction shortly.
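As a quick sanity check of the adjoint-as-inverse claim, here is a numpy sketch that uses a plain DFT matrix as a stand-in for ops.Qft (our illustration, not the book's class):

```python
import numpy as np

# An N-dimensional QFT is the unitary DFT matrix; its adjoint is its inverse.
N = 8
k = np.arange(N)
F = np.exp(2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)
inv_F = F.conj().T  # the adjoint: conjugate transpose

print(np.allclose(inv_F @ F, np.eye(N)))  # True
```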
A widely used online simulator is Quirk (Gidney, 2021a). Let us construct a simple
two-qubit QFT circuit in Quirk, as shown in Figure 6.2. If we look to the very right
and reconstruct the phases from the gray circles (blue on the website), we see that
state |00i (top left) has a phase of 0 (the direction of the x-axis). The state |01i (top
right) has a phase of 180◦ , the state |10i (bottom left) has a phase of −90◦ , and the
state |11i has a phase of 90◦ . So it appears that Quirk agrees with our qubit ordering
(or we agree with Quirk).
Quirk also shows the state of individual qubits on a Bloch sphere. How does this
work, as we are dealing with a two-qubit tensored state, and Bloch spheres only
represent single qubits? We talked about the partial trace in Section 2.14, which allows
tracing out qubits from a state. The result after a trace-out operation is a density
matrix. Since it only represents a partial state, it is called a reduced density matrix. In
Section 2.9, we showed how to compute the Bloch sphere coordinates from a density
matrix. Note that for systems of more than two qubits, all qubits that are not of interest
must be traced out, such that only a 2 × 2 density matrix remains.
Let’s give this a try. From the state as shown in Figure 6.2, we trace out qubit 0 and
qubit 1 individually and compute the Bloch sphere coordinates:
psi = state.bitstring(1, 1)
psi = ops.Qft(2)(psi)
[...]
>>
x0: -1.0 y0: 0.0 z0: -0.0
x1: -0.0 y1: -1.0 z1: -0.0
This result seems to agree with Quirk as well. The first qubit is located at −1 on the
x-axis of the Bloch sphere (going from the back of the page to the front of the page),
and the second qubit is located at −1 on the y-axis (going from left to right).
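The same numbers can be reproduced in plain numpy. The following is a hedged re-creation of the experiment (not the book's code), again with qubit 0 as the most significant index:

```python
import numpy as np

# Two-qubit QFT applied to |11>, then trace out one qubit at a time.
N = 4
k = np.arange(N)
F = np.exp(2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)
psi = F[:, 3]  # QFT of basis state |11>

rho = np.outer(psi, psi.conj()).reshape(2, 2, 2, 2)
rho0 = np.trace(rho, axis1=1, axis2=3)  # reduced density matrix of qubit 0
rho1 = np.trace(rho, axis1=0, axis2=2)  # reduced density matrix of qubit 1

def bloch(r):
    # Coordinates from rho = (I + x X + y Y + z Z) / 2.
    return (2 * r[0, 1].real, -2 * r[0, 1].imag, (r[0, 0] - r[1, 1]).real)

print(bloch(rho0))  # x0 = -1
print(bloch(rho1))  # y1 = -1
```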
6.3 Quantum Arithmetic
We saw in Section 3.3 how a quantum circuit could be used to emulate a classical full
adder, using quantum gates without exploiting any of the unique features of quantum
computing, such as superposition or entanglement. It is fair to say that this was a nice
exercise demonstrating the universality of quantum computing, but otherwise, a fairly
inefficient way to construct a full adder.
In this section, we discuss another algorithm that performs addition and subtraction.
This time the math is being developed in the Fourier domain with a technique that was
first described by Draper (2000).
To perform addition, we will apply a QFT, some magic, and a final inverse QFT to
obtain a numerical result. We explain this algorithm with just a hint of math and lots
of code. This implementation uses a different direction from the controlling to the controlled qubit than our earlier QFT operator. This is not difficult to follow; simply inverting the qubits in a register leads to identical implementations. We use explicit angles and
the Controlled-U1 gate. The code can be found in the file src/arith_quantum.py
in the open-source repository.
The first thing we need to specify is the bit width of the inputs a and b. If we want
to do n-bit arithmetic, we need to store results as (n + 1) bits to account for overflow.
Our entry point’s signature will get the bit width as n and the two initial integer
values init_a and init_b, which must fit into the available bits. The parameter
factor will be 1.0 for addition and −1.0 for subtraction. We’ll see shortly how this
factor is applied.
• Apply the QFT over the qubits representing a. This encodes the bits as phases on
states.
• Evolve a by b. This cryptic sounding step basically performs another set of
QFT-like rotations on a using the same controlled-rotation mechanism as with
regular QFT. It is not a full QFT though. There are also no initial Hadamard gates
as the states are already in superposition. We detail the steps below.
• Perform inverse QFT to decode phases back to bits.
for i in range(n+1):
  qft(qc, a, n-i)
for i in range(n+1):
  evolve(qc, a, b, n-i, factor)
for i in range(n+1):
  inverse_qft(qc, a, i)
Let us look at these three steps in detail, using the example of two-qubit additions.
We can insert dumpers after each of the loops to dump and visualize the circuit, as
described in Section 8.5.6. After the first loop, we produced this circuit, in QASM
format (Cross et al., 2017):
OPENQASM 2.0;
qreg a[3];
qreg b[3];
h a[2];
cu1(pi/2) a[1],a[2];
cu1(pi/4) a[0],a[2];
h a[1];
cu1(pi/2) a[0],a[1];
h a[0];
The middle loop in Figure 6.3 is where the magic happens – the evolve step pro-
duces this circuit in QASM format. We explain how this works below.
cu1(pi) b[2],a[2];
cu1(pi/2) b[1],a[2];
cu1(pi/4) b[0],a[2];
cu1(pi) b[1],a[1];
cu1(pi/2) b[0],a[1];
cu1(pi) b[0],a[0];
Figure 6.3 The evolve step of quantum arithmetic in the Fourier domain.
The construction of the inverse QFT circuit happens in the third loop. All of the first
QFT’s gates are inverted and applied in reverse order. Remember that the inverse of
the Hadamard gate is another Hadamard gate, and the inverse of a rotation is a rotation
by the same angle in the opposite direction.
[Circuit: the inverse QFT on register a – the first loop mirrored, with the rotation angles negated (e.g., −π/2).]
This is how the combined circuit looks. The gates are too small to read, but the
elegant, melodic structure of the whole circuit becomes apparent:1
Why and how does this work? Let us first try to explain this mathematically, before
explaining it by looking at the state vector. First, remember that QFT takes this state:
|ψi = |xn−1 xn−2 · · · x1 x0 i,
and changes it into this form again; depending on bit ordering, two forms are possible:
1 This circuit was generated with our LaTeX transpiler, explained in Section 8.5.
(1/2^{n/2}) (|0⟩ + e^{2πi 0.x_{n−1}} |1⟩)
  ⊗ (|0⟩ + e^{2πi 0.x_{n−1}x_{n−2}} |1⟩)
  ⊗ ···
  ⊗ (|0⟩ + e^{2πi 0.x_{n−1}x_{n−2}···x_1x_0} |1⟩).
Applying the rotations of the evolve step adds the binary fractions of b to a. For example, the first part of the above state for a:

|0⟩ + e^{2πi 0.a_{n−1}} |1⟩,

becomes:

|0⟩ + e^{2πi 0.(a_{n−1}+b_{n−1})} |1⟩.
And so on for all fractional parts. The “trick” for quantum arithmetic is to interpret
the qubits not as binary fractions, but as bits of full binary numbers. When interpreted
this way, the net result is a full binary addition.
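The whole scheme can be simulated on the state vector with a few lines of numpy. This is our own sketch of the idea, not the book's implementation; it works on basis-state indices rather than on individual qubits:

```python
import numpy as np

def qft_matrix(N):
    k = np.arange(N)
    return np.exp(2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)

n, a, b = 3, 3, 4
N = 2 ** n
F = qft_matrix(N)

psi = np.zeros(N, dtype=complex)
psi[a] = 1.0                                      # encode a as a basis state
psi = F @ psi                                     # QFT: a is now in the phases
psi *= np.exp(2j * np.pi * b * np.arange(N) / N)  # evolve: add b's phases
psi = F.conj().T @ psi                            # inverse QFT

print(int(np.argmax(np.abs(psi))))  # 7, i.e., a + b mod 2^n
```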
Another way to explain this is to look at the state vector and the evolve circuit itself.
Let’s assume we want to add 1 in register a and 1 in register b. Before we enter the
main loops, the state looks like the following. The first three qubits belong to a, with
the qubit 0 now in the role of the least significant qubit. The next three qubits belong
to b. This is how we initialize the state, and, as a result, only one state has a nonzero
probability:
Since we only apply QFT to the first three qubits, we now have eight superimposed states, all with the same probability. Let's now look at the circuit diagram for the evolve phase. The least significant qubit for b is global qubit 3, corresponding to b0 in the diagram. Because it is set, the evolve circuit applies a controlled rotation of a full π to qubit a0, a rotation of π/2 to qubit a1, and a rotation of π/4 to qubit a2.
This works like clockwork, as in, how a clock actually works, where smaller gears
drive the larger ones. Here we deal with rotations; the higher-order qubits are made to
rotate slower than the lower-order qubits:
• The least significant qubit gets a full rotation by π. If it was not set, it will have a
phase of π now. If it was set, it will now have a phase of 0.
• Qubit 1 of register a gets a rotation of π/2.
• The most significant qubit of register a rotates by π/4.
This means, for example, that when adding a number 1 twice, the least significant qubit flip-flops between 0 and π, qubit 1 runs in increments of π/2, and qubit 2 in increments of π/4. The same scheme works for the higher-order qubits in register b.
For our example of 1 + 1, the following are the phases on the state vector after the
evolve step:
How does this compare if we initialize a with the value 2 instead of 1? Here is that
state after initialization. Note the first three qubits are now in state |010i:
Here is the state after the initial QFT. We see that the phases are identical to the
1 + 1 state above after evolving:
Back to our 1 + 1 example. After evolve and the inverse QFT, the state corresponding to value a = 2 is the only state with nonzero probability; the addition worked.
Remember that the first three qubits correspond to the register a, with qubit 0 acting
as the least significant bit.
The code for the QFT and evolve functions is straightforward using the cu1 gate.
The indices are hard to follow. It is a good exercise to dump the circuit textually, for example as QASM, and check that the gates are applied in the correct order, as we have done above.
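As a sketch of what such a qft helper might look like (the actual code lives in src/arith_quantum.py; the Recorder class below is a hypothetical stand-in for the circuit object, used only to check the gate order against the QASM listing above):

```python
import math

class Recorder:
    # Hypothetical stand-in for the circuit object: it only records gates.
    def __init__(self):
        self.ops = []
    def h(self, q):
        self.ops.append(('h', q))
    def cu1(self, ctl, tgt, angle):
        self.ops.append(('cu1', ctl, tgt, angle))

def qft(qc, reg, idx):
    # One QFT layer: Hadamard on qubit idx, then rotations of pi/2, pi/4, ...
    # controlled by each lower-order qubit.
    qc.h(reg[idx])
    for k in range(idx - 1, -1, -1):
        qc.cu1(reg[k], reg[idx], math.pi / 2 ** (idx - k))

qc, a, n = Recorder(), [0, 1, 2], 2
for i in range(n + 1):
    qft(qc, a, n - i)
print(qc.ops)  # h(2), cu1(1,2,pi/2), cu1(0,2,pi/4), h(1), cu1(0,1,pi/2), h(0)
```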
Given the insight that rotations in the Fourier domain facilitate addition, it is almost
too easy to implement subtraction – we add a factor to b, and, for subtraction, we
evolve the state in the opposite direction. This is already implemented in the code
above in the evolve function.
With the same line of reasoning, multiplication of the form a + cb, with c being a
constant other than ±1, also just applies the factor c to the rotations. We have to be
careful with overflow because we only accounted for one overflow bit.
You could argue that this is a bit disingenuous, as the rotations are fixed to a
given classical factor c. The algorithm does not implement an actual multiplication
(as recently proposed in Gidney (2019)) where the factor c is another quantum register
used as input to the algorithm. Argued this way, it is true that this multiplication
is not purely quantum. Nevertheless, performing multiplication this way has a valid
use case. In Section 6.5 on Shor’s algorithm, we will deal with known integer values.
We can classically compute multiplication results before performing multiplication
as above with an unknown quantum state. Perhaps we should call this semi-quantum
multiplication? Naming is difficult.
In the upcoming discussion on the order-finding algorithm, we will see that for
certain, more complex computations, you do indeed have to implement full arithmetic
functions in the quantum domain.
To test our code, we check the results with a routine that performs a measurement. We don't actually measure; we just find the state with the highest probability. The input state with the bit pattern representing a has a probability of 1.0. After the rotations and coming out of superposition, the state a + b will also have a probability close to 1.0. Note how we invert the bit order again to get to a valid result.
maxbits, _ = psi.maxprob()
result = helper.bits2val(maxbits[0:nbits][::-1])
if result != a + factor * b:
  print(f'{a} + ({factor} * {b}) != {result}')
  raise AssertionError('Incorrect addition.')
Our test program drives this with a few loops, passing factor through to
arith_quantum and evolve to allow testing of subtraction and (pseudo)
multiplication.
def main(argv):
  print('Check quantum addition...')
  for i in range(7):
    for j in range(7):
      arith_quantum(6, i, j, +1.0)
Since we’re using the circuit implementation with accelerated gates, we can easily
handle up to 14 qubits, hence we test with bit widths of 6 (plus one overflow bit per
input) for the individual inputs.
angles = [0.] * n
for i in range(n):
  for j in range(i, n):
    if s[j] == '1':
      angles[n-i-1] += 2**(-(j-i))
  angles[n-i-1] *= math.pi
return angles
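The fragment above omits the enclosing function and the derivation of the bit string s. A runnable sketch, under our assumption that s is the n-bit binary representation of the constant c, most significant bit first:

```python
import math

def precompute_angles(c, n):
    s = format(c, f'0{n}b')  # assumption: MSB-first bit string of c
    angles = [0.] * n
    for i in range(n):
        for j in range(i, n):
            if s[j] == '1':
                angles[n-i-1] += 2**(-(j-i))
        angles[n-i-1] *= math.pi
    return angles

# Adding c = 1 with n = 3 rotates a0 by pi, a1 by pi/2, and a2 by pi/4,
# matching the evolve circuit for b = 1.
print(precompute_angles(1, 3))
```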
We also have to modify the evolve step in the quantum addition. Instead of adding
controlled gates, we simply add the rotation gates directly. This is the method we will
use later in Shor’s algorithm as well.
for i in range(n+1):
  qft(qc, a, n-i)
angles = precompute_angles(c, n)
for i in range(n):
  qc.u1(a[i], angles[i])
for i in range(n+1):
  inverse_qft(qc, a, i)
For a three-qubit addition of 1 + 1, the circuit no longer needs the b register and
turns into:
This is the same circuit for addition of 1 + 2, but notice the modified rotation angles
in the evolve step:
6.4 Phase Estimation
Quantum phase estimation (QPE) is a key building block for the advanced algorithms presented in this chapter. QPE cannot be discussed without the concepts of eigenvalues and eigenvectors. Let us briefly reiterate what we already know about them.
import numpy as np
[...]
umat = ... # some matrix
eigvals, eigvecs = np.linalg.eig(umat)
Diagonal matrices are a special case for which finding eigenvalues is trivial – we can take them right off the diagonal, with the corresponding eigenvectors being the computational bases (1, 0, 0, ...)^T, (0, 1, 0, ...)^T, and so on.
2 With det being the determinant of a matrix. See, for example: https://en.wikipedia.org/wiki/Determinant.
If you are an attentive reader, you will have noticed that the gates we used for phase rotations during quantum Fourier transforms are of similar form:
R_k = [ 1, 0 ; 0, e^{2πi/2^k} ]   and   U1(λ) = [ 1, 0 ; 0, e^{iλ} ].
1. Encode the unknown phase with a circuit that produces a result that is identical to
the result of the QFT discussed earlier in Section 6.2. We interpret the resulting
qubits as parts of a binary fraction.
2. Apply QFT † to compute the phase φ.
To detail step one, we define a register with t qubits, where t is determined by the
precision we want to achieve. Just as with QFT, we will interpret the qubits as parts of
a binary fraction; the more qubits, the more fine-grained fractions of powers of 2 we
will be able to add up to the final result. We initialize the register with |0i and put it in
superposition with Hadamard gates.
We add a second register representing the eigenvector |ui. We then connect this
register to a sequence of t instances of the unitary gate U, each one taken to increasing powers of 2 (1, 2, 4, 8, ..., 2^{t−1}). Similar to QFT, we connect the t register's qubits as
controlling gates to the unitary gates. To achieve the powers of 2, we multiply U with
itself and accumulate the results. The whole procedure is shown in Figure 6.4 in circuit
notation.
Now, the relation to the quantum Fourier rotation gates becomes apparent – higher
powers of 2 of U will result in the fractional phase angle being multiplied by increas-
ing powers of 2. Please note the ordering of the qubits and their corresponding pow-
ers of 2.
A question to ask is this: Why does |ui have to be initialized with an eigenvector?
Wouldn’t this procedure work for any normalized state vector |xi? The answer is no.
This equation holds true only for eigenvectors:
U|u⟩ = λ|u⟩.
This means we can apply U and any power of U to |u⟩ as often as we want to. Since |u⟩ is an eigenvector, it will only be scaled by a number: the complex eigenvalue λ with modulus 1 (as we will prove below). Let's develop the details in the next
section. If you are not interested in the math, you may jump to Section 6.4.4 on
implementation.
[Figure 6.4: the phase estimation circuit. Each qubit k of the t register is initialized to |0⟩, put through a Hadamard, and controls U^{2^k}; it ends up in the state |0⟩ + e^{2πi(2^k φ)}|1⟩, up to normalization.]
To see why the eigenvalues of a unitary operator have modulus 1, start from:

U|x⟩ = λ|x⟩,
⟨xU†|Ux⟩ = ⟨xλ*|λx⟩.

We know that UU† = I, and λ*λ is a factor that we can pull in front of the inner product. State vectors are also normalized with an inner product of 1.0:

⟨xU†|Ux⟩ = (λ*λ)⟨x|x⟩,
⟨x|x⟩ = |λ|^2 ⟨x|x⟩,
1 = |λ|^2 = |λ|.
Since |λ| = 1, we know that the eigenvalues are of this form, with φ being a factor
between 0 and 1:
λ = e^{2πiφ}.
In Section 6.2, we used the following notation for binary fractions with t bits of resolution, with φi being binary bits with values 0 or 1:

φ = 0.φ0φ1...φ_{t−1} = φ0 (1/2^1) + φ1 (1/2^2) + ··· + φ_{t−1} (1/2^t).
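A one-liner makes the binary-fraction convention concrete (our example: φ = 0.101 in binary):

```python
# phi = 0.101 (binary) = 1/2 + 0/4 + 1/8
bits = [1, 0, 1]
phi = sum(b / 2 ** (i + 1) for i, b in enumerate(bits))
print(phi)  # 0.625
```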
With these preliminaries, let’s see what happens to a state in the following circuit,
which is a first small part of the phase estimation circuit. The lower qubits are in state
|ψi, which must be an eigenstate of U .
[Circuit: |0⟩ –H–•–H– on top; |ψ⟩ passed through the controlled U^{2^0} below.]
We start by limiting the precision for the eigenvalue of U to just one fractional bit. Once we understand how this works for a single fractional bit, we expand to two, which then makes it easy to generalize.
Let us start with U having eigenvalue λ = e^{2πi 0.φ0}, with only one binary fractional part, corresponding to 2^{−1}. The phase can thus only have a value of 0.0 or 0.5. The state |ψ1⟩ after the first Hadamard gate is:
state |ψ1 i after the first Hadamard gate is:
1
|ψ1 i = |+i ⊗ |ψi = √ |0i|ψi + |1i|ψi .
2
1
|ψ2 i = √ |0i|ψi + |1iU |ψi
2
1
= √ |0i|ψi + e2πi0.φ0 |1i|ψi
2
1
= √ |0i + e2πi0.φ0 |1i |ψi.
2
The final Hadamard on the top qubit yields:

|ψ3⟩ = (1/√2) H(|0⟩ + e^{2πi 0.φ0}|1⟩)
     = (1/2)(1 + e^{2πi 0.φ0})|0⟩ + (1/2)(1 − e^{2πi 0.φ0})|1⟩.
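For φ0 = 0 this is |0⟩, and for φ0 = 1 (a phase of 0.5, so e^{2πi·0.5} = −1) it is |1⟩: measuring the top qubit reveals the fractional bit. A quick numpy check of this claim (our own sketch):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
for phi0 in (0, 1):
    # Top qubit before the final Hadamard: (|0> + e^{2 pi i 0.phi0}|1>)/sqrt(2).
    psi = np.array([1, np.exp(2j * np.pi * phi0 / 2)]) / np.sqrt(2)
    print(phi0, np.abs(np.round(H @ psi, 6)))  # all amplitude lands on |phi0>
```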
[Circuit: two control qubits |0⟩ –H–•–, with ψ1 marked after the Hadamards and ψ2 after the controlled gates; |ψ⟩ passes through U^{2^0} and then U^{2^1}.]
Now let's study the effect of the controlled U^{2^1} on qubit 1. We know that squaring a rotation means doubling the rotation angle:

U^2|ψ⟩ = λ^2|ψ⟩ = e^{2πi(2φ)}|ψ⟩.
Looking at the fractional representation and the effects of U^2, we see that the binary point shifts by one digit:

2φ = 2(0.φ0φ1)
   = 2(φ0 2^{−1} + φ1 2^{−2})
   = φ0 + φ1 2^{−1} = φ0.φ1.
The term φ0 corresponds to a binary digit; it can only be 0 or 1. This means the first
factor corresponds to a rotation by 0 or 2π, which has no effect. The final result is:
e^{2πi(2φ)} = e^{2πi(0.φ1)}.
|ψ2⟩ = (1/2)(|0⟩ + e^{2πi 0.φ0φ1}|1⟩) ⊗ (|0⟩ + e^{2πi 0.φ1}|1⟩) ⊗ |ψ⟩,

with the first factor living on qubit 0, the second on qubit 1, and |ψ⟩ on qubit 2.
This is the form that results from applying the QFT operator to two qubits! This means we can apply the two-qubit adjoint QFT† operator to retrieve the binary fractions of φ = 0.φ0φ1 as qubit states |0⟩ or |1⟩.
To summarize, we connected the 0th power of 2 to the last qubit in register t and the (t−1)st power of 2 to the first qubit (or the other way around, depending on how we want to interpret the binary fractions). The final state thus becomes:
(1/2^{t/2}) (|0⟩ + e^{2πi 2^{t−1}φ} |1⟩)
  ⊗ (|0⟩ + e^{2πi 2^{t−2}φ} |1⟩)
  ⊗ ···
  ⊗ (|0⟩ + e^{2πi 2^0 φ} |1⟩).
Now write the phase in binary fractions as φ = 0.φ_{t−1}φ_{t−2}···φ0.
Multiplying this angle with the powers of two as shown above will shift the digits
of the binary representation to the left, and the state after the circuit will be:
(1/2^{t/2}) (|0⟩ + e^{2πi 0.φ_{t−1}} |1⟩)
  ⊗ (|0⟩ + e^{2πi 0.φ_{t−1}φ_{t−2}} |1⟩)
  ⊗ ···
  ⊗ (|0⟩ + e^{2πi 0.φ_{t−1}φ_{t−2}···φ1φ0} |1⟩).
The form above is similar to the result of a QFT, where the rotations come out
according to how the input qubits were initialized in binary representation. The bit
indices are reversed from what we usually see, but this is just a renaming or ordering
issue. The final step of phase estimation reverses the QFT. It applies QFT † to allow
reconstruction of the input, which, in our case, was the representation of φ in binary
fractions. The complete circuit layout is shown in Figure 6.6.
It is important not to confuse the ordering in code and notation. As usual, all
textbooks disagree on notation and ordering. This is not that important in our case
as we can interpret the binary fraction in the proper order to obtain the desired result.
We can now measure the qubits, interpret them as binary fractions, and combine
them to approximate φ, as we will show in the implementation. Remember how
we used |1i to initialize the ancilla qubit in the phase kick circuit? The underlying
mechanism is the same. State |1i is an eigenstate for both the S-gate and T-gate.
6.4.4 Implementation
In code, this may look simpler than the math above. The full implementation can be
found in file src/phase_estimation in the open source repository. We drive this
algorithm from main(), reserving six qubits for t and three qubits for the unitary
operator. It is instructive to experiment with the numbers. We run ten experiments:
[Figure 6.6: the complete phase estimation circuit. The t register qubits, each initialized to |0⟩ and put through a Hadamard, control increasing powers of U on |ψ⟩; a final QFT† turns the phases back into the binary fraction 0.φ_{t−1}φ_{t−2}···φ1φ0.]
def main(argv):
  nbits = 3
  t = 6
  print('Estimating {} qubits random unitary eigenvalue '
        .format(nbits) + 'with {} bits of accuracy.'.format(t))
  for i in range(10):
    run_experiment(nbits, t)
In each experiment, we create a random operator and obtain its eigenvalues and
eigenvectors to ensure that our estimates below are close.
We pick eigenvector 0 to use as an example here, but the procedure works for all
other pairs of eigenvectors and eigenvalues. To check whether the algorithm works, we
compute the to-be-estimated angle phi upfront. Since we are assuming the eigenvalue
to be of the form e2πi φ , as discussed in Section 6.4.2, we divide by 2j*np.pi. Also,
we don’t want to deal with negative values. Again, this angle does not participate in
the algorithm; we just compute it upfront to confirm later that we indeed computed a
correct approximation:
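The book's code for this step is not shown here; the following is a hedged sketch of the preparation (our own code, using a QR decomposition to generate a random unitary; the book's helpers may differ):

```python
import numpy as np

rng = np.random.default_rng(1)
m = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
umat, _ = np.linalg.qr(m)  # the Q factor of a random matrix is unitary
eigvals, eigvecs = np.linalg.eig(umat)

# The eigenvalue is e^{2 pi i phi}; recover phi and map it into [0, 1).
phi = np.real(np.log(eigvals[0]) / (2j * np.pi))
if phi < 0:
    phi += 1.0
print(round(phi, 4), np.allclose(np.exp(2j * np.pi * phi), eigvals[0]))
```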
For the overall circuit, note how we initialize the state psi with t qubits in state
|0i tensored with another state that is initialized with an eigenvector.
[...]
psi = expo_u(psi, u, t)
psi = ops.Qft(t).adjoint()(psi)
The heart of this circuit is the controlled connection of the operators taken to powers
of 2, which is implemented in expo_u (naming is difficult):
psi = ops.Hadamard(t)(psi)
for idx, inv in enumerate(range(t-1, -1, -1)):
  u2 = u
  for _ in range(idx):
    u2 = u2(u2)
  psi = ops.ControlledU(inv, t, u2)(psi, inv)
return psi
All that is left to do is to simulate a measurement by picking the state with the
highest probability, computing the binary fraction from this state, and comparing the
result against the target value. Since we have limited bits to represent the result, we
allow an error margin of 2%. More bits for t will make the circuit run slower but also
improve the error margins.
There is the potential for delta to be larger than two percent, especially when not enough bits were reserved for t. Another interesting error case is when the eigenvalue
rounds to 1.0. In this case, all digits after the dot are 0. As a result, the estimated value
from the binary fractions will also be 0.0 instead of the correct value of 1.0. The code
warns about this case.
The result should look similar to the following output. Note that the highest prob-
ability found may not be close to 1.0. This means that when measured on a real,
probabilistic quantum computer, we would obtain a fairly noisy result, with the correct
solution hopefully showing enough distinction from other measurements.
Phase : 0.3203
Estimate: 0.3125 delta: 0.0078 probability: 7.30%
[...]
Phase : 0.6688
Estimate: 0.6719 delta: 0.0030 probability: 20.73%
6.5 Shor’s Algorithm
Shor's algorithm for number factorization is the one algorithm that has sparked a tremendous amount of interest in quantum computing (Shor, 1994). The internet's RSA (Rivest, Shamir, Adleman) encryption algorithm (Rivest et al., 1978) is based on the assumption that number factoring is an intractable problem. If quantum computers could crack this code, it would have severe implications.
Shor's algorithm is complex to implement, at least with the background of the material presented so far. To factor numbers like 15 or 21, it requires a large number of qubits and a very large number of gates, on the order of many thousands.
This looks like a great challenge, so let's dive right in. The algorithm has two parts: a classical part and a quantum part. Correspondingly, we split the description of the algorithm into two parts. The classical part is discussed in this section, and the quantum part will be discussed in Section 6.6 on order finding.
Two numbers are congruent mod N if they have the same remainder when divided by N, in which case they are in the same equivalence class. Here are examples for a modulus of 12:
15 ≡ 3 mod 12,
15 ≡ −9 mod 12.
The numbers 15, 3, and −9 are in the same mod 12 equivalence class. Note that in Python, applying the % operator would yield -9 % 12 = 3. Simple algebraic rules hold, for example:

(a + b) mod N = ((a mod N) + (b mod N)) mod N,
(a · b) mod N = ((a mod N) · (b mod N)) mod N.

We can use these rules to simplify computation with large numbers.
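These identities are exactly what makes modular exponentiation with huge numbers tractable: reduce mod N after every multiplication so intermediate values stay small. A sketch of the idea (Python's built-in three-argument pow(a, b, N) does the same internally):

```python
def pow_mod(base, exp, n):
    # Square-and-multiply, reducing mod n at every step.
    result = 1
    base %= n
    while exp:
        if exp & 1:
            result = (result * base) % n
        base = (base * base) % n
        exp >>= 1
    return result

print(pow_mod(2, 6, 21))  # 1, as in the order-finding table below for c = 2
```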
For example, the factorizations
15 = 3 · 5,
21 = 3 · 7
show that the greatest common divisor of 15 and 21 is 3.
6.5.3 Factorization
Now we shall see how to use modular arithmetic and the GCD to factor a large number
into two primes. We are only considering numbers that have two prime factors. Why
is this important? In general, any number can be factored into several prime factors pi :
N = p_0^{e_0} p_1^{e_1} ··· p_{n−1}^{e_{n−1}}.
But the factoring is most difficult if N has just two prime factors of roughly equal
length. This is why this mechanism is used in RSA encryption. Hence, we assume that
N = pq.
x^2 ≡ 1 mod N. (6.3)
There are two trivial solutions to this equation: x = 1 and x = −1. Are there other
solutions? In the following, the typical examples for N are 15 and 21. As we will see
later, this is mostly determined by the number of qubits we will be able to simulate.
Let us pick 21 as our example integer. We will iterate over all values from 0 to N
and see whether we find another x for which above Equation (6.3) holds:
1*1 = 1 = 1 mod N
2*2 = 4 = 4 mod N
3*3 = 9 = 9 mod N
4*4 = 16 = 16 mod N
5*5 = 25 = 4 mod N
6*6 = 36 = 15 mod N
7*7 = 49 = 8 mod N
8*8 = 64 = 1 mod N
[...]
Indeed, we found another x for which this equation holds, x = 8. We can turn the search around and, instead of looking for the n in n^2 = 1 mod N, we search for the n with a given constant c in c^n = 1 mod N. This is the mechanism we will use during order finding. Here is an example with c = 2:
2^0 = 1 = 1 mod N
2^1 = 2 = 2 mod N
2^2 = 4 = 4 mod N
2^3 = 8 = 8 mod N
2^4 = 16 = 16 mod N
2^5 = 32 = 11 mod N
2^6 = 64 = 1 mod N
Since

x^2 ≡ 1 mod N,

we have

x^2 − 1 ≡ 0 mod N,

and factoring the left-hand side gives:

(x + 1)(x − 1) ≡ 0 mod N.

The modulo 0 means that N divides this product. We can therefore find the prime factors by computing the greatest common divisors gcd(x + 1, N) and gcd(x − 1, N).
This looks easy but suffers from the “little technical problem” of having to find that
number x. In the classical case, our only options are to either iterate over all numbers
or pick random values, square them, and check whether we hit a modulo 1 number.
Picking random values means that the birthday paradox applies,3 and the expected number of attempts to find the right value for an N-bit number is roughly √(2^N). This is completely intractable for the large numbers used in internet encryption, numbers with lengths of 1024 bits, 4096 bits, and higher. What now?
3 https://en.wikipedia.org/wiki/Birthday_problem.
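For the small N used in our examples, though, the brute-force search is perfectly tractable. A sketch that finds the nontrivial square root x = 8 for N = 21 from the table above and immediately yields the factors:

```python
import math

N = 21
# Find a square root of 1 mod N other than the trivial 1 and N - 1.
x = next(v for v in range(2, N - 1) if v * v % N == 1)
print(x, math.gcd(x + 1, N), math.gcd(x - 1, N))  # 8 3 7
```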
Step 3 – Factor
Once we have the order, how does it help us to get the factors of N ?
If we find an order r that is an odd number, we give up, throw the result away, and
try a different initial value of a in step 1.
If we find an order r that is an even number, we can use what we discovered earlier; namely, we can get the factors if we can find the x in this equation:

x^2 ≡ 1 mod N.

We just found in step 2 above that:

a^r ≡ 1 mod N,

which we can rewrite as the following, if r is even:

(a^{r/2})^2 ≡ 1 mod N.

This means we can now compute the factors similar to above (with r = order) as gcd(a^{r/2} + 1, N) and gcd(a^{r/2} − 1, N).
There is another little (as in, actually little) caveat – we do not know whether a given
initial value of a will result in an even or odd order. It can be shown that the probability
of getting an even order is 1/2. We might have to run the algorithm multiple times.
These three steps, select seed number, find ordering, and factor, are the core of Shor's algorithm, minus the quantum parts. Let us write some code to explore the concepts thus far before explaining quantum order finding in Section 6.6.
6.5.5 Playground
In this section, we will pick random numbers and apply the ideas from above. We
still compute the order and derive the prime factors classically. Since our numbers are
small, this is still tractable. Let us write a few helper functions first (the full source code is in file src/shor_classic.py).
When picking a random number to play with, we must make sure that it is, indeed, factorizable and not prime.
The algorithm also requires picking a random number to seed the process. This number must be coprime to the larger number, or the process might fail.
Find a random, odd, nonprime number in the range of numbers from fr to to and
also find a corresponding coprime:
while True:
  n = random.randint(fr, to)
  if n % 2 == 0:
    continue
  if not is_prime(n):
    return n

while True:
  val = random.randint(3, larger_num - 1)
  if is_coprime(val, larger_num):
    return val
And finally, we will need a routine to compute the order of a given modulus. This
routine is, of course, classical and iterates until it finds the result of 1 that is guaranteed
to exist.
order = 1
while True:
  newval = (num ** order) % modulus
  if newval == 1:
    return order
  order += 1
Here is the main algorithm, which we execute many times over randomly chosen numbers. We first select a random a and N, as described above. N is the number we want to factorize, so it must not be prime. The value a must be a coprime. Once we have those, we compute the order:
n = get_odd_non_prime(fr, to)
a = get_coprime(n)
order = classic_order(a, n)
All that’s left is to compute the factors from the even order and to print and check
the results:
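The book's code for this step is elided here; a hedged sketch of the factor computation, using the two GCDs from Section 6.5.3 (the function name is ours):

```python
import math

def factors_from_order(a, order, n):
    if order % 2 != 0:
        return None  # odd order: give up and retry with a different a
    x = a ** (order // 2)  # a^(r/2) is a square root of 1 mod n
    return math.gcd(x + 1, n), math.gcd(x - 1, n)

print(factors_from_order(2, 6, 21))  # (3, 7), since 2^3 = 8 and 21 = 3 * 7
```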
We run some 25 tests and should see results like the following. For random numbers
up to 9,999, the order can already reach values of up to almost 4,000:
def main(argv):
  print('Classic Part of Shor\'s Algorithm.')
  for i in range(25):
    run_experiment(21, 9999)
[...]
Classic Part of Shor's Algorithm.
Found Factors: N = 3629 = 191 * 19 (r=1710)
Found Factors: N = 4295 = 5 * 859 (r=1716)
[...]
Found Factors: N = 2035 = 5 * 407 (r= 180)
Found Factors: N = 9023 = 1289 * 7 (r=3864)
Found Factors: N = 1781 = 137 * 13 (r= 408)
In summary, we have learned how to factor a number N into two prime factors
based on order finding and modular arithmetic. Order finding for very large numbers
is intractable classically, but in the next section we will learn an efficient quantum
algorithm for this task. The whole algorithm is quite magical, and it becomes even
more so when considering the quantum parts!
6.6 Order Finding
In the last section, we learned how finding the order of a specific function lets us efficiently factor a number into its two prime factors. In this section, we discuss an efficient quantum algorithm to replace the classical order-finding step.
objective – finding the phase of a particular operator. Initially, it might not be apparent
how this relates to finding the order, but no worries, we develop all the details in the
next few sections.
Quantum order finding is phase estimation applied to this operator U:

  U|y⟩ = |xy mod N⟩.    (6.4)
Phase estimation needs an eigenvector in order to run correctly. Let us first find the eigenvalues of this operator. We know that eigenvalues are defined by:

  U|v⟩ = λ|v⟩.
We use a process similar to the power iteration process. We know that the eigenvalues must be of norm 1; otherwise, the probabilities in the state vector would not add up to 1. Thus we can state:

  U^k|v⟩ = λ^k|v⟩,

and substitute this into the operator of Equation (6.4). This is a key step that is, unfortunately, often omitted in the literature:

  U^k|y⟩ = |x^k y mod N⟩.

If r is now the order of x mod N, with x^r = 1 mod N, then we get this result:

  U^r|v⟩ = λ^r|v⟩ = |x^r y mod N⟩ = |v⟩.

From this we can derive:

  λ^r = 1.

This means the eigenvalues of U are the r-th roots of unity. A root of unity is a complex number that, when raised to some integer power n, yields 1.0. The eigenvalues are:

  λ = e^{2πis/r}   for s = 0, ..., r−1.
With this result, we will show below that the eigenvectors of this operator, for order r and a value s with 0 ≤ s < r, are the following:

  |v_s⟩ = (1/√r) Σ_{k=0}^{r−1} e^{2πiks/r} |a^k mod N⟩.
With phase estimation, we can find the eigenvalues e^{2πis/r}. The final trick will be to get to the order from the fraction s/r.
There is, of course, a big problem: for the phase estimation circuit, we needed to know an eigenvector. Because we do not know the order r, we cannot know any of the eigenvectors. Here comes another smart trick. We do know that the operator in Equation (6.4) is a permutation operator. Following the pattern of modular arithmetic, states are uniquely mapped to other states with order r. In this context, we should interpret states as integers, with state |1⟩ representing decimal 1 and state |1001⟩ representing decimal 9. For all values less than r, this mapping is a 1:1 mapping. For our operator:

  U|y⟩ = |xy mod N⟩.

We see that state |y⟩ is multiplied by x mod N. As we iterate over exponents, this becomes:

  U^n|y⟩ = |x^n y mod N⟩.
For our example above, with a = 2 and N = 21, each application multiplies the state of the input register by 2 mod N. We started with 2^0 = 1 mod N, corresponding to state |1⟩. Then:

  U|1⟩ = |2⟩,
  U^2|1⟩ = UU|1⟩ = U|2⟩ = |4⟩,
  U^3|1⟩ = |8⟩,
  U^4|1⟩ = |16⟩,
  U^5|1⟩ = |11⟩,
  U^6|1⟩ = U^r|1⟩ = |1⟩.
We can deduce that the first eigenvector of this operator is the superposition of all states. This is easy to understand from a simpler example.4 Assume a unitary gate only permutes between the two states |0⟩ and |1⟩, with:

  U|0⟩ = |1⟩  and  U|1⟩ = |0⟩.

Applying U to the superposition of both these states leads to the following result with an eigenvalue of 1:

  U (|0⟩ + |1⟩)/√2 = (U|0⟩ + U|1⟩)/√2
                   = (|1⟩ + |0⟩)/√2 = (|0⟩ + |1⟩)/√2
                   = 1.0 (|0⟩ + |1⟩)/√2.
4 https://quantumcomputing.stackexchange.com/a/15590/11582.
For the operator in Equation (6.4), we can generalize to multiple basis states. The superposition of the basis states is an eigenvector of U with eigenvalue 1.0:

  |u_1⟩ = (1/√r) Σ_{k=0}^{r−1} |a^k mod N⟩.
We also deduced above that the other eigenvalues are of the form:

  λ = e^{2πis/r}   for s = 0, ..., r−1.

Let's look at the eigenstates where the phase of the k-th basis state is proportional to k:

  |u_1⟩ = (1/√r) Σ_{k=0}^{r−1} e^{2πik/r} |a^k mod N⟩.    (6.5)
For our example, applying the operator to this eigenvector follows the permutation rules of the operator U (|1⟩ → |2⟩, |2⟩ → |4⟩, ...):

  |u_1⟩ = (1/√6) (|1⟩ + e^{2πi/6}|2⟩ + e^{4πi/6}|4⟩ + e^{6πi/6}|8⟩ + e^{8πi/6}|16⟩ + e^{10πi/6}|11⟩),
  U|u_1⟩ = (1/√6) (|2⟩ + e^{2πi/6}|4⟩ + e^{4πi/6}|8⟩ + e^{6πi/6}|16⟩ + e^{8πi/6}|11⟩ + e^{10πi/6}|1⟩).
Note how the order r = 6 now appears in the denominator. To make this general for all eigenvectors, we multiply in a factor s:

  |u_s⟩ = (1/√r) Σ_{k=0}^{r−1} e^{2πiks/r} |a^k mod N⟩.
As a result, for our operator, we now get a unique eigenvector for each integer s = 0, ..., r−1, with the following eigenvalues (note that if we added the minus sign to the exponent in Equation (6.5) above, the minus sign here would disappear; we can ignore it):

  U|u_s⟩ = e^{−2πis/r} |u_s⟩.
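We can verify this eigenvalue relation numerically for the running example a = 2, N = 21, r = 6. This check is our own sketch in plain NumPy, not code from the book:

```python
import cmath
import numpy as np

a, N, r = 2, 21, 6

# The permutation operator U|y> = |a*y mod N>, as a 21x21 matrix.
U = np.zeros((N, N))
for y in range(N):
    U[(a * y) % N, y] = 1

cycle = [pow(a, k, N) for k in range(r)]  # [1, 2, 4, 8, 16, 11]
for s in range(r):
    # Build |u_s> = (1/sqrt(r)) sum_k e^{2 pi i k s / r} |a^k mod N>.
    u = np.zeros(N, dtype=complex)
    for k in range(r):
        u[cycle[k]] = cmath.exp(2j * cmath.pi * k * s / r) / np.sqrt(r)
    # Check the eigenvalue equation U|u_s> = e^{-2 pi i s / r} |u_s>.
    assert np.allclose(U @ u, cmath.exp(-2j * cmath.pi * s / r) * u)
print('eigenvectors verified')
```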
Furthermore, there is another important result from this: if we add up all these eigenvectors, the phases cancel out except for |1⟩ (not shown here; it is voluminous, but not challenging). This helps us because now we can use |1⟩ as the eigenvector input to the phase estimation circuit. Phase estimation will give us the following result:

  φ = s/r.
But why is it that we can use |1⟩ to initialize the phase estimation? Here5 is an answer: phase estimation should work for one eigenvector/eigenvalue pair. But in this case, we initialize the circuit with the sum of all eigenvectors, which we can consider as the superposition of all eigenstates. On measurement, the state will collapse to one of them. Which one? We do not know, but we do know from above that it will have a phase φ = s/r. This is all we need to find the order with the method of continued fractions.
With all these preliminaries, we can now construct a phase estimation circuit as shown in Figure 6.7. For a given to-be-factored N, we define the number of bits necessary to represent N as L = log₂(N). The output of this circuit will be less than N, and we may need up to L output bits. We need to evaluate the unitary operation for at least N² values of x to be able to sample the order reliably, so we need 2L input bits:

  log₂(N²) = 2 log₂ N = 2L.
When we implement the algorithm, we will also use an ancilla register to store
intermediate results from additions with a bit width of L + 2. (Typically you reserve a
single bit for addition overflow, but we implement controlled addition, which requires
an additional ancilla.) In summary, in order to factor a number fitting in L classical
bits, we need L + 2L + L + 2 = 4L + 2 qubits. To factor 15, which fits into four
classical bits, we will need 18 qubits. To factor 21, which fits in five classical bits, we
will need 22 qubits. This number differs from what is typically quoted in the literature,
which is theoretically closer to 2L + 1. This discrepancy appears to be an artifact of
the implementation details.
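This qubit accounting is easy to check; the helper name `shor_qubits` below is our own, not from the book:

```python
def shor_qubits(n: int) -> int:
    """Total qubits in this implementation: 2L up + L down + (L+2) ancilla."""
    L = n.bit_length()  # classical bits needed to hold n
    return 4 * L + 2


print(shor_qubits(15), shor_qubits(21))  # 18 22
```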
The big practical challenge for this circuit is how to implement the large unitary
operator U . Our solution is based on a paper from Stephane Beauregard (Beauregard,
5 https://quantumcomputing.stackexchange.com/q/15589/11582
2003) and a reference implementation by Tiago Leao and Rui Maia (Leao, 2021). It is
a rather complex implementation, but, fortunately, we have seen most of the building
blocks already.
To factor the number 21, we need 22 qubits and more than 20,000 gates. There are lots of QFTs and uncomputations, so the number of gates increases quickly. With our fast implementation, we can still simulate this circuit tractably. The overall implementation is about 250 lines of Python code.
As with all oracles or high-level unitary operators, you might expect some sort of
quantum trick, a specially crafted matrix that just happens to compute the modulo
exponentiation. Unfortunately, a magical matrix does not exist. Instead, we have to
compute the modulo exponentiation explicitly with quantum gates by implementing
addition and multiplication (by a constant) in the Fourier domain. We also have to
implement the modulo operation, which is something we have not seen before.
We describe the implementation as follows: first, we outline the main routine driving the whole process. Then, we describe the helper routines, e.g., for addition; we have seen most of these before in other sections. Finally, we describe the code that builds the unitary operators and connects them to compute the phase estimate. We then get actual experimental results from the estimated phase with the help of continued fractions.
def main(argv):
  print('Order finding.')
  number = flags.FLAGS.N
  a = flags.FLAGS.a
  nbits = number.bit_length()

  qc = circuit.qc('order_finding')

  # Register for QFT. This reg will hold the resulting x-value.
  up = qc.reg(nbits*2, name='q1')

  # Register to hold the powers of a mod N.
  down = qc.reg(nbits, name='q0')

  # Ancilla register for the modular adder (bit width L + 2).
  aux = qc.reg(nbits+2, name='aux')

  qc.had(up)
  qc.x(down[0])
  for i in range(nbits*2):
    cmultmodn(qc, up[i], down, aux, int(a**(2**i)), number, nbits)
  inverse_qft(qc, up, 2*nbits, with_swaps=1)
Finally, we check the results. For the numbers given (N = 15, a = 4), we expect a result of 128 or 0 in the up register, corresponding to interpretations as binary fractions of 0.5 and 0.0. We will detail the steps from here to the factors at the end of this section. This code snippet differs from the final implementation. Note again that we inverted the bit order with [::-1]:
print(qc.stats())
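The binary-fraction interpretation of the up register mentioned above can be sketched as follows (`bits2frac` here is a simplified stand-in for the helper of the same name):

```python
def bits2frac(x: int, nbits: int) -> float:
    """Interpret an nbits-wide register value x as the binary fraction 0.x."""
    return x / 2**nbits


print(bits2frac(128, 8), bits2frac(0, 8))  # 0.5 0.0
```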
And indeed, we will get this result with 50% probability. This is the reality of this
algorithm – it is probabilistic. On a real machine, we might find only 1 and N and
have to run the algorithm multiple times until we find at least one prime factor. In our
infrastructure, of course, we can just peek at the resulting probabilities, no need to run
multiple times.
[...]
Swap...
Uncompute...
Measurement...
Final x-value. Got: 0 Want: 128, probability: 0.250
Final x-value. Got: 0 Want: 128, probability: 0.250
Final x-value. Got: 128 Want: 128, probability: 0.250
Final x-value. Got: 128 Want: 128, probability: 0.250
Circuit Statistics
Qubits: 18
Gates : 10553
def egcd(a, b):
  """Extended Euclidean algorithm, returning (g, x, y) with a*x + b*y = g."""
  if a == 0:
    return (b, 0, 1)
  g, y, x = egcd(b % a, a)
  return (g, x - (b // a) * y, y)
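From the extended Euclidean algorithm we obtain the modular inverse used during the uncompute step below. A minimal sketch (`egcd` is repeated here so the snippet is self-contained):

```python
def egcd(a: int, b: int):
    """Extended Euclid: return (g, x, y) with a*x + b*y = g = gcd(a, b)."""
    if a == 0:
        return (b, 0, 1)
    g, y, x = egcd(b % a, a)
    return (g, x - (b // a) * y, y)


def modular_inverse(a: int, m: int) -> int:
    """Return a^-1 mod m; a and m must be coprime."""
    g, x, _ = egcd(a, m)
    if g != 1:
        raise ValueError('modular inverse does not exist')
    return x % m


print(modular_inverse(4, 15))  # 4, since 4 * 4 = 16 = 1 mod 15
```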
We will run a large number of QFTs and inverse QFTs. Many of these operations are part of adding a known constant value to a quantum register. As we saw in Section 6.3 on quantum arithmetic, this makes the implementation of quantum addition easier. We precompute the angles to apply them directly to the target register:
def precompute_angles(a: int, n: int) -> list:
  """Precompute the rotation angles for adding the constant a."""
  s = bin(a)[2:].zfill(n)  # binary string of a, padded to n bits
  angles = [0.] * n
  for i in range(0, n):
    for j in range(i, n):
      if s[j] == '1':
        angles[n-i-1] += 2**(-(j-i))
    angles[n-i-1] *= math.pi
  return angles
  # Add the constant a in the Fourier basis:
  angles = precompute_angles(a, n)
  for i in range(n):
    qc.u1(q[i], factor * angles[i])

  # The singly controlled version:
  angles = precompute_angles(a, n)
  for i in range(n):
    qc.cu1(ctl, q[i], factor * angles[i])

  # The doubly controlled version:
  angles = precompute_angles(a, n)
  for i in range(n):
    ccphase(qc, factor*angles[i], ctl1, ctl2, q[i])
We construct the doubly controlled phase gates from their controlled root and adjoint. For rotations around an angle x, the root is just a rotation by x/2, and the adjoint of a rotation is a rotation in the other direction:

def ccphase(qc, angle: float, ctl1: int, ctl2: int, idx: int) -> None:
  """Controlled controlled phase gate."""
Using the adjoint of the addition circuit, we get (b − a) if b ≥ a, and 2^{n−1} − (a − b) if b < a. So we can use this to subtract and compare numbers. If b < a, then the most significant qubit will be |1⟩. We utilize this qubit to control other gates later:

  QFT† Add†(a) QFT |b⟩ = |b − a⟩            if b ≥ a,
                         |2^{n−1} − (a − b)⟩  if b < a.
We implement QFT and QFT† on the up register, this time with an option for swaps (which we actually don't use for this algorithm):

  if with_swaps:
    for i in range(n // 2):
      qc.swap(up_reg[i], up_reg[n-1-i])
def inverse_qft(qc, up_reg, n: int, with_swaps: int = 0) -> None:
  """Apply the inverse QFT to a register."""
  if with_swaps == 1:
    for i in range(n // 2):
      qc.swap(up_reg[i], up_reg[n-1-i])
  for i in range(n):
    qc.had(up_reg[i])
    if i != n-1:
      j = i+1
      for y in range(i, -1, -1):
        qc.cu1(up_reg[j], up_reg[y], -math.pi / 2**(j-y))
  (a + b) mod N ≥ a  ⇒  a + b < N.

This time we run an inverse addition to subtract a from the result above and compute ((a + b) mod N) − a. The most significant bit is going to be |0⟩ if (a + b) mod N ≥ a.
Figure 6.9 Second half of the modulo addition circuit, disentangling the ancilla.
We apply a NOT gate and use it as the controller for a Controlled-Not to the ancilla.
With this, the ancilla has been restored. Now we have to undo what we just did.
We apply another NOT gate to the most significant qubit, followed by a QFT and
an addition of a to revert the initial subtraction. The end result is a clean computa-
tion of (a + b) mod N . In circuit notation, the second half of the circuit is shown in
Figure 6.9. In code:
We will also need the inverse of this procedure. As before, and as explained in the
section on uncomputation, we simply apply the inverse gates in the reverse order:
qft(qc, q, n, with_swaps=0)
ccadd(qc, q, ctl1, ctl2, a, n, factor=1.0)
Uncomputing circuits like this is tedious. In Section 8.5.5 we show how to automate
uncomputation in an elegant way.
  (ax) mod N = (...(((2^0 a x_0) mod N + 2^1 a x_1) mod N) + ··· + 2^{n−1} a x_{n−1}) mod N.
This means that we must implement the inverse computation of the first block using
the modular inverse.
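This decomposition is easy to check classically; the following is our own sketch, not code from the book:

```python
# Verify the bitwise decomposition of (a*x) mod N for all 4-bit x.
a, N, n = 7, 15, 4
for x in range(2**n):
    bits = [(x >> i) & 1 for i in range(n)]  # x = sum_i 2^i * x_i
    acc = 0
    for i in range(n):
        acc = (acc + (2**i) * a * bits[i]) % N
    assert acc == (a * x) % N
print('decomposition verified')
```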
def cmultmodn(qc, ctl, q, aux, a: int, number: int, n: int) -> None:
  """Controlled multiplication of register q by a, modulo number."""
  print('Compute...')
  qft(qc, aux, n+1, with_swaps=0)
  for i in range(n):
    cc_add_mod_n(qc, aux, q[i], ctl, aux[n+1],
                 ((2**i)*a) % number, number, n+1)
  inverse_qft(qc, aux, n+1, with_swaps=0)

  print('Swap...')
  for i in range(n):
    qc.cswap(ctl, q[i], aux[i])
  a_inv = modular_inverse(a, number)

  print('Uncompute...')
  qft(qc, aux, n+1, with_swaps=0)
  for i in range(n-1, -1, -1):
    cc_add_mod_n_inverse(qc, aux, q[i], ctl, aux[n+1],
                         ((2**i)*a_inv) % number, number, n+1)
  inverse_qft(qc, aux, n+1, with_swaps=0)
We shall name this circuit CU_a. There is still a problem: the phase estimation algorithm requires powers of this circuit. Does this mean that we have to multiply this circuit with itself n times to get to (CU_a)^n for each power of 2, as required by phase estimation? Fortunately, we don't. We simply compute a^n classically, with:

  (CU_a)^n = CU_{a^n}.

This can be seen at the top level in the code, where we iterate over the calls to the modular arithmetic circuit (those expressions containing 2**i).
Shouldn’t this mean that if we could find a fraction of integers approximating this
phase, we would have an initial guess for the order r ?
To approximate a fractional value to an arbitrary degree of accuracy, we can use the
technique of continued fractions.6 Fortunately for us, an implementation of it already
exists in the form of a Python library. We include the module:
import fractions
phase = helper.bits2frac(
bits[nbits+2 : nbits+2 + nbits*2][::-1], nbits*2)
We get the lowest denominator from the continued fractions algorithm. We also
want to limit the accuracy via limit_denominator to ensure we get reasonably
sized denominators:
r = fractions.Fraction(phase).limit_denominator(number).denominator
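As an illustration with made-up numbers: suppose phase estimation with an 8-bit register measured x = 85, i.e., a phase of 85/256 ≈ 0.332, when the true fraction s/r is 1/3:

```python
import fractions

phase = 85 / 256  # measured binary fraction, close to 1/3
r = fractions.Fraction(phase).limit_denominator(15).denominator
print(r)  # 3
```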
With this r, we can then follow the explanations on the nonquantum part of Shor's algorithm and seek to compute the factors. We might get 1s, or we might get Ns, which are both useless. With a little luck and by following the actual probabilities, we might just find one or two of the real factors.
6.6.6 Experiments
Let us run just a few examples to demonstrate that this machinery works. To factorize 15 with a = 4, we run a circuit with 10,553 gates and obtain two sets of factors: the trivial ones, 1 and 15, but, Eureka! also the real factors 3 and 5:
6 https://en.wikipedia.org/wiki/Continued_fraction.
Finally, factoring 35 with an initial value of a = 4 uses over 36,000 gates and a runtime of approximately 60 minutes:
You may want to experiment and perhaps convert this code to libq with the
transpilation facilities described in Section 8.5. The code runs significantly faster in
libq, which allows experimentation with much larger numbers of qubits. As a rough
and unscientific estimate – factorization with 22 qubits runs for about two minutes on
a standard workstation. After compilation to libq, it accelerates because of the sparse
representation and takes less than five seconds to complete, a speedup factor of over
25×. Factoring 35 with 26 qubits takes about an hour, but with libq it takes about
three minutes, a still significant speedup of about 20×.
To summarize, the algorithm as a whole – from the classical parts to the quantum
parts and finding the order with continued fraction – is truly magical. No wonder it
has gotten so much attention and stands out as one of the key contributors to today’s
interest in quantum computing.
6.7 Grover's Algorithm

By "searching" we mean that there is a function f(x) and one (or more) special inputs x′ for which:

  f(x) = 0  for all x ≠ x′,
  f(x) = 1  for x = x′.
The classical algorithm to find x′ is of complexity O(N) in the worst case. It needs to evaluate all possible inputs to f. Strictly speaking, N − 1 steps are required, because once all the elements, including the penultimate one, have returned 0, we know that the last element must be the elusive x′. Being able to do this with complexity O(√N) is, of course, an exciting prospect.
To understand and implement the algorithm, we first describe the algorithm at
a high level in fairly abstract terms. We need to learn two new concepts – phase
inversion and inversion about the mean. Once these concepts are understood, we
detail several variants of their implementation. We finally combine all the pieces into
Grover’s algorithm and run a few experiments.
2. Construct the phase inversion operator U_f for the special state x′:

  U_f = I^⊗n − 2|x′⟩⟨x′|.

3. Construct an inversion-about-the-mean operator U⊥, defined as:

  U⊥ = 2(|+⟩⟨+|)^⊗n − I^⊗n.

4. Combine U⊥ and U_f into the Grover operator G (in this notation, U_f is applied first):

  G = U⊥ U_f.

5. Iterate k times and apply G to the state. We derive the iteration count k below. The resulting state will be close to the special state |x′⟩:

  G^k |+⟩^⊗n ∼ |x′⟩.
This basically explains the whole procedure. Some of you may look at this, shrug, and
understand it right away. For the rest of us, the next sections explain this procedure in
great detail, sometimes in multiple different ways. Grover’s algorithm is foundational,
so we want to make sure we understand and appreciate it fully.
In the chart in Figure 6.12, we negated the phase of state |4⟩, which should serve as our special state |x′⟩. To relate this back to the function f(x) we are trying to analyze, we use phase inversion to negate the phase for the special elements only, which we can express with this closed form:

  |ψ⟩ = Σ_x c_x |x⟩  →inv  |ψ⟩ = Σ_x c_x (−1)^{f(x)} |x⟩.    (6.6)
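In amplitude terms, Equation (6.6) is just a sign flip on the marked entries. A small sketch in plain NumPy (not the book's infrastructure), marking state |4⟩ as above:

```python
import numpy as np


def f(x: int) -> int:
    return 1 if x == 4 else 0  # |4> is the special state


amps = np.full(16, 1 / 4.0)  # equal superposition over 16 states
inverted = np.array([(-1)**f(x) * c for x, c in enumerate(amps)])
print(inverted[4], inverted[5])  # -0.25 0.25
```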
A key aspect of this procedure is that the function f has to be known; how else could we implement and perform this operation? There is an important distinction: even though an implementation has to know the function, observers who try to reconstruct and measure the function would still have to go through N steps in the classical case, but only √N in the quantum case. This is still different from, say, finding an element meeting certain criteria in a database.
Figure 6.13 An example of (solid line) random data and (dashed line) its inversion about the
mean.
Inversion about the mean is the process of mirroring each cx across the mean. To
achieve this, we take each value’s distance from the mean, which is µ − cx , and add
it to the mean. For values that were above the mean, µ − cx is negative, and the value
is reflected below the mean. Conversely, for values that were below the mean, µ − cx
is positive, and the values are being reflected up. An example with a random set of
values is shown in Figure 6.13 (solid line).
For each c_i we compute:

  c_i → µ + (µ − c_i) = 2µ − c_i.
This reflects each value about the mean. For the example in Figure 6.13, the
reflected values are shown with a dashed line. Each amplitude ci has been reflected
about the mean of all amplitudes.
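Numerically, the reflection is a one-liner; here is a sketch with made-up amplitudes:

```python
import numpy as np

c = np.array([0.1, 0.4, 0.2, 0.3])
mu = c.mean()            # 0.25
inverted = 2 * mu - c    # reflect every amplitude about the mean
print(inverted)          # [0.4 0.1 0.3 0.2]
```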
  c_x → µ + (µ − c_x) = 2µ − c_x,
  Σ_x c_x |x⟩ → Σ_x (2µ − c_x) |x⟩.    (6.7)
(note that in the implementation below, we use a different methodology to get this operator):

  U_f = I − 2|x′⟩⟨x′| = diag(1, 1, 1, −1).
We know how to create an equal superposition state |s⟩ = |++⟩. The state |x⊥⟩ orthogonal to |x′⟩ is very close to |s⟩; |s⟩ is almost orthogonal to |x′⟩:

  |s⟩ = H^⊗2 |00⟩ = |++⟩ = (1/2) [1, 1, 1, 1]^T,
  |x⊥⟩ = (1/√3) [1, 1, 1, 0]^T.

Note that |x⊥⟩ = |s⟩ − |x′⟩ (up to normalization) is the equal superposition state |s⟩ with |x′⟩ removed; it corresponds to the axis |α⟩ in Figure 6.15. The state |ψ⟩ in the figure corresponds to
the initial |s⟩. It is easy to see how applying the operator U_f inverts the phase of the |x′⟩ component in |s⟩:

  U_f |s⟩ = (1/2) [1, 1, 1, −1]^T.
In Figure 6.15, this corresponds to a reflection of the state |ψ⟩ (which is our |s⟩) about the α-axis. The inversion-about-the-mean operator U⊥, as defined in step 3 above, is:

  U⊥ = 2(|+⟩⟨+|)^⊗2 − I^⊗2 = 2|s⟩⟨s| − I^⊗2.

The operator U⊥ reflects U_f|ψ⟩ about the original state |s⟩ into the new state U⊥U_f|ψ⟩ = |11⟩. For our example with just two qubits, a single iteration is all that is needed to move state |s⟩ to |x′⟩. In code:
x = state.bitstring(1, 1)
s = ops.Hadamard(2)(state.bitstring(0, 0))
Uf = ops.Operator(ops.Identity(2) - 2 * x.density())
Ub = ops.Operator(2 * s.density() - ops.Identity(2))
(Ub @ Uf)(s).dump()
>>
|11> (|3>): ampl: +1.00+0.00j prob: 1.00 Phase: 0.0
The iteration count of 1 agrees with Equation (6.11), which we will derive next.
Figure 6.16 One Grover iteration: U_f reflects |ψ⟩ about the |α⟩ axis, and U⊥ reflects the result about |ψ⟩, leaving the state rotated by φ towards |β⟩.
The inversion about the mean (let's again call this operator U⊥) then performs another reflection, about the vector |ψ⟩. The two reflections amount to a rotation, which means the state remains in the space spanned by |α⟩ and |β⟩. Furthermore, the state incrementally rotates towards the solution space |β⟩. We have seen in (6.8) above that:
  |ψ⟩ = √((N − M)/N) |α⟩ + √(M/N) |β⟩.
We can geometrically position the state vector with simple trigonometry. We define
the initial angle between |ψi and |αi as φ/2. Equation (6.9) is important; we will use
it in Section 6.9 on quantum counting:
  cos(φ/2) = √((N − M)/N),
  sin(φ/2) = √(M/N),    (6.9)
  |ψ⟩ = cos(φ/2) |α⟩ + sin(φ/2) |β⟩.
From Figure 6.16, we can see that after phase inversion and inversion around the mean, the state has rotated by φ towards |β⟩. The angle between |α⟩ and |ψ⟩ is now 3φ/2. We call the combined operator the Grover operator G = U⊥U_f:

  G|ψ⟩ = cos(3φ/2) |α⟩ + sin(3φ/2) |β⟩.
From this, we see that repeated application of the Grover operator G takes the state to:

  G^k|ψ⟩ = cos((2k+1)φ/2) |α⟩ + sin((2k+1)φ/2) |β⟩.
In order to maximize the probability of measuring |β⟩, the term sin((2k+1)φ/2) ought to be as close to 1.0 as possible. Taking the arcsin of the expression yields:

  sin((2k+1)φ/2) = 1,
  (2k+1)φ/2 = π/2,
  k = π/(2φ) − 1/2.    (6.10)
Note that the iteration count must be an integer, so the question we face now is what to do with the −1/2. We could ignore it, use it to round up, or use it to round down. In our implementation, we chose to ignore it. For our examples below, the probabilities for finding the solutions are around 40% or higher, and this term has no impact.
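In code, the iteration count from Equation (6.10) can be sketched as follows (the helper name and its `solutions` parameter, which anticipates the multiple-solutions case below, are our own):

```python
import math


def grover_iterations(n_states: int, solutions: int = 1) -> int:
    """Number of Grover iterations, ignoring the -1/2 term."""
    phi = 2 * math.asin(math.sqrt(solutions / n_states))
    return int(math.pi / (2 * phi))


print(grover_iterations(2**7))  # 8
```

For the two-qubit example above (N = 4, M = 1), this evaluates to a single iteration, as observed.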
Figure: the oracle circuit. The n input qubits are prepared as H^⊗n|0⟩^⊗n and the ancilla as H|1⟩; U_f maps |x⟩|y⟩ to |x⟩|y ⊕ f(x)⟩.
It is important to note that the bottom ancilla qubit is initialized as |1⟩. The Hadamard gate puts it into state |−⟩. This is important because it means that the input state is transformed by U_f into the desired state:

  |ψ⟩ = Σ_x c_x |x⟩  →inv  |ψ⟩ = Σ_x c_x (−1)^{f(x)} |x⟩.
If f(x) = 1, the bottom qubit in state |−⟩ is XOR'ed with 1, which means the state changes to:

  (|1⟩ − |0⟩)/√2.
This means it gets a phase:

  |−⟩ → −|−⟩.

We can slightly rearrange the terms, ignore the ancilla, and arrive at the exact form we were looking for:

  |ψ⟩ = Σ_x c_x (−1)^{f(x)} |x⟩.
We only want to apply the XOR for the special state |x′⟩ for which f(x) = 1. This means we can multi-control the final qubit as shown in Figure 6.16, ensuring that all control bits are |1⟩.
In matrix form, we can accomplish this by multiplying the state vector with a matrix that has 2/N at each element, except for the diagonal elements, which are 2/N − 1.

Figure: the phase inversion circuit for the special state |x′⟩ = |11010⟩. The qubits for the 0-bits, x2 and x4, are sandwiched between X-gates so that the multi-controlled X flips |y⟩ to |y ⊕ f(x)⟩ only for |x′⟩.

Note that this matrix is the desired end result of the derivation in the next few paragraphs. It represents this expression, as shown in the introduction of this section:

  U⊥ = 2(|+⟩⟨+|)^⊗n − I^⊗n.    (6.12)
This matrix is also called the diffusion operator because its form is similar to the discretized version of the diffusion equation, but we can safely ignore this fun fact here. This is the operator we hope to construct:

  U⊥ = [ 2/N−1   2/N     ...   2/N
         2/N     2/N−1   ...   2/N
         ...     ...     ...   ...
         2/N     2/N     ...   2/N−1 ].    (6.13)
Why do we look for this specific operator? Remember Equation (6.7); we want to construct an operator that performs this transformation:

  Σ_x c_x |x⟩ → Σ_x (2µ − c_x) |x⟩.
Why does this work? Each row multiplies each state vector element by 2/N and sums them up, before subtracting the one element corresponding to the diagonal. This is exactly the closed-form inversion procedure shown in Equation (6.7) above.
Figure 6.17 The inversion-about-the-mean circuit: H^⊗n, then the reflection W, then H^⊗n; the ancilla |1⟩ is not involved.

Applied to a state vector, the operator acts as:

  [ 2/N−1   2/N     ...   2/N   ]  [ c₀ ]
  [ 2/N     2/N−1   ...   2/N   ]  [ c₁ ]
  [ ...     ...     ...   ...   ]  [ ...]
How would we arrive at this matrix from what we've learned so far? We've seen the geometrical interpretation above: we can think of inversion about the mean as a reflection around a subspace. Hence, a possible derivation consists of three steps:

1. Transform into the Hadamard basis by applying H^⊗n.
2. Reflect about the state |00...0⟩ with an operator W.
3. Transform back by applying H^⊗n again.
These three steps define the circuit shown in Figure 6.17. For steps 1 and 3, it is
sufficient to apply the Hadamard operators, as we are in the Hadamard basis from the
phase inversion before.
For step 2, we will want to leave the state |00 . . . 0i alone but reflect all other
states. If we think about how states are represented in binary and how matrix-vector
multiplication works, we can achieve this by constructing the matrix W , which is easy
to derive:
  W = diag(1, −1, −1, ..., −1)
    = 2(P_|0⟩)^⊗n − I^⊗n
    = 2 diag(1, 0, ..., 0) − diag(1, 1, ..., 1).
Again, we could pick any state as the axis to reflect about, but the math is elegant and simple when picking the state |00...0⟩. This will become clearer with the derivation immediately below.
Only the first element of the state vector remains unmodified, and that first element corresponds to the state |00...0⟩. Remember that the state vector for this state is all 0s, except the very first element, which is a 1. All other states are therefore being negated.
In combination, we want to compute the following:

  H^⊗n W H^⊗n = H^⊗n diag(1, −1, ..., −1) H^⊗n
              = H^⊗n (diag(2, 0, ..., 0) − I) H^⊗n
              = H^⊗n diag(2, 0, ..., 0) H^⊗n − H^⊗n I H^⊗n.
Since the Hadamard is its own inverse, the second term reduces to just the identity
matrix I . Multiplying in the left and right Hadamard gates:
  = [ 2/√N   0   ...   0 ]
    [ 2/√N   0   ...   0 ]  H^⊗n − I
    [ ...                 ]
    [ 2/√N   0   ...   0 ]

  = [ 2/N   2/N   ...   2/N ]
    [ 2/N   2/N   ...   2/N ]  − I.
    [ ...                    ]
    [ 2/N   2/N   ...   2/N ]
Finally, subtracting the identity I yields a matrix where all elements are 2/N, except the diagonal elements, which are 2/N − 1:

  U⊥ = [ 2/N−1   2/N     ...   2/N
         2/N     2/N−1   ...   2/N
         ...     ...     ...   ...
         2/N     2/N     ...   2/N−1 ].    (6.14)
This is the matrix U⊥ we were looking for. Applying this matrix to a state turns each element c_x into 2µ − c_x, which is exactly what we wanted from the inversion about the mean procedure, as shown in Equation (6.7)!
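We can confirm the derivation numerically with plain NumPy, independent of the book's ops module:

```python
import numpy as np

n = 3
N = 2**n

# W: reflection about |00...0>, i.e., diag(1, -1, ..., -1).
w = -np.eye(N)
w[0, 0] = 1

# n-fold tensor product of the single-qubit Hadamard gate.
h = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
hn = h
for _ in range(n - 1):
    hn = np.kron(hn, h)

u = hn @ w @ hn

# Every element should be 2/N, with 2/N - 1 on the diagonal.
expected = np.full((N, N), 2 / N) - np.eye(N)
print(np.allclose(u, expected))  # True
```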
Not to foreshadow the implementation below, but we can verify this by changing
this line in src/grover.py:
<<
reflection = op_zero * 2.0 - ops.Identity(nbits)
>>
reflection = ops.Identity(nbits) - op_zero * 2.0
We want to build a gate that leaves every state untouched, except |00...0⟩, which should get its phase negated. A Z-gate will do this for us.

Figure 6.18 Inversion about the mean circuit (omitting leading and trailing Hadamard gates applied to all qubits).

Because the Z-gate must be controlled to only apply to |00...0⟩, we expect all inputs to be |0⟩. Hence, to control the Z-gate, we sandwich it between X-gates (omitting the left and right Hadamard gates from the construction in Equation (6.13)), as shown in Figure 6.18.
As a result, for the big inversion operator U⊥ from Equation (6.13), the circuit in Figure 6.18 corresponds to the closed form below, which yields −U⊥:

  H^⊗n X^⊗n (C^{n−1}Z) X^⊗n H^⊗n = −U⊥.
def make_f(d: int):
  """Construct a function that returns 1 for a single random input."""
  num_inputs = 2**d
  answers = np.zeros(num_inputs, dtype=np.int32)
  answer_true = np.random.randint(0, num_inputs)
  answers[answer_true] = 1

  def func(*bits):
    return answers[helper.bits2val(*bits)]

  return func
Figure: the full Grover circuit. The Grover operator G = U⊥U_f is repeated about (π/4)√N times on the state H^⊗n|0⟩^⊗n, with the ancilla prepared as H|1⟩.
The circuit’s initial state is a register of |0i qubits with an additional ancilla qubit
in state |1i. Applying the Hadamard gate to all of the qubits puts the ancilla into the
state |−i:
# State initialization:
psi = state.zeros(nbits) * state.ones(1)
for i in range(nbits + 1):
psi.apply(ops.Hadamard(), i)
Now on to mean inversion. We first construct an all-0 matrix with a single 1.0
at element (0, 0). This is equivalent to building up an nbits-dimensional |0ih0|
projector:
The full inversion operator U⊥ consists of the Hadamard gates bracketing the reflection matrix W. We add an identity gate to account for the ancilla we added earlier for the phase inversion oracle. Finally, we build the full Grover operator G = grover as the combination of the mean inversion with the phase inversion operator uf:
We finally iterate the desired number of times based on the size of the state as
discussed above (see Equation (6.10)):
for _ in range(iterations):
psi = grover(psi)
def main(argv):
[...]
How should we modify Grover’s algorithm to account for multiple solutions? We have
to adjust the phase inversion, inversion about the mean, and the iteration count.
Phase inversion for multiple solutions is easy to achieve. We modify the function
make_f and give it a parameter solutions to indicate how many solutions to mark.
We also thread this parameter through the code (not shown here, but available in the
open-source repository):
num_inputs = 2**d
answers = np.zeros(num_inputs, dtype=np.int32)
for i in range(solutions):
  idx = random.randint(0, num_inputs - 1)
  # Avoid collisions.
  while answers[idx] == 1:
    idx = random.randint(0, num_inputs - 1)
  answers[idx] = 1
We already derived the proper iteration count in the derivation for Grover's algorithm in Equation (6.11) as:

  k = (π/4) √(N/M).
We assumed M = 1 there (Section 6.7.6). To account for multiple solutions, we have
to adjust the computation of the iteration count and divide by M, which is parameter
solutions in the code:
We add a test sequence to our main driver code to check whether any solution
can be found, and with what maximal probability. For good performance, we fix the
number of qubits at eight and gradually increase the number of solutions from 1 to 32:
If we print the number of states with nonzero probability, we find that all of their
probabilities are identical, and there are twice as many states with nonzero probability
as there are solutions! This is an artifact of our oracle construction and the entangle-
ment with the ancilla qubit. We should get output like the following:
Figure 6.20 Probability of finding a solution when the total number of solutions ranges from 5
up to 64 in a state space of 128 elements.
Note how the probabilities decline rapidly. Let’s visualize this with the graph in
Figure 6.20. On the x-axis, we have the number of solutions ranging from 5 to 64.
On the y-axis, we ignore the first few cases with high probability and set a maximum
of 0.1. We can see how the probabilities decline rapidly and drop to 0 after the total
number of solutions exceeds 40.
What if there are many more solutions, perhaps even a majority of the state space?
To answer this question, Grover’s algorithm has been generalized by Brassard et al.
(2002) as Quantum Amplitude Amplification (QAA).
Grover expected just one special element and initialized the search with an equal
superposition of all inputs by applying the algorithm A = H ⊗n to the input (note
the unusual use of the term algorithm here). However, we might already have prior
knowledge about the state of the system, which we can exploit by preparing the state
differently. QAA supports any algorithm A to initialize the input and changes the
Grover iteration to a more general form:
$$Q = A W A^{-1} U_f.$$
Operator $U_f$ is the phase inversion operator for multiple solutions, and W is the inversion about the mean matrix we saw in Grover's algorithm. What changes is the derivation of the iteration count k, which has been shown to depend on the probability $p_{\text{good}}$ of finding a solution (see Kaye et al., 2007, section 8.2), which was M/N (with M = 1 in the case of Grover):
$$k = \sqrt{\frac{1}{p_{\text{good}}}}.$$
Let us see how the probabilities improve with the new iteration count. As an experiment,7 we keep A = H⊗n and compute the new iteration count as the square root of $1/p_{\text{good}}$.
Figure 6.21 Probability of finding a solution with the original and the improved iteration counts.
Figure 6.21 shows the probabilities for the two iterations, where the thick line repre-
sents the probabilities obtained with the new iteration count. We see that the situation
improves markedly, but the probabilities still drop to 0 at more than 64 solutions. We
have twice as many states with nonzero probabilities as there are solutions because
of the ancilla entanglement. As soon as we hit half the size of the space, probabilities
will drop to 0. A simple way to work around this problem is to just add another qubit.
This additional qubit will double the size of the state space and eliminate the problem.
The technique of amplitude amplification requires knowledge of the number of
good solutions, as well as their probability distribution. A general technique called
amplitude estimation can help with this (see Kaye et al., 2007, section 8.2). In the next
section, we detail a special case of amplitude estimation, quantum counting, which
assumes an equal superposition of the search space with algorithm A = H ⊗n , similar
to Grover.
Figure 6.22 Phase estimation circuit for quantum counting: t Hadamard-prepared qubits control the powers $G^{2^0}, G^{2^1}, \ldots, G^{2^{t-1}}$ of the Grover operator, which acts on n qubits prepared in equal superposition.
Amplitude amplification requires knowing the number of solutions M to compute the right iteration count. Quantum counting is a special case of amplitude estimation that seeks to estimate this number M. Because it expects an equal superposition of the search space, similar to Grover, with algorithm A = H⊗n, we can reuse much of the Grover implementation in the code below.
As in Grover’s algorithm, we partition the state space into a space with no solution
|αi and the space with only solutions |βi:
$$|\psi\rangle = \sqrt{\frac{N-M}{N}}\,|\alpha\rangle + \sqrt{\frac{M}{N}}\,|\beta\rangle.$$
Applying the Grover operator amounts to a rotation by an angle φ towards the solution space |βi. You may refer back to Figure 6.15 for a graphical illustration of this process. Since this is a counterclockwise rotation, we can express the Grover operator as a standard rotation matrix:
$$G(\phi) = \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix}.$$
Rotation matrices are unitary matrices with the eigenvalues:
$$\lambda_{0,1} = e^{\pm i\phi}.$$
We have also learned in the analysis of Grover’s algorithm that Equation (6.9),
replicated here, holds, with N being the number of elements and M being the number
of solutions:
$$\sin\frac{\phi}{2} = \sqrt{\frac{M}{N}}.$$
If we find φ, we can estimate M because we already know N . We also know that
we can use our friendly neighborhood phase estimator to find φ. For this, we build the
circuit as shown in Figure 6.22.
Let’s translate this circuit into code (the implementation is in file src/counting.py). First we define the function that returns 1 for a solution and 0 otherwise. This is the same function we developed for amplitude amplification in Section 6.8:
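A sketch of such a function, reusing the collision-avoiding index selection shown earlier (the name `make_f` is ours, standing in for the book's version):

```python
import random

def make_f(num_inputs, solutions):
    """Return f(i) -> 1 for `solutions` randomly chosen inputs, else 0."""
    answers = [0] * num_inputs
    for _ in range(solutions):
        idx = random.randint(0, num_inputs - 1)
        # Avoid collisions.
        while answers[idx] == 1:
            idx = random.randint(0, num_inputs - 1)
        answers[idx] = 1
    return lambda i: answers[i]
```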
Next, we build up the phase inversion operator, just as we did in Section 6.7 on
Grover’s algorithm. Parameter nbits_phase specifies how many qubits to use for the
phase estimation, and the parameter nbits_grover indicates how many qubits to use
for the Grover operator itself. Since this code utilizes the full matrix implementation,
we can only use a limited number of qubits. Nevertheless, the more qubits we use for
the phase estimation, the more numerically accurate the results will become.
We build up the circuit as in Figure 6.22. Note that the Grover operator needs a |1i
ancilla, which we also have to add to the state (not shown in the Figure). We apply a
Hadamard to the inputs, including the ancilla:
Let us construct the Grover operator next. This, again, is very similar to the previous
section on Grover’s algorithm:
We follow this with the sequence of exponentiated gates and a final inverse QFT:
This completes the circuit. We measure and find the state with the highest probability.
We reconstruct the phase from the binary fractions and then use Equation (6.9) to
estimate M:
# Get the state with highest probability and compute the phase
# as a binary fraction. Note that the probability decreases
# as M, the number of solutions, gets closer and closer to N,
# the total number of states.
maxbits, maxprob = psi.maxprob()
phi_estimate = (sum(maxbits[i] * 2**(-i - 1)
for i in range(nbits_phase)))
# the 1/2 in above formula cancels out against the 2 and we compute:
M = round(n * math.sin(phi_estimate * math.pi)**2, 2)
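The arithmetic of that last step can be checked in isolation (a hedged sketch; `estimate_m` is our name, with the phase expressed as a fraction of a full turn):

```python
import math

def estimate_m(phi_estimate, n_states):
    # sin(phi/2) = sqrt(M/N) with phi = 2*pi*phi_estimate
    # gives M = N * sin(pi * phi_estimate)**2.
    return n_states * math.sin(phi_estimate * math.pi) ** 2
```

Feeding back the exact phase for M = 4 solutions out of N = 64 states recovers M = 4.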
Let’s run some experiments with seven qubits for the phase estimation, and five
qubits for the Grover operator. In each experiment, we increase M by 1. For N = 64,
we let M range from 1 to 10:
def main(argv):
  [...]
  for solutions in range(1, 11):
    run_experiment(7, 5, solutions)
Running this code should produce output like the following. We can see that our
estimates are “in the ballpark” and will round to the correct solution. The solution
probability also decreases significantly with higher values for M. It is instructional to
experiment with all these parameters.
Figure 6.23 Results from a simulated classical random walk, plotting the likelihood of final
position after starting in the middle of the range.
6.10.1 1D Walk
Let us start by considering a classical 1-dimensional walk on a number line. For each
step, a coin toss determines whether to move left or right. After a number of moves,
the probability distribution of the final location will be shaped like a classic bell curve,
with the highest probability clustering around the origin of the journey. Figure 6.23
shows the result from a simple experiment,8 which is available in the open-source
repository in tools/random_walk.py.
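The classical experiment can be sketched in a few lines (a stand-in for the book's tools/random_walk.py, which we do not reproduce here):

```python
import random

def classical_walk(steps, trials):
    # Tally the final positions of `trials` random +/-1 walks starting at 0.
    counts = {}
    for _ in range(trials):
        pos = sum(random.choice((-1, 1)) for _ in range(steps))
        counts[pos] = counts.get(pos, 0) + 1
    return counts
```

Plotting the counts over the positions yields the bell shape of Figure 6.23.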
The equivalent quantum walk operates in a similar fashion with coin tosses and
movements. Because this is quantum, we exploit superposition and move in both
directions at the same time. In short, a quantum random walk is the repeated appli-
cation of an operator U = C M, with C being a coin toss and M being the move
operator.
The most straightforward coin toss operator we can think of is, of course, a single
Hadamard gate. In this context, the coin is called a Hadamard coin. The |0i part of the
resulting superposition will control a movement to the left, and the |1i part, the move
to the right.
The movement circuits can be constructed the following way, as shown by Douglas and Wang (2009). A number line is infinitely long, which cannot be represented directly. Instead, we assume a circle with N states as the underlying topology for the walk. Simple up- and down-counters, with overflow and underflow wrapping between N and 0, implement the movements.
8 In fairness, the curve simply reflects the random number distribution chosen for the experiment.
# -X--
# -o--X--
# -o--o--X--
# -o--o--o--X--
# ...
for i in range(nbits):
  ctl = controller.copy()
  for j in range(nbits - 1, i, -1):
    ctl.append(j + idx)
  qc.multi_control(ctl, i + idx, aux, ops.PauliX(), 'multi-1-X')
Figure 6.25 A counter modulo 9: gates matching the binary pattern |1001i force a reset to 0.

Figure 6.26 A single step of a 1D quantum walk on n qubits: a Hadamard coin qubit controls the Incr and Decr circuits.
For both cases, N is a power of 2. We can construct other types of counters, for
example, counters with step size larger than 1, or counters that increment modulo
another number. For example, to construct a counter modulo 9, we add gates matching
the binary representation of 9 to force a counter reset to 0, as shown in Figure 6.25.
With these tools, we can construct an initial n-qubit quantum circuit step, as shown
in Figure 6.26. It has to be applied repeatedly to simulate a walk (consisting of more
than just a single step).
We can see how to generalize this pattern to other topologies. For example, for a 2D
walk across a grid, we can use two Hadamard coins: one for the left or right movement
and one for movements up or down. For graph traversals, we would encode a graph’s
connectivity as an unitary operator. Several other examples can be found in Douglas
and Wang (2009).
Figure 6.27 Amplitudes after (a) 32 and (b) 64 steps of a quantum walk with 8 qubits, starting at 0x80.
def simple_walk():
  """Simple quantum walk."""
  nbits = 8
  qc = circuit.qc('simple_walk')
  qc.reg(nbits, 0x80)
  aux = qc.reg(nbits, 0)
  coin = qc.reg(1, 0)  # Add single coin qubit
  for _ in range(64):
    qc.h(coin[0])
    incr(qc, 0, nbits, aux, [coin[0]])    # ctrl-by-1
    decr(qc, 0, nbits, aux, [[coin[0]]])  # ctrl-by-0
What is really happening here? With n qubits, we can represent 2^n states with the corresponding number of probability amplitudes. As we perform step after step, nonzero amplitudes will start to propagate out over the state space. Looking at the examples in Figures 6.27b and 6.28b we see that, in contrast to a classical random walk, the amplitude distribution spreads out faster and with a very different shape.
A series of 32 steps produces a nonzero amplitude in 64 states; the walk progresses
in both directions at the same time. The farther away from the origin, the larger the
amplitudes become. These are the key properties that quantum algorithms exploit to
solve classically intractable problems, such as Childs’ welded tree algorithm (Childs
et al., 2003). To visualize the effect, we graph the resulting amplitudes:
6.10 Quantum Random Walk 239
Let us experiment with eight qubits. The starting position should be in the middle
of the range of states. With eight qubits, there are 256 possible states, and we initialize
with 0x80, the middle of the range. It is possible to initialize with 0 of course, but
that would lead to immediate wraparound effects. The amplitudes after 32, 64, and 96
steps are shown in Figure 6.27a, Figure 6.27b, and Figure 6.28a. The x-axis shows
the state space (256 unique states for eight qubits). The y-axis shows each state’s
amplitude.
Notice how in the figures the amplitudes progress in a biased fashion. It is possible
to create coin operators that are biased to the other side, or even balanced coin opera-
tors. Alternatively, we can start in a state different from |0i. In the example in Figure
6.28b, we simply initialize the coin state as |1i.
There are countless more experiments that you can perform with different coin
operators, starting points, initial states, number of qubits, iteration counts, and more
complex topologies beyond simple 1D and 2D walks.
It is exciting that if we can express a particular algorithmic reachability problem
as a quantum walk circuit, the fast speed of quantum walks and the dense storage of
states can lead to quantum algorithms with lower complexity than their corresponding
classical algorithms. As an example, the 2010 IARPA program announcement set a
challenge of eight complex algorithms to drive scalable quantum software and infras-
tructure development (IARPA, 2010). Three of these algorithms utilized quantum
walks: the triangle finding algorithm (Buhrman et al., 2005; Magniez et al., 2005), the Boolean formula algorithm (Childs et al., 2009), and the welded tree algorithm (Childs et al., 2003).

Figure 6.28 Amplitudes after 96 steps with 8 qubits, starting at 0x80, with (a) initial state |0i and (b) initial state |1i.
6.11 Variational Quantum Eigensolver

This section represents a brief foray into the area of quantum simulation. We discuss
the variational quantum eigensolver (VQE), an algorithm to estimate the ground state
energy of a Hamiltonian.
It is possible to use quantum phase estimation (QPE) for this purpose. However, for
realistic Hamiltonians, the number of gates required can reach millions, even billions,
making it challenging to keep a physical quantum machine coherent long enough to
run the computation. VQE, on the other hand, is a hybrid classical/quantum algorithm.
The quantum part requires fewer gates and, therefore, much shorter coherence times
when compared to QPE. This is why it created great interest in today’s era of Noisy
Intermediate Scale Quantum Computers (NISQ), which have limited resources and
short coherence times (Preskill, 2018).
There cannot be a book about quantum computing without mentioning the
Schrödinger equation at least once. This is that section in this book. So we begin
by marveling at the beauty of the equation, although we will not solve it here. The
purpose of showing it is to derive the composition of Hamiltonians from eigenvectors
and how the variational principle enables the approximation of a minimum eigenvalue.
This is followed by a discussion of measurement in different bases. We explain the
variational principle next before we detail the hybrid classical/quantum algorithm
itself.
$$i\hbar\,\frac{\partial \Psi}{\partial t} = -\frac{\hbar^2}{2m}\frac{\partial^2 \Psi}{\partial x^2} + V\Psi. \tag{6.15}$$
6.11 Variational Quantum Eigensolver 241
$$-\frac{\hbar^2}{2m}\frac{d^2 \psi}{dx^2} + V\psi = E\psi. \tag{6.16}$$
In classical mechanics, the total energy of a system, which is the kinetic energy
plus the potential V , is called the Hamiltonian, denoted as H, not to be confused with
our Hadamard operator H .
$$\mathcal{H}(x,p) = \frac{mv^2}{2} + V(x) = \frac{(mv)^2}{2m} + V(x) = \frac{p^2}{2m} + V(x).$$
As a side note, the factor ħ (the reduced Planck constant) appears here from the famous Heisenberg uncertainty principle for a particle's position x and momentum $p_x$, with $\Delta x\,\Delta p_x \ge \hbar/2$. A Hamiltonian operator is obtained by the standard substitution with the momentum operator:
$$p \to -i\hbar\frac{\partial}{\partial x}, \qquad \hat{H} = -\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2} + V(x).$$
We use this result to rewrite Equation (6.16) as the following, with Ĥ being the operator and E being an energy eigenvalue. Note the parallel to the definition of eigenvectors as $A\vec{x} = \lambda\vec{x}$:
$$\hat{H}\psi = E\psi, \qquad \langle\hat{H}\rangle = E.$$
A Hermitian operator has orthonormal eigenvectors $|E_0\rangle, |E_1\rangle, \ldots, |E_{n-1}\rangle$ with corresponding real eigenvalues $\lambda_0, \lambda_1, \ldots, \lambda_{n-1}$. We can describe states as linear combinations of the eigenvectors:
$$|\psi\rangle = \sum_i c_i\,|E_i\rangle.$$
This is the result we were looking for. It is important to note that the ci are complex
coefficients, similar to how we described a superposition between |0i and |1i. In this
case, however, the basis vectors are E i . For a detailed derivation of the above, see for
example Fleisch (2020).
$$E_0 \le \langle\psi|\hat{H}|\psi\rangle \equiv \langle\hat{H}\rangle.$$
However, what is this state |ψi? The answer is, any state, as long as the state is
capable, or close to being capable, of producing an eigenvector for Ĥ. This state will
determine the remaining error for the estimation of E 0 . We have to be smart about
how to construct it. This is the key idea of the VQE algorithm.
To see how this principle works, let’s take our assumed state from above and further
assume that λ0 is the smallest eigenvalue:
The VQE algorithm works with Hamiltonians that can be written as a sum of a
polynomial number of terms of Pauli operators and their tensor products (Peruzzo
et al., 2014). This type of Hamiltonian is used in quantum chemistry, the Heisenberg
Model, the quantum Ising Model, and other areas. For example, for a helium hydride
ion (He-H+ ) with bond distance 90 pm, the energy (Hamiltonian) is:
[Bloch sphere sketch: |0i and |1i on the z-axis, |+i and |−i on the x-axis.]
However, what if the current state was aligned with a different axis, such as the
x-axis from |−i to |+i, or the y-axis pointing from |-ii to |ii? In both cases, a mea-
surement along the z-axis would result in a random toss between |0i and |1i.
To measure in a different basis, we should rotate the state into the standard basis on
the z-axis and perform a standard measurement there. The results can be interpreted
along the original bases, and we get the added benefit that we only need a measurement
apparatus in one direction.
To get a proper measurement along the x-axis, we could apply the Hadamard gate
or rotate over the y-axis. Correspondingly, to get a measurement along the y-axis, we
may rotate about the x-axis.
To compute expectation values for states composed of Pauli matrices, we remind
ourselves of the basis states in the X, Y, and Z bases:
$$X:\ |+\rangle = \frac{1}{\sqrt 2}\begin{pmatrix}1\\ 1\end{pmatrix},\quad |-\rangle = \frac{1}{\sqrt 2}\begin{pmatrix}1\\ -1\end{pmatrix},$$
$$Y:\ |i\rangle = \frac{1}{\sqrt 2}\begin{pmatrix}1\\ i\end{pmatrix},\quad |{-i}\rangle = \frac{1}{\sqrt 2}\begin{pmatrix}1\\ -i\end{pmatrix},$$
$$Z:\ |0\rangle = \begin{pmatrix}1\\ 0\end{pmatrix},\quad |1\rangle = \begin{pmatrix}0\\ 1\end{pmatrix}.$$
Pauli operators have eigenvalues of −1 and +1. Here are the operators applied to
basis states with eigenvalues +1:
Z |0i = |0i,
X |+i = |+i,
Y |ii = |ii.
These are the same operators applied to basis states with eigenvalues −1:
Z |1i = −|1i,
X |−i = −|−i,
Y |-ii = -|-ii.
Let us now talk about expectation values. For a state in the Z-basis with amplitudes $c_0^z$ and $c_1^z$:
$$|\psi\rangle = c_0^z|0\rangle + c_1^z|1\rangle.$$
Computing the expectation value for Z, in the Z-basis, yields the following, and similar for the X- and Y-bases:
$$\langle\psi|Z|\psi\rangle = \bigl(c_0^{z*}\langle 0| + c_1^{z*}\langle 1|\bigr)\, Z\, \bigl(c_0^z|0\rangle + c_1^z|1\rangle\bigr) = |c_0^z|^2 - |c_1^z|^2.$$
The values $|c_0^z|^2$ and $|c_1^z|^2$ are the measurement probabilities for |0i and |1i. If we run N experiments and measure state |0i $n_0$ times and state |1i $n_1$ times:
$$|c_0^z|^2 = \frac{n_0}{N}, \qquad |c_1^z|^2 = \frac{n_1}{N}.$$
Then this is the final expectation value for Z; please note the minus sign in this equation:
$$\langle Z \rangle = \frac{n_0 - n_1}{N}.$$
To give an example, let’s assume we have a very simple circuit initialized with |0i
and with just one Hadamard gate. The state after this gate will be |+i, which is on the
x-axis. If we now measure N times in the Z -basis, about 50% of the measurements
will return |0i, and 50% will return |1i. The |0i corresponds to eigenvalue 1, the |1i
corresponds to eigenvalue −1. Hence, the expectation value is 0:
$$\frac{(+1)\,N/2 + (-1)\,N/2}{N} = 0.$$
If we rotated the state into the Z -basis with another Hadamard gate, the expectation
value of |0i in the Z -basis would now be 1.0, which corresponds to the expectation
value of the state |+i originally in the X -basis.
In our infrastructure, we do not have to make measurements to compute probabili-
ties because we can directly peek at the amplitudes of a state vector. To compute the
expectation values for measurements made on Pauli operators with eigenvalues +1
and −1 corresponding to measuring |0i or |1i, we add this function to our quantum
circuit implementation qc:
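For a single qubit, the computation reduces to a few lines; a minimal sketch of the idea (the book's version reads the amplitudes from its state-vector class, which we stand in for with two explicit amplitudes):

```python
def pauli_expectation(amp0, amp1):
    # <Z> = P(measure |0>) - P(measure |1>) = |c0|^2 - |c1|^2.
    return abs(amp0) ** 2 - abs(amp1) ** 2
```

The |0i state yields 1.0; the |+i state yields 0.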
Let’s run a few experiments to familiarize ourselves with these concepts. What hap-
pens to the eigenvectors and eigenvalues for a Hamiltonian constructed from a single
Pauli matrix multiplied with a factor? Is the result still unitary, or is it Hermitian?
factor = 0.6
H = factor * ops.PauliY()
eigvals = np.linalg.eigvalsh(H)
print(f'Eigenvalues of {factor} Y = ', eigvals)
print(f'is_unitary: {H.is_unitary()}')
print(f'is_hermitian: {H.is_hermitian()}')
>>
Eigenvalues of 0.6 Y = [-0.6 0.6]
is_unitary: False
is_hermitian: True
Eigenvalues scale with the factor. Hamiltonians are Hermitian, but not necessarily
unitary. Let’s create a |0i state, show its Bloch sphere coordinates, and compute its
expectation value in the Z -basis.
qc = circuit.qc('test')
qc.reg(1, 0)
qubit_dump_bloch(qc.psi)
print(f'Expectation value for 0 State: {qc.pauli_expectation(0)}')
>>
x: 0.00, y: 0.00, z: 1.00
Expectation value for 0 State: 1.0
As expected, the current position is on top of the north pole, corresponding to state
|0i. The expectation value is 1.0; there will be no measurements of the |1i state. Now,
if we add just a single Hadamard gate, we will get:
The position on the Bloch sphere is now on the x-axis, and the corresponding expec-
tation value in the Z -basis is 0; we will measure an equal amount of |0i and |1i states.
Of course, to rotate this state back into the Z -basis, we only have to apply another
Hadamard gate.
This is best explained by example. Let’s focus on the single-qubit case first. We know
that we can reach any point on the Bloch sphere with rotations about the x-axis and
y-axis. Let’s use this simple parameterized circuit as the ansatz:
We will construct multiple instances of ansatzes (which has a fun rhyme to it and is the proper English plural form; the correct German plural Ansätze does not sound quite as melodic). Let’s wrap it into code (which is in file src/vqe_simple.py):
To compute the expectation value, let’s create a state |ψi and compute the expecta-
tion value hψ|Ĥ|ψi from two angles theta and phi:9
# Construct Hamiltonian.
H = 0.2 * ops.PauliX() + 0.5 * ops.PauliY() + 0.6 * ops.PauliZ()
9 This code segment is different from the open-source version. It is for illustration only.
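A self-contained numpy sketch of the whole computation (the ansatz here is one possible Rx/Ry parameterization matching the text's description; all helper names are ours, not the book's):

```python
import numpy as np

# Pauli matrices.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

H = 0.2 * X + 0.5 * Y + 0.6 * Z

def ansatz_state(theta, phi):
    # Rotate |0> about the x-axis by theta, then about the y-axis by phi.
    rx = np.array([[np.cos(theta / 2), -1j * np.sin(theta / 2)],
                   [-1j * np.sin(theta / 2), np.cos(theta / 2)]])
    ry = np.array([[np.cos(phi / 2), -np.sin(phi / 2)],
                   [np.sin(phi / 2), np.cos(phi / 2)]])
    return ry @ rx @ np.array([1, 0], dtype=complex)

def expectation(theta, phi):
    # <psi| H |psi> via two dot products.
    psi = ansatz_state(theta, phi)
    return (psi.conj() @ H @ psi).real
```

The minimum eigenvalue of this H is $-\sqrt{0.2^2 + 0.5^2 + 0.6^2} \approx -0.8062$, matching the experiments above.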
Let’s experiment with a few different values for theta and phi:
run_single_qubit_experiment2(0.1, -0.4)
run_single_qubit_experiment2(0.8, -0.1)
run_single_qubit_experiment2(0.9, -0.8)
>>
Minimum: -0.8062, Estimated: 0.4225, Delta: 1.2287
Minimum: -0.8062, Estimated: 0.0433, Delta: 0.8496
Minimum: -0.8062, Estimated: -0.2210, Delta: 0.5852
It appears we are moving in the right direction. We are getting closer to estimating
the lowest eigenvalue, but we are still pretty far away. This particular ansatz is simple
enough; we can incrementally iterate over both angles, approximating the minimum
eigenvalue with good precision. Just picking random numbers would work as well,
up to a certain degree. Obviously we can use techniques like gradient descent to find
the best possible arguments more quickly (Wikipedia, 2021d). Let’s run 10 experi-
ments with random, single-qubit Hamiltonians, iterating over the angles φ and θ in
increments of 10 degrees:
[...]
# Iterate over all angles in increments of 10 degrees.
for i in range(0, 180, 10):
  for j in range(0, 180, 10):
    theta = np.pi * i / 180.0
    phi = np.pi * j / 180.0
    [...]
# run 10 experiments with random H's.
[...]
>>
Minimum: -0.6898, Estimated: -0.6889, Delta: 0.0009
Minimum: -0.7378, Estimated: -0.7357, Delta: 0.0020
[...]
Minimum: -1.1555, Estimated: -1.1552, Delta: 0.0004
Minimum: -0.7750, Estimated: -0.7736, Delta: 0.0014
In the code above, we explicitly compute the expectation value with two dot prod-
ucts. The key to success here is that the ansatz is capable of creating the minimum
eigenvalue’s eigenvector (for two qubits). Shende et al. (2004) show how to construct
a universal two-qubit gate. However, the challenge is to minimize gates for much
larger Hamiltonians, especially on today’s smaller machines. How to construct the
ansatz is a research challenge. Which specific learning technique to use to accelerate
the approximation is another subject of ongoing interest in the field, even though it
appears that standard techniques from the field of machine learning work well enough.
We express the expectation values in the Z -basis with help of gate equivalences. Note
how we isolate the Z in the last line, representing the measurement in the Z -basis:
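The elided equivalences are presumably of this form (a reconstruction from the three circuits that follow, using the identities $X = HZH$ and $Y = S\,HZH\,S^{\dagger}$):

$$\langle\psi|X|\psi\rangle = \langle\psi|\,HZH\,|\psi\rangle, \qquad
\langle\psi|Y|\psi\rangle = \langle\psi|\,S\,HZH\,S^{\dagger}\,|\psi\rangle, \qquad
\langle\psi|Z|\psi\rangle = \langle\psi|\,Z\,|\psi\rangle.$$

In each case the gates flanking the isolated Z can be absorbed into the state, so every expectation value reduces to a plain Z-basis measurement of a suitably rotated state.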
a = random.random()
b = random.random()
c = random.random()
H = (a * ops.PauliX() + b * ops.PauliY() + c * ops.PauliZ())
We have to build three circuits. The first is for hψ|X |ψi, which requires an addi-
tional Hadamard gate:
# X-Basis
qc = single_qubit_ansatz(theta, phi)
qc.h(0)
val_a = a * qc.pauli_expectation(0)
Then one circuit for hψ|Y |ψi, which requires a Hadamard gate and an S † gate:
# Y-Basis
qc = single_qubit_ansatz(theta, phi)
qc.sdag(0)
qc.h(0)
val_b = b * qc.pauli_expectation(0)
Finally, a circuit for the measurement in the Z -basis hψ|Z |ψi. In this basis we can
measure as is, there is no need for additional gates:
# Z-Basis
qc = single_qubit_ansatz(theta, phi)
val_c = c * qc.pauli_expectation(0)
As before, we iterate over the angles φ and θ in increments of, this time, 5 degrees.
For each iteration, we take the expectation values val_a, val_b, and val_c, multiply
them with the factors we noted above, add up the result, and look for the smallest
value.
This value min_val should be our estimate. The results are numerically accurate:
H = ops.Hadamard()
I = ops.Identity()
U = H * I
(ops.PauliZ() * I).dump('Z x I')
(ops.PauliX() * I).dump('X x I')
(U.adjoint() @ (ops.PauliX() * I)).dump('Udag(X x I)')
(U.adjoint() @ (ops.PauliX() * I) @ U).dump('Udag(X x I)U ')
>>
Z x I (2-qubits operator)
1.0 - - -
- 1.0 - -
- - -1.0 -
- - - -1.0
X x I (2-qubits operator)
- - 1.0 -
- - - 1.0
1.0 - - -
- 1.0 - -
Udag(X x I) (2-qubits operator)
0.7 - 0.7 -
- 0.7 - 0.7
-0.7 - 0.7 -
- -0.7 - 0.7
Udag(X x I)U (2-qubits operator)
1.0 - - -
- 1.0 - -
- - -1.0 -
- - - -1.0
From this, it is straightforward to construct the operators for a first set of Pauli
measurements containing at least one identity operator, as shown in Table 6.1.
But now it gets complicated. The operator for Z ⊗ Z is the Controlled-Not $U = CX_{1,0}$! How does this happen? Let us look at the matrix Z ⊗ Z:
$$Z \otimes Z = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
It needs a few permutations to turn into the form we are looking for, which is Z ⊗ I .
If we apply the Controlled-Not from left and right:
$$CX_{1,0}^{\dagger}\,(Z \otimes Z)\,CX_{1,0} = (Z \otimes I).$$
The operator matrices for $CX_{1,0}$ perform the required permutation; we should not think of this as an actual controlled operation. With these insights, we can now define the remaining 4 × 4 Pauli measurement operators as shown in Table 6.2.
From here, we can generalize the construction to more than two qubits, similar to
Whitfield et al. (2011) for Hamiltonian simulation (which we don’t cover here). All
we have to do is to surround the multi-Z Hamiltonian with cascading Controlled-Not
Table 6.1 Pauli measurements containing an identity operator.

  Z ⊗ I    I ⊗ I
  X ⊗ I    H ⊗ I
  Y ⊗ I    H S† ⊗ I
  I ⊗ Z    (I ⊗ I) SWAP
  I ⊗ X    (H ⊗ I) SWAP
  I ⊗ Y    (H S† ⊗ I) SWAP
Table 6.2 The remaining 4 × 4 Pauli measurement operators.

  Z ⊗ Z    CX₁,₀
  X ⊗ Z    CX₁,₀ (H ⊗ I)
  Y ⊗ Z    CX₁,₀ (H S† ⊗ I)
  Z ⊗ X    CX₁,₀ (I ⊗ H)
  X ⊗ X    CX₁,₀ (H ⊗ H)
  Y ⊗ X    CX₁,₀ (H S† ⊗ H)
  Z ⊗ Y    CX₁,₀ (I ⊗ H S†)
  X ⊗ Y    CX₁,₀ (H ⊗ H S†)
  Y ⊗ Y    CX₁,₀ (H S† ⊗ H S†)
gates. For example, for the three-qubit Z Z Z, we write the code below (which could be simplified by recognizing that $CX_{1,0}^{\dagger} = CX_{1,0}$).
Note that the adjoint of the X-gate is identical to the X-gate and the adjoint of a
Controlled-Not is a Controlled-Not as well. For Z Z Z Z , or even longer sequences of
Z-gates, we build a cascading gate sequence in circuit notation, as shown in Figure
6.30. Now that we have this methodology available, we can measure in any Pauli
measurement basis! And just to make sure, you can verify the construction for Z Z Z Z
with a short code sequence like the following:
Figure 6.30 Measuring Z ⊗ Z ⊗ Z ⊗ Z is equivalent to measuring Z ⊗ I ⊗ I ⊗ I surrounded by a cascade of Controlled-Not gates.
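The book's verification code is not reproduced here; the following stand-alone numpy check of the same identity may serve instead (helper names are ours; matrix ordering puts qubit 0 leftmost):

```python
import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])

def kron(*ms):
    out = np.array([[1.0]])
    for m in ms:
        out = np.kron(out, m)
    return out

def cnot(n, c, t):
    # Controlled-Not on n qubits, control c, target t (qubit 0 = leftmost),
    # built from the |0><0| and |1><1| projectors on the control.
    p0, p1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
    a = [p0 if i == c else I2 for i in range(n)]
    b = [p1 if i == c else (X if i == t else I2) for i in range(n)]
    return kron(*a) + kron(*b)

# The cascade reduces measuring Z x Z x Z x Z to measuring Z on qubit 0 only.
U = cnot(4, 3, 2) @ cnot(4, 2, 1) @ cnot(4, 1, 0)
assert np.allclose(U.T @ kron(Z, Z, Z, Z) @ U, kron(Z, I2, I2, I2))
```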
This operator acts on two qubits and thus can be used for problems that can be
expressed as weighted graphs.
The second operator $U_B$ depends on a parameter β. It is problem-independent and applies rotations to each qubit with the following, where each $X_j$ is a Pauli X-gate:
$$U_B(\beta) = e^{-i\beta B} = \prod_j e^{-i\beta X_j}, \qquad \text{where}\quad B = \sum_j X_j.$$
For problems with higher depth p, these two operators $U_C$ and $U_B$ are applied repeatedly, each with their own set of hyperparameters $\gamma_i$ and $\beta_i$, on an initial state of $|+\rangle^{\otimes n}$:
$$|\gamma,\beta\rangle = U_B(\beta_p)\,U_C(\gamma_p)\cdots U_B(\beta_1)\,U_C(\gamma_1)\,|+\rangle^{\otimes n}.$$
The task at hand is then similar to VQE – find the best possible set of hyperpa-
rameters to minimize the expectation value for the cost function hγ,β|C|γ,βi, using
well-known optimization techniques, for example, from the area of machine learning.
The operators UC and U B can be approximated with the following circuits:
$U_C$: an $R_z(\gamma_i)$ rotation acting on qubit pairs $(q_j, q_k)$; $U_B$: an $R_x(\beta_i)$ rotation on each qubit $q_j$.
We know from Section 6.11 on VQE how to implement this type of search, so we
won’t expand on this further.
The original QAOA paper showed that for 3-regular graphs, which are cubic graphs
with each vertex having exactly three edges, the algorithm produces a cut that is at
least 0.7 of the maximum cut, a number that we are roughly able to confirm in our
experiments below. Together with VQE, QAOA is an attractive algorithm for today’s
NISQ machines with limited resources, as the corresponding circuits have a shallow
depth (Preskill, 2018). At the same time, QAOA’s utility for industrial-sized problems
is still under debate (Harrigan et al., 2021).
6.13 Maximum Cut Algorithm

In the previous section, we saw how the VQE finds the minimum eigenvalue and
eigenvector for a Hamiltonian. This is an exciting methodology because if we can
successfully frame an optimization problem as a Hamiltonian, we can use a VQE to
find an optimal solution. This section briefly describes how to construct a class of
such Hamiltonians: the Ising spin glass model, representing a multivariate optimiza-
tion problem. The treatment here is admittedly shallow but sufficient to implement
examples – the Max-Cut and Min-Cut algorithms in this section and the Subset Sum
problem in Section 6.14.
$$\sum_i^{N} h_i\, x_i + \sum_{i,j} J_{ij}\, x_i x_j,$$
6.13 Maximum Cut Algorithm 255
$$H(x_0, x_1, \ldots, x_n) = -\sum_i^{N} h_i\, \sigma_i^z - \sum_{i,j} J_{ij}\, \sigma_i^z \sigma_j^z.$$
The term $\sigma_i^z$ is the application of a Pauli Z-gate on qubit i. The minus sign means that we can look for a minimum eigenvalue to find a maximum solution. For problems such as Max-Cut, we use $\sigma^z$ because we want an operator with eigenvalues −1 and +1.
With this background, Lucas (2014) details several NP-complete or NP-hard prob-
lems for which this approach may work. The list of algorithms includes partition-
ing problems, graph coloring problems, covering and packing problems, Hamiltonian
cycles (including the traveling salesman problem), and tree problems. In the next few
sections, we develop a related problem, the graph Max-Cut problem. We will also
explore a slightly modified formulation of the Subset Sum problem.
6.13.2 Max-Cut/Min-Cut
For a graph, a cut is a partition of the graph’s vertices into two nonoverlapping sets
L and R. A maximum cut is the cut that maximizes the number of edges between L
and R. Assigning weights to the graph edges turns the problem into the more general
weighted maximum cut, which aims to find the cut that maximizes the weights of
edges between sets L and R. This is the Max-Cut problem we are trying to solve in
this section.
Weights can be both positive and negative. The Max-Cut problem turns into a
Min-Cut problem simply by changing the sign of each weight. As an example, the
maximum cut in Figure 6.31a, a graph of four nodes, is between the sets L = {0,2}
and R = {1,3}. The nodes are colored white or gray, depending on which set they
belong to.
For a graph with just 15 nodes, as shown in Figure 6.31b, the problem becomes unwieldy very quickly. General Max-Cut is NP-complete; we don't know of any polynomial-time algorithm to find an optimal solution. This looks like a formidable challenge for a quantum algorithm!
if num < 3:
  raise app.UsageError('Must request graph of at least 3 nodes.')
weight = 5.0
nodes = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0)]
for i in range(num - 3):
  l = random.sample(range(0, 3 + i - 1), 2)
  nodes.append((3 + i, l[0], weight * np.random.random()))
  nodes.append((3 + i, l[1], weight * np.random.random()))
return num, nodes
For debugging and intuition, it helps to visualize the graph. The output of the
routine below can be used to visualize the graphs with Graphviz (graphviz.org, 2021).
The graphs in Figure 6.31 were produced this way.
print('graph {')
print(' {\n node [ style=filled ]')
pattern = bin(max_cut)[2:].zfill(n)
for idx, val in enumerate(pattern):
  if val == '0':
    print(f' "{idx}" [fillcolor=lightgray]')
print(' }')
for node in nodes:
  print(' "{}" -- "{}" [label="{:.1f}",weight="{:.2f}"];'
        .format(node[0], node[1], node[2], node[2]))
print('}')
$$\underbrace{0}_{n_0}\;\underbrace{1}_{n_1}\;\underbrace{0}_{n_2}\;\underbrace{1}_{n_3}$$
We can compute the Max-Cut exhaustively, and quite inefficiently, given our choice of data structures. For n nodes, we generate all 2^n binary bitstrings of length n. For each bitstring, we iterate over the individual bits and build two index sets: indices with a 0 in the bitstring, and indices with a 1 in the bitstring. For example, the bitstring 11001 would create set L = {0,1,4} and set R = {2,3}.
The calculation then iterates over all edges in the graph. For each edge, if one of the vertices is in L and the other in R, the edge crosses between the sets, and we add its weight to the current cut value, keeping track of the maximum cut found so far. Finally, we return the corresponding bit pattern as a decimal. For example, if the maximum cut was binary 11001, the routine returns 25 (this routine will only work with up to 64 bits or vertices).
max_cut = -1000
for bits in helper.bitprod(n):
  # Collect in/out sets.
  iset = []
  oset = []
  for idx, val in enumerate(bits):
    iset.append(idx) if val == 0 else oset.append(idx)
  # Sum the weights of edges crossing between the two sets.
  cut = 0
  for node in nodes:
    if ((node[0] in iset and node[1] in oset) or
        (node[1] in iset and node[0] in oset)):
      cut += node[2]
  if cut > max_cut:
    max_cut, max_bits = cut, bits
state = bin(helper.bits2val(max_bits))[2:].zfill(n)
258 Complex Algorithms
The performance of this code is, of course, quite horrible but perhaps indicative
of the combinatorial character of the problem. On a standard workstation, computing
the Max-Cut for 20 nodes takes about 10 seconds; for 23 nodes it takes about 110
seconds. Even considering performance differences between Python and C++ and the
relatively poor choice of data structure, it is obvious that the runtime will quickly
become intractable for larger graphs.
Note that the solution is symmetric. If a Max-Cut is L = {0,1,4} and R = {2,3},
then L = {2,3} and R = {0,1,4} is a Max-Cut as well.
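This symmetry is simply bitwise complementation of the n-bit cut pattern. A quick standalone illustration (independent of the book's code):

```python
# A cut pattern and its bitwise complement describe the same cut:
# flipping all n bits swaps L and R but selects the same edge set.
n = 5
pattern = 0b11001                      # L = {0, 1, 4}, R = {2, 3}
complement = pattern ^ ((1 << n) - 1)  # flip all n bits
print(bin(complement))                 # -> 0b110, i.e., pattern 00110
```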
# Full matrix.
H = np.zeros((2**n, 2**n))
As described so far, for a graph with n nodes, we would have to build
operator matrices of size 2^n × 2^n, which does not scale well. Note, however, that both
the identity matrix and σ_z are diagonal matrices. Tensor products of diagonal matrices
result in diagonal matrices. For example:
$$I \otimes I \otimes Z =
\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & -1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & -1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & -1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -1
\end{pmatrix}
= \operatorname{diag}(1, -1, 1, -1, 1, -1, 1, -1).$$
If we apply a factor to any individual operator, that factor multiplies across the
whole diagonal. Let us look at what happens if we apply σz at index 0,1,2, . . . in the
tensor products (from right to left):
$$\begin{aligned}
I \otimes I \otimes Z &= \operatorname{diag}(+1, -1, +1, -1, +1, -1, \underbrace{+1}_{2^0}, \underbrace{-1}_{2^0}),\\
I \otimes Z \otimes I &= \operatorname{diag}(+1, +1, -1, -1, \underbrace{+1, +1}_{2^1}, \underbrace{-1, -1}_{2^1}),\\
Z \otimes I \otimes I &= \operatorname{diag}(\underbrace{+1, +1, +1, +1}_{2^2}, \underbrace{-1, -1, -1, -1}_{2^2}).
\end{aligned}$$
These are power-of-2 patterns, similar to those we have seen in the fast gate apply
routines. This means we can optimize the construction of the diagonal Hamiltonian
and only construct a diagonal tensor product! The full matrix code is very slow and
can barely handle 12 graph nodes. The diagonal version below can easily handle twice
as many nodes. C++ acceleration might help to further improve scalability, especially
because the calls to tensor_diag can be parallelized.
h = [0.0] * 2**n
for node in nodes:
  diag = tensor_diag(n, node[0], node[1], node[2])
  for idx, val in enumerate(diag):
    h[idx] += val
return h
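The routine tensor_diag itself is not reproduced here. A minimal sketch that matches the diagonal patterns above, assuming qubit 0 maps to the most significant bit of the state index and each edge (i, j, w) contributes the diagonal of w·Z_i Z_j (illustrative only, not the book's implementation):

```python
def edge_diag_sketch(n, i, j, w):
    """Diagonal of w * Z_i Z_j on n qubits, without building matrices.

    Z contributes +1 when its qubit is 0 and -1 when it is 1;
    identity factors contribute +1 everywhere. Qubit 0 is taken
    to be the most significant bit of the state index.
    """
    def bit(k, q):
        return (k >> (n - 1 - q)) & 1

    return [w * (-1) ** (bit(k, i) ^ bit(k, j)) for k in range(2 ** n)]

print(edge_diag_sketch(3, 0, 2, 1.0))
# -> [1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0]
```

Summing such lists over all edges yields the Hamiltonian's diagonal directly, in O(2^n) memory instead of O(4^n).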
The minimum value is −41.91 and appears in two places: at index 6, which is
binary 0110, and the complementary index 9, which is binary 1001. This corresponds
to state |0110⟩ and the complementary |1001⟩. This is precisely the Max-Cut pattern
in Figure 6.32. We have found the Max-Cut by applying VQE by peek-a-boo on a
properly prepared Hamiltonian!
Figure 6.32 Graph with 4 nodes; Max-Cut is {0,3},{1,2}, or 0110 in binary set encoding.
Here is the code to run experiments. It constructs the graph and computes the Max-
Cut exhaustively. Then it computes the Hamiltonian and obtains the minimum value
and its index off the diagonal.
n, nodes = build_graph(num_nodes)
max_cut = compute_max_cut(n, nodes)
#
# These two lines are the basic implementation, where
# a full matrix is being constructed. However, these
# are diagonal, and can be constructed much faster.
# H = graph_to_hamiltonian(n, nodes)
# diag = H.diagonal()
#
diag = graph_to_diagonal_h(n, nodes)
min_idx = np.argmin(diag)
# Results...
if flags.FLAGS.graph:
  graph_to_dot(n, nodes, max_cut)

if min_idx == max_cut:
  print('SUCCESS: {:+10.2f} |{}>'.format(np.real(diag[min_idx]),
                                         bin(min_idx)[2:].zfill(n)))
else:
  print('FAIL : {:+10.2f} |{}> '.format(np.real(diag[min_idx]),
                                        bin(min_idx)[2:].zfill(n)), end='')
  print('Max-Cut: {:+10.2f} |{}>'.format(np.real(diag[max_cut]),
                                         bin(max_cut)[2:].zfill(n)))
Running this code, we find that it does not always work; it fails in about 20–30%
of the invocations. Our criterion is very strict; to mark a run as successful, we check
whether the optimal cut was found. Anything else is considered a failure. However,
even if the optimal cut was not found, the results are still within 30% of the optimum,
and typically significantly below 20%. This matches the analysis from the QAOA paper.
For example, running over graphs with 12 nodes may produce output like the
following:
In Section 6.13, we saw how to use QAOA and VQE (by peek-a-boo) to solve an
optimization problem. In this section, we explore another problem of this type, the
so-called Subset Sum problem. Like Max-Cut, this problem is known to be NP-complete.
The problem can be stated the following way. Given a set S of integers, can S be
divided into two sets, L and R = S − L, such that the sum of the elements in L equals
the sum of the elements in R:
$$\sum_{i}^{|L|} l_i = \sum_{j}^{|R|} r_j.$$
We will also express this problem with a Hamiltonian constructed very similarly to
Max-Cut, except that we will only introduce a single weighted Z-gate for each number
in S. In Max-Cut we were looking for a minimal energy state. For this balanced sum
problem, we have to look for a zero-energy state because this would indicate energy
equilibrium, or equilibrium of the partial sums. Our implementation only decides
whether or not a solution exists. It does not find a specific solution.
6.14 Subset Sum Algorithm 263
6.14.1 Implementation
To start, and since this algorithm is begging to be experimented with, we define
relevant parameters as command line options (the implementation is in file src/
subset_sum.py). The highest integer in S is specified with parameter nmax. We
will encode integers as positions in a bitstring, or, correspondingly, a state. For inte-
gers up to nmax, we will need nmax qubits. The size |S| of set S is specified with
parameter nnum. And finally, the number of experiments to run is specified with
parameter iterations.
The next step is to get nnum random, unique integers in the range from 1 to nmax
(random.sample with range(1, nmax) draws from 1 to nmax − 1). Other ranges are
possible, including negative numbers, but given that we use integers as bit positions,
we have to map any such range to the range of 0 to nmax. Note that a set with an odd
total sum can never be split into two equal-sum halves, so we resample until the sum
is even:

while True:
  sample = random.sample(range(1, nmax), nnum)
  if sum(sample) % 2 == 0:
    return sample
The next step is to compute the diagonal tensor product. Note that we only have to
check for a single number and a correspondingly weighted (by index i) Z-gate.
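The corresponding routine is not shown here. A sketch under the same assumptions as for Max-Cut (number i occupies one bit position, qubit 0 is the most significant bit, and each number contributes the diagonal of i·Z at its position) might look like this; it is illustrative only, not the book's implementation:

```python
def number_diag_sketch(nmax, num):
    """Diagonal of num * Z applied at the qubit encoding num.

    Assumes number num (1 <= num <= nmax) maps to qubit num - 1,
    with qubit 0 as the most significant bit of the state index.
    """
    return [num * (-1) ** ((k >> (nmax - num)) & 1)
            for k in range(2 ** nmax)]

# Summing the diagonals for S = {1, 2, 3}: a zero entry flags an
# equal-sum partition ({3} vs {1, 2}, at indices 0b001 and 0b110).
diag = [sum(d) for d in zip(*(number_diag_sketch(3, m) for m in (1, 2, 3)))]
print(diag)  # -> [6, 0, 2, -4, 4, -2, 0, -6]
```

A zero on the diagonal corresponds to a zero-energy state, i.e., equilibrium of the two partial sums.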
Figure 6.33 A subset partition for a set of eight integers. The partial sums of all elements in the
white and gray sets are equal.
The final step for constructing the Hamiltonian is to add up all the diagonal tensor
products from the step above. This function is identical to the same function in the
Max-Cut algorithm, except for the invocation of the routine to compute the diagonal
tensor product itself. If we implemented more algorithms of this type, we would
clearly generalize the construction.
h = [0.0] * 2**nmax
for num in num_list:
  diag = tensor_diag(nmax, num)
  for idx, val in enumerate(diag):
    h[idx] += val
return h
6.14.2 Experiments
Now on to experiments. We created a list of random numbers in the step above.
The next step is to exhaustively compute potential partitions. Similar to Max-Cut,
we divide the set of numbers into two sets with the help of binary bit patterns. For
each division, we compute the two sums for these two sets. If the results match up, we
add the corresponding bit pattern to the list of results. The routine then returns this list,
which could be empty if no solution was found for a given set of numbers. A sample
set partition is shown in Figure 6.33.
solutions = []
for bits in helper.bitprod(len(num_list)):
  iset = []
  oset = []
  for idx, val in enumerate(bits):
    (iset.append(num_list[idx]) if val == 0 else
     oset.append(num_list[idx]))
  if sum(iset) == sum(oset):
    solutions.append(bits)
return solutions
Finally, we run the experiments. For each experiment, we create a set of numbers,
compute the solutions exhaustively, and compute the Hamiltonian.
nmax = flags.FLAGS.nmax
num_list = select_numbers(nmax, flags.FLAGS.nnum)
solutions = compute_partition(num_list)
diag = set_to_diagonal_h(num_list, nmax)
non_zero = np.count_nonzero(diag)
if non_zero != 2**nmax:
  print('Solution should exist...', end='')
  if solutions:
    print(' Found Solution:',
          dump_solution(solutions[0], num_list))
    return True
for i in range(flags.FLAGS.iterations):
  ret = run_experiment()
[...]
Solution should exist... Found Solution: 13+1+5+3 == 14+8
Solution should exist... Found Solution: 4+9+14 == 12+5+10
Solution should exist... Found Solution: 10+1+14 == 4+12+9
Solution should exist... Found Solution: 12+7+10 == 6+9+14
[...]
Solution should exist... Found Solution: 1+3+11+2 == 5+12
Solution should exist... Found Solution: 13+5+7 == 2+9+14
We now switch gears and discuss the Solovay–Kitaev (SK) theorem and corresponding
algorithm (Kitaev et al., 2002). It proves that any unitary gate can be approximated
from a finite set of universal gates, which, in the case of single-qubit gates, are just
the Hadamard and T-gates. This theorem is one of the key results in quantum computing.
A version of the theorem that seems appropriate in our context is the following:
Theorem 6.1 (Solovay–Kitaev theorem) Let G be a finite set of elements in SU(2)
containing its own inverses, such that ⟨G⟩ is dense in SU(2). Let ε > 0 be given. Then
there is a constant c such that for any U in SU(2) there is a sequence S of gates of a
length O(log^c(1/ε)) such that ||S − U|| < ε.
In English, this theorem says that for a given unitary gate U , there is a finite
sequence of universal gates that will approximate U up to any precision. To a degree,
this should not surprise us because if this was not the case, the set of universal gates
could hardly be called universal. The produced gate sequences can be quite long, and
it appears the field has moved on (Ross and Selinger, 2016, 2021; Kliuchnikov et al.,
2015). Nevertheless, the algorithm was seminal and is supremely elegant.
We will study it using the pedagogical review from Dawson and Nielsen (2006) as
a guide. We start with a few important concepts and functions. Then we outline the
high-level structure of the algorithm before diving deeper into the complex parts and
implementation.
by a sequence of just these two gates. We prove this below by showing that the SK
algorithm, based on just these two gates, can approximate any unitary matrix up to
arbitrary precision. We develop the implementation here in a piecemeal fashion, but
the full code is available in the open-source repository src/solovay_kitaev.py.
6.15.2 SU(2)
One of the requirements for the SK algorithm is that the universal gates involved
are part of the SU (2) group, which is the group of all 2 × 2 unitary matrices with
determinant 1. The determinants of both the Hadamard gate and the T-gate are not 1.
In order to convert the gates to become members of SU (2), we apply the simple
transformation:
$$U' = \sqrt{\frac{1}{\det U}}\; U.$$
def to_su2(U):
  """Convert a 2x2 unitary to a unitary with determinant 1.0."""
  # The determinant is complex in general; a complex dtype keeps
  # the square root well-defined.
  return np.sqrt(1 / np.linalg.det(U)) * U
We won't go deeper into SU(2) and the related mathematics of Lie groups. For our
purposes, we should simply think of SU(2) in terms of rotations. For a given rotation
V, the inverse rotation is V†, with VV† = I. For two rotations U and V, the inverse
of UV is V†U†, with UVV†U† = I. However, similar to how two perpendicular
sides of a Rubik's cube rotate against each other, UVU†V† ≠ I in general.
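A quick numerical check of these identities, using the rotation conventions of this section (a standalone NumPy illustration, not the book's ops library):

```python
import numpy as np

def rx(phi):
    # Rx(phi) = cos(phi/2) I + i sin(phi/2) X
    return np.array([[np.cos(phi / 2), 1j * np.sin(phi / 2)],
                     [1j * np.sin(phi / 2), np.cos(phi / 2)]])

def ry(phi):
    # Ry(phi) = cos(phi/2) I + i sin(phi/2) Y
    return np.array([[np.cos(phi / 2), np.sin(phi / 2)],
                     [-np.sin(phi / 2), np.cos(phi / 2)]])

U, V = rx(0.5), ry(0.5)
dag = lambda M: M.conj().T

print(np.allclose(U @ V @ dag(V) @ dag(U), np.eye(2)))  # -> True
print(np.allclose(U @ V @ dag(U) @ dag(V), np.eye(2)))  # -> False
```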
$$\begin{aligned}
&\begin{pmatrix} \cos(\theta/2) & 0 \\ 0 & \cos(\theta/2) \end{pmatrix}
+ \begin{pmatrix} 0 & n_1 i \sin(\theta/2) \\ n_1 i \sin(\theta/2) & 0 \end{pmatrix}\\
&\quad + \begin{pmatrix} 0 & n_2 \sin(\theta/2) \\ -n_2 \sin(\theta/2) & 0 \end{pmatrix}
+ \begin{pmatrix} n_3 i \sin(\theta/2) & 0 \\ 0 & -n_3 i \sin(\theta/2) \end{pmatrix}\\
&= \begin{pmatrix} \cos(\theta/2) + n_3 i \sin(\theta/2) & n_2 \sin(\theta/2) + n_1 i \sin(\theta/2) \\
-n_2 \sin(\theta/2) + n_1 i \sin(\theta/2) & \cos(\theta/2) - n_3 i \sin(\theta/2) \end{pmatrix}
= \begin{pmatrix} a & b \\ c & d \end{pmatrix}.
\end{aligned}$$
def u_to_bloch(U):
"""Compute angle and axis for a unitary."""
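The body of this routine is elided above. A sketch consistent with the matrix decomposition just derived (reading n1, n2, n3 off the entries a, b, c, d; not the book's implementation):

```python
import numpy as np

def u_to_bloch_sketch(U):
    """Rotation axis and angle for a 2x2 SU(2) matrix.

    From the decomposition above: a = cos(t/2) + n3*i*sin(t/2),
    b = n2*sin(t/2) + n1*i*sin(t/2), c = -n2*sin(t/2) + n1*i*sin(t/2).
    """
    U = np.asarray(U, dtype=complex)
    half = np.real(np.arccos((U[0, 0] + U[1, 1]) / 2))  # theta / 2
    s = np.sin(half)
    if s < 1e-10:
        return [0.0, 0.0, 1.0], 0.0  # no rotation: axis is arbitrary
    n1 = np.imag(U[0, 1] + U[1, 0]) / (2 * s)
    n2 = np.real(U[0, 1] - U[1, 0]) / (2 * s)
    n3 = np.imag(U[0, 0] - U[1, 1]) / (2 * s)
    return [n1, n2, n3], 2 * half
```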
There are other similarity metrics, for example, quantum fidelity. It is instructive to
experiment with these kinds of measures to study their impact on the achieved accuracy
of our implementation:

$$F(\rho, \varphi) = \left( \operatorname{tr} \sqrt{\sqrt{\rho}\, \varphi \sqrt{\rho}} \right)^2.$$
Figure 6.34 Distribution of 256 generated gate sequences applied to state |0i. There are many
duplicates.
To look up the closest gate, we iterate over the list of gates, compute the trace
distance to each one, and return the gate with the minimum distance. Again, this code
is kept simple for illustrative purposes. It is horribly slow, but there are ways to speed
it up significantly, for example, with KD-trees (Wikipedia, 2021a).
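A linear-scan sketch, using the Frobenius norm as a simple stand-in for the book's distance measure (names are illustrative):

```python
import numpy as np

def find_closest_u_sketch(gates, U):
    """Return the pre-computed gate closest to U (linear scan)."""
    dists = [np.linalg.norm(np.asarray(g) - np.asarray(U))
             for g in gates]
    return gates[int(np.argmin(dists))]

gates = [np.eye(2), np.array([[0., 1.], [1., 0.]])]  # I and X as a toy set
target = np.array([[0.9, 0.1], [0.1, 0.9]])
print(np.allclose(find_closest_u_sketch(gates, target), np.eye(2)))  # -> True
```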
Note that the way we generate gate sequences here leads to duplicate gates. For
example, when plotting the effects of the generated gates on state |0i, we see that the
resulting distinct gates are quite sparse on the Bloch sphere, as shown in Figure 6.34.
6.15.6 Algorithm
Now we are ready to discuss the algorithm. We describe it in code, explaining it line
by line. Inputs are the unitary operator U we seek to approximate and a maximum
recursion depth n.
The recursion counts down from an initial value of n and stops when it reaches
the termination case with n==0. At this point, the algorithm looks up the closest pre-
computed gate it can find.
if n == 0:
  return find_closest_u(gates, U)
Starting with this basic approximation, the following steps further improve the
approximation by applying sequences of other inaccurate gates. The magic of this
algorithm is, of course, that this actually works!
The first recursive step tries to find an approximation of U. For example, if n==1,
the recursion would reach the termination clause and return the closest pre-computed
gate.
The next key steps are now to define Δ = U U†_{n−1} and to improve the approximation
of Δ. We concatenate the two gate sequences for U and U†_{n−1} to obtain the improved
approximation. The interesting part here is that we use U†_{n−1}. The gate U_{n−1} got us
closer to the target. The recursion wants to find out what we did before in order to
arrive at this gate.
We decompose Δ as a group commutator, which is defined as Δ = V W V† W†
with unitary gates V, W. There are infinitely many such decompositions, but we
apply an accuracy criterion to get a balanced group commutator. The math motivating
this decomposition is beyond this book. We refer to Dawson and Nielsen (2006) and
Kitaev et al. (2002) for details. Here we accept the result and show how to implement
gc_decomp().
V, W = gc_decomp(U @ U_next.adjoint())
The next recursive steps are then to get improved approximations for V, W with the
same algorithm and to return a new and improved sequence U_n, as:

$$U_n = V_{n-1} W_{n-1} V_{n-1}^\dagger W_{n-1}^\dagger U_{n-1}.$$
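Putting the pieces together, the recursion has this overall shape (Python-style pseudocode mirroring the fragments in this section; sk_algo, gc_decomp, and find_closest_u follow the listings above):

```
def sk_algo(U, gates, n):
  if n == 0:
    return find_closest_u(gates, U)
  U_next = sk_algo(U, gates, n - 1)        # U_{n-1}
  V, W = gc_decomp(U @ U_next.adjoint())   # Delta = U U_{n-1}^dagger
  V_next = sk_algo(V, gates, n - 1)
  W_next = sk_algo(W, gates, n - 1)
  # U_n = V_{n-1} W_{n-1} V_{n-1}^dagger W_{n-1}^dagger U_{n-1}
  return (V_next @ W_next @
          V_next.adjoint() @ W_next.adjoint() @ U_next)
```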
In the next few paragraphs, we will first derive Equation (6.18) and then solve for
φ. Both V and W were defined as rotations about the x-axis and y-axis:

$$\begin{aligned}
V &= R_x(\phi), & W &= R_y(\phi),\\
V^\dagger &= R_x(\phi)^\dagger = R_x(-\phi), & W^\dagger &= R_y(\phi)^\dagger = R_y(-\phi),\\
U &= V W V^\dagger W^\dagger = R_x(\phi) R_y(\phi) R_x(-\phi) R_y(-\phi).
\end{aligned}$$
Similar to how we derived a unitary operator’s Bloch sphere angle and axis, we
express rotations as:
Rx (φ) = cos φ/2 I + i sin φ/2 X,
R y (φ) = cos φ/2 I + i sin φ/2 Y .
We can multiply this out and only evaluate the diagonal elements, as above, with
$\cos(\theta/2) = (a + d)/2$, to arrive at:

$$\cos(\theta/2) = \cos^4(\phi/2) + 2\cos^2(\phi/2)\sin^2(\phi/2) - \sin^4(\phi/2).$$

We can factor out $\cos^2(\phi/2) + \sin^2(\phi/2) = 1$:

$$\begin{aligned}
\cos(\theta/2) &= \cos^4(\phi/2) + 2\cos^2(\phi/2)\sin^2(\phi/2) - \sin^4(\phi/2)\\
&= \left(\cos^2(\phi/2) + \sin^2(\phi/2)\right)^2 - 2\sin^4(\phi/2)\\
&= 1 - 2\sin^4(\phi/2).
\end{aligned}$$

With $\sin^2(\theta/2) = 1 - \cos^2(\theta/2)$, we get:

$$\begin{aligned}
\sin^2(\theta/2) &= 1 - \left(1 - 2\sin^4(\phi/2)\right)^2\\
&= 4\sin^4(\phi/2) - 4\sin^8(\phi/2)\\
&= 4\sin^4(\phi/2)\left(1 - \sin^4(\phi/2)\right)\\
\Rightarrow \sin(\theta/2) &= 2\sin^2(\phi/2)\sqrt{1 - \sin^4(\phi/2)}. \qquad (6.18)
\end{aligned}$$
Now on to solving for φ. From what we've done so far, we know how to compute
θ for an operator. We get rid of the square root in Equation (6.18) by squaring the
whole equation. For ease of notation, we substitute x for the left side:

$$\begin{aligned}
x &= \left(\frac{\sin(\theta/2)}{2}\right)^2 = \left(\sin^2(\phi/2)\sqrt{1 - \sin^4(\phi/2)}\right)^2\\
x &= \sin^4(\phi/2)\left(1 - \sin^4(\phi/2)\right)\\
&= \sin^4(\phi/2) - \sin^8(\phi/2)\\
\Rightarrow 0 &= \sin^4(\phi/2) - \sin^8(\phi/2) - x\\
&\Leftrightarrow\; 0 = \sin^8(\phi/2) - \sin^4(\phi/2) + x.
\end{aligned}$$
This is a quadratic equation in $y = \sin^4(\phi/2)$, which we can solve:

$$\begin{aligned}
y^2 - y + x &= 0\\
\Rightarrow \sin^4(\phi/2) = y &= \frac{1 \pm \sqrt{1 - 4x}}{2},\\
\sin(\phi/2) &= \sqrt[4]{y},\\
\phi &= 2 \arcsin \sqrt[4]{y}. \qquad (6.19)
\end{aligned}$$
Expand y (and remember that $\cos^2\alpha + \sin^2\alpha = 1$):

$$\begin{aligned}
y &= \frac{1 \pm \sqrt{1 - 4x}}{2}\\
&= \frac{1 \pm \sqrt{1 - 4\sin^2(\theta/2)/4}}{2}\\
&= \frac{1 \pm \cos(\theta/2)}{2}.
\end{aligned}$$
Substituting this into Equation (6.19) leads to the final result for φ. We ignore the
+ case from the quadratic equation, as the goal was to arrive at Equation (6.18):

$$\phi = 2 \arcsin \sqrt[4]{\frac{1 - \cos(\theta/2)}{2}}.$$
The construction proceeds as follows. We assumed that U is a rotation by
angle θ about some axis x̂. The angle φ is the solution to Equation (6.18). We
define V ,W to be rotations by φ, so U must be conjugate to the rotation by θ,
U = V̂ Ŵ V̂ † Ŵ † .
Let’s write this in code. First we define the function gc_decomp, adding a helper
function to diagonalize a unitary matrix. We compute θ and φ as described above:
def gc_decomp(U):
  """Group commutator decomposition."""

  def diagonalize(U):
    _, V = np.linalg.eig(U)
    return ops.Operator(V)
We compute the axis on the Bloch sphere as shown above and construct the rotation
operators V and W :
V = ops.RotationX(phi)
if axis[2] > 0:
  W = ops.RotationY(2 * np.pi - phi)
else:
  W = ops.RotationY(phi)

Ud = diagonalize(U)
VWVdWd = diagonalize(V @ W @ V.adjoint() @ W.adjoint())
S = Ud @ VWVdWd.adjoint()

V_hat = S @ V @ S.adjoint()
W_hat = S @ W @ S.adjoint()
return V_hat, W_hat
6.15.8 Evaluation
For a brief and anecdotal evaluation, we define key parameters and run a handful
of experiments. The number of experiments to run is given by num_experiments.
6.15 Solovay–Kitaev Theorem and Algorithm 275
Variable depth is the maximum length of the bitstrings we use to pre-compute gates.
For a depth value x, 2^x − 1 gates are pre-computed. Variable recursion is the
recursion depth for the SK algorithm. It is instructive to experiment with these values
to explore the levels of accuracy and performance you can achieve.
def main(argv):
  if len(argv) > 1:
    raise app.UsageError('Too many command-line arguments.')

  num_experiments = 10
  depth = 8
  recursion = 4
  print('SK algorithm - depth: {}, recursion: {}, experiments: {}'.
        format(depth, recursion, num_experiments))
Next we compute the SU (2) base gates and create the pre-computed gates.
for i in range(num_experiments):
  U = (ops.RotationX(2.0 * np.pi * random.random()) @
       ops.RotationY(2.0 * np.pi * random.random()) @
       ops.RotationZ(2.0 * np.pi * random.random()))

  phi1 = U(state.zero)
  phi2 = U_approx(state.zero)
  print('[{:2d}]: Trace Dist: {:.4f} State: {:6.4f}%'.
        format(i, dist,
               100.0 * (1.0 - np.real(np.dot(phi1, phi2.conj())))))

print('Gates: {}, Mean Trace Dist: {:.4f}'.
      format(len(gates), sum_dist / num_experiments))
This should result in output like the following. With just 255 pre-computed gates
(including many duplicates) and a recursion depth of 4, approximation accuracy falls
consistently below 1%.
It is educational to experiment with this approach. You can find approximated gates
with small trace distances, but it appears the impact on basis states is much larger than
for the SK algorithm. With longer gate sequences and many more tries, gate sequences
can reach low trace distance deltas. However, to reach accuracies as shown above
for the SK algorithm, the runtime can be orders of magnitude longer. To answer the
question above about how well the SK algorithm performs – it does very well! Here is
an example output from a sequence of randomized experiments:
This concludes the section on complex quantum algorithms. For a deeper math-
ematical treatment of these algorithms and their derivatives, see the Bibliography
and relevant publications. For further reading on known algorithms, Mosca (2008)
provides a detailed taxonomy and categorization of algorithms. The Quantum Algo-
rithm Zoo lists another large number of algorithms alongside an excellent bibliography
(Jordan, 2021). Abhijith et al. (2020) offer high-level descriptions of about 50 algorithm
implementations in Qiskit.
7 Quantum Error Correction
In this section, we discuss techniques for quantum error correction, which is an abso-
lute necessity for the success of quantum computing, given the high likelihood of
noise, errors, and decoherence in larger circuits. We have ignored this topic so far and
assumed an ideal, error-free execution environment. For real machines, this assump-
tion will not hold. Quantum error correction is a fascinating and wide-ranging topic.
This section is mostly an introduction, with focus on just a few core principles.
Building a real, physical quantum computer big enough to perform useful computation
presents an enormous challenge. On one hand, the quantum system must be isolated
from the environment as much as possible to avoid entanglement with the environment
and other perturbations, which may introduce errors. For example, molecules may
bump into qubits and change their relative phase, even at temperatures close to
absolute zero. On the other hand, the quantum system cannot be entirely isolated because
we want to program the machine, perhaps dynamically, and make measurements.
Here is a summary of available technologies, as presented in Nielsen and Chuang
(2011). Table 7.1 shows the underlying technology, the time τ_Q the system may stay
coherent before it starts entangling with the environment, the time τ_op it takes to apply
a typical unitary gate, and the number n_op of operations that can be executed while
still in a coherent state.
For several technologies, the number of coherently executable instructions is rather
small and won’t suffice to execute very large algorithms with potentially billions of
gates.
Errors are inevitable, given the quantum scale and very high likelihood of the
environment perturbing the system. To compare the expected quantum and classical
error rates – for a modern CPU, a typical error rate is about one per year, or one
error for every 10^17 operations. The actual error rate might be higher, but mitigation
strategies are in place. In contrast, data from 2020 from IBM shows an average single-
qubit gate error rate of about one per 10^−3 seconds, and one per 10^−2 seconds for
two-qubit gates. Based on frequency, this could reach up to one error for every 200
operations. This is a difference of almost 10 orders of magnitude!
7.1 Quantum Noise 279
Table 7.1 Estimates for decoherence times (secs), gate application latency (secs),
and number of gates that can be applied while coherent. Data from Nielsen and
Chuang (2011).

System | τ_Q | τ_op | n_op
What are possible error conditions, and how do we model the likelihood of their
occurrence?
Bit-Flip Error
The bit-flip error causes the probability amplitudes of a qubit to flip, similar to the
effect of an X-gate:

α|0⟩ + β|1⟩ → β|0⟩ + α|1⟩.
This is also called a dissipation-induced bit-flip error. Dissipation is the process of
losing energy to the environment. If we think of a qubit in state |1i as an electron’s
excited state, as it loses energy, it may fall to the lower energy |0i state and emit a
photon. Correspondingly, it may jump from |0i to |1i by absorbing a photon (in which
case it should probably be called an excitation-induced error).
Phase-Flip Error
The phase-flip error causes the relative phase to flip from +1 to −1, similar to the
effect of a Z-gate:

α|0⟩ + β|1⟩ → α|0⟩ − β|1⟩.
This is also called a decoherence-induced phase shift error. In the example, we
shifted the phase by π, but for decoherence we should also consider much smaller
phase changes and their insidious tendency to compound over time.
Bit- and Phase-Flip Error
A combined bit- and phase-flip is equivalent to applying the Y-gate, as we've seen
before, ignoring the global phase:

$$Y\left(\alpha|0\rangle + \beta|1\rangle\right)
= \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}
  \begin{pmatrix} \alpha \\ \beta \end{pmatrix}
= -i\beta|0\rangle + i\alpha|1\rangle
= -i\left(\beta|0\rangle - \alpha|1\rangle\right).$$
No Error
We should mention this one for completeness; it’s the equivalent effect of applying an
identity gate to a qubit, or, equally, doing nothing.
These errors will occur with a certain probability. To model this properly, we will
introduce the concept of quantum operations next, which allow the formalizing of the
statistical distribution of error conditions in an elegant way.
ρ′ = E(ρ),

where E is called a quantum operation. The two types of operations discussed
in this book are unitary transformations and measurements (note the matrix multiplication
from both sides):
In a closed quantum system, which has no interaction with the environment, the
system evolves as:

$$\rho \;\xrightarrow{\;U\;}\; U \rho U^\dagger. \qquad (7.1)$$
In an open system, we model the system as the tensor product of state and environment,
ρ ⊗ ρ_env. The system evolves as described in Equation (7.1) as:

$$U \left(\rho \otimes \rho_{\mathrm{env}}\right) U^\dagger.$$
[Diagram: ρ and ρ_env enter the unitary U; the output on the principal system is E(ρ).]
To describe the system without the environment, we trace out the environment using
the methodology from Section 2.14:

$$\mathcal{E}(\rho) = \operatorname{tr}_{\mathrm{env}}\left[ U \left(\rho \otimes \rho_{\mathrm{env}}\right) U^\dagger \right]. \qquad (7.2)$$

Writing out the trace over an orthonormal basis $|e_k\rangle$ of the environment (which
starts in $|e_0\rangle$) gives the operator-sum representation

$$\mathcal{E}(\rho) = \sum_k E_k \rho E_k^\dagger,$$

where $E_k = \langle e_k|U|e_0\rangle$. The $E_k$ are the operation elements for the quantum operation
E. They are also called Kraus operators¹ and operate on the principal system only.
Now let’s see how we can use this formalism to describe the various error modes.
1 This notation is sloppy, as U applies to both environment and state. Because this detail is not essential in
our context, we tolerate it.
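As a concrete illustration of the operator-sum form, here is a standalone NumPy sketch of the standard bit-flip channel with flip probability p (a textbook example, not code from the book's library):

```python
import numpy as np

p = 0.1
I = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])

# Bit-flip channel: E(rho) = E0 rho E0^dag + E1 rho E1^dag,
# with operation elements E0 = sqrt(1-p) I and E1 = sqrt(p) X.
E0 = np.sqrt(1 - p) * I
E1 = np.sqrt(p) * X

rho = np.array([[1., 0.], [0., 0.]])  # density matrix of |0><0|
rho_out = E0 @ rho @ E0.conj().T + E1 @ rho @ E1.conj().T
print(np.diag(rho_out).real)  # -> [0.9 0.1]
```

With probability 0.1, the qubit that started in |0⟩ is now found in |1⟩.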
282 Quantum Error Correction
[Figure 7.1: a simple two-qubit example circuit: H on the first qubit, S followed by H on the second, both starting in |0⟩.]
To model noise, we introduce error gates E with a given probability, injecting bit-
flip and phase-flip errors. An example circuit before and after error injection is shown
in Figures 7.1 and 7.2, respectively. It is very educational to inject these error gates
and evaluate their impact on various algorithms. We will do just that in Section 7.2.
[Figure 7.2: the same circuit with error gates E injected around each gate.]
Phase damping describes the process of a system losing relative phase between
qubits, thus introducing errors in algorithms that rely on successful quantum interfer-
ence. The operator elements are:
$$E_0 = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1-\gamma} \end{pmatrix}
\quad\text{and}\quad
E_1 = \begin{pmatrix} 0 & 0 \\ 0 & \sqrt{\gamma} \end{pmatrix}.$$
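These operation elements satisfy the completeness relation Σ_k E_k†E_k = I, which we can verify numerically (a standalone check, not code from the book):

```python
import numpy as np

gamma = 0.25
E0 = np.array([[1., 0.], [0., np.sqrt(1 - gamma)]])
E1 = np.array([[0., 0.], [0., np.sqrt(gamma)]])

# Completeness: E0^dag E0 + E1^dag E1 must equal the identity.
completeness = E0.conj().T @ E0 + E1.conj().T @ E1
print(np.allclose(completeness, np.eye(2)))  # -> True
```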
Figure 7.3 Phase estimation errors exceeding a threshold of 2% from increasing levels of noise,
for N = 50 experiments per setting. The x-axis represents noise ranging from 0% to 200%, the
y-axis represents the number of experiments exceeding the threshold.
def Rk(k):
  return Operator(
      np.array([(1.0, 0.0),
                (0.0, cmath.exp((1 + (random.random() * flags.FLAGS.noise)) *
                                (2.0 * cmath.pi * 1j / 2**k)))]))
Then, for values of the noise factor n_f ranging from 0.0 to 2.0, we run 50 experiments
and count the number of experiments that result in a phase estimation error larger than
2%. Hence, we test the robustness of phase estimation against small to large errors in
the inverse QFT rotation gates. Figure 7.3 shows the distribution.
We see that the inverse QFT is surprisingly robust against sizeable maximum errors
in the rotation gates, but this is just an anecdote. The exact outcome would depend
on the statistical distribution of the actual errors. We should also expect that each
algorithm has different tolerances and sensitivities. For comparison, introducing depo-
larization with just 0.1% probability leads to significantly different outcomes in the
order finding algorithm, which is very sensitive to this particular type of error.
We will need some form of error correction to control the impact of noise.
In classical computing, a large body of known error correction techniques exists.
Error-correcting code memory, or ECC (Wikipedia, 2021b), may be one of the best-known
ones. There are many more techniques that prevent invalid data, missing data, or
spurious data. NASA, in particular, has developed impressive techniques to communicate
with their ever-more-distant exploratory vehicles.
A simple classical error correction technique is based on repetition codes and
majority voting. For example, we could triple each binary digit:
7.2 Quantum Error Correction 285
0 → 000
1 → 111

Table 7.2 Majority voting scheme.

000 → 0    111 → 1
001 → 0    110 → 1
010 → 0    101 → 1
100 → 0    011 → 1
As we receive data over a noisy channel, we measure it and perform majority voting
with the scheme shown in Table 7.2. This simple scheme does not account for missing
or erroneous bits, but it is good enough to explain the basic principles.
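The classical scheme can be captured in a few lines (a standalone sketch, not code from the book):

```python
from collections import Counter

def encode(bit):
    # Repetition code: triple each bit (0 -> 000, 1 -> 111).
    return [bit] * 3

def decode(triplet):
    # Majority voting over the possibly corrupted triplet.
    return Counter(triplet).most_common(1)[0][0]

noisy = encode(0)
noisy[1] ^= 1            # flip the middle bit on the "channel"
print(decode(noisy))     # -> 0: a single bit-flip is corrected
```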
In quantum computing, the situation is generally more difficult:
qbit = state.qubit(random.random())
psi = qbit * state.zeros(2)
psi = ops.Cnot(0, 2)(psi)
psi = ops.Cnot(0, 1)(psi)
psi.dump()
>>
|000> (|0>): ampl: +0.78+0.00j prob: 0.61 Phase: 0.0
|111> (|7>): ampl: +0.62+0.00j prob: 0.39 Phase: 0.0
Note again that these states do not violate the no-cloning theorem, as we are not
constructing (α|0⟩ + β|1⟩)^{⊗3}.
2 https://en.wikipedia.org/wiki/Amdahl%27s_law.
[Figure: the bit-flip code. |ψ⟩ is encoded across three qubits using two ancilla |0⟩ qubits, a bit-flip error E_bit may occur, and the error-correction circuit restores |ψ⟩.]
def test_x_error(self):
  qc = circuit.qc('x-flip / correction')
  qc.qubit(0.6)
  # Encoding.
  qc.cx(0, 1)
  qc.cx(0, 2)
  # Error insertion.
  qc.x(0)
  # Fix.
  qc.cx(0, 1)
  qc.cx(0, 2)
  qc.ccx(1, 2, 0)
  qc.psi.dump('after correction')
Note the small difference in the final states with nonzero probabilities. In the case
without an injected error, the state returns to |000⟩ and |100⟩, the original input state.
In the case with an injected error, the ancilla qubits are |11⟩ for both resulting states.
[Figure: the phase-flip code. Each qubit is conjugated by Hadamard gates, so a phase-flip error E_phase between them becomes a correctable bit-flip.]
[...]
qc.h(0)
qc.h(1)
qc.h(2)
qc.z(0)
qc.h(0)
qc.h(1)
qc.h(2)
[...]
The probability distribution of the resulting nonzero probability states is the same,
but we get a few states with phases. For example, without error injection:
All of this leads up to the final nine-qubit Shor code (Shor, 1995), which is a
combination of the above. It combines the circuits to find bit-flip, phase-flip, and combined
errors into one large circuit, as shown in Figure 7.7.
The Shor nine-qubit circuit is able to identify and correct one bit-flip error, one
phase-flip error, or one of each on any of the nine qubits! Let’s verify this in code and
apply all Pauli gates to each of the qubits of this circuit. For this experiment, we con-
struct a qubit with α = 0.60 (the code can be found in file lib/circuit_test.py):
def test_shor_9_qubit_correction(self):
  for i in range(9):
    qc = circuit.qc('shor-9')
    print(f'Init qubit as 0.6|0> + 0.8|1>, error on qubit {i}')
    qc.qubit(0.6)
    qc.reg(8, 0)
    # Left side.
    qc.cx(0, 3)
[Figure 7.7: the nine-qubit Shor code circuit: |ψ⟩ plus eight ancilla |0⟩ qubits, with an error E injected in the middle of the circuit.]
    qc.cx(0, 6)
    qc.h(0); qc.h(3); qc.h(6)
    qc.cx(0, 1); qc.cx(0, 2)
    qc.cx(3, 4); qc.cx(3, 5)
    qc.cx(6, 7); qc.cx(6, 8)
    # Fix.
    qc.cx(0, 1); qc.cx(0, 2); qc.ccx(1, 2, 0)
    qc.h(0)
    qc.cx(3, 4); qc.cx(3, 5); qc.ccx(4, 5, 3)
    qc.h(3)
    qc.cx(6, 7); qc.cx(6, 8); qc.ccx(7, 8, 6)
    qc.h(6)
    prob0, s = qc.measure_bit(0, 0)
    prob1, s = qc.measure_bit(0, 1)
    print('  Measured: {:.2f}|0> + {:.2f}|1>'.format(
        math.sqrt(prob0), math.sqrt(prob1)))
There are several other techniques and formalisms for quantum information and
quantum error correction. We only want to mention a small number of influential
works. A good introduction and overview can be found in Devitt et al. (2013). Andrew
Steane published the seven-qubit Steane code in Steane (1996). A five-qubit error
correction code was discussed by Cory et al. (1998).
8 Quantum Languages, Compilers, and Tools
At this point, we understand the principles of quantum computing, the important foun-
dational algorithms, and the basics of quantum error correction. We have developed a
compact and reasonably fast infrastructure for exploration and experimentation with
the presented material.
The infrastructure is working but still far away from enabling high programmer
productivity. Composing algorithms is labor-intensive and error-prone. Circuits with
maybe 10^6 gates are supported, but some algorithms may require trillions of gates
with orders of magnitude more qubits.
In classical computing, programs are constructed at much higher levels of
abstraction, which allows the targeting of several general-purpose architectures in a
portable way. On a high-performance CPU, programs execute billions of instructions
per second on a single core. Building quantum programs at that scale with a flat
programming model like QASM, stitching together individual gates, does not scale.
It is the equivalent of programming today's machines in assembly language, without
looping constructs.
There are parallels to the 1950s, when assembly language was the tool of the trade
for programming early computers. Just as FORTRAN emerged as one of the first compiled
programming languages and enabled major productivity gains, similar attempts are
underway today to raise the level of abstraction in quantum computing.
In this chapter we discuss a representative cross-section of quantum programming
languages, and briefly touch on tooling, such as simulators or entanglement analysis.
There is also a discussion on quantum compiler optimizations, a fascinating topic with
unique challenges. We write this chapter with the understanding that comparisons
between toolchains are necessarily incomplete but nonetheless educational.
Section 8.5 on transpilation finishes the chapter; this is a powerful technique with
many uses. It allows seamless porting of our circuits to other frameworks. This
enables direct comparisons and the use of specific features of these platforms, such as
advanced error models or distributed simulation. Transpilation can be used to produce
circuit diagrams or LaTeX source code. The underlying compiler technology further
enables implementation of several of the features found in various programming
languages, such as uncomputation, entanglement analysis, and conditional blocks.
8.1 Challenges for Quantum Compilation
Quantum computing poses unique challenges for compiler design. In this section, we
provide a brief overview of some of the main challenges. The next sections discuss
additional details and proposed solutions.
Quantum computing needs a programming model – what will run, how, when,
and where? Unlike classical coprocessors, such as graphics processing units (GPUs),
quantum computers will not offer general-purpose functionality similar to a CPU.
Instead, a classical machine will entirely control the quantum computer. A model
called QRAM was proposed for this early in the history of quantum computing. We
will discuss this model in Section 8.2.
A key question is how realistic this idealized model is, or can be. Quantum cir-
cuits operate at micro-Kelvin temperatures. It will be a challenge for standard CPU
manufacturing processes to operate at this temperature, even though progress has been
made (Patra et al., 2020). The CPU could alternatively operate away from the quantum
circuit, but the bandwidth between classical and quantum circuits may be severely
limited. Current work is presented in Xue et al. (2021).
Constructing quantum circuits gate by gate is tedious and prone to error. There
are additional challenges, such as the no-cloning theorem and the need for automatic
error correction. Programming languages offer a higher level of abstraction and will
be essential for programmer productivity. But what is the right level of abstraction?
We sample several existing approaches to quantum programming languages in Section
8.3. Compiler construction and intermediate representation (IR) design are challenges
by themselves; it is apparent that a flat, QASM-like, linked-list IR will not scale to
programs with trillions of gates.
The required precision of gates is an important design parameter. We will have
to approximate certain unitaries by sequences of existing, physical gates, but this
introduces inaccuracies and noise. Some algorithms are robust against noise, others
not at all. The toolchain plays an essential role in this area as well.
Aspects of dynamic code generation may become necessary, for example, to
approximate specific rotations dynamically or to reduce noise (Wilson et al., 2020).
There are challenges in fast gate approximation, compile time, accuracy, and optimal-
ity of the approximating gate sequences. To give a taste of these problems, we already
detailed the Solovay–Kitaev algorithm in Section 6.15.
Compiler optimization has a novel set of transformations to consider in an
exponentially growing search space. We are currently in the era of physical machines
with 50–100 physical qubits, the Noisy Intermediate-Scale Quantum computers (Preskill,
2018). Future systems will have more qubits, and those qubits will likely have
characteristics different from today's. Compiler optimizations and code generation techniques
will have to evolve accordingly as well. We discuss several optimization techniques in
Section 8.4.
8.2 The QRAM Model
As our standard model of computation, we assume the quantum random access model
(QRAM) as proposed by Knill (1996). The model proposes connecting a general-
purpose machine with a quantum computer in order to use it as an accelerator. Reg-
isters are explicitly quantum or classical. There are functions to initialize quantum
registers, to program gate sequences into the quantum device, and to get results back
via measurements.
    Server  --(init, operations)-->  Quantum Machine
    Server  <--(measurement)-------  Quantum Machine
On the surface, this model is not much different from today’s programming mod-
els for PCIe connected accelerators,1 such as GPUs or storage devices, which are
ubiquitous today. The elegant CUDA programming model for GPUs provides clear
abstractions for code that is supposed to run on either device or server (Buck et al.,
2004; Nickolls et al., 2008). Source code for the accelerator and host can be mixed in
the same source file to enhance programmer productivity.
QRAM is an idealization. Communication between the classical and quantum parts
of a program may be severely limited. There may either be a significant lack of com-
pute power close to the quantum circuit, which operates at micro-Kelvin temperature,
or bandwidth-limited communication to a CPU further away.
It is important to keep the separation between classical and quantum in mind. In
QRAM, as in our simulation infrastructure, the separation of classical and quantum is
muddled: classical loops run over applications of quantum gates, interspersed with
print statements. This might be a good approach for learning, but it is not realistic
for a real machine. A more realistic approach is akin to an infrastructure such as the
machine learning platform TensorFlow, which first builds up a computation in the form
of a graph before executing that graph in a distributed fashion on CPUs, GPUs, or TPUs.
Another aspect of the QRAM model is the expectation of available universal gates
on the target quantum machine. Several universal sets of gates have been described
in the literature (see Nielsen and Chuang, 2011, section 4.5.3). We showed how any
unitary gate can be approximated by universal gates in Section 6.15. With this, we
assume that any gate may be used freely in our idealized infrastructure. On real
machines, however, the number of gates is limited, and there are accuracy and noise
concerns.
8.3 Quantum Programming Languages

In this section, we sample a number of quantum programming languages. The selection
is necessarily incomplete. Most importantly, it does not judge the quality of
the nonselected languages. Each attempt makes novel contributions over prior art,
variations of which can be found in other related works.
In a hierarchy of abstractions, quantum programming languages can be placed as
follows:
• The high abstraction level of programming languages. This level may provide
automatic ancilla management, support correct program construction with
advanced typing rules, offer libraries for standard operations (such as QFT), and
perhaps offer meta-programming techniques.
• Programming at the level of gates. This is the level of this text. It is the
construction and manipulation of individual qubits and gates.
• Direct machine control with pulses and wave forms to operate a physical device.
We will not discuss related infrastructure, such as OpenPulse (Gokhale et al.,
2020).
For each of these platforms, there is plenty of material available online to experiment
with. This section is meant to be educational – it should also inspire. For example, we
could easily add several of the proposed features to our infrastructure. Also, despite
all progress, the development of quantum programming languages and their compilers
appears to still be in its infancy.
8.3.1 QASM
The quantum assembly language (QASM) was an early attempt to textually specify
quantum circuits (Svore et al., 2006). We’ve already seen QASM code in Section 6.3
on quantum arithmetic, and we will see more of it in Section 8.5 on transpilation.
The structure of a QASM program is very simple. Qubits and registers are declared,
and gate applications follow one by one. There are no looping constructs, function
calls, or other constructs that would help to structure and densify the code. For exam-
ple, a simple entangler circuit would read like this:
qubit x,y;
gate h;
gate cx;
h x;
cx x,y;
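The effect of this entangler is easy to verify numerically. The following standalone NumPy sketch (independent of any QASM toolchain) applies H to the first qubit and then CX, producing the Bell state (|00⟩ + |11⟩)/√2:

```python
import numpy as np

# H on qubit 0, then CX(0, 1), applied to |00>.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I = np.eye(2)
CX = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1],
               [0, 0, 1, 0]])
state = CX @ np.kron(H, I) @ np.array([1, 0, 0, 0])
print(state)  # amplitudes 1/sqrt(2) on |00> and |11>
```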
8.3.2 QCL
The quantum computing language (QCL) was an early attempt to use classical
programming constructs to express quantum computation (Ömer, 2000, 2005; QCL
Online). Algorithms are run on a classical machine controlling a quantum computer
and might have to run multiple times until a solution is found. Quantum and classical
code are intermixed. Qubits are defined as registers of a given length, and gates are
applied directly to registers:
qureg q[l];
qureg f[1];
H(q);
Not(f);
const n=#q;          // length of q register
for i = 1 to n {     // classical loop
    Phase(pi/2^(i));
}
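The classical loop above applies Phase(π/2^i) for i = 1..n, so the accumulated phase on |1⟩ is π(1/2 + 1/4 + ... + 1/2^n) = π(1 − 2^−n). A short NumPy sketch of ours (not QCL) confirms this:

```python
import numpy as np

n = 3
U = np.eye(2, dtype=complex)
for i in range(1, n + 1):
    # Phase(pi / 2^i): rotate the |1> amplitude.
    U = np.diag([1, np.exp(1j * np.pi / 2**i)]) @ U

# Total phase on |1> is pi * (1 - 2^-n).
assert np.isclose(U[1, 1], np.exp(1j * np.pi * (1 - 2.0**-n)))
```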
QCL defines several quantum register types. There is the unrestricted qureg;
quconst defines an invariant qubit; and quvoid specifies a register to be empty,
guaranteed to be in state |0⟩. The register type quscratch denotes ancillae.
Code is organized into quantum functions. Functions and operators are reversible.
Prefixing a function with an exclamation point produces the inverse, as in this example
from the Grover algorithm:3
operator diffuse(qureg q) {
    H(q);          // Hadamard transform
    Not(q);        // Invert q
    CPhase(pi,q);  // Rotate if q=1111...
    !Not(q);       // Undo inversion
    !H(q);         // Undo Hadamard transform
}
3 Note that both the Hadamard and the NOT gates are their own inverses. This might not be the most
convincing example.
The implementation of fanout is quite elegant: Assume a function F(x, y,s) with
x being the input, y being the output, and s being junk qubits. Allocate the ancilla t
and transform F into the following, adding t to its signature:
What makes this elegant is the fact that fanout is written in QCL itself:
QCL supports conditionals in interesting ways. Standard controlled gates are sup-
ported as described in Section 2.7. If a function signature is marked with the keyword
cond and has as parameter a quconst condition qubit, QCL automatically transforms
the operators in the functions to controlled operators:
if e {
    inc(x);
} else {
    !inc(x);
}

  =>

cinc(x, e);
Not(e);
!cinc(x, e);
Not(e);
8.3.3 Scaffold
Scaffold takes a different approach (Javadi-Abhari et al., 2014). It extends the open-
source LLVM compiler and its Clang-based front end for C/C++. Scaffold introduces
data types qbit and cbit to distinguish quantum from classical data. Quantum gates,
such as the X-gate or the Hadamard gate, are implemented as built-ins; the compiler
recognizes them as such and can reason about them in transformation passes.
Scaffold supports hierarchical code structure via modules, which are specially
marked functions. Quantum circuits do not support calls and returns, so modules
representing subcircuits need to be instantiated, similar to, say, how Verilog modules
are instantiated in a hardware design. Modules must be reversible, either by design or
via automatic compiler transformations, such as full unrolling of classical loops.
Scaffold offers convenient functionality to convert classical circuits to quantum
gates, via the Classical-To-Quantum-Circuit (CTQC) tool. This tool is of great utility
for quantum algorithms that perform classical computation in the quantum domain.
CTQC emits QASM assembly. To enable whole-program optimization, Scaffold has a
QASM to LLVM IR transpiler, which can be used to import QASM modules, enabling
further cross-module optimization.
Modules are parameterized. This means the compiler has to manage module instan-
tiation, for example, with IR duplication. This can lead to sizeable code bloat and
correspondingly long compile times. The example given is the following code snippet,
where the module Oracle would have to be instantiated N = 3000 times. Clearly, a
parameterized IR would alleviate this problem considerably.
module main () {
    qbit a[1], b[1];
    int i, j;
    for (i=1; i<=N; i++) {
        for (j=0; j<=3; j++) {
            Oracle(a, b, j);
        }
    }
}
As a result, Javadi-Abhari et al. (2014) report compile times ranging from 24 hours
to several days for a larger triangle-finding problem of size n = 15 (see also Magniez
et al., 2005).
Hierarchical QASM
Scaffold intends to scale to very large circuits. The existing QASM model, as shown
above, is flat, which is not suitable for large circuits. One of the main contributions
of Scaffold is the introduction of hierarchical QASM. Additionally, the compiler
employs heuristics for what code sequences to flatten or keep in a hierarchical struc-
ture. For example, the compiler distinguishes between forall loops to apply a gate to all
qubits in a register and repeat loops, such as those required for the Grover iterations.
Entanglement Analysis
Scaffold includes tooling for entanglement analysis. In the development of Shor’s
algorithm, we observed a certain ancilla qubit that was still entangled after modular
addition. How does one reason about this?
Scaffold tracks entanglement-generating gates, such as Controlled-Not gates, on a
stack. As inverse gates are executed in reverse order, items are popped off the stack. If,
for a given qubit, no more entangling gates are found on the stack, the qubit is marked
as unentangled. As a result of the analysis, the generated output can be decorated to
show the estimated entanglement:
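The stack-based idea can be illustrated with a toy tracker in Python (an illustrative sketch of the idea only, not Scaffold's implementation; all names are ours). A self-inverse entangling gate such as CX cancels when it reappears in reverse order, and a qubit with no remaining entangling gates on the stack is reported as unentangled:

```python
class EntanglementTracker:
    """Toy stack of entangling gates, as sketched in the text."""

    def __init__(self):
        self.stack = []  # entries: (gate_name, qubit_tuple)

    def apply(self, gate, qubits):
        entry = (gate, tuple(qubits))
        # A self-inverse entangling gate cancels its predecessor.
        if self.stack and self.stack[-1] == entry:
            self.stack.pop()
        else:
            self.stack.append(entry)

    def maybe_entangled(self, qubit):
        return any(qubit in qubits for _, qubits in self.stack)

t = EntanglementTracker()
t.apply('cx', (0, 1))
print(t.maybe_entangled(1))  # True
t.apply('cx', (0, 1))        # inverse applied in reverse order
print(t.maybe_entangled(1))  # False
```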
8.3.4 Q language
We can contrast this work with a pure C++-embedded approach, as presented in
Bettelli et al. (2003). This approach consists of a library of C++ classes modeling
quantum registers, operators, operator application, and other functions, such as
reordering of quantum registers. The class library builds up an internal data structure
to represent the computation, similar in nature to the infrastructure we developed here.
It is interesting to ponder the question of which approach makes more sense:
• Extension of the C/C++ compiler with specific quantum types and operators,
as in Scaffold.
• A C++ class library as in the Q language.
Both approaches appear equally powerful in principle. The compiler-based approach
has the advantage of benefiting from a large set of established compiler passes, such
as inlining, loop transformations, redundancy elimination, and many other scalar,
loop, and inter-procedural optimizations. The C++ class library has the advantage that
the management of the IR, all optimizations, and the final code-generation schemes are
maintained outside of the compiler. Since compilers can prove impenetrable for
noncompiler experts, this approach might have a maintenance advantage, but at the
cost of potentially having to re-implement many optimization passes.
8.3.5 Quipper
Haskell is a popular choice for programming language theorists and enthusiasts
because of its powerful type system. An example of a Haskell-embedded implemen-
tation of a quantum programming system can be found with the Quantum IO Monad
(Altenkirch and Green, 2013). Another even more rigid example is van Tonder’s
proposal for a λ-calculus to express quantum computation (van Tonder, 2004).
What these approaches have in common is the attempt to guarantee correctness
by construction with support of the type system. This is also one of Quipper’s core
design ideas (Green et al., 2013; Quipper Online, 2021). Quipper is an embedded DSL
in Haskell. At the time of Quipper’s publication, Haskell lacked linear types, which
could have guaranteed that objects were only referenced once, as well as dependent
types, which are types combined with a value. Dependent types, for example, allow
you to distinguish a QFT operator over n qubits from one over m qubits.
Quipper is designed to scale and handle large programs with up to 10^12 operators.
Quipper has a notion of ancilla scope, with an ability to reason about ancilla live
ranges. Allocating ancilla qubits turns into a register allocation problem. Ancilla live
ranges have to be marked explicitly by the programmer.
At the language level, qubits are held in variables and gates are applied to these
variables. For example, to generate a Bell state:
Automatic Oracles
Quipper offers tooling for the automatic construction of oracles. Typically, oracles are
constructed with the following four manual steps. There are open-source implementa-
tions available for these techniques (Soeken et al., 2019).
1. Build a classical oracle, for example, a permutation matrix.
2. Translate the classical oracle into classical circuits.
3. Compile classical circuits to quantum circuits, potentially using additional
ancillae. We saw examples of this in Section 3.3.1.
4. Finally, make the oracle reversible, typically with an XOR construction to another
ancilla.
Quipper utilizes Template Haskell to automate steps two and three. The approach
has high utility and has been used to synthesize millions of gates in a set of bench-
marks. In direct comparison to QCL on the Binary Welded Tree algorithm, it appears
that QCL generates significantly more gates and qubits than Quipper. On the other
hand, Quipper appears to generate more ancillae.
Despite the tooling, type checks, automation of oracles, and utilization of the
Haskell environment, it still took 55 man months to implement the 11 algorithms in
a given benchmark set (IARPA, 2010). This is certainly a productivity improvement
over manually constructing all the benchmarks at the gate level, but it still compares
unfavorably against programmer productivity on classical infrastructure.
8.3.6 Silq
Based on a fork from the PSI probabilistic programming language (PSI Online, 2021),
Silq is another step in the evolution of quantum programming languages, supporting
safe and automatic uncomputation (Bichsel et al., 2020).
It explicitly distinguishes between the classical and quantum domains with syntacti-
cal constructs. Giving the responsibility for safe uncomputation to the compiler leads
to two major benefits. First, the code becomes more compact. Direct comparisons
against Quipper and Q# appear to show significant code size savings for Silq in the
range of 30% to over 40%. Second, the compiler may choose an optimal strategy
for uncomputation, minimizing the required number of ancillae. As an added benefit,
the compiler may choose to skip uncomputation for simulation altogether and just
renormalize states and unentangle ancillae.
Many of the Haskell-embedded DSLs bemoan either the absence of linear types or
difficulties handling constants. Silq resolves this by using linear types for nonconstant
values and a standard type system for constants. This leads to safe semantics, even
across function calls, and the no-cloning theorem falls out naturally. Function type
annotations are used to aid the type checker:
• The annotation qfree indicates that a function can be classically computed. For
example, the quantum X-gate is considered qfree, while the
superposition-inducing Hadamard gate is not.
• Function parameters marked as const are preserved and not consumed by a
function. They continue to be accessible after a function call. Parameters not
marked as const are no longer available after the function call. Functions with
only const parameters are called lifted.
• Functions marked as mfree promise not to perform measurements and are
reversible.
Silq supports other quantum language features, such as function calls, measure-
ment, explicit reversing of an operator via reverse, and an if-then-else con-
struct that can be classical or quantum, similar to other quantum languages. Looping
constructs must be classical. As an improvement over prior approaches, Silq supports
Oracle construction with quantum gates.
With the annotations and the corresponding operational semantics, Silq can safely
deduce which operations are safe to reverse and uncompute, even across function
calls. The paper provides many examples of potentially hazardous corner cases that
are being handled correctly (Bichsel et al., 2020).
As a program example, the code snippet below solves one of the challenges
in Microsoft's Q# Summer 2018 coding contest:4 Given a classical binary string
b ∈ {0,1}^n with b[0] = 1, return the state (1/√2)(|b⟩ + |0⟩), where |0⟩ is
represented using n qubits.
The code itself demonstrates several of Silq's features, for example, the use of ! to
denote classical values and types.
def solve[n:!ℕ](bits:!𝔹^n){
    // prepare superposition between 0 and 1
    x := H(0:𝔹);
    // prepare superposition between bits and 0
    qs := if x then bits else (0:int[n]) as 𝔹^n;
    // uncompute x
    forget(x=qs[0]); // valid because bits[0]==1
    return qs;
}

def main(){
    // example usage for bits=1, n=2
    x := 1:!int[2];
    y := x as !𝔹^2;
    return solve(y);
}
4 http://codeforces.com/blog/entry/60209.
8.4 Compiler Optimization
The space is large and complex, and we won’t be able to cover it exhaustively.
Instead, we again provide representative examples of key principles and techniques,
hoping to give a taste of the challenges.
Z_i X_i X_i Y_i = Z_i Y_i,

where the adjacent pair X_i X_i is redundant and cancels.
The circuit expands the multi-control gates into this much longer sequence of gates
(don’t worry, you are not expected to be able to decipher this):
[Circuit: the multi-controlled gate expanded into a long sequence of X-, CX-, CV-,
and CV†-gates.]
Zooming in at the right, we can see the opportunity to eliminate redundant X-gates:
[Circuit detail: adjacent pairs of X-gates on the same qubit cancel and can be
removed.]
In general, for a single-qubit operator U, if the compiler can prove that the input
state is an eigenstate of U with an eigenvalue of 1 (which means U|ψ⟩ = |ψ⟩), it can
simply remove the gate. For example, if the qubit is in the |+⟩ state, the X-gate has no
effect, as X|+⟩ = |+⟩.
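The eigenstate condition is easy to check numerically. This standalone NumPy snippet confirms that X|+⟩ = |+⟩, so an X-gate on a qubit known to be in |+⟩ can be dropped:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])
plus = np.array([1, 1]) / np.sqrt(2)  # |+> = (|0> + |1>) / sqrt(2)

# |+> is an eigenstate of X with eigenvalue 1.
assert np.allclose(X @ plus, plus)
```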
Depending on the numerical conditioning of an algorithm, the compiler may also
decide to remove gates that have only small effects. As an example, we have seen the
effectiveness of this technique in the approximate QFT (Coppersmith, 2002).
The compiler can also exploit the fact that qubits may be unentangled. For example,
assume qubits |ψ⟩ and |φ⟩ are known to be unentangled and must be swapped,
potentially by a Swap gate spanning multiple qubits. Since the qubits are unentangled
and in pure states, we may be able to classically find a unitary operator U such that
U|ψ⟩ = |φ⟩ and U†|φ⟩ = |ψ⟩.
Which equivalences to apply depends on the gate set a quantum computer can support,
topological constraints, and also on the relative cost of specific gates. For example,
T-gates might be an order of magnitude slower than other gates and may have to be
avoided.
In order to find the best equivalences, pattern matching can be used. To maximize
the number of possible matches, you may have to reorder and reschedule gates. Valid
and efficient recipes for reordering are hence a rich area of research. As a simple exam-
ple, single-qubit gates applied to different qubits can be reordered and parallelized, as:
(U ⊗ I)(I ⊗ V) = (I ⊗ V)(U ⊗ I) = U ⊗ V
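The identity holds for any pair of single-qubit gates; here is a quick standalone NumPy check with H and T:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])
I = np.eye(2)

# (U x I)(I x V) = (I x V)(U x I) = U x V
assert np.allclose(np.kron(H, I) @ np.kron(I, T), np.kron(H, T))
assert np.allclose(np.kron(I, T) @ np.kron(H, I), np.kron(H, T))
```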
There are many other opportunities to reorder. For example, if a gate is followed by
a controlled gate of the same type, the two gates can be re-ordered:
Y_i CY_ji = CY_ji Y_i.
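This, too, can be verified directly. With the control on one qubit and both Y-gates acting on the target, the matrices commute (a standalone NumPy check):

```python
import numpy as np

Y = np.array([[0, -1j], [1j, 0]])
I = np.eye(2)
P0, P1 = np.diag([1, 0]), np.diag([0, 1])

# Controlled-Y: control on the first qubit, Y on the second (target).
CY = np.kron(P0, I) + np.kron(P1, Y)
Yi = np.kron(I, Y)  # plain Y on the target qubit

assert np.allclose(Yi @ CY, CY @ Yi)
```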
Rotations are a popular target for reordering. For example, the S-gate, T-gate, and
Phase-gate all represent rotations, which can be applied in any order. Nam et al. (2018)
provide many recipes, rewrite rules, and examples, such as the following:
[Circuits: example rewrite rules from Nam et al. (2018), such as moving Rz-rotations
across circuit structures and canceling Hadamard-gate pairs around controlled
operations.]
We can also squeeze the Swap gate if one of the inputs is known to be |0⟩:
[Circuit: a Swap gate applied to |0⟩ and |ψ⟩ simply moves |ψ⟩ to the other qubit:
|0⟩|ψ⟩ → |ψ⟩|0⟩.]
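A standalone NumPy check of the squeeze: swapping |0⟩ with an arbitrary pure state |ψ⟩ just moves |ψ⟩ to the other wire, so no full Swap decomposition is needed:

```python
import numpy as np

SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]])
zero = np.array([1, 0])
psi = np.array([0.6, 0.8])  # arbitrary normalized single-qubit state

# SWAP (|0> x |psi>) = |psi> x |0>
assert np.allclose(SWAP @ np.kron(zero, psi), np.kron(psi, zero))
```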
Figure 8.2 Decomposition of a Swap gate spanning three qubits into next-neighbor controlled
gates.
Some abstract gates will be easier to approximate than others on a given physical
instruction set, such as the IBM machine above. Each target and algorithm will hence
require targeted methodology and compilation techniques.
8.5 Transpilation
8.5.2 IR Nodes
A node will hold all information available for the operations, such as target qubit or
intended rotation angle. The nodes themselves are represented by a simple Python
class with all the relevant parameters passed to its constructor. One class is enough
to represent all possible node types. Again, we keep it very simple in this prototype
implementation.
class Node:
    """Single node in the IR."""

    def is_single(self):
        return self._opcode == Op.SINGLE

    def is_ctl(self):
        return self._opcode == Op.CTL

    def is_gate(self):
        return self.is_single() or self.is_ctl()
Then, based on the specific node type, the transpilers will query the properties to
get the node attributes:
    @property
    def opcode(self):
        return self._opcode

    @property
    def name(self):
        if not self._name:
            return '*unk*'
        return self._name

    @property
    def desc(self):
        return self._name

    [...]
class Ir:
    """Compiler IR."""

    def __init__(self):
        self._ngates = 0   # gates in this IR
        self.gates = []    # [] of gates
        self.regs = []     # [] of tuples (global reg index, name, reg index)
        self.nregs = 0     # number of registers
        self.regset = []   # [] of tuples (name, size, reg) for registers

    @property
    def ngates(self):
        return self._ngates
There are only two functions in the qc class that apply gates. We have to add IR-
construction calls to those two functions. This is one of the benefits of having this
abstraction, as alluded to in Section 4.3:
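A hedged sketch of what such hooks might look like (the names apply1, applyc, and the recorder methods are our illustration here, not necessarily the repository's exact code): every gate application also appends a node to the IR.

```python
class IrRecorder:
    """Minimal IR stand-in that records gate applications."""

    def __init__(self):
        self.gates = []

    def single(self, name, idx, val=None):
        self.gates.append(('single', name, idx, val))

    def controlled(self, name, ctl, idx, val=None):
        self.gates.append(('ctl', name, ctl, idx, val))


class QC:
    """Circuit stand-in: the two gate-application functions also build IR."""

    def __init__(self):
        self.ir = IrRecorder()

    def apply1(self, name, idx, val=None):
        # ... apply the single-qubit gate to the state (elided) ...
        self.ir.single(name, idx, val)

    def applyc(self, name, ctl, idx, val=None):
        # ... apply the controlled gate to the state (elided) ...
        self.ir.controlled(name, ctl, idx, val)


qc = QC()
qc.apply1('h', 0)
qc.applyc('cx', 0, 1)
print(len(qc.ir.gates))  # 2
```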
Registers are also supported. As a matter of fact, at the time of writing, only quan-
tum registers are supported for code generation. In other words, to generate valid
output, the qubits have to be generated and initialized as registers.
For example, to produce a QASM output file, the following flag will generate this
format and write it to /tmp/test.qasm:
To enable this, we add the following functions to the quantum circuit class. The
function dump_to_file checks for any of the flags. If one is present, it passes the
flag and a corresponding code generator function to dump_with_dumper, which will
call this function on the IR and produce the output:
def dump_to_file(self):
    self.dump_with_dumper(flags.FLAGS.libq, dumpers.libq)
    self.dump_with_dumper(flags.FLAGS.qasm, dumpers.qasm)
    self.dump_with_dumper(flags.FLAGS.cirq, dumpers.cirq)
There are, of course, better ways to structure this, especially when many more code
generators and options become available. For this text, the simple implementation will
do. The various code generators below also use a small number of helper functions,
which can be found in the open-source repository as well.
Fractions of Pi
As we produce textual output, it greatly improves readability to print fractions of
π as a fraction such as 3π/2 instead of 4.71238898038. For example, the complex
algorithms use the quantum Fourier transform with lots of rotations. Showing them as
fractions of π makes debug prints and generated code easier to read.
def pi_fractions(val, pi='pi') -> str:
    """Convert a value in radians to a readable fraction of pi."""
    if val is None:
        return ''
    if val == 0:
        return '0'
    for pi_multiplier in range(1, 4):
        for denom in range(-128, 128):
            if denom and math.isclose(val, pi_multiplier * math.pi / denom):
                pi_str = ''
                if pi_multiplier != 1:
                    pi_str = f'{abs(pi_multiplier)}*'
                if denom == -1:
                    return f'-{pi_str}{pi}'
                if denom < 0:
                    return f'-{pi_str}{pi}/{-denom}'
                if denom == 1:
                    return f'{pi_str}{pi}'
                return f'{pi_str}{pi}/{denom}'

    # Couldn't find a fraction; just return the original value.
    return f'{val}'
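To see the matching in action, here is a compact standalone copy of the same logic (repeated so the snippet runs on its own; the optional pi-symbol parameter mirrors how the dumpers below call the helper with 'M_PI'):

```python
import math

def pi_fractions(val, pi='pi'):
    """Return val as a readable fraction of pi, e.g. 'pi/2' or '3*pi/4'."""
    if val is None:
        return ''
    if val == 0:
        return '0'
    for mult in range(1, 4):
        for denom in range(-128, 128):
            if denom and math.isclose(val, mult * math.pi / denom):
                pre = '' if mult == 1 else f'{mult}*'
                if denom < 0:
                    return f'-{pre}{pi}' if denom == -1 else f'-{pre}{pi}/{-denom}'
                return f'{pre}{pi}' if denom == 1 else f'{pre}{pi}/{denom}'
    return f'{val}'

print(pi_fractions(math.pi / 2))          # pi/2
print(pi_fractions(3 * math.pi / 4))      # 3*pi/4
print(pi_fractions(-math.pi / 64))        # -pi/64
print(pi_fractions(math.pi / 2, 'M_PI'))  # M_PI/2
```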
8.5.7 QASM
The first dumper we present is the simplest one: QASM. It just traverses the list of
nodes and outputs the nodes with their names as found. Conveniently, the names
chosen for the operators already match the QASM specification. Not a coincidence.
    for op in ir.gates:
        if op.is_gate():
            res += op.name
            if op.val is not None:
                res += '({})'.format(helper.pi_fractions(op.val))
            if op.is_single():
                res += f' {reg2str(ir, op.idx0)};\n'
            if op.is_ctl():
                res += f' {reg2str(ir, op.ctl)},{reg2str(ir, op.idx1)};\n'
    return res
OPENQASM 2.0;
qreg q2[4];
qreg q1[8];
qreg q0[6];
creg c0[8];
h q1[0];
h q1[1];
h q1[2];
[...]
cu1(-pi/64) q1[7],q1[1];
cu1(-pi/128) q1[7],q1[0];
h q1[7];
measure q1[0] -> c0[0];
measure q1[1] -> c0[1];
[...]
8.5.8 LIBQ
The generation of sparse libq C++ code is similarly trivial. It produces C++, which
requires a bit more scaffolding in the beginning and end, but the core function is
similar: iterate over all nodes and convert IR nodes to C++. Compiling C++ requires
proper include paths and initialization. Those are stubbed out below and must be set
to the specifics of a given build system:
    total_regs = 0
    for regs in ir.regset:
        total_regs += regs[1]
    res += f' libq::qureg* q = libq::new_qureg(0, {total_regs});\n\n'

    total_regs = 0
    for regs in ir.regset:
        for r in regs[2].val:
            if r == 1:
                res += f' libq::x({total_regs}, q);\n'
            total_regs += 1
    res += '\n'

    for op in ir.gates:
        if op.is_gate():
            res += f' libq::{op.name}('
            if op.is_single():
                res += f'{op.idx0}'
                if op.val is not None:
                    res += ', {}'.format(helper.pi_fractions(op.val, 'M_PI'))
                res += ', q);\n'
            if op.is_ctl():
                res += f'{op.ctl}, {op.idx1}'
                if op.val is not None:
                    res += ', {}'.format(helper.pi_fractions(op.val, 'M_PI'))
                res += ', q);\n'
[...]
libq::x(1, q);
libq::x(13, q);
libq::cu1(11, 12, M_PI/2, q);
[...]
libq::cu1(11, 12, -M_PI/2, q);
libq::h(12, q);
libq::flush(q);
libq::print_qureg(q);
libq::delete_qureg(q);
return EXIT_SUCCESS;
}
8.5.9 Cirq
The last example is the converter to Google’s Cirq. This one is interesting: Because
Cirq doesn’t support certain gates, we have to construct workarounds as we traverse
the IR. Also, the operators need to be renamed (see op_map below):
    for op in ir.gates:
        if op.is_gate():
            if op.name == 'u1':
                res += 'm = np.array([(1.0, 0.0), (0.0, '
                res += f'cmath.exp(1j * {helper.pi_fractions(op.val)}))])\n'
                res += f'qc.append(cirq.MatrixGate(m).on(r[{op.idx0}]))\n'
                continue
            op_name = op_map[op.name]
            res += f'qc.append(cirq.{op_name}('
            if op.is_single():
                res += f'r[{op.idx0}]'
                if op.val is not None:
                    res += ', {}'.format(helper.pi_fractions(op.val))
                res += '))\n'
            if op.is_ctl():
                res += f'r[{op.ctl}], r[{op.idx1}]'
                if op.val is not None:
                    res += ', {}'.format(helper.pi_fractions(op.val))
                res += '))\n'
Writing a transpiler does not appear overly complicated.5 At the time of writing,
other transpilers are gestating in open source, such as one for LaTeX. They were used
for a few of the circuits in this book, both to evaluate performance and to generate
circuit diagrams.
This appendix details the implementation of libq, including some optimization suc-
cesses and failures. The full source code can be found online in directory src/libq
in the open-source repository. It is about 500 lines of C++ code. Correspondingly, this
section is very code heavy.
The register file, which holds the qubits, is defined in libq.h in the type qureg_t.
Again, we use similar names in libq as found in libquantum to enable line-by-line
comparisons. This structure will hold an array with complex amplitudes and an array
with the state bitmasks.
struct qureg_t {
    cmplx* amplitude;
    state_t* state;
};
Interesting tidbit: in an earlier version of this library that was included in the SPEC 2006 benchmarks, those two arrays were written as an array of structures. This is not good for performance: an iteration over the array to, say, flip a bit in all states has to traverse more memory than necessary. This author implemented a quite involved data layout transformation in the HP compilers for Itanium to transform the array of structs into a struct of arrays (Hundt et al., 2006). A later version of the library then modified the source code itself, erasing the need for and benefit of the complex compiler transformation.
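The difference is easy to see in a language-neutral sketch. This Python toy (the names are ours, not libq's) stores the same register both ways; the bit-flip sweep over the array of structs drags the amplitudes through the cache, while the struct-of-arrays version touches only the compact states list:

```python
# Hypothetical illustration, not the libq source.
# Array-of-structs: each element carries amplitude and bitmask together.
aos = [{'amplitude': 0.5, 'state': s} for s in (0b00, 0b01, 0b10, 0b11)]

# Struct-of-arrays: two parallel arrays, as in later libq/libquantum versions.
amplitudes = [0.5, 0.5, 0.5, 0.5]
states = [0b00, 0b01, 0b10, 0b11]

def flip_bit_aos(reg, target):
    # Touches every struct, amplitudes included, although only bitmasks change.
    for entry in reg:
        entry['state'] ^= 1 << target

def flip_bit_soa(states, target):
    # Sweeps only the smaller, contiguous states array.
    return [s ^ (1 << target) for s in states]

flip_bit_aos(aos, 0)
new_states = flip_bit_soa(states, 0)
```

In C++ the effect is amplified by cache-line granularity: the struct-of-arrays sweep moves roughly half the bytes per state.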
The member width, which probably deserves a better name, holds the number of
qubits in this register. The member size holds the number of nonzero probabilities,
and hash is the actual hash table, with hashw being the allocation size of the hash
table.
The operations to check whether a bit is set and to XOR a specific bit with a value
are common, so we extract them into member functions:
For example, this allocates a register of two qubits, initialized to state 0:

libq::qureg* q = libq::new_qureg(0, 2);
Print a textual representation of the current state by listing all states with non-zero
probability.
Display statistics, such as how many qubits were stored, how often the hash table
was recomputed, and, another important metric, the maximum number of non-zero
probability states reached during the execution of an algorithm:
For certain experiments, we cache internal state. This next function ensures that all
remaining states will be flushed. This could mean a computation is completed or some
pending prints are flushed to stdout.
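As a mental model for this bookkeeping, here is a small Python analog (the class SparseReg and its method names are ours, not libq's): amplitudes keyed by basis-state bitmask, with helpers mirroring the bit-test and bit-XOR member functions and the textual dump described above:

```python
class SparseReg:
    """Toy sparse register: maps basis-state bitmasks to amplitudes."""

    def __init__(self, width, initval=0):
        self.width = width              # number of qubits
        self.amps = {initval: 1.0 + 0j} # only nonzero states are stored
        self.max_states = 1             # high-water mark, as in the stats

    def bit_is_set(self, state, target):
        return (state >> target) & 1 == 1

    def bit_xor(self, state, target):
        return state ^ (1 << target)

    def dump(self):
        # Textual listing of all states with nonzero probability.
        return [f'{a} |{s:0{self.width}b}>'
                for s, a in sorted(self.amps.items())]

reg = SparseReg(2)
```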
We first consider gates that do not create or destroy superposition; they represent the "easy" case in this sparse representation. Let us look at a few representative gates; the full implementation is in libq/gates.cc.
To apply the X-gate to a specific qubit, the bit corresponding to the qubit index
must be flipped. Recall that the gate’s function is determined by:
\[
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \end{pmatrix}
=
\begin{pmatrix} \beta \\ \alpha \end{pmatrix}.
\]
If there are n states with nonzero probabilities in the system, we will have n tuples.
To flip one qubit’s probability according to the gate, we have to flip the bit for that
qubit in each of those tuples, as that represents the operation of this gate on all the
states. The probability amplitudes for that qubit are flipped by just flipping the bit;
there is no other data movement. The code is remarkably simple:
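The C++ loops over the state array and XORs the target bit into each bitmask. The same idea over our dict-based toy representation (a sketch, not the libq source):

```python
def apply_x(amps, target):
    """X gate on a sparse state: flip the target bit in every bitmask.
    Amplitudes travel with their renamed states; no arithmetic needed."""
    return {state ^ (1 << target): amp for state, amp in amps.items()}

# |10> with amplitude 1: X on qubit 0 yields |11>.
amps = apply_x({0b10: 1.0}, 0)
```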
For another class of operators, we must check whether a bit is set before applying
a transformation. For example, applying the Z-gate to a state acts like this:
\[
\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \end{pmatrix}
=
\begin{pmatrix} \alpha \\ -\beta \end{pmatrix}.
\]
The gate only has an effect if β is nonzero. In the sparse representation, this means there must be a tuple representing a nonzero probability that has a 1 at the intended qubit location. We iterate over all state tuples, check for the condition, and only negate the amplitude if that bit was set:
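In Python terms, the check-then-negate pattern looks like this (a sketch over the dict representation, not the libq code):

```python
def apply_z(amps, target):
    """Z gate: negate the amplitude of every state whose target bit is 1."""
    return {s: (-a if (s >> target) & 1 else a) for s, a in amps.items()}

# Superposition (|0> + |1>) / sqrt(2), amplitudes rounded for readability.
amps = apply_z({0b0: 0.7071, 0b1: 0.7071}, 0)
```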
Recall that if the qubit is in superposition, there will be two tuples: one with the
corresponding bit set to 0 and the amplitude set to α, and the other with the bit set to 1
and the amplitude set to β. For the Z-gate, we only need to change the second tuple.
A related example is the T-gate; there are a few more gates of this nature:
The Y-gate is moderately more complex, using a combination of the methods shown
above.
\[
\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}
\begin{pmatrix} \alpha \\ \beta \end{pmatrix}
=
\begin{pmatrix} -i\beta \\ i\alpha \end{pmatrix}.
\]
First, we flip the bit, similar to the X-gate, and then we apply i or −i, depending
on whether or not the qubit’s bit is set after being flipped:
Controlled gates are a logical extension of the above. To make a gate controlled, we only have to check whether the corresponding control bit is set to 1. For example, for the Controlled-X gate:
This even works for double-controlled gates, where we only have to check for both
control bits to be set. Here is the implementation of a double-controlled X-gate:
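Both variants are one-line extensions of the X-gate sketch above (our Python illustration of the pattern, not the libq code):

```python
def apply_cx(amps, control, target):
    """Controlled-X: flip the target bit only where the control bit is set."""
    return {(s ^ (1 << target)) if (s >> control) & 1 else s: a
            for s, a in amps.items()}

def apply_ccx(amps, ctl0, ctl1, target):
    """Double-controlled X: both control bits must be set."""
    return {(s ^ (1 << target))
            if ((s >> ctl0) & 1 and (s >> ctl1) & 1) else s: a
            for s, a in amps.items()}

a = apply_cx({0b10: 1.0}, 1, 0)       # control set: |10> -> |11>
b = apply_ccx({0b110: 1.0}, 2, 1, 0)  # both controls set: |110> -> |111>
```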
The difficult case is for gates that create or destroy superposition. We provide a single
implementation in apply.cc for this in function libq_gate1. The function expects
the 2 × 2 gate to be passed in.
For example, for the Hadamard gate:
The implementation itself applies the same technique we saw earlier in Section 4.5
on accelerated gates: a linear traversal over the states, except it is adapted to the sparse
representation. Additionally, it manages memory by filtering out close-to-zero states.
Let’s dive into it. The implementation is about 175 lines of code.
A.5 Hash Table
First, as indicated above, states are maintained in a hash table with the following hash
function:
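As a stand-in for the elided function, a typical multiplicative hash of this flavor looks as follows (the exact constant and bit-folding are our assumption, not necessarily libq's):

```python
def state_hash(state, hashw):
    """Hash a basis-state bitmask into a table of size 2**hashw.
    Folds 64 bits to 32, multiplies by a golden-ratio constant, and
    keeps the top bits as the table index (hashw <= 32 assumed)."""
    k = (state & 0xFFFFFFFF) ^ (state >> 32)  # fold 64 bits into 32
    k = (k * 0x9E3779B1) & 0xFFFFFFFF         # golden-ratio multiply
    return k >> (32 - hashw)                  # top bits index the table
```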
The hash lookup function get_state checks whether a given state exists with nonzero probability. It computes the hash index for a state a and iterates over the dense array, hoping to find the actual state. If a 0 entry is found (the marker for an unpopulated slot), or if the search wraps around, no state was found and -1 is returned; otherwise, the position in the hash table is returned:
}
}
reg->hash[i] = pos + 1;
// -- Optimization will happen here (later).
}
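The pair of operations — insert with linear probing, storing pos + 1 so that 0 can mark an empty slot, and lookup with wrap-around — can be sketched in Python like this (hash function and names are our stand-ins):

```python
EMPTY = 0  # occupied slots store position + 1, so 0 can mean "empty"

def probe_hash(state, hashw):
    # Stand-in mixing function; any reasonable hash works for the sketch.
    return ((state * 0x9E3779B1) & 0xFFFFFFFF) >> (32 - hashw)

def add_hash(table, hashw, state, pos):
    """Insert state's position using linear probing with wrap-around."""
    i = probe_hash(state, hashw)
    while table[i] != EMPTY:
        i = (i + 1) % len(table)
    table[i] = pos + 1

def get_state(table, hashw, states, a):
    """Return the position of state a, or -1 if it is not populated."""
    i = probe_hash(a, hashw)
    start = i
    while table[i] != EMPTY:
        pos = table[i] - 1
        if states[pos] == a:
            return pos
        i = (i + 1) % len(table)
        if i == start:          # search wrapped around: not present
            break
    return -1

hashw = 4
table = [EMPTY] * (1 << hashw)
states = [0b101, 0b110]
for pos, s in enumerate(states):
    add_hash(table, hashw, s, pos)
```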
The most interesting function from a performance perspective is the one to reconstruct the hash table. Since the gate apply function filters out states with probabilities close to 0.0, we have to rebuild the hash table after gate application to ensure it only contains valid entries. This is the most expensive operation of the whole libq implementation. We show some optimizations below, where the first loop is replaced with a memset(), and more in Section A.8.
The first thing to note is the first loop, which resets the hash array to all zeros:
You might expect the compiler to transform this loop into a vectorized memset operation. However, it does not: the loop bound reg->hashw may alias with memory written in the loop body, so the compiler cannot prove that the body leaves the bound unchanged. Manually changing this to a memset speeds up the whole simulation by about 20%.
This memset is still the slowest part of the implementation. Later, we will show how
to optimize it further.
A.6 Gate Application

Here is the routine to apply a gate. It assumes that something changed since the last invocation, so the first task is to reconstruct the hash table:
Superposition on a given qubit means that both the states with a 0 and a 1 at a given
bit position must exist. So the function iterates, and counts how many of those states
are missing and need to be added:
If new states need to be added, the function reallocates the arrays. It also does some
bookkeeping and remembers the largest number of states with nonzero probability:
This is all for state and memory management. Now on to applying the gates. We
allocate an array done to remember which states we’ve already handled. The limit
variable will be used at the end of the function to remove states with close to zero
probability.
char *done =
    static_cast<char *>(calloc(reg->size + addsize, sizeof(char)));
int next_state = reg->size;
float limit = (1.0 / (static_cast<state_t>(1) << reg->width)) * 1e-6;
We then iterate over all states and check whether a state has not yet been handled. We check whether the target bit is set and obtain the other basis state's index in variable xor_index. The amplitudes for the |0⟩ and |1⟩ basis states are stored in tnot and t.
The matrix multiplication follows the patterns we’ve seen before for the fast gate
application in Section 4.5. If states are found, we apply the gate. If the XOR’ed state
was not found, this means we have to add a new state and perform the multiplication:
if (is_set) {
  reg->amplitude[i] = m[2] * tnot + m[3] * t;
} else {
  reg->amplitude[i] = m[0] * t + m[1] * tnot;
}
if (xor_index >= 0) {
  if (is_set) {
    reg->amplitude[xor_index] = m[0] * tnot + m[1] * t;
  } else {
    reg->amplitude[xor_index] = m[2] * t + m[3] * tnot;
  }
} else { /* new basis state will be created */
  if (abs(m[1]) == 0.0 && is_set) break;
  if (abs(m[2]) == 0.0 && !is_set) break;
  reg->state[next_state] =
      reg->state[i] ^ (static_cast<state_t>(1) << target);
  reg->amplitude[next_state] = is_set ? m[1] * t : m[2] * t;
  next_state += 1;
}
if (xor_index >= 0) {
  done[xor_index] = 1;
}
As a final step, we filter out the states with a probability close to 0. This code
densifies the array by moving up all nonzero elements before finally reallocating the
amplitude and state arrays to a smaller size (which is actually a redundant operation):
reg->size += addsize;
free(done);
if (decsize) {
reg->size -= decsize;
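Putting the pieces together, the whole flow — pairing each state with its XOR partner, applying the 2 × 2 matrix, creating partners when superposition appears, and dropping near-zero amplitudes — can be sketched in Python over a dict, without the hash table (our condensation of the logic, not the C++ source):

```python
import math

def apply_gate1(amps, target, m, eps=1e-12):
    """Apply a 2x2 gate m = (m00, m01, m10, m11) to qubit `target` over a
    sparse dict {bitmask: amplitude}. Creates and destroys superposition."""
    m00, m01, m10, m11 = m
    out = {}
    done = set()
    for s, a in amps.items():
        if s in done:
            continue
        partner = s ^ (1 << target)
        b = amps.get(partner, 0j)       # amplitude of the XOR'ed state
        if (s >> target) & 1:           # a belongs to the |1> component
            lo, hi, s_lo, s_hi = b, a, partner, s
        else:                           # a belongs to the |0> component
            lo, hi, s_lo, s_hi = a, b, s, partner
        new_lo = m00 * lo + m01 * hi
        new_hi = m10 * lo + m11 * hi
        if abs(new_lo) > eps:           # near-zero states are filtered out
            out[s_lo] = new_lo
        if abs(new_hi) > eps:
            out[s_hi] = new_hi
        done.update((s, partner))
    return out

h = 1 / math.sqrt(2)
H = (h, h, h, -h)
amps = apply_gate1({0b0: 1.0 + 0j}, 0, H)  # H|0> = (|0> + |1>)/sqrt(2)
amps = apply_gate1(amps, 0, H)             # H twice restores |0>
```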
A.7 Premature Optimization, Second Act

Here is an anecdote that might serve as a lesson. After implementing the code and running initial benchmarks, it appeared obvious that the repeated iteration over the memory just had to be a bottleneck. Some form of mini-JIT (just-in-time compilation) should help: first collect all the operations, then fuse gate applications into the same loop iteration. The goal would be to significantly reduce repeated iterations over the states and thus avoid the memory traffic which, again, just had to be the problem. The code is available online. It might become valuable in the future, as other performance bottlenecks are resolved.
The goal of the main routine was to execute something like the following, with just
one outer loop and a switch statement over all superposition-preserving gates:
[...]
void Execute(qureg *reg) {
  for (int i = 0; i < reg->size; ++i) {
    for (auto op : op_list_) {
      switch (op.op()) {
        case op_t::X:
          reg->bit_xor(i, op.target());
          break;
        case op_t::Y:
          reg->bit_xor(i, op.target());
          if (reg->bit_is_set(i, op.target()))
            reg->amplitude[i] *= cmplx(0, 1.0);
          else
            reg->amplitude[i] *= cmplx(0, -1.0);
          break;
        case op_t::Z:
          if (reg->bit_is_set(i, op.target())) {
            reg->amplitude[i] *= -1;
          }
          break;
        [...]
      }
    }
  }
}
A.8 Actual Performance Optimization

As noted above, reconstructing the hash table is the most expensive operation in this library. The hash table is sized to hold all potential states, given the number of qubits. However, even for complex algorithms, the actual maximal number of states with nonzero probability can be quite small. For example, two benchmarks we extract from quantum arithmetic (Arith) and order finding (Order) both reach a maximum of 8,192 nonzero states. Relative to the theoretical maximal number of states given the number of qubits involved, this is 3.125% for Order, but only 0.012% for Arith, which has a lot more qubits and, hence, a very large potential number of states.
During execution, the number of states changes dynamically in powers of two as libq
removes states that are very close to 0.0. Therefore, there is an opportunity to augment
the hash table and track, or cache, the addresses of elements that have been set, up to
a given threshold, for example, up to 64K elements.
To reset the hash table, we iterate over this hash cache and zero out the populated
elements in the hash table, as shown in Figure A.1. There will be a crossover point. For
some size of the hash cache, just linearly sweeping the hash table will be faster than
the random memory access patterns from the cache because of hardware prefetching
dynamics. We picked 64K as cache size, which, for our given examples, improves the
runtime significantly. This is an interesting space to experiment in, trying to find better
heuristics and data structures.
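The caching strategy can be sketched in Python like so (class and field names are ours; libq uses hash_hits, reg->hits, and a 64K cache):

```python
HASH_CACHE_SIZE = 4  # tiny for the sketch; libq uses 64K entries

class HashCache:
    """Track which table slots were written so a reset can zero only
    those, falling back to a full clear when the cache overflows."""

    def __init__(self, table_size):
        self.table = [0] * table_size
        self.hits = []            # addresses of populated slots
        self.overflowed = False
        self.full_clears = 0      # bookkeeping for the crossover heuristic

    def set_slot(self, i, value):
        self.table[i] = value
        if len(self.hits) < HASH_CACHE_SIZE:
            self.hits.append(i)
        else:
            self.overflowed = True

    def reset(self):
        if self.overflowed:       # cache too small: full linear sweep
            self.table = [0] * len(self.table)
            self.full_clears += 1
        else:                     # selective zeroing of cached addresses
            for i in self.hits:
                self.table[i] = 0
        self.hits = []
        self.overflowed = False

hc = HashCache(16)
hc.set_slot(3, 7)
hc.set_slot(9, 8)
hc.reset()
```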
In function libq_reconstruct_hash, we additionally maintain an array called
hash_hits which holds the addresses of states in the main hash table, along with
a counter reg->hits of those. Then, we selectively zero out only those memory
addresses in the hash table that we cached. If the hash cache was not big enough, we
have to resort to zeroing out the full hash table:
reg->hits = 0;
}
for (int i = 0; i < reg->size; ++i) {
libq_add_hash(reg->state[i], i, reg);
}
}
All that’s left to do is to fill in this array hash_hits whenever we add a new
element in libq_add_hash using the following code at the very bottom:
[...]
reg->hash[i] = pos + 1;
if (reg->hash_caching && reg->hits < HASH_CACHE_SIZE) {
  reg->hash_hits[reg->hits] = i;
  reg->hits += 1;
}
References

of the 41st ACM SIGPLAN International Conference on Programming Language Design and
Implementation, PLDI 2020, London, UK, June 15–20, 2020, pp. 286–300. ACM, 2020. doi:
10.1145/3385412.3386007.
S. Boixo, S. V. Isakov, V. N. Smelyanskiy, et al. Characterizing quantum supremacy in near-term
devices. Nature Physics, 14(6):595–600, 2018. doi: 10.1038/s41567-018-0124-x.
G. Brassard, P. Høyer, M. Mosca, and A. Tapp. Quantum amplitude amplification and
estimation. Quantum Computation and Information, pp. 53–74, 2002. doi: 10.1090/conm/
305/05215.
I. Buck, T. Foley, D. Horn, et al. Brook for GPUs: Stream computing on graphics hardware.
ACM Transactions on Graphics, 23:777–786, 2004. doi: 10.1145/1186562.1015800.
H. Buhrman, R. Cleve, J. Watrous, and R. de Wolf. Quantum fingerprinting. Physical Review
Letters, 87(16), 2001. doi: 10.1103/physrevlett.87.167902.
H. Buhrman, C. Dürr, M. Heiligman, et al. Quantum algorithms for element distinctness. SIAM
Journal on Computing, 34(6):1324–1330, 2005. doi: 10.1137/s0097539702402780.
B. Butscher and H. Weimer. libquantum. www.libquantum.de/, 2013. Accessed: 2021-02-10.
A. M. Childs, R. Cleve, E. Deotto, E. Farhi, S. Gutmann, and D. A. Spielman. Exponential
algorithmic speedup by a quantum walk. Proceedings of the Thirty-Fifth ACM Symposium
on Theory of Computing – STOC ’03, 2003. doi: 10.1145/780542.780552.
A. M. Childs, R. Cleve, S. P. Jordan, and D. Yonge-Mallo. Discrete-query quantum algorithm for NAND trees. Theory of Computing, 5(1):119–123, 2009. doi: 10.4086/toc.2009.v005a005.
F. T. Chong, D. Franklin, and M. Martonosi. Programming languages and compiler design for
realistic quantum hardware. Nature, 549(7671):180–187, 2017. doi: 10.1038/nature23459.
D. Coppersmith. An approximate Fourier transform useful in quantum factoring. arXiv e-prints,
art. quant-ph/0201067, Jan. 2002.
D. G. Cory, M. D. Price, W. Maas, et al. Experimental quantum error correction. Physical
Review Letters, 81(10):2152–2155, 1998. doi: 10.1103/physrevlett.81.2152.
A. W. Cross, L. S. Bishop, J. A. Smolin, and J. M. Gambetta. Open quantum assembly language,
2017. arXiv:1707.03429.
C. M. Dawson and M. A. Nielsen. The Solovay–Kitaev algorithm. Quantum Information and
Computation, 6(1):81–95, 2006.
H. De Raedt, F. Jin, D. Willsch, et al. Massively parallel quantum computer simulator, eleven
years later. Computer Physics Communications, 237:47–61, 2019. doi: 10.1016/j.cpc.2018.
11.005.
W. Dean. Computational complexity theory. In E. N. Zalta, ed., The Stanford Encyclopedia of
Philosophy. Metaphysics Research Lab, Stanford University, 2016.
D. Deutsch. Quantum theory, the Church–Turing principle and the universal quantum computer.
Proceedings of the Royal Society of London Series A, 400(1818):97–117, 1985. doi: 10.1098/
rspa.1985.0070.
D. Deutsch and R. Jozsa. Rapid solution of problems by quantum computation. Proceedings of
the Royal Society of London. Series A, 439(1907):553–558, 1992. doi: 10.1098/rspa.1992.
0167.
S. J. Devitt, W. J. Munro, and K. Nemoto. Quantum error correction for beginners. Reports on
Progress in Physics, 76(7):076001, 2013. doi: 10.1088/0034-4885/76/7/076001.
Y. Ding and F. T. Chong. Quantum computer systems: Research for noisy intermediate-scale
quantum computers. Synthesis Lectures on Computer Architecture, 15(2):1–227, 2020. doi:
10.2200/S01014ED1V01Y202005CAC051.
F. Rios and P. Selinger. A categorical model for a quantum circuit description language
(extended abstract). Electronic Proceedings in Theoretical Computer Science, 266:164–178,
2018. doi: 10.4204/eptcs.266.11.
R. L. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public-
key cryptosystems. Communications of the ACM, 21:120–126, 1978.
R. Landauer. Is quantum mechanics useful? Philosophical Transactions of the Royal Society of
London. Series A: Physical and Engineering Sciences, 353:367–376, 1995. doi: 10.1098/
rsta.1995.0106.
N. J. Ross. Algebraic and logical methods in quantum computation, 2017. URL https://arxiv.
org/abs/1510.02198.
N. J. Ross and P. Selinger. Optimal ancilla-free Clifford+T approximation of z-rotations.
Quantum Information and Computation, 11–12:901–953, 2016.
N. J. Ross and P. Selinger. Exact and approximate synthesis of quantum circuits.
www.mathstat.dal.ca/∼selinger/newsynth/, 2021. Accessed: 2021-02-10.
B. Rudiak-Gould. The sum-over-histories formulation of quantum computing. arXiv e-prints,
art. quant-ph/0607151, 2006.
V. V. Shende, I. L. Markov, and S. S. Bullock. Minimal universal two-qubit controlled-NOT-
based circuits. Physical Review A, 69(6):062321, 2004. doi: 10.1103/physreva.69.062321.
P. W. Shor. Algorithms for quantum computation: Discrete logarithms and factoring. In
Proceedings 35th Annual Symposium on Foundations of Computer Science, pp. 124–134,
1994. doi: 10.1109/SFCS.1994.365700.
P. W. Shor. Scheme for reducing decoherence in quantum computer memory. Physical Review A, 52:R2493–R2496, 1995. doi: 10.1103/PhysRevA.52.R2493.
D. Simon. On the power of quantum computation. In Proceedings 35th Annual Symposium on
Foundations of Computer Science, pp. 116–123, 1994. doi: 10.1109/SFCS.1994.365701.
M. Smelyanskiy, N. P. D. Sawaya, and A. Aspuru-Guzik. qHiPSTER: The quantum high
performance software testing environment, 2016. arXiv:1601.07195v2 [quant-ph].
M. Soeken, S. Frehse, R. Wille, and R. Drechsler. RevKit: An open source toolkit for the
design of reversible circuits. In Reversible Computation 2011, vol. 7165 of Lecture Notes
in Computer Science, pp. 64–76, 2012. RevKit is available at www.revkit.org.
M. Soeken, H. Riener, W. Haaswijk, et al. The EPFL logic synthesis libraries, 2019.
arXiv:1805.05121v2.
A. Steane. Multiple particle interference and quantum error correction. Proceedings of the
Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences,
452(1954):2551–2577, 1996. doi: 10.1098/rspa.1996.0136.
D. S. Steiger, T. Häner, and M. Troyer. ProjectQ: An open source software framework for
quantum computing. Quantum, 2:49, 2018. doi: 10.22331/q-2018-01-31-49.
K. M. Svore, A. V. Aho, A. W. Cross, I. Chuang, and I. L. Markov. A layered software
architecture for quantum computing design tools. Computer, 39(1):74–83, 2006. doi: 10.
1109/MC.2006.4.
A. van Tonder. A lambda calculus for quantum computation. SIAM Journal on Computing,
33(5):1109–1135, 2004. doi: 10.1137/s0097539703432165.
J. D. Whitfield, J. Biamonte, and A. Aspuru-Guzik. Simulation of electronic structure
Hamiltonians using quantum computers. Molecular Physics, 109(5):735–750, 2011. doi:
10.1080/00268976.2011.552441.
Wikipedia. KD-Trees. https://en.wikipedia.org/wiki/K-d_tree, 2021a. Accessed: 2021-02-10.
Index

Q language C++ class library, 300
  Transpiler code generation flag, 315
Q# commercial system (Microsoft), 303
Q# programming language, 303
  Quantum Developer Kit, 303
  Silq comparison, 302
QASM tool, 295
  Addition via QFT circuit, 173
  cQASM, 295
  Hierarchical QASM, 299
  OpenQASM, 295
  Transpilation
    About QASM, 311
    Code generation flag, 315
    Dumper function, 316
qc (quantum circuit) data structure
  About abstraction, 126, 134
  Constructor, 127
  Full adder example, 134
  Gates, 129
    Adjoint, 130
    Applying, 128, 142
    Fast application, 134–139
    Fast application generalized, 137–139
    Faster application with C++, 139–145
    Multi-Controlled gates, 132
    Swap and controlled swap gates, 131
  Measurements, 131
  Statistical sampling function, 131
  nbits property, 128
  Quantum registers, 127
  Qubits added, 128
  Sparse representation, 145–147
    Benchmarking, 147
  Transpilation extension of, 313
    Code generation flags, 315
    Eager mode, 313, 314
QCL programming language, 296–298
  Quipper comparison, 301
QFT, see Quantum Fourier transform
qHipster simulator, 320
Qiskit commercial system (IBM), 303
  ALAP scheduling of gates, 307
  Algorithm reference, 277
  QASM support, 311
  Simulators, 321
QPE, see Quantum phase estimation
QRAM model of quantum computing, 293, 294
  Gate approximation, 310
qsim simulator (Google), 321
qsimh simulator (Google), 321
Quadratic programming problem, 254
Quantum advantage, 149
Quantum algorithm zoo, 277
Quantum amplitude amplification (QAA), 227–230
Quantum approximate optimization algorithm (QAOA), 253
Quantum circuit (qc) data structure
  About abstraction, 126, 134
  Constructor, 127
  Full adder example, 134
  Gates, 129
    Adjoint, 130
    Applying, 128, 142
    Fast application, 134–139
    Fast application generalized, 137–139
    Faster application with C++, 139–145
    Multi-Controlled gates, 132
    Swap and controlled swap gates, 131
  Measurements, 131
  Statistical sampling function, 131
  nbits property, 128
  Quantum registers, 127
  Qubits added, 128
  Sparse representation, 145–147
    Benchmarking, 147
  Transpilation extension of, 313
    Code generation flags, 315
    Eager mode, 313, 314
Quantum circuit notation
  About ordering of gate applications, 54
  Controlled gates
    Controlled-X gates, 53
    Controlled-Z gates, 53
    Controlled-Not-by-0 gates, 53
    More than one qubit controlling, 54
  Entangler circuits, 61
  Fan-out circuits, 92
  Full adder, 89
  Information flow double lines, 54
  Logic circuits, 92
  Measurement, 54
  Oracle for Bernstein-Vazirani algorithm, 106
  Qubit order, 51
  Single-qubit operator applied, 52, 53
  State as tensor product combined state, 52
  State change depiction, 52
  State initialization, 52
  Swap test, 93
  X-gates, 53
Quantum computers
  Arithmetic via full adder, 89–91
  Quantum arithmetic, 172–177
  Classical computers controlling, 293, 294
  Classical computers simulated by, 149
  Commercial systems, 303
  Compiler design challenges, 293
  Density matrices for theory of, 24, 68
  Environmental challenges, 278–284
  Error correction challenges, 285
  Flow control via controlled gates, 46
  QCL programming language, 297
  Silq programming language, 302