Introduction to Quantum Algorithms
Johannes A. Buchmann
Pure and Applied Undergraduate Texts • 64
The Sally Series
EDITORIAL COMMITTEE
Giuliana Davidoff (Chair)
Daniel P. Groves
Tara S. Holm
John M. Lee
Maria Cristina Pereyra
Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting
for them, are permitted to make fair use of the material, such as to copy select pages for use
in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Requests for permission
to reuse portions of AMS publication content are handled by the Copyright Clearance Center. For
more information, please visit www.ams.org/publications/pubpermissions.
Send requests for translation rights and licensed reprints to [email protected].
© 2024 by the American Mathematical Society. All rights reserved.
The American Mathematical Society retains all rights
except those granted to the United States Government.
Printed in the United States of America.
∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at https://www.ams.org/
10 9 8 7 6 5 4 3 2 1 29 28 27 26 25 24
Contents
Preface ix
The advent of quantum computing ix
The goal of the book x
The structure of the book xi
What is not covered xiv
For instructors xiv
Acknowledgements xv
Bibliography 363
Index 365
Preface
work of Peter Shor [Sho94] on quantum polynomial time factoring and discrete log-
arithm algorithms. Shor’s work alarmed the world as it revealed the vulnerability of
all known public-key cryptography, one of the fundamental pillars of cybersecurity, to
quantum computer attacks.
Another early advancement in quantum computing that garnered significant at-
tention was Lov Grover’s algorithm [Gro96], offering a quadratic speedup for unstruc-
tured search problems. This breakthrough further fueled the growing interest in quan-
tum computing. Grover’s algorithm captured widespread interest because of its ability
to solve a very generic problem, making it useful across a wide range of applications.
In the decades following these early developments, many more quantum algorithms have been discovered. An example is the HHL algorithm [HHL08], which can be used to determine properties of the solutions of certain large sparse linear systems, providing an exponential speedup over classical solvers such as Gaussian elimination.
Since linear algebra is one of the most important tools in all areas of science and engi-
neering, the HHL algorithm has wide applications, including machine learning, which
is one of the most significant techniques in computer science today.
This progress should not deceive us: the development of quantum algorithms remains a significant challenge. There is sometimes the impression that quantum computing allows all computations to be parallelized and significantly accelerated. However, that is not the case. In reality, each new quantum algorithm requires a unique idea. Consequently, such algorithms can currently accelerate only a few computational problems. Moreover, only very few of these improvements come with exponential speedups.
All the algorithms mentioned in this book are designed for universal, gate-based
quantum computers, which are the most widely recognized and extensively researched
type of quantum computers. In addition to universal quantum computers, there are
more specialized types of quantum computers, such as quantum annealers and quan-
tum simulators. Quantum annealers utilize annealing techniques to solve optimiza-
tion problems by finding the lowest-energy state of a physical system. On the other
hand, quantum simulators are specifically designed to simulate quantum systems and
study quantum phenomena. However, this book focuses on universal quantum com-
puters due to their versatility and because they are the most interesting from a com-
puter science perspective.
subsequent chapters. I also adopted this approach in writing the book due to my own
experience. Despite having degrees in mathematics and physics and being a computer
science professor for over 30 years, I found myself needing to refresh my memory on
several required concepts and to learn new material. Therefore, my objective is to make
the presentation understandable with a minimum of prerequisites, ensuring clarity for
both myself and the readers.
Because I cover all the details, some readers may already possess knowledge presented in the introductory chapters. However, even
they are likely to encounter new and vital information in these chapters, essential for
understanding quantum algorithms. For example, Chapter 1 gives an introduction to
the theory of reversible computation, which is not typically part of the standard com-
puter science education. Chapter 2 introduces mathematicians to the Dirac notation,
commonly used by physicists. Chapter 3 further expands the understanding of physi-
cists by applying the quantum mechanics postulates to quantum gates and circuits.
Therefore, I encourage those with prior knowledge to read these sections, taking note
of the notation used in the book and of unfamiliar results. This is vital for grasping the
intricacies of my explanation of quantum algorithms.
valuable tools for subsequent discussions. Moving forward, the chapter familiarizes the
reader with significant operators in quantum mechanics, such as Hermitian, unitary,
and normal operators. Of particular significance is the spectral theorem, a fundamen-
tal result that offers profound insights into these operators and their characteristics.
The consequences of the spectral theorem are also explored to enrich the reader’s un-
derstanding. Furthermore, the chapter delves into the concept of tensor products of
finite-dimensional Hilbert spaces, a crucial notion in quantum computing. The dis-
cussion culminates with an elucidation of the Schmidt decomposition theorem, which
plays a pivotal role in characterizing the entanglement of quantum systems.
Chapter 3 constitutes the third foundational pillar of quantum computing required
in this book, encompassing the essential background of quantum mechanics. This
chapter introduces the relevant quantum mechanics postulates. To illustrate their rele-
vance, the chapter applies these postulates to introduce fundamental concepts of quan-
tum computing, including quantum bits, registers, gates, and circuits. Simple exam-
ples of quantum computation are provided to enhance the reader’s understanding of
the connection between the postulates and quantum algorithms. In addition, the chap-
ter provides the foundation for the geometric interpretation of quantum computation.
It achieves this by establishing the correspondence between states of individual quan-
tum bits and points on the Bloch sphere, a pivotal concept in quantum computing vi-
sualization. Moreover, the chapter presents an alternative description of the relevant
quantum mechanics using density operators. This approach enables the modeling of
the behavior of components within composed quantum systems.
The foundational groundwork laid out in the initial three chapters, including the
domains of computer science, mathematics, and physics, sets the stage for a compre-
hensive exploration of quantum algorithms in Chapter 4. This chapter embarks on
this transformative journey by shedding light on pivotal quantum gates, which serve
as the fundamental constituents of quantum circuits. We start by introducing single-
qubit gates, demonstrating that their operations can be perceived as rotations within
three-dimensional real space. Subsequently, we delve into the realm of multiple-qubit
operators, with a particular focus on controlled operators. In addition, this chapter
familiarizes readers with the significance of ancillary and erasure gates, which play
a vital role in the augmentation and removal of quantum bits. Leveraging analogous
outcomes from classical reversible circuits, the chapter shows that every Boolean function can be realized through a quantum circuit. In contrast to the classical case, no finite set of quantum gates suffices to implement every quantum operator exactly. Instead, finite sets of quantum gates are presented that enable the approximation of all quantum circuits. Lastly, the chapter ushers in the
concept of quantum complexity theory, using the analogy between classical probabilis-
tic algorithms and quantum algorithms. It introduces the complexity class BQP, which
stands for bounded-error quantum polynomial time.
The following four chapters focus on specific quantum algorithms.
Chapter 5 introduces early algorithms designed to illustrate the quantum comput-
ing advantage. We begin with the Deutsch algorithm, as presented in David Deutsch’s
seminal paper from 1985 [Deu85], and its generalization by David Deutsch and Richard
of the appendix lists essential trigonometric identities and inequalities that play a cru-
cial role in the main part of the book. Appendix B focuses on linear algebra. Its first
part briefly reviews important concepts and results. The second part covers the con-
cept of tensor products, which is of significant importance in quantum computing and
is typically not included in introductory courses in linear algebra. Lastly, Appendix C
contains the required notions and results from probability theory. This knowledge is
essential for the analysis of probabilistic and quantum algorithms.
For instructors
This book is suitable for self-study. It is also intended and has been used for teaching
introductory courses on quantum algorithms. My recommendation to instructors for
such a course is as follows: If most of the participants are already familiar with the
required basics of algorithms and complexity, linear algebra, algebra, and probability
theory, the course should cover Chapters 3, 4, 5, 6, 7, and 8 in this order, exploring dif-
ferent aspects of quantum algorithms. Individual students lacking some basic knowl-
edge can familiarize themselves with those topics using the detailed explanations in
the respective parts of the book. If the majority of the participants in the course is un-
familiar with certain basic topics, the instructor may want to briefly cover them either
in a preliminary lecture or when they are used during the course.
Depending on the instructor’s intentions and the available time, the course may
focus more on theoretical explanations and proofs or on the practical aspects of how
quantum algorithms work. In both situations, students who desire more background
than is covered in the course can supplement their knowledge through self-study of
the corresponding book parts.
Acknowledgements
I express my sincere gratitude to the following individuals who have been instrumental
in supporting me throughout the process of writing this book. Their invaluable advice,
discussions, and comments have played a pivotal role in shaping the content and qual-
ity of this work: Gernot Alber, Gerhard Birkl, Jintai Ding, Samed Düzlü, Fritz Eisen-
brand, Marc Fischlin, Mika Göös, Iryna Gurevich, Matthieu Nicolas Haeberle, Taketo
Imaizumi, Michael Jacobson, Nigam Jigyasa, Norbert Lutkenhaus, Alastair Kay, Ju-
liane Krämer, Gen Kimura, Michele Mosca, Jigyasa Nigam, Rom Pinchasi, Ahamad-
Reza Sadeghi, Masahide Sasaki, Alexander Sauer, Florian Schwarz, Tsuyoshi Takagi,
Shusaku Uemura, Thomas Walther, Yuntao Wang, and Ho Yun. Their dedication to
sharing their expertise and knowledge has been truly invaluable, and I am deeply grate-
ful for their willingness to engage in insightful discussions and provide constructive
feedback throughout this journey.
I also extend my heartfelt gratitude to Ina Mette, the editor responsible for this book at AMS.
Her belief in the potential of this book and her continuous encouragement to pursue
this project have been instrumental in its realization. I am deeply grateful for her un-
wavering support and guidance throughout the writing process. I am also grateful to
Arlene O’Sean and her team at AMS for their great job in carefully proofreading the
book and making its appearance so nice.
I learned a lot from the books on quantum computing by Michael A. Nielsen and
Isaac L. Chuang [NC16] and by Phillip Kaye, Raymond Laflamme, and Michele Mosca
[KLM06].
The writing of this book would not have been possible without the invaluable con-
tributions of several open-source LaTeX packages, which greatly facilitated the presen-
tation of complex concepts. I extend my gratitude to the creators and maintainers of
these packages:
• The powerful TikZ library (https://tikz.net/) and its extension circuitikz (https://ctan.org/pkg/circuitikz) were instrumental in visualizing circuits and diagrams throughout the book.
• I used the open-source TikZ code for illustrating the right-hand rule (https://tikz.net/righthand_rule/).
• For the clear representation of quantum circuits, I relied on quantikz (https://ctan.org/pkg/quantikz).
• To illustrate quantum states on Bloch spheres, I used the blochsphere package (https://ctan.org/pkg/blochsphere).
• The packages algorithm and algpseudocode (https://www.overleaf.com/learn/latex/Algorithms) were indispensable in presenting algorithms in a structured and easily understandable format.
• For handling the Dirac notation with ease, I benefited from the physics package (https://www.ctan.org/pkg/physics).
I am sincerely grateful to the open-source community for making these and many more
tools available, enhancing the quality of this work and simplifying its creation.
Finally, I would like to acknowledge the support provided by ChatGPT (https://chat.openai.com/) in improving many formulations of my presentation. As I am not a native speaker of the English language, this assistance was of great help.
Chapter 1
Classical Computation
uses similar modeling is the book by Thomas H. Cormen, Charles E. Leiserson, Ronald
L. Rivest, and Clifford Stein [CLRS22].
1.1.1. Basics. To explain our model, we introduce some basic concepts and re-
sults. We begin by defining alphabets.
Definition 1.1.1. An alphabet is a finite nonempty set.
Example 1.1.2. The simplest alphabet is the unary alphabet {I}, which contains only
the symbol I. The most commonly used alphabet in computer science is the binary
alphabet {0, 1}, where each element is referred to as a bit. Other commonly used al-
phabets in computer science include the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} of decimal digits,
the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 𝐴, 𝐵, 𝐶, 𝐷, 𝐸, 𝐹} of hexadecimal digits, and the Latin al-
phabet ℛ = {a, . . . , z, A, . . . , Z, ␣} that includes lowercase and uppercase Latin letters,
as well as the space symbol ␣.
As we have seen, every finite sequence of bits that starts with the bit ‘1’ is assigned
to a uniquely determined positive integer as its binary expansion. Now, we also assign
uniquely determined nonnegative integers to all sequences in {0, 1}∗, including those that start with the bit 0, using the following definition.
Definition 1.1.12. For all 𝑛 ∈ ℕ and 𝑏⃗ = (𝑏_0, . . . , 𝑏_{𝑛−1}) ∈ {0, 1}∗ we set

(1.1.3) stringToInt(𝑏⃗) = ∑_{𝑖=0}^{𝑛−1} 𝑏_𝑖 2^{𝑛−𝑖−1}.
Note that nonnegative integers are represented by infinitely many strings in {0, 1}∗ .
Specifically, they are represented by all strings that result from prepending a string
consisting only of zeros to their binary expansions. Also, the number 0 is represented
by all finite sequences of the bit “0”.
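Definition 1.1.12 translates directly into code. The following Python sketch (the snake_case name string_to_int is ours; the text writes stringToInt) evaluates (1.1.3) and illustrates that prepending zeros leaves the represented integer unchanged:

```python
def string_to_int(bits):
    """Map a bit sequence (b_0, ..., b_{n-1}) to sum of b_i * 2^(n-i-1),
    as in equation (1.1.3)."""
    n = len(bits)
    return sum(b * 2 ** (n - i - 1) for i, b in enumerate(bits))

# Leading zeros do not change the represented integer:
assert string_to_int([1, 0, 1]) == 5
assert string_to_int([0, 0, 1, 0, 1]) == 5
# The number 0 is represented by every all-zero sequence:
assert string_to_int([0, 0, 0]) == 0
```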
Exercise 1.1.13. Use Proposition 1.1.7 to show that the map (1.1.3) is a bijection.
Another common data type is that of a floating point number, which represents an
approximation to a real number. However, analyzing algorithms that use this data
type is more complex, as it requires considering error propagation. For our purposes,
we do not need to take this data type into account.
Objects of these elementary data types are represented and stored using bits. The
encoding of these objects can be defined in a straightforward manner for both bits and
Roman characters. Integers are encoded using the binary expansion of their absolute
value along with the indication of their sign. The size of these encodings refers to the
number of bits used. For a data type object 𝑎, the size of its encoding is denoted by
size 𝑎. It may vary depending on the specific computing platform, programming lan-
guage, or other relevant factors. However, detailed discussions regarding these specific
implementations are beyond the scope of this book, and we present only a summary of
the different sizes of the elementary data types in Table 1.1.1.
In our model, advanced data types include vectors and matrices over some data
type, which are described in Sections B.1 and B.4. For example, (1, 2, 3) is an integer vector. Similarly, the 2 × 2 matrix with rows (0, 1) and (1, 0) is a bit matrix. Vectors and matrices are also encoded by bits,
using the encoding method of their data type. The encodings of vectors and matrices
possess the following properties. For any 𝑘 ∈ ℕ, the size of the encoding of a vector
𝑠⃗ = (𝑠_0, . . . , 𝑠_{𝑘−1}) over some data type satisfies

(1.1.4) size(𝑠⃗) = O(∑_{𝑖=0}^{𝑘−1} size 𝑠_𝑖).
Similarly, for any 𝑘, 𝑙 ∈ ℕ, the size of the encoding of a matrix 𝑀 = (𝑚_{𝑖,𝑗})_{𝑖∈ℤ_𝑘, 𝑗∈ℤ_𝑙} over some data type satisfies

(1.1.5) size(𝑀) = O(∑_{𝑖∈ℤ_𝑘} ∑_{𝑗∈ℤ_𝑙} size 𝑚_{𝑖,𝑗}).
Figure 1.1.1. The variables 𝑎 and 𝑏 represent memory cells that contain 19 and “bit”, respectively.
(1.1.6) 𝑎 ← 𝑏.
It sets the value of a variable 𝑎 of a certain data type to an element 𝑏 of this data type
or to the value of a variable 𝑏 of the same data type as 𝑎. The running time and space
requirement for this operation is O(size 𝑏).
Algorithms may also assign the result of an operation to a variable. An example of
such an instruction is
(1.1.7) 𝑎 ← 𝑏 + 3.
The right-hand side of this assignment is the arithmetic expression 𝑏 + 3. When such
an instruction is executed, the arithmetic expression is evaluated first. In the example,
Table 1.1.2. Permitted operations on integers, their running times, and space require-
ments for operands of size O(𝑛).
it depends on the value of the variable 𝑏. Then the result is assigned to 𝑎. It is permit-
ted that the variable on the left side also appears on the right side. For instance, the
instruction
(1.1.8) 𝑐 ← 𝑐 + 1
increments a counter 𝑐 by 1.
Next, we present the operations that may be used in expressions on the right side
of an assign instruction. The permitted operations on integers are listed in Table 1.1.2,
including their time and space requirements. The results of the operations absolute
value, floor, ceiling, next integer, add, subtract, multiply, divide, and remainder are
integers. The results of the comparisons are the bits 0 or 1 where 1 stands for “true”
and 0 stands for “false”. For a description and analysis of these algorithms see [AHU74]
and [Knu82].
In most programming languages, only integers with limited bit lengths are avail-
able, such as 64-bit integers. When there is a need to work with integers of arbitrary
length, specialized algorithms are required to handle operations on such numbers. To
simplify our descriptions, we assume that operations on integers of arbitrary length are
available as basic operations. These operations can be realized using the running time
and memory space as listed in Table 1.1.2, but there exist much more efficient algorithms for integer multiplication and division with remainder. For instance, a highly
efficient integer multiplication algorithm developed by David Harvey and Joris van
der Hoeven [HvdH21] has running time O(𝑛 log 𝑛) for 𝑛-bit operands. Additionally,
it is known that for any integer multiplication algorithm with a running time of 𝑀(𝑛),
there exist division with remainder and square root algorithms with a running time of
O(𝑀(𝑛)) (see [AHU74, Theorem 8.5]). However, in practice, these faster algorithms
are only advantageous for handling very large numbers. For more typical integer sizes,
classical algorithms may still be more efficient in terms of practical performance.
On the bits 0 and 1 our algorithms can perform the logic operations that are listed
in Table 1.1.3. They implement the functions shown in Table 1.1.4. All permitted logic
operations run in time O(1) and require space O(1).
Algorithms may also use the branch statements for, while, repeat, and if. They
initiate the execution of a sequence of instructions if some branch condition is satis-
fied. In the analysis of algorithms, we will assume that the time and space required to
execute an algorithm part that uses a branch instruction is the time and space required
to evaluate the branch condition and the corresponding sequence of instructions, pos-
sibly several times. Branch instructions together with the corresponding instruction
sequence are referred to as loops.
We now provide more detailed explanations of the branch instructions using the
examples shown in Figures 1.1.2 and 1.1.3 utilizing pseudocode that is further de-
scribed in Section 1.1.5.
A for statement appears at the beginning of an instruction sequence and is ended
by an end for statement. This instruction sequence is executed for all values of a spec-
ified variable, as indicated in the for statement. In the for loop in Figure 1.1.2, the
variable is 𝑖, and the instruction 𝑝 ← 2𝑝 is executed for all 𝑖 from 𝑖 = 1 to 𝑖 = 𝑒. After
𝑖 iterations of this instruction, the value of 𝑝 is 2^𝑖. So after completion of the for loop, the value of 𝑝 is 2^𝑒.
Also, while statements appear at the beginning of an instruction sequence after
which there is an end while statement. The instruction sequence is executed as long
as the condition in the while statement is true. For instance, the while loop in Figure
1.1.2 also computes 2^𝑒. For this, the counting variable 𝑖 is initialized to 1 and the vari-
able 𝑝 is initially set to 1. Before each round of the while loop, the logic expression 𝑖 ≤ 𝑒
is evaluated. If it is true, then the instruction sequence in the while loop is executed.
In the example, 𝑝 is set to 2𝑝 and the counting variable 𝑖 is increased by 1. After the
𝑘th iteration of the while loop, the value of 𝑝 is 2^𝑘 and the counting variable is 𝑖 = 𝑘. Hence, after the 𝑒th iteration of the while loop we have 𝑝 = 2^𝑒 and 𝑖 = 𝑒 + 1. So the
while condition is violated and the computation continues with the first instruction
following the while loop.
Next, repeat statements are also followed by an instruction sequence that is ended
by an until statement that contains a condition. If this condition is satisfied, the com-
putation continues with the first instruction after the until statement. Otherwise, the
instruction sequence is executed again. In the example in Figure 1.1.2, the instruction
𝑝 ← 2𝑝 is executed until the counting variable 𝑖 is equal to 𝑒. Note that the instruction
sequence is executed at least once. Therefore, this repeat loop cannot compute 2^0.
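Since Figure 1.1.2 is not reproduced in this excerpt, here is a hypothetical Python rendering of the three loop variants just described (the function names are ours; the repeat loop is emulated with while True ... break):

```python
def power_of_two_for(e):
    p = 1
    for i in range(1, e + 1):   # for i = 1 to e
        p = 2 * p
    return p

def power_of_two_while(e):
    p, i = 1, 1
    while i <= e:               # condition checked before each round
        p = 2 * p
        i = i + 1
    return p

def power_of_two_repeat(e):
    # The body runs at least once, so this variant cannot compute 2^0:
    # for e = 0 the loop would never terminate. Use only with e >= 1.
    p, i = 1, 0
    while True:                 # repeat ... until i = e
        p = 2 * p
        i = i + 1
        if i == e:
            break
    return p

assert power_of_two_for(0) == 1
assert power_of_two_while(5) == 32
assert power_of_two_repeat(5) == 32
```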
Exercise 1.1.15. Find for, while, and repeat loops that compute the integer represented by a bit sequence 𝑠_0 𝑠_1 ⋯ 𝑠_{𝑛−1} ∈ {0, 1}^𝑛 where 𝑛 ∈ ℕ.
Now we explain if statements. The three different ways to use if are shown in
Figure 1.1.3. Such a statement is followed by a sequence of instructions that is ended
by an end if statement. The instruction sequence may be interrupted by an else state-
ment or by one or more else if statements. The code segment on the left side of Figure
1.1.3 checks whether 𝑎 < 0 is true, in which case the instruction 𝑎 ← −𝑎 is executed.
Otherwise, the computation continues with the instruction following the end if state-
ment. This code segment computes the absolute value of 𝑎 because 𝑎 is set to −𝑎 if 𝑎 is
negative and otherwise remains unchanged. The code segment in the middle of Figure
1.1.3 checks whether 𝑎 is divisible by 11. If so, the variable 𝑠 is set to 1 and otherwise
to 0. Finally, the code segment on the right side of Figure 1.1.3 first checks if 𝑎 > 0 in
which case the variable 𝑠 is set to 1. Next, if 𝑎 = 0, 𝑠 is set to 0. Finally, 𝑠 is set to −1 if
𝑎 < 0. The result is the sign 𝑠 of 𝑎.
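The three code segments of Figure 1.1.3, which is not reproduced here, can be sketched in Python as follows (function names are ours):

```python
def absolute_value(a):
    """Left segment: if without else."""
    if a < 0:
        a = -a
    return a

def divisible_by_11(a):
    """Middle segment: if/else, with 1 for 'true' and 0 for 'false'."""
    if a % 11 == 0:
        s = 1
    else:
        s = 0
    return s

def sign(a):
    """Right segment: if / else if / else computing the sign of a."""
    if a > 0:
        s = 1
    elif a == 0:
        s = 0
    else:
        s = -1
    return s
```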
A computation terminates when the return instruction is used. This instruction
makes the result available for further use and takes the form
(1.1.9) return(𝑎)
where 𝑎 is an element or variable of some data type or a sequence of such objects. The
time and space requirement for this operation is O(𝑆), where 𝑆 represents the sum of
the sizes of the objects in the return statement.
Algorithms can also return the result of expressions. For instance, a return instruction may be of the form

(1.1.10) return(𝑏 + 3).
In this case, the expression is evaluated first, and then the result is returned. The time
and space requirements of this instruction are O(𝑡) and O(𝑠), respectively, where 𝑡 and
𝑠 denote the time and space required to evaluate the corresponding expression.
Finally, a call to a subroutine is also considered a valid instruction in our model.
It takes the form
(1.1.11) 𝑎 ← 𝐴(𝑏)
Here, the call 𝗉𝗈𝗐𝖾𝗋(𝑎, 𝑒) invokes the subroutine, returning the result 𝑎^𝑒, where 𝑎 and 𝑒 are both nonnegative integers.
The time and space requirements associated with a subroutine call are O(𝑡) and
O(𝑠), respectively, where 𝑡 and 𝑠 represent the time and space required by the subrou-
tine to execute. Section 1.4 provides an explanation of how these requirements are
determined.
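The book's power subroutine is not shown in this excerpt; a minimal Python sketch consistent with the description (returning 𝑎^𝑒 for nonnegative integers 𝑎 and 𝑒) is:

```python
def power(a, e):
    """Return a**e for nonnegative integers a and e."""
    p = 1
    for _ in range(e):  # multiply by a, e times
        p = p * a
    return p

b = power(2, 10)  # a subroutine call of the form a <- A(b)
assert b == 1024
```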
But in the next sections, we formally define the properties of algorithms. These defini-
tions can be made precise if a formal model of computation is used, such as the Turing
machine model.
We illustrate our algorithm model using the Euclidean algorithm. The correspond-
ing pseudocode is shown in Algorithm 1.1.16.
(1) Each run of the algorithm with a permitted input carries out a return instruction.
This means that the algorithm terminates on any input 𝑎 ∈ Input(𝐴).
(2) When the algorithm performs a return instruction, the return value is correct;
i.e., it has the property specified in the Output statement.
(3) Executing the return instruction is the only way the algorithm can terminate.
This means that after executing a statement that is not a return instruction there
is always a next instruction that the algorithm carries out.
Example 1.1.17. We describe the run of the Euclidean algorithm with input (𝑎, 𝑏) =
(100, 35). The instructions in lines 2 and 3 replace 𝑎 and 𝑏 by their absolute values. For
the chosen input, they have no effect. Since 𝑏 = 35, the while condition is satisfied.
Hence, the Euclidean algorithm executes 𝑟 ← 100 mod 35 = 30, 𝑎 ← 𝑏 = 35, and
𝑏 ← 𝑟 = 30. After this, the while condition is still satisfied since 𝑏 = 30. So the
Euclidean algorithm executes 𝑟 ← 35 mod 30 = 5, 𝑎 ← 𝑏 = 30, and 𝑏 ← 𝑟 = 5. Also,
after this iteration of the while loop, the while condition is still satisfied since 𝑏 = 5.
The Euclidean algorithm executes 𝑟 ← 30 mod 5 = 0, 𝑎 ← 𝑏 = 5, and 𝑏 ← 𝑟 = 0. Now,
the while condition is violated. So the while loop is no longer executed. Instead, the
return instruction following end while is carried out. This means that the algorithm
returns 5 which is gcd(100, 35).
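Algorithm 1.1.16 itself is not reproduced in this excerpt, but the run traced in Example 1.1.17 determines its structure. A Python sketch (the line numbers mentioned in the text refer to the book's pseudocode, not to this sketch):

```python
def euclid(a, b):
    """Euclidean algorithm for gcd(a, b), following the run in Example 1.1.17."""
    a, b = abs(a), abs(b)   # lines 2 and 3: replace a, b by absolute values
    while b != 0:           # line 4: while condition
        r = a % b
        a = b
        b = r
    return a                # return instruction after end while

assert euclid(100, 35) == 5  # the run traced in Example 1.1.17
```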
Table 1.1.5. Beginning of the run of the Euclidean algorithm with input (100, 35).
Table 1.1.6. End of the run of the Euclidean algorithm with input (100, 35).
algorithm”. For instance, consider State 3 in Table 1.1.5. The value of 𝑏 is 35. So the
while condition is satisfied. The execution of the while instruction does not change
the values of 𝑎, 𝑏, or 𝑟 and causes the next instruction to be 𝑟 ← 𝑎 mod 𝑏. So State 4 is
uniquely determined by State 3.
Since we require deterministic algorithms to always terminate, the same state can-
not occur repeatedly in an algorithm run. Otherwise, the algorithm would enter an
infinite loop. In other words, the states in algorithm runs are pairwise different.
It is important to prove the correctness of an algorithm. This means that on input of
any 𝑎 ∈ Input(𝐴) the algorithm terminates and its output has the specified properties.
In Example 1.1.18, we present the correctness proof of the Euclidean algorithm.
Example 1.1.18. We prove the correctness of the Euclidean algorithm. First, note that
after 𝑏 is replaced by its absolute value, the sequence of values of 𝑏 is strictly decreasing
since starting from the second 𝑏, any such value is the remainder of a division by the
previous value of 𝑏. So at some point, we must have 𝑏 = 0 which means that the
algorithm terminates. Next, as Exercise 1.1.19 shows, the value of gcd(𝑎, 𝑏) in line 4
is always the same. But when the algorithm terminates, we have 𝑏 = 0 and therefore
gcd(𝑎, 𝑏) = gcd(𝑎, 0) = 𝑎. The fact that gcd(𝑎, 𝑏) does not change is called an algorithm
invariant. Such invariants are frequently used in correctness proofs of algorithms.
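The invariant can also be checked numerically; this illustrates, but does not prove, the claim of Exercise 1.1.19 (Python's math.gcd serves as an independent reference):

```python
from math import gcd

def euclid_invariant_holds(a, b):
    """Check that gcd(a, b) is unchanged by each while-loop iteration,
    i.e. gcd(a, b) == gcd(b, a mod b) whenever b != 0."""
    a, b = abs(a), abs(b)
    while b != 0:
        if gcd(a, b) != gcd(b, a % b):
            return False
        a, b = b, a % b
    return True

assert all(euclid_invariant_holds(a, b) for a in range(50) for b in range(50))
```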
Exercise 1.1.19. Show that in line 4 of the Euclidean algorithm, the value of gcd(𝑎, 𝑏)
is always the same.
Exercise 1.1.22. Let 𝑎 = 35. Determine the first three and the last three states of the
run of Algorithm 1.1.21.
There is a close connection between decision and more general algorithms. For ex-
ample, as shown in Example 1.4.21, an algorithm that decides whether an integer has
a proper divisor below a given bound can be transformed into an integer factoring al-
gorithm with almost the same efficiency. This can be generalized to many algorithmic
problems.
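One way such a transformation can work is binary search over the bound. The sketch below is illustrative and not necessarily the construction of Example 1.4.21, which lies outside this excerpt; the brute-force decision algorithm is a stand-in for any faster one:

```python
from math import isqrt

def has_divisor_up_to(n, b):
    """Decision algorithm: does n have a proper divisor d with 2 <= d <= b?
    (Brute-force stand-in; any faster decision algorithm could be plugged in.)"""
    return any(n % d == 0 for d in range(2, b + 1))

def smallest_proper_divisor(n):
    """Binary search for the smallest proper divisor of n, using only
    O(log n) calls to the decision algorithm. Returns None if n is prime."""
    if n < 4 or not has_divisor_up_to(n, isqrt(n)):
        return None  # a composite n has a divisor at most sqrt(n)
    lo, hi = 2, isqrt(n)
    while lo < hi:
        mid = (lo + hi) // 2
        if has_divisor_up_to(n, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

def factor(n):
    """Full factorization by repeatedly splitting off the smallest divisor."""
    factors = []
    while n > 1:
        d = smallest_proper_divisor(n) or n
        factors.append(d)
        n //= d
    return factors

assert factor(100) == [2, 2, 5, 5]
```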
1.1.7. Time and space complexity. Let 𝐴 be an algorithm. Its efficiency de-
pends on the time complexity and the memory requirements of 𝐴 which we discuss in
this section.
Definition 1.1.25. (1) The running time or time complexity of 𝐴 for a particular in-
put 𝑎 ∈ Input(𝐴) is the sum of the time required for reading the input 𝑎, which is O(size(𝑎)), and the running times of the instructions executed during the algo-
rithm run with input 𝑎.
(2) The worst-case running time or worst-case time complexity of 𝐴 is the function

(1.1.13) wTime_𝐴 : ℕ → ℝ_{≥0}

that sends a positive integer 𝑛, which is the size of an input of 𝐴, to the maximum running time of 𝐴 over all inputs of size 𝑛. If 𝑛 is not the size of any input of 𝐴, then we set wTime_𝐴(𝑛) = 0.
Using Definitions 1.1.25 and 1.1.26, we define the asymptotic time and space
complexity of deterministic algorithms.
Definition 1.1.27. Let 𝑓 ∶ ℕ → ℝ>0 be a function. We say that 𝐴 has asymptotic worst-
case running time or space complexity O(𝑓) if wTime𝐴 = O(𝑓) or wSpace𝐴 = O(𝑓),
respectively. The words “asymptotic” and “worst-case” may also be omitted.
It is common to use special names for certain time and space complexities. Several
of these names are listed in Table 1.1.7.
Exercise 1.1.28. Show that quasilinear complexity can also be written as 𝑛^{1+o(1)}, polynomial complexity as 𝑛^{O(1)} or 2^{O(log 𝑛)}, subexponential complexity as 2^{o(𝑛)}, and exponential complexity as 2^{𝑛^{O(1)}}.
Example 1.1.29. We analyze the time and space complexity of the Euclidean Algo-
rithm 1.1.16. Let 𝑎, 𝑏 ∈ ℤ be the input of the algorithm, and let 𝑛 be the maximum
of size(𝑎) and size(𝑏). The time to read the input 𝑎, 𝑏 is O(𝑛). After the operations in
lines 2 and 3 we have 𝑎, 𝑏 ≥ 0. The time and space complexity of these instructions
is O(𝑛). If 𝑏 = 0, then the while loop is not executed and 𝑎 is returned, which takes
time O(𝑛). If 𝑏 ≠ 0 and 𝑎 ≤ 𝑏, then after the first iteration of the while loop, we have
𝑏 < 𝑎. It follows from Exercise 1.1.30 that the total number of executions of the while
loop is O(𝑛). Also, by this exercise, the size of the operands used in the executions of
the while loop is O(𝑛). So, the running time of each iteration is O(𝑛^2) and the space
requirement is O(𝑛). This shows that the worst-case running time of the Euclidean
algorithm is O(𝑛^3) and the worst-case space complexity is O(𝑛). Thus, the Euclidean
algorithm has cubic running time. Using more complicated arguments, it can even be
shown that this algorithm has quadratic running time (see Theorem 1.10.5 in [Buc04]).
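The analysis can be tried out concretely. The following Python sketch is our own transcription of the Euclidean Algorithm 1.1.16 (the function name and the iteration counter are not part of the book's pseudocode); it also counts the while-loop iterations, which stay linear in the bit length of the inputs, as Exercise 1.1.30 predicts.

```python
def euclid(a, b):
    """Euclidean algorithm; returns (gcd, number of while-loop iterations)."""
    a, b = abs(a), abs(b)      # lines 2-3: make both operands nonnegative
    iterations = 0
    while b != 0:
        a, b = b, a % b        # one remainder step of the while loop
        iterations += 1
    return a, iterations
```

For 64-bit inputs the loop runs well under 128 times, consistent with the O(𝑛) bound on the number of iterations.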
What is the practical relevance of worst-case running times when comparing algo-
rithms? Let us take two algorithms, 𝐴 and 𝐴′ , both designed to solve the same problem,
such as computing greatest common divisors. It is essential to note that if algorithm
𝐴 has a smaller asymptotic running time than algorithm 𝐴′ , it does not automatically
make 𝐴 superior to 𝐴′ in practice. This comparison only indicates that 𝐴 is faster than
𝐴′ for inputs greater than a certain length. However, this input length can be so large
that it becomes irrelevant for most real-world use cases.
For example, in [AHU74] it is shown that for any integer multiplication algorithm
with a worst-case time complexity of 𝑀(𝑛), there exists a gcd algorithm with a worst-
case time complexity of O(𝑀(𝑛) log(𝑛)). Additionally, [HvdH21] presents an integer
multiplication algorithm with a worst-case running time of O(𝑛 log 𝑛). As a result,
there is a corresponding gcd algorithm with a worst-case running time of O(𝑛 log^2 𝑛).
However, this improved complexity only outperforms the O(𝑛2 ) algorithm for very
large integers, which may not occur in most common input sizes.
Exercise 1.1.30. Let 𝑎, 𝑏 ∈ ℕ, 𝑎 > 𝑏, be the input of the Euclidean algorithm. Let
𝑟0 = 𝑎 and 𝑟1 = 𝑏. Denote by 𝑘 the number of iterations of the while loop executed
in the algorithm and denote by 𝑟2 , 𝑟3 , . . . , 𝑟 𝑘+1 the sequence of remainders 𝑟 which are
computed in line 5 of the Euclidean algorithm. Prove that the sequence (𝑟 𝑖 )0≤𝑖≤𝑘+1 is
strictly decreasing and that 𝑟 𝑖+2 < 𝑟 𝑖 /2 for all 𝑖 ∈ ℤ𝑘 . Conclude that 𝑘 = O(size 𝑎).
Example 1.1.31. We determine the worst-case time and space complexity of the Deterministic Factoring Algorithm 1.1.21. Let 𝑛 = bitLength 𝑎. The number of iterations
of the for loop in this algorithm is O(2^{𝑛/2}). Each iteration of the for loop requires time
O(𝑛^2) and space O(𝑛). Hence, the worst-case time complexity of Algorithm 1.1.21 is
O(𝑛^2 2^{𝑛/2}) = 2^{O(𝑛)} and the worst-case space complexity is O(𝑛). So, the algorithm has
exponential running time and linear space complexity.
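Algorithm 1.1.21 is not reproduced in this excerpt; the following Python sketch shows a deterministic factoring routine with the same complexity profile, assuming it performs trial division by all candidates up to √𝑎, so that an 𝑛-bit input costs O(2^{𝑛/2}) iterations of polynomial cost each.

```python
import math

def deterministic_factor(a):
    """Trial division by all candidates d = 2, ..., floor(sqrt(a)).
    For an n-bit input a this loop runs O(2^(n/2)) times, and each
    division costs O(n^2) bit operations, as in Example 1.1.31."""
    for d in range(2, math.isqrt(a) + 1):
        if a % d == 0:
            return d   # smallest proper divisor of a
    return None        # no proper divisor up to sqrt(a): a is prime or a < 4
```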
16 1. Classical Computation
(1) The probabilistic algorithm 𝐴 may call the subroutine coinToss. It returns 0 or 1,
both with probability 1/2.
(2) The probabilistic algorithm 𝐴 may call other probabilistic algorithms as subroutines
if they satisfy the following condition. Given a permitted input, they terminate
and return one of finitely many possible outputs according to a probability distri-
bution.
(3) The run of 𝐴 on input of some 𝑎 ∈ Input(𝐴) may depend on 𝑎 and the return
values of the probabilistic subroutines called during the run of the algorithm.
Therefore, in contrast to deterministic algorithms, this run may not be uniquely
determined by 𝑎.
(4) 𝐴 may not terminate, since termination may depend on certain return values of
some probabilistic subroutine that may never occur.
(5) Let 𝑎 ∈ Input(𝐴) and suppose that 𝐴 terminates on input of 𝑎 with output 𝑜.
Then 𝑜 may not be uniquely determined by 𝑎, but may also depend on the return
values of the probabilistic subroutine calls during the run of 𝐴. Also, we may have
𝑜 ∈ Output(𝐴, 𝑎); or 𝑜 = “Failure”, which indicates that the algorithm did not find
a correct output; or 𝑜 may have neither of these properties.
(6) Due to the special meaning of the return value “Failure”, it must never be a correct
output.
Algorithm 1.2.2. Selecting a uniformly distributed random bit string of fixed length
Input: 𝑘 ∈ ℕ
Output: 𝑠 ∈ {0, 1}𝑘
1: randomString(𝑘)
2: for 𝑖 = 0 to 𝑘 − 1 do
3: 𝑠𝑖 ← coinToss
4: end for
5: return 𝑠 ⃗ = (𝑠0 , . . . , 𝑠𝑘−1 )
6: end
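A direct Python transcription of Algorithm 1.2.2, with the subroutine coinToss simulated by a pseudorandom generator (a genuine coinToss would use a physical source of randomness):

```python
import random

def coin_toss():
    """The probabilistic subroutine coinToss: 0 or 1, each with probability 1/2."""
    return random.randrange(2)

def random_string(k):
    """Algorithm 1.2.2: a uniformly distributed bit string of length k."""
    return [coin_toss() for _ in range(k)]
```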
numbers. Moreover, as shown in [AGP94], there are infinitely many Carmichael num-
bers. Since Carmichael numbers are composite, the Fermat test will return 0 for these
inputs, making the algorithm non-error-free.
Exercise 1.2.5. (1) Write pseudocode for the Fermat test described in Example 1.2.4.
(2) Find a composite number 𝑎 such that on input of 𝑎 the algorithm of Example 1.2.4
sometimes returns 0 and sometimes 1.
Example 1.2.9. Algorithm 1.2.10 is a Las Vegas factoring algorithm which calls
𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋 until a proper divisor of 𝑎 is found. This may take forever. But if the al-
gorithm terminates, then it is successful.
The approach used in Algorithm 1.2.10 can be extended to create a more general
version, allowing any error-free Monte Carlo algorithm 𝐴 to be transformed into a Las
Vegas algorithm. This transformation is achieved through Algorithm 1.2.11. When
given an input 𝑎 ∈ Input(𝐴), this algorithm repeatedly executes 𝐴(𝑎) until a successful
outcome is obtained. As this algorithm is akin to performing a Bernoulli experiment,
we refer to it as the Bernoulli algorithm associated with 𝐴.
Algorithm 1.2.11. Bernoulli algorithm associated with an error-free Monte Carlo al-
gorithm 𝐴
Input: 𝑎 ∈ Input(𝐴)
Output: 𝑏 ∈ Output(𝐴, 𝑎)
1: 𝖻𝖾𝗋𝗇𝗈𝗎𝗅𝗅𝗂𝐴 (𝑎)
2: 𝑏 ← “Failure”
3: while 𝑏 = “Failure” do
4: 𝑏 ← 𝐴(𝑎)
5: end while
6: return 𝑏
7: end
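A Python sketch of Algorithm 1.2.11. The subroutine mc_guess_divisor is a hypothetical error-free Monte Carlo algorithm invented here for illustration, in the spirit of mcFactor; the fixed seed only makes the sketch reproducible.

```python
import random

def bernoulli(A, a):
    """Algorithm 1.2.11: repeat the error-free Monte Carlo algorithm A(a)
    until the return value is not "Failure"."""
    b = "Failure"
    while b == "Failure":
        b = A(a)
    return b

def mc_guess_divisor(a):
    """Hypothetical error-free Monte Carlo subroutine: guess one candidate
    divisor at random; report "Failure" when the guess does not divide a."""
    d = random.randrange(2, a)
    return d if a % d == 0 else "Failure"

random.seed(0)  # fixed seed so the run is reproducible
divisor = bernoulli(mc_guess_divisor, 15)  # a proper divisor of 15
```

As in the text, the loop may in principle run forever, but it terminates with probability 1 and, on termination, the output is always correct.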
On the other hand, every Las Vegas algorithm can indeed be transformed into an
error-free Monte Carlo algorithm. This conversion entails monitoring the number of
calls made to the probabilistic subroutines while the algorithm runs. The algorithm
terminates if the Las Vegas algorithm produces a successful outcome or if the count of
subroutine calls exceeds a predetermined threshold value, which may vary depending
on the specific input of the algorithm. In the event of success, the algorithm returns the
output of the Las Vegas algorithm. However, if the threshold is surpassed, it returns
the result “Failure.”
Exercise 1.2.12. Change Algorithm 1.2.10 to an error-free Monte Carlo algorithm that,
on input of 𝑎 ∈ ℤ>1 , performs at most bitLength(𝑎) coin tosses.
membership of 𝑠 ⃗ ∈ {0, 1}∗ in a language 𝐿 ⊂ {0, 1}∗ . Such an algorithm always re-
turns 1 or 0. It satisfies Output(𝐴, 𝑠)⃗ = {1} for all 𝑠 ⃗ ∈ 𝐿 and Output(𝐴, 𝑠)⃗ = {0} for all
𝑠 ⃗ ∈ {0, 1}∗ ⧵ 𝐿. However, recall that runs of probabilistic decision algorithms do not
have to be successful. So, the algorithm may return 0 if 𝑠 ⃗ ∈ 𝐿 and 1 if 𝑠 ⃗ ∈ {0, 1}∗ ⧵ 𝐿.
There are three different types of probabilistic decision algorithms. To define them,
let 𝐴 be a probabilistic algorithm that decides a language 𝐿.
(1) 𝐴 is called true-biased if it never returns false positives. So, if on input of 𝑠 ⃗ ∈ {0, 1}∗
the algorithm returns 1, then 𝑠 ⃗ ∈ 𝐿.
(2) 𝐴 is called false-biased if it never returns false negatives. So, if on input of
𝑠 ⃗ ∈ {0, 1}∗ the algorithm returns 0, then 𝑠 ⃗ ∉ 𝐿.
(3) If 𝐴 is true-biased or false-biased, then it is also called an algorithm with one-sided
error.
(4) 𝐴 is called an algorithm with two-sided error if it can return false positives and
false negatives.
Note that a false-biased algorithm can always be transformed into a true-biased
algorithm. We only need to replace the language to be decided by its complement in
{0, 1}∗ and swap the outputs 0 and 1.
Example 1.2.14. Consider Algorithm 1.2.15 that decides whether or not the integer
that corresponds to a string in {0, 1}∗ is composite. On input of 𝑠 ⃗ ∈ {0, 1}∗ , the
algorithm computes the corresponding integer 𝑎 and calls 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋. If this subroutine
returns a proper divisor of 𝑎, then the algorithm returns 1. Otherwise, it returns 0.
This is a true-biased Monte Carlo decision algorithm. If it returns 1, then 𝑠 ⃗ represents
a composite integer. But if it returns 0, then the integer represented by 𝑠 ⃗ may or may
not be composite.
returns a false negative answer if 𝑎 is composite and coinToss and 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋 both return
0. Also, it returns a false positive answer if 𝑎 is a prime number and coinToss gives 1.
Algorithm 1.2.17. Monte Carlo compositeness decision algorithm with two-sided er-
ror
Input: 𝑠 ⃗ ∈ {0, 1}∗
Output: 1 if stringToInt(𝑠)⃗ is composite and 0 otherwise
1: 𝗆𝖼𝖢𝗈𝗆𝗉𝗈𝗌𝗂𝗍𝖾2(𝑠)⃗
2: 𝑎 ← stringToInt(𝑠)⃗
3: 𝑐 ← coinToss
4: 𝑏 ← 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋(𝑎)
5: if 𝑐 = 1 ∨ 𝑏 ∈ ℕ then
6: return 1
7: else
8: return 0
9: end if
10: end
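A Python sketch of Algorithm 1.2.17, with mcFactor replaced by a simplified stand-in (one uniformly random candidate divisor) and the input taken directly as an integer rather than a bit string:

```python
import random

def coin_toss():
    """coinToss: 0 or 1, each with probability 1/2."""
    return random.randrange(2)

def mc_factor(a):
    """Simplified stand-in for mcFactor: one uniform random candidate divisor."""
    d = random.randrange(2, a)
    return d if a % d == 0 else "Failure"

def mc_composite2(a):
    """Algorithm 1.2.17 on an integer input: two-sided-error compositeness test.
    Returns 1 if the coin shows 1 or a proper divisor was found."""
    c = coin_toss()
    b = mc_factor(a)
    return 1 if c == 1 or b != "Failure" else 0
```

Both error types are visible here: a prime input yields a false positive whenever the coin shows 1, and a composite input yields a false negative when the coin shows 0 and the divisor guess fails.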
1.3.1. A discrete probability space. Our first goal is to define a discrete prob-
ability space that is the basis of the analyses. In this section, 𝐴 denotes a probabilistic
algorithm. We first introduce some notation.
Consider a run 𝑅 of 𝐴 with input 𝑎 ∈ Input(𝐴) and let 𝑙 ∈ ℕ0 ∪ {∞} be the num-
ber of probabilistic subroutine calls in 𝑅. For instance, in Algorithm 1.2.2 we have
Input(𝐴) = ℕ and for 𝑎 ∈ ℕ it holds that 𝑙 = 𝑎. In contrast, in Algorithm 1.2.10, the
number 𝑙 of probabilistic subroutine calls may be infinite.
For all 𝑘 ∈ ℕ, 𝑘 ≤ 𝑙, let 𝑎𝑘 be the input of the 𝑘th probabilistic subroutine call in 𝑅 if
this subroutine requires an input, let 𝑟 𝑘 be its output, and let 𝑝 𝑘 be the probability that
on input of 𝑎𝑘 the output 𝑟 𝑘 occurs. These quantities are well-defined since we require
that probabilistic algorithms may only use probabilistic subroutines that on any input
terminate and return one of finitely many possible outputs according to some proba-
bility distribution. For example, for the probabilistic subroutine coinToss there is no
input, the output is 0 or 1, and the probability of both outputs is 1/2. We call 𝑟 ⃗ = (𝑟𝑘 )𝑘≤𝑙
the random sequence of the run 𝑅. So the random sequence of a run of randomInt with
input 𝑎 ∈ ℕ is in {0, 1}^𝑎.
We denote the set of all random sequences of runs of 𝐴 with input 𝑎 by Rand(𝐴, 𝑎)
and the set of finite strings in Rand(𝐴, 𝑎) by FRand(𝐴, 𝑎). So for 𝐴 = randomInt
and 𝑎 ∈ ℕ we have Rand(𝐴, 𝑎) = FRand(𝐴, 𝑎) = {0, 1}^𝑎. We note that for any
𝑎 ∈ Input(𝐴), each 𝑟 ⃗ ∈ Rand(𝐴, 𝑎) is the random sequence of exactly one run of 𝐴. We
call it the run of 𝐴 corresponding to 𝑟.⃗ This run terminates if and only if 𝑟 ⃗ ∈ FRand(𝐴, 𝑎)
in which case we write 𝐴(𝑎, 𝑟)⃗ for the return value of this run.
Finally, let 𝑘 ∈ ℕ0 , 𝑘 ≤ 𝑙, and let 𝑟 ⃗ = (𝑟0 , . . . , 𝑟 𝑘−1 ) be a prefix of a random sequence
of a run of 𝐴 with input 𝑎. Also, for 0 ≤ 𝑖 < 𝑘 denote by 𝑝 𝑖 the probability for the return
value 𝑟 𝑖 to occur. Then we set
(1.3.1) Pr𝐴,𝑎 (𝑟)⃗ = ∏_{𝑖=0}^{𝑘−1} 𝑝𝑖 .
This is the probability that 𝑟 ⃗ occurs as the prefix of the random sequence of a run of 𝐴
with input 𝑎. For instance, if 𝐴 = randomInt and 𝑎 ∈ ℕ, then for all 𝑘 ∈ ℕ0 with 𝑘 ≤ 𝑎
and all 𝑟 ⃗ ∈ {0, 1}^𝑘, we have Pr𝐴,𝑎 (𝑟)⃗ = 1/2^𝑘.
Exercise 1.3.1. Determine Rand(𝐴, 𝑎), FRand(𝐴, 𝑎), and Pr𝐴,𝑎 for 𝐴 = 𝗅𝗏𝖥𝖺𝖼𝗍𝗈𝗋 spec-
ified in Algorithm 1.2.10 and 𝑎 ∈ Input(𝐴) = ℕ>1 .
The next lemma allows the definition of the probability distribution that we are
looking for.
Lemma 1.3.2. Let 𝑎 ∈ Input(𝐴). The (possibly infinite) sum
(1.3.2) ∑_{𝑟⃗∈FRand(𝐴,𝑎)} Pr𝐴,𝑎 (𝑟)⃗
converges, its limit is in the interval [0, 1], and it is independent of the ordering of the terms
in the sum.
Proof. First, note the following. If the sum in (1.3.2) is convergent, then it is absolutely
convergent since the terms in the sum are nonnegative. So Theorem C.1.4 implies that
its limit is independent of the ordering of the terms in the sum.
To prove the convergence of the sum, set
(1.3.3) 𝑡𝑘 = ∑_{𝑟⃗∈FRand(𝐴,𝑎), |𝑟⃗|≤𝑘} Pr𝐴,𝑎 (𝑟)⃗
for all 𝑘 ∈ ℕ0 . Then the sum in (1.3.2) is convergent if and only if the sequence (𝑡 𝑘 ) con-
verges. For 𝑘 ∈ ℕ0 let Rand𝑘 be the set of all prefixes of length at most 𝑘 of sequences
in Rand(𝐴, 𝑎). We will prove below that
(1.3.4) ∑_{𝑟⃗∈Rand𝑘} Pr𝐴,𝑎 (𝑟)⃗ = 1
for all 𝑘 ∈ ℕ0 . Since the sequence (𝑡𝑘 ) is nondecreasing and, by (1.3.4), bounded above
by 1, this proves the convergence of (𝑡𝑘 ) and thus of the infinite sum (1.3.2).
We will now prove (1.3.4) by induction on 𝑘. Since Rand0 only contains the empty
sequence, (1.3.4) holds for 𝑘 = 0. For the inductive step, assume that 𝑘 ∈ ℕ0 and
that (1.3.4) holds for 𝑘. Denote by Rand′𝑘 the set of all sequences of length at most 𝑘
in Rand(𝐴, 𝑎) and denote by Rand″𝑘 the set of sequences of length 𝑘 that are proper
prefixes of strings in Rand(𝐴, 𝑎). For 𝑟 ⃗ ∈ Rand″𝑘 let 𝑚(𝑟)⃗ be the number of possible
outputs of the (𝑘 + 1)st call of a probabilistic subroutine when the sequence of return
values of the previous calls was 𝑟,⃗ let 𝑠𝑖 (𝑟)⃗ be the 𝑖th of these outputs, and let 𝑝𝑖 (𝑟)⃗ be its
probability. These quantities exist by the definition of probabilistic algorithms. Then
we have

(1.3.6) Rand𝑘+1 = Rand′𝑘 ∪ {𝑟⃗‖𝑠𝑖 (𝑟)⃗ ∶ 𝑟 ⃗ ∈ Rand″𝑘 and 1 ≤ 𝑖 ≤ 𝑚(𝑟)}.⃗
Also, we have

(1.3.7) ∑_{𝑖=1}^{𝑚(𝑟⃗)} 𝑝𝑖 (𝑟)⃗ = 1

for all 𝑟 ⃗ ∈ Rand″𝑘 . This implies

∑_{𝑟⃗∈Rand𝑘+1} Pr𝐴,𝑎 (𝑟)⃗ = ∑_{𝑟⃗∈Rand′𝑘} Pr𝐴,𝑎 (𝑟)⃗ + ∑_{𝑟⃗∈Rand″𝑘} ∑_{𝑖=1}^{𝑚(𝑟⃗)} Pr𝐴,𝑎 (𝑟⃗‖𝑠𝑖 (𝑟))⃗
= ∑_{𝑟⃗∈Rand′𝑘} Pr𝐴,𝑎 (𝑟)⃗ + ∑_{𝑟⃗∈Rand″𝑘} Pr𝐴,𝑎 (𝑟)⃗ ∑_{𝑖=1}^{𝑚(𝑟⃗)} 𝑝𝑖 (𝑟)⃗
= ∑_{𝑟⃗∈Rand𝑘} Pr𝐴,𝑎 (𝑟)⃗ = 1. □
Lemma 1.3.2 allows the definition of the probability distribution that we are look-
ing for. This is done in the following proposition.
Proposition 1.3.3. Let 𝑎 ∈ Input(𝐴) and set Pr𝐴,𝑎 (∞) = 1 − ∑_{𝑟⃗∈FRand(𝐴,𝑎)} Pr𝐴,𝑎 (𝑟).⃗
Then (FRand(𝐴, 𝑎) ∪ {∞}, Pr𝐴,𝑎 ) is a discrete probability space. Also, if Pr𝐴,𝑎 (∞) = 0,
then (FRand(𝐴, 𝑎), Pr𝐴,𝑎 ) is a discrete probability space.
For 𝑎 ∈ Input(𝐴) and 𝑟 ⃗ ∈ FRand(𝐴, 𝑎), the value Pr𝐴,𝑎 (𝑟)⃗ is the probability that
𝑟 ⃗ is the random sequence of a run of 𝐴 with input 𝑎. Also, Pr𝐴,𝑎 (∞) is the probability
that on input of 𝑎, the algorithm 𝐴 does not terminate.
An important type of algorithms 𝐴 that satisfy Pr𝐴,𝑎 (∞) = 0 for all 𝑎 ∈ Input(𝐴)
is Monte Carlo algorithms. We now show that they are exactly the probabilistic algo-
rithms that, according to the specification in Section 1.2.1, can be called by probabilistic
algorithms as subroutines.
Proposition 1.3.5. Let 𝐴 be a Monte Carlo algorithm and let 𝑎 ∈ Input(𝐴). Then the
following hold.
(1) The running time of 𝐴 on input of 𝑎 is bounded by some 𝑘 ∈ ℕ that may depend on
𝑎.
(2) On input of 𝑎, algorithm 𝐴 returns one of finitely many possible outputs according
to a probability distribution.
Proof. We first show that the length of all 𝑟 ⃗ ∈ FRand(𝐴, 𝑎) is bounded by some 𝑘 ∈ ℕ.
This shows that there are only finitely many possible runs of 𝐴 on input of 𝑎 which
implies the first assertion.
Assume that no such upper bound exists. We inductively construct prefixes 𝑟𝑘⃗ =
(𝑟0 , . . . , 𝑟𝑘 ), 𝑘 ∈ ℕ0 , of an infinite sequence 𝑟 ⃗ = (𝑟0 , 𝑟1 , . . .) that are also prefixes of
arbitrarily long strings in Rand(𝐴, 𝑎); that is, for all 𝑘 ∈ ℕ0 and 𝑙 ∈ ℕ the sequence 𝑟 𝑘⃗
is a prefix of a sequence in Rand(𝐴, 𝑎) of length at least 𝑙. Then 𝑟 ⃗ is an infinite sequence
in Rand(𝐴, 𝑎) that contradicts the assumption that 𝐴 is a Monte Carlo algorithm.
For the base case, we set 𝑟0⃗ = (). This is a prefix of all strings in Rand(𝐴, 𝑎) that, by
our assumption, may be arbitrarily long. For the inductive step, assume that 𝑘 ∈ ℕ and
that we have constructed 𝑟 𝑘−1 ⃗ = (𝑟0 , . . . , 𝑟 𝑘−1 ). By the definition of probabilistic algo-
rithms, there are finitely many possibilities to select 𝑟 𝑘 in such a way that the sequence
(𝑟0 , . . . , 𝑟 𝑘 ) is the prefix of a string in Rand(𝐴, 𝑎). For at least one of these choices, this
sequence is a prefix of arbitrarily long strings in Rand(𝐴, 𝑎) because, by the induction
hypothesis, 𝑟 𝑘−1 ⃗ has this property. We select such an 𝑟 𝑘 and this concludes the induc-
tive construction and the proof of the first assertion.
Together with Proposition 1.3.3, the first assertion of the proposition implies the
second one. □
and call this quantity the success probability of 𝐴 on input of 𝑎. Also, the value
(1.3.11) 𝑞𝐴 (𝑎) = 1 − 𝑝𝐴 (𝑎)
is called the failure probability of 𝐴 on input of 𝑎.
Exercise 1.3.7. Prove that for all 𝑎 ∈ Input(𝐴), the sum in (1.3.10) is convergent and
its limit is independent of the ordering of the terms in the sum.
Example 1.3.8. Let 𝐴 = 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋 specified in Algorithm 1.2.8 and let 𝑎 ∈ Input(𝐴) =
ℕ>1 . Then Randsucc (𝐴, 𝑎) is the set of all sequences (𝑏) where 𝑏 is a proper divisor of
𝑎 of bitlength at most m(𝑎). By Exercise 1.2.7, this set is not empty. Therefore, the
success probability 𝑝𝐴 (𝑎) of 𝐴 on input of 𝑎 is at least 1/2^{m(𝑎)}.
We can use the definition of the success probability to show that Bernoulli algo-
rithms terminate with probability 1.
Proposition 1.3.9. Let 𝐴 be a Bernoulli algorithm. Then we have Pr𝐴,𝑎 (∞) = 0 for all
𝑎 ∈ Input(𝐴).
Proof. Denote by 𝐴′ the error-free Monte Carlo algorithm used in 𝐴. Let 𝑎 ∈ Input(𝐴′ ).
Then FRand(𝐴, 𝑎) consists of all strings 𝑟 ⃗ = 𝑟1⃗ || ⋯ ||𝑟 𝑘⃗ where 𝑘 ∈ ℕ, 𝑟 𝑖⃗ ∈ Rand(𝐴′ , 𝑎)
for 1 ≤ 𝑖 ≤ 𝑘, 𝐴′ (𝑎, 𝑟 𝑖⃗ ) = “Failure” for 1 ≤ 𝑖 < 𝑘, and 𝐴′ (𝑎, 𝑟 𝑘⃗ ) ≠ “Failure”. So we
obtain

(1.3.12) ∑_{𝑟⃗∈FRand(𝐴,𝑎)} Pr𝐴,𝑎 (𝑟)⃗ = 𝑝𝐴′ (𝑎) ∑_{𝑘=0}^{∞} (1 − 𝑝𝐴′ (𝑎))^𝑘 = 𝑝𝐴′ (𝑎)/𝑝𝐴′ (𝑎) = 1.
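The geometric series in (1.3.12) can be checked numerically. In this Python sketch, the partial sums of the termination probability of a Bernoulli algorithm with per-call success probability 𝑝 approach 1, so the probability of nontermination is 0:

```python
def termination_probability(p, max_k):
    """Partial sums of (1.3.12): probability that a Bernoulli algorithm with
    per-call success probability p succeeds within max_k calls,
    p * sum_{k=0}^{max_k - 1} (1 - p)**k = 1 - (1 - p)**max_k."""
    return p * sum((1 - p) ** k for k in range(max_k))
```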
Example 1.3.11. Let 𝐴 = 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋 which is specified in Algorithm 1.2.8 and let 𝑎 ∈
Input(𝐴). Then FRand(𝐴, 𝑎) is the set of all one-element sequences (𝑏), where 𝑏 is
an integer that can be represented by a bit string of length m(𝑎). So | FRand(𝐴, 𝑎)| ≤
2^{m(𝑎)}. Also, by Proposition 1.3.5 we have Pr𝐴,𝑎 (∞) = 0. So eTime𝐴 (𝑎) is defined. Since
each run of 𝐴 on input 𝑎 has running time O(size^2 𝑎), we have

(1.3.14) eTime𝐴 (𝑎) = O(size^2 𝑎 ∑_{𝑟⃗∈FRand(𝐴,𝑎)} 1/2^{m(𝑎)}) = O(size^2 𝑎).
The next proposition determines the expected running time of Bernoulli algorithms.
Proposition 1.3.12. Let 𝐴 be an error-free Monte Carlo algorithm, let 𝑎 ∈ Input(𝐴),
and let 𝑡 be an upper bound on the running time of 𝐴 with input of 𝑎. Then the expected
running time of 𝖻𝖾𝗋𝗇𝗈𝗎𝗅𝗅𝗂𝐴 (𝑎) specified in Algorithm 1.2.11 is O(𝑡/𝑝𝐴 (𝑎)).
Proof. We use the fact that for all 𝑐 ∈ ℝ with 0 ≤ 𝑐 < 1 we have

(1.3.15) ∑_{𝑘=0}^{∞} 𝑘𝑐^𝑘 = 𝑐/(1 − 𝑐)^2.

So the expected number of calls of 𝐴 until 𝖻𝖾𝗋𝗇𝗈𝗎𝗅𝗅𝗂𝐴 (𝑎) is successful is

(1.3.16) ∑_{𝑘=1}^{∞} 𝑘 𝑝𝐴 (𝑎) 𝑞𝐴 (𝑎)^{𝑘−1} = 𝑝𝐴 (𝑎)/𝑝𝐴 (𝑎)^2 = 1/𝑝𝐴 (𝑎).
The statement about the expected running time is an immediate consequence of this
result. □
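The identities (1.3.15) and (1.3.16) can be checked numerically with a Python sketch (truncating the infinite sums):

```python
def expected_calls(p, max_k=10_000):
    """Truncation of (1.3.16): sum_{k=1}^{max_k} k * p * (1-p)**(k-1),
    which converges to 1/p, the expected number of calls."""
    return sum(k * p * (1 - p) ** (k - 1) for k in range(1, max_k + 1))

# Check (1.3.15) for c = 1/2: sum_{k>=1} k * c**k = c / (1 - c)**2 = 2.
geometric_lhs = sum(k * 0.5 ** k for k in range(1, 200))
```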
Example 1.3.13. Proposition 1.3.12 allows the analysis of 𝗅𝗏𝖥𝖺𝖼𝗍𝗈𝗋 specified in Algorithm 1.2.10. Let 𝑛 ∈ ℕ and let 𝑎 ∈ ℕ>1 be an input of size 𝑛. It follows from Example
1.3.8 that the success probability of 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋(𝑎) is at least 1/2^{m(𝑎)} ≥ 1/2^{𝑛/2+1}. Also,
the worst-case running time of 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋(𝑎) is O(𝑛^2). It therefore follows from Proposition 1.3.12 that the expected running time of 𝗅𝗏𝖥𝖺𝖼𝗍𝗈𝗋(𝑎) is O(𝑛^2 2^{𝑛/2}). So the expected
running time is exponential which shows that this probabilistic algorithm has no advantage over the deterministic Algorithm 1.1.21.
Definition 1.3.15. Let 𝑎 ∈ Input(𝐴). We denote the success probability of 𝗋𝖾𝗉𝖾𝖺𝗍𝐴 (𝑎, 𝑘)
by 𝑝𝐴 (𝑎, 𝑘) and the failure probability of this call by 𝑞𝐴 (𝑎, 𝑘) = 1 − 𝑝𝐴 (𝑎, 𝑘).
The next corollary shows how to choose 𝑘 in order to obtain a desired success prob-
ability. It also gives a lower bound for 𝑘 that corresponds to a given success probability.
Corollary 1.3.17. Let 𝑎 ∈ Input(𝐴) with 𝑝𝐴 (𝑎) > 0 and let 𝜀 ∈ ℝ with 0 < 𝜀 ≤ 1.
(1) If 𝑘 ≥ log(1/𝜀)/𝑝𝐴 (𝑎), then 𝑝𝐴 (𝑎, 𝑘) ≥ 1 − 𝜀.
(2) If 𝑝𝐴 (𝑎, 𝑘) ≥ 1 − 𝜀, then 𝑘 ≥ log(1/𝜀)𝑞𝐴 (𝑎)/𝑝𝐴 (𝑎).
Exercise 1.3.18. Prove Corollary 1.3.17.
Example 1.3.19. Consider 𝐴 = 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋 as specified in Algorithm 1.2.8. In Example
1.3.8, we have seen that 𝑝𝐴 (𝑎) ≥ 1/2^{m(𝑎)} > 0 for all 𝑎 ∈ ℤ>1 . So 𝗋𝖾𝗉𝖾𝖺𝗍𝐴 can be used
to amplify this probability. For example, if we choose 𝜀 = 1/3 and 𝑘 ≥ (log 3)2^{m(𝑎)} ≥
log(1/𝜀)/𝑝𝐴 (𝑎), then Corollary 1.3.17 implies 𝑝𝐴 (𝑎, 𝑘) ≥ 2/3. Since
m(𝑎) ≥ bitLength(𝑎)/2,
this number 𝑘 of calls to 𝐴 is exponential in size 𝑎. Therefore, again, this algorithm
does not give an asymptotic advantage over the deterministic Algorithm 1.1.21.
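A numeric check of Corollary 1.3.17(1) in Python: with 𝑘 ≥ log(1/𝜀)/𝑝 repetitions, the failure probability (1 − 𝑝)^𝑘 drops below 𝜀 (log is the natural logarithm here, matching the corollary):

```python
import math

def repetitions_needed(p, eps):
    """Corollary 1.3.17(1): k >= log(1/eps)/p repetitions give success
    probability at least 1 - eps."""
    return math.ceil(math.log(1 / eps) / p)

def failure_probability(p, k):
    """q_A(a, k) = (1 - p)**k for k independent repetitions."""
    return (1 - p) ** k
```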
We can also amplify the success probability of decision algorithms with errors.
Consider a true-biased decision algorithm 𝐴 that decides a language 𝐿. We can mod-
ify this algorithm to make it an error-free Monte Carlo algorithm. For this, we set
Output(𝐴, 𝑠)⃗ = {1} for all 𝑠 ⃗ ∈ 𝐿, Output(𝐴, 𝑠)⃗ = ∅ for all 𝑠 ⃗ ∈ {0, 1}∗ ⧵ 𝐿 and we re-
place the return value 0 by “Failure”. So, the success probability of 𝐴 can be amplified
using Algorithm 1.3.14. Analogously, the success probability of false-biased decision
algorithms can be amplified.
Next, we consider a Monte Carlo decision algorithm 𝐴 with two-sided error that
decides a language 𝐿. Such an algorithm never gives certainty about whether an input
𝑠 ⃗ ∈ {0, 1}∗ belongs to 𝐿 or not. However, the probability of success can be increased
by using a majority vote. To do this, we run the algorithm 𝑘 times with input 𝑠 ⃗ for
some 𝑘 ∈ ℕ and count the number of positive responses 1 and the number of negative
answers 0 and return 1 or 0 depending on which answer has the majority. This is done
in Algorithm 1.3.20.
Algorithm 1.3.20. Success probability amplifier for a Monte Carlo decision algorithm
𝐴 with two-sided error
Input: 𝑠 ⃗ ∈ {0, 1}∗ , 𝑘 ∈ ℕ
Output: 1 if 𝑠 ⃗ ∈ 𝐿 and 0 if 𝑠 ⃗ ∈ {0, 1}∗ ⧵ 𝐿 where 𝐿 is the language decided by the Monte
Carlo decision algorithm 𝐴 that is used as a subroutine
1: 𝗆𝖺𝗃𝗈𝗋𝗂𝗍𝗒𝖵𝗈𝗍𝖾𝐴 (𝑠,⃗ 𝑘)
2: 𝑙 ← 0
3: for 𝑖 = 1 to 𝑘 do
4: 𝑙 ← 𝑙 + 𝐴(𝑠)⃗
5: end for
6: if 𝑙 > 𝑘/2 then
7: return 1
8: else
9: return 0
10: end if
11: end
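A Python sketch of Algorithm 1.3.20. The subroutine noisy_decider is a hypothetical two-sided-error decider, invented here, that answers correctly with probability 3/4; the fixed seed only makes the sketch reproducible.

```python
import random

def majority_vote(A, s, k):
    """Algorithm 1.3.20: call the two-sided-error decision algorithm A
    k times on input s and return the majority answer."""
    l = sum(A(s) for _ in range(k))
    return 1 if l > k / 2 else 0

def noisy_decider(s):
    """Hypothetical decider with two-sided error: the correct answer for s
    is taken to be 1, and it is returned with probability 3/4."""
    return 1 if random.random() < 0.75 else 0

random.seed(1)
answer = majority_vote(noisy_decider, "1011", 101)
```

With 101 repetitions the majority is wrong only with probability below 𝑒^{−2·101·(1/4)^2}, in line with (1.3.23).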
We will show that under certain conditions, Algorithm 1.3.20 amplifies the success
probability of decision algorithms with two-sided error. For this, we need the following
definition.
Definition 1.3.21. Assume that a Monte Carlo algorithm 𝐴 decides a language 𝐿, let
𝑠 ⃗ ∈ 𝐿, and let 𝑏 ∈ {0, 1}. Then we write Pr(𝐴(𝑠)⃗ = 𝑏) for the probability that on input
of 𝑠 ⃗ the algorithm 𝐴 returns 𝑏.
Proposition 1.3.22. Let 𝐴 be a Monte Carlo algorithm that decides a language 𝐿, let
𝑠 ⃗ ∈ 𝐿, and let 𝜀 ∈ ℝ>0 such that Pr(𝐴(𝑠)⃗ = 1) ≥ 1/2 + 𝜀. Then for all 𝑘 ∈ ℕ we have

(1.3.23) Pr(𝗆𝖺𝗃𝗈𝗋𝗂𝗍𝗒𝖵𝗈𝗍𝖾𝐴 (𝑠,⃗ 𝑘) = 1) > 1 − 𝑒^{−2𝑘𝜀^2}.
Exercise 1.3.24. Use the result in Example 1.4.7 to determine 𝑘 such that
Pr(𝗆𝖺𝗃𝗈𝗋𝗂𝗍𝗒𝖵𝗈𝗍𝖾𝐴 (𝑠,⃗ 𝑘) = 1) ≥ 2/3.
Example 1.4.3. By the square root problem we mean the triplet (ℕ, ℤ, 𝑅) where 𝑅 =
{(𝑎, 𝑏) ∈ ℕ × ℤ ∶ 𝑏^2 = 𝑎, or 𝑏 = 0 if 𝑎 is not a square in ℕ}. An instance of the square
root problem is 4. It has the two solutions −2 and 2. Another instance is 2. It has the
solution 0 that indicates that 2 is not a square in ℕ. We can also define this problem
differently by only allowing problem instances that are squares.
Example 1.4.9. As seen in Example 1.1.29, the gcd problem from Example 1.4.5 can
be solved in deterministic time O(𝑛^3). As noted in this example, the gcd problem can
even be solved in deterministic time O(𝑛^2) or O(𝑛 log^2 𝑛) and linear space. Thus this
problem can be solved in polynomial time or, more precisely, cubic, quadratic, or even
quasilinear time.
Example 1.4.10. As seen in Example 1.1.31, the integer factorization problem can be
solved in deterministic exponential time and linear space.
1.4.3. Complexity classes. In this section, we delve into the definition of com-
plexity classes, which serve to group languages that satisfy specific complexity con-
ditions. The foundation of this concept was laid in the early 1970s. Over the years,
complexity theory has witnessed the introduction of numerous complexity classes, and
extensive research has been conducted to study their interrelationships. For the scope
of this discussion, we will focus on a select few complexity classes that hold relevance
to our context.
We begin with the definition of the most basic complexity classes.
Definition 1.4.14. Let 𝑓 ∶ ℕ → ℝ>0 be a function.
(1) The complexity class DTIME(𝑓) is the set of all languages 𝐿 for which there is a
deterministic algorithm that decides 𝐿 and has time complexity O(𝑓).
(2) The complexity class DSPACE(𝑓) is the set of all languages 𝐿 for which there is a
deterministic algorithm that decides 𝐿 and has space complexity O(𝑓).
Exercise 1.4.16. Consider the language 𝐿 of all strings that correspond to squares in
ℕ. Show that 𝐿 is in P.
Example 1.4.17. As shown in 2002 by Manindra Agrawal, Neeraj Kayal, and Nitin
Saxena [AKS04], the language 𝐿 of all bit strings that correspond to composite integers
is in P. Therefore, it can be decided in polynomial time whether a positive integer is
a prime number or composite. However, if the algorithm of Agrawal, Kayal, and Saxena
finds that a positive integer is composite, it does not give a proper divisor of this number.
Finding such a divisor appears to be a much harder problem (see Example 1.4.13).
There are many other languages 𝐿 that have a property analogous to that of the
Goldbach language presented in Example 1.4.19. Abstractly speaking, this property is
the following. For 𝑠 ⃗ ∈ {0, 1}∗ it may be hard to decide whether 𝑠 ⃗ ∈ 𝐿. But for each 𝑠 ⃗ ∈ 𝐿
there is a certificate 𝑡 which allows us to verify in polynomial time in |𝑠|⃗ that 𝑠 ⃗ ∈ 𝐿. For
the Goldbach language, the certificate is the prime number 𝑝 such that 𝑎 − 𝑝 is a prime
number. The set of all languages with this property is denoted by NP, which stands for
nondeterministic polynomial time. This name comes from another characterization of NP that
we do not discuss here (see [LP98]). Here is a formal definition of NP.
Definition 1.4.20. (1) The complexity class NP is the set of all languages 𝐿 with the
following properties.
(a) There is a deterministic polynomial time algorithm 𝐴 with Input(𝐴) = {0, 1}∗
× {0, 1}∗ such that 𝐴(𝑠,⃗ 𝑡)⃗ = 1 implies 𝑠 ⃗ ∈ 𝐿 for all 𝑠,⃗ 𝑡 ⃗ ∈ {0, 1}∗ .
(b) There is 𝑐 ∈ ℕ that may depend on 𝐿 so that for all 𝑠 ⃗ ∈ 𝐿 there is 𝑡 ⃗ ∈ {0, 1}∗
with |𝑡|⃗ ≤ |𝑠|⃗ 𝑐 and 𝐴(𝑠,⃗ 𝑡)⃗ = 1.
If 𝑠 ⃗ ∈ 𝐿 and 𝑡 ⃗ ∈ {0, 1}∗ such that 𝐴(𝑠,⃗ 𝑡)⃗ = 1, then 𝑡 ⃗ is called a certificate for the
membership of 𝑠 ⃗ in 𝐿.
(2) The complexity class Co-NP is the set of all languages 𝐿 such that {0, 1}∗ ⧵ 𝐿 ∈ NP.
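For the language of composite numbers, a certificate in the sense of Definition 1.4.20 is a proper divisor. A minimal Python sketch of the verifier 𝐴 (the binary encoding here is our own choice, not fixed by the book):

```python
def verify_composite(s, t):
    """Polynomial-time verifier A(s, t) for the language of composite numbers:
    s encodes an integer a in binary, the certificate t encodes an integer d;
    accept exactly when d is a proper divisor of a. Any valid certificate
    satisfies |t| <= |s|, so condition (b) holds with c = 1."""
    a, d = int(s, 2), int(t, 2)
    return 1 if 1 < d < a and a % d == 0 else 0
```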
One of the big open research problems in computer science is finding out whether
P is equal to NP. It is one of the seven Millennium Prize Problems. They are well-
known mathematical problems that were selected by the Clay Mathematics Institute
in the year 2000. The Clay Institute has pledged a US$1 million prize for the correct
solution to any of the problems.
The complexity theory that we have explained so far only refers to solving language
decision problems but not to more general computational problems such as finding
proper divisors of composite integers. But, as illustrated in the next example, there is
a close connection between these two problem classes.
of 𝑎. During the execution of the algorithm, the interval shrinks exponentially. After
O(size 𝑎) iterations, we have 𝑢 = 𝑣. Then the algorithm returns 𝑏 = 𝑢. To achieve this,
the algorithm initially sets 𝑢 ← 2 and 𝑣 ← 𝑎−1. Since 𝑎 is composite, the interval [𝑢, 𝑣]
contains a proper divisor of 𝑎, but not the interval [1, 𝑢 − 1]. While 𝑢 < 𝑣, algorithm
𝐴′ repeats the following steps. It determines 𝑚 = ⌊(𝑣 − 𝑢)/2⌋ and calls 𝐴(𝑎, 𝑢 + 𝑚). If the
return value is 1, then 𝐴′ sets 𝑣 to 𝑢 + 𝑚. So [𝑢, 𝑣] contains a proper divisor of 𝑎, but
[1, 𝑢 − 1] does not. If the return value is 0, then 𝐴′ sets 𝑢 to 𝑢 + 𝑚 + 1. Again [𝑢, 𝑣]
contains a proper divisor of 𝑎 but [1, 𝑢 − 1] does not. If after this step we have 𝑢 = 𝑣,
then the algorithm returns 𝑏 = 𝑢. It is a proper divisor of 𝑎 since [𝑢, 𝑣] contains such
a divisor. Since in each iteration of this while loop the interval [𝑢, 𝑣] is roughly cut in
half and the initial length of the interval is 𝑎 − 2, the number of iterations is O(size 𝑎).
Also, because 𝐴 is a polynomial time algorithm, it follows that 𝐴′ runs in polynomial
time.
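A Python sketch of the reduction in Example 1.4.21. The decision oracle is implemented here by brute force purely for illustration; the point of the example is that find_divisor invokes the oracle only O(size 𝑎) times.

```python
def has_divisor_up_to(a, c):
    """Decision oracle A(a, c): 1 iff a has a proper divisor d with 2 <= d <= c.
    (Brute force here; in Example 1.4.21 this oracle is assumed to be a
    polynomial time algorithm.)"""
    return 1 if any(a % d == 0 for d in range(2, min(c, a - 1) + 1)) else 0

def find_divisor(a):
    """A': binary search for a proper divisor of the composite number a.
    Invariant: [u, v] contains a proper divisor of a, while [1, u - 1]
    does not; the interval is roughly halved in each iteration."""
    u, v = 2, a - 1
    while u < v:
        m = (v - u) // 2
        if has_divisor_up_to(a, u + m):
            v = u + m       # a proper divisor lies in [u, u + m]
        else:
            u = u + m + 1   # no proper divisor is <= u + m
    return u
```

By the invariant, the returned value is the smallest proper divisor of 𝑎.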
Exercise 1.4.22. Write pseudocode for the algorithms sketched in Example 1.4.21 and
analyze them.
The next theorem describes the relation between the complexity classes that we
have introduced. This theorem is also illustrated in Figure 1.4.1.
Figure 1.4.1. The inclusions between the complexity classes P, BPP, NP, co-NP, PP, PSPACE, and EXP.
Proof. Since our computational model is only semiformal, we can only sketch the
proofs of the inclusions. But with the ideas presented below, the proofs can be car-
ried out in any of the formal models of computation, for instance the Turing machine
model.
We clearly have P ⊂ NP.
To prove NP ⊂ PSPACE, let 𝐿 ∈ NP. Also, let 𝐴 be an algorithm and let 𝑐 ∈
ℕ be a constant with the properties from Definition 1.4.20. Then this algorithm can
be transformed as follows into an algorithm 𝐴′ that decides 𝐿 in polynomial space.
The input set of 𝐴′ is {0, 1}∗ . On input of 𝑠 ⃗ ∈ {0, 1}∗ the modified algorithm runs the
algorithm 𝐴 with all possible certificates 𝑡 ⃗ ∈ {0, 1}∗ such that |𝑡|⃗ ≤ |𝑠|⃗ 𝑐 . It returns 1 if 𝐴
returns 1 for one of these certificates and 0 otherwise. It follows from Definition 1.4.20
that 𝐴′ decides 𝐿. Also, since 𝐴 is a polynomial time algorithm and because |𝑡|⃗ ≤ |𝑠|⃗ 𝑐 ,
it follows that 𝐴′ has polynomial space complexity.
Next, we show that PSPACE ⊂ EXPTIME. Let 𝐿 ∈ PSPACE and let 𝐴 be an
algorithm with polynomial space complexity that decides 𝐿. So there is a constant
𝑐 ∈ ℕ such that on input of 𝑠 ⃗ ∈ {0, 1}∗ the size of the memory used by the algorithm is
at most |𝑠|⃗^𝑐 . Therefore, the number of states of the algorithm run with input 𝑠 ⃗ is O(2^{𝑛^𝑐})
with 𝑛 = |𝑠|,⃗ since the number of instructions that the algorithm may use is a constant. This implies
that the algorithm has exponential running time.
Now we turn to probabilistic algorithms. Clearly, we have P ⊂ BPP ⊂ PP.
To see that PP ⊂ PSPACE, let 𝐿 be a language in PP and let 𝐴 be a Monte Carlo
algorithm that decides 𝐿 as described in Definition 1.4.18. Using 𝐴, we construct an
algorithm 𝐴′ with polynomial space complexity that decides 𝐿. Let 𝑠 ⃗ ∈ {0, 1}∗ . Since 𝐴
has polynomial running time, there is 𝑐 ∈ ℕ such that the number of calls of probabilis-
tic subroutines in a run of 𝐴 on input of 𝑠 ⃗ is at most |𝑠|⃗^𝑐 . On input of 𝑠,⃗ the algorithm 𝐴′
runs algorithm 𝐴 with all random sequences corresponding to runs of 𝐴 with input 𝑠.⃗
The length of each of these sequences is at most |𝑠|⃗^𝑐 . If the majority of these runs returns 1, the return
value of 𝐴′ is 1. Otherwise, 𝐴′ returns 0. It follows from the definition of the complexity
class PP that 𝐴′ decides 𝐿. Also, 𝐴′ has polynomial space complexity. □
1.5.1. Logic gates. Logic gates are fundamental components of Boolean circuits.
A logic gate is a device that implements a Boolean function {0, 1}𝑛 → {0, 1}𝑚 , where 𝑚
and 𝑛 are natural numbers. The availability of specific gates depends on the comput-
ing platform being used. In this context, we will focus solely on the gates with 𝑚 = 1.
Table 1.5.1 presents commonly used logic gates; the functions that they implement
are listed in Table 1.1.4. These gates can be realized using various technolo-
gies, such as diodes or transistors acting as electronic switches. Additionally, they can
Boolean circuits are also referred to as logic circuits or simply as circuits. We intro-
duce a few important notions for Boolean circuits.
Definition 1.5.2. Let 𝐶 be a Boolean circuit.
(1) The depth of a node 𝑣 of 𝐶 is the maximum length of a path from an input node
or a constant node to 𝑣.
(2) The depth of 𝐶 is the maximum depth of all nodes of 𝐶. It is denoted by depth(𝐶).
(3) The size of 𝐶 is the number of nodes of 𝐶. It is denoted by |𝐶|.
Example 1.5.3. Figure 1.5.1 shows two examples of Boolean circuits. The first imple-
ments 𝖭𝖠𝖭𝖣 using one 𝖠𝖭𝖣 and one 𝖭𝖮𝖳 gate. The second implements 𝖷𝖮𝖱 using one
𝖭𝖠𝖭𝖣, one 𝖮𝖱, and one 𝖠𝖭𝖣 gate.
Figure 1.5.1. Two Boolean circuits: one computing 𝑏0 ↑ 𝑏1 (NAND) and one computing 𝑏0 ⊕ 𝑏1 (XOR).
Exercise 1.5.4. Verify that the circuits in Figure 1.5.1 implement 𝖭𝖠𝖭𝖣 and 𝖷𝖮𝖱.
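One way to carry out the verification of Exercise 1.5.4 is an exhaustive truth-table check. This Python sketch encodes the two circuits of Figure 1.5.1 as compositions of gate functions; the wiring is taken from the description in Example 1.5.3.

```python
from itertools import product

def AND(x, y): return x & y
def OR(x, y): return x | y
def NOT(x): return 1 - x
def NAND(x, y): return NOT(AND(x, y))

def circuit_nand(b0, b1):
    """First circuit of Figure 1.5.1: NAND built from one AND and one NOT gate."""
    return NOT(AND(b0, b1))

def circuit_xor(b0, b1):
    """Second circuit of Figure 1.5.1: XOR built from one NAND, one OR, and
    one AND gate; the fanout duplicates each input bit to two gates."""
    return AND(NAND(b0, b1), OR(b0, b1))

# Exhaustive truth-table check over all four input pairs.
ok = all(
    circuit_nand(b0, b1) == 1 - (b0 & b1) and circuit_xor(b0, b1) == b0 ^ b1
    for b0, b1 in product((0, 1), repeat=2)
)
```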
In the second circuit in Figure 1.5.1, the input nodes have outdegree 2. This is
represented by a fanout symbol. Fanout operations are used in circuits in order to
increase the outdegree of logic gates. When we describe the simulation of circuits by
reversible circuits in Section 1.7, we will consider fanout symbols as gates.
Next, we define the functions that are computed by circuits. Let 𝐶 = (𝑉, 𝐸, 𝐺, 𝐿)
be a circuit with 𝑛 input nodes and 𝑚 output nodes. To simplify the description, we
assume that all gates in 𝐺 implement functions {0, 1}𝑙 → {0, 1} for some 𝑙 ∈ ℕ. The
generalization to arbitrary gates is straightforward.
The circuit 𝐶 computes a function
(1.5.1) 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑚 .
To specify this function, we let 𝑏⃗ = (𝑏0 , . . . , 𝑏𝑛−1 ) ∈ {0, 1}𝑛 and construct 𝑓(𝑏⃗). For this, we use a value function
(1.5.2) 𝐵 ∶ 𝑉 → {0, 1}
which we define by induction on the depths of the nodes in 𝑉. For the base case, we
specify the following.
(1) For constant nodes 𝑣 labeled 0 or 1 we set 𝐵(𝑣) to 0 or 1, respectively.
(2) Let 𝑣 𝑖 be the input nodes of 𝐶 for 0 ≤ 𝑖 < 𝑛. Then we set 𝐵(𝑣 𝑖 ) = 𝑏𝑖 , 0 ≤ 𝑖 < 𝑛.
Here, we use the ordering on the input nodes to assign a bit 𝑏𝑖 to an input node
𝑣𝑖 .
For the inductive step, let 𝐾 be the depth of 𝐶 and let 𝑘 be a positive integer with 0 <
𝑘 ≤ 𝐾. Assume that 𝐵(𝑣) has been defined for all nodes 𝑣 of depth less than 𝑘. Let 𝑣 be
a node of depth 𝑘. We define 𝐵(𝑣) as follows. Since the depth of the node 𝑣 is greater
than 0, it is either a gate or an output node. Assume that 𝑣 is a gate. Let
(1.5.3) 𝑔 ∶ {0, 1}𝑙 → {0, 1}
be the Boolean function implemented by this gate. Then 𝑣 has 𝑙 incoming edges and,
by definition, there is an ordering (𝑒 0 , . . . , 𝑒 𝑙−1 ) on these edges. Denote by 𝑢0 , . . . , 𝑢𝑙−1
the nodes in the circuit such that 𝑒 𝑖 is an outgoing edge of 𝑢𝑖 for 0 ≤ 𝑖 < 𝑙. Then the
nodes 𝑢0 , . . . , 𝑢𝑙−1 have depth less than 𝑘. Therefore, the values 𝐵(𝑢𝑖 ), 0 ≤ 𝑖 < 𝑙, are
already defined. We set
(1.5.4) 𝐵(𝑣) = 𝑔(𝐵(𝑢0 ), . . . , 𝐵(𝑢𝑙−1 )).
Assume that 𝑣 is an output node. By definition, it has indegree 1. Let 𝑢 be the node in
𝑉 from which there exists an edge to 𝑣. Then we define
(1.5.5) 𝐵(𝑣) = 𝐵(𝑢).
Finally, let (𝑦0 , . . . , 𝑦𝑚−1 ) be the ordered sequence of output nodes of 𝐶. Then we set
(1.5.6) 𝑓(𝑏)⃗ = (𝐵(𝑦0 ), . . . , 𝐵(𝑦𝑚−1 )).
Examples of circuits and the functions that they implement can be seen in Figure
1.5.1.
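The inductive definition of the value function 𝐵 can be turned into a short program. The sketch below evaluates a circuit node by node in order of non-decreasing depth; the adjacency-list encoding of circuits is our own illustration, not a representation used in the book.

```python
# Evaluate a Boolean circuit by computing the value function B node by node.
# Hypothetical representation: each node is ("const", bit), ("input", i),
# ("gate", function, [predecessor ids]), or ("out", predecessor id).

def evaluate_circuit(nodes, input_bits):
    """nodes: dict id -> node, listed so that predecessors come first
    (i.e., in order of non-decreasing depth)."""
    B = {}  # the value function B: node id -> {0, 1}
    for nid, node in nodes.items():
        kind = node[0]
        if kind == "const":
            B[nid] = node[1]
        elif kind == "input":
            B[nid] = input_bits[node[1]]
        elif kind == "gate":
            g, preds = node[1], node[2]
            B[nid] = g(*(B[p] for p in preds))  # rule (1.5.4)
        elif kind == "out":
            B[nid] = B[node[1]]                 # rule (1.5.5)
    return B

# The XOR circuit of Figure 1.5.1: b0 XOR b1 = (b0 NAND b1) AND (b0 OR b1).
nand = lambda a, b: 1 - (a & b)
xor_circuit = {
    0: ("input", 0), 1: ("input", 1),
    2: ("gate", nand, [0, 1]),
    3: ("gate", lambda a, b: a | b, [0, 1]),
    4: ("gate", lambda a, b: a & b, [2, 3]),
    5: ("out", 4),
}
for b0 in (0, 1):
    for b1 in (0, 1):
        assert evaluate_circuit(xor_circuit, [b0, b1])[5] == b0 ^ b1
```

Since every predecessor of a node has strictly smaller depth, processing nodes in any depth-respecting order guarantees that all values 𝐵(𝑢𝑖) are available when a gate is evaluated.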
Exercise 1.5.5. Define the function computed by a circuit that uses logic gates with
more than one output.
Now we present a very simple set of logic gates that is universal for classical com-
putation.
Theorem 1.5.7. The set {𝖭𝖮𝖳, 𝖠𝖭𝖣, 𝖮𝖱} is universal for classical computation.
Figure 1.5.2. Base case of the induction proof in Theorem 1.5.7: circuits that compute
the four functions 𝑓 ∶ {0, 1} → {0, 1}.
[Figure 1.5.3: a circuit computing 𝑓(𝑏⃗) by combining the outputs 𝑓0 (𝑏0 , . . . , 𝑏𝑛−2 ) and 𝑓1 (𝑏0 , . . . , 𝑏𝑛−2 ), selected according to 𝑏𝑛−1 .]
By the induction hypothesis, there exist circuits that implement 𝑓0 and 𝑓1 and use only
the gates 𝖭𝖮𝖳, 𝖮𝖱, and 𝖠𝖭𝖣. Therefore, the circuit in Figure 1.5.3 that uses the circuits
for 𝑓0 and 𝑓1 implements the function 𝑓. □
[Figure 1.5.4: circuits computing ¬𝑏, 𝑏0 ∧ 𝑏1 , and 𝑏0 ∨ 𝑏1 using only 𝖭𝖠𝖭𝖣 gates.]
Theorem 1.5.9. The gate set {𝖭𝖠𝖭𝖣} is universal for classical computing.
Proof. By Theorem 1.5.7 it suffices to show that the gates 𝖭𝖮𝖳, 𝖠𝖭𝖣, and 𝖮𝖱 can be
implemented by a circuit that uses only the 𝖭𝖠𝖭𝖣 gate. This is shown in Figure 1.5.4.
□
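One standard way to build 𝖭𝖮𝖳, 𝖠𝖭𝖣, and 𝖮𝖱 from 𝖭𝖠𝖭𝖣 alone (the decompositions may differ in detail from the book's Figure 1.5.4) can be checked exhaustively:

```python
# NOT, AND, and OR built only from NAND, in the spirit of Theorem 1.5.9.
def NAND(a, b):
    return 1 - (a & b)

def NOT(a):
    return NAND(a, a)            # a NAND a = not a

def AND(a, b):
    return NOT(NAND(a, b))       # negate the NAND

def OR(a, b):
    return NAND(NOT(a), NOT(b))  # De Morgan: a or b = not(not a and not b)

# Exhaustive check over all inputs.
for a in (0, 1):
    assert NOT(a) == 1 - a
    for b in (0, 1):
        assert AND(a, b) == (a & b)
        assert OR(a, b) == (a | b)
```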
Exercise 1.5.10. Show that the set {𝖭𝖮𝖱} is universal for classical computation.
If it is clear which universal set of logic gates we refer to, we simply speak about
the circuit size complexity of a Boolean function.
We note that there is also the notion of circuit depth complexity which is not re-
quired in our context.
1.6.1. Circuit families. Individual circuits cannot compute functions 𝑓 ∶ {0, 1}∗ →
{0, 1}∗ or decide languages since their input length is fixed. To solve these more general
problems, we need families of circuits. We fix a finite universal set of logical gates and
assume from now on that all circuits are constructed using these gates. Such universal
sets are presented in Theorems 1.5.7 and 1.5.9.
Definition 1.6.1. A family of circuits or circuit family is a sequence (𝐶𝑛 )𝑛∈ℕ of circuits
such that the circuit 𝐶𝑛 has 𝑛 input nodes for all 𝑛 ∈ ℕ.
Next, we describe what it means for a circuit family to compute a function, solve a
computational problem, or decide a language. In doing so, we must take into account
that circuits have a fixed output length. Example 1.6.2 illustrates how to deal with this.
1.6. Circuit families and circuit complexity 41
Example 1.6.2. Consider the function 𝑓 ∶ {0, 1}∗ → {0, 1}∗ , 𝑎 ↦ 𝑓(𝑎) = 𝑎2 where we
identify the elements of {0, 1}∗ with the integers they represent. How can we implement
this function using a circuit family? For 𝑛 ∈ ℕ, let
(1.6.1) 𝑓𝑛 ∶ {0, 1}𝑛 → {0, 1}∗ , 𝑎 ↦ 𝑎2 .
In order to implement 𝑓𝑛 as a circuit, we must use representations of the function values
that have the same length for all inputs of length 𝑛. So we prepend an appropriate
number of zeros to the binary representations to ensure that they all have the same
length 2𝑛. For example, for 𝑛 = 2 we write 𝑓2 (00) = 0000, 𝑓2 (01) = 0001, 𝑓2 (10) =
0100, 𝑓2 (11) = 1001. In the same way, circuits 𝐶𝑛 can be constructed that implement
𝑓𝑛 for all 𝑛 ∈ ℕ. We note that the binary expansion of the function values ≠ 0 can be
obtained from the function values represented by bit strings of length 2𝑛 by deleting the
leading zeros.
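The padding idea of Example 1.6.2 can be sketched directly; the function name 𝑓𝑛 follows (1.6.1), while the Python helper itself is our own illustration:

```python
# Fixed-output-length version f_n of the squaring function of Example 1.6.2:
# an n-bit input a yields a 2n-bit output, padded with leading zeros.
def f_n(bits):
    n = len(bits)
    a = int("".join(map(str, bits)) or "0", 2)  # read bits as an integer
    return [int(c) for c in format(a * a, "0{}b".format(2 * n))]

# The n = 2 values from the example.
assert f_n([0, 0]) == [0, 0, 0, 0]
assert f_n([0, 1]) == [0, 0, 0, 1]
assert f_n([1, 0]) == [0, 1, 0, 0]
assert f_n([1, 1]) == [1, 0, 0, 1]
```

The width 2𝑛 suffices because 𝑎 < 2^𝑛 implies 𝑎² < 2^{2𝑛}.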
The idea of Example 1.6.2 can be used for all functions 𝑓 ∶ {0, 1}∗ → {0, 1}∗ whose
function values are encoded either by 0 or by bit strings in {0, 1}∗ starting with bit 1. This
encoding can be easily obtained from any encoding by prefixing the representations
of the function values different from 0 with 1. So without loss of generality, we only
consider functions that satisfy
(1.6.2) |𝑠⃗| = |𝑠⃗′ | ⇒ |𝑓(𝑠⃗)| = |𝑓(𝑠⃗′ )| for all 𝑠⃗, 𝑠⃗′ ∈ {0, 1}∗ .
Definition 1.6.3. Let 𝑓 ∶ {0, 1}∗ → {0, 1}∗ be a function that satisfies (1.6.2) and let
𝐶 = (𝐶𝑛 )𝑛∈ℕ be a circuit family. We say that 𝐶 computes 𝑓 if for all 𝑛 ∈ ℕ, the circuit
𝐶𝑛 computes the function 𝑓𝑛 ∶ {0, 1}𝑛 → {0, 1}∗ , 𝑠⃗ ↦ 𝑓(𝑠⃗).
We also define what it means for a circuit family to solve a computational problem
CP = (𝐼, 𝑂, 𝑅). Analogous to (1.6.2), we may, without loss of generality, assume that
the solutions of all instances of a fixed length also have a fixed length. This means that
(1.6.3) |𝑠⃗| = |𝑠⃗′ | ⇒ |𝑡⃗| = |𝑡⃗′ | for all (𝑠⃗, 𝑡⃗), (𝑠⃗′ , 𝑡⃗′ ) ∈ 𝑅.
We assume in the following that the encodings of computational problems have this
property.
Definition 1.6.4. Let CP = (𝐼, 𝑂, 𝑅) be a computational problem and let 𝐶 = (𝐶𝑛 )𝑛∈ℕ
be a circuit family. We say that 𝐶 solves CP if for all 𝑛 ∈ ℕ on input of 𝑎 ∈ {0, 1}𝑛 ∩ 𝐼
the circuit 𝐶𝑛 computes a solution 𝑏 of 𝑎.
Definition 1.6.5. Let 𝐿 be a language, and let 𝐶 = (𝐶𝑛 )𝑛∈ℕ be a circuit family. We say
that 𝐶 decides 𝐿 if for all 𝑛 ∈ ℕ on input of 𝑠 ⃗ ∈ {0, 1}𝑛 the circuit 𝐶𝑛 returns 1 if 𝑠 ⃗ ∈ 𝐿
and 0 otherwise.
Theorem 1.6.6. For every function 𝑓 ∶ {0, 1}∗ → {0, 1}∗ , every computational problem CP, and every language 𝐿 there is a circuit family that computes 𝑓, solves CP, or decides 𝐿, respectively.
Theorem 1.6.6 demonstrates that circuit families are more powerful than algo-
rithms in terms of computation. It is known that certain functions 𝑓 ∶ {0, 1}∗ → {0, 1}∗
cannot be computed by algorithms (see [Dav82]). However, as this theorem shows,
for all such functions, there exists a circuit family that can compute them. This is pos-
sible because an individual circuit can be designed for each input length. The next
section introduces a more limited concept of circuit families that possesses capabilities
equivalent to the concept of algorithms.
1.6.2. Uniform circuit families. We now introduce uniform circuit families and
obtain a computational model that corresponds to the algorithmic one. For the next
definition, we assume that we have fixed some encoding of circuits by bit strings. Fol-
lowing [Wat09] we require the following.
(1) The encoding is sensible: every circuit is encoded by at least one bit string, and
every bit string encodes at most one circuit.
(2) The encoding is efficient: there is 𝑐 ∈ ℕ such that every circuit 𝐶 has an encoding
of length at least size 𝐶 and at most (size 𝐶)𝑐 .
(3) Information about the structure of a circuit is computable in polynomial time
from an encoding of the circuit.
“Structure information” means, for example, information about what the input
nodes, the gates, and the output nodes are and how these nodes are connected.
We define uniform circuit families.
Definition 1.6.7. A circuit family 𝐶 = (𝐶𝑛 ) is called uniform if there is a deterministic
algorithm which on input of I𝑛 , 𝑛 ∈ ℕ, outputs the encoding of 𝐶𝑛 .
After the next definition, we explain why the input of the algorithm in Definition
1.6.7 is I𝑛 and not simply 𝑛. Here, we remark the following: It can be shown that
uniform circuit families are Turing complete, meaning that their computing power is
equivalent to that of Turing machines. This is important because the Turing-Church
thesis states that Turing machines are the most powerful computing devices imagin-
able. This means that a function 𝑓 ∶ {0, 1}∗ → {0, 1}∗ is computable by a human being
following an algorithm, ignoring resource limitations, if and only if it is computable
by a Turing machine. In today’s computer science, the Turing-Church thesis is still
considered to be true.
Now we define the P-uniform circuit families.
Definition 1.6.8. A circuit family 𝐶 = (𝐶𝑛 ) is called P-uniform if there is a determin-
istic polynomial time algorithm which on input of I𝑛 , 𝑛 ∈ ℕ, outputs the encoding of
𝐶𝑛 .
If the input were the binary representation of 𝑛, its length would only be about log2 𝑛; since the encoding of 𝐶𝑛 has length at least 𝑛, the algorithm running time would not be polynomial in the input length. On the other hand, the unary encoding I𝑛 has a length proportional to 𝑛, making it suitable for ensuring polynomial running time in terms of the input size.
1.6.3. Circuit complexity. Now we define the size complexity of circuit families
and different circuit complexity classes.
Definition 1.6.9. Let 𝐶 = (𝐶𝑛 )𝑛∈ℕ be a circuit family and let 𝑓 ∶ ℕ → ℕ be a function.
(1) The size complexity of 𝐶 is the function ℕ → ℕ, 𝑛 ↦ |𝐶𝑛 |.
(2) The complexity class SIZE(𝑓) is the set of all languages that can be decided by a
P-uniform circuit family with size complexity O(𝑓).
The next theorem establishes a connection between algorithmic and circuit com-
plexity classes.
Theorem 1.6.10. Let 𝑓 ∶ ℕ → ℕ. Then DTIME(𝑓) ⊂ SIZE(𝑓 log 𝑓).
For the proof of this theorem, see [Vol99] and [AB09]. It is beyond the scope of
this book.
From Theorem 1.6.10 we obtain the following corollary which characterizes the
complexity class P in terms of polynomial size uniform circuit families.
Corollary 1.6.11. A language 𝐿 is in P if and only if 𝐿 is in SIZE(𝑛𝑐 ) for some 𝑐 ∈ ℕ.
The only reversible gate that we have seen so far is the 𝖭𝖮𝖳 gate. All other gates
in Table 1.5.1 are not reversible.
An important reversible gate with two input nodes is the controlled not gate which
is denoted by 𝖢𝖭𝖮𝖳. It applies the 𝖭𝖮𝖳 operation to a target bit 𝑡 if a control bit 𝑐 is
1. Otherwise, the target bit remains unchanged. Therefore, the target bit becomes
𝑐⊕𝑡. The control bit is never changed. Two variants of 𝖢𝖭𝖮𝖳 are shown in Figure 1.7.1.
In the left 𝖢𝖭𝖮𝖳 gate, the first bit is the control, and the second bit is the target. In the
right 𝖢𝖭𝖮𝖳 gate, the roles of the bits are reversed. A circuit implementation of the left
𝖢𝖭𝖮𝖳 gate using one 𝖷𝖮𝖱 gate is shown in Figure 1.7.2. Figure 1.7.3 presents two more
𝖢𝖭𝖮𝖳 variants. They flip the target bit 𝑡 conditioned on the control bit 𝑐 being 0.
Figure 1.7.1. 𝖢𝖭𝖮𝖳 gates that change the target bit 𝑡 conditioned on the control bit 𝑐
being 1.
Figure 1.7.2. Circuit implementation of a 𝖢𝖭𝖮𝖳 gate using one 𝖷𝖮𝖱 gate.
Figure 1.7.3. 𝖢𝖭𝖮𝖳 gates that change the target bit 𝑡 conditioned on the control bit 𝑐
being 0.
Figure 1.7.4. The 𝖲𝖶𝖠𝖯 gate and its implementation using three 𝖢𝖭𝖮𝖳 gates.
Another important gate is the 𝖲𝖶𝖠𝖯 gate. On input of a pair (𝑏0 , 𝑏1 ) of bits it
returns (𝑏1 , 𝑏0 ). This gate is shown in Figure 1.7.4 together with an implementation
that uses only three 𝖢𝖭𝖮𝖳 gates.
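The three-𝖢𝖭𝖮𝖳 implementation of 𝖲𝖶𝖠𝖯 is the classical XOR-swap trick; a minimal sketch:

```python
# SWAP from three CNOT gates (Figure 1.7.4): each CNOT XORs the control
# into the target, with the roles of the two bits alternating.
def cnot(c, t):
    return c, c ^ t

def swap(b0, b1):
    b0, b1 = cnot(b0, b1)  # b1 becomes b0 xor b1
    b1, b0 = cnot(b1, b0)  # b0 becomes b0 xor (b0 xor b1) = b1
    b0, b1 = cnot(b0, b1)  # b1 becomes b1 xor (b0 xor b1) = b0
    return b0, b1

for b0 in (0, 1):
    for b1 in (0, 1):
        assert swap(b0, b1) == (b1, b0)
```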
Next, we show that we can implement every permutation of the entries of an 𝑛-bit
string by a reversible circuit that uses at most 𝑛 − 1 𝖲𝖶𝖠𝖯 gates.
Proposition 1.7.2. Let 𝑛 ∈ ℕ and let 𝜋 ∈ 𝑆𝑛 . Then the map
(1.7.1) 𝑓𝜋 ∶ {0, 1}𝑛 → {0, 1}𝑛 , (𝑏0 , . . . , 𝑏𝑛−1 ) ↦ (𝑏𝜋(0) , . . . , 𝑏𝜋(𝑛−1) )
can be implemented by a circuit that uses at most 𝑛 − 1 𝖲𝖶𝖠𝖯 or at most 3𝑛 𝖢𝖭𝖮𝖳 gates.
Proof. The proposition follows from Theorem A.4.25, which states that 𝜋 is the product of
at most 𝑛 − 1 transpositions. □
Example 1.7.3. Consider the permutation
(1.7.2) 𝜋 = ( 0 1 2 3
              1 3 0 2 ) .
We have 𝜋 = (2, 1) ∘ (1, 3) ∘ (0, 2). So the circuit in Figure 1.7.5 implements 𝜋.
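The decomposition of Example 1.7.3 can be traced as position swaps. One subtlety, noted in the comment: swapping entry positions composes in the opposite order to function composition, so the factors of 𝜋 = (2, 1) ∘ (1, 3) ∘ (0, 2) are applied here from left to right.

```python
# Realizing f_pi of Example 1.7.3 with SWAP gates. Swapping entry positions
# composes in the opposite order to function composition, so the factors of
# pi = (2,1) o (1,3) o (0,2) are applied left to right below.
def f_pi(bits):
    b = list(bits)
    for i, j in [(2, 1), (1, 3), (0, 2)]:
        b[i], b[j] = b[j], b[i]   # one SWAP gate
    return tuple(b)

pi = {0: 1, 1: 3, 2: 0, 3: 2}     # pi from (1.7.2)
for m in range(16):               # check all 4-bit inputs
    bits = tuple((m >> k) & 1 for k in range(4))
    assert f_pi(bits) == tuple(bits[pi[k]] for k in range(4))
```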
We now introduce the Toffoli gate which was proposed in 1980 by Tommaso Toffoli
and is shown in Figure 1.7.6. It implements the bijection
(1.7.3) {0, 1}3 → {0, 1}3 , (𝑐 0 , 𝑐 1 , 𝑡) ↦ (𝑐 0 , 𝑐 1 , (𝑐 0 ∧ 𝑐 1 ) ⊕ 𝑡).
1.7. Reversible circuits 45
[Figure 1.7.5: three 𝖲𝖶𝖠𝖯 gates mapping (𝑏0 , 𝑏1 , 𝑏2 , 𝑏3 ) to (𝑏1 , 𝑏3 , 𝑏0 , 𝑏2 ).]
[Figure 1.7.6: the Toffoli gate, mapping (𝑐 0 , 𝑐 1 , 𝑡) to (𝑐 0 , 𝑐 1 , (𝑐 0 ∧ 𝑐 1 ) ⊕ 𝑡).]
[Figure 1.7.7: Toffoli circuits for 𝖭𝖠𝖭𝖣, (𝑏0 , 𝑏1 , 1) ↦ (𝑏0 , 𝑏1 , 𝑏0 ↑ 𝑏1 ), and 𝖥𝖠𝖭𝖮𝖴𝖳, (𝑏, 1, 0) ↦ (𝑏, 1, 𝑏).]
Figure 1.7.7. Reversible circuits that implement the 𝖭𝖠𝖭𝖣 and 𝖥𝖠𝖭𝖮𝖴𝖳 gates.
This gate leaves the control bits 𝑐 0 and 𝑐 1 unchanged and modifies the target bit 𝑡 condi-
tioned on both control bits 𝑐 0 and 𝑐 1 being 1. Toffoli gates are also called 𝖢𝖢𝖭𝖮𝖳 gates:
a 𝖭𝖮𝖳 operation controlled by two control bits.
The Toffoli gate has the important property that it allows implementations of the
𝖭𝖠𝖭𝖣 and fanout operations. This is shown in Figure 1.7.7. As we will see in Section
1.7.2, this property implies that Toffoli gates can be used to transform every Boolean
circuit into a reversible circuit.
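Both special cases follow directly from the defining map (1.7.3); a minimal sketch:

```python
# The Toffoli gate and its use as NAND and FANOUT (Figure 1.7.7).
def toffoli(c0, c1, t):
    return c0, c1, (c0 & c1) ^ t

for b0 in (0, 1):
    for b1 in (0, 1):
        # NAND: set the target to 1, so the third output is not(b0 and b1).
        assert toffoli(b0, b1, 1)[2] == 1 - (b0 & b1)
    # FANOUT: controls (b, 1) and target 0 copy b into the third output.
    assert toffoli(b0, 1, 0) == (b0, 1, b0)
```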
Exercise 1.7.4. Verify that the circuits in Figure 1.7.7 are reversible and implement
the 𝖭𝖠𝖭𝖣 and the 𝖥𝖠𝖭𝖮𝖴𝖳 operation, respectively.
Another gate that can be used to make every circuit reversible is the Fredkin gate.
It was introduced by Edward Fredkin in 1969. It implements the bijection
{0, 1}3 → {0, 1}3 ,
(1.7.4)
(𝑐, 𝑡0 , 𝑡1 ) ↦ (𝑐, (¬𝑐 ∧ 𝑡0 ) ∨ (𝑐 ∧ 𝑡1 ), (𝑐 ∧ 𝑡0 ) ∨ (¬𝑐 ∧ 𝑡1 )).
This function does not change the control bit 𝑐, swaps the target bits 𝑡0 and 𝑡1 if the
control bit 𝑐 is 1, and leaves them unchanged otherwise (see Exercise 1.7.5). Because
of this property, the Fredkin gate is also called the controlled swap gate and is denoted
by 𝖢𝖲𝖶𝖠𝖯: a swap controlled by one control bit.
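The formula (1.7.4) indeed acts as a controlled swap, which can be checked by evaluating it for all eight inputs:

```python
# The Fredkin (controlled swap) gate of (1.7.4): swap t0 and t1 iff c = 1.
def fredkin(c, t0, t1):
    nc = 1 - c   # the negation of the control bit
    return c, (nc & t0) | (c & t1), (c & t0) | (nc & t1)

for c in (0, 1):
    for t0 in (0, 1):
        for t1 in (0, 1):
            expected = (c, t1, t0) if c == 1 else (c, t0, t1)
            assert fredkin(c, t0, t1) == expected
```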
[Figure 1.7.8: the Fredkin gate, mapping (𝑐, 𝑡0 , 𝑡1 ) to (𝑐, (¬𝑐 ∧ 𝑡0 ) ∨ (𝑐 ∧ 𝑡1 ), (𝑐 ∧ 𝑡0 ) ∨ (¬𝑐 ∧ 𝑡1 )).]
Exercise 1.7.5. Determine the truth tables of the Toffoli and the Fredkin gates and use
them to verify that they implement the functions in (1.7.3) and (1.7.4). Also, verify that
the two functions are bijections.
Exercise 1.7.6. (1) Find an implementation of the Toffoli gate that uses only 𝖭𝖮𝖳,
𝖠𝖭𝖣, and 𝖮𝖱 gates.
(2) Find an implementation of the Fredkin gate that uses only 𝖭𝖮𝖳, 𝖠𝖭𝖣, and 𝖮𝖱
gates.
(3) Find implementations of the 𝖭𝖠𝖭𝖣 and fanout operations that use only Fredkin
gates.
Exercise 1.7.7. Determine 𝑓(𝑥)⃗ for the function 𝑓 implemented by the circuit in Figure
1.7.9 for all 𝑥⃗ ∈ {0, 1}4 .
[Figure 1.7.9: a reversible circuit mapping (𝑏0 , 𝑏1 , 𝑏2 , 𝑏3 ) to (𝑐 0 , 𝑐 1 , 𝑐 2 , 𝑐 3 ) in three stages 𝑓0 , 𝑓1 , 𝑓2 , beginning with a 𝖭𝖮𝖳 gate on 𝑏0 .]
[Figure 1.7.10: a four-bit reversible circuit containing a 𝖭𝖮𝖳 gate on the 𝑏0 wire, leaving 𝑏1 , 𝑏2 , 𝑏3 unchanged.]
This construction can be easily extended to circuits that handle inputs of any size.
Moreover, the construction enables the solution of the subsequent exercise.
Exercise 1.7.8. Show that any circuit that uses only reversible gates is reversible.
for all 𝑥⃗ ∈ {0, 1}𝑛 . The bits in 𝑎⃗ are called ancilla bits. The function value 𝑔(𝑥⃗) is called
garbage.
[Figures 1.7.11 and 1.7.12: a fanout circuit 𝐶 with outputs 𝑦0 = 𝑦1 = 𝑥0 , the circuit 𝐶 ′ with the 𝖥𝖠𝖭𝖮𝖴𝖳 gate removed, and the reversible circuit 𝐶𝑟 with ancilla bits 𝑎0 = 0 and 𝑎1 = 1.]
removed 𝖥𝖠𝖭𝖮𝖴𝖳 gate. The first input of this gate is the output bit 𝑦 𝑖 of 𝐶𝑟′ , and the
second and third inputs are the new ancilla bits 𝑎𝑝−2 = 0 and 𝑎𝑝−1 = 1. As shown
in Exercise 1.7.4, the output of the Toffoli gate is (𝑦 𝑖 , 𝑦 𝑖 , 1). The first two output edges
of the Toffoli gate are connected to two output nodes that are in the same position
as the removed output nodes of the removed 𝖥𝖠𝖭𝖮𝖴𝖳 gate. Then 𝑝 = 𝑝′ + 2, 𝐶𝑟 ,
𝑎⃗ = 𝑎⃗′ ‖(0, 1), and 𝑔(𝑥⃗) = 𝑔′ (𝑥⃗)‖(1) have the required properties. We also note that
|𝐶𝑟 | = |𝐶𝑟′ | + 1 = |𝐶 ′ |𝐹 + 1 = |𝐶|𝐹 and 𝑝 = 𝑝′ + 2 ≤ 2|𝐶 ′ |𝐹 + 2 = 2|𝐶|𝐹 .
Figure 1.7.12 shows how this construction works for the example in Figure 1.7.11.
There, the circuit 𝐶𝑟 is simply the Toffoli gate that implements the fanout operation
and we have 𝑝 = 2 = 𝑝′ + 2, 𝑔(𝑥0 ) = 1, and |𝐶𝑟 | = 1 = |𝑓|𝐹 .
Now suppose that 𝐶 has the second property; i.e., there is a 𝖭𝖠𝖭𝖣 gate whose out-
going edge is connected to an output node 𝑦 𝑖 of 𝐶 for some 𝑖 ∈ ℤ𝑚 . Remove this
𝖭𝖠𝖭𝖣 gate and the corresponding output node 𝑦 𝑖 from 𝐶. Add two new output gates
𝑦′𝑖 and 𝑦′𝑚 to 𝐶 and connect the incoming edges of the removed 𝖭𝖠𝖭𝖣 to 𝑦′𝑖 and 𝑦′𝑚 .
Denote by 𝐶 ′ the resulting circuit and by 𝑓′ the function implemented by 𝐶 ′ . Then we
have |𝑓′ |𝐹 = 𝑘 − 1. In the example shown in Figure 1.7.13 we have 𝑛 = 2, 𝑚 = 1,
𝑦0 = 𝑓(𝑥0 , 𝑥1 ) = 𝑥0 ∧ 𝑥1 , and (𝑦′0 , 𝑦′1 ) = 𝑓′ (𝑥0 , 𝑥1 ) = (𝑥0 , 𝑥1 ).
Apply the induction hypothesis to 𝑓′ and obtain 𝑝′ , 𝐶𝑟′ , 𝑎′⃗ , and 𝑔′ as described in
the assertion of Theorem 1.7.10. In the example in Figure 1.7.13 we can set 𝐶𝑟′ = 𝐶 ′
because 𝐶 ′ is reversible. So we have 𝑝′ = 0, 𝑎′⃗ = (), and 𝑔′ ∶ {0, 1}2 → {0, 1}0 , 𝑏 ⃗ ↦ ().
The reversible circuit 𝐶𝑟 is obtained from 𝐶𝑟′ as follows. We set 𝑝 = 𝑝′ + 1 and add one
ancilla bit 𝑎𝑝−1 = 1. In addition, we add a Toffoli gate that replaces the removed 𝖭𝖠𝖭𝖣
gate. Its first input is 𝑎𝑝−1 = 1. The two other inputs are 𝑦′𝑖 and 𝑦′𝑚 . The corresponding
output gates are removed. Then the output of the Toffoli gate is (𝑦′𝑖 ↑ 𝑦′𝑚 , 𝑦′𝑖 , 𝑦′𝑚 ).
The first outgoing edge of the Toffoli gate is connected to a new output gate 𝑦 𝑖 . The
two other outgoing edges are connected to two new garbage output gates. So we have
𝑔(𝑥⃗) = 𝑔′ (𝑥⃗)‖(𝑦′𝑖 , 𝑦′𝑚 ). We note that |𝐶𝑟 | = |𝐶𝑟′ | + 1 = |𝐶 ′ |𝐹 + 1 = |𝐶|𝐹 and 𝑝 = 𝑝′ + 1 ≤
2|𝐶 ′ |𝐹 + 1 < 2|𝐶 ′ |𝐹 + 2 = 2|𝐶|𝐹 .
Figure 1.7.14 shows how this construction works for the example in Figure 1.7.13.
In this example, 𝑎0 is the first, 𝑥0 is the second, and 𝑥1 is the third input of the Toffoli
gate. So the circuit 𝐶𝑟 is a simple modification of the Toffoli gate that implements the
[Figures 1.7.13 and 1.7.14: the 𝖭𝖠𝖭𝖣 circuit 𝐶 with output 𝑦0 = 𝑥0 ↑ 𝑥1 , the circuit 𝐶 ′ with outputs (𝑦′0 , 𝑦′1 ) = (𝑥0 , 𝑥1 ), and the reversible circuit 𝐶𝑟 with ancilla bit 𝑎0 = 1, output 𝑥0 ↑ 𝑥1 , and garbage (𝑔0 , 𝑔1 ) = (𝑥0 , 𝑥1 ).]
𝖭𝖠𝖭𝖣 gate and we have 𝑝 = 1 = 𝑝′ + 1, 𝑔(𝑥0 , 𝑥1 ) = (𝑥0 , 𝑥1 ), and |𝐶𝑟 | = 1 = |𝑓|𝐹 . This
concludes the proof. □
Exercise 1.7.11. State and prove a theorem analogous to Theorem 1.7.10 where Fred-
kin gates are used instead of Toffoli gates.
When we use the construction from the proof of Theorem 1.7.10 to construct quantum circuits, the garbage may be problematic. Therefore, we need the following
theorem whose proof uses the so-called uncompute trick.
Theorem 1.7.12. For all Boolean functions 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑚 , 𝑚, 𝑛 ∈ ℕ, there is
𝑝 ∈ ℕ0 , 𝑝 ≤ 2|𝑓|𝐹 , a reversible circuit 𝐷𝑟 with |𝐷𝑟 | = O(|𝑓|𝐹 ) that uses only Toffoli, 𝖭𝖮𝖳,
and 𝖢𝖭𝖮𝖳 gates such that 𝐷𝑟 implements a function
(1.7.9) ℎ ∶{0, 1}𝑛 × {0, 1}𝑛+𝑝 × {0, 1}𝑚 → {0, 1}𝑛 × {0, 1}𝑛+𝑝 × {0, 1}𝑚
with
(1.7.10) ℎ(𝑥⃗, 0⃗, 𝑦⃗) = (𝑥⃗, 0⃗, 𝑦⃗ ⊕ 𝑓(𝑥⃗))
for all 𝑥⃗ ∈ {0, 1}𝑛 and 𝑦 ⃗ ∈ {0, 1}𝑚 .
Proof. Let 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑚 , 𝑚, 𝑛 ∈ ℕ. Let 𝑝, 𝐶𝑟 , 𝑎,⃗ and 𝑔 be as in Theorem 1.7.10.
We construct the circuit 𝐷𝑟 from 𝐶𝑟 . This construction is illustrated in Figure 1.7.15
for 𝐶𝑟 from Figure 1.7.14.
𝐷𝑟 has a total of 2𝑛 + 𝑝 + 𝑚 input nodes. The initial sequence consists of the
first 𝑛 nodes, represented as 𝑥⃗ = (𝑥0 , . . . , 𝑥𝑛−1 ), followed by 𝑥⃗′ = (𝑥′0 , . . . , 𝑥′𝑛−1 ). Subsequently, we include a sequence of 𝑝 ancillary input nodes, denoted as 𝑎⃗′ = (𝑎′0 , . . . , 𝑎′𝑝−1 ),
and 𝑚 input nodes 𝑦⃗ = (𝑦0 , . . . , 𝑦𝑚−1 ). In the example shown in Figure 1.7.15, where
𝑛 = 2, 𝑚 = 𝑝 = 1, we append two input nodes after 𝑥1 , set 𝑎0 to 0, and introduce the
input node 𝑦0 .
The circuit 𝐷𝑟 applies a bitwise 𝖢𝖭𝖮𝖳 to 𝑥⃗ and 𝑥⃗′ . If 𝑥⃗′ = 0⃗, then this operation
copies 𝑥⃗ to 𝑥⃗′ . In 𝐷𝑟 , there is also a 𝖭𝖮𝖳 gate after each ancilla input node whose value
in 𝑎⃗ is 1. These 𝖭𝖮𝖳 gates change an ancillary bit vector 0⃗ of length 𝑝 to 𝑎⃗. In the
example, we have 𝑎0 = 1. Therefore, a 𝖭𝖮𝖳 gate is inserted behind the input node 𝑎0 .
Now, the reversible circuit 𝐶𝑟 is applied to the input 𝑥⃗′ ‖𝑎⃗′ . This does not change
𝑥⃗ and 𝑦⃗. The circuit 𝐷𝑟 then copies 𝑓(𝑥⃗) to 𝑦⃗ using bitwise 𝖢𝖭𝖮𝖳. In the example, 𝐶𝑟
produces the bit string (𝑓(𝑥0 , 𝑥1 ), 𝑔0 (𝑥0 , 𝑥1 ), 𝑔1 (𝑥0 , 𝑥1 )) where 𝑓(𝑥0 , 𝑥1 ) = 𝑥0 ↑ 𝑥1 . In
addition, a 𝖢𝖭𝖮𝖳 gate is required to copy 𝑓(𝑥0 , 𝑥1 ) to 𝑦0 .
[Figure 1.7.15: the circuit 𝐷𝑟 for the 𝖭𝖠𝖭𝖣 example; the wires 𝑥0 , 𝑥1 are unchanged, the copy wires 𝑥⃗′ and the ancilla 𝑎0 return to 0, and the output wire ends as 𝑦0 ⊕ 𝑓(𝑥0 , 𝑥1 ).]
Finally, the uncompute trick is used. The inverse circuit 𝐶𝑟−1 is applied to the bits
with indices 𝑛, . . . , 2𝑛 + 𝑝 − 1. This gives (𝑥⃗, 𝑎⃗). Since the first 𝑛 input nodes have not
changed their values, 𝖢𝖭𝖮𝖳 gates can be used to change 𝑥⃗′ back to 0⃗. In addition,
changed their values, 𝖢𝖭𝖮𝖳 gates can be used to change 𝑥′⃗ back to 0.⃗ In addition,
applying 𝖭𝖮𝖳 gates to the appropriate ancilla bits maps 𝑎⃗ to 0⃗. In the example, two
𝖢𝖭𝖮𝖳 gates are used to obtain 𝑥⃗′ = 0⃗. Also, one 𝖭𝖮𝖳 gate is required to change 𝑎0 = 1
to 𝑎0 = 0. The assertion about the size of 𝐷𝑟 is verified in Exercise 1.7.13. □
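The three phases of this construction (copy the input, compute and copy out the result, uncompute) can be traced in plain Python for the 𝖭𝖠𝖭𝖣 example. This is a sketch of the bookkeeping only, not a circuit; the register layout follows the example above.

```python
# A plain-Python trace of the uncompute trick (Theorem 1.7.12) for the
# NAND example: registers (x0, x1, x'0, x'1, a0, y0), with the copies x'
# and the ancilla a0 initialized to 0. Every step is a NOT, CNOT (XOR),
# or Toffoli, hence reversible.
def D_r(x0, x1, y0):
    xp0, xp1, a0 = 0, 0, 0
    xp0 ^= x0; xp1 ^= x1          # bitwise CNOT: copy x into x'
    a0 ^= 1                        # NOT gate: ancilla value becomes a0 = 1
    # C_r: Toffoli with target a0 and controls xp0, xp1; the target wire
    # now carries f(x) = x0 NAND x1, and the controls are the garbage.
    f = (xp0 & xp1) ^ a0
    y0 ^= f                        # CNOT: copy f(x) into y
    # Uncompute: run C_r backwards, then undo the NOT and the copies.
    a0 = (xp0 & xp1) ^ f           # back to a0 = 1
    a0 ^= 1                        # back to a0 = 0
    xp0 ^= x0; xp1 ^= x1           # x' back to (0, 0)
    return x0, x1, xp0, xp1, a0, y0

for x0 in (0, 1):
    for x1 in (0, 1):
        for y0 in (0, 1):
            nand = 1 - (x0 & x1)
            assert D_r(x0, x1, y0) == (x0, x1, 0, 0, 0, y0 ^ nand)
```

As in (1.7.10), all helper registers end in the all-zero state, so no garbage remains.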
Exercise 1.7.13. Show that the circuit 𝐷𝑟 from Theorem 1.7.12 satisfies |𝐷𝑟 | = O(|𝑓|𝐹 )
and determine an appropriate O-constant.
Exercise 1.7.14. Construct a reversible circuit that computes the function 𝑓(𝑏0 , 𝑏1 ) =
𝑏0 ↓ 𝑏1 .
Chapter 2
Hilbert Spaces
Hilbert spaces, named after the mathematician David Hilbert, serve as the fundamental
mathematical framework for quantum mechanics. In the context of quantum comput-
ing, finite-dimensional complex Hilbert spaces prove to be sufficient. These spaces are
complex vector spaces equipped with an inner product, known as state spaces in the
context of quantum computing. They provide a powerful mathematical representation
of the quantum mechanics that is required for understanding quantum algorithms, in
particular quantum states, their evolution, and measurement. In our presentation, we
introduce and use the bra-ket notation. It is also referred to as Dirac notation due to
its originator, physicist Paul Dirac, in the 1930s, and plays a central role in quantum
mechanics. Its primary purpose is to simplify the presentation of the mathematical
framework of Hilbert spaces, making it more accessible and concise.
It is crucial to emphasize the following point: in the realm of general quantum
mechanics, which models the behavior of particles at the atomic and subatomic levels,
finite-dimensional Hilbert spaces frequently fall short when addressing numerous as-
pects of the theory. As we confront progressively more complex and intricate quantum
systems, the limitations inherent in finite-dimensional spaces become increasingly ap-
parent. This expansion of complexity necessitates a departure from the confines of
linear algebra, leading quantum theory into the profound domains of mathematical analysis.
Appendix B presents general linear algebra, which forms the foundation for this
chapter. Our objective here is to introduce the concept of finite-dimensional Hilbert
spaces and explore their notable properties. We employ the widely used Dirac notation
from physics, which proves to be elegant, and we select our examples from the realm of
quantum computing. After the introductory sections, this chapter delves into vital con-
cepts crucial for our discussion of quantum mechanics and quantum algorithms. We
explore various special operators, such as Hermitian, unitary, and normal operators,
involutions, and projections, which play pivotal roles in modeling quantum computa-
tion. Of particular significance are the spectral theorem, essential, for example, when
stating the quantum mechanical postulates, and the Schmidt decomposition theorem,
enabling the characterization of a fundamental quantum mechanical phenomenon:
entanglement.
Throughout this chapter, 𝑘 is a positive integer and ℍ denotes a complex vector
space of dimension 𝑘.
2.1.1. Kets. As explained in Appendix B, elements of vector spaces are called vec-
tors and are denoted by 𝑣 ⃗ for some character 𝑣. In quantum physics, the following
notation is used.
Definition 2.1.1. Every element of ℍ is called a ket. Such a ket is denoted by |𝜑⟩ for
some character 𝜑 and is pronounced “ket-𝜑”. The character 𝜑 may be replaced by any
other character, number, or even word.
It is important to note that the sum of two kets |𝜑⟩ , |𝜓⟩ ∈ ℍ is written as |𝜑⟩ + |𝜓⟩
and not as |𝜑 + 𝜓⟩. In the same way, all expressions that contain several kets are written
by keeping the | ⟩ notation for all kets. In the next section, we will present examples of
the ket notation.
(2.1.1) 𝐵𝑛 = (|𝑏⃗⟩)_{𝑏⃗∈{0,1}𝑛 } .
Note that the first sum in (2.1.3) is direct. On ℍ𝑛 , we define componentwise addition
and multiplication with complex scalars as follows. If
basis of ℍ𝑛 .
Example 2.1.3. In classical computing, the bits 0 and 1 are used. In quantum comput-
ing, these bits are replaced by quantum bits or qubits. The state of a qubit is an element
in the single-qubit state space ℍ1 . This will be explained in more detail in Section 3.1.2.
The computational basis of ℍ1 is 𝐵 = (|0⟩ , |1⟩). Another basis of ℍ1 is
(2.1.7) (|𝑥+ ⟩ , |𝑥− ⟩) = ((|0⟩ + |1⟩)/√2 , (|0⟩ − |1⟩)/√2) .
Here the symbols 𝑥+ and 𝑥− are used to denote the basis elements. This basis will play
a role in later sections.
Exercise 2.1.4. Show that (|𝑥+ ⟩ , |𝑥− ⟩) is a basis of the single-qubit state space ℍ1 .
Example 2.1.5. We describe an alternative representation of the computational basis
elements of the 𝑛-qubit state space ℍ𝑛 . For this, we use the map
(2.1.8) stringToInt ∶ {0, 1}𝑛 → ℤ2𝑛 , 𝑏⃗ = (𝑏0 ⋯ 𝑏𝑛−1 ) ↦ ∑_{𝑖=0}^{𝑛−1} 𝑏𝑖 2^{𝑛−𝑖−1}
which was introduced in Definition 1.1.12. Also, in Exercise 1.1.13 it was shown that
this map is a bijection. Using this bijection, we identify the bit vectors in {0, 1}𝑛 with
the integers in ℤ2𝑛 . For example, the bit vector (010) is identified with the integer
0 ⋅ 22 + 1 ⋅ 21 + 0 ⋅ 20 = 2.
So we can write the computational basis of ℍ𝑛 as (|𝑏⟩𝑛 )𝑏∈ℤ2𝑛 , where the index 𝑛
indicates that the number in the ket is considered as an 𝑛-bit string. For instance,
the computational basis of ℍ2 is (|0⟩2 , |1⟩2 , |2⟩2 , |3⟩2 ). We also obtain the following
alternative representation of ℍ𝑛 :
(2.1.9) ℍ𝑛 = { ∑_{𝑏=0}^{2^𝑛 −1} 𝛼𝑏 |𝑏⟩𝑛 ∶ 𝛼𝑏 ∈ ℂ for all 𝑏 ∈ ℤ2𝑛 } .
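The map stringToInt of (2.1.8) is ordinary big-endian binary reading; a minimal sketch (the Python helper name is ours):

```python
# stringToInt from (2.1.8): a bit string (b_0, ..., b_{n-1}) is read as a
# big-endian binary integer, used to label computational basis kets |b>_n.
def string_to_int(bits):
    n = len(bits)
    return sum(b * 2 ** (n - i - 1) for i, b in enumerate(bits))

assert string_to_int([0, 1, 0]) == 2   # the example: (010) -> 2

# The computational basis of H_2 in both labelings: |00>, |01>, |10>, |11>
# become |0>_2, |1>_2, |2>_2, |3>_2.
labels = [string_to_int(b) for b in ([0, 0], [0, 1], [1, 0], [1, 1])]
assert labels == [0, 1, 2, 3]
```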
So, we change each ket |𝜑𝑖 ⟩ in the left argument to a so-called bra ⟨𝜑𝑖 | (see Section
2.2.3 for an explanation) with the same symbol inside and omit the outer ⟨⟩. Using this
notation, we now define inner products.
Definition 2.2.1. An inner product on ℍ is a map
(2.2.4) ⟨⋅|⋅⟩ ∶ ℍ × ℍ → ℂ, (|𝜑⟩ , |𝜓⟩) ↦ ⟨𝜑|𝜓⟩
that satisfies the following three conditions for all kets |𝜉⟩ , |𝜑⟩ , |𝜓⟩ ∈ ℍ and all scalars
𝛼 ∈ ℂ.
(1) Linearity in the second argument: ⟨𝜉| (|𝜑⟩ + |𝜓⟩) = ⟨𝜉|𝜑⟩ + ⟨𝜉|𝜓⟩ and ⟨𝜑| (𝛼 |𝜓⟩) =
𝛼⟨𝜑|𝜓⟩.
(2) Conjugate symmetry: ⟨𝜓|𝜑⟩ = ⟨𝜑|𝜓⟩̄ , the complex conjugate of ⟨𝜑|𝜓⟩. This property is also called Hermitian symmetry or conjugate commutativity. It implies that ⟨𝜑|𝜑⟩ is a real number.
(3) Positive definiteness: ⟨𝜑|𝜑⟩ ≥ 0 and ⟨𝜑|𝜑⟩ = 0 if and only if |𝜑⟩ = 0. This property
is also called positivity.
Inner products on real vector spaces are defined analogously, but the conjugate
symmetry condition becomes a symmetry condition. Note that the definition of inner
products does not require ℍ to be finite dimensional.
Exercise 2.2.2. Show that for all |𝜑⟩ ∈ ℍ the inner product ⟨𝜑|𝜑⟩ is a real number.
Using the linearity of the inner product in the second argument and the conjugate
linearity in the first argument we obtain the distributive law
(2.2.5) (∑_{𝑖=0}^{𝑚−1} 𝛼𝑖 ⟨𝜑𝑖 |)(∑_{𝑗=0}^{𝑛−1} 𝛽𝑗 |𝜓𝑗 ⟩) = ∑_{𝑖=0}^{𝑚−1} ∑_{𝑗=0}^{𝑛−1} 𝛼𝑖 𝛽𝑗 ⟨𝜑𝑖 |𝜓𝑗 ⟩ .
The definition of matrix multiplication allows multiplying the dual 𝑣⃗∗ of a vector
𝑣⃗ = (𝑣0 , . . . , 𝑣𝑘−1 ) ∈ ℂ𝑘 with another vector 𝑤⃗ = (𝑤0 , . . . , 𝑤𝑘−1 ) ∈ ℂ𝑘 . The result is
(2.2.8) 𝑣⃗∗ 𝑤⃗ = (𝑣̄0 , . . . , 𝑣̄𝑘−1 )(𝑤0 , . . . , 𝑤𝑘−1 )ᵀ = ∑_{𝑖=0}^{𝑘−1} 𝑣̄𝑖 𝑤𝑖 .
Example 2.2.8. Let 𝑘 = 2, 𝑣 ⃗ = (1, 𝑖), and 𝑤⃗ = (𝑖, 1). Then we have 𝑣∗⃗ = (1, −𝑖) and
𝑣∗⃗ 𝑤⃗ = 𝑖 − 𝑖 = 0.
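The product (2.2.8) conjugates the entries of the first vector before summing; a minimal sketch reproducing Example 2.2.8 (the helper name `dual_product` is ours):

```python
# The product v* w of (2.2.8): conjugate the entries of v, then take the
# componentwise product sum. This reproduces Example 2.2.8.
def dual_product(v, w):
    return sum(vi.conjugate() * wi for vi, wi in zip(v, w))

v = [1, 1j]
w = [1j, 1]
assert dual_product(v, w) == 0    # (1)(i) + (-i)(1) = i - i = 0
assert dual_product(v, v) == 2    # |1|^2 + |i|^2 = 2
```

Note that `dual_product(v, v)` is always a nonnegative real number, matching the positivity of the Hermitian inner product.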
Next, we show that the Hermitian inner product with respect to a basis 𝐵 of ℍ can
be used to determine the coefficient vectors of kets in ℍ with respect to 𝐵.
2.2. Inner products 59
Proposition 2.2.13. Let 𝐵 = (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) be a basis of ℍ. Then the following hold.
(1) For all 𝑖, 𝑗 ∈ ℤ𝑘 we have
(2.2.11) ⟨𝑏𝑖 |𝑏𝑗 ⟩𝐵 = 𝛿 𝑖,𝑗 .
(2) For all |𝜑⟩ ∈ ℍ we have
(2.2.12) |𝜑⟩ = ∑_{𝑖=0}^{𝑘−1} ⟨𝑏𝑖 |𝜑⟩𝐵 |𝑏𝑖 ⟩ .
2.2.3. Bras. We introduce and discuss the bra notation that will further simplify
the presentation of the theory of Hilbert spaces. We assume that ⟨⋅|⋅⟩ is an inner product
on ℍ. In the next definition, the dual of ℍ is used, which — as explained in Section
B.3.2 — is ℍ∗ = Hom(ℍ, ℂ), the space of linear maps from ℍ to ℂ.
Definition 2.2.15. Every element of the dual ℍ∗ of ℍ is called a bra. Such a bra is
denoted by ⟨𝜑| for some character 𝜑 and is pronounced “bra-𝜑”. The character 𝜑 may
be replaced by any other character, number, or even word.
Proof. It follows from the linearity in the second argument of the inner product that
for all |𝜑⟩ ∈ ℍ the map ⟨𝜑| from (2.2.13) is in ℍ∗ . Furthermore, due to the positivity
of the inner product, the map (2.2.14) is injective. Hence, Theorem B.6.18 implies that
this map is a bijection. Its conjugate linearity follows from Proposition 2.2.3. □
Theorem 2.2.16 shows that all elements in ℍ∗ can be written as ⟨𝜑| for some unique-
ly determined |𝜑⟩ ∈ ℍ and vice versa. Therefore, we will always use the same character,
number, or word inside |⟩ and ⟨| to denote the kets and bras that correspond to each
other. The construction of ⟨𝜑| from |𝜑⟩ ∈ ℍ will be explained in Proposition 2.2.37.
The bra notation is quite elegant since for every |𝜑⟩ , |𝜓⟩ ∈ ℍ the image of |𝜓⟩ ∈
ℍ under ⟨𝜑| is obtained by “gluing” the two expressions ⟨𝜑| and |𝜓⟩ together, giving
⟨𝜑|𝜓⟩. We also obtain the following interpretation of the notation introduced in (2.2.3):
the inner product of a linear combination of kets with another linear combination of
kets is obtained by gluing together the linear combination of bras corresponding to the
first linear combination of kets with the second linear combination of kets. The linear
combinations are written in parentheses. Also, the distributive law from (2.2.5) holds
for bras and kets.
Exercise 2.2.17. Determine the images of the computational basis states of ℍ1 under
⟨𝑥+ | and ⟨𝑥− |.
Exercise 2.2.21. Find a basis 𝐶 of ℍ1 such that the Hilbert spaces (ℍ1 , ⟨⋅|⋅⟩) and
(ℍ1 , ⟨⋅|⋅⟩𝐶 ) are different.
2.2.5. Norm. Another important notion is the norm on a Hilbert space which we
define now.
Definition 2.2.22. A norm on ℍ is a function 𝑓 ∶ ℍ → ℝ, |𝜑⟩ ↦ 𝑓 |𝜑⟩ which for all
|𝜑⟩ , |𝜓⟩ ∈ ℍ and all 𝛼 ∈ ℂ satisfies the following conditions.
(1) Triangle inequality: 𝑓(|𝜑⟩ + |𝜓⟩) ≤ 𝑓 |𝜑⟩ + 𝑓 |𝜓⟩.
(2) Absolute homogeneity: 𝑓(𝛼 |𝜑⟩) = |𝛼|𝑓 |𝜑⟩.
(3) Positive definiteness: 𝑓 |𝜑⟩ ≥ 0 and 𝑓 |𝜑⟩ = 0 if and only if |𝜑⟩ = 0.
is a norm on ℂ𝑘 .
We show how to construct a norm from an inner product on ℍ. For this, we let
⟨⋅|⋅⟩ be such an inner product. We refer to the Hilbert space (ℍ, ⟨⋅|⋅⟩) simply by ℍ. But
we must keep in mind which inner product we have chosen on ℍ since changing the
inner product changes the norm.
Proposition 2.2.25. The map ‖⋅‖ ∶ ℍ → ℝ, |𝜑⟩ ↦ ‖𝜑‖ = √⟨𝜑|𝜑⟩ is a norm on ℍ.
Proof. We start by proving the Cauchy-Schwarz inequality. Let |𝜑⟩ , |𝜓⟩ ∈ ℍ and 𝛼 ∈
ℂ. For 𝑥 ∈ ℝ let
(2.2.24) 𝑝(𝑥) = (⟨𝜑| − 𝑥 ⟨𝜓|)(|𝜑⟩ − 𝑥 |𝜓⟩) = 𝑥² ⟨𝜓|𝜓⟩ − (⟨𝜑|𝜓⟩ + ⟨𝜓|𝜑⟩)𝑥 + ⟨𝜑|𝜑⟩.
Since ⟨𝜑|𝜓⟩ + ⟨𝜓|𝜑⟩ = 2ℜ⟨𝜑|𝜓⟩ is a real number, it follows that the coefficients of 𝑝(𝑥),
considered as a quadratic polynomial, are real numbers. The discriminant of this poly-
nomial (see Exercise A.4.49) is
(2.2.25) Δ(𝑝) = (⟨𝜑|𝜓⟩ + ⟨𝜓|𝜑⟩)² − 4⟨𝜑|𝜑⟩⟨𝜓|𝜓⟩.
This equation and the conjugate symmetry of the inner product imply
(2.2.26) Δ(𝑝) = (⟨𝜑|𝜓⟩ + ⟨𝜓|𝜑⟩)² − 4⟨𝜑|𝜑⟩⟨𝜓|𝜓⟩ = |⟨𝜑|𝜓⟩ + ⟨𝜓|𝜑⟩|² − 4⟨𝜑|𝜑⟩⟨𝜓|𝜓⟩ ≤ (|⟨𝜑|𝜓⟩| + |⟨𝜓|𝜑⟩|)² − 4⟨𝜑|𝜑⟩⟨𝜓|𝜓⟩ = 4(|⟨𝜑|𝜓⟩|² − ⟨𝜑|𝜑⟩⟨𝜓|𝜓⟩).
But 𝑝(𝑥) is nonnegative for all 𝑥 ∈ ℝ. Therefore, this polynomial can have at most
one real root, which means that its discriminant is nonpositive. So (2.2.26) implies the
Cauchy-Schwarz inequality.
Now we apply the Cauchy-Schwarz inequality and obtain
(2.2.27) ‖|𝜑⟩ + |𝜓⟩‖² = (⟨𝜑| + ⟨𝜓|)(|𝜑⟩ + |𝜓⟩) = ‖𝜑‖² + ⟨𝜑|𝜓⟩ + ⟨𝜓|𝜑⟩ + ‖𝜓‖² ≤ ‖𝜑‖² + 2‖𝜑‖‖𝜓‖ + ‖𝜓‖² = (‖𝜑‖ + ‖𝜓‖)²,
since ⟨𝜑|𝜓⟩ + ⟨𝜓|𝜑⟩ = 2ℜ⟨𝜑|𝜓⟩ ≤ 2|⟨𝜑|𝜓⟩| ≤ 2‖𝜑‖‖𝜓‖. This implies the triangle inequality.
Next, we let 𝛼 ∈ ℂ. Then, the linearity in the second element and the conjugate
linearity in the first argument of the inner product imply
(2.2.28) ‖𝛼 |𝜑⟩‖² = (𝛼 |𝜑⟩)∗ (𝛼 |𝜑⟩) = |𝛼|²⟨𝜑|𝜑⟩ = |𝛼|²‖𝜑‖².
This implies the absolute homogeneity of the norm.
Finally, the positive definiteness of ‖⋅‖ immediately follows from the positive def-
initeness of the inner product. □
Definition 2.2.26. The norm ‖⋅‖ ∶ ℍ → ℝ, |𝜑⟩ ↦ ‖𝜑‖ = √⟨𝜑|𝜑⟩ defined in Proposition
2.2.25 is called the Euclidean norm on the Hilbert space ℍ. It depends on the inner
product on ℍ. For |𝜑⟩ ∈ ℍ we also refer to ‖𝜑‖ as the length of |𝜑⟩.
The next theorem presents the Gram-Schmidt procedure that constructs an orthog-
onal basis from any basis of ℍ.
Theorem 2.2.31 (Gram-Schmidt procedure). Let 𝐶 = (|𝑐 0 ⟩ , . . . , |𝑐 𝑘−1 ⟩) be a basis of ℍ.
Set
(2.2.29) |𝑏0 ⟩ = |𝑐 0 ⟩
and for 1 ≤ 𝑗 < 𝑘 let
(2.2.30) |𝑏𝑗 ⟩ = |𝑐𝑗 ⟩ − ∑_{𝑖=0}^{𝑗−1} (⟨𝑏𝑖 |𝑐𝑗 ⟩ / ⟨𝑏𝑖 |𝑏𝑖 ⟩) |𝑏𝑖 ⟩ .
Then 𝐵 = (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) is an orthogonal basis of ℍ.
The process of constructing the orthogonal basis 𝐵 = (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) from the
basis 𝐶 = (|𝑐 0 ⟩ , . . . , |𝑐 𝑘−1 ⟩) of ℍ presented in Theorem 2.2.31 is referred to as the Gram-
Schmidt orthogonalization of 𝐶. We also call the resulting orthogonal basis 𝐵 the Gram-
Schmidt orthogonalization of the basis 𝐶.
Example 2.2.32. Consider the basis (|𝑐 0 ⟩ , |𝑐 1 ⟩) = (|0⟩ , |0⟩ + |1⟩) of the single-qubit
state space ℍ1 . It is not orthogonal since
(2.2.33) ⟨𝑐 0 |𝑐 1 ⟩ = ⟨0| (|0⟩ + |1⟩) = ⟨0|0⟩ + ⟨0|1⟩ = 1.
We apply Gram-Schmidt orthogonalization to this basis. We obtain
(2.2.34) |𝑏0 ⟩ = |𝑐 0 ⟩ = |0⟩
and
(2.2.35) |𝑏1 ⟩ = |𝑐 1 ⟩ − (⟨𝑏0 |𝑐 1 ⟩ / ⟨𝑏0 |𝑏0 ⟩) |𝑏0 ⟩ = (|0⟩ + |1⟩) − (⟨0|0⟩ + ⟨0|1⟩) |0⟩ = (|0⟩ + |1⟩) − |0⟩ = |1⟩ .
Hence, the Gram-Schmidt orthogonalization of the basis (|0⟩ , |0⟩ + |1⟩) of ℍ1 gives
(|0⟩ , |1⟩).
Similarly, applying Gram-Schmidt orthogonalization to the basis (|𝑥+ ⟩ , |0⟩) of ℍ1 gives |𝑏0 ⟩ = |𝑥+ ⟩ and |𝑏1 ⟩ = |0⟩ − ⟨𝑥+ |0⟩ |𝑥+ ⟩ = (|0⟩ − |1⟩)/2. Since ‖|𝑏1 ⟩‖ = 1/√2, this gives the orthonormal basis (|𝑏0 ⟩ , √2 |𝑏1 ⟩) = (|𝑥+ ⟩ , |𝑥− ⟩) of ℍ1 .
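The Gram-Schmidt procedure of Theorem 2.2.31 translates directly into code. The following sketch is not from the book; it uses NumPy, whose `np.vdot` is conjugate-linear in its first argument, matching the convention for ⟨⋅|⋅⟩ used here. It reproduces Example 2.2.32:

```python
import numpy as np

def gram_schmidt(basis):
    """Orthogonalize a sequence of vectors as in (2.2.29)-(2.2.30).

    No normalization is performed; the result is an orthogonal
    (not necessarily orthonormal) basis."""
    ortho = []
    for c in basis:
        # subtract the projections onto the vectors found so far
        b = c - sum((np.vdot(bi, c) / np.vdot(bi, bi)) * bi for bi in ortho)
        ortho.append(b)
    return ortho

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# the basis (|0>, |0> + |1>) from Example 2.2.32
b0, b1 = gram_schmidt([ket0, ket0 + ket1])
```

Here `b0` is |0⟩ and `b1` is |1⟩, as computed in (2.2.34) and (2.2.35).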
Corollary 2.2.34 implies the following result, which shows that all inner products on ℍ are Hermitian inner products with respect to some basis of ℍ.
Proposition 2.2.35. Let 𝐵 = (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) be an orthonormal basis of (ℍ, ⟨⋅|⋅⟩). Then for all |𝜑⟩ , |𝜓⟩ ∈ ℍ we have ⟨𝜑|𝜓⟩ = |𝜑⟩𝐵∗ |𝜓⟩𝐵 ; that is, ⟨⋅|⋅⟩ is the Hermitian inner product with respect to 𝐵.
Proof. Let |𝜑⟩ , |𝜓⟩ ∈ ℍ, and let |𝜑⟩𝐵 = (𝛼0 , . . . , 𝛼𝑘−1 ), |𝜓⟩𝐵 = (𝛽0 , . . . , 𝛽 𝑘−1 ). Then we
have
(2.2.40) ⟨𝜑|𝜓⟩ = (∑_{𝑖=0}^{𝑘−1} 𝛼𝑖∗ ⟨𝑏𝑖 |)(∑_{𝑗=0}^{𝑘−1} 𝛽𝑗 |𝑏𝑗 ⟩) = ∑_{𝑖,𝑗=0}^{𝑘−1} 𝛼𝑖∗ 𝛽𝑗 ⟨𝑏𝑖 |𝑏𝑗 ⟩ = ∑_{𝑖,𝑗=0}^{𝑘−1} 𝛼𝑖∗ 𝛽𝑗 𝛿𝑖,𝑗 = ∑_{𝑖=0}^{𝑘−1} 𝛼𝑖∗ 𝛽𝑖 = |𝜑⟩𝐵∗ |𝜓⟩𝐵
as asserted. □
By Theorem 2.2.16, we have ℍ∗ = {⟨𝜑| ∶ |𝜑⟩ ∈ ℍ}. The next proposition shows how, for a given 𝑓 ∈ ℍ∗ , the ket |𝜑⟩ with 𝑓 = ⟨𝜑| is constructed: for
|𝜑⟩ = ∑_{𝑖=0}^{𝑘−1} 𝑓(|𝑏𝑖 ⟩)∗ |𝑏𝑖 ⟩
we have 𝑓 = ⟨𝜑|.
Using Proposition 2.2.42 we can prove the following more general statement.
Proposition 2.2.44. Let 𝑙 ∈ ℕ and let ℍ(0), . . . , ℍ(𝑙 − 1) be subspaces of ℍ. Then the
following hold.
(1) If ℍ(0), . . . , ℍ(𝑙 − 1) are pairwise orthogonal to each other, then their sum is direct.
(2) The subspaces ℍ(0), . . . , ℍ(𝑙 − 1) are pairwise orthogonal to each other if and only
if there are orthonormal bases 𝐵0 , . . . , 𝐵𝑙−1 of ℍ(0), . . . , ℍ(𝑙 − 1), respectively, such
that 𝐵 = 𝐵0 ∥ ⋯ ∥ 𝐵𝑙−1 is an orthonormal basis of ℍ(0) + ⋯ + ℍ(𝑙 − 1).
Proof. Without loss of generality, we assume that the sum of the subspaces ℍ(𝑖) is ℍ.
We begin by proving the first assertion. For 𝑖 ∈ ℤ𝑙 let |𝜑𝑖 ⟩ ∈ ℍ(𝑖) such that
(2.2.47) ∑_{𝑖=0}^{𝑙−1} |𝜑𝑖 ⟩ = 0.
Since the subspaces ℍ(𝑖) are pairwise orthogonal, for all 𝑗 ∈ ℤ𝑙 we have
(2.2.48) 0 = ⟨𝜑𝑗 | (∑_{𝑖=0}^{𝑙−1} |𝜑𝑖 ⟩) = ∑_{𝑖=0}^{𝑙−1} ⟨𝜑𝑗 |𝜑𝑖 ⟩ = ⟨𝜑𝑗 |𝜑𝑗 ⟩,
which by the positive definiteness of the inner product implies |𝜑𝑗 ⟩ = 0. Hence, the sum of the subspaces is direct.
Next, we turn to the second assertion. Assume that the subspaces ℍ(𝑖) are pair-
wise orthogonal. We prove the existence of bases 𝐵𝑖 with the asserted properties by
induction on 𝑙. For 𝑙 = 1, we can choose an orthonormal basis 𝐵0 of ℍ(0) that exists by
Corollary 2.2.34. Assume that 𝑙 > 1 and that the assertion holds for 𝑙 − 1. According to
the induction hypothesis, there are orthonormal bases 𝐵0 , . . . , 𝐵𝑙−2 of ℍ(0), . . . , ℍ(𝑙 −2),
respectively, such that 𝐵′ = 𝐵0 ∥ ⋯ ∥ 𝐵𝑙−2 is an orthonormal basis of the sum ℍ′ of
these subspaces. It follows from Proposition 2.2.42 that ℍ(𝑙 − 1) = (ℍ′ )⟂ and there is
an orthonormal basis 𝐵𝑙−1 of ℍ(𝑙 − 1) such that 𝐵 ′ ∥ 𝐵𝑙−1 is an orthonormal basis of ℍ.
To prove the converse of the second assertion, assume that there are orthonormal
bases 𝐵𝑖 of ℍ(𝑖) such that their concatenation is an orthonormal basis of ℍ. It is then
easy to verify that the subspaces are pairwise orthogonal to each other. □
We also determine the matrix representation of the Pauli 𝑋 operator with respect to the
basis
(2.3.8) 𝐶 = (|𝑥+ ⟩ , |𝑥− ⟩) = ((|0⟩ + |1⟩)/√2 , (|0⟩ − |1⟩)/√2)
of ℍ1 . We note that
(2.3.9) 𝑋 |𝑥+ ⟩ = (𝑋 |0⟩ + 𝑋 |1⟩)/√2 = (|1⟩ + |0⟩)/√2 = |𝑥+ ⟩
and
(2.3.10) 𝑋 |𝑥− ⟩ = (𝑋 |0⟩ − 𝑋 |1⟩)/√2 = (|1⟩ − |0⟩)/√2 = − |𝑥− ⟩ .
Hence, we have
(2.3.11) Mat𝐶 (𝑋) = ( 1 0 ; 0 −1 ).
Note that this matrix is different from Mat𝐵 (𝑋).
Example 2.3.2. The Pauli 𝑍 operator
(2.3.12) 𝑍 ∶ ℍ1 → ℍ 1
has the representation matrix
(2.3.13) 𝐴 = Mat𝐵 (𝑍) = ( 1 0 ; 0 −1 )
with respect to the computational basis of ℍ1 . Note that this matrix is equal to Mat𝐶 (𝑋)
from (2.3.11). So the Pauli 𝑍 operator is
(2.3.14) 𝑍 = 𝑓𝐴,𝐵 ∶ ℍ1 → ℍ1 , 𝛼 |0⟩ + 𝛽 |1⟩ ↦ 𝛼 |0⟩ − 𝛽 |1⟩ .
Exercise 2.3.3. (1) Determine the matrix representation of the Pauli 𝑌 operator
(2.3.15) 𝑌 ∶ ℍ1 → ℍ 1 , 𝛼 |0⟩ + 𝛽 |1⟩ ↦ −𝑖𝛽 |0⟩ + 𝑖𝛼 |1⟩
with respect to the computational basis of ℍ1 .
(2) Determine the matrix representations of the Pauli 𝑌 and 𝑍 operators with respect
to the basis 𝐶 = (|𝑥− ⟩ , |𝑥+ ⟩) from (2.3.8).
Exercise 2.3.4. (1) Find the matrix representation of the Hadamard operator
(2.3.16) 𝐻 ∶ ℍ1 → ℍ 1 , 𝛼 |0⟩ + 𝛽 |1⟩ ↦ 𝛼 |𝑥+ ⟩ + 𝛽 |𝑥− ⟩
with respect to the computational basis of ℍ1 .
(2) Use the matrix representations of the operators 𝐻, 𝑋, 𝑌 , and 𝑍 to show that
(2.3.17) 𝐻𝑋𝐻 = 𝑍, 𝐻𝑌 𝐻 = −𝑌 , 𝐻𝑍𝐻 = 𝑋.
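Part (2) of the exercise can also be checked numerically. A minimal sketch, assuming the standard matrix representations of 𝐻, 𝑋, 𝑌 , and 𝑍 with respect to the computational basis:

```python
import numpy as np

# matrix representations with respect to the computational basis
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]])

# the conjugation relations (2.3.17)
print(np.allclose(H @ X @ H, Z))   # HXH = Z
print(np.allclose(H @ Y @ H, -Y))  # HYH = -Y
print(np.allclose(H @ Z @ H, X))   # HZH = X
```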
In the following, operators on state spaces ℍ𝑛 will typically be described by their effect on the computational basis elements. For instance, using this representation, the Hadamard operator from (2.3.16) can be written as
(2.3.18) 𝐻 ∶ ℍ1 → ℍ1 , |0⟩ ↦ |𝑥+ ⟩ , |1⟩ ↦ |𝑥− ⟩ .
Proof. Write Mat𝐵,𝐶 (𝑇) = (𝛼𝑖,𝑗 ). Then for all 𝑖 ∈ ℤ𝑘 and 𝑗 ∈ ℤ𝑙 the linearity of the inner product in the second argument and the fact that ⟨𝑐 𝑖 |𝑐𝑚 ⟩ = 𝛿 𝑖,𝑚 imply
(2.3.21) ⟨𝑐 𝑖 |𝑇|𝑏𝑗 ⟩ = ⟨𝑐 𝑖 | (∑_{𝑚=0}^{𝑘−1} 𝛼𝑚,𝑗 |𝑐𝑚 ⟩) = ∑_{𝑚=0}^{𝑘−1} 𝛼𝑚,𝑗 ⟨𝑐 𝑖 |𝑐𝑚 ⟩ = 𝛼𝑖,𝑗 . □
Example 2.3.6. Denote by ⟨⋅|⋅⟩ the Hermitian inner product on ℍ1 with respect to
𝐵 = (|0⟩ , |1⟩). By Proposition 2.3.5, the representation matrix of the Pauli 𝑍 operator
with respect to 𝐵 is
(2.3.22) Mat𝐵 (𝑍) = ( ⟨0|𝑍|0⟩ ⟨0|𝑍|1⟩ ; ⟨1|𝑍|0⟩ ⟨1|𝑍|1⟩ ) = ( ⟨0|0⟩ −⟨0|1⟩ ; ⟨1|0⟩ −⟨1|1⟩ ) = ( 1 0 ; 0 −1 ).
The adjoint of a matrix over ℂ is also called its Hermitian adjoint, Hermitian con-
jugate, or Hermitian transpose.
The notation 𝐴∗ for the adjoint of a matrix 𝐴 over ℂ is in agreement with Definition
2.2.7 where the dual of a complex vector is specified as the matrix that has the conjugate
of this vector as its only row.
Example 2.3.8. Consider the matrix
(2.3.23) 𝐴 = ( 1 𝑖 1+𝑖 ; 1−𝑖 𝑖 1 ) ∈ ℂ(2,3) .
Its adjoint is
(2.3.24) 𝐴∗ = ( 1 1+𝑖 ; −𝑖 −𝑖 ; 1−𝑖 1 ) ∈ ℂ(3,2) .
To show that 𝐴∗ is the only matrix in ℂ(𝑙,𝑘) that satisfies (2.3.30), let 𝐴′ ∈ ℂ(𝑙,𝑘) such
that
(2.3.31) ⟨𝑣⃗ |𝐴𝑤⃗ ⟩ = ⟨𝐴′ 𝑣⃗ |𝑤⃗ ⟩ for all 𝑣⃗ ∈ ℂ𝑘 and 𝑤⃗ ∈ ℂ𝑙 .
So (2.3.25) implies 𝐴∗ = 𝐴′ . □
From Proposition 2.3.11 we obtain the following result which allows us to define
adjoints of linear operators on Hilbert spaces.
Corollary 2.3.12. Let 𝐴 ∈ Hom(ℍ, ℍ′ ). Then there is a uniquely determined operator 𝐴∗ ∈ Hom(ℍ′ , ℍ) such that ⟨𝜓|𝐴𝜑⟩ = ⟨𝐴∗ 𝜓|𝜑⟩ for all |𝜑⟩ ∈ ℍ and |𝜓⟩ ∈ ℍ′ . The operator 𝐴∗ is called the adjoint of 𝐴.
Exercise 2.3.13. Verify that Proposition 2.3.9 also holds for linear operators.
Proof. The map ℂ(𝑙,𝑘) → ℂ𝑘𝑙 , which sends a matrix 𝐴 ∈ ℂ(𝑙,𝑘) to the concatenation of
its column vectors, is an isomorphism of ℂ-vector spaces. We use this map to identify
the matrices in ℂ(𝑙,𝑘) with vectors in ℂ𝑘𝑙 . Using this identification, the map (2.3.33) is
the standard Hermitian inner product on ℂ(𝑙,𝑘) . □
Equipped with the Hilbert-Schmidt inner product, the complex vector space
Hom(ℍ, ℍ′ ) becomes a Hilbert space.
We present another way of writing the Hilbert-Schmidt inner product: for 𝐴, 𝐵 ∈ Hom(ℍ, ℍ′ ) we have ⟨𝐴|𝐵⟩ = tr(𝐴∗ 𝐵).
2.4. Endomorphisms
In this section, we discuss endomorphisms of a Hilbert space ℍ of finite dimension 𝑘
with inner product ⟨⋅|⋅⟩ and their properties.
Proposition 2.4.1. Let 𝐴 ∈ End(ℍ) or 𝐴 ∈ ℂ(𝑘,𝑘) . Then the set Λ of eigenvalues of 𝐴 is nonempty and, with 𝑚𝜆 denoting the algebraic multiplicity of 𝜆 ∈ Λ, we have ∑_{𝜆∈Λ} 𝑚𝜆 = 𝑘,
(2.4.4) tr(𝐴) = ∑_{𝜆∈Λ} 𝑚𝜆 𝜆,
and det(𝐴) = ∏_{𝜆∈Λ} 𝜆^{𝑚𝜆} .
Proof. Let 𝐴 ∈ End(ℍ) or 𝐴 ∈ ℂ(𝑘,𝑘) . The first assertion follows from the fact that
ℂ is algebraically closed, which implies that 𝑝𝐴 (𝑥) is a product of linear factors. The
details are beyond the scope of this book. The other two assertions are derived from
Proposition B.5.27. □
Example 2.4.2. The characteristic polynomial of the identity operator 𝐼1 on ℍ1 is
(2.4.6) 𝑝𝐼 (𝑥) = det(𝑥𝐼 − 𝐼) = det( 𝑥−1 0 ; 0 𝑥−1 ) = (𝑥 − 1)².
Hence, the only eigenvalue of 𝐼 is 1. It has algebraic multiplicity 2 and we have tr(𝐼) =
1 + 1 = 2 and det(𝐼) = 1 ⋅ 1 = 1.
The characteristic polynomial of the Pauli 𝑋 operator is
(2.4.7) 𝑝𝑋 (𝑥) = det(𝑥𝐼 − 𝑋) = det( 𝑥 −1 ; −1 𝑥 ) = (𝑥 − 1)(𝑥 + 1).
Hence, the eigenvalues of 𝑋 are 1 and −1, both with algebraic multiplicity 1, and we
have tr(𝑋) = 1 + (−1) = 0 and det(𝑋) = 1 ⋅ (−1) = −1.
Exercise 2.4.3. Use Proposition 2.4.1 to show that the Pauli 𝑌 and 𝑍 matrices have the
eigenvalues 1 and −1, trace 0, and determinant −1.
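Exercise 2.4.3 can be probed numerically as well. A sketch using NumPy; the matrices below are the standard computational-basis representations of the Pauli operators:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

for M in (X, Y, Z):
    w = np.linalg.eigvals(M)
    # eigenvalues 1 and -1, trace 0, determinant -1
    assert np.allclose(sorted(w.real), [-1, 1]) and np.allclose(w.imag, 0)
    assert np.isclose(np.trace(M), 0) and np.isclose(np.linalg.det(M), -1)
```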
Hermitian matrices and operators are named after the French mathematician
Charles Hermite who lived in the 19th century and made significant contributions to
many areas of mathematics.
Example 2.4.11. The matrix
(2.4.14) 𝐴 = ( 1 𝑖 ; −𝑖 1 )
is Hermitian since
(2.4.15) 𝐴∗ = ( 1 −𝑖 ; 𝑖 1 )ᵀ = ( 1 𝑖 ; −𝑖 1 ) = 𝐴.
Exercise 2.4.12. Show that the Pauli operators 𝑋, 𝑌 , and 𝑍 and the Hadamard operator
𝐻 are Hermitian.
Proposition 2.4.13. (1) The diagonal elements of Hermitian matrices are real num-
bers.
(2) The determinants, trace, and eigenvalues of Hermitian matrices or operators are
real numbers.
(3) The inverse of an invertible Hermitian matrix or operator is Hermitian.
(4) The sum of two Hermitian matrices or operators is Hermitian.
(5) The product 𝐴𝐵 of two Hermitian matrices 𝐴, 𝐵 ∈ ℂ(𝑘,𝑘) or operators 𝐴, 𝐵 ∈ End(ℍ)
is Hermitian if and only if 𝐴𝐵 = 𝐵𝐴.
(6) If 𝐴, 𝐵 ∈ ℂ(𝑘,𝑘) or 𝐴, 𝐵 ∈ End(ℍ) are Hermitian, then 𝐴𝐵𝐴 is Hermitian.
2.4.3. Unitary matrices and operators. In Section 3.3, unitary operators will
be used to model the evolution of quantum systems over time. This section introduces
these operators and presents their properties.
Exercise 2.4.17. Show that Hermitian matrices or operators are involutions if and only
if they are unitary. Conclude that the Pauli operators 𝑋, 𝑌 , and 𝑍 and the Hadamard
operator are unitary.
Proposition 2.4.18. Let 𝑈 ∈ ℂ(𝑘,𝑘) . Then the following statements are equivalent.
(1) 𝑈 is unitary.
(2) 𝑈 is invertible and 𝑈 −1 = 𝑈 ∗ .
(3) The columns of the matrix 𝑈 form an orthonormal basis of ℂ𝑘 .
(4) The rows of the matrix 𝑈 form an orthonormal basis of ℂ𝑘 .
(5) ⟨𝑈 𝑣⃗ , 𝑈 𝑤⃗ ⟩ = ⟨𝑣⃗ , 𝑤⃗ ⟩ for all 𝑣⃗ , 𝑤⃗ ∈ ℂ𝑘 .
(6) ‖𝑈 𝑣⃗ ‖ = ‖𝑣⃗ ‖ for all 𝑣⃗ ∈ ℂ𝑘 .
Proof. Let 𝑈 ∈ ℂ(𝑘,𝑘) . Statements (1) and (2) are equivalent by the definition of unitary matrices and invertibility. Next, we note that 𝑈 ∗ = 𝑈 −1 if and only if 𝑈𝑈 ∗ = 𝐼𝑘 , which is equivalent to the sequence of row vectors of 𝑈 being an orthonormal basis of ℂ𝑘 . This proves the equivalence of the second and fourth statements. Likewise, the equivalence of the second and third statements can be deduced from 𝑈 ∗ 𝑈 = 𝐼𝑘 .
We show that statement (1) and statement (5) are equivalent. Let 𝑈 ∈ ℂ(𝑘,𝑘) be
unitary and let 𝑣,⃗ 𝑤⃗ ∈ ℂ𝑘 . Then 𝑈 ∗ 𝑈 = 𝐼𝑘 and Proposition 2.3.11 imply ⟨𝑈 𝑣,⃗ 𝑈 𝑤⟩⃗ =
⟨𝑈 ∗ 𝑈 𝑣,⃗ 𝑤⟩⃗ = ⟨𝑣,⃗ 𝑤⟩.
⃗ Conversely, assume that ⟨𝑈 𝑣,⃗ 𝑈 𝑤⟩⃗ = ⟨𝑣,⃗ 𝑤⟩⃗ for all 𝑣,⃗ 𝑤⃗ ∈ ℂ𝑘 . This
implies
(2.4.18) 𝑒∗𝑖⃗ 𝑈 ∗ 𝑈 𝑒𝑗⃗ = ⟨𝑈 𝑒 𝑖⃗ |𝑈 𝑒𝑗⃗ ⟩ = ⟨𝑒 𝑖⃗ |𝑒𝑗⃗ ⟩ = 𝛿 𝑖,𝑗
for all 𝑖, 𝑗 ∈ ℤ𝑘 where 𝑒 𝑖⃗ is the 𝑖th standard unit vector in ℂ𝑘 for 0 ≤ 𝑖 < 𝑘. This means
that 𝑈 ∗ 𝑈 = 𝐼𝑘 . So Corollary B.5.21 implies that 𝑈 is invertible and 𝑈 ∗ = 𝑈 −1 ; i.e.,
𝑈 ∗ 𝑈 = 𝑈𝑈 ∗ = 𝐼𝑘 .
Finally, we show that statements (1) and (6) are equivalent. Statement (6) follows
immediately from statement (5) which is equivalent to statement (1). Conversely, as-
sume that ⟨𝑈 𝑣,⃗ 𝑈 𝑣⟩⃗ = ⟨𝑣,⃗ 𝑣⟩⃗ for all 𝑣 ⃗ ∈ ℂ𝑘 . We show that
(2.4.19) 𝑒∗𝑖⃗ 𝑈 ∗ 𝑈 𝑒𝑗⃗ = 𝛿 𝑖,𝑗
for all 𝑖, 𝑗 ∈ ℤ𝑘 . Then Corollary B.5.21 implies that 𝑈 is invertible and 𝑈 ∗ = 𝑈 −1 ; i.e.,
𝑈 ∗ 𝑈 = 𝑈𝑈 ∗ = 𝐼𝑘 . For all 𝑖 ∈ ℤ𝑘 we have
(2.4.20) 𝑒∗𝑖⃗ 𝑈 ∗ 𝑈 𝑒 𝑖⃗ = ⟨𝑈 𝑒 𝑖⃗ |𝑈 𝑒 𝑖⃗ ⟩ = ⟨𝑒 𝑖⃗ , 𝑒 𝑖⃗ ⟩ = 1.
Next, let 𝑖, 𝑗 ∈ ℤ𝑘 and assume that 𝑖 ≠ 𝑗. Then we have
2 = ⟨𝑒𝑖⃗ + 𝑒𝑗⃗ |𝑒𝑖⃗ + 𝑒𝑗⃗ ⟩ = ⟨𝑈(𝑒𝑖⃗ + 𝑒𝑗⃗ )|𝑈(𝑒𝑖⃗ + 𝑒𝑗⃗ )⟩ = ‖𝑈 𝑒𝑖⃗ ‖² + ⟨𝑈 𝑒𝑖⃗ |𝑈 𝑒𝑗⃗ ⟩ + ⟨𝑈 𝑒𝑗⃗ |𝑈 𝑒𝑖⃗ ⟩ + ‖𝑈 𝑒𝑗⃗ ‖² = 2 + 2ℜ(𝑒𝑖⃗ ∗ 𝑈 ∗ 𝑈 𝑒𝑗⃗ ).
It follows that ℜ𝑒∗𝑖⃗ 𝑈 ∗ 𝑈 𝑒𝑗⃗ = 0. Applying similar arguments to ⟨𝑈(𝑒 𝑖⃗ + 𝑖𝑒𝑗⃗ )|𝑈(𝑒 𝑖⃗ + 𝑖𝑒𝑗⃗ )⟩
it can be shown that ℑ𝑒∗𝑖⃗ 𝑈 ∗ 𝑈 𝑒𝑗⃗ = 0. Therefore, (2.4.19) holds. □
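The equivalences of Proposition 2.4.18 are easy to probe numerically; the Hadamard matrix serves as a test case. A sketch, not from the book:

```python
import numpy as np

U = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard, unitary
I2 = np.eye(2)

# (2): U is invertible with U^{-1} = U*
assert np.allclose(U.conj().T @ U, I2) and np.allclose(U @ U.conj().T, I2)
# (3)/(4): columns and rows are orthonormal
assert np.isclose(abs(np.vdot(U[:, 0], U[:, 1])), 0)
assert np.isclose(np.linalg.norm(U[0, :]), 1)
# (5)/(6): inner products and norms are preserved
v = np.array([1 + 2j, 3 - 1j])
w = np.array([-1j, 2])
assert np.isclose(np.vdot(U @ v, U @ w), np.vdot(v, w))
assert np.isclose(np.linalg.norm(U @ v), np.linalg.norm(v))
```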
Exercise 2.4.19. Show that permutation matrices are unitary.
Proof. Let 𝑈, 𝑉 ∈ ℂ(𝑘,𝑘) be unitary. It then follows from Lemma B.5.23, Proposition
2.3.9, and Proposition 2.4.18 that (𝑈𝑉)−1 = 𝑉 −1 𝑈 −1 = 𝑉 ∗ 𝑈 ∗ = (𝑈𝑉)∗ . Also, 𝐼𝑘 is
unitary. Therefore, the set of unitary matrices is a subgroup of 𝖦𝖫(𝑘, ℂ). Since the
product of two matrices of determinant 1 has determinant 1, it follows that SU(𝑘) is a
subgroup of U(𝑘). □
In formula (2.4.22) we deviate from the usual notation and write the scalar product
of the complex number 𝛼 = ⟨𝜓|𝜉⟩ with |𝜑⟩ ∈ ℍ as |𝜑⟩ 𝛼 instead of 𝛼 |𝜑⟩. This allows for
a more intuitive notation.
Example 2.4.26. The computational basis of the single-qubit state space ℍ1 is (|0⟩ , |1⟩).
Examples of the outer products of kets in ℍ1 are |0⟩ ⟨0|, |0⟩ ⟨1|, |1⟩ ⟨0|, and |1⟩ ⟨1|. Let
|𝜓⟩ = 𝛼 |0⟩ + 𝛽 |1⟩ ∈ ℍ1 with complex coefficients 𝛼 and 𝛽. Then the images of this ket
under the four outer products are
|0⟩ ⟨0| (𝛼 |0⟩ + 𝛽 |1⟩) = 𝛼 |0⟩ ⟨0|0⟩ + 𝛽 |0⟩ ⟨0|1⟩ = 𝛼 |0⟩ ,
|0⟩ ⟨1| (𝛼 |0⟩ + 𝛽 |1⟩) = 𝛼 |0⟩ ⟨1|0⟩ + 𝛽 |0⟩ ⟨1|1⟩ = 𝛽 |0⟩ ,
|1⟩ ⟨0| (𝛼 |0⟩ + 𝛽 |1⟩) = 𝛼 |1⟩ ⟨0|0⟩ + 𝛽 |1⟩ ⟨0|1⟩ = 𝛼 |1⟩ ,
|1⟩ ⟨1| (𝛼 |0⟩ + 𝛽 |1⟩) = 𝛼 |1⟩ ⟨1|0⟩ + 𝛽 |1⟩ ⟨1|1⟩ = 𝛽 |1⟩ .
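In the computational basis, |𝑖⟩ ⟨𝑗| is the matrix with a single 1 in row 𝑖, column 𝑗, so the four images above can be reproduced with NumPy. A sketch; the helper `outer` is ours and builds |𝑥⟩ ⟨𝑦| as `np.outer(x, y.conj())`:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

def outer(x, y):
    """The outer product |x><y| as a matrix."""
    return np.outer(x, y.conj())

alpha, beta = 2.0, 3.0j
psi = alpha * ket0 + beta * ket1

assert np.allclose(outer(ket0, ket0) @ psi, alpha * ket0)  # |0><0| psi = alpha|0>
assert np.allclose(outer(ket0, ket1) @ psi, beta * ket0)   # |0><1| psi = beta|0>
assert np.allclose(outer(ket1, ket0) @ psi, alpha * ket1)  # |1><0| psi = alpha|1>
assert np.allclose(outer(ket1, ket1) @ psi, beta * ket1)   # |1><1| psi = beta|1>
```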
We present a more abstract interpretation of the outer product, which is useful for
computations. We recall from Section B.3 that we can view each |𝜑⟩ ∈ ℍ as the linear
map
(2.4.23) ℂ → ℍ, 𝛼 ↦ 𝛼 |𝜑⟩
We present a formula for the representation matrix of |𝜑⟩ ⟨𝜓| with respect to the
basis 𝐵 where |𝜑⟩ and |𝜓⟩ ∈ ℍ. For this, we let
(2.4.25) |𝜑⟩𝐵 = (𝛼0 , . . . , 𝛼𝑘−1 ), |𝜓⟩𝐵 = (𝛽0 , . . . , 𝛽 𝑘−1 ).
Then we have
(2.4.26) Mat𝐵 (|𝜑⟩ ⟨𝜓|) = |𝜑⟩𝐵 |𝜓⟩𝐵∗ = (𝛼𝑖 𝛽𝑗∗ )0≤𝑖,𝑗<𝑘 .
From (2.4.26) we obtain the following results by applying the rules of matrix mul-
tiplication and the formula for the trace.
Proposition 2.4.27. Let |𝜑⟩ , |𝜓⟩ , |𝜉⟩ , |𝜒⟩ ∈ ℍ and let 𝛼 ∈ ℂ. Then the following hold.
(1) (|𝜑⟩ + |𝜓⟩) ⟨𝜉| = |𝜑⟩ ⟨𝜉| + |𝜓⟩ ⟨𝜉|.
(2) |𝜑⟩ (⟨𝜓| + ⟨𝜉|) = |𝜑⟩ ⟨𝜓| + |𝜑⟩ ⟨𝜉|.
(3) (𝛼 |𝜑⟩) ⟨𝜓| = 𝛼 |𝜑⟩ ⟨𝜓|.
(4) |𝜑⟩ (𝛼 ⟨𝜓|) = 𝛼 |𝜑⟩ ⟨𝜓|.
(5) (|𝜑⟩ ⟨𝜓|)∗ = |𝜓⟩ ⟨𝜑|.
(6) tr(|𝜑⟩ ⟨𝜓|) = ⟨𝜓|𝜑⟩.
(7) |𝜑⟩ ⟨𝜓| ∘ |𝜉⟩ ⟨𝜒| = ⟨𝜓|𝜉⟩ |𝜑⟩ ⟨𝜒|.
(8) ⟨𝜑|𝜓⟩⟨𝜉|𝜒⟩ = ⟨𝜑||(|𝜓⟩ ⟨𝜉|)||𝜒⟩.
Proof. Let 𝑖, 𝑗, 𝑢, 𝑣 ∈ ℤ𝑘 . Then Proposition 2.4.27 implies tr(|𝑏𝑖 ⟩ ⟨𝑏𝑗 | ∘ |𝑏𝑢 ⟩ ⟨𝑏𝑣 |) = ⟨𝑏𝑗 |𝑏𝑢 ⟩⟨𝑏𝑖 |𝑏𝑣 ⟩ = 𝛿𝑗,𝑢 𝛿 𝑖,𝑣 . Hence, the sequence (|𝑏𝑖 ⟩ ⟨𝑏𝑗 |) is orthonormal. Since its length is 𝑘², which is the dimension of End(ℍ) over ℂ, it is a basis of this ℂ-algebra.
Also, (2.4.27) follows from (2.2.12). □
Corollary 2.4.33.
(2.4.29) 𝐼ℍ = ∑_{𝑖=0}^{𝑘−1} |𝑏𝑖 ⟩ ⟨𝑏𝑖 | .
However, a projection need not be an orthogonal projection. For example, for a projection 𝑃 on ℂ² with 𝑃(1, 2) = (−2, 2) we have ⟨𝑃(1, 2)|(1, 2) − 𝑃(1, 2)⟩ = ⟨(−2, 2)|(1, 2) − (−2, 2)⟩ = ⟨(−2, 2)|(3, 0)⟩ = −6 ≠ 0, so 𝑃 is not an orthogonal projection.
In contrast, the projection 𝑃 ∶ ℂ² → ℂ², (𝛼, 𝛽) ↦ (0, 𝛽) is an orthogonal projection since ⟨𝑃(𝛼, 𝛽)|(𝛼, 𝛽) − 𝑃(𝛼, 𝛽)⟩ = ⟨(0, 𝛽)|(𝛼, 0)⟩ = 0 for all 𝛼, 𝛽 ∈ ℂ.
Exercise 2.4.38. Show that for any orthogonal projection 𝑃 on ℍ and any |𝜓⟩ ∈ ℍ we
have ‖𝑃 |𝜓⟩‖ ≤ ‖𝜓‖.
Proposition 2.4.39. Let 𝑃 ∈ End(ℍ). Then the following are true.
(1) If 𝑃 is a projection, then 𝑃 ∗ is a projection.
(2) If 𝑃 is an orthogonal projection, then 𝑃∗ is an orthogonal projection.
Exercise 2.4.40. Prove Proposition 2.4.39.
(4) Let 𝐵0 , . . . , 𝐵𝑙−1 be orthonormal bases of ℍ(0), . . . , ℍ(𝑙 − 1), respectively, such that
𝐵 = 𝐵0 ‖ ⋯ ‖𝐵𝑙−1 is an orthonormal basis of ℍ which exists by Proposition 2.2.44.
Then for all 𝑖 ∈ ℤ𝑙 we have
(2.4.37) 𝑃𝑖 = ∑_{|𝑏⟩∈𝐵𝑖} |𝑏⟩ ⟨𝑏| .
Proof. Let 𝑖 ∈ ℤ𝑙 and let |𝜑⟩ ∈ ℍ. The uniqueness of the representation of the elements of ℍ as a sum of elements in the ℍ(𝑖) implies 𝑃𝑖 ² |𝜑⟩ = 𝑃𝑖 (|𝜑(𝑖)⟩) = |𝜑(𝑖)⟩ = 𝑃𝑖 |𝜑⟩. Also, the sequence (𝑃𝑖 ) is orthogonal because of the orthogonality of the ℍ(𝑖). This proves the first assertion. Next, for 𝑖, 𝑗 ∈ ℤ𝑙 with 𝑖 ≠ 𝑗 we have 𝑃𝑖 𝑃𝑗 = 0. Therefore, the 𝑃𝑖 are orthogonal to each other with respect to the Hilbert-Schmidt inner product. So the 𝑃𝑖 are linearly independent by Corollary 2.2.34. The last assertion follows from Proposition 2.4.31. □
Example 2.4.45. Recall that
(2.4.38) (|𝑥+ ⟩ , |𝑥− ⟩) = ((|0⟩ + |1⟩)/√2 , (|0⟩ − |1⟩)/√2)
is an orthonormal basis of ℍ1 . The orthogonal projection of |0⟩ onto ℂ |𝑥+ ⟩ is
(2.4.39) |𝑥+ ⟩ ⟨𝑥+ |0⟩ = (1/√2) |𝑥+ ⟩ .
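The computation in Example 2.4.45 can be checked with a short script: 𝑃 = |𝑥+ ⟩ ⟨𝑥+ | is idempotent and Hermitian, and sends |0⟩ to (1/√2) |𝑥+ ⟩. A sketch, assuming the NumPy conventions used above:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
xp = (ket0 + ket1) / np.sqrt(2)    # |x+>

P = np.outer(xp, xp.conj())        # orthogonal projection onto C|x+>

assert np.allclose(P @ P, P)                    # idempotent
assert np.allclose(P.conj().T, P)               # Hermitian, hence orthogonal
assert np.allclose(P @ ket0, xp / np.sqrt(2))   # (2.4.39)
```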
2.4.6. Schur decomposition. As we have seen in Section 2.4.1, not all matrices in ℂ(𝑘,𝑘) are diagonalizable. However, we can prove the following weaker result, which will allow us to prove the spectral theorem in the next section. It was first proved by the mathematician Issai Schur in the early 20th century.
Theorem 2.4.46 (Schur decomposition theorem). Let 𝐴 ∈ ℂ(𝑘,𝑘) . Assume that 𝐴 has
the 𝑙 distinct eigenvalues 𝜆0 , . . . , 𝜆𝑙−1 with algebraic multiplicities 𝑚0 , . . . , 𝑚𝑙−1 . Then 𝑘 = ∑_{𝑖=0}^{𝑙−1} 𝑚𝑖 and there is a unitary matrix 𝑈 ∈ ℂ(𝑘,𝑘) and an upper triangular matrix 𝑇 with diagonal
(2.4.40) (𝜆0 , . . . , 𝜆0 , 𝜆1 , . . . , 𝜆1 , . . . , 𝜆𝑙−1 , . . . , 𝜆𝑙−1 ),
where each 𝜆𝑖 occurs 𝑚𝑖 times,
such that
(2.4.41) 𝐴 = 𝑈𝑇𝑈 ∗ .
Such a representation is called Schur decomposition of 𝐴.
Proof. We prove the assertion by induction on 𝑘 and, in doing so, present an algorithm
to construct a Schur decomposition of 𝐴. For 𝑘 = 1 the assertion is true since in this
case, the matrix 𝐴 is in upper triangular form. So we can set 𝑈 = 𝐼1 .
Let 𝑘 > 1 and assume that the assertion holds for all 𝑚 < 𝑘. Let 𝑣 ⃗ be an eigenvector
associated with the eigenvalue 𝜆0 that exists by Proposition B.7.21. Assume, without
loss of generality, that ‖𝑣⃗ ‖ = 1. By Theorem 2.2.33, there is a matrix 𝑋 ∈ ℂ(𝑘,𝑘−1) such
that the column vectors of the matrix
(2.4.42) (𝑣 ⃗ 𝑋)
form an orthonormal basis of ℂ𝑘 . Proposition 2.4.18 implies that this matrix is unitary.
So we have
(2.4.43) ( 𝑣⃗ ∗ ; 𝑋 ∗ ) 𝐴 (𝑣⃗ 𝑋) = ( 𝑣⃗ ∗ 𝐴𝑣⃗ , 𝑣⃗ ∗ 𝐴𝑋 ; 𝑋 ∗ 𝐴𝑣⃗ , 𝑋 ∗ 𝐴𝑋 ) = ( 𝜆0 𝑣⃗ ∗ 𝑣⃗ , 𝑣⃗ ∗ 𝐴𝑋 ; 𝜆0 𝑋 ∗ 𝑣⃗ , 𝑋 ∗ 𝐴𝑋 ) = ( 𝜆0 , 𝑣⃗ ∗ 𝐴𝑋 ; 0 , 𝑋 ∗ 𝐴𝑋 ).
The lower-left corner of this matrix is zero because all columns of 𝑋 are orthogonal to
𝑣⃗ . Also, 𝑋 ∗ 𝐴𝑋 is in ℂ(𝑘−1,𝑘−1) and since the matrix (𝑣⃗ 𝑋) is unitary, we have (𝑣⃗ 𝑋)−1 = (𝑣⃗ 𝑋)∗ . So by (2.4.43) and Proposition B.5.30 we have
(2.4.44) 𝑝𝐴 (𝑥) = (𝑥 − 𝜆0 )𝑝𝑋 ∗ 𝐴𝑋 (𝑥)
which implies
(2.4.45) 𝑝𝑋 ∗ 𝐴𝑋 (𝑥) = (𝑥 − 𝜆0 )^{𝑚0 −1} ∏_{𝑖=1}^{𝑙−1} (𝑥 − 𝜆𝑖 )^{𝑚𝑖} .
By the induction hypothesis, there is a unitary matrix 𝑌 ∈ ℂ(𝑘−1,𝑘−1) and an upper triangular matrix 𝑍 such that 𝑋 ∗ 𝐴𝑋 = 𝑌 𝑍𝑌 ∗ . Define
(2.4.48) 𝑈 = (𝑣⃗ 𝑋𝑌 ) .
Then we have
(2.4.50) 𝑈 ∗ 𝐴𝑈 = ( 𝜆0 , 𝑣⃗ ∗ 𝐴𝑋𝑌 ; 0 , 𝑍 ),
which is an upper triangular matrix. Denote it by 𝑇. The diagonal of the upper triangular matrix 𝑍 is shown in (2.4.47). Therefore, by (2.4.50) the diagonal of 𝑇 is
(2.4.51) (𝜆0 , . . . , 𝜆0 , 𝜆1 , . . . , 𝜆1 , . . . , 𝜆𝑙−1 , . . . , 𝜆𝑙−1 ), where each 𝜆𝑖 occurs 𝑚𝑖 times.
(2.4.53) 𝑋 = (1/√2) ( 1 ; −1 )
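The induction in the proof of Theorem 2.4.46 is constructive, and the construction can be written out in a few lines of NumPy. The sketch below (the helper names are ours, not the book's) splits off one normalized eigenvector, completes it to an orthonormal basis as in Theorem 2.2.33, and recurses on the lower-right block:

```python
import numpy as np

def complete_to_unitary(v):
    """Extend the unit vector v to an orthonormal basis of C^k,
    returned as the columns of a unitary matrix with first column v."""
    k = v.shape[0]
    cols = [v]
    for e in np.eye(k, dtype=complex):
        w = e - sum(np.vdot(c, e) * c for c in cols)  # Gram-Schmidt step
        if np.linalg.norm(w) > 1e-10:
            cols.append(w / np.linalg.norm(w))
    return np.column_stack(cols[:k])

def schur_decomposition(A):
    """Return (U, T) with A = U T U*, U unitary, T upper triangular."""
    A = np.asarray(A, dtype=complex)
    k = A.shape[0]
    if k == 1:
        return np.eye(1, dtype=complex), A
    _, V = np.linalg.eig(A)
    v = V[:, 0] / np.linalg.norm(V[:, 0])    # eigenvector for lambda_0
    Q = complete_to_unitary(v)
    B = Q.conj().T @ A @ Q                   # = [[lambda_0, *], [0, A']], cf. (2.4.43)
    U2, T2 = schur_decomposition(B[1:, 1:])  # induction hypothesis
    W = np.eye(k, dtype=complex)
    W[1:, 1:] = U2
    return Q @ W, W.conj().T @ B @ W

A = np.array([[2, 1, 0], [0, 1, 3], [1, 0, 1]])
U, T = schur_decomposition(A)
```

Then `U @ T @ U.conj().T` reproduces `A` up to rounding, `U` is unitary, and `T` is upper triangular with the eigenvalues of `A` on its diagonal.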
2.4.7. The spectral theorem. The aim of this section is to introduce the re-
nowned spectral theorem, which establishes the diagonalizability of normal matrices
and the existence of an orthonormal basis consisting of eigenvectors for normal op-
erators. The finite-dimensional version which we present here goes back to the early
20th century and is closely associated with the contributions of mathematicians such as
David Hilbert. It assumes pivotal significance in the postulates of quantum mechan-
ics presented in Chapter 3. As previously mentioned, the broader domain of quan-
tum mechanics necessitates the infinite-dimensional analog, which can be attributed
to mathematicians like John von Neumann and Hermann Weyl.
The next proposition presents important examples of normal matrices and opera-
tors.
Proposition 2.4.49. If 𝐴 ∈ ℂ(𝑘,𝑘) or 𝐴 ∈ End(ℍ) is Hermitian or unitary, then 𝐴 is
normal.
Exercise 2.4.50. Prove Proposition 2.4.49.
Proposition 2.4.51. (1) If 𝐴 ∈ ℂ(𝑘,𝑘) is normal, then 𝐴∗ is normal. (2) Diagonal matrices in ℂ(𝑘,𝑘) are normal. (3) Every normal upper triangular matrix in ℂ(𝑘,𝑘) is diagonal.
Proof. Let 𝐴 ∈ ℂ(𝑘,𝑘) be a normal matrix. We apply (2.3.25) and obtain (𝐴∗ )∗ 𝐴∗ = 𝐴𝐴∗ = 𝐴∗ 𝐴 = 𝐴∗ (𝐴∗ )∗ . This proves the first assertion.
To show the second assertion, let 𝐷 ∈ ℂ(𝑘,𝑘) be a diagonal matrix with diago-
nal (𝜆0 , . . . , 𝜆𝑘−1 ) ∈ ℂ𝑘 . Then 𝐷∗ 𝐷 and 𝐷𝐷∗ are diagonal matrices with diagonal
(|𝜆0 |2 , . . . , |𝜆𝑘−1 |2 ). Hence, we have 𝐷𝐷∗ = 𝐷∗ 𝐷 which means that 𝐷 is normal.
We prove the last statement and let 𝐴 = (𝑎𝑖,𝑗 ) be a normal upper triangular matrix. We denote by (𝑟⃗0 , . . . , 𝑟⃗𝑘−1 ) the row vectors of 𝐴 and by (𝑐⃗0 , . . . , 𝑐⃗𝑘−1 ) the column vectors of 𝐴. Then 𝐴∗ 𝐴 = 𝐴𝐴∗ implies that for any 𝑢 ∈ ℤ𝑘 the diagonal entry of this matrix with row and column index 𝑢 can be computed both as ‖𝑟⃗𝑢 ‖² and as ‖𝑐⃗𝑢 ‖². So for 0 ≤ 𝑢 < 𝑘 we have
(2.4.58) ‖𝑟⃗𝑢 ‖² = ‖𝑐⃗𝑢 ‖² .
We prove by induction on 𝑢 that for 𝑢 = 0, 1, . . . , 𝑘 we have
(2.4.59) 𝑟 𝑖⃗ = 𝑐 𝑖⃗ = 𝑎𝑖,𝑖 𝑒 𝑖⃗ , 0 ≤ 𝑖 < 𝑢.
For 𝑢 = 𝑘 this implies that 𝐴 is diagonal. For the base case 𝑢 = 0 there is nothing
to show. For the induction step, assume that 0 ≤ 𝑢 < 𝑘 and that (2.4.59) holds for 𝑢.
Since 𝐴 is upper triangular, it follows from (2.4.59) that
(2.4.60) 𝑐ᵆ⃗ = 𝑎ᵆ,ᵆ 𝑒 ᵆ⃗
and
(2.4.61) 𝑟ᵆ⃗ = (0, . . . , 0, 𝑎ᵆ,ᵆ , 𝑎ᵆ,ᵆ+1 , . . . , 𝑎ᵆ,𝑘−1 ).
So (2.4.58) implies |𝑎𝑢,𝑢 |² = ∑_{𝑗=𝑢}^{𝑘−1} |𝑎𝑢,𝑗 |² and thus 𝑎𝑢,𝑢+1 = ⋯ = 𝑎𝑢,𝑘−1 = 0, which shows that 𝑟⃗𝑢 = 𝑎𝑢,𝑢 𝑒⃗𝑢 . □
The next exercise shows that there are normal matrices that are neither unitary nor Hermitian.
Exercise 2.4.52. Show that the matrix
(2.4.62) ( 1 1+𝑖 1 ; −1+𝑖 1 1 ; −1 −1 1 )
is normal but not Hermitian or unitary.
with 𝑀𝑗 = ∑_{𝑖=0}^{𝑗−1} 𝑚𝑖 for 0 ≤ 𝑗 ≤ 𝑙, then 𝑈𝑗 is an orthonormal basis of the eigenspace associated with 𝜆𝑗 for all 𝑗 ∈ ℤ𝑙 .
Theorem 2.4.56 (Spectral theorem). Let 𝐴 ∈ End(ℍ) be normal. Let Λ be the set of
eigenvalues of 𝐴. For 𝜆 ∈ Λ denote by 𝑃𝜆 the orthogonal projection onto the eigenspace
𝐸 𝜆 corresponding to 𝜆. Then the following are true.
(1) There are orthonormal bases 𝐵𝜆 of 𝐸 𝜆 for all 𝜆 ∈ Λ such that their concatenation is
an orthonormal basis of ℍ.
(2) The eigenspaces 𝐸 𝜆 are orthogonal to each other, and their sum is ℍ.
(3) 𝑃𝜆 = ∑_{|𝑏⟩∈𝐵𝜆} |𝑏⟩ ⟨𝑏|.
(4) ∑𝜆∈Λ 𝑃𝜆 = 𝐼ℍ .
(5) 𝐴 = ∑𝜆∈Λ 𝜆𝑃𝜆 . This representation of 𝐴 is called the spectral decomposition of 𝐴.
Proof. Let 𝑙 = |Λ| and Λ = {𝜆0 , . . . , 𝜆𝑙−1 }. Denote by 𝐴 also the representation matrix
of 𝐴 with respect to an orthonormal basis 𝐶 of ℍ. Use the notation of Theorem 2.4.53.
Then
(2.4.69) 𝐵 = 𝐶𝑈 = (|𝑢0 ⟩ , . . . , |𝑢𝑘−1 ⟩)
is another orthonormal basis of ℍ. Let 𝑗 ∈ ℤ𝑙 . It follows from the properties of 𝑈 that 𝐵𝜆𝑗 = (|𝑢𝑀𝑗 ⟩ , . . . , |𝑢𝑀𝑗+1 −1 ⟩) is an orthonormal basis of 𝐸 𝜆𝑗 . This proves the first assertion.
The second assertion follows immediately from the first. The second, third, and
fourth assertions follow from Proposition 2.4.44. Using the fourth assertion we obtain
(2.4.70) 𝐴 |𝜑⟩ = 𝐴 ∑_{𝜆∈Λ} 𝑃𝜆 |𝜑⟩ = ∑_{𝜆∈Λ} 𝐴𝑃𝜆 |𝜑⟩ = ∑_{𝜆∈Λ} 𝜆𝑃𝜆 |𝜑⟩ . □
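For a Hermitian (hence normal) operator, the spectral decomposition can be computed directly with `np.linalg.eigh`. A sketch for the Pauli 𝑋 operator (our example, not the book's):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)

w, V = np.linalg.eigh(X)  # eigenvalues ascending, orthonormal eigenvectors as columns
# orthogonal projections P_lambda = |v><v| onto the one-dimensional eigenspaces
P = [np.outer(V[:, i], V[:, i].conj()) for i in range(2)]

assert np.allclose(P[0] + P[1], np.eye(2))        # sum of the projections is I
assert np.allclose(w[0] * P[0] + w[1] * P[1], X)  # X = sum_lambda lambda P_lambda
```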
Proposition 2.4.59. Let 𝐴 ∈ End(ℍ) be normal, let Λ be the set of eigenvalues of 𝐴, and
let
(2.4.73) 𝐴 = ∑_{𝜆∈Λ} 𝜆𝑃𝜆
be its spectral decomposition. Then
(2.4.74) 𝐴∗ = ∑_{𝜆∈Λ} 𝜆∗ 𝑃𝜆 , |𝐴| = ∑_{𝜆∈Λ} |𝜆| 𝑃𝜆 , 𝐴𝑛 = ∑_{𝜆∈Λ} 𝜆𝑛 𝑃𝜆 for all 𝑛 ∈ ℕ.
Proof. We have
𝐴∗ = (∑_{𝜆∈Λ} 𝜆𝑃𝜆 )∗ by (2.4.73)
= ∑_{𝜆∈Λ} 𝜆∗ 𝑃𝜆∗ = ∑_{𝜆∈Λ} 𝜆∗ 𝑃𝜆 ,
since the orthogonal projections 𝑃𝜆 are Hermitian.
This proves the first assertion. The other two assertions can be verified using the fact
that by Theorem 2.4.56 the eigenspaces 𝑃𝜆 (ℍ) are pairwise orthogonal. □
We note that the second and third equations in (2.4.74) may show spectral decom-
positions, since the absolute values and powers of different eigenvalues of a normal
operator may be the same. To obtain the spectral decompositions, we have to group
the projections appropriately.
The spectral theorem allows us to characterize involutions, projections, and Her-
mitian and unitary operators by their eigenvalues.
Proposition 2.4.60. Let 𝐴 ∈ ℂ(𝑘,𝑘) or 𝐴 ∈ End(ℍ) be normal and let Λ be the set of
eigenvalues of 𝐴. Then the following hold.
(1) 𝐴 is an involution if and only if Λ ⊂ {−1, 1}.
(2) 𝐴 is a projection if and only if Λ ⊂ {0, 1}.
(3) 𝐴 is Hermitian if and only if all its eigenvalues are real numbers.
(4) 𝐴 is unitary if and only if all of its eigenvalues have absolute value 1.
Proof. Let
(2.4.75) 𝐴 = ∑_{𝜆∈Λ} 𝜆𝑃𝜆
be the spectral decomposition of 𝐴. Recall that by Proposition 2.4.44, the 𝑃𝜆 are linearly independent. This will be used in all arguments.
Using Theorem 2.4.56 and Proposition 2.4.59 we see that 𝐴 is an involution if and
only if
(2.4.76) 𝐴² = ∑_{𝜆∈Λ} 𝜆²𝑃𝜆 = 𝐼ℍ = ∑_{𝜆∈Λ} 𝑃𝜆 .
Hence, 𝐴 is an involution if and only if 𝜆2 = 1 for all 𝜆 ∈ Λ. This is true if and only if
Λ ⊂ {1, −1}.
Next, 𝐴 is a projection if and only if 𝐴2 = 𝐴. By Theorem 2.4.56 and Proposition
2.4.59 this is true if and only if 𝜆2 = 𝜆 for all 𝜆 ∈ Λ which is equivalent to Λ ⊂ {0, 1}.
In addition, 𝐴 is Hermitian if and only if 𝐴∗ = 𝐴. By Theorem 2.4.56 and Proposition 2.4.59 this is equivalent to 𝜆∗ = 𝜆 and thus 𝜆 ∈ ℝ for all 𝜆 ∈ Λ.
Finally, 𝐴 is unitary if and only if 𝐴 is invertible and 𝐴∗ = 𝐴−1 . By Theorem 2.4.56 and Proposition 2.4.59 this is equivalent to 0 ∉ Λ and 𝜆∗ = 1/𝜆 for all 𝜆 ∈ Λ, which means that |𝜆| = 1 for all 𝜆 ∈ Λ. □
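These characterizations are easy to confirm numerically for the operators seen so far: the Hadamard operator is a Hermitian unitary involution, so by Proposition 2.4.60 its eigenvalues must lie in {−1, 1}, while the 𝜋/8 operator 𝑇 is unitary but not Hermitian, so its eigenvalues need only have absolute value 1. A sketch:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hermitian unitary involution
T = np.diag([1, np.exp(1j * np.pi / 4)])      # pi/8 operator, unitary

# involution and Hermitian: eigenvalues are exactly -1 and 1
assert np.allclose(np.linalg.eigvalsh(H), [-1, 1])
# unitary: all eigenvalues lie on the unit circle
assert np.allclose(np.abs(np.linalg.eigvals(T)), 1)
```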
Proof. It suffices to prove the assertion for 𝐴 ∈ ℂ(𝑘,𝑘) . First, assume that 𝐴 is Hermit-
ian and let 𝑢⃗ ∈ ℂ𝑘 . Then we have
2.4.8. Definite operators and matrices. We define definite matrices and op-
erators.
Proposition 2.4.64. All normal positive definite, positive semidefinite, negative definite,
and negative semidefinite operators or matrices are Hermitian.
This shows that the first assertion implies the other two statements.
Now assume that there is a normal 𝐵 ∈ End(ℍ) such that 𝐴 = 𝐵𝐵 ∗ . Let
(2.4.82) 𝐵 = ∑_{𝜆∈Λ′} 𝜆𝑃𝜆
Proof. As shown in Exercise 2.4.68, the matrix 𝐴∗ 𝐴 is a positive semidefinite and Her-
mitian matrix in ℂ(𝑙,𝑙) . It follows from Theorem 2.4.53 that there is a unitary matrix
𝑉 ∈ ℂ(𝑙,𝑙) such that
(2.4.85) 𝑉 ∗ 𝐴∗ 𝐴𝑉 = 𝐷′ = ( 𝐷 0 ; 0 0 )
where 𝐷 ∈ ℂ(𝑚,𝑚) is a positive definite diagonal matrix, 𝑚 is the number of nonzero
eigenvalues of 𝐴∗ 𝐴, and these eigenvalues are positive real numbers and diagonal el-
ements of 𝐷. By Theorem 2.4.53, the columns of 𝑉 form an orthonormal basis of ℂ𝑙
consisting of eigenvectors of 𝐴∗ 𝐴. The eigenvalue corresponding to the 𝑖th column of
𝑉 is the 𝑖th diagonal entry of 𝐷′ for 0 ≤ 𝑖 < 𝑙. We write
(2.4.86) 𝑉 = (𝑉1 𝑉2 )
where 𝑉1 ∈ ℂ(𝑙,𝑚) and 𝑉2 ∈ 𝐶 (𝑙,𝑙−𝑚) . Then the columns of 𝑉1 are linearly independent
eigenvectors corresponding to the nonzero eigenvalues of 𝐴∗ 𝐴. Also, the columns of 𝑉2
are eigenvectors corresponding to the eigenvalue 0 of 𝐴∗ 𝐴. So (2.4.85) can be rewritten
as
(2.4.87) ( 𝑉1∗ ; 𝑉2∗ ) 𝐴∗ 𝐴 (𝑉1 𝑉2 ) = ( 𝑉1∗ 𝐴∗ 𝐴𝑉1 , 𝑉1∗ 𝐴∗ 𝐴𝑉2 ; 𝑉2∗ 𝐴∗ 𝐴𝑉1 , 𝑉2∗ 𝐴∗ 𝐴𝑉2 ) = ( 𝐷 0 ; 0 0 ).
This implies
(2.4.88) (𝐴𝑉1 )∗ 𝐴𝑉1 = 𝑉1∗ 𝐴∗ 𝐴𝑉1 = 𝐷, (𝐴𝑉2 )∗ 𝐴𝑉2 = 𝑉2∗ 𝐴∗ 𝐴𝑉2 = 0
which implies
(2.4.89) ‖𝐴𝑉2 ‖² = tr(𝑉2∗ 𝐴∗ 𝐴𝑉2 ) = 0.
Hence we have
(2.4.90) 𝐴𝑉2 = 0.
The fact that 𝑉 is unitary implies
(2.4.91) 𝑉1∗ 𝑉1 = 𝐼𝑚 , 𝑉2∗ 𝑉2 = 𝐼𝑙−𝑚 , 𝑉1 𝑉1∗ + 𝑉2 𝑉2∗ = 𝐼𝑙 .
Now define
(2.4.92) 𝑈1 = 𝐴𝑉1 𝐷−1/2
where 𝐷−1/2 is the diagonal matrix whose diagonal entries are the inverse square roots
of the diagonal entries of 𝐷. Then the third equation in (2.4.91), (2.4.92), and (2.4.90)
imply
(2.4.93) 𝑈1 𝐷1/2 𝑉1∗ = 𝐴𝑉1 𝐷−1/2 𝐷1/2 𝑉1∗ = 𝐴𝑉1 𝑉1∗ = 𝐴(𝐼𝑙 − 𝑉2 𝑉2∗ ) = 𝐴 − (𝐴𝑉2 )𝑉2∗ = 𝐴.
Also, (2.4.92) and (2.4.88) imply
(2.4.94) 𝑈1∗ 𝑈1 = 𝐷−1/2 𝑉1∗ 𝐴∗ 𝐴𝑉1 𝐷−1/2 = 𝐷−1/2 𝐷𝐷−1/2 = 𝐼𝑚 .
Hence, the columns of 𝑈1 form an orthonormal sequence which by Theorem 2.2.33 can be extended to an orthonormal basis of ℂ𝑘 . Therefore, we can choose 𝑈2 ∈ ℂ(𝑘,𝑘−𝑚) so
that
(2.4.95) 𝑈 = (𝑈1 𝑈2 )
is a unitary matrix. Finally, we define the matrix 𝐵 ∈ ℂ(𝑘,𝑙) as
(2.4.96) 𝐵 = ( 𝐷1/2 0 ; 0 0 ).
Then we have
(2.4.97) 𝑈𝐵𝑉 ∗ = (𝑈1 𝑈2 ) ( 𝐷1/2 0 ; 0 0 ) ( 𝑉1∗ ; 𝑉2∗ ) = 𝑈1 𝐷1/2 𝑉1∗ = 𝐴. □
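NumPy's `np.linalg.svd` computes exactly such a decomposition. A sketch using the matrix 𝐴 from Example 2.3.8; the reshaping of the singular values into a rectangular matrix is ours:

```python
import numpy as np

A = np.array([[1, 1j, 1 + 1j],
              [1 - 1j, 1j, 1]])       # the matrix from (2.3.23)

U, s, Vh = np.linalg.svd(A)           # A = U S V*, s holds the singular values
S = np.zeros(A.shape)
S[:len(s), :len(s)] = np.diag(s)      # embed the singular values

assert np.allclose(U @ S @ Vh, A)     # reconstruction
assert np.allclose(U @ U.conj().T, np.eye(2))
assert np.allclose(Vh @ Vh.conj().T, np.eye(3))
```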
Note that the singular values of a complex matrix 𝐴 are the square roots of the
nonzero eigenvalues of the normal square matrix 𝐴∗ 𝐴.
Exercise 2.4.68. Let 𝐴 ∈ ℂ(𝑘,𝑘) or 𝐴 ∈ End(ℍ). Show that 𝐴𝐴∗ is positive semidefinite
and Hermitian.
It can be shown that in certain cases the function 𝑓(𝐴) can also be defined using
power series, even if 𝐴 is not normal. Also, we note that from (2.4.100), the spectral
decomposition of 𝑓(𝐴) can be easily obtained.
Example 2.4.70. Consider the Pauli operator 𝑍 and the 𝜋/8 operator 𝑇 which have the
spectral decompositions
(2.4.101) 𝑍 = |0⟩ ⟨0| − |1⟩ ⟨1| , 𝑇 = |0⟩ ⟨0| + 𝑒𝑖𝜋/4 |1⟩ ⟨1| .
Also, let
(2.4.102) 𝑓 ∶ ℂ → ℂ, 𝑥 ↦ 𝑒−𝑖𝜋𝑥/8 .
The eigenvalues of 𝑍 are 1 and −1. Therefore, we have
(2.4.103) 𝑓(𝑍) = 𝑒−𝑖𝜋𝑍/8 = 𝑒−𝑖𝜋/8 |0⟩ ⟨0| + 𝑒𝑖𝜋/8 |1⟩ ⟨1| = 𝑒−𝑖𝜋/8 (|0⟩ ⟨0| + 𝑒𝑖𝜋/4 |1⟩ ⟨1|) = 𝑒−𝑖𝜋/8 𝑇.
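The computation in Example 2.4.70 can be reproduced by applying 𝑓 to the eigenvalues in a spectral decomposition obtained numerically. A sketch (our code, assuming the matrix conventions above):

```python
import numpy as np

Z = np.array([[1, 0], [0, -1]], dtype=complex)
T = np.diag([1, np.exp(1j * np.pi / 4)])       # pi/8 operator

# f(Z) = sum_lambda f(lambda) P_lambda, computed via an eigendecomposition of Z
w, V = np.linalg.eigh(Z)
fZ = V @ np.diag(np.exp(-1j * np.pi * w / 8)) @ V.conj().T

# (2.4.103): f(Z) = e^{-i pi/8} T
assert np.allclose(fZ, np.exp(-1j * np.pi / 8) * T)
```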
Exercise 2.4.71. Let 𝐴 be a normal linear operator on ℍ and let 𝛼, 𝛽 ∈ ℝ. Prove that
𝑒𝑖𝐴(𝛼+𝛽) = 𝑒𝑖𝐴𝛼 𝑒𝑖𝐴𝛽 .
Proof. First, let 𝐴 ∈ End(ℍ) be Hermitian and let Λ be the set of its eigenvalues. Then Proposition 2.4.60 implies that Λ ⊂ ℝ. Hence, we have |𝑒𝑖𝜆 | = 1 for all 𝜆 ∈ Λ. Therefore, it follows from Proposition 2.4.60 that 𝑒𝑖𝐴 is unitary.
Now assume that 𝑈 ∈ End(ℍ) is unitary. Let Λ be the set of eigenvalues of 𝑈.
Then it follows from Proposition 2.4.60 that |𝜆| = 1 for all 𝜆 ∈ Λ. Hence, for any 𝜆 ∈ Λ
there is 𝛼𝜆 ∈ ℝ such that 𝜆 = 𝑒𝑖𝛼𝜆 . Set
(2.4.106) 𝐴 = ∑_{𝜆∈Λ} 𝛼𝜆 𝑃𝜆 .
Then 𝐴 is a linear operator on ℍ whose eigenvalues are all real numbers. Proposition
2.4.60 implies that 𝐴 is Hermitian. Also, we see from (2.4.106) that 𝑈 = 𝑒𝑖𝐴 . □
Proof. Since 𝐴 is an involution, it follows from Proposition 2.4.60 that the eigenvalues
of 𝐴 are in {1, −1}. If 1 is an eigenvalue of 𝐴, then we denote by 𝑃1 the orthogonal
projection onto the corresponding eigenspace. Otherwise, we set 𝑃1 = 0. Likewise,
if −1 is an eigenvalue of 𝐴, then we denote by 𝑃−1 the orthogonal projection onto the
corresponding eigenspace. Otherwise, we set 𝑃1 = 0. Then we have
(2.4.109) 𝐼ℍ = 𝑃1 + 𝑃−1 , 𝐴 = 𝑃1 − 𝑃−1 ,
and therefore
𝑒𝑖𝐴𝑥 = 𝑒𝑖𝑥 𝑃1 + 𝑒−𝑖𝑥 𝑃−1
= (cos 𝑥 + 𝑖 sin 𝑥)𝑃1 + (cos(−𝑥) + 𝑖 sin(−𝑥))𝑃−1
(2.4.110) = (cos 𝑥 + 𝑖 sin 𝑥)𝑃1 + (cos 𝑥 − 𝑖 sin 𝑥)𝑃−1
= cos 𝑥(𝑃1 + 𝑃−1 ) + 𝑖 sin 𝑥(𝑃1 − 𝑃−1 )
= (cos 𝑥)𝐼ℍ + 𝑖(sin 𝑥)𝐴. □
2.5.1. Basics and notation. In this section, let 𝑚 ∈ ℕ and let ℍ(0), . . . , ℍ(𝑚 − 1)
be Hilbert spaces of finite dimension 𝑘0 , . . . , 𝑘𝑚−1 , respectively. The inner products
on these Hilbert spaces are denoted by ⟨⋅|⋅⟩. We use the notation ℍ(𝑗) to distinguish
this Hilbert space from the 𝑗-qubit state spaces ℍ𝑗 . For each 𝑗 ∈ ℤ𝑚 , let 𝐵𝑗 be an
orthonormal basis of ℍ(𝑗). We write these bases as
(2.5.1) 𝐵𝑗 = (|𝑏0,𝑗 ⟩ , . . . , |𝑏𝑘𝑗 −1,𝑗 ⟩).
and
(2.5.12) |0⟩⊗3 = |0⟩ |0⟩ |0⟩ .
and
(2.5.14) ℤ𝑘⃗ = ∏_{𝑗=0}^{𝑚−1} ℤ𝑘𝑗 .
Proof. Write
(2.5.20) |𝜑⟩ = ⨂_{𝑗=0}^{𝑚−1} |𝜑𝑗 ⟩ and |𝜓⟩ = ⨂_{𝑗=0}^{𝑚−1} |𝜓𝑗 ⟩ .
Example 2.5.6. Let 𝑚 = 2, ℍ(𝑗) = ℍ1 , and 𝐵𝑗 = (|0⟩ , |1⟩) for 0 ≤ 𝑗 < 2. For
(2.5.26) |𝜑0 ⟩ = |0⟩ + 𝑖 |1⟩ , |𝜑1 ⟩ = |0⟩ − 𝑖 |1⟩ , |𝜓0 ⟩ = |0⟩ + |1⟩ , |𝜓1 ⟩ = |0⟩ − |1⟩
we obtain
(2.5.27) ⟨ |𝜑0 ⟩ |𝜑1 ⟩ || |𝜓0 ⟩ |𝜓1 ⟩ ⟩ = ⟨𝜑0 |𝜓0 ⟩⟨𝜑1 |𝜓1 ⟩ = (1 + 𝑖)(1 − 𝑖) = 2.
2.5.3. State spaces as tensor products. We can use the construction in the
previous section to identify the tensor product of state spaces with a larger state space.
To explain this, let 𝑚, 𝑛0 , . . . , 𝑛𝑚−1 ∈ ℕ. Consider the tensor product
(2.5.28) ℍ = ℍ𝑛0 ⊗ ⋯ ⊗ ℍ𝑛𝑚−1
of the 𝑛𝑗 -qubit state spaces ℍ𝑛𝑗 , 𝑗 ∈ ℤ𝑚 .
Let 𝑛 = ∑_{𝑗=0}^{𝑚−1} 𝑛𝑗 . Denote by 𝐵 the computational basis of ℍ𝑛 . Then the linear map
(2.5.29) ℍ → ℍ𝑛 , |𝑏⃗0 ⟩ |𝑏⃗1 ⟩ ⋯ |𝑏⃗𝑚−1 ⟩ ↦ |𝑏⃗0 𝑏⃗1 ⋯ 𝑏⃗𝑚−1 ⟩ ,
where 𝑏⃗𝑗 ∈ {0, 1}𝑛𝑗 for 0 ≤ 𝑗 < 𝑚, is an isometry between ℍ𝑛0 ⊗ ⋯ ⊗ ℍ𝑛𝑚−1 and ℍ𝑛 .
Using this isometry, we identify the elements of the tensor product ℍ with the elements
of ℍ𝑛 .
Then we have
|𝜑⟩ ⊗ |𝜓⟩ = (|0⟩ + |1⟩) ⊗ (|0⟩ − |1⟩)
(2.5.31) = |0⟩ |0⟩ − |0⟩ |1⟩ + |1⟩ |0⟩ − |1⟩ |1⟩
= |00⟩ − |01⟩ + |10⟩ − |11⟩ .
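Under the identification (2.5.29), the tensor product of state vectors is the Kronecker product of their coordinate vectors, so the computation in (2.5.31) can be reproduced with `np.kron`. A sketch:

```python
import numpy as np

phi = np.array([1, 1], dtype=complex)   # |0> + |1>
psi = np.array([1, -1], dtype=complex)  # |0> - |1>

# coordinates with respect to the basis (|00>, |01>, |10>, |11>)
product = np.kron(phi, psi)

assert np.allclose(product, [1, -1, 1, -1])  # |00> - |01> + |10> - |11>
```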
(2.5.32) ℍ𝑛0 ⊗ ⋯ ⊗ ℍ𝑛𝑚−1 → ℍ𝑛 , |𝑏0 ⟩ ⋯ |𝑏𝑚−1 ⟩ ↦ | ∑_{𝑗=0}^{𝑚−1} 𝑏𝑗 2^{𝑠𝑗} ⟩ ,
where 𝑏𝑗 ∈ ℤ_{2^{𝑛𝑗}} and 𝑠𝑗 = ∑_{𝑢=𝑗+1}^{𝑚−1} 𝑛𝑢 for all 𝑗 ∈ ℤ𝑚 .
and
(2.5.40) ( ⨂_{𝑗=0}^{𝑙−1} |𝜑𝑗 ⟩ )( ⨂_{𝑗=0}^{𝑙−1} ⟨𝜓𝑗 | ) = ⨂_{𝑗=0}^{𝑙−1} |𝜑𝑗 ⟩ ⟨𝜓𝑗 | .
Also, the concatenation of all sequences ⨂_{𝑗=0}^{𝑚−1} 𝐵𝜆𝑗 ,𝑗 , where (𝜆0 , . . . , 𝜆𝑚−1 ) ∈ 𝐿𝜆 , is an orthonormal basis of 𝐸 𝜆 , and the projection onto this eigenspace is
(2.5.45) 𝑃𝜆 = ∑_{(𝜆0 ,. . .,𝜆𝑚−1 )∈𝐿𝜆} ⨂_{𝑗=0}^{𝑚−1} 𝑃𝜆𝑗 ,𝑗 .
Example 2.5.17. Let 𝑚 = 2, ℍ(0) = ℍ(1) = ℍ1 , and let 𝐴0 = 𝐴1 = 𝑋, the Pauli 𝑋
operator introduced in Example 2.3.1, which sends |0⟩ to |1⟩ and vice versa. It has the
eigenvalues 1 and −1. Also,
(2.5.46) (|𝑥+ ⟩) = ( (|0⟩ + |1⟩)/√2 ), (|𝑥− ⟩) = ( (|0⟩ − |1⟩)/√2 )
are orthonormal bases of the eigenspaces of 𝑋 associated with the eigenvalues 1 and
−1, respectively. The projections onto these eigenspaces are
(2.5.47) |𝑥+ ⟩ ⟨𝑥+ | , |𝑥− ⟩ ⟨𝑥− | ,
respectively.
We consider the tensor product
(2.5.48) 𝐴 = 𝐴 0 ⊗ 𝐴1 .
It is in End(ℍ1 ⊗ ℍ1 ) = End(ℍ2 ). It follows from Proposition 2.5.15 that 1 and −1 are
the eigenvalues of 𝐴, that the sequences
𝐵1 = (|𝑥+ 𝑥+ ⟩ , |𝑥− 𝑥− ⟩),
(2.5.49)
𝐵−1 = (|𝑥+ 𝑥− ⟩ , |𝑥− 𝑥+ ⟩)
are orthonormal bases of the eigenspaces 𝐸1 and 𝐸−1 associated with the eigenvalues
1 and −1, and that
𝑃1 = |𝑥+ 𝑥+ ⟩ ⟨𝑥+ 𝑥+ | + |𝑥− 𝑥− ⟩ ⟨𝑥− 𝑥− | ,
(2.5.50)
𝑃−1 = |𝑥+ 𝑥− ⟩ ⟨𝑥+ 𝑥− | + |𝑥− 𝑥+ ⟩ ⟨𝑥− 𝑥+ |
are the projections onto these eigenspaces.
Since 𝑋 is a Hermitian unitary involution, the same is true for 𝑋 ⊗ 𝑋.
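A quick numerical check of Example 2.5.17 (a NumPy sketch, not part of the book's text):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])      # Pauli X
A = np.kron(X, X)                   # A = A0 ⊗ A1 = X ⊗ X

# X ⊗ X is again a Hermitian unitary involution.
assert np.array_equal(A, A.T.conj())         # Hermitian
assert np.array_equal(A @ A, np.eye(4))      # involution, hence unitary

# Its eigenvalues are 1 and -1, each with a two-dimensional eigenspace.
assert np.allclose(np.linalg.eigvalsh(A), [-1, -1, 1, 1])

# The projections P_1 and P_-1 from (2.5.50), built from |x+> and |x->:
xp = np.array([1, 1]) / np.sqrt(2)
xm = np.array([1, -1]) / np.sqrt(2)
def proj(v):
    return np.outer(v, v.conj())
P1 = proj(np.kron(xp, xp)) + proj(np.kron(xm, xm))
Pm1 = proj(np.kron(xp, xm)) + proj(np.kron(xm, xp))
assert np.allclose(P1 - Pm1, A)              # spectral decomposition A = P1 - P-1
assert np.allclose(P1 + Pm1, np.eye(4))
```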
Proof. Let 𝐵 = (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) ∈ ℍ(0)𝑘 and 𝐶 = (|𝑐 0 ⟩ , . . . , |𝑐 𝑙−1 ⟩) ∈ ℍ(1)𝑙 be orthonormal bases of ℍ(0) and ℍ(1), respectively. Then we can write
(2.5.52) |𝜑⟩ = ∑_{𝑖=0}^{𝑘−1} ∑_{𝑗=0}^{𝑙−1} 𝛼𝑖,𝑗 |𝑏𝑖 ⟩ |𝑐𝑗 ⟩
with 𝛼𝑖,𝑗 ∈ ℂ for all 𝑖 ∈ ℤ𝑘 and 𝑗 ∈ ℤ𝑙 . If we set 𝐴 = (𝛼𝑖,𝑗 ) ∈ ℂ(𝑘,𝑙) , then (2.5.52) can
also be written as
(2.5.53) |𝜑⟩ = 𝐵𝐴𝐶.
Without loss of generality, assume that 𝑘 ≥ 𝑙. Then by Theorem 2.4.67 there is a sin-
gular value decomposition
(2.5.54) 𝐴 = 𝑈 (𝐷; 0) 𝑉∗ , with (𝐷; 0) the 𝑘 × 𝑙 matrix obtained by stacking 𝐷 on top of a (𝑘 − 𝑙) × 𝑙 zero block,
where 𝑈 ∈ ℂ(𝑘,𝑘) and 𝑉 ∈ ℂ(𝑙,𝑙) are unitary matrices and 𝐷 ∈ ℂ(𝑙,𝑙) is a positive
semidefinite diagonal matrix; that is,
(2.5.55) 𝐷 = diag(𝑟0 , . . . , 𝑟 𝑙−1 )
with 𝑟 𝑖 ∈ ℝ≥0 for 0 ≤ 𝑖 < 𝑙. Write
(2.5.56) 𝑈 = (𝑈1 𝑈2 )
with 𝑈1 ∈ ℂ(𝑘,𝑙) and 𝑈2 ∈ ℂ(𝑘,𝑘−𝑙) . Then it follows from (2.5.54) that
(2.5.57) 𝐴 = 𝑈1 𝐷𝑉 ∗ .
So (2.5.53) implies
(2.5.58) |𝜑⟩ = 𝐵𝑈1 𝐷𝑉 ∗ 𝐶.
If we write
(2.5.59) (|𝑢0 ⟩ , . . . , |𝑢𝑙−1 ⟩) = 𝐵𝑈1 and (|𝑣 0 ⟩ , . . . , |𝑣 𝑙−1 ⟩) = 𝑉 ∗ 𝐶,
we obtain
(2.5.60) |𝜑⟩ = ∑_{𝑖=0}^{𝑙−1} 𝑟 𝑖 |𝑢𝑖 ⟩ ⊗ |𝑣 𝑖 ⟩ .
Definition 2.5.20. Let ℍ(0) and ℍ(1) be Hilbert spaces of dimension 𝑘 and 𝑙, respec-
tively, and let 𝑚 = min{𝑘, 𝑙}. Let |𝜑⟩ ∈ ℍ(0) ⊗ ℍ(1). Let 𝑟 = (𝑟0 , . . . , 𝑟𝑚−1 ) be the
sequence of coefficients in a Schmidt decomposition of |𝜑⟩. From Theorem 2.5.18 we
know that the elements of 𝑟 are nonnegative real numbers and that 𝑟 is uniquely deter-
mined up to reordering.
(1) The elements of 𝑟 are called the Schmidt coefficients of |𝜑⟩.
(2) The number of nonzero elements in 𝑟, counted with multiplicities, is called the Schmidt
rank or Schmidt number of |𝜑⟩.
(3) The ket |𝜑⟩ is called separable with respect to the decomposition ℍ = ℍ(0) ⊗ ℍ(1)
if its Schmidt rank is 1, i.e., if it can be written as |𝜑⟩ = |𝜓⟩ ⊗ |𝜉⟩ with |𝜓⟩ ∈ ℍ(0)
and |𝜉⟩ ∈ ℍ(1). Otherwise, |𝜑⟩ is called inseparable.
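Following the proof of Theorem 2.5.18, the Schmidt coefficients can be computed from a singular value decomposition of the coefficient matrix 𝐴 of (2.5.52). A NumPy sketch (not part of the book; the reshape assumes the bases are ordered as in (2.5.52)):

```python
import numpy as np

def schmidt_coefficients(phi, k, l):
    """Schmidt coefficients of a ket in H(0) ⊗ H(1) with dim H(0) = k, dim H(1) = l.

    Reshaping the coefficient vector into the k x l matrix A of (2.5.52)
    and taking its singular values yields the r_i of (2.5.60)."""
    A = np.asarray(phi).reshape(k, l)
    return np.linalg.svd(A, compute_uv=False)

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)
prod = np.kron([1, 0], [1, 1]) / np.sqrt(2)  # |0> ⊗ (|0> + |1>)/sqrt(2)

# Schmidt rank 2: the Bell state is inseparable.
assert np.allclose(schmidt_coefficients(bell, 2, 2), [1/np.sqrt(2), 1/np.sqrt(2)])
# Schmidt rank 1: a product state is separable.
assert np.allclose(schmidt_coefficients(prod, 2, 2), [1, 0])
```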
Quantum Mechanics
Quantum mechanics, discovered in the early 20th century, stands as one of the most
revolutionary discoveries in physics. It was fundamentally shaped by the work of
physicists like Max Planck, who received the Nobel Prize in 1918, and Albert Ein-
stein, who received the Nobel Prize in 1921, although Einstein was later very critical of
quantum mechanics. The theory was fully developed by remarkable scientists, including Niels Bohr, Nobel laureate of 1922, Werner Heisenberg, Nobel Prize 1932, Erwin
Schrödinger and Paul Dirac, who shared the Nobel Prize in 1933, Wolfgang Pauli, Nobel Prize 1945, and
Max Born, Nobel Prize 1954. In 1965, Richard P. Feynman, Julian Schwinger, and
Sin-Itiro Tomonaga were awarded the Nobel Prize for their contributions to quantum
electrodynamics. Recently, in 2022, Alain Aspect, John Clauser, and Anton Zeilinger
received the Nobel Prize for experimentally verifying one of the most counterintuitive
phenomena in quantum physics: entanglement.
One of the fundamental features of quantum mechanics is the concept that closed
physical systems can exist in a superposition of many possible states. This intrigu-
ing property inspired the mathematician and physicist Yuri Manin [Man80] and the
physicists Paul Benioff [Ben80] and Richard Feynman [Fey82] to conceive the idea
of a quantum computer, where information is stored and processed in superposition.
However, it soon became evident that designing practical and useful algorithms based
on this concept is a challenging task, as described in the chapters following this one.
In order to grasp the functioning of these algorithms and their underlying princi-
ples, understanding quantum mechanics becomes essential. Therefore, the objective of
this chapter is to introduce the reader to the quantum mechanical basis that underlies
quantum computing.
Just like other fields of physics, quantum mechanics is built upon a set of postulates that establish a correspondence between real-world objects and processes and their
mathematical counterparts. This correspondence enables us to make predictions about the behavior of physical systems.
The term “closed” refers to the system not interacting with other systems, i.e., not
exchanging energy or matter with them. In reality, the only closed system is the uni-
verse as a whole. However, in quantum computing, it is possible to construct quantum
systems which can be described to a good approximation as being closed. The state
vector of a quantum system is also called its wave function. The term “wave function”
originates from the historical development of quantum mechanics, where the theory
was initially formulated by analogy with classical wave phenomena.
basis states |0⟩ and |1⟩. The coefficients 𝛼0 and 𝛼1 are called the amplitudes of |𝜑⟩ for the
basis states |0⟩ and |1⟩, respectively. This is a special case of the following definition.
Definition 3.1.2. Let 𝑙 ∈ ℕ, let (|𝜑0 ⟩ , . . . , |𝜑𝑙−1 ⟩) ∈ ℍ𝑙 be linearly independent, and
let |𝜑⟩ ∈ ℍ, |𝜑⟩ = 𝛼0 |𝜑0 ⟩ + ⋯ + 𝛼𝑙−1 |𝜑𝑙−1 ⟩ with 𝛼𝑖 ∈ ℂ for 𝑖 ∈ ℤ𝑙 . Then, for each
𝑖 ∈ ℤ𝑙 , the coefficient 𝛼𝑖 is called the amplitude of |𝜑⟩ for the state |𝜑𝑖 ⟩.
The next proposition provides properties of the inner product on ℝ3 which are
similar to those listed in Definition 2.2.1.
Proposition 3.1.6. For all 𝑝,⃗ 𝑞,⃗ 𝑟 ⃗ ∈ ℝ3 and all 𝛾 ∈ ℝ the following hold.
(1) Bilinearity: ⟨𝑝⃗ + 𝑞⃗ | 𝑟⃗ ⟩ = ⟨𝑝⃗ | 𝑟⃗ ⟩ + ⟨𝑞⃗ | 𝑟⃗ ⟩, ⟨𝑝⃗ | 𝑞⃗ + 𝑟⃗ ⟩ = ⟨𝑝⃗ | 𝑞⃗ ⟩ + ⟨𝑝⃗ | 𝑟⃗ ⟩, and
⟨𝛾𝑝⃗ | 𝑞⃗ ⟩ = ⟨𝑝⃗ | 𝛾𝑞⃗ ⟩ = 𝛾⟨𝑝⃗ | 𝑞⃗ ⟩.
(2) Positive definiteness: ⟨𝑝⃗ | 𝑝⃗ ⟩ ≥ 0 and ⟨𝑝⃗ | 𝑝⃗ ⟩ = 0 if and only if 𝑝⃗ = 0. This property is
also called positivity.
Exercise 3.1.7. Prove Proposition 3.1.6.
Example 3.1.8. The inner product of (3, 2, 1) and (−1, 1, 1) is ⟨(3, 2, 1)|(−1, 1, 1)⟩ =
−3 + 2 + 1 = 0. Therefore, these vectors are orthogonal to each other. The length of
the first vector is ‖(3, 2, 1)‖ = √(9 + 4 + 1) = √14. So, (1/√14)(3, 2, 1) is a unit vector.
Figure 3.1.1. Spherical coordinates of (𝑥, 𝑦, 𝑧).
Proof. Assume without loss of generality that ‖𝑝‖⃗ = 1. Write 𝑝 ⃗ = (𝑥, 𝑦, 𝑧). Then
|𝑧| ≤ 1 and 𝜃 = arccos 𝑧 is the uniquely determined real number 𝜃 ∈ [0, 𝜋] that
satisfies cos 𝜃 = 𝑧.
If 𝑝 ⃗ = (0, 0, 1), then 𝜃 = 0. So, if we choose 𝜙 = 0, then (3.1.7) is satisfied. Also, if
𝑝 ⃗ = (0, 0, −1), then 𝜃 = 𝜋, and if we choose 𝜙 = 0, then (3.1.7) is satisfied.
If 𝑝 ⃗ ≠ (0, 0, ±1), then 0 < 𝜃 < 𝜋 and we have
(3.1.8) (𝑥/sin 𝜃)² + (𝑦/sin 𝜃)² = (1 − 𝑧²)/sin² 𝜃 = (1 − cos² 𝜃)/sin² 𝜃 = 1.
So it follows from Lemma 3.1.9 that there is a uniquely determined 𝜙 ∈]0, 2𝜋[ such
that (3.1.7) holds. □
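The case analysis of the proof can be mirrored in code. In this sketch (not part of the book), `arctan2` plays the role of Lemma 3.1.9 for recovering 𝜙:

```python
import numpy as np

def spherical(p):
    """Spherical coordinates (theta, phi) of a unit vector p in R^3,
    with theta in [0, pi]; at the poles we choose phi = 0, as in the proof."""
    x, y, z = p
    theta = np.arccos(z)                   # the unique theta with cos(theta) = z
    if np.isclose(abs(z), 1.0):            # p = (0, 0, ±1): choose phi = 0
        return theta, 0.0
    phi = np.arctan2(y, x) % (2 * np.pi)   # unique phi with the stated property
    return theta, phi

theta, phi = spherical((0, 1, 0))          # the point in the y-direction
assert np.isclose(theta, np.pi / 2) and np.isclose(phi, np.pi / 2)
theta, phi = spherical((0, 0, 1))
assert np.isclose(theta, 0) and phi == 0.0
```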
3.1.4. The Bloch sphere. In this section, we present the Bloch sphere represen-
tation of single-qubit states. As we will see in Section 4.3, this representation allows
for a geometric interpretation of the unitary operators in ℍ1 .
Definition 3.1.14. By the Bloch sphere we mean the set {𝑝 ⃗ ∈ ℝ3 ∶ ‖𝑝‖⃗ = 1} which
is the surface of the sphere of radius 1 in ℝ3 . The elements of the Bloch sphere are
referred to as points on the Bloch sphere.
and ℂ is a two-dimensional ℝ-vector space, |𝜑⟩ can be described using four real num-
bers. But since single-qubit states have Euclidean length 1, these numbers are not inde-
pendent of each other. We will now show that single-qubit states can be represented by
three real numbers. For this, we need the following result which follows from Lemma
3.1.9.
Lemma 3.1.15. If 𝛼 ∈ ℂ with |𝛼| = 1, then there is a uniquely determined real number
𝛾 with 0 ≤ 𝛾 < 2𝜋 such that 𝛼 = 𝑒𝑖𝛾 = cos 𝛾 + 𝑖 sin 𝛾.
Exercise 3.1.16. Prove Lemma 3.1.15.
The next proposition presents the representation of single-qubit states by three real
numbers.
Proposition 3.1.18. Let |𝜓⟩ ∈ ℍ1 be a single-qubit state. Then there are uniquely deter-
mined real numbers 𝛾, 𝜃, and 𝜙 such that
(3.1.10) |𝜓⟩ = 𝑒^{𝑖𝛾} (cos(𝜃/2) |0⟩ + 𝑒^{𝑖𝜙} sin(𝜃/2) |1⟩)
and
(3.1.11) 0 ≤ 𝜃 ≤ 𝜋, 0 ≤ 𝛾, 𝜙 < 2𝜋, 𝜃 ∈ {0, 𝜋} ⇒ 𝜙 = 0.
We write these numbers as 𝛾(𝜓), 𝜃(𝜓), and 𝜙(𝜓).
Proof. Let |𝜓⟩ = 𝛼0 |0⟩ + 𝛼1 |1⟩ with 𝛼0 , 𝛼1 ∈ ℂ. Since |𝜓⟩ is a single-qubit state, we
have |𝛼0 |2 + |𝛼1 |2 = 1. Choose 𝜃 ∈ [0, 𝜋] such that
(3.1.12) |𝛼0 | = cos(𝜃/2), |𝛼1 | = sin(𝜃/2).
By Lemma 3.1.9, this is possible and 𝜃 is uniquely determined. To complete the proof,
we distinguish three cases.
First, if 𝛼0 = 0, then 𝜃 = 𝜋, |𝛼1 | = 1, and by Lemma 3.1.15 we can write 𝛼1 = 𝑒^{𝑖𝛾}
with a uniquely determined 𝛾 ∈ [0, 2𝜋[. If we set 𝜙 = 0, then (𝛾, 𝜃, 𝜙) is the only triplet
of real numbers that satisfies (3.1.10) and (3.1.11).
Second, if 𝛼1 = 0, then 𝜃 = 0, |𝛼0 | = 1, and by Lemma 3.1.15 we can write
𝛼0 = 𝑒^{𝑖𝛾} with a uniquely determined 𝛾 ∈ [0, 2𝜋[. If we set 𝜙 = 0, then (𝛾, 𝜃, 𝜙) is
the only triplet of real numbers that satisfies (3.1.10) and (3.1.11).
Third, assume that 𝛼0 , 𝛼1 ≠ 0. Then it follows from Lemma 3.1.15 that there are
uniquely determined real numbers 𝛾, 𝛿 ∈ [0, 2𝜋[ such that
(3.1.13) 𝛼0 = 𝑒^{𝑖𝛾} |𝛼0 | = 𝑒^{𝑖𝛾} cos(𝜃/2), 𝛼1 = 𝑒^{𝑖𝛿} |𝛼1 | = 𝑒^{𝑖𝛿} sin(𝜃/2).
Set 𝜙 = 𝛿 − 𝛾 mod 2𝜋. Then we have
(3.1.14) |𝜓⟩ = 𝑒^{𝑖𝛾} (cos(𝜃/2) |0⟩ + 𝑒^{𝑖𝜙} sin(𝜃/2) |1⟩)
and (𝛾, 𝜃, 𝜙) is the uniquely determined triplet of real numbers that satisfies (3.1.10)
and (3.1.11). □
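The three cases of the proof of Proposition 3.1.18 translate directly into a small routine. A NumPy sketch (not part of the book; `bloch_angles` is a name chosen here):

```python
import numpy as np

def bloch_angles(a0, a1):
    """(gamma, theta, phi) of Proposition 3.1.18 for |psi> = a0|0> + a1|1>.

    Follows the three cases of the proof; assumes |a0|^2 + |a1|^2 = 1."""
    theta = 2 * np.arctan2(abs(a1), abs(a0))   # |a0| = cos(theta/2), |a1| = sin(theta/2)
    if np.isclose(abs(a0), 0):                 # first case: theta = pi, phi = 0
        return np.angle(a1) % (2 * np.pi), np.pi, 0.0
    if np.isclose(abs(a1), 0):                 # second case: theta = 0, phi = 0
        return np.angle(a0) % (2 * np.pi), 0.0, 0.0
    gamma = np.angle(a0) % (2 * np.pi)         # third case
    delta = np.angle(a1) % (2 * np.pi)
    return gamma, theta, (delta - gamma) % (2 * np.pi)

# |x+> = (|0> + |1>)/sqrt(2): gamma = 0, theta = pi/2, phi = 0.
g, t, p = bloch_angles(1/np.sqrt(2), 1/np.sqrt(2))
assert np.isclose(g, 0) and np.isclose(t, np.pi/2) and np.isclose(p, 0)
```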
Definition 3.1.19. (1) To each single-qubit state |𝜓⟩ ∈ ℍ1 we assign the point 𝑝⃗(𝜓)
on the Bloch sphere with spherical coordinates (1, 𝜃(𝜓), 𝜙(𝜓)) and Cartesian coordinates (sin 𝜃(𝜓) cos 𝜙(𝜓), sin 𝜃(𝜓) sin 𝜙(𝜓), cos 𝜃(𝜓)).
(2) To each point 𝑝⃗ on the Bloch sphere with spherical coordinates (1, 𝜃, 𝜙) we assign
the single-qubit state
(3.1.15) |𝜓(𝑝⃗)⟩ = cos(𝜃/2) |0⟩ + 𝑒^{𝑖𝜙} sin(𝜃/2) |1⟩ .
The correspondence between single-qubit states and points on the Bloch sphere
is illustrated in Example 3.1.20, Exercise 3.1.21, and Figure 3.1.2. There and in the
remainder of this book, we write the unit vectors in the 𝑥-, 𝑦-, and 𝑧-directions in ℝ3 as
𝑥̂, 𝑦̂, and 𝑧̂.
Figure 3.1.2. Points on the Bloch sphere corresponding to |𝑥+ ⟩, |𝑦+ ⟩, |𝑧+ ⟩ = |0⟩,
|𝑧− ⟩ = |1⟩, and a general single-qubit state |𝜓⟩.
We also recall that the orthonormal eigenbases of the Pauli operators 𝑋, 𝑌 , and 𝑍 on
ℍ1 (see Section 2.4.7) are
(3.1.17) (|𝑥+ ⟩ , |𝑥− ⟩) = ( (|0⟩ + |1⟩)/√2 , (|0⟩ − |1⟩)/√2 ),
(|𝑦+ ⟩ , |𝑦− ⟩) = ( (|0⟩ + 𝑖 |1⟩)/√2 , (|0⟩ − 𝑖 |1⟩)/√2 ),
(|𝑧+ ⟩ , |𝑧− ⟩) = (|0⟩ , |1⟩).
Example 3.1.20. The representation (3.1.10) of |𝑧+ ⟩ = |0⟩ is
(3.1.18) |𝑧+ ⟩ = |0⟩ = 𝑒^{𝑖⋅0} (cos(0/2) |0⟩ + 𝑒^{𝑖⋅0} sin(0/2) |1⟩) .
Hence, the spherical coordinate representation of the point on the Bloch sphere corre-
sponding to this state is (1, 0, 0) and its Cartesian coordinate representation is (0, 0, 1) =
𝑧.̂
The representation (3.1.10) of |𝑧− ⟩ = |1⟩ is
(3.1.19) |𝑧− ⟩ = |1⟩ = 𝑒^{𝑖⋅0} (cos(𝜋/2) |0⟩ + 𝑒^{𝑖⋅0} sin(𝜋/2) |1⟩) .
Hence, the spherical coordinate representation of the point on the Bloch sphere corre-
sponding to this state is (1, 𝜋, 0) and its Cartesian coordinate representation (0, 0, −1) =
−𝑧.̂
We now introduce global phase factors. For example, the term 𝑒𝑖𝛾 in the represen-
tation (3.1.10) is such a factor. The general definition is as follows.
Definition 3.1.22. Let |𝜑⟩ , |𝜓⟩ ∈ ℍ and let 𝛾 ∈ ℝ be such that |𝜓⟩ = 𝑒𝑖𝛾 |𝜑⟩. Then we
say that |𝜑⟩ and |𝜓⟩ are equal up to the global phase factor 𝑒𝑖𝛾 or that these states differ
by the global phase factor 𝑒𝑖𝛾 .
Next, we will show that there is a one-to-one correspondence between the points
on the Bloch sphere and the equivalence classes [𝜓] of the quantum states |𝜓⟩ in ℍ1 .
This means that the state of a single-qubit system is completely described by the cor-
responding point on the Bloch sphere.
Theorem 3.1.25. Denote by 𝑆1 the set of quantum states in ℍ1 and by 𝑅1 the equivalence
relation on 𝑆1 from Proposition 3.1.23. Then the map
𝑆1 /𝑅1 → {𝑝⃗ ∈ ℝ3 ∶ ‖𝑝⃗‖ = 1}, [𝜓] ↦ 𝑝⃗(𝜓)
is a bijection.
Proof. It follows from Proposition 3.1.12 that the map that sends the spherical coordi-
nates of a point on the Bloch sphere to its Cartesian coordinates is a bijection. There-
fore, it suffices to prove that the map
(3.1.22) 𝑆1 /𝑅1 → {(0, 0), (𝜋, 0)} ∪ (]0, 𝜋[ × [0, 2𝜋[) , [𝜓] ↦ (𝜃(𝜓), 𝜙(𝜓))
is a bijection. Injectivity follows from Proposition 3.1.18. To see the surjectivity, we note
that for a point 𝑝 ⃗ on the Bloch sphere with spherical coordinates (𝜃, 𝜙), the equivalence
class [𝜓(𝑝)]⃗ is the inverse image of 𝑝.⃗ □
(3.1.24) ∑_{𝑏⃗∈{0,1}^𝑛} |𝛼𝑏⃗ |² = 1.
Such an element of ℍ𝑛 is called an 𝑛-qubit state. So, while the state of a classical 𝑛-bit
register is an element 𝑏⃗ ∈ {0, 1}𝑛 , the state of an 𝑛-qubit quantum register is a linear
combination of the computational basis states |𝑏⃗⟩, 𝑏⃗ ∈ {0, 1}𝑛 , which is also called a
superposition of the basis elements.
Example 3.1.26. Consider the state space ℍ2 of a 2-qubit system. The computational
basis of ℍ2 is (|00⟩ , |01⟩ , |10⟩ , |11⟩). It can also be written as |0⟩2 , |1⟩2 , |2⟩2 , |3⟩2 . For
instance,
(3.1.25) |𝜑⟩ = (1/√2) |00⟩ − (𝑖/√2) |11⟩ = (1/√2) |0⟩2 − (𝑖/√2) |3⟩2
is a 2-qubit state. It is a superposition of the states |00⟩ = |0⟩2 and |11⟩ = |3⟩2 .
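As a small numerical aside (not part of the book), the state of Example 3.1.26 can be stored as its coefficient vector over the computational basis and the normalization condition (3.1.24) checked directly:

```python
import numpy as np

# |phi> = (1/sqrt(2))|00> - (i/sqrt(2))|11> over the basis (|0>_2, ..., |3>_2).
phi = np.array([1/np.sqrt(2), 0, 0, -1j/np.sqrt(2)])

# A valid 2-qubit state: the squared amplitudes sum to 1, cf. (3.1.24).
assert np.isclose(np.sum(np.abs(phi) ** 2), 1.0)
```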
Postulate 3.2.1 (Composite Systems Postulate). The state space of the composition of
finitely many physical systems is the tensor product of the state spaces of the compo-
nent systems. Moreover, if we have systems numbered 0 through 𝑚−1 and if system 𝑖 is
in the state |𝜓𝑖 ⟩ for 0 ≤ 𝑖 < 𝑚, then the composite system is in state |𝜓0 ⟩ ⊗ ⋯ ⊗ |𝜓𝑚−1 ⟩.
For quantum computing, compositions of the following type are frequently used:
Example 3.2.2. We consider the composition of two qubits with state space ℍ2 =
ℍ1 ⊗ ℍ1 . For all pairs (|𝜑0 ⟩ , |𝜑1 ⟩) of single-qubit states, ℍ2 contains the composite
state
(3.2.2) |𝜑⟩ = |𝜑0 ⟩ ⊗ |𝜑1 ⟩ .
However, as we have seen in Example 2.5.22, the Bell state
(3.2.3) |𝜑⟩ = (|00⟩ + |11⟩)/√2
cannot be written in this form, that is, as the tensor product of two single-qubit states.
It is therefore called entangled.
Definition 3.2.3. A state of the composition of two physical systems is called entan-
gled if it cannot be written as the tensor product of states of the component systems.
Otherwise, this state is called separable or nonentangled.
Theorem 3.2.4. The state of the composition of two quantum systems is separable if and
only if its Schmidt rank is 1, and it is entangled if and only if its Schmidt rank is greater
than 1.
Exercise 3.2.5. Find an example of an entangled state in ℍ3 and prove that it is entan-
gled.
3.3.1. Evolution Postulate. How does the state of a quantum mechanical sys-
tem change over time? This question is answered by the Evolution Postulate.
Definition 3.3.4 shows that the 𝖢𝖭𝖮𝖳 gate applies the Pauli 𝑋 operator to a target
qubit |𝑡⟩ if the control qubit |𝑐⟩ is |1⟩. Otherwise, the target qubit remains unchanged.
This means that the application of the Pauli 𝑋 operator to the target qubit is controlled
by the control qubit. Since the Pauli 𝑋 operator is the quantum 𝖭𝖮𝖳 operator, this
explains the name “controlled-𝖭𝖮𝖳 gate”. So, 𝖢𝖭𝖮𝖳 operates on the computational
basis states of ℍ2 in the following way:
(3.3.4) |00⟩ ↦ |00⟩ , |01⟩ ↦ |01⟩ , |10⟩ ↦ |11⟩ , |11⟩ ↦ |10⟩ ,
Figure 3.3.1. Symbol for the Hadamard gate in quantum circuits.
which shows that the representation matrix of 𝖢𝖭𝖮𝖳 with respect to the computational
basis of ℍ2 is
(3.3.5) 𝖢𝖭𝖮𝖳 = ⎛1 0 0 0⎞
                ⎜0 1 0 0⎟
                ⎜0 0 0 1⎟ .
                ⎝0 0 1 0⎠
This implies the following result.
Proposition 3.3.5. The 𝖢𝖭𝖮𝖳 operator is a Hermitian unitary involution; that is, we
have 𝖢𝖭𝖮𝖳∗ = 𝖢𝖭𝖮𝖳⁻¹ = 𝖢𝖭𝖮𝖳 and 𝖢𝖭𝖮𝖳² = 𝐼2 .
Exercise 3.3.6. Prove Proposition 3.3.5.
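A numerical check of Proposition 3.3.5 (not a proof, and not part of the book's text):

```python
import numpy as np

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

assert np.array_equal(CNOT, CNOT.T.conj())                         # Hermitian
assert np.array_equal(CNOT @ CNOT, np.eye(4, dtype=int))           # involution
assert np.array_equal(CNOT @ CNOT.T.conj(), np.eye(4, dtype=int))  # unitary
```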
In quantum circuits, the 𝖢𝖭𝖮𝖳 gate is represented by the symbol shown in Figure
3.3.2.
The definition of the CNOT gate might give the impression that this gate never
changes the control qubit. However, as the next example shows, this impression is
deceptive.
Example 3.3.7. We have
𝖢𝖭𝖮𝖳 |𝑥+ ⟩ |𝑥− ⟩ = (𝖢𝖭𝖮𝖳 |0⟩ |0⟩ − 𝖢𝖭𝖮𝖳 |0⟩ |1⟩ + 𝖢𝖭𝖮𝖳 |1⟩ |0⟩ − 𝖢𝖭𝖮𝖳 |1⟩ |1⟩)/2
= (|0⟩ |0⟩ − |0⟩ |1⟩ + |1⟩ |1⟩ − |1⟩ |0⟩)/2
= |𝑥− ⟩ |𝑥− ⟩ .
So applied to |𝑥+ ⟩ |𝑥− ⟩ the 𝖢𝖭𝖮𝖳 operator changes the control qubit but not the target
qubit.
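Example 3.3.7 can be verified numerically (a NumPy sketch, not part of the book):

```python
import numpy as np

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])
xp = np.array([1, 1]) / np.sqrt(2)    # |x+>
xm = np.array([1, -1]) / np.sqrt(2)   # |x->

out = CNOT @ np.kron(xp, xm)
# The control qubit is flipped while the target qubit is unchanged.
assert np.allclose(out, np.kron(xm, xm))
```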
Figure 3.3.3. Extension of 𝐻 acting on the first state space to an operator acting on two qubits.
𝑓𝑗 on one of the component state spaces ℍ(𝑗), 𝑗 ∈ ℤ𝑚 , to the composite state space by
using the operator
(3.3.8) 𝐼0 ⊗ ⋯ ⊗ 𝐼𝑗−1 ⊗ 𝑓𝑗 ⊗ 𝐼𝑗+1 ⊗ ⋯ ⊗ 𝐼𝑚−1
where 𝐼𝑖 is the identity operator on ℍ(𝑖) for 0 ≤ 𝑖 < 𝑚.
Example 3.3.8. Consider the composition of two single-qubit systems with state spaces
ℍ1 each. The state space of the composite system is ℍ2 . The extension of the Hadamard
operator acting on the first state space to an operator on the composite space is 𝐻 ⊗ 𝐼
where 𝐼 is the identity operator on ℍ1 . In a quantum circuit, this extended operator
is depicted on the right side of Figure 3.3.3. So, the identity operator is omitted. The
two lines represent the two qubits. The box with 𝐻 inside represents the Hadamard
operator acting on the first qubit. The extended operator has the following effect on
the computational basis elements of ℍ2 :
|00⟩ ↦ (1/√2)(|00⟩ + |10⟩), |01⟩ ↦ (1/√2)(|01⟩ + |11⟩),
|10⟩ ↦ (1/√2)(|00⟩ − |10⟩), |11⟩ ↦ (1/√2)(|01⟩ − |11⟩).
The matrix representation of the composite operator is

𝐻 ⊗ 𝐼 = (1/√2) ⎛1  1⎞ ⊗ ⎛1 0⎞ = (1/√2) ⎛1 0  1  0⎞
               ⎝1 −1⎠   ⎝0 1⎠          ⎜0 1  0  1⎟
                                        ⎜1 0 −1  0⎟ .
                                        ⎝0 1  0 −1⎠
Figure 3.3.4. Quantum circuit that combines the Hadamard and 𝖢𝖭𝖮𝖳 gates.
Figure 3.3.5. The circuit from Figure 3.3.4, decomposed into 𝑈0 = 𝐻 ⊗ 𝐻 ⊗ 𝐼1 , 𝑈1 = 𝐼1 ⊗ 𝖢𝖭𝖮𝖳, and 𝑈2 = 𝐻 ⊗ 𝐼1 ⊗ 𝐼1 .
The circuit transforms the 3-qubit input state |𝜑⟩ into a 3-qubit output state |𝜓⟩. How
is 𝑈 constructed? As shown in Figure 3.3.5, 𝑈 is the concatenation of three unitary
operators; i.e.,
(3.3.9) 𝑈 = 𝑈2 ∘ 𝑈1 ∘ 𝑈0 .
Each of these unitary operators is the tensor product of the unitary operators imple-
mented by the gates that are one above the other. If there is no operator but only a
wire, the identity operator is inserted. The composed operators are 𝑈0 = 𝐻 ⊗ 𝐻 ⊗ 𝐼1 ,
𝑈1 = 𝐼1 ⊗ 𝖢𝖭𝖮𝖳, and 𝑈2 = 𝐻 ⊗ 𝐼1 ⊗ 𝐼1 .
We determine the state that is obtained when the input state of the quantum circuit
in Figures 3.3.4 and 3.3.5 is |000⟩:
(3.3.10)
|000⟩ ↦_{𝑈0} (1/√4) (|0⟩ + |1⟩)(|0⟩ + |1⟩) |0⟩
= (1/√4) (|000⟩ + |010⟩ + |100⟩ + |110⟩)
↦_{𝑈1} (1/√4) (|000⟩ + |011⟩ + |100⟩ + |111⟩)
= (1/√4) (|0⟩ + |1⟩)(|00⟩ + |11⟩)
↦_{𝑈2} (1/√2) |0⟩ (|00⟩ + |11⟩)
= (1/√2) (|000⟩ + |011⟩).
Exercise 3.3.9. Determine the output state 𝑈 |𝑏⃗⟩ of the circuit in Figure 3.3.4 for all
𝑏⃗ ∈ {0, 1}3 .
The concept of a quantum circuit can easily be generalized to circuits that operate
on 𝑛-qubit registers for any 𝑛 ∈ ℕ. This will be discussed in more detail in Section 4.7.
Figure 3.3.6. The quantum circuit from Figure 3.3.4 operating on |000⟩. The input state |000⟩ is transformed into the output state (1/√2)(|000⟩ + |011⟩).
3.4. Measurements
In the previous sections, we have seen the first steps of quantum computing. A state
of an 𝑛-bit quantum register is prepared. This is the input state for a quantum circuit
operating on 𝑛-qubit states. This quantum circuit implements a unitary operator on
the state space of the quantum register. It is composed of several quantum gates and
transforms the input state into an output state. This output state may either be used by a
further quantum computation or the information may be extracted by a measurement.
Such measurements are discussed in more detail in this section.
The measurements in the postulate are called “projective” because they project the
state of the quantum system onto one of the eigenspaces of the measured observable
and normalize the length of this projection. We also note that measurement devices
In Section 3.1.4 we have introduced global phase factors of quantum states and
have shown that equality up to a global phase factor is an equivalence relation. We
have denoted the corresponding equivalence class of a quantum state |𝜓⟩ by [𝜓]. We
now show that global phase factors have no impact on quantum measurements.
Exercise 3.4.5. Let ℍ be the state space of a quantum system. Show that 𝑂 = 𝐼ℍ is an
observable of the quantum system. Also, show that measuring 𝑂 when the quantum
system is in the state |𝜑⟩ ∈ ℍ gives 1 with probability 1 and that immediately after the
measurement, the quantum system is still in state |𝜑⟩.
be a single-qubit state, where 𝛼0 , 𝛼1 ∈ ℂ with |𝛼0 |² + |𝛼1 |² = 1. We would like the measurement to reflect this superposition; that is, we want the measurement outcome to be
0 with probability |𝛼0 |2 and 1 with probability |𝛼1 |2 . Recall that, by Proposition 2.4.44,
the projections onto the subspaces spanned by |0⟩ or |1⟩ are 𝑃0 = |0⟩ ⟨0| and 𝑃1 = |1⟩ ⟨1|,
respectively. Therefore, we use the observable 𝑂 with the spectral decomposition
𝑂 = 0 ⋅ 𝑃0 + 1 ⋅ 𝑃1 = |1⟩ ⟨1| .
For example, if the qubit is in the state |0⟩, then the expectation value of this measurement is 0. Also, if the qubit is in the state |1⟩, then the expectation value is 1. But if the
qubit is in the superposition |𝜓⟩ = (1/√2) |0⟩ + (1/√2) |1⟩, then the expectation value of this
measurement is 1/2.
We generalize Example 3.4.6. Let ℍ be the state space of a quantum system and
let 𝐵 = (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) be an orthonormal basis of ℍ. For example, if ℍ = ℍ𝑛 , then
𝑘 = 2𝑛 and we may use the computational basis (|0⟩𝑛 , . . . , |2𝑛 − 1⟩𝑛 ) of ℍ𝑛 . For 𝑗 ∈ ℤ𝑘
the projection onto ℂ |𝑏𝑗 ⟩ is 𝑃𝑗 = |𝑏𝑗 ⟩ ⟨𝑏𝑗 |. Consider the observable 𝑂 = ∑_{𝑗=0}^{𝑘−1} 𝑗 𝑃𝑗
and a state |𝜑⟩ = ∑_{𝑗=0}^{𝑘−1} 𝛼𝑗 |𝑏𝑗 ⟩
with 𝛼𝑗 ∈ ℂ for 0 ≤ 𝑗 < 𝑘 and ∑_{𝑗=0}^{𝑘−1} |𝛼𝑗 |² = 1. According to Postulate 3.4.1, the
possible outcomes when measuring the observable 𝑂 are the eigenvalues 𝑗 ∈ ℤ𝑘 of 𝑂.
Each integer 𝑗 ∈ ℤ𝑘 occurs with probability
(3.4.12) ‖𝑃𝑗 |𝜑⟩ ‖2 = |𝛼𝑗 |2
and the state of the quantum system immediately after this outcome is
(3.4.13) 𝑃𝑗 |𝜑⟩ / ‖𝑃𝑗 |𝜑⟩‖ = (𝛼𝑗 /|𝛼𝑗 |) |𝑏𝑗 ⟩ .
Up to a global phase factor, this state is equal to the basis state |𝑏𝑗 ⟩. The expectation
value of the measurement is
(3.4.14) ⟨𝜑|𝑂|𝜑⟩ = ∑_{𝑗=0}^{𝑘−1} 𝑗 |𝛼𝑗 |² .
of ℍ.
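The statistics (3.4.12) and (3.4.14) for measuring in an orthonormal basis depend only on the amplitudes. A NumPy sketch (not part of the book; `measure_stats` is a name chosen here):

```python
import numpy as np

def measure_stats(phi):
    """Outcome probabilities (3.4.12) and expectation value (3.4.14) for
    measuring O = sum_j j |b_j><b_j| in the state |phi> = sum_j alpha_j |b_j>."""
    probs = np.abs(np.asarray(phi)) ** 2
    expectation = float(sum(j * p for j, p in enumerate(probs)))
    return probs, expectation

phi = np.array([1, 1]) / np.sqrt(2)      # (|0> + |1>)/sqrt(2)
probs, exp_val = measure_stats(phi)
assert np.allclose(probs, [0.5, 0.5])
assert np.isclose(exp_val, 0.5)
```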
Example 3.4.8. Consider the circuit shown in Figure 3.3.6. As shown in the figure,
the input state is |000⟩ and the output state is (|000⟩ + |011⟩)/√2. If we use integers in ℤ8 to
denote the computational basis states, then the output state is (|0⟩3 + |3⟩3 )/√2. Measuring
the 3-qubit register in the computational basis of ℍ3 means measuring the observable
𝑂 = ∑_{𝑗=0}^{7} 𝑗 |𝑗⟩ ⟨𝑗|. The measurement outcome is one of the numbers 0 or 3, each with
probability 1/2. The expectation value of this measurement is (0 + 3)/2 = 3/2.
Exercise 3.4.9. Determine the measurement statistics and the expectation values for
measuring the 3-qubit register of the circuit in Figure 3.3.4 in the computational basis
for all input states |𝑏⃗⟩, 𝑏⃗ ∈ {0, 1}3 .
Measuring this observable when the system 𝐴𝐵 is in the state |𝜓⟩, the outcome is 𝜆 ∈ Λ
with probability ‖𝑃𝜆 ⊗ 𝐼𝐵 |𝜓⟩‖² and the state after this outcome is (𝑃𝜆 ⊗ 𝐼𝐵 |𝜓⟩)/‖𝑃𝜆 ⊗ 𝐼𝐵 |𝜓⟩‖.
Example 3.4.10 shows that measuring the first qubit of the entangled Bell state
and leaving the second qubit alone changes both qubits. This observation is central
to the famous EPR thought experiment. It is named after its inventors Albert Ein-
stein, Boris Podolsky, and Nathan Rosen. They published it in 1935 to demonstrate
that quantum mechanics is incomplete. We present a simplified description of their
idea. Prepare two qubits in the entangled state (3.4.19). Give one to Alice and the other
to Bob. Then Alice travels a long way, taking her qubit with her. Upon arriving, she
measures her qubit. As we have seen in Example 3.4.10, this measurement will, each with
probability 1/2, put both qubits into the state |0⟩ or both into the state |1⟩. After this measurement, Alice
knows with certainty the state of Bob’s qubit. Einstein, Podolsky, and Rosen claimed
that this instantaneous change of Bob’s qubit contradicts relativity theory which says
that the maximum possible speed is the speed of light. They concluded that the quan-
tum mechanics explanation must be incomplete. However, much later experiments
confirmed the prediction of quantum mechanics and thus showed that the arguments
of Einstein, Podolsky, and Rosen were not correct. We note that from the perspective
of information theory, Alice and Bob do not exchange information. They only obtain
a uniformly distributed random bit, which they can also produce by tossing a coin.
As we will see now, the situation is much easier if the composite system 𝐴𝐵 is in a
separable state
(3.4.22) |𝜓⟩ = |𝜑⟩ |𝜉⟩
with |𝜑⟩ , |𝜉⟩ ∈ ℍ1 . Then for 𝜆 ∈ Λ we have
(3.4.23) (𝑃𝜆 ⊗ 𝐼𝐵 ) |𝜓⟩ = (𝑃𝜆 ⊗ 𝐼𝐵 ) |𝜑⟩ |𝜉⟩ = 𝑃𝜆 |𝜑⟩ ⊗ |𝜉⟩ .
Hence, the probability of measuring 𝜆 is
(3.4.24) ‖𝑃𝜆 |𝜑⟩ ⊗ |𝜉⟩‖² = ‖𝑃𝜆 |𝜑⟩‖² ‖ |𝜉⟩ ‖² = ‖𝑃𝜆 |𝜑⟩‖² .
This is the probability for obtaining 𝜆 when only system 𝐴 is measured. Also, the state
after measuring 𝜆 is
(3.4.25) (𝑃𝜆 |𝜑⟩ / ‖𝑃𝜆 |𝜑⟩‖) ⊗ |𝜉⟩ .
This state is the tensor product of the state after measuring system 𝐴 with the state |𝜉⟩
of system 𝐵 before the measurement.
Example 3.4.11. Consider the separable quantum state
(3.4.26) |𝜓⟩ = |𝑥− ⟩ |𝑥− ⟩ .
Measuring only the first qubit in the computational basis of ℍ1 gives 0 or 1, each with
probability 1/2. If the measurement outcome 𝑏 ∈ {0, 1} occurs, then the state immediately after the measurement is |𝑏⟩ |𝑥− ⟩.
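Example 3.4.11 can be checked numerically (a NumPy sketch, not part of the book); note that the post-measurement state is compared only up to a global phase:

```python
import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
xm = (ket0 - ket1) / np.sqrt(2)

psi = np.kron(xm, xm)                      # the separable state |x->|x->
P = [np.outer(ket0, ket0),                 # projection onto C|0>
     np.outer(ket1, ket1)]                 # projection onto C|1>

for b in (0, 1):
    v = np.kron(P[b], np.eye(2)) @ psi     # (P_b ⊗ I_B)|psi>
    prob = np.linalg.norm(v) ** 2
    assert np.isclose(prob, 0.5)           # each outcome with probability 1/2
    post = v / np.linalg.norm(v)           # post-measurement state
    target = np.kron(ket0 if b == 0 else ket1, xm)   # |b>|x->
    # equal up to a global phase factor
    assert np.isclose(abs(np.vdot(target, post)), 1.0)
```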
Exercise 3.4.12. Write down the observable that only measures the first and last qubit
of a 3-qubit quantum register. Determine the measurement statistics for the quantum
state |𝜑⟩ = (|000⟩ + 𝑖 |111⟩)/√2.
3.5.1. Definition.
Definition 3.5.1. A density operator on ℍ is a linear operator 𝜌 on ℍ that satisfies the
following conditions.
(1) Trace condition: tr 𝜌 = 1,
(2) Positivity condition: 𝜌 is positive semidefinite.
Example 3.5.2. If |𝜑⟩ is a state of 𝑄, then
(3.5.1) 𝜌 = |𝜑⟩ ⟨𝜑|
is a density operator on ℍ. In fact, by Proposition 2.4.27 and since quantum states have
norm 1, we have
(3.5.2) tr 𝜌 = tr |𝜑⟩ ⟨𝜑| = ⟨𝜑|𝜑⟩ = 1.
This proves the trace condition. Also, it follows from Proposition 2.4.27 that for all
|𝜓⟩ ∈ ℍ we have
(3.5.3) ⟨𝜓|𝜌|𝜓⟩ = ⟨𝜓|𝜑⟩⟨𝜑|𝜓⟩ = |⟨𝜑|𝜓⟩|2 ≥ 0.
This proves the positivity condition.
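The two conditions of Definition 3.5.1 can be checked numerically for Example 3.5.2 (a NumPy sketch, not part of the book):

```python
import numpy as np

phi = np.array([1, 1j]) / np.sqrt(2)       # a quantum state in H_1
rho = np.outer(phi, phi.conj())            # rho = |phi><phi|

assert np.isclose(np.trace(rho), 1)        # trace condition
evals = np.linalg.eigvalsh(rho)            # rho is Hermitian
assert np.all(evals >= -1e-12)             # positive semidefinite
```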
We note that density operators on ℍ are Hermitian since by Proposition 2.4.64 posi-
tive semidefinite operators are Hermitian. Next, we introduce mixed states of quantum
systems. They allow us to describe the probabilistic behavior of quantum systems in
situations where we don’t have complete information about the system.
Definition 3.5.3. (1) A mixed state of the quantum system 𝑄 is a sequence
(3.5.4) ((𝑝0 , |𝜓0 ⟩), . . . , (𝑝 𝑙−1 , |𝜓𝑙−1 ⟩))
where 𝑙 ∈ ℕ, the |𝜓𝑖 ⟩ are quantum states in ℍ for 0 ≤ 𝑖 < 𝑙, and 𝑝 𝑖 ∈ ℝ≥0 for
0 ≤ 𝑖 < 𝑙 such that ∑_{𝑖=0}^{𝑙−1} 𝑝 𝑖 = 1.
(2) A pure state of the quantum system 𝑄 is a quantum state in its state space ℍ.
We note that there is a one-to-one correspondence between the pure states |𝜓⟩ and
the mixed states (1, |𝜓⟩) of 𝑄. Using this correspondence, we identify pure states and
these mixed states. A mixed state of 𝑄 with more than 1 component describes the
situation where the exact state of the quantum system is inaccessible. For instance,
such mixed states arise as the states of parts of composite quantum systems that are in
an entangled state. This will be discussed in Section 3.7. The next theorem associates
each mixed state with a density operator.
Proposition 3.5.4. Let ((𝑝0 , |𝜓0 ⟩), . . . , (𝑝 𝑙−1 , |𝜓𝑙−1 ⟩)) be a mixed state of the quantum
system 𝑄. Then
(3.5.5) 𝜌 = ∑_{𝑖=0}^{𝑙−1} 𝑝 𝑖 |𝜓𝑖 ⟩ ⟨𝜓𝑖 |
is a density operator on ℍ.
Proof. We know from Proposition B.5.25 that the trace is ℂ-linear. Therefore, we have
tr(𝜌) = ∑_{𝑖=0}^{𝑙−1} 𝑝 𝑖 tr |𝜓𝑖 ⟩ ⟨𝜓𝑖 | (linearity of the trace)
= ∑_{𝑖=0}^{𝑙−1} 𝑝 𝑖 ⟨𝜓𝑖 |𝜓𝑖 ⟩ (Proposition 2.4.27(6))
= ∑_{𝑖=0}^{𝑙−1} 𝑝 𝑖 = 1 (Definition 3.5.3).
This proves the trace condition. To show the positivity condition, we note that for all
|𝜉⟩ ∈ ℍ we have
⟨𝜉|𝜌|𝜉⟩ = ⟨𝜉| ( ∑_{𝑖=0}^{𝑙−1} 𝑝 𝑖 |𝜓𝑖 ⟩ ⟨𝜓𝑖 | ) |𝜉⟩ (definition of 𝜌)
= ∑_{𝑖=0}^{𝑙−1} 𝑝 𝑖 ⟨𝜉|𝜓𝑖 ⟩⟨𝜓𝑖 |𝜉⟩ (linearity of the inner product)
= ∑_{𝑖=0}^{𝑙−1} 𝑝 𝑖 |⟨𝜓𝑖 |𝜉⟩|² ≥ 0 (conjugate symmetry of the inner product).
The positivity condition and Proposition 2.4.65 imply that 𝜆𝑖 ≥ 0 for 0 ≤ 𝑖 < 𝑘. Hence,
((𝜆0 , |𝑏0 ⟩), . . . , (𝜆𝑘−1 , |𝑏𝑘−1 ⟩))
is a mixed state of the quantum system with density operator 𝜌. □
The proof of Proposition 3.5.8 contains a method for constructing a mixed state
that corresponds to a given density operator. This is illustrated in the next example.
Example 3.5.9. Consider the operator
(3.5.15) 𝜌 = (1/2) |0⟩ ⟨0| + (1/2) |1⟩ ⟨1| .
This is the representation of 𝜌 as in (3.5.13). Also, 𝜌 is positive semidefinite and has
trace 1. Therefore, 𝜌 is a density operator. The construction in the proof of Proposition
3.5.8 gives the mixed state
(3.5.16) ((1/2, |0⟩) , (1/2, |1⟩))
that we already know from Example 3.5.6. Its density operator is 𝜌.
Proposition 3.5.8 shows that the map that sends a mixed state to its density operator
is surjective. However, in general, this map is not injective. The next proposition allows
us to determine the mixed states that are associated with the same density operator.
Now, assume that the two operators are equal. Call them 𝜌. Since 𝜌 is Hermitian
it follows from the Spectral Theorem 2.4.56 that there is a decomposition
(3.5.23) 𝜌 = ∑_{𝑖=0}^{𝑚−1} 𝜆𝑖 |𝑏𝑖 ⟩ ⟨𝑏𝑖 |
where the 𝜆𝑖 are the nonzero eigenvalues of 𝜌 and (|𝑏0 ⟩ , . . . , |𝑏𝑚−1 ⟩) is an orthonormal
sequence of corresponding eigenvectors. Since 𝜌
is a density operator, these eigenvalues are positive. We show that the vector space
𝑉 = Span(|𝑏0 ⟩ , . . . , |𝑏𝑚−1 ⟩) is equal to the vector space 𝑉 ′ = Span(|𝜑0 ⟩ , . . . , |𝜑𝑙−1 ⟩).
Since for all 𝑗 ∈ ℤ𝑚 we have
(3.5.24) 𝜆𝑗 |𝑏𝑗 ⟩ = 𝜌 |𝑏𝑗 ⟩ = ∑_{𝑖=0}^{𝑙−1} ⟨𝜑𝑖 |𝑏𝑗 ⟩ |𝜑𝑖 ⟩ ,
it follows that
(3.5.25) 𝑉 ⊆ 𝑉 ′ .
Hence, |⟨𝜑𝑖 |𝜓⟩|2 = 0 for 0 ≤ 𝑖 < 𝑙 and all |𝜓⟩ ∈ 𝑉 ⟂ . Therefore |𝜑𝑖 ⟩ ∈ (𝑉 ⟂ )⟂ = 𝑉 for
0 ≤ 𝑖 < 𝑙 which together with (3.5.25) implies
(3.5.29) 𝑉 = 𝑉′ and 𝑚 ≤ 𝑙.
So we can write
(3.5.30) |𝜑𝑗 ⟩ = ∑_{𝑖=0}^{𝑚−1} 𝑢𝑖,𝑗 |𝜉𝑖 ⟩
for all 𝑗 ∈ ℤ𝑙 with complex coefficients 𝑢𝑖,𝑗 . Denote the matrix (𝑢𝑖,𝑗 ) ∈ ℂ(𝑚,𝑙) by 𝑈.
Also, denote the entries of the adjoint 𝑈 ∗ of 𝑈 by 𝑢∗𝑖,𝑗 . Now we have
∑_{𝑝=0}^{𝑚−1} |𝜉𝑝 ⟩ ⟨𝜉𝑝 | = ∑_{𝑝=0}^{𝑙−1} |𝜑𝑝 ⟩ ⟨𝜑𝑝 |
(3.5.31) = ∑_{𝑝=0}^{𝑙−1} | ∑_{𝑖=0}^{𝑚−1} 𝑢𝑖,𝑝 𝜉𝑖 ⟩ ⟨ ∑_{𝑗=0}^{𝑚−1} 𝑢𝑗,𝑝 𝜉𝑗 |
= ∑_{𝑖,𝑗=0}^{𝑚−1} ( ∑_{𝑝=0}^{𝑙−1} 𝑢𝑖,𝑝 𝑢∗𝑝,𝑗 ) |𝜉𝑖 ⟩ ⟨𝜉𝑗 | .
It follows from Proposition 2.4.31 that the sequence (|𝜉𝑝 ⟩ ⟨𝜉𝑝 |) is linearly independent.
Hence, (3.5.31) implies
(3.5.32) ∑_{𝑝=0}^{𝑙−1} 𝑢𝑖,𝑝 𝑢∗𝑝,𝑗 = 𝛿 𝑖,𝑗
Proof. To prove the first assertion, we let |𝜑⟩ and |𝜓⟩ be pure states of 𝑄. The den-
sity operators of these states are the density operators of the mixed states ((1, |𝜑⟩)) and
((1, |𝜓⟩)), respectively. Therefore, it follows from Proposition 3.5.10 that the density
operators of these states are equal if and only if there is a complex number 𝑢 of norm
1 such that |𝜓⟩ = 𝑢 |𝜑⟩. This proves the first assertion. The second assertion follows
immediately from Proposition 3.5.10. □
Theorem 3.5.11 can also be used to characterize the mixed states of different lengths
that correspond to the same density operator. As is shown in Exercise 3.5.12 we can
extend any mixed state of length 𝑙 to a mixed state of length 𝑘 > 𝑙 with the same density
operator by appending 𝑘 − 𝑙 pairs (0, 0) to it.
Exercise 3.5.12. Show that appending pairs (0, 0) to a mixed state gives a mixed state
with the same density operator.
The next theorem gives a criterion that allows one to distinguish between density
operators of pure states and mixed states.
Theorem 3.5.15. Let 𝜌 be a density operator on ℍ. Then the following statements hold.
(1) 𝜌 is the density operator of a pure state if and only if 𝜌2 = 𝜌, which is true if and
only if tr 𝜌2 = 1.
(2) 𝜌 is not the density operator of a pure state if and only if 𝜌2 ≠ 𝜌, which is true if and
only if tr 𝜌2 < 1.
Proof. We now prove the first assertion of the theorem. Let 𝜌 = |𝜑⟩ ⟨𝜑| with a quantum
state |𝜑⟩ ∈ ℍ. Since ⟨𝜑|𝜑⟩ = 1, it follows that 𝜌 is a projection; that is, 𝜌2 = 𝜌. Now
assume that 𝜌2 = 𝜌. Then the trace condition implies tr 𝜌2 = tr 𝜌 = 1. Finally, let
tr 𝜌2 = 1. Then it follows from (3.5.40) that there is 𝑙 ∈ ℤ𝑘 with |𝜆𝑙 | = 1 and 𝜆𝑖 = 0 for
𝑖 ≠ 𝑙. Therefore, 𝜌 = |𝑏𝑙 ⟩ ⟨𝑏𝑙 |. The second assertion is proved in Exercise 3.5.16. □
Example 3.5.17. Consider the density operator 𝜌 from Example 3.5.9. Since
(3.5.41) 𝜌² = (1/4)(|0⟩ ⟨0| + |1⟩ ⟨1|) ≠ 𝜌,
it follows from Theorem 3.5.15 that 𝜌 is not the density operator of a pure state. We have already verified this directly in Example 3.5.6.
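The purity criterion tr 𝜌² of Theorem 3.5.15 is easy to evaluate numerically. The following sketch (an addition, not part of the text, using Python with NumPy) checks it for the operator 𝜌 = (1/2)(|0⟩⟨0| + |1⟩⟨1|), which is the 𝜌 of Example 3.5.9 by (3.5.41), and for a pure state:

```python
import numpy as np

# rho = 1/2 |0><0| + 1/2 |1><1|, the density operator from Example 3.5.9.
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
rho = 0.5 * np.outer(ket0, ket0.conj()) + 0.5 * np.outer(ket1, ket1.conj())

# Purity criterion of Theorem 3.5.15: rho is pure if and only if tr(rho^2) = 1.
purity = np.trace(rho @ rho).real   # 0.5 < 1, so rho is not pure

# A projection |phi><phi| onto a unit vector has purity exactly 1.
phi = (ket0 + ket1) / np.sqrt(2)
sigma = np.outer(phi, phi.conj())   # tr(sigma^2) is (numerically) 1
```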
3.6.1. State Space Postulate. We begin with the generalized State Space Pos-
tulate.
Postulate 3.6.1 (State Space Postulate — density operator version). Associated with
any physical system is a Hilbert space, called the state space of the system. The system
is completely described by a density operator on the state space.
By Theorem 3.5.11, modeling pure quantum states using density operators is coarser than modeling them using state vectors: the density operators of two state vectors are equal if and only if the vectors agree up to a global phase factor. However, since by Theorem 3.4.4 all state vectors that agree up to a global phase factor behave identically in measurements, this coarseness is of no physical importance.
The description of quantum systems using density operators includes mixed states.
This becomes essential in scenarios like when a component of a composite system is
discarded, leaving only the remaining system to be described.
The Composite Systems Postulate for density operators is the following.
Postulate 3.6.2 (Composite Systems Postulate — density operator version). The state
space of the composition of finitely many physical systems is the tensor product of the
state spaces of the component physical systems. Moreover, if we have systems numbered 0 through 𝑚 − 1 and if system 𝑖 is in the state 𝜌𝑖, where 𝜌𝑖 is a density operator on the state space of the 𝑖th component system for 0 ≤ 𝑖 < 𝑚, then the composite system is in the state 𝜌0 ⊗ ⋯ ⊗ 𝜌𝑚−1.
Analogous to what we have described in Section 3.3.3, we can use composite uni-
tary operators to describe the time evolution of composite quantum systems that are in
a mixed state.
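As an illustration (an addition, not from the text), the evolution step 𝜌 ↦ 𝑈𝜌𝑈∗ can be computed with NumPy; the sketch below applies the composite operator 𝐻 ⊗ 𝐼 to the 2-qubit state |00⟩⟨00|:

```python
import numpy as np

# Time evolution of a density operator: rho -> U rho U*.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
I2 = np.eye(2, dtype=complex)
U = np.kron(H, I2)                 # H on the first qubit, identity on the second

ket00 = np.zeros(4, dtype=complex)
ket00[0] = 1
rho = np.outer(ket00, ket00.conj())          # |00><00|

rho_next = U @ rho @ U.conj().T

# The result is the density operator of the pure state (|00> + |10>)/sqrt(2).
psi = (np.eye(4)[0] + np.eye(4)[2]) / np.sqrt(2)
assert np.allclose(rho_next, np.outer(psi, psi.conj()))
```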
Exercise 3.6.4. Suppose that at time 𝑡 a 2-qubit quantum register is in the state 𝜌 =
|00⟩ ⟨00| and that the state of the system at time 𝑡′ > 𝑡 is obtained from 𝜌 by applying the
𝖢𝖭𝖮𝖳 operator. Determine the density operator that describes the state of the system
at time 𝑡′ .
In the situation of the Measurement Postulate 3.6.5, the expectation value of the
random variable that sends a measurement outcome to the corresponding eigenvalue
is
(3.6.6) ∑_{𝜆∈Λ} 𝜆 Pr(𝜆) = ∑_{𝜆∈Λ} 𝜆 tr(𝑃𝜆 𝜌) = tr(( ∑_{𝜆∈Λ} 𝜆𝑃𝜆 ) 𝜌) = tr(𝑂𝜌).
This motivates the following definition.
3.6. The quantum postulates for mixed states 133
The following proposition applies the Measurement Postulate for density operators
to explain what happens when mixed states are measured in an orthonormal basis.
Proposition 3.6.7. Suppose that we measure the quantum system in the orthonormal basis 𝐵 = (|𝑏0⟩ , . . . , |𝑏𝑘−1⟩) when it is in the mixed state ((𝑝0, |𝜑0⟩), . . . , (𝑝𝑙−1, |𝜑𝑙−1⟩)). Then measuring the observable ∑_{𝜆=0}^{𝑘−1} 𝜆 |𝑏𝜆⟩ ⟨𝑏𝜆| gives 𝜆 ∈ ℤ𝑘 with probability Pr(𝜆) = ∑_{𝑖=0}^{𝑙−1} 𝑝𝑖 |⟨𝑏𝜆|𝜑𝑖⟩|². Immediately after this measurement, the quantum system is in the state |𝑏𝜆⟩ ⟨𝑏𝜆|.
Proof. The density operator corresponding to the mixed state of the quantum system
is
(3.6.7) 𝜌 = ∑_{𝑖=0}^{𝑙−1} 𝑝𝑖 |𝜑𝑖⟩ ⟨𝜑𝑖| .
When the measurement outcome is 𝜆, then immediately after the measurement the
quantum system is in the state
(3.6.10) 𝑃𝜆 𝜌 𝑃𝜆 / Pr(𝜆) = ( ∑_{𝑖=0}^{𝑙−1} 𝑝𝑖 |𝑏𝜆⟩ ⟨𝑏𝜆|𝜑𝑖⟩ ⟨𝜑𝑖|𝑏𝜆⟩ ⟨𝑏𝜆| ) / Pr(𝜆) = 𝑃𝜆 ( ∑_{𝑖=0}^{𝑙−1} 𝑝𝑖 |⟨𝑏𝜆|𝜑𝑖⟩|² ) / Pr(𝜆) = 𝑃𝜆 . □
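The probabilities Pr(𝜆) = tr(𝑃𝜆 𝜌) can be evaluated directly. The following NumPy sketch (an addition, not from the text) uses the hypothetical example mixed state ((1/3, |0⟩), (2/3, |+⟩)) with |+⟩ = (|0⟩ + |1⟩)/√2, measured in the computational basis:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
plus = (ket0 + ket1) / np.sqrt(2)

# Density operator of the mixed state ((1/3, |0>), (2/3, |+>)).
rho = (1/3) * np.outer(ket0, ket0.conj()) + (2/3) * np.outer(plus, plus.conj())

# Pr(lambda) = tr(P_lambda rho) with P_lambda = |b_lambda><b_lambda|.
P0 = np.outer(ket0, ket0.conj())
P1 = np.outer(ket1, ket1.conj())
pr0 = np.trace(P0 @ rho).real    # 1/3 + (2/3)(1/2) = 2/3
pr1 = np.trace(P1 @ rho).real    # (2/3)(1/2) = 1/3
```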
The concepts and results regarding partial measurements in Section 3.4.3 carry
over to the mixed state situation. We only need to replace the formulas for the mea-
surement probabilities and for the states immediately after measurement with the for-
mulas that hold for mixed states. In Exercise 3.6.8 this is done for quantum systems
that are the composition of two quantum systems.
Exercise 3.6.8. Assume that 𝐴 and 𝐵 are quantum systems with state spaces ℍ𝐴 and
ℍ𝐵 . Consider the composite quantum system 𝐴𝐵 with state space ℍ𝐴𝐵 = ℍ𝐴 ⊗ ℍ𝐵 . Let
𝑂𝐴 be an observable of system 𝐴 with spectral decomposition 𝑂𝐴 = ∑𝜆∈Λ 𝜆𝑃𝜆 . Also,
let 𝜌𝐴 , 𝜌𝐵 be states of the systems 𝐴 and 𝐵, respectively. Prove that the following hold.
134 3. Quantum Mechanics
3.6.4. The descriptions by state vectors and density operators are equivalent. We have introduced the description of the states of quantum systems using state vectors and density operators. Modeling quantum systems using density operators is more general since it also covers the situation where a quantum system is described by a mixed state. However, when we restrict our attention to pure states, the two descriptions are equivalent. This means the following. Consider a quantum system with state space ℍ. Suppose that it evolves in 𝑙 steps. Initially, it is in the pure state 𝑠0, then in the pure state 𝑠1, etc., until it is finally in the pure state 𝑠𝑙. These states can be described by state vectors or density operators. In both versions of the State Space Postulate, each transition is associated with a unitary operator on ℍ. So let 𝑈0, 𝑈1, . . . , 𝑈𝑙−1 be a sequence of unitary operators on ℍ such that state 𝑠𝑖+1 is obtained from state 𝑠𝑖 using 𝑈𝑖 for 0 ≤ 𝑖 < 𝑙. Assume that the states 𝑠𝑖 are represented by state vectors |𝜑𝑖⟩ ∈ ℍ for 0 ≤ 𝑖 ≤ 𝑙. Then we have
(3.6.12) |𝜑𝑙⟩ = 𝑈 |𝜑0⟩
with
(3.6.13) 𝑈 = 𝑈 𝑙−1 ⋯ 𝑈0 .
Next, assume that the state 𝑠0 is represented by the density operator
(3.6.14) 𝜌0 = |𝜑0 ⟩ ⟨𝜑0 | .
Also, for 0 ≤ 𝑖 < 𝑙 set
(3.6.15) 𝜌𝑖+1 = 𝑈 𝑖 𝜌𝑖 𝑈𝑖∗ .
Then from (3.6.1) we obtain
(3.6.16) 𝜌𝑙 = 𝑈𝜌0 𝑈 ∗ = |𝜑𝑙 ⟩ ⟨𝜑𝑙 | .
As shown in Section 3.6.3 the measurement statistics and the quantum state immedi-
ately after the measurement are the same for the state described by |𝜑𝑙 ⟩ and 𝜌𝑙 . This
shows that from the perspective of quantum mechanics, the two descriptions of the
state evolution are equivalent.
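This equivalence is easy to confirm numerically. The following sketch (an addition, Python with NumPy) evolves a single-qubit system through a sequence of random unitaries in both descriptions and compares the results:

```python
import numpy as np

rng = np.random.default_rng(7)

def random_unitary():
    # QR decomposition of a random complex matrix yields a unitary factor q.
    q, _ = np.linalg.qr(rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2)))
    return q

steps = [random_unitary() for _ in range(4)]

# State-vector evolution: |phi_l> = U_{l-1} ... U_0 |phi_0>.
phi = np.array([1, 0], dtype=complex)
for U in steps:
    phi = U @ phi

# Density-operator evolution: rho_{i+1} = U_i rho_i U_i*.
rho = np.diag([1.0, 0.0]).astype(complex)
for U in steps:
    rho = U @ rho @ U.conj().T

# Both descriptions agree: rho_l = |phi_l><phi_l|.
assert np.allclose(rho, np.outer(phi, phi.conj()))
```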
3.7.1. Partial trace on ℍ𝐴𝐵 . In this section, we discuss the partial trace on ℍ𝐴𝐵
over ℍ𝐵 . We refer to it as the partial trace over 𝐵 and denote it by tr𝐵 . The definition of
the partial trace tr𝐴 over 𝐴 and the corresponding terminology are analogous.
From the definition of the partial trace in Section B.9.4 we obtain the following
formula.
(3.7.2) tr𝐵 𝑈 = ∑_{𝑘∈ℤ𝑁} 𝑈𝑘
where 𝑈𝑘 = (𝐼𝐴 ⊗ ⟨𝑘|) 𝑈 (𝐼𝐴 ⊗ |𝑘⟩) for all 𝑘 ∈ ℤ𝑁 .
The next proposition shows how to determine the trace over 𝐵 of a projection |𝜑⟩ ⟨𝜑|
for |𝜑⟩ ∈ ℍ𝐴𝐵 from the Schmidt decomposition of |𝜑⟩.
Proof. We have
tr𝐵 |𝜑⟩ ⟨𝜑| = tr𝐵 ( ∑_{𝑖,𝑗=0}^{𝑙−1} 𝑟𝑖 𝑟𝑗 |𝜓𝑖⟩ |𝜉𝑖⟩ ⟨𝜓𝑗| ⟨𝜉𝑗| )
= ∑_{𝑖,𝑗=0}^{𝑙−1} 𝑟𝑖 𝑟𝑗 tr𝐵 (|𝜓𝑖⟩ ⟨𝜓𝑗| ⊗ |𝜉𝑖⟩ ⟨𝜉𝑗|)
(3.7.8) = ∑_{𝑖,𝑗=0}^{𝑙−1} 𝑟𝑖 𝑟𝑗 |𝜓𝑖⟩ ⟨𝜓𝑗| tr(|𝜉𝑖⟩ ⟨𝜉𝑗|)
= ∑_{𝑖,𝑗=0}^{𝑙−1} 𝑟𝑖 𝑟𝑗 |𝜓𝑖⟩ ⟨𝜓𝑗| 𝛿𝑖,𝑗
= ∑_{𝑖=0}^{𝑙−1} 𝑟𝑖² |𝜓𝑖⟩ ⟨𝜓𝑖| . □
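The partial trace over 𝐵 can be implemented in a few lines by viewing an operator on ℍ𝐴𝐵 as an 𝑀 × 𝑀 block matrix of 𝑁 × 𝑁 blocks (𝑀 = dim ℍ𝐴, 𝑁 = dim ℍ𝐵) and tracing each block. The sketch below (an addition, Python with NumPy) also checks the Schmidt-decomposition formula just proved for a small example:

```python
import numpy as np

def partial_trace_B(U, M, N):
    # View U as an M x M block matrix of N x N blocks and trace each block.
    return np.trace(U.reshape(M, N, M, N), axis1=1, axis2=3)

# |phi> = r_0 |psi_0>|xi_0> + r_1 |psi_1>|xi_1> with orthonormal Schmidt bases.
r = np.sqrt(np.array([1/3, 2/3]))
psi = np.eye(2)
xi = np.eye(2)
phi = sum(r[i] * np.kron(psi[i], xi[i]) for i in range(2)).astype(complex)

# tr_B |phi><phi| = sum_i r_i^2 |psi_i><psi_i|, as in (3.7.8).
rhoA = partial_trace_B(np.outer(phi, phi.conj()), 2, 2)
assert np.allclose(rhoA, np.diag(r**2))
```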
3.7.2. Tracing out subsystems. In this section, we study the following ques-
tion. Suppose that the state of a composite quantum system 𝐴𝐵 at time 𝑡 is 𝜌 and we
discard subsystem 𝐵 at this time. Discarding component 𝐵 refers to intentionally dis-
regarding the quantum state of this part of the composite system while focusing solely
on the remaining part 𝐴. What is the state of 𝐴 after time 𝑡? We will show that it must
be the partial trace tr𝐵 𝜌 of 𝜌 over 𝐵. In the whole section, states of quantum systems
are described by density operators.
We start with the following observation.
Proposition 3.7.6. Let 𝜌 be a density operator on ℍ𝐴𝐵 . Then tr𝐵 (𝜌) is a density operator
on ℍ𝐴 .
Proof. We must show that tr𝐵 (𝜌) satisfies the trace condition and the positivity condi-
tion. As a density operator, 𝜌 satisfies these conditions. Since by Proposition B.9.25 the
partial trace is trace-preserving, we have tr(tr𝐵 (𝜌)) = tr(𝜌) = 1. Therefore, tr𝐵 (𝜌) sat-
isfies the trace condition. Also, tr𝐵 (𝜌) satisfies the positivity condition by Proposition
3.7.5. □
Proof. Let (𝑆 𝑖 ) and (𝑇𝑗 ) be bases of End(ℍ𝐴 ) and End(ℍ𝐵 ), respectively. Then (𝑆 𝑖 ⊗𝑇𝑗 )
is a basis of End(ℍ𝐴𝐵 ). Since the trace is a linear map, it suffices to prove (3.7.13) for
the basis elements 𝑆𝑖 ⊗ 𝑇𝑗 . So, let 𝑖 ∈ ℤ𝑀², 𝑗 ∈ ℤ𝑁², and 𝜌 = 𝑆𝑖 ⊗ 𝑇𝑗 . We use the fact
that the partial trace is trace-preserving (see Proposition B.9.25) and obtain
tr(𝑂𝐴𝐵 𝜌) = tr((𝑂𝐴 ⊗ 𝐼𝐵 )(𝑆 𝑖 ⊗ 𝑇𝑗 )) = tr(𝑂𝐴 𝑆 𝑖 ⊗ 𝑇𝑗 ) = tr(tr𝐵 (𝑂𝐴 𝑆 𝑖 ⊗ 𝑇𝑗 ))
= tr(𝑂𝐴 𝑆 𝑖 tr 𝑇𝑗 )
= tr(𝑂𝐴 𝜌𝐴 ).
The expression on the right side of (3.7.16) is independent of 𝑓. Hence, 𝑓 must be equal
to tr𝐵 since by the first assertion 𝑓 = tr𝐵 satisfies (3.7.15). □
Proposition 3.7.10. Let 𝑙 ∈ ℕ and for 0 ≤ 𝑖 < 𝑙 let |𝜑𝑖 ⟩ and |𝜓𝑖 ⟩ be quantum states in
ℍ𝐴 and ℍ𝐵 , respectively, such that the states |𝜓𝑖 ⟩ are orthogonal to each other. Also, let 𝜌
be the density operator of the state
(3.7.21) |𝜉⟩ = (1/√𝑙) ∑_{𝑖=0}^{𝑙−1} |𝜑𝑖⟩ |𝜓𝑖⟩ .
Now we characterize the states of composite systems whose partial trace is not a
pure state.
Theorem 3.7.13. Let |𝜑⟩ be the state of the composite system 𝐴𝐵 and let 𝜌 = |𝜑⟩ ⟨𝜑| be
its density operator. Then |𝜑⟩ is entangled with respect to the decomposition of 𝐴𝐵 into
the subsystems 𝐴 and 𝐵 if and only if the reduced density operator 𝜌𝐴 is not the density
operator of a pure state.
Proof. Let
(3.7.23) |𝜑⟩ = ∑_{𝑖=0}^{𝑠−1} 𝑟𝑖 |𝜓𝑖⟩ |𝜉𝑖⟩
be a Schmidt decomposition of |𝜑⟩.
Hence, we have
(3.7.25) tr 𝜌 = ∑_{𝑖=0}^{𝑠−1} 𝑟𝑖² .
Assume that |𝜑⟩ is entangled. Then by Theorem 3.2.4 we have 𝑠 > 1. Since the
Schmidt coefficients 𝑟 𝑖 are positive real numbers, the trace condition for 𝜌 and (3.7.25)
imply that 0 < 𝑟 𝑖 < 1 for 0 ≤ 𝑖 < 𝑠. So, by (3.7.26) and (3.7.27) we have 𝜌𝐴 ≠ (𝜌𝐴 )2 and
it follows from Theorem 3.5.15 that 𝜌𝐴 is not the density operator of a pure state.
Conversely, suppose that |𝜑⟩ is separable. Then Theorem 3.2.4 implies that 𝑠 = 1.
It follows from the trace condition and (3.7.25) that 𝑟0 = 1. So (3.7.26) and (3.7.27)
imply that 𝜌𝐴 = (𝜌𝐴 )2 . Theorem 3.5.15 shows that 𝜌𝐴 is the density operator of a pure
state. □
Exercise 3.7.14. Verify (3.7.24).
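The entanglement criterion of Theorem 3.7.13 can be tested numerically: trace out 𝐵 and check the purity of the reduced density operator. A sketch (an addition, Python with NumPy) for a Bell state and a product state:

```python
import numpy as np

def partial_trace_B(rho, M, N):
    # Trace over B by viewing rho as an M x M block matrix of N x N blocks.
    return np.trace(rho.reshape(M, N, M, N), axis1=1, axis2=3)

# Bell state (|00> + |11>)/sqrt(2): Schmidt rank 2, hence entangled.
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rhoA = partial_trace_B(np.outer(bell, bell.conj()), 2, 2)
purity = np.trace(rhoA @ rhoA).real   # 0.5 < 1: rho_A is not pure, so entangled

# Product state |0>|+>: Schmidt rank 1, hence separable.
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
prod = np.kron(np.array([1, 0], dtype=complex), plus)
rhoA2 = partial_trace_B(np.outer(prod, prod.conj()), 2, 2)
# tr(rhoA2^2) is (numerically) 1: rho_A is pure, so the state is separable.
```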
Chapter 4

The Theory of Quantum Algorithms
4.1.2. The Pauli gates. The Pauli gates have already been introduced in Section
2.3.1. They are named after the physicist Wolfgang Pauli (1900–1958) and are of great
importance for the construction of quantum circuits and the implementation of quan-
tum algorithms. We recall their definition and discuss their properties.
Definition 4.1.1. The Pauli gates or Pauli operators are
(4.1.3) 𝑋 = ( 0 1 ; 1 0 ), 𝑌 = ( 0 −𝑖 ; 𝑖 0 ), 𝑍 = ( 1 0 ; 0 −1 ).
So the effect of the Pauli gates on the computational basis vectors of ℍ1 is as follows:
𝑋 |0⟩ = |1⟩ , 𝑋 |1⟩ = |0⟩ ,
(4.1.5) 𝑌 |0⟩ = 𝑖 |1⟩ , 𝑌 |1⟩ = −𝑖 |0⟩ ,
𝑍 |0⟩ = |0⟩ , 𝑍 |1⟩ = − |1⟩ .
This shows that the Pauli 𝑋 gate can be considered as the quantum equivalent of
the classical 𝖭𝖮𝖳 gate since it sends |𝑏⟩ to |¬𝑏⟩ for all 𝑏 ∈ {0, 1}. It is also called the
quantum 𝖭𝖮𝖳 gate or the bit-flip gate. Also, the Pauli 𝑍 gate is sometimes called the
phase-flip gate since it flips the phase of |1⟩ from 1 to −1.
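The bit-flip and phase-flip behavior can be verified directly. A small NumPy sketch (an addition, not from the text):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# Bit flip: X|0> = |1> and X|1> = |0>.
assert np.allclose(X @ ket0, ket1) and np.allclose(X @ ket1, ket0)
# Y|0> = i|1> and Y|1> = -i|0>.
assert np.allclose(Y @ ket0, 1j * ket1) and np.allclose(Y @ ket1, -1j * ket0)
# Phase flip: Z|0> = |0> and Z|1> = -|1>.
assert np.allclose(Z @ ket0, ket0) and np.allclose(Z @ ket1, -ket1)
# Each Pauli operator is a unitary Hermitian involution.
for P in (X, Y, Z):
    assert np.allclose(P @ P, np.eye(2)) and np.allclose(P, P.conj().T)
```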
In Example 2.4.57 we have determined the spectral decomposition of the Pauli
operators. They are
𝑋 = |𝑥+ ⟩ ⟨𝑥+ | − |𝑥− ⟩ ⟨𝑥− | ,
(4.1.6) 𝑌 = |𝑦+ ⟩ ⟨𝑦+ | − |𝑦− ⟩ ⟨𝑦− | ,
𝑍 = |𝑧+ ⟩ ⟨𝑧+ | − |𝑧− ⟩ ⟨𝑧− |
where
(4.1.7)
(|𝑥+⟩ , |𝑥−⟩) = ((|0⟩ + |1⟩)/√2, (|0⟩ − |1⟩)/√2),
(|𝑦+⟩ , |𝑦−⟩) = ((|0⟩ + 𝑖 |1⟩)/√2, (|0⟩ − 𝑖 |1⟩)/√2),
(|𝑧+⟩ , |𝑧−⟩) = (|0⟩ , |1⟩).
The following proposition will be very useful when we discuss rotation gates.
Proposition 4.1.4. The sequence (𝐼, 𝑋, 𝑌 , 𝑍) is a ℂ-basis of End(ℍ1 ) which is orthogonal
with respect to the Hilbert-Schmidt inner product.
The symbols representing the identity and the Pauli gates in quantum circuits are
shown in Figure 4.1.1.
Figure 4.1.1. Symbols for the identity and the Pauli gates in quantum circuits.
4.1.3. The Hadamard gate. Another important single-qubit gate, the Hadamard
gate or Hadamard operator, has already been introduced in Section 3.3.2. We recall that
this gate is
(4.1.11) 𝐻 = (1/√2) ( 1 1 ; 1 −1 ).
The Hadamard operator is a unitary and Hermitian involution and we have shown in
Exercise 2.3.4 that
(4.1.12) 𝐻𝑋𝐻 = 𝑍, 𝐻𝑌 𝐻 = −𝑌 , 𝐻𝑍𝐻 = 𝑋.
We also note that
(4.1.13) 𝐻 = (1/√2)(𝑋 + 𝑍).
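The stated properties of 𝐻 can be checked numerically; a NumPy sketch (an addition, not from the text):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

# H is a Hermitian involution, hence unitary.
assert np.allclose(H, H.conj().T) and np.allclose(H @ H, np.eye(2))
# The conjugation identities (4.1.12).
assert np.allclose(H @ X @ H, Z)
assert np.allclose(H @ Y @ H, -Y)
assert np.allclose(H @ Z @ H, X)
# The decomposition (4.1.13): H = (X + Z)/sqrt(2).
assert np.allclose(H, (X + Z) / np.sqrt(2))
```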
Definition 4.2.1. Let 𝑎,⃗ 𝑏 ⃗ ∈ ℝ3 be nonzero. Then the angle between 𝑎⃗ and 𝑏 ⃗ is defined
as
(4.2.4) ∠(𝑎⃗, 𝑏⃗) = arccos( ⟨𝑎⃗|𝑏⃗⟩ / (‖𝑎⃗‖ ‖𝑏⃗‖) ).
Proposition 4.2.3. Let 𝑎⃗, 𝑏⃗ ∈ ℝ3 be nonzero vectors. Then we have the following.
(1) 0 ≤ ∠(𝑎,⃗ 𝑏)⃗ = ∠(𝑏,⃗ 𝑎)⃗ ≤ 𝜋.
(2) ∠(𝑎,⃗ 𝑏)⃗ = 0 if and only if 𝑏 ⃗ = 𝑟𝑎⃗ with 𝑟 ∈ ℝ>0 .
(3) ∠(𝑎,⃗ 𝑏)⃗ = 𝜋/2 if and only if ⟨𝑎|⃗ 𝑏⟩⃗ = 0; that is, 𝑎⃗ and 𝑏 ⃗ are orthogonal to each other.
(4) ∠(𝑎,⃗ 𝑏)⃗ = 𝜋 if and only if 𝑏 ⃗ = 𝑟𝑎⃗ with 𝑟 ∈ ℝ<0 .
Exercise 4.2.4. Prove Proposition 4.2.3.
(4.2.9) 𝑎⃗ × 𝑏 ⃗ = (𝑎𝑦 𝑏𝑧 − 𝑎𝑧 𝑏𝑦 , 𝑎𝑧 𝑏𝑥 − 𝑎𝑥 𝑏𝑧 , 𝑎𝑥 𝑏𝑦 − 𝑎𝑦 𝑏𝑥 ).
Example 4.2.6. Let 𝑎⃗ = 𝑥̂ = (1, 0, 0), 𝑏 ⃗ = 𝑦 ̂ = (0, 1, 0). Then 𝑎⃗ × 𝑏 ⃗ = 𝑧 ̂ = (0, 0, 1).
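Both the angle formula (4.2.4) and the cross product are available in NumPy; the sketch below (an addition, not from the text) reproduces Example 4.2.6 and checks the norm identity ‖𝑎⃗ × 𝑏⃗‖ = ‖𝑎⃗‖‖𝑏⃗‖ sin 𝜃 on a random pair of vectors:

```python
import numpy as np

def angle(a, b):
    # The angle between nonzero vectors, as in (4.2.4).
    return np.arccos(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

x_hat = np.array([1.0, 0.0, 0.0])
y_hat = np.array([0.0, 1.0, 0.0])
z_hat = np.array([0.0, 0.0, 1.0])

# Example 4.2.6: x_hat x y_hat = z_hat.
assert np.allclose(np.cross(x_hat, y_hat), z_hat)
# Orthogonal vectors enclose an angle of pi/2.
assert np.isclose(angle(x_hat, y_hat), np.pi / 2)

# ||a x b|| = ||a|| ||b|| sin(theta) for a random pair of vectors.
rng = np.random.default_rng(0)
a, b = rng.normal(size=3), rng.normal(size=3)
lhs = np.linalg.norm(np.cross(a, b))
rhs = np.linalg.norm(a) * np.linalg.norm(b) * np.sin(angle(a, b))
assert np.isclose(lhs, rhs)
```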
If 𝑎,⃗ 𝑏 ⃗ ∈ ℝ3 are linearly independent, then the cross product has the following
geometric interpretation by the right-hand rule which is illustrated in Figure 4.2.2. If 𝑎⃗
points in the direction of the index finger of the right hand and 𝑏 ⃗ points in the direction
of the middle finger, then 𝑎⃗ × 𝑏 ⃗ is a vector orthogonal to the plane spanned by 𝑎⃗ and 𝑏 ⃗
that points in the direction of the thumb.
Here are some important properties of the cross product.
Proposition 4.2.7. Let 𝑎,⃗ 𝑏 ⃗ ∈ ℝ3 and let 𝜃 be the angle between 𝑎⃗ and 𝑏.⃗ Then the
following hold.
and
(4.2.12) ⟨𝑎⃗|𝑏⃗⟩² = (𝑎𝑥𝑏𝑥 + 𝑎𝑦𝑏𝑦 + 𝑎𝑧𝑏𝑧)² = 𝑎𝑥²𝑏𝑥² + 𝑎𝑦²𝑏𝑦² + 𝑎𝑧²𝑏𝑧² + 2(𝑎𝑥𝑏𝑥𝑎𝑦𝑏𝑦 + 𝑎𝑥𝑏𝑥𝑎𝑧𝑏𝑧 + 𝑎𝑦𝑏𝑦𝑎𝑧𝑏𝑧).
So
(4.2.13) ‖𝑎⃗‖² ‖𝑏⃗‖² − ⟨𝑎⃗|𝑏⃗⟩² = 𝑎𝑥²𝑏𝑦² + 𝑎𝑥²𝑏𝑧² + 𝑎𝑦²𝑏𝑥² + 𝑎𝑦²𝑏𝑧² + 𝑎𝑧²𝑏𝑥² + 𝑎𝑧²𝑏𝑦² − 2(𝑎𝑥𝑏𝑥𝑎𝑦𝑏𝑦 + 𝑎𝑥𝑏𝑥𝑎𝑧𝑏𝑧 + 𝑎𝑦𝑏𝑦𝑎𝑧𝑏𝑧).
On the other hand, we have
(4.2.14) ‖𝑎⃗ × 𝑏⃗‖² = (𝑎𝑦𝑏𝑧 − 𝑎𝑧𝑏𝑦)² + (𝑎𝑧𝑏𝑥 − 𝑎𝑥𝑏𝑧)² + (𝑎𝑥𝑏𝑦 − 𝑎𝑦𝑏𝑥)² = 𝑎𝑦²𝑏𝑧² + 𝑎𝑧²𝑏𝑦² + 𝑎𝑧²𝑏𝑥² + 𝑎𝑥²𝑏𝑧² + 𝑎𝑥²𝑏𝑦² + 𝑎𝑦²𝑏𝑥² − 2(𝑎𝑦𝑏𝑧𝑎𝑧𝑏𝑦 + 𝑎𝑧𝑏𝑥𝑎𝑥𝑏𝑧 + 𝑎𝑥𝑏𝑦𝑎𝑦𝑏𝑥).
So the first assertion follows from (4.2.13) and (4.2.14). Also, Theorem B.5.16, the
Laplace expansion formula for determinants, implies the second assertion. The sec-
ond assertion and the alternating property of the determinant imply the third assertion.
Finally, the first assertion implies the fourth assertion. □
Proof. Since 𝑎̂ and 𝑏 ̂ are unit vectors and since they are orthogonal to each other, it fol-
lows from the first assertion in Proposition 4.2.7 that 𝑝,̂ 𝑞,̂ and 𝑟 ̂ are unit vectors. Also,
the second and third assertion of Proposition 4.2.7 imply that (𝑎,̂ 𝑏,̂ 𝑟)̂ is an orthonormal
basis of ℝ3 with determinant 1.
Assume that (𝑎,̂ 𝑏,̂ 𝑟′̂ ) is another orthonormal basis of ℝ3 with determinant 1. Then
there are 𝛼, 𝛽, 𝛾 ∈ ℝ such that
(4.2.15) 𝑟′̂ = 𝛼𝑎̂ + 𝛽 𝑏 ̂ + 𝛾𝑟.̂
Figure 4.2.3. The spherical coordinate representation of 𝑝⃗ with respect to (𝑢̂, 𝑤̂) is (‖𝑝⃗‖, 𝜃, 𝜙).
Since 𝑎̂ is a unit vector and since it is orthogonal to 𝑏,̂ 𝑟,̂ and 𝑟′̂ , we have 𝛼 = ⟨𝑎|̂ 𝑟′̂ ⟩ = 0.
In the same way, we see that 𝛽 = 0. Since 𝑟 ̂ and 𝑟′̂ are unit vectors and det(𝑎,̂ 𝑏,̂ 𝑟)̂ =
det(𝑎,̂ 𝑏,̂ 𝑟′̂ ) = 1, it follows that 𝛾 = 1. So, we have 𝑟 ̂ = 𝑟′̂ , as asserted.
The assertions for 𝑝 ̂ and 𝑞 ̂ follow by swapping the columns of (𝑎,̂ 𝑏,̂ 𝑟)̂ and applying
Proposition B.5.11. □
Exercise 4.2.9. Let 𝑎̂ = (1/√2, 1/√2, 0) and 𝑏̂ = (1/√2, −1/√2, 0). Find 𝑐̂ ∈ ℝ3 so that (𝑎̂, 𝑏̂, 𝑐̂) is an orthogonal matrix.
Next, we show how the spherical coordinate representation changes if the azimuth
reference is changed.
4.2. More geometry in ℝ3 149
Proposition 4.2.12. Let 𝑢,̂ 𝑢̂′ , 𝑤̂ ∈ ℝ3 be unit vectors and assume that both 𝑢̂ and 𝑢̂′ are
orthogonal to 𝑤.̂ Then the following hold.
(1) The spherical coordinate representation of 𝑢̂′ with respect to (𝑢,̂ 𝑤)̂ is (1, 𝜋/2, 𝛿) where
cos 𝛿 = ⟨𝑢|̂ 𝑢̂′ ⟩ and sin 𝛿 = ⟨𝑤̂ × 𝑢|̂ 𝑢̂′ ⟩.
(2) Let 𝑝 ⃗ ∈ ℝ3 and let (𝑟, 𝜃, 𝜙) and (𝑟′ , 𝜃′ , 𝜙′ ) be the spherical coordinate representations
of 𝑝 ⃗ with respect to (𝑢,̂ 𝑤)̂ and (𝑢̂′ , 𝑤),
̂ respectively. Then we have 𝑟′ = 𝑟, 𝜃′ = 𝜃, and
(4.2.17) 𝜙′ = 0 if 𝜙 = 0, and 𝜙′ = (𝜙 − 𝛿) mod 2𝜋 otherwise.
Proof. Set 𝑣̂ = 𝑤̂ × 𝑢̂ and 𝑣̂′ = 𝑤̂ × 𝑢̂′. Then it follows from Theorem 4.2.8 that 𝐵 = (𝑢̂, 𝑣̂, 𝑤̂) and 𝐵′ = (𝑢̂′, 𝑣̂′, 𝑤̂) are the uniquely determined orthonormal bases of ℝ3 with determinant 1 and first and last columns 𝑢̂, 𝑤̂ and 𝑢̂′, 𝑤̂, respectively. Let
(1, 𝜀, 𝛿) be the spherical coordinate representation of 𝑢̂′ with respect to (𝑢̂, 𝑤̂). Then by Proposition 3.1.12 we have
(4.2.18) 𝑢̂′ = cos 𝛿 sin 𝜀 𝑢̂ + sin 𝛿 sin 𝜀 𝑣̂ + cos 𝜀 𝑤̂.
Since 𝑤̂ is a unit vector and since it is orthogonal to 𝑢̂′ , 𝑢,̂ and 𝑣,̂ it follows that cos 𝜀 = 0
and thus 𝜀 = 𝜋/2. Since 𝑢̂ and 𝑣 ̂ are unit vectors and since they are orthogonal to each
other, it follows that cos 𝛿 = ⟨𝑢|̂ 𝑢̂′ ⟩ and sin 𝛿 = ⟨𝑣|̂ 𝑢̂′ ⟩.
Now we turn to the second assertion. Set
(4.2.19) 𝑀 = ( cos 𝛿 −sin 𝛿 0 ; sin 𝛿 cos 𝛿 0 ; 0 0 1 ).
Then 𝐵𝑀 is an orthonormal basis of ℝ3 with determinant 1 with first vector 𝑢̂′ and
last vector 𝑤.̂ Since by Theorem 4.2.8 there is only one such basis, it follows that 𝑣′̂ =
− sin 𝛿 𝑢̂ + cos 𝛿 𝑣.̂ Let 𝑝 ⃗ ∈ ℝ3 with spherical coordinate representations (𝑟, 𝜃, 𝜙) and
(𝑟′ , 𝜃′ , 𝜙′ ) with respect to (𝑢,̂ 𝑤)̂ and (𝑢̂′ , 𝑤),
̂ respectively. Then 𝑟 = ‖𝑝‖⃗ = 𝑟′ . As shown
in Exercise 4.2.13 we have
Exercise 4.2.13. Verify (4.2.20) in the proof of Proposition 4.2.12 using the trigono-
metric identities (A.5.3) and (A.5.6).
Example 4.2.14. Let 𝑢̂ = 𝑥̂ = (1, 0, 0), 𝑢̂′ = 𝑦 ̂ = (0, 1, 0), and 𝑤̂ = 𝑧 ̂ = (0, 0, 1).
Then 𝑣 ̂ = 𝑤̂ × 𝑢̂ = (0, 1, 0) = 𝑦,̂ ⟨𝑢|̂ 𝑢̂′ ⟩ = 0, ⟨𝑤̂ × 𝑢|̂ 𝑢̂′ ⟩ = 1. Hence, the spherical
coordinate representation of 𝑢̂′ with respect to (𝑢,̂ 𝑤)̂ is (1, 𝜋/2, 𝜋/2). Also, let 𝑝 ⃗ ∈ ℝ3
with Cartesian coordinates (√2, √2, 0). Then the spherical coordinate representation
of 𝑝⃗ is (2, 𝜋/2, 𝜋/4). So, by Proposition 4.2.12, the spherical coordinate representation of
𝑝 ⃗ with respect to (𝑦,̂ 𝑧)̂ is (2, 𝜋/2, 7𝜋/4).
We note that orthogonal matrices are unitary matrices in ℂ(3,3) with real entries.
Therefore, Proposition 2.4.18 implies the following characterization of orthogonal ma-
trices.
Proposition 4.2.17. Let 𝑂 ∈ ℝ(3,3) . Then the following statements are equivalent.
(1) 𝑂 ∈ 𝖮(3).
(2) The columns of 𝑂 form an orthonormal basis of ℝ3 .
(3) The rows of 𝑂 form an orthonormal basis of ℝ3 .
̂ 𝑤⟩̂ = ⟨𝑣|̂ 𝑤⟩̂ for all 𝑣,̂ 𝑤̂ ∈ ℝ3 .
(4) ⟨𝑂𝑣|𝑂
(5) ‖𝑂𝑣‖̂ = ‖𝑣‖̂ for all 𝑣 ̂ ∈ ℝ3 .
It follows from the equivalence of the first two statements in Proposition 4.2.17 that
there is a one-to-one correspondence between orthogonal matrices and orthonormal
bases of ℝ3 .
Now we introduce the orthogonal and the special orthogonal group.
Theorem 4.2.19. (1) The set 𝖮(3) of all orthogonal matrices is a group with respect to
matrix multiplication. It is called the orthogonal group of rank 3.
(2) The set of all orthogonal matrices with determinant 1 is a subgroup of 𝖮(3). It is
denoted by SO(3) and is called the special orthogonal group of rank 3.
The next theorem introduces rotations in ℝ3 . They are illustrated in Figure 4.2.4.
Theorem 4.2.21. Let 𝑢,̂ 𝑤̂ ∈ ℝ3 be unit vectors and let them be orthogonal to each other,
and let 𝛾 ∈ ℝ. Consider the map ℝ3 → ℝ3 that sends 𝑝⃗ ∈ ℝ3 with spherical coordinates (𝑟, 𝜃, 𝜙) with respect to (𝑢̂, 𝑤̂) to the vector in ℝ3 with the following spherical coordinate representation with respect to (𝑢̂, 𝑤̂):
(4.2.21) (𝑟, 𝜃, 𝜙) if 𝜃 ∈ {0, 𝜋}, and (𝑟, 𝜃, (𝜙 + 𝛾) mod 2𝜋) otherwise.
Then this map depends only on 𝑤̂ and 𝛾 and is independent of 𝑢.̂ It is denoted by Rot𝑤̂ (𝛾)
and is called the rotation about 𝑤̂ through the angle 𝛾. Also, 𝑤̂ and 𝛾 are called the axis
and the angle of this rotation, respectively.
[Figure 4.2.4: the vector 𝑝⃗ and its image Rot𝑤̂(𝛾)𝑝⃗ under the rotation about the axis 𝑤̂]
Proof. We show that the map defined in the theorem is independent of 𝑢̂. Let 𝑢̂′ be
another unit vector in ℝ3 that is orthogonal to 𝑤.̂ Denote the map in the theorem by
Rot. We show that Rot is the same map regardless of whether we use 𝑢̂ or 𝑢̂′ for its
definition. By Proposition 4.2.12, the spherical coordinate representation of 𝑢̂′ with
respect to (𝑢,̂ 𝑤)̂ is (1, 𝜋/2, 𝛿) with 𝛿 ∈ [0, 2𝜋[. Let 𝑝 ⃗ ∈ ℝ3 with spherical coordinate
representation (𝑟, 𝜃, 𝜙) with respect to (𝑢,̂ 𝑤).̂ If 𝜃 ∈ {0, 𝜋}, then by Proposition 4.2.12
the spherical coordinate representation of 𝑝 ⃗ with respect to (𝑢̂′ , 𝑤)̂ is also (𝑟, 𝜃, 𝜙). So
Rot(𝑝)⃗ = 𝑝,⃗ regardless of whether we choose 𝑢̂ or 𝑢̂′ for its definition. If 𝜃 ≠ 0, 𝜋, then
by Proposition 4.2.12 the spherical coordinate representation of 𝑝 ⃗ with respect to (𝑢̂′ , 𝑤)̂
is (𝑟, 𝜃, (𝜙−𝛿) mod 2𝜋). So if we use 𝑢̂′ to define Rot, we obtain (𝑟, 𝜃, (𝜙−𝛿+𝛾) mod 2𝜋)
as the spherical coordinate representation of Rot(𝑝)⃗ with respect to (𝑢̂′ , 𝑤).
̂ Proposition
4.2.12 shows that the spherical coordinate representation of this vector with respect to
(𝑢,̂ 𝑤)̂ is (𝑟, 𝜃, (𝜙 + 𝛾) mod 2𝜋). But this is the spherical coordinate representation of
Rot(𝑝)⃗ if we use 𝑢̂ to define Rot. □
Figure 4.2.4 shows that applying Rot𝑤̂ (𝛾) to 𝑝 ⃗ ∈ ℝ3 rotates this vector about the
axis 𝑤̂ counterclockwise through an angle 𝛾.
In the remainder of this section, we will prove the following theorem.
Theorem 4.2.22. The set of rotations in ℝ3 is SO(3).
We first determine the rotations about the axes 𝑥̂ = (1, 0, 0), 𝑦 ̂ = (0, 1, 0), and
𝑧 ̂ = (0, 0, 1) explicitly.
Proposition 4.2.23. Let 𝛾 ∈ ℝ. Then we have
(4.2.22) Rot𝑥̂(𝛾) = ( 1 0 0 ; 0 cos 𝛾 −sin 𝛾 ; 0 sin 𝛾 cos 𝛾 ),
(4.2.23) Rot𝑦̂(𝛾) = ( cos 𝛾 0 −sin 𝛾 ; 0 1 0 ; sin 𝛾 0 cos 𝛾 ),
(4.2.24) Rot𝑧̂(𝛾) = ( cos 𝛾 −sin 𝛾 0 ; sin 𝛾 cos 𝛾 0 ; 0 0 1 ).
Exercise 4.2.24. Prove Proposition 4.2.23.
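These matrices are easy to probe numerically. The sketch below (an addition, Python with NumPy, with the matrices exactly as printed in Proposition 4.2.23) confirms that they lie in SO(3) and behave as rotations should:

```python
import numpy as np

def rot_x(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(g):  # the y-rotation convention of (4.2.23)
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, 0, -s], [0, 1, 0], [s, 0, c]])

def rot_z(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

g = 0.4
for R in (rot_x(g), rot_y(g), rot_z(g)):
    # Each matrix is orthogonal with determinant 1, i.e. it lies in SO(3).
    assert np.allclose(R.T @ R, np.eye(3))
    assert np.isclose(np.linalg.det(R), 1.0)

# Rot_z(pi/2) maps x_hat to y_hat: a counterclockwise quarter turn about z.
assert np.allclose(rot_z(np.pi / 2) @ np.array([1.0, 0, 0]), np.array([0, 1.0, 0]))
# Rotations about the same axis compose additively.
assert np.allclose(rot_z(0.3) @ rot_z(0.5), rot_z(0.8))
```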
Note that by Proposition 4.2.23, the rotations about the 𝑥-, 𝑦-, and 𝑧-axes are in
SO(3). The next proposition provides explicit formulas for all rotations in ℝ3 and shows
that all rotations are in SO(3).
Proposition 4.2.25. Let 𝐵 = (𝑢,̂ 𝑣,̂ 𝑤)̂ ∈ SO(3) and let 𝛾 ∈ ℝ. Then we have
(4.2.25) Rotᵆ̂ (𝛾) = 𝐵 Rot𝑥̂ (𝛾)𝐵−1 ,
(4.2.26) Rot𝑣̂ (𝛾) = 𝐵 Rot𝑦̂(𝛾)𝐵−1 ,
(4.2.27) Rot𝑤̂ (𝛾) = 𝐵 Rot𝑧̂ (𝛾)𝐵−1
and these rotation operators are in SO(3).
Proof. We first prove (4.2.27). Let 𝑝 ⃗ ∈ ℝ3 with spherical coordinates (𝑟, 𝜃, 𝜙) with
respect to (𝑢,̂ 𝑤).
̂ Then we have
(4.2.28) 𝑝 ⃗ = 𝑟𝐵(cos 𝜙 sin 𝜃, sin 𝜙 sin 𝜃, cos 𝜃).
Applying (4.2.24) and the trigonometric identities (A.5.2) and (A.5.5) we obtain
(4.2.29) 𝐵 Rot𝑧̂(𝛾)𝐵−1 𝑝⃗ = 𝑟𝐵 Rot𝑧̂(𝛾)(cos 𝜙 sin 𝜃, sin 𝜙 sin 𝜃, cos 𝜃) = 𝑟𝐵(cos(𝜙 + 𝛾) sin 𝜃, sin(𝜙 + 𝛾) sin 𝜃, cos 𝜃).
On the other hand, by (3.1.7) we have
(4.2.30) Rot𝑤̂ (𝛾)𝑝 ⃗ = 𝑟𝐵(cos(𝜙 + 𝛾) sin 𝜃, sin(𝜙 + 𝛾) sin 𝜃, cos 𝜃).
So (4.2.30) and (4.2.29) imply (4.2.27). Next, we prove (4.2.25). With the permutation
matrix
(4.2.31) 𝑃 = ( 0 0 1 ; 0 1 0 ; 1 0 0 )
we have
(4.2.32) 𝐵𝑃 = (𝑤,̂ 𝑣,̂ 𝑢)̂
and
(4.2.33) 𝑃 Rot𝑧̂ (𝛾)𝑃 = Rot𝑥̂ (𝛾).
So it follows from (4.2.27), (4.2.32), and (4.2.33) that
Rotᵆ̂ (𝛾) = 𝐵𝑃 Rot𝑧̂ (𝛾)(𝐵𝑃)−1
(4.2.34)
= 𝐵𝑃 Rot𝑧̂ (𝛾)𝑃𝐵 −1 = 𝐵 Rot𝑥̂ (𝛾)𝐵−1 .
The identity (4.2.26) can be proved analogously. □
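The conjugation formula (4.2.27) can be checked numerically: for any 𝐵 = (𝑢̂, 𝑣̂, 𝑤̂) ∈ SO(3), the operator 𝐵 Rot𝑧̂(𝛾)𝐵⁻¹ lies in SO(3), fixes the axis 𝑤̂, and has rotation angle 𝛾. A sketch (an addition, Python with NumPy, with an arbitrarily chosen axis 𝑤̂):

```python
import numpy as np

def rot_z(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# A basis B = (u, v, w) in SO(3) whose last column w is the rotation axis.
u = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)
w = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)   # unit vector orthogonal to u
v = np.cross(w, u)
B = np.column_stack([u, v, w])
assert np.isclose(np.linalg.det(B), 1.0)

g = 0.9
R = B @ rot_z(g) @ B.T   # B^{-1} = B^T since B is orthogonal

# R lies in SO(3) and fixes the axis w, as Rot_w(g) must.
assert np.allclose(R.T @ R, np.eye(3)) and np.isclose(np.linalg.det(R), 1.0)
assert np.allclose(R @ w, w)
# Its rotation angle is g: the trace of a rotation through g is 1 + 2 cos(g).
assert np.isclose(np.trace(R), 1 + 2 * np.cos(g))
```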
We now prove that every operator in SO(3) is a rotation and we explain how rota-
tions can be represented.
have 𝑂𝑤̂ ′ = Rot𝑤̂ ′ (𝛾′ )𝑤̂ ′ = 𝑤̂ ′ . So 𝑤̂ ′ is a unit eigenvector of 𝑂 associated with the
eigenvalue 1. As seen above, this implies 𝑤̂ ′ = 𝑤̂ or 𝑤̂ ′ = −𝑤.̂ The uniqueness modulo
2𝜋 is proved in Exercise 4.2.28. □
Now we can prove Theorem 4.2.22. It follows from Proposition 4.2.25 that all rotations of ℝ3 are in SO(3). Proposition 4.2.27 shows that all elements of SO(3) are rotations of ℝ3. Therefore, the set of all rotations of ℝ3 is indeed SO(3).
Lemma 4.2.29. Let 𝑢,̂ 𝑢̂′ , 𝑤̂ ∈ ℝ3 be unit vectors such that 𝑢̂ and 𝑢̂′ are orthogonal to 𝑤.̂
Then there is a modulo 2𝜋 uniquely determined 𝛿 ∈ ℝ such that Rot𝑤̂ (𝛿)𝑢̂ = 𝑢̂′ .
Proof. It follows from Proposition 4.2.12 that the spherical coordinate representation
of 𝑢̂′ with respect to (𝑢,̂ 𝑤)̂ is (1, 𝜋/2, 𝛿) with a modulo 2𝜋 uniquely determined 𝛿 ∈ ℝ.
Therefore, the definition of rotations in Theorem 4.2.21 implies the assertion. □
Proof. Let 𝑂 ∈ SO(3). Denote by 𝑥,̂ 𝑦′̂ , 𝑧′̂ the column vectors of 𝑂. If 𝑧′̂ = 𝑧 ̂ or 𝑧′̂ = −𝑧,̂
then, as shown in the proof of Proposition 4.2.27, we have 𝑂 = Rot𝑧̂ (𝛾) with 𝛾 ∈ ℝ. So,
if we set 𝛼 = 𝛽 = 0, then (4.2.40) holds.
Assume that 𝑧′̂ ≠ ±𝑧.̂ The proof for this case is illustrated in Figure 4.2.5. The
intersection of the plane spanned by 𝑥̂ and 𝑦 ̂ and the plane spanned by 𝑥̂ and 𝑦′̂ is a
line, that is, a one-dimensional subspace of ℝ3 . Denote by 𝑣 ̂ a unit vector that generates
this space. Both 𝑦 ̂ and 𝑣 ̂ are orthogonal to 𝑧.̂ By Lemma 4.2.29 we can choose 𝛼 ∈ ℝ
such that Rot𝑧̂ (𝛼)𝑦 ̂ = 𝑣.̂ This rotation does not change 𝑧 ̂ and maps 𝑥̂ to some unit
vector 𝑥1̂ ∈ ℝ3 . If we apply this rotation to the standard basis 𝐼3 of ℝ3 , we obtain the
orthonormal basis
(4.2.41) 𝐵1 = Rot𝑧̂ (𝛼)𝐼3 = (𝑥1̂ , 𝑣,̂ 𝑧)̂ ∈ SO(3).
Next, we observe that 𝑧 ̂ and 𝑧′̂ are orthogonal to 𝑣.̂ By Lemma 4.2.29 we can choose
𝛽 ∈ ℝ so that Rot𝑣̂ (𝛽)𝑧 ̂ = 𝑧′̂ . This rotation does not change 𝑣 ̂ and maps 𝑥1̂ to some
unit vector 𝑥2̂ ∈ ℝ3 . By Proposition 4.2.25 and (4.2.41) we can write this rotation as
(4.2.42) Rot𝑣̂ (𝛽) = 𝐵1 Rot𝑦̂(𝛽)𝐵1−1 = Rot𝑧̂ (𝛼) Rot𝑦̂(𝛽)𝐵1−1 .
Applying this rotation to 𝐵1 we obtain the basis
(4.2.43) 𝐵2 = Rot𝑣̂ (𝛽)𝐵1 = Rot𝑧̂ (𝛼) Rot𝑦̂(𝛽)𝐼3 = (𝑥2̂ , 𝑣,̂ 𝑧′̂ ) ∈ SO(3).
[Figure 4.2.5: the vectors 𝑥̂, 𝑦̂, 𝑧̂, 𝑣̂, 𝑥̂1, 𝑥̂2, 𝑦̂′, 𝑧̂′ and the angles 𝛼, 𝛽, 𝛾 in the proof]
Finally, we note that 𝑣 ̂ and 𝑦′̂ are both orthogonal to 𝑧′̂ . By Lemma 4.2.29 we can choose
𝛾 ∈ ℝ3 such that Rot𝑧′̂ (𝛾)𝑣 ̂ = 𝑦′̂ . This rotation does not change 𝑧′̂ and maps 𝑥2̂ to some
unit vector 𝑥3̂ . By Proposition 4.2.25 and (4.2.43) this rotation is
(4.2.44) Rot𝑧′̂ (𝛾) = 𝐵2 Rot𝑧̂ (𝛾)𝐵2−1 = Rot𝑧̂ (𝛼) Rot𝑦̂(𝛽) Rot𝑧̂ (𝛾)𝐵2−1 .
Applying this rotation to 𝐵2 we obtain the basis
(4.2.45) 𝐵3 = Rot𝑧̂ (𝛼) Rot𝑦̂(𝛽) Rot𝑧̂ (𝛾) = (𝑥3̂ , 𝑦′̂ , 𝑧′̂ ) ∈ SO(3).
Since 𝑂 = (𝑥,̂ 𝑦′̂ , 𝑧′̂ ) ∈ SO(3), it follows from Proposition 4.2.7 that 𝑥3̂ = 𝑥̂ and 𝑂 = 𝐵3 .
This concludes the proof. □
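The decomposition 𝑂 = Rot𝑧̂(𝛼) Rot𝑦̂(𝛽) Rot𝑧̂(𝛾) obtained in this proof can be tested numerically. The sketch below (an addition, Python with NumPy, using the 𝑦̂-rotation matrix exactly as printed in (4.2.23)) builds 𝑂 from chosen angles and recovers them from the entries of 𝑂:

```python
import numpy as np

def rot_y(g):  # the y-rotation convention of (4.2.23)
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, 0, -s], [0, 1, 0], [s, 0, c]])

def rot_z(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

alpha, beta, gamma = 0.3, 0.7, 1.1
O = rot_z(alpha) @ rot_y(beta) @ rot_z(gamma)

# Recover the angles from O (valid for 0 < beta < pi with this convention):
# O[2,2] = cos(beta), O[:,2] = (-cos(alpha)sin(beta), -sin(alpha)sin(beta), cos(beta)),
# O[2,:] = (sin(beta)cos(gamma), -sin(beta)sin(gamma), cos(beta)).
beta2 = np.arccos(O[2, 2])
alpha2 = np.arctan2(-O[1, 2], -O[0, 2])
gamma2 = np.arctan2(-O[2, 1], O[2, 0])

assert np.allclose([alpha2, beta2, gamma2], [alpha, beta, gamma])
assert np.allclose(rot_z(alpha2) @ rot_y(beta2) @ rot_z(gamma2), O)
```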
If in Exercise 4.2.31 the rotation axes 𝑣 ̂ and 𝑤̂ are not orthogonal, then, in general,
a decomposition (4.2.46) does not exist. However, we can prove a weaker result, for
which we need the following lemma.
Lemma 4.2.32. Let 𝑤,̂ 𝑤̂ ′ ∈ ℝ3 be unit vectors, and let 𝛾 ∈ ℝ. Also, let 𝑂 ∈ SO(3) with
𝑂𝑤̂ = 𝑤̂ ′ . Then
(4.2.47) Rot𝑤̂ ′ (𝛾) = 𝑂 Rot𝑤̂ (𝛾)𝑂−1 .
Proof. Let 𝐵 = (𝑢,̂ 𝑣,̂ 𝑤)̂ ∈ SO(3). This matrix exists by Theorem 4.2.8. Set 𝐵′ = 𝑂𝐵 =
(𝑢̂′ , 𝑣′̂ , 𝑤̂ ′ ). Then it follows from Proposition 4.2.25 that Rot𝑤̂ ′ (𝛾) = 𝐵 ′ Rot𝑧̂ (𝛾)(𝐵 ′ )−1 =
𝑂𝐵 Rot𝑧̂ (𝛾)𝐵−1 𝑂−1 = 𝑂 Rot𝑤̂ (𝛾)𝑂−1 . □
which is a decomposition as in (4.2.48). We will also show that 𝑙 = O(1/𝜑). Then the
theorem is proved. In order to keep the proof as simple as possible, we will give geo-
metric arguments. They can be verified algebraically using the terminology introduced
so far.
First, we observe that a rotation about 𝑎̂ brings 𝑤̂ into the plane 𝑃 spanned by 𝑎̂
and 𝑏.̂ So, we assume that 𝑤̂ is in this plane. Next, we may assume that the initial
positions of the three vectors 𝑎,̂ 𝑏,̂ and 𝑤̂ are as shown in Figure 4.2.6. Therefore, they
are in the half-plane above the dashed line orthogonal to 𝑎̂ and also in the half-plane
to the right of 𝑎.̂ This can be achieved as follows. Vectors that are below the dashed
line are multiplied by −1. This is justified by Exercise 4.2.26. If 𝑏 ̂ is on the wrong side
of 𝑎,̂ then we exchange 𝑎̂ and 𝑏.̂ Also, if 𝑤̂ is on the wrong side of 𝑎,̂ then we apply a
rotation about 𝑎̂ through an angle of 𝜋 to 𝑤.̂
Figure 4.2.6. The two cases in the proof of Theorem 4.2.33 for possible initial positions
of the vectors 𝑎,̂ 𝑏,̂ and 𝑤.̂
In the situation shown in Figure 4.2.6 there are two possible cases. In Case 1, the
vector 𝑤̂ is on the left of 𝑏,̂ and in Case 2, it is on the right of 𝑏.̂ We prove the assertion
in both cases.
The proof in the first case is illustrated in Figure 4.2.7. In this case, 𝑤̂ is in the
half-plane to the left of 𝑏 ̂ and the angle 𝜃 between 𝑎̂ and 𝑤̂ is at most as big as the angle
𝜑 between 𝑎̂ and 𝑏.̂ If 𝜃 = 𝜑, then we are done. Therefore, we assume that 𝜃 < 𝜑.
Suppose that we rotate 𝑤̂ about 𝑏̂ through an angle of 𝛽 ∈ [0, 𝜋]. Denote the rotated vector by 𝑤̂(𝛽) and the angle between 𝑤̂(𝛽) and 𝑎̂ by 𝜃(𝛽). Then 𝜃(0) = 𝜃 < 𝜑 and 𝜃(𝜋) > 𝜑. Since the function [0, 𝜋] → ℝ, 𝛽 ↦ 𝜃(𝛽) is continuous, it follows from the intermediate value theorem that there is 𝛽 ∈ [0, 𝜋] such that 𝜃(𝛽) = 𝜑. We apply the rotation Rot𝑏̂(𝛽) to 𝑤̂ and obtain 𝑤̂′ such that the angle between 𝑎̂ and 𝑤̂′ is equal to the angle between 𝑎̂ and 𝑏̂. A rotation of 𝑤̂′ about 𝑎̂ through some angle 𝛼 ∈ ℝ sends 𝑤̂′ to 𝑏̂.
[Figure 4.2.7: the rotation of 𝑤̂ about 𝑏̂, with the vectors 𝑎̂, 𝑏̂, 𝑤̂(𝜋) and the angles 𝜃, 𝜃(𝜋), 𝜑]
Now we turn to the second case where 𝑤̂ is in the half-plane on the right side of 𝑏.̂
We show how to use rotations of 𝑤̂ about 𝑎̂ and 𝑏 ̂ to obtain Case 1. This construction is
illustrated in Figure 4.2.8. We set 𝑤̂ 0 = 𝑤̂ and construct a finite sequence 𝑤̂ 1 , . . . , 𝑤̂ 𝑚 ,
𝑚 ∈ ℕ, such that 𝑤̂ 𝑖 is obtained from 𝑤̂ 𝑖−1 by rotations about 𝑎̂ and 𝑏 ̂ and 𝑤̂ 𝑚 is for
the first time between 𝑎̂ and 𝑏.̂ For 𝑖 ∈ {0, . . . , 𝑚} we denote by 𝛼𝑖 the angle between 𝑎̂
and 𝑤̂ 𝑖 and by 𝛽 𝑖 the angle between 𝑤̂ 𝑖 and 𝑏.̂ Furthermore, we denote by 𝜑 the angle
between 𝑎̂ and 𝑏̂. To construct 𝑤̂1 from 𝑤̂0, we rotate 𝑤̂0 about 𝑏̂ through an angle 𝜋. If 𝛽0 ≤ 𝜑, then 𝑤̂1 is between 𝑎̂ and 𝑏̂ and we are in Case 1. In Figure 4.2.8 this
is not the case. If 𝛽0 > 𝜑, then we have 𝛼1 = 𝛽1 − 𝜑 = 𝛽0 − 𝜑. Since 𝛽0 < 𝛼0 , it
follows that 𝛼1 < 𝛼0 − 𝜑. Next, we construct 𝑤̂ 2 by a rotation of 𝑤̂ 1 about 𝑎̂ through
[Figure 4.2.8: the construction of 𝑤̂1, 𝑤̂2, 𝑤̂3 from 𝑤̂0 = 𝑤̂ with the angles 𝛼𝑖, 𝛽𝑖, and 𝜑]
Proof. We use Theorem 4.1.2 and obtain the following. Since the Pauli operators are
Hermitian operators with trace 0, the operator 𝑝 ̂ ⋅ 𝜎 is also Hermitian and has trace 0.
Let 𝑝 ̂ = (𝑝𝑥 , 𝑝𝑦 , 𝑝𝑧 ). Due to ‖𝑝‖̂ = 1 we have
(𝑝̂ ⋅ 𝜎)² = (𝑝𝑥𝑋 + 𝑝𝑦𝑌 + 𝑝𝑧𝑍)²
= 𝑝𝑥²𝑋² + 𝑝𝑦²𝑌² + 𝑝𝑧²𝑍² + 𝑝𝑥𝑝𝑦(𝑋𝑌 + 𝑌𝑋) + 𝑝𝑥𝑝𝑧(𝑋𝑍 + 𝑍𝑋) + 𝑝𝑦𝑝𝑧(𝑌𝑍 + 𝑍𝑌)
= (𝑝𝑥² + 𝑝𝑦² + 𝑝𝑧²)𝐼 = 𝐼.
So 𝑝 ̂ ⋅ 𝜎 is an involution. But Hermitian involutions are unitary. Also, since 𝑝 ̂ ⋅ 𝜎 is a
Hermitian involution of trace 0, this operator is diagonalizable by Theorem 2.4.53 and
its eigenvalues are in {±1} by Proposition 2.4.60. But since (𝐼, 𝑋, 𝑌 , 𝑍) is a ℂ-basis of
End(ℍ1 ) by Proposition 4.1.4, it follows that 𝑝 ̂ ⋅ 𝜎 ≠ 𝐼. So, the set of eigenvalues of 𝑝 ̂ ⋅ 𝜎
is {±1}. □
From Proposition 4.3.2, Corollary 2.4.73, and Proposition 2.4.74 we obtain the fol-
lowing result.
4.3. Rotation operators 159
The name “rotation operator” comes from the fact that applying the operator from
(4.3.4) to a quantum state in ℍ1 means applying the rotation Rot𝑤̂ (𝛾) to the correspond-
ing point on the Bloch sphere. This will be shown in Theorem 4.3.20. The next exercise
verifies this for the special case where 𝑤̂ = 𝑧 ̂ = (0, 0, 1).
Exercise 4.3.5. Show that for every 𝛾 ∈ ℝ and every quantum state |𝜓⟩ ∈ ℍ1 we have
(4.3.5) 𝑝̂(𝑅𝑧̂(𝛾) |𝜓⟩) = Rot𝑧̂(𝛾) 𝑝̂(|𝜓⟩).
Here is another representation of the rotation operators that we have just intro-
duced.
Proposition 4.3.8. Let 𝛾 ∈ ℝ. Then we have
(4.3.8) 𝑅𝑥̂(𝛾) = ( cos(𝛾/2) −𝑖 sin(𝛾/2) ; −𝑖 sin(𝛾/2) cos(𝛾/2) ),
(4.3.9) 𝑅𝑦̂(𝛾) = ( cos(𝛾/2) −sin(𝛾/2) ; sin(𝛾/2) cos(𝛾/2) ),
(4.3.10) 𝑅𝑧̂(𝛾) = ( 𝑒^{−𝑖𝛾/2} 0 ; 0 𝑒^{𝑖𝛾/2} ).
Exercise 4.3.9. Prove Proposition 4.3.8.
The next exercise shows that 𝑖𝑋, 𝑖𝑌, 𝑖𝑍, and 𝑋𝐻 are rotation operators.
Exercise 4.3.10. Show that 𝑅𝑥̂(−𝜋) = 𝑖𝑋, 𝑅𝑦̂(−𝜋) = 𝑖𝑌, 𝑅𝑧̂(−𝜋) = 𝑖𝑍, and 𝐻 = 𝑋𝑅𝑦̂(𝜋/2).
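The closed forms of Proposition 4.3.8 can be checked numerically. The sketch below (an addition, Python with NumPy) builds the rotation operators from the expansion 𝑅𝑤̂(𝛾) = cos(𝛾/2)𝐼 − 𝑖 sin(𝛾/2)(𝑤̂ ⋅ 𝜎) and verifies some of these identities:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def R(P, g):
    # R_w(g) = exp(-i g (w . sigma)/2) = cos(g/2) I - i sin(g/2) (w . sigma),
    # applied here with w . sigma equal to a single Pauli operator P.
    return np.cos(g / 2) * I2 - 1j * np.sin(g / 2) * P

g = 0.8
# (4.3.10): R_z(g) = diag(e^{-ig/2}, e^{ig/2}).
assert np.allclose(R(Z, g), np.diag([np.exp(-1j * g / 2), np.exp(1j * g / 2)]))
# H = X R_y(pi/2), so XH = R_y(pi/2) is a rotation operator.
assert np.allclose(H, X @ R(Y, np.pi / 2))
# R_x(pi) = -iX, and hence iX = R_x(-pi) is a rotation operator as well.
assert np.allclose(R(X, np.pi), -1j * X)
assert np.allclose(R(X, -np.pi), 1j * X)
```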
4.3.2. The group of rotation operators. Our next goal is to show that the set
of rotation operators on ℍ1 is SU(2). For this, we use the following characterization of
SU(2) which follows from Corollary 2.4.73:
(4.3.11) SU(2) = {𝑒𝑖𝐴 ∶ 𝐴 Hermitian and tr 𝐴 = 0}.
Definition 4.3.11. The set of all Hermitian operators on ℍ1 with trace 0 is denoted by
su(2).
We note that we slightly deviate from the standard notation in mathematics where
su(2) typically denotes the Lie algebra that consists of all 2×2 skew-Hermitian matrices
with trace 0.
The elements of su(2) can be characterized as follows.
Lemma 4.3.12. Let 𝐴 ∈ ℂ(2,2) . Then 𝐴 ∈ su(2) if and only if there are 𝑎 ∈ ℝ and 𝑏 ∈ ℂ
such that
(4.3.12) $A = \begin{pmatrix} a & b \\ \overline{b} & -a \end{pmatrix}.$
Exercise 4.3.13. Prove Lemma 4.3.12.
The next proposition uses Lemma 4.3.12 to describe the structure of su(2).
Proposition 4.3.14. The set su(2) is a real three-dimensional vector space. The triplet
𝜎 = (𝑋, 𝑌 , 𝑍) of the three Pauli operators is an ℝ-basis of su(2) that is orthogonal with
respect to the Hilbert-Schmidt inner product.
Proof. Let 𝑤̂ ∈ ℝ3 be a unit vector. It can be easily verified that 𝑅𝑤̂ (𝛾) = 𝐼 if 𝛾/2 ≡
0 mod 2𝜋. Also, by Proposition 4.1.4 the coefficients of a representation of 𝐼 as a linear
combination of 𝐼, 𝑋, 𝑌 , 𝑍 are uniquely determined. So if 𝐼 = 𝑅𝑤̂ (𝛾) with 𝛾 ∈ ℝ, then
it follows from (4.3.3) that we have cos 𝛾/2 = 1 and sin 𝛾/2 = 0. This implies 𝛾/2 ≡
0 mod 2𝜋. The second assertion can be proved analogously.
We prove the third statement. It follows from (4.3.11) that 𝑈 = 𝑒𝑖𝐴 with 𝐴 ∈ su(2).
By Proposition 4.3.14 there is a uniquely determined 𝑝⃗ ∈ ℝ³ such that 𝐴 = 𝑝⃗ ⋅ 𝜎. Since
𝑈 ≠ ±𝐼 it follows that 𝑝⃗ is nonzero. Set 𝛾 = 2‖𝑝⃗‖ mod 4𝜋 and 𝑤̂ = −𝑝⃗/‖𝑝⃗‖. Then 𝑤̂
is a unit vector, and we have $U = e^{-i\gamma\,\hat w\cdot\sigma/2}$.
Next, let 𝑤̂ ′ ∈ ℝ3 be a unit vector and let 𝛾′ ∈ ℝ be such that 𝑈 = 𝑅𝑤̂ ′ (𝛾′ ).
Then cos 𝛾/2 ≠ ±1 and sin 𝛾/2 ≠ 0. The uniqueness of the coefficient of 𝐼 in (4.3.3)
implies 𝛾/2 ≡ ±𝛾′ /2 mod 2𝜋. If 𝛾/2 ≡ 𝛾′ /2 mod 2𝜋, then sin 𝛾/2 = sin 𝛾′ /2 and due
to the uniqueness of the coefficients of 𝑋, 𝑌 , and 𝑍 in (4.3.3) we have 𝑤̂ ′ = 𝑤.̂ If
𝛾/2 ≡ −𝛾′/2 mod 2𝜋, then sin 𝛾/2 = − sin 𝛾′/2 and because of the uniqueness of the
coefficients of 𝑋, 𝑌, and 𝑍 in (4.3.3) we have 𝑤̂′ = −𝑤̂. □
Theorem 4.3.15 implies that, up to a global phase factor, every unitary operator on
ℍ1 is a rotation operator. This is what the next corollary says.
Corollary 4.3.16. Let 𝑈 ∈ U(2). Then there is 𝛿 ∈ ℝ such that 𝑒−𝑖𝛿 𝑈 is a rotation
operator on ℍ1 .
Proof. Since | det 𝑈| = 1 we can choose 𝛿 ∈ ℝ such that det 𝑈 = 𝑒𝑖2𝛿 . So det(𝑒−𝑖𝛿 𝑈) =
1 which implies that 𝑒−𝑖𝛿 𝑈 ∈ SU(2). So by Theorem 4.3.15, 𝑒−𝑖𝛿 𝑈 is a rotation operator.
□
Theorem 4.3.15 also implies the next corollary which, in turn, allows us to assign a
uniquely determined rotation of ℝ3 to each rotation operator on ℍ1 in the most obvious
way.
Corollary 4.3.17. Let 𝑈 ∈ SU(2). Then for all unit vectors 𝑤̂ ∈ ℝ³ and 𝛾 ∈ ℝ such that
𝑈 = 𝑅𝑤̂ (𝛾) the rotation Rot𝑤̂ (𝛾) is the same.
Exercise 4.3.18. Use Theorem 4.3.15 and Proposition 4.2.27 to prove Corollary 4.3.17.
Corollary 4.3.17 justifies the following definition.
Definition 4.3.19. Let 𝑈 ∈ SU(2) and let 𝑈 = 𝑅𝑤̂ (𝛾) with a unit vector 𝑤̂ ∈ ℝ3 and
𝛾 ∈ ℝ. Then we set Rot(𝑈) = Rot𝑤̂ (𝛾).
4.3.3. Rotation operators and rotations on the Bloch sphere. After the
preparations of the preceding section, we are now ready to prove the following im-
portant theorem.
Theorem 4.3.20. The map
(4.3.14) Rot ∶ SU(2) → SO(3), 𝑈 ↦ Rot(𝑈)
is a surjective group homomorphism with kernel {±𝐼}. Furthermore, for all 𝑈 ∈ SU(2) and
all quantum states |𝜓⟩ in ℍ1 the point on the Bloch sphere corresponding to 𝑈 |𝜓⟩ is
(4.3.15) $\vec p\,(U\,|\psi\rangle) = \mathrm{Rot}(U)\,\vec p(\psi).$
Proof. First, (4.3.20) follows from Proposition 4.3.2. Let 𝐵 = (𝑢,̂ 𝑣,̂ 𝑤)̂ with column
vectors 𝑢,̂ 𝑣,̂ 𝑤̂ ∈ ℝ3 . From Lemma 4.3.24 and Theorem 4.2.8 we obtain
(4.3.23) 𝜏ᵆ 𝜏𝑣 = (𝑢̂ ⋅ 𝜎)(𝑣 ̂ ⋅ 𝜎) = ⟨𝑢|̂ 𝑣⟩̂ 𝐼 + 𝑖(𝑢̂ × 𝑣)̂ ⋅ 𝜎 = 𝑖 𝑤̂ ⋅ 𝜎 = 𝑖𝜏𝑤 .
The other identities in (4.3.21) can be proved analogously. Finally, from (4.3.20) and
(4.3.21) we obtain
(4.3.24) −𝑖𝜏ᵆ 𝜏𝑣 𝜏𝑤 = (−𝑖)𝑖𝜏𝑤 𝜏𝑤 = 𝐼. □
Proof. Let 𝑈 = 𝑅𝑤̂ (𝛾) with a unit vector 𝑤̂ ∈ ℝ3 and 𝛾 ∈ ℝ which exist by Theorem
4.3.15. Let 𝐵 = (𝑢,̂ 𝑣,̂ 𝑤)̂ ∈ SO(3) which exists by Theorem 4.2.8. Set
(4.3.26) 𝜏 = (𝜏ᵆ , 𝜏𝑣 , 𝜏𝑤 ) = 𝐵 ⋅ 𝜎.
Also let $\vec q = B^{-1}\vec p$. Then Proposition 4.2.25 and Lemma 4.3.22 imply
(Rot(𝑈)𝑝)⃗ ⋅ 𝜎 = (Rot𝑤̂ (𝛾)𝑝)⃗ ⋅ 𝜎 = (𝐵 Rot𝑧̂ (𝛾)𝐵−1 𝑝)⃗ ⋅ 𝜎
(4.3.27)
= (𝐵 Rot𝑧̂ (𝛾)𝑞)⃗ ⋅ 𝜎 = (Rot𝑧̂ (𝛾)𝑞)⃗ ⋅ 𝜏
and
(4.3.28) 𝑈(𝑝 ⃗ ⋅ 𝜎)𝑈 −1 = 𝑈(𝐵 𝑞 ⃗ ⋅ 𝜎)𝑈 −1 = 𝑈(𝑞 ⃗ ⋅ 𝜏)𝑈 −1 .
So it suffices to show that
(4.3.29) (Rot𝑧̂ (𝛾)𝑞)⃗ ⋅ 𝜏 = 𝑈(𝑞 ⃗ ⋅ 𝜏)𝑈 −1 .
Since the expressions on the left and right side of (4.3.29) are linear in 𝑞⃗, it suffices to
prove this identity for 𝑞⃗ ∈ {𝑥̂, 𝑦̂, 𝑧̂}. This is done in Exercise 4.3.27. □
Exercise 4.3.27. Verify (4.3.29) in the proof of Lemma 4.3.26 for 𝑞 ⃗ = 𝑥̂ = (1, 0, 0),
𝑞 ⃗ = 𝑦 ̂ = (0, 1, 0), and 𝑞 ⃗ = 𝑧 ̂ = (0, 0, 1) using Proposition 4.3.25 and the trigonometric
identities in Section A.5.
The multiplicativity of the map Rot can now be seen as follows. Lemma 4.3.26 implies
that for all 𝑈1 , 𝑈2 ∈ SU(2) and all 𝑝 ⃗ ∈ ℝ3 we have
(Rot(𝑈1 𝑈2 )𝑝)⃗ ⋅ 𝜎 = 𝑈1 𝑈2 (𝑝 ⃗ ⋅ 𝜎)𝑈2−1 𝑈1−1
(4.3.30)
= 𝑈1 ((Rot(𝑈2 )𝑝)⃗ ⋅ 𝜎) 𝑈1−1 = (Rot(𝑈1 ) Rot(𝑈2 )𝑝)⃗ ⋅ 𝜎.
We determine the kernel of Rot. Let 𝑈 ∈ SU(2). Write 𝑈 = 𝑅𝑤̂ (𝛾) as in The-
orem 4.3.15. Then it follows from the definition of rotations in Theorem 4.2.21 that
Rot𝑤̂ (𝛾) = 𝐼3 if and only if 𝛾 ≡ 0 mod 2𝜋. This is true if and only if 𝛾/2 ≡ 0 mod 𝜋.
Therefore, Theorem 4.3.15 implies that Rot(𝑈) = 𝐼3 if and only if 𝑈 = ±𝐼.
Next, we prove the second assertion of Theorem 4.3.20. For this, we need further
auxiliary results.
Lemma 4.3.28. Let 𝑝 ⃗ ∈ ℝ3 with spherical coordinate representation (1, 𝜃, 𝜙). Then we
have
(4.3.31) $\vec p \cdot \sigma = \begin{pmatrix} \cos\theta & e^{-i\phi}\sin\theta \\ e^{i\phi}\sin\theta & -\cos\theta \end{pmatrix}.$
Exercise 4.3.29. Prove Lemma 4.3.28.
From Lemma 4.3.28 we obtain the following representation of the density opera-
tors corresponding to quantum states in ℍ1 .
Proposition 4.3.30. Let |𝜓⟩ be a quantum state in ℍ1 . Then we have
(4.3.32) $|\psi\rangle\langle\psi| = \tfrac{1}{2}\,\bigl(I + \vec p(\psi)\cdot\sigma\bigr).$
With these results, we can prove the second assertion of Theorem 4.3.20. For this,
let 𝑈 ∈ SU(2) and let |𝜓⟩ be a quantum state in ℍ1. Set $\vec p = \vec p(\psi)$ and $\vec q = \vec p(U\,|\psi\rangle)$.
Then it follows from Proposition 4.3.30 that
(4.3.36) $|U\psi\rangle\langle U\psi| = \tfrac{1}{2}\,(I + \vec q \cdot \sigma)$
and
(4.3.37) $U\,|\psi\rangle\langle\psi|\,U^{-1} = \tfrac{1}{2}\,\bigl(I + U(\vec p \cdot \sigma)U^{-1}\bigr).$
But Lemma 4.3.26 gives
(4.3.38) $U(\vec p \cdot \sigma)U^{-1} = (\mathrm{Rot}(U)\,\vec p\,) \cdot \sigma.$
So (4.3.36), (4.3.37), and (4.3.38) imply the assertion.
We will now use Theorem 4.3.31 to give another representation of unitary single-qubit
operators, which will allow us to implement controlled operators in Section 4.4. For this,
we need the following lemma.
Lemma 4.3.32. For all 𝛾 ∈ ℝ we have 𝑋𝑅𝑦̂(𝛾)𝑋 = 𝑅𝑦̂(−𝛾) and 𝑋𝑅𝑧̂ (𝛾)𝑋 = 𝑅𝑧̂ (−𝛾).
The next exercise gives a representation as in Theorem 4.3.33 for rotation opera-
tors.
Exercise 4.3.34. Let 𝑤̂ ∈ ℝ3 be a unit vector and let 𝛾 ∈ ℝ. Set 𝐴 = 𝑅𝑤̂ (𝛾/2), 𝐵 =
𝑅𝑤̂ (−𝛾/2), and 𝐶 = 𝐼2 . Show that 𝑅𝑤̂ (𝛾) = 𝐴𝑋𝐵𝑋𝐶 and that 𝐴𝐵𝐶 = 𝐼2 .
From Corollary 4.3.16, Proposition 4.3.14, and Theorem 4.3.20 we obtain the fol-
lowing decomposition result.
Theorem 4.3.35. Let 𝑎,⃗ 𝑏 ⃗ ∈ ℝ3 be nonparallel unit vectors. Denote by 𝜑 the angle be-
tween 𝑎⃗ and 𝑏.⃗ Then for all unitary operators 𝑈 on ℍ1 there are 𝑘 ∈ ℕ and real numbers
𝛼1 , . . . , 𝛼𝑘 , 𝛽1 , . . . , 𝛽 𝑘 , 𝛿 such that 𝑘 = O(1/𝜑) and
(4.3.46) $U = e^{i\delta} \prod_{i=1}^{k} R_{\vec a}(\alpha_i)\, R_{\vec b}(\beta_i).$
4.3.5. Phase shift gates. In this section, we introduce the following special class
of rotation operators.
(4.3.47) $P(\gamma) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\gamma} \end{pmatrix}.$
It shifts the phase of the amplitude of |1⟩ by an angle 𝛾 while it does not change the
amplitude of |0⟩.
For 𝑘 ∈ ℕ we define
(4.3.49) $R_k = \begin{pmatrix} 1 & 0 \\ 0 & e^{2\pi i/2^k} \end{pmatrix} = e^{\pi i/2^k}\, R_{\hat z}\!\left(\frac{2\pi}{2^k}\right),$
(4.3.50) $S = R_2 = P\!\left(\frac{\pi}{2}\right) = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix},$
(4.3.51) $T = R_3 = P\!\left(\frac{\pi}{4}\right) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}.$
(4.3.53) 𝑇 2 = 𝑆.
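The phase gate identities are easy to check numerically. A short NumPy sketch verifying 𝑆 = 𝑃(𝜋/2), 𝑇² = 𝑆, and that every 𝑃(𝛾) is a rotation operator up to a global phase:

```python
import numpy as np

def P(g):
    """Phase shift gate P(γ)."""
    return np.array([[1, 0], [0, np.exp(1j*g)]])

def R_z(g):
    return np.diag([np.exp(-1j*g/2), np.exp(1j*g/2)])

S = P(np.pi/2)
T = P(np.pi/4)

assert np.allclose(S, [[1, 0], [0, 1j]])
assert np.allclose(T @ T, S)                       # (4.3.53)
g = 1.23
assert np.allclose(P(g), np.exp(1j*g/2) * R_z(g))  # P(γ) = e^{iγ/2} R_z(γ)
```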
Figure 4.3.1. Symbols for the phase and 𝜋/8 gates in quantum circuits.
The symbols that represent the phase gate and the 𝜋/8 gate are shown in Figure
4.3.1.
Figure 4.4.1. The four 𝖢𝖭𝖮𝖳 gates. The upper-left 𝖢𝖭𝖮𝖳 gate is called the standard
𝖢𝖭𝖮𝖳 gate.
Figure 4.4.2. Implementation of the lower-left 𝖢𝖭𝖮𝖳 gate from Figure 4.4.1.
As seen in (3.3.5), the representation matrix of the standard 𝖢𝖭𝖮𝖳 gate with respect
to the computational basis of ℍ2 is
1 0 0 0
⎛ ⎞
0 1 0 0
(4.4.1) 𝖢𝖭𝖮𝖳 = ⎜ ⎟.
⎜0 0 0 1⎟
⎝0 0 1 0⎠
Exercise 4.4.1. Determine matrix representations of all 𝖢𝖭𝖮𝖳 gates in Figure 4.4.1
with respect to the computational basis of ℍ2 .
We note that the lower-left 𝖢𝖭𝖮𝖳 gate in Figure 4.4.1 can be implemented using
the standard 𝖢𝖭𝖮𝖳 gate and the Pauli 𝑋 gate as shown in Figure 4.4.2.
Exercise 4.4.2. (1) Show that the implementation of the 𝖢𝖭𝖮𝖳 gate in Figure 4.4.2
is correct.
(2) Give an implementation of the lower-right 𝖢𝖭𝖮𝖳 gate in Figure 4.4.1 using the
upper-right 𝖢𝖭𝖮𝖳 gate and the Pauli 𝑋 gate.
Next, we note that the roles of qubits as control and target qubits in the 𝖢𝖭𝖮𝖳 gates
depend on the choice of the basis of ℍ2 . To see this, consider the orthonormal basis
|0⟩ + |1⟩ |0⟩ − |1⟩
(4.4.2) (|𝑥+ ⟩ , |𝑥− ⟩) = (𝐻 |0⟩ , 𝐻 |1⟩) = (
, )
√2 √2
of ℍ1 where 𝐻 is the Hadamard operator. We have seen in Section 4.1.2 that it is an
eigenbasis of the Pauli 𝑋 operator. Also,
(4.4.3) (|𝑥+ 𝑥+ ⟩ , |𝑥− 𝑥+ ⟩ , |𝑥+ 𝑥− ⟩ , |𝑥− 𝑥− ⟩)
is an orthonormal basis of ℍ2 . As shown in Exercise 4.4.3, applying 𝖢𝖭𝖮𝖳 to the ele-
ments of this basis has the following effect:
𝖢𝖭𝖮𝖳 |𝑥+ 𝑥+ ⟩ = |𝑥+ 𝑥+ ⟩ , 𝖢𝖭𝖮𝖳 |𝑥− 𝑥+ ⟩ = |𝑥− 𝑥+ ⟩ ,
(4.4.4)
𝖢𝖭𝖮𝖳 |𝑥+ 𝑥− ⟩ = |𝑥− 𝑥− ⟩ , 𝖢𝖭𝖮𝖳 |𝑥− 𝑥− ⟩ = |𝑥+ 𝑥− ⟩ .
Exercise 4.4.3. Prove (4.4.4).
We see that in (4.4.4) the 𝖢𝖭𝖮𝖳 operator exchanges the basis states |𝑥+ ⟩ and |𝑥− ⟩
of the first qubit conditioned on the second qubit being in state |𝑥− ⟩. So in this repre-
sentation of 𝖢𝖭𝖮𝖳, the first qubit is the target, while the second qubit is the control.
As shown in Figure 4.4.3, this observation can be used to implement the upper-right
𝖢𝖭𝖮𝖳 gate in Figure 4.4.1 using the standard 𝖢𝖭𝖮𝖳 gate and the Hadamard gate.
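This basis-change observation can be checked numerically: conjugating the standard 𝖢𝖭𝖮𝖳 by 𝐻 ⊗ 𝐻 exchanges the roles of control and target qubit. A NumPy sketch:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
HH = np.kron(H, H)

CNOT = np.array([[1, 0, 0, 0],      # standard CNOT: control = first qubit
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])
CNOT_rev = np.array([[1, 0, 0, 0],  # control = second qubit
                     [0, 0, 0, 1],
                     [0, 0, 1, 0],
                     [0, 1, 0, 0]])

assert np.allclose(HH @ CNOT @ HH, CNOT_rev)
```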
Exercise 4.4.4. Show that all 𝖢𝖭𝖮𝖳 gates in Figure 4.4.1 can be implemented using
the standard 𝖢𝖭𝖮𝖳 gate, the Pauli 𝑋 gate, and the Hadamard gate 𝐻.
Figure 4.4.3. Implementation of the upper-right 𝖢𝖭𝖮𝖳 gate in Figure 4.4.1 using the
Hadamard gate and the standard 𝖢𝖭𝖮𝖳 gate.
The 𝖢𝖭𝖮𝖳 gates from Figure 4.4.1 can also be applied to quantum registers of
length > 2. This is shown in Figure 4.4.4. Such 𝖢𝖭𝖮𝖳 operators are specified by their
action on the computational basis vectors |𝑏0 ⋯ 𝑏𝑛−1 ⟩, (𝑏0 ⋯ 𝑏𝑛−1 ) ∈ {0, 1}𝑛 , as fol-
lows. There is a control qubit index 𝑐 and a target qubit index 𝑡 with 𝑐, 𝑡 ∈ ℤ𝑛, 𝑐 ≠ 𝑡. The
target qubit |𝑏𝑡⟩ is mapped to $X^{b_c}\,|b_t\rangle$. All other qubits remain unchanged.
Figure 4.4.5. The four controlled-𝑈 gates.
Figure 4.4.6. Two circuits that implement the same operator: the gate $P(\delta) = e^{i\delta/2} R_{\hat z}(\delta)$ applied to the control qubit and the controlled $e^{i\delta} I$ gate.
𝑈 𝐶 𝐵 𝐴
Figure 4.4.7. Implementation of the controlled-𝑈 gate with the first qubit as control
using the decomposition 𝑈 = 𝑒𝑖𝛿 𝐴𝑋𝐵𝑋𝐶.
𝑒𝑖𝛿 0
𝐶 𝐵 𝐴 ( )
0 𝑒𝑖𝛿
Figure 4.4.8. Circuit that implements the same operator as the circuit in Figure 4.4.7.
Theorem 4.4.5. Let 𝑈 be a unitary operator on ℍ1 and let 𝑈 = 𝑒𝑖𝛿 𝐴𝑋𝐵𝑋𝐶 where
𝛿 ∈ ℝ and 𝐴, 𝐵, 𝐶 are unitary single-qubit operators with 𝐴𝐵𝐶 = 𝐼. Then the upper-
left controlled-𝑈 operator in Figure 4.4.5 can be implemented as shown in Figure 4.4.7
using 𝐴, 𝐵, 𝐶, 𝑃(𝛿), and two 𝖢𝖭𝖮𝖳 gates.
Proof. We show that the implementation in Figure 4.4.7 is correct. We first note that
the two circuits in Figure 4.4.6 implement the same operator since when applied to the
computational basis states of ℍ2 , both have the following effect:
(4.4.5) |00⟩ ↦ |00⟩ , |01⟩ ↦ |01⟩ , |10⟩ ↦ 𝑒𝑖𝛿 |10⟩ , |11⟩ ↦ 𝑒𝑖𝛿 |11⟩ .
This means that the circuit in Figure 4.4.8 implements the same operator as the circuit
in Figure 4.4.7. We show that the circuit in Figure 4.4.8 implements the controlled-𝑈
operator with the first qubit as control. If the first qubit is |1⟩, then the circuit applies
𝑈 = 𝑒𝑖𝛿 𝐴𝑋𝐵𝑋𝐶 to the second qubit. If the first qubit is |0⟩, then the circuit applies
𝐴𝐵𝐶 to the second qubit. Since 𝐴𝐵𝐶 = 𝐼, the circuit does not change the second qubit.
This proves the claim. □
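The construction in Theorem 4.4.5 can be illustrated for the concrete case 𝑈 = 𝑅𝑧̂(𝛾), where Lemma 4.3.32 yields the decomposition 𝑈 = 𝐴𝑋𝐵𝑋𝐶 with 𝐴 = 𝑅𝑧̂(𝛾/2), 𝐵 = 𝑅𝑧̂(−𝛾/2), 𝐶 = 𝐼, and 𝛿 = 0. A NumPy sketch checking that the circuit implements diag(𝐼, 𝑈):

```python
import numpy as np

def R_z(g):
    return np.diag([np.exp(-1j*g/2), np.exp(1j*g/2)])

g = 0.9
U = R_z(g)
A, B, C = R_z(g/2), R_z(-g/2), np.eye(2)

X = np.array([[0, 1], [1, 0]], dtype=complex)
assert np.allclose(A @ X @ B @ X @ C, U)     # U = AXBXC with δ = 0
assert np.allclose(A @ B @ C, np.eye(2))     # ABC = I

I2 = np.eye(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
# rightmost gate acts first: C, CNOT, B, CNOT, A on the target qubit
circuit = np.kron(I2, A) @ CNOT @ np.kron(I2, B) @ CNOT @ np.kron(I2, C)
controlled_U = np.block([[I2, np.zeros((2, 2))],
                         [np.zeros((2, 2)), U]])
assert np.allclose(circuit, controlled_U)
```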
Exercise 4.4.6. Show how Theorem 4.4.5 can be used to implement the controlled-𝑌 ,
𝑍, 𝑆, and 𝑇 operators.
Figure 4.4.9. Controlled-𝑈 operators acting on quantum states with more than two qubits.
Like the 𝖢𝖭𝖮𝖳 gates, the controlled-𝑈 gates can be applied to quantum registers of
length > 2. This is shown in Figure 4.4.9. Theorem 4.4.5 implies that these generalized
controlled-𝑈 operators can be implemented using four unitary single-qubit operators
and two 𝖢𝖭𝖮𝖳 gates.
4.4.3. General controlled operators. Now we present the most general controlled
operators. An example of such an operator is shown in Figure 4.4.10. This operator
applies the unitary operator 𝑈 on ℍ2 to the qubits |𝑏4𝑏5⟩ conditioned on the qubit |𝑏1⟩
being |0⟩ and the qubit |𝑏2⟩ being |1⟩. The other qubits are not changed. So, it acts on
the computational basis states of ℍ7 as follows:
$|b_0 \cdots b_6\rangle \mapsto |b_0 b_1 b_2 b_3\rangle\, U^{(1-b_1)b_2}\, |b_4 b_5\rangle\, |b_6\rangle.$
Definition 4.4.7. Let 𝐶0 , 𝐶1 , and 𝑇 be pairwise disjoint subsets of the index set ℤ𝑛 .
Let 𝑚 = |𝑇| > 0, and let 𝑇 = {𝑡, 𝑡 + 1, . . . , 𝑡 + 𝑚 − 1} with 𝑡 ∈ ℤ𝑛 . So 𝑇 is a set of 𝑚
consecutive integers in the index set ℤ𝑛 . Also, let 𝑈 be a unitary operator on ℍ𝑚 . Then
Figure 4.4.10. A general controlled operator: 𝑈 is applied to |𝑏4𝑏5⟩ conditioned on 𝑏1 = 0 and 𝑏2 = 1.
the linear operator 𝐶 𝐶0 ,𝐶1 ,𝑇 (𝑈) is defined by its action on the computational basis states
|𝑏0 ⋯ 𝑏𝑛−1 ⟩ of ℍ𝑛 as follows. It applies 𝑈 to the target qubits |𝑏𝑡 ⋯ 𝑏𝑡+𝑚−1 ⟩ conditioned
on the control qubits |𝑏𝑖 ⟩ with 𝑖 ∈ 𝐶0 being |0⟩ and the control qubits |𝑏𝑖 ⟩ with 𝑖 ∈ 𝐶1
being |1⟩; i.e.,
(4.4.7) $C^{C_0,C_1,T}(U)\,|b_0 \cdots b_{n-1}\rangle = |b_0 \cdots b_{t-1}\rangle\, U^c\, |b_t \cdots b_{t+m-1}\rangle\, |b_{t+m} \cdots b_{n-1}\rangle$
where
(4.4.8) $c = \prod_{i \in C_0} (1 - b_i) \prod_{i \in C_1} b_i.$
If any of the index sets 𝐶0 , 𝐶1 , or 𝑇 has only one element, then we replace the set in the
superscript by this element.
In this definition, we could drop the requirement that the set 𝑇 of target qubits be
a set of consecutive numbers. However, this would complicate the definition and we
can achieve the same effect by using SWAP gates, described in Section 4.5. As shown
in Exercise 4.4.8, general multiply controlled operators are unitary.
Exercise 4.4.8. Prove that every multiply controlled operator as specified in Definition
4.4.7 is unitary.
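Definition 4.4.7 can be turned into a small program that builds the matrix of $C^{C_0,C_1,T}(U)$ from its action on computational basis states; the unitarity claim of Exercise 4.4.8 can then be checked numerically. The function name and conventions below are mine, not the book's:

```python
import numpy as np

def controlled(n, C0, C1, T, U):
    """Matrix of C^{C0,C1,T}(U) on H_n; T must be consecutive indices."""
    t, m = min(T), len(T)
    N = 2 ** n
    M = np.zeros((N, N), dtype=complex)
    for b in range(N):
        bits = [(b >> (n - 1 - i)) & 1 for i in range(n)]  # b = |b_0 ... b_{n-1}>
        ok = all(bits[i] == 0 for i in C0) and all(bits[i] == 1 for i in C1)
        if not ok:                       # controls not satisfied: identity
            M[b, b] = 1
            continue
        target = 0                       # value of |b_t ... b_{t+m-1}>
        for i in range(m):
            target = (target << 1) | bits[t + i]
        for new in range(2 ** m):        # apply U to the target block
            new_bits = bits.copy()
            for i in range(m):
                new_bits[t + i] = (new >> (m - 1 - i)) & 1
            idx = 0
            for bit in new_bits:
                idx = (idx << 1) | bit
            M[idx, b] = U[new, target]
    return M

rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
M = controlled(4, C0={0}, C1={1}, T={2, 3}, U=Q)
assert np.allclose(M.conj().T @ M, np.eye(16))   # Exercise 4.4.8: unitary
```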
Example 4.4.9. Using the notation from Definition 4.4.7 the operator in Figure 4.4.10
can be written as
(4.4.9) 𝐶 1,2,{4,5} (𝑈).
Example 4.4.10. Using the notation from Definition 4.4.7 we see that for the left op-
erator in Figure 4.4.4 we have 𝐶0 = ∅, 𝐶1 = {𝑐}, 𝑇 = {𝑡}, and 𝑈 = 𝑋. So, the operator is
𝐶 ∅,𝑐,𝑡 (𝑋). Also, the right operator is 𝐶 𝑐,∅,𝑡 (𝑋).
4.4.4. The quantum Toffoli gate. Figure 4.4.11 shows the quantum Toffoli gate,
also called 𝖢𝖢𝖭𝖮𝖳. Its classical counterpart has been introduced in Section 1.7.1.
The 𝖢𝖢𝖭𝖮𝖳 gate applies the Pauli 𝑋 gate to the target qubit |𝑡⟩ conditioned on the
control qubits |𝑐 0 ⟩ and |𝑐 1 ⟩ both being |1⟩; that is, it acts on the computational basis
vectors of ℍ3 as follows:
(4.4.10) |𝑐 0 𝑐 1 𝑡⟩ ↦ |𝑐 0 𝑐 1 ⟩ 𝑋 𝑐0 𝑐1 |𝑡⟩ .
Figure 4.4.11. The quantum Toffoli gate 𝖢𝖢𝖭𝖮𝖳.
4.4.5. The 𝐶 𝑘 (𝑈) operators. We introduce the controlled operators 𝐶 𝑘 (𝑈). Such
an operator is shown in Figure 4.4.13. To specify it, we use 𝑘, 𝑚 ∈ ℕ with 𝑛 = 𝑘 + 𝑚
and a unitary operator 𝑈 on ℍ𝑚 . We write the computational basis vectors of ℍ𝑛 as
|𝑐 0 ⋯ 𝑐 𝑘−1 𝑡0 ⋯ 𝑡𝑚−1 ⟩ instead of |𝑏0 ⋯ 𝑏𝑛−1 ⟩ to distinguish between control and target
qubits. Then we have
(4.4.12) $C^k(U)\,|c_0 \cdots c_{k-1}\, t_0 \cdots t_{m-1}\rangle = |c_0 \cdots c_{k-1}\rangle\, U^{\prod_{i=0}^{k-1} c_i}\, |t_0 \cdots t_{m-1}\rangle$
or using the notation of Definition 4.4.7
(4.4.13) 𝐶 𝑘 (𝑈) = 𝐶 ∅,{0,. . .,𝑘−1},{𝑘,. . .,𝑘+𝑚−1} (𝑈).
Figure 4.4.13. The operator $C^k(U)$.
Figure 4.4.14. The operator 𝖳𝖱𝖠𝖭𝖲(01∗0) which exchanges |0100⟩ and |0110⟩.
Figure 4.5.1. The quantum 𝖲𝖶𝖠𝖯 gate and its implementation using 𝖢𝖭𝖮𝖳 gates.
Exercise 4.5.1. Verify that the implementation of the 𝖲𝖶𝖠𝖯 gate in Figure 4.5.1 is
correct.
Generalizing the implementation of the simple 𝖲𝖶𝖠𝖯 gate in Figure 4.5.1, we see
that every quantum 𝖲𝖶𝖠𝖯 gate can be implemented using 3 𝖢𝖭𝖮𝖳 gates.
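The implementation in Figure 4.5.1 is easy to check numerically. A NumPy sketch:

```python
import numpy as np

CNOT_01 = np.array([[1, 0, 0, 0], [0, 1, 0, 0],   # control = first qubit
                    [0, 0, 0, 1], [0, 0, 1, 0]])
CNOT_10 = np.array([[1, 0, 0, 0], [0, 0, 0, 1],   # control = second qubit
                    [0, 0, 1, 0], [0, 1, 0, 0]])
SWAP = np.array([[1, 0, 0, 0], [0, 0, 1, 0],      # |b0 b1> -> |b1 b0>
                 [0, 1, 0, 0], [0, 0, 0, 1]])

assert np.allclose(CNOT_01 @ CNOT_10 @ CNOT_01, SWAP)
```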
We note that 𝖲𝖶𝖠𝖯 gates apply a transposition to the index sequence of the com-
putational basis states of ℍ𝑛 . This suggests the generalization of quantum swap gates
as follows. For 𝜋 ∈ 𝑆𝑛 the quantum permutation operator 𝑈𝜋 is defined by its action
on the computational basis states |𝑏0 ⋯ 𝑏𝑛−1 ⟩ as follows:
(4.5.3) 𝑈𝜋 |𝑏0 ⋯ 𝑏𝑛−1 ⟩ = |𝑏𝜋(0) ⋯ 𝑏𝜋(𝑛−1) ⟩ .
Figure 4.6.1. Implementation of $C^3(U)$ using two ancilla qubits |𝑎0⟩ and |𝑎1⟩ that are traced out at the end.
Table 4.6.1. Evolution of the states in the circuit from Figure 4.6.1.
𝑖 |𝜓𝑖⟩
0 |𝑐0⟩ |𝑐1⟩ |𝑐2⟩ |𝑡⟩
1 |𝑐0⟩ |𝑐1⟩ |𝑐2⟩ |0⟩ |0⟩ |𝑡⟩
2 |𝑐0⟩ |𝑐1⟩ |𝑐2⟩ |𝑐0 ⋅ 𝑐1⟩ |0⟩ |𝑡⟩
3 |𝑐0⟩ |𝑐1⟩ |𝑐2⟩ |𝑐0 ⋅ 𝑐1⟩ |𝑐0 ⋅ 𝑐1 ⋅ 𝑐2⟩ |𝑡⟩
4 |𝑐0⟩ |𝑐1⟩ |𝑐2⟩ |𝑐0 ⋅ 𝑐1⟩ |𝑐0 ⋅ 𝑐1 ⋅ 𝑐2⟩ $U^{c_0 c_1 c_2}$ |𝑡⟩
5 |𝑐0⟩ |𝑐1⟩ |𝑐2⟩ |𝑐0 ⋅ 𝑐1⟩ |0⟩ $U^{c_0 c_1 c_2}$ |𝑡⟩
6 |𝑐0⟩ |𝑐1⟩ |𝑐2⟩ |0⟩ |0⟩ $U^{c_0 c_1 c_2}$ |𝑡⟩
7 |𝑐0⟩ |𝑐1⟩ |𝑐2⟩ $U^{c_0 c_1 c_2}$ |𝑡⟩ = $C^3(U)$ |𝜓0⟩
and |𝑎1 ⟩. Since the state of the quantum register used in the circuit is separable with
respect to the decomposition into ancillary and nonancillary qubits, it follows from
Corollary 3.7.12 that this does not change the other qubits and the resulting state is
𝐶 3 (𝑈) |𝜓⟩.
Exercise 4.6.1. Verify Table 4.6.1.
(2) the information about how the ancilla qubits are initialized and where they are
inserted or to which qubits the unitary or erasure gates are applied, respectively.
At most one gate is applied to each qubit.
Figure 4.7.1. The quantum circuit from Figure 4.6.1 with its gates labeled 𝑞0, . . . , 𝑞6.
We make some remarks about this construction. Since tracing out quantum bits
is allowed in this general definition of quantum circuits, the resulting quantum state
may be a mixed state. But frequently the situation is simpler. Denote by 𝐴 the quantum
system of all output qubits that are not traced out and by 𝐵 the quantum system of the
qubits that are traced out and assume that the state of system 𝐴𝐵 is separable. Then
by Corollary 3.7.12, tracing out system 𝐵 means omitting the corresponding quantum
bits. This is the case in Figure 4.6.1.
Next, we call the quantum circuit 𝑄 unitary if 𝑚 = 𝑛 and the quantum operator
implemented by it is unitary. If 𝑄 uses no ancillary and erasure gates, then it is always
unitary. If 𝑄 uses ancillary or erasure gates, it may or may not be unitary. For example,
if 𝑚 ≠ 𝑛, then 𝑄 is not unitary.
But every quantum circuit can be transformed into a unitary quantum circuit. To
see how this works, let 𝑄 be a quantum circuit. The transformed quantum circuit 𝑅 is
obtained by removing the ancillary gates and adding the corresponding ancillary qubits
as new input qubits, and by removing the erasure gates and adding the corresponding
qubits as new output qubits. The new quantum circuit 𝑅 is called the purification of 𝑄,
and it is always unitary. Figure 4.7.2 shows the purification 𝑅 of the quantum
circuit 𝑄 from Figure 4.6.1. However, note that the quantum circuit 𝑄 in Figure 4.6.1
is already unitary. So we see that purification may not be required to make a quantum
circuit unitary.
It is also possible to represent quantum circuits as algorithms that are specified us-
ing pseudocode. An example is the algorithmic representation of the quantum circuit
in Figure 4.6.1 which is shown in Algorithm 4.7.5.
Figure 4.7.2. The purification of the quantum circuit from Figure 4.6.1.
Exercise 4.7.6. Write an algorithm that represents the quantum circuit in Figure 4.7.2.
Theorem 4.7.7. Let 𝑛, 𝑚 ∈ ℕ and let 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑚 . Then there is a quantum
circuit 𝑄 of size O(|𝑓|𝐹 ) that only uses quantum Toffoli, ancillary, and erasure gates and
implements the quantum operator
(4.7.3) $U : \mathbb{H}_{n+m} \to \mathbb{H}_{n+m}, \quad |\vec x\rangle\,|\vec y\rangle \mapsto |\vec x\rangle\,|\vec y \oplus f(\vec x)\rangle.$
Proof. Consider the quantum circuit 𝑄𝑟 which is obtained by replacing the classical
Toffoli gates in the circuit 𝐷𝑟 from Theorem 1.7.12 with quantum Toffoli gates. It im-
plements a unitary operator
(4.7.4) 𝑈𝑟 ∶ ℍ𝑛 ⊗ ℍ𝑛+𝑝 ⊗ ℍ𝑚 → ℍ𝑛 ⊗ ℍ𝑛+𝑝 ⊗ ℍ𝑚
where 𝑝 ∈ ℕ, 𝑝 ≤ 2|𝑓|𝐹 , such that for all 𝑥⃗ ∈ {0, 1}𝑛 and all 𝑦 ⃗ ∈ {0, 1}𝑚 we have
Our first theorem shows that the notion of a universal set of quantum gates can-
not be obtained as a straightforward generalization of the corresponding definition in
classical computing.
Theorem 4.8.1. Let 𝑆 be a set of quantum gates such that for every 𝑛 ∈ ℕ and every
unitary operator 𝑈 on ℍ𝑛 there is a quantum circuit that implements 𝑈 and uses only
gates from 𝑆. Then 𝑆 is uncountable.
Proof. Let 𝑛 ∈ ℕ. By Theorem 4.3.15 the rotation gates 𝑅𝑥̂ (𝜃) with 𝜃/2 ∈ [0, 2𝜋[
are pairwise different. Since the set [0, 2𝜋[ is uncountable, it follows that the set of
unitary operators on ℍ𝑛 is uncountable. But if 𝑆 were a countable set of quantum gates,
then the set of all quantum circuits that can be constructed using the gates in 𝑆 would
also be countable, a contradiction. □
Theorem 4.8.1 implies that there are no finite or even countable universal sets of
quantum gates in the classical sense. We will therefore call a set of quantum gates uni-
versal if it can be used to approximate every unitary operator to an arbitrary precision
and we will show that finite sets of quantum gates with this property exist. The notions
of universality can be generalized to general quantum operators. But in this book, we
restrict ourselves to unitary quantum operators. For the discussion of universality, we
need the following definition. It uses the supremum sup 𝑆 of a set 𝑆 of real numbers
that is bounded from above. It is the least upper bound of 𝑆 in ℝ and it can be shown
that it always exists and is uniquely determined.
Definition 4.8.2. Let 𝑈 and 𝑉 be two unitary operators on ℍ𝑛. Then we define the
error when 𝑉 is implemented instead of 𝑈 as
$E(U, V) = \sup\{\, \lVert (U - V)\,|\psi\rangle \rVert \;:\; |\psi\rangle \text{ a quantum state in } \mathbb{H}_n \,\}.$
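Assuming the standard definition $E(U,V) = \sup\{\lVert (U-V)|\psi\rangle\rVert\}$, this supremum equals the largest singular value of 𝑈 − 𝑉, so it can be computed with the spectral norm. A NumPy sketch:

```python
import numpy as np

def error(U, V):
    """E(U, V): the spectral norm (largest singular value) of U - V."""
    return np.linalg.norm(U - V, ord=2)

def R_z(g):
    return np.diag([np.exp(-1j * g / 2), np.exp(1j * g / 2)])

# implementing R_z(γ + ε) instead of R_z(γ) gives E = |1 - e^{iε/2}| = 2 sin(ε/4)
g, eps = 0.8, 1e-3
E = error(R_z(g), R_z(g + eps))
assert abs(E - 2 * np.sin(eps / 4)) < 1e-12
```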
The next proposition uses the distance between two unitary operators 𝑈 and 𝑉 on
ℍ𝑛 to estimate the difference between the probabilities of measuring a certain outcome
when 𝑈 and 𝑉 are applied to a quantum state.
Proof. Let 𝜆 ∈ Λ and let |𝜓⟩ ∈ ℍ𝑛 be a quantum state. Since 𝑃𝜆 is an orthogonal pro-
jection, it is Hermitian by Proposition 2.4.41. Also, 𝑈 is unitary and |𝜓⟩ has Euclidean
length 1. From Exercise 2.4.38 we obtain
This inequality, the linearity of the inner product on ℍ𝑛 , the Cauchy-Schwarz inequal-
ity, and Exercise 2.4.38 imply
(4.8.4) $= \bigl|\, \bigl\langle P_\lambda^* U|\psi\rangle \bigm| (U - V)|\psi\rangle \bigr\rangle + \bigl\langle (U - V)|\psi\rangle \bigm| P_\lambda V|\psi\rangle \bigr\rangle \,\bigr|$
Now we can define universal sets of quantum gates. In this definition, we use the
term unitary quantum gate. By this we mean a quantum gate that implements a unitary
operator; i.e., this quantum gate is neither an ancillary nor an erasure gate.
(1) We say that 𝑆 is universal for a set 𝑇 of unitary quantum operators if for every
𝜀 ∈ ℝ>0 and every 𝑈 ∈ 𝑇 there is a unitary operator 𝑉 which can — up to a
global phase factor — be implemented by a quantum circuit that uses only gates
from 𝑆, ancillary gates, and erasure gates such that 𝐸(𝑈, 𝑉) < 𝜀.
(2) We say that 𝑆 is universal for quantum computation if 𝑆 is universal for the set of
all unitary quantum operators.
Definition 4.8.5. We call a set 𝑆 of unitary quantum gates perfectly universal for quan-
tum computation or perfectly universal for short if all unitary operators 𝑈 on ℍ𝑛 can,
up to a global phase factor, be implemented by a quantum circuit that uses only gates
from 𝑆, ancillary gates, and erasure gates.
Proof. Let
(4.9.1) $U = \sum_{\lambda \in \Lambda} \lambda P_\lambda$
Using the method from the proof of Proposition 4.9.1, we can determine square
roots of the Pauli operators.
Proposition 4.9.2. The operator 𝑉 = (1 + 𝑖)(𝐼 − 𝑖𝑋)/2 is unitary, and we have 𝑉 2 = 𝑋.
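Proposition 4.9.2 can be verified directly. A NumPy sketch:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
V = (1 + 1j) * (np.eye(2) - 1j * X) / 2   # the operator from Proposition 4.9.2

assert np.allclose(V.conj().T @ V, np.eye(2))   # unitary
assert np.allclose(V @ V, X)                    # V^2 = X
```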
Figure 4.9.1 shows an implementation of 𝐶 2 (𝑈) that uses a square root 𝑉 of 𝑈. Its
correctness is stated in Proposition 4.9.5.
Proposition 4.9.5. Let 𝑈, 𝑉 be unitary operators such that 𝑉 2 = 𝑈. Then the circuit on
the right side of Figure 4.9.1 implements $C^2(U)$. It uses two 𝖢𝖭𝖮𝖳 gates, two $C^1(V)$ gates,
and one $C^1(V^*)$ gate.
Figure 4.9.1. Implementation of $C^2(U)$ using a square root 𝑉 of 𝑈.
Figure 4.9.2. An implementation of the Toffoli gate that uses the Hadamard, phase,
𝖢𝖭𝖮𝖳, and 𝜋/8 gates.
It follows from Proposition 4.9.2 that the circuit in Figure 4.9.1 can be used to im-
plement the quantum Toffoli gate. Figure 4.9.2 shows another implementation of this
gate. It uses the phase, 𝜋/8, and 𝖢𝖭𝖮𝖳 gates and its correctness is stated in Proposition
4.9.7.
Proposition 4.9.7. The circuit in Figure 4.9.2 implements the Toffoli gate. It uses two
Hadamard, one phase, seven (inverse) 𝜋/8, and six 𝖢𝖭𝖮𝖳 gates.
This step has no effect on the other qubits. The next 𝑘 − 1 steps of the circuit change
the ancilla qubits back to |0⟩ without an effect on the other qubits. Now the state of
Figure 4.9.3. Implementation of 𝐶 5 (𝑈) using only 𝖢𝖢𝖭𝖮𝖳 and 𝐶 1 (𝑈) gates.
the composition 𝐴𝐵, where 𝐴 consists of the control and target qubits and system 𝐵
consists of the ancilla qubits, is separable. Hence, it follows from Corollary 3.7.12 that
tracing out the ancilla qubits does not change the control or target qubits and yields
(4.9.6) $C^k(U)\,|c_0 \cdots c_{k-1}\rangle\,|t_0 \cdots t_{m-1}\rangle = |c_0 \cdots c_{k-1}\rangle\, U^{\prod_{i=0}^{k-1} c_i}\,|t_0 \cdots t_{m-1}\rangle.$
Proof. We show by induction that for 𝑗 = 0, . . . , 𝑘 − 2, the first 𝑗 + 1 steps carried out
by the algorithm starting in line 3 yield the ancilla qubits
(4.9.7) $|a_j\rangle = \Bigl|\, \textstyle\prod_{i=0}^{j+1} c_i \Bigr\rangle$
and do not change the other qubits. We start with 𝑗 = 0, that is, with the statement in
line 3 of the algorithm. It produces the state
(4.9.8) |𝑎0 ⟩ = |𝑐 0 ⋅ 𝑐 1 ⟩ .
So (4.9.9) and (4.9.10) imply (4.9.7). Next, it follows from (4.9.7) with 𝑗 = 𝑘 − 2 that
the statement in line 7 has the following effect:
(4.9.11) $|t\rangle \leftarrow U^{a_{k-2}}\,|t_0 \cdots t_{m-1}\rangle = U^{\prod_{i=0}^{k-1} c_i}\,|t_0 \cdots t_{m-1}\rangle.$
Finally, the for loop starting in line 8 carries out 𝑘 − 2 iterations with loop indices
𝑗 = 𝑘 − 2, . . . , 1. Iteration with loop index 𝑗 ∈ {1, . . . , 𝑘 − 2} changes the qubit |𝑎𝑗 ⟩ and
no other qubit. Also, after the 𝑗th iteration we have
(4.9.12) |𝑎𝑗 ⟩ = |0⟩ .
This can also be seen by induction on 𝑗. The statement in line 11 yields |𝑎0⟩ = |0⟩. So
(4.9.12) holds for 𝑗 = 0, . . . , 𝑘 − 2. In addition, the state of the composition 𝐴𝐵, where
𝐴 consists of the control and target qubits and system 𝐵 consists of the ancillary qubits,
is separable. So, by Corollary 3.7.12, tracing out the qubits |𝑎0 ⟩ , . . . , |𝑎𝑘−2 ⟩ has no effect
on the other qubits. This concludes the proof. □
Figure 4.9.4. Implementation of 𝐶 {0,1},{2,3,4},6 (𝑈) using only 𝑋, 𝖢𝖢𝖭𝖮𝖳, and 𝐶 1 (𝑈) gates.
Based on the construction of the 𝐶 𝑘 (𝑈) quantum circuit, we can implement gen-
eral 𝐶 𝐶0 ,𝐶1 ,𝑇 operators as in Definition 4.4.7. As an example, Figure 4.9.4 shows a quan-
tum circuit that implements such an operator with 𝑛 = 7, 𝐶0 = {0, 1}, 𝐶1 = {2, 3, 4},
and 𝑇 = {6}. In this circuit, the Pauli 𝑋 operator is applied to the qubits |𝑏𝑖⟩ with 𝑖 ∈ 𝐶0
at the beginning and the end of the computation. This idea is the basis for the following
theorem.
Theorem 4.9.13. Let 𝐶0 , 𝐶1 , and 𝑇 be pairwise disjoint subsets of ℤ𝑛 , let 𝑚 = |𝑇| > 0,
and assume that 𝑇 = {𝑖, 𝑖 + 1, . . . , 𝑖 + 𝑚 − 1}. Also, let 𝑈 be a unitary operator on ℍ𝑚 .
Set 𝑘0 = |𝐶0 |, 𝑘1 = |𝐶1 |, 𝑘 = 𝑘0 + 𝑘1 . Then the unitary operator 𝐶 𝐶0 ,𝐶1 ,𝑇 (𝑈) can be
implemented by a quantum circuit that uses 2𝑘0 Pauli 𝑋 gates, 2𝑘 − 2 𝖢𝖢𝖭𝖮𝖳 gates, one
𝐶 1 (𝑈) gate, and 𝑘 − 1 ancillary and erasure gates.
For the special case where 𝑈 is a single-qubit operator, Theorem 4.9.13 implies the
following result.
Theorem 4.9.14. Let 𝑈 be a unitary single-qubit operator, let 𝐶0 , 𝐶1 be disjoint subsets
of ℤ𝑛 , let 𝑘0 = |𝐶0 |, 𝑘1 = |𝐶1 |, 𝑘 = 𝑘0 + 𝑘1 < 𝑛, and 𝑡 ∈ ℤ𝑛 ⧵ (𝐶0 ∪ 𝐶1 ). Then the
unitary operator 𝐶 𝐶0 ,𝐶1 ,𝑡 (𝑈) can be implemented by a quantum circuit that uses O(𝑘)
Pauli 𝑋, Hadamard, 𝜋/8, inverse 𝜋/8, 𝖢𝖭𝖮𝖳, ancillary, and erasure gates and four other
single-qubit gates.
Proof. It follows from Theorem 4.9.13 that 𝐶 𝐶0 ,𝐶1 ,𝑡 (𝑈) can be implemented using 2𝑘0
Pauli 𝑋 gates, 2𝑘 − 2 𝖢𝖢𝖭𝖮𝖳 gates, one 𝐶 1 (𝑈) operator, and 𝑘 − 1 ancillary and erasure
gates. By Proposition 4.9.7 each 𝖢𝖢𝖭𝖮𝖳 gate can be implemented using 2 Hadamard
gates 𝐻, one phase gate 𝑆, 7 (inverse) 𝜋/8 gates 𝑇, and 6 standard 𝖢𝖭𝖮𝖳 gates. Also,
by Theorem 4.4.5 the operator 𝐶 1 (𝑈) can be implemented using 2 standard 𝖢𝖭𝖮𝖳 and
4 single-qubit gates 𝐴, 𝐵, 𝐶, 𝑒𝑖𝛿/2 𝑅𝑧̂ (𝛿) such that 𝑈 = 𝑒𝑖𝛿 𝐴𝑋𝐵𝑋𝐶. Since by (4.3.53)
we have 𝑆 = 𝑇 2 , it follows that 𝐶 𝐶0 ,𝐶1 ,𝑡 (𝑈) can be implemented using O(𝑘) Pauli 𝑋,
Hadamard, (inverse) 𝜋/8, and standard 𝖢𝖭𝖮𝖳 gates and four other single-qubit gates.
□
Proof. The proof is analogous to the proof of Theorem 4.9.14 and uses the fact that for
a transposition operator, we have 𝑈 = 𝑋. □
4.10.1. Two-level operators. For the first construction, we need the following
definition. We recall that we identify linear operators on ℍ𝑛 with their representation
matrices with respect to the computational basis of ℍ𝑛 .
Definition 4.10.1. Let 𝐴 ∈ ℂ(𝑘,𝑘) . Then 𝐴 is called a two-level matrix, two-level oper-
ator, or two-level gate if there are 𝑖, 𝑗 ∈ ℤ𝑘 such that for every 𝑣 ̂ ∈ ℂ𝑘 all entries of the
vectors 𝑣 ̂ and 𝐴𝑣 ̂ with indices different from 𝑖 and 𝑗 are equal.
We note that in this definition we do not require 𝑖 and 𝑗 to be different. This implies
that all matrices in ℂ(1,1) are two-level matrices, which will simplify our reasoning. We
also note that all matrices in ℂ(2,2) are two-level matrices.
Example 4.10.2. Consider the matrix
1 1 0
𝐴 = (1 −1 0) .
0 0 1
For every 𝑣 ̂ = (𝑣 0 , 𝑣 1 , 𝑣 2 ) ∈ ℂ3 we have
𝑣0 + 𝑣1
𝐴𝑣 ̂ = (𝑣 0 − 𝑣 1 ) .
𝑣2
Hence, 𝐴 is a two-level matrix.
To prove this claim, we use the following construction. Let 𝑊 = (𝑤 𝑖,𝑗 ) ∈ ℂ(𝑘,𝑘) be a
unitary matrix. For 𝑖 = 1, . . . , 𝑘 − 1 define the matrix 𝑇𝑖 (𝑊) = (𝑡𝑝,𝑞 ) ∈ ℂ(𝑘,𝑘) as follows.
If 𝑤 𝑖,0 = 0, we set 𝑇𝑖 (𝑊) = 𝐼𝑘 . If 𝑤 𝑖,0 ≠ 0, we initialize 𝑇𝑖 (𝑊) to 𝐼𝑘 and then change
four of the entries of 𝑇𝑖 (𝑊) in the following way. Set
(4.10.3) $c = \frac{1}{\sqrt{|w_{0,0}|^2 + |w_{i,0}|^2}}$
and
(4.10.4) $t_{0,0} = c\,\overline{w}_{0,0}, \quad t_{0,i} = c\,\overline{w}_{i,0}, \quad t_{i,0} = c\,w_{i,0}, \quad t_{i,i} = -c\,w_{0,0}.$
Then 𝑇𝑖(𝑊) is a two-level matrix. Also, this matrix is unitary. To see this, we observe
that the columns with indices 𝑗 ≠ 0, 𝑖 are equal to the unit vectors $\vec e_j$. Therefore, they
have length 1 and are pairwise orthogonal. Furthermore, the length of the first column
is $c^2(|w_{0,0}|^2 + |w_{i,0}|^2) = 1$, the length of the 𝑖th column is $c^2(|w_{i,0}|^2 + |w_{0,0}|^2) = 1$, the
inner product of the first and the 𝑖th columns is $c^2(w_{0,0}\overline{w}_{i,0} - \overline{w}_{i,0}w_{0,0}) = 0$, and the
inner product of the first and the 𝑖th columns with the other columns is 0.
We determine the entries in the first column of the product 𝑇𝑖 (𝑊)𝑊. The entries
with indices different from 0 and 𝑖 are the same as the corresponding entries in the first
column of 𝑊. The entry with index 0 is
(4.10.5) $t_{0,0} w_{0,0} + t_{0,i} w_{i,0} = c\,(|w_{0,0}|^2 + |w_{i,0}|^2) = \frac{1}{c}.$
The entry with index 𝑖 is
(4.10.6) 𝑡 𝑖,0 𝑤 0,0 + 𝑡 𝑖,𝑖 𝑤 𝑖,0 = 𝑐(𝑤 𝑖,0 𝑤 0,0 − 𝑤 0,0 𝑤 𝑖,0 ) = 0.
where 0⃗ denotes the row and column vectors in ℂ𝑘−1 which have only zero entries,
respectively, and 𝑉 ′ ∈ ℂ(𝑘−1,𝑘−1) is the minor obtained from 𝑉 by deleting the first
row and column of this matrix. Since 𝑉 is unitary, Proposition 2.4.18 implies that 𝑉 ′ is
unitary.
According to the induction hypothesis, there are 𝑚 ∈ ℕ and unitary two-level matrices 𝑉0′, . . . , 𝑉𝑚−1′ ∈ ℂ(𝑘−1,𝑘−1) such that 𝑚 ≤ (𝑘 − 1)(𝑘 − 2)/2 and 𝑉′ = 𝑉0′ ⋯ 𝑉𝑚−1′.
For 0 ≤ 𝑖 < 𝑚 we set

(4.10.9) 𝑉𝑖 = ( 1    0⃗
               0⃗   𝑉𝑖′ ).
So 𝑉𝑖 is obtained from 𝑉𝑖′ by prepending the unit vector (1, 0, . . . , 0) ∈ ℂ𝑘 as the first row and column. Then the 𝑉𝑖 are unitary two-level matrices and we have 𝑉 = 𝑉0 ⋯ 𝑉𝑚−1.
From (4.10.1) we obtain
(4.10.10) 𝑈 = 𝑈1∗ ⋯ 𝑈𝑘−1∗ 𝑉0 ⋯ 𝑉𝑚−1 .
This is a decomposition of 𝑈 into a product of two-level unitary matrices. The number
of factors is 𝑚 + 𝑘 − 1 ≤ (𝑘 − 1)(𝑘 − 2)/2 + 𝑘 − 1 = (𝑘2 − 3𝑘 + 2 + 2𝑘 − 2)/2 = (𝑘2 − 𝑘)/2 =
𝑘(𝑘 − 1)/2. □
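The construction in this proof can be turned directly into a short program. The following Python/NumPy sketch follows the proof of Theorem 4.10.4: it clears the first column with the matrices 𝑇𝑖(𝑊), absorbs a possible leftover phase into an extra diagonal (and hence two-level) factor, and recurses on the minor 𝑉′. The function names are ours, not the book's.

```python
import numpy as np

def clearing_matrix(W, i):
    """T_i(W) from (4.10.3)/(4.10.4): a two-level unitary such that
    (T_i(W) @ W) has a zero in entry (i, 0)."""
    k = W.shape[0]
    T = np.eye(k, dtype=complex)
    if W[i, 0] != 0:
        c = 1.0 / np.sqrt(abs(W[0, 0]) ** 2 + abs(W[i, 0]) ** 2)
        T[0, 0] = c * W[0, 0].conjugate()
        T[0, i] = c * W[i, 0].conjugate()
        T[i, 0] = c * W[i, 0]
        T[i, i] = -c * W[0, 0]
    return T

def two_level_decomposition(U):
    """Return a list of two-level unitaries whose product is the unitary U."""
    U = np.asarray(U, dtype=complex)
    k = U.shape[0]
    if k == 1:
        return [U]
    W = U.copy()
    Ts = []
    for i in range(1, k):                       # clear entries (1,0), ..., (k-1,0)
        T = clearing_matrix(W, i)
        W = T @ W
        Ts.append(T)
    # Now W = d (+) V' with |d| = 1; absorb d into a diagonal two-level factor.
    D = np.eye(k, dtype=complex)
    D[0, 0] = W[0, 0].conjugate()
    W = D @ W                                   # W = 1 (+) V'
    factors = [T.conj().T for T in Ts] + [D.conj().T]
    for S in two_level_decomposition(W[1:, 1:]):
        E = np.eye(k, dtype=complex)            # embed the factors of V'
        E[1:, 1:] = S
        factors.append(E)
    return factors
```

Counting the extra phase factors, this variant produces at most 𝑘(𝑘 − 1)/2 + 𝑘 factors; the phase factors are two-level because the definition above allows 𝑖 = 𝑗.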
The proof of Theorem 4.10.4 also contains a method to construct the decompo-
sition of a unitary matrix into a product of two-level matrices and thus a method to
construct a circuit that implements a given unitary operator and uses only two-level
gates. Therefore, Theorem 4.10.4 implies the following corollary.
Corollary 4.10.5. The set of all two-level unitary operators is perfectly universal for quan-
tum computation.
To prove Theorem 4.10.6 we will show that every unitary two-level operator can be
implemented by a quantum circuit that uses only single-qubit gates and the standard
𝖢𝖭𝖮𝖳 gate. Then Theorem 4.10.6 follows from this statement and Corollaries 4.10.5
and 4.3.16. For the proof of Theorem 4.10.6 we also need the following definition and
result.
Definition 4.10.7. Let 𝑠⃗, 𝑡⃗ ∈ {0, 1}𝑛. A Gray code connecting 𝑠⃗ and 𝑡⃗ is a sequence 𝐺 = (𝑔⃗0, . . . , 𝑔⃗𝑚) of pairwise distinct vectors in {0, 1}𝑛 such that 𝑔⃗0 = 𝑠⃗, 𝑔⃗𝑚 = 𝑡⃗, and successive elements of 𝐺 differ in exactly one bit.
Example 4.10.8. Let 𝑠 ⃗ = (0, 0, 0) and 𝑡 ⃗ = (1, 1, 1). Then
(4.10.11) 𝐺 = ((0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1))
is a Gray code that connects 𝑠 ⃗ and 𝑡.⃗
Exercise 4.10.9. Find the shortest Gray code that connects (1, 1, 0) and (0, 1, 1).
The next proposition is also required for the proof of Theorem 4.10.6.
Proposition 4.10.10. Let 𝑠,⃗ 𝑡 ⃗ ∈ {0, 1}𝑛 . Then there is a Gray code of length ≤ 𝑛 + 1 that
connects 𝑠 ⃗ and 𝑡.⃗
Proof. We prove the assertion by induction on 𝑛. Let 𝑛 = 1. Then 𝑠⃗, 𝑡⃗ ∈ {0, 1}. If 𝑠⃗ = 𝑡⃗, then (𝑠⃗) is a Gray code of length 1 ≤ 2 = 𝑛 + 1 that connects 𝑠⃗ and 𝑡⃗. If 𝑠⃗ ≠ 𝑡⃗, then (𝑠⃗, 𝑡⃗) is a Gray code of length 2 = 𝑛 + 1 connecting 𝑠⃗ and 𝑡⃗. This proves the base case.
For the induction step, assume that the assertion is true for 𝑛 − 1. Denote by 𝑠⃗′ and 𝑡⃗′ the vectors that are obtained from 𝑠⃗ and 𝑡⃗, respectively, by deleting the last entry. Then it follows from the induction hypothesis that there is a Gray code 𝐺′ = (𝑔⃗0′, . . . , 𝑔⃗𝑚−1′) of length 𝑚 ≤ 𝑛 that connects 𝑠⃗′ and 𝑡⃗′. Let 𝑏 be the last entry of 𝑠⃗. Append 𝑏 to all elements of 𝐺′ as the new last entry and denote the resulting sequence by 𝐺 = (𝑔⃗0, . . . , 𝑔⃗𝑚−1). Then 𝐺 is a sequence of length 𝑚 in {0, 1}𝑛 whose successive elements differ in exactly one bit. Also, we have 𝑔⃗0 = 𝑠⃗, and the vectors 𝑔⃗𝑚−1 and 𝑡⃗ are either equal or differ in exactly the last bit. In the first case, 𝐺 is a Gray code connecting 𝑠⃗ and 𝑡⃗. In the second case, (𝑔⃗0, . . . , 𝑔⃗𝑚−1, 𝑡⃗) is such a Gray code. □
Note that the proof of Proposition 4.10.10 contains a method for constructing a connecting Gray code.
Exercise 4.10.11. Find an algorithm that on input of 𝑠,⃗ 𝑡 ⃗ ∈ {0, 1}𝑛 computes a Gray
code of length at most 𝑛 + 1 that connects 𝑠 ⃗ and 𝑡 ⃗ and analyze its complexity.
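The construction in the proof of Proposition 4.10.10 amounts to flipping, one position at a time, every bit in which 𝑠⃗ and 𝑡⃗ differ. A minimal Python sketch (the function name is ours):

```python
def gray_code(s, t):
    """Return a Gray code of length at most n + 1 connecting the bit tuples
    s and t, by flipping the differing bits one position at a time."""
    g = list(s)
    code = [tuple(g)]
    for i in range(len(s)):
        if g[i] != t[i]:
            g[i] = t[i]
            code.append(tuple(g))
    return code
```

For Exercise 4.10.9 this yields ((1, 1, 0), (0, 1, 0), (0, 1, 1)), a Gray code of the minimal length 3. Since at most 𝑛 bits are flipped and each step copies an 𝑛-tuple, the algorithm runs in O(𝑛²) time.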
Next, we will prove a statement that together with Corollary 4.10.5 implies Theo-
rem 4.10.6.
Theorem 4.10.12. For every two-level unitary operator 𝑈 on ℍ𝑛 there is a unitary single-qubit operator 𝑉 such that 𝑈 can be implemented by a quantum circuit that uses 𝑉 and O(𝑛²) Pauli 𝑋, Hadamard, (inverse) 𝜋/8, standard 𝖢𝖭𝖮𝖳, ancillary, and erasure gates and four other single-qubit gates.
Since the elements of the Gray code 𝐺 are pairwise different, it follows that 𝑡⃗ is different from the first 𝑚 − 1 elements in the Gray code, which implies 𝑃 |𝑡⃗⟩ = |𝑡⃗⟩. Also, 𝑃 |𝑠⃗⟩ and |𝑡⃗⟩ differ in exactly one qubit. Let 𝑖 be its index. If 𝑡𝑖 = 1, then (4.10.15) holds. If 𝑡𝑖 = 0, then we replace 𝑃 by 𝑇𝑃 where 𝑇 = 𝖳𝖱𝖠𝖭𝖲^𝑐⃗ with 𝑐⃗ = (𝑡0, . . . , 𝑡𝑖−1, ∗, 𝑡𝑖+1, . . . , 𝑡𝑛−1). Note that 𝑃 is the product of at most 𝑛 transposition operators.
Next, assertion (2) is verified in Exercise 4.10.14.
Finally, we deduce the assertion of the theorem from (1) and (2) and Theorems 4.9.14 and 4.9.15. Since 𝑃 is the product of O(𝑛) transposition operators, it follows from Theorem 4.9.15 that 𝑃 and 𝑃∗ can be implemented by a quantum circuit that contains O(𝑛²) Pauli 𝑋, Hadamard, (inverse) 𝜋/8, standard 𝖢𝖭𝖮𝖳, ancillary, and erasure gates. Also, it follows from Theorem 4.9.14 that 𝐶^{𝐶0,𝐶1,𝑖}(𝑉) can be implemented by a quantum circuit that uses O(𝑛) Pauli 𝑋, Hadamard, (inverse) 𝜋/8, standard 𝖢𝖭𝖮𝖳, ancillary, and erasure gates and four other single-qubit gates. This concludes the proof of the theorem. □
of the theorem. □
Exercise 4.10.13. Show that the operator 𝑉 from the proof of Theorem 4.10.12 is uni-
tary.
Exercise 4.10.14. Show that in the proof of Theorem 4.10.12 assertion (2) is correct.
The main work in proving this theorem is to show the following theorem.
Theorem 4.11.2. The set containing the Hadamard and 𝜋/8 gates is universal for the
set of all unitary single-qubit operators.
Proof. We prove the assertion by induction on 𝑘. For the base case 𝑘 = 1 the assertion
is obviously true. For the inductive step, let 𝑘 > 1,
(4.11.2) 𝑈 = ∏_{𝑖=1}^{𝑘−1} 𝑈𝑖 , 𝑉 = ∏_{𝑖=1}^{𝑘−1} 𝑉𝑖 ,
≤ 𝐸(𝑈, 𝑉) + 𝐸(𝑈𝑘, 𝑉𝑘)   (3)
≤ ∑_{𝑖=1}^{𝑘} 𝐸(𝑈𝑖, 𝑉𝑖).   (4)
These equations and inequalities are valid for the following reasons: inequality
(1) uses the triangle inequality which holds by Proposition 2.2.25, equation (2) holds
because 𝑈 𝑘 is unitary, inequality (3) uses Definition 4.8.2, and inequality (4) follows
from an application of the induction hypothesis (4.11.3). □
We write
(4.11.9) 𝑛̂ = (𝑛𝑥, 𝑛𝑦, 𝑛𝑧), 𝑚̂ = (𝑚𝑥, 𝑚𝑦, 𝑚𝑧)
and use the following observation.
Lemma 4.11.4. For all 𝛾 ∈ ℝ we have
(4.11.10) 𝑅𝑚̂ (𝛾) = 𝐻𝑅𝑛̂ (𝛾)𝐻.
Now let
(4.11.14) 𝜃 = 2 arccos(cos²(𝜋/8)).
The next lemma shows that, up to a global phase factor, we can implement the rotation
operator 𝑅𝑛̂ (𝜃) by a circuit that uses only Hadamard and 𝜋/8 gates.
Lemma 4.11.5. We have 𝑅𝑛̂(𝜃) = 𝑒^{−𝑖𝜋/4} 𝑇𝐻𝑇𝐻.
This implies
(4.11.19) sin(𝜋/8) = sin(𝜃/2) / ‖𝑛⃗‖ .
So we obtain

𝑒^{−𝑖𝜋/4} 𝑇𝐻𝑇𝐻 = 𝑒^{−𝑖𝜋𝑍/8} 𝑒^{−𝑖𝜋𝑋/8}   (1)
= (cos(𝜋/8) 𝐼 − 𝑖 sin(𝜋/8) 𝑍)(cos(𝜋/8) 𝐼 − 𝑖 sin(𝜋/8) 𝑋)   (2)
= cos²(𝜋/8) 𝐼 − 𝑖 sin(𝜋/8) cos(𝜋/8) (𝑋 + 𝑍) − 𝑖 sin²(𝜋/8) 𝑍𝑋
= cos(𝜃/2) 𝐼 − 𝑖 sin(𝜃/2) (1/‖𝑛⃗‖)(cos(𝜋/8) 𝑋 + sin(𝜋/8) 𝑌 + cos(𝜋/8) 𝑍)   (3)
= 𝑅𝑛̂(𝜃).   (4)
In these equations we use the following arguments: equation (1) follows from (4.11.15) and (4.11.17), equation (2) is obtained from (4.3.3), equation (3) holds because of (4.11.14), 𝑍𝑋 = −𝑌 (see Theorem 4.1.2), and (4.11.19), and equation (4) is true because of (4.3.3). □
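This chain of equations can be checked numerically. The sketch below assumes the standard matrix representations 𝑇 = diag(1, 𝑒^{𝑖𝜋/4}) and the usual Pauli and Hadamard matrices, and verifies Lemma 4.11.5 with 𝑛⃗ = (cos(𝜋/8), sin(𝜋/8), cos(𝜋/8)) as in equation (3):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I = np.eye(2, dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])

theta = 2 * np.arccos(np.cos(np.pi / 8) ** 2)          # (4.11.14)
n = np.array([np.cos(np.pi / 8), np.sin(np.pi / 8), np.cos(np.pi / 8)])
n_hat = n / np.linalg.norm(n)

# rotation operator R_n(theta) = cos(theta/2) I - i sin(theta/2) (n . sigma)
R = (np.cos(theta / 2) * I
     - 1j * np.sin(theta / 2) * (n_hat[0] * X + n_hat[1] * Y + n_hat[2] * Z))

lhs = np.exp(-1j * np.pi / 4) * T @ H @ T @ H
assert np.allclose(lhs, R)                              # Lemma 4.11.5
```

The same data also confirms (4.11.19), i.e. sin(𝜋/8) = sin(𝜃/2)/‖𝑛⃗‖.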
To show that {𝐻, 𝑇} is universal for the set of all unitary single-qubit operators we
also need the following auxiliary results. Their proofs require some algebraic number
theory which is beyond the scope of this book. We refer to [IR10] which is an excellent
introduction to the subject.
Lemma 4.11.6. If 𝑣, 𝑢 ∈ ℤ with 𝑣 > 0, then 2 cos(𝑢𝜋/𝑣) is an algebraic integer.
Proof. We will show that for all 𝑣 ∈ ℕ and all 𝑦 ∈ ℝ, there exists a polynomial 𝑃𝑣 ∈
ℤ[𝑥] that is monic, has degree 𝑣, and satisfies
(4.11.20) 𝑃𝑣 (2 cos 𝑦) = 2 cos 𝑣𝑦.
So we have
(4.11.21) 𝑃𝑣(2 cos(𝑢𝜋/𝑣)) = 2 cos(𝑢𝜋)
which implies the assertion. The polynomials 𝑃𝑣 are constructed inductively. We set
𝑃0 (𝑥) = 2, 𝑃1 (𝑥) = 𝑥. Then (4.11.20) holds for 𝑣 = 0, 1. Also, for 𝑣 ≥ 1 we set
(4.11.22) 𝑃𝑣+1 (𝑥) = 𝑥𝑃𝑣 (𝑥) − 𝑃𝑣−1 (𝑥).
Assume that (4.11.20) holds for all 𝑣′ ≤ 𝑣. Then (A.5.8), (4.11.22), and the induction hypothesis imply
(4.11.23) 𝑃𝑣+1(2 cos 𝑦) = 2 cos 𝑦 ⋅ 𝑃𝑣(2 cos 𝑦) − 𝑃𝑣−1(2 cos 𝑦) = 4 cos 𝑦 cos 𝑣𝑦 − 2 cos(𝑣 − 1)𝑦 = 2 cos(𝑣 + 1)𝑦. □
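The recursion (4.11.22) is easy to implement and test numerically. In the Python sketch below, a polynomial is represented by its integer coefficient list, lowest degree first (our own convention):

```python
import numpy as np

def P(v):
    """Coefficients (lowest degree first) of the monic degree-v polynomial
    P_v in Z[x] with P_v(2 cos y) = 2 cos(v y), built via (4.11.22)."""
    p_prev, p_cur = [2], [0, 1]            # P_0(x) = 2, P_1(x) = x
    if v == 0:
        return p_prev
    for _ in range(v - 1):
        nxt = [0] + p_cur                  # x * P_v(x)
        for i, c in enumerate(p_prev):     # ... minus P_{v-1}(x)
            nxt[i] -= c
        p_prev, p_cur = p_cur, nxt
    return p_cur
```

For example, P(2) gives [−2, 0, 1], i.e. 𝑃2(𝑥) = 𝑥² − 2, and indeed (2 cos 𝑦)² − 2 = 2 cos 2𝑦.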
Lemma 4.11.7. The fraction 𝜃/𝜋 is irrational.

Proof. We have
(4.11.24) cos(𝜃/2) = cos²(𝜋/8) = (1/2)(cos(𝜋/4) + 1) = 1/2 + √2/4.
Now assume that 𝜃/(2𝜋) = 𝑢/𝑣 with 𝑢, 𝑣 ∈ ℤ, 𝑣 > 0. Then it follows from Lemma 4.11.6 that
(4.11.25) 2 cos(𝜃/2) = 2 cos(𝑢𝜋/𝑣)
is an algebraic integer. From (4.11.24) we see that 2 cos(𝜃/2) is a quadratic irrationality of norm
(4.11.26) 4 (1/2 + √2/4)(1/2 − √2/4) = 1 − 1/2 = 1/2 .
But this is not the norm of an algebraic integer, a contradiction. □
The next proposition shows that the rotation operator 𝑅𝑛̂(𝜃) can be used to approximate every other rotation about the 𝑛̂-axis with arbitrary precision.
Proposition 4.11.10. For all 𝜀 ∈ ℝ>0 and all 𝛾 ∈ ℝ there is 𝑘 ∈ ℕ such that
(4.11.38) 𝐸(𝑅𝑛̂(𝛾), 𝑅𝑛̂(𝜃)^𝑘) < 𝜀.
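Proposition 4.11.10 rests on the fact that, because 𝜃/𝜋 is irrational (Lemma 4.11.7), the angles 𝑘𝜃 mod 2𝜋 are dense in [0, 2𝜋). A quick numerical illustration; the target angle 𝛾, the precision, and the search bound are arbitrary demo values:

```python
import numpy as np

theta = 2 * np.arccos(np.cos(np.pi / 8) ** 2)

def circ_dist(a, b):
    """Distance between the angles a and b on the circle."""
    d = abs(a - b) % (2 * np.pi)
    return min(d, 2 * np.pi - d)

gamma, eps = 1.0, 1e-2
k = min(range(1, 100_000), key=lambda j: circ_dist(j * theta, gamma))
assert circ_dist(k * theta, gamma) < eps    # R_n(theta)^k approximates R_n(gamma)
```

In practice one does not search linearly, but this already exhibits an exponent 𝑘 with the property required by (4.11.38).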
For the proof of Theorem 4.11.12 we refer the reader to [NC16, Appendix 3].
has the following properties, which are mentioned in [Wat09] and which we have al-
ready used in Section 1.6.3.
(1) The encoding is sensible: every quantum circuit is encoded by at least one bit string and every bit string encodes at most one quantum circuit.
(2) The encoding is efficient: there is 𝑐 ∈ ℕ such that every quantum circuit 𝑄 of size 𝑁 has an encoding of length at least 𝑁 and at most 𝑁^𝑐. Information about the structure of a circuit must be computable in polynomial time from an encoding of the circuit.
(3) The length of every encoding of a quantum circuit is at least the size of the circuit.
The term “structure information” may, for example, refer to information regarding
the input qubits and the quantum gates used in quantum circuits, including the qubits
on which these gates operate.
Now, uniform quantum circuit families can be defined analogously to classical
uniform circuit families in Definition 1.6.7. Since, by Theorem 4.7.7, the computing
power of quantum circuits is the same as the computing power of classical circuits, it
follows that quantum computing is Turing complete.
Next, similar to classical circuit complexity theory, quantum complexity theory re-
quires P-uniform quantum circuit families. Let us provide a formal definition of them.
Definition 4.12.2. A quantum circuit family (𝑄𝑛 ) is called P-uniform if there is a de-
terministic polynomial time algorithm that on input of I𝑛 , 𝑛 ∈ ℕ, outputs an encoding
of 𝑄𝑛 .
Our next goal is to define quantum algorithms. A simple example of such an al-
gorithm is the quantum implementation of coinToss presented in Algorithm 4.12.3.
This algorithm uses the quantum circuit QcoinToss shown in Figure 4.12.1, where the
Hadamard operator is applied to |0⟩, followed by measuring the resulting state. The algorithm provides the measurement result, which can be 0 or 1, both occurring with equal probability 1/2.
Figure 4.12.1. The quantum circuit QcoinToss, where |𝜓⟩ is a single-qubit quantum state and 𝑏 ∈ {0, 1}.
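The behavior of QcoinToss for |𝜓⟩ = |0⟩ is easy to reproduce with a few lines of NumPy; the sampling step stands in for the measurement, and the seed is an arbitrary demo value:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
psi = H @ np.array([1.0, 0.0])        # H|0> = (|0> + |1>)/sqrt(2)
probs = np.abs(psi) ** 2              # Born rule: both outcomes have probability 1/2

rng = np.random.default_rng(7)
b = rng.choice([0, 1], p=probs)       # the quantum coin toss result
```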
This definition allows for the smooth transition of concepts and results from prob-
abilistic algorithms to quantum algorithms. For example, quantum Monte Carlo algo-
rithms and quantum Las Vegas algorithms can be defined in a straightforward manner.
Also, quantum Monte Carlo algorithms can be either error-free or not. Additionally,
there are quantum Bernoulli algorithms that correspond to error-free quantum Monte
Carlo algorithms. Lastly, quantum decision algorithms and their properties are analo-
gous to probabilistic decision algorithms discussed in Section 1.2.2.
Proof. The quantum circuit 𝑄′ is constructed from the quantum circuit 𝑄 by replac-
ing all unitary elementary gates 𝑉 with their controlled counterparts 𝐶 1 (𝑉). The ele-
mentary single-qubit gates are rotation gates, up to global phase factors. Therefore, it
suffices to consider rotation gates. Let 𝑤̂ ∈ ℝ3 be a unit vector, 𝛾 ∈ ℝ, and consider
the rotation operator 𝑉 = 𝑅𝑤̂ (𝛾). As demonstrated in Exercise 4.3.34, we can express
𝑉 as 𝑉 = 𝐴𝑋𝐵𝑋𝐶, where 𝐴 = 𝑅𝑤̂ (𝛾/2), 𝐵 = 𝑅𝑤̂ (−𝛾/2), and 𝐶 = 𝐼2 . Consequently, it
follows from Theorem 4.4.5 that 𝐶 1 (𝑉) can be implemented by a quantum circuit that
uses the three rotation gates 𝐴, 𝐵, and 𝐶 with known axis and angle of rotation, along
with two 𝖢𝖭𝖮𝖳 gates. There are two more unitary elementary gates: the 𝖢𝖭𝖮𝖳 gate and
the 𝖢𝖢𝖭𝖮𝖳 gate. It follows from Proposition 4.9.10 that the corresponding controlled
versions can be implemented using 𝑂(1) elementary gates. These arguments imply the
assertion of the theorem. □
4.12.4. Time and space complexity. The goal of this section is to introduce
the time and space complexity of quantum algorithms which are defined as probabilis-
tic algorithms that may invoke quantum subroutines. Therefore, we must define the
complexity of such subroutines. For this, we first explain the running time and space
requirements of quantum circuits.
Definition 4.12.10. Let 𝑄 be a quantum circuit.
(1) The running time or time complexity of 𝑄 is its size, i.e., the number of input qubits
plus the number of gates used by the circuit.
(2) The space complexity of 𝑄 is the number of input qubits plus the number of ancilla
qubits used by 𝑄.
(2) The space complexity of 𝐴 is the function qSpace ∶ ℕ → ℕ that sends an input
length 𝑛 ∈ ℕ to the maximum space complexity of a quantum circuit used in the
execution of 𝐴 with an input of length 𝑛.
It is important to observe that by Exercise 1.4.12 the value 2/3 in the definition of the quantum time complexities and the quantum complexity class BQP may be replaced by any real number in ]1/2, 1]. Also, we note that the following inclusions are satisfied.
Theorem 4.12.14. We have P ⊂ BPP ⊂ BQP ⊂ PSPACE.
Exercise 4.12.15. Sketch the proof of Theorem 4.12.14.
Chapter 5

The Algorithms of Deutsch and Simon
At first glance, this problem may appear trivial since, with just two queries to the black-box function 𝑓, we can determine the answer. However, it is essential to make a significant observation: there is no way to solve the Deutsch problem with fewer queries. To understand this, let us assume that the first query yields 𝑓(𝑏) = 𝑐, where both 𝑏 and 𝑐 are binary values. It is still possible that 𝑓(1 − 𝑏) = 𝑐, which implies 𝑓(0) ⊕ 𝑓(1) = 0, or that 𝑓(1 − 𝑏) = 1 − 𝑐, which implies 𝑓(0) ⊕ 𝑓(1) = 1. However, as we will explore in the next section, the quantum Deutsch algorithm
can solve this problem with just one query to a black-box that implements a unitary
operator closely related to the function 𝑓. This makes the Deutsch algorithm the first
algorithm capable of accomplishing something impossible in the classical world. In
doing so, it introduces crucial principles of quantum computing that are also leveraged
in more advanced quantum algorithms.
5.1.2. The quantum version and its solution. To explain the quantum version
of the Deutsch problem, we define the following unitary operator on ℍ2 :
(5.1.2) 𝑈 𝑓 ∶ ℍ2 → ℍ2 , |𝑥⟩ |𝑦⟩ ↦ |𝑥⟩ |𝑓(𝑥) ⊕ 𝑦⟩ = |𝑥⟩ 𝑋 𝑓(𝑥) |𝑦⟩ .
The quantum circuit that solves the Deutsch problem is shown in Figure 5.1.1.
It uses important ingredients of quantum computing: superposition, quantum paral-
lelism, phase kickback, and quantum interference. Now we describe the circuit step by
step and explain these ingredients.
The input state is
(5.1.3) |𝜓0 ⟩ = |0⟩ |1⟩ .
In the first step, the circuit applies the Hadamard operator to the first and the second
qubit which gives the state
(5.1.4) |𝜓1⟩ = |𝑥+⟩ |𝑥−⟩ = (|0⟩ |𝑥−⟩ + |1⟩ |𝑥−⟩)/√2 .
Figure 5.1.1. The quantum circuit that solves the Deutsch problem.
208 5. The Algorithms of Deutsch and Simon
This is an equally weighted superposition of the states |0⟩ |𝑥− ⟩ and |1⟩ |𝑥− ⟩.
Next, the phase kickback trick is used. We note that
(5.1.5) 𝑈 𝑓 |0⟩ |𝑥− ⟩ = |0⟩ 𝑋 𝑓(0) |𝑥− ⟩ = (−1)𝑓(0) |0⟩ |𝑥− ⟩
and
(5.1.6) 𝑈𝑓 |1⟩ |𝑥−⟩ = |1⟩ 𝑋^{𝑓(1)} |𝑥−⟩ = (−1)^{𝑓(1)} |1⟩ |𝑥−⟩ .
These quantum states have the global phase factors (−1)^{𝑓(0)} and (−1)^{𝑓(1)} . But since
global phase factors do not influence measurement outcomes, we cannot learn any-
thing about 𝑓(0) or 𝑓(1) from measuring these states or evolutions of them. However,
if we apply 𝑈𝑓 to the superposition |𝜓1⟩ we obtain

(5.1.7) |𝜓2⟩ = 𝑈𝑓 |𝑥+⟩ |𝑥−⟩ = 𝑈𝑓 ((|0⟩ + |1⟩)/√2) |𝑥−⟩ = (𝑈𝑓 |0⟩ |𝑥−⟩ + 𝑈𝑓 |1⟩ |𝑥−⟩)/√2 = (((−1)^{𝑓(0)} |0⟩ + (−1)^{𝑓(1)} |1⟩)/√2) |𝑥−⟩ .
In this operation the global phase factors (−1)𝑓(0) and (−1)𝑓(1) are kicked back to the
amplitudes of |0⟩ and |1⟩ in the first qubit. As we will see, this opens up the possibility
of gaining information about 𝑓(0) and 𝑓(1) through measurement of an evolution of
𝑈 𝑓 |𝑥+ ⟩ |𝑥− ⟩. Here, we also see quantum parallelism in action: one application of 𝑈 𝑓
changes both the amplitudes of |0⟩ and |1⟩.
Equation (5.1.7) implies

(5.1.8) |𝜓2⟩ = (−1)^{𝑓(0)} ((|0⟩ + (−1)^{𝑓(0)⊕𝑓(1)} |1⟩)/√2) |𝑥−⟩
= { (−1)^{𝑓(0)} |𝑥+⟩ |𝑥−⟩ if 𝑓(0) ⊕ 𝑓(1) = 0,
    (−1)^{𝑓(0)} |𝑥−⟩ |𝑥−⟩ if 𝑓(0) ⊕ 𝑓(1) = 1.
So up to the global phase factor (−1)𝑓(0) the quantum interference of the two states
𝑈 𝑓 |0⟩ |𝑥− ⟩ and 𝑈 𝑓 |1⟩ |𝑥− ⟩ causes the amplitude of |1⟩ in the first qubit to be
(−1)𝑓(0)⊕𝑓(1) while the amplitude of |0⟩ in the first qubit is independent of this value.
Measuring the first qubit in the basis |𝑥+ ⟩ and |𝑥− ⟩ would give the desired result. Since
measurement in the computational basis is used, the Hadamard operator is applied to
the first qubit. This gives
(5.1.9) |𝜓3 ⟩ = (𝐻 ⊗ 𝐼) |𝜓2 ⟩ = (−1)𝑓(0) |𝑓(0) ⊕ 𝑓(1)⟩ |𝑥− ⟩ .
This state is separable with respect to the decomposition into the subsystems that con-
tain the first and the second qubit, respectively. So it follows from Corollary 3.7.12
that measuring the first qubit of |𝜓3 ⟩ in the computational basis gives 𝑓(0) ⊕ 𝑓(1) with
probability 1.
We have thus proved the following theorem.
Theorem 5.1.4. The quantum circuit in Figure 5.1.1 gives 𝑓(0) ⊕ 𝑓(1) with probability
1. It uses the black-box 𝑈 𝑓 once and, in addition, three Hadamard gates.
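The complete circuit of Figure 5.1.1 can be simulated with small matrices. In the following Python sketch (the function names are ours), the basis state |𝑥⟩|𝑦⟩ is indexed by the integer 2𝑥 + 𝑦:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)

def U_f(f):
    """Matrix of |x>|y> -> |x>|f(x) XOR y> in the computational basis."""
    U = np.zeros((4, 4))
    for x in (0, 1):
        for y in (0, 1):
            U[2 * x + (f(x) ^ y), 2 * x + y] = 1
    return U

def deutsch(f):
    """Run the Deutsch circuit and return the measured first qubit."""
    psi = np.kron([1, 0], [0, 1])             # |psi0> = |0>|1>
    psi = np.kron(H, H) @ psi                 # |psi1>
    psi = U_f(f) @ psi                        # |psi2>, phase kickback
    psi = np.kron(H, I2) @ psi                # |psi3>
    p1 = abs(psi[2]) ** 2 + abs(psi[3]) ** 2  # P(first qubit = 1)
    return round(p1)                          # deterministic: 0 or 1
```

For each of the four functions 𝑓 ∶ {0, 1} → {0, 1}, the simulated circuit outputs 𝑓(0) ⊕ 𝑓(1), using 𝑈𝑓 exactly once.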
5.3. The Deutsch-Jozsa algorithm 209
So, while solving the classical Deutsch problem requires two applications of the
function 𝑓, the Deutsch quantum circuit only needs one application of 𝑈 𝑓 . The next
exercise generalizes the phase kickback trick.
Exercise 5.1.5. Consider a unitary single-qubit operator 𝑉 and an eigenstate |𝜓⟩ of
this operator.
(1) Show that applying the operator 𝑉 to |𝜓⟩ means applying a global phase shift to
this state.
(2) Show that applying the controlled-𝑉 operator 𝐶(𝑉) with the first qubit as a control
to the state |𝑥+ ⟩ |𝜓⟩ kicks the global phase shift back to the amplitude of |1⟩ in the
first qubit.
(3) Find the spherical coordinates of the points on the Bloch sphere corresponding
to the first qubit before and after the application of 𝐶(𝑉) to |𝑥+ ⟩ |𝜓⟩.
Exercise 5.3.2. Show that every deterministic algorithm that solves the classical
Deutsch-Jozsa problem requires 2𝑛−1 + 1 queries of the black-box implementing 𝑓.
5.3.2. The quantum version and its solution. The quantum version of the
Deutsch-Jozsa problem uses the operator
(5.3.2) 𝑈 𝑓 ∶ ℍ 𝑛 ⊗ ℍ1 → ℍ 𝑛 ⊗ ℍ1 , |𝑥⟩⃗ |𝑦⟩ ↦ |𝑥⟩⃗ |𝑓(𝑥)⃗ ⊕ 𝑦⟩ = |𝑥⟩⃗ 𝑋 𝑓(𝑥)⃗ |𝑦⟩
to find out whether 𝑓 is constant or balanced.
Exercise 5.3.4. Prove that 𝑈𝑓 defined in (5.3.2) is a unitary operator on ℍ𝑛 ⊗ ℍ1.
Figure 5.3.1. The quantum circuit QDJ (𝑛, 𝑈 𝑓 ) that solves the Deutsch-Jozsa problem.
and
(5.3.4) |𝑥⃗⟩ = (1/√2^𝑛) ∑_{𝑧⃗∈{0,1}^𝑛} (−1)^{𝑥⃗⋅𝑧⃗} 𝐻⊗𝑛 |𝑧⃗⟩ .
𝐻⊗𝑛 |𝑥⃗⟩ = (1/√2^𝑛) ( ∑_{𝑧0∈{0,1}} (−1)^{𝑥0𝑧0} |𝑧0⟩ ) ⊗ ⋯ ⊗ ( ∑_{𝑧𝑛−1∈{0,1}} (−1)^{𝑥𝑛−1𝑧𝑛−1} |𝑧𝑛−1⟩ )
(5.3.6) = (1/√2^𝑛) ∑_{(𝑧0,. . .,𝑧𝑛−1)∈{0,1}^𝑛} ((−1)^{𝑥0𝑧0} |𝑧0⟩) ⊗ ⋯ ⊗ ((−1)^{𝑥𝑛−1𝑧𝑛−1} |𝑧𝑛−1⟩)
= (1/√2^𝑛) ∑_{𝑧⃗∈{0,1}^𝑛} (−1)^{𝑥⃗⋅𝑧⃗} |𝑧⃗⟩ .
Therefore, the application of 𝑈𝑓 to |𝑥⃗⟩ |𝑥−⟩ modifies this state by the global phase factor (−1)^{𝑓(𝑥⃗)}. It follows from (5.3.9) that

(5.3.10) |𝜓2⟩ = 𝑈𝑓 |𝜓1⟩ = (1/√2^𝑛) ∑_{𝑥⃗∈{0,1}^𝑛} 𝑈𝑓 |𝑥⃗⟩ |𝑥−⟩ = (1/√2^𝑛) ∑_{𝑥⃗∈{0,1}^𝑛} (−1)^{𝑓(𝑥⃗)} |𝑥⃗⟩ |𝑥−⟩ .
Hence, applying 𝑈𝑓 to the superposition |𝜓1⟩ kicks the global phase shifts (−1)^{𝑓(𝑥⃗)} back to the amplitudes of all the states |𝑥⃗⟩ of the first 𝑛 qubits. In order to extract information about the function 𝑓 from this superposition, we note that by (5.3.4) we have
|𝜓2⟩ = ((−1)^{𝑓(0⃗)}/√2^𝑛) ∑_{𝑥⃗∈{0,1}^𝑛} (−1)^{𝑓(𝑥⃗)⊕𝑓(0⃗)} |𝑥⃗⟩ |𝑥−⟩
(5.3.11) = ((−1)^{𝑓(0⃗)}/2^𝑛) ∑_{𝑥⃗∈{0,1}^𝑛} (−1)^{𝑓(𝑥⃗)⊕𝑓(0⃗)} ∑_{𝑧⃗∈{0,1}^𝑛} (−1)^{𝑥⃗⋅𝑧⃗} 𝐻⊗𝑛 |𝑧⃗⟩ |𝑥−⟩
= ((−1)^{𝑓(0⃗)}/2^𝑛) ( ∑_{𝑧⃗∈{0,1}^𝑛} ( ∑_{𝑥⃗∈{0,1}^𝑛} (−1)^{𝑥⃗⋅𝑧⃗+𝑓(𝑥⃗)⊕𝑓(0⃗)} ) 𝐻⊗𝑛 |𝑧⃗⟩ ) |𝑥−⟩ .
This is the tensor product of a quantum state in ℍ𝑛 with |𝑥−⟩. The amplitude of the basis state 𝐻⊗𝑛 |0⟩⊗𝑛 in the state of the first 𝑛 qubits is
(5.3.12) ((−1)^{𝑓(0⃗)}/2^𝑛) ∑_{𝑥⃗∈{0,1}^𝑛} (−1)^{𝑓(𝑥⃗)⊕𝑓(0⃗)},
which has absolute value 1 if 𝑓 is constant and is 0 if 𝑓 is balanced. It follows from Corollary 3.7.12 that measuring the first 𝑛 qubits in the basis (𝐻⊗𝑛 |𝑧⃗⟩)_{𝑧⃗∈{0,1}^𝑛} gives with probability 1 the information whether 𝑓 is constant or balanced. Applying 𝐻⊗𝑛 to the first 𝑛 qubits gives
(5.3.13) |𝜓3⟩ = ((−1)^{𝑓(0⃗)}/2^𝑛) ( ∑_{𝑧⃗∈{0,1}^𝑛} ( ∑_{𝑥⃗∈{0,1}^𝑛} (−1)^{𝑥⃗⋅𝑧⃗+𝑓(𝑥⃗)⊕𝑓(0⃗)} ) |𝑧⃗⟩ ) |𝑥−⟩ .
Measuring the first 𝑛 qubits of |𝜓3⟩ in the computational basis gives 0⃗ with probability 1 if 𝑓 is constant and some 𝑧⃗ ≠ 0⃗ if 𝑓 is balanced. As desired, this measurement distinguishes with probability 1 between constant and balanced functions 𝑓 using 2𝑛 + 1 applications of the Hadamard operator 𝐻 and one application of 𝑈𝑓. This can be considered an exponential speedup over the best deterministic solution of the Deutsch-Jozsa problem.
Summarizing our discussion, we obtain the following theorem.
Theorem 5.3.7. Let 𝑛 ∈ ℕ, let 𝑓 ∶ {0, 1}𝑛 → {0, 1} be a function that is constant or
balanced, and let 𝑈 𝑓 be the unitary operator from (5.3.2). Then with probability 1 the
quantum circuit QDJ returns 0⃗ if 𝑓 is constant and 𝑥⃗ ∈ {0, 1}𝑛 , 𝑥⃗ ≠ 0,⃗ if 𝑓 is balanced. It
uses one 𝑈 𝑓 gate and 2𝑛 + 1 Hadamard gates.
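The statistics of QDJ can be reproduced by simulating the first register only: by (5.3.10), after 𝑈𝑓 its state has the amplitudes (−1)^{𝑓(𝑥⃗)}/√2^𝑛, to which 𝐻⊗𝑛 is then applied. A Python sketch, with basis states encoded as integers and our own function names:

```python
import numpy as np

def qdj_prob_zero(n, f):
    """Probability that the Deutsch-Jozsa circuit measures 0...0,
    simulating the first register only; f maps 0..2^n - 1 to {0, 1}."""
    N = 2 ** n
    amps = np.array([(-1.0) ** f(x) for x in range(N)]) / np.sqrt(N)
    Hn = np.array([[1.0]])
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    for _ in range(n):                       # build H tensor ... tensor H
        Hn = np.kron(Hn, H)
    return abs((Hn @ amps)[0]) ** 2

# constant f: the all-zero outcome occurs with probability 1
assert abs(qdj_prob_zero(3, lambda x: 0) - 1) < 1e-9
# balanced f (parity of x): the all-zero outcome never occurs
assert qdj_prob_zero(3, lambda x: bin(x).count("1") % 2) < 1e-9
```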
So, compared to the best deterministic algorithm for solving the Deutsch-Jozsa
problem, which by Exercise 5.3.2 requires 2𝑛−1 + 1 evaluations of the function 𝑓, the
Deutsch-Jozsa algorithm represents a dramatic asymptotic speedup. However, com-
pared to the probabilistic algorithm from Exercise 5.3.3, this advantage vanishes. This
is different for Simon’s algorithm, which is presented in the next section.
5.4. Simon’s algorithm 213
The next exercise gives a lower bound for solving Simon’s problem using a classical
deterministic algorithm.
Exercise 5.4.2. Let 𝐴 be a classical deterministic algorithm that solves Simon’s prob-
lem. Show that in the worst case, 𝐴 must query the black-box implementing 𝑓 at least
2𝑛−1 + 1 times.
We will see that the quantum algorithm for Simon’s problem is much more effi-
cient. But it is a probabilistic algorithm. Therefore, we must compare it with classical
probabilistic algorithms. A lower bound for their performance is given in the next the-
orem, which was proved by Richard Cleve in [Cle11].
Theorem 5.4.3. Any classical probabilistic algorithm that solves Simon’s problem with
probability at least 3/4 must make Ω(2𝑛/2 ) queries to the black-box for 𝑓.
5.4.2. The quantum version and its solution. As in the quantum Deutsch problem, in the quantum version of Simon's problem the function 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑛 is replaced by a unitary operator. This operator is
(5.4.1) 𝑈𝑓 ∶ ℍ𝑛 ⊗ ℍ𝑛 → ℍ𝑛 ⊗ ℍ𝑛 , |𝑥⃗⟩ |𝑦⃗⟩ ↦ |𝑥⃗⟩ |𝑓(𝑥⃗) ⊕ 𝑦⃗⟩ .
With this operator, the quantum version of Simon’s problem can be stated as follows.
Problem 5.4.4 (Simon’s problem — quantum version).
Input: A positive integer 𝑛, a black-box implementing the unitary operator 𝑈 𝑓 from
(5.4.1) for a function 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑛 such that there is a vector 𝑠 ⃗ ∈ {0, 1}𝑛 , 𝑠 ⃗ ≠ 0,⃗
with the property that for all 𝑥,⃗ 𝑦 ⃗ ∈ {0, 1}𝑛 we have 𝑓(𝑥)⃗ = 𝑓(𝑦)⃗ if and only if 𝑥⃗ = 𝑦 ⃗ or
𝑥⃗ = 𝑦 ⃗ ⊕ 𝑠.⃗
Output: The hidden string 𝑠.⃗
We explain the idea of Simon’s Algorithm 5.4.5. The details and proofs are given
below. Using the quantum circuit QSimon (𝑛, 𝑈 𝑓 ) from Figure 5.4.1 the algorithm se-
lects 𝑛 − 1 elements 𝑤⃗ 1 , . . . , 𝑤⃗ 𝑛−1 in the orthogonal complement
(5.4.2) 𝑠⟂⃗ = {𝑤⃗ ∈ {0, 1}𝑛 ∶ 𝑤⃗ ⋅ 𝑠 ⃗ = 0}
of 𝑠⃗. If the matrix 𝑊 = (𝑤⃗1, . . . , 𝑤⃗𝑛−1) has rank 𝑛 − 1, then the algorithm computes the uniquely determined nonzero solution of the linear system 𝑊T𝑥⃗ = 0⃗, which happens to be the hidden string 𝑠⃗.
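The classical post-processing step, solving 𝑊T𝑥⃗ = 0⃗ over 𝔽2, can be sketched as follows. Bit vectors are encoded as integers (bit 𝑖 of the integer is coordinate 𝑖), we assume the collected vectors 𝑤⃗ have rank 𝑛 − 1, and the function names are ours:

```python
def parity(x):
    return bin(x).count("1") % 2

def solve_simon(ws, n):
    """Return the nonzero x in {0,1}^n (as an integer) with w . x = 0 (mod 2)
    for all w in ws, assuming the ws have rank n - 1 over F_2."""
    pivots = {}                                  # leading-bit position -> row
    for w in ws:
        reduced = True                           # Gaussian elimination over F_2
        while reduced:
            reduced = False
            for c, row in pivots.items():
                if (w >> c) & 1:
                    w ^= row
                    reduced = True
        if w:
            pivots[w.bit_length() - 1] = w
    free = next(c for c in range(n) if c not in pivots)
    x = 1 << free                                # set the single free coordinate
    for c in sorted(pivots):                     # back-substitute, low bits first
        if parity(pivots[c] & x):
            x ^= 1 << c
    return x
```

For example, with 𝑛 = 4 and the three vectors 0001, 1000, 0110, all of which are orthogonal to 𝑠⃗ = 0110, solve_simon recovers the integer encoding of 0110.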
In the upcoming part of this section, we will adopt the notation from the quantum version of Simon's problem and prove the following theorem. In view of Theorem 5.4.3, it shows that Simon's algorithm offers an exponential speedup compared to any classical probabilistic algorithm for Simon's problem.
Theorem 5.4.6. Simon’s Algorithm 5.4.5 returns the hidden string 𝑠 ⃗ from Simon’s prob-
lem with probability at least 1/4. It requires 𝑛 − 1 applications of 𝑈 𝑓 and O(𝑛3 ) other
operations.
Figure 5.4.1. The quantum circuit QSimon (𝑛, 𝑈 𝑓 ) used in Simon’s algorithm.
Proposition 5.4.7. Let 𝑊 = (𝑤⃗1, . . . , 𝑤⃗𝑛−1) be of rank 𝑛 − 1. Then 𝑠⃗ is the uniquely determined nonzero solution of the linear system 𝑊T𝑥⃗ = 0⃗.
It follows from Proposition 5.4.7 that Simon’s algorithm returns the correct result
if the quantum circuit QSimon (𝑛, 𝑈 𝑓 ) in Figure 5.4.1 returns elements of 𝑠⟂⃗ . This is
what we will prove now. We need the following result.
Lemma 5.4.8. Let 𝑠⃗ ∈ {0, 1}𝑛 be nonzero. Then for all 𝑧⃗ ∈ {0, 1}𝑛 we have
(5.4.3) 𝐻⊗𝑛 ((|𝑧⃗⟩ + |𝑧⃗ ⊕ 𝑠⃗⟩)/√2) = (1/√2^{𝑛−1}) ∑_{𝑤⃗∈𝑠⃗⟂} (−1)^{𝑧⃗⋅𝑤⃗} |𝑤⃗⟩ .
Proof. It follows from Lemma 5.3.6 that for all 𝑧⃗ ∈ {0, 1}𝑛 we have

𝐻⊗𝑛 ((|𝑧⃗⟩ + |𝑧⃗ ⊕ 𝑠⃗⟩)/√2)
= (1/√2^{𝑛+1}) ∑_{𝑤⃗∈{0,1}^𝑛} ((−1)^{𝑧⃗⋅𝑤⃗} + (−1)^{(𝑧⃗⊕𝑠⃗)⋅𝑤⃗}) |𝑤⃗⟩
(5.4.4) = (1/√2^{𝑛+1}) ( ∑_{𝑤⃗∈𝑠⃗⟂} ((−1)^{𝑧⃗⋅𝑤⃗} + (−1)^{𝑧⃗⋅𝑤⃗⊕𝑠⃗⋅𝑤⃗}) |𝑤⃗⟩
+ ∑_{𝑤⃗∈{0,1}^𝑛⧵𝑠⃗⟂} ((−1)^{𝑧⃗⋅𝑤⃗} + (−1)^{𝑧⃗⋅𝑤⃗⊕𝑠⃗⋅𝑤⃗}) |𝑤⃗⟩ ).
If 𝑤⃗ ∈ 𝑠⃗⟂, then
(5.4.5) (−1)^{𝑧⃗⋅𝑤⃗} + (−1)^{𝑧⃗⋅𝑤⃗⊕𝑠⃗⋅𝑤⃗} = 2 ⋅ (−1)^{𝑧⃗⋅𝑤⃗}.
Also, if 𝑤⃗ ∈ {0, 1}^𝑛 ⧵ 𝑠⃗⟂, then
(5.4.6) (−1)^{𝑧⃗⋅𝑤⃗} + (−1)^{𝑧⃗⋅𝑤⃗⊕𝑠⃗⋅𝑤⃗} = 0.
Hence, it follows from (5.4.4) that
(5.4.7) 𝐻⊗𝑛 ((|𝑧⃗⟩ + |𝑧⃗ ⊕ 𝑠⃗⟩)/√2) = (2/√2^{𝑛+1}) ∑_{𝑤⃗∈𝑠⃗⟂} (−1)^{𝑧⃗⋅𝑤⃗} |𝑤⃗⟩ = (1/√2^{𝑛−1}) ∑_{𝑤⃗∈𝑠⃗⟂} (−1)^{𝑧⃗⋅𝑤⃗} |𝑤⃗⟩
as asserted. □
Proposition 5.4.9. The quantum circuit QSimon (𝑛, 𝑈 𝑓 ) from Figure 5.4.1 returns a uni-
formly distributed random element of 𝑠⟂⃗ .
Proof. The quantum circuit QSimon(𝑛, 𝑈𝑓) operates on a quantum system that consists of two quantum registers of length 𝑛, each of which is initialized to |0⟩⊗𝑛. So, we have
(5.4.8) |𝜓0⟩ = |0⟩⊗𝑛 |0⟩⊗𝑛 .
Then 𝐻⊗𝑛 is applied to the first register. It follows from (5.3.3) that this gives the quantum state
(5.4.9) |𝜓1⟩ = (1/√2^𝑛) ∑_{𝑧⃗∈{0,1}^𝑛} |𝑧⃗⟩ |0⟩⊗𝑛 .
It is an equally weighted superposition of the quantum states |𝑧⃗⟩ |0⟩⊗𝑛. Next, the algorithm applies 𝑈𝑓 to |𝜓1⟩ and produces the state
(5.4.10) |𝜓2⟩ = 𝑈𝑓 |𝜓1⟩ = (1/√2^𝑛) ∑_{𝑧⃗∈{0,1}^𝑛} |𝑧⃗⟩ |𝑓(𝑧⃗)⟩ .
(5.4.12) |𝜓2⟩ = (1/√2^{𝑛−1}) ∑_{𝑧⃗∈𝐼} ((|𝑧⃗⟩ + |𝑧⃗ ⊕ 𝑠⃗⟩)/√2) |𝑓(𝑧⃗)⟩ .
From Lemma 5.4.8 we obtain
(5.4.13) |𝜓2⟩ = (1/2^{𝑛−1}) ∑_{𝑧⃗∈𝐼} ∑_{𝑤⃗∈𝑠⃗⟂} (−1)^{𝑧⃗⋅𝑤⃗} 𝐻⊗𝑛 |𝑤⃗⟩ |𝑓(𝑧⃗)⟩ .
As shown in Exercise 5.4.10, measuring the first register of |𝜓3⟩ in the computational basis of ℍ𝑛 gives every 𝑤⃗ ∈ 𝑠⃗⟂ with probability 1/2^{𝑛−1}. □
Exercise 5.4.10. (1) Show that measuring the first register of |𝜓3⟩ in (5.4.14) in the computational basis of ℍ𝑛 gives every 𝑤⃗ ∈ 𝑠⃗⟂ with probability 1/2^{𝑛−1}.
(2) Analyze the modification of Simon’s algorithm where the second register is traced
out before the measurement.
Next, we analyze Algorithm 5.4.5. The algorithm invokes the quantum circuit
QSimon (𝑛, 𝑈 𝑓 ) 𝑛 − 1 times and constructs the matrix 𝑊. If the rank of the matrix
𝑊 is 𝑛 − 1, it follows from Proposition 5.4.7 that the hidden string 𝑠 ⃗ is the uniquely
determined nonzero solution of the linear system 𝑊 T 𝑥⃗ = 0.⃗ To estimate the success
probability of the algorithm, we use the following lemma.
Lemma 5.4.11. For all 𝑛 ∈ ℕ we have
(5.4.15) ∏_{𝑘=1}^{𝑛−1} (1 − 1/2^𝑘) ≥ 1/4 .
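A quick numerical sanity check of Lemma 5.4.11: the partial products decrease toward the limit ∏_{𝑘≥1}(1 − 2^{−𝑘}) ≈ 0.2888, which stays above 1/4.

```python
import numpy as np

# partial products of (5.4.15) for n = 2, ..., 59
vals = [np.prod([1 - 2.0 ** -k for k in range(1, n)]) for n in range(2, 60)]
assert all(v >= 0.25 for v in vals)
```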
Using Lemma 5.4.11 we now obtain the following estimate of the success proba-
bility of Simon’s algorithm.
Proposition 5.4.13. The success probability of Simon's algorithm is at least 1/4.

Proof. If 𝑛 = 1, then (5.4.18) holds. So, let 𝑛 > 1. We prove (5.4.18) by induction on 𝑗. In each iteration of the for loop, a random 𝑤⃗ ∈ 𝑠⃗⟂ is returned by QSimon(𝑛, 𝑈𝑓) according to the uniform distribution and is appended to the previous 𝑊. The matrix found in this way in the first iteration has rank 1 if 𝑤⃗ is nonzero. This happens with probability (2^{𝑛−1} − 1)/2^{𝑛−1} = 1 − 1/2^{𝑛−1} = 𝑝1. Now let 𝑗 ∈ ℕ, 1 < 𝑗 ≤ 𝑛 − 1, and assume that rank 𝑊𝑗−1 = 𝑗 − 1 and that this happens with probability 𝑝𝑗−1. We determine the probability that the vector 𝑤⃗ found in the 𝑗th iteration extends 𝑊𝑗−1 to a matrix of rank 𝑗. Let (𝑏⃗1, . . . , 𝑏⃗𝑛−1) be a basis of 𝑠⃗⟂ such that 𝑏⃗1, . . . , 𝑏⃗𝑗−1 are the row vectors of 𝑊𝑗−1, and let 𝑤⃗ = ∑_{𝑖=1}^{𝑛−1} 𝑤𝑖𝑏⃗𝑖 be the vector returned in the 𝑗th iteration of the for loop by QSimon(𝑛, 𝑈𝑓). This vector extends 𝑊𝑗−1 to a matrix of rank 𝑗 if and only if at least one of the coefficients 𝑤𝑖 of the basis elements 𝑏⃗𝑖 with 𝑗 ≤ 𝑖 ≤ 𝑛 − 1 is nonzero. This holds for 2^{𝑛−1} − 2^{𝑗−1} of the vectors in 𝑠⃗⟂. So, the probability of finding such a vector is
(5.4.19) 𝑝𝑗−1 (1 − 2^{𝑗−1}/2^{𝑛−1}) = 𝑝𝑗−1 (1 − 1/2^{𝑛−𝑗}) = 𝑝𝑗 .
Using this equation for 𝑗 = 𝑛 − 1 and Lemma 5.4.11, we see that the success probability of Simon's algorithm is at least 1/4. □
Proof. Clearly, the number of calls of QSimon(𝑛, 𝑈𝑓) is 𝑛 − 1. Since each application of QSimon(𝑛, 𝑈𝑓) uses O(𝑛) Hadamard gates, the total number of required Hadamard gates is O(𝑛²). Also, by Proposition B.7.17 the linear system 𝑊T𝑥⃗ = 0⃗ can be solved using O(𝑛²) operations. □
Now Theorem 5.4.6 follows from Propositions 5.4.7, 5.4.9, 5.4.13, and 5.4.14.
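The measurement statistics established in Proposition 5.4.9 and used in the analysis above can be simulated directly for small 𝑛. In the Python sketch below (the names and the seed are ours), the second register collapses onto a random pair {𝑧⃗, 𝑧⃗ ⊕ 𝑠⃗}, and the first register is then sampled from the distribution given by Lemma 5.4.8:

```python
import numpy as np

def parity(x):
    return bin(x).count("1") % 2

def qsimon_sample(n, s, rng):
    """One simulated run of QSimon(n, U_f): returns a uniformly distributed
    w in s-perp (bit vectors are encoded as integers, bit i = coordinate i)."""
    N = 2 ** n
    z = int(rng.integers(N))                    # measured pair {z, z XOR s}
    # H^n applied to (|z> + |z XOR s>)/sqrt(2), cf. Lemma 5.4.8
    amp = np.array([(-1.0) ** parity(z & w) + (-1.0) ** parity((z ^ s) & w)
                    for w in range(N)])
    p = amp ** 2 / np.sum(amp ** 2)
    return int(rng.choice(N, p=p))

rng = np.random.default_rng(0)
s = 0b101
samples = [qsimon_sample(3, s, rng) for _ in range(200)]
assert all(parity(w & s) == 0 for w in samples)  # every sample lies in s-perp
```

Collecting 𝑛 − 1 such samples and solving 𝑊T𝑥⃗ = 0⃗ as described above then recovers 𝑠⃗ with probability at least 1/4.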
The subspace 𝑆 is also referred to as a hidden subgroup of {0, 1}𝑛. In Simon's original problem, we have 𝑆 = {0⃗, 𝑠⃗}. Finding a hidden subgroup is also the key step in Shor's factoring and discrete logarithm algorithms. They are discussed in Chapter 6.
The quantum version of the general Simon’s problem is the following.
Problem 5.5.2 (General Simon’s problem — quantum version).
Input: A black-box implementing the unitary operator 𝑈 𝑓 from (5.4.1) for a function
𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑛 with the following property. There is a linear subspace 𝑆 of {0, 1}𝑛
such that for all 𝑥,⃗ 𝑦 ⃗ ∈ {0, 1}𝑛 we have 𝑓(𝑥)⃗ = 𝑓(𝑦)⃗ if and only if 𝑥⃗ = 𝑦 ⃗ ⊕ 𝑠 ⃗ for some
𝑠 ⃗ ∈ 𝑆. The dimension 𝑚 of 𝑆 is also an input.
Output: A basis of 𝑆.
Using the quantum circuit QSimon(𝑛, 𝑈𝑓) from Figure 5.4.1, the algorithm selects 𝑛 − 𝑚 elements 𝑤⃗1, . . . , 𝑤⃗𝑛−𝑚 in the orthogonal complement
(5.5.1) 𝑆⟂ = {𝑤⃗ ∈ {0, 1}𝑛 ∶ 𝑤⃗ ⋅ 𝑠⃗ = 0 for all 𝑠⃗ ∈ 𝑆}
of 𝑆 randomly with the uniform distribution. This set is a linear subspace of {0, 1}𝑛 of dimension 𝑛 − 𝑚. If the matrix 𝑊 = (𝑤⃗1, . . . , 𝑤⃗𝑛−𝑚) has rank 𝑛 − 𝑚, then the kernel of 𝑊T is equal to the subspace 𝑆, and the algorithm returns a basis of this kernel.
In the remainder of this section, we will prove the following theorem.
Theorem 5.5.3. Algorithm 5.5.4 returns a basis of the hidden subgroup 𝑆 from the generalization of Simon's problem with probability at least 1/4. It uses 𝑛 − 𝑚 applications of 𝑈𝑓 and O(𝑛³) other operations.
The proof of Theorem 5.5.3 is analogous to the proof of Theorem 5.4.6. Therefore,
the proofs of the corresponding results are left to the reader as exercises. We start by
determining the structure of 𝑆 ⟂ .
Proposition 5.5.5. Let 𝑊 = (𝑤⃗1, . . . , 𝑤⃗𝑛−𝑚) be of rank 𝑛 − 𝑚. Then 𝑆 is the kernel of 𝑊T.
Exercise 5.5.6. Prove Proposition 5.5.5.
It follows from Proposition 5.5.5 that Algorithm 5.5.4 returns the correct result if the quantum circuit QSimon(𝑛, 𝑈𝑓) in Figure 5.4.1 returns elements of 𝑆⟂. This is what we will prove now. For this, we need the following lemma.
Lemma 5.5.7. Let 𝑧⃗ ∈ {0, 1}𝑛 and set
(5.5.2) |𝑧⃗ ⊕ 𝑆⟩ = (1/√2^𝑚) ∑_{𝑠⃗∈𝑆} |𝑧⃗ ⊕ 𝑠⃗⟩ .
Then we have
(5.5.3) 𝐻⊗𝑛 |𝑧⃗ ⊕ 𝑆⟩ = (1/√2^{𝑛−𝑚}) ∑_{𝑤⃗∈𝑆⟂} (−1)^{𝑧⃗⋅𝑤⃗} |𝑤⃗⟩ .
Finally, Theorem 5.5.3 is proved in the next exercise using Propositions 5.5.5 and
5.5.9.
Exercise 5.5.11. Prove Theorem 5.5.3.
Chapter 6
In this chapter, we present the algorithms that Peter Shor first introduced in 1994
[Sho94], causing a significant stir in the cybersecurity domain. These are quantum
algorithms that have the remarkable ability to compute integer factorizations and dis-
crete logarithms in polynomial time. The intractability of these problems for large pa-
rameters forms the foundation of security in the most commonly used public-key cryp-
tography, a pivotal pillar of overall cybersecurity, particularly internet security. Due
to their profound significance in IT security, the Shor algorithms stand as the most
renowned quantum algorithms. Their invention spurred the inception of the highly
active research field known as Post-Quantum Cryptography.
The chapter starts by presenting the idea of the Shor factoring algorithm. Then, the
most important tools used by Shor's algorithms, the Quantum Fourier Transform and
quantum circuits for efficiently implementing it and its inverse, are explained. Subse-
quently, we show how the Quantum Fourier Transform is used to solve the quantum
phase estimation problem which addresses the challenge of approximating the phase
of the eigenvalue of a unitary operator when an associated eigenstate of this operator
is known. Quantum phase estimation is then applied to finding the order of elements
in the multiplicative group modulo a positive integer in polynomial time. For this,
a quantum variant of the well-known fast exponentiation technique is essential. We
then show how efficient order finding enables integer factorization in polynomial time.
Also, we show how quantum phase estimation and quantum fast exponentiation lead
to a polynomial time algorithm for discrete logarithms. We conclude the chapter by dis-
cussing the hidden subgroup problem and demonstrating that several computational
problems in this and the previous chapter can be viewed as instances of this problem.
As usual, we identify linear operators on the state spaces ℍ𝑛 , 𝑛 ∈ ℕ, with their rep-
resentation matrices with respect to the computational basis of ℍ𝑛 . Furthermore, the
complexity analyses of this chapter assume that all quantum circuits are constructed
using the elementary quantum gates provided by the platform discussed in Section
4.12.2.
So, our primary objective is now to determine the order of 𝑎 modulo 𝑁. To achieve this, the Shor algorithm uses a precision parameter 𝑛 and a unitary operator 𝑈_𝑎 on ℍₙ with the following property. There is an orthonormal sequence (|𝑢_𝑘⟩)_{𝑘∈ℤ_𝑟} of eigenstates with corresponding eigenvalue sequence (𝑒^{2𝜋𝑖𝑘/𝑟})_{𝑘∈ℤ_𝑟}. The Shor algorithm then finds an integer 𝑥 ∈ ℤ_{2ⁿ} such that 2𝜋𝑥/2ⁿ is close to the phase 2𝜋𝑘/𝑟 of the eigenvalue associated to |𝑢_𝑘⟩ for some 𝑘 ∈ ℤ_𝑟. Then the continued fraction algorithm applied to 𝑥/2ⁿ is used to determine the denominator 𝑟 which is the order of 𝑎 modulo 𝑁.
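The continued fraction step can be illustrated with Python's standard library (illustrative numbers, not the book's algorithm; the ideal measurement outcome is simulated directly):

```python
# A sketch of the classical step: recover the order r as the denominator
# of the best rational approximation to x / 2^n.
from fractions import Fraction

N = 21                      # modulus (illustrative)
r = 6                       # order of a = 2 modulo 21, to be recovered
n = 9                       # precision: 2^n = 512 >= 2 * r^2
k = 5                       # the (unknown) eigenstate index
x = round(2**n * k / r)     # ideal measurement outcome; x / 2^n is close to k / r

approx = Fraction(x, 2**n).limit_denominator(N - 1)
print(approx)               # 5/6: the denominator reveals r = 6
```

`limit_denominator` returns the closest fraction with bounded denominator, which is exactly the convergent singled out by the continued fraction analysis below.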
To approximate one of the phases 2𝜋𝑘/𝑟, Shor's algorithm employs the quantum phase estimation algorithm. It can find an approximation to the phase of the eigenvalue of a given unitary operator 𝑈. However, it is important to note that this algorithm requires a corresponding eigenstate as an input. In general, preparing such an eigenstate efficiently is impossible. But we will demonstrate in Proposition 6.4.5 that (1/√𝑟) ∑_{𝑘=0}^{𝑟−1} |𝑢_𝑘⟩ = |1⟩ₙ and, therefore, this superposition can be efficiently prepared. Consequently, the algorithm of Shor applies the quantum phase estimation algorithm to this superposition, obtaining an approximation for one of the phases 2𝜋𝑘/𝑟. Since the primary interest lies in determining the denominator 𝑟 of these phases, this is sufficient.
Before we can explain how quantum phase estimation operates, it is essential to
introduce its primary component: the Quantum Fourier Transform.
to identify the strings in {0, 1}𝑛 with the integers 𝑥 ∈ ℤ2𝑛 . Using this identification we
write
it follows that |𝜓𝑛 (𝜔)⟩ is a quantum state in ℍ𝑛 . We now give another representation
of this state.
Proof. We have
(6.2.6)
⨂_{𝑗=0}^{𝑛−1} (|0⟩ + 𝑒^{2𝜋𝑖⋅2^{𝑛−𝑗−1}𝜔} |1⟩)/√2
= (1/√2ⁿ) ∑_{𝑦⃗=(𝑦₀,. . .,𝑦_{𝑛−1})∈{0,1}ⁿ} ∏_{𝑗=0}^{𝑛−1} 𝑒^{2𝜋𝑖𝑦_𝑗2^{𝑛−𝑗−1}𝜔} |𝑦⃗⟩
= (1/√2ⁿ) ∑_{𝑦⃗=(𝑦₀,. . .,𝑦_{𝑛−1})∈{0,1}ⁿ} 𝑒^{2𝜋𝑖(∑_{𝑗=0}^{𝑛−1} 𝑦_𝑗2^{𝑛−𝑗−1})𝜔} |𝑦⃗⟩
= (1/√2ⁿ) ∑_{𝑦=0}^{2ⁿ−1} 𝑒^{2𝜋𝑖𝑦𝜔} |𝑦⟩ₙ
= |𝜓ₙ(𝜔)⟩ . □
Example 6.2.4. The alternative representation of the state 𝜓₂(1/4), which was already considered in Example 6.2.2, is
𝜓₂(1/4) = (|0⟩ + 𝑒^{2𝜋𝑖⋅2¹⋅(1/4)} |1⟩)/√2 ⊗ (|0⟩ + 𝑒^{2𝜋𝑖⋅2⁰⋅(1/4)} |1⟩)/√2
= (|0⟩ − |1⟩)/√2 ⊗ (|0⟩ + 𝑖 |1⟩)/√2 .
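This example can be checked numerically; the following sketch (plain complex arithmetic, not from the text) compares the sum representation of 𝜓₂(1/4) with the tensor product form:

```python
# Numerical check of Example 6.2.4: psi_2(1/4) written as a sum over
# basis states equals the tensor product of single-qubit factors.
import cmath

n, omega = 2, 0.25

# Definition: (1/sqrt(2^n)) * sum_y e^{2*pi*i*y*omega} |y>_n
direct = [cmath.exp(2j * cmath.pi * y * omega) / 2 for y in range(4)]

# Product form: factor j carries the phase 2^(n-j-1) * omega on |1>.
def factor(j):
    return [1 / cmath.sqrt(2),
            cmath.exp(2j * cmath.pi * 2**(n - j - 1) * omega) / cmath.sqrt(2)]

f0, f1 = factor(0), factor(1)
product = [f0[y0] * f1[y1] for y0 in (0, 1) for y1 in (0, 1)]  # y = 2*y0 + y1

assert all(abs(a - b) < 1e-12 for a, b in zip(direct, product))
print("both representations agree")
```

The index convention 𝑦 = 2𝑦₀ + 𝑦₁ matches the exponent ∑_𝑗 𝑦_𝑗2^{𝑛−𝑗−1} in (6.2.6).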
Proof. The representation matrices of QFTₙ and QFTₙ* with respect to the computational basis of ℍₙ are
(6.2.10) QFTₙ = (𝑒^{2𝜋𝑖𝑥𝑦/2ⁿ}/√2ⁿ)_{𝑥,𝑦∈ℤ_{2ⁿ}}
and
(6.2.11) QFTₙ* = (𝑒^{−2𝜋𝑖𝑦𝑧/2ⁿ}/√2ⁿ)_{𝑦,𝑧∈ℤ_{2ⁿ}} .
Let 𝑥, 𝑧 ∈ ℤ_{2ⁿ}. Then the entry in row 𝑥 and column 𝑧 of the matrix products QFTₙ ⋅ QFTₙ* and QFTₙ* ⋅ QFTₙ is
(6.2.12) (1/2ⁿ) ∑_{𝑦=0}^{2ⁿ−1} 𝑒^{2𝜋𝑖𝑥𝑦/2ⁿ} 𝑒^{−2𝜋𝑖𝑦𝑧/2ⁿ} = { 1 if 𝑥 = 𝑧; (1/2ⁿ)(1 − 𝑒^{2𝜋𝑖(𝑥−𝑧)})/(1 − 𝑒^{2𝜋𝑖(𝑥−𝑧)/2ⁿ}) = 0 if 𝑥 ≠ 𝑧 } .
Hence, we have QFTₙ ⋅ QFTₙ* = QFTₙ* ⋅ QFTₙ = 𝐼ₙ. So QFTₙ is unitary and (6.2.9) follows from (6.2.11). □
In particular, we have
(6.2.14) QFTₙ |0⟩ₙ = ⨂_{𝑗=0}^{𝑛−1} (|0⟩ + |1⟩)/√2 = 𝐻^{⊗𝑛} |0⟩^{⊗𝑛} .
So the Hadamard operators can be used to construct a quantum circuit that creates the equally weighted superposition of all computational basis states of ℍₙ.
We use Proposition 6.2.3 and (6.2.14) to give alternative formulas for QFTₙ and QFTₙ^{−1}. For 𝑚 ∈ ℕ and (𝑏₀, . . . , 𝑏_{𝑚−1}) ∈ {0, 1}^𝑚 we write
(6.2.16) 0.𝑏₀𝑏₁ ⋯ 𝑏_{𝑚−1} = ∑_{𝑖=0}^{𝑚−1} 𝑏_𝑖 2^{−𝑖−1} .
and
(6.2.18) QFTₙ^{−1} |𝑥⟩ₙ = ⨂_{𝑗=0}^{𝑛−1} (|0⟩ + 𝑒^{−2𝜋𝑖⋅0.𝑥_{𝑛−𝑗−1}𝑥_{𝑛−𝑗}⋯𝑥_{𝑛−1}} |1⟩)/√2 .
[Figure: each input qubit |𝑥_𝑗⟩ passes through 𝐻 followed by controlled rotations 𝑅₂, . . . , 𝑅_{𝑛−𝑗}, producing (|0⟩ + 𝑒^{2𝜋𝑖⋅0.𝑥_𝑗⋯𝑥_{𝑛−1}} |1⟩)/√2 on each output wire.]
Figure 6.2.1. A quantum circuit that computes QFTₙ up to a permutation that reverses the order of the output qubits.
Next, we present a quantum circuit that computes the Quantum Fourier Trans-
form. Figure 6.2.1 shows this quantum circuit up to a permutation that reverses the
order of the output qubits and Algorithm 6.2.12 is its complete specification.
Proof. Let 𝑗 ∈ {0, . . . , 𝑛 − 1}. We explain the evolution of the input qubit |𝑥𝑗 ⟩ in
the quantum circuit shown in Figure 6.2.1. First, the quantum circuit applies the
Hadamard operator to |𝑥𝑗 ⟩. This has the following effect on this qubit which is shown
in Figure 6.2.2:
(6.2.19) 𝐻 |𝑥_𝑗⟩ = (|0⟩ + 𝑒^{2𝜋𝑖⋅0.𝑥_𝑗} |1⟩)/√2 .
This operation does not change the other qubits.
[Figure 6.2.2: 𝐻 maps |𝑥_𝑗⟩ to (|0⟩ + 𝑒^{2𝜋𝑖⋅0.𝑥_𝑗} |1⟩)/√2.]
[Figure 6.2.3: controlled by |𝑥_𝑘⟩, the gate 𝑅_{𝑘+1} maps (|0⟩ + 𝑒^{2𝜋𝑖⋅0.𝑥_𝑗⋯𝑥_{𝑘−1}} |1⟩)/√2 to (|0⟩ + 𝑒^{2𝜋𝑖⋅0.𝑥_𝑗⋯𝑥_𝑘} |1⟩)/√2 and leaves |𝑥_𝑘⟩ unchanged.]
Figure 6.2.3. The controlled-𝑅_{𝑘+1} operator kicks the phase 2𝜋𝑥_𝑘/2^{𝑘+1} back to the amplitude of |1⟩.
We estimate the size of the quantum circuit. It can be seen in Figure 6.2.1 that O(𝑛²) elementary quantum gates and controlled-𝑅_𝑘 gates are applied to the 𝑛 input qubits before the order of the output qubits is reversed. By Corollary 4.12.8, the implementation of the controlled-𝑅_𝑘 gates requires O(1) elementary gates. So, the size of this part of the quantum circuit is O(𝑛²). By Proposition 4.5.2, the reversal of the output qubits requires another O(𝑛) elementary quantum gates. This shows that the total size of the circuit is O(𝑛²). □
Exercise 6.2.14. Find a quantum P-uniform circuit family (𝑄ₙ) such that for all 𝑛 ∈ ℕ and all (𝑏₀, . . . , 𝑏_{𝑛−1}) ∈ {0, 1}ⁿ
Exercise 6.2.15. Verify that the quantum circuits in Figures 6.2.2 and 6.2.3 have the
asserted outputs.
Proof. The quantum circuit in Figure 6.2.4 is the inverse of the quantum circuit in
Figure 6.2.1. □
[Figure 6.2.4: each input qubit |𝑥_𝑗⟩ passes through the adjoint rotations 𝑅*ₙ, . . . , 𝑅*₂ followed by 𝐻, producing (|0⟩ + 𝑒^{−2𝜋𝑖⋅0.𝑥_𝑗⋯𝑥_{𝑛−1}} |1⟩)/√2 on each output wire.]
6.3.1. The idea. We start by explaining the idea of the phase estimation algorithm which is shown in Figure 6.3.1. Using the Hadamard operator and the controlled-𝑈 operator, the algorithm constructs the state |𝜓₃⟩ = |𝜓ₙ(𝜔)⟩. If 𝜔 = 𝑥/2ⁿ for some 𝑥 ∈ ℤ_{2ⁿ}, then it follows from (6.2.7) that |𝜓₄⟩ = QFTₙ^{−1} |𝜓ₙ(𝜔)⟩ = |𝑥⟩. So measuring this state gives 𝑥 with probability 1. We will show that if 𝜔 is not of this form, then measuring |𝜓₄⟩ gives with high probability an 𝑥 ∈ ℤ_{2ⁿ} such that 𝑥/2ⁿ is a good approximation to 𝜔.
6.3.2. Approximating |𝜓ₙ(𝜔)⟩. As discussed in Section 6.3.1, the phase estimation algorithm constructs and measures QFTₙ^{−1} |𝜓ₙ(𝜔)⟩. The measurement outcome is 𝑥 ∈ ℤ_{2ⁿ} such that 𝑥/2ⁿ approximates the phase 𝜔. It is important to note that 𝑥/2ⁿ falls within the interval [0, 1[, whereas 𝜔 may be any real number. However, adding integers to 𝜔 does not alter |𝜓ₙ(𝜔)⟩. This justifies the subsequent definition, allowing us to quantify the accuracy of this approximation.
The quantity Δ(𝜔, 𝑛, 𝑥) from Definition 6.3.1 has the following properties.
Proof. The first assertion follows from the periodicity of the function 𝑓(𝑦) = 𝑒2𝜋𝑖𝑦
modulo 1. The second claim follows from the definition of Δ(𝜔, 𝑛, 𝑥). □
In several situations we are interested in estimating 𝜔 − 𝑥/2ⁿ instead of Δ(𝜔, 𝑛, 𝑥). The next lemma shows when the first expression can be replaced by the second.
Proof. Let
(6.3.4) 𝑧 = ⌊𝜔 − 𝑥/2ⁿ⌉ .
Then we have
(6.3.5) Δ(𝜔, 𝑛, 𝑥) = 𝜔 − 𝑥/2ⁿ − 𝑧
which implies
(6.3.6) 𝑧 = 𝜔 − 𝑥/2ⁿ − Δ(𝜔, 𝑛, 𝑥) .
So the first inequality in (6.3.2) implies
(6.3.7) 𝜔 − 𝑥/2ⁿ − 1/2ⁿ < 𝑧 < 𝜔 − 𝑥/2ⁿ + 1/2ⁿ .
Using 𝑥 ∈ ℤ_{2ⁿ} and the inequalities for 𝜔 in (6.3.2) we obtain
(6.3.8) −1 ≤ −1/2ⁿ − 𝑥/2ⁿ < 𝑧 < 1 − 𝑥/2ⁿ
which implies 𝑧 = 0. □
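Assuming the rounding convention visible in (6.3.4), the quantity Δ can be sketched in code as follows (illustrative values, not from the text):

```python
# Delta(omega, n, x) reduces omega - x/2^n to the nearest integer, so it
# measures the distance of x/2^n from omega modulo 1.
def delta(omega, n, x):
    d = omega - x / 2**n
    return d - round(d)

# Wrap-around case: omega = 0.99 is close to 0/2^4 modulo 1.
assert abs(delta(0.99, 4, 0) + 0.01) < 1e-12
# Lemma 6.3.3 case: 0 <= omega < 1 and |Delta| < 1/2^n give z = 0,
# so Delta equals omega - x/2^n itself.
assert delta(0.3, 4, 5) == 0.3 - 5 / 2**4
print("Delta behaves as in Lemma 6.3.3")
```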
Finally, we prove the following statement which is crucial in the analysis of the
phase estimation algorithm.
Proposition 6.3.4. Let 𝑛 ∈ ℕ and 𝜔 ∈ ℝ. For 𝑥 ∈ ℤ2𝑛 denote by 𝑝(𝑥) the probability
−1
that 𝑥 is the outcome of measuring QFT𝑛 |𝜓𝑛 (𝜔)⟩ in the computational basis of ℍ𝑛 . Then
the following hold.
(1) If 2𝑛 𝜔 ∈ ℤ, then for 𝑥 = 2𝑛 𝜔 mod 2𝑛 we have 𝑝(𝑥) = 1 and 𝑝(𝑥) = 0 for all other
integers 𝑥 ∈ ℤ2𝑛 .
(2) If 2ⁿ𝜔 ∉ ℤ, then for all 𝑥 ∈ ℤ_{2ⁿ} we have
(6.3.9) 𝑝(𝑥) = (1/2^{2𝑛}) ⋅ sin²(2ⁿ𝜋Δ(𝜔, 𝑛, 𝑥)) / sin²(𝜋Δ(𝜔, 𝑛, 𝑥)) .
Proof. Set 𝑁 = 2ⁿ and Δ = Δ(𝜔, 𝑛, 𝑥). From Definition 6.2.1, Proposition 6.2.8, and Lemma 6.3.2 we obtain
(6.3.10)
QFTₙ^{−1} |𝜓ₙ(𝜔)⟩ = (1/√𝑁) ∑_{𝑦=0}^{𝑁−1} 𝑒^{2𝜋𝑖𝜔𝑦} QFTₙ^{−1} |𝑦⟩ₙ
= (1/√𝑁) ∑_{𝑦=0}^{𝑁−1} 𝑒^{2𝜋𝑖𝜔𝑦} (1/√𝑁) ∑_{𝑥=0}^{𝑁−1} 𝑒^{−2𝜋𝑖𝑥𝑦/𝑁} |𝑥⟩ₙ
= ∑_{𝑥=0}^{𝑁−1} ((1/𝑁) ∑_{𝑦=0}^{𝑁−1} 𝑒^{2𝜋𝑖Δ𝑦}) |𝑥⟩ₙ .
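The closed form (6.3.9) can be checked against the amplitude sum in (6.3.10) numerically (a sketch with illustrative parameters):

```python
# Compare |(1/N) sum_y e^{2*pi*i*Delta*y}|^2 with the sine quotient (6.3.9).
import cmath, math

def delta(omega, n, x):
    d = omega - x / 2**n
    return d - round(d)

n, omega = 3, 0.37          # 2^n * omega is not an integer
N = 2**n
for x in range(N):
    D = delta(omega, n, x)
    amp = sum(cmath.exp(2j * cmath.pi * D * y) for y in range(N)) / N
    formula = math.sin(N * math.pi * D)**2 / (N**2 * math.sin(math.pi * D)**2)
    assert abs(abs(amp)**2 - formula) < 1e-12
print("(6.3.9) confirmed for n = 3, omega = 0.37")
```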
6.3.3. The problem and the algorithm. We now state the phase estimation
problem which is also called the eigenvalue estimation problem.
Problem 6.3.5 (Phase estimation problem).
Input: Positive integers 𝑚 and 𝑛, an implementation of the controlled-𝑈 operator for
some unitary operator 𝑈 on ℍ𝑚 , and an eigenstate |𝜓⟩ of 𝑈.
[Figure 6.3.1: the control register of 𝑛 qubits |0⟩ is sent through 𝐻^{⊗𝑛}; the controlled powers 𝑈^{2^{𝑛−1}}, 𝑈^{2^{𝑛−2}}, . . . , 𝑈², 𝑈 act on the target register initialized to |𝜓⟩; QFTₙ^{−1} is applied to the control register, which is measured to give 𝑥, while the target register is traced out.]
Output: An integer 𝑥 ∈ ℤ_{2ⁿ} such that |Δ(𝜔, 𝑛, 𝑥)| < 1/2ⁿ where 𝜔 ∈ ℝ and 𝑒^{2𝜋𝑖𝜔} is
the eigenvalue associated with |𝜓⟩.
A quantum circuit that solves this problem is shown in Figure 6.3.1 and specified
in Algorithm 6.3.6.
The next theorem states the correctness of the phase estimation algorithm and its success probability.
Theorem 6.3.7. Let 𝑚, 𝑛 be positive integers, and let 𝑈 be a unitary operator on ℍ𝑚 . Let
𝜔 ∈ ℝ be such that 𝑒2𝜋𝑖𝜔 is an eigenvalue of 𝑈 and let |𝜓⟩ be a corresponding eigenstate.
Also, let 𝑥 ∈ ℤ2𝑛 be the return value of the phase estimation algorithm. Then the following
hold.
Proof. The quantum circuit operates on two quantum registers. The first is the control
⊗𝑛
register. Its length is the precision parameter 𝑛 and is initialized to |0⟩ . The second
register is the target register. It is of length 𝑚 and is initialized with the eigenvector |𝜓⟩
of 𝑈. So, the initial state of the algorithm is
(6.3.15) |𝜓₀⟩ = |0⟩^{⊗𝑛} |𝜓⟩ .
The algorithm then applies 𝐻^{⊗𝑛} to the control register. This gives the state
(6.3.16) |𝜓₁⟩ = ((|0⟩ + |1⟩)/√2)^{⊗𝑛} |𝜓⟩ .
Next, note that for 𝑗 ∈ {0, . . . , 𝑛 − 1} we have
(6.3.17) 𝑈^{2^{𝑛−𝑗−1}} |𝜓⟩ = 𝑒^{2𝜋𝑖2^{𝑛−𝑗−1}𝜔} |𝜓⟩ .
This is a global phase shift. These operators are applied to the target register controlled by the 𝑗th qubit of the control register which produces the new state
(6.3.18) |𝜓₂⟩ = ⨂_{𝑗=0}^{𝑛−1} (|0⟩ + 𝑒^{2𝜋𝑖2^{𝑛−𝑗−1}𝜔} |1⟩)/√2 ⊗ |𝜓⟩ = |𝜓ₙ(𝜔)⟩ |𝜓⟩ .
This shows that the global phase shifts are kicked back to the amplitudes of |1⟩ in the
control qubits. Since the state |𝜓2 ⟩ is separable with respect to the decomposition into
the control and the target register, it follows from Corollary 3.7.12 that tracing out the
target register yields the state
This state is measured. So, the first assertion follows from Proposition 6.3.4.
[Figure 6.3.2: the control register |0⟩ₙ passes through QFTₙ, the controlled operator 𝑈^𝑐 acts on the target register |𝜓⟩, QFTₙ^{−1} is applied to the control register, which is measured to give |𝑥⟩ₙ, and the target register is traced out.]
Figure 6.3.2. Simplified representation of the quantum circuit for eigenvalue estimation.
In the sequel, we explain a quantum polynomial time algorithm that solves the
order problem. In the following, we let 𝑁, 𝑎, 𝑟 be as in the order finding problem. We
also let 𝑛 ∈ ℕ.
6.4.2. The operator 𝑈𝑐 . We introduce and discuss a unitary operator that is used
in the order finding algorithm.
Definition 6.4.2. For any 𝑐 ∈ ℤ with gcd(𝑐, 𝑁) = 1 we define the linear operator
Proof. By definition, the map 𝑈_𝑐 is linear. So it suffices to show that the map
(6.4.2) 𝑓_𝑐 ∶ ℤ_{2ⁿ} → ℤ_{2ⁿ}, 𝑥 ↦ { 𝑐𝑥 mod 𝑁 if 0 ≤ 𝑥 < 𝑁; 𝑥 if 𝑁 ≤ 𝑥 < 2ⁿ }
is a bijection. For this, it suffices to show that 𝑓_𝑐 is surjective since the domain and the codomain of 𝑓_𝑐 are the same. Let 𝑦 ∈ ℤ_{2ⁿ}. If 𝑦 ≥ 𝑁, then we have 𝑓_𝑐(𝑦) = 𝑦. If 𝑦 < 𝑁, then we have 𝑦 = 𝑓_𝑐(𝑥) where 𝑥 ∈ ℤ_𝑁 such that 𝑐𝑥 ≡ 𝑦 mod 𝑁. This number 𝑥 exists because gcd(𝑐, 𝑁) = 1. □
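The map 𝑓_𝑐 is easy to test by brute force (illustrative parameters, not from the text):

```python
# f_c multiplies by c modulo N below N and is the identity above;
# it permutes Z_{2^n} whenever gcd(c, N) = 1.
from math import gcd

def f_c(x, c, N, n):
    return (c * x) % N if x < N else x

N, n, c = 21, 5, 2
assert gcd(c, N) == 1
image = sorted(f_c(x, c, N, n) for x in range(2**n))
assert image == list(range(2**n))      # f_c is a bijection on Z_32
print("f_c permutes Z_32")
```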
Next, we determine the eigenstates of the operators 𝑈𝑎𝑡 for all 𝑡 ∈ ℕ. In the con-
text of the order finding algorithm, only the case 𝑡 = 1 is relevant. However, for the
quantum discrete logarithm problem, we use 𝑡 > 1.
Proof. To prove the first assertion, let 𝑘 ∈ ℤ and 𝑡 ∈ ℕ. We note that the map
(6.4.4) ℤ_𝑟 → ℤ_𝑟, 𝑠 ↦ (𝑠 + 𝑡) mod 𝑟
is a bijection. So we have
(6.4.5)
𝑈_{𝑎^𝑡} |𝑢_𝑘⟩ = (1/√𝑟) ∑_{𝑠=0}^{𝑟−1} 𝑒^{−2𝜋𝑖(𝑘/𝑟)𝑠} 𝑈_{𝑎^𝑡} |𝑎^𝑠 mod 𝑁⟩ₙ
= (1/√𝑟) ∑_{𝑠=0}^{𝑟−1} 𝑒^{−2𝜋𝑖(𝑘/𝑟)𝑠} |𝑎^{𝑠+𝑡} mod 𝑁⟩ₙ
= 𝑒^{2𝜋𝑖𝑘𝑡/𝑟} (1/√𝑟) ∑_{𝑠=0}^{𝑟−1} 𝑒^{−2𝜋𝑖(𝑘/𝑟)((𝑠+𝑡) mod 𝑟)} |𝑎^{(𝑠+𝑡) mod 𝑟} mod 𝑁⟩ₙ
= 𝑒^{2𝜋𝑖𝑘𝑡/𝑟} (1/√𝑟) ∑_{𝑠=0}^{𝑟−1} 𝑒^{−2𝜋𝑖(𝑘/𝑟)𝑠} |𝑎^𝑠 mod 𝑁⟩ₙ
= 𝑒^{2𝜋𝑖𝑘𝑡/𝑟} |𝑢_𝑘⟩ .
This concludes the proof of the first assertion.
This concludes the proof of the first assertion.
Next, we turn to the second assertion. Since 𝑟 is the order of 𝑎 modulo 𝑁, the ele-
ments of the sequence (|𝑎𝑠 mod 𝑁⟩𝑛 )0≤𝑠<𝑟 are pairwise different. Thus this sequence
is a basis of Span{|𝑎𝑠 mod 𝑁⟩𝑛 ∶ 0 ≤ 𝑠 < 𝑟}. Also, for all 𝑘, 𝑘′ ∈ ℤ𝑟 we have
(6.4.6) ⟨𝑢_𝑘|𝑢_{𝑘′}⟩ = (1/𝑟) ∑_{𝑠=0}^{𝑟−1} 𝑒^{2𝜋𝑖((𝑘−𝑘′)/𝑟)𝑠} = { 1 if 𝑘 = 𝑘′; (1/𝑟)(1 − 𝑒^{2𝜋𝑖(𝑘−𝑘′)})/(1 − 𝑒^{2𝜋𝑖(𝑘−𝑘′)/𝑟}) = 0 if 𝑘 ≠ 𝑘′ } .
Hence, the sequence (|𝑢0 ⟩ , . . . , |𝑢𝑟−1 ⟩) is an orthonormal basis of Span{|𝑎𝑠 mod 𝑁⟩𝑛 ∶
𝑠 ∈ ℤ𝑟 }, as claimed. □
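The eigenvalue relation of Proposition 6.4.4 can be verified numerically for small parameters; here 𝑁 = 15, 𝑎 = 2, 𝑟 = 4 (a sketch, treating 𝑈_𝑎 as a permutation of the states |𝑎^𝑠 mod 𝑁⟩):

```python
# Check U_a |u_k> = e^{2*pi*i*k/r} |u_k> for N = 15, a = 2, order r = 4.
import cmath

N, a, r = 15, 2, 4
powers = [pow(a, s, N) for s in range(r)]          # [1, 2, 4, 8]

for k in range(r):
    # Amplitudes of |u_k> on the basis states |a^s mod N>.
    u = {powers[s]: cmath.exp(-2j * cmath.pi * k * s / r) / 2 for s in range(r)}
    # U_a maps |x> to |a*x mod N> on these basis states.
    Uu = {(a * x) % N: amp for x, amp in u.items()}
    lam = cmath.exp(2j * cmath.pi * k / r)
    assert all(abs(Uu[x] - lam * u[x]) < 1e-12 for x in u)
print("eigenvalue relation verified")
```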
If we were to apply the phase estimation algorithm with the target register initialized to an eigenstate |𝑢_𝑘⟩ of 𝑈_𝑎, then we would obtain a rational approximation 𝑥/2ⁿ to 𝑘/𝑟 and thus some information about the order 𝑟 of 𝑎 modulo 𝑁. Unfortunately, this cannot be done because it is not known how to prepare the eigenstates of 𝑈_𝑎. But the following proposition is of help.
We determine the amplitude of |1⟩ in the state in (6.4.8). Since 𝑎^𝑠 ≡ 1 mod 𝑁 if and only if 𝑠 ≡ 0 mod 𝑟, it follows that this amplitude is
(6.4.9) (1/𝑟) ∑_{𝑘=0}^{𝑟−1} 𝑒^{−2𝜋𝑖(𝑘/𝑟)⋅0} = 𝑟/𝑟 = 1 .
Since (1/√𝑟) ∑_{𝑘=0}^{𝑟−1} |𝑢_𝑘⟩ is a quantum state, the amplitudes of the other basis states in this representation must be 0. This implies the assertion. □
6.4.3. The algorithm. We now present and analyze a quantum order finding al-
gorithm. The pseudocode is shown in Algorithm 6.4.6. It uses a variant of the phase
estimation circuit, shown in Figure 6.4.1. This circuit differs from the quantum phase
estimation circuit in that the target register is initialized to |1⟩𝑛 . This state can be pre-
pared, and by Proposition 6.4.5 it is the equally weighted superposition of the eigenstates |𝑢_𝑘⟩ of 𝑈_𝑎 for 0 ≤ 𝑘 < 𝑟. By Proposition 6.4.4, the corresponding eigenvalues are 𝑒^{2𝜋𝑖𝑘/𝑟}. Their phase contains information about the order 𝑟 of 𝑎 modulo 𝑁. As we will see in the proof of Theorem 6.4.8, the modified quantum phase estimation circuit can be used to determine 𝑟.
(6.4.10) 2ⁿ ≥ 2𝑟² .
[Figure 6.4.1: the control register |0⟩ₙ passes through 𝐻^{⊗𝑛}, the controlled operator 𝑈_𝑎^𝑐 acts on the target register initialized to |1⟩ₙ, QFTₙ^{−1} is applied to the control register, which is measured to give 𝑥 ∈ ℤ_{2ⁿ}, and the target register is traced out.]
Figure 6.4.1. The modified phase estimation circuit 𝑄𝑎 used in the order finding algorithm.
We will now prove the following theorem that states the correctness and the com-
plexity of the algorithm.
Theorem 6.4.8. On input of 𝑁 ∈ ℕ and 𝑎 ∈ ℤ_𝑁 such that gcd(𝑎, 𝑁) = 1, Algorithm 6.4.6 computes the order 𝑟 of 𝑎 mod 𝑁 with probability at least 3958924/(101761𝜋⁴) > 0.399. The algorithm has running time O((log 𝑁)³).
For analyzing the success probability of the order finding algorithm, we need the following proposition in which we call a representation 𝑟 = 𝑝/𝑞 of a nonzero rational number 𝑟 reduced if 𝑞 > 0 and gcd(𝑝, 𝑞) = 1.
Proposition 6.4.9. Denote by 𝑥 the return value of the quantum circuit 𝑄𝑎 from Figure
6.4.1 and let 𝑘 ∈ ℤ𝑟 . Then the following hold.
(1) With probability at least 8/(𝑟𝜋²) we have
(6.4.13) |𝑥/2ⁿ − 𝑘/𝑟| < 1/2ⁿ .
(2) If (6.4.13) holds, then 𝑘/𝑟 is a convergent of the continued fraction expansion of 𝑥/2ⁿ. It is the only convergent of this expansion whose reduced representation 𝑝/𝑞 satisfies
(6.4.14) |𝑥/2ⁿ − 𝑝/𝑞| < 1/2ⁿ and 𝑞 ≤ 2^{(𝑛−1)/2} .
and
(6.4.22) |𝑥/2ⁿ − 𝑘/𝑟| < 1/(2𝑟²) .
So by Proposition A.3.35, the fraction 𝑘/𝑟 is a convergent of the continued fraction expansion of 𝑥/2ⁿ and its reduced representation 𝑝/𝑞 satisfies (6.4.14). To show the uniqueness of this convergent, let 𝑝/𝑞 be another convergent of this continued fraction expansion that satisfies (6.4.14). Then we have
(6.4.23) |𝑘𝑞 − 𝑝𝑟| = 𝑟𝑞 |𝑘/𝑟 − 𝑝/𝑞| ≤ 2^{𝑛−1} (|𝑥/2ⁿ − 𝑘/𝑟| + |𝑥/2ⁿ − 𝑝/𝑞|) < 2 ⋅ 2^{𝑛−1}/2ⁿ = 1 .
So we have 𝑘𝑞 = 𝑝𝑟 which implies 𝑘/𝑟 = 𝑝/𝑞. □
Exercise 6.4.10. Use Lemma 6.3.3 to show that (6.4.20) implies |Δ(𝑘/𝑟, 𝑛, 𝑥)| = |𝑘/𝑟 − 𝑥/2ⁿ|.
The next lemma provides a sufficient condition for Algorithm 6.4.6 to find the or-
der 𝑟 of 𝑎 modulo 𝑁.
Lemma 6.4.11. For 𝑗 = 1, 2 let 𝑘_𝑗, 𝑚_𝑗 ∈ ℕ₀ and 𝑟_𝑗 ∈ ℕ with 𝑘_𝑗/𝑟 = 𝑚_𝑗/𝑟_𝑗, and assume that gcd(𝑘₁, 𝑘₂, 𝑟) = gcd(𝑚_𝑗, 𝑟_𝑗) = 1. Then 𝑟 = lcm(𝑟₁, 𝑟₂).
Proof. Since 𝑘_𝑗/𝑟 = 𝑚_𝑗/𝑟_𝑗 and gcd(𝑚_𝑗, 𝑟_𝑗) = 1 for 𝑗 = 1, 2, it follows that 𝑟₁ and 𝑟₂ are divisors of 𝑟. Therefore, lcm(𝑟₁, 𝑟₂) is a divisor of 𝑟. This means that we can write
The last statement in this section allows us to estimate the probability that the sufficient condition in Lemma 6.4.11 occurs.
Proof. First, note that the number of pairs (𝑘₁, 𝑘₂) in ℤ_𝑟² with gcd(𝑘₁, 𝑘₂, 𝑟) = 1 is the same as the number of all such pairs in {1, . . . , 𝑟}². This number is at least the number of coprime pairs in {1, . . . , 𝑟}². But it is shown in [Fon12] that the latter number is at least (989731/1628176)𝑟² ≥ 0.6𝑟². Since, in the experiment, any such pair is chosen with probability at least 𝑐²/𝑟², it follows that the probability of choosing one such pair is at least (989731/1628176)𝑐². □
We can now prove the success probability stated in Theorem 6.4.8. Proposition 6.4.9 implies that for 𝑗 = 1, 2 and each 𝑘_𝑗 ∈ ℤ_𝑟 the 𝑗th iteration of the for loop in Algorithm 6.4.6 finds with probability at least 8/(𝑟𝜋²) integers 𝑚_𝑗, 𝑟_𝑗 ∈ ℤ_{2ⁿ} such that 𝑚_𝑗/𝑟_𝑗 = 𝑘_𝑗/𝑟. By Proposition 6.4.12, the probability that the two rounds of the for loop find such integers with gcd(𝑘₁, 𝑘₂, 𝑟) = 1 satisfies
(6.4.26) 𝑝 ≥ (989731/1628176) ⋅ (64/𝜋⁴) > 0.399 .
It remains to analyze the complexity of the order finding algorithm. Its bottleneck
is the computation of the controlled-𝑈𝑎𝑐 operator which we discuss in the next section.
The circuit implementing C-𝑈𝑎 uses the modular multiplication operator 𝑈𝑚 that
is defined by its effect on the computational basis states |𝑥⟩ |𝑦⟩ of ℍ2𝑛 , 𝑥, 𝑦 ∈ ℤ2𝑛 , as
follows:
(6.4.33) 𝑈_𝑚 |𝑥⟩ |𝑦⟩ = { |𝑥⟩ |𝑥𝑦 mod 𝑁⟩ if (𝑥, 𝑦) ∈ ℤ_𝑁² ∧ gcd(𝑦, 𝑁) = 1; |𝑥⟩ |𝑦⟩ if (𝑥, 𝑦) ∉ ℤ_𝑁² ∨ gcd(𝑦, 𝑁) > 1 } .
This operator is unitary, because, as shown in Exercise 6.4.14, the map
[Figure 6.4.2: the circuit 𝑈_𝑝 for 𝑛 = 3; alternating 𝖢𝖭𝖮𝖳 copies and 𝑈_𝑚 multiplications map |𝑎⟩₃ |0⟩₃ |0⟩₃ to |𝑎₀⟩₃ |𝑎₁⟩₃ |𝑎₂⟩₃.]
We now describe a quantum circuit 𝑈𝑝 that constructs |𝑎𝑖 ⟩ for 0 ≤ 𝑖 < 𝑛. It uses
the bitwise 𝖢𝖭𝖮𝖳 operator which for (𝑥₀, . . . , 𝑥_{𝑛−1}), (𝑦₀, . . . , 𝑦_{𝑛−1}) ∈ {0, 1}ⁿ gives
(6.4.35) 𝖢𝖭𝖮𝖳ₙ |𝑥₀ ⋯ 𝑥_{𝑛−1}⟩ |𝑦₀ ⋯ 𝑦_{𝑛−1}⟩ = |𝑥₀ ⋯ 𝑥_{𝑛−1}⟩ 𝑋^{𝑥₀} |𝑦₀⟩ ⋯ 𝑋^{𝑥_{𝑛−1}} |𝑦_{𝑛−1}⟩ .
Note that we have
(6.4.36) 𝖢𝖭𝖮𝖳𝑛 |𝑥0 ⋯ 𝑥𝑛−1 ⟩ |0 ⋯ 0⟩ = |𝑥0 ⋯ 𝑥𝑛−1 ⟩ |𝑥0 ⋯ 𝑥𝑛−1 ⟩ .
Figure 6.4.2 shows the circuit 𝑈𝑝 for 𝑛 = 3. It works as follows. The input state is
|𝑎⟩3 |0⟩3 |0⟩3 . In the first step, the circuit computes the new state
(6.4.37) 𝑈𝑚 𝖢𝖭𝖮𝖳𝑛 (|𝑎⟩3 |0⟩3 ) |0⟩3 = 𝑈𝑚 (|𝑎0 ⟩3 |𝑎0 ⟩3 ) |0⟩3 = |𝑎0 ⟩3 |𝑎1 ⟩3 |0⟩3 .
In the second step, the circuit computes the new state
(6.4.38) |𝑎0 ⟩3 𝑈𝑚 𝖢𝖭𝖮𝖳𝑛 (|𝑎1 ⟩3 |0⟩3 ) = |𝑎0 ⟩3 𝑈𝑚 (|𝑎1 ⟩3 |𝑎1 ⟩3 ) = |𝑎0 ⟩3 |𝑎1 ⟩3 |𝑎2 ⟩3 .
The circuit specification for the general case is presented in Algorithm 6.4.15.
Algorithm 6.4.15. Circuit 𝑈_𝑝 for computing |𝑎_𝑖⟩ₙ = |𝑎^{2^𝑖} mod 𝑁⟩ₙ for 0 ≤ 𝑖 < 𝑛
Input: 𝑛 ∈ ℕ, 𝑁 ∈ ℤ_{2ⁿ}, 𝑎 ∈ ℤ*_𝑁
Output: |𝑎₀⟩ₙ ⋯ |𝑎_{𝑛−1}⟩ₙ with 𝑎_𝑖 as in (6.4.28)
1: 𝑈_𝑝(𝑛, 𝑁, 𝑎)
2: /* The circuit operates on |𝜓⟩ = |𝜓₀⟩ ⋯ |𝜓_{𝑛−1}⟩ ∈ ℍₙ^{⊗𝑛} */
3: |𝜓⟩ ← |𝑎⟩ₙ |0⟩ₙ^{⊗(𝑛−1)}
4: for 𝑖 = 1, . . . , 𝑛 − 1 do
5: |𝜓_{𝑖−1}⟩ |𝜓_𝑖⟩ ← 𝖢𝖭𝖮𝖳ₙ |𝜓_{𝑖−1}⟩ |𝜓_𝑖⟩
6: |𝜓_{𝑖−1}⟩ |𝜓_𝑖⟩ ← 𝑈_𝑚 |𝜓_{𝑖−1}⟩ |𝜓_𝑖⟩
7: end for
8: end
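The classical content of 𝑈_𝑝 is iterated squaring; the following sketch (not the quantum circuit itself) computes the values 𝑎_𝑖 = 𝑎^{2^𝑖} mod 𝑁 that the circuit prepares:

```python
# Iterated squaring: each step copies the previous value (the CNOT step)
# and multiplies it by itself modulo N (the U_m step).
def powers_of_a(a, N, n):
    vals = [a % N]                       # a_0 = a
    for _ in range(1, n):
        prev = vals[-1]
        vals.append((prev * prev) % N)   # a_{i} = a_{i-1}^2 mod N
    return vals

print(powers_of_a(7, 15, 4))             # [7, 4, 1, 1]
```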
We now prove that the quantum circuit 𝑈𝑝 has the required property and analyze
its complexity.
Proposition 6.4.16. We have
(6.4.39) 𝑈_𝑝 |𝑎⟩ₙ |0⟩ₙ^{⊗(𝑛−1)} = |𝑎₀⟩ₙ ⋯ |𝑎_{𝑛−1}⟩ₙ .
Also, 𝑈_𝑝 has size O(𝑛³).
[Figure 6.4.3: the circuit implementing C-𝑈_𝑎 for 𝑛 = 2. The control qubits are |𝑐₀⟩ and |𝑐₁⟩; the ancilla registers |𝑎⟩₂, |0⟩₂, |1⟩₂ are prepared by 𝑈_𝑝 and uncomputed by 𝑈_𝑝^{−1}; controlled swaps and 𝑈_𝑚 map the target |𝑡⟩₂ to |𝑎^𝑐𝑡 mod 𝑁⟩₂ if 𝑡 ∈ ℤ_𝑁 and leave |𝑡⟩₂ unchanged otherwise; the ancillas are traced out.]
Proof. We use the notation of Algorithm 6.4.15 and show by induction on 𝑖 that after
the 𝑖th iteration of the for loop we have
The base case 𝑖 = 0 follows from the choice of the input state. For the inductive step,
assume that 0 ≤ 𝑖 < 𝑛 − 1 and that (6.4.40) holds. It follows from (6.4.29) that after the
completion of the (𝑖 + 1)st iteration of the for loop we have
This proves that 𝑈_𝑝 gives |𝑎₀⟩ ⋯ |𝑎_{𝑛−1}⟩. We estimate the size of the circuit. There are 𝑛 iterations of the for loop. Each iteration executes 𝖢𝖭𝖮𝖳ₙ and 𝑈_𝑚. Both operations have complexity O(𝑛²). So the total size is O(𝑛³). □
Next, we construct a quantum circuit that implements the unitary operator C-𝑈_𝑎 from (6.4.32) which is shown in Figure 6.4.3 for 𝑛 = 2. Its initial state is |𝑐⟩₂ |𝑡⟩₂ where 𝑐, 𝑡 ∈ ℤ₄. Then the ancilla qubits |𝑎⟩₂, |0⟩₂, and |1⟩₂ are inserted between |𝑐⟩₂ and |𝑡⟩₂. The operator 𝑈_𝑝 is applied to |𝑎⟩₂ |0⟩₂. This gives the state |𝑐⟩₂ |𝑎₀⟩₂ |𝑎₁⟩₂ |1⟩₂ |𝑡⟩₂. Next, |𝑎₁⟩₂ and |1⟩₂ are swapped conditioned on 𝑐₀ = 1. Therefore, the input state to 𝑈_𝑚 is |𝑎₁^{𝑐₀}⟩₂ |𝑡⟩₂. Applying 𝑈_𝑚 gives the state |𝑐⟩₂ |𝑎₀⟩₂ |𝑎₁^{1−𝑐₀}⟩₂ |𝑎₁^{𝑐₀}⟩₂ |𝑡₁⟩₂. The next controlled swap changes the three ancillary registers back to |𝑎₀⟩₂ |𝑎₁⟩₂ |1⟩₂. The circuit then applies another controlled swap, the operator 𝑈_𝑚 and the inverse swap. The result is the state |𝑐⟩₂ |𝑎₀⟩₂ |𝑎₁⟩₂ |1⟩₂ |𝑡₂⟩₂. Tracing out the ancillary qubits, we obtain the required result.
The general circuit that implements C-𝑈_𝑎 is specified in Algorithm 6.4.17. It uses the controlled swap operator 𝖢𝖲𝖶𝖠𝖯_𝑖 for 𝑖 = 0, . . . , 𝑛 − 1. Its effect is seen in (6.4.42):
(6.4.42) 𝖢𝖲𝖶𝖠𝖯_𝑖 |𝑐⟩ₙ |𝜉₀⟩ ⋯ |𝜉_{𝑛−1}⟩ |𝜑₀⟩ |𝜑₁⟩ = { |𝑐⟩ₙ |𝜉₀⟩ ⋯ |𝜉_{𝑛−1}⟩ |𝜑₀⟩ |𝜑₁⟩ if 𝑐_𝑖 = 0; |𝑐⟩ₙ |𝜉₀⟩ ⋯ |𝜉_{𝑛−𝑖−1}⟩ |𝜑₀⟩ |𝜉_{𝑛−𝑖+1}⟩ ⋯ |𝜉_{𝑛−1}⟩ |𝜉_{𝑛−𝑖}⟩ |𝜑₁⟩ if 𝑐_𝑖 = 1 } .
So 𝖢𝖲𝖶𝖠𝖯_𝑖 exchanges |𝜉_{𝑛−𝑖}⟩ and |𝜑₀⟩ conditioned on 𝑐_𝑖 being 1. For example, the circuit in Figure 6.4.3 uses 𝖢𝖲𝖶𝖠𝖯₁. It swaps |𝑎₁⟩ₙ and |1⟩ₙ conditioned on 𝑐₀ being 1.
We now prove the following result.
Proposition 6.4.18. The circuit specified in Algorithm 6.4.17 implements the unitary operator C-𝑈_𝑎 from (6.4.32) and has size O(𝑛³).
Proof. We prove by induction on 𝑖 that after executing 𝑖 iterations of the for loop we
have
(6.4.43) |𝜓⟩ = |𝑐⟩𝑛 |𝑎0 ⟩𝑛 ⋯ |𝑎𝑛−1 ⟩𝑛 |1⟩𝑛 |𝑡 𝑖 ⟩𝑛 .
Applying this for 𝑖 = 𝑛 it follows that the circuit implements C-𝑈𝑎 .
The base case follows by considering the instructions in lines 3 and 4 and Propo-
sition 6.4.16.
For the inductive step, assume that 0 ≤ 𝑖 < 𝑛 and that (6.4.43) holds. In the (𝑖+1)st
iteration of the for loop the instruction in line 6 swaps |𝑎𝑛−𝑖−1 ⟩𝑛 and |1⟩𝑛 conditioned
on 𝑐 𝑖+1 being 1. This means that after this operation, we have
(6.4.44) |𝜓ₙ⟩ |𝜓_{𝑛+1}⟩ = |𝑎_{𝑛−𝑖−1}^{𝑐_{𝑖+1}}⟩ |𝑡_𝑖⟩ₙ .
So the application of 𝑈_𝑚 to this quantum state gives
(6.4.45) |𝑎_{𝑛−𝑖−1}^{𝑐_{𝑖+1}}⟩ |𝑡_{𝑖+1}⟩ₙ .
Another application of the same controlled swap exchanges |𝑎_{𝑛−𝑖−1}⟩ₙ and |1⟩ₙ back, conditioned on 𝑐_{𝑖+1} being 1. So, (6.4.43) holds.
Finally, we estimate the size of the circuit. The number of ancilla bits required by
the circuit is O(𝑛2 ). By Proposition 6.4.16, the circuit 𝑈𝑝 has size O(𝑛3 ). The number
of iterations of the for loop is 𝑛. We analyze the complexity of the implementation of
𝖢𝖲𝖶𝖠𝖯𝑖 . As seen in Figure 4.5.1, one quantum swap can be implemented using 𝑂(1)
elementary gates. So, swapping 𝑛 qubits can be achieved using O(𝑛) elementary gates.
Theorem 4.12.7 implies that 𝖢𝖲𝖶𝖠𝖯𝑖 can also be implemented using O(𝑛) elementary
quantum gates. Also, 𝑈𝑚 requires O(𝑛2 ) elementary quantum gates. So, the complexity
of the for loop is O(𝑛3 ) which concludes the proof. □
Now we can prove the complexity statement of Theorem 6.4.8 as follows. The or-
der finding algorithm uses the quantum circuit 𝑄𝑎 twice. By assumption, the precision
parameter 𝑛 used in this circuit satisfies 2ⁿ ≤ 2𝑁², which implies 𝑛 = O(log 𝑁). So,
by Proposition 6.4.18, the size of this circuit is O((log 𝑁)3 ). By Proposition A.3.28, the
application of the continued fraction algorithm requires time O((log 𝑁)2 ). Further-
more, the calculation of the lcm, 𝑎𝑟 mod 𝑁, and all other operations requires running
time O((log 𝑁)3 ). This implies the complexity statement of Theorem 6.4.8. If faster
algorithms for integer multiplication, division with remainder, and lcm are used, the
complexity of the order finding algorithm becomes (log 𝑁)²(log log 𝑁)^{O(1)}. Such algorithms are, for example, presented in [AHU74] and [HvdH21].
The best-known classical and fully analyzed Monte Carlo algorithm for this problem has subexponential complexity 𝑒^{(1+o(1))(log 𝑁 log log 𝑁)^{1/2}} [LP92]. Furthermore, the best heuristic Monte Carlo algorithm for this problem has subexponential complexity 𝑒^{(𝑐+o(1))(log 𝑁)^{1/3}(log log 𝑁)^{2/3}} where 𝑐 = ∛(64/9) [BLP93]. The quantum factoring algorithm is Algorithm 6.5.2. Its idea has already been explained in Section 6.1. It selects
rithm is Algorithm 6.5.2. Its idea has already been explained in Section 6.1. It selects
𝑎 ∈ ℤ𝑁 randomly with the uniform distribution and computes 𝑑 = gcd(𝑎, 𝑁). If 𝑑 > 1,
then 𝑑 is a proper divisor of 𝑁; the algorithm returns this divisor and terminates. Oth-
erwise, the algorithm calls FindOrder(𝑁, 𝑎, 𝑛) with 𝑛 from (6.4.11): By Theorem 6.4.8
it finds the order 𝑟 of 𝑎 modulo 𝑁 with probability at least 0.399. If 𝑟 is even, then
Also, if 𝑁 does not divide 𝑎𝑟/2 + 1, then gcd(𝑎𝑟/2 − 1, 𝑁) is a proper divisor of 𝑁. This
is what the algorithm tests.
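The classical part of the factoring algorithm can be sketched as follows, with the quantum order-finding step replaced by a brute-force classical loop (illustrative only, not the book's Algorithm 6.5.2):

```python
# Shor-style splitting of N: find the order r of a, then, if r is even and
# a^(r/2) is not -1 mod N, gcd(a^(r/2) - 1, N) is a proper divisor of N.
from math import gcd

def order(a, N):
    r, x = 1, a % N
    while x != 1:
        x = (x * a) % N
        r += 1
    return r

def split(N, a):
    d = gcd(a, N)
    if d > 1:
        return d                        # lucky: a already shares a factor with N
    r = order(a, N)
    if r % 2 == 1 or pow(a, r // 2, N) == N - 1:
        return None                     # this a fails; another a must be chosen
    return gcd(pow(a, r // 2, N) - 1, N)

print(split(21, 2))   # order of 2 mod 21 is 6; gcd(2^3 - 1, 21) = 7
```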
Lemma 6.5.5. Let 𝑝 be an odd prime number and let 𝑒 ∈ ℕ. Let 𝑑 ∈ ℕ be the exponent of 2 in the prime factor decomposition of 𝑝 − 1. Choose an integer 𝑎 ∈ ℤ*_{𝑝^𝑒} randomly with the uniform distribution. Then the probability for 2^𝑑 to divide the order of 𝑎 modulo 𝑝^𝑒 is 1/2.
is a bijection. Select 𝑘 ∈ ℤ_𝜑 randomly with the uniform distribution. Due to the bijectivity of (6.5.3), the integer 𝑎 = 𝑔^𝑘 mod 𝑝^𝑒 is uniformly distributed in ℤ*_{𝑝^𝑒}. Also, the order of 𝑎 is
(6.5.4) 𝑟 = 𝜑/gcd(𝑘, 𝜑) .
Let 𝑑′ be the exponent of 2 in the prime factor decomposition of 𝑟. Then (6.5.4) implies that 𝑑′ = 𝑑 if 𝑘 is odd and 𝑑′ < 𝑑 if 𝑘 is even. Since 𝜑 is even, half of the integers in ℤ_𝜑 are even and half are odd. So, we have 𝑑 = 𝑑′ with probability 1/2. □
The next proposition gives a lower bound for the conditional success probability
of Algorithm 6.5.2 in the case where it selects an integer 𝑎 that is coprime to 𝑁 and the
order finding algorithm returns the order of 𝑎 modulo 𝑁.
Proposition 6.5.6. Let 𝑁 be an odd composite number with 𝑚 different prime factors. Choose 𝑎 ∈ ℤ*_𝑁 randomly with the uniform distribution. Then the probability that the order 𝑟 of 𝑎 modulo 𝑁 is even and that 𝑎^{𝑟/2} ≢ −1 mod 𝑁 is at least 1 − 1/2^{𝑚−1}.
Proof. We show that the probability that 𝑟 is odd or that 𝑟 is even and satisfies 𝑎^{𝑟/2} ≡ −1 mod 𝑁 is at most 1/2^{𝑚−1}.
Let
(6.5.5) 𝑁 = ∏_{𝑖=1}^{𝑚} 𝑝_𝑖^{𝑒_𝑖}
(6.5.6) 𝑟 = lcm(𝑟₁, . . . , 𝑟_𝑚) .
(6.5.7) 𝑓 = 𝑓_𝑖 for 1 ≤ 𝑖 ≤ 𝑚 .
So let 𝑖 ∈ {1, . . . , 𝑚}. If 𝑟 is odd, then 𝑓 = 0 and (6.5.6) implies 𝑓_𝑖 = 0. Now assume that 𝑓 > 0 and 𝑎^{𝑟/2} ≡ −1 mod 𝑁. Since 𝑎^𝑟 ≡ 1 mod 𝑝_𝑖^{𝑒_𝑖}, it follows that 𝑟_𝑖 ∣ 𝑟 which implies 𝑓_𝑖 ≤ 𝑓. But since 𝑎^{𝑟/2} ≡ −1 mod 𝑝_𝑖^{𝑒_𝑖}, it follows that 𝑓_𝑖 = 𝑓.
From the above argument, it follows that the probability that 𝑟 is odd or that 𝑟 is even and satisfies 𝑎^{𝑟/2} ≡ −1 mod 𝑁 is at most the probability that all 𝑓_𝑖 are equal. We show that this probability is at most 1/2^{𝑚−1}. We know that 𝑓₁ assumes some value with probability 1. It follows from Lemma 6.5.5 that for 2 ≤ 𝑖 ≤ 𝑚 we have 𝑓_𝑖 = 𝑓₁ with probability at most 1/2. In fact, if 𝑓₁ is the exponent of 2 in the prime factorization of 𝑝_𝑖 − 1, then by Lemma 6.5.5 the probability that 𝑓_𝑖 = 𝑓₁ is 1/2. And if 𝑓₁ is not this exponent, then by Lemma 6.5.5 the probability that 𝑓_𝑖 = 𝑓₁ is at most 1 − 1/2 = 1/2. So the probability that all 𝑓_𝑖 are equal is at most 1/2^{𝑚−1}. □
(6.5.8) 3958924/(2 ⋅ 101761𝜋⁴) > 0.199 .
(6.5.9) (𝑁 − 𝜑(𝑁) + 0.199𝜑(𝑁))/𝑁 > 0.199 .
By Theorem 6.4.8 and the choice of 𝑛 in Algorithm 6.5.2, the running time of the algo-
rithm is O((log 𝑁)3 ).
Exercise 6.5.7. Find a polynomial time quantum algorithm that finds the prime factor
decomposition of every positive integer.
Exercise 6.5.8. Let 𝑁 ∈ ℕ, 𝑎 ∈ ℤ*_𝑁, and 𝑟 ∈ ℤ_𝑁. Show how the quantum factoring algorithm can be used to check in polynomial time whether 𝑟 is the order of 𝑎 modulo 𝑁.
The exponent 𝑡 in the discrete logarithm problem is called the discrete logarithm
of 𝑏 to base 𝑎 modulo 𝑁. The discrete logarithm problem is also referred to as the DL
problem.
We will now present a quantum polynomial time DL algorithm. In the presenta-
tion, we will use the notation from the discrete logarithm problem.
We make a few preliminary remarks. Using the quantum factoring algorithm from
the previous section, we can find the prime factorization of 𝑁 in polynomial time, al-
lowing us to compute the order 𝜑(𝑁) of ℤ∗𝑁 since
(6.6.1) 𝜑(𝑁) = 𝑁 ∏_{𝑝∣𝑁} (1 − 1/𝑝)
which is shown in [Buc04, Theorem 2.17.2]. By applying the quantum factoring al-
gorithm, we can also determine the prime factorization of 𝜑(𝑁). Subsequently, the
Pohlig-Hellman algorithm, as described in Section 10.5 of [Buc04], provides a polyno-
mial time reduction from the general DL problem to the problem of computing discrete
logarithms of basis elements 𝑎 whose order 𝑟 is a known prime number. Therefore, to
achieve a quantum polynomial time DL algorithm, we may assume that the order 𝑟 of
𝑎 modulo 𝑁 is a prime number. Additionally, we assume that 𝑡 > 1, as the cases 𝑡 = 0
and 𝑡 = 1 can be solved by inspection.
The idea of the quantum DL algorithm for this special case is the following. The algorithm selects an appropriate precision parameter 𝑛 ∈ ℕ and uses the unitary operators 𝑈_𝑎 and 𝑈_𝑏 that are specified in Definition 6.4.2. Since 𝑏 ≡ 𝑎^𝑡 mod 𝑁, it follows from Proposition 6.4.4 that the eigenvalues of these operators are 𝑒^{2𝜋𝑖𝑘/𝑟} and 𝑒^{2𝜋𝑖𝑡𝑘/𝑟} for 0 ≤ 𝑘 < 𝑟. Using quantum phase estimation, we can find (𝑥, 𝑦) ∈ ℤ_{2ⁿ}² such that 𝑥/2ⁿ and 𝑦/2ⁿ are close enough to 𝑘/𝑟 and (𝑡𝑘 mod 𝑟)/𝑟 for some 𝑘 ∈ ℤ*_𝑟 such that
(6.6.2) 𝑘 = ⌊𝑟𝑥/2ⁿ⌉ and 𝑘𝑡 mod 𝑟 = ⌊𝑟𝑦/2ⁿ⌉ .
Since 𝑟 is a prime number, it follows that gcd(𝑘, 𝑟) = 1. So we can compute 𝑘′ with 𝑘𝑘′ ≡ 1 mod 𝑟 and obtain
(6.6.3) 𝑡 = 𝑘′ ⌊𝑟𝑦/2ⁿ⌉ mod 𝑟 .
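The recovery step (6.6.2)-(6.6.3) can be written out in code, with illustrative numbers standing in for the measurement outcomes (this sketch simulates ideal outcomes classically):

```python
# Recover t from the two measurement outcomes x and y:
# k = round(r*x/2^n), then t = k^{-1} * round(r*y/2^n) mod r.
n, r, t = 12, 11, 7                    # r prime, t the hidden discrete logarithm
k = 5                                  # eigenstate index selected by the measurement
x = round(2**n * k / r)                # ideal outcomes of the two control registers
y = round(2**n * ((k * t) % r) / r)

k_rec = round(r * x / 2**n)
assert k_rec == k
k_inv = pow(k_rec, -1, r)              # k' with k * k' = 1 mod r (Python 3.8+)
t_rec = (k_inv * round(r * y / 2**n)) % r
print(t_rec)                           # 7
```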
The quantum discrete logarithm algorithm is Algorithm 6.6.2. In the remainder of this section, we will prove Theorem 6.6.3, which states that it is correct and runs in polynomial time.
The algorithm applies the quantum circuit from Figure 6.6.1. The next proposition describes the output of the circuit.
Proposition 6.6.4. For all 𝑘 ∈ ℤ∗𝑟 the quantum circuit in Figure 6.6.1 gives with probability 64(𝑟 − 1)/(𝑟𝜋^4) two integers 𝑥, 𝑦 ∈ ℤ_{2^𝑛} such that

(6.6.6) 𝑘 = ⌊𝑟𝑥/2^𝑛⌉ and 𝑘𝑡 mod 𝑟 = ⌊𝑟𝑦/2^𝑛⌉.
Figure 6.6.1. The quantum circuit QDL for discrete logarithm computation.
Proof. The circuit operates on the tensor product of three quantum registers of length 𝑛 each, initialized to the state |0⟩𝑛 |0⟩𝑛 |1⟩𝑛.
Then it applies 𝐻 ⊗𝑛 to the first two quantum registers. This gives the state
(6.6.8) ((|0⟩ + |1⟩)/√2)^{⊗𝑛} ((|0⟩ + |1⟩)/√2)^{⊗𝑛} |1⟩𝑛 .
Next, it applies the operator C-𝑈𝑎 from (6.4.27) to the first and the third quantum register. As in (6.4.17) this gives the state

(6.6.9) |𝜓2𝑎⟩ = (1/√(2^𝑛 𝑟)) ∑_{𝑘=0}^{𝑟−1} |𝜓𝑛(𝑘/𝑟)⟩ (∑_{𝑦=0}^{2^𝑛−1} |𝑦⟩) |𝑢𝑘⟩ .
Then the algorithm applies the operator C-𝑈𝑏 to the second and third quantum register. It follows from Proposition 6.4.4 that the |𝑢𝑘⟩ are eigenstates of 𝑈𝑏 associated to the eigenvalues 𝑒^{2𝜋𝑖𝑘𝑡/𝑟}. So, by the same argument, this gives the state

(6.6.10) |𝜓2𝑏⟩ = (1/√𝑟) ∑_{𝑘=0}^{𝑟−1} |𝜓𝑛(𝑘/𝑟)⟩ |𝜓𝑛(𝑘𝑡/𝑟)⟩ |𝑢𝑘⟩ .
By Exercise 3.7.11, tracing out the third quantum register gives the mixed state
(6.6.11) |𝜓3⟩ = ((1/𝑟, |𝜓𝑛(𝑘/𝑟)⟩ |𝜓𝑛(𝑘𝑡/𝑟)⟩))_{0≤𝑘<𝑟} .
Now the algorithm applies QFT𝑛^{−1} to the first and second register. This gives the mixed state

(6.6.12) |𝜓4⟩ = ((1/𝑟, QFT𝑛^{−1} |𝜓𝑛(𝑘/𝑟)⟩ QFT𝑛^{−1} |𝜓𝑛(𝑘𝑡/𝑟)⟩))_{0≤𝑘<𝑟} .
Let 𝑘 ∈ ℤ𝑟 . By Theorem 6.3.7 and (6.6.5), measuring these registers in the computational basis of ℍ𝑛^{⊗2} gives with probability 64/𝜋^4 two integers (𝑥, 𝑦) ∈ ℤ^2_{2^𝑛} with

(6.6.13) |Δ(𝑘/𝑟, 𝑛, 𝑥)| < 1/2^𝑛 < 1/(2𝑟), |Δ((𝑘𝑡 mod 𝑟)/𝑟, 𝑛, 𝑦)| < 1/2^𝑛 < 1/(2𝑟).
We note that
(6.6.14) 0 < 𝑘/𝑟 ≤ 1 − 1/𝑟 < 1 − 1/2^𝑛 , (𝑘𝑡 mod 𝑟)/𝑟 ≤ 1 − 1/𝑟 < 1 − 1/2^𝑛 .
So Lemma 6.3.3 and (6.6.13) imply
(6.6.15) |𝑘/𝑟 − 𝑥/2^𝑛| < 1/(2𝑟), |(𝑘𝑡 mod 𝑟)/𝑟 − 𝑦/2^𝑛| < 1/(2𝑟).
Consequently, we obtain
(6.6.16) 𝑘 = ⌊𝑟𝑥/2^𝑛⌉ and 𝑘𝑡 mod 𝑟 = ⌊𝑟𝑦/2^𝑛⌉,

which concludes the proof. □
It follows from Proposition 6.6.4 that with probability at least 64(𝑟 − 1)/(𝑟𝜋^4) ≥ 32/𝜋^4 > 0 the quantum circuit in Figure 6.6.1 returns 𝑥, 𝑦 ∈ ℤ_{2^𝑛} that satisfy (6.6.6) for some 𝑘 ∈ ℤ∗𝑟 . If this happens, then the algorithm computes 𝑙 = ⌊𝑦𝑟/2^𝑛⌉, which by Proposition 6.6.4 is 𝑘𝑡 mod 𝑟. So we have 𝑡 = 𝑙𝑘^{−1} mod 𝑟, which means that the algorithm produces the correct result. As in the analysis of the order finding algorithm, it can be seen that the complexity of the algorithm is O((log 𝑁)^3).
6.8.1. The problem. To state the hidden subgroup problem, we need the follow-
ing definition.
Definition 6.8.1. Let 𝐺 be a group, let 𝐻 be a subgroup of 𝐺, and let 𝑋 be a set. We say
that a function 𝑓 ∶ 𝐺 → 𝑋 hides the subgroup 𝐻 if for all 𝑔, 𝑔′ ∈ 𝐺 we have 𝑓(𝑔) = 𝑓(𝑔′ )
if and only if 𝑔𝐻 = 𝑔′ 𝐻. In other words, the function 𝑓 takes the same value for all
elements of a coset of 𝐻, while it takes different values for elements of different cosets.
Exercise 6.8.3. Show that the Deutsch-Jozsa problem can be viewed as a hidden sub-
group problem by finding 𝐺, 𝑋, 𝑓, and 𝐻 as in Definition 6.8.1, and show that finding
the hidden subgroup is equivalent to solving the Deutsch-Jozsa problem.
Exercise 6.8.4. Show that the generalization of Simon’s problem can be viewed as a
hidden subgroup problem by finding 𝐺, 𝑋, 𝑓, and 𝐻 as in Definition 6.8.1, and show
that finding the hidden subgroup is equivalent to solving the generalization of Simon’s
problem.
6.8.3. Hidden subgroup version of the order finding problem. In the order
finding problem from Section 6.4, an odd integer 𝑁 ∈ ℕ≥3 and 𝑎 ∈ ℤ𝑁 are given such
that gcd(𝑎, 𝑁) = 1. The problem is to find the order 𝑟 of 𝑎 modulo 𝑁. To frame this
problem as a hidden subgroup problem, we set 𝐺 = (ℤ, +), 𝑋 = ℤ𝑁 , 𝑓 ∶ 𝐺 → 𝑋,
𝑗 ↦ 𝑎𝑗 mod 𝑁. The hidden subgroup is 𝐻 = 𝑟ℤ. It has the property that for all 𝑗 ∈ ℤ
we have 𝑎𝑗 ≡ 1 mod 𝑁 if and only if 𝑗 ∈ 𝐻.
We show that the problem of finding the order 𝑟 of 𝑎 modulo 𝑁 is equivalent to
finding a finite generating system of 𝐻. First, we note that if we find the order 𝑟 of 𝑎
modulo 𝑁, then we know the generating system (𝑟) of 𝐻. To prove the converse, we
need the following result.
Lemma 6.8.5. Let 𝑟 ∈ ℕ and let 𝑚 ∈ ℕ and 𝐺 = (𝑟0 , . . . , 𝑟𝑚−1 ) ∈ ℤ𝑚 . Then 𝐺 is a
generating system of 𝑟ℤ if and only if gcd(𝑟0 , . . . , 𝑟𝑚−1 ) = 𝑟.
From Lemma 6.8.5 it follows that 𝑟 can be determined as the gcd of every gener-
ating system of 𝑟ℤ. Hence, finding a finite generating system of 𝐻 = 𝑟ℤ allows one to
find 𝑟.
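The reduction in the last paragraph can be sketched in one line of code; the function name is ours:

```python
from math import gcd
from functools import reduce

def order_from_generators(gens: list) -> int:
    """Lemma 6.8.5: a tuple (r_0, ..., r_{m-1}) generates rZ exactly when
    gcd(r_0, ..., r_{m-1}) = r, so r is the gcd of any generating system."""
    return reduce(gcd, gens)

# (42, 70, 105) generates 7Z since gcd(42, 70, 105) = 7.
print(order_from_generators([42, 70, 105]))  # 7
```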
Proof. Since ((𝑥0, 𝑦0), . . . , (𝑥𝑚−1, 𝑦𝑚−1)) is a generating system of ℤ𝑟(1, 𝑡), it follows that there are 𝑢0, . . . , 𝑢𝑚−1 ∈ ℤ with

(6.8.1) ∑_{𝑖=0}^{𝑚−1} 𝑢𝑖 (𝑥𝑖, 𝑦𝑖) ≡ (1, −𝑡) mod 𝑟.

Let 𝑑 = gcd(𝑥0, . . . , 𝑥𝑚−1) and let 𝑥𝑖′ = 𝑥𝑖/𝑑 for all 𝑖 ∈ ℤ𝑚 . Then we obtain from (6.8.2)

(6.8.3) 𝑑 ∑_{𝑖=0}^{𝑚−1} 𝑢𝑖 𝑥𝑖′ ≡ 1 mod 𝑟.
This chapter explores the renowned search algorithm developed by Lov Grover [Gro96]
and the related counting algorithms devised by Gilles Brassard, Peter Høyer, and Alain
Tapp [BHT98]. These algorithms provide a quadratic acceleration of classical algo-
rithms for unstructured search and counting problems. This refers to situations in
which we are given black-box access to a function 𝑓 ∶ {0, 1}𝑛 → {0, 1} and our objective
is to discover an input 𝑥⃗ ∈ {0, 1}𝑛 that satisfies 𝑓(𝑥)⃗ = 1 or to count the number of such
inputs. Given the broad applicability of this problem, the algorithms elucidated in this chapter find utility across diverse fields, including cryptography and machine learning,
and effectively amplify the efficiency of existing algorithms in these contexts.
The initial section of this chapter focuses on Grover’s search algorithm. It shows
that addressing the search problem necessitates only the measurement of a quantum
state, which can be effectively prepared. However, the success probability of this ap-
proach proves to be inadequate. Here the crucial technique of amplitude amplifica-
tion steps in. This technique serves to enhance the likelihood of obtaining a solution
from a measurement, thus achieving quadratic acceleration. In the subsequent part of
the chapter, the synergy between amplitude amplification and phase estimation, intro-
duced in the preceding chapter, is explored. This synergy yields a quantum counting
algorithm that, under certain conditions, also delivers a quadratic speedup.
In the complexity analyses of this chapter, we assume that all quantum circuits are
constructed using the elementary quantum gates provided by the platform discussed
in Section 4.12.2, along with implementations of operators 𝑈 𝑓 as specified in their re-
spective contexts.
7.1.1. The classical search problem. The algorithm of Grover solves the un-
structured search problem. The classical version of this problem is as follows.
Problem 7.1.1 (Classical search problem).
Input: 𝑛 ∈ ℕ and a black-box that implements a function 𝑓 ∶ {0, 1}𝑛 → {0, 1}.
Output: A string 𝑥⃗ ∈ {0, 1}𝑛 with 𝑓(𝑥)⃗ = 1.
Problem 7.1.2 (Quantum search problem).
Input: 𝑛 ∈ ℕ, a black-box that implements 𝑈𝑓 for a function 𝑓 ∶ {0, 1}^𝑛 → {0, 1}, and 𝑀 = |𝑓^{−1}(1)|. It is assumed that 𝑀 > 0.
Output: A string 𝑥⃗ ∈ {0, 1}^𝑛 with 𝑓(𝑥⃗) = 1.
In the explanation of the Grover algorithm that solves Problem 7.1.2, we use the
notation and assumptions of this problem and set 𝑁 = 2𝑛 .
First, we explain how the search problem can be solved by measuring the quantum state

(7.1.3) |𝑠⟩ = (1/√𝑁) ∑_{𝑥⃗∈{0,1}^𝑛} |𝑥⃗⟩ .

Set

(7.1.4) |𝑠0⟩ = (1/√(𝑁 − 𝑀)) ∑_{𝑥⃗∈{0,1}^𝑛, 𝑓(𝑥⃗)=0} |𝑥⃗⟩ , |𝑠1⟩ = (1/√𝑀) ∑_{𝑥⃗∈{0,1}^𝑛, 𝑓(𝑥⃗)=1} |𝑥⃗⟩ ,

and

(7.1.5) 𝜃 = arcsin √(𝑀/𝑁).

Proposition 7.1.3. We have

(7.1.6) |𝑠⟩ = √((𝑁 − 𝑀)/𝑁) |𝑠0⟩ + √(𝑀/𝑁) |𝑠1⟩ = cos 𝜃 |𝑠0⟩ + sin 𝜃 |𝑠1⟩ .
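Proposition 7.1.3 can be checked numerically for a small instance; the chosen function 𝑓 is an arbitrary example of ours:

```python
import numpy as np

n = 3
N = 2**n
f = np.zeros(N, dtype=int)
f[[1, 4, 6]] = 1                                   # an arbitrary f with M = 3 solutions
M = int(f.sum())

s = np.full(N, 1 / np.sqrt(N))                     # |s>, equation (7.1.3)
s0 = np.where(f == 0, 1.0, 0.0) / np.sqrt(N - M)   # |s_0>
s1 = np.where(f == 1, 1.0, 0.0) / np.sqrt(M)       # |s_1>
theta = np.arcsin(np.sqrt(M / N))                  # (7.1.5)

# (7.1.6): |s> = cos(theta)|s_0> + sin(theta)|s_1>
assert np.allclose(s, np.cos(theta) * s0 + np.sin(theta) * s1)
print("(7.1.6) verified for N = 8, M = 3")
```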
Exercise 7.1.4. Prove Proposition 7.1.3.
(7.1.8) 𝐺(cos 𝛼 |𝑠0 ⟩ + sin 𝛼 |𝑠1 ⟩) = cos(𝛼 + 2𝜃) |𝑠0 ⟩ + sin(𝛼 + 2𝜃) |𝑠1 ⟩ .
[Figure 7.1.1: the circuit applies 𝐻^{⊗𝑛} to |0⟩𝑛, then 𝑘 applications of the Grover iterator 𝐺, and measures the result 𝑥⃗.]
The Grover quantum search algorithm is shown in Figure 7.1.1 and Algorithm 7.1.6. Its input is the state |0⟩𝑛 . Then the algorithm constructs

(7.1.10) |𝑠⟩ = 𝐻^{⊗𝑛} |0⟩𝑛 .

This equation holds by Lemma 5.3.6. Subsequently, the algorithm applies 𝐺^𝑘 to |𝑠⟩ and measures the resulting quantum state in the computational basis of ℍ𝑛 . The number 𝑘 is chosen so that (2𝑘 + 1)𝜃 is as close as possible to 𝜋/2. By (7.1.9), this maximizes the
probability that the algorithm finds 𝑥⃗ ∈ {0, 1}𝑛 with 𝑓(𝑥)⃗ = 1. In Theorem 7.1.21 we
will estimate the number 𝑘 of applications of 𝐺 required in the search algorithm and
the success probability of the algorithm. Before we state and prove this theorem, we
construct the Grover iterator in the next section. Note that Algorithm 7.1.6 receives
as input the black-box implementing 𝑈 𝑓 but applies the Grover iterator 𝐺. In Section
7.1.4, we will explain how 𝐺 can be efficiently implemented using 𝑈 𝑓 .
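Since 𝐺 acts in the plane 𝑃 as the rotation (7.1.8), the behavior of the algorithm can be traced classically with plain trigonometry. A sketch, using the choice of 𝑘 from Algorithm 7.1.6 (the function name is ours):

```python
import numpy as np

def grover_success_probability(N: int, M: int) -> float:
    # (7.1.8): each application of G advances the angle by 2*theta,
    # starting from |s> at angle theta; measure after k = floor(pi/(4*theta)) steps.
    theta = np.arcsin(np.sqrt(M / N))
    k = int(np.pi / (4 * theta))
    return float(np.sin((2 * k + 1) * theta) ** 2)

p = grover_success_probability(2**10, 1)
print(round(p, 4))  # at least 1 - M/N by the failure bound of Theorem 7.1.21
```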
7.1.3. The Grover iterator. We now explain the construction of the Grover iter-
ator and prove its properties.
Algorithm 7.1.6. Grover algorithm for search problems with known number of solutions
Input: 𝑛 ∈ ℕ, a black-box implementing 𝑈𝑓 for some 𝑓 ∶ {0, 1}^𝑛 → {0, 1}, and 𝑀 = |𝑓^{−1}(1)|. It is assumed that 𝑀 > 0.
Output: 𝑥⃗ ∈ {0, 1}^𝑛 such that 𝑓(𝑥⃗) = 1
1: QSearch(𝑛, 𝑈𝑓 , 𝑀)
2: 𝑘 ← ⌊𝜋/(4𝜃)⌋ where 𝜃 = arcsin(√(𝑀/𝑁))
3: Apply the quantum circuit from Figure 7.1.1, the result being 𝑥⃗ ∈ {0, 1}^𝑛
4: end
The operator 𝑈𝑠 is sometimes also called the Grover diffusion operator. The next
proposition states basic properties of the operators in Definition 7.1.5.
Proposition 7.1.7. (1) The operators 𝑈1 and 𝑈𝑠 are unitary and Hermitian involu-
tions.
(2) The Grover iterator 𝐺 is unitary.
Exercise 7.1.8. Prove Proposition 7.1.7.
Next, we present the geometric properties of 𝑈1 and 𝑈𝑠 . For this, we define the
complex plane
(7.1.13) 𝑃 = ℂ |𝑠0 ⟩ + ℂ |𝑠1 ⟩ .
We note that (|𝑠0 ⟩ , |𝑠1 ⟩) is an orthonormal basis of 𝑃. The next proposition states an
important geometric property of 𝑈1 .
Proposition 7.1.9. The operator 𝑈1 acts as a reflection in the plane 𝑃 across |𝑠0 ⟩. In
particular, for all 𝛼 ∈ ℝ we have
(7.1.14) 𝑈1 (cos 𝛼 |𝑠0 ⟩ + sin 𝛼 |𝑠1 ⟩) = cos 𝛼 |𝑠0 ⟩ − sin 𝛼 |𝑠1 ⟩ .
Proposition 7.1.12. The operator 𝑈𝑠 acts as a reflection in the plane 𝑃 across |𝑠⟩. In
particular, for all 𝛼 ∈ ℝ we have
Let 𝛼 ∈ ℝ. Using Propositions 7.1.9 and 7.1.12 we can describe the action of the
Grover iterator on a quantum state |𝜓⟩ = cos 𝛼 |𝑠0 ⟩ + sin 𝛼 |𝑠1 ⟩ geometrically. This is
illustrated in Figure 7.1.4. Since applying 𝑈1 to |𝜓⟩ means reflecting |𝜓⟩ across |𝑠0 ⟩,
the angle between |𝑠0 ⟩ and 𝑈1 |𝜓⟩ is 𝛼 mod 2𝜋. So, the angle between |𝑠⟩ and 𝑈1 |𝜓⟩ is
𝛼 + 𝜃 mod 2𝜋. Next, applying 𝑈𝑠 to 𝑈1 |𝜓⟩ means reflecting 𝑈1 |𝜓⟩ across |𝑠⟩. So the
angle between 𝐺 |𝜓⟩ = 𝑈𝑠 𝑈1 |𝜓⟩ and |𝑠⟩ is 𝛼 + 𝜃 mod 2𝜋 and the angle between |𝑠0 ⟩ and
𝐺 |𝜓⟩ is 𝛼 + 2𝜃 mod 2𝜋. So we have

(7.1.22) 𝐺 (cos 𝛼 |𝑠0⟩ + sin 𝛼 |𝑠1⟩) = cos(𝛼 + 2𝜃) |𝑠0⟩ + sin(𝛼 + 2𝜃) |𝑠1⟩ .
Figure 7.1.4. Applying the Grover iterator 𝐺 to |𝜓⟩ = cos 𝛼 |𝑠0⟩ + sin 𝛼 |𝑠1⟩.
Proposition 7.1.17. The circuit in Figure 7.1.5 implements the operator 𝑈1 in the plane
𝑃. It applies the black-box for 𝑈 𝑓 once and uses four additional elementary quantum
gates.
Proof. We will prove that the circuit computes 𝑈1 |𝑠0 ⟩ and 𝑈1 |𝑠1 ⟩ correctly. This suf-
fices since 𝑈1 is linear and (|𝑠0 ⟩ , |𝑠1 ⟩) is a basis of the plane 𝑃. Let 𝑗 ∈ {0, 1}. First, we
[Figure 7.1.5: the circuit for 𝑈1. The ancilla qubit |1⟩ is mapped by 𝐻 to |𝑥−⟩, 𝑈𝑓 is applied to |𝜓⟩ |𝑥−⟩, and a final 𝐻 and tracing out restore the ancilla.]
note that
𝑈1 |𝑠𝑗 ⟩ = (𝐼 − 2 |𝑠1 ⟩ ⟨𝑠1 |) |𝑠𝑗 ⟩ = (−1)𝑗 |𝑠𝑗 ⟩ ,
𝑈𝑓 |𝑠𝑗⟩ |0⟩ = |𝑠𝑗⟩ |0⟩ if 𝑗 = 0 and |𝑠𝑗⟩ |1⟩ if 𝑗 = 1, while 𝑈𝑓 |𝑠𝑗⟩ |1⟩ = |𝑠𝑗⟩ |1⟩ if 𝑗 = 0 and |𝑠𝑗⟩ |0⟩ if 𝑗 = 1,
and therefore
(7.1.24) 𝑈𝑓 |𝑠𝑗⟩ |𝑥−⟩ = (𝑈𝑓 |𝑠𝑗⟩ |0⟩ − 𝑈𝑓 |𝑠𝑗⟩ |1⟩)/√2 = (−1)^𝑗 |𝑠𝑗⟩ |𝑥−⟩ = (𝑈1 |𝑠𝑗⟩) |𝑥−⟩ .
This allows us to determine the intermediate states in the circuit. They are
|𝜓0⟩ = |𝑠𝑗⟩ |1⟩ ,
|𝜓1⟩ = |𝑠𝑗⟩ |𝑥−⟩ ,
|𝜓2⟩ = 𝑈𝑓 |𝑠𝑗⟩ |𝑥−⟩ = (𝑈1 |𝑠𝑗⟩) |𝑥−⟩ ,
|𝜓3⟩ = (𝑈1 |𝑠𝑗⟩) |1⟩ .
This concludes the proof of the proposition. □
[Figure: the circuit implementing 𝑈𝑠, applying 𝐻^{⊗𝑛}, 𝑋^{⊗𝑛}, then 𝐶^{𝑛−1}(𝑍) with 𝑍 on the last qubit, then 𝑋^{⊗𝑛} and 𝐻^{⊗𝑛} again to the qubits |𝑥0⟩, . . . , |𝑥𝑛−1⟩.]
(7.1.28) 𝑉 |𝑥⃗⟩ = −|𝑥⃗⟩ if 𝑥⃗ ≠ 0⃗, and 𝑉 |𝑥⃗⟩ = |𝑥⃗⟩ if 𝑥⃗ = 0⃗.
We also have

(7.1.29) 𝑋^{⊗𝑛} |𝑥⃗⟩ = |¬𝑥⃗⟩

where ¬𝑥⃗ denotes the string in {0, 1}^𝑛 that is obtained by negating all entries in 𝑥⃗. We now show that

(7.1.30) 𝐶^{𝑛−1}(𝑍) 𝑋^{⊗𝑛} |𝑥⃗⟩ = 𝐶^{𝑛−1}(𝑍) |¬𝑥⃗⟩ = −|¬𝑥⃗⟩ if 𝑥⃗ = 0⃗, and |¬𝑥⃗⟩ if 𝑥⃗ ≠ 0⃗.
(7.1.31) 𝑋^{⊗𝑛} 𝐶^{𝑛−1}(𝑍) 𝑋^{⊗𝑛} |𝑥⃗⟩ = −|𝑥⃗⟩ if 𝑥⃗ = 0⃗, and |𝑥⃗⟩ if 𝑥⃗ ≠ 0⃗, which equals 𝑉 |𝑥⃗⟩ up to the unobservable global phase −1.
We estimate the size of the circuit. It uses O(𝑛) Pauli 𝑋 and Hadamard gates and
one 𝐶^{𝑛−1}(𝑍) operator, which by Proposition 4.9.11 and Corollary 4.12.8 can be implemented using O(𝑛) elementary quantum gates. So in total, the circuit has size
O(𝑛). □
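For small 𝑛 the identity (7.1.31) can be verified directly with matrices. The sketch below builds 𝑋^{⊗𝑛} and 𝐶^{𝑛−1}(𝑍) explicitly and checks that their product flips the sign of exactly the |0 . . . 0⟩ component; this agrees with 𝑉 up to an unobservable global phase, depending on the sign convention.

```python
import numpy as np
from functools import reduce

n = 3
N = 2**n
X = np.array([[0, 1], [1, 0]])
Xn = reduce(np.kron, [X] * n)      # X tensored n times, equation (7.1.29)
CZ = np.eye(N)
CZ[N - 1, N - 1] = -1              # C^{n-1}(Z): phase flip on |1...1>

W = Xn @ CZ @ Xn
expected = np.eye(N)
expected[0, 0] = -1                # sign flip on |0...0> only
assert np.allclose(W, expected)
print("X^n C^{n-1}(Z) X^n = diag(-1, 1, ..., 1)")
```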
Proposition 7.1.19. The Grover iterator 𝐺 can be implemented using one black-box for
𝑈 𝑓 and O(𝑛) additional elementary quantum gates.
Proof. It follows from Proposition 7.1.16 that the final quantum state produced by the
algorithm is
(7.1.32) |𝜓⟩ = 𝐺 𝑘 |𝑠⟩ = cos(2𝑘 + 1)𝜃 |𝑠0 ⟩ + sin(2𝑘 + 1)𝜃 |𝑠1 ⟩
where

(7.1.33) 𝜃 = arcsin √(𝑀/𝑁) and 𝑘 = ⌊𝜋/(4𝜃)⌋.
Then the algorithm measures |𝜓⟩ in the computational basis of ℍ𝑛 . It follows from the
definition of |𝑠0 ⟩ and |𝑠1 ⟩ that this gives 𝑥⃗ such that 𝑓(𝑥)⃗ = 1 with probability
(7.1.34) 𝑝 = sin^2((2𝑘 + 1)𝜃).
To prove the theorem, we estimate 𝑘 and 𝑝. It follows from Corollary A.5.6 and (7.1.33)
that
(7.1.35) 𝑘 ≤ 𝜋/(4𝜃) = 𝜋/(4 arcsin √(𝑀/𝑁)) ≤ (𝜋/4)√(𝑁/𝑀).
To estimate 𝑝 we observe that
(7.1.36) 0 < 𝜃 ≤ 𝜋/2.
We also set
(7.1.37) 𝑘̃ = 𝜋/(4𝜃) − 1/2.
Then
(7.1.38) (2𝑘̃ + 1)𝜃 = (𝜋/(2𝜃) − 1 + 1)𝜃 = 𝜋/2.
Also, the choice of 𝑘 in (7.1.33) implies
(7.1.39) 0 ≤ 𝜋/(4𝜃) − 𝑘 < 1
and therefore
(7.1.40) −1/2 ≤ 𝜋/(4𝜃) − 1/2 − 𝑘 = 𝑘̃ − 𝑘 < 1/2
which implies
(7.1.41) |𝑘 − 𝑘̃| ≤ 1/2.
It follows that
(7.1.42) |(2𝑘 + 1)𝜃 − (2𝑘̃ + 1)𝜃| = |2(𝑘 − 𝑘̃)𝜃| ≤ 𝜃.
By (7.1.38) we have sin(2𝑘 ̃ + 1)𝜃 = 1 and cos(2𝑘 ̃ + 1)𝜃 = 0. These equations, the
trigonometric identity (A.5.3), and equations (7.1.42) and (7.1.36) imply
|cos((2𝑘 + 1)𝜃)|
= |cos((2𝑘 + 1)𝜃) sin((2𝑘̃ + 1)𝜃) − cos((2𝑘̃ + 1)𝜃) sin((2𝑘 + 1)𝜃)|
(7.1.43)
= |sin((2𝑘 + 1)𝜃 − (2𝑘̃ + 1)𝜃)|
= sin |(2𝑘 + 1)𝜃 − (2𝑘̃ + 1)𝜃| ≤ sin 𝜃.
Therefore, the failure probability of the Grover search algorithm after 𝑘 iterations is
(7.1.44) cos^2((2𝑘 + 1)𝜃) ≤ sin^2 𝜃 = 𝑀/𝑁,
which implies the assertion about the success probability.
We estimate the complexity of the algorithm. It follows from (7.1.35) that the number of applications of the Grover iterator in the algorithm is bounded by (𝜋/4)√(𝑁/𝑀). So it follows from Proposition 7.1.19 that the algorithm invokes the black-box for 𝑈𝑓 at most (𝜋/4)√(𝑁/𝑀) times and uses O(log 𝑁 √(𝑁/𝑀)) additional elementary quantum gates. □
We note that the condition 𝑀 ≤ 3𝑁/4 is not a restriction, since for 𝑀 > 3𝑁/4 guessing a solution of the search problem has success probability at least 3/4.
In the proof of Theorem 7.1.23, we again use the angle

(7.1.45) 𝜃 = arcsin √(𝑀/𝑁).

Lemma 7.1.24. For all 𝛼 ∈ ℝ and all 𝑚 ∈ ℕ we have

(7.1.46) 2 sin 𝛼 ∑_{𝑘=0}^{𝑚−1} cos((2𝑘 + 1)𝛼) = sin(2𝑚𝛼).

Proof. We use induction on 𝑚. For 𝑚 = 1 we have

(7.1.47) 2 sin 𝛼 ∑_{𝑘=0}^{0} cos((2𝑘 + 1)𝛼) = 2 sin 𝛼 cos 𝛼 = sin(2𝛼).
Now let 𝑚 ≥ 1 and assume that (7.1.46) holds. Then this equation and the trigonomet-
ric identities (A.5.2) and (A.5.3) imply
2 sin 𝛼 ∑_{𝑘=0}^{𝑚} cos((2𝑘 + 1)𝛼)
= 2 sin 𝛼 (∑_{𝑘=0}^{𝑚−1} cos((2𝑘 + 1)𝛼) + cos((2𝑚 + 1)𝛼))
= sin(2𝑚𝛼) + 2 sin 𝛼 cos((2𝑚 + 1)𝛼)
(7.1.48) = sin(2𝑚𝛼) + sin 𝛼 cos((2𝑚 + 1)𝛼) − cos 𝛼 sin((2𝑚 + 1)𝛼) + sin 𝛼 cos((2𝑚 + 1)𝛼) + cos 𝛼 sin((2𝑚 + 1)𝛼)
= sin(2𝑚𝛼) − sin(2𝑚𝛼) + sin(2(𝑚 + 1)𝛼)
= sin(2(𝑚 + 1)𝛼). □
Lemma 7.1.25. Let 𝑚 ∈ ℕ and assume that 𝑘 is chosen randomly with the uniform distribution from ℤ𝑚 . Then measuring 𝐺^𝑘 |𝑠⟩ gives a solution of the search problem with probability

(7.1.49) 𝑝𝑚 = 1/2 − sin(4𝑚𝜃)/(4𝑚 sin(2𝜃)).

In particular, we have 𝑝𝑚 ≥ 1/4 when 𝑚 ≥ 1/sin(2𝜃).
Proof. By (7.1.9) the probability of obtaining a solution of the search problem when measuring 𝐺^𝑘 |𝑠⟩ for some 𝑘 ∈ ℕ0 is sin^2((2𝑘 + 1)𝜃). So if 𝑘 is chosen randomly from ℤ𝑚 for some 𝑚 ∈ ℕ, then equation (7.1.32), the trigonometric identity (A.5.7), and Lemma 7.1.24 imply that this probability is

(7.1.50) 𝑝𝑚 = (1/𝑚) ∑_{𝑘=0}^{𝑚−1} sin^2((2𝑘 + 1)𝜃) = (1/(2𝑚)) ∑_{𝑘=0}^{𝑚−1} (1 − cos((2𝑘 + 1)2𝜃)) = 1/2 − sin(4𝑚𝜃)/(4𝑚 sin(2𝜃)).

If 𝑚 ≥ 1/sin(2𝜃), then

(7.1.51) sin(4𝑚𝜃)/(4𝑚 sin(2𝜃)) ≤ sin(4𝑚𝜃)/4 ≤ 1/4. □
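The closed form (7.1.49) can be confirmed numerically against the defining average:

```python
import numpy as np

# Numeric check of (7.1.49)/(7.1.50): averaging sin^2((2k+1)theta) over
# k in Z_m equals 1/2 - sin(4*m*theta)/(4*m*sin(2*theta)).
theta = 0.1
for m in [1, 2, 5, 17]:
    direct = np.mean([np.sin((2 * k + 1) * theta) ** 2 for k in range(m)])
    closed = 0.5 - np.sin(4 * m * theta) / (4 * m * np.sin(2 * theta))
    assert np.isclose(direct, closed)
print("(7.1.49) verified")
```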
We now prove Theorem 7.1.23. Set

(7.1.52) 𝑚0 = 1/sin(2𝜃).

Since sin 𝜃 = √(𝑀/𝑁) and cos 𝜃 = √((𝑁 − 𝑀)/𝑁), it follows from (A.5.4) and 𝑀 ≤ 3𝑁/4 that

(7.1.53) 𝑚0 = 1/(2 sin 𝜃 cos 𝜃) = 𝑁/(2√((𝑁 − 𝑀)𝑀)) ≤ √(𝑁/𝑀).
In the 𝑗th iteration of the loop in Algorithm 7.1.22 we have

(7.1.54) 𝑚 = ⌈min{𝜆^{𝑗−1}, √𝑁}⌉

with 𝜆 = 6/5. Also, the expected number of applications of the Grover iterator in this loop is bounded as follows:

(7.1.55) 𝐸𝑗 = 𝑚/2 ≤ (1/2) min{𝜆^{𝑗−1}, √𝑁}.
We say that the algorithm reaches the critical stage if for the first time 𝑚 ≥ 𝑚0 . This
happens when in line 7 of the algorithm we have 𝑗 = ⌈log_𝜆 𝑚0⌉. From (7.1.55) and 𝜆 = 6/5 it follows that the expected number of applications of the Grover iterator before the algorithm finds a solution or reaches the critical stage is at most

(7.1.56) (1/2) ∑_{𝑗=1}^{⌈log_𝜆 𝑚0⌉} 𝜆^{𝑗−1} = (𝜆^{⌈log_𝜆 𝑚0⌉} − 1)/(2(𝜆 − 1)) < (𝜆/(2(𝜆 − 1))) 𝑚0 = 3𝑚0 .
If the critical stage is reached, then in every iteration of the repeat loop in the algorithm from this point on, we have 𝑚 ≥ 𝑚0 = 1/sin(2𝜃). By Lemma 7.1.25, the success probability in each of these iterations is at least 1/4. So for all 𝑢 ≥ 1 the probability that the algorithm is successful only in the (⌈log_𝜆 𝑚0⌉ + 𝑢)th iteration of the loop is at most (3/4)^{𝑢−1}.
Therefore, the expected number of applications of the Grover iterator needed to
succeed in the critical stage is at most
(7.1.57) ∑_{𝑢=0}^{∞} (𝜆^{⌈log_𝜆 𝑚0⌉+𝑢}/2) (3/4)^𝑢 < (3𝑚0/5) ∑_{𝑢=0}^{∞} (9/10)^𝑢 = 6𝑚0 .
Therefore, the total expected number of applications of the Grover iterator in the algorithm is bounded by 9𝑚0, which by (7.1.53) is bounded by 9√(𝑁/𝑀). The estimate of the expected running time of the algorithm is derived from Propositions 7.1.17 and 7.1.18.
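The control structure of this analysis can be emulated classically by replacing each quantum measurement with a sample from the success probability sin^2((2𝑘 + 1)𝜃). A sketch of the QSearch loop with 𝜆 = 6/5 (the function name and parameter choices are ours):

```python
import math
import random

def qsearch_simulation(N: int, M: int, rng: random.Random) -> int:
    """Classical emulation of the QSearch loop: m grows by lambda = 6/5
    (capped at sqrt(N)), k is uniform in Z_m, and a measurement of G^k|s>
    succeeds with probability sin^2((2k+1)theta)."""
    theta = math.asin(math.sqrt(M / N))
    lam, m = 6 / 5, 1.0
    iterations = 0
    while True:
        iterations += 1
        k = rng.randrange(math.ceil(m))
        if rng.random() < math.sin((2 * k + 1) * theta) ** 2:
            return iterations          # a solution was measured
        m = min(lam * m, math.sqrt(N))

rng = random.Random(1)
runs = [qsearch_simulation(2**12, 3, rng) for _ in range(200)]
print(sum(runs) / len(runs))  # average number of loop iterations
```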
controlled-𝑈𝑠 operator can also be implemented by a quantum circuit using O(𝑛) ele-
mentary quantum gates. Next, in Figure 7.1.5, a quantum circuit implementation of 𝑈1
is presented, utilizing the 𝑈 𝑓 operator and O(1) elementary quantum gates. To trans-
form it into an implementation of the controlled-𝑈1 operator, we require the controlled-
𝑈 𝑓 operator. If an implementation of 𝑈 𝑓 is available that exclusively uses elemen-
tary quantum gates, then, by following the method described in the proof of Theorem
4.12.7, a quantum circuit for the controlled-𝑈 𝑓 operator can be constructed using only
elementary quantum gates. We assume that the controlled-𝑈 𝑓 operator is provided in
this manner or through some other means. Then, by employing the method from the
proof of Theorem 4.12.7, a quantum circuit implementing the controlled-𝑈1 operator
can be constructed that uses O(1) elementary quantum gates and one controlled-𝑈 𝑓
gate. Combining these results, we obtain the following result.
Proposition 7.2.1. There is an implementation of the controlled Grover iterator that
requires one controlled-𝑈 𝑓 gate and O(𝑛) additional elementary quantum gates.
Proof. By Exercise 7.2.3, the pair (|𝑠+ ⟩ , |𝑠− ⟩) is an orthonormal basis of 𝑃. Also, we
know from Proposition 7.1.16 that for all 𝛼 ∈ ℝ we have
(7.2.3) 𝐺(cos 𝛼 |𝑠0 ⟩ + sin 𝛼 |𝑠1 ⟩) = cos(𝛼 + 2𝜃) |𝑠0 ⟩ + sin(𝛼 + 2𝜃) |𝑠1 ⟩ .
As shown in Exercise 7.2.3, this implies
(7.2.4) 𝐺 |𝑠0 ⟩ = cos 2𝜃 |𝑠0 ⟩ + sin 2𝜃 |𝑠1 ⟩ , 𝐺 |𝑠1 ⟩ = − sin 2𝜃 |𝑠0 ⟩ + cos 2𝜃 |𝑠1 ⟩
and therefore
(7.2.5) 𝐺 |𝑠+ ⟩ = 𝑒2𝑖𝜃 |𝑠+ ⟩ , 𝐺 |𝑠− ⟩ = 𝑒−2𝑖𝜃 |𝑠− ⟩ .
So |𝑠+ ⟩ and |𝑠− ⟩ are eigenstates of 𝐺|𝑃 associated with the eigenvalues 𝑒2𝑖𝜃 and 𝑒−2𝑖𝜃 ,
respectively. Equation (7.2.2) is also proved in Exercise 7.2.3. □
Exercise 7.2.3. (1) Show that (|𝑠+ ⟩ , |𝑠− ⟩) is an orthonormal basis of 𝑃.
(2) Verify equations (7.2.4), (7.2.5), and (7.2.2).
The following theorem establishes the correctness of Algorithm 7.2.4 and provides
insight into its computational complexity.
Theorem 7.2.5. Assume that the input of Algorithm 7.2.4 is 𝑛, 𝑈𝑓 , 𝑙 as specified in the algorithm and let 𝐿 = 2^𝑙. Denote by 𝑀̃ the output of the algorithm. Then the following are true.

(1) With probability at least 8/𝜋^2 we have

(7.2.7) |𝑀̃ − 𝑀| ≤ 2𝜋√(𝑀(𝑁 − 𝑀))/𝐿 + 𝜋^2 𝑁/𝐿^2 .
(2) The algorithm requires O(𝐿) applications of 𝑈𝑓 and O(𝐿𝑛^2) additional elementary operations.
Proof. From (7.2.2) and (6.3.18) it follows that before tracing out the target register, the state of the quantum circuit is

(7.2.8) |𝜑⟩ = (−𝑖/√2) (𝑒^{𝑖𝜃} |𝜓𝑙(𝜃/𝜋)⟩ |𝑠+⟩ − 𝑒^{−𝑖𝜃} |𝜓𝑙(−𝜃/𝜋)⟩ |𝑠−⟩) .
Since (|𝑠+⟩, |𝑠−⟩) is an orthonormal basis of the plane 𝑃, Corollary 3.7.12 implies that after tracing out the target register, the control register is in the mixed state

(7.2.9) ((1/2, |𝜓𝑙(𝜃/𝜋)⟩), (1/2, |𝜓𝑙(−𝜃/𝜋)⟩)) .
So it follows from Theorem 6.3.7 and Lemma 6.3.3 that with probability at least 8/𝜋^2 the measurement result 𝑥 in Algorithm 7.2.4 satisfies

(7.2.10) |𝑥/𝐿 − 𝜃/𝜋| < 1/𝐿.
Set

(7.2.11) 𝑝 = sin^2 𝜃 = 𝑀/𝑁.

Then we have

(7.2.12) sin 𝜃 = √𝑝, cos 𝜃 = √(1 − 𝑝),

and

(7.2.13) 𝑀 = 𝑁𝑝.

So if we set

(7.2.14) 𝜃̃ = 𝜋𝑥/𝐿, 𝑝̃ = sin^2 𝜃̃,

then the return value of Algorithm 7.2.4 is

(7.2.15) 𝑀̃ = 𝑁𝑝̃.
We will now prove that

(7.2.16) |𝑝̃ − 𝑝| < 2𝜋√(𝑝(1 − 𝑝))/𝐿 + 𝜋^2/𝐿^2 .

Multiplying this inequality by 𝑁, we obtain the assertion of the theorem. We set

(7.2.17) 𝜀 = 𝜃̃ − 𝜃.
In the analysis of the algorithm, it is demonstrated that with probability at least cos^2(2/5), we have

(7.2.23) 2^{𝑙max} ≥ (2/(5𝜋)) √(𝑁/𝑀).

Then the number 𝑙max of QCount calls that return 0 provides crucial information about the magnitude of the solution count 𝑀. A larger value of 𝑙max implies a smaller value of 𝑀. Furthermore, it is proven that for 𝑙 = 𝑙max + ⌈log_2(20𝜋^2/𝜀)⌉, where 𝜀 denotes the chosen precision, the call QCount(𝑛, 𝑈𝑓 , 𝑙) provides the desired approximation 𝑀̃ with a probability of at least 8/𝜋^2. Therefore, the total success probability of the algorithm is at least (8/𝜋^2) cos^2(2/5) > 2/3.
The following theorem establishes both the correctness and the computational
complexity of Algorithm 7.2.6.
Theorem 7.2.7. Let 𝑛, 𝑈 𝑓 , 𝜀 be the input of Algorithm 7.2.6. Denote by 𝑀̂ the return
value of the algorithm. Then the following are true.
(1) With probability at least 2/3 we have

(7.2.24) |𝑀̂ − 𝑀| < 𝜀𝑀.

(2) The algorithm requires O(√𝑁/𝜀) applications of 𝑈𝑓 and O(𝑛^2 √𝑁/𝜀) additional elementary operations.

Proof. Let

(7.2.25) 𝜃 = arcsin √(𝑀/𝑁), 𝑘 = ⌈log_2(1/(5𝜃))⌉.

Then we have

(7.2.26) 2^𝑘 ≥ 2^{log_2(1/(5𝜃))} = 1/(5𝜃).
7.2.4. Exact counting. The two algorithms, QCount and ApproxQCount, can be effectively utilized to count the number of solutions 𝑀 = |𝑓^{−1}(1)| exactly. The approach involves employing ApproxQCount(𝑛, 𝑈𝑓 , 1/2) to obtain a reliable approximation 𝑀̃1 of 𝑀. Then, using this approximation, we find an appropriate value of 𝑙 ∈ ℕ such that QCount(𝑛, 𝑈𝑓 , 𝑙) provides 𝑀̃2 that satisfies |𝑀 − 𝑀̃2| < 1/2. Consequently, 𝑀 is the nearest integer to 𝑀̃2. This entire process is implemented in Algorithm 7.2.8.
Theorem 7.2.9. On input of 𝑛 ∈ ℕ and 𝑈𝑓 for some function 𝑓 ∶ {0, 1}^𝑛 → {0, 1}, Algorithm 7.2.8 returns 𝑀 = |𝑓^{−1}(1)| with probability at least 1/2. The algorithm requires O(√(𝑀𝑁)) applications of 𝑈𝑓 and O(𝑛^2 √(𝑀𝑁)) additional elementary operations.
Proof. We have |𝑀̃1 − 𝑀| ≤ 𝑀/2 and, therefore, 𝑀̃1 ≥ 𝑀/2. Choose 𝑙 = ⌈log_2(26√(𝑀̃1 𝑁))⌉ as in line 3 and 𝐿 = 2^𝑙. Then we have

(7.2.34) 1/𝐿 ≤ 1/(26√(𝑀̃1 𝑁)).

This implies

(7.2.35) |𝑀 − 𝑀̃2| ≤ (2𝜋/26)√(𝑀𝑁/(𝑀̃1 𝑁)) + (𝜋^2/26^2)(𝑁/(𝑀̃1 𝑁)) ≤ 4𝜋/26 + 𝜋^2/26^2 < 1/2.
So it follows that the algorithm returns the correct 𝑀.
The complexity statement follows from Theorems 7.2.5 and 7.2.7. □
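The final inequality in (7.2.35) is a pure arithmetic fact and is easy to confirm:

```python
import math

# The error bound in (7.2.35): 4*pi/26 + pi^2/26^2 is strictly below 1/2,
# so rounding M~_2 to the nearest integer recovers M exactly.
bound = 4 * math.pi / 26 + math.pi**2 / 26**2
print(bound)  # about 0.498, strictly less than 1/2
assert bound < 0.5
```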
Chapter 8

The HHL Algorithm
can only handle rational approximations to complex numbers, several algorithms have been developed to efficiently find good approximations to 𝑥⃗ in time polynomial in 𝑀.
The HHL algorithm addresses the Quantum Linear System Problem (QLSP). As in
LSP, it uses 𝑀 ∈ ℕ, 𝐴 ∈ 𝖦𝖫(𝑀, ℂ), 𝑏 ⃗ ∈ ℂ𝑀 , and 𝑥⃗ = 𝐴−1 𝑏.⃗ To simplify the description
of the HHL algorithm, the following assumptions are made.
(1) 𝑀 = 2𝑚 with 𝑚 ∈ ℕ.
(2) 𝐴 is Hermitian; hence, 𝐴 ∈ 𝖦𝖫(𝑀, ℂ) and Proposition 2.4.60 imply that the eigen-
values of 𝐴 are nonzero real numbers.
(3) The eigenvalues of 𝐴 are in [0, 2𝜋[.
(4) ‖𝑏⃗‖ = 1.
Exercise 8.1.1. Let 𝐴 ∈ 𝖦𝖫(𝑀, ℂ) and 𝑏⃗, 𝑥⃗ ∈ ℂ^𝑀 with 𝐴𝑏⃗ = 𝑥⃗. Show that the block matrix 𝐴′ = ( 0 𝐴∗ ; 𝐴 0 ) is Hermitian and that for 𝑏′⃗ = (𝑏⃗, 0⃗) and 𝑥′⃗ = (0⃗, 𝑥⃗) we have 𝐴′𝑏′⃗ = 𝑥′⃗.
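Exercise 8.1.1 can be checked numerically, reading 𝐴∗ as the adjoint of 𝐴; the random test instance is ours:

```python
import numpy as np

# A' = [[0, A*], [A, 0]] is Hermitian, and A'(b, 0) = (0, A b).
rng = np.random.default_rng(0)
M = 4
A = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))   # generic, non-Hermitian
b = rng.normal(size=M) + 1j * rng.normal(size=M)

Z = np.zeros((M, M))
A_prime = np.block([[Z, A.conj().T], [A, Z]])
b_prime = np.concatenate([b, np.zeros(M)])

assert np.allclose(A_prime, A_prime.conj().T)                # A' is Hermitian
assert np.allclose(A_prime @ b_prime, np.concatenate([np.zeros(M), A @ b]))
print("A' is Hermitian and A' b' = (0, A b)")
```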
8.2. Overview
Let 𝑚, 𝑀, 𝐴, 𝑏,⃗ 𝑥,⃗ |𝑏⟩, and |𝑥⟩ be as specified in the previous section for the HHL prob-
lem. In the following, we give an overview of the HHL algorithm.
Since 𝐴 is Hermitian, it follows from Theorem 2.4.53 that we can choose an orthonormal basis (|𝑢0⟩, . . . , |𝑢𝑀−1⟩) of eigenstates of 𝐴. Denote by 𝜆0, . . . , 𝜆𝑀−1 the corresponding eigenvalues. They are nonzero real numbers in [0, 2𝜋[ by assumption. It
follows from the Spectral Theorem 2.4.56 that

(8.2.1) 𝐴 = ∑_{𝑗=0}^{𝑀−1} 𝜆𝑗 |𝑢𝑗⟩ ⟨𝑢𝑗| .
The HHL circuit shown in Figure 8.2.1 uses this identity to approximate |𝑥⟩. We explain
how this works by determining the intermediate states |𝜓0 ⟩ , . . . , |𝜓4 ⟩.
The HHL circuit operates on a quantum register which is composed of three smaller
quantum registers. The first is the ancilla register. It contains one ancillary qubit. The
second is the clock register. It is of length 𝑛 ∈ ℕ which is a precision constant. The third
register is the b-register. It is of length 𝑚. To simplify the explanation of the algorithm,
[Figure 8.2.1: the HHL circuit. The clock register |0⟩𝑛 and the b-register pass through the phase estimation circuit 𝑈𝑃, the controlled rotation 𝑉 is applied to the ancilla qubit |0⟩, and 𝑈𝑃^{−1} undoes the phase estimation before the ancilla is measured.]
with 𝑈 and 𝑈𝑃 from Figure 8.2.1. Here, 𝑈𝑃 is the phase estimation circuit introduced
in Section 6.3. It is used to estimate the eigenvalues of
are the eigenvalues corresponding to the basis elements. It follows from equation
(6.3.20) in the analysis of the phase estimation algorithm, Definition 6.2.5, and the
invertibility of QFT𝑛 shown in Proposition 6.2.8 that
|𝜓1⟩ = |0⟩ ∑_{𝑗=0}^{𝑀−1} 𝛽𝑗 𝑈𝑃 |0⟩𝑛 |𝑢𝑗⟩

(8.2.10) = |0⟩ ∑_{𝑗=0}^{𝑀−1} 𝛽𝑗 QFT𝑛^{−1} |𝜓𝑛(𝑐𝑗/2^𝑛)⟩ |𝑢𝑗⟩

= |0⟩ ∑_{𝑗=0}^{𝑀−1} 𝛽𝑗 |𝑐𝑗⟩𝑛 |𝑢𝑗⟩ .
In order to obtain |𝜓2⟩, the HHL circuit applies the operator 𝑉 to |𝜓1⟩, which is also shown in Figure 8.2.1. This operator acts as the rotation

(8.2.11) 𝑅𝑐 = 𝑅𝑦̂(−2𝜃(𝑐))

on the ancilla register, controlled by the clock register |𝑐⟩𝑛 , 𝑐 ∈ ℤ_{2^𝑛}, and does not change the clock and the b-register. Here, 𝑅𝑦̂ is from Definition 4.3.7,

(8.2.12) 𝜃(𝑐) = arcsin(𝐶/𝜆(𝑐)) with 𝜆(𝑐) = 𝑐/2^𝑛 ,
and the constant 𝐶 ∈ ℝ is chosen such that 𝜃(𝑐) in (8.2.12) is defined, lies in the interval [0, 𝜋/2], and the success probability of the algorithm is maximized. From (4.3.9), we obtain the following:

(8.2.13) 𝑅𝑐 |0⟩ = cos 𝜃(𝑐) |0⟩ + sin 𝜃(𝑐) |1⟩ = √(1 − 𝐶^2/𝜆(𝑐)^2) |0⟩ + (𝐶/𝜆(𝑐)) |1⟩ .
This implies

|𝜓2⟩ = 𝑉 |𝜓1⟩ = ∑_{𝑗=0}^{𝑀−1} 𝛽𝑗 (𝑅_{𝑐𝑗} |0⟩) |𝑐𝑗⟩𝑛 |𝑢𝑗⟩

(8.2.14) = |0⟩ ∑_{𝑗=0}^{𝑀−1} 𝛽𝑗 √(1 − 𝐶^2/𝜆𝑗^2) |𝑐𝑗⟩𝑛 |𝑢𝑗⟩ + |1⟩ ∑_{𝑗=0}^{𝑀−1} 𝛽𝑗 (𝐶/𝜆𝑗) |𝑐𝑗⟩𝑛 |𝑢𝑗⟩ .
Theorem 8.2.3. Measuring the first qubit of |𝜓3⟩ gives |1⟩ with probability 𝐶^2 ‖𝑥⃗‖^2. If |1⟩ is measured, then the final state in the HHL circuit is

(8.2.16) |𝜓4⟩ = (𝐶/‖𝑥⃗‖) |𝑥⟩ .
Proof. Measuring the first qubit of |𝜓3⟩ means measuring the observable 𝑂 = |1⟩⟨1| ⊗ 𝐼𝐵 where 𝐵 is the quantum system comprising the second and third quantum registers. Therefore, the probability of measuring |1⟩ is 𝐶^2 ‖𝑥⃗‖^2, and if |1⟩ is measured, then (8.2.16) holds. □
We note that the proportionality factor 𝐶/‖𝑥⃗‖ can be obtained from 𝐶 and the probability of measuring |1⟩. Also, if the measurement of the first qubit gives |1⟩ but (8.2.5) does not hold, which in general is the case, then the final state is

(8.2.17) |𝜓4⟩ = (𝐶/‖𝑥′⃗‖) |𝑥′⟩

where 𝑥′⃗ = (𝑥0′, . . . , 𝑥_{𝑀−1}′) is an approximation of 𝑥⃗ and |𝑥′⟩ = ∑_{𝑗=0}^{𝑀−1} 𝑥𝑗′ |𝑢𝑗⟩ .
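The effect of a successful post-selection can be emulated with classical linear algebra: conditioned on |1⟩, the b-register holds ∑𝑗 (𝛽𝑗/𝜆𝑗) |𝑢𝑗⟩ normalized, i.e. the normalized solution 𝐴^{−1}𝑏⃗. A sketch with exact eigenvalues (the test matrix is ours):

```python
import numpy as np

rng = np.random.default_rng(7)
M = 4
B = rng.normal(size=(M, M))
A = (B + B.T) / 2 + M * np.eye(M)      # Hermitian and (almost surely) invertible
b = rng.normal(size=M)
b = b / np.linalg.norm(b)              # assumption (4): ||b|| = 1

lam, U = np.linalg.eigh(A)             # columns of U are the eigenstates |u_j>
beta = U.T @ b                         # beta_j = <u_j|b>
x_state = U @ (beta / lam)             # sum_j (beta_j/lambda_j)|u_j> = A^{-1} b
x_state = x_state / np.linalg.norm(x_state)

x = np.linalg.solve(A, b)
assert np.allclose(x_state, x / np.linalg.norm(x))
print("post-selected state matches the normalized solution A^{-1} b")
```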
Appendix A

Foundations
A.1. Basics
A.1.1. Numbers. We denote the usual sets of numbers as follows.
• ℕ is the set of natural numbers; i.e., ℕ = {1, 2, . . .}.
• ℕ0 is the set of natural numbers including 0; i.e., ℕ0 = {0, 1, 2, . . .}.
• ℤ is the set of integers; i.e., ℤ = {0, ±1, ±2, . . .}.
• ℚ is the set of rational numbers; i.e., ℚ = {𝑝/𝑞 ∶ 𝑝 ∈ ℤ, 𝑞 ∈ ℕ}.
• ℝ is the set of real numbers, i.e., the set of all numbers that can be represented by
infinite decimals, like √2 = 1.414 . . . or 𝜋 = 3.14159 . . ..
• ℂ is the set of complex numbers, i.e., the set of all numbers 𝛾 = 𝛼 + 𝑖𝛽 where 𝛼, 𝛽
are real numbers and 𝑖 is a square root of −1; i.e., 𝑖2 = −1. In this representation,
𝛼 is called the real part of 𝛾 and is denoted by ℜ𝛾 and 𝛽 is called the imaginary
part of 𝛾 and is denoted by ℑ𝛾.
We note that
(A.1.1) ℕ ⊂ ℕ0 ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ.
A.1.2. Relations.
Definition A.1.4. (1) Let 𝑆 and 𝑇 be sets. The Cartesian product 𝑆 × 𝑇 of 𝑆 and 𝑇 is
the set of all pairs (𝑠, 𝑡) with 𝑠 ∈ 𝑆 and 𝑡 ∈ 𝑇; that is,
(A.1.3) 𝑆 × 𝑇 = {(𝑠, 𝑡) ∶ 𝑠 ∈ 𝑆, 𝑡 ∈ 𝑇}.
(2) More generally, if 𝑘 ∈ ℕ and 𝑆0, . . . , 𝑆𝑘−1 are sets, then the Cartesian product 𝑆0 × ⋯ × 𝑆𝑘−1 of these sets is the set of all tuples (𝑠0, . . . , 𝑠𝑘−1) where 𝑠𝑖 ∈ 𝑆𝑖 , 𝑖 ∈ ℤ𝑘 ; i.e.,

(A.1.4) 𝑆0 × ⋯ × 𝑆𝑘−1 = {(𝑠0, . . . , 𝑠𝑘−1) ∶ 𝑠𝑖 ∈ 𝑆𝑖 , 𝑖 ∈ ℤ𝑘}.

We also write ∏_{𝑖=0}^{𝑘−1} 𝑆𝑖 for this Cartesian product.
Definition A.1.5. Let 𝑆 and 𝑇 be sets. A relation between 𝑆 and 𝑇 is a subset 𝑅 of the
Cartesian product 𝑆 × 𝑇. If 𝑆 = 𝑇, then 𝑅 is called a relation on 𝑆.
Example A.1.6. Consider the two sets 𝑆 = {“odd”, “even”}, 𝑇 = ℤ. Then “is the parity
of” is a relation between 𝑆 and 𝑇. Denote it by 𝑅. A pair (𝑠, 𝑡) is in 𝑅 if and only if 𝑠
is the parity of 𝑡. For example, (“even”, 2) is in 𝑅. Also, (“odd”, −3) is in 𝑅. However,
(“odd”, 0) is not in 𝑅.
(3) The relation 𝑅 is called antisymmetric if (𝑠, 𝑡) ∈ 𝑅 and (𝑡, 𝑠) ∈ 𝑅 implies 𝑠 = 𝑡 for
all 𝑠, 𝑡 ∈ 𝑆.
(4) The relation 𝑅 is called transitive if for all 𝑠, 𝑡, 𝑢 ∈ 𝑆 such that both (𝑠, 𝑡) and (𝑡, 𝑢)
are in 𝑅, the pair (𝑠, 𝑢) is also in 𝑅.
(5) The relation 𝑅 is called an equivalence relation if it is reflexive, symmetric, and
transitive.
(1) The equivalence class of an element 𝑠 ∈ 𝑆 with respect to the relation 𝑅 is the set
[𝑠]𝑅 = {𝑡 ∈ 𝑆 ∶ (𝑠, 𝑡) ∈ 𝑅}.
(2) The set of all equivalence classes of 𝑆 with respect to 𝑅 is written as 𝑆/𝑅. An
element of an equivalence class is called a representative of this equivalence class.
Definition A.1.12. A function is a triplet 𝑓 = (𝑆, 𝑇, 𝑅) where 𝑆 and 𝑇 are sets and 𝑅 is
a relation between 𝑆 and 𝑇 that associates every element of 𝑆 with exactly one element
of 𝑇. This means that for every 𝑠 ∈ 𝑆 there is exactly one 𝑡 ∈ 𝑇 such that (𝑠, 𝑡) ∈ 𝑅.
This element 𝑡 is denoted by 𝑓(𝑠). We will write the function as

(A.1.6) 𝑓 ∶ 𝑆 → 𝑇

or, specifying the images of the elements, as

(A.1.7) 𝑓 ∶ 𝑆 → 𝑇, 𝑠 ↦ 𝑓(𝑠).

Such a function is also called a map or mapping from 𝑆 to 𝑇. The set of all functions (𝑆, 𝑇, 𝑅) is denoted by 𝑇^𝑆.
This function is neither injective nor surjective. For example, 𝑓(0) = 𝑓(11) = 0.
Therefore, 𝑓 is not injective. In addition, 𝑓 is not surjective since 𝑓(ℤ) = ℤ11 .
However, if we restrict the domain of 𝑓 to ℤ11 , then 𝑓 becomes injective. Also,
if we restrict the codomain of 𝑓 to ℤ11 , then 𝑓 becomes surjective. In fact, the
function 𝑓 ∶ ℤ11 → ℤ11 , 𝑠 ↦ 𝑠 mod 11 is the identity map on ℤ11 . Another
bijection is
(A.1.13) 𝑓 ∶ 𝑇 → 𝑈, 𝑔∶𝑆→𝑇
(A.1.14) 𝑓 ∘ 𝑔 ∶ 𝑆 → 𝑈, 𝑠 ↦ 𝑓(𝑔(𝑠)).
(A.1.15) 𝑓 ∶ ℤ6 → ℤ 5 , 𝑥 ↦ 𝑥 mod 5
and
(A.1.16) 𝑔 ∶ ℤ → ℤ6 , 𝑥 ↦ 𝑥 mod 6.
Then
(A.1.19) ∘ ∶ 𝑆 × 𝑆 → 𝑆.
Example A.1.21. Addition and multiplication are binary operations on the sets of nat-
ural numbers ℕ and on the set of integers ℤ.
Let 𝑚 be a positive integer. We define the binary operations addition and multiplication on ℤ𝑚 as follows:
(A.1.20) +𝑚 ∶ ℤ𝑚 × ℤ𝑚 → ℤ𝑚 , (𝑎, 𝑏) ↦ 𝑎 +𝑚 𝑏 = (𝑎 + 𝑏) mod 𝑚,
⋅𝑚 ∶ ℤ𝑚 × ℤ𝑚 → ℤ𝑚 , (𝑎, 𝑏) ↦ 𝑎 ⋅𝑚 𝑏 = (𝑎 ⋅ 𝑏) mod 𝑚.
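In code, each choice of 𝑚 yields a pair of binary operations; a small sketch of (A.1.20):

```python
def add_mod(m: int):
    """The binary operation +_m on Z_m from (A.1.20)."""
    return lambda a, b: (a + b) % m

def mul_mod(m: int):
    """The binary operation ._m on Z_m from (A.1.20)."""
    return lambda a, b: (a * b) % m

add7, mul7 = add_mod(7), mul_mod(7)
print(add7(5, 4), mul7(5, 4))  # 2 6
```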
We note that any integer 𝑚 divides 0 because 0 = 𝑚 ⋅ 0. The only integer that is divisible by 0 is 0 because 𝑎 = 0 ⋅ 𝑚 implies 𝑎 = 0.
We also use divisibility in ℤ in the following example of an equivalence relation.
Example A.3.3.
Let 𝑠 and 𝑡 be integers, and let 𝑚 be a positive integer. We write
(A.3.1) 𝑠 ≡ 𝑡 mod 𝑚
if 𝑚 divides 𝑡 − 𝑠. Consider the relation
(A.3.2) 𝑅 = {(𝑠, 𝑡) ∶ 𝑠, 𝑡 ∈ ℤ, 𝑠 ≡ 𝑡 mod 𝑚}.
It is called a congruence relation. It is reflexive because 𝑚 divides 𝑠 − 𝑠 = 0 for all 𝑠 ∈ ℤ.
This relation is symmetric since 𝑚 is a divisor of 𝑡 − 𝑠 if and only if 𝑚 is a divisor of 𝑠 − 𝑡.
Finally, the relation is transitive. To see this, we let 𝑠, 𝑡, 𝑢 be integers. Suppose that
(A.3.3) 𝑠 ≡ 𝑡 mod 𝑚, 𝑡 ≡ 𝑢 mod 𝑚.
Then 𝑚 divides 𝑡 − 𝑠 and 𝑢 − 𝑡. So we can write
(A.3.4) 𝑡 − 𝑠 = 𝑥𝑚, 𝑢 − 𝑡 = 𝑦𝑚
with two integers 𝑥, 𝑦. Therefore, we have the following:
(A.3.5) 𝑢 − 𝑠 = (𝑢 − 𝑡) + (𝑡 − 𝑠) = 𝑦𝑚 + 𝑥𝑚 = (𝑥 + 𝑦)𝑚.
So 𝑚 divides 𝑢 − 𝑠; that is, 𝑠 ≡ 𝑢 mod 𝑚.
A.3.2. Greatest common divisor. Our next topic is the greatest common divi-
sor of two integers and we will explain that the Euclidean algorithm computes it effi-
ciently. The proofs of all the results presented in this section can be found in [Buc04,
Section 1.10].
Definition A.3.4. A common divisor of two integers 𝑎 and 𝑏 is an integer that divides both 𝑎 and 𝑏.
Proposition A.3.5. Among all common divisors of two integers 𝑎 and 𝑏, which are not
both zero, there is exactly one greatest divisor (with respect to ≤). It is called the greatest
common divisor (gcd) of 𝑎 and 𝑏.
For completeness, we set the greatest common divisor of 0 and 0 to 0 (that is,
gcd(0, 0) = 0). Therefore, the greatest common divisor of two numbers is never nega-
tive.
We present another useful characterization of the greatest common divisor.
Proposition A.3.6. There is exactly one nonnegative common divisor of 𝑎 and 𝑏, which
is divisible by all other common divisors of 𝑎 and 𝑏, namely the greatest common divisor
of 𝑎 and 𝑏.
Example A.3.7. The greatest common divisor of 18 and 30 is 6. The greatest common divisor of −10 and 20 is 10. The greatest common divisor of −20 and −14 is 2. The greatest common divisor of 12 and 0 is 12.
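The Euclidean algorithm mentioned at the beginning of this section can be sketched in Python as follows (our illustration; see [Buc04, Section 1.10] for the formal treatment):

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor via the Euclidean algorithm.
    With the convention gcd(0, 0) = 0, the result is never negative."""
    a, b = abs(a), abs(b)
    while b != 0:
        a, b = b, a % b  # replace (a, b) by (b, a mod b)
    return a

# The values from Example A.3.7:
print(gcd(18, 30), gcd(-10, 20), gcd(-20, -14), gcd(12, 0))  # 6 10 2 12
```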
Example A.3.10. We have ℤ∗1 = ∅, 𝜑(1) = 0, ℤ∗2 = {1}, 𝜑(2) = 1, ℤ∗15 = {1, 2, 4, 7, 8,
11, 13, 14}, and 𝜑(15) = 8.
Exercise A.3.11. Let 𝑚 ∈ ℕ, 𝑚 > 1. Prove that ℤ∗𝑚 is a group with respect to multi-
plication modulo 𝑚.
A.3.3. Least common multiple. We also require the least common multiple of
two integers.
Definition A.3.12. Let 𝑛 ∈ ℕ and let 𝑎0 , . . . , 𝑎𝑛−1 be nonzero integers. Then the least
common multiple of these integers is the smallest positive integer that is a multiple of
all 𝑎𝑖 . It is denoted by lcm(𝑎0 , . . . , 𝑎𝑛−1 ).
Example A.3.13. The least common multiple of 2, 3, 4 is lcm(2, 3, 4) = 12.
The next exercise justifies the definition of the least common multiple and develops an algorithm for computing it.
Exercise A.3.14. (1) Prove the existence and uniqueness of the least common multi-
ple of finitely many nonzero integers. Why do these numbers have to be nonzero?
(2) Utilize the Euclidean algorithm to devise an algorithm for computing the least
common multiple in quadratic running time.
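One possible solution sketch for part (2) (ours, using the standard identity lcm(𝑎, 𝑏) = |𝑎𝑏|/ gcd(𝑎, 𝑏) together with the Euclidean algorithm):

```python
from math import gcd

def lcm2(a: int, b: int) -> int:
    """Least common multiple of two nonzero integers."""
    return abs(a * b) // gcd(a, b)

def lcm(*numbers: int) -> int:
    """lcm(a_0, ..., a_{n-1}), computed by folding the two-argument lcm."""
    result = 1
    for a in numbers:
        result = lcm2(result, a)
    return result

print(lcm(2, 3, 4))  # 12, as in Example A.3.13
```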
We state the fundamental theorem of arithmetic. It is also called the unique fac-
torization theorem and goes back to Euclid.
Theorem A.3.17. Every integer 𝑎 > 1 can be written as the product of prime numbers.
Up to permutation, the factors in this product are uniquely determined.
Example A.3.18. The French mathematician Pierre de Fermat (1601–1665) thought
that all of the so-called Fermat numbers
𝐹𝑖 = 2^{2^𝑖} + 1
are primes. In fact, 𝐹0 = 3, 𝐹1 = 5, 𝐹2 = 17, 𝐹3 = 257, and 𝐹4 = 65537 are prime
numbers. However, in 1732 Euler discovered that 𝐹5 = 641 ∗ 6700417 is composite.
Both factors in this decomposition are primes. 𝐹6 , 𝐹7 , 𝐹8 , and 𝐹9 are also composite. The
factorization of 𝐹6 was found in 1880 by Landry and Le Lasseur. The factorization of 𝐹7
was found in 1970 by Brillhart and Morrison. The factorization of 𝐹8 was computed in
1980 by Brent and Pollard, and 𝐹9 was factored in 1990 by Lenstra, Lenstra, Manasse,
and Pollard (see [LLMP93] where also references for the other results mentioned in
this example can be found). This shows the difficulty of the factoring problem. But
on the other hand, we also see that there is considerable progress. It took until 1970
to factor the 39-digit number 𝐹7 , but only 20 years later the 155-digit number 𝐹9 was
factored.
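Euler's counterexample is easy to reproduce today with exact integer arithmetic (a small check of ours, not part of the book):

```python
def fermat(i: int) -> int:
    """The Fermat number F_i = 2^(2^i) + 1."""
    return 2 ** (2 ** i) + 1

# F_0, ..., F_4 are the primes 3, 5, 17, 257, 65537.
print([fermat(i) for i in range(5)])  # [3, 5, 17, 257, 65537]

# Euler's 1732 factorization of F_5:
print(fermat(5) == 641 * 6700417)  # True
```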
A.3.5. The continued fraction algorithm. In this section, we present the con-
tinued fraction algorithm (CFA) and its properties. It is used in Shor’s order finding
algorithm to compute good rational approximations.
We start with an example.
Example A.3.19. The continued fraction [𝑎0 , 𝑎1 , 𝑎2 ] = [1, 2, 3] represents the rational number
(A.3.8) [1, 2, 3] = 1 + 1/(2 + 1/3).
This rational number is
(A.3.9) [1, 2, 3] = 1 + 1/(2 + 1/3) = 1 + 1/(7/3) = 1 + 3/7 = 10/7.
Definition A.3.20. Let 𝑛 ∈ ℕ0 and let (𝑎0 , 𝑎1 , . . . , 𝑎𝑛 ) ∈ ℚ≥0 × ℚ𝑛>0 . Then we set
(A.3.10) [𝑎0 , 𝑎1 , . . . , 𝑎𝑛 ] = 𝑎0 + 1/(𝑎1 + 1/(𝑎2 + ⋯ + 1/(𝑎𝑛−1 + 1/𝑎𝑛 ) ⋯ )).
This defines the map
(A.3.11) ℚ≥0 × ℚ𝑛>0 → ℚ, (𝑎0 , 𝑎1 , . . . , 𝑎𝑛 ) ↦ [𝑎0 , . . . , 𝑎𝑛 ].
For 𝑛 ≥ 1, the definition implies
(A.3.12) [𝑎0 , 𝑎1 , . . . , 𝑎𝑛 ] = [𝑎0 , 𝑎1 , . . . , 𝑎𝑛−2 , 𝑎𝑛−1 + 1/𝑎𝑛 ].
Note that the sequence on the left side of (A.3.12) has length 𝑛 + 1, while the sequence on the right side has length 𝑛. The following lemma generalizes this observation.
Lemma A.3.21. Let 𝑛, 𝑘 ∈ ℕ0 , (𝑎0 , . . . , 𝑎𝑛 ) ∈ ℚ≥0 × ℚ𝑛>0 , and (𝑏0 , . . . , 𝑏𝑘 ) ∈ ℚ𝑘+1>0 .
Then we have
(A.3.13) [𝑎0 , . . . , 𝑎𝑛 , [𝑏0 , . . . , 𝑏𝑘 ]] = [𝑎0 , . . . , 𝑎𝑛 , 𝑏0 , . . . , 𝑏𝑘 ].
Proof. We prove the assertion by induction on 𝑘. For 𝑘 = 0, it follows from the fact that
[𝑏0 ] = 𝑏0 . Now, assume that the assertion holds for 𝑘 − 1. The induction hypothesis
and (A.3.12) imply
[𝑎0 , . . . , 𝑎𝑛 , [𝑏0 , . . . , 𝑏𝑘 ]] = [𝑎0 , . . . , 𝑎𝑛 , [𝑏0 , . . . , 𝑏𝑘−1 + 1/𝑏𝑘 ]]
(A.3.14) = [𝑎0 , . . . , 𝑎𝑛 , 𝑏0 , . . . , 𝑏𝑘−1 + 1/𝑏𝑘 ]
= [𝑎0 , . . . , 𝑎𝑛 , 𝑏0 , . . . , 𝑏𝑘 ]. □
Example A.3.24. Let 𝑝 = 15 and 𝑞 = 13. The sequence of values 𝑟𝑖 and 𝑎𝑖 from Algorithm A.3.23 is shown in Table A.3.1. We verify that [1, 6, 2] = 15/13. We have
(A.3.15) [1, 6, 2] = 1 + 1/(6 + 1/2) = 1 + 1/(13/2) = 1 + 2/13 = 15/13.
Table A.3.1. Run of the continued fraction algorithm with input 𝑝 = 15, 𝑞 = 13.
𝑖     −1   0   1   2
𝑟𝑖    15   13   2   1
𝑎𝑖          1   6   2
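The run in Table A.3.1 can be reproduced with a short Python sketch of the continued fraction algorithm (our rendering of the Euclidean remainder scheme behind Algorithm A.3.23; the function names are ours):

```python
def cfa(p: int, q: int) -> list[int]:
    """Continued fraction expansion [a_0, ..., a_n] of p/q for p, q in N,
    using the remainder sequence r_{-1} = p, r_0 = q."""
    partial_quotients = []
    r_prev, r = p, q
    while r != 0:
        partial_quotients.append(r_prev // r)  # a_i = floor(r_{i-1} / r_i)
        r_prev, r = r, r_prev % r              # r_{i+1} = r_{i-1} mod r_i
    return partial_quotients

def evaluate(cf: list[int]) -> tuple[int, int]:
    """Evaluate a continued fraction back to (numerator, denominator)."""
    num, den = cf[-1], 1
    for a in reversed(cf[:-1]):
        num, den = a * num + den, num  # a + den/num = (a*num + den)/num
    return num, den

print(cfa(15, 13))          # [1, 6, 2]
print(evaluate([1, 6, 2]))  # (15, 13)
```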
The next proposition shows that the continued fraction algorithm yields the correct
result.
Proposition A.3.26. On input of 𝑝, 𝑞 ∈ ℕ, Algorithm A.3.23 computes a continued fraction [𝑎0 , 𝑎1 , . . . , 𝑎𝑛 ] for some 𝑛 ∈ ℕ0 that represents 𝑝/𝑞 and satisfies 𝑎𝑛 > 1 if 𝑛 > 0. It is the only continued fraction with these properties.
Proof. We use the notation of Algorithm A.3.23. After each iteration of the repeat
loop, we have
(A.3.16) 𝑟 𝑖−1 = 𝑎𝑖 𝑟 𝑖 + 𝑟 𝑖+1 and 0 ≤ 𝑟 𝑖+1 < 𝑟 𝑖 for 0 ≤ 𝑖 ≤ 𝑛.
This shows that the sequence 𝑟0 , 𝑟1 , . . . is strictly decreasing. Therefore, the algorithm
terminates.
For 0 ≤ 𝑖 ≤ 𝑛 let 𝛼𝑖 = [𝑎𝑖 , . . . , 𝑎𝑛 ]. Then we have
(A.3.17) 𝛼𝑛 = 𝑎𝑛 and 𝛼𝑖 = 𝑎𝑖 + 1/𝛼𝑖+1 for 0 ≤ 𝑖 < 𝑛.
We show by induction on 𝑖 = 𝑛, 𝑛 − 1, . . . , 0 that
(A.3.18) 𝛼𝑖 = 𝑟𝑖−1 /𝑟𝑖 and 𝑎𝑖 = ⌊𝛼𝑖 ⌋.
For 𝑖 = 0, this shows that [𝑎0 , . . . , 𝑎𝑛 ] = 𝑝/𝑞.
The base case where 𝑖 = 𝑛 is obtained from 𝑟𝑛−1 = 𝑎𝑛 𝑟𝑛 , which follows from (A.3.16) and 𝑟𝑛+1 = 0. For the inductive step, let 𝑖 ∈ ℕ, 𝑛 ≥ 𝑖 > 0 and assume that (A.3.18) holds. Then (A.3.16) implies
(A.3.19) 𝑟𝑖−2 /𝑟𝑖−1 = 𝑎𝑖−1 + 𝑟𝑖 /𝑟𝑖−1 = 𝑎𝑖−1 + 1/𝛼𝑖 = 𝛼𝑖−1 .
Since 𝛼𝑖 = 𝑟𝑖−1 /𝑟𝑖 > 1 by (A.3.16), this implies 𝑎𝑖−1 = ⌊𝛼𝑖−1 ⌋. This completes the induction proof.
Next, we assume that 𝑛 > 0 and show that 𝑎𝑛 > 1. Since 𝑟𝑛+1 = 0, we have
𝑟𝑛−1 = 𝑎𝑛 𝑟𝑛 . So 𝑎𝑛 = 1 would imply 𝑟𝑛−1 = 𝑟𝑛 which contradicts (A.3.16).
Finally, we prove the uniqueness of [𝑎0 , . . . , 𝑎𝑛 ]. Let 𝑘 ∈ ℕ0 , 𝑘 ≥ 𝑛, and let [𝑏0 , . . . , 𝑏𝑘 ] be a continued fraction that represents 𝑝/𝑞 with 𝑏𝑘 > 1 for 𝑘 > 0. If 𝑘 = 0,
then we have 𝑛 = 0 and 𝑝/𝑞 = 𝑎0 = 𝑏0 . Assume that 𝑘 > 0. For 0 ≤ 𝑖 ≤ 𝑘, let
𝛽 𝑖 = [𝑏𝑖 , . . . , 𝑏𝑘 ]. Then we have
(A.3.20) 𝛽𝑘 = 𝑏𝑘 and 𝛽𝑖 = 𝑏𝑖 + 1/𝛽𝑖+1 for 0 ≤ 𝑖 < 𝑘.
Since 𝑏𝑘 > 1, it follows from (A.3.20) that
(A.3.22) 𝑏𝑖 = ⌊𝛽 𝑖 ⌋ for 1 ≤ 𝑖 ≤ 𝑘.
We show by induction that for 0 ≤ 𝑖 ≤ 𝑛 we have
(A.3.23) 𝛼𝑖 = 𝛽𝑖 and 𝑎𝑖 = 𝑏𝑖 .
By assumption, we have 𝛼0 = [𝑎0 , 𝑎1 , . . . , 𝑎𝑛 ] = 𝑟−1 /𝑟0 = [𝑏0 , . . . , 𝑏𝑘 ] = 𝛽0 and by (A.3.18) and (A.3.22) 𝑏0 = 𝑎0 . Now assume that 0 ≤ 𝑖 < 𝑛 and that (A.3.23) holds.
Then it follows from (A.3.17) and (A.3.20) that
(A.3.24) 𝛽𝑖+1 = 1/(𝛽𝑖 − 𝑏𝑖 ) = 1/(𝛼𝑖 − 𝑎𝑖 ) = 𝛼𝑖+1 .
So (A.3.18) and (A.3.22) imply 𝑎𝑖+1 = ⌊𝛼𝑖+1 ⌋ = ⌊𝛽𝑖+1 ⌋ = 𝑏𝑖+1 . This completes the induction and the proof of the uniqueness. □
Definition A.3.27. Let 𝛼 ∈ ℚ≥0 . The uniquely determined continued fraction from
Proposition A.3.26 that represents 𝛼 is called the continued fraction expansion of 𝛼.
Proof. The statement can be proved using the techniques from Section 1.6.4 in
[Buc04]. Note that the space is required to represent continued fractions. □
Proof. We prove the assertion by induction on 𝑛. For the base case, we note that
(A.3.29) 𝑝0 /𝑞0 = 𝑎0 /1 = [𝑎0 ]
and
(A.3.30) 𝑝1 /𝑞1 = (𝑎1 𝑎0 + 1)/𝑎1 = 𝑎0 + 1/𝑎1 = [𝑎0 , 𝑎1 ].
For the inductive step, let 𝑖 ∈ ℕ0 , 0 ≤ 𝑖 < 𝑛, and assume that the assertion of the proposition holds for 𝑖. The induction hypothesis gives
(A.3.31)
[𝑎0 , . . . , 𝑎𝑖+1 ] = [𝑎0 , . . . , 𝑎𝑖 + 1/𝑎𝑖+1 ]
= ((𝑎𝑖 + 1/𝑎𝑖+1 )𝑝𝑖−1 + 𝑝𝑖−2 )/((𝑎𝑖 + 1/𝑎𝑖+1 )𝑞𝑖−1 + 𝑞𝑖−2 )
= (𝑎𝑖 𝑝𝑖−1 + 𝑝𝑖−2 + 𝑝𝑖−1 /𝑎𝑖+1 )/(𝑎𝑖 𝑞𝑖−1 + 𝑞𝑖−2 + 𝑞𝑖−1 /𝑎𝑖+1 )
= (𝑝𝑖 + 𝑝𝑖−1 /𝑎𝑖+1 )/(𝑞𝑖 + 𝑞𝑖−1 /𝑎𝑖+1 )
= (𝑎𝑖+1 𝑝𝑖 + 𝑝𝑖−1 )/(𝑎𝑖+1 𝑞𝑖 + 𝑞𝑖−1 ) = 𝑝𝑖+1 /𝑞𝑖+1 . □
Exercise A.3.32. Let [𝑎0 , . . . , 𝑎𝑛 ] be a finite simple continued fraction. Use the notation from Proposition A.3.31 and show that 𝑝𝑖−1 𝑞𝑖 − 𝑝𝑖 𝑞𝑖−1 = (−1)𝑖 for −1 ≤ 𝑖 ≤ 𝑛 and gcd(𝑝𝑖 , 𝑞𝑖 ) = 1 for −2 ≤ 𝑖 ≤ 𝑛.
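Both the recursion behind Proposition A.3.31 and the identity of Exercise A.3.32 can be checked numerically. The sketch below (ours) uses the standard initial values 𝑝−1 = 1, 𝑞−1 = 0, 𝑝−2 = 0, 𝑞−2 = 1, which are an assumption on our part, since the statement of Proposition A.3.31 is not reproduced here:

```python
def convergents(cf: list[int]) -> list[tuple[int, int]]:
    """Convergents p_i/q_i of [a_0, ..., a_n] via the recursion
    p_i = a_i p_{i-1} + p_{i-2}, q_i = a_i q_{i-1} + q_{i-2}."""
    p_prev2, p_prev = 0, 1
    q_prev2, q_prev = 1, 0
    result = []
    for a in cf:
        p = a * p_prev + p_prev2
        q = a * q_prev + q_prev2
        result.append((p, q))
        p_prev2, p_prev = p_prev, p
        q_prev2, q_prev = q_prev, q
    return result

# Convergents of 15/13 = [1, 6, 2]; note that 7/6 appears among them.
print(convergents([1, 6, 2]))  # [(1, 1), (7, 6), (15, 13)]
```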
Now we show that there are exactly two finite simple continued fractions that rep-
resent a given positive rational number.
Proposition A.3.33. Let 𝛼 ∈ ℚ>0 and let [𝑎0 , . . . , 𝑎𝑛 ] be the continued fraction expan-
sion of 𝛼. Then [𝑎0 , . . . , 𝑎𝑛−1 , 𝑎𝑛 − 1, 1] is the only other finite simple continued fraction
that represents 𝛼.
Proof. For −2 ≤ 𝑖 ≤ 𝑛 denote by 𝑝 𝑖 , 𝑞𝑖 the integers from Proposition A.3.31 for the
continued fraction [𝑎0 , . . . , 𝑎𝑛 ], and for −2 ≤ 𝑖 ≤ 𝑛 + 1 denote by 𝑝𝑖′ , 𝑞′𝑖 the correspond-
ing integers for the continued fraction [𝑎0 , . . . , 𝑎𝑛 − 1, 1]. Then we have
(A.3.32) 𝑝′𝑛 = (𝑎𝑛 − 1)𝑝𝑛−1 + 𝑝𝑛−2 , 𝑝′𝑛+1 = 𝑝′𝑛 + 𝑝𝑛−1 = 𝑎𝑛 𝑝𝑛−1 + 𝑝𝑛−2 = 𝑝𝑛 .
In the same way, it can be verified that 𝑞′𝑛+1 = 𝑞𝑛 . This shows that [𝑎0 , . . . , 𝑎𝑛 − 1, 1] =
𝛼.
The uniqueness is proved in Exercise A.3.34. □
Exercise A.3.34. Verify the uniqueness claim in Proposition A.3.33.
Proof. If 𝛼 = 0, then 𝑝 = 0 and 𝑝/𝑞 is the only convergent of 𝛼. Let 𝛼 ≠ 0. Let 𝛿 be the rational number with
(A.3.34) 𝛼 = 𝑝/𝑞 + 𝛿/(2𝑞^2 ).
Then the assumption of the proposition implies |𝛿| ≤ 1. By Proposition A.3.33, we can choose a simple continued fraction [𝑎0 , . . . , 𝑎𝑛 ] that represents 𝑝/𝑞 such that
(A.3.35) sign 𝛿 = (−1)𝑛 .
For −2 ≤ 𝑖 ≤ 𝑛, define 𝑝𝑖 , 𝑞𝑖 as in Proposition A.3.31. Then we have 𝑝/𝑞 = 𝑝𝑛 /𝑞𝑛 . Set
(A.3.36) 𝜆 = 2/|𝛿| − 𝑞𝑛−1 /𝑞𝑛 .
Then 𝑞𝑛−1 < 𝑞𝑛 and |𝛿| ≤ 1 imply
(A.3.37) 𝜆 > 2 − 1 = 1.
Also, we have
(A.3.38) 𝜆𝑝𝑛 + 𝑝𝑛−1 = 2𝑝𝑛 /|𝛿| − 𝑝𝑛 𝑞𝑛−1 /𝑞𝑛 + 𝑝𝑛−1
and
(A.3.39) 𝜆𝑞𝑛 + 𝑞𝑛−1 = 2𝑞𝑛 /|𝛿| .
It follows from Exercise A.3.32 and (A.3.35) that
(A.3.40) (𝜆𝑝𝑛 + 𝑝𝑛−1 )/(𝜆𝑞𝑛 + 𝑞𝑛−1 ) = 𝑝𝑛 /𝑞𝑛 − (𝑝𝑛 𝑞𝑛−1 − 𝑝𝑛−1 𝑞𝑛 )|𝛿|/(2𝑞𝑛^2 ) = 𝑝𝑛 /𝑞𝑛 + 𝛿/(2𝑞𝑛^2 ) = 𝛼.
So it follows from Proposition A.3.31 that
(A.3.41) 𝛼 = [𝑎0 , . . . , 𝑎𝑛 , 𝜆].
If [𝑏0 , . . . , 𝑏𝑘 ] is the continued fraction expansion of 𝜆, then (A.3.37) implies 𝑏0 > 0.
Therefore, it follows from Lemma A.3.21 that 𝛼 = [𝑎0 , . . . , 𝑎𝑛 , 𝑏0 , . . . , 𝑏𝑘 ]. Since 𝑏𝑘 > 1, this is the continued fraction expansion of 𝛼. So 𝑝𝑛 /𝑞𝑛 = [𝑎0 , . . . , 𝑎𝑛 ] is a convergent of 𝛼. □
(A.3.42) |15/13 − 7/6| = 1/78 ≤ 1/(2 ⋅ 6^2 ) = 1/72 .
So Proposition A.3.35 predicts that 7/6 is a convergent of the continued fraction expansion of 15/13, which we have shown in Example A.3.30.
A.4. Algebra
We introduce a few basic concepts of algebra that are required in this book.
A.4.1. Semigroups.
Definition A.4.1. A semigroup is a pair (𝑆, ∘) where 𝑆 is a nonempty set and ∘ is an
associative binary operation on 𝑆. If it is clear from the context which operation ∘ we
refer to, then we also write 𝑆 instead of (𝑆, ∘).
(2) The semigroup is called a monoid if 𝑆 contains an identity element with respect
to ∘.
Exercise A.4.3. Prove that ℕ is not a monoid with respect to addition.
We show that the identity element and the inverses in monoids are uniquely de-
termined.
Proposition A.4.4. A monoid has exactly one identity element. Moreover, every unit of a monoid has exactly one inverse.
Proof. If 1 and 1′ are identity elements of a monoid (𝑆, ∘), then 1 = 1 ∘ 1′ = 1′ . Now suppose that 𝑢 is a unit of the monoid with the identity element 1 and that 𝑢′ and 𝑢″ are inverses of 𝑢. Then 𝑢′ = 𝑢′ ∘ 1 = 𝑢′ ∘ (𝑢 ∘ 𝑢″ ) = (𝑢′ ∘ 𝑢) ∘ 𝑢″ = 1 ∘ 𝑢″ = 𝑢″ . □
Example A.4.10. The claims in this example are shown in Exercise A.4.11.
(1) (ℕ, +) is an abelian semigroup. But (ℕ, +) is not a monoid since there is no identity
element in this semigroup.
(2) (ℤ, +), (ℚ, +), (ℝ, +), and (ℂ, +) are abelian groups with identity element 0.
(3) (ℕ, ⋅) and (ℤ, ⋅) are abelian monoids with identity element 1. But they are not
groups. In fact, the only unit in (ℕ, ⋅) is 1 and the only units in (ℤ, ⋅) are ±1.
(4) (ℚ ⧵ {0}, ⋅), (ℝ ⧵ {0}, ⋅), and (ℂ ⧵ {0}, ⋅) are abelian groups with identity element 1.
(5) If 𝑚 ∈ ℕ, then (ℤ𝑚 , +𝑚 ) is a finite abelian group.
(6) If 𝑚 ∈ ℕ, then (ℤ∗𝑚 , ⋅𝑚 ) is a finite abelian group.
Exercise A.4.11. Show that the claims in Example A.4.10 are correct.
We also define the order of elements of finite groups. This notion is motivated by
the following observation.
Proposition A.4.13. Let 𝐺 be a finite group and let 𝑔 ∈ 𝐺. Then there is 𝑛 ∈ ℕ such
that 𝑔𝑛 = 1.
Here is the definition of element orders and the order of an integer modulo another
integer.
Definition A.4.15. (1) Let 𝐺 be a finite group and let 𝑔 ∈ 𝐺. Then the smallest
positive integer 𝑛 with 𝑔𝑛 = 1 is called the order of 𝑔 in 𝐺.
(2) Let 𝑁 ∈ ℕ and let 𝑎 ∈ ℤ∗𝑁 . Then the order of 𝑎 in the multiplicative group ℤ∗𝑁 is called the order of 𝑎 modulo 𝑁.
For a further discussion of the order of elements of finite abelian groups we refer
to [Buc04, Sections 2.9 and 2.14].
Exercise A.4.17. Let 𝑚 ∈ ℕ, let 𝑝1 , . . . , 𝑝𝑚 be pairwise distinct prime numbers, and let 𝑁 = ∏_{𝑖=1}^{𝑚} 𝑝𝑖^{𝑒𝑖} where 𝑒1 , . . . , 𝑒𝑚 are positive integers. Also, let 𝑎 ∈ ℤ such that gcd(𝑎, 𝑁) = 1. Show that the order of 𝑎 modulo 𝑁 is the least common multiple of the orders of 𝑎 modulo 𝑝𝑖^{𝑒𝑖} for 1 ≤ 𝑖 ≤ 𝑚.
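For small moduli, the order can be found by repeated multiplication. The sketch below (ours) also spot-checks the lcm statement of the exercise for 𝑁 = 15 = 3 ⋅ 5:

```python
from math import gcd, lcm

def order(a: int, n: int) -> int:
    """Order of a modulo n, i.e., the smallest k >= 1 with a^k = 1 mod n.
    Requires gcd(a, n) = 1 and n > 1."""
    assert gcd(a, n) == 1
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

print(order(2, 15))                   # 4
print(lcm(order(2, 3), order(2, 5)))  # lcm(2, 4) = 4, as predicted
```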
A.4.3. The symmetric group. In many contexts, for example, in Section 4.5,
symmetric groups play an important role. We discuss them in this section.
Definition A.4.20. Let 𝑛 ∈ ℕ. The symmetric group of degree 𝑛 is the group of permu-
tations of ℤ𝑛 . It is denoted by 𝑆𝑛 . Also, if 𝜋 ∈ 𝑆𝑛 , then we write 𝜋 as
(A.4.1)
(  0      1      2      . . .  𝑛 − 1
   𝜋(0)  𝜋(1)  𝜋(2)  . . .  𝜋(𝑛 − 1) )
or simply as the sequence (𝜋(0), 𝜋(1), . . . , 𝜋(𝑛 − 1)).
We introduce transpositions.
Theorem A.4.25. Let 𝑛 ∈ ℕ. Then every element of the symmetric group of degree 𝑛 can
be written as a composition of at most 𝑛 − 1 transpositions.
Proof. We prove the assertion by induction on 𝑛. For 𝑛 = 1, the only permutation of ℤ1 is the identity, which is the composition of zero transpositions. For the inductive step, let 𝜋 ∈ 𝑆𝑛+1 . First, assume that 𝜋(𝑛) = 𝑛. Then the map 𝜋′ = 𝜋|ℤ𝑛 is in 𝑆𝑛 . By the induction
hypothesis, 𝜋′ can be written as the composition of at most 𝑛 − 1 transpositions in 𝑆𝑛
that are also transpositions in 𝑆𝑛+1 . Also, since 𝜋(𝑛) = 𝑛, the permutation 𝜋 is the
composition of the same transpositions.
Now assume 𝜋(𝑛) = 𝑗 with 𝑗 < 𝑛. Then the map 𝜋′ = ((𝑗, 𝑛) ∘ 𝜋)|ℤ𝑛 is in 𝑆𝑛 because (𝑗, 𝑛) ∘ 𝜋 maps 𝑛 to 𝑛. By the induction hypothesis, 𝜋′ can be written as the composition of at most 𝑛 − 1 transpositions in 𝑆𝑛 that are also transpositions in 𝑆𝑛+1 . Therefore, the permutation 𝜋 = (𝑗, 𝑛) ∘ 𝜋′ is the composition of (𝑗, 𝑛) and the same transpositions. These are at most 𝑛 transpositions. □
For example,
(A.4.4) 𝜋 = (3, 2, 1, 0) = (0, 2, 1, 3) ∘ (0, 3) = (1, 2) ∘ (0, 3).
Exercise A.4.27. Find a polynomial time algorithm that computes the representation
of a permutation in 𝑆𝑛 as a product of at most 𝑛 − 1 transpositions in 𝑆𝑛 .
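One possible approach for Exercise A.4.27 (our sketch, mirroring the induction in the proof of Theorem A.4.25: fix the largest point, then the next, and so on):

```python
def transpositions(pi: list[int]) -> list[tuple[int, int]]:
    """Decompose a permutation of Z_n, given as the sequence
    (pi[0], ..., pi[n-1]), into at most n - 1 transpositions.
    Composing the returned transpositions (rightmost applied first)
    yields pi again. Runs in O(n^2) time."""
    sigma = list(pi)
    result = []
    for m in range(len(sigma) - 1, 0, -1):
        j = sigma[m]
        if j != m:
            result.append((j, m))
            # Replace sigma by (j, m) o sigma, which then fixes m.
            sigma = [m if v == j else (j if v == m else v) for v in sigma]
    return result

print(transpositions([3, 2, 1, 0]))  # [(0, 3), (1, 2)]
```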
Definition A.4.33. Let (𝐺, ∘) be a group, and let 𝐻 be a subset of 𝐺. Then 𝐻 is called
a subgroup of 𝐺 if (𝐻, ∘) is a group.
A.4.5. Rings and fields. Another basic notion in algebra is that of a ring, which
we define now.
Definition A.4.38. A ring is a triple (𝑅, +, ⋅) where 𝑅 is a nonempty set and + and ⋅ are
binary operations on 𝑅 called addition and multiplication. They satisfy the following
conditions.
(1) (𝑅, +) is an abelian group.
(2) (𝑅, ⋅) is a monoid.
(3) Multiplication is distributive with respect to addition, meaning that
• 𝑎 ⋅ (𝑏 + 𝑐) = 𝑎 ⋅ 𝑏 + 𝑎 ⋅ 𝑐 for all 𝑎, 𝑏, 𝑐 ∈ 𝑅 (left distributivity),
• (𝑏 + 𝑐) ⋅ 𝑎 = 𝑏 ⋅ 𝑎 + 𝑐 ⋅ 𝑎 for all 𝑎, 𝑏, 𝑐 ∈ 𝑅 (right distributivity).
Definition A.4.39. Let (𝑅, +, ⋅) be a ring.
(1) The ring 𝑅 is called commutative if the semigroup (𝑅, ⋅) is commutative.
(2) The unit group of the ring 𝑅 is the unit group of the monoid (𝑅, ⋅).
(3) A zero divisor in 𝑅 is an element 𝑎 ∈ 𝑅 such that there are nonzero 𝑥, 𝑦 ∈ 𝑅 with 𝑥𝑎 = 𝑎𝑦 = 0.
(4) The ring 𝑅 is called a field if (𝑅 ⧵ {0}, ⋅) is an abelian group.
Theorem A.4.40. Let (𝑅, +, ⋅) be a ring. Then the set of units and the set of zero divisors
in this ring are disjoint.
Exercise A.4.41. Prove Theorem A.4.40.
We give a few examples of rings, their unit groups, and zero divisors.
Example A.4.42. The claims of this example are verified in Exercise A.4.43.
(1) The integers, equipped with the usual addition and multiplication, are a commu-
tative ring without zero divisors.
As explained in [Buc04, Section 2.20], for all prime numbers 𝑝 and all positive
integers 𝑒, there is a finite field with 𝑞 = 𝑝𝑒 elements. It is uniquely determined up to
isomorphism and is denoted by 𝔽𝑞 . These are all the finite fields that exist.
where
𝑐𝑘 = ∑_{𝑖=0}^{𝑘} 𝑎𝑖 𝑏𝑘−𝑖 , 0 ≤ 𝑘 ≤ 𝑛 + 𝑚,
with the convention that 𝑎𝑖 = 0 for 𝑖 > 𝑛 and 𝑏𝑗 = 0 for 𝑗 > 𝑚.
Example A.4.52. The polynomial 𝑓(𝑥) = 𝑥2 − 1 ∈ ℚ[𝑥] has the zero 1 and can be
written as 𝑓(𝑥) = (𝑥 − 1)(𝑥 + 1).
Exercise A.4.54. Let 𝑓 ∈ ℝ[𝑥] be a polynomial of degree 𝑛 ∈ ℕ. Show that there are nonnegative integers 𝑠 and 𝑡 such that 𝑛 = 𝑠 + 2𝑡 and 𝑓 has 𝑠 real zeros and 𝑡 pairs of complex conjugate zeros, counted with multiplicity.
Proof. Let 𝑥, 𝑦 ∈ ℝ. If 𝑥 = 𝑦, then the statement is valid. So, let 𝑥 ≠ 𝑦. By the mean
value theorem, there is 𝑧 ∈ ℝ with
(A.5.10) sin(𝑥) − sin(𝑦) = cos(𝑧)(𝑥 − 𝑦).
This implies
(A.5.11) | sin(𝑥) − sin(𝑦)| = | cos(𝑧)||𝑥 − 𝑦| ≤ |𝑥 − 𝑦|.
Likewise, the mean value theorem implies
(A.5.12) cos(𝑥) − cos(𝑦) = − sin(𝑧)(𝑥 − 𝑦).
This implies
(A.5.13) | cos(𝑥) − cos(𝑦)| = | sin(𝑧)||𝑥 − 𝑦| ≤ |𝑥 − 𝑦|. □
Lemma A.5.3. For all 𝑥 ∈ [0, 1/2] we have sin 𝜋𝑥 ≥ 2𝑥.
Proof. Consider the function 𝑓(𝑥) = sin 𝜋𝑥 − 2𝑥. We have 𝑓(0) = 𝑓(1/2) = 0 and 𝑓″ (𝑥) = −𝜋^2 sin 𝜋𝑥 ≤ 0 for all 𝑥 ∈ [0, 1/2]. So 𝑓 is concave on [0, 1/2] and is therefore nonnegative on this interval, since a concave function on an interval is bounded below by the minimum of its values at the endpoints. □
Proof. We prove the assertion by induction on 𝑘. For 𝑘 = 0 the assertion holds since both sides of (A.5.18) are equal to 1. So assume that 𝑘 ≥ 0 and that the assertion is true for 𝑘. Then we have
(A.5.19) ∏_{𝑙=1}^{𝑘+1} cos(2^𝑙 𝛼) = cos(2^{𝑘+1} 𝛼) ∏_{𝑙=1}^{𝑘} cos(2^𝑙 𝛼) = cos(2^{𝑘+1} 𝛼) sin(2^{𝑘+1} 𝛼)/(2^𝑘 sin(2𝛼)) = sin(2^{𝑘+2} 𝛼)/(2^{𝑘+1} sin(2𝛼))
where the last equality follows from the trigonometric identity (A.5.4). □
Appendix B
Linear Algebra
Linear algebra plays an essential role in modeling phenomena in diverse scientific dis-
ciplines. Its efficiency in algorithmic solutions empowers the resolution of compu-
tational challenges and the formulation of concrete predictions in various scientific
domains.
In the context of this book, linear algebra assumes particular importance, since it
includes the theory of Hilbert spaces, which serves as a framework for modeling quan-
tum mechanics. To comprehend this theory fully, it becomes necessary to establish a
foundation in linear algebra, which we provide in this appendix.
The appendix is divided into two parts. In the initial part, which includes Sec-
tions B.1 to B.7, we provide a brief overview of fundamental concepts commonly found
in introductory linear algebra courses. We assume the reader’s familiarity with these
concepts and present them as reference points, omitting proofs, examples, and exer-
cises. The topics encompass vectors, matrices, modules over rings, vector spaces, lin-
ear maps, characteristic polynomials, eigenvalues, and eigenspaces. This section also
covers the Gaussian elimination algorithm, its complexity, and its applications in de-
termining bases for linear map images and kernels, as well as solving linear systems.
The subsequent part of this appendix, Section B.8, presents tensor products of mod-
ules and vector spaces, as well as the concept of the partial trace. This area of study
typically falls outside the scope of introductory linear algebra courses but is crucial for
modeling quantum mechanics mathematically. As this topic may be unfamiliar or en-
tirely new to readers, we include comprehensive explanations, proofs, examples, and
exercises to facilitate understanding.
Throughout this appendix, we use the following notation. By 𝑘, 𝑙, 𝑚, and 𝑛 we denote
positive integers, (𝑅, +, ⋅) is a commutative ring, and (𝐹, +, ⋅) is a field. We write 0 for
the identity elements with respect to addition in 𝑅 and 𝐹 and we write 1 for the identity
elements with respect to multiplication in 𝑅 and 𝐹. If 𝑟 ∈ 𝑅 or 𝑟 ∈ 𝐹 is invertible
with respect to multiplication, we write 𝑟−1 for its multiplicative inverse in 𝑅 or 𝐹,
respectively.
B.1. Vectors
Definition B.1.1. (1) A vector over a nonempty set 𝑆 is a sequence 𝑣 ⃗ = (𝑣 0 , . . . , 𝑣 𝑘−1 )
of elements in 𝑆. The positive integer 𝑘 is called the length of 𝑣.⃗ The elements 𝑣 𝑖
are called the entries or components of 𝑣.⃗
(2) The set of all vectors of length 𝑘 over 𝑆 is denoted by 𝑆 𝑘 .
(3) For 𝑣 ⃗ = (𝑣 0 , . . . , 𝑣 𝑘−1 ) ∈ 𝑆 𝑘 we also write 𝑣 ⃗ = (𝑣 𝑖 )0≤𝑖<𝑘 or 𝑣 ⃗ = (𝑣 𝑖 )𝑖∈ℤ𝑘 .
(4) If 𝑣⃗ ∈ 𝑆 𝑙 and 𝑤⃗ ∈ 𝑆 𝑘 , then 𝑣⃗ ‖ 𝑤⃗ denotes the concatenation of 𝑣⃗ and 𝑤⃗ , which is an element of 𝑆 𝑘+𝑙 .
Usually, we will start by numbering the entries of a vector by 0, but we may also
number the entries differently. We do not distinguish between row vectors and column
vectors. However, an analogous distinction is introduced in Section B.4.1.
Proposition B.1.2. Let 𝑆 be finite and let 𝑘 ∈ ℕ. Then |𝑆𝑘 | = |𝑆|𝑘 .
B.1.1. Vector operations. For vectors over the ring 𝑅, we define the following
operations.
Definition B.1.3. Let 𝑟 ∈ 𝑅 and 𝑣 ⃗ = (𝑣 0 , . . . , 𝑣 𝑘−1 ), 𝑤⃗ = (𝑤 0 , . . . , 𝑤 𝑘−1 ) ∈ 𝑅𝑘 .
(1) The scalar product of 𝑟 with 𝑣 ⃗ is defined as
(B.1.1) 𝑟𝑣 ⃗ = 𝑟 ⋅ 𝑣 ⃗ = (𝑟𝑣 0 , 𝑟𝑣 1 , . . . , 𝑟𝑣 𝑘−1 ).
(2) The sum of 𝑣 ⃗ and 𝑤⃗ is defined as
(B.1.2) 𝑣 ⃗ + 𝑤⃗ = (𝑣 0 + 𝑤 0 , 𝑣 1 + 𝑤 1 , . . . , 𝑣 𝑘−1 + 𝑤 𝑘−1 ).
(3) We write −𝑤⃗ = (−𝑤 0 , . . . , −𝑤 𝑘−1 ) and 𝑣 ⃗ − 𝑤⃗ for 𝑣 ⃗ + (−𝑤).
⃗
(4) The dot product of 𝑣 ⃗ and 𝑤⃗ is an element of 𝑅 which is defined as
(B.1.3) 𝑣⃗ ⋅ 𝑤⃗ = ∑_{𝑖=0}^{𝑘−1} 𝑣𝑖 𝑤𝑖 .
(3) The identity element 1 of 𝑅 is also an identity element with respect to scalar mul-
tiplication; that is, 1 ⋅ 𝑣 ⃗ = 𝑣 ⃗ for all 𝑣 ⃗ ∈ 𝑀.
(4) Scalar multiplication is distributive with respect to addition in 𝑀; that is,
𝑟 ⋅ (𝑣 ⃗ + 𝑤)⃗ = 𝑟 ⋅ 𝑣 ⃗ + 𝑟 ⋅ 𝑤⃗ for all 𝑟 ∈ 𝑅 and all 𝑣,⃗ 𝑤⃗ ∈ 𝑀.
(5) Scalar multiplication is distributive with respect to addition in 𝑅; that is, (𝑟+𝑠)⋅ 𝑣 ⃗ =
𝑟 ⋅ 𝑣 ⃗ + 𝑠 ⋅ 𝑣 ⃗ for all 𝑟, 𝑠 ∈ 𝑅 and all 𝑣 ⃗ ∈ 𝑀.
Definition B.2.2. Any module over the field 𝐹 is called a vector space over 𝐹 or an
𝐹-vector space.
Proposition B.2.3. (1) (𝑅𝑘 , +, ⋅) is an 𝑅-module, where “+” and “⋅” denote addition and scalar multiplication in 𝑅𝑘 , respectively.
(2) (𝐹 𝑘 , +, ⋅) is an 𝐹-vector space, where “+” and “⋅” denote addition and scalar multiplication in 𝐹 𝑘 , respectively.
(3) For all 𝑣⃗ = (𝑣0 , . . . , 𝑣𝑘−1 ) ∈ 𝑅𝑘 , the element −𝑣⃗ = (−𝑣0 , . . . , −𝑣𝑘−1 ) is the additive inverse of 𝑣⃗.
(4) The zero vector 0⃗ = (0, . . . , 0) is the neutral element in 𝑅𝑘 with respect to addition.
B.2.1. Submodules.
Definition B.2.4. (1) An R-submodule of 𝑀 is a nonempty subset 𝑁 of 𝑀 such that
𝑁 is a subgroup of 𝑀 with respect to addition and 𝑁 is closed under scalar multi-
plication; that is, 𝑟𝑣 ⃗ ∈ 𝑁 for all 𝑣 ⃗ ∈ 𝑁 and all 𝑟 ∈ 𝑅. If it is clear from the context
what is meant by the ring 𝑅, we call 𝑁 a submodule of 𝑀.
(2) An 𝐹-subspace 𝑊 of 𝑉 is an 𝐹-submodule 𝑊 of 𝑉. If it is clear from the context what is meant by the field 𝐹, we call 𝑊 a subspace of 𝑉.
Proposition B.2.5. (1) Every 𝑅-submodule of 𝑀 is an 𝑅-module with the same addi-
tion and scalar multiplication as in 𝑀.
(2) Every 𝐹-subspace of 𝑉 is an 𝐹-vector space with the same addition and scalar mul-
tiplication as in 𝑉.
Definition B.2.6. (1) Let 𝑣0⃗ , . . . , 𝑣𝑙−1⃗ , 𝑣⃗ be vectors in 𝑀. We say that 𝑣⃗ is a linear combination of the vectors 𝑣0⃗ , . . . , 𝑣𝑙−1⃗ if 𝑣⃗ can be written as
(B.2.2) 𝑣⃗ = 𝑟0 𝑣0⃗ + ⋯ + 𝑟𝑙−1 𝑣𝑙−1⃗ = ∑_{𝑗=0}^{𝑙−1} 𝑟𝑗 𝑣𝑗⃗
with 𝑟𝑗 ∈ 𝑅 for 0 ≤ 𝑗 < 𝑙. The ring elements 𝑟𝑗 are called the coefficients of the linear combination (B.2.2).
(2) The linear combination of the empty sequence in 𝑀 is defined to be 0⃗.
(3) For any subset 𝑆 of 𝑀 we define the span of 𝑆 as the set of all linear combinations of finitely many elements of 𝑆, including the empty sequence. We write it as Span(𝑆). So, we have
(B.2.3) Span(𝑆) = { ∑_{𝑗=0}^{𝑙−1} 𝑟𝑗 𝑣𝑗⃗ ∶ 𝑙 ∈ ℕ0 , 𝑟𝑗 ∈ 𝑅, 𝑣𝑗⃗ ∈ 𝑆 for all 𝑗 ∈ ℤ𝑙 } .
In particular, the span of the empty set is {0⃗}.
(4) We say that 𝑀 is finitely generated if 𝑀 = Span(𝑆) for a finite subset 𝑆 of 𝑀.
(4) We say that 𝑀 is finitely generated if 𝑀 = Span(𝑆) for a finite subset 𝑆 of 𝑀.
Let 𝑋 be a set of 𝑅-submodules of 𝑀. The sum of these submodules is
(B.2.4) ∑_{𝑁∈𝑋} 𝑁 = { ∑_{𝑁∈𝑋} 𝑣𝑁⃗ ∶ 𝑣𝑁⃗ ∈ 𝑁, only finitely many 𝑣𝑁⃗ are nonzero} .
(1) The sum ∑_{𝑁∈𝑋} 𝑁 is called direct if every element 𝑣⃗ of it has exactly one representation
(B.2.5) 𝑣⃗ = ∑_{𝑁∈𝑋} 𝑣𝑁⃗
where 𝑣𝑁⃗ ∈ 𝑁 and only finitely many of these elements are nonzero.
(2) If ∑_{𝑁∈𝑋} 𝑁 is direct, then the module 𝑃 = ∑_{𝑁∈𝑋} 𝑁 is called the direct sum of the submodules in 𝑋.
Let 𝑀0 , . . . , 𝑀𝑙−1 be 𝑅-modules. On the Cartesian product of 𝑀0 , . . . , 𝑀𝑙−1 , define addition and scalar multiplication by
(B.2.6)
(𝑣0⃗ , . . . , 𝑣𝑙−1⃗ ) + (𝑤⃗ 0 , . . . , 𝑤⃗ 𝑙−1 ) = (𝑣0⃗ + 𝑤⃗ 0 , . . . , 𝑣𝑙−1⃗ + 𝑤⃗ 𝑙−1 ),
𝑟 ⋅ (𝑣0⃗ , . . . , 𝑣𝑙−1⃗ ) = (𝑟 ⋅ 𝑣0⃗ , . . . , 𝑟 ⋅ 𝑣𝑙−1⃗ ).
Then (∏_{𝑖=0}^{𝑙−1} 𝑀𝑖 , +, ⋅) is an 𝑅-module. It is called the direct product of the 𝑅-modules 𝑀0 , . . . , 𝑀𝑙−1 and is also written as ∏_{𝑖=0}^{𝑙−1} 𝑀𝑖 .
B.3.3. 𝑅-algebras.
Definition B.3.8. An 𝑅-algebra is a tuple (𝑀, +, ⋅, ∘) which has the following proper-
ties.
(1) (𝑀, +, ⋅) is an 𝑅-module.
(2) (𝑀, +, ∘) is a ring.
(3) The scalar multiplication of the 𝑅-module 𝑀 is associative with respect to the ∘-operation; that is,
(B.3.10) 𝑟 ⋅ (𝐴 ∘ 𝐵) = (𝑟 ⋅ 𝐴) ∘ 𝐵 = 𝐴 ∘ (𝑟 ⋅ 𝐵)
for all 𝐴, 𝐵 ∈ 𝑀 and 𝑟 ∈ 𝑅.
B.3.4. Endomorphisms.
B.4. Matrices
In this section, we introduce matrices which play a very important role in linear algebra
as representations of module homomorphisms. We let 𝑆 be a nonempty set.
Definition B.4.1. (1) A 𝑘 × 𝑙 matrix over 𝑆 is a rectangular array of the form 𝐴 = (𝑎𝑖,𝑗 ) with 𝑘 rows and 𝑙 columns, where 𝑎𝑖,𝑗 ∈ 𝑆 for 0 ≤ 𝑖 < 𝑘, 0 ≤ 𝑗 < 𝑙. The elements 𝑎𝑖,𝑗 are called the entries of the matrix 𝐴. This matrix can also be written as 𝐴 = (𝑎𝑖,𝑗 )0≤𝑖<𝑘,0≤𝑗<𝑙 or 𝐴 = (𝑎𝑖,𝑗 )𝑖∈ℤ𝑘 ,𝑗∈ℤ𝑙 . If the ranges of 𝑘 and 𝑙 are clear from the context, we also write 𝐴 = (𝑎𝑖,𝑗 ).
(2) The set of all 𝑘 × 𝑙 matrices over 𝑆 is denoted by 𝑆 (𝑘,𝑙) .
Let 𝑟 ∈ 𝑅 and let 𝐴 = (𝑎𝑖,𝑗 ), 𝐵 = (𝑏𝑖,𝑗 ) ∈ 𝑅(𝑘,𝑙) .
(1) The scalar product of 𝑟 with 𝐴 is the matrix 𝑟 ⋅ 𝐴 = 𝑟𝐴 = (𝑟𝑎𝑖,𝑗 ) ∈ 𝑅(𝑘,𝑙) . This operation is called scalar multiplication.
(2) The sum of 𝐴 and 𝐵 is the matrix 𝐴 + 𝐵 = (𝑎𝑖,𝑗 + 𝑏𝑖,𝑗 ) ∈ 𝑅(𝑘,𝑙) .
Proposition B.4.5. (1) (𝑅(𝑘,𝑙) , +, ⋅) is an 𝑅-module where “+” denotes matrix addition
and “⋅” stands for scalar multiplication on 𝑅(𝑘,𝑙) .
(2) For all 𝐴 ∈ 𝑅(𝑘,𝑙) the matrix −𝐴 is the additive inverse of 𝐴.
(3) The neutral element of the group (𝑅(𝑘,𝑙) , +) is the zero matrix in 𝑅(𝑘,𝑙) all of whose
entries are zero. We denote it by 0𝑘,𝑙 or as 0 if it is clear from the context what is
meant by 𝑘, 𝑙.
Definition B.4.6. Let 𝐴 = (𝑎𝑖,𝑗 ) ∈ 𝑅(𝑘,𝑙) and 𝐵 = (𝑏𝑖,𝑗 ) ∈ 𝑅(𝑙,𝑚) . Then we define the
product of 𝐴 and 𝐵 as
(B.4.9) 𝐴 ⋅ 𝐵 = ( ∑_{𝑢=0}^{𝑙−1} 𝑎𝑖,𝑢 𝑏𝑢,𝑗 )_{𝑖∈ℤ𝑘 ,𝑗∈ℤ𝑚} .
Matrix multiplication is associative; that is, for all 𝐴 ∈ 𝑅(𝑘,𝑙) , 𝐵 ∈ 𝑅(𝑙,𝑚) , and 𝐶 ∈ 𝑅(𝑚,𝑛) we have
(B.4.10) (𝐴 ⋅ 𝐵) ⋅ 𝐶 = 𝐴 ⋅ (𝐵 ⋅ 𝐶).
Definition B.4.8. Let 𝐴 ∈ 𝑅(𝑘,𝑙) with column vectors 𝑎0⃗ , . . . , 𝑎𝑙−1⃗ and let 𝑣⃗ = (𝑣0 , . . . , 𝑣𝑙−1 ) ∈ 𝑅𝑙 . Then we define the product of 𝐴 with 𝑣⃗ as
(B.4.11) 𝐴 ⋅ 𝑣⃗ = 𝐴𝑣⃗ = ∑_{𝑗=0}^{𝑙−1} 𝑣𝑗 𝑎𝑗⃗ .
Note that the product of a matrix 𝐴 with a vector 𝑣 ⃗ is the same as the product of 𝐴
with the matrix corresponding to 𝑣.⃗
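Definitions B.4.6 and B.4.8 translate directly into code. The following plain-list Python sketch (ours) works over the integers:

```python
def mat_mul(A, B):
    """Product of a k x l matrix A and an l x m matrix B; entry (i, j)
    is sum_u A[i][u] * B[u][j] as in (B.4.9)."""
    k, l, m = len(A), len(B), len(B[0])
    return [[sum(A[i][u] * B[u][j] for u in range(l)) for j in range(m)]
            for i in range(k)]

def mat_vec(A, v):
    """Product A v as the linear combination sum_j v_j a_j of the
    columns a_j of A, as in (B.4.11)."""
    return [sum(v[j] * A[i][j] for j in range(len(v))) for i in range(len(A))]

A = [[1, 2], [3, 4]]
print(mat_vec(A, [5, 6]))      # [17, 39]
print(mat_mul(A, [[5], [6]]))  # [[17], [39]] -- the same numbers
```

The last two lines illustrate the remark above: multiplying by the vector and by the corresponding one-column matrix gives the same result.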
(1) Let 𝐴 = (𝑎𝑖,𝑗 ) ∈ 𝑅(𝑘,𝑘) . Then the entries 𝑎𝑖,𝑖 , 0 ≤ 𝑖 < 𝑘, are called the diagonal elements of 𝐴 (highlighted in (B.5.1)). The other entries are called the off-diagonal elements of 𝐴.
(2) The zero matrix of order 𝑘 over 𝑅 is the matrix in 𝑅(𝑘,𝑘) all of whose entries are 0.
We denote it by 0𝑘 or simply by 0 if it is clear from the context what is meant by
𝑘.
(3) The identity matrix of order 𝑘 over 𝑅 is the following square matrix in 𝑅(𝑘,𝑘) :
1 0 0 ⋯ 0
⎛ ⎞
0 1 0 ⋯ 0
⎜ ⎟
(B.5.2) 𝐼𝑘 = ⎜0 0 1 ⋯ 0⎟ .
⎜⋮ ⋮ ⋮ ⋱ ⋮⎟
⎝0 0 0 ⋯ 1⎠
All of its diagonal elements are 1 and the off-diagonal elements are 0. The matrix 𝐼𝑘 is also called the unit matrix of order 𝑘 and is denoted by 𝐼 if 𝑘 is clear from the context.
A matrix in 𝑅(𝑘,𝑘) is called lower triangular if all of its entries above the diagonal are 0; that is, if it has the form
(B.5.4)
⎛ 𝑎0,0      0        0       . . .  0         ⎞
⎜ 𝑎1,0      𝑎1,1     0       . . .  0         ⎟
𝐴 = ⎜ 𝑎2,0      𝑎2,1     𝑎2,2    . . .  0         ⎟ ;
⎜ ⋮         ⋮        ⋮        ⋱     ⋮         ⎟
⎝ 𝑎𝑘−1,0   𝑎𝑘−1,1   𝑎𝑘−1,2  . . .  𝑎𝑘−1,𝑘−1 ⎠
Definition B.5.4 uses the fact that the inverse of an element of a semigroup is
uniquely determined. We will show in Corollary B.5.21 that for a matrix 𝐴 ∈ 𝑅(𝑘,𝑘)
to be invertible, it suffices that 𝐴 has a left or right inverse, respectively; that is, there
is a matrix 𝐵 ∈ 𝑅(𝑘,𝑘) such that 𝐵𝐴 = 𝐼𝑘 or 𝐴𝐵 = 𝐼𝑘 , respectively. This right or left
inverse of 𝐴 is its inverse.
This shows that 𝑃𝜍 is invertible and 𝑃𝜍−1 = (𝑃𝜍 )−1 . Also, by Proposition B.5.6 we have 𝑃𝜍T = (𝛿𝑖,𝜍−1 (𝑗) ) and 𝑃𝜍−1 = (𝛿𝑖,𝜍−1 (𝑗) ), which proves that these two matrices are the same. □
Corollary B.5.8. The set of permutation matrices in 𝑅(𝑘,𝑘) is a subgroup of 𝖦𝖫(𝑘, 𝑅).
Proposition B.5.9. (1) For all 𝐴 ∈ 𝑅(𝑘,𝑙) with row vectors 𝑎0⃗ , . . . , 𝑎𝑘−1
⃗ and all 𝜎 ∈ 𝑆 𝑘
the product 𝑃𝜍 𝐴 is the matrix in 𝑅(𝑘,𝑙) with row vectors 𝑎𝜍(0)
⃗ , . . . , 𝑎𝜍(𝑘−1)
⃗ , in this
order.
(2) For all 𝐴 ∈ 𝑅(𝑙,𝑘) with column vectors 𝑎0⃗ , . . . , 𝑎𝑘−1
⃗ and all 𝜎 ∈ 𝑆 𝑙 the product 𝐴𝑃𝜍
is the matrix in 𝑅(𝑙,𝑘) with column vectors 𝑎𝜍⃗ −1 (0) , . . . , 𝑎𝜍⃗ −1 (𝑙−1) , in this order.
Proof. Let 𝐴 ∈ 𝑅(𝑘,𝑙) and 𝜎 ∈ 𝑆𝑘 . Then, for all 𝑖 ∈ ℤ𝑘 , the product 𝑒𝜍(𝑖)⃗ 𝐴 is the 𝜎(𝑖)th row vector of 𝐴. Together with the definition of 𝑃𝜍 this implies the first assertion. Now, let 𝜎 ∈ 𝑆 𝑙 . Let 𝑗 ∈ ℤ𝑙 . By Proposition B.5.6, the 𝑗th column vector of 𝑃𝜍 is 𝑒𝜍⃗ −1 (𝑗) and the product 𝐴𝑒𝜍⃗ −1 (𝑗) is the 𝜎−1 (𝑗)th column vector of 𝐴. This implies the second assertion. □
B.5.3. Determinants.
Definition B.5.10. Let det ∶ 𝑅(𝑘,𝑘) → 𝑅 be a map.
(1) The map det is called multilinear if for all 𝐴 ∈ 𝑅(𝑘,𝑘) with column vectors 𝑎0⃗ , . . . , 𝑎𝑘−1⃗ , all 𝑗 ∈ ℤ𝑘 , all 𝑏⃗ ∈ 𝑅𝑘 , and all 𝑟 ∈ 𝑅 we have
det(𝑎0⃗ , . . . , 𝑎𝑗−1⃗ , 𝑎𝑗⃗ + 𝑏⃗, 𝑎𝑗+1⃗ , . . . , 𝑎𝑘−1⃗ )
(B.5.10) = det(𝑎0⃗ , . . . , 𝑎𝑗−1⃗ , 𝑎𝑗⃗ , 𝑎𝑗+1⃗ , . . . , 𝑎𝑘−1⃗ ) + det(𝑎0⃗ , . . . , 𝑎𝑗−1⃗ , 𝑏⃗, 𝑎𝑗+1⃗ , . . . , 𝑎𝑘−1⃗ )
and
(B.5.11) det(𝑎0⃗ , . . . , 𝑎𝑗−1⃗ , 𝑟𝑎𝑗⃗ , 𝑎𝑗+1⃗ , . . . , 𝑎𝑘−1⃗ ) = 𝑟 ⋅ det(𝑎0⃗ , . . . , 𝑎𝑗−1⃗ , 𝑎𝑗⃗ , 𝑎𝑗+1⃗ , . . . , 𝑎𝑘−1⃗ ).
(2) The map det is called alternating if for every matrix 𝐴 ∈ 𝑅(𝑘,𝑘) which has two
equal columns we have det(𝐴) = 0.
(3) The map det is called normalized if det(𝐼𝑘 ) = 1.
(4) The map det is called a determinant function if it is multilinear, alternating, and
normalized.
Proposition B.5.11. Let det ∶ 𝑅(𝑘,𝑘) → 𝑅 be multilinear and alternating. Then, for all 𝐴 ∈ 𝑅(𝑘,𝑘) with column vectors 𝑎0⃗ , . . . , 𝑎𝑘−1⃗ the following hold.
(1) Adding a multiple of one column to another column of 𝐴 does not change det(𝐴); that is, for all 𝑟 ∈ 𝑅 and all 𝑖, 𝑗 ∈ ℤ𝑘 with 𝑖 ≠ 𝑗 we have
(B.5.12) det(𝑎0⃗ , . . . , 𝑎𝑗−1⃗ , 𝑎𝑗⃗ + 𝑟𝑎𝑖⃗ , 𝑎𝑗+1⃗ , . . . , 𝑎𝑘−1⃗ ) = det 𝐴.
(2) Swapping two columns of 𝐴 changes the sign of det(𝐴); that is, for all 𝑖, 𝑗 ∈ ℤ𝑘 with
𝑖 ≠ 𝑗 we have
(B.5.13) det(𝑎0⃗ , . . . , 𝑎𝑗⃗ , . . . , 𝑎𝑖⃗ , . . . , 𝑎𝑘−1⃗ ) = − det 𝐴.
(3) If one column of 𝐴 is zero, then det 𝐴 = 0.
Theorem B.5.12. The map
(B.5.14) det ∶ 𝑅(𝑘,𝑘) → 𝑅, 𝐴 ↦ det(𝐴) = ∑_{𝜍∈𝑆𝑘} sign(𝜍) ∏_{𝑗=0}^{𝑘−1} 𝑎𝜍(𝑗),𝑗
is a determinant function and it is the only determinant function that maps 𝑅(𝑘,𝑘) to 𝑅.
Definition B.5.13. For 𝐴 ∈ 𝑅(𝑘,𝑘) the value det(𝐴) is called the determinant of 𝐴.
The formula (B.5.14) is called the Leibniz formula for evaluating determinants.
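For small 𝑘, the Leibniz formula can be evaluated directly. In the sketch below (ours), `itertools.permutations` enumerates 𝑆𝑘 and the sign is computed by counting inversions:

```python
from itertools import permutations
from math import prod

def sign(sigma) -> int:
    """Sign of a permutation: (-1)^(number of inversions)."""
    n = len(sigma)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

def det(A):
    """Determinant via the Leibniz formula (B.5.14)."""
    k = len(A)
    return sum(sign(s) * prod(A[s[j]][j] for j in range(k))
               for s in permutations(range(k)))

print(det([[1, 2], [3, 4]]))  # 1*4 - 2*3 = -2
```

The sum has 𝑘! terms, so this is only practical for very small 𝑘; over a field, the Gaussian elimination algorithm mentioned in the introduction to this appendix is the efficient method.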
Proposition B.5.14. (1) The determinant of a square matrix over 𝑅 and its transpose
are the same; that is, for all 𝐴 ∈ 𝑅(𝑘,𝑘) we have
(B.5.15) det(𝐴) = det(𝐴T ).
(2) The determinant is multiplicative with respect to matrix multiplication; that is, for all 𝐴, 𝐵 ∈ 𝑅(𝑘,𝑘) we have
(B.5.16) det(𝐴𝐵) = det(𝐴) det(𝐵).
(3) The determinant of a triangular matrix (upper or lower) is the product of its diagonal
entries.
Definition B.5.15. Let 𝐴 ∈ 𝑅(𝑘,𝑘) and assume that 𝑘 > 1. Also, let 𝑖, 𝑗 ∈ ℤ𝑘 . Then the
minor 𝐴𝑖,𝑗 is the matrix in 𝑅(𝑘−1,𝑘−1) that is obtained by deleting the 𝑖th row and 𝑗th
column in 𝐴.
Here is the Laplace expansion formula for determinants.
Theorem B.5.16. Let 𝑘 > 1 and let 𝐴 = (𝑎𝑖,𝑗) ∈ 𝑅(𝑘,𝑘). Then for every 𝑖 ∈ ℤ𝑘 we have
(B.5.17) det 𝐴 = ∑_{𝑗=0}^{𝑘−1} (−1)^{𝑖+𝑗} 𝑎𝑖,𝑗 det 𝐴𝑖,𝑗
where the 𝐴𝑖,𝑗 are the minors of 𝐴. The adjugate of 𝐴 is the matrix adj(𝐴) ∈ 𝑅(𝑘,𝑘) whose entry with index (𝑖, 𝑗) is the cofactor (−1)^{𝑖+𝑗} det 𝐴𝑗,𝑖. We also write adj 𝐴 instead of adj(𝐴).
The adjugate of a square matrix has the following property that allows us to com-
pute inverses of square matrices.
Proposition B.5.19. Let 𝐴 ∈ 𝑅(𝑘,𝑘) . Then we have
(B.5.20) (adj 𝐴)𝐴 = 𝐴(adj 𝐴) = det 𝐴 ⋅ 𝐼𝑘 .
Now we can characterize the elements of 𝖦𝖫(𝑘, 𝑅) and show how to compute the
inverses of square matrices.
Theorem B.5.20. (1) A matrix 𝐴 ∈ 𝑅(𝑘,𝑘) is invertible if and only if det(𝐴) is a unit in
𝑅; that is,
(B.5.21) 𝖦𝖫(𝑘, 𝑅) = {𝐴 ∈ 𝑅(𝑘,𝑘) ∶ det 𝐴 ∈ 𝑈(𝑅)}.
(2) Let 𝐴 ∈ 𝖦𝖫(𝑘, 𝑅). Then we have
(B.5.22) det(𝐴−1 ) = (det 𝐴)−1
and the inverse of 𝐴 is
(B.5.23) 𝐴−1 = det(𝐴)−1 adj(𝐴).
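Over 𝑅 = ℤ, Theorem B.5.20 says that 𝐴 is invertible exactly when det 𝐴 ∈ {1, −1}, and (B.5.23) then yields an integer inverse. A small sketch via cofactor expansion (function names are ours):

```python
def minor(a, i, j):
    """The minor A_{i,j}: delete row i and column j of a."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]

def det(a):
    """Determinant by Laplace expansion along the first row, as in (B.5.17)."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))

def adjugate(a):
    """adj(A): entry (i, j) is the cofactor (-1)^(i+j) det A_{j,i}."""
    k = len(a)
    return [[(-1) ** (i + j) * det(minor(a, j, i)) for j in range(k)] for i in range(k)]

A = [[2, 1], [1, 1]]   # det A = 1, a unit in Z
inv = adjugate(A)      # by (B.5.23), A^{-1} = det(A)^{-1} adj(A) = adj(A) here
print(inv)             # [[1, -1], [-1, 2]]
```

One can check that (adj 𝐴)𝐴 = det 𝐴 ⋅ 𝐼𝑘, as stated in Proposition B.5.19.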
Corollary B.5.21. Let 𝐴, 𝐵 ∈ 𝑅(𝑘,𝑘) with 𝐴𝐵 = 𝐼𝑘 . Then 𝐴, 𝐵 ∈ 𝖦𝖫(𝑘, 𝑅), 𝐵 = 𝐴−1 , and
𝐴 = 𝐵 −1 .
Corollary B.5.22. If 𝐹 is a field, then we have
(B.5.24) 𝖦𝖫(𝑘, 𝐹) = {𝐴 ∈ 𝐹 (𝑘,𝑘) ∶ det 𝐴 ≠ 0}.
Lemma B.5.23. Let 𝐴, 𝐵 ∈ 𝖦𝖫(𝑘, 𝑅). Then we have (𝐴𝐵)−1 = 𝐵 −1 𝐴−1 .
B.5.5. Trace.
Definition B.5.24. The trace of a square matrix 𝐴 over 𝑅 is the sum of its diagonal
elements. It is denoted by tr(𝐴) or tr 𝐴.
Proposition B.5.25. (1) The trace map tr ∶ 𝑅(𝑘,𝑘) → 𝑅 is 𝑅-linear; that is,
tr(𝑎𝐴 + 𝑏𝐵) = 𝑎 tr(𝐴) + 𝑏 tr(𝐵) for all 𝑎, 𝑏 ∈ 𝑅 and 𝐴, 𝐵 ∈ 𝑅(𝑘,𝑘) .
(2) tr(𝐴T ) = tr(𝐴) for all 𝐴 ∈ 𝑅(𝑘,𝑘) .
(3) tr(𝐴𝐵) = tr(𝐵𝐴) for all 𝐴, 𝐵 ∈ 𝑅(𝑘,𝑘) .
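Property (3) is easy to check numerically even when 𝐴𝐵 ≠ 𝐵𝐴; a quick sketch in Python (helper names ours):

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    k = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(k)] for i in range(k)]

def trace(a):
    """Sum of the diagonal elements."""
    return sum(a[i][i] for i in range(len(a)))

A = [[1, 2], [3, 4]]
B = [[0, 5], [6, 7]]
print(trace(mat_mul(A, B)), trace(mat_mul(B, A)))  # both 55
```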
Corollary B.5.28. Let 𝐴 ∈ 𝑅(𝑘,𝑘) and assume that the characteristic polynomial of 𝐴 can be written as
(B.5.28) 𝑝𝐴(𝑥) = ∏_{𝑖=0}^{𝑘−1}(𝑥 − 𝜆𝑖).
Then we have
(B.5.29) tr(𝐴) = ∑_{𝑖=0}^{𝑘−1} 𝜆𝑖
and
(B.5.30) det(𝐴) = ∏_{𝑖=0}^{𝑘−1} 𝜆𝑖.
Definition B.5.29. Two matrices 𝐴, 𝐵 ∈ 𝑅(𝑘,𝑘) are called similar if there is a matrix
𝑈 ∈ 𝖦𝖫(𝑘, 𝑅) such that 𝐵 = 𝑈 −1 𝐴𝑈.
Proposition B.5.30. Similar matrices in 𝑅(𝑘,𝑘) have the same characteristic polynomial,
trace, and determinant.
B.6.1. Operations on 𝑀 𝑘 .
(1) For all 𝑟 ∈ 𝑅 and all 𝑥,⃗ 𝑦 ⃗ ∈ 𝑅𝑘 we have (𝑟𝐵)𝑥⃗ = 𝐵(𝑟𝑥)⃗ and 𝐵(𝑥⃗ + 𝑦)⃗ = 𝐵 𝑥⃗ + 𝐵𝑦.⃗
(2) For all 𝑟 ∈ 𝑅 and 𝑋 ∈ 𝑅(𝑘,𝑙) and 𝑌 ∈ 𝑅(𝑘,𝑙) we have (𝑟𝐵)𝑋 = 𝐵(𝑟𝑋) and 𝐵(𝑋 +𝑌 ) =
𝐵𝑋 + 𝐵𝑌 .
(3) For all 𝑋 ∈ 𝑅(𝑘,𝑙) and 𝑌 ∈ 𝑅(𝑙,𝑚) we have (𝐵𝑋)𝑌 = 𝐵(𝑋𝑌 ).
(B.6.3) 𝑀 → 𝑅𝑘, 𝑣⃗ ↦ 𝑣⃗𝐵,
that sends an element 𝑣⃗ ∈ 𝑀 to its coefficient vector with respect to the basis 𝐵, is an isomorphism of 𝑅-modules.
Proposition B.6.8. Let 𝐵 be a finite basis of 𝑀 of length 𝑘 and let 𝑇 ∈ 𝖦𝖫(𝑘, 𝑅). Then for all 𝑣⃗ ∈ 𝑀 we have
(B.6.4) 𝑣⃗𝐵 = 𝑇𝑣⃗𝐵𝑇.
B.6.5. Dual modules. Let 𝑀 be a free 𝑅-module of finite dimension 𝑘 and let 𝐵 = (𝑏⃗0, . . . , 𝑏⃗𝑘−1) be a basis of 𝑀.
B.7.1. Bases and generating systems. We know from Theorem B.6.4 that all
bases of 𝑉 have the same length 𝑘 and that for every basis 𝐵 of 𝑉, the set of all bases of
𝑉 is the coset 𝖦𝖫(𝑘, 𝐹)𝐵. We now state the Steinitz Exchange Lemma which allows us
to obtain more results for bases and generating systems of 𝑉.
Lemma B.7.1. Let 𝑚, 𝑛 ∈ ℕ, let 𝑈 = (𝑢⃗0 , . . . , 𝑢⃗𝑚−1 ) ∈ 𝑉 𝑚 be linearly independent, and
let 𝐺 ∈ 𝑉 𝑛 be a generating system of 𝑉. Then 𝑚 ≤ 𝑛 and there are elements 𝑢⃗𝑚 , . . . , 𝑢⃗𝑛−1
in 𝐺 such that (𝑢⃗0 , . . . , 𝑢⃗𝑛−1 ) is a generating system of 𝑉.
Proposition B.7.3. The vector spaces generated by the rows and columns of a matrix 𝐴
over 𝐹, respectively, have the same dimension. This dimension is called the rank of 𝐴. It
is denoted by rank(𝐴) or rank 𝐴.
(B.7.1) 𝐴 = ⎛1 0⎞
            ⎜0 1⎟
            ⎝1 1⎠
over 𝔽2 . The rank of 𝐴 is 2. Indeed, the column rank of 𝐴 is 2 because the two column
vectors of 𝐴 are linearly independent. Also, the row rank of this matrix is 2 because the
first two row vectors of 𝐴 are linearly independent and the third row vector is a linear
combination of the first two.
The next proposition establishes a connection between the kernel and the image
of a linear map from 𝑉 to 𝑊 and the rank of a representation matrix of this map.
Proposition B.7.5. Let 𝑓 ∈ Hom𝐹 (𝑉, 𝑊). Then the rank 𝑟 of all representation matrices
of 𝑓 is the same and the following hold.
Proposition B.7.8. Let 𝐴 ∈ 𝐹 (𝑘,𝑘) . Then the following statements are equivalent.
(1) 𝐴 is nonsingular.
(2) The rank of 𝐴 is 𝑘.
(3) The columns of 𝐴 form a basis of 𝐹 𝑘 .
(4) The rows of 𝐴 form a basis of 𝐹 𝑘 .
Definition B.7.9. Let 𝐴 = (𝑎𝑖,𝑗) ∈ 𝑅(𝑙,𝑘) with row vectors 𝑎⃗0, . . . , 𝑎⃗𝑙−1.
(1) We say that 𝐴 is in row echelon form if the following conditions are satisfied.
(a) All rows of 𝐴 that have only zero entries are at the bottom of 𝐴; that is, if 𝑢, 𝑣 ∈ ℤ𝑙 such that 𝑎⃗𝑢 ≠ 0⃗ and 𝑎⃗𝑣 = 0⃗, then 𝑢 < 𝑣.
(b) If 𝑢 > 0 and 𝑎⃗𝑢 is nonzero, then the first nonzero entry in 𝑎⃗𝑢 is strictly to the right of the first nonzero entry in 𝑎⃗𝑢−1; that is,
(B.7.2) min{𝑗 ∈ ℤ𝑘 ∶ 𝑎𝑢,𝑗 ≠ 0} > min{𝑗 ∈ ℤ𝑘 ∶ 𝑎𝑢−1,𝑗 ≠ 0}.
(2) We say that 𝐴 is in reduced row echelon form if 𝐴 is in row echelon form and the first nonzero element in each nonzero row is 1.
We note that a square matrix in row echelon form is an upper triangular matrix.
Also, a square matrix in column echelon form is a lower triangular matrix.
B.7.4. The Gauss elimination algorithm. In this section, we explain the Gauss
Elimination Algorithm B.7.13 that transforms a matrix 𝐴 ∈ 𝐹 (𝑙,𝑘) into column echelon
form. Despite its name, this algorithm was already known in China in the second
century. Since the algorithm uses division by nonzero elements, the algorithm is only
guaranteed to work over fields but, in general, not over rings.
The correctness of the algorithm is stated in the next theorem.
The name “Gaussian elimination algorithm” derives from the fact that in the for loop starting at line 15, the entries 𝑎𝑢,𝑗 are “eliminated” for 𝑣 < 𝑗 < 𝑘.
Algorithm B.7.13 can also be used to transform 𝐴 ∈ 𝐹 (𝑘,𝑙) into row echelon form
as follows. We apply Algorithm B.7.13 to the transpose of 𝐴. The algorithm returns
𝐴′ , 𝑆, 𝑣, 𝑤. We replace 𝐴′ , 𝑆 by their transposes. Then 𝐴′ is in row echelon form and we
have 𝐴′ = 𝑆𝐴, 𝑣 is the number of nonzero rows of 𝐴′ , and det 𝑆 = (−1)𝑤 .
Theorem B.7.12. Let 𝑘, 𝑙 ∈ ℕ and 𝐴 ∈ 𝐹 (𝑙,𝑘) be the input of Algorithm B.7.13 and
let 𝑛 = max{𝑘, 𝑙}. The algorithm then uses O(𝑛3 ) operations in 𝐹 and space for O(𝑛2 )
elements of 𝐹.
The Gauss elimination algorithm also requires time and space to initialize and
increment the loop variables 𝑢, 𝑣, and 𝑤. However, these time and space requirements
are dominated by the complexity of the operations in the field 𝐹. Therefore, we do not
mention them explicitly.
Theorem B.7.14. Let 𝑙, 𝑘 ∈ ℕ and 𝐴 ∈ 𝐹 (𝑙,𝑘) be the input of the Gauss elimination
algorithm and let 𝐴′ ∈ 𝐹 (𝑙,𝑘) , 𝑆 ∈ 𝖦𝖫(𝑘, 𝐹), 𝑣, 𝑤 ∈ ℕ be its output. Then the following
hold.
(1) The rank of 𝐴 is 𝑣.
(2) The sequence consisting of the first 𝑣 column vectors of 𝐴′ is a basis of the image of 𝐴.
(3) The sequence consisting of the last 𝑘 − 𝑣 columns of 𝑆 is a basis of the kernel of 𝐴.
(4) If 𝑘 = 𝑙, then (−1)𝑤 det 𝐴 is the product of the diagonal elements of 𝐴′.
Also, with 𝑛 = max{𝑘, 𝑙} the computation of these objects requires O(𝑛3) operations in 𝐹 and space for O(𝑛2) elements of 𝐹.
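The row echelon variant of the algorithm is easy to sketch over ℚ using exact rational arithmetic; the following computes only the rank (statement (1)), not the transformation matrix 𝑆 (names are ours):

```python
from fractions import Fraction

def rank(a):
    """Rank of an l x k matrix over Q, by Gaussian elimination to row echelon form."""
    m = [[Fraction(x) for x in row] for row in a]
    l, k = len(m), len(m[0])
    r = 0  # number of pivots found so far
    for col in range(k):
        piv = next((i for i in range(r, l) if m[i][col] != 0), None)
        if piv is None:
            continue  # no pivot in this column
        m[r], m[piv] = m[piv], m[r]  # swap the pivot row into place
        for i in range(r + 1, l):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

print(rank([[1, 0], [0, 1], [1, 1]]))  # 2: the matrix from (B.7.1), read over Q
```

Note that (B.7.1) is stated over 𝔽2; here we read the same entries over ℚ, where the rank is also 2.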
If 𝑥⃗ ∈ 𝐹𝑘 satisfies (B.7.3), then 𝑥⃗ is called a solution of the linear system (B.7.3). We first
characterize the solutions of linear systems.
Proposition B.7.15. Let 𝐴 ∈ 𝐹(𝑙,𝑘) and let 𝑏⃗ ∈ 𝐹𝑙. Then the set of all solutions of the linear system 𝐴𝑥⃗ = 𝑏⃗ is either empty or a coset of the kernel of 𝐴; that is, of the form 𝑥⃗ + ker(𝐴) where 𝑥⃗ is any solution of the linear system.
Proposition B.7.15 shows how to find the set of all solutions of the linear system
(B.7.3). First, decide whether the linear system has a solution and if this is the case,
find one. Second, determine a basis of the kernel of 𝐴. We have already explained
how the second task can be achieved using the Gauss algorithm. So, it remains to solve
the first task. This is done in Algorithm B.7.16.
𝟎 stands for the matrix in 𝐹(𝑘−𝑙,𝑘) with only zero entries, 𝐴2 ∈ 𝐹(𝑙,𝑘−𝑙), and 𝐴3 ∈ 𝐹(𝑙,𝑘−𝑙).
Corollary B.7.26. Let 𝑓 ∈ End(𝑉). Then the geometric multiplicities of the eigenvalues
of 𝑓 are less than or equal to the corresponding algebraic multiplicities.
We will now generalize Example B.8.5 and show how to construct the map Φ from
Definition B.8.4 and prove that it is uniquely determined by 𝜙 in this definition.
Lemma B.8.7. Let 𝑇 be an 𝑅-module and let 𝜃 ∶ 𝑀 → 𝑇 be a multilinear map with
Span 𝜃(𝑀) = 𝑇. Then the following statements are equivalent.
(1) The pair (𝑇, 𝜃) has the universal property.
(2) For every 𝑅-module 𝑃, every multilinear map 𝜙 ∶ 𝑀 → 𝑃, every 𝑛 ∈ ℕ, all (𝑣⃗0, . . . , 𝑣⃗𝑛−1) ∈ 𝑀𝑛, and all 𝑟0, . . . , 𝑟𝑛−1 ∈ 𝑅 with ∑_{𝑗=0}^{𝑛−1} 𝑟𝑗𝜃(𝑣⃗𝑗) = 0 we have ∑_{𝑗=0}^{𝑛−1} 𝑟𝑗𝜙(𝑣⃗𝑗) = 0.
Conversely, assume that the second condition holds. Consider the map
(B.8.14) Φ ∶ 𝑇 → 𝑃, ∑_{𝑗=0}^{𝑛−1} 𝑟𝑗𝜃(𝑣⃗𝑗) ↦ ∑_{𝑗=0}^{𝑛−1} 𝑟𝑗𝜙(𝑣⃗𝑗)
for all 𝑛 ∈ ℕ, 𝑟𝑗 ∈ 𝑅, and 𝑣⃗𝑗 ∈ 𝑀 for 0 ≤ 𝑗 < 𝑛. We show that Φ is well-defined. Since Span 𝜃(𝑀) = 𝑇, every 𝑡 ∈ 𝑇 can be written as
(B.8.15) 𝑡 = ∑_{𝑗=0}^{𝑚−1} 𝑟𝑗𝜃(𝑣⃗𝑗).
By inserting summands with coefficients 0 in the sums on the right sides of (B.8.15) and (B.8.16) and changing the order of the terms in these sums, we achieve 𝑛 = 𝑛′ and 𝑣⃗𝑗 = 𝑣⃗𝑗′ for all 𝑗 ∈ ℤ𝑛. So we have
(B.8.17) 0 = ∑_{𝑗=0}^{𝑛−1} (𝑟𝑗 − 𝑟𝑗′)𝜃(𝑣⃗𝑗).
This shows that Φ is, in fact, well-defined. The proof of the linearity is left to the reader
as Exercise B.8.8. □
Exercise B.8.8. Prove the linearity of the map defined in (B.8.14).
is a well-defined homomorphism that satisfies 𝜙 = Φ ∘ 𝜃 and it is the only linear map with
this property.
Proof. We have shown in the proof of Lemma B.8.7 that Φ is a well-defined homo-
morphism with 𝜙 = Φ ∘ 𝜃. To prove the uniqueness, let Φ′ ∶ 𝑇 → 𝑃 be another linear
map with these properties. Also, let 𝑡 ∈ 𝑇 with a representation as in (B.8.15). Then we have Φ′(𝑡) = Φ′(∑_{𝑖=0}^{𝑛−1} 𝑟𝑖𝜃(𝑣⃗𝑖)) = ∑_{𝑖=0}^{𝑛−1} 𝑟𝑖 Φ′ ∘ 𝜃(𝑣⃗𝑖) = ∑_{𝑖=0}^{𝑛−1} 𝑟𝑖𝜙(𝑣⃗𝑖) = Φ(𝑡). □
Example B.8.10. We use the tensor product (ℤ, 𝜃) from Example B.8.5 and consider the map 𝜙 ∶ ℤ3 → ℤ, (𝑥0, 𝑥1, 𝑥2) ↦ 2𝑥0𝑥1𝑥2. It is multilinear. The uniquely determined linear map Φ from Proposition B.8.9 satisfies
(B.8.20) Φ(𝑡) = 𝜙(1, 𝑡, 1) = 2𝑡
and this equation completely determines Φ.
Proof. Let (𝑇, 𝜃) and (𝑇 ′ , 𝜃′ ) be tensor products of 𝑀0 , . . . , 𝑀𝑚−1 . Then it follows from
the universal property of tensor products that there are linear maps Θ ∶ 𝑇 ′ → 𝑇 and
Θ′ ∶ 𝑇 → 𝑇 ′ such that
(B.8.21) 𝜃 ′ = Θ′ ∘ 𝜃 and 𝜃 = Θ ∘ 𝜃′ .
This implies
(B.8.22) 𝜃′ = (Θ′ ∘ Θ) ∘ 𝜃′ and 𝜃 = (Θ ∘ Θ′ ) ∘ 𝜃.
Equation (B.8.22) implies
(B.8.23) Θ′ ∘ Θ|𝜃′ (𝑀) = 𝐼𝜃′ (𝑀) and Θ ∘ Θ′ |𝜃(𝑀) = 𝐼𝜃(𝑀) .
Since Θ and Θ′ are linear transformations and since Span(𝜃(𝑀)) = 𝑇 and Span(𝜃′ (𝑀))
= 𝑇 ′ we obtain from (B.8.23)
(B.8.24) Θ′ ∘ Θ = Θ′ ∘ Θ|Span(𝜃′ (𝑀)) = 𝐼Span(𝜃′ (𝑀)) = 𝐼𝑇 ′
and
(B.8.25) Θ ∘ Θ′ = Θ ∘ Θ′ |Span(𝜃(𝑀)) = 𝐼Span(𝜃(𝑀)) = 𝐼𝑇 .
So Θ is an isomorphism between 𝑇′ and 𝑇. The uniqueness of Θ follows from Proposition B.8.9. □
and
(B.8.28) 𝑟 ∑_{𝑖=0}^{𝑘−1} 𝑟𝑖𝑣⃗𝑖 = ∑_{𝑖=0}^{𝑘−1} (𝑟𝑟𝑖)𝑣⃗𝑖.
From these rules, we also obtain formulas for adding two linear combinations in 𝐿.
For this, we write both as linear combinations of the same elements of 𝑀 by inserting
summands with coefficients zero and changing the order of the terms in the sum if
necessary. As shown in Exercise B.8.13, 𝐿 is an 𝑅-module.
Exercise B.8.13. Verify that 𝐿 is an 𝑅-module.
Example B.8.14. Let 𝑀0 = 𝑀1 = ℤ. Then the module 𝐿 consists of all formal sums ∑_{𝑖=0}^{𝑘−1} 𝑟𝑖(𝑣𝑖, 𝑤𝑖) where 𝑘 ∈ ℕ0 and 𝑟𝑖, 𝑣𝑖, 𝑤𝑖 ∈ ℤ for 0 ≤ 𝑖 < 𝑘 such that the tuples (𝑣𝑖, 𝑤𝑖) are pairwise different and different from (0, 0). For example, (3, 2) − 2 ⋅ (1, 2) and (1, 2) are two different elements of 𝐿.
and
(B.8.30) 𝑟(𝑣⃗0, . . . , 𝑣⃗𝑖−1, 𝑣⃗, 𝑣⃗𝑖+1, . . . , 𝑣⃗𝑚−1) − (𝑣⃗0, . . . , 𝑣⃗𝑖−1, 𝑟𝑣⃗, 𝑣⃗𝑖+1, . . . , 𝑣⃗𝑚−1)
where 𝑟 ∈ 𝑅, (𝑣⃗0, . . . , 𝑣⃗𝑚−1) ∈ 𝑀, 𝑖 ∈ ℤ𝑚, and 𝑣⃗, 𝑤⃗ ∈ 𝑀𝑖. For 𝑣⃗ = (𝑣⃗0, . . . , 𝑣⃗𝑚−1) ∈ 𝑀 we
denote the residue class of 𝑣 ⃗ modulo 𝑆 by
(B.8.31) 𝑣⃗0 ⊗𝑅 ⋯ ⊗𝑅 𝑣⃗𝑚−1.
(B.8.33) 𝑀0 ⊗𝑅 ⋯ ⊗𝑅 𝑀𝑚−1 .
It follows from the definition of 𝑆 that the following relations hold for all 𝑟 ∈ 𝑅,
(𝑣 0⃗ , . . . , 𝑣 𝑚−1
⃗ ) ∈ 𝑀, 𝑖 ∈ ℤ𝑚 , and 𝑣,⃗ 𝑤⃗ ∈ 𝑀𝑖 :
𝑣⃗0 ⊗ ⋯ ⊗ 𝑣⃗𝑖−1 ⊗ 𝑣⃗ ⊗ 𝑣⃗𝑖+1 ⊗ ⋯ ⊗ 𝑣⃗𝑚−1
(B.8.36) + 𝑣⃗0 ⊗ ⋯ ⊗ 𝑣⃗𝑖−1 ⊗ 𝑤⃗ ⊗ 𝑣⃗𝑖+1 ⊗ ⋯ ⊗ 𝑣⃗𝑚−1
= 𝑣⃗0 ⊗ ⋯ ⊗ 𝑣⃗𝑖−1 ⊗ (𝑣⃗ + 𝑤⃗) ⊗ 𝑣⃗𝑖+1 ⊗ ⋯ ⊗ 𝑣⃗𝑚−1,
and
(B.8.37) 𝑟𝑣⃗0 ⊗ ⋯ ⊗ 𝑣⃗𝑖 ⊗ ⋯ ⊗ 𝑣⃗𝑚−1 = 𝑣⃗0 ⊗ ⋯ ⊗ 𝑟𝑣⃗𝑖 ⊗ ⋯ ⊗ 𝑣⃗𝑚−1.
Theorem B.8.16. The pair (⨂_{𝑗=0}^{𝑚−1} 𝑀𝑗, ⨂) is a tensor product of 𝑀0, . . . , 𝑀𝑚−1 over 𝑅.
The uniqueness of the tensor product shown in Theorem B.8.12 justifies the fol-
lowing definition.
Definition B.8.17. The pair (⨂_{𝑗=0}^{𝑚−1} 𝑀𝑗, ⨂) is called the tensor product of 𝑀0, . . . , 𝑀𝑚−1 over 𝑅. We simply write it as 𝑀0 ⊗𝑅 ⋯ ⊗𝑅 𝑀𝑚−1 or as 𝑀0 ⊗ ⋯ ⊗ 𝑀𝑚−1 = ⨂_{𝑖=0}^{𝑚−1} 𝑀𝑖 if 𝑅 is understood.
and for 𝑣⃗ ∈ 𝑀
(B.8.41) 𝑣⃗⊗𝑘 = ⨂_{𝑖=0}^{𝑘−1} 𝑣⃗.
Example B.8.18. We construct the tensor product ℤ⊗3 . Its elements are the linear
combinations of 𝑥0 ⊗ 𝑥1 ⊗ 𝑥2 with integer coefficients where 𝑥𝑖 ∈ ℤ. We claim that
(B.8.42) ℤ⊗3 = ℤ ⋅ 1⊗3 .
To verify (B.8.42) we first note that ℤ ⋅ 1⊗3 ⊂ ℤ⊗3 . To show the reverse inclusion, let
𝑥0 ⊗𝑥1 ⊗𝑥2 ∈ ℤ⊗3 with 𝑥0 , 𝑥1 , 𝑥2 ∈ ℤ. Due to the multilinearity of the tensor product,
we have 𝑥0 ⊗ 𝑥1 ⊗ 𝑥2 = 𝑥 ⋅ 1⊗3 where 𝑥 = 𝑥0 𝑥1 𝑥2 . So 𝑥0 ⊗ 𝑥1 ⊗ 𝑥2 ∈ ℤ ⋅ 1⊗3 .
The next proposition shows that with each element of the tensor product ⨂_{𝑗=0}^{𝑚−1} Hom(𝑀𝑗, 𝑁𝑗) we can associate a homomorphism in Hom(𝑀, 𝑁).
(B.8.44) 𝑀 → 𝑁, ⨂_{𝑗=0}^{𝑚−1} 𝑣⃗𝑗 ↦ ⨂_{𝑗=0}^{𝑚−1} 𝑓𝑗(𝑣⃗𝑗)
defines a map in Hom(𝑀, 𝑁). We refer to this homomorphism as ⨂_{𝑗=0}^{𝑚−1} 𝑓𝑗.
is multilinear because the maps 𝑓𝑗 are linear. Since (⨂_{𝑗=0}^{𝑚−1} 𝑀𝑗, ⊗) is a tensor product of 𝑀0, . . . , 𝑀𝑚−1, the assertion follows from Proposition B.8.9. □
We note that, in general, the map that sends the tensor product of elements in
Hom(𝑀𝑗 , 𝑁 𝑗 ) to the corresponding homomorphism in Hom(𝑀, 𝑁) is not injective.
Therefore, several such tensor products may be associated with the same homomor-
phism in Hom(𝑀, 𝑁). But as we will see in Section B.9.3, the map is an 𝑅-module
isomorphism if the modules 𝑀𝑗 and 𝑁 𝑗 are finite-dimensional vector spaces.
Definition B.9.1. (1) By 𝐹^{𝑘⃗} we mean the set of all 𝑚-dimensional matrices (𝛼𝑖⃗)_{𝑖⃗∈ℤ_{𝑘⃗}} with entries 𝛼𝑖⃗ ∈ 𝐹.
(2) Let 𝑖⃗ ∈ ℤ_{𝑘⃗}. The standard unit matrices in 𝐹^{𝑘⃗} are the matrices 𝐸𝑖⃗ in 𝐹^{𝑘⃗} such that the entry with index 𝑖⃗ is 1 and it is the only nonzero entry of 𝐸𝑖⃗.
Proposition B.9.2. The set 𝐹^{𝑘⃗} equipped with componentwise addition and scalar multiplication is a 𝑘-dimensional 𝐹-vector space and (𝐸𝑖⃗)_{𝑖⃗∈ℤ_{𝑘⃗}} is a basis of 𝐹^{𝑘⃗}.
Exercise B.9.3. Prove Proposition B.9.2.
Note that this matrix depends on the choice of the bases of the vector spaces 𝑉𝑗. If we want to make this dependence explicit, we write Mat𝐵0,...,𝐵𝑚−1(𝑣⃗0, . . . , 𝑣⃗𝑚−1).
Example B.9.5. Let 𝐹 = ℚ, 𝑚 = 3, 𝑘0 = 𝑘1 = 𝑘2 = 2. The three-dimensional matrix Mat((1, 1), (1, −1), (1, 0)) is presented in Table B.9.1.
Proposition B.9.6. The pair (𝐹^{𝑘⃗}, Mat) is a tensor product of 𝑉0, . . . , 𝑉𝑚−1.
Table B.9.1. The entries of Mat((1, 1), (1, −1), (1, 0)).

index (𝑖0, 𝑖1, 𝑖2)   entry ∏_{𝑗=0}^{𝑚−1} 𝑣𝑖𝑗,𝑗
(0, 0, 0)            𝑣0,0 𝑣0,1 𝑣0,2 = 1
(0, 0, 1)            𝑣0,0 𝑣0,1 𝑣1,2 = 0
(0, 1, 0)            𝑣0,0 𝑣1,1 𝑣0,2 = −1
(0, 1, 1)            𝑣0,0 𝑣1,1 𝑣1,2 = 0
(1, 0, 0)            𝑣1,0 𝑣0,1 𝑣0,2 = 1
(1, 0, 1)            𝑣1,0 𝑣0,1 𝑣1,2 = 0
(1, 1, 0)            𝑣1,0 𝑣1,1 𝑣0,2 = −1
(1, 1, 1)            𝑣1,0 𝑣1,1 𝑣1,2 = 0
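The entries of Mat(𝑣⃗0, . . . , 𝑣⃗𝑚−1) can be generated mechanically; a sketch that reproduces Table B.9.1 (the function name `mat` is ours):

```python
from itertools import product

def mat(*vectors):
    """The m-dimensional matrix with entry prod_j v_{i_j, j} at multi-index
    (i_0, ..., i_{m-1}), stored as a dictionary keyed by the multi-index."""
    entries = {}
    for idx in product(*(range(len(v)) for v in vectors)):
        e = 1
        for j, i in enumerate(idx):
            e *= vectors[j][i]
        entries[idx] = e
    return entries

t = mat([1, 1], [1, -1], [1, 0])
print(t[(0, 1, 0)])  # -1, as in Table B.9.1
```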
Proof. The map Mat is well-defined since the 𝐵𝑖 are bases of the 𝑉 𝑖 . Also, it is easy to
verify that Mat is multilinear. Next, we note that for all 𝑖 ⃗ = (𝑖0 , . . . , 𝑖𝑚−1 ) ∈ ℤ𝑘⃗ we have
From Proposition B.9.6 and Theorem B.8.12 we obtain the following corollary.
Corollary B.9.8. The map
(B.9.8) ⨂_{𝑖=0}^{𝑚−1} 𝑉𝑖 → 𝐹^{𝑘⃗}, ⨂_{𝑖=0}^{𝑚−1} 𝑣⃗𝑖 ↦ Mat(𝑣⃗0, . . . , 𝑣⃗𝑚−1)
is the uniquely determined isomorphism between the tensor products (⨂_{𝑖=0}^{𝑚−1} 𝑉𝑖, ⨂) and (𝐹^{𝑘⃗}, Mat).
Next, we show that the tensor product of the 𝐵𝑗 is a basis of 𝑉. For this, we need
the following definition and result.
We explain the meaning of 𝜃(𝐴, 𝐵). For this, we assume that we have modified
the representation in Definition B.9.9 such that the matrices in 𝐹 (𝑚,𝑛) become vectors
in 𝐹 𝑚𝑛 and matrices in 𝐹 (ᵆ,𝑣) become vectors in 𝐹 ᵆ𝑣 . The details are worked out in
Exercise B.9.15.
defines an isomorphism of 𝐹-vector spaces. In this definition, ⨂_{𝑗=0}^{𝑚−1} 𝑓𝑗 means two different things: a map in ⨂_{𝑗=0}^{𝑚−1} Hom(𝑉𝑗, 𝑊𝑗) and the corresponding map in Hom(𝑉, 𝑊) defined in Proposition B.8.19.
B.9.4. Partial trace. Our next goal is to introduce the notion of the partial trace.
In the discussion, we use direct products ∏𝑗∈𝐼 𝑀𝑗 and tensor products ⨂𝑗∈𝐼 𝑀𝑗 for
subsets 𝐼 of ℤ𝑚 . In these expressions, the indices are ordered by size: from smallest to
largest.
First, we note that the following holds.
Proposition B.9.21. For 0 ≤ 𝑗 < 𝑚 let 𝑓𝑗 ∈ End(𝑉𝑗). Then we have
(B.9.27) tr(⨂_{𝑗=0}^{𝑚−1} 𝑓𝑗) = ∏_{𝑗=0}^{𝑚−1} tr 𝑓𝑗.
Exercise B.9.22. Prove Proposition B.9.21. Hint: Use induction on 𝑚 and the formula
(B.9.12) for the tensor product of matrices.
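For matrices, Proposition B.9.21 amounts to tr(𝐴 ⊗ 𝐵) = tr(𝐴) tr(𝐵) for the Kronecker product of the representation matrices; a quick numerical check (helper names ours):

```python
def kron(a, b):
    """Kronecker product of square matrices a (p x p) and b (q x q)."""
    q = len(b)
    n = len(a) * q
    return [[a[i // q][j // q] * b[i % q][j % q] for j in range(n)] for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(trace(kron(A, B)), trace(A) * trace(B))  # 65 65
```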
that satisfies
(B.9.29) tr𝐽(⨂_{𝑗∈ℤ𝑚} 𝑓𝑗) = (∏_{𝑗∈𝐽} tr 𝑓𝑗) ⨂_{𝑗∈ℤ𝑚⧵𝐽} 𝑓𝑗
for all (𝑓0, . . . , 𝑓𝑚−1) ∈ ∏_{𝑗=0}^{𝑚−1} End(𝑉𝑗). It is called the partial trace over the 𝑉𝑗, 𝑗 ∈ 𝐽.
It is multilinear. Hence, it follows from Proposition B.8.9 that (B.9.29) defines the
uniquely determined homomorphism (B.9.28). □
Example B.9.24. Let 𝑅 = 𝑀0 = 𝑀1 = ℤ3 and let 𝑓 ∶ ℤ3 → ℤ3, 𝑣 ↦ 2𝑣 mod 3. The partial trace of 𝑓⊗2 over 𝑀0 is the map 𝑦 ↦ (tr 𝑓)𝑓(𝑦) = 4𝑦 mod 3 = 𝑦.
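In the matrix picture with 𝑉0 ⊗ 𝑉1 of dimensions 𝑑0 and 𝑑1, the partial trace over 𝑉0 sums the diagonal blocks; a minimal sketch, checked against tr over the first factor of 𝐴 ⊗ 𝐵 being tr(𝐴) ⋅ 𝐵 (names ours):

```python
def kron(a, b):
    """Kronecker product of square matrices."""
    q = len(b)
    n = len(a) * q
    return [[a[i // q][j // q] * b[i % q][j % q] for j in range(n)] for i in range(n)]

def partial_trace_first(m, d0, d1):
    """Partial trace over the first factor of a (d0*d1) x (d0*d1) matrix m:
    entry (i, j) of the result is sum over a of m[(a, i), (a, j)]."""
    return [[sum(m[a * d1 + i][a * d1 + j] for a in range(d0))
             for j in range(d1)] for i in range(d1)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
print(partial_trace_first(kron(A, B), 2, 2))  # [[0, 5], [5, 0]] = tr(A) * B
```

This is the operation used in quantum information to pass from a state of a composite system to the reduced state of a subsystem.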
Appendix C

Probability Theory
Quantum algorithms are probabilistic by nature. So their analysis requires some prob-
ability theory. This part of the appendix summarizes the concepts and results of proba-
bility theory that are required in the analyses of probabilistic and quantum algorithms
in this book.
C.1. Basics
We begin with some basic definitions.
In the following, we need the Riemann Series Theorem which we state now.
Theorem C.1.4. Consider an infinite sum ∑_{𝑖=0}^{∞} 𝑟𝑖 where 𝑟𝑖 ∈ ℝ for all 𝑖 ∈ ℕ0. Then the following statements are equivalent.
(1) The infinite sum ∑_{𝑖=0}^{∞} 𝑟𝑖 is absolutely convergent.
(2) For all permutations 𝜋 ∶ ℕ0 → ℕ0, the infinite sums ∑_{𝑖=0}^{∞} 𝑟𝜋(𝑖) are convergent and have the same limit.
If the two statements hold, then we write ∑_{𝑟∈𝑅} 𝑟 for the limit of the infinite sum ∑_{𝑖=0}^{∞} 𝑟𝑖, where 𝑅 represents any ordering of the sequence (𝑟𝑖). If the elements of this sequence are pairwise distinct, then 𝑅 is the set of these elements.
We say that the probability distribution assigns the probability Pr(𝑠) to each ele-
mentary event 𝑠 ∈ 𝑆. The probability space is called finite if the sample space is
finite. Otherwise, it is called infinite.
(2) The subsets of 𝑆 are called events. The probability of an event 𝐴 ⊂ 𝑆 is
(C.1.3) Pr(𝐴) = ∑_{𝑎∈𝐴} Pr(𝑎).
Note that by Theorem C.1.4, the condition (C.1.2) means that this sum converges
to 1 for any ordering of the elements of 𝑆.
Example C.1.7. Consider the experiment of tossing a fair coin. The corresponding discrete probability space is ({0, 1}, Pr) where 0 and 1 represent tails and heads, respectively, and Pr sends both 0 and 1 to 1/2.
Example C.1.8. Consider the experiment of throwing a dice. The corresponding discrete probability space is ({1, . . . , 6}, Pr) where Pr sends all elements of {1, . . . , 6} to 1/6.
Exercise C.1.9. Consider a fair coin, where the probability of getting heads is 1/2 and the probability of getting tails is 1/2. What is the probability of getting heads at least once when tossing the coin two times? Describe the corresponding probability space and event and use this to find the solution of the exercise.
Example C.1.10. Consider the experiment in which a dice is rolled until it shows 6. The sample space is the set of all finite sequences of length ≥ 1 where the last entry is 6 and all other entries are between 1 and 5. The probability distribution is
(C.1.4) Pr ∶ 𝑆 → [0, 1], 𝑠 ↦ 5^{|𝑠|−1}/6^{|𝑠|}.
Example C.1.11. We present another way to model the experiment of Example C.1.10. The sample space is ℕ. The sample or elementary event 𝑠 ∈ ℕ means that the experiment is successful after rolling the dice 𝑠 times. The probability distribution is
(C.1.6) Pr ∶ ℕ → [0, 1], 𝑠 ↦ 5^{𝑠−1}/6^{𝑠}.
This is a probability distribution due to (C.1.5).
Exercise C.1.12. Consider the experiment in which a dice is rolled until an odd num-
ber occurs for the first time. Determine the corresponding discrete probability space
as in Example C.1.11.
Definition C.1.13. A random variable on a discrete probability space (𝑆, Pr) is a function
𝑋 ∶ 𝑆 → ℝ.
The expected value or expectation of 𝑋 is
(C.1.7) 𝐸[𝑋] = ∑_{𝑠∈𝑆} Pr(𝑠)𝑋(𝑠),
provided that this sum is absolutely convergent.
Example C.1.14. Use the notation of Example C.1.10 and define the random variable
(C.1.8) 𝑋 ∶ 𝑆 → ℝ, 𝑠 ↦ |𝑠|.
The expected value of this random variable is
(C.1.9) 𝐸[𝑋] = ∑_{𝑛=1}^{∞} 𝑛 ⋅ (1/6)(5/6)^{𝑛−1} = (1/6) ⋅ 1/(1 − 5/6)² = 6.
This means that the expected number of times one needs to roll a dice until it shows a
6 is 6.
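The value 𝐸[𝑋] = 6 can be checked numerically by truncating the series (function name ours):

```python
def expected_rolls(p, terms=10_000):
    """Partial sum of E[X] = sum_{n >= 1} n * p * (1 - p)^(n - 1)."""
    return sum(n * p * (1 - p) ** (n - 1) for n in range(1, terms + 1))

print(expected_rolls(1 / 6))  # approximately 6
```

The tail of the series beyond 10,000 terms is negligible, so the partial sum agrees with the closed form to floating-point accuracy.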
Exercise C.1.15. Calculate the expected number of rolls needed to achieve success in
the experiment described in Exercise C.1.12.
Next, we show that the expectation of random variables has linearity properties.
Proposition C.1.16. Let (𝑆, Pr) be a discrete probability space and let 𝑋 and 𝑌 be ran-
dom variables on it such that the expectations 𝐸[𝑋] and 𝐸[𝑌 ] are defined. Then the fol-
lowing hold.
(1) E[𝑋] + E[𝑌 ] = E[𝑋 + 𝑌 ].
(2) E[𝑟𝑋] = 𝑟E[𝑋] for all 𝑟 ∈ ℝ.
Proof. Let 𝑌 be the random variable satisfying 𝑌 (𝑠) = 0 if 0 ≤ 𝑋(𝑠) < 𝑐E[𝑋] and
𝑌 (𝑠) = 𝑐E[𝑋] if 𝑋(𝑠) ≥ 𝑐E[𝑋] for all 𝑠 ∈ 𝑆. Then we have
(C.1.11) E[𝑋] ≥ E[𝑌 ] = 𝑐E[𝑋]Pr(𝑋 ≥ 𝑐E[𝑋]).
This implies the assertion. □
Proposition C.2.4. The expected number of trials in the Bernoulli experiment is 1/𝑝.
Proof. We have
(C.2.4) ∑_{𝑖∈ℕ} 𝑖 Pr∗(𝑖) = 𝑝 ∑_{𝑖=1}^{∞} 𝑖(1 − 𝑝)^{𝑖−1} = 𝑝/(1 − (1 − 𝑝))² = 1/𝑝. □
Example C.2.5. The expected number of rolls to obtain a 6 on a dice is 6. The expected
number of coin tosses to obtain heads is 2.
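Proposition C.2.4 can also be illustrated by simulation; a sketch using Python's random module, seeded for reproducibility (function name ours):

```python
import random

def trials_until_success(p, rng):
    """Number of independent trials with success probability p until the first success."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n

rng = random.Random(0)
samples = [trials_until_success(1 / 6, rng) for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to the expected value 6
```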
Exercise C.2.6. Determine the expected number of rolls to obtain a number > 3 on a
dice.
Appendix D

Solutions of Selected Exercises
Solution of Exercise 1.1.20. Let 𝑎 = 𝑏𝑐 with two proper divisors 𝑏, 𝑐 of 𝑎 such that
1 < 𝑏 ≤ |𝑐|. Then we have 𝑏2 ≤ |𝑏𝑐| = |𝑎|. This implies 1 < 𝑏 ≤ √|𝑎|. □
Solution of Exercise 1.3.4. The set FRand(𝐴, 𝑎) ∪ {∞} is countable and by Lemma
1.3.2 Pr𝐴,𝑎 is a probability distribution on the sample space. If Pr𝐴,𝑎 (∞) = 0, then Pr
is a probability distribution on the sample space FRand(𝐴, 𝑎). □
Solution of Exercise 1.3.18. Write 𝑝 = 𝑝𝐴 (𝑎) and 𝑞 = 𝑞𝐴 (𝑎) and denote by 𝑞(𝑎, 𝑘)
the failure probability of 𝗋𝖾𝗉𝖾𝖺𝗍𝐴 (𝑎, 𝑘). If 𝑘 ≥ | log 𝜀|/𝑝, then it follows from (1.3.17)
and from 0 < 𝜀 ≤ 1 that
This implies
and thus
as asserted. □
We also have
⟨𝑣⃗|𝛼𝑤⃗⟩ = ∑_{𝑖=0}^{𝑘−1} 𝑣̄𝑖(𝛼𝑤𝑖) = 𝛼 ∑_{𝑖=0}^{𝑘−1} 𝑣̄𝑖𝑤𝑖 = 𝛼⟨𝑣⃗|𝑤⃗⟩.
This proves the linearity in the second argument. Next, we prove the conjugate symmetry: the complex conjugate of ⟨𝑤⃗|𝑣⃗⟩ = ∑_{𝑖=0}^{𝑘−1} 𝑤̄𝑖𝑣𝑖 is ∑_{𝑖=0}^{𝑘−1} 𝑤𝑖𝑣̄𝑖 = ⟨𝑣⃗|𝑤⃗⟩.
Finally, we have
⟨𝑣⃗|𝑣⃗⟩ = ∑_{𝑖=0}^{𝑘−1} 𝑣̄𝑖𝑣𝑖 = ∑_{𝑖=0}^{𝑘−1} |𝑣𝑖|².
This implies the positive definiteness and concludes the proof of Theorem 2.2.9. □
Solution of Exercise 2.2.23. We prove that the map (2.2.20) is a norm on ℂ. We first prove the triangle inequality. Let 𝛼, 𝛽 ∈ ℂ. We apply the Cauchy–Schwarz inequality in ℝ² and obtain
|𝛼 + 𝛽|² = (ℜ𝛼 + ℜ𝛽)² + (ℑ𝛼 + ℑ𝛽)²
(D.7) = |𝛼|² + |𝛽|² + 2(ℜ𝛼ℜ𝛽 + ℑ𝛼ℑ𝛽)
≤ |𝛼|² + |𝛽|² + 2|𝛼||𝛽| = (|𝛼| + |𝛽|)².
The absolute homogeneity is seen as follows:
|𝛼𝛽|² = |(ℜ𝛼 + 𝑖ℑ𝛼)(ℜ𝛽 + 𝑖ℑ𝛽)|²
= (ℜ𝛼ℜ𝛽 − ℑ𝛼ℑ𝛽)² + (ℜ𝛼ℑ𝛽 + ℑ𝛼ℜ𝛽)²
(D.8) = (ℜ𝛼ℜ𝛽)² + (ℑ𝛼ℑ𝛽)² + (ℜ𝛼ℑ𝛽)² + (ℑ𝛼ℜ𝛽)²
= ((ℜ𝛼)² + (ℑ𝛼)²)((ℜ𝛽)² + (ℑ𝛽)²)
= |𝛼|²|𝛽|².
Finally, the positive definiteness follows directly from (2.2.20). □
and
(D.14) 𝑍|𝑥−⟩ = (𝑍|0⟩ − 𝑍|1⟩)/√2 = (|0⟩ + |1⟩)/√2 = |𝑥+⟩.
Hence, we have
(D.15) Mat𝐶(𝑍) = ( 0 1
                   1 0 ).
Solution of Exercise 2.3.10. The identity (2.3.25) follows from the fact that transposition and conjugation of matrices are involutions. Next, we have (𝐴 + 𝐵)T = 𝐴T + 𝐵T and the conjugate of 𝐴 + 𝐵 is the sum of the conjugates of 𝐴 and 𝐵, which implies (2.3.26). Also, (𝛼𝐴)T = 𝛼𝐴T and the conjugate of 𝛼𝐴 is 𝛼̄ times the conjugate of 𝐴, which implies (2.3.27).
Next, we prove (2.3.28). The rank 𝑟 of 𝐴 is the number of linearly independent column vectors of 𝐴. The conjugates of these column vectors are the row vectors of 𝐴∗. Since conjugation does not change linear dependence and independence, the number of linearly independent row vectors of 𝐴∗ is also 𝑟. So, Proposition B.7.3 implies that 𝐴 and 𝐴∗ have the same rank.
Finally, equation (2.3.29) follows from the observations that the conjugate of 𝐴𝐵 is the product of the conjugates of 𝐴 and 𝐵 and that (𝐴𝐵)T = 𝐵T𝐴T. □
Solution of Exercise 2.4.5. Assume that all eigenvalues of 𝐴 ∈ ℂ(𝑘,𝑘) have algebraic
multiplicity 1. It then follows from the definition of an eigenvalue and from Corollary
B.7.26 that all eigenvalues also have geometric multiplicity 1. Since by Proposition
2.4.1 the characteristic polynomial 𝑝𝐴 (𝑥) is a product of linear factors, Theorem B.7.28
implies the assertion. □
Solution of Exercise 2.4.14. Let 𝐴 = (𝑎𝑖,𝑗) ∈ ℂ(𝑘,𝑘). The diagonal elements of 𝐴 are the 𝑎𝑖,𝑖 and the diagonal elements of 𝐴∗ are the 𝑎̄𝑖,𝑖. Since 𝐴 is Hermitian, we have 𝐴 = 𝐴∗ and therefore 𝑎𝑖,𝑖 = 𝑎̄𝑖,𝑖 for all 𝑖 ∈ ℤ𝑘; that is, the diagonal elements of 𝐴 are real. This proves the first assertion. The second assertion follows from Proposition 2.3.14. The remaining assertions can be deduced from Proposition 2.3.9 and the Hermitian property. □
⟨𝑃𝑣⃗, 𝑣⃗ − 𝑃𝑣⃗⟩
= ⟨𝑃𝑣⃗, 𝑣⃗⟩ − ⟨𝑃𝑣⃗, 𝑃𝑣⃗⟩   linearity of the inner product,
= ⟨𝑃𝑣⃗, 𝑣⃗⟩ − ⟨𝑃∗𝑃𝑣⃗, 𝑣⃗⟩   property of the adjoint,
= ⟨𝑃𝑣⃗, 𝑣⃗⟩ − ⟨𝑃²𝑣⃗, 𝑣⃗⟩   𝑃 is Hermitian,
= ⟨𝑃𝑣⃗, 𝑣⃗⟩ − ⟨𝑃𝑣⃗, 𝑣⃗⟩ = 0   𝑃 is a projection.
Solution of Exercise 3.1.21. We have |𝑥+ ⟩ = cos(𝜋/4) |0⟩ + 𝑒𝑖⋅0 sin(𝜋/4) |1⟩. There-
fore, the spherical coordinates of the point on the Bloch sphere corresponding to |𝑥+ ⟩
are (𝜋/2, 0). The Cartesian coordinates of this point are (1, 0, 0). The proof for |𝑥− ⟩ is
analogous.
Also, we have |𝑦+ ⟩ = cos(𝜋/4) |0⟩ + 𝑒𝑖⋅𝜋/2 sin(𝜋/4) |1⟩. Therefore, the spherical
coordinates of the point on the Bloch sphere corresponding to |𝑦+ ⟩ are (𝜋/2, 𝜋/2). The
Cartesian coordinates of this point are (0, 1, 0). The proof for |𝑦− ⟩ is analogous. □
Solution of Exercise 3.1.24. The relation 𝑅 is reflexive, since for every |𝜓⟩ ∈ 𝑆 we
have |𝜓⟩ = 𝑒𝑖𝛾 |𝜓⟩ with 𝛾 = 0. If |𝜑⟩ , |𝜓⟩ ∈ 𝑆 with |𝜓⟩ = 𝑒𝑖𝛾 |𝜑⟩ for some 𝛾 ∈ ℝ, then we
have |𝜑⟩ = 𝑒𝑖(−𝛾) |𝜓⟩. So 𝑅 is symmetric. Finally, let |𝜑⟩ , |𝜓⟩ , |𝜉⟩ ∈ 𝑆 and let 𝛾, 𝛿 ∈ ℝ
such that |𝜉⟩ = 𝑒𝑖𝛿 |𝜓⟩ and |𝜓⟩ = 𝑒𝑖𝛾 |𝜑⟩. Then we have |𝜉⟩ = 𝑒𝑖(𝛿+𝛾) |𝜑⟩. Therefore, 𝑅 is
transitive. □
Since the sequence (|𝜓𝑖 ⟩) is orthonormal, it follows that for all 𝑖, 𝑗 ∈ ℤ𝑙 we have
(D.20) tr𝐵 |𝜑𝑖 ⟩ |𝜓𝑖 ⟩ ⟨𝜑𝑗 | ⟨𝜓𝑗 | = |𝜑𝑖 ⟩ ⟨𝜑𝑗 | 𝛿 𝑖,𝑗 .
Equations (D.19) and (D.20) imply
(D.21) tr𝐵 |𝜉⟩⟨𝜉| = (1/𝑙) ∑_{𝑖=0}^{𝑙−1} |𝜑𝑖⟩⟨𝜑𝑖|.
Solution of Exercise 4.2.26. Let 𝐵 = (𝑢,̂ 𝑣,̂ 𝑤)̂ ∈ SO(3). Then Proposition 4.2.25 im-
plies Rot𝑤̂ (𝛾) = 𝐵 Rot𝑧̂ (𝛾)𝐵−1 . Choose 𝑇 ∈ SO(3) with 𝐵𝑇 = (−𝑢,̂ 𝑣,̂ −𝑤). ̂ Then
𝑇 Rot𝑧̂ (𝛾)𝑇 −1 = Rot𝑧̂ (−𝛾). This implies
Rot−𝑤̂ (𝛾) = 𝐵𝑇 Rot𝑧̂ (𝛾)𝑇 −1 𝐵 −1 = 𝐵 Rot𝑧̂ (−𝛾)𝐵 −1 = Rot𝑤̂ (−𝛾). □
and
(D.23) 𝑅𝑥̂(𝛾) |1⟩ = (cos(𝛾/2) 𝐼 − 𝑖 sin(𝛾/2) 𝑋) |1⟩ = cos(𝛾/2) |1⟩ − 𝑖 sin(𝛾/2) |0⟩.
This proves (4.3.8). We also have
(D.24) 𝑅𝑦̂(𝛾) |0⟩ = (cos(𝛾/2) 𝐼 − 𝑖 sin(𝛾/2) 𝑌) |0⟩ = cos(𝛾/2) |0⟩ + sin(𝛾/2) |1⟩
and
(D.25) 𝑅𝑦̂(𝛾) |1⟩ = (cos(𝛾/2) 𝐼 − 𝑖 sin(𝛾/2) 𝑌) |1⟩ = cos(𝛾/2) |1⟩ − sin(𝛾/2) |0⟩.
This proves (4.3.9). Finally, we have
(D.26) 𝑅𝑧̂(𝛾) |0⟩ = (cos(𝛾/2) 𝐼 − 𝑖 sin(𝛾/2) 𝑍) |0⟩ = (cos(𝛾/2) − 𝑖 sin(𝛾/2)) |0⟩ = 𝑒−𝑖𝛾/2 |0⟩
and
(D.27) 𝑅𝑧̂(𝛾) |1⟩ = (cos(𝛾/2) 𝐼 − 𝑖 sin(𝛾/2) 𝑍) |1⟩ = (cos(𝛾/2) + 𝑖 sin(𝛾/2)) |1⟩ = 𝑒𝑖𝛾/2 |1⟩.
This proves (4.3.10). □
Solution of Exercise 4.3.18. Let 𝑤̂ ∈ ℝ3 be a unit vector and let 𝛾 ∈ ℝ such that
𝑈 = 𝑅𝑤̂ (𝛾). If 𝑈 ∈ {±𝐼}, then Theorem 4.3.15 implies 𝛾 ≡ 0 mod 2𝜋. So, by Proposition
4.2.27 we have Rot(𝑈) = 𝐼3 . Assume that 𝑈 ≠ ±𝐼. Let 𝑤̂ ′ ∈ ℝ3 be a unit vector
and let 𝛾′ ∈ ℝ such that 𝑈 = 𝑅𝑤̂ ′ (𝛾′ ). Then Theorem 4.3.15 implies 𝑤̂ = 𝑤̂ ′ and
𝛾 ≡ 𝛾′ mod 2𝜋 or 𝑤̂ = −𝑤̂ ′ and 𝛾 ≡ −𝛾′ mod 2𝜋. So Proposition 4.2.27 implies that
Rot𝑤̂ (𝛾) = Rot𝑤̂ ′ (𝛾′ ). □
= (1/√2^{𝑚+𝑛}) ∑_{𝑤⃗∈{0,1}^𝑛} (−1)^{𝑧⃗⋅𝑤⃗} (∑_{𝑠⃗∈𝑆} (−1)^{𝑠⃗⋅𝑤⃗}) |𝑤⃗⟩.
(D.39) ∑_{𝑠⃗∈𝑆} (−1)^{𝑠⃗⋅𝑤⃗} = 0.
(D.40) 𝐻⊗𝑛 |𝑧⃗ ⊕ 𝑆⟩ = (2^𝑚/√2^{𝑚+𝑛}) ∑_{𝑤⃗∈𝑆⟂} (−1)^{𝑧⃗⋅𝑤⃗} |𝑤⃗⟩ = (1/√2^{𝑛−𝑚}) ∑_{𝑤⃗∈𝑆⟂} (−1)^{𝑧⃗⋅𝑤⃗} |𝑤⃗⟩. □
Solution of Exercise 6.4.14. It suffices to show that the cardinality of the image of the map in (6.4.34) is 2^{2𝑛}. The number of pairs (𝑥, 𝑦) ∈ ℤ²_{2^𝑛} with 𝑥 ≥ 𝑁 is 𝑘1 = (2^𝑛 − 𝑁)2^𝑛. The number of pairs (𝑥, 𝑦) with 𝑥 < 𝑁 and 𝑦 ≥ 𝑁 is 𝑘2 = 𝑁(2^𝑛 − 𝑁). The number of pairs (𝑥, 𝑦) ∈ ℤ²_𝑁 with gcd(𝑦, 𝑁) > 1 is 𝑘3 = 𝑁(𝑁 − 𝜑(𝑁)) where 𝜑(𝑁) is the number of 𝑦 ∈ ℤ𝑁 with gcd(𝑦, 𝑁) = 1. Finally, if 𝑦 ∈ ℤ𝑁 with gcd(𝑦, 𝑁) = 1, then the map ℤ𝑁 → ℤ𝑁, 𝑥 ↦ 𝑥𝑦 mod 𝑁 is a bijection. Hence, the number of pairs (𝑥, 𝑥𝑦 mod 𝑁) ∈ ℤ²_𝑁 with gcd(𝑦, 𝑁) = 1 is 𝑘4 = 𝑁𝜑(𝑁). So the cardinality of the image of the map in (6.4.34) is 𝑘1 + 𝑘2 + 𝑘3 + 𝑘4 = (2^𝑛 − 𝑁)2^𝑛 + 𝑁(2^𝑛 − 𝑁) + 𝑁(𝑁 − 𝜑(𝑁)) + 𝑁𝜑(𝑁) = 2^{2𝑛} − 2^𝑛𝑁 + 2^𝑛𝑁 − 𝑁² + 𝑁² − 𝑁𝜑(𝑁) + 𝑁𝜑(𝑁) = 2^{2𝑛}. □
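The counting in this solution can be sanity-checked for concrete values of 𝑛 and 𝑁; a sketch with a naive Euler function (names ours, and the sample values are an arbitrary illustrative choice):

```python
from math import gcd

def phi(n):
    """Number of y in Z_n with gcd(y, n) = 1 (naive Euler phi function)."""
    return sum(1 for y in range(n) if gcd(y, n) == 1)

n, N = 4, 11           # illustrative choice with N <= 2**n
M = 2 ** n
k1 = (M - N) * M       # pairs with x >= N
k2 = N * (M - N)       # pairs with x < N and y >= N
k3 = N * (N - phi(N))  # pairs in Z_N^2 with gcd(y, N) > 1
k4 = N * phi(N)        # pairs (x, x*y mod N) with gcd(y, N) = 1
print(k1 + k2 + k3 + k4 == M * M)  # True
```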
360 D. Solutions of Selected Exercises
Solution of Exercise 7.1.8. Since the identity operator 𝐼𝑁 and the projection |𝑠1 ⟩ ⟨𝑠1 |
are involutions, we have
So 𝑈1 is an involution. Also, since 𝐼𝑁 and |𝑠1 ⟩ ⟨𝑠1 | are Hermitian, it follows that 𝑈1 is
also Hermitian. It follows from Exercise 2.4.17 that 𝑈1 is unitary. In the same way, it
can be shown that 𝑈𝑠 is a Hermitian, unitary involution. So 𝐺 = 𝑈𝑠 𝑈1 is unitary. □
⟨𝑠+|𝑠+⟩ = ⟨𝑠−|𝑠−⟩ = (1/2)(⟨𝑠1|𝑠1⟩ + ⟨𝑠0|𝑠0⟩) = 1,
⟨𝑠+|𝑠−⟩ = (1/2)(⟨𝑠1|𝑠1⟩ − ⟨𝑠0|𝑠0⟩) = 0.
(D.43) 𝐺|𝑠0⟩ = 𝐺(cos 0 |𝑠0⟩ + sin 0 |𝑠1⟩) = cos 2𝜃 |𝑠0⟩ + sin 2𝜃 |𝑠1⟩
and
(D.44) 𝐺|𝑠1⟩ = 𝐺(cos(𝜋/2) |𝑠0⟩ + sin(𝜋/2) |𝑠1⟩) = cos(𝜋/2 + 2𝜃) |𝑠0⟩ + sin(𝜋/2 + 2𝜃) |𝑠1⟩ = − sin 2𝜃 |𝑠0⟩ + cos 2𝜃 |𝑠1⟩.
𝐺|𝑠+⟩ = (1/√2)(𝐺|𝑠1⟩ + 𝑖𝐺|𝑠0⟩)
= (1/√2)(− sin 2𝜃 |𝑠0⟩ + cos 2𝜃 |𝑠1⟩ + 𝑖(cos 2𝜃 |𝑠0⟩ + sin 2𝜃 |𝑠1⟩))
= (1/√2)((𝑖 cos 2𝜃 − sin 2𝜃) |𝑠0⟩ + (cos 2𝜃 + 𝑖 sin 2𝜃) |𝑠1⟩)
= (1/√2)(𝑖(cos 2𝜃 + 𝑖 sin 2𝜃) |𝑠0⟩ + (cos 2𝜃 + 𝑖 sin 2𝜃) |𝑠1⟩)
= 𝑒2𝑖𝜃 |𝑠+⟩
and
𝐺|𝑠−⟩ = (1/√2)(𝐺|𝑠1⟩ − 𝑖𝐺|𝑠0⟩)
= (1/√2)(− sin 2𝜃 |𝑠0⟩ + cos 2𝜃 |𝑠1⟩ − 𝑖(cos 2𝜃 |𝑠0⟩ + sin 2𝜃 |𝑠1⟩))
= (1/√2)((−𝑖 cos 2𝜃 − sin 2𝜃) |𝑠0⟩ + (cos 2𝜃 − 𝑖 sin 2𝜃) |𝑠1⟩)
= (1/√2)(−𝑖(cos 2𝜃 − 𝑖 sin 2𝜃) |𝑠0⟩ + (cos 2𝜃 − 𝑖 sin 2𝜃) |𝑠1⟩)
= 𝑒−2𝑖𝜃 |𝑠−⟩.
This means that |𝑠+⟩ and |𝑠−⟩ are eigenstates of 𝐺 associated with the eigenvalues 𝑒2𝑖𝜃 and 𝑒−2𝑖𝜃, respectively. Finally, we prove (7.2.2). We have
(−𝑖/√2)(𝑒𝑖𝜃 |𝑠+⟩ − 𝑒−𝑖𝜃 |𝑠−⟩)
= (−𝑖/2)(𝑒𝑖𝜃(|𝑠1⟩ + 𝑖 |𝑠0⟩) − 𝑒−𝑖𝜃(|𝑠1⟩ − 𝑖 |𝑠0⟩))
(D.45) = (1/2)(𝑒𝑖𝜃 + 𝑒−𝑖𝜃) |𝑠0⟩ + (−𝑖/2)(𝑒𝑖𝜃 − 𝑒−𝑖𝜃) |𝑠1⟩
= cos 𝜃 |𝑠0⟩ + sin 𝜃 |𝑠1⟩ = |𝑠⟩. □
Solution of Exercise 8.1.1. We have
(𝐴′)∗ = ( 0  𝐴
          𝐴∗ 0 )∗ = 𝐴′
and 𝐴′𝑥⃗ = (𝐴𝑥⃗, 0⃗) = (𝑏⃗, 0⃗) = 𝑏⃗′. □
Solution of Exercise A.1.11. Let 𝑎, 𝑏 ∈ 𝑆 and assume that the equivalence classes of
𝑎 and 𝑏 have a common element 𝑐. Let 𝑑 ∈ 𝑆. Also, let (𝑎, 𝑑) ∈ 𝑅. Then (𝑎, 𝑐) ∈ 𝑅, the
symmetry, and the transitivity of 𝑅 imply that (𝑏, 𝑑) ∈ 𝑅. So the equivalence class of
𝑎 is contained in the equivalence class of 𝑏 and vice versa. Therefore, the equivalence
classes are equal. □
Solution of Exercise A.4.12. Both are abelian semigroups with identity elements 0
and 1, respectively. Also, (ℤ𝑘 , +𝑘 ) is a group but (ℤ𝑘 , ⋅𝑘 ) is not, since 0 has no inverse.
The unit group of (ℤ𝑘 , ⋅𝑘 ) is (ℤ∗𝑘 , ⋅𝑘 ). □
Solution of Exercise A.5.9. We use the trigonometric identities (A.5.1), (A.5.2), and
(A.5.4) and obtain
$$
\begin{aligned}
&\sin^2(x+y) - \sin^2 x\\
&\quad= (\sin x \cos y + \cos x \sin y)^2 - \sin^2 x\\
&\quad= \sin^2 x \cos^2 y + 2 \sin x \cos x \sin y \cos y + \cos^2 x \sin^2 y - \sin^2 x\\
&\quad= \sin^2 x (1 - \sin^2 y) + \sin x \cos x \sin(2y) + (1 - \sin^2 x) \sin^2 y - \sin^2 x\\
&\quad= \sin^2 x - \sin^2 x \sin^2 y + \sin x \cos x \sin(2y) + \sin^2 y - \sin^2 x \sin^2 y - \sin^2 x\\
&\quad= \sin x \cos x \sin(2y) + (1 - 2\sin^2 x) \sin^2 y.
\end{aligned}
$$
Likewise, we obtain from the trigonometric identities (A.5.1), (A.5.3), and (A.5.4)
$$
\begin{aligned}
&\sin^2 x - \sin^2(x-y)\\
&\quad= \sin^2 x - (\sin x \cos y - \cos x \sin y)^2\\
&\quad= \sin^2 x - \sin^2 x \cos^2 y + 2 \sin x \cos x \sin y \cos y - \cos^2 x \sin^2 y\\
&\quad= \sin^2 x - \sin^2 x (1 - \sin^2 y) + \sin x \cos x \sin(2y) - (1 - \sin^2 x) \sin^2 y\\
&\quad= \sin^2 x - \sin^2 x + \sin^2 x \sin^2 y + \sin x \cos x \sin(2y) - \sin^2 y + \sin^2 x \sin^2 y\\
&\quad= \sin x \cos x \sin(2y) - (1 - 2\sin^2 x) \sin^2 y. \qquad\Box
\end{aligned}
$$
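Both identities can be spot-checked numerically; a small sketch assuming NumPy, with arbitrary test values:

```python
import numpy as np

x, y = 0.7, 0.4  # arbitrary test values

# first identity: sin^2(x+y) - sin^2 x
lhs1 = np.sin(x + y) ** 2 - np.sin(x) ** 2
rhs1 = (np.sin(x) * np.cos(x) * np.sin(2 * y)
        + (1 - 2 * np.sin(x) ** 2) * np.sin(y) ** 2)
assert np.isclose(lhs1, rhs1)

# second identity: sin^2 x - sin^2(x-y)
lhs2 = np.sin(x) ** 2 - np.sin(x - y) ** 2
rhs2 = (np.sin(x) * np.cos(x) * np.sin(2 * y)
        - (1 - 2 * np.sin(x) ** 2) * np.sin(y) ** 2)
assert np.isclose(lhs2, rhs2)
print("both identities hold")
```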
Solution of Exercise C.1.9. The probability distribution is ({0, 1}², Pr), where 0 and 1
represent tails and heads, respectively. Furthermore, Pr sends each pair (𝑎, 𝑏) ∈ {0, 1}²
to its probability 1/4. The event “getting heads at least once” is {(0, 1), (1, 0), (1, 1)}. Its
probability is 3/4. □
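The event and its probability can be enumerated directly; a small sketch using exact fractions:

```python
from itertools import product
from fractions import Fraction

# sample space for two fair coin flips: 0 = tails, 1 = heads
omega = list(product([0, 1], repeat=2))
pr = Fraction(1, 4)  # each of the four pairs is equally likely

event = [w for w in omega if 1 in w]  # at least one head
assert event == [(0, 1), (1, 0), (1, 1)]
assert len(event) * pr == Fraction(3, 4)
print("Pr(at least one head) =", len(event) * pr)  # prints: 3/4
```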
In fact, we have
$$
\sum_{i \in \mathbb{N}} \Pr\nolimits'(i) = p \sum_{i=0}^{\infty} (1-p)^i = \frac{p}{1-(1-p)} = 1. \qquad \text{(D.47)} \qquad \Box
$$
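A numerical sanity check of the geometric series in (D.47); the value 𝑝 = 0.3 is an arbitrary choice:

```python
p = 0.3
# partial sums of p * (1-p)^i converge to p / (1 - (1-p)) = 1;
# truncating at 1000 terms leaves an error of (1-p)^1000, which is negligible
total = sum(p * (1 - p) ** i for i in range(1000))
assert abs(total - 1.0) < 1e-12
print(total)
```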
Bibliography
[AB09] S. Arora and B. Barak, Computational complexity: A modern approach, Cambridge University Press, Cambridge,
2009, DOI 10.1017/CBO9780511804090. MR2500087
[Abr72] M. Abramowitz (ed.), Handbook of mathematical functions: with formulas, graphs, and mathematical tables,
10th printing, with corrections, Applied mathematics series, no. 55, U. S. Government Printing Office, Wash-
ington, DC, 1972 (English).
[AGP94] W. R. Alford, A. Granville, and C. Pomerance, There are infinitely many Carmichael numbers, Ann. of Math. (2)
139 (1994), no. 3, 703–722, DOI 10.2307/2118576. MR1283874
[AHU74] A. V. Aho, J. E. Hopcroft, and J. D. Ullman, The design and analysis of computer algorithms, second print-
ing, Addison-Wesley Series in Computer Science and Information Processing, Addison-Wesley Publishing Co.,
Reading, Mass.-London-Amsterdam, 1975. MR413592
[AKS04] M. Agrawal, N. Kayal, and N. Saxena, PRIMES is in P, Ann. of Math. (2) 160 (2004), no. 2, 781–793, DOI
10.4007/annals.2004.160.781. MR2123939
[Aut23] Wikipedia Authors, Timeline of quantum computing and communication, September 2023, Page Version ID:
1174829260.
[Ben80] P. Benioff, The computer as a physical system: a microscopic quantum mechanical Hamiltonian model of com-
puters as represented by Turing machines, J. Statist. Phys. 22 (1980), no. 5, 563–591, DOI 10.1007/BF01011339.
MR574722
[BHT98] G. Brassard, P. Høyer, and A. Tapp, Quantum counting, Automata, languages and programming (Aalborg,
1998), Lecture Notes in Comput. Sci., vol. 1443, Springer, Berlin, 1998, pp. 820–831, DOI 10.1007/BFb0055105.
MR1683527
[BLM17] J. Buchmann, K. E. Lauter, and M. Mosca (eds.), Postquantum cryptography, part 1, IEEE Security & Privacy,
vol. 15, IEEE, 2017.
[BLM18] J. Buchmann, K. E. Lauter, and M. Mosca (eds.), Postquantum cryptography, part 2, IEEE Security & Privacy,
vol. 16, IEEE, 2018.
[BLP93] J. P. Buhler, H. W. Lenstra Jr., and C. Pomerance, Factoring integers with the number field sieve, The devel-
opment of the number field sieve, Lecture Notes in Math., vol. 1554, Springer, Berlin, 1993, pp. 50–94, DOI
10.1007/BFb0091539. MR1321221
[Buc04] J. Buchmann, Introduction to cryptography, 2nd ed., Undergraduate Texts in Mathematics, Springer-Verlag, New
York, 2004, DOI 10.1007/978-1-4419-9003-7. MR2075209
[BWP+ 17] J. D. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Nature
549 (2017), no. 7671, 195–202.
[CEH+ 98] R. Cleve, A. Ekert, L. Henderson, C. Macchiavello, and M. Mosca, On quantum algorithms, Complexity 4 (1998),
no. 1, 33–42, DOI 10.1002/(SICI)1099-0526(199809/10)4:1<33::AID-CPLX10>3.0.CO;2-U. MR1653992
[Cle11] R. Cleve, Classical lower bounds for Simon’s problem, https://cs.uwaterloo.ca/~cleve/courses/
F11CS667/SimonClassicalLB.pdf, 2011.
[CLRS22] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to algorithms, 4th ed., The MIT Press,
Cambridge, MA, 2022.
[Dav82] M. Davis, Computability & unsolvability, Dover, New York, 1982.
[Deu85] D. Deutsch, Quantum theory, the Church-Turing principle and the universal quantum computer, Proc. Roy. Soc.
London Ser. A 400 (1985), no. 1818, 97–117. MR801665
[DGM+ 21] M. Dürmuth, M. Golla, P. Markert, A. May, and L. Schlieper, Towards quantum large-scale password guessing on
real-world distributions, Cryptology and network security, Lecture Notes in Comput. Sci., vol. 13099, Springer,
2021, pp. 412–431, DOI 10.1007/978-3-030-92548-2_22. MR4460974
[DHM+ 18] D. Dervovic, M. Herbster, P. Mountney, S. Severini, N. Usher, and L. Wossnig, Quantum linear systems algo-
rithms: a primer, CoRR abs/1802.08227 (2018).
[DJ92] D. Deutsch and R. Jozsa, Rapid solution of problems by quantum computation, Proc. Roy. Soc. London Ser. A 439
(1992), no. 1907, 553–558, DOI 10.1098/rspa.1992.0167. MR1196433
[Fey82] R. P. Feynman, Simulating physics with computers, Physics of computation, Part II (Dedham, Mass., 1981), In-
ternat. J. Theoret. Phys. 21 (1981/82), no. 6-7, 467–488, DOI 10.1007/BF02650179. MR658311
[FK03] J. B. Fraleigh and V. J. Katz, A first course in abstract algebra, 7th ed., Addison-Wesley, Boston, 2003.
[Fon12] F. Fontein, The probability that two numbers are coprime, https://math.fontein.de/2012/07/10/
the-probability-that-two-numbers-are-coprime/, 2012.
[GLRS16] M. Grassl, B. Langenberg, M. Roetteler, and R. Steinwandt, Applying Grover’s algorithm to AES: quantum re-
source estimates, Post-quantum cryptography, Lecture Notes in Comput. Sci., vol. 9606, Springer, 2016, pp. 29–
43, DOI 10.1007/978-3-319-29360-8_3. MR3509727
[Gro96] L. K. Grover, A fast quantum mechanical algorithm for database search, Proceedings of the Twenty-eighth An-
nual ACM Symposium on the Theory of Computing (Philadelphia, PA, 1996), ACM, New York, 1996, pp. 212–
219, DOI 10.1145/237814.237866. MR1427516
[HHL08] A. W. Harrow, A. Hassidim, and S. Lloyd, Quantum algorithm for solving linear systems of equations,
arXiv:0811.3171, 2008.
[HvdH21] D. Harvey and J. van der Hoeven, Integer multiplication in time 𝑂(𝑛 log 𝑛), Ann. of Math. (2) 193 (2021), no. 2,
563–617, DOI 10.4007/annals.2021.193.2.4. MR4224716
[IR10] K. Ireland and M. I. Rosen, A classical introduction to modern number theory, 2nd ed., 3rd printing, Graduate
Texts in Mathematics, no. 84, Springer, New York, Berlin, Heidelberg, 2010 (English).
[Jor] S. Jordan, Quantum algorithm zoo, https://quantumalgorithmzoo.org/.
[KLM06] P. Kaye, R. Laflamme, and M. Mosca, An introduction to quantum computing, Oxford University Press, Oxford,
2007. MR2311153
[Knu82] D. E. Knuth, The art of computer programming. 1: Fundamental algorithms, 2nd ed., 7th printing, Addison-
Wesley, Reading, MA, 1982.
[LP92] H. W. Lenstra Jr. and C. Pomerance, A rigorous time bound for factoring integers, J. Amer. Math. Soc. 5 (1992),
no. 3, 483–516, DOI 10.2307/2152702. MR1137100
[LLMP93] A. K. Lenstra, H. W. Lenstra Jr., M. S. Manasse, and J. M. Pollard, The factorization of the ninth Fermat number,
Math. Comp. 61 (1993), no. 203, 319–349, DOI 10.2307/2152957. MR1182953
[LP98] H. R. Lewis and C. H. Papadimitriou, Elements of the theory of computation, 2nd ed., Prentice-Hall, Upper Saddle
River, N.J, 1998.
[Man80] Y. Manin, Computable and noncomputable (Russian), Cybernetics, 1980.
[Man99] Y. I. Manin, Classical computing, quantum computing, and Shor’s factoring algorithm, Astérisque 266 (2000),
Exp. No. 862, 5, 375–404. Séminaire Bourbaki, Vol. 1998/99. MR1772680
[NC16] M. A. Nielsen and I. L. Chuang, Quantum computation and quantum information, Cambridge University Press,
Cambridge, 2000. MR1796805
[RC18] R. Rines and I. L. Chuang, High performance quantum modular multipliers, arXiv:1801.01081 (2018).
[Rud76] W. Rudin, Principles of mathematical analysis, 3rd ed., International Series in Pure and Applied Mathematics,
McGraw-Hill Book Co., New York-Auckland-Düsseldorf, 1976. MR385023
[Sho94] P. W. Shor, Polynomial time algorithms for discrete logarithms and factoring on a quantum computer, Algorith-
mic Number Theory, First International Symposium, ANTS-I, Ithaca, NY, USA, May 6–9, 1994, Proceedings
(Leonard M. Adleman and Ming-Deh A. Huang, eds.), Lecture Notes in Computer Science, vol. 877, Springer,
1994, p. 289.
[Sim94] D. R. Simon, On the power of quantum computation, 35th Annual Symposium on Foundations of Com-
puter Science (Santa Fe, NM, 1994), IEEE Comput. Soc. Press, Los Alamitos, CA, 1994, pp. 116–123, DOI
10.1109/SFCS.1994.365701. MR1489241
[Sim97] D. R. Simon, On the power of quantum computation, SIAM J. Comput. 26 (1997), no. 5, 1474–1483, DOI
10.1137/S0097539796298637. MR1471989
[Vol99] H. Vollmer, Introduction to circuit complexity: A uniform approach, Texts in Theoretical Computer Science. An
EATCS Series, Springer-Verlag, Berlin, 1999, DOI 10.1007/978-3-662-03927-4. MR1704235
[Wan10] F. Wang, The hidden subgroup problem, arXiv preprint, 2010.
[Wat09] J. Watrous, Quantum computational complexity, Encyclopedia of Complexity and Systems Science (Robert A.
Meyers, ed.), Springer, 2009, pp. 7174–7201.
Index
ℤ, 283
ℤ𝑘, 4, 284
ℤ∗𝑚, 290

abelian
    group, 298
    semigroup, 298
absolute convergence, 345
absolute homogeneity, 61
acyclic graph, 288
adjoint, 69, 71
adjugate, 320
algebra, 312
algebraic multiplicity, 329
algorithm
    deterministic, 9
    invariant, 12
    probabilistic, 18
    random, 18
    state, 11
algorithm run
    random sequence, 21
alphabet, 2
alternating function, 319
amplitude, 105
amplitude amplification, 257
ancilla bit, 48
ancillary gate, 175
angle between two vectors, 145
antisymmetric relation, 285
argument, 286
assign instruction, 5
associative operation, 287
associativity, 308
automorphism, 313

balanced function, 206, 209
basis, 311
    orthogonal, 106
    orthonormal, 106
Bell state, 101
Bernoulli algorithm, 19
Bernoulli experiment, 348
bijection, 286
bijective function, 286
bilinear map, 330
bilinearity, 106
binary alphabet, 2
binary expansion, 3
binary length, 3
binary operation, 287
binary representation, 3
bit, 2
bit length, 3
bit operation, 5
bit-flip gate, 143
black-box, 202
black-box access, 206
Bloch sphere, 107
Boolean circuit, 36
Boolean function, 35
bra, 59
bra notation, 57

Carmichael number, 17
Cartesian coordinates, 105
Cartesian product, 284
Cauchy-Schwartz inequality, 61
certificate, 32, 33
character, 2
characteristic polynomial, 321
circuit
    Boolean, 36
    complexity, 40
    family, 40
    logic, 36
    reversible, 43, 46
circuit family
    P-uniform, 42
closed under scalar multiplication, 309
codomain, 286
coefficient, 303
coefficient vector, 322
column echelon form, 326
    reduced, 326
common divisor, 290
commutative
    group, 298
    semigroup, 298
commutative operation, 287
commutative ring, 302
complex numbers, 284
complexity
    exponential, 30, 31
    linear, 30, 31
    polynomial, 30, 31
    quasilinear, 30, 31
    subexponential, 30, 31
complexity class, 31
composite integer, 12, 291
Composite Systems Postulate, 112
composition of functions, 287
computational basis, 55, 104
computational problem, 29
unary alphabet, 2
unary representation, 2
uncompute trick, 50, 51
uncountable set, 345
uniform circuit family, 42
unit group, 298, 302
unit matrix, 316
unit vector, 106
unitary, 75
    group, 76
unitary gate, 175
unitary quantum operator, 181
universal gate set, 38
universal property, 331
universal set of quantum gates, 181

value, 286
value function, 37
value of a variable, 4
variable, 4, 303
vector, 308
    component, 308
    dot product, 308
    entry, 308
    length, 308
vector space, 309
    dimension, 322
    dual, 312
vertex, 288
For additional information and updates on this book, visit
www.ams.org/bookpages/amstext-64

AMSTEXT/64