Introduction to Quantum Algorithms

Johannes A. Buchmann

Pure and Applied Undergraduate Texts (The Sally Series) • Volume 64

EDITORIAL COMMITTEE
Giuliana Davidoff (Chair)
Daniel P. Groves
Tara S. Holm
John M. Lee
Maria Cristina Pereyra

2020 Mathematics Subject Classification. Primary 11-xx, 68-xx, 81-xx, 94-xx.

For additional information and updates on this book, visit


www.ams.org/bookpages/amstext-64

Library of Congress Cataloging-in-Publication Data


Cataloging-in-Publication Data has been applied for by the AMS.
See http://www.loc.gov/publish/cip/.
DOI: https://doi.org/10.1090/amstext/64

Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting
for them, are permitted to make fair use of the material, such as to copy select pages for use
in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Requests for permission
to reuse portions of AMS publication content are handled by the Copyright Clearance Center. For
more information, please visit www.ams.org/publications/pubpermissions.
Send requests for translation rights and licensed reprints to [email protected].

© 2024 by the American Mathematical Society. All rights reserved.
The American Mathematical Society retains all rights
except those granted to the United States Government.
Printed in the United States of America.

∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at https://www.ams.org/
Contents

Preface ix
The advent of quantum computing ix
The goal of the book x
The structure of the book xi
What is not covered xiv
For instructors xiv
Acknowledgements xv

Chapter 1. Classical Computation 1


1.1. Deterministic algorithms 1
1.2. Probabilistic algorithms 16
1.3. Analysis of probabilistic algorithms 21
1.4. Complexity theory 29
1.5. The circuit model 35
1.6. Circuit families and circuit complexity 40
1.7. Reversible circuits 43

Chapter 2. Hilbert Spaces 53


2.1. Kets and state spaces 54
2.2. Inner products 56
2.3. Linear maps 67
2.4. Endomorphisms 72
2.5. Tensor products 93


Chapter 3. Quantum Mechanics 103


3.1. State spaces 104
3.2. State spaces of composite systems 112
3.3. Time evolution 113
3.4. Measurements 118
3.5. Density operators 123
3.6. The quantum postulates for mixed states 131
3.7. Partial trace and reduced density operators 134

Chapter 4. The Theory of Quantum Algorithms 141


4.1. Simple single-qubit operators 142
4.2. More geometry in ℝ³ 144
4.3. Rotation operators 158
4.4. Controlled operators 167
4.5. Swap and permutation operators 174
4.6. Ancillary and erasure gates 175
4.7. Quantum circuits revisited 176
4.8. Universal sets of quantum gates 179
4.9. Implementation of controlled operators 181
4.10. Perfectly universal sets of quantum gates 187
4.11. A universal set of quantum gates 192
4.12. Quantum algorithms and quantum complexity 199

Chapter 5. The Algorithms of Deutsch and Simon 205


5.1. The Deutsch algorithm 206
5.2. Oracle complexity 209
5.3. The Deutsch-Jozsa algorithm 209
5.4. Simon’s algorithm 213
5.5. Generalization of Simon’s algorithm 218

Chapter 6. The Algorithms of Shor 221


6.1. Idea of Shor’s factoring algorithm 222
6.2. The Quantum Fourier Transform 222
6.3. Quantum phase estimation 228
6.4. Order finding 233
6.5. Integer factorization 244
6.6. Discrete logarithms 248
6.7. Relevance for cryptography 251
6.8. The hidden subgroup problem 252

Chapter 7. Quantum Search and Quantum Counting 255


7.1. Quantum search 256
7.2. Quantum counting 268

Chapter 8. The HHL Algorithm 277


8.1. The problem 277
8.2. Overview 279
8.3. Analysis and applications 282

Appendix A. Foundations 283


A.1. Basics 283
A.2. The asymptotic notation 288
A.3. Number theory 289
A.4. Algebra 298
A.5. Trigonometric identities and inequalities 304

Appendix B. Linear Algebra 307


B.1. Vectors 308
B.2. Modules and vector spaces 308
B.3. Linear maps between modules 311
B.4. Matrices 313
B.5. Square matrices 315
B.6. Free modules of finite dimension 322
B.7. Finite-dimensional vector spaces 324
B.8. Tensor products 330
B.9. Tensor products of finite-dimensional vector spaces 337

Appendix C. Probability Theory 345


C.1. Basics 345
C.2. Bernoulli experiments 348

Appendix D. Solutions of Selected Exercises 351

Bibliography 363

Index 365
Preface

The advent of quantum computing


Computing has become indispensable in all areas of society, economy, science, and
engineering. As the scope and complexity of problems we aim to solve with comput-
ers expand, the demand for computing power continues to skyrocket. For decades,
Moore’s law has driven the exponential growth of computing resources, enabling faster
and more capable machines. However, physical limitations loom on the horizon, chal-
lenging the continuation of this trend. Addressing the impending limitations of classi-
cal computing requires innovative solutions. New computing architectures and algo-
rithms that can significantly outperform classical systems would be a game changer,
promising the ability to tackle problems that are currently beyond reach.
Such an innovative concept is that of a quantum computer. It harnesses the prin-
ciples of quantum mechanics. While in classical computing, bits can represent ei-
ther a “0” or a “1”, quantum bits or qubits can exist in superpositions of these values.
This property allows quantum registers with several qubits to store and process expo-
nentially more information than classical registers with an equivalent number of bits.
However, as this book will demonstrate, harnessing this property of quantum comput-
ers presents significant challenges.
The concept of quantum computing began to take shape in the early 1980s, thanks
to the visionary work of several researchers. The mathematician and physicist Yuri
Manin [Man80], [Man99] pondered whether computers based on the principles of
quantum mechanics could challenge the Church-Turing thesis and go beyond the ca-
pabilities of Turing machines. Physicist Paul Benioff [Ben80] proposed the model of a
quantum Turing machine, laying the theoretical foundation for quantum computation.
Physics Nobel laureate Richard Feynman [Fey82] recognized the potential of quantum
computers to simulate physics experiments involving quantum effects. Quantum algo-
rithms became known to virtually everyone in the world through the groundbreaking


work of Peter Shor [Sho94] on quantum polynomial time factoring and discrete log-
arithm algorithms. Shor’s work alarmed the world, as it revealed the vulnerability of
essentially all public-key cryptography in widespread use, one of the fundamental
pillars of cybersecurity, to quantum computer attacks.
Another early advancement in quantum computing that garnered significant at-
tention was Lov Grover’s algorithm [Gro96], offering a quadratic speedup for unstruc-
tured search problems. This breakthrough further fueled the growing interest in quan-
tum computing. Grover’s algorithm captured widespread interest because of its ability
to solve a very generic problem, making it useful across a wide range of applications.
In the decades following these early developments, many more quantum algo-
rithms have been discovered. An example is the HHL algorithm [HHL08], which can
be used to find properties of solutions of certain large sparse linear systems, providing
an exponential speedup over classical solvers such as Gaussian elimination.
Since linear algebra is one of the most important tools in all areas of science and engi-
neering, the HHL algorithm has wide applications, including machine learning, which
is one of the most significant techniques in computer science today.
This progress should not deceive us, as the development of quantum algorithms
remains a significant challenge. Sometimes, there is the impression that quantum com-
puting allows all computations to be parallelized and significantly accelerated. How-
ever, that is not the case. In reality, each new quantum algorithm requires a unique
idea. Consequently, quantum algorithms can currently accelerate only a few computational
problems. Moreover, only very few of these improvements come with exponential
speedups.
All the algorithms mentioned in this book are designed for universal, gate-based
quantum computers, which are the most widely recognized and extensively researched
type of quantum computers. In addition to universal quantum computers, there are
more specialized types of quantum computers, such as quantum annealers and quan-
tum simulators. Quantum annealers utilize annealing techniques to solve optimiza-
tion problems by finding the lowest-energy state of a physical system. On the other
hand, quantum simulators are specifically designed to simulate quantum systems and
study quantum phenomena. However, this book focuses on universal quantum com-
puters due to their versatility and because they are the most interesting from a com-
puter science perspective.

The goal of the book


The book is well suited for readers with a basic understanding of calculus, linear alge-
bra, algebra, and probability theory, as well as algorithms and complexity as taught in
basic university courses on these subjects. However, recognizing that the readers may
have diverse backgrounds in computer science, mathematics, physics, and engineer-
ing, my goal is to make the book self-contained and rigorous from the perspectives of all
these subjects. Thus, even readers with little knowledge of algorithms and complexity
can acquire this by studying Chapter 1. Similarly, for those lacking knowledge of cer-
tain topics in linear algebra, algebra, or probability theory, the appendix and Chapter 2
can compensate. Furthermore, the physics introduced in Chapter 3 is sufficient for the

subsequent chapters. I also adopted this approach in writing the book due to my own
experience. Despite having degrees in mathematics and physics and being a computer
science professor for over 30 years, I found myself needing to refresh my memory on
several required concepts and to learn new material. Therefore, my objective is to make
the presentation understandable with a minimum of prerequisites, ensuring clarity for
both myself and the readers.
My approach of covering all the details will lead to the situation that some readers
may already possess knowledge covered in the introductory chapters. However, even
they are likely to encounter new and vital information in these chapters, essential for
understanding quantum algorithms. For example, Chapter 1 gives an introduction to
the theory of reversible computation, which is not typically part of the standard com-
puter science education. Chapter 2 introduces mathematicians to the Dirac notation,
commonly used by physicists. Chapter 3 further expands the understanding of physi-
cists by applying the quantum mechanics postulates to quantum gates and circuits.
Therefore, I encourage those with prior knowledge to read these sections, taking note
of the notation used in the book and of unfamiliar results. This is vital for grasping the
intricacies of my explanation of quantum algorithms.

The structure of the book


As described, the initial three chapters lay the groundwork required to dive into quan-
tum computing.
Chapter 1 presents the theoretical aspects of classical computation relevant to un-
derstanding quantum computing. It begins with classical algorithms, particularly fo-
cusing on probabilistic algorithms and their complexity analysis. This is pivotal be-
cause quantum algorithms, such as Shor’s algorithm, can be seen as probabilistic algo-
rithms with quantum circuits as subroutines. Therefore, the analysis of probabilistic al-
gorithms effortlessly extends to the analysis of quantum algorithms. Furthermore, the
classical complexity classes elucidated in this chapter serve as a blueprint for defining
quantum complexity classes. The second part of this chapter is dedicated to the com-
putational model of classical Boolean circuits, specifically examining the theory of re-
versible circuits, which bears a direct connection to their quantum counterparts. This
serves as a foundation for designing basic quantum algorithms using quantum circuits.
The chapter establishes that every Boolean function can be computed by a Boolean cir-
cuit and covers universal sets of classical gates, uniform families of circuits, and their
correlation with classical algorithms and their complexity. Moreover, it demonstrates
that every Boolean circuit can be transformed into a reversible circuit — an essential
result later applied to show that every Boolean function can be computed by a quantum
circuit.
Chapter 2 explores the theory of finite-dimensional Hilbert spaces, which forms
the mathematical framework for modeling the physics of quantum computing. We
note, however, that finite-dimensional Hilbert spaces may fall short of modeling gen-
eral quantum mechanics. The chapter begins by establishing the necessary founda-
tions, including essential concepts like inner products, orthogonality, linear maps, and
their adjoints. It also introduces the Schur decomposition theorem, which provides

valuable tools for subsequent discussions. Moving forward, the chapter familiarizes the
reader with significant operators in quantum mechanics, such as Hermitian, unitary,
and normal operators. Of particular significance is the spectral theorem, a fundamen-
tal result that offers profound insights into these operators and their characteristics.
The consequences of the spectral theorem are also explored to enrich the reader’s un-
derstanding. Furthermore, the chapter delves into the concept of tensor products of
finite-dimensional Hilbert spaces, a crucial notion in quantum computing. The dis-
cussion culminates with an elucidation of the Schmidt decomposition theorem, which
plays a pivotal role in characterizing the entanglement of quantum systems.
Chapter 3 constitutes the third foundational pillar of quantum computing required
in this book, encompassing the essential background of quantum mechanics. This
chapter introduces the relevant quantum mechanics postulates. To illustrate their rele-
vance, the chapter applies these postulates to introduce fundamental concepts of quan-
tum computing, including quantum bits, registers, gates, and circuits. Simple exam-
ples of quantum computation are provided to enhance the reader’s understanding of
the connection between the postulates and quantum algorithms. In addition, the chap-
ter provides the foundation for the geometric interpretation of quantum computation.
It achieves this by establishing the correspondence between states of individual quan-
tum bits and points on the Bloch sphere, a pivotal concept in quantum computing vi-
sualization. Moreover, the chapter presents an alternative description of the relevant
quantum mechanics using density operators. This approach enables the modeling of
the behavior of components within composed quantum systems.
The foundational groundwork laid out in the initial three chapters, including the
domains of computer science, mathematics, and physics, sets the stage for a compre-
hensive exploration of quantum algorithms in Chapter 4. This chapter embarks on
this transformative journey by shedding light on pivotal quantum gates, which serve
as the fundamental constituents of quantum circuits. We start by introducing single-
qubit gates, demonstrating that their operations can be perceived as rotations within
three-dimensional real space. Subsequently, we delve into the realm of multiple-qubit
operators, with a particular focus on controlled operators. In addition, this chapter
familiarizes readers with the significance of ancillary and erasure gates, which play
a vital role in the augmentation and removal of quantum bits. Leveraging analogous
outcomes from classical reversible circuits, the chapter shows that every Boolean func-
tion can be realized through a quantum circuit. In contrast to the classical scenario, the
quantum case does not adhere to the notion that a finite set of quantum gates suffices to
implement any quantum operator. Instead, finite sets of quantum gates are presented,
enabling the approximation of all quantum circuits. Lastly, the chapter ushers in the
concept of quantum complexity theory, using the analogy between classical probabilis-
tic algorithms and quantum algorithms. It introduces the complexity class BQP, which
stands for bounded-error quantum polynomial time.
The following four chapters focus on specific quantum algorithms.
Chapter 5 introduces early algorithms designed to illustrate the quantum comput-
ing advantage. We begin with the Deutsch algorithm, as presented in David Deutsch’s
seminal paper from 1985 [Deu85], and its generalization by David Deutsch and Richard
Jozsa in 1992 [DJ92]. Subsequently, we explore Daniel R. Simon’s algorithm, proposed
in 1994 [Sim94], which demonstrated an exponential speedup compared to the
best-known classical algorithm for its specific problem. Through the explanations of these
algorithms, two key principles emerge, elucidating why quantum computing can sur-
pass classical computing. The first principle is quantum parallelism, which capitalizes
on the ability of quantum registers to exist in superpositions of states, allowing for si-
multaneous computation of multiple possibilities. The second principle is quantum
interference, empowering quantum algorithms to concentrate probability amplitudes
on correct answers while suppressing incorrect ones. This phenomenon increases the
likelihood of obtaining the desired solution to a computational problem. Also, the im-
portant phase-kickback method is explained.
Chapter 6 introduces the most famous quantum algorithms, namely Peter Shor’s
algorithms for factoring integers and computing discrete logarithms [Sho94]. The
chapter commences by presenting an overview of Shor’s integer factorization algo-
rithm, serving as a road map for the subsequent concepts introduced and their utiliza-
tion within this algorithm. Next, the key ingredient of Shor’s algorithms is presented:
the Quantum Fourier Transform. A detailed explanation illustrates how this opera-
tor and its inverse can be efficiently implemented using simple quantum gates. The
Quantum Fourier Transform is then employed to solve the problem of quantum phase
estimation. By utilizing quantum phase estimation, a polynomial time quantum algo-
rithm for computing the order of an integer modulo another positive integer is devel-
oped, leading to a polynomial time quantum algorithm for integer factorization. More-
over, the chapter reveals how quantum phase estimation enables the computation of
discrete logarithms modulo positive integers in polynomial time.
Chapter 7 explores another significant quantum algorithm with numerous appli-
cations: the Grover search algorithm from 1996. This algorithm offers a quadratic ad-
vantage when searching for a specific target item in an unstructured set. The chapter
delves into the concept of amplitude amplification, a powerful technique that plays
a pivotal role in Grover’s algorithm. Moreover, the chapter introduces the quantum
counting algorithm, a notable contribution by Gilles Brassard, Peter Høyer, and Alain
Tapp in 1998 [BHT98]. This algorithm utilizes Grover’s algorithm in combination with
quantum phase estimation to count the number of solutions for a given search prob-
lem.
Chapter 8 provides an overview of the HHL algorithm, which focuses on comput-
ing properties of solutions for large sparse linear systems over the complex numbers.
The primary purpose of this chapter is to demonstrate how the ideas of previously in-
troduced algorithms in this book enable the design of such an advanced algorithm.
The book also features an extensive appendix that serves as a valuable resource
for readers. It introduces the mathematical notation used throughout the book and
covers essential concepts, including results that might not commonly be part of stan-
dard mathematical university education. Appendix A begins by exploring fundamen-
tal mathematical concepts, including groups, rings, and fields. It then delves into the
necessary topics in number theory, such as the greatest common divisor and its com-
putation, prime factor decomposition, and continued fractions. In addition, this part

of the appendix lists essential trigonometric identities and inequalities that play a cru-
cial role in the main part of the book. Appendix B focuses on linear algebra. Its first
part briefly reviews important concepts and results. The second part covers the con-
cept of tensor products, which is of significant importance in quantum computing and
is typically not included in introductory courses in linear algebra. Lastly, Appendix C
contains the required notions and results from probability theory. This knowledge is
essential for the analysis of probabilistic and quantum algorithms.

What is not covered


The main focus of this book is on the theory of quantum algorithms, rather than the
practical aspects of their implementation. However, it is essential to acknowledge two
significant aspects related to realizing practical quantum computers, which our pre-
sentation does not cover.
One such aspect is quantum error correction, which plays a crucial role in quan-
tum computing. Quantum systems are highly sensitive to their environment, leading
to decoherence, which can introduce errors in quantum computations. Quantum error
correction techniques are designed to protect quantum computing from the harmful
effects of decoherence and errors, making quantum algorithms more reliable and ro-
bust.
Another critical aspect is the physical implementation of quantum computers.
Various approaches are being explored to build qubits and quantum gates for prac-
tical quantum computers. These approaches include using superconducting circuits,
trapped ions, photonic qubits, topological qubits, and quantum dots, among others.
Each of these approaches has its advantages and challenges, and researchers are ac-
tively working to develop scalable and efficient quantum computing technologies.
An overview of the main results of quantum computing, including quantum error
correction and quantum computer architectures, can be found in the Wikipedia article
“Timeline of quantum computing” [Aut23].

For instructors
This book is suitable for self-study. It is also intended and has been used for teaching
introductory courses on quantum algorithms. My recommendation to instructors for
such a course is as follows: If most of the participants are already familiar with the
required basics of algorithms and complexity, linear algebra, algebra, and probability
theory, the course should cover Chapters 3, 4, 5, 6, 7, and 8 in this order, exploring dif-
ferent aspects of quantum algorithms. Individual students lacking some basic knowl-
edge can familiarize themselves with those topics using the detailed explanations in
the respective parts of the book. If the majority of the participants in the course are
unfamiliar with certain basic topics, the instructor may want to cover them briefly,
either in a preliminary lecture or when they are used during the course.
Depending on the instructor’s intentions and the available time, the course may
focus more on theoretical explanations and proofs or on the practical aspects of how
quantum algorithms work. In both situations, students who desire more background

than is covered in the course can supplement their knowledge through self-study of
the corresponding book parts.

Acknowledgements
I express my sincere gratitude to the following individuals who have been instrumental
in supporting me throughout the process of writing this book. Their invaluable advice,
discussions, and comments have played a pivotal role in shaping the content and qual-
ity of this work: Gernot Alber, Gerhard Birkl, Jintai Ding, Samed Düzlü, Fritz Eisen-
brand, Marc Fischlin, Mika Göös, Iryna Gurevich, Matthieu Nicolas Haeberle, Taketo
Imaizumi, Michael Jacobson, Nigam Jigyasa, Norbert Lutkenhaus, Alastair Kay, Ju-
liane Krämer, Gen Kimura, Michele Mosca, Jigyasa Nigam, Rom Pinchasi, Ahamad-
Reza Sadeghi, Masahide Sasaki, Alexander Sauer, Florian Schwarz, Tsuyoshi Takagi,
Shusaku Uemura, Thomas Walther, Yuntao Wang, and Ho Yun. Their dedication to
sharing their expertise and knowledge has been truly invaluable, and I am deeply grate-
ful for their willingness to engage in insightful discussions and provide constructive
feedback throughout this journey.
I also extend my heartfelt gratitude to Ina Mette, my editor at the AMS.
Her belief in the potential of this book and her continuous encouragement to pursue
this project have been instrumental in its realization. I am deeply grateful for her un-
wavering support and guidance throughout the writing process. I am also grateful to
Arlene O’Sean and her team at AMS for their careful proofreading and for giving
the book such a polished appearance.
I learned a lot from the books on quantum computing by Michael A. Nielsen and
Isaac L. Chuang [NC16] and by Phillip Kaye, Raymond Laflamme, and Michele Mosca
[KLM06].
The writing of this book would not have been possible without the invaluable con-
tributions of several open-source LaTeX packages, which greatly facilitated the presen-
tation of complex concepts. I extend my gratitude to the creators and maintainers of
these packages:
• The powerful TikZ library [1] and its extension circuitikz [2] were instrumental in
visualizing circuits and diagrams throughout the book.
• I used the open-source TikZ code for illustrating the right-hand rule [3].
• For the clear representation of quantum circuits, I relied on quantikz [4].
• To illustrate quantum states on Bloch spheres, I used the blochsphere package [5].
• The packages algorithm and algpseudocode [6] were indispensable in presenting
algorithms in a structured and easily understandable format.
• For handling the Dirac notation with ease, I benefited from the physics package [7].

[1] https://tikz.net/
[2] https://ctan.org/pkg/circuitikz
[3] https://tikz.net/righthand_rule/
[4] https://ctan.org/pkg/quantikz
[5] https://ctan.org/pkg/blochsphere
[6] https://www.overleaf.com/learn/latex/Algorithms
[7] https://www.ctan.org/pkg/physics

I am sincerely grateful to the open-source community for making these and many more
tools available, enhancing the quality of this work and simplifying its creation.
Finally, I would like to acknowledge the support provided by ChatGPT [8] in improving
many formulations of my presentation. As I am not a native speaker of the English
language, this assistance was of great help.

[8] https://chat.openai.com/
Chapter 1

Classical Computation

Quantum algorithms outperform the best-known classical algorithms in numerous


computational tasks. To establish and demonstrate these advancements, we rely on
essential mathematical frameworks. These frameworks aid our understanding of clas-
sical and quantum computation, enable us to address questions about problem
solvability on both classical and quantum computers, and assess the computational efficiency
of these algorithms.
This chapter presents models fundamental to classical computation, providing the
basis for the corresponding models in quantum computation. We begin by explain-
ing a model for classical deterministic and probabilistic algorithms. It forms the basis
for reviewing key concepts and results in classical complexity theory. The part on the
analysis of probabilistic algorithms is particularly important, as it plays a crucial role in
later chapters where we discuss quantum algorithms which are probabilistic in nature.
The methods used for analyzing probabilistic algorithms, including success probability
amplification, will be directly applicable to the analysis of quantum algorithms.
Moving on from the algorithmic model, we introduce the classical circuit model
of computation, with a particular emphasis on reversible circuits. These reversible cir-
cuits can be readily transformed into quantum circuits, providing the essential insight
that quantum computing is Turing complete.

1.1. Deterministic algorithms


In the realm of computer algorithms, there exist various formal models. The most
famous is the model of Turing machines, introduced in 1936 by the mathematician
Alan Turing. The Turing machine model is mathematically rigorous, but its connection
to algorithms implemented in real-world programming languages is difficult to
understand and describe. Hence, we will introduce a model of computation that strikes
a balance between a formal description and a representation resembling real-world
computing. A very good and comprehensive presentation of computer algorithms that


uses similar modeling is the book by Thomas H. Cormen, Charles E. Leiserson, Ronald
L. Rivest, and Clifford Stein [CLRS22].

1.1.1. Basics. To explain our model, we introduce some basic concepts and re-
sults. We begin by defining alphabets.
Definition 1.1.1. An alphabet is a finite nonempty set.
Example 1.1.2. The simplest alphabet is the unary alphabet {I}, which contains only
the symbol I. The most commonly used alphabet in computer science is the binary
alphabet {0, 1}, where each element is referred to as a bit. Other commonly used al-
phabets in computer science include the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} of decimal digits,
the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 𝐴, 𝐵, 𝐶, 𝐷, 𝐸, 𝐹} of hexadecimal digits, and the Latin al-
phabet ℛ = {a, . . . , z, A, . . . , Z, ␣} that includes lowercase and uppercase Latin letters,
as well as the space symbol ␣.

Next, we introduce words over alphabets.


Definition 1.1.3. Let Σ be an alphabet.
(1) By Σ∗ we denote the set of all finite sequences over Σ including the empty se-
quence which is denoted by ( ).
(2) The elements of Σ∗ are called words or strings over Σ.
(3) If 𝑤 is a word over Σ, then its elements are called the characters of 𝑤 and |𝑤|
denotes the length of 𝑤.
(4) The set of words over Σ of length 𝑛 ∈ ℕ0 is denoted by Σⁿ.
(5) If 𝑛 ∈ ℕ0 and 𝑠⃗ ∈ Σⁿ, then we write 𝑠⃗ = 𝑠₀𝑠₁ ⋯ 𝑠ₙ₋₁, 𝑠⃗ = “𝑠₀ ⋯ 𝑠ₙ₋₁”,
𝑠⃗ = (𝑠₀𝑠₁ ⋯ 𝑠ₙ₋₁), or 𝑠⃗ = (𝑠₀, 𝑠₁, . . . , 𝑠ₙ₋₁). We also use other numberings of the
characters. For example, we may start the numbering with 1.
Example 1.1.4. The sequence III is a word over the alphabet {I}. The sequence (0, 1, 0)
is a word over the alphabets of bits, decimal digits, and hexadecimal digits. The se-
quence (0, 1, 2) is a word over the alphabets of decimal digits and hexadecimal digits.
The sequence (0, 1, 𝐴) is a word over the alphabet of hexadecimal digits. All these
words have length 3. Finally, “another␣failure” is a word over the Latin alphabet ℛ.
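These notions translate directly into code. The following Python sketch (the function name is illustrative, not from the book) models words as tuples over a finite set and checks membership per Definition 1.1.3:

```python
# The binary alphabet {0, 1} from Example 1.1.2, modeled as a Python set.
SIGMA = {"0", "1"}

def is_word(s: tuple, alphabet: set) -> bool:
    """True if every character of s lies in the alphabet (Definition 1.1.3)."""
    return all(c in alphabet for c in s)

w = ("0", "1", "0")
assert is_word(w, SIGMA)      # (0, 1, 0) is a word over {0, 1}
assert len(w) == 3            # |w|, the length of the word
assert is_word((), SIGMA)     # the empty word ( ) is a word over any alphabet
```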

Next, we define encodings.


Definition 1.1.5. An encoding of a set 𝑆 with respect to an alphabet Σ is an injective
map 𝑒 ∶ 𝑆 → Σ∗ .
Example 1.1.6. A simple encoding of the set ℕ0 of nonnegative integers is

(1.1.1) 𝑒 ∶ ℕ0 → {I}∗ , 𝑎 ↦ I^𝑎 = I ⋯ I (𝑎 times).

For 𝑎 ∈ ℕ0 we call 𝑒(𝑎) = I^𝑎 the unary representation of 𝑎. Its length is 𝑎.
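The unary encoding in (1.1.1) is easy to mirror in code. The following Python sketch is ours, not part of the book's model; the function names are invented for illustration:

```python
def unary_encode(a):
    """Unary representation of a nonnegative integer: a copies of the symbol I."""
    return "I" * a

def unary_decode(w):
    """The inverse map: a unary word encodes its own length."""
    assert set(w) <= {"I"}, "w must be a word over the alphabet {I}"
    return len(w)

print(unary_encode(5))        # IIIII
print(unary_decode("IIIII"))  # 5
```

Note that the length of the encoding grows linearly in 𝑎, which is why the unary representation is exponentially longer than the binary representation.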

As an important way of encoding nonnegative integers, we introduce their binary


representation.
1.1. Deterministic algorithms 3

Proposition 1.1.7. Let 𝑎 ∈ ℕ. Then there is a uniquely determined sequence 𝑏 ⃗ =
(𝑏0 , . . . , 𝑏𝑛−1 ) ∈ {0, 1}∗ with 𝑏0 = 1 such that

(1.1.2) 𝑎 = ∑_{𝑖=0}^{𝑛−1} 𝑏𝑖 2^{𝑛−𝑖−1} .

The length of this sequence is 𝑛 = ⌊log2 𝑎⌋ + 1.


Exercise 1.1.8. Prove Proposition 1.1.7.
Definition 1.1.9. Let 𝑎 ∈ ℕ.
(1) The sequence 𝑏 ⃗ from Proposition 1.1.7 is called the binary expansion or binary
representation of 𝑎.
(2) The positive integer 𝑛 = ⌊log2 𝑎⌋ + 1 is called the binary length or bit length of 𝑎.
It is denoted by bitLength(𝑎).
(3) The binary expansion or binary representation of 0 is defined to be (0). The binary
length or bit length of 0 is set to 1.
Example 1.1.10. The binary expansion of 7 is 111 since 7 = 2^2 + 2^1 + 2^0 . The binary
length of 7 is 3.
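The expansion in Example 1.1.10 can be computed with a short Python sketch (ours, for illustration; the book's model does not prescribe a programming language):

```python
def binary_expansion(a):
    """Binary expansion (b0, ..., b_{n-1}) with b0 = 1 of a positive integer,
    as in Proposition 1.1.7; the expansion of 0 is defined to be (0)."""
    if a == 0:
        return (0,)
    bits = []
    while a > 0:
        bits.append(a % 2)   # least significant bit first
        a //= 2
    return tuple(reversed(bits))

def bit_length(a):
    """bitLength(a) from Definition 1.1.9: the length of the binary expansion."""
    return len(binary_expansion(a))

print(binary_expansion(7))  # (1, 1, 1)
print(bit_length(7))        # 3
```

For 𝑎 > 0, bit_length(a) agrees with ⌊log2 𝑎⌋ + 1, and Python's built-in int.bit_length gives the same value.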
Exercise 1.1.11. Determine the binary expansion and the binary length of 251.

As we have seen, every finite sequence of bits that starts with the bit ‘1’ is assigned
to a uniquely determined positive integer as its binary expansion. Now, we also assign
uniquely determined nonnegative integers to all sequences in {0, 1}∗ including those
that start with the bit 0 using the following definition.

Definition 1.1.12. For all 𝑛 ∈ ℕ and 𝑏 ⃗ = (𝑏0 , . . . , 𝑏𝑛−1 ) ∈ {0, 1}^𝑛 we set

(1.1.3) stringToInt(𝑏)⃗ = ∑_{𝑖=0}^{𝑛−1} 𝑏𝑖 2^{𝑛−𝑖−1}

and call this value the integer represented by 𝑏.⃗

Note that nonnegative integers are represented by infinitely many strings in {0, 1}∗ .
Specifically, they are represented by all strings that result from prepending a string
consisting only of zeros to their binary expansions. Also, the number 0 is represented
by all finite sequences of the bit “0”.
Exercise 1.1.13. Use Proposition 1.1.7 to show that the map (1.1.3) becomes a bijection onto ℕ when restricted to the strings that start with the bit 1.
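A Python sketch of stringToInt (ours, for illustration) makes the non-injectivity on {0, 1}∗ visible:

```python
def string_to_int(b):
    """stringToInt from Definition 1.1.12: sum of b_i * 2^(n - i - 1)."""
    n = len(b)
    return sum(bit * 2 ** (n - i - 1) for i, bit in enumerate(b))

# Prepending zeros does not change the represented integer:
print(string_to_int((1, 1)))        # 3
print(string_to_int((0, 0, 1, 1)))  # 3
```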

1.1.2. Data types. Algorithms perform operations on objects of specific data


types, such as bits or integers. In this context, a data type refers to a nonempty set,
and the individual elements within this set are called data type objects.
In the presented model, the elementary data types consist of the following sets:
• the set {0, 1} of bits,
• the set ℤ of integers,
• the set ℛ of lowercase and uppercase Latin letters, including the space symbol ␣.
4 1. Classical Computation

Table 1.1.1. Elementary data types and their sizes.

Data type Element Size


{0, 1} 𝑏 size(𝑏) = O(1)
ℛ 𝑥 size(𝑥) = O(1)
ℤ 𝑎 size(𝑎) = O(bitLength(|𝑎|))

Another common data type is that of a floating point number, which represents an
approximation to a real number. However, analyzing algorithms that use this data
type is more complex, as it requires considering error propagation. For our purposes,
we do not need to take this data type into account.
Objects of these elementary data types are represented and stored using bits. The
encoding of these objects can be defined in a straightforward manner for both bits and
Roman characters. Integers are encoded using the binary expansion of their absolute
value along with the indication of their sign. The size of these encodings refers to the
number of bits used. For a data type object 𝑎, the size of its encoding is denoted by
size 𝑎. It may vary depending on the specific computing platform, programming lan-
guage, or other relevant factors. However, detailed discussions regarding these specific
implementations are beyond the scope of this book, and we present only a summary of
the different sizes of the elementary data types in Table 1.1.1.
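As a hedged illustration of Table 1.1.1, here is one concrete choice of the size function for integers in Python; the extra sign bit is our convention, and Table 1.1.1 only requires size(𝑎) = O(bitLength(|𝑎|)):

```python
def size_int(a):
    """One possible size of the encoding of an integer: the bit length of |a|
    plus one bit for the sign. Any encoding of size O(bitLength(|a|)) works."""
    bit_length = max(abs(a).bit_length(), 1)  # bitLength(0) = 1 by convention
    return bit_length + 1

print(size_int(19))   # 6: five bits for 10011 plus one sign bit
print(size_int(-19))  # 6
print(size_int(0))    # 2
```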
In our model, advanced data types include vectors and matrices over some data
type, which are described in Sections B.1 and B.4. For example, (1, 2, 3) is an integer
vector. Similarly,
(0 1)
(1 0)
is a bit matrix. Vectors and matrices are also encoded by bits, using the encoding
method of their data type. The encodings of vectors and matrices
possess the following properties. For any 𝑘 ∈ ℕ, the size of the encoding of a vector
𝑠 ⃗ = (𝑠0 , . . . , 𝑠𝑘−1 ) over some data type satisfies
(1.1.4) size(𝑠)⃗ = O ( ∑_{𝑖=0}^{𝑘−1} size 𝑠𝑖 ) .

Similarly, for any 𝑘, 𝑙 ∈ ℕ, the size of the encoding of a matrix 𝑀 = (𝑚𝑖,𝑗 )𝑖∈ℤ𝑘 ,𝑗∈ℤ𝑙 over
some data type satisfies

(1.1.5) size(𝑀) = O ( ∑_{𝑖∈ℤ𝑘 , 𝑗∈ℤ𝑙} size 𝑚𝑖,𝑗 ) .

Here we have used ℤ𝑘 = {0, . . . , 𝑘 − 1}.

1.1.3. Variables. A variable is a symbol that serves as a reference to a memory


unit with the capability of storing an element of a specific data type. Once defined,
the data type of a variable remains fixed. The value of a variable corresponds to the
content stored in its associated memory cell. Thus, the size of a variable is determined
by the size of its value. This concept is illustrated in Figure 1.1.1, where the variable 𝑎
represents a memory cell containing the integer 19, while the variable 𝑏 represents a
memory cell containing the word “bit”. Therefore, the value of 𝑎 is 19, and the value of
𝑏 is “bit”.

Figure 1.1.1. The variables 𝑎 and 𝑏 represent memory cells that contain 19 and “bit”, respectively.

In most programming languages, variables must be explicitly declared before they


can be used. This declaration serves three essential purposes: first, it specifies the
data type of the variable; second, it reserves the necessary memory space to store the
variable’s value; and third, it assigns a name to the variable for easy reference within
the code. However, to streamline the presentation of algorithms and to keep the focus
on the core logic, we omit explicit variable declarations and assume that the data type
and name of the variables are apparent from the context in which they are used.

1.1.4. Instructions. Instructions serve as fundamental building blocks of algo-


rithms and play a central role in defining their functionality. In this section, we will
introduce the instructions, also known as statements, that algorithms in our model can
execute, and we will provide upper bounds on their running times and memory size
requirements. We will use the terms “instructions” and “statements” interchangeably
to refer to these essential algorithmic components.
Unless otherwise specified, we measure running times in bit operations. By bit
operations, we refer to the operations that a computer can perform on one or two
bits. For example, a collection of bit operations commonly available on most computers
is shown in Table 1.1.3. The idea is that all operations are executed using these bit
operations, and the number of such bit operations is counted. This approach is justified
by the fact that, as we will see in Section 1.5.3, all operations can be implemented using
the operators 𝖭𝖮𝖳, 𝖠𝖭𝖣, and 𝖮𝖱. Analogously, we measure the space requirement by
counting the number of memory units used during the computation.
The basic instruction is the assign instruction:

(1.1.6) 𝑎 ← 𝑏.

It sets the value of a variable 𝑎 of a certain data type to an element 𝑏 of this data type
or to the value of a variable 𝑏 of the same data type as 𝑎. The running time and space
requirement for this operation is O(size 𝑏).
Algorithms may also assign the result of an operation to a variable. An example of
such an instruction is

(1.1.7) 𝑎 ← 𝑏 + 3.

The right-hand side of this assignment is the arithmetic expression 𝑏 + 3. When such
an instruction is executed, the arithmetic expression is evaluated first. In the example,

Table 1.1.2. Permitted operations on integers, their running times, and space require-
ments for operands of size O(𝑛).

Operation Operands Result Time Space


absolute value 𝑎∈ℤ |𝑎| O(𝑛) O(𝑛)
add 𝑎, 𝑏 ∈ ℤ 𝑎+𝑏 O(𝑛) O(𝑛)
subtract 𝑎, 𝑏 ∈ ℤ 𝑎−𝑏 O(𝑛) O(𝑛)
multiply 𝑎, 𝑏 ∈ ℤ 𝑎∗𝑏 O(𝑛^2 ) O(𝑛)
divide 𝑎, 𝑏 ∈ ℤ, 𝑏 ≠ 0 ⌊𝑎/𝑏⌋ O(𝑛^2 ) O(𝑛)
remainder 𝑎, 𝑏 ∈ ℤ, 𝑏 ≠ 0 𝑎 mod 𝑏 O(𝑛^2 ) O(𝑛)
equal 𝑎, 𝑏 ∈ ℤ 𝑎=𝑏 O(𝑛) O(𝑛)
less than 𝑎, 𝑏 ∈ ℤ 𝑎<𝑏 O(𝑛) O(𝑛)
less than or equal to 𝑎, 𝑏 ∈ ℤ 𝑎≤𝑏 O(𝑛) O(𝑛)
square root 𝑎 ∈ ℕ0 ⌊√𝑎⌋ O(𝑛^2 ) O(𝑛)
floor 𝑎, 𝑏 ∈ ℤ, 𝑏 ≠ 0 ⌊𝑎/𝑏⌋ O(𝑛^2 ) O(𝑛)
ceiling 𝑎, 𝑏 ∈ ℤ, 𝑏 ≠ 0 ⌈𝑎/𝑏⌉ O(𝑛^2 ) O(𝑛)
next integer 𝑎, 𝑏 ∈ ℤ, 𝑏 ≠ 0 ⌊𝑎/𝑏⌉ O(𝑛^2 ) O(𝑛)
bit length 𝑎 ∈ ℕ0 bitLength(𝑎) O(size 𝑎) O(size 𝑎)
string to int 𝑠 ⃗ ∈ {0, 1}∗ stringToInt(𝑠)⃗ O(size 𝑠)⃗ O(size 𝑠)⃗

it depends on the value of the variable 𝑏. Then the result is assigned to 𝑎. It is permit-
ted that the variable on the left side also appears on the right side. For instance, the
instruction
(1.1.8) 𝑐←𝑐+1
increments a counter 𝑐 by 1.
Next, we present the operations that may be used in expressions on the right side
of an assign instruction. The permitted operations on integers are listed in Table 1.1.2,
including their time and space requirements. The results of the operations absolute
value, floor, ceiling, next integer, add, subtract, multiply, divide, and remainder are
integers. The results of the comparisons are the bits 0 or 1 where 1 stands for “true”
and 0 stands for “false”. For a description and analysis of these algorithms see [AHU74]
and [Knu82].
In most programming languages, only integers with limited bit lengths are avail-
able, such as 64-bit integers. When there is a need to work with integers of arbitrary
length, specialized algorithms are required to handle operations on such numbers. To
simplify our descriptions, we assume that operations on integers of arbitrary length are
available as basic operations. These operations can be realized using the running time
and memory space as listed in Table 1.1.2 but there exist much more efficient algo-
rithms for integer multiplication and division with remainder. For instance, a highly
efficient integer multiplication algorithm developed by David Harvey and Joris van
der Hoeven [HvdH21] has running time O(𝑛 log 𝑛) for 𝑛-bit operands. Additionally,
it is known that for any integer multiplication algorithm with a running time of 𝑀(𝑛),
there exist division with remainder and square root algorithms with a running time of
O(𝑀(𝑛)) (see [AHU74, Theorem 8.5]). However, in practice, these faster algorithms

Table 1.1.3. Permitted logic operations.

Name Logic operator


𝖠𝖭𝖣 ∧
𝖮𝖱 ∨
𝖭𝖮𝖳 ¬
𝖭𝖠𝖭𝖣 ↑
𝖭𝖮𝖱 ↓
𝖷𝖮𝖱 ⊕

Table 1.1.4. Functions implemented by the permitted logic operations.

𝑎 𝑏 𝑎∧𝑏 𝑎∨𝑏 ¬𝑎 𝑎↑𝑏 𝑎↓𝑏 𝑎⊕𝑏


0 0 0 0 1 1 1 0
0 1 0 1 1 1 0 1
1 0 0 1 0 1 0 1
1 1 1 1 0 0 0 0

are only advantageous for handling very large numbers. For more typical integer sizes,
classical algorithms may still be more efficient in terms of practical performance.
On the bits 0 and 1 our algorithms can perform the logic operations that are listed
in Table 1.1.3. They implement the functions shown in Table 1.1.4. All permitted logic
operations run in time O(1) and require space O(1).

Exercise 1.1.14. Show that ⊕ is addition in ℤ2 and ∧ is multiplication in ℤ2 .

Algorithms may also use the branch statements for, while, repeat, and if. They
initiate the execution of a sequence of instructions if some branch condition is satis-
fied. In the analysis of algorithms, we will assume that the time and space required to
execute an algorithm part that uses a branch instruction is the time and space required
to evaluate the branch condition and the corresponding sequence of instructions, pos-
sibly several times. Branch instructions together with the corresponding instruction
sequence are referred to as loops.
We now provide more detailed explanations of the branch instructions using the
examples shown in Figures 1.1.2 and 1.1.3 utilizing pseudocode that is further de-
scribed in Section 1.1.5.
A for statement appears at the beginning of an instruction sequence and is ended
by an end for statement. This instruction sequence is executed for all values of a spec-
ified variable, as indicated in the for statement. In the for loop in Figure 1.1.2, the
variable is 𝑖, and the instruction 𝑝 ← 2𝑝 is executed for all 𝑖 from 𝑖 = 1 to 𝑖 = 𝑒. After
𝑖 iterations of this instruction, the value of 𝑝 is 2^𝑖 . So after completion of the for loop,
the value of 𝑝 is 2^𝑒 .
Also, while statements appear at the beginning of an instruction sequence after
which there is an end while statement. The instruction sequence is executed as long
as the condition in the while statement is true.

𝑝 ← 1
for 𝑖 = 1 to 𝑒 do
    𝑝 ← 2𝑝
end for

𝑖 ← 1
𝑝 ← 1
while 𝑖 ≤ 𝑒 do
    𝑝 ← 2𝑝
    𝑖 ← 𝑖 + 1
end while

𝑖 ← 0
𝑝 ← 1
repeat
    𝑝 ← 2𝑝
    𝑖 ← 𝑖 + 1
until 𝑖 = 𝑒

Figure 1.1.2. for, while, and repeat loops that compute 2^𝑒 .

For instance, the while loop in Figure 1.1.2 also computes 2^𝑒 . For this, the counting
variable 𝑖 is initialized to 1 and the variable 𝑝 is initially set to 1. Before each round of
the while loop, the logic expression 𝑖 ≤ 𝑒
is evaluated. If it is true, then the instruction sequence in the while loop is executed.
In the example, 𝑝 is set to 2𝑝 and the counting variable 𝑖 is increased by 1. After the
𝑘th iteration of the while loop, the value of 𝑝 is 2^𝑘 and the counting variable is 𝑖 = 𝑘 + 1.
Hence, after the 𝑒th iteration of the while loop we have 𝑝 = 2^𝑒 and 𝑖 = 𝑒 + 1. So the
while condition is violated and the computation continues with the first instruction
following the while loop.
Next, repeat statements are also followed by an instruction sequence that is ended
by an until statement that contains a condition. If this condition is satisfied, the com-
putation continues with the first instruction after the until statement. Otherwise, the
instruction sequence is executed again. In the example in Figure 1.1.2, the instruction
𝑝 ← 2𝑝 is executed until the counting variable 𝑖 is equal to 𝑒. Note that the instruction
sequence is executed at least once. Therefore, this repeat loop cannot compute 2^0 .
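The three loops of Figure 1.1.2 can be mirrored in Python (an informal sketch of ours; repeat/until has no direct Python analogue and is emulated by a post-test loop):

```python
def power_for(e):
    p = 1
    for i in range(1, e + 1):   # i = 1, ..., e
        p = 2 * p
    return p

def power_while(e):
    i, p = 1, 1
    while i <= e:
        p = 2 * p
        i = i + 1
    return p

def power_repeat(e):
    """Post-test loop: the body runs at least once, so, as noted above,
    this version cannot compute 2^0 (it would not terminate for e = 0)."""
    i, p = 0, 1
    while True:
        p = 2 * p
        i = i + 1
        if i == e:
            break
    return p

print(power_for(5), power_while(5), power_repeat(5))  # 32 32 32
```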
Exercise 1.1.15. Find for, while, and repeat loops that compute the integer repre-
sented by a bit sequence 𝑠0 𝑠1 ⋯ 𝑠𝑛−1 ∈ {0, 1}^𝑛 where 𝑛 ∈ ℕ.

Now we explain if statements. The three different ways to use if are shown in
Figure 1.1.3. Such a statement is followed by a sequence of instructions that is ended
by an end if statement. The instruction sequence may be interrupted by an else state-
ment or by one or more else if statements. The code segment on the left side of Figure
1.1.3 checks whether 𝑎 < 0 is true, in which case the instruction 𝑎 ← −𝑎 is executed.
Otherwise, the computation continues with the instruction following the end if state-
ment. This code segment computes the absolute value of 𝑎 because 𝑎 is set to −𝑎 if 𝑎 is
negative and otherwise remains unchanged. The code segment in the middle of Figure
1.1.3 checks whether 𝑎 is divisible by 11. If so, the variable 𝑠 is set to 1 and otherwise
to 0. Finally, the code segment on the right side of Figure 1.1.3 first checks if 𝑎 > 0 in
which case the variable 𝑠 is set to 1. Next, if 𝑎 = 0, 𝑠 is set to 0. Finally, 𝑠 is set to −1 if
𝑎 < 0. The result is the sign 𝑠 of 𝑎.
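The three code segments of Figure 1.1.3 translate directly into Python (a sketch of ours; else if corresponds to Python's elif):

```python
def absolute_value(a):
    # Left code segment of Figure 1.1.3.
    if a < 0:
        a = -a
    return a

def sign(a):
    # Right code segment of Figure 1.1.3.
    if a > 0:
        s = 1
    elif a == 0:
        s = 0
    else:  # a < 0
        s = -1
    return s

print(absolute_value(-7))          # 7
print(sign(-7), sign(0), sign(3))  # -1 0 1
```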
A computation terminates when the return instruction is used. This instruction
makes the result available for further use and takes the form
(1.1.9) return(𝑎)
where 𝑎 is an element or variable of some data type or a sequence of such objects. The
time and space requirement for this operation is O(𝑆), where 𝑆 represents the sum of
the sizes of the objects in the return statement.

if 𝑎 < 0 then
    𝑎 ← −𝑎
end if

if 𝑎 mod 11 = 0 then
    𝑠 ← 1
else
    𝑠 ← 0
end if

if 𝑎 > 0 then
    𝑠 ← 1
else if 𝑎 = 0 then
    𝑠 ← 0
else if 𝑎 < 0 then
    𝑠 ← −1
end if

Figure 1.1.3. if statements.

Algorithms can also return the result of expressions. For instance, a return in-
struction may be of the form

(1.1.10) return(2 ∗ 𝑎 + 𝑏).

In this case, the expression is evaluated first, and then the result is returned. The time
and space requirements of this instruction are O(𝑡) and O(𝑠), respectively, where 𝑡 and
𝑠 denote the time and space required to evaluate the corresponding expression.
Finally, a call to a subroutine is also considered a valid instruction in our model.
It takes the form

(1.1.11) 𝑎 ← 𝐴(𝑏)

where 𝐴 represents an algorithm, as elaborated in the subsequent section. For example,


such an instruction may be expressed as

(1.1.12) 𝑏 ← 𝗉𝗈𝗐𝖾𝗋(𝑎, 𝑒).

Here, the call 𝗉𝗈𝗐𝖾𝗋(𝑎, 𝑒) invokes the subroutine, returning the result of 𝑎^𝑒 , where 𝑎
and 𝑒 are both nonnegative integers.
The time and space requirements associated with a subroutine call are O(𝑡) and
O(𝑠), respectively, where 𝑡 and 𝑠 represent the time and space required by the subrou-
tine to execute. Section 1.4 provides an explanation of how these requirements are
determined.

1.1.5. Definition of deterministic algorithms. In this section, we explain the


concept of a deterministic algorithm and illustrate how such an algorithm can be pre-
sented using pseudocode. Pseudocode serves as a high-level depiction of an algorithm’s
logic and sequence of steps, with the aim of ensuring ease of comprehension for human
readers. To simplify the discussion, we will just talk about an “algorithm” instead of a
“deterministic algorithm”. In contrast, in the next section, we introduce probabilistic
algorithms.
As mentioned earlier, our model of computation is a compromise between a for-
mal description and a representation that resembles real algorithms written in some
programming language. Therefore, in this section, we do not use formal definitions.

But in the next sections, we formally define the properties of algorithms. These defini-
tions can be made precise if a formal model of computation is used, such as the Turing
machine model.
We illustrate our algorithm model using the Euclidean algorithm. The correspond-
ing pseudocode is shown in Algorithm 1.1.16.

Algorithm 1.1.16. Euclidean algorithm


Input: 𝑎, 𝑏 ∈ ℤ
Output: gcd(𝑎, 𝑏)
1: gcd(𝑎, 𝑏)
2: 𝑎 ← |𝑎|
3: 𝑏 ← |𝑏|
4: while 𝑏 ≠ 0 do
5: 𝑟 ← 𝑎 mod 𝑏
6: 𝑎←𝑏
7: 𝑏←𝑟
8: end while
9: return 𝑎
10: end

An algorithm 𝐴 has the following components:


(1) An Input statement. It specifies a finite number of input variables, their data
types, and the permitted values of these variables. The set of all permitted input
value tuples is referred to as Input(𝐴).
(2) An Output statement. For every 𝑎 ∈ Input(𝐴) it specifies the properties, possibly
depending on 𝑎, that make a return value a correct output. The set of all correct
outputs for input 𝑎 is denoted by Output(𝐴, 𝑎).
(3) An algorithm name followed by the sequence of input variables which is used
when 𝐴 is called by other algorithms as a subroutine.
(4) A finite sequence of instructions that ends with end.
For example, in the Euclidean algorithm, the Input statement specifies that there
are two integer input variables 𝑎 and 𝑏 which may take any integer value. According
to the Output statement, the algorithm returns gcd(𝑎, 𝑏). The name of the algorithm
and the sequence of input variables can be seen in line 1 of the pseudocode: gcd(𝑎, 𝑏).
It is followed by 8 instructions. The last line of the pseudocode is end.
Let 𝐴 be an algorithm that has 𝑘 input variables 𝑣 0 , . . . , 𝑣 𝑘−1 of data types 𝐷0 , . . . ,
𝐷𝑘−1 , respectively. Then the set Input(𝐴) of all allowed input values of 𝐴 satisfies
Input(𝐴) ⊂ 𝐷0 × ⋯ × 𝐷𝑘−1 . For the Euclidean algorithm, we have 𝑘 = 2, 𝐷0 = 𝐷1 = ℤ,
and Input(𝐴) = ℤ × ℤ. Running 𝐴 with input 𝑎 = (𝑎0 , . . . , 𝑎𝑘−1 ) ∈ Input(𝐴) means
the following. 𝐴 assigns 𝑎𝑖 to the input variable 𝑣 𝑖 for all 𝑖 ∈ ℤ𝑘 . Then it executes its
sequence of instructions. This process is called the run of 𝐴 with input 𝑎. We list a few
requirements that every deterministic algorithm must satisfy.

(1) Each run of the algorithm with a permitted input carries out a return instruction.
This means that the algorithm terminates on any input 𝑎 ∈ Input(𝐴).
(2) When the algorithm performs a return instruction, the return value is correct;
i.e., it has the property specified in the Output statement.
(3) Executing the return instruction is the only way the algorithm can terminate.
This means that after executing a statement that is not a return instruction there
is always a next instruction that the algorithm carries out.

Example 1.1.17. We describe the run of the Euclidean algorithm with input (𝑎, 𝑏) =
(100, 35). The instructions in lines 2 and 3 replace 𝑎 and 𝑏 by their absolute values. For
the chosen input, they have no effect. Since 𝑏 = 35, the while condition is satisfied.
Hence, the Euclidean algorithm executes 𝑟 ← 100 mod 35 = 30, 𝑎 ← 𝑏 = 35, and
𝑏 ← 𝑟 = 30. After this, the while condition is still satisfied since 𝑏 = 30. So the
Euclidean algorithm executes 𝑟 ← 35 mod 30 = 5, 𝑎 ← 𝑏 = 30, and 𝑏 ← 𝑟 = 5. Also,
after this iteration of the while loop, the while condition is still satisfied since 𝑏 = 5.
The Euclidean algorithm executes 𝑟 ← 30 mod 5 = 0, 𝑎 ← 𝑏 = 5, and 𝑏 ← 𝑟 = 0. Now,
the while condition is violated. So the while loop is no longer executed. Instead, the
return instruction following end while is carried out. This means that the algorithm
returns 5 which is gcd(100, 35).
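Algorithm 1.1.16 translates almost line by line into Python (a sketch of ours, not the book's formal model):

```python
def gcd(a, b):
    """Euclidean algorithm, following Algorithm 1.1.16."""
    a, b = abs(a), abs(b)
    while b != 0:
        r = a % b   # line 5 of the pseudocode
        a = b       # line 6
        b = r       # line 7
    return a

print(gcd(100, 35))  # 5, as in Example 1.1.17
```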

We model the run of an algorithm 𝐴 on an input 𝑎 ∈ Input(𝐴) as a sequence of


states. A state describes the situation of the algorithm immediately before the exe-
cution of an instruction. It includes the contents of the memory cells corresponding
to the variables in the algorithm and the instruction to be executed next. For example,
consider the run of the Euclidean algorithm with input (100, 35). Tables 1.1.5 and 1.1.6
show the first and final states of this run.
If the instruction in a state is not the return statement, then the algorithm per-
forms this instruction. This may change the value of the variables. Then the algorithm
enters the next state since in our model termination is only possible when the return
instruction is carried out. This next state is uniquely determined by the previous state.
So, the input 𝑎 ∈ Input(𝐴) uniquely determines the execution of the algorithm and
its return value, which is referred to by 𝐴(𝑎). This explains the name “deterministic

Table 1.1.5. Beginning of the run of the Euclidean algorithm with input (100, 35).

State# Memory contents Next instruction


𝑎 𝑏 𝑟
1 100 35 𝑎 ← |𝑎|
2 100 35 𝑏 ← |𝑏|
3 100 35 while 𝑏 ≠ 0 do
4 100 35 𝑟 ← 𝑎 mod 𝑏
5 100 35 30 𝑎←𝑏
6 35 35 30 𝑏←𝑟
7 35 30 30 end while
8 35 30 30 while 𝑏 ≠ 0 do

Table 1.1.6. End of the run of the Euclidean algorithm with input (100, 35).

State# Memory contents Next instruction


𝑎 𝑏 𝑟
1 30 5 5 while 𝑏 ≠ 0 do
2 30 5 5 𝑟 ← 𝑎 mod 𝑏
3 30 5 0 𝑎←𝑏
4 5 5 0 𝑏←𝑟
5 5 0 0 end while
6 5 0 0 while 𝑏 ≠ 0 do
7 5 0 0 return 𝑎

algorithm”. For instance, consider State 3 in Table 1.1.5. The value of 𝑏 is 35. So the
while condition is satisfied. The execution of the while instruction does not change
the values of 𝑎, 𝑏, or 𝑟 and causes the next instruction to be 𝑟 ← 𝑎 mod 𝑏. So State 4 is
uniquely determined by State 3.
Since we require deterministic algorithms to always terminate, the same state can-
not occur repeatedly in an algorithm run. Otherwise, the algorithm would enter an
infinite loop. In other words, the states in algorithm runs are pairwise different.
It is important to prove the correctness of an algorithm. This means that on input of
any 𝑎 ∈ Input(𝐴) the algorithm terminates and its output has the specified properties.
In Example 1.1.18, we present the correctness proof of the Euclidean algorithm.
Example 1.1.18. We prove the correctness of the Euclidean algorithm. First, note that
after 𝑏 is replaced by its absolute value, the sequence of values of 𝑏 is strictly decreasing
since starting from the second 𝑏, any such value is the remainder of a division by the
previous value of 𝑏. So at some point, we must have 𝑏 = 0 which means that the
algorithm terminates. Next, as Exercise 1.1.19 shows, the value of gcd(𝑎, 𝑏) in line 4
is always the same. But when the algorithm terminates, we have 𝑏 = 0 and therefore
gcd(𝑎, 𝑏) = gcd(𝑎, 0) = 𝑎. The fact that gcd(𝑎, 𝑏) does not change is called an algorithm
invariant. Such invariants are frequently used in correctness proofs of algorithms.
Exercise 1.1.19. Show that in line 4 of the Euclidean algorithm, the value of gcd(𝑎, 𝑏)
is always the same.
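The invariant from Example 1.1.18 can be checked numerically with a Python sketch (ours; math.gcd serves as an independent reference, and a numerical check is of course not a proof):

```python
from math import gcd as reference_gcd

def check_invariant(a, b):
    """Verify that gcd(a, b) is unchanged by each step (a, b) -> (b, a mod b)."""
    a, b = abs(a), abs(b)
    g = reference_gcd(a, b)
    while b != 0:
        a, b = b, a % b
        assert reference_gcd(a, b) == g  # the algorithm invariant
    return a == g  # on termination b = 0, so gcd(a, 0) = a

print(check_invariant(100, 35))  # True
```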

As a further example, we present a deterministic factoring algorithm. Recall that


a composite integer is an integer 𝑎 that can be written as 𝑎 = 𝑏𝑐 where 𝑏, 𝑐 are integers
and 𝑏 is a proper divisor of 𝑎; i.e., 𝑎 mod 𝑏 = 0 and 1 < |𝑏| < |𝑎|. The goal of a factoring
algorithm is to find a proper divisor of a given integer 𝑎 if 𝑎 is composite. Algorithm
1.1.21 is such an algorithm. It is based on the fact that any composite integer 𝑎 has a
proper divisor 𝑏 with 1 < 𝑏 ≤ √|𝑎|. This is proved in Exercise 1.1.20. The input of the
algorithm is an integer 𝑎 > 1. It enumerates all integers 𝑏 with 1 < 𝑏 ≤ √𝑎 and checks
whether 𝑏 is a divisor of 𝑎. If no divisor is found, then 𝑎 is proved to be a prime number
in which case the algorithm returns 0.
Exercise 1.1.20. Show that every composite integer 𝑎 has a proper divisor 𝑏 such that
1 < 𝑏 ≤ √|𝑎|.

Algorithm 1.1.21. A deterministic factoring algorithm


Input: 𝑎 ∈ ℤ>1
Output: A proper divisor 𝑏 of 𝑎 if 𝑎 is composite, or 0 if 𝑎 is a prime number
1: 𝖽𝖾𝗍𝖥𝖺𝖼𝗍𝗈𝗋(𝑎)
2: for all 𝑏 = 2, . . . , ⌊√𝑎⌋ do
3: if 𝑎 mod 𝑏 = 0 then
4: return 𝑏
5: end if
6: end for
7: return 0
8: end

Exercise 1.1.22. Let 𝑎 = 35. Determine the first three and the last three states of the
run of Algorithm 1.1.21.
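Algorithm 1.1.21 in Python (a sketch of ours; math.isqrt computes ⌊√𝑎⌋ exactly):

```python
from math import isqrt

def det_factor(a):
    """Deterministic factoring, following Algorithm 1.1.21: returns a proper
    divisor of a > 1, or 0 if a is prime."""
    for b in range(2, isqrt(a) + 1):
        if a % b == 0:
            return b
    return 0

print(det_factor(35))  # 5
print(det_factor(37))  # 0, since 37 is prime
```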

1.1.6. Decision algorithms. In classical complexity theory, decision algorithms


play an important role. Such an algorithm decides whether a string 𝑠 ∈ {0, 1}∗ belongs
to a subset 𝐿 of {0, 1}∗ which is called a language. The input of a decision algorithm is a
string 𝑠 ⃗ in {0, 1}∗ . The output is 0 or 1, where 1 means that the input 𝑠 ⃗ belongs to 𝐿 and
0 means that 𝑠 ⃗ belongs to the complement of 𝐿 in {0, 1}∗ . We also say that the algorithm
decides the language 𝐿.
Example 1.1.23. Algorithm 1.1.24 decides the language 𝐿 that consists of all strings in
𝑠 ⃗ ∈ {0, 1}∗ representing composite integers. It works like Algorithm 1.1.21 except that
the output corresponding to a composite number is 1 and the output is 0 if the input
string 𝑠 ⃗ is (), represents 0, 1, or a prime number.

There is a close connection between decision and more general algorithms. For ex-
ample, as shown in Example 1.4.21, an algorithm that decides whether an integer has
a proper divisor below a given bound can be transformed into an integer factoring al-
gorithm with almost the same efficiency. This can be generalized to many algorithmic
problems.

Algorithm 1.1.24. Compositeness decision algorithm


Input: 𝑠 ⃗ ∈ {0, 1}∗
Output: 1 if stringToInt(𝑠)⃗ is composite and 0 otherwise
1: 𝖽𝖾𝖼𝗂𝖽𝖾𝖢𝗈𝗆𝗉(𝑠)⃗
2: 𝑎 ← stringToInt(𝑠)⃗
3: for all 𝑏 = 2, . . . , ⌊√𝑎⌋ do
4: if 𝑎 mod 𝑏 = 0 then
5: return 1
6: end if
7: end for
8: return 0
9: end
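Algorithm 1.1.24 in Python (a sketch of ours, reusing the stringToInt map of Definition 1.1.12):

```python
from math import isqrt

def string_to_int(s):
    n = len(s)
    return sum(bit * 2 ** (n - i - 1) for i, bit in enumerate(s))

def decide_comp(s):
    """Returns 1 if the integer represented by the bit string s is composite,
    and 0 otherwise (also for the empty string and for 0, 1, and primes)."""
    a = string_to_int(s)
    for b in range(2, isqrt(a) + 1):
        if a % b == 0:
            return 1
    return 0

print(decide_comp((1, 0, 0, 1)))  # 1: the string represents 9 = 3 * 3
print(decide_comp((1, 1, 1)))     # 0: the string represents the prime 7
```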

1.1.7. Time and space complexity. Let 𝐴 be an algorithm. Its efficiency de-
pends on the time complexity and the memory requirements of 𝐴 which we discuss in
this section.
Definition 1.1.25. (1) The running time or time complexity of 𝐴 for a particular in-
put 𝑎 ∈ Input(𝐴) is the sum of the time required for reading the input 𝑎 which
is O(size(𝑎)) and the running times of the instructions executed during the algo-
rithm run with input 𝑎.
(2) The worst-case running time or worst-case time complexity of 𝐴 is the function
(1.1.13) wTime𝐴 ∶ ℕ → ℝ≥0
that sends a positive integer 𝑛 which is the size of an input of 𝐴 to the maximum
running time of 𝐴 over all inputs of size 𝑛. If 𝑛 is not the size of an input of 𝐴, then
we set wTime𝐴 (𝑛) = 0.

Next, we define the space complexity of 𝐴.


Definition 1.1.26. (1) The space complexity of 𝐴 for a particular input 𝑎 is the total
amount of memory space that is used in the algorithm run with input 𝑎.
(2) The worst-case space complexity of 𝐴 is the function
(1.1.14) wSpace𝐴 ∶ ℕ → ℝ≥0
that sends a positive integer 𝑛 which is the size of an input of 𝐴 to the maximum
space complexity of 𝐴 over all inputs of size 𝑛. If 𝑛 is not the size of an input of 𝐴,
then we set wSpace𝐴 (𝑛) = 0.

Using the Definitions 1.1.25 and 1.1.26, we define the asymptotic time and space
complexity of deterministic algorithms.
Definition 1.1.27. Let 𝑓 ∶ ℕ → ℝ>0 be a function. We say that 𝐴 has asymptotic worst-
case running time or space complexity O(𝑓) if wTime𝐴 = O(𝑓) or wSpace𝐴 = O(𝑓),
respectively. The words “asymptotic” and “worst-case” may also be omitted.

It is common to use special names for certain time and space complexities. Several
of these names are listed in Table 1.1.7.

Table 1.1.7. Asymptotic time and space complexity names.

Name Time or space complexity


constant O(1)
logarithmic O(log 𝑛)
linear O(𝑛)
quasilinear O(𝑛(log 𝑛)^𝑐 ) for some 𝑐 ∈ ℕ
quadratic O(𝑛^2 )
cubic O(𝑛^3 )
polynomial O(𝑛^𝑐 ) for some 𝑐 ∈ ℕ
subexponential O(2^(𝑛^𝜀) ) for all 𝜀 ∈ ℝ>0
exponential O(2^(𝑛^𝑐) ) for some 𝑐 ∈ ℕ

Exercise 1.1.28. Show that quasilinear complexity can also be written as 𝑛^(1+o(1)) , poly-
nomial complexity as 𝑛^(O(1)) or 2^(O(log 𝑛)) , subexponential complexity as 2^(o(𝑛)) , and expo-
nential complexity as 2^(𝑛^(O(1))) .

Example 1.1.29. We analyze the time and space complexity of the Euclidean Algo-
rithm 1.1.16. Let 𝑎, 𝑏 ∈ ℤ be the input of the algorithm, and let 𝑛 be the maximum
of size(𝑎) and size(𝑏). The time to read the input 𝑎, 𝑏 is O(𝑛). After the operations in
lines 2 and 3 we have 𝑎, 𝑏 ≥ 0. The time and space complexity of these instructions
is O(𝑛). If 𝑏 = 0, then the while loop is not executed and 𝑎 is returned, which takes
time O(𝑛). If 𝑏 ≠ 0 and 𝑎 ≤ 𝑏, then after the first iteration of the while loop, we have
𝑏 < 𝑎. It follows from Exercise 1.1.30 that the total number of executions of the while
loop is O(𝑛). Also, by this exercise, the size of the operands used in the executions of
the while loop is O(𝑛). So, the running time of each iteration is O(𝑛^2 ) and the space
requirement is O(𝑛). This shows that the worst-case running time of the Euclidean
algorithm is O(𝑛^3 ) and the worst-case space complexity is O(𝑛). Thus, the Euclidean
algorithm has cubic running time. Using more complicated arguments, it can even be
shown that this algorithm has quadratic running time (see Theorem 1.10.5 in [Buc04]).

What is the practical relevance of worst-case running times when comparing algo-
rithms? Let us take two algorithms, 𝐴 and 𝐴′ , both designed to solve the same problem,
such as computing the greatest common divisors. It is essential to note that if algorithm
𝐴 has a smaller asymptotic running time than algorithm 𝐴′ , it does not automatically
make 𝐴 superior to 𝐴′ in practice. This comparison only indicates that 𝐴 is faster than
𝐴′ for inputs greater than a certain length. However, this input length can be so large
that it becomes irrelevant for most real-world use cases.
For example, in [AHU74] it is shown that for any integer multiplication algorithm
with a worst-case time complexity of 𝑀(𝑛), there exists a gcd algorithm with a worst-
case time complexity of O(𝑀(𝑛) log(𝑛)). Additionally, [HvdH21] presents an integer
multiplication algorithm with a worst-case running time of O(𝑛 log 𝑛). As a result,
there is a corresponding gcd algorithm with a worst-case running time of O(𝑛 log^2 𝑛).
However, this improved complexity only outperforms the O(𝑛^2 ) algorithm for very
large integers, which may not occur in most common input sizes.

Exercise 1.1.30. Let 𝑎, 𝑏 ∈ ℕ, 𝑎 > 𝑏, be the input of the Euclidean algorithm. Let
𝑟0 = 𝑎 and 𝑟1 = 𝑏. Denote by 𝑘 the number of iterations of the while loop executed
in the algorithm and denote by 𝑟2 , 𝑟3 , . . . , 𝑟 𝑘+1 the sequence of remainders 𝑟 which are
computed in line 5 of the Euclidean algorithm. Prove that the sequence (𝑟 𝑖 )0≤𝑖≤𝑘+1 is
strictly decreasing and that 𝑟 𝑖+2 < 𝑟 𝑖 /2 for all 𝑖 ∈ ℤ𝑘 . Conclude that 𝑘 = O(size 𝑎).
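The halving property of Exercise 1.1.30 can be observed numerically (a Python sketch of ours; again, a check, not a proof):

```python
def remainders(a, b):
    """Remainder sequence r_0 = a, r_1 = b, r_{i+1} = r_{i-1} mod r_i
    of the Euclidean algorithm, ending with 0."""
    rs = [a, b]
    while rs[-1] != 0:
        rs.append(rs[-2] % rs[-1])
    return rs

rs = remainders(100, 35)
print(rs)  # [100, 35, 30, 5, 0]
# r_{i+2} < r_i / 2, so the remainders lose at least one bit every two steps:
assert all(rs[i + 2] < rs[i] / 2 for i in range(len(rs) - 2))
```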

Example 1.1.31. We determine the worst-case time and space complexity of the Deterministic
Factoring Algorithm 1.1.21. Let 𝑛 = bitLength 𝑎. The number of iterations
of the for loop in this algorithm is O(2^{𝑛/2}). Each iteration of the for loop requires time
O(𝑛²) and space O(𝑛). Hence, the worst-case time complexity of Algorithm 1.1.21 is
O(𝑛² 2^{𝑛/2}) = 2^{O(𝑛)} and the worst-case space complexity is O(𝑛). So, the algorithm has
exponential running time and linear space complexity.
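The exponential behavior can be made concrete with a trial-division sketch in Python (det_factor is a hypothetical name, and Algorithm 1.1.21 itself is not reproduced in this excerpt, so this is only an assumed reading of its loop):

```python
def det_factor(a):
    """Try every candidate b with bitLength(b) <= ceil(n/2), i.e.,
    O(2^(n/2)) iterations; return a proper divisor or None if a is prime."""
    bound = 1 << ((a.bit_length() + 1) // 2)  # 2^(ceil(n/2))
    for b in range(2, bound):
        if a % b == 0:
            return b
    return None
```

Every composite 𝑎 has a proper divisor below this bound (Exercise 1.2.7 below), so the loop finds one whenever 𝑎 is composite.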
16 1. Classical Computation

1.2. Probabilistic algorithms


Quantum algorithms are inherently probabilistic; i.e., the output for a given input is
not uniquely determined but follows a probability distribution on the possible outputs.
This section discusses classical probabilistic algorithms. In many cases, they are much
more efficient than their deterministic counterparts. An example is the probabilistic
solution of the Deutsch-Jozsa problem in Section 5.3.1.

1.2.1. Definition of probabilistic algorithms. A probabilistic algorithm has
the same four components as a deterministic algorithm: the Input and Output state-
ment, the algorithm name followed by the sequence of input variables, and a sequence
of instructions that is ended by end. In addition, states, runs, and complexities of
probabilistic algorithms are defined analogously to their deterministic counterparts.
The differences between the two types of algorithms are now listed. For this, let 𝐴 be
a probabilistic algorithm.

(1) The probabilistic algorithm 𝐴 may call the subroutine coinToss. It returns 0 or 1,
both with probability 1/2.
(2) The probabilistic algorithm 𝐴 may call other probabilistic algorithms as subroutines
if they satisfy the following condition. Given a permitted input, they terminate
and return one of finitely many possible outputs according to a probability distri-
bution.
(3) The run of 𝐴 on input of some 𝑎 ∈ Input(𝐴) may depend on 𝑎 and the return
values of the probabilistic subroutines called during the run of the algorithm.
Therefore, in contrast to deterministic algorithms, this run may not be uniquely
determined by 𝑎.
(4) 𝐴 may not terminate, since termination may depend on certain return values of
some probabilistic subroutine that may never occur.
(5) Let 𝑎 ∈ Input(𝐴) and suppose that 𝐴 terminates on input of 𝑎 with output 𝑜.
Then 𝑜 may not be uniquely determined by 𝑎, but it may also depend on the return
values of the probabilistic subroutine calls during the run of 𝐴. Also, we may have
𝑜 ∈ Output(𝐴, 𝑎); or 𝑜 = “Failure”, which indicates that the algorithm did not find
a correct output; or 𝑜 may have neither of these properties.
(6) Due to the special meaning of the return value “Failure”, it must never be a correct
output.

So an important difference between deterministic and probabilistic algorithms is
that the latter are not required to always return correct outputs. As we shall see, correct
outputs occur with a certain success probability. We will show in Proposition 1.3.5 that
the second condition in (2) follows from the property that the probabilistic subroutine
always terminates.
We present two examples of probabilistic algorithms that may be used as subrou-
tines in probabilistic algorithms.

Example 1.2.1. On input of 𝑘 ∈ ℕ, Algorithm 1.2.2 returns a random bit string of
length 𝑘 with the uniform distribution. Also, on the same input, Algorithm 1.2.3 returns
a random integer 𝑏 with bit length at most 𝑘 with the uniform distribution.

Algorithm 1.2.2. Selecting a uniformly distributed random bit string of fixed length
Input: 𝑘 ∈ ℕ
Output: 𝑠 ∈ {0, 1}𝑘
1: randomString(𝑘)
2: for 𝑖 = 0 to 𝑘 − 1 do
3: 𝑠𝑖 ← coinToss
4: end for
5: return 𝑠 ⃗ = (𝑠0 , . . . , 𝑠𝑘−1 )
6: end

Algorithm 1.2.3. Selecting a uniformly distributed random positive integer of bounded bitlength
Input: 𝑘 ∈ ℕ
Output: 𝑏 ∈ ℕ0 with bitLength(𝑏) ≤ 𝑘
1: randomInt(𝑘)
2: 𝑠 ⃗ ← randomString(𝑘)
3: 𝑏 ← stringToInt(𝑠)⃗
4: return 𝑏
5: end

We also use the following terminology. If a probabilistic algorithm 𝐴 returns an
output from Output(𝐴, 𝑎) for a specific input 𝑎 ∈ Input(𝐴), we refer to the correspond-
ing algorithm run as a “success”. Otherwise, it is considered a “failure”. Notably, if
the output of 𝐴 on input 𝑎 is “Failure”, it indicates that the algorithm did not find a
correct output. However, if the output is not “Failure”, it is not immediately evident
whether the result is correct or not, that is, whether the algorithm run was a success
or not. This must be checked by other means.
A probabilistic algorithm that, upon termination, always returns a correct result
or “Failure” is called “error-free”.
Example 1.2.4. We present an algorithm that implements the Fermat test to deter-
mine whether a positive integer is composite. Given an input 𝑎 ∈ ℕ>1 , the algorithm
randomly selects 𝑏 ∈ ℤ𝑎 with the uniform distribution. If the condition
(1.2.1) 1 < gcd(𝑎, 𝑏) < 𝑎 ∨ 𝑏𝑎−1 ≢ 1 mod 𝑎
holds, the algorithm returns 1; otherwise, it returns 0. Fermat’s Little Theorem (see
[Buc04, Theorem 2.11.1]) guarantees that condition (1.2.1) implies the compositeness
of 𝑎. However, it is essential to note that the converse may not be true, since 𝑎 could be
a Carmichael number. Carmichael numbers are composite numbers that satisfy 𝑏𝑎−1 ≡
1 mod 𝑎 for all 𝑏 ∈ ℤ∗𝑎 . For example, 561, 1105, and 1729 are the first three Carmichael

numbers. Moreover, as shown in [AGP94], there are infinitely many Carmichael numbers.
Since Carmichael numbers are composite but the Fermat test returns 0 for them
whenever gcd(𝑎, 𝑏) = 1, the algorithm is not error-free.
Exercise 1.2.5. (1) Write pseudocode for the Fermat test described in Example 1.2.4.
(2) Find a composite number 𝑎 such that on input of 𝑎 the algorithm of Example 1.2.4
sometimes returns 0 and sometimes 1.
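A Python sketch of the test from Example 1.2.4 (one hedge: the book draws 𝑏 from all of ℤ𝑎, but 𝑏 = 0 gives gcd(𝑎, 𝑏) = 𝑎 and proves nothing, so it is treated as inconclusive here):

```python
import math
import random

def fermat_test(a):
    """Return 1 if one Fermat trial proves a composite, else 0."""
    b = random.randrange(a)  # uniform in Z_a = {0, ..., a - 1}
    if b == 0:
        return 0  # gcd(a, 0) = a: no information
    if 1 < math.gcd(a, b) < a:
        return 1  # a nontrivial common factor proves compositeness
    if pow(b, a - 1, a) != 1:
        return 1  # Fermat's Little Theorem is violated
    return 0  # no evidence found; a may still be composite
```

For a prime input the test always returns 0; for the Carmichael number 561 it returns 1 only when the random base shares a factor with 561.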

There are the following two types of probabilistic algorithms.


(1) Monte Carlo algorithms. They always terminate but may not always be successful.
(2) Las Vegas algorithms. They may not terminate but when they terminate, they are
successful.
Algorithms 1.2.2 and 1.2.3 are examples of Monte Carlo algorithms. We will see
in Proposition 1.3.5 that Monte Carlo algorithms are exactly the possible subroutines
of probabilistic algorithms; i.e., they terminate on every permitted input and return
one of finitely many possible outputs according to a probability distribution. Another
example of a Monte Carlo algorithm is the following.
Example 1.2.6. Algorithm 1.2.8 is an error-free Monte Carlo factoring algorithm that
is based on the fact that a composite integer 𝑎 ∈ ℕ has a proper divisor 𝑏 ∈ ℕ such that
(1.2.2) bitLength(𝑏) ≤ m(𝑎) = ⌈(bitLength 𝑎)/2⌉.
This is shown in Exercise 1.2.7. On input of 𝑎 ∈ ℤ>1 , Algorithm 1.2.8 computes the
integer 𝑏 represented by a uniformly distributed random bit string of length m(𝑎). The
algorithm returns 𝑏 if this number is a proper divisor of 𝑎. Then the algorithm run was
successful. Otherwise, it returns “Failure” which means that the algorithm did not find
a proper divisor of 𝑎. The algorithm always terminates since it tests only a single 𝑏 but
it may not always be successful.
Exercise 1.2.7. Show that every composite number 𝑎 ∈ ℕ has a proper divisor with
bitlength at most m(𝑎). Also, show that m(𝑎) can be computed in linear time.

Algorithm 1.2.8. Monte Carlo factoring algorithm


Input: 𝑎 ∈ ℕ>1
Output: A proper divisor 𝑏 ∈ ℕ of 𝑎
1: 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋(𝑎)
2: 𝑏 ← randomInt(m(𝑎))
3: if 1 < 𝑏 < 𝑎 ∧ 𝑎 mod 𝑏 = 0 then
4: return 𝑏
5: end if
6: return “Failure”
7: end
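Algorithm 1.2.8 can be sketched in Python as follows (None stands in for “Failure”, and m implements the bound from (1.2.2)):

```python
import random

def m(a):
    """m(a) = ceil(bitLength(a) / 2), the divisor-size bound from (1.2.2)."""
    return (a.bit_length() + 1) // 2

def mc_factor(a):
    """One Monte Carlo trial: read a uniform m(a)-bit string as an integer b
    and return it if it is a proper divisor of a, else report failure."""
    b = random.getrandbits(m(a))
    if 1 < b < a and a % b == 0:
        return b
    return None  # "Failure"
```

Each call performs a single test, so it always terminates, but most calls fail.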

Example 1.2.9. Algorithm 1.2.10 is a Las Vegas factoring algorithm which calls
𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋 until a proper divisor of 𝑎 is found. This may take forever. But if the al-
gorithm terminates, then it is successful.

Algorithm 1.2.10. Las Vegas factoring algorithm


Input: 𝑎 ∈ ℕ>1
Output: A proper divisor 𝑏 ∈ ℕ of 𝑎
1: 𝗅𝗏𝖥𝖺𝖼𝗍𝗈𝗋(𝑎)
2: repeat
3: 𝑏 ← 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋(𝑎)
4: until 𝑏 ≠ “Failure”
5: return 𝑏
6: end

The approach used in Algorithm 1.2.10 can be extended to create a more general
version, allowing any error-free Monte Carlo algorithm 𝐴 to be transformed into a Las
Vegas algorithm. This transformation is achieved through Algorithm 1.2.11. When
given an input 𝑎 ∈ Input(𝐴), this algorithm repeatedly executes 𝐴(𝑎) until a successful
outcome is obtained. As this algorithm is akin to performing a Bernoulli experiment,
we refer to it as the Bernoulli algorithm associated with 𝐴.

Algorithm 1.2.11. Bernoulli algorithm associated with an error-free Monte Carlo al-
gorithm 𝐴
Input: 𝑎 ∈ Input(𝐴)
Output: 𝑏 ∈ Output(𝐴, 𝑎)
1: 𝖻𝖾𝗋𝗇𝗈𝗎𝗅𝗅𝗂𝐴 (𝑎)
2: 𝑏 ← “Failure”
3: while 𝑏 = “Failure” do
4: 𝑏 ← 𝐴(𝑎)
5: end while
6: return 𝑏
7: end
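The Bernoulli construction can be sketched generically in Python; here a rendering of mc_factor from Algorithm 1.2.8 serves as the error-free Monte Carlo subroutine, with None standing in for “Failure”:

```python
import random

def mc_factor(a):
    """Error-free Monte Carlo trial in the spirit of Algorithm 1.2.8."""
    b = random.getrandbits((a.bit_length() + 1) // 2)
    return b if 1 < b < a and a % b == 0 else None

def bernoulli(algo, a):
    """Rerun the error-free Monte Carlo algorithm `algo` on input a until a
    run succeeds. May in principle loop forever, but terminates with
    probability 1 (Proposition 1.3.9)."""
    while True:
        b = algo(a)
        if b is not None:
            return b
```

bernoulli(mc_factor, a) behaves like lvFactor from Algorithm 1.2.10.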

On the other hand, every Las Vegas algorithm can indeed be transformed into an
error-free Monte Carlo algorithm. This conversion entails monitoring the number of
calls made to the probabilistic subroutines while the algorithm runs. The algorithm
terminates if the Las Vegas algorithm produces a successful outcome or if the count of
subroutine calls exceeds a predetermined threshold value, which may vary depending
on the specific input of the algorithm. In the event of success, the algorithm returns the
output of the Las Vegas algorithm. However, if the threshold is surpassed, it returns
the result “Failure.”
Exercise 1.2.12. Change Algorithm 1.2.10 to an error-free Monte Carlo algorithm that,
on input of 𝑎 ∈ ℤ>1 , performs at most bitLength(𝑎) coin tosses.

1.2.2. Probabilistic decision algorithms. We now introduce probabilistic decision
algorithms. As with deterministic decision algorithms, their purpose is to decide the

membership of 𝑠 ⃗ ∈ {0, 1}∗ in a language 𝐿 ⊂ {0, 1}∗ . Such an algorithm always re-
turns 1 or 0. It satisfies Output(𝐴, 𝑠)⃗ = {1} for all 𝑠 ⃗ ∈ 𝐿 and Output(𝐴, 𝑠)⃗ = {0} for all
𝑠 ⃗ ∈ {0, 1}∗ ⧵ 𝐿. However, recall that runs of probabilistic decision algorithms do not
have to be successful. So, the algorithm may return 0 if 𝑠 ⃗ ∈ 𝐿 and 1 if 𝑠 ⃗ ∈ {0, 1}∗ ⧵ 𝐿.
There are three different types of probabilistic decision algorithms. To define them,
let 𝐴 be a probabilistic algorithm that decides a language 𝐿.
(1) 𝐴 is called true-biased if it never returns false positives. So, if on input of 𝑠 ⃗ ∈ {0, 1}∗
the algorithm returns 1, then 𝑠 ⃗ ∈ 𝐿.
(2) 𝐴 is called false-biased if it never returns false negatives. So, if on input of
𝑠 ⃗ ∈ {0, 1}∗ the algorithm returns 0, then 𝑠 ⃗ ∉ 𝐿.
(3) If 𝐴 is true-biased or false-biased, then it is also called an algorithm with one-sided
error.
(4) 𝐴 is called an algorithm with two-sided error if it can return false positives and
false negatives.
Note that a false-biased algorithm can always be transformed into a true-biased
algorithm. We only need to replace the language to be decided by its complement in
{0, 1}∗ and swap the outputs 0 and 1.
Example 1.2.14. Consider Algorithm 1.2.15, which decides whether the integer that
corresponds to a string in {0, 1}∗ is composite. On input of 𝑠 ⃗ ∈ {0, 1}∗ , the
algorithm computes the corresponding integer 𝑎 and calls 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋. If this subroutine
returns a proper divisor of 𝑎, then the algorithm returns 1. Otherwise, it returns 0.
This is a true-biased Monte Carlo decision algorithm. If it returns 1, then 𝑠 ⃗ represents
a composite integer. But if it returns 0, then the integer represented by 𝑠 ⃗ may or may
not be composite.

Algorithm 1.2.15. True-biased Monte Carlo compositeness decision algorithm


Input: 𝑠 ⃗ ∈ {0, 1}∗
Output: 1 if stringToInt(𝑠)⃗ is composite and 0 otherwise
1: 𝗆𝖼𝖢𝗈𝗆𝗉𝗈𝗌𝗂𝗍𝖾(𝑠)⃗
2: 𝑎 ← stringToInt(𝑠)⃗
3: 𝑏 ← 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋(𝑎)
4: if 𝑏 ∈ ℕ then
5: return 1
6: else
7: return 0
8: end if
9: end

Example 1.2.16. Algorithm 1.2.17 is a somewhat artificial example of a Monte Carlo
decision algorithm with two-sided error. On input of 𝑠 ⃗ ∈ {0, 1}∗ it computes 𝑎 =
stringToInt(𝑠).⃗ Then it tosses a coin, calls 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋, and returns 1 if the coin toss gives
1 or 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋(𝑎) returns a proper divisor of 𝑎. Otherwise, it returns 0. The algorithm

returns a false negative answer if 𝑎 is composite and coinToss and 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋 both return
0. Also, it returns a false positive answer if 𝑎 is a prime number and coinToss gives 1.

Algorithm 1.2.17. Monte Carlo compositeness decision algorithm with two-sided er-
ror
Input: 𝑠 ⃗ ∈ {0, 1}∗
Output: 1 if stringToInt(𝑠)⃗ is composite and 0 otherwise
1: 𝗆𝖼𝖢𝗈𝗆𝗉𝗈𝗌𝗂𝗍𝖾2(𝑠)⃗
2: 𝑎 ← stringToInt(𝑠)⃗
3: 𝑐 ← coinToss
4: 𝑏 ← 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋(𝑎)
5: if 𝑐 = 1 ∨ 𝑏 ∈ ℕ then
6: return 1
7: else
8: return 0
9: end if
10: end
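Algorithm 1.2.17 in Python (a sketch reusing a rendering of the mc_factor trial from Algorithm 1.2.8; the input string is taken in binary, and the decision algorithm always outputs 0 or 1):

```python
import random

def mc_factor(a):
    """Monte Carlo factoring trial in the spirit of Algorithm 1.2.8."""
    b = random.getrandbits((a.bit_length() + 1) // 2)
    return b if 1 < b < a and a % b == 0 else None

def mc_composite2(s):
    """Two-sided-error compositeness test: return 1 if a coin toss gives 1
    or a factoring trial finds a proper divisor, else 0."""
    a = int(s, 2)               # stringToInt
    c = random.getrandbits(1)   # coinToss
    b = mc_factor(a)
    return 1 if c == 1 or b is not None else 0
```

A prime input still yields 1 whenever the coin toss gives 1, which is the source of the false positives.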

1.3. Analysis of probabilistic algorithms


This section discusses the analysis of the time complexity and success probability of
probabilistic algorithms.

1.3.1. A discrete probability space. Our first goal is to define a discrete prob-
ability space that is the basis of the analyses. In this section, 𝐴 denotes a probabilistic
algorithm. We first introduce some notation.
Consider a run 𝑅 of 𝐴 with input 𝑎 ∈ Input(𝐴) and let 𝑙 ∈ ℕ0 ∪ {∞} be the num-
ber of probabilistic subroutine calls in 𝑅. For instance, in Algorithm 1.2.2 we have
Input(𝐴) = ℕ and for 𝑎 ∈ ℕ it holds that 𝑙 = 𝑎. In contrast, in Algorithm 1.2.10, the
number 𝑙 of probabilistic subroutine calls may be infinite.
For all 𝑘 ∈ ℕ, 𝑘 ≤ 𝑙, let 𝑎𝑘 be the input of the 𝑘th probabilistic subroutine call in 𝑅 if
this subroutine requires an input, let 𝑟 𝑘 be its output, and let 𝑝 𝑘 be the probability that
on input of 𝑎𝑘 the output 𝑟 𝑘 occurs. These quantities are well-defined since we require
that probabilistic algorithms may only use probabilistic subroutines that on any input
terminate and return one of finitely many possible outputs according to some probability
distribution. For example, for the probabilistic subroutine coinToss there is no
input, the output is 0 or 1, and the probability of each output is 1/2. We call 𝑟 ⃗ = (𝑟𝑘)_{𝑘≤𝑙}
the random sequence of the run 𝑅. So the random sequence of a run of randomInt with
input 𝑎 ∈ ℕ is in {0, 1}^𝑎 .
We denote the set of all random sequences of runs of 𝐴 with input 𝑎 by Rand(𝐴, 𝑎)
and the set of finite strings in Rand(𝐴, 𝑎) by FRand(𝐴, 𝑎). So for 𝐴 = randomInt
and 𝑎 ∈ ℕ we have Rand(𝐴, 𝑎) = FRand(𝐴, 𝑎) = {0, 1}𝑎 . We note that for any
𝑎 ∈ Input(𝐴), each 𝑟 ⃗ ∈ Rand(𝐴, 𝑎) is the random sequence of exactly one run of 𝐴. We

call it the run of 𝐴 corresponding to 𝑟.⃗ This run terminates if and only if 𝑟 ⃗ ∈ FRand(𝐴, 𝑎)
in which case we write 𝐴(𝑎, 𝑟)⃗ for the return value of this run.
Finally, let 𝑘 ∈ ℕ0 , 𝑘 ≤ 𝑙, and let 𝑟 ⃗ = (𝑟0 , . . . , 𝑟 𝑘−1 ) be a prefix of a random sequence
of a run of 𝐴 with input 𝑎. Also, for 0 ≤ 𝑖 < 𝑘 denote by 𝑝 𝑖 the probability for the return
value 𝑟𝑖 to occur. Then we set
(1.3.1) Pr𝐴,𝑎(𝑟)⃗ = ∏_{𝑖=0}^{𝑘−1} 𝑝𝑖 .

This is the probability that 𝑟 ⃗ occurs as the prefix of the random sequence of a run of 𝐴
with input 𝑎. For instance, if 𝐴 = randomInt and 𝑎 ∈ ℕ, then for all 𝑘 ∈ ℕ0 with 𝑘 ≤ 𝑎
and all 𝑟 ⃗ ∈ {0, 1}^𝑘 , we have Pr𝐴,𝑎(𝑟)⃗ = 1/2^𝑘 .
Exercise 1.3.1. Determine Rand(𝐴, 𝑎), FRand(𝐴, 𝑎), and Pr𝐴,𝑎 for 𝐴 = 𝗅𝗏𝖥𝖺𝖼𝗍𝗈𝗋 spec-
ified in Algorithm 1.2.10 and 𝑎 ∈ Input(𝐴) = ℕ>1 .

The next lemma allows the definition of the probability distribution that we are
looking for.
Lemma 1.3.2. Let 𝑎 ∈ Input(𝐴). The (possibly infinite) sum
(1.3.2) ∑_{𝑟⃗ ∈ FRand(𝐴,𝑎)} Pr𝐴,𝑎(𝑟)⃗

converges, its limit is in the interval [0, 1], and it is independent of the ordering of the terms
in the sum.

Proof. First, note the following. If the sum in (1.3.2) is convergent, then it is absolutely
convergent since the terms in the sum are nonnegative. So Theorem C.1.4 implies that
its limit is independent of the ordering of the terms in the sum.
To prove the convergence of the sum, set
(1.3.3) 𝑡𝑘 = ∑_{𝑟⃗ ∈ FRand(𝐴,𝑎), |𝑟|⃗ ≤ 𝑘} Pr𝐴,𝑎(𝑟)⃗

for all 𝑘 ∈ ℕ0 . Then the sum in (1.3.2) is convergent if and only if the sequence (𝑡 𝑘 ) con-
verges. For 𝑘 ∈ ℕ0 let Rand𝑘 be the set of all prefixes of length at most 𝑘 of sequences
in Rand(𝐴, 𝑎). We will prove below that
(1.3.4) ∑_{𝑟⃗ ∈ Rand𝑘} Pr𝐴,𝑎(𝑟)⃗ = 1

for all 𝑘 ∈ ℕ0 . This implies


(1.3.5) 𝑡𝑘 ≤ ∑ Pr𝐴,𝑎 (𝑟)⃗ = 1

𝑟∈Rand 𝑘

for all 𝑘 ∈ ℕ0 . Since the elements of the sequence (𝑡 𝑘 ) are nondecreasing this proves
the convergence of (𝑡 𝑘 ) and thus of the infinite sum (1.3.2).
We will now prove (1.3.4) by induction on 𝑘. Since Rand0 only contains the empty
sequence, (1.3.4) holds for 𝑘 = 0. For the inductive step, assume that 𝑘 ∈ ℕ0 and

that (1.3.4) holds for 𝑘. Denote by Rand′𝑘 the set of all sequences of length at most 𝑘
in Rand(𝐴, 𝑎) and denote by Rand″𝑘 the set of sequences of length 𝑘 that are proper
prefixes of strings in Rand(𝐴, 𝑎). For 𝑟 ⃗ ∈ Rand″𝑘 let 𝑚(𝑟)⃗ be the number of possible
outputs of the (𝑘 + 1)st call of a probabilistic subroutine when the sequence of return
values of the previous calls was 𝑟,⃗ let 𝑟 𝑖 (𝑟)⃗ be the 𝑖th of these outputs, and let 𝑝 𝑖 (𝑟)⃗ be its
probability. These quantities exist by the definition of probabilistic algorithms. Then
we have
(1.3.6) Rand𝑘+1 = Rand′𝑘 ∪ {𝑟⃗‖𝑟𝑖(𝑟)⃗ ∶ 𝑟 ⃗ ∈ Rand″𝑘 and 1 ≤ 𝑖 ≤ 𝑚(𝑟)⃗}.

Also, we have
(1.3.7) ∑_{𝑖=1}^{𝑚(𝑟⃗)} 𝑝𝑖(𝑟)⃗ = 1
for all 𝑟 ⃗ ∈ Rand″𝑘 . This implies

(1.3.8)
∑_{𝑟⃗ ∈ Rand𝑘+1} Pr𝐴,𝑎(𝑟)⃗ = ∑_{𝑟⃗ ∈ Rand′𝑘} Pr𝐴,𝑎(𝑟)⃗ + ∑_{𝑟⃗ ∈ Rand″𝑘} ∑_{𝑖=1}^{𝑚(𝑟⃗)} Pr𝐴,𝑎(𝑟⃗‖𝑟𝑖(𝑟)⃗)
= ∑_{𝑟⃗ ∈ Rand′𝑘} Pr𝐴,𝑎(𝑟)⃗ + ∑_{𝑟⃗ ∈ Rand″𝑘} Pr𝐴,𝑎(𝑟)⃗ ∑_{𝑖=1}^{𝑚(𝑟⃗)} 𝑝𝑖(𝑟)⃗
= ∑_{𝑟⃗ ∈ Rand′𝑘} Pr𝐴,𝑎(𝑟)⃗ + ∑_{𝑟⃗ ∈ Rand″𝑘} Pr𝐴,𝑎(𝑟)⃗
= ∑_{𝑟⃗ ∈ Rand𝑘} Pr𝐴,𝑎(𝑟)⃗ = 1. □

Lemma 1.3.2 allows the definition of the probability distribution that we are look-
ing for. This is done in the following proposition.

Proposition 1.3.3. For every 𝑎 ∈ Input(𝐴) we define

(1.3.9) Pr𝐴,𝑎(∞) = 1 − ∑_{𝑟⃗ ∈ FRand(𝐴,𝑎)} Pr𝐴,𝑎(𝑟)⃗.

Then (FRand(𝐴, 𝑎) ∪ {∞}, Pr𝐴,𝑎 ) is a discrete probability space. Also, if Pr𝐴,𝑎 (∞) = 0,
then (FRand(𝐴, 𝑎), Pr𝐴,𝑎 ) is a discrete probability space.

Exercise 1.3.4. Prove Proposition 1.3.3.

For 𝑎 ∈ Input(𝐴) and 𝑟 ⃗ ∈ FRand(𝐴, 𝑎), the value Pr𝐴,𝑎(𝑟)⃗ is the probability that
𝑟 ⃗ is the random sequence of a run of 𝐴 with input 𝑎. Also, Pr𝐴,𝑎 (∞) is the probability
that on input of 𝑎, the algorithm 𝐴 does not terminate.
An important type of algorithms 𝐴 that satisfy Pr𝐴,𝑎(∞) = 0 for all 𝑎 ∈ Input(𝐴)
is Monte Carlo algorithms. We now show that they are exactly the probabilistic algorithms
that, according to the specification in Section 1.2.1, can be called by probabilistic
algorithms as subroutines.

Proposition 1.3.5. Let 𝐴 be a Monte Carlo algorithm and let 𝑎 ∈ Input(𝐴). Then the
following hold.
(1) The running time of 𝐴 on input of 𝑎 is bounded by some 𝑘 ∈ ℕ that may depend on
𝑎.
(2) On input of 𝑎, algorithm 𝐴 returns one of finitely many possible outputs according
to a probability distribution.

Proof. We first show that the length of all 𝑟 ⃗ ∈ FRand(𝐴, 𝑎) is bounded by some 𝑘 ∈ ℕ.
This shows that there are only finitely many possible runs of 𝐴 on input of 𝑎 which
implies the first assertion.
Assume that no such upper bound exists. We inductively construct prefixes 𝑟⃗𝑘 =
(𝑟0 , . . . , 𝑟𝑘 ), 𝑘 ∈ ℕ0 , of an infinite sequence 𝑟 ⃗ = (𝑟0 , 𝑟1 , . . .) that are also prefixes of
arbitrarily long strings in Rand(𝐴, 𝑎); that is, for all 𝑘 ∈ ℕ0 and 𝑙 ∈ ℕ the sequence 𝑟 𝑘⃗
is a prefix of a sequence in Rand(𝐴, 𝑎) of length at least 𝑙. Then 𝑟 ⃗ is an infinite sequence
in Rand(𝐴, 𝑎) that contradicts the assumption that 𝐴 is a Monte Carlo algorithm.
For the base case, we set 𝑟0⃗ = (). This is a prefix of all strings in Rand(𝐴, 𝑎) that, by
our assumption, may be arbitrarily long. For the inductive step, assume that 𝑘 ∈ ℕ and
that we have constructed 𝑟 𝑘−1 ⃗ = (𝑟0 , . . . , 𝑟 𝑘−1 ). By the definition of probabilistic algo-
rithms, there are finitely many possibilities to select 𝑟 𝑘 in such a way that the sequence
(𝑟0 , . . . , 𝑟 𝑘 ) is the prefix of a string in Rand(𝐴, 𝑎). For at least one of these choices, this
sequence is a prefix of arbitrarily long strings in Rand(𝐴, 𝑎) because, by the induction
hypothesis, 𝑟 𝑘−1 ⃗ has this property. We select such an 𝑟 𝑘 and this concludes the induc-
tive construction and the proof of the first assertion.
Together with Proposition 1.3.3, the first assertion of the proposition implies the
second one. □

1.3.2. The success probability of Monte Carlo algorithms. In addition to
the running time, the probability of success is also crucial for the efficiency of a Monte
Carlo algorithm. We now define this probability.
Definition 1.3.6. Let 𝐴 be a Monte Carlo algorithm, let 𝑎 ∈ Input(𝐴), and denote by
Randsucc (𝐴, 𝑎) the set of all 𝑟 ⃗ ∈ Rand(𝐴, 𝑎) such that the run of 𝐴 associated with 𝑟 ⃗ is
successful; i.e., 𝐴(𝑎, 𝑟)⃗ ∈ Output(𝐴, 𝑎). Then we set
(1.3.10) 𝑝𝐴(𝑎) = ∑_{𝑟⃗ ∈ Randsucc(𝐴,𝑎)} Pr𝐴,𝑎(𝑟)⃗

and call this quantity the success probability of 𝐴 on input of 𝑎. Also, the value
(1.3.11) 𝑞𝐴 (𝑎) = 1 − 𝑝𝐴 (𝑎)
is called the failure probability of 𝐴 on input of 𝑎.

Exercise 1.3.7. Prove that for all 𝑎 ∈ Input(𝐴), the sum in (1.3.10) is convergent and
its limit is independent of the ordering of the terms in the sum.
Example 1.3.8. Let 𝐴 = 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋 specified in Algorithm 1.2.8 and let 𝑎 ∈ Input(𝐴) =
ℕ>1 . Then Randsucc (𝐴, 𝑎) is the set of all sequences (𝑏) where 𝑏 is a proper divisor of

𝑎 of bitlength at most m(𝑎). By Exercise 1.2.7, this set is not empty. Therefore, the
success probability 𝑝𝐴(𝑎) of 𝐴 on input of 𝑎 is at least 1/2^{m(𝑎)} .

We can use the definition of the success probability to show that Bernoulli algo-
rithms terminate with probability 1.
Proposition 1.3.9. Let 𝐴 be a Bernoulli algorithm. Then we have Pr𝐴,𝑎 (∞) = 0 for all
𝑎 ∈ Input(𝐴).

Proof. Denote by 𝐴′ the error-free Monte Carlo algorithm used in 𝐴. Let 𝑎 ∈ Input(𝐴′ ).
Then FRand(𝐴, 𝑎) consists of all strings 𝑟 ⃗ = 𝑟1⃗ || ⋯ ||𝑟 𝑘⃗ where 𝑘 ∈ ℕ, 𝑟 𝑖⃗ ∈ Rand(𝐴′ , 𝑎)
for 1 ≤ 𝑖 ≤ 𝑘, 𝐴′ (𝑎, 𝑟 𝑖⃗ ) = “Failure” for 1 ≤ 𝑖 < 𝑘, and 𝐴′ (𝑎, 𝑟 𝑘⃗ ) ≠ “Failure”. So we
obtain
(1.3.12) ∑_{𝑟⃗ ∈ FRand(𝐴,𝑎)} Pr𝐴,𝑎(𝑟)⃗ = 𝑝𝐴′(𝑎) ∑_{𝑘=0}^{∞} (1 − 𝑝𝐴′(𝑎))^𝑘 = 𝑝𝐴′(𝑎)/𝑝𝐴′(𝑎) = 1.

This implies the assertion. □

1.3.3. Expected running time. The probability space defined in Proposition
1.3.3 also allows the definition of the expected running time of random algorithms.
Definition 1.3.10. Let 𝐴 be a probabilistic algorithm and let 𝑎 ∈ Input(𝐴) such that
Pr𝐴,𝑎 (∞) = 0. Then the expected running time of 𝐴 on input of 𝑎 is defined as the
expectation of the random variable time𝐴,𝑎 that sends 𝑟 ⃗ ∈ FRand(𝐴, 𝑎) to the running
time of the algorithm run associated with 𝑟.⃗ It is denoted by eTime𝐴 (𝑎). So we have
(1.3.13) eTime𝐴(𝑎) = ∑_{𝑟⃗ ∈ FRand(𝐴,𝑎)} time𝐴,𝑎(𝑟)⃗ Pr𝐴,𝑎(𝑟)⃗.

Example 1.3.11. Let 𝐴 = 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋 which is specified in Algorithm 1.2.8 and let 𝑎 ∈
Input(𝐴). Then FRand(𝐴, 𝑎) is the set of all one-element sequences (𝑏), where 𝑏 is
an integer that can be represented by a bit string of length m(𝑎). So | FRand(𝐴, 𝑎)| ≤
2^{m(𝑎)} . Also, by Proposition 1.3.5 we have Pr𝐴,𝑎(∞) = 0. So eTime𝐴(𝑎) is defined. Since
each run of 𝐴 on input 𝑎 has running time O(size² 𝑎), we have
(1.3.14) eTime𝐴(𝑎) = O(size² 𝑎 ∑_{𝑟⃗ ∈ FRand(𝐴,𝑎)} 1/2^{m(𝑎)}) = O(size² 𝑎).
So the expected running time is quadratic. However, the success probability of 𝐴 on
input of 𝑎 may be as small as 1/2^{m(𝑎)} . As we will see in Section 1.3.4, this success
probability can be amplified by repeatedly calling 𝐴. But in order to obtain success
probability ≥ 2/3, exponential expected running time is required.

The next proposition determines the expected running time of Bernoulli algorithms.
Proposition 1.3.12. Let 𝐴 be an error-free Monte Carlo algorithm, let 𝑎 ∈ Input(𝐴),
and let 𝑡 be an upper bound on the running time of 𝐴 with input of 𝑎. Then the expected
running time of 𝖻𝖾𝗋𝗇𝗈𝗎𝗅𝗅𝗂𝐴 (𝑎) specified in Algorithm 1.2.11 is O(𝑡/𝑝𝐴 (𝑎)).

Proof. We use the fact that for all 𝑐 ∈ ℝ with 0 ≤ 𝑐 < 1 we have
(1.3.15) ∑_{𝑘=0}^{∞} 𝑘𝑐^𝑘 = 𝑐/(1 − 𝑐)².
So the expected number of calls of 𝐴 until 𝖻𝖾𝗋𝗇𝗈𝗎𝗅𝗅𝗂𝐴(𝑎) is successful is
(1.3.16) 𝑝𝐴(𝑎) ∑_{𝑘=1}^{∞} 𝑘𝑞𝐴(𝑎)^{𝑘−1} = 𝑝𝐴(𝑎)/𝑝𝐴(𝑎)² = 1/𝑝𝐴(𝑎).
The statement about the expected running time is an immediate consequence of this
result. □
Example 1.3.13. Proposition 1.3.12 allows the analysis of 𝗅𝗏𝖥𝖺𝖼𝗍𝗈𝗋 specified in Algorithm
1.2.10. Let 𝑛 ∈ ℕ and let 𝑎 ∈ ℕ>1 be an input of size 𝑛. It follows from Example
1.3.8 that the success probability of 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋(𝑎) is at least 1/2^{m(𝑎)} ≥ 1/2^{𝑛/2+1} . Also,
the worst-case running time of 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋(𝑎) is O(𝑛²). It therefore follows from Proposition
1.3.12 that the expected running time of 𝗅𝗏𝖥𝖺𝖼𝗍𝗈𝗋(𝑎) is O(𝑛² 2^{𝑛/2}). So the expected
running time is exponential, which shows that this probabilistic algorithm has no advantage
over the deterministic Algorithm 1.1.21.

1.3.4. Amplifying success probabilities. In Example 1.3.8, we have seen that
the success probability of 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋 specified in Algorithm 1.2.8 with input 𝑎 ∈ ℕ>1 is at
least 1/2^{m(𝑎)} . We now explain how this success probability and the success probability
of every error-free Monte Carlo algorithm 𝐴 can be amplified by repeatedly calling it
with the same input. Algorithm 1.3.14 implements this idea.

Algorithm 1.3.14. Repeated application of an error-free Monte Carlo algorithm


Input: 𝑎 ∈ Input(𝐴) for an error-free Monte Carlo algorithm 𝐴 used as a subroutine,
𝑘∈ℕ
Output: 𝑏 ∈ Output(𝐴, 𝑎)
1: 𝗋𝖾𝗉𝖾𝖺𝗍𝐴 (𝑎, 𝑘)
2: for 𝑖 = 1 to 𝑘 do
3: 𝑏 ← 𝐴(𝑎)
4: if 𝑏 ≠ “Failure” then
5: return 𝑏
6: end if
7: end for
8: return “Failure”
9: end
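Algorithm 1.3.14 as a Python sketch, demonstrated with a toy error-free Monte Carlo subroutine that succeeds with probability 1/2 (both names are illustrative, not from the text):

```python
import random

def repeat(algo, a, k):
    """Run the error-free Monte Carlo algorithm `algo` at most k times on
    input a and return the first non-failure output, or None ("Failure")."""
    for _ in range(k):
        b = algo(a)
        if b is not None:
            return b
    return None

def lucky_identity(a):
    """Toy error-free subroutine: returns its input with probability 1/2."""
    return a if random.random() < 0.5 else None
```

With 𝑘 = 50 repetitions of this toy subroutine the failure probability drops to 2^{−50}, matching the exponential decay established next.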

Definition 1.3.15. Let 𝑎 ∈ Input(𝐴). We denote the success probability of 𝗋𝖾𝗉𝖾𝖺𝗍𝐴 (𝑎, 𝑘)
by 𝑝𝐴 (𝑎, 𝑘) and the failure probability of this call by 𝑞𝐴 (𝑎, 𝑘) = 1 − 𝑝𝐴 (𝑎, 𝑘).

The next proposition shows that 𝑞𝐴 (𝑎, 𝑘) decreases exponentially in 𝑘.


Proposition 1.3.16. Let 𝑘 ∈ ℕ, and let 𝑎 ∈ Input(𝐴) with 𝑝𝐴 (𝑎) < 1. Then we have the
following:
(1.3.17) 𝑒^{−𝑘𝑝𝐴(𝑎)/𝑞𝐴(𝑎)} ≤ 𝑞𝐴(𝑎, 𝑘) ≤ 𝑒^{−𝑘𝑝𝐴(𝑎)} .

Proof. Write 𝑝 = 𝑝𝐴(𝑎) and 𝑞 = 𝑞𝐴(𝑎) = 1 − 𝑝. Then we have
(1.3.18) 𝑞𝐴(𝑎, 𝑘) = 𝑞^𝑘 .
Since we assume that 𝑞 > 0, it follows from [Abr72, (4.1.33) and (4.1.36)] that
(1.3.19) 1 − 1/𝑞 ≤ log 𝑞 ≤ 𝑞 − 1.
This implies
(1.3.20) 𝑘 log 𝑞 ≤ 𝑘(𝑞 − 1) = −𝑘𝑝
and
(1.3.21) 𝑘 log 𝑞 ≥ 𝑘(1 − 1/𝑞) = 𝑘(𝑞 − 1)/𝑞 = −𝑘𝑝/𝑞.
So (1.3.20) and (1.3.21) imply
(1.3.22) 𝑒^{−𝑘𝑝/𝑞} ≤ 𝑞^𝑘 ≤ 𝑒^{−𝑘𝑝}
as asserted. □
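The two-sided bound (1.3.17) can be sanity-checked numerically (an illustration only, not part of the proof; tail_bounds is a made-up helper name):

```python
import math

def tail_bounds(p, k):
    """Return (lower, actual, upper) for the failure probability
    q_A(a, k) = q^k with q = 1 - p, as bounded in (1.3.17)."""
    q = 1.0 - p
    return math.exp(-k * p / q), q**k, math.exp(-k * p)
```

For example, tail_bounds(0.25, 10) sandwiches 0.75**10 ≈ 0.056 between e^{−10/3} and e^{−2.5}.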

The next corollary shows how to choose 𝑘 in order to obtain a desired success prob-
ability. It also gives a lower bound for 𝑘 that corresponds to a given success probability.
Corollary 1.3.17. Let 𝑎 ∈ Input(𝐴) with 𝑝𝐴 (𝑎) > 0 and let 𝜀 ∈ ℝ with 0 < 𝜀 ≤ 1.
(1) If 𝑘 ≥ log(1/𝜀)/𝑝𝐴 (𝑎), then 𝑝𝐴 (𝑎, 𝑘) ≥ 1 − 𝜀.
(2) If 𝑝𝐴 (𝑎, 𝑘) ≥ 1 − 𝜀, then 𝑘 ≥ log(1/𝜀)𝑞𝐴 (𝑎)/𝑝𝐴 (𝑎).
Exercise 1.3.18. Prove Corollary 1.3.17.
Example 1.3.19. Consider 𝐴 = 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋 as specified in Algorithm 1.2.8. In Example
1.3.8, we have seen that 𝑝𝐴(𝑎) ≥ 1/2^{m(𝑎)} > 0 for all 𝑎 ∈ ℤ>1 . So 𝗋𝖾𝗉𝖾𝖺𝗍𝐴 can be used
to amplify this probability. For example, if we choose 𝜀 = 1/3 and 𝑘 ≥ (log 3)2^{m(𝑎)} ≥
log(1/𝜀)/𝑝𝐴(𝑎), then Corollary 1.3.17 implies 𝑝𝐴(𝑎, 𝑘) ≥ 2/3. Since
m(𝑎) ≥ bitLength(𝑎)/2,
this number 𝑘 of calls to 𝐴 is exponential in size 𝑎. Therefore, again, this algorithm
does not give an asymptotic advantage over the deterministic Algorithm 1.1.21.

We can also amplify the success probability of decision algorithms with errors.
Consider a true-biased decision algorithm 𝐴 that decides a language 𝐿. We can mod-
ify this algorithm to make it an error-free Monte Carlo algorithm. For this, we set
Output(𝐴, 𝑠)⃗ = {1} for all 𝑠 ⃗ ∈ 𝐿, Output(𝐴, 𝑠)⃗ = ∅ for all 𝑠 ⃗ ∈ {0, 1}∗ ⧵ 𝐿 and we re-
place the return value 0 by “Failure”. So, the success probability of 𝐴 can be amplified
using Algorithm 1.3.14. Analogously, the success probability of false-biased decision
algorithms can be amplified.
Next, we consider a Monte Carlo decision algorithm 𝐴 with two-sided error that
decides a language 𝐿. Such an algorithm never gives certainty about whether an input
𝑠 ⃗ ∈ {0, 1}∗ belongs to 𝐿 or not. However, the probability of success can be increased
by using a majority vote. To do this, we run the algorithm 𝑘 times with input 𝑠 ⃗ for
some 𝑘 ∈ ℕ and count the number of positive responses 1 and the number of negative

answers 0 and return 1 or 0 depending on which answer has the majority. This is done
in Algorithm 1.3.20.

Algorithm 1.3.20. Success probability amplifier for a Monte Carlo decision algorithm
𝐴 with two-sided error
Input: 𝑠 ⃗ ∈ {0, 1}∗ , 𝑘 ∈ ℕ
Output: 1 if 𝑠 ⃗ ∈ 𝐿 and 0 if 𝑠 ⃗ ∈ {0, 1}∗ ⧵ 𝐿 where 𝐿 is the language decided by the Monte
Carlo decision algorithm 𝐴 that is used as a subroutine
1: 𝗆𝖺𝗃𝗈𝗋𝗂𝗍𝗒𝖵𝗈𝗍𝖾𝐴 (𝑠,⃗ 𝑘)
2: 𝑙=0
3: for 𝑖 = 1 to 𝑘 do
4: 𝑙 ← 𝑙 + 𝐴(𝑠)⃗
5: end for
6: if 𝑙 > 𝑘/2 then
7: return 1
8: else
9: return 0
10: end if
11: end
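Algorithm 1.3.20 in Python, with a toy two-sided-error subroutine that decides evenness correctly with probability 3/4 (noisy_is_even is invented for illustration):

```python
import random

def majority_vote(algo, s, k):
    """Run the decision algorithm k times on s and return the majority answer."""
    ones = sum(algo(s) for _ in range(k))
    return 1 if ones > k / 2 else 0

def noisy_is_even(s):
    """Toy decision algorithm for the language of even numbers in binary:
    answers correctly with probability 3/4."""
    correct = 1 if int(s, 2) % 2 == 0 else 0
    return correct if random.random() < 0.75 else 1 - correct
```

By the proposition below, with 𝜀 = 1/4 the majority vote errs with probability below e^{−𝑘/8}.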

We will show that under certain conditions, Algorithm 1.3.20 amplifies the success
probability of decision algorithms with two-sided error. For this, we need the following
definition.
Definition 1.3.21. Assume that a Monte Carlo algorithm 𝐴 decides a language 𝐿, let
𝑠 ⃗ ∈ 𝐿, and let 𝑏 ∈ {0, 1}. Then we write Pr(𝐴(𝑠)⃗ = 𝑏) for the probability that on input
of 𝑠 ⃗ the algorithm 𝐴 returns 𝑏.
Proposition 1.3.22. Let 𝐴 be a Monte Carlo algorithm that decides a language 𝐿, let
𝑠 ⃗ ∈ 𝐿, and let 𝜀 ∈ ℝ>0 such that Pr(𝐴(𝑠)⃗ = 1) ≥ 1/2 + 𝜀. Then for all 𝑘 ∈ ℕ we have
(1.3.23) Pr(𝗆𝖺𝗃𝗈𝗋𝗂𝗍𝗒𝖵𝗈𝗍𝖾𝐴(𝑠,⃗ 𝑘) = 1) > 1 − 𝑒^{−2𝑘𝜀²} .

Proof. Let 𝑘 ∈ ℕ. We prove the assertion by showing that
(1.3.24) 𝑞 = Pr(𝗆𝖺𝗃𝗈𝗋𝗂𝗍𝗒𝖵𝗈𝗍𝖾𝐴(𝑠,⃗ 𝑘) = 0) < 𝑒^{−2𝑘𝜀²} .
Consider a run of 𝗆𝖺𝗃𝗈𝗋𝗂𝗍𝗒𝖵𝗈𝗍𝖾𝐴 with input 𝑠,⃗ 𝑘. Denote by 𝑟 ⃗ ∈ {0, 1}^𝑘 the random
output sequence corresponding to this run. Then 𝗆𝖺𝗃𝗈𝗋𝗂𝗍𝗒𝖵𝗈𝗍𝖾𝐴 returns 0 if and only
if at most 𝑘/2 entries in 𝑟 ⃗ are 1. The probability for such an 𝑟 ⃗ to occur is at most
(1.3.25) (1/2 − 𝜀)^{𝑘/2} (1/2 + 𝜀)^{𝑘/2} = (1 − 4𝜀²)^{𝑘/2} / 2^𝑘 .
Since the number of such sequences 𝑟 ⃗ is at most 2^𝑘 and because by [Abr72, (4.2.30)]
we have 1 − 𝑥 < 𝑒^{−𝑥} for all 𝑥 > −1, it follows that
(1.3.26) 𝑞 ≤ (1 − 4𝜀²)^{𝑘/2} < 𝑒^{−2𝑘𝜀²} .
This concludes the proof. □

Example 1.3.23. Let 𝐴 = 𝗆𝖼𝖢𝗈𝗆𝗉𝗈𝗌𝗂𝗍𝖾2 specified in Algorithm 1.2.17 which is the
Monte Carlo compositeness test with two-sided error and let 𝑠 ⃗ ∈ {0, 1}∗ such that
𝑎 = stringToInt(𝑠)⃗ is composite. The call 𝐴(𝑠)⃗ returns 1 if the first coin toss gives
1, which happens with probability 1/2, or 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋(𝑎) returns a proper divisor of 𝑎 =
stringToInt(𝑠)⃗, which occurs with probability ≥ 1/2^𝑚 where 𝑚 = m(𝑎). Therefore, we
have
(1.3.27) Pr(𝐴(𝑠)⃗ = 1) ≥ 1 − (1/2)(1 − 1/2^𝑚) = 1/2 + 1/2^{𝑚+1} .
So in Proposition 1.3.22 we can set 𝜀 = 1/2^{𝑚+1} and obtain
(1.3.28) Pr(𝗆𝖺𝗃𝗈𝗋𝗂𝗍𝗒𝖵𝗈𝗍𝖾𝐴(𝑠,⃗ 𝑘) = 1) ≥ 1 − 𝑒^{−2𝑘𝜀²} = 1 − 𝑒^{−𝑘/2^{2𝑚+1}}
for all 𝑘 ∈ ℕ.

Exercise 1.3.24. Use the result in Example 1.3.23 to determine 𝑘 such that
Pr(𝗆𝖺𝗃𝗈𝗋𝗂𝗍𝗒𝖵𝗈𝗍𝖾𝐴(𝑠,⃗ 𝑘) = 1) ≥ 2/3.

1.4. Complexity theory


Classical complexity theory allows us to assess the efficiency of algorithms and the
difficulty of solving computational problems. In this section, we present important
notions and results of this theory. They are required as a basis for quantum complexity
theory.

1.4.1. Computational problems. An important question in complexity theory
is: How efficiently can a computational problem be solved? In cryptography, for ex-
ample, it is crucial to know how quickly an encryption system can be broken. That is
why we start by defining computational problems.

Definition 1.4.1. A computational problem is a triplet CP = (𝐼, 𝑂, 𝑅) where 𝐼 and 𝑂
are subsets of Cartesian products of finitely many data types and 𝑅 ⊂ 𝐼 × 𝑂 is such
that for all 𝑎 ∈ 𝐼 there is 𝑏 ∈ 𝑂 with (𝑎, 𝑏) ∈ 𝑅.

Definition 1.4.2. Let CP = (𝐼, 𝑂, 𝑅) be a computational problem.


(1) The elements of 𝐼 are called the instances of CP.
(2) If (𝑎, 𝑏) ∈ 𝑅, then 𝑏 is called a solution of the problem instance 𝑎.

Example 1.4.3. By the square root problem we mean the triplet (ℕ, ℤ, 𝑅) where 𝑅 =
{(𝑎, 𝑏) ∈ ℕ × ℤ ∶ 𝑏² = 𝑎, or 𝑏 = 0 if 𝑎 is not a square in ℕ}. An instance of the square
root problem is 4. It has the two solutions −2 and 2. Another instance is 2. It has the
solution 0 that indicates that 2 is not a square in ℕ. We can also define this problem
differently by only allowing problem instances that are squares.

We define what it means that an algorithm solves a computational problem.


Definition 1.4.4. Let CP = (𝐼, 𝑂, 𝑅) be a computational problem.
(1) We say that a deterministic algorithm 𝐴 solves CP if 𝐼 ⊂ Input(𝐴) and on input of
a problem instance 𝑎 ∈ 𝐼 the algorithm returns a solution of 𝑎.
(2) We say that a Monte Carlo algorithm 𝐴 solves CP if 𝐼 ⊂ Input(𝐴) and on input of
𝑎 ∈ 𝐼 a successful run of 𝐴 returns a solution of 𝑎.
(3) We say that a Las Vegas algorithm solves CP if 𝐼 ⊂ Input(𝐴) and on input of 𝑎 ∈ 𝐼
the algorithm either terminates and returns a solution of 𝑎 or does not terminate.
Example 1.4.5. The gcd problem is the triplet CP = (ℤ², ℕ₀, 𝑅) where 𝑅 = {((𝑎, 𝑏), 𝑐) ∈
ℤ² × ℕ₀ ∶ 𝑐 = gcd(𝑎, 𝑏)}. An instance of this problem is (100, 35). The unique solution of
this instance is 5. Also, the Euclidean algorithm solves this problem.
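For concreteness, the Euclidean algorithm can be written in a few lines; this Python sketch (ours, not the book's pseudocode) uses the classical remainder iteration.

```python
def euclid_gcd(a, b):
    """Euclidean algorithm: repeatedly replace (a, b) by (b, a mod b)."""
    a, b = abs(a), abs(b)
    while b != 0:
        a, b = b, a % b
    return a

# The instance (100, 35) has the unique solution 5.
print(euclid_gcd(100, 35))  # -> 5
```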
Exercise 1.4.6. Find a deterministic algorithm that solves the square root problem
from Example 1.4.3 in polynomial time.
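One possible approach to Exercise 1.4.6 (a sketch under our own conventions, not the intended solution) is binary search on the candidate root, which needs only O(size 𝑎) iterations of polynomial-time arithmetic.

```python
def square_root(a):
    """Solve the square root problem of Example 1.4.3:
    return b with b * b == a, or 0 if a is not a square in N."""
    lo, hi = 0, a
    while lo <= hi:
        mid = (lo + hi) // 2
        if mid * mid == a:
            return mid
        if mid * mid < a:
            lo = mid + 1
        else:
            hi = mid - 1
    return 0  # indicates that a is not a square
```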
Example 1.4.7. By the integer factorization problem we mean the triplet (𝐶, ℕ, 𝑅) where
𝐶 is the set of all positive composite integers and
𝑅 = {(𝑎, 𝑏) ∈ 𝐶 × ℕ ∶ 𝑏 is a proper divisor of 𝑎}.
Algorithms 1.1.21, 1.2.8, and 1.2.10 are deterministic, Monte Carlo, and Las Vegas
algorithms that solve the integer factorization problem.
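A deterministic solver by trial division illustrates the exponential-time case; this sketch is our own stand-in, not Algorithm 1.1.21 itself. Its running time O(√𝑎) is exponential in size 𝑎.

```python
def proper_divisor(a):
    """Return a proper divisor of the composite integer a by trial division."""
    d = 2
    while d * d <= a:
        if a % d == 0:
            return d
        d += 1
    raise ValueError("no proper divisor: a is prime or a < 4")
```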

1.4.2. Complexity of computational problems. We define the complexity of


computational problems.
Definition 1.4.8. Let CP be a computational problem and let 𝑓 ∶ ℕ → ℝ>0 be a
function.
(1) We say that CP can be solved in (deterministic) time O(𝑓) or has time complexity
O(𝑓) if there is a deterministic algorithm that solves CP and has running time
O(𝑓).
(2) We say that CP can be solved in (deterministic) linear, quasilinear, quadratic, cu-
bic, polynomial, subexponential, or exponential time or has this time complexity
if there is a deterministic algorithm that solves CP and has the respective time
complexity.
(3) The corresponding space complexities are defined analogously.

Example 1.4.9. As seen in Example 1.1.29, the gcd problem from Example 1.4.5 can
be solved in deterministic time O(𝑛³). As noted in this example, the gcd problem can
even be solved in deterministic time O(𝑛²) or O(𝑛 log² 𝑛) and linear space. Thus this
problem can be solved in polynomial time or, more precisely, cubic, quadratic, or even
quasilinear time.
Example 1.4.10. As seen in Example 1.1.31, the integer factorization problem can be
solved in deterministic exponential time and linear space.

Now we introduce the corresponding probabilistic complexity notions.



Definition 1.4.11. Let CP be a computational problem and let 𝑓 ∶ ℕ → ℝ>0 be a
function.
(1) We say that CP can be solved in probabilistic time O(𝑓) if there is a Monte Carlo
algorithm that solves CP and has running time O(𝑓) and success probability ≥ 2/3.
(2) We say that CP can be solved in probabilistic linear, quasilinear, quadratic, cu-
bic, polynomial, subexponential, or exponential time if there is a Monte Carlo algo-
rithm with the respective running time that solves CP and has success probability
≥ 2/3.
Exercise 1.4.12 shows that the value 2/3 in Definition 1.4.11 may be replaced by any
real number in ]1/2, 1] without changing the complexity of the computational problem.
Exercise 1.4.12. Let CP be a computational problem, let 𝑓 ∶ ℕ → ℝ>0 be a function,
and let 𝑝 ∈ ]1/2, 1]. Use Proposition 1.3.22 to show that CP can be solved in probabilistic
time O(𝑓) if and only if there is a Monte Carlo algorithm that solves CP in time O(𝑓)
and has success probability ≥ 𝑝.
Example 1.4.13. It follows from Example 1.3.19 that Algorithm 1.3.14 with subroutine
𝐴 = 𝗆𝖼𝖥𝖺𝖼𝗍𝗈𝗋(𝑎) and 𝑘 = ⌈(log 3) · 2^(m(𝑎))⌉ is an error-free Monte Carlo algorithm that
solves the integer factorization problem in probabilistic exponential time. We note
that this problem can even be solved in probabilistic subexponential time (see [LP92],
[BLP93]) but no classical polynomial time algorithm for this problem is known.

1.4.3. Complexity classes. In this section, we delve into the definition of com-
plexity classes, which serve to group languages that satisfy specific complexity con-
ditions. The foundation of this concept was laid in the early 1970s. Over the years,
complexity theory has witnessed the introduction of numerous complexity classes, and
extensive research has been conducted to study their interrelationships. For the scope
of this discussion, we will focus on a select few complexity classes that hold relevance
to our context.
We begin with the definition of the most basic complexity classes.
Definition 1.4.14. Let 𝑓 ∶ ℕ → ℝ>0 be a function.
(1) The complexity class DTIME(𝑓) is the set of all languages 𝐿 for which there is a
deterministic algorithm that decides 𝐿 and has time complexity O(𝑓).
(2) The complexity class DSPACE(𝑓) is the set of all languages 𝐿 for which there is a
deterministic algorithm that decides 𝐿 and has space complexity O(𝑓).

We also define the following more concrete complexity classes.


Definition 1.4.15. (1) The complexity class P is the set of all languages 𝐿 for which
there is a deterministic polynomial time algorithm which decides 𝐿.
(2) The complexity class PSPACE is the set of all languages 𝐿 for which there is a
deterministic polynomial space algorithm which decides 𝐿.
(3) The complexity class EXPTIME is the set of all languages 𝐿 for which there is a
deterministic exponential time algorithm which decides 𝐿.

Exercise 1.4.16. Consider the language 𝐿 of all strings that correspond to squares in
ℕ. Show that 𝐿 is in P.
Example 1.4.17. As shown in 2002 by Manindra Agrawal, Neeraj Kayal, and Nitin
Saxena [AKS04], the language 𝐿 of all bit strings that correspond to composite integers
is in P. Therefore, it can be decided in polynomial time whether a positive integer is
a prime number or composite. However, if the algorithm of Agrawal, Kayal, and Saxena
finds that a positive integer is composite, it does not give a proper divisor of this number.
Finding such a divisor appears to be a much harder problem (see Example 1.4.13).

We also define two probabilistic complexity classes.


Definition 1.4.18. (1) The complexity class PP is the set of all languages 𝐿 for which
there is a polynomial time Monte Carlo algorithm 𝐴 which decides 𝐿 and satisfies
Pr(𝐴(𝑠) = 1) > 1/2 for all 𝑠 ∈ 𝐿 and Pr(𝐴(𝑠) = 0) > 1/2 for all 𝑠 ∈ {0, 1}∗ ⧵ 𝐿.
(2) The complexity class BPP is the set of all languages 𝐿 for which there is a polyno-
mial time Monte Carlo algorithm 𝐴 which decides 𝐿 and satisfies Pr(𝐴(𝑠) = 1) ≥ 2/3
for all 𝑠 ∈ 𝐿 and Pr(𝐴(𝑠) = 0) ≥ 2/3 for all 𝑠 ∈ {0, 1}∗ ⧵ 𝐿.

We note that the constant 2/3 in Definition 1.4.18 can be replaced by any other con-
stant 𝑝 with 1/2 < 𝑝 ≤ 1. This flexibility is established by Proposition 1.3.22, which
asserts that when commencing with a success probability exceeding 1/2, Algorithm
1.3.20 can be employed to obtain a success probability arbitrarily close to 1.
Finally, we introduce the complexity class NP. We begin with a motivating exam-
ple.
Example 1.4.19. In 1742, the German mathematician Christian Goldbach wrote a
letter to Leonhard Euler in which he proposed the conjecture that every even integer
≥ 4 is the sum of two prime numbers. For instance, we have 4 = 2 + 2, 6 = 3 + 3,
and 8 = 3 + 5. To this day, this conjecture has neither been proven nor disproved by
a counterexample. In order to find such a counterexample, one would have to find an
even positive integer which is not the sum of two primes.
To frame this problem as a decision problem, we identify positive integers with
their binary expansions and consider the Goldbach language 𝐿 comprising all even
integers ≥ 4 which are the sum of two prime numbers. So the Goldbach conjecture
states that 𝐿 is the set of all even integers ≥ 4. A proof of the Goldbach conjecture
would imply that 𝐿 ∈ P since deciding membership in 𝐿 would amount to deciding
whether an integer is even and ≥ 4. But 𝐿 is not known to be in P. However, we can
verify in polynomial time
that 𝑎 ∈ 𝐿 if we are given a prime number 𝑝 such that 𝑎−𝑝 is also a prime number. For
this, we apply the polynomial time primality test to 𝑝 and 𝑎 − 𝑝 that was mentioned in
Example 1.4.17. The prime number 𝑝 is called a certificate for the membership of 𝑎
in the Goldbach language.
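Certificate verification for the Goldbach language can be sketched as follows. For simplicity this illustration replaces the polynomial-time AKS test by trial division (which is exponential in size 𝑎); the function names are ours.

```python
def is_prime(n):
    """Trial-division primality test (a stand-in for the AKS test)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def verify_goldbach_certificate(a, p):
    """Check that the prime p certifies membership of a in the
    Goldbach language, i.e., that p and a - p are both prime."""
    return a % 2 == 0 and a >= 4 and is_prime(p) and is_prime(a - p)

print(verify_goldbach_certificate(8, 3))  # 8 = 3 + 5, prints True
```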

There are many other languages 𝐿 that have a property analogous to that of the
Goldbach language presented in Example 1.4.19. Abstractly speaking, this property is
the following. For 𝑠 ⃗ ∈ {0, 1}∗ it may be hard to decide whether 𝑠 ⃗ ∈ 𝐿. But for each 𝑠 ⃗ ∈ 𝐿
there is a certificate 𝑡 which allows us to verify in polynomial time in |𝑠|⃗ that 𝑠 ⃗ ∈ 𝐿. For
the Goldbach language, the certificate is the prime number 𝑝 such that 𝑎 − 𝑝 is a prime
number. The set of all languages with this property is denoted by NP, which stands for
nondeterministic polynomial time. This name comes from another way of modeling NP that
we do not discuss here (see [LP98]). Here is a formal definition of NP.
we do not discuss here (see [LP98]). Here is a formal definition of NP.

Definition 1.4.20. (1) The complexity class NP is the set of all languages 𝐿 with the
following properties.
(a) There is a deterministic polynomial time algorithm 𝐴 with Input(𝐴) = {0, 1}∗
× {0, 1}∗ such that 𝐴(𝑠⃗, 𝑡⃗) = 1 implies 𝑠⃗ ∈ 𝐿 for all 𝑠⃗, 𝑡⃗ ∈ {0, 1}∗ .
(b) There is 𝑐 ∈ ℕ that may depend on 𝐿 so that for all 𝑠 ⃗ ∈ 𝐿 there is 𝑡 ⃗ ∈ {0, 1}∗
with |𝑡|⃗ ≤ |𝑠|⃗ 𝑐 and 𝐴(𝑠,⃗ 𝑡)⃗ = 1.
If 𝑠 ⃗ ∈ 𝐿 and 𝑡 ⃗ ∈ {0, 1}∗ such that 𝐴(𝑠,⃗ 𝑡)⃗ = 1, then 𝑡 ⃗ is called a certificate for the
membership of 𝑠 ⃗ in 𝐿.
(2) The complexity class Co-NP is the set of all languages 𝐿 such that {0, 1}∗ ⧵ 𝐿 ∈ NP.

One of the big open research problems in computer science is finding out whether
P is equal to NP. It is one of the seven Millennium Prize Problems. They are well-
known mathematical problems that were selected by the Clay Mathematics Institute
in the year 2000. The Clay Institute has pledged a US$1 million prize for the correct
solution to any of the problems.
The complexity theory that we have explained so far only refers to solving language
decision problems but not to more general computational problems such as finding
proper divisors of composite integers. But, as illustrated in the next example, there is
a close connection between these two problem classes.

Example 1.4.21. Consider the set

(1.4.1) 𝐿 = {(𝑎, 𝑐) ∈ ℕ2 ∶ 𝑐 ≤ 𝑎, 𝑎 has a proper divisor ≤ 𝑐}.

By identifying the elements (𝑎, 𝑐) of 𝐿 with a string representation whose length is
linear in size 𝑎, we consider 𝐿 as a language, that is, a subset of {0, 1}∗ . We will now show that 𝐿
is in P if and only if the integer factorization problem from Example 1.4.7 can be solved
in polynomial time.
Suppose that 𝐴 is a polynomial time algorithm that on input of a composite 𝑎 ∈ ℕ
finds a proper divisor of 𝑎. Then the following algorithm 𝐴′ decides 𝐿 in polynomial
time. On input of (𝑎, 𝑐) ∈ ℕ², the algorithm checks whether 𝑎 = 1 or 𝑎 is a prime number.
In both cases, 𝐴′ returns 0. As explained in Example 1.4.17, this test can be carried out
in polynomial time in size 𝑎. If 𝑎 is neither 1 nor a prime number, 𝐴′ invokes 𝐴 and
finds a proper divisor 𝑏 ∈ ℕ of 𝑎. If 𝑏 ≤ 𝑐, then 𝐴′ returns 1. Otherwise, this procedure
is applied to 𝑏 and 𝑎/𝑏 and so on until a sufficiently small divisor is found or until it is
clear that all prime divisors of the input are greater than 𝑐. Exercise 1.4.22 shows that
this algorithm has polynomial running time.
Conversely, let 𝐴 be an algorithm that decides 𝐿 in polynomial time. We present
a polynomial time algorithm 𝐴′ that uses 𝐴 as a subroutine and finds a proper divisor
of a composite integer 𝑎 ∈ ℕ. It uses two integers 𝑢, 𝑣 ∈ ℕ with 1 ≤ 𝑢 ≤ 𝑣 ≤ 𝑎.
They define an interval [𝑢, 𝑣] such that inside this interval there is a proper divisor
of 𝑎. During the execution of the algorithm, the interval shrinks exponentially. After
O(size 𝑎) iterations, we have 𝑢 = 𝑣. Then the algorithm returns 𝑏 = 𝑢. To achieve this,
the algorithm initially sets 𝑢 ← 2 and 𝑣 ← 𝑎−1. Since 𝑎 is composite, the interval [𝑢, 𝑣]
contains a proper divisor of 𝑎, but not the interval [1, 𝑢 − 1]. While 𝑢 < 𝑣, algorithm
𝐴′ repeats the following steps. It determines 𝑚 = ⌊(𝑣 − 𝑢)/2⌋ and calls 𝐴(𝑎, 𝑢 + 𝑚). If the
return value is 1, then 𝐴′ sets 𝑣 to 𝑢 + 𝑚. So [𝑢, 𝑣] contains a proper divisor of 𝑎, but
[1, 𝑢 − 1] does not. If the return value is 0, then 𝐴′ sets 𝑢 to 𝑢 + 𝑚 + 1. Again [𝑢, 𝑣]
contains a proper divisor of 𝑎 but [1, 𝑢 − 1] does not. If after this step we have 𝑢 = 𝑣,
then the algorithm returns 𝑏 = 𝑢. It is a proper divisor of 𝑎 since [𝑢, 𝑣] contains such
a divisor. Since in each iteration of this while loop the interval [𝑢, 𝑣] is roughly cut in
half and the initial length of the interval is 𝑎 − 2, the number of iterations is O(size 𝑎).
Also, because 𝐴 is a polynomial time algorithm, it follows that 𝐴′ runs in polynomial
time.
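The second reduction can be sketched directly in code. Here the decision procedure 𝐴 is simulated by trial division, so the sketch only illustrates the binary search; it is not itself a polynomial time algorithm.

```python
def decide_L(a, c):
    """Simulated oracle A: decide whether a has a proper divisor <= c."""
    return any(a % d == 0 for d in range(2, min(c, a - 1) + 1))

def find_divisor(a):
    """A': binary search for a proper divisor of the composite
    integer a using only calls to the decision oracle decide_L."""
    u, v = 2, a - 1
    while u < v:
        m = (v - u) // 2
        if decide_L(a, u + m):
            v = u + m      # [u, v] still contains a proper divisor
        else:
            u = u + m + 1  # no proper divisor is <= u + m
    return u
```

The returned value is the smallest proper divisor of 𝑎, found with O(size 𝑎) oracle calls.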

Exercise 1.4.22. Write pseudocode for the algorithms sketched in Example 1.4.21 and
analyze them.

The method explained in Example 1.4.21 can be generalized to all algorithmic


problems for which the solution length is polynomially bounded in the instance length.
This is shown in Exercise 1.4.23.

Exercise 1.4.23. Consider a computational problem CP = (𝐼, 𝑂, 𝑅) with the following


property. There is 𝑐 ∈ ℕ such that for all 𝑎 ∈ 𝐼 and all solutions 𝑏 of 𝑎 we have
size 𝑏 ≤ (size 𝑎)𝑐 . Define a language 𝐿 that can be decided in polynomial time if and
only if the computational problem can be solved in polynomial time.

The next theorem describes the relation between the complexity classes that we
have introduced. This theorem is also illustrated in Figure 1.4.1.

Theorem 1.4.24. We have P ⊂ NP ⊂ PSPACE ⊂ EXPTIME and P ⊂ BPP ⊂ PP ⊂


PSPACE.

[Diagram: nested regions showing P inside BPP inside PP inside PSPACE, P inside NP
and inside co-NP with both inside PSPACE, and PSPACE inside EXPTIME.]

Figure 1.4.1. Relation between deterministic and probabilistic complexity classes.



Proof. Since our computational model is only semiformal, we can only sketch the
proofs of the inclusions. But with the ideas presented below, the proofs can be car-
ried out in any of the formal models of computation, for instance the Turing machine
model.
We clearly have P ⊂ NP.
To prove NP ⊂ PSPACE, let 𝐿 ∈ NP. Also, let 𝐴 be an algorithm and let 𝑐 ∈
ℕ be a constant with the properties from Definition 1.4.20. Then this algorithm can
be transformed as follows into an algorithm 𝐴′ that decides 𝐿 in polynomial space.
The input set of 𝐴′ is {0, 1}∗ . On input of 𝑠 ⃗ ∈ {0, 1}∗ the modified algorithm runs the
algorithm 𝐴 with all possible certificates 𝑡 ⃗ ∈ {0, 1}∗ such that |𝑡|⃗ ≤ |𝑠|⃗ 𝑐 . It returns 1 if 𝐴
returns 1 for one of these certificates and 0 otherwise. It follows from Definition 1.4.20
that 𝐴′ decides 𝐿. Also, since 𝐴 is a polynomial time algorithm and because |𝑡|⃗ ≤ |𝑠|⃗ 𝑐 ,
it follows that 𝐴′ has polynomial space complexity.
Next, we show that PSPACE ⊂ EXPTIME. Let 𝐿 ∈ PSPACE and let 𝐴 be an
algorithm with polynomial space complexity that decides 𝐿. So there is a constant
𝑐 ∈ ℕ such that on input of 𝑠⃗ ∈ {0, 1}∗ the size of the memory used by the algorithm is
at most |𝑠⃗|^𝑐 . Therefore, the number of states of the algorithm run with input 𝑠⃗ is
O(2^(𝑛^𝑐)) with 𝑛 = |𝑠⃗|, since the number of instructions that the algorithm may use is
a constant. This implies that the algorithm has exponential running time.
Now we turn to probabilistic algorithms. Clearly, we have P ⊂ BPP ⊂ PP.
To see that PP ⊂ PSPACE, let 𝐿 be a language in PP and let 𝐴 be a Monte Carlo
algorithm that decides 𝐿 as described in Definition 1.4.18. Using 𝐴, we construct an
algorithm 𝐴′ with polynomial space complexity that decides 𝐿. Let 𝑠 ⃗ ∈ {0, 1}∗ . Since 𝐴
has polynomial running time, there is 𝑐 ∈ ℕ such that the number of calls of probabilis-
tic subroutines in a run of 𝐴 on input of 𝑠⃗ is at most |𝑠⃗|^𝑐 . On input of 𝑠⃗, the algorithm 𝐴′
runs algorithm 𝐴 with all random sequences corresponding to runs of 𝐴 with input 𝑠⃗.
The length of each such sequence is at most |𝑠⃗|^𝑐 . If the majority of these runs returns 1, the return
value of 𝐴′ is 1. Otherwise, 𝐴′ returns 0. It follows from the definition of the complexity
class PP that 𝐴′ decides 𝐿. Also, 𝐴′ has polynomial space complexity. □

We note that the relation between BPP and NP is unknown.

1.5. The circuit model


The second computational model that we present is that of Boolean circuits. It is the
basis for the circuit model for quantum computation.

1.5.1. Logic gates. Logic gates are fundamental components of Boolean circuits.
A logic gate is a device that implements a Boolean function {0, 1}𝑛 → {0, 1}𝑚 , where 𝑚
and 𝑛 are natural numbers. The availability of specific gates depends on the comput-
ing platform being used. In this context, we will focus solely on the gates with 𝑚 = 1.
Table 1.5.1 presents commonly used logic gates; the functions they implement are
listed in Table 1.1.4. These gates can be realized using various technolo-
gies, such as diodes or transistors acting as electronic switches. Additionally, they can
also be constructed using alternative technologies such as vacuum tubes, electromag-
netic relays, or even mechanical elements.

Table 1.5.1. List of important logic gates.

Name	Logic operator
𝖠𝖭𝖣	∧
𝖮𝖱	∨
𝖭𝖮𝖳	¬
𝖭𝖠𝖭𝖣	↑
𝖭𝖮𝖱	↓
𝖷𝖮𝖱	⊕

1.5.2. Boolean circuits. Next, we define Boolean circuits.


Definition 1.5.1. A Boolean circuit is a tuple 𝐶 = (𝑉, 𝐸, 𝐺, 𝐿) where (𝑉, 𝐸) is a directed
acyclic graph, 𝐺 is a set of logic gates, and
𝐿 ∶ 𝑉 → {𝖨, 𝖮, 0, 1} ∪ 𝐺
is a map that labels the elements of 𝑉 which are called vertices or nodes. A node labeled
𝖨 is called an input node. A node labeled 𝖮 is called an output node. A node labeled 0
or 1 is called a constant node. All other nodes are called gates. Also, the circuit 𝐶
satisfies the following conditions.
(1) Input nodes and constant nodes have indegree 0.
(2) There is an ordering 𝐼 → ℤ|𝐼| on the set 𝐼 of input nodes.
(3) The output nodes have indegree 1 and outdegree 0.
(4) There is an ordering 𝑂 → ℤ|𝑂| on the set 𝑂 of output nodes.
(5) Let 𝑔 ∈ 𝐺 and assume that 𝑔 implements a function with 𝑘 inputs and 𝑙 outputs.
Then its indegree is 𝑘 and its outdegree is 𝑙. Also, there is an ordering 𝐼(𝑔) → ℤ𝑘
on the set 𝐼(𝑔) of incoming edges of 𝑔 and there is an analogous ordering on the
outgoing edges of 𝑔.

Boolean circuits are also referred to as logic circuits or simply as circuits. We intro-
duce a few important notions for Boolean circuits.
Definition 1.5.2. Let 𝐶 be a Boolean circuit.
(1) The depth of a node 𝑣 of 𝐶 is the maximum length of a path from an input node
or a constant node to 𝑣.
(2) The depth of 𝐶 is the maximum depth of all nodes of 𝐶. It is denoted by depth(𝐶).
(3) The size of 𝐶 is the number of nodes of 𝐶. It is denoted by |𝐶|.
Example 1.5.3. Figure 1.5.1 shows two examples of Boolean circuits. The first imple-
ments 𝖭𝖠𝖭𝖣 using one 𝖠𝖭𝖣 and one 𝖭𝖮𝖳 gate. The second implements 𝖷𝖮𝖱 using one
𝖭𝖠𝖭𝖣, one 𝖮𝖱, and one 𝖠𝖭𝖣 gate.

[Diagram: two circuits on inputs 𝑏0 , 𝑏1 , with outputs 𝑏0 ↑ 𝑏1 and 𝑏0 ⊕ 𝑏1 .]

Figure 1.5.1. Circuit implementations of 𝖭𝖠𝖭𝖣 and 𝖷𝖮𝖱.

Exercise 1.5.4. Verify that the circuits in Figure 1.5.1 implement 𝖭𝖠𝖭𝖣 and 𝖷𝖮𝖱.
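Exercise 1.5.4 can also be checked mechanically by comparing truth tables; the gate functions below are our encoding of the logic operators from Table 1.5.1.

```python
AND = lambda a, b: a & b
OR = lambda a, b: a | b
NOT = lambda a: 1 - a
NAND = lambda a, b: NOT(AND(a, b))

# First circuit of Figure 1.5.1: NAND from one AND gate and one NOT gate.
# Second circuit: XOR from one NAND gate, one OR gate, and one AND gate.
for b0 in (0, 1):
    for b1 in (0, 1):
        assert NOT(AND(b0, b1)) == NAND(b0, b1)
        assert AND(NAND(b0, b1), OR(b0, b1)) == b0 ^ b1
print("both circuits verified")
```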

In the second circuit in Figure 1.5.1, the input nodes have outdegree 2. This is
represented by a fanout symbol. Fanout operations are used in circuits in order to
increase the outdegree of logic gates. When we describe the simulation of circuits by
reversible circuits in Section 1.7, we will consider fanout symbols as gates.
Next, we define the functions that are computed by circuits. Let 𝐶 = (𝑉, 𝐸, 𝐺, 𝐿)
be a circuit with 𝑛 input nodes and 𝑚 output nodes. To simplify the description, we
assume that all gates in 𝐺 implement functions {0, 1}𝑙 → {0, 1} for some 𝑙 ∈ ℕ. The
generalization to arbitrary gates is straightforward.
The circuit 𝐶 computes a function
(1.5.1) 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑚 .
To specify this function, we let 𝑏 ⃗ = (𝑏0 , . . . , 𝑏𝑛−1 ) ∈ {0, 1}𝑛 and construct 𝑓(𝑏).
⃗ For this,
we use a value function
(1.5.2) 𝐵 ∶ 𝑉 → {0, 1}
which we define by induction on the depths of the nodes in 𝑉. For the base case, we
specify the following.
(1) For constant nodes 𝑣 labeled 0 or 1 we set 𝐵(𝑣) to 0 or 1, respectively.
(2) Let 𝑣 𝑖 be the input nodes of 𝐶 for 0 ≤ 𝑖 < 𝑛. Then we set 𝐵(𝑣 𝑖 ) = 𝑏𝑖 , 0 ≤ 𝑖 < 𝑛.
Here, we use the ordering on the input nodes to assign a bit 𝑏𝑖 to an input node
𝑣𝑖 .
For the inductive step, let 𝐾 be the depth of 𝐶 and let 𝑘 be a positive integer with 0 <
𝑘 ≤ 𝐾. Assume that 𝐵(𝑣) has been defined for all nodes 𝑣 of depth less than 𝑘. Let 𝑣 be
a node of depth 𝑘. We define 𝐵(𝑣) as follows. Since the depth of the node 𝑣 is greater
than 0, it is either a gate or an output node. Assume that 𝑣 is a gate. Let
(1.5.3) 𝑔 ∶ {0, 1}𝑙 → {0, 1}
be the Boolean function implemented by this gate. Then 𝑔 has 𝑙 incoming edges and,
by definition, there is an ordering (𝑒 0 , . . . , 𝑒 𝑙−1 ) on these edges. Denote by 𝑢0 , . . . , 𝑢𝑙−1
the nodes in the circuit such that 𝑒 𝑖 is an outgoing edge of 𝑢𝑖 for 0 ≤ 𝑖 < 𝑙. Then the
nodes 𝑢0 , . . . , 𝑢𝑙−1 have depth less than 𝑘. Therefore, the values 𝐵(𝑢𝑖 ), 0 ≤ 𝑖 < 𝑙, are
already defined. We set
(1.5.4) 𝐵(𝑣) = 𝑔(𝐵(𝑢0 ), . . . , 𝐵(𝑢𝑙−1 )).
Assume that 𝑣 is an output node. By definition, it has indegree 1. Let 𝑢 be the node in
𝑉 from which there exists an edge to 𝑣. Then we define
(1.5.5) 𝐵(𝑣) = 𝐵(𝑢).
Finally, let (𝑦0 , . . . , 𝑦𝑚−1 ) be the ordered sequence of output nodes of 𝐶. Then we set
(1.5.6) 𝑓(𝑏)⃗ = (𝐵(𝑦0 ), . . . , 𝐵(𝑦𝑚−1 )).

Examples of circuits and the functions that they implement can be seen in Figure
1.5.1.
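The inductive definition of the value function 𝐵 translates directly into a program. In this sketch (the circuit representation is ours, not the book's), a circuit is given as a list of nodes in topological order, so each node only refers to earlier nodes.

```python
def evaluate_circuit(nodes, inputs):
    """Compute the value function B of Section 1.5.2 by one pass
    over the nodes in topological order. Node formats:
      ("input", i)       - reads input bit i
      ("const", b)       - constant node with value b
      ("gate", f, srcs)  - applies Boolean function f to earlier nodes
      ("output", src)    - copies the value of an earlier node
    Returns the tuple of output-node values."""
    B, outputs = [], []
    for node in nodes:
        if node[0] == "input":
            B.append(inputs[node[1]])
        elif node[0] == "const":
            B.append(node[1])
        elif node[0] == "gate":
            _, f, srcs = node
            B.append(f(*(B[s] for s in srcs)))
        else:  # output node of indegree 1
            B.append(B[node[1]])
            outputs.append(B[-1])
    return tuple(outputs)

# The XOR circuit of Figure 1.5.1: a NAND and an OR gate feed an AND gate.
nand = lambda a, b: 1 - (a & b)
xor_circuit = [
    ("input", 0), ("input", 1),
    ("gate", nand, (0, 1)),
    ("gate", lambda a, b: a | b, (0, 1)),
    ("gate", lambda a, b: a & b, (2, 3)),
    ("output", 4),
]
print([evaluate_circuit(xor_circuit, b)[0]
       for b in ((0, 0), (0, 1), (1, 0), (1, 1))])  # -> [0, 1, 1, 0]
```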
Exercise 1.5.5. Define the function computed by a circuit that uses logic gates with
more than one output.

1.5.3. Universal sets of gates. Which gates do we really need in circuits? In


this section we show that very few suffice. We start with a definition.
Definition 1.5.6. A set 𝐺 of logic gates is called universal for classical computation if
for all 𝑚, 𝑛 ∈ ℕ and every function 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑚 there is a circuit that only
uses gates from 𝐺 and computes 𝑓.

Now we present a very simple set of logic gates that is universal for classical com-
putation.
Theorem 1.5.7. The set {𝖭𝖮𝖳, 𝖠𝖭𝖣, 𝖮𝖱} is universal for classical computation.

Proof. It suffices to prove the theorem for Boolean functions


(1.5.7) 𝑓 ∶ {0, 1}𝑛 → {0, 1}.
If the function 𝑓 has more components, we can use circuits that implement the indi-
vidual components to construct a circuit 𝐶 that computes 𝑓. For this, we proceed as
follows. Let 𝑓 = (𝑓0 , . . . , 𝑓𝑚−1 ) with Boolean functions 𝑓𝑖 ∶ {0, 1}𝑛 → {0, 1} for 0 ≤ 𝑖 < 𝑚.
For all 𝑖 ∈ ℤ𝑚 let 𝐶𝑖 be a circuit that computes 𝑓𝑖 . We use fanout operations in order
to make the constant and input nodes available for all circuits 𝐶𝑖 . Then (𝑜0 , . . . , 𝑜𝑚−1 )
are used as the output nodes of 𝐶 where 𝑜 𝑖 is the output node of 𝐶𝑖 .
We now prove the assertion for functions as in (1.5.7) by induction on 𝑛. For the
base case, let 𝑛 = 1. There are four Boolean functions 𝑓 ∶ {0, 1} → {0, 1}, namely the
following:
(1.5.8) 𝑏 ↦ 0, 𝑏 ↦ 1, 𝑏 ↦ 𝑏, 𝑏 ↦ ¬𝑏.
Implementations of these functions by circuits that use only the 𝖭𝖮𝖳, 𝖠𝖭𝖣, and 𝖮𝖱
gates are shown in Figure 1.5.2.
Now let 𝑛 > 0 and assume that all functions {0, 1}𝑛−1 → {0, 1} can be implemented
by circuits that use only the 𝖭𝖮𝖳, 𝖠𝖭𝖣, and 𝖮𝖱 gates. Let 𝑓 ∶ {0, 1}𝑛 → {0, 1}. For
𝑏 ∈ {0, 1} define the functions
(1.5.9) 𝑓𝑏 ∶ {0, 1}𝑛−1 → {0, 1}, (𝑏0 , . . . , 𝑏𝑛−2 ) ↦ 𝑓(𝑏0 , . . . , 𝑏𝑛−2 , 𝑏).
Then for every (𝑏0 , . . . , 𝑏𝑛−2 , 𝑏) ∈ {0, 1}𝑛 we can write
(1.5.10) 𝑓(𝑏0 , . . . , 𝑏𝑛−2 , 𝑏) = (𝑓0 (𝑏0 , . . . , 𝑏𝑛−2 ) ∧ ¬𝑏) ∨ (𝑓1 (𝑏0 , . . . , 𝑏𝑛−2 ) ∧ 𝑏).

[Diagram: four one-input circuits computing 𝑓(𝑏) = 0, 𝑓(𝑏) = 1, 𝑓(𝑏) = 𝑏, and 𝑓(𝑏) = ¬𝑏.]

Figure 1.5.2. Base case of the induction proof in Theorem 1.5.7: circuits that compute
the four functions 𝑓 ∶ {0, 1} → {0, 1}.

[Diagram: the inputs 𝑏0 , . . . , 𝑏𝑛−2 feed subcircuits computing 𝑓0 (𝑏0 , . . . , 𝑏𝑛−2 ) and
𝑓1 (𝑏0 , . . . , 𝑏𝑛−2 ), which are combined with 𝑏𝑛−1 as in (1.5.10) to produce 𝑓(𝑏⃗).]

Figure 1.5.3. Inductive step in the proof of Theorem 1.5.7.

By the induction hypothesis, there exist circuits that implement 𝑓0 and 𝑓1 and use only
the gates 𝖭𝖮𝖳, 𝖮𝖱, and 𝖠𝖭𝖣. Therefore, the circuit in Figure 1.5.3 that uses the circuits
for 𝑓0 and 𝑓1 implements the function 𝑓. □
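The inductive step (1.5.10) is a Shannon decomposition, and it can be turned into a recursive construction of a {𝖭𝖮𝖳, 𝖠𝖭𝖣, 𝖮𝖱} formula for an arbitrary Boolean function. The following recursion is our rendering of the proof, not part of the text.

```python
from itertools import product

def build_formula(f, n):
    """Return a function computing f : {0,1}^n -> {0,1} using only
    NOT, AND, OR, following the induction in Theorem 1.5.7."""
    if n == 0:
        c = f(())
        return lambda bits: c
    # Cofactors as in (1.5.9): fix the last input bit to 0 or 1.
    f0 = build_formula(lambda bits: f(bits + (0,)), n - 1)
    f1 = build_formula(lambda bits: f(bits + (1,)), n - 1)
    # (1.5.10): f(b0..b_{n-2}, b) = (f0 AND NOT b) OR (f1 AND b).
    return lambda bits: ((f0(bits[:-1]) & (1 - bits[-1]))
                         | (f1(bits[:-1]) & bits[-1]))

# Check the construction against a 3-input majority function.
maj = lambda bits: 1 if sum(bits) >= 2 else 0
g = build_formula(maj, 3)
assert all(g(bits) == maj(bits) for bits in product((0, 1), repeat=3))
```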

From Theorem 1.5.7 we obtain the following important corollary.


Corollary 1.5.8. For all 𝑚, 𝑛 ∈ ℕ and every function 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑚 there is a
circuit that computes 𝑓.

Next, we present an even smaller universal set of gates.



[Diagram: 𝖭𝖠𝖭𝖣-only circuits computing ¬𝑏, 𝑏0 ∧ 𝑏1 , and 𝑏0 ∨ 𝑏1 .]

Figure 1.5.4. Implementation of 𝖭𝖮𝖳, 𝖠𝖭𝖣, and 𝖮𝖱 by 𝖭𝖠𝖭𝖣.

Theorem 1.5.9. The gate set {𝖭𝖠𝖭𝖣} is universal for classical computation.

Proof. By Theorem 1.5.7 it suffices to show that the gates 𝖭𝖮𝖳, 𝖠𝖭𝖣, and 𝖮𝖱 can be
implemented by circuits that use only the 𝖭𝖠𝖭𝖣 gate. This is shown in Figure 1.5.4. □
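In formulas, the standard constructions (which may differ in layout from the printed Figure 1.5.4) are ¬𝑏 = 𝑏 ↑ 𝑏, 𝑏0 ∧ 𝑏1 = ¬(𝑏0 ↑ 𝑏1 ), and 𝑏0 ∨ 𝑏1 = (¬𝑏0 ) ↑ (¬𝑏1 ). A quick truth-table check:

```python
NAND = lambda a, b: 1 - (a & b)

NOT = lambda b: NAND(b, b)
AND = lambda a, b: NAND(NAND(a, b), NAND(a, b))
OR = lambda a, b: NAND(NAND(a, a), NAND(b, b))

for a in (0, 1):
    for b in (0, 1):
        assert NOT(a) == 1 - a
        assert AND(a, b) == (a & b)
        assert OR(a, b) == (a | b)
print("NOT, AND, OR realized by NAND alone")
```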

Exercise 1.5.10. Show that the set {𝖭𝖮𝖱} is universal for classical computation.

Finally, we define the complexity of Boolean functions.


Definition 1.5.11. Let 𝐺 be a universal set of logic gates and let 𝑓 be a Boolean func-
tion. The circuit size complexity of 𝑓 with respect to 𝐺 is the minimum size of any circuit
that computes 𝑓. It is also called the circuit complexity of 𝑓.

If it is clear which universal set of logic gates we refer to, we simply speak about
the circuit size complexity of a Boolean function.
We note that there is also the notion of circuit depth complexity which is not re-
quired in our context.

1.6. Circuit families and circuit complexity


In this section we discuss the notion of circuit families which is then used to introduce
circuit complexity theory.

1.6.1. Circuit families. Individual circuits cannot compute functions 𝑓 ∶ {0, 1}∗ →
{0, 1}∗ or decide languages since their input length is fixed. To solve these more general
problems, we need families of circuits. We fix a finite universal set of logical gates and
assume from now on that all circuits are constructed using these gates. Such universal
sets are presented in Theorems 1.5.7 and 1.5.9.
Definition 1.6.1. A family of circuits or circuit family is a sequence (𝐶𝑛 )𝑛∈ℕ of circuits
such that the circuit 𝐶𝑛 has 𝑛 input nodes for all 𝑛 ∈ ℕ.

Next, we describe what it means for a circuit family to compute a function, solve a
computational problem, or decide a language. In doing so, we must take into account
that circuits have a fixed output length. Example 1.6.2 illustrates how to deal with this.
Example 1.6.2. Consider the function 𝑓 ∶ {0, 1}∗ → {0, 1}∗ , 𝑎 ↦ 𝑓(𝑎) = 𝑎2 where we
identify the elements of {0, 1}∗ with the integers they represent. How can we implement
this function using a circuit family? For 𝑛 ∈ ℕ, let
(1.6.1) 𝑓𝑛 ∶ {0, 1}𝑛 → {0, 1}∗ , 𝑎 ↦ 𝑎2 .
In order to implement 𝑓𝑛 as a circuit, we must use representations of the function values
that have the same length for all inputs of length 𝑛. So we prepend an appropriate
number of zeros to the binary representations to ensure that they all have the same
length 2𝑛. For example, for 𝑛 = 2 we write 𝑓2 (00) = 0000, 𝑓2 (01) = 0001, 𝑓2 (10) =
0100, 𝑓2 (11) = 1001. In the same way, circuits 𝐶𝑛 can be constructed that implement
𝑓𝑛 for all 𝑛 ∈ ℕ. We note that the binary expansion of the function values ≠ 0 can be
obtained from the function values represented by bit strings of length 2𝑛 by deleting the
leading zeros.
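The fixed-width encoding can be sketched as follows; `f_n` (our name) pads the square of an 𝑛-bit input to exactly 2𝑛 bits, which always suffices since (2^𝑛 − 1)² < 2^(2𝑛).

```python
def f_n(bit_string):
    """Member f_n of the squaring circuit family: map an n-bit input
    to the 2n-bit binary representation of its square."""
    n = len(bit_string)
    a = int(bit_string, 2)
    return format(a * a, "b").zfill(2 * n)

# The n = 2 values from Example 1.6.2:
print([f_n(s) for s in ("00", "01", "10", "11")])
# -> ['0000', '0001', '0100', '1001']
```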

The idea of Example 1.6.2 can be used for all functions 𝑓 ∶ {0, 1}∗ → {0, 1}∗ whose
function values are encoded either by 0 or by bit strings in {0, 1}∗ starting with bit 1. This
encoding can be easily obtained from any encoding by prefixing the representations
of the function values different from 0 with 1. So without loss of generality, we only
consider functions that satisfy
(1.6.2) |𝑠⃗| = |𝑠′⃗ | ⇒ |𝑓(𝑠⃗)| = |𝑓(𝑠′⃗ )| for all 𝑠⃗, 𝑠′⃗ ∈ {0, 1}∗ .

Definition 1.6.3. Let 𝑓 ∶ {0, 1}∗ → {0, 1}∗ be a function that satisfies (1.6.2) and let
𝐶 = (𝐶𝑛 )𝑛∈ℕ be a circuit family. We say that 𝐶 computes 𝑓 if for all 𝑛 ∈ ℕ, the circuit
𝐶𝑛 computes the function 𝑓𝑛 ∶ {0, 1}𝑛 → {0, 1}∗ , 𝑠 ⃗ ↦ 𝑓(𝑠).⃗

We also define what it means for a circuit family to solve a computational problem
CP = (𝐼, 𝑂, 𝑅). Analogous to (1.6.2), we may, without loss of generality, assume that
the solutions of all instances of a fixed length also have a fixed length. This means that
(1.6.3) |𝑠⃗| = |𝑠′⃗ | ⇒ |𝑡⃗| = |𝑡′⃗ | for all (𝑠⃗, 𝑡⃗), (𝑠′⃗ , 𝑡′⃗ ) ∈ 𝑅.
We assume in the following that the encodings of computational problems have this
property.

Definition 1.6.4. Let CP = (𝐼, 𝑂, 𝑅) be a computational problem and let 𝐶 = (𝐶𝑛 )𝑛∈ℕ
be a circuit family. We say that 𝐶 solves CP if for all 𝑛 ∈ ℕ on input of 𝑎 ∈ {0, 1}𝑛 ∩ 𝐼
the circuit 𝐶𝑛 computes a solution 𝑏 of 𝑎 .

Finally, we define what it means for a circuit family to decide a language.

Definition 1.6.5. Let 𝐿 be a language, and let 𝐶 = (𝐶𝑛 )𝑛∈ℕ be a circuit family. We say
that 𝐶 decides 𝐿 if for all 𝑛 ∈ ℕ on input of 𝑠 ⃗ ∈ {0, 1}𝑛 the circuit 𝐶𝑛 returns 1 if 𝑠 ⃗ ∈ 𝐿
and 0 otherwise.

From Corollary 1.5.8 we obtain the following result.

Theorem 1.6.6. For all functions 𝑓 ∶ {0, 1}∗ → {0, 1}∗ , computational problems CP,
and languages 𝐿 there is a circuit family that computes 𝑓, solves CP, or decides 𝐿.
Theorem 1.6.6 demonstrates that circuit families are more powerful than algo-
rithms in terms of computation. It is known that certain functions 𝑓 ∶ {0, 1}∗ → {0, 1}∗
cannot be computed by algorithms (see [Dav82]). However, as this theorem shows,
for all such functions, there exists a circuit family that can compute them. This is pos-
sible because an individual circuit can be designed for each input length. The next
section introduces a more limited concept of circuit families that possesses capabilities
equivalent to the concept of algorithms.

1.6.2. Uniform circuit families. We now introduce uniform circuit families and
obtain a computational model that corresponds to the algorithmic one. For the next
definition, we assume that we have fixed some encoding of circuits by bit strings. Fol-
lowing [Wat09] we require the following.
(1) The encoding is sensible: every circuit is encoded by at least one bit string, and
every bit string encodes at most one circuit.
(2) The encoding is efficient: there is 𝑐 ∈ ℕ such that every circuit 𝐶 has an encoding
of length at least size 𝐶 and at most (size 𝐶)𝑐 .
(3) Information about the structure of a circuit is computable in polynomial time
from an encoding of the circuit.
“Structure information” means, for example, information about what the input
nodes, the gates, and the output nodes are and how these nodes are connected.
We define uniform circuit families.
Definition 1.6.7. A circuit family 𝐶 = (𝐶𝑛 ) is called uniform if there is a deterministic
algorithm which on input of I𝑛 , 𝑛 ∈ ℕ, outputs the encoding of 𝐶𝑛 .

After the next definition, we explain why the input of the algorithm in Definition
1.6.7 is I𝑛 and not simply 𝑛. Here, we remark the following: It can be shown that
uniform circuit families are Turing complete, meaning that their computing power is
equivalent to that of Turing machines. This is important because the Turing-Church
thesis states that Turing machines are the most powerful computing devices imagin-
able. This means that a function 𝑓 ∶ {0, 1}∗ → {0, 1}∗ is computable by a human being
following an algorithm, ignoring resource limitations, if and only if it is computable
by a Turing machine. In today’s computer science, the Turing-Church thesis is still
considered to be true.
Now we define the P-uniform circuit families.
Definition 1.6.8. A circuit family 𝐶 = (𝐶𝑛 ) is called P-uniform if there is a determin-
istic polynomial time algorithm which on input of I𝑛 , 𝑛 ∈ ℕ, outputs the encoding of
𝐶𝑛 .

Why is the input of the algorithm represented in unary encoding I𝑛 of 𝑛, rather


than the binary expansion of this number? The key reason is that the algorithm run-
ning time should be polynomial in 𝑛. The running time of algorithms is typically mea-
sured as a function of the length of their input. Therefore, the input length of this
algorithm must be proportional to 𝑛. If we were to use the binary encoding of 𝑛, its
length would only be of the order log 𝑛, which is much smaller than 𝑛. As a result, the

algorithm running time would not be polynomial in the length of the input. On the other hand, the unary
encoding I𝑛 has a length proportional to 𝑛, making it suitable for ensuring polynomial
running time in terms of the input size.
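The length gap between the two encodings can be seen directly. The following Python sketch (an illustration of ours, not part of the text) compares the unary encoding I𝑛 with the binary expansion of 𝑛.

```python
# A small sketch comparing the unary encoding I_n with the binary expansion
# of n. Only the unary input has length proportional to n, so a running time
# polynomial in n is also polynomial in the input length.
def unary(n: int) -> str:
    return "1" * n          # I_n = the string of n ones

def binary(n: int) -> str:
    return bin(n)[2:]       # binary expansion, about log2(n) bits

n = 1024
print(len(unary(n)))    # 1024
print(len(binary(n)))   # 11
```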

1.6.3. Circuit complexity. Now we define the size complexity of circuit families
and different circuit complexity classes.
Definition 1.6.9. Let 𝐶 = (𝐶𝑛 )𝑛∈ℕ be a circuit family and let 𝑓 ∶ ℕ → ℕ be a function.
(1) The size complexity of 𝐶 is the function ℕ → ℕ, 𝑛 ↦ |𝐶𝑛 |.
(2) The complexity class SIZE(𝑓) is the set of all languages that can be decided by a
P-uniform circuit family with size complexity O(𝑓).

The next theorem establishes a connection between algorithmic and circuit com-
plexity classes.
Theorem 1.6.10. Let 𝑓 ∶ ℕ → ℕ. Then DTIME(𝑓) ⊂ SIZE(𝑓 log 𝑓).

For the proof of this theorem, see [Vol99] and [AB09]. It is beyond the scope of
this book.
From Theorem 1.6.10 we obtain the following corollary which characterizes the
complexity class P in terms of polynomial size uniform circuit families.
Corollary 1.6.11. A language 𝐿 is in P if and only if 𝐿 is in SIZE(𝑛𝑐 ) for some 𝑐 ∈ ℕ.

1.7. Reversible circuits


This section explores reversible circuits, which play a crucial role in quantum com-
puting, as we will see in Section 4.7. Reversible circuits can be easily converted into
quantum circuits by substituting classical reversible gates with their quantum equiv-
alents. A significant objective of this section is to demonstrate that any circuit can be
emulated by a reversible circuit. This, in turn, implies that there exists a quantum
circuit for every Boolean function, capable of computing that function.

1.7.1. Basics. We define reversible gates and circuits.


Definition 1.7.1. A reversible gate or circuit is a logic gate or circuit that implements
an invertible function 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑛 for some 𝑛 ∈ ℕ, respectively.

The only reversible gate that we have seen so far is the 𝖭𝖮𝖳 gate. All other gates
in Table 1.5.1 are not reversible.
An important reversible gate with two input nodes is the controlled not gate which
is denoted by 𝖢𝖭𝖮𝖳. It applies the 𝖭𝖮𝖳 operation to a target bit 𝑡 if a control bit 𝑐 is
1. Otherwise, the target bit remains unchanged. Therefore, the target bit becomes
𝑐⊕𝑡. The control bit is never changed. Two variants of 𝖢𝖭𝖮𝖳 are shown in Figure 1.7.1.
In the left 𝖢𝖭𝖮𝖳 gate, the first bit is the control, and the second bit is the target. In the
right 𝖢𝖭𝖮𝖳 gate, the roles of the bits are reversed. A circuit implementation of the left
𝖢𝖭𝖮𝖳 gate using one 𝖷𝖮𝖱 gate is shown in Figure 1.7.2. Figure 1.7.3 presents two more
𝖢𝖭𝖮𝖳 variants. They flip the target bit 𝑡 conditioned on the control bit 𝑐 being 0.

Figure 1.7.1. 𝖢𝖭𝖮𝖳 gates that change the target bit 𝑡 conditioned on the control bit 𝑐 being 1.

Figure 1.7.2. Circuit implementation of a 𝖢𝖭𝖮𝖳 gate using one 𝖷𝖮𝖱 gate.

Figure 1.7.3. 𝖢𝖭𝖮𝖳 gates that change the target bit 𝑡 conditioned on the control bit 𝑐 being 0.

Figure 1.7.4. The 𝖲𝖶𝖠𝖯 gate and its implementation using three 𝖢𝖭𝖮𝖳 gates.

Another important gate is the 𝖲𝖶𝖠𝖯 gate. On input of a pair (𝑏0 , 𝑏1 ) of bits it
returns (𝑏1 , 𝑏0 ). This gate is shown in Figure 1.7.4 together with an implementation
that uses only three 𝖢𝖭𝖮𝖳 gates.
Next, we show that we can implement every permutation of the entries of an 𝑛-bit
string by a reversible circuit that uses at most 𝑛 − 1 𝖲𝖶𝖠𝖯 gates.
Proposition 1.7.2. Let 𝑛 ∈ ℕ and let 𝜋 ∈ 𝑆𝑛 . Then the map
(1.7.1) 𝑓𝜋 ∶ {0, 1}𝑛 → {0, 1}𝑛 , (𝑏0 , . . . , 𝑏𝑛−1 ) ↦ (𝑏𝜋(0) , . . . , 𝑏𝜋(𝑛−1) )
can be implemented by a circuit that uses at most 𝑛 − 1 𝖲𝖶𝖠𝖯 or at most 3𝑛 𝖢𝖭𝖮𝖳 gates.

Proof. The proposition follows from Theorem A.4.25, which states that 𝜋 is the product
of at most 𝑛 − 1 transpositions. Each transposition can be implemented by one 𝖲𝖶𝖠𝖯
gate, and each 𝖲𝖶𝖠𝖯 gate by three 𝖢𝖭𝖮𝖳 gates (see Figure 1.7.4). □
Example 1.7.3. Consider the permutation 𝜋 ∈ 𝑆4 given by
(1.7.2) 𝜋 ∶ 0 ↦ 1, 1 ↦ 3, 2 ↦ 0, 3 ↦ 2.
We have 𝜋 = (2, 1) ∘ (1, 3) ∘ (0, 2). So the circuit in Figure 1.7.5 implements 𝜋.
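The decomposition can be checked mechanically. The following Python sketch (our own illustration; the helper names are not from the text) applies the three transpositions as 𝖲𝖶𝖠𝖯 operations on a bit tuple.

```python
# Mechanical check of Example 1.7.3: realize f_pi by SWAP operations,
# one per transposition of pi = (2,1) o (1,3) o (0,2).
def swap(bits, i, j):
    bits = list(bits)
    bits[i], bits[j] = bits[j], bits[i]
    return tuple(bits)

def f_pi(bits, pi):
    # the map (1.7.1): (b_0, ..., b_{n-1}) -> (b_pi(0), ..., b_pi(n-1))
    return tuple(bits[pi[k]] for k in range(len(bits)))

pi = [1, 3, 0, 2]                 # pi(0) = 1, pi(1) = 3, pi(2) = 0, pi(3) = 2
x = (0, 1, 1, 0)
# Since f_{sigma o tau} = f_tau o f_sigma, the SWAPs act in the reverse order
# of the transpositions in the composition: first (2,1), then (1,3), then (0,2).
y = swap(swap(swap(x, 2, 1), 1, 3), 0, 2)
assert y == f_pi(x, pi) == (1, 0, 0, 1)
```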

We now introduce the Toffoli gate which was proposed in 1980 by Tommaso Toffoli
and is shown in Figure 1.7.6. It implements the bijection
(1.7.3) {0, 1}3 → {0, 1}3 , (𝑐0 , 𝑐1 , 𝑡) ↦ (𝑐0 , 𝑐1 , (𝑐0 ∧ 𝑐1 ) ⊕ 𝑡).

Figure 1.7.5. Implementation of the permutation 𝜋 in (1.7.2).

Figure 1.7.6. A Toffoli or 𝖢𝖢𝖭𝖮𝖳 gate.

Figure 1.7.7. Reversible circuits that implement the 𝖭𝖠𝖭𝖣 and 𝖥𝖠𝖭𝖮𝖴𝖳 gates.

This gate leaves the control bits 𝑐 0 and 𝑐 1 unchanged and modifies the target bit 𝑡 condi-
tioned on both control bits 𝑐 0 and 𝑐 1 being 1. Toffoli gates are also called 𝖢𝖢𝖭𝖮𝖳 gates:
a 𝖭𝖮𝖳 operation controlled by two control bits.
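A bitwise model makes the properties of (1.7.3) easy to check. The sketch below (illustrative Python of ours, not the book's notation) verifies that the map is its own inverse and hence a bijection on {0, 1}³.

```python
# A bitwise model of the Toffoli gate (1.7.3).
def toffoli(c0, c1, t):
    return (c0, c1, (c0 & c1) ^ t)

triples = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
# Applying the gate twice returns the input, so the map is its own inverse
# and therefore a bijection on {0,1}^3:
assert all(toffoli(*toffoli(*x)) == x for x in triples)
assert sorted(toffoli(*x) for x in triples) == triples
```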
The Toffoli gate has the important property that it allows implementations of the
𝖭𝖠𝖭𝖣 and fanout operations. This is shown in Figure 1.7.7. As we will see in Section
1.7.2, this property implies that Toffoli gates can be used to transform every Boolean
circuit into a reversible circuit.
Exercise 1.7.4. Verify that the circuits in Figure 1.7.7 are reversible and implement
the 𝖭𝖠𝖭𝖣 and the 𝖥𝖠𝖭𝖮𝖴𝖳 operation, respectively.

Another gate that can be used to make every circuit reversible is the Fredkin gate.
It was introduced by Edward Fredkin in 1969. It implements the bijection
{0, 1}3 → {0, 1}3 ,
(1.7.4)
(𝑐, 𝑡0 , 𝑡1 ) ↦ (𝑐, (¬𝑐 ∧ 𝑡0 ) ∨ (𝑐 ∧ 𝑡1 ), (𝑐 ∧ 𝑡0 ) ∨ (¬𝑐 ∧ 𝑡1 )).
This function does not change the control bit 𝑐, swaps the target bits 𝑡0 and 𝑡1 if the
control bit 𝑐 is 1, and leaves them unchanged otherwise (see Exercise 1.7.5). Because
of this property, the Fredkin gate is also called the controlled swap gate and is denoted
by 𝖢𝖲𝖶𝖠𝖯: a swap controlled by one control bit.
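The Fredkin gate can be modeled in the same bitwise style; a minimal Python sketch of the map (1.7.4), with names of our own choosing:

```python
# A bitwise sketch of the Fredkin gate (1.7.4): the control bit c is unchanged
# and the targets t0, t1 are swapped exactly when c = 1.
def fredkin(c, t0, t1):
    return (c, ((1 - c) & t0) | (c & t1), (c & t0) | ((1 - c) & t1))

assert fredkin(0, 1, 0) == (0, 1, 0)   # c = 0: targets unchanged
assert fredkin(1, 1, 0) == (1, 0, 1)   # c = 1: targets swapped
```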

Figure 1.7.8. A Fredkin or 𝖢𝖲𝖶𝖠𝖯 gate.

Exercise 1.7.5. Determine the truth tables of the Toffoli and the Fredkin gates and use
them to verify that they implement the functions in (1.7.3) and (1.7.4). Also, verify that
the two functions are bijections.

Exercise 1.7.6. (1) Find an implementation of the Toffoli gate that uses only 𝖭𝖮𝖳,
𝖠𝖭𝖣, and 𝖮𝖱 gates.
(2) Find an implementation of the Fredkin gate that uses only 𝖭𝖮𝖳, 𝖠𝖭𝖣, and 𝖮𝖱
gates.
(3) Find implementations of the 𝖭𝖠𝖭𝖣 and fanout operations that use only Fredkin
gates.

1.7.2. Construction of reversible circuits. In this section, we provide an il-


lustrative example that demonstrates the construction of reversible circuits using re-
versible gates. This construction will be subsequently adopted in Section 3.3.4 and
further formalized in Definition 4.7.1 for generating quantum circuits. The example is
shown in Figure 1.7.9.
The circuit implements a bijection 𝑓 ∶ {0, 1}4 → {0, 1}4 . As shown in Figure 1.7.10,
it can be written as 𝑓 = 𝑓2 ∘𝑓1 ∘𝑓0 where 𝑓0 , 𝑓1 , 𝑓2 ∶ {0, 1}4 → {0, 1}4 are bijections. Each
of these functions is obtained by applying invertible gates to certain bits and applying
the identity function 𝐼 ∶ {0, 1} → {0, 1} to the remaining bits. The functions are
(1.7.5) 𝑓0 = (𝖭𝖮𝖳, 𝐼, 𝐼, 𝐼), 𝑓1 = (𝖢𝖢𝖭𝖮𝖳, 𝐼), 𝑓2 = (𝐼, 𝖢𝖲𝖶𝖠𝖯).

The circuit operates on the input (0, 1, 0, 0) as follows:

(1.7.6) (0, 1, 0, 0) ↦ (1, 1, 0, 0) ↦ (1, 1, 1, 0) ↦ (1, 1, 0, 1).
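The trace in (1.7.6) can be reproduced with bitwise models of the three gate layers from (1.7.5); the following Python sketch is illustrative only, and the function names are ours.

```python
# Bitwise models of the layers f0, f1, f2 of (1.7.5); composing them
# reproduces the trace (1.7.6).
def f0(b):  # NOT on bit 0
    return (1 - b[0], b[1], b[2], b[3])

def f1(b):  # CCNOT with controls b0, b1 and target b2
    return (b[0], b[1], (b[0] & b[1]) ^ b[2], b[3])

def f2(b):  # CSWAP with control b1 and targets b2, b3
    return (b[0], b[1], b[3], b[2]) if b[1] == 1 else b

x = (0, 1, 0, 0)
assert f0(x) == (1, 1, 0, 0)
assert f1(f0(x)) == (1, 1, 1, 0)
assert f2(f1(f0(x))) == (1, 1, 0, 1)
```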



Exercise 1.7.7. Determine 𝑓(𝑥)⃗ for the function 𝑓 implemented by the circuit in Figure
1.7.9 for all 𝑥⃗ ∈ {0, 1}4 .

Figure 1.7.9. Reversible circuit with inputs 𝑏0 , . . . , 𝑏3 and outputs 𝑐0 , . . . , 𝑐3 .



Figure 1.7.10. The functions 𝑓0 , 𝑓1 , 𝑓2 corresponding to the reversible circuit from Figure 1.7.9.

This construction can be easily extended to circuits that handle inputs of any size.
Moreover, the construction enables the solution of the subsequent exercise.
Exercise 1.7.8. Show that any circuit that uses only reversible gates is reversible.

1.7.3. Every function can be computed by a reversible circuit. In this sec-


tion, we show that every Boolean function 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑚 with 𝑚, 𝑛 ∈ ℕ can be
computed by a reversible circuit that uses only Toffoli gates. Since, in general, the func-
tion 𝑓 is not reversible, this cannot mean that 𝑓 is implementable by a reversible circuit.
It means that there is an invertible circuit such that the function ℎ implemented by it
allows us to obtain 𝑓(𝑥)⃗ for every 𝑥⃗ ∈ {0, 1}𝑛 very easily. Theorems 1.7.10 and 1.7.12
will make this statement precise.
To simplify our description, we consider fanout operations also as gates which we
write as 𝖥𝖠𝖭𝖮𝖴𝖳. Before we state the first important result of this section, we introduce
some more notation.
Definition 1.7.9. (1) For a circuit 𝐶 we denote by |𝐶|𝐹 the number of gates including
𝖥𝖠𝖭𝖮𝖴𝖳 gates that it uses.
(2) For a Boolean function 𝑓 denote by |𝑓|𝐹 the minimum value of |𝐶|𝐹 over all cir-
cuits 𝐶 that implement 𝑓 and use only 𝖭𝖠𝖭𝖣 and 𝖥𝖠𝖭𝖮𝖴𝖳 gates.

The idea of the construction of a reversible circuit that implements 𝑓 is to start


from a circuit 𝐶 that implements 𝑓 and uses only 𝖭𝖠𝖭𝖣 and 𝖥𝖠𝖭𝖮𝖴𝖳 gates. Such a
circuit exists by Theorem 1.5.9. Then all 𝖭𝖠𝖭𝖣 and 𝖥𝖠𝖭𝖮𝖴𝖳 gates in 𝐶 are replaced by
their reversible counterparts shown in Figure 1.7.7.
Theorem 1.7.10. For all Boolean functions 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑚 , 𝑚, 𝑛 ∈ ℕ, there
is 𝑝 ∈ ℕ0 , 𝑝 ≤ 2|𝑓|𝐹 , a reversible circuit 𝐶𝑟 of size |𝑓|𝐹 that uses only Toffoli gates,
𝑎⃗ ∈ {0, 1}𝑝 , and a function 𝑔 ∶ {0, 1}𝑛 → {0, 1}𝑛+𝑝−𝑚 such that 𝐶𝑟 implements a function
(1.7.7) ℎ ∶ {0, 1}𝑛 × {0, 1}𝑝 → {0, 1}𝑚 × {0, 1}𝑛+𝑝−𝑚
with
(1.7.8) ℎ(𝑥⃗, 𝑎⃗) = (𝑓(𝑥⃗), 𝑔(𝑥⃗))


for all 𝑥⃗ ∈ {0, 1}𝑛 . The bits in 𝑎⃗ are called ancilla bits. The functional value 𝑔(𝑥)⃗ is called
garbage.

Proof. We prove the theorem by induction on 𝑘 = |𝑓|𝐹 .


For the base case, let 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑚 , 𝑚, 𝑛 ∈ ℕ, with |𝑓|𝐹 = 0 and let 𝐶 be
a circuit that implements 𝑓 with |𝐶|𝐹 = 0. Then 𝐶 has only input and output nodes.
Input nodes have indegree 0 and outdegree 1, while output nodes have indegree 1 and
outdegree 0. Therefore, we have 𝑛 = 𝑚 and the function 𝑓 permutes the input bits and
is therefore a bijection. So 𝑝 = 0, 𝐶𝑟 = 𝐶, 𝑎⃗ = (), and 𝑔 ∶ {0, 1}𝑛 → {0, 1}0 , 𝑥⃗ ↦ () have
the asserted properties.
For the inductive step, let 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑚 , 𝑚, 𝑛 ∈ ℕ, with 𝑘 = |𝑓|𝐹 > 0 and
assume that for every Boolean function 𝑓′ with |𝑓′ |𝐹 < 𝑘 the assertion of Theorem
1.7.10 holds. Let 𝐶 be a circuit that implements the function 𝑓, uses only 𝖭𝖠𝖭𝖣 and
𝖥𝖠𝖭𝖮𝖴𝖳 gates, and satisfies |𝐶|𝐹 = 𝑘.
Since 𝑘 > 0, it follows that 𝐶 contains at least one gate, which is a 𝖭𝖠𝖭𝖣 or a
𝖥𝖠𝖭𝖮𝖴𝖳 gate. This implies that 𝐶 has at least one of the following properties.
(1) There is a 𝖥𝖠𝖭𝖮𝖴𝖳 gate in 𝐶 whose outgoing edges are incoming edges of two
output nodes 𝑦 𝑖 and 𝑦𝑗 of 𝐶 where 𝑖, 𝑗 ∈ ℤ𝑚 , 𝑖 ≠ 𝑗.
(2) There is a 𝖭𝖠𝖭𝖣 gate in 𝐶 whose outgoing edge is the incoming edge of an output
node 𝑦 𝑖 of 𝐶 where 𝑖 ∈ ℤ𝑚 .
Assume that 𝐶 has the first property. To construct 𝐶𝑟 we proceed as follows. Re-
move the 𝖥𝖠𝖭𝖮𝖴𝖳 gate and the corresponding output nodes from 𝐶. Connect the in-
coming edge of the removed 𝖥𝖠𝖭𝖮𝖴𝖳 gate to a new output node. Since 𝖥𝖠𝖭𝖮𝖴𝖳 gates
do not change the input, we also denote it by 𝑦 𝑖 . The resulting circuit is denoted by 𝐶 ′ .
An example for 𝐶 and 𝐶 ′ is shown in Figure 1.7.11. In this example we have 𝑛 = 1,
𝑚 = 2, 𝑖 = 0, 𝑗 = 1, and the function implemented by 𝐶 is 𝑓(𝑥0 ) = (𝑥0 , 𝑥0 ).
Let 𝑓′ ∶ {0, 1}𝑛 → {0, 1}𝑚−1 be the function implemented by 𝐶 ′ . In Figure 1.7.11
we have 𝑓′ (𝑥0 ) = 𝑥0 . Since a 𝖥𝖠𝖭𝖮𝖴𝖳 gate was removed from 𝐶 to obtain 𝐶 ′ , we have
|𝑓′ |𝐹 < 𝑘 = |𝑓|𝐹 . Apply the induction hypothesis to 𝑓′ and obtain 𝑝′ , 𝐶𝑟′ , 𝑎′⃗ , and 𝑔′ as
described in Theorem 1.7.10. In the example in Figure 1.7.11 the circuit 𝐶 ′ is reversible.
So, we can choose 𝐶𝑟′ = 𝐶 ′ . Note that |𝐶𝑟′ | = 0 = |𝑓|𝐹 − 1 which means that 𝑝′ = 0,
𝑎′⃗ = (), and 𝑔′ ∶ {0, 1} → {0, 1}0 , (𝑏) ↦ () have the required properties.
We construct the reversible circuit 𝐶𝑟 . We set 𝑝 = 𝑝′ + 2 and add two ancilla
bits 𝑎𝑝−2 and 𝑎𝑝−1 to 𝐶𝑟′ . Additionally, we add a Toffoli gate to 𝐶𝑟′ that replaces the

Figure 1.7.11. 𝐶𝑟′ = 𝐶 ′ is obtained by removing the fanout node from 𝐶.



Figure 1.7.12. Construction of 𝐶𝑟 from 𝐶𝑟′ in Figure 1.7.11.

removed 𝖥𝖠𝖭𝖮𝖴𝖳 gate. The first input of this gate is the output bit 𝑦 𝑖 of 𝐶𝑟′ , and the
second and third inputs are the new ancilla bits 𝑎𝑝−2 = 0 and 𝑎𝑝−1 = 1. As shown
in Exercise 1.7.4, the output of the Toffoli gate is (𝑦 𝑖 , 𝑦 𝑖 , 1). The first two output edges
of the Toffoli gate are connected to two output nodes that are in the same position
as the removed output nodes of the removed 𝖥𝖠𝖭𝖮𝖴𝖳 gate. Then 𝑝 = 𝑝′ + 2, 𝐶𝑟 ,
𝑎⃗ = 𝑎⃗′ ‖(0, 1), and 𝑔(𝑥⃗) = 𝑔′ (𝑥⃗)‖(1) have the required properties. We also note that
|𝐶𝑟 | = |𝐶𝑟′ | + 1 = |𝐶 ′ |𝐹 + 1 = |𝐶|𝐹 and 𝑝 = 𝑝′ + 2 ≤ 2|𝐶 ′ |𝐹 + 2 = 2|𝐶|𝐹 .
Figure 1.7.12 shows how this construction works for the example in Figure 1.7.11.
There, the circuit 𝐶𝑟 is simply the Toffoli gate that implements the fanout operation
and we have 𝑝 = 2 = 𝑝′ + 2, 𝑔(𝑥0 ) = 1, and |𝐶𝑟 | = 1 = |𝑓|𝐹 .
Now suppose that 𝐶 has the second property; i.e., there is a 𝖭𝖠𝖭𝖣 gate whose out-
going edge is connected to an output node 𝑦 𝑖 of 𝐶 for some 𝑖 ∈ ℤ𝑚 . Remove this
𝖭𝖠𝖭𝖣 gate and the corresponding output node 𝑦 𝑖 from 𝐶. Add two new output gates
𝑦′𝑖 and 𝑦′𝑚 to 𝐶 and connect the incoming edges of the removed 𝖭𝖠𝖭𝖣 to 𝑦′𝑖 and 𝑦′𝑚 .
Denote by 𝐶 ′ the resulting circuit and by 𝑓′ the function implemented by 𝐶 ′ . Then we
have |𝑓′ |𝐹 = 𝑘 − 1. In the example shown in Figure 1.7.13 we have 𝑛 = 2, 𝑚 = 1,
𝑦0 = 𝑓(𝑥0 , 𝑥1 ) = 𝑥0 ∧ 𝑥1 , and (𝑦′0 , 𝑦′1 ) = 𝑓′ (𝑥0 , 𝑥1 ) = (𝑥0 , 𝑥1 ).
Apply the induction hypothesis to 𝑓′ and obtain 𝑝′ , 𝐶𝑟′ , 𝑎′⃗ , and 𝑔′ as described in
the assertion of Theorem 1.7.10. In the example in Figure 1.7.13 we can set 𝐶𝑟′ = 𝐶 ′
because 𝐶 ′ is reversible. So we have 𝑝′ = 0, 𝑎′⃗ = (), and 𝑔′ ∶ {0, 1}2 → {0, 1}0 , 𝑏 ⃗ ↦ ().
The reversible circuit 𝐶𝑟 is obtained from 𝐶𝑟′ as follows. We set 𝑝 = 𝑝′ + 1 and add one
ancilla bit 𝑎𝑝−1 = 1. In addition, we add a Toffoli gate that replaces the removed 𝖭𝖠𝖭𝖣
gate. Its first input is 𝑎𝑝−1 = 1. The two other inputs are 𝑦′𝑖 and 𝑦′𝑚 . The corresponding
output gates are removed. Then the output of the Toffoli gate is (𝑦′𝑖 ↑ 𝑦′𝑚 , 𝑦′𝑖 , 𝑦′𝑚 ).
The first outgoing edge of the Toffoli gate is connected to a new output gate 𝑦 𝑖 . The
two other outgoing edges are connected to two new garbage output gates. So we have
𝑔(𝑥⃗) = 𝑔′ (𝑥⃗)‖(𝑦′𝑖 , 𝑦′𝑚 ). We note that |𝐶𝑟 | = |𝐶𝑟′ | + 1 = |𝐶 ′ |𝐹 + 1 = |𝐶|𝐹 and
𝑝 = 𝑝′ + 1 ≤ 2|𝐶 ′ |𝐹 + 1 < 2|𝐶 ′ |𝐹 + 2 = 2|𝐶|𝐹 .
Figure 1.7.14 shows how this construction works for the example in Figure 1.7.13.
In this example, 𝑎0 is the first, 𝑥0 is the second, and 𝑥1 is the third input of the Toffoli
gate. So the circuit 𝐶𝑟 is a simple modification of the Toffoli gate that implements the

Figure 1.7.13. 𝐶𝑟′ = 𝐶 ′ is obtained by removing the 𝖭𝖠𝖭𝖣 gate from 𝐶.



Figure 1.7.14. Construction of 𝐶𝑟 from 𝐶𝑟′ in Figure 1.7.13.

𝖭𝖠𝖭𝖣 gate and we have 𝑝 = 1 = 𝑝′ + 1, 𝑔(𝑥0 , 𝑥1 ) = (𝑥0 , 𝑥1 ), and |𝐶𝑟 | = 1 = |𝑓|𝐹 . This
concludes the proof. □
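For the NAND example of Figure 1.7.14, the construction in the proof reduces to a single Toffoli gate with one ancilla bit. The following Python sketch is a hedged illustration (target-first wiring, as in the figure; all names are ours, not the book's):

```python
# Sketch of the base construction of Theorem 1.7.10 for f(x0, x1) = x0 NAND x1:
# one Toffoli gate, one ancilla bit a0 = 1, and garbage g(x0, x1) = (x0, x1).
def toffoli(t, c0, c1):
    # wires ordered (target, control, control): (t, c0, c1) -> ((c0&c1)^t, c0, c1)
    return ((c0 & c1) ^ t, c0, c1)

def h(x0, x1, a0=1):
    y, g0, g1 = toffoli(a0, x0, x1)
    return y, (g0, g1)              # h(x, a) = (f(x), g(x)) as in (1.7.8)

for x0 in (0, 1):
    for x1 in (0, 1):
        y, g = h(x0, x1)
        assert y == 1 - (x0 & x1)   # y = x0 NAND x1
        assert g == (x0, x1)        # the garbage is just the input
```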
Exercise 1.7.11. State and prove a theorem analogous to Theorem 1.7.10 where Fred-
kin gates are used instead of Toffoli gates.

When we use the construction from the proof of Theorem 1.7.10 to construct quantum
circuits, the garbage may be problematic. Therefore, we need the following
theorem whose proof uses the so-called uncompute trick.
Theorem 1.7.12. For all Boolean functions 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑚 , 𝑚, 𝑛 ∈ ℕ, there is
𝑝 ∈ ℕ0 , 𝑝 ≤ 2|𝑓|𝐹 , a reversible circuit 𝐷𝑟 with |𝐷𝑟 | = O(|𝑓|𝐹 ) that uses only Toffoli, 𝖭𝖮𝖳,
and 𝖢𝖭𝖮𝖳 gates such that 𝐷𝑟 implements a function
(1.7.9) ℎ ∶{0, 1}𝑛 × {0, 1}𝑛+𝑝 × {0, 1}𝑚 → {0, 1}𝑛 × {0, 1}𝑛+𝑝 × {0, 1}𝑚

with
(1.7.10) ℎ(𝑥⃗, 0⃗, 𝑦⃗) = (𝑥⃗, 0⃗, 𝑦⃗ ⊕ 𝑓(𝑥⃗))

for all 𝑥⃗ ∈ {0, 1}𝑛 and 𝑦 ⃗ ∈ {0, 1}𝑚 .

Proof. Let 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑚 , 𝑚, 𝑛 ∈ ℕ. Let 𝑝, 𝐶𝑟 , 𝑎,⃗ and 𝑔 be as in Theorem 1.7.10.
We construct the circuit 𝐷𝑟 from 𝐶𝑟 . This construction is illustrated in Figure 1.7.15
for 𝐶𝑟 from Figure 1.7.14.
𝐷𝑟 has a total of 2𝑛 + 𝑝 + 𝑚 input nodes. The initial sequence consists of the
first 𝑛 nodes, represented as 𝑥⃗ = (𝑥0 , . . . , 𝑥𝑛−1 ), followed by 𝑥⃗′ = (𝑥′0 , . . . , 𝑥′𝑛−1 ).
Subsequently, we include a sequence of 𝑝 ancillary input nodes, denoted as
𝑎⃗ = (𝑎0 , . . . , 𝑎𝑝−1 ), and 𝑚 input nodes 𝑦⃗ = (𝑦0 , . . . , 𝑦𝑚−1 ). In the example shown
in Figure 1.7.15, where 𝑛 = 2, 𝑚 = 𝑝 = 1, we append two input nodes after 𝑥1 , set 𝑎0
to 0, and introduce the input node 𝑦0 .
The circuit 𝐷𝑟 applies a bitwise 𝖢𝖭𝖮𝖳 to 𝑥⃗ and 𝑥′⃗ . If 𝑥′⃗ = 0,⃗ then this operation
copies 𝑥⃗ to 𝑥′⃗ . In 𝐷𝑟 , there is also a 𝖭𝖮𝖳 gate after each ancilla input node whose value
in 𝑎⃗ is 1. These 𝖭𝖮𝖳 gates change an ancillary bit vector 0⃗ of length 𝑝 to 𝑎.⃗ In the
example, we have 𝑎0 = 1. Therefore, a 𝖭𝖮𝖳 gate is inserted behind the input node 𝑎0 .
Now, the reversible circuit 𝐶𝑟 is applied to the input 𝑥⃗′ ‖𝑎⃗. This does not change
𝑥⃗ and 𝑦.⃗ The circuit 𝐷𝑟 then copies 𝑓(𝑥)⃗ to 𝑦 ⃗ using bitwise 𝖢𝖭𝖮𝖳. In the example, 𝐶𝑟
produces the bit string (𝑓(𝑥0 , 𝑥1 ), 𝑔0 (𝑥0 , 𝑥1 ), 𝑔1 (𝑥0 , 𝑥1 )) where 𝑓(𝑥0 , 𝑥1 ) = 𝑥0 ↑ 𝑥1 . In
addition, a 𝖢𝖭𝖮𝖳 gate is required to copy 𝑓(𝑥0 , 𝑥1 ) to 𝑦0 .

Figure 1.7.15. Construction of 𝐷𝑟 from Theorem 1.7.12.

Finally, the uncompute trick is used. The inverse circuit 𝐶𝑟−1 is applied to the bits
with indices 𝑛, . . . , 2𝑛 + 𝑝 − 1. This gives (𝑥⃗, 𝑎⃗). Since the first 𝑛 input nodes have not
⃗ Since the first 𝑛 input nodes have not
changed their values, 𝖢𝖭𝖮𝖳 gates can be used to change 𝑥′⃗ back to 0.⃗ In addition,
applying 𝖭𝖮𝖳 gates to the appropriate ancilla bits maps 𝑎⃗ to 0.⃗ In the example, two
𝖢𝖭𝖮𝖳 gates are used to obtain 𝑥′⃗ = 0.⃗ Also, one 𝖭𝖮𝖳 gate is required to change 𝑎0 = 1
to 𝑎0 = 0. The assertion about the size of 𝐷𝑟 is verified in Exercise 1.7.13. □
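The uncompute trick can be traced in code for the example of Figure 1.7.15 (𝑛 = 2, 𝑚 = 𝑝 = 1). The following Python sketch is a minimal illustration of ours, not the general construction; XOR assignments model the 𝖢𝖭𝖮𝖳 and 𝖭𝖮𝖳 gates.

```python
def cr(t, c0, c1):
    # reversible NAND core of Figure 1.7.14; a Toffoli gate, hence an involution
    return ((c0 & c1) ^ t, c0, c1)

def dr(x0, x1, y0):
    x0p, x1p, a0 = 0, 0, 0           # scratch wires x0', x1', a0, initially 0
    x0p ^= x0; x1p ^= x1             # bitwise CNOTs copy x into x'
    a0 ^= 1                          # a NOT gate prepares the ancilla value 1
    a0, x0p, x1p = cr(a0, x0p, x1p)  # a0 now holds f(x0, x1) = x0 NAND x1
    y0 ^= a0                         # a CNOT copies the result into y0
    a0, x0p, x1p = cr(a0, x0p, x1p)  # uncompute: apply C_r again
    a0 ^= 1                          # undo the NOT ...
    x0p ^= x0; x1p ^= x1             # ... and the copies; scratch wires are 0 again
    return (x0, x1, x0p, x1p, a0, y0)

for x0 in (0, 1):
    for x1 in (0, 1):
        for y0 in (0, 1):
            assert dr(x0, x1, y0) == (x0, x1, 0, 0, 0, y0 ^ (1 - (x0 & x1)))
```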
Exercise 1.7.13. Show that the circuit 𝐷𝑟 from Theorem 1.7.12 satisfies |𝐷𝑟 | = O(|𝑓|𝐹 )
and determine an appropriate O-constant.
Exercise 1.7.14. Construct a reversible circuit that computes the function 𝑓(𝑏0 , 𝑏1 ) =
𝑏0 ↓ 𝑏1 .
Chapter 2

Hilbert Spaces

Hilbert spaces, named after the mathematician David Hilbert, serve as the fundamental
mathematical framework for quantum mechanics. In the context of quantum comput-
ing, finite-dimensional complex Hilbert spaces prove to be sufficient. These spaces are
complex vector spaces equipped with an inner product, known as state spaces in the
context of quantum computing. They provide a powerful mathematical representation
of the quantum mechanics that is required for understanding quantum algorithms, in
particular quantum states, their evolution, and measurement. In our presentation, we
introduce and use the bra-ket notation. It is also referred to as Dirac notation due to
its originator, physicist Paul Dirac, in the 1930s, and plays a central role in quantum
mechanics. Its primary purpose is to simplify the presentation of the mathematical
framework of Hilbert spaces, making it more accessible and concise.
It is crucial to emphasize the following point: in the realm of general quantum
mechanics, which models the behavior of particles at the atomic and subatomic levels,
finite-dimensional Hilbert spaces frequently fall short when addressing numerous as-
pects of the theory. As we confront progressively more complex and intricate quantum
systems, the limitations inherent in finite-dimensional spaces become increasingly ap-
parent. This expansion of complexity necessitates a departure from the confines of
linear algebra, leading quantum theory into the profound domains of mathematical
analysis.
Appendix B presents general linear algebra, which forms the foundation for this
chapter. Our objective here is to introduce the concept of finite-dimensional Hilbert
spaces and explore their notable properties. We employ the widely used Dirac notation
from physics, which proves to be elegant, and we select our examples from the realm of
quantum computing. After the introductory sections, this chapter delves into vital con-
cepts crucial for our discussion of quantum mechanics and quantum algorithms. We
explore various special operators, such as Hermitian, unitary, and normal operators,
involutions, and projections, which play pivotal roles in modeling quantum computa-
tion. Of particular significance are the spectral theorem, essential, for example, when


stating the quantum mechanical postulates, and the Schmidt decomposition theorem,
enabling the characterization of a fundamental quantum mechanical phenomenon:
entanglement.
Throughout this chapter, 𝑘 is a positive integer and ℍ denotes a complex vector
space of dimension 𝑘.

2.1. Kets and state spaces


In this section, we introduce the ket notation which is used to denote elements of the
complex vector space ℍ. We illustrate this notation by applying it to the so-called state
spaces, which provide a convenient way of modeling the states of quantum computers.
Additionally, we demonstrate how elements of ℍ can be represented as vectors in the
complex vector space ℂ𝑘 .

2.1.1. Kets. As explained in Appendix B, elements of vector spaces are called vec-
tors and are denoted by 𝑣 ⃗ for some character 𝑣. In quantum physics, the following
notation is used.
Definition 2.1.1. Every element of ℍ is called a ket. Such a ket is denoted by |𝜑⟩ for
some character 𝜑 and is pronounced “ket-𝜑”. The character 𝜑 may be replaced by any
other character, number, or even word.

It is important to note that the sum of two kets |𝜑⟩ , |𝜓⟩ ∈ ℍ is written as |𝜑⟩ + |𝜓⟩
and not as |𝜑 + 𝜓⟩. In the same way, all expressions that contain several kets are written
by keeping the | ⟩ notation for all kets. In the next section, we will present examples of
the ket notation.

2.1.2. State spaces. We introduce the finite-dimensional complex vector spaces


that will be central for modeling quantum computing. Let 𝑛 ∈ ℕ. Consider the lexico-
graphically ordered sequence

(2.1.1) 𝐵𝑛 = (|𝑏⃗⟩)_{𝑏⃗ ∈ {0,1}^𝑛} .

For example, if 𝑛 = 2, then this sequence is


(2.1.2) 𝐵2 = (|00⟩ , |01⟩ , |10⟩ , |11⟩).
We define ℍ𝑛 as the set of all formal linear combinations of the elements of 𝐵𝑛 with
complex coefficients; that is,

(2.1.3) ℍ𝑛 = ∑_{𝑏⃗ ∈ {0,1}^𝑛} ℂ |𝑏⃗⟩ = { ∑_{𝑏⃗ ∈ {0,1}^𝑛} 𝛼_{𝑏⃗} |𝑏⃗⟩ ∶ 𝛼_{𝑏⃗} ∈ ℂ for all 𝑏⃗ ∈ {0, 1}^𝑛 } .

Note that the first sum in (2.1.3) is direct. On ℍ𝑛 , we define componentwise addition
and multiplication with complex scalars as follows. If

(2.1.4) |𝜑⟩ = ∑_{𝑏⃗ ∈ {0,1}^𝑛} 𝛼_{𝑏⃗} |𝑏⃗⟩ , |𝜓⟩ = ∑_{𝑏⃗ ∈ {0,1}^𝑛} 𝛽_{𝑏⃗} |𝑏⃗⟩

where 𝛼_{𝑏⃗} , 𝛽_{𝑏⃗} ∈ ℂ for all 𝑏⃗ ∈ {0, 1}^𝑛 , then we set

(2.1.5) |𝜑⟩ + |𝜓⟩ = ∑_{𝑏⃗ ∈ {0,1}^𝑛} (𝛼_{𝑏⃗} + 𝛽_{𝑏⃗}) |𝑏⃗⟩ .

Also, if 𝛾 ∈ ℂ, then we set


(2.1.6) 𝛾 |𝜑⟩ = ∑_{𝑏⃗ ∈ {0,1}^𝑛} 𝛾 𝛼_{𝑏⃗} |𝑏⃗⟩ .

Then ℍ𝑛 is a complex vector space, and the sequence (|𝑏⃗⟩)_{𝑏⃗ ∈ {0,1}^𝑛} is a basis of ℍ𝑛 .
Therefore, the dimension of ℍ𝑛 is 2^𝑛 . This vector space will be used in Section 3.1.5 to
describe the states of 𝑛-qubit registers. This explains the following definition.
Definition 2.1.2. (1) The 2^𝑛 -dimensional complex vector space ℍ𝑛 from (2.1.3)
equipped with addition and scalar multiplication as defined in (2.1.5) and (2.1.6)
is called the 𝑛-qubit state space. In particular, ℍ1 is called the single-qubit state
space.
(2) The lexicographically ordered sequence (|𝑏⃗⟩)_{𝑏⃗ ∈ {0,1}^𝑛} is called the computational
basis of ℍ𝑛 .
Example 2.1.3. In classical computing, the bits 0 and 1 are used. In quantum comput-
ing, these bits are replaced by quantum bits or qubits. The state of a qubit is an element
in the single-qubit state space ℍ1 . This will be explained in more detail in Section 3.1.2.
The computational basis of ℍ1 is 𝐵 = (|0⟩ , |1⟩). Another basis of ℍ1 is
(2.1.7) (|𝑥+ ⟩ , |𝑥− ⟩) = ( (|0⟩ + |1⟩)/√2 , (|0⟩ − |1⟩)/√2 ) .
Here the symbols 𝑥+ and 𝑥− are used to denote the basis elements. This basis will play
a role in later sections.
Exercise 2.1.4. Show that (|𝑥+ ⟩ , |𝑥− ⟩) is a basis of the single-qubit state space ℍ1 .
Example 2.1.5. We describe an alternative representation of the computational basis
elements of the 𝑛-qubit state space ℍ𝑛 . For this, we use the map
(2.1.8) stringToInt ∶ {0, 1}^𝑛 → ℤ_{2^𝑛} , 𝑏⃗ = (𝑏0 ⋯ 𝑏𝑛−1 ) ↦ ∑_{𝑖=0}^{𝑛−1} 𝑏𝑖 2^{𝑛−𝑖−1}

which was introduced in Definition 1.1.12. Also, in Exercise 1.1.13 it was shown that
this map is a bijection. Using this bijection, we identify the bit vectors in {0, 1}𝑛 with
the integers in ℤ2𝑛 . For example, the bit vector (010) is identified with the integer
0 ⋅ 2^2 + 1 ⋅ 2^1 + 0 ⋅ 2^0 = 2.
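The map stringToInt of (2.1.8) is easy to realize and check; a short Python sketch (illustrative names of ours):

```python
# A short realization of the bijection stringToInt from (2.1.8).
def string_to_int(bits):
    n = len(bits)
    return sum(bits[i] * 2 ** (n - i - 1) for i in range(n))

assert string_to_int((0, 1, 0)) == 2   # the example above
# For n = 3, every integer in Z_8 is hit exactly once, so the map is a bijection:
images = sorted(string_to_int((a, b, c))
                for a in (0, 1) for b in (0, 1) for c in (0, 1))
assert images == list(range(8))
```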
So we can write the computational basis of ℍ𝑛 as (|𝑏⟩𝑛 )𝑏∈ℤ2𝑛 , where the index 𝑛
indicates that the number in the ket is considered as an 𝑛-bit string. For instance,
the computational basis of ℍ2 is (|0⟩2 , |1⟩2 , |2⟩2 , |3⟩2 ). We also obtain the following
alternative representation of ℍ𝑛 :
(2.1.9) ℍ𝑛 = { ∑_{𝑏=0}^{2^𝑛−1} 𝛼𝑏 |𝑏⟩𝑛 ∶ 𝛼𝑏 ∈ ℂ for all 𝑏 ∈ ℤ_{2^𝑛} } .

2.1.3. Vector representations. Let 𝐵 = (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) be any basis of ℍ. In


Definition B.6.6 we have assigned to each |𝜑⟩ ∈ ℍ its coefficient vector
(2.1.10) |𝜑⟩𝐵 = (𝛼0 , . . . , 𝛼𝑘−1 ) ∈ ℂ𝑘
with respect to 𝐵. It is the uniquely determined vector in ℂ𝑘 with
(2.1.11) |𝜑⟩ = 𝐵 |𝜑⟩𝐵 = ∑_{𝑖=0}^{𝑘−1} 𝛼𝑖 |𝑏𝑖 ⟩ .

Theorem B.6.7 states that the map


(2.1.12) ℍ → ℂ𝑘 , |𝜑⟩ ↦ |𝜑⟩𝐵
is an isomorphism of ℂ-vector spaces. Using this isomorphism, we identify kets in ℍ
with vectors in ℂ𝑘 which is useful in many contexts.
Example 2.1.6. The definition of the single qubit state space ℍ1 and Exercise 2.1.4 tell
us that 𝐵 = (|0⟩ , |1⟩) and 𝐶 = (|𝑥+ ⟩ , |𝑥− ⟩) from (2.1.7) are bases of ℍ1 . Let
(2.1.13) |𝜑⟩ = |0⟩ + 𝑖 |1⟩ .
Then the coefficient vector of |𝜑⟩ with respect to the basis 𝐵 is
(2.1.14) |𝜑⟩𝐵 = (1, 𝑖).
Also, we have
(2.1.15) |𝜑⟩ = |0⟩ + 𝑖 |1⟩ = ((1 + 𝑖)/√2) |𝑥+ ⟩ + ((1 − 𝑖)/√2) |𝑥− ⟩ .
Hence, the coefficient vector of |𝜑⟩ with respect to the basis 𝐶 is
(2.1.16) |𝜑⟩𝐶 = ( (1 + 𝑖)/√2 , (1 − 𝑖)/√2 ) .
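The change of basis in this example can be verified numerically. The following Python sketch (our own, using plain complex numbers) computes the C-coordinates of |𝜑⟩ from its B-coordinates via |0⟩ = (|𝑥+⟩ + |𝑥−⟩)/√2 and |1⟩ = (|𝑥+⟩ − |𝑥−⟩)/√2, which follow from (2.1.7).

```python
# Numerical check of Example 2.1.6 using plain Python complex numbers.
from math import sqrt

alpha0, alpha1 = 1, 1j                     # coordinates w.r.t. B = (|0>, |1>)
gamma_plus = (alpha0 + alpha1) / sqrt(2)   # coordinate of |x+>
gamma_minus = (alpha0 - alpha1) / sqrt(2)  # coordinate of |x->
assert abs(gamma_plus - (1 + 1j) / sqrt(2)) < 1e-12
assert abs(gamma_minus - (1 - 1j) / sqrt(2)) < 1e-12
```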

2.2. Inner products


In this section, we introduce and discuss inner products on ℍ.

2.2.1. Basics. Inner products on ℍ are maps ℍ × ℍ → ℂ with certain properties


which are explained in Definition 2.2.1. We will write them as
(2.2.1) ⟨⋅|⋅⟩ ∶ ℍ × ℍ → ℂ, (|𝜑⟩ , |𝜓⟩) ↦ ⟨𝜑|𝜓⟩
and use the following simplifying notation. Let
(2.2.2) |𝜑⟩ = ∑_{𝑖=0}^{𝑚−1} 𝛼𝑖 |𝜑𝑖 ⟩ , |𝜓⟩ = ∑_{𝑖=0}^{𝑛−1} 𝛽𝑖 |𝜓𝑖 ⟩

with 𝑚, 𝑛 ∈ ℕ, 𝛼𝑖 ∈ ℂ, |𝜑𝑖 ⟩ ∈ ℍ for all 𝑖 ∈ ℤ𝑚 , and 𝛽 𝑖 ∈ ℂ, |𝜓𝑖 ⟩ ∈ ℍ for all 𝑖 ∈ ℤ𝑛 .


Then we write
(2.2.3) ⟨𝜑|𝜓⟩ = ( ∑_{𝑖=0}^{𝑚−1} 𝛼𝑖 ⟨𝜑𝑖 | ) ( ∑_{𝑖=0}^{𝑛−1} 𝛽𝑖 |𝜓𝑖 ⟩ ) .

So, we change each ket |𝜑𝑖 ⟩ in the left argument to a so-called bra ⟨𝜑𝑖 | (see Section
2.2.3 for an explanation) with the same symbol inside and omit the outer ⟨⟩. Using this
notation, we now define inner products.
Definition 2.2.1. An inner product on ℍ is a map
(2.2.4) ⟨⋅|⋅⟩ ∶ ℍ × ℍ → ℂ, (|𝜑⟩ , |𝜓⟩) ↦ ⟨𝜑|𝜓⟩
that satisfies the following three conditions for all kets |𝜉⟩ , |𝜑⟩ , |𝜓⟩ ∈ ℍ and all scalars
𝛼 ∈ ℂ.
(1) Linearity in the second argument: ⟨𝜉| (|𝜑⟩ + |𝜓⟩) = ⟨𝜉|𝜑⟩ + ⟨𝜉|𝜓⟩ and ⟨𝜑| (𝛼 |𝜓⟩) =
𝛼⟨𝜑|𝜓⟩.
(2) Conjugate symmetry: ⟨𝜓|𝜑⟩ = \overline{⟨𝜑|𝜓⟩}. This property is also called Hermitian sym-
metry or conjugate commutativity. It implies that ⟨𝜑|𝜑⟩ is a real number.
(3) Positive definiteness: ⟨𝜑|𝜑⟩ ≥ 0 and ⟨𝜑|𝜑⟩ = 0 if and only if |𝜑⟩ = 0. This property
is also called positivity.

Inner products on real vector spaces are defined analogously, but the conjugate
symmetry condition becomes a symmetry condition. Note that the definition of inner
products does not require ℍ to be finite dimensional.
Exercise 2.2.2. Show that for all |𝜑⟩ ∈ ℍ the inner product ⟨𝜑|𝜑⟩ is a real number.

We present three important properties of inner products.


Proposition 2.2.3. Let ⟨⋅|⋅⟩ be an inner product on ℍ. Then for all 𝛼 ∈ ℂ, |𝜉⟩ , |𝜑⟩ , |𝜓⟩ ∈
ℍ, and the zero element 0⃗ ∈ ℍ the following hold.

(1) ⟨0⃗|𝜑⟩ = ⟨𝜑|0⃗⟩ = 0.
(2) (⟨𝜉| + ⟨𝜑|) |𝜓⟩ = ⟨𝜉|𝜓⟩ + ⟨𝜑|𝜓⟩ and (𝛼 ⟨𝜑|) |𝜓⟩ = \overline{𝛼} ⟨𝜑|𝜓⟩.
(3) (⟨𝜑| + ⟨𝜓|)(|𝜑⟩ + |𝜓⟩) = ⟨𝜑|𝜑⟩ + 2ℜ⟨𝜑|𝜓⟩ + ⟨𝜓|𝜓⟩.

The second property in Proposition 2.2.3 is called sesquilinearity or conjugate lin-


earity of the inner product in the first argument.
Exercise 2.2.4. Prove Proposition 2.2.3.

Using the linearity of the inner product in the second argument and the conjugate
linearity in the first argument we obtain the distributive law
(2.2.5) ( ∑_{𝑖=0}^{𝑚−1} 𝛼𝑖 ⟨𝜑𝑖 | ) ( ∑_{𝑗=0}^{𝑛−1} 𝛽𝑗 |𝜓𝑗 ⟩ ) = ∑_{𝑖=0}^{𝑚−1} ∑_{𝑗=0}^{𝑛−1} \overline{𝛼𝑖 } 𝛽𝑗 ⟨𝜑𝑖 |𝜓𝑗 ⟩

where 𝑚, 𝑛 ∈ ℕ, 𝛼𝑖 ∈ ℂ, |𝜑𝑖 ⟩ ∈ ℍ for all 𝑖 ∈ ℤ𝑚 , and 𝛽𝑗 ∈ ℂ, |𝜓𝑗 ⟩ ∈ ℍ for all 𝑗 ∈ ℤ𝑛 .


The next proposition is useful in many contexts.
Proposition 2.2.5. If ⟨⋅|⋅⟩ is an inner product on ℍ and if ℍ′ is a linear subspace in ℍ,
then the restriction ⟨⋅|⋅⟩ℍ′ of ⟨⋅|⋅⟩ to ℍ′ is an inner product on ℍ′ .
Exercise 2.2.6. Prove Proposition 2.2.5.

2.2.2. Construction of inner products. We construct inner products on ℍ and


begin with the case ℍ = ℂ𝑘 . For the construction we define the dual of vectors in ℂ𝑘 .
In Section B.4.1 we identify every vector 𝑣 ⃗ = (𝑣 0 , . . . , 𝑣 𝑘−1 ) ∈ ℂ𝑘 with the matrix
𝑣
⎛ 0 ⎞
𝑣
(2.2.6) ⎜ 1 ⎟ ∈ ℂ(𝑘,1)
⎜ ⋮ ⎟
⎝𝑣 𝑘−1 ⎠
that has 𝑣 ⃗ as its only column vector. This is used in the following definition.
Definition 2.2.7. Let 𝑘 ∈ ℕ and let 𝑣 ⃗ = (𝑣 0 , . . . , 𝑣 𝑘−1 ) ∈ ℂ𝑘 . Then we define the dual
𝑣∗⃗ of 𝑣 ⃗ as the complex conjugate and transpose of the matrix in ℂ(1,𝑘) with 𝑣 ⃗ as its only
row vector; that is,

(2.2.7) 𝑣⃗∗ = \overline{𝑣⃗}^𝖳 = (\overline{𝑣0 }, . . . , \overline{𝑣𝑘−1 }).

The definition of matrix multiplication allows multiplying the dual 𝑣∗⃗ of a vector
𝑣 ⃗ = (𝑣 0 , . . . , 𝑣 𝑘−1 ) ∈ ℂ𝑘 with another vector 𝑤⃗ = (𝑤 0 , . . . , 𝑤 𝑘−1 ) ∈ ℂ𝑘 . The result is
(2.2.8) 𝑣⃗∗ 𝑤⃗ = (\overline{𝑣0 }, . . . , \overline{𝑣𝑘−1 }) (𝑤0 , . . . , 𝑤𝑘−1 )^𝖳 = ∑_{𝑖=0}^{𝑘−1} \overline{𝑣𝑖 } 𝑤𝑖 .

Example 2.2.8. Let 𝑘 = 2, 𝑣 ⃗ = (1, 𝑖), and 𝑤⃗ = (𝑖, 1). Then we have 𝑣∗⃗ = (1, −𝑖) and
𝑣∗⃗ 𝑤⃗ = 𝑖 − 𝑖 = 0.
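Example 2.2.8 can be checked with a few lines of Python (illustrative; the helper name is ours): the dual conjugates the entries, and the product is the sum of 𝑣̄ᵢ𝑤ᵢ.

```python
# Example 2.2.8 in plain Python: v* conjugates the entries of v, and
# v* w = sum over i of conj(v_i) * w_i.
def hermitian_inner(v, w):
    return sum(vi.conjugate() * wi for vi, wi in zip(v, w))

v = [1 + 0j, 1j]
w = [1j, 1 + 0j]
assert hermitian_inner(v, w) == 0   # i - i = 0, as in the example
assert hermitian_inner(v, v) == 2   # <v|v> is real and nonnegative
```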

Now we introduce the standard Hermitian inner product on ℂ𝑘 .


Theorem 2.2.9. The map
(2.2.9) ⟨⋅|⋅⟩ ∶ ℂ𝑘 × ℂ𝑘 → ℂ, (𝑣⃗, 𝑤⃗) ↦ ⟨𝑣⃗|𝑤⃗⟩ = 𝑣⃗∗ 𝑤⃗
is an inner product on the complex vector space ℂ𝑘 . It is called the Hermitian inner prod-
uct on ℂ𝑘 .

We will always write the standard Hermitian inner product on ℂ𝑘 as ⟨⋅|⋅⟩.


Exercise 2.2.10. Prove Theorem 2.2.9.

We use Theorem 2.2.9 to construct further inner products on ℍ.


Corollary 2.2.11. Let 𝐵 be a basis of ℍ. Then the map

(2.2.10) ⟨⋅|⋅⟩𝐵 ∶ ℍ × ℍ → ℂ, (|𝜑⟩ , |𝜓⟩) ↦ ⟨𝜑|𝜓⟩𝐵 = (|𝜑⟩𝐵 )∗ |𝜓⟩𝐵
is an inner product on ℍ. It is called the Hermitian inner product on ℍ with respect to
the basis 𝐵.
Exercise 2.2.12. Prove Corollary 2.2.11.

Next, we show that the Hermitian inner product with respect to a basis 𝐵 of ℍ can
be used to determine the coefficient vectors of kets in ℍ with respect to 𝐵.

Proposition 2.2.13. Let 𝐵 = (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) be a basis of ℍ. Then the following hold.
(1) For all 𝑖, 𝑗 ∈ ℤ𝑘 we have
(2.2.11) ⟨𝑏𝑖 |𝑏𝑗 ⟩𝐵 = 𝛿 𝑖,𝑗 .
(2) For all |𝜑⟩ ∈ ℍ we have
(2.2.12) |𝜑⟩ = ∑_{𝑖=0}^{𝑘−1} ⟨𝑏𝑖 |𝜑⟩𝐵 |𝑏𝑖 ⟩ .

Exercise 2.2.14. Prove Proposition 2.2.13.

2.2.3. Bras. We introduce and discuss the bra notation that will further simplify
the presentation of the theory of Hilbert spaces. We assume that ⟨⋅|⋅⟩ is an inner product
on ℍ. In the next definition, the dual of ℍ is used, which — as explained in Section
B.3.2 — is ℍ∗ = End(ℍ, ℂ).
Definition 2.2.15. Every element of the dual ℍ∗ of ℍ is called a bra. Such a bra is
denoted by ⟨𝜑| for some character 𝜑 and is pronounced “bra-𝜑”. The character 𝜑 may
be replaced by any other character, number, or even word.

The next theorem associates with every ket in ℍ a bra in ℍ∗ .


Theorem 2.2.16. (1) For all |𝜑⟩ ∈ ℍ the map
(2.2.13) ⟨𝜑| ∶ ℍ → ℂ, |𝜓⟩ ↦ ⟨𝜑|𝜓⟩

is in the dual ℍ∗ of ℍ. It is called the dual of |𝜑⟩.
(2) The map
(2.2.14) ℍ ↦ ℍ∗ , |𝜑⟩ ↦ ⟨𝜑|
is a conjugate linear bijection; that is, for all |𝜑⟩ , |𝜓⟩ ∈ ℍ and 𝛼 ∈ ℂ the following
hold. For |𝜉⟩ = |𝜑⟩ + |𝜓⟩ we have ⟨𝜉| = ⟨𝜑| + ⟨𝜓| and for |𝜉⟩ = 𝛼 |𝜑⟩ we have
⟨𝜉| = 𝛼̄ ⟨𝜑|.

Proof. It follows from the linearity in the second argument of the inner product that
for all |𝜑⟩ ∈ ℍ the map ⟨𝜑| from (2.2.13) is in ℍ∗ . Furthermore, due to the positivity
of the inner product, the map (2.2.14) is injective. Hence, Theorem B.6.18 implies that
this map is a bijection. Its conjugate linearity follows from Proposition 2.2.3. □

Theorem 2.2.16 shows that all elements in ℍ∗ can be written as ⟨𝜑| for some unique-
ly determined |𝜑⟩ ∈ ℍ and vice versa. Therefore, we will always use the same character,
number, or word inside |⟩ and ⟨| to denote the kets and bras that correspond to each
other. The construction of ⟨𝜑| from |𝜑⟩ ∈ ℍ will be explained in Proposition 2.2.37.
The bra notation is quite elegant since for every |𝜑⟩ , |𝜓⟩ ∈ ℍ the image of |𝜓⟩ ∈
ℍ under ⟨𝜑| is obtained by “gluing” the two expressions ⟨𝜑| and |𝜓⟩ together, giving
⟨𝜑|𝜓⟩. We also obtain the following interpretation of the notation introduced in (2.2.3):
the inner product of a linear combination of kets with another linear combination of
kets is obtained by gluing together the linear combination of bras corresponding to the
first linear combination of kets with the second linear combination of kets. The linear
60 2. Hilbert Spaces

combinations are written in parentheses. Also, the distributive law from (2.2.5) holds
for bras and kets.
Exercise 2.2.17. Determine the images of the computational basis states of ℍ1 under
⟨𝑥+ | and ⟨𝑥− |.

2.2.4. Hilbert spaces. We define finite-dimensional Hilbert spaces. To define


general Hilbert spaces, the notion of a complete metric space is required, which is be-
yond the scope of this book.
Definition 2.2.18. A finite-dimensional Hilbert space is a pair (𝑉, ⟨⋅|⋅⟩) where 𝑉 is a
finite-dimensional real or complex vector space and ⟨⋅|⋅⟩ is an inner product on 𝑉.

By Corollary 2.2.11, there is an inner product on every finite-dimensional com-


plex vector space. Hence, every finite-dimensional complex vector space can be made
a Hilbert space by choosing such an inner product on it. Also, if (ℍ, ⟨⋅|⋅⟩) is a finite-
dimensional complex Hilbert space and if ℍ′ is a linear subspace of ℍ, then it follows
from Proposition 2.2.5 that (ℍ′ , ⟨⋅|⋅⟩ℍ′ ) is also a finite-dimensional complex Hilbert
space where ⟨⋅|⋅⟩ℍ′ denotes the restriction of the inner product ⟨⋅|⋅⟩ to ℍ′ .
State spaces are of particular importance in the discussion of quantum algorithms.
We now define inner products on these spaces so that they become Hilbert spaces.
Definition 2.2.19. Let 𝑛 ∈ ℕ. Then we denote by ⟨⋅|⋅⟩ the inner product on ℍ𝑛 with respect to the computational basis of ℍ𝑛 . We also write ℍ𝑛 for the Hilbert space (ℍ𝑛 , ⟨⋅|⋅⟩).

We now discuss some examples of finite-dimensional complex Hilbert spaces.


Example 2.2.20. Consider the single-qubit state space ℍ1 . We determine the inner
product of
(2.2.15) |𝜑⟩ = 𝛼 |0⟩ + 𝛽 |1⟩ and |𝜓⟩ = 𝛾 |0⟩ + 𝛿 |1⟩
where 𝛼, 𝛽, 𝛾, 𝛿 ∈ ℂ. Using (2.2.5) and Proposition 2.2.13, we obtain the following:
(2.2.16) ⟨𝜑|𝜓⟩ = 𝛼̄𝛾 + 𝛽̄𝛿.
Next, we recall that 𝐶 = (|𝑥+ ⟩ , |𝑥− ⟩) introduced in (2.1.7) is another basis of ℍ1 . So
(ℍ1 , ⟨⋅|⋅⟩𝐶 ) is also a Hilbert space. The representation of |𝜑⟩ and |𝜓⟩ with respect to 𝐶 is
(2.2.17) |𝜑⟩ = ((𝛼 + 𝛽)/√2) |𝑥+ ⟩ + ((𝛼 − 𝛽)/√2) |𝑥− ⟩ , |𝜓⟩ = ((𝛾 + 𝛿)/√2) |𝑥+ ⟩ + ((𝛾 − 𝛿)/√2) |𝑥− ⟩ .
This implies
⟨𝜑|𝜓⟩𝐶 = (1/2)((𝛼̄ + 𝛽̄)(𝛾 + 𝛿) + (𝛼̄ − 𝛽̄)(𝛾 − 𝛿))
(2.2.18)
= (1/2)(𝛼̄𝛾 + 𝛼̄𝛿 + 𝛽̄𝛾 + 𝛽̄𝛿 + 𝛼̄𝛾 − 𝛼̄𝛿 − 𝛽̄𝛾 + 𝛽̄𝛿)
= 𝛼̄𝛾 + 𝛽̄𝛿.
We see that
(2.2.19) ⟨𝜑|𝜓⟩𝐵 = ⟨𝜑|𝜓⟩𝐶
for all |𝜑⟩ , |𝜓⟩ ∈ ℍ1 , where 𝐵 = (|0⟩ , |1⟩) denotes the computational basis. Hence, the Hilbert spaces (ℍ1 , ⟨⋅|⋅⟩𝐵 ) and (ℍ1 , ⟨⋅|⋅⟩𝐶 ) are identical.
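The basis independence computed in Example 2.2.20 can also be verified numerically. The following sketch (ours, with illustrative coefficients not taken from the book) compares the two evaluations:

```python
from math import sqrt

def hermitian_inner(v, w):
    return sum(vi.conjugate() * wi for vi, wi in zip(v, w))

# |phi> = a|0> + b|1>, |psi> = c|0> + d|1>; the coefficients are illustrative.
a, b, c, d = 1 + 2j, 3j, 2, 1 - 1j

ip_B = hermitian_inner([a, b], [c, d])
# Coefficients with respect to C = (|x+>, |x->), as in (2.2.17).
phi_C = [(a + b) / sqrt(2), (a - b) / sqrt(2)]
psi_C = [(c + d) / sqrt(2), (c - d) / sqrt(2)]
ip_C = hermitian_inner(phi_C, psi_C)

assert abs(ip_B - ip_C) < 1e-12   # same inner product in both bases
```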

Exercise 2.2.21. Find a basis 𝐶 of ℍ1 such that the Hilbert spaces (ℍ1 , ⟨⋅|⋅⟩) and
(ℍ1 , ⟨⋅|⋅⟩𝐶 ) are different.

2.2.5. Norm. Another important notion is the norm on a Hilbert space which we
define now.
Definition 2.2.22. A norm on ℍ is a function 𝑓 ∶ ℍ → ℝ, |𝜑⟩ ↦ 𝑓 |𝜑⟩ which for all
|𝜑⟩ , |𝜓⟩ ∈ ℍ and all 𝛼 ∈ ℂ satisfies the following conditions.
(1) Triangle inequality: 𝑓(|𝜑⟩ + |𝜓⟩) ≤ 𝑓 |𝜑⟩ + 𝑓 |𝜓⟩.
(2) Absolute homogeneity: 𝑓(𝛼 |𝜑⟩) = |𝛼|𝑓 |𝜑⟩.
(3) Positive definiteness: 𝑓 |𝜑⟩ ≥ 0 and 𝑓 |𝜑⟩ = 0 if and only if |𝜑⟩ = 0.

Exercise 2.2.23. Verify that the map


(2.2.20) ℂ → ℝ, 𝛼 = ℜ𝛼 + 𝑖ℑ𝛼 ↦ |𝛼| = √((ℜ𝛼)² + (ℑ𝛼)²)
is a norm on ℂ.
Exercise 2.2.24. Show that the map
(2.2.21) | ⋅ | ∶ ℂ𝑘 → ℝ, 𝑣 ⃗ = (𝑣 0 , . . . , 𝑣 𝑘−1 ) ↦ |𝑣|⃗ = ∑_{𝑖=0}^{𝑘−1} |𝑣 𝑖 |

is a norm on ℂ𝑘 .

We show how to construct a norm from an inner product on ℍ. For this, we let
⟨⋅|⋅⟩ be such an inner product. We refer to the Hilbert space (ℍ, ⟨⋅|⋅⟩) simply by ℍ. But
we must keep in mind which inner product we have chosen on ℍ since changing the
inner product changes the norm.
Proposition 2.2.25. The map

(2.2.22) ‖⋅‖ ∶ ℍ → ℝ, |𝜑⟩ ↦ ‖𝜑‖ = √⟨𝜑|𝜑⟩


is a norm on ℍ that satisfies the Cauchy-Schwarz inequality
(2.2.23) |⟨𝜑|𝜓⟩| ≤ ‖𝜑‖‖𝜓‖
for all |𝜑⟩ , |𝜓⟩ ∈ ℍ.

Proof. We start by proving the Cauchy-Schwarz inequality. Let |𝜑⟩ , |𝜓⟩ ∈ ℍ. For
𝑥 ∈ ℝ let
𝑝(𝑥) = (⟨𝜑| − 𝑥 ⟨𝜓|)(|𝜑⟩ − 𝑥 |𝜓⟩)
(2.2.24)
= 𝑥²⟨𝜓|𝜓⟩ − (⟨𝜑|𝜓⟩ + ⟨𝜓|𝜑⟩)𝑥 + ⟨𝜑|𝜑⟩.
Since ⟨𝜑|𝜓⟩ + ⟨𝜓|𝜑⟩ = 2ℜ⟨𝜑|𝜓⟩ is a real number, it follows that the coefficients of 𝑝(𝑥),
considered as a quadratic polynomial, are real numbers. The discriminant of this poly-
nomial (see Exercise A.4.49) is
(2.2.25) Δ(𝑝) = (⟨𝜑|𝜓⟩ + ⟨𝜓|𝜑⟩)² − 4⟨𝜑|𝜑⟩⟨𝜓|𝜓⟩.

This equation and the conjugate symmetry of the inner product imply
Δ(𝑝) = (⟨𝜑|𝜓⟩ + ⟨𝜓|𝜑⟩)² − 4⟨𝜑|𝜑⟩⟨𝜓|𝜓⟩
= |⟨𝜑|𝜓⟩ + ⟨𝜓|𝜑⟩|² − 4⟨𝜑|𝜑⟩⟨𝜓|𝜓⟩
(2.2.26)
≤ (|⟨𝜑|𝜓⟩| + |⟨𝜓|𝜑⟩|)² − 4⟨𝜑|𝜑⟩⟨𝜓|𝜓⟩
= 4(|⟨𝜑|𝜓⟩|² − ⟨𝜑|𝜑⟩⟨𝜓|𝜓⟩).
But 𝑝(𝑥) is nonnegative for all 𝑥 ∈ ℝ. Therefore, this polynomial can have at most
one real root, which means that its discriminant is nonpositive. So (2.2.26) implies the
Cauchy-Schwarz inequality.
Now we apply the Cauchy-Schwarz inequality and obtain
‖|𝜑⟩ + |𝜓⟩‖² = (⟨𝜑| + ⟨𝜓|)(|𝜑⟩ + |𝜓⟩)
(2.2.27)
= ‖𝜑‖² + ⟨𝜑|𝜓⟩ + ⟨𝜓|𝜑⟩ + ‖𝜓‖²
≤ ‖𝜑‖² + 2‖𝜑‖‖𝜓‖ + ‖𝜓‖²
= (‖𝜑‖ + ‖𝜓‖)²,
where we have used ⟨𝜑|𝜓⟩ + ⟨𝜓|𝜑⟩ = 2ℜ⟨𝜑|𝜓⟩ ≤ 2|⟨𝜑|𝜓⟩| ≤ 2‖𝜑‖‖𝜓‖.
This implies the triangle inequality.
Next, we let 𝛼 ∈ ℂ. Then, the linearity in the second argument and the conjugate
linearity in the first argument of the inner product imply
(2.2.28) ‖𝛼 |𝜑⟩‖² = (𝛼̄ ⟨𝜑|)(𝛼 |𝜑⟩) = |𝛼|²⟨𝜑|𝜑⟩ = |𝛼|²‖𝜑‖².
This implies the absolute homogeneity of the norm.
Finally, the positive definiteness of ‖⋅‖ immediately follows from the positive def-
initeness of the inner product. □
Definition 2.2.26. The norm ‖⋅‖ ∶ ℍ → ℝ, |𝜑⟩ ↦ ‖𝜑‖ = √⟨𝜑|𝜑⟩ defined in Proposition
2.2.25 is called the Euclidean norm on the Hilbert space ℍ. It depends on the inner
product on ℍ. For |𝜑⟩ ∈ ℍ we also refer to ‖𝜑‖ as the length of |𝜑⟩.
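The Euclidean norm and the Cauchy-Schwarz inequality of Proposition 2.2.25 can be checked numerically. A small Python sketch with illustrative vectors (ours, not from the book):

```python
def hermitian_inner(v, w):
    return sum(vi.conjugate() * wi for vi, wi in zip(v, w))

def norm(v):
    # <v|v> is a nonnegative real number; take its (real) square root.
    return hermitian_inner(v, v).real ** 0.5

v = [1 + 1j, 2, -3j]   # illustrative vectors in C^3
w = [2j, 1 - 1j, 1]

cs_lhs = abs(hermitian_inner(v, w))
assert cs_lhs <= norm(v) * norm(w) + 1e-12            # Cauchy-Schwarz (2.2.23)
s = [vi + wi for vi, wi in zip(v, w)]
assert norm(s) <= norm(v) + norm(w) + 1e-12           # triangle inequality
```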

2.2.6. Orthogonality. We discuss the important concept of orthogonality. For


this, we fix an inner product ⟨⋅|⋅⟩ on ℍ.
Definition 2.2.27. (1) Two kets in ℍ are called orthogonal to each other if their inner
product is zero.
(2) Two subsets 𝑆 0 and 𝑆 1 of ℍ are called orthogonal to each other if all |𝜑0 ⟩ ∈ 𝑆 0 and
|𝜑1 ⟩ ∈ 𝑆 1 are orthogonal to each other.
(3) A sequence 𝐵 in ℍ is called orthogonal if any two different elements of 𝐵 are or-
thogonal.
(4) A sequence 𝐵 in ℍ is called orthonormal if it is orthogonal and all its elements
have Euclidean norm 1.
(5) By an orthogonal or orthonormal basis of ℍ we mean a basis of ℍ that is orthogonal
or orthonormal as a sequence, respectively.

Example 2.2.28. The empty sequence () in ℍ is orthonormal since it has no elements


and, therefore, all statements about all of its elements are true.
Exercise 2.2.29. Let 𝐵 be a basis of ℍ and consider the Hilbert space (ℍ, ⟨⋅|⋅⟩𝐵 ). Show
that the basis 𝐵 is an orthonormal basis in this Hilbert space.
Exercise 2.2.30. Show that the basis (|𝑥+ ⟩ , |𝑥− ⟩) of ℍ1 is orthonormal.

The next theorem presents the Gram-Schmidt procedure that constructs an orthog-
onal basis from any basis of ℍ.
Theorem 2.2.31 (Gram-Schmidt procedure). Let 𝐶 = (|𝑐 0 ⟩ , . . . , |𝑐 𝑘−1 ⟩) be a basis of ℍ.
Set
(2.2.29) |𝑏0 ⟩ = |𝑐 0 ⟩
and for 1 ≤ 𝑗 < 𝑘 let
(2.2.30) |𝑏𝑗 ⟩ = |𝑐𝑗 ⟩ − ∑_{𝑖=0}^{𝑗−1} (⟨𝑏𝑖 |𝑐𝑗 ⟩/⟨𝑏𝑖 |𝑏𝑖 ⟩) |𝑏𝑖 ⟩ .

Then (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) is an orthogonal basis of ℍ and for 0 ≤ 𝑗 < 𝑘 we have


(2.2.31) Span {|𝑏0 ⟩ , . . . , |𝑏𝑗 ⟩} = Span {|𝑐 0 ⟩ , . . . , |𝑐𝑗 ⟩}.

Proof. We prove the assertion by induction on the dimension 𝑘 of ℍ. If 𝑘 = 1, then


(|𝑏0 ⟩) = (|𝑐 0 ⟩) is an orthogonal basis of ℍ. Assume that 𝑘 > 1 and that the assertion
holds for 𝑘 − 1. Set ℍ′ = Span{|𝑐 0 ⟩ , . . . , |𝑐 𝑘−2 ⟩}. The induction hypothesis then implies
that 𝐵 ′ = (|𝑏0 ⟩ , . . . , |𝑏𝑘−2 ⟩) is an orthogonal basis of ℍ′ and (2.2.31) holds for 0 ≤ 𝑗 ≤
𝑘 − 2. Furthermore, the definition of |𝑏𝑘−1 ⟩ in (2.2.30) implies (2.2.31) for 𝑗 = 𝑘 − 1. It
remains to show that ⟨𝑏𝑗 |𝑏𝑘−1 ⟩ = 0 for 0 ≤ 𝑗 ≤ 𝑘 − 2. So, let 𝑗 ∈ {0, . . . , 𝑘 − 2}. Then
the linearity in the second argument of the inner product and the orthogonality of the
sequence 𝐵 ′ imply
⟨𝑏𝑗 |𝑏𝑘−1 ⟩ = ⟨𝑏𝑗 | (|𝑐 𝑘−1 ⟩ − ∑_{𝑖=0}^{𝑘−2} (⟨𝑏𝑖 |𝑐 𝑘−1 ⟩/⟨𝑏𝑖 |𝑏𝑖 ⟩) |𝑏𝑖 ⟩)
= ⟨𝑏𝑗 |𝑐 𝑘−1 ⟩ − ⟨𝑏𝑗 | ∑_{𝑖=0}^{𝑘−2} (⟨𝑏𝑖 |𝑐 𝑘−1 ⟩/⟨𝑏𝑖 |𝑏𝑖 ⟩) |𝑏𝑖 ⟩
(2.2.32)
= ⟨𝑏𝑗 |𝑐 𝑘−1 ⟩ − ∑_{𝑖=0}^{𝑘−2} (⟨𝑏𝑖 |𝑐 𝑘−1 ⟩/⟨𝑏𝑖 |𝑏𝑖 ⟩) ⟨𝑏𝑗 |𝑏𝑖 ⟩
= ⟨𝑏𝑗 |𝑐 𝑘−1 ⟩ − ⟨𝑏𝑗 |𝑐 𝑘−1 ⟩ = 0.
This concludes the proof. □

The process of constructing the orthogonal basis 𝐵 = (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) from the
basis 𝐶 = (|𝑐 0 ⟩ , . . . , |𝑐 𝑘−1 ⟩) of ℍ presented in Theorem 2.2.31 is referred to as the Gram-
Schmidt orthogonalization of 𝐶. We also call the resulting orthogonal basis 𝐵 the Gram-
Schmidt orthogonalization of the basis 𝐶.

Example 2.2.32. Consider the basis (|𝑐 0 ⟩ , |𝑐 1 ⟩) = (|0⟩ , |0⟩ + |1⟩) of the single-qubit
state space ℍ1 . It is not orthogonal since
(2.2.33) ⟨𝑐 0 |𝑐 1 ⟩ = ⟨0| (|0⟩ + |1⟩) = ⟨0|0⟩ + ⟨0|1⟩ = 1.
We apply Gram-Schmidt orthogonalization to this basis. We obtain
(2.2.34) |𝑏0 ⟩ = |𝑐 0 ⟩ = |0⟩
and
⟨𝑏0 |𝑐 1 ⟩ ⟨0| (|0⟩ + |1⟩)
|𝑏1 ⟩ = |𝑐 1 ⟩ − |𝑏0 ⟩ = (|0⟩ + |1⟩) − |0⟩
(2.2.35) ⟨𝑏0 |𝑐 0 ⟩ ⟨0|0⟩
= (|0⟩ + |1⟩) − (⟨0|0⟩ + ⟨0|1⟩) |0⟩ = (|0⟩ + |1⟩) − |0⟩ = |1⟩ .
Hence, the Gram-Schmidt orthogonalization of the basis (|0⟩ , |0⟩ + |1⟩) of ℍ1 gives
(|0⟩ , |1⟩).
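The Gram-Schmidt procedure of Theorem 2.2.31 is short to implement. The following Python sketch (our illustration; the book itself contains no code) reproduces Example 2.2.32 and the second vector of Example 2.2.35:

```python
from math import sqrt

def hermitian_inner(v, w):
    return sum(vi.conjugate() * wi for vi, wi in zip(v, w))

def gram_schmidt(C):
    """Orthogonalize the basis C following (2.2.29) and (2.2.30)."""
    B = []
    for c in C:
        b = list(c)
        for prev in B:
            coeff = hermitian_inner(prev, c) / hermitian_inner(prev, prev)
            b = [bi - coeff * pi for bi, pi in zip(b, prev)]
        B.append(b)
    return B

# Example 2.2.32: the basis (|0>, |0>+|1>) orthogonalizes to (|0>, |1>).
B = gram_schmidt([[1, 0], [1, 1]])
assert B == [[1, 0], [0, 1]]

# Example 2.2.35: starting from (|x+>, |0>), the second vector becomes (|0>-|1>)/2.
s = 1 / sqrt(2)
B2 = gram_schmidt([[s, s], [1, 0]])
assert all(abs(x - y) < 1e-12 for x, y in zip(B2[1], [0.5, -0.5]))
```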

From Theorem 2.2.31 we obtain the following theorem.


Theorem 2.2.33. Every orthogonal or orthonormal sequence in ℍ is linearly indepen-
dent and can be extended to an orthogonal or orthonormal basis of ℍ, respectively.

Proof. Let 𝑙 ∈ ℕ and let 𝐶 ′ = (|𝑐 0 ⟩ , . . . |𝑐 𝑙−1 ⟩) be an orthogonal sequence in ℍ. By


Theorem B.7.2 it can be extended to a basis 𝐶 of ℍ. Let (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) be the Gram-
Schmidt orthogonalization of 𝐶. We show by induction on 𝑙 that
(2.2.36) (|𝑏0 ⟩ , . . . , |𝑏𝑙−1 ⟩) = (|𝑐 0 ⟩ , . . . , |𝑐 𝑙−1 ⟩).
For 𝑙 = 1 the assertion follows from (2.2.29). Now let 𝑙 ∈ ℕ, 1 < 𝑙 ≤ 𝑘, and assume that
(2.2.37) (|𝑏0 ⟩ , . . . , |𝑏𝑙−2 ⟩) = (|𝑐 0 ⟩ , . . . , |𝑐 𝑙−2 ⟩).
Then we have
|𝑏𝑙−1 ⟩ = |𝑐 𝑙−1 ⟩ − ∑_{𝑖=0}^{𝑙−2} (⟨𝑏𝑖 |𝑐 𝑙−1 ⟩/⟨𝑏𝑖 |𝑏𝑖 ⟩) |𝑏𝑖 ⟩
(2.2.38)
= |𝑐 𝑙−1 ⟩ − ∑_{𝑖=0}^{𝑙−2} (⟨𝑐 𝑖 |𝑐 𝑙−1 ⟩/⟨𝑐 𝑖 |𝑐 𝑖 ⟩) |𝑐 𝑖 ⟩ = |𝑐 𝑙−1 ⟩ .

Hence, (|𝑏𝑙 ⟩ , . . . , |𝑏𝑘−1 ⟩) extends 𝐶 ′ to an orthogonal basis of ℍ.


In order to extend an orthonormal sequence to an orthonormal basis, one first
extends this sequence to an orthogonal basis and then normalizes the appended ele-
ments. □

Theorem 2.2.33 implies the following result.


Corollary 2.2.34. Every finite-dimensional Hilbert space has an orthonormal basis.
Example 2.2.35. On ℍ1 we use the Hermitian inner product with respect to the computational basis (|0⟩ , |1⟩). Then |𝑥+ ⟩ = (|0⟩ + |1⟩)/√2 has length 1. We use Gram-Schmidt
orthogonalization to extend 𝐵 = (|𝑥+ ⟩) to an orthonormal basis of ℍ1 . First, we note

that (|𝑏0 ⟩ , |𝑐 1 ⟩) = (|𝑥+ ⟩ , |0⟩) is a basis of ℍ1 which is not orthogonal. Gram-Schmidt


orthogonalization gives
⟨𝑏0 |𝑐 1 ⟩ |0⟩ − |1⟩
(2.2.39) |𝑏1 ⟩ = |𝑐 1 ⟩ − |𝑏 ⟩ = .
⟨𝑏0 |𝑏0 ⟩ 0 2

Since ‖|𝑏1 ⟩‖ = 1/√2, this gives the orthonormal basis (|𝑏0 ⟩ , √2 |𝑏1 ⟩) = (|𝑥+ ⟩ , |𝑥− ⟩) of
ℍ1 .

Corollary 2.2.34 implies the following result, which shows that all inner products
on ℍ are Hermitian inner products with respect to some basis of ℍ.

Proposition 2.2.36. If 𝐵 is an orthonormal basis of ℍ, then ⟨⋅|⋅⟩ is the Hermitian inner


product on ℍ with respect to 𝐵.

Proof. Let |𝜑⟩ , |𝜓⟩ ∈ ℍ, and let |𝜑⟩𝐵 = (𝛼0 , . . . , 𝛼𝑘−1 ), |𝜓⟩𝐵 = (𝛽0 , . . . , 𝛽 𝑘−1 ). Then we
have
⟨𝜑|𝜓⟩ = ( ∑_{𝑖=0}^{𝑘−1} 𝛼̄𝑖 ⟨𝑏𝑖 |) ( ∑_{𝑗=0}^{𝑘−1} 𝛽𝑗 |𝑏𝑗 ⟩) = ∑_{𝑖,𝑗=0}^{𝑘−1} 𝛼̄𝑖 𝛽𝑗 ⟨𝑏𝑖 |𝑏𝑗 ⟩
(2.2.40)
= ∑_{𝑖,𝑗=0}^{𝑘−1} 𝛼̄𝑖 𝛽𝑗 𝛿 𝑖,𝑗 = ∑_{𝑖=0}^{𝑘−1} 𝛼̄𝑖 𝛽 𝑖 = (|𝜑⟩𝐵 )∗ |𝜓⟩𝐵

as asserted. □

By Theorem 2.2.16, we have ℍ∗ = {⟨𝜑| ∶ |𝜑⟩ ∈ ℍ}. The next proposition shows
how |𝜑⟩ is constructed from ⟨𝜑| ∈ ℍ∗ .

Proposition 2.2.37. Let 𝐵 = (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) be an orthonormal basis of ℍ and let


⟨𝜑| ∈ ℍ∗ . Then we have
(2.2.41) |𝜑⟩ = ∑_{𝑖=0}^{𝑘−1} 𝛾̄𝑖 |𝑏𝑖 ⟩ , where 𝛾𝑖 = ⟨𝜑|𝑏𝑖 ⟩ and 𝛾̄𝑖 is the complex conjugate of 𝛾𝑖 .

Exercise 2.2.38. Verify Proposition 2.2.37.

Example 2.2.39. Consider the map


(2.2.42) 𝑓 ∶ ℍ2 → ℂ, ∑_{𝑖=0}^{3} 𝛼𝑖 |𝑖⟩2 ↦ 2𝛼0 + 𝛼3

where 𝛼𝑖 ∈ ℂ for all 𝑖 ∈ ℤ4 . It is easy to verify that 𝑓 ∈ (ℍ2 )∗ . From Proposition 2.2.37


we know that for
(2.2.43) |𝜑⟩ = ∑_{𝑖=0}^{3} 𝑓 |𝑖⟩2 |𝑖⟩2 = 2 |0⟩2 + |3⟩2

we have 𝑓 = ⟨𝜑|.

2.2.7. Orthogonal complements. We define orthogonal complements of sub-


sets of ℍ and discuss their properties.
Proposition 2.2.40. Let 𝑆 ⊂ ℍ. Then the set
(2.2.44) 𝑆 ⟂ = {|𝜑⟩ ∈ ℍ ∶ ⟨𝜓|𝜑⟩ = 0 for every |𝜓⟩ ∈ 𝑆}
is a linear subspace of ℍ. It is called the orthogonal complement of 𝑆. If |𝜑⟩ ∈ ℍ, then
we write |𝜑⟩⟂ for {|𝜑⟩}⟂ and call this subspace the orthogonal complement of |𝜑⟩.

Example 2.2.41. We determine the orthogonal complement |0⟩⟂ of |0⟩ in ℍ1 . Let |𝜓⟩ ∈
ℍ1 ,
(2.2.45) |𝜓⟩ = 𝛼 |0⟩ + 𝛽 |1⟩
with 𝛼, 𝛽 ∈ ℂ. Then |𝜓⟩ ∈ |0⟩⟂ if and only if 0 = ⟨𝜓|0⟩ = 𝛼̄⟨0|0⟩ + 𝛽̄⟨1|0⟩ = 𝛼̄, that is,
if and only if 𝛼 = 0. This implies that
(2.2.46) |0⟩⟂ = ℂ |1⟩ .
Proposition 2.2.42. Let ℍ(0), ℍ(1) be linear subspaces of ℍ. Then the following hold.
(1) (ℍ(0)⟂ )⟂ = ℍ(0).
(2) ℍ is the direct sum of ℍ(0) and ℍ(0)⟂ and dim ℍ(0) + dim ℍ(0)⟂ = dim ℍ.
(3) If 𝐵0 is an orthonormal basis of ℍ(0) and 𝐵1 is an orthonormal basis of ℍ(0)⟂ , then
𝐵0 ∥ 𝐵1 is an orthonormal basis of ℍ.
(4) If ℍ = ℍ(0) + ℍ(1) and ℍ(0) and ℍ(1) are orthogonal to each other, then ℍ(1) =
ℍ(0)⟂ .
Exercise 2.2.43. Prove Proposition 2.2.42.

Using Proposition 2.2.42 we can prove the following more general statement.
Proposition 2.2.44. Let 𝑙 ∈ ℕ and let ℍ(0), . . . , ℍ(𝑙 − 1) be subspaces of ℍ. Then the
following hold.
(1) If ℍ(0), . . . , ℍ(𝑙 − 1) are pairwise orthogonal to each other, then their sum is direct.
(2) The subspaces ℍ(0), . . . , ℍ(𝑙 − 1) are pairwise orthogonal to each other if and only
if there are orthonormal bases 𝐵0 , . . . , 𝐵𝑙−1 of ℍ(0), . . . , ℍ(𝑙 − 1), respectively, such
that 𝐵 = 𝐵0 ∥ ⋯ ∥ 𝐵𝑙−1 is an orthonormal basis of ℍ(0) + ⋯ + ℍ(𝑙 − 1).

Proof. Without loss of generality, we assume that the sum of the subspaces ℍ(𝑖) is ℍ.
We begin by proving the first assertion. For 𝑖 ∈ ℤ𝑙 let |𝜑𝑖 ⟩ ∈ ℍ(𝑖) such that
(2.2.47) ∑_{𝑖=0}^{𝑙−1} |𝜑𝑖 ⟩ = 0.

Since the subspaces ℍ(𝑖) are pairwise orthogonal, for all 𝑗 ∈ ℤ𝑙 we have
(2.2.48) 0 = ⟨𝜑𝑗 | ( ∑_{𝑖=0}^{𝑙−1} |𝜑𝑖 ⟩) = ∑_{𝑖=0}^{𝑙−1} ⟨𝜑𝑗 |𝜑𝑖 ⟩ = ⟨𝜑𝑗 |𝜑𝑗 ⟩

and therefore |𝜑𝑗 ⟩ = 0.



Next, we turn to the second assertion. Assume that the subspaces ℍ(𝑖) are pair-
wise orthogonal. We prove the existence of bases 𝐵𝑖 with the asserted properties by
induction on 𝑙. For 𝑙 = 1, we can choose an orthonormal basis 𝐵0 of ℍ(0) that exists by
Corollary 2.2.34. Assume that 𝑙 > 1 and that the assertion holds for 𝑙 − 1. According to
the induction hypothesis, there are orthonormal bases 𝐵0 , . . . , 𝐵𝑙−2 of ℍ(0), . . . , ℍ(𝑙 −2),
respectively, such that 𝐵′ = 𝐵0 ∥ ⋯ ∥ 𝐵𝑙−2 is an orthonormal basis of the sum ℍ′ of
these subspaces. It follows from Proposition 2.2.42 that ℍ(𝑙 − 1) = (ℍ′ )⟂ and there is
an orthonormal basis 𝐵𝑙−1 of ℍ(𝑙 − 1) such that 𝐵 ′ ∥ 𝐵𝑙−1 is an orthonormal basis of ℍ.
To prove the converse of the second assertion, assume that there are orthonormal
bases 𝐵𝑖 of ℍ(𝑖) such that their concatenation is an orthonormal basis of ℍ. It is then
easy to verify that the subspaces are pairwise orthogonal to each other. □

2.3. Linear maps


In this section, we study linear maps between Hilbert spaces. They are also called linear
operators. We let ℍ, ℍ′ , ℍ″ be Hilbert spaces of dimension 𝑘, 𝑙, and 𝑚, respectively, and
denote the inner product on these Hilbert spaces by ⟨⋅|⋅⟩.

2.3.1. Matrix representations. Matrix representations of homomorphisms be-


tween vector spaces are discussed in Section B.6.3. We briefly summarize this concept.
Let 𝐵 = (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) be a basis of ℍ and let 𝐶 = (|𝑐 0 ⟩ , . . . , |𝑐 𝑙−1 ⟩) be a basis of ℍ′ .
Then the representation matrix of 𝑓 ∈ Hom(ℍ, ℍ′ ) with respect to these bases is the
matrix in ℂ(𝑙,𝑘) whose column vectors are the coefficient vectors of 𝑓 |𝑏0 ⟩ , . . . , 𝑓 |𝑏𝑘−1 ⟩
with respect to the basis 𝐶; that is,
(2.3.1) Mat𝐵,𝐶 (𝑓) = ((𝑓 |𝑏0 ⟩)𝐶 , . . . , (𝑓 |𝑏𝑘−1 ⟩)𝐶 ) ∈ ℂ(𝑙,𝑘) .
Theorem B.6.10 states that the map
(2.3.2) Hom(ℍ, ℍ′ ) → ℂ(𝑙,𝑘) , 𝑓 ↦ Mat𝐵,𝐶 (𝑓)
is an isomorphism of ℂ-vector spaces. Its inverse is
(2.3.3) ℂ(𝑙,𝑘) → Hom(ℍ, ℍ′ ), 𝐴 ↦ 𝑓𝐴,𝐵,𝐶
where
(2.3.4) 𝑓𝐴,𝐵,𝐶 ∶ ℍ → ℍ′ , |𝜓⟩ ↦ 𝐶𝐴 |𝜓⟩𝐵 .
If ℍ = ℍ′ and 𝐵 = 𝐶, then we write Mat𝐵 (𝑓) for Mat𝐵,𝐵 (𝑓) and 𝑓𝐴,𝐵 for 𝑓𝐴,𝐵,𝐵 . Also,
if 𝐷 is a basis of ℍ″ , if 𝑓 ∈ Hom(ℍ, ℍ′ ) and 𝑔 ∈ Hom(ℍ′ , ℍ″ ), then
(2.3.5) Mat𝐵,𝐷 (𝑔 ∘ 𝑓) = Mat𝐶,𝐷 (𝑔) Mat𝐵,𝐶 (𝑓).
Example 2.3.1. The Pauli 𝑋 operator on ℍ1 is defined as
(2.3.6) 𝑋 ∶ ℍ1 → ℍ1 , 𝛼 |0⟩ + 𝛽 |1⟩ ↦ 𝛽 |0⟩ + 𝛼 |1⟩ .
It swaps the vectors of the computational basis 𝐵 = (|0⟩ , |1⟩). Hence, its matrix repre-
sentation with respect to 𝐵 is
(2.3.7) Mat𝐵 (𝑋) = ( 0 1 ; 1 0 ) .

We also determine the matrix representation of the Pauli 𝑋 operator with respect to the
basis
(2.3.8) 𝐶 = (|𝑥+ ⟩ , |𝑥− ⟩) = ((|0⟩ + |1⟩)/√2 , (|0⟩ − |1⟩)/√2)
of ℍ1 . We note that
(2.3.9) 𝑋 |𝑥+ ⟩ = (𝑋 |0⟩ + 𝑋 |1⟩)/√2 = (|1⟩ + |0⟩)/√2 = |𝑥+ ⟩
and
(2.3.10) 𝑋 |𝑥− ⟩ = (𝑋 |0⟩ − 𝑋 |1⟩)/√2 = (|1⟩ − |0⟩)/√2 = − |𝑥− ⟩ .
Hence, we have
(2.3.11) Mat𝐶 (𝑋) = ( 1 0 ; 0 −1 ) .
Note that this matrix is different from Mat𝐵 (𝑋).
Example 2.3.2. The Pauli 𝑍 operator
(2.3.12) 𝑍 ∶ ℍ1 → ℍ 1
has the representation matrix
(2.3.13) 𝐴 = Mat𝐵 (𝑍) = ( 1 0 ; 0 −1 )
with respect to the computational basis of ℍ1 . Note that this matrix is equal to Mat𝐶 (𝑋)
from (2.3.11). So the Pauli 𝑍 operator is
(2.3.14) 𝑍 = 𝑓𝐴,𝐵 ∶ ℍ1 → ℍ1 , 𝛼 |0⟩ + 𝛽 |1⟩ ↦ 𝛼 |0⟩ − 𝛽 |1⟩ .
Exercise 2.3.3. (1) Determine the matrix representation of the Pauli 𝑌 operator
(2.3.15) 𝑌 ∶ ℍ1 → ℍ 1 , 𝛼 |0⟩ + 𝛽 |1⟩ ↦ −𝑖𝛽 |0⟩ + 𝑖𝛼 |1⟩
with respect to the computational basis of ℍ1 .
(2) Determine the matrix representations of the Pauli 𝑌 and 𝑍 operators with respect
to the basis 𝐶 = (|𝑥− ⟩ , |𝑥+ ⟩) from (2.3.8).
Exercise 2.3.4. (1) Find the matrix representation of the Hadamard operator
(2.3.16) 𝐻 ∶ ℍ1 → ℍ 1 , 𝛼 |0⟩ + 𝛽 |1⟩ ↦ 𝛼 |𝑥+ ⟩ + 𝛽 |𝑥− ⟩
with respect to the computational basis of ℍ1 .
(2) Use the matrix representations of the operators 𝐻, 𝑋, 𝑌 , and 𝑍 to show that
(2.3.17) 𝐻𝑋𝐻 = 𝑍, 𝐻𝑌 𝐻 = −𝑌 , 𝐻𝑍𝐻 = 𝑋.
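The identities in (2.3.17) can be verified numerically from the matrix representations. A Python sketch using plain lists of rows (our illustration; the helper names are not from the book):

```python
from math import sqrt

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def close(A, B, eps=1e-12):
    return all(abs(a - b) < eps for ra, rb in zip(A, B) for a, b in zip(ra, rb))

X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
s = 1 / sqrt(2)
H = [[s, s], [s, -s]]

negY = [[-y for y in row] for row in Y]
assert close(matmul(matmul(H, X), H), Z)     # HXH = Z
assert close(matmul(matmul(H, Y), H), negY)  # HYH = -Y
assert close(matmul(matmul(H, Z), H), X)     # HZH = X
```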

Since (2.3.2) is an isomorphism of ℂ-vector spaces, every map 𝑓 ∈ Hom(ℍ, ℍ′ ) is


uniquely determined by its representation matrix Mat𝐵,𝐶 (𝑓). Therefore, we can define
linear maps in Hom(ℍ, ℍ′ ) by their action on the elements of a basis of ℍ. In particular,

operators on state spaces ℍ𝑛 will typically be described by their effect on the computa-
tional basis elements. For instance, using this representation, the Hadamard operator
from(2.3.16) can written as
(2.3.18) 𝐻 ∶ ℍ1 → ℍ 1 , |0⟩ ↦ |𝑥+ ⟩ , |1⟩ ↦ |𝑥− ⟩ .

Finally, we introduce a further simplification of the notation. For this, we let 𝑇 ∈
Hom(ℍ, ℍ′ ), |𝜑⟩ ∈ ℍ′ , and |𝜓⟩ ∈ ℍ. Then applying the bra ⟨𝜑| to 𝑇 |𝜓⟩ ∈ ℍ′ has
the same effect as applying the composite map ⟨𝜑| ∘ 𝑇 to |𝜓⟩ ∈ ℍ. Therefore, we
write
(2.3.19) ⟨𝜑|𝑇|𝜓⟩ = ⟨𝜑| (𝑇 |𝜓⟩) = (⟨𝜑| ∘ 𝑇) |𝜓⟩ .

We can use this notation to describe the representation matrices of homomor-


phisms between Hilbert spaces.
Proposition 2.3.5. Let 𝐵 = (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) and 𝐶 = (|𝑐 0 ⟩ , . . . , |𝑐 𝑙−1 ⟩) be orthonormal
bases of ℍ and ℍ′ , respectively. Then the matrix representation of a linear map 𝑇 ∈
Hom(ℍ, ℍ′ ) with respect to these bases is
(2.3.20) Mat𝐵,𝐶 (𝑇) = (⟨𝑐 𝑖 |𝑇|𝑏𝑗 ⟩)𝑖∈ℤ𝑙 ,𝑗∈ℤ𝑘 ∈ ℂ(𝑙,𝑘) .

Proof. Write Mat𝐵,𝐶 (𝑇) = (𝛼𝑖,𝑗 ). Then for all 𝑖 ∈ ℤ𝑙 and 𝑗 ∈ ℤ𝑘 the linearity of the
inner product in the second argument and the fact that ⟨𝑐 𝑖 |𝑐𝑚 ⟩ = 𝛿 𝑖,𝑚 for all 𝑖, 𝑚 ∈ ℤ𝑙
imply
(2.3.21) ⟨𝑐 𝑖 |𝑇|𝑏𝑗 ⟩ = ⟨𝑐 𝑖 | ( ∑_{𝑚=0}^{𝑙−1} 𝛼𝑚,𝑗 |𝑐𝑚 ⟩) = ∑_{𝑚=0}^{𝑙−1} 𝛼𝑚,𝑗 ⟨𝑐 𝑖 |𝑐𝑚 ⟩ = 𝛼𝑖,𝑗 . □

Example 2.3.6. Denote by ⟨⋅|⋅⟩ the Hermitian inner product on ℍ1 with respect to
𝐵 = (|0⟩ , |1⟩). By Proposition 2.3.5, the representation matrix of the Pauli 𝑍 operator
with respect to 𝐵 is
(2.3.22) Mat𝐵 (𝑍) = ( ⟨0|𝑍|0⟩ ⟨0|𝑍|1⟩ ; ⟨1|𝑍|0⟩ ⟨1|𝑍|1⟩ ) = ( ⟨0|0⟩ −⟨0|1⟩ ; ⟨1|0⟩ −⟨1|1⟩ ) = ( 1 0 ; 0 −1 ) .
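The entry formula (2.3.20) used in Example 2.3.6 is easy to reproduce in code. A small Python check (ours, not from the book):

```python
def hermitian_inner(v, w):
    return sum(vi.conjugate() * wi for vi, wi in zip(v, w))

def apply_matrix(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

Z = [[1, 0], [0, -1]]            # the Pauli Z operator in the computational basis
basis = [[1, 0], [0, 1]]         # |0>, |1> as coefficient vectors

# Entry (i, j) of the representation matrix is <b_i|Z|b_j>, as in (2.3.20).
M = [[hermitian_inner(basis[i], apply_matrix(Z, basis[j])) for j in range(2)]
     for i in range(2)]
assert M == Z
```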

2.3.2. Adjoints. In this section, we introduce adjoints of matrices over ℂ and of


linear maps between finite-dimensional Hilbert spaces and discuss their properties.
We start by defining the adjoints of complex matrices.

Definition 2.3.7. The adjoint of a matrix 𝐴 ∈ ℂ(𝑘,𝑙) is 𝐴∗ = 𝐴̄ T ∈ ℂ(𝑙,𝑘) , the transpose of the entrywise complex conjugate 𝐴̄ of 𝐴.

The adjoint of a matrix over ℂ is also called its Hermitian adjoint, Hermitian con-
jugate, or Hermitian transpose.
The notation 𝐴∗ for the adjoint of a matrix 𝐴 over ℂ is in agreement with Definition
2.2.7 where the dual of a complex vector is specified as the matrix that has the conjugate
of this vector as its only row.
Example 2.3.8. Consider the matrix
(2.3.23) 𝐴 = ( 1 𝑖 1+𝑖 ; 1−𝑖 𝑖 1 ) ∈ ℂ(2,3) .

Its adjoint is
(2.3.24) 𝐴∗ = ( 1 1+𝑖 ; −𝑖 −𝑖 ; 1−𝑖 1 ) ∈ ℂ(3,2) .

Here are some important properties of the adjoint of matrices.


Proposition 2.3.9. Let 𝐴 ∈ ℂ(𝑘,𝑙) , 𝐵 ∈ ℂ(𝑚,𝑛) , and 𝛼 ∈ ℂ. Then we have

(2.3.25) (𝐴∗ )∗ = 𝐴,
(2.3.26) (𝐴 + 𝐵)∗ = 𝐴∗ + 𝐵 ∗ if 𝑚 = 𝑘 and 𝑛 = 𝑙,
(2.3.27) (𝛼𝐴)∗ = 𝛼𝐴∗ ,
(2.3.28) rank(𝐴) = rank(𝐴∗ ),
(2.3.29) (𝐴𝐵)∗ = 𝐵 ∗ 𝐴∗ if 𝑙 = 𝑚.
Exercise 2.3.10. Prove Proposition 2.3.9.

The next proposition characterizes the adjoints of matrices.


Proposition 2.3.11. Let 𝐴 ∈ ℂ(𝑘,𝑙) . Then for all 𝑣 ⃗ ∈ ℂ𝑘 and all 𝑤⃗ ∈ ℂ𝑙 the adjoint 𝐴∗
of 𝐴 satisfies

(2.3.30) ⟨𝑣||⃗ 𝐴𝑤⟩⃗ = ⟨𝐴∗ 𝑣||⃗ 𝑤⟩⃗

and 𝐴∗ is the only matrix in ℂ(𝑙,𝑘) with this property.

Proof. Let 𝑣 ⃗ ∈ ℂ𝑘 , 𝑤⃗ ∈ ℂ𝑙 . Then we have


⟨𝐴∗ 𝑣|⃗ 𝑤⟩⃗ = (𝐴∗ 𝑣)⃗ ∗ 𝑤⃗ by (2.2.9),
= 𝑣∗⃗ (𝐴∗ )∗ 𝑤⃗ by (2.3.29),
= 𝑣∗⃗ 𝐴𝑤⃗ by (2.3.25),
= ⟨𝑣|⃗ 𝐴𝑤⟩⃗ by (2.2.9).

To show that 𝐴∗ is the only matrix in ℂ(𝑙,𝑘) that satisfies (2.3.30), let 𝐴′ ∈ ℂ(𝑙,𝑘) such
that
(2.3.31) ⟨𝑣|⃗ 𝐴𝑤⟩⃗ = ⟨𝐴′ 𝑣|⃗ 𝑤⟩⃗

for all 𝑣 ⃗ ∈ ℂ𝑘 and 𝑤⃗ ∈ ℂ𝑙 . Denote by 𝑒 0⃗ , . . . , 𝑒 𝑘−1⃗ and 𝑓0⃗ , . . . , 𝑓𝑙−1⃗ the standard unit
vectors of ℂ𝑘 and ℂ𝑙 , respectively. Then for all 𝑖 ∈ ℤ𝑘 and 𝑗 ∈ ℤ𝑙 we have

⟨𝑒 𝑖⃗ ||𝐴𝑓𝑗⃗ ⟩ = ⟨𝐴′ 𝑒 𝑖⃗ ||𝑓𝑗⃗ ⟩ by (2.3.31),

= ⟨𝑒 𝑖⃗ ||(𝐴′ )∗ 𝑓𝑗⃗ ⟩ by (2.3.30) and (2.3.25).

So (2.3.25) implies 𝐴∗ = 𝐴′ . □
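Property (2.3.30) is also easy to confirm numerically for the matrix from Example 2.3.8. A Python sketch (ours; the test vectors are arbitrary illustrations):

```python
def hermitian_inner(v, w):
    return sum(vi.conjugate() * wi for vi, wi in zip(v, w))

def adjoint(A):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def apply_matrix(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

A = [[1, 1j, 1 + 1j], [1 - 1j, 1j, 1]]   # the matrix from Example 2.3.8
v = [2, -1j]                              # arbitrary test vectors
w = [1, 1j, 3]

lhs = hermitian_inner(v, apply_matrix(A, w))           # <v|Aw>
rhs = hermitian_inner(apply_matrix(adjoint(A), v), w)  # <A*v|w>
assert abs(lhs - rhs) < 1e-12                          # property (2.3.30)
```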

From Proposition 2.3.11 we obtain the following result which allows us to define
adjoints of linear operators on Hilbert spaces.

Proposition 2.3.12. If 𝐴 ∈ Hom(ℍ′ , ℍ), then there is a uniquely determined operator


𝐴∗ ∈ Hom(ℍ, ℍ′ ) such that

(2.3.32) ⟨𝜑|𝐴|𝜓⟩ = ⟨𝐴∗ 𝜑|𝜓⟩

for all |𝜑⟩ ∈ ℍ and |𝜓⟩ ∈ ℍ′ . The operator 𝐴∗ is called the adjoint of 𝐴.

Proof. Let 𝐴 ∈ Hom(ℍ′ , ℍ). Choose orthonormal bases 𝐵 of ℍ and 𝐶 of ℍ′ . Let 𝐴∗ be


the linear map in Hom(ℍ, ℍ′ ) with representation matrix (Mat𝐶,𝐵 (𝐴))∗ . It then follows
from Proposition 2.3.11 that (2.3.32) holds for all |𝜑⟩ ∈ ℍ and |𝜓⟩ ∈ ℍ′ . The uniqueness
of 𝐴∗ follows from the uniqueness of 𝐴∗ in Proposition 2.3.11. □

Exercise 2.3.13. Verify that Proposition 2.3.9 also holds for linear operators.

Finally, we mention some properties of the adjoint of endomorphisms and square


matrices over ℂ.

Proposition 2.3.14. Let 𝐴 ∈ ℂ(𝑘,𝑘) or 𝐴 ∈ End(ℍ). Then the following hold.


(1) The determinant, trace, characteristic polynomial, and eigenvalues of 𝐴∗ are the
complex conjugates of the determinant, trace, characteristic polynomial, and eigen-
values of 𝐴, respectively.
(2) If 𝐴 is invertible, then 𝐴∗ is invertible, and we have (𝐴∗ )−1 = (𝐴−1 )∗ .

Exercise 2.3.15. Prove Proposition 2.3.14.

2.3.3. The Hilbert-Schmidt inner product. We define an inner product on


ℂ(𝑙,𝑘) and on Hom(ℍ, ℍ′ ).

Proposition 2.3.16. The map


(2.3.33) ⟨⋅|⋅⟩ ∶ ℂ(𝑙,𝑘) × ℂ(𝑙,𝑘) → ℂ, (𝐴, 𝐵) ↦ ⟨𝐴|𝐵⟩ = tr(𝐴∗ 𝐵)
is an inner product on ℂ(𝑙,𝑘) . It is called the Hilbert-Schmidt inner product on ℂ(𝑙,𝑘) .

Proof. The map ℂ(𝑙,𝑘) → ℂ𝑘𝑙 , which sends a matrix 𝐴 ∈ ℂ(𝑙,𝑘) to the concatenation of
its column vectors, is an isomorphism of ℂ-vector spaces. We use this map to identify
the matrices in ℂ(𝑙,𝑘) with vectors in ℂ𝑘𝑙 . Using this identification, the map (2.3.33) is
the standard Hermitian inner product on ℂ(𝑙,𝑘) . □

Corollary 2.3.17. The map


(2.3.34) ⟨⋅|⋅⟩ ∶ Hom(ℍ, ℍ′ ) × Hom(ℍ, ℍ′ ) → ℂ, (𝐴, 𝐵) ↦ ⟨𝐴|𝐵⟩ = tr(𝐴∗ 𝐵)
is an inner product on Hom(ℍ, ℍ′ ). It is called the Hilbert-Schmidt inner product on
Hom(ℍ, ℍ′ ).

Equipped with the Hilbert-Schmidt inner product, the complex vector space
Hom(ℍ, ℍ′ ) becomes a Hilbert space.
We present another way of writing the Hilbert-Schmidt inner product.

Proposition 2.3.18. Let 𝐴, 𝐵 ∈ ℂ(𝑙,𝑘) and denote by 𝑎0⃗ , . . . , 𝑎𝑘−1⃗ the column vectors of
𝐴 and by 𝑏0⃗ , . . . , 𝑏𝑘−1⃗ the column vectors of 𝐵. Then we have
(2.3.35) ⟨𝐴|𝐵⟩ = ∑_{𝑖=0}^{𝑘−1} ⟨𝑎𝑖⃗ |𝑏𝑖⃗ ⟩.

Exercise 2.3.19. Prove Proposition 2.3.18.


Example 2.3.20. Let
(2.3.36) 𝐴 = ( 2 𝑖 ; 3 −1 ) and 𝐵 = ( 𝑖 1 ; 2 4 ) .
Then we have
(2.3.37) ⟨𝐴|𝐵⟩ = tr(𝐴∗ 𝐵) = tr (( 2 3 ; −𝑖 −1 ) ( 𝑖 1 ; 2 4 )) = tr ( 2𝑖+6 14 ; −1 −𝑖−4 ) = 𝑖 + 2.

The norm induced by the Hilbert-Schmidt inner product is

(2.3.38) ℂ(𝑙,𝑘) → ℝ, 𝐴 = (𝑎𝑖,𝑗 ) ↦ ‖𝐴‖ = √tr(𝐴∗ 𝐴) = √(∑_{𝑖∈ℤ𝑙 ,𝑗∈ℤ𝑘} |𝑎𝑖,𝑗 |²) .
Example 2.3.21. Let
(2.3.39) 𝐴 = ( 1 𝑖 ; 1+𝑖 1−𝑖 ) .
Then we have
(2.3.40) ‖𝐴‖² = |1|² + |𝑖|² + |1 + 𝑖|² + |1 − 𝑖|² = 1 + 1 + 2 + 2 = 6.
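Both Hilbert-Schmidt computations above can be reproduced with a few lines of Python (our illustration; the helper names are not from the book):

```python
def adjoint(A):
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def hs_inner(A, B):
    """Hilbert-Schmidt inner product <A|B> = tr(A* B), as in (2.3.33)."""
    return trace(matmul(adjoint(A), B))

A = [[2, 1j], [3, -1]]               # the matrices from Example 2.3.20
B = [[1j, 1], [2, 4]]
assert hs_inner(A, B) == 2 + 1j      # i + 2, matching (2.3.37)

M = [[1, 1j], [1 + 1j, 1 - 1j]]      # the matrix from Example 2.3.21
assert abs(hs_inner(M, M) - 6) < 1e-12   # squared norm is 6
```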

2.4. Endomorphisms
In this section, we discuss endomorphisms of a Hilbert space ℍ of finite dimension 𝑘
with inner product ⟨⋅|⋅⟩ and their properties.

2.4.1. Basics. The representation matrices of endomorphisms of ℍ are the ma-


trices in ℂ(𝑘,𝑘) .
As examples we will use the Pauli operators 𝑋, 𝑌 , and 𝑍 on ℍ1 that were already in-
troduced in Examples 2.3.1 and 2.3.2 and Exercise 2.3.3. Their representation matrices
with respect to the computational basis (|0⟩ , |1⟩) are the Pauli matrices
(2.4.1) 𝑋 = ( 0 1 ; 1 0 ) , 𝑌 = ( 0 −𝑖 ; 𝑖 0 ) , 𝑍 = ( 1 0 ; 0 −1 ) .
In the examples, we will also use the Hadamard operator on ℍ1 . Its representation
matrix with respect to (|0⟩ , |1⟩) is the Hadamard matrix
(2.4.2) 𝐻 = (1/√2) ( 1 1 ; 1 −1 ) .
The next theorem presents formulas for the characteristic polynomial 𝑝𝐴 (𝑥), the
trace tr(𝐴), and the determinant det(𝐴) of endomorphisms 𝐴 of ℍ and matrices 𝐴 in
ℂ(𝑘,𝑘) .

Proposition 2.4.1. Let 𝐴 ∈ ℂ(𝑘,𝑘) or 𝐴 ∈ End(ℍ). Let Λ be the set of eigenvalues of 𝐴.


For each 𝜆 ∈ Λ let 𝑚𝜆 be its algebraic multiplicity. Then we have
(2.4.3) 𝑝𝐴 (𝑥) = ∏_{𝜆∈Λ} (𝑥 − 𝜆)^{𝑚𝜆} ,
(2.4.4) tr(𝐴) = ∑_{𝜆∈Λ} 𝑚𝜆 𝜆,
(2.4.5) det(𝐴) = ∏_{𝜆∈Λ} 𝜆^{𝑚𝜆} .

Proof. Let 𝐴 ∈ End(ℍ) or 𝐴 ∈ ℂ(𝑘,𝑘) . The first assertion follows from the fact that
ℂ is algebraically closed, which implies that 𝑝𝐴 (𝑥) is a product of linear factors. The
details are beyond the scope of this book. The other two assertions are derived from
Proposition B.5.27. □
Example 2.4.2. The characteristic polynomial of the identity operator 𝐼1 on ℍ1 is
(2.4.6) 𝑝𝐼 (𝑥) = det(𝑥𝐼 − 𝐼) = det ( 𝑥−1 0 ; 0 𝑥−1 ) = (𝑥 − 1)² .
Hence, the only eigenvalue of 𝐼 is 1. It has algebraic multiplicity 2 and we have tr(𝐼) =
1 + 1 = 2 and det(𝐼) = 1 ⋅ 1 = 1.
The characteristic polynomial of the Pauli 𝑋 operator is
(2.4.7) 𝑝𝑋 (𝑥) = det(𝑥𝐼 − 𝑋) = det ( 𝑥 −1 ; −1 𝑥 ) = (𝑥 − 1)(𝑥 + 1).
Hence, the eigenvalues of 𝑋 are 1 and −1, both with algebraic multiplicity 1, and we
have tr(𝑋) = 1 + (−1) = 0 and det(𝑋) = 1 ⋅ (−1) = −1.
Exercise 2.4.3. Use Proposition 2.4.1 to show that the Pauli 𝑌 and 𝑍 matrices have the
eigenvalues 1 and −1, trace 0, and determinant −1.
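For 2 × 2 matrices the eigenvalues are simply the roots of 𝑥² − tr(𝐴)𝑥 + det(𝐴), so the claims about the Pauli matrices can be checked directly. A Python sketch (ours, not from the book):

```python
import cmath

def eig2(A):
    """Eigenvalues of a 2x2 matrix as roots of x^2 - tr(A) x + det(A)."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    d = cmath.sqrt(tr * tr - 4 * det)
    return (tr + d) / 2, (tr - d) / 2

X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]

for P in (X, Y, Z):
    ev = sorted(eig2(P), key=lambda z: z.real)
    assert abs(ev[0] + 1) < 1e-12 and abs(ev[1] - 1) < 1e-12  # eigenvalues -1 and 1
    tr = P[0][0] + P[1][1]
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    assert tr == 0 and det == -1   # trace 0 and determinant -1, as in (2.4.4), (2.4.5)
```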

Next, we note the following.


Theorem 2.4.4. If all eigenvalues of 𝐴 ∈ ℂ(𝑘,𝑘) have algebraic multiplicity 1, then 𝐴 is
diagonalizable.
Exercise 2.4.5. Prove Theorem 2.4.4.
Example 2.4.6. According to Example 2.4.2, the Pauli matrix 𝑋 has the eigenvalues 1
and −1 and both have algebraic multiplicity 1. Therefore, by Theorem 2.4.4 this matrix
is diagonalizable. In fact, by Exercise 2.3.4 we have
(2.4.8) 𝐻𝑋𝐻 = 𝑍
with the Pauli matrix
(2.4.9) 𝑍 = ( 1 0 ; 0 −1 )
which is a diagonal matrix and the Hadamard matrix
(2.4.10) 𝐻 = (1/√2) ( 1 1 ; 1 −1 ) .

Since 𝐻 −1 = 𝐻, it follows from (2.4.8) that


(2.4.11) 𝐻 −1 𝑋𝐻 = 𝑍
which shows that 𝑋 is diagonalizable.

In order for a matrix 𝐴 ∈ ℂ(𝑘,𝑘) to be diagonalizable, it is not necessary that all of


its eigenvalues have algebraic multiplicity 1. For example, as shown in Example 2.4.2,
the diagonal identity matrix 𝐼 has 1 as its single eigenvalue. So the question arises of
whether all matrices in ℂ(𝑘,𝑘) are diagonalizable. But this is not the case as the next
example shows.
Example 2.4.7. Consider the matrix
(2.4.12) 𝐴 = ( 1 1 ; 0 1 ) .
Its characteristic polynomial is 𝑝𝐴 (𝑥) = (𝑥 − 1)². Hence 1 is the only eigenvalue of 𝐴
and it has algebraic multiplicity 2. Also, the eigenspace of this eigenvalue is the kernel
of the matrix
(2.4.13) 1 ⋅ 𝐼 − 𝐴 = ( 0 −1 ; 0 0 ) .
This matrix has rank 1 and therefore the dimension of this eigenspace is 1. So according
to Theorem B.7.28, the matrix 𝐴 is not diagonalizable.
Definition 2.4.8. A matrix 𝐴 ∈ ℂ(𝑘,𝑘) or an operator 𝐴 ∈ End(ℍ) is called an involu-
tion if 𝐴2 = 𝐼𝑘 or 𝐴2 = 𝐼ℍ , respectively.
Exercise 2.4.9. Prove that the Pauli matrices and operators are involutions.

2.4.2. Hermitian matrices and operators. In this section, we introduce and


discuss Hermitian matrices and operators. They will be used in Section 3.4 to model
quantum mechanical measurements.
Definition 2.4.10. A matrix 𝐴 ∈ ℂ(𝑘,𝑘) or operator 𝐴 ∈ End(ℍ) is called Hermitian or
self-adjoint if 𝐴 = 𝐴∗ .

Hermitian matrices and operators are named after the French mathematician
Charles Hermite who lived in the 19th century and made significant contributions to
many areas of mathematics.
Example 2.4.11. The matrix
(2.4.14) 𝐴 = ( 1 𝑖 ; −𝑖 1 )
is Hermitian since
(2.4.15) 𝐴∗ = 𝐴̄ T = ( 1 −𝑖 ; 𝑖 1 )T = ( 1 𝑖 ; −𝑖 1 ) = 𝐴.
Exercise 2.4.12. Show that the Pauli operators 𝑋, 𝑌 , and 𝑍 and the Hadamard operator
𝐻 are Hermitian.

We present properties of Hermitian matrices and operators.

Proposition 2.4.13. (1) The diagonal elements of Hermitian matrices are real num-
bers.
(2) The determinant, trace, and eigenvalues of Hermitian matrices or operators are
real numbers.
(3) The inverse of an invertible Hermitian matrix or operator is Hermitian.
(4) The sum of two Hermitian matrices or operators is Hermitian.
(5) The product 𝐴𝐵 of two Hermitian matrices 𝐴, 𝐵 ∈ ℂ(𝑘,𝑘) or operators 𝐴, 𝐵 ∈ End(ℍ)
is Hermitian if and only if 𝐴𝐵 = 𝐵𝐴.
(6) If 𝐴, 𝐵 ∈ ℂ(𝑘,𝑘) or 𝐴, 𝐵 ∈ End(ℍ) are Hermitian, then 𝐴𝐵𝐴 is Hermitian.

Exercise 2.4.14. Prove Proposition 2.4.13.
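Items (1) and (2) of Proposition 2.4.13 can be observed on the matrix from Example 2.4.11. A Python sketch (ours; the 2 × 2 eigenvalue computation via the characteristic polynomial is our illustration):

```python
import cmath

def adjoint(A):
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

A = [[1, 1j], [-1j, 1]]      # the Hermitian matrix from Example 2.4.11
assert adjoint(A) == A       # A* = A

# Its eigenvalues are real, as item (2) of Proposition 2.4.13 predicts.
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
d = cmath.sqrt(tr * tr - 4 * det)
eigenvalues = [(tr + d) / 2, (tr - d) / 2]
assert all(abs(ev.imag) < 1e-12 for ev in eigenvalues)
```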

2.4.3. Unitary matrices and operators. In Section 3.3, unitary operators will
be used to model the evolution of quantum systems over time. This section introduces
these operators and presents their properties.

Definition 2.4.15. A matrix 𝑈 ∈ ℂ(𝑘,𝑘) or operator 𝑈 ∈ End(ℍ) is called unitary if


𝑈 ∗ 𝑈 = 𝑈𝑈 ∗ = 𝐼𝑘 or 𝑈 ∗ 𝑈 = 𝑈𝑈 ∗ = 𝐼ℍ , respectively.

Example 2.4.16. Consider the Hadamard matrix


(2.4.16) 𝐻 = (1/√2) ( 1 1 ; 1 −1 ) .
We have
(2.4.17) 𝐻𝐻 ∗ = 𝐻² = ( 1 0 ; 0 1 ) .
Hence, 𝐻 is unitary.

Exercise 2.4.17. Show that Hermitian matrices or operators are involutions if and only
if they are unitary. Conclude that the Pauli operators 𝑋, 𝑌 , and 𝑍 and the Hadamard
operator are unitary.

We prove a number of equivalent characterizations of unitary matrices.

Proposition 2.4.18. Let 𝑈 ∈ ℂ(𝑘,𝑘) . Then the following statements are equivalent.
(1) 𝑈 is unitary.
(2) 𝑈 is invertible and 𝑈 −1 = 𝑈 ∗ .
(3) The columns of the matrix 𝑈 form an orthonormal basis of ℂ𝑘 .
(4) The rows of the matrix 𝑈 form an orthonormal basis of ℂ𝑘 .
(5) ⟨𝑈𝑣⃗, 𝑈𝑤⃗⟩ = ⟨𝑣⃗, 𝑤⃗⟩ for all 𝑣⃗, 𝑤⃗ ∈ ℂ𝑘 .
(6) ‖𝑈𝑣⃗‖ = ‖𝑣⃗‖ for all 𝑣⃗ ∈ ℂ𝑘 .
76 2. Hilbert Spaces
Proof. Let 𝑈 ∈ ℂ(𝑘,𝑘) . Statements (1) and (2) are equivalent by the definition of uni-
tary matrices and invertibility. Next, we note that 𝑈 ∗ = 𝑈 −1 if and only if 𝑈𝑈 ∗ = 𝐼𝑘
which is equivalent to the sequence of row vectors of 𝑈 being an orthonormal basis
of ℂ𝑘 . This gives the equivalence of statements (2) and (4). Similarly, the equivalence
of statements (2) and (3) can be deduced from 𝑈∗𝑈 = 𝐼𝑘 .
We show that statement (1) and statement (5) are equivalent. Let 𝑈 ∈ ℂ(𝑘,𝑘) be
unitary and let 𝑣,⃗ 𝑤⃗ ∈ ℂ𝑘 . Then 𝑈 ∗ 𝑈 = 𝐼𝑘 and Proposition 2.3.11 imply ⟨𝑈 𝑣,⃗ 𝑈 𝑤⟩⃗ =
⟨𝑈∗𝑈𝑣⃗, 𝑤⃗⟩ = ⟨𝑣⃗, 𝑤⃗⟩. Conversely, assume that ⟨𝑈𝑣⃗, 𝑈𝑤⃗⟩ = ⟨𝑣⃗, 𝑤⃗⟩ for all 𝑣⃗, 𝑤⃗ ∈ ℂ𝑘 . This
implies
(2.4.18) 𝑒∗𝑖⃗ 𝑈 ∗ 𝑈 𝑒𝑗⃗ = ⟨𝑈 𝑒 𝑖⃗ |𝑈 𝑒𝑗⃗ ⟩ = ⟨𝑒 𝑖⃗ |𝑒𝑗⃗ ⟩ = 𝛿 𝑖,𝑗
for all 𝑖, 𝑗 ∈ ℤ𝑘 where 𝑒 𝑖⃗ is the 𝑖th standard unit vector in ℂ𝑘 for 0 ≤ 𝑖 < 𝑘. This means
that 𝑈 ∗ 𝑈 = 𝐼𝑘 . So Corollary B.5.21 implies that 𝑈 is invertible and 𝑈 ∗ = 𝑈 −1 ; i.e.,
𝑈 ∗ 𝑈 = 𝑈𝑈 ∗ = 𝐼𝑘 .
Finally, we show that statements (1) and (6) are equivalent. Statement (6) follows
immediately from statement (5) which is equivalent to statement (1). Conversely, as-
sume that ⟨𝑈 𝑣,⃗ 𝑈 𝑣⟩⃗ = ⟨𝑣,⃗ 𝑣⟩⃗ for all 𝑣 ⃗ ∈ ℂ𝑘 . We show that
(2.4.19) 𝑒∗𝑖⃗ 𝑈 ∗ 𝑈 𝑒𝑗⃗ = 𝛿 𝑖,𝑗
for all 𝑖, 𝑗 ∈ ℤ𝑘 . Then Corollary B.5.21 implies that 𝑈 is invertible and 𝑈 ∗ = 𝑈 −1 ; i.e.,
𝑈 ∗ 𝑈 = 𝑈𝑈 ∗ = 𝐼𝑘 . For all 𝑖 ∈ ℤ𝑘 we have
(2.4.20) 𝑒∗𝑖⃗ 𝑈 ∗ 𝑈 𝑒 𝑖⃗ = ⟨𝑈 𝑒 𝑖⃗ |𝑈 𝑒 𝑖⃗ ⟩ = ⟨𝑒 𝑖⃗ , 𝑒 𝑖⃗ ⟩ = 1.
Next, let 𝑖, 𝑗 ∈ ℤ𝑘 and assume that 𝑖 ≠ 𝑗. Then we have
        2 = ⟨𝑒𝑖⃗ + 𝑒𝑗⃗ |𝑒𝑖⃗ + 𝑒𝑗⃗ ⟩
          = ⟨𝑈(𝑒𝑖⃗ + 𝑒𝑗⃗ )|𝑈(𝑒𝑖⃗ + 𝑒𝑗⃗ )⟩
          = ‖𝑈𝑒𝑖⃗ ‖² + ⟨𝑈𝑒𝑖⃗ |𝑈𝑒𝑗⃗ ⟩ + ⟨𝑈𝑒𝑗⃗ |𝑈𝑒𝑖⃗ ⟩ + ‖𝑈𝑒𝑗⃗ ‖²
          = 2 + 2ℜ 𝑒𝑖⃗∗ 𝑈∗𝑈𝑒𝑗⃗ .
It follows that ℜ𝑒∗𝑖⃗ 𝑈 ∗ 𝑈 𝑒𝑗⃗ = 0. Applying similar arguments to ⟨𝑈(𝑒 𝑖⃗ + 𝑖𝑒𝑗⃗ )|𝑈(𝑒 𝑖⃗ + 𝑖𝑒𝑗⃗ )⟩
it can be shown that ℑ𝑒∗𝑖⃗ 𝑈 ∗ 𝑈 𝑒𝑗⃗ = 0. Therefore, (2.4.19) holds. □
Exercise 2.4.19. Show that permutation matrices are unitary.
Proposition 2.4.18 implies the following result.
Theorem 2.4.20. (1) The set of all unitary matrices in ℂ(𝑘,𝑘) is a subgroup of 𝖦𝖫(𝑘, ℂ).
It is denoted by U(𝑘) and is called the unitary group of rank 𝑘.
(2) The set of all unitary matrices of determinant 1 is a subgroup of U(𝑘). It is called the
special unitary group of rank 𝑘 and is denoted by SU(𝑘).
Proof. Let 𝑈, 𝑉 ∈ ℂ(𝑘,𝑘) be unitary. It then follows from Lemma B.5.23, Proposition
2.3.9, and Proposition 2.4.18 that (𝑈𝑉)−1 = 𝑉 −1 𝑈 −1 = 𝑉 ∗ 𝑈 ∗ = (𝑈𝑉)∗ . Also, 𝐼𝑘 is
unitary. Therefore, the set of unitary matrices is a subgroup of 𝖦𝖫(𝑘, ℂ). Since the
product of two matrices of determinant 1 has determinant 1, it follows that SU(𝑘) is a
subgroup of U(𝑘). □
From Proposition 2.4.18 we also obtain characterizations of unitary operators on
ℍ. To state them, we need the following definition.
Definition 2.4.21. Let ℍ′ be another Hilbert space with an inner product ⟨⋅|⋅⟩. A map
𝑈 ∈ Hom(ℍ, ℍ′ ) is called an isometry between ℍ and ℍ′ if ⟨𝜑|𝜓⟩ = ⟨𝑈 |𝜑⟩ |𝑈 |𝜓⟩ ⟩ for
all |𝜑⟩ , |𝜓⟩ ∈ ℍ.
Example 2.4.22. Let 𝐵 be a basis of ℍ. Denote by ⟨⋅|⋅⟩ the standard Hermitian inner
product on ℂ𝑘 . So (ℍ, ⟨⋅|⋅⟩𝐵 ) and (ℂ𝑘 , ⟨⋅|⋅⟩) are Hilbert spaces. By Corollary 2.2.11, the
map
(2.4.21) ℍ → ℂ𝑘 , |𝜑⟩ ↦ |𝜑⟩𝐵
is an isometry between these Hilbert spaces.
Finally, we characterize the set of all orthonormal bases of ℍ.
Corollary 2.4.23. Let 𝐵 be an orthonormal basis of ℍ. Then the set of all orthonormal
bases of ℍ is the coset 𝐵U(𝑘) in 𝖦𝖫(𝑘, ℂ).
Exercise 2.4.24. Prove Corollary 2.4.23.
2.4.4. Outer products. Let 𝐵 = (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) be an orthonormal basis of ℍ.
We define the outer product of elements of ℍ.
Definition 2.4.25. Let |𝜑⟩ , |𝜓⟩ ∈ ℍ. Then the outer product of |𝜑⟩ and |𝜓⟩ is the endo-
morphism
(2.4.22) |𝜑⟩ ⟨𝜓| ∶ ℍ → ℍ, |𝜉⟩ ↦ |𝜑⟩ ⟨𝜓|𝜉⟩
of ℍ.
In formula (2.4.22) we deviate from the usual notation and write the scalar product
of the complex number 𝛼 = ⟨𝜓|𝜉⟩ with |𝜑⟩ ∈ ℍ as |𝜑⟩ 𝛼 instead of 𝛼 |𝜑⟩. This allows for
a more intuitive notation.
Example 2.4.26. The computational basis of the single-qubit state space ℍ1 is (|0⟩ , |1⟩).
Examples of the outer products of kets in ℍ1 are |0⟩ ⟨0|, |0⟩ ⟨1|, |1⟩ ⟨0|, and |1⟩ ⟨1|. Let
|𝜓⟩ = 𝛼 |0⟩ + 𝛽 |1⟩ ∈ ℍ1 with complex coefficients 𝛼 and 𝛽. Then the images of this ket
under the four outer products are
|0⟩ ⟨0| (𝛼 |0⟩ + 𝛽 |1⟩) = 𝛼 |0⟩ ⟨0|0⟩ + 𝛽 |0⟩ ⟨0|1⟩ = 𝛼 |0⟩ ,
|0⟩ ⟨1| (𝛼 |0⟩ + 𝛽 |1⟩) = 𝛼 |0⟩ ⟨1|0⟩ + 𝛽 |0⟩ ⟨1|1⟩ = 𝛽 |0⟩ ,
|1⟩ ⟨0| (𝛼 |0⟩ + 𝛽 |1⟩) = 𝛼 |1⟩ ⟨0|0⟩ + 𝛽 |1⟩ ⟨0|1⟩ = 𝛼 |1⟩ ,
|1⟩ ⟨1| (𝛼 |0⟩ + 𝛽 |1⟩) = 𝛼 |1⟩ ⟨1|0⟩ + 𝛽 |1⟩ ⟨1|1⟩ = 𝛽 |1⟩ .
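The computations of Example 2.4.26 can be replicated with matrices: representing kets as column vectors, |𝜑⟩⟨𝜓| becomes the matrix 𝜑𝜓∗. A NumPy sketch (not part of the text):

```python
import numpy as np

ket0 = np.array([[1], [0]], dtype=complex)   # |0>
ket1 = np.array([[0], [1]], dtype=complex)   # |1>

def outer(phi, psi):
    """Matrix of |phi><psi| with respect to the computational basis."""
    return phi @ psi.conj().T

alpha, beta = 0.6, 0.8j
psi = alpha * ket0 + beta * ket1             # |psi> = alpha|0> + beta|1>

# As computed in the text:
assert np.allclose(outer(ket0, ket0) @ psi, alpha * ket0)  # |0><0| psi = alpha|0>
assert np.allclose(outer(ket0, ket1) @ psi, beta * ket0)   # |0><1| psi = beta|0>
assert np.allclose(outer(ket1, ket0) @ psi, alpha * ket1)  # |1><0| psi = alpha|1>
assert np.allclose(outer(ket1, ket1) @ psi, beta * ket1)   # |1><1| psi = beta|1>
```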
We present a more abstract interpretation of the outer product, which is useful for
computations. We recall from Section B.3 that we can view each |𝜑⟩ ∈ ℍ as the linear
map
(2.4.23) ℂ → ℍ, 𝛼 ↦ 𝛼 |𝜑⟩
in Hom(ℂ, ℍ). In this way, we obtain an isomorphism ℍ → Hom(ℂ, ℍ) of ℂ-vector
spaces. Also, if we identify |𝜑⟩ with the map in (2.4.23), then we can write
(2.4.24)    |𝜑⟩ ⟨𝜓| = |𝜑⟩ ∘ ⟨𝜓| .
We present a formula for the representation matrix of |𝜑⟩ ⟨𝜓| with respect to the
basis 𝐵 where |𝜑⟩ and |𝜓⟩ ∈ ℍ. For this, we let
(2.4.25) |𝜑⟩𝐵 = (𝛼0 , . . . , 𝛼𝑘−1 ), |𝜓⟩𝐵 = (𝛽0 , . . . , 𝛽 𝑘−1 ).
Then we have
(2.4.26)    Mat𝐵 (|𝜑⟩ ⟨𝜓|) = |𝜑⟩𝐵 (|𝜓⟩𝐵 )∗ = (𝛼𝑖 𝛽̄𝑗 )0≤𝑖,𝑗<𝑘 .
From (2.4.26) we obtain the following results by applying the rules of matrix mul-
tiplication and the formula for the trace.
Proposition 2.4.27. Let |𝜑⟩ , |𝜓⟩ , |𝜉⟩ , |𝜒⟩ ∈ ℍ and let 𝛼 ∈ ℂ. Then the following hold.
(1) (|𝜑⟩ + |𝜓⟩) ⟨𝜉| = |𝜑⟩ ⟨𝜉| + |𝜓⟩ ⟨𝜉|.
(2) |𝜑⟩ (⟨𝜓| + ⟨𝜉|) = |𝜑⟩ ⟨𝜓| + |𝜑⟩ ⟨𝜉|.
(3) (𝛼 |𝜑⟩) ⟨𝜓| = 𝛼 |𝜑⟩ ⟨𝜓|.
(4) |𝜑⟩ (𝛼 ⟨𝜓|) = 𝛼 |𝜑⟩ ⟨𝜓|.
(5) (|𝜑⟩ ⟨𝜓|)∗ = |𝜓⟩ ⟨𝜑|.
(6) tr(|𝜑⟩ ⟨𝜓|) = ⟨𝜓|𝜑⟩.
(7) |𝜑⟩ ⟨𝜓| ∘ |𝜉⟩ ⟨𝜒| = ⟨𝜓|𝜉⟩ |𝜑⟩ ⟨𝜒|.
(8) ⟨𝜑|𝜓⟩⟨𝜉|𝜒⟩ = ⟨𝜑||(|𝜓⟩ ⟨𝜉|)||𝜒⟩.
Exercise 2.4.28. Prove Proposition 2.4.27.
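Several items of Proposition 2.4.27, for instance (5), (6), and (7), can be spot-checked numerically with random vectors. The following NumPy sketch (not part of the text) does so; the helper names are of course not from the book:

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_ket(k):
    """A random vector in C^k, as a column vector."""
    return rng.standard_normal((k, 1)) + 1j * rng.standard_normal((k, 1))

def outer(phi, psi):                 # |phi><psi|
    return phi @ psi.conj().T

def inner(phi, psi):                 # <phi|psi>, conjugate-linear in the first slot
    return (phi.conj().T @ psi).item()

phi, psi, xi, chi = (rand_ket(4) for _ in range(4))

# (5): (|phi><psi|)* = |psi><phi|
assert np.allclose(outer(phi, psi).conj().T, outer(psi, phi))
# (6): tr(|phi><psi|) = <psi|phi>
assert np.isclose(np.trace(outer(phi, psi)), inner(psi, phi))
# (7): |phi><psi| o |xi><chi| = <psi|xi> |phi><chi|
assert np.allclose(outer(phi, psi) @ outer(xi, chi),
                   inner(psi, xi) * outer(phi, chi))
```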
In addition, the following proposition can be deduced from (2.4.26).
Proposition 2.4.29. Let 𝐴 be a linear operator on ℍ and let |𝜑⟩ , |𝜓⟩ ∈ ℍ. Then we have
the following.
(1) 𝐴 ∘ |𝜑⟩ ⟨𝜓| = (𝐴 |𝜑⟩) ⟨𝜓|.
(2) |𝜑⟩ ⟨𝜓| ∘ 𝐴∗ = |𝜑⟩ (𝐴 |𝜓⟩)∗ .
(3) tr(𝐴 ∘ |𝜑⟩ ⟨𝜓|) = tr(|𝜑⟩ ⟨𝜓| ∘ 𝐴) = ⟨𝜓|𝐴|𝜑⟩.
Exercise 2.4.30. Prove Proposition 2.4.29.
The outer product can be used to construct an orthonormal basis of End(ℍ) as
follows.
Proposition 2.4.31. The sequence (|𝑏𝑖 ⟩ ⟨𝑏𝑗 |)𝑖,𝑗∈ℤ𝑘 is an orthonormal basis of the ℂ-alge-
bra End(ℍ) with respect to the Hilbert-Schmidt inner product. Furthermore, for any 𝐴 ∈
End(ℍ) we have
(2.4.27)    𝐴 = ∑_{𝑖,𝑗=0}^{𝑘−1} ⟨𝑏𝑖 |𝐴|𝑏𝑗 ⟩ |𝑏𝑖 ⟩ ⟨𝑏𝑗 | .
Proof. Let 𝑖, 𝑗, 𝑢, 𝑣 ∈ ℤ𝑘 . Then Proposition 2.4.27 implies tr(|𝑏𝑖 ⟩ ⟨𝑏𝑗 | ∘ |𝑏ᵆ ⟩ ⟨𝑏𝑣 |) =
⟨𝑏𝑗 |𝑏ᵆ ⟩⟨𝑏𝑖 |𝑏𝑣 ⟩ = 𝛿𝑗,ᵆ 𝛿𝑖,𝑣 . Hence, the sequence (|𝑏𝑖 ⟩ ⟨𝑏𝑗 |) is orthonormal. Since its
length is 𝑘² which is the dimension of End(ℍ) over ℂ, it is a basis of this ℂ-algebra.
Also, (2.4.27) follows from (2.2.12). □
From Proposition 2.4.31 we obtain the following results.
Corollary 2.4.32. For any 𝐴 ∈ End(ℍ) and any 𝑗 ∈ ℤ𝑘 we have
(2.4.28)    𝐴 |𝑏𝑗 ⟩ = ∑_{𝑖=0}^{𝑘−1} ⟨𝑏𝑖 |𝐴|𝑏𝑗 ⟩ |𝑏𝑖 ⟩ .

Corollary 2.4.33.
(2.4.29)    𝐼ℍ = ∑_{𝑖=0}^{𝑘−1} |𝑏𝑖 ⟩ ⟨𝑏𝑖 | .
Exercise 2.4.34. Prove Corollaries 2.4.32 and 2.4.33.
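Both corollaries, and the basis formula (2.4.27), can be verified numerically for a random orthonormal basis. A NumPy sketch (not part of the text); the QR trick for producing an orthonormal basis is an assumption of convenience:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 3

# A random orthonormal basis of C^k: the Q factor of a QR decomposition
# is unitary, so its columns |b_0>, ..., |b_{k-1}> are orthonormal.
Q, _ = np.linalg.qr(rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k)))
B = [Q[:, [i]] for i in range(k)]

# Corollary 2.4.33: sum_i |b_i><b_i| = I
assert np.allclose(sum(b @ b.conj().T for b in B), np.eye(k))

# (2.4.27): A = sum_{i,j} <b_i|A|b_j> |b_i><b_j|
A = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
A_rebuilt = sum((B[i].conj().T @ A @ B[j]).item() * (B[i] @ B[j].conj().T)
                for i in range(k) for j in range(k))
assert np.allclose(A, A_rebuilt)
```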
2.4.5. Projections. In this section, we introduce projections and, in particular,
orthogonal projections. Let 𝐵 = (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) be an orthonormal basis of ℍ.
Definition 2.4.35. (1) An operator 𝑃 ∈ End(ℍ) is called a projection if 𝑃 2 = 𝑃.
(2) A projection 𝑃 ∈ End(ℍ) is called orthogonal if 𝑃 |𝜑⟩ and |𝜑⟩−𝑃 |𝜑⟩ are orthogonal
to each other for all |𝜑⟩ ∈ ℍ.
A projection is sometimes also called a projector.
Example 2.4.36. Consider the map
(2.4.30)    𝑃 ∶ ℂ² → ℂ² ,    (𝛼, 𝛽) ↦ (−𝛽, 𝛽).
This map is ℂ-linear with the representation matrix
(2.4.31)    𝑃 = ( 0  −1 )
                ( 0   1 ) .
Also, it is a projection since
(2.4.32)    𝑃² = ( 0  −1 ) ( 0  −1 ) = ( 0  −1 )
                 ( 0   1 ) ( 0   1 )   ( 0   1 ) .
But 𝑃 is not an orthogonal projection. To see this, we note that ⟨𝑃(1, 2)|(1, 2) − 𝑃(1, 2)⟩ =
⟨(−2, 2)|(1, 2) − (−2, 2)⟩ = ⟨(−2, 2)|(3, 0)⟩ = −6 ≠ 0.
Example 2.4.37. Consider the map
(2.4.33)    𝑃 ∶ ℂ² → ℂ² ,    (𝛼, 𝛽) ↦ (0, 𝛽).
This map is ℂ-linear with the representation matrix
(2.4.34)    𝑃 = ( 0  0 )
                ( 0  1 ) .
Also, it is a projection since
(2.4.35)    𝑃² = ( 0  0 ) ( 0  0 ) = ( 0  0 )
                 ( 0  1 ) ( 0  1 )   ( 0  1 ) .
To show that 𝑃 is an orthogonal projection, we note that ⟨𝑃(𝛼, 𝛽)|(𝛼, 𝛽) − 𝑃(𝛼, 𝛽)⟩ =
⟨(0, 𝛽)|(𝛼, 0)⟩ = 0 for all 𝛼, 𝛽 ∈ ℂ.
Exercise 2.4.38. Show that for any orthogonal projection 𝑃 on ℍ and any |𝜓⟩ ∈ ℍ we
have ‖𝑃 |𝜓⟩‖ ≤ ‖ |𝜓⟩ ‖.
Proposition 2.4.39. Let 𝑃 ∈ End(ℍ). Then the following are true.
(1) If 𝑃 is a projection, then 𝑃 ∗ is a projection.
(2) If 𝑃 is an orthogonal projection, then 𝑃∗ is an orthogonal projection.
Exercise 2.4.40. Prove Proposition 2.4.39.
We characterize orthogonal projections.
Proposition 2.4.41. A projection 𝑃 ∈ End(ℍ) is orthogonal if and only if 𝑃 is Hermitian.
Exercise 2.4.42. Prove Proposition 2.4.41.
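The two preceding examples illustrate Proposition 2.4.41 numerically: the oblique projection of Example 2.4.36 fails the Hermitian test, while the orthogonal one of Example 2.4.37 passes it. A NumPy sketch (not part of the text):

```python
import numpy as np

# The two projections from Examples 2.4.36 and 2.4.37.
P_oblique = np.array([[0, -1], [0, 1]], dtype=complex)  # (a, b) -> (-b, b)
P_orth    = np.array([[0,  0], [0, 1]], dtype=complex)  # (a, b) -> (0, b)

for P in (P_oblique, P_orth):
    assert np.allclose(P @ P, P)                 # both satisfy P^2 = P

# Proposition 2.4.41: orthogonal <=> Hermitian.
assert not np.allclose(P_oblique, P_oblique.conj().T)
assert np.allclose(P_orth, P_orth.conj().T)

# Direct check for the failing example, as computed in the text:
v = np.array([[1], [2]], dtype=complex)
ip = ((P_oblique @ v).conj().T @ (v - P_oblique @ v)).item()
assert np.isclose(ip, -6)                        # not 0, so not orthogonal
```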
Example 2.4.43. Let |𝜑⟩ ∈ ℍ with ⟨𝜑|𝜑⟩ = 1. We claim that the map 𝑃 = |𝜑⟩ ⟨𝜑| is an
orthogonal projection. To see this, let |𝜓⟩ ∈ ℍ. Then the linearity of the inner product
in the second argument implies
𝑃2 |𝜓⟩ = 𝑃(𝑃 |𝜓⟩) = 𝑃(|𝜑⟩ ⟨𝜑|𝜓⟩)
= ⟨𝜑|𝜓⟩𝑃 |𝜑⟩ = ⟨𝜑|𝜓⟩⟨𝜑|𝜑⟩ |𝜑⟩
= |𝜑⟩ ⟨𝜑|𝜓⟩ = 𝑃 |𝜓⟩ .
Also, we have
        ⟨𝑃 |𝜓⟩ | |𝜓⟩ − 𝑃 |𝜓⟩ ⟩ = ⟨𝜓|𝜑⟩(⟨𝜑|𝜓⟩ − ⟨𝜑|𝜓⟩⟨𝜑|𝜑⟩) = 0.
We generalize Example 2.4.43.
Proposition 2.4.44. Let 𝑙 ∈ ℕ, and let ℍ(0), . . . , ℍ(𝑙 − 1) be linear subspaces of ℍ which
are orthogonal to each other such that ℍ = ℍ(0) + ⋯ + ℍ(𝑙 − 1). Then the following hold.
(1) For |𝜑⟩ ∈ ℍ let |𝜑⟩ = ∑_{𝑖=0}^{𝑙−1} |𝜑(𝑖)⟩ be the uniquely determined representation
of |𝜑⟩ as a sum of elements |𝜑(𝑖)⟩ in ℍ(𝑖). Then, for all 𝑖 ∈ ℤ𝑙 the map
(2.4.36) 𝑃𝑖 ∶ ℍ → ℍ(𝑖), |𝜑⟩ ↦ |𝜑(𝑖)⟩
is an orthogonal projection. It is called the orthogonal projection of ℍ onto ℍ(𝑖).
Also, for |𝜑⟩ ∈ ℍ the image 𝑃𝑖 |𝜑⟩ is called the orthogonal projection of |𝜑⟩ onto
ℍ(𝑖).
(2) 𝑃0 , . . . , 𝑃𝑙−1 are orthogonal to each other with respect to the Hilbert-Schmidt inner
product.
(3) We have ∑_{𝑖=0}^{𝑙−1} 𝑃𝑖 = 𝐼ℍ .
(4) Let 𝐵0 , . . . , 𝐵𝑙−1 be orthonormal bases of ℍ(0), . . . , ℍ(𝑙 − 1), respectively, such that
𝐵 = 𝐵0 ‖ ⋯ ‖𝐵𝑙−1 is an orthonormal basis of ℍ which exists by Proposition 2.2.44.
Then for all 𝑖 ∈ ℤ𝑙 we have
(2.4.37)    𝑃𝑖 = ∑_{|𝑏⟩∈𝐵𝑖 } |𝑏⟩ ⟨𝑏| .
Proof. Let 𝑖 ∈ ℤ𝑙 and let |𝜑⟩ ∈ ℍ. The uniqueness of the representation of the ele-
ments of ℍ as a sum of elements in the ℍ(𝑖) implies 𝑃𝑖² |𝜑⟩ = 𝑃𝑖 (|𝜑(𝑖)⟩) = |𝜑(𝑖)⟩. Also,
𝑃𝑖 |𝜑⟩ and |𝜑⟩ − 𝑃𝑖 |𝜑⟩ are orthogonal because of the orthogonality of the ℍ(𝑖). This
proves the first assertion. Next, for 𝑖, 𝑗 ∈ ℤ𝑙 with 𝑖 ≠ 𝑗 we have 𝑃𝑖 𝑃𝑗 = 0. Therefore,
the 𝑃𝑖 are orthogonal to each other with respect to the Hilbert-Schmidt inner product.
So the 𝑃𝑖 are linearly independent by Corollary 2.2.34. The last assertion follows from
Proposition 2.4.31. □
Example 2.4.45. Recall that
(2.4.38)    (|𝑥+ ⟩ , |𝑥− ⟩) = ( (|0⟩ + |1⟩)/√2 , (|0⟩ − |1⟩)/√2 )
is an orthonormal basis of ℍ1 . The orthogonal projection of |0⟩ onto ℂ |𝑥+ ⟩ is
(2.4.39)    |𝑥+ ⟩ ⟨𝑥+ |0⟩ = (1/√2) |𝑥+ ⟩ .
2.4.6. Schur decomposition. As we have seen in Section 2.4.1, not all matrices
in ℂ(𝑘,𝑘) are diagonalizable. However, we can prove the following weaker result, which
will allow us to prove the spectral theorem in the next section. It was first proved by
the mathematician Issai Schur in the early 20th century.
Theorem 2.4.46 (Schur decomposition theorem). Let 𝐴 ∈ ℂ(𝑘,𝑘) . Assume that 𝐴 has
the 𝑙 distinct eigenvalues 𝜆0 , . . . , 𝜆𝑙−1 with algebraic multiplicities 𝑚0 , . . . , 𝑚𝑙−1 . Then
𝑘 = ∑_{𝑖=0}^{𝑙−1} 𝑚𝑖 and there is a unitary matrix 𝑈 ∈ ℂ(𝑘,𝑘) and an upper triangular
matrix 𝑇 with diagonal
(2.4.40)    (𝜆0 , . . . , 𝜆0 , 𝜆1 , . . . , 𝜆1 , . . . , 𝜆𝑙−1 , . . . , 𝜆𝑙−1 ),
where 𝜆𝑖 occurs 𝑚𝑖 times for 𝑖 ∈ ℤ𝑙 , such that
(2.4.41)    𝐴 = 𝑈𝑇𝑈 ∗ .
Such a representation is called a Schur decomposition of 𝐴.
Proof. We prove the assertion by induction on 𝑘 and, in doing so, present an algorithm
to construct a Schur decomposition of 𝐴. For 𝑘 = 1 the assertion is true since in this
case, the matrix 𝐴 is in upper triangular form. So we can set 𝑈 = 𝐼1 .
Let 𝑘 > 1 and assume that the assertion holds for all 𝑚 < 𝑘. Let 𝑣⃗ be an eigenvector
associated with the eigenvalue 𝜆0 that exists by Proposition B.7.21. Assume, without
loss of generality, that ‖𝑣⃗‖ = 1. By Theorem 2.2.33, there is a matrix 𝑋 ∈ ℂ(𝑘,𝑘−1) such
that the column vectors of the matrix
(2.4.42)    (𝑣⃗ 𝑋)
form an orthonormal basis of ℂ𝑘 . Proposition 2.4.18 implies that this matrix is unitary.
So we have
(2.4.43)    ( 𝑣⃗∗ ) 𝐴 (𝑣⃗ 𝑋) = ( 𝑣⃗∗𝐴𝑣⃗   𝑣⃗∗𝐴𝑋 ) = ( 𝜆0 𝑣⃗∗𝑣⃗   𝑣⃗∗𝐴𝑋 ) = ( 𝜆0   𝑣⃗∗𝐴𝑋 )
            ( 𝑋∗ )            ( 𝑋∗𝐴𝑣⃗   𝑋∗𝐴𝑋 )   ( 𝜆0 𝑋∗𝑣⃗   𝑋∗𝐴𝑋 )   ( 0    𝑋∗𝐴𝑋 ) .
The lower-left corner of this matrix is zero because all columns of 𝑋 are orthogonal to
𝑣⃗. Also, 𝑋∗𝐴𝑋 is in ℂ(𝑘−1,𝑘−1) and since the matrix (𝑣⃗ 𝑋) is unitary, we have
(𝑣⃗ 𝑋)−1 = (𝑣⃗ 𝑋)∗ . So by (2.4.43) and Proposition B.5.30 we have
(2.4.44) 𝑝𝐴 (𝑥) = (𝑥 − 𝜆0 )𝑝𝑋 ∗ 𝐴𝑋 (𝑥)
which implies
(2.4.45)    𝑝𝑋∗𝐴𝑋 (𝑥) = (𝑥 − 𝜆0 )^{𝑚0 −1} ∏_{𝑖=1}^{𝑙−1} (𝑥 − 𝜆𝑖 )^{𝑚𝑖 } .
By the induction hypothesis, there is a unitary matrix 𝑌 ∈ ℂ(𝑘−1,𝑘−1) such that
(2.4.46)    𝑍 = 𝑌 ∗ 𝑋 ∗ 𝐴𝑋𝑌
is an upper triangular matrix with diagonal
(2.4.47)    (𝜆0 , . . . , 𝜆0 , 𝜆1 , . . . , 𝜆1 , . . . , 𝜆𝑙−1 , . . . , 𝜆𝑙−1 ),
where 𝜆0 occurs 𝑚0 − 1 times and 𝜆𝑖 occurs 𝑚𝑖 times for 1 ≤ 𝑖 < 𝑙.
Define
(2.4.48) 𝑈 = (𝑣 ⃗ 𝑋𝑌 ) .
Then 𝑈 ∈ ℂ(𝑘,𝑘) and this matrix is unitary because
(2.4.49)    𝑈∗𝑈 = ( 𝑣⃗∗  ) (𝑣⃗ 𝑋𝑌) = ( 𝑣⃗∗𝑣⃗      𝑣⃗∗𝑋𝑌   ) = ( 1   0    )
                  ( 𝑌∗𝑋∗ )           ( 𝑌∗𝑋∗𝑣⃗   𝑌∗𝑋∗𝑋𝑌 )   ( 0   𝐼𝑘−1 ) .
Also, we have
(2.4.50)    𝑈∗𝐴𝑈 = ( 𝑣⃗∗  ) 𝐴 (𝑣⃗ 𝑋𝑌) = ( 𝑣⃗∗𝐴𝑣⃗     𝑣⃗∗𝐴𝑋𝑌   ) = ( 𝜆0   𝑣⃗∗𝐴𝑋𝑌 )
                   ( 𝑌∗𝑋∗ )             ( 𝑌∗𝑋∗𝐴𝑣⃗   𝑌∗𝑋∗𝐴𝑋𝑌 )   ( 0    𝑍     )
which is an upper triangular matrix. Denote it by 𝑇. The diagonal of the upper trian-
gular matrix 𝑍 is shown in (2.4.47). Therefore, by (2.4.50) the diagonal of 𝑇 is
(2.4.51)    (𝜆0 , . . . , 𝜆0 , 𝜆1 , . . . , 𝜆1 , . . . , 𝜆𝑙−1 , . . . , 𝜆𝑙−1 ),
where 𝜆𝑖 occurs 𝑚𝑖 times for 𝑖 ∈ ℤ𝑙 .
So we have found the Schur decomposition 𝐴 = 𝑈𝑇𝑈 ∗ of 𝐴. □
Example 2.4.47. Consider the matrix
(2.4.52)    𝐴 = ( 3  −2 )
                ( 2  −1 ) .
Its characteristic polynomial is 𝑝𝐴 (𝑥) = (𝑥 − 1)² . So 1 is the only eigenvalue of 𝐴 and
its algebraic multiplicity is 2. However, its geometric multiplicity is only 1 since the
matrix 𝐼 − 𝐴 has rank 1. Therefore, 𝐴 is not diagonalizable. We determine the Schur
decomposition of 𝐴 using the construction of the proof of Theorem 2.4.46. We have
𝑙 = 1, 𝜆0 = 1, and 𝑚0 = 2. The case 𝑘 = 1 is trivial. So, let 𝑘 = 2 and use the normalized
eigenvector 𝑣⃗ = (1/√2)(1, 1) of 𝐴. The only column vector of the matrix
(2.4.53)    𝑋 = (1/√2) (  1 )
                        ( −1 )
appends 𝑣⃗ to an orthonormal basis of ℂ² . So, the matrix from (2.4.42) is
(2.4.54)    (𝑣⃗ 𝑋) = (1/√2) ( 1   1 )
                            ( 1  −1 ) .
Now we have
(2.4.55)    ( 𝑣⃗∗ ) 𝐴 (𝑣⃗ 𝑋) = (1/2) ( 1   1 ) ( 3  −2 ) ( 1   1 ) = ( 1  4 )
            ( 𝑋∗ )                  ( 1  −1 ) ( 2  −1 ) ( 1  −1 )   ( 0  1 ) .
This is already the upper triangular matrix that we are looking for. But let us see how
the construction proceeds. The matrix 𝑋 ∗ 𝐴𝑋 is (1). We can choose the unitary matrix
𝑌 = (1). Then 𝑍 = 𝑌 ∗ 𝑋 ∗ 𝐴𝑋𝑌 = (1). We set
(2.4.56)    𝑈 = (𝑣⃗ 𝑋𝑌) = (1/√2) ( 1   1 )
                                 ( 1  −1 )
and obtain
(2.4.57)    𝑈∗𝐴𝑈 = 𝑇 = ( 1  4 )
                       ( 0  1 ) .
So 𝐴 = 𝑈𝑇𝑈∗ is a Schur decomposition of 𝐴.
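This Schur decomposition can be confirmed numerically. A NumPy sketch (not part of the text):

```python
import numpy as np

# The matrix A and the unitary U from Example 2.4.47.
A = np.array([[3, -2], [2, -1]], dtype=complex)
U = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

T = U.conj().T @ A @ U
assert np.allclose(T, [[1, 4], [0, 1]])        # upper triangular, diagonal (1, 1)
assert np.allclose(U @ T @ U.conj().T, A)      # A = U T U*
assert np.allclose(U.conj().T @ U, np.eye(2))  # U is unitary
```

Note that T is upper triangular but not diagonal, as it must be: A has a double eigenvalue 1 with geometric multiplicity 1, so no diagonalization exists.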
2.4.7. The spectral theorem. The aim of this section is to introduce the re-
nowned spectral theorem, which establishes the diagonalizability of normal matrices
and the existence of an orthonormal basis consisting of eigenvectors for normal op-
erators. The finite-dimensional version which we present here goes back to the early
20th century and is closely associated with the contributions of mathematicians such as
David Hilbert. It assumes pivotal significance in the postulates of quantum mechan-
ics presented in Chapter 3. As previously mentioned, the broader domain of quan-
tum mechanics necessitates the infinite-dimensional analog, which can be attributed
to mathematicians like John von Neumann and Hermann Weyl.
Definition 2.4.48. A matrix 𝐴 ∈ ℂ(𝑘,𝑘) or operator 𝐴 ∈ End(ℍ) is called normal if
𝐴∗𝐴 = 𝐴𝐴∗ .
The next proposition presents important examples of normal matrices and opera-
tors.
Proposition 2.4.49. If 𝐴 ∈ ℂ(𝑘,𝑘) or 𝐴 ∈ End(ℍ) is Hermitian or unitary, then 𝐴 is
normal.
Exercise 2.4.50. Prove Proposition 2.4.49.
Here are some properties of normal matrices and operators.
Proposition 2.4.51. (1) The adjoint of a normal matrix or operator is normal.
(2) Every diagonal matrix in ℂ(𝑘,𝑘) is normal.
(3) A matrix in ℂ(𝑘,𝑘) that is both normal and upper triangular is a diagonal matrix.
Proof. Let 𝐴 ∈ ℂ(𝑘,𝑘) be a normal matrix. We apply (2.3.25) and obtain (𝐴∗ )∗ 𝐴∗ =
𝐴𝐴∗ = 𝐴∗ 𝐴 = 𝐴∗ (𝐴∗ )∗ . This proves the first assertion.
To show the second assertion, let 𝐷 ∈ ℂ(𝑘,𝑘) be a diagonal matrix with diago-
nal (𝜆0 , . . . , 𝜆𝑘−1 ) ∈ ℂ𝑘 . Then 𝐷∗ 𝐷 and 𝐷𝐷∗ are diagonal matrices with diagonal
(|𝜆0 |2 , . . . , |𝜆𝑘−1 |2 ). Hence, we have 𝐷𝐷∗ = 𝐷∗ 𝐷 which means that 𝐷 is normal.
We prove the last statement and let 𝐴 = (𝑎𝑖,𝑗 ). We also denote by (𝑟⃗0 , . . . , 𝑟⃗𝑘−1 )
the row vectors of 𝐴 and by (𝑐⃗0 , . . . , 𝑐⃗𝑘−1 ) the column vectors of 𝐴. Then 𝐴∗𝐴 = 𝐴𝐴∗
implies that for any 𝑢 ∈ ℤ𝑘 the entry of this matrix with row and column index 𝑢 can
be computed as 𝑟⃗ᵆ 𝑟⃗ᵆ∗ = ‖𝑟⃗ᵆ ‖² and as 𝑐⃗ᵆ∗ 𝑐⃗ᵆ = ‖𝑐⃗ᵆ ‖² . So for 0 ≤ 𝑢 < 𝑘 we have
(2.4.58)    ‖𝑟⃗ᵆ ‖² = ‖𝑐⃗ᵆ ‖² .
We prove by induction on 𝑢 that for 𝑢 = 0, 1, . . . , 𝑘 we have
(2.4.59) 𝑟 𝑖⃗ = 𝑐 𝑖⃗ = 𝑎𝑖,𝑖 𝑒 𝑖⃗ , 0 ≤ 𝑖 < 𝑢.
For 𝑢 = 𝑘 this implies that 𝐴 is diagonal. For the base case 𝑢 = 0 there is nothing
to show. For the induction step, assume that 0 ≤ 𝑢 < 𝑘 and that (2.4.59) holds for 𝑢.
Since 𝐴 is upper triangular, it follows from (2.4.59) that
(2.4.60) 𝑐ᵆ⃗ = 𝑎ᵆ,ᵆ 𝑒 ᵆ⃗
and
(2.4.61) 𝑟ᵆ⃗ = (0, . . . , 0, 𝑎ᵆ,ᵆ , 𝑎ᵆ,ᵆ+1 , . . . , 𝑎ᵆ,𝑘−1 ).
So (2.4.58) implies |𝑎ᵆ,ᵆ |² = ∑_{𝑗=ᵆ}^{𝑘−1} |𝑎ᵆ,𝑗 |² and thus 𝑎ᵆ,ᵆ+1 = ⋯ = 𝑎ᵆ,𝑘−1 = 0 which
shows that 𝑟⃗ᵆ = 𝑎ᵆ,ᵆ 𝑒⃗ᵆ . □
The next exercise shows that there are normal matrices that are neither unitary
nor Hermitian.
Exercise 2.4.52. Show that the matrix
                (  1     1+𝑖   1 )
(2.4.62)        ( −1+𝑖    1    1 )
                ( −1     −1    1 )
is normal but not Hermitian or unitary.
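A numerical check of this claim (not a proof, and not part of the text) with NumPy:

```python
import numpy as np

M = np.array([[ 1,      1 + 1j, 1],
              [-1 + 1j, 1,      1],
              [-1,     -1,      1]])

MMs = M @ M.conj().T
MsM = M.conj().T @ M
assert np.allclose(MMs, MsM)            # normal: M M* = M* M
assert not np.allclose(M, M.conj().T)   # not Hermitian
assert not np.allclose(MMs, np.eye(3))  # not unitary
```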
The next theorem states that normal matrices are diagonalizable.
Theorem 2.4.53. Let 𝐴 ∈ ℂ(𝑘,𝑘) be a normal matrix, let 𝑙 ∈ ℕ, let 𝜆0 , . . . , 𝜆𝑙−1 be the
distinct eigenvalues of 𝐴, and let 𝑚0 , . . . , 𝑚𝑙−1 be their algebraic multiplicities. Then there
is a unitary matrix 𝑈 ∈ ℂ(𝑘,𝑘) such that
(2.4.63)    𝑈∗𝐴𝑈 = diag(𝜆0 , . . . , 𝜆0 , 𝜆1 , . . . , 𝜆1 , . . . , 𝜆𝑙−1 , . . . , 𝜆𝑙−1 ),
where 𝜆𝑖 occurs 𝑚𝑖 times for 𝑖 ∈ ℤ𝑙 .
Also, if we write the sequence of column vectors of 𝑈 as
(2.4.64)    𝑈 = (𝑢⃗𝑀0 , . . . , 𝑢⃗𝑀1 −1 , 𝑢⃗𝑀1 , . . . , 𝑢⃗𝑀2 −1 , . . . , 𝑢⃗𝑀𝑙−1 , . . . , 𝑢⃗𝑀𝑙 −1 ),
where 𝑈𝑗 = (𝑢⃗𝑀𝑗 , . . . , 𝑢⃗𝑀𝑗+1 −1 ) and 𝑀𝑗 = ∑_{𝑖=0}^{𝑗−1} 𝑚𝑖 for 0 ≤ 𝑗 ≤ 𝑙, then 𝑈𝑗 is an
orthonormal basis of the eigenspace associated with 𝜆𝑗 for all 𝑗 ∈ ℤ𝑙 .
Proof. Let 𝐴 be normal. By Theorem 2.4.46 there is a Schur decomposition 𝐴 = 𝑈𝑇𝑈∗
where 𝑈 ∈ ℂ(𝑘,𝑘) is a unitary matrix and 𝑇 is an upper triangular matrix with diagonal
(2.4.65)    (𝜆0 , . . . , 𝜆0 , 𝜆1 , . . . , 𝜆1 , . . . , 𝜆𝑙−1 , . . . , 𝜆𝑙−1 ),
where 𝜆𝑖 occurs 𝑚𝑖 times for 𝑖 ∈ ℤ𝑙 .
Since 𝑈 is unitary, we can also write 𝑇 = 𝑈∗𝐴𝑈. Now we have
𝑇 ∗ 𝑇 = (𝑈 ∗ 𝐴𝑈)∗ (𝑈 ∗ 𝐴𝑈) = 𝑈 ∗ 𝐴∗ 𝑈𝑈 ∗ 𝐴𝑈
(2.4.66) = 𝑈 ∗ 𝐴∗ 𝐴𝑈 = 𝑈 ∗ 𝐴𝐴∗ 𝑈 = (𝑈 ∗ 𝐴𝑈)(𝑈 ∗ 𝐴∗ 𝑈)
= (𝑈 ∗ 𝐴𝑈)(𝑈 ∗ 𝐴𝑈)∗ = 𝑇𝑇 ∗ .
Hence 𝑇 is normal and upper triangular. So Proposition 2.4.51 implies that 𝑇 is a di-
agonal matrix.
We prove the second assertion. Let 𝑗 ∈ ℤ𝑙 and let 𝑢⃗ be an element of 𝑈 𝑗 . Then
𝐴𝑢⃗ = 𝜆𝑗 𝑢.⃗ It follows that 𝑈 𝑗 is an orthonormal sequence of 𝑚𝑗 elements of the eigen-
space 𝐸𝑗 of 𝐴 associated with 𝜆𝑗 . By Theorem B.7.28, the dimension of 𝐸𝑗 is 𝑚𝑗 . So 𝑈 𝑗
is an orthonormal basis of 𝐸𝑗 . □
Note that in Theorem 2.4.53 we have 𝑀0 = 0 and 𝑀𝑙 = 𝑘.
Example 2.4.54. Consider the Pauli matrix
(2.4.67)    𝑋 = ( 0  1 )
                ( 1  0 )
and set
(2.4.68)    𝑈 = (1/√2) ( 1   1 )
                       ( 1  −1 ) .
Then we have 𝑈∗𝑋𝑈 = diag(1, −1).
Exercise 2.4.55. Find the decomposition (2.4.63) for the Pauli matrices 𝑌 and 𝑍 and
for the Hadamard matrix 𝐻.
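The diagonalization in Example 2.4.54 can be reproduced numerically; note that np.linalg.eigh computes exactly such a unitary diagonalization for Hermitian matrices. A NumPy sketch (not part of the text):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
U = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

# The unitary U from the example diagonalizes X.
assert np.allclose(U.conj().T @ X @ U, np.diag([1, -1]))

# np.linalg.eigh returns the eigenvalues of a Hermitian matrix in
# ascending order together with an orthonormal eigenbasis V.
w, V = np.linalg.eigh(X)
assert np.allclose(w, [-1, 1])
assert np.allclose(V @ np.diag(w) @ V.conj().T, X)
```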
From Theorem 2.4.53 we obtain the spectral theorem.
Theorem 2.4.56 (Spectral theorem). Let 𝐴 ∈ End(ℍ) be normal. Let Λ be the set of
eigenvalues of 𝐴. For 𝜆 ∈ Λ denote by 𝑃𝜆 the orthogonal projection onto the eigenspace
𝐸 𝜆 corresponding to 𝜆. Then the following are true.
(1) There are orthonormal bases 𝐵𝜆 of 𝐸 𝜆 for all 𝜆 ∈ Λ such that their concatenation is
an orthonormal basis of ℍ.
(2) The eigenspaces 𝐸 𝜆 are orthogonal to each other, and their sum is ℍ.
(3) 𝑃𝜆 = ∑_{|𝑏⟩∈𝐵𝜆 } |𝑏⟩ ⟨𝑏| for all 𝜆 ∈ Λ.
(4) ∑_{𝜆∈Λ} 𝑃𝜆 = 𝐼ℍ .
(5) 𝐴 = ∑𝜆∈Λ 𝜆𝑃𝜆 . This representation of 𝐴 is called the spectral decomposition of 𝐴.
Proof. Let 𝑙 = |Λ| and Λ = {𝜆0 , . . . , 𝜆𝑙−1 }. Denote by 𝐴 also the representation matrix
of 𝐴 with respect to an orthonormal basis 𝐶 of ℍ. Use the notation of Theorem 2.4.53.
Then
(2.4.69) 𝐵 = 𝐶𝑈 = (|𝑢0 ⟩ , . . . , |𝑢𝑘−1 ⟩)
is another orthonormal basis of ℍ. Let 𝑗 ∈ ℤ𝑙 . It follows from the properties of 𝑈
that 𝐵𝜆𝑗 = (|𝑢𝑀𝑗 ⟩ , . . . , |𝑢𝑀𝑗+1 −1 ⟩) is an orthonormal basis of 𝐸𝜆𝑗 . This proves the first
assertion.
The second assertion follows immediately from the first. The third and fourth
assertions follow from Proposition 2.4.44. Using the fourth assertion we obtain
(2.4.70)    𝐴 |𝜑⟩ = 𝐴 ∑_{𝜆∈Λ} 𝑃𝜆 |𝜑⟩ = ∑_{𝜆∈Λ} 𝐴𝑃𝜆 |𝜑⟩ = ∑_{𝜆∈Λ} 𝜆𝑃𝜆 |𝜑⟩ . □
In the following, we adopt a simplified notation. When 𝐴 is a normal operator in
End(ℍ), we write its spectral decomposition as 𝐴 = ∑_{𝜆∈Λ} 𝜆𝑃𝜆 . This notation assumes
that Λ represents the set of eigenvalues of 𝐴, and for each 𝜆 ∈ Λ, 𝑃𝜆 represents the
orthogonal projection onto the eigenspace in ℍ corresponding to 𝜆 without explicitly
mentioning it.
The next example determines the spectral decomposition of the Pauli operators.
Example 2.4.57. Consider the following pairs of quantum states in ℍ1 :
            (|𝑥+ ⟩ , |𝑥− ⟩) = ( (|0⟩ + |1⟩)/√2 , (|0⟩ − |1⟩)/√2 ),
(2.4.71)    (|𝑦+ ⟩ , |𝑦− ⟩) = ( (|0⟩ + 𝑖 |1⟩)/√2 , (|0⟩ − 𝑖 |1⟩)/√2 ),
            (|𝑧+ ⟩ , |𝑧− ⟩) = (|0⟩ , |1⟩).
These pairs are orthonormal bases of ℍ1 . The eigenvalues of the Pauli operators 𝑋, 𝑌 ,
and 𝑍 are 1 and −1 and their spectral decompositions are
            𝑋 = |𝑥+ ⟩ ⟨𝑥+ | − |𝑥− ⟩ ⟨𝑥− | ,
(2.4.72)    𝑌 = |𝑦+ ⟩ ⟨𝑦+ | − |𝑦− ⟩ ⟨𝑦− | ,
            𝑍 = |𝑧+ ⟩ ⟨𝑧+ | − |𝑧− ⟩ ⟨𝑧− | .
Exercise 2.4.58. Verify Example 2.4.57.
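The spectral decompositions (2.4.72) can be verified by forming the outer products explicitly. A NumPy sketch (not part of the text):

```python
import numpy as np

ket0 = np.array([[1], [0]], dtype=complex)
ket1 = np.array([[0], [1]], dtype=complex)

xp = (ket0 + ket1) / np.sqrt(2)          # |x+>
xm = (ket0 - ket1) / np.sqrt(2)          # |x->
yp = (ket0 + 1j * ket1) / np.sqrt(2)     # |y+>
ym = (ket0 - 1j * ket1) / np.sqrt(2)     # |y->

def outer(a, b):                         # |a><b|
    return a @ b.conj().T

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Each Pauli operator is P_+ - P_- for its eigenbasis, as in (2.4.72).
assert np.allclose(X, outer(xp, xp) - outer(xm, xm))
assert np.allclose(Y, outer(yp, yp) - outer(ym, ym))
assert np.allclose(Z, outer(ket0, ket0) - outer(ket1, ket1))
```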
Here are some properties of the spectral decomposition of normal endomorphisms
of ℍ.
Proposition 2.4.59. Let 𝐴 ∈ End(ℍ) be normal, let Λ be the set of eigenvalues of 𝐴, and
let
(2.4.73)    𝐴 = ∑_{𝜆∈Λ} 𝜆𝑃𝜆
be the spectral decomposition of 𝐴. Also, let 𝑚 ∈ ℤ≥0 . Then we have
(2.4.74)    𝐴∗ = ∑_{𝜆∈Λ} 𝜆̄ 𝑃𝜆 ,    𝐴𝐴∗ = ∑_{𝜆∈Λ} |𝜆|² 𝑃𝜆 ,    𝐴^𝑚 = ∑_{𝜆∈Λ} 𝜆^𝑚 𝑃𝜆 .
If 𝐴 is invertible, then the last equation holds for all 𝑚 ∈ ℤ.
Proof. We have
        𝐴∗ = ( ∑_{𝜆∈Λ} 𝜆𝑃𝜆 )∗        by (2.4.73),
           = ∑_{𝜆∈Λ} 𝜆̄ 𝑃𝜆∗           by Proposition 2.3.9,
           = ∑_{𝜆∈Λ} 𝜆̄ 𝑃𝜆            by Proposition 2.4.41.
This proves the first assertion. The other two assertions can be verified using the fact
that by Theorem 2.4.56 the eigenspaces 𝑃𝜆 (ℍ) are pairwise orthogonal. □
We note that the right-hand sides of the second and third equations in (2.4.74) need
not be spectral decompositions as written, since the absolute values and powers of
different eigenvalues of a normal operator may be the same. To obtain the spectral
decompositions, we have to group the projections appropriately.
The spectral theorem allows us to characterize involutions, projections, and Her-
mitian and unitary operators by their eigenvalues.
Proposition 2.4.60. Let 𝐴 ∈ ℂ(𝑘,𝑘) or 𝐴 ∈ End(ℍ) be normal and let Λ be the set of
eigenvalues of 𝐴. Then the following hold.
(1) 𝐴 is an involution if and only if Λ ⊂ {−1, 1}.
(2) 𝐴 is a projection if and only if Λ ⊂ {0, 1}.
(3) 𝐴 is Hermitian if and only if all its eigenvalues are real numbers.
(4) 𝐴 is unitary if and only if all of its eigenvalues have absolute value 1.
Proof. Let
(2.4.75)    𝐴 = ∑_{𝜆∈Λ} 𝜆𝑃𝜆
be the spectral decomposition of 𝐴. Recall that by Proposition 2.4.44, the 𝑃𝜆 are linearly
independent. This will be used in all arguments.
Using Theorem 2.4.56 and Proposition 2.4.59 we see that 𝐴 is an involution if and
only if
(2.4.76)    𝐴² = ∑_{𝜆∈Λ} 𝜆² 𝑃𝜆 = 𝐼ℍ = ∑_{𝜆∈Λ} 𝑃𝜆 .
Hence, 𝐴 is an involution if and only if 𝜆² = 1 for all 𝜆 ∈ Λ. This is true if and only if
Λ ⊂ {1, −1}.
Next, 𝐴 is a projection if and only if 𝐴² = 𝐴. By Theorem 2.4.56 and Proposition
2.4.59 this is true if and only if 𝜆² = 𝜆 for all 𝜆 ∈ Λ which is equivalent to Λ ⊂ {0, 1}.
In addition, 𝐴 is Hermitian if and only if 𝐴∗ = 𝐴. By Theorem 2.4.56 and Proposi-
tion 2.4.59 this is equivalent to 𝜆̄ = 𝜆 and thus 𝜆 ∈ ℝ for all 𝜆 ∈ Λ.
Finally, 𝐴 is unitary if and only if 𝐴 is invertible and 𝐴∗ = 𝐴−1 . By Theorem 2.4.56
and Proposition 2.4.59 this is equivalent to 0 ∉ Λ and 𝜆̄ = 1/𝜆 for all 𝜆 ∈ Λ which
means that |𝜆| = 1 for all 𝜆 ∈ Λ. □
As a consequence of Proposition 2.4.60, we obtain the following characterization
of Hermitian matrices or operators.
Proposition 2.4.61. A normal matrix 𝐴 ∈ ℂ(𝑘,𝑘) or operator 𝐴 ∈ End(ℍ) is Hermitian
if and only if ⟨𝑢⃗|𝐴|𝑢⃗⟩ or ⟨𝜑|𝐴|𝜑⟩ is a real number for all 𝑢⃗ ∈ ℂ𝑘 or |𝜑⟩ ∈ ℍ, respectively.
Proof. It suffices to prove the assertion for 𝐴 ∈ ℂ(𝑘,𝑘) . First, assume that 𝐴 is Hermit-
ian and let 𝑢⃗ ∈ ℂ𝑘 . By conjugate symmetry, 𝐴 = 𝐴∗ , and the definition of the adjoint,
the complex conjugate of ⟨𝑢⃗|𝐴|𝑢⃗⟩ satisfies
(2.4.77)    ⟨𝐴𝑢⃗|𝑢⃗⟩ = ⟨𝐴∗𝑢⃗|𝑢⃗⟩ = ⟨𝑢⃗|𝐴|𝑢⃗⟩.
So ⟨𝑢⃗|𝐴|𝑢⃗⟩ equals its own complex conjugate, which shows that ⟨𝑢⃗|𝐴|𝑢⃗⟩ ∈ ℝ.
Conversely, assume that ⟨𝑢⃗|𝐴|𝑢⃗⟩ ∈ ℝ for all 𝑢⃗ ∈ ℂ𝑘 . Let 𝜆 be an eigenvalue of 𝐴 and
let 𝑢⃗ be an eigenvector of 𝐴 associated with 𝜆 of length 1. Then we have 𝜆 = ⟨𝑢⃗|𝐴|𝑢⃗⟩ ∈
ℝ. Therefore, Proposition 2.4.60 implies that 𝐴 is Hermitian. □

2.4.8. Definite operators and matrices. We define definite matrices and op-
erators.
Definition 2.4.62. Let 𝐴 ∈ End(ℍ).
(1) 𝐴 is called positive definite if ⟨𝜑|𝐴|𝜑⟩ ∈ ℝ>0 for all nonzero |𝜑⟩ ∈ ℍ.
(2) 𝐴 is called positive semidefinite if ⟨𝜑|𝐴|𝜑⟩ ∈ ℝ≥0 for all |𝜑⟩ ∈ ℍ.
Negative definite and negative semidefinite operators are defined analogously, with
ℝ<0 and ℝ≤0 in place of ℝ>0 and ℝ≥0 .
Definition 2.4.63. A matrix 𝐴 ∈ ℂ(𝑘,𝑘) is called positive definite or positive semidefinite
if the corresponding endomorphism of ℂ𝑘 has this property.

From Proposition 2.4.61 we obtain the following result.
Proposition 2.4.64. All normal positive definite, positive semidefinite, negative definite,
and negative semidefinite operators or matrices are Hermitian.
We can also characterize positive definite operators by their eigenvalues.

Proposition 2.4.65. Let 𝐴 ∈ ℂ(𝑘,𝑘) or 𝐴 ∈ End(ℍ).
(1) If 𝐴 is positive definite, then all eigenvalues of 𝐴 are positive real numbers.
(2) If 𝐴 is positive semidefinite, then all the eigenvalues of 𝐴 are real numbers ≥ 0.
Proof. Let 𝜆 be an eigenvalue of 𝐴 and let |𝜑⟩ ∈ ℍ be an eigenvector of 𝐴 associated
to 𝜆 of length 1. Then we have
(2.4.78)    ⟨𝜑|𝐴|𝜑⟩ = ⟨𝜑|𝜆 |𝜑⟩⟩ = 𝜆.
This implies both assertions of the proposition. □
We also present another characterization of normal positive semidefinite matrices
and operators.

Proposition 2.4.66. Let 𝐴 ∈ ℂ(𝑘,𝑘) or 𝐴 ∈ End(ℍ) be normal. Then the following
statements are equivalent.
(1) 𝐴 is positive semidefinite.
(2) 𝐴 = 𝐵𝐵 ∗ for some normal 𝐵 ∈ ℂ(𝑘,𝑘) or 𝐵 ∈ End(ℍ), respectively.
(3) 𝐴 = 𝐵 2 for some Hermitian 𝐵 ∈ ℂ(𝑘,𝑘) or 𝐵 ∈ End(ℍ), respectively.
Proof. Let 𝐴 be positive semidefinite. Then 𝐴 is Hermitian by Proposition 2.4.64. Let
(2.4.79)    𝐴 = ∑_{𝜆∈Λ} 𝜆𝑃𝜆
be the spectral decomposition of 𝐴. Then by Proposition 2.4.65 we have 𝜆 ∈ ℝ≥0 for
all 𝜆 ∈ Λ. We set
(2.4.80)    𝐵 = ∑_{𝜆∈Λ} √𝜆 𝑃𝜆 .
Then 𝐵∗ = 𝐵 and Proposition 2.4.59 imply
(2.4.81)    𝐵∗𝐵 = 𝐵𝐵∗ = 𝐵² = ∑_{𝜆∈Λ} 𝜆𝑃𝜆 = 𝐴.
This shows that the first assertion implies the other two statements.
Now assume that there is a normal 𝐵 ∈ End(ℍ) such that 𝐴 = 𝐵𝐵 ∗ . Let
(2.4.82)    𝐵 = ∑_{𝜆∈Λ′} 𝜆𝑃𝜆
be the spectral decomposition of 𝐵. Then we have
(2.4.83)    𝐴 = 𝐵𝐵∗ = ∑_{𝜆∈Λ′} |𝜆|² 𝑃𝜆 .
So 𝐴 is positive semidefinite. Finally, assume that there is a Hermitian 𝐵 ∈ End(ℍ)
such that 𝐴 = 𝐵² . Then we have 𝐴 = 𝐵∗𝐵 which implies that 𝐴 is positive semidefinite.
□
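One direction of Proposition 2.4.66 can be observed numerically: for any matrix 𝐵, the product 𝐵𝐵∗ is Hermitian with nonnegative eigenvalues (cf. also Exercise 2.4.68, which does not require 𝐵 to be normal). A NumPy sketch (not part of the text):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = B @ B.conj().T

assert np.allclose(A, A.conj().T)                # A is Hermitian
assert np.min(np.linalg.eigvalsh(A)) >= -1e-12   # all eigenvalues >= 0

# <phi|A|phi> = ||B* phi||^2 >= 0 for any vector phi
phi = rng.standard_normal((4, 1)) + 1j * rng.standard_normal((4, 1))
val = (phi.conj().T @ A @ phi).item()
assert abs(val.imag) < 1e-9 and val.real >= 0
```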
2.4.9. Singular value decomposition. The next theorem is another conse-
quence of the spectral theorem.
Theorem 2.4.67 (Singular value decomposition). Let 𝑘, 𝑙, 𝑟 ∈ ℕ and let 𝐴 ∈ ℂ(𝑘,𝑙) be
of rank 𝑟. Then there are unitary matrices 𝑈 ∈ ℂ(𝑘,𝑘) and 𝑉 ∈ ℂ(𝑙,𝑙) such that
(2.4.84)    𝐴 = 𝑈 ( diag(𝜆0 , . . . , 𝜆𝑟−1 )   0 ) 𝑉∗
                  (          0                0 )
where the middle matrix is in ℂ(𝑘,𝑙) , its first 𝑟 diagonal entries 𝜆0 , . . . , 𝜆𝑟−1 are positive
real numbers, and all of its other entries are zero. Such a representation is called a singu-
lar value decomposition of the matrix 𝐴. In the decomposition, the diagonal entries
𝜆0 , . . . , 𝜆𝑟−1 are uniquely determined by 𝐴 up to reordering. They are called the singu-
lar values of 𝐴.
Proof. As shown in Exercise 2.4.68, the matrix 𝐴∗𝐴 is a positive semidefinite and Her-
mitian matrix in ℂ(𝑙,𝑙) . It follows from Theorem 2.4.53 that there is a unitary matrix
𝑉 ∈ ℂ(𝑙,𝑙) such that
(2.4.85)    𝑉∗𝐴∗𝐴𝑉 = 𝐷′ = ( 𝐷  0 )
                           ( 0  0 )
where 𝐷 ∈ ℂ(𝑚,𝑚) is a positive definite diagonal matrix, 𝑚 is the number of nonzero
eigenvalues of 𝐴∗𝐴, and these eigenvalues are positive real numbers and diagonal el-
ements of 𝐷. By Theorem 2.4.53, the columns of 𝑉 form an orthonormal basis of ℂ𝑙
consisting of eigenvectors of 𝐴∗𝐴. The eigenvalue corresponding to the 𝑖th column of
𝑉 is the 𝑖th diagonal entry of 𝐷′ for 0 ≤ 𝑖 < 𝑙. We write
(2.4.86)    𝑉 = (𝑉1 𝑉2 )
where 𝑉1 ∈ ℂ(𝑙,𝑚) and 𝑉2 ∈ ℂ(𝑙,𝑙−𝑚) . Then the columns of 𝑉1 are linearly independent
eigenvectors corresponding to the nonzero eigenvalues of 𝐴∗𝐴. Also, the columns of 𝑉2
are eigenvectors corresponding to the eigenvalue 0 of 𝐴∗𝐴. So (2.4.85) can be rewritten
as
(2.4.87)    ( 𝑉1∗ ) 𝐴∗𝐴 (𝑉1 𝑉2 ) = ( 𝑉1∗𝐴∗𝐴𝑉1   𝑉1∗𝐴∗𝐴𝑉2 ) = ( 𝐷  0 )
            ( 𝑉2∗ )                ( 𝑉2∗𝐴∗𝐴𝑉1   𝑉2∗𝐴∗𝐴𝑉2 )   ( 0  0 ) .
This implies
(2.4.88)    (𝐴𝑉1 )∗ 𝐴𝑉1 = 𝑉1∗𝐴∗𝐴𝑉1 = 𝐷,    (𝐴𝑉2 )∗ 𝐴𝑉2 = 𝑉2∗𝐴∗𝐴𝑉2 = 0
which implies
(2.4.89)    ‖𝐴𝑉2 ‖² = tr(𝑉2∗𝐴∗𝐴𝑉2 ) = 0.
Hence we have
(2.4.90)    𝐴𝑉2 = 0.
The fact that 𝑉 is unitary implies
(2.4.91)    𝑉1∗𝑉1 = 𝐼𝑚 ,    𝑉2∗𝑉2 = 𝐼𝑙−𝑚 ,    𝑉1 𝑉1∗ + 𝑉2 𝑉2∗ = 𝐼𝑙 .
Now define
(2.4.92)    𝑈1 = 𝐴𝑉1 𝐷−1/2
where 𝐷−1/2 is the diagonal matrix whose diagonal entries are the inverse square roots
of the diagonal entries of 𝐷. Then the third equation in (2.4.91), (2.4.92), and (2.4.90)
imply
(2.4.93)    𝑈1 𝐷1/2 𝑉1∗ = 𝐴𝑉1 𝐷−1/2 𝐷1/2 𝑉1∗ = 𝐴𝑉1 𝑉1∗ = 𝐴(𝐼𝑙 − 𝑉2 𝑉2∗ ) = 𝐴 − (𝐴𝑉2 )𝑉2∗ = 𝐴.
Also, (2.4.92) and (2.4.88) imply
(2.4.94)    𝑈1∗𝑈1 = 𝐷−1/2 𝑉1∗𝐴∗𝐴𝑉1 𝐷−1/2 = 𝐷−1/2 𝐷𝐷−1/2 = 𝐼𝑚 .
Hence, the columns of 𝑈1 form an orthonormal sequence which by Theorem 2.2.33 can
be extended to an orthonormal basis of ℂ𝑘 . Therefore, we can choose 𝑈2 ∈ ℂ(𝑘,𝑘−𝑚) so
that
(2.4.95)    𝑈 = (𝑈1 𝑈2 )
is a unitary matrix. Finally, we define the matrix 𝐵 ∈ ℂ(𝑘,𝑙) as
(2.4.96)    𝐵 = ( 𝐷1/2  0 )
                (  0    0 ) .
Then we have
(2.4.97)    𝑈𝐵𝑉∗ = (𝑈1 𝑈2 ) ( 𝐷1/2  0 ) ( 𝑉1∗ ) = 𝑈1 𝐷1/2 𝑉1∗ = 𝐴. □
                            (  0    0 ) ( 𝑉2∗ )
Note that the singular values of a complex matrix 𝐴 are the square roots of the
nonzero eigenvalues of the normal square matrix 𝐴∗ 𝐴.
Exercise 2.4.68. Let 𝐴 ∈ ℂ(𝑘,𝑘) or 𝐴 ∈ End(ℍ). Show that 𝐴𝐴∗ is positive semidefinite
and Hermitian.
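The relationship between singular values and the eigenvalues of 𝐴∗𝐴 can be checked with NumPy's built-in SVD. A sketch (not part of the text):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))  # rank 3 a.s.

U, s, Vh = np.linalg.svd(A)          # A = U S Vh with s in descending order
k, l = A.shape
S = np.zeros((k, l))
S[:k, :k] = np.diag(s)
assert np.allclose(U @ S @ Vh, A)

# The singular values are the square roots of the nonzero eigenvalues
# of A* A (the remaining l - k eigenvalues are 0).
ev = np.linalg.eigvalsh(A.conj().T @ A)   # ascending, length l
assert np.allclose(np.sort(s**2), ev[-k:])
```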
2.4.10. Functions of operators. In this section, we explain how functions of
normal operators on ℍ are defined. We start with a motivation. In many contexts, it is
useful to write complex numbers 𝛾 with |𝛾| = 1 as
(2.4.98) 𝛾 = 𝑒𝑖𝛽 = cos 𝛽 + 𝑖 sin 𝛽
where 𝑖 is the complex unit and 𝛽 ∈ ℝ. The notions introduced in this section will allow
us to write any unitary operator 𝑈 ∈ End(ℍ) as 𝑈 = 𝑒𝑖𝐴 with a Hermitian operator
𝐴 ∈ End(ℍ).
Definition 2.4.69. Let 𝑓 ∶ ℂ → ℂ, let 𝐴 be a normal linear operator on ℍ, and let
(2.4.99)    𝐴 = ∑_{𝜆∈Λ} 𝜆𝑃𝜆
be the spectral decomposition of 𝐴. Then we define
(2.4.100)    𝑓(𝐴) = ∑_{𝜆∈Λ} 𝑓(𝜆)𝑃𝜆 .
It can be shown that in certain cases the function 𝑓(𝐴) can also be defined using
power series, even if 𝐴 is not normal. Also, we note that from (2.4.100), the spectral
decomposition of 𝑓(𝐴) can be easily obtained.
Example 2.4.70. Consider the Pauli operator 𝑍 and the 𝜋/8 operator 𝑇 which have the
spectral decompositions
(2.4.101) 𝑍 = |0⟩ ⟨0| − |1⟩ ⟨1| , 𝑇 = |0⟩ ⟨0| + 𝑒𝑖𝜋/4 |1⟩ ⟨1| .
Also, let
(2.4.102) 𝑓 ∶ ℂ → ℂ, 𝑥 ↦ 𝑒−𝑖𝜋𝑥/8 .
The eigenvalues of 𝑍 are 1 and −1. Therefore, we have
𝑓(𝑍) = 𝑒−𝑖𝜋𝑍/8
= 𝑒−𝑖𝜋/8 |0⟩ ⟨0| + 𝑒𝑖𝜋/8 |1⟩ ⟨1|
(2.4.103)
= 𝑒−𝑖𝜋/8 (|0⟩ ⟨0| + 𝑒𝑖𝜋/4 |1⟩ ⟨1|)
= 𝑒−𝑖𝜋/8 𝑇.
Exercise 2.4.71. Let 𝐴 be a normal linear operator on ℍ and let 𝛼, 𝛽 ∈ ℝ. Prove that
𝑒𝑖𝐴(𝛼+𝛽) = 𝑒𝑖𝐴𝛼 𝑒𝑖𝐴𝛽 .
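The computation of Example 2.4.70 can be replayed numerically via the spectral definition (2.4.100): since the eigenvalues of 𝑍 are 1 and −1 with eigenprojections |0⟩⟨0| and |1⟩⟨1|, the function is applied to each eigenvalue separately. A NumPy sketch (not part of the text):

```python
import numpy as np

ket0 = np.array([[1], [0]], dtype=complex)
ket1 = np.array([[0], [1]], dtype=complex)

def outer(a, b):                     # |a><b|
    return a @ b.conj().T

# The pi/8 operator T, as in (2.4.101).
T = outer(ket0, ket0) + np.exp(1j * np.pi / 4) * outer(ket1, ket1)

# f(Z) = f(1) P_1 + f(-1) P_{-1} with f(x) = e^{-i pi x / 8}, per (2.4.100).
f = lambda lam: np.exp(-1j * np.pi * lam / 8)
fZ = f(1) * outer(ket0, ket0) + f(-1) * outer(ket1, ket1)

# As computed in the text: f(Z) = e^{-i pi / 8} T.
assert np.allclose(fZ, np.exp(-1j * np.pi / 8) * T)
```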
The following theorem gives another characterization of unitary operators.
Theorem 2.4.72. An operator 𝑈 ∈ End(ℍ) is unitary if and only if 𝑈 can be written as
𝑈 = 𝑒𝑖𝐴 with a Hermitian operator 𝐴 ∈ End(ℍ).

Proof. Let 𝐴 ∈ End(ℍ) be Hermitian and let


(2.4.104) 𝐴 = ∑_{𝜆∈Λ} 𝜆𝑃𝜆
be the spectral decomposition of 𝐴. Then we have
(2.4.105) 𝑒^{𝑖𝐴} = ∑_{𝜆∈Λ} 𝑒^{𝑖𝜆} 𝑃𝜆 .

It follows that the eigenvalues of 𝑒^{𝑖𝐴} are 𝑒^{𝑖𝜆} , 𝜆 ∈ Λ. Since 𝐴 is Hermitian, Proposition 2.4.60 implies that Λ ⊂ ℝ. Hence, we have |𝑒^{𝑖𝜆} | = 1 for all 𝜆 ∈ Λ. Therefore, it follows from Proposition 2.4.60 that 𝑒^{𝑖𝐴} is unitary.
Now assume that 𝑈 ∈ End(ℍ) is unitary. Let Λ be the set of eigenvalues of 𝑈.
Then it follows from Proposition 2.4.60 that |𝜆| = 1 for all 𝜆 ∈ Λ. Hence, for any 𝜆 ∈ Λ
there is 𝛼𝜆 ∈ ℝ such that 𝜆 = 𝑒𝑖𝛼𝜆 . Set
(2.4.106) 𝐴 = ∑_{𝜆∈Λ} 𝛼𝜆 𝑃𝜆 .

Then 𝐴 is a linear operator on ℍ whose eigenvalues are all real numbers. Proposition
2.4.60 implies that 𝐴 is Hermitian. Also, we see from (2.4.106) that 𝑈 = 𝑒𝑖𝐴 . □

From Theorem 2.4.72 we obtain the following characterization of SU(𝑘).


Corollary 2.4.73. An operator 𝑈 ∈ End(ℍ) is in SU(𝑘) if and only if 𝑈 can be written as 𝑈 = 𝑒^{𝑖𝐴} with a Hermitian operator 𝐴 ∈ End(ℍ) of trace 0.

Proof. Let 𝑈 be a unitary operator on ℍ. Then by Theorem 2.4.72 there is a Hermitian


operator 𝐴 on ℍ with 𝑈 = 𝑒𝑖𝐴 . Let Λ be the set of eigenvalues of 𝐴. For 𝜆 ∈ Λ denote
by 𝑎𝜆 the algebraic multiplicity of 𝜆. Then we see from the proof of Theorem 2.4.72
and Proposition 2.4.1 that
(2.4.107) det 𝑈 = ∏_{𝜆∈Λ} 𝑒^{𝑖𝑎𝜆 𝜆} = 𝑒^{𝑖 ∑_{𝜆∈Λ} 𝑎𝜆 𝜆} = 𝑒^{𝑖 tr 𝐴} .

If tr 𝐴 = 0, then det 𝑈 = 1. Conversely, if det 𝑈 = 1, then


(2.4.108) tr 𝐴 ≡ 0 mod 2𝜋.
Change the eigenvalues of 𝐴 modulo 2𝜋 so that tr 𝐴 = 0. Then we still have 𝑈 = 𝑒^{𝑖𝐴} . □

The following result will also be useful.


Proposition 2.4.74. Let 𝐴 ∈ End(ℍ) be normal and an involution. Then we have 𝑒𝑖𝑥𝐴 =
(cos 𝑥)𝐼ℍ + 𝑖(sin 𝑥)𝐴 for all 𝑥 ∈ ℝ.

Proof. Since 𝐴 is an involution, it follows from Proposition 2.4.60 that the eigenvalues
of 𝐴 are in {1, −1}. If 1 is an eigenvalue of 𝐴, then we denote by 𝑃1 the orthogonal
projection onto the corresponding eigenspace. Otherwise, we set 𝑃1 = 0. Likewise,
if −1 is an eigenvalue of 𝐴, then we denote by 𝑃−1 the orthogonal projection onto the corresponding eigenspace. Otherwise, we set 𝑃−1 = 0. Then we have
(2.4.109) 𝐼ℍ = 𝑃1 + 𝑃−1 , 𝐴 = 𝑃1 − 𝑃−1 ,
and therefore
𝑒𝑖𝐴𝑥 = 𝑒𝑖𝑥 𝑃1 + 𝑒−𝑖𝑥 𝑃−1
= (cos 𝑥 + 𝑖 sin 𝑥)𝑃1 + (cos(−𝑥) + 𝑖 sin(−𝑥))𝑃−1
(2.4.110) = (cos 𝑥 + 𝑖 sin 𝑥)𝑃1 + (cos 𝑥 − 𝑖 sin 𝑥)𝑃−1
= cos 𝑥(𝑃1 + 𝑃−1 ) + 𝑖 sin 𝑥(𝑃1 − 𝑃−1 )
= (cos 𝑥)𝐼ℍ + 𝑖(sin 𝑥)𝐴. □
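Proposition 2.4.74 can be verified numerically for the Pauli operator 𝑋, which is Hermitian and an involution. This NumPy sketch (an illustration of ours, not from the text) computes 𝑒^{𝑖𝑥𝐴} from the spectral decomposition, as in the proof of Theorem 2.4.72.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=complex)  # Pauli X, an involution
I2 = np.eye(2, dtype=complex)

def exp_ixA(x, A):
    """e^{ixA} for a Hermitian matrix A, via its spectral decomposition."""
    eigenvalues, U = np.linalg.eigh(A)
    return U @ np.diag(np.exp(1j * x * eigenvalues)) @ U.conj().T

for x in (0.3, 1.0, np.pi / 2):
    # Proposition 2.4.74: e^{ixA} = (cos x) I + i (sin x) A for involutions
    assert np.allclose(exp_ixA(x, X), np.cos(x) * I2 + 1j * np.sin(x) * X)
```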

2.5. Tensor products


In the previous sections, we have discussed finite-dimensional Hilbert spaces. In this
section, we explain how to transfer many of the corresponding concepts and results to
tensor products of such spaces. For an introduction to tensor products, refer to Section
B.8.

2.5.1. Basics and notation. In this section, let 𝑚 ∈ ℕ and let ℍ(0), . . . , ℍ(𝑚 − 1)
be Hilbert spaces of finite dimension 𝑘0 , . . . , 𝑘𝑚−1 , respectively. The inner products
on these Hilbert spaces are denoted by ⟨⋅|⋅⟩. We use the notation ℍ(𝑗) to distinguish
this Hilbert space from the 𝑗-qubit state spaces ℍ𝑗 . For each 𝑗 ∈ ℤ𝑚 , let 𝐵𝑗 be an
orthonormal basis of ℍ(𝑗). We write these bases as
(2.5.1) 𝐵𝑗 = (|𝑏0,𝑗 ⟩ , . . . , |𝑏𝑘𝑗 −1,𝑗 ⟩).

We study the tensor product


(2.5.2) ℍ = ℍ(0) ⊗ ⋯ ⊗ ℍ(𝑚 − 1).
Its dimension as a ℂ-vector space is
(2.5.3) 𝑘 = ∏_{𝑗=0}^{𝑚−1} 𝑘𝑗 .

Also, by Proposition B.9.11


(2.5.4) 𝐵 = 𝐵0 ⊗ ⋯ ⊗ 𝐵𝑚−1
is a basis of ℍ.
In tensor products of kets, we sometimes omit the tensor symbols; i.e., if |𝜑𝑗 ⟩ ∈
ℍ(𝑗) for 0 ≤ 𝑗 < 𝑚, then we write
(2.5.5) |𝜑0 ⟩ |𝜑1 ⟩ ⋯ |𝜑𝑚−1 ⟩ = |𝜑0 ⟩ ⊗ |𝜑1 ⟩ ⊗ ⋯ ⊗ |𝜑𝑚−1 ⟩ .

We present a few examples of tensor products of Hilbert spaces.


Example 2.5.1. Let 𝑚 = 2, ℍ(𝑗) = ℍ1 , and 𝐵𝑗 = (|0⟩ , |1⟩) for 𝑗 = 0, 1. Then we have
(2.5.6) ℍ = ℍ 1 ⊗ ℍ1
and
(2.5.7) 𝐵 = (|0⟩ ⊗ |0⟩ , |0⟩ ⊗ |1⟩ , |1⟩ ⊗ |0⟩ , |1⟩ ⊗ |1⟩).
By (2.5.5) we can write the basis 𝐵 as
(2.5.8) 𝐵 = (|0⟩ |0⟩ , |0⟩ |1⟩ , |1⟩ |0⟩ , |1⟩ |1⟩).

If all Hilbert spaces ℍ(𝑗) are equal, then ℍ is written as
(2.5.9) ℍ(0)^{⊗𝑚} = ℍ(0) ⊗ ⋯ ⊗ ℍ(0) (𝑚 factors).
Also, the elements of ℍ(0)^{⊗𝑚} of this form are written as
(2.5.10) |𝜑⟩^{⊗𝑚} = |𝜑⟩ ⋯ |𝜑⟩ (𝑚 factors).

Example 2.5.2. We have
(2.5.11) ℍ1^{⊗3} = ℍ1 ⊗ ℍ1 ⊗ ℍ1
and
(2.5.12) |0⟩^{⊗3} = |0⟩ |0⟩ |0⟩ .

We also show how to obtain the representation of a tensor product of elements


of the Hilbert spaces ℍ(𝑗) with respect to the basis 𝐵 from the representation of the
components with respect to the bases 𝐵𝑗 . We define

(2.5.13) 𝑘 ⃗ = (𝑘0 , . . . , 𝑘𝑚−1 )



and
(2.5.14) ℤ𝑘⃗ = ∏_{𝑗=0}^{𝑚−1} ℤ𝑘𝑗 .

Also, for all 𝑖⃗ ∈ ℤ𝑘⃗ , 𝑖⃗ = (𝑖0 , . . . , 𝑖𝑚−1 ), we set
(2.5.15) |𝑏𝑖⃗ ⟩ = ⨂_{𝑗=0}^{𝑚−1} |𝑏𝑖𝑗 ,𝑗 ⟩ = |𝑏𝑖0 ,0 ⟩ ⋯ |𝑏𝑖𝑚−1 ,𝑚−1 ⟩ .

Proposition 2.5.3. For 0 ≤ 𝑗 < 𝑚 let |𝜑𝑗 ⟩ ∈ ℍ(𝑗) with
(2.5.16) |𝜑𝑗 ⟩ = ∑_{𝑖=0}^{𝑘𝑗 −1} 𝛼𝑖,𝑗 |𝑏𝑖,𝑗 ⟩
where 𝛼𝑖,𝑗 ∈ ℂ for all 𝑖 ∈ ℤ𝑘𝑗 . Then
(2.5.17) ⨂_{𝑗=0}^{𝑚−1} |𝜑𝑗 ⟩ = ∑_{𝑖⃗∈ℤ𝑘⃗ } 𝛼𝑖⃗ |𝑏𝑖⃗ ⟩
where for 𝑖⃗ = (𝑖0 , . . . , 𝑖𝑚−1 ) ∈ ℤ𝑘⃗ we have
(2.5.18) 𝛼𝑖⃗ = ∏_{𝑗=0}^{𝑚−1} 𝛼𝑖𝑗 ,𝑗 .

Exercise 2.5.4. Prove Proposition 2.5.3.
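NumPy's kron function implements exactly the coordinate rule of Proposition 2.5.3 for the basis ordering of (2.5.15); the following sketch (an illustration of ours, not from the text) checks one component.

```python
import numpy as np

phi0 = np.array([1.0, 2.0])          # coordinates alpha_{i,0}, k_0 = 2
phi1 = np.array([3.0, 4.0, 5.0])     # coordinates alpha_{i,1}, k_1 = 3

# np.kron lists the products alpha_{i0,0} * alpha_{i1,1} in the order
# (i0, i1) = (0,0), (0,1), (0,2), (1,0), ..., matching (2.5.15).
product = np.kron(phi0, phi1)
i0, i1 = 1, 2
assert product[i0 * 3 + i1] == phi0[i0] * phi1[i1]
```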

2.5.2. Inner product. On the tensor product ℍ = ℍ(0) ⊗ ⋯ ⊗ ℍ(𝑚 − 1) we


use the inner product induced by the inner products on the component spaces which is
defined as the Hermitian inner product with respect to the basis 𝐵 = 𝐵0 ⊗ ⋯ ⊗ 𝐵𝑚−1 .
So 𝐵 is an orthonormal basis with respect to this inner product. Also, equipped with
the inner product induced by the inner products of the component spaces, the tensor
product ℍ is a Hilbert space.
Proposition 2.5.5. For 0 ≤ 𝑗 < 𝑚 let |𝜑𝑗 ⟩ , |𝜓𝑗 ⟩ ∈ ℍ(𝑗). Then we have
(2.5.19) ⟨ ⨂_{𝑗=0}^{𝑚−1} 𝜑𝑗 | ⨂_{𝑗=0}^{𝑚−1} 𝜓𝑗 ⟩ = ∏_{𝑗=0}^{𝑚−1} ⟨𝜑𝑗 |𝜓𝑗 ⟩.

Proof. Write
(2.5.20) |𝜑⟩ = ⨂_{𝑗=0}^{𝑚−1} |𝜑𝑗 ⟩ and |𝜓⟩ = ⨂_{𝑗=0}^{𝑚−1} |𝜓𝑗 ⟩ .

For 0 ≤ 𝑗 < 𝑚 let
(2.5.21) |𝜑𝑗 ⟩ = ∑_{𝑖=0}^{𝑘𝑗 −1} 𝛼𝑖,𝑗 |𝑏𝑖,𝑗 ⟩ , |𝜓𝑗 ⟩ = ∑_{𝑖=0}^{𝑘𝑗 −1} 𝛽𝑖,𝑗 |𝑏𝑖,𝑗 ⟩

where 𝛼𝑖,𝑗 , 𝛽𝑖,𝑗 ∈ ℂ for all 𝑖 ∈ ℤ𝑘𝑗 . Proposition 2.5.3 implies
(2.5.22) |𝜑⟩ = ∑_{𝑖⃗∈ℤ𝑘⃗ } 𝛼𝑖⃗ |𝑏𝑖⃗ ⟩ , |𝜓⟩ = ∑_{𝑖⃗∈ℤ𝑘⃗ } 𝛽𝑖⃗ |𝑏𝑖⃗ ⟩
where for all 𝑖⃗ = (𝑖0 , . . . , 𝑖𝑚−1 ) ∈ ℤ𝑘⃗ we have
(2.5.23) 𝛼𝑖⃗ = ∏_{𝑗=0}^{𝑚−1} 𝛼𝑖𝑗 ,𝑗 , 𝛽𝑖⃗ = ∏_{𝑗=0}^{𝑚−1} 𝛽𝑖𝑗 ,𝑗 .

From the orthonormality of 𝐵, we obtain
(2.5.24) ⟨𝜑|𝜓⟩ = ⟨ ∑_{𝑖⃗∈ℤ𝑘⃗ } 𝛼𝑖⃗ |𝑏𝑖⃗ ⟩ | ∑_{𝑖⃗∈ℤ𝑘⃗ } 𝛽𝑖⃗ |𝑏𝑖⃗ ⟩ ⟩ = ∑_{𝑖⃗∈ℤ𝑘⃗ } 𝛼̄𝑖⃗ 𝛽𝑖⃗ .
On the other hand, the orthonormality of the 𝐵𝑗 implies
(2.5.25) ∏_{𝑗=0}^{𝑚−1} ⟨𝜑𝑗 |𝜓𝑗 ⟩ = ∏_{𝑗=0}^{𝑚−1} ⟨ ∑_{𝑖∈ℤ𝑘𝑗 } 𝛼𝑖,𝑗 |𝑏𝑖,𝑗 ⟩ | ∑_{𝑖∈ℤ𝑘𝑗 } 𝛽𝑖,𝑗 |𝑏𝑖,𝑗 ⟩ ⟩ = ∑_{𝑖⃗∈ℤ𝑘⃗ } 𝛼̄𝑖⃗ 𝛽𝑖⃗ . □

Example 2.5.6. Let 𝑚 = 2, ℍ(𝑗) = ℍ1 , and 𝐵𝑗 = (|0⟩ , |1⟩) for 0 ≤ 𝑗 < 2. For
(2.5.26) |𝜑0 ⟩ = |0⟩ + 𝑖 |1⟩ , |𝜑1 ⟩ = |0⟩ − 𝑖 |1⟩ , |𝜓0 ⟩ = |0⟩ + |1⟩ , |𝜓1 ⟩ = |0⟩ − |1⟩
we obtain
(2.5.27) ⟨ |𝜑0 ⟩ |𝜑1 ⟩ | |𝜓0 ⟩ |𝜓1 ⟩ ⟩ = ⟨𝜑0 |𝜓0 ⟩⟨𝜑1 |𝜓1 ⟩ = (1 + 𝑖)(1 − 𝑖) = 2.
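Proposition 2.5.5 can also be checked numerically. The sketch below is an illustration of ours, not from the text; it uses np.vdot, which conjugates its first argument. Which argument carries the conjugation is a convention, but the product formula holds either way.

```python
import numpy as np

inner = lambda a, b: np.vdot(a, b)   # conjugate-linear in the first argument

phi0 = np.array([1.0, 1j]);  phi1 = np.array([1.0, -1j])
psi0 = np.array([1.0, 1.0]); psi1 = np.array([1.0, -1.0])

lhs = inner(np.kron(phi0, phi1), np.kron(psi0, psi1))
rhs = inner(phi0, psi0) * inner(phi1, psi1)
assert np.isclose(lhs, rhs)          # <phi0 phi1 | psi0 psi1> factorizes
```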

2.5.3. State spaces as tensor products. We can use the construction in the
previous section to identify the tensor product of state spaces with a larger state space.
To explain this, let 𝑚, 𝑛0 , . . . , 𝑛𝑚−1 ∈ ℕ. Consider the tensor product
(2.5.28) ℍ = ℍ𝑛0 ⊗ ⋯ ⊗ ℍ𝑛𝑚−1
of the 𝑛𝑗 -qubit state spaces ℍ𝑛𝑗 , 𝑗 ∈ ℤ𝑚 .
Let 𝑛 = ∑_{𝑗=0}^{𝑚−1} 𝑛𝑗 . Denote by 𝐵 the computational basis of ℍ𝑛 . Then the linear map
(2.5.29) ℍ → ℍ𝑛 , |𝑏⃗0 ⟩ |𝑏⃗1 ⟩ ⋯ |𝑏⃗𝑚−1 ⟩ ↦ |𝑏⃗0 𝑏⃗1 ⋯ 𝑏⃗𝑚−1 ⟩ ,
where 𝑏⃗𝑗 ∈ {0, 1}^{𝑛𝑗} for 0 ≤ 𝑗 < 𝑚, is an isometry between ℍ𝑛0 ⊗ ⋯ ⊗ ℍ𝑛𝑚−1 and ℍ𝑛 .
Using this isometry, we identify the elements of the tensor product ℍ with the elements
of ℍ𝑛 .

Exercise 2.5.7. Show that the map (2.5.29) is an isometry.

Example 2.5.8. Let 𝑚 = 2, 𝑛0 = 𝑛1 = 1, and let


(2.5.30) |𝜑⟩ = |0⟩ + |1⟩ , |𝜓⟩ = |0⟩ − |1⟩ .

Then we have
|𝜑⟩ ⊗ |𝜓⟩ = (|0⟩ + |1⟩) ⊗ (|0⟩ − |1⟩)
(2.5.31) = |0⟩ |0⟩ − |0⟩ |1⟩ + |1⟩ |0⟩ − |1⟩ |1⟩
= |00⟩ − |01⟩ + |10⟩ − |11⟩ .

The isometry (2.5.29) can also be written as
(2.5.32) ℍ𝑛0 ⊗ ⋯ ⊗ ℍ𝑛𝑚−1 → ℍ𝑛 , |𝑏0 ⟩ ⋯ |𝑏𝑚−1 ⟩ ↦ | ∑_{𝑗=0}^{𝑚−1} 𝑏𝑗 2^{𝑠𝑗} ⟩
where 𝑏𝑗 ∈ ℤ_{2^{𝑛𝑗}} and 𝑠𝑗 = ∑_{𝑢=𝑗+1}^{𝑚−1} 𝑛𝑢 for all 𝑗 ∈ ℤ𝑚 .
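The integer on the right-hand side is simply the concatenated binary expansion of the blocks 𝑏0 , . . . , 𝑏𝑚−1 . A small sketch (the helper concatenated_index is ours, not from the text):

```python
def concatenated_index(blocks, widths):
    """Map (b_0, ..., b_{m-1}) with b_j in Z_{2^{n_j}} to sum_j b_j * 2^{s_j},
    where s_j = n_{j+1} + ... + n_{m-1}."""
    index = 0
    for b, n in zip(blocks, widths):
        index = (index << n) | b     # shift past the remaining blocks' bits
    return index

# |b_0>|b_1> = |1>|01> corresponds to |101>, the basis ket with index 5.
assert concatenated_index([0b1, 0b01], [1, 2]) == 0b101
```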

2.5.4. Homomorphisms. Let 𝑚 ∈ ℕ, and let ℍ′ (0), . . . , ℍ′ (𝑚 − 1) be finite-dimensional Hilbert spaces. Let
(2.5.33) ℍ′ = ℍ′ (0) ⊗ ⋯ ⊗ ℍ′ (𝑚 − 1).
Then, as shown in Section B.9.3, we identify ⨂_{𝑗=0}^{𝑚−1} Hom(ℍ(𝑗), ℍ′ (𝑗)) with Hom(ℍ, ℍ′ ).
Proposition 2.5.9. Let (𝑓0 , . . . , 𝑓𝑚−1 ) ∈ ∏_{𝑗=0}^{𝑚−1} End(ℍ(𝑗)) and let 𝑓 = 𝑓0 ⊗ ⋯ ⊗ 𝑓𝑚−1 . Then we have
(2.5.34) 𝑓^∗ = 𝑓0^∗ ⊗ ⋯ ⊗ 𝑓𝑚−1^∗
and
(2.5.35) 𝑓^𝑘 = 𝑓0^𝑘 ⊗ ⋯ ⊗ 𝑓𝑚−1^𝑘
for all 𝑘 ∈ ℕ0 and for all 𝑘 ∈ ℤ if 𝑓 is invertible.

Exercise 2.5.10. Prove Proposition 2.5.9.
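Both identities of Proposition 2.5.9 are easy to confirm numerically for matrices; the sketch below (an illustration of ours, not from the text) uses two random complex matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
f0 = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
f1 = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
f = np.kron(f0, f1)

# Adjoints and powers act factorwise on the tensor product.
assert np.allclose(f.conj().T, np.kron(f0.conj().T, f1.conj().T))
assert np.allclose(np.linalg.matrix_power(f, 3),
                   np.kron(np.linalg.matrix_power(f0, 3),
                           np.linalg.matrix_power(f1, 3)))
```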

Example 2.5.11. Consider 𝐻^{⊗2} = 𝐻 ⊗ 𝐻 in End(ℍ1 ⊗ ℍ1 ) where 𝐻 is the Hadamard operator. Since we identify ℍ1 ⊗ ℍ1 with ℍ2 , this map is in End(ℍ2 ). From
(2.5.36) 𝐻 |0⟩ = (|0⟩ + |1⟩)/√2 , 𝐻 |1⟩ = (|0⟩ − |1⟩)/√2
it follows that
(2.5.37) 𝐻^{⊗2} |0⟩^{⊗2} = 𝐻 |0⟩ ⊗ 𝐻 |0⟩ = (|0⟩ + |1⟩)/√2 ⊗ (|0⟩ + |1⟩)/√2 = (1/2) ∑_{𝑏⃗∈{0,1}²} |𝑏⃗⟩ .

Exercise 2.5.12. Show that for all 𝑛 ∈ ℕ we have
(2.5.38) 𝐻^{⊗𝑛} |0⟩^{⊗𝑛} = (1/√(2^𝑛)) ∑_{𝑏⃗∈{0,1}^𝑛 } |𝑏⃗⟩ .
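The statement of the exercise can be checked numerically for small 𝑛 (an illustration of ours, not a solution of the exercise):

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)   # Hadamard operator

n = 3
Hn = H
for _ in range(n - 1):
    Hn = np.kron(Hn, H)                                 # H^{(x)n}

zero = np.zeros(2 ** n); zero[0] = 1.0                  # |0...0>
state = Hn @ zero                                       # uniform superposition
assert np.allclose(state, np.full(2 ** n, 1.0 / np.sqrt(2 ** n)))
```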

We also note the following.


Proposition 2.5.13. For 0 ≤ 𝑗 < 𝑚 let |𝜑𝑗 ⟩ , |𝜓𝑗 ⟩ ∈ ℍ(𝑗). Then we have
(2.5.39) ⟨ ⨂_{𝑗=0}^{𝑚−1} 𝜑𝑗 | = ⨂_{𝑗=0}^{𝑚−1} ⟨𝜑𝑗 |
and
(2.5.40) | ⨂_{𝑗=0}^{𝑚−1} 𝜑𝑗 ⟩ ⟨ ⨂_{𝑗=0}^{𝑚−1} 𝜓𝑗 | = ⨂_{𝑗=0}^{𝑚−1} |𝜑𝑗 ⟩ ⟨𝜓𝑗 | .

Exercise 2.5.14. Prove Proposition 2.5.13.

2.5.5. Endomorphisms. We explain how the properties of the tensor product of


endomorphisms are related to the properties of its components.
Proposition 2.5.15. For 0 ≤ 𝑗 < 𝑚 assume that 𝐴𝑗 ∈ End(ℍ(𝑗)) and Λ𝑗 is the set of
eigenvalues of 𝐴𝑗 . Also, for 0 ≤ 𝑗 < 𝑚 and 𝜆 ∈ Λ𝑗 use the following notation: 𝐸𝜆,𝑗 denotes the eigenspace of 𝐴𝑗 associated with the eigenvalue 𝜆, 𝐵𝜆,𝑗 is an orthonormal basis of 𝐸𝜆,𝑗 , and 𝑃𝜆,𝑗 is the orthogonal projection of ℍ(𝑗) onto 𝐸𝜆,𝑗 . Finally, let
(2.5.41) 𝐴 = 𝐴0 ⊗ ⋯ ⊗ 𝐴𝑚−1 .
Then 𝐴 ∈ End(ℍ) and the following hold.
(1) The set of eigenvalues of 𝐴 is
(2.5.42) Λ = { ∏_{𝑗=0}^{𝑚−1} 𝜆𝑗 ∶ 𝜆𝑗 ∈ Λ𝑗 for 0 ≤ 𝑗 < 𝑚 } .
(2) For all 𝜆 ∈ Λ let
(2.5.43) 𝐿𝜆 = { (𝜆0 , . . . , 𝜆𝑚−1 ) ∈ ∏_{𝑗=0}^{𝑚−1} Λ𝑗 ∶ 𝜆 = ∏_{𝑗=0}^{𝑚−1} 𝜆𝑗 } .

Then the eigenspace of 𝐴 associated with 𝜆 is
(2.5.44) 𝐸𝜆 = ∑_{(𝜆0 ,. . .,𝜆𝑚−1 )∈𝐿𝜆 } ⨂_{𝑗=0}^{𝑚−1} 𝐸𝜆𝑗 ,𝑗 .
Also, the concatenation of all sequences ⨂_{𝑗=0}^{𝑚−1} 𝐵𝜆𝑗 ,𝑗 , where (𝜆0 , . . . , 𝜆𝑚−1 ) ∈ 𝐿𝜆 , is an orthonormal basis of 𝐸𝜆 , and the projection onto this eigenspace is
(2.5.45) 𝑃𝜆 = ∑_{(𝜆0 ,. . .,𝜆𝑚−1 )∈𝐿𝜆 } ⨂_{𝑗=0}^{𝑚−1} 𝑃𝜆𝑗 ,𝑗 .

(3) The operator 𝐴 is a projection, an involution, normal, Hermitian, or unitary if and


only if all of its components 𝐴𝑗 , 𝑗 ∈ ℤ𝑚 , have the respective properties.
Exercise 2.5.16. Prove Proposition 2.5.15.

Example 2.5.17. Let 𝑚 = 2, ℍ(0) = ℍ(1) = ℍ1 , and let 𝐴0 and 𝐴1 be the Pauli 𝑋
operator introduced in Example 2.3.1 which sends |0⟩ to |1⟩ and vice versa. It has the
eigenvalues 1 and −1. Also
(2.5.46) (|𝑥+ ⟩) = ((|0⟩ + |1⟩)/√2), (|𝑥− ⟩) = ((|0⟩ − |1⟩)/√2)
are orthonormal bases of the eigenspaces of 𝑋 associated with the eigenvalues 1 and
−1, respectively. The projections onto these eigenspaces are
(2.5.47) |𝑥+ ⟩ ⟨𝑥+ | , |𝑥− ⟩ ⟨𝑥− | ,
respectively.
We consider the tensor product
(2.5.48) 𝐴 = 𝐴 0 ⊗ 𝐴1 .
It is in End(ℍ1 ⊗ ℍ1 ) = End(ℍ2 ). It follows from Proposition 2.5.15 that 1 and −1 are
the eigenvalues of 𝐴, that the sequences
𝐵1 = (|𝑥+ 𝑥+ ⟩ , |𝑥− 𝑥− ⟩),
(2.5.49)
𝐵−1 = (|𝑥+ 𝑥− ⟩ , |𝑥− 𝑥+ ⟩)
are orthonormal bases of the eigenspaces 𝐸1 and 𝐸−1 associated with the eigenvalues
1 and −1, and that
𝑃1 = |𝑥+ 𝑥+ ⟩ ⟨𝑥+ 𝑥+ | + |𝑥− 𝑥− ⟩ ⟨𝑥− 𝑥− | ,
(2.5.50)
𝑃−1 = |𝑥+ 𝑥− ⟩ ⟨𝑥+ 𝑥− | + |𝑥− 𝑥+ ⟩ ⟨𝑥− 𝑥+ |
are the projections onto these eigenspaces.
Since 𝑋 is a Hermitian unitary involution, the same is true for 𝑋 ⊗ 𝑋.
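The spectral data of this example can be confirmed numerically; the following NumPy sketch (an illustration of ours, not from the text) rebuilds 𝑋 ⊗ 𝑋 from the projections 𝑃1 and 𝑃−1.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
x_plus = np.array([1.0, 1.0]) / np.sqrt(2)    # eigenvector for eigenvalue 1
x_minus = np.array([1.0, -1.0]) / np.sqrt(2)  # eigenvector for eigenvalue -1

A = np.kron(X, X)
proj = lambda v: np.outer(v, v)               # |v><v| for a real unit vector

P1 = proj(np.kron(x_plus, x_plus)) + proj(np.kron(x_minus, x_minus))
Pm1 = proj(np.kron(x_plus, x_minus)) + proj(np.kron(x_minus, x_plus))

assert np.allclose(A, P1 - Pm1)               # A = 1*P_1 + (-1)*P_{-1}
assert np.allclose(A @ A, np.eye(4))          # X (x) X is an involution
```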

2.5.6. Schmidt decomposition theorem. In this section, we establish the renowned Schmidt decomposition theorem, which goes back to the mathematician Erhard Schmidt in the early 20th century. This theorem holds particular significance in quantum mechanics, where it plays a crucial role in the analysis of entanglement. It deals with the following problem. Let ℍ(0) and ℍ(1) be Hilbert spaces of dimensions 𝑘 and 𝑙, respectively. Then |𝜑0 ⟩ ⊗ |𝜑1 ⟩ ∈ ℍ(0) ⊗ ℍ(1) for all |𝜑0 ⟩ ∈ ℍ(0) and
|𝜑1 ⟩ ∈ ℍ(1). But not all elements |𝜑⟩ in this tensor product of Hilbert spaces can be
written as a tensor product of elements in ℍ(0) and ℍ(1). The Schmidt decomposition
theorem allows one to distinguish between these cases.
Theorem 2.5.18 (Schmidt decomposition theorem). Let 𝑚 = min{𝑘, 𝑙} and let |𝜑⟩ ∈
ℍ(0) ⊗ ℍ(1). Then there are orthonormal sequences (|𝑢0 ⟩ , . . . , |𝑢𝑚−1 ⟩) in ℍ(0) and (|𝑣 0 ⟩ ,
. . . , |𝑣 𝑚−1 ⟩) in ℍ(1) and positive real numbers 𝑟0 , . . . , 𝑟𝑚−1 such that
(2.5.51) |𝜑⟩ = ∑_{𝑖=0}^{𝑚−1} 𝑟𝑖 |𝑢𝑖 ⟩ ⊗ |𝑣𝑖 ⟩ .

Up to reordering, the coefficients 𝑟 𝑖 are uniquely determined by |𝜑⟩. The representation in


(2.5.51) is called a Schmidt decomposition of |𝜑⟩.

Proof. Let 𝐵 = (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) ∈ ℍ(0)𝑘 and 𝐶 = (|𝑐0 ⟩ , . . . , |𝑐𝑙−1 ⟩) ∈ ℍ(1)𝑙 be orthonormal bases of ℍ(0) and ℍ(1), respectively. Then we can write
(2.5.52) |𝜑⟩ = ∑_{𝑖=0}^{𝑘−1} ∑_{𝑗=0}^{𝑙−1} 𝛼𝑖,𝑗 |𝑏𝑖 ⟩ |𝑐𝑗 ⟩

with 𝛼𝑖,𝑗 ∈ ℂ for all 𝑖 ∈ ℤ𝑘 and 𝑗 ∈ ℤ𝑙 . If we set 𝐴 = (𝛼𝑖,𝑗 ) ∈ ℂ(𝑘,𝑙) , then (2.5.52) can
also be written as
(2.5.53) |𝜑⟩ = 𝐵𝐴𝐶.
Without loss of generality, assume that 𝑘 ≥ 𝑙. Then by Theorem 2.4.67 there is a singular value decomposition
(2.5.54) 𝐴 = 𝑈 ( 𝐷 ; 0 ) 𝑉∗ ,
in which ( 𝐷 ; 0 ) is the block matrix with 𝐷 on top of a (𝑘 − 𝑙) × 𝑙 zero block and

where 𝑈 ∈ ℂ(𝑘,𝑘) and 𝑉 ∈ ℂ(𝑙,𝑙) are unitary matrices and 𝐷 ∈ ℂ(𝑙,𝑙) is a positive
semidefinite diagonal matrix; that is,
(2.5.55) 𝐷 = diag(𝑟0 , . . . , 𝑟𝑙−1 )
with 𝑟 𝑖 ∈ ℝ≥0 for 0 ≤ 𝑖 < 𝑙. Write
(2.5.56) 𝑈 = (𝑈1 𝑈2 )
with 𝑈1 ∈ ℂ(𝑘,𝑙) and 𝑈2 ∈ ℂ(𝑘,𝑘−𝑙) . Then it follows from (2.5.54) that
(2.5.57) 𝐴 = 𝑈1 𝐷𝑉 ∗ .
So (2.5.53) implies
(2.5.58) |𝜑⟩ = 𝐵𝑈1 𝐷𝑉 ∗ 𝐶.
If we write
(2.5.59) (|𝑢0 ⟩ , . . . , |𝑢𝑙−1 ⟩) = 𝐵𝑈1 and (|𝑣 0 ⟩ , . . . , |𝑣 𝑙−1 ⟩) = 𝑉 ∗ 𝐶,
we obtain
(2.5.60) |𝜑⟩ = ∑_{𝑖=0}^{𝑙−1} 𝑟𝑖 |𝑢𝑖 ⟩ ⊗ |𝑣𝑖 ⟩ .

Leaving out the summands with coefficients 0 we obtain a decomposition as in the


theorem.
We show the uniqueness of the coefficients 𝑟 𝑖 . Suppose that we are given a Schmidt
decomposition (2.5.51). From this, we can construct a singular value decomposition
of the matrix 𝐴 from (2.5.53) where the positive coefficients are the singular values
of 𝐴. The details of the construction are worked out in Exercise 2.5.19. The singular
values are uniquely determined by Theorem 2.4.67. This shows that these coefficients
are uniquely determined up to reordering. □
Exercise 2.5.19. Construct the singular value decomposition in the uniqueness proof
of Theorem 2.5.18.

Definition 2.5.20. Let ℍ(0) and ℍ(1) be Hilbert spaces of dimension 𝑘 and 𝑙, respec-
tively, and let 𝑚 = min{𝑘, 𝑙}. Let |𝜑⟩ ∈ ℍ(0) ⊗ ℍ(1). Let 𝑟 = (𝑟0 , . . . , 𝑟𝑚−1 ) be the
sequence of coefficients in a Schmidt decomposition of |𝜑⟩. From Theorem 2.5.18 we
know that the elements of 𝑟 are nonnegative real numbers and that 𝑟 is uniquely deter-
mined up to reordering.
(1) The elements of 𝑟 are called the Schmidt coefficients of |𝜑⟩.
(2) The number of elements in 𝑟 counted with multiplicities is called the Schmidt
rank or Schmidt number of |𝜑⟩.
(3) The ket |𝜑⟩ is called separable with respect to the decomposition ℍ = ℍ(0) ⊗ ℍ(1)
if its Schmidt rank is 1, i.e., if it can be written as |𝜑⟩ = |𝜓⟩ ⊗ |𝜉⟩ with |𝜓⟩ ∈ ℍ(0)
and |𝜉⟩ ∈ ℍ(1). Otherwise, |𝜑⟩ is called inseparable or entangled.

We note that the notion of separability depends on the decomposition of a Hilbert


space into the tensor product of two Hilbert spaces. The meaning of the term “entan-
gled” will become clear in the next chapter. Entanglement is one of the most important
concepts in quantum mechanics.
Example 2.5.21. Consider the decomposition ℍ2 = ℍ1 ⊗ ℍ1 and in ℍ2 the element
(2.5.61) |𝜑⟩ = |0⟩ ⊗ (|0⟩ + |1⟩)/√2 ∈ ℍ1 ⊗ ℍ1 .
This is a Schmidt decomposition of |𝜑⟩, since both |0⟩ and (|0⟩ + |1⟩)/√2 have length 1.
So 1 is the only Schmidt coefficient of |𝜑⟩. It has multiplicity 1 in the Schmidt decom-
position of |𝜑⟩. Hence, the Schmidt rank or Schmidt number of |𝜑⟩ (with respect to the
chosen decomposition of ℍ) is 1, which means that |𝜑⟩ is separable.
Example 2.5.22. Consider the so-called Bell state
(2.5.62) |𝜑⟩ = (|0⟩ |0⟩ + |1⟩ |1⟩)/√2 ∈ ℍ1 ⊗ ℍ1 .
The representation (2.5.62) is a Schmidt decomposition of |𝜑⟩. So 1/√2 is the only
Schmidt coefficient of |𝜑⟩. It has multiplicity 2 in the Schmidt decomposition of |𝜑⟩.
Hence, the Schmidt rank or Schmidt number of |𝜑⟩ is 2 which means that |𝜑⟩ is entan-
gled.
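As the proof of Theorem 2.5.18 suggests, Schmidt coefficients can be computed by reshaping the coordinate vector of |𝜑⟩ into the matrix 𝐴 of (2.5.53) and taking its singular values. The following NumPy sketch (the helper name is ours, not from the text) reproduces Examples 2.5.21 and 2.5.22.

```python
import numpy as np

def schmidt_coefficients(phi, k, l):
    """Schmidt coefficients of phi in H(0) (x) H(1) of dimensions k and l."""
    A = np.asarray(phi).reshape(k, l)        # coordinate matrix of (2.5.53)
    s = np.linalg.svd(A, compute_uv=False)
    return s[s > 1e-12]                      # keep the positive coefficients

separable = np.kron([1.0, 0.0], [1.0, 1.0]) / np.sqrt(2)  # Example 2.5.21
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)        # Example 2.5.22

assert len(schmidt_coefficients(separable, 2, 2)) == 1    # Schmidt rank 1
assert np.allclose(schmidt_coefficients(bell, 2, 2), [1 / np.sqrt(2)] * 2)
```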
Chapter 3

Quantum Mechanics

Quantum mechanics, discovered in the early 20th century, stands as one of the most
revolutionary discoveries in physics. It was fundamentally shaped by the work of
physicists like Max Planck, who received the Nobel Prize in 1918, and Albert Ein-
stein, who received the Nobel Prize in 1921, although Einstein was later very critical of
quantum mechanics. The theory was fully developed by remarkable scientists, includ-
ing Niels Bohr, Nobel laureate of 1922, Werner Heisenberg, Nobel Prize 1932, Erwin
Schrödinger, and Paul Dirac, Nobel Prize 1933, Wolfgang Pauli, Nobel Prize 1945, and
Max Born, Nobel Prize 1954. In 1965, Richard P. Feynman, Julian Schwinger, and
Sin-Itiro Tomonaga were awarded the Nobel Prize for their contributions to quantum
electrodynamics. Recently, in 2022, Alain Aspect, John Clauser, and Anton Zeilinger
received the Nobel Prize for experimentally verifying one of the most counterintuitive
phenomena in quantum physics: entanglement.
One of the fundamental features of quantum mechanics is the concept that closed
physical systems can exist in a superposition of many possible states. This intrigu-
ing property inspired the mathematician and physicist Yuri Manin [Man80] and the
physicists Paul Benioff [Ben80] and Richard Feynman [Fey82] to conceive the idea
of a quantum computer, where information is stored and processed in superposition.
However, it soon became evident that designing practical and useful algorithms based
on this concept is a challenging task, as described in the chapters following this one.
In order to grasp the functioning of these algorithms and their underlying princi-
ples, understanding quantum mechanics becomes essential. Therefore, the objective of
this chapter is to introduce the reader to the quantum mechanical basis that underlies
quantum computing.
Just like other fields of physics, quantum mechanics is built upon a set of postu-
lates that establish correspondence between real-world objects and processes and their
mathematical counterparts. This correspondence enables us to make predictions about


quantum computing through rigorous mathematical reasoning. We commence by in-


troducing these postulates and elucidating their significance within the realm of quan-
tum computing. We illustrate how using these postulates, quantum bits and quantum
registers are represented using Hilbert spaces, termed state spaces. Moreover, we ex-
plore how the evolution of such quantum systems is captured using unitary operators
and how measurements unlock their content for subsequent classical computation.
Our subsequent focus centers on visualizing quantum bits as points on the sphere of radius 1 in three-dimensional real space, which is known as the Bloch sphere. Following this, we delve into an alternative description of quantum states using
density operators. This approach provides a means of describing the state of compo-
nents within composite quantum systems, further enhancing our understanding of the
intricate quantum realm.
Throughout this chapter, we make the assumption that ℍ is a Hilbert space of di-
mension 𝑘 ∈ ℕ, equipped with the inner product ⟨⋅|⋅⟩.

3.1. State spaces


3.1.1. The State Space Postulate. The first postulate that we discuss is the
State Space Postulate. It specifies how the state of a closed or isolated physical system
is modeled.
Postulate 3.1.1 (State Space Postulate). A closed physical system is associated with a
Hilbert space, called the state space of the system. The system at a particular time is
completely described by a unit vector in its state space, called the state vector or state
of the physical system.

The term “closed” refers to the system not interacting with other systems, i.e., not
exchanging energy or matter with them. In reality, the only closed system is the uni-
verse as a whole. However, in quantum computing, it is possible to construct quantum
systems which can be described to a good approximation as being closed. The state
vector of a quantum system is also called its wave function. The term “wave function”
originates from the historical development of quantum mechanics, where the theory
was initially formulated by analogy with classical wave phenomena.

3.1.2. Quantum bits. In quantum computing, the basic unit of information is a


quantum bit or qubit for short. A quantum bit is a physical system whose state space
is the single-qubit state space ℍ1 which we have specified in Definition 2.1.2. It is a two-
dimensional complex Hilbert space with orthonormal basis (|0⟩ , |1⟩), which is called
the computational basis of ℍ1 . By Postulate 3.1.1, an individual qubit at a particular
time is completely described by its state vector
(3.1.1) |𝜑⟩ = 𝛼0 |0⟩ + 𝛼1 |1⟩
where 𝛼0 and 𝛼1 are complex coefficients such that
(3.1.2) ‖𝜑‖² = |𝛼0 |² + |𝛼1 |² = 1.
The linear combination in (3.1.1) is called a superposition of the basis states |0⟩ and |1⟩.
So while the state of a classical bit is either 0 or 1, qubits are in a superposition of the

basis states |0⟩ and |1⟩. The coefficients 𝛼0 and 𝛼1 are called the amplitudes of |𝜑⟩ for the
basis states |0⟩ and |1⟩, respectively. This is a special case of the following definition.
Definition 3.1.2. Let 𝑙 ∈ ℕ, let (|𝜑0 ⟩ , . . . , |𝜑𝑙−1 ⟩) ∈ ℍ𝑙 be linearly independent, and
let |𝜑⟩ ∈ ℍ, |𝜑⟩ = 𝛼0 |𝜑0 ⟩ + ⋯ + 𝛼𝑙−1 |𝜑𝑙−1 ⟩ with 𝛼𝑖 ∈ ℂ for 𝑖 ∈ ℤ𝑙 . Then, for each
𝑖 ∈ ℤ𝑙 , the coefficient 𝛼𝑖 is called the amplitude of |𝜑⟩ for the state |𝜑𝑖 ⟩.

In order to illustrate the physical meaning of the amplitudes of a single-qubit state,


we give a preview of the concept of measurements that will be discussed in more detail
in Section 3.4.1. A measurement in the computational basis of a qubit is an interaction of
an observer with the qubit. If the qubit is in the state |𝜑⟩ = 𝛼0 |0⟩ + 𝛼1 |1⟩, 𝛼0 , 𝛼1 ∈ ℂ,
|𝛼0 |2 + |𝛼1 |2 = 1, then the measurement gives 0 with probability |𝛼0 |2 and 1 with
probability |𝛼1 |² . Note that |𝜑⟩ must be a unit vector for the sum of the probabilities of the
two measurement outcomes to be 1.
Example 3.1.3. Consider a physical system consisting of a single qubit. Then
1 𝑖
(3.1.3) |𝜑⟩ = |0⟩ + |1⟩
√2 √2
is a possible state of this qubit since
‖𝜑‖² = |1/√2|² + |𝑖/√2|² = 1/2 + 1/2 = 1.
The amplitudes of |𝜑⟩ for the states |0⟩ and |1⟩ are 1/√2 and 𝑖/√2, respectively. If our qubit is in the state |𝜑⟩ and an observer measures it, then she gets 0 or 1, each with probability 1/2. Also, immediately after the measurement, the qubit is in state |0⟩ or |1⟩, depending on the measurement outcome.
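The measurement statistics of this example can be simulated classically. The sketch below is an illustration of ours, not from the text; it samples outcomes with probabilities |𝛼0|² and |𝛼1|².

```python
import numpy as np

phi = np.array([1.0, 1j]) / np.sqrt(2)     # the state of Example 3.1.3
probs = np.abs(phi) ** 2                   # outcome probabilities
assert np.allclose(probs, [0.5, 0.5])

rng = np.random.default_rng(7)
samples = rng.choice([0, 1], size=10_000, p=probs)
assert abs(samples.mean() - 0.5) < 0.05    # empirical frequency near 1/2
```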

3.1.3. Spherical coordinates. In many contexts, the geometric interpretation of


single-qubit states as points on the so-called Bloch sphere is used. This interpretation
requires spherical coordinates of vectors in the three-dimensional real space ℝ3 which
we explain in this section.
Let 𝑝 ⃗ = (𝑥, 𝑦, 𝑧) ∈ ℝ3 . The triplet of real numbers (𝑥, 𝑦, 𝑧) is called the Carte-
sian coordinate representation of 𝑝.⃗ The elements of this representation are called the
Cartesian coordinates of 𝑝.⃗ To represent 𝑝,⃗ we also use the Cartesian coordinate rep-
resentation of 𝑝 ⃗ with respect to another basis 𝐵 of ℝ3 by which we mean the Cartesian
coordinate representation of 𝑝𝐵⃗ = 𝐵 −1 𝑝.⃗
Example 3.1.4. Consider 𝑝⃗ = (1, 1, 1) and the alternative basis
(3.1.4) 𝐵 = ( 1 0 0 ; 0 −1 0 ; 0 0 −1 )
of ℝ3 , where the semicolons separate the rows of 𝐵. The Cartesian coordinate representation of 𝑝⃗ with respect to 𝐵 is 𝐵^{−1} 𝑝⃗ = (1, −1, −1).

As inner product on ℝ3 we use the inner product on ℂ3 restricted to ℝ3 . This is


described in the next definition.

Definition 3.1.5. Let 𝑝 ⃗ = (𝑝𝑥 , 𝑝𝑦 , 𝑝𝑧 ), 𝑞 ⃗ = (𝑞𝑥 , 𝑞𝑦 , 𝑞𝑧 ) ∈ ℝ3 .


(1) The inner product of 𝑝⃗ and 𝑞⃗ is ⟨𝑝⃗ | 𝑞⃗⟩ = 𝑝⃗ ⋅ 𝑞⃗ = 𝑝𝑥 𝑞𝑥 + 𝑝𝑦 𝑞𝑦 + 𝑝𝑧 𝑞𝑧 .

(2) The Euclidean norm or length of 𝑝 ⃗ is ‖𝑝‖⃗ = √⟨𝑝|⃗ 𝑝⟩.



(3) 𝑝 ⃗ is called a unit vector if its Euclidean length is 1.
(4) 𝑝 ⃗ and 𝑞 ⃗ are called orthogonal to each other if ⟨𝑝|⃗ 𝑞⟩⃗ = 0.
(5) A basis of ℝ3 is called orthogonal if its elements are pairwise orthogonal.
(6) A basis of ℝ3 is called orthonormal if it is orthogonal and all its elements are unit
vectors.

The next proposition provides properties of the inner product on ℝ3 which are
similar to those listed in Definition 2.2.1.
Proposition 3.1.6. For all 𝑝,⃗ 𝑞,⃗ 𝑟 ⃗ ∈ ℝ3 and all 𝛾 ∈ ℝ the following hold.
(1) Bilinearity: ⟨𝑝⃗ + 𝑞⃗ | 𝑟⃗⟩ = ⟨𝑝⃗ | 𝑟⃗⟩ + ⟨𝑞⃗ | 𝑟⃗⟩, ⟨𝑝⃗ | 𝑞⃗ + 𝑟⃗⟩ = ⟨𝑝⃗ | 𝑞⃗⟩ + ⟨𝑝⃗ | 𝑟⃗⟩, and ⟨𝛾𝑝⃗ | 𝑞⃗⟩ = ⟨𝑝⃗ | 𝛾𝑞⃗⟩ = 𝛾⟨𝑝⃗ | 𝑞⃗⟩.

(2) Positive definiteness: ⟨𝑝|⃗ 𝑝⟩⃗ ≥ 0 and ⟨𝑝|⃗ 𝑝⟩⃗ = 0 if and only if 𝑝 ⃗ = 0. This property is
also called positivity.
Exercise 3.1.7. Prove Proposition 3.1.6.
Example 3.1.8. The inner product of (3, 2, 1) and (−1, 1, 1) is ⟨(3, 2, 1)|(−1, 1, 1)⟩ = −3 + 2 + 1 = 0. Therefore, these vectors are orthogonal to each other. The length of the first vector is ‖(3, 2, 1)‖ = √(9 + 4 + 1) = √14. So, (1/√14)(3, 2, 1) is a unit vector.

In order to define spherical coordinates we need the following result.


Lemma 3.1.9. Let 𝑥, 𝑦 ∈ ℝ with 𝑥2 + 𝑦2 = 1. Then there is a uniquely determined real
number 𝛾 with 0 ≤ 𝛾 < 2𝜋 such that
(3.1.5) 𝑥 = cos 𝛾 and 𝑦 = sin 𝛾.
Also, if 0 ≤ 𝑥, 𝑦 ≤ 1, then 𝛾 = arccos 𝑥 = arcsin 𝑦 and 0 ≤ 𝛾 ≤ 𝜋/2.
Exercise 3.1.10. Prove Lemma 3.1.9.
Example 3.1.11. If 𝑥 = −√2/2 and 𝑦 = √2/2, then for 𝛾 = 3𝜋/4 we have (cos 𝛾, sin 𝛾)
= (𝑥, 𝑦).

Now we introduce spherical coordinates.


Proposition 3.1.12. Let 𝑝 ⃗ ∈ ℝ3 , 𝑝 ⃗ ≠ 0. Then there are uniquely determined real num-
bers 𝜃 and 𝜙 with
𝜙=0 if 𝜃 ∈ {0, 𝜋},
(3.1.6) 0≤𝜃≤𝜋 and {
0 < 𝜙 < 2𝜋 otherwise
such that
(3.1.7) 𝑝⃗ = ‖𝑝⃗‖ (cos 𝜙 sin 𝜃, sin 𝜙 sin 𝜃, cos 𝜃).

Figure 3.1.1. Spherical coordinates of (𝑥, 𝑦, 𝑧) = 𝑟(cos 𝜙 sin 𝜃, sin 𝜙 sin 𝜃, cos 𝜃).

The triplet (‖𝑝⃗‖, 𝜃, 𝜙) is called the spherical coordinate representation of 𝑝⃗. Its elements are called the spherical coordinates of 𝑝⃗ and are referred to as 𝑟(𝑝⃗) = ‖𝑝⃗‖, 𝜃(𝑝⃗) = 𝜃, and 𝜙(𝑝⃗) = 𝜙.

Proof. Assume without loss of generality that ‖𝑝‖⃗ = 1. Write 𝑝 ⃗ = (𝑥, 𝑦, 𝑧). Then
|𝑧| ≤ 1 and 𝜃 = arccos 𝑧 is the uniquely determined real number 𝜃 ∈ [0, 𝜋] that
satisfies cos 𝜃 = 𝑧.
If 𝑝 ⃗ = (0, 0, 1), then 𝜃 = 0. So, if we choose 𝜙 = 0, then (3.1.7) is satisfied. Also, if
𝑝 ⃗ = (0, 0, −1), then 𝜃 = 𝜋, and if we choose 𝜙 = 0, then (3.1.7) is satisfied.
If 𝑝 ⃗ ≠ (0, 0, ±1), then 0 < 𝜃 < 𝜋 and we have
(3.1.8) (𝑥/sin 𝜃)² + (𝑦/sin 𝜃)² = (1 − 𝑧²)/sin² 𝜃 = (1 − cos² 𝜃)/sin² 𝜃 = 1.
So it follows from Lemma 3.1.9 that there is a uniquely determined 𝜙 ∈]0, 2𝜋[ such
that (3.1.7) holds. □

Figure 3.1.1 illustrates the spherical coordinates.

Example 3.1.13. We determine the spherical coordinate representation (𝑟, 𝜃, 𝜙) of a


vector in ℝ3 with Cartesian coordinates (1/2, 1/2, √2/2). We have 𝑟2 = 1/4+1/4+1/2 =
1, 𝜃 = arccos(√2/2) = 𝜋/4. Also, we have cos 𝜙 = √2/2, sin 𝜙 = √2/2. So 𝜙 = 𝜋/4.
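The computation in Example 3.1.13 can be automated. The helper below is a sketch of ours (not from the text); it uses atan2 to recover 𝜙 and follows the convention 𝜙 = 0 at the poles.

```python
import math

def spherical(x, y, z):
    """Spherical coordinates (r, theta, phi) of a nonzero point (x, y, z)."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r)                   # theta in [0, pi]
    phi = math.atan2(y, x) % (2 * math.pi)     # phi in [0, 2*pi)
    if theta in (0.0, math.pi):                # pole convention of (3.1.6)
        phi = 0.0
    return r, theta, phi

r, theta, phi = spherical(0.5, 0.5, math.sqrt(2) / 2)   # Example 3.1.13
assert math.isclose(r, 1.0)
assert math.isclose(theta, math.pi / 4) and math.isclose(phi, math.pi / 4)
```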

3.1.4. The Bloch sphere. In this section, we present the Bloch sphere represen-
tation of single-qubit states. As we will see in Section 4.3, this representation allows
for a geometric interpretation of the unitary operators in ℍ1 .

Definition 3.1.14. By the Bloch sphere we mean the set {𝑝 ⃗ ∈ ℝ3 ∶ ‖𝑝‖⃗ = 1} which
is the surface of the sphere of radius 1 in ℝ3 . The elements of the Bloch sphere are
referred to as points on the Bloch sphere.

In (3.1.1), elements |𝜑⟩ of ℍ1 are represented as superpositions of the computa-


tional basis elements |0⟩ and |1⟩ of ℍ1 . Since the two coefficients are complex numbers

and ℂ is a two-dimensional ℝ-vector space, |𝜑⟩ can be described using four real num-
bers. But since single-qubit states have Euclidean length 1, these numbers are not inde-
pendent of each other. We will now show that single-qubit states can be represented by
three real numbers. For this, we need the following result which follows from Lemma
3.1.9.
Lemma 3.1.15. If 𝛼 ∈ ℂ with |𝛼| = 1, then there is a uniquely determined real number
𝛾 with 0 ≤ 𝛾 < 2𝜋 such that 𝛼 = 𝑒𝑖𝛾 = cos 𝛾 + 𝑖 sin 𝛾.
Exercise 3.1.16. Prove Lemma 3.1.15.

We give a few examples for the representation from Lemma 3.1.15.


Example 3.1.17. We have
1 = 𝑒^{𝑖⋅0} = cos 0 + 𝑖 sin 0,
𝑖 = 𝑒^{𝑖⋅𝜋/2} = cos(𝜋/2) + 𝑖 sin(𝜋/2),
(3.1.9) (1 + 𝑖)/√2 = 𝑒^{𝑖⋅𝜋/4} = cos(𝜋/4) + 𝑖 sin(𝜋/4),
(1 − 𝑖)/√2 = 𝑒^{𝑖⋅7𝜋/4} = cos(7𝜋/4) + 𝑖 sin(7𝜋/4).

The next proposition presents the representation of single-qubit states by three real
numbers.
Proposition 3.1.18. Let |𝜓⟩ ∈ ℍ1 be a single-qubit state. Then there are uniquely deter-
mined real numbers 𝛾, 𝜃, and 𝜙 such that
(3.1.10) |𝜓⟩ = 𝑒^{𝑖𝛾} (cos(𝜃/2) |0⟩ + 𝑒^{𝑖𝜙} sin(𝜃/2) |1⟩)
and
(3.1.11) 0 ≤ 𝜃 ≤ 𝜋, 0 ≤ 𝛾, 𝜙 < 2𝜋, 𝜃 ∈ {0, 𝜋} ⇒ 𝜙 = 0.
We write these numbers as 𝛾(𝜓), 𝜃(𝜓), and 𝜙(𝜓).

Proof. Let |𝜓⟩ = 𝛼0 |0⟩ + 𝛼1 |1⟩ with 𝛼0 , 𝛼1 ∈ ℂ. Since |𝜓⟩ is a single-qubit state, we
have |𝛼0 |2 + |𝛼1 |2 = 1. Choose 𝜃 ∈ [0, 𝜋] such that
(3.1.12) |𝛼0 | = cos(𝜃/2) , |𝛼1 | = sin(𝜃/2) .
By Lemma 3.1.9, this is possible and 𝜃 is uniquely determined. To complete the proof,
we distinguish three cases.
First, if 𝛼0 = 0, then 𝜃 = 𝜋, |𝛼1 | = 1, and by Lemma 3.1.15 we can write 𝛼1 = 𝑒^{𝑖𝛾} with a uniquely determined 𝛾 ∈ [0, 2𝜋[. If we set 𝜙 = 0, then (𝛾, 𝜃, 𝜙) is the only triplet of real numbers that satisfies (3.1.10) and (3.1.11).
Second, if 𝛼1 = 0, then 𝜃 = 0, |𝛼0 | = 1, and by Lemma 3.1.15 we can write 𝛼0 = 𝑒^{𝑖𝛾} with a uniquely determined 𝛾 ∈ [0, 2𝜋[. If we set 𝜙 = 0, then (𝛾, 𝜃, 𝜙) is the only triplet of real numbers that satisfies (3.1.10) and (3.1.11).

Third, assume that 𝛼0 , 𝛼1 ≠ 0. Then it follows from Lemma 3.1.15 that there are uniquely determined real numbers 𝛾, 𝛿 ∈ [0, 2𝜋[ such that
(3.1.13) 𝛼0 = 𝑒^{𝑖𝛾} |𝛼0 | = 𝑒^{𝑖𝛾} cos(𝜃/2) , 𝛼1 = 𝑒^{𝑖𝛿} |𝛼1 | = 𝑒^{𝑖𝛿} sin(𝜃/2) .
Set 𝜙 = 𝛿 − 𝛾 mod 2𝜋. Then we have
(3.1.14) |𝜓⟩ = 𝑒^{𝑖𝛾} (cos(𝜃/2) |0⟩ + 𝑒^{𝑖𝜙} sin(𝜃/2) |1⟩)
and (𝛾, 𝜃, 𝜙) is the uniquely determined triplet of real numbers that satisfies (3.1.10) and (3.1.11). □

Building upon Proposition 3.1.18, we establish a correspondence between each


single-qubit state and a location on the Bloch sphere and vice versa.

Definition 3.1.19. (1) To each single-qubit state |𝜓⟩ ∈ ℍ1 we assign the point 𝑝(𝜓) on the Bloch sphere with spherical coordinates (1, 𝜃(𝜓), 𝜙(𝜓)) and Cartesian coordinates (sin 𝜃(𝜓) cos 𝜙(𝜓), sin 𝜃(𝜓) sin 𝜙(𝜓), cos 𝜃(𝜓)).
(2) To each point 𝑝⃗ on the Bloch sphere with spherical coordinates (1, 𝜃, 𝜙) we assign the single-qubit state
(3.1.15) |𝜓(𝑝⃗)⟩ = cos(𝜃/2) |0⟩ + 𝑒^{𝑖𝜙} sin(𝜃/2) |1⟩ .
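The first assignment of Definition 3.1.19 is straightforward to implement. The sketch below (our helper, not from the text) computes the Bloch-sphere point of a single-qubit state from its amplitudes, using the angles 𝜃(𝜓) and 𝜙(𝜓) of Proposition 3.1.18.

```python
import cmath
import math

def bloch_point(alpha0, alpha1):
    """Cartesian coordinates of p(psi) for |psi> = alpha0|0> + alpha1|1>."""
    theta = 2.0 * math.acos(min(1.0, abs(alpha0)))
    if abs(alpha0) < 1e-12 or abs(alpha1) < 1e-12:
        phi = 0.0                              # pole convention of (3.1.11)
    else:
        phi = (cmath.phase(alpha1) - cmath.phase(alpha0)) % (2.0 * math.pi)
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

assert bloch_point(1.0, 0.0) == (0.0, 0.0, 1.0)    # |0> maps to the north pole
```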

The correspondence between single-qubit states and points on the Bloch sphere
is illustrated in Example 3.1.20, Exercise 3.1.21, and Figure 3.1.2. There and in the
remainder of this book, we write the unit vectors in the 𝑥-, 𝑦-, and 𝑧-directions in ℝ3 as

(3.1.16) 𝑥̂ = (1, 0, 0), 𝑦 ̂ = (0, 1, 0), 𝑧 ̂ = (0, 0, 1).

Figure 3.1.2. Points on the Bloch sphere corresponding to |𝑥+ ⟩, |𝑦+ ⟩, |𝑧+ ⟩ = |0⟩,
|𝑧− ⟩ = |1⟩, and a general single-qubit state |𝜓⟩.

We also recall that the orthonormal eigenbases of the Pauli operators 𝑋, 𝑌 , and 𝑍 on ℍ1 (see Section 2.4.7) are
(|𝑥+ ⟩ , |𝑥− ⟩) = ((|0⟩ + |1⟩)/√2 , (|0⟩ − |1⟩)/√2),
(3.1.17) (|𝑦+ ⟩ , |𝑦− ⟩) = ((|0⟩ + 𝑖 |1⟩)/√2 , (|0⟩ − 𝑖 |1⟩)/√2),
(|𝑧+ ⟩ , |𝑧− ⟩) = (|0⟩ , |1⟩).
Example 3.1.20. The representation (3.1.10) of |𝑧+ ⟩ = |0⟩ is
(3.1.18) |𝑧+ ⟩ = |0⟩ = 𝑒^{𝑖⋅0} (cos(0/2) |0⟩ + 𝑒^{𝑖⋅0} sin(0/2) |1⟩) .
Hence, the spherical coordinate representation of the point on the Bloch sphere corresponding to this state is (1, 0, 0) and its Cartesian coordinate representation is (0, 0, 1) = 𝑧.̂
The representation (3.1.10) of |𝑧− ⟩ = |1⟩ is
(3.1.19) |𝑧− ⟩ = |1⟩ = 𝑒^{𝑖⋅0} (cos(𝜋/2) |0⟩ + 𝑒^{𝑖⋅0} sin(𝜋/2) |1⟩) .
Hence, the spherical coordinate representation of the point on the Bloch sphere corresponding to this state is (1, 𝜋, 0) and its Cartesian coordinate representation is (0, 0, −1) = −𝑧.̂

In the previous example, we have presented the single-qubit states that correspond to the unit vectors on the Bloch sphere in the 𝑧-direction. The next exercise determines such states that correspond to the unit vectors in the 𝑥- and 𝑦-directions.
Exercise 3.1.21. Show that (𝑝(𝑥
⃗ + ), 𝑝(𝑥
⃗ − )) = (𝑥,̂ −𝑥)̂ and (𝑝(𝑦 ⃗ − )) = (𝑦,̂ −𝑦).
⃗ + ), 𝑝(𝑦 ̂
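The correspondence of Definition 3.1.19 can also be explored numerically. The following is a minimal sketch in NumPy (the helper name `bloch_point` is ours, not from the text): it strips the global phase, reads off 𝜃(𝜓) and 𝜙(𝜓), and returns the Cartesian point.

```python
import numpy as np

def bloch_point(psi):
    """Cartesian Bloch-sphere point of a single-qubit state (alpha_0, alpha_1).

    Strips the global phase so that the |0> amplitude is real and
    nonnegative, then reads off theta and phi as in Definition 3.1.19.
    """
    a0, a1 = psi
    phase = np.angle(a0) if abs(a0) > 1e-12 else np.angle(a1)
    a0, a1 = a0 * np.exp(-1j * phase), a1 * np.exp(-1j * phase)
    theta = 2 * np.arccos(np.clip(a0.real, -1.0, 1.0))
    phi = np.angle(a1) if abs(a1) > 1e-12 else 0.0
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

x_plus = np.array([1, 1]) / np.sqrt(2)
y_plus = np.array([1, 1j]) / np.sqrt(2)
print(np.round(bloch_point(x_plus), 8))  # x-hat = (1, 0, 0)
print(np.round(bloch_point(y_plus), 8))  # y-hat = (0, 1, 0)
```

This reproduces the claims of Exercise 3.1.21 for |𝑥+⟩ and |𝑦+⟩, and the points 𝑧̂ and −𝑧̂ for |0⟩ and |1⟩ from Example 3.1.20.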

We now introduce global phase factors. For example, the term 𝑒𝑖𝛾 in the represen-
tation (3.1.10) is such a factor. The general definition is as follows.
Definition 3.1.22. Let |𝜑⟩ , |𝜓⟩ ∈ ℍ and let 𝛾 ∈ ℝ be such that |𝜓⟩ = 𝑒𝑖𝛾 |𝜑⟩. Then we
say that |𝜑⟩ and |𝜓⟩ are equal up to the global phase factor 𝑒𝑖𝛾 or that these states differ
by the global phase factor 𝑒𝑖𝛾 .

We note the following.


Proposition 3.1.23. Let 𝑆 be the set of all quantum states in the Hilbert space ℍ. Then
the subset of 𝑆 × 𝑆 consisting of all pairs of quantum states that are equal up to a global
phase factor is an equivalence relation on 𝑆. For |𝜓⟩ ∈ ℍ, we denote the equivalence
class of |𝜓⟩ with respect to this relation by [𝜓].

Exercise 3.1.24. Prove Proposition 3.1.23.

As we show in Theorem 3.4.4, global phase factors have no impact on measure-


ment outcomes. So if |𝜓⟩ ∈ ℍ is a unit vector that describes the state of a closed phys-
ical system, then all elements of the equivalence class [𝜓], that is, all unit vectors in ℍ
that are equal to |𝜓⟩ up to a global phase factor, also completely describe the state of
this system.

Next, we will show that there is a one-to-one correspondence between the points
on the Bloch sphere and the equivalence classes [𝜓] of the quantum states |𝜓⟩ in ℍ1 .
This means that the state of a single-qubit system is completely described by the cor-
responding point on the Bloch sphere.

Theorem 3.1.25. Denote by 𝑆¹ the set of quantum states in ℍ1 and by 𝑅¹ the equivalence
relation on 𝑆¹ from Proposition 3.1.23. Then the map

(3.1.20)   𝑆¹/𝑅¹ → {𝑝⃗ ∈ ℝ³ ∶ ‖𝑝⃗‖ = 1}, [𝜓] ↦ 𝑝(𝜓)

is a bijection. Its inverse is

(3.1.21)   {𝑝⃗ ∈ ℝ³ ∶ ‖𝑝⃗‖ = 1} → 𝑆¹/𝑅¹, 𝑝⃗ ↦ [𝜓(𝑝⃗)].


Proof. It follows from Proposition 3.1.12 that the map that sends the spherical coordinates of a point on the Bloch sphere to its Cartesian coordinates is a bijection. Therefore, it suffices to prove that the map

(3.1.22)   𝑆¹/𝑅¹ → {(0, 0), (𝜋, 0)} ∪ (]0, 𝜋[ × [0, 2𝜋[), [𝜓] ↦ (𝜃(𝜓), 𝜙(𝜓))

is a bijection. Injectivity follows from Proposition 3.1.18. To see the surjectivity, we note
that for a point 𝑝⃗ on the Bloch sphere with spherical coordinates (𝜃, 𝜙), the equivalence
class [𝜓(𝑝⃗)] is the inverse image of 𝑝⃗. □

3.1.5. Quantum registers. To be able to perform complex computations, quan-


tum systems are required that consist of more than one qubit. Let 𝑛 ∈ ℕ. A quantum
system comprising 𝑛 qubits is called an 𝑛-qubit quantum register. The corresponding
state space ℍ𝑛 and its computational basis have already been introduced in Definition
2.1.2. Recall that the computational basis of ℍ𝑛 is the lexicographically ordered sequence 𝐵 = (|𝑏⃗⟩)_{𝑏⃗∈{0,1}^𝑛}, which can also be written as (|𝑏⟩_𝑛)_{𝑏∈ℤ_{2^𝑛}} (see Example 2.1.5).
Since the inner product on ℍ𝑛 is the Hermitian inner product with respect to 𝐵, this
basis is orthonormal.
The state of an 𝑛-qubit quantum register can be written as

(3.1.23)   |𝜑⟩ = ∑_{𝑏⃗∈{0,1}^𝑛} 𝛼_{𝑏⃗} |𝑏⃗⟩

with complex coefficients 𝛼_{𝑏⃗} satisfying

(3.1.24)   ∑_{𝑏⃗∈{0,1}^𝑛} |𝛼_{𝑏⃗}|² = 1.

Such an element of ℍ𝑛 is called an 𝑛-qubit state. So, while the state of a classical 𝑛-bit
register is an element 𝑏⃗ ∈ {0, 1}^𝑛, the state of an 𝑛-qubit quantum register is a linear
combination of the computational basis states |𝑏⃗⟩, 𝑏⃗ ∈ {0, 1}^𝑛, which is also called a
superposition of the basis elements.

Example 3.1.26. Consider the state space ℍ2 of a 2-qubit system. The computational
basis of ℍ2 is (|00⟩, |01⟩, |10⟩, |11⟩). It can also be written as (|0⟩_2, |1⟩_2, |2⟩_2, |3⟩_2). For
instance,

(3.1.25)   |𝜑⟩ = (1/√2) |00⟩ − (𝑖/√2) |11⟩ = (1/√2) |0⟩_2 − (𝑖/√2) |3⟩_2

is a 2-qubit state. It is a superposition of the states |00⟩ = |0⟩_2 and |11⟩ = |3⟩_2.
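The state of Example 3.1.26 can be written down concretely as a coefficient vector in ℂ⁴. The following is a small sketch in NumPy (the helper `ket` is ours) that also checks the normalization condition (3.1.24):

```python
import numpy as np

def ket(j, n):
    """j-th computational basis state of an n-qubit register as a vector in C^(2^n)."""
    e = np.zeros(2**n, dtype=complex)
    e[j] = 1
    return e

# |phi> = (1/sqrt 2)|00> - (i/sqrt 2)|11> = (1/sqrt 2)|0>_2 - (i/sqrt 2)|3>_2
phi = (ket(0, 2) - 1j * ket(3, 2)) / np.sqrt(2)
print(phi)
print(np.sum(np.abs(phi) ** 2))   # normalization (3.1.24), 1 up to rounding
```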

Once again, we provide a preview of the concept of measurements, which will
be discussed in more detail in Section 3.4.1. When measuring an 𝑛-qubit register that
is in the state ∑_{𝑏⃗∈{0,1}^𝑛} 𝛼_{𝑏⃗} |𝑏⃗⟩, where 𝛼_{𝑏⃗} ∈ ℂ, in the computational basis of ℍ𝑛, the
probability of obtaining any specific 𝑏⃗ ∈ {0, 1}^𝑛 is |𝛼_{𝑏⃗}|². After a measurement with
outcome 𝑏⃗ ∈ {0, 1}^𝑛, the quantum register's state becomes |𝑏⃗⟩. If immediately after a
measurement the register is measured again, then the result of the previous measurement is reproduced. As with single qubits, measuring an 𝑛-qubit register in two states
that are equal up to a global phase factor yields the same probability distribution of
outcomes.
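This measurement rule can be simulated classically for small registers. A sketch in NumPy (the helper `sample` is ours): draw outcomes 𝑏⃗ with probability |𝛼_{𝑏⃗}|².

```python
import numpy as np

def sample(state, shots, seed=0):
    """Sample computational-basis measurement outcomes of an n-qubit state vector."""
    rng = np.random.default_rng(seed)
    probs = np.abs(state) ** 2               # Born rule: Pr(b) = |alpha_b|^2
    n = int(np.log2(len(state)))
    draws = rng.choice(len(state), size=shots, p=probs)
    return [format(b, f"0{n}b") for b in draws]

phi = np.array([1, 0, 0, -1j]) / np.sqrt(2)  # (1/sqrt 2)(|00> - i|11>)
outcomes = sample(phi, 1000)
print({b: outcomes.count(b) for b in ("00", "11")})  # roughly 500 each
```

Note that the global phase −𝑖 on |11⟩ has no effect on the sampled distribution, as stated above.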

3.2. State spaces of composite systems


In quantum computing, we need to be able to combine physical systems into larger physical systems and to operate on them. In this section, we explain how this is done.

3.2.1. The Composite Systems Postulate. The Composite Systems Postulate,
which we present now, describes how to construct the state space of a composite physical
system.

Postulate 3.2.1 (Composite Systems Postulate). The state space of the composition of
finitely many physical systems is the tensor product of the state spaces of the compo-
nent systems. Moreover, if we have systems numbered 0 through 𝑚−1 and if system 𝑖 is
in the state |𝜓𝑖 ⟩ for 0 ≤ 𝑖 < 𝑚, then the composite system is in state |𝜓0 ⟩ ⊗ ⋯ ⊗ |𝜓𝑚−1 ⟩.

For quantum computing, compositions of the following type are frequently used:

(3.2.1) ℍ = ℍ𝑛0 ⊗ ⋯ ⊗ ℍ𝑛𝑚−1

where 𝑚 ∈ ℕ and 𝑛_𝑖 ∈ ℕ for all 𝑖 ∈ ℤ_𝑚. As explained in Section 2.5.3, the composite
space ℍ is identified with the state space ℍ_𝑛 where 𝑛 = ∑_{𝑖=0}^{𝑚−1} 𝑛_𝑖. The computational
basis of ℍ = ℍ_𝑛 is the tensor product of the computational bases of the Hilbert spaces
ℍ_{𝑛_𝑖}.
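Under this identification, forming the composite state |𝜓_0⟩ ⊗ ⋯ ⊗ |𝜓_{𝑚−1}⟩ amounts to a Kronecker product of coefficient vectors. A small sketch in NumPy:

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
x_plus = (ket0 + ket1) / np.sqrt(2)

# |x+> (x) |1> in H_1 (x) H_1, identified with H_2 = C^4:
psi = np.kron(x_plus, ket1)
print(psi.real)   # (1/sqrt 2)(|01> + |11>)
```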

3.2.2. Entangled states. An important reason for the superiority of quantum


computing over classical computing is that quantum states can be entangled. Entan-
glement has already been introduced in Definition 2.5.20. The present section puts this
concept into the context of quantum mechanics. We begin with an example.

Example 3.2.2. We consider the composition of two qubits with state space ℍ2 =
ℍ1 ⊗ ℍ1. For all pairs (|𝜑0⟩, |𝜑1⟩) of single-qubit states, ℍ2 contains the composite
state

(3.2.2)   |𝜑⟩ = |𝜑0⟩ ⊗ |𝜑1⟩ .

However, as we have seen in Example 2.5.22, the Bell state

(3.2.3)   |𝜑⟩ = (|00⟩ + |11⟩)/√2

cannot be written in this form, that is, as the tensor product of two single-qubit states.
It is therefore called entangled.

The next definition generalizes Example 3.2.2.

Definition 3.2.3. A state of the composition of two physical systems is called entan-
gled if it cannot be written as the tensor product of states of the component systems.
Otherwise, this state is called separable or nonentangled.

Note that the concept of entanglement depends on the decomposition of a physical


system into parts. Theorem 2.5.18 implies the following result.

Theorem 3.2.4. The state of the composition of two quantum systems is separable if and
only if its Schmidt rank is 1, and it is entangled if and only if its Schmidt rank is greater
than 1.

Exercise 3.2.5. Find an example of an entangled state in ℍ3 and prove that it is entan-
gled.
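Theorem 3.2.4 also suggests a numerical entanglement test: reshape the coefficient vector of a bipartite state into a matrix and count its nonzero singular values, which are the Schmidt coefficients. A sketch (the helper `schmidt_rank` is ours):

```python
import numpy as np

def schmidt_rank(state, dim_a, dim_b, tol=1e-12):
    """Schmidt rank of a bipartite state vector in C^(dim_a * dim_b)."""
    coeff_matrix = np.asarray(state).reshape(dim_a, dim_b)
    return int(np.sum(np.linalg.svd(coeff_matrix, compute_uv=False) > tol))

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)              # (|00> + |11>)/sqrt 2
sep = np.kron(np.array([1, 1]) / np.sqrt(2), [1, 0])    # |x+> (x) |0>
print(schmidt_rank(bell, 2, 2))   # 2 -> entangled
print(schmidt_rank(sep, 2, 2))    # 1 -> separable
```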

3.3. Time evolution


Quantum computers use a sequence of operations to transform the input state of a
quantum register into its output state. In this section, we describe these operations.

3.3.1. Evolution Postulate. How does the state of a quantum mechanical sys-
tem change over time? This question is answered by the Evolution Postulate.

Postulate 3.3.1 (Evolution Postulate). The evolution of a closed quantum system is


described by a unitary transformation. More precisely, if 𝑡, 𝑡′ ∈ ℝ, 𝑡 < 𝑡′ , then the
state |𝜑′ ⟩ of the system at time 𝑡′ is obtained from the state |𝜑⟩ of the system at time
𝑡 as |𝜑′ ⟩ = 𝑈 |𝜑⟩ where 𝑈 is a unitary operator on the state space of the system that
depends only on 𝑡 and 𝑡′ .

In general quantum mechanics, where infinite-dimensional Hilbert spaces are used,
the Schrödinger differential equation leads to a more general formulation of the Evolution
Postulate. However, for our purposes the above formulation of the Evolution Postulate
is appropriate. In the next section, we illustrate the Evolution Postulate in the context
of quantum computing.

3.3.2. Quantum gates. Postulate 3.3.1 provides an understanding of how quan-


tum computation works in principle. Such a computation uses a quantum register. Its
initial state is the input state of the computation. A unitary transformation is applied
to the input state, giving the output state. So, quantum computing relies on the im-
plementation of unitary operators. For this, quantum circuits are used. They will be
described in Section 3.3.4.
The building blocks of such circuits are quantum gates just as logic gates are the
building blocks of Boolean circuits. Quantum gates implement simple unitary opera-
tors and are provided by the quantum computing platform that is used.
We now present two examples of quantum gates. Many more quantum gates are
discussed in Chapter 4.
The first example is the Hadamard gate or Hadamard operator

(3.3.1)   𝐻 ∶ ℍ1 → ℍ1, |0⟩ ↦ |𝑥+⟩ , |1⟩ ↦ |𝑥−⟩ ,

which was already introduced in Exercise 2.3.4. There, it is shown that its representation matrix with respect to the computational basis of ℍ1 is

(3.3.2)   𝐻 = (1/√2) ⎛1  1⎞
                     ⎝1 −1⎠ .
This implies the following result.
Proposition 3.3.2. The Hadamard operator is a Hermitian unitary involution; that is,
we have 𝐻 ∗ = 𝐻 = 𝐻 −1 and 𝐻 2 = 𝐼.
Exercise 3.3.3. Prove Proposition 3.3.2.
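Proposition 3.3.2 is easy to confirm numerically; a quick NumPy check (not a substitute for the proof asked for in Exercise 3.3.3):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

print(np.allclose(H.conj().T, H))       # True: H* = H (Hermitian)
print(np.allclose(H @ H, np.eye(2)))    # True: H^2 = I, hence H = H^-1
print(H @ np.array([1, 0]))             # H|0> = |x+>
```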

In quantum circuits, the Hadamard gate is represented by the symbol shown in


Figure 3.3.1.
The second example of a quantum gate is the 𝖢𝖭𝖮𝖳 gate.
Definition 3.3.4. The controlled-𝖭𝖮𝖳 gate or 𝖢𝖭𝖮𝖳 gate for short is the linear operator

(3.3.3)   𝖢𝖭𝖮𝖳 ∶ ℍ2 → ℍ2, |𝑐⟩ |𝑡⟩ ↦ |𝑐⟩ 𝑋^𝑐 |𝑡⟩ .

Definition 3.3.4 shows that the 𝖢𝖭𝖮𝖳 gate applies the Pauli 𝑋 operator to a target
qubit |𝑡⟩ if the control qubit |𝑐⟩ is |1⟩. Otherwise, the target qubit remains unchanged.
This means that the application of the Pauli 𝑋 operator to the target qubit is controlled
by the control qubit. Since the Pauli 𝑋 operator is the quantum 𝖭𝖮𝖳 operator, this
explains the name “controlled-𝖭𝖮𝖳 gate”. So, 𝖢𝖭𝖮𝖳 operates on the computational
basis states of ℍ2 in the following way:
(3.3.4) |00⟩ ↦ |00⟩ , |01⟩ ↦ |01⟩ , |10⟩ ↦ |11⟩ , |11⟩ ↦ |10⟩ ,

Figure 3.3.1. Symbol for the Hadamard gate in quantum circuits.

Figure 3.3.2. Symbol for the 𝖢𝖭𝖮𝖳 gate in quantum circuits.

which shows that the representation matrix of 𝖢𝖭𝖮𝖳 with respect to the computational
basis of ℍ2 is

(3.3.5)   𝖢𝖭𝖮𝖳 = ⎛1 0 0 0⎞
                 ⎜0 1 0 0⎟
                 ⎜0 0 0 1⎟
                 ⎝0 0 1 0⎠ .

This implies the following result.

Proposition 3.3.5. The 𝖢𝖭𝖮𝖳 operator is a Hermitian unitary involution; that is, we
have 𝖢𝖭𝖮𝖳^∗ = 𝖢𝖭𝖮𝖳 = 𝖢𝖭𝖮𝖳^{−1} and 𝖢𝖭𝖮𝖳² = 𝐼2.
Exercise 3.3.6. Prove Proposition 3.3.5.

In quantum circuits, the 𝖢𝖭𝖮𝖳 gate is represented by the symbol shown in Figure
3.3.2.
The definition of the 𝖢𝖭𝖮𝖳 gate might give the impression that this gate never
changes the control qubit. However, as the next example shows, this impression is
deceptive.
Example 3.3.7. We have

𝖢𝖭𝖮𝖳 |𝑥+⟩ |𝑥−⟩
   = (𝖢𝖭𝖮𝖳 |0⟩|0⟩ − 𝖢𝖭𝖮𝖳 |0⟩|1⟩ + 𝖢𝖭𝖮𝖳 |1⟩|0⟩ − 𝖢𝖭𝖮𝖳 |1⟩|1⟩)/2
   = (|0⟩|0⟩ − |0⟩|1⟩ + |1⟩|1⟩ − |1⟩|0⟩)/2
   = |𝑥−⟩ |𝑥−⟩ .

So applied to |𝑥+⟩ |𝑥−⟩, the 𝖢𝖭𝖮𝖳 operator changes the control qubit but not the target
qubit.
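Example 3.3.7 can be replayed with matrices (a NumPy sketch): applying the matrix (3.3.5) to the coefficient vector of |𝑥+⟩ ⊗ |𝑥−⟩ indeed returns |𝑥−⟩ ⊗ |𝑥−⟩.

```python
import numpy as np

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

x_plus = np.array([1, 1]) / np.sqrt(2)
x_minus = np.array([1, -1]) / np.sqrt(2)

out = CNOT @ np.kron(x_plus, x_minus)
print(np.allclose(out, np.kron(x_minus, x_minus)))  # True: the control qubit changed
```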

3.3.3. Composition of operators. Let 𝑚 ∈ ℕ and consider 𝑚 quantum systems


with the corresponding state spaces ℍ(0), . . . , ℍ(𝑚 − 1). According to Postulate 3.2.1,
the state space of the composition of the 𝑚 quantum systems is the tensor product
(3.3.6) ℍ = ℍ(0) ⊗ ⋯ ⊗ ℍ(𝑚 − 1).
Let 𝑓0 , . . . , 𝑓𝑚−1 be linear operators on ℍ(0), . . . , ℍ(𝑚 − 1), respectively. In Section 2.5.5
we have introduced the linear operator
(3.3.7) 𝑓 = 𝑓0 ⊗ ⋯ ⊗ 𝑓𝑚−1
on ℍ and presented its properties. In particular, we have seen that 𝑓 is a projection, an
involution, normal, Hermitian, or unitary if and only if all of its components have the
corresponding property. This construction allows us to extend the action of an operator
Figure 3.3.3. Extension of 𝐻 acting on the first state space to an operator action on two qubits.

𝑓𝑗 on one of the component state spaces ℍ(𝑗), 𝑗 ∈ ℤ𝑚 , to the composite state space by
using the operator
(3.3.8) 𝐼0 ⊗ ⋯ ⊗ 𝐼𝑗−1 ⊗ 𝑓𝑗 ⊗ 𝐼𝑗+1 ⊗ ⋯ ⊗ 𝐼𝑚−1
where 𝐼𝑖 is the identity operator on ℍ(𝑖) for 0 ≤ 𝑖 < 𝑚.
Example 3.3.8. Consider the composition of two single-qubit systems, each with state
space ℍ1. The state space of the composite system is ℍ2. The extension of the Hadamard
operator acting on the first state space to an operator on the composite space is 𝐻 ⊗ 𝐼
where 𝐼 is the identity operator on ℍ1. In a quantum circuit, this extended operator
is depicted on the right side of Figure 3.3.3. So, the identity operator is omitted. The
two lines represent the two qubits. The box with 𝐻 inside represents the Hadamard
operator acting on the first qubit. The extended operator has the following effect on
the computational basis elements of ℍ2:

|00⟩ ↦ (1/√2)(|00⟩ + |10⟩),   |01⟩ ↦ (1/√2)(|01⟩ + |11⟩),
|10⟩ ↦ (1/√2)(|00⟩ − |10⟩),   |11⟩ ↦ (1/√2)(|01⟩ − |11⟩).

The matrix representation of the composite operator is

𝐻 ⊗ 𝐼 = (1/√2) ⎛1  1⎞ ⊗ ⎛1 0⎞ = (1/√2) ⎛1 0  1  0⎞
               ⎝1 −1⎠   ⎝0 1⎠          ⎜0 1  0  1⎟
                                       ⎜1 0 −1  0⎟
                                       ⎝0 1  0 −1⎠ .
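The extension in Example 3.3.8 is again a Kronecker product of matrices; a short NumPy sketch verifying the 4 × 4 matrix:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I = np.eye(2)

H_first = np.kron(H, I)    # H on the first qubit, identity on the second
expected = np.array([[1, 0, 1, 0],
                     [0, 1, 0, 1],
                     [1, 0, -1, 0],
                     [0, 1, 0, -1]]) / np.sqrt(2)
print(np.allclose(H_first, expected))   # True
```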

3.3.4. Quantum circuits. Quantum circuits implement more complex unitary


operators on a state space ℍ𝑛 , 𝑛 ∈ ℕ, by combining several quantum gates. We illus-
trate this concept by an example shown in Figure 3.3.4.
This quantum circuit has three wires. This indicates that the unitary operator 𝑈
implemented by the quantum circuit acts on the state space ℍ3 of 3-qubit registers.


Figure 3.3.4. Quantum circuit that combines the Hadamard and 𝖢𝖭𝖮𝖳 gates.


Figure 3.3.5. The quantum circuit from Figure 3.3.4 implements 𝑈 = 𝑈2 ∘ 𝑈1 ∘ 𝑈0.

The circuit transforms the 3-qubit input state |𝜑⟩ into a 3-qubit output state |𝜓⟩. How
is 𝑈 constructed? As shown in Figure 3.3.5, 𝑈 is the concatenation of three unitary
operators; i.e.,

(3.3.9) 𝑈 = 𝑈2 ∘ 𝑈1 ∘ 𝑈0 .

Each of these unitary operators is the tensor product of the unitary operators imple-
mented by the gates that are one above the other. If there is no operator but only a
wire, the identity operator is inserted. The composed operators are 𝑈0 = 𝐻 ⊗ 𝐻 ⊗ 𝐼1 ,
𝑈1 = 𝐼1 ⊗ 𝖢𝖭𝖮𝖳, and 𝑈2 = 𝐻 ⊗ 𝐼1 ⊗ 𝐼1 .
We determine the state that is obtained when the input state of the quantum circuit
in Figures 3.3.4 and 3.3.5 is |000⟩:

(3.3.10)
|000⟩ ↦_{𝑈0} (1/√4)(|0⟩ + |1⟩)(|0⟩ + |1⟩) |0⟩
      = (1/√4)(|000⟩ + |010⟩ + |100⟩ + |110⟩)
      ↦_{𝑈1} (1/√4)(|000⟩ + |011⟩ + |100⟩ + |111⟩)
      = (1/√4)(|0⟩ + |1⟩)(|00⟩ + |11⟩)
      ↦_{𝑈2} (1/√2) |0⟩ (|00⟩ + |11⟩)
      = (1/√2)(|000⟩ + |011⟩).

This computation is also illustrated in Figure 3.3.6.
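The computation (3.3.10) can also be checked by multiplying the three matrices, a NumPy sketch of the circuit in Figure 3.3.5:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I = np.eye(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])

U0 = np.kron(np.kron(H, H), I)    # H (x) H (x) I_1
U1 = np.kron(I, CNOT)             # I_1 (x) CNOT
U2 = np.kron(np.kron(H, I), I)    # H (x) I_1 (x) I_1
U = U2 @ U1 @ U0                  # the circuit reads left to right: U = U2 ∘ U1 ∘ U0

state = np.zeros(8); state[0] = 1          # |000>
out = U @ state
print(np.round(out, 3))   # (1/sqrt 2)(|000> + |011>): entries 0 and 3 are nonzero
```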

Exercise 3.3.9. Determine the output state 𝑈 |𝑏⃗⟩ of the circuit in Figure 3.3.4 for all
𝑏⃗ ∈ {0, 1}³.

The concept of a quantum circuit can easily be generalized to circuits that operate
on 𝑛-qubit registers for any 𝑛 ∈ ℕ. This will be discussed in more detail in Section 4.7.


Figure 3.3.6. The quantum circuit from Figure 3.3.4 operating on |000⟩.

3.4. Measurements
In the previous sections, we have seen the first steps of quantum computing. A state
of an 𝑛-qubit quantum register is prepared. This is the input state for a quantum circuit
operating on 𝑛-qubit states. This quantum circuit implements a unitary operator on
the state space of the quantum register. It is composed of several quantum gates and
transforms the input state into an output state. This output state may either be used in a
further quantum computation or information may be extracted from it by a measurement.
Such measurements are discussed in more detail in this section.

3.4.1. Measurement Postulate. The concept of measurements is controversial
in quantum mechanics. Here, we present this concept according to the Copenhagen
interpretation of quantum theory. This term was coined by Werner Heisenberg around
1955 for views of this theory developed in the second quarter of the 20th century. The
idea is that a quantum mechanical system is initially closed. It then evolves according
to the State Space and Evolution Postulates. A measurement ends the isolation of
the quantum system. It is an interaction with a laboratory device. When this device
makes a measurement, the state of the quantum system collapses irreversibly to an
eigenstate of the observable implemented by the device. This observable is a Hermitian
operator on the state space. The measurement turns a potentiality, the state of the
quantum system in a superposition of the eigenstates, into an actuality. Additionally,
the device records the corresponding eigenvalue.

This view of quantum mechanical measurements is the content of the following
Measurement Postulate. It describes projective measurements. In the sequel, we need
only such measurements. The more general notion is that of a positive operator-valued
measure (POVM).
Postulate 3.4.1 (Measurement Postulate). A projective measurement is described by
an observable 𝑂, which is a Hermitian operator on the state space of the system being
observed. Let 𝑂 = ∑_{𝜆∈Λ} 𝜆𝑃_𝜆 be the spectral decomposition of 𝑂. The possible outcomes of the measurement are the eigenvalues 𝜆 of the observable. When measuring
𝑂 while the quantum system is in the state |𝜑⟩, the probability of getting the result 𝜆
is Pr_{𝑂,𝜑}(𝜆) = ⟨𝜑|𝑃_𝜆|𝜑⟩ = ‖𝑃_𝜆 |𝜑⟩‖². If this outcome occurs, the state of the quantum
system immediately after the measurement is 𝑃_𝜆 |𝜑⟩ / ‖𝑃_𝜆 |𝜑⟩‖.

The measurements in the postulate are called “projective” because they project the
state of the quantum system onto one of the eigenspaces of the measured observable
and normalize the length of this projection. We also note that measurement devices

must use an appropriate encoding of the measurement outcomes, typically by finitely


many bits. Examples of measurements are given in the following section.
Next, we discuss the expectation values of measurements. For every observable 𝑂
on the state space ℍ and every |𝜑⟩ ∈ ℍ, Postulate 3.4.1 defines the discrete probability
space (Λ, Pr_{𝑂,𝜑}). Since 𝑂 is Hermitian, Proposition 2.4.61 implies that Λ ⊂ ℝ. So the
identity map 𝐼_Λ on Λ is a random variable associated with this probability space. The
next lemma determines its expectation value.

Lemma 3.4.2. We have 𝐸[𝐼_Λ] = ⟨𝜑|𝑂|𝜑⟩.

Proof. We note that

𝐸[𝐼_Λ] = ∑_{𝜆∈Λ} 𝜆 Pr_{𝑂,𝜑}(𝜆)       definition of the expectation value,
       = ∑_{𝜆∈Λ} 𝜆 ⟨𝜑|𝑃_𝜆|𝜑⟩         definition of Pr_{𝑂,𝜑},
       = ⟨𝜑| ∑_{𝜆∈Λ} 𝜆𝑃_𝜆 |𝜑⟩        linearity of the inner product,
       = ⟨𝜑|𝑂|𝜑⟩                     spectral theorem.

This completes the proof. □

Lemma 3.4.2 motivates the following definition.

Definition 3.4.3. Let 𝑂 be an observable of a quantum system with state space ℍ.


Suppose that we measure this observable when the system is in the state |𝜑⟩ ∈ ℍ.
Then the expectation value of this measurement is defined as ⟨𝜑|𝑂|𝜑⟩.

In Section 3.1.4 we have introduced global phase factors of quantum states and
have shown that equality up to a global phase factor is an equivalence relation. We
have denoted the corresponding equivalence class of a quantum state |𝜓⟩ by [𝜓]. We
now show that global phase factors have no impact on quantum measurements.

Theorem 3.4.4. Assume that we measure an observable 𝑂 of a quantum system. Let 𝜆


be an eigenvalue of 𝑂 and let |𝜑⟩ and |𝜓⟩ be two states of the system that differ only by a
global phase factor. Then the probability of measuring 𝜆 is the same regardless of whether
the system is in the state |𝜑⟩ or |𝜓⟩. Also, if the system is in one of the two states and the
measurement outcome 𝜆 occurs, then the states immediately after the measurement are
equal up to the same global phase factor.

Proof. Let 𝛾 ∈ ℝ such that |𝜓⟩ = 𝑒^{𝑖𝛾} |𝜑⟩. Then we have

(3.4.1)   ‖𝑃_𝜆 |𝜓⟩‖ = ‖𝑒^{𝑖𝛾} 𝑃_𝜆 |𝜑⟩‖ = |𝑒^{𝑖𝛾}| ‖𝑃_𝜆 |𝜑⟩‖ = ‖𝑃_𝜆 |𝜑⟩‖.

Also, if the state of the system is |𝜓⟩, then the state immediately after the measurement
is

(3.4.2)   𝑃_𝜆 |𝜓⟩ / ‖𝑃_𝜆 |𝜑⟩‖ = 𝑒^{𝑖𝛾} 𝑃_𝜆 |𝜑⟩ / ‖𝑃_𝜆 |𝜑⟩‖. □

Exercise 3.4.5. Let ℍ be the state space of a quantum system. Show that 𝑂 = 𝐼ℍ is an
observable of the quantum system. Also, show that measuring 𝑂 when the quantum
system is in the state |𝜑⟩ ∈ ℍ gives 1 with probability 1 and that immediately after the
measurement, the quantum system is still in state |𝜑⟩.

3.4.2. Measuring quantum systems in an orthonormal basis. In quantum


computing, it is common to measure 𝑛-qubit registers in the computational basis of
their state space ℍ𝑛 for some 𝑛 ∈ ℕ. We have already introduced these measurements
in Sections 3.1.2 and 3.1.5. Now, we explain how to model them using the measurement
postulate and start with an example.

Example 3.4.6. Let

(3.4.3)   |𝜑⟩ = 𝛼_0 |0⟩ + 𝛼_1 |1⟩

be a single-qubit state, where 𝛼_0, 𝛼_1 ∈ ℂ with |𝛼_0|² + |𝛼_1|² = 1. We would like the measurement to reflect this superposition; that is, we want the measurement outcome to be
0 with probability |𝛼_0|² and 1 with probability |𝛼_1|². Recall that, by Proposition 2.4.44,
the projections onto the subspaces spanned by |0⟩ and |1⟩ are 𝑃_0 = |0⟩⟨0| and 𝑃_1 = |1⟩⟨1|,
respectively. Therefore, we use the observable 𝑂 with the spectral decomposition

(3.4.4)   𝑂 = 0 ⋅ 𝑃_0 + 1 ⋅ 𝑃_1 = |1⟩⟨1| .

It is Hermitian, has the two eigenvalues 𝜆_0 = 0 and 𝜆_1 = 1, and the corresponding
eigenspaces are ℂ|0⟩ and ℂ|1⟩. When the qubit is in the state |𝜑⟩ from (3.4.3) and we
measure 𝑂, the measurement produces 𝑏 ∈ {0, 1} with probability

(3.4.5)   ‖𝑃_𝑏 |𝜑⟩‖² = |𝛼_𝑏|².

Also, the state of the qubit immediately after the measurement is

(3.4.6)   𝑃_𝑏 |𝜑⟩ / ‖𝑃_𝑏 |𝜑⟩‖ = (𝛼_𝑏/|𝛼_𝑏|) |𝑏⟩ .

This state is equal to |𝑏⟩ up to the global phase factor 𝛼_𝑏/|𝛼_𝑏|. By (3.4.4), the expectation
value of this measurement is

(3.4.7)   ⟨𝜑|𝑂|𝜑⟩ = ⟨𝛼_0 |0⟩ + 𝛼_1 |1⟩ | 𝛼_1 |1⟩⟩ = |𝛼_1|².

For example, if the qubit is in the state |0⟩, then the expectation value of this measurement is 0. Also, if the qubit is in the state |1⟩, then the expectation value is 1. But if the
qubit is in the superposition |𝜓⟩ = (1/√2) |0⟩ + (1/√2) |1⟩, then the expectation value of this
measurement is 1/2.
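The expectation values at the end of Example 3.4.6 can be computed directly as ⟨𝜑|𝑂|𝜑⟩; a NumPy sketch:

```python
import numpy as np

O = np.array([[0, 0], [0, 1]], dtype=complex)   # O = 0*|0><0| + 1*|1><1|

for phi in (np.array([1, 0]),                   # |0>  -> expectation 0
            np.array([0, 1]),                   # |1>  -> expectation 1
            np.array([1, 1]) / np.sqrt(2)):     # |x+> -> expectation 1/2
    print(np.vdot(phi, O @ phi).real)           # <phi|O|phi> (vdot conjugates phi)
```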

We generalize Example 3.4.6. Let ℍ be the state space of a quantum system and
let 𝐵 = (|𝑏_0⟩, …, |𝑏_{𝑘−1}⟩) be an orthonormal basis of ℍ. For example, if ℍ = ℍ_𝑛, then
𝑘 = 2^𝑛 and we may use the computational basis (|0⟩_𝑛, …, |2^𝑛 − 1⟩_𝑛) of ℍ_𝑛. For 𝑗 ∈ ℤ_𝑘
the projection onto ℂ|𝑏_𝑗⟩ is

(3.4.8)   𝑃_𝑗 = |𝑏_𝑗⟩⟨𝑏_𝑗| .

We use the observable 𝑂 with spectral decomposition

(3.4.9)   𝑂 = ∑_{𝑗=0}^{𝑘−1} 𝑗𝑃_𝑗 = ∑_{𝑗=0}^{𝑘−1} 𝑗 |𝑏_𝑗⟩⟨𝑏_𝑗| .

Its matrix representation with respect to the basis 𝐵 is

(3.4.10)   ⎛0 0 0 ⋯ 0  ⎞
           ⎜0 1 0 ⋯ 0  ⎟
           ⎜0 0 2 ⋯ 0  ⎟
           ⎜⋮     ⋱ ⋮  ⎟
           ⎝0 0 0 ⋯ 𝑘−1⎠ .

So 𝑂 is Hermitian and has the 𝑘 eigenvalues 𝜆_𝑗 = 𝑗 with the corresponding eigenspaces
ℂ|𝑏_𝑗⟩, 𝑗 ∈ ℤ_𝑘.

Let |𝜑⟩ be a state of our quantum system; i.e.,

(3.4.11)   |𝜑⟩ = ∑_{𝑗=0}^{𝑘−1} 𝛼_𝑗 |𝑏_𝑗⟩

with 𝛼_𝑗 ∈ ℂ for 0 ≤ 𝑗 < 𝑘 and ∑_{𝑗=0}^{𝑘−1} |𝛼_𝑗|² = 1. According to Postulate 3.4.1, the
possible outcomes when measuring the observable 𝑂 are the eigenvalues 𝑗 ∈ ℤ_𝑘 of 𝑂.
Each integer 𝑗 ∈ ℤ_𝑘 occurs with probability

(3.4.12)   ‖𝑃_𝑗 |𝜑⟩‖² = |𝛼_𝑗|²

and the state of the quantum system immediately after this outcome is

(3.4.13)   𝑃_𝑗 |𝜑⟩ / ‖𝑃_𝑗 |𝜑⟩‖ = (𝛼_𝑗/|𝛼_𝑗|) |𝑏_𝑗⟩ .

Up to a global phase factor, this state is equal to the basis state |𝑏_𝑗⟩. The expectation
value of the measurement is

(3.4.14)   ⟨𝜑|𝑂|𝜑⟩ = ∑_{𝑗=0}^{𝑘−1} 𝑗 |𝛼_𝑗|².

This discussion motivates the following definition.


Definition 3.4.7. Let ℍ be the state space of a quantum system and let 𝐵 =
(|𝑏_0⟩, …, |𝑏_{𝑘−1}⟩) be an orthonormal basis of ℍ. By measuring the quantum system
in the basis 𝐵 we mean measuring the observable

(3.4.15)   𝑂 = ∑_{𝑗=0}^{𝑘−1} 𝑗 |𝑏_𝑗⟩⟨𝑏_𝑗|

of ℍ.
Example 3.4.8. Consider the circuit shown in Figure 3.3.6. As shown in the figure,
the input state is |000⟩ and the output state is (|000⟩ + |011⟩)/√2. If we use integers in
ℤ_{2³} to denote the computational basis states, then the output state is (|0⟩_3 + |3⟩_3)/√2.
Measuring the 3-qubit register in the computational basis of ℍ_3 means measuring the
observable 𝑂 = ∑_{𝑗=0}^{7} 𝑗 |𝑗⟩⟨𝑗|. The measurement outcome is one of the numbers 0 or 3,
each with probability 1/2. The expectation value of this measurement is (0 + 3)/2 = 3/2.
Exercise 3.4.9. Determine the measurement statistics and the expectation values for
measuring the 3-qubit register of the circuit in Figure 3.3.4 in the computational basis
for all input states |𝑏⃗⟩, 𝑏⃗ ∈ {0, 1}³.

3.4.3. Partial measurements. In certain quantum computing scenarios, mea-


surements are performed selectively on specific parts of a quantum system. We will
now explain the modeling of such measurements in a particular situation. However,
it allows for a straightforward generalization to other cases.
Assume that 𝐴 and 𝐵 are quantum systems with state spaces ℍ𝐴 and ℍ𝐵 , respec-
tively. Consider the composite quantum system 𝐴𝐵 with state space ℍ𝐴𝐵 = ℍ𝐴 ⊗ ℍ𝐵 .
Let 𝑂_𝐴 be an observable of system 𝐴 with spectral decomposition

(3.4.16)   𝑂_𝐴 = ∑_{𝜆∈Λ} 𝜆𝑃_𝜆 .

Denote by 𝐼_𝐵 the identity operator on ℍ_𝐵. Then the composite operator

(3.4.17)   𝑂_𝐴𝐵 = 𝑂_𝐴 ⊗ 𝐼_𝐵

is an observable of ℍ_𝐴𝐵. Its spectral decomposition is

(3.4.18)   𝑂_𝐴𝐵 = ∑_{𝜆∈Λ} 𝜆𝑃_𝜆 ⊗ 𝐼_𝐵 .

Measuring this observable when the system 𝐴𝐵 is in the state |𝜓⟩, the outcome is 𝜆 ∈ Λ
with probability ‖𝑃_𝜆 ⊗ 𝐼_𝐵 |𝜓⟩‖², and the state after this outcome is (𝑃_𝜆 ⊗ 𝐼_𝐵 |𝜓⟩) / ‖𝑃_𝜆 ⊗ 𝐼_𝐵 |𝜓⟩‖.

Example 3.4.10. Let 𝐴 and 𝐵 be single-qubit systems with state spaces ℍ_𝐴 = ℍ_𝐵 = ℍ_1.
Our goal is to measure the first qubit of the Bell state

(3.4.19)   |𝜓⟩ = (|00⟩ + |11⟩)/√2 .

The corresponding observable is

(3.4.20)   𝑂 = |1⟩⟨1| ⊗ 𝐼_1 .

If 𝑏 ∈ {0, 1}, then we have

(3.4.21)   𝑃_𝑏 ⊗ 𝐼_1 |𝜓⟩ = (𝑃_𝑏 ⊗ 𝐼_1 |0⟩|0⟩ + 𝑃_𝑏 ⊗ 𝐼_1 |1⟩|1⟩)/√2 = |𝑏⟩|𝑏⟩/√2 .

So the probability that measuring 𝑂 gives 𝑏 is 1/2, and the state after this measurement
outcome is |𝑏⟩ |𝑏⟩.
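Example 3.4.10 in matrix form (a NumPy sketch): applying 𝑃_𝑏 ⊗ 𝐼_1 to the Bell state gives outcome probability 1/2 and post-measurement state |𝑏⟩|𝑏⟩ for both values of 𝑏.

```python
import numpy as np

I = np.eye(2)
P = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]   # P_0 = |0><0|, P_1 = |1><1|
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)       # (|00> + |11>)/sqrt 2

for b in (0, 1):
    proj = np.kron(P[b], I) @ bell               # (P_b (x) I_1)|psi>
    prob = np.linalg.norm(proj) ** 2
    post = proj / np.linalg.norm(proj)
    print(b, round(prob, 3), np.round(post, 3))  # probability 0.5, state |b>|b>
```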

Example 3.4.10 shows that measuring the first qubit of the entangled Bell state
and leaving the second qubit alone changes both qubits. This observation is central
to the famous EPR thought experiment. It is named after its inventors Albert Ein-
stein, Boris Podolsky, and Nathan Rosen. They published it in 1935 to demonstrate
that quantum mechanics is incomplete. We present a simplified description of their

idea. Prepare two qubits in the entangled state (3.4.19). Give one to Alice and the other
to Bob. Then Alice travels a long way, taking her qubit with her. Upon arriving, she
measures her qubit. As we have seen in Example 3.4.10, this measurement will with
probability 1/2 put both qubits into the state |0⟩ or |1⟩. After this measurement, Alice
knows with certainty the state of Bob’s qubit. Einstein, Podolsky, and Rosen claimed
that this instantaneous change of Bob’s qubit contradicts relativity theory which says
that the maximum possible speed is the speed of light. They concluded that the quan-
tum mechanics explanation must be incomplete. However, much later experiments
confirmed the prediction of quantum mechanics and thus showed that the arguments
of Einstein, Podolsky, and Rosen were not correct. We note that from the perspective
of information theory, Alice and Bob do not exchange information. They only obtain
a uniformly distributed random bit, which they can also produce by tossing a coin.
As we will see now, the situation is much easier if the composite system 𝐴𝐵 is in a
separable state
(3.4.22)   |𝜓⟩ = |𝜑⟩ |𝜉⟩

with |𝜑⟩, |𝜉⟩ ∈ ℍ_1. Then for 𝜆 ∈ Λ we have

(3.4.23)   𝑃_𝜆 ⊗ 𝐼_𝐵 |𝜓⟩ = 𝑃_𝜆 ⊗ 𝐼_𝐵 |𝜑⟩ |𝜉⟩ = 𝑃_𝜆 |𝜑⟩ ⊗ |𝜉⟩ .

Hence, the probability of measuring 𝜆 is

(3.4.24)   ‖𝑃_𝜆 |𝜑⟩ ⊗ |𝜉⟩‖² = ‖𝑃_𝜆 |𝜑⟩‖² ‖𝜉‖² = ‖𝑃_𝜆 |𝜑⟩‖² .

This is the probability of obtaining 𝜆 when only system 𝐴 is measured. Also, the state
after measuring 𝜆 is

(3.4.25)   (𝑃_𝜆 |𝜑⟩ / ‖𝑃_𝜆 |𝜑⟩‖) ⊗ |𝜉⟩ .

This state is the tensor product of the state after measuring system 𝐴 with the state |𝜉⟩
of system 𝐵 before the measurement.
Example 3.4.11. Consider the separable quantum state

(3.4.26)   |𝜓⟩ = |𝑥−⟩ |𝑥−⟩ .

Measuring only the first qubit in the computational basis of ℍ_1 gives 0 or 1, each with
probability 1/2. If the measurement outcome 𝑏 ∈ {0, 1} occurs, then the state immediately after the measurement is |𝑏⟩ |𝑥−⟩.
Exercise 3.4.12. Write down the observable that only measures the first and last qubit
of a 3-qubit quantum register. Determine the measurement statistics for the quantum
state |𝜑⟩ = (|000⟩ + 𝑖|111⟩)/√2.

3.5. Density operators


In this section, we introduce density operators on state spaces of quantum systems.
We show how they can be used instead of state vectors to describe the states of quan-
tum systems and generalizations of such states, the so-called mixed states. Let 𝑄 be a
quantum system with state space ℍ.

3.5.1. Definition.

Definition 3.5.1. A density operator on ℍ is a linear operator 𝜌 on ℍ that satisfies the
following conditions.
(1) Trace condition: tr 𝜌 = 1.
(2) Positivity condition: 𝜌 is positive semidefinite.
Example 3.5.2. If |𝜑⟩ is a state of 𝑄, then
(3.5.1) 𝜌 = |𝜑⟩ ⟨𝜑|
is a density operator on ℍ. In fact, by Proposition 2.4.27 and since quantum states have
norm 1, we have
(3.5.2) tr 𝜌 = tr |𝜑⟩ ⟨𝜑| = ⟨𝜑|𝜑⟩ = 1.
This proves the trace condition. Also, it follows from Proposition 2.4.27 that for all
|𝜓⟩ ∈ ℍ we have
(3.5.3) ⟨𝜓|𝜌|𝜓⟩ = ⟨𝜓|𝜑⟩⟨𝜑|𝜓⟩ = |⟨𝜑|𝜓⟩|2 ≥ 0.
This proves the positivity condition.

We note that density operators on ℍ are Hermitian since by Proposition 2.4.64 posi-
tive semidefinite operators are Hermitian. Next, we introduce mixed states of quantum
systems. They allow us to describe the probabilistic behavior of quantum systems in
situations where we don’t have complete information about the system.
Definition 3.5.3. (1) A mixed state of the quantum system 𝑄 is a sequence

(3.5.4)   ((𝑝_0, |𝜓_0⟩), …, (𝑝_{𝑙−1}, |𝜓_{𝑙−1}⟩))

where 𝑙 ∈ ℕ, the |𝜓_𝑖⟩ are quantum states in ℍ for 0 ≤ 𝑖 < 𝑙, and 𝑝_𝑖 ∈ ℝ_{≥0} for
0 ≤ 𝑖 < 𝑙 such that ∑_{𝑖=0}^{𝑙−1} 𝑝_𝑖 = 1.
(2) A pure state of the quantum system 𝑄 is a quantum state in its state space ℍ.

We note that there is a one-to-one correspondence between the pure states |𝜓⟩ and
the mixed states ((1, |𝜓⟩)) of 𝑄. Using this correspondence, we identify pure states with
these mixed states. A mixed state of 𝑄 with more than one component describes the
situation where the exact state of the quantum system is inaccessible. For instance,
such mixed states arise as the states of parts of composite quantum systems that are in
an entangled state. This will be discussed in Section 3.7. The next proposition associates
a density operator with each mixed state.
Proposition 3.5.4. Let ((𝑝_0, |𝜓_0⟩), …, (𝑝_{𝑙−1}, |𝜓_{𝑙−1}⟩)) be a mixed state of the quantum
system 𝑄. Then

(3.5.5)   𝜌 = ∑_{𝑖=0}^{𝑙−1} 𝑝_𝑖 |𝜓_𝑖⟩⟨𝜓_𝑖|

is a density operator on the state space ℍ of 𝑄.



Proof. We know from Proposition B.5.25 that the trace is ℂ-linear. Therefore, we have

tr(𝜌) = ∑_{𝑖=0}^{𝑙−1} 𝑝_𝑖 tr |𝜓_𝑖⟩⟨𝜓_𝑖|      linearity of the trace,
      = ∑_{𝑖=0}^{𝑙−1} 𝑝_𝑖 ⟨𝜓_𝑖|𝜓_𝑖⟩          Proposition 2.4.27(6),
      = ∑_{𝑖=0}^{𝑙−1} 𝑝_𝑖 = 1                Definition 3.5.3.

This proves the trace condition. To show the positivity condition, we note that for all
|𝜉⟩ ∈ ℍ we have

⟨𝜉|𝜌|𝜉⟩ = ⟨𝜉| ∑_{𝑖=0}^{𝑙−1} 𝑝_𝑖 |𝜓_𝑖⟩⟨𝜓_𝑖| |𝜉⟩      definition of 𝜌,
        = ∑_{𝑖=0}^{𝑙−1} 𝑝_𝑖 ⟨𝜉|𝜓_𝑖⟩⟨𝜓_𝑖|𝜉⟩          linearity of the inner product,
        = ∑_{𝑖=0}^{𝑙−1} 𝑝_𝑖 |⟨𝜓_𝑖|𝜉⟩|² ≥ 0          conjugate symmetry of the inner product.

This concludes the proof of the proposition. □

Proposition 3.5.4 justifies the following definition.

Definition 3.5.5. (1) The density operator of a mixed state

(3.5.6)   𝑆 = ((𝑝_0, |𝜓_0⟩), …, (𝑝_{𝑙−1}, |𝜓_{𝑙−1}⟩))

of 𝑄 is defined as

(3.5.7)   𝜌_𝑆 = ∑_{𝑖=0}^{𝑙−1} 𝑝_𝑖 |𝜓_𝑖⟩⟨𝜓_𝑖| .

(2) The density operator of a pure state |𝜓⟩ ∈ ℍ is defined as

(3.5.8)   𝜌_𝜓 = |𝜓⟩⟨𝜓| .
Example 3.5.6. Consider the mixed state
(3.5.9) 𝑆 = ((1/2, |0⟩) , (1/2, |1⟩)) .
The corresponding density operator is
(3.5.10) 𝜌𝑆 = (1/2)(|0⟩ ⟨0| + |1⟩ ⟨1|) .
We show that there is no pure state |𝜓⟩ such that 𝜌𝑆 = 𝜌𝜓 . Let |𝜓⟩ = 𝛼0 |0⟩ + 𝛼1 |1⟩ be a
pure state where 𝛼0 , 𝛼1 ∈ ℂ such that |𝛼0 |² + |𝛼1 |² = 1. Its density operator is
(3.5.11) 𝜌𝜓 = |𝛼0 |² |0⟩ ⟨0| + 𝛼0 𝛼̄1 |0⟩ ⟨1| + 𝛼1 𝛼̄0 |1⟩ ⟨0| + |𝛼1 |² |1⟩ ⟨1| .
Now 𝜌𝑆 = 𝜌𝜓 implies |𝛼0 |² = |𝛼1 |² = 1/2 and, from the vanishing off-diagonal terms, 𝛼0 𝛼̄1 = 0; that is, 𝛼0 = 0 or 𝛼1 = 0. But this cannot be true.
126 3. Quantum Mechanics

Exercise 3.5.7. Let 𝐵 = (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) be an orthonormal basis of ℍ. Furthermore,


consider the quantum state
𝑘−1
(3.5.12) |𝜑⟩ = ∑ 𝛼𝑖 |𝑏𝑖 ⟩
𝑖=0
𝑘−1
where 𝛼𝑖 ∈ ℂ for all 𝑖 ∈ ℤ𝑘 such that ∑_{𝑖=0}^{𝑘−1} |𝛼𝑖 |² = 1. Determine the density operators of
the pure state |𝜑⟩ and the mixed state ((|𝛼0 |², |𝑏0 ⟩), . . . , (|𝛼𝑘−1 |², |𝑏𝑘−1 ⟩)) and show that they are the same if and only if |𝛼𝑖 | = 1 for some 𝑖 ∈ ℤ𝑘 .

3.5.2. Correspondence between mixed states and density operators. This


section explains the correspondence between mixed states and density operators. To
begin, we present the following observation.
Proposition 3.5.8. Every density operator on ℍ is the density operator of some mixed
state of the quantum system 𝑄.

Proof. Let 𝜌 be a density operator on ℍ. Then 𝜌 is Hermitian. It follows from Theorem


2.4.56 that we can write
𝑘−1
(3.5.13) 𝜌 = ∑ 𝜆𝑖 |𝑏𝑖 ⟩ ⟨𝑏𝑖 |
𝑖=0

where 𝐵 = (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) is an orthonormal basis of eigenvectors of 𝜌 and 𝜆𝑖 is the


eigenvalue associated with the eigenvector |𝑏𝑖 ⟩ for all 𝑖 ∈ ℤ𝑘 . It follows from the trace
condition and Proposition 2.4.1 that
𝑘−1
(3.5.14) 1 = tr(𝜌) = ∑ 𝜆𝑖 .
𝑖=0

The positivity condition and Proposition 2.4.65 imply that 𝜆𝑖 ≥ 0 for 0 ≤ 𝑖 < 𝑘. Hence,
((𝜆0 , |𝑏0 ⟩), . . . , (𝜆𝑘−1 , |𝑏𝑘−1 ⟩))
is a mixed state of the quantum system with density operator 𝜌. □

The proof of Proposition 3.5.8 contains a method for constructing a mixed state
that corresponds to a given density operator. This is illustrated in the next example.
Example 3.5.9. Consider the operator
(3.5.15) 𝜌 = (1/2) |0⟩ ⟨0| + (1/2) |1⟩ ⟨1| .
This is the representation of 𝜌 as in (3.5.13). Also, 𝜌 is positive semidefinite and has
trace 1. Therefore, 𝜌 is a density operator. The construction in the proof of Proposition
3.5.8 gives the mixed state
(3.5.16) ((1/2, |0⟩) , (1/2, |1⟩))
that we already know from Example 3.5.6. Its density operator is 𝜌.
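The construction in the proof of Proposition 3.5.8 can be carried out numerically: diagonalize the density operator and read off the pairs (𝜆𝑖 , |𝑏𝑖 ⟩) with nonzero eigenvalues. The sketch below (NumPy assumed; the variable names are ours) does this for the operator of Example 3.5.9:

```python
import numpy as np

# Recover a mixed state from a density operator as in Proposition 3.5.8:
rho = np.array([[0.5, 0.0], [0.0, 0.5]])  # the operator of Example 3.5.9
eigvals, eigvecs = np.linalg.eigh(rho)    # orthonormal eigenbasis (rho is Hermitian)

# Keep the eigenpairs with nonzero eigenvalues as the mixed state:
ensemble = [(lam, eigvecs[:, i]) for i, lam in enumerate(eigvals) if lam > 1e-12]
# The probabilities sum to 1 by the trace condition (3.5.14):
assert np.isclose(sum(lam for lam, _ in ensemble), 1)
# Rebuilding sum_i lambda_i |b_i><b_i| returns the original operator:
rebuilt = sum(lam * np.outer(b, b.conj()) for lam, b in ensemble)
assert np.allclose(rebuilt, rho)
```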

Proposition 3.5.8 shows that the map that sends a mixed state to its density operator
is surjective. However, in general, this map is not injective. The next proposition allows
us to determine the mixed states that are associated with the same density operator.

Proposition 3.5.10. Let 𝑙 ∈ ℕ and let


(3.5.17) 𝑆 = (|𝜑0 ⟩ , . . . , |𝜑𝑙−1 ⟩), 𝑇 = (|𝜓0 ⟩ , . . . , |𝜓𝑙−1 ⟩) ∈ ℍ𝑙 .
Then we have
𝑙−1 𝑙−1
(3.5.18) ∑ |𝜑𝑖 ⟩ ⟨𝜑𝑖 | = ∑ |𝜓𝑖 ⟩ ⟨𝜓𝑖 |
𝑖=0 𝑖=0

if and only if there is a unitary matrix 𝑈 ∈ ℂ(𝑙,𝑙) such that


(3.5.19) 𝑇 = 𝑆𝑈.

Proof. Let 𝑈 ∈ ℂ(𝑙,𝑙) be a unitary matrix such that


(3.5.20) 𝑇 = 𝑆𝑈.
We write 𝑈 = (𝑢𝑖,𝑗 ) and 𝑈 ∗ = (𝑢∗𝑖,𝑗 ) with 𝑢𝑖,𝑗 , 𝑢∗𝑖,𝑗 ∈ ℂ for 0 ≤ 𝑖, 𝑗 < 𝑙. Then we have
(3.5.21) 𝑢∗𝑖,𝑗 = 𝑢̄𝑗,𝑖 , 0 ≤ 𝑖, 𝑗 < 𝑙.
Since 𝑈 is unitary, we have 𝑈𝑈 ∗ = 𝐼𝑙 . Hence, for 0 ≤ 𝑚, 𝑛 < 𝑙 we have
(3.5.22) ∑_{𝑖=0}^{𝑙−1} 𝑢𝑚,𝑖 𝑢̄𝑛,𝑖 = ∑_{𝑖=0}^{𝑙−1} 𝑢𝑚,𝑖 𝑢∗𝑖,𝑛 = 𝛿𝑚,𝑛 .
Moreover, (3.5.20) means that |𝜓𝑖 ⟩ = ∑_{𝑚=0}^{𝑙−1} 𝑢𝑚,𝑖 |𝜑𝑚 ⟩ and hence ⟨𝜓𝑖 | = ∑_{𝑛=0}^{𝑙−1} 𝑢̄𝑛,𝑖 ⟨𝜑𝑛 | for all 𝑖 ∈ ℤ𝑙 . These identities and the rules in Proposition 2.4.27 imply

∑_{𝑖=0}^{𝑙−1} |𝜓𝑖 ⟩ ⟨𝜓𝑖 |
= ∑_{𝑖=0}^{𝑙−1} ( ∑_{𝑚=0}^{𝑙−1} 𝑢𝑚,𝑖 |𝜑𝑚 ⟩)( ∑_{𝑛=0}^{𝑙−1} 𝑢̄𝑛,𝑖 ⟨𝜑𝑛 |)
= ∑_{𝑚,𝑛=0}^{𝑙−1} ( ∑_{𝑖=0}^{𝑙−1} 𝑢𝑚,𝑖 𝑢̄𝑛,𝑖 ) |𝜑𝑚 ⟩ ⟨𝜑𝑛 |
= ∑_{𝑚,𝑛=0}^{𝑙−1} 𝛿𝑚,𝑛 |𝜑𝑚 ⟩ ⟨𝜑𝑛 |
= ∑_{𝑚=0}^{𝑙−1} |𝜑𝑚 ⟩ ⟨𝜑𝑚 | .

Now, assume that the two operators are equal. Call them 𝜌. Since 𝜌 is Hermitian
it follows from the Spectral Theorem 2.4.56 that there is a decomposition
𝑚−1
(3.5.23) 𝜌 = ∑ 𝜆𝑖 |𝑏𝑖 ⟩ ⟨𝑏𝑖 |
𝑖=0

of 𝜌 where 𝑚 ∈ ℕ, (|𝑏0 ⟩ , . . . , |𝑏𝑚−1 ⟩) is an orthonormal sequence of eigenvectors of


𝜌, and 𝜆𝑖 ∈ ℝ is a nonzero eigenvalue associated with |𝑏𝑖 ⟩ for all 𝑖 ∈ ℤ𝑚 . Since 𝜌

is a density operator, these eigenvalues are positive. We show that the vector space
𝑉 = Span(|𝑏0 ⟩ , . . . , |𝑏𝑚−1 ⟩) is equal to the vector space 𝑉 ′ = Span(|𝜑0 ⟩ , . . . , |𝜑𝑙−1 ⟩).
Since for all 𝑗 ∈ ℤ𝑚 we have
(3.5.24) 𝜆𝑗 |𝑏𝑗 ⟩ = 𝜌 |𝑏𝑗 ⟩ = ∑_{𝑖=0}^{𝑙−1} ⟨𝜑𝑖 |𝑏𝑗 ⟩ |𝜑𝑖 ⟩

and 𝜆𝑗 ≠ 0 it follows that


(3.5.25) 𝑉 ⊂ 𝑉 ′.
Next, set |𝜉𝑖 ⟩ = √𝜆𝑖 |𝑏𝑖 ⟩ for all 𝑖 ∈ ℤ𝑚 . Then we have
𝑚−1
(3.5.26) 𝜌 = ∑ |𝜉𝑖 ⟩ ⟨𝜉𝑖 |
𝑖=0

and (|𝜉0 ⟩ , . . . , |𝜉𝑚−1 ⟩) is an orthogonal basis of 𝑉. Let |𝜓⟩ ∈ 𝑉 ⟂ . Then


(3.5.27) ⟨𝜓|𝜌|𝜓⟩ = ⟨𝜓| ( ∑_{𝑖=0}^{𝑚−1} ⟨𝜉𝑖 |𝜓⟩ |𝜉𝑖 ⟩ ) = ⟨𝜓|0⟩ = 0.
This implies
(3.5.28) 0 = ⟨𝜓|𝜌|𝜓⟩ = ⟨𝜓| ( ∑_{𝑖=0}^{𝑙−1} ⟨𝜑𝑖 |𝜓⟩ |𝜑𝑖 ⟩ ) = ∑_{𝑖=0}^{𝑙−1} ⟨𝜓|𝜑𝑖 ⟩⟨𝜑𝑖 |𝜓⟩ = ∑_{𝑖=0}^{𝑙−1} |⟨𝜑𝑖 |𝜓⟩|² .

Hence, |⟨𝜑𝑖 |𝜓⟩|2 = 0 for 0 ≤ 𝑖 < 𝑙 and all |𝜓⟩ ∈ 𝑉 ⟂ . Therefore |𝜑𝑖 ⟩ ∈ (𝑉 ⟂ )⟂ = 𝑉 for
0 ≤ 𝑖 < 𝑙 which together with (3.5.25) implies
(3.5.29) 𝑉 = 𝑉′ and 𝑚 ≤ 𝑙.

So we can write
𝑚−1
(3.5.30) |𝜑𝑗 ⟩ = ∑ 𝑢𝑖,𝑗 |𝜉𝑖 ⟩
𝑖=0

for all 𝑗 ∈ ℤ𝑙 with complex coefficients 𝑢𝑖,𝑗 . Denote the matrix (𝑢𝑖,𝑗 ) ∈ ℂ(𝑚,𝑙) by 𝑈.
Also, denote the entries of the adjoint 𝑈 ∗ of 𝑈 by 𝑢∗𝑖,𝑗 . Now we have
∑_{𝑝=0}^{𝑚−1} |𝜉𝑝 ⟩ ⟨𝜉𝑝 | = ∑_{𝑝=0}^{𝑙−1} |𝜑𝑝 ⟩ ⟨𝜑𝑝 |
(3.5.31) = ∑_{𝑝=0}^{𝑙−1} ( ∑_{𝑖=0}^{𝑚−1} 𝑢𝑖,𝑝 |𝜉𝑖 ⟩)( ∑_{𝑗=0}^{𝑚−1} 𝑢̄𝑗,𝑝 ⟨𝜉𝑗 |)
= ∑_{𝑖,𝑗=0}^{𝑚−1} ( ∑_{𝑝=0}^{𝑙−1} 𝑢𝑖,𝑝 𝑢∗𝑝,𝑗 ) |𝜉𝑖 ⟩ ⟨𝜉𝑗 | .

It follows from Proposition 2.4.31 that the sequence (|𝜉𝑖 ⟩ ⟨𝜉𝑗 |)0≤𝑖,𝑗<𝑚 is linearly independent.
Hence, (3.5.31) implies
(3.5.32) ∑_{𝑝=0}^{𝑙−1} 𝑢𝑖,𝑝 𝑢∗𝑝,𝑗 = 𝛿𝑖,𝑗

for 0 ≤ 𝑖, 𝑗 < 𝑚. So the sequence of row vectors of 𝑈 is orthonormal and it follows


from Theorem 2.2.33 that we can add 𝑙 − 𝑚 rows to 𝑈 such that the new 𝑈 is a unitary
matrix. Recall that by (3.5.29) we have 𝑚 ≤ 𝑙. We set |𝜉𝑚 ⟩ , . . . , |𝜉𝑙−1 ⟩ = 0. Then we
have
(3.5.33) (|𝜑0 ⟩ , . . . , |𝜑𝑙−1 ⟩) = (|𝜉0 ⟩ , . . . , |𝜉𝑙−1 ⟩)𝑈.
In the same way, we can show that there is a unitary matrix 𝑈 ′ ∈ ℂ(𝑙,𝑙) such that
(3.5.34) (|𝜓0 ⟩ , . . . , |𝜓𝑙−1 ⟩) = (|𝜉0 ⟩ , . . . , |𝜉𝑙−1 ⟩)𝑈 ′ .
So we obtain
(3.5.35) (|𝜓0 ⟩ , . . . , |𝜓𝑙−1 ⟩) = (|𝜑0 ⟩ , . . . , |𝜑𝑙−1 ⟩)𝑈 ∗ 𝑈 ′ .
Since 𝑈 ∗ 𝑈 ′ is a unitary matrix, this completes the proof. □

From Proposition 3.5.10 we obtain the following theorem.


Theorem 3.5.11. (1) The density operators of two pure states of 𝑄 are the same if and
only if these states are equal up to a global phase factor.
(2) Let 𝑙 ∈ ℕ. The density operators of two mixed states
(3.5.36) ((𝑝0 , |𝜑0 ⟩), . . . , (𝑝 𝑙−1 , |𝜑𝑙−1 ⟩)) , ((𝑞0 , |𝜓0 ⟩), . . . , (𝑞𝑙−1 , |𝜓𝑙−1 ⟩))
of 𝑄 are the same if and only if there is a unitary matrix 𝑈 ∈ ℂ(𝑙,𝑙) such that
(3.5.37) (√𝑝0 |𝜑0 ⟩ , . . . , √𝑝 𝑙−1 |𝜑𝑙−1 ⟩) = (√𝑞0 |𝜓0 ⟩ , . . . , √𝑞𝑙−1 |𝜓𝑙−1 ⟩)𝑈.

Proof. To prove the first assertion, we let |𝜑⟩ and |𝜓⟩ be pure states of 𝑄. The den-
sity operators of these states are the density operators of the mixed states ((1, |𝜑⟩)) and
((1, |𝜓⟩)), respectively. Therefore, it follows from Proposition 3.5.10 that the density
operators of these states are equal if and only if there is a complex number 𝑢 of norm
1 such that |𝜓⟩ = 𝑢 |𝜑⟩. This proves the first assertion. The second assertion follows
immediately from Proposition 3.5.10. □

Theorem 3.5.11 can also be used to characterize the mixed states of different lengths
that correspond to the same density operator. As is shown in Exercise 3.5.12 we can
extend any mixed state of length 𝑙 to a mixed state of length 𝑘 > 𝑙 with the same density
operator by appending 𝑘 − 𝑙 pairs (0, 0) to it.
Exercise 3.5.12. Show that appending pairs (0, 0) to a mixed state gives a mixed state
with the same density operator.
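Theorem 3.5.11 can be illustrated concretely: the uniform mixtures of (|0⟩, |1⟩) and of (|𝑥+⟩, |𝑥−⟩) are different mixed states with the same density operator, and the weighted state vectors are related by a unitary matrix as in (3.5.37). A sketch (NumPy assumed; the particular matrix 𝑈 below is our choice):

```python
import numpy as np

ket0, ket1 = np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex)
plus = (ket0 + ket1) / np.sqrt(2)    # |x+>
minus = (ket0 - ket1) / np.sqrt(2)   # |x->

# Two different ensembles with the same density operator:
rho1 = 0.5 * np.outer(ket0, ket0.conj()) + 0.5 * np.outer(ket1, ket1.conj())
rho2 = 0.5 * np.outer(plus, plus.conj()) + 0.5 * np.outer(minus, minus.conj())
assert np.allclose(rho1, rho2)

# The weighted state vectors (columns sqrt(p_i)|phi_i>) are related by a
# unitary matrix, as Theorem 3.5.11 predicts:
U = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.column_stack([plus, minus]) / np.sqrt(2)
T = np.column_stack([ket0, ket1]) / np.sqrt(2)
assert np.allclose(T, S @ U)
```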

We generalize the equivalence relation introduced in Proposition 3.1.23.


Theorem 3.5.13. The set 𝑅 of all pairs of mixed states of 𝑄 with the same density operator
is an equivalence relation on the set of all mixed states of 𝑄.

Exercise 3.5.14. Prove Theorem 3.5.13.

The next theorem gives a criterion that allows one to distinguish between density
operators of pure states and mixed states.

Theorem 3.5.15. Let 𝜌 be a density operator on ℍ. Then the following statements hold.
(1) 𝜌 is the density operator of a pure state if and only if 𝜌2 = 𝜌, which is true if and
only if tr 𝜌2 = 1.
(2) 𝜌 is not the density operator of a pure state if and only if 𝜌2 ≠ 𝜌, which is true if and
only if tr 𝜌2 < 1.

Proof. First, we show that tr 𝜌2 ≤ 1. Since 𝜌 is positive semidefinite, Proposition 2.4.64


tells us that 𝜌 is Hermitian. By Theorem 2.4.56 we can write
𝑘−1
(3.5.38) 𝜌 = ∑ 𝜆𝑖 |𝑏𝑖 ⟩ ⟨𝑏𝑖 |
𝑖=0

where (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) is an orthonormal basis of eigenvectors of 𝜌 and 𝜆𝑖 is the eigen-


value associated with |𝑏𝑖 ⟩ which is a real number for all 𝑖 ∈ ℤ𝑘 . Proposition 2.4.59
implies
𝑘−1
(3.5.39) 𝜌2 = ∑ 𝜆2𝑖 |𝑏𝑖 ⟩ ⟨𝑏𝑖 | .
𝑖=0

The trace and positivity conditions imply


(3.5.40) tr 𝜌² = ∑_{𝑖=0}^{𝑘−1} 𝜆𝑖² ≤ ( ∑_{𝑖=0}^{𝑘−1} 𝜆𝑖 )² = (tr 𝜌)² = 1.

We now prove the first assertion of the theorem. Let 𝜌 = |𝜑⟩ ⟨𝜑| with a quantum
state |𝜑⟩ ∈ ℍ. Since ⟨𝜑|𝜑⟩ = 1, it follows that 𝜌 is a projection; that is, 𝜌2 = 𝜌. Now
assume that 𝜌2 = 𝜌. Then the trace condition implies tr 𝜌2 = tr 𝜌 = 1. Finally, let
tr 𝜌² = 1. Then equality holds in (3.5.40); since the 𝜆𝑖 are nonnegative, there is 𝑙 ∈ ℤ𝑘 with 𝜆𝑙 = 1 and 𝜆𝑖 = 0 for
𝑖 ≠ 𝑙. Therefore, 𝜌 = |𝑏𝑙 ⟩ ⟨𝑏𝑙 |. The second assertion is proved in Exercise 3.5.16. □

Exercise 3.5.16. Prove the second assertion of Theorem 3.5.15.

Example 3.5.17. Consider the density operator 𝜌 from Example 3.5.9. Since
(3.5.41) 𝜌² = (1/4)(|0⟩ ⟨0| + |1⟩ ⟨1|) ≠ 𝜌
it follows from Theorem 3.5.15 that 𝜌 is not the density operator of a pure state. This
we have already directly verified in Example 3.5.6.
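The criterion of Theorem 3.5.15 is easy to apply numerically (NumPy assumed; the helper name `is_pure` is ours):

```python
import numpy as np

def is_pure(rho, tol=1e-12):
    """Purity test of Theorem 3.5.15: rho describes a pure state
    if and only if tr(rho^2) = 1."""
    return abs(np.trace(rho @ rho) - 1) < tol

rho_mixed = np.array([[0.5, 0], [0, 0.5]])  # Examples 3.5.9 and 3.5.17
psi = np.array([1, 1j]) / np.sqrt(2)        # a sample pure state
rho_pure = np.outer(psi, psi.conj())

assert not is_pure(rho_mixed)  # tr(rho^2) = 1/2 < 1
assert is_pure(rho_pure)       # tr(rho^2) = 1
```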

Note that for density operators 𝜌 with 𝜌2 = 𝜌 or tr 𝜌2 = 1, the proof of Theorem


3.5.15 contains a method for determining a pure state |𝜑⟩ such that 𝜌 = |𝜑⟩ ⟨𝜑|.

3.6. The quantum postulates for mixed states


The set of mixed states of a quantum system 𝑄 can be viewed as a superset of the set
of all pure states of 𝑄 if we identify a pure state |𝜑⟩ with the mixed state (1, |𝜑⟩). In this
section, we generalize the postulates of quantum mechanics to mixed states.

3.6.1. State Space Postulate. We begin with the generalized State Space Pos-
tulate.

Postulate 3.6.1 (State Space Postulate — density operator version). Associated with
any physical system is a Hilbert space, called the state space of the system. The system
is completely described by a density operator on the state space.

By Theorem 3.5.11, modeling pure quantum states using density operators is coarser than modeling them using state vectors. This theorem tells us that the density operators of two state vectors are equal if and only if the state vectors are equal up to a global phase factor.
However, since by Theorem 3.4.4 all state vectors that are equal up to a global phase
factor behave the same in measurements, this is of no importance.
The description of quantum systems using density operators includes mixed states.
This becomes essential in scenarios like when a component of a composite system is
discarded, leaving only the remaining system to be described.
The Composite Systems Postulate for density operators is the following.

Postulate 3.6.2 (Composite Systems Postulate — density operator version). The state
space of the composition of finitely many physical systems is the tensor product of the
state spaces of the component physical systems. Moreover, if we have systems num-
bered 0 through 𝑚 − 1 and if system 𝑖 is in the state 𝜌𝑖 where 𝜌𝑖 is a density operator on
the state space of the 𝑖th component system for 0 ≤ 𝑖 < 𝑚, then the composite system
is in the state 𝜌0 ⊗ ⋯ ⊗ 𝜌𝑚−1 .

3.6.2. Evolution Postulate. We also present a density operator analog of the


Evolution Postulate 3.3.1. The postulate for pure states tells us that the evolution of
a closed quantum system is described by unitary transformations. Let 𝑈 be a unitary
transformation on a state space ℍ that describes an evolution of the quantum system.
Then after this evolution, a state vector |𝜓⟩ becomes 𝑈 |𝜓⟩. The corresponding density
operator is

(3.6.1) |𝑈 |𝜓⟩⟩ ⟨𝑈 |𝜓⟩| = 𝑈 |𝜓⟩ ⟨𝜓| 𝑈 ∗ .

This motivates the following modified Evolution Postulate.

Postulate 3.6.3 (Evolution Postulate — density operator version). The evolution of


a quantum system with state space ℍ is described by a unitary transformation on ℍ.
More precisely, let 𝑡, 𝑡′ ∈ ℝ, 𝑡 < 𝑡′ . Assume that the state of the system at time 𝑡 is
described by the density operator 𝜌 on ℍ. Then the state of the system at time 𝑡′ is
obtained from 𝜌 as 𝜌′ = 𝑈𝜌𝑈 ∗ where 𝑈 is a unitary operator on ℍ that only depends
on 𝑡 and 𝑡′ .

Analogous to what we have described in Section 3.3.3, we can use composite uni-
tary operators to describe the time evolution of composite quantum systems that are in
a mixed state.
Exercise 3.6.4. Suppose that at time 𝑡 a 2-qubit quantum register is in the state 𝜌 =
|00⟩ ⟨00| and that the state of the system at time 𝑡′ > 𝑡 is obtained from 𝜌 by applying the
𝖢𝖭𝖮𝖳 operator. Determine the density operator that describes the state of the system
at time 𝑡′ .
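The evolution rule 𝜌 ↦ 𝑈𝜌𝑈∗ of Postulate 3.6.3 can be checked against the state-vector rule for pure states. The sketch below deliberately uses the Hadamard transformation rather than the 𝖢𝖭𝖮𝖳 operator of Exercise 3.6.4 (NumPy assumed):

```python
import numpy as np

# Density-operator evolution: rho -> U rho U*.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard
rho = np.array([[1, 0], [0, 0]], dtype=complex)              # |0><0|

rho_after = H @ rho @ H.conj().T
# This agrees with evolving the state vector first, as in (3.6.1):
psi_after = H @ np.array([1, 0], dtype=complex)
assert np.allclose(rho_after, np.outer(psi_after, psi_after.conj()))
```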

3.6.3. Measurement Postulate. Next, we adapt the Measurement Postulate


3.4.1 to density operators. First, we motivate this modification.
Let 𝑂 be an observable of a quantum system and let 𝑂 = ∑𝜆 𝜆𝑃𝜆 be its spectral
decomposition. Assume that the quantum system is in the state |𝜑⟩. Then the corre-
sponding density operator is
(3.6.2) 𝜌 = |𝜑⟩ ⟨𝜑| .
The Measurement Postulate tells us that measuring the observable 𝑂 when the quan-
tum system is in the state |𝜑⟩ gives one of the eigenvalues 𝜆 of 𝑂 with probability
Pr(𝜆) = ⟨𝜑|𝑃𝜆 |𝜑⟩. From Proposition 2.4.29 we obtain
(3.6.3) Pr(𝜆) = ⟨𝜑|𝑃𝜆 |𝜑⟩ = tr(𝑃𝜆 |𝜑⟩ ⟨𝜑|).
Furthermore, if the result of the measurement is 𝜆, then the state of the system imme-
diately after the measurement is
(3.6.4) 𝑃𝜆 |𝜑⟩ / √Pr(𝜆) .
By Proposition 2.4.29 the density operator of this state is
(3.6.5) 𝑃𝜆 |𝜑⟩ ⟨𝜑| 𝑃𝜆 / Pr(𝜆) = 𝑃𝜆 𝜌𝑃𝜆 / Pr(𝜆) .
This motivates the following modified Measurement Postulate.
Postulate 3.6.5 (Measurement Postulate — density operator version). A projective
measurement is described by an observable 𝑂 that is a Hermitian operator on the state
space of the system being observed. Let 𝑂 = ∑𝜆∈Λ 𝜆𝑃𝜆 be the spectral decomposition
of 𝑂. The possible outcomes of the measurement are the eigenvalues of the observable.
When measuring the state 𝜌 the probability of getting the result corresponding to 𝜆 is
Pr(𝜆) = tr(𝑃𝜆 𝜌). If this outcome occurs, the state immediately after the measurement
is 𝑃𝜆 𝜌𝑃𝜆 / Pr(𝜆).

In the situation of the Measurement Postulate 3.6.5, the expectation value of the
random variable that sends a measurement outcome to the corresponding eigenvalue
is
(3.6.6) ∑_{𝜆∈Λ} 𝜆 Pr(𝜆) = ∑_{𝜆∈Λ} 𝜆 tr(𝑃𝜆 𝜌) = tr (( ∑_{𝜆∈Λ} 𝜆𝑃𝜆 ) 𝜌) = tr(𝑂𝜌).
This motivates the following definition.

Definition 3.6.6. Let 𝑂 be an observable of a quantum system with state space ℍ.


Suppose that we measure this observable when the system is in a state described by
the density operator 𝜌. Then the expectation value of this measurement is defined as
tr(𝑂𝜌).

The following proposition applies the Measurement Postulate for density operators
to explain what happens when mixed states are measured in an orthonormal basis.

Proposition 3.6.7. Suppose that we measure the quantum system in the orthonormal
basis 𝐵 = (|𝑏0 ⟩ , . . . , |𝑏𝑘−1 ⟩) when it is in the mixed state ((𝑝0 , |𝜑0 ⟩), . . . , (𝑝 𝑙−1 , |𝜑𝑙−1 ⟩)). Then measuring the observable
𝑂 = ∑_{𝜆=0}^{𝑘−1} 𝜆 |𝑏𝜆 ⟩ ⟨𝑏𝜆 | gives 𝜆 ∈ ℤ𝑘 with probability Pr(𝜆) = ∑_{𝑖=0}^{𝑙−1} 𝑝 𝑖 |⟨𝑏𝜆 |𝜑𝑖 ⟩|² . Immediately
after this measurement, the quantum system is in the state |𝑏𝜆 ⟩ ⟨𝑏𝜆 |.

Proof. The density operator corresponding to the mixed state of the quantum system
is
𝑙−1
(3.6.7) 𝜌 = ∑ 𝑝 𝑖 |𝜑𝑖 ⟩ ⟨𝜑𝑖 | .
𝑖=0

For all 𝜆 ∈ ℤ𝑘 , with 𝑃𝜆 = |𝑏𝜆 ⟩ ⟨𝑏𝜆 |, we have


𝑙−1 𝑙−1
(3.6.8) 𝑃𝜆 𝜌 = 𝑃𝜆 ∑ 𝑝 𝑖 |𝜑𝑖 ⟩ ⟨𝜑𝑖 | = ∑ 𝑝 𝑖 ⟨𝑏𝜆 |𝜑𝑖 ⟩ |𝑏𝜆 ⟩ ⟨𝜑𝑖 | .
𝑖=0 𝑖=0

So it follows from Proposition 2.4.27 that measuring 𝑂 gives 𝜆 ∈ ℤ𝑘 with probability


𝑙−1
(3.6.9) Pr(𝜆) = tr(𝑃𝜆 𝜌) = ∑ 𝑝 𝑖 |⟨𝑏𝜆 |𝜑𝑖 ⟩|2 .
𝑖=0

When the measurement outcome is 𝜆, then immediately after the measurement the
quantum system is in the state
(3.6.10) 𝑃𝜆 𝜌𝑃𝜆 / Pr(𝜆) = ( ∑_{𝑖=0}^{𝑙−1} 𝑝 𝑖 |𝑏𝜆 ⟩ ⟨𝑏𝜆 |𝜑𝑖 ⟩ ⟨𝜑𝑖 |𝑏𝜆 ⟩ ⟨𝑏𝜆 | ) / Pr(𝜆)
= 𝑃𝜆 ( ∑_{𝑖=0}^{𝑙−1} 𝑝 𝑖 |⟨𝑏𝜆 |𝜑𝑖 ⟩|² ) / Pr(𝜆) = 𝑃𝜆 . □
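Proposition 3.6.7 can be worked through numerically. In the sketch below (NumPy assumed), the particular mixed state ((3/4, |0⟩), (1/4, |𝑥+⟩)) is our choice for illustration:

```python
import numpy as np

ket0, ket1 = np.array([1, 0], dtype=complex), np.array([0, 1], dtype=complex)
plus = (ket0 + ket1) / np.sqrt(2)  # |x+>

# Mixed state ((3/4, |0>), (1/4, |x+>)):
rho = 0.75 * np.outer(ket0, ket0.conj()) + 0.25 * np.outer(plus, plus.conj())

# Pr(lambda) = tr(P_lambda rho) with P_lambda = |b_lambda><b_lambda|:
probs = []
for b in (ket0, ket1):
    P = np.outer(b, b.conj())
    probs.append(np.trace(P @ rho).real)

# Pr(0) = 3/4 * 1 + 1/4 * 1/2 = 7/8 and Pr(1) = 1/4 * 1/2 = 1/8:
assert np.isclose(probs[0], 7 / 8) and np.isclose(probs[1], 1 / 8)
assert np.isclose(sum(probs), 1)
```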

The concepts and results regarding partial measurements in Section 3.4.3 carry
over to the mixed state situation. We only need to replace the formulas for the mea-
surement probabilities and for the states immediately after measurement with the for-
mulas that hold for mixed states. In Exercise 3.6.8 this is done for quantum systems
that are the composition of two quantum systems.

Exercise 3.6.8. Assume that 𝐴 and 𝐵 are quantum systems with state spaces ℍ𝐴 and
ℍ𝐵 . Consider the composite quantum system 𝐴 with state space ℍ𝐴𝐵 = ℍ𝐴 ⊗ ℍ𝐵 . Let
𝑂𝐴 be an observable of system 𝐴 with spectral decomposition 𝑂𝐴 = ∑𝜆∈Λ 𝜆𝑃𝜆 . Also,
let 𝜌𝐴 , 𝜌𝐵 be states of the systems 𝐴 and 𝐵, respectively. Prove that the following hold.

(1) The composed operator 𝑂𝐴𝐵 = 𝑂𝐴 ⊗ 𝐼𝐵 is an observable of the composite system


𝐴𝐵 with spectral decomposition
(3.6.11) 𝑂𝐴𝐵 = ∑ 𝜆(𝑃𝜆 ⊗ 𝐼𝐵 ).
𝜆∈Λ

(2) Measuring 𝑂𝐴𝐵 when 𝐴𝐵 is in the nonentangled state 𝜌𝐴 ⊗ 𝜌𝐵 we obtain the
eigenvalue 𝜆 with probability Pr(𝜆) = tr(𝑃𝜆 𝜌𝐴 ). If this outcome occurs, the state
of system 𝐴𝐵 immediately after the measurement is ((𝑃𝜆 𝜌𝐴 𝑃𝜆 ) ⊗ 𝜌𝐵 )/ tr(𝑃𝜆 𝜌𝐴 ). Also, the
expectation of 𝑂𝐴𝐵 is tr(𝑂𝐴 𝜌𝐴 ).

3.6.4. The descriptions by state vectors and density operators are equiva-
lent. We have introduced the description of the states of quantum systems using state
vectors and density operators. Modeling quantum systems using density operators is
more general since it also covers the situation where a quantum system is described by
a mixed state. However, when we restrict our attention to pure states, the two descrip-
tions are equivalent. This means the following. Consider a quantum system with state
space ℍ. Suppose that it evolves in 𝑙 steps. Initially, it is in the pure state 𝑠0 , then in the
pure state 𝑠1 , etc., until it is finally in the pure state 𝑠𝑙 . These states can be described by
state vectors or density operators. In both versions of the Evolution Postulate, each
transition is associated with a unitary operator on ℍ. So let 𝑈0 , 𝑈1 , . . . , 𝑈 𝑙−1 be a se-
quence of unitary operators on ℍ such that state 𝑠𝑖+1 is obtained from state 𝑠𝑖 using 𝑈 𝑖
for 0 ≤ 𝑖 < 𝑙. Assume that the states 𝑠𝑖 are represented by state vectors |𝜑𝑖 ⟩ ∈ ℍ for
0 ≤ 𝑖 ≤ 𝑙. Then we have
(3.6.12) |𝜑𝑙 ⟩ = 𝑈 |𝜑0 ⟩
with
(3.6.13) 𝑈 = 𝑈 𝑙−1 ⋯ 𝑈0 .
Next, assume that the state 𝑠0 is represented by the density operator
(3.6.14) 𝜌0 = |𝜑0 ⟩ ⟨𝜑0 | .
Also, for 0 ≤ 𝑖 < 𝑙 set
(3.6.15) 𝜌𝑖+1 = 𝑈 𝑖 𝜌𝑖 𝑈𝑖∗ .
Then from (3.6.1) we obtain
(3.6.16) 𝜌𝑙 = 𝑈𝜌0 𝑈 ∗ = |𝜑𝑙 ⟩ ⟨𝜑𝑙 | .
As shown in Section 3.6.3 the measurement statistics and the quantum state immedi-
ately after the measurement are the same for the state described by |𝜑𝑙 ⟩ and 𝜌𝑙 . This
shows that from the perspective of quantum mechanics, the two descriptions of the
state evolution are equivalent.
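The step-by-step agreement of the two descriptions can be sketched as follows (NumPy assumed; the random-unitary construction via a QR factorization is a standard device, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_unitary(n):
    # QR factorization of a random complex matrix yields a unitary Q.
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Q, _ = np.linalg.qr(A)
    return Q

phi = np.array([1, 0], dtype=complex)  # pure state s_0 as a state vector
rho = np.outer(phi, phi.conj())        # rho_0 = |phi_0><phi_0|
for U in (random_unitary(2) for _ in range(3)):
    phi = U @ phi                      # |phi_{i+1}> = U_i |phi_i>, cf. (3.6.12)
    rho = U @ rho @ U.conj().T         # rho_{i+1} = U_i rho_i U_i*, cf. (3.6.15)
    # At every step both descriptions represent the same pure state:
    assert np.allclose(rho, np.outer(phi, phi.conj()))
```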

3.7. Partial trace and reduced density operators


In Section B.9.4 we have introduced the partial trace. In this section, we use it to model
states of subsystems of composite physical systems.
We consider two quantum systems 𝐴 and 𝐵 with state spaces ℍ𝐴 and ℍ𝐵 of di-
mensions 𝑀 and 𝑁, respectively. Without loss of generality, we assume that 𝑀 ≤ 𝑁.

Let (|𝑎0 ⟩ , . . . , |𝑎𝑀−1 ⟩) be an orthonormal basis of ℍ𝐴 and let (|𝑏0 ⟩ , . . . , |𝑏𝑁−1 ⟩) be an


orthonormal basis of ℍ𝐵 . We also denote the composition of 𝐴 and 𝐵 by 𝐴𝐵. Its
state space is ℍ𝐴𝐵 = ℍ𝐴 ⊗ ℍ𝐵 and (|𝑎𝑖 ⟩ |𝑏𝑗 ⟩) is an orthonormal basis of ℍ𝐴𝐵 . Further-
more, (|𝑎𝑖 ⟩ ⟨𝑎𝑗 |) and (|𝑏𝑘 ⟩ ⟨𝑏𝑙 |) are orthonormal bases of End(ℍ𝐴 ) and End(ℍ𝐵 ). Hence
(|𝑎𝑖 ⟩ ⟨𝑎𝑗 | ⊗ |𝑏𝑘 ⟩ ⟨𝑏𝑙 |) is an orthonormal basis of End(ℍ𝐴𝐵 ).

3.7.1. Partial trace on ℍ𝐴𝐵 . In this section, we discuss the partial trace on ℍ𝐴𝐵
over ℍ𝐵 . We refer to it as the partial trace over 𝐵 and denote it by tr𝐵 . The definition of
the partial trace tr𝐴 over 𝐴 and the corresponding terminology are analogous.
From the definition of the partial trace in Section B.9.4 we obtain the following
formula.

Proposition 3.7.1. Let 𝑈 ∈ End(ℍ𝐴𝐵 ),

(3.7.1) 𝑈= ∑ 𝑢𝑖,𝑗,𝑘,𝑙 |𝑎𝑖 ⟩ ⟨𝑎𝑗 | ⊗ |𝑏𝑘 ⟩ ⟨𝑏𝑙 |


𝑖,𝑗∈ℤ𝑀 ,𝑘,𝑙∈ℤ𝑁

with complex coefficients 𝑢𝑖,𝑗,𝑘,𝑙 . Then

(3.7.2) tr𝐵 𝑈 = ∑ 𝑈 𝑘
𝑘∈ℤ𝑁

where

(3.7.3) 𝑈 𝑘 = ∑ 𝑢𝑖,𝑗,𝑘,𝑘 |𝑎𝑖 ⟩ ⟨𝑎𝑗 | .


𝑖,𝑗∈ℤ𝑀

Exercise 3.7.2. Prove Proposition 3.7.1.

Example 3.7.3. Consider

(3.7.4) 𝑈 = |𝑥+ ⟩ ⟨𝑥+ | ⊗ |0⟩ ⟨0| .

It follows from Proposition 3.7.1 that

(3.7.5) tr𝐵 𝑈 = |𝑥+ ⟩ ⟨𝑥+ | .
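Proposition 3.7.1 amounts to summing the diagonal 𝐵-blocks of the operator. A numerical sketch (NumPy assumed; the index ordering below matches the basis (|𝑎𝑖 ⟩ |𝑏𝑘 ⟩) and is our convention, as is the helper name):

```python
import numpy as np

def partial_trace_B(U, M, N):
    """Partial trace over B of an operator on H_A (x) H_B, where
    dim H_A = M and dim H_B = N; cf. Proposition 3.7.1."""
    # View the MN x MN matrix with indices (i, k, j, l) and sum over k = l.
    return U.reshape(M, N, M, N).trace(axis1=1, axis2=3)

# Example 3.7.3: U = |x+><x+| (x) |0><0| has tr_B(U) = |x+><x+|.
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
ket0 = np.array([1, 0], dtype=complex)
U = np.kron(np.outer(plus, plus.conj()), np.outer(ket0, ket0.conj()))
assert np.allclose(partial_trace_B(U, 2, 2), np.outer(plus, plus.conj()))
```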

The next proposition shows how to determine the trace over 𝐵 of a projection |𝜑⟩ ⟨𝜑|
for |𝜑⟩ ∈ ℍ𝐴𝐵 from the Schmidt decomposition of |𝜑⟩.

Proposition 3.7.4. Let |𝜑⟩ ∈ ℍ𝐴𝐵 and let


𝑙−1
(3.7.6) |𝜑⟩ = ∑ 𝑟 𝑖 |𝜓𝑖 ⟩ |𝜉𝑖 ⟩
𝑖=0

be a Schmidt decomposition of |𝜑⟩ as described in Theorem 2.5.18. Then we have


𝑙−1
(3.7.7) tr𝐵 (|𝜑⟩ ⟨𝜑|) = ∑ 𝑟𝑖2 |𝜓𝑖 ⟩ ⟨𝜓𝑖 | .
𝑖=0

Proof. We have
tr𝐵 |𝜑⟩ ⟨𝜑| = tr𝐵 ( ∑_{𝑖,𝑗=0}^{𝑙−1} 𝑟 𝑖 𝑟𝑗 |𝜓𝑖 ⟩ |𝜉𝑖 ⟩ ⟨𝜓𝑗 | ⟨𝜉𝑗 |)
= ∑_{𝑖,𝑗=0}^{𝑙−1} 𝑟 𝑖 𝑟𝑗 tr𝐵 (|𝜓𝑖 ⟩ ⟨𝜓𝑗 | ⊗ |𝜉𝑖 ⟩ ⟨𝜉𝑗 |)
(3.7.8) = ∑_{𝑖,𝑗=0}^{𝑙−1} 𝑟 𝑖 𝑟𝑗 |𝜓𝑖 ⟩ ⟨𝜓𝑗 | tr(|𝜉𝑖 ⟩ ⟨𝜉𝑗 |)
= ∑_{𝑖,𝑗=0}^{𝑙−1} 𝑟 𝑖 𝑟𝑗 |𝜓𝑖 ⟩ ⟨𝜓𝑗 | 𝛿 𝑖,𝑗
= ∑_{𝑖=0}^{𝑙−1} 𝑟𝑖² |𝜓𝑖 ⟩ ⟨𝜓𝑖 | . □
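Proposition 3.7.4 can be checked numerically. A standard device, used here as an assumption of the sketch rather than developed in this section, is that reshaping the coefficient vector of |𝜑⟩ into an 𝑀 × 𝑁 matrix and taking its singular value decomposition yields the Schmidt coefficients 𝑟 𝑖 and the vectors |𝜓𝑖 ⟩ (NumPy assumed):

```python
import numpy as np

M = N = 2
phi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)  # a Bell state
C = phi.reshape(M, N)
psi, r, _ = np.linalg.svd(C)  # columns of psi: Schmidt vectors on H_A

# Right side of (3.7.7): sum_i r_i^2 |psi_i><psi_i|
rho_A = sum(ri**2 * np.outer(psi[:, i], psi[:, i].conj())
            for i, ri in enumerate(r))

# Left side of (3.7.7): the partial trace of |phi><phi| over B.
rho = np.outer(phi, phi.conj())
direct = rho.reshape(M, N, M, N).trace(axis1=1, axis2=3)
assert np.allclose(rho_A, direct)
assert np.allclose(sorted(r), [1 / np.sqrt(2)] * 2)  # Schmidt coefficients
```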

Next, we prove that the partial trace preserves positive semidefiniteness.

Proposition 3.7.5. If 𝑈 ∈ End(ℍ𝐴𝐵 ) is positive semidefinite, then tr𝐵 (𝑈) is positive


semidefinite.

Proof. Let 𝑈 ∈ End(ℍ𝐴𝐵 ) be positive semidefinite. Use the representation (3.7.1) of


𝑈. For 𝑘, 𝑙 ∈ ℤ𝑁 let

(3.7.9) 𝑈 𝑘,𝑙 = ∑ 𝑢𝑖,𝑗,𝑘,𝑙 |𝑎𝑖 ⟩ ⟨𝑎𝑗 | .


𝑖,𝑗∈ℤ𝑀

Then we have

(3.7.10) 𝑈 = ∑ 𝑈 𝑘,𝑙 ⊗ |𝑏𝑘 ⟩ ⟨𝑏𝑙 | .


𝑘,𝑙∈ℤ𝑁

Let |𝜑⟩ ∈ ℍ𝐴 . For 𝑥 ∈ ℤ𝑁 set

(3.7.11) |𝜑𝑥 ⟩ = |𝜑⟩ ⊗ |𝑏𝑥 ⟩ .

Then (3.7.11), the positive semidefiniteness of 𝑈, and Proposition 3.7.1 imply

(3.7.12) 0 ≤ ∑_{𝑥∈ℤ𝑁} ⟨𝜑𝑥 |𝑈|𝜑𝑥 ⟩ = ∑_{𝑥∈ℤ𝑁} ∑_{𝑘,𝑙∈ℤ𝑁} ⟨𝜑|𝑈 𝑘,𝑙 |𝜑⟩⟨𝑏𝑥 |𝑏𝑘 ⟩⟨𝑏𝑙 |𝑏𝑥 ⟩ = ∑_{𝑥∈ℤ𝑁} ⟨𝜑|𝑈𝑥,𝑥 |𝜑⟩ = ⟨𝜑| tr𝐵 (𝑈)|𝜑⟩. □

3.7.2. Tracing out subsystems. In this section, we study the following ques-
tion. Suppose that the state of a composite quantum system 𝐴𝐵 at time 𝑡 is 𝜌 and we
discard subsystem 𝐵 at this time. Discarding component 𝐵 refers to intentionally dis-
regarding the quantum state of this part of the composite system while focusing solely
on the remaining part 𝐴. What is the state of 𝐴 after time 𝑡? We will show that it must

be the partial trace tr𝐵 𝜌 of 𝜌 over 𝐵. In the whole section, states of quantum systems
are described by density operators.
We start with the following observation.
Proposition 3.7.6. Let 𝜌 be a density operator on ℍ𝐴𝐵 . Then tr𝐵 (𝜌) is a density operator
on ℍ𝐴 .

Proof. We must show that tr𝐵 (𝜌) satisfies the trace condition and the positivity condi-
tion. As a density operator, 𝜌 satisfies these conditions. Since by Proposition B.9.25 the
partial trace is trace-preserving, we have tr(tr𝐵 (𝜌)) = tr(𝜌) = 1. Therefore, tr𝐵 (𝜌) sat-
isfies the trace condition. Also, tr𝐵 (𝜌) satisfies the positivity condition by Proposition
3.7.5. □

The previous proposition justifies the next definition.


Definition 3.7.7. If 𝜌 is a density operator on ℍ𝐴𝐵 , then tr𝐵 (𝜌) is called the reduced
density operator of 𝜌 on the subsystem 𝐴. This operator is denoted by 𝜌𝐴 .

Our next objective is to demonstrate that by discarding subsystem 𝐵 subsystem


𝐴 assumes the state 𝜌𝐴 . The argument proceeds as follows: Let 𝑂𝐴 be an observable
of subsystem 𝐴, and let us assume that the composite system is in the state 𝜌. There
are two approaches to measuring 𝑂𝐴 . We can either measure the composite observable
𝑂𝐴𝐵 = 𝑂𝐴 ⊗𝐼𝐵 for system 𝐴𝐵, where 𝐼𝐵 denotes the identity operator on the state space
of subsystem 𝐵, or we can discard subsystem 𝐵 and measure 𝑂𝐴 only for the subsystem
𝐴. The measurement statistics in both scenarios must be identical. Specifically, the
expectation values of both measurements must be equal. The following theorem states
that the equality of these expectation values forces subsystem 𝐴 to be in the state 𝜌𝐴 = tr𝐵 (𝜌). Consequently, after
discarding system 𝐵, subsystem 𝐴 assumes the state 𝜌𝐴 .
Theorem 3.7.8. (1) Let 𝑂𝐴 be an observable of system 𝐴, let 𝑂𝐴𝐵 = 𝑂𝐴 ⊗ 𝐼𝐵 , and
assume that the state of the quantum system is 𝜌. Then the expectation value of 𝑂𝐴𝐵
is the same as the expectation value of 𝑂𝐴 when system 𝐴 is in the reduced state 𝜌𝐴 ;
i.e.,
(3.7.13) tr(𝑂𝐴𝐵 𝜌) = tr(𝑂𝐴 𝜌𝐴 ).
(2) The function
(3.7.14) End(ℍ𝐴𝐵 ) → End(ℍ𝐴 ), 𝜌 ↦ 𝜌𝐴 = tr𝐵 (𝜌)
is the only linear map that satisfies (3.7.13) for all observables 𝑂𝐴 of 𝐴 and all states
𝜌 of 𝐴𝐵.

Proof. Let (𝑆 𝑖 ) and (𝑇𝑗 ) be bases of End(ℍ𝐴 ) and End(ℍ𝐵 ), respectively. Then (𝑆 𝑖 ⊗𝑇𝑗 )
is a basis of End(ℍ𝐴𝐵 ). Since the trace is a linear map, it suffices to prove (3.7.13) for
the basis elements 𝑆 𝑖 ⊗ 𝑇𝑗 . So, let 𝑖 ∈ ℤ𝑀² , 𝑗 ∈ ℤ𝑁² , and 𝜌 = 𝑆 𝑖 ⊗ 𝑇𝑗 . We use the fact
that the partial trace is trace-preserving (see Proposition B.9.25) and obtain
tr(𝑂𝐴𝐵 𝜌) = tr((𝑂𝐴 ⊗ 𝐼𝐵 )(𝑆 𝑖 ⊗ 𝑇𝑗 )) = tr(𝑂𝐴 𝑆 𝑖 ⊗ 𝑇𝑗 ) = tr(tr𝐵 (𝑂𝐴 𝑆 𝑖 ⊗ 𝑇𝑗 ))
= tr(𝑂𝐴 𝑆 𝑖 tr 𝑇𝑗 )
= tr(𝑂𝐴 𝜌𝐴 ).

To prove the second assertion, consider a linear map 𝑓 ∶ End(ℍ𝐴𝐵 ) → End(ℍ𝐴 )


satisfying
(3.7.15) tr(𝑂𝐴𝐵 𝜌) = tr(𝑂𝐴 𝑓(𝜌))
for all observables 𝑂𝐴 of 𝐴, 𝑂𝐴𝐵 = 𝑂𝐴 ⊗ 𝐼𝐵 , and all states 𝜌 of system 𝐴𝐵. We show
that this map is the partial trace. Denote by 𝐾 the dimension of the linear space of all
Hermitian operators on ℍ𝐴 , let (𝑆 𝑖 )0≤𝑖<𝐾 be an orthonormal basis of this space with
respect to the Hilbert-Schmidt inner product, and let 𝜌 ∈ End(ℍ𝐴𝐵 ). Then expanding
𝑓(𝜌) in this basis and noting that the basis elements 𝑆 𝑖 are observables of system 𝐴, we
obtain from (3.7.15)
(3.7.16) 𝑓(𝜌) = ∑_{𝑖=0}^{𝐾−1} tr(𝑆 ∗𝑖 𝑓(𝜌))𝑆 𝑖 = ∑_{𝑖=0}^{𝐾−1} tr((𝑆 ∗𝑖 ⊗ 𝐼𝐵 )𝜌)𝑆 𝑖 .

The expression on the right side of (3.7.16) is independent of 𝑓. Hence, 𝑓 must be equal
to tr𝐵 since by the first assertion 𝑓 = tr𝐵 satisfies (3.7.15). □

As explained above, Theorem 3.7.8 tells us the following. Suppose that 𝐴𝐵 is a


quantum system that is composed of the two quantum systems 𝐴 and 𝐵. Assume that
𝐴𝐵 is in the state 𝜌. We discard system 𝐵 and only keep system 𝐴. Then 𝐴 is in the
state 𝜌𝐴 = tr𝐵 𝜌. Therefore, we refer to the process of discarding system 𝐵 as tracing
out system 𝐵.
Example 3.7.9. We determine the state of the first qubit of the Bell state
(3.7.17) |𝜓⟩ = (1/√2)(|00⟩ + |11⟩)
when the second qubit is discarded. The density operator of |𝜓⟩ is
𝜌 = |𝜓⟩ ⟨𝜓|
(3.7.18) = (1/2)(|00⟩ + |11⟩)(⟨00| + ⟨11|)
= (1/2)(|00⟩ ⟨00| + |00⟩ ⟨11| + |11⟩ ⟨00| + |11⟩ ⟨11|).
We note that
(3.7.19) tr𝐵 (|𝑖𝑘⟩ ⟨𝑗𝑙|) = |𝑖⟩ ⟨𝑗| 𝛿 𝑘𝑙 .
We trace out the second qubit to obtain the reduced density operator of the first qubit.
From (3.7.19) we obtain
𝜌𝐴 = tr𝐵 (𝜌)
(3.7.20) = (1/2)(tr𝐵 (|00⟩ ⟨00|) + tr𝐵 (|00⟩ ⟨11|) + tr𝐵 (|11⟩ ⟨00|) + tr𝐵 (|11⟩ ⟨11|))
= (1/2)(|0⟩ ⟨0| + |1⟩ ⟨1|).
This is the density operator of the mixed state ((1/2, |0⟩), (1/2, |1⟩)).
It follows from Theorem 3.5.15 that the density operator in (3.7.20) is not the den-
sity operator of a pure state because the trace of its square is 1/2.
The next proposition generalizes Example 3.7.9.

Proposition 3.7.10. Let 𝑙 ∈ ℕ and for 0 ≤ 𝑖 < 𝑙 let |𝜑𝑖 ⟩ and |𝜓𝑖 ⟩ be quantum states in
ℍ𝐴 and ℍ𝐵 , respectively, such that the states |𝜓𝑖 ⟩ are orthogonal to each other. Also, let 𝜌
be the density operator of the state
(3.7.21) |𝜉⟩ = (1/√𝑙) ∑_{𝑖=0}^{𝑙−1} |𝜑𝑖 ⟩ |𝜓𝑖 ⟩ .

Then 𝜌𝐴 is the density operator of the mixed state


(3.7.22) ((1/𝑙, |𝜑0 ⟩) , . . . , (1/𝑙, |𝜑𝑙−1 ⟩)) .
In other words, if the composite system 𝐴𝐵 is in the state 𝜌 = |𝜉⟩ ⟨𝜉|, then the state of
system 𝐴 after tracing out system 𝐵 can be described by the mixed state (3.7.22).
Exercise 3.7.11. Prove Proposition 3.7.10.

From Proposition 3.7.10 we obtain the following consequence.


Corollary 3.7.12. Assume that the composite system 𝐴𝐵 is in the state 𝜌 = |𝜉⟩ ⟨𝜉| where
|𝜉⟩ = |𝜑⟩ |𝜓⟩ with |𝜑⟩ ∈ ℍ𝐴 and |𝜓⟩ ∈ ℍ𝐵 . Then 𝜌𝐴 = |𝜑⟩ ⟨𝜑|. This means that the state
of system 𝐴 after tracing out system 𝐵 can be described by the state vector |𝜑⟩.

Now we characterize the states of composite systems whose partial trace is not a
pure state.
Theorem 3.7.13. Let |𝜑⟩ be the state of the composite system 𝐴𝐵 and let 𝜌 = |𝜑⟩ ⟨𝜑| be
its density operator. Then |𝜑⟩ is entangled with respect to the decomposition of 𝐴𝐵 into
the subsystems 𝐴 and 𝐵 if and only if the reduced density operator 𝜌𝐴 is not the density
operator of a pure state.

Proof. Let
𝑠−1
(3.7.23) |𝜑⟩ = ∑ 𝑟 𝑖 |𝜓𝑖 ⟩ |𝜉𝑖 ⟩
𝑖=0

be a Schmidt decomposition of |𝜑⟩. Then, as shown in Exercise 3.7.14, we have


(3.7.24) 𝜌 = ∑_{𝑖,𝑗=0}^{𝑠−1} 𝑟 𝑖 𝑟𝑗 |𝜓𝑖 ⟩ |𝜉𝑖 ⟩ ⟨𝜓𝑗 | ⟨𝜉𝑗 |

Hence, we have
𝑠−1
(3.7.25) tr 𝜌 = ∑ 𝑟𝑖2 .
𝑖=0

By Proposition 3.7.4 we have


𝑠−1
(3.7.26) 𝜌𝐴 = ∑ 𝑟𝑖2 |𝜓𝑖 ⟩ ⟨𝜓𝑖 | .
𝑖=0

So it follows from Proposition 2.4.59 that


𝑠−1
(3.7.27) (𝜌𝐴 )2 = ∑ 𝑟𝑖4 |𝜓𝑖 ⟩ ⟨𝜓𝑖 | .
𝑖=0

Assume that |𝜑⟩ is entangled. Then by Theorem 3.2.4 we have 𝑠 > 1. Since the
Schmidt coefficients 𝑟 𝑖 are positive real numbers, the trace condition for 𝜌 and (3.7.25)
imply that 0 < 𝑟 𝑖 < 1 for 0 ≤ 𝑖 < 𝑠. So, by (3.7.26) and (3.7.27) we have 𝜌𝐴 ≠ (𝜌𝐴 )2 and
it follows from Theorem 3.5.15 that 𝜌𝐴 is not the density operator of a pure state.
Conversely, suppose that |𝜑⟩ is separable. Then Theorem 3.2.4 implies that 𝑠 = 1.
It follows from the trace condition and (3.7.25) that 𝑟0 = 1. So (3.7.26) and (3.7.27)
imply that 𝜌𝐴 = (𝜌𝐴 )2 . Theorem 3.5.15 shows that 𝜌𝐴 is the density operator of a pure
state. □
Exercise 3.7.14. Verify (3.7.24).
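Combining Theorems 3.5.15 and 3.7.13 gives a simple numerical entanglement test for pure states of 𝐴𝐵: compute tr((𝜌𝐴 )²) and compare it with 1. A sketch (NumPy assumed; the helper name is ours):

```python
import numpy as np

def purity_of_reduced(phi, M, N):
    """tr((rho_A)^2) for the pure state phi of a composite system A B,
    where dim H_A = M and dim H_B = N."""
    rho = np.outer(phi, phi.conj())
    rho_A = rho.reshape(M, N, M, N).trace(axis1=1, axis2=3)  # tr_B(rho)
    return np.trace(rho_A @ rho_A).real

bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)  # entangled
product = np.kron(np.array([1, 1]) / np.sqrt(2), np.array([1, 0])).astype(complex)

assert np.isclose(purity_of_reduced(bell, 2, 2), 0.5)     # < 1: entangled
assert np.isclose(purity_of_reduced(product, 2, 2), 1.0)  # = 1: separable
```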
Chapter 4

The Theory of
Quantum Algorithms

In this chapter, we introduce fundamental facets of quantum algorithms. Building


upon the concepts previously introduced in Chapter 1, our journey commences by
shedding light on the pivotal building blocks that constitute quantum circuits: single-
qubit operators. Initially, the chapter introduces the Pauli and Hadamard gates, un-
raveling their properties. It then delves into the theory of rotations in real three-space,
revealing that all single-qubit operators can essentially be viewed as such rotations.
This insight leads to the derivation of decomposition theorems for single-qubit opera-
tors, drawing parallels from the decomposition of three-dimensional rotations.
However, to construct general quantum circuits, single-qubit operators alone are
insufficient. Therefore, the chapter explores multiple-qubit operators, specifically con-
trolled operators. It covers a spectrum ranging from simple controlled-𝖭𝖮𝖳 operators
to the intricacies of general controlled operators. The chapter also acquaints readers
with ancillary and erasure gates, instrumental in adding and erasing quantum bits. By
utilizing these components, the groundwork for defining general quantum circuits is
laid. Furthermore, drawing upon analogous results in classical reversible circuits, the
chapter demonstrates that every Boolean function can be implemented by a quantum
circuit.
An intriguing question arises concerning the necessary gates for implementing
any quantum circuit. While the 𝖭𝖠𝖭𝖣 gate suffices in the classical case, no finite set
of quantum gates can adequately fulfill this purpose. Instead, finite sets of quantum
gates are presented, enabling the approximation of operators implemented by arbitrary
quantum circuits. These sets leverage the decomposition theorems for single-qubit
operators as their foundation.
Following the exploration of the computing power of quantum gates and circuits,
the chapter ventures into quantum complexity theory. Here, quantum algorithms are
defined as probabilistic algorithms capable of utilizing elements from uniform families


of quantum circuits as subroutines. This approach facilitates the transfer of complexity analysis from probabilistic algorithms to quantum algorithms and culminates in the introduction of the complexity class BQP, representing bounded-error quantum polynomial time.
In the whole chapter, we always denote by 𝑘, 𝑙, 𝑚, 𝑛 positive integers.

4.1. Simple single-qubit operators


This section discusses important single-qubit operators, that is, unitary operators on
ℍ1 . Since they are typically used as building blocks of quantum circuits, we also refer
to them as single-qubit gates. We identify these operators with their representation ma-
trices with respect to the computational basis (|0⟩ , |1⟩) of ℍ1 . To begin, we summarize
the properties of the identity, Pauli, and Hadamard gates that were already introduced
in Chapters 2 and 3. Subsequently, we explain rotations in ℝ3 and rotation gates. We
will prove that the set of all such gates is the special unitary group SU(2). Finally, we
introduce phase shift gates, which are special rotation operators.

4.1.1. The identity gate. The identity gate


(4.1.1) 𝐼 = ( 1 0 ; 0 1 )
is the most elementary single-qubit gate. It is a Hermitian and unitary involution, its
only eigenvalue is 1, and its spectral decomposition is
(4.1.2) 𝐼 = |0⟩ ⟨0| + |1⟩ ⟨1| .

4.1.2. The Pauli gates. The Pauli gates have already been introduced in Section
2.3.1. They are named after the physicist Wolfgang Pauli (1900–1958) and are of great
importance for the construction of quantum circuits and the implementation of quan-
tum algorithms. We recall their definition and discuss their properties.
Definition 4.1.1. The Pauli gates or Pauli operators are
(4.1.3) 𝑋 = ( 0 1 ; 1 0 ), 𝑌 = ( 0 −𝑖 ; 𝑖 0 ), 𝑍 = ( 1 0 ; 0 −1 ).

Sometimes, the Pauli gates are also denoted by 𝜎1 , 𝜎2 , 𝜎3 or by 𝜎𝑥 , 𝜎𝑦 , 𝜎𝑧 . We can


also write them as
(4.1.4) 𝑋 = |0⟩ ⟨1| + |1⟩ ⟨0| , 𝑌 = 𝑖(|1⟩ ⟨0| − |0⟩ ⟨1|), 𝑍 = |0⟩ ⟨0| − |1⟩ ⟨1| .

So the effect of the Pauli gates on the computational basis vectors of ℍ1 is as follows:
𝑋 |0⟩ = |1⟩ , 𝑋 |1⟩ = |0⟩ ,
(4.1.5) 𝑌 |0⟩ = 𝑖 |1⟩ , 𝑌 |1⟩ = −𝑖 |0⟩ ,
𝑍 |0⟩ = |0⟩ , 𝑍 |1⟩ = − |1⟩ .

This shows that the Pauli 𝑋 gate can be considered as the quantum equivalent of
the classical 𝖭𝖮𝖳 gate since it sends |𝑏⟩ to |¬𝑏⟩ for all 𝑏 ∈ {0, 1}. It is also called the

quantum 𝖭𝖮𝖳 gate or the bit-flip gate. Also, the Pauli 𝑍 gate is sometimes called the
phase-flip gate since it flips the phase of |1⟩ from 1 to −1.
In Example 2.4.57 we have determined the spectral decomposition of the Pauli
operators. They are
𝑋 = |𝑥+ ⟩ ⟨𝑥+ | − |𝑥− ⟩ ⟨𝑥− | ,
(4.1.6) 𝑌 = |𝑦+ ⟩ ⟨𝑦+ | − |𝑦− ⟩ ⟨𝑦− | ,
𝑍 = |𝑧+ ⟩ ⟨𝑧+ | − |𝑧− ⟩ ⟨𝑧− |
where
(|𝑥+ ⟩ , |𝑥− ⟩) = ((|0⟩ + |1⟩)/√2, (|0⟩ − |1⟩)/√2),
(4.1.7) (|𝑦+ ⟩ , |𝑦− ⟩) = ((|0⟩ + 𝑖 |1⟩)/√2, (|0⟩ − 𝑖 |1⟩)/√2),
(|𝑧+ ⟩ , |𝑧− ⟩) = (|0⟩ , |1⟩).

Now we present important properties of the Pauli gates.


Theorem 4.1.2. The Pauli gates are Hermitian and unitary involutions that satisfy
(4.1.8) 𝑋𝑌 = 𝑖𝑍 = −𝑌 𝑋, 𝑍𝑋 = 𝑖𝑌 = −𝑋𝑍, 𝑌 𝑍 = 𝑖𝑋 = −𝑍𝑌 ,
and
(4.1.9) −𝑖𝑋𝑌 𝑍 = 𝐼.
Exercise 4.1.3. Prove Theorem 4.1.2.
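These identities are straightforward to check numerically. The following sketch, assuming NumPy is available, verifies the statements of Theorem 4.1.2 directly on the representation matrices:

```python
import numpy as np

# Pauli matrices with respect to the computational basis (|0>, |1>)
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

for P in (X, Y, Z):
    assert np.allclose(P, P.conj().T)   # Hermitian
    assert np.allclose(P @ P, I2)       # involution, hence also unitary

# the product relations (4.1.8)
assert np.allclose(X @ Y, 1j * Z) and np.allclose(X @ Y, -(Y @ X))
assert np.allclose(Z @ X, 1j * Y) and np.allclose(Z @ X, -(X @ Z))
assert np.allclose(Y @ Z, 1j * X) and np.allclose(Y @ Z, -(Z @ Y))
# and (4.1.9)
assert np.allclose(-1j * (X @ Y @ Z), I2)
```

Such a check is of course no substitute for the proof requested in Exercise 4.1.3, but it is a useful sanity test.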

The following proposition will be very useful when we discuss rotation gates.
Proposition 4.1.4. The sequence (𝐼, 𝑋, 𝑌 , 𝑍) is a ℂ-basis of End(ℍ1 ) which is orthogonal
with respect to the Hilbert-Schmidt inner product.

Proof. Let 𝛼, 𝛽, 𝛾, 𝛿 ∈ ℂ. Then we have


𝛼𝐼 + 𝛽𝑋 + 𝛾𝑌 + 𝛿𝑍 = ( 𝛼 + 𝛿  𝛽 − 𝑖𝛾 ; 𝛽 + 𝑖𝛾  𝛼 − 𝛿 ).
So, 𝛼𝐼 + 𝛽𝑋 + 𝛾𝑌 + 𝛿𝑍 = 0 implies
(4.1.10) 0 = 𝛼 + 𝛿 = 𝛼 − 𝛿 = 𝛽 + 𝑖𝛾 = 𝛽 − 𝑖𝛾.
This implies 𝛼 = −𝛿 and 𝛼 = 𝛿 and therefore 𝛼 = 𝛿 = 0. This also implies 𝛽 = −𝑖𝛾 and
𝛽 = 𝑖𝛾 and therefore 𝛽 = 𝛾 = 0. Hence, the sequence (𝐼, 𝑋, 𝑌 , 𝑍) is linearly indepen-
dent. Since the dimension of End(ℍ1 ) as a complex vector space is 4, the sequence is a
basis of End(ℍ1 ). The orthogonality can be verified by matrix multiplication which is
done in Exercise 4.1.5. □
Exercise 4.1.5. Verify that (𝐼, 𝑋, 𝑌 , 𝑍) is orthogonal with respect to the Hilbert-Schmidt
inner product.
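The orthogonality asked for in Exercise 4.1.5 can also be checked numerically. The sketch below, assuming NumPy, computes the Hilbert-Schmidt inner products Tr(𝐴†𝐵) of all pairs from (𝐼, 𝑋, 𝑌, 𝑍):

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def hs(A, B):
    """Hilbert-Schmidt inner product <A|B> = Tr(A^dagger B)."""
    return np.trace(A.conj().T @ B)

basis = (I2, X, Y, Z)
for i, A in enumerate(basis):
    for j, B in enumerate(basis):
        # distinct elements are orthogonal; each has squared norm 2
        assert np.isclose(hs(A, B), 2 if i == j else 0)
```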

The symbols representing the identity and the Pauli gates in quantum circuits are
shown in Figure 4.1.1.

𝐼 𝑋 𝑌 𝑍

Figure 4.1.1. Symbols for the identity and the Pauli gates in quantum circuits.

4.1.3. The Hadamard gate. Another important single-qubit gate, the Hadamard
gate or Hadamard operator, has already been introduced in Section 3.3.2. We recall that
this gate is
(4.1.11) 𝐻 = (1/√2) ( 1 1 ; 1 −1 ).
The Hadamard operator is a unitary and Hermitian involution and we have shown in
Exercise 2.3.4 that
(4.1.12) 𝐻𝑋𝐻 = 𝑍, 𝐻𝑌 𝐻 = −𝑌 , 𝐻𝑍𝐻 = 𝑋.
We also note that
(4.1.13) 𝐻 = (1/√2)(𝑋 + 𝑍).
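The identities (4.1.12) and (4.1.13) can be confirmed with a small numerical sketch, assuming NumPy is available:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

assert np.allclose(H, (X + Z) / np.sqrt(2))   # (4.1.13)
assert np.allclose(H @ H, np.eye(2))          # Hermitian unitary involution
assert np.allclose(H @ X @ H, Z)              # (4.1.12)
assert np.allclose(H @ Y @ H, -Y)
assert np.allclose(H @ Z @ H, X)
```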

The symbol representing the Hadamard gate is shown in Figure 4.1.2.

Figure 4.1.2. Symbol for the Hadamard gate in quantum circuits.

4.2. More geometry in ℝ3


In Section 3.1.4, we have shown that any single-qubit quantum state corresponds to
a point on the Bloch sphere. In Section 4.2.2, we will prove the important fact that
any unitary operator on ℍ1 can be seen as a rotation of the Bloch sphere. This proof
requires more concepts and results from the geometry of ℝ3 , which are presented in
this section. Specifically, we will discuss rotations in ℝ3 .
We will identify triplets (𝑎,⃗ 𝑏,⃗ 𝑐)⃗ of vectors in ℝ3 with the matrices that have 𝑎,⃗ 𝑏,⃗ 𝑐 ⃗
as column vectors; e.g., the unit matrix
1 0 0
(4.2.1) 𝐼3 = (0 1 0)
0 0 1
is identified with the basis
(4.2.2) (𝑥,̂ 𝑦,̂ 𝑧)̂ = ((1, 0, 0), (0, 1, 0), (0, 0, 1)).
We also equate the endomorphisms of ℝ3 with their representation matrices corre-
sponding to the basis 𝐼3 .

4.2.1. General spherical coordinates. We discuss spherical coordinates with


respect to any pair of orthogonal axes. We start by defining the angle between two
nonzero vectors 𝑎⃗ and 𝑏 ⃗ in ℝ3 . For this, we note that the Cauchy-Schwarz inequality
(see Proposition 2.2.25) implies
(4.2.3) |⟨𝑎⃗ | 𝑏⃗⟩| / (‖𝑎⃗‖ ‖𝑏⃗‖) ≤ 1
for any two nonzero vectors 𝑎,⃗ 𝑏 ⃗ ∈ ℝ3 . So the following definition makes sense.

Definition 4.2.1. Let 𝑎,⃗ 𝑏 ⃗ ∈ ℝ3 be nonzero. Then the angle between 𝑎⃗ and 𝑏 ⃗ is defined
as
(4.2.4) ∠(𝑎⃗, 𝑏⃗) = arccos( ⟨𝑎⃗ | 𝑏⃗⟩ / (‖𝑎⃗‖ ‖𝑏⃗‖) ).

The angle between two vectors in ℝ3 is illustrated in Figure 4.2.1. As shown in


Proposition 4.2.3, such an angle is always between 0 and 𝜋.
Example 4.2.2. The angle between 𝑥̂ and 𝑦̂ is
(4.2.5) ∠(𝑥̂, 𝑦̂) = arccos( ⟨𝑥̂ | 𝑦̂⟩ / (‖𝑥̂‖ ‖𝑦̂‖) ) = arccos(0) = 𝜋/2.
The angle between 𝑥̂ and −𝑥̂ is
(4.2.6) ∠(𝑥̂, −𝑥̂) = arccos( ⟨𝑥̂ | −𝑥̂⟩ / (‖𝑥̂‖ ‖−𝑥̂‖) ) = arccos(−1) = 𝜋.
Let
(4.2.7) 𝑎̂ = (𝑥̂ + 𝑦̂)/√2.
Then the angle between 𝑥̂ and 𝑎̂ is
(4.2.8) ∠(𝑥̂, 𝑎̂) = arccos( ⟨𝑥̂ | 𝑎̂⟩ / (‖𝑥̂‖ ‖𝑎̂‖) ) = arccos(1/√2) = 𝜋/4.
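Definition 4.2.1 translates directly into code. The sketch below, assuming NumPy (the clipping step is a numerical safeguard, not part of the definition), reproduces the values of Example 4.2.2:

```python
import numpy as np

def angle(a, b):
    """Angle between nonzero vectors a, b in R^3 (Definition 4.2.1)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    c = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(c, -1.0, 1.0))   # clip only guards rounding

x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
a = (x + y) / np.sqrt(2)

assert np.isclose(angle(x, y), np.pi / 2)   # (4.2.5)
assert np.isclose(angle(x, -x), np.pi)      # (4.2.6)
assert np.isclose(angle(x, a), np.pi / 4)   # (4.2.8)
```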

Figure 4.2.1. Angle between two vectors.



Figure 4.2.2. Visualization of the cross product by the right-hand rule.

The angle between two vectors in ℝ3 has the following properties.

Proposition 4.2.3. Let 𝑎,⃗ 𝑏 ⃗ ∈ ℝ3 be nonzero vectors. Then we have the following.
(1) 0 ≤ ∠(𝑎,⃗ 𝑏)⃗ = ∠(𝑏,⃗ 𝑎)⃗ ≤ 𝜋.
(2) ∠(𝑎,⃗ 𝑏)⃗ = 0 if and only if 𝑏 ⃗ = 𝑟𝑎⃗ with 𝑟 ∈ ℝ>0 .
(3) ∠(𝑎,⃗ 𝑏)⃗ = 𝜋/2 if and only if ⟨𝑎|⃗ 𝑏⟩⃗ = 0; that is, 𝑎⃗ and 𝑏 ⃗ are orthogonal to each other.
(4) ∠(𝑎,⃗ 𝑏)⃗ = 𝜋 if and only if 𝑏 ⃗ = 𝑟𝑎⃗ with 𝑟 ∈ ℝ<0 .
Exercise 4.2.4. Prove Proposition 4.2.3.

Next, we define the cross product or outer product of two vectors in ℝ3 .

Definition 4.2.5. Let 𝑎⃗ = (𝑎𝑥 , 𝑎𝑦 , 𝑎𝑧 ), 𝑏 ⃗ = (𝑏𝑥 , 𝑏𝑦 , 𝑏𝑧 ) ∈ ℝ3 . Then the cross product or


outer product of 𝑎⃗ and 𝑏 ⃗ is

(4.2.9) 𝑎⃗ × 𝑏 ⃗ = (𝑎𝑦 𝑏𝑧 − 𝑎𝑧 𝑏𝑦 , 𝑎𝑧 𝑏𝑥 − 𝑎𝑥 𝑏𝑧 , 𝑎𝑥 𝑏𝑦 − 𝑎𝑦 𝑏𝑥 ).

Example 4.2.6. Let 𝑎⃗ = 𝑥̂ = (1, 0, 0), 𝑏 ⃗ = 𝑦 ̂ = (0, 1, 0). Then 𝑎⃗ × 𝑏 ⃗ = 𝑧 ̂ = (0, 0, 1).
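Definition 4.2.5 can be implemented directly and compared against NumPy's built-in routine; a minimal sketch, assuming NumPy:

```python
import numpy as np

def cross(a, b):
    """Cross product of a and b in R^3 (Definition 4.2.5)."""
    ax, ay, az = a
    bx, by, bz = b
    return np.array([ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx])

x_hat = np.array([1.0, 0.0, 0.0])
y_hat = np.array([0.0, 1.0, 0.0])
z_hat = np.array([0.0, 0.0, 1.0])

assert np.allclose(cross(x_hat, y_hat), z_hat)   # Example 4.2.6
# agrees with NumPy's built-in and is orthogonal to both factors
a = np.array([1.0, 2.0, 3.0])
b = np.array([-1.0, 0.5, 2.0])
assert np.allclose(cross(a, b), np.cross(a, b))
assert np.isclose(np.dot(cross(a, b), a), 0.0)
assert np.isclose(np.dot(cross(a, b), b), 0.0)
```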

If 𝑎,⃗ 𝑏 ⃗ ∈ ℝ3 are linearly independent, then the cross product has the following
geometric interpretation by the right-hand rule which is illustrated in Figure 4.2.2. If 𝑎⃗
points in the direction of the index finger of the right hand and 𝑏 ⃗ points in the direction
of the middle finger, then 𝑎⃗ × 𝑏 ⃗ is a vector orthogonal to the plane spanned by 𝑎⃗ and 𝑏 ⃗
that points in the direction of the thumb.
Here are some important properties of the outer product.

Proposition 4.2.7. Let 𝑎,⃗ 𝑏 ⃗ ∈ ℝ3 and let 𝜃 be the angle between 𝑎⃗ and 𝑏.⃗ Then the
following hold.

(1) ‖𝑎⃗ × 𝑏⃗‖ = ‖𝑎⃗‖ ‖𝑏⃗‖ sin 𝜃.

(2) det(𝑎,⃗ 𝑏,⃗ 𝑐)⃗ = ⟨𝑎⃗ × 𝑏|⃗ 𝑐⟩⃗ for all 𝑐 ⃗ ∈ ℝ3 .



(3) 𝑎⃗ × 𝑏 ⃗ is orthogonal to 𝑎⃗ and 𝑏.⃗


(4) 𝑎⃗ × 𝑏 ⃗ = 0 if and only if 𝑎⃗ and 𝑏 ⃗ are linearly dependent.

Proof. It follows from Definition 4.2.1 that
(4.2.10) ‖𝑎⃗‖² ‖𝑏⃗‖² sin² 𝜃 = ‖𝑎⃗‖² ‖𝑏⃗‖² − ⟨𝑎⃗ | 𝑏⃗⟩².
Now we have
(4.2.11) ‖𝑎⃗‖² ‖𝑏⃗‖² = (𝑎𝑥² + 𝑎𝑦² + 𝑎𝑧²)(𝑏𝑥² + 𝑏𝑦² + 𝑏𝑧²)
= 𝑎𝑥²𝑏𝑥² + 𝑎𝑥²𝑏𝑦² + 𝑎𝑥²𝑏𝑧² + 𝑎𝑦²𝑏𝑥² + 𝑎𝑦²𝑏𝑦² + 𝑎𝑦²𝑏𝑧² + 𝑎𝑧²𝑏𝑥² + 𝑎𝑧²𝑏𝑦² + 𝑎𝑧²𝑏𝑧²
and
(4.2.12) ⟨𝑎⃗ | 𝑏⃗⟩² = (𝑎𝑥𝑏𝑥 + 𝑎𝑦𝑏𝑦 + 𝑎𝑧𝑏𝑧)²
= 𝑎𝑥²𝑏𝑥² + 𝑎𝑦²𝑏𝑦² + 𝑎𝑧²𝑏𝑧² + 2(𝑎𝑥𝑏𝑥𝑎𝑦𝑏𝑦 + 𝑎𝑥𝑏𝑥𝑎𝑧𝑏𝑧 + 𝑎𝑦𝑏𝑦𝑎𝑧𝑏𝑧).
So
(4.2.13) ‖𝑎⃗‖² ‖𝑏⃗‖² − ⟨𝑎⃗ | 𝑏⃗⟩² = 𝑎𝑥²𝑏𝑦² + 𝑎𝑥²𝑏𝑧² + 𝑎𝑦²𝑏𝑥² + 𝑎𝑦²𝑏𝑧² + 𝑎𝑧²𝑏𝑥² + 𝑎𝑧²𝑏𝑦²
− 2(𝑎𝑥𝑏𝑥𝑎𝑦𝑏𝑦 + 𝑎𝑥𝑏𝑥𝑎𝑧𝑏𝑧 + 𝑎𝑦𝑏𝑦𝑎𝑧𝑏𝑧).
On the other hand, we have
(4.2.14) ‖𝑎⃗ × 𝑏⃗‖² = (𝑎𝑦𝑏𝑧 − 𝑎𝑧𝑏𝑦)² + (𝑎𝑧𝑏𝑥 − 𝑎𝑥𝑏𝑧)² + (𝑎𝑥𝑏𝑦 − 𝑎𝑦𝑏𝑥)²
= 𝑎𝑦²𝑏𝑧² + 𝑎𝑧²𝑏𝑦² + 𝑎𝑧²𝑏𝑥² + 𝑎𝑥²𝑏𝑧² + 𝑎𝑥²𝑏𝑦² + 𝑎𝑦²𝑏𝑥²
− 2(𝑎𝑦𝑏𝑧𝑎𝑧𝑏𝑦 + 𝑎𝑧𝑏𝑥𝑎𝑥𝑏𝑧 + 𝑎𝑥𝑏𝑦𝑎𝑦𝑏𝑥).
So the first assertion follows from (4.2.13) and (4.2.14). Also, Theorem B.5.16, the
Laplace expansion formula for determinants, implies the second assertion. The sec-
ond assertion and the alternating property of the determinant imply the third assertion.
Finally, the first assertion implies the fourth assertion. □

From Proposition 4.2.7 we obtain the following important result.


Theorem 4.2.8. Let 𝑎,̂ 𝑏 ̂ ∈ ℝ3 be unit vectors that are orthogonal to each other. Then
𝑝 ̂ = 𝑟 ̂ = 𝑎̂ × 𝑏 ̂ and 𝑞 ̂ = 𝑏 ̂ × 𝑎̂ are the uniquely determined vectors in ℝ3 such that (𝑝,̂ 𝑎,̂ 𝑏),̂
(𝑎,̂ 𝑞,̂ 𝑏),̂ and (𝑎,̂ 𝑏,̂ 𝑟)̂ are orthonormal bases of ℝ3 with determinant 1.

Proof. Since 𝑎̂ and 𝑏 ̂ are unit vectors and since they are orthogonal to each other, it fol-
lows from the first assertion in Proposition 4.2.7 that 𝑝,̂ 𝑞,̂ and 𝑟 ̂ are unit vectors. Also,
the second and third assertion of Proposition 4.2.7 imply that (𝑎,̂ 𝑏,̂ 𝑟)̂ is an orthonormal
basis of ℝ3 with determinant 1.
Assume that (𝑎,̂ 𝑏,̂ 𝑟′̂ ) is another orthonormal basis of ℝ3 with determinant 1. Then
there are 𝛼, 𝛽, 𝛾 ∈ ℝ such that
(4.2.15) 𝑟′̂ = 𝛼𝑎̂ + 𝛽 𝑏 ̂ + 𝛾𝑟.̂

Figure 4.2.3. The spherical coordinate representation of 𝑝⃗ with respect to (𝑢,̂ 𝑤)̂ is (‖𝑝‖,⃗ 𝜃, 𝜙).

Since 𝑎̂ is a unit vector and since it is orthogonal to 𝑏,̂ 𝑟,̂ and 𝑟′̂ , we have 𝛼 = ⟨𝑎|̂ 𝑟′̂ ⟩ = 0.
In the same way, we see that 𝛽 = 0. Since 𝑟 ̂ and 𝑟′̂ are unit vectors and det(𝑎,̂ 𝑏,̂ 𝑟)̂ =
det(𝑎,̂ 𝑏,̂ 𝑟′̂ ) = 1, it follows that 𝛾 = 1. So, we have 𝑟 ̂ = 𝑟′̂ , as asserted.
The assertions for 𝑝 ̂ and 𝑞 ̂ follow by swapping the columns of (𝑎,̂ 𝑏,̂ 𝑟)̂ and applying
Proposition B.5.11. □
Exercise 4.2.9. Let 𝑎̂ = (1/√2, 1/√2, 0) and 𝑏̂ = (1/√2, −1/√2, 0). Find 𝑐̂ ∈ ℝ3 so that (𝑎,̂ 𝑏,̂ 𝑐)̂
is an orthogonal matrix.
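Theorem 4.2.8 can be illustrated numerically. The sketch below, assuming NumPy, uses a different orthonormal pair than Exercise 4.2.9 so as not to give its answer away:

```python
import numpy as np

# Two orthogonal unit vectors (different from the ones in Exercise 4.2.9)
a = np.array([1.0, 2.0, 2.0]) / 3.0
b = np.array([2.0, 1.0, -2.0]) / 3.0
r = np.cross(a, b)                # Theorem 4.2.8: the unique completion

B = np.column_stack([a, b, r])    # (a, b, a x b) as columns
assert np.allclose(B.T @ B, np.eye(3))     # orthonormal basis
assert np.isclose(np.linalg.det(B), 1.0)   # determinant 1
```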

We now introduce general spherical coordinates.


Definition 4.2.10. Let 𝑢,̂ 𝑤̂ be unit vectors that are orthogonal to each other. Let 𝐵 =
(𝑢,̂ 𝑣,̂ 𝑤)̂ be an orthonormal basis of ℝ3 with determinant 1 which according to Theorem
4.2.8 exists and is uniquely determined. Also, let 𝑝 ⃗ ∈ ℝ3 . Then the spherical coordinate
representation of 𝑝 ⃗ with respect to the azimuth reference 𝑢̂ and the zenith 𝑤̂ or with respect
to (𝑢,̂ 𝑤)̂ for short is defined as the spherical coordinate representation of 𝐵 −1 𝑝.⃗

Figure 4.2.3 illustrates the generalized spherical coordinate representation.


Example 4.2.11. We determine the spherical coordinate representation (𝑟, 𝜃, 𝜙) of 𝑝⃗ =
(1/2, 1/2, √2/2) with respect to the azimuth reference 𝑢̂ = (1, 0, 0) and the zenith 𝑤̂ =
(0, 0, −1). First,
(4.2.16) 𝐵 = (𝑢,̂ 𝑣,̂ 𝑤)̂ = ( 1 0 0 ; 0 −1 0 ; 0 0 −1 )
is an orthonormal basis of ℝ3 with determinant 1. Also, we have 𝐵−1 = 𝐵 and 𝐵−1𝑝⃗ =
(1/2, −1/2, −√2/2). So using arguments analogous to those in Example 3.1.13 we obtain
𝑟 = 1, 𝜃 = 3𝜋/4, and 𝜙 = 7𝜋/4.

Next, we show how the spherical coordinate representation changes if the azimuth
reference is changed.

Proposition 4.2.12. Let 𝑢,̂ 𝑢̂′ , 𝑤̂ ∈ ℝ3 be unit vectors and assume that both 𝑢̂ and 𝑢̂′ are
orthogonal to 𝑤.̂ Then the following hold.

(1) The spherical coordinate representation of 𝑢̂′ with respect to (𝑢,̂ 𝑤)̂ is (1, 𝜋/2, 𝛿) where
cos 𝛿 = ⟨𝑢|̂ 𝑢̂′ ⟩ and sin 𝛿 = ⟨𝑤̂ × 𝑢|̂ 𝑢̂′ ⟩.
(2) Let 𝑝 ⃗ ∈ ℝ3 and let (𝑟, 𝜃, 𝜙) and (𝑟′ , 𝜃′ , 𝜙′ ) be the spherical coordinate representations
of 𝑝 ⃗ with respect to (𝑢,̂ 𝑤)̂ and (𝑢̂′ , 𝑤),
̂ respectively. Then we have 𝑟′ = 𝑟, 𝜃′ = 𝜃, and

0 if 𝜃 ∈ {0, 𝜋},
(4.2.17) 𝜙′ = {
(𝜙 − 𝛿) mod 2𝜋 otherwise.

Proof. Set 𝑣 ̂ = 𝑤̂ × 𝑢̂ and 𝑣′̂ = 𝑤̂ × 𝑢̂′ . Then it follows from Theorem 4.2.8 that
𝐵 = (𝑢,̂ 𝑣,̂ 𝑤)̂ and 𝐵 ′ = (𝑢̂′ , 𝑣′̂ , 𝑤)̂ are the uniquely determined orthonormal bases of
ℝ3 with determinant 1 and first and last columns 𝑢,̂ 𝑤̂ and 𝑢̂′ , 𝑤,̂ respectively. Let
(1, 𝜀, 𝛿) be the spherical coordinate representation of 𝑢̂′ with respect to (𝑢,̂ 𝑤).
̂ Then by
Proposition 3.1.12 we have

(4.2.18) 𝑢̂′ = cos 𝛿 sin 𝜀 𝑢̂ + sin 𝛿 sin 𝜀 𝑣 ̂ + cos 𝜀 𝑤.̂

Since 𝑤̂ is a unit vector and since it is orthogonal to 𝑢̂′ , 𝑢,̂ and 𝑣,̂ it follows that cos 𝜀 = 0
and thus 𝜀 = 𝜋/2. Since 𝑢̂ and 𝑣 ̂ are unit vectors and since they are orthogonal to each
other, it follows that cos 𝛿 = ⟨𝑢|̂ 𝑢̂′ ⟩ and sin 𝛿 = ⟨𝑣|̂ 𝑢̂′ ⟩.
Now we turn to the second assertion. Set
cos 𝛿 − sin 𝛿 0
(4.2.19) 𝑀 = ( sin 𝛿 cos 𝛿 0) .
0 0 1

Then 𝐵𝑀 is an orthonormal basis of ℝ3 with determinant 1 with first vector 𝑢̂′ and
last vector 𝑤.̂ Since by Theorem 4.2.8 there is only one such basis, it follows that 𝑣′̂ =
− sin 𝛿 𝑢̂ + cos 𝛿 𝑣.̂ Let 𝑝 ⃗ ∈ ℝ3 with spherical coordinate representations (𝑟, 𝜃, 𝜙) and
(𝑟′ , 𝜃′ , 𝜙′ ) with respect to (𝑢,̂ 𝑤)̂ and (𝑢̂′ , 𝑤),
̂ respectively. Then 𝑟 = ‖𝑝‖⃗ = 𝑟′ . As shown
in Exercise 4.2.13, for 𝑞 ⃗ = 𝐵−1𝑝 ⃗ and 𝑞′⃗ = (𝐵 ′)−1𝑝 ⃗ we have

(4.2.20) 𝑞′⃗ = 𝑀 −1 𝑞 ⃗ = 𝑟(cos(𝜙 − 𝛿) sin 𝜃, sin(𝜙 − 𝛿) sin 𝜃, cos 𝜃).

So Definition 4.2.10 and Proposition 3.1.12 imply the assertion. □

Exercise 4.2.13. Verify (4.2.20) in the proof of Proposition 4.2.12 using the trigono-
metric identities (A.5.3) and (A.5.6).

Example 4.2.14. Let 𝑢̂ = 𝑥̂ = (1, 0, 0), 𝑢̂′ = 𝑦 ̂ = (0, 1, 0), and 𝑤̂ = 𝑧 ̂ = (0, 0, 1).
Then 𝑣 ̂ = 𝑤̂ × 𝑢̂ = (0, 1, 0) = 𝑦,̂ ⟨𝑢|̂ 𝑢̂′ ⟩ = 0, ⟨𝑤̂ × 𝑢|̂ 𝑢̂′ ⟩ = 1. Hence, the spherical
coordinate representation of 𝑢̂′ with respect to (𝑢,̂ 𝑤)̂ is (1, 𝜋/2, 𝜋/2). Also, let 𝑝 ⃗ ∈ ℝ3
with Cartesian coordinates (√2, √2, 0). Then the spherical coordinate representation
of 𝑝 ⃗ is (2, 𝜋/2, 𝜋/4). So, by Proposition 4.2.12, the spherical coordinate representation of
𝑝 ⃗ with respect to (𝑦,̂ 𝑧)̂ is (2, 𝜋/2, 7𝜋/4).
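The change of azimuth reference in this example can be checked with a small routine implementing Definition 4.2.10. The sketch below assumes NumPy; the helper name spherical_wrt and the use of arctan2 to recover 𝜙 are our own choices, and the special 𝜃 ∈ {0, 𝜋} convention for 𝜙 is not handled:

```python
import numpy as np

def spherical_wrt(p, u, w):
    """Spherical coordinates (r, theta, phi) of p with respect to the
    azimuth reference u and the zenith w (Definition 4.2.10)."""
    v = np.cross(w, u)                 # completes (u, v, w) to a det-1 basis
    B = np.column_stack([u, v, w])
    q = B.T @ p                        # B is orthogonal, so B^-1 = B^T
    r = np.linalg.norm(q)
    theta = np.arccos(q[2] / r)
    phi = np.arctan2(q[1], q[0]) % (2 * np.pi)
    return r, theta, phi

# Example 4.2.14: p = (sqrt(2), sqrt(2), 0) with respect to (y_hat, z_hat)
p = np.array([np.sqrt(2), np.sqrt(2), 0.0])
y_hat = np.array([0.0, 1.0, 0.0])
z_hat = np.array([0.0, 0.0, 1.0])

r, theta, phi = spherical_wrt(p, y_hat, z_hat)
assert np.isclose(r, 2.0)
assert np.isclose(theta, np.pi / 2)
assert np.isclose(phi, 7 * np.pi / 4)
```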

4.2.2. Rotations. In this section, we explain the geometry of rotations in ℝ3 .


First, we define orthogonal matrices.

Definition 4.2.15. (1) A matrix 𝑂 ∈ ℝ(3,3) is called orthogonal if 𝑂 is invertible and


𝑂−1 = 𝑂T .
(2) The set of all orthogonal matrices is denoted by 𝖮(3).

Exercise 4.2.16. Show that the determinant of orthogonal matrices is in {±1}.

We note that orthogonal matrices are unitary matrices in ℂ(3,3) with real entries.
Therefore, Proposition 2.4.18 implies the following characterization of orthogonal ma-
trices.

Proposition 4.2.17. Let 𝑂 ∈ ℝ(3,3) . Then the following statements are equivalent.
(1) 𝑂 ∈ 𝖮(3).
(2) The columns of 𝑂 form an orthonormal basis of ℝ3 .
(3) The rows of 𝑂 form an orthonormal basis of ℝ3 .
̂ 𝑤⟩̂ = ⟨𝑣|̂ 𝑤⟩̂ for all 𝑣,̂ 𝑤̂ ∈ ℝ3 .
(4) ⟨𝑂𝑣|𝑂
(5) ‖𝑂𝑣‖̂ = ‖𝑣‖̂ for all 𝑣 ̂ ∈ ℝ3 .

Exercise 4.2.18. Prove Proposition 4.2.17.

It follows from the equivalence of the first two statements in Proposition 4.2.17 that
there is a one-to-one correspondence between orthogonal matrices and orthonormal
bases of ℝ3 .
Now we introduce the orthogonal and the special orthogonal group.

Theorem 4.2.19. (1) The set 𝖮(3) of all orthogonal matrices is a group with respect to
matrix multiplication. It is called the orthogonal group of rank 3.
(2) The set of all orthogonal matrices with determinant 1 is a subgroup of 𝖮(3). It is
denoted by SO(3) and is called the special orthogonal group of rank 3.

Exercise 4.2.20. Prove Theorem 4.2.19.

The next theorem introduces rotations in ℝ3 . They are illustrated in Figure 4.2.4.

Theorem 4.2.21. Let 𝑢,̂ 𝑤̂ ∈ ℝ3 be unit vectors and let them be orthogonal to each other,
and let 𝛾 ∈ ℝ. Consider the map ℝ3 → ℝ3 that sends 𝑝 ⃗ ∈ ℝ3 with spherical coordinates
(𝑟, 𝜃, 𝜙) with respect to (𝑢,̂ 𝑤)̂ to the vector in ℝ3 with the following spherical coordinate
representation with respect to (𝑢,̂ 𝑤): ̂
(𝑟, 𝜃, 𝜙) if 𝜃 ∈ {0, 𝜋},
(4.2.21) {
(𝑟, 𝜃, (𝜙 + 𝛾) mod 2𝜋) otherwise.
Then this map depends only on 𝑤̂ and 𝛾 and is independent of 𝑢.̂ It is denoted by Rot𝑤̂ (𝛾)
and is called the rotation about 𝑤̂ through the angle 𝛾. Also, 𝑤̂ and 𝛾 are called the axis
and the angle of this rotation, respectively.

Figure 4.2.4. Rotation of 𝑝 ⃗ about 𝑤̂ through the angle 𝛾.

Proof. We show that the map defined in the theorem is independent of 𝑢.̂ Let 𝑢̂′ be
another unit vector in ℝ3 that is orthogonal to 𝑤.̂ Denote the map in the theorem by
Rot. We show that Rot is the same map regardless of whether we use 𝑢̂ or 𝑢̂′ for its
definition. By Proposition 4.2.12, the spherical coordinate representation of 𝑢̂′ with
respect to (𝑢,̂ 𝑤)̂ is (1, 𝜋/2, 𝛿) with 𝛿 ∈ [0, 2𝜋[. Let 𝑝 ⃗ ∈ ℝ3 with spherical coordinate
representation (𝑟, 𝜃, 𝜙) with respect to (𝑢,̂ 𝑤).̂ If 𝜃 ∈ {0, 𝜋}, then by Proposition 4.2.12
the spherical coordinate representation of 𝑝 ⃗ with respect to (𝑢̂′ , 𝑤)̂ is also (𝑟, 𝜃, 𝜙). So
Rot(𝑝)⃗ = 𝑝,⃗ regardless of whether we choose 𝑢̂ or 𝑢̂′ for its definition. If 𝜃 ≠ 0, 𝜋, then
by Proposition 4.2.12 the spherical coordinate representation of 𝑝 ⃗ with respect to (𝑢̂′ , 𝑤)̂
is (𝑟, 𝜃, (𝜙−𝛿) mod 2𝜋). So if we use 𝑢̂′ to define Rot, we obtain (𝑟, 𝜃, (𝜙−𝛿+𝛾) mod 2𝜋)
as the spherical coordinate representation of Rot(𝑝)⃗ with respect to (𝑢̂′ , 𝑤).
̂ Proposition
4.2.12 shows that the spherical coordinate representation of this vector with respect to
(𝑢,̂ 𝑤)̂ is (𝑟, 𝜃, (𝜙 + 𝛾) mod 2𝜋). But this is the spherical coordinate representation of
Rot(𝑝)⃗ if we use 𝑢̂ to define Rot. □

Figure 4.2.4 shows that applying Rot𝑤̂ (𝛾) to 𝑝 ⃗ ∈ ℝ3 rotates this vector about the
axis 𝑤̂ counterclockwise through an angle 𝛾.
In the remainder of this section, we will prove the following theorem.
Theorem 4.2.22. The set of rotations in ℝ3 is SO(3).

We first determine the rotations about the axes 𝑥̂ = (1, 0, 0), 𝑦 ̂ = (0, 1, 0), and
𝑧 ̂ = (0, 0, 1) explicitly.
Proposition 4.2.23. Let 𝛾 ∈ ℝ. Then we have
1 0 0
(4.2.22) Rot𝑥̂ (𝛾) = (0 cos 𝛾 − sin 𝛾) ,
0 sin 𝛾 cos 𝛾
cos 𝛾 0 sin 𝛾
(4.2.23) Rot𝑦̂(𝛾) = ( 0 1 0 ),
− sin 𝛾 0 cos 𝛾
cos 𝛾 − sin 𝛾 0
(4.2.24) Rot𝑧̂ (𝛾) = ( sin 𝛾 cos 𝛾 0) .
0 0 1
Exercise 4.2.24. Prove Proposition 4.2.23.

Note that by Proposition 4.2.23, the rotations about the 𝑥-, 𝑦-, and 𝑧-axes are in
SO(3). The next proposition provides explicit formulas for all rotations in ℝ3 and shows
that all rotations are in SO(3).
Proposition 4.2.25. Let 𝐵 = (𝑢,̂ 𝑣,̂ 𝑤)̂ ∈ SO(3) and let 𝛾 ∈ ℝ. Then we have
(4.2.25) Rotᵆ̂ (𝛾) = 𝐵 Rot𝑥̂ (𝛾)𝐵−1 ,
(4.2.26) Rot𝑣̂ (𝛾) = 𝐵 Rot𝑦̂(𝛾)𝐵−1 ,
(4.2.27) Rot𝑤̂ (𝛾) = 𝐵 Rot𝑧̂ (𝛾)𝐵−1
and these rotation operators are in SO(3).

Proof. We first prove (4.2.27). Let 𝑝 ⃗ ∈ ℝ3 with spherical coordinates (𝑟, 𝜃, 𝜙) with
respect to (𝑢,̂ 𝑤).
̂ Then we have
(4.2.28) 𝑝 ⃗ = 𝑟𝐵(cos 𝜙 sin 𝜃, sin 𝜙 sin 𝜃, cos 𝜃).
Applying (4.2.24) and the trigonometric identities (A.5.2) and (A.5.5) we obtain
𝐵 Rot𝑧̂ (𝛾)𝐵 −1 𝑝 ⃗ = 𝑟𝐵 Rot𝑧̂ (𝛾)(cos 𝜙 sin 𝜃, sin 𝜙 sin 𝜃, cos 𝜃)
(4.2.29)
= 𝑟𝐵(cos(𝜙 + 𝛾) sin 𝜃, sin(𝜙 + 𝛾) sin 𝜃, cos 𝜃).
On the other hand, by (3.1.7) we have
(4.2.30) Rot𝑤̂ (𝛾)𝑝 ⃗ = 𝑟𝐵(cos(𝜙 + 𝛾) sin 𝜃, sin(𝜙 + 𝛾) sin 𝜃, cos 𝜃).
So (4.2.30) and (4.2.29) imply (4.2.27). Next, we prove (4.2.25). With the permutation
matrix
(4.2.31) 𝑃 = ( 0 0 1 ; 1 0 0 ; 0 1 0 )
we have
(4.2.32) 𝐵𝑃 = (𝑣,̂ 𝑤,̂ 𝑢)̂ ∈ SO(3)
and
(4.2.33) 𝑃 Rot𝑧̂ (𝛾)𝑃 −1 = Rot𝑥̂ (𝛾).
So it follows from (4.2.27), (4.2.32), and (4.2.33) that
Rotᵆ̂ (𝛾) = 𝐵𝑃 Rot𝑧̂ (𝛾)(𝐵𝑃)−1
(4.2.34)
= 𝐵(𝑃 Rot𝑧̂ (𝛾)𝑃 −1 )𝐵 −1 = 𝐵 Rot𝑥̂ (𝛾)𝐵−1 .
The identity (4.2.26) can be proved analogously. □

The following exercise is an application of Proposition 4.2.25.


Exercise 4.2.26. Let 𝑤̂ ∈ ℝ3 be a unit vector and let 𝛾 ∈ ℝ. Prove that
(4.2.35) Rot𝑤̂ (−𝛾) = Rot−𝑤̂ (𝛾).
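Proposition 4.2.25 gives a direct way to compute Rot𝑤̂ (𝛾) for an arbitrary axis. The following sketch, assuming NumPy, constructs 𝐵 Rot𝑧̂ (𝛾)𝐵−1 for an ad hoc choice of azimuth vector (the result is independent of that choice by Theorem 4.2.21) and checks the expected properties, including (4.2.35):

```python
import numpy as np

def rot_z(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])

def rot(w, g):
    """Rotation about the unit vector w through angle g, via
    Proposition 4.2.25: Rot_w(g) = B Rot_z(g) B^-1 with B = (u, v, w)."""
    w = np.asarray(w, float)
    # pick any unit vector u orthogonal to w (arbitrary choice)
    h = np.array([1.0, 0.0, 0.0]) if abs(w[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = h - np.dot(h, w) * w
    u /= np.linalg.norm(u)
    B = np.column_stack([u, np.cross(w, u), w])
    return B @ rot_z(g) @ B.T

w = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)
g = 0.7
R = rot(w, g)
assert np.allclose(R @ w, w)                        # w is fixed
assert np.allclose(R.T @ R, np.eye(3))              # orthogonal
assert np.isclose(np.linalg.det(R), 1.0)            # in SO(3)
assert np.isclose(np.trace(R), 1 + 2 * np.cos(g))   # rotation angle g
assert np.allclose(rot(-w, g), rot(w, -g))          # Exercise 4.2.26
```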

We now prove that every operator in SO(3) is a rotation and we explain how rota-
tions can be represented.

Proposition 4.2.27. Let 𝑂 ∈ SO(3). Then the following hold.


(1) We have 𝑂 = 𝐼3 if and only if 𝑂 = Rot𝑤̂ (𝛾) with an arbitrary unit vector 𝑤̂ ∈ ℝ3
and 𝛾 ∈ ℝ such that 𝛾 ≡ 0 mod 2𝜋.
(2) If 𝑂 ≠ 𝐼3 , then there is a unit vector 𝑤̂ ∈ ℝ3 such that ±𝑤̂ are the only unit eigen-
vectors of 𝑂 associated with the eigenvalue 1 and there is a modulo 2𝜋 uniquely
determined 𝛾 ∈ ℝ such that 𝑂 = Rot𝑤̂ (𝛾) = Rot−𝑤̂ (−𝛾).

Proof. Let 𝑤̂ ∈ ℝ3 be a unit vector and let 𝛾 ∈ ℝ.


If 𝛾 ≡ 0 mod 2𝜋, then Rot𝑤̂ (𝛾) = 𝐼3 by Theorem 4.2.21. Conversely, let Rot𝑤̂ (𝛾) =
𝐼3 . Let 𝑢̂ ∈ ℝ3 be a unit vector that is orthogonal to 𝑤.̂ Then the spherical coordinate
representation of 𝑢̂ with respect to (𝑢,̂ 𝑤)̂ is (1, 𝜋/2, 0). So by Theorem 4.2.21 the spher-
ical coordinate representation of Rot𝑤̂ (𝛾)𝑢̂ with respect to (𝑢,̂ 𝑤)̂ is (1, 𝜋/2, 𝛾 mod 2𝜋).
But since Rot𝑤̂ (𝛾) = 𝐼3 , it follows that Rot𝑤̂ (𝛾)𝑢̂ = 𝑢.̂ So 𝛾 ≡ 0 mod 2𝜋.
To prove the second assertion, assume that 𝑂 ≠ 𝐼3 . The characteristic polynomial
of 𝑂 has degree 3 and real coefficients. So it follows from Proposition B.7.21 and Exer-
cise A.4.54 that 𝑂 has only real eigenvalues or one real eigenvalue and a pair of complex
conjugate eigenvalues.
Since by Proposition 4.2.17 the orthogonal operator 𝑂 is length preserving, the
absolute value of these eigenvalues is 1. But because 𝑂 ≠ 𝐼3 , det 𝑂 = 1, and det 𝑂 is
the product of the three eigenvalues by Proposition 2.4.1, it follows that exactly one of
the eigenvalues 𝑂 is 1. This implies that the eigenspace associated with the eigenvalue
1 has dimension 1. Let 𝑤̂ be a unit eigenvector of 𝑂 associated with the eigenvalue 1.
So −𝑤̂ is the only other unit eigenvector of 𝑂 associated with the eigenvalue 1.
Let 𝐵 = (𝑢,̂ 𝑣,̂ 𝑤)̂ ∈ SO(3) which exists by Theorem 4.2.8. Set
(4.2.36) 𝑅 = 𝐵 −1 𝑂𝐵.
Now we show that 𝑅 = Rot𝑧̂ (𝛾) with 𝛾 ∈ ℝ. Then 𝑂 = Rot𝑤̂ (𝛾) by Proposition 4.2.25.
It follows from (4.2.36) that 𝑤̂ = 𝑂𝑤̂ is the last column of 𝐵𝑅. This implies that
𝑎 𝑏 0
(4.2.37) 𝑅 = (𝑐 𝑑 0)
𝑒 𝑓 1
with real entries 𝑎, 𝑏, 𝑐, 𝑑, 𝑒, 𝑓. Since 𝑅 is orthogonal, Proposition 4.2.17 implies that
the columns of 𝑅 are orthogonal to each other. Hence, we have 𝑒 = 𝑓 = 0. Also, since
𝑅 is orthogonal with det 𝑅 = det 𝑂 = 1, we have
𝑎 −𝑏 0
(4.2.38) 𝑅 = (𝑏 𝑎 0)
0 0 1
and
(4.2.39) 𝑎2 + 𝑏2 = 1.
Therefore, Lemma 3.1.9 implies that 𝑅 = Rot𝑧̂ (𝛾) with 𝛾 ∈ ℝ.
We conclude the proof by showing the uniqueness statements for 𝑤̂ and 𝛾. So
let 𝑤̂ ′ ∈ ℝ3 be a unit vector, and let 𝛾′ ∈ ℝ be such that 𝑂 = Rot𝑤̂ ′ (𝛾′ ). Then we

have 𝑂𝑤̂ ′ = Rot𝑤̂ ′ (𝛾′ )𝑤̂ ′ = 𝑤̂ ′ . So 𝑤̂ ′ is a unit eigenvector of 𝑂 associated with the
eigenvalue 1. As seen above, this implies 𝑤̂ ′ = 𝑤̂ or 𝑤̂ ′ = −𝑤.̂ The uniqueness modulo
2𝜋 is proved in Exercise 4.2.28. □

Exercise 4.2.28. Prove the uniqueness of 𝛾 modulo 2𝜋 in Proposition 4.2.27.

Now we can prove Theorem 4.2.22. It follows from Proposition 4.2.25 that all rota-
tions of ℝ3 operators are in SO(3). Proposition 4.2.27 shows that all elements of SO(3)
are rotations of ℝ3 . Therefore, the set of all rotations of ℝ3 is indeed SO(3).

4.2.3. Decomposition of rotations. Next, we give two decomposition theo-


rems for rotations in ℝ3 . For this, we need the following lemma.

Lemma 4.2.29. Let 𝑢,̂ 𝑢̂′ , 𝑤̂ ∈ ℝ3 be unit vectors such that 𝑢̂ and 𝑢̂′ are orthogonal to 𝑤.̂
Then there is a modulo 2𝜋 uniquely determined 𝛿 ∈ ℝ such that Rot𝑤̂ (𝛿)𝑢̂ = 𝑢̂′ .

Proof. It follows from Proposition 4.2.12 that the spherical coordinate representation
of 𝑢̂′ with respect to (𝑢,̂ 𝑤)̂ is (1, 𝜋/2, 𝛿) with a modulo 2𝜋 uniquely determined 𝛿 ∈ ℝ.
Therefore, the definition of rotations in Theorem 4.2.21 implies the assertion. □

Here is our first decomposition theorem.

Theorem 4.2.30. For every 𝑂 ∈ SO(3) there are 𝛼, 𝛽, 𝛾 ∈ ℝ such that


(4.2.40) 𝑂 = Rot𝑧̂ (𝛼) Rot𝑦̂(𝛽) Rot𝑧̂ (𝛾).
The real numbers 𝛼, 𝛽, 𝛾 are called the Euler angles of 𝑂.

Proof. Let 𝑂 ∈ SO(3). Denote by 𝑥,̂ 𝑦′̂ , 𝑧′̂ the column vectors of 𝑂. If 𝑧′̂ = 𝑧,̂ then 𝑧 ̂ is
a unit eigenvector of 𝑂 associated with the eigenvalue 1 and, as shown in the proof of
Proposition 4.2.27, we have 𝑂 = Rot𝑧̂ (𝛾) with 𝛾 ∈ ℝ. So, if we set 𝛼 = 𝛽 = 0, then
(4.2.40) holds. If 𝑧′̂ = −𝑧,̂ then Rot𝑦̂(𝜋)𝑂 ∈ SO(3) fixes 𝑧,̂ so Rot𝑦̂(𝜋)𝑂 = Rot𝑧̂ (𝛾) with
𝛾 ∈ ℝ. Since Rot𝑦̂(𝜋)−1 = Rot𝑦̂(−𝜋) = Rot𝑦̂(𝜋), (4.2.40) holds with 𝛼 = 0 and 𝛽 = 𝜋.
Assume that 𝑧′̂ ≠ ±𝑧.̂ The proof for this case is illustrated in Figure 4.2.5. The
intersection of the plane orthogonal to 𝑧 ̂ (spanned by 𝑥̂ and 𝑦)̂ and the plane orthogonal
to 𝑧′̂ is a line, that is, a one-dimensional subspace of ℝ3 . Denote by 𝑣 ̂ a unit vector that generates
this space. Both 𝑦 ̂ and 𝑣 ̂ are orthogonal to 𝑧.̂ By Lemma 4.2.29 we can choose 𝛼 ∈ ℝ
such that Rot𝑧̂ (𝛼)𝑦 ̂ = 𝑣.̂ This rotation does not change 𝑧 ̂ and maps 𝑥̂ to some unit
vector 𝑥1̂ ∈ ℝ3 . If we apply this rotation to the standard basis 𝐼3 of ℝ3 , we obtain the
orthonormal basis
(4.2.41) 𝐵1 = Rot𝑧̂ (𝛼)𝐼3 = (𝑥1̂ , 𝑣,̂ 𝑧)̂ ∈ SO(3).
Next, we observe that 𝑧 ̂ and 𝑧′̂ are orthogonal to 𝑣.̂ By Lemma 4.2.29 we can choose
𝛽 ∈ ℝ so that Rot𝑣̂ (𝛽)𝑧 ̂ = 𝑧′̂ . This rotation does not change 𝑣 ̂ and maps 𝑥1̂ to some
unit vector 𝑥2̂ ∈ ℝ3 . By Proposition 4.2.25 and (4.2.41) we can write this rotation as
(4.2.42) Rot𝑣̂ (𝛽) = 𝐵1 Rot𝑦̂(𝛽)𝐵1−1 = Rot𝑧̂ (𝛼) Rot𝑦̂(𝛽)𝐵1−1 .
Applying this rotation to 𝐵1 we obtain the basis
(4.2.43) 𝐵2 = Rot𝑣̂ (𝛽)𝐵1 = Rot𝑧̂ (𝛼) Rot𝑦̂(𝛽)𝐼3 = (𝑥2̂ , 𝑣,̂ 𝑧′̂ ) ∈ SO(3).

Figure 4.2.5. Euler angles.

Finally, we note that 𝑣 ̂ and 𝑦′̂ are both orthogonal to 𝑧′̂ . By Lemma 4.2.29 we can choose
𝛾 ∈ ℝ such that Rot𝑧′̂ (𝛾)𝑣 ̂ = 𝑦′̂ . This rotation does not change 𝑧′̂ and maps 𝑥2̂ to some
unit vector 𝑥3̂ . By Proposition 4.2.25 and (4.2.43) this rotation is
(4.2.44) Rot𝑧′̂ (𝛾) = 𝐵2 Rot𝑧̂ (𝛼)𝐵2−1 = Rot𝑧̂ (𝛼) Rot𝑦̂(𝛽) Rot𝑧̂ (𝛾)𝐵2−1 .
Applying this rotation to 𝐵2 we obtain the basis
(4.2.45) 𝐵3 = Rot𝑧̂ (𝛼) Rot𝑦̂(𝛽) Rot𝑧̂ (𝛾) = (𝑥3̂ , 𝑦′̂ , 𝑧′̂ ) ∈ SO(3).
Since 𝑂 = (𝑥,̂ 𝑦′̂ , 𝑧′̂ ) ∈ SO(3), it follows from Theorem 4.2.8 that 𝑥3̂ = 𝑥̂ and 𝑂 = 𝐵3 .
This concludes the proof. □
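In the generic case, the Euler angles of Theorem 4.2.30 can also be computed numerically. The sketch below assumes NumPy and the convention Rot𝑦̂(𝛾) with sin 𝛾 in the upper-right entry; the angle-extraction formulas are standard for the ZYZ decomposition but are our addition, not from the text:

```python
import numpy as np

def rot_z(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])

def rot_y(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, 0, s], [0, 1.0, 0], [-s, 0, c]])

def euler_zyz(O):
    """Angles with O = rot_z(alpha) @ rot_y(beta) @ rot_z(gamma)
    (Theorem 4.2.30); assumes the generic case O[2, 2] != +-1."""
    beta = np.arccos(np.clip(O[2, 2], -1.0, 1.0))
    alpha = np.arctan2(O[1, 2], O[0, 2])
    gamma = np.arctan2(O[2, 1], -O[2, 0])
    return alpha, beta, gamma

O = rot_z(0.4) @ rot_y(1.1) @ rot_z(-0.8)
alpha, beta, gamma = euler_zyz(O)
assert np.allclose(rot_z(alpha) @ rot_y(beta) @ rot_z(gamma), O)
```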

The next exercise generalizes Theorem 4.2.30.


Exercise 4.2.31. Let 𝑣,̂ 𝑤̂ ∈ ℝ3 be unit vectors that are orthogonal to each other. Show
that for all 𝑂 ∈ SO(3) there are 𝛼, 𝛽, 𝛾 ∈ ℝ such that
(4.2.46) 𝑂 = Rot𝑤̂ (𝛼) Rot𝑣̂ (𝛽) Rot𝑤̂ (𝛾).

If in Exercise 4.2.31 the rotation axes 𝑣 ̂ and 𝑤̂ are not orthogonal, then, in general,
a decomposition (4.2.46) does not exist. However, we can prove a weaker result, for
which we need the following lemma.
Lemma 4.2.32. Let 𝑤,̂ 𝑤̂ ′ ∈ ℝ3 be unit vectors, and let 𝛾 ∈ ℝ. Also, let 𝑂 ∈ SO(3) with
𝑂𝑤̂ = 𝑤̂ ′ . Then
(4.2.47) Rot𝑤̂ ′ (𝛾) = 𝑂 Rot𝑤̂ (𝛾)𝑂−1 .

Proof. Let 𝐵 = (𝑢,̂ 𝑣,̂ 𝑤)̂ ∈ SO(3). This matrix exists by Theorem 4.2.8. Set 𝐵′ = 𝑂𝐵 =
(𝑢̂′ , 𝑣′̂ , 𝑤̂ ′ ). Then it follows from Proposition 4.2.25 that Rot𝑤̂ ′ (𝛾) = 𝐵 ′ Rot𝑧̂ (𝛾)(𝐵 ′ )−1 =
𝑂𝐵 Rot𝑧̂ (𝛾)𝐵−1 𝑂−1 = 𝑂𝑅𝑤̂ (𝛾)𝑂−1 . □

Here is our second decomposition theorem with respect to nonparallel rotation


axes that are not required to be orthogonal to each other.
Theorem 4.2.33. Let 𝑎,̂ 𝑏 ̂ ∈ ℝ3 be nonparallel unit vectors. Denote by 𝜑 the angle be-
tween 𝑎̂ and 𝑏.̂ Then for all 𝑂 ∈ SO(3) there are 𝑘 ∈ ℕ and 𝛼1 , . . . , 𝛼𝑘 , 𝛽1 , . . . , 𝛽 𝑘 ∈ ℝ
such that 𝑘 = O(1/𝜑) and
𝑘
(4.2.48) 𝑂 = ∏ Rot𝑎̂ (𝛼𝑖 ) Rot𝑏̂ (𝛽 𝑖 ).
𝑖=1

Proof. Let 𝑂 ∈ SO(3). Then by Proposition 4.2.27 we can write


(4.2.49) 𝑂 = Rot𝑤̂ (𝛾)
3
with a unit vector 𝑤̂ ∈ ℝ and 𝛾 ∈ ℝ. We will show that there are 𝑙 ∈ ℕ and real
numbers 𝛼1 , . . . , 𝛼𝑙 , 𝛽1 , . . . , 𝛽 𝑙 such that for
𝑙
(4.2.50) 𝑂1 = ∏ Rot𝑎̂ (𝛼𝑖 ) Rot𝑏̂ (𝛽 𝑖 )
𝑖=1

we have 𝑏 ̂ = 𝑂1 𝑤.̂ It then follows from Lemma 4.2.32 that


(4.2.51) 𝑂 = 𝑂1−1 Rot𝑏̂ (𝛾)𝑂1

which is a decomposition as in (4.2.48). We will also show that 𝑙 = O(1/𝜑). Then the
theorem is proved. In order to keep the proof as simple as possible, we will give geo-
metric arguments. They can be verified algebraically using the terminology introduced
so far.
First, we observe that a rotation about 𝑎̂ brings 𝑤̂ into the plane 𝑃 spanned by 𝑎̂
and 𝑏.̂ So, we assume that 𝑤̂ is in this plane. Next, we may assume that the initial
positions of the three vectors 𝑎,̂ 𝑏,̂ and 𝑤̂ are as shown in Figure 4.2.6. Therefore, they
are in the half-plane above the dashed line orthogonal to 𝑎̂ and also in the half-plane
to the right of 𝑎.̂ This can be achieved as follows. Vectors that are below the dashed
line are multiplied by −1. This is justified by Exercise 4.2.26. If 𝑏 ̂ is on the wrong side
of 𝑎,̂ then we exchange 𝑎̂ and 𝑏.̂ Also, if 𝑤̂ is on the wrong side of 𝑎,̂ then we apply a
rotation about 𝑎̂ through an angle of 𝜋 to 𝑤.̂

Figure 4.2.6. The two cases in the proof of Theorem 4.2.33 for possible initial positions
of the vectors 𝑎,̂ 𝑏,̂ and 𝑤.̂

In the situation shown in Figure 4.2.6 there are two possible cases. In Case 1, the
vector 𝑤̂ is on the left of 𝑏,̂ and in Case 2, it is on the right of 𝑏.̂ We prove the assertion
in both cases.
The proof in the first case is illustrated in Figure 4.2.7. In this case, 𝑤̂ is in the
half-plane to the left of 𝑏 ̂ and the angle 𝜃 between 𝑎̂ and 𝑤̂ is at most as big as the angle
𝜑 between 𝑎̂ and 𝑏.̂ If 𝜃 = 𝜑, then we are done. Therefore, we assume that 𝜃 < 𝜑.
Suppose that we rotate 𝑤̂ about 𝑏 ̂ through an angle of 𝛽 ∈ [0, 𝜋]. Denote the rotated
vector by 𝑤(𝛽)̂ and the angle between 𝑤(𝛽)̂ and 𝑎̂ by 𝜃(𝛽). Then 𝜃(0) = 𝜃 < 𝜑 and
𝜃(𝜋) > 𝜑. Since the function [0, 𝜋] → ℝ, 𝛽 ↦ 𝜃(𝛽) is continuous, it follows from the
intermediate value theorem that there is 𝛽 ∈ [0, 𝜋] such that 𝜃(𝛽) = 𝜑. We apply the
rotation Rot𝑏̂ (𝛽) to 𝑤̂ and obtain 𝑤̂ ′ such that the angle between 𝑎̂ and 𝑤̂ ′ is equal to
the angle between 𝑎̂ and 𝑏.̂ A rotation of 𝑤̂ ′ about 𝑎̂ through some angle 𝛼 ∈ ℝ sends
𝑤̂ ′ to 𝑏.̂

Figure 4.2.7. Illustration of the proof of Theorem 4.2.33 in Case 1.

Now we turn to the second case where 𝑤̂ is in the half-plane on the right side of 𝑏.̂
We show how to use rotations of 𝑤̂ about 𝑎̂ and 𝑏 ̂ to obtain Case 1. This construction is
illustrated in Figure 4.2.8. We set 𝑤̂ 0 = 𝑤̂ and construct a finite sequence 𝑤̂ 1 , . . . , 𝑤̂ 𝑚 ,
𝑚 ∈ ℕ, such that 𝑤̂ 𝑖 is obtained from 𝑤̂ 𝑖−1 by rotations about 𝑎̂ and 𝑏 ̂ and 𝑤̂ 𝑚 is for
the first time between 𝑎̂ and 𝑏.̂ For 𝑖 ∈ {0, . . . , 𝑚} we denote by 𝛼𝑖 the angle between 𝑎̂
and 𝑤̂ 𝑖 and by 𝛽 𝑖 the angle between 𝑤̂ 𝑖 and 𝑏.̂ Furthermore, we denote by 𝜑 the angle
between 𝑎̂ and 𝑏.̂ To construct 𝑤̂ 1 from 𝑤̂ 0 , we rotate 𝑤̂ 0 about 𝑏 ̂ through an angle
𝜋. If 𝛽0 ≤ 𝜑, then 𝑤̂ 1 is between 𝑎̂ and 𝑏 ̂ and we are in Case 1. In Figure 4.2.8 this
is not the case. If 𝛽0 > 𝜑, then we have 𝛼1 = 𝛽1 − 𝜑 = 𝛽0 − 𝜑. Since 𝛽0 < 𝛼0 , it
follows that 𝛼1 < 𝛼0 − 𝜑. Next, we construct 𝑤̂ 2 by a rotation of 𝑤̂ 1 about 𝑎̂ through

Figure 4.2.8. Illustration of the proof of Theorem 4.2.33 in Case 2.



an angle 𝜋. If 𝛼1 ≤ 𝜑, then 𝑤̂ 2 is between 𝑎̂ and 𝑏 ̂ and we are in Case 1. Otherwise,


we have 𝛽2 = 𝛼2 − 𝜑 = 𝛼1 − 𝜑 < 𝛼0 − 2𝜑. If we continue this construction, we obtain
𝛼2𝑖+1 < 𝛼0 − (2𝑖 + 1)𝜑 as long as 𝛽2𝑖 > 𝜑. Also, we obtain 𝛽2𝑖+2 < 𝛼0 − (2𝑖 + 2)𝜑 as
long as 𝛼2𝑖+1 > 𝜑. Since 0 ≤ 𝛼0 ≤ 𝜋/2, this implies that 𝑚 = O(1/𝜑). □

4.3. Rotation operators


In this section, we introduce rotation operators. We will show that applying such oper-
ators to a quantum state |𝜓⟩ in ℍ1 means applying a rotation to the corresponding point
𝑝(𝜓)
⃗ on the Bloch sphere. We will prove that the set of all these operators is the special
unitary group SU(2), i.e., the group of all unitary operators on ℍ1 with determinant
1, which was introduced in Theorem 2.4.20. We will also construct an isomorphism
between SU(2)/{±𝐼} and the group SO(3) of rotations in ℝ3 . This allows us to obtain
decomposition theorems for rotation operators from Theorems 4.2.30 and 4.2.33.

4.3.1. Basics. This section introduces rotation operators.


Let
(4.3.1) 𝜎 = (𝑋, 𝑌 , 𝑍)
be the triplet of Pauli operators. For all 𝑝 ⃗ = (𝑝𝑥 , 𝑝𝑦 , 𝑝𝑧 ) ∈ ℝ3 set
(4.3.2) 𝑝 ⃗ ⋅ 𝜎 = 𝑝𝑥 𝑋 + 𝑝𝑦 𝑌 + 𝑝𝑧 𝑍.
Example 4.3.1. We have 𝑥̂ ⋅ 𝜎 = (1, 0, 0) ⋅ 𝜎 = 𝑋, 𝑦 ̂ ⋅ 𝜎 = (0, 1, 0) ⋅ 𝜎 = 𝑌 , and
𝑧 ̂ ⋅ 𝜎 = (0, 0, 1) ⋅ 𝜎 = 𝑍.

For the definition of rotation operators, we need the following proposition.


Proposition 4.3.2. Let 𝑝 ̂ ∈ ℝ3 be a unit vector. Then 𝑝 ̂ ⋅ 𝜎 is a Hermitian unitary
involution with trace 0 and eigenvalues ±1.

Proof. We use Theorem 4.1.2 and obtain the following. Since the Pauli operators are
Hermitian operators with trace 0, the operator 𝑝 ̂ ⋅ 𝜎 is also Hermitian and has trace 0.
Let 𝑝 ̂ = (𝑝𝑥 , 𝑝𝑦 , 𝑝𝑧 ). Due to ‖𝑝‖̂ = 1 we have
(𝑝 ̂ ⋅ 𝜎)² = (𝑝𝑥 𝑋 + 𝑝𝑦 𝑌 + 𝑝𝑧 𝑍)²
= 𝑝𝑥²𝑋² + 𝑝𝑦²𝑌² + 𝑝𝑧²𝑍² + 𝑝𝑥 𝑝𝑦 (𝑋𝑌 + 𝑌 𝑋)
+ 𝑝𝑥 𝑝𝑧 (𝑋𝑍 + 𝑍𝑋) + 𝑝𝑦 𝑝𝑧 (𝑌 𝑍 + 𝑍𝑌 )
= (𝑝𝑥² + 𝑝𝑦² + 𝑝𝑧²)𝐼 = 𝐼.
So 𝑝 ̂ ⋅ 𝜎 is an involution. But Hermitian involutions are unitary. Also, since 𝑝 ̂ ⋅ 𝜎 is a
Hermitian involution of trace 0, this operator is diagonalizable by Theorem 2.4.53 and
its eigenvalues are in {±1} by Proposition 2.4.60. But since (𝐼, 𝑋, 𝑌 , 𝑍) is a ℂ-basis of
End(ℍ1 ) by Proposition 4.1.4, it follows that 𝑝 ̂ ⋅ 𝜎 ≠ 𝐼. So, the set of eigenvalues of 𝑝 ̂ ⋅ 𝜎
is {±1}. □
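Proposition 4.3.2 can also be checked numerically. The following NumPy sketch (an illustration; all names are ad hoc) verifies the stated properties for a random unit vector 𝑝̂:

```python
import numpy as np

# The Pauli operators as matrices with respect to the computational basis.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

rng = np.random.default_rng(0)
p = rng.normal(size=3)
p /= np.linalg.norm(p)              # a random unit vector p-hat
P = p[0] * X + p[1] * Y + p[2] * Z  # p-hat . sigma as in (4.3.2)

assert np.allclose(P, P.conj().T)                   # Hermitian
assert np.allclose(P @ P, np.eye(2))                # involution, hence unitary
assert abs(np.trace(P)) < 1e-12                     # trace 0
assert np.allclose(np.linalg.eigvalsh(P), [-1, 1])  # eigenvalues -1 and +1
```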

From Proposition 4.3.2, Corollary 2.4.73, and Proposition 2.4.74 we obtain the fol-
lowing result.

Proposition 4.3.3. For all unit vectors 𝑤̂ ∈ ℝ3 and all 𝛾 ∈ ℝ


(4.3.3) 𝑒−𝑖𝛾𝑤̂⋅𝜎/2 = cos(𝛾/2) 𝐼 − 𝑖 sin(𝛾/2) 𝑤̂ ⋅ 𝜎
is a unitary operator on ℍ1 with determinant 1, i.e., in SU(2).

Proposition 4.3.3 justifies the following definition.


Definition 4.3.4. A rotation gate or rotation operator is an operator
(4.3.4) 𝑅𝑤̂ (𝛾) = 𝑒−𝑖𝛾𝑤̂⋅𝜎/2 = cos(𝛾/2) 𝐼 − 𝑖 sin(𝛾/2) 𝑤̂ ⋅ 𝜎
on ℍ1 where 𝑤̂ ∈ ℝ3 is a unit vector and 𝛾 ∈ ℝ.

The name “rotation operator” comes from the fact that applying the operator from
(4.3.4) to a quantum state in ℍ1 means applying the rotation Rot𝑤̂ (𝛾) to the correspond-
ing point on the Bloch sphere. This will be shown in Theorem 4.3.20. The next exercise
verifies this for the special case where 𝑤̂ = 𝑧 ̂ = (0, 0, 1).
Exercise 4.3.5. Show that for every 𝛾 ∈ ℝ and every quantum state |𝜓⟩ ∈ ℍ1 we have
(4.3.5) 𝑝⃗(𝑅𝑧̂ (𝛾) |𝜓⟩) = Rot𝑧̂ (𝛾)𝑝⃗(𝜓).

From Exercise 2.4.71 we obtain the following result.


Proposition 4.3.6. For all unit vectors 𝑤̂ ∈ ℝ3 and all 𝛽, 𝛾 ∈ ℝ we have
(4.3.6) 𝑅𝑤̂ (𝛽)𝑅𝑤̂ (𝛾) = 𝑅𝑤̂ (𝛽 + 𝛾).

We define special rotations.


Definition 4.3.7. Let 𝛾 ∈ ℝ. The rotation operators about the 𝑥-, 𝑦-, and 𝑧-axes through
the angle 𝛾 are defined as
(4.3.7) 𝑅𝑥̂ (𝛾) = 𝑒−𝑖𝛾𝑋/2 , 𝑅𝑦̂(𝛾) = 𝑒−𝑖𝛾𝑌 /2 , 𝑅𝑧̂ (𝛾) = 𝑒−𝑖𝛾𝑍/2 ,
respectively.

Here is another representation of the rotation operators that we have just intro-
duced.
Proposition 4.3.8. Let 𝛾 ∈ ℝ. Then we have
(4.3.8) 𝑅𝑥̂ (𝛾) = ( cos(𝛾/2)    −𝑖 sin(𝛾/2) )
                ( −𝑖 sin(𝛾/2)   cos(𝛾/2) ),
(4.3.9) 𝑅𝑦̂(𝛾) = ( cos(𝛾/2)   − sin(𝛾/2) )
                ( sin(𝛾/2)    cos(𝛾/2) ),
(4.3.10) 𝑅𝑧̂ (𝛾) = ( 𝑒−𝑖𝛾/2   0 )
                 ( 0        𝑒𝑖𝛾/2 ).
Exercise 4.3.9. Prove Proposition 4.3.8.
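The matrices in Proposition 4.3.8 can also be compared numerically with the operator exponential from (4.3.3). The following NumPy sketch (an illustration; the helper exp_minus_i computes 𝑒−𝑖𝑡𝐴 for Hermitian 𝐴 via its spectral decomposition) does this for a sample angle:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def exp_minus_i(A, t):
    """e^{-i t A} for Hermitian A, via the spectral decomposition."""
    evals, V = np.linalg.eigh(A)
    return V @ np.diag(np.exp(-1j * t * evals)) @ V.conj().T

g = 0.7  # an arbitrary test angle
c, s = np.cos(g / 2), np.sin(g / 2)

Rx = np.array([[c, -1j * s], [-1j * s, c]])
Ry = np.array([[c, -s], [s, c]], dtype=complex)
Rz = np.array([[np.exp(-1j * g / 2), 0], [0, np.exp(1j * g / 2)]])

assert np.allclose(Rx, exp_minus_i(X, g / 2))  # (4.3.8)
assert np.allclose(Ry, exp_minus_i(Y, g / 2))  # (4.3.9)
assert np.allclose(Rz, exp_minus_i(Z, g / 2))  # (4.3.10)
```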

The next exercise shows that −𝑖𝑋, −𝑖𝑌 , −𝑖𝑍, and 𝑋𝐻 are rotation operators.
Exercise 4.3.10. Show that 𝑅𝑥̂ (𝜋) = −𝑖𝑋, 𝑅𝑦̂(𝜋) = −𝑖𝑌 , 𝑅𝑧̂ (𝜋) = −𝑖𝑍, and 𝐻 = 𝑋𝑅𝑦̂ (𝜋/2).

4.3.2. The group of rotation operators. Our next goal is to show that the set
of rotation operators on ℍ1 is SU(2). For this, we use the following characterization of
SU(2) which follows from Corollary 2.4.73:
(4.3.11) SU(2) = {𝑒𝑖𝐴 ∶ 𝐴 Hermitian and tr 𝐴 = 0}.

The following definition will simplify our discussion.

Definition 4.3.11. The set of all Hermitian operators on ℍ1 with trace 0 is denoted by
su(2).

We note that we slightly deviate from the standard notation in mathematics where
su(2) typically denotes the Lie algebra that consists of all 2×2 skew-Hermitian matrices
with trace 0.
The elements of su(2) can be characterized as follows.

Lemma 4.3.12. Let 𝐴 ∈ ℂ(2,2) . Then 𝐴 ∈ su(2) if and only if there are 𝑎 ∈ ℝ and 𝑏 ∈ ℂ
such that
𝑎 𝑏
(4.3.12) 𝐴=( ).
𝑏 −𝑎
Exercise 4.3.13. Prove Lemma 4.3.12.

The next proposition uses Lemma 4.3.12 to describe the structure of su(2).

Proposition 4.3.14. The set su(2) is a real three-dimensional vector space. The triplet
𝜎 = (𝑋, 𝑌 , 𝑍) of the three Pauli operators is an ℝ-basis of su(2) that is orthogonal with
respect to the Hilbert-Schmidt inner product.

Proof. Let 𝐴 ∈ su(2). By Lemma 4.3.12 we can write


(4.3.13) 𝐴 = (ℜ𝑏)𝑋 + (ℑ𝑏)𝑌 + 𝑎𝑍
where 𝑎 ∈ ℝ and 𝑏 ∈ ℂ. This implies that (𝑋, 𝑌 , 𝑍) is a generating system of su(2). But
it follows from Proposition 4.1.4 that the triplet (𝑋, 𝑌 , 𝑍) is linearly independent over
ℝ and orthogonal with respect to the Hilbert-Schmidt inner product. Therefore, su(2)
is a real three-dimensional vector space. □

We now show that the operators in SU(2) are rotation operators.

Theorem 4.3.15. The set of rotation operators on ℍ1 is SU(2). Moreover, if 𝑈 ∈ SU(2),


then the following hold.
(1) 𝑈 = 𝐼 if and only if 𝑈 = 𝑅𝑤̂ (𝛾) with a unit vector 𝑤̂ ∈ ℝ3 and 𝛾/2 ≡ 0 mod 2𝜋.
(2) 𝑈 = −𝐼 if and only if 𝑈 = 𝑅𝑤̂ (𝛾) with a unit vector 𝑤̂ ∈ ℝ3 and 𝛾/2 ≡ 𝜋 mod 2𝜋.
(3) Let 𝑈 ≠ ±𝐼.
(a) There are a unit vector 𝑤̂ ∈ ℝ3 and 𝛾 ∈ ℝ such that 𝑈 = 𝑅𝑤̂ (𝛾).
(b) If 𝑤̂ ′ ∈ ℝ3 is a unit vector and 𝛾′ ∈ ℝ, then 𝑈 = 𝑅𝑤̂ ′ (𝛾′ ) if and only if 𝑤̂ = 𝑤̂ ′
and 𝛾/2 ≡ 𝛾′ /2 mod 2𝜋 or 𝑤̂ = −𝑤̂ ′ and 𝛾/2 ≡ −𝛾′ /2 mod 2𝜋.

Proof. Let 𝑤̂ ∈ ℝ3 be a unit vector. It can be easily verified that 𝑅𝑤̂ (𝛾) = 𝐼 if 𝛾/2 ≡
0 mod 2𝜋. Also, by Proposition 4.1.4 the coefficients of a representation of 𝐼 as a linear
combination of 𝐼, 𝑋, 𝑌 , 𝑍 are uniquely determined. So if 𝐼 = 𝑅𝑤̂ (𝛾) with 𝛾 ∈ ℝ, then
it follows from (4.3.3) that we have cos 𝛾/2 = 1 and sin 𝛾/2 = 0. This implies 𝛾/2 ≡
0 mod 2𝜋. The second assertion can be proved analogously.
We prove the third statement. It follows from (4.3.11) that 𝑈 = 𝑒𝑖𝐴 with 𝐴 ∈ su(2).
By Proposition 4.3.14 there is a uniquely determined 𝑝 ⃗ ∈ ℝ3 such that 𝐴 = 𝑝 ⃗ ⋅ 𝜎. Since
𝑈 ≠ ±𝐼 it follows that 𝑝⃗ is nonzero. Set 𝛾 = 2‖𝑝⃗‖ mod 4𝜋 and 𝑤̂ = −𝑝⃗/‖𝑝⃗‖. Then 𝑤̂
is a unit vector, and we have 𝑈 = 𝑒−𝑖𝛾𝑤̂⋅𝜎/2 .
Next, let 𝑤̂ ′ ∈ ℝ3 be a unit vector and let 𝛾′ ∈ ℝ be such that 𝑈 = 𝑅𝑤̂ ′ (𝛾′ ).
Then cos 𝛾/2 ≠ ±1 and sin 𝛾/2 ≠ 0. The uniqueness of the coefficient of 𝐼 in (4.3.3)
implies 𝛾/2 ≡ ±𝛾′ /2 mod 2𝜋. If 𝛾/2 ≡ 𝛾′ /2 mod 2𝜋, then sin 𝛾/2 = sin 𝛾′ /2 and due
to the uniqueness of the coefficients of 𝑋, 𝑌 , and 𝑍 in (4.3.3) we have 𝑤̂ ′ = 𝑤.̂ If
𝛾/2 ≡ −𝛾′ /2 mod 2𝜋, then sin 𝛾/2 = − sin 𝛾′ /2 and because of the uniqueness of the
coefficients of 𝑋, 𝑌 , and 𝑍 in (4.3.3) we have 𝑤̂ ′ = −𝑤̂. □

Theorem 4.3.15 implies that, up to a global phase factor, every unitary operator on
ℍ1 is a rotation operator. This is what the next corollary says.
Corollary 4.3.16. Let 𝑈 ∈ U(2). Then there is 𝛿 ∈ ℝ such that 𝑒−𝑖𝛿 𝑈 is a rotation
operator on ℍ1 .

Proof. Since | det 𝑈| = 1 we can choose 𝛿 ∈ ℝ such that det 𝑈 = 𝑒𝑖2𝛿 . So det(𝑒−𝑖𝛿 𝑈) =
1 which implies that 𝑒−𝑖𝛿 𝑈 ∈ SU(2). So by Theorem 4.3.15, 𝑒−𝑖𝛿 𝑈 is a rotation operator. □
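The proofs of Theorem 4.3.15 and Corollary 4.3.16 are constructive, and the construction can be sketched in NumPy as follows (an illustration with ad hoc names; it assumes 𝑈 is not a scalar multiple of 𝐼):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I = np.eye(2, dtype=complex)

def rotation_data(U):
    """For U in U(2), return (delta, w, gamma) with
    U = e^{i delta} R_w(gamma), following Corollary 4.3.16 and (4.3.3)."""
    delta = np.angle(np.linalg.det(U)) / 2
    V = np.exp(-1j * delta) * U               # now det V = 1
    cos_half = np.trace(V).real / 2           # coefficient of I in (4.3.3)
    n = np.array([np.trace(V @ P) for P in (X, Y, Z)]) / (-2j)
    n = n.real                                # equals sin(gamma/2) * w-hat
    gamma = 2 * np.arctan2(np.linalg.norm(n), cos_half)
    w = n / np.linalg.norm(n)
    return delta, w, gamma

def R(w, gamma):
    """Rotation operator (4.3.4)."""
    ws = w[0] * X + w[1] * Y + w[2] * Z
    return np.cos(gamma / 2) * I - 1j * np.sin(gamma / 2) * ws

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard, a U(2) example
delta, w, gamma = rotation_data(H)
assert np.allclose(np.exp(1j * delta) * R(w, gamma), H)
```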

Theorem 4.3.15 also implies the next corollary which, in turn, allows us to assign a
uniquely determined rotation of ℝ3 to each rotation operator on ℍ1 in the most obvious
way.
Corollary 4.3.17. Let 𝑈 ∈ SU(2). Then for all unit vectors 𝑤̂ ∈ ℝ3 and 𝛾 ∈ ℝ such that
𝑈 = 𝑅𝑤̂ (𝛾) the rotation Rot𝑤̂ (𝛾) is the same.
Exercise 4.3.18. Use Theorem 4.3.15 and Proposition 4.2.27 to prove Corollary 4.3.17.
Corollary 4.3.17 justifies the following definition.
Definition 4.3.19. Let 𝑈 ∈ SU(2) and let 𝑈 = 𝑅𝑤̂ (𝛾) with a unit vector 𝑤̂ ∈ ℝ3 and
𝛾 ∈ ℝ. Then we set Rot(𝑈) = Rot𝑤̂ (𝛾).
4.3.3. Rotation operators and rotations on the Bloch sphere. After the
preparations of the preceding section, we are now ready to prove the following im-
portant theorem.
Theorem 4.3.20. The map
(4.3.14) Rot ∶ SU(2) → SO(3), 𝑈 ↦ Rot(𝑈)
is a surjective group homomorphism with kernel ±𝐼. Furthermore, for all 𝑈 ∈ SU(2) and
all quantum states |𝜓⟩ in ℍ1 the point on the Bloch sphere corresponding to 𝑈 |𝜓⟩ is
(4.3.15) 𝑝⃗(𝑈 |𝜓⟩) = Rot(𝑈)𝑝⃗(𝜓).

We start by verifying the surjectivity of Rot. Let 𝑂 ∈ SO(3). By Proposition 4.2.27


we have 𝑂 = Rot𝑤̂ (𝛾) with a unit vector 𝑤̂ ∈ ℝ3 and 𝛾 ∈ ℝ. Set 𝑈 = 𝑅𝑤̂ (𝛾). Then
𝑈 ∈ SU(2) by Proposition 4.3.3 and 𝑂 = Rot(𝑈) by Definition 4.3.19.
In order to prove the other properties of Rot, we need some notation and several
auxiliary results, which we present now.
Definition 4.3.21. Let 𝜏 = (𝜏ᵆ , 𝜏𝑣 , 𝜏𝑤 ) ∈ su(2)3 .
(1) For all 𝑝 ⃗ = (𝑝ᵆ , 𝑝𝑣 , 𝑝𝑤 ) ∈ ℝ3 we set
(4.3.16) 𝑝 ⃗ ⋅ 𝜏 = 𝑝ᵆ 𝜏ᵆ + 𝑝𝑣 𝜏𝑣 + 𝑝𝑤 𝜏𝑤 .
(2) If 𝐵 = (𝑢,̂ 𝑣,̂ 𝑤)̂ ∈ ℝ(3,3) , then we define
(4.3.17) 𝐵 ⋅ 𝜏 = (𝑢̂ ⋅ 𝜏, 𝑣 ̂ ⋅ 𝜏, 𝑤̂ ⋅ 𝜏).
Lemma 4.3.22. Let 𝜏 ∈ su(2)3 , 𝐵 ∈ ℝ(3,3) , and 𝑝 ⃗ ∈ ℝ3 . Then we have
(4.3.18) (𝐵𝑝)⃗ ⋅ 𝜏 = 𝑝 ⃗ ⋅ (𝐵 ⋅ 𝜏).
Exercise 4.3.23. Prove Lemma 4.3.22.
Lemma 4.3.24. Let 𝑝,⃗ 𝑞 ⃗ ∈ ℝ3 . Then we have
(4.3.19) (𝑝⃗ ⋅ 𝜎)(𝑞⃗ ⋅ 𝜎) = ⟨𝑝⃗|𝑞⃗⟩𝐼 + 𝑖 (𝑝⃗ × 𝑞⃗) ⋅ 𝜎.

Proof. Let 𝑝 ⃗ = (𝑝𝑥 , 𝑝𝑦 , 𝑝𝑧 ), 𝑞 ⃗ = (𝑞𝑥 , 𝑞𝑦 , 𝑞𝑧 ) ∈ ℝ3 . Then we obtain from Theorem 4.1.2


(𝑝⃗ ⋅ 𝜎)(𝑞⃗ ⋅ 𝜎) = (𝑝𝑥 𝑋 + 𝑝𝑦 𝑌 + 𝑝𝑧 𝑍)(𝑞𝑥 𝑋 + 𝑞𝑦 𝑌 + 𝑞𝑧 𝑍)
= 𝑝𝑥 𝑞𝑥 𝑋² + 𝑝𝑦 𝑞𝑦 𝑌² + 𝑝𝑧 𝑞𝑧 𝑍²
+ 𝑝𝑥 𝑞𝑦 𝑋𝑌 + 𝑝𝑦 𝑞𝑥 𝑌 𝑋 + 𝑝𝑥 𝑞𝑧 𝑋𝑍 + 𝑝𝑧 𝑞𝑥 𝑍𝑋 + 𝑝𝑦 𝑞𝑧 𝑌 𝑍 + 𝑝𝑧 𝑞𝑦 𝑍𝑌
= ⟨𝑝⃗|𝑞⃗⟩𝐼 + 𝑖(𝑋(𝑝𝑦 𝑞𝑧 − 𝑝𝑧 𝑞𝑦 ) + 𝑌 (𝑝𝑧 𝑞𝑥 − 𝑝𝑥 𝑞𝑧 ) + 𝑍(𝑝𝑥 𝑞𝑦 − 𝑝𝑦 𝑞𝑥 ))
= ⟨𝑝⃗|𝑞⃗⟩𝐼 + 𝑖 (𝑝⃗ × 𝑞⃗) ⋅ 𝜎.
This proves the assertion. □
Proposition 4.3.25. Let 𝐵 ∈ SO(3) and write 𝜏 = (𝜏ᵆ , 𝜏𝑣 , 𝜏𝑤 ) = 𝐵 ⋅ 𝜎. Then we have
(4.3.20) 𝜏ᵆ² = 𝜏𝑣² = 𝜏𝑤² = 𝐼,
(4.3.21) 𝜏ᵆ 𝜏𝑣 = 𝑖𝜏𝑤 = −𝜏𝑣 𝜏ᵆ , 𝜏𝑤 𝜏ᵆ = 𝑖𝜏𝑣 = −𝜏ᵆ 𝜏𝑤 , 𝜏𝑣 𝜏𝑤 = 𝑖𝜏ᵆ = −𝜏𝑤 𝜏𝑣 ,
and
(4.3.22) −𝑖𝜏ᵆ 𝜏𝑣 𝜏𝑤 = 𝐼.

Proof. First, (4.3.20) follows from Proposition 4.3.2. Let 𝐵 = (𝑢,̂ 𝑣,̂ 𝑤)̂ with column
vectors 𝑢,̂ 𝑣,̂ 𝑤̂ ∈ ℝ3 . From Lemma 4.3.24 and Theorem 4.2.8 we obtain
(4.3.23) 𝜏ᵆ 𝜏𝑣 = (𝑢̂ ⋅ 𝜎)(𝑣 ̂ ⋅ 𝜎) = ⟨𝑢|̂ 𝑣⟩̂ 𝐼 + 𝑖(𝑢̂ × 𝑣)̂ ⋅ 𝜎 = 𝑖 𝑤̂ ⋅ 𝜎 = 𝑖𝜏𝑤 .

The other identities in (4.3.21) can be proved analogously. Finally, from (4.3.20) and
(4.3.21) we obtain
(4.3.24) −𝑖𝜏ᵆ 𝜏𝑣 𝜏𝑤 = (−𝑖)𝑖𝜏𝑤 𝜏𝑤 = 𝐼. □

Lemma 4.3.26. Let 𝑈 ∈ SU(2) and let 𝑝 ⃗ ∈ ℝ3 . Then we have


(4.3.25) (Rot(𝑈)𝑝)⃗ ⋅ 𝜎 = 𝑈(𝑝 ⃗ ⋅ 𝜎)𝑈 −1 .

Proof. Let 𝑈 = 𝑅𝑤̂ (𝛾) with a unit vector 𝑤̂ ∈ ℝ3 and 𝛾 ∈ ℝ which exist by Theorem
4.3.15. Let 𝐵 = (𝑢,̂ 𝑣,̂ 𝑤)̂ ∈ SO(3) which exists by Theorem 4.2.8. Set
(4.3.26) 𝜏 = (𝜏ᵆ , 𝜏𝑣 , 𝜏𝑤 ) = 𝐵 ⋅ 𝜎.
−1
Also let 𝑞 ⃗ = 𝐵 𝑝.⃗ Then Proposition 4.2.25 and Lemma 4.3.22 imply
(Rot(𝑈)𝑝)⃗ ⋅ 𝜎 = (Rot𝑤̂ (𝛾)𝑝)⃗ ⋅ 𝜎 = (𝐵 Rot𝑧̂ (𝛾)𝐵−1 𝑝)⃗ ⋅ 𝜎
(4.3.27)
= (𝐵 Rot𝑧̂ (𝛾)𝑞)⃗ ⋅ 𝜎 = (Rot𝑧̂ (𝛾)𝑞)⃗ ⋅ 𝜏
and
(4.3.28) 𝑈(𝑝 ⃗ ⋅ 𝜎)𝑈 −1 = 𝑈(𝐵 𝑞 ⃗ ⋅ 𝜎)𝑈 −1 = 𝑈(𝑞 ⃗ ⋅ 𝜏)𝑈 −1 .
So it suffices to show that
(4.3.29) (Rot𝑧̂ (𝛾)𝑞)⃗ ⋅ 𝜏 = 𝑈(𝑞 ⃗ ⋅ 𝜏)𝑈 −1 .
Since the expressions on the left and right side of (4.3.29) are linear in 𝑞,⃗ it suffices to
prove this identity for 𝑞⃗ ∈ {𝑥̂, 𝑦̂, 𝑧̂}. This is done in Exercise 4.3.27. □
Exercise 4.3.27. Verify (4.3.29) in the proof of Lemma 4.3.26 for 𝑞 ⃗ = 𝑥̂ = (1, 0, 0),
𝑞 ⃗ = 𝑦 ̂ = (0, 1, 0), and 𝑞 ⃗ = 𝑧 ̂ = (0, 0, 1) using Proposition 4.3.25 and the trigonometric
identities in Section A.5.

The homomorphism property of the map Rot can now be seen as follows. Lemma 4.3.26 implies
that for all 𝑈1 , 𝑈2 ∈ SU(2) and all 𝑝 ⃗ ∈ ℝ3 we have
(Rot(𝑈1 𝑈2 )𝑝)⃗ ⋅ 𝜎 = 𝑈1 𝑈2 (𝑝 ⃗ ⋅ 𝜎)𝑈2−1 𝑈1−1
(4.3.30)
= 𝑈1 ((Rot(𝑈2 )𝑝)⃗ ⋅ 𝜎) 𝑈1−1 = (Rot(𝑈1 ) Rot(𝑈2 )𝑝)⃗ ⋅ 𝜎.
We determine the kernel of Rot. Let 𝑈 ∈ SU(2). Write 𝑈 = 𝑅𝑤̂ (𝛾) as in The-
orem 4.3.15. Then it follows from the definition of rotations in Theorem 4.2.21 that
Rot𝑤̂ (𝛾) = 𝐼3 if and only if 𝛾 ≡ 0 mod 2𝜋. This is true if and only if 𝛾/2 ≡ 0 mod 𝜋.
Therefore, Theorem 4.3.15 implies that Rot(𝑈) = 𝐼3 if and only if 𝑈 = ±𝐼.
Next, we prove the second assertion of Theorem 4.3.20. For this, we need further
auxiliary results.
Lemma 4.3.28. Let 𝑝⃗ ∈ ℝ3 with spherical coordinate representation (1, 𝜃, 𝜙). Then we
have
(4.3.31) 𝑝⃗ ⋅ 𝜎 = ( cos 𝜃        𝑒−𝑖𝜙 sin 𝜃 )
                ( 𝑒𝑖𝜙 sin 𝜃   − cos 𝜃 ).
Exercise 4.3.29. Prove Lemma 4.3.28.

From Lemma 4.3.28 we obtain the following representation of the density opera-
tors corresponding to quantum states in ℍ1 .
Proposition 4.3.30. Let |𝜓⟩ be a quantum state in ℍ1 . Then we have
(4.3.32) |𝜓⟩ ⟨𝜓| = (1/2)(𝐼 + 𝑝⃗(𝜓) ⋅ 𝜎).

Proof. Let (1, 𝜃, 𝜙) be the spherical coordinate representation of 𝑝⃗(𝜓). Without loss of
generality let
(4.3.33) |𝜓⟩ = cos(𝜃/2) |0⟩ + 𝑒𝑖𝜙 sin(𝜃/2) |1⟩ .
Using the trigonometric identities (A.5.4) and (A.5.7) we obtain
(4.3.34) (2 |𝜓⟩ ⟨𝜓| − 𝐼) |0⟩ = 2 |𝜓⟩ ⟨𝜓|0⟩ − |0⟩ = 2 cos(𝜃/2) |𝜓⟩ − |0⟩
= (2 cos²(𝜃/2) − 1) |0⟩ + 2𝑒𝑖𝜙 cos(𝜃/2) sin(𝜃/2) |1⟩ = cos 𝜃 |0⟩ + 𝑒𝑖𝜙 sin 𝜃 |1⟩
and
(4.3.35) (2 |𝜓⟩ ⟨𝜓| − 𝐼) |1⟩ = 2 |𝜓⟩ ⟨𝜓|1⟩ − |1⟩ = 2𝑒−𝑖𝜙 sin(𝜃/2) |𝜓⟩ − |1⟩
= 2𝑒−𝑖𝜙 sin(𝜃/2) cos(𝜃/2) |0⟩ + (2 sin²(𝜃/2) − 1) |1⟩
= 𝑒−𝑖𝜙 sin 𝜃 |0⟩ − cos 𝜃 |1⟩ .
So the assertion follows from Lemma 4.3.28. □

With these results, we can prove the second assertion of Theorem 4.3.20. For this,
let 𝑈 ∈ SU(2) and let |𝜓⟩ be a quantum state in ℍ1 . Set 𝑝⃗ = 𝑝⃗(𝜓) and 𝑞⃗ = 𝑝⃗(𝑈 |𝜓⟩).
Then it follows from Proposition 4.3.30 that
(4.3.36) |𝑈 |𝜓⟩⟩ ⟨𝑈 |𝜓⟩| = (1/2)(𝐼 + 𝑞⃗ ⋅ 𝜎)
and
(4.3.37) 𝑈 |𝜓⟩ ⟨𝜓| 𝑈 −1 = (1/2)(𝐼 + 𝑈(𝑝⃗ ⋅ 𝜎)𝑈 −1 ).
But Lemma 4.3.26 gives
(4.3.38) 𝑈(𝑝⃗ ⋅ 𝜎)𝑈 −1 = (Rot(𝑈)𝑝⃗) ⋅ 𝜎.
So (4.3.36), (4.3.37), and (4.3.38) imply the assertion.
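Theorem 4.3.20 can be checked numerically. In the following NumPy sketch (an illustration), the Bloch vector is computed from the Pauli expectation values, which by Proposition 4.3.30 are the components of 𝑝⃗(𝜓), and the matrix of Rot(𝑈) is read off from Lemma 4.3.26:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = (X, Y, Z)

def bloch(psi):
    # components of the Bloch vector: expectation values of X, Y, Z
    return np.array([(psi.conj() @ P @ psi).real for P in paulis])

def rot(U):
    # matrix of Rot(U), from (Rot(U) e_i) . sigma = U (e_i . sigma) U^{-1}
    Ud = U.conj().T
    return np.array([[np.trace(Pj @ U @ Pi @ Ud).real / 2
                      for Pi in paulis] for Pj in paulis])

g = 1.1
U = np.cos(g / 2) * np.eye(2) - 1j * np.sin(g / 2) * Y     # a rotation operator
psi = np.array([np.cos(0.3), np.exp(0.4j) * np.sin(0.3)])  # a quantum state

O = rot(U)
assert np.allclose(O @ O.T, np.eye(3))              # O is orthogonal
assert np.isclose(np.linalg.det(O), 1)              # det O = 1, so O in SO(3)
assert np.allclose(bloch(U @ psi), O @ bloch(psi))  # identity (4.3.15)
```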

4.3.4. Decomposition of rotation operators. From Corollary 4.3.16, Theo-


rem 4.2.33, and Theorem 4.3.20 we obtain the following decomposition result for rota-
tion gates.

Theorem 4.3.31. For every 𝑈 ∈ U(2) there are 𝛼, 𝛽, 𝛾, 𝛿 ∈ ℝ such that


(4.3.39) 𝑈 = 𝑒𝑖𝛿 𝑅𝑧̂ (𝛼)𝑅𝑦̂(𝛽)𝑅𝑧̂ (𝛾).
If 𝑈 ∈ SU(2), then there is such a representation with 𝛿 = 0.

We will now use Theorem 4.3.31 to give another representation of unitary single-
qubit operators, since Section 4.4 allows us to implement controlled operators. For this,
we need the following lemma.

Lemma 4.3.32. For all 𝛾 ∈ ℝ we have 𝑋𝑅𝑦̂(𝛾)𝑋 = 𝑅𝑦̂(−𝛾) and 𝑋𝑅𝑧̂ (𝛾)𝑋 = 𝑅𝑧̂ (−𝛾).

Proof. By Theorem 4.1.2 we have 𝑋 2 = 𝐼 and 𝑋𝑌 𝑋 = −𝑋𝑋𝑌 = −𝑌 . So from (4.3.3)


we obtain
𝑋𝑅𝑦̂(𝛾)𝑋 = 𝑋( cos(𝛾/2) 𝐼 − 𝑖 sin(𝛾/2) 𝑌 )𝑋
= cos(𝛾/2) 𝑋² − 𝑖 sin(𝛾/2) 𝑋𝑌 𝑋
= cos(−𝛾/2) 𝐼 − 𝑖 sin(−𝛾/2) 𝑌
= 𝑅𝑦̂(−𝛾).
The second assertion can be proved analogously. □
Theorem 4.3.33. Let 𝑈 be a unitary operator on ℍ1 . Let 𝛼, 𝛽, 𝛾, 𝛿 ∈ ℝ such that
(4.3.40) 𝑈 = 𝑒𝑖𝛿 𝑅𝑧̂ (𝛼)𝑅𝑦̂(𝛽)𝑅𝑧̂ (𝛾).
Such a representation exists by Theorem 4.3.31. Set
(4.3.41) 𝐴 = 𝑅𝑧̂ (𝛼)𝑅𝑦̂ (𝛽/2),
𝐵 = 𝑅𝑦̂ (−𝛽/2) 𝑅𝑧̂ (−(𝛼 + 𝛾)/2),
𝐶 = 𝑅𝑧̂ (−(𝛼 − 𝛾)/2).
Then we have
(4.3.42) 𝐴𝐵𝐶 = 𝐼 and 𝑈 = 𝑒𝑖𝛿 𝐴𝑋𝐵𝑋𝐶.

Proof. Proposition 4.3.6 implies


(4.3.43) 𝐴𝐵𝐶 = 𝑅𝑧̂ (𝛼)𝑅𝑦̂ (𝛽/2) 𝑅𝑦̂ (−𝛽/2) 𝑅𝑧̂ (−(𝛼 + 𝛾)/2) 𝑅𝑧̂ (−(𝛼 − 𝛾)/2) = 𝐼.
Using 𝑋 2 = 𝐼 and Lemma 4.3.32 we find that
(4.3.44) 𝑋𝐵𝑋 = 𝑋𝑅𝑦̂ (−𝛽/2) 𝑋𝑋𝑅𝑧̂ (−(𝛼 + 𝛾)/2)𝑋 = 𝑅𝑦̂ (𝛽/2) 𝑅𝑧̂ ((𝛼 + 𝛾)/2).
Applying Proposition 4.3.6 again, we obtain
(4.3.45) 𝐴𝑋𝐵𝑋𝐶 = 𝑅𝑧̂ (𝛼)𝑅𝑦̂ (𝛽/2) 𝑅𝑦̂ (𝛽/2) 𝑅𝑧̂ ((𝛼 + 𝛾)/2) 𝑅𝑧̂ (−(𝛼 − 𝛾)/2)
= 𝑅𝑧̂ (𝛼)𝑅𝑦̂(𝛽)𝑅𝑧̂ (𝛾).
Hence (4.3.40) implies 𝑈 = 𝑒𝑖𝛿 𝐴𝑋𝐵𝑋𝐶. Together with (4.3.43) this concludes the proof. □
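The identities (4.3.42) of Theorem 4.3.33 can be verified numerically, for example with the following NumPy sketch (an illustration for arbitrary sample angles):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)

def Ry(g):
    c, s = np.cos(g / 2), np.sin(g / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def Rz(g):
    return np.diag([np.exp(-1j * g / 2), np.exp(1j * g / 2)])

alpha, beta, gamma, delta = 0.9, 1.7, -0.4, 0.25           # sample angles
U = np.exp(1j * delta) * Rz(alpha) @ Ry(beta) @ Rz(gamma)  # (4.3.40)

A = Rz(alpha) @ Ry(beta / 2)                               # (4.3.41)
B = Ry(-beta / 2) @ Rz(-(alpha + gamma) / 2)
C = Rz(-(alpha - gamma) / 2)

assert np.allclose(A @ B @ C, np.eye(2))                   # (4.3.42)
assert np.allclose(np.exp(1j * delta) * A @ X @ B @ X @ C, U)
```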

The next exercise gives a representation as in Theorem 4.3.33 for rotation opera-
tors.
Exercise 4.3.34. Let 𝑤̂ ∈ ℝ3 be a unit vector and let 𝛾 ∈ ℝ. Set 𝐴 = 𝑅𝑤̂ (𝛾/2), 𝐵 =
𝑅𝑤̂ (−𝛾/2), and 𝐶 = 𝐼2 . Show that 𝑅𝑤̂ (𝛾) = 𝐴𝑋𝐵𝑋𝐶 and that 𝐴𝐵𝐶 = 𝐼2 .

From Corollary 4.3.16, Proposition 4.3.14, and Theorem 4.3.20 we obtain the fol-
lowing decomposition result.

Theorem 4.3.35. Let 𝑎,⃗ 𝑏 ⃗ ∈ ℝ3 be nonparallel unit vectors. Denote by 𝜑 the angle be-
tween 𝑎⃗ and 𝑏.⃗ Then for all unitary operators 𝑈 on ℍ1 there are 𝑘 ∈ ℕ and real numbers
𝛼1 , . . . , 𝛼𝑘 , 𝛽1 , . . . , 𝛽 𝑘 , 𝛿 such that 𝑘 = O(1/𝜑) and
𝑘
(4.3.46) 𝑈 = 𝑒𝑖𝛿 ∏ 𝑅𝑎⃗ (𝛼𝑖 )𝑅𝑏⃗ (𝛽 𝑖 ).
𝑖=1

If 𝑈 ∈ SU(2), then there is such a representation with 𝛿 = 0.

4.3.5. Phase shift gates. In this section, we introduce the following special class
of rotation operators.

Definition 4.3.36. For 𝛾 ∈ ℝ the phase shift gate 𝑃(𝛾) is defined as

(4.3.47) 𝑃(𝛾) = ( 1   0 )
               ( 0   𝑒𝑖𝛾 ).

It shifts the phase of the amplitude of |1⟩ by an angle 𝛾 while it does not change the
amplitude of |0⟩.

The phase shift gate can also be written as


(4.3.48) 𝑃(𝛾) = 𝑒𝑖𝛾/2 𝑅𝑧̂ (𝛾).

The inverse and adjoint of 𝑃(𝛾) is 𝑃(−𝛾).


Next, we introduce special phase shift gates. For 𝑘 ∈ ℕ we set
(4.3.49) 𝑅𝑘 = ( 1   0 )
              ( 0   𝑒^(2𝜋𝑖/2^𝑘) ) = 𝑒^(𝜋𝑖/2^𝑘) 𝑅𝑧̂ (𝜋/2^(𝑘−1)).

For 𝑘 = 2 we obtain the phase gate
(4.3.50) 𝑆 = 𝑅2 = 𝑃(𝜋/2) = ( 1   0 )
                           ( 0   𝑖 ).

Also, for 𝑘 = 3 this gives the 𝜋/8 gate
(4.3.51) 𝑇 = 𝑅3 = 𝑃(𝜋/4) = ( 1   0 )
                           ( 0   𝑒𝑖𝜋/4 ).

It is called the “𝜋/8 gate” since it can be written as
(4.3.52) 𝑇 = 𝑒𝑖𝜋/8 𝑅𝑧̂ (𝜋/4),
in which, up to the global phase 𝑒𝑖𝜋/8 , the diagonal entries are 𝑒∓𝑖𝜋/8 .
We note that

(4.3.53) 𝑇 2 = 𝑆.
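The relations between the phase shift gates can be checked numerically; the following NumPy sketch (an illustration) verifies (4.3.48), (4.3.49), (4.3.52), and (4.3.53):

```python
import numpy as np

def P(g):
    """Phase shift gate (4.3.47)."""
    return np.diag([1, np.exp(1j * g)])

def Rz(g):
    """Rotation operator about the z-axis, (4.3.10)."""
    return np.diag([np.exp(-1j * g / 2), np.exp(1j * g / 2)])

S = P(np.pi / 2)
T = P(np.pi / 4)

assert np.allclose(P(0.8), np.exp(0.4j) * Rz(0.8))             # (4.3.48)
assert np.allclose(T, np.exp(1j * np.pi / 8) * Rz(np.pi / 4))  # (4.3.52)
assert np.allclose(T @ T, S)                                   # (4.3.53)
for k in range(1, 6):
    Rk = np.diag([1, np.exp(2j * np.pi / 2 ** k)])             # (4.3.49)
    assert np.allclose(Rk, P(2 * np.pi / 2 ** k))
    assert np.allclose(Rk, np.exp(1j * np.pi / 2 ** k)
                       * Rz(np.pi / 2 ** (k - 1)))
```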


Figure 4.3.1. Symbols for the phase and 𝜋/8 gates in quantum circuits.

The symbols that represent the phase gate and the 𝜋/8 gate are shown in Figure
4.3.1.

4.4. Controlled operators


So far, we have discussed single-qubit operators. However, they are not sufficient to
construct quantum circuits that implement unitary operators on ℍ𝑛 for 𝑛 > 1. For
this purpose, multiple-qubit operators must be used. An important class of multiple-
qubit operators is controlled operators, which are discussed in this section. They apply
a unitary operator to some qubits, called the target qubits, conditioned on some other
qubits, called the control qubits, being in certain quantum states. We start with the
description of controlled-NOT gates. Then general controlled operators and several
special cases, such as quantum Toffoli operators, are introduced. In our discussion, we
let 𝑛 be a positive integer and identify linear operators on the state space ℍ𝑛 with their
representation matrices with respect to the computational basis of ℍ𝑛 . For several gates
that we introduce here, there are classical equivalents, which are presented in Section
1.7. The classical and the corresponding quantum gates are referred to by the same
names. Also, some of these gates are already explained in Chapter 3.

4.4.1. Controlled-NOT gates. We begin with controlled-𝖭𝖮𝖳 gates, also known


as 𝖢𝖭𝖮𝖳 gates, which were introduced in Section 3.3.2. In Section 1.7.1, their classical
equivalent 𝖢𝖭𝖮𝖳 was presented. Figure 4.4.1 illustrates four different types of such
gates. All apply the Pauli 𝑋 gate, i.e., the quantum 𝖭𝖮𝖳 gate, to a target qubit based
on the state of a control qubit. The control qubit can be either the first or the second
qubit, and correspondingly, the target qubit can be either the second or the first qubit.
Furthermore, the Pauli 𝑋 gate may be applied to the target qubit depending on whether
the control qubit is |1⟩ or |0⟩. The upper-left 𝖢𝖭𝖮𝖳 gate in Figure 4.4.1 is called the
standard 𝖢𝖭𝖮𝖳 gate.


Figure 4.4.1. The four 𝖢𝖭𝖮𝖳 gates. The upper-left 𝖢𝖭𝖮𝖳 gate is called the standard
𝖢𝖭𝖮𝖳 gate.


Figure 4.4.2. Implementation of the lower-left 𝖢𝖭𝖮𝖳 gate from Figure 4.4.1.

As seen in (3.3.5), the representation matrix of the standard 𝖢𝖭𝖮𝖳 gate with respect
to the computational basis of ℍ2 is
1 0 0 0
⎛ ⎞
0 1 0 0
(4.4.1) 𝖢𝖭𝖮𝖳 = ⎜ ⎟.
⎜0 0 0 1⎟
⎝0 0 1 0⎠
Exercise 4.4.1. Determine matrix representations of all 𝖢𝖭𝖮𝖳 gates in Figure 4.4.1
with respect to the computational basis of ℍ2 .

We note that the lower-left 𝖢𝖭𝖮𝖳 gate in Figure 4.4.1 can be implemented using
the standard 𝖢𝖭𝖮𝖳 gate and the Pauli 𝑋 gate as shown in Figure 4.4.2.
Exercise 4.4.2. (1) Show that the implementation of the 𝖢𝖭𝖮𝖳 gate in Figure 4.4.2
is correct.
(2) Give an implementation of the lower-right 𝖢𝖭𝖮𝖳 gate in Figure 4.4.1 using the
upper-right 𝖢𝖭𝖮𝖳 gate and the Pauli 𝑋 gate.

Next, we note that the roles of qubits as control and target qubits in the 𝖢𝖭𝖮𝖳 gates
depend on the choice of the basis of ℍ2 . To see this, consider the orthonormal basis
(4.4.2) (|𝑥+ ⟩ , |𝑥− ⟩) = (𝐻 |0⟩ , 𝐻 |1⟩) = ((|0⟩ + |1⟩)/√2, (|0⟩ − |1⟩)/√2)
of ℍ1 where 𝐻 is the Hadamard operator. We have seen in Section 4.1.2 that it is an
eigenbasis of the Pauli 𝑋 operator. Also,
(4.4.3) (|𝑥+ 𝑥+ ⟩ , |𝑥− 𝑥+ ⟩ , |𝑥+ 𝑥− ⟩ , |𝑥− 𝑥− ⟩)
is an orthonormal basis of ℍ2 . As shown in Exercise 4.4.3, applying 𝖢𝖭𝖮𝖳 to the ele-
ments of this basis has the following effect:
𝖢𝖭𝖮𝖳 |𝑥+ 𝑥+ ⟩ = |𝑥+ 𝑥+ ⟩ , 𝖢𝖭𝖮𝖳 |𝑥− 𝑥+ ⟩ = |𝑥− 𝑥+ ⟩ ,
(4.4.4)
𝖢𝖭𝖮𝖳 |𝑥+ 𝑥− ⟩ = |𝑥− 𝑥− ⟩ , 𝖢𝖭𝖮𝖳 |𝑥− 𝑥− ⟩ = |𝑥+ 𝑥− ⟩ .
Exercise 4.4.3. Prove (4.4.4).

We see that in (4.4.4) the 𝖢𝖭𝖮𝖳 operator exchanges the basis states |𝑥+ ⟩ and |𝑥− ⟩
of the first qubit conditioned on the second qubit being in state |𝑥− ⟩. So in this repre-
sentation of 𝖢𝖭𝖮𝖳, the first qubit is the target, while the second qubit is the control.
As shown in Figure 4.4.3, this observation can be used to implement the upper-right
𝖢𝖭𝖮𝖳 gate in Figure 4.4.1 using the standard 𝖢𝖭𝖮𝖳 gate and the Hadamard gate.
Exercise 4.4.4. Show that all 𝖢𝖭𝖮𝖳 gates in Figure 4.4.1 can be implemented using
the standard 𝖢𝖭𝖮𝖳 gate, the Pauli 𝑋 gate, and the Hadamard gate 𝐻.
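The circuit identities of Figures 4.4.2 and 4.4.3 can be verified by matrix computations, for example with the following NumPy sketch (an illustration; qubit |𝑏0⟩ is taken as the most significant index):

```python
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

# standard CNOT (4.4.1): control = first qubit, target = second
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)

# Figure 4.4.2: X gates on the control wire turn the |1>-control
# into a |0>-control.
CNOT_ctrl0 = np.kron(X, I) @ CNOT @ np.kron(X, I)
assert np.allclose(CNOT_ctrl0,
                   [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])

# Figure 4.4.3: conjugating both wires by H swaps control and target,
# as in (4.4.4).
CNOT_rev = np.kron(H, H) @ CNOT @ np.kron(H, H)
assert np.allclose(CNOT_rev,
                   [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]])
```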


Figure 4.4.3. Implementation of the upper-right 𝖢𝖭𝖮𝖳 gate in Figure 4.4.1 using the
Hadamard gate and the standard 𝖢𝖭𝖮𝖳 gate.


Figure 4.4.4. 𝖢𝖭𝖮𝖳 operators acting on quantum registers of length > 2.

The 𝖢𝖭𝖮𝖳 gates from Figure 4.4.1 can also be applied to quantum registers of
length > 2. This is shown in Figure 4.4.4. Such 𝖢𝖭𝖮𝖳 operators are specified by their
action on the computational basis vectors |𝑏0 ⋯ 𝑏𝑛−1 ⟩, (𝑏0 ⋯ 𝑏𝑛−1 ) ∈ {0, 1}𝑛 , as fol-
lows. There is a control index 𝑐 and a target index 𝑡, 𝑐, 𝑡 ∈ ℤ𝑛 , 𝑐 ≠ 𝑡. The target qubit
|𝑏𝑡 ⟩ is mapped to 𝑋 𝑏𝑐 |𝑏𝑡 ⟩. All other qubits remain unchanged.

4.4.2. Controlled-𝑈 operators. We generalize the construction of the 𝖢𝖭𝖮𝖳 op-


erators by replacing the Pauli 𝑋 gate in Figure 4.4.1 by an arbitrary unitary single-qubit
operator 𝑈. The resulting operators are shown in Figure 4.4.5. They apply 𝑈 to the tar-
get qubit depending on the control qubit being |0⟩ or |1⟩, respectively, and are called
controlled-𝑈 operators.
For any unitary single-qubit operator 𝑈, the controlled-𝑈 operators in Figure 4.4.5
can be implemented using a decomposition 𝑈 = 𝑒𝑖𝛿 𝐴𝑋𝐵𝑋𝐶 with 𝐴𝐵𝐶 = 𝐼 which
exists by Theorem 4.3.33. For the upper-left operator in Figure 4.4.5 this is shown in
Figure 4.4.7. The correctness of this construction is stated in the next theorem.


Figure 4.4.5. Controlled-𝑈 operators.




Figure 4.4.6. Two circuits implementing the same operator.


Figure 4.4.7. Implementation of the controlled-𝑈 gate with the first qubit as control
using the decomposition 𝑈 = 𝑒𝑖𝛿 𝐴𝑋𝐵𝑋𝐶.


Figure 4.4.8. Circuit that implements the same operator as the circuit in Figure 4.4.7.

Theorem 4.4.5. Let 𝑈 be a unitary operator on ℍ1 and let 𝑈 = 𝑒𝑖𝛿 𝐴𝑋𝐵𝑋𝐶 where
𝛿 ∈ ℝ and 𝐴, 𝐵, 𝐶 are unitary single-qubit operators with 𝐴𝐵𝐶 = 𝐼. Then the upper-
left controlled-𝑈 operator in Figure 4.4.5 can be implemented as shown in Figure 4.4.7
using 𝐴, 𝐵, 𝐶, and 𝑃(𝛿) and two 𝖢𝖭𝖮𝖳 gates.

Proof. We show that the implementation in Figure 4.4.7 is correct. We first note that
the two circuits in Figure 4.4.6 implement the same operator since when applied to the
computational basis states of ℍ2 , both have the following effect:

(4.4.5) |00⟩ ↦ |00⟩ , |01⟩ ↦ |01⟩ , |10⟩ ↦ 𝑒𝑖𝛿 |10⟩ , |11⟩ ↦ 𝑒𝑖𝛿 |11⟩ .

This means that the circuit in Figure 4.4.8 implements the same operator as the circuit
in Figure 4.4.7. We show that the circuit in Figure 4.4.8 implements the controlled-𝑈
operator with the first qubit as control. If the first qubit is |1⟩, then the circuit applies
𝑈 = 𝑒𝑖𝛿 𝐴𝑋𝐵𝑋𝐶 to the second qubit. If the first qubit is |0⟩, then the circuit applies
𝐴𝐵𝐶 to the second qubit. Since 𝐴𝐵𝐶 = 𝐼, the circuit does not change the second qubit.
This proves the claim. □

Exercise 4.4.6. Show how Theorem 4.4.5 can be used to implement the controlled-𝑌 ,
𝑍, 𝑆, and 𝑇 operators.
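The construction of Theorem 4.4.5 can be checked numerically. The following NumPy sketch (an illustration for sample angles) builds the circuit of Figure 4.4.7 and compares it with the controlled-𝑈 matrix:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

def Ry(g):
    c, s = np.cos(g / 2), np.sin(g / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def Rz(g):
    return np.diag([np.exp(-1j * g / 2), np.exp(1j * g / 2)])

def P(g):
    return np.diag([1, np.exp(1j * g)])

alpha, beta, gamma, delta = 0.3, 1.2, -0.7, 0.5           # sample angles
U = np.exp(1j * delta) * Rz(alpha) @ Ry(beta) @ Rz(gamma)
A = Rz(alpha) @ Ry(beta / 2)                              # (4.3.41)
B = Ry(-beta / 2) @ Rz(-(alpha + gamma) / 2)
C = Rz(-(alpha - gamma) / 2)

# Figure 4.4.7, read right to left: C, CNOT, B, CNOT, A on the target wire,
# then P(delta) on the control wire.
circuit = (np.kron(P(delta), I2) @ np.kron(I2, A) @ CNOT
           @ np.kron(I2, B) @ CNOT @ np.kron(I2, C))

# controlled-U with the first qubit as control
controlled_U = np.block([[I2, np.zeros((2, 2))],
                         [np.zeros((2, 2)), U]])
assert np.allclose(circuit, controlled_U)
```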


Figure 4.4.9. Controlled-𝑈 operators acting on quantum states with more than two qubits.

Like the CNOT gates, the controlled-𝑈 gates can be applied to quantum registers of
length > 2. This is shown in Figure 4.4.9. Theorem 4.4.5 implies that these generalized
controlled-𝑈 operators can be implemented using four unitary single-qubit operators
and two 𝖢𝖭𝖮𝖳 gates.

4.4.3. General controlled operators. Now we present the most general con-
trolled operators. An example of such an operator is shown in Figure 4.4.10. This
operator applies the unitary operator 𝑈 on ℍ2 to the qubits |𝑏4 𝑏5 ⟩ conditioned on the
qubit |𝑏1 ⟩ being |0⟩ and the qubit |𝑏2 ⟩ being |1⟩. The other qubits are not changed. So,
it acts on the computational basis states of ℍ7 as follows:

(4.4.6) |𝑏0 . . . 𝑏6 ⟩ ↦ |𝑏0 . . . 𝑏3 ⟩ 𝑈 (1−𝑏1 )𝑏2 |𝑏4 𝑏5 ⟩ |𝑏6 ⟩ .

We describe general controlled operators on ℍ𝑛 formally.

Definition 4.4.7. Let 𝐶0 , 𝐶1 , and 𝑇 be pairwise disjoint subsets of the index set ℤ𝑛 .
Let 𝑚 = |𝑇| > 0, and let 𝑇 = {𝑡, 𝑡 + 1, . . . , 𝑡 + 𝑚 − 1} with 𝑡 ∈ ℤ𝑛 . So 𝑇 is a set of 𝑚
consecutive integers in the index set ℤ𝑛 . Also, let 𝑈 be a unitary operator on ℍ𝑚 . Then


Figure 4.4.10. Example for a general controlled operator.



the linear operator 𝐶 𝐶0 ,𝐶1 ,𝑇 (𝑈) is defined by its action on the computational basis states
|𝑏0 ⋯ 𝑏𝑛−1 ⟩ of ℍ𝑛 as follows. It applies 𝑈 to the target qubits |𝑏𝑡 ⋯ 𝑏𝑡+𝑚−1 ⟩ conditioned
on the control qubits |𝑏𝑖 ⟩ with 𝑖 ∈ 𝐶0 being |0⟩ and the control qubits |𝑏𝑖 ⟩ with 𝑖 ∈ 𝐶1
being |1⟩; i.e.,
𝐶 𝐶0 ,𝐶1 ,𝑇 (𝑈) |𝑏0 ⋯ 𝑏𝑛−1 ⟩
(4.4.7)
= |𝑏0 ⋯ 𝑏𝑡−1 ⟩ 𝑈 𝑐 |𝑏𝑡 ⋯ 𝑏𝑡+𝑚−1 ⟩ |𝑏𝑡+𝑚 ⋯ 𝑏𝑛−1 ⟩
where
(4.4.8) 𝑐 = ∏ (1 − 𝑏𝑖 ) ∏ 𝑏𝑖 .
𝑖∈𝐶0 𝑖∈𝐶1

If any of the index sets 𝐶0 , 𝐶1 , or 𝑇 has only one element, then we replace the set in the
superscript by this element.

In this definition, we could drop the requirement that the set 𝑇 of target qubits be
a set of consecutive numbers. However, this would complicate the definition and we
can achieve the same effect by using SWAP gates, described in Section 4.5. As shown
in Exercise 4.4.8, general multiply controlled operators are unitary.
Exercise 4.4.8. Prove that every multiply controlled operator as specified in Definition
4.4.7 is unitary.
Example 4.4.9. Using the notation from Definition 4.4.7 the operator in Figure 4.4.10
can be written as
(4.4.9) 𝐶 1,2,{4,5} (𝑈).
Example 4.4.10. Using the notation from Definition 4.4.7 we see that for the left op-
erator in Figure 4.4.4 we have 𝐶0 = ∅, 𝐶1 = {𝑐}, 𝑇 = {𝑡}, and 𝑈 = 𝑋. So, the operator is
𝐶 ∅,𝑐,𝑡 (𝑋). Also, the right operator is 𝐶 𝑐,∅,𝑡 (𝑋).
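Definition 4.4.7 translates directly into a matrix construction. The following NumPy sketch (an illustration; the function name controlled is ad hoc) builds 𝐶 𝐶0 ,𝐶1 ,𝑇 (𝑈) from its action on the computational basis and checks (4.4.11):

```python
import numpy as np
from itertools import product

def controlled(n, C0, C1, T, U):
    """Matrix of C^{C0,C1,T}(U) from Definition 4.4.7 on the computational
    basis of H_n (a sketch; T must be a set of consecutive indices)."""
    t, m = min(T), len(T)
    N = 2 ** n
    M = np.zeros((N, N), dtype=complex)
    for bits in product((0, 1), repeat=n):
        col = int("".join(map(str, bits)), 2)
        c = all(bits[i] == 0 for i in C0) and all(bits[i] == 1 for i in C1)
        if not c:
            M[col, col] = 1          # controls not satisfied: identity
            continue
        # apply U to the target block |b_t ... b_{t+m-1}>
        tgt = int("".join(str(bits[i]) for i in range(t, t + m)), 2)
        for tgt2 in range(2 ** m):
            new = list(bits)
            for j in range(m):
                new[t + j] = (tgt2 >> (m - 1 - j)) & 1
            row = int("".join(map(str, new)), 2)
            M[row, col] += U[tgt2, tgt]
    return M

X = np.array([[0, 1], [1, 0]], dtype=complex)
CCNOT = controlled(3, set(), {0, 1}, {2}, X)   # (4.4.11)
assert np.allclose(CCNOT @ CCNOT, np.eye(8))   # involution
v = np.zeros(8)
v[0b110] = 1
assert np.isclose(abs((CCNOT @ v)[0b111]), 1)  # |110> -> |111>
```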

The implementation of general controlled operators is discussed in Section 4.9.2.


Also, the next sections present further important instances of general controlled oper-
ators.

4.4.4. The quantum Toffoli gate. Figure 4.4.11 shows the quantum Toffoli gate,
also called 𝖢𝖢𝖭𝖮𝖳. Its classical counterpart has been introduced in Section 1.7.1.
The 𝖢𝖢𝖭𝖮𝖳 gate applies the Pauli 𝑋 gate to the target qubit |𝑡⟩ conditioned on the
control qubits |𝑐 0 ⟩ and |𝑐 1 ⟩ both being |1⟩; that is, it acts on the computational basis
vectors of ℍ3 as follows:
(4.4.10) |𝑐 0 𝑐 1 𝑡⟩ ↦ |𝑐 0 𝑐 1 ⟩ 𝑋 𝑐0 𝑐1 |𝑡⟩ .


Figure 4.4.11. The quantum Toffoli or 𝖢𝖢𝖭𝖮𝖳 gate.




Figure 4.4.12. Generalized 𝖢𝖢𝖭𝖮𝖳 operator.

Using the terminology from Definition 4.4.7 we can write


(4.4.11) 𝖢𝖢𝖭𝖮𝖳 = 𝐶 ∅,{0,1},2 (𝑋).
We may also apply 𝖢𝖢𝖭𝖮𝖳 to states that have more than three qubits or change the
order of the control and target qubits. This is shown in Figure 4.4.12. The implemen-
tation of the Toffoli gate is discussed in Section 4.9.1.

4.4.5. The 𝐶 𝑘 (𝑈) operators. We introduce the controlled operators 𝐶 𝑘 (𝑈). Such
an operator is shown in Figure 4.4.13. To specify it, we use 𝑘, 𝑚 ∈ ℕ with 𝑛 = 𝑘 + 𝑚
and a unitary operator 𝑈 on ℍ𝑚 . We write the computational basis vectors of ℍ𝑛 as
|𝑐 0 ⋯ 𝑐 𝑘−1 𝑡0 ⋯ 𝑡𝑚−1 ⟩ instead of |𝑏0 ⋯ 𝑏𝑛−1 ⟩ to distinguish between control and target
qubits. Then we have
(4.4.12) 𝐶 𝑘 (𝑈) |𝑐 0 ⋯ 𝑐 𝑘−1 𝑡0 ⋯ 𝑡𝑚−1 ⟩ = |𝑐 0 ⋯ 𝑐 𝑘−1 ⟩ 𝑈 𝑐0 ⋅𝑐1 ⋯𝑐𝑘−1 |𝑡0 ⋯ 𝑡𝑚−1 ⟩
or using the notation of Definition 4.4.7
(4.4.13) 𝐶 𝑘 (𝑈) = 𝐶 ∅,{0,. . .,𝑘−1},{𝑘,. . .,𝑘+𝑚−1} (𝑈).


Figure 4.4.13. The operator 𝐶 𝑘 (𝑈).



For example, we can write


(4.4.14) 𝖢𝖢𝖭𝖮𝖳 = 𝐶 2 (𝑋).

4.4.6. Transposition operators. As a last example for controlled operators, we


present transposition operators.
Definition 4.4.11. Let 𝑡 ∈ ℤ𝑛 and let
(4.4.15) 𝑐 ⃗ = 𝑐 0 ⋯ 𝑐 𝑡−1 ∗ 𝑐 𝑡+1 ⋯ 𝑐𝑛−1
with 𝑐 𝑖 ∈ {0, 1} for 𝑖 ∈ ℤ𝑛 , 𝑖 ≠ 𝑡. This is a vector of length 𝑛 with entries from {0, 1}
except that the entry with index 𝑡 is “∗”. The transposition operator 𝖳𝖱𝖠𝖭𝖲 𝑐 ⃗ exchanges
the computational basis elements
(4.4.16) |𝑐 0 ⋯ 𝑐 𝑡−1 ⟩ |0⟩ |𝑐 𝑡+1 ⋯ 𝑐𝑛−1 ⟩
and
(4.4.17) |𝑐 0 ⋯ 𝑐 𝑡−1 ⟩ |1⟩ |𝑐 𝑡+1 ⋯ 𝑐𝑛−1 ⟩
and leaves all other computational basis elements unchanged.

Using the notation from Definition 4.4.7 we can write


(4.4.18) 𝖳𝖱𝖠𝖭𝖲 𝑐 ⃗ = 𝐶 𝐶0 ,𝐶1 ,𝑡 (𝑋)
where
(4.4.19) 𝐶0 = {𝑖 ∈ ℤ𝑛 ∶ 𝑐 𝑖 = 0}, 𝐶1 = {𝑖 ∈ ℤ𝑛 ∶ 𝑐 𝑖 = 1}.

An example of such a transposition operator can be seen in Figure 4.4.14. In this


example, we have 𝑐 ⃗ = 01 ∗ 0.


Figure 4.4.14. The operator 𝖳𝖱𝖠𝖭𝖲(01∗0) which exchanges |0100⟩ and |0110⟩.

4.5. Swap and permutation operators


Another important multiple-qubit operator is the quantum 𝖲𝖶𝖠𝖯 operator which cor-
responds to the classical 𝖲𝖶𝖠𝖯 gate introduced in Section 1.7.1. Applied to a compu-
tational basis element |𝑏0 𝑏1 ⟩ of ℍ2 it swaps 𝑏0 and 𝑏1 ; i.e., we have
(4.5.1) 𝖲𝖶𝖠𝖯 |𝑏0 𝑏1 ⟩ = |𝑏1 𝑏0 ⟩ .
This gate is shown in Figure 4.5.1 together with an implementation that only uses
𝖢𝖭𝖮𝖳 gates.


Figure 4.5.1. The quantum 𝖲𝖶𝖠𝖯 gate and its implementation using 𝖢𝖭𝖮𝖳 gates.

Exercise 4.5.1. Verify that the implementation of the 𝖲𝖶𝖠𝖯 gate in Figure 4.5.1 is
correct.
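The implementation in Figure 4.5.1 can be verified by a matrix computation, for example with the following NumPy sketch (an illustration):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
# CNOT with the roles of control and target exchanged (cf. Figure 4.4.3)
CNOT_rev = np.kron(H, H) @ CNOT @ np.kron(H, H)

SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)   # (4.5.1)

# Figure 4.5.1: SWAP as a product of three CNOT gates
assert np.allclose(CNOT @ CNOT_rev @ CNOT, SWAP)
```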

We generalize the quantum 𝖲𝖶𝖠𝖯 gates. Let 𝑛 ∈ ℕ, 𝑖, 𝑗 ∈ ℤ𝑛 with 𝑖 ≤ 𝑗. Then


the quantum swap gate 𝖲𝖶𝖠𝖯𝑛 (𝑖, 𝑗) is the unitary operator on ℍ𝑛 that is defined by its
effect on the computational basis states |𝑏0 ⋯ 𝑏𝑛−1 ⟩ of ℍ𝑛 as follows:
(4.5.2) 𝖲𝖶𝖠𝖯𝑛 (𝑖, 𝑗) |𝑏0 ⋯ 𝑏𝑖 ⋯ 𝑏𝑗 ⋯ 𝑏𝑛−1 ⟩ = |𝑏0 ⋯ 𝑏𝑗 ⋯ 𝑏𝑖 ⋯ 𝑏𝑛−1 ⟩ .

Generalizing the implementation of the simple 𝖲𝖶𝖠𝖯 gate in Figure 4.5.1, we see
that every quantum 𝖲𝖶𝖠𝖯 gate can be implemented using 3 𝖢𝖭𝖮𝖳 gates.
We note that 𝖲𝖶𝖠𝖯 gates apply a transposition to the index sequence of the com-
putational basis states of ℍ𝑛 . This suggests the generalization of quantum swap gates
as follows. For 𝜋 ∈ 𝑆𝑛 the quantum permutation operator 𝑈𝜋 is defined by its action
on the computational basis states |𝑏0 ⋯ 𝑏𝑛−1 ⟩ as follows:
(4.5.3) 𝑈𝜋 |𝑏0 ⋯ 𝑏𝑛−1 ⟩ = |𝑏𝜋(0) ⋯ 𝑏𝜋(𝑛−1) ⟩ .

Using Proposition 1.7.2 the following result can be proved.


Proposition 4.5.2. Let 𝑛 ∈ ℕ and let 𝜋 ∈ 𝑆𝑛 . Then the permutation operator 𝑈𝜋 can be
implemented by a quantum circuit that uses at most 𝑛 − 1 𝖲𝖶𝖠𝖯 gates or at most 3𝑛 − 3
𝖢𝖭𝖮𝖳 gates, respectively.
Exercise 4.5.3. Prove Proposition 4.5.2.

4.6. Ancillary and erasure gates


In the previous sections and in Section 3.3.2, we have introduced quantum gates that
implement unitary operators on some state space ℍ𝑛 and which can be used as building
blocks of more complex quantum circuits. Such quantum gates are called unitary gates.
Now it turns out that the construction of many quantum circuits also requires the use
of two types of quantum gates that do not implement unitary operators: ancillary gates
and erasure gates. This section introduces these gates. The quantum circuit in Figure
4.6.1 illustrates why these gates are useful. It implements the operator 𝐶 3 (𝑈) for some
single-qubit operator 𝑈. The evolution of the basis states of the quantum register in this
circuit is shown in Table 4.6.1. The circuit uses two ancilla qubits |𝑎0 ⟩ and |𝑎1 ⟩. They
are inserted between the control and target qubits and are initialized to |0⟩ using two
ancillary gates. The first 𝖢𝖢𝖭𝖮𝖳 gate changes the state of |𝑎0 ⟩ to |𝑐 0 ⋅ 𝑐 1 ⟩. The second
𝖢𝖢𝖭𝖮𝖳 gate changes the state of |𝑎1 ⟩ to |𝑐 0 ⋅ 𝑐 1 ⋅ 𝑐 2 ⟩. The ancillary qubit |𝑎1 ⟩ controls
the application of 𝑈 to the target qubit |𝑡⟩. So after the application of this controlled
operation, the target qubit is in the state 𝑈 𝑐0 ⋅𝑐1 ⋅𝑐2 |𝑡⟩. The two further 𝖢𝖢𝖭𝖮𝖳 gates
change the ancilla bits back to |𝑎0 ⟩ = |𝑎1 ⟩ = |0⟩. The two erasure gates trace out |𝑎0 ⟩
176 4. The Theory of Quantum Algorithms

[Figure: the control qubits |𝑐 0 ⟩ , |𝑐 1 ⟩ , |𝑐 2 ⟩ pass through unchanged; the ancilla qubits |𝑎0 ⟩ = |𝑎1 ⟩ = |0⟩ are traced out at the end; the target qubit |𝑡⟩ ends in the state 𝑈 𝑐0 𝑐1 𝑐2 |𝑡⟩; the intermediate states are |𝜓0 ⟩ , . . . , |𝜓7 ⟩.]

Figure 4.6.1. Implementation of 𝐶 3 (𝑈) using ancillary and erasure gates.

Table 4.6.1. Evolution of the states in the circuit from Figure 4.6.1.

𝑖   |𝜓𝑖 ⟩
0   |𝑐 0 ⟩ |𝑐 1 ⟩ |𝑐 2 ⟩ |𝑡⟩
1   |𝑐 0 ⟩ |𝑐 1 ⟩ |𝑐 2 ⟩ |0⟩ |0⟩ |𝑡⟩
2   |𝑐 0 ⟩ |𝑐 1 ⟩ |𝑐 2 ⟩ |𝑐 0 ⋅ 𝑐 1 ⟩ |0⟩ |𝑡⟩
3   |𝑐 0 ⟩ |𝑐 1 ⟩ |𝑐 2 ⟩ |𝑐 0 ⋅ 𝑐 1 ⟩ |𝑐 0 ⋅ 𝑐 1 ⋅ 𝑐 2 ⟩ |𝑡⟩
4   |𝑐 0 ⟩ |𝑐 1 ⟩ |𝑐 2 ⟩ |𝑐 0 ⋅ 𝑐 1 ⟩ |𝑐 0 ⋅ 𝑐 1 ⋅ 𝑐 2 ⟩ 𝑈 𝑐0 ⋅𝑐1 ⋅𝑐2 |𝑡⟩
5   |𝑐 0 ⟩ |𝑐 1 ⟩ |𝑐 2 ⟩ |𝑐 0 ⋅ 𝑐 1 ⟩ |0⟩ 𝑈 𝑐0 ⋅𝑐1 ⋅𝑐2 |𝑡⟩
6   |𝑐 0 ⟩ |𝑐 1 ⟩ |𝑐 2 ⟩ |0⟩ |0⟩ 𝑈 𝑐0 ⋅𝑐1 ⋅𝑐2 |𝑡⟩
7   |𝑐 0 ⟩ |𝑐 1 ⟩ |𝑐 2 ⟩ 𝑈 𝑐0 ⋅𝑐1 ⋅𝑐2 |𝑡⟩ = 𝐶 3 (𝑈) |𝜓0 ⟩

and |𝑎1 ⟩. Since the state of the quantum register used in the circuit is separable with
respect to the decomposition into ancillary and nonancillary qubits, it follows from
Corollary 3.7.12 that this does not change the other qubits and the resulting state is
𝐶 3 (𝑈) |𝜓0 ⟩.
Exercise 4.6.1. Verify Table 4.6.1.
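Since the 𝖢𝖢𝖭𝖮𝖳 gates act classically on the control and ancilla qubits, the evolution in Table 4.6.1 can be checked mechanically for every basis input. An illustrative Python sketch (our choice of 𝑈 = 𝑇, the 𝜋/8 gate, is arbitrary):

```python
import numpy as np

T = np.diag([1, np.exp(1j * np.pi / 4)])  # an arbitrary single-qubit U

def c3_apply(c0, c1, c2, t):
    """Trace the ancilla construction of Figure 4.6.1 on a basis input."""
    a0 = c0 & c1                 # first CCNOT computes c0*c1
    a1 = a0 & c2                 # second CCNOT computes c0*c1*c2
    t = T @ t if a1 else t       # controlled-U fires iff all controls are 1
    a1 ^= a0 & c2                # uncompute |a1>
    a0 ^= c0 & c1                # uncompute |a0>
    return t, a0, a1

t0 = np.array([1, 1], dtype=complex) / np.sqrt(2)
out, a0, a1 = c3_apply(1, 1, 1, t0)
assert (a0, a1) == (0, 0) and np.allclose(out, T @ t0)
```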

4.7. Quantum circuits revisited


We have already introduced quantum circuits in Section 3.3.4. In this section, we ex-
pand the definition of quantum circuits to include ancillary and erasure gates and pro-
vide a more formal description of quantum circuits and their computations.
Definition 4.7.1. A quantum circuit 𝑄 is specified by two positive integers 𝑛 and 𝑘
and a finite sequence (𝑞0 , . . . , 𝑞𝑘−1 ). Here, 𝑛 is the number of input qubits and for all
𝑖 ∈ ℤ𝑘 the component 𝑞𝑖 contains the following:
(1) a tuple of quantum gates that are either all ancillary gates, all unitary gates, or all
erasure gates and

(2) the information about how the ancilla qubits are initialized and where they are
inserted or to which qubits the unitary or erasure gates are applied, respectively.
At most one gate is applied to each qubit.

We illustrate Definition 4.7.1.


Example 4.7.2. Consider the quantum circuit shown in Figure 4.6.1. Its representa-
tion as explained in Definition 4.7.1 is illustrated in Figure 4.7.1. In this circuit, we
have 𝑛 = 4 since there are the input qubits |𝑐 0 ⟩, |𝑐 1 ⟩, |𝑐 2 ⟩, and |𝑡⟩. Also, we have 𝑘 = 7.
The component 𝑞0 contains two ancillary gates that insert two ancillary qubits and
initialize them to |0⟩. Also, 𝑞0 contains the information that the two ancilla qubits
are inserted behind the control qubits. The sequence elements 𝑞1 and 𝑞2 contain one
𝖢𝖢𝖭𝖮𝖳 gate each and the information about to which of the six qubits these gates are
applied. The element 𝑞3 contains a 𝐶 1 (𝑈) gate, and 𝑞4 and 𝑞5 each contain one 𝖢𝖢𝖭𝖮𝖳 gate, together with the information about the qubits to which these gates are applied. Finally, 𝑞6 contains two erasure gates and the information that they trace out
the two ancillary qubits. The quantum operator implemented by this circuit is 𝐶 3 (𝑈).
Definition 4.7.3. The size of a quantum circuit is the number of quantum gates it
contains plus the number of input qubits it operates on.

The computation of a quantum circuit 𝑄 as in Definition 4.7.1 can be described as an evolution
(4.7.1) |𝜓0 ⟩ , |𝜓1 ⟩ , . . . , |𝜓𝑘 ⟩
of quantum states that are defined as follows.
(1) The initial state |𝜓0 ⟩ ∈ ℍ𝑛 is the input of the computation.
(2) For 𝑖 ∈ ℤ𝑘 the state |𝜓𝑖+1 ⟩ is obtained by applying the quantum gates in 𝑞𝑖 to |𝜓𝑖 ⟩
as specified in 𝑞𝑖 .
(3) The final state is |𝜓𝑘 ⟩ = |𝑐 0 ⋯ 𝑐𝑚−1 ⟩ ∈ ℍ𝑚 with 𝑚 = 𝑛 + 𝑛𝑎 − 𝑛𝑒 where 𝑛𝑎 is
the number of ancillary gates and 𝑛𝑒 is the number of erasure gates used in the
quantum circuit.

[Figure: the circuit of Figure 4.6.1 with its gates grouped into the components 𝑞0 , . . . , 𝑞6 of Definition 4.7.1.]

Figure 4.7.1. Illustration of Definition 4.7.1.



The quantum operator implemented by 𝑄 is


(4.7.2) ℍ𝑛 → ℍ𝑚 , |𝜓0 ⟩ ↦ |𝜓𝑘 ⟩ .
Example 4.7.4. As shown in Section 4.6, the quantum operator shown in Figures 4.6.1
and 4.7.1 implements 𝐶 3 (𝑈).

We make some remarks about this construction. Since tracing out quantum bits
is allowed in this general definition of quantum circuits, the resulting quantum state
may be a mixed state. But frequently the situation is simpler. Denote by 𝐴 the quantum
system of all output qubits that are not traced out and by 𝐵 the quantum system of the
qubits that are traced out and assume that the state of system 𝐴𝐵 is separable. Then
by Corollary 3.7.12, tracing out system 𝐵 means omitting the corresponding quantum
bits. This is the case in Figure 4.6.1.
Next, we call the quantum circuit 𝑄 unitary if 𝑚 = 𝑛 and the quantum operator
implemented by it is unitary. If 𝑄 uses no ancillary and erasure gates, then it is always
unitary. If 𝑄 uses ancillary or erasure gates, it may or may not be unitary. For example,
if 𝑚 ≠ 𝑛, then 𝑄 is not unitary.
But every quantum circuit can be transformed into a unitary quantum circuit. To
see how this works, let 𝑄 be a quantum circuit. The transformed quantum circuit 𝑅 is obtained by removing the ancillary gates and adding the corresponding ancilla qubits as new input qubits, and by removing the erasure gates and adding the corresponding qubits as new output qubits. The resulting circuit 𝑅 is called the purification of 𝑄, and it is always unitary. Figure 4.7.2 shows the purification 𝑅 of the quantum
circuit 𝑄 from Figure 4.6.1. However, note that the quantum circuit 𝑄 in Figure 4.6.1
is already unitary. So we see that purification may not be required to make a quantum
circuit unitary.
It is also possible to represent quantum circuits as algorithms that are specified us-
ing pseudocode. An example is the algorithmic representation of the quantum circuit
in Figure 4.6.1 which is shown in Algorithm 4.7.5.

[Figure: the circuit of Figure 4.6.1 with the ancillary and erasure gates removed; |𝑎0 ⟩ and |𝑎1 ⟩ are additional input and output qubits; the intermediate states are |𝜓0 ⟩ , . . . , |𝜓7 ⟩.]

Figure 4.7.2. Purification of the quantum circuit in Figure 4.6.1.



Algorithm 4.7.5. Implementation of 𝐶 3 (𝑈) using only 𝖢𝖢𝖭𝖮𝖳 and 𝐶 1 (𝑈)


Input: |𝑐 0 ⟩ |𝑐 1 ⟩ |𝑐 2 ⟩ |𝑡⟩
Output: |𝑐 0 ⟩ |𝑐 1 ⟩ |𝑐 2 ⟩ 𝑈 𝑐0 ⋅𝑐1 ⋅𝑐2 |𝑡⟩
1: 𝐶 3 (𝑈)
2: Insert ancilla qubits |𝑎0 ⟩ , |𝑎1 ⟩ after |𝑐 2 ⟩ and initialize them to |0⟩
3: |𝑎0 ⟩ ← 𝑋 𝑐0 ⋅𝑐1 |𝑎0 ⟩
4: |𝑎1 ⟩ ← 𝑋 𝑐2 ⋅𝑎0 |𝑎1 ⟩
5: |𝑡⟩ ← 𝑈 𝑎1 |𝑡⟩
6: |𝑎1 ⟩ ← 𝑋 𝑐2 ⋅𝑎0 |𝑎1 ⟩
7: |𝑎0 ⟩ ← 𝑋 𝑐0 ⋅𝑐1 |𝑎0 ⟩
8: Trace out |𝑎0 ⟩ and |𝑎1 ⟩
9: The final state is |𝑐 0 ⟩ |𝑐 1 ⟩ |𝑐 2 ⟩ 𝑈 𝑐0 ⋅𝑐1 ⋅𝑐2 |𝑡⟩
10: end

Exercise 4.7.6. Write an algorithm that represents the quantum circuit in Figure 4.7.2.

From Theorem 1.7.12 we can deduce the following result.

Theorem 4.7.7. Let 𝑛, 𝑚 ∈ ℕ and let 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑚 . Then there is a quantum
circuit 𝑄 of size O(|𝑓|𝐹 ) that only uses quantum Toffoli, ancillary, and erasure gates and
implements the quantum operator
(4.7.3) 𝑈 ∶ ℍ𝑛+𝑚 → ℍ𝑛+𝑚 , |𝑥⃗ ⟩ |𝑦⃗ ⟩ ↦ |𝑥⃗ ⟩ |𝑦⃗ ⊕ 𝑓(𝑥)⃗ ⟩ .

Proof. Consider the quantum circuit 𝑄𝑟 which is obtained by replacing the classical
Toffoli gates in the circuit 𝐷𝑟 from Theorem 1.7.12 with quantum Toffoli gates. It im-
plements a unitary operator
(4.7.4) 𝑈𝑟 ∶ ℍ𝑛 ⊗ ℍ𝑛+𝑝 ⊗ ℍ𝑚 → ℍ𝑛 ⊗ ℍ𝑛+𝑝 ⊗ ℍ𝑚
where 𝑝 ∈ ℕ, 𝑝 ≤ 2|𝑓|𝐹 , such that for all 𝑥⃗ ∈ {0, 1}𝑛 and all 𝑦 ⃗ ∈ {0, 1}𝑚 we have

(4.7.5) 𝑈𝑟 |𝑥⃗ ⟩ |0⃗ ⟩ |𝑦⃗ ⟩ = |𝑥⃗ ⟩ |0⃗ ⟩ |𝑦⃗ ⊕ 𝑓(𝑥)⃗ ⟩ .

This quantum circuit is the purification of a quantum circuit 𝑄 that is constructed


as follows. The input state is |𝑥⃗ ⟩ |𝑦⃗ ⟩. First, 𝑄 inserts 𝑛 + 𝑝 ancilla qubits behind |𝑥⃗ ⟩ that
are all initialized to |0⟩. Then it applies 𝑄𝑟 . The result is the separable state in (4.7.5).
Finally, 𝑄 traces out the 𝑛 + 𝑝 ancilla qubits. By Corollary 3.7.12, this gives the asserted
quantum state. □
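The operator in (4.7.3) permutes the computational basis, so it can be written down directly as a 0-1 matrix. The following sketch (illustrative; the example function 𝑓 is ours) builds this matrix and confirms that it is unitary and self-inverse, since XOR-ing 𝑓(𝑥)⃗ in twice undoes itself:

```python
import numpy as np

def xor_oracle(f, n, m):
    """Matrix of |x>|y> -> |x>|y XOR f(x)> on n + m qubits.
    Basis index of |x>|y> is x * 2**m + y."""
    N = 2 ** (n + m)
    U = np.zeros((N, N))
    for x in range(2 ** n):
        for y in range(2 ** m):
            U[x * 2 ** m + (y ^ f(x)), x * 2 ** m + y] = 1
    return U

f = lambda x: (x >> 1) & (x & 1)     # example: AND of the two input bits
U = xor_oracle(f, n=2, m=1)
assert np.array_equal(U @ U, np.eye(8))      # XOR-ing twice undoes itself
assert np.array_equal(U.T @ U, np.eye(8))    # U is unitary (a permutation)
assert U[0b111, 0b110] == 1                  # |11>|0> -> |11>|1>
```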

4.8. Universal sets of quantum gates


In Definition 1.5.6, a universal set 𝑆 of logic gates is defined by the property that every
function 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑚 can be implemented by a Boolean circuit that uses only
the gates from 𝑆. For example, by Theorems 1.5.7 and 1.5.9, the sets {𝖠𝖭𝖣, 𝖮𝖱, 𝖭𝖮𝖳}
and {𝖭𝖠𝖭𝖣} are universal for classical computation. In this section, we introduce and
discuss universal sets of quantum gates.

Our first theorem shows that the notion of a universal set of quantum gates can-
not be obtained as a straightforward generalization of the corresponding definition in
classical computing.

Theorem 4.8.1. Let 𝑆 be a set of quantum gates such that for every 𝑛 ∈ ℕ and every
unitary operator 𝑈 on ℍ𝑛 there is a quantum circuit that implements 𝑈 and uses only
gates from 𝑆. Then 𝑆 is uncountable.

Proof. Let 𝑛 ∈ ℕ. By Theorem 4.3.15 the rotation gates 𝑅𝑥̂ (𝜃) with 𝜃/2 ∈ [0, 2𝜋[
are pairwise different. Since the set [0, 2𝜋[ is uncountable, it follows that the set of
unitary operators on ℍ𝑛 is uncountable. But if 𝑆 is a countable set of quantum gates,
then the set of all quantum circuits that can be constructed using the gates in 𝑆 is also
countable. □

Theorem 4.8.1 implies that there are no finite or even countable universal sets of
quantum gates in the classical sense. We will therefore call a set of quantum gates uni-
versal if it can be used to approximate every unitary operator to an arbitrary precision
and we will show that finite sets of quantum gates with this property exist. The notions
of universality can be generalized to general quantum operators. But in this book, we
restrict ourselves to unitary quantum operators. For the discussion of universality, we
need the following definition. It uses the supremum sup 𝑆 of a set 𝑆 of real numbers
that is bounded from above. It is the least upper bound of 𝑆 in ℝ and it can be shown
that it always exists and is uniquely determined.

Definition 4.8.2. Let 𝑈 and 𝑉 be two unitary operators on ℍ𝑛 . Then we define the
error when 𝑉 is implemented instead of 𝑈 as

(4.8.1) 𝐸(𝑈, 𝑉) = sup{‖(𝑈 − 𝑉) |𝜑⟩ ‖ ∶ |𝜑⟩ ∈ ℍ𝑛 , ⟨𝜑|𝜑⟩ = 1}.

We also call this error the distance between 𝑈 and 𝑉.
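For matrices, the supremum in (4.8.1) is the spectral norm, that is, the largest singular value of 𝑈 − 𝑉, so the error can be computed numerically. An illustrative sketch:

```python
import numpy as np

def E(U, V):
    """Distance (4.8.1): the largest singular value of U - V."""
    return np.linalg.norm(U - V, ord=2)

I = np.eye(2)
Z = np.diag([1.0, -1.0])
assert np.isclose(E(Z, I), 2.0)   # Z - I = diag(0, -2) has norm 2
assert np.isclose(E(Z, Z), 0.0)
```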

The next proposition uses the distance between two unitary operators 𝑈 and 𝑉 on
ℍ𝑛 to estimate the difference between the probabilities of measuring a certain outcome
when 𝑈 and 𝑉 are applied to a quantum state.

Proposition 4.8.3. Let 𝑈 and 𝑉 be unitary operators on ℍ𝑛 , let 𝑂 be an observable on ℍ𝑛 , and let 𝑂 = ∑𝜆∈Λ 𝜆𝑃𝜆 be the spectral decomposition of 𝑂. Then for all 𝜆 ∈ Λ and all quantum states |𝜓⟩ ∈ ℍ𝑛 we have

(4.8.2) |⟨𝑈 |𝜓⟩ |𝑃𝜆 | 𝑈 |𝜓⟩ ⟩ − ⟨𝑉 |𝜓⟩ |𝑃𝜆 | 𝑉 |𝜓⟩ ⟩| ≤ 2𝐸(𝑈, 𝑉).

Proof. Let 𝜆 ∈ Λ and let |𝜓⟩ ∈ ℍ𝑛 be a quantum state. Since 𝑃𝜆 is an orthogonal projection, it is Hermitian by Proposition 2.4.41. Also, 𝑈 is unitary and |𝜓⟩ has Euclidean length 1. From Exercise 2.4.38 we obtain

(4.8.3) ‖𝑃𝜆∗ 𝑈 |𝜓⟩ ‖ = ‖𝑃𝜆 𝑈 |𝜓⟩ ‖ ≤ ‖𝑈 |𝜓⟩ ‖ = ‖𝜓‖ = 1.

This inequality, the linearity of the inner product on ℍ𝑛 , the Cauchy-Schwarz inequality, and Exercise 2.4.38 imply

(4.8.4)
|⟨𝑈 |𝜓⟩ |𝑃𝜆 | 𝑈 |𝜓⟩ ⟩ − ⟨𝑉 |𝜓⟩ |𝑃𝜆 | 𝑉 |𝜓⟩ ⟩|
= |⟨𝑈 |𝜓⟩ |𝑃𝜆 | (𝑈 − 𝑉) |𝜓⟩ ⟩ + ⟨(𝑈 − 𝑉) |𝜓⟩ |𝑃𝜆 | 𝑉 |𝜓⟩ ⟩|
= |⟨𝑃𝜆∗ 𝑈 |𝜓⟩ | (𝑈 − 𝑉) |𝜓⟩ ⟩ + ⟨(𝑈 − 𝑉) |𝜓⟩ | 𝑃𝜆 𝑉 |𝜓⟩ ⟩|
≤ ‖𝑃𝜆∗ 𝑈 |𝜓⟩ ‖ ‖(𝑈 − 𝑉) |𝜓⟩ ‖ + ‖(𝑈 − 𝑉) |𝜓⟩ ‖ ‖𝑃𝜆 𝑉 |𝜓⟩ ‖
≤ 2𝐸(𝑈, 𝑉). □

Now we can define universal sets of quantum gates. In this definition, we use the
term unitary quantum gate. By this we mean a quantum gate that implements a unitary
operator; i.e., this quantum gate is neither an ancillary nor an erasure gate.

Definition 4.8.4. Let 𝑆 be a set of unitary quantum gates.

(1) We say that 𝑆 is universal for a set 𝑇 of unitary quantum operators if for every
𝜀 ∈ ℝ>0 and every 𝑈 ∈ 𝑇 there is a unitary operator 𝑉 which can — up to a
global phase factor — be implemented by a quantum circuit that uses only gates
from 𝑆, ancillary gates, and erasure gates such that 𝐸(𝑈, 𝑉) < 𝜀.
(2) We say that 𝑆 is universal for quantum computation if 𝑆 is universal for the set of
all unitary quantum operators.

We also define a direct analog of the classical notion of universality.

Definition 4.8.5. We call a set 𝑆 of unitary quantum gates perfectly universal for quan-
tum computation or perfectly universal for short if all unitary operators 𝑈 on ℍ𝑛 can,
up to a global phase factor, be implemented by a quantum circuit that uses only gates
from 𝑆, ancillary gates, and erasure gates.

4.9. Implementation of controlled operators


This section discusses the implementation of general controlled unitary operators that
were defined in Section 4.4.3. They are required in many contexts, for example in the
proofs that certain sets of gates are universal or perfectly universal.
We first note the following. In Section 4.4.2, we demonstrated how to implement
𝐶 1 (𝑈) for single-qubit operators 𝑈 when a certain representation of 𝑈 is known. How-
ever, in general, implementing 𝐶 1 (𝑈) for unitary operators on ℍ𝑛 when 𝑛 > 1 is more
involved. To create implementations of such controlled operators, additional informa-
tion about 𝑈 is required, as will be discussed in Section 4.12.3. Therefore, this section
exclusively discusses implementations of controlled-𝑈 operators that may use 𝐶 1 (𝑉)
gates for certain operators 𝑉.

4.9.1. Implementations of 𝐶 2 (𝑈) operators. We start by presenting implementations of 𝐶 2 (𝑈) for unitary operators 𝑈 for which a square root is given. This
will allow the implementation of the quantum Toffoli gate. We first show that any
unitary operator possesses a square root.
Proposition 4.9.1. Let 𝑈 be a unitary operator on ℍ𝑛 . Then there is a unitary operator
𝑉 on ℍ𝑛 with 𝑉 2 = 𝑈.

Proof. Let
(4.9.1) 𝑈 = ∑ 𝜆𝑃𝜆
𝜆∈Λ

be the spectral decomposition of 𝑈. Set


(4.9.2) 𝑉 = ∑ √𝜆𝑃𝜆
𝜆∈Λ

where √𝜆 is a square root of 𝜆 in ℂ. This is the spectral decomposition of 𝑉 and we


have 𝑉 2 = 𝑈. Also, by Proposition 2.4.60 we have |𝜆| = 1 for all 𝜆 ∈ Λ. This implies
that |√𝜆| = 1 for all 𝜆 ∈ Λ. Therefore, Proposition 2.4.60 implies that 𝑉 is unitary. □

Using the method from the proof of Proposition 4.9.1, we can determine square
roots of the Pauli operators.
Proposition 4.9.2. The operator 𝑉 = (1 + 𝑖)(𝐼 − 𝑖𝑋)/2 is unitary, and we have 𝑉 2 = 𝑋.

Proof. By Example 2.4.57 the spectral decomposition of the Pauli 𝑋 operator is 𝑋 =


|𝑥+ ⟩ ⟨𝑥+ | − |𝑥− ⟩ ⟨𝑥− | with |𝑥+ ⟩ , |𝑥− ⟩ from (4.1.7). Hence, if we set 𝑉 = |𝑥+ ⟩ ⟨𝑥+ | +
𝑖 |𝑥− ⟩ ⟨𝑥− |, then 𝑉 is a unitary single-qubit operator, we have 𝑉 2 = 𝑋, and 𝑉 =
(1 + 𝑖)(𝐼 − 𝑖𝑋)/2 as shown in Exercise 4.9.3. □
Exercise 4.9.3. Verify that |𝑥+ ⟩ ⟨𝑥+ | + 𝑖 |𝑥− ⟩ ⟨𝑥− | = (1 + 𝑖)(𝐼 − 𝑖𝑋)/2.
Exercise 4.9.4. Determine square roots of the Pauli operators 𝑌 and 𝑍 and the repre-
sentations of these square roots as a linear combination of the basis elements 𝐼, 𝑋, 𝑌 , 𝑍
of End(ℍ1 ).
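The proof of Proposition 4.9.1 is constructive: diagonalize 𝑈 and take square roots of its eigenvalues. An illustrative NumPy sketch (restricted to Hermitian unitaries such as the Pauli operators, so that a Hermitian eigendecomposition applies):

```python
import numpy as np

def unitary_sqrt(U):
    """Square root of a Hermitian unitary via its spectral decomposition."""
    vals, vecs = np.linalg.eigh(U)
    return vecs @ np.diag(np.sqrt(vals.astype(complex))) @ vecs.conj().T

X = np.array([[0, 1], [1, 0]], dtype=complex)
V = unitary_sqrt(X)
assert np.allclose(V @ V, X)                      # V is a square root of X
assert np.allclose(V.conj().T @ V, np.eye(2))     # and V is unitary

# Proposition 4.9.2 gives the closed form (1 + i)(I - iX)/2:
W = (1 + 1j) * (np.eye(2) - 1j * X) / 2
assert np.allclose(W @ W, X)
```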

Figure 4.9.1 shows an implementation of 𝐶 2 (𝑈) that uses a square root 𝑉 of 𝑈. Its
correctness is stated in Proposition 4.9.5.
Proposition 4.9.5. Let 𝑈, 𝑉 be unitary operators such that 𝑉 2 = 𝑈. Then the circuit on
the right side of Figure 4.9.1 implements 𝐶 2 (𝑈). It uses two 𝖢𝖭𝖮𝖳 gates, two 𝐶 1 (𝑉) gates,
and one 𝐶 1 (𝑉 ∗ ) gate.

[Figure: on the left, the 𝐶 2 (𝑈) gate; on the right, the equivalent circuit built from two 𝖢𝖭𝖮𝖳 gates and the controlled gates 𝑉, 𝑉 ∗ , 𝑉.]

Figure 4.9.1. Implementation of 𝐶 2 (𝑈) if 𝑉 2 = 𝑈.



[Figure: a circuit built from Hadamard gates 𝐻, the phase gate 𝑆, 𝖢𝖭𝖮𝖳 gates, and the 𝜋/8 gates 𝑇 and 𝑇 ∗ .]

Figure 4.9.2. An implementation of the Toffoli gate that uses the Hadamard, phase, 𝖢𝖭𝖮𝖳, and 𝜋/8 gates.

Exercise 4.9.6. Prove Proposition 4.9.5.

It follows from Proposition 4.9.2 that the circuit in Figure 4.9.1 can be used to im-
plement the quantum Toffoli gate. Figure 4.9.2 shows another implementation of this
gate. It uses the phase, 𝜋/8, and 𝖢𝖭𝖮𝖳 gates and its correctness is stated in Proposition
4.9.7.

Proposition 4.9.7. The circuit in Figure 4.9.2 implements the Toffoli gate. It uses two
Hadamard, one phase, seven (inverse) 𝜋/8, and six 𝖢𝖭𝖮𝖳 gates.

Exercise 4.9.8. Prove Proposition 4.9.7.
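With 𝑉 a square root of 𝑋 as in Proposition 4.9.2, the circuit of Figure 4.9.1 can be multiplied out numerically. The following sketch (the helper names are ours, and the gate ordering assumes the standard wiring controlled-𝑉, 𝖢𝖭𝖮𝖳, controlled-𝑉 ∗ , 𝖢𝖭𝖮𝖳, controlled-𝑉) checks that it reproduces the Toffoli gate:

```python
import numpy as np

def controlled(n, control, target, G):
    """Matrix on n qubits applying single-qubit gate G to `target`
    when qubit `control` (qubit 0 is the leftmost) is |1>."""
    N = 2 ** n
    M = np.zeros((N, N), dtype=complex)
    for b in range(N):
        if (b >> (n - 1 - control)) & 1 == 0:
            M[b, b] = 1
        else:
            tbit = (b >> (n - 1 - target)) & 1
            for new in (0, 1):
                row = b & ~(1 << (n - 1 - target)) | (new << (n - 1 - target))
                M[row, b] = G[new, tbit]
    return M

X = np.array([[0, 1], [1, 0]], dtype=complex)
V = (1 + 1j) * (np.eye(2) - 1j * X) / 2          # V @ V = X
Vd = V.conj().T

# Controls on qubits 0 and 1, target qubit 2; matrix product is applied
# right to left, so the rightmost factor acts first.
C2X = (controlled(3, 0, 2, V) @ controlled(3, 0, 1, X)
       @ controlled(3, 1, 2, Vd) @ controlled(3, 0, 1, X)
       @ controlled(3, 1, 2, V))

toffoli = np.eye(8)
toffoli[[6, 7]] = toffoli[[7, 6]]   # flip target iff both controls are 1
assert np.allclose(C2X, toffoli)
```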

4.9.2. Implementation of general controlled operators. We now present implementations of general controlled operators as described in Definition 4.4.7, using
only Toffoli gates and controlled gates with a single control qubit. We begin by provid-
ing an explanation of the implementations of all operators 𝐶 𝑘 (𝑈) introduced in Section
4.4.5, for every unitary operator 𝑈 on ℍ𝑚 , where 𝑘 and 𝑚 are both natural numbers
and 𝑘 ≥ 2. The circuits use 𝑘 − 1 ancilla qubits and operate on quantum registers of
length 2𝑘 + 𝑚 − 1. The corresponding circuit implementation of 𝐶 3 (𝑈) has already
been presented in Section 4.6 and Figure 4.6.1 to motivate the use of ancillary qubits.
The circuit for 𝑘 = 5 and 𝑚 = 1 is shown in Figure 4.9.3. The general implementation
of 𝐶 𝑘 (𝑈) is presented in Algorithm 4.9.9.
The idea of this construction is the following. Fix 𝑘, 𝑚, 𝑈. After inserting the an-
cilla qubits, the circuit operates on states of the form
(4.9.3) |𝑐 0 ⋯ 𝑐 𝑘−1 ⟩ |𝑎0 ⋯ 𝑎𝑘−2 ⟩ |𝑡0 ⋯ 𝑡𝑚−1 ⟩
where |𝑐 0 ⋯ 𝑐 𝑘−1 ⟩ are the 𝑘 control qubits, |𝑎0 ⋯ 𝑎𝑘−2 ⟩ are the 𝑘 − 1 ancilla qubits, and
|𝑡0 ⋯ 𝑡𝑚−1 ⟩ are the 𝑚 target qubits. The first 𝑘 − 1 steps of the computation change the
ancilla qubits to
(4.9.4) |𝑎𝑗 ⟩ = |𝑐 0 ⋅ 𝑐 1 ⋯ 𝑐𝑗+1 ⟩ , 0 ≤ 𝑗 ≤ 𝑘 − 2,
and have no effect on the other qubits. In step 𝑘 the circuit changes the target qubits
|𝑡0 ⋯ 𝑡𝑚−1 ⟩ to 𝑈 𝑎𝑘−2 |𝑡0 ⋯ 𝑡𝑚−1 ⟩ which by (4.9.4) is
(4.9.5) 𝑈 𝑐0 ⋅𝑐1 ⋯𝑐𝑘−1 |𝑡0 ⋯ 𝑡𝑚−1 ⟩ .

Algorithm 4.9.9. Implementation of 𝐶 𝑘 (𝑈) using only 𝖢𝖢𝖭𝖮𝖳 and 𝐶 1 (𝑈)


Input: |𝑐 0 ⋯ 𝑐 𝑘−1 ⟩ |𝑡0 ⋯ 𝑡𝑚−1 ⟩
Output: 𝐶 𝑘 (𝑈) |𝑐 0 ⋯ 𝑐 𝑘−1 ⟩ |𝑡0 ⋯ 𝑡𝑚−1 ⟩
1: 𝐶 𝑘 (𝑈)
2: Insert 𝑘 − 1 ancilla qubits |𝑎0 ⟩ , . . . , |𝑎𝑘−2 ⟩ after the control qubit |𝑐 𝑘−1 ⟩ and
initialize them to |0⟩
3: |𝑎0 ⟩ ← 𝑋 𝑐0 ⋅𝑐1 |𝑎0 ⟩
4: for 𝑗 = 1 ⋯ 𝑘 − 2 do
5: |𝑎𝑗 ⟩ ← 𝑋 𝑐𝑗+1 ⋅𝑎𝑗−1 |0⟩
6: end for
7: |𝑡0 ⋯ 𝑡𝑚−1 ⟩ ← 𝑈 𝑎𝑘−2 |𝑡0 ⋯ 𝑡𝑚−1 ⟩
8: for 𝑗 = 𝑘 − 2, . . . , 1 do
9: |𝑎𝑗 ⟩ ← 𝑋 𝑐𝑗+1 ⋅𝑎𝑗−1 |𝑎𝑗 ⟩
10: end for
11: |𝑎0 ⟩ ← 𝑋 𝑐0 ⋅𝑐1 |𝑎0 ⟩
12: Trace out |𝑎0 ⋯ 𝑎𝑘−2 ⟩
13: The final state is 𝐶 𝑘 (𝑈) |𝑐 0 ⋯ 𝑐 𝑘−1 ⟩ |𝑡0 ⋯ 𝑡𝑚−1 ⟩
14: end

This step has no effect on the other qubits. The next 𝑘 − 1 steps of the circuit change
the ancilla qubits back to |0⟩ without an effect on the other qubits. Now the state of

[Figure: the control qubits |𝑐 0 ⟩ , . . . , |𝑐 4 ⟩ pass through unchanged; the ancilla qubits |𝑎0 ⟩ , . . . , |𝑎3 ⟩, initialized to |0⟩, are traced out at the end; the target register |𝑡 ⃗ ⟩ ends in the state 𝑈 𝑐0 𝑐1 𝑐2 𝑐3 𝑐4 |𝑡 ⃗ ⟩.]

Figure 4.9.3. Implementation of 𝐶 5 (𝑈) using only 𝖢𝖢𝖭𝖮𝖳 and 𝐶 1 (𝑈) gates.

the composition 𝐴𝐵, where 𝐴 consists of the control and target qubits and system 𝐵
consists of the ancilla qubits, is separable. Hence, it follows from Corollary 3.7.12 that
tracing out the ancilla qubits does not change the control or target qubits and yields
(4.9.6) 𝐶 𝑘 (𝑈) |𝑐 0 ⋯ 𝑐 𝑘−1 ⟩ |𝑡0 ⋯ 𝑡𝑚−1 ⟩ = |𝑐 0 ⋯ 𝑐 𝑘−1 ⟩ 𝑈 𝑐0 ⋅𝑐1 ⋯𝑐𝑘−1 |𝑡0 ⋯ 𝑡𝑚−1 ⟩ .

The next proposition states the correctness of the construction.

Proposition 4.9.10. Let 𝑈 be a unitary operator on ℍ𝑚 and let 𝑘 ∈ ℕ, 𝑘 ≥ 2. Then


Algorithm 4.9.9 implements 𝐶 𝑘 (𝑈). It uses 2𝑘 − 2 𝖢𝖢𝖭𝖮𝖳 gates, one 𝐶 1 (𝑈) gate, and
𝑘 − 1 ancillary and erasure gates.

Proof. We show by induction that for 𝑗 = 0, . . . , 𝑘 − 2, the first 𝑗 + 1 steps carried out by the algorithm, starting in line 3, yield the ancilla qubits
(4.9.7) |𝑎𝑗 ⟩ = |𝑐 0 ⋅ 𝑐 1 ⋯ 𝑐𝑗+1 ⟩
and do not change the other qubits. We start with 𝑗 = 0, that is, with the statement in
line 3 of the algorithm. It produces the state
(4.9.8) |𝑎0 ⟩ = |𝑐 0 ⋅ 𝑐 1 ⟩ .

All other qubits remain unchanged.


Now let 𝑗 ∈ ℕ, 1 ≤ 𝑗 < 𝑘 − 2, and assume that after 𝑗 steps we have
(4.9.9) |𝑎𝑗−1 ⟩ = |𝑐 0 ⋅ 𝑐 1 ⋯ 𝑐𝑗 ⟩ .
Step 𝑗 + 1 is the iteration with loop index 𝑗 of the for loop starting in line 4. After this
iteration, we have
(4.9.10) |𝑎𝑗 ⟩ = 𝑋 𝑎𝑗−1 ⋅𝑐𝑗+1 |0⟩ = |𝑎𝑗−1 ⋅ 𝑐𝑗+1 ⟩ .

So (4.9.9) and (4.9.10) imply (4.9.7). Next, it follows from (4.9.7) with 𝑗 = 𝑘 − 2 that
the statement in line 7 has the following effect:
(4.9.11) |𝑡0 ⋯ 𝑡𝑚−1 ⟩ ← 𝑈 𝑎𝑘−2 |𝑡0 ⋯ 𝑡𝑚−1 ⟩ = 𝑈 𝑐0 ⋅𝑐1 ⋯𝑐𝑘−1 |𝑡0 ⋯ 𝑡𝑚−1 ⟩ .

Finally, the for loop starting in line 8 carries out 𝑘 − 2 iterations with loop indices
𝑗 = 𝑘 − 2, . . . , 1. Iteration with loop index 𝑗 ∈ {1, . . . , 𝑘 − 2} changes the qubit |𝑎𝑗 ⟩ and
no other qubit. Also, after the 𝑗th iteration we have
(4.9.12) |𝑎𝑗 ⟩ = |0⟩ .

This can also be seen by induction on 𝑗. The statement in line 11 yields |𝑎0 ⟩ = |0⟩. So
(4.9.12) holds for 𝑗 = 0, . . . , 𝑘 − 2. In addition, the state of the composition 𝐴𝐵, where
𝐴 consists of the control and target qubits and system 𝐵 consists of the ancillary qubits,
is separable. So, by Corollary 3.7.12, tracing out the qubits |𝑎0 ⟩ , . . . , |𝑎𝑘−2 ⟩ has no effect
on the other qubits. This concludes the proof. □
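On computational basis inputs, Algorithm 4.9.9 can be traced with classical bits, which also checks the invariant (4.9.7) from the proof. An illustrative sketch (our arbitrary choice 𝑈 = 𝑋):

```python
import numpy as np

U = np.array([[0, 1], [1, 0]], dtype=complex)   # any single-qubit U; here X

def ck_apply(c, t):
    """Basis-state run of Algorithm 4.9.9 for k = len(c) controls."""
    k = len(c)
    a = [0] * (k - 1)
    a[0] = c[0] & c[1]                          # line 3
    for j in range(1, k - 1):                   # lines 4-6
        a[j] = c[j + 1] & a[j - 1]
        assert a[j] == all(c[: j + 2])          # invariant (4.9.7)
    if a[k - 2]:                                # line 7
        t = U @ t
    for j in range(k - 2, 0, -1):               # lines 8-10
        a[j] ^= c[j + 1] & a[j - 1]
    a[0] ^= c[0] & c[1]                         # line 11
    assert all(x == 0 for x in a)               # ancillas uncomputed
    return t

k = 5
for bits in range(2 ** k):
    c = [(bits >> i) & 1 for i in range(k)]
    out = ck_apply(c, np.array([1.0, 0.0]))
    want = np.linalg.matrix_power(U, int(all(c))) @ np.array([1.0, 0.0])
    assert np.allclose(out, want)
```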

[Figure: the qubits |𝑐 0 ⟩ and |𝑐 1 ⟩ are conjugated by Pauli 𝑋 gates; |𝑐 5 ⟩ is neither control nor target; the ancilla qubits |𝑎0 ⟩ , . . . , |𝑎3 ⟩, initialized to |0⟩, are traced out at the end; the target qubit |𝑡⟩ ends in the state 𝑈 (1−𝑐0 )(1−𝑐1 )𝑐2 𝑐3 𝑐4 |𝑡⟩.]

Figure 4.9.4. Implementation of 𝐶 {0,1},{2,3,4},6 (𝑈) using only 𝑋, 𝖢𝖢𝖭𝖮𝖳, and 𝐶 1 (𝑈) gates.

From Propositions 4.9.10 and 4.9.7 we obtain the following result.


Proposition 4.9.11. Let 𝑈 be a unitary operator on ℍ𝑚 and let 𝑘 ∈ ℕ, 𝑘 ≥ 2. Then Al-
gorithm 4.9.9 implements 𝐶 𝑘 (𝑈) using O(𝑘) Hadamard, phase, 𝜋/8, inverse 𝜋/8, 𝖢𝖭𝖮𝖳,
ancillary, and erasure gates, and one 𝐶 1 (𝑈) gate.
Exercise 4.9.12. Prove Proposition 4.9.11.

Based on the construction of the 𝐶 𝑘 (𝑈) quantum circuit, we can implement gen-
eral 𝐶 𝐶0 ,𝐶1 ,𝑇 operators as in Definition 4.4.7. As an example, Figure 4.9.4 shows a quan-
tum circuit that implements such an operator with 𝑛 = 7, 𝐶0 = {0, 1}, 𝐶1 = {2, 3, 4},
and 𝑇 = {6}. In this circuit, the Pauli 𝑋 operator is applied to the qubits |𝑏𝑖 ⟩ with 𝑖 ∈ 𝐶0
at the beginning and the end of the computation. This idea is the basis for the following
theorem.
Theorem 4.9.13. Let 𝐶0 , 𝐶1 , and 𝑇 be pairwise disjoint subsets of ℤ𝑛 , let 𝑚 = |𝑇| > 0,
and assume that 𝑇 = {𝑖, 𝑖 + 1, . . . , 𝑖 + 𝑚 − 1}. Also, let 𝑈 be a unitary operator on ℍ𝑚 .
Set 𝑘0 = |𝐶0 |, 𝑘1 = |𝐶1 |, 𝑘 = 𝑘0 + 𝑘1 . Then the unitary operator 𝐶 𝐶0 ,𝐶1 ,𝑇 (𝑈) can be
implemented by a quantum circuit that uses 2𝑘0 Pauli 𝑋 gates, 2𝑘 − 2 𝖢𝖢𝖭𝖮𝖳 gates, one
𝐶 1 (𝑈) gate, and 𝑘 − 1 ancillary and erasure gates.

For the special case where 𝑈 is a single-qubit operator, Theorem 4.9.13 implies the
following result.
Theorem 4.9.14. Let 𝑈 be a unitary single-qubit operator, let 𝐶0 , 𝐶1 be disjoint subsets
of ℤ𝑛 , let 𝑘0 = |𝐶0 |, 𝑘1 = |𝐶1 |, 𝑘 = 𝑘0 + 𝑘1 < 𝑛, and 𝑡 ∈ ℤ𝑛 ⧵ (𝐶0 ∪ 𝐶1 ). Then the

unitary operator 𝐶 𝐶0 ,𝐶1 ,𝑡 (𝑈) can be implemented by a quantum circuit that uses O(𝑘)
Pauli 𝑋, Hadamard, 𝜋/8, inverse 𝜋/8, 𝖢𝖭𝖮𝖳, ancillary, and erasure gates and four other
single-qubit gates.

Proof. It follows from Theorem 4.9.13 that 𝐶 𝐶0 ,𝐶1 ,𝑡 (𝑈) can be implemented using 2𝑘0
Pauli 𝑋 gates, 2𝑘 − 2 𝖢𝖢𝖭𝖮𝖳 gates, one 𝐶 1 (𝑈) operator, and 𝑘 − 1 ancillary and erasure
gates. By Proposition 4.9.7 each 𝖢𝖢𝖭𝖮𝖳 gate can be implemented using 2 Hadamard
gates 𝐻, one phase gate 𝑆, 7 (inverse) 𝜋/8 gates 𝑇, and 6 standard 𝖢𝖭𝖮𝖳 gates. Also,
by Theorem 4.4.5 the operator 𝐶 1 (𝑈) can be implemented using 2 standard 𝖢𝖭𝖮𝖳 and
4 single-qubit gates 𝐴, 𝐵, 𝐶, 𝑒𝑖𝛿/2 𝑅𝑧̂ (𝛿) such that 𝑈 = 𝑒𝑖𝛿 𝐴𝑋𝐵𝑋𝐶. Since by (4.3.53)
we have 𝑆 = 𝑇 2 , it follows that 𝐶 𝐶0 ,𝐶1 ,𝑡 (𝑈) can be implemented using O(𝑘) Pauli 𝑋,
Hadamard, (inverse) 𝜋/8, and standard 𝖢𝖭𝖮𝖳 gates and four other single-qubit gates. □

From Theorem 4.9.13 we obtain the following result.


Theorem 4.9.15. Every transposition operator can be implemented by a quantum circuit
that uses O(𝑛) Pauli 𝑋, Hadamard, (inverse) 𝜋/8, standard 𝖢𝖭𝖮𝖳, ancillary and erasure
gates.

Proof. The proof is analogous to the proof of Theorem 4.9.14 and uses the fact that for
a transposition operator, we have 𝑈 = 𝑋. □

4.10. Perfectly universal sets of quantum gates


In this section, we construct two perfectly universal sets of quantum gates.

4.10.1. Two-level operators. For the first construction, we need the following
definition. We recall that we identify linear operators on ℍ𝑛 with their representation
matrices with respect to the computational basis of ℍ𝑛 .
Definition 4.10.1. Let 𝐴 ∈ ℂ(𝑘,𝑘) . Then 𝐴 is called a two-level matrix, two-level oper-
ator, or two-level gate if there are 𝑖, 𝑗 ∈ ℤ𝑘 such that for every 𝑣 ̂ ∈ ℂ𝑘 all entries of the
vectors 𝑣 ̂ and 𝐴𝑣 ̂ with indices different from 𝑖 and 𝑗 are equal.

We note that in this definition we do not require 𝑖 and 𝑗 to be different. This implies that all matrices in ℂ(1,1) are two-level matrices, which will simplify our reasoning. We also note that all matrices in ℂ(2,2) are two-level matrices.
Example 4.10.2. Consider the matrix
1 1 0
𝐴 = (1 −1 0) .
0 0 1
For every 𝑣 ̂ = (𝑣 0 , 𝑣 1 , 𝑣 2 ) ∈ ℂ3 we have
𝑣0 + 𝑣1
𝐴𝑣 ̂ = (𝑣 0 − 𝑣 1 ) .
𝑣2
Hence, 𝐴 is a two-level matrix.

Example 4.10.3. Consider the matrix


1 1 0
𝐴 = (1 −1 0) .
0 1 1

For every 𝑣 ̂ = (𝑣 0 , 𝑣 1 , 𝑣 2 ) ∈ ℂ3 we have


𝑣0 + 𝑣1
𝐴𝑣 ̂ = (𝑣 0 − 𝑣 1 ) .
𝑣1 + 𝑣2
Hence, for 𝑣 ̂ = (1, 1, 1) we have 𝐴𝑣 ̂ = (2, 0, 2). So 𝐴 is not a two-level matrix.

The main theorem of this section is the following.

Theorem 4.10.4. Let 𝑈 ∈ ℂ(𝑘,𝑘) be unitary. Then 𝑈 can be written as a product of


𝑘(𝑘 − 1)/2 unitary two-level matrices.

Proof. We prove the assertion by induction on 𝑘. For 𝑘 = 1 and 𝑘 = 2 the assertion


holds. Now assume that 𝑘 > 2 and that the assertion is proved for 𝑘 − 1. For the
inductive step, let 𝑈 ∈ ℂ(𝑘,𝑘) be a unitary matrix. We first show that there are unitary
two-level matrices 𝑈1 , . . . , 𝑈 𝑘−1 ∈ ℂ(𝑘,𝑘) such that the product
(4.10.1) 𝑉 = 𝑈 𝑘−1 ⋯ 𝑈1 𝑈
is of the form
1 0 ⋯ 0
⎛ ⎞
0 ∗ ⋯ ∗
(4.10.2) 𝑉 =⎜ ⎟.
⎜ ⋮ ⋮ ⋮⎟
⎝ 0 ∗ ⋯ ∗⎠

To prove this claim, we use the following construction. Let 𝑊 = (𝑤 𝑖,𝑗 ) ∈ ℂ(𝑘,𝑘) be a
unitary matrix. For 𝑖 = 1, . . . , 𝑘 − 1 define the matrix 𝑇𝑖 (𝑊) = (𝑡𝑝,𝑞 ) ∈ ℂ(𝑘,𝑘) as follows.
If 𝑤 𝑖,0 = 0, we set 𝑇𝑖 (𝑊) = 𝐼𝑘 . If 𝑤 𝑖,0 ≠ 0, we initialize 𝑇𝑖 (𝑊) to 𝐼𝑘 and then change
four of the entries of 𝑇𝑖 (𝑊) in the following way. Set
(4.10.3) 𝑐 = 1/√(|𝑤 0,0 |2 + |𝑤 𝑖,0 |2 )
and

(4.10.4) 𝑡0,0 = 𝑐𝑤̄ 0,0 , 𝑡0,𝑖 = 𝑐𝑤̄ 𝑖,0 , 𝑡 𝑖,0 = 𝑐𝑤 𝑖,0 , 𝑡 𝑖,𝑖 = −𝑐𝑤 0,0 , where 𝑤̄ denotes the complex conjugate of 𝑤.

Then 𝑇𝑖 (𝑊) is a two-level matrix. Also, this matrix is unitary. To see this, we observe that the columns with indices 𝑗 ≠ 0, 𝑖 are equal to the unit vectors 𝑒𝑗⃗ . Therefore, they have length 1 and are pairwise orthogonal. Furthermore, the squared length of the first column is 𝑐2 (|𝑤 0,0 |2 + |𝑤 𝑖,0 |2 ) = 1, the squared length of the 𝑖th column is 𝑐2 (|𝑤 𝑖,0 |2 + |𝑤 0,0 |2 ) = 1, the inner product of the first and the 𝑖th columns is 𝑐2 (𝑤 0,0 𝑤̄ 𝑖,0 − 𝑤̄ 𝑖,0 𝑤 0,0 ) = 0, and the inner product of these two columns with the remaining columns is 0.

We determine the entries in the first column of the product 𝑇𝑖 (𝑊)𝑊. The entries
with indices different from 0 and 𝑖 are the same as the corresponding entries in the first
column of 𝑊. The entry with index 0 is
(4.10.5) 𝑡0,0 𝑤 0,0 + 𝑡0,𝑖 𝑤 𝑖,0 = 𝑐(|𝑤 0,0 |2 + |𝑤 𝑖,0 |2 ) = 1/𝑐.
The entry with index 𝑖 is
(4.10.6) 𝑡 𝑖,0 𝑤 0,0 + 𝑡 𝑖,𝑖 𝑤 𝑖,0 = 𝑐(𝑤 𝑖,0 𝑤 0,0 − 𝑤 0,0 𝑤 𝑖,0 ) = 0.

So if we set 𝑈0 = 𝑈 and 𝑈 𝑗 = 𝑇𝑗 (𝑈 𝑗−1 ⋯ 𝑈1 𝑈0 ) for 𝑗 = 1, . . . , 𝑘 − 1, then 𝑉 = 𝑈 𝑘−1 ⋯ 𝑈1 𝑈 has the form
𝑥 ∗ ⋯ ∗
⎛ ⎞
0 ∗ ⋯ ∗
(4.10.7) 𝑉 =⎜ ⎟
⎜ ⋮ ⋮ ⋮⎟
⎝0 ∗ ⋯ ∗⎠
with a positive real number 𝑥. As a product of unitary matrices, this matrix is unitary.
So it follows from Proposition 2.4.18 that the length of the first row and column is 1.
This implies 𝑥 = 1 and that the entries in the first row with indices 𝑗 ∈ {1, . . . , 𝑘 − 1}
are 0. Hence, 𝑉 is of the form
1 0⃗
(4.10.8) 𝑉 =( )
0⃗ 𝑉′

where 0⃗ denotes the row and column vectors in ℂ𝑘−1 which have only zero entries,
respectively, and 𝑉 ′ ∈ ℂ(𝑘−1,𝑘−1) is the minor obtained from 𝑉 by deleting the first
row and column of this matrix. Since 𝑉 is unitary, Proposition 2.4.18 implies that 𝑉 ′ is
unitary.
According to the induction hypothesis, there are 𝑚 ∈ ℕ and unitary two-level matrices 𝑉0′ , . . . , 𝑉𝑚−1′ ∈ ℂ(𝑘−1,𝑘−1) such that 𝑚 ≤ (𝑘 − 1)(𝑘 − 2)/2 and 𝑉 ′ = 𝑉0′ ⋯ 𝑉𝑚−1′ .
For 0 ≤ 𝑖 < 𝑚 we set
1 0⃗
(4.10.9) 𝑉𝑖 = ( ).
0⃗ 𝑉𝑖′
So 𝑉 𝑖 is obtained from 𝑉𝑖′ by prepending the unit vector (1, 0, . . . , 0) ∈ ℂ𝑘 as first row
and column. Then the 𝑉 𝑖 are unitary two-level matrices and we have 𝑉 = 𝑉0 ⋯ 𝑉𝑚−1 .
From (4.10.1) we obtain
(4.10.10) 𝑈 = 𝑈1∗ ⋯ 𝑈𝑘−1∗ 𝑉0 ⋯ 𝑉𝑚−1 .
This is a decomposition of 𝑈 into a product of two-level unitary matrices. The number
of factors is 𝑚 + 𝑘 − 1 ≤ (𝑘 − 1)(𝑘 − 2)/2 + 𝑘 − 1 = (𝑘2 − 3𝑘 + 2 + 2𝑘 − 2)/2 = (𝑘2 − 𝑘)/2 =
𝑘(𝑘 − 1)/2. □
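Carried out numerically, the reduction step of this proof looks as follows (an illustrative sketch; the helper name and the choice of the 3-point discrete Fourier transform matrix as test unitary are ours):

```python
import numpy as np

def T(W, i):
    """Two-level matrix T_i(W) from the proof of Theorem 4.10.4."""
    k = W.shape[0]
    if W[i, 0] == 0:
        return np.eye(k, dtype=complex)
    c = 1 / np.sqrt(abs(W[0, 0]) ** 2 + abs(W[i, 0]) ** 2)
    M = np.eye(k, dtype=complex)
    M[0, 0] = c * W[0, 0].conjugate()
    M[0, i] = c * W[i, 0].conjugate()
    M[i, 0] = c * W[i, 0]
    M[i, i] = -c * W[0, 0]
    return M

# Test unitary: the 3-point discrete Fourier transform matrix.
F = np.exp(2j * np.pi * np.outer(range(3), range(3)) / 3) / np.sqrt(3)
V = F
for i in range(1, 3):
    V = T(V, i) @ V      # zero out entry (i, 0) as in (4.10.1)

e0 = np.array([1, 0, 0])
assert np.allclose(V[:, 0], e0)          # first column as in (4.10.2)
assert np.allclose(V[0, :], e0)          # first row, by unitarity
assert np.allclose(V.conj().T @ V, np.eye(3))
```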

The proof of Theorem 4.10.4 also contains a method to construct the decompo-
sition of a unitary matrix into a product of two-level matrices and thus a method to
construct a circuit that implements a given unitary operator and uses only two-level
gates. Therefore, Theorem 4.10.4 implies the following corollary.

Corollary 4.10.5. The set of all two-level unitary operators is perfectly universal for quan-
tum computation.

4.10.2. Another perfectly universal set of quantum gates. In this section, we prove the following theorem.
Theorem 4.10.6. The set that contains all rotation gates and the standard 𝖢𝖭𝖮𝖳 gate is
perfectly universal for quantum computing.

To prove Theorem 4.10.6 we will show that every unitary two-level operator can be
implemented by a quantum circuit that uses only single-qubit gates and the standard
𝖢𝖭𝖮𝖳 gate. Then Theorem 4.10.6 follows from this statement and Corollaries 4.10.5
and 4.3.16. For the proof of Theorem 4.10.6 we also need the following definition and
result.
Definition 4.10.7. Let 𝑠,⃗ 𝑡 ⃗ ∈ {0, 1}𝑛 . A Gray code connecting 𝑠 ⃗ and 𝑡 ⃗ is a sequence
⃗ ) of pairwise distinct vectors in {0, 1}𝑛 such that 𝑔0⃗ = 𝑠,⃗ 𝑔𝑚
𝐺 = (𝑔0⃗ , . . . , 𝑔𝑚 ⃗ = 𝑡,⃗ and the
successive elements of 𝐺 differ by exactly one bit.
Example 4.10.8. Let 𝑠 ⃗ = (0, 0, 0) and 𝑡 ⃗ = (1, 1, 1). Then
(4.10.11) 𝐺 = ((0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1))
is a Gray code that connects 𝑠 ⃗ and 𝑡.⃗
Exercise 4.10.9. Find the shortest Gray code that connects (1, 1, 0) and (0, 1, 1).

The next proposition is also required for the proof of Theorem 4.10.6.
Proposition 4.10.10. Let 𝑠,⃗ 𝑡 ⃗ ∈ {0, 1}𝑛 . Then there is a Gray code of length ≤ 𝑛 + 1 that
connects 𝑠 ⃗ and 𝑡.⃗

Proof. We prove the theorem by induction on 𝑛. Let 𝑛 = 1. Then 𝑠,⃗ 𝑡 ⃗ ∈ {0, 1}. If 𝑠 ⃗ = 𝑡,⃗ then (𝑠)⃗ is a Gray code of length 1 ≤ 2 = 𝑛 + 1 that connects 𝑠 ⃗ and 𝑡.⃗ If 𝑠 ⃗ ≠ 𝑡,⃗ then (𝑠,⃗ 𝑡)⃗ is a Gray code of length 2 = 𝑛 + 1 connecting 𝑠 ⃗ and 𝑡.⃗ This proves the base case.
For the induction step, assume that the assertion is true for 𝑛−1. Denote by 𝑠′⃗ and 𝑡′⃗
the vectors that are obtained from 𝑠 ⃗ and 𝑡,⃗ respectively, by deleting the last entry. Then
it follows from the induction hypothesis that there is a Gray code 𝐺 ′ = (𝑔0⃗ ′ , . . . , 𝑔𝑚−1 ⃗ ′ )
of length 𝑚 ≤ 𝑛 that connects 𝑠′⃗ and 𝑡′⃗ . Let 𝑏 be the last entry of 𝑠.⃗ Append 𝑏 to all ele-
ments of 𝐺′ as the new last entry. Denote the resulting sequence by 𝐺 = (𝑔0⃗ , . . . , 𝑔𝑚−1 ⃗ ).
Then 𝐺 is a sequence of length 𝑚 in {0, 1}𝑛 and the successive elements of 𝐺 differ by
exactly one bit. Also, we have 𝑔0⃗ = 𝑠 ⃗ and the vectors 𝑔𝑚−1 ⃗ and 𝑡 ⃗ are either equal or
differ exactly in the last bit. In the first case, 𝐺 is a Gray code connecting 𝑠 ⃗ and 𝑡.⃗ In
the second case, (𝑔0⃗ , . . . , 𝑔𝑚−1
⃗ , 𝑡)⃗ is such a Gray code. □

Note that the proof of Proposition 4.10.10 contains a method for constructing a connecting Gray code.
Exercise 4.10.11. Find an algorithm that on input of 𝑠,⃗ 𝑡 ⃗ ∈ {0, 1}𝑛 computes a Gray
code of length at most 𝑛 + 1 that connects 𝑠 ⃗ and 𝑡 ⃗ and analyze its complexity.
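Following the inductive proof, a connecting Gray code can be produced by flipping the differing bits one position at a time. An illustrative sketch (the function name is ours):

```python
def gray_path(s, t):
    """Gray code connecting bit tuples s and t; length <= len(s) + 1."""
    path, cur = [tuple(s)], list(s)
    for i in range(len(s)):
        if cur[i] != t[i]:
            cur[i] = t[i]              # flip exactly one bit
            path.append(tuple(cur))
    return path

G = gray_path((0, 0, 0), (1, 1, 1))
assert G == [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]   # Example 4.10.8
assert len(G) <= 3 + 1
```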

Next, we will prove a statement that together with Corollary 4.10.5 implies Theo-
rem 4.10.6.
Theorem 4.10.12. For every two-level unitary operator 𝑈 on ℍ𝑛 there is a unitary single-
qubit operator 𝑉 such that 𝑈 can be implemented by a quantum circuit that uses 𝑉 and
O(𝑛2 ) Pauli 𝑋, Hadamard, (inverse) 𝜋/8, standard 𝖢𝖭𝖮𝖳, ancillary, and erasure gates
and four other single-qubit gates.

Proof. Let 𝑈 be a two-level unitary operator on ℍ𝑛. We prove the theorem by constructing 𝑉 and a circuit with the required properties.
We start by constructing 𝑉. Since 𝑈 is two-level, we can choose 𝑠,⃗ 𝑡 ⃗ ∈ {0, 1}𝑛 and
𝛼, 𝛽, 𝛾, 𝛿 ∈ ℂ such that
(4.10.12) 𝑈 |𝑠⃗⟩ = 𝛼 |𝑠⃗⟩ + 𝛽 |𝑡⃗⟩, 𝑈 |𝑡⃗⟩ = 𝛾 |𝑠⃗⟩ + 𝛿 |𝑡⃗⟩,
and 𝑈 leaves all other computational basis states of ℍ𝑛 unchanged. We define the
single-qubit operator 𝑉 by its action on the computational basis elements of ℍ1 as fol-
lows:
(4.10.13) 𝑉 |0⟩ = 𝛼 |0⟩ + 𝛽 |1⟩ , 𝑉 |1⟩ = 𝛾 |0⟩ + 𝛿 |1⟩ .
We also write
(4.10.14) |𝑡⃗⟩ = |𝑡0 𝑡1 ⋯ 𝑡𝑛−1⟩.

The operator 𝑉 is unitary because 𝑈 is unitary. This is verified in Exercise 4.10.13.


Next, we construct the quantum circuit that has the asserted properties. This construction has three steps.
(1) We show how to find 𝑖 ∈ ℤ𝑛 and a unitary operator 𝑃 that can be implemented
using at most 𝑛 transposition operators such that
(4.10.15) 𝑃 |𝑠⃗⟩ = |𝑡0 ⋯ 𝑡𝑖−1⟩ |0⟩ |𝑡𝑖+1 ⋯ 𝑡𝑛−1⟩, 𝑃 |𝑡⃗⟩ = |𝑡0 ⋯ 𝑡𝑖−1⟩ |1⟩ |𝑡𝑖+1 ⋯ 𝑡𝑛−1⟩,
and 𝑃 leaves the other computational basis vectors of ℍ𝑛 unchanged.
(2) We let
(4.10.16) 𝐶0 = {𝑗 ∈ ℤ𝑛 ⧵ {𝑖} ∶ 𝑡𝑗 = 0}, 𝐶1 = {𝑗 ∈ ℤ𝑛 ⧵ {𝑖} ∶ 𝑡𝑗 = 1}
and show that
(4.10.17) 𝑈 = 𝑃 ∗ 𝐶 𝐶0 ,𝐶1 ,𝑖 (𝑉)𝑃.
(3) We construct the quantum circuit using (1) and (2) and apply Theorems 4.9.14
and 4.9.15 to estimate the number of required gates.
We start with (1) and show how to construct the unitary operator 𝑃. Let 𝐺 = (𝑔⃗0, . . . , 𝑔⃗𝑚) be a Gray code with 𝑚 ≤ 𝑛 that connects 𝑠⃗ and 𝑡⃗. It exists by Proposition 4.10.10. Let 𝑗 ∈ {1, . . . , 𝑚}. Then 𝑔⃗𝑗−1 and 𝑔⃗𝑗 differ in exactly one bit. Denote by 𝑇𝑗 the transposition operator from Definition 4.4.11 that satisfies
(4.10.18) |𝑔⃗𝑗⟩ = 𝑇𝑗 |𝑔⃗𝑗−1⟩ and |𝑔⃗𝑗−1⟩ = 𝑇𝑗 |𝑔⃗𝑗⟩.

It does not change the other computational basis states of ℍ𝑛 . Set


(4.10.19) 𝑃 = 𝑇𝑚−1 ⋯ 𝑇1 .
Then
(4.10.20) 𝑃 |𝑠⃗⟩ = |𝑔⃗𝑚−1⟩ and 𝑇𝑚 |𝑔⃗𝑚−1⟩ = |𝑔⃗𝑚⟩ = |𝑡⃗⟩.

Since the elements of the Gray code 𝐺 are pairwise different, it follows that 𝑡⃗ is different from the first 𝑚 − 1 elements of the Gray code, which implies 𝑃 |𝑡⃗⟩ = |𝑡⃗⟩. Also, 𝑃 |𝑠⃗⟩ and |𝑡⃗⟩ differ in exactly one qubit. Let 𝑖 be its index. If 𝑡𝑖 = 1, then (4.10.15) holds. If 𝑡𝑖 = 0, then we replace 𝑃 by 𝑇𝑃 where 𝑇 = 𝖳𝖱𝖠𝖭𝖲^{𝑐⃗} with 𝑐⃗ = (𝑡0, . . . , 𝑡𝑖−1, ∗, 𝑡𝑖+1, . . . , 𝑡𝑛−1). Note that 𝑃 is the product of at most 𝑛 transposition operators.
Next, assertion (2) is verified in Exercise 4.10.14.
Finally, we deduce the assertion of the theorem from (1) and (2) and Theorems
4.9.14 and 4.9.15. Since 𝑃 is the product of O(𝑛) transposition operators, it follows
from Theorem 4.9.15 that 𝑃 and 𝑃∗ can be implemented by a quantum circuit that con-
tains O(𝑛2 ) Pauli 𝑋, Hadamard, (inverse) 𝜋/8, standard 𝖢𝖭𝖮𝖳, ancillary, and erasure
gates. Also, it follows from Theorem 4.9.14 that 𝐶 𝐶0 ,𝐶1 ,𝑖 (𝑉) can be implemented by
a quantum circuit that uses O(𝑛) Pauli 𝑋, Hadamard, (inverse) 𝜋/8, standard 𝖢𝖭𝖮𝖳,
ancillary, and erasure gates and four other single-qubit gates. This concludes the proof
of the theorem. □
Exercise 4.10.13. Show that the operator 𝑉 from the proof of Theorem 4.10.12 is uni-
tary.
Exercise 4.10.14. Show that in the proof of Theorem 4.10.12 assertion (2) is correct.

4.11. A universal set of quantum gates


The goal of this section is to prove the following theorem.
Theorem 4.11.1. The set containing the Hadamard, 𝜋/8, and standard 𝖢𝖭𝖮𝖳 gates is
universal for quantum computation.

The main work in proving this theorem is to show the following theorem.
Theorem 4.11.2. The set containing the Hadamard and 𝜋/8 gates is universal for the
set of all unitary single-qubit operators.

Then Theorem 4.11.1 follows from Theorems 4.10.6 and 4.11.2.


To prove Theorem 4.11.2, we first estimate the distance between two products of
quantum operators in terms of the distance of the factors.
Proposition 4.11.3. Let 𝑘 ∈ ℕ and let 𝑈 𝑖 , 𝑉 𝑖 be unitary operators on ℍ𝑛 for 1 ≤ 𝑖 ≤ 𝑘.
Then we have
(4.11.1) 𝐸(∏_{𝑖=1}^{𝑘} 𝑈𝑖, ∏_{𝑖=1}^{𝑘} 𝑉𝑖) ≤ ∑_{𝑖=1}^{𝑘} 𝐸(𝑈𝑖, 𝑉𝑖).

Proof. We prove the assertion by induction on 𝑘. For the base case 𝑘 = 1 the assertion is obviously true. For the inductive step, let 𝑘 > 1,
(4.11.2) 𝑈 = ∏_{𝑖=1}^{𝑘−1} 𝑈𝑖, 𝑉 = ∏_{𝑖=1}^{𝑘−1} 𝑉𝑖,
and assume that
(4.11.3) 𝐸(𝑈, 𝑉) ≤ ∑_{𝑖=1}^{𝑘−1} 𝐸(𝑈𝑖, 𝑉𝑖).

Also, let |𝜓⟩ ∈ ℍ𝑛 be a quantum state. Then we have
‖(𝑈𝑘𝑈 − 𝑉𝑘𝑉) |𝜓⟩‖ = ‖𝑈𝑘𝑈 |𝜓⟩ − 𝑉𝑘𝑉 |𝜓⟩‖
= ‖𝑈𝑘(𝑈 − 𝑉) |𝜓⟩ + (𝑈𝑘 − 𝑉𝑘)𝑉 |𝜓⟩‖
≤ ‖𝑈𝑘(𝑈 − 𝑉) |𝜓⟩‖ + ‖(𝑈𝑘 − 𝑉𝑘)𝑉 |𝜓⟩‖   (1)
= ‖(𝑈 − 𝑉) |𝜓⟩‖ + ‖(𝑈𝑘 − 𝑉𝑘)𝑉 |𝜓⟩‖   (2)
≤ 𝐸(𝑈, 𝑉) + 𝐸(𝑈𝑘, 𝑉𝑘)   (3)
≤ ∑_{𝑖=1}^{𝑘} 𝐸(𝑈𝑖, 𝑉𝑖).   (4)

These equations and inequalities are valid for the following reasons: inequality
(1) uses the triangle inequality which holds by Proposition 2.2.25, equation (2) holds
because 𝑈 𝑘 is unitary, inequality (3) uses Definition 4.8.2, and inequality (4) follows
from an application of the induction hypothesis (4.11.3). □
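As an illustration, the inequality is easy to check numerically for diagonal single-qubit unitaries, for which 𝐸(𝑈, 𝑉) reduces to the maximum entrywise distance and products are entrywise. The following Python sketch is illustrative only (function names are hypothetical):

```python
import cmath
import random

def diag_unitary(a, b):
    # the diagonal unitary diag(e^{ia}, e^{ib}), stored as its diagonal
    return (cmath.exp(1j * a), cmath.exp(1j * b))

def dist(U, V):
    # for diagonal operators, E(U, V) = max_j |U_jj - V_jj|
    return max(abs(u - v) for u, v in zip(U, V))

def mul(U, V):
    # the product of diagonal operators is entrywise
    return tuple(u * v for u, v in zip(U, V))

random.seed(0)
for _ in range(1000):
    U1 = diag_unitary(random.uniform(0, 7), random.uniform(0, 7))
    U2 = diag_unitary(random.uniform(0, 7), random.uniform(0, 7))
    V1 = diag_unitary(random.uniform(0, 7), random.uniform(0, 7))
    V2 = diag_unitary(random.uniform(0, 7), random.uniform(0, 7))
    assert dist(mul(U1, U2), mul(V1, V2)) <= dist(U1, V1) + dist(U2, V2) + 1e-12
```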

Next, we prove Theorem 4.11.2 using Theorem 4.3.35. Define
(4.11.4) 𝑛⃗ = (cos(𝜋/8), sin(𝜋/8), cos(𝜋/8)).
Then we have
(4.11.5) ‖𝑛⃗‖² = 2 cos²(𝜋/8) + sin²(𝜋/8) = cos²(𝜋/8) + 1.
We normalize 𝑛⃗ and obtain the unit vector
(4.11.6) 𝑛̂ = 𝑛⃗/‖𝑛⃗‖.
We also set
(4.11.7) 𝑚⃗ = (cos(𝜋/8), −sin(𝜋/8), cos(𝜋/8)).
Then ‖𝑚⃗‖² = ‖𝑛⃗‖² = cos²(𝜋/8) + 1. Normalizing 𝑚⃗ we obtain
(4.11.8) 𝑚̂ = 𝑚⃗/‖𝑚⃗‖.


We write
(4.11.9) 𝑛̂ = (𝑛𝑥, 𝑛𝑦, 𝑛𝑧), 𝑚̂ = (𝑚𝑥, 𝑚𝑦, 𝑚𝑧)
and use the following observation.
Lemma 4.11.4. For all 𝛾 ∈ ℝ we have
(4.11.10) 𝑅𝑚̂ (𝛾) = 𝐻𝑅𝑛̂ (𝛾)𝐻.

Proof. Let 𝛾 ∈ ℝ. It follows from Proposition 4.3.3 that
(4.11.11) 𝑅𝑛̂(𝛾) = cos(𝛾/2) 𝐼 − 𝑖 sin(𝛾/2)(𝑛𝑥𝑋 + 𝑛𝑦𝑌 + 𝑛𝑧𝑍).
From (4.1.12) and
(4.11.12) 𝑛𝑥 = 𝑛𝑧 = 𝑚𝑥 = 𝑚𝑧, 𝑛𝑦 = −𝑚𝑦
we obtain from (2.3.17)
(4.11.13)
𝐻𝑅𝑛̂(𝛾)𝐻 = cos(𝛾/2) 𝐼 − 𝑖 sin(𝛾/2)(𝑛𝑥𝐻𝑋𝐻 + 𝑛𝑦𝐻𝑌𝐻 + 𝑛𝑧𝐻𝑍𝐻)
= cos(𝛾/2) 𝐼 − 𝑖 sin(𝛾/2)(𝑛𝑥𝑋 − 𝑛𝑦𝑌 + 𝑛𝑧𝑍)
= cos(𝛾/2) 𝐼 − 𝑖 sin(𝛾/2)(𝑚𝑥𝑋 + 𝑚𝑦𝑌 + 𝑚𝑧𝑍)
= 𝑅𝑚̂(𝛾). □
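The identity in Lemma 4.11.4 can also be checked numerically. The following Python sketch is an illustrative check, not part of the text; it hard-codes 𝑛̂ and 𝑚̂ from (4.11.6) and (4.11.8) and multiplies out the 2 × 2 matrices:

```python
import math

def rot(w, g):
    # R_w(g) = cos(g/2) I - i sin(g/2) (w_x X + w_y Y + w_z Z)
    co, si = math.cos(g / 2), math.sin(g / 2)
    wx, wy, wz = w
    return [[co - 1j * si * wz, -1j * si * (wx - 1j * wy)],
            [-1j * si * (wx + 1j * wy), co + 1j * si * wz]]

def mul(A, B):
    # 2x2 complex matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

c, s = math.cos(math.pi / 8), math.sin(math.pi / 8)
norm = math.sqrt(1 + c * c)
n_hat = (c / norm, s / norm, c / norm)
m_hat = (c / norm, -s / norm, c / norm)

h = 1 / math.sqrt(2)
H = [[h, h], [h, -h]]

g = 1.3  # an arbitrary rotation angle
lhs = mul(H, mul(rot(n_hat, g), H))
rhs = rot(m_hat, g)
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```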

Now let
(4.11.14) 𝜃 = 2 arccos(cos²(𝜋/8)).
The next lemma shows that, up to a global phase factor, we can implement the rotation
operator 𝑅𝑛̂ (𝜃) by a circuit that uses only Hadamard and 𝜋/8 gates.
Lemma 4.11.5. We have 𝑅𝑛̂(𝜃) = 𝑒^{−𝑖𝜋/4} 𝑇𝐻𝑇𝐻.

Proof. It follows from (4.3.52) and (4.3.10) that
(4.11.15) 𝑒^{−𝑖𝜋/8} 𝑇 = 𝑅𝑧̂(𝜋/4) = 𝑒^{−𝑖𝜋/8} |0⟩⟨0| + 𝑒^{𝑖𝜋/8} |1⟩⟨1|.
Next, recall that by (2.4.72),
(4.11.16) 𝑋 = 𝐻 |0⟩⟨0| 𝐻 − 𝐻 |1⟩⟨1| 𝐻
is the spectral decomposition of the Pauli 𝑋 gate. So it follows from (4.11.15) and Definition 2.4.69 that
(4.11.17)
𝑒^{−𝑖𝜋/8} 𝐻𝑇𝐻 = 𝑒^{−𝑖𝜋/8} 𝐻 |0⟩⟨0| 𝐻 + 𝑒^{𝑖𝜋/8} 𝐻 |1⟩⟨1| 𝐻
= 𝑒^{−𝑖𝜋/8} |𝑥₊⟩⟨𝑥₊| + 𝑒^{𝑖𝜋/8} |𝑥₋⟩⟨𝑥₋|
= 𝑒^{−𝑖𝜋𝑋/8}.
Now we observe that by (4.11.5) and (4.11.14) we have
(4.11.18) sin²(𝜋/8) = 1 − cos²(𝜋/8) = (1 − cos⁴(𝜋/8))/(1 + cos²(𝜋/8)) = sin²(𝜃/2)/‖𝑛⃗‖².

This implies
(4.11.19) sin(𝜋/8) = sin(𝜃/2)/‖𝑛⃗‖.
So we obtain
𝑒^{−𝑖𝜋/4} 𝑇𝐻𝑇𝐻 = 𝑒^{−𝑖𝜋𝑍/8} 𝑒^{−𝑖𝜋𝑋/8}   (1)
= (cos(𝜋/8) 𝐼 − 𝑖 sin(𝜋/8) 𝑍)(cos(𝜋/8) 𝐼 − 𝑖 sin(𝜋/8) 𝑋)   (2)
= cos²(𝜋/8) 𝐼 − 𝑖 sin(𝜋/8) cos(𝜋/8)(𝑋 + 𝑍) − sin²(𝜋/8) 𝑍𝑋
= cos(𝜃/2) 𝐼 − 𝑖 sin(𝜃/2) (1/‖𝑛⃗‖)(cos(𝜋/8) 𝑋 + sin(𝜋/8) 𝑌 + cos(𝜋/8) 𝑍)   (3)
= 𝑅𝑛̂(𝜃).   (4)

In these equations we use the following arguments: equation (1) follows from (4.11.15) and (4.11.17), equation (2) is obtained from (4.3.3), equation (3) holds because of (4.11.14), the identity 𝑍𝑋 = 𝑖𝑌 (see Theorem 4.1.2), and (4.11.19), and equation (4) is true because of (4.3.3). □
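Lemma 4.11.5 can likewise be verified numerically. The sketch below is illustrative Python, not from the text; it compares the matrix 𝑒^{−𝑖𝜋/4} 𝑇𝐻𝑇𝐻 with the matrix of 𝑅𝑛̂(𝜃) built from (4.11.6) and (4.11.14):

```python
import cmath
import math

def mul(A, B):
    # 2x2 complex matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

h = 1 / math.sqrt(2)
H = [[h, h], [h, -h]]
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]

# left-hand side: e^{-i pi/4} T H T H
M = mul(T, mul(H, mul(T, H)))
phase = cmath.exp(-1j * math.pi / 4)
lhs = [[phase * M[i][j] for j in range(2)] for i in range(2)]

# right-hand side: R_n(theta) = cos(theta/2) I - i sin(theta/2) (n . sigma)
c, s = math.cos(math.pi / 8), math.sin(math.pi / 8)
norm = math.sqrt(1 + c * c)
nx, ny, nz = c / norm, s / norm, c / norm
theta = 2 * math.acos(c * c)
co, si = math.cos(theta / 2), math.sin(theta / 2)
rhs = [[co - 1j * si * nz, -1j * si * (nx - 1j * ny)],
       [-1j * si * (nx + 1j * ny), co + 1j * si * nz]]

assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```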

To show that {𝐻, 𝑇} is universal for the set of all unitary single-qubit operators we
also need the following auxiliary results. Their proofs require some algebraic number
theory which is beyond the scope of this book. We refer to [IR10] which is an excellent
introduction to the subject.
Lemma 4.11.6. If 𝑣, 𝑢 ∈ ℤ with 𝑣 > 0, then 2 cos(𝑢𝜋/𝑣) is an algebraic integer.
) is an algebraic integer.

Proof. We will show that for all 𝑣 ∈ ℕ and all 𝑦 ∈ ℝ, there exists a polynomial 𝑃𝑣 ∈
ℤ[𝑥] that is monic, has degree 𝑣, and satisfies
(4.11.20) 𝑃𝑣 (2 cos 𝑦) = 2 cos 𝑣𝑦.
So we have
(4.11.21) 𝑃𝑣(2 cos(𝑢𝜋/𝑣)) = 2 cos(𝑢𝜋)
which implies the assertion. The polynomials 𝑃𝑣 are constructed inductively. We set
𝑃0 (𝑥) = 2, 𝑃1 (𝑥) = 𝑥. Then (4.11.20) holds for 𝑣 = 0, 1. Also, for 𝑣 ≥ 1 we set
(4.11.22) 𝑃𝑣+1 (𝑥) = 𝑥𝑃𝑣 (𝑥) − 𝑃𝑣−1 (𝑥).
Assume that (4.11.20) holds for all 𝑣′ ≤ 𝑣. Then (A.5.8), (4.11.22), and the induction
hypothesis imply
(4.11.23) 𝑃𝑣+1 (2 cos 𝑦) = 2 cos 𝑦𝑃𝑣 (2 cos 𝑦) − 𝑃𝑣−1 (2 cos 𝑦)
= 4 cos 𝑦 cos 𝑣𝑦 − 2 cos(𝑣 − 1)𝑦 = 2 cos(𝑣 + 1)𝑦. □
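The recursion (4.11.22) is easy to experiment with. The following Python sketch (illustrative only) evaluates 𝑃𝑣 iteratively and confirms (4.11.20) numerically:

```python
import math

def P(v, x):
    # monic integer polynomials with P_v(2 cos y) = 2 cos(v y),
    # built from P_0 = 2, P_1 = x, P_{v+1} = x P_v - P_{v-1}
    if v == 0:
        return 2.0
    if v == 1:
        return x
    p_prev, p_cur = 2.0, x
    for _ in range(v - 1):
        p_prev, p_cur = p_cur, x * p_cur - p_prev
    return p_cur

y = 0.7
for v in range(8):
    assert abs(P(v, 2 * math.cos(y)) - 2 * math.cos(v * y)) < 1e-9
```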
Lemma 4.11.7. The fraction 𝜃/𝜋 is irrational.

Proof. Using (A.5.7) we obtain
(4.11.24) cos(𝜃/2) = cos²(𝜋/8) = (cos(𝜋/4) + 1)/2 = 1/2 + √2/4.
Now assume that 𝜃/2𝜋 = 𝑢/𝑣 with 𝑢, 𝑣 ∈ ℤ, 𝑣 > 0. Then it follows from Lemma 4.11.6 that
(4.11.25) 2 cos(𝜃/2) = 2 cos(𝑢𝜋/𝑣)
is an algebraic integer. From (4.11.24) we see that 2 cos(𝜃/2) is a quadratic irrationality of norm
(4.11.26) 4 (1/2 + √2/4)(1/2 − √2/4) = 1 − 1/2 = 1/2.
But this is not the norm of an algebraic integer, a contradiction. □

In the next lemma, we need the following notion.


Definition 4.11.8. Let 𝑆 and 𝑇 be sets of real numbers. Then we say that 𝑇 is dense in
𝑆 if for every 𝜀 > 0 and every 𝑠 ∈ 𝑆 there is 𝑡 ∈ 𝑇 such that |𝑠 − 𝑡| < 𝜀.
Lemma 4.11.9. Let 𝛼 ∈ ℝ be an irrational number. Then the set {𝑢𝛼 mod 1 ∶ 𝑢 ∈ ℕ} is
dense in [0, 1[.

Proof. For 𝑢 ∈ ℤ set


(4.11.27) 𝛼ᵆ = (𝑢𝛼) mod 1.
Let 𝑥 ∈ [0, 1[ and 𝜀 ∈ ℝ>0 . We must show that there is 𝑢 ∈ ℕ such that
(4.11.28) |𝛼ᵆ − 𝑥| < 𝜀.
To construct this 𝑢, we select 𝑁 ∈ ℕ such that
(4.11.29) 1/𝑁 < 𝜀.
Using the Pigeonhole Principle and the irrationality of 𝛼 we see that there are 𝑘, 𝑙 ∈ ℕ with 𝑘 > 𝑙 such that
(4.11.30) 0 < 𝛼𝑘 − 𝛼𝑙 < 1/𝑁
or
(4.11.31) −1/𝑁 < 𝛼𝑘 − 𝛼𝑙 < 0.
First, assume that (4.11.30) is true. This inequality implies that there is 𝑣 ∈ ℕ such that
(4.11.32) 𝑣(𝛼𝑘 − 𝛼𝑙) ∈ [0, 1[
and
(4.11.33) |𝑣(𝛼𝑘 − 𝛼𝑙) − 𝑥| < 1/𝑁 < 𝜀.

It follows from (4.11.32) that


(4.11.34) 𝛼𝑣(𝑘−𝑙) = 𝑣(𝛼𝑘 − 𝛼𝑙 ).
So if we set 𝑢 = 𝑣(𝑘 − 𝑙), then (4.11.28) holds.
Next, assume that (4.11.31) is true. Then we can select 𝑣 ∈ ℕ such that
(4.11.35) 𝑣(𝛼𝑘 − 𝛼𝑙) ∈ ]−1, 0]
and
(4.11.36) |𝑣(𝛼𝑘 − 𝛼𝑙) − (𝑥 − 1)| < 1/𝑁 < 𝜀.
It follows from (4.11.34) that
(4.11.37) 𝛼𝑣(𝑘−𝑙) = 𝑣(𝛼𝑘 − 𝛼𝑙 ) + 1.
Using this equality in (4.11.36) we see that (4.11.28) holds for 𝑢 = 𝑣(𝑘 − 𝑙). □
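Lemma 4.11.9 can be illustrated numerically: for an irrational number such as √2, a brute-force search finds, for any given 𝑥 and 𝜀, an index 𝑢 with |(𝑢𝛼 mod 1) − 𝑥| < 𝜀. The following Python sketch is illustrative only and the function name is hypothetical:

```python
import math

def approx_index(alpha, x, eps):
    # brute-force search for u with |(u*alpha mod 1) - x| < eps;
    # Lemma 4.11.9 guarantees that such a u exists when alpha is irrational
    u = 1
    while abs((u * alpha) % 1.0 - x) >= eps:
        u += 1
    return u

alpha = math.sqrt(2)
u = approx_index(alpha, 0.3, 1e-4)
assert abs((u * alpha) % 1.0 - 0.3) < 1e-4
```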

The next proposition shows that the rotation operator 𝑅𝑛̂(𝜃) can be used to approximate every other rotation about the 𝑛̂-axis with arbitrary precision.

Proposition 4.11.10. For all 𝜀 ∈ ℝ>0 and all 𝛾 ∈ ℝ there is 𝑘 ∈ ℕ such that
(4.11.38) 𝐸(𝑅𝑛̂ (𝛾), 𝑅𝑛̂ (𝜃)𝑘 ) < 𝜀.

Proof. Let 𝜀 ∈ ℝ>0 and 𝛾 ∈ ℝ. We approximate 𝑅𝑛̂ (𝛾) to precision 𝜀. By Theorem


4.3.15, we may choose 𝛾 ∈ [0, 2𝜋[. For 𝑘 ∈ ℕ set
(4.11.39) 𝜃𝑘 = (𝑘𝜃) mod 2𝜋.
It follows from Lemma 4.11.7 and Lemma 4.11.9 that there is 𝑘 ∈ ℕ such that
(4.11.40) |𝛾 − 𝜃𝑘| < 𝜀/2.
Let |𝜓⟩ be a quantum state in ℍ1. Using the triangle inequality, the fact that 𝑛̂ ⋅ 𝜎 is unitary (Proposition 4.3.2), Lemma A.5.2, and (4.11.40) we obtain
(4.11.41)
‖(𝑅𝑛̂(𝛾) − 𝑅𝑛̂(𝜃)^𝑘) |𝜓⟩‖ = ‖(cos 𝛾 − cos 𝜃𝑘) |𝜓⟩ − 𝑖(sin 𝛾 − sin 𝜃𝑘)(𝑛̂ ⋅ 𝜎) |𝜓⟩‖
≤ |cos 𝛾 − cos 𝜃𝑘| + |sin 𝛾 − sin 𝜃𝑘|
≤ 2|𝛾 − 𝜃𝑘|
< 𝜀. □
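Concretely, since 𝜃/𝜋 is irrational, the angles 𝑘𝜃 mod 2𝜋 come arbitrarily close to any target angle, and a suitable exponent 𝑘 can be found by brute force. The following Python sketch is illustrative only (the target 𝛾 = 1 is chosen arbitrarily):

```python
import math

theta = 2 * math.acos(math.cos(math.pi / 8) ** 2)  # the angle from (4.11.14)
gamma = 1.0    # target rotation angle, chosen arbitrarily
eps = 1e-2

# search for k with |gamma - (k*theta mod 2*pi)| < eps, as in (4.11.40)
k = 1
while abs((k * theta) % (2 * math.pi) - gamma) >= eps:
    k += 1
assert abs((k * theta) % (2 * math.pi) - gamma) < eps
```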

Now we can prove Theorem 4.11.2. Let 𝑈 be a unitary single-qubit operator and let 𝜀 ∈ ℝ>0. It follows from Theorem 4.3.35 that up to a global phase factor the operator 𝑈 can be written as
(4.11.42) 𝑈 = ∏_{𝑖=0}^{𝑘−1} 𝑅𝑛̂(𝛼𝑖) 𝑅𝑚̂(𝛽𝑖)

where 𝑘 ∈ ℕ, 𝑘 = O(1), and 𝛼𝑖, 𝛽𝑖 ∈ ℝ for 𝑖 ∈ ℤ𝑘. By Lemma 4.11.4 this implies
(4.11.43) 𝑈 = ∏_{𝑖=0}^{𝑘−1} 𝑅𝑛̂(𝛼𝑖) 𝐻 𝑅𝑛̂(𝛽𝑖) 𝐻.

According to Proposition 4.11.10, we can select positive integers 𝑎0, . . . , 𝑎𝑘−1 and 𝑏0, . . . , 𝑏𝑘−1 such that
(4.11.44) 𝐸(𝑅𝑛̂(𝛼𝑖), 𝑅𝑛̂(𝜃)^{𝑎𝑖}) < 𝜀/(2𝑘), 𝐸(𝑅𝑛̂(𝛽𝑖), 𝑅𝑛̂(𝜃)^{𝑏𝑖}) < 𝜀/(2𝑘)
for all 𝑖 ∈ ℤ𝑘. Now we set
(4.11.45) 𝑉 = ∏_{𝑖=0}^{𝑘−1} 𝑅𝑛̂(𝜃)^{𝑎𝑖} 𝐻 𝑅𝑛̂(𝜃)^{𝑏𝑖} 𝐻.
From Lemma 4.11.5 we know that 𝑅𝑛̂(𝜃) = 𝑒^{−𝑖𝜋/4} 𝑇𝐻𝑇𝐻. Hence, there is 𝛾 ∈ ℝ such that
(4.11.46) 𝑉 = 𝑒^{𝑖𝛾} ∏_{𝑖=0}^{𝑘−1} (𝑇𝐻𝑇𝐻)^{𝑎𝑖} 𝐻 (𝑇𝐻𝑇𝐻)^{𝑏𝑖} 𝐻.

From Proposition 4.11.3 we obtain
(4.11.47)
𝐸(𝑈, 𝑉) = 𝐸(∏_{𝑖=0}^{𝑘−1} 𝑅𝑛̂(𝛼𝑖) 𝐻 𝑅𝑛̂(𝛽𝑖) 𝐻, ∏_{𝑖=0}^{𝑘−1} 𝑅𝑛̂(𝜃)^{𝑎𝑖} 𝐻 𝑅𝑛̂(𝜃)^{𝑏𝑖} 𝐻)
≤ ∑_{𝑖=0}^{𝑘−1} 𝐸(𝑅𝑛̂(𝛼𝑖), 𝑅𝑛̂(𝜃)^{𝑎𝑖}) + ∑_{𝑖=0}^{𝑘−1} 𝐸(𝑅𝑛̂(𝛽𝑖), 𝑅𝑛̂(𝜃)^{𝑏𝑖})
< 2𝑘 ⋅ 𝜀/(2𝑘) = 𝜀.
Theorem 4.11.2 now follows from (4.11.46) and (4.11.47).
Now we prove Theorem 4.11.1. Let 𝑈 be a unitary operator on ℍ𝑛 . By Theorem
4.10.6, there are 𝑘 ∈ ℕ and unitary single-qubit operators 𝑈0 , . . . , 𝑈 𝑘−1 , such that — up
to a global phase factor — the operator 𝑈 is the product of the operators 𝑈 𝑖 (applied to
certain qubits) and some 𝖢𝖭𝖮𝖳 gates (applied to certain pairs of qubits). Let 𝜀 ∈ ℝ>0 . It
follows from Theorem 4.11.2 that for all 𝑖 ∈ ℤ𝑘 there are unitary single-qubit operators
𝑉 𝑖 that — up to global phase factors — can be written as products of Hadamard and 𝜋/8
gates and satisfy
(4.11.48) 𝐸(𝑈𝑖, 𝑉𝑖) < 𝜀/𝑘.
Let 𝑉 be the unitary operator that is obtained as follows. In the representation of 𝑈 as
the product of the single-qubit operators 𝑈 𝑖 and certain 𝖢𝖭𝖮𝖳 gates replace all 𝑈 𝑖 by
𝑉 𝑖 . Then Exercise 4.11.11 and Proposition 4.11.3 imply
(4.11.49) 𝐸(𝑈, 𝑉) < 𝜀.
This concludes the proof of the theorem.
Exercise 4.11.11. Let 𝑈 and 𝑉 be single-qubit operators, and let 𝑖 ∈ ℤ𝑛. Denote by 𝑈(𝑖) and 𝑉(𝑖) the unitary operators on ℍ𝑛 that apply 𝑈 and 𝑉, respectively, to the 𝑖th qubit of a quantum register of length 𝑛. Show that 𝐸(𝑈, 𝑉) = 𝐸(𝑈(𝑖), 𝑉(𝑖)).

4.11.1. Efficiency of approximation. We have observed in Theorem 4.10.6 that


the set comprising all rotation gates and the standard 𝖢𝖭𝖮𝖳 gate is perfectly univer-
sal. This implies that, disregarding a global phase factor, it is possible to implement all
unitary operators on any state space ℍ𝑛 , where 𝑛 ∈ ℕ, using gates from this set. Con-
sequently, when discussing the complexity of quantum computing, it is highly con-
venient to assume that the available quantum computing platform offers all rotation
gates and the standard 𝖢𝖭𝖮𝖳 gate. This is our approach in Section 4.12.2.
However, quantum computing platforms may also provide only a finite number
of single-qubit gates, which, in conjunction with the 𝖢𝖭𝖮𝖳 gate, form a universal set
of quantum gates. According to Theorem 4.11.1, the set containing the Hadamard,
𝜋/8, and standard 𝖢𝖭𝖮𝖳 gates possesses this property. As stated in Theorem 4.11.2,
all single-qubit gates can be approximated with arbitrary precision through composi-
tions of the Hadamard and 𝜋/8 gates. When transitioning from a platform that offers
all rotation gates to one that provides specific rotation gates capable of approximat-
ing all rotation gates, the impact on complexity results depends on the efficiency of
this approximation. Unfortunately, Theorem 4.11.2 does not address the issue of ap-
proximation efficiency. This is where the Solovay-Kitaev Theorem proves invaluable.
Initially announced by Robert M. Solovay in 1995 and independently proven by Alexei
Kitaev in 1997, it establishes the existence of highly efficient approximations.
Theorem 4.11.12. Let 𝐺 be a finite set of rotation gates containing its own inverses which is universal for the set of all rotation operators. Then there is 𝑐 ∈ ℝ>0 such that for all 𝜀 ∈ ℝ>0 and all rotation operators 𝑈 there are 𝑙 ∈ ℕ and a sequence 𝑉0, . . . , 𝑉𝑙−1 of gates in 𝐺 such that 𝑙 = O(log^𝑐(1/𝜀)) and 𝐸(𝑈, ∏_{𝑖=0}^{𝑙−1} 𝑉𝑖) < 𝜀.

For the proof of Theorem 4.11.12 we refer the reader to [NC16, Appendix 3].

4.12. Quantum algorithms and quantum complexity


So far, we have focused on quantum circuits designed for fixed-length inputs. This sec-
tion introduces quantum algorithms capable of handling inputs of any length through
the utilization of quantum circuit families. Following this, we delve into quantum com-
plexity theory, which builds upon classical complexity theory as elucidated in Chapter
1. This theory empowers the assessment of quantum algorithm efficiency.

4.12.1. Quantum algorithms. To be able to define quantum algorithms, we


need families of quantum circuits, which are defined now. The classical analog is pre-
sented in Section 1.6.3.
Definition 4.12.1. A family of quantum circuits is a sequence (𝑄𝑛 )𝑛∈ℕ of quantum
circuits 𝑄𝑛 such that 𝑄𝑛 operates on 𝑛-qubit input registers for all 𝑛 ∈ ℕ.

In the theory of Boolean circuits, classical algorithms correspond to uniform fam-


ilies of such circuits. This has been explained in Section 1.6.3. Analogously, for the
construction of quantum algorithms, we use uniform families of quantum circuits. To
define such families, quantum circuits are encoded by finite bit strings. Definition 4.7.1
shows how such an encoding can be constructed. We assume that any such encoding

has the following properties, which are mentioned in [Wat09] and which we have al-
ready used in Section 1.6.3.

(1) The encoding is sensible: every quantum circuit is encoded by at least one bit
string and every bit string encodes at most one quantum circuit.
(2) The encoding is efficient: there is 𝑐 ∈ ℕ such that every quantum circuit of size 𝑁 has an encoding of length at most 𝑁^𝑐. Information about the structure of a circuit must be computable in polynomial time from an encoding of the circuit.
(3) The length of every encoding of a quantum circuit is at least the size of the circuit.

The term “structure information” may, for example, refer to information regarding
the input qubits and the quantum gates used in quantum circuits, including the qubits
on which these gates operate.
Now, uniform quantum circuit families can be defined analogously to classical
uniform circuit families in Definition 1.6.7. Since, by Theorem 4.7.7, the computing
power of quantum circuits is the same as the computing power of classical circuits, it
follows that quantum computing is Turing complete.
Next, similar to classical circuit complexity theory, quantum complexity theory re-
quires P-uniform quantum circuit families. Let us provide a formal definition of them.

Definition 4.12.2. A quantum circuit family (𝑄𝑛) is called P-uniform if there is a deterministic polynomial time algorithm that on input of 1^𝑛, 𝑛 ∈ ℕ, outputs an encoding of 𝑄𝑛.

Our next goal is to define quantum algorithms. A simple example of such an algorithm is the quantum implementation of coinToss presented in Algorithm 4.12.3. This algorithm uses the quantum circuit QcoinToss shown in Figure 4.12.1, where the Hadamard operator is applied to |0⟩, followed by measuring the resulting state. The algorithm returns the measurement result, which can be 0 or 1, each occurring with probability 1/2.

Algorithm 4.12.3. Quantum coin toss


Input: ∅
Output: 0 or 1
1: coinToss
2: |𝜓⟩ ← |0⟩
3: 𝑏 ← QcoinToss |𝜓⟩
4: return 𝑏
5: end

|𝜓⟩ 𝐻 𝑏

Figure 4.12.1. The quantum circuit QcoinToss where |𝜓⟩ is a single-qubit quantum
state and 𝑏 ∈ {0, 1}.
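The behavior of QcoinToss is easy to simulate on a classical computer by sampling from the measurement probabilities. The following Python sketch is purely illustrative (it tracks amplitudes, not actual qubits, and the function name is hypothetical):

```python
import math
import random

def qcoin_toss():
    # H|0> = (|0> + |1>)/sqrt(2); measuring yields 0 or 1,
    # each with probability |1/sqrt(2)|^2 = 1/2
    amp0 = amp1 = 1 / math.sqrt(2)
    return 0 if random.random() < abs(amp0) ** 2 else 1

counts = [0, 0]
for _ in range(10_000):
    counts[qcoin_toss()] += 1
print(counts)   # roughly [5000, 5000]
```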

Generalizing this example, we define general quantum algorithms as follows.


Definition 4.12.4. A quantum algorithm is a probabilistic algorithm with the follow-
ing additional features.
(1) The algorithm may invoke elements from a P-uniform quantum circuit family. To
do so, it prepares an input state for the quantum circuit unless this state is already
part of this circuit. The return value is the outcome of the final measurement
performed in the quantum circuit.
(2) The algorithm may also invoke other quantum algorithms as subroutines if they
terminate on any input.
Example 4.12.5. Consider Algorithm 4.12.6 that implements the probabilistic operation randomString from Section 1.2.1. It uses the family of quantum circuits (QrandomString𝑛) whose elements are shown in Figure 4.12.2. On input of a string length 𝑛 ∈ ℕ, it prepares the input state |0⟩^⊗𝑛 and applies the quantum circuit QrandomString𝑛 to this state. This circuit applies the Hadamard operator to all input qubits and measures the resulting state in the computational basis of ℍ𝑛. The return value is one of the vectors 𝑥⃗ ∈ {0, 1}^𝑛, each with probability 1/2^𝑛.

Algorithm 4.12.6. Quantum random string selection

Input: 𝑛 ∈ ℕ
Output: 𝑥⃗ ∈ {0, 1}^𝑛
1: randomString(𝑛)
2: |𝜓⟩ ← |0⟩^⊗𝑛
3: 𝑥⃗ ← QrandomString𝑛 |𝜓⟩
4: return 𝑥⃗
5: end
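Analogously, the output distribution of Algorithm 4.12.6 can be simulated classically: measuring 𝐻^⊗𝑛 |0⟩^⊗𝑛 amounts to drawing 𝑛 independent fair bits. The following Python sketch is illustrative only:

```python
import random

def qrandom_string(n):
    # H^{⊗n}|0...0> puts equal amplitude 1/sqrt(2^n) on every basis state,
    # so measuring returns each x in {0,1}^n with probability 1/2^n;
    # classically this is n independent fair coin tosses
    return tuple(random.randint(0, 1) for _ in range(n))

x = qrandom_string(8)
assert len(x) == 8 and all(b in (0, 1) for b in x)
```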

This definition allows for the smooth transition of concepts and results from prob-
abilistic algorithms to quantum algorithms. For example, quantum Monte Carlo algo-
rithms and quantum Las Vegas algorithms can be defined in a straightforward manner.
Also, quantum Monte Carlo algorithms can be either error-free or not. Additionally,
there are quantum Bernoulli algorithms that correspond to error-free quantum Monte

|𝜓⟩ 𝐻 ⊗𝑛 𝑥⃗

Figure 4.12.2. The quantum circuit QrandomString𝑛 where 𝑛 ∈ ℕ, |𝜓⟩ is a quantum


state in ℍ𝑛 , and 𝑥⃗ ∈ {0, 1}𝑛 .

Carlo algorithms. Lastly, quantum decision algorithms and their properties are analo-
gous to probabilistic decision algorithms discussed in Section 1.2.2.

4.12.2. The quantum computing platform. In the forthcoming exposition of


quantum complexity theory and in the complexity analysis of the quantum algorithms
presented in the subsequent chapters, we assume the availability of a quantum com-
puting platform.
According to Theorem 4.10.6, the set that includes all rotation gates and the stan-
dard 𝖢𝖭𝖮𝖳 gate is perfectly universal. This means that it is possible to implement all
unitary operators on any state space using the gates from this set, up to a global phase
factor. Also, all physical realizations of quantum computers allow for the implemen-
tation of rotation gates. Since we are only interested in implementing quantum oper-
ators up to a global phase factor, we, therefore, assume that our quantum computing
platform makes all rotation gates, the 𝖢𝖭𝖮𝖳 gate, the ancillary and the erasure gates
available. To use the rotation gates in the platform, the rotation axis and the rotation
angle must be known. For the convenience of the exposition, we also assume that the
platform includes the Pauli, Hadamard, and phase shift gates 𝑅𝑘 for all 𝑘 ∈ ℕ since for
these gates, we know representations as rotation gates, up to global phase factors. For
the Pauli and Hadamard gates, this is shown in Exercise 4.3.10 and for the phase shift
gates 𝑅𝑘 in (4.3.49). Note that the platform also includes the phase gate 𝑆 = 𝑅2 from
(4.3.50) and the 𝜋/8 gate 𝑇 = 𝑅3 from (4.3.51). Furthermore, the platform includes the
Toffoli gate 𝖢𝖢𝖭𝖮𝖳. As we have seen in Figure 4.9.2, it can be implemented using O(1)
of the previous gates. We refer to the gates listed so far as elementary gates.
In addition, specific quantum circuits may make use of certain operators linked to the particular computational problem that they solve. A notable example is the Deutsch algorithm, which will be explored in the next chapter. This algorithm uses a black-box that implements an operator 𝑈𝑓 tailored to a function 𝑓 ∶ {0, 1} → {0, 1}, where by a black-box we mean a system or device that can only be observed in terms of its input and output, without revealing any knowledge of its inner workings. The purpose of the algorithm is to determine 𝑓(0) ⊕ 𝑓(1). This incorporation of problem-specific
operators adds a layer of versatility to quantum circuits, allowing them to encapsulate
the intricacies of the problems they seek to solve.
As already discussed in Section 4.11.1, it is also possible that quantum computing
platforms only provide a finite set of single-qubit gates which, in conjunction with the
𝖢𝖭𝖮𝖳 gate, form a universal set of quantum gates. Theorem 4.11.12 reveals that these
gates can be used to efficiently approximate all single-qubit operators. Consequently,
the complexity results obtained with our larger platform undergo minimal changes if
such reduced platforms are utilized. However, discussing this is beyond the scope of
this book.

4.12.3. Implementing 𝐶 1 (𝑈). In several quantum circuits constructed in the


following, controlled-𝑈 operators will be required for certain unitary operators 𝑈. We
have seen in Theorem 4.9.13 that general controlled operators with 𝑘 control bits, where
𝑘 ≥ 2, can be implemented using O(𝑘) elementary gates and one 𝐶 1 (𝑈) gate. This raises
the question of how to implement the 𝐶 1 (𝑈) gate. Unfortunately, there is no generic

construction of 𝐶 1 (𝑈) when 𝑈 is given as a black-box. However, as the next theorem


shows, the situation is different if a quantum circuit implementation of 𝑈 is provided,
using only elementary gates.
Theorem 4.12.7. Let 𝑛 ∈ ℕ and let 𝑈 be a unitary operator on ℍ𝑛 . Assume that there is a
quantum circuit 𝑄 that implements 𝑈 and uses 𝑘 ∈ ℕ elementary and no other quantum
gates. Then there is a quantum circuit 𝑄′ that implements the controlled operator 𝐶 1 (𝑈)
and uses O(𝑘) elementary gates and no other gates.

Proof. The quantum circuit 𝑄′ is constructed from the quantum circuit 𝑄 by replac-
ing all unitary elementary gates 𝑉 with their controlled counterparts 𝐶 1 (𝑉). The ele-
mentary single-qubit gates are rotation gates, up to global phase factors. Therefore, it
suffices to consider rotation gates. Let 𝑤̂ ∈ ℝ3 be a unit vector, 𝛾 ∈ ℝ, and consider
the rotation operator 𝑉 = 𝑅𝑤̂ (𝛾). As demonstrated in Exercise 4.3.34, we can express
𝑉 as 𝑉 = 𝐴𝑋𝐵𝑋𝐶, where 𝐴 = 𝑅𝑤̂ (𝛾/2), 𝐵 = 𝑅𝑤̂ (−𝛾/2), and 𝐶 = 𝐼2 . Consequently, it
follows from Theorem 4.4.5 that 𝐶 1 (𝑉) can be implemented by a quantum circuit that
uses the three rotation gates 𝐴, 𝐵, and 𝐶 with known axis and angle of rotation, along
with two 𝖢𝖭𝖮𝖳 gates. There are two more unitary elementary gates: the 𝖢𝖭𝖮𝖳 gate and
the 𝖢𝖢𝖭𝖮𝖳 gate. It follows from Proposition 4.9.10 that the corresponding controlled
versions can be implemented using 𝑂(1) elementary gates. These arguments imply the
assertion of the theorem. □
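For the special case 𝑉 = 𝑅𝑧̂(𝛾), the construction in the proof can be verified directly: with 𝐴 = 𝑅𝑧̂(𝛾/2) and 𝐵 = 𝑅𝑧̂(−𝛾/2), the circuit (𝐼 ⊗ 𝐴) 𝖢𝖭𝖮𝖳 (𝐼 ⊗ 𝐵) 𝖢𝖭𝖮𝖳 equals the controlled-𝑅𝑧̂(𝛾) operator. The following Python sketch is illustrative only (it takes the control qubit as the first tensor factor) and checks this with 4 × 4 matrices:

```python
import cmath

def rz(g):
    # R_z(g) = diag(e^{-ig/2}, e^{ig/2})
    return [[cmath.exp(-1j * g / 2), 0], [0, cmath.exp(1j * g / 2)]]

def mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def on_target(V):
    # I ⊗ V on two qubits (control = qubit 0, target = qubit 1)
    Z = [[0] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            Z[i][j] = V[i][j]
            Z[2 + i][2 + j] = V[i][j]
    return Z

CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

g = 0.9
U = mul(on_target(rz(g / 2)), mul(CNOT, mul(on_target(rz(-g / 2)), CNOT)))

# expected: controlled-R_z(g), i.e. an identity block and an R_z(g) block
R = rz(g)
expected = [[1, 0, 0, 0], [0, 1, 0, 0],
            [0, 0, R[0][0], 0], [0, 0, 0, R[1][1]]]
assert all(abs(U[i][j] - expected[i][j]) < 1e-12
           for i in range(4) for j in range(4))
```

On the control = 0 subspace the gates compose to 𝐴𝐵 = 𝐼, and on the control = 1 subspace to 𝐴𝑋𝐵𝑋 = 𝑅𝑧̂(𝛾), which is exactly the decomposition used in the proof.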

From Theorem 4.12.7 we obtain the following corollary.


Corollary 4.12.8. For any unitary elementary gate 𝑈 the controlled-𝑈 operator can be
implemented using a quantum circuit that uses O(1) elementary quantum gates.
Exercise 4.12.9. Prove Corollary 4.12.8.

4.12.4. Time and space complexity. The goal of this section is to introduce
the time and space complexity of quantum algorithms which are defined as probabilis-
tic algorithms that may invoke quantum subroutines. Therefore, we must define the
complexity of such subroutines. For this, we first explain the running time and space
requirements of quantum circuits.
Definition 4.12.10. Let 𝑄 be a quantum circuit.
(1) The running time or time complexity of 𝑄 is its size, i.e., the number of input qubits
plus the number of gates used by the circuit.
(2) The space complexity of 𝑄 is the number of input qubits plus the number of ancilla
qubits used by 𝑄.

The complexity of quantum algorithms associated with P-uniform quantum circuit


families is defined next.
Definition 4.12.11. Let (𝑄𝑛 ) be a P-uniform family of quantum circuits and let 𝐴 be
the quantum algorithm corresponding to it.
(1) The time complexity or running time of 𝐴 is the function qTime ∶ ℕ → ℕ that
sends an input length 𝑛 ∈ ℕ to the maximum time complexity of a quantum
circuit used in the execution of 𝐴 with an input of length 𝑛.

(2) The space complexity of 𝐴 is the function qSpace ∶ ℕ → ℕ that sends an input
length 𝑛 ∈ ℕ to the maximum space complexity of a quantum circuit used in the
execution of 𝐴 with an input of length 𝑛.

The complexity of quantum algorithms is now defined using the corresponding


concepts for probabilistic algorithms found in Definitions 1.1.25 and 1.1.26, while also
accounting for the complexity of quantum subroutines. Furthermore, the names of
the asymptotic time and space complexities in Table 1.1.7 apply directly to quantum
algorithms. Additionally, the concepts of expected running time discussed in Section
1.3.3, success probability outlined in Section 1.3.2, and its amplification as described
in Section 1.3.4 all seamlessly carry over to quantum algorithms.
Example 4.12.12. The time and space complexity of Algorithm 4.12.6 are exponential, since the input 𝑛 specifies the number of bits of the string returned by the algorithm but is itself encoded using only O(log 𝑛) bits.

4.12.5. Quantum complexity classes. The complexity theory for probabilistic


algorithms also carries over to quantum algorithms.
To say that a quantum Monte Carlo or Las Vegas algorithm solves a computational problem is analogous to stating that a classical Monte Carlo or Las Vegas algorithm solves such a problem (see Definition 1.4.4). Accordingly, if 𝑓 ∶ ℕ → ℝ>0 is a function, we say that an algorithmic problem can be solved in quantum time O(𝑓) if there is a quantum Monte Carlo algorithm that solves this problem with success probability ≥ 2/3 and has running time O(𝑓). We also say that a computational problem is solvable in quantum linear, quasilinear, quadratic, cubic, polynomial, subexponential, and exponential time if there is a quantum Monte Carlo algorithm with this running time that solves this problem with success probability ≥ 2/3 (see Definition 1.4.11).
Finally, we define the complexity class BQP in analogy to BPP (see Definition
1.4.18).
Definition 4.12.13. The complexity class BQP is the set of all languages 𝐿 for which there is a quantum polynomial time Monte Carlo algorithm 𝐴 that decides 𝐿 and satisfies Pr(𝐴(𝑠⃗) = 1) ≥ 2/3 for all 𝑠⃗ ∈ 𝐿 and Pr(𝐴(𝑠⃗) = 0) ≥ 2/3 for all 𝑠⃗ ∈ {0, 1}^∗ ⧵ 𝐿.

It is important to observe that by Exercise 1.4.12 the value 2/3 in the definition of the quantum time complexities and the quantum complexity class BQP may be replaced by any real number in ]1/2, 1]. Also, we note that the following inclusions are satisfied.
Theorem 4.12.14. We have P ⊂ BPP ⊂ BQP ⊂ PSPACE.
Exercise 4.12.15. Sketch the proof of Theorem 4.12.14.
Chapter 5

The Algorithms of
Deutsch and Simon

After the initial concepts of quantum computers emerged, a fundamental question


arose: Can these new computers speed up the process of solving complex problems?
This chapter delves into the world of early quantum algorithms that firmly answer this
question in the affirmative.
Before we provide an overview of the content of this chapter, it is important to
note the following. None of the algorithms presented here are primarily intended for
practical applications. Instead, their primary purpose is to illustrate the superiority
of quantum computing over classical computing in terms of complexity. Additionally,
these algorithms demonstrate fundamental components and techniques of quantum
algorithms and have been a source of inspiration for researchers in the development of
more practical quantum algorithms.
One of the pioneering researchers who offered a positive response was David
Deutsch in 1985 [Deu85]. We kick off this chapter by introducing his clever and con-
cise quantum algorithm, which has the capability of calculating the value 𝑓(0) ⊕ 𝑓(1)
for a function 𝑓 ∶ {0, 1} → {0, 1}. Notably, this quantum algorithm achieves its result
with just one evaluation of the quantum counterpart of the function 𝑓 ∶ {0, 1} → {0, 1},
whereas in the classical context, achieving the same result would require two evalu-
ations of 𝑓. This algorithm already unveils key techniques that underpin a majority
of quantum algorithms: quantum parallelism, the phase-kickback trick, and quantum
interference.
Subsequently, we discuss the generalization of the Deutsch algorithm, jointly con-
ceived by David Deutsch and Richard Jozsa [DJ92] in 1992 and subsequently refined
by Richard Cleve, Artur Ekert, Leah Henderson, Chiara Macchiavello, and Michele
Mosca [CEH+ 98]. This algorithm determines whether a function 𝑓 ∶ {0, 1}𝑛 → {0, 1}
consistently yields the same output or provides an equal distribution of 0 and 1. While


the classical deterministic alternative requires 2^{𝑛−1} + 1 evaluations of 𝑓, the Deutsch-


Jozsa algorithm achieves its result with just one evaluation of a quantum operator cor-
responding to 𝑓. This significant efficiency boost lends the algorithm its renown, al-
though its advantage disappears when compared to a relatively straightforward prob-
abilistic algorithm.
However, the story does not end here. The next groundbreaking step was the devel-
opment of quantum algorithms that offer an exponential speedup over every classical
algorithm for the computational problem that they address. So
following the Deutsch and Deutsch-Jozsa algorithms, this chapter proceeds to present
the first such algorithm introduced in 1994 by Daniel R. Simon [Sim94]. It represents
yet another quantum computing breakthrough. In fact, it laid the groundwork for Pe-
ter Shor’s renowned polynomial time quantum algorithms, which are described in the
following chapter and target problems like integer factorization and discrete logarithm
computation.
So, the algorithms presented in this chapter serve as pivotal steps towards more
sophisticated algorithms, for instance, those explained in the following chapters, which
provide much more efficient solutions to algorithmic problems in many application
domains than their classical counterparts.

5.1. The Deutsch algorithm


The first quantum algorithm that we discuss is the Deutsch algorithm. It was first
proposed by David Deutsch in 1985 [Deu85] and in 1998 improved by Cleve et al.
[CEH+ 98].

5.1.1. The classical version. Consider a function


(5.1.1) 𝑓 ∶ {0, 1} → {0, 1}.
Assume that this function is provided by a black-box. By “black-box”, we mean a sys-
tem or device that can only be observed in terms of its input and output, without re-
vealing any knowledge of its inner workings. This concept was previously introduced
in Section 4.12.2. The black-box implementing 𝑓 can be queried with an input value
𝑏 ∈ {0, 1}, returns 𝑓(𝑏), and this constitutes the only information obtained from this
query. If we are given a black-box that implements a function 𝑓, we also refer to it as
having “black-box access” to 𝑓.
Next, we call 𝑓 from (5.1.1) constant if 𝑓(0) = 𝑓(1), that is, 𝑓(0) ⊕ 𝑓(1) = 0, and
we call 𝑓 balanced if 𝑓(0) ≠ 𝑓(1), that is, 𝑓(0) ⊕ 𝑓(1) = 1.
The classical Deutsch problem is to find out if a function 𝑓 ∶ {0, 1} → {0, 1} is
constant or balanced given black-box access to this function.
More formally, this can be described as follows.
Problem 5.1.1 (Deutsch problem — classical version).
Input: A black-box implementing a function 𝑓 ∶ {0, 1} → {0, 1}.
Output: 𝑓(0) ⊕ 𝑓(1).

At first glance, this problem may appear trivial, as with just two queries to the
black-box function 𝑓 we can determine the answer. However, it is essential to make
a significant observation: there is no way to solve the Deutsch problem with fewer
queries. To understand this, let us assume that the first query yields 𝑓(𝑏) = 𝑐, where
both 𝑏 and 𝑐 are binary values. It is still possible that 𝑓(1 − 𝑏) = 𝑐, which implies
𝑓(0) ⊕ 𝑓(1) = 0, or that 𝑓(1 − 𝑏) = 1 − 𝑐, which implies 𝑓(0) ⊕ 𝑓(1) =
1. However, as we will explore in the next section, the quantum Deutsch algorithm
can solve this problem with just one query to a black-box that implements a unitary
operator closely related to the function 𝑓. This makes the Deutsch algorithm the first
algorithm capable of accomplishing something impossible in the classical world. In
doing so, it introduces crucial principles of quantum computing that are also leveraged
in more advanced quantum algorithms.

5.1.2. The quantum version and its solution. To explain the quantum version
of the Deutsch problem, we define the following unitary operator on ℍ2 :
(5.1.2) 𝑈 𝑓 ∶ ℍ2 → ℍ2 , |𝑥⟩ |𝑦⟩ ↦ |𝑥⟩ |𝑓(𝑥) ⊕ 𝑦⟩ = |𝑥⟩ 𝑋 𝑓(𝑥) |𝑦⟩ .

Replacing 𝑓 by 𝑈 𝑓 we obtain the quantum version of the Deutsch problem.


Problem 5.1.2 (Deutsch problem — quantum version).
Input: A black-box that implements the quantum operator 𝑈 𝑓 for a function 𝑓 ∶
{0, 1} → {0, 1}.
Output: 𝑓(0) ⊕ 𝑓(1).
Exercise 5.1.3. Show that 𝑈 𝑓 is a unitary operator.

The quantum circuit that solves the Deutsch problem is shown in Figure 5.1.1.
It uses important ingredients of quantum computing: superposition, quantum paral-
lelism, phase kickback, and quantum interference. Now we describe the circuit step by
step and explain these ingredients.
The input state is
(5.1.3) |𝜓0 ⟩ = |0⟩ |1⟩ .
In the first step, the circuit applies the Hadamard operator to the first and the second
qubit which gives the state
(5.1.4) |𝜓1 ⟩ = |𝑥+ ⟩ |𝑥− ⟩ = (|0⟩ |𝑥− ⟩ + |1⟩ |𝑥− ⟩)/√2 .

[Figure: the two-qubit Deutsch circuit. The first qubit starts in |0⟩ and the second in |1⟩; a Hadamard gate is applied to each, then 𝑈 𝑓 acts on both qubits, then another Hadamard gate is applied to the first qubit, which is measured, yielding 𝑓(0) ⊕ 𝑓(1). The intermediate states are labeled |𝜓0 ⟩, |𝜓1 ⟩, |𝜓2 ⟩, |𝜓3 ⟩.]

Figure 5.1.1. The quantum circuit that solves the Deutsch problem.

This is an equally weighted superposition of the states |0⟩ |𝑥− ⟩ and |1⟩ |𝑥− ⟩.
Next, the phase kickback trick is used. We note that
(5.1.5) 𝑈 𝑓 |0⟩ |𝑥− ⟩ = |0⟩ 𝑋 𝑓(0) |𝑥− ⟩ = (−1)𝑓(0) |0⟩ |𝑥− ⟩
and
(5.1.6) 𝑈 𝑓 |1⟩ |𝑥− ⟩ = |1⟩ 𝑋 𝑓(1) |𝑥− ⟩ = (−1)𝑓(1) |1⟩ |𝑥− ⟩ .

These quantum states have the global phase factors (−1)𝑓(0) and (−1)𝑓(1) . But since
global phase factors do not influence measurement outcomes, we cannot learn any-
thing about 𝑓(0) or 𝑓(1) from measuring these states or evolutions of them. However,
if we apply 𝑈 𝑓 to the superposition |𝜓1 ⟩ we obtain
(5.1.7) |𝜓2 ⟩ = 𝑈 𝑓 |𝑥+ ⟩ |𝑥− ⟩ = 𝑈 𝑓 ((|0⟩ + |1⟩)/√2) |𝑥− ⟩ = (𝑈 𝑓 |0⟩ |𝑥− ⟩ + 𝑈 𝑓 |1⟩ |𝑥− ⟩)/√2 = (((−1)^{𝑓(0)} |0⟩ + (−1)^{𝑓(1)} |1⟩)/√2) |𝑥− ⟩ .
In this operation the global phase factors (−1)𝑓(0) and (−1)𝑓(1) are kicked back to the
amplitudes of |0⟩ and |1⟩ in the first qubit. As we will see, this opens up the possibility
of gaining information about 𝑓(0) and 𝑓(1) through measurement of an evolution of
𝑈 𝑓 |𝑥+ ⟩ |𝑥− ⟩. Here, we also see quantum parallelism in action: one application of 𝑈 𝑓
changes both the amplitudes of |0⟩ and |1⟩.
Equation (5.1.7) implies
(5.1.8) |𝜓2 ⟩ = (−1)^{𝑓(0)} ((|0⟩ + (−1)^{𝑓(0)⊕𝑓(1)} |1⟩)/√2) |𝑥− ⟩
= (−1)^{𝑓(0)} |𝑥+ ⟩ |𝑥− ⟩ if 𝑓(0) ⊕ 𝑓(1) = 0, and
= (−1)^{𝑓(0)} |𝑥− ⟩ |𝑥− ⟩ if 𝑓(0) ⊕ 𝑓(1) = 1.

So up to the global phase factor (−1)𝑓(0) the quantum interference of the two states
𝑈 𝑓 |0⟩ |𝑥− ⟩ and 𝑈 𝑓 |1⟩ |𝑥− ⟩ causes the amplitude of |1⟩ in the first qubit to be
(−1)𝑓(0)⊕𝑓(1) while the amplitude of |0⟩ in the first qubit is independent of this value.
Measuring the first qubit in the basis |𝑥+ ⟩ and |𝑥− ⟩ would give the desired result. Since
measurement in the computational basis is used, the Hadamard operator is applied to
the first qubit. This gives
(5.1.9) |𝜓3 ⟩ = (𝐻 ⊗ 𝐼) |𝜓2 ⟩ = (−1)𝑓(0) |𝑓(0) ⊕ 𝑓(1)⟩ |𝑥− ⟩ .
This state is separable with respect to the decomposition into the subsystems that con-
tain the first and the second qubit, respectively. So it follows from Corollary 3.7.12
that measuring the first qubit of |𝜓3 ⟩ in the computational basis gives 𝑓(0) ⊕ 𝑓(1) with
probability 1.
We have thus proved the following theorem.

Theorem 5.1.4. The quantum circuit in Figure 5.1.1 gives 𝑓(0) ⊕ 𝑓(1) with probability
1. It uses the black-box 𝑈 𝑓 once and, in addition, three Hadamard gates.

So, while solving the classical Deutsch problem requires two applications of the
function 𝑓, the Deutsch quantum circuit only needs one application of 𝑈 𝑓 . The next
exercise generalizes the phase kickback trick.
Exercise 5.1.5. Consider a unitary single-qubit operator 𝑉 and an eigenstate |𝜓⟩ of
this operator.
(1) Show that applying the operator 𝑉 to |𝜓⟩ means applying a global phase shift to
this state.
(2) Show that applying the controlled-𝑉 operator 𝐶(𝑉) with the first qubit as a control
to the state |𝑥+ ⟩ |𝜓⟩ kicks the global phase shift back to the amplitude of |1⟩ in the
first qubit.
(3) Find the spherical coordinates of the points on the Bloch sphere corresponding
to the first qubit before and after the application of 𝐶(𝑉) to |𝑥+ ⟩ |𝜓⟩.

5.2. Oracle complexity


We note that specifying the Deutsch algorithm and analyzing its complexity necessi-
tates a modification of the concepts of probabilistic algorithms and their complexity, as
presented in Section 4.12. The only input required by the Deutsch algorithm is the or-
acle 𝑈 𝑓 . However, inputs of this kind are not accounted for in the quantum algorithm
concept discussed thus far. As a result, the analysis of the algorithm’s time complexity
includes consideration of the number of calls to this oracle. We will adopt this approach
for all other algorithms presented in this chapter.

5.3. The Deutsch-Jozsa algorithm


In 1992, David Deutsch and Richard Jozsa [DJ92] proposed a natural generalization
of the Deutsch problem and a quantum algorithm to solve it. Here we present the
improved version of the algorithm by Cleve et al. [CEH+ 98] in 1998.

5.3.1. The classical version. Let


(5.3.1) 𝑓 ∶ {0, 1}𝑛 → {0, 1}
be a function. We call 𝑓 constant if 𝑓(𝑥)⃗ is the same for all 𝑥⃗ ∈ {0, 1}𝑛 and we call 𝑓
balanced if 𝑓(𝑥)⃗ = 0 for half of the arguments 𝑥⃗ ∈ {0, 1}𝑛 and 𝑓(𝑥)⃗ = 1 for the other
half. The classical version of the Deutsch-Jozsa problem is the following.
Problem 5.3.1 (Deutsch-Jozsa problem — classical version).
Input: A black-box implementing a function 𝑓 ∶ {0, 1}𝑛 → {0, 1} that is either con-
stant or balanced.
Output: “constant” or “balanced”, respectively, if 𝑓 has this property.

Exercise 5.3.2. Show that every deterministic algorithm that solves the classical
Deutsch-Jozsa problem requires 2𝑛−1 + 1 queries of the black-box implementing 𝑓.

As previously mentioned, the Deutsch-Jozsa problem represents a straightforward


extension of the Deutsch problem. Its primary significance does not necessarily stem
from direct practical applications but rather from the striking disparity in performance
between the classical deterministic solution and the quantum solution, which is even
more pronounced compared to the Deutsch problem. As demonstrated in Exercise
5.3.2, any deterministic classical algorithm necessitates a minimum of 2𝑛−1 + 1 queries
of 𝑓, whereas the quantum algorithm uses the operator 𝑈 𝑓 corresponding to the func-
tion 𝑓 only once. However, it’s worth noting that this is only part of the story, as the
subsequent exercise reveals the existence of a significantly more efficient classical prob-
abilistic algorithm for solving the Deutsch-Jozsa problem.
Exercise 5.3.3. Find a probabilistic algorithm which requires two evaluations of 𝑓 and
solves the Deutsch-Jozsa problem with a success probability of at least 1/2.
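One natural strategy for such an algorithm (a sketch of the idea, not a complete solution of the exercise; the encoding of inputs as integers is a choice of the snippet) is to query 𝑓 at independent uniformly random inputs and answer "constant" unless two observed values differ. A constant 𝑓 is then always classified correctly, and a balanced 𝑓 is detected whenever the two samples land in different preimages:

```python
import random

def guess_constant_or_balanced(f, n, queries=2):
    """Query f at `queries` uniformly random points of {0,1}^n (encoded as
    integers) and answer "constant" unless two observed values differ."""
    values = {f(random.getrandbits(n)) for _ in range(queries)}
    return "constant" if len(values) == 1 else "balanced"

# A constant function is always classified correctly; a balanced one is
# misclassified only when all sampled values happen to coincide.
assert guess_constant_or_balanced(lambda x: 1, 4) == "constant"
```

With two queries drawn with replacement from a balanced function, the two values differ with probability exactly 1/2, so the overall success probability is at least 1/2; repeating the test drives the error down exponentially.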

5.3.2. The quantum version and its solution. The quantum version of the
Deutsch-Jozsa problem uses the operator
(5.3.2) 𝑈 𝑓 ∶ ℍ 𝑛 ⊗ ℍ1 → ℍ 𝑛 ⊗ ℍ1 , |𝑥⟩⃗ |𝑦⟩ ↦ |𝑥⟩⃗ |𝑓(𝑥)⃗ ⊕ 𝑦⟩ = |𝑥⟩⃗ 𝑋 𝑓(𝑥)⃗ |𝑦⟩
to find out whether 𝑓 is constant or balanced.
Exercise 5.3.4. Prove that 𝑈 𝑓 defined in (5.3.2) is a unitary operator on ℍ𝑛 ⊗ ℍ1 .

The quantum version of the Deutsch-Jozsa problem is the following.


Problem 5.3.5 (Deutsch-Jozsa problem — quantum version).
Input: A positive integer 𝑛, a black-box implementing the operator 𝑈 𝑓 from (5.3.2) for
a function 𝑓 ∶ {0, 1}𝑛 → {0, 1} that is either constant or balanced.
Output: “constant” or “balanced”, respectively, if 𝑓 has this property.

The Deutsch-Jozsa problem is an example of a promise problem, which in complex-


ity theory refers to a decision problem that comes with a “promise” or guarantee about
the possible answers it can have. Here, the two possible answers are “constant” or
“balanced”.
The quantum circuit that solves the Deutsch-Jozsa problem is shown in Figure
5.3.1. It also uses superposition, phase kickback, and interference and is similar to the
quantum circuit in Figure 5.1.1 that solves the Deutsch problem.
In order to show that this circuit has the desired property, we use the following
lemma.

[Figure: the Deutsch-Jozsa circuit. The first 𝑛 qubits start in |0⟩^{⊗𝑛} and the last qubit in |1⟩; 𝐻 ⊗𝑛 is applied to the first 𝑛 qubits and 𝐻 to the last one, then 𝑈 𝑓 acts on all 𝑛 + 1 qubits, then 𝐻 ⊗𝑛 is applied to the first 𝑛 qubits, which are measured, yielding 0⃗ if 𝑓 is constant and some 𝑥⃗ ≠ 0⃗ if 𝑓 is balanced. The intermediate states are labeled |𝜓0 ⟩, . . . , |𝜓3 ⟩.]

Figure 5.3.1. The quantum circuit QDJ (𝑛, 𝑈 𝑓 ) that solves the Deutsch-Jozsa problem.

Lemma 5.3.6. For all 𝑥⃗ ∈ {0, 1}𝑛 we have


(5.3.3) 𝐻 ⊗𝑛 |𝑥⃗⟩ = (1/√(2^𝑛)) ∑_{𝑧⃗∈{0,1}^𝑛} (−1)^{𝑥⃗⋅𝑧⃗} |𝑧⃗⟩

and

(5.3.4) |𝑥⃗⟩ = (1/√(2^𝑛)) ∑_{𝑧⃗∈{0,1}^𝑛} (−1)^{𝑥⃗⋅𝑧⃗} 𝐻 ⊗𝑛 |𝑧⃗⟩ .

Proof. We first note that for 𝑥 ∈ {0, 1} we have


(5.3.5) 𝐻 |𝑥⟩ = (1/√2)(|0⟩ + (−1)^𝑥 |1⟩) = (1/√2) ∑_{𝑧∈{0,1}} (−1)^{𝑥𝑧} |𝑧⟩ .

Hence, for 𝑥⃗ = (𝑥0 , . . . , 𝑥𝑛−1 ) ∈ {0, 1}𝑛 we have

𝐻 ⊗𝑛 |𝑥⃗⟩ = (1/√(2^𝑛)) ( ∑_{𝑧0 ∈{0,1}} (−1)^{𝑥0 𝑧0} |𝑧0 ⟩ ) ⊗ ⋯ ⊗ ( ∑_{𝑧𝑛−1 ∈{0,1}} (−1)^{𝑥𝑛−1 𝑧𝑛−1} |𝑧𝑛−1 ⟩ )
(5.3.6) = (1/√(2^𝑛)) ∑_{(𝑧0 ,. . .,𝑧𝑛−1 )∈{0,1}^𝑛} ((−1)^{𝑥0 𝑧0} |𝑧0 ⟩) ⊗ ⋯ ⊗ ((−1)^{𝑥𝑛−1 𝑧𝑛−1} |𝑧𝑛−1 ⟩)
= (1/√(2^𝑛)) ∑_{𝑧⃗∈{0,1}^𝑛} (−1)^{𝑥⃗⋅𝑧⃗} |𝑧⃗⟩ .

This proves (5.3.3). Since 𝐻 2 = 𝐼, equation (5.3.3) implies (5.3.4). □
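For small 𝑛, identity (5.3.3) is easy to check numerically. The sketch below (Python/NumPy; the convention that 𝑥0 is the most significant bit of the basis-state index is an assumption of the snippet) compares every amplitude of 𝐻 ⊗𝑛 |𝑥⃗⟩ with (−1)^{𝑥⃗⋅𝑧⃗}/√(2^𝑛):

```python
import numpy as np
from itertools import product

H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def H_n(n):
    """The matrix of the n-fold tensor power of the Hadamard operator."""
    M = np.array([[1.0]])
    for _ in range(n):
        M = np.kron(M, H1)
    return M

n = 3
Hn = H_n(n)
for x in product((0, 1), repeat=n):
    # Column of H^{(x)n} belonging to |x>; x0 is the most significant bit.
    col = Hn[:, int("".join(map(str, x)), 2)]
    for k, z in enumerate(product((0, 1), repeat=n)):
        dot = sum(xi * zi for xi, zi in zip(x, z)) % 2
        assert abs(col[k] - (-1) ** dot / np.sqrt(2 ** n)) < 1e-12
```

Since 𝐻 ⊗𝑛 is real and symmetric with 𝐻² = 𝐼, the same check also confirms (5.3.4).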

Now we determine the states |𝜓𝑖 ⟩, 0 ≤ 𝑖 ≤ 3, in the Deutsch-Jozsa circuit. The


initial state is
(5.3.7) |𝜓0 ⟩ = |0⟩^{⊗𝑛} |1⟩ .
The circuit applies 𝐻 ⊗(𝑛+1) to this state. It follows from (5.3.3) that this gives the state
(5.3.8) |𝜓1 ⟩ = (𝐻 ⊗𝑛 |0⟩^{⊗𝑛} ) |𝑥− ⟩ = (1/√(2^𝑛)) ∑_{𝑥⃗∈{0,1}^𝑛} |𝑥⃗⟩ |𝑥− ⟩ .

This is the equally weighted superposition of the states |𝑥⟩⃗ |𝑥− ⟩.


Next, the quantum circuit applies 𝑈 𝑓 to |𝜓1 ⟩. We show that this is an application
of the phase kickback trick. For all 𝑥⃗ ∈ {0, 1}𝑛 we have
(5.3.9) 𝑈 𝑓 |𝑥⟩⃗ |𝑥− ⟩ = (−1)𝑓(𝑥)⃗ |𝑥⟩⃗ |𝑥− ⟩ .

Therefore, the application of 𝑈 𝑓 to |𝑥⟩⃗ |𝑥− ⟩ modifies this state by the global phase factor
(−1)𝑓(𝑥)⃗ . It follows from (5.3.9) that
(5.3.10) |𝜓2 ⟩ = 𝑈 𝑓 |𝜓1 ⟩ = (1/√(2^𝑛)) ∑_{𝑥⃗∈{0,1}^𝑛} 𝑈 𝑓 |𝑥⃗⟩ |𝑥− ⟩ = (1/√(2^𝑛)) ∑_{𝑥⃗∈{0,1}^𝑛} (−1)^{𝑓(𝑥⃗)} |𝑥⃗⟩ |𝑥− ⟩ .

Hence, applying 𝑈 𝑓 to the superposition |𝜓1 ⟩ kicks the global phase factors (−1)𝑓(𝑥)⃗ back
to all the amplitudes of the states |𝑥⟩⃗ of the first 𝑛 qubits. In order to extract information
about the function 𝑓 from this superposition, we note that by (5.3.4) we have

|𝜓2 ⟩ = ((−1)^{𝑓(0⃗)}/√(2^𝑛)) ∑_{𝑥⃗∈{0,1}^𝑛} (−1)^{𝑓(𝑥⃗)⊕𝑓(0⃗)} |𝑥⃗⟩ |𝑥− ⟩
(5.3.11) = ((−1)^{𝑓(0⃗)}/2^𝑛) ∑_{𝑥⃗∈{0,1}^𝑛} (−1)^{𝑓(𝑥⃗)⊕𝑓(0⃗)} ∑_{𝑧⃗∈{0,1}^𝑛} (−1)^{𝑥⃗⋅𝑧⃗} 𝐻 ⊗𝑛 |𝑧⃗⟩ |𝑥− ⟩
= ((−1)^{𝑓(0⃗)}/2^𝑛) ( ∑_{𝑧⃗∈{0,1}^𝑛} ( ∑_{𝑥⃗∈{0,1}^𝑛} (−1)^{𝑥⃗⋅𝑧⃗ + 𝑓(𝑥⃗)⊕𝑓(0⃗)} ) 𝐻 ⊗𝑛 |𝑧⃗⟩ ) |𝑥− ⟩ .

This is the tensor product of a quantum state in ℍ𝑛 with |𝑥− ⟩. The amplitude of
the basis state 𝐻 ⊗𝑛 |0⟩^{⊗𝑛} in the state of the first 𝑛 qubits is

(5.3.12) ((−1)^{𝑓(0⃗)}/2^𝑛) ∑_{𝑥⃗∈{0,1}^𝑛} (−1)^{𝑓(𝑥⃗)⊕𝑓(0⃗)} = (−1)^{𝑓(0⃗)} if 𝑓 is constant, and 0 if 𝑓 is balanced.

It follows from Corollary 3.7.12 that measuring the first 𝑛 qubits in the basis
(𝐻 ⊗𝑛 |𝑧⃗⟩)_{𝑧⃗∈{0,1}^𝑛} gives with probability 1 the information whether 𝑓 is constant or bal-

anced. To obtain this information by a measurement in the computational basis, the


Deutsch-Jozsa circuit applies 𝐻 ⊗𝑛 to the quantum state of the first 𝑛 qubits of |𝜓2 ⟩. By
(5.3.11), this gives the final state

(5.3.13) |𝜓3 ⟩ = ((−1)^{𝑓(0⃗)}/2^𝑛) ( ∑_{𝑧⃗∈{0,1}^𝑛} ( ∑_{𝑥⃗∈{0,1}^𝑛} (−1)^{𝑥⃗⋅𝑧⃗ + 𝑓(𝑥⃗)⊕𝑓(0⃗)} ) |𝑧⃗⟩ ) |𝑥− ⟩ .

Measuring the first 𝑛 qubits of |𝜓3 ⟩ in the computational basis gives 0⃗ with probability
1 if 𝑓 is constant and 𝑧 ⃗ ≠ 0⃗ if 𝑓 is balanced. As desired, this measurement distinguishes
with probability 1 between constant and balanced functions 𝑓, using 2𝑛 + 1 applications
of the Hadamard operator 𝐻 and one application of 𝑈 𝑓 . This can be considered an
exponential speedup over the best classical deterministic solution of the Deutsch-Jozsa
problem.
Summarizing our discussion, we obtain the following theorem.

Theorem 5.3.7. Let 𝑛 ∈ ℕ, let 𝑓 ∶ {0, 1}𝑛 → {0, 1} be a function that is constant or
balanced, and let 𝑈 𝑓 be the unitary operator from (5.3.2). Then with probability 1 the
quantum circuit QDJ returns 0⃗ if 𝑓 is constant and 𝑥⃗ ∈ {0, 1}𝑛 , 𝑥⃗ ≠ 0,⃗ if 𝑓 is balanced. It
uses one 𝑈 𝑓 gate and 2𝑛 + 1 Hadamard gates.
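As with the Deutsch circuit, Theorem 5.3.7 can be checked by direct matrix simulation for small 𝑛. The following Python/NumPy sketch (the encoding of |𝑥⃗⟩|𝑦⟩ as the index 2𝑥 + 𝑦, with 𝑥 the integer value of the first register, is a choice of the snippet) declares 𝑓 constant exactly when the first register is measured as 0⃗:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def kron_all(mats):
    M = np.array([[1.0]])
    for A in mats:
        M = np.kron(M, A)
    return M

def U_f(f, n):
    """U_f |x>|y> = |x>|y XOR f(x)>; basis index of |x>|y> is 2x + y."""
    N = 2 ** (n + 1)
    U = np.zeros((N, N))
    for x in range(2 ** n):
        for y in (0, 1):
            U[2 * x + (y ^ f(x)), 2 * x + y] = 1
    return U

def deutsch_jozsa_says_constant(f, n):
    psi = np.zeros(2 ** (n + 1)); psi[1] = 1.0         # |0...0>|1>
    psi = kron_all([H] * (n + 1)) @ psi                # |psi_1>
    psi = U_f(f, n) @ psi                              # |psi_2>
    psi = np.kron(kron_all([H] * n), np.eye(2)) @ psi  # |psi_3>
    p_zero = psi[0] ** 2 + psi[1] ** 2  # probability of reading 0...0
    return bool(p_zero > 0.5)

n = 3
assert deutsch_jozsa_says_constant(lambda x: 1, n)                          # constant
assert not deutsch_jozsa_says_constant(lambda x: bin(x).count("1") % 2, n)  # balanced
```

The probability p_zero is exactly 1 for constant 𝑓 and exactly 0 for balanced 𝑓, while 𝑈 𝑓 is applied only once.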

So, compared to the best deterministic algorithm for solving the Deutsch-Jozsa
problem, which by Exercise 5.3.2 requires 2𝑛−1 + 1 evaluations of the function 𝑓, the
Deutsch-Jozsa algorithm represents a dramatic asymptotic speedup. However, com-
pared to the probabilistic algorithm from Exercise 5.3.3, this advantage vanishes. This
is different for Simon’s algorithm, which is presented in the next section.

5.4. Simon’s algorithm


This section focuses on Simon’s problem and Simon’s quantum algorithm for solving
this problem. The algorithm was presented in 1994 by Daniel Simon [Sim94, Sim97]
with the intention of showing quantum computing’s supremacy over classical comput-
ing from the perspective of complexity theory. Simon’s pioneering work demonstrated
that the quantum variant of Simon’s problem can be resolved exponentially faster com-
pared to its classical counterpart. Notably, this marked the first instance of such an ex-
ponential acceleration, and it also laid the groundwork for inspiring the development
of the Shor algorithm, a topic we will explore in Chapter 6.

5.4.1. The classical version. Classically, Simon’s problem is the following.


Problem 5.4.1 (Simon’s problem — classical version).
Input: A black-box implementing a function 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑛 with the property
that there is 𝑠 ⃗ ∈ {0, 1}𝑛 , 𝑠 ⃗ ≠ 0,⃗ such that for all 𝑥,⃗ 𝑦 ⃗ ∈ {0, 1}𝑛 we have 𝑓(𝑥)⃗ = 𝑓(𝑦)⃗ if and
only if 𝑥⃗ = 𝑦 ⃗ or 𝑥⃗ = 𝑦 ⃗ ⊕ 𝑠.⃗
Output: The hidden string 𝑠.⃗

The next exercise gives a lower bound for solving Simon’s problem using a classical
deterministic algorithm.
Exercise 5.4.2. Let 𝐴 be a classical deterministic algorithm that solves Simon’s prob-
lem. Show that in the worst case, 𝐴 must query the black-box implementing 𝑓 at least
2𝑛−1 + 1 times.

We will see that the quantum algorithm for Simon’s problem is much more effi-
cient. But it is a probabilistic algorithm. Therefore, we must compare it with classical
probabilistic algorithms. A lower bound for their performance is given in the next the-
orem, which was proved by Richard Cleve in [Cle11].
Theorem 5.4.3. Any classical probabilistic algorithm that solves Simon’s problem with
probability at least 3/4 must make Ω(2𝑛/2 ) queries to the black-box for 𝑓.

5.4.2. The quantum version and its solution. As in the quantum Deutsch
problem, also in the quantum version of Simon’s problem the function 𝑓 ∶ {0, 1}𝑛 →
{0, 1}𝑛 is replaced by a unitary operator on ℍ𝑛 ⊗ ℍ𝑛 . This operator is
(5.4.1) 𝑈 𝑓 ∶ ℍ𝑛 ⊗ ℍ𝑛 → ℍ𝑛 ⊗ ℍ𝑛 , |𝑥⃗⟩ |𝑦⃗⟩ ↦ |𝑥⃗⟩ |𝑓(𝑥⃗) ⊕ 𝑦⃗⟩ .
With this operator, the quantum version of Simon’s problem can be stated as follows.
Problem 5.4.4 (Simon’s problem — quantum version).
Input: A positive integer 𝑛, a black-box implementing the unitary operator 𝑈 𝑓 from
(5.4.1) for a function 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑛 such that there is a vector 𝑠 ⃗ ∈ {0, 1}𝑛 , 𝑠 ⃗ ≠ 0,⃗
with the property that for all 𝑥,⃗ 𝑦 ⃗ ∈ {0, 1}𝑛 we have 𝑓(𝑥)⃗ = 𝑓(𝑦)⃗ if and only if 𝑥⃗ = 𝑦 ⃗ or
𝑥⃗ = 𝑦 ⃗ ⊕ 𝑠.⃗
Output: The hidden string 𝑠.⃗

Algorithm 5.4.5. Simon’s algorithm


Input: A positive integer 𝑛 and a black-box implementation of 𝑈 𝑓 as in (5.4.1) for a
function 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑛 such that there is 𝑠 ⃗ ∈ {0, 1}𝑛 , 𝑠 ⃗ ≠ 0,⃗ with the property
that for all 𝑥,⃗ 𝑦 ⃗ ∈ {0, 1}𝑛 we have 𝑓(𝑥)⃗ = 𝑓(𝑦)⃗ if and only if 𝑥⃗ = 𝑦 ⃗ or 𝑥⃗ = 𝑦 ⃗ ⊕ 𝑠.⃗
Output: The hidden string 𝑠 ⃗ from Simon’s problem
1: QSimon(𝑛, 𝑈 𝑓 )
2: 𝑊 ← ()
3: for 𝑗 = 1 to 𝑛 − 1 do
4: 𝑤𝑗⃗ ← QSimon (𝑛, 𝑈 𝑓 )
5: 𝑊 ← 𝑊 ∘ (𝑤𝑗⃗ )
6: end for
7: 𝑟 ← rank 𝑊
8: if 𝑟 = 𝑛 − 1 then
9: Find the unique nonzero solution 𝑠 ⃗ of the linear system 𝑊 T 𝑥⃗ = 0
10: return 𝑠 ⃗
11: else
12: return “Failure”
13: end if
14: end

We explain the idea of Simon’s Algorithm 5.4.5. The details and proofs are given
below. Using the quantum circuit QSimon (𝑛, 𝑈 𝑓 ) from Figure 5.4.1 the algorithm se-
lects 𝑛 − 1 elements 𝑤⃗ 1 , . . . , 𝑤⃗ 𝑛−1 in the orthogonal complement
(5.4.2) 𝑠⟂⃗ = {𝑤⃗ ∈ {0, 1}𝑛 ∶ 𝑤⃗ ⋅ 𝑠 ⃗ = 0}
of 𝑠.⃗ If the matrix 𝑊 = (𝑤⃗ 1 , . . . , 𝑤⃗ 𝑛−1 ) has rank 𝑛 − 1, then the algorithm computes the
uniquely determined nonzero solution of the linear system 𝑊 T 𝑥⃗ = 0,⃗ which happens to be the
hidden string 𝑠.⃗
In the upcoming part of this section, we will adopt the notation from the quan-
tum version of Simon’s problem and prove the following theorem. In view of Theorem
5.4.3, it shows that Simon's algorithm offers an exponential speedup compared to any
classical probabilistic algorithm for Simon's problem.
Theorem 5.4.6. Simon’s Algorithm 5.4.5 returns the hidden string 𝑠 ⃗ from Simon’s prob-
lem with probability at least 1/4. It requires 𝑛 − 1 applications of 𝑈 𝑓 and O(𝑛3 ) other
operations.

[Figure: the circuit QSimon (𝑛, 𝑈 𝑓 ). Two registers of 𝑛 qubits each start in |0⟩^{⊗𝑛} ; 𝐻 ⊗𝑛 is applied to the first register, then 𝑈 𝑓 acts on both registers, then 𝐻 ⊗𝑛 is applied to the first register, which is measured, yielding 𝑤⃗ . The intermediate states are labeled |𝜓0 ⟩, . . . , |𝜓3 ⟩.]

Figure 5.4.1. The quantum circuit QSimon (𝑛, 𝑈 𝑓 ) used in Simon's algorithm.

We first prove that Simon’s algorithm returns the correct result.

Proposition 5.4.7. Let 𝑊 = (𝑤⃗ 1 , . . . , 𝑤⃗ 𝑛−1 ) have rank 𝑛 − 1. Then 𝑠 ⃗ is the uniquely
determined nonzero solution of the linear system 𝑊 T 𝑥⃗ = 0.⃗

Proof. By Proposition 2.2.42, the orthogonal complement of 𝑠 ⃗ is a subspace of {0, 1}𝑛


of dimension 𝑛 − 1. By Proposition B.7.5, the dimension of the kernel of 𝑊 T is 1. Since
𝑠 ⃗ is in the kernel of 𝑊 T it follows that this kernel is {0,⃗ 𝑠}.⃗ □

It follows from Proposition 5.4.7 that Simon’s algorithm returns the correct result
if the quantum circuit QSimon (𝑛, 𝑈 𝑓 ) in Figure 5.4.1 returns elements of 𝑠⟂⃗ . This is
what we will prove now. We need the following result.

Lemma 5.4.8. Let 𝑠⃗ ∈ {0, 1}𝑛 be nonzero. Then for all 𝑧⃗ ∈ {0, 1}𝑛 we have
(5.4.3) 𝐻 ⊗𝑛 ((|𝑧⃗⟩ + |𝑧⃗ ⊕ 𝑠⃗⟩)/√2) = (1/√(2^{𝑛−1})) ∑_{𝑤⃗∈𝑠⃗^⟂} (−1)^{𝑧⃗⋅𝑤⃗} |𝑤⃗⟩ .

Proof. It follows from Lemma 5.3.6 that for all 𝑧 ⃗ ∈ {0, 1}𝑛 we have
𝐻 ⊗𝑛 ((|𝑧⃗⟩ + |𝑧⃗ ⊕ 𝑠⃗⟩)/√2)
= (1/√(2^{𝑛+1})) ∑_{𝑤⃗∈{0,1}^𝑛} ((−1)^{𝑧⃗⋅𝑤⃗} + (−1)^{(𝑧⃗⊕𝑠⃗)⋅𝑤⃗}) |𝑤⃗⟩
(5.4.4)
= (1/√(2^{𝑛+1})) ( ∑_{𝑤⃗∈𝑠⃗^⟂} ((−1)^{𝑧⃗⋅𝑤⃗} + (−1)^{𝑧⃗⋅𝑤⃗ ⊕ 𝑠⃗⋅𝑤⃗}) |𝑤⃗⟩ + ∑_{𝑤⃗∈{0,1}^𝑛 ⧵ 𝑠⃗^⟂} ((−1)^{𝑧⃗⋅𝑤⃗} + (−1)^{𝑧⃗⋅𝑤⃗ ⊕ 𝑠⃗⋅𝑤⃗}) |𝑤⃗⟩ ) .

If 𝑤⃗ ∈ 𝑠⃗^⟂ , then
(5.4.5) (−1)^{𝑧⃗⋅𝑤⃗} + (−1)^{𝑧⃗⋅𝑤⃗ ⊕ 𝑠⃗⋅𝑤⃗} = 2 ⋅ (−1)^{𝑧⃗⋅𝑤⃗} .
Also, if 𝑤⃗ ∈ {0, 1}^𝑛 ⧵ 𝑠⃗^⟂ , then
(5.4.6) (−1)^{𝑧⃗⋅𝑤⃗} + (−1)^{𝑧⃗⋅𝑤⃗ ⊕ 𝑠⃗⋅𝑤⃗} = 0.
Hence, it follows from (5.4.4) that
(5.4.7) 𝐻 ⊗𝑛 ((|𝑧⃗⟩ + |𝑧⃗ ⊕ 𝑠⃗⟩)/√2) = (2/√(2^{𝑛+1})) ∑_{𝑤⃗∈𝑠⃗^⟂} (−1)^{𝑧⃗⋅𝑤⃗} |𝑤⃗⟩ = (1/√(2^{𝑛−1})) ∑_{𝑤⃗∈𝑠⃗^⟂} (−1)^{𝑧⃗⋅𝑤⃗} |𝑤⃗⟩

as asserted. □

Now we can prove the following proposition.

Proposition 5.4.9. The quantum circuit QSimon (𝑛, 𝑈 𝑓 ) from Figure 5.4.1 returns a uni-
formly distributed random element of 𝑠⟂⃗ .

Proof. The quantum circuit QSimon (𝑛, 𝑈 𝑓 ) operates on a quantum system that con-
sists of two quantum registers of length 𝑛, each of which is initialized to |0⟩^{⊗𝑛} . So, we
have
(5.4.8) |𝜓0 ⟩ = |0⟩^{⊗𝑛} |0⟩^{⊗𝑛} .
Then 𝐻 ⊗𝑛 is applied to the first register. It follows from (5.3.3) that this gives the quan-
tum state
(5.4.9) |𝜓1 ⟩ = (1/√(2^𝑛)) ∑_{𝑧⃗∈{0,1}^𝑛} |𝑧⃗⟩ |0⟩^{⊗𝑛} .

It is an equally weighted superposition of the quantum states |𝑧⃗⟩ |0⟩^{⊗𝑛} . Next, the algo-
rithm applies 𝑈 𝑓 to |𝜓1 ⟩ and produces the state
(5.4.10) |𝜓2 ⟩ = 𝑈 𝑓 |𝜓1 ⟩ = (1/√(2^𝑛)) ∑_{𝑧⃗∈{0,1}^𝑛} |𝑧⃗⟩ |𝑓(𝑧⃗)⟩ .

This is an instance of quantum parallelism: one application of the operator 𝑈 𝑓 gives
a superposition of the states |𝑓(𝑧⃗)⟩, 𝑧⃗ ∈ {0, 1}𝑛 . We will now show that this operation
also leads to quantum interference which allows us to obtain 𝑤⃗ ∈ 𝑠⟂⃗ . To see this, let 𝐼
be a set of representatives of the elements of the quotient space {0, 1}𝑛 /{0,⃗ 𝑠}.⃗ Then we
have
(5.4.11) {0, 1}^𝑛 = ⋃_{𝑧⃗∈𝐼} {𝑧⃗, 𝑠⃗ ⊕ 𝑧⃗},
where the union is disjoint.

This implies that |𝜓2 ⟩ can be rewritten as

(5.4.12) |𝜓2 ⟩ = (1/√(2^{𝑛−1})) ∑_{𝑧⃗∈𝐼} ((|𝑧⃗⟩ + |𝑧⃗ ⊕ 𝑠⃗⟩)/√2) |𝑓(𝑧⃗)⟩ .
From Lemma 5.4.8 we obtain
(5.4.13) |𝜓2 ⟩ = (1/2^{𝑛−1}) ∑_{𝑧⃗∈𝐼} ∑_{𝑤⃗∈𝑠⃗^⟂} (−1)^{𝑧⃗⋅𝑤⃗} 𝐻 ⊗𝑛 |𝑤⃗⟩ |𝑓(𝑧⃗)⟩ .

So quantum interference gives the equally weighted superposition of the quantum


states 𝐻 ⊗𝑛 |𝑤⃗⟩ |𝑓(𝑧⃗)⟩ with 𝑤⃗ ∈ 𝑠⟂⃗ and 𝑧⃗ ∈ 𝐼. To allow extraction of some 𝑤⃗ , the al-
gorithm applies 𝐻 ⊗𝑛 to the first quantum register. This gives the final state
(5.4.14) |𝜓3 ⟩ = (1/2^{𝑛−1}) ∑_{𝑧⃗∈𝐼} ∑_{𝑤⃗∈𝑠⃗^⟂} (−1)^{𝑧⃗⋅𝑤⃗} |𝑤⃗⟩ |𝑓(𝑧⃗)⟩ .

As shown in Exercise 5.4.10, measuring the first register of |𝜓3 ⟩ in the computational
basis of ℍ𝑛 gives every 𝑤⃗ ∈ 𝑠⟂⃗ with probability 1/2𝑛−1 . □

Exercise 5.4.10. (1) Show that measuring the first register of |𝜓3 ⟩ in (5.4.14) in the
computational basis of ℍ𝑛 gives every 𝑤⃗ ∈ 𝑠⟂⃗ with probability 1/2𝑛−1 .
(2) Analyze the modification of Simon’s algorithm where the second register is traced
out before the measurement.
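Proposition 5.4.9 and Exercise 5.4.10(1) can be explored numerically for small 𝑛. In the sketch below (Python/NumPy), the state of the 2𝑛-qubit system is stored as an amplitude matrix 𝑎[𝑥, 𝑦], and the example function 𝑓(𝑥) = min(𝑥, 𝑥 ⊕ 𝑠) is an illustrative choice of a valid Simon function; both are conventions of the snippet, not of the text:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def kron_pow(A, n):
    M = np.array([[1.0]])
    for _ in range(n):
        M = np.kron(M, A)
    return M

def qsimon_distribution(f, n):
    """Measurement distribution of the first register of |psi_3>."""
    N = 2 ** n
    a = np.zeros((N, N))
    a[:, 0] = kron_pow(H, n)[:, 0]      # |psi_1>: H^{(x)n}|0...0> (x) |0...0>
    b = np.zeros((N, N))                # |psi_2> = U_f |psi_1>
    for x in range(N):
        for y in range(N):
            b[x, y ^ f(x)] += a[x, y]
    c = kron_pow(H, n) @ b              # |psi_3>: H^{(x)n} on the first register
    return (c ** 2).sum(axis=1)         # marginal over the second register

n, s = 3, 0b101
f = lambda x: min(x, x ^ s)             # f(x) = f(y) iff y in {x, x XOR s}
p = qsimon_distribution(f, n)
support = [w for w in range(2 ** n) if p[w] > 1e-12]
# Every outcome lies in the orthogonal complement of s, uniformly:
assert all(bin(w & s).count("1") % 2 == 0 for w in support)
assert np.allclose(p[support], 1 / 2 ** (n - 1))
```

For 𝑛 = 3 and 𝑠⃗ = 101 the support consists of the four vectors 000, 010, 101, 111, each with probability 1/4, as Proposition 5.4.9 predicts.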

Next, we analyze Algorithm 5.4.5. The algorithm invokes the quantum circuit
QSimon (𝑛, 𝑈 𝑓 ) 𝑛 − 1 times and constructs the matrix 𝑊. If the rank of the matrix
𝑊 is 𝑛 − 1, it follows from Proposition 5.4.7 that the hidden string 𝑠 ⃗ is the uniquely
determined nonzero solution of the linear system 𝑊 T 𝑥⃗ = 0.⃗ To estimate the success
probability of the algorithm, we use the following lemma.
Lemma 5.4.11. For all 𝑛 ∈ ℕ we have
(5.4.15) ∏_{𝑘=1}^{𝑛−1} (1 − 1/2^𝑘) ≥ 1/4.

Proof. For any 𝑥 ∈ [0, 1/2] we have


(5.4.16) log(1 − 𝑥) ≥ −2𝑥 log 2.
This is shown in Exercise 5.4.12 and implies
(5.4.17) log ( ∏_{𝑘=1}^{𝑛−1} (1 − 1/2^𝑘) ) = ∑_{𝑘=1}^{𝑛−1} log(1 − 1/2^𝑘) ≥ −2 log 2 ∑_{𝑘=1}^{∞} 1/2^𝑘 = −2 log 2.
This implies the assertion. □
Exercise 5.4.12. Use elementary calculus to show that (5.4.16) holds.
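The bound of Lemma 5.4.11 is easy to confirm numerically. The partial products decrease toward the infinite product ∏_{𝑘≥1}(1 − 2^{−𝑘}) ≈ 0.28879, which is still above 1/4 (the cutoff 60 below is an arbitrary choice of the check):

```python
from math import prod

# The partial products in (5.4.15) all stay above 1/4 ...
for n in range(1, 60):
    assert prod(1 - 2.0 ** -k for k in range(1, n)) >= 0.25

# ... and they stabilize quickly near 0.2888:
p = prod(1 - 2.0 ** -k for k in range(1, 60))
assert 0.288 < p < 0.289
```

So the constant 1/4 in Proposition 5.4.13 is close to the best that this argument can give.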

Using Lemma 5.4.11 we now obtain the following estimate of the success proba-
bility of Simon’s algorithm.
Proposition 5.4.13. The success probability of Simon’s algorithm is at least 1/4.

Proof. We show that for 1 ≤ 𝑗 ≤ 𝑛 − 1 the matrix 𝑊 𝑗 computed by the algorithm in


the 𝑗th iteration of the for loop has rank 𝑗 with probability
(5.4.18) 𝑝𝑗 = ∏_{𝑘=𝑛−𝑗}^{𝑛−1} (1 − 1/2^𝑘).

Using this equation for 𝑗 = 𝑛 − 1 and Lemma 5.4.11 we see that the success probability
of the Simon algorithm is at least 1/4.
If 𝑛 = 1, then (5.4.18) holds. So, let 𝑛 > 1. We prove (5.4.18) by induction on 𝑗. In
each iteration of the for loop, a random 𝑤⃗ ∈ 𝑠⟂⃗ is returned by QSimon (𝑛, 𝑈 𝑓 ) according
to the uniform distribution and is appended to the previous 𝑊. The matrix found in
this way in the first iteration has rank 1 if 𝑤⃗ is nonzero. This happens with probability
(2𝑛−1 − 1)/2𝑛−1 = 1 − 1/2𝑛−1 = 𝑝1 . Now let 𝑗 ∈ ℕ, 1 < 𝑗 ≤ 𝑛 − 1, and assume that
rank 𝑊 𝑗−1 = 𝑗 − 1 and that this happens with probability 𝑝𝑗−1 . We determine the
probability that the vector 𝑤⃗ found in the 𝑗th iteration appends 𝑊 𝑗−1 to a matrix of
rank 𝑗. Let (𝑏1⃗ , . . . , 𝑏𝑛−1⃗ ) be a basis of 𝑠⟂⃗ such that 𝑏1⃗ , . . . , 𝑏𝑗−1⃗ are the row vectors of
𝑊 𝑗−1 . Let 𝑤⃗ = 𝑤 1 𝑏1⃗ + ⋯ + 𝑤 𝑛−1 𝑏𝑛−1⃗ be the vector returned in the 𝑗th iteration of the
for loop by QSimon (𝑛, 𝑈 𝑓 ). This vector appends 𝑊 𝑗−1 to a matrix of rank 𝑗 if and only if at least

one of the coefficients 𝑤 𝑖 of the basis elements 𝑏𝑖⃗ with 𝑗 ≤ 𝑖 ≤ 𝑛 − 1 is nonzero. This
holds for 2𝑛−1 − 2𝑗−1 of the 2𝑛−1 vectors in 𝑠⟂⃗ . So, the probability of finding such a vector is
(5.4.19) 𝑝𝑗−1 (1 − 2^{𝑗−1}/2^{𝑛−1}) = 𝑝𝑗−1 (1 − 1/2^{𝑛−𝑗}) = 𝑝𝑗 . □

Finally, we analyze the complexity of the algorithm.


Proposition 5.4.14. Simon’s algorithm requires 𝑛 − 1 applications of 𝑈 𝑓 and 𝑂(𝑛2 )
Hadamard gates and O(𝑛3 ) additional operations.

Proof. Clearly, the number of calls of QSimon (𝑛, 𝑈 𝑓 ) is 𝑛 − 1. Since each application
of QSimon (𝑛, 𝑈 𝑓 ) uses O(𝑛) Hadamard gates, the total number of required Hadamard
gates is O(𝑛2 ). Also, by Proposition B.7.17 the linear system 𝑊 T 𝑥⃗ = 0⃗ can be solved
using O(𝑛3 ) operations. □

Now Theorem 5.4.6 follows from Propositions 5.4.7, 5.4.9, 5.4.13, and 5.4.14.
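The classical post-processing of Algorithm 5.4.5 — computing the rank of 𝑊 over GF(2) in step 7 and the nonzero kernel vector of 𝑊 T in step 9 — can be sketched as follows (bit strings are encoded as Python integers, and the helper names are choices of the sketch, not of the text):

```python
def rref_gf2(rows, n):
    """Reduced row echelon form over GF(2); rows are n-bit integers.
    Returns (pivot columns, reduced rows)."""
    pivots, reduced = [], []
    for r in rows:
        for p, q in zip(pivots, reduced):      # eliminate known pivots from r
            if (r >> p) & 1:
                r ^= q
        if r == 0:
            continue                           # r was linearly dependent
        p = r.bit_length() - 1                 # new pivot column
        reduced = [q ^ r if (q >> p) & 1 else q for q in reduced]
        pivots.append(p)
        reduced.append(r)
    return pivots, reduced

def recover_s(ws, n):
    """Given n-1 vectors w with w . s = 0 over GF(2), return the hidden
    string s if they have rank n-1, else None (the "Failure" branch)."""
    pivots, reduced = rref_gf2(ws, n)
    if len(pivots) != n - 1:
        return None
    free = next(c for c in range(n) if c not in pivots)  # the one free column
    s = 1 << free
    for p, row in zip(pivots, reduced):        # back-substitute with s_free = 1
        if (row >> free) & 1:
            s |= 1 << p
    return s

# Example: s = 1011; the vectors 0100, 0011, 1001 span its orthogonal complement.
assert recover_s([0b0100, 0b0011, 0b1001], 4) == 0b1011
```

With the encoding as integers, the elimination runs in O(𝑛²) word operations here, matching the O(𝑛³) bit-operation bound of Theorem 5.4.6.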

5.5. Generalization of Simon’s algorithm


We discuss a generalization of Simon’s problem and a quantum algorithm for solv-
ing it. The idea is to replace the hidden string 𝑠 ⃗ by a hidden linear subspace 𝑆 of the
𝑛-dimensional vector space {0, 1}𝑛 . The classical version of this generalization is the
following.
Problem 5.5.1 (General Simon’s problem — classical version).
Input: A black-box implementing a function 𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑛 with the following
property. There is a linear subspace 𝑆 of {0, 1}𝑛 such that for all 𝑥,⃗ 𝑦 ⃗ ∈ {0, 1}𝑛 we have
𝑓(𝑥)⃗ = 𝑓(𝑦)⃗ if and only if 𝑥⃗ = 𝑦 ⃗ ⊕ 𝑠 ⃗ for some 𝑠 ⃗ ∈ 𝑆. The dimension 𝑚 of 𝑆 is also an
input.
Output: A basis of 𝑆.

The subspace 𝑆 is also referred to as a hidden subgroup of {0, 1}𝑛 . In Simon's orig-
inal problem, we have 𝑆 = {0,⃗ 𝑠}.⃗ Finding a hidden subgroup is also the key step in
Shor’s factoring and discrete logarithm algorithms. They are discussed in Chapter 6.
The quantum version of the general Simon’s problem is the following.
Problem 5.5.2 (General Simon’s problem — quantum version).
Input: A black-box implementing the unitary operator 𝑈 𝑓 from (5.4.1) for a function
𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑛 with the following property. There is a linear subspace 𝑆 of {0, 1}𝑛
such that for all 𝑥,⃗ 𝑦 ⃗ ∈ {0, 1}𝑛 we have 𝑓(𝑥)⃗ = 𝑓(𝑦)⃗ if and only if 𝑥⃗ = 𝑦 ⃗ ⊕ 𝑠 ⃗ for some
𝑠 ⃗ ∈ 𝑆. The dimension 𝑚 of 𝑆 is also an input.
Output: A basis of 𝑆.

Algorithm 5.5.4, which is a small modification of Simon’s Algorithm 5.4.5, solves


the generalization of Simon’s problem. The idea is the following. Using the quantum

circuit QSimon (𝑛, 𝑈 𝑓 ) from Figure 5.4.1 the algorithm selects 𝑛 − 𝑚 elements
𝑤⃗ 1 , . . . , 𝑤⃗ 𝑛−𝑚 in the orthogonal complement
(5.5.1) 𝑆 ⟂ = {𝑤⃗ ∈ {0, 1}𝑛 ∶ 𝑤⃗ ⋅ 𝑠 ⃗ = 0 for all 𝑠 ⃗ ∈ 𝑆}
of 𝑆 randomly with the uniform distribution. This set is a linear subspace of {0, 1}𝑛 of
dimension 𝑛 − 𝑚. If the matrix 𝑊 = (𝑤⃗ 1 , . . . , 𝑤⃗ 𝑛−𝑚 ) has rank 𝑛 − 𝑚, then the kernel of
𝑊 T is equal to the subspace 𝑆, and the algorithm returns a basis of this kernel.
In the remainder of this section, we will prove the following theorem.
Theorem 5.5.3. Algorithm 5.5.4 returns a basis of the hidden subgroup 𝑆 from the gen-
eralization of Simon's problem with probability at least 1/4. It uses 𝑛 − 𝑚 applications of
𝑈 𝑓 and O(𝑛3 ) other operations.

Algorithm 5.5.4. General Simon’s algorithm


Input: A black-box implementing 𝑈 𝑓 from (5.4.1) and the dimension 𝑚 of the hidden
subgroup 𝑆 where 𝑓 and 𝑆 are as specified in the generalization of Simon’s problem
Output: A basis of 𝑆
1: GeneralSimon(𝑛, 𝑈 𝑓 , 𝑚)
2: 𝑊 ← ()
3: for 𝑗 = 1 to 𝑛 − 𝑚 do
4: 𝑤𝑗⃗ ← QSimon (𝑛, 𝑈 𝑓 )
5: 𝑊 ← 𝑊 ∘ (𝑤𝑗⃗ )
6: end for
7: 𝑟 ← rank 𝑊
8: if 𝑟 = 𝑛 − 𝑚 then
9: Find a basis 𝐵 of the kernel of 𝑊 T
10: return 𝐵
11: else
12: return “Failure”
13: end if
14: end

The proof of Theorem 5.5.3 is analogous to the proof of Theorem 5.4.6. Therefore,
the proofs of the corresponding results are left to the reader as exercises. We start by
determining the structure of 𝑆 ⟂ .
Proposition 5.5.5. Let 𝑊 = (𝑤⃗ 1 , . . . , 𝑤⃗ 𝑛−𝑚 ) have rank 𝑛 − 𝑚. Then 𝑆 is the kernel
of 𝑊 T .
Exercise 5.5.6. Prove Proposition 5.5.5.

It follows from Proposition 5.5.5 that Algorithm 5.5.4 returns the correct result if
the quantum circuit QSimon (𝑛, 𝑈 𝑓 ) in Figure 5.4.1 returns elements of 𝑆 ⟂ which we
will prove now. For this, we need the following lemma.
Lemma 5.5.7. Let 𝑧 ⃗ ∈ {0, 1}𝑛 and set
(5.5.2) |𝑧⃗ ⊕ 𝑆⟩ = (1/√(2^𝑚)) ∑_{𝑠⃗∈𝑆} |𝑧⃗ ⊕ 𝑠⃗⟩ .

Then we have

(5.5.3) 𝐻 ⊗𝑛 |𝑧⃗ ⊕ 𝑆⟩ = (1/√(2^{𝑛−𝑚})) ∑_{𝑤⃗∈𝑆 ⟂} (−1)^{𝑧⃗⋅𝑤⃗} |𝑤⃗⟩ .

Exercise 5.5.8. Prove Lemma 5.5.7.

Lemma 5.5.7 implies the following proposition.


Proposition 5.5.9. The quantum circuit QSimon (𝑛, 𝑈 𝑓 ) from Figure 5.4.1 returns a uni-
formly distributed random element of 𝑆 ⟂ .
Exercise 5.5.10. Prove Proposition 5.5.9.

Finally, Theorem 5.5.3 is proved in the next exercise using Propositions 5.5.5 and
5.5.9.
Exercise 5.5.11. Prove Theorem 5.5.3.
Chapter 6

The Algorithms of Shor

In this chapter, we present the algorithms that Peter Shor first introduced in 1994
[Sho94], causing a significant stir in the cybersecurity domain. These are quantum
algorithms that have the remarkable ability to compute integer factorizations and dis-
crete logarithms in polynomial time. The intractability of these problems for large pa-
rameters forms the foundation of security in the most commonly used public-key cryp-
tography, a pivotal pillar of overall cybersecurity, particularly internet security. Due
to their profound significance in IT security, the Shor algorithms stand as the most
renowned quantum algorithms. Their invention spurred the inception of the highly
active research field known as Post-Quantum Cryptography.
The chapter starts by presenting the idea of the Shor factoring algorithm. Then, the
most important tools used by Shor's algorithms, the Quantum Fourier Transform and
quantum circuits for efficiently implementing it and its inverse, are explained. Subse-
quently, we show how the Quantum Fourier Transform is used to solve the quantum
phase estimation problem which addresses the challenge of approximating the phase
of the eigenvalue of a unitary operator when an associated eigenstate of this operator
is known. Quantum phase estimation is then applied to finding the order of elements
in the multiplicative group modulo a positive integer in polynomial time. For this,
a quantum variant of the well-known fast exponentiation technique is essential. We
then show how efficient order finding enables integer factorization in polynomial time.
Also, we show how quantum phase estimation and quantum fast exponentiation lead
to a polynomial time algorithm for discrete logarithms. We conclude the chapter by discussing the hidden subgroup problem and demonstrating that several computational problems in this and the previous chapter can be viewed as instances of this problem.
As usual, we identify linear operators on the state spaces ℍ𝑛 , 𝑛 ∈ ℕ, with their rep-
resentation matrices with respect to the computational basis of ℍ𝑛 . Furthermore, the
complexity analyses of this chapter assume that all quantum circuits are constructed
using the elementary quantum gates provided by the platform discussed in Section
4.12.2.


6.1. Idea of Shor’s factoring algorithm


To make the explanation of Shor's factoring algorithm clearer, we first give a concise overview. Let us assume our goal is to find a proper divisor of a composite number
𝑁. The algorithm’s initial step involves selecting a random number 𝑎 from ℤ𝑁 with the
uniform distribution. If gcd(𝑎, 𝑁) > 1, we have found a proper divisor of 𝑁, and our
task is complete. Now, let us consider the scenario where gcd(𝑁, 𝑎) = 1. In this case,
the algorithm proceeds to determine the order 𝑟 of 𝑎 modulo 𝑁. If this order happens
to be even, then we can factorize 𝑎𝑟 − 1 as (𝑎𝑟/2 − 1)(𝑎𝑟/2 + 1), which is guaranteed
to be divisible by 𝑁. If 𝑎𝑟/2 + 1 is not divisible by 𝑁, then, as shown in Exercise 6.1.1,
gcd(𝑎𝑟/2 − 1, 𝑁) is a proper divisor of 𝑁. The analysis of the algorithm will demonstrate
that the probability of 𝑟 being even and 𝑎𝑟/2 + 1 not being divisible by 𝑁 is sufficiently
high.

Exercise 6.1.1. Let 𝑁 ∈ ℕ be a composite number and let 𝑎 ∈ ℤ𝑁 with gcd(𝑎, 𝑁) = 1.


Assume that the order 𝑟 of 𝑎 modulo 𝑁 is even and that 𝑎𝑟/2 + 1 is not divisible by 𝑁.
Show that gcd(𝑎𝑟/2 − 1, 𝑁) is a proper divisor of 𝑁.
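The reduction just described is entirely classical once the order 𝑟 is available. The following Python sketch (all names are our own) illustrates it; the naive `find_order` loop stands in for the quantum order finding subroutine and is only feasible for tiny 𝑁:

```python
from math import gcd
import random

def find_order(a, N):
    """Classical stand-in for quantum order finding: smallest r with a^r = 1 (mod N)."""
    r, x = 1, a % N
    while x != 1:
        x = (x * a) % N
        r += 1
    return r

def find_proper_divisor(N):
    """The reduction from order finding to factoring described above (N composite)."""
    while True:
        a = random.randrange(2, N)
        d = gcd(a, N)
        if d > 1:
            return d                      # lucky case: a already shares a factor with N
        r = find_order(a, N)
        # if r is even and a^(r/2) is not -1 mod N, Exercise 6.1.1 yields a divisor
        if r % 2 == 0 and pow(a, r // 2, N) != N - 1:
            return gcd(pow(a, r // 2, N) - 1, N)

d = find_proper_divisor(15)
assert 1 < d < 15 and 15 % d == 0
```

Note that `gcd(a**(r//2) - 1, N)` equals `gcd(pow(a, r//2, N) - 1, N)`, so the modular power suffices and no huge integers arise.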

So, our primary objective is now to determine the order of 𝑎 modulo 𝑁. To achieve this, the Shor algorithm uses a precision parameter 𝑛 and a unitary operator 𝑈𝑎 on ℍ𝑛 with the following property. There is an orthonormal sequence (|𝑢𝑘⟩)_{𝑘∈ℤ𝑟} of eigenstates with corresponding eigenvalue sequence (𝑒^{2𝜋𝑖𝑘/𝑟})_{𝑘∈ℤ𝑟}. The Shor algorithm then finds an integer 𝑥 ∈ ℤ_{2^𝑛} such that 2𝜋𝑥/2^𝑛 is close to the phase 2𝜋𝑘/𝑟 of the eigenvalue associated to |𝑢𝑘⟩ for some 𝑘 ∈ ℤ𝑟. Then the continued fraction algorithm applied to 𝑥/2^𝑛 is used to determine the denominator 𝑟, which is the order of 𝑎 modulo 𝑁.
To approximate one of the phases 2𝜋𝑘/𝑟, Shor's algorithm employs the quantum phase estimation algorithm. It can find an approximation to the phase of the eigenvalue of a given unitary operator 𝑈. However, it is important to note that this algorithm requires a corresponding eigenstate as an input. In general, preparing such an eigenstate efficiently is impossible. But we will demonstrate in Proposition 6.4.5 that (1/√𝑟) ∑_{𝑘=0}^{𝑟−1} |𝑢𝑘⟩ = |1⟩𝑛 and, therefore, this superposition can be efficiently prepared. Consequently, the algorithm of Shor applies the quantum phase estimation algorithm to this superposition, obtaining an approximation for one of the phases 2𝜋𝑘/𝑟. Since the primary interest lies in determining the denominator 𝑟 of these phases, this is sufficient.
Before we can explain how quantum phase estimation operates, it is essential to
introduce its primary component: the Quantum Fourier Transform.

6.2. The Quantum Fourier Transform


The most important tool used by Shor’s algorithms is the Quantum Fourier Transform
which we discuss in this section. As in Example 2.1.5, we use the bijection
(6.2.1) stringToInt ∶ {0, 1}^𝑛 → ℤ_{2^𝑛}, 𝑥⃗ = (𝑥0, . . . , 𝑥𝑛−1) ↦ ∑_{𝑗=0}^{𝑛−1} 𝑥𝑗 2^{𝑛−𝑗−1}

to identify the strings in {0, 1}^𝑛 with the integers 𝑥 ∈ ℤ_{2^𝑛}. Using this identification we write

(6.2.2) |𝑥⃗⟩ = |𝑥⟩𝑛

for the elements of the computational basis of ℍ𝑛.


To define the Quantum Fourier Transform, we use the following notation.

Definition 6.2.1. For any 𝜔 ∈ ℝ we set

(6.2.3) |𝜓𝑛(𝜔)⟩ = (1/√(2^𝑛)) ∑_{𝑦=0}^{2^𝑛−1} 𝑒^{2𝜋𝑖𝑦𝜔} |𝑦⟩𝑛.

Example 6.2.2. We have

𝜓2(1/4) = (1/2) (𝑒^{2𝜋𝑖⋅0⋅(1/4)} |0⟩2 + 𝑒^{2𝜋𝑖⋅1⋅(1/4)} |1⟩2 + 𝑒^{2𝜋𝑖⋅2⋅(1/4)} |2⟩2 + 𝑒^{2𝜋𝑖⋅3⋅(1/4)} |3⟩2)
= (1/2) (|0⟩2 + 𝑖 |1⟩2 − |2⟩2 − 𝑖 |3⟩2).
Let 𝜔 ∈ ℝ. Since

(6.2.4) (1/2^𝑛) ∑_{𝑦=0}^{2^𝑛−1} |𝑒^{2𝜋𝑖𝜔𝑦}|² = 1,

it follows that |𝜓𝑛(𝜔)⟩ is a quantum state in ℍ𝑛. We now give another representation of this state.
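The states |𝜓𝑛(𝜔)⟩ are easy to experiment with numerically. A small NumPy sketch (function names are ours) builds the vector of Definition 6.2.1 and reproduces Example 6.2.2:

```python
import numpy as np

def psi(n, omega):
    """The state |psi_n(omega)> of (6.2.3) as a length-2^n amplitude vector."""
    y = np.arange(2 ** n)
    return np.exp(2j * np.pi * y * omega) / np.sqrt(2 ** n)

# Example 6.2.2: psi_2(1/4) = (|0> + i|1> - |2> - i|3>)/2
assert np.allclose(psi(2, 1 / 4), np.array([1, 1j, -1, -1j]) / 2)
# (6.2.4): |psi_n(omega)> is a unit vector for every omega
assert np.isclose(np.linalg.norm(psi(5, 0.3)), 1.0)
```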

Proposition 6.2.3. Let 𝜔 ∈ ℝ. Then we have

(6.2.5) |𝜓𝑛(𝜔)⟩ = ⨂_{𝑗=0}^{𝑛−1} (|0⟩ + 𝑒^{2𝜋𝑖⋅2^{𝑛−𝑗−1}𝜔} |1⟩)/√2.

Proof. We have

⨂_{𝑗=0}^{𝑛−1} (|0⟩ + 𝑒^{2𝜋𝑖⋅2^{𝑛−𝑗−1}𝜔} |1⟩)/√2
(6.2.6) = (1/√(2^𝑛)) ∑_{𝑦⃗=(𝑦0,...,𝑦𝑛−1)∈{0,1}^𝑛} ∏_{𝑗=0}^{𝑛−1} 𝑒^{2𝜋𝑖𝑦𝑗2^{𝑛−𝑗−1}𝜔} |𝑦⃗⟩
= (1/√(2^𝑛)) ∑_{𝑦⃗=(𝑦0,...,𝑦𝑛−1)∈{0,1}^𝑛} 𝑒^{2𝜋𝑖(∑_{𝑗=0}^{𝑛−1} 𝑦𝑗2^{𝑛−𝑗−1})𝜔} |𝑦⃗⟩
= (1/√(2^𝑛)) ∑_{𝑦=0}^{2^𝑛−1} 𝑒^{2𝜋𝑖𝑦𝜔} |𝑦⟩𝑛
= |𝜓𝑛(𝜔)⟩.

This proves the assertion. □



Example 6.2.4. The alternative representation of the state 𝜓2(1/4), which was already considered in Example 6.2.2, is

𝜓2(1/4) = (|0⟩ + 𝑒^{2𝜋𝑖⋅2^1⋅(1/4)} |1⟩)/√2 ⊗ (|0⟩ + 𝑒^{2𝜋𝑖⋅2^0⋅(1/4)} |1⟩)/√2
= (|0⟩ − |1⟩)/√2 ⊗ (|0⟩ + 𝑖 |1⟩)/√2.

Next, we define the Quantum Fourier Transform.


Definition 6.2.5. The Quantum Fourier Transform QFT𝑛 is the linear operator on ℍ𝑛 that is defined in terms of the images of the computational basis states |𝑥⟩𝑛, 𝑥 ∈ ℤ_{2^𝑛}, of ℍ𝑛 as follows:

(6.2.7) QFT𝑛 |𝑥⟩𝑛 = |𝜓𝑛(𝑥/2^𝑛)⟩ = (1/√(2^𝑛)) ∑_{𝑦=0}^{2^𝑛−1} 𝑒^{2𝜋𝑖(𝑥/2^𝑛)𝑦} |𝑦⟩𝑛.

Example 6.2.6. We have

QFT1 (1/√2)(|0⟩ + |1⟩) = (1/√2) (QFT1 |0⟩ + QFT1 |1⟩)
= (1/√2) ((1/√2)(|0⟩ + |1⟩) + (1/√2)(|0⟩ − |1⟩)) = |0⟩.
Example 6.2.7. We have

(6.2.8) QFT𝑛 |00⋯0⟩ = QFT𝑛 |0⟩𝑛 = |𝜓𝑛(0)⟩ = (1/√(2^𝑛)) ∑_{𝑦=0}^{2^𝑛−1} |𝑦⟩𝑛.

So QFT𝑛 |0⟩𝑛 is the equally weighted superposition of all computational basis states of ℍ𝑛.
Proposition 6.2.8. The Quantum Fourier Transform QFT𝑛 is a unitary operator on ℍ𝑛. Its inverse is the linear operator on ℍ𝑛 that is defined in terms of the images of the computational basis elements |𝑥⟩𝑛, 𝑥 ∈ ℤ_{2^𝑛}, of ℍ𝑛 as follows:

(6.2.9) QFT𝑛^{−1} |𝑥⟩𝑛 = |𝜓𝑛(−𝑥/2^𝑛)⟩ = (1/√(2^𝑛)) ∑_{𝑦=0}^{2^𝑛−1} 𝑒^{−2𝜋𝑖(𝑥/2^𝑛)𝑦} |𝑦⟩𝑛.


Proof. The representation matrices of QFT𝑛 and QFT𝑛^∗ with respect to the computational basis of ℍ𝑛 are

(6.2.10) QFT𝑛 = (𝑒^{2𝜋𝑖(𝑥/2^𝑛)𝑦}/√(2^𝑛))_{𝑥,𝑦∈ℤ_{2^𝑛}}

and

(6.2.11) QFT𝑛^∗ = (𝑒^{−2𝜋𝑖(𝑦/2^𝑛)𝑧}/√(2^𝑛))_{𝑦,𝑧∈ℤ_{2^𝑛}}.

Let 𝑥, 𝑧 ∈ ℤ_{2^𝑛}. Then the entry in row 𝑥 and column 𝑧 of the matrix products QFT𝑛 ⋅ QFT𝑛^∗ and QFT𝑛^∗ ⋅ QFT𝑛 is

(6.2.12) (1/2^𝑛) ∑_{𝑦=0}^{2^𝑛−1} 𝑒^{2𝜋𝑖(𝑥/2^𝑛)𝑦} 𝑒^{−2𝜋𝑖(𝑧/2^𝑛)𝑦} = 1 if 𝑥 = 𝑧, and = (1/2^𝑛)(1 − 𝑒^{2𝜋𝑖(𝑥−𝑧)})/(1 − 𝑒^{2𝜋𝑖(𝑥−𝑧)/2^𝑛}) = 0 if 𝑥 ≠ 𝑧.

Hence, we have QFT𝑛 ⋅ QFT𝑛^∗ = QFT𝑛^∗ ⋅ QFT𝑛 = 𝐼𝑛. So QFT𝑛 is unitary and (6.2.9) follows from (6.2.11). □
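The matrix form (6.2.10) and the unitarity argument can be checked directly with a small NumPy sketch (our own naming):

```python
import numpy as np

def qft_matrix(n):
    """Representation matrix (6.2.10): entry (x, y) is e^{2 pi i x y / 2^n} / sqrt(2^n)."""
    N = 2 ** n
    x, y = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.exp(2j * np.pi * x * y / N) / np.sqrt(N)

F = qft_matrix(3)
# QFT_n * QFT_n^dagger = I, so QFT_n is unitary and QFT_n^{-1} = QFT_n^dagger
assert np.allclose(F @ F.conj().T, np.eye(8))
# column x of F is |psi_n(x / 2^n)>, cf. (6.2.7)
x = 5
assert np.allclose(F[:, x], np.exp(2j * np.pi * np.arange(8) * x / 8) / np.sqrt(8))
```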

We note that for all 𝑥 ∈ ℤ_{2^𝑛} we obtain from Proposition 6.2.3

(6.2.13) QFT𝑛 |𝑥⟩𝑛 = ⨂_{𝑗=0}^{𝑛−1} (|0⟩ + 𝑒^{2𝜋𝑖⋅𝑥/2^{𝑗+1}} |1⟩)/√2.

In particular, we have

(6.2.14) QFT𝑛 |0⟩𝑛 = ⨂_{𝑗=0}^{𝑛−1} (|0⟩ + |1⟩)/√2 = 𝐻^{⊗𝑛} |0⟩^{⊗𝑛}

which has already been shown in Example 6.2.7.


Exercise 6.2.9. Use equation (6.2.14) to show that

(6.2.15) 𝐻^{⊗𝑛} |0⟩^{⊗𝑛} = (1/√(2^𝑛)) ∑_{𝑦=0}^{2^𝑛−1} |𝑦⟩𝑛

and construct a quantum circuit that creates the equally weighted superposition of all
computational basis states of ℍ𝑛 .

We use Proposition 6.2.3 and (6.2.14) to give alternative formulas for QFT𝑛 and QFT𝑛^{−1}. For 𝑚 ∈ ℕ and (𝑏0, . . . , 𝑏𝑚−1) ∈ {0, 1}^𝑚 we write

(6.2.16) 0.𝑏0𝑏1 ⋯ 𝑏𝑚−1 = ∑_{𝑖=0}^{𝑚−1} 𝑏𝑖 2^{−𝑖−1}.

Example 6.2.10. We have

0.1001 = 1 ⋅ 2^{−1} + 0 ⋅ 2^{−2} + 0 ⋅ 2^{−3} + 1 ⋅ 2^{−4} = 2^{−1} + 2^{−4}.

With this notation, we obtain the following result.


Proposition 6.2.11. Let 𝑥 ∈ ℤ_{2^𝑛} and let 𝑥⃗ = (𝑥0𝑥1𝑥2 ⋯ 𝑥𝑛−1) ∈ {0, 1}^𝑛 such that 𝑥 = ∑_{𝑖=0}^{𝑛−1} 𝑥𝑖 2^{𝑛−𝑖−1}. Then we have

(6.2.17) QFT𝑛 |𝑥⟩𝑛 = ⨂_{𝑗=0}^{𝑛−1} (|0⟩ + 𝑒^{2𝜋𝑖⋅0.𝑥_{𝑛−𝑗−1}𝑥_{𝑛−𝑗−2}⋯𝑥_{𝑛−1}} |1⟩)/√2

and

(6.2.18) QFT𝑛^{−1} |𝑥⟩𝑛 = ⨂_{𝑗=0}^{𝑛−1} (|0⟩ + 𝑒^{−2𝜋𝑖⋅0.𝑥_{𝑛−𝑗−1}𝑥_{𝑛−𝑗−2}⋯𝑥_{𝑛−1}} |1⟩)/√2.

[Figure 6.2.1. A quantum circuit that computes QFT𝑛 up to a permutation that reverses the order of the output qubits: each input qubit |𝑥𝑗⟩ passes through a Hadamard gate followed by controlled rotations 𝑅2, . . . , 𝑅_{𝑛−𝑗} and ends in the state (1/√2)(|0⟩ + 𝑒^{2𝜋𝑖⋅0.𝑥𝑗⋯𝑥_{𝑛−1}} |1⟩).]

Proof. The assertion follows from equation (6.2.13). □

Next, we present a quantum circuit that computes the Quantum Fourier Trans-
form. Figure 6.2.1 shows this quantum circuit up to a permutation that reverses the
order of the output qubits and Algorithm 6.2.12 is its complete specification.

Algorithm 6.2.12. Quantum Fourier Transform


Input: |𝜓⟩ ∈ ℍ𝑛
Output: QFT𝑛 |𝜓⟩
1: Initialize the input register to |𝜓⟩
2: for 𝑗 = 0 to 𝑛 − 1 do
3: Apply 𝐻 to the 𝑗th qubit
4: for 𝑘 = 1 to 𝑛 − 𝑗 − 1 do
5: Apply 𝑅𝑘+1 to the 𝑗th qubit controlled by the (𝑗 + 𝑘)th qubit
6: end for
7: end for
8: Apply a permutation that reverses the order of the qubits in |𝜓⟩
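Algorithm 6.2.12 can be simulated with explicit matrices to confirm that it reproduces the operator (6.2.7). The NumPy sketch below (our own naming; qubit 0 is taken as the most significant bit) applies the Hadamard and controlled-𝑅𝑘 gates in the order of the algorithm, then the bit reversal, and compares the result with the QFT matrix:

```python
import numpy as np

def single(n, j, g):
    """Embed the 2x2 gate g on qubit j (qubit 0 = most significant) of n qubits."""
    ops = [np.eye(2)] * n
    ops[j] = g
    U = np.array([[1.0 + 0j]])
    for op in ops:
        U = np.kron(U, op)
    return U

def controlled_phase(n, control, target, phi):
    """Diagonal gate: multiplies |x> by e^{i phi} when both indicated bits are 1."""
    N = 2 ** n
    d = np.ones(N, dtype=complex)
    for x in range(N):
        if (x >> (n - 1 - control)) & 1 and (x >> (n - 1 - target)) & 1:
            d[x] = np.exp(1j * phi)
    return np.diag(d)

def qft_circuit(n):
    """Unitary of Algorithm 6.2.12: H and controlled-R_{k+1} gates, then bit reversal."""
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    N = 2 ** n
    U = np.eye(N, dtype=complex)
    for j in range(n):
        U = single(n, j, H) @ U
        for k in range(1, n - j):
            # R_{k+1} = diag(1, e^{2 pi i / 2^{k+1}}), controlled by qubit j + k
            U = controlled_phase(n, j + k, j, 2 * np.pi / 2 ** (k + 1)) @ U
    # step 8: reverse the order of the qubits
    P = np.zeros((N, N))
    for x in range(N):
        P[int(format(x, f"0{n}b")[::-1], 2), x] = 1
    return P @ U

n = 3
N = 2 ** n
x, y = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
assert np.allclose(qft_circuit(n), np.exp(2j * np.pi * x * y / N) / np.sqrt(N))
```

The full-matrix simulation is exponential in 𝑛, of course; it only serves to verify the gate decomposition on small instances.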

The next theorem states the correctness of the QFT implementation.


Theorem 6.2.13. The quantum circuit specified in Algorithm 6.2.12 computes QFT𝑛
and has size O(𝑛2 ).

Proof. Let 𝑗 ∈ {0, . . . , 𝑛 − 1}. We explain the evolution of the input qubit |𝑥𝑗 ⟩ in
the quantum circuit shown in Figure 6.2.1. First, the quantum circuit applies the
Hadamard operator to |𝑥𝑗 ⟩. This has the following effect on this qubit which is shown
in Figure 6.2.2:
(6.2.19) 𝐻 |𝑥𝑗⟩ = (1/√2)(|0⟩ + 𝑒^{2𝜋𝑖⋅0.𝑥𝑗} |1⟩).
This operation does not change the other qubits.

[Figure 6.2.2. Application of the Hadamard gate to |𝑥𝑗⟩: it maps |𝑥𝑗⟩ to (1/√2)(|0⟩ + 𝑒^{2𝜋𝑖⋅0.𝑥𝑗} |1⟩).]

[Figure 6.2.3. The controlled rotation maps (1/√2)(|0⟩ + 𝑒^{2𝜋𝑖⋅0.𝑥𝑗⋯𝑥_{𝑘−1}} |1⟩), controlled by |𝑥𝑘⟩, to (1/√2)(|0⟩ + 𝑒^{2𝜋𝑖⋅0.𝑥𝑗⋯𝑥𝑘} |1⟩): the phase contributed by 𝑥𝑘 is kicked back to the amplitude of |1⟩.]

Next, for 𝑘 = 𝑗 + 1, . . . , 𝑛 − 1 the quantum circuit applies the controlled-𝑅_{𝑘−𝑗+1} operator to this qubit, controlled by the 𝑘th qubit. As shown in Figure 6.2.3, this kicks the phase 2𝜋 ⋅ 0.0⋯0𝑥𝑘 (with 𝑘 − 𝑗 zeros after the binary point) back to the amplitude of |1⟩ in the 𝑗th qubit while the amplitude of |0⟩ and the other qubits remain unchanged. So the final state of this qubit is

(6.2.20) (1/√2)(|0⟩ + 𝑒^{2𝜋𝑖⋅0.𝑥𝑗⋯𝑥_{𝑛−1}} |1⟩).

We estimate the size of the quantum circuit. It can be seen in Figure 6.2.1 that O(𝑛²) Hadamard gates and controlled-𝑅𝑘 gates are applied to the 𝑛 input qubits before the order of the output qubits is reversed. By Corollary 4.12.8, the implementation of the controlled-𝑅𝑘 gates requires O(1) elementary gates. So, the size of this part of the quantum circuit is O(𝑛²). By Proposition 4.5.2, the reversal of the output qubits requires another O(𝑛) elementary quantum gates. This shows that the total size of the circuit is O(𝑛²). □

Exercise 6.2.14. Find a quantum P-uniform circuit family (𝑄𝑛) such that for all 𝑛 ∈ ℕ and all (𝑏0, . . . , 𝑏𝑛−1) ∈ {0, 1}^𝑛

(6.2.21) 𝑄𝑛 |𝑏0 ⋯ 𝑏𝑛−1⟩ = |𝑏𝑛−1 ⋯ 𝑏0⟩

and 𝑄𝑛 requires O(𝑛) 𝖢𝖭𝖮𝖳 gates.

Exercise 6.2.15. Verify that the quantum circuits in Figures 6.2.2 and 6.2.3 have the
asserted outputs.

Theorem 6.2.13 implies the following corollary.


Corollary 6.2.16. The quantum circuit in Figure 6.2.4 computes QFT𝑛^{−1} up to a permutation that reverses the order of the output qubits. It has size O(𝑛²).

Proof. The quantum circuit in Figure 6.2.4 is the inverse of the quantum circuit in
Figure 6.2.1. □

[Figure 6.2.4. Quantum circuit that computes QFT𝑛^{−1} up to a permutation that reverses the order of the output qubits: each input qubit |𝑥𝑗⟩ passes through the adjoint rotations 𝑅∗_{𝑛−𝑗}, . . . , 𝑅∗2 followed by a Hadamard gate and ends in the state (1/√2)(|0⟩ + 𝑒^{−2𝜋𝑖⋅0.𝑥𝑗⋯𝑥_{𝑛−1}} |1⟩).]

6.3. Quantum phase estimation


In this section, we consider the following problem. Let 𝑚 ∈ ℕ, let 𝑈 be a unitary operator on ℍ𝑚, and let |𝜓⟩ be an eigenstate of 𝑈. Since 𝑈 is unitary, its eigenvalues have absolute value 1 by Proposition 2.4.60. We may therefore write the eigenvalue associated to |𝜓⟩ as 𝑒^{2𝜋𝑖𝜔} with 𝜔 ∈ ℝ, where 𝜔 is uniquely determined modulo 1. We will present a quantum algorithm that, given a precision parameter 𝑛 ∈ ℕ, finds with high probability 𝑥 ∈ ℤ_{2^𝑛} such that 𝑒^{2𝜋𝑖𝑥/2^𝑛} is close to the eigenvalue 𝑒^{2𝜋𝑖𝜔}. The integer 𝑥 will be obtained by measuring the final state of the quantum register produced by the quantum circuit in Figure 6.3.1.

6.3.1. The idea. We start by explaining the idea of the phase estimation algo-
rithm which is shown in Figure 6.3.1. Using the Hadamard operator and the controlled-
𝑥
𝑈 operator, the algorithm constructs the state |𝜓3 ⟩ = |𝜓𝑛 (𝜔)⟩. If 𝜔 = 2𝑛 for some
−1
𝑥 ∈ ℤ2𝑛 , then it follows from (6.2.7) that |𝜓4 ⟩ = QFT𝑛 |𝜓𝑛 (𝜔)⟩ = |𝑥⟩. So measuring
this state gives 𝑥 with probability 1. We will show that if 𝜔 is not of this form, then mea-
𝑥
suring |𝜓4 ⟩ gives with high probability 𝑥 ∈ ℤ2𝑛 such that 2𝑛 is a good approximation
to 𝜔.

6.3.2. Approximating |𝜓𝑛(𝜔)⟩. As discussed in Section 6.3.1, the phase estimation algorithm constructs and measures QFT𝑛^{−1} |𝜓𝑛(𝜔)⟩. The measurement outcome is 𝑥 ∈ ℤ_{2^𝑛} such that 𝑥/2^𝑛 approximates 𝜔. It is important to note that 𝑥/2^𝑛 falls within the interval [0, 1[, whereas 𝜔 may be any real number. However, adding integers to 𝜔 does not alter |𝜓𝑛(𝜔)⟩. This justifies the subsequent definition, allowing us to quantify the accuracy of this approximation.

Definition 6.3.1. Let 𝜔 ∈ ℝ, 𝑛 ∈ ℕ, and 𝑥 ∈ ℤ_{2^𝑛}. Then we set

(6.3.1) Δ(𝜔, 𝑛, 𝑥) = 𝜔 − 𝑥/2^𝑛 − ⌊𝜔 − 𝑥/2^𝑛⌉.

The quantity Δ(𝜔, 𝑛, 𝑥) from Definition 6.3.1 has the following properties.

Lemma 6.3.2. Let 𝜔 ∈ ℝ, 𝑛 ∈ ℕ, and 𝑥 ∈ ℤ_{2^𝑛}. Then the following hold.

(1) 𝑒^{2𝜋𝑖(𝜔−𝑥/2^𝑛)} = 𝑒^{2𝜋𝑖Δ(𝜔,𝑛,𝑥)}.
(2) −1/2 < Δ(𝜔, 𝑛, 𝑥) ≤ 1/2.

Proof. The first assertion follows from the fact that the function 𝑓(𝑦) = 𝑒^{2𝜋𝑖𝑦} has period 1. The second claim follows from the definition of Δ(𝜔, 𝑛, 𝑥). □

In several situations we are interested in estimating 𝜔 − 𝑥/2^𝑛 instead of Δ(𝜔, 𝑛, 𝑥). The next lemma shows when the first expression can be replaced by the second.

Lemma 6.3.3. Let 𝜔 ∈ ℝ, 𝑛 ∈ ℕ, 𝑥 ∈ ℤ_{2^𝑛}. Assume that

(6.3.2) |Δ(𝜔, 𝑛, 𝑥)| < 1/2^𝑛 and 0 ≤ 𝜔 < 1 − 1/2^𝑛.

Then we have

(6.3.3) Δ(𝜔, 𝑛, 𝑥) = 𝜔 − 𝑥/2^𝑛.

Proof. Let

(6.3.4) 𝑧 = ⌊𝜔 − 𝑥/2^𝑛⌉.

Then we have

(6.3.5) Δ(𝜔, 𝑛, 𝑥) = 𝜔 − 𝑥/2^𝑛 − 𝑧

which implies

(6.3.6) 𝑧 = 𝜔 − 𝑥/2^𝑛 − Δ(𝜔, 𝑛, 𝑥).

So the first inequality in (6.3.2) implies

(6.3.7) 𝜔 − 𝑥/2^𝑛 − 1/2^𝑛 < 𝑧 < 𝜔 − 𝑥/2^𝑛 + 1/2^𝑛.

Using 𝑥 ∈ ℤ_{2^𝑛} and the inequalities for 𝜔 in (6.3.2) we obtain

(6.3.8) −1 ≤ −1/2^𝑛 − 𝑥/2^𝑛 < 𝑧 < 1 − 𝑥/2^𝑛 ≤ 1

which implies 𝑧 = 0. □
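Definition 6.3.1 and Lemmas 6.3.2 and 6.3.3 can be checked numerically; a sketch with our own naming, where ⌊·⌉ is rounding to the nearest integer:

```python
import numpy as np

def delta(omega, n, x):
    """Delta(omega, n, x) of Definition 6.3.1, reduced into the interval (-1/2, 1/2]."""
    d = omega - x / 2 ** n
    return d - np.ceil(d - 0.5)   # subtract the nearest integer (ties rounded down)

omega, n, x = 0.3, 4, 5
d = delta(omega, n, x)
# Lemma 6.3.2: e^{2 pi i (omega - x/2^n)} = e^{2 pi i Delta} and -1/2 < Delta <= 1/2
assert np.isclose(np.exp(2j * np.pi * (omega - x / 2 ** n)), np.exp(2j * np.pi * d))
assert -0.5 < d <= 0.5
# Lemma 6.3.3: |Delta| < 1/2^n and 0 <= omega < 1 - 1/2^n force Delta = omega - x/2^n
assert abs(d) < 1 / 2 ** n and np.isclose(d, omega - x / 2 ** n)
```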

Finally, we prove the following statement which is crucial in the analysis of the
phase estimation algorithm.

Proposition 6.3.4. Let 𝑛 ∈ ℕ and 𝜔 ∈ ℝ. For 𝑥 ∈ ℤ_{2^𝑛} denote by 𝑝(𝑥) the probability that 𝑥 is the outcome of measuring QFT𝑛^{−1} |𝜓𝑛(𝜔)⟩ in the computational basis of ℍ𝑛. Then the following hold.

(1) If 2^𝑛𝜔 ∈ ℤ, then for 𝑥 = 2^𝑛𝜔 mod 2^𝑛 we have 𝑝(𝑥) = 1 and 𝑝(𝑥) = 0 for all other integers 𝑥 ∈ ℤ_{2^𝑛}.
(2) If 2^𝑛𝜔 ∉ ℤ, then for all 𝑥 ∈ ℤ_{2^𝑛} we have

(6.3.9) 𝑝(𝑥) = (1/2^{2𝑛}) sin²(2^𝑛𝜋Δ(𝜔, 𝑛, 𝑥))/sin²(𝜋Δ(𝜔, 𝑛, 𝑥)).

Proof. Set 𝑁 = 2^𝑛 and Δ = Δ(𝜔, 𝑛, 𝑥). From Definition 6.2.1, Proposition 6.2.8, and Lemma 6.3.2 we obtain

(6.3.10) QFT𝑛^{−1} |𝜓𝑛(𝜔)⟩ = (1/√𝑁) ∑_{𝑦=0}^{𝑁−1} 𝑒^{2𝜋𝑖𝜔𝑦} QFT𝑛^{−1} |𝑦⟩𝑛
= (1/√𝑁) ∑_{𝑦=0}^{𝑁−1} 𝑒^{2𝜋𝑖𝜔𝑦} (1/√𝑁) ∑_{𝑥=0}^{𝑁−1} 𝑒^{−2𝜋𝑖(𝑦/𝑁)𝑥} |𝑥⟩𝑛
= ∑_{𝑥=0}^{𝑁−1} ((1/𝑁) ∑_{𝑦=0}^{𝑁−1} 𝑒^{2𝜋𝑖Δ𝑦}) |𝑥⟩𝑛.

So for all 𝑥 ∈ ℤ𝑁 we have

(6.3.11) 𝑝(𝑥) = (1/𝑁²) |∑_{𝑦=0}^{𝑁−1} 𝑒^{2𝜋𝑖Δ𝑦}|².

Let 𝑁𝜔 ∈ ℤ and 𝑥 = 𝑁𝜔 mod 𝑁. Then Δ = 0 and 𝑝(𝑥) = 1.

Assume that 𝑁𝜔 ∉ ℤ. Then evaluating the geometric series ∑_{𝑦=0}^{𝑁−1} 𝑒^{2𝜋𝑖Δ𝑦} in (6.3.11) gives

(6.3.12) QFT𝑛^{−1} |𝜓𝑛(𝜔)⟩ = (1/𝑁) ∑_{𝑥=0}^{𝑁−1} (1 − 𝑒^{2𝜋𝑖𝑁Δ})/(1 − 𝑒^{2𝜋𝑖Δ}) |𝑥⟩𝑛.

Now for all 𝜃 ∈ ℝ we have

(6.3.13) |1 − 𝑒^{2𝜋𝑖𝜃}| = |𝑒^{−𝜋𝑖𝜃} − 𝑒^{𝜋𝑖𝜃}| = 2| sin(𝜋𝜃)|.

It follows from equations (6.3.12) and (6.3.13) that

(6.3.14) 𝑝(𝑥) = |(1/𝑁)(1 − 𝑒^{2𝜋𝑖𝑁Δ})/(1 − 𝑒^{2𝜋𝑖Δ})|² = (1/𝑁²) sin²(𝑁𝜋Δ)/sin²(𝜋Δ). □

6.3.3. The problem and the algorithm. We now state the phase estimation
problem which is also called the eigenvalue estimation problem.
Problem 6.3.5 (Phase estimation problem).
Input: Positive integers 𝑚 and 𝑛, an implementation of the controlled-𝑈 operator for
some unitary operator 𝑈 on ℍ𝑚 , and an eigenstate |𝜓⟩ of 𝑈.
Output: An integer 𝑥 ∈ ℤ_{2^𝑛} such that |Δ(𝜔, 𝑛, 𝑥)| < 1/2^𝑛 where 𝜔 ∈ ℝ and 𝑒^{2𝜋𝑖𝜔} is the eigenvalue associated with |𝜓⟩.

[Figure 6.3.1. Quantum circuit for phase estimation: the control register, initialized to |0⟩^{⊗𝑛}, passes through 𝐻^{⊗𝑛} and controls the operators 𝑈^{2^{𝑛−1}}, 𝑈^{2^{𝑛−2}}, . . . , 𝑈^2, 𝑈 on the target register initialized to |𝜓⟩; the target register is then traced out, and the control register is transformed by QFT𝑛^{−1} and measured, yielding 𝑥. The intermediate states are denoted |𝜓0⟩, . . . , |𝜓4⟩.]

A quantum circuit that solves this problem is shown in Figure 6.3.1 and specified
in Algorithm 6.3.6.

Algorithm 6.3.6. Phase estimation algorithm


Input: Positive integers 𝑚, 𝑛, an implementation of the controlled-𝑈 operator for a
unitary operator 𝑈 on ℍ𝑚 , 𝑚 ∈ ℕ, an eigenstate |𝜓⟩ of 𝑈.
Output: 𝑥 ∈ ℤ2𝑛
1: PhaseEstimate(𝑚, 𝑛, 𝑈, |𝜓⟩)
2: Initialize the control register to |0⟩^{⊗𝑛}
3: Initialize the target register to |𝜓⟩
4: Apply 𝐻 ⊗𝑛 to the control register
5: for 𝑗 = 0 to 𝑛 − 1 do
6: Apply 𝑈^{2^{𝑛−𝑗−1}} to the target register controlled by the 𝑗th qubit of the control register
7: end for
8: Trace out the target register
9: Apply QFT𝑛^{−1} to the control register
10: Measure the control register in the computational basis, the result being 𝑥
11: return 𝑥
12: end
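Because the target register stays in the eigenstate |𝜓⟩ throughout, the distribution of the measured 𝑥 is determined by the control register alone: it is the distribution obtained by measuring QFT𝑛^{−1}|𝜓𝑛(𝜔)⟩. The NumPy sketch below (our own naming) computes this distribution directly instead of simulating the full circuit:

```python
import numpy as np

def phase_estimate(omega, n):
    """Distribution over x in Z_{2^n} of the measurement outcome of Algorithm 6.3.6."""
    N = 2 ** n
    psi = np.exp(2j * np.pi * np.arange(N) * omega) / np.sqrt(N)   # |psi_n(omega)>
    xg, yg = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    qft_inv = np.exp(-2j * np.pi * xg * yg / N) / np.sqrt(N)       # QFT_n^{-1}
    return np.abs(qft_inv @ psi) ** 2

# If 2^n omega is an integer, x = 2^n omega is measured with probability 1
p = phase_estimate(3 / 8, 3)
assert np.isclose(p[3], 1.0)
# Otherwise the nearest x = round(2^n omega) appears with probability >= 4/pi^2
p = phase_estimate(0.3, 5)
assert p[round(0.3 * 32)] >= 4 / np.pi ** 2
```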

The next theorem states the correctness of the phase estimation algorithm and its success probability.

Theorem 6.3.7. Let 𝑚, 𝑛 be positive integers, and let 𝑈 be a unitary operator on ℍ𝑚. Let 𝜔 ∈ ℝ be such that 𝑒^{2𝜋𝑖𝜔} is an eigenvalue of 𝑈 and let |𝜓⟩ be a corresponding eigenstate. Also, let 𝑥 ∈ ℤ_{2^𝑛} be the return value of the phase estimation algorithm. Then the following hold.

(1) If 2^𝑛𝜔 ∈ ℤ, then 𝑥 = 2^𝑛𝜔 with probability 1.
(2) With probability at least 4/𝜋² we have |Δ(𝜔, 𝑛, 𝑥)| ≤ 1/2^{𝑛+1}.
(3) With probability at least 8/𝜋² we have |Δ(𝜔, 𝑛, 𝑥)| < 1/2^𝑛.

Proof. The quantum circuit operates on two quantum registers. The first is the control register. Its length is the precision parameter 𝑛 and it is initialized to |0⟩^{⊗𝑛}. The second register is the target register. It is of length 𝑚 and is initialized with the eigenvector |𝜓⟩ of 𝑈. So, the initial state of the algorithm is

(6.3.15) |𝜓0⟩ = |0⟩^{⊗𝑛} |𝜓⟩.

The algorithm then applies 𝐻^{⊗𝑛} to the control register. This gives the state

(6.3.16) |𝜓1⟩ = ((|0⟩ + |1⟩)/√2)^{⊗𝑛} |𝜓⟩.

Next, we note that for 𝑗 = 0, . . . , 𝑛 − 1 we have

(6.3.17) 𝑈^{2^{𝑛−𝑗−1}} |𝜓⟩ = 𝑒^{2𝜋𝑖2^{𝑛−𝑗−1}𝜔} |𝜓⟩.

This is a global phase shift. These operators are applied to the target register controlled by the 𝑗th qubit of the control register which produces the new state

(6.3.18) |𝜓2⟩ = ⨂_{𝑗=0}^{𝑛−1} ((|0⟩ + 𝑒^{2𝜋𝑖2^{𝑛−𝑗−1}𝜔} |1⟩)/√2) |𝜓⟩ = |𝜓𝑛(𝜔)⟩ |𝜓⟩.

This shows that the global phase shifts are kicked back to the amplitudes of |1⟩ in the
control qubits. Since the state |𝜓2 ⟩ is separable with respect to the decomposition into
the control and the target register, it follows from Corollary 3.7.12 that tracing out the
target register yields the state

(6.3.19) |𝜓3 ⟩ = |𝜓𝑛 (𝜔)⟩ .

Therefore, the final state is

(6.3.20) |𝜓4⟩ = QFT𝑛^{−1} |𝜓𝑛(𝜔)⟩.

This state is measured. So, the first assertion follows from Proposition 6.3.4.

[Figure 6.3.2. Simplified representation of the quantum circuit for eigenvalue estimation: the control register |0⟩𝑛 is transformed by QFT𝑛, controls the operator 𝑈^𝑐 on the target register |𝜓⟩ (which is then traced out), and is finally transformed by QFT𝑛^{−1} and measured, yielding |𝑥⟩𝑛.]

To prove the second assertion, we set

(6.3.21) 𝑁 = 2^𝑛, 𝑥 = ⌊𝑁𝜔⌉ mod 𝑁, 𝜃 = 𝑁Δ(𝜔, 𝑛, 𝑥).

Then we have

(6.3.22) |𝜃| ≤ 1/2.

From Proposition 6.3.4, Lemma A.5.3, Lemma A.5.5, and inequality (6.3.22) we obtain

(6.3.23) 𝑝(𝑥) = (1/𝑁²) sin²(𝜋𝜃)/sin²(𝜋𝜃/𝑁) ≥ (1/𝑁²) (2𝜃)²/(𝜋𝜃/𝑁)² = 4/𝜋².

To prove the third assertion, we choose 𝑥′ ∈ {𝑥 ± 1} such that

(6.3.24) 𝑁|Δ(𝜔, 𝑛, 𝑥′)| = 1 − 𝑁|Δ(𝜔, 𝑛, 𝑥)| = 1 − 𝜃.

By Proposition 6.3.4 and Lemma A.5.5 the probability 𝑝 for measuring 𝑥 or 𝑥′ satisfies

𝑝 = (1/𝑁²) sin²(𝜋𝜃)/sin²(𝜋𝜃/𝑁) + (1/𝑁²) sin²(𝜋(1 − 𝜃))/sin²(𝜋(1 − 𝜃)/𝑁)
(6.3.25) = (sin²(𝜋𝜃)/𝑁²) (1/sin²(𝜋𝜃/𝑁) + 1/sin²(𝜋(1 − 𝜃)/𝑁))
≥ (sin²(𝜋𝜃)/𝜋²) (1/𝜃² + 1/(1 − 𝜃)²).

Since for 0 < 𝜃 < 1 the function

(6.3.26) 𝑓(𝜃) = sin²(𝜋𝜃) (1/𝜃² + 1/(1 − 𝜃)²)

attains its minimum 8 at 𝜃 = 1/2, the assertion follows. □

Figure 6.3.2 is a simplified representation of the circuit in Figure 6.3.1. In this representation, we use a controlled-𝑈^𝑐 operator which sends |𝜓⟩ to 𝑈^𝑐 |𝜓⟩ if the control register contains the state |𝑐⟩𝑛 for some 𝑐 ∈ ℤ_{2^𝑛}.

6.4. Order finding


We will now present an important application of quantum phase estimation: a quan-
tum algorithm that computes the order of an integer modulo another positive integer
in polynomial time. This algorithm is the central building block in the integer factor-
ization and discrete logarithm algorithms of Peter Shor, which we discuss in Sections
6.5 and 6.6.

6.4.1. The problem. The order finding problem is the following.

Problem 6.4.1 (Order finding problem).

Input: 𝑁 ∈ ℕ and 𝑎 ∈ ℤ𝑁 such that gcd(𝑎, 𝑁) = 1.

Output: The order 𝑟 of 𝑎 modulo 𝑁.

In the sequel, we explain a quantum polynomial time algorithm that solves the order finding problem. In the following, we let 𝑁, 𝑎, 𝑟 be as in the order finding problem. We also let 𝑛 ∈ ℕ.

6.4.2. The operator 𝑈𝑐 . We introduce and discuss a unitary operator that is used
in the order finding algorithm.

Definition 6.4.2. For any 𝑐 ∈ ℤ with gcd(𝑐, 𝑁) = 1 we define the linear operator

(6.4.1) 𝑈𝑐 ∶ ℍ𝑛 → ℍ𝑛, |𝑥⟩𝑛 ↦ |𝑐𝑥 mod 𝑁⟩𝑛 if 0 ≤ 𝑥 < 𝑁, and |𝑥⟩𝑛 ↦ |𝑥⟩𝑛 if 𝑁 ≤ 𝑥 < 2^𝑛.

Proposition 6.4.3. For all 𝑐 ∈ ℤ with gcd(𝑐, 𝑁) = 1, the operator 𝑈𝑐 is unitary.

Proof. By definition, the map 𝑈𝑐 is linear. So it suffices to show that the map

(6.4.2) 𝑓𝑐 ∶ ℤ_{2^𝑛} → ℤ_{2^𝑛}, 𝑥 ↦ 𝑐𝑥 mod 𝑁 if 0 ≤ 𝑥 < 𝑁, and 𝑥 ↦ 𝑥 if 𝑁 ≤ 𝑥 < 2^𝑛

is a bijection. For this, it suffices to show that 𝑓𝑐 is surjective since the domain and the codomain of 𝑓𝑐 are the same. Let 𝑦 ∈ ℤ_{2^𝑛}. If 𝑦 ≥ 𝑁, then we have 𝑓𝑐(𝑦) = 𝑦. If 𝑦 < 𝑁, then we have 𝑦 = 𝑓𝑐(𝑥) where 𝑥 ∈ ℤ𝑁 such that 𝑐𝑥 ≡ 𝑦 mod 𝑁. This number 𝑥 exists because gcd(𝑐, 𝑁) = 1. □
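Definition 6.4.2 and Proposition 6.4.3 can be checked by realizing 𝑈𝑐 as a permutation matrix (NumPy sketch, our own naming):

```python
import numpy as np

def U_c(c, N, n):
    """Permutation matrix of the operator U_c from Definition 6.4.2 (gcd(c, N) = 1)."""
    dim = 2 ** n
    U = np.zeros((dim, dim))
    for x in range(dim):
        fx = (c * x) % N if x < N else x
        U[fx, x] = 1          # |x>_n is mapped to |f_c(x)>_n
    return U

U = U_c(7, 15, 4)
# U_c permutes the computational basis, hence it is unitary
assert np.allclose(U @ U.T, np.eye(16))
# On |x>_n with x < N it multiplies by c modulo N: 7 * 4 = 28 = 13 (mod 15)
assert U[:, 4].argmax() == 13
```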

Next, we determine the eigenstates of the operators 𝑈_{𝑎^𝑡} for all 𝑡 ∈ ℕ. In the context of the order finding algorithm, only the case 𝑡 = 1 is relevant. However, for the quantum discrete logarithm problem, we use 𝑡 > 1.

Proposition 6.4.4. (1) For every 𝑘 ∈ ℤ and every 𝑡 ∈ ℕ the state

(6.4.3) |𝑢𝑘⟩ = (1/√𝑟) ∑_{𝑠=0}^{𝑟−1} 𝑒^{−2𝜋𝑖(𝑘/𝑟)𝑠} |𝑎^𝑠 mod 𝑁⟩𝑛

is an eigenstate of 𝑈_{𝑎^𝑡} with eigenvalue 𝑒^{2𝜋𝑖𝑡𝑘/𝑟}.
(2) The sequence (|𝑢0⟩, . . . , |𝑢𝑟−1⟩) is an orthonormal basis of Span{|𝑎^𝑠 mod 𝑁⟩𝑛 ∶ 𝑠 ∈ ℤ𝑟}.

Proof. To prove the first assertion, let 𝑘 ∈ ℤ and 𝑡 ∈ ℕ. We note that the map

(6.4.4) ℤ𝑟 → ℤ 𝑟 , 𝑠 ↦ (𝑠 + 𝑡) mod 𝑟

is a bijection. So we have

(6.4.5) 𝑈_{𝑎^𝑡} |𝑢𝑘⟩ = (1/√𝑟) ∑_{𝑠=0}^{𝑟−1} 𝑒^{−2𝜋𝑖(𝑘/𝑟)𝑠} 𝑈_{𝑎^𝑡} |𝑎^𝑠 mod 𝑁⟩𝑛
= (1/√𝑟) ∑_{𝑠=0}^{𝑟−1} 𝑒^{−2𝜋𝑖(𝑘/𝑟)𝑠} |𝑎^{𝑠+𝑡} mod 𝑁⟩𝑛
= 𝑒^{2𝜋𝑖𝑘𝑡/𝑟} (1/√𝑟) ∑_{𝑠=0}^{𝑟−1} 𝑒^{−2𝜋𝑖(𝑘/𝑟)((𝑠+𝑡) mod 𝑟)} |𝑎^{(𝑠+𝑡) mod 𝑟} mod 𝑁⟩𝑛
= 𝑒^{2𝜋𝑖𝑘𝑡/𝑟} (1/√𝑟) ∑_{𝑠=0}^{𝑟−1} 𝑒^{−2𝜋𝑖(𝑘/𝑟)𝑠} |𝑎^𝑠 mod 𝑁⟩𝑛
= 𝑒^{2𝜋𝑖𝑘𝑡/𝑟} |𝑢𝑘⟩.
This concludes the proof of the first assertion.
Next, we turn to the second assertion. Since 𝑟 is the order of 𝑎 modulo 𝑁, the elements of the sequence (|𝑎^𝑠 mod 𝑁⟩𝑛)_{0≤𝑠<𝑟} are pairwise different. Thus this sequence is a basis of Span{|𝑎^𝑠 mod 𝑁⟩𝑛 ∶ 0 ≤ 𝑠 < 𝑟}. Also, for all 𝑘, 𝑘′ ∈ ℤ𝑟 we have

(6.4.6) ⟨𝑢𝑘|𝑢𝑘′⟩ = (1/𝑟) ∑_{𝑠=0}^{𝑟−1} 𝑒^{2𝜋𝑖((𝑘−𝑘′)/𝑟)𝑠} = 1 if 𝑘 = 𝑘′, and = (1/𝑟)(1 − 𝑒^{2𝜋𝑖(𝑘−𝑘′)})/(1 − 𝑒^{2𝜋𝑖(𝑘−𝑘′)/𝑟}) = 0 if 𝑘 ≠ 𝑘′.

Hence, the sequence (|𝑢0⟩, . . . , |𝑢𝑟−1⟩) is an orthonormal basis of Span{|𝑎^𝑠 mod 𝑁⟩𝑛 ∶ 𝑠 ∈ ℤ𝑟}, as claimed. □

If we were to apply the phase estimation algorithm with the target register initialized to an eigenstate |𝑢𝑘⟩ of 𝑈𝑎, then we would obtain a rational approximation 𝑥/2^𝑛 to 𝑘/𝑟 and thus some information about the order 𝑟 of 𝑎 modulo 𝑁. Unfortunately, this cannot be done because it is not known how to prepare the eigenstates of 𝑈𝑎. But the following proposition is of help.

Proposition 6.4.5. We have

(6.4.7) (1/√𝑟) ∑_{𝑘=0}^{𝑟−1} |𝑢𝑘⟩ = |1⟩𝑛.

Proof. Note that

(6.4.8) (1/√𝑟) ∑_{𝑘=0}^{𝑟−1} |𝑢𝑘⟩ = (1/√𝑟) ∑_{𝑘=0}^{𝑟−1} (1/√𝑟) ∑_{𝑠=0}^{𝑟−1} 𝑒^{−2𝜋𝑖(𝑘/𝑟)𝑠} |𝑎^𝑠 mod 𝑁⟩𝑛
= (1/𝑟) ∑_{𝑠=0}^{𝑟−1} (∑_{𝑘=0}^{𝑟−1} 𝑒^{−2𝜋𝑖(𝑘/𝑟)𝑠}) |𝑎^𝑠 mod 𝑁⟩𝑛.

We determine the amplitude of |1⟩𝑛 in the state in (6.4.8). Since 𝑎^𝑠 ≡ 1 mod 𝑁 if and only if 𝑠 ≡ 0 mod 𝑟, it follows that this amplitude is

(6.4.9) (1/𝑟) ∑_{𝑘=0}^{𝑟−1} 𝑒^{−2𝜋𝑖(𝑘/𝑟)⋅0} = 𝑟/𝑟 = 1.

Since (1/√𝑟) ∑_{𝑘=0}^{𝑟−1} |𝑢𝑘⟩ is a quantum state, the amplitudes of the other basis states in this representation must be 0. This implies the assertion. □
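Propositions 6.4.4 (for 𝑡 = 1) and 6.4.5 can be confirmed numerically for a small instance, say 𝑁 = 15, 𝑎 = 2, 𝑟 = 4 (NumPy sketch, our own naming):

```python
import numpy as np

N_mod, a, n = 15, 2, 4
r = 4                         # order of 2 modulo 15: 2^4 = 16 = 1 (mod 15)
dim = 2 ** n

def U_c(c):
    """Permutation matrix of U_c from Definition 6.4.2."""
    U = np.zeros((dim, dim))
    for x in range(dim):
        U[(c * x) % N_mod if x < N_mod else x, x] = 1
    return U

def u(k):
    """Eigenstate |u_k> of U_a from (6.4.3)."""
    v = np.zeros(dim, dtype=complex)
    for s in range(r):
        v[pow(a, s, N_mod)] += np.exp(-2j * np.pi * k * s / r) / np.sqrt(r)
    return v

Ua = U_c(a)
for k in range(r):
    # Proposition 6.4.4 with t = 1: U_a |u_k> = e^{2 pi i k / r} |u_k>
    assert np.allclose(Ua @ u(k), np.exp(2j * np.pi * k / r) * u(k))
# Proposition 6.4.5: (1/sqrt(r)) sum_k |u_k> = |1>_n
sup = sum(u(k) for k in range(r)) / np.sqrt(r)
one = np.zeros(dim); one[1] = 1
assert np.allclose(sup, one)
```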

6.4.3. The algorithm. We now present and analyze a quantum order finding al-
gorithm. The pseudocode is shown in Algorithm 6.4.6. It uses a variant of the phase
estimation circuit, shown in Figure 6.4.1. This circuit differs from the quantum phase
estimation circuit in that the target register is initialized to |1⟩𝑛 . This state can be pre-
pared, and by Proposition 6.4.5 it is the equally weighted superposition of the eigen-
states |𝑢𝑘⟩ of 𝑈𝑎 for 0 ≤ 𝑘 < 𝑟. By Proposition 6.4.4, the corresponding eigenvalues are 𝑒^{2𝜋𝑖𝑘/𝑟}. Their phases contain information about the order 𝑟 of 𝑎 modulo 𝑁. As we will see in the proof of Theorem 6.4.8, the modified quantum phase estimation circuit can be used to determine 𝑟.

Algorithm 6.4.6. Quantum order finding algorithm


Input: 𝑁 ∈ ℕ, 𝑎 ∈ ℤ𝑁 with gcd(𝑎, 𝑁) = 1, 𝑛 ∈ ℕ with 2𝑟² ≤ 2^𝑛 ≤ 4𝑁² where 𝑟 is the order of 𝑎 modulo 𝑁.
Output: 𝑟 or “FAILURE”
1: FindOrder(𝑁, 𝑎, 𝑛)
2: for 𝑗 = 1, 2 do
3: Apply the quantum circuit 𝑄𝑎 from Figure 6.4.1 and obtain 𝑥𝑗 ∈ ℤ2𝑛
4: Apply the continued fraction algorithm to 𝑥𝑗/2^𝑛
5: if |𝑥𝑗/2^𝑛 − 𝑝/𝑞| ≤ 1/2^𝑛 for a convergent 𝑝/𝑞 then
6: Set 𝑚𝑗 ← 𝑝 and 𝑟𝑗 ← 𝑞
7: else
8: return “FAILURE”
9: end if
10: end for
11: 𝑟 ← lcm(𝑟1 , 𝑟2 )
12: if 𝑎𝑟 ≡ 1 mod 𝑁 then
13: return 𝑟
14: else
15: return “FAILURE”
16: end if
17: end
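The classical post-processing in steps 4–6 can be sketched as follows (our own minimal continued fraction routine; the numbers in the example are hypothetical but consistent with 𝑘/𝑟 = 1/6 and 𝑛 = 9). In addition to the error bound, the sketch enforces the denominator bound 𝑞 ≤ 2^{(𝑛−1)/2} from Proposition 6.4.9, which singles out the convergent uniquely:

```python
from fractions import Fraction

def convergents(p, q):
    """Convergents of the continued fraction expansion of the rational p/q."""
    coeffs = []
    while q:
        coeffs.append(p // q)
        p, q = q, p % q
    h0, h1, k0, k1 = 0, 1, 1, 0     # standard convergent recurrence
    for a in coeffs:
        h0, h1 = h1, a * h1 + h0
        k0, k1 = k1, a * k1 + k0
        yield Fraction(h1, k1)

def recover_denominator(x, n):
    """Steps 4-6 of Algorithm 6.4.6: convergent p/q of x/2^n with |x/2^n - p/q| <= 1/2^n."""
    M = 2 ** n
    for c in convergents(x, M):
        if abs(Fraction(x, M) - c) <= Fraction(1, M) and c.denominator ** 2 <= 2 ** (n - 1):
            return c.denominator
    return None

# Suppose the circuit Q_a returned x = 85 for n = 9 (2^9 = 512), coming from
# k/r = 1/6 (85 = round(512/6)); the convergent 1/6 recovers the candidate r_j = 6.
assert recover_denominator(85, 9) == 6
```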

In Algorithm 6.4.6 the precision parameter 𝑛 ∈ ℕ is chosen such that

(6.4.10) 2^𝑛 ≥ 2𝑟².

[Figure 6.4.1. The modified phase estimation circuit 𝑄𝑎 used in the order finding algorithm: the control register |0⟩𝑛 passes through 𝐻^{⊗𝑛} and controls 𝑈𝑎^𝑐 on the target register |1⟩𝑛, which is then traced out; finally QFT𝑛^{−1} is applied to the control register, which is measured, yielding 𝑥 ∈ ℤ_{2^𝑛}. The intermediate states are denoted |𝜓0⟩, . . . , |𝜓4⟩.]

For example, we may set

(6.4.11) 𝑛 = ⌈2 log₂ 𝑁⌉ + 1.

Then we have

(6.4.12) 𝑛 ≤ 2 log₂ 𝑁 + 2.
But if there is more information about 𝑁 and 𝑟, we may be able to choose a smaller
value for 𝑛. This is important because 2𝑛 is the number of qubits in the order finding
algorithm. So, the smaller 𝑛 is, the more efficient the order finding algorithm. For
example, as shown in Exercise 6.4.7, for composite 𝑁 we may choose 𝑛 = ⌈log₂ 𝑁⌉ + 1.
Exercise 6.4.7. (1) Show that (6.4.10) and (6.4.12) are satisfied for 𝑛 = ⌈2 log₂ 𝑁⌉ + 1.
(2) Let 𝑁 be a composite number. Show that (6.4.10) and (6.4.12) are satisfied for 𝑛 = ⌈log₂ 𝑁⌉ + 1.

We will now prove the following theorem that states the correctness and the com-
plexity of the algorithm.
Theorem 6.4.8. On input of 𝑁 ∈ ℕ and 𝑎 ∈ ℤ𝑁 such that gcd(𝑎, 𝑁) = 1, Algorithm 6.4.6 computes the order 𝑟 of 𝑎 mod 𝑁 with probability at least 3958924/(101761𝜋⁴) > 0.399. The algorithm has running time O((log 𝑁)³).

For analyzing the success probability of the order finding algorithm, we need the following proposition in which we call a representation 𝑟 = 𝑝/𝑞 of a nonzero rational number 𝑟 reduced if 𝑞 > 0 and gcd(𝑝, 𝑞) = 1.

Proposition 6.4.9. Denote by 𝑥 the return value of the quantum circuit 𝑄𝑎 from Figure 6.4.1 and let 𝑘 ∈ ℤ𝑟. Then the following hold.

(1) With probability at least 8/(𝑟𝜋²) we have

(6.4.13) |𝑥/2^𝑛 − 𝑘/𝑟| < 1/2^𝑛.

(2) If (6.4.13) holds, then 𝑘/𝑟 is a convergent of the continued fraction expansion of 𝑥/2^𝑛. It is the only convergent of this expansion whose reduced representation 𝑝/𝑞 satisfies

(6.4.14) |𝑥/2^𝑛 − 𝑝/𝑞| < 1/2^𝑛 and 𝑞 ≤ 2^{(𝑛−1)/2}.

Proof. By Proposition 6.4.5, the initial state of 𝑄𝑎 is

(6.4.15) |𝜓0⟩ = |0⟩𝑛 |1⟩𝑛 = (1/√𝑟) ∑_{𝑘=0}^{𝑟−1} |0⟩𝑛 |𝑢𝑘⟩.

Applying 𝐻^{⊗𝑛} to the first register gives

(6.4.16) |𝜓1⟩ = (1/√𝑟) ∑_{𝑘=0}^{𝑟−1} ((|0⟩ + |1⟩)/√2)^{⊗𝑛} |𝑢𝑘⟩.

As seen in (6.3.18) we have

(6.4.17) |𝜓2⟩ = (1/√𝑟) ∑_{𝑘=0}^{𝑟−1} |𝜓𝑛(𝑘/𝑟)⟩ |𝑢𝑘⟩.
By Proposition 6.4.4, the sequence (|𝑢𝑘⟩) is orthonormal. So it follows from Exercise 3.7.11 that tracing out the target register gives the mixed state

(6.4.18) |𝜓3⟩ = ((1/𝑟, |𝜓𝑛(0/𝑟)⟩), . . . , (1/𝑟, |𝜓𝑛((𝑟 − 1)/𝑟)⟩)).

After applying QFT𝑛^{−1} the mixed state is

(6.4.19) |𝜓4⟩ = ((1/𝑟, QFT𝑛^{−1} |𝜓𝑛(0/𝑟)⟩), . . . , (1/𝑟, QFT𝑛^{−1} |𝜓𝑛((𝑟 − 1)/𝑟)⟩)).

It follows from Theorem 6.3.7 that measuring this mixed state in the computational basis of ℍ𝑛 gives with probability at least 8/(𝑟𝜋²) an integer 𝑥 ∈ ℤ_{2^𝑛} such that

(6.4.20) |Δ(𝑘/𝑟, 𝑛, 𝑥)| < 1/2^𝑛.
In Exercise 6.4.10 it is shown that |Δ(𝑘/𝑟, 𝑛, 𝑥)| = |𝑘/𝑟 − 𝑥/2^𝑛|. This concludes the proof of the first assertion.

Now assume that (6.4.13) holds. Since we have chosen 𝑛 such that 2^𝑛 ≥ 2𝑟² holds, we have

(6.4.21) 𝑟 ≤ 2^{(𝑛−1)/2}

and

(6.4.22) |𝑥/2^𝑛 − 𝑘/𝑟| < 1/(2𝑟²).

So by Proposition A.3.35, the fraction 𝑘/𝑟 is a convergent of the continued fraction expansion of 𝑥/2^𝑛 and its reduced representation 𝑝/𝑞 satisfies (6.4.14). To show the uniqueness of this convergent, let 𝑝′/𝑞′ be another convergent of this continued fraction expansion that satisfies (6.4.14). Then we have

(6.4.23) |𝑘𝑞′ − 𝑝′𝑟| = 𝑟𝑞′ |𝑘/𝑟 − 𝑝′/𝑞′| ≤ 2^{𝑛−1} (|𝑥/2^𝑛 − 𝑘/𝑟| + |𝑥/2^𝑛 − 𝑝′/𝑞′|) < 2 ⋅ 2^{𝑛−1}/2^𝑛 = 1.

So we have 𝑘𝑞′ = 𝑝′𝑟 which implies 𝑘/𝑟 = 𝑝′/𝑞′. □
Exercise 6.4.10. Use Lemma 6.3.3 to show that (6.4.20) implies |Δ(𝑘/𝑟, 𝑛, 𝑥)| = |𝑘/𝑟 − 𝑥/2^𝑛|.

The next lemma provides a sufficient condition for Algorithm 6.4.6 to find the or-
der 𝑟 of 𝑎 modulo 𝑁.
Lemma 6.4.11. For 𝑗 = 1, 2 let 𝑘𝑗, 𝑚𝑗 ∈ ℕ0 and 𝑟𝑗 ∈ ℕ with 𝑘𝑗/𝑟 = 𝑚𝑗/𝑟𝑗, and assume that gcd(𝑘1, 𝑘2, 𝑟) = gcd(𝑚𝑗, 𝑟𝑗) = 1. Then 𝑟 = lcm(𝑟1, 𝑟2).

Proof. Since 𝑘𝑗/𝑟 = 𝑚𝑗/𝑟𝑗 and gcd(𝑚𝑗, 𝑟𝑗) = 1 for 𝑗 = 1, 2, it follows that 𝑟1 and 𝑟2 are divisors of 𝑟. Therefore, lcm(𝑟1, 𝑟2) is a divisor of 𝑟. This means that we can write

(6.4.24) 𝑟 = 𝑢 ⋅ lcm(𝑟1, 𝑟2) = 𝑢𝑢1𝑟1 = 𝑢𝑢2𝑟2

with 𝑢, 𝑢1, 𝑢2 ∈ ℕ. So, we have

(6.4.25) 𝑚𝑗/𝑟𝑗 = 𝑘𝑗/𝑟 = 𝑘𝑗/(𝑢𝑢𝑗𝑟𝑗)

for 𝑗 = 1, 2. Hence 𝑢 divides 𝑘1, 𝑘2, and 𝑟. Since gcd(𝑘1, 𝑘2, 𝑟) = 1, we have 𝑢 = 1 and from (6.4.24) we obtain 𝑟 = lcm(𝑟1, 𝑟2). □

The last statement in this section allows us to estimate the probability of the suffi-
cient condition in Lemma 6.4.11 to occur.

Proposition 6.4.12. Let Pr be a probability distribution on ℤ𝑟 and let 𝑐 ∈ [0, 1] such that Pr(𝑘) ≥ 𝑐/𝑟 for all 𝑘 ∈ ℤ𝑟. Consider the experiment in which two integers 𝑘1 and 𝑘2 are independently chosen from ℤ𝑟 according to the probability distribution Pr. Then the probability that the experiment gives a pair (𝑘1, 𝑘2) such that gcd(𝑘1, 𝑘2, 𝑟) = 1 is at least (989731/1628176)𝑐² ≥ 0.6𝑐².

Proof. First, note that the number of pairs (𝑘1, 𝑘2) in ℤ𝑟² with gcd(𝑘1, 𝑘2, 𝑟) = 1 is the same as the number of all such pairs in {1, . . . , 𝑟}². This number is at least the number of coprime pairs in {1, . . . , 𝑟}². But it is shown in [Fon12] that the latter number is at least (989731/1628176)𝑟² ≥ 0.6𝑟². Since, in the experiment, any such pair is chosen with probability at least 𝑐²/𝑟², it follows that the probability of choosing one such pair is at least (989731/1628176)𝑐². □

We can now prove the success probability stated in Theorem 6.4.8. Proposition 6.4.9 implies that for 𝑗 = 1, 2 and each 𝑘𝑗 ∈ ℤ𝑟 the 𝑗th iteration of the for loop in Algorithm 6.4.6 finds with probability at least 8/(𝑟𝜋²) integers 𝑚𝑗, 𝑟𝑗 ∈ ℤ_{2^𝑛} such that 𝑚𝑗/𝑟𝑗 = 𝑘𝑗/𝑟. By Proposition 6.4.12, the probability that the two rounds of the for loop find such integers with gcd(𝑘1, 𝑘2, 𝑟) = 1 satisfies

(6.4.26) 𝑝 ≥ (989731/1628176) ⋅ (64/𝜋⁴) > 0.399.

It remains to analyze the complexity of the order finding algorithm. Its bottleneck is the computation of the controlled-𝑈𝑎^𝑐 operator, which we discuss in the next section.

6.4.4. Modular exponentiation. By C-𝑈𝑎 we denote the controlled operator which for all |𝑐⟩𝑛 |𝑡⟩𝑛 with 𝑐, 𝑡 ∈ ℤ2𝑛 satisfies

(6.4.27) C-𝑈𝑎 |𝑐⟩𝑛 |𝑡⟩𝑛 = { |𝑐⟩𝑛 |𝑎^𝑐 𝑡 mod 𝑁⟩𝑛 if 0 ≤ 𝑡 < 𝑁, |𝑐⟩𝑛 |𝑡⟩𝑛 if 𝑁 ≤ 𝑡 < 2^𝑛 .
We present a quantum circuit that implements C-𝑈𝑎 efficiently. For all 𝑖 ∈ ℤ𝑛 let

(6.4.28) 𝑎𝑖 = 𝑎^{2^𝑖} mod 𝑁.

Then for 1 ≤ 𝑖 < 𝑛 we have

(6.4.29) 𝑎𝑖 = (𝑎𝑖−1 )² mod 𝑁.
Let 𝑐, 𝑡 ∈ ℤ2𝑛 , 𝑐 = ∑_{𝑖=0}^{𝑛−1} 𝑐𝑖 2^{𝑛−1−𝑖} where 𝑐𝑖 ∈ {0, 1} for 0 ≤ 𝑖 < 𝑛. For 0 ≤ 𝑖 ≤ 𝑛 set

(6.4.30) 𝑡𝑖 = { (∏_{𝑙=0}^{𝑖−1} 𝑎_{𝑛−𝑙−1}^{𝑐_𝑙}) 𝑡 mod 𝑁 if 𝑡 < 𝑁, 𝑡 if 𝑡 ≥ 𝑁.

Then we have

(6.4.31) 𝑡0 = 𝑡, 𝑡𝑛 = { 𝑎^𝑐 𝑡 mod 𝑁 if 𝑡 < 𝑁, 𝑡 if 𝑡 ≥ 𝑁.

Also, for 0 ≤ 𝑖 ≤ 𝑛 − 1 we have

(6.4.32) 𝑡𝑖+1 = { 𝑎_{𝑛−𝑖−1}^{𝑐_𝑖} 𝑡𝑖 mod 𝑁 if 𝑡 < 𝑁, 𝑡 if 𝑡 ≥ 𝑁.

Exercise 6.4.13. Verify (6.4.31) and (6.4.32).
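Exercise 6.4.13 can also be checked numerically. The following sketch (ours, plain Python) computes the 𝑎𝑖 by repeated squaring as in (6.4.29), builds 𝑡0 , . . . , 𝑡𝑛 by the recurrence (6.4.32) for 𝑡 < 𝑁, and compares 𝑡𝑛 with 𝑎^𝑐 𝑡 mod 𝑁 as claimed in (6.4.31):

```python
def modexp_steps(a, c, t, N, n):
    """Return t_0, ..., t_n from (6.4.30) for t < N, where
    c = sum_i c_i 2^(n-1-i) and a_i = a^(2^i) mod N (6.4.28/6.4.29)."""
    bits = [(c >> (n - 1 - i)) & 1 for i in range(n)]  # c_0 is the MSB
    a_pow = [a % N]                                    # a_0
    for i in range(1, n):
        a_pow.append(a_pow[-1] ** 2 % N)               # a_i = (a_{i-1})^2 mod N
    ts = [t % N]                                       # t_0 = t
    for i in range(n):                                 # t_{i+1} = a_{n-i-1}^{c_i} t_i
        ts.append(a_pow[n - i - 1] ** bits[i] * ts[-1] % N)
    return ts

# Exhaustive check of (6.4.31) for N = 21, a = 2, n = 5 bits.
N, a, n = 21, 2, 5
for c in range(2 ** n):
    for t in range(N):
        assert modexp_steps(a, c, t, N, n)[-1] == pow(a, c, N) * t % N
```

Each step multiplies in at most one factor 𝑎_{𝑛−𝑖−1}, which is exactly what the controlled circuit below does with one 𝑈𝑚 application per bit of 𝑐.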

The circuit implementing C-𝑈𝑎 uses the modular multiplication operator 𝑈𝑚 that is defined by its effect on the computational basis states |𝑥⟩ |𝑦⟩ of ℍ2𝑛 , 𝑥, 𝑦 ∈ ℤ2𝑛 , as follows:

(6.4.33) 𝑈𝑚 |𝑥⟩ |𝑦⟩ = { |𝑥⟩ |𝑥𝑦 mod 𝑁⟩ if (𝑥, 𝑦) ∈ ℤ𝑁² ∧ gcd(𝑥, 𝑁) = 1, |𝑥⟩ |𝑦⟩ if (𝑥, 𝑦) ∉ ℤ𝑁² ∨ gcd(𝑥, 𝑁) > 1.

This operator is unitary because, as shown in Exercise 6.4.14, the map

(6.4.34) (𝑥, 𝑦) ↦ { (𝑥, 𝑥𝑦 mod 𝑁) if (𝑥, 𝑦) ∈ ℤ𝑁² ∧ gcd(𝑥, 𝑁) = 1, (𝑥, 𝑦) if (𝑥, 𝑦) ∉ ℤ𝑁² ∨ gcd(𝑥, 𝑁) > 1

is a bijection of {0, 1}^{2𝑛}. Note that the coprimality condition is on the multiplier 𝑥: conditioning on 𝑦 instead would send both (0, 1) and (0, 𝑦) for every unit 𝑦 to (0, 0) and so would not be injective.


Exercise 6.4.14. Show that the function in (6.4.34) is a bijection.

As shown in the overview of quantum circuits that implement modular multiplica-


tion [RC18], there exists a quantum circuit implementation of 𝑈𝑚 with a size of O(𝑛2 ).
The idea of a circuit implementation of C-𝑈𝑎 is to first compute |𝑎𝑖 ⟩𝑛 with 𝑎𝑖 from
(6.4.28) for 0 ≤ 𝑖 < 𝑛 using (6.4.29). The circuit then computes |𝑡 𝑖 ⟩𝑛 with 𝑡 𝑖 from
(6.4.30) for 1 ≤ 𝑖 ≤ 𝑛 using (6.4.32). As seen in (6.4.31), the required result is |𝑡𝑛 ⟩.

Figure 6.4.2. Circuit 𝑈𝑝 for 𝑛 = 3 (input |𝑎⟩3 |0⟩3 |0⟩3 , output |𝑎0 ⟩3 |𝑎1 ⟩3 |𝑎2 ⟩3 , using two 𝖢𝖭𝖮𝖳/𝑈𝑚 stages).

We now describe a quantum circuit 𝑈𝑝 that constructs |𝑎𝑖 ⟩ for 0 ≤ 𝑖 < 𝑛. It uses
the bitwise 𝖢𝖭𝖮𝖳 operator which for (𝑥0 , . . . , 𝑥𝑛−1 ), (𝑦0 , . . . , 𝑦𝑛−1 ) ∈ {0, 1}𝑛 gives
𝖢𝖭𝖮𝖳𝑛 |𝑥0 ⋯ 𝑥𝑛−1 ⟩ |𝑦0 ⋯ 𝑦𝑛−1 ⟩
(6.4.35)
= |𝑥0 ⋯ 𝑥𝑛−1 ⟩ 𝑋 𝑥0 |𝑦0 ⟩ ⋯ 𝑋 𝑥𝑛−1 |𝑦𝑛−1 ⟩ .
Note that we have
(6.4.36) 𝖢𝖭𝖮𝖳𝑛 |𝑥0 ⋯ 𝑥𝑛−1 ⟩ |0 ⋯ 0⟩ = |𝑥0 ⋯ 𝑥𝑛−1 ⟩ |𝑥0 ⋯ 𝑥𝑛−1 ⟩ .
Figure 6.4.2 shows the circuit 𝑈𝑝 for 𝑛 = 3. It works as follows. The input state is
|𝑎⟩3 |0⟩3 |0⟩3 . In the first step, the circuit computes the new state
(6.4.37) 𝑈𝑚 𝖢𝖭𝖮𝖳𝑛 (|𝑎⟩3 |0⟩3 ) |0⟩3 = 𝑈𝑚 (|𝑎0 ⟩3 |𝑎0 ⟩3 ) |0⟩3 = |𝑎0 ⟩3 |𝑎1 ⟩3 |0⟩3 .
In the second step, the circuit computes the new state
(6.4.38) |𝑎0 ⟩3 𝑈𝑚 𝖢𝖭𝖮𝖳𝑛 (|𝑎1 ⟩3 |0⟩3 ) = |𝑎0 ⟩3 𝑈𝑚 (|𝑎1 ⟩3 |𝑎1 ⟩3 ) = |𝑎0 ⟩3 |𝑎1 ⟩3 |𝑎2 ⟩3 .

The circuit specification for the general case is presented in Algorithm 6.4.15.

Algorithm 6.4.15. Circuit 𝑈𝑝 for computing |𝑎𝑖 ⟩𝑛 = |𝑎^{2^𝑖} mod 𝑁⟩𝑛 for 0 ≤ 𝑖 < 𝑛

Input: 𝑛 ∈ ℕ, 𝑁 ∈ ℤ2𝑛 , 𝑎 ∈ ℤ∗𝑁
Output: |𝑎0 ⟩𝑛 ⋯ |𝑎𝑛−1 ⟩𝑛 with 𝑎𝑖 as in (6.4.28)
1: 𝑈𝑝 (𝑛, 𝑁, 𝑎)
2: /* The circuit operates on |𝜓⟩ = |𝜓0 ⟩ ⋯ |𝜓𝑛−1 ⟩ ∈ ℍ_𝑛^{⊗𝑛} */
3: |𝜓⟩ ← |𝑎⟩𝑛 |0⟩𝑛^{⊗(𝑛−1)}
4: for 𝑖 = 1, . . . , 𝑛 − 1 do
5: |𝜓𝑖−1 ⟩ |𝜓𝑖 ⟩ ← 𝖢𝖭𝖮𝖳𝑛 |𝜓𝑖−1 ⟩ |𝜓𝑖 ⟩
6: |𝜓𝑖−1 ⟩ |𝜓𝑖 ⟩ ← 𝑈𝑚 |𝜓𝑖−1 ⟩ |𝜓𝑖 ⟩
7: end for
8: end

We now prove that the quantum circuit 𝑈𝑝 has the required property and analyze
its complexity.
Proposition 6.4.16. We have

(6.4.39) 𝑈𝑝 |𝑎⟩𝑛 |0⟩𝑛^{⊗(𝑛−1)} = |𝑎0 ⟩𝑛 ⋯ |𝑎𝑛−1 ⟩𝑛 .

Also, 𝑈𝑝 has size O(𝑛³).

Figure 6.4.3. Quantum circuit that implements C-𝑈𝑎 for 𝑛 = 2 (control qubits |𝑐0 ⟩, |𝑐1 ⟩; ancilla registers |𝑎⟩2 , |0⟩2 , |1⟩2 prepared by 𝑈𝑝 and traced out; target |𝑡⟩2 mapped to |𝑎^𝑐 𝑡 mod 𝑁⟩2 if 𝑡 ∈ ℤ𝑁 and left unchanged otherwise).

Proof. We use the notation of Algorithm 6.4.15 and show by induction on 𝑖 that after
the 𝑖th iteration of the for loop we have

(6.4.40) |𝜓⟩ = |𝑎0 ⟩𝑛 ⋯ |𝑎𝑖 ⟩𝑛 |0⟩𝑛 ⋯ |0⟩𝑛 .

The base case 𝑖 = 0 follows from the choice of the input state. For the inductive step,
assume that 0 ≤ 𝑖 < 𝑛 − 1 and that (6.4.40) holds. It follows from (6.4.29) that after the
completion of the (𝑖 + 1)st iteration of the for loop we have

|𝜓⟩ = |𝑎0 ⟩𝑛 ⋯ |𝑎𝑖−1 ⟩𝑛 𝑈𝑚 𝖢𝖭𝖮𝖳𝑛 (|𝑎𝑖 ⟩𝑛 |0⟩𝑛 ) |0⟩𝑛 ⋯ |0⟩𝑛
(6.4.41) = |𝑎0 ⟩𝑛 ⋯ |𝑎𝑖−1 ⟩𝑛 𝑈𝑚 (|𝑎𝑖 ⟩𝑛 |𝑎𝑖 ⟩𝑛 ) |0⟩𝑛 ⋯ |0⟩𝑛
= |𝑎0 ⟩𝑛 ⋯ |𝑎𝑖−1 ⟩𝑛 |𝑎𝑖 ⟩𝑛 |(𝑎𝑖 )² mod 𝑁⟩𝑛 |0⟩𝑛 ⋯ |0⟩𝑛
= |𝑎0 ⟩𝑛 ⋯ |𝑎𝑖−1 ⟩𝑛 |𝑎𝑖 ⟩𝑛 |𝑎𝑖+1 ⟩𝑛 |0⟩𝑛 ⋯ |0⟩𝑛 .

This proves that 𝑈𝑝 gives |𝑎0 ⟩𝑛 ⋯ |𝑎𝑛−1 ⟩𝑛 . We estimate the size of the circuit. There are 𝑛 − 1 iterations of the for loop. Each iteration executes 𝖢𝖭𝖮𝖳𝑛 and 𝑈𝑚 . Both operations have size O(𝑛²). So the total size is O(𝑛³). □

Next, we construct a quantum circuit that implements the unitary operator C-𝑈𝑎 from (6.4.27); it is shown in Figure 6.4.3 for 𝑛 = 2. Its initial state is |𝑐⟩2 |𝑡⟩2 where 𝑐, 𝑡 ∈ ℤ4 . Then the ancilla registers |𝑎⟩2 , |0⟩2 , and |1⟩2 are inserted between |𝑐⟩2 and |𝑡⟩2 . The operator 𝑈𝑝 is applied to |𝑎⟩2 |0⟩2 . This gives the state |𝑐⟩2 |𝑎0 ⟩2 |𝑎1 ⟩2 |1⟩2 |𝑡⟩2 . Next, |𝑎1 ⟩2 and |1⟩2 are swapped conditioned on 𝑐0 = 1. Therefore, the input state to 𝑈𝑚 is |𝑎_1^{𝑐_0}⟩2 |𝑡⟩2 . Applying 𝑈𝑚 gives the state |𝑐⟩2 |𝑎0 ⟩2 |𝑎_1^{1−𝑐_0}⟩2 |𝑎_1^{𝑐_0}⟩2 |𝑡1 ⟩2 . The next controlled swap changes the three ancillary registers back to |𝑎0 ⟩2 |𝑎1 ⟩2 |1⟩2 . The circuit then applies another controlled swap, the operator 𝑈𝑚 , and the inverse swap. The result is the state |𝑐⟩2 |𝑎0 ⟩2 |𝑎1 ⟩2 |1⟩2 |𝑡2 ⟩2 . Tracing out the ancillary qubits, we obtain the required result.

Algorithm 6.4.17. Circuit that implements C-𝑈𝑎

Input: |𝑐⟩𝑛 |𝑡⟩𝑛 ∈ ℍ2𝑛
Output: |𝑐⟩𝑛 |𝑡𝑛 ⟩𝑛
1: 𝑈𝑒
2: /* The circuit operates on |𝜓⟩ = |𝜓0 ⟩ ⋯ |𝜓𝑛+2 ⟩ ∈ ℍ_𝑛^{⊗(𝑛+3)} */
3: |𝜓⟩ ← |𝑐⟩𝑛 |𝑎⟩𝑛 |0⟩𝑛^{⊗(𝑛−1)} |1⟩𝑛 |𝑡⟩𝑛
4: |𝜓1 ⟩ ⋯ |𝜓𝑛 ⟩ ← 𝑈𝑝 |𝜓1 ⟩ ⋯ |𝜓𝑛 ⟩ = |𝑎0 ⟩𝑛 ⋯ |𝑎𝑛−1 ⟩𝑛
5: for 𝑖 = 1 to 𝑛 do
6: |𝜓⟩ ← 𝖢𝖲𝖶𝖠𝖯𝑖−1 |𝜓⟩
7: |𝜓𝑛+1 ⟩ |𝜓𝑛+2 ⟩ ← 𝑈𝑚 |𝜓𝑛+1 ⟩ |𝜓𝑛+2 ⟩
8: |𝜓⟩ ← 𝖢𝖲𝖶𝖠𝖯𝑖−1 |𝜓⟩
9: end for
10: Trace out |𝜓1 ⟩ ⋯ |𝜓𝑛+1 ⟩
11: end

The general circuit that implements C-𝑈𝑎 is specified in Algorithm 6.4.17. It uses
the controlled swap operator 𝖢𝖲𝖶𝖠𝖯𝑖 for 𝑖 = 0, . . . , 𝑛 − 1. Its effect is seen in (6.4.42):
𝖢𝖲𝖶𝖠𝖯𝑖 |𝑐⟩𝑛 |𝜉0 ⟩ ⋯ |𝜉𝑛−1 ⟩ |𝜑0 ⟩ |𝜑1 ⟩
(6.4.42) = { |𝑐⟩𝑛 |𝜉0 ⟩ ⋯ |𝜉𝑛−1 ⟩ |𝜑0 ⟩ |𝜑1 ⟩ if 𝑐𝑖 = 0,
|𝑐⟩𝑛 |𝜉0 ⟩ ⋯ |𝜉𝑛−𝑖−1 ⟩ |𝜑0 ⟩ |𝜉𝑛−𝑖+1 ⟩ ⋯ |𝜉𝑛−1 ⟩ |𝜉𝑛−𝑖 ⟩ |𝜑1 ⟩ if 𝑐𝑖 = 1.
So 𝖢𝖲𝖶𝖠𝖯𝑖 exchanges |𝜉𝑛−𝑖 ⟩ and |𝜑0 ⟩ conditioned on 𝑐 𝑖 being 1. For example, the cir-
cuit in Figure 6.4.3 uses 𝖢𝖲𝖶𝖠𝖯1 . It swaps |𝑎1 ⟩𝑛 and |1⟩𝑛 conditioned on 𝑐 0 being 1.
We now prove the following result.
Proposition 6.4.18. The circuit specified in Algorithm 6.4.17 implements the unitary operator C-𝑈𝑎 from (6.4.27) and has size O(𝑛³).

Proof. We prove by induction on 𝑖 that after executing 𝑖 iterations of the for loop we
have
(6.4.43) |𝜓⟩ = |𝑐⟩𝑛 |𝑎0 ⟩𝑛 ⋯ |𝑎𝑛−1 ⟩𝑛 |1⟩𝑛 |𝑡 𝑖 ⟩𝑛 .
Applying this for 𝑖 = 𝑛 it follows that the circuit implements C-𝑈𝑎 .
The base case follows by considering the instructions in lines 3 and 4 and Propo-
sition 6.4.16.
For the inductive step, assume that 0 ≤ 𝑖 < 𝑛 and that (6.4.43) holds. In the (𝑖 + 1)st iteration of the for loop, the instruction in line 6 swaps |𝑎𝑛−𝑖−1 ⟩𝑛 and |1⟩𝑛 conditioned on 𝑐𝑖 being 1. This means that after this operation, we have

(6.4.44) |𝜓𝑛+1 ⟩ |𝜓𝑛+2 ⟩ = |𝑎_{𝑛−𝑖−1}^{𝑐_𝑖}⟩𝑛 |𝑡𝑖 ⟩𝑛 .

So, by (6.4.32), the application of 𝑈𝑚 to this quantum state gives

(6.4.45) |𝑎_{𝑛−𝑖−1}^{𝑐_𝑖}⟩𝑛 |𝑡𝑖+1 ⟩𝑛 .

The second controlled swap in line 8 swaps |𝑎𝑛−𝑖−1 ⟩𝑛 and |1⟩𝑛 back, again conditioned on 𝑐𝑖 being 1. So (6.4.43) holds with 𝑖 replaced by 𝑖 + 1.

Finally, we estimate the size of the circuit. The number of ancilla bits required by
the circuit is O(𝑛2 ). By Proposition 6.4.16, the circuit 𝑈𝑝 has size O(𝑛3 ). The number
of iterations of the for loop is 𝑛. We analyze the complexity of the implementation of
𝖢𝖲𝖶𝖠𝖯𝑖 . As seen in Figure 4.5.1, one quantum swap can be implemented using 𝑂(1)
elementary gates. So, swapping 𝑛 qubits can be achieved using O(𝑛) elementary gates.
Theorem 4.12.7 implies that 𝖢𝖲𝖶𝖠𝖯𝑖 can also be implemented using O(𝑛) elementary
quantum gates. Also, 𝑈𝑚 requires O(𝑛2 ) elementary quantum gates. So, the complexity
of the for loop is O(𝑛3 ) which concludes the proof. □

Now we can prove the complexity statement of Theorem 6.4.8 as follows. The order finding algorithm uses the quantum circuit 𝑄𝑎 twice. By assumption, the precision parameter 𝑛 used in this circuit satisfies 2^𝑛 ≤ 2𝑁², which implies 𝑛 = O(log 𝑁). So, by Proposition 6.4.18, the size of this circuit is O((log 𝑁)³). By Proposition A.3.28, the application of the continued fraction algorithm requires time O((log 𝑁)²). Furthermore, the calculation of the lcm, of 𝑎^𝑟 mod 𝑁, and all other operations requires running time O((log 𝑁)³). This implies the complexity statement of Theorem 6.4.8. If faster algorithms for integer multiplication, division with remainder, and lcm are used, the complexity of the order finding algorithm becomes (log 𝑁)²(log log 𝑁)^{O(1)}. Such algorithms are, for example, presented in [AHU74] and [HvdH21].

6.5. Integer factorization


In this section, we explain how to solve the following problem in quantum polynomial
time using the order finding algorithm from Section 6.4.

Problem 6.5.1 (Integer factorization problem).

Input: A composite positive integer 𝑁

Output: A proper divisor 𝑑 of 𝑁

The best-known classical and fully analyzed Monte Carlo algorithm for this problem has subexponential complexity exp((1 + o(1))(log 𝑁 log log 𝑁)^{1/2}) [LP92]. Furthermore, the best heuristic Monte Carlo algorithm for this problem has subexponential complexity exp((𝑐 + o(1))(log 𝑁)^{1/3}(log log 𝑁)^{2/3}) where 𝑐 = (64/9)^{1/3} [BLP93]. The quantum factoring algorithm is Algorithm 6.5.2. Its idea has already been explained in Section 6.1. It selects
𝑎 ∈ ℤ𝑁 randomly with the uniform distribution and computes 𝑑 = gcd(𝑎, 𝑁). If 𝑑 > 1,
then 𝑑 is a proper divisor of 𝑁; the algorithm returns this divisor and terminates. Oth-
erwise, the algorithm calls FindOrder(𝑁, 𝑎, 𝑛) with 𝑛 from (6.4.11): By Theorem 6.4.8
it finds the order 𝑟 of 𝑎 modulo 𝑁 with probability at least 0.399. If 𝑟 is even, then

(6.5.1) (𝑎𝑟/2 − 1)(𝑎𝑟/2 + 1) ≡ 0 mod 𝑁.

Also, if 𝑁 does not divide 𝑎𝑟/2 + 1, then gcd(𝑎𝑟/2 − 1, 𝑁) is a proper divisor of 𝑁. This
is what the algorithm tests.

Algorithm 6.5.2. Factoring using order finding

Input: An odd composite number 𝑁 ∈ ℕ
Output: A proper divisor 𝑑 of 𝑁
1: Factor(𝑁)
2: 𝑎 ← randomInt(𝑁)
3: 𝑑 ← gcd(𝑎, 𝑁)
4: if 𝑑 > 1 then
5: return 𝑑
6: end if
7: 𝑛 ← ⌈2 log2 𝑁⌉ + 1
8: 𝑟 ← FindOrder(𝑁, 𝑎, 𝑛)
9: if 𝑟 ≠ “FAILURE” and 𝑟 ≡ 0 mod 2 then
10: 𝑏 ← 𝑎^{𝑟/2} mod 𝑁
11: 𝑑 ← gcd(𝑏 − 1, 𝑁)
12: if 𝑑 ≠ 1 then
13: return 𝑑
14: end if
15: end if
16: return “FAILURE”
17: end

We present examples for a successful and an unsuccessful run of Algorithm 6.5.2.


Example 6.5.3. Let 𝑁 = 15. Suppose that the factoring algorithm selects 𝑎 = 2. Then gcd(𝑎, 𝑁) = 1. As can be easily verified, the order of 2 modulo 15 is 4. Suppose that the order finding algorithm returns 𝑟 = 4. Then the algorithm computes 𝑏 = 𝑎^{𝑟/2} mod 𝑁 = 2² mod 15 = 4 and 𝑑 = gcd(𝑏 − 1, 𝑁) = gcd(4 − 1, 15) = 3, which is a proper divisor of 15.
Let 𝑁 = 15. Now suppose that the factoring algorithm selects 𝑎 = 14. Then gcd(𝑎, 𝑁) = 1. As can be easily verified, the order of 14 modulo 15 is 2. Suppose that the order finding algorithm returns 𝑟 = 2. Then the algorithm computes 𝑏 = 𝑎^{𝑟/2} mod 𝑁 = 14 ≡ −1 mod 15 and 𝑑 = gcd(𝑏 − 1, 𝑁) = gcd(13, 15) = 1, which is not a proper divisor of 15, so the algorithm returns “FAILURE”.
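The classical control flow of Algorithm 6.5.2 can be sketched as follows; a brute-force loop stands in for the quantum subroutine FindOrder, so this only illustrates the post-processing, not the quantum speedup (all names are ours):

```python
from math import gcd
import random

def find_order(N, a):
    """Classical stand-in for the quantum order finding subroutine."""
    r, x = 1, a % N
    while x != 1:
        x, r = x * a % N, r + 1
    return r

def factor(N):
    """One round of Algorithm 6.5.2 for an odd composite N."""
    a = random.randrange(2, N)
    d = gcd(a, N)
    if d > 1:
        return d                       # lucky: a shares a factor with N
    r = find_order(N, a)
    if r % 2 == 0:
        d = gcd(pow(a, r // 2, N) - 1, N)
        if d != 1:
            return d
    return None                        # "FAILURE"; retry with a fresh a

random.seed(1)
divisors = {factor(15) for _ in range(50)} - {None}
assert divisors and divisors <= {3, 5}  # only proper divisors of 15 appear
```

As in Example 6.5.3, the only coprime choice that fails for 𝑁 = 15 is 𝑎 = 14, so repeated rounds succeed quickly.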

In the remainder of this section, we will prove the following theorem.


Theorem 6.5.4. On input of an odd composite number 𝑁 ∈ ℕ, Algorithm 6.5.2 returns
a proper divisor of 𝑁 with probability at least 0.199 and has running time O((log 𝑁)3 ).

We note that, by employing the technique outlined in Section 1.3.4, the success probability can be brought arbitrarily close to 1 without altering the asymptotic complexity: one simply applies the algorithm iteratively.
Now we analyze the success probability and the running time of the quantum fac-
toring algorithm. It is successful if the order finding algorithm returns the order 𝑟 of 𝑎,
this order is even, and 𝑎𝑟/2 ≢ −1 mod 𝑁. To obtain a lower bound for the probability
of this happening, we need the following lemma.

Lemma 6.5.5. Let 𝑝 be an odd prime number and let 𝑒 ∈ ℕ. Let 𝑑 ∈ ℕ be the exponent of 2 in the prime factor decomposition of 𝑝 − 1. Choose an integer 𝑎 ∈ ℤ∗_{𝑝^𝑒} randomly with the uniform distribution. Then the probability that 2^𝑑 divides the order of 𝑎 modulo 𝑝^𝑒 is 1/2.

Proof. The order of ℤ∗_{𝑝^𝑒} is

(6.5.2) 𝜑 = 𝜑(𝑝^𝑒 ) = (𝑝 − 1)𝑝^{𝑒−1} .

Since 𝑝 is odd, the exponent of 2 in the prime factorization of 𝜑 is 𝑑. Choose a primitive root 𝑔 ∈ ℤ∗_{𝑝^𝑒} modulo 𝑝^𝑒 . Its order modulo 𝑝^𝑒 is 𝜑 and

(6.5.3) ℤ𝜑 → ℤ∗_{𝑝^𝑒} , 𝑘 ↦ 𝑔^𝑘 mod 𝑝^𝑒

is a bijection. Select 𝑘 ∈ ℤ𝜑 randomly with the uniform distribution. Due to the bijectivity of (6.5.3), the integer 𝑎 = 𝑔^𝑘 mod 𝑝^𝑒 is uniformly distributed in ℤ∗_{𝑝^𝑒} . Also, the order of 𝑎 is

(6.5.4) 𝑟 = 𝜑/gcd(𝑘, 𝜑).
Let 𝑑 ′ be the exponent of 2 in the prime factor decomposition of 𝑟. Then (6.5.4) implies
that 𝑑 ′ = 𝑑 if 𝑘 is odd and 𝑑 ′ < 𝑑 if 𝑘 is even. Since 𝜑 is even, half of the integers in ℤ𝜑
are even and half are odd. So, we have 𝑑 = 𝑑 ′ with probability 1/2. □
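Lemma 6.5.5 can be verified by exact enumeration for small parameters. The sketch below (ours) counts, for several prime powers 𝑝^𝑒, the elements of ℤ∗_{𝑝^𝑒} whose order is divisible by 2^𝑑:

```python
from math import gcd

def order(a, m):
    """Multiplicative order of a modulo m; assumes gcd(a, m) = 1."""
    r, x = 1, a % m
    while x != 1:
        x, r = x * a % m, r + 1
    return r

for p, e in [(7, 1), (3, 2), (5, 2)]:
    m = p ** e
    d = ((p - 1) & -(p - 1)).bit_length() - 1   # exponent of 2 in p - 1
    units = [a for a in range(1, m) if gcd(a, m) == 1]
    hit = [a for a in units if order(a, m) % 2 ** d == 0]
    assert 2 * len(hit) == len(units)           # exactly half, as the lemma states
```

For 𝑝 = 7, for instance, the qualifying elements are exactly 3, 5 (order 6) and 6 (order 2): three of the six units.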

The next proposition gives a lower bound for the conditional success probability
of Algorithm 6.5.2 in the case where it selects an integer 𝑎 that is coprime to 𝑁 and the
order finding algorithm returns the order of 𝑎 modulo 𝑁.

Proposition 6.5.6. Let 𝑁 be an odd composite number with 𝑚 different prime factors. Choose 𝑎 ∈ ℤ∗𝑁 randomly with the uniform distribution. Then the probability that the order 𝑟 of 𝑎 modulo 𝑁 is even and that 𝑎^{𝑟/2} ≢ −1 mod 𝑁 is at least 1 − 1/2^{𝑚−1}.

Proof. We show that the probability that 𝑟 is odd or that 𝑟 is even and satisfies 𝑎^{𝑟/2} ≡ −1 mod 𝑁 is at most 1/2^{𝑚−1}.
Let

(6.5.5) 𝑁 = ∏_{𝑖=1}^{𝑚} 𝑝𝑖^{𝑒𝑖}

be the prime factor decomposition of 𝑁. To choose 𝑎 ∈ ℤ∗𝑁 randomly with uniform distribution, select 𝑎𝑖 ∈ ℤ∗_{𝑝𝑖^{𝑒𝑖}} randomly with the uniform distribution and apply Chinese remaindering (see [Buc04, Section 2.15]) to find 𝑎 ∈ ℤ∗𝑁 with 𝑎 ≡ 𝑎𝑖 mod 𝑝𝑖^{𝑒𝑖} for 1 ≤ 𝑖 ≤ 𝑚.
Let 𝑟 be the order of 𝑎 modulo 𝑁 and for 1 ≤ 𝑖 ≤ 𝑚 let 𝑟𝑖 be the order of 𝑎𝑖 modulo 𝑝𝑖^{𝑒𝑖} . Then by Exercise A.4.17, we have

(6.5.6) 𝑟 = lcm(𝑟1 , . . . , 𝑟𝑚 ).

Denote by 𝑓 the exponent of 2 in the prime factor decomposition of 𝑟 and denote by 𝑓𝑖 the exponent of 2 in the prime factor decomposition of 𝑟𝑖 . We show that if 𝑟 is odd or 𝑟 is even and satisfies 𝑎^{𝑟/2} ≡ −1 mod 𝑁, then

(6.5.7) 𝑓 = 𝑓𝑖 for 1 ≤ 𝑖 ≤ 𝑚.

So let 𝑖 ∈ {1, . . . , 𝑚}. If 𝑟 is odd, then 𝑓 = 0 and (6.5.6) implies 𝑓𝑖 = 0. Now assume that 𝑓 > 0 and 𝑎^{𝑟/2} ≡ −1 mod 𝑁. Since 𝑎^𝑟 ≡ 1 mod 𝑝𝑖^{𝑒𝑖} , it follows that 𝑟𝑖 | 𝑟, which implies 𝑓𝑖 ≤ 𝑓. But since 𝑎^{𝑟/2} ≡ −1 mod 𝑝𝑖^{𝑒𝑖} , it follows that 𝑓𝑖 = 𝑓.
From the above argument, it follows that the probability that 𝑟 is odd or that 𝑟 is even and satisfies 𝑎^{𝑟/2} ≡ −1 mod 𝑁 is at most the probability that all 𝑓𝑖 are equal. We show that this probability is at most 1/2^{𝑚−1}. We know that 𝑓1 assumes some value with probability 1. It follows from Lemma 6.5.5 that for 2 ≤ 𝑖 ≤ 𝑚 we have 𝑓𝑖 = 𝑓1 with probability at most 1/2. In fact, if 𝑓1 is the exponent of 2 in the prime factorization of 𝑝𝑖 − 1, then by Lemma 6.5.5 the probability that 𝑓𝑖 = 𝑓1 is 1/2. And if 𝑓1 is not this exponent, then by Lemma 6.5.5 the probability that 𝑓𝑖 = 𝑓1 is at most 1 − 1/2 = 1/2. So the probability that all 𝑓𝑖 are equal is at most 1/2^{𝑚−1}. □
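For 𝑁 = 15, which has 𝑚 = 2 prime factors, the bound of Proposition 6.5.6 can be checked by direct enumeration (the sketch and its names are ours):

```python
from math import gcd

def order(a, m):
    """Multiplicative order of a modulo m; assumes gcd(a, m) = 1."""
    r, x = 1, a % m
    while x != 1:
        x, r = x * a % m, r + 1
    return r

N = 15                                          # 3 * 5, so m = 2 prime factors
units = [a for a in range(1, N) if gcd(a, N) == 1]
good = [a for a in units
        if order(a, N) % 2 == 0 and pow(a, order(a, N) // 2, N) != N - 1]
assert sorted(good) == [2, 4, 7, 8, 11, 13]     # 6 of the 8 units qualify
assert len(good) / len(units) >= 1 - 1 / 2 ** (2 - 1)
```

The two excluded units are 𝑎 = 1 (odd order) and 𝑎 = 14 (where 𝑎^{𝑟/2} ≡ −1 mod 15, as in Example 6.5.3), so the actual fraction 3/4 comfortably exceeds the guaranteed bound 1/2.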

We can now prove Theorem 6.5.4.

The algorithm is successful at least in the following two cases. In the first case, 𝑎 is not coprime to 𝑁, which occurs with probability (𝑁 − 𝜑(𝑁))/𝑁. In the second case, (1) the algorithm finds the order 𝑟 of 𝑎 modulo 𝑁 and (2) this order is even and 𝑎^{𝑟/2} ≢ −1 mod 𝑁. By Theorem 6.4.8, (1) occurs with probability at least 3958924/(101761𝜋⁴). Furthermore, by Proposition 6.5.6, (2) happens with conditional probability at least 1/2 since 𝑁 has at least two prime factors. So the probability of (1) and (2) is at least

(6.5.8) 3958924/(2 ⋅ 101761𝜋⁴) > 0.199.

Summarizing these results, the total success probability is at least

(6.5.9) (𝑁 − 𝜑(𝑁) + 0.199𝜑(𝑁))/𝑁 > 0.199.

By Theorem 6.4.8 and the choice of 𝑛 in Algorithm 6.5.2, the running time of the algorithm is O((log 𝑁)³).

Exercise 6.5.7. Find a polynomial time quantum algorithm that finds the prime factor
decomposition of every positive integer.

Exercise 6.5.8. Let 𝑁 ∈ ℕ, 𝑎 ∈ ℤ∗𝑁 , and 𝑟 ∈ ℤ𝑁 . Show how the quantum factoring
algorithm can be used to check in polynomial time whether 𝑟 is the order of 𝑎 mod-
ulo 𝑁.

6.6. Discrete logarithms


In this section, we show how the order finding algorithm and the phase estimation
algorithm can be used to solve the following problem in quantum polynomial time.
Problem 6.6.1 (Discrete Logarithm Problem).
Input: An odd integer 𝑁, 𝑎, 𝑏 ∈ ℤ∗𝑁 such that 𝑏 ≡ 𝑎𝑡 mod 𝑁 where 𝑡 ∈ ℤ𝑟 and 𝑟 is the
order of 𝑎 modulo 𝑁
Output: The exponent 𝑡

The exponent 𝑡 in the discrete logarithm problem is called the discrete logarithm
of 𝑏 to base 𝑎 modulo 𝑁. The discrete logarithm problem is also referred to as the DL
problem.
We will now present a quantum polynomial time DL algorithm. In the presenta-
tion, we will use the notation from the discrete logarithm problem.
We make a few preliminary remarks. Using the quantum factoring algorithm from
the previous section, we can find the prime factorization of 𝑁 in polynomial time, al-
lowing us to compute the order 𝜑(𝑁) of ℤ∗𝑁 since
1
(6.6.1) 𝜑(𝑁) = 𝑁 ∏ (1 − )
𝑝∣𝑁
𝑝

which is shown in [Buc04, Theorem 2.17.2]. By applying the quantum factoring al-
gorithm, we can also determine the prime factorization of 𝜑(𝑁). Subsequently, the
Pohlig-Hellman algorithm, as described in Section 10.5 of [Buc04], provides a polyno-
mial time reduction from the general DL problem to the problem of computing discrete
logarithms of basis elements 𝑎 whose order 𝑟 is a known prime number. Therefore, to
achieve a quantum polynomial time DL algorithm, we may assume that the order 𝑟 of
𝑎 modulo 𝑁 is a prime number. Additionally, we assume that 𝑡 > 1, as the cases 𝑡 = 0
and 𝑡 = 1 can be solved by inspection.
The idea of the quantum DL algorithm for this special case is the following. The algorithm selects an appropriate precision parameter 𝑛 ∈ ℕ and uses the unitary operators 𝑈𝑎 and 𝑈𝑏 that are specified in Definition 6.4.2. Since 𝑏 ≡ 𝑎^𝑡 mod 𝑁, it follows from Proposition 6.4.4 that the eigenvalues of these operators are 𝑒^{2𝜋𝑖𝑘/𝑟} and 𝑒^{2𝜋𝑖𝑡𝑘/𝑟} for 0 ≤ 𝑘 < 𝑟. Using quantum phase estimation, we can find (𝑥, 𝑦) ∈ ℤ2𝑛 × ℤ2𝑛 such that 𝑥/2^𝑛 and 𝑦/2^𝑛 are close enough to 𝑘/𝑟 and (𝑡𝑘 mod 𝑟)/𝑟 for some 𝑘 ∈ ℤ∗𝑟 such that

(6.6.2) 𝑘 = ⌊𝑟𝑥/2^𝑛 ⌉ and 𝑘𝑡 mod 𝑟 = ⌊𝑟𝑦/2^𝑛 ⌉ .

Since 𝑟 is a prime number, it follows that gcd(𝑘, 𝑟) = 1. So we can compute 𝑘′ with 𝑘𝑘′ ≡ 1 mod 𝑟 and obtain

(6.6.3) 𝑡 = 𝑘′ ⌊𝑟𝑦/2^𝑛 ⌉ mod 𝑟.

The quantum discrete logarithm algorithm is Algorithm 6.6.2. In the remainder of this section, we will prove Theorem 6.6.3, which states that it is correct and runs in polynomial time.

Algorithm 6.6.2. Quantum discrete logarithm algorithm


Input: 𝑁, 𝑎, 𝑏, 𝑟 such that 𝑟 is the order of 𝑎 modulo 𝑁, 𝑟 is a prime number, and
𝑏 ≡ 𝑎𝑡 mod 𝑁 for some 𝑡 ∈ ℤ∗𝑟 .
Output: The discrete logarithm 𝑡 of 𝑏 to base 𝑎 modulo 𝑁 or “FAILURE”
1: DL(𝑁, 𝑎, 𝑏, 𝑟)
2: 𝑛 ← ⌈log2 𝑟⌉ + 1
3: Apply the quantum circuit QDL from Figure 6.6.1 and obtain (𝑥, 𝑦) ∈ ℤ2𝑛 × ℤ2𝑛
4: 𝑘 ← ⌊𝑥𝑟/2𝑛 ⌉ mod 𝑟
5: if 𝑘 ≠ 0 then
6: 𝑙 ← ⌊𝑦𝑟/2𝑛 ⌉ mod 𝑟
7: 𝑡 ← 𝑙𝑘−1 mod 𝑟
8: return 𝑡
9: end if
10: return “FAILURE”
11: end
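The classical post-processing in lines 4–7 of Algorithm 6.6.2 can be simulated by feeding in the ideal measurement outcomes 𝑥 ≈ 2^𝑛 𝑘/𝑟 and 𝑦 ≈ 2^𝑛 (𝑘𝑡 mod 𝑟)/𝑟 (the function name is ours; `pow(k, -1, r)` requires Python 3.8 or later):

```python
def recover_dlog(x, y, r, n):
    """Lines 4-7 of Algorithm 6.6.2: recover t from a measured pair (x, y)."""
    k = round(x * r / 2 ** n) % r
    if k == 0:
        return None                        # "FAILURE"
    l = round(y * r / 2 ** n) % r
    return l * pow(k, -1, r) % r           # t = l k^{-1} mod r

# Example: r = 11 (prime), t = 7, precision n = ceil(log2 r) + 1 = 5.
r, t, n = 11, 7, 5
for k in range(1, r):
    x = round(2 ** n * k / r)              # ideal phase estimates
    y = round(2 ** n * (k * t % r) / r)
    assert recover_dlog(x, y, r, n) == t
```

Since 2^𝑛 > 2𝑟, rounding 𝑥𝑟/2^𝑛 and 𝑦𝑟/2^𝑛 recovers 𝑘 and 𝑘𝑡 mod 𝑟 exactly, for every 𝑘 ∈ ℤ∗𝑟 .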

Theorem 6.6.3. On input of 𝑁 ∈ ℕ and 𝑎, 𝑏 ∈ ℤ∗𝑁 such that the order of 𝑎 modulo 𝑁 is a prime number 𝑟 and 𝑏 ≡ 𝑎^𝑡 mod 𝑁 for some 𝑡 ∈ ℤ∗𝑟 , Algorithm 6.6.2 returns 𝑡 with probability at least 64(𝑟 − 1)/(𝑟𝜋⁴) > 0.328. Its running time is O((log 𝑁)³).

Algorithm 6.6.2 sets

(6.6.4) 𝑛 = ⌈log2 𝑟⌉ + 1.

This implies

(6.6.5) 2^𝑛 > 2𝑟.

Then it applies the quantum circuit from Figure 6.6.1. The next proposition de-
scribes the output of the circuit.
Proposition 6.6.4. For all 𝑘 ∈ ℤ∗𝑟 the quantum circuit in Figure 6.6.1 gives with probability 64(𝑟 − 1)/(𝑟𝜋⁴) two integers 𝑥, 𝑦 ∈ ℤ2𝑛 such that

(6.6.6) 𝑘 = ⌊𝑟𝑥/2^𝑛 ⌉ and 𝑘𝑡 mod 𝑟 = ⌊𝑟𝑦/2^𝑛 ⌉ .

Figure 6.6.1. The quantum circuit QDL for discrete logarithm computation (registers initialized to |0⟩𝑛 |0⟩𝑛 |1⟩𝑛 ; Hadamards on the first two registers, the controlled operators 𝑈𝑎^𝑥 and 𝑈𝑏^𝑦 , inverse QFTs, then measurement of 𝑥 and 𝑦).

Proof. The circuit operates on the tensor product of three quantum registers of length 𝑛, which is initialized to

(6.6.7) |𝜓0 ⟩ = |0⟩𝑛 |0⟩𝑛 |1⟩𝑛 .

Then it applies 𝐻^{⊗𝑛} to the first two quantum registers. This gives the state

(6.6.8) |𝜓1 ⟩ = ((|0⟩ + |1⟩)/√2)^{⊗𝑛} ((|0⟩ + |1⟩)/√2)^{⊗𝑛} |1⟩𝑛 .

Next, it applies the operator C-𝑈𝑎 from (6.4.27) to the first and the third quantum register. As in (6.4.17) this gives the state

(6.6.9) |𝜓2𝑎 ⟩ = (1/√(2^𝑛 𝑟)) ∑_{𝑘=0}^{𝑟−1} (|𝜓𝑛 (𝑘/𝑟)⟩ (∑_{𝑦=0}^{2^𝑛 −1} |𝑦⟩) |𝑢𝑘 ⟩) .

Then the algorithm applies the operator C-𝑈𝑏 to the second and third quantum register. It follows from Proposition 6.4.4 that the |𝑢𝑘 ⟩ are eigenstates of 𝑈𝑏 associated to the eigenvalues 𝑒^{2𝜋𝑖𝑘𝑡/𝑟} . So, by the same argument, this gives the state

(6.6.10) |𝜓2𝑏 ⟩ = (1/√𝑟) ∑_{𝑘=0}^{𝑟−1} (|𝜓𝑛 (𝑘/𝑟)⟩ |𝜓𝑛 (𝑘𝑡/𝑟)⟩ |𝑢𝑘 ⟩) .

By Exercise 3.7.11, tracing out the third quantum register gives the mixed state

(6.6.11) |𝜓3 ⟩ = ((1/𝑟, |𝜓𝑛 (𝑘/𝑟)⟩ |𝜓𝑛 (𝑘𝑡/𝑟)⟩))_{0≤𝑘<𝑟} .

Now the algorithm applies QFT_𝑛^{−1} to the first and second register. This gives the mixed state

(6.6.12) |𝜓4 ⟩ = ((1/𝑟, QFT_𝑛^{−1} |𝜓𝑛 (𝑘/𝑟)⟩ QFT_𝑛^{−1} |𝜓𝑛 (𝑘𝑡/𝑟)⟩))_{0≤𝑘<𝑟} .

Let 𝑘 ∈ ℤ𝑟 . By Theorem 6.3.7 and (6.6.5), measuring these registers in the computational basis of ℍ_𝑛^{⊗2} gives with probability 64/(𝜋⁴𝑟) two integers (𝑥, 𝑦) ∈ ℤ2𝑛 × ℤ2𝑛 with

(6.6.13) |Δ(𝑘/𝑟, 𝑛, 𝑥)| < 1/2^𝑛 < 1/(2𝑟), |Δ((𝑘𝑡 mod 𝑟)/𝑟, 𝑛, 𝑦)| < 1/2^𝑛 < 1/(2𝑟).

We note that

(6.6.14) 0 < 𝑘/𝑟 ≤ 1 − 1/𝑟 < 1 − 1/2^𝑛 , (𝑘𝑡 mod 𝑟)/𝑟 ≤ 1 − 1/𝑟 < 1 − 1/2^𝑛 .

So Lemma 6.3.3 and (6.6.13) imply

(6.6.15) |𝑘/𝑟 − 𝑥/2^𝑛 | < 1/(2𝑟), |(𝑘𝑡 mod 𝑟)/𝑟 − 𝑦/2^𝑛 | < 1/(2𝑟).

Consequently, we obtain

(6.6.16) 𝑘 = ⌊𝑟𝑥/2^𝑛 ⌉ and 𝑘𝑡 mod 𝑟 = ⌊𝑟𝑦/2^𝑛 ⌉ ,

which concludes the proof. □

It follows from Proposition 6.6.4 that with probability at least 64(𝑟 − 1)/(𝑟𝜋⁴) ≥ 32/𝜋⁴ > 0 the quantum circuit in Figure 6.6.1 returns 𝑥, 𝑦 ∈ ℤ2𝑛 that satisfy (6.6.6) for some 𝑘 ∈ ℤ∗𝑟 . If this happens, then the algorithm computes 𝑙 = ⌊𝑦𝑟/2^𝑛 ⌉, which by Proposition 6.6.4 is 𝑘𝑡 mod 𝑟. So we have 𝑡 = 𝑙𝑘^{−1} mod 𝑟, which means that the algorithm produces the correct result. As in the analysis of the order finding algorithm, it can be seen that the complexity of the algorithm is O((log 𝑁)³).

6.6.1. Discrete logarithms in other groups. The presented quantum polyno-


mial time algorithm effectively calculates the discrete logarithm in the multiplicative
group modulo a positive integer. However, as it became apparent that this problem
serves as the foundation for the security of public-key cryptography, a multitude of
public-key cryptography algorithms emerged, which hinge upon the discrete logarithm
problem in various groups. Notably, elliptic curve cryptography stands out from an ap-
plication perspective. Its security relies on the intractability of computing discrete log-
arithms in the group of points of an elliptic curve over a finite field. Nevertheless, for
all variants of the discrete logarithm problem, polynomial time quantum algorithms
were uncovered. Consequently, none of the cryptographic algorithms based on dis-
crete logarithms offer security against attacks from quantum computers. An exhaus-
tive overview of these algorithms can be found in [Jor].

6.7. Relevance for cryptography


The discovery of Shor’s algorithms, capable of factoring integers and computing dis-
crete logarithms in polynomial time, has had a significant impact on cybersecurity.
Traditional public-key cryptography, including the RSA public-key encryption and sig-
nature schemes and the digital signature algorithm, as described in [Buc04], becomes
insecure in the face of these algorithms. These cryptographic systems are critical for
cybersecurity, especially to secure the Internet.
Furthermore, variants of the Shor algorithm can also solve other discrete loga-
rithm problems, such as those in elliptic curve groups over finite fields. As a result,
all classical public-key cryptography becomes vulnerable when faced with sufficiently
large quantum computers. In response, scientists have been actively working on devel-
oping post-quantum cryptography (see [BLM17], [BLM18]) that can resist quantum
computer attacks.

6.8. The hidden subgroup problem


In this section, we discuss the hidden subgroup problem, which serves as a framework
for addressing algorithmic problems in finite abelian groups. We will show that the algorithmic problems that we have studied so far can be viewed as hidden subgroup
problems. However, there are several more instances of the hidden subgroup problem.
For an overview see [Wan10].

6.8.1. The problem. To state the hidden subgroup problem, we need the follow-
ing definition.

Definition 6.8.1. Let 𝐺 be a group, let 𝐻 be a subgroup of 𝐺, and let 𝑋 be a set. We say
that a function 𝑓 ∶ 𝐺 → 𝑋 hides the subgroup 𝐻 if for all 𝑔, 𝑔′ ∈ 𝐺 we have 𝑓(𝑔) = 𝑓(𝑔′ )
if and only if 𝑔𝐻 = 𝑔′ 𝐻. In other words, the function 𝑓 takes the same value for all
elements of a coset of 𝐻, while it takes different values for elements of different cosets.

The hidden subgroup problem is the following.

Problem 6.8.2 (Hidden subgroup problem).

Input: A black-box that implements a function 𝑓 ∶ 𝐺 → 𝑋 where 𝐺 is a group, 𝑋 is a


set, and 𝑓 hides a finitely generated subgroup 𝐻 of 𝐺.

Output: A finite generating system for 𝐻.

6.8.2. Hidden subgroup versions of the Deutsch and Simon problems.


The Deutsch problem explained in Section 5.1 is to find out whether a function 𝑓 ∶
{0, 1} → {0, 1} is constant or balanced. So we can set 𝐺 = ({0, 1}, ⊕), 𝑋 = {0, 1} and use
the function 𝑓. If 𝑓 is constant, then 𝑓(𝑥) is the same for all elements of 𝐺. Therefore,
the function 𝑓 hides 𝐻 = 𝐺. Every generating system of the subgroup 𝐻 contains the
element 1. If 𝑓 is balanced, then 𝑓(0) and 𝑓(1) are different. So 𝑓 hides the subgroup
𝐻 = {0} with the cosets 0 ⊕ 𝐻 = {0} and 1 ⊕ 𝐻 = {1}. In this case, the only generating
system of 𝐻 is (0). So we can tell whether 𝑓 is constant or balanced by checking whether
the generating system of 𝐻 contains 1.

Exercise 6.8.3. Show that the Deutsch-Jozsa problem can be viewed as a hidden sub-
group problem by finding 𝐺, 𝑋, 𝑓, and 𝐻 as in Definition 6.8.1, and show that finding
the hidden subgroup is equivalent to solving the Deutsch-Jozsa problem.

In Simon’s problem from Section 5.4, a black-box implementation of a function


𝑓 ∶ {0, 1}𝑛 → {0, 1}𝑛 is given with the property that there is 𝑠 ⃗ ∈ {0, 1}𝑛 , 𝑠 ⃗ ≠ 0, such
that for all 𝑥,⃗ 𝑦 ⃗ ∈ {0, 1}𝑛 we have 𝑓(𝑥)⃗ = 𝑓(𝑦)⃗ if and only if 𝑦 ⃗ = 𝑥⃗ ⊕ 𝑠.⃗ The task is to
find 𝑠.⃗ In the hidden subgroup version of Simon’s problem, we can set 𝐺 = ({0, 1}𝑛 , ⊕),
𝑋 = {0, 1}𝑛 and use 𝑓 from the original problem. The hidden subgroup is 𝐻 = {0,⃗ 𝑠}.⃗ Its
generating systems are (𝑠)⃗ and (0,⃗ 𝑠).⃗ Hence, if we find such a generating system, then
we have solved Simon’s problem. Conversely, the solution of Simon’s problem gives a
generating system of the hidden subgroup.

Exercise 6.8.4. Show that the generalization of Simon’s problem can be viewed as a
hidden subgroup problem by finding 𝐺, 𝑋, 𝑓, and 𝐻 as in Definition 6.8.1, and show
that finding the hidden subgroup is equivalent to solving the generalization of Simon’s
problem.

6.8.3. Hidden subgroup version of the order finding problem. In the order
finding problem from Section 6.4, an odd integer 𝑁 ∈ ℕ≥3 and 𝑎 ∈ ℤ𝑁 are given such
that gcd(𝑎, 𝑁) = 1. The problem is to find the order 𝑟 of 𝑎 modulo 𝑁. To frame this
problem as a hidden subgroup problem, we set 𝐺 = (ℤ, +), 𝑋 = ℤ𝑁 , 𝑓 ∶ 𝐺 → 𝑋,
𝑗 ↦ 𝑎𝑗 mod 𝑁. The hidden subgroup is 𝐻 = 𝑟ℤ. It has the property that for all 𝑗 ∈ ℤ
we have 𝑎𝑗 ≡ 1 mod 𝑁 if and only if 𝑗 ∈ 𝐻.
We show that the problem of finding the order 𝑟 of 𝑎 modulo 𝑁 is equivalent to
finding a finite generating system of 𝐻. First, we note that if we find the order 𝑟 of 𝑎
modulo 𝑁, then we know the generating system (𝑟) of 𝐻. To prove the converse, we
need the following result.
Lemma 6.8.5. Let 𝑟 ∈ ℕ and let 𝑚 ∈ ℕ and 𝐺 = (𝑟0 , . . . , 𝑟𝑚−1 ) ∈ ℤ𝑚 . Then 𝐺 is a
generating system of 𝑟ℤ if and only if gcd(𝑟0 , . . . , 𝑟𝑚−1 ) = 𝑟.

Proof. The lemma follows from Theorem 1.7.5 in [Buc04]. □

From Lemma 6.8.5 it follows that 𝑟 can be determined as the gcd of every gener-
ating system of 𝑟ℤ. Hence, finding a finite generating system of 𝐻 = 𝑟ℤ allows one to
find 𝑟.
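Lemma 6.8.5 translates directly into code: any finite generating system of 𝐻 = 𝑟ℤ reveals 𝑟 as the gcd of its elements (the sketch is ours):

```python
from math import gcd
from functools import reduce

def hidden_order(gens):
    """Recover r from a finite generating system of the hidden subgroup rZ:
    by Lemma 6.8.5 the gcd of any generating system of rZ equals r."""
    return reduce(gcd, gens)

# (60, 108, 168) generates 12Z, since gcd(5, 9, 14) = 1.
assert hidden_order([60, 108, 168]) == 12
# (24, 48) only generates 24Z, a proper subgroup of 12Z.
assert hidden_order([24, 48]) == 24
```

The second call shows why the elements must actually generate 𝑟ℤ: a tuple of multiples of 𝑟 whose gcd exceeds 𝑟 only generates a proper subgroup.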

6.8.4. Hidden subgroup version of the discrete logarithm problem. Next,


we show that the discrete logarithm problem can also be viewed as a hidden subgroup
problem. In the discrete logarithm problem as we have phrased it above, we are given
an odd integer 𝑁 and 𝑎, 𝑏 ∈ ℤ∗𝑁 such that 𝑏 ≡ 𝑎𝑡 for some 𝑡 ∈ ℤ𝑟 where 𝑟 is the order
of 𝑎 modulo 𝑁 which is also known. The task is to find the discrete logarithm 𝑡 of 𝑏 to
the base 𝑎. We set 𝐺 = (ℤ2𝑟 , +), 𝑋 = ℤ𝑁 , and 𝑓 ∶ 𝐺 → 𝑋, (𝑥, 𝑦) ↦ 𝑎𝑥 𝑏𝑦 . The hidden
subgroup is 𝐻 = ℤ𝑟 (1, −𝑡). We show that finding a finite generating system of 𝐻 is
equivalent to determining 𝑡. For this, we need the following lemma.
Lemma 6.8.6. Let 𝑟, 𝑡, 𝑚 ∈ ℕ and let ((𝑥0 , 𝑦0 ), . . . , (𝑥𝑚−1 , 𝑦𝑚−1 )) be a generating system
of 𝐻 = ℤ𝑟 (1, −𝑡) where (𝑥𝑖 , 𝑦 𝑖 ) ∈ ℤ2𝑟 for all 𝑖 ∈ ℤ𝑚 . Then 𝑑 = gcd(𝑥0 , . . . , 𝑥𝑚−1 ) is
coprime to 𝑟 and 𝑡 = 𝑑 ′ gcd(𝑦0 , . . . , 𝑦𝑚−1 ) mod 𝑟 where 𝑑 ′ is the inverse of 𝑑 modulo 𝑟.

Proof. Since ((𝑥0 , 𝑦0 ), . . . , (𝑥𝑚−1 , 𝑦𝑚−1 )) is a generating system of ℤ𝑟 (1, −𝑡), it follows
that there are 𝑢0 , . . . , 𝑢𝑚−1 ∈ ℤ with
𝑚−1
(6.8.1) ∑ 𝑢𝑖 (𝑥𝑖 , 𝑦 𝑖 ) ≡ (1, −𝑡) mod 𝑟.
𝑖=0

Since (𝑥𝑖 , 𝑦𝑖 ) ∈ ℤ𝑟 (1, −𝑡) for all 𝑖 ∈ ℤ𝑚 , (6.8.1) implies


𝑚−1
(6.8.2) ∑ 𝑢𝑖 𝑥𝑖 (1, −𝑡) ≡ (1, −𝑡) mod 𝑟.
𝑖=0

Let 𝑑 = gcd(𝑥0 , . . . , 𝑥𝑚−1 ) and let 𝑥𝑖′ = 𝑥𝑖 /𝑑 for all 𝑖 ∈ ℤ𝑚 . Then we obtain from (6.8.2)
𝑚−1
(6.8.3) 𝑑 ∑ 𝑢𝑖 𝑥𝑖′ ≡ 1 mod 𝑟.
𝑖=0

So 𝑑 is coprime to 𝑟. Since 𝑦 𝑖 ≡ −𝑡𝑥𝑖 mod 𝑟 for all 𝑖 ∈ ℤ𝑚 , it follows that


(6.8.4) gcd(𝑦0 , . . . , 𝑦𝑚−1 ) ≡ 𝑑𝑡 mod 𝑟.

So if 𝑑′ is the inverse of 𝑑 modulo 𝑟, then we have
(6.8.5) 𝑑 ′ gcd(𝑦0 , . . . , 𝑦𝑚−1 ) ≡ 𝑡 mod 𝑟
which implies the assertion of the lemma. □

Lemma 6.8.6 shows how to obtain the discrete logarithm 𝑡 of 𝑏 modulo 𝑁 to base 𝑎 from a generating system ((𝑥0 , 𝑦0 ), . . . , (𝑥𝑚−1 , 𝑦𝑚−1 )) of ℤ𝑟 (1, −𝑡). We compute 𝑑 = gcd(𝑥0 , . . . , 𝑥𝑚−1 ), find the inverse 𝑑′ of 𝑑 modulo 𝑟, and determine 𝑡 = 𝑑′ gcd(𝑦0 , . . . , 𝑦𝑚−1 ) mod 𝑟. Conversely, if we find the discrete logarithm 𝑡, then we know the generating system ((1, −𝑡)) of ℤ𝑟 (1, −𝑡).
Chapter 7

Quantum Search and Quantum Counting

This chapter explores the renowned search algorithm developed by Lov Grover [Gro96]
and the related counting algorithms devised by Gilles Brassard, Peter Høyer, and Alain
Tapp [BHT98]. These algorithms provide a quadratic acceleration of classical algo-
rithms for unstructured search and counting problems. This refers to situations in
which we are given black-box access to a function 𝑓 ∶ {0, 1}𝑛 → {0, 1} and our objective
is to discover an input 𝑥⃗ ∈ {0, 1}𝑛 that satisfies 𝑓(𝑥)⃗ = 1 or to count the number of such
inputs. Given the broad applicability of this problem, the algorithms elucidated in this
section find utility across diverse fields, including cryptography and machine learning,
and effectively amplify the efficiency of existing algorithms in these contexts.
The initial section of this chapter focuses on Grover’s search algorithm. It shows
that addressing the search problem necessitates only the measurement of a quantum
state, which can be effectively prepared. However, the success probability of this ap-
proach proves to be inadequate. Here the crucial technique of amplitude amplifica-
tion steps in. This technique serves to enhance the likelihood of obtaining a solution
from a measurement, thus achieving quadratic acceleration. In the subsequent part of
the chapter, the synergy between amplitude amplification and phase estimation, intro-
duced in the preceding chapter, is explored. This synergy yields a quantum counting
algorithm that, under certain conditions, also delivers a quadratic speedup.
In the complexity analyses of this chapter, we assume that all quantum circuits are
constructed using the elementary quantum gates provided by the platform discussed
in Section 4.12.2, along with implementations of operators 𝑈 𝑓 as specified in their re-
spective contexts.

255
256 7. Quantum Search and Quantum Counting

7.1. Quantum search


In this section, we present Grover’s quantum search algorithm, featuring the funda-
mental technique of amplitude amplification, which is employed in numerous other
quantum algorithms.

7.1.1. The classical search problem. The algorithm of Grover solves the un-
structured search problem. The classical version of this problem is as follows.
Problem 7.1.1 (Classical search problem).
Input: 𝑛 ∈ ℕ and a black-box that implements a function 𝑓 ∶ {0, 1}𝑛 → {0, 1}.
Output: A string 𝑥⃗ ∈ {0, 1}𝑛 with 𝑓(𝑥)⃗ = 1.

This problem appears in innumerable applications. For example, we can think


of {0, 1}𝑛 as the set of addresses of the entries in a database and we can think of the
function 𝑓 as marking the addresses 𝑥 of the entries that meet a search criterion. This
criterion is satisfied if 𝑓(𝑥) = 1 and is not satisfied if 𝑓(𝑥) = 0.
There are several variants of the search problem. For example, we may require
that the search problem has a solution, “No Solution” may be a permitted output, and
the number of solutions may be an input.
Requiring the domain of the function 𝑓 to be {0, 1}𝑛 is not a restriction. To see this,
identify {0, 1}𝑛 with ℤ2𝑛 as explained in Example 2.1.5. For any 𝑁 ∈ ℕ and any function
𝑓 ∶ ℤ𝑁 → {0, 1} the domain ℤ𝑁 can be simply extended to the larger domain ℤ2𝑛 where
𝑛 = ⌈log2 𝑁⌉ by assigning the value 0 to all inputs outside of ℤ𝑁 . Since |{0, 1}𝑛 | < 2𝑁,
the extended search space is smaller than twice the size of the original search space.
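The padding argument above can be illustrated with a short Python sketch; the function name `extend_domain` and the concrete 𝑓 with 𝑁 = 5 are our own example, not from the text.

```python
import math

def extend_domain(f, N):
    """Extend f: Z_N -> {0,1} to Z_(2**n), n = ceil(log2(N)), by assigning
    the value 0 to all inputs outside Z_N (illustrative sketch)."""
    n = math.ceil(math.log2(N))
    def f_ext(x):
        return f(x) if x < N else 0   # padded inputs are never solutions
    return n, f_ext

# hypothetical example: N = 5 with solutions at 2 and 4
n, g = extend_domain(lambda x: 1 if x in (2, 4) else 0, 5)
```

Here 2^𝑛 < 2𝑁, so the padded search space is indeed less than twice the size of the original one.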
On a classical computer, solving the search problem requires conducting 2𝑛 eval-
uations of the function 𝑓 in the worst case. However, as we will see in the sequel,
Grover’s algorithm is significantly faster, as it offers a quadratic speedup.
This has many applications. For example, Grover’s algorithm can be used to search
for a secret key in a symmetric encryption scheme, such as AES (e.g., see [GLRS16]).
The quadratic speedup leads to the need to double the key length as protection against
quantum computer attacks. Grover’s algorithm can also be used to crack passwords
stored in hashed form (e.g., see [DGM+ 21]). Another application of Grover’s algorithm
is quantum machine learning (e.g., see [BWP+ 17]).

7.1.2. The Grover search algorithm with a known number of solutions. In


this section, we present the first quantum search algorithm that requires the number
of solutions of the search problem to be known. As in all quantum search problems
presented in this chapter, the function 𝑓 is replaced by the unitary operator
(7.1.1) 𝑈 𝑓 ∶ ℍ 𝑛 ⊗ ℍ1 → ℍ 𝑛 ⊗ ℍ1
that has been introduced in (5.3.2). For all 𝑥⃗ ∈ {0, 1}𝑛 and 𝑦 ∈ {0, 1} it satisfies
(7.1.2) 𝑈 𝑓 |𝑥⟩⃗ |𝑦⟩ = |𝑥⟩⃗ |𝑓(𝑥)⃗ ⊕ 𝑦⟩ = |𝑥⟩⃗ 𝑋 𝑓(𝑥)⃗ |𝑦⟩ .
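For small 𝑛, the action (7.1.2) can be checked directly by building 𝑈𝑓 as a permutation matrix; the basis ordering |𝑥⃗⟩|𝑦⟩ ↦ index 2𝑥 + 𝑦 is a convention of this sketch, not of the text.

```python
import numpy as np

def U_f_matrix(f, n):
    """Matrix of U_f on H_n (x) H_1: maps |x>|y> to |x>|f(x) XOR y>.
    Basis state |x>|y> is given index 2*x + y (our convention)."""
    N = 2 ** n
    U = np.zeros((2 * N, 2 * N))
    for x in range(N):
        for y in range(2):
            U[2 * x + (f(x) ^ y), 2 * x + y] = 1.0
    return U

# hypothetical f marking only x = 3, with n = 2
U = U_f_matrix(lambda x: 1 if x == 3 else 0, 2)
```

Since XOR-ing 𝑦 with 𝑓(𝑥⃗) twice undoes itself, 𝑈𝑓 is its own inverse, which is easy to confirm on the matrix.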
The problem we aim to solve is the following.

Problem 7.1.2 (Quantum search problem with a known number of solutions).

Input: 𝑛 ∈ ℕ, a black-box that implements 𝑈 𝑓 for a function 𝑓 ∶ {0, 1}𝑛 → {0, 1},
𝑀 = |𝑓−1 (1)|. It is assumed that 𝑀 > 0.

Output: A string 𝑥⃗ ∈ {0, 1}𝑛 with 𝑓(𝑥)⃗ = 1.

In the explanation of the Grover algorithm that solves Problem 7.1.2, we use the
notation and assumptions of this problem and set 𝑁 = 2𝑛 .
First, we explain how the search problem can be solved by measuring the quantum state

(7.1.3) |𝑠⟩ = (1/√𝑁) ∑_{𝑥⃗∈{0,1}^𝑛} |𝑥⃗⟩ .

Set

(7.1.4) |𝑠0 ⟩ = (1/√(𝑁 − 𝑀)) ∑_{𝑥⃗∈{0,1}^𝑛, 𝑓(𝑥⃗)=0} |𝑥⃗⟩ ,  |𝑠1 ⟩ = (1/√𝑀) ∑_{𝑥⃗∈{0,1}^𝑛, 𝑓(𝑥⃗)=1} |𝑥⃗⟩ ,

and

(7.1.5) 𝜃 = arcsin √(𝑀/𝑁) .

Then the following proposition holds.

Proposition 7.1.3. We have

(7.1.6) |𝑠⟩ = √((𝑁 − 𝑀)/𝑁) |𝑠0 ⟩ + √(𝑀/𝑁) |𝑠1 ⟩ = cos 𝜃 |𝑠0 ⟩ + sin 𝜃 |𝑠1 ⟩ .
Exercise 7.1.4. Prove Proposition 7.1.3.
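Exercise 7.1.4 asks for a proof; the decomposition can at least be spot-checked numerically. The following sketch uses a small made-up 𝑓 (our example) and the normalized states |𝑠0⟩, |𝑠1⟩ of (7.1.4).

```python
import numpy as np

n = 3
N = 2 ** n
f = lambda x: 1 if x in (1, 4, 6) else 0       # hypothetical f with M = 3
M = sum(f(x) for x in range(N))
theta = np.arcsin(np.sqrt(M / N))

s = np.full(N, 1 / np.sqrt(N))                          # |s>, equation (7.1.3)
s0 = np.array([1 - f(x) for x in range(N)]) / np.sqrt(N - M)   # |s0>
s1 = np.array([f(x) for x in range(N)]) / np.sqrt(M)           # |s1>
decomposed = np.cos(theta) * s0 + np.sin(theta) * s1           # (7.1.6)
```

The check confirms that |𝑠0⟩ and |𝑠1⟩ are orthonormal and that (7.1.6) holds for this instance.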

If the quantum state |𝑠⟩ is measured in the computational basis of ℍ𝑛 , then by Proposition 7.1.3, the probability of obtaining 𝑥⃗ with 𝑓(𝑥⃗) = 1 is

(7.1.7) 𝑝 = sin²𝜃 = 𝑀/𝑁 .
This is exactly the probability of correctly guessing a solution to the search problem,
and therefore this simple quantum strategy does not provide an advantage over classi-
cal solutions. To enhance the success probability of the quantum strategy, we employ a
technique called amplitude amplification which significantly increases the amplitude
of the state |𝑠1 ⟩ and therefore the probability of finding a solution. Amplitude amplifi-
cation uses a unitary operator 𝐺, called the Grover iterator. For every 𝛼 ∈ ℝ it satisfies

(7.1.8) 𝐺(cos 𝛼 |𝑠0 ⟩ + sin 𝛼 |𝑠1 ⟩) = cos(𝛼 + 2𝜃) |𝑠0 ⟩ + sin(𝛼 + 2𝜃) |𝑠1 ⟩ .

So, it follows from (7.1.6) and (7.1.8) that for every 𝑘 ∈ ℕ0 we have

(7.1.9) 𝐺^𝑘 |𝑠⟩ = cos((2𝑘 + 1)𝜃) |𝑠0 ⟩ + sin((2𝑘 + 1)𝜃) |𝑠1 ⟩ .



[Circuit: |0⟩𝑛 — 𝐻^⊗𝑛 — 𝐺 ⋯ 𝐺 (𝑘 times) — measurement yielding 𝑥⃗]

Figure 7.1.1. The quantum circuit for Grover’s search algorithm.

The Grover quantum search algorithm is shown in Figure 7.1.1 and Algorithm
7.1.6. Its input is the state |0⟩𝑛 . Then the algorithm constructs
(7.1.10) |𝑠⟩ = 𝐻^⊗𝑛 |0⟩𝑛 .

This equation holds by Lemma 5.3.6. Subsequently, the algorithm applies 𝐺 𝑘 to |𝑠⟩ and
measures the resulting quantum state in the computational basis of ℍ𝑛 . The number
𝑘 is chosen so that (2𝑘 + 1)𝜃 is as close as possible to 𝜋/2. By (7.1.9), this maximizes the
probability that the algorithm finds 𝑥⃗ ∈ {0, 1}𝑛 with 𝑓(𝑥)⃗ = 1. In Theorem 7.1.21 we
will estimate the number 𝑘 of applications of 𝐺 required in the search algorithm and
the success probability of the algorithm. Before we state and prove this theorem, we
construct the Grover iterator in the next section. Note that Algorithm 7.1.6 receives
as input the black-box implementing 𝑈 𝑓 but applies the Grover iterator 𝐺. In Section
7.1.4, we will explain how 𝐺 can be efficiently implemented using 𝑈 𝑓 .

7.1.3. The Grover iterator. We now explain the construction of the Grover iter-
ator and prove its properties.

Definition 7.1.5. We define the following operators on ℍ𝑛 :


(7.1.11) 𝑈1 = 𝐼 − 2 |𝑠1 ⟩ ⟨𝑠1 | , 𝑈𝑠 = 2 |𝑠⟩ ⟨𝑠| − 𝐼
where 𝐼 denotes the identity operator on ℍ𝑛 . Then the Grover iterator is defined as the
operator
(7.1.12) 𝐺 = 𝑈𝑠 𝑈1 .
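For a small instance one can build 𝑈1, 𝑈𝑠, and 𝐺 as matrices and confirm the rotation property (7.1.8) directly; the specific 𝑓 below is a made-up example.

```python
import numpy as np

n = 3
N = 2 ** n
f = lambda x: 1 if x in (2, 5) else 0          # hypothetical f with M = 2
M = sum(f(x) for x in range(N))
theta = np.arcsin(np.sqrt(M / N))

s = np.full(N, 1 / np.sqrt(N))
s1 = np.array([f(x) for x in range(N)]) / np.sqrt(M)
s0 = (s - np.sin(theta) * s1) / np.cos(theta)  # solve (7.1.6) for |s0>

U1 = np.eye(N) - 2 * np.outer(s1, s1)          # I - 2|s1><s1|
Us = 2 * np.outer(s, s) - np.eye(N)            # 2|s><s| - I
G = Us @ U1                                    # Grover iterator

alpha = 0.3                                    # arbitrary angle
rotated = G @ (np.cos(alpha) * s0 + np.sin(alpha) * s1)
expected = np.cos(alpha + 2 * theta) * s0 + np.sin(alpha + 2 * theta) * s1
```

The same matrices also confirm Proposition 7.1.7: 𝑈1 and 𝑈𝑠 are involutions and 𝐺 is unitary.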

Algorithm 7.1.6. Grover algorithm for search problems with known number of solu-
tions
Input: 𝑛 ∈ ℕ, a black-box implementing 𝑈 𝑓 for some 𝑓 ∶ {0, 1}𝑛 → {0, 1}, and 𝑀 =
|𝑓−1 (1)|. It is assumed that 𝑀 > 0.
Output: 𝑥 ∈ {0, 1}𝑛 such that 𝑓(𝑥) = 1
1: QSearch(𝑛, 𝑈 𝑓 , 𝑀)
2: 𝑘 ← ⌊𝜋/(4𝜃)⌋ where 𝜃 = arcsin √(𝑀/𝑁)
3: Apply the quantum circuit from Figure 7.1.1, the result being 𝑥⃗ ∈ {0, 1}𝑛
4: end
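A statevector simulation of Algorithm 7.1.6 for a toy instance (our own example; the success probability is read off the final state instead of sampling a measurement):

```python
import numpy as np

def grover_success_probability(f, n, M):
    """Simulate Algorithm 7.1.6 and return the probability that the final
    measurement yields x with f(x) = 1 (sketch; no sampling)."""
    N = 2 ** n
    theta = np.arcsin(np.sqrt(M / N))
    k = int(np.pi / (4 * theta))               # k = floor(pi / (4 theta))
    s = np.full(N, 1 / np.sqrt(N))
    marked = np.array([f(x) for x in range(N)], dtype=float)
    U1 = np.eye(N) - 2 * np.outer(marked, marked) / M   # I - 2|s1><s1|
    G = (2 * np.outer(s, s) - np.eye(N)) @ U1           # Grover iterator
    state = np.linalg.matrix_power(G, k) @ s            # G^k |s>
    return float(np.sum(state[marked == 1] ** 2))

# hypothetical instance: n = 4, single solution x = 13
p = grover_success_probability(lambda x: 1 if x == 13 else 0, 4, 1)
```

For this instance the success probability exceeds 1 − 𝑀/𝑁, in line with Theorem 7.1.21 below.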

The operator 𝑈𝑠 is sometimes also called the Grover diffusion operator. The next
proposition states basic properties of the operators in Definition 7.1.5.

[Figure: |𝜓⟩ = cos 𝛼 |𝑠0 ⟩ + sin 𝛼 |𝑠1 ⟩ and its reflection 𝑈1 |𝜓⟩ = cos 𝛼 |𝑠0 ⟩ − sin 𝛼 |𝑠1 ⟩ across |𝑠0 ⟩, each at angle 𝛼 to |𝑠0 ⟩]

Figure 7.1.2. Applying 𝑈 1 to |𝜓⟩ = cos 𝛼 |𝑠0 ⟩ + sin 𝛼 |𝑠1 ⟩.

Proposition 7.1.7. (1) The operators 𝑈1 and 𝑈𝑠 are unitary and Hermitian involu-
tions.
(2) The Grover iterator 𝐺 is unitary.
Exercise 7.1.8. Prove Proposition 7.1.7.

Next, we present the geometric properties of 𝑈1 and 𝑈𝑠 . For this, we define the
complex plane
(7.1.13) 𝑃 = ℂ |𝑠0 ⟩ + ℂ |𝑠1 ⟩ .
We note that (|𝑠0 ⟩ , |𝑠1 ⟩) is an orthonormal basis of 𝑃. The next proposition states an
important geometric property of 𝑈1 .
Proposition 7.1.9. The operator 𝑈1 acts as a reflection in the plane 𝑃 across |𝑠0 ⟩. In
particular, for all 𝛼 ∈ ℝ we have
(7.1.14) 𝑈1 (cos 𝛼 |𝑠0 ⟩ + sin 𝛼 |𝑠1 ⟩) = cos 𝛼 |𝑠0 ⟩ − sin 𝛼 |𝑠1 ⟩ .

Proposition 7.1.9 is illustrated in Figure 7.1.2 and proved in Exercise 7.1.10.


Exercise 7.1.10. Prove Proposition 7.1.9.

In order to describe the geometric meaning of 𝑈𝑠 we define the quantum state


(7.1.15) |𝑠⟂ ⟩ = − sin 𝜃 |𝑠0 ⟩ + cos 𝜃 |𝑠1 ⟩
which is in 𝑃 and orthogonal to |𝑠⟩. We also define the matrix
(7.1.16) 𝑇 = ( cos 𝜃  −sin 𝜃
               sin 𝜃   cos 𝜃 ) .
As shown in Exercise 7.1.11, the matrix 𝑇 is unitary, (|𝑠⟩ , |𝑠⟂ ⟩) is another orthonormal
basis of the plane 𝑃, and we have
(7.1.17) (|𝑠⟩ , |𝑠⟂ ⟩) = (|𝑠0 ⟩ , |𝑠1 ⟩) 𝑇, (|𝑠0 ⟩ , |𝑠1 ⟩) = (|𝑠⟩ , |𝑠⟂ ⟩) 𝑇 ∗ .
Exercise 7.1.11. (1) Show that the matrix 𝑇 is unitary.
(2) Prove that (|𝑠0 ⟩ , |𝑠1 ⟩) and (|𝑠⟩ , |𝑠⟂ ⟩) are orthonormal bases of the plane 𝑃 and verify
(7.1.17).

The next proposition is the desired geometric interpretation of the operator 𝑈𝑠 .



[Figure: |𝜓⟩ = cos 𝛽 |𝑠⟩ + sin 𝛽 |𝑠⟂ ⟩ and its reflection 𝑈𝑠 |𝜓⟩ = cos 𝛽 |𝑠⟩ − sin 𝛽 |𝑠⟂ ⟩ across |𝑠⟩, each at angle 𝛽 to |𝑠⟩]

Figure 7.1.3. Applying 𝑈𝑠 to |𝜓⟩ = cos 𝛽 |𝑠⟩ + sin 𝛽 |𝑠⟂ ⟩.

Proposition 7.1.12. The operator 𝑈𝑠 acts as a reflection in the plane 𝑃 across |𝑠⟩. In
particular, for all 𝛼 ∈ ℝ we have

(7.1.18) 𝑈𝑠 (cos 𝛼 |𝑠⟩ + sin 𝛼 |𝑠⟂ ⟩) = cos 𝛼 |𝑠⟩ − sin 𝛼 |𝑠⟂ ⟩ .

The proposition is illustrated in Figure 7.1.3 and proved in Exercise 7.1.13.

Exercise 7.1.13. Prove Proposition 7.1.12.

Let 𝛼 ∈ ℝ. Using Propositions 7.1.9 and 7.1.12 we can describe the action of the
Grover iterator on a quantum state |𝜓⟩ = cos 𝛼 |𝑠0 ⟩ + sin 𝛼 |𝑠1 ⟩ geometrically. This is
illustrated in Figure 7.1.4. Since applying 𝑈1 to |𝜓⟩ means reflecting |𝜓⟩ across |𝑠0 ⟩,
the angle between |𝑠0 ⟩ and 𝑈1 |𝜓⟩ is 𝛼 mod 2𝜋. So, the angle between |𝑠⟩ and 𝑈1 |𝜓⟩ is
𝛼 + 𝜃 mod 2𝜋. Next, applying 𝑈𝑠 to 𝑈1 |𝜓⟩ means reflecting 𝑈1 |𝜓⟩ across |𝑠⟩. So the
angle between 𝐺 |𝜓⟩ = 𝑈𝑠 𝑈1 |𝜓⟩ and |𝑠⟩ is 𝛼 + 𝜃 mod 2𝜋 and the angle between |𝑠0 ⟩ and
𝐺 |𝜓⟩ is 𝛼 + 2𝜃 mod 2𝜋. So we have

(7.1.19) 𝐺 |𝜓⟩ = cos(𝛼 + 2𝜃) |𝑠0 ⟩ + sin(𝛼 + 2𝜃) |𝑠1 ⟩ .

To verify (7.1.19) also algebraically, we need one more proposition.

Proposition 7.1.14. Let 𝛼 ∈ ℝ, then we have

(7.1.20) cos 𝛼 |𝑠⟩ + sin 𝛼 |𝑠⟂ ⟩ = cos(𝛼 + 𝜃) |𝑠0 ⟩ + sin(𝛼 + 𝜃) |𝑠1 ⟩

and

(7.1.21) cos 𝛼 |𝑠0 ⟩ − sin 𝛼 |𝑠1 ⟩ = cos(𝛼 + 𝜃) |𝑠⟩ − sin(𝛼 + 𝜃) |𝑠⟂ ⟩ .

Exercise 7.1.15. Prove Proposition 7.1.14.

Here is the algebraic verification of (7.1.19).

Proposition 7.1.16. Let 𝛼 ∈ ℝ. Then

(7.1.22) 𝐺 (cos 𝛼 |𝑠0 ⟩ + sin 𝛼 |𝑠1 ⟩) = cos(𝛼 + 2𝜃) |𝑠0 ⟩ + sin(𝛼 + 2𝜃) |𝑠1 ⟩ .

[Figure: |𝜓⟩ = cos 𝛼 |𝑠0 ⟩ + sin 𝛼 |𝑠1 ⟩ is reflected by 𝑈1 across |𝑠0 ⟩ to 𝑈1 |𝜓⟩ = cos 𝛼 |𝑠0 ⟩ − sin 𝛼 |𝑠1 ⟩, then by 𝑈𝑠 across |𝑠⟩ = cos 𝜃 |𝑠0 ⟩ + sin 𝜃 |𝑠1 ⟩ to 𝐺 |𝜓⟩ = cos(𝛼 + 2𝜃) |𝑠0 ⟩ + sin(𝛼 + 2𝜃) |𝑠1 ⟩]

Figure 7.1.4. Applying the Grover iterator 𝐺 to |𝜓⟩ = cos 𝛼 |𝑠0 ⟩ + sin 𝛼 |𝑠1 ⟩.

Proof. From Propositions 7.1.9, 7.1.12, and 7.1.14 we obtain


𝐺 (cos 𝛼 |𝑠0 ⟩ + sin 𝛼 |𝑠1 ⟩)
= 𝑈𝑠 𝑈1 (cos 𝛼 |𝑠0 ⟩ + sin 𝛼 |𝑠1 ⟩)
= 𝑈𝑠 (cos 𝛼 |𝑠0 ⟩ − sin 𝛼 |𝑠1 ⟩)
(7.1.23) = 𝑈𝑠 (cos(𝛼 + 𝜃) |𝑠⟩ − sin(𝛼 + 𝜃) |𝑠⟂ ⟩)
= cos(𝛼 + 𝜃) |𝑠⟩ + sin(𝛼 + 𝜃) |𝑠⟂ ⟩
= cos(𝛼 + 2𝜃) |𝑠0 ⟩ + sin(𝛼 + 2𝜃) |𝑠1 ⟩ . □

7.1.4. Implementation of the Grover iterator. The goal of this section is to


show that the Grover iterator 𝐺 can be efficiently implemented using the operator 𝑈 𝑓 .
We use the quantum states |𝑠⟩, |𝑠⟂ ⟩, |𝑠0 ⟩, |𝑠1 ⟩, the plane 𝑃 = ℂ |𝑠⟩+ℂ |𝑠⟂ ⟩ = ℂ |𝑠0 ⟩+ℂ |𝑠1 ⟩,
and the operators 𝑈1 and 𝑈𝑠 that were introduced in Section 7.1.3.
Figure 7.1.5 shows a quantum circuit that implements the operator 𝑈1 on the plane
𝑃. Its correctness is stated in the next proposition.

Proposition 7.1.17. The circuit in Figure 7.1.5 implements the operator 𝑈1 in the plane
𝑃. It applies the black-box for 𝑈 𝑓 once and uses four additional elementary quantum
gates.

Proof. We will prove that the circuit computes 𝑈1 |𝑠0 ⟩ and 𝑈1 |𝑠1 ⟩ correctly. This suf-
fices since 𝑈1 is linear and (|𝑠0 ⟩ , |𝑠1 ⟩) is a basis of the plane 𝑃. Let 𝑗 ∈ {0, 1}. First, we

[Circuit: first register |𝜓⟩ — 𝑈 𝑓 — output 𝑈1 |𝜓⟩; ancilla |1⟩ — 𝐻 — 𝑈 𝑓 — 𝐻 — traced out; intermediate states |𝜓0 ⟩, |𝜓1 ⟩, |𝜓2 ⟩, |𝜓3 ⟩ marked after each stage]

Figure 7.1.5. Implementation of 𝑈 1 .

note that

𝑈1 |𝑠𝑗 ⟩ = (𝐼 − 2 |𝑠1 ⟩ ⟨𝑠1 |) |𝑠𝑗 ⟩ = (−1)^𝑗 |𝑠𝑗 ⟩ ,
𝑈 𝑓 |𝑠𝑗 ⟩ |0⟩ = |𝑠𝑗 ⟩ |0⟩ if 𝑗 = 0 and 𝑈 𝑓 |𝑠𝑗 ⟩ |0⟩ = |𝑠𝑗 ⟩ |1⟩ if 𝑗 = 1,
𝑈 𝑓 |𝑠𝑗 ⟩ |1⟩ = |𝑠𝑗 ⟩ |1⟩ if 𝑗 = 0 and 𝑈 𝑓 |𝑠𝑗 ⟩ |1⟩ = |𝑠𝑗 ⟩ |0⟩ if 𝑗 = 1,

and therefore

(7.1.24) 𝑈 𝑓 |𝑠𝑗 ⟩ |𝑥− ⟩ = (𝑈 𝑓 |𝑠𝑗 ⟩ |0⟩ − 𝑈 𝑓 |𝑠𝑗 ⟩ |1⟩)/√2 = (−1)^𝑗 |𝑠𝑗 ⟩ |𝑥− ⟩ = (𝑈1 |𝑠𝑗 ⟩) |𝑥− ⟩ .

This allows us to determine the intermediate states in the circuit. They are

|𝜓0 ⟩ = |𝑠𝑗 ⟩ |1⟩ ,
|𝜓1 ⟩ = |𝑠𝑗 ⟩ |𝑥− ⟩ ,
|𝜓2 ⟩ = 𝑈 𝑓 |𝑠𝑗 ⟩ |𝑥− ⟩ = (𝑈1 |𝑠𝑗 ⟩) |𝑥− ⟩ ,
|𝜓3 ⟩ = (𝑈1 |𝑠𝑗 ⟩) |1⟩ .
This concludes the proof of the proposition. □
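The circuit of Figure 7.1.5 can be simulated directly. For 𝑀 = 1 the operator 𝐼 − 2|𝑠1⟩⟨𝑠1| agrees with the phase flip (−1)^𝑓(𝑥⃗) on every basis state, so the sketch below (toy 𝑓, our basis convention |𝑥⃗⟩|𝑦⟩ ↦ 2𝑥 + 𝑦) compares against that diagonal operator.

```python
import numpy as np

n = 2
N = 2 ** n
f = lambda x: 1 if x == 3 else 0               # single marked element, M = 1

# U_f on H_n (x) H_1, basis index 2*x + y
Uf = np.zeros((2 * N, 2 * N))
for x in range(N):
    for y in range(2):
        Uf[2 * x + (f(x) ^ y), 2 * x + y] = 1.0

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
IH = np.kron(np.eye(N), H)                     # Hadamard on the ancilla only

rng = np.random.default_rng(0)
psi = rng.normal(size=N)
psi /= np.linalg.norm(psi)                     # arbitrary input state |psi>

inp = np.kron(psi, np.array([0.0, 1.0]))       # |psi>|1>
out = IH @ Uf @ IH @ inp                       # the circuit of Figure 7.1.5

U1 = np.diag([(-1.0) ** f(x) for x in range(N)])    # I - 2|s1><s1| for M = 1
expected = np.kron(U1 @ psi, np.array([0.0, 1.0]))  # (U1|psi>)|1>
```

As in the proof, the ancilla is turned into |𝑥−⟩, picks up the phase (−1)^𝑓(𝑥⃗) under 𝑈𝑓, and is returned to |1⟩.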

Next, Figure 7.1.6 shows an implementation of the operator −𝑈𝑠 ; its correctness is stated in Proposition 7.1.18. Since global phase factors do not change measurement outcomes, this is as good as implementing 𝑈𝑠 .
Proposition 7.1.18. The circuit in Figure 7.1.6 implements −𝑈𝑠 on the plane 𝑃. It has
size O(𝑛).

[Circuit: each wire |𝑥𝑖 ⟩ passes through 𝐻 and 𝑋; a 𝐶^{𝑛−1}(𝑍) gate acts with its 𝑍 on the last wire; then 𝑋 and 𝐻 again on each wire; the output is −𝑈𝑠 |𝑥0 ⋯ 𝑥𝑛−1 ⟩]

Figure 7.1.6. Implementation of −𝑈𝑠 .



Proof. By the definition of |𝑠⟩ and 𝑈𝑠 we have

(7.1.25) 𝑈𝑠 = 𝐻^⊗𝑛 (2 |0⟩𝑛 ⟨0|𝑛 − 𝐼) 𝐻^⊗𝑛 .

Set

(7.1.26) 𝑉 = 2 |0⟩𝑛 ⟨0|𝑛 − 𝐼 .

To verify that the circuit implements −𝑈𝑠 , it suffices to show that

(7.1.27) 𝑉 = −𝑋^⊗𝑛 𝐶^{𝑛−1}(𝑍) 𝑋^⊗𝑛 .

In order to prove this equation, let 𝑥⃗ ∈ {0, 1}^𝑛 . Then

(7.1.28) 𝑉 |𝑥⃗⟩ = { −|𝑥⃗⟩ if 𝑥⃗ ≠ 0⃗;  |𝑥⃗⟩ if 𝑥⃗ = 0⃗ }.
We also have

(7.1.29) 𝑋^⊗𝑛 |𝑥⃗⟩ = |¬𝑥⃗⟩

where ¬𝑥⃗ denotes the string in {0, 1}^𝑛 that is obtained by negating all entries in 𝑥⃗. We now show that

(7.1.30) 𝐶^{𝑛−1}(𝑍) 𝑋^⊗𝑛 |𝑥⃗⟩ = 𝐶^{𝑛−1}(𝑍) |¬𝑥⃗⟩ = { −|¬𝑥⃗⟩ if 𝑥⃗ = 0⃗;  |¬𝑥⃗⟩ if 𝑥⃗ ≠ 0⃗ }.

If 𝑥⃗ = 0⃗, then ¬𝑥⃗ = (1, 1, . . . , 1) with 𝑛 ones. Hence 𝐶^{𝑛−1}(𝑍) |¬𝑥⃗⟩ applies the Pauli 𝑍 gate to the last qubit |1⟩, which becomes −|1⟩. This implies that the state |¬𝑥⃗⟩ is changed to −|¬𝑥⃗⟩. Assume that 𝑥⃗ ≠ 0⃗. Then at least one of the entries of ¬𝑥⃗ is 0. If one of the first 𝑛 − 1 entries is 0, then 𝑍 is not applied to the last qubit of |¬𝑥⃗⟩, which means that |¬𝑥⃗⟩ is not changed by applying 𝐶^{𝑛−1}(𝑍). But if the first 𝑛 − 1 entries of ¬𝑥⃗ are 1, then its last entry is 0 and 𝑍 is applied to the last qubit |0⟩ of |¬𝑥⃗⟩, which does not change this qubit. So, again, |¬𝑥⃗⟩ remains unchanged by applying 𝐶^{𝑛−1}(𝑍). In summary, (7.1.30) holds, which together with (7.1.28) implies

(7.1.31) 𝑋^⊗𝑛 𝐶^{𝑛−1}(𝑍) 𝑋^⊗𝑛 |𝑥⃗⟩ = { −|𝑥⃗⟩ if 𝑥⃗ = 0⃗;  |𝑥⃗⟩ if 𝑥⃗ ≠ 0⃗ } = −𝑉 |𝑥⃗⟩ ,

which proves (7.1.27).

We estimate the size of the circuit. It uses O(𝑛) Pauli 𝑋 and Hadamard gates and one 𝐶^{𝑛−1}(𝑍) operator which by Proposition 4.9.11 and Corollary 4.12.8 can be implemented using O(𝑛) elementary quantum gates. So in total, the circuit has size O(𝑛). □
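Equation (7.1.27), and with it the whole circuit identity 𝐻^⊗𝑛 𝑋^⊗𝑛 𝐶^{𝑛−1}(𝑍) 𝑋^⊗𝑛 𝐻^⊗𝑛 = −𝑈𝑠, can be verified numerically for small 𝑛; the following is a sketch for 𝑛 = 3.

```python
import numpy as np
from functools import reduce

n = 3
N = 2 ** n
H1 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
X1 = np.array([[0.0, 1.0], [1.0, 0.0]])
Hn = reduce(np.kron, [H1] * n)                 # H tensor n
Xn = reduce(np.kron, [X1] * n)                 # X tensor n

CZ = np.eye(N)
CZ[N - 1, N - 1] = -1.0                        # C^(n-1)(Z): phase -1 only on |1...1>

circuit = Hn @ Xn @ CZ @ Xn @ Hn               # circuit of Figure 7.1.6

s = np.full(N, 1 / np.sqrt(N))
Us = 2 * np.outer(s, s) - np.eye(N)            # 2|s><s| - I

V = -np.eye(N)
V[0, 0] = 1.0                                  # V = 2|0><0| - I, equation (7.1.26)
```

The assertions confirm both (7.1.27) and Proposition 7.1.18 for this instance.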

From Propositions 7.1.17 and 7.1.18 we obtain the following result.

Proposition 7.1.19. The Grover iterator 𝐺 can be implemented using one black-box for
𝑈 𝑓 and O(𝑛) additional elementary quantum gates.

Exercise 7.1.20. Prove Proposition 7.1.19.



7.1.5. Analysis of the search algorithm with known number of solutions.


After the preparations of the previous sections, we can now estimate the success prob-
ability and the complexity of the Grover search algorithm with a known number of
solutions.
Theorem 7.1.21. Let 𝑛 ∈ ℕ, 𝑁 = 2^𝑛 , 𝑓 ∶ {0, 1}^𝑛 → {0, 1}, 𝑀 = |𝑓^{−1}(1)| > 0. On input of 𝑛, a black-box that implements 𝑈 𝑓 , and 𝑀, Algorithm 7.1.6 returns with probability at least 1 − 𝑀/𝑁 a string 𝑥⃗ ∈ {0, 1}^𝑛 such that 𝑓(𝑥⃗) = 1. The algorithm applies the black-box for 𝑈 𝑓 at most (𝜋/4)√(𝑁/𝑀) times and uses O(log 𝑁 · √(𝑁/𝑀)) additional elementary quantum gates.

Proof. It follows from Proposition 7.1.16 that the final quantum state produced by the algorithm is

(7.1.32) |𝜓⟩ = 𝐺^𝑘 |𝑠⟩ = cos((2𝑘 + 1)𝜃) |𝑠0 ⟩ + sin((2𝑘 + 1)𝜃) |𝑠1 ⟩

where

(7.1.33) 𝜃 = arcsin √(𝑀/𝑁)  and  𝑘 = ⌊𝜋/(4𝜃)⌋ .

Then the algorithm measures |𝜓⟩ in the computational basis of ℍ𝑛 . It follows from the definition of |𝑠0 ⟩ and |𝑠1 ⟩ that this gives 𝑥⃗ such that 𝑓(𝑥⃗) = 1 with probability

(7.1.34) 𝑝 = sin²((2𝑘 + 1)𝜃) .

To prove the theorem, we estimate 𝑘 and 𝑝. It follows from Corollary A.5.6 and (7.1.33) that

(7.1.35) 𝑘 ≤ 𝜋/(4𝜃) = 𝜋/(4 arcsin √(𝑀/𝑁)) ≤ (𝜋/4)√(𝑁/𝑀) .
To estimate 𝑝 we observe that

(7.1.36) 0 < 𝜃 ≤ 𝜋/2 .

We also set

(7.1.37) 𝑘̃ = 𝜋/(4𝜃) − 1/2 .

Then

(7.1.38) (2𝑘̃ + 1)𝜃 = 𝜋/2 − 𝜃 + 𝜃 = 𝜋/2 .

Also, the choice of 𝑘 in (7.1.33) implies

(7.1.39) 0 ≤ 𝜋/(4𝜃) − 𝑘 < 1

and therefore

(7.1.40) −1/2 ≤ 𝜋/(4𝜃) − 1/2 − 𝑘 = 𝑘̃ − 𝑘 < 1/2

which implies

(7.1.41) |𝑘 − 𝑘̃| ≤ 1/2 .

It follows that

(7.1.42) |(2𝑘 + 1)𝜃 − (2𝑘̃ + 1)𝜃| = |2(𝑘 − 𝑘̃)𝜃| ≤ 𝜃 .

By (7.1.38) we have sin((2𝑘̃ + 1)𝜃) = 1 and cos((2𝑘̃ + 1)𝜃) = 0. These equations, the trigonometric identity (A.5.3), and equations (7.1.42) and (7.1.36) imply

(7.1.43)
|cos((2𝑘 + 1)𝜃)|
= |cos((2𝑘 + 1)𝜃) sin((2𝑘̃ + 1)𝜃) − cos((2𝑘̃ + 1)𝜃) sin((2𝑘 + 1)𝜃)|
= |sin((2𝑘̃ + 1)𝜃 − (2𝑘 + 1)𝜃)|
= sin |(2𝑘̃ + 1)𝜃 − (2𝑘 + 1)𝜃| ≤ sin 𝜃 .

Therefore, the failure probability of the Grover search algorithm after 𝑘 iterations is

(7.1.44) cos²((2𝑘 + 1)𝜃) ≤ sin²𝜃 = 𝑀/𝑁
which implies the assertion about the success probability.
We estimate the complexity of the algorithm. It follows from (7.1.35) that the number of applications of the Grover iterator in the algorithm is bounded by (𝜋/4)√(𝑁/𝑀). So it follows from Proposition 7.1.19 that the algorithm invokes the black-box for 𝑈 𝑓 at most (𝜋/4)√(𝑁/𝑀) times and uses O(log 𝑁 · √(𝑁/𝑀)) additional elementary quantum gates. □

7.1.6. A search algorithm with an unknown number of solutions. Again,


let 𝑓 ∶ {0, 1}𝑛 → {0, 1}, 𝑁 = 2𝑛 , and assume that 𝑀 = |𝑓^{−1}(1)| > 0. We want to find
𝑥⃗ ∈ {0, 1}𝑛 with 𝑓(𝑥)⃗ = 1. In this section, we present and analyze Algorithm 7.1.22 that
solves this search problem in the case where the number of solutions 𝑀 is not known.
This algorithm is a Las Vegas algorithm. It repeatedly computes and measures 𝐺 𝑘 |𝑠⟩
for a randomly chosen 𝑘 from ℤ𝑚 where 𝑚 increases exponentially until a solution of
the search problem is found.

Algorithm 7.1.22. Quantum search when the number of solutions is unknown


Input: 𝑛 ∈ ℕ, a black-box implementing 𝑈 𝑓 for some function 𝑓 ∶ {0, 1}𝑛 → {0, 1}
Output: 𝑥⃗ ∈ {0, 1}𝑛 such that 𝑓(𝑥)⃗ = 1
1: QSearch(𝑛, 𝑈 𝑓 )
2: 𝑚←1
3: 𝜆 ← 6/5
4: repeat
5: 𝑘 ← randomInt(𝑚)
6: Apply the quantum circuit from Figure 7.1.1, the result being 𝑥⃗ ∈ {0, 1}𝑛
7: 𝑚 ← ⌊min{𝜆𝑚, √𝑁}⌋
8: until 𝑓(𝑥)⃗ = 1
9: return 𝑥⃗
10: end
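Algorithm 7.1.22 can be simulated classically for a toy instance. The sketch below samples each measurement outcome from the exact distribution of 𝐺^𝑘|𝑠⟩ (solution with probability sin²((2𝑘 + 1)𝜃), otherwise a uniform non-solution) instead of building statevectors; the instance and all names are our own.

```python
import numpy as np

def qsearch(f, n, rng):
    """Simulate Algorithm 7.1.22 (Las Vegas search, M unknown to the caller).
    Measurement outcomes are sampled from the law of G^k|s> (sketch)."""
    N = 2 ** n
    M = sum(f(x) for x in range(N))            # used only to simulate measurements
    theta = np.arcsin(np.sqrt(M / N))
    sols = [x for x in range(N) if f(x) == 1]
    rest = [x for x in range(N) if f(x) == 0]
    m, lam = 1.0, 6 / 5
    iterations = 0
    while True:
        iterations += 1
        k = int(rng.integers(0, max(1, int(m))))       # k <- randomInt(m)
        if rng.random() < np.sin((2 * k + 1) * theta) ** 2:
            x = sols[rng.integers(len(sols))]          # measurement hit |s1>
        else:
            x = rest[rng.integers(len(rest))]          # measurement hit |s0>
        if f(x) == 1:
            return x, iterations
        m = min(lam * m, np.sqrt(N))                   # m <- min(lam*m, sqrt(N))

rng = np.random.default_rng(7)
x, iters = qsearch(lambda v: 1 if v in (3, 12) else 0, 4, rng)
```

Non-solutions are drawn uniformly because |𝑠0⟩ spreads its amplitude equally over all strings with 𝑓(𝑥⃗) = 0.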

Our goal is to prove the following theorem.


Theorem 7.1.23. Assume that 𝑀 ≤ 3𝑁/4. Then the expected number of applications of the Grover iterator and thus of the operator 𝑈 𝑓 required by Algorithm 7.1.22 to find a solution of the search problem is at most 9√(𝑁/𝑀). The expected running time of the algorithm is (√(𝑁/𝑀))^{1+o(1)} .

We note that the condition 𝑀 ≤ 3𝑁/4 is not a restriction, since for 𝑀 > 3𝑁/4 guessing a solution of the search problem has success probability at least 3/4.

In the proof of Theorem 7.1.23, we again use the angle

(7.1.45) 𝜃 = arcsin √(𝑀/𝑁) .

Our proof requires the following two auxiliary results.

Lemma 7.1.24. For any 𝛼 ∈ ℝ and 𝑚 ∈ ℕ we have

(7.1.46) 2 sin 𝛼 ∑_{𝑘=0}^{𝑚−1} cos((2𝑘 + 1)𝛼) = sin 2𝑚𝛼 .

Proof. We prove the assertion by induction on 𝑚. For 𝑚 = 1, the trigonometric identity (A.5.4) gives

(7.1.47) 2 sin 𝛼 ∑_{𝑘=0}^{0} cos((2𝑘 + 1)𝛼) = 2 sin 𝛼 cos 𝛼 = sin 2𝛼 .

Now let 𝑚 ≥ 1 and assume that (7.1.46) holds. Then this equation and the trigonometric identities (A.5.2) and (A.5.3) imply

(7.1.48)
2 sin 𝛼 ∑_{𝑘=0}^{𝑚} cos((2𝑘 + 1)𝛼)
= 2 sin 𝛼 ( ∑_{𝑘=0}^{𝑚−1} cos((2𝑘 + 1)𝛼) + cos((2𝑚 + 1)𝛼) )
= sin 2𝑚𝛼 + 2 sin 𝛼 cos((2𝑚 + 1)𝛼)
= sin 2𝑚𝛼 + sin 𝛼 cos((2𝑚 + 1)𝛼) − cos 𝛼 sin((2𝑚 + 1)𝛼) + sin 𝛼 cos((2𝑚 + 1)𝛼) + cos 𝛼 sin((2𝑚 + 1)𝛼)
= sin 2𝑚𝛼 − sin 2𝑚𝛼 + sin 2(𝑚 + 1)𝛼
= sin 2(𝑚 + 1)𝛼 . □
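The identity (7.1.46) can also be spot-checked numerically over a few values of 𝑚 and 𝛼 (the test angles are arbitrary choices of ours):

```python
import numpy as np

def both_sides(m, alpha):
    """Return (lhs, rhs) of (7.1.46):
    2 sin(a) * sum_(k<m) cos((2k+1)a) = sin(2 m a)."""
    lhs = 2 * np.sin(alpha) * sum(np.cos((2 * k + 1) * alpha) for k in range(m))
    return lhs, np.sin(2 * m * alpha)

checks = all(np.isclose(*both_sides(m, a))
             for m in range(1, 10) for a in (0.1, 0.37, 1.2))
```

Such a check is of course no substitute for the induction above, but it guards against transcription errors.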

Lemma 7.1.25. Let 𝑚 ∈ ℕ and assume that 𝑘 is chosen randomly with the uniform distribution from ℤ𝑚 . Then measuring 𝐺^𝑘 |𝑠⟩ gives a solution of the search problem with probability

(7.1.49) 𝑝𝑚 = 1/2 − sin 4𝑚𝜃 / (4𝑚 sin 2𝜃) .

In particular, we have 𝑝𝑚 ≥ 1/4 when 𝑚 ≥ 1/ sin 2𝜃.

Proof. By (7.1.9) the probability of obtaining a solution of the search problem when measuring 𝐺^𝑘 |𝑠⟩ for some 𝑘 ∈ ℕ0 is sin²((2𝑘 + 1)𝜃). So if 𝑘 is chosen randomly from ℤ𝑚 for some 𝑚 ∈ ℕ, then equation (7.1.32), the trigonometric identity (A.5.7), and Lemma 7.1.24 imply that this probability is

(7.1.50)
𝑝𝑚 = (1/𝑚) ∑_{𝑘=0}^{𝑚−1} sin²((2𝑘 + 1)𝜃)
= (1/2𝑚) ∑_{𝑘=0}^{𝑚−1} (1 − cos((2𝑘 + 1)2𝜃))
= 1/2 − sin 4𝑚𝜃 / (4𝑚 sin 2𝜃) .

If 𝑚 ≥ 1/ sin 2𝜃, then

(7.1.51) sin 4𝑚𝜃 / (4𝑚 sin 2𝜃) ≤ sin 4𝑚𝜃 / 4 ≤ 1/4 . □
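A numerical spot-check of the closed form (7.1.49) and of the bound 𝑝𝑚 ≥ 1/4 for 𝑚 ≥ 1/ sin 2𝜃; the values 𝑁 = 16, 𝑀 = 3, 𝑚 = 5 are an arbitrary example of ours.

```python
import numpy as np

N, M = 16, 3                                   # hypothetical instance
theta = np.arcsin(np.sqrt(M / N))

def p_m(m):
    """Success probability for k uniform on Z_m, by direct averaging."""
    return float(np.mean([np.sin((2 * k + 1) * theta) ** 2 for k in range(m)]))

m = 5                                          # here 1 / sin(2 theta) ~ 1.28 <= m
closed_form = 0.5 - np.sin(4 * m * theta) / (4 * m * np.sin(2 * theta))
```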
We now prove Theorem 7.1.23. Set
(7.1.52) 𝑚0 = 1/ sin 2𝜃 .

Since sin 𝜃 = √(𝑀/𝑁) and cos 𝜃 = √((𝑁 − 𝑀)/𝑁), it follows from (A.5.4) and 𝑀 ≤ 3𝑁/4 that

(7.1.53) 𝑚0 = 1/(2 sin 𝜃 cos 𝜃) = 𝑁/(2√((𝑁 − 𝑀)𝑀)) ≤ √(𝑁/𝑀) .
In the 𝑗th iteration of the loop in Algorithm 7.1.22 we have

(7.1.54) 𝑚 = ⌊min{𝜆^{𝑗−1} , √𝑁}⌋

with 𝜆 = 6/5. Also, the expected number of applications of the Grover iterator in this loop is bounded as follows:

(7.1.55) 𝐸𝑗 = 𝑚/2 ≤ (1/2) min{𝜆^{𝑗−1} , √𝑁} .

We say that the algorithm reaches the critical stage if for the first time 𝑚 ≥ 𝑚0 . This happens when in line 7 of the algorithm we have 𝑗 = ⌈log𝜆 𝑚0 ⌉. From (7.1.55) and 𝜆 = 6/5 it follows that the expected number of applications of the Grover iterator before the algorithm finds a solution or reaches the critical stage is at most

(7.1.56) (1/2) ∑_{𝑗=1}^{⌈log𝜆 𝑚0 ⌉} 𝜆^{𝑗−1} = (𝜆^{⌈log𝜆 𝑚0 ⌉} − 1)/(2(𝜆 − 1)) < 𝜆 𝑚0 /(2(𝜆 − 1)) = 3𝑚0 .

If the critical stage is reached, then in every iteration of the repeat loop in the algorithm from this point on, we have 𝑚 ≥ 𝑚0 = 1/ sin 2𝜃. By Lemma 7.1.25, the success probability in each of these iterations is at least 1/4. So for all 𝑢 ≥ 1 the probability that the algorithm is successful in the (⌈log𝜆 𝑚0 ⌉ + 𝑢)th iteration of the loop is at most (3/4)^{𝑢−1} .
Therefore, the expected number of applications of the Grover iterator needed to succeed in the critical stage is at most

(7.1.57) ∑_{𝑢=0}^{∞} (𝜆^{⌈log𝜆 𝑚0 ⌉+𝑢}/2) (3/4)^𝑢 < (3𝑚0 /5) ∑_{𝑢=0}^{∞} (9/10)^𝑢 = 6𝑚0 .

Therefore, the total expected number of applications of the Grover iterator in the algorithm is bounded by 9𝑚0 , which by (7.1.53) is bounded by 9√(𝑁/𝑀). The estimate of the expected running time of the algorithm is derived from Propositions 7.1.17 and 7.1.18.

7.2. Quantum counting


A problem closely related to the search problem is the following. For 𝑛 ∈ ℕ, 𝑓 ∶
{0, 1}𝑛 → {0, 1} a function, determine the number 𝑀 = |𝑓−1 (1)| of solutions of the
search problem. In this section, we describe quantum algorithms to find approxima-
tions of 𝑀 or even to determine 𝑀 exactly.
These algorithms utilize the quantum states |𝑠⟩, |𝑠⟂ ⟩, |𝑠0 ⟩, and |𝑠1 ⟩, the plane 𝑃 = ℂ |𝑠⟩ + ℂ |𝑠⟂ ⟩ = ℂ |𝑠0 ⟩ + ℂ |𝑠1 ⟩, the operators 𝑈 𝑓 , 𝑈1 , and 𝑈𝑠 , 𝑁 = 2^𝑛 , and the angle 𝜃 = arcsin √(𝑀/𝑁), which were previously discussed in this chapter.

7.2.1. Implementing the controlled Grover iterator. When evaluating the


time complexity of the quantum counting algorithms, we consider the frequency of
using the 𝑈 𝑓 operator. This enables us to draw a comparison with classical count-
ing algorithms, where the crucial information lies in the number of evaluations of the
function 𝑓 required for the counting process.
To determine the frequency of using 𝑈 𝑓 , we need to take into account that the
algorithms discussed in this section utilize quantum phase estimation to approximate
the eigenvalues of the Grover iterator 𝐺. This technique is explained in detail in Section
6.3, and its implementation involves the quantum circuit shown in Figure 6.3.1. This
implementation utilizes controlled-𝐺^{2^𝑖} operators for 0 ≤ 𝑖 < 𝑙, where 𝑙 represents the precision parameter.
In Section 6.4.4, we demonstrate an efficient approach to implement the controlled-
𝑈𝑎𝑐 operators. However, this simplification is not applicable in our current situation,
since 𝐺 is the Grover operator for an arbitrary function 𝑓 and does not possess special properties like 𝑈𝑎 . As a result, we need an alternative method to implement the controlled-𝐺^{2^𝑖} operators, which can be achieved as follows.
Recall from equation (7.1.12) that the Grover iterator is expressed as 𝐺 = 𝑈𝑠 𝑈1 .
Figure 7.1.6 presents an implementation of 𝑈𝑠 , which, as per Proposition 7.1.18,
requires O(𝑛) elementary quantum gates. It follows from Theorem 4.12.7 that the

controlled-𝑈𝑠 operator can also be implemented by a quantum circuit using O(𝑛) ele-
mentary quantum gates. Next, in Figure 7.1.5, a quantum circuit implementation of 𝑈1
is presented, utilizing the 𝑈 𝑓 operator and O(1) elementary quantum gates. To trans-
form it into an implementation of the controlled-𝑈1 operator, we require the controlled-
𝑈 𝑓 operator. If an implementation of 𝑈 𝑓 is available that exclusively uses elemen-
tary quantum gates, then, by following the method described in the proof of Theorem
4.12.7, a quantum circuit for the controlled-𝑈 𝑓 operator can be constructed using only
elementary quantum gates. We assume that the controlled-𝑈 𝑓 operator is provided in
this manner or through some other means. Then, by employing the method from the
proof of Theorem 4.12.7, a quantum circuit implementing the controlled-𝑈1 operator
can be constructed that uses O(1) elementary quantum gates and one controlled-𝑈 𝑓
gate. Combining these results, we obtain the following result.
Proposition 7.2.1. There is an implementation of the controlled Grover iterator that
requires one controlled-𝑈 𝑓 gate and O(𝑛) additional elementary quantum gates.

In the complexity analyses presented in the following sections, we consider using


one controlled-𝑈 𝑓 gate as equivalent to using one 𝑈 𝑓 gate. This is because if the cir-
cuits implementing these gates are constructed from elementary gates, their sizes are
proportional.

7.2.2. An approximate quantum counting algorithm. We begin by present-


ing an approximate quantum counting algorithm that also serves to demonstrate the
principles employed in the other counting algorithms covered in this chapter.
The following proposition reveals the eigenvalues of the restricted Grover iterator
𝐺|𝑃 within the plane 𝑃 from (7.1.13). It demonstrates that the counting problem can be
effectively addressed through quantum phase estimation. To achieve this, we introduce
the quantum states:
(7.2.1) |𝑠+ ⟩ = (|𝑠1 ⟩ + 𝑖 |𝑠0 ⟩)/√2 ,  |𝑠− ⟩ = (|𝑠1 ⟩ − 𝑖 |𝑠0 ⟩)/√2 .
Proposition 7.2.2. The pair (|𝑠+ ⟩ , |𝑠− ⟩) is an orthonormal basis of eigenstates of the
restriction 𝐺|𝑃 of the Grover iterator to the plane 𝑃. The corresponding eigenvalues are 𝑒^{2𝑖𝜃} and 𝑒^{−2𝑖𝜃} with 𝜃 from (7.1.45), and we have

(7.2.2) |𝑠⟩ = (−𝑖/√2) (𝑒^{𝑖𝜃} |𝑠+ ⟩ − 𝑒^{−𝑖𝜃} |𝑠− ⟩) .

Proof. By Exercise 7.2.3, the pair (|𝑠+ ⟩ , |𝑠− ⟩) is an orthonormal basis of 𝑃. Also, we
know from Proposition 7.1.16 that for all 𝛼 ∈ ℝ we have
(7.2.3) 𝐺(cos 𝛼 |𝑠0 ⟩ + sin 𝛼 |𝑠1 ⟩) = cos(𝛼 + 2𝜃) |𝑠0 ⟩ + sin(𝛼 + 2𝜃) |𝑠1 ⟩ .
As shown in Exercise 7.2.3, this implies
(7.2.4) 𝐺 |𝑠0 ⟩ = cos 2𝜃 |𝑠0 ⟩ + sin 2𝜃 |𝑠1 ⟩ , 𝐺 |𝑠1 ⟩ = − sin 2𝜃 |𝑠0 ⟩ + cos 2𝜃 |𝑠1 ⟩
and therefore
(7.2.5) 𝐺 |𝑠+ ⟩ = 𝑒2𝑖𝜃 |𝑠+ ⟩ , 𝐺 |𝑠− ⟩ = 𝑒−2𝑖𝜃 |𝑠− ⟩ .

[Circuit: control register |0⟩𝑙 — 𝐻^⊗𝑙 — controlled-𝐺^𝑐 — QFT𝑙^{−1} — measurement yielding 𝑥; target register |0⟩𝑛 — 𝐻^⊗𝑛 — 𝐺^𝑐 — traced out]

Figure 7.2.1. The approximate quantum counting algorithm.

So |𝑠+ ⟩ and |𝑠− ⟩ are eigenstates of 𝐺|𝑃 associated with the eigenvalues 𝑒2𝑖𝜃 and 𝑒−2𝑖𝜃 ,
respectively. Equation (7.2.2) is also proved in Exercise 7.2.3. □
Exercise 7.2.3. (1) Show that (|𝑠+ ⟩ , |𝑠− ⟩) is an orthonormal basis of 𝑃.
(2) Verify equations (7.2.4), (7.2.5), and (7.2.2).
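The eigenvalue statement of Proposition 7.2.2 can be confirmed numerically for a small made-up instance (our own 𝑓 and dimensions):

```python
import numpy as np

n = 3
N = 2 ** n
f = lambda x: 1 if x < 2 else 0                # hypothetical f with M = 2
M = sum(f(x) for x in range(N))
theta = np.arcsin(np.sqrt(M / N))

s = np.full(N, 1 / np.sqrt(N), dtype=complex)
s1 = np.array([f(x) for x in range(N)], dtype=complex) / np.sqrt(M)
s0 = (s - np.sin(theta) * s1) / np.cos(theta)
G = (2 * np.outer(s, s) - np.eye(N)) @ (np.eye(N) - 2 * np.outer(s1, s1))

s_plus = (s1 + 1j * s0) / np.sqrt(2)           # (7.2.1)
s_minus = (s1 - 1j * s0) / np.sqrt(2)
```

The assertions check the two eigenvalue relations and the decomposition (7.2.2) of |𝑠⟩.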

Proposition 7.2.2 demonstrates the feasibility of estimating the value of 𝑀 = 𝑁 sin²𝜃 by approximating one of the phases ±2𝜃 associated with the eigenvalues of the
Grover iterator 𝐺. To achieve this approximation, we employ Algorithm 7.2.4, which
utilizes the quantum circuit depicted in Figure 7.2.1 as its main component.
The idea of this algorithm is the following. By Exercise 6.2.9 and (7.2.2) we have
(7.2.6) 𝐻^⊗𝑛 |0⟩𝑛 = |𝑠⟩ = (−𝑖/√2) (𝑒^{𝑖𝜃} |𝑠+ ⟩ − 𝑒^{−𝑖𝜃} |𝑠− ⟩) .
This indicates that we can efficiently create an equally weighted superposition of
the two eigenstates of 𝐺|𝑃 . This is done in the second register of the quantum circuit
depicted in Figure 7.2.1. The circuit proceeds to perform quantum phase estimation
with a precision parameter 𝑙 ∈ ℕ on this superposition to identify a value 𝑥 ∈ ℤ𝐿 ,
where 𝐿 = 2^𝑙 , such that 𝑥/𝐿 is an approximation for one of the real numbers ±𝜃/𝜋. As a result, the computed value 𝑀̃ = 𝑁 sin²(𝜋𝑥/2^𝑙) provides an approximation of 𝑀 = 𝑁 sin²𝜃.
Algorithm 7.2.4 performs these calculations and returns the obtained approximation.
It is essential to note that the returned value 𝑀̃ is a real number and the algorithm can
only provide a rational approximation to this number. Therefore, implementations of
the algorithm must ensure that the precision of the approximation is sufficient for the
specific application’s requirements.

Algorithm 7.2.4. Approximate quantum counting algorithm


Input: 𝑛 ∈ ℕ, 𝑈 𝑓 for some 𝑓 ∶ {0, 1}𝑛 → {0, 1}, and a precision parameter 𝑙 ∈ ℕ
Output: An approximation 𝑀̃ to 𝑀 = |𝑓−1 (1)|
1: QCount(𝑛, 𝑈 𝑓 , 𝑙)
2: Apply the quantum circuit from Figure 7.2.1, the result being 𝑥 ∈ ℤ𝐿 where
𝐿 = 2^𝑙
3: return 𝑀̃ ← 𝑁 sin²(𝜋𝑥/𝐿)
4: end
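A full statevector simulation of the circuit in Figure 7.2.1 is feasible for small parameters. The sketch below (our own instance: 𝑛 = 3, 𝑙 = 6, two marked elements) builds the joint state after the controlled-𝐺^𝑐 gates directly, applies the inverse QFT to the control register, and replaces the measurement by the most probable outcome.

```python
import numpy as np

n, l = 3, 6
N, L = 2 ** n, 2 ** l
f = lambda x: 1 if x in (1, 5) else 0          # hypothetical f with M = 2
M = sum(f(x) for x in range(N))

s = np.full(N, 1 / np.sqrt(N))
U1 = np.diag([(-1.0) ** f(x) for x in range(N)])   # oracle phase flip (= U1 on P)
G = (2 * np.outer(s, s) - np.eye(N)) @ U1

# After H^l on the control and the controlled-G^c gates, the joint state is
# (1/sqrt(L)) sum_c |c> (x) G^c |s>; build it row by row.
Psi = np.zeros((L, N), dtype=complex)
vec = s.astype(complex)
for c in range(L):
    Psi[c] = vec / np.sqrt(L)
    vec = G @ vec                              # advance to G^(c+1) |s>

# Inverse quantum Fourier transform on the control register.
F = np.exp(2j * np.pi * np.outer(np.arange(L), np.arange(L)) / L) / np.sqrt(L)
Phi = F.conj().T @ Psi

probs = np.linalg.norm(Phi, axis=1) ** 2       # distribution of the outcome x
x = int(np.argmax(probs))
M_tilde = N * np.sin(np.pi * x / L) ** 2       # estimate returned by QCount
```

Here 𝜃/𝜋 = 1/6, so the outcome concentrates near 𝐿/6 ≈ 10.7 and near 𝐿(1 − 1/6) ≈ 53.3; both peaks give the same estimate 𝑀̃ ≈ 2.1, within the bound (7.2.7).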

The following theorem establishes the correctness of Algorithm 7.2.4 and provides
insight into its computational complexity.

Theorem 7.2.5. Assume that the input of Algorithm 7.2.4 is 𝑛, 𝑈 𝑓 , 𝑙 as specified in the
algorithm and let 𝐿 = 2𝑙 . Denote by 𝑀̃ the output of the algorithm. Then the following
are true.
(1) With probability at least 8/𝜋² we have

(7.2.7) |𝑀̃ − 𝑀| ≤ 2𝜋 √(𝑀(𝑁 − 𝑀))/𝐿 + 𝜋²𝑁/𝐿² .

(2) The algorithm requires O(𝐿) applications of 𝑈 𝑓 and O(𝐿𝑛²) additional elementary
operations.

Proof. From (7.2.2) and (6.3.18) it follows that before tracing the target register, the
state of the quantum circuit is
(7.2.8) |𝜑⟩ = (−𝑖/√2) (𝑒^{𝑖𝜃} 𝜓𝑙 (𝜃/𝜋) |𝑠+ ⟩ − 𝑒^{−𝑖𝜃} 𝜓𝑙 (−𝜃/𝜋) |𝑠− ⟩) .
Since (|𝑠+ ⟩ , |𝑠− ⟩) is an orthonormal basis of the plane 𝑃, Corollary 3.7.12 implies that
after tracing out the target register, the control register is in the mixed state
(7.2.9) ((1/2, 𝜓𝑙 (𝜃/𝜋)) , (1/2, 𝜓𝑙 (−𝜃/𝜋))) .
So it follows from Theorem 6.3.7 and Lemma 6.3.3 that with probability at least 8/𝜋² the measurement result 𝑥 in Algorithm 7.2.4 satisfies

(7.2.10) |𝑥/𝐿 − 𝜃/𝜋| < 1/𝐿 .
Set
(7.2.11) 𝑝 = sin²𝜃 = 𝑀/𝑁 .

Then we have

(7.2.12) sin 𝜃 = √𝑝 ,  cos 𝜃 = √(1 − 𝑝) ,
and
(7.2.13) 𝑀 = 𝑁𝑝.
So if we set
(7.2.14) 𝜃̃ = 𝜋𝑥/𝐿 ,  𝑝̃ = sin²𝜃̃ ,

then the return value of Algorithm 7.2.4 is

(7.2.15) 𝑀̃ = 𝑁𝑝̃ .
We will now prove that
(7.2.16) |𝑝̃ − 𝑝| < 2𝜋 √(𝑝(1 − 𝑝))/𝐿 + 𝜋²/𝐿² .

Multiplying this inequality by 𝑁, we obtain the assertion of the theorem. We set

(7.2.17) 𝜀 = 𝜃̃ − 𝜃 .

Then (7.2.10) implies


(7.2.18) |𝜀| < 𝜋/𝐿 .
First, assume that 𝑝̃ ≥ 𝑝 and note that by the trigonometric identities (A.5.2) and (A.5.7) we have

(7.2.19)
sin 𝜃 cos 𝜃 sin 2𝜀 + (1 − 2 sin²𝜃) sin²𝜀
= 2 sin 𝜃 cos 𝜃 sin 𝜀 cos 𝜀 + (cos²𝜃 − sin²𝜃) sin²𝜀
= sin²𝜃 cos²𝜀 + 2 sin 𝜃 cos 𝜃 sin 𝜀 cos 𝜀 + cos²𝜃 sin²𝜀 − sin²𝜃 (sin²𝜀 + cos²𝜀)
= sin²(𝜃 + 𝜀) − sin²𝜃 .
Hence, using Lemma A.5.3 and equations (7.2.12) and (7.2.18) we obtain

(7.2.20)
|𝑝̃ − 𝑝| = 𝑝̃ − 𝑝 = sin²(𝜃 + 𝜀) − sin²𝜃
= sin 𝜃 cos 𝜃 sin 2𝜀 + (1 − 2 sin²𝜃) sin²𝜀
= √(𝑝(1 − 𝑝)) sin 2𝜀 + (1 − 2𝑝) sin²𝜀
≤ 2𝜀 √(𝑝(1 − 𝑝)) + 𝜀² < 2𝜋 √(𝑝(1 − 𝑝))/𝐿 + 𝜋²/𝐿² .
Next, let 𝑝̃ ≤ 𝑝. Then using analogous arguments we obtain

(7.2.21)
|𝑝̃ − 𝑝| = 𝑝 − 𝑝̃ = sin²𝜃 − sin²(𝜃 − 𝜀)
= sin 𝜃 cos 𝜃 sin 2𝜀 − (1 − 2 sin²𝜃) sin²𝜀
= √(𝑝(1 − 𝑝)) sin 2𝜀 + (2𝑝 − 1) sin²𝜀
≤ 2𝜀 √(𝑝(1 − 𝑝)) + 𝜀² ≤ 2𝜋 √(𝑝(1 − 𝑝))/𝐿 + 𝜋²/𝐿² .
This concludes the proof of (7.2.16) which implies (7.2.7).
We will now establish the complexity statement. The implementation of the quantum phase estimation circuit necessitates the utilization of controlled-𝐺^{2^𝑖} operators for 0 ≤ 𝑖 < 𝑙. It follows from Proposition 7.2.1 that these, in turn, require a total of ∑_{𝑖=0}^{𝑙−1} 2^𝑖 = 2^𝑙 − 1 = 𝐿 − 1 applications of the 𝑈 𝑓 operator and an additional O(𝐿𝑛²) elementary quantum gates. □

7.2.3. A quantum counting algorithm with pre-selected error. Algorithm


7.2.6 is a modification of the approximate quantum counting algorithm from the previ-
ous section. Given an error parameter 𝜀 ∈ ℝ, 0 < 𝜀 < 1, it computes an approximation
𝑀̃ of 𝑀 = |𝑓^{−1}(1)| such that
(7.2.22) |𝑀 − 𝑀̃ | ≤ 𝜀𝑀.

We explain the idea of the algorithm. For increasing values of 𝑙 = 1, 2, 3, . . ., it calls QCount(𝑛, 𝑈 𝑓 , 𝑙) until the return value is different from 0 for the first time or 2^𝑙 ≥ 2√𝑁. Denote the 𝑙-value of the first occurrence of a nonzero return value by 𝑙max .
In the analysis of the algorithm, it is demonstrated that with probability at least cos²(2/5), we have

(7.2.23) 2^(𝑙max) ≥ (2/(5𝜋))√(𝑁/𝑀).

Then the number 𝑙max of QCount calls that return 0 provides crucial information about the magnitude of the solution count 𝑀. A larger value of 𝑙max implies a smaller value of 𝑀. Furthermore, it is proven that for 𝑙 = 𝑙max + ⌈log₂(20𝜋²/𝜀)⌉, where 𝜀 denotes the chosen precision, the call QCount(𝑛, 𝑈𝑓, 𝑙) provides the desired approximation 𝑀̃ with a probability of at least 8/𝜋². Therefore, the total success probability of the algorithm is at least (8/𝜋²) cos²(2/5) > 2/3.
Algorithm 7.2.6. Approximate counting with pre-selected approximation precision

Input: 𝑛 ∈ ℕ, 𝑈𝑓 with 𝑓 ∶ {0, 1}^𝑛 → {0, 1}, a parameter 𝜀 ∈ ℝ with 0 < 𝜀 < 1
Output: An approximation 𝑀̂ ∈ ℕ₀ to 𝑀 such that |𝑀 − 𝑀̂| ≤ 𝜀𝑀
1: ApproxQCount(𝑛, 𝑈𝑓, 𝜀)
2: 𝑙 ← 0
3: repeat
4: 𝑙 ← 𝑙 + 1
5: 𝑀̃ ← QCount(𝑛, 𝑈𝑓, 𝑙)
6: until 𝑀̃ ≠ 0 or 2^𝑙 ≥ 2√𝑁
7: 𝑙 ← 𝑙 + ⌈log₂(20𝜋²/𝜀)⌉
8: 𝑀̃ ← QCount(𝑛, 𝑈𝑓, 𝑙)
9: return 𝑀̂ ← ⌊𝑀̃⌉
10: end
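The classical control flow of Algorithm 7.2.6 can be sketched in Python. This is only an illustration of the loop structure: `qcount` is a stand-in supplied by the caller for the quantum subroutine QCount(𝑛, 𝑈𝑓, 𝑙), and the toy `fake` routine at the end is hypothetical.

```python
import math

def approx_qcount(n, qcount, eps):
    # Classical control flow of Algorithm 7.2.6; qcount(l) stands in for the
    # quantum subroutine QCount(n, U_f, l) and is supplied by the caller.
    N = 2 ** n
    l = 0
    while True:                     # the repeat loop of lines 3-6
        l += 1
        M_tilde = qcount(l)
        if M_tilde != 0 or 2 ** l >= 2 * math.sqrt(N):
            break
    l += math.ceil(math.log2(20 * math.pi ** 2 / eps))  # line 7
    M_tilde = qcount(l)
    # Python's round() breaks ties to even, unlike the book's rounding,
    # but they agree away from half-integers.
    return round(M_tilde)

# Toy stand-in: returns 0 for small l, then a fixed estimate 12.2.
fake = lambda l: 0 if l < 3 else 12.2
print(approx_qcount(8, fake, 0.25))  # 12
```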
The following theorem establishes both the correctness and the computational
complexity of Algorithm 7.2.6.
Theorem 7.2.7. Let 𝑛, 𝑈 𝑓 , 𝜀 be the input of Algorithm 7.2.6. Denote by 𝑀̂ the return
value of the algorithm. Then the following are true.
(1) With probability at least 2/3 we have

(7.2.24) |𝑀̂ − 𝑀| < 𝜀𝑀.

(2) The algorithm requires O(√𝑁/𝜀) applications of 𝑈𝑓 and O(𝑛²√𝑁/𝜀) additional elementary operations.
Proof. Let

(7.2.25) 𝜃 = arcsin √(𝑀/𝑁), 𝑘 = ⌈log₂(1/(5𝜃))⌉.

Then we have

(7.2.26) 2^𝑘 ≥ 2^(log₂(1/(5𝜃))) = 1/(5𝜃).
Also, Corollary A.5.6 implies

(7.2.27) 2^𝑘 ≤ 2^(log₂(1/(5𝜃))+1) = 2/(5𝜃) = 2/(5 arcsin √(𝑀/𝑁)) ≤ (2/5)√(𝑁/𝑀).
In line 6 of the algorithm we obtain 𝑙 = 𝑘 if the call QCount(𝑛, 𝑈𝑓, 𝑙) in line 5 of the algorithm returns 0 on 𝑘 occasions. This happens with probability

(7.2.28) 𝑝 =(1) ∏_{𝑙=1}^{𝑘} sin²(2^𝑙 𝜃)/(2^(2𝑙) sin²𝜃) ≥(2) ∏_{𝑙=1}^{𝑘} cos²(2^𝑙 𝜃) =(3) sin²(2^(𝑘+1) 𝜃)/(2^(2𝑘) sin²(2𝜃)) ≥(4) cos²(2^(𝑘+1) 𝜃) ≥(5) cos²(2/5).
Here, equation (1) follows from Proposition 6.3.4, inequality (2) is a consequence of Lemma A.5.10 with 𝑥 = 2^𝑙 and 𝛼 = 𝜃, which is applicable because of (7.2.26), equation (3) is obtained from Lemma A.5.11, for inequality (4) we use Lemma A.5.10 again with 𝑥 = 2^𝑘 and 𝛼 = 2𝜃, and inequality (5) uses Lemma A.5.7 and (7.2.26).
Assume that the maximum value 𝑙max for 𝑙 assumed in the repeat loop is at least 𝑘. As in line 7 of the algorithm set

(7.2.29) 𝑙 = 𝑙max + ⌈log₂(20𝜋²/𝜀)⌉.
Corollary A.5.4 implies

(7.2.30) 𝜃 = arcsin √(𝑀/𝑁) ≤ (𝜋/2)√(𝑀/𝑁).

So with 𝐿 = 2^𝑙 we obtain from (7.2.29), (7.2.26), and (7.2.30)

(7.2.31) 1/𝐿 ≤ (𝜀/(20𝜋²)) ⋅ 5𝜃 ≤ (𝜀/(8𝜋))√(𝑀/𝑁).
This implies that QCount(𝑛, 𝑈𝑓, 𝑙) returns with probability at least 8/𝜋² a real number 𝑀̃ with

(7.2.32)
|𝑀 − 𝑀̃| ≤ 2𝜋√(𝑀(𝑁 − 𝑀))/𝐿 + 𝜋²𝑁/𝐿²
≤ (𝜀/4)𝑀√((𝑁 − 𝑀)/𝑁) + (𝜀²/64)𝑀
≤ 𝜀𝑀(1/4 + 1/64) < 𝜀𝑀/2.
Set 𝑀̂ = ⌊𝑀̃⌉. If 𝜀𝑀 < 1, then (7.2.32) implies |𝑀̃ − 𝑀| < 1/2. So we have 𝑀̂ = 𝑀 and |𝑀̂ − 𝑀| = 0. If 𝜀𝑀 ≥ 1, then we have |𝑀̂ − 𝑀̃| ≤ 1/2 ≤ 𝜀𝑀/2. Together with (7.2.32) this implies |𝑀̂ − 𝑀| ≤ |𝑀̂ − 𝑀̃| + |𝑀̃ − 𝑀| < 𝜀𝑀. The total success probability is

(7.2.33) (8/𝜋²) cos²(2/5) ≥ 2/3.
The complexity statement can be seen as follows. According to Theorem 7.2.5, the call to QCount(𝑛, 𝑈𝑓, 𝑙) in line 5 requires O(2^𝑙) applications of 𝑈𝑓 and O(𝑛² 2^𝑙) additional elementary operations. Furthermore, due to the condition in line 6, the maximum value of 2^𝑙 in the repeat loop is bounded by O(√𝑁). As a result, this repeat loop requires O(√𝑁) applications of 𝑈𝑓 and O(𝑛²√𝑁) additional operations. After the assignment in line 7, the value of 𝐿 = 2^𝑙 becomes O(√𝑁/𝜀). Another application of Theorem 7.2.5 completes the proof. □
7.2.4. Exact counting. The two algorithms, QCount and ApproxQCount, can be effectively utilized to count the number of solutions 𝑀 = |𝑓⁻¹(1)| exactly. The approach involves employing ApproxQCount(𝑛, 𝑈𝑓, 1/2) to obtain a reliable approximation 𝑀̃₁ of 𝑀. Then, using this approximation, we find an appropriate value of 𝑙 ∈ ℕ such that QCount(𝑛, 𝑈𝑓, 𝑙) provides 𝑀̃₂ satisfying |𝑀 − 𝑀̃₂| < 1/2. Consequently, 𝑀 is the nearest integer to 𝑀̃₂. This entire process is implemented in Algorithm 7.2.8.
Algorithm 7.2.8. Exact counting

Input: 𝑛 ∈ ℕ, 𝑈𝑓 with 𝑓 ∶ {0, 1}^𝑛 → {0, 1}
Output: 𝑀 = |𝑓⁻¹(1)|
1: ExactQCount(𝑛, 𝑈𝑓)
2: 𝑀̃₁ ← ApproxQCount(𝑛, 𝑈𝑓, 1/2)
3: 𝑙 ← ⌈log₂(26√(𝑀̃₁𝑁))⌉
4: 𝑀̃₂ ← QCount(𝑛, 𝑈𝑓, 𝑙)
5: return 𝑀 ← ⌊𝑀̃₂⌉
6: end
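The choice of 𝑙 in line 3 can be checked numerically: for sample values of 𝑀 and 𝑁, and with 𝑀̃₁ = 𝑀/2 (the smallest value admitted by |𝑀̃₁ − 𝑀| ≤ 𝑀/2), the error bound 2𝜋√(𝑀(𝑁 − 𝑀))/𝐿 + 𝜋²𝑁/𝐿² of Theorem 7.2.5 indeed drops below 1/2, as claimed in (7.2.35). The sample values below are arbitrary.

```python
import math

# Check that L = 2^ceil(log2(26 sqrt(M1*N))) pushes the QCount error bound
# 2*pi*sqrt(M(N-M))/L + pi^2*N/L^2 below 1/2 (cf. (7.2.35)).
N, M = 2 ** 10, 40
M1 = M / 2  # worst case allowed by |M1 - M| <= M/2
L = 2 ** math.ceil(math.log2(26 * math.sqrt(M1 * N)))
bound = 2 * math.pi * math.sqrt(M * (N - M)) / L + math.pi ** 2 * N / L ** 2
print(bound < 0.5)  # True
```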
Theorem 7.2.9. On input of 𝑛 ∈ ℕ and 𝑈𝑓 for some function 𝑓 ∶ {0, 1}^𝑛 → {0, 1}, Algorithm 7.2.8 returns 𝑀 = |𝑓⁻¹(1)| with probability at least 1/2. The algorithm requires O(√(𝑀𝑁)) applications of 𝑈𝑓 and O(𝑛²√(𝑀𝑁)) additional elementary operations.

Proof. We have |𝑀̃₁ − 𝑀| ≤ 𝑀/2 and, therefore, 𝑀̃₁ ≥ 𝑀/2. Choose 𝑙 = ⌈log₂(26√(𝑀̃₁𝑁))⌉ as in line 3 and 𝐿 = 2^𝑙. Then we have

(7.2.34) 1/𝐿 ≤ 1/(26√(𝑀̃₁𝑁)).
This implies

(7.2.35)
|𝑀 − 𝑀̃₂| ≤ (2𝜋/26)√(𝑀𝑁/(𝑀̃₁𝑁)) + (𝜋²/26²)(𝑁/(𝑀̃₁𝑁))
≤ 4𝜋/26 + 𝜋²/26² < 1/2.
So it follows that the algorithm returns the correct 𝑀.
The complexity statement follows from Theorems 7.2.5 and 7.2.7. □
Chapter 8

The HHL Algorithm

In the previous chapters, we explored early quantum algorithms and fundamental
techniques, such as phase estimation and amplitude amplification. Shor’s algorithms
for integer factorization and computing discrete logarithms stand out for their signif-
icant speedups compared to classical algorithms. They solve these problems in polynomial time, whereas classically this is only possible in subexponential time. Additionally,
the Grover search algorithm offers a quadratic speedup, which is also very impressive.
These advancements have inspired researchers to seek quantum algorithms with sim-
ilar advantages across various computing domains. Many other quantum algorithms
have been discovered, building upon the techniques we have presented. For a compre-
hensive overview of the presently known quantum algorithms, refer to [Jor].
In this chapter, we focus on a fascinating and more recent quantum algorithm, the
HHL algorithm, proposed by Aram W. Harrow, Avinatan Hassidim, and Seth Lloyd in
2008 [HHL08]. The HHL algorithm addresses a crucial problem encountered in count-
less applications in science and engineering: solving linear systems over ℂ. When cer-
tain conditions are met and the problem is appropriately formulated, the HHL algo-
rithm achieves exponential speedup compared to the best classical algorithms. Its the-
oretical significance and practical applications make it an intriguing subject of study.
Our presentation is based on the description of the algorithm in [DHM+ 18] and
aims to provide an impression of the algorithm and its analysis. However, a detailed
explanation goes beyond the scope of this book.
8.1. The problem
One of the most significant challenges in algorithmic linear algebra is known as the
Linear Systems Problem (LSP), which we also discuss in Section B.7.5. Here, we focus
on a specific case of LSP, involving the parameters 𝑀 ∈ ℕ, 𝐴 ∈ 𝖦𝖫(𝑀, ℂ), and 𝑏 ⃗ ∈ ℂ𝑀 .
The objective is to compute 𝑥⃗ = 𝐴⁻¹𝑏⃗. In Section B.7.5, we demonstrate that Gaussian elimination can find 𝑥⃗ using O(𝑀³) operations in ℂ. However, since computers
can only handle rational approximations to complex numbers, several algorithms have
been developed to efficiently find good approximations to 𝑥⃗ using polynomial time in
𝑀.
The HHL algorithm addresses the Quantum Linear System Problem (QLSP). As in
LSP, it uses 𝑀 ∈ ℕ, 𝐴 ∈ 𝖦𝖫(𝑀, ℂ), 𝑏⃗ ∈ ℂ^𝑀, and 𝑥⃗ = 𝐴⁻¹𝑏⃗. To simplify the description
of the HHL algorithm, the following assumptions are made.
(1) 𝑀 = 2𝑚 with 𝑚 ∈ ℕ.
(2) 𝐴 is Hermitian; hence, 𝐴 ∈ 𝖦𝖫(𝑀, ℂ) and Proposition 2.4.60 imply that the eigen-
values of 𝐴 are nonzero real numbers.
(3) The eigenvalues of 𝐴 are in [0, 2𝜋[.
(4) ‖𝑏⃗‖ = 1.
If these assumptions are not satisfied, the HHL parameters 𝑀, 𝐴, and 𝑏⃗ can be modified appropriately so that they are satisfied, and the HHL algorithm therefore still works. This modification is presented in the next exercise.
Exercise 8.1.1. Let 𝐴 ∈ 𝖦𝖫(𝑀, ℂ) and 𝑏⃗, 𝑥⃗ ∈ ℂ^𝑀 with 𝐴𝑏⃗ = 𝑥⃗. Show that the block matrix

𝐴′ = ( 0   𝐴
       𝐴∗  0 )

is Hermitian and that for 𝑏′⃗ = (𝑏⃗, 0⃗) and 𝑥′⃗ = (0⃗, 𝑥⃗) we have 𝐴′𝑏′⃗ = 𝑥′⃗.
Let 𝑏⃗ = (𝑏₀, . . . , 𝑏𝑀−1). Since ‖𝑏⃗‖ = 1,

(8.1.1) |𝑏⟩ = ∑_{𝑖∈ℤ𝑀} 𝑏𝑖 |𝑖⟩𝑚

is a quantum state in ℍ𝑚. So the number 𝑚 = log₂ 𝑀 of qubits required to represent
𝑏 ⃗ is logarithmic in the dimension 𝑀 of the linear system to be solved. This opens the
possibility for the HHL algorithm to find

(8.1.2) |𝑥⟩ = ∑_{𝑖∈ℤ𝑀} 𝑥𝑖 |𝑖⟩𝑚

where 𝑥⃗ = (𝑥₀, . . . , 𝑥𝑀−1), in time polylogarithmic in 𝑀, which would be an exponential advantage over all classical LSP algorithms. However, note that |𝑥⟩ may not be a
quantum state, since the Euclidean length of 𝑥⃗ may not be 1.
In the upcoming section, we provide an overview of the HHL algorithm, and in
Section 8.3, we delve into the conditions under which the algorithm achieves its ex-
ponential speedup. Here, we highlight some essential considerations. To achieve ex-
ponential speedup, the algorithm cannot directly read all components of 𝑥⃗ since this
vector has length 𝑀. Instead, it can extract certain properties of 𝑥⃗ by measuring |𝑥⟩
with respect to an observable of ℍ𝑚 . In several application contexts, this is sufficient.
Additionally, the input of the algorithm cannot simply be given as 𝐴 and 𝑏⃗ because representing these objects in a standard form requires time Ω(𝑀²). Therefore, both 𝐴 and 𝑏⃗ must be very sparse, and there must exist an efficient method to access the entries of 𝐴 and 𝑏⃗.
8.2. Overview
Let 𝑚, 𝑀, 𝐴, 𝑏,⃗ 𝑥,⃗ |𝑏⟩, and |𝑥⟩ be as specified in the previous section for the HHL prob-
lem. In the following, we give an overview of the HHL algorithm.
Since 𝐴 is Hermitian, it follows from Theorem 2.4.53 that we can choose an or-
thonormal basis (|𝑢₀⟩, . . . , |𝑢𝑀−1⟩) of eigenstates of 𝐴. Denote by 𝜆₀, . . . , 𝜆𝑀−1 the corresponding eigenvalues. They are nonzero real numbers in [0, 2𝜋[ by assumption. It
follows from the Spectral Theorem 2.4.56 that
(8.2.1) 𝐴 = ∑_{𝑗=0}^{𝑀−1} 𝜆𝑗 |𝑢𝑗⟩⟨𝑢𝑗| .

So, the inverse of 𝐴 is

(8.2.2) 𝐴⁻¹ = ∑_{𝑗=0}^{𝑀−1} (1/𝜆𝑗) |𝑢𝑗⟩⟨𝑢𝑗| .

The quantum state |𝑏⟩ can be written as

(8.2.3) |𝑏⟩ = ∑_{𝑗=0}^{𝑀−1} 𝛽𝑗 |𝑢𝑗⟩

with 𝛽𝑗 ∈ ℂ for 𝑗 ∈ ℤ𝑀. With this notation, we have

(8.2.4) |𝑥⟩ = 𝐴⁻¹ |𝑏⟩ = ∑_{𝑗=0}^{𝑀−1} (𝛽𝑗/𝜆𝑗) |𝑢𝑗⟩ .
The HHL circuit shown in Figure 8.2.1 uses this identity to approximate |𝑥⟩. We explain
how this works by determining the intermediate states |𝜓0 ⟩ , . . . , |𝜓4 ⟩.
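As a purely classical illustration of the identities (8.2.2) and (8.2.4), the following Python sketch inverts a small Hermitian matrix through its spectral decomposition. The 2 × 2 matrix and its eigensystem are chosen by hand for this example; the sketch is not a simulation of the quantum circuit.

```python
import math

# Hermitian example A = [[2, 1], [1, 2]] with eigenvalues 3 and 1 and
# orthonormal eigenvectors u0 = (1,1)/sqrt(2), u1 = (1,-1)/sqrt(2).
lams = [3.0, 1.0]
s = 1 / math.sqrt(2)
us = [(s, s), (s, -s)]

b = (0.6, 0.8)  # a unit vector, playing the role of |b>

# Coefficients beta_j = <u_j|b> (real case, so no conjugation needed).
betas = [u[0] * b[0] + u[1] * b[1] for u in us]

# x = A^{-1} b = sum_j (beta_j / lambda_j) u_j, as in (8.2.4).
x = [sum((betas[j] / lams[j]) * us[j][i] for j in range(2)) for i in range(2)]

# Check: A x reproduces b.
A = [[2.0, 1.0], [1.0, 2.0]]
Ax = [A[i][0] * x[0] + A[i][1] * x[1] for i in range(2)]
print(Ax)  # close to (0.6, 0.8)
```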
The HHL circuit operates on a quantum register which is composed of three smaller
quantum registers. The first is the ancilla register. It contains one ancillary qubit. The
second is the clock register. It is of length 𝑛 ∈ ℕ which is a precision constant. The third
register is the b-register. It is of length 𝑚. To simplify the explanation of the algorithm,

|𝜓0 ⟩ |𝜓1 ⟩ |𝜓2 ⟩ |𝜓3 ⟩ |𝜓4 ⟩

|0⟩ 𝑅𝑐 tr
𝑈𝑃 𝑈𝑃−1

|0⟩𝑛 −1
𝐻 ⊗𝑛 |𝑐⟩ QFT𝑛 |𝑐⟩ QFT𝑛 |𝑐⟩ 𝐻 ⊗𝑛 tr

|𝑏⟩ 𝑈𝐴𝑐 𝑈𝐴−𝑐

𝑈 𝑉 𝑈 −1

Figure 8.2.1. The HHL circuit.
we assume that the eigenvalues of 𝐴 can be written as
(8.2.5) 𝜆𝑗 = 2𝜋𝑐𝑗/2^𝑛 with 𝑐𝑗 ∈ ℤ_{2^𝑛} for 0 ≤ 𝑗 < 𝑀.
This means that all 𝜆𝑗 have a finite binary expansion of length at most 𝑛 and can there-
fore be precisely determined by quantum phase estimation. We will show that the final
state |𝜓4 ⟩ is proportional to |𝑥⟩ if the measurement of the first qubit gives 1. If (8.2.5) is
not true, the algorithm finds an approximation of a quantum state proportional to |𝑥⟩.
The initial state of the HHL algorithm is
(8.2.6) |𝜓0 ⟩ = |0⟩ |0⟩𝑛 |𝑏⟩ .
Next, we note that

(8.2.7) |𝜓₁⟩ = 𝑈 |𝜓₀⟩ = |0⟩ 𝑈𝑃 |0⟩𝑛 |𝑏⟩

with 𝑈 and 𝑈𝑃 from Figure 8.2.1. Here, 𝑈𝑃 is the phase estimation circuit introduced in Section 6.3. It is used to estimate the eigenvalues of

(8.2.8) 𝑈𝐴 = 𝑒^{𝑖𝐴} = ∑_{𝑗∈ℤ𝑀} 𝑒^{𝑖𝜆𝑗} |𝑢𝑗⟩⟨𝑢𝑗|
introduced in Definition 2.4.69. By Theorem 2.4.72, this operator is unitary since 𝐴 is Hermitian. In addition, (8.2.8) shows that (|𝑢₀⟩, . . . , |𝑢𝑀−1⟩) is an orthonormal basis of the eigenstates of 𝑈𝐴 and

(8.2.9) 𝑒^{𝑖𝜆₀} = 𝑒^{2𝜋𝑖𝑐₀/2^𝑛}, . . . , 𝑒^{𝑖𝜆𝑀−1} = 𝑒^{2𝜋𝑖𝑐_{𝑀−1}/2^𝑛}

are the eigenvalues corresponding to the basis elements. It follows from equation (6.3.20) in the analysis of the phase estimation algorithm, Definition 6.2.5, and the invertibility of QFT𝑛 shown in Proposition 6.2.8 that

(8.2.10)
|𝜓₁⟩ = |0⟩ ∑_{𝑗=0}^{𝑀−1} 𝛽𝑗 𝑈𝑃 |0⟩𝑛 |𝑢𝑗⟩
= |0⟩ ∑_{𝑗=0}^{𝑀−1} 𝛽𝑗 QFT𝑛⁻¹ |𝜓𝑛(𝑐𝑗/2^𝑛)⟩ |𝑢𝑗⟩
= |0⟩ ∑_{𝑗=0}^{𝑀−1} 𝛽𝑗 |𝑐𝑗⟩𝑛 |𝑢𝑗⟩ .
In order to obtain |𝜓₂⟩, the HHL circuit applies the operator 𝑉 to |𝜓₁⟩, which is also shown in Figure 8.2.1. This operator acts as the rotation

(8.2.11) 𝑅𝑐 = 𝑅𝑦̂(−2𝜃(𝑐))

on the ancilla register controlled by the clock register |𝑐⟩𝑛, 𝑐 ∈ ℤ_{2^𝑛}, and does not change the clock and the 𝑏 register. Here, 𝑅𝑦̂ is from Definition 4.3.7,

(8.2.12) 𝜃(𝑐) = arcsin(𝐶/𝜆(𝑐)) with 𝜆(𝑐) = 2𝜋𝑐/2^𝑛,
and the constant 𝐶 ∈ ℝ is chosen such that 𝜃(𝑐) in (8.2.12) is defined, lies in the interval [0, 𝜋/2], and the success probability of the algorithm is maximized. From (4.3.9), we obtain the following:

(8.2.13) 𝑅𝑐 |0⟩ = cos 𝜃(𝑐) |0⟩ + sin 𝜃(𝑐) |1⟩ = √(1 − 𝐶²/𝜆(𝑐)²) |0⟩ + (𝐶/𝜆(𝑐)) |1⟩.
This implies

(8.2.14)
|𝜓₂⟩ = 𝑉 |𝜓₁⟩ = ∑_{𝑗=0}^{𝑀−1} 𝛽𝑗 𝑅𝑐𝑗 |0⟩ |𝑐𝑗⟩𝑛 |𝑢𝑗⟩
= |0⟩ ∑_{𝑗=0}^{𝑀−1} 𝛽𝑗 √(1 − 𝐶²/𝜆𝑗²) |𝑐𝑗⟩𝑛 |𝑢𝑗⟩ + |1⟩ ∑_{𝑗=0}^{𝑀−1} 𝛽𝑗 (𝐶/𝜆𝑗) |𝑐𝑗⟩𝑛 |𝑢𝑗⟩ .
Exercise 8.2.1. Show that the operator 𝑉 is unitary.
As in (8.2.10), we see that

(8.2.15)
|𝜓₃⟩ = |0⟩ ∑_{𝑗=0}^{𝑀−1} 𝛽𝑗 √(1 − 𝐶²/𝜆𝑗²) |0⟩𝑛 |𝑢𝑗⟩ + |1⟩ ∑_{𝑗=0}^{𝑀−1} 𝛽𝑗 (𝐶/𝜆𝑗) |0⟩𝑛 |𝑢𝑗⟩
= |0⟩ |0⟩𝑛 ∑_{𝑗=0}^{𝑀−1} 𝛽𝑗 √(1 − 𝐶²/𝜆𝑗²) |𝑢𝑗⟩ + |1⟩ |0⟩𝑛 𝐶 |𝑥⟩ .
Exercise 8.2.2. Verify (8.2.15).
So we obtain the following result.
Theorem 8.2.3. Measuring the first qubit of |𝜓₃⟩ gives |1⟩ with probability 𝐶²‖𝑥⃗‖². If |1⟩ is measured, then the final state in the HHL circuit is

(8.2.16) |𝜓₄⟩ = (𝐶/‖𝑥⃗‖) |𝑥⟩ .
Proof. Measuring the first qubit of |𝜓₃⟩ means measuring the observable 𝑂 = (|0⟩⟨0| + |1⟩⟨1|) ⊗ 𝐼𝐵 where 𝐵 is the quantum system comprising the second and third quantum registers. Therefore, the probability of measuring |1⟩ is 𝐶²‖𝑥⃗‖², and if |1⟩ is measured, then (8.2.16) holds. □
We note that the proportionality factor 𝐶/‖𝑥⃗‖ can be obtained from 𝐶 and the probability of measuring |1⟩. Also, if the measurement of the first qubit gives |1⟩ but (8.2.5) does not hold, which in general is the case, then the final state is

(8.2.17) |𝜓₄⟩ = (𝐶/‖𝑥′⃗‖) |𝑥′⟩

where 𝑥′⃗ = (𝑥′₀, . . . , 𝑥′_{𝑀−1}) is an approximation of 𝑥⃗ and |𝑥′⟩ = ∑_{𝑗=0}^{𝑀−1} 𝑥′𝑗 |𝑢𝑗⟩.
8.3. Analysis and applications
This section presents the results of the complexity analysis of the HHL algorithm. As
in the two previous chapters, we assume in the following that all quantum circuits are
constructed using the elementary quantum gates from the platform specified in Section
4.12.2.
To state the complexity result, we need some further notations and assumptions.
We use 𝑀, 𝐴, 𝜆𝑗 , 𝑗 ∈ ℤ𝑀 , 𝑏,⃗ 𝑥,⃗ 𝑥′⃗ , |𝑏⟩, |𝑥⟩, and |𝑥′ ⟩ as introduced in the previous sections.
The condition number of 𝐴 is

(8.3.1) 𝜅 = max{|𝜆𝑖| ∶ 𝑖 ∈ ℤ𝑀} / min{|𝜆𝑖| ∶ 𝑖 ∈ ℤ𝑀}.
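For intuition, the condition number (8.3.1) of a small Hermitian example can be computed directly from its eigenvalues; the 2 × 2 real symmetric matrix below is an arbitrary illustration, with its eigenvalues obtained from the closed formula for that case.

```python
import math

# Condition number (8.3.1) for [[a, b], [b, d]], a real symmetric example.
a, b, d = 2.0, 1.0, 2.0
t = (a + d) / 2
s = math.sqrt(((a - d) / 2) ** 2 + b * b)
eigenvalues = [t + s, t - s]  # 3.0 and 1.0 for this example

kappa = max(abs(l) for l in eigenvalues) / min(abs(l) for l in eigenvalues)
print(kappa)  # 3.0
```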
It is assumed that the matrix 𝐴 is 𝑠-sparse and efficiently row computable, which means
that there are at most 𝑠 nonzero entries per row, and, given a row index, the entries of
the corresponding row can be computed in time O(𝑠).
The parameters 𝐶 and 𝑛 are chosen in such a way that ‖‖𝑥⃗ − 𝑥′⃗ ‖‖ < 𝜀 for some error
parameter 𝜀 ∈ ℝ>0 .
With the given notations and assumptions, the time complexity of the HHL algorithm can be expressed as

(8.3.2) O((log 𝑀 ⋅ 𝑠² ⋅ 𝜅²)/𝜀).
So, the complexity is only polylogarithmic in 𝑀 as long as 𝑠𝜅 has this property. This
means that if 𝐴 is sparse and the condition number is small, the algorithm provides an
exponential speedup compared to the best-known classical algorithms. However, the
end result is a quantum state representing 𝑥′⃗ , and as previously mentioned in Section
8.1, obtaining all 𝑀 components of 𝑥′⃗ would take time at least 𝑀 and thus undermine the exponential speedup. But within the time complexity (8.3.2), it is possible to measure
the final state of the HHL algorithm and obtain information about 𝑥.⃗ This has inter-
esting applications in various domains such as machine learning, data analysis, and
optimization.
Appendix A

Foundations

To understand quantum algorithms and their underlying principles, a solid mathematical foundation is indispensable. To ensure the comprehensiveness of this book, this
appendix provides important mathematical concepts and results used throughout.
The initial part of this appendix comprises fundamental notions such as numbers,
relations, functions, and operations. Subsequently, we explain directed graphs, used
to model Boolean circuits, and the asymptotic notation, which is an indispensable tool
for algorithm analysis.
To comprehend the algorithms devised by Peter Shor, as discussed in Chapter 6,
some number theory becomes essential. This is addressed in the next section and in-
cludes the exploration of the utility of continued fractions in determining good rational
approximations, which is crucial in the order finding algorithm.
We also present fundamental concepts from algebra, encompassing groups, rings,
and fields. In addition, many algorithms require familiarity with trigonometric identi-
ties and inequalities. These are presented in the concluding section.
A.1. Basics
A.1.1. Numbers. We denote the usual sets of numbers as follows.
• ℕ is the set of natural numbers; i.e., ℕ = {1, 2, . . .}.
• ℕ0 is the set of natural numbers including 0; i.e., ℕ0 = {0, 1, 2, . . .}.
• ℤ is the set of integers; i.e., ℤ = {0, ±1, ±2, . . .}.
• ℚ is the set of rational numbers; i.e., ℚ = {𝑝/𝑞 ∶ 𝑝 ∈ ℤ, 𝑞 ∈ ℕ}.
• ℝ is the set of real numbers, i.e., the set of all numbers that can be represented by
infinite decimals, like √2 = 1.414 . . . or 𝜋 = 3.14159 . . ..
• ℂ is the set of complex numbers, i.e., the set of all numbers 𝛾 = 𝛼 + 𝑖𝛽 where 𝛼, 𝛽
are real numbers and 𝑖 is a square root of −1; i.e., 𝑖² = −1. In this representation,
𝛼 is called the real part of 𝛾 and is denoted by ℜ𝛾 and 𝛽 is called the imaginary
part of 𝛾 and is denoted by ℑ𝛾.
We note that
(A.1.1) ℕ ⊂ ℕ0 ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ.
For every 𝑘 ∈ ℕ we write
(A.1.2) ℤ𝑘 = {0, 1, . . . , 𝑘 − 1}.
Furthermore, if 𝑙 ∈ ℤ, we denote by 𝑙 mod 𝑘 the remainder of the division of 𝑙 by 𝑘.
Example A.1.1. We have ℤ5 = {0, 1, 2, 3, 4} and 123 mod 5 = 3.
Next, we use the following notation.
Definition A.1.2. Let 𝑟 be a real number. Then we set
(1) ⌊𝑟⌋ = max{𝑧 ∈ ℤ ∶ 𝑧 ≤ 𝑟},
(2) ⌈𝑟⌉ = min{𝑧 ∈ ℤ ∶ 𝑧 ≥ 𝑟}, and
(3) ⌊𝑟⌉ to be the uniquely determined integer 𝑧 with −1/2 ≤ 𝑟 − 𝑧 < 1/2.
Example A.1.3. We have ⌊1.3⌋ = 1, ⌈1.3⌉ = 2, ⌊1.3⌉ = 1, ⌊−1.3⌋ = −2, ⌈−1.3⌉ = −1,
⌊−1.3⌉ = −1.
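The three bracket operations of Definition A.1.2 can be tried out in Python; `round_half_up` is a helper name introduced here for ⌊𝑟⌉, since Python's built-in `round` breaks ties differently (half to even).

```python
import math

def round_half_up(r):
    # Nearest integer z with -1/2 <= r - z < 1/2 (Definition A.1.2(3)).
    return math.floor(r + 0.5)

print(math.floor(1.3), math.ceil(1.3), round_half_up(1.3))     # 1 2 1
print(math.floor(-1.3), math.ceil(-1.3), round_half_up(-1.3))  # -2 -1 -1
```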
A.1.2. Relations.
Definition A.1.4. (1) Let 𝑆 and 𝑇 be sets. The Cartesian product 𝑆 × 𝑇 of 𝑆 and 𝑇 is
the set of all pairs (𝑠, 𝑡) with 𝑠 ∈ 𝑆 and 𝑡 ∈ 𝑇; that is,
(A.1.3) 𝑆 × 𝑇 = {(𝑠, 𝑡) ∶ 𝑠 ∈ 𝑆, 𝑡 ∈ 𝑇}.
(2) More generally, if 𝑘 ∈ ℕ and 𝑆₀, . . . , 𝑆𝑘−1 are sets, then the Cartesian product 𝑆₀ × ⋯ × 𝑆𝑘−1 of these sets is the set of all tuples (𝑠₀, . . . , 𝑠𝑘−1) where 𝑠𝑖 ∈ 𝑆𝑖, 𝑖 ∈ ℤ𝑘; i.e.,

(A.1.4) 𝑆₀ × ⋯ × 𝑆𝑘−1 = {(𝑠₀, . . . , 𝑠𝑘−1) ∶ 𝑠𝑖 ∈ 𝑆𝑖, 𝑖 ∈ ℤ𝑘}.

We also write ∏_{𝑖=0}^{𝑘−1} 𝑆𝑖 for this Cartesian product.
Definition A.1.5. Let 𝑆 and 𝑇 be sets. A relation between 𝑆 and 𝑇 is a subset 𝑅 of the
Cartesian product 𝑆 × 𝑇. If 𝑆 = 𝑇, then 𝑅 is called a relation on 𝑆.
Example A.1.6. Consider the two sets 𝑆 = {“odd”, “even”}, 𝑇 = ℤ. Then “is the parity
of” is a relation between 𝑆 and 𝑇. Denote it by 𝑅. A pair (𝑠, 𝑡) is in 𝑅 if and only if 𝑠
is the parity of 𝑡. For example, (“even”, 2) is in 𝑅. Also, (“odd”, −3) is in 𝑅. However,
(“odd”, 0) is not in 𝑅.
We introduce a few important notions regarding relations on a single set.
Definition A.1.7. Let 𝑆 be a set and let 𝑅 ⊂ 𝑆 × 𝑆 be a relation on 𝑆.
(1) The relation 𝑅 is called reflexive if (𝑠, 𝑠) ∈ 𝑅 for all 𝑠 ∈ 𝑆.
(2) The relation 𝑅 is called symmetric if for any pair (𝑠, 𝑡) in 𝑅, the pair (𝑡, 𝑠) is also in 𝑅.
(3) The relation 𝑅 is called antisymmetric if (𝑠, 𝑡) ∈ 𝑅 and (𝑡, 𝑠) ∈ 𝑅 implies 𝑠 = 𝑡 for
all 𝑠, 𝑡 ∈ 𝑆.
(4) The relation 𝑅 is called transitive if for all 𝑠, 𝑡, 𝑢 ∈ 𝑆 such that both (𝑠, 𝑡) and (𝑡, 𝑢)
are in 𝑅, the pair (𝑠, 𝑢) is also in 𝑅.
(5) The relation 𝑅 is called an equivalence relation if it is reflexive, symmetric, and
transitive.
Example A.1.8. Consider the relation ≤ on ℤ. To be more explicit, this relation is defined as

(A.1.5) 𝑅 = {(𝑠, 𝑡) ∶ 𝑠, 𝑡 ∈ ℤ, 𝑠 ≤ 𝑡}.
This relation is reflexive since 𝑠 ≤ 𝑠 for all 𝑠 ∈ ℤ. It is antisymmetric since 𝑠 ≤ 𝑡 and


𝑡 ≤ 𝑠 implies 𝑠 = 𝑡 for all 𝑠, 𝑡 ∈ ℤ. The relation is also transitive, since 𝑠 ≤ 𝑡 and 𝑡 ≤ 𝑢
implies 𝑠 ≤ 𝑢 for all 𝑠, 𝑡, 𝑢 ∈ ℤ.
Definition A.1.9. Let 𝑆 be a set, and let 𝑅 ⊂ 𝑆 × 𝑆 be an equivalence relation on 𝑆.
(1) The equivalence class of an element 𝑠 ∈ 𝑆 with respect to the relation 𝑅 is the set
[𝑠]𝑅 = {𝑡 ∈ 𝑆 ∶ (𝑠, 𝑡) ∈ 𝑅}.
(2) The set of all equivalence classes of 𝑆 with respect to 𝑅 is written as 𝑆/𝑅. An
element of an equivalence class is called a representative of this equivalence class.
Theorem A.1.10. If 𝑆 is a set and 𝑅 is an equivalence relation on 𝑆, then the equivalence classes of two elements of 𝑆 are either equal or disjoint. In other words, 𝑆 is the disjoint union
of the equivalence classes in 𝑆/𝑅.
Exercise A.1.11. Prove Theorem A.1.10.

Example A.3.3 shows an equivalence relation.
A.1.3. Functions. In this section, we introduce and discuss functions.
Definition A.1.12. A function is a triplet 𝑓 = (𝑆, 𝑇, 𝑅) where 𝑆 and 𝑇 are sets and 𝑅 is
a relation between 𝑆 and 𝑇 that associates every element of 𝑆 with exactly one element
of 𝑇. This means that for every 𝑠 ∈ 𝑆 there is exactly one 𝑡 ∈ 𝑇 such that (𝑠, 𝑡) ∈ 𝑅.
This element 𝑡 is denoted by 𝑓(𝑠). We will write the function as
(A.1.6) 𝑓 ∶ 𝑆 → 𝑇

or, more explicitly, as

(A.1.7) 𝑓 ∶ 𝑆 → 𝑇, 𝑠 ↦ 𝑓(𝑠).
Such a function is also called a map or mapping from 𝑆 to 𝑇. The set of all functions
(𝑆, 𝑇, 𝑓) is denoted by 𝑆 𝑇 .
We introduce more terminology regarding functions.
Definition A.1.13. Let
(A.1.8) 𝑓∶𝑆→𝑇
be a function.
(1) The set 𝑆 is called the domain and 𝑇 is called the codomain of 𝑓.
(2) Any 𝑠 ∈ 𝑆 is called an argument or input of 𝑓 and 𝑓(𝑠) is the value or image of 𝑓 on input 𝑠. We also call 𝑓(𝑠) the image of 𝑠 under 𝑓 and say that 𝑓 maps 𝑠 to 𝑓(𝑠).
(3) If 𝑆 ′ ⊂ 𝑆, then we denote by 𝑓(𝑆 ′ ) the set of the images of all arguments in the
subset 𝑆 ′ ; i.e.,
(A.1.9) 𝑓(𝑆′ ) = {𝑓(𝑠) ∶ 𝑠 ∈ 𝑆 ′ }.
We call 𝑓(𝑆 ′ ) the image of 𝑆 ′ under 𝑓.
(4) If 𝑠 ∈ 𝑆 and 𝑡 ∈ 𝑇 with 𝑓(𝑠) = 𝑡, then we call 𝑠 an inverse image of 𝑡 under 𝑓.
(5) Let 𝑇 ′ ⊂ 𝑇. The set of all inverse images of the elements of 𝑇 ′ is denoted by
𝑓−1 (𝑇 ′ ). Furthermore, the inverse image 𝑓−1 ({𝑡}) of a single element 𝑡 ∈ 𝑇 is
denoted by 𝑓−1 (𝑡).
Definition A.1.14. (1) The identity function on a set 𝑆 is the function 𝐼𝑆 ∶ 𝑆 →
𝑆, 𝑠 ↦ 𝑠. This function is also called the identity map, identity mapping, or
identity relation.
(2) For sets 𝐴, 𝐵, 𝐶 with 𝐵 ⊂ 𝐴 and a map 𝑓 ∶ 𝐴 → 𝐶 we denote by 𝑓|𝐵 the restriction
of 𝑓 to 𝐵, i.e., the map 𝑓|𝐵 ∶ 𝐵 → 𝐶, 𝑏 ↦ 𝑓(𝑏).
Definition A.1.15. Let 𝑓 ∶ 𝑆 → 𝑇 be a function.
(1) The function 𝑓 is called injective, one-to-one, or an injection if for all 𝑠 ∈ 𝑆 there is
exactly one inverse image of 𝑓(𝑠) under 𝑓, namely 𝑠.
(2) The function 𝑓 is called surjective, onto, or a surjection if 𝑓(𝑆) = 𝑇; i.e., for all
𝑡 ∈ 𝑇 there is an argument 𝑠 ∈ 𝑆 such that 𝑓(𝑠) = 𝑡.
(3) The function 𝑓 is called bijective or a bijection if it is injective and surjective.
(4) If 𝑓 is a bijection and 𝑆 = 𝑇, then 𝑓 is called a permutation of 𝑆.
(5) If 𝑓 is a bijection, then we denote by 𝑓−1 the function that maps 𝑡 ∈ 𝑇 to its
uniquely determined inverse image 𝑠 ∈ 𝑆 under 𝑓. This function is called the
inverse of 𝑓.
Example A.1.16. (1) Consider the function
(A.1.10) 𝑓 ∶ ℤ → {“even”, “odd”}, 𝑠 ↦ parity of 𝑠.
For example, we have 𝑓(2) = “even” and 𝑓(−3) = “odd”. This function is not
injective because there are many even and odd integers. However, the function is
surjective because there exist even and odd integers.
(2) Next, consider the function
(A.1.11) 𝑓 ∶ ℤ → ℤ, 𝑠 ↦ 𝑠 mod 11.
This function is neither injective nor surjective. For example, 𝑓(0) = 𝑓(11) = 0.
Therefore, 𝑓 is not injective. In addition, 𝑓 is not surjective since 𝑓(ℤ) = ℤ11 .
However, if we restrict the domain of 𝑓 to ℤ11 , then 𝑓 becomes injective. Also,
if we restrict the codomain of 𝑓 to ℤ11 , then 𝑓 becomes surjective. In fact, the
function 𝑓 ∶ ℤ11 → ℤ11 , 𝑠 ↦ 𝑠 mod 11 is the identity map on ℤ11 . Another
bijection is

(A.1.12) 𝑓 ∶ ℤ11 → ℤ11 , 𝑠 ↦ (𝑠 + 1) mod 11.

This function is not the identity map.
Next, we introduce the composition of functions.
Definition A.1.17. Let 𝑆, 𝑇, 𝑈 be sets and let

(A.1.13) 𝑓 ∶ 𝑇 → 𝑈, 𝑔 ∶ 𝑆 → 𝑇

be functions. Then the composition of 𝑓 and 𝑔 is the map

(A.1.14) 𝑓 ∘ 𝑔 ∶ 𝑆 → 𝑈, 𝑠 ↦ 𝑓(𝑔(𝑠)).
Example A.1.18. Consider the functions

(A.1.15) 𝑓 ∶ ℤ6 → ℤ5, 𝑥 ↦ 𝑥 mod 5

and

(A.1.16) 𝑔 ∶ ℤ → ℤ6, 𝑥 ↦ 𝑥 mod 6.

Then

(A.1.17) 𝑓 ∘ 𝑔 ∶ ℤ → ℤ5, 𝑥 ↦ 𝑓(𝑔(𝑥)) = (𝑥 mod 6) mod 5.

For instance, we have

(A.1.18) (𝑓 ∘ 𝑔)(11) = 𝑓(𝑔(11)) = (11 mod 6) mod 5 = 5 mod 5 = 0.
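Example A.1.18 can be replayed in Python:

```python
# f(x) = x mod 5, g(x) = x mod 6, and (f ∘ g)(x) = (x mod 6) mod 5.
f = lambda x: x % 5
g = lambda x: x % 6
compose = lambda x: f(g(x))

print(compose(11))  # (11 mod 6) mod 5 = 5 mod 5 = 0
```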

A.1.4. Operations. In order to be able to define algebraic structures, we introduce operations on a nonempty set 𝑆.

Definition A.1.19. A binary operation on 𝑆 is a map

(A.1.19) ∘ ∶ 𝑆 × 𝑆 → 𝑆.

We write the image ∘(𝑠, 𝑠′) of (𝑠, 𝑠′) under this map as 𝑠 ∘ 𝑠′.
Definition A.1.20. Let ∘ be an operation on 𝑆.
(1) The operation ∘ is called associative if (𝑎 ∘ 𝑏) ∘ 𝑐 = 𝑎 ∘ (𝑏 ∘ 𝑐) for all 𝑎, 𝑏, 𝑐 ∈ 𝑆.
(2) The operation ∘ is called commutative if 𝑎 ∘ 𝑏 = 𝑏 ∘ 𝑎 for all 𝑎, 𝑏 ∈ 𝑆.
(3) An element 𝑖 of 𝑆 is called the identity element with respect to ∘ if 𝑎 ∘ 𝑖 = 𝑖 ∘ 𝑎 = 𝑎
for all 𝑎 ∈ 𝑆.
Example A.1.21. Addition and multiplication are binary operations on the sets of nat-
ural numbers ℕ and on the set of integers ℤ.
Let 𝑚 be a positive integer. We define the binary operations addition and multiplication on ℤ𝑚 as follows:

(A.1.20)
+𝑚 ∶ ℤ𝑚 × ℤ𝑚 → ℤ𝑚, (𝑎, 𝑏) ↦ 𝑎 +𝑚 𝑏 = (𝑎 + 𝑏) mod 𝑚,
⋅𝑚 ∶ ℤ𝑚 × ℤ𝑚 → ℤ𝑚, (𝑎, 𝑏) ↦ 𝑎 ⋅𝑚 𝑏 = (𝑎 ⋅ 𝑏) mod 𝑚.
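The operations in (A.1.20) can be written in Python, here for the illustrative choice 𝑚 = 7:

```python
# The binary operations +_m and ·_m on Z_m, for m = 7.
m = 7
add_m = lambda a, b: (a + b) % m
mul_m = lambda a, b: (a * b) % m

print(add_m(5, 4), mul_m(5, 4))  # 2 6
```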
A.1.5. Directed graphs. To define Boolean circuits in Chapter 1, directed graphs are required, which we introduce now.
Definition A.1.22. A directed graph is a pair 𝐺 = (𝑉, 𝐸) where 𝑉 is a nonempty set
and 𝐸 is a subset of 𝑉². An element of 𝑉 is called a vertex of 𝐺 and an element of 𝐸 is called an edge of 𝐺.
Figure A.1.1 gives an example of a directed graph.
Definition A.1.23. Let 𝐺 = (𝑉, 𝐸) be a directed graph.
(1) An edge (𝑢, 𝑣) of 𝐺 is called an outgoing edge of 𝑢 and an incoming edge of 𝑣.
(2) Let 𝑘 ∈ ℕ. A sequence (𝑣₀, . . . , 𝑣𝑘) is called a path in 𝐺 if (𝑣𝑖, 𝑣𝑖+1) ∈ 𝐸 for 0 ≤ 𝑖 < 𝑘. The length of such a path is 𝑘, and the path is called a cycle if 𝑘 > 0 and 𝑣₀ = 𝑣𝑘.
(3) The graph 𝐺 is called acyclic if it has no cycles.
Exercise A.1.24. (1) Find the incoming and outgoing edges of all vertices of the
graph 𝐺 in Figure A.1.1.
(2) Remove a minimum number of edges from 𝐺 such that the graph becomes acyclic.
[The figure shows a directed graph on the four vertices A, B, C, D; its edges are drawn as arrows.]
Figure A.1.1. Example of a directed graph.
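A directed graph in the sense of Definition A.1.22 can be stored as a vertex set and an edge set. The following sketch tests for a cycle by depth-first search; the small graphs at the end are illustrative choices and are not taken from Figure A.1.1.

```python
def has_cycle(vertices, edges):
    # Depth-first search for a cycle in a directed graph (V, E),
    # following Definition A.1.23(2)-(3).
    succ = {v: [] for v in vertices}
    for u, v in edges:
        succ[u].append(v)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in vertices}

    def visit(u):
        color[u] = GRAY
        for v in succ[u]:
            if color[v] == GRAY:  # back edge found: a cycle
                return True
            if color[v] == WHITE and visit(v):
                return True
        color[u] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in vertices)

print(has_cycle("ABCD", [("A", "B"), ("B", "C"), ("C", "A")]))  # True
print(has_cycle("ABCD", [("A", "B"), ("B", "C"), ("A", "C")]))  # False
```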
A.2. The asymptotic notation
In order to compare the asymptotic behavior of functions, the following notation is
used. It is especially useful in the complexity analysis of algorithms.
Definition A.2.1. Let 𝑋 ⊂ ℝ≥0 and let 𝑓, 𝑔 ∶ 𝑋 → ℝ≥0 be functions. Then we write
the following.
(1) 𝑓 = o(𝑔) if for all 𝜀 > 0 there is 𝑥0 > 0 such that for all 𝑥 > 𝑥0 we have 𝑓(𝑥) ≤
𝜀𝑔(𝑥).
(2) 𝑓 = 𝜔(𝑔) if 𝑔 = o(𝑓).
(3) 𝑓 = O(𝑔) if there are 𝐶 > 0 and 𝑥0 > 0 such that for all 𝑥 > 𝑥0 we have 𝑓(𝑥) ≤
𝐶𝑔(𝑥).
(4) 𝑓 = Ω(𝑔) if 𝑔 = O(𝑓).
(5) 𝑓 = Θ(𝑔) if 𝑓 = O(𝑔) and 𝑔 = O(𝑓).

This terminology can be asymptotically interpreted as follows: If 𝑓 = O(𝑔), then


𝑓 grows not faster than 𝑔. If 𝑓 = o(𝑔), then 𝑓 grows much slower than 𝑔. If 𝑓 = Θ(𝑔),
then these functions grow equally fast.
Example A.2.2. We have 𝑛² = o(𝑛³ + 1), 𝑛² = O(2𝑛² + 𝑛 + 1), 𝑛² = Θ(2𝑛² + 𝑛 + 1).
Exercise A.2.3. Prove the statements in Example A.2.2.
A.3. Number theory
We present the concepts and results from number theory that are used in this book.
A.3.1. Divisibility. We begin by discussing divisibility in ℤ.
Definition A.3.1. We say that an integer 𝑚 divides an integer 𝑎 if there is an integer 𝑛
with 𝑎 = 𝑛𝑚. If 𝑚 divides 𝑎, then 𝑚 is called a divisor of 𝑎, 𝑎 is called a multiple of 𝑚,
and we write 𝑚 ∣ 𝑎. We also say that 𝑎 is divisible by 𝑚. If 𝑚 is not a divisor of 𝑎, then
we write 𝑚 ∤ 𝑎.
Example A.3.2. We have 13 ∣ 182 because 182 = 14 ⋅ 13. Likewise, we have −5 ∣ 30 because 30 = (−6) ⋅ (−5). The divisors of 30 are ±1, ±2, ±3, ±5, ±6, ±10, ±15, ±30.
We note that any integer 𝑚 divides 0 because 0 = 𝑚 ⋅ 0. The only integer that is divisible by 0 is 0 because 𝑎 = 0 ⋅ 𝑚 implies 𝑎 = 0.
We also use divisibility in ℤ in the following example of an equivalence relation.
Example A.3.3. Let 𝑠 and 𝑡 be integers, and let 𝑚 be a positive integer. We write
(A.3.1) 𝑠 ≡ 𝑡 mod 𝑚
if 𝑚 divides 𝑡 − 𝑠. Consider the relation
(A.3.2) 𝑅 = {(𝑠, 𝑡) ∶ 𝑠, 𝑡 ∈ ℤ, 𝑠 ≡ 𝑡 mod 𝑚}.
It is called a congruence relation. It is reflexive because 𝑚 divides 𝑠 − 𝑠 = 0 for all 𝑠 ∈ ℤ.
This relation is symmetric since 𝑚 is a divisor of 𝑡 − 𝑠 if and only if 𝑚 is a divisor of 𝑠 − 𝑡.
Finally, the relation is transitive. To see this, we let 𝑠, 𝑡, 𝑢 be integers. Suppose that
(A.3.3) 𝑠 ≡ 𝑡 mod 𝑚, 𝑡 ≡ 𝑢 mod 𝑚.
Then 𝑚 divides 𝑡 − 𝑠 and 𝑢 − 𝑡. So we can write
(A.3.4) 𝑡 − 𝑠 = 𝑥𝑚, 𝑢 − 𝑡 = 𝑦𝑚
with two integers 𝑥, 𝑦. Therefore, we have the following:
(A.3.5) 𝑢 − 𝑠 = (𝑢 − 𝑡) + (𝑡 − 𝑠) = 𝑥𝑚 + 𝑦𝑚 = (𝑥 + 𝑦)𝑚.
This implies that 𝑚 divides 𝑢 − 𝑠 which means that
(A.3.6) 𝑠 ≡ 𝑢 mod 𝑚.
Since the congruence relation is reflexive, symmetric, and transitive, it is an equiva-
lence relation and we have
(A.3.7) ℤ/𝑅 = {𝑖 + 𝑚ℤ ∶ 0 ≤ 𝑖 < 𝑚}.
For instance, if 𝑚 = 3, then there are the three equivalence classes 0 + 3ℤ = {0, ±3,
±6, . . .}, 1+3ℤ = {. . . , −5, −2, 1, 4, 7, . . .}, and 2+3ℤ = {. . . , −4, −1, 2, 5, 8, . . .}. Typically,
ℤ/𝑅 is written as ℤ/𝑚ℤ.
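The equivalence classes of the congruence relation can be computed for a finite range of integers, here for 𝑚 = 3 as in the example (the range −6, . . . , 6 is an arbitrary window):

```python
# The congruence relation modulo m = 3 partitions the integers into the
# equivalence classes 0+3Z, 1+3Z, 2+3Z (Example A.3.3), shown on -6..6.
m = 3
classes = {i: [x for x in range(-6, 7) if (x - i) % m == 0] for i in range(m)}
print(classes[0])  # [-6, -3, 0, 3, 6]
print(classes[1])  # [-5, -2, 1, 4]
```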
A.3.2. Greatest common divisor. Our next topic is the greatest common divi-
sor of two integers and we will explain that the Euclidean algorithm computes it effi-
ciently. The proofs of all the results presented in this section can be found in [Buc04,
Section 1.10].
Definition A.3.4. Let 𝑎 and 𝑏 be integers. A common divisor of 𝑎 and 𝑏 is an integer that divides both 𝑎 and 𝑏.
Proposition A.3.5. Among all common divisors of two integers 𝑎 and 𝑏, which are not
both zero, there is exactly one greatest divisor (with respect to ≤). It is called the greatest
common divisor (gcd) of 𝑎 and 𝑏.
For completeness, we set the greatest common divisor of 0 and 0 to 0 (that is,
gcd(0, 0) = 0). Therefore, the greatest common divisor of two numbers is never nega-
tive.
We present another useful characterization of the greatest common divisor.
Proposition A.3.6. There is exactly one nonnegative common divisor of 𝑎 and 𝑏, which
is divisible by all other common divisors of 𝑎 and 𝑏, namely the greatest common divisor
of 𝑎 and 𝑏.
Example A.3.7. The greatest common divisor of 18 and 30 is 6. The greatest common
divisor of −10 and 20 is 10. The greatest common divisor of −20 and −14 is 2. The
greatest common divisor of 12 and 0 is 12.
An important property of the greatest common divisor is that it can be computed very efficiently by the Euclidean Algorithm 1.1.16. The next proposition states its complexity.
Proposition A.3.8. The Euclidean algorithm uses time O((bitLength 𝑎)(bitLength 𝑏))
and space O(bitLength 𝑎 + bitLength 𝑏) to compute gcd(𝑎, 𝑏).
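The Euclidean algorithm behind Proposition A.3.8 can be written in a few lines; the values checked below are those of Example A.3.7:

```python
def gcd(a, b):
    # Euclidean algorithm: repeatedly replace (a, b) by (b, a mod b).
    a, b = abs(a), abs(b)
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(18, 30), gcd(-10, 20), gcd(12, 0))  # 6 10 12
```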
In several contexts, we use Euler's totient function, which is defined next.

Definition A.3.9. Let 𝑚 ∈ ℕ.
(1) By ℤ∗𝑚 we denote the set of all 𝑎 ∈ ℤ𝑚 with gcd(𝑎, 𝑚) = 1.
(2) We write 𝜑(𝑚) = |ℤ∗𝑚 |.
(3) The function that sends 𝑚 ∈ ℕ to 𝜑(𝑚) is referred to as Euler’s totient function.

Example A.3.10. We have ℤ∗1 = ∅, 𝜑(1) = 0, ℤ∗2 = {1}, 𝜑(2) = 1, ℤ∗15 = {1, 2, 4, 7, 8,
11, 13, 14}, and 𝜑(15) = 8.
Exercise A.3.11. Let 𝑚 ∈ ℕ, 𝑚 > 1. Prove that ℤ∗𝑚 is a group with respect to multi-
plication modulo 𝑚.
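For small moduli, ℤ∗𝑚 and 𝜑(𝑚) can be enumerated directly from Definition A.3.9; a naive Python sketch for 𝑚 ≥ 2 (its running time is exponential in the bit length of 𝑚, so it is only for illustration):

```python
from math import gcd

def unit_group(m: int) -> list[int]:
    """Z*_m for m >= 2: all a in {1, ..., m-1} with gcd(a, m) = 1."""
    return [a for a in range(1, m) if gcd(a, m) == 1]

def euler_phi(m: int) -> int:
    """Euler's totient function phi(m) = |Z*_m| for m >= 2."""
    return len(unit_group(m))
```

This reproduces the values of Example A.3.10, e.g. euler_phi(15) returns 8.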

A.3.3. Least common multiple. We also require the least common multiple of
two integers.
Definition A.3.12. Let 𝑛 ∈ ℕ and let 𝑎0 , . . . , 𝑎𝑛−1 be nonzero integers. Then the least
common multiple of these integers is the smallest positive integer that is a multiple of
all 𝑎𝑖 . It is denoted by lcm(𝑎0 , . . . , 𝑎𝑛−1 ).
Example A.3.13. The least common multiple of 2, 3, 4 is lcm(2, 3, 4) = 12.

The next exercise justifies the definition of the least common multiple and finds
an algorithm for computing it.
Exercise A.3.14. (1) Prove the existence and uniqueness of the least common multi-
ple of finitely many nonzero integers. Why do these numbers have to be nonzero?
(2) Utilize the Euclidean algorithm to devise an algorithm for computing the least
common multiple in quadratic running time.
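One standard way to carry out part (2) of Exercise A.3.14 (not necessarily the book's intended solution) is the identity lcm(𝑎, 𝑏) = |𝑎𝑏|/gcd(𝑎, 𝑏), extended to several arguments by folding; a Python sketch:

```python
from functools import reduce
from math import gcd

def lcm2(a: int, b: int) -> int:
    """Least common multiple of two nonzero integers: |a*b| / gcd(a, b)."""
    return abs(a * b) // gcd(a, b)

def lcm(*numbers: int) -> int:
    """lcm(a_0, ..., a_{n-1}), computed by folding the two-argument lcm."""
    return reduce(lcm2, numbers)
```

For Example A.3.13, lcm(2, 3, 4) returns 12.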

A.3.4. Prime factor decomposition. A famous quantum algorithm by Peter


Shor, which we discuss in Chapter 6, can compute the prime factor decomposition
of a positive integer in polynomial time. We now explain its mathematical foundation.
Definition A.3.15. An integer 𝑝 > 1 is called a prime number if it has exactly two pos-
itive divisors, namely 1 and 𝑝. Instead of “prime number” we also simply say “prime”.
An integer 𝑎 > 1 that is not a prime is called composite. If the prime 𝑝 divides the
integer 𝑎, then 𝑝 is called a prime divisor of 𝑎.
Example A.3.16. The first eight prime numbers are 2, 3, 5, 7, 11, 13, 17, 19. As of 2023,
the largest known prime number is 2^82,589,933 − 1.

We state the fundamental theorem of arithmetic. It is also called the unique fac-
torization theorem and goes back to Euclid.
Theorem A.3.17. Every integer 𝑎 > 1 can be written as the product of prime numbers.
Up to permutation, the factors in this product are uniquely determined.
Example A.3.18. The French mathematician Pierre de Fermat (1601–1665) thought
that all of the so-called Fermat numbers
𝐹𝑖 = 2^(2^𝑖) + 1
are primes. In fact, 𝐹0 = 3, 𝐹1 = 5, 𝐹2 = 17, 𝐹3 = 257, and 𝐹4 = 65537 are prime
numbers. However, in 1732 Euler discovered that 𝐹5 = 641 ∗ 6700417 is composite.
Both factors in this decomposition are primes. 𝐹6 , 𝐹7 , 𝐹8 , and 𝐹9 are also composite. The
factorization of 𝐹6 was found in 1880 by Landry and Le Lasseur. The factorization of 𝐹7
was found in 1970 by Brillhart and Morrison. The factorization of 𝐹8 was computed in
1980 by Brent and Pollard, and 𝐹9 was factored in 1990 by Lenstra, Lenstra, Manasse,

and Pollard (see [LLMP93] where also references for the other results mentioned in
this example can be found). This shows the difficulty of the factoring problem. But
on the other hand, we also see that there is considerable progress. It took until 1970
to factor the 39-digit number 𝐹7 , but only 20 years later the 155-digit number 𝐹9 was
factored.

A.3.5. The continued fraction algorithm. In this section, we present the con-
tinued fraction algorithm (CFA) and its properties. It is used in Shor’s order finding
algorithm to compute good rational approximations.
We start with an example.
Example A.3.19. The continued fraction [𝑎0 , 𝑎1 , 𝑎2 ] = [1, 2, 3] represents the rational
number
(A.3.8) [1, 2, 3] = 1 + 1/(2 + 1/3).
This rational number is
(A.3.9) [1, 2, 3] = 1 + 1/(2 + 1/3) = 1 + 1/(7/3) = 1 + 3/7 = 10/7.
Definition A.3.20. Let 𝑛 ∈ ℕ0 and let (𝑎0 , 𝑎1 , . . . , 𝑎𝑛 ) ∈ ℚ≥0 × ℚ𝑛>0 . Then we set
(A.3.10) [𝑎0 , 𝑎1 , . . . , 𝑎𝑛 ] = 𝑎0 + 1/(𝑎1 + 1/(𝑎2 + ⋯ + 1/(𝑎𝑛−1 + 1/𝑎𝑛 ))).
This defines the map
(A.3.11) ℚ≥0 × ℚ𝑛>0 → ℚ ∶ (𝑎0 , 𝑎1 , . . . , 𝑎𝑛 ) ↦ [𝑎0 , . . . , 𝑎𝑛 ].

We make some remarks concerning this definition. Let 𝑛 ∈ ℕ0 and (𝑎0 , . . . , 𝑎𝑛 ) ∈


ℚ≥0 × ℚ𝑛>0 . Then we will denote by [𝑎0 , . . . , 𝑎𝑛 ] the rational number from Definition
A.3.20 but also the sequence (𝑎0 , . . . , 𝑎𝑛 ). This will make our conversation easier. How-
ever, for each rational number, there are multiple sequences that can be used to express
it in this manner. For example, for 𝑛 > 0 we have
(A.3.12) [𝑎0 , . . . , 𝑎𝑛 ] = [𝑎0 , . . . , 𝑎𝑛−1 + 1/𝑎𝑛 ].

Note that the sequence on the left side of (A.3.12) has length 𝑛, while the sequence on
the right side has length 𝑛 − 1. The following lemma generalizes this observation.
Lemma A.3.21. Let 𝑛, 𝑘 ∈ ℕ0 , (𝑎0 , . . . , 𝑎𝑛 ) ∈ ℚ≥0 × ℚ𝑛>0 , and (𝑏0 , . . . , 𝑏𝑘 ) ∈ ℚ𝑘+1>0 .
Then we have
(A.3.13) [𝑎0 , . . . , 𝑎𝑛 , [𝑏0 , . . . , 𝑏𝑘 ]] = [𝑎0 , . . . , 𝑎𝑛 , 𝑏0 , . . . , 𝑏𝑘 ].

Proof. We prove the assertion by induction on 𝑘. For 𝑘 = 0, it follows from the fact that
[𝑏0 ] = 𝑏0 . Now, assume that the assertion holds for 𝑘 − 1. The induction hypothesis
and (A.3.12) imply
[𝑎0 , . . . , 𝑎𝑛 , [𝑏0 , . . . , 𝑏𝑘 ]]
= [𝑎0 , . . . , 𝑎𝑛 , [𝑏0 , . . . , 𝑏𝑘−1 + 1/𝑏𝑘 ]]
(A.3.14)
= [𝑎0 , . . . , 𝑎𝑛 , 𝑏0 , . . . , 𝑏𝑘−1 + 1/𝑏𝑘 ]
= [𝑎0 , . . . , 𝑎𝑛 , 𝑏0 , . . . , 𝑏𝑘 ]. □

Now we define finite simple continued fractions.

Definition A.3.22. A finite simple continued fraction is a finite sequence (𝑎0 , 𝑎1 , . . . , 𝑎𝑛 )


∈ ℕ0 × ℕ𝑛 . It has length 𝑛 and represents the rational number [𝑎0 , . . . , 𝑎𝑛 ].

As in the general case, we write [𝑎0 , . . . , 𝑎𝑛 ] for a continued fraction (𝑎0 , . . . 𝑎𝑛 );


i.e., we write it in the same way as the rational number represented by it. Example
A.3.19 shows a finite simple continued fraction.
Next, we present an algorithm that, given a nonnegative rational number, finds a
continued fraction that represents it.

Algorithm A.3.23. Continued fraction algorithm


Input: 𝑝 ∈ ℕ0 , 𝑞 ∈ ℕ
Output: A continued fraction [𝑎0 , 𝑎1 , . . . , 𝑎𝑛 ] ∈ ℕ0 × ℕ𝑛 that represents 𝑝/𝑞
1: CFA(𝑝, 𝑞)
2: 𝑟−1 ← 𝑝
3: 𝑟0 ← 𝑞
4: 𝑖 ← −1
5: repeat
6: 𝑖 ←𝑖+1
7: 𝑎𝑖 ← ⌊𝑟 𝑖−1 /𝑟 𝑖 ⌋
8: 𝑟 𝑖+1 ← 𝑟 𝑖−1 − 𝑎𝑖 𝑟 𝑖
9: until 𝑟 𝑖+1 = 0
10: 𝑛←𝑖
11: return ([𝑎0 , . . . , 𝑎𝑛 ])
12: end

Example A.3.24. Let 𝑝 = 15 and 𝑞 = 13. The sequence of values 𝑟 𝑖 and 𝑎𝑖 from
Algorithm A.3.23 is shown in Table A.3.1. We verify that [1, 6, 2] = 15/13. We have
(A.3.15) [1, 6, 2] = 1 + 1/(6 + 1/2) = 1 + 1/(13/2) = 1 + 2/13 = 15/13.

Table A.3.1. Run of the continued fraction algorithm with input 𝑝 = 15, 𝑞 = 13.

𝑖 −1 0 1 2
𝑟𝑖 15 13 2 1
𝑎𝑖 1 6 2
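Algorithm A.3.23 translates almost line by line into Python; a sketch, with variable names chosen to mirror the 𝑟 𝑖 and 𝑎𝑖 of the pseudocode:

```python
def cfa(p: int, q: int) -> list[int]:
    """Continued fraction expansion of p/q (p >= 0, q >= 1), as in Algorithm A.3.23.

    Returns the quotient sequence [a_0, ..., a_n]; the r_i are the remainders
    produced by the Euclidean algorithm (cf. Exercise A.3.25).
    """
    r_prev, r = p, q       # r_{-1} and r_0
    a = []
    while True:
        a.append(r_prev // r)                 # a_i = floor(r_{i-1} / r_i)
        r_prev, r = r, r_prev - a[-1] * r     # r_{i+1} = r_{i-1} - a_i * r_i
        if r == 0:
            return a
```

For the run of Example A.3.24, cfa(15, 13) returns [1, 6, 2], and for Example A.3.19, cfa(10, 7) returns [1, 2, 3].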

As shown in Exercise A.3.25, the continued fraction algorithm is a variant of the


Euclidean algorithm that outputs the sequence (𝑎0 , . . . , 𝑎𝑛 ) of quotients that are com-
puted in this algorithm.
Exercise A.3.25. Use the notation from Algorithm A.3.23 and show that 𝑟𝑛 = gcd(𝑝, 𝑞).

The next proposition shows that the continued fraction algorithm yields the correct
result.
Proposition A.3.26. On input of 𝑝, 𝑞 ∈ ℕ, Algorithm A.3.23 computes a continued
fraction [𝑎0 , 𝑎1 , . . . , 𝑎𝑛 ] for some 𝑛 ∈ ℕ0 that represents 𝑝/𝑞 and satisfies 𝑎𝑛 > 1 if 𝑛 > 0.
It is the only continued fraction with these properties.

Proof. We use the notation of Algorithm A.3.23. After each iteration of the repeat
loop, we have
(A.3.16) 𝑟 𝑖−1 = 𝑎𝑖 𝑟 𝑖 + 𝑟 𝑖+1 and 0 ≤ 𝑟 𝑖+1 < 𝑟 𝑖 for 0 ≤ 𝑖 ≤ 𝑛.
This shows that the sequence 𝑟0 , 𝑟1 , . . . is strictly decreasing. Therefore, the algorithm
terminates.
For 0 ≤ 𝑖 ≤ 𝑛 let 𝛼𝑖 = [𝑎𝑖 , . . . , 𝑎𝑛 ]. Then we have
(A.3.17) 𝛼𝑛 = 𝑎𝑛 and 𝛼𝑖 = 𝑎𝑖 + 1/𝛼𝑖+1 for 0 ≤ 𝑖 < 𝑛.
We show by induction on 𝑖 = 𝑛, 𝑛 − 1, . . . , 0 that
(A.3.18) 𝛼𝑖 = 𝑟𝑖−1 /𝑟𝑖 and 𝑎𝑖 = ⌊𝛼𝑖 ⌋.
For 𝑖 = 0, this shows that [𝑎0 , . . . , 𝑎𝑛 ] = 𝑝/𝑞.
The base case where 𝑖 = 𝑛 is obtained from 𝑟𝑛−1 = 𝑎𝑛 𝑟𝑛 , which follows from
(A.3.16) and 𝑟𝑛+1 = 0. For the inductive step, let 𝑖 ∈ ℕ, 𝑛 ≥ 𝑖 > 0 and assume that
(A.3.18) holds. Then (A.3.16) implies
(A.3.19) 𝑟𝑖−2 /𝑟𝑖−1 = 𝑎𝑖−1 + 1/(𝑟𝑖−1 /𝑟𝑖 ) = 𝑎𝑖−1 + 1/𝛼𝑖 = 𝛼𝑖−1 .
Since 𝛼𝑖 = 𝑟𝑖−1 /𝑟𝑖 > 1 by (A.3.16), this implies 𝑎𝑖−1 = ⌊𝛼𝑖−1 ⌋. This completes the
induction proof.
Next, we assume that 𝑛 > 0 and show that 𝑎𝑛 > 1. Since 𝑟𝑛+1 = 0, we have
𝑟𝑛−1 = 𝑎𝑛 𝑟𝑛 . So 𝑎𝑛 = 1 would imply 𝑟𝑛−1 = 𝑟𝑛 which contradicts (A.3.16).
Finally, we prove the uniqueness of [𝑎0 , . . . , 𝑎𝑛 ]. Let 𝑘 ∈ ℕ0 , 𝑘 ≥ 𝑛, and let
[𝑏0 , . . . , 𝑏𝑘 ] be a continued fraction that represents 𝑝/𝑞 with 𝑏𝑘 > 1 for 𝑘 > 0. If 𝑘 = 0,
then we have 𝑛 = 0 and 𝑝/𝑞 = 𝑎0 = 𝑏0 . Assume that 𝑘 > 0. For 0 ≤ 𝑖 ≤ 𝑘, let
𝛽 𝑖 = [𝑏𝑖 , . . . , 𝑏𝑘 ]. Then we have
(A.3.20) 𝛽 𝑘 = 𝑏𝑘 and 𝛽 𝑖 = 𝑏𝑖 + 1/𝛽 𝑖+1 for 0 ≤ 𝑖 < 𝑘.
Since 𝑏𝑘 > 1, it follows from (A.3.20) that

(A.3.21) 𝛽𝑖 > 1 for 0 ≤ 𝑖 ≤ 𝑘.

From (A.3.20) and (A.3.21) we obtain

(A.3.22) 𝑏𝑖 = ⌊𝛽 𝑖 ⌋ for 0 ≤ 𝑖 ≤ 𝑘.

We show by induction on 𝑖 = 0, . . . , 𝑛 that

(A.3.23) 𝛼𝑖 = 𝛽 𝑖 and 𝑎𝑖 = 𝑏𝑖 .
𝑟
By assumption, we have 𝛼0 = [𝑎0 , 𝑎1 , . . . , 𝑎𝑛 ] = 𝑟−1 /𝑟0 = [𝑏0 , . . . , 𝑏𝑘 ] = 𝛽0 and by
(A.3.18) and (A.3.22) 𝑏0 = 𝑎0 . Now assume that 0 ≤ 𝑖 < 𝑛 and that (A.3.23) holds.
Then it follows from (A.3.17) and (A.3.20) that
(A.3.24) 𝛽 𝑖+1 = 1/(𝛽 𝑖 − 𝑏𝑖 ) = 1/(𝛼𝑖 − 𝑎𝑖 ) = 𝛼𝑖+1 .
So (A.3.18) and (A.3.22) imply

(A.3.25) 𝑏𝑖+1 = ⌊𝛽 𝑖+1 ⌋ = ⌊𝛼𝑖+1 ⌋ = 𝑎𝑖+1 .

Finally, since by (A.3.20) we have 𝛽 𝑖 > 𝑏𝑖 for 0 ≤ 𝑖 < 𝑘, it follows from 𝛽𝑛 = 𝛼𝑛 = 𝑎𝑛 =
𝑏𝑛 that 𝑛 = 𝑘. □

Definition A.3.27. Let 𝛼 ∈ ℚ≥0 . The uniquely determined continued fraction from
Proposition A.3.26 that represents 𝛼 is called the continued fraction expansion of 𝛼.

We estimate the running time of the continued fraction algorithm.

Proposition A.3.28. Let 𝑝 ∈ ℕ0 , 𝑞 ∈ ℕ and let 𝑙 = max{bitLength(𝑝), bitLength(𝑞)}.


Then the number of iterations in the continued fraction algorithm is O(𝑙) and its running
time and space requirement is O(𝑙2 ).

Proof. The statement can be proved using the techniques from Section 1.6.4 in
[Buc04]. Note that the space is required to represent continued fractions. □

We now introduce convergents of continued fractions.

Definition A.3.29. Let [𝑎0 , . . . , 𝑎𝑛 ] be a continued fraction. Then for 0 ≤ 𝑖 ≤ 𝑛 the


rational numbers [𝑎0 , . . . , 𝑎𝑖 ] are called its convergents.
Example A.3.30. The convergents of the continued fraction expansion [1, 6, 2] of 15/13
are [1] = 1, [1, 6] = 1 + 1/6 = 7/6, and [1, 6, 2] = 15/13.

The following proposition provides a method for computing convergents.


Proposition A.3.31. Let 𝑛 ∈ ℕ0 and (𝑎0 , . . . , 𝑎𝑛 ) ∈ ℚ≥0 × ℚ𝑛>0 . Set
(A.3.26) 𝑝−2 = 0, 𝑝−1 = 1, 𝑞−2 = 1, 𝑞−1 = 0
and for 0 ≤ 𝑖 ≤ 𝑛 let
(A.3.27) 𝑝 𝑖 = 𝑎𝑖 𝑝 𝑖−1 + 𝑝 𝑖−2 , 𝑞𝑖 = 𝑎𝑖 𝑞𝑖−1 + 𝑞𝑖−2 .
Then
(A.3.28) [𝑎0 , . . . , 𝑎𝑖 ] = 𝑝 𝑖 /𝑞𝑖 for 0 ≤ 𝑖 ≤ 𝑛.

Proof. We prove the assertion by induction on 𝑛. For the base case, we note that
(A.3.29) 𝑝0 /𝑞0 = 𝑎0 /1 = [𝑎0 ]
and
(A.3.30) 𝑝1 /𝑞1 = (𝑎1 𝑎0 + 1)/𝑎1 = 𝑎0 + 1/𝑎1 = [𝑎0 , 𝑎1 ].
For the inductive step, let 𝑖 ∈ ℕ0 , 0 ≤ 𝑖 < 𝑛 and assume that the assertion of the
proposition holds for 𝑖. The induction hypothesis gives
[𝑎0 , . . . , 𝑎𝑖+1 ] = [𝑎0 , . . . , 𝑎𝑖 + 1/𝑎𝑖+1 ]
= ((𝑎𝑖 + 1/𝑎𝑖+1 )𝑝 𝑖−1 + 𝑝 𝑖−2 )/((𝑎𝑖 + 1/𝑎𝑖+1 )𝑞𝑖−1 + 𝑞𝑖−2 )
(A.3.31)
= (𝑝 𝑖 + 𝑝 𝑖−1 /𝑎𝑖+1 )/(𝑞𝑖 + 𝑞𝑖−1 /𝑎𝑖+1 )
= (𝑎𝑖+1 𝑝 𝑖 + 𝑝 𝑖−1 )/(𝑎𝑖+1 𝑞𝑖 + 𝑞𝑖−1 ) = 𝑝 𝑖+1 /𝑞𝑖+1 . □
Exercise A.3.32. Let [𝑎0 , . . . , 𝑎𝑛 ] be a finite simple continued fraction. Use the notation from Proposition A.3.31 and show that 𝑝 𝑖−1 𝑞𝑖 − 𝑝 𝑖 𝑞𝑖−1 = (−1)𝑖 for −1 ≤ 𝑖 ≤ 𝑛
and gcd(𝑝 𝑖 , 𝑞𝑖 ) = 1 for −2 ≤ 𝑖 ≤ 𝑛.
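The recurrence (A.3.26)/(A.3.27) of Proposition A.3.31 gives a linear-time way to list all convergents; a Python sketch using exact rational arithmetic:

```python
from fractions import Fraction

def convergents(a: list[int]) -> list[Fraction]:
    """Convergents p_i/q_i of [a_0, ..., a_n], via the recurrence of
    Proposition A.3.31: p_i = a_i p_{i-1} + p_{i-2}, q_i = a_i q_{i-1} + q_{i-2}.
    """
    p_prev2, p_prev1 = 0, 1   # p_{-2}, p_{-1}
    q_prev2, q_prev1 = 1, 0   # q_{-2}, q_{-1}
    result = []
    for a_i in a:
        p_i = a_i * p_prev1 + p_prev2
        q_i = a_i * q_prev1 + q_prev2
        result.append(Fraction(p_i, q_i))
        p_prev2, p_prev1 = p_prev1, p_i
        q_prev2, q_prev1 = q_prev1, q_i
    return result
```

For the expansion [1, 6, 2] of Example A.3.30 this yields 1, 7/6, 15/13.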

Now we show that there are exactly two finite simple continued fractions that rep-
resent a given positive rational number.
Proposition A.3.33. Let 𝛼 ∈ ℚ>0 and let [𝑎0 , . . . , 𝑎𝑛 ] be the continued fraction expan-
sion of 𝛼. Then [𝑎0 , . . . , 𝑎𝑛−1 , 𝑎𝑛 − 1, 1] is the only other finite simple continued fraction
that represents 𝛼.

Proof. For −2 ≤ 𝑖 ≤ 𝑛 denote by 𝑝 𝑖 , 𝑞𝑖 the integers from Proposition A.3.31 for the
continued fraction [𝑎0 , . . . , 𝑎𝑛 ], and for −2 ≤ 𝑖 ≤ 𝑛 + 1 denote by 𝑝𝑖′ , 𝑞′𝑖 the correspond-
ing integers for the continued fraction [𝑎0 , . . . , 𝑎𝑛 − 1, 1]. Then we have

(A.3.32) 𝑝′𝑛 = (𝑎𝑛 − 1)𝑝𝑛−1 + 𝑝𝑛−2 , 𝑝′𝑛+1 = 𝑝′𝑛 + 𝑝𝑛−1 = 𝑎𝑛 𝑝𝑛−1 + 𝑝𝑛−2 = 𝑝𝑛 .

In the same way, it can be verified that 𝑞′𝑛+1 = 𝑞𝑛 . This shows that [𝑎0 , . . . , 𝑎𝑛 − 1, 1] =
𝛼.
The uniqueness is proved in Exercise A.3.34. □
Exercise A.3.34. Verify the uniqueness claim in Proposition A.3.33.

Finally, we prove that sufficiently good approximations to a positive rational num-


ber are convergents of its continued fraction expansion.
Proposition A.3.35. Let 𝛼 ∈ ℚ≥0 and let 𝑝 ∈ ℕ0 , 𝑞 ∈ ℕ with
(A.3.33) |𝑝/𝑞 − 𝛼| ≤ 1/(2𝑞²).
Then 𝑝/𝑞 is a convergent of the continued fraction expansion of 𝛼.

Proof. If 𝛼 = 0, then 𝑝 = 0 and 𝑝/𝑞 is the only convergent of 𝛼. Let 𝛼 ≠ 0. Let 𝛿 be the
rational number with
(A.3.34) 𝛼 = 𝑝/𝑞 + 𝛿/(2𝑞²).
Then the assumption of the proposition implies |𝛿| ≤ 1. By Proposition A.3.33, we can
choose a simple continued fraction [𝑎0 , . . . , 𝑎𝑛 ] that represents 𝑝/𝑞 such that
(A.3.35) sign 𝛿 = (−1)𝑛 .
For −2 ≤ 𝑖 ≤ 𝑛, define 𝑝 𝑖 , 𝑞𝑖 as in Proposition A.3.31. Then we have 𝑝/𝑞 = 𝑝𝑛 /𝑞𝑛 . Set
(A.3.36) 𝜆 = 2/|𝛿| − 𝑞𝑛−1 /𝑞𝑛 .
Then 𝑞𝑛−1 < 𝑞𝑛 and |𝛿| ≤ 1 imply
(A.3.37) 𝜆 > 2 − 1 = 1.
Also, we have
(A.3.38) 𝜆𝑝𝑛 + 𝑝𝑛−1 = 2𝑝𝑛 /|𝛿| − 𝑝𝑛 𝑞𝑛−1 /𝑞𝑛 + 𝑝𝑛−1
and
(A.3.39) 𝜆𝑞𝑛 + 𝑞𝑛−1 = 2𝑞𝑛 /|𝛿|.
It follows from Exercise A.3.32 and (A.3.35) that
(A.3.40) (𝜆𝑝𝑛 + 𝑝𝑛−1 )/(𝜆𝑞𝑛 + 𝑞𝑛−1 ) = 𝑝𝑛 /𝑞𝑛 − (𝑝𝑛 𝑞𝑛−1 − 𝑝𝑛−1 𝑞𝑛 )|𝛿|/(2𝑞𝑛²) = 𝑝𝑛 /𝑞𝑛 + 𝛿/(2𝑞𝑛²) = 𝛼.
So it follows from Proposition A.3.31 that
(A.3.41) 𝛼 = [𝑎0 , . . . , 𝑎𝑛 , 𝜆].
If [𝑏0 , . . . , 𝑏𝑘 ] is the continued fraction expansion of 𝜆, then (A.3.37) implies 𝑏0 > 0.
Therefore, it follows from Lemma A.3.21 that 𝛼 = [𝑎0 , . . . , 𝑎𝑛 , 𝑏0 , . . . , 𝑏𝑘 ]. Since 𝑏𝑘 > 1,
this is the continued fraction expansion of 𝛼. So 𝑝𝑛 /𝑞𝑛 = [𝑎0 , . . . , 𝑎𝑛 ] is a convergent
of 𝛼. □

Example A.3.36. We note that
(A.3.42) |15/13 − 7/6| = 1/78 ≤ 1/(2 ⋅ 6²) = 1/72.
So Proposition A.3.35 predicts that 7/6 is a convergent of the continued fraction expansion of 15/13, which we have shown in Example A.3.30.
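The computation in Example A.3.36 can be checked mechanically; the following sketch verifies the hypothesis of Proposition A.3.35 for 𝛼 = 15/13 and 𝑝/𝑞 = 7/6 and recomputes the convergents of [1, 6, 2] via the recurrence of Proposition A.3.31:

```python
from fractions import Fraction

alpha = Fraction(15, 13)
p, q = 7, 6

# Hypothesis of Proposition A.3.35: |p/q - alpha| <= 1/(2 q^2).
error = abs(Fraction(p, q) - alpha)
assert error == Fraction(1, 78)
assert error <= Fraction(1, 2 * q * q)   # 1/78 <= 1/72

# Convergents of the expansion [1, 6, 2] of alpha (Proposition A.3.31).
convs = []
pp, pc = 0, 1   # p_{-2}, p_{-1}
qp, qc = 1, 0   # q_{-2}, q_{-1}
for a_i in [1, 6, 2]:
    pp, pc = pc, a_i * pc + pp
    qp, qc = qc, a_i * qc + qp
    convs.append(Fraction(pc, qc))

assert Fraction(p, q) in convs   # 7/6 is a convergent, as predicted
```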

A.4. Algebra
We introduce a few basic concepts of algebra that are required in this book.

A.4.1. Semigroups.
Definition A.4.1. A semigroup is a pair (𝑆, ∘) where 𝑆 is a nonempty set and ∘ is an
associative binary operation on 𝑆. If it is clear from the context which operation ∘ we
refer to, then we also write 𝑆 instead of (𝑆, ∘).

Next, we introduce some notions concerning semigroups.


Definition A.4.2. Let (𝑆, ∘) be a semigroup.
(1) The semigroup is called abelian or commutative if the operation ∘ is commutative.

(2) The semigroup is called a monoid if 𝑆 contains an identity element with respect
to ∘.
Exercise A.4.3. Prove that ℕ is not a monoid with respect to addition.

We show that the identity element and the inverses in monoids are uniquely de-
termined.
Proposition A.4.4. A monoid has exactly one identity element.

Proof. Let 𝑖 and 𝑖′ be identity elements in 𝑆. Then 𝑖 = 𝑖 ∘ 𝑖′ = 𝑖′ . □


Definition A.4.5. An element 𝑢 of a monoid (𝑆, ∘) with unit element 1 is called an
invertible element (or a unit) of 𝑆 if there is 𝑢′ ∈ 𝑆 with 𝑢 ∘ 𝑢′ = 𝑢′ ∘ 𝑢 = 1. Such an
element 𝑢′ is called an inverse of 𝑢.
Proposition A.4.6. Every invertible element of a monoid has a uniquely determined
inverse.

Proof. Suppose that 𝑢 is a unit of a monoid (𝑆, ∘) with the unit element 1 and that 𝑢′
and 𝑢″ are inverses of 𝑢. Then 𝑢′ = 𝑢′ ∘ 1 = 𝑢′ ∘ (𝑢 ∘ 𝑢″ ) = (𝑢′ ∘ 𝑢) ∘ 𝑢″ = 1 ∘ 𝑢″ = 𝑢″ . □

A.4.2. Groups. We now define groups.


Definition A.4.7. (1) A group is a monoid in which every element is invertible.
(2) A commutative or abelian group is a group in which the group operation is com-
mutative.
Proposition A.4.8. If (𝑆, ∘) is a monoid and 𝑈 is the set of all invertible elements in 𝑆,
then (𝑈, ∘) is a group, called the unit group of (𝑆, ∘).

Exercise A.4.9. Prove Proposition A.4.8.

We give a few concrete examples of semigroups, monoids, and groups.

Example A.4.10. The claims in this example are shown in Exercise A.4.11.
(1) (ℕ, +) is an abelian semigroup. But (ℕ, +) is not a monoid since there is no identity
element in this semigroup.
(2) (ℤ, +), (ℚ, +), (ℝ, +), and (ℂ, +) are abelian groups with identity element 0.
(3) (ℕ, ⋅) and (ℤ, ⋅) are abelian monoids with identity element 1. But they are not
groups. In fact, the only unit in (ℕ, ⋅) is 1 and the only units in (ℤ, ⋅) are ±1.
(4) (ℚ ⧵ {0}, ⋅), (ℝ ⧵ {0}, ⋅), and (ℂ ⧵ {0}, ⋅) are abelian groups with identity element 1.
(5) If 𝑚 ∈ ℕ, then (ℤ𝑚 , +𝑚 ) is a finite abelian group.
(6) If 𝑚 ∈ ℕ, then (ℤ∗𝑚 , ⋅𝑚 ) is a finite abelian group.

Exercise A.4.11. Show that the claims in Example A.4.10 are correct.

Exercise A.4.12. Let 𝑚 be a positive integer. Determine if each of (ℤ𝑚 , +𝑚 ) and


(ℤ𝑚 , ⋅𝑚 ) is a semigroup (abelian?), a monoid, or a group. If applicable, determine the
respective identity elements and unit groups.

We also define the order of elements of finite groups. This notion is motivated by
the following observation.

Proposition A.4.13. Let 𝐺 be a finite group and let 𝑔 ∈ 𝐺. Then there is 𝑛 ∈ ℕ such
that 𝑔𝑛 = 1.

Exercise A.4.14. Prove Proposition A.4.13.

Here is the definition of element orders and the order of an integer modulo another
integer.

Definition A.4.15. (1) Let 𝐺 be a finite group and let 𝑔 ∈ 𝐺. Then the smallest
positive integer 𝑛 with 𝑔𝑛 = 1 is called the order of 𝑔 in 𝐺.
(2) Let 𝑁 ∈ ℕ and let 𝑎 ∈ ℤ∗𝑁 . Then the order of 𝑎 in the multiplicative group ℤ∗𝑁 is
called the order of 𝑎 modulo 𝑁.

Example A.4.16. The order of 2 modulo 7 is 3 because 2² ≡ 4 mod 7 and 2³ ≡ 8 ≡
1 mod 7.

For a further discussion of the order of elements of finite abelian groups we refer
to [Buc04, Sections 2.9 and 2.14].
Exercise A.4.17. Let 𝑚 ∈ ℕ, let 𝑝1 , . . . , 𝑝𝑚 be pairwise distinct prime numbers, and
let 𝑁 = 𝑝1^𝑒1 ⋯ 𝑝𝑚^𝑒𝑚 where 𝑒 1 , . . . , 𝑒 𝑚 are positive integers. Also, let 𝑎 ∈ ℤ such
that gcd(𝑎, 𝑁) = 1. Show that the order of 𝑎 modulo 𝑁 is the least common multiple of
the orders of 𝑎 modulo 𝑝𝑖^𝑒𝑖 for 1 ≤ 𝑖 ≤ 𝑚.
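Orders modulo 𝑁 (Definition A.4.15) can be computed naively by repeated multiplication; the sketch below also checks the lcm property of Exercise A.4.17 for the small example 𝑁 = 45 = 3² ⋅ 5 and 𝑎 = 2 (chosen here for illustration). Note that this naive method takes time exponential in the bit length of 𝑁; efficient order finding is exactly what Shor's quantum algorithm provides.

```python
from math import gcd, lcm

def order_mod(a: int, n: int) -> int:
    """Smallest k >= 1 with a**k congruent to 1 modulo n (requires n > 1, gcd(a, n) = 1)."""
    assert n > 1 and gcd(a, n) == 1
    k, power = 1, a % n
    while power != 1:
        power = (power * a) % n
        k += 1
    return k

# Example A.4.16: the order of 2 modulo 7 is 3 (2**3 = 8 = 1 mod 7).
# Exercise A.4.17 for N = 45 = 3**2 * 5: ord_45(2) = lcm(ord_9(2), ord_5(2)).
```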

A.4.3. The symmetric group. In many contexts, for example, in Section 4.5,
symmetric groups play an important role. We discuss them in this section.

Proposition A.4.18. Let 𝑆 be a nonempty set. Then the following hold.


(1) The set of all permutations of 𝑆 is a group with respect to composition. It is called
the symmetric group on 𝑆.
(2) If 𝑆 is finite, then the symmetric group on 𝑆 has order |𝑆|!.

Exercise A.4.19. Prove Proposition A.4.18.

In the next definition, a special symmetric group is introduced.

Definition A.4.20. Let 𝑛 ∈ ℕ. The symmetric group of degree 𝑛 is the group of permu-
tations of ℤ𝑛 . It is denoted by 𝑆𝑛 . Also, if 𝜋 ∈ 𝑆𝑛 , then we write 𝜋 as
0 1 2 ... 𝑛−1
(A.4.1) ( )
𝜋(0) 𝜋(1) 𝜋(2) . . . 𝜋(𝑛 − 1)
or simply as the sequence (𝜋(0), 𝜋(1), . . . , 𝜋(𝑛 − 1)).

Exercise A.4.21. Let 𝑛 ∈ ℕ. Show that |𝑆𝑛 | = 𝑛! = 1 ⋅ 2 ⋅ 3 ⋯ 𝑛.

Example A.4.22. There are 3! = 6 permutations of degree 3. They are


0 1 2 0 1 2 0 1 2
( ), ( ), ( ),
0 1 2 0 2 1 1 0 2
(A.4.2)
0 1 2 0 1 2 0 1 2
( ), ( ), ( ).
1 2 0 2 0 1 2 1 0

We introduce transpositions.

Definition A.4.23. A transposition is a permutation in a symmetric group over some


set 𝑆 that exchanges two elements of 𝑆 and does not change the other elements. If 𝑎
and 𝑏 are the exchanged elements, then we write the transposition as (𝑎, 𝑏).

Example A.4.24. The transpositions in 𝑆 3 are the permutations


0 1 2 0 1 2 0 1 2
(A.4.3) ( ), ( ), ( ).
0 2 1 1 0 2 2 1 0
They can also be written as (1, 2), (0, 1), and (0, 2).

We prove an important representation theorem for the symmetric group of de-


gree 𝑛.

Theorem A.4.25. Let 𝑛 ∈ ℕ. Then every element of the symmetric group of degree 𝑛 can
be written as a composition of at most 𝑛 − 1 transpositions.

Proof. We prove the theorem by induction on 𝑛. For 𝑛 = 1, the assertion is valid


because 𝑆 1 does not contain transpositions. Assume that the assertion holds for 𝑛 and
let 𝜋 ∈ 𝑆𝑛+1 .

First, assume that 𝜋(𝑛) = 𝑛. Then the map 𝜋′ = 𝜋|ℤ𝑛 is in 𝑆𝑛 . By the induction
hypothesis, 𝜋′ can be written as the composition of at most 𝑛 − 1 transpositions in 𝑆𝑛
that are also transpositions in 𝑆𝑛+1 . Also, since 𝜋(𝑛) = 𝑛, the permutation 𝜋 is the
composition of the same transpositions.
Now assume 𝜋(𝑛) = 𝑗 with 𝑗 < 𝑛. Then the map 𝜋′ = ((𝑗, 𝑛) ∘ 𝜋)|ℤ𝑛 is in 𝑆𝑛 . By
the induction hypothesis, 𝜋′ can be written as the composition of at most 𝑛 − 1 transpositions in 𝑆𝑛 that are also transpositions in 𝑆𝑛+1 . Therefore, the permutation 𝜋 =
(𝑗, 𝑛) ∘ 𝜋′ is the composition of (𝑗, 𝑛) and the same transpositions. These are at most 𝑛
transpositions. □

Example A.4.26. Consider the permutation 𝜋 = (3, 2, 1, 0) in 𝑆 4 . We can write

0 1 2 3 0 1 2 3
(A.4.4) 𝜋=( )=( ) ∘ (0, 3) = (1, 2) ∘ (0, 3).
3 2 1 0 0 2 1 3

Exercise A.4.27. Find a polynomial time algorithm that computes the representation
of a permutation in 𝑆𝑛 as a product of at most 𝑛 − 1 transpositions in 𝑆𝑛 .
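The induction in the proof of Theorem A.4.25 directly yields a quadratic-time algorithm of the kind asked for in Exercise A.4.27; a Python sketch, with permutations represented as the sequence (𝜋(0), . . . , 𝜋(𝑛 − 1)):

```python
def transpositions(pi: list[int]) -> list[tuple[int, int]]:
    """Decompose a permutation of {0, ..., n-1}, given as the sequence
    (pi(0), ..., pi(n-1)), into at most n-1 transpositions, following the
    induction in the proof of Theorem A.4.25.
    """
    pi = list(pi)
    result = []
    for n in range(len(pi) - 1, 0, -1):
        j = pi[n]
        if j != n:
            # (j, n) composed with pi fixes n; record (j, n), continue with the rest.
            result.append((j, n))
            k = pi.index(n)
            pi[n], pi[k] = n, j   # exchange the values j and n in the sequence
    return result

def apply_transpositions(ts: list[tuple[int, int]], x: int) -> int:
    """Evaluate the composition t_1 ∘ t_2 ∘ ... ∘ t_m at x (rightmost applied first)."""
    for a, b in reversed(ts):
        x = b if x == a else a if x == b else x
    return x
```

For the permutation 𝜋 = (3, 2, 1, 0) of Example A.4.26, transpositions([3, 2, 1, 0]) returns [(0, 3), (1, 2)]; since these two transpositions are disjoint, this agrees with the decomposition (1, 2) ∘ (0, 3) given there.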

We define the sign of the elements of the symmetric group.

Definition A.4.28. Let 𝑘 ∈ ℕ and let 𝜎 ∈ 𝑆 𝑘 . Then the sign of 𝜎 is defined as
(A.4.5) sign(𝜎) = (−1)^𝐼(𝜎) , where 𝐼(𝜎) = |{(𝑢, 𝑣) ∈ ℤ2𝑘 ∶ 𝑢 < 𝑣 and 𝜎(𝑢) > 𝜎(𝑣)}|
is the number of inversions of 𝜎.

Example A.4.29. Consider the permutation 𝜎 = (0, 2, 1) ∈ 𝑆 3 . Then we have 𝜎(0) = 0,
𝜎(1) = 2, and 𝜎(2) = 1. So (𝑢, 𝑣) = (1, 2) is the only pair in ℤ23 such that 𝑢 < 𝑣 and
𝜎(𝑢) > 𝜎(𝑣) and, therefore, sign(𝜎) = (−1)¹ = −1.

We show how to determine the sign of a permutation.

Proposition A.4.30. Let 𝑘, 𝑚 ∈ ℕ and suppose that 𝜎 ∈ 𝑆 𝑘 can be written as a compo-


sition of 𝑚 transpositions. Then sign 𝜎 = (−1)𝑚 .

Proof. We prove the assertion by induction on 𝑚. If 𝑚 = 1, then 𝜎 is equal to a trans-


position (𝑢, 𝑣) with 𝑢, 𝑣 ∈ ℤ𝑘 . This implies sign 𝜎 = −1. Now assume that 𝑚 ≥ 1 and
that the assertion holds for 𝑚. Also, suppose that 𝜎 ∈ 𝑆 𝑘 can be represented as a com-
position of 𝑚 + 1 transpositions. Then we can write 𝜎 = 𝜏 ∘ (𝑢, 𝑣) where 𝜏 ∈ 𝑆 𝑘 is the
composition of 𝑚 transpositions and (𝑢, 𝑣) is another transposition with 𝑢, 𝑣 ∈ ℤ𝑘 . This
representation and the induction hypothesis imply sign 𝜎 = − sign 𝜏 = (−1)𝑚+1 . □
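With the convention sign(𝜎) = (−1) raised to the number of inversions, the sign can be computed in quadratic time directly from the inversion count; a Python sketch:

```python
def sign(pi: list[int]) -> int:
    """Sign of a permutation given as the sequence (pi(0), ..., pi(n-1)):
    (-1) raised to the number of inversions, i.e. of pairs u < v with
    pi(u) > pi(v).
    """
    n = len(pi)
    inversions = sum(
        1 for u in range(n) for v in range(u + 1, n) if pi[u] > pi[v]
    )
    return (-1) ** inversions
```

By Proposition A.4.30, the same value results from the parity of any decomposition into transpositions; e.g. the 3-cycle (1, 2, 0) is the composition of two transpositions and has sign +1.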

Corollary A.4.31. Let 𝑘 ∈ ℕ. Then the map 𝑆 𝑘 → {±1}, 𝜎 ↦ sign 𝜎 is a surjective


homomorphism.

Exercise A.4.32. Prove Corollary A.4.31.

A.4.4. Subgroups. We also introduce the notion of a subgroup.

Definition A.4.33. Let (𝐺, ∘) be a group, and let 𝐻 be a subset of 𝐺. Then 𝐻 is called
a subgroup of 𝐺 if (𝐻, ∘) is a group.

Example A.4.34. (2ℤ, +) is a subgroup of (ℤ, +) where 2ℤ = {2𝑧 ∶ 𝑧 ∈ ℤ} is the set of


all even integers.

Next, we discuss quotient groups. We will confine our discussion to commutative


groups, as this is the only aspect required for the context of this book.
Proposition A.4.35. Let (𝐺, ∘) be a commutative group with identity element 𝑒 and let
𝐻 be a subgroup of 𝐺. Let 𝐺/𝐻 = {𝑔𝐻 ∶ 𝑔 ∈ 𝐺}. Then
(A.4.6) ∘ ∶ 𝐺/𝐻 × 𝐺/𝐻 → 𝐺/𝐻, (𝑔0 𝐻, 𝑔1 𝐻) ↦ (𝑔0 ∘ 𝑔1 )𝐻
is a well-defined operation on 𝐺/𝐻 and (𝐺/𝐻, ∘) is a commutative group with identity element 𝑒𝐻. It is called the quotient of 𝐺 by 𝐻.
Exercise A.4.36. Prove Proposition A.4.35.
Example A.4.37. The set 5ℤ of all integer multiples of 5 is a subgroup of the commu-
tative group ℤ with respect to addition. The corresponding quotient group is ℤ/5ℤ =
{𝑎 + 5ℤ, 0 ≤ 𝑎 < 5}. Its identity element is 0 + 5ℤ = 5ℤ.

A.4.5. Rings and fields. Another basic notion in algebra is that of a ring, which
we define now.
Definition A.4.38. A ring is a triple (𝑅, +, ⋅) where 𝑅 is a nonempty set and + and ⋅ are
binary operations on 𝑅 called addition and multiplication. They satisfy the following
conditions.
(1) (𝑅, +) is an abelian group.
(2) (𝑅, ⋅) is a monoid.
(3) Multiplication is distributive with respect to addition, meaning that
• 𝑎 ⋅ (𝑏 + 𝑐) = 𝑎 ⋅ 𝑏 + 𝑎 ⋅ 𝑐 for all 𝑎, 𝑏, 𝑐 ∈ 𝑅 (left distributivity),
• (𝑏 + 𝑐) ⋅ 𝑎 = 𝑏 ⋅ 𝑎 + 𝑐 ⋅ 𝑎 for all 𝑎, 𝑏, 𝑐 ∈ 𝑅 (right distributivity).
Definition A.4.39. Let (𝑅, +, ⋅) be a ring.
(1) The ring 𝑅 is called commutative if the semigroup (𝑅, ⋅) is commutative.
(2) The unit group of the ring 𝑅 is the unit group of the monoid (𝑅, ⋅).
(3) A zero divisor in 𝑅 is an element 𝑎 ∈ 𝑅 such that there are 𝑥, 𝑦 ∈ 𝑅 ⧵ {0} with
𝑥𝑎 = 𝑎𝑦 = 0.
(4) The ring 𝑅 is called a field if (𝑅 ⧵ {0}, ⋅) is an abelian group.
Theorem A.4.40. Let (𝑅, +, ⋅) be a ring. Then the set of units and the set of zero divisors
in this ring are disjoint.
Exercise A.4.41. Prove Theorem A.4.40.

We give a few examples of rings, their unit groups, and zero divisors.
Example A.4.42. The claims of this example are verified in Exercise A.4.43.
(1) The integers, equipped with the usual addition and multiplication, are a commu-
tative ring without zero divisors.

(2) If 𝑘 is a positive integer, then (ℤ𝑘 , +𝑘 , ⋅𝑘 ) is a commutative ring where +𝑘 and ⋅𝑘


are defined as explained in Example A.1.21. The units in this ring are all integers
𝑎 ∈ ℤ𝑘 such that gcd(𝑎, 𝑘) = 1. The zero divisors in this ring are all integers
𝑎 ∈ ℤ𝑘 such that gcd(𝑎, 𝑘) > 1.
(3) The rational numbers, real numbers, and complex numbers equipped with the
usual addition and multiplication are fields.
Exercise A.4.43. Verify the claims in Example A.4.42.

As explained in [Buc04, Section 2.20], for all prime numbers 𝑝 and all positive
integers 𝑒, there is a finite field with 𝑞 = 𝑝𝑒 elements. It is uniquely determined up to
isomorphism and is denoted by 𝔽𝑞 . These are all the finite fields that exist.

A.4.6. Polynomial rings. In this section, we assume that 𝑅 is a commutative


ring with unit element 1 and we discuss polynomials over 𝑅.
Definition A.4.44. A polynomial in one variable 𝑥 over 𝑅 is an expression
(A.4.7) 𝑓(𝑥) = 𝑎𝑛 𝑥𝑛 + 𝑎𝑛−1 𝑥𝑛−1 + ⋯ + 𝑎1 𝑥 + 𝑎0
where 𝑎𝑖 ∈ 𝑅 for 0 ≤ 𝑖 ≤ 𝑛 and 𝑥 is not an element of 𝑅. We also use the following
notation.
(1) The ring elements 𝑎𝑖 are called the coefficients of 𝑓.
(2) The set of all polynomials over 𝑅 in the variable 𝑥 is denoted by 𝑅[𝑥].
(3) If in (A.4.7) the coefficient 𝑎𝑛 is not zero, then 𝑛 is called the degree of the poly-
nomial 𝑓, we write 𝑛 = deg 𝑓, and 𝑎𝑛 is called the leading coefficient of 𝑓.
(4) If all the coefficients of 𝑓 except the leading one are zero, then 𝑓 is called a mono-
mial.
(5) If 𝑓 = 𝑎0 , then 𝑓 is called a constant polynomial or simply a constant.
(6) If 𝑟 ∈ 𝑅, then we write 𝑓(𝑟) = 𝑎𝑛 𝑟𝑛 + ⋯ + 𝑎0 .
(7) If 𝑟 ∈ 𝑅 with 𝑓(𝑟) = 0, then 𝑟 is called a zero or a root of 𝑓.
Example A.4.45. The polynomials 2𝑥3 + 𝑥 + 1, 𝑥, 1 are elements of ℤ[𝑥]. The first
polynomial has degree 3, the second has degree 1, and the third has degree 0. Also, the
second and third polynomials are monomials. The first polynomial is not a monomial.
Example A.4.46. Consider the polynomial 𝑥2 + 1 in ℤ2 [𝑥]. Its only zero is 1.

We define sums and products of polynomials over 𝑅. Let


𝑔(𝑥) = 𝑏𝑚 𝑥𝑚 + ⋯ + 𝑏0
be another polynomial over 𝑅 and let 𝑛 ≥ 𝑚. The sum of the polynomials 𝑓 and 𝑔 is
the polynomial
(𝑓 + 𝑔)(𝑥) = (𝑎𝑛 + 𝑏𝑛 )𝑥𝑛 + ⋯ + (𝑎0 + 𝑏0 ).
Here, the undefined coefficients are set to zero.
Example A.4.47. Let 𝑓(𝑥) = 𝑥3 + 2𝑥2 + 𝑥 + 2 and 𝑔(𝑥) = 𝑥2 + 𝑥 + 1. Then (𝑓 + 𝑔)(𝑥) =
𝑥3 + 3𝑥2 + 2𝑥 + 3.

The product of the polynomials 𝑓 and 𝑔 is

(𝑓𝑔)(𝑥) = 𝑐𝑛+𝑚 𝑥𝑛+𝑚 + ⋯ + 𝑐 0 ,

where
𝑘
𝑐 𝑘 = ∑ 𝑎𝑖 𝑏𝑘−𝑖 , 0 ≤ 𝑘 ≤ 𝑛 + 𝑚.
𝑖=0

In this formula, the undefined coefficients 𝑎𝑖 and 𝑏𝑖 are set to 0.

Example A.4.48. Let 𝑓(𝑥) = 𝑥3 + 2𝑥2 + 𝑥 + 2 and 𝑔(𝑥) = 𝑥2 + 𝑥 + 1 ∈ ℤ[𝑥]. Then

(𝑓𝑔)(𝑥) = (𝑥3 + 2𝑥2 + 𝑥 + 2)(𝑥2 + 𝑥 + 1)


= 𝑥5 + (2 + 1)𝑥4 + (1 + 2 + 1)𝑥3 + (2 + 1 + 2)𝑥2 + (2 + 1)𝑥 + 2
= 𝑥5 + 3𝑥4 + 4𝑥3 + 5𝑥2 + 3𝑥 + 2.
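The coefficient formulas for sums and products of polynomials translate into a few lines of code; a Python sketch with a polynomial represented by its coefficient list [𝑎0 , 𝑎1 , . . . , 𝑎𝑛 ]:

```python
def poly_add(f: list[int], g: list[int]) -> list[int]:
    """Coefficientwise sum; f[i] is the coefficient of x**i, missing ones are 0."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [a + b for a, b in zip(f, g)]

def poly_mul(f: list[int], g: list[int]) -> list[int]:
    """Product with c_k = sum_i a_i * b_{k-i}, a convolution of the coefficients."""
    c = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            c[i + j] += a * b
    return c
```

For 𝑓 and 𝑔 from Examples A.4.47 and A.4.48 (coefficient lists [2, 1, 2, 1] and [1, 1, 1]), poly_add yields [3, 2, 3, 1] and poly_mul yields [2, 3, 5, 4, 3, 1], matching the computed sum and product.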

In the proof of Proposition 2.2.25 we use the discriminant of quadratic polynomials, which is introduced in the next exercise.

Exercise A.4.49. Let 𝑓(𝑥) = 𝑎𝑥2 + 𝑏𝑥 + 𝑐 ∈ 𝑅[𝑥] with 𝑎 ≠ 0. Then the discriminant
of 𝑓 is Δ(𝑓) = 𝑏2 − 4𝑎𝑐. Prove the following. Let 𝑅 = ℝ. If Δ(𝑓) > 0, then 𝑓 has two
distinct real zeros. If Δ(𝑓) = 0, then 𝑓 has two identical real zeros. If Δ(𝑓) < 0, then 𝑓
has two complex nonreal zeros which are the complex conjugates of each other.
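The case distinction of Exercise A.4.49 can be explored numerically; a Python sketch that computes both zeros of a real quadratic via the discriminant, using the complex square root so that all three sign cases are covered uniformly:

```python
import cmath

def quadratic_roots(a: float, b: float, c: float) -> tuple[complex, complex]:
    """Both zeros of a*x**2 + b*x + c (a != 0), via the discriminant b**2 - 4ac.

    cmath.sqrt returns a complex square root, so negative discriminants
    produce the conjugate pair of nonreal zeros.
    """
    disc = b * b - 4 * a * c
    root = cmath.sqrt(disc)
    return ((-b + root) / (2 * a), (-b - root) / (2 * a))
```

For example, x² − 1 has the two real zeros ±1 (Δ > 0), x² − 2x + 1 has the double zero 1 (Δ = 0), and x² + 1 has the conjugate pair ±i (Δ < 0).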

Proposition A.4.50. Let 𝐹 be a field and let 𝑓 ∈ 𝐹[𝑥] be a polynomial of degree 𝑛. If


𝛼 ∈ 𝐹 is a zero of 𝑓, then 𝑛 ≥ 1 and we can write 𝑓 = (𝑥−𝛼)𝑔 with a polynomial 𝑔 ∈ 𝐹[𝑥]
of degree 𝑛 − 1.

Exercise A.4.51. Prove Proposition A.4.50.

Example A.4.52. The polynomial 𝑓(𝑥) = 𝑥2 − 1 ∈ ℚ[𝑥] has the zero 1 and can be
written as 𝑓(𝑥) = (𝑥 − 1)(𝑥 + 1).

Finally we state the fundamental theorem of algebra. It can be found in [FK03,


31.18 Theorem].

Theorem A.4.53. If 𝑓 ∈ ℂ[𝑥] is a polynomial of degree 𝑛. Then 𝑓 can be written as


𝑛−1
𝑓 = ∏𝑖=0 (𝑥 − 𝛼𝑖 ) with complex zeros 𝛼𝑖 for 0 ≤ 𝑖 < 𝑛.

Exercise A.4.54. Let 𝑓 ∈ ℝ[𝑥] be a polynomial of degree 𝑛 ∈ ℕ. Show that there are
nonnegative integers 𝑠 and 𝑡 such that 𝑛 = 𝑠 + 2𝑡 and 𝑓 has 𝑠 real zeros and 2𝑡 pairs of
complex conjugate zeros.

A.5. Trigonometric identities and inequalities


We present trigonometric identities and inequalities that are necessary for several
proofs. We start with a few identities that can be found in [Abr72], and the equation

numbers from [Abr72] are provided after each identity:


(A.5.1) sin²𝑥 + cos²𝑥 = 1, 4.3.10,
(A.5.2) sin(𝑥 + 𝑦) = sin 𝑥 cos 𝑦 + cos 𝑥 sin 𝑦, 4.3.16,
(A.5.3) sin(𝑥 − 𝑦) = sin 𝑥 cos 𝑦 − cos 𝑥 sin 𝑦, 4.3.13, 4.3.14, 4.3.16,
(A.5.4) sin 2𝑥 = 2 sin 𝑥 cos 𝑥, 4.3.24,
(A.5.5) cos(𝑥 + 𝑦) = cos 𝑥 cos 𝑦 − sin 𝑥 sin 𝑦, 4.3.17,
(A.5.6) cos(𝑥 − 𝑦) = cos 𝑥 cos 𝑦 + sin 𝑥 sin 𝑦, 4.3.13, 4.3.14, 4.3.17,
(A.5.7) cos 2𝑥 = cos²𝑥 − sin²𝑥 = 2 cos²𝑥 − 1 = 1 − 2 sin²𝑥, 4.3.25,
(A.5.8) cos((𝑛 + 1)𝑦) + cos((𝑛 − 1)𝑦) = 2 cos(𝑦) cos(𝑛𝑦) for all 𝑛 ∈ ℕ0 .
Exercise A.5.1. Prove (A.5.8).

We now demonstrate some necessary trigonometric inequalities.


Lemma A.5.2. For all 𝑥, 𝑦 ∈ ℝ we have
(A.5.9) | sin 𝑥 − sin 𝑦| ≤ |𝑥 − 𝑦|, | cos 𝑥 − cos 𝑦| ≤ |𝑥 − 𝑦|.

Proof. Let 𝑥, 𝑦 ∈ ℝ. If 𝑥 = 𝑦, then the statement is valid. So, let 𝑥 ≠ 𝑦. By the mean
value theorem, there is 𝑧 ∈ ℝ with
(A.5.10) sin(𝑥) − sin(𝑦) = cos(𝑧)(𝑥 − 𝑦).
This implies
(A.5.11) | sin(𝑥) − sin(𝑦)| = | cos(𝑧)||𝑥 − 𝑦| ≤ |𝑥 − 𝑦|.
Likewise, the mean value theorem implies
(A.5.12) cos(𝑥) − cos(𝑦) = − sin(𝑧)(𝑥 − 𝑦).
This implies
(A.5.13) | cos(𝑥) − cos(𝑦)| = | sin(𝑧)||𝑥 − 𝑦| ≤ |𝑥 − 𝑦|. □
Lemma A.5.3. For all 𝑥 ∈ [0, 1/2] we have sin 𝜋𝑥 ≥ 2𝑥.

Proof. [Abr72, 4.3.79]. □


Corollary A.5.4. For all 𝑦 ∈ [0, 1] we have arcsin 𝑦 ≤ 𝜋𝑦/2.
Lemma A.5.5. For all 𝑥 ∈ ℝ we have |sin 𝑥| ≤ |𝑥|.

Proof. [Abr72, 4.3.80]. □


Corollary A.5.6. For all 𝑦 ∈ [−1, 1] we have |arcsin 𝑦| ≥ |𝑦|.
Lemma A.5.7. For all 𝑥 ∈ [0, 𝜋/2] we have cos 𝑥 ≥ 𝑥.

Proof. Consider the function 𝑓(𝑥) = cos 𝑥 − 𝑥. We have 𝑓(0) = 1, 𝑓(𝜋/2) = 0, and
𝑓′ (𝑥) = − sin 𝑥 − 1 < 0 for all 𝑥 ∈ [0, 𝜋/2]. □

Lemma A.5.8. For 𝑥, 𝑦 ∈ ℝ we have


(A.5.14) sin²(𝑥 + 𝑦) − sin²𝑥 = sin 𝑥 cos 𝑥 sin(2𝑦) + (1 − 2 sin²𝑥) sin²𝑦,
(A.5.15) sin²𝑥 − sin²(𝑥 − 𝑦) = sin 𝑥 cos 𝑥 sin(2𝑦) − (1 − 2 sin²𝑥) sin²𝑦.
Exercise A.5.9. Prove Lemma A.5.8 using the appropriate trigonometric identities.
Lemma A.5.10. Let 𝑥, 𝛼 ∈ ℝ>0 with 0 ≤ 𝑥𝛼 < 𝜋/2. Then we have
(A.5.16) sin 𝑥𝛼/(𝑥 sin 𝛼) ≥ cos 𝑥𝛼.

Proof. Since 0 ≤ 𝑥𝛼 < 𝜋/2 we have
(A.5.17) sin 𝑥𝛼/cos 𝑥𝛼 = tan 𝑥𝛼 ≥ 𝑥𝛼 ≥ 𝑥 sin 𝛼. □
Lemma A.5.11. For all 𝛼 ∈ [0, 𝜋/2[ and 𝑘 ∈ ℕ0 we have
(A.5.18) ∏_{𝑙=1}^{𝑘} cos(2^𝑙 𝛼) = sin(2^(𝑘+1) 𝛼)/(2^𝑘 sin 2𝛼).

Proof. We prove the assertion by induction on 𝑘. For 𝑘 = 0 the assertion holds since
both sides of (A.5.18) are equal to 1. So assume that 𝑘 ≥ 0 and that the assertion is true
for 𝑘. Then we have
(A.5.19) ∏_{𝑙=1}^{𝑘+1} cos(2^𝑙 𝛼) = cos(2^(𝑘+1) 𝛼) ∏_{𝑙=1}^{𝑘} cos(2^𝑙 𝛼) = cos(2^(𝑘+1) 𝛼) sin(2^(𝑘+1) 𝛼)/(2^𝑘 sin(2𝛼)) = sin(2^(𝑘+2) 𝛼)/(2^(𝑘+1) sin(2𝛼)),
where the last equality follows from the trigonometric identity (A.5.4). □
Appendix B

Linear Algebra

Linear algebra plays an essential role in modeling phenomena in diverse scientific dis-
ciplines. Its efficiency in algorithmic solutions empowers the resolution of compu-
tational challenges and the formulation of concrete predictions in various scientific
domains.
In the context of this book, linear algebra assumes particular importance, since it
includes the theory of Hilbert spaces, which serves as a framework for modeling quan-
tum mechanics. To comprehend this theory fully, it becomes necessary to establish a
foundation in linear algebra, which we provide in this appendix.
The appendix is divided into two parts. In the initial part, which includes Sec-
tions B.1 to B.7, we provide a brief overview of fundamental concepts commonly found
in introductory linear algebra courses. We assume the reader’s familiarity with these
concepts and present them as reference points, omitting proofs, examples, and exer-
cises. The topics encompass vectors, matrices, modules over rings, vector spaces, lin-
ear maps, characteristic polynomials, eigenvalues, and eigenspaces. This section also
covers the Gaussian elimination algorithm, its complexity, and its applications in de-
termining bases for linear map images and kernels, as well as solving linear systems.
The subsequent part of this appendix, Section B.8, presents tensor products of mod-
ules and vector spaces, as well as the concept of the partial trace. This area of study
typically falls outside the scope of introductory linear algebra courses but is crucial for
modeling quantum mechanics mathematically. As this topic may be unfamiliar or en-
tirely new to readers, we include comprehensive explanations, proofs, examples, and
exercises to facilitate understanding.
Throughout this chapter, we use the following notation. By 𝑘, 𝑙, 𝑚, and 𝑛 we denote
positive integers, (𝑅, +, ⋅) is a commutative ring, and (𝐹, +, ⋅) is a field. We write 0 for
the identity elements with respect to addition in 𝑅 and 𝐹 and we write 1 for the identity
elements with respect to multiplication in 𝑅 and 𝐹. If 𝑟 ∈ 𝑅 or 𝑟 ∈ 𝐹 is invertible
with respect to multiplication, we write 𝑟−1 for its multiplicative inverse in 𝑅 or 𝐹,
respectively.


B.1. Vectors
Definition B.1.1. (1) A vector over a nonempty set 𝑆 is a sequence 𝑣 ⃗ = (𝑣 0 , . . . , 𝑣 𝑘−1 )
of elements in 𝑆. The positive integer 𝑘 is called the length of 𝑣.⃗ The elements 𝑣 𝑖
are called the entries or components of 𝑣.⃗
(2) The set of all vectors of length 𝑘 over 𝑆 is denoted by 𝑆 𝑘 .
(3) For 𝑣 ⃗ = (𝑣 0 , . . . , 𝑣 𝑘−1 ) ∈ 𝑆 𝑘 we also write 𝑣 ⃗ = (𝑣 𝑖 )0≤𝑖<𝑘 or 𝑣 ⃗ = (𝑣 𝑖 )𝑖∈ℤ𝑘 .
(4) If 𝑣 ⃗ ∈ 𝑆 𝑙 and 𝑤⃗ ∈ 𝑆 𝑘 , then 𝑣‖⃗ 𝑤⃗ denotes the concatenation of 𝑣 ⃗ and 𝑤⃗ which is an
element of 𝑆 𝑘+𝑙 .

Usually, we will start by numbering the entries of a vector by 0, but we may also
number the entries differently. We do not distinguish between row vectors and column
vectors. However, an analogous distinction is introduced in Section B.4.1.
Proposition B.1.2. Let 𝑆 be finite and let 𝑘 ∈ ℕ. Then |𝑆^𝑘| = |𝑆|^𝑘.

B.1.1. Vector operations. For vectors over the ring 𝑅, we define the following
operations.
Definition B.1.3. Let 𝑟 ∈ 𝑅 and 𝑣 ⃗ = (𝑣 0 , . . . , 𝑣 𝑘−1 ), 𝑤⃗ = (𝑤 0 , . . . , 𝑤 𝑘−1 ) ∈ 𝑅𝑘 .
(1) The scalar product of 𝑟 with 𝑣 ⃗ is defined as
(B.1.1) 𝑟𝑣 ⃗ = 𝑟 ⋅ 𝑣 ⃗ = (𝑟𝑣 0 , 𝑟𝑣 1 , . . . , 𝑟𝑣 𝑘−1 ).
(2) The sum of 𝑣 ⃗ and 𝑤⃗ is defined as
(B.1.2) 𝑣 ⃗ + 𝑤⃗ = (𝑣 0 + 𝑤 0 , 𝑣 1 + 𝑤 1 , . . . , 𝑣 𝑘−1 + 𝑤 𝑘−1 ).
(3) We write −𝑤⃗ = (−𝑤 0 , . . . , −𝑤 𝑘−1 ) and 𝑣 ⃗ − 𝑤⃗ for 𝑣 ⃗ + (−𝑤).

(4) The dot product of 𝑣⃗ and 𝑤⃗ is an element of 𝑅 which is defined as
(B.1.3) 𝑣⃗ ⋅ 𝑤⃗ = ∑_{𝑖=0}^{𝑘−1} 𝑣𝑖𝑤𝑖.
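To make the definitions concrete, here is a minimal Python sketch of these operations for vectors over the ring ℤ (the function names are our own, not notation from the text):

```python
def scalar_mul(r, v):
    # (B.1.1): multiply every entry of v by the ring element r
    return [r * vi for vi in v]

def vec_add(v, w):
    # (B.1.2): componentwise sum of two vectors of equal length
    return [vi + wi for vi, wi in zip(v, w)]

def dot(v, w):
    # (B.1.3): sum of the componentwise products
    return sum(vi * wi for vi, wi in zip(v, w))

v, w = [1, 2, 3], [4, 5, 6]
assert scalar_mul(2, v) == [2, 4, 6]
assert vec_add(v, w) == [5, 7, 9]
assert dot(v, w) == 32
```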

B.2. Modules and vector spaces


Definition B.2.1. An R-module or module over 𝑅 is a triplet (𝑀, +, ⋅) where 𝑀 is a
nonempty set and
+ ∶ 𝑀 × 𝑀 → 𝑀, (𝑣,⃗ 𝑤)⃗ ↦ 𝑣 ⃗ + 𝑤,⃗
(B.2.1)
⋅ ∶ 𝑅 × 𝑀 → 𝑀, (𝑟, 𝑤)⃗ ↦ 𝑟𝑤⃗ = 𝑟 ⋅ 𝑤⃗
are maps, called addition and scalar multiplication, which satisfy the following condi-
tions.
(1) (𝑀, +) is an abelian group.
(2) Scalar multiplication is associative; that is, (𝑟 ⋅ 𝑠) ⋅ 𝑣 ⃗ = 𝑟 ⋅ (𝑠 ⋅ 𝑣)⃗ for all 𝑟, 𝑠 ∈ 𝑅 and
all 𝑣 ⃗ ∈ 𝑀.

(3) The identity element 1 of 𝑅 is also an identity element with respect to scalar mul-
tiplication; that is, 1 ⋅ 𝑣 ⃗ = 𝑣 ⃗ for all 𝑣 ⃗ ∈ 𝑀.
(4) Scalar multiplication is distributive with respect to addition in 𝑀; that is,
𝑟 ⋅ (𝑣 ⃗ + 𝑤)⃗ = 𝑟 ⋅ 𝑣 ⃗ + 𝑟 ⋅ 𝑤⃗ for all 𝑟 ∈ 𝑅 and all 𝑣,⃗ 𝑤⃗ ∈ 𝑀.
(5) Scalar multiplication is distributive with respect to addition in 𝑅; that is, (𝑟+𝑠)⋅ 𝑣 ⃗ =
𝑟 ⋅ 𝑣 ⃗ + 𝑠 ⋅ 𝑣 ⃗ for all 𝑟, 𝑠 ∈ 𝑅 and all 𝑣 ⃗ ∈ 𝑀.

Definition B.2.2. Any module over the field 𝐹 is called a vector space over 𝐹 or an
𝐹-vector space.
Proposition B.2.3. (1) (𝑅𝑘, +, ⋅) is an 𝑅-module, where “+” and “⋅” denote addition
and scalar multiplication in 𝑅𝑘, respectively.
(2) (𝐹𝑘, +, ⋅) is an 𝐹-vector space, where “+” and “⋅” denote addition and scalar multi-
plication in 𝐹𝑘, respectively.
(3) For all 𝑣⃗ = (𝑣0, . . . , 𝑣𝑘−1) ∈ 𝑅𝑘, the element −𝑣⃗ = (−𝑣0, . . . , −𝑣𝑘−1) is the additive
inverse of 𝑣⃗.
(4) The zero vector 0⃗ = (0, . . . , 0) is the neutral element in 𝑅𝑘 with respect to addition.

To simplify our notation, we denote an 𝑅-module (𝑀, +, ⋅) or an 𝐹-vector space


(𝑉, +, ⋅) also by 𝑀 and 𝑉, respectively, if it is clear what is meant by addition and scalar
multiplication. Unless otherwise specified, module addition is always denoted by +
and scalar multiplication by ⋅.

B.2.1. Submodules.
Definition B.2.4. (1) An R-submodule of 𝑀 is a nonempty subset 𝑁 of 𝑀 such that
𝑁 is a subgroup of 𝑀 with respect to addition and 𝑁 is closed under scalar multi-
plication; that is, 𝑟𝑣 ⃗ ∈ 𝑁 for all 𝑣 ⃗ ∈ 𝑁 and all 𝑟 ∈ 𝑅. If it is clear from the context
what is meant by the ring 𝑅, we call 𝑁 a submodule of 𝑀.
(2) An 𝐹-subspace 𝑊 of 𝑉 is an 𝐹-submodule 𝑊 of 𝑉. If it is clear from the context
what is meant by the field 𝐹, we call 𝑊 a subspace of 𝑉.
Proposition B.2.5. (1) Every 𝑅-submodule of 𝑀 is an 𝑅-module with the same addi-
tion and scalar multiplication as in 𝑀.
(2) Every 𝐹-subspace of 𝑉 is an 𝐹-vector space with the same addition and scalar mul-
tiplication as in 𝑉.
Definition B.2.6. (1) Let 𝑣⃗0, . . . , 𝑣⃗𝑙−1, 𝑣⃗ be vectors in 𝑀. We say that 𝑣⃗ is a linear
combination of the vectors 𝑣⃗0, . . . , 𝑣⃗𝑙−1 if 𝑣⃗ can be written as
(B.2.2) 𝑣⃗ = 𝑟0𝑣⃗0 + ⋯ + 𝑟𝑙−1𝑣⃗𝑙−1 = ∑_{𝑗=0}^{𝑙−1} 𝑟𝑗𝑣⃗𝑗
with 𝑟𝑗 ∈ 𝑅 for 0 ≤ 𝑗 < 𝑙. The ring elements 𝑟𝑗 are called the coefficients of the
linear combination (B.2.2).
(2) The linear combination of the empty sequence in 𝑀 is defined to be 0⃗.

(3) For any subset 𝑆 of 𝑀 we define the span of 𝑆 as the set of all linear combinations
of finitely many elements of 𝑆, including the empty linear combination. We write it as Span(𝑆).
So, we have
(B.2.3) Span(𝑆) = { ∑_{𝑗=0}^{𝑙−1} 𝑟𝑗𝑣⃗𝑗 ∶ 𝑙 ∈ ℕ0, 𝑟𝑗 ∈ 𝑅, 𝑣⃗𝑗 ∈ 𝑆 for all 𝑗 ∈ ℤ𝑙 }.
In particular, the span of the empty set is {0⃗}.
(4) We say that 𝑀 is finitely generated if 𝑀 = Span(𝑆) for a finite subset 𝑆 of 𝑀.

Proposition B.2.7. Let 𝑆 be a subset of 𝑀. Then the span of 𝑆 is an 𝑅-module, and it


is the (with respect to inclusion) smallest submodule of 𝑀 that contains 𝑆. It is called the
submodule generated by 𝑆.

Proposition B.2.8. Let 𝑋 be a set of submodules of 𝑀. Then the set
(B.2.4) ∑_{𝑁∈𝑋} 𝑁 = { ∑_{𝑁∈𝑋} 𝑣⃗𝑁 ∶ 𝑣⃗𝑁 ∈ 𝑁 for all 𝑁 ∈ 𝑋, only finitely many 𝑣⃗𝑁 are nonzero }
is a submodule of 𝑀. It is called the sum of the submodules in 𝑋.

Definition B.2.9. Let 𝑋 be a set of submodules of 𝑀.
(1) The sum ∑_{𝑁∈𝑋} 𝑁 of the submodules in 𝑋 is called direct if all of its nonzero ele-
ments 𝑣⃗ have a uniquely determined representation
(B.2.5) 𝑣⃗ = ∑_{𝑁∈𝑋} 𝑣⃗𝑁
where 𝑣⃗𝑁 ∈ 𝑁 and only finitely many of these elements are nonzero.
(2) If ∑_{𝑁∈𝑋} 𝑁 is direct, then the module 𝑃 = ∑_{𝑁∈𝑋} 𝑁 is called the direct sum of the
submodules in 𝑋.

Definition B.2.10. A submodule 𝑊 of an 𝐹-vector space 𝑉 is called an 𝐹-linear sub-


space of 𝑉 or simply a subspace of 𝑉. All notions concerning modules transfer analo-
gously to subspaces.

B.2.2. Direct product of modules.


Proposition B.2.11. Let 𝑀0, . . . , 𝑀𝑙−1 be 𝑅-modules. On the Cartesian product ∏_{𝑖=0}^{𝑙−1} 𝑀𝑖
we define componentwise addition and scalar multiplication as follows. If (𝑣⃗0, . . . , 𝑣⃗𝑙−1),
(𝑤⃗0, . . . , 𝑤⃗𝑙−1) ∈ ∏_{𝑖=0}^{𝑙−1} 𝑀𝑖 and 𝑟 ∈ 𝑅, then we set
(B.2.6) (𝑣⃗0, . . . , 𝑣⃗𝑙−1) + (𝑤⃗0, . . . , 𝑤⃗𝑙−1) = (𝑣⃗0 + 𝑤⃗0, . . . , 𝑣⃗𝑙−1 + 𝑤⃗𝑙−1),
𝑟 ⋅ (𝑣⃗0, . . . , 𝑣⃗𝑙−1) = (𝑟 ⋅ 𝑣⃗0, . . . , 𝑟 ⋅ 𝑣⃗𝑙−1).
Then (∏_{𝑖=0}^{𝑙−1} 𝑀𝑖, +, ⋅) is an 𝑅-module. It is called the direct product of the 𝑅-modules
𝑀0, . . . , 𝑀𝑙−1 and is also written as ∏_{𝑖=0}^{𝑙−1} 𝑀𝑖.

B.2.3. Quotient modules.


Proposition B.2.12. Let 𝑁 be a submodule of an 𝑅-module 𝑀. On the quotient group
𝑀/𝑁 we define the scalar product
(B.2.7) ⋅ ∶ 𝑅 × 𝑀/𝑁 → 𝑀/𝑁, (𝑟, 𝑣 ⃗ + 𝑁) ↦ 𝑟 ⋅ (𝑣 ⃗ + 𝑁) = 𝑟𝑣 ⃗ + 𝑁.
Then (𝑀/𝑁, +, ⋅) is an 𝑅-module. It is called a quotient module or, more precisely, the
quotient of 𝑀 and 𝑁.

B.2.4. Free modules.


Definition B.2.13. Let 𝑀 be an 𝑅-module and let 𝐼 be a nonempty set. Let 𝐵 = (𝑏𝑖⃗ )𝑖∈𝐼
be a family of elements 𝑏𝑖⃗ ∈ 𝑀.
(1) 𝐵 is called a generating system of 𝑀 if 𝑀 = ∑𝑖∈𝐼 𝑅𝑏𝑖⃗ .
(2) 𝐵 is called linearly independent if the sum ∑𝑖∈𝐼 𝑅𝑏𝑖⃗ is direct.
(3) 𝐵 is called linearly dependent if 𝐵 is not linearly independent.
(4) 𝐵 is called a basis of 𝑀 if 𝐵 is a linearly independent generating system of 𝑀.
(5) If 𝑀 has a basis, then 𝑀 is called a free module.
Theorem B.2.14. Every vector space 𝑉 over a field 𝐹 is free.

B.3. Linear maps between modules


Definition B.3.1. (1) A map
(B.3.1) 𝑓∶𝑀→𝑁
is called 𝑅-linear or an 𝑅-module homomorphism if it preserves the operations of
addition and scalar multiplication; that is, we have
(a) 𝑓(𝑣 ⃗ + 𝑤)⃗ = 𝑓(𝑣)⃗ + 𝑓(𝑤)⃗ for all 𝑣,⃗ 𝑤⃗ ∈ 𝑀,
(b) 𝑓(𝑟𝑣)⃗ = 𝑟𝑓(𝑣)⃗ for all 𝑟 ∈ 𝑅 and all 𝑣 ⃗ ∈ 𝑀.
(2) The set of all 𝑅-module homomorphisms 𝑓 ∶ 𝑀 → 𝑁 is denoted by Hom𝑅 (𝑀, 𝑁).
If it is clear from the context which ring 𝑅 we refer to, then we simply write
Hom(𝑀, 𝑁).
(3) A bijective 𝑅-module homomorphism 𝑓 ∶ 𝑀 → 𝑁 is called an 𝑅-module isomor-
phism or an isomorphism between 𝑀 and 𝑁.
(4) The 𝑅-modules 𝑀 and 𝑁 are called isomorphic if there is an isomorphism between
𝑀 and 𝑁.

The notions from Definition B.3.1 transfer analogously to vector spaces.


Definition B.3.2. Let
(B.3.2) 𝑓, 𝑔 ∶ 𝑀 → 𝑁
be functions and let 𝑟 ∈ 𝑅.
(1) The sum of 𝑓 and 𝑔 is the function
(B.3.3) 𝑓 + 𝑔 ∶ 𝑀 → 𝑁, 𝑣 ⃗ ↦ 𝑓(𝑣)⃗ + 𝑔(𝑣).


(2) The scalar product of 𝑟 with 𝑓 is the function


(B.3.4) 𝑟𝑓 = 𝑟 ⋅ 𝑓 ∶ 𝑀 → 𝑁, 𝑣 ⃗ ↦ 𝑟𝑓(𝑣).

Proposition B.3.3. (Hom(𝑀, 𝑁), +, ⋅) is an 𝑅-module.
Proposition B.3.4. For every 𝑣 ⃗ ∈ 𝑀 define the map
(B.3.5) 𝑓𝑣⃗ ∶ 𝑅 → 𝑀, 𝑟 ↦ 𝑟𝑣.⃗
Then 𝑓𝑣⃗ ∈ Hom(𝑅, 𝑀) and the map
(B.3.6) 𝑀 → Hom(𝑅, 𝑀), 𝑣 ⃗ ↦ 𝑓𝑣⃗
is an isomorphism of 𝑅-modules.

B.3.1. Kernel and image.


Definition B.3.5. Let 𝑓 ∈ Hom𝑅 (𝑀, 𝑁).
(1) The image of 𝑓 is defined as
(B.3.7) im(𝑓) = {𝑓(𝑢)⃗ ∶ 𝑢⃗ ∈ 𝑀}.
(2) The kernel of 𝑓 is defined as
(B.3.8) ⃗
ker(𝑓) = {𝑢⃗ ∈ 𝑀 ∶ 𝑓(𝑢)⃗ = 0}.

The following theorem is called the fundamental homomorphism theorem for 𝑅-


modules.
Theorem B.3.6. Let 𝑓 ∈ Hom𝑅 (𝑀, 𝑁). Then the kernel of 𝑓 is an 𝑅-submodule of 𝑀,
the image of 𝑓 is an 𝑅-submodule of 𝑁, and the map
(B.3.9) 𝑔 ∶ 𝑀/ ker(𝑓) → im(𝑓), 𝑢⃗ + ker(𝑓) ↦ 𝑓(𝑢)⃗
is an isomorphism of 𝑅-modules.

B.3.2. The dual module.


Definition B.3.7. (1) The set Hom𝑅 (𝑀, 𝑅) of homomorphisms between 𝑀 and the
ring 𝑅 is called the dual module of 𝑀. It is denoted by 𝑀 ∗ .
(2) If 𝑉 is an 𝐹-vector space, then 𝑉 ∗ is called the dual vector space of 𝑉.

B.3.3. 𝑅-algebras.
Definition B.3.8. An 𝑅-algebra is a tuple (𝑀, +, ⋅, ∘) which has the following proper-
ties.
(1) (𝑀, +, ⋅) is an 𝑅-module.
(2) (𝑀, +, ∘) is a ring.
(3) The scalar multiplication of the 𝑅-module 𝑀 is associative with respect to the
∘-operation; that is,
(B.3.10) 𝑟 ⋅ (𝐴 ∘ 𝐵) = (𝑟 ⋅ 𝐴) ∘ 𝐵 = 𝐴 ∘ (𝑟 ⋅ 𝐵)
for all 𝐴, 𝐵 ∈ 𝑀 and 𝑟 ∈ 𝑅.

Definition B.3.9. Let (𝑀, +, ⋅, ∘) and (𝑁, +, ⋅, ∘) be 𝑅-algebras. A function


(B.3.11) 𝑓∶𝑀→𝑁
is called linear if it is an 𝑅-module homomorphism and also preserves the operation ∘;
that is, we have
(B.3.12) 𝑓(𝐴 ∘ 𝐵) = 𝑓(𝐴) ∘ 𝑓(𝐵)
for every 𝐴, 𝐵 ∈ 𝑀 and
(B.3.13) 𝑓(𝐼𝑀 ) = 𝐼𝑁
where 𝐼𝑀 and 𝐼𝑁 are the identity elements of 𝑀 and 𝑁, respectively. Such a linear
map is also called an R-algebra homomorphism. The set of all such homomorphisms is
denoted by Hom𝑅 (𝑀, 𝑁). The homomorphism 𝑓 is called an 𝑅-algebra isomorphism
if it is bijective.

B.3.4. Endomorphisms.

Definition B.3.10. (1) An 𝑅-module homomorphism that maps 𝑀 to itself is called


an 𝑅-module endomorphism or simply an endomorphism of 𝑀.
(2) The set of all endomorphisms of 𝑀 is denoted by End𝑅 (𝑀) or by End(𝑀) if it is
clear to which ring 𝑅 we refer.
(3) A bijective endomorphism of 𝑀 is called an automorphism of 𝑀.
(4) The set of all automorphisms of 𝑀 is denoted by Aut𝑅 (𝑀) or simply by Aut(𝑀) if
it is clear which ring 𝑅 we refer to.

Proposition B.3.11. The quadruple (End(𝑀), +, ⋅, ∘) is an 𝑅-algebra where + and ⋅ de-
note addition and scalar product as specified in Definition B.3.2 and ∘ means composition
of functions. It is called the endomorphism algebra of 𝑀. Also, (End(𝑀), +, ∘) is called the en-
domorphism ring of 𝑀.

If 𝑀 is an 𝑅-algebra, then the notions of an 𝑅-algebra endomorphism and 𝑅-algebra


automorphism are defined analogously to the corresponding notions for 𝑅-modules.
Furthermore, the sets of all such endomorphisms and automorphisms are denoted by
End𝑅 (𝑀) and Aut𝑅 (𝑀), respectively.

B.4. Matrices
In this section, we introduce matrices which play a very important role in linear algebra
as representations of module homomorphisms. We let 𝑆 be a nonempty set.

Definition B.4.1. (1) A 𝑘 × 𝑙 matrix over 𝑆 is a two-dimensional schema


⎛ 𝑎0,0 𝑎0,1 . . . 𝑎0,𝑙−1 ⎞
(B.4.1) 𝐴 = ⎜ 𝑎1,0 𝑎1,1 . . . 𝑎1,𝑙−1 ⎟
⎜ ⋮ ⋮ ⋮ ⎟
⎝ 𝑎𝑘−1,0 𝑎𝑘−1,1 . . . 𝑎𝑘−1,𝑙−1 ⎠

where 𝑎𝑖,𝑗 ∈ 𝑆 for 0 ≤ 𝑖 < 𝑘, 0 ≤ 𝑗 < 𝑙. The elements 𝑎𝑖,𝑗 are called the entries
of the matrix 𝐴. This matrix can also be written as 𝐴 = (𝑎𝑖,𝑗 )0≤𝑖<𝑘,0≤𝑗<𝑙 or 𝐴 =
(𝑎𝑖,𝑗 )𝑖∈ℤ𝑘 ,𝑗∈ℤ𝑙 . If the ranges of 𝑘 and 𝑙 are clear from the context, we also write
𝐴 = (𝑎𝑖,𝑗 ).
(2) The set of all 𝑘 × 𝑙 matrices over 𝑆 is denoted by 𝑆 (𝑘,𝑙) .

Before we give examples of matrices, we need the following definition.


Definition B.4.2. Let
⎛ 𝑎0,0 𝑎0,1 . . . 𝑎0,𝑙−1 ⎞
(B.4.2) 𝐴 = ⎜ 𝑎1,0 𝑎1,1 . . . 𝑎1,𝑙−1 ⎟ ∈ 𝑆(𝑘,𝑙).
⎜ ⋮ ⋮ ⋮ ⎟
⎝ 𝑎𝑘−1,0 𝑎𝑘−1,1 . . . 𝑎𝑘−1,𝑙−1 ⎠
(1) We call (𝑎𝑖,0 , 𝑎𝑖,1 , . . . , 𝑎𝑖,𝑙−1 ), 0 ≤ 𝑖 < 𝑘, the row vectors or rows of 𝐴 and
(𝑎0,𝑗 , 𝑎1,𝑗 , . . . , 𝑎𝑘−1,𝑗 ), 0 ≤ 𝑗 < 𝑙, the column vectors or columns of 𝐴.
(2) The transpose of 𝐴 is the matrix
⎛ 𝑎0,0 𝑎1,0 . . . 𝑎𝑘−1,0 ⎞
(B.4.3) 𝐴T = (𝑎𝑗,𝑖)𝑗∈ℤ𝑙,𝑖∈ℤ𝑘 = ⎜ 𝑎0,1 𝑎1,1 . . . 𝑎𝑘−1,1 ⎟ ∈ 𝑆(𝑙,𝑘).
⎜ ⋮ ⋮ ⋮ ⎟
⎝ 𝑎0,𝑙−1 𝑎1,𝑙−1 . . . 𝑎𝑘−1,𝑙−1 ⎠
So, the rows of 𝐴T are the columns of 𝐴 and vice versa.
Definition B.4.3. Let 𝑎⃗0, . . . , 𝑎⃗𝑙−1 ∈ 𝑆𝑘. Then we write
(B.4.4) 𝐴 = (𝑎⃗0, . . . , 𝑎⃗𝑙−1)
for the matrix in 𝑆(𝑘,𝑙) with column vectors 𝑎⃗0, . . . , 𝑎⃗𝑙−1, and
⎛ 𝑎⃗0 ⎞
(B.4.5) 𝐴 = ⎜ ⋮ ⎟
⎝ 𝑎⃗𝑙−1 ⎠
for the matrix in 𝑆(𝑙,𝑘) with row vectors 𝑎⃗0, . . . , 𝑎⃗𝑙−1.

B.4.1. Vectors as matrices. In some contexts, it is useful to identify vectors with


matrices. We do this in the following way. Let 𝑆 be a nonempty set, let 𝑘 ∈ ℕ, and let
𝑎⃗ = (𝑎0 , . . . , 𝑎𝑘−1 ) ∈ 𝑆 𝑘 . Then we identify 𝑎⃗ with the matrix
⎛ 𝑎0 ⎞
(B.4.6) 𝑎⃗ = ⎜ 𝑎1 ⎟ ∈ 𝑆(𝑘,1).
⎜ ⋮ ⎟
⎝ 𝑎𝑘−1 ⎠
This matrix has 𝑎⃗ as its only column vector.
Using this identification, we define the transpose of 𝑎⃗ to be the matrix with 𝑎⃗ as its
only row vector; that is,
(B.4.7) 𝑎T⃗ = (𝑎0 𝑎1 ... 𝑎𝑘−1 ) ∈ 𝑆 (1,𝑘) .

B.4.2. Matrix operations.

Definition B.4.4. Let 𝑟 ∈ 𝑅, 𝐴 = (𝑎𝑖,𝑗), 𝐵 = (𝑏𝑖,𝑗) ∈ 𝑅(𝑘,𝑙).

(1) The scalar product of 𝑟 with 𝐴 is the matrix 𝑟 ⋅ 𝐴 = 𝑟𝐴 = (𝑟𝑎𝑖,𝑗 ) ∈ 𝑅(𝑘,𝑙) . This
operation is called scalar multiplication.
(2) The sum of 𝐴 and 𝐵 is

(B.4.8) 𝐴 + 𝐵 = (𝑎𝑖,𝑗 + 𝑏𝑖,𝑗 ) ∈ 𝑅(𝑘,𝑙) .

This operation is called the (componentwise) addition of matrices.


(3) We write −𝐵 = (−𝑏𝑖,𝑗 ) and 𝐴 − 𝐵 for 𝐴 + (−𝐵).

Proposition B.4.5. (1) (𝑅(𝑘,𝑙) , +, ⋅) is an 𝑅-module where “+” denotes matrix addition
and “⋅” stands for scalar multiplication on 𝑅(𝑘,𝑙) .
(2) For all 𝐴 ∈ 𝑅(𝑘,𝑙) the matrix −𝐴 is the additive inverse of 𝐴.
(3) The neutral element of the group (𝑅(𝑘,𝑙) , +) is the zero matrix in 𝑅(𝑘,𝑙) all of whose
entries are zero. We denote it by 0𝑘,𝑙 or as 0 if it is clear from the context what is
meant by 𝑘, 𝑙.

Definition B.4.6. Let 𝐴 = (𝑎𝑖,𝑗) ∈ 𝑅(𝑘,𝑙) and 𝐵 = (𝑏𝑖,𝑗) ∈ 𝑅(𝑙,𝑚). Then we define the
product of 𝐴 and 𝐵 as
(B.4.9) 𝐴 ⋅ 𝐵 = ( ∑_{𝑢=0}^{𝑙−1} 𝑎𝑖,𝑢𝑏𝑢,𝑗 )_{𝑖∈ℤ𝑘, 𝑗∈ℤ𝑚}.

Instead of 𝐴 ⋅ 𝐵 we also write 𝐴𝐵.

Proposition B.4.7. Matrix multiplication is associative in the following sense. If 𝐴 ∈


𝑅(𝑘,𝑙) , 𝐵 ∈ 𝑅(𝑙,𝑚) , and 𝐶 ∈ 𝑅(𝑚,𝑛) , then we have

(B.4.10) (𝐴 ⋅ 𝐵) ⋅ 𝐶 = 𝐴 ⋅ (𝐵 ⋅ 𝐶).

Definition B.4.8. Let 𝐴 ∈ 𝑅(𝑘,𝑙) with column vectors 𝑎⃗0, . . . , 𝑎⃗𝑙−1 and let 𝑣⃗ =
(𝑣0, . . . , 𝑣𝑙−1) ∈ 𝑅𝑙. Then we define the product of 𝐴 with 𝑣⃗ as
(B.4.11) 𝐴 ⋅ 𝑣⃗ = 𝐴𝑣⃗ = ∑_{𝑖=0}^{𝑙−1} 𝑣𝑖𝑎⃗𝑖.

Note that the product of a matrix 𝐴 with a vector 𝑣 ⃗ is the same as the product of 𝐴
with the matrix corresponding to 𝑣.⃗
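As an illustration, the products (B.4.9) and (B.4.11) can be sketched in Python for matrices stored as lists of rows over ℤ (the helper names are ours, not notation from the text):

```python
def mat_mul(A, B):
    # (B.4.9): entry (i, j) of A * B is sum_u A[i][u] * B[u][j]
    k, l, m = len(A), len(B), len(B[0])
    return [[sum(A[i][u] * B[u][j] for u in range(l)) for j in range(m)]
            for i in range(k)]

def mat_vec(A, v):
    # (B.4.11): the same as multiplying A with the one-column matrix for v
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [0, 2]]
# associativity as in Proposition B.4.7
assert mat_mul(mat_mul(A, B), C) == mat_mul(A, mat_mul(B, C))
assert mat_vec(A, [1, 1]) == [3, 7]
```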

B.5. Square matrices


Square matrices, that is, matrices with the same number of rows and columns, are of
special interest in linear algebra. In this section, we discuss the structure and properties
of the set 𝑅(𝑘,𝑘) of all 𝑘 × 𝑘 square matrices over 𝑅.

Definition B.5.1. (1) Let 𝐴 ∈ 𝑆(𝑘,𝑘),
⎛ 𝒂𝟎,𝟎 𝑎0,1 𝑎0,2 . . . 𝑎0,𝑘−1 ⎞
⎜ 𝑎1,0 𝒂𝟏,𝟏 𝑎1,2 . . . 𝑎1,𝑘−1 ⎟
(B.5.1) 𝐴 = ⎜ 𝑎2,0 𝑎2,1 𝒂𝟐,𝟐 . . . 𝑎2,𝑘−1 ⎟ .
⎜ ⋮ ⋮ ⋮ ⋱ ⋮ ⎟
⎝ 𝑎𝑘−1,0 𝑎𝑘−1,1 𝑎𝑘−1,2 . . . 𝒂𝒌−𝟏,𝒌−𝟏 ⎠

Then the entries 𝑎𝑖,𝑖 , 0 ≤ 𝑖 < 𝑘, are called the diagonal elements of 𝐴 (highlighted
in (B.5.1)). The other entries are called the off-diagonal elements of 𝐴.
(2) The zero matrix of order 𝑘 over 𝑅 is the matrix in 𝑅(𝑘,𝑘) all of whose entries are 0.
We denote it by 0𝑘 or simply by 0 if it is clear from the context what is meant by
𝑘.
(3) The identity matrix of order 𝑘 over 𝑅 is the following square matrix in 𝑅(𝑘,𝑘) :

1 0 0 ⋯ 0
⎛ ⎞
0 1 0 ⋯ 0
⎜ ⎟
(B.5.2) 𝐼𝑘 = ⎜0 0 1 ⋯ 0⎟ .
⎜⋮ ⋮ ⋮ ⋱ ⋮⎟
⎝0 0 0 ⋯ 1⎠

All of its diagonal elements are 1 and the off-diagonal elements are 0. The matrix
𝐼𝑘 is also called the unit matrix of order 𝑘 and is denoted by 𝐼 if 𝑘 is clear from the
context.

Definition B.5.2. Let 𝐴 = (𝑎𝑖,𝑗 ) ∈ 𝑆 (𝑘,𝑘) .

(1) We say that 𝐴 is an upper triangular matrix or in upper triangular form if 𝐴 is of


the form

⎛ 𝑎0,0 𝑎0,1 𝑎0,2 . . . 𝑎0,𝑘−1 ⎞
⎜ 0 𝑎1,1 𝑎1,2 . . . 𝑎1,𝑘−1 ⎟
(B.5.3) 𝐴 = ⎜ 0 0 𝑎2,2 . . . 𝑎2,𝑘−1 ⎟ ;
⎜ ⋮ ⋮ ⋮ ⋱ ⋮ ⎟
⎝ 0 0 0 . . . 𝑎𝑘−1,𝑘−1 ⎠
that is, 𝑎𝑖,𝑗 = 0 for 0 ≤ 𝑗 < 𝑖 < 𝑘.


(2) We say that 𝐴 is a lower triangular matrix or in lower triangular form if 𝐴 is of the
form

⎛ 𝑎0,0 0 0 . . . 0 ⎞
⎜ 𝑎1,0 𝑎1,1 0 . . . 0 ⎟
(B.5.4) 𝐴 = ⎜ 𝑎2,0 𝑎2,1 𝑎2,2 . . . 0 ⎟ ;
⎜ ⋮ ⋮ ⋮ ⋱ ⋮ ⎟
⎝ 𝑎𝑘−1,0 𝑎𝑘−1,1 𝑎𝑘−1,2 . . . 𝑎𝑘−1,𝑘−1 ⎠
that is, 𝑎𝑖,𝑗 = 0 for 0 ≤ 𝑖 < 𝑗 < 𝑘.



(3) We say that 𝐴 is a diagonal matrix or in diagonal form if 𝐴 is of the form


⎛ 𝑎0,0 0 0 . . . 0 ⎞
⎜ 0 𝑎1,1 0 . . . 0 ⎟
(B.5.5) 𝐴 = ⎜ 0 0 𝑎2,2 . . . 0 ⎟ ;
⎜ ⋮ ⋮ ⋮ ⋱ ⋮ ⎟
⎝ 0 0 0 . . . 𝑎𝑘−1,𝑘−1 ⎠
that is, 𝑎𝑖,𝑗 = 0 for 0 ≤ 𝑖, 𝑗 < 𝑘, 𝑖 ≠ 𝑗. For such a matrix, we also write
(B.5.6) 𝐴 = diag(𝑎0,0 , . . . , 𝑎𝑘−1,𝑘−1 ).

B.5.1. Algebraic structure of 𝑅(𝑘,𝑘) .


Proposition B.5.3. The set (𝑅(𝑘,𝑘) , +, ⋅, ⋅) is an 𝑅-algebra where “+” is matrix addition,
the first “⋅” means scalar multiplication, and the second “⋅” stands for matrix multiplica-
tion. The neutral element with respect to addition is the zero matrix 0𝑘 of order 𝑘. The
identity element with respect to multiplication is the identity matrix 𝐼𝑘 of order 𝑘.
Definition B.5.4. (1) A matrix 𝐴 ∈ 𝑅(𝑘,𝑘) is called invertible if it has an inverse in
the multiplicative semigroup 𝑅(𝑘,𝑘); that is, there is a matrix 𝐵 ∈ 𝑅(𝑘,𝑘) such that
𝐴𝐵 = 𝐵𝐴 = 𝐼𝑘.
(2) If 𝐴 ∈ 𝑅(𝑘,𝑘) is invertible, then we denote by 𝐴−1 the multiplicative inverse of 𝐴.
This matrix is also called the inverse of 𝐴.
(3) The unit group of 𝑅(𝑘,𝑘) , that is, the set of all invertible 𝑘 × 𝑘 matrices with entries
from 𝑅, is called the general linear group of degree 𝑘 over 𝑅 and is denoted by
𝖦𝖫(𝑘, 𝑅).

Definition B.5.4 uses the fact that the inverse of an element of a semigroup is
uniquely determined. We will show in Corollary B.5.21 that for a matrix 𝐴 ∈ 𝑅(𝑘,𝑘)
to be invertible, it suffices that 𝐴 has a left or right inverse, respectively; that is, there
is a matrix 𝐵 ∈ 𝑅(𝑘,𝑘) such that 𝐵𝐴 = 𝐼𝑘 or 𝐴𝐵 = 𝐼𝑘 , respectively. This right or left
inverse of 𝐴 is its inverse.

B.5.2. Permutation matrices. Permutation matrices are obtained by permuting


the row vectors of the identity matrix.
Definition B.5.5. Let 𝜎 ∈ 𝑆 𝑘 . Then the permutation matrix 𝑃𝜍 associated with 𝜎 is the
matrix in 𝑅(𝑘,𝑘) with row vectors 𝑒 𝜍(0)
⃗ , . . . , 𝑒 𝜍(𝑘−1)
⃗ , in this order.

We provide two other representations of permutation matrices. For this, we recall


the Kronecker delta which is the following function:
(B.5.7) ℕ0² → {0, 1}, (𝑖, 𝑗) ↦ 𝛿𝑖,𝑗 = { 1 if 𝑖 = 𝑗; 0 if 𝑖 ≠ 𝑗 }.
Proposition B.5.6. Let 𝜎 ∈ 𝑆 𝑘 . Then the following hold.
(1) 𝑃𝜍 = (𝛿𝜍(𝑖),𝑗 )𝑖,𝑗∈ℤ𝑘 = (𝛿 𝑖,𝜍−1 (𝑗) )𝑖,𝑗∈ℤ𝑘 .
(2) 𝑃𝜍 is the matrix with column vectors 𝑒 𝜍⃗ −1 (0) , . . . , 𝑒 𝜍⃗ −1 (𝑘−1) , in this order.

We prove two important properties of permutation matrices.

Proposition B.5.7. (1) For all 𝜎, 𝜏 ∈ 𝑆 𝑘 we have 𝑃𝜍∘𝜏 = 𝑃𝜏 𝑃𝜍 .


(2) For all 𝜎 ∈ 𝑆 𝑘 the matrix 𝑃𝜍 is invertible, and we have 𝑃𝜍−1 = 𝑃𝜍−1 = 𝑃𝜍T .

Proof. Let 𝜎, 𝜏 ∈ 𝑆 𝑘 . Then Proposition B.5.6 implies


𝑃𝜏𝑃𝜍 = ( ∑_{𝑢=0}^{𝑘−1} 𝛿𝜏(𝑖),𝑢𝛿𝜍(𝑢),𝑗 )_{𝑖,𝑗∈ℤ𝑘} = (𝛿𝜍(𝜏(𝑖)),𝑗)_{𝑖,𝑗∈ℤ𝑘} = 𝑃𝜍∘𝜏.

This proves the first assertion and also implies

(B.5.8) 𝐼𝑘 = 𝑃𝜍−1 𝑃𝜍 = 𝑃𝜍 𝑃𝜍−1 .

This shows that 𝑃𝜍 is invertible and 𝑃𝜍−1 = (𝑃𝜍 )−1 . Also, by Proposition B.5.6 we
have 𝑃𝜍T = (𝛿 𝑖,𝜍(𝑗) ) and 𝑃𝜍−1 = (𝛿 𝑖,𝜍(𝑗) ) which proves that these two matrices are the
same. □

From Proposition B.5.7, we obtain the following corollary.

Corollary B.5.8. The set of permutation matrices in 𝑅(𝑘,𝑘) is a subgroup of 𝖦𝖫(𝑘, 𝑅).

We also determine the effect of multiplying matrices by permutation matrices from


the left and right.

Proposition B.5.9. (1) For all 𝐴 ∈ 𝑅(𝑘,𝑙) with row vectors 𝑎0⃗ , . . . , 𝑎𝑘−1
⃗ and all 𝜎 ∈ 𝑆 𝑘
the product 𝑃𝜍 𝐴 is the matrix in 𝑅(𝑘,𝑙) with row vectors 𝑎𝜍(0)
⃗ , . . . , 𝑎𝜍(𝑘−1)
⃗ , in this
order.
(2) For all 𝐴 ∈ 𝑅(𝑙,𝑘) with column vectors 𝑎0⃗ , . . . , 𝑎𝑘−1
⃗ and all 𝜎 ∈ 𝑆 𝑙 the product 𝐴𝑃𝜍
is the matrix in 𝑅(𝑙,𝑘) with column vectors 𝑎𝜍⃗ −1 (0) , . . . , 𝑎𝜍⃗ −1 (𝑙−1) , in this order.

Proof. Let 𝐴 ∈ 𝑅(𝑘,𝑙) and 𝜎 ∈ 𝑆𝑘. Then, for all 𝑖 ∈ ℤ𝑘, the product 𝑒⃗𝜍(𝑖)𝐴 is the 𝜎(𝑖)th
row vector of 𝐴. Together with the definition of 𝑃𝜍 this implies the first assertion. Now,
let 𝜎 ∈ 𝑆𝑙 and let 𝑗 ∈ ℤ𝑙. By Proposition B.5.6, the 𝑗th column vector of 𝑃𝜍 is 𝑒⃗𝜍−1(𝑗)
and the product 𝐴𝑒⃗𝜍−1(𝑗) is the 𝜎−1(𝑗)th column vector of 𝐴. This implies the second
assertion. □
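The statements of Proposition B.5.7 can be checked on small examples. The following Python sketch does this for 𝑘 = 3, encoding a permutation 𝜎 as the tuple (𝜎(0), . . . , 𝜎(𝑘 − 1)) (an encoding of our own choosing):

```python
def perm_matrix(sigma):
    # row i of P_sigma is the standard basis vector e_{sigma(i)} (Definition B.5.5)
    k = len(sigma)
    return [[1 if j == sigma[i] else 0 for j in range(k)] for i in range(k)]

def mat_mul(A, B):
    return [[sum(A[i][u] * B[u][j] for u in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def compose(sigma, tau):
    # (sigma o tau)(i) = sigma(tau(i))
    return tuple(sigma[tau[i]] for i in range(len(tau)))

def inverse(sigma):
    inv = [0] * len(sigma)
    for i, s in enumerate(sigma):
        inv[s] = i
    return tuple(inv)

sigma, tau = (1, 2, 0), (0, 2, 1)
# Proposition B.5.7 (1): P_{sigma o tau} = P_tau P_sigma
assert perm_matrix(compose(sigma, tau)) == mat_mul(perm_matrix(tau), perm_matrix(sigma))
# Proposition B.5.7 (2): P_{sigma^{-1}} = P_sigma^T
transpose = [list(row) for row in zip(*perm_matrix(sigma))]
assert perm_matrix(inverse(sigma)) == transpose
```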

B.5.3. Determinants.

Definition B.5.10. Consider a map
(B.5.9) det ∶ 𝑅(𝑘,𝑘) → 𝑅.
(1) The map det is called multilinear if it has the following two properties. For all 𝐴 ∈
𝑅(𝑘,𝑘) with column vectors 𝑎⃗0, . . . , 𝑎⃗𝑘−1, all 𝑏⃗ ∈ 𝑅𝑘, all 𝑗 ∈ ℤ𝑘, and all 𝑟 ∈ 𝑅 we

have
(B.5.10) det(𝑎⃗0, . . . , 𝑎⃗𝑗−1, 𝑎⃗𝑗 + 𝑏⃗, 𝑎⃗𝑗+1, . . . , 𝑎⃗𝑘−1)
= det(𝑎⃗0, . . . , 𝑎⃗𝑗−1, 𝑎⃗𝑗, 𝑎⃗𝑗+1, . . . , 𝑎⃗𝑘−1) + det(𝑎⃗0, . . . , 𝑎⃗𝑗−1, 𝑏⃗, 𝑎⃗𝑗+1, . . . , 𝑎⃗𝑘−1)
and
(B.5.11) det(𝑎⃗0, . . . , 𝑎⃗𝑗−1, 𝑟𝑎⃗𝑗, 𝑎⃗𝑗+1, . . . , 𝑎⃗𝑘−1) = 𝑟 ⋅ det(𝑎⃗0, . . . , 𝑎⃗𝑗−1, 𝑎⃗𝑗, 𝑎⃗𝑗+1, . . . , 𝑎⃗𝑘−1).

(2) The map det is called alternating if for every matrix 𝐴 ∈ 𝑅(𝑘,𝑘) which has two
equal columns we have det(𝐴) = 0.
(3) The map det is called normalized if det(𝐼𝑘 ) = 1.
(4) The map det is called a determinant function if it is multilinear, alternating, and
normalized.
Proposition B.5.11. Let det ∶ 𝑅(𝑘,𝑘) → 𝑅 be multilinear and alternating. Then, for all
𝐴 ∈ 𝑅(𝑘,𝑘) with column vectors 𝑎⃗0, . . . , 𝑎⃗𝑘−1 the following hold.
(1) Adding a multiple of one column to another column of 𝐴 does not change det(𝐴);
that is, for all 𝑟 ∈ 𝑅 and all 𝑖, 𝑗 ∈ ℤ𝑘 with 𝑖 ≠ 𝑗 we have
(B.5.12) det(𝑎⃗0, . . . , 𝑎⃗𝑗−1, 𝑎⃗𝑗 + 𝑟𝑎⃗𝑖, 𝑎⃗𝑗+1, . . . , 𝑎⃗𝑘−1) = det 𝐴.
(2) Swapping two columns of 𝐴 changes the sign of det(𝐴); that is, for all 𝑖, 𝑗 ∈ ℤ𝑘 with
𝑖 ≠ 𝑗 we have
(B.5.13) det(𝑎⃗0, . . . , 𝑎⃗𝑗, . . . , 𝑎⃗𝑖, . . . , 𝑎⃗𝑘−1) = − det 𝐴.
(3) If one column of 𝐴 is zero, then det 𝐴 = 0.
Theorem B.5.12. The map
(B.5.14) det ∶ 𝑅(𝑘,𝑘) → 𝑅, 𝐴 ↦ det(𝐴) = ∑_{𝜎∈𝑆𝑘} sign(𝜎) ∏_{𝑗=0}^{𝑘−1} 𝑎𝜎(𝑗),𝑗
is a determinant function and it is the only determinant function that maps 𝑅(𝑘,𝑘) to 𝑅.
Definition B.5.13. For 𝐴 ∈ 𝑅(𝑘,𝑘) the value det(𝐴) is called the determinant of 𝐴.

The formula (B.5.14) is called the Leibniz formula for evaluating determinants.
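As an informal illustration, the Leibniz formula can be implemented directly, with sign(𝜎) computed by counting inversions; this runs over all 𝑘! permutations and is therefore only practical for very small 𝑘:

```python
from itertools import permutations
from math import prod

def sign(sigma):
    # parity of the number of inversions of sigma
    inversions = sum(1 for i in range(len(sigma))
                     for j in range(i + 1, len(sigma)) if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1

def det(A):
    # Leibniz formula (B.5.14): sum over all permutations sigma of
    # sign(sigma) * prod_j a_{sigma(j), j}
    k = len(A)
    return sum(sign(s) * prod(A[s[j]][j] for j in range(k))
               for s in permutations(range(k)))

A = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
assert det(A) == 25   # agrees with Laplace expansion along the first row
```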
Proposition B.5.14. (1) The determinant of a square matrix over 𝑅 and its transpose
are the same; that is, for all 𝐴 ∈ 𝑅(𝑘,𝑘) we have
(B.5.15) det(𝐴) = det(𝐴T ).
(2) The determinant is multiplicative; that is, for all 𝐴, 𝐵 ∈
𝑅(𝑘,𝑘) we have
(B.5.16) det(𝐴𝐵) = det(𝐴) det(𝐵).
(3) The determinant of a triangular matrix (upper or lower) is the product of its diagonal
entries.

Definition B.5.15. Let 𝐴 ∈ 𝑅(𝑘,𝑘) and assume that 𝑘 > 1. Also, let 𝑖, 𝑗 ∈ ℤ𝑘 . Then the
minor 𝐴𝑖,𝑗 is the matrix in 𝑅(𝑘−1,𝑘−1) that is obtained by deleting the 𝑖th row and 𝑗th
column in 𝐴.
Here is the Laplace expansion formula for determinants.
Theorem B.5.16. Let 𝑘 > 1 and let 𝐴 = (𝑎𝑖,𝑗) ∈ 𝑅(𝑘,𝑘). Then for every 𝑖 ∈ ℤ𝑘 we have
(B.5.17) det 𝐴 = ∑_{𝑗=0}^{𝑘−1} (−1)^{𝑖+𝑗} 𝑎𝑖,𝑗 det 𝐴𝑖,𝑗
and for every 𝑗 ∈ ℤ𝑘 we have
(B.5.18) det 𝐴 = ∑_{𝑖=0}^{𝑘−1} (−1)^{𝑖+𝑗} 𝑎𝑖,𝑗 det 𝐴𝑖,𝑗.
Proposition B.5.17. Let 𝐴 ∈ 𝑅(𝑘,𝑘) be in upper or lower triangular form. Then the deter-
minant of 𝐴 is the product of the diagonal elements of 𝐴.
B.5.4. The unit group of 𝑅(𝑘,𝑘) .
Definition B.5.18. Let 𝐴 = (𝑎𝑖,𝑗) ∈ 𝑅(𝑘,𝑘). Then the adjugate of 𝐴 is defined as the
matrix
(B.5.19) adj(𝐴) = ((−1)^{𝑖+𝑗} det 𝐴𝑗,𝑖)_{𝑖,𝑗∈ℤ𝑘} ∈ 𝑅(𝑘,𝑘)
where the 𝐴𝑖,𝑗 are the minors of 𝐴. We also write adj 𝐴 instead of adj(𝐴).
The adjugate of a square matrix has the following property that allows us to com-
pute inverses of square matrices.
Proposition B.5.19. Let 𝐴 ∈ 𝑅(𝑘,𝑘) . Then we have
(B.5.20) (adj 𝐴)𝐴 = 𝐴(adj 𝐴) = det 𝐴 ⋅ 𝐼𝑘 .
Now we can characterize the elements of 𝖦𝖫(𝑘, 𝑅) and show how to compute the
inverses of square matrices.
Theorem B.5.20. (1) A matrix 𝐴 ∈ 𝑅(𝑘,𝑘) is invertible if and only if det(𝐴) is a unit in
𝑅; that is,
(B.5.21) 𝖦𝖫(𝑘, 𝑅) = {𝐴 ∈ 𝑅(𝑘,𝑘) ∶ det 𝐴 ∈ 𝑈(𝑅)}.
(2) Let 𝐴 ∈ 𝖦𝖫(𝑘, 𝑅). Then we have
(B.5.22) det(𝐴−1 ) = (det 𝐴)−1
and the inverse of 𝐴 is
(B.5.23) 𝐴−1 = det(𝐴)−1 adj(𝐴).
Corollary B.5.21. Let 𝐴, 𝐵 ∈ 𝑅(𝑘,𝑘) with 𝐴𝐵 = 𝐼𝑘 . Then 𝐴, 𝐵 ∈ 𝖦𝖫(𝑘, 𝑅), 𝐵 = 𝐴−1 , and
𝐴 = 𝐵 −1 .
Corollary B.5.22. If 𝐹 is a field, then we have
(B.5.24) 𝖦𝖫(𝑘, 𝐹) = {𝐴 ∈ 𝐹 (𝑘,𝑘) ∶ det 𝐴 ≠ 0}.
Lemma B.5.23. Let 𝐴, 𝐵 ∈ 𝖦𝖫(𝑘, 𝑅). Then we have (𝐴𝐵)−1 = 𝐵 −1 𝐴−1 .
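Proposition B.5.19 and Theorem B.5.20 can be illustrated for an integer matrix with determinant 1 (so that 𝐴 is invertible over ℤ). This Python sketch builds the adjugate from minors as in (B.5.19); all helper names and the sample matrix are our own choices:

```python
from itertools import permutations
from math import prod

def det(A):
    # Leibniz formula (B.5.14)
    k = len(A)
    sgn = lambda s: -1 if sum(s[i] > s[j] for i in range(k)
                              for j in range(i + 1, k)) % 2 else 1
    return sum(sgn(s) * prod(A[s[j]][j] for j in range(k))
               for s in permutations(range(k)))

def minor(A, i, j):
    # Definition B.5.15: delete the i-th row and j-th column of A
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def adjugate(A):
    # (B.5.19): entry (i, j) of adj(A) is (-1)^(i+j) det A_{j,i}
    k = len(A)
    return [[(-1)**(i + j) * det(minor(A, j, i)) for j in range(k)]
            for i in range(k)]

A = [[1, 2], [3, 7]]                 # det A = 1, a unit in Z
adj = adjugate(A)
assert adj == [[7, -2], [-3, 1]]
# Proposition B.5.19: adj(A) * A = det(A) * I_k
product = [[sum(adj[i][u] * A[u][j] for u in range(2)) for j in range(2)]
           for i in range(2)]
assert product == [[1, 0], [0, 1]]
```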

B.5.5. Trace.

Definition B.5.24. The trace of a square matrix 𝐴 over 𝑅 is the sum of its diagonal
elements. It is denoted by tr(𝐴) or tr 𝐴.

Proposition B.5.25. (1) The trace map tr ∶ 𝑅(𝑘,𝑘) → 𝑅 is 𝑅-linear; that is,
tr(𝑎𝐴 + 𝑏𝐵) = 𝑎 tr(𝐴) + 𝑏 tr(𝐵) for all 𝑎, 𝑏 ∈ 𝑅 and 𝐴, 𝐵 ∈ 𝑅(𝑘,𝑘) .
(2) tr(𝐴T ) = tr(𝐴) for all 𝐴 ∈ 𝑅(𝑘,𝑘) .
(3) tr(𝐴𝐵) = tr(𝐵𝐴) for all 𝐴, 𝐵 ∈ 𝑅(𝑘,𝑘) .
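A quick Python check of Proposition B.5.25 on 2 × 2 integer matrices (the helper names are ours, not notation from the text):

```python
def tr(A):
    # trace: sum of the diagonal elements (Definition B.5.24)
    return sum(A[i][i] for i in range(len(A)))

def mat_mul(A, B):
    return [[sum(A[i][u] * B[u][j] for u in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert tr(mat_mul(A, B)) == tr(mat_mul(B, A))              # tr(AB) = tr(BA)
assert tr([list(row) for row in zip(*A)]) == tr(A)         # tr(A^T) = tr(A)
assert tr([[2 * x for x in row] for row in A]) == 2 * tr(A)  # R-linearity in A
```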

B.5.6. Characteristic polynomials.

Definition B.5.26. The characteristic polynomial of a matrix 𝐴 ∈ 𝑅(𝑘,𝑘) is the polyno-
mial 𝑝𝐴(𝑥) = det(𝑥𝐼𝑘 − 𝐴) ∈ 𝑅[𝑥].

Proposition B.5.27. Let 𝐴 ∈ 𝑅(𝑘,𝑘) and let
(B.5.25) 𝑝𝐴(𝑥) = 𝑥^𝑘 + ∑_{𝑖=0}^{𝑘−1} 𝑟𝑖𝑥^𝑖

with 𝑟 𝑖 ∈ 𝑅 for all 𝑖 ∈ ℤ𝑘 . Then we have


(B.5.26) 𝑟 𝑘−1 = − tr(𝐴)
and
(B.5.27) 𝑟0 = (−1)𝑘 det(𝐴).
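For 𝑘 = 2 these relations say 𝑝𝐴(𝑥) = 𝑥² − tr(𝐴)𝑥 + det(𝐴). A small Python check, with a sample matrix of our own choosing:

```python
def char_poly_2x2(A):
    # characteristic polynomial of a 2x2 matrix: p_A(x) = x^2 - tr(A) x + det(A)
    a, b = A[0]
    c, d = A[1]
    trace, det = a + d, a * d - b * c
    return [det, -trace, 1]          # coefficients r_0, r_1, r_2 of p_A

A = [[2, 1], [1, 3]]
r0, r1, r2 = char_poly_2x2(A)
assert r1 == -(2 + 3)                # r_{k-1} = -tr(A)
assert r0 == 2 * 3 - 1 * 1           # r_0 = (-1)^k det(A) with k = 2
# sanity check: p_A(x) = det(x I - A) at the sample point x = 5
x = 5
assert x**2 + r1 * x + r0 == (x - 2) * (x - 3) - 1 * 1
```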

Corollary B.5.28. Let 𝐴 ∈ 𝑅(𝑘,𝑘) and assume that the characteristic polynomial of 𝐴
can be written as
(B.5.28) 𝑝𝐴(𝑥) = ∏_{𝑖=0}^{𝑘−1} (𝑥 − 𝜆𝑖)
with 𝜆𝑖 ∈ 𝑅 for 0 ≤ 𝑖 < 𝑘. Then we have
(B.5.29) tr(𝐴) = ∑_{𝑖=0}^{𝑘−1} 𝜆𝑖
and
(B.5.30) det(𝐴) = ∏_{𝑖=0}^{𝑘−1} 𝜆𝑖.

B.5.7. Similar matrices.

Definition B.5.29. Two matrices 𝐴, 𝐵 ∈ 𝑅(𝑘,𝑘) are called similar if there is a matrix
𝑈 ∈ 𝖦𝖫(𝑘, 𝑅) such that 𝐵 = 𝑈 −1 𝐴𝑈.

Proposition B.5.30. Similar matrices in 𝑅(𝑘,𝑘) have the same characteristic polynomial,
trace, and determinant.

B.6. Free modules of finite dimension


In this section, we discuss free 𝑅-modules with finite bases. Let 𝑀 be an 𝑅-module.

B.6.1. Operations on 𝑀 𝑘 .

Definition B.6.1. Let 𝐵 = (𝑏⃗0, . . . , 𝑏⃗𝑘−1) ∈ 𝑀𝑘.
(1) We define the product of 𝐵 with a vector 𝑥⃗ = (𝑥0, . . . , 𝑥𝑘−1) ∈ 𝑅𝑘 as
(B.6.1) 𝐵𝑥⃗ = 𝐵 ⋅ 𝑥⃗ = ∑_{𝑖=0}^{𝑘−1} 𝑥𝑖𝑏⃗𝑖.
(2) Let 𝑇 ∈ 𝑅(𝑘,𝑙) with column vectors 𝑡⃗0, . . . , 𝑡⃗𝑙−1. Then we define the product of 𝐵
with 𝑇 as
(B.6.2) 𝐵𝑇 = 𝐵 ⋅ 𝑇 = (𝐵𝑡⃗0, . . . , 𝐵𝑡⃗𝑙−1).
Proposition B.6.2. Let 𝐵 = (𝑏0⃗ , . . . , 𝑏𝑘−1


⃗ ) ∈ 𝑀𝑘.

(1) For all 𝑟 ∈ 𝑅 and all 𝑥,⃗ 𝑦 ⃗ ∈ 𝑅𝑘 we have (𝑟𝐵)𝑥⃗ = 𝐵(𝑟𝑥)⃗ and 𝐵(𝑥⃗ + 𝑦)⃗ = 𝐵 𝑥⃗ + 𝐵𝑦.⃗
(2) For all 𝑟 ∈ 𝑅 and 𝑋 ∈ 𝑅(𝑘,𝑙) and 𝑌 ∈ 𝑅(𝑘,𝑙) we have (𝑟𝐵)𝑋 = 𝐵(𝑟𝑋) and 𝐵(𝑋 +𝑌 ) =
𝐵𝑋 + 𝐵𝑌 .
(3) For all 𝑋 ∈ 𝑅(𝑘,𝑙) and 𝑌 ∈ 𝑅(𝑙,𝑚) we have (𝐵𝑋)𝑌 = 𝐵(𝑋𝑌 ).

B.6.2. Bases and dimension.


Proposition B.6.3. Let 𝐵 = (𝑏⃗0, . . . , 𝑏⃗𝑚−1) be a sequence of elements in 𝑀. Then 𝐵
is linearly independent if and only if ∑_{𝑗=0}^{𝑚−1} 𝑟𝑗𝑏⃗𝑗 = 0⃗ with 𝑟𝑗 ∈ 𝑅 for
0 ≤ 𝑗 < 𝑚 implies 𝑟0 = ⋯ = 𝑟𝑚−1 = 0.
Theorem B.6.4. Let 𝑀 be finitely generated. Then the following hold.
(1) If 𝑀 is free, then all bases of 𝑀 are finite and have the same length which is called
the dimension of 𝑀.
(2) If 𝑀 has a finite basis 𝐵 of length 𝑘, then every basis of 𝑀 can be obtained as 𝐵𝑇 with
𝑇 ∈ 𝖦𝖫(𝑘, 𝑅); that is, the set of all bases of 𝑀 is 𝐵𝖦𝖫(𝑘, 𝑅).
Corollary B.6.5. Let 𝐹 be a field, and let 𝑉 be a finitely generated 𝐹-vector space. Then
𝑉 has a finite basis 𝐵, all bases of 𝑉 have the same length, which is called the dimension
of 𝑉, and for any basis 𝐵 of 𝑉, the set of all bases of 𝑉 is 𝐵𝖦𝖫(𝑘, 𝐹).

Definition B.6.6. Let 𝐵 = (𝑏⃗0, . . . , 𝑏⃗𝑘−1) be a basis of 𝑀 and let 𝑣⃗ ∈ 𝑀. Then the
uniquely determined vector 𝑥⃗ ∈ 𝑅𝑘 such that 𝑣⃗ = 𝐵𝑥⃗ is called the coefficient vector of
𝑣⃗ with respect to the basis 𝐵. We denote it by 𝑣⃗𝐵.

Theorem B.6.7. Let 𝐵 = (𝑏⃗0, . . . , 𝑏⃗𝑘−1) be a basis of 𝑀. Then the map
(B.6.3) 𝑀 → 𝑅𝑘, 𝑣⃗ ↦ 𝑣⃗𝐵,
that sends an element 𝑣 ⃗ ∈ 𝑀 to its coefficient vector with respect to the basis 𝐵, is an
isomorphism of 𝑅-modules.

Proposition B.6.8. Let 𝐵 be a finite basis of 𝑀 of length 𝑘 and let 𝑇 ∈ 𝖦𝖫(𝑘, 𝑅). Then
for all 𝑣⃗ ∈ 𝑀 we have
(B.6.4) 𝑣⃗𝐵 = 𝑇𝑣⃗𝐵𝑇.

B.6.3. Linear maps. Let 𝑀, 𝑁 be free 𝑅-modules of dimensions 𝑘 and 𝑙, respec-


tively. In this section, we construct isomorphisms between the 𝑅-modules Hom𝑅 (𝑀, 𝑁)
and 𝑅(𝑙,𝑘) and between the 𝑅-algebras End𝑅(𝑀) and 𝑅(𝑘,𝑘). This shows that we can
identify homomorphisms between finite-dimensional 𝑅-modules with matrices over
𝑅. However, we will see that these identifications depend on the choice of bases of 𝑀
and 𝑁.
Let 𝐵 = (𝑏0⃗ , . . . , 𝑏𝑘−1
⃗ ) and 𝐶 = (𝑐 0⃗ , . . . , 𝑐 𝑙−1
⃗ ) be 𝑅-bases of 𝑀 and 𝑁, respectively.
Recall that for any 𝑣 ⃗ ∈ 𝑀 we denote by 𝑣 𝐵⃗ the coefficient vector of 𝑣 ⃗ with respect to the
basis 𝐵. Also, for any 𝑤⃗ ∈ 𝑁 we denote by 𝑤⃗ 𝐶 the coefficient vector of 𝑤⃗ with respect
to the basis 𝐶. By Theorem B.6.7 the maps 𝑀 → 𝑅𝑘, 𝑣⃗ ↦ 𝑣⃗𝐵 and 𝑁 → 𝑅𝑙, 𝑤⃗ ↦ 𝑤⃗𝐶 are
𝑅-module isomorphisms. We now define the representation matrices of linear maps
from 𝑀 to 𝑁.
Definition B.6.9. (1) For 𝑓 ∈ Hom𝑅 (𝑀, 𝑁), we define Mat𝐵,𝐶 (𝑓) as the matrix in
𝑅(𝑙,𝑘) with column vectors 𝑓(𝑏0⃗ )𝐶 , . . . , 𝑓(𝑏𝑘−1
⃗ )𝐶 . This matrix is called the repre-
sentation matrix of 𝑓 with respect to the bases 𝐵 and 𝐶.
(2) For 𝑓 ∈ End(𝑀) we write Mat𝐵 (𝑓) for Mat𝐵,𝐵 (𝑓) and call this matrix the repre-
sentation matrix of 𝑓 with respect the basis 𝐵.
(3) Let 𝑇 be in 𝑅(𝑙,𝑘). Then we define the map
(B.6.5) 𝑓𝑇,𝐵,𝐶 ∶ 𝑀 → 𝑁, 𝑣⃗ ↦ 𝐶𝑇𝑣⃗𝐵.
This map is in Hom𝑅 (𝑀, 𝑁).
(4) Let 𝑀 = 𝑁 and let 𝑇 be in 𝑅(𝑘,𝑘) . Then we write 𝑓𝑇,𝐵 for 𝑓𝑇,𝐵,𝐵 . This map is in
End(𝑀).
Theorem B.6.10. (1) The map
(B.6.6) Hom𝑅 (𝑀, 𝑁) → 𝑅(𝑙,𝑘) , 𝑓 ↦ Mat𝐵,𝐶 (𝑓)
is an 𝑅-module isomorphism. The inverse of this isomorphism sends 𝑇 ∈ 𝑅(𝑙,𝑘) to
𝑓𝑇,𝐵,𝐶 .
(2) The map
(B.6.7) End𝑅 (𝑀) → 𝑅(𝑘,𝑘) , 𝑓 ↦ Mat𝐵 (𝑓)
is an 𝑅-algebra isomorphism.
Proposition B.6.11. Let 𝑆 ∈ 𝖦𝖫(𝑘, 𝑅) and 𝑇 ∈ 𝖦𝖫(𝑙, 𝑅). Then for all 𝑓 ∈ Hom𝑅(𝑀, 𝑁)
we have
(B.6.8) Mat𝐵𝑆,𝐶𝑇(𝑓) = 𝑇−1 Mat𝐵,𝐶(𝑓)𝑆.

Proof. By Proposition B.6.8 we have 𝑣⃗𝐵𝑆 = 𝑆−1𝑣⃗𝐵 for all 𝑣⃗ ∈ 𝑀, so
(B.6.9) 𝑓(𝑣⃗) = 𝐶 Mat𝐵,𝐶(𝑓)𝑣⃗𝐵 = (𝐶𝑇)(𝑇−1 Mat𝐵,𝐶(𝑓)𝑆)𝑣⃗𝐵𝑆.
This implies the assertion. □

B.6.4. Endomorphisms. Let 𝑀 be a free 𝑅-module of finite dimension 𝑘 and


let 𝐵 = (𝑏0⃗ , . . . , 𝑏𝑘−1
⃗ ) be a basis of 𝑀. The next theorem follows immediately from
Theorem B.6.10.
Theorem B.6.12. The map
(B.6.10) End𝑅 (𝑀) → 𝑅(𝑘,𝑘) , 𝑓 ↦ Mat𝐵 (𝑓)
is an isomorphism of 𝑅-algebras. The inverse of this isomorphism is
(B.6.11) 𝑅(𝑘,𝑘) → End𝑅 (𝑀), 𝑇 ↦ 𝑓𝑇,𝐵 .
Theorem B.6.13. (1) An endomorphism 𝑓 of 𝑀 is an automorphism if and only if
Mat𝐵(𝑓) ∈ 𝖦𝖫(𝑘, 𝑅).
(2) For all 𝑓 ∈ Aut𝑅 (𝑀) we have
(B.6.12) Mat𝐵 (𝑓−1 ) = Mat𝐵 (𝑓)−1 .
Proposition B.6.14. For all 𝑓 ∈ End𝑅 (𝑀) and all 𝑈 ∈ 𝖦𝖫(𝑘, 𝑅) we have Mat𝐵𝑈 (𝑓) = 𝑈 −1 Mat𝐵 (𝑓)𝑈.
Corollary B.6.15. The set of representation matrices of an endomorphism 𝑓 of 𝑀 is the
equivalence class of all matrices in 𝑅(𝑘,𝑘) that are similar to Mat𝐵 (𝑓).
Corollary B.6.16. The characteristic polynomials, traces, and determinants of all matrix
representations of an endomorphism of a 𝑘-dimensional 𝑅-module 𝑀 are the same.

This result justifies the following definition.


Definition B.6.17. The characteristic polynomial, determinant, and trace of an endo-
morphism of a finitely generated free 𝑅-module are the characteristic polynomial, de-
terminant, and trace of any of its representation matrices in 𝑅(𝑘,𝑘) , respectively.
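The invariance behind Definition B.6.17 is easy to check numerically. The following sketch is our own illustration in Python over ℚ (using `fractions.Fraction`; the matrices `A` and `U` are arbitrary choices, not taken from the text): it conjugates a representation matrix by an invertible basis-change matrix and compares trace and determinant, which together determine the characteristic polynomial of a 2 × 2 matrix.

```python
from fractions import Fraction

def mat2(a, b, c, d):
    # build a 2x2 matrix with exact rational entries
    return [[Fraction(a), Fraction(b)], [Fraction(c), Fraction(d)]]

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(U):
    # inverse of a 2x2 matrix via the adjugate formula
    det = U[0][0] * U[1][1] - U[0][1] * U[1][0]
    return [[U[1][1] / det, -U[0][1] / det],
            [-U[1][0] / det, U[0][0] / det]]

def trace(X):
    return X[0][0] + X[1][1]

def det2(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

A = mat2(2, 1, 0, 3)               # a representation matrix of some endomorphism
U = mat2(1, 2, 1, 3)               # a change of basis, U in GL(2, Q)
B = matmul(inv2(U), matmul(A, U))  # representation matrix w.r.t. the new basis

# similar matrices share trace and determinant,
# hence the characteristic polynomial x^2 - (tr)x + det
assert trace(A) == trace(B) == 5
assert det2(A) == det2(B) == 6
```

Since trace and determinant fix the characteristic polynomial in dimension 2, this also confirms that both matrices have characteristic polynomial 𝑥² − 5𝑥 + 6.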

B.6.5. Dual modules. Let 𝑀 be a free 𝑅-module of finite dimension 𝑘 and let
𝐵 = (𝑏0⃗ , . . . , 𝑏𝑘−1
⃗ ) be a basis of 𝑀.

Theorem B.6.18. The dual module 𝑀 ∗ is isomorphic to 𝑀 as an 𝑅-module. In particu-


lar, 𝑀 ∗ is a finitely generated free module and its dimension is the dimension of 𝑀.

B.7. Finite-dimensional vector spaces


Most of the linear algebra used to model quantum algorithms refers to vector spaces of
finite dimensions. This section discusses this topic. We let 𝑉, 𝑊 be 𝐹-vector spaces of
dimensions 𝑘 and 𝑙, respectively.

B.7.1. Bases and generating systems. We know from Theorem B.6.4 that all
bases of 𝑉 have the same length 𝑘 and that for every basis 𝐵 of 𝑉, the set of all bases of
𝑉 is the coset 𝖦𝖫(𝑘, 𝐹)𝐵. We now state the Steinitz Exchange Lemma which allows us
to obtain more results for bases and generating systems of 𝑉.
Lemma B.7.1. Let 𝑚, 𝑛 ∈ ℕ, let 𝑈 = (𝑢⃗0 , . . . , 𝑢⃗𝑚−1 ) ∈ 𝑉 𝑚 be linearly independent, and
let 𝐺 ∈ 𝑉 𝑛 be a generating system of 𝑉. Then 𝑚 ≤ 𝑛 and there are elements 𝑢⃗𝑚 , . . . , 𝑢⃗𝑛−1
in 𝐺 such that (𝑢⃗0 , . . . , 𝑢⃗𝑛−1 ) is a generating system of 𝑉.

Theorem B.7.2. (1) Linearly independent sequences in 𝑉 have length ≤ 𝑘.


(2) A linearly independent sequence in 𝑉 is a basis of 𝑉 if and only if its length is 𝑘.
(3) Every linearly independent system can be extended to a basis of 𝑉.
(4) Generating systems of 𝑉 have length ≥ 𝑘.
(5) A generating system of 𝑉 is a basis of 𝑉 if and only if its length is 𝑘.

B.7.2. The rank of a matrix.

Proposition B.7.3. The vector spaces generated by the rows and columns of a matrix 𝐴
over 𝐹, respectively, have the same dimension. This dimension is called the rank of 𝐴. It
is denoted by rank(𝐴) or rank 𝐴.

Example B.7.4. Consider the matrix

1 0
(B.7.1) 𝐴 = (0 1)
1 1

over 𝔽2 . The rank of 𝐴 is 2. Indeed, the column rank of 𝐴 is 2 because the two column
vectors of 𝐴 are linearly independent. Also, the row rank of this matrix is 2 because the
first two row vectors of 𝐴 are linearly independent and the third row vector is a linear
combination of the first two.
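The rank computation of Example B.7.4 can be replayed in a few lines of Python. The helper `rank_gf2` is our own sketch: it row-reduces a 0/1 matrix over 𝔽2 (addition mod 2 is XOR) and counts the pivot rows.

```python
def rank_gf2(rows):
    """Row-reduce a 0/1 matrix over F_2 and return the number of pivots."""
    rows = [list(r) for r in rows]
    rank, col, n = 0, 0, len(rows[0])
    while rank < len(rows) and col < n:
        # find a row with a nonzero entry in the current column
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            col += 1
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                # add the pivot row mod 2 (XOR) to clear the column
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
        col += 1
    return rank

A = [[1, 0], [0, 1], [1, 1]]   # the matrix from Example B.7.4
assert rank_gf2(A) == 2
```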

The next proposition establishes a connection between the kernel and the image
of a linear map from 𝑉 to 𝑊 and the rank of a representation matrix of this map.

Proposition B.7.5. Let 𝑓 ∈ Hom𝐹 (𝑉, 𝑊). Then the rank 𝑟 of all representation matrices
of 𝑓 is the same and the following hold.

(1) The dimension of the image of 𝑓 is 𝑟.


(2) The dimension of the kernel of 𝑓 is 𝑘 − 𝑟.

Definition B.7.6. Let 𝐴 ∈ 𝐹 (𝑘,𝑙) .

(1) The kernel of 𝐴 is defined as the kernel of the map 𝐹 𝑙 → 𝐹 𝑘 , 𝑥⃗ ↦ 𝐴𝑥.⃗


(2) The image of 𝐴 is defined as the image of the map 𝐹 𝑙 → 𝐹 𝑘 , 𝑥⃗ ↦ 𝐴𝑥.⃗

Definition B.7.7. We call a matrix 𝐴 ∈ 𝐹 (𝑘,𝑘) singular if det 𝐴 = 0 and we call it


nonsingular otherwise.

Proposition B.7.8. Let 𝐴 ∈ 𝐹 (𝑘,𝑘) . Then the following statements are equivalent.

(1) 𝐴 is nonsingular.
(2) The rank of 𝐴 is 𝑘.
(3) The columns of 𝐴 form a basis of 𝐹 𝑘 .
(4) The rows of 𝐴 form a basis of 𝐹 𝑘 .

B.7.3. Row and column echelon form.

Definition B.7.9. Let 𝐴 = (𝑎𝑖,𝑗 ) ∈ 𝑅(𝑙,𝑘) with row vectors 𝑎0⃗ , . . . , 𝑎𝑙−1
⃗ .
(1) We say that 𝐴 is in row echelon form if the following conditions are satisfied.
(a) All rows of 𝐴 that have only zero entries are at the bottom of 𝐴; that is, if
𝑢, 𝑣 ∈ ℤ𝑙 such that 𝑎ᵆ⃗ ≠ 0⃗ and 𝑎𝑣⃗ = 0,⃗ then 𝑢 < 𝑣.
(b) If 𝑢 > 0 and 𝑎ᵆ⃗ is nonzero, then the first nonzero entry in 𝑎ᵆ⃗ is strictly to the
right of the first nonzero entry in 𝑎ᵆ−1
⃗ ; that is,
(B.7.2) min{𝑗 ∈ ℤ𝑘 ∶ 𝑎ᵆ,𝑗 ≠ 0} > min{𝑗 ∈ ℤ𝑘 ∶ 𝑎ᵆ−1,𝑗 ≠ 0}.
(2) We say that 𝐴 is in reduced row echelon form if 𝐴 is in row echelon form and the
first nonzero element in each nonzero row is 1.

Definition B.7.10. We say that a matrix 𝐴 ∈ 𝑅(𝑘,𝑙) is in column echelon form if 𝐴T is


in row echelon form. Also, we say that 𝐴 is in reduced column echelon form if 𝐴T is in
reduced row echelon form.

We note that a square matrix in row echelon form is an upper triangular matrix.
Also, a square matrix in column echelon form is a lower triangular matrix.

B.7.4. The Gauss elimination algorithm. In this section, we explain the Gauss
Elimination Algorithm B.7.13 that transforms a matrix 𝐴 ∈ 𝐹 (𝑙,𝑘) into column echelon
form. Despite its name, this algorithm was already known in China in the second
century. Since the algorithm uses division by nonzero elements, it is only guaranteed
to work over fields but, in general, not over rings.
The correctness of the algorithm is stated in the next theorem.

Theorem B.7.11. On input of 𝑘, 𝑙 ∈ ℕ and 𝐴 ∈ 𝐹 (𝑙,𝑘) , Algorithm B.7.13 returns 𝐴′ ∈ 𝐹 (𝑙,𝑘) ,


𝑆 ∈ 𝖦𝖫(𝑘, 𝐹), 𝑣, 𝑤 ∈ ℕ0 such that 𝐴′ is in column echelon form, 𝐴′ = 𝐴𝑆, 𝑣 is
the number of nonzero columns in 𝐴′ , and det 𝑆 = (−1)𝑤 .

The name “Gaussian elimination algorithm” derives from the fact that in the for
loop starting at line 15, the entries 𝑎ᵆ,𝑗 are “eliminated” for 𝑣 < 𝑗 < 𝑘.
Algorithm B.7.13 can also be used to transform 𝐴 ∈ 𝐹 (𝑘,𝑙) into row echelon form
as follows. We apply Algorithm B.7.13 to the transpose of 𝐴. The algorithm returns
𝐴′ , 𝑆, 𝑣, 𝑤. We replace 𝐴′ , 𝑆 by their transposes. Then 𝐴′ is in row echelon form and we
have 𝐴′ = 𝑆𝐴, 𝑣 is the number of nonzero rows of 𝐴′ , and det 𝑆 = (−1)𝑤 .

Theorem B.7.12. Let 𝑘, 𝑙 ∈ ℕ and 𝐴 ∈ 𝐹 (𝑙,𝑘) be the input of Algorithm B.7.13 and
let 𝑛 = max{𝑘, 𝑙}. The algorithm then uses O(𝑛3 ) operations in 𝐹 and space for O(𝑛2 )
elements of 𝐹.

The Gauss elimination algorithm also requires time and space to initialize and
increment the loop variables 𝑢, 𝑣, and 𝑤. However, these time and space requirements
are dominated by the complexity of the operations in the field 𝐹. Therefore, we do not
mention them explicitly.

Algorithm B.7.13. Gaussian elimination


Input: 𝑘, 𝑙 ∈ ℕ, 𝐴 ∈ 𝐹 (𝑙,𝑘)
Output: 𝐴′ ∈ 𝐹 (𝑙,𝑘) , 𝑆 ∈ 𝖦𝖫(𝑘, 𝐹), 𝑣, 𝑤 ∈ ℕ0 , such that 𝐴′ is in column echelon form,
𝐴′ = 𝐴𝑆, 𝑣 is the number of nonzero columns in 𝐴′ , and det 𝑆 = (−1)𝑤 .
1: ColumnEcholon(𝑘, 𝑙, 𝐴)
2: 𝑢, 𝑣, 𝑤 ← 0
3: 𝐴′ ← 𝐴
4: /* The entries of 𝐴′ are 𝑎′𝑖,𝑗 */
5: 𝑆 ← 𝐼𝑘
6: /* The columns of 𝐴′ , 𝑆 are 𝑎𝑗′⃗ and 𝑠𝑗⃗ , respectively. */
7: while 𝑢 < 𝑙 and 𝑣 < 𝑘 do
8: if one of 𝑎′ᵆ,𝑣 , 𝑎′ᵆ,𝑣+1 , . . . , 𝑎′ᵆ,𝑘−1 is nonzero then
9: Select 𝑣pivot ∈ {𝑣, . . . , 𝑘 − 1} such that 𝑎′ᵆ,𝑣pivot ≠ 0
10: if 𝑣 ≠ 𝑣pivot then
11: Swap 𝑎′𝑣⃗ and 𝑎′𝑣⃗ pivot
12: Swap 𝑠𝑣⃗ and 𝑠𝑣⃗ pivot
13: 𝑤 ←𝑤+1
14: end if
15: for 𝑗 = 𝑣 + 1, . . . , 𝑘 − 1 do
16: 𝛼 ← 𝑎′ᵆ,𝑗 /𝑎′ᵆ,𝑣
17: 𝑎′𝑗⃗ ← 𝑎′𝑗⃗ − 𝛼𝑎′𝑣⃗
18: 𝑠𝑗⃗ ← 𝑠𝑗⃗ − 𝛼𝑠𝑣⃗
19: end for
20: 𝑣 ←𝑣+1
21: end if
22: 𝑢←𝑢+1
23: end while
24: return 𝐴′ , 𝑆, 𝑣, 𝑤
25: end

Several modifications of Algorithm B.7.13 are possible. Depending on the desired


output, we can omit the computation of 𝑆 or 𝑤 which simplifies the algorithm and im-
proves its performance. When 𝐹 is the field of real or complex numbers, the algorithm
can only use approximations of the entries of the matrix 𝐴. Then, a good selection of
the pivot element is crucial for keeping error propagation under control.
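The bookkeeping of Algorithm B.7.13 can be made concrete in Python. The sketch below is our own transcription under stated conventions, not the book's pseudocode verbatim: matrices are lists of rows, the field 𝐹 = ℚ is realized with exact `fractions.Fraction` arithmetic, and the example matrix is our own choice.

```python
from fractions import Fraction

def column_echelon(A):
    """Sketch of Algorithm B.7.13 over Q. Returns A', S, v, w with
    A' = A*S in column echelon form, v = number of nonzero columns,
    and det S = (-1)^w."""
    l, k = len(A), len(A[0])
    Ap = [[Fraction(x) for x in row] for row in A]
    S = [[Fraction(1 if i == j else 0) for j in range(k)] for i in range(k)]
    u = v = w = 0
    while u < l and v < k:
        # look for a pivot in row u among columns v, ..., k-1
        piv = next((j for j in range(v, k) if Ap[u][j] != 0), None)
        if piv is not None:
            if piv != v:                       # swap columns v and piv in A' and S
                for M in (Ap, S):
                    for row in M:
                        row[v], row[piv] = row[piv], row[v]
                w += 1
            for j in range(v + 1, k):          # eliminate entries right of the pivot
                alpha = Ap[u][j] / Ap[u][v]
                for M in (Ap, S):
                    for row in M:
                        row[j] -= alpha * row[v]
            v += 1
        u += 1
    return Ap, S, v, w

A = [[1, 2, 3], [2, 4, 6], [1, 1, 1]]   # rank 2: row 2 = 2 * row 1
Ap, S, v, w = column_echelon(A)

# sanity check of Theorem B.7.11: A' = A*S
prod = [[sum(Fraction(A[i][t]) * S[t][j] for t in range(3)) for j in range(3)]
        for i in range(3)]
assert prod == Ap
```

As in Theorem B.7.14, `v` is the rank of `A` and the last `k - v` columns of `S` span the kernel of `A`.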

B.7.5. Applications of the Gauss elimination algorithm.

Theorem B.7.14. Let 𝑙, 𝑘 ∈ ℕ and 𝐴 ∈ 𝐹 (𝑙,𝑘) be the input of the Gauss elimination
algorithm and let 𝐴′ ∈ 𝐹 (𝑙,𝑘) , 𝑆 ∈ 𝖦𝖫(𝑘, 𝐹), 𝑣, 𝑤 ∈ ℕ0 be its output. Then the following
hold.
(1) The rank of 𝐴 is 𝑣.
(2) The sequence consisting of the first 𝑣 column vectors of 𝐴′ is a basis of the image of 𝐴.

(3) The sequence consisting of the last 𝑘 − 𝑣 columns of 𝑆 is a basis of the kernel of 𝐴.
(4) If 𝑘 = 𝑙, then (−1)𝑤 det 𝐴 is the product of the diagonal elements of 𝐴′ .

Also, with 𝑛 = max{𝑘, 𝑙} the computation of these objects requires O(𝑛3 ) operations in 𝐹
and space for O(𝑛2 ) elements of 𝐹.

Next, we discuss the problem of solving linear systems of equations. By this we


mean the following. Let 𝐴 ∈ 𝐹 (𝑙,𝑘) and 𝑏 ⃗ ∈ 𝐹 𝑙 . The goal is to find all 𝑥⃗ ∈ 𝐹 𝑘 such that

(B.7.3) 𝐴𝑥⃗ = 𝑏.⃗

If 𝑥⃗ ∈ 𝐹 𝑘 satisfies (B.7.3), then 𝑥⃗ is called a solution of the linear system (B.7.3). We first
characterize the solutions of linear systems.

Proposition B.7.15. Let 𝐴 ∈ 𝐹 (𝑙,𝑘) and let 𝑏 ⃗ ∈ 𝐹 𝑙 . Then the set of all the solutions of
the linear system 𝐴𝑥⃗ = 𝑏 ⃗ is empty or a coset of the kernel of 𝐴, i.e., of the form 𝑥⃗ + ker(𝐴)
where 𝑥⃗ is any of the solutions of the linear system.

Proposition B.7.15 shows how to find the set of all solutions of the linear system
(B.7.3). First, decide whether the linear system has a solution and if this is the case,
find one. Second, determine the basis of the kernel of 𝐴. We have already explained
how the second task can be achieved using the Gauss algorithm. So, it remains to solve
the first task. This is done in Algorithm B.7.16.

Algorithm B.7.16. Solving a linear system


Input: 𝑘, 𝑙 ∈ ℕ, 𝐴 ∈ 𝐹 (𝑙,𝑘) , 𝑏 ⃗ ∈ 𝐹 𝑙
Output: 𝑥⃗ ∈ 𝐹 𝑘 with 𝐴𝑥⃗ = 𝑏 ⃗ or “No solution”
1: Solve(𝑘, 𝑙, 𝐴, 𝑏)⃗
2: (𝐴′ , 𝑆, 𝑣, 𝑤) ← 𝙲𝚘𝚕𝚞𝚖𝚗𝙴𝚌𝚑𝚘𝚕𝚘𝚗(𝑘, 𝑙, 𝐴)
3: /* The entries of 𝐴′ are 𝑎′𝑖,𝑗 . The entries of 𝑏 ⃗ are 𝑏𝑖 . */
4: for 𝑗 = 0, . . . , 𝑣 − 1 do
5: 𝑢 ← min{𝑖 ∈ ℤ𝑙 ∶ 𝑎′𝑖,𝑗 ≠ 0}
6: 𝑦𝑗 ← (𝑏ᵆ − ∑_{𝑖=0}^{𝑗−1} 𝑦𝑖 𝑎′ᵆ,𝑖 ) /𝑎′ᵆ,𝑗
7: end for
8: 𝑦 ⃗ ← (𝑦0 , . . . , 𝑦𝑣−1 , 0, . . . , 0) with 𝑘 − 𝑣 trailing zeros
9: if 𝐴′ 𝑦 ⃗ = 𝑏 ⃗ then
10: 𝑥⃗ ← 𝑆 𝑦 ⃗
11: return 𝑥⃗
12: else
13: return “No solution”
14: end if
15: end

Proposition B.7.17. Let 𝑘, 𝑙 ∈ ℕ, 𝐴 ∈ 𝐹 (𝑙,𝑘) , and 𝑏 ⃗ ∈ 𝐹 𝑙 be the input of Algorithm


B.7.16. If the algorithm returns 𝑥⃗ ∈ 𝐹 𝑘 , then this vector satisfies 𝐴𝑥⃗ = 𝑏.⃗ If the algorithm
returns “No solution”, then the linear system 𝐴𝑥⃗ = 𝑏 ⃗ has no solution. The algorithm
requires O(𝑛3 ) operations in 𝐹 and space for O(𝑛2 ) elements of 𝐹, where 𝑛 = max{𝑘, 𝑙}.
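The two tasks described above, finding one solution and detecting inconsistency, can be sketched as follows. This is a simplified row-reduction variant written for illustration, not a transcription of Algorithm B.7.16; the function name `solve` and the example system are our own.

```python
from fractions import Fraction

def solve(A, b):
    """Row-reduce the augmented matrix [A | b] over Q and read off one
    solution, or return None if the system is inconsistent."""
    l, k = len(A), len(A[0])
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    pivots, r = [], 0
    for c in range(k):
        piv = next((i for i in range(r, l) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        M[r] = [x / M[r][c] for x in M[r]]          # normalize the pivot row
        for i in range(l):
            if i != r and M[i][c] != 0:             # clear the pivot column
                M[i] = [x - M[i][c] * y for x, y in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    # a row (0 ... 0 | nonzero) means there is no solution
    if any(all(x == 0 for x in row[:k]) and row[k] != 0 for row in M):
        return None
    x = [Fraction(0)] * k
    for i, c in enumerate(pivots):
        x[c] = M[i][k]                              # free variables stay 0
    return x

A = [[1, 2], [2, 4], [1, 1]]
assert solve(A, [3, 6, 2]) == [1, 1]   # one solution of the consistent system
assert solve(A, [3, 5, 2]) is None     # inconsistent right-hand side
```

By Proposition B.7.15, the full solution set of the first system is the returned vector plus the kernel of `A`.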
B.7.6. Eigenvalues, eigenvectors, and eigenspaces.
Definition B.7.18. (1) Let 𝑓 ∈ End(𝑉). An eigenvalue of 𝑓 is a field element 𝜆 ∈ 𝐹
such that 𝑓(𝑣)⃗ = 𝜆𝑣 ⃗ for some nonzero vector 𝑣 ⃗ ∈ 𝑉. Such a vector 𝑣 ⃗ is called an
eigenvector of 𝑓 corresponding to the eigenvalue 𝜆 or simply an eigenvector of 𝑓.
(2) Let 𝐴 ∈ 𝐹 (𝑘,𝑘) . An eigenvalue or eigenvector of 𝐴 is defined as an eigenvalue or
eigenvector of the endomorphism 𝐹 𝑘 → 𝐹 𝑘 , 𝑣 ⃗ ↦ 𝐴𝑣,⃗ respectively.
Proposition B.7.19. Let 𝑓 ∈ End(𝑉) and let 𝑣 ⃗ ∈ 𝑉 be an eigenvector of 𝑓. Then there
is exactly one eigenvalue 𝜆 ∈ 𝐹 of 𝑓 such that 𝑓(𝑣)⃗ = 𝜆𝑣.⃗ It is called the eigenvalue
associated with the eigenvector 𝑣.⃗
Proposition B.7.20. If 𝜆 is an eigenvalue of an endomorphism 𝑓 ∈ End(𝑉), then the
set of all eigenvectors corresponding to this eigenvalue is a subspace of 𝑉. It is called the
eigenspace of 𝑓 associated with the eigenvalue 𝜆.
Proposition B.7.21. A field element 𝜆 ∈ 𝐹 is an eigenvalue of an endomorphism 𝑓 ∈
End(𝑉) if and only if 𝑝𝑓 (𝜆) = 0.
Corollary B.7.22. If 𝐴 ∈ 𝐹 (𝑘,𝑘) is in upper or lower triangular form, then the eigenvalues
of 𝐴 are its diagonal elements.
Definition B.7.23. If 𝜆 is an eigenvalue of 𝑓 ∈ End(𝑉), then the dimension of the
eigenspace associated with 𝜆 is called the geometric multiplicity of 𝜆. Furthermore, the
algebraic multiplicity of 𝜆 is the power to which (𝑥 − 𝜆) divides 𝑝𝑓 (𝑥).
Proposition B.7.24. Let 𝑓 ∈ End(𝑉), let 𝜆0 , . . . , 𝜆𝑙−1 be the eigenvalues of 𝑓, let 𝑚0 , . . . ,
𝑚𝑙−1 be their geometric multiplicities, and let 𝐸0 , . . . , 𝐸 𝑙−1 be the corresponding eigen-
spaces. Then the following hold.
(1) The sum of the eigenspaces 𝐸0 , . . . , 𝐸 𝑙−1 is direct.
(2) There is a basis (𝑏0⃗ , . . . , 𝑏𝑘−1⃗ ) of 𝑉 such that for all 𝑗 ∈ ℤ𝑙 the sequence
(𝑏⃗ 𝑀𝑗 , . . . , 𝑏⃗ 𝑀𝑗 +𝑚𝑗 −1 ) is a basis of 𝐸𝑗 , where 𝑀𝑗 = ∑_{𝑖=0}^{𝑗−1} 𝑚𝑖 .
Corollary B.7.25. Let 𝐴 ∈ 𝐹 (𝑘,𝑘) , let 𝜆0 , . . . , 𝜆𝑙−1 be the eigenvalues of 𝐴, let 𝑚0 , . . . ,
𝑚𝑙−1 be their geometric multiplicities, and let 𝑠 = 𝑚0 + ⋯ + 𝑚𝑙−1 . Then 𝐴 is similar to a
matrix of the form
(B.7.4) ( 𝐴1 𝐴2
𝟎 𝐴3 )
where
(B.7.5) 𝐴1 = diag(𝜆0 , . . . , 𝜆0 , . . . , 𝜆𝑙−1 , . . . , 𝜆𝑙−1 ) with each 𝜆𝑗 occurring 𝑚𝑗 times,
𝟎 stands for the matrix in 𝐹 (𝑘−𝑠,𝑠) with only zero entries, 𝐴2 ∈ 𝐹 (𝑠,𝑘−𝑠) , and 𝐴3 ∈ 𝐹 (𝑘−𝑠,𝑘−𝑠) .
Corollary B.7.26. Let 𝑓 ∈ End(𝑉). Then the geometric multiplicities of the eigenvalues
of 𝑓 are less than or equal to the corresponding algebraic multiplicities.

B.7.7. Diagonalizable matrices.


Definition B.7.27. A matrix 𝐴 ∈ 𝐹 (𝑘,𝑘) is called diagonalizable if 𝐴 is similar to a diag-
onal matrix.
Theorem B.7.28. Let 𝐴 ∈ 𝐹 (𝑘,𝑘) . Then the following statements are equivalent.
(1) 𝐴 is diagonalizable.
(2) The characteristic polynomial 𝑝𝐴 (𝑥) of 𝐴 is a product of linear factors, and the geo-
metric multiplicity of each eigenvalue is equal to its algebraic multiplicity.
(3) 𝐹 𝑘 is the direct sum of the eigenspaces of 𝐴.
Corollary B.7.29. Let 𝐴 ∈ 𝐹 (𝑘,𝑘) be diagonalizable, let 𝜆0 , . . . , 𝜆𝑙−1 be the distinct eigen-
values of 𝐴, and let 𝑚0 , . . . , 𝑚𝑙−1 be their algebraic multiplicities. Then there is a basis
(𝑏0⃗ , . . . , 𝑏𝑘−1⃗ ) of eigenvectors of 𝐴 such that the first 𝑚0 of them are eigenvectors for the
eigenvalue 𝜆0 , the next 𝑚1 of them are eigenvectors for the eigenvalue 𝜆1 , etc. Also, we
have
(B.7.6) 𝐵 −1 𝐴𝐵 = diag(𝜆0 , . . . , 𝜆0 , 𝜆1 , . . . , 𝜆1 , . . . , 𝜆𝑙−1 , . . . , 𝜆𝑙−1 ),
where each 𝜆𝑗 occurs 𝑚𝑗 times and 𝐵 is the matrix with column vectors 𝑏0⃗ , . . . , 𝑏𝑘−1⃗ .
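A small rational example (the matrix is our own choice, not from the text) illustrates this: the symmetric matrix below has characteristic polynomial (𝑥 − 1)(𝑥 − 3), and conjugating by the matrix of eigenvectors produces the predicted diagonal matrix.

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

# A has characteristic polynomial x^2 - 4x + 3 = (x - 1)(x - 3),
# so it is diagonalizable over Q with eigenvalues 1 and 3.
A = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(2)]]

# Columns of B are eigenvectors: (1, -1) for eigenvalue 1, (1, 1) for eigenvalue 3.
B = [[Fraction(1), Fraction(1)], [Fraction(-1), Fraction(1)]]
detB = B[0][0] * B[1][1] - B[0][1] * B[1][0]
Binv = [[B[1][1] / detB, -B[0][1] / detB],
        [-B[1][0] / detB, B[0][0] / detB]]

D = matmul(Binv, matmul(A, B))
assert D == [[1, 0], [0, 3]]   # B^{-1} A B = diag(1, 3)
```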

B.8. Tensor products


Let 𝑀0 , . . . , 𝑀𝑚−1 , 𝑃 be modules over a ring 𝑅 where 𝑚 ∈ ℕ. We discuss the tensor
product of the modules 𝑀0 , . . . , 𝑀𝑚−1 . We let 𝑀 be the direct product of the modules
𝑀0 , . . . , 𝑀𝑚−1 .

B.8.1. Idea and characterization. A tensor product of the modules 𝑀0 , . . . ,


𝑀𝑚−1 combines these modules to a larger module 𝑇 that respects the original mod-
ule structures and their respective operations. Such a construction is useful to model
combinations of systems that are individually modeled as 𝑅-modules, for example the
combination of state spaces in quantum mechanics. For the definition of tensor prod-
ucts, multilinear maps are required.
Definition B.8.1. A function
(B.8.1) 𝑓∶𝑀→𝑃
is called multilinear if for all (𝑣 0⃗ , . . . , 𝑣 𝑚−1
⃗ ) ∈ 𝑀 and all 𝑗 ∈ ℤ𝑚 the functions
(B.8.2) 𝑀𝑗 → 𝑃, 𝑣 ⃗ ↦ 𝑓(𝑣 0⃗ , . . . , 𝑣 𝑗−1⃗ , 𝑣,⃗ 𝑣 𝑗+1⃗ , . . . , 𝑣 𝑚−1⃗ )
are 𝑅-module homomorphisms. If 𝑚 = 2, then 𝑓 is called bilinear.
Example B.8.2. Let 𝑚 = 3 and 𝑅 = 𝑀0 = 𝑀1 = 𝑀2 = 𝑃 = ℤ. The map
(B.8.3) 𝑓 ∶ ℤ3 → ℤ, (𝑥0 , 𝑥1 , 𝑥2 ) ↦ 𝑥0 𝑥1 𝑥2
is multilinear. To see this, we note that for (𝑥0 , 𝑥1 , 𝑥2 ) ∈ ℤ3 and 𝑟, 𝑥 ∈ ℤ we have
𝑓(𝑥0 + 𝑥, 𝑥1 , 𝑥2 ) = (𝑥0 + 𝑥)𝑥1 𝑥2 = 𝑥0 𝑥1 𝑥2 + 𝑥𝑥1 𝑥2 = 𝑓(𝑥0 , 𝑥1 , 𝑥2 ) + 𝑓(𝑥, 𝑥1 , 𝑥2 ) and
𝑓(𝑟𝑥0 , 𝑥1 , 𝑥2 ) = 𝑟𝑥0 𝑥1 𝑥2 = 𝑟𝑓(𝑥0 , 𝑥1 , 𝑥2 ). Therefore, 𝑓 is linear in its first argument. In
the same way, it can be shown that 𝑓 is linear in the other two arguments.

We present a condition for the value of a multilinear function to be 0.


Lemma B.8.3. Let 𝑓 ∶ 𝑀 → 𝑃 be multilinear. Then for all (𝑣 0⃗ , . . . , 𝑣 𝑚−1
⃗ ) ∈ 𝑀 and all
𝑗 ∈ ℤ𝑚 we have
(B.8.4) 𝑓(𝑣 0⃗ , . . . , 𝑣 𝑗−1⃗ , 0,⃗ 𝑣 𝑗+1⃗ , . . . , 𝑣 𝑚−1⃗ ) = 0.

Proof. Let (𝑣 0⃗ , . . . , 𝑣 𝑚−1⃗ ) ∈ 𝑀 and let 𝑗 ∈ ℤ𝑚 . Since 𝑓 is multilinear, we have
𝑓(𝑣 0⃗ , . . . , 𝑣 𝑗−1⃗ , 0,⃗ 𝑣 𝑗+1⃗ , . . . , 𝑣 𝑚−1⃗ )
(B.8.5) = 𝑓(𝑣 0⃗ , . . . , 𝑣 𝑗−1⃗ , 0 ⋅ 0,⃗ 𝑣 𝑗+1⃗ , . . . , 𝑣 𝑚−1⃗ )
= 0 ⋅ 𝑓(𝑣 0⃗ , . . . , 𝑣 𝑗−1⃗ , 0,⃗ 𝑣 𝑗+1⃗ , . . . , 𝑣 𝑚−1⃗ ) = 0.
This concludes the proof. □

Now we define tensor products.


Definition B.8.4. A tensor product of 𝑀0 , . . . , 𝑀𝑚−1 over 𝑅 is a pair (𝑇, 𝜃) where 𝑇 is
an 𝑅-module and
(B.8.6) 𝜃∶𝑀→𝑇
is a multilinear map that has the following properties.
(1) The image 𝜃(𝑀) of 𝑀 under 𝜃 spans 𝑇.
(2) Universal property: For every 𝑅-module 𝑃 and every multilinear map
(B.8.7) 𝜙∶𝑀→𝑃
there is Φ ∈ Hom𝑅 (𝑇, 𝑃) such that
(B.8.8) 𝜙 = Φ ∘ 𝜃.
Example B.8.5. Let 𝑚 = 3 and 𝑅 = 𝑀0 = 𝑀1 = 𝑀2 = 𝑇 = ℤ. Define the map 𝜃 ∶ ℤ3 → ℤ,
(𝑥0 , 𝑥1 , 𝑥2 ) ↦ 𝑥0 𝑥1 𝑥2 . We claim that (ℤ, 𝜃) is a tensor product of 𝑀0 , 𝑀1 , and 𝑀2 . We
have seen in Example B.8.2 that 𝜃 is multilinear. Also, since for every 𝑡 ∈ ℤ we have
𝑡 = 𝜃(𝑡, 1, 1), it follows that 𝜃(ℤ3 ) = ℤ.
To prove the universal property, let 𝜙 ∶ ℤ3 → ℤ be a multilinear map. Define
(B.8.9) Φ ∶ ℤ → ℤ, 𝑡 ↦ 𝜙(𝑡, 1, 1).
This map is well-defined. Also, if 𝑧, 𝑡, 𝑡′ ∈ ℤ, then the multilinearity of 𝜙 implies
(B.8.10) Φ(𝑡 + 𝑡′ ) = 𝜙(𝑡 + 𝑡′ , 1, 1) = 𝜙(𝑡, 1, 1) + 𝜙(𝑡′ , 1, 1) = Φ(𝑡) + Φ(𝑡′ )
and
(B.8.11) Φ(𝑧𝑡) = 𝜙(𝑧𝑡, 1, 1) = 𝑧𝜙(𝑡, 1, 1) = 𝑧Φ(𝑡).
So Φ is linear, and we have 𝜙 = Φ ∘ 𝜃. We also show that Φ is the only linear map
between 𝑇 and 𝑃 such that Φ ∘ 𝜃 = 𝜙. So let Φ′ ∶ ℤ → ℤ be another linear map such
that 𝜙 = Φ′ ∘ 𝜃 . Then for all 𝑡 ∈ ℤ we have
(B.8.12) Φ(𝑡) = 𝜙(𝑡, 1, 1) = Φ′ (𝜃(𝑡, 1, 1)) = Φ′ (𝑡).
Exercise B.8.6. Let 𝑅 = 𝔽2 , 𝑀0 = 𝑀1 = 𝔽22 . Find a tensor product of 𝑀0 and 𝑀1 .

We will now generalize Example B.8.5 and show how to construct the map Φ from
Definition B.8.4 and prove that it is uniquely determined by 𝜙 in this definition.
Lemma B.8.7. Let 𝑇 be an 𝑅-module and let 𝜃 ∶ 𝑀 → 𝑇 be a multilinear map with
Span 𝜃(𝑀) = 𝑇. Then the following statements are equivalent.
(1) The pair (𝑇, 𝜃) has the universal property.
(2) For every 𝑅-module 𝑃, every multilinear map 𝜙 ∶ 𝑀 → 𝑃, every 𝑛 ∈ ℕ, all
𝑣 0⃗ , . . . , 𝑣 𝑛−1⃗ ∈ 𝑀, and all 𝑟0 , . . . , 𝑟𝑛−1 ∈ 𝑅 with ∑_{𝑗=0}^{𝑛−1} 𝑟𝑗 𝜃(𝑣𝑗⃗ ) = 0 we have
∑_{𝑗=0}^{𝑛−1} 𝑟𝑗 𝜙(𝑣𝑗⃗ ) = 0.

Proof. Let 𝑃 be an 𝑅-module and let 𝜙 ∶ 𝑀 → 𝑃 be multilinear. Assume that (𝑇, 𝜃)


has the universal property. Then there is a linear map Φ ∶ 𝑇 → 𝑃 with 𝜙 = Φ ∘ 𝜃.
Let 𝑛 ∈ ℕ, 𝑣 0⃗ , . . . , 𝑣 𝑛−1⃗ ∈ 𝑀, and 𝑟0 , . . . , 𝑟𝑛−1 ∈ 𝑅 with ∑_{𝑗=0}^{𝑛−1} 𝑟𝑗 𝜃(𝑣𝑗⃗ ) = 0. Then the
linearity of Φ implies
(B.8.13) ∑_{𝑗=0}^{𝑛−1} 𝑟𝑗 𝜙(𝑣𝑗⃗ ) = ∑_{𝑗=0}^{𝑛−1} 𝑟𝑗 Φ ∘ 𝜃(𝑣𝑗⃗ ) = Φ ( ∑_{𝑗=0}^{𝑛−1} 𝑟𝑗 𝜃(𝑣𝑗⃗ )) = Φ(0) = 0.

Conversely, assume that the second condition holds. Consider the map
(B.8.14) Φ ∶ 𝑇 → 𝑃, ∑_{𝑗=0}^{𝑛−1} 𝑟𝑗 𝜃(𝑣𝑗⃗ ) ↦ ∑_{𝑗=0}^{𝑛−1} 𝑟𝑗 𝜙(𝑣𝑗⃗ )

for all 𝑛 ∈ ℕ and all 𝑣𝑗⃗ ∈ 𝑀, 𝑟𝑗 ∈ 𝑅 with 0 ≤ 𝑗 < 𝑛. We show that Φ is well-defined. Since Span 𝜃(𝑀) =
𝑇, every 𝑡 ∈ 𝑇 can be written as
(B.8.15) 𝑡 = ∑_{𝑗=0}^{𝑛−1} 𝑟𝑗 𝜃(𝑣𝑗⃗ )

where 𝑛 ∈ ℕ, 𝑣𝑗⃗ ∈ 𝑀, and 𝑟𝑗 ∈ 𝑅 for 0 ≤ 𝑗 < 𝑛. So Φ is defined for all 𝑡 ∈ 𝑇 and we


must show that the image of 𝑡 under Φ is independent of the representation of 𝑡. Let
(B.8.16) 𝑡 = ∑_{𝑖=0}^{𝑛′−1} 𝑟𝑖′ 𝜃(𝑣′𝑖⃗ )

be another representation where 𝑛′ ∈ ℕ, 𝑣′𝑖⃗ ∈ 𝑀, and 𝑟𝑖′ ∈ 𝑅 for 0 ≤ 𝑖 < 𝑛′ . By


inserting summands with coefficients 0 in the sums on the right sides of (B.8.15) and
(B.8.16) and changing the order of the terms in these sums, we achieve 𝑛 = 𝑛′ and
𝑣𝑗⃗ = 𝑣𝑗′⃗ for all 𝑗 ∈ ℤ𝑛 . So we have
(B.8.17) 0 = ∑_{𝑗=0}^{𝑛−1} (𝑟𝑗 − 𝑟𝑗′ )𝜃(𝑣𝑗⃗ ).

Therefore, the second condition implies


(B.8.18) ∑_{𝑗=0}^{𝑛−1} 𝑟𝑗 𝜙(𝑣𝑗⃗ ) − ∑_{𝑗=0}^{𝑛−1} 𝑟𝑗′ 𝜙(𝑣𝑗⃗ ) = ∑_{𝑗=0}^{𝑛−1} (𝑟𝑗 − 𝑟𝑗′ )𝜙(𝑣𝑗⃗ ) = 0.

This shows that Φ is, in fact, well-defined. The proof of the linearity is left to the reader
as Exercise B.8.8. □
Exercise B.8.8. Prove the linearity of the map defined in (B.8.14).

From the proof of Lemma B.8.7 we obtain the following result.


Proposition B.8.9. Let (𝑇, 𝜃) be a tensor product of 𝑀0 , . . . , 𝑀𝑚−1 , let 𝑃 be an 𝑅-module,
and let 𝜙 ∶ 𝑀 → 𝑃 be a multilinear map. Then the map
(B.8.19) Φ ∶ 𝑇 → 𝑃, ∑_{𝑖=0}^{𝑛−1} 𝑟 𝑖 𝜃(𝑣 𝑖⃗ ) ↦ ∑_{𝑖=0}^{𝑛−1} 𝑟 𝑖 𝜙(𝑣 𝑖⃗ )

is a well-defined homomorphism that satisfies 𝜙 = Φ ∘ 𝜃 and it is the only linear map with
this property.

Proof. We have shown in the proof of Lemma B.8.7 that Φ is a well-defined homo-
morphism with 𝜙 = Φ ∘ 𝜃. To prove the uniqueness, let Φ′ ∶ 𝑇 → 𝑃 be another linear
map with these properties. Also, let 𝑡 ∈ 𝑇 with a representation as in (B.8.15). Then
we have Φ′ (𝑡) = Φ′ (∑_{𝑖=0}^{𝑛−1} 𝑟 𝑖 𝜃(𝑣 𝑖⃗ )) = ∑_{𝑖=0}^{𝑛−1} 𝑟 𝑖 Φ′ ∘ 𝜃(𝑣 𝑖⃗ ) = ∑_{𝑖=0}^{𝑛−1} 𝑟 𝑖 𝜙(𝑣 𝑖⃗ ) = Φ(𝑡). □
Example B.8.10. We use the tensor product (ℤ, 𝜃) from Example B.8.5 and consider
the map 𝜙 ∶ ℤ3 → ℤ, (𝑥0 , 𝑥1 , 𝑥2 ) ↦ 2𝑥1 . It is multilinear. The uniquely determined
linear map Φ from Proposition B.8.9 satisfies
(B.8.20) Φ(𝑡) = 𝜙(1, 𝑡, 1) = 2𝑡
and this equation completely determines Φ.

B.8.2. Uniqueness. We show that a tensor product of 𝑀0 , . . . , 𝑀𝑚−1 is uniquely


determined up to tensor product isomorphism.
Definition B.8.11. Let (𝑇, 𝜃) and (𝑇 ′ , 𝜃′ ) be tensor products of 𝑀0 , . . . , 𝑀𝑚−1 over 𝑅.
A tensor product isomorphism between (𝑇, 𝜃) and (𝑇 ′ , 𝜃′ ) is an 𝑅-module isomorphism
Θ ∶ 𝑇 ′ → 𝑇 that satisfies 𝜃 = Θ ∘ 𝜃′ .
Theorem B.8.12. Tensor products of 𝑀0 , . . . , 𝑀𝑚−1 over 𝑅 are uniquely determined up
to tensor product isomorphism. Furthermore, for two tensor products of 𝑀0 , . . . , 𝑀𝑚−1
over 𝑅, the isomorphism between them is uniquely determined.

Proof. Let (𝑇, 𝜃) and (𝑇 ′ , 𝜃′ ) be tensor products of 𝑀0 , . . . , 𝑀𝑚−1 . Then it follows from
the universal property of tensor products that there are linear maps Θ ∶ 𝑇 ′ → 𝑇 and
Θ′ ∶ 𝑇 → 𝑇 ′ such that
(B.8.21) 𝜃 ′ = Θ′ ∘ 𝜃 and 𝜃 = Θ ∘ 𝜃′ .
This implies
(B.8.22) 𝜃′ = (Θ′ ∘ Θ) ∘ 𝜃′ and 𝜃 = (Θ ∘ Θ′ ) ∘ 𝜃.
Equation (B.8.22) implies
(B.8.23) Θ′ ∘ Θ|𝜃′ (𝑀) = 𝐼𝜃′ (𝑀) and Θ ∘ Θ′ |𝜃(𝑀) = 𝐼𝜃(𝑀) .

Since Θ and Θ′ are linear transformations and since Span(𝜃(𝑀)) = 𝑇 and Span(𝜃′ (𝑀))
= 𝑇 ′ we obtain from (B.8.23)
(B.8.24) Θ′ ∘ Θ = Θ′ ∘ Θ|Span(𝜃′ (𝑀)) = 𝐼Span(𝜃′ (𝑀)) = 𝐼𝑇 ′
and
(B.8.25) Θ ∘ Θ′ = Θ ∘ Θ′ |Span(𝜃(𝑀)) = 𝐼Span(𝜃(𝑀)) = 𝐼𝑇 .
So Θ is an isomorphism between 𝑇 ′ and 𝑇. The uniqueness of Θ follows from Propo-
sition B.8.9. □

B.8.3. Construction. We construct a tensor product of 𝑀0 , . . . , 𝑀𝑚−1 over 𝑅. For


this, let 𝐿 be the set of all formal linear combinations
(B.8.26) ∑_{𝑖=0}^{𝑘−1} 𝑟 𝑖 (𝑣 ⃗ 𝑖,0 , . . . , 𝑣 ⃗ 𝑖,𝑚−1 )
where 𝑘 ∈ ℕ0 , 𝑟 𝑖 ∈ 𝑅, (𝑣 ⃗ 𝑖,0 , . . . , 𝑣 ⃗ 𝑖,𝑚−1 ) ∈ ∏_{𝑗=0}^{𝑚−1} 𝑀𝑗 for 0 ≤ 𝑖 < 𝑘 such that the tuples
(𝑣 𝑖,0
⃗ , . . . , 𝑣 𝑖,𝑚−1
⃗ ) are pairwise different and also nonzero. For 𝑘 = 0, the sum in (B.8.26)
is the empty linear combination which we denote by 0.⃗
On 𝐿 we define addition in the obvious way. For all 𝑘 ∈ ℕ, 𝑣 0⃗ , . . . , 𝑣 𝑘−1
⃗ ∈ 𝑀, and
𝑟, 𝑟0 , . . . , 𝑟 𝑘−1 , 𝑠0 , . . . , 𝑠𝑘−1 ∈ 𝑅 we set
(B.8.27) ∑_{𝑖=0}^{𝑘−1} 𝑟 𝑖 𝑣 𝑖⃗ + ∑_{𝑖=0}^{𝑘−1} 𝑠𝑖 𝑣 𝑖⃗ = ∑_{𝑖=0}^{𝑘−1} (𝑟 𝑖 + 𝑠𝑖 )𝑣 𝑖⃗

and
(B.8.28) 𝑟 ∑_{𝑖=0}^{𝑘−1} 𝑟 𝑖 𝑣 𝑖⃗ = ∑_{𝑖=0}^{𝑘−1} (𝑟𝑟 𝑖 )𝑣 𝑖⃗ .

From these rules, we also obtain formulas for adding two linear combinations in 𝐿.
For this, we write both as linear combinations of the same elements of 𝑀 by inserting
summands with coefficients zero and changing the order of the terms in the sum if
necessary. As shown in Exercise B.8.13, 𝐿 is an 𝑅-module.
Exercise B.8.13. Verify that 𝐿 is an 𝑅-module.
Example B.8.14. Let 𝑀0 = 𝑀1 = ℤ. Then the module 𝐿 consists of all formal sums
∑_{𝑖=0}^{𝑘−1} 𝑟 𝑖 (𝑣 𝑖 , 𝑤 𝑖 ) where 𝑘 ∈ ℕ0 and 𝑟 𝑖 , 𝑣 𝑖 , 𝑤 𝑖 ∈ ℤ for 0 ≤ 𝑖 < 𝑘 such that the tuples
(𝑣 𝑖 , 𝑤 𝑖 ) are pairwise different and different from (0, 0). For example (3, 2) − 2 ⋅ (1, 2)
and (1, 2) are two different elements of 𝐿.

We note that a sequence of nonzero and pairwise different elements of 𝑀 is by


definition linearly independent in 𝐿.
Let 𝑆 be the submodule of 𝐿 which is generated by all elements of 𝐿 of the form
(B.8.29) (𝑣 0⃗ , . . . , 𝑣 𝑖−1⃗ , 𝑣,⃗ 𝑣 𝑖+1⃗ , . . . , 𝑣 𝑚−1⃗ ) + (𝑣 0⃗ , . . . , 𝑣 𝑖−1⃗ , 𝑤,⃗ 𝑣 𝑖+1⃗ , . . . , 𝑣 𝑚−1⃗ )
− (𝑣 0⃗ , . . . , 𝑣 𝑖−1⃗ , 𝑣 ⃗ + 𝑤,⃗ 𝑣 𝑖+1⃗ , . . . , 𝑣 𝑚−1⃗ )

and
(B.8.30) 𝑟(𝑣 0⃗ , . . . , 𝑣 𝑖−1⃗ , 𝑣,⃗ 𝑣 𝑖+1⃗ , . . . , 𝑣 𝑚−1⃗ ) − (𝑣 0⃗ , . . . , 𝑣 𝑖−1⃗ , 𝑟𝑣,⃗ 𝑣 𝑖+1⃗ , . . . , 𝑣 𝑚−1⃗ )

where 𝑟 ∈ 𝑅, (𝑣 0⃗ , . . . , 𝑣 𝑚−1
⃗ ) ∈ 𝑀, 𝑖 ∈ ℤ𝑚 , 𝑣,⃗ 𝑤⃗ ∈ 𝑀𝑖 . For 𝑣 ⃗ = (𝑣 0⃗ , . . . , 𝑣 𝑚−1
⃗ ) ∈ 𝑀 we
denote the residue class of 𝑣 ⃗ modulo 𝑆 by

(B.8.31) 𝑣 0⃗ ⊗𝑅 ⋯ ⊗𝑅 𝑣 𝑚−1
⃗ .

If the ring 𝑅 is understood, then we also write this residue class as


(B.8.32) 𝑣 0⃗ ⊗ ⋯ ⊗ 𝑣 𝑚−1⃗ = ⨂_{𝑗=0}^{𝑚−1} 𝑣𝑗⃗ .

Also, we write the quotient module 𝐿/𝑆 as

(B.8.33) 𝑀0 ⊗𝑅 ⋯ ⊗𝑅 𝑀𝑚−1 .

If the ring 𝑅 is understood, then we also write it as


(B.8.34) 𝑀0 ⊗ ⋯ ⊗ 𝑀𝑚−1 = ⨂_{𝑗=0}^{𝑚−1} 𝑀𝑗 .

So we have defined the map


(B.8.35) ⨂ ∶ ∏_{𝑗=0}^{𝑚−1} 𝑀𝑗 → ⨂_{𝑗=0}^{𝑚−1} 𝑀𝑗 , (𝑣 0⃗ , . . . , 𝑣 𝑚−1⃗ ) ↦ ⨂_{𝑗=0}^{𝑚−1} 𝑣𝑗⃗ .

It follows from the definition of 𝑆 that the following relations hold for all 𝑟 ∈ 𝑅,
(𝑣 0⃗ , . . . , 𝑣 𝑚−1
⃗ ) ∈ 𝑀, 𝑖 ∈ ℤ𝑚 , and 𝑣,⃗ 𝑤⃗ ∈ 𝑀𝑖 :

(B.8.36) 𝑣 0⃗ ⊗ ⋯ ⊗ 𝑣 𝑖−1⃗ ⊗ 𝑣 ⃗ ⊗ 𝑣 𝑖+1⃗ ⊗ ⋯ ⊗ 𝑣 𝑚−1⃗ + 𝑣 0⃗ ⊗ ⋯ ⊗ 𝑣 𝑖−1⃗ ⊗ 𝑤⃗ ⊗ 𝑣 𝑖+1⃗ ⊗ ⋯ ⊗ 𝑣 𝑚−1⃗
= 𝑣 0⃗ ⊗ ⋯ ⊗ 𝑣 𝑖−1⃗ ⊗ (𝑣 ⃗ + 𝑤)⃗ ⊗ 𝑣 𝑖+1⃗ ⊗ ⋯ ⊗ 𝑣 𝑚−1⃗ ,
and
(B.8.37) 𝑟𝑣 0⃗ ⊗ 𝑣 1⃗ ⊗ ⋯ ⊗ 𝑣 𝑖⃗ ⊗ ⋯ ⊗ 𝑣 𝑚−1⃗ = 𝑣 0⃗ ⊗ ⋯ ⊗ 𝑟𝑣 𝑖⃗ ⊗ ⋯ ⊗ 𝑣 𝑚−1⃗ .

Example B.8.15. As in Example B.8.14, consider the ℤ-modules 𝑀0 = 𝑀1 = ℤ. In


this example, we have presented the two different elements (3, 2) − 2 ⋅ (1, 2) and (1, 2)
of 𝐿. But applying (B.8.36) and (B.8.37) and using the multilinearity of ⊗ we find that
3 ⊗ℤ 2 − 2 ⋅ (1 ⊗ℤ 2) = 3 ⊗ℤ 2 − 2 ⊗ℤ 2 = 1 ⊗ℤ 2. Therefore, the corresponding elements
in 𝑀0 ⊗ℤ 𝑀1 are the same.

Theorem B.8.16. The pair (⨂_{𝑗=0}^{𝑚−1} 𝑀𝑗 , ⨂) is a tensor product of 𝑀0 , . . . , 𝑀𝑚−1 over 𝑅.

Proof. By definition, the map


(B.8.38) ⨂ ∶ ∏_{𝑗=0}^{𝑚−1} 𝑀𝑗 → ⨂_{𝑗=0}^{𝑚−1} 𝑀𝑗 , (𝑣 0⃗ , . . . , 𝑣 𝑚−1⃗ ) ↦ ⨂_{𝑗=0}^{𝑚−1} 𝑣𝑗⃗
is multilinear and its image spans ⨂_{𝑗=0}^{𝑚−1} 𝑀𝑗 .
We prove the universal property by verifying the second condition in Lemma B.8.7.
By the definition of ⨂, an element in ⨂_{𝑗=0}^{𝑚−1} 𝑀𝑗 is zero if and only if it is a linear
combination of elements 𝜃(𝑣)⃗ with 𝑣 ⃗ ∈ 𝑆. Hence, it suffices to show that the second
condition of Lemma B.8.7 holds for the generators of 𝑆 shown in (B.8.29) and (B.8.30).
But this follows from the multilinearity of 𝜙. □

The uniqueness of the tensor product shown in Theorem B.8.12 justifies the fol-
lowing definition.
Definition B.8.17. The pair (⨂_{𝑗=0}^{𝑚−1} 𝑀𝑗 , ⨂) is called the tensor product of 𝑀0 , . . . , 𝑀𝑚−1
over 𝑅. We simply write it as 𝑀0 ⊗𝑅 ⋯ ⊗𝑅 𝑀𝑚−1 or as 𝑀0 ⊗ ⋯ ⊗ 𝑀𝑚−1 = ⨂_{𝑗=0}^{𝑚−1} 𝑀𝑗
if 𝑅 is understood.

We make the following remark. Let 𝑛 ∈ ℕ and let 𝑁0 , . . . , 𝑁𝑛−1 be 𝑅-modules.


Then the map
(B.8.39) (⨂_{𝑖=0}^{𝑚−1} 𝑀𝑖 ) ⊗ (⨂_{𝑖=0}^{𝑛−1} 𝑁𝑖 ) → 𝑀0 ⊗ ⋯ ⊗ 𝑀𝑚−1 ⊗ 𝑁0 ⊗ ⋯ ⊗ 𝑁𝑛−1 ,
(⨂_{𝑖=0}^{𝑚−1} 𝑣 𝑖⃗ ) ⊗ (⨂_{𝑖=0}^{𝑛−1} 𝑤⃗ 𝑖 ) ↦ 𝑣 0⃗ ⊗ ⋯ ⊗ 𝑣 𝑚−1⃗ ⊗ 𝑤⃗ 0 ⊗ ⋯ ⊗ 𝑤⃗ 𝑛−1

induces an isomorphism of tensor products. Using this isomorphism, we identify the


domain and image of this map. For an 𝑅-module 𝑀 and 𝑘 ∈ ℕ, we also write
(B.8.40) 𝑀 ⊗𝑘 = ⨂_{𝑖=0}^{𝑘−1} 𝑀
and for 𝑣 ⃗ ∈ 𝑀
(B.8.41) 𝑣 ⃗ ⊗𝑘 = ⨂_{𝑖=0}^{𝑘−1} 𝑣 ⃗ .

Example B.8.18. We construct the tensor product ℤ⊗3 . Its elements are the linear
combinations of 𝑥0 ⊗ 𝑥1 ⊗ 𝑥2 with integer coefficients where 𝑥𝑖 ∈ ℤ. We claim that
(B.8.42) ℤ⊗3 = ℤ ⋅ 1⊗3 .
To verify (B.8.42) we first note that ℤ ⋅ 1⊗3 ⊂ ℤ⊗3 . To show the reverse inclusion, let
𝑥0 ⊗𝑥1 ⊗𝑥2 ∈ ℤ⊗3 with 𝑥0 , 𝑥1 , 𝑥2 ∈ ℤ. Due to the multilinearity of the tensor product,
we have 𝑥0 ⊗ 𝑥1 ⊗ 𝑥2 = 𝑥 ⋅ 1⊗3 where 𝑥 = 𝑥0 𝑥1 𝑥2 . So 𝑥0 ⊗ 𝑥1 ⊗ 𝑥2 ∈ ℤ ⋅ 1⊗3 .

B.8.4. Homomorphisms. We show that homomorphisms of 𝑅-modules can be


combined in the obvious way to tensor products of homomorphisms. For this, let
𝑁0 , . . . , 𝑁𝑚−1 be further 𝑅-modules and set
(B.8.43) 𝑀 = ⨂_{𝑗=0}^{𝑚−1} 𝑀𝑗 and 𝑁 = ⨂_{𝑗=0}^{𝑚−1} 𝑁𝑗 .

The next proposition shows that with each element of the tensor product
⨂_{𝑗=0}^{𝑚−1} Hom(𝑀𝑗 , 𝑁𝑗 ) we can associate a homomorphism in Hom(𝑀, 𝑁).

Proposition B.8.19. For 0 ≤ 𝑗 < 𝑚 let 𝑓𝑗 ∈ Hom(𝑀𝑗 , 𝑁 𝑗 ). Then the map

(B.8.44) 𝑀 → 𝑁, ⨂_{𝑗=0}^{𝑚−1} 𝑣𝑗⃗ ↦ ⨂_{𝑗=0}^{𝑚−1} 𝑓𝑗 (𝑣𝑗⃗ )

defines a map in Hom(𝑀, 𝑁). We refer to this homomorphism as ⨂_{𝑗=0}^{𝑚−1} 𝑓𝑗 .

Proof. The map


(B.8.45) ∏_{𝑗=0}^{𝑚−1} 𝑀𝑗 → 𝑁, (𝑣 0⃗ , . . . , 𝑣 𝑚−1⃗ ) ↦ ⨂_{𝑗=0}^{𝑚−1} 𝑓𝑗 (𝑣𝑗⃗ )

is multilinear because the maps 𝑓𝑗 are linear. Since (⨂_{𝑗=0}^{𝑚−1} 𝑀𝑗 , ⨂) is a tensor product
of 𝑀0 , . . . , 𝑀𝑚−1 , the assertion follows from Proposition B.8.9. □

We note that, in general, the map that sends the tensor product of elements in
Hom(𝑀𝑗 , 𝑁 𝑗 ) to the corresponding homomorphism in Hom(𝑀, 𝑁) is not injective.
Therefore, several such tensor products may be associated with the same homomor-
phism in Hom(𝑀, 𝑁). But as we will see in Section B.9.3, the map is an 𝑅-module
isomorphism if the modules 𝑀𝑗 and 𝑁 𝑗 are finite-dimensional vector spaces.

Example B.8.20. Let 𝑅 = 𝑀0 = 𝑀1 = 𝑁0 = 𝑁1 = ℤ4 and let 𝑓 ∶ ℤ4 → ℤ4 , 𝑣 ↦ 2𝑣 mod 4. We
determine the homomorphism in End(ℤ4⊗2 ) associated with 𝑓⊗2 . It sends 𝑥 ⊗ 𝑦 ∈ ℤ4⊗2
to (2𝑥 mod 4) ⊗ (2𝑦 mod 4) = 4(𝑥 ⊗ 𝑦) = 0 ⊗ 𝑦 = 0.⃗ So it is the zero map, which can
also be represented as the tensor product of the zero map in End(ℤ4 ) with itself.
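The conclusion of Example B.8.20 can be checked numerically. For this sketch we assume the standard identification ℤ4 ⊗ℤ4 ℤ4 ≅ ℤ4 with 𝑥 ⊗ 𝑦 ↦ 𝑥𝑦 mod 4 (an instance of 𝑅 ⊗𝑅 𝑅 ≅ 𝑅, which is not stated in the text); under it, 𝑓 ⊗ 𝑓 acts as 𝑥 ⊗ 𝑦 ↦ 𝑓(𝑥)𝑓(𝑦) mod 4.

```python
# f: Z_4 -> Z_4, v |-> 2v mod 4, as in Example B.8.20
def f(v):
    return (2 * v) % 4

# f (x) f sends x (x) y to f(x)f(y) mod 4 = 4xy mod 4 = 0 for all x, y,
# so it is the zero map on Z_4 (x) Z_4 under the assumed identification.
assert all((f(x) * f(y)) % 4 == 0 for x in range(4) for y in range(4))
```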

B.9. Tensor products of finite-dimensional vector spaces


Let 𝑚, 𝑘0 , . . . , 𝑘𝑚−1 ∈ ℕ. For 0 ≤ 𝑗 < 𝑚 let 𝑉 𝑗 be an 𝐹-vector space of dimension 𝑘𝑗
and let 𝐵𝑗 = (𝑏⃗ 0,𝑗 , . . . , 𝑏⃗ 𝑘𝑗 −1,𝑗 ) be an 𝐹-basis of 𝑉 𝑗 .

We discuss the properties of the tensor product 𝑉0 ⊗𝐹 ⋯ ⊗𝐹 𝑉𝑚−1 . To simplify our


notation, we write ⊗ for ⊗𝐹 .

B.9.1. Representation. We explain how to represent the tensor product


(B.9.1) 𝑉 = ⨂_{𝑗=0}^{𝑚−1} 𝑉𝑗

as an 𝐹-vector space of multi-dimensional matrices over 𝐹. For this, we set


(B.9.2) 𝑘 ⃗ = (𝑘0 , . . . , 𝑘𝑚−1 ), ℤ𝑘⃗ = ∏_{𝑗=0}^{𝑚−1} ℤ𝑘𝑗 , 𝑘 = ∏_{𝑗=0}^{𝑚−1} 𝑘𝑗 .


Definition B.9.1. (1) By 𝐹 𝑘⃗ we mean the set of all 𝑚-dimensional matrices (𝛼𝑖 ⃗ )𝑖 ⃗∈ℤ𝑘⃗
with entries 𝛼𝑖 ⃗ ∈ 𝐹.
(2) Let 𝑖 ⃗ ∈ ℤ𝑘⃗ . The standard unit matrices in 𝐹 𝑘⃗ are the matrices 𝐸𝑖 ⃗ in 𝐹 𝑘⃗ such that
the entry with index 𝑖 ⃗ is 1 and it is the only nonzero entry of 𝐸𝑖 ⃗ .

Proposition B.9.2. The set 𝐹 𝑘⃗ equipped with componentwise addition and scalar mul-
tiplication is a 𝑘-dimensional 𝐹-vector space and (𝐸𝑖 ⃗ )𝑖 ⃗∈ℤ𝑘⃗ is a basis of 𝐹 𝑘⃗ .
Exercise B.9.3. Prove Proposition B.9.2.

Example B.9.4. Let 𝐹 = 𝔽2 , 𝑚 = 3, 𝑘0 = 𝑘1 = 𝑘2 = 2. Then we have 𝑘 ⃗ = (2, 2, 2),


ℤ𝑘⃗ = ℤ2 × ℤ2 × ℤ2 , 𝑘 = 8. The set 𝐹 𝑘⃗ = 𝔽2^(2,2,2) contains 2^8 three-dimensional matrices with
8 entries each, which can be 0 or 1. For 𝑖 ⃗ = (𝑖0 , 𝑖1 , 𝑖2 ) ∈ ℤ(2,2,2) the standard
unit matrix 𝐸𝑖 ⃗ is the matrix in 𝔽2^(2,2,2) such that the entry with index 𝑖 ⃗ is 1 and all other
entries are 0. These matrices form a basis of the eight-dimensional 𝔽2 -vector space
𝔽2^(2,2,2) .
Let (𝑣 0⃗ , . . . , 𝑣 𝑚−1⃗ ) ∈ ∏_{𝑗=0}^{𝑚−1} 𝑉𝑗 . For all 𝑗 ∈ ℤ𝑚 , write the coefficient vector of 𝑣𝑗⃗
with respect to the basis 𝐵𝑗 of 𝑉 𝑗 as
(B.9.3) (𝑣𝑗⃗ )𝐵𝑗 = (𝑣 0,𝑗 , . . . , 𝑣 𝑘𝑗 −1,𝑗 ).
Also, define the 𝑚-dimensional matrix
(B.9.4) Mat(𝑣 0⃗ , . . . , 𝑣 𝑚−1⃗ ) = ( ∏_{𝑗=0}^{𝑚−1} 𝑣 𝑖𝑗 ,𝑗 )_{(𝑖0 ,. . .,𝑖𝑚−1 )∈ℤ𝑘⃗} .

Note that this matrix depends on the choice of the bases of the vector spaces 𝑉 𝑗 . If we
want to make this dependence explicit, we write Mat𝐵0 ,. . .,𝐵𝑚−1 (𝑣 0⃗ , . . . , 𝑣 𝑚−1
⃗ ).
Example B.9.5. Let 𝐹 = ℚ, 𝑚 = 3, 𝑘0 = 𝑘1 = 𝑘2 = 2. The three-dimensional matrix
Mat((1, 1), (1, −1), (1, 0)) is presented in Table B.9.1.
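The entries of Table B.9.1 can be reproduced with a short Python sketch. The function `Mat` below is our own rendering of (B.9.4): it returns the 𝑚-dimensional matrix as a dictionary indexed by tuples (𝑖0 , . . . , 𝑖𝑚−1 ).

```python
from itertools import product
from math import prod

def Mat(*vectors):
    """The m-dimensional matrix Mat(v_0, ..., v_{m-1}) of (B.9.4), as a dict
    mapping each index tuple (i_0, ..., i_{m-1}) to prod_j v_{i_j, j}."""
    ranges = [range(len(v)) for v in vectors]
    return {idx: prod(v[i] for v, i in zip(vectors, idx))
            for idx in product(*ranges)}

M = Mat((1, 1), (1, -1), (1, 0))
assert M[(0, 0, 0)] == 1 and M[(0, 1, 0)] == -1   # matches Table B.9.1
assert sorted(M) == sorted(product(range(2), repeat=3))
```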

So we have defined a map


(B.9.5) Mat ∶ ∏_{𝑗=0}^{𝑚−1} 𝑉𝑗 → 𝐹 𝑘⃗ , (𝑣 0⃗ , . . . , 𝑣 𝑚−1⃗ ) ↦ Mat(𝑣 0⃗ , . . . , 𝑣 𝑚−1⃗ ).


Proposition B.9.6. The pair (𝐹 𝑘⃗ , Mat) is a tensor product of 𝑉0 , . . . , 𝑉𝑚−1 .

Table B.9.1. Mat((1, 1), (1, −1), (1, 0)).

𝑚−1
index (𝑖0 , 𝑖1 , 𝑖2 ) entry ∏𝑗=0 𝑣 𝑖𝑗 ,𝑗
(0, 0, 0) 𝑣 0,0 𝑣 0,1 𝑣 0,2 = 1
(0, 0, 1) 𝑣 0,0 𝑣 0,1 𝑣 1,2 = 0
(0, 1, 0) 𝑣 0,0 𝑣 1,1 𝑣 0,2 = −1
(0, 1, 1) 𝑣 0,0 𝑣 1,1 𝑣 1,2 = 0
(1, 0, 0) 𝑣 1,0 𝑣 0,1 𝑣 0,2 = 1
(1, 0, 1) 𝑣 1,0 𝑣 0,1 𝑣 1,2 = 0
(1, 1, 0) 𝑣 1,0 𝑣 1,1 𝑣 0,2 = −1
(1, 1, 1) 𝑣 1,0 𝑣 1,1 𝑣 1,2 = 0

Proof. The map Mat is well-defined since the 𝐵𝑖 are bases of the 𝑉 𝑖 . Also, it is easy to
verify that Mat is multilinear. Next, we note that for all 𝑖 ⃗ = (𝑖0 , . . . , 𝑖𝑚−1 ) ∈ ℤ𝑘⃗ we have

(B.9.6) Mat(𝑏⃗ 𝑖0 ,0 , . . . , 𝑏⃗ 𝑖𝑚−1 ,𝑚−1 ) = 𝐸𝑖 ⃗ .


Hence, Proposition B.9.2 implies that Span(Mat(∏_{𝑗=0}^{𝑚−1} 𝑉𝑗 )) = 𝐹 𝑘⃗ .
𝑚−1
To prove the universal property, let 𝑃 be an 𝐹-vector space and let 𝜙 ∶ ∏𝑖=0 𝑉 𝑖 →
𝑃 be multilinear. Define the map

(B.9.7) Φ ∶ 𝐹 𝑘⃗ → 𝑃, 𝐸(𝑖0 ,. . .,𝑖𝑚−1 ) ↦ 𝜙(𝑏⃗ 𝑖0 ,0 , . . . , 𝑏⃗ 𝑖𝑚−1 ,𝑚−1 ).

This map is well-defined since (𝐸𝑖 ⃗ )𝑖 ⃗∈ℤ𝑘⃗ is a basis of 𝐹 𝑘⃗ . It is linear by definition, and as
𝑘⃗
shown in Exercise B.9.7 it follows from the multilinearity of 𝜙 that 𝜙 = Φ ∘ Mat. □
Exercise B.9.7. Show that in the proof of Proposition B.9.6 we have 𝜙 = Φ ∘ Mat.

From Proposition B.9.6 and Theorem B.8.12 we obtain the following corollary.
Corollary B.9.8. The map
(B.9.8) ⨂_{𝑖=0}^{𝑚−1} 𝑉_𝑖 → 𝐹^{𝑘⃗}, ⨂_{𝑖=0}^{𝑚−1} 𝑣⃗_𝑖 ↦ Mat(𝑣⃗_0, . . . , 𝑣⃗_{𝑚−1})
is the uniquely determined isomorphism between the tensor products (⨂_{𝑖=0}^{𝑚−1} 𝑉_𝑖, ⨂) and
(𝐹^{𝑘⃗}, Mat).

Corollary B.9.8 justifies the following definition.


Definition B.9.9. We use the map
(B.9.9) ⨂_{𝑖=0}^{𝑚−1} 𝐹^{𝑘_𝑖} → 𝐹^{𝑘⃗}, 𝑣⃗_0 ⊗ ⋯ ⊗ 𝑣⃗_{𝑚−1} ↦ Mat(𝑣⃗_0, . . . , 𝑣⃗_{𝑚−1})
to identify the tensor product ⨂_{𝑗=0}^{𝑚−1} 𝐹^{𝑘_𝑗} with 𝐹^{𝑘⃗}.

Next, we show that the tensor product of the 𝐵𝑗 is a basis of 𝑉. For this, we need
the following definition and result.
340 B. Linear Algebra

Definition B.9.10. For 0 ≤ 𝑗 < 𝑚 let 𝑑_𝑗 ∈ ℕ and let 𝐷_𝑗 = (𝑣⃗_{0,𝑗}, . . . , 𝑣⃗_{𝑑_𝑗−1,𝑗}) be finite
sequences in 𝑉_𝑗. Define
(B.9.10) 𝐷_0 ⊗ ⋯ ⊗ 𝐷_{𝑚−1} = (⨂_{𝑗=0}^{𝑚−1} 𝑣⃗_{𝑖_𝑗,𝑗})_{𝑖_𝑗∈ℤ_{𝑑_𝑗}, 𝑗∈ℤ_𝑚}.

Proposition B.9.11. In the situation of Definition B.9.10 we have
(1) ⨂_{𝑗=0}^{𝑚−1} Span(𝐷_𝑗) = Span(⨂_{𝑗=0}^{𝑚−1} 𝐷_𝑗).
(2) 𝐷_0, . . . , 𝐷_{𝑚−1} are linearly independent if and only if ⨂_{𝑗=0}^{𝑚−1} 𝐷_𝑗 is linearly independent.
𝑚−1
Proof. The first assertion follows from the fact that ⨂_{𝑗=0}^{𝑚−1} Span(𝐷_𝑗) is the set of linear
combinations of the elements of ⨂_{𝑗=0}^{𝑚−1} 𝐷_𝑗 with coefficients in 𝐹.
We prove the second assertion. Let 𝐷_0, . . . , 𝐷_{𝑚−1} be linearly independent. It follows
from Corollary B.9.8 that ⨂_{𝑗=0}^{𝑚−1} Span(𝐷_𝑗) is an 𝐹-vector space of dimension 𝑑 =
∏_{𝑗=0}^{𝑚−1} 𝑑_𝑗. By the first assertion, ⨂_{𝑗=0}^{𝑚−1} 𝐷_𝑗 is a generating system of this tensor product
with 𝑑 elements. So ⨂_{𝑗=0}^{𝑚−1} 𝐷_𝑗 must be a basis of this tensor product. The converse is
left to the reader as Exercise B.9.12. □

Exercise B.9.12. In the situation of Proposition B.9.11 show that the linear independence of ⨂_{𝑗=0}^{𝑚−1} 𝐷_𝑗 implies the linear independence of 𝐷_𝑗 for 0 ≤ 𝑗 < 𝑚.

B.9.2. Tensor product of matrices. Let 𝑚, 𝑛, 𝑢, 𝑣 ∈ ℕ. We define a map


(B.9.11) 𝜃 ∶ 𝐹 (𝑚,𝑛) × 𝐹 (ᵆ,𝑣) → 𝐹 (𝑚ᵆ,𝑛𝑣)
as follows. Let 𝐴 ∈ 𝐹 (𝑚,𝑛) and 𝐵 ∈ 𝐹 (ᵆ,𝑣) with 𝐴 = (𝑎𝑖,𝑗 ) and 𝐵 = (𝑏𝑘,𝑙 ). Then we set
𝑎 𝐵 𝑎0,1 𝐵 ⋯ 𝑎0,𝑛−1 𝐵
⎛ 0,0 ⎞
𝑎 𝐵 𝑎1,1 𝐵 ⋯ 𝑎1,𝑛−1 𝐵
(B.9.12) 𝜃(𝐴, 𝐵) = ⎜ 1,0 ⎟ ∈ 𝐹 (𝑚ᵆ,𝑛𝑣) .
⎜ ⋮ ⋮ ⎟
⎝𝑎𝑚−1,0 𝐵 𝑎𝑚−1,1 𝐵 ⋯ 𝑎𝑚−1,𝑛−1 𝐵⎠
If we write
(B.9.13) 𝜃(𝐴, 𝐵) = (𝑐_{𝑖,𝑗})_{𝑖∈ℤ_{𝑚𝑢}, 𝑗∈ℤ_{𝑛𝑣}},
then we have
(B.9.14) 𝑐_{𝑝𝑢+𝑞,𝑟𝑣+𝑠} = 𝑎_{𝑝,𝑟} 𝑏_{𝑞,𝑠}
for all 𝑝 ∈ ℤ_𝑚, 𝑞 ∈ ℤ_𝑢, 𝑟 ∈ ℤ_𝑛, 𝑠 ∈ ℤ_𝑣, since the block in block-row 𝑝 and block-column 𝑟 of 𝜃(𝐴, 𝐵) is 𝑎_{𝑝,𝑟}𝐵 and (𝑞, 𝑠) indexes the entry inside this block.
Exercise B.9.13. Verify (B.9.14).
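The index formula can also be checked numerically; the sketch below assumes numpy, whose `np.kron` computes exactly the block matrix 𝜃(𝐴, 𝐵) of (B.9.12), and verifies that the (𝑝𝑢+𝑞, 𝑟𝑣+𝑠) entry of 𝜃(𝐴, 𝐵) is 𝑎_{𝑝,𝑟}𝑏_{𝑞,𝑠}.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, u, v = 2, 3, 2, 2
A = rng.integers(-3, 4, size=(m, n))
B = rng.integers(-3, 4, size=(u, v))

C = np.kron(A, B)  # the block matrix theta(A, B)

# Entry (p*u + q, r*v + s) of theta(A, B) equals a_{p,r} * b_{q,s}.
for p in range(m):
    for q in range(u):
        for r in range(n):
            for s in range(v):
                assert C[p * u + q, r * v + s] == A[p, r] * B[q, s]
```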

We explain the meaning of 𝜃(𝐴, 𝐵). For this, we assume that we have modified
the representation in Definition B.9.9 such that the matrices in 𝐹 (𝑚,𝑛) become vectors
in 𝐹 𝑚𝑛 and matrices in 𝐹 (ᵆ,𝑣) become vectors in 𝐹 ᵆ𝑣 . The details are worked out in
Exercise B.9.15.

Proposition B.9.14. Let 𝑓𝐴 , 𝑓𝐵 be the linear maps in Hom(𝐹 𝑛 , 𝐹 𝑚 ) and Hom(𝐹 𝑣 , 𝐹 ᵆ ),


respectively, that have the representation matrices 𝐴 and 𝐵 with respect to the standard
bases of 𝐹 𝑚 , 𝐹 𝑛 , 𝐹 ᵆ , and 𝐹 𝑣 . Then 𝜃(𝐴, 𝐵) is the representation matrix of 𝑓𝐴 ⊗ 𝑓𝐵 with
respect to the standard bases of 𝐹 𝑚ᵆ and 𝐹 𝑛𝑣 .
Exercise B.9.15. Prove Proposition B.9.14.
Example B.9.16. Let
0 1 1 0
(B.9.15) 𝐴=( ) and 𝐵=( ).
0 0 0 0
Then
0 0 1 0
⎛ ⎞
0⋅𝐵 1⋅𝐵 0 0 0 0
(B.9.16) 𝐴⊗𝐵 =( )=⎜ ⎟.
0⋅𝐵 0⋅𝐵 ⎜0 0 0 0⎟
⎝0 0 0 0⎠
The next proposition allows us to identify 𝐴 ⊗ 𝐵 with the matrix 𝜃(𝐴, 𝐵).
Proposition B.9.17. (1) The pair (𝐹 (𝑚ᵆ,𝑛𝑣) , 𝜃) is a tensor product of 𝐹 (𝑚,𝑛) and 𝐹 (ᵆ,𝑣) .
(2) The uniquely determined isomorphism between the tensor products 𝐹 (𝑚,𝑛) ⊗ 𝐹 (ᵆ,𝑣)
and 𝐹 (𝑚ᵆ,𝑛𝑣) of 𝐹 (𝑚,𝑛) and 𝐹 (ᵆ,𝑣) is
(B.9.17) 𝐹 (𝑚,𝑛) ⊗ 𝐹 (ᵆ,𝑣) → 𝐹 (𝑚ᵆ,𝑛𝑣) , 𝐴 ⊗ 𝐵 ↦ 𝜃(𝐴, 𝐵).

Proof. The map 𝜃 is multilinear by definition. Next, we show that


(B.9.18) Span 𝜃(𝐹 (𝑚,𝑛) × 𝐹 (ᵆ,𝑣) ) = 𝐹 (𝑚ᵆ,𝑛𝑣) .
For 𝑝 ∈ ℤ𝑚 , 𝑞 ∈ ℤ𝑛 let 𝐴𝑝,𝑞 = (𝑎𝑖,𝑗 ) ∈ 𝐹 (𝑚,𝑛) with
1 if 𝑖 = 𝑝, 𝑗 = 𝑞,
(B.9.19) 𝑎𝑖,𝑗 = {
0 otherwise.
Also denote by 𝐵_{𝑟,𝑠} with 𝑟 ∈ ℤ_𝑢, 𝑠 ∈ ℤ_𝑣 the analogous matrix in 𝐹^{(𝑢,𝑣)}. Then it follows from (B.9.14) that
𝜃(𝐴_{𝑝,𝑞}, 𝐵_{𝑟,𝑠}) is the matrix (𝑐_{𝑖,𝑗}) ∈ 𝐹^{(𝑚𝑢,𝑛𝑣)} with
(B.9.20) 𝑐_{𝑖,𝑗} = 1 if 𝑖 = 𝑝𝑢 + 𝑟 and 𝑗 = 𝑞𝑣 + 𝑠, and 𝑐_{𝑖,𝑗} = 0 otherwise.
So the family (𝜃(𝐴_{𝑝,𝑞}, 𝐵_{𝑟,𝑠})) is a basis of 𝐹^{(𝑚𝑢,𝑛𝑣)} which is contained in 𝜃(𝐹^{(𝑚,𝑛)} × 𝐹^{(𝑢,𝑣)}). This implies
(B.9.18). To show the universal property, let 𝑃 be an 𝐹-vector space and let 𝜙 ∶ 𝐹 (𝑚,𝑛) ×
𝐹 (ᵆ,𝑣) → 𝑃 be a multilinear map. Since (𝐴𝑝,𝑞 ) is a basis of 𝐹 (𝑚,𝑛) and (𝐵𝑟,𝑠 ) is a basis of
𝐹 (ᵆ,𝑣) , it follows from the multilinearity of 𝜙 that
(B.9.21) Φ ∶ 𝐹^{(𝑚𝑢,𝑛𝑣)} → 𝑃, 𝜃(𝐴_{𝑝,𝑞}, 𝐵_{𝑟,𝑠}) ↦ 𝜙(𝐴_{𝑝,𝑞}, 𝐵_{𝑟,𝑠})
is a well-defined homomorphism with the property that 𝜙 = Φ ∘ 𝜃. This proves the first
assertion. The second assertion follows from Theorem B.8.12. □

Proposition B.9.17 justifies the following definition.


Definition B.9.18. Let 𝐴 ∈ 𝐹 (𝑚,𝑛) and 𝐵 ∈ 𝐹 (ᵆ,𝑣) . Then we identify 𝐴 ⊗ 𝐵 with the
matrix 𝜃(𝐴, 𝐵) from (B.9.12) and we call this matrix the tensor product of 𝐴 and 𝐵.

B.9.3. Tensor product of homomorphisms. Let 𝑊_0, . . . , 𝑊_{𝑚−1} be further 𝐹-vector
spaces of finite dimensions 𝑙_0, . . . , 𝑙_{𝑚−1} ∈ ℕ. Set 𝑙 = ∏_{𝑗=0}^{𝑚−1} 𝑙_𝑗,
(B.9.22) 𝑉 = ⨂_{𝑖=0}^{𝑚−1} 𝑉_𝑖 and 𝑊 = ⨂_{𝑖=0}^{𝑚−1} 𝑊_𝑖.

In this situation, we can strengthen the assertion in Proposition B.8.19 as follows.

Proposition B.9.19. The map
(B.9.23) ⨂_{𝑗=0}^{𝑚−1} Hom(𝑉_𝑗, 𝑊_𝑗) → Hom(𝑉, 𝑊), ⨂_{𝑗=0}^{𝑚−1} 𝑓_𝑗 ↦ ⨂_{𝑗=0}^{𝑚−1} 𝑓_𝑗
defines an isomorphism of 𝐹-vector spaces. In this definition, ⨂_{𝑗=0}^{𝑚−1} 𝑓_𝑗 means two different things: an element of ⨂_{𝑗=0}^{𝑚−1} Hom(𝑉_𝑗, 𝑊_𝑗) and the corresponding map in Hom(𝑉, 𝑊)
defined in Proposition B.8.19.

Proof. We prove the assertion by induction on 𝑚. For 𝑚 = 1, the assertion clearly
holds. Assume that 𝑚 > 1 and that the assertion holds for 𝑚 − 1. We set
(B.9.24) 𝑃 = ⨂_{𝑗=0}^{𝑚−2} 𝑉_𝑗, 𝑄 = ⨂_{𝑗=0}^{𝑚−2} 𝑊_𝑗, 𝑅 = 𝑉_{𝑚−1}, 𝑆 = 𝑊_{𝑚−1}.
It follows from the induction hypothesis that the map
(B.9.25) ⨂_{𝑗=0}^{𝑚−2} Hom(𝑉_𝑗, 𝑊_𝑗) → Hom(𝑃, 𝑄), ⨂_{𝑗=0}^{𝑚−2} 𝑓_𝑗 ↦ ⨂_{𝑗=0}^{𝑚−2} 𝑓_𝑗
is an isomorphism. Denote the dimensions of 𝑃, 𝑄, 𝑅, 𝑆 by 𝑛, 𝑚, 𝑣, 𝑢, respectively. Then
we can identify Hom(𝑃, 𝑄) with 𝐹^{(𝑚,𝑛)}, Hom(𝑅, 𝑆) with 𝐹^{(𝑢,𝑣)}, and Hom(𝑃 ⊗ 𝑅, 𝑄 ⊗ 𝑆)
with 𝐹^{(𝑚𝑢,𝑛𝑣)}. It follows from Proposition B.9.14 and Proposition B.9.17 that the map
(B.9.26) Hom(𝑃, 𝑄) ⊗ Hom(𝑅, 𝑆) → Hom(𝑃 ⊗ 𝑅, 𝑄 ⊗ 𝑆)
that sends the tensor product 𝑓 ⊗ 𝑔 for 𝑓 ∈ Hom(𝑃, 𝑄) and 𝑔 ∈ Hom(𝑅, 𝑆) to the corresponding homomorphism in Hom(𝑃 ⊗ 𝑅, 𝑄 ⊗ 𝑆) defines an isomorphism. Combining
the two isomorphisms in (B.9.25) and (B.9.26) we obtain the assertion. □

It follows from Proposition B.9.19 that for finite-dimensional vector spaces we


can identify the tensor product of homomorphisms with the corresponding homomor-
phism between the tensor product of the vector spaces.

Example B.9.20. We modify Example B.8.20 and let 𝐹 = 𝑀_0 = 𝑀_1 = 𝑁_0 = 𝑁_1 = ℤ_3 and 𝑓 ∶ ℤ_3 →
ℤ_3, 𝑣 ↦ 2𝑣 mod 3. So, we have replaced the ring ℤ_4 by the field ℤ_3. We determine the
homomorphism in End(ℤ_3^{⊗2}) associated with 𝑓^{⊗2}. It sends 𝑥 ⊗ 𝑦 ∈ ℤ_3^{⊗2} to (2𝑥 mod
3) ⊗ (2𝑦 mod 3) = 4(𝑥 ⊗ 𝑦) = 𝑥 ⊗ 𝑦. So 𝑓^{⊗2} corresponds to the identity on ℤ_3^{⊗2}, and
since id ⊗ id induces the same map, this representation as a tensor product of
endomorphisms of ℤ_3 is not unique.

B.9.4. Partial trace. Our next goal is to introduce the notion of the partial trace.
In the discussion, we use direct products ∏𝑗∈𝐼 𝑀𝑗 and tensor products ⨂𝑗∈𝐼 𝑀𝑗 for
subsets 𝐼 of ℤ𝑚 . In these expressions, the indices are ordered by size: from smallest to
largest.
First, we note that the following holds.
Proposition B.9.21. For 0 ≤ 𝑗 < 𝑚 let 𝑓_𝑗 ∈ End(𝑉_𝑗). Then we have
(B.9.27) tr(⨂_{𝑗=0}^{𝑚−1} 𝑓_𝑗) = ∏_{𝑗=0}^{𝑚−1} tr 𝑓_𝑗.

Exercise B.9.22. Prove Proposition B.9.21. Hint: Use induction on 𝑚 and the formula
(B.9.12) for the tensor product of matrices.
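A quick numerical check of Proposition B.9.21 for 𝑚 = 2, assuming numpy; by Definition B.9.18 the tensor product of matrices is `np.kron`.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4))

# (B.9.27) for m = 2: tr(A (x) B) = (tr A)(tr B).
lhs = np.trace(np.kron(A, B))
rhs = np.trace(A) * np.trace(B)
assert np.isclose(lhs, rhs)
```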

We introduce the partial trace.


Theorem B.9.23. Let 𝐽 ⊂ ℤ_𝑚. Then there is a uniquely determined linear map
(B.9.28) tr_𝐽 ∶ End(⨂_{𝑗∈ℤ_𝑚} 𝑉_𝑗) → End(⨂_{𝑗∈ℤ_𝑚⧵𝐽} 𝑉_𝑗)
that satisfies
(B.9.29) tr_𝐽(⨂_{𝑗∈ℤ_𝑚} 𝑓_𝑗) = (∏_{𝑗∈𝐽} tr 𝑓_𝑗) ⨂_{𝑗∈ℤ_𝑚⧵𝐽} 𝑓_𝑗
for all (𝑓_0, . . . , 𝑓_{𝑚−1}) ∈ ∏_{𝑗=0}^{𝑚−1} End(𝑉_𝑗). It is called the partial trace over the 𝑉_𝑗, 𝑗 ∈ 𝐽.

Proof. Consider the map
(B.9.30) ∏_{𝑗∈ℤ_𝑚} End(𝑉_𝑗) → End(⨂_{𝑗∈ℤ_𝑚⧵𝐽} 𝑉_𝑗), (𝑓_0, . . . , 𝑓_{𝑚−1}) ↦ (∏_{𝑗∈𝐽} tr 𝑓_𝑗) ⨂_{𝑗∈ℤ_𝑚⧵𝐽} 𝑓_𝑗.
It is multilinear. Hence, it follows from Proposition B.8.9 that (B.9.29) defines the
uniquely determined homomorphism (B.9.28). □
Example B.9.24. Let 𝐹 = 𝑀_0 = 𝑀_1 = ℤ_3 and 𝑓 ∶ ℤ_3 → ℤ_3, 𝑣 ↦ 2𝑣 mod 3. The partial
trace of 𝑓^{⊗2} over 𝑀_0 is the map 𝑦 ↦ (tr 𝑓)𝑓(𝑦) = 4𝑦 mod 3 = 𝑦, the identity on ℤ_3.

We show that the partial trace is trace-preserving.


Proposition B.9.25. Let 𝐽 ⊂ ℤ_𝑚 and let 𝑓 ∈ ⨂_{𝑗=0}^{𝑚−1} End(𝑉_𝑗). Then we have
(B.9.31) tr(tr_𝐽(𝑓)) = tr(𝑓).

Proof. We have End(⨂_{𝑗=0}^{𝑚−1} 𝑉_𝑗) = ⨂_{𝑗=0}^{𝑚−1} End(𝑉_𝑗). Therefore, the linearity of the
trace, Proposition B.9.21, and (B.9.29) imply the assertion. □
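For matrices, the partial trace can be computed by reshaping into tensor indices and contracting; the sketch below assumes numpy, and `partial_trace_first` is a helper name of ours, not notation from the book. It checks (B.9.29) and Proposition B.9.25 for 𝑚 = 2 and 𝐽 = {0}.

```python
import numpy as np

def partial_trace_first(M, d0, d1):
    # M acts on F^{d0} (x) F^{d1}; trace out the first tensor factor.
    T = M.reshape(d0, d1, d0, d1)      # indices (i0, i1, j0, j1)
    return np.einsum('aiaj->ij', T)    # contract i0 with j0

rng = np.random.default_rng(2)
f = rng.standard_normal((2, 2))
g = rng.standard_normal((3, 3))

# (B.9.29) with m = 2, J = {0}: tr_J(f (x) g) = (tr f) g.
assert np.allclose(partial_trace_first(np.kron(f, g), 2, 3), np.trace(f) * g)

# Proposition B.9.25: the partial trace preserves the trace.
M = rng.standard_normal((6, 6))
assert np.allclose(np.trace(partial_trace_first(M, 2, 3)), np.trace(M))
```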
Appendix C

Probability Theory

Quantum algorithms are probabilistic by nature. So their analysis requires some prob-
ability theory. This part of the appendix summarizes the concepts and results of proba-
bility theory that are required in the analyses of probabilistic and quantum algorithms
in this book.

C.1. Basics
We begin with some basic definitions.

Definition C.1.1. A set 𝑆 is called countable if it is finite or there is a bijection ℕ0 → 𝑆.


Otherwise, 𝑆 is called uncountable.

Exercise C.1.2. Show that the set ℕ × ℕ is countable.



Definition C.1.3. An infinite sum ∑_{𝑖=0}^∞ 𝑟_𝑖 with 𝑟_𝑖 ∈ ℝ for all 𝑖 ∈ ℕ_0 is called absolutely
convergent if ∑_{𝑖=0}^∞ |𝑟_𝑖| converges.

In the following, we need the Riemann Series Theorem which we state now.

Theorem C.1.4. Consider an infinite sum ∑_{𝑖=0}^∞ 𝑟_𝑖 where 𝑟_𝑖 ∈ ℝ for all 𝑖 ∈ ℕ_0. Then
the following statements are equivalent.
(1) The infinite sum ∑_{𝑖=0}^∞ 𝑟_𝑖 is absolutely convergent.
(2) For all permutations 𝜋 ∶ ℕ_0 → ℕ_0, the infinite sums ∑_{𝑖=0}^∞ 𝑟_{𝜋(𝑖)} are convergent and
have the same limit.
If the two statements hold, then we write ∑_{𝑟∈𝑅} 𝑟 for the limit of the infinite sum ∑_{𝑖=0}^∞ 𝑟_𝑖
where 𝑅 represents any ordering of the sequence (𝑟_𝑖). If the elements of this sequence are
pairwise distinct, then 𝑅 is the set of these elements.

The proof of this theorem can be found in [Rud76, 3.55 Theorem].


Exercise C.1.5. Consider the following infinite series:
𝑆 = ∑_{𝑛=1}^∞ (−1)^{𝑛+1}/𝑛².
(1) Show that the series 𝑆 is absolutely convergent.
(2) Calculate the sum of the series 𝑆 using the original order of its terms.
(3) Now, consider a new series 𝑆′ obtained by rearranging the terms of 𝑆 in an arbitrary way. Use Theorem C.1.4 to prove that 𝑆′ converges and that its sum is equal to the sum calculated in step (2).
Definition C.1.6. (1) A discrete probability space is a pair (𝑆, Pr), where 𝑆 is a count-
able set, called a sample space. Its elements are called samples or elementary
events. Also, Pr is a map
(C.1.1) Pr ∶ 𝑆 → [0, 1]
called a probability distribution, that satisfies
(C.1.2) ∑_{𝑠∈𝑆} Pr(𝑠) = 1.

We say that the probability distribution assigns the probability Pr(𝑠) to each ele-
mentary event 𝑠 ∈ 𝑆. The probability space is called finite if the sample space is
finite. Otherwise, it is called infinite.
(2) The subsets of 𝑆 are called events. The probability of an event 𝐴 ⊂ 𝑆 is
(C.1.3) Pr(𝐴) = ∑_{𝑎∈𝐴} Pr(𝑎).

Note that by Theorem C.1.4, the condition (C.1.2) means that this sum converges
to 1 for any ordering of the elements of 𝑆.
Example C.1.7. Consider the experiment of tossing a fair coin. The corresponding
discrete probability space is ({0, 1}, Pr) where 0 and 1 represent tails and heads, respectively, and Pr sends both 0 and 1 to 1/2.
Example C.1.8. Consider the experiment of throwing a dice. The corresponding discrete probability space is ({1, . . . , 6}, Pr) where Pr sends all elements of {1, . . . , 6} to 1/6.
Exercise C.1.9. Consider a fair coin, where the probability of getting heads is 1/2 and
the probability of getting tails is 1/2. What is the probability of getting heads at least
once when tossing the coin two times? Describe the corresponding probability space
and event and use this to find the solution of the exercise.
Example C.1.10. Consider the experiment in which a dice is rolled until it shows 6.
The sample space is the set of all finite sequences of length ≥ 1 where the last entry is
6 and all other entries are between 1 and 5. The probability distribution is
(C.1.4) Pr ∶ 𝑆 → [0, 1], 𝑠 ↦ 5^{|𝑠|−1}/6^{|𝑠|}.

This is a probability distribution because
(C.1.5) ∑_{𝑠∈𝑆} Pr(𝑠) = ∑_{𝑖=1}^∞ 5^{𝑖−1}/6^𝑖 = (1/6) ∑_{𝑖=1}^∞ (5/6)^{𝑖−1} = 1.

Example C.1.11. We present another way to model the experiment of Example C.1.10.
The sample space is ℕ. The sample or elementary event 𝑠 ∈ ℕ means that the experi-
ment is successful after rolling the dice 𝑠 times. The probability distribution is
(C.1.6) Pr ∶ ℕ → [0, 1], 𝑠 ↦ 5^{𝑠−1}/6^𝑠.
This is a probability distribution due to (C.1.5).

Exercise C.1.12. Consider the experiment in which a dice is rolled until an odd num-
ber occurs for the first time. Determine the corresponding discrete probability space
as in Example C.1.11.

Definition C.1.13. A random variable on a discrete probability space (𝑆, Pr) is a func-
tion
𝑋 ∶ 𝑆 → ℝ.
The expected value or expectation of 𝑋 is
(C.1.7) 𝐸[𝑋] = ∑_{𝑠∈𝑆} Pr(𝑠)𝑋(𝑠)
if this sum is absolutely convergent.

Example C.1.14. Use the notation of Example C.1.10 and define the random variable
(C.1.8) 𝑋 ∶ 𝑆 → ℝ, 𝑠 ↦ |𝑠|.
The expected value of this random variable is
(C.1.9) 𝐸[𝑋] = ∑_{𝑛=1}^∞ 𝑛 (5/6)^{𝑛−1} (1/6) = (1/6) ⋅ 1/(1 − 5/6)² = 6.
This means that the expected number of times one needs to roll a dice until it shows a
6 is 6.
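The value 𝐸[𝑋] = 6 can be approximated by truncating the series in (C.1.9); a minimal sketch in plain Python, with the truncation bound chosen so the remaining tail is negligible:

```python
# Truncated series for (C.1.9): E[X] = sum_{n>=1} n * (5/6)^(n-1) * (1/6).
E = sum(n * (5 / 6) ** (n - 1) * (1 / 6) for n in range(1, 500))
print(E)  # very close to 6
```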

Exercise C.1.15. Calculate the expected number of rolls needed to achieve success in
the experiment described in Exercise C.1.12.

Next, we show that the expectation of random variables has linearity properties.

Proposition C.1.16. Let (𝑆, Pr) be a discrete probability space and let 𝑋 and 𝑌 be ran-
dom variables on it such that the expectations 𝐸[𝑋] and 𝐸[𝑌 ] are defined. Then the fol-
lowing hold.
(1) E[𝑋 + 𝑌 ] = E[𝑋] + E[𝑌 ].
(2) E[𝑟𝑋] = 𝑟E[𝑋] for all 𝑟 ∈ ℝ.

Proof. The assertion follows from [Rud76, 3.47 Theorem]. □

We also require Markov’s inequality which we state now.


Proposition C.1.17. Let (𝑆, Pr) be a discrete probability space, and let 𝑋 ∶ 𝑆 → ℝ≥0
be a random variable on it such that 𝐸[𝑋] is defined. Let 𝑐 ∈ ℝ>0 and define the event
𝑋 ≥ 𝑐E[𝑋] to be the set of all elementary events 𝑠 ∈ 𝑆 such that 𝑋(𝑠) ≥ 𝑐E[𝑋]. Then
(C.1.10) Pr(𝑋 ≥ 𝑐E[𝑋]) ≤ 1/𝑐.

Proof. Let 𝑌 be the random variable satisfying 𝑌 (𝑠) = 0 if 0 ≤ 𝑋(𝑠) < 𝑐E[𝑋] and
𝑌 (𝑠) = 𝑐E[𝑋] if 𝑋(𝑠) ≥ 𝑐E[𝑋] for all 𝑠 ∈ 𝑆. Then we have
(C.1.11) E[𝑋] ≥ E[𝑌 ] = 𝑐E[𝑋]Pr(𝑋 ≥ 𝑐E[𝑋]).
This implies the assertion. □
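On a finite probability space, Markov's inequality can be checked exhaustively; a sketch assuming numpy, with a randomly chosen distribution and nonnegative random variable:

```python
import numpy as np

rng = np.random.default_rng(3)
pr = rng.random(10)
pr /= pr.sum()                 # a probability distribution on 10 samples
X = 5.0 * rng.random(10)       # a nonnegative random variable
EX = float(pr @ X)             # E[X]

# (C.1.10): Pr(X >= c E[X]) <= 1/c for every c > 0.
for c in (1.5, 2.0, 4.0):
    tail = pr[X >= c * EX].sum()
    assert tail <= 1 / c + 1e-12
```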

C.2. Bernoulli experiments


In this section, we discuss Bernoulli experiments that generalize Example C.1.11. Let
(𝑆, Pr) be a discrete probability space and let success and failure be two complementary events in 𝑆. Let 𝑝 be the probability of success. Then the probability of
failure is 1 − 𝑝.
The corresponding Bernoulli experiment consists of repeating the above experi-
ment until the event success happens for the first time. To model it, we define the
discrete probability space (ℕ, Pr∗ ) as follows. An elementary event 𝑖 ∈ ℕ means that
success occurs for the first time after 𝑖 trials. So the probability distribution Pr∗ is de-
fined by
(C.2.1) Pr∗ (𝑖) = (1 − 𝑝)𝑖−1 𝑝, 𝑖 ∈ ℕ.

Proposition C.2.1. The pair (ℕ, Pr∗) is a discrete probability space.
Exercise C.2.2. Prove Proposition C.2.1.
Example C.2.3. Consider the experiment of rolling a dice. So, we have 𝑆 = {1, 2, . . . , 6}
and Pr(𝑠) = 1/6 for all 𝑠 ∈ 𝑆. We define a Bernoulli experiment by defining the outcome
of 6 as a success and an outcome different from 6 as a failure. Therefore, we have
(C.2.2) success = {6}, failure = {1, 2, 3, 4, 5}, 𝑝 = 1/6.
The corresponding probability space (ℕ, Pr∗) was presented in Example C.1.11.

We are interested in the expected number of repetitions in the Bernoulli experi-


ment required to be successful for the first time. So we consider the random variable
(C.2.3) 𝑋 ∶ ℕ → ℕ, 𝑖 ↦ 𝑖.
Its value is the number of trials required to be successful. Its expectation is now deter-
mined.

Proposition C.2.4. The expected number of trials in the Bernoulli experiment is 1/𝑝.

Proof. We have
(C.2.4) ∑_{𝑖∈ℕ} 𝑖 Pr∗(𝑖) = 𝑝 ∑_{𝑖=1}^∞ 𝑖(1 − 𝑝)^{𝑖−1} = 𝑝/(1 − (1 − 𝑝))² = 1/𝑝. □
(1 − (1 − 𝑝))2 𝑝

Example C.2.5. The expected number of rolls to obtain a 6 on a dice is 6. The expected
number of coin tosses to obtain heads is 2.
Exercise C.2.6. Determine the expected number of rolls to obtain a number > 3 on a
dice.
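Proposition C.2.4 can also be illustrated by simulation; a sketch in plain Python estimating the expected number of rolls until a 6 appears (𝑝 = 1/6, so the expectation is 6):

```python
import random

random.seed(0)

def trials_until_six():
    # One run of the Bernoulli experiment: roll until the dice shows 6.
    n = 0
    while True:
        n += 1
        if random.randint(1, 6) == 6:
            return n

N = 200_000
mean = sum(trials_until_six() for _ in range(N)) / N
print(mean)  # close to 1/p = 6
```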
Appendix D

Solutions of
Selected Exercises

Solution of Exercise 1.1.8. Set 𝑛 = ⌊log2 𝑎⌋ + 1. Then we have 2𝑛−1 ≤ 𝑎 < 2𝑛 .


We prove the assertion by induction on 𝑛. If 𝑛 = 1, we have 𝑎 = 1 and that proves
the assertion. Suppose that 𝑛 > 1 and that the assertion holds for 𝑛 − 1. Set 𝑎′ =
𝑎 − 2^{𝑛−1}. Then we have 0 ≤ 𝑎′ < 2^{𝑛−1}. By the induction hypothesis, we can write
𝑎′ = ∑_{𝑖=0}^{𝑚−1} 𝑏′_𝑖 2^{𝑚−𝑖−1} where 𝑚 < 𝑛 and 𝑏′_𝑖 ∈ {0, 1} for 0 ≤ 𝑖 < 𝑚. If we set 𝑏_0 = 1,
𝑏_1 = ⋯ = 𝑏_{𝑛−𝑚−1} = 0, and 𝑏_{𝑛−𝑚} = 𝑏′_0, . . . , 𝑏_{𝑛−1} = 𝑏′_{𝑚−1}, then 𝑎 = ∑_{𝑖=0}^{𝑛−1} 𝑏_𝑖 2^{𝑛−𝑖−1}.
Also, two such representations of 𝑎 give two representations of 𝑎′ which proves the
uniqueness. □
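The induction in this solution translates directly into a procedure for computing the binary expansion; a short sketch in Python (the function name is ours, not the book's):

```python
def binary_expansion(a):
    # Bits (b_0, ..., b_{n-1}) with a = sum_i b_i * 2^(n-i-1) and b_0 = 1.
    assert a >= 1
    n = a.bit_length()                  # n = floor(log2 a) + 1
    return [(a >> (n - i - 1)) & 1 for i in range(n)]

bits = binary_expansion(13)
print(bits)  # [1, 1, 0, 1]
assert sum(b << (len(bits) - i - 1) for i, b in enumerate(bits)) == 13
```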

Solution of Exercise 1.1.14. We have 0 ⊕ 0 = 0 = 0 + 0 mod 2, 0 ⊕ 1 = 1 = 0 +
1 mod 2, 1 ⊕ 0 = 1 = 1 + 0 mod 2, and 1 ⊕ 1 = 0 = 1 + 1 mod 2. □

Solution of Exercise 1.1.20. Let 𝑎 = 𝑏𝑐 with two proper divisors 𝑏, 𝑐 of 𝑎 such that
1 < 𝑏 ≤ |𝑐|. Then we have 𝑏2 ≤ |𝑏𝑐| = |𝑎|. This implies 1 < 𝑏 ≤ √|𝑎|. □

Solution of Exercise 1.1.30. By assumption, we have 𝑟1 < 𝑟0 . Also, we have 0 <


𝑟 𝑖+2 < 𝑟 𝑖+1 for all 𝑖 ∈ ℤ𝑘 since 𝑟 𝑖+2 is the remainder of the division of 𝑟 𝑖 by 𝑟 𝑖+1 .
Hence, the sequence (𝑟 𝑖 )𝑖∈ℤ𝑘+2 is strictly decreasing. Let 𝑖 ∈ ℤ𝑘 . Then we have 𝑟 𝑖+2 <
𝑟 𝑖+1 < 𝑟 𝑖 . If 𝑟 𝑖+1 ≤ 𝑟 𝑖 /2, then 𝑟 𝑖+2 < 𝑟 𝑖 /2. Assume that

(D.1) 𝑟 𝑖+1 > 𝑟 𝑖 /2.

Now we have 𝑟 𝑖 = 𝑞𝑟 𝑖+1 + 𝑟 𝑖+2 with 𝑞 ∈ ℕ0 which implies 𝑟 𝑖+2 = 𝑟 𝑖 − 𝑞𝑟 𝑖+1 . So 0 ≤


𝑟 𝑖+2 < 𝑟 𝑖+1 and (D.1) imply that 𝑞 = 1 and 𝑟 𝑖+2 < 𝑟 𝑖 /2. Next, we see that 𝑟 𝑖+2 < 𝑟 𝑖 /2
implies 𝑟2𝑙 < 𝑟0 /2𝑙 for all 𝑙 ∈ ℕ such that 2𝑙 ≤ 𝑘 + 1. This implies 𝑘 = O(log 𝑟0 ) =
O(size(𝑟0 )). □
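The halving property 𝑟_{𝑖+2} < 𝑟_𝑖/2 proved above can be observed on the remainder sequence of the Euclidean algorithm; a sketch in Python using consecutive Fibonacci numbers, the classical worst case:

```python
def remainders(r0, r1):
    # Remainder sequence of the Euclidean algorithm for r0 > r1 > 0.
    rs = [r0, r1]
    while rs[-1] != 0:
        rs.append(rs[-2] % rs[-1])
    return rs

rs = remainders(987, 610)
for i in range(len(rs) - 2):
    assert rs[i + 2] < rs[i] / 2   # every second remainder at least halves
print(len(rs))
```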


Solution of Exercise 1.2.7. Since 𝑎 is composite, we can write 𝑎 = 𝑏𝑐 with 𝑏, 𝑐 ∈ ℕ
and 1 < 𝑏 ≤ √𝑎. Now we have 𝑎 < 2^{bitLength 𝑎}. Hence, 𝑏 ≤ √𝑎 < 2^{(bitLength 𝑎)/2}. This
shows that bitLength(𝑏) ≤ ⌈(bitLength 𝑎)/2⌉ = m(𝑎). Since the binary expansion of 𝑎
can be computed in polynomial time, the same is true for m(𝑎). □

Solution of Exercise 1.3.4. The set FRand(𝐴, 𝑎) ∪ {∞} is countable and by Lemma
1.3.2 Pr𝐴,𝑎 is a probability distribution on the sample space. If Pr𝐴,𝑎 (∞) = 0, then Pr
is a probability distribution on the sample space FRand(𝐴, 𝑎). □

Solution of Exercise 1.3.18. Write 𝑝 = 𝑝_𝐴(𝑎) and 𝑞 = 𝑞_𝐴(𝑎) and denote by 𝑞_𝐴(𝑎, 𝑘)
the failure probability of 𝗋𝖾𝗉𝖾𝖺𝗍𝐴 (𝑎, 𝑘). If 𝑘 ≥ | log 𝜀|/𝑝, then it follows from (1.3.17)
and from 0 < 𝜀 ≤ 1 that

(D.2) 𝑞𝐴 (𝑎, 𝑘) ≤ 𝑒−𝑘𝑝 ≤ 𝑒−| log 𝜀| = 𝑒log 𝜀 = 𝜀.

Also, if 𝑞𝐴 (𝑎, 𝑘) ≤ 𝜀, then (1.3.17) and 0 < 𝜀 ≤ 1 imply

(D.3) 𝜀 ≥ 𝑞𝐴 (𝑎, 𝑘) ≥ 𝑒−𝑘𝑝/𝑞 .

This implies

(D.4) log 𝜀 ≥ −𝑘𝑝/𝑞

and thus

(D.5) 𝑘 ≥ | log 𝜀|𝑞/𝑝

as asserted. □

Solution of Exercise 1.4.23. The language is 𝐿 = {(𝑎, 𝑥) ∶ 𝑎 ∈ 𝐼, 𝑥 ∈ ℝ_{>0}, 𝑎 has a
solution 𝑏 with size(𝑏) ≤ 𝑥}. □

Solution of Exercise 2.1.4. We have
(D.6) |0⟩ = (|𝑥+⟩ + |𝑥−⟩)/√2, |1⟩ = (|𝑥+⟩ − |𝑥−⟩)/√2.
This proves the assertion. □

Solution of Exercise 2.2.10. Let 𝑢⃗ = (𝑢_0, . . . , 𝑢_{𝑘−1}), 𝑣⃗ = (𝑣_0, . . . , 𝑣_{𝑘−1}), 𝑤⃗ =
(𝑤_0, . . . , 𝑤_{𝑘−1}) ∈ ℂ^𝑘 and let 𝛼 ∈ ℂ. Denote by ⟨⋅|⋅⟩ the function defined in (2.2.9) and
by 𝑧̄ the complex conjugate of 𝑧 ∈ ℂ. Then we have
⟨𝑢⃗|𝑣⃗ + 𝑤⃗⟩ = ∑_{𝑖=0}^{𝑘−1} 𝑢̄_𝑖(𝑣_𝑖 + 𝑤_𝑖) = ∑_{𝑖=0}^{𝑘−1} 𝑢̄_𝑖 𝑣_𝑖 + ∑_{𝑖=0}^{𝑘−1} 𝑢̄_𝑖 𝑤_𝑖 = ⟨𝑢⃗|𝑣⃗⟩ + ⟨𝑢⃗|𝑤⃗⟩.
We also have
⟨𝑣⃗|𝛼𝑤⃗⟩ = ∑_{𝑖=0}^{𝑘−1} 𝑣̄_𝑖(𝛼𝑤_𝑖) = 𝛼 ∑_{𝑖=0}^{𝑘−1} 𝑣̄_𝑖 𝑤_𝑖 = 𝛼⟨𝑣⃗|𝑤⃗⟩.
This proves the linearity in the second argument. Next, we prove the conjugate symmetry: the complex conjugate of ⟨𝑣⃗|𝑤⃗⟩ = ∑_{𝑖=0}^{𝑘−1} 𝑣̄_𝑖 𝑤_𝑖 is ∑_{𝑖=0}^{𝑘−1} 𝑣_𝑖 𝑤̄_𝑖 = ⟨𝑤⃗|𝑣⃗⟩.
Finally, we have
⟨𝑣⃗|𝑣⃗⟩ = ∑_{𝑖=0}^{𝑘−1} 𝑣̄_𝑖 𝑣_𝑖 = ∑_{𝑖=0}^{𝑘−1} |𝑣_𝑖|².
This implies the positive definiteness and concludes the proof of Theorem 2.2.9. □
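The computations above can be mirrored numerically; a sketch assuming numpy, whose `np.vdot` conjugates its first argument and therefore implements the inner product of (2.2.9):

```python
import numpy as np

rng = np.random.default_rng(4)
u = rng.standard_normal(3) + 1j * rng.standard_normal(3)
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)
alpha = 2.0 - 1.5j

assert np.isclose(np.vdot(u, v + w), np.vdot(u, v) + np.vdot(u, w))  # additivity
assert np.isclose(np.vdot(v, alpha * w), alpha * np.vdot(v, w))      # linear in 2nd argument
assert np.isclose(np.vdot(w, v), np.conj(np.vdot(v, w)))             # conjugate symmetry
assert np.vdot(v, v).real >= 0                                       # positivity
```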

Solution of Exercise 2.2.23. We prove that the map (2.2.20) is a norm on ℂ. We first
prove the triangle inequality. Let 𝛼, 𝛽 ∈ ℂ. By the Cauchy–Schwarz inequality in ℝ² we
have ℜ𝛼ℜ𝛽 + ℑ𝛼ℑ𝛽 ≤ |𝛼||𝛽| and obtain
(D.7) |𝛼 + 𝛽|² = (ℜ𝛼 + ℜ𝛽)² + (ℑ𝛼 + ℑ𝛽)² = |𝛼|² + |𝛽|² + 2(ℜ𝛼ℜ𝛽 + ℑ𝛼ℑ𝛽) ≤ |𝛼|² + |𝛽|² + 2|𝛼||𝛽| = (|𝛼| + |𝛽|)².
Taking square roots gives |𝛼 + 𝛽| ≤ |𝛼| + |𝛽|. The absolute homogeneity is seen as follows:
|𝛼𝛽|² = |(ℜ𝛼 + 𝑖ℑ𝛼)(ℜ𝛽 + 𝑖ℑ𝛽)|²
= (ℜ𝛼ℜ𝛽 − ℑ𝛼ℑ𝛽)² + (ℜ𝛼ℑ𝛽 + ℑ𝛼ℜ𝛽)²
(D.8) = (ℜ𝛼ℜ𝛽)² + (ℑ𝛼ℑ𝛽)² + (ℜ𝛼ℑ𝛽)² + (ℑ𝛼ℜ𝛽)²
= ((ℜ𝛼)² + (ℑ𝛼)²)((ℜ𝛽)² + (ℑ𝛽)²)
= |𝛼|²|𝛽|².
Finally, the positive definiteness follows directly from (2.2.20). □

Solution of Exercise 2.3.3. The matrix representation of 𝑌 with respect to 𝐵 is


0 −𝑖
(D.9) Mat𝐵 (𝑌 ) = ( ).
𝑖 0
To determine the matrix representations of 𝑌 with respect to 𝐶 we note that
(D.10) 𝑌 |𝑥+⟩ = (𝑌 |0⟩ + 𝑌 |1⟩)/√2 = (𝑖 |1⟩ − 𝑖 |0⟩)/√2 = −𝑖 |𝑥−⟩
and
(D.11) 𝑌 |𝑥−⟩ = (𝑌 |0⟩ − 𝑌 |1⟩)/√2 = (𝑖 |1⟩ + 𝑖 |0⟩)/√2 = 𝑖 |𝑥+⟩ .
Hence, we have
0 𝑖
(D.12) Mat𝐶 (𝑌 ) = ( )
−𝑖 0
which is equal to −Mat𝐵 (𝑌 ). Finally, to find Mat𝐶 (𝑍) we note that
(D.13) 𝑍 |𝑥+⟩ = (𝑍 |0⟩ + 𝑍 |1⟩)/√2 = (|0⟩ − |1⟩)/√2 = |𝑥−⟩
and
(D.14) 𝑍 |𝑥−⟩ = (𝑍 |0⟩ − 𝑍 |1⟩)/√2 = (|0⟩ + |1⟩)/√2 = |𝑥+⟩ .

Hence, we have

0 1
(D.15) Mat𝐶 (𝑍) = ( ).
1 0

This matrix is equal to Mat𝐵 (𝑋). □
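The two matrix representations can be confirmed by a change of basis; a sketch assuming numpy, where the columns of `P` are the coordinates of |x+⟩ and |x−⟩ in the basis 𝐵 = (|0⟩, |1⟩):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]])

P = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # basis change from B to C

MatC_Y = P.conj().T @ Y @ P
MatC_Z = P.conj().T @ Z @ P

assert np.allclose(MatC_Y, -Y)   # Mat_C(Y) = -Mat_B(Y)
assert np.allclose(MatC_Z, X)    # Mat_C(Z) = Mat_B(X)
```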

Solution of Exercise 2.3.10. The identity (2.3.25) follows from the fact that transposition and conjugation of matrices are involutions. Next, we have (𝐴 + 𝐵)ᵀ = 𝐴ᵀ + 𝐵ᵀ
and the conjugate of 𝐴 + 𝐵 is 𝐴̄ + 𝐵̄, which implies (2.3.26). Also, (𝛼𝐴)ᵀ = 𝛼𝐴ᵀ and the
conjugate of 𝛼𝐴 is 𝛼̄𝐴̄, which imply (2.3.27).
Next, we prove (2.3.28). The rank 𝑟 of 𝐴 is the number of linearly independent
column vectors of 𝐴. The conjugates of these column vectors are the row vectors of 𝐴∗.
Since conjugation does not change linear dependence and independence, the number
of linearly independent row vectors of 𝐴∗ is also 𝑟. So, Proposition B.7.3 implies that 𝐴
and 𝐴∗ have the same rank.
Finally, equation (2.3.29) follows from the observations that the conjugate of 𝐴𝐵 is 𝐴̄𝐵̄ and (𝐴𝐵)ᵀ = 𝐵ᵀ𝐴ᵀ. □

Solution of Exercise 2.4.5. Assume that all eigenvalues of 𝐴 ∈ ℂ(𝑘,𝑘) have algebraic
multiplicity 1. It then follows from the definition of an eigenvalue and from Corollary
B.7.26 that all eigenvalues also have geometric multiplicity 1. Since by Proposition
2.4.1 the characteristic polynomial 𝑝𝐴 (𝑥) is a product of linear factors, Theorem B.7.28
implies the assertion. □

Solution of Exercise 2.4.14. Let 𝐴 = (𝑎𝑖,𝑗 ) ∈ ℂ(𝑘,𝑘) . The diagonal elements of 𝐴 are
𝑎𝑖,𝑖 and the diagonal elements of 𝐴∗ are 𝑎𝑖,𝑖 . Since 𝐴 is Hermitian, we have 𝐴 = 𝐴∗
and therefore 𝑎𝑖,𝑖 = 𝑎𝑖,𝑖 for all 𝑖 ∈ ℤ𝑘 . This proves the first assertion. The second
assertion follows from Proposition 2.3.14. The remaining assertions can be deduced
from Proposition 2.3.9 and the Hermitian property. □

Solution of Exercise 2.4.42. Let 𝑃 be a projection. Then by Proposition 2.4.39 also 𝑃 ∗


is a projection. If 𝑃 is Hermitian, then we have

⟨𝑃𝑣⃗, 𝑣⃗ − 𝑃𝑣⃗⟩
= ⟨𝑃𝑣⃗, 𝑣⃗⟩ − ⟨𝑃𝑣⃗, 𝑃𝑣⃗⟩    linearity of the inner product,
= ⟨𝑃𝑣⃗, 𝑣⃗⟩ − ⟨𝑃∗𝑃𝑣⃗, 𝑣⃗⟩    property of the adjoint,
= ⟨𝑃𝑣⃗, 𝑣⃗⟩ − ⟨𝑃²𝑣⃗, 𝑣⃗⟩    𝑃 is Hermitian,
= ⟨𝑃𝑣⃗, 𝑣⃗⟩ − ⟨𝑃𝑣⃗, 𝑣⃗⟩ = 0    𝑃 is a projection.

So 𝑃 is an orthogonal projection. Conversely, let 𝑃 be orthogonal and let 𝑣 ⃗ ∈ ℂ𝑘 or


𝑣 ⃗ ∈ 𝑉. Then by Proposition 2.4.39 𝑃∗ is also an orthogonal projection and we have
⟨(𝑃 − 𝑃∗)𝑣⃗, (𝑃 − 𝑃∗)𝑣⃗⟩
= ⟨𝑃𝑣⃗, 𝑃𝑣⃗⟩ − ⟨𝑃𝑣⃗, 𝑃∗𝑣⃗⟩ − ⟨𝑃∗𝑣⃗, 𝑃𝑣⃗⟩ + ⟨𝑃∗𝑣⃗, 𝑃∗𝑣⃗⟩    linearity of inner product,
= ⟨𝑃𝑣⃗, 𝑃𝑣⃗⟩ − ⟨𝑃²𝑣⃗, 𝑣⃗⟩ − ⟨(𝑃∗)²𝑣⃗, 𝑣⃗⟩ + ⟨𝑃∗𝑣⃗, 𝑃∗𝑣⃗⟩    property of the adjoint,
= ⟨𝑃𝑣⃗, 𝑃𝑣⃗⟩ − ⟨𝑃𝑣⃗, 𝑣⃗⟩ − ⟨𝑃∗𝑣⃗, 𝑣⃗⟩ + ⟨𝑃∗𝑣⃗, 𝑃∗𝑣⃗⟩    𝑃 and 𝑃∗ are projections,
= ⟨𝑃𝑣⃗, 𝑃𝑣⃗ − 𝑣⃗⟩ + ⟨𝑃∗𝑣⃗, 𝑃∗𝑣⃗ − 𝑣⃗⟩    linearity of inner product,
= 0    𝑃 and 𝑃∗ are orthogonal.
Hence, 𝑃 = 𝑃 ∗ which means that 𝑃 is Hermitian. □

Solution of Exercise 2.4.50. If 𝐴 is Hermitian, then 𝐴 = 𝐴∗ and thus 𝐴∗ 𝐴 = 𝐴𝐴∗ . If


𝐴 is unitary, then 𝐴∗ 𝐴 = 𝐼𝑘 = 𝐴𝐴∗ . □

Solution of Exercise 3.1.10. Assume that 𝑥, 𝑦 ≥ 0. Since 𝑥2 + 𝑦2 = 1, we have 0 ≤


𝑥, 𝑦 ≤ 1. Set 𝛾 = arcsin 𝑦. Then 𝛾 is the uniquely determined number in [0, 𝜋/2] such
that sin 𝛾 = 𝑦. Next, we have 𝑥² = 1 − 𝑦² = 1 − sin²𝛾 = cos²𝛾. Since 𝛾 ∈ [0, 𝜋/2] we
have cos 𝛾 ≥ 0. Since 𝑥 ≥ 0 this implies that 𝑥 = cos 𝛾. We claim that 𝛾 is uniquely
determined in [0, 2𝜋[. Let 𝛾′ ∈ [0, 2𝜋[ such that cos 𝛾′ = 𝑥 and sin 𝛾′ = 𝑦. Since
𝑦 ≥ 0, it follows that 𝛾′ ∈ [0, 𝜋] and because 𝑥 ≥ 0, it follows that 𝛾′ ∈ [0, 𝜋/2]. But
𝛾 is the only real number in [0, 𝜋/2] with 𝑥 = cos 𝛾. This implies 𝛾′ = 𝛾. If 𝑥 > 0
and 𝑦 < 0, then we can replace 𝛾 with 2𝜋 − 𝛾 and use sin(2𝜋 − 𝛾) = − sin 𝛾 and
cos(2𝜋 − 𝛾) = cos 𝛾. The other cases are treated analogously. □

Solution of Exercise 3.1.16. Since |𝛼| = 1, it follows that (ℜ𝛼)2 + (ℑ𝛼)2 = 1. By


Lemma 3.1.9, there is 𝛾 ∈ ℝ such that cos 𝛾 = ℜ𝛼 and sin 𝛾 = ℑ𝛼. Hence, we have
𝑒𝑖𝛾 = cos 𝛾 + 𝑖 sin 𝛾 = ℜ𝛼 + 𝑖ℑ𝛼 = 𝛼. Conversely, if 𝛾 ∈ ℝ with 𝛼 = 𝑒𝑖𝛾 = cos 𝛾 + 𝑖 sin 𝛾,
then ℜ𝛼 = cos 𝛾 and ℑ𝛼 = sin 𝛾. From Lemma 3.1.9 it follows that 𝛾 is uniquely
determined modulo 2𝜋. □

Solution of Exercise 3.1.21. We have |𝑥+ ⟩ = cos(𝜋/4) |0⟩ + 𝑒𝑖⋅0 sin(𝜋/4) |1⟩. There-
fore, the spherical coordinates of the point on the Bloch sphere corresponding to |𝑥+ ⟩
are (𝜋/2, 0). The Cartesian coordinates of this point are (1, 0, 0). The proof for |𝑥− ⟩ is
analogous.
Also, we have |𝑦+ ⟩ = cos(𝜋/4) |0⟩ + 𝑒𝑖⋅𝜋/2 sin(𝜋/4) |1⟩. Therefore, the spherical
coordinates of the point on the Bloch sphere corresponding to |𝑦+ ⟩ are (𝜋/2, 𝜋/2). The
Cartesian coordinates of this point are (0, 1, 0). The proof for |𝑦− ⟩ is analogous. □

Solution of Exercise 3.1.24. The relation 𝑅 is reflexive, since for every |𝜓⟩ ∈ 𝑆 we
have |𝜓⟩ = 𝑒𝑖𝛾 |𝜓⟩ with 𝛾 = 0. If |𝜑⟩ , |𝜓⟩ ∈ 𝑆 with |𝜓⟩ = 𝑒𝑖𝛾 |𝜑⟩ for some 𝛾 ∈ ℝ, then we

have |𝜑⟩ = 𝑒𝑖(−𝛾) |𝜓⟩. So 𝑅 is symmetric. Finally, let |𝜑⟩ , |𝜓⟩ , |𝜉⟩ ∈ 𝑆 and let 𝛾, 𝛿 ∈ ℝ
such that |𝜉⟩ = 𝑒𝑖𝛿 |𝜓⟩ and |𝜓⟩ = 𝑒𝑖𝛾 |𝜑⟩. Then we have |𝜉⟩ = 𝑒𝑖(𝛿+𝛾) |𝜑⟩. Therefore, 𝑅 is
transitive. □

Solution of Exercise 3.6.8. Both 𝑂𝐴 and 𝐼𝐵 are Hermitian operators. Therefore, as


noted in Section 2.5.5, it follows that 𝑂𝐴𝐵 = 𝑂𝐴 ⊗ 𝐼𝐵 is Hermitian and thus an ob-
servable of system 𝐴𝐵. Also, it can be easily verified that the spectral decomposition of
this observable is given by (3.6.11). The Measurement Postulate 3.6.5 implies that the
eigenvalue 𝜆 is measured with probability
Pr(𝜆) = tr((𝑃𝜆 ⊗ 𝐼𝐵 )(𝜌𝐴 ⊗ 𝜌𝐵 )) = tr((𝑃𝜆 𝜌𝐴 ) ⊗ (𝐼𝐵 𝜌𝐵 ))
(D.16)
= tr((𝑃𝜆 𝜌𝐴 ) ⊗ 𝜌𝐵 ) = tr(𝑃𝜆 𝜌𝐴 ) tr(𝜌𝐵 ) = tr(𝑃𝜆 𝜌𝐴 ).
Also, it follows from the Measurement Postulate 3.6.5 that if this outcome occurs, the
state immediately after the measurement is
(D.17) (𝑃_𝜆 ⊗ 𝐼_𝐵)(𝜌_𝐴 ⊗ 𝜌_𝐵)(𝑃_𝜆 ⊗ 𝐼_𝐵)/tr(𝑃_𝜆 𝜌_𝐴) = ((𝑃_𝜆 𝜌_𝐴 𝑃_𝜆) ⊗ 𝜌_𝐵)/tr(𝑃_𝜆 𝜌_𝐴).
Finally, the expectation value of 𝑂 ⊗ 𝐼_𝐵 is
(D.18) tr((𝑂 ⊗ 𝐼_𝐵)(𝜌_𝐴 ⊗ 𝜌_𝐵)) = tr(𝑂𝜌_𝐴 ⊗ 𝐼_𝐵 𝜌_𝐵) = tr(𝑂𝜌_𝐴) tr(𝜌_𝐵) = tr(𝑂𝜌_𝐴). □

Solution of Exercise 3.7.11. Proposition 2.4.27 implies


(D.19) 𝜌 = |𝜉⟩⟨𝜉| = (1/𝑙) ∑_{𝑖,𝑗=0}^{𝑙−1} |𝜑_𝑖⟩ |𝜓_𝑖⟩ ⟨𝜑_𝑗| ⟨𝜓_𝑗| .

Since the sequence (|𝜓𝑖 ⟩) is orthonormal, it follows that for all 𝑖, 𝑗 ∈ ℤ𝑙 we have
(D.20) tr𝐵 |𝜑𝑖 ⟩ |𝜓𝑖 ⟩ ⟨𝜑𝑗 | ⟨𝜓𝑗 | = |𝜑𝑖 ⟩ ⟨𝜑𝑗 | 𝛿 𝑖,𝑗 .
Equations (D.19) and (D.20) imply
(D.21) tr_𝐵 |𝜉⟩⟨𝜉| = (1/𝑙) ∑_{𝑖=0}^{𝑙−1} |𝜑_𝑖⟩⟨𝜑_𝑖|

which proves the claim. □

Solution of Exercise 4.1.5. We have tr 𝐼 ∗ 𝑋 = tr 𝑋 = 0, tr 𝐼 ∗ 𝑌 = tr 𝑌 = 0, tr 𝐼 ∗ 𝑍 =


tr 𝑍 = 0. Also Theorem 4.1.2 implies tr 𝑋 ∗ 𝑌 = tr 𝑋𝑌 = tr 𝑖𝑍 = 0, tr 𝑍 ∗ 𝑋 = tr 𝑍𝑋 =
tr 𝑖𝑌 = 0, and tr 𝑌 ∗ 𝑍 = tr 𝑌 𝑍 = tr 𝑖𝑋 = 0. □

Solution of Exercise 4.2.26. Let 𝐵 = (𝑢,̂ 𝑣,̂ 𝑤)̂ ∈ SO(3). Then Proposition 4.2.25 im-
plies Rot𝑤̂ (𝛾) = 𝐵 Rot𝑧̂ (𝛾)𝐵−1 . Choose 𝑇 ∈ SO(3) with 𝐵𝑇 = (−𝑢,̂ 𝑣,̂ −𝑤). ̂ Then
𝑇 Rot𝑧̂ (𝛾)𝑇 −1 = Rot𝑧̂ (−𝛾). This implies
Rot−𝑤̂ (𝛾) = 𝐵𝑇 Rot𝑧̂ (𝛾)𝑇 −1 𝐵 −1 = 𝐵 Rot𝑧̂ (−𝛾)𝐵 −1 = Rot𝑤̂ (−𝛾). □

Solution of Exercise 4.3.9. We have
(D.22) 𝑅_𝑥̂(𝛾) |0⟩ = (cos(𝛾/2) 𝐼 − 𝑖 sin(𝛾/2) 𝑋) |0⟩ = cos(𝛾/2) |0⟩ − 𝑖 sin(𝛾/2) |1⟩
and
(D.23) 𝑅_𝑥̂(𝛾) |1⟩ = (cos(𝛾/2) 𝐼 − 𝑖 sin(𝛾/2) 𝑋) |1⟩ = cos(𝛾/2) |1⟩ − 𝑖 sin(𝛾/2) |0⟩.
This proves (4.3.8). We also have
(D.24) 𝑅_𝑦̂(𝛾) |0⟩ = (cos(𝛾/2) 𝐼 − 𝑖 sin(𝛾/2) 𝑌) |0⟩ = cos(𝛾/2) |0⟩ + sin(𝛾/2) |1⟩
and
(D.25) 𝑅_𝑦̂(𝛾) |1⟩ = (cos(𝛾/2) 𝐼 − 𝑖 sin(𝛾/2) 𝑌) |1⟩ = cos(𝛾/2) |1⟩ − sin(𝛾/2) |0⟩.
This proves (4.3.9). Finally, we have
(D.26) 𝑅_𝑧̂(𝛾) |0⟩ = (cos(𝛾/2) 𝐼 − 𝑖 sin(𝛾/2) 𝑍) |0⟩ = (cos(𝛾/2) − 𝑖 sin(𝛾/2)) |0⟩ = 𝑒^{−𝑖𝛾/2} |0⟩
and
(D.27) 𝑅_𝑧̂(𝛾) |1⟩ = (cos(𝛾/2) 𝐼 − 𝑖 sin(𝛾/2) 𝑍) |1⟩ = (cos(𝛾/2) + 𝑖 sin(𝛾/2)) |1⟩ = 𝑒^{𝑖𝛾/2} |1⟩.
This proves (4.3.10). □
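The identities (D.22) through (D.27) can be verified numerically; a sketch assuming numpy, with the rotation operators built from their defining formula 𝑅_ŵ(𝛾) = cos(𝛾/2)𝐼 − 𝑖 sin(𝛾/2)(ŵ ⋅ 𝜎):

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])

def R(axis_matrix, gamma):
    # R_w(gamma) = cos(gamma/2) I - i sin(gamma/2) (w . sigma)
    return np.cos(gamma / 2) * I2 - 1j * np.sin(gamma / 2) * axis_matrix

gamma = 0.7
ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# (D.22): R_x(gamma)|0> = cos(gamma/2)|0> - i sin(gamma/2)|1>
assert np.allclose(R(X, gamma) @ ket0,
                   np.cos(gamma / 2) * ket0 - 1j * np.sin(gamma / 2) * ket1)
# (D.26): R_z(gamma)|0> = e^{-i gamma/2}|0>
assert np.allclose(R(Z, gamma) @ ket0, np.exp(-1j * gamma / 2) * ket0)
```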

Solution of Exercise 4.3.13. Let 𝐴 ∈ su(2),


𝑎 𝑏
(D.28) 𝐴=( )
𝑐 𝑑
with 𝑎, 𝑏, 𝑐, 𝑑 ∈ ℂ. Since 𝐴 is Hermitian, we have 𝑎, 𝑑 ∈ ℝ and 𝑏 = 𝑐̄. Since tr 𝐴 = 0,
we have 𝑑 = −𝑎. Hence, 𝐴 can be written as in the lemma. Conversely, if 𝐴 has
a representation as in the lemma, then 𝐴 is Hermitian and has trace 0; that is, 𝐴 ∈
su(2). □

Solution of Exercise 4.3.18. Let 𝑤̂ ∈ ℝ3 be a unit vector and let 𝛾 ∈ ℝ such that
𝑈 = 𝑅𝑤̂ (𝛾). If 𝑈 ∈ {±𝐼}, then Theorem 4.3.15 implies 𝛾 ≡ 0 mod 2𝜋. So, by Proposition
4.2.27 we have Rot(𝑈) = 𝐼3 . Assume that 𝑈 ≠ ±𝐼. Let 𝑤̂ ′ ∈ ℝ3 be a unit vector
and let 𝛾′ ∈ ℝ such that 𝑈 = 𝑅𝑤̂ ′ (𝛾′ ). Then Theorem 4.3.15 implies 𝑤̂ = 𝑤̂ ′ and
𝛾 ≡ 𝛾′ mod 2𝜋 or 𝑤̂ = −𝑤̂ ′ and 𝛾 ≡ −𝛾′ mod 2𝜋. So Proposition 4.2.27 implies that
Rot𝑤̂ (𝛾) = Rot𝑤̂ ′ (𝛾′ ). □

Solution of Exercise 4.3.23. Let 𝜏 = (𝜏1 , 𝜏2 , 𝜏3 ), 𝑝 = (𝑝1 , 𝑝2 , 𝑝3 ), and 𝐵 = (𝑏𝑖,𝑗 ). Then


we have
𝐵 𝑝 ⃗ ⋅ 𝜏 = (𝑏1,1 𝑝1 + 𝑏1,2 𝑝2 + 𝑏1,3 𝑝3 )𝜏1
+ (𝑏2,1 𝑝1 + 𝑏2,2 𝑝2 + 𝑏2,3 𝑝3 )𝜏2
+ (𝑏3,1 𝑝1 + 𝑏3,2 𝑝2 + 𝑏3,3 𝑝3 )𝜏3
(D.29) = 𝑝1 (𝑏1,1 𝜏1 + 𝑏2,1 𝜏2 + 𝑏3,1 𝜏3 )
+ 𝑝2 (𝑏1,2 𝜏1 + 𝑏2,2 𝜏2 + 𝑏3,2 𝜏3 )
+ 𝑝3 (𝑏1,3 𝜏1 + 𝑏2,3 𝜏2 + 𝑏3,3 𝜏3 )
= 𝑝⃗ ⋅ (𝐵ᵀ𝜏). □

Solution of Exercise 4.3.29. We have


(𝑝 ⃗ ⋅ 𝜎) |0⟩ = cos 𝜙 sin 𝜃 𝑋 |0⟩ + sin 𝜙 sin 𝜃 𝑌 |0⟩ + cos 𝜃 𝑍 |0⟩
(D.30) = cos 𝜙 sin 𝜃 |1⟩ + 𝑖 sin 𝜙 sin 𝜃 |1⟩ + cos 𝜃 |0⟩
= cos 𝜃 |0⟩ + sin 𝜃 𝑒𝑖𝜙 |1⟩ .
We also have
(𝑝 ⃗ ⋅ 𝜎) |1⟩ = cos 𝜙 sin 𝜃 𝑋 |1⟩ + sin 𝜙 sin 𝜃 𝑌 |1⟩ + cos 𝜃 𝑍 |1⟩
(D.31) = cos 𝜙 sin 𝜃 |0⟩ − 𝑖 sin 𝜙 sin 𝜃 |0⟩ − cos 𝜃 |1⟩
= sin 𝜃 𝑒−𝑖𝜙 |0⟩ − cos 𝜃 |1⟩ .
This proves the assertion. □

Solution of Exercise 4.4.3. Let • ∈ {+, −}. Then we have


|0⟩ • |1⟩ 𝑋 |0⟩ • 𝑋 |1⟩
𝑋 |𝑥• ⟩ = 𝑋 =
√2 √2
(D.32) |1⟩ • |0⟩ |0⟩ • |1⟩
= =•
√2 √2
= • |𝑥• ⟩ .
On {+, −} we define multiplication in the usual way:
(D.33) + ⋅ + = − ⋅ − = +, + ⋅ − = − ⋅ + = −.
Then from (D.32) we obtain for all ∘, • ∈ {+, −}
𝖢𝖭𝖮𝖳 |𝑥∘⟩ |𝑥•⟩ = (1/√2) 𝖢𝖭𝖮𝖳 (|0⟩ |𝑥•⟩ ∘ |1⟩ |𝑥•⟩)
1
= (𝖢𝖭𝖮𝖳 |0⟩ |𝑥• ⟩ ∘ 𝖢𝖭𝖮𝖳 |1⟩ |𝑥• ⟩)
√2
(D.34) 1
= (|0⟩ |𝑥• ⟩ ∘ |1⟩ 𝑋 |𝑥• ⟩)
√2
1
= (|0⟩ |𝑥• ⟩ ∘ ⋅ • |1⟩ |𝑥• ⟩)
√2
= |𝑥∘⋅• ⟩ |𝑥• ⟩ .
This implies (4.4.4). □

Solution of Exercise 5.1.5. Since 𝑉 is unitary and |𝜓⟩ is an eigenstate of 𝑉 we have


𝑉 |𝜓⟩ = 𝑒𝑖𝜙 |𝜓⟩ with 𝜙 ∈ ℝ. Also, we have
|0⟩ + 𝑒𝑖𝜙 |1⟩
(D.35) 𝐶(𝑉) |𝑥+ ⟩ |𝜓⟩ = |𝜓⟩ .
√2
The point on the Bloch sphere corresponding to |𝑥+ ⟩ has the spherical coordinates
(1, 𝜋/2, 0). The point on the Bloch sphere corresponding to the first qubit of
𝐶(𝑉) |𝑥+ ⟩ |𝜓⟩ has the spherical coordinates (1, 𝜋/2, 𝜙). Finally, we have 𝑈 𝑓 =
(𝐼 ⊗ 𝑋 𝑓(0) )𝐶(𝑉 𝑓 ). □

Solution of Exercise 5.5.8. We use (5.3.3) and obtain

(D.36) 𝐻^{⊗𝑛} |𝑧⃗ ⊕ 𝑆⟩ = (1/√2^𝑚) ∑_{𝑠⃗∈𝑆} 𝐻^{⊗𝑛} |𝑧⃗ ⊕ 𝑠⃗⟩
= (1/√2^{𝑚+𝑛}) ∑_{𝑠⃗∈𝑆} ∑_{𝑤⃗∈{0,1}^𝑛} (−1)^{(𝑧⃗⊕𝑠⃗)⋅𝑤⃗} |𝑤⃗⟩
= (1/√2^{𝑚+𝑛}) ∑_{𝑤⃗∈{0,1}^𝑛} (−1)^{𝑧⃗⋅𝑤⃗} (∑_{𝑠⃗∈𝑆} (−1)^{𝑠⃗⋅𝑤⃗}) |𝑤⃗⟩.

We evaluate the inner sum of the last expression in (D.36). If 𝑤⃗ ∈ 𝑆^⟂, we have

(D.37) ∑_{𝑠⃗∈𝑆} (−1)^{𝑠⃗⋅𝑤⃗} = ∑_{𝑠⃗∈𝑆} (−1)^0 = |𝑆| = 2^𝑚.

Let 𝑤⃗ ∉ 𝑆^⟂ and consider the map

(D.38) 𝑆 → {0, 1}, 𝑠⃗ ↦ 𝑠⃗ ⋅ 𝑤⃗.

It is a surjective homomorphism of groups. By Theorem B.3.6, the kernel of this map
contains |𝑆|/2 = 2^{𝑚−1} elements. It follows that for half of the elements 𝑠⃗ of 𝑆 we have
(−1)^{𝑠⃗⋅𝑤⃗} = 1 and for the other half we have (−1)^{𝑠⃗⋅𝑤⃗} = −1. So, we have

(D.39) ∑_{𝑠⃗∈𝑆} (−1)^{𝑠⃗⋅𝑤⃗} = 0.

From (D.36), (D.37), and (D.39) we obtain

(D.40) 𝐻^{⊗𝑛} |𝑧⃗ ⊕ 𝑆⟩ = (2^𝑚/√2^{𝑚+𝑛}) ∑_{𝑤⃗∈𝑆^⟂} (−1)^{𝑧⃗⋅𝑤⃗} |𝑤⃗⟩ = (1/√2^{𝑛−𝑚}) ∑_{𝑤⃗∈𝑆^⟂} (−1)^{𝑧⃗⋅𝑤⃗} |𝑤⃗⟩. □

Solution of Exercise 6.4.13. By definition, 𝑡_0 is the empty product times 𝑡 which is 𝑡.
Also, we have
(D.41) 𝑡_𝑛 ≡ ∏_{𝑙=0}^{𝑛−1} 𝑎^{𝑐_𝑙 2^{𝑛−𝑙−1}} 𝑡 ≡ 𝑎^{∑_{𝑙=0}^{𝑛−1} 𝑐_𝑙 2^{𝑛−𝑙−1}} 𝑡 ≡ 𝑎^𝑐 𝑡 mod 𝑁 if 𝑡 < 𝑁, and 𝑡_𝑛 = 𝑡 if 𝑡 ≥ 𝑁. □

Solution of Exercise 6.4.14. It suffices to show that the cardinality of the image of the
map in (6.4.34) is 2^{2𝑛}. The number of pairs (𝑥, 𝑦) ∈ ℤ^2_{2^𝑛} with 𝑥 ≥ 𝑁 is 𝑘1 = (2^𝑛 − 𝑁)2^𝑛.
The number of pairs (𝑥, 𝑦) with 𝑥 < 𝑁 and 𝑦 ≥ 𝑁 is 𝑘2 = 𝑁(2^𝑛 − 𝑁). The number of
pairs (𝑥, 𝑦) ∈ ℤ^2_𝑁 with gcd(𝑦, 𝑁) > 1 is 𝑘3 = 𝑁(𝑁 − 𝜑(𝑁)), where 𝜑(𝑁) is the number
of 𝑦 ∈ ℤ_𝑁 with gcd(𝑦, 𝑁) = 1. Finally, if 𝑦 ∈ ℤ_𝑁 with gcd(𝑦, 𝑁) = 1, then the map
ℤ_𝑁 → ℤ_𝑁, 𝑥 ↦ 𝑥𝑦 mod 𝑁 is a bijection. Hence, the number of pairs (𝑥, 𝑥𝑦 mod 𝑁) ∈
ℤ^2_𝑁 with gcd(𝑦, 𝑁) = 1 is 𝑘4 = 𝑁𝜑(𝑁). So the cardinality of the image of the map in
(6.4.34) is 𝑘1 + 𝑘2 + 𝑘3 + 𝑘4 = (2^𝑛 − 𝑁)2^𝑛 + 𝑁(2^𝑛 − 𝑁) + 𝑁(𝑁 − 𝜑(𝑁)) + 𝑁𝜑(𝑁) =
2^{2𝑛} − 2^𝑛𝑁 + 2^𝑛𝑁 − 𝑁^2 + 𝑁^2 − 𝑁𝜑(𝑁) + 𝑁𝜑(𝑁) = 2^{2𝑛}. □
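The counting argument can be checked for small parameters; the helper `image_size_terms` below is illustrative and not from the text. The second loop verifies the bijection claim for 𝑥 ↦ 𝑥𝑦 mod 𝑁 when gcd(𝑦, 𝑁) = 1.

```python
from math import gcd

def image_size_terms(n, N):
    # k1: x >= N; k2: x < N and y >= N; k3: x, y < N with gcd(y, N) > 1;
    # k4: distinct pairs (x, x*y mod N) with gcd(y, N) = 1
    phi = sum(1 for y in range(N) if gcd(y, N) == 1)
    k1 = (2 ** n - N) * 2 ** n
    k2 = N * (2 ** n - N)
    k3 = N * (N - phi)
    k4 = N * phi
    return k1 + k2 + k3 + k4

for n, N in [(3, 5), (4, 11), (5, 21)]:   # requires N <= 2^n
    assert image_size_terms(n, N) == 2 ** (2 * n)

# x -> x*y mod N is a bijection on Z_N whenever gcd(y, N) = 1
N = 21
for y in range(N):
    if gcd(y, N) == 1:
        assert sorted((x * y) % N for x in range(N)) == list(range(N))
```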

Solution of Exercise 7.1.8. Since the identity operator 𝐼_𝑁 is an involution and the
projection |𝑠1⟩⟨𝑠1| is idempotent, we have

(D.42)
𝑈_1^2 = (𝐼_𝑁 − 2 |𝑠1⟩⟨𝑠1|)^2
    = 𝐼_𝑁^2 − 2𝐼_𝑁 |𝑠1⟩⟨𝑠1| − 2 |𝑠1⟩⟨𝑠1| 𝐼_𝑁 + 4(|𝑠1⟩⟨𝑠1|)^2
    = 𝐼_𝑁 − 2 |𝑠1⟩⟨𝑠1| − 2 |𝑠1⟩⟨𝑠1| + 4 |𝑠1⟩⟨𝑠1|
    = 𝐼_𝑁.

So 𝑈1 is an involution. Also, since 𝐼𝑁 and |𝑠1 ⟩ ⟨𝑠1 | are Hermitian, it follows that 𝑈1 is
also Hermitian. It follows from Exercise 2.4.17 that 𝑈1 is unitary. In the same way, it
can be shown that 𝑈𝑠 is a Hermitian, unitary involution. So 𝐺 = 𝑈𝑠 𝑈1 is unitary. □
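The algebra in (D.42) is easy to confirm numerically for a concrete unit vector |𝑠1⟩; the vector chosen below is an arbitrary example, and real entries are used so that Hermitian reduces to symmetric.

```python
s1 = [0.5, 0.5, 0.5, 0.5]                # a concrete unit vector |s1>
N = len(s1)
assert abs(sum(c * c for c in s1) - 1) < 1e-12

# U1 = I_N - 2 |s1><s1|
U1 = [[(1.0 if i == j else 0.0) - 2 * s1[i] * s1[j] for j in range(N)]
      for i in range(N)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

sq = matmul(U1, U1)                      # should be I_N: U1 is an involution
for i in range(N):
    for j in range(N):
        assert abs(sq[i][j] - (1.0 if i == j else 0.0)) < 1e-12
        assert U1[i][j] == U1[j][i]      # Hermitian (real symmetric here)
```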

Solution of Exercise 7.2.3. We have

⟨𝑠+|𝑠+⟩ = ⟨𝑠−|𝑠−⟩ = (1/2)(⟨𝑠1|𝑠1⟩ + ⟨𝑠0|𝑠0⟩) = 1,
⟨𝑠+|𝑠−⟩ = (1/2)(⟨𝑠1|𝑠1⟩ − ⟨𝑠0|𝑠0⟩) = 0.

So (|𝑠+ ⟩ , |𝑠− ⟩) is an orthonormal basis of 𝑃. Also, equation (7.2.3) implies

(D.43) 𝐺 |𝑠0 ⟩ = 𝐺(cos 0 |𝑠0 ⟩ + sin 0 |𝑠1 ⟩) = cos 2𝜃 |𝑠0 ⟩ + sin 2𝜃 |𝑠1 ⟩

and

(D.44)
𝐺 |𝑠1⟩ = 𝐺 (cos(𝜋/2) |𝑠0⟩ + sin(𝜋/2) |𝑠1⟩)
    = cos(𝜋/2 + 2𝜃) |𝑠0⟩ + sin(𝜋/2 + 2𝜃) |𝑠1⟩
    = − sin 2𝜃 |𝑠0⟩ + cos 2𝜃 |𝑠1⟩.

From equations (D.43) and (D.44) we obtain

𝐺 |𝑠+⟩ = (1/√2)(𝐺 |𝑠1⟩ + 𝑖 𝐺 |𝑠0⟩)
    = (1/√2)(− sin 2𝜃 |𝑠0⟩ + cos 2𝜃 |𝑠1⟩ + 𝑖(cos 2𝜃 |𝑠0⟩ + sin 2𝜃 |𝑠1⟩))
    = (1/√2)((𝑖 cos 2𝜃 − sin 2𝜃) |𝑠0⟩ + (cos 2𝜃 + 𝑖 sin 2𝜃) |𝑠1⟩)
    = (1/√2)(𝑖(cos 2𝜃 + 𝑖 sin 2𝜃) |𝑠0⟩ + (cos 2𝜃 + 𝑖 sin 2𝜃) |𝑠1⟩)
    = 𝑒^{2𝑖𝜃} |𝑠+⟩

and

𝐺 |𝑠−⟩ = (1/√2)(𝐺 |𝑠1⟩ − 𝑖 𝐺 |𝑠0⟩)
    = (1/√2)(− sin 2𝜃 |𝑠0⟩ + cos 2𝜃 |𝑠1⟩ − 𝑖(cos 2𝜃 |𝑠0⟩ + sin 2𝜃 |𝑠1⟩))
    = (1/√2)((−𝑖 cos 2𝜃 − sin 2𝜃) |𝑠0⟩ + (cos 2𝜃 − 𝑖 sin 2𝜃) |𝑠1⟩)
    = (1/√2)(−𝑖(cos 2𝜃 − 𝑖 sin 2𝜃) |𝑠0⟩ + (cos 2𝜃 − 𝑖 sin 2𝜃) |𝑠1⟩)
    = 𝑒^{−2𝑖𝜃} |𝑠−⟩.

This means that |𝑠+⟩ and |𝑠−⟩ are eigenstates of 𝐺 associated with the eigenvalues 𝑒^{2𝑖𝜃}
and 𝑒^{−2𝑖𝜃}, respectively. Finally, we prove (7.2.2). We have

(−𝑖/√2)(𝑒^{𝑖𝜃} |𝑠+⟩ − 𝑒^{−𝑖𝜃} |𝑠−⟩)
(D.45)
    = (−𝑖/2)(𝑒^{𝑖𝜃}(|𝑠1⟩ + 𝑖 |𝑠0⟩) − 𝑒^{−𝑖𝜃}(|𝑠1⟩ − 𝑖 |𝑠0⟩))
    = (1/2)(𝑒^{𝑖𝜃} + 𝑒^{−𝑖𝜃}) |𝑠0⟩ + (−𝑖/2)(𝑒^{𝑖𝜃} − 𝑒^{−𝑖𝜃}) |𝑠1⟩
    = cos 𝜃 |𝑠0⟩ + sin 𝜃 |𝑠1⟩ = |𝑠⟩. □
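That |𝑠±⟩ are eigenstates of 𝐺 with eigenvalues 𝑒^{±2𝑖𝜃} can be checked directly from the matrix of 𝐺 in the basis (|𝑠0⟩, |𝑠1⟩) given by (D.43) and (D.44); the angle 𝜃 below is an arbitrary test value.

```python
import cmath, math

theta = 0.3                              # arbitrary test angle
c, s = math.cos(2 * theta), math.sin(2 * theta)
G = [[c, -s], [s, c]]                    # matrix of G in the basis (|s0>, |s1>)

def apply(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

r = 1 / math.sqrt(2)
s_plus = [1j * r, r]                     # |s+> = (|s1> + i|s0>)/sqrt(2), coords (s0, s1)
s_minus = [-1j * r, r]                   # |s-> = (|s1> - i|s0>)/sqrt(2)

for vec, lam in ((s_plus, cmath.exp(2j * theta)), (s_minus, cmath.exp(-2j * theta))):
    got = apply(G, vec)                  # should equal lam * vec
    assert all(abs(g - lam * v) < 1e-12 for g, v in zip(got, vec))
```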


Solution of Exercise 8.1.1. We have (𝐴′)^∗ = ( 0 𝐴 ; 𝐴^∗ 0 )^∗ = 𝐴′ and 𝐴′𝑥⃗ = (𝐴𝑥⃗, 0⃗) =
(𝑏⃗, 0⃗) = 𝑏⃗′. □
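The two claims, that the block matrix 𝐴′ is Hermitian and that it maps the embedded vector to 𝑏⃗′, can be checked for a concrete 𝐴. The matrix below and the embedding of 𝑥⃗ as (0⃗, 𝑥⃗) are assumptions made for illustration.

```python
A = [[1 + 2j, 3j], [0, 4 - 1j]]          # an arbitrary non-Hermitian matrix
n = len(A)

def dag(M):                              # conjugate transpose M*
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]

Ad = dag(A)
# block matrix A' = [[0, A], [A*, 0]]
Ap = [[0] * n + A[i] for i in range(n)] + [Ad[i] + [0] * n for i in range(n)]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

x = [1 - 1j, 2]
b = matvec(A, x)
res = matvec(Ap, [0] * n + x)            # A' applied to the embedded vector (0, x)

assert dag(Ap) == Ap                                        # A' is Hermitian
assert all(abs(r - bi) < 1e-12 for r, bi in zip(res[:n], b))
assert all(abs(r) < 1e-12 for r in res[n:])                 # result is (b, 0)
```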

Solution of Exercise A.1.11. Let 𝑎, 𝑏 ∈ 𝑆 and assume that the equivalence classes of
𝑎 and 𝑏 have a common element 𝑐. Let 𝑑 ∈ 𝑆 with (𝑎, 𝑑) ∈ 𝑅. Then (𝑎, 𝑐) ∈ 𝑅, (𝑏, 𝑐) ∈ 𝑅,
and the symmetry and transitivity of 𝑅 imply that (𝑏, 𝑑) ∈ 𝑅. So the equivalence class of
𝑎 is contained in the equivalence class of 𝑏 and vice versa. Therefore, the equivalence
classes are equal. □

Solution of Exercise A.4.12. Both are abelian semigroups with identity elements 0
and 1, respectively. Also, (ℤ𝑘 , +𝑘 ) is a group but (ℤ𝑘 , ⋅𝑘 ) is not, since 0 has no inverse.
The unit group of (ℤ𝑘 , ⋅𝑘 ) is (ℤ∗𝑘 , ⋅𝑘 ). □

Solution of Exercise A.5.9. We use the trigonometric identities (A.5.1), (A.5.2), and
(A.5.4) and obtain
sin^2(𝑥 + 𝑦) − sin^2 𝑥
= (sin 𝑥 cos 𝑦 + cos 𝑥 sin 𝑦)^2 − sin^2 𝑥
= sin^2 𝑥 cos^2 𝑦 + 2 sin 𝑥 cos 𝑥 sin 𝑦 cos 𝑦 + cos^2 𝑥 sin^2 𝑦 − sin^2 𝑥
= sin^2 𝑥 (1 − sin^2 𝑦) + sin 𝑥 cos 𝑥 sin(2𝑦) + (1 − sin^2 𝑥) sin^2 𝑦 − sin^2 𝑥
= sin^2 𝑥 − sin^2 𝑥 sin^2 𝑦 + sin 𝑥 cos 𝑥 sin(2𝑦) + sin^2 𝑦 − sin^2 𝑥 sin^2 𝑦 − sin^2 𝑥
= sin 𝑥 cos 𝑥 sin(2𝑦) + (1 − 2 sin^2 𝑥) sin^2 𝑦.
Likewise, we obtain from the trigonometric identities (A.5.1), (A.5.3), and (A.5.4)
sin^2 𝑥 − sin^2(𝑥 − 𝑦)
= sin^2 𝑥 − (sin 𝑥 cos 𝑦 − cos 𝑥 sin 𝑦)^2
= sin^2 𝑥 − sin^2 𝑥 cos^2 𝑦 + 2 sin 𝑥 cos 𝑥 sin 𝑦 cos 𝑦 − cos^2 𝑥 sin^2 𝑦
= sin^2 𝑥 − sin^2 𝑥 (1 − sin^2 𝑦) + sin 𝑥 cos 𝑥 sin(2𝑦) − (1 − sin^2 𝑥) sin^2 𝑦
= sin^2 𝑥 − sin^2 𝑥 + sin^2 𝑥 sin^2 𝑦 + sin 𝑥 cos 𝑥 sin(2𝑦) − sin^2 𝑦 + sin^2 𝑥 sin^2 𝑦
= sin 𝑥 cos 𝑥 sin(2𝑦) − (1 − 2 sin^2 𝑥) sin^2 𝑦. □
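Both identities can be sanity-checked at random points; the sample count and seed below are arbitrary.

```python
import math, random

random.seed(1)
for _ in range(100):
    x, y = random.uniform(-3, 3), random.uniform(-3, 3)
    sc = math.sin(x) * math.cos(x) * math.sin(2 * y)            # sin x cos x sin(2y)
    q = (1 - 2 * math.sin(x) ** 2) * math.sin(y) ** 2           # (1 - 2 sin^2 x) sin^2 y
    assert abs(math.sin(x + y) ** 2 - math.sin(x) ** 2 - (sc + q)) < 1e-9
    assert abs(math.sin(x) ** 2 - math.sin(x - y) ** 2 - (sc - q)) < 1e-9
```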

Solution of Exercise C.1.9. The probability distribution is ({0, 1}^2, Pr), where 0 and 1
represent tails and heads, respectively. Furthermore, Pr sends each pair (𝑎, 𝑏) ∈ {0, 1}^2
to its probability 1/4. The event “getting heads at least once” is {(0, 1), (1, 0), (1, 1)}. Its
probability is 3/4. □
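The event and its probability can be enumerated directly:

```python
from itertools import product

# sample space {0,1}^2 with the uniform distribution; 0 = tails, 1 = heads
outcomes = list(product((0, 1), repeat=2))
event = [w for w in outcomes if 1 in w]        # "heads at least once"
assert len(event) == 3 and len(outcomes) == 4
assert len(event) / len(outcomes) == 0.75
```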

Solution of Exercise C.2.2. We must show that

(D.46) ∑_{𝑖∈ℕ} Pr′(𝑖) = 1.

In fact, we have

(D.47) ∑_{𝑖∈ℕ} Pr′(𝑖) = 𝑝 ∑_{𝑖∈ℕ} (1 − 𝑝)^𝑖 = 𝑝/(1 − (1 − 𝑝)) = 1. □
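The geometric series in (D.47) can be checked numerically with a large partial sum; the value 𝑝 = 0.3 is an arbitrary choice.

```python
# Partial sums of p * (1-p)^i approach p / (1 - (1-p)) = 1.
p = 0.3
partial = sum(p * (1 - p) ** i for i in range(10_000))
assert abs(partial - 1.0) < 1e-12
```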
Bibliography

[AB09] S. Arora and B. Barak, Computational complexity: A modern approach, Cambridge University Press, Cambridge,
2009, DOI 10.1017/CBO9780511804090. MR2500087
[Abr72] M. Abramowitz (ed.), Handbook of mathematical functions: with formulas, graphs, and mathematical tables,
10th printing, with corrections, Applied mathematics series, no. 55, U. S. Government Printing Office, Wash-
ington, DC, 1972 (English).
[AGP94] W. R. Alford, A. Granville, and C. Pomerance, There are infinitely many Carmichael numbers, Ann. of Math. (2)
139 (1994), no. 3, 703–722, DOI 10.2307/2118576. MR1283874
[AHU74] A. V. Aho, J. E. Hopcroft, and J. D. Ullman, The design and analysis of computer algorithms, second print-
ing, Addison-Wesley Series in Computer Science and Information Processing, Addison-Wesley Publishing Co.,
Reading, Mass.-London-Amsterdam, 1975. MR413592
[AKS04] M. Agrawal, N. Kayal, and N. Saxena, PRIMES is in P, Ann. of Math. (2) 160 (2004), no. 2, 781–793, DOI
10.4007/annals.2004.160.781. MR2123939
[Aut23] Wikipedia Authors, Timeline of quantum computing and communication, September 2023, Page Version ID:
1174829260.
[Ben80] P. Benioff, The computer as a physical system: a microscopic quantum mechanical Hamiltonian model of com-
puters as represented by Turing machines, J. Statist. Phys. 22 (1980), no. 5, 563–591, DOI 10.1007/BF01011339.
MR574722
[BHT98] G. Brassard, P. Høyer, and A. Tapp, Quantum counting, Automata, languages and programming (Aalborg,
1998), Lecture Notes in Comput. Sci., vol. 1443, Springer, Berlin, 1998, pp. 820–831, DOI 10.1007/BFb0055105.
MR1683527
[BLM17] J. Buchmann, K. E. Lauter, and M. Mosca (eds.), Postquantum cryptography, part 1, IEEE Security & Privacy,
vol. 15, IEEE, 2017.
[BLM18] J. Buchmann, K. E. Lauter, and M. Mosca (eds.), Postquantum cryptography, part 2, IEEE Security & Privacy,
vol. 16, IEEE, 2018.
[BLP93] J. P. Buhler, H. W. Lenstra Jr., and C. Pomerance, Factoring integers with the number field sieve, The devel-
opment of the number field sieve, Lecture Notes in Math., vol. 1554, Springer, Berlin, 1993, pp. 50–94, DOI
10.1007/BFb0091539. MR1321221
[Buc04] J. Buchmann, Introduction to cryptography, 2nd ed., Undergraduate Texts in Mathematics, Springer-Verlag, New
York, 2004, DOI 10.1007/978-1-4419-9003-7. MR2075209
[BWP+ 17] J. D. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Nat.
549 (2017), no. 7671, 195–202.
[CEH+ 98] R. Cleve, A. Ekert, L. Henderson, C. Macchiavello, and M. Mosca, On quantum algorithms, Complexity 4 (1998),
no. 1, 33–42, DOI 10.1002/(SICI)1099-0526(199809/10)4:1<33::AID-CPLX10>3.0.CO;2-U. MR1653992
[Cle11] R. Cleve, Classical lower bounds for Simon’s problem, https://cs.uwaterloo.ca/~cleve/courses/
F11CS667/SimonClassicalLB.pdf, 2011.
[CLRS22] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to algorithms, 4th ed., The MIT Press,
Cambridge, MA, 2022.
[Dav82] M. Davis, Computability & unsolvability, Dover, New York, 1982.
[Deu85] D. Deutsch, Quantum theory, the Church-Turing principle and the universal quantum computer, Proc. Roy. Soc.
London Ser. A 400 (1985), no. 1818, 97–117. MR801665


[DGM+ 21] M. Dürmuth, M. Golla, P. Markert, A. May, and L. Schlieper, Towards quantum large-scale password guessing on
real-world distributions, Cryptology and network security, Lecture Notes in Comput. Sci., vol. 13099, Springer,
2021, pp. 412–431, DOI 10.1007/978-3-030-92548-2_22. MR4460974
[DHM+ 18] D. Dervovic, M. Herbster, P. Mountney, S. Severini, N. Usher, and L. Wossnig, Quantum linear systems algo-
rithms: a primer, CoRR abs/1802.08227 (2018).
[DJ92] D. Deutsch and R. Jozsa, Rapid solution of problems by quantum computation, Proc. Roy. Soc. London Ser. A 439
(1992), no. 1907, 553–558, DOI 10.1098/rspa.1992.0167. MR1196433
[Fey82] R. P. Feynman, Simulating physics with computers, Physics of computation, Part II (Dedham, Mass., 1981), In-
ternat. J. Theoret. Phys. 21 (1981/82), no. 6-7, 467–488, DOI 10.1007/BF02650179. MR658311
[FK03] J. B. Fraleigh and V. J. Katz, A first course in abstract algebra, 7th ed., Addison-Wesley, Boston, 2003.
[Fon12] F. Fontein, The probability that two numbers are coprime, https://math.fontein.de/2012/07/10/
the-probability-that-two-numbers-are-coprime/, 2012.
[GLRS16] M. Grassl, B. Langenberg, M. Roetteler, and R. Steinwandt, Applying Grover’s algorithm to AES: quantum re-
source estimates, Post-quantum cryptography, Lecture Notes in Comput. Sci., vol. 9606, Springer, 2016, pp. 29–
43, DOI 10.1007/978-3-319-29360-8_3. MR3509727
[Gro96] L. K. Grover, A fast quantum mechanical algorithm for database search, Proceedings of the Twenty-eighth An-
nual ACM Symposium on the Theory of Computing (Philadelphia, PA, 1996), ACM, New York, 1996, pp. 212–
219, DOI 10.1145/237814.237866. MR1427516
[HHL08] A. W. Harrow, A. Hassidim, and S. Lloyd, Quantum algorithm for solving linear systems of equations,
arXiv:0811.3171, 2008.
[HvdH21] D. Harvey and J. van der Hoeven, Integer multiplication in time 𝑂(𝑛 log 𝑛), Ann. of Math. (2) 193 (2021), no. 2,
563–617, DOI 10.4007/annals.2021.193.2.4. MR4224716
[IR10] K. Ireland and M. I. Rosen, A classical introduction to modern number theory, 2nd ed., 3rd printing, Graduate
Texts in Mathematics, no. 84, Springer, New York, Berlin, Heidelberg, 2010 (English).
[Jor] S. Jordan, Quantum algorithm zoo, https://quantumalgorithmzoo.org/.
[KLM06] P. Kaye, R. Laflamme, and M. Mosca, An introduction to quantum computing, Oxford University Press, Oxford,
2007. MR2311153
[Knu82] D. E. Knuth, The art of computer programming. 1: Fundamental algorithms, 2nd ed., 7th printing, Addison-
Wesley, Reading, MA, 1982.
[LP92] H. W. Lenstra Jr. and C. Pomerance, A rigorous time bound for factoring integers, J. Amer. Math. Soc. 5 (1992),
no. 3, 483–516, DOI 10.2307/2152702. MR1137100
[LLMP93] A. K. Lenstra, H. W. Lenstra Jr., M. S. Manasse, and J. M. Pollard, The factorization of the ninth Fermat number,
Math. Comp. 61 (1993), no. 203, 319–349, DOI 10.2307/2152957. MR1182953
[LP98] H. R. Lewis and C. H. Papadimitriou, Elements of the theory of computation, 2nd ed., Prentice-Hall, Upper Saddle
River, N.J, 1998.
[Man80] Y. Manin, Computable and noncomputable (Russian), Cybernetics, 1980.
[Man99] Y. I. Manin, Classical computing, quantum computing, and Shor’s factoring algorithm, Astérisque 266 (2000),
Exp. No. 862, 5, 375–404. Séminaire Bourbaki, Vol. 1998/99. MR1772680
[NC16] M. A. Nielsen and I. L. Chuang, Quantum computation and quantum information, Cambridge University Press,
Cambridge, 2000. MR1796805
[RC18] R. Rines and I. L. Chuang, High performance quantum modular multipliers, arXiv:1801.01081 (2018).
[Rud76] W. Rudin, Principles of mathematical analysis, 3rd ed., International Series in Pure and Applied Mathematics,
McGraw-Hill Book Co., New York-Auckland-Düsseldorf, 1976. MR385023
[Sho94] P. W. Shor, Polynomial time algorithms for discrete logarithms and factoring on a quantum computer, Algorithmic
Number Theory, First International Symposium, ANTS-I, Ithaca, NY, USA, May 6–9, 1994, Proceedings
(Leonard M. Adleman and Ming-Deh A. Huang, eds.), Lecture Notes in Computer Science, vol. 877, Springer,
1994, p. 289.
[Sim94] D. R. Simon, On the power of quantum computation, 35th Annual Symposium on Foundations of Com-
puter Science (Santa Fe, NM, 1994), IEEE Comput. Soc. Press, Los Alamitos, CA, 1994, pp. 116–123, DOI
10.1109/SFCS.1994.365701. MR1489241
[Sim97] D. R. Simon, On the power of quantum computation, SIAM J. Comput. 26 (1997), no. 5, 1474–1483, DOI
10.1137/S0097539796298637. MR1471989
[Vol99] H. Vollmer, Introduction to circuit complexity: A uniform approach, Texts in Theoretical Computer Science. An
EATCS Series, Springer-Verlag, Berlin, 1999, DOI 10.1007/978-3-662-03927-4. MR1704235
[Wan10] F. Wang, The hidden subgroup problem, arXiv, 2010.
[Wat09] J. Watrous, Quantum computational complexity, Encyclopedia of Complexity and Systems Science (Robert A.
Meyers, ed.), Springer, 2009, pp. 7174–7201.
Index

𝑎 +𝑚 𝑏, 288 Pr𝐴,𝑎 (𝑟), ⃗ 22


𝑎 ⋅𝑚 𝑏, 288 P, 31
BPP, 32 P-uniform, 42, 200
BQP, 204 𝜙(𝜓), 108
bitLength, 3 𝜙(𝑝), ⃗ 107
ℂ, 284 𝜋/8 gate, 166
Co-NP, 33 𝜑(𝑁), 290
𝖢𝖢𝖭𝖮𝖳, 44 𝑝𝐴 (𝑎), 24
𝖢𝖢𝖭𝖮𝖳 gate, 44, 172 𝑝𝐴 (𝑎, 𝑘), 26
𝖢𝖭𝖮𝖳, 43 ℚ, 283
𝖢𝖭𝖮𝖳 gate, 114, 167 𝑞𝐴 (𝑎), 24
𝖢𝖲𝖶𝖠𝖯, 45 𝑞𝐴 (𝑎, 𝑘), 26
DSPACE(𝑓), 31 𝑅𝑤̂ (𝛾), 159
DTIME(𝑓), 31 ⌈𝑟⌉, 284
EXPTIME, 31 ⌊𝑟⌉, 284
eTime𝐴 (𝑎), 25 ⌊𝑟⌋, 284
𝛾(𝜓), 108 ℝ, 283
𝑔𝑡(𝜓), 108 randomString, 17
Hom(𝑀, 𝑁), 311, 313 𝑆𝑇 , 285
lcm, 291 𝑆𝑛 , 2
𝑙 mod 𝑘, 284 𝑆𝑛 , 300
𝑀 ⊗ 𝑁, 336 Σ∗ , 2
m(𝑎), 18 SO(3), 150
ℕ, 283 SU(3), 150
NP, 33 SIZE(𝑓), 43
𝑛-qubit state, 111 size(𝑎), 4
Ω(𝑓), 289 stringToInt, 3
O(𝑓), 289 𝑠𝑢(2), 160
o(𝑓), 289 Θ(𝑓), 289
𝖮(3), 150 𝜃(𝑝), ⃗ 107
𝜔(𝑓), 289 𝑣|⃗ 𝑤,⃗ 308
PP, 32 wSpace𝐴 , 14
PSPACE, 31 wTime𝐴 , 14


ℤ, 283 bit, 2
ℤ𝑘 , 4, 284 bit length, 3
ℤ∗𝑚 , 290 bit operation, 5
bit-flip gate, 143
abelian black-box, 202
group, 298 black-box access, 206
semigroup, 298 Bloch sphere, 107
absolute convergence, 345 Boolean circuit, 36
absolute homogeneity, 61 Boolean function, 35
acyclic graph, 288 bra, 59
adjoint, 69, 71 bra notation, 57
adjugate, 320
algebra, 312 Carmichael number, 17
algebraic multiplicity, 329 Cartesian coordinates, 105
algorithm Cartesian product, 284
deterministic, 9 Cauchy-Schwartz inequality, 61
invariant, 12 certificate, 32, 33
probabilistic, 18 character, 2
random, 18 characteristic polynomial, 321
state, 11 circuit
algorithm run Boolean, 36
random sequence, 21 complexity, 40
alphabet, 2 family, 40
alternating function, 319 logic, 36
amplitude, 105 reversible, 43, 46
amplitude amplification, 257 circuit family
ancilla bit, 48 P-uniform, 42
ancillary gate, 175 closed under scalar multiplication, 309
angle between two vectors, 145 codomain, 286
antisymmetric relation, 285 coefficient, 303
argument, 286 coefficient vector, 322
assign instruction, 5 column echelon form, 326
associative operation, 287 reduced, 326
associativity, 308 common divisor, 290
automorphism, 313 commutative
group, 298
balanced function, 206, 209 semigroup, 298
basis, 311 commutative operation, 287
orthogonal, 106 commutative ring, 302
orthonormal, 106 complex numbers, 284
Bell state, 101 complexity
Bernoulli algorithm, 19 exponential, 30, 31
Bernoulli experiment, 348 linear, 30, 31
bijection, 286 polynomial, 30, 31
bijective function, 286 quasilinear, 30, 31
bilinear map, 330 subexponential, 30, 31
bilinearity, 106 complexity class, 31
binary alphabet, 2 composite integer, 12, 291
binary expansion, 3 Composite Systems Postulate, 112
binary length, 3 composition of functions, 287
binary operation, 287 computational basis, 55, 104
binary representation, 3 computational problem, 29

instance, 29 dual vector, 58


solution, 29 dual vector space, 312
condition number, 282
congruence relation, 289 edge, 288
conjugate commutativity, 57 efficient encoding, 42, 200
conjugate linearity, 57 eigenspace, 329
conjugate symmetry, 57 eigenvalue, 329
constant function, 206, 209 eigenvalue estimation problem, 231
constant node, 36 eigenvector, 329
constant polynomial, 303 elementary data type, 3
continued fraction, 293 elementary gate, 202
expansion, 295 encoding, 2, 42
controlled swap, 45 efficient, 42
controlled-𝖭𝖮𝖳 gate, 43, 114 sensible, 42
controlled-𝑈 operator, 169 endomorphism, 313
convergent, 295 endomorphism algebra, 313
correctness of an algorithm, 12 endomorphism ring, 313
countable set, 345 entanglement, 112, 113
cross product, 146 equivalence class, 285
cubic complexity, 14 equivalence relation, 285
cycle, 288 erasure gates, 175
error-free probabilistic algorithm, 17
data type, 3 Euclidean norm, 62, 106
decimal alphabet, 2 Euler angles, 154
decision algorithm, 13, 20 Euler’s totient function, 290
degree of a polynomial, 303 event, 346
dense, 196 evolution postulate, 113
density operator, 124 expectation, 347
depth expectation value, 119
of a circuit, 36 expectation value of measurement, 132
of a node, 36 expected running time, 25
determinant, 319 expected value, 347
determinant function, 319 exponential complexity, 14, 30, 31
deterministic algorithm, 9
diagonal element, 316 failure probability, 24
diagonizable matrix, 330 false-biased algorithm, 20
dimension, 322 family
direct inner sum, 310 of quantum circuits, 199
direct product, 310 Fermat numbers, 291
direct sum, 310 Fermat test, 17
directed graph, 288 field, 302
discrete logarithm problem, 248 finitely generated module, 310
discriminant of a quadratic polynomial, floating point number, 4
304 for statement, 7
distributive operation, 302 Fredkin gate, 45
distributivity, 309 free module, 311
divisor, 289 function, 285
DL problem, 248 balanced, 209
domain, 286 constant, 209
dot product, 308 injective, 286
dual, 59 one-to-one, 286
dual module, 312 onto, 286

surjective, 286 instance of a computational problem, 29


instruction, 5
garbage, 48 integer
gate, 36 composite, 12
gcd, 290 integer factorization problem, 30
gcd problem, 30 integers, 283
general linear group, 317 invariant of an algorithm, 12
general Simon’s problem, 218 inverse function, 286
generating system, 311 inverse image, 286
geometric multiplicity, 329 involution, 74
global phase factor, 110 isometry, 77
Goldbach language, 32 isomorphism, 311, 313
Gram-Schmidt orthogonalization, 63
graph, 288 ket, 54
Gray code, 190
greatest common divisor, 290 language, 13
group, 298 Las Vegas algorithm, 18
Grover diffusion operator, 258 Latin alphabet, 2
Grover iterator, 257, 258 leading coefficient, 303
least common multiple, 291
Hadamard length, 106
matrix, 72 of a word, 2
operator, 68 linear combination, 309
Hadamard gate, 114 linear complexity, 14, 30, 31
Hadamard operator, 68, 114, 144 linear operator, 67
Hermitian, 74 linear system, 328
conjugate, 69 Linear Systems Problem, 277
transpose, 69 linearly dependent, 311
Hermitian inner product, 58 linearly independent, 311
Hermitian symmetry, 57 logarithmic complexity, 14, 30, 31
hexadecimal alphabet, 2 logic circuit, 36
hidden subgroup, 218 logic gate, 35
Hilbert space, 60 loop, 7
Hilbert-Schmidt inner product, 71
homomorphism, 311, 313 map, 285
mapping, 285
identity element, 287, 309 Markov’s inequality, 348
identity function, 286 matrix, 314
identity gate, 142 adjugate, 320
identity matrix, 316 column vector, 314
if statement, 8 componentwise addition, 315
image, 286 inverse, 317
imaginary part, 284 invertible, 317
incoming edge, 288 normal, 83
injection, 286 product, 315
injective function, 286 rank, 325
inner product, 57, 95, 106 row vector, 314
input, 286 scalar product, 315
input node, 36 similar, 321
Input statement, 10 square, 315
input variable, 10 sum, 315
inseparable, 101 transpose, 314

matrix product, 315 Output statement, 10


measurement, 118
in the computational basis, 105 partial trace, 135, 343
Measurement Postulate, 118 path, 288
Millennium Prize Problems, 33 Pauli
mixed state, 124 𝑋 operator, 67
module 𝑌 operator, 68
dimension, 322 𝑍 operator, 68
direct product, 310 matrix, 72
direct sum, 310 Pauli gates, 142
dual, 312 perfectly universal set of quantum gates,
free, 311 181
quotient, 311 permutation, 286
module over a ring, 309 phase gate, 166
monoid, 298 phase kickback, 207, 208, 211
monomial, 303 phase shift gate, 166
Monte Carlo algorithm, 18 phase-flip gate, 143
multilinear function, 319 Pohlig-Hellman algorithm, 248
multilinear map, 330 polynomial, 303
multiple, 289 polynomial complexity, 14, 30, 31
polynomial ring, 303
natural numbers, 283 positive definite operator, 88
node positive definiteness, 57, 61, 106
constant, 36 positive semidefinite operator, 88
input, 36 positivity, 57, 106
output, 36 positivity condition, 124
norm, 61 prime divisor, 291
Euclidean, 62 prime number, 291
induced, 62 probabilistic algorithm, 16, 18
normal operator, 83 success, 17
normalized function, 319 probability
distribution, 346
observable, 118 space, 346
off-diagonal element, 316 projection, 79
one-sided error, 20 promise problem, 210
one-to-one, 286 proper divisor, 12
onto, 286 pseudocode, 9
order finding problem, 234 pure state, 124
order of 𝑎 modulo 𝑁, 299 purification, 178
order of a group element, 299
orthogonal, 62, 106 quadratic complexity, 14
matrix, 150 quantum 𝖭𝖮𝖳 gate, 143
orthogonal basis, 106 quantum algorithm, 201
orthogonal complement, 66 quantum bit, 55, 104
orthogonal group, 150 quantum circuit, 176
orthogonal matrix, 150 Quantum Fourier Transform, 224
orthogonal projection, 79, 80 quantum interference, 207, 208
orthonormal, 62 Quantum Linear Systems Problem, 278
orthonormal basis, 106 quantum parallelism, 207, 208
outer product, 146 quantum permutation operator, 175
outgoing edge, 288 quantum register, 111
output node, 36 quantum swap gate, 175

quasilinear complexity, 14, 30, 31 similar, 321


qubit, 55, 104 Simon’s algorithm, 214
quotient group, 302 Simon’s problem, 213
quotient module, 311 single-qubit gate, 142
singular value decomposition, 90
random algorithm, 18 size, 177
random sequence of an algorithm run, 21 of a circuit, 36
random variable, 347 of a data type object, 4
rank of a matrix, 325 of a matrix, 4
rational numbers, 283 of a Roman character, 4
real numbers, 283 of a vector, 4
real part, 284 of an integer, 4
reduced column echelon form, 326 size complexity, 43
reduced density operator, 137 solution of a computational problem, 29
reduced representation of a rational space complexity, 14
number, 237 of a quantum algorithm, 204
reduced row echelon form, 326 of quantum circuits, 203
reflexive, 284 worst-case, 14
relation, 284
span, 310
reflexive, 284
special orthogonal group, 150
repeat statement, 8
spectral decomposition, 86
representation matrix, 323
spherical coordinates, 107, 148
representative of an equivalence class, 285
square matrix, 315
restriction of a function, 286
diagonal element, 316
return statement, 11
off-diagonal element, 316
reversible
square root problem, 29
circuit, 43
standard 𝖢𝖭𝖮𝖳 gate, 167
gate, 43
standard Hermitian inner product, 58
reversible circuit, 46
state, 104
right-hand rule, 146
entangled, 113
rotation in ℝ3 , 150
nonentangled, 113
rotation operator, 159
of a qubit, 55
row echelon form, 326
reduced, 326 state of an algorithm, 11
run of an algorithm, 11 state space, 55, 104
running time, 14 State Space Postulate, 104
expected, 25 statement, 5
of a quantum algorithm, 203 Steinitz Exchange Lemma, 324
worst-case, 14 string, 2
subexponential complexity, 14, 30, 31
scalar product, 308, 315 subgroup, 301
Schmidt coefficients, 101 submodule, 309
Schmidt decomposition, 99 subspace, 309
Schmidt number, 101 success of a probabilistic algorithm, 17
Schmidt rank, 101 success probability, 16, 24
Schur decomposition, 81 sum of submodules, 310
self-adjoint, 74 sum of vectors, 308
semigroup, 298 superposition, 104, 111, 207, 208
sensible encoding, 42, 200 supremum, 180
separable, 101 surjection, 286
sesquilinearity, 57 surjective function, 286
sign of a permutation, 301 symmetric group, 300

symmetric relation, 284 word, 2


worst-case
tensor product, 331, 336 space complexity, 14
isomorphism, 333 time complexity, 14
time complexity, 14
of a quantum algorithm, 203 zero divisor, 302
of quantum circuits, 203 zero matrix, 316
Toffoli gate, 44, 172 zero of a polynomial, 303
trace, 321 zero vector, 309
trace condition, 124
tracing out subsystems, 138
transitive relation, 285
transposition, 300
transposition operator, 174
triangle inequality, 61
true-biased algorithm, 20
Turing complete, 200
Turing-Church thesis, 42
two-level matrix, 187
two-sided error, 20

unary alphabet, 2
unary representation, 2
uncompute trick, 50, 51
uncountable set, 345
uniform circuit family, 42
unit group, 298, 302
unit matrix, 316
unit vector, 106
unitary, 75
group, 76
unitary gate, 175
unitary quantum operator, 181
universal gate set, 38
universal property, 331
universal set of quantum gates, 181

value, 286
value function, 37
value of a variable, 4
variable, 4, 303
vector, 308
component, 308
dot product, 308
entry, 308
length, 308
vector space, 309
dimension, 322
dual, 312
vertex, 288

wave function, 104


while statement, 7
Selected Published Titles in This Series
64 Johannes A. Buchmann, Introduction to Quantum Algorithms, 2024
63 Bettina Richmond and Thomas Richmond, A Discrete Transition to Advanced
Mathematics, Second Edition, 2023
62 Scott A. Taylor, Introduction to Mathematics, 2023
61 Bennett Chow, Introduction to Proof Through Number Theory, 2023
60 Lorenzo Sadun, The Six Pillars of Calculus: Biology Edition, 2023
59 Oscar Gonzalez, Topics in Applied Mathematics and Modeling, 2023
58 Sebastian M. Cioabă and Werner Linde, A Bridge to Advanced Mathematics, 2023
57 Meighan I. Dillon, Linear Algebra, 2023
56 Lorenzo Sadun, The Six Pillars of Calculus: Business Edition, 2023
55 Joseph H. Silverman, Abstract Algebra, 2022
54 Rustum Choksi, Partial Differential Equations, 2022
53 Louis-Pierre Arguin, A First Course in Stochastic Calculus, 2022
52 Michael E. Taylor, Introduction to Differential Equations, Second Edition, 2022
51 James R. King, Geometry Transformed, 2021
50 James P. Keener, Biology in Time and Space, 2021
49 Carl G. Wagner, A First Course in Enumerative Combinatorics, 2020
48 Róbert Freud and Edit Gyarmati, Number Theory, 2020
47 Michael E. Taylor, Introduction to Analysis in One Variable, 2020
46 Michael E. Taylor, Introduction to Analysis in Several Variables, 2020
45 Michael E. Taylor, Linear Algebra, 2020
44 Alejandro Uribe A. and Daniel A. Visscher, Explorations in Analysis, Topology, and
Dynamics, 2020
43 Allan Bickle, Fundamentals of Graph Theory, 2020
42 Steven H. Weintraub, Linear Algebra for the Young Mathematician, 2019
41 William J. Terrell, A Passage to Modern Analysis, 2019
40 Heiko Knospe, A Course in Cryptography, 2019
39 Andrew D. Hwang, Sets, Groups, and Mappings, 2019
38 Mark Bridger, Real Analysis, 2019
37 Mike Mesterton-Gibbons, An Introduction to Game-Theoretic Modelling, Third
Edition, 2019
36 Cesar E. Silva, Invitation to Real Analysis, 2019
35 Álvaro Lozano-Robledo, Number Theory and Geometry, 2019
34 C. Herbert Clemens, Two-Dimensional Geometries, 2019
33 Brad G. Osgood, Lectures on the Fourier Transform and Its Applications, 2019
32 John M. Erdman, A Problems Based Course in Advanced Calculus, 2018
31 Benjamin Hutz, An Experimental Introduction to Number Theory, 2018
30 Steven J. Miller, Mathematics of Optimization: How to do Things Faster, 2017
29 Tom L. Lindstrøm, Spaces, 2017
28 Randall Pruim, Foundations and Applications of Statistics: An Introduction Using R,
Second Edition, 2018
27 Shahriar Shahriari, Algebra in Action, 2017
26 Tamara J. Lakins, The Tools of Mathematical Reasoning, 2016
25 Hossein Hosseini Giv, Mathematical Analysis and Its Inherent Nature, 2016
24 Helene Shapiro, Linear Algebra and Matrices, 2015

For a complete list of titles in this series, visit the


AMS Bookstore at www.ams.org/bookstore/amstextseries/.

Quantum algorithms are among the most important, interesting, and promising innovations in information and communication technology. They pose a major threat to today’s cybersecurity and at the same time promise great benefits by potentially solving previously intractable computational problems with reasonable effort. The theory of quantum algorithms is based on advanced concepts from computer science, mathematics, and physics.

Introduction to Quantum Algorithms offers a mathematically precise exploration of these concepts, accessible to those with a basic mathematical university education, while also catering to more experienced readers. This comprehensive book is suitable for self-study or as a textbook for one- or two-semester introductory courses on quantum computing algorithms. Instructors can tailor their approach to emphasize theoretical understanding and proofs or practical applications of quantum algorithms, depending on the course’s goals and timeframe.

For additional information and updates on this book, visit
www.ams.org/bookpages/amstext-64

AMSTEXT/64

This series was founded by the highly respected mathematician and educator, Paul J. Sally, Jr.
